A novel framework that leverages causal discovery to efficiently learn hierarchical subgoal structures in reinforcement learning, enabling agents to solve complex long-horizon tasks with sparse rewards.
EPFL & Leiden University
Models subgoal structures as causal graphs and uses targeted interventions for efficient exploration.
Three causally-guided rules prioritize subgoals by causal impact on the final goal.
A causal discovery algorithm tailored to hierarchical reinforcement learning (HRL), with theoretical guarantees.
Provable improvements for tree and Erdős–Rényi structures with bounded training cost.
Experience the core ideas of the paper through hands-on interactive visualizations.
This is a simplified version of the craftsman environment from the paper. Navigate with arrow keys or tap adjacent cells. Discover why random exploration is inefficient and why causal structure matters!
Discovered Subgoal Structure:
The pickaxe is an AND subgoal — it requires both Wood and Stone (all parents must be achieved).
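The AND semantics can be sketched in a few lines. This is a minimal illustration with hypothetical names, not the paper's implementation: a subgoal maps to its parent set, and it becomes achievable only once every parent has been achieved.

```python
# Hypothetical sketch of AND-subgoal semantics (names are illustrative).
parents = {
    "pickaxe": {"wood", "stone"},  # AND subgoal: needs ALL parents
    "wood": set(),                 # root subgoals have no dependencies
    "stone": set(),
}

def achievable(subgoal: str, achieved: set) -> bool:
    """An AND subgoal is unlocked iff all of its parents are achieved."""
    return parents[subgoal] <= achieved

print(achievable("pickaxe", {"wood"}))           # one parent missing -> False
print(achievable("pickaxe", {"wood", "stone"}))  # all parents met   -> True
```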
The agent first learns to achieve basic subgoals (resources) that have no dependencies — these are the root nodes of the causal graph.
Instead of random exploration, the agent strategically intervenes on controllable subgoals — prioritizing those with the highest causal effect on the final goal.
From the interventional data, the agent runs a causal discovery algorithm to identify parent-child relationships among subgoals, building the subgoal structure incrementally.
Newly discovered subgoals are trained and added to the controllable set, enabling the agent to reach deeper levels of the hierarchy until the final goal is achieved.
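The four steps above can be condensed into a toy loop. This is a hedged sketch under simplifying assumptions (exact recovery from interventions, an oracle environment, illustrative subgoal names), not the paper's algorithm: the agent intervenes on subgoals it already controls, infers the parents of any newly unlocked subgoal, and grows the controllable set until the final goal is reached.

```python
# Toy version of the discovery loop (all names and the oracle are illustrative).
TRUE_PARENTS = {                     # hidden ground-truth causal graph
    "wood": set(), "stone": set(),
    "pickaxe": {"wood", "stone"},    # AND subgoal
    "diamond": {"pickaxe"},          # final goal
}

def unlocked(achieved):
    """Environment oracle: subgoals whose parents are all achieved."""
    return {s for s, ps in TRUE_PARENTS.items() if ps <= achieved}

# Step 1: start from root subgoals (no dependencies).
controllable = {s for s, ps in TRUE_PARENTS.items() if not ps}
learned = {s: set() for s in controllable}

while "diamond" not in controllable:
    # Step 2: intervene by achieving every currently controllable subgoal.
    newly = unlocked(controllable) - controllable
    for s in newly:
        # Step 3: causal discovery — a controllable subgoal p is a parent
        # of s if withholding p breaks s's achievability (toy exact test).
        learned[s] = {p for p in controllable
                      if not TRUE_PARENTS[s] <= (controllable - {p})}
    # Step 4: train the new subgoals and add them to the controllable set.
    controllable |= newly

print(learned["pickaxe"])  # recovered parents of the pickaxe subgoal
```

In this sketch the loop recovers the exact parent sets because interventions are noiseless; the paper's analysis handles the statistical version of this test with bounded training cost.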
For tree-structured subgoal graphs, HRC with causal effect ranking provably reduces exploration cost compared to random exploration.
Validated on complex Minecraft tasks where the agent must discover crafting recipes through causal relationships between items.
Provably efficient for Erdős–Rényi random graph structures, with formal bounds on training cost demonstrating significant improvements.
Dive into the formal analysis, detailed algorithms, and comprehensive experiments. Published at the 42nd International Conference on Machine Learning (ICML 2025), Vancouver.