A novel framework that leverages causal discovery to efficiently learn hierarchical subgoal structures in reinforcement learning, enabling agents to solve complex long-horizon tasks with sparse rewards.
EPFL & Leiden University
Models subgoal structures as causal graphs and uses targeted interventions for efficient exploration.
Three causally-guided rules prioritize subgoals by causal impact on the final goal.
A causal discovery algorithm tailored to hierarchical reinforcement learning (HRL), with theoretical guarantees.
Provable improvements for tree and Erdős–Rényi structures with bounded training cost.
Experience the core ideas of the paper through hands-on interactive visualizations.
This is a simplified version of the craftsman environment from the paper. Navigate with arrow keys or tap adjacent cells. Discover why random exploration is inefficient and why causal structure matters!
Discovered Subgoal Structure:
The pickaxe is an AND subgoal — it requires both Wood and Stone (all parents must be achieved).
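The AND semantics can be sketched in a few lines. This is a minimal illustration with hypothetical names, not the paper's implementation: a subgoal maps to its parent set, and it becomes achievable only once every parent has been achieved.

```python
# Hypothetical sketch of AND-subgoal semantics (names are illustrative).
parents = {
    "pickaxe": {"wood", "stone"},  # AND subgoal: needs ALL parents
    "wood": set(),                 # root subgoals have no dependencies
    "stone": set(),
}

def achievable(subgoal: str, achieved: set) -> bool:
    """An AND subgoal is unlocked iff all of its parents are achieved."""
    return parents[subgoal] <= achieved

print(achievable("pickaxe", {"wood"}))           # one parent missing -> False
print(achievable("pickaxe", {"wood", "stone"}))  # all parents met   -> True
```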
The agent first learns to achieve basic subgoals (resources) that have no dependencies — these are the root nodes of the causal graph.
Instead of random exploration, the agent strategically intervenes on controllable subgoals — prioritizing those with the highest causal effect on the final goal.
From the interventional data, the agent runs a causal discovery algorithm to identify parent-child relationships among subgoals, building the subgoal structure incrementally.
Newly discovered subgoals are trained and added to the controllable set, enabling the agent to reach deeper levels of the hierarchy until the final goal is achieved.
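The four steps above can be condensed into a toy loop. This is a hedged sketch under simplifying assumptions (exact recovery from interventions, an oracle environment, illustrative subgoal names), not the paper's algorithm: the agent intervenes on subgoals it already controls, infers the parents of any newly unlocked subgoal, and grows the controllable set until the final goal is reached.

```python
# Toy version of the discovery loop (all names and the oracle are illustrative).
TRUE_PARENTS = {                     # hidden ground-truth causal graph
    "wood": set(), "stone": set(),
    "pickaxe": {"wood", "stone"},    # AND subgoal
    "diamond": {"pickaxe"},          # final goal
}

def unlocked(achieved):
    """Environment oracle: subgoals whose parents are all achieved."""
    return {s for s, ps in TRUE_PARENTS.items() if ps <= achieved}

# Step 1: start from root subgoals (no dependencies).
controllable = {s for s, ps in TRUE_PARENTS.items() if not ps}
learned = {s: set() for s in controllable}

while "diamond" not in controllable:
    # Step 2: intervene by achieving every currently controllable subgoal.
    newly = unlocked(controllable) - controllable
    for s in newly:
        # Step 3: causal discovery — a controllable subgoal p is a parent
        # of s if withholding p breaks s's achievability (toy exact test).
        learned[s] = {p for p in controllable
                      if not TRUE_PARENTS[s] <= (controllable - {p})}
    # Step 4: train the new subgoals and add them to the controllable set.
    controllable |= newly

print(learned["pickaxe"])  # recovered parents of the pickaxe subgoal
```

In this sketch the loop recovers the exact parent sets because interventions are noiseless; the paper's analysis handles the statistical version of this test with bounded training cost.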
For tree-structured subgoal graphs, HRC with causal effect ranking provably reduces exploration cost compared to random exploration.
Validated on complex Minecraft tasks where the agent must discover crafting recipes through causal relationships between items.
Provably efficient for Erdős–Rényi random graph structures, with formal bounds on training cost demonstrating significant improvements.
Dive into the formal analysis, detailed algorithms, and comprehensive experiments. Published at the 42nd International Conference on Machine Learning (ICML 2025), Vancouver.