AI Agents Tackle Complex Tasks by Thinking in Trees

TL;DR

A new method, ReAcTree, breaks goals into a tree of subgoals, nearly doubling goal success rates in household simulations and helping smaller AI models narrow the gap with larger ones.

Artificial intelligence systems often struggle with complex, multi-step tasks such as navigating a home to find and move objects. A new approach called ReAcTree changes how an AI agent plans and acts on such tasks. Researchers from the Electronics and Telecommunications Research Institute and the University of Science and Technology in Daejeon, South Korea, developed this hierarchical method, which constructs a dynamic tree of subgoals, each handled by its own agent node. In experiments, ReAcTree achieved a 61% goal success rate on the WAH-NL household task dataset using the Qwen 72B model, nearly double the 31% achieved by ReAct, the previous best method. With a smaller 8B-parameter model it still reached 37%, which suggests that efficient task decomposition can narrow the performance gap between large and small AI systems.

The key finding is that decomposing long-horizon tasks into simpler subgoals prevents the error accumulation common in single-process approaches. Each agent node in the tree uses a large language model (LLM) to reason, act, or expand the tree by proposing new subgoals, while control nodes coordinate strategies such as sequencing or fallback to handle dependencies and recover from failures. For example, in a task to bring a pudding and a cupcake to a coffee table, ReAcTree might break the work into subgoals like 'find the pudding' and 'move the pudding to the table', executing them in sequence because the pudding must be found before it can be moved.
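To make the control-node idea concrete, here is a minimal sketch of a tree with sequence and fallback nodes. It is not the researchers' implementation: the class names, the `stub_policy` function standing in for the LLM, and the fridge and kitchen-counter subgoals are all hypothetical, chosen only to illustrate how a fallback can recover from a failed subgoal.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union

executed: List[str] = []  # order in which subgoals were attempted


@dataclass
class AgentNode:
    """Leaf node that pursues one subgoal; `policy` stands in for the LLM."""
    subgoal: str
    policy: Callable[[str], bool]

    def run(self) -> bool:
        executed.append(self.subgoal)
        return self.policy(self.subgoal)


@dataclass
class ControlNode:
    """Inner node that coordinates its children."""
    kind: str  # "sequence" or "fallback"
    children: List[Union["ControlNode", AgentNode]] = field(default_factory=list)

    def run(self) -> bool:
        if self.kind == "sequence":
            # Succeeds only if every child succeeds; stops at the first failure.
            return all(child.run() for child in self.children)
        if self.kind == "fallback":
            # Succeeds as soon as one child succeeds; tries the next on failure.
            return any(child.run() for child in self.children)
        raise ValueError(f"unknown control node kind: {self.kind}")


def stub_policy(subgoal: str) -> bool:
    # Hypothetical outcome: the pudding is not in the fridge.
    return subgoal != "find the pudding in the fridge"


tree = ControlNode("sequence", [
    ControlNode("fallback", [
        AgentNode("find the pudding in the fridge", stub_policy),
        AgentNode("find the pudding on the kitchen counter", stub_policy),
    ]),
    AgentNode("move the pudding to the coffee table", stub_policy),
])

result = tree.run()
print(result, executed)
```

In this toy run the first search fails, the fallback node tries the second location, and only then does the sequence node proceed to the move subgoal.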

Methodologically, ReAcTree builds on the ReAct paradigm by adding hierarchical decomposition and two memory systems. It starts with a top-level goal and grows the tree dynamically whenever an agent node decides to expand. Episodic memory retrieves goal-specific examples from past successful runs to guide decision-making. Working memory shares environment-specific observations, such as object locations, among nodes to reduce redundant searches. Experiments used the VirtualHome and AI2THOR simulators under partially observable settings, where agents receive limited feedback after each action, mimicking real-world constraints.
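The two memory systems can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class and method names are hypothetical, and retrieval uses simple word overlap (Jaccard similarity) as a stand-in for whatever similarity measure the researchers actually use.

```python
from typing import Dict, List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two goal strings (illustrative stand-in)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class EpisodicMemory:
    """Stores (goal, trajectory) pairs from past successful runs."""

    def __init__(self) -> None:
        self.episodes: List[Tuple[str, List[str]]] = []

    def add(self, goal: str, trajectory: List[str]) -> None:
        self.episodes.append((goal, trajectory))

    def retrieve(self, goal: str, k: int = 1) -> List[Tuple[str, List[str]]]:
        # Return the k stored episodes whose goal is most similar to the query.
        ranked = sorted(self.episodes, key=lambda ep: jaccard(goal, ep[0]), reverse=True)
        return ranked[:k]


class WorkingMemory:
    """Shared record of observations, here object locations, visible to all nodes."""

    def __init__(self) -> None:
        self.locations: Dict[str, str] = {}

    def observe(self, obj: str, location: str) -> None:
        self.locations[obj] = location

    def lookup(self, obj: str) -> Optional[str]:
        return self.locations.get(obj)


episodic = EpisodicMemory()
episodic.add("find the apple", ["walk to kitchen", "open fridge"])
episodic.add("move the book to the sofa", ["grab book", "walk to sofa", "put book on sofa"])
best = episodic.retrieve("find the pudding", k=1)

working = WorkingMemory()
working.observe("pudding", "kitchen table")  # seen by one node, reusable by others
print(best[0][0], working.lookup("pudding"), working.lookup("cupcake"))
```

A node working on 'find the pudding' would get the 'find the apple' episode as its example, and a later node that needs the pudding can read its location from working memory instead of searching again.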

Results on the WAH-NL and ALFRED datasets show consistent improvements. On WAH-NL, ReAcTree outperformed baselines including Zero-Shot Planner (ZSP) and Tree-Planner across various LLMs, with the subgoal success rate rising from 54.05% to 79.58% for Qwen 72B. Ablation studies indicated that the two memory components are complementary: combining them yielded the highest performance. Control flow types such as sequence and fallback were crucial for handling complex dependencies. Limitations include occasional hallucinations by the LLMs, difficulty recognizing task failures, and an inability to revise incorrectly expanded subgoals, all of which point to areas for future refinement.
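The article reports both a goal success rate and a subgoal success rate. The sketch below shows one plausible way to compute them, assuming the goal rate is the share of episodes in which every subgoal was met and the subgoal rate is the share of all subgoals met, pooled across episodes. These definitions and the toy data are assumptions for illustration, not figures or formulas taken from the study.

```python
from typing import List, Tuple


def success_rates(episodes: List[List[bool]]) -> Tuple[float, float]:
    """Each episode is a list of flags, one per subgoal (True = achieved)."""
    goal_hits = sum(all(ep) for ep in episodes)       # episodes fully completed
    subgoal_hits = sum(sum(ep) for ep in episodes)    # subgoals achieved overall
    total_subgoals = sum(len(ep) for ep in episodes)
    return goal_hits / len(episodes), subgoal_hits / total_subgoals


# Made-up outcomes for three episodes with two subgoals each.
toy_episodes = [[True, True], [True, False], [False, False]]
goal_rate, subgoal_rate = success_rates(toy_episodes)
print(f"goal success rate: {goal_rate:.2%}, subgoal success rate: {subgoal_rate:.2%}")
```

Under these assumed definitions the subgoal rate is always at least as high as the goal rate, which is consistent with the reported subgoal figures sitting above the goal figures.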

This advance matters for building more reliable autonomous agents in homes, warehouses, and other settings where robots must execute multi-step commands. By making long-horizon tasks tractable, ReAcTree could improve AI applications in robotics and assistive technologies, though it does not yet address ambiguous instructions or support clarification dialogues. The research underscores that hierarchical planning, rather than merely scaling model size, is fundamental to robust AI decision-making in unpredictable environments.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
