AIResearch
Science

A2 Flow: AI Agents That Build Their Own Tools

Huawei's new framework automates workflow generation by extracting reusable operators from expert data, eliminating manual design and boosting performance.

AI Research
March 26, 2026
4 min read

The promise of autonomous AI agents has long been hampered by a fundamental bottleneck: the need for human engineers to manually design the workflows these systems follow. While large language models (LLMs) have demonstrated impressive reasoning capabilities, translating that into reliable, multi-step agentic behavior has required crafting intricate sequences of prompts and actions—a labor-intensive process that limits scalability and adaptability. A new research paper from Huawei Noah's Ark Lab, titled "Agentic Workflow Generation via Self-Adaptive Abstraction Operators," proposes a radical solution: A2 Flow, a framework that allows AI agents to automatically generate their own workflows by learning reusable building blocks, or "operators," directly from data. This approach not only removes the human from the loop for initial design but also enables systems to generalize more effectively to novel tasks, from code generation to controlling robots in virtual environments.

The core innovation of A2 Flow is its three-stage pipeline for autonomously extracting what the authors call "self-adaptive abstraction operators." Instead of relying on a predefined library of manually coded operators—common in prior systems like AFLOW or DebFlow—A2 Flow starts from raw expert demonstrations. In the first stage, Case-based Initial Operator Generation, the framework uses an LLM to analyze example tasks and their solutions, generating a set of case-specific, granular operators. For an embodied task like cleaning a potato in a simulated kitchen, this might yield operators like "PotatoLocationFinder" or "MicrowavePlacer." The second stage, Operator Clustering and Preliminary Abstraction, then groups these similar operators across different tasks to form more generalized versions, such as "TaskPlanner" or "ObjectInteractor." Finally, the Deep Extraction for Abstract Execution Operators stage employs long chain-of-thought prompting and multi-path reasoning to refine these into compact, high-level execution operators like "Planner()," "Executor()," and "Validator()." This entire process is automated, requiring no manual predefinition, and it yields a set of reusable, task-aware building blocks that serve as the foundation for workflow construction.
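To make the three stages concrete, here is a minimal sketch of the pipeline's shape. It is not the paper's implementation: Stage 1 is stubbed with hand-written demonstration data instead of an LLM call, clustering is reduced to grouping by a coarse functional role, and all names (`Operator`, `generate_initial_operators`, the role labels) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    role: str  # coarse functional role, e.g. "plan", "act", "check"


def generate_initial_operators(demonstrations):
    """Stage 1 (stubbed): the paper uses an LLM to propose granular,
    case-specific operators from expert demonstrations; here each
    demonstration simply lists its operators directly."""
    return [Operator(name, role)
            for demo in demonstrations
            for name, role in demo["operators"]]


def cluster_and_abstract(operators):
    """Stage 2: group case-specific operators that share a functional
    role across tasks, and abstract each cluster into one generalized
    operator (names like TaskPlanner are illustrative)."""
    clusters = {}
    for op in operators:
        clusters.setdefault(op.role, []).append(op)
    generic = {"plan": "TaskPlanner", "act": "ObjectInteractor",
               "check": "StateValidator"}
    return {role: Operator(generic.get(role, role.title()), role)
            for role in clusters}


def deep_extract(abstract_ops):
    """Stage 3: collapse the preliminary abstractions into a compact set
    of high-level execution operators (Planner/Executor/Validator)."""
    final = {"plan": "Planner", "act": "Executor", "check": "Validator"}
    return sorted(final[r] for r in abstract_ops if r in final)


demos = [
    {"task": "clean potato",
     "operators": [("PotatoLocationFinder", "act"),
                   ("CleanStepPlanner", "plan"),
                   ("SinkStateChecker", "check")]},
    {"task": "heat egg",
     "operators": [("MicrowavePlacer", "act"),
                   ("HeatSequencePlanner", "plan"),
                   ("DoneChecker", "check")]},
]

initial = generate_initial_operators(demos)
abstracted = cluster_and_abstract(initial)
print(deep_extract(abstracted))  # ['Executor', 'Planner', 'Validator']
```

The point of the sketch is the data flow: many granular operators in, a small fixed vocabulary of execution operators out, with no operator predefined by hand.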

To evaluate A2 Flow, the researchers conducted extensive experiments across eight benchmark datasets spanning five distinct domains: code generation (HumanEval, MBPP), mathematical reasoning (GSM8K, MATH), reading comprehension (HotpotQA, DROP), embodied tasks (ALFWorld), and games (TextCraft). The results are compelling. On general benchmarks like DROP and MATH, A2 Flow achieved an average performance improvement of 2.4% over state-of-the-art baselines, including AFLOW. More strikingly, on embodied and game tasks, it delivered a 19.3% average improvement, demonstrating superior generalization to complex, open-world scenarios. The framework also proved highly efficient, reducing resource usage by 37% compared to baselines, as shown in cost-performance analyses. For instance, on the DROP benchmark with GPT-4o as the executor, A2 Flow outperformed AFLOW by 1.62% in accuracy while using only 51.37% of the computational resources—a nearly halved cost for better results.

The implications of this research are profound for the future of AI automation. By decoupling workflow generation from human expertise, A2 Flow addresses critical limitations in scalability and adaptability. The authors describe the self-adaptive abstraction operators as central to enabling "fully automated workflow generation," which could accelerate the deployment of AI agents in diverse fields such as software engineering, scientific research, and robotics. The integrated Operators Memory Mechanism, which retains historical outputs to enrich context for decision-making, further enhances the system's ability to learn from experience and improve over time. This moves us closer to a paradigm where AI systems can not only execute tasks but also design and optimize their own problem-solving strategies, potentially unlocking new levels of autonomy and efficiency in real-world applications.
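The Operators Memory Mechanism can be pictured as a bounded log of past operator outputs that is replayed as context for subsequent calls. The sketch below is an assumption about the general shape of such a mechanism, not the paper's implementation; the class name, `record`/`context` methods, and the prompt-style formatting are all illustrative.

```python
class OperatorsMemory:
    """Toy operator memory: keep the most recent operator outputs and
    flatten them into a context string for the next operator call."""

    def __init__(self, max_entries=10):
        self.max_entries = max_entries
        self.entries = []  # list of (operator_name, output) pairs

    def record(self, operator_name, output):
        # Append the new entry and evict the oldest beyond the cap.
        self.entries.append((operator_name, output))
        self.entries = self.entries[-self.max_entries:]

    def context(self):
        # Render history as prompt-style lines, oldest first.
        return "\n".join(f"[{name}] {out}" for name, out in self.entries)


mem = OperatorsMemory(max_entries=2)
mem.record("Planner", "1. find potato 2. wash in sink")
mem.record("Executor", "moved to counter; picked up potato")
mem.record("Validator", "potato not yet clean")
print(mem.context())
```

A bounded, replayable history like this is one simple way an agent's later decisions (e.g. the Validator flagging an unfinished step) can be conditioned on what earlier operators actually did.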

However, the study acknowledges certain limitations. While A2 Flow excels in many domains, its performance on some code generation benchmarks like HumanEval was more modest, as pre-defined operators (including Python interpreters) in baselines set a high bar that abstract operator optimization alone could not surpass. The framework also depends on the quality of the initial expert demonstrations and the reasoning capabilities of the underlying LLMs used for operator extraction. Future work could explore integrating more diverse data sources or refining the clustering algorithms to handle edge cases better. Nonetheless, the paper presents a significant leap toward automating agentic workflow design, offering a blueprint for more generalizable and resource-efficient AI systems that can build their own tools from the ground up.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.


Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
