AI Agent Learns to Think and Discover Tools Independently

AI agent learns to think independently and discover tools on its own. This breakthrough enables autonomous problem-solving without predefined workflows.

AI Research
November 14, 2025

A new artificial intelligence system can now think through complex problems, discover the right tools to solve them, and execute solutions in a single, continuous reasoning process. DeepAgent, developed by researchers from Renmin University of China and Xiaohongshu Inc., represents a significant step toward AI systems that can autonomously navigate real-world challenges without being limited to predefined workflows.

The key breakthrough lies in DeepAgent's ability to dynamically discover and use tools from massive collections, ranging from 16,000 real-world APIs to specialized domain-specific tools, while maintaining coherent reasoning throughout long, complex tasks. Unlike traditional AI agents that follow rigid, step-by-step workflows, DeepAgent operates as a single thinking process in which reasoning, tool discovery, and execution happen together.
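
To make the idea of discovering tools from a large collection concrete, here is a minimal sketch of a tool search step. It is an illustration only: the function name `search_tools`, the tool descriptions, and the word-overlap (Jaccard) scoring are all assumptions made for this example, not DeepAgent's actual retrieval method, which the article does not describe.

```python
def search_tools(query, tool_docs, top_k=3):
    """Rank tools by word overlap between the query and each description.

    Hypothetical stand-in for a tool retriever: a real system would likely
    use a learned retriever rather than plain word overlap.
    """
    query_words = set(query.lower().split())
    scored = []
    for name, description in tool_docs.items():
        doc_words = set(description.lower().split())
        union = query_words | doc_words
        # Jaccard similarity: shared words divided by all distinct words.
        score = len(query_words & doc_words) / len(union) if union else 0.0
        scored.append((score, name))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical tool catalogue, invented for this example.
TOOLS = {
    "get_weather": "get current weather forecast for a city",
    "book_flight": "book a flight between two airports",
    "convert_currency": "convert an amount between two currencies",
}
```

In this sketch the agent would call something like `search_tools("weather forecast for paris", TOOLS)` in the middle of its reasoning, then continue with only the matching tools in view instead of the full catalogue.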

Researchers achieved this through a novel memory management system inspired by human cognition. When DeepAgent encounters obstacles or needs to reconsider its approach, it can "take a breath" by folding its interaction history into three structured memory types: episodic memory (tracking overall progress), working memory (current sub-goals and challenges), and tool memory (experiences with different tools). This folding mechanism prevents the AI from getting stuck on wrong paths while preserving critical information, much like how humans pause to reconsider strategies.
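
The three memory types can be pictured as a small data structure. The sketch below is a simplified assumption, not the researchers' implementation: the `Step` labels, the `fold_history` function, and the rule-based summarizing are all hypothetical, and a real system would presumably rely on a language model to write the summaries.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    kind: str                   # "thought", "tool_call", or "milestone" (hypothetical labels)
    text: str
    tool: Optional[str] = None  # tool name, for tool calls only
    ok: Optional[bool] = None   # whether the tool call succeeded


@dataclass
class FoldedMemory:
    episodic: list = field(default_factory=list)  # overall progress so far
    working: list = field(default_factory=list)   # current sub-goals and challenges
    tool: dict = field(default_factory=dict)      # per-tool success/failure counts


def fold_history(history, keep_recent=2):
    """Compress a long interaction history into three compact memories."""
    memory = FoldedMemory()
    for step in history:
        if step.kind == "milestone":
            memory.episodic.append(step.text)
        elif step.kind == "tool_call" and step.tool is not None:
            stats = memory.tool.setdefault(step.tool, {"success": 0, "failure": 0})
            stats["success" if step.ok else "failure"] += 1
    # Keep only the most recent thoughts as the current working context.
    thoughts = [s.text for s in history if s.kind == "thought"]
    memory.working = thoughts[-keep_recent:] if keep_recent > 0 else []
    return memory
```

After folding, the agent would continue from the compact `FoldedMemory` rather than the full history, which is what lets it step back from a wrong path without losing track of what it has already learned.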

The training methodology, called ToolPO, uses reinforcement learning specifically designed for tool-using agents. Since training with thousands of real-world APIs would be impractical due to cost and instability, the team developed an LLM-based simulator that mimics API responses. Crucially, they implemented advantage attribution that precisely assigns credit to the specific tokens responsible for correct tool invocations, providing fine-grained training signals.
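
One way to picture this kind of credit assignment is sketched below. It assumes, for illustration only, that each training attempt gets a task-level reward shared by all of its tokens plus a tool-call reward that applies only to the tokens making up tool invocations. The function names, the mean-centering step, and the `action_weight` parameter are assumptions for this example and are not taken from the ToolPO implementation.

```python
def center(rewards):
    """Subtract the group mean so better-than-average attempts get positive credit."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def token_advantages(task_rewards, action_rewards, tool_masks, action_weight=1.0):
    """Compute a per-token training signal for a group of attempts.

    task_rewards:   one reward per attempt for solving the overall task
    action_rewards: one reward per attempt for making correct tool calls
    tool_masks:     per attempt, a list with 1 for tokens that belong to a
                    tool invocation and 0 for all other tokens
    """
    task_adv = center(task_rewards)
    action_adv = center(action_rewards)
    result = []
    for a_task, a_action, mask in zip(task_adv, action_adv, tool_masks):
        # Every token shares the task-level credit; only tool-call tokens
        # additionally receive the tool-call credit.
        result.append([a_task + action_weight * a_action * m for m in mask])
    return result
```

With two attempts, where only the first succeeds and calls its tools correctly, the tool-call tokens of the first attempt receive the largest positive signal and the tool-call tokens of the second the largest negative one, while the remaining tokens get only the shared task-level credit.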

Extensive testing across eight benchmarks reveals DeepAgent's superior performance. On general tool-use tasks like ToolBench (16,000 tools) and ToolHop (3,900 tools), DeepAgent achieved success rates of 64.0% and 40.6% respectively, substantially exceeding the strongest baseline methods. The advantage was even more pronounced in open-set scenarios where agents must dynamically discover which tools to use rather than working with pre-labeled options.

In practical applications, DeepAgent demonstrated remarkable capabilities. On ALFWorld, a text-based embodied AI environment, it achieved a 91.8% success rate in navigating virtual worlds using basic actions like "take" and "move." For online shopping tasks in WebShop, it scored 32.0 out of 100, significantly outperforming workflow-based methods that scored only 18.0. The system also excelled at complex information-seeking tasks in the GAIA benchmark, handling web browsing, visual question answering, code compilation, and file reading with 46.7% accuracy.

The real-world implications are substantial. This technology could enable AI assistants that genuinely understand user needs and independently figure out how to fulfill them using available resources. Instead of being limited to pre-programmed capabilities, such systems could discover and learn to use new tools as they become available, adapting to changing environments and requirements.

However, the researchers acknowledge limitations. The system's performance still depends on the capabilities of the underlying large reasoning model, and while it handles thousands of tools effectively, even larger tool collections may pose scalability challenges. The current implementation also requires significant computational resources for training, though the inference process is efficient enough for practical deployment.

What makes this approach particularly promising is that it holds up across model sizes. When tested with both 30-billion-parameter and 235-billion-parameter models, DeepAgent maintained significant performance advantages over traditional methods, suggesting the framework generalizes well regardless of the underlying AI model's size.

This research moves us closer to AI systems that don't just follow instructions but genuinely understand problems and creatively assemble solutions using whatever tools are available. That capability could transform how we interact with artificial intelligence in scientific research, customer service, education, and other domains where complex problem-solving is required.

About the Author

Guilherme A.


Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
