AI Agents Now Discover Without Human Help

A new AI system can autonomously tackle complex scientific and engineering problems, from optimizing machine learning models to solving mathematical puzzles, with no human intervention during the search itself. Developed by researchers at Baidu Cloud, the FM Agent framework is a step toward automating research and development, and it could accelerate innovation across industries. It targets domains where finding high-performing solutions has traditionally depended on expert input and iterative tuning.

The core finding is that FM Agent achieves state-of-the-art results in several areas. On MLE-Bench, which evaluates machine learning tasks drawn from Kaggle competitions, it achieved a 43.56% medal rate, outperforming human benchmarks and other AI systems. On ALE-Bench, which tests algorithm design, it scored 1976.3, a 5.2% improvement over previous methods. For GPU kernel optimization on KernelBench, it delivered speedups of up to 20.77 times over standard compilers such as torch.compile. These results were obtained autonomously, with no manual tuning or interpretation required.

The methodology combines large language model (LLM) reasoning with evolutionary algorithms in a multi-agent system. It starts with a cold-start initialization that uses optional expert guidance to generate diverse candidate solutions. Then, an adaptive diversity-driven sampling mechanism balances exploration and exploitation across parallel 'islands' of solutions. Domain-specific evaluators assess correctness, effectiveness, and quality using LLM-supervised feedback. The system runs on a distributed infrastructure built with Ray, enabling asynchronous execution for scalability.
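The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not FM Agent's actual implementation: a random Gaussian perturbation stands in for the LLM-proposed edit, a simple numeric objective stands in for the domain-specific evaluators, and everything runs sequentially in one process rather than on Ray. All function names, thresholds, and parameters here are hypothetical.

```python
import random
import statistics


def evaluate(candidate):
    """Toy evaluator (assumption): higher is better, optimum at all zeros."""
    return -sum(x * x for x in candidate)


def mutate(candidate, rng, scale=0.3):
    """Stand-in for an LLM-proposed edit: a small Gaussian perturbation."""
    return [x + rng.gauss(0.0, scale) for x in candidate]


def diversity(island):
    """Spread of scores on an island, used as a cheap diversity signal."""
    scores = [score for score, _ in island]
    return statistics.pstdev(scores) if len(scores) > 1 else 0.0


def sample_parent(island, rng, low_diversity=0.05):
    """Explore when the island has collapsed, otherwise exploit."""
    if diversity(island) < low_diversity:
        # Low diversity: pick uniformly to encourage exploration.
        return rng.choice(island)[1]
    # Enough diversity: small tournament that favours high scores.
    contenders = rng.sample(island, k=min(3, len(island)))
    return max(contenders, key=lambda item: item[0])[1]


def trim(island, size):
    """Keep only the best `size` candidates, best first."""
    island.sort(key=lambda item: item[0], reverse=True)
    del island[size:]


def evolve(num_islands=4, island_size=8, generations=40, dim=3, seed=0):
    rng = random.Random(seed)

    # Cold start: each island begins with its own random candidates.
    islands = []
    for _ in range(num_islands):
        population = [
            [rng.uniform(-5.0, 5.0) for _ in range(dim)]
            for _ in range(island_size)
        ]
        islands.append([(evaluate(c), c) for c in population])
    initial_best = max(score for island in islands for score, _ in island)

    for generation in range(1, generations + 1):
        for island in islands:
            child = mutate(sample_parent(island, rng), rng)
            island.append((evaluate(child), child))
            trim(island, island_size)

        # Periodic migration: copy each island's best to its neighbour.
        if generation % 10 == 0:
            bests = [island[0] for island in islands]
            for i, island in enumerate(islands):
                island.append(bests[i - 1])
                trim(island, island_size)

    final_best = max(score for island in islands for score, _ in island)
    return initial_best, final_best


if __name__ == "__main__":
    start, end = evolve()
    print(f"best score at cold start: {start:.4f}")
    print(f"best score after search:  {end:.4f}")
```

Because each island drops only its worst candidate, the best score never gets worse from one generation to the next. In a distributed setting, the per-island step would presumably be dispatched as asynchronous tasks, which is the role the article attributes to Ray.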

The reported results show consistent performance gains. In machine learning, FM Agent improved feature engineering on tasks such as American Express default prediction, raising the score by 0.003 through iterative refinement (Figure 7 of the paper). For kernel optimization, it converged quickly on efficient designs: in the CosyVoice2 model, it improved operators such as FeedForward and SinusoidalPosEmb within a few iterations (Figure 8). In mathematics, it tackled problems such as circle packing and uncertainty inequalities and reached new best results, with Figures 9 and 10 visualizing the solutions.

This advancement matters because it can automate labor-intensive R&D workflows in enterprises, reducing the need for specialized human effort. In combinatorial optimization, for instance, it designs heuristics for logistics and production, while in kernel development it speeds up AI model training. The implications could extend to faster drug discovery, financial modeling, and more, making complex problem-solving more accessible to non-experts and boosting productivity.

Limitations include the system's dependence on the initial problem framing and the computational resources required for distributed execution. The paper notes that performance can vary with problem settings, and not all domains may see uniform improvements. Future work could focus on reducing resource demands and expanding to more real-world scenarios.


About the Author

Guilherme A.