AI Learns Math First, Then Everything Else

Large language models are becoming increasingly capable of complex reasoning, but most training approaches focus narrowly on single domains like coding or mathematics. A new method demonstrates that starting with math training creates a foundation that transfers effectively to diverse reasoning tasks, from scientific questions to programming challenges.

The researchers discovered that a curriculum beginning with mathematics reinforcement learning, followed by joint training across multiple domains, produces compact models that match or exceed the performance of larger specialized systems. Using Qwen3-4B and Llama3.1-8B models, their approach achieved competitive results across math, coding, STEM, logic, simulation, and tabular reasoning benchmarks while requiring minimal modifications to the base architecture.

The methodology follows a three-stage process. First, models undergo cold-start supervised fine-tuning on math problems to expose reasoning patterns. Second, they receive reinforcement learning specifically in the math domain using DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization), which samples a group of responses per prompt and filters out groups whose rewards are all identical, since those provide no learning signal. Finally, the models engage in joint reinforcement learning across six domains: math, STEM, coding, simulation, logic, and tabular reasoning. The system uses three evaluation methods: rule-based matching for structured answers, model-based assessment for free-form responses, and execution-based testing for code.
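
To make the group filtering in the second stage concrete, here is a minimal sketch, not the researchers' implementation. It assumes binary rewards (1.0 for a correct response, 0.0 otherwise) and group-normalized advantages. The function names `keep_group`, `group_advantages`, and `filter_batch` are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def keep_group(rewards: List[float]) -> bool:
    """Keep a group only if its rewards differ.

    If every sampled response gets the same reward (all correct or all
    wrong), every advantage is zero and the group gives no gradient signal.
    """
    return len(set(rewards)) > 1


def group_advantages(rewards: List[float]) -> List[float]:
    """Normalize rewards within one group to zero mean and unit std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        # Degenerate group: no response is better than any other.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def filter_batch(batch: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Drop uninformative groups and compute advantages for the rest.

    `batch` maps each prompt to the rewards of its sampled responses
    (an assumed data layout for this sketch).
    """
    return {
        prompt: group_advantages(rewards)
        for prompt, rewards in batch.items()
        if keep_group(rewards)
    }


if __name__ == "__main__":
    batch = {
        "easy prompt": [1.0, 1.0, 1.0, 1.0],   # all correct: filtered out
        "mixed prompt": [1.0, 0.0, 0.0, 1.0],  # informative: kept
        "hard prompt": [0.0, 0.0, 0.0, 0.0],   # all wrong: filtered out
    }
    print(filter_batch(batch))
```

In a real training loop the dropped groups would typically be replaced by sampling further prompts until the batch is full; that step is omitted here.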

Results show consistent improvements across domains. The curriculum-trained Qwen3-4B model achieved 86.0% on MATH500 (compared to 77.25% for baseline models), 58.08% on HumanEval coding tasks, and 39.60% on GPQA scientific questions. For Llama3.1-8B, the approach yielded 74.40% on MATH500 and 60.40% on HumanEval. Analysis revealed increased usage of advanced reasoning skills like backtracking and verification in curriculum-trained models, with backtracking frequency rising from minimal levels to 20-30% in some domains. The math-first stage proved particularly crucial: removing it caused performance drops across most benchmarks.

This approach matters because it demonstrates that mathematical reasoning serves as an effective driver for discovering core cognitive skills that transfer to other domains. For practical applications, this means AI systems could become more versatile without requiring massive scale or domain-specific training. The method's backbone-agnostic nature makes it accessible for various model architectures, potentially enabling more efficient AI development.

The research acknowledges several limitations. The approach shows varying effectiveness across domains, with logic tasks requiring more domain-specific training even after the curriculum. Performance improvements aren't uniform: some coding benchmarks regressed temporarily during math-focused training before recovering in joint training. The method also relies on the availability of high-quality, verifiable reward signals across domains, which may not exist for all real-world applications.

About the Author

Guilherme A.