Large language models can now solve complex math problems more efficiently by learning when to stop thinking. Researchers have developed a method that combines reinforcement learning with budget forcing to help AI models use computational resources more wisely while improving accuracy.
The key finding shows that this approach boosts mathematical reasoning performance by 14% while reducing token usage by 41.9%. The method prevents AI models from "overthinking" problems, where they generate excessive reasoning steps without improving accuracy. This work addresses a fundamental challenge in AI efficiency: how to get better results without simply throwing more computing power at the problem.
The researchers used a three-stage pipeline starting with a 1.5 billion parameter model called Qwen2.5-1.5B-Instruct. First, they applied supervised fine-tuning using only 1,500 specially curated samples containing mathematical reasoning traces. This small dataset included keywords like "Wait" and "Alternatively" to teach the model self-correction behavior. Then they implemented reinforcement learning with Group-Relative Policy Optimization (GRPO), an algorithm that scores each response against the other responses sampled for the same problem rather than training a separate value network.
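The group-relative idea at the heart of GRPO can be illustrated with a short sketch. This is a hypothetical simplification, not the paper's implementation: it assumes a simple 0/1 correctness reward per sampled response, and the function name is illustrative.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each response's reward against its own group's
    mean and standard deviation. Responses better than their group
    get a positive advantage; worse ones get a negative advantage,
    with no learned value network involved."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four responses sampled for one math problem, two correct.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Because advantages are computed within each group of sampled responses, the correct answers are pushed up and the incorrect ones pushed down relative to the group baseline, and the advantages sum to roughly zero.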
The methodology employed budget forcing during inference, where the system controls how many "thinking tokens" the model generates before providing an answer. If the model generates too many tokens, the system forcibly ends the reasoning phase. If it stops too early, the system injects a "Wait" token to encourage more thinking. This dynamic control ensures the model uses its computational budget effectively.
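The control loop described above can be sketched in a few lines. This is a minimal illustration under assumptions of my own: `generate_step` is a hypothetical stand-in for a real decoding loop, `</think>` stands in for whatever end-of-thinking marker the model uses, and the actual budget thresholds are not taken from the paper.

```python
def budget_forced_reasoning(generate_step, max_think_tokens, min_think_tokens):
    """Cap or extend the model's 'thinking' phase.

    generate_step() is assumed to return the next token string, or the
    end-of-thinking marker '</think>' when the model tries to stop
    reasoning on its own.
    """
    tokens = []
    while True:
        tok = generate_step()
        if len(tokens) >= max_think_tokens:
            # Over budget: forcibly end the reasoning phase.
            tokens.append("</think>")
            break
        if tok == "</think>" and len(tokens) < min_think_tokens:
            # Stopped too early: inject "Wait" to encourage more thinking.
            tokens.append("Wait")
            continue
        tokens.append(tok)
        if tok == "</think>":
            break
    return tokens

# Example: a model that tries to stop after one token gets a "Wait"
# injected, then continues reasoning before stopping for good.
stream = iter(["a", "</think>", "b", "c", "</think>"])
trace = budget_forced_reasoning(lambda: next(stream),
                                max_think_tokens=10, min_think_tokens=3)
```

The two branches mirror the two interventions in the text: a hard cutoff when the budget is exhausted, and a "Wait" injection when the model stops below the minimum.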
Results from testing on the GSM8K mathematical reasoning dataset revealed significant improvements. The combined SFT+RL approach achieved 67% accuracy at the 16-step level, representing a 14% gain over the default model. More importantly, it reduced average token usage from 1,839 tokens to just 1,069 tokens—a 41.9% reduction in computational cost. The method also improved scaling efficiency, meaning the model got better at utilizing additional computational resources when available.
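The reported savings are easy to verify from the token counts given above; a quick check, using only the numbers in the text:

```python
# Sanity check on the reported cost saving: 1,839 -> 1,069 average tokens.
baseline, improved = 1839, 1069
reduction = (baseline - improved) / baseline
# reduction comes out to about 0.419, matching the reported 41.9%.
```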
The real-world implications are substantial for making AI more accessible and cost-effective. Current AI models often require massive computational resources, limiting their deployment in resource-constrained environments. This approach demonstrates that smarter thinking strategies can achieve better results with less computing power, potentially lowering the cost of AI applications in education, research, and business. The method could help deploy capable AI systems on smaller devices or in situations where computational resources are limited.
However, the study acknowledges limitations. The 1.5 billion parameter model may not capture all the complex behaviors present in the training data, suggesting potential overfitting. The researchers also noted that the model struggled to achieve meaningful results on more advanced benchmarks like AIME and MATH500, indicating that the approach may have limitations with highly complex mathematical problems. Future work will explore larger models and extend evaluations to multiple domains to investigate broader applications.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.