
AI Learns to Think Like Humans, Saving Time and Energy

A new method trains AI to switch between quick intuition and deep reasoning, cutting computational costs by up to 60% while matching top models on complex tasks.

AI Research
March 26, 2026
4 min read

Artificial intelligence models often struggle with efficiency, spending excessive time and energy on simple tasks while needing careful thought for complex ones. Researchers have developed a new approach that mimics human cognitive flexibility, allowing AI to automatically choose between fast, intuitive responses and slow, analytical reasoning based on task difficulty. This breakthrough, detailed in a recent paper, could make AI systems more practical and cost-effective for real-world applications, from answering visual questions to solving math problems, without sacrificing accuracy.

The key finding is that a visual language model named DualMindVLM can dynamically switch between two thinking modes: fast thinking for straightforward queries and slow thinking for challenging ones. This dual-mode capability enables the model to achieve performance on par with state-of-the-art visual reasoning models while using significantly fewer computational resources. For example, on the MMStar benchmark, DualMindVLM maintains high accuracy with much lower token usage compared to models that always engage in detailed step-by-step reasoning, as shown in Figure 2 of the paper. This efficiency stems from avoiding unnecessary elaboration on simple tasks, such as recognizing an emoji, where other models produce lengthy responses despite the ease of the question.
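The switching behavior can be pictured with a toy sketch: assume the model prefixes each answer with a self-chosen mode tag, and the generation budget follows from that tag. The tag names and budgets below are illustrative assumptions, not the paper's actual interface.

```python
# Toy illustration of dual-mode inference (tag names and budgets are
# hypothetical): fast thinking gets a small token budget for direct
# answers, slow thinking a large one for step-by-step reasoning.

def budget_for(response_prefix: str) -> int:
    """Pick a generation budget from the mode tag the model emitted."""
    if response_prefix.startswith("<fast>"):
        return 64     # direct answer, e.g. recognizing an emoji
    if response_prefix.startswith("<slow>"):
        return 1024   # room for detailed chain-of-thought
    raise ValueError("unknown thinking mode")

print(budget_for("<fast> It's a smiling face emoji."))   # 64
print(budget_for("<slow> Step 1: read the diagram..."))  # 1024
```

The savings reported on MMStar come from exactly this asymmetry: easy perceptual queries never consume the large reasoning budget.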

The methodology involves a two-stage reinforcement learning process. In the first stage, the researchers automatically label training data as requiring fast or slow thinking based on the output length of a pre-trained base model. They observed that shorter responses typically correspond to easier problems, while longer ones indicate harder tasks, as illustrated in Figure 4. This labeling uses thresholds: samples with average response lengths below 100 tokens are marked for fast thinking, and those above 200 tokens for slow thinking. In the second stage, the model is trained with Group Relative Policy Optimization (GRPO) on these labels to develop the ability to switch modes. The training uses a hybrid approach in which half the responses are guided by thinking-mode prefixes and half are generated freely, helping the model learn when to apply each mode effectively.
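The stage-1 auto-labeling can be sketched in a few lines, using the thresholds stated above (under 100 tokens on average → fast, over 200 → slow). The function and field names here are assumptions for illustration; the ambiguous middle band is simply dropped in this sketch.

```python
# Minimal sketch of stage-1 thinking-mode auto-labeling (names are
# hypothetical). Each sample carries token counts of the base model's
# sampled responses; the average decides the label.

def label_thinking_modes(samples, fast_max=100, slow_min=200):
    """Tag each training sample by the base model's average response length."""
    labeled = []
    for sample in samples:
        lengths = sample["response_lengths"]  # token counts per rollout
        avg = sum(lengths) / len(lengths)
        if avg < fast_max:
            labeled.append((sample["id"], "fast"))
        elif avg > slow_min:
            labeled.append((sample["id"], "slow"))
        # samples averaging between 100 and 200 tokens stay unlabeled
    return labeled

data = [
    {"id": "emoji-q", "response_lengths": [40, 55, 62]},       # easy, short
    {"id": "geometry-q", "response_lengths": [310, 280, 450]}, # hard, long
    {"id": "mid-q", "response_lengths": [150, 160]},           # ambiguous
]
print(label_thinking_modes(data))
# → [('emoji-q', 'fast'), ('geometry-q', 'slow')]
```

These labels then supply the mode prefixes used in the prefix-guided half of the GRPO rollouts in stage 2.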

Extensive experiments on six multimodal benchmarks demonstrate DualMindVLM's effectiveness. As reported in Table 1, it outperforms the base model Qwen2.5-VL by up to 7.4% in accuracy on MathVista and reduces average response length across all benchmarks. Compared to leading reasoning models like VL-Rethinker and OpenVLThinker, DualMindVLM achieves competitive or superior accuracy while using 40% fewer tokens on average. Figure 7 highlights that it saves up to 60% in tokens compared to a GRPO model without dual-mode training. Additionally, the model shows balanced thinking-mode selection, favoring slow thinking for math tasks and fast thinking for perceptual ones, as seen in Figure 8, and it reduces hallucination risks, outperforming other models on the HumbleBench benchmark.

The implications of this research are significant for deploying AI in resource-constrained environments. By mimicking human-like cognitive efficiency, DualMindVLM can lower computational costs and energy usage, making AI more accessible for applications like educational tools, customer service, or scientific analysis. The ability to adapt reasoning depth to task complexity could also improve user experience, providing quicker responses for simple queries without compromising detailed explanations for complex problems. This approach aligns with broader efforts to develop sustainable, practical AI systems that balance performance with efficiency.

However, the study acknowledges limitations. The thinking-mode auto-labeling strategy may introduce biases, as seen in a failure case where the model incorrectly selects fast thinking for a chart-based question that requires slow reasoning, potentially because the training data links chart tasks to fast thinking. The labeling thresholds are also somewhat arbitrary, and varying them can affect model behavior, though experiments show consistent performance across different settings. Future work could explore more nuanced labeling strategies or address scalability to larger datasets, as increasing data scale does not always yield better results for simpler tasks, as indicated in Figure 11.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.


Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
