Artificial intelligence systems that learn from their own outputs can become trapped in cycles of self-reinforcing simplicity, according to new research from Fudan University. This phenomenon, termed the "Matthew effect" after the sociological concept describing how advantages accumulate, reveals a fundamental limitation in how AI models improve themselves over time.
The researchers discovered that large vision-language models (LVLMs) undergoing self-improvement tend to generate predominantly simple training data while struggling with complex problems. This creates an imbalanced optimization process where models prioritize skills they already excel at, ultimately hindering their ability to tackle challenging tasks. As iterations progress, this imbalance becomes increasingly pronounced, which the researchers call the "Matthew effect in self-improvement."
To investigate this phenomenon, the team conducted experiments using the Qwen2-VL-7B-Instruct model across multiple iterations. They tracked how the model generated and learned from its own responses, analyzing both the difficulty distribution and length of self-generated data. The results showed that easy problems (level 1 difficulty) comprised 51.1% of self-generated data, while the most challenging problems (level 5) were nearly absent at just 1.5%. Additionally, self-generated responses were significantly shorter, averaging 277 tokens compared to 395 tokens in original datasets.
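To make the two measurements concrete, here is a minimal Python sketch of how a difficulty distribution and a relative length gap could be computed. The function names and the toy list of levels are hypothetical, and the article does not say how the researchers assigned difficulty levels, so this only illustrates the arithmetic, not their pipeline.

```python
from collections import Counter


def difficulty_shares(levels):
    """Return the fraction of samples at each difficulty level."""
    counts = Counter(levels)
    total = len(levels)
    return {level: counts[level] / total for level in sorted(counts)}


def relative_drop(reference, observed):
    """Fractional reduction of `observed` relative to `reference`."""
    return (reference - observed) / reference


# Toy data (hypothetical): one difficulty label per self-generated sample.
toy_levels = [1, 1, 1, 1, 2, 2, 3, 5]
shares = difficulty_shares(toy_levels)

# Average lengths reported in the article: 395 tokens (original data)
# versus 277 tokens (self-generated data), a drop of roughly 30%.
length_gap = relative_drop(395, 277)
```

Applied to the reported averages, the gap works out to roughly 30% fewer tokens per self-generated response.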
The data revealed clear patterns of performance degradation. While models showed early rapid improvement, they consistently reached performance plateaus and even declined in later iterations. For the most difficult problems, response length decreased dramatically by 56.5% by the fifth iteration, indicating the model was producing abbreviated reasoning processes or delivering conclusions directly without proper chain-of-thought.
To counteract this effect, the researchers developed a four-pronged approach called "head-tail re-balancing." The methodology includes distribution-reshaping strategies like threshold clipping (which limits the number of responses per query) and repeat-based padding (which ensures all difficulty levels appear with equal frequency). It also incorporates trajectory-resampling techniques including adaptive-weighted resampling (which dynamically adjusts sampling weights based on failure rates) and guided resampling (which initializes exploration from intermediate reasoning steps).
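As a rough illustration of three of these ideas, the Python sketch below follows the article's one-line descriptions. The function names, the dictionary fields (`query_id`, `level`), and the proportional budget rule are assumptions made for this example and should not be read as the researchers' implementation. Guided resampling is left out because it requires a model to continue from intermediate reasoning steps.

```python
import random
from collections import defaultdict


def threshold_clip(samples, max_per_query, seed=0):
    """Keep at most `max_per_query` responses for each query."""
    rng = random.Random(seed)
    by_query = defaultdict(list)
    for sample in samples:
        by_query[sample["query_id"]].append(sample)
    clipped = []
    for group in by_query.values():
        if len(group) > max_per_query:
            group = rng.sample(group, max_per_query)
        clipped.extend(group)
    return clipped


def repeat_pad(samples, seed=0):
    """Repeat samples so every difficulty level is equally frequent."""
    if not samples:
        return []
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for sample in samples:
        by_level[sample["level"]].append(sample)
    target = max(len(group) for group in by_level.values())
    padded = []
    for group in by_level.values():
        padded.extend(group)
        # Draw repeats with replacement until the level reaches the target.
        padded.extend(rng.choices(group, k=target - len(group)))
    return padded


def adaptive_budget(fail_rates, total_budget, floor=1):
    """Give queries with higher failure rates more resampling attempts."""
    total = sum(fail_rates.values())
    if total == 0:
        return {query: floor for query in fail_rates}
    return {
        query: floor + int(total_budget * rate / total)
        for query, rate in fail_rates.items()
    }


# Toy data (hypothetical): an easy query answered five times, a hard one once.
data = [{"query_id": "q1", "level": 1, "response": f"r{i}"} for i in range(5)]
data.append({"query_id": "q2", "level": 5, "response": "r5"})

clipped = threshold_clip(data, max_per_query=2)
padded = repeat_pad(data)
budget = adaptive_budget({"q1": 0.0, "q2": 1.0}, total_budget=8)
```

In this toy run, clipping trims the easy query from five responses to two, padding repeats the single hard sample until level 5 matches level 1, and the budget rule sends nearly all extra attempts to the query the model keeps failing.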
Experimental results demonstrated that these re-balancing strategies consistently improved model capabilities across multiple datasets. The Qwen2-VL-7B-Instruct model showed average performance improvements of 3.86 points compared to vanilla self-improvement approaches. The guided resampling method proved particularly effective, achieving performance gains of up to 43.94 points on certain tasks while requiring only minimal additional computational cost.
This research matters because self-improvement has emerged as a mainstream paradigm for advancing AI capabilities, potentially eliminating reliance on manual annotations and promoting better alignment with real-world scenarios. However, the identified Matthew effect represents a significant bottleneck that could limit how far AI systems can advance through self-learning alone. The findings suggest that simply scaling up computational resources or increasing the number of sampled responses does not address this fundamental challenge.
The study acknowledges several limitations. The researchers note that their methods, while effective, still face efficiency challenges in navigating vast solution spaces. Additionally, the work primarily focuses on mathematical reasoning tasks, leaving open questions about whether similar effects occur in other domains. The team also observed that the best performance often occurs at an intermediate iteration rather than the final one, reflecting persistent training bottlenecks.
Future work will explore counteracting the Matthew effect across larger models and broader datasets, alongside developing more efficient re-balancing strategies. The researchers emphasize that understanding and addressing these self-improvement limitations is crucial for developing AI systems that can genuinely advance their capabilities across the full spectrum of problem complexity.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.