A mathematical function used in virtually every AI system has been proven to be twice as stable as previously thought. This discovery affects everything from ChatGPT to self-driving cars, providing stronger guarantees about how these systems will behave when faced with unexpected inputs.
The researchers found that the softmax function, which converts raw scores into probabilities in AI systems, satisfies a mathematical property called Lipschitz continuity with a constant of exactly 1/2, and that this holds under every standard distance measure. In plain terms, small changes to the input cause at most half as much change to the output as the looser bound previously relied on in the AI research community allowed.
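The contraction property can be checked numerically. The sketch below is illustrative, not the authors' code: it uses the standard softmax definition, samples random input pairs, and verifies that the output never moves more than half as far as the input in Euclidean distance.

```python
import numpy as np

def softmax(x):
    # Shift by the max for numerical stability; the probabilities are unchanged.
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
max_ratio = 0.0
for _ in range(10_000):
    x = rng.normal(size=8)
    y = x + rng.normal(scale=0.1, size=8)
    # Ratio of output movement to input movement (l2 distances).
    ratio = np.linalg.norm(softmax(x) - softmax(y)) / np.linalg.norm(x - y)
    max_ratio = max(max_ratio, ratio)

print(max_ratio)  # stays at or below the proven constant 0.5
```

No sampled pair ever exceeds the 1/2 ratio, which is exactly what a Lipschitz constant of 1/2 guarantees.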
The team from the Indian Institute of Technology Madras proved this tighter bound analytically. They examined how the softmax function transforms raw scores into probabilities and showed that previous estimates were overly conservative. The method involves analyzing the function's Jacobian matrix and using norm-interpolation techniques to establish the uniform bound across all the standard mathematical distance measures.
As shown in their mathematical proofs, the softmax function maintains this 1/2-Lipschitz property regardless of which distance metric researchers use to measure changes. The team demonstrated that the bound is tight - meaning it cannot be improved further - and validated their findings through extensive testing on real AI systems, including Vision Transformers, GPT-2, and the 8-billion-parameter Qwen3 model. In their experiments with the PIQA dataset, the empirical measurements reached 0.4999, just below the theoretical maximum of 0.5.
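That near-0.5 empirical reading can be mimicked in miniature (a sketch under simple assumptions; the paper's experiments probed real model activations, not toy vectors): perturb two tied logits in opposite directions, the worst case for softmax, and watch the finite-difference ratio approach 0.5 from below.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def lipschitz_ratio(x, y):
    # Empirical ratio of output change to input change (l2 distances).
    return np.linalg.norm(softmax(x) - softmax(y)) / np.linalg.norm(x - y)

# Worst case: two tied logits, nudged in opposite directions.
x0 = np.zeros(2)
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, lipschitz_ratio(x0, x0 + np.array([h, -h])))
# The printed ratios climb toward, but never exceed, the tight constant 0.5.
```

Shrinking the perturbation drives the ratio arbitrarily close to 0.5, which is why measurements like the paper's 0.4999 are consistent with the bound being tight.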
This sharper mathematical understanding has immediate practical implications. For AI systems that use attention mechanisms, like those in language models and computer vision systems, it means researchers can provide stronger guarantees about system stability and robustness. In reinforcement learning, where AI agents learn through trial and error, the improved bound helps ensure more stable training. The researchers showed how their result directly improves existing analysis of scaled cosine similarity attention and provides better conditions for convergence in game-theoretic applications.
The work does have limitations - the analysis focuses specifically on the softmax function itself and doesn't address how this improved bound interacts with other components in complex AI systems. Additionally, while the bound is proven mathematically, its practical impact depends on how researchers incorporate this knowledge into their system designs and analysis methods.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.