AI's Hidden Number Code Reveals Its Flaws

Large language models like ChatGPT often stumble on simple math, but a new study uncovers why: they use a universal, wavy pattern to represent numbers internally, which can lead to predictable errors. This discovery not only explains why AI sometimes fails at arithmetic but also offers a way to track and fix these mistakes, making models more reliable for tasks from data analysis to everyday calculations.

Researchers found that diverse AI models learn nearly identical representations for numbers, characterized by sinusoidal, wave-like structures in their internal layers. This universality means that numbers are processed similarly across different models, regardless of their architecture or training data. For instance, probes trained on these representations could identify numbers in various contexts, such as mathematical problems or natural-language scenarios, with over 70% accuracy in many cases.
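To make the idea of a wave-like number representation concrete, here is a minimal toy sketch in Python. It is not the paper's code and uses no real model: the function names and the periods (2, 5, 10, 100) are assumptions chosen for illustration. Each number is mapped to cosine and sine values at a few periods, and a Fourier transform taken along the number axis then shows which periods carry the most energy.

```python
import numpy as np

def sinusoidal_embedding(numbers, periods=(2, 5, 10, 100)):
    """Toy embedding: map each number n to cos/sin features at several periods.

    The periods are hypothetical, picked only to illustrate the idea.
    """
    n = np.asarray(numbers, dtype=float)[:, None]      # shape (N, 1)
    p = np.asarray(periods, dtype=float)[None, :]      # shape (1, P)
    angles = 2 * np.pi * n / p
    return np.concatenate([np.cos(angles), np.sin(angles)], axis=1)  # (N, 2P)

def dominant_periods(embeddings, top_k=4):
    """Return the top_k periods with the most spectral energy.

    Assumes the rows correspond to consecutive integers 0, 1, 2, ...
    """
    centered = embeddings - embeddings.mean(axis=0)
    # Power spectrum along the number axis, one column per embedding dimension.
    power = np.abs(np.fft.rfft(centered, axis=0)) ** 2
    # Drop the zero-frequency bin, then sum energy over all dimensions.
    energy = power[1:].sum(axis=1)
    freqs = np.fft.rfftfreq(embeddings.shape[0])[1:]   # cycles per unit step
    top = np.argsort(energy)[::-1][:top_k]
    return sorted(float(1.0 / freqs[i]) for i in top)

if __name__ == "__main__":
    emb = sinusoidal_embedding(range(1000))
    print(dominant_periods(emb))
```

On this synthetic embedding the spectrum should recover the same periods used to build it. Applying the same analysis to a real model would require extracting its number-token embeddings first, which this sketch does not attempt.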

The methodology involved training specialized probes on the internal activations of models such as Llama and OLMo. These probes decoded number representations from different layers, revealing that numbers maintain consistent sinusoidal patterns throughout the model's computations. Key techniques included Representational Similarity Analysis (RSA) and Fourier transforms, which quantified how closely number embeddings align across models. For example, Figure 2 in the paper shows that the top Fourier frequencies for number tokens agree perfectly across models, indicating shared structural properties.
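The RSA comparison described above can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the study's implementation: the two "models" are synthetic stand-ins that embed the same sinusoidal structure through different random orthogonal maps, the helper names are hypothetical, and Pearson correlation is used for simplicity where RSA studies often use a rank correlation.

```python
import numpy as np

def dissimilarity_matrix(reps):
    """Pairwise (1 - cosine similarity) between the rows of reps."""
    unit = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

def rsa_score(reps_a, reps_b):
    """Pearson correlation between the upper triangles of the two
    dissimilarity matrices (a simplification chosen for this sketch)."""
    iu = np.triu_indices(reps_a.shape[0], k=1)
    da = dissimilarity_matrix(reps_a)[iu]
    db = dissimilarity_matrix(reps_b)[iu]
    return float(np.corrcoef(da, db)[0, 1])

def random_orthogonal_map(dim_in, dim_out, rng):
    """Random (dim_in, dim_out) matrix with orthonormal rows (dim_out >= dim_in)."""
    q, _ = np.linalg.qr(rng.normal(size=(dim_out, dim_in)))
    return q.T

rng = np.random.default_rng(0)

# Shared wave-like structure for the numbers 0..199 (hypothetical periods).
n = np.arange(200, dtype=float)[:, None]
periods = np.array([2.0, 5.0, 10.0, 100.0])[None, :]
angles = 2 * np.pi * n / periods
shared = np.concatenate([np.cos(angles), np.sin(angles)], axis=1)  # (200, 8)

# Two synthetic "models": same structure, different 32-dimensional bases.
model_a = shared @ random_orthogonal_map(8, 32, rng)
model_b = shared @ random_orthogonal_map(8, 32, rng)

# Baseline with no shared structure at all.
noise = rng.normal(size=(200, 32))

if __name__ == "__main__":
    print("model A vs model B:", rsa_score(model_a, model_b))
    print("model A vs noise:  ", rsa_score(model_a, noise))
```

Because RSA compares the geometry between items rather than individual coordinates, the two synthetic models should score close to 1 even though their raw vectors differ, while the noise baseline should score near 0. That property is what makes this kind of analysis suitable for comparing models with different hidden sizes.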

The results, detailed in Figures 5 and 9 of the paper, show that errors in arithmetic operations such as multiplication and division often stem from specific layers where number representations diverge. In one case, removing the layers responsible for error aggregation reduced division errors by up to 64%. The study also found that models can compute the correct answer internally without that answer surfacing in the final output: in division tasks, 56.8% of correct internal computations were not reflected in the model's predictions.

This research matters because it provides a pathway to improve AI reliability in real-world applications, such as financial calculations or scientific data processing, where numerical accuracy is critical. By understanding and manipulating these internal representations, developers could enhance model robustness without extensive retraining, potentially reducing errors in AI-driven tools used by millions.

Limitations noted in the paper include that the findings are based on synthetic and natural-language probes, and their applicability to all real-world scenarios remains uncertain. The study did not explore how these representations interact with non-numeric data or complex reasoning tasks, leaving room for future research to address broader model interpretability.

About the Author

Guilherme A.