
Transformer Injectivity: How AI Models Keep Prompts Distinct and Why It Matters


AI Research
November 20, 2025
4 min read

In the rapidly evolving world of artificial intelligence, a groundbreaking study from AMD and Silo AI reveals that large language models (LLMs) like LLaMA-3 and Qwen inherently keep input prompts distinct in their internal representations, a property known as injectivity. Under real-analytic assumptions, such as smooth activations like GELU combined with LayerNorm, the research demonstrates that for any finite set of prompts, the mapping to last-token hidden states is generically injective, meaning different prompts almost always produce unique internal states. This finding, detailed in the arXiv preprint 'Robustness, Analytic Margins, and Bi-Lipschitz Uniformity of Sequence-Level Hidden States' by Mikael von Strauss and colleagues, addresses earlier concerns about model invertibility and memorization, offering a fresh perspective on AI robustness and privacy. By defining collision discriminants and injective strata in parameter space, the study shows that either a model is never injective or it is injective for an open, dense set of parameters, and that this property persists through typical training dynamics like gradient descent, assuming non-singular updates and absolutely continuous initializations.
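A toy example helps build intuition for why injectivity is "generic". The sketch below is an illustrative stand-in, not the paper's Transformer setting: it draws random parameters for a small two-layer GELU network and checks that a finite set of distinct inputs lands on distinct outputs, which happens for almost every parameter draw because the collision set has measure zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # Smooth (real-analytic) tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Toy stand-in for a Transformer block: a two-layer GELU network
# with randomly drawn (hence "generic") parameters.
d_in, d_hidden = 8, 16
W1 = rng.normal(size=(d_hidden, d_in))
W2 = rng.normal(size=(d_in, d_hidden))

def hidden_state(x):
    return W2 @ gelu(W1 @ x)

# A finite set of distinct "prompts" (random input vectors).
prompts = rng.normal(size=(100, d_in))
states = np.stack([hidden_state(p) for p in prompts])

# No collisions: the minimum pairwise distance between states is
# strictly positive for almost every parameter draw.
diff = states[:, None, :] - states[None, :, :]
dists = np.sqrt((diff ** 2).sum(-1))
min_dist = dists[~np.eye(len(states), dtype=bool)].min()
print(min_dist > 0.0)  # prints True
```

The same check, run over many independent parameter draws, never finds a collision in practice, which is exactly what "injective for an open, dense set of parameters" predicts.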

The methodology combines rigorous theoretical proofs with extensive empirical validation, focusing on decoder-only Transformers. Analytically, the team establishes that for each layer, the set of parameters causing collisions between prompts has Lebesgue measure zero, making injectivity a generic trait. They introduce layerwise geometric diagnostics, including a separation margin (the minimum Euclidean distance between representations of distinct prompts) and a co-Lipschitz constant, which measures how much the representation changes per unit of Hamming distance in prompt space. These metrics are estimated using nearest-neighbor statistics on large, diverse prompt sets derived from sources like IMDB reviews and C4 data, with experiments conducted on AMD Instinct MI300X GPUs using PyTorch. The empirical analysis spans models from the LLaMA-3 and Qwen families, ranging from 0.5B to 8B parameters, and examines effects across layers, sequence lengths, and post-hoc quantization levels, ensuring robust, scalable insights into model behavior.
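In code, the two diagnostics reduce to simple minima over prompt pairs. The sketch below is a minimal illustration on made-up data: mean-pooled random token embeddings stand in for last-token hidden states, which in practice would come from an actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: token-id prompts of equal length, plus a crude
# stand-in for last-token hidden states (mean-pooled random token
# embeddings).  Real diagnostics would use states from an actual LLM.
n_prompts, seq_len, vocab, d_model = 200, 16, 1000, 32
prompts = rng.integers(0, vocab, size=(n_prompts, seq_len))
token_emb = rng.normal(size=(vocab, d_model))
states = token_emb[prompts].mean(axis=1)

# Separation margin: minimum Euclidean distance between states of
# distinct prompts.  Co-Lipschitz estimate: minimum ratio of state
# distance to Hamming distance in token space.
sep_margin = np.inf
co_lipschitz = np.inf
for i in range(n_prompts):
    for j in range(i + 1, n_prompts):
        ham = int((prompts[i] != prompts[j]).sum())
        if ham == 0:
            continue  # identical prompts carry no information
        dist = float(np.linalg.norm(states[i] - states[j]))
        sep_margin = min(sep_margin, dist)
        co_lipschitz = min(co_lipschitz, dist / ham)

# Normalizing by the typical state norm removes depth-driven norm
# inflation, mirroring the paper's normalized margins.
norm_margin = sep_margin / np.linalg.norm(states, axis=1).mean()
print(f"margin={sep_margin:.3f}  co-Lipschitz={co_lipschitz:.4f}  "
      f"normalized margin={norm_margin:.3f}")
```

For large prompt sets the exhaustive double loop would be replaced by nearest-neighbor search, as the study does; the quantities computed are the same.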

Findings from the study highlight that in full precision and under 8-bit activation quantization, no exact collisions were observed on sampled prompt sets, affirming generic injectivity. However, aggressive 4-bit quantization introduced a modest number of collisions, particularly in deeper layers, due to the non-injective nature of the quantizer itself. The researchers found that raw separation margins and co-Lipschitz estimates grow with depth, driven by norm inflation, but their normalized counterparts remain stable, indicating intrinsic geometric consistency. For instance, in LLaMA-3 models, normalized margins clustered around 0.2 with co-Lipschitz constants near 0.002, showing minimal drift during training in smaller models like GPT-2. Longer sequence lengths led to reduced co-Lipschitz constants, suggesting more contractive behavior in hard prompt pairs, while architectural families exhibited distinct geometric fingerprints: LLaMA models were more expansive than Qwen, underscoring how training recipes influence invertibility.
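The collision mechanism is easy to reproduce numerically: a uniform quantizer is many-to-one, so nearby states can map to the same code. A minimal sketch, with random low-dimensional vectors standing in for hidden states and a plain symmetric quantizer (not necessarily the paper's scheme):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random low-dimensional vectors stand in for hidden states; low
# dimension makes quantization collisions easy to observe.
states = rng.normal(size=(500, 4))

def quantize_codes(x, bits):
    # Uniform symmetric quantizer: returns integer codes, so two
    # distinct states that round to the same code have "collided".
    max_code = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / max_code
    return np.round(x / scale).astype(int)

def count_collisions(bits):
    codes = quantize_codes(states, bits)
    unique = {tuple(row) for row in codes}
    return len(states) - len(unique)

for bits in (8, 4):
    print(f"{bits}-bit collisions: {count_collisions(bits)}")
```

At 8 bits the code grid is fine enough that collisions essentially never occur, while at 4 bits many distinct vectors share a code, mirroring the study's full-precision/8-bit versus 4-bit contrast.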

The implications of this work are profound for AI safety, interpretability, and deployment. By providing simple geometric diagnostics, it enables practitioners to assess how close a model is to losing injectivity under perturbations like quantization, potentially guiding model selection and optimization for privacy-sensitive applications. For example, the robust injectivity radius, defined as half the separation margin, offers a practical threshold for determining safe quantization levels, with the study showing that 4-bit quantization can breach this boundary in deeper layers. This ties into broader concerns about data memorization and inversion attacks, since injective representations in principle retain enough information to recover the original prompt. Moreover, the persistence of injectivity under smooth training trajectories suggests that standard optimization preserves this property, reinforcing the reliability of modern LLMs in real-world scenarios where parameter adjustments are common.
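The radius check is straightforward to operationalize: compare the worst-case quantization error of the states against half the separation margin. Everything below is an illustrative sketch with random stand-in states and a generic uniform quantizer, not the study's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(3)

states = rng.normal(size=(500, 8))  # stand-in hidden states

def separation_margin(x):
    diff = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1))
    return dists[~np.eye(len(x), dtype=bool)].min()

# Robust injectivity radius: half the separation margin.  Any
# perturbation smaller than this cannot merge two distinct states.
radius = 0.5 * separation_margin(states)

def max_quant_error(x, bits):
    # Worst-case Euclidean error of a uniform symmetric quantizer.
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    err = x - np.round(x / scale) * scale
    return np.linalg.norm(err, axis=1).max()

for bits in (8, 4):
    err = max_quant_error(states, bits)
    print(f"{bits}-bit: error={err:.3f}, radius={radius:.3f}, "
          f"within radius: {err < radius}")
```

When the reported error stays below the radius, quantization provably cannot collapse any two of the sampled states; once it exceeds the radius, collisions become possible, which is the practical reading of the study's 4-bit results.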

Despite its strengths, the study has limitations, primarily its reliance on real-analytic assumptions that may not fully capture non-smooth elements like ReLU activations or hard clipping during training. The empirical results are based on finite prompt samples, not the full combinatorial space, and the diagnostics, while insightful, do not directly correlate with the success of inversion attacks or reconstruction algorithms. Future work could extend these geometric measures to other architectures, modalities, and training regimes, exploring their utility in adversarial settings or for enhancing model transparency. Overall, this research bridges theory and practice, offering a unified view of Transformers as generically injective systems whose practical invertibility can be probed with accessible tools, paving the way for more robust and interpretable AI systems.

Reference: von Strauss, M., et al. (2025). arXiv preprint.


About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
