When multiple AI systems work together, they can develop collective biases that don't exist when they operate alone. This finding, from researchers at City St George's, University of London and other European institutions, shows that the number of AI agents in a system fundamentally changes how they behave, with important implications for how we deploy AI in finance, defense, and social media.
The researchers found that groups of large language models (LLMs) coordinating on simple tasks can amplify existing biases, create new biases where none existed before, or even override the individual preferences of the AI models themselves. This collective misalignment emerges specifically from interactions between multiple AI agents, not from any single agent's programming.
To study this phenomenon, the team created a coordination game in which AI agents had to converge on one word from pairs such as {man, woman}, {straight, gay}, and {American, Mexican}. The agents interacted in pairs, trying to maximize their scores by choosing the same word as their partner. Each agent maintained a memory of recent interactions, and the researchers tested four LLMs: QwQ-32B, Phi-4, GPT-4o, and Meta Llama.
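To make the setup concrete, here is a minimal sketch of a pairwise coordination game with bounded memory. It is not the researchers' code: the LLM is replaced by a hypothetical stand-in rule (copy the word seen most often in memory, otherwise flip a biased coin), and the names `choose`, `run_game`, `bias`, and `memory_len`, along with the one-point payoff, are assumptions made for illustration.

```python
import random
from collections import deque


def choose(memory, options, bias, rng):
    """Stand-in for an LLM agent (assumption): copy the word seen most
    often in memory; on a tie or an empty memory, flip a biased coin."""
    first, second = options
    n_first = sum(1 for w in memory if w == first)
    n_second = sum(1 for w in memory if w == second)
    if n_first != n_second:
        return first if n_first > n_second else second
    return first if rng.random() < bias else second


def run_game(n_agents=10, rounds=2000, memory_len=5, bias=0.5,
             options=("man", "woman"), seed=0):
    """Play random pairwise rounds; return each agent's final choice and score."""
    rng = random.Random(seed)
    # Each agent remembers only the partner words from its last few rounds.
    memories = [deque(maxlen=memory_len) for _ in range(n_agents)]
    scores = [0] * n_agents
    for _ in range(rounds):
        i, j = rng.sample(range(n_agents), 2)  # pick two distinct agents
        word_i = choose(memories[i], options, bias, rng)
        word_j = choose(memories[j], options, bias, rng)
        if word_i == word_j:  # hypothetical payoff: one point each on a match
            scores[i] += 1
            scores[j] += 1
        memories[i].append(word_j)
        memories[j].append(word_i)
    final = [choose(m, options, bias, rng) for m in memories]
    return final, scores


if __name__ == "__main__":
    final, scores = run_game()
    print(final)
    print(scores)
```

Setting `bias` to 0.5 models agents with no individual preference, so any convention the population settles on in this toy model comes from the interactions rather than from the agents themselves.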
The results showed three distinct patterns of collective bias. In some cases, groups amplified individual biases: if individual agents slightly preferred one word, the group would strongly favor it. More surprisingly, groups could induce biases where individual agents showed no preference at all. Most strikingly, groups sometimes reversed individual preferences, consistently choosing the option that individual agents actually disfavored.
Group size proved crucial to these effects. As populations grew from 2 to 100 agents, the researchers observed a transition from stochastic, unpredictable outcomes to deterministic behavior where larger groups almost always converged on the same convention. The threshold where this determinism emerged varied significantly across different AI models and word pairs, ranging from as few as 6 agents to over 100.
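One way to probe this kind of size dependence is to repeat the game many times at each population size and record how often the runs end on the same convention. The sketch below does that with a toy majority-memory agent, not an LLM, so whether and where a transition appears here says nothing about the thresholds reported in the study. The function names, the labels "A" and "B", and all parameter values are assumptions.

```python
import random
from collections import Counter, deque


def run_once(n_agents, rng, bias=0.55, memory_len=5, rounds_per_agent=50):
    """One run of a toy two-word coordination game; returns the winning word."""
    memories = [deque(maxlen=memory_len) for _ in range(n_agents)]

    def pick(mem):
        # Toy rule (assumption): follow the majority in memory, else a biased coin.
        a, b = mem.count("A"), mem.count("B")
        if a != b:
            return "A" if a > b else "B"
        return "A" if rng.random() < bias else "B"

    for _ in range(rounds_per_agent * n_agents):
        i, j = rng.sample(range(n_agents), 2)
        word_i, word_j = pick(memories[i]), pick(memories[j])
        memories[i].append(word_j)
        memories[j].append(word_i)

    # The winner is the word that fills most memory slots at the end.
    tally = Counter(w for mem in memories for w in mem)
    return "A" if tally["A"] >= tally["B"] else "B"


def dominance_by_size(sizes, runs=20, seed=0):
    """For each population size, the share of runs won by the most common
    winner: 0.5 means a coin toss between conventions, 1.0 means every run
    ended on the same word."""
    rng = random.Random(seed)
    result = {}
    for n in sizes:
        wins = Counter(run_once(n, rng) for _ in range(runs))
        result[n] = max(wins.values()) / runs
    return result


if __name__ == "__main__":
    for n, share in dominance_by_size([2, 6, 20, 50]).items():
        print(n, share)
```

A value near 0.5 corresponds to the stochastic regime described above, while a value near 1.0 corresponds to the deterministic one.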
These findings matter because multi-agent AI systems are rapidly expanding into critical domains. The market for such systems is projected to grow from $5.1 billion in 2024 to $47.1 billion by 2030, with applications in finance, defense, energy management, and social media. The International AI Safety Report 2025 has already warned about risks including systemic failures and misaligned collective behaviors in multi-agent systems.
The research highlights a key limitation in current AI testing practices, which typically evaluate single agents rather than groups. Since collective biases only emerge at specific population sizes, standard safety testing might miss critical risks that appear only when AI systems are deployed at scale. The team's analytical framework shows that above a critical group size, systems converge to deterministic predictions that expose basins of attraction for competing equilibria.
While the study focused on simple coordination games, the researchers suggest that similar nonlinear, scale-dependent effects likely appear in other interaction-driven phenomena, including bias amplification, collusion, deception, and cooperation. Understanding group effects across different tasks and domains will take broader research, and that understanding is crucial for developing reliable frameworks to predict and control complex AI behaviors in large-scale deployments.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.