Artificial intelligence systems that can understand human social dynamics are essential for future collaboration between people and machines, but a new study reveals a fundamental weakness: current AI cannot reliably detect deception in complex group settings. Researchers from the University of Tokyo have developed a rigorous test showing that even the most advanced multimodal large language models (MLLMs) struggle to distinguish truth from falsehood in multi-party social interactions. This finding exposes a critical limitation in AI's ability to function as a competent social agent, with implications for everything from conversational assistants to content moderation platforms.
The researchers discovered that state-of-the-art AI models, including GPT-4o and other leading systems, perform poorly at assessing deception in social situations. In their comprehensive benchmark evaluating 12 different MLLMs, the best-performing model achieved only 74% overall accuracy on the deception assessment task, and when looking specifically at true/false statements, accuracy dropped dramatically to just 39.4%. The models showed a consistent pattern of defaulting to neutral judgments rather than making decisive calls about deception, revealing an overly conservative approach that avoids high-stakes decisions. This performance gap demonstrates that while current AI excels at processing information, it lacks the social reasoning capabilities needed to navigate complex human interactions.
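To see why those two numbers can diverge so sharply, it helps to look at how each would be computed. The sketch below is a minimal illustration, assuming a three-label setup (truthful, deceptive, neutral); it is not the paper's evaluation code, but it shows how a model that retreats to "neutral" can look reasonable overall while failing on the statements that actually have a true/false answer.

```python
# Minimal sketch (assumed metric definitions, not the paper's evaluation code):
# "overall accuracy" scores every utterance, while "binary accuracy" only
# scores utterances whose ground truth is truthful or deceptive.

def overall_accuracy(preds, labels):
    """Fraction of all utterances labeled correctly (any class)."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def binary_accuracy(preds, labels, binary_classes=("truthful", "deceptive")):
    """Accuracy restricted to utterances with a definite true/false ground truth.
    A model that answers 'neutral' on these counts as wrong, which is how an
    overly conservative model can score well overall but poorly here."""
    pairs = [(p, l) for p, l in zip(preds, labels) if l in binary_classes]
    return sum(p == l for p, l in pairs) / len(pairs)

# Example: a model that hedges with "neutral" on the hard cases
labels = ["neutral", "truthful", "deceptive", "neutral", "deceptive"]
preds  = ["neutral", "neutral",  "neutral",   "neutral", "deceptive"]
print(overall_accuracy(preds, labels))  # 0.6 -- looks respectable
print(binary_accuracy(preds, labels))   # ~0.33 -- collapses on true/false calls
```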
To create this test, the researchers turned to the social deduction game Werewolf, where players must conceal their secret roles and use deceptive communication to mislead others. This game provides a controlled environment that elicits natural, high-stakes deception while offering objective, verifiable ground truth for every statement made. The team built a novel dataset called MIDA (Multimodal Interactive Deception Assessment) containing 2,360 annotated utterances with synchronized video and text, labeled through a semi-automated pipeline that ensured accuracy. The dataset includes two subsets: MIDA-Ego4D, with 819 utterances from third-person recordings, and MIDA-YouTube, with 1,541 utterances from online game videos, capturing both novice and expert player behavior.
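To make the dataset concrete, the sketch below shows what a single annotated utterance might look like. The field names are illustrative assumptions rather than MIDA's actual schema; the key point is that every statement carries synchronized video, the speaker's hidden role, and an objectively verifiable deception label.

```python
# Illustrative sketch of one annotated MIDA utterance.
# Field names are assumptions for exposition, not the dataset's real schema.
from dataclasses import dataclass

@dataclass
class MidaUtterance:
    subset: str        # "MIDA-Ego4D" or "MIDA-YouTube"
    game_id: str       # which Werewolf game the utterance comes from
    speaker: str       # player who made the statement
    speaker_role: str  # hidden role, e.g. "werewolf" or "villager"
    text: str          # transcribed utterance
    video_clip: str    # path to the synchronized video segment
    strategy: str      # persuasive strategy, e.g. "Identity Declaration"
    label: str         # ground-truth deception label

example = MidaUtterance(
    subset="MIDA-YouTube",
    game_id="game_042",
    speaker="Player 3",
    speaker_role="werewolf",
    text="I'm the seer, and I checked Player 5 last night.",
    video_clip="clips/game_042/turn_17.mp4",
    strategy="Identity Declaration",
    label="deceptive",  # verifiable against the game's hidden role assignments
)
```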
The results show consistent failure patterns across all tested models. When analyzing performance across different persuasive strategies, models achieved high accuracy on neutral categories like No Strategy and Interrogation (where most statements are non-deceptive), but their reasoning collapsed on information-rich categories. For Identity Declaration statements, average accuracy was only 15.7% on one dataset and 25.3% on the other, while Evidence statements remained difficult, with average accuracy below 40%. The researchers identified two core failure modes: first, the models cannot effectively distinguish salient social signals from distracting noise in multimodal inputs; second, they lack a functional "Theory of Mind": the ability to internally model what others know, believe, or intend. This prevents them from inferring the hidden beliefs and strategic intentions behind statements.
The study's implications extend beyond academic research to practical applications where AI must interact with humans. The inability to reliably assess deception means current systems cannot function as trustworthy social partners in settings ranging from customer service to collaborative work environments. The researchers tested whether providing more context would help by conducting temporal ablation studies, finding that removing conversation history caused a catastrophic drop in deception assessment performance: binary accuracy plummeted from 39.4% to 13.4%. This demonstrates that assessing deception is fundamentally a global task requiring contextual reasoning over entire conversations, not just local analysis of individual statements.
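The sketch below illustrates the spirit of that ablation, assuming a simple text-prompt format rather than the authors' actual pipeline: the same target statement is judged either with the full conversation history or with it stripped away.

```python
# Minimal sketch of a temporal ablation (assumed prompt format, not the
# authors' code): judge one target utterance with or without prior turns.

def build_prompt(dialogue, target_idx, include_history=True):
    """Assemble the text context given to the model for one target utterance."""
    target = dialogue[target_idx]
    if include_history:
        history = "\n".join(
            f"{turn['speaker']}: {turn['text']}" for turn in dialogue[:target_idx]
        )
    else:
        history = "(no prior conversation provided)"
    return (
        f"Conversation so far:\n{history}\n\n"
        f"Statement to assess:\n{target['speaker']}: {target['text']}\n\n"
        "Is this statement truthful, deceptive, or neutral?"
    )

dialogue = [
    {"speaker": "Player 1", "text": "Someone was eliminated last night."},
    {"speaker": "Player 2", "text": "I was with Player 4 the whole time."},
    {"speaker": "Player 3", "text": "I'm the seer, and Player 2 is lying."},
]
with_history = build_prompt(dialogue, target_idx=2, include_history=True)
without_history = build_prompt(dialogue, target_idx=2, include_history=False)
# In the study's ablation, stripping the history is what drove binary accuracy
# from 39.4% down to 13.4% for the tested models.
```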
To address these limitations, the researchers proposed two new modules: a Social Chain-of-Thought (SoCoT) reasoning pipeline that forces explicit grounding in visual and acoustic cues, and a Dynamic Social Epistemic Memory (DSEM) module that maintains structured cognitive context about each participant's evolving beliefs. When tested, these modules showed measurable improvements—the DSEM module increased binary accuracy by 2.3 percentage points and macro-F1 score by 3.3 points on one dataset. However, even with these enhancements, absolute performance remained low, indicating substantial room for improvement in future AI development.
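The sketch below illustrates the general idea behind such a memory module, using hypothetical class and method names rather than the authors' implementation: a per-player record of claims and stances that is updated each turn and summarized back into the model's context.

```python
# Illustrative sketch of the idea behind a dynamic social epistemic memory:
# a structured, per-player record of claims and apparent beliefs, updated
# each turn and fed back to the model. Names are assumptions, not the
# authors' implementation.
from collections import defaultdict

class SocialEpistemicMemory:
    def __init__(self):
        self.claims = defaultdict(list)   # player -> list of (turn, claim)
        self.beliefs = defaultdict(dict)  # player -> {other player: stance}

    def update(self, turn, speaker, claim, about=None, stance=None):
        """Record a new claim and, optionally, the stance it implies."""
        self.claims[speaker].append((turn, claim))
        if about is not None and stance is not None:
            self.beliefs[speaker][about] = stance

    def summary(self):
        """Compact textual state a reasoning pipeline can condition on."""
        lines = []
        for player, entries in self.claims.items():
            recent = "; ".join(claim for _, claim in entries[-3:])
            stances = ", ".join(
                f"{other}: {stance}" for other, stance in self.beliefs[player].items()
            ) or "none stated"
            lines.append(f"{player} claims [{recent}] | stances: {stances}")
        return "\n".join(lines)

memory = SocialEpistemicMemory()
memory.update(1, "Player 3", "claims to be the seer")
memory.update(1, "Player 3", "accuses Player 2 of lying",
              about="Player 2", stance="suspects werewolf")
memory.update(2, "Player 2", "denies the accusation",
              about="Player 3", stance="distrusts seer claim")
print(memory.summary())
```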
The study reveals several important limitations in current AI capabilities. Models consistently overestimated uncertainty and under-committed to deception classification, an artifact of conservative alignment objectives that prioritize safety over decisive judgment. None of the models demonstrated robust Theory of Mind capacity, failing to infer what each speaker actually knows or believes. Additionally, simply adding more video frames did not help; in some cases, multi-frame input slightly degraded performance, reinforcing that models struggle to achieve robust visual grounding even with more temporal data. These findings highlight that while MLLMs function as powerful knowledge engines, they fall short as competent social agents capable of navigating the complexities of human deception.