AIResearch

Hassabis: LLMs Cannot Model Physics, World Models Are Next

Hassabis argues language models describe physics but cannot model it, signaling a strategic pivot at DeepMind toward world model architectures for AGI.



The glass shatters when it hits the floor. Any large language model can narrate that scene convincingly. What it cannot do, according to Demis Hassabis, is model the physics that caused it.

In a January 2026 appearance on CNBC's "The Tech Download," the Google DeepMind CEO delivered a pointed diagnosis of where the artificial intelligence field currently stands: LLMs are bounded by the medium they learn from. Text encodes a compressed description of reality, not reality's underlying mechanics. Gravity, friction, spatial geometry, and causal chains that unfold over time sit largely outside what next-token prediction can reliably learn.

The proposed solution is a different class of system: "world models." Rather than predicting plausible text sequences, these would simulate and predict real-world dynamics directly, building internal representations of physical causality that language-only training cannot provide.

The reasoning gap

Crypto Briefing reports that Hassabis identified three specific failure modes: physics, causality, and spatial reasoning. These aren't peripheral use cases. They are foundational to any system that needs to act in the physical world rather than describe it, and they represent exactly the gap between narrow text competence and general reasoning.

Systems like Google's Gemini are genuinely impressive on conventional benchmarks. They process text, images, audio, and video; they pass professional licensing exams and write functional software. But those capabilities rest largely on statistical patterns in language, and language is a compression of reality, not reality itself. Ask such a model to reason about what happens when two objects collide, or to plan a multi-step physical task over a long time horizon, and, in Hassabis's framing, the architecture hits a structural ceiling.
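The describe-versus-model distinction can be made concrete with the collision example. The sketch below is a hand-coded physics model, not a world model and not anything DeepMind has described: it assumes an idealized one-dimensional, perfectly elastic collision between two point masses and returns numbers (post-collision velocities) that a planning system could act on. The function name `elastic_collision_1d` is hypothetical. A learned world model would have to infer dynamics like these from observation rather than have them written in by hand.

```python
# Hypothetical illustration: a hand-coded model of a 1D perfectly elastic
# collision between two point masses (no friction, no energy loss).
# A learned world model would need to infer such dynamics from data.


def elastic_collision_1d(m1: float, v1: float, m2: float, v2: float) -> tuple[float, float]:
    """Return post-collision velocities (v1_after, v2_after).

    Derived from conservation of momentum and of kinetic energy.
    """
    total_mass = m1 + m2
    v1_after = ((m1 - m2) * v1 + 2 * m2 * v2) / total_mass
    v2_after = ((m2 - m1) * v2 + 2 * m1 * v1) / total_mass
    return v1_after, v2_after


if __name__ == "__main__":
    m1, v1, m2, v2 = 1.0, 3.0, 2.0, 0.0
    u1, u2 = elastic_collision_1d(m1, v1, m2, v2)
    # Sanity checks: momentum and kinetic energy should both be conserved.
    print("velocities after:", u1, u2)
    print("momentum before/after:", m1 * v1 + m2 * v2, m1 * u1 + m2 * u2)
    print("energy before/after:",
          0.5 * m1 * v1**2 + 0.5 * m2 * v2**2,
          0.5 * m1 * u1**2 + 0.5 * m2 * u2**2)
```

For example, a 1 kg object moving at 3 m/s that strikes a stationary 2 kg object rebounds at -1 m/s while the heavier object moves off at 2 m/s; with equal masses, the two objects simply swap velocities. A text model can describe that outcome; a model of the physics computes it.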

The argument also carries strategic weight. When the CEO of one of the field's most influential labs publicly frames LLMs as an incomplete path to general intelligence, that framing is likely to shape where DeepMind directs its research resources and engineering effort next.

The world model bet

World models are not a new concept; they have appeared in AI research literature for decades. Researchers have explored predictive internal models in model-based reinforcement learning and in architectures such as Yann LeCun's joint-embedding predictive architectures (JEPA). Hassabis is not inventing the idea; he is elevating it to the primary path toward general reasoning rather than an auxiliary technique.

Such a system would maintain a continuously updated internal simulation of physical and causal relationships. It would allow an agent to mentally rehearse an action before executing it, compare the predicted outcome against what actually happens, and revise its model accordingly. Structurally, this is closer to how human cognition handles novel physical scenarios than anything current transformer architectures do natively.
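The rehearse, compare, and revise loop can be sketched in a few lines. Everything here is a hypothetical toy under stated assumptions, not DeepMind's architecture: the "world" is a single object whose position moves by a hidden gain times the chosen action (`TRUE_GAIN`), and the agent's model (`ToyWorldModel`) has one learnable parameter that it corrects with a gradient step on squared prediction error. All names are illustrative. Real world models would learn high-dimensional dynamics from rich sensory data.

```python
# Hypothetical toy example: a one-parameter "world model" that learns how
# strongly actions move an object, then uses that model to plan.

TRUE_GAIN = 0.5  # hidden dynamics of the toy environment (unknown to the agent)


def environment_step(x: float, action: float) -> float:
    """The 'real world': the agent only sees the resulting position."""
    return x + TRUE_GAIN * action


class ToyWorldModel:
    """Predicts next position as x + gain * action, with a learnable gain."""

    def __init__(self, gain: float = 1.0, learning_rate: float = 0.1) -> None:
        self.gain = gain
        self.learning_rate = learning_rate

    def predict(self, x: float, action: float) -> float:
        return x + self.gain * action

    def update(self, x: float, action: float, observed: float) -> float:
        """Compare prediction with reality, take one gradient step on the
        squared error, and return the (pre-update) prediction error."""
        error = observed - self.predict(x, action)
        self.gain += self.learning_rate * error * action
        return error


def rehearse(model: ToyWorldModel, x: float, target: float,
             candidates: list[float]) -> float:
    """Mentally try each candidate action in the model; keep the one whose
    predicted outcome lands closest to the target."""
    return min(candidates, key=lambda a: abs(model.predict(x, a) - target))


def run(steps: int = 20, target: float = 5.0) -> tuple[ToyWorldModel, float, list[float]]:
    model = ToyWorldModel()
    candidates = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
    x = 0.0
    errors = []
    for _ in range(steps):
        action = rehearse(model, x, target, candidates)    # 1. rehearse in the model
        observed = environment_step(x, action)             # 2. act in the world
        errors.append(model.update(x, action, observed))   # 3. compare and revise
        x = observed
    return model, x, errors


if __name__ == "__main__":
    model, final_x, errors = run()
    print(f"final position: {final_x}, learned gain: {model.gain:.3f}")
    print("prediction errors:", [round(e, 3) for e in errors[:6]])
```

In this toy run the object reaches the target of 5 in five steps while the per-step prediction error shrinks from -1.0 toward zero, and the learned gain ends near 0.54 rather than exactly 0.5. Once the model predicts the target has been reached, it picks the zero action, which carries no information about the gain, so learning stops: an agent only learns about the dynamics its actions actually exercise.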

The challenge is substantial. Building world models at scale remains an open research problem. Simulation fidelity, data requirements, and the combinatorial complexity of real environments make this significantly harder than training on text. DeepMind has not announced a specific product roadmap, and Hassabis's remarks are better read as a directional thesis than a near-term delivery commitment.
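A back-of-the-envelope count suggests why the combinatorics bite. Assume, purely for illustration, a scene of n objects, each of which can be in one of k distinguishable configurations (position, orientation, contact state, and so on). These numbers are hypothetical, not figures from DeepMind.

```latex
|S| = k^{n}, \qquad k = 100,\; n = 10 \;\Rightarrow\; |S| = 100^{10} = 10^{20}
```

Even this coarse, discretized toy scene has on the order of 10^20 joint states, far too many to enumerate, before continuous motion or time is even considered.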

Context and competition

The broader model landscape makes the contrast stark. The pace of frontier releases in 2026 has been relentless: trackers like Price Per Token and LLM Stats document new competitive systems appearing almost weekly. According to AI Release Tracker, the field has logged over 150 frontier models since late 2022. Most iterate on the same transformer-based, language-centric paradigm Hassabis is critiquing.

Anthropic, meanwhile, is making a different kind of bet on frontier capability. The company has begun limited testing of a model called Mythos, which PBS NewsHour reports is considered too powerful for broad release, given its ability to identify exploitable software vulnerabilities at a level that could be weaponized. That is a capabilities-forward story, not an architectural one, and it illustrates how divergent the strategic bets across major labs currently are.

Hassabis is not dismissing LLMs. He is arguing they represent an incomplete foundation, and that reaching general-purpose artificial intelligence will require systems that can simulate causality and physics in ways text training cannot replicate. The question the field has not answered is what data regime or training methodology gets you there without requiring implausible quantities of real-world physical interaction.

Frequently asked questions

Q: What are world models in AI?
A: World models are AI systems designed to simulate and predict physical and causal dynamics directly, rather than pattern-matching over text. The goal is an internal representation of how the world works that a model can update from experience and use to reason about novel situations.

Q: Why do LLMs fail at physics and causality?
A: LLMs learn from language, which describes physical events at a high level of abstraction. The numerical, spatial, and causal relationships underlying those descriptions are not reliably encoded in text, so models struggle with reasoning that depends on them directly.

Q: How is DeepMind's approach different from competitors?
A: Hassabis is arguing for a structural departure from language-centric training toward systems that simulate physical dynamics. Most competing labs continue scaling transformer-based models trained primarily on text and multimodal data.

Q: Has DeepMind released a world model?
A: No. Hassabis's January 2026 remarks outline a research direction and a strategic thesis, not a product roadmap with specific timelines or a publicly available system.

About the Author

Guilherme A.


Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
