TL;DR
Claude Mythos can trace exploitable software gaps like a seasoned security researcher, prompting Anthropic to restrict testing to 40-plus vetted organizations.
Anthropic has begun sharing its most capable model with a select group of companies, and says it has no plans to release it publicly. Claude Mythos, which LLM Stats logged as a preview on April 7, is being made available to more than 40 organizations, including competitors, under conditions designed to surface security weaknesses before any broader deployment becomes possible.
The company's stated reason is unambiguous: without tight controls, the model could cause widespread disruption. What gives that claim weight is the specific capability under scrutiny. PBS NewsHour reports that Mythos performs sustained security research comparable to a human expert working through an entire day's investigative workload. Most existing models assist with fragments of vulnerability analysis; Mythos, Anthropic says, can carry the entire process of identifying exploitable gaps in software systems from start to finish.
That distinction is not merely semantic. All software contains bugs, but exploiting them requires expertise and patience. A model capable of replicating that process autonomously represents a qualitative shift in offensive capability, one that sits outside the scope of conventional artificial intelligence safety testing frameworks.
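To make "exploitable gap" concrete, here is a textbook example of the class of flaw this kind of research hunts for. It is a minimal illustration written for this article, not drawn from Anthropic's materials or from Mythos output:

```c
#include <stdio.h>
#include <string.h>

/* Classic stack buffer overflow: the destination buffer is fixed at
 * 16 bytes, but strcpy performs no bounds check, so attacker-controlled
 * input longer than 15 characters overwrites adjacent stack memory. */
static void greet(const char *name) {
    char buf[16];
    strcpy(buf, name);          /* unchecked copy: the exploitable gap */
    printf("Hello, %s\n", buf);
}

int main(int argc, char **argv) {
    if (argc > 1)
        greet(argv[1]);         /* untrusted input reaches the flaw */
    return 0;
}
```

Spotting this flaw in a 20-line program is trivial. Finding its equivalents buried in millions of lines of production code, then chaining them into a working exploit, is the sustained, day-long expert workload Anthropic attributes to Mythos.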
The controlled testing approach
Anthropic's response treats Mythos as a research artifact rather than a product. The 40-plus participating companies are tasked with probing the model against their own systems, mapping the model's attack surface before it can map theirs. That the cohort includes rivals is notable: it implies Anthropic views pre-deployment stress-testing as an industry-wide problem, not a competitive moat.
PBS NewsHour's reporting notes the restricted rollout is itself drawing scrutiny. Critics question whether controlled access can hold when participants are well-resourced organizations with divergent incentives. Anthropic's implicit counterargument is that leaving high-capability models unaudited carries its own risks, a position with precedent in responsible disclosure norms from the security community.
On benchmarks, Mythos scores 0.9 on GPQA according to LLM Stats, matching Meta's Muse Spark, which debuted one day later on April 8. Both results place these models at the high end of graduate-level scientific reasoning evaluations. GPQA measures academic reasoning, not operational security competence, so benchmark parity between the two models does not imply capability parity on the axis that actually concerns Anthropic.
What this signals for the field
The Mythos announcement lands as regulators and labs alike are trying to operationalize governance frameworks for high-capability artificial intelligence. Anthropic's approach borrows from the cybersecurity concept of responsible disclosure: surface the capability with trusted parties, probe for failure modes, then decide what comes next. Whether 40 companies constitutes a meaningfully controlled group remains debatable, but the logic is coherent.
Parallel moves at Google DeepMind suggest the wider industry is institutionalizing the same questions by different routes. The same week Mythos entered preview, DeepMind appointed Cambridge philosopher Henry Shevlin to a new role examining machine consciousness and AGI readiness, per Outlook Business. The lab also launched Gemini Robotics-ER 1.6, a spatial reasoning model for physical agents, according to SiliconAngle. Taken together, the moves suggest a lab pushing capability while actively pulling in ethical scaffolding.
For security engineers and red teams, the practical implication is concrete. A model that can autonomously trace exploitable paths through complex codebases changes the cost structure of both offense and defense. Organizations relying on complexity as a de facto security layer should treat this as a directional signal, even before Mythos reaches any broader release.
The open question is what Anthropic does after this testing window closes. A model described as too dangerous to release widely but valuable enough to develop implies the company believes controlled deployment justifies the capability's existence. That reasoning holds only as long as the controls do.
FAQ
What is Claude Mythos and why is Anthropic withholding public access?
Claude Mythos is Anthropic's latest model, currently in limited preview. Anthropic says it can autonomously perform sophisticated software vulnerability research, a capability the company considers too dangerous for unrestricted release.
How does Mythos differ from existing Claude models on security tasks?
Existing models can assist with individual steps in vulnerability analysis. Mythos is described as capable of completing the full research cycle independently, comparable to a human security researcher working through an entire day.
Which companies have access to Claude Mythos?
More than 40 technology companies, including some of Anthropic's direct competitors, received restricted access to test the model and probe for vulnerabilities in their own systems.
How does Mythos benchmark against other frontier models?
LLM Stats lists a GPQA score of 0.9 for Claude Mythos Preview, on par with Meta's Muse Spark. GPQA measures graduate-level academic reasoning and does not directly capture the operational security capabilities that make Mythos specifically notable.
About the Author
Guilherme A.
Former dentist from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
Connect on LinkedIn