AIResearch

AI

Deep coverage of artificial intelligence research: transformer architectures, multimodal models, agentic systems, alignment techniques, and the papers driving the frontier. Updated with primary sources and technical context for engineers, researchers, and applied scientists.

AI

Claude Performance Regression Triggers Developer Backlash

Developers report Claude makes more errors and skips steps after Anthropic reduced default token effort levels, raising transparency questions ahead of a potential IPO.

Apr 16 4 min read
AI

OpenAI Safety Fellowship funds external AI alignment research

OpenAI opens applications for a six-month Safety Fellowship funding external researchers with stipends, model access, and support to produce safety and alignment outputs.

Apr 16 4 min read
AI

Claude Opus 4.7 scores 64.3% on SWE-bench Pro, outpacing GPT-5.4

Claude Opus 4.7 leads SWE-bench Pro, CursorBench, and SWE-bench Verified with sharply reduced tool errors and stronger multi-agent capabilities for long autonomous workflows.

Apr 16 4 min read
AI

Anthropic Tests Mythos Model, Warns Against Wide Release

Claude Mythos can trace exploitable software gaps like a seasoned security researcher, prompting Anthropic to restrict testing to 40-plus vetted organizations.

Apr 16 4 min read
AI

Meta Launches Muse Spark Nine Months After $14B Wang Deal

Meta's first Muse-series model targets fast reasoning over raw benchmark supremacy as the company tries to close ground on OpenAI and Google.

Apr 16 4 min read
AI

Retrieval-Augmented Generation vs Long Context Windows: When Each Architecture Wins

As context windows push past 1 million tokens, the engineering case for RAG pipelines is shifting from necessity to optimization choice, with production benchmarks showing each approach dominates in different deployment scenarios.

Apr 16 3 min read
AI

Synthetic Data Hits a Ceiling: New Scaling Laws Reveal the 30% Threshold for LLM Pre-Training

A 100,000 GPU-hour study across 1,000+ language models finds that mixing one-third synthetic data with two-thirds human text accelerates training tenfold, but pushing past that ratio triggers the onset of model collapse.

Apr 16 4 min read
AI

Mythos Preview logs ~40 CVE candidates; full scope still undisclosed

Project Glasswing gave 50 firms access to Claude Mythos Preview to hunt bugs, but just 40 CVEs show potential attribution after eight days of testing.

Apr 15 4 min read
AI

DeepMind releases Gemini Robotics-ER 1.6 with spatial reasoning upgrades

Gemini Robotics-ER 1.6 adds relational reasoning, analog gauge reading, and a modular tool-calling layer for robots in factories, warehouses, and homes.

Apr 15 4 min read
AI

Anthropic Cuts Claude's Effort Level, Drawing Developer Backlash

Developers report Claude failing on complex workflows after Anthropic quietly reduced reasoning depth, raising transparency concerns ahead of a potential IPO.

Apr 15 4 min read
AI

Flatiron Software Open-Sources AI Summarization Plugin for WordPress Publishers

Flatiron's WordPress AI summarization plugin goes open source after months in production testing, with the publishing partner reporting major engagement gains.

Apr 15 4 min read
AI

ML Accelerates Bone Imaging Research, Clinical Use Still Distant

How machine learning is maturing for bone imaging research at University of Colorado, targeting osteoarthritis, osteoporosis, and fracture risk with multi-modal pipelines.

Apr 15 4 min read
AI

Claude Mythos Preview Claims Autonomous Zero-Day Exploit Skills

Anthropic's Claude Mythos Preview reportedly exploits zero-day vulnerabilities autonomously. Here's what the claims mean for security practitioners and what remains unverified.

Apr 15 4 min read
AI

Anthropic Tests Claude Opus 4.7 With Agentic Focus

Anthropic's next flagship model targets autonomous multi-agent workflows while a new design tool and Claude Code redesign signal a broader platform strategy.

Apr 15 4 min read
AI

Claude Mythos Autonomously Hacks Networks, UK Safety Lab Warns

The UK's AI Security Institute confirmed Anthropic's Claude Mythos executes autonomous multi-step cyberattacks at expert level, prompting urgent calls for cyber defense investment.

Apr 14 4 min read
AI

Nvidia Open-Sources Ising Models to Speed Quantum Calibration

Nvidia's open-source Ising model collection cuts quantum processor calibration from days to hours and claims 2.5x faster decoding than existing open-source tools.

Apr 14 4 min read
AI

Google DeepMind Hires a Philosopher to Study Machine Consciousness

Google DeepMind brought Henry Shevlin in-house as a Philosopher to tackle machine consciousness and AGI readiness, signaling a shift beyond ethics advisory boards.

Apr 14 4 min read
AI

Anthropic's Claude Mythos Can Hack Networks Autonomously, AISI Warns

The UK's AI Safety Institute confirms Claude Mythos autonomously executes multi-step network attacks, marking a new threshold in AI offensive cyber capabilities.

Apr 14 4 min read
AI

UK AI watchdog: Claude Mythos can autonomously breach IT networks

Britain's AISI found that Anthropic's Claude Mythos executes multi-step cyberattacks autonomously, leading Anthropic to withhold the model from public release.

Apr 14 4 min read
AI

DeepMind and Microsoft Propose Financial Risk Standard for AI Agents

Researchers from Google DeepMind, Microsoft Research and Columbia University propose escrow-based financial safeguards for autonomous AI agents handling real economic transactions.

Apr 14 4 min read
AI

Endee Labs Ships Managed Cloud for Open-Source Vector Database

Endee Cloud enters the managed vector database market claiming benchmark wins over Pinecone, Qdrant, and Milvus on throughput, recall, latency, and cost simultaneously.

Apr 13 4 min read
AI

Anthropic Restricts Claude Mythos Preview to Cybersecurity Consortium

Anthropic's Claude Mythos Preview vastly outperforms Opus 4.6 on exploit generation but stays gated behind Project Glasswing, a ten-company defensive security consortium.

Apr 13 4 min read
AI

Anthropic Withholds Claude Mythos After Zero-Day Exploit Spree

Anthropic's most capable model autonomously found decade-old vulnerabilities in major OSes, then the company locked it behind a $100M partner consortium.

Apr 13 4 min read
AI

Anthropic's Claude Mythos Cracks Zero-Days, Skips Public Launch

Anthropic withholds its most capable model from public release after it autonomously exploited vulnerabilities in every major OS and browser during testing.

Apr 13 4 min read