AI
Deep coverage of artificial intelligence research: transformer architectures, multimodal models, agentic systems, alignment techniques, and the papers driving the frontier. Updated with primary sources and technical context for engineers, researchers, and applied scientists.
Claude Performance Regression Triggers Developer Backlash
Developers report Claude makes more errors and skips steps after Anthropic reduced default token effort levels, raising transparency questions ahead of a potential IPO.
OpenAI Safety Fellowship funds external AI alignment research
OpenAI opens applications for a six-month Safety Fellowship funding external researchers with stipends, model access, and support to produce safety and alignment outputs.
Claude Opus 4.7 scores 64.3% on SWE-bench Pro, outpacing GPT-5.4
Claude Opus 4.7 leads SWE-bench Pro, CursorBench, and SWE-bench Verified with sharply reduced tool errors and stronger multi-agent capabilities for long autonomous workflows.
Anthropic Tests Mythos Model, Warns Against Wide Release
Claude Mythos can trace exploitable software gaps like a seasoned security researcher, prompting Anthropic to restrict testing to 40-plus vetted organizations.
Meta Launches Muse Spark Nine Months After $14B Wang Deal
Meta's first Muse-series model targets fast reasoning over raw benchmark supremacy as the company tries to close ground on OpenAI and Google.
Retrieval-Augmented Generation vs Long Context Windows: When Each Architecture Wins
As context windows push past 1 million tokens, the engineering case for RAG pipelines is shifting from necessity to optimization choice, with production benchmarks showing each approach dominates in different deployment scenarios.
Synthetic Data Hits a Ceiling: New Scaling Laws Reveal the 30% Threshold for LLM Pre-Training
A 100,000 GPU-hour study across 1,000+ language models finds that mixing one-third synthetic data with two-thirds human text accelerates training tenfold, but pushing past that ratio triggers the onset of model collapse.
Mythos Preview logs ~40 CVE candidates; full scope still undisclosed
Project Glasswing gave 50 firms access to Claude Mythos Preview to hunt bugs, but just 40 CVEs show potential attribution after eight days of testing.
DeepMind releases Gemini Robotics-ER 1.6 with spatial reasoning upgrades
Gemini Robotics-ER 1.6 adds relational reasoning, analog gauge reading, and a modular tool-calling layer for robots in factories, warehouses, and homes.
Anthropic Cuts Claude's Effort Level, Drawing Developer Backlash
Developers report Claude failing on complex workflows after Anthropic quietly reduced reasoning depth, raising transparency concerns ahead of a potential IPO.
Flatiron Software Open-Sources AI Summarization Plugin for WordPress Publishers
Flatiron's WordPress AI summarization plugin goes open source after months in production testing, with the publishing partner reporting major engagement gains.
ML Accelerates Bone Imaging Research, Clinical Use Still Distant
How machine learning is maturing for bone imaging research at the University of Colorado, targeting osteoarthritis, osteoporosis, and fracture risk with multi-modal pipelines.
Claude Mythos Preview Claims Autonomous Zero-Day Exploit Skills
Anthropic's Claude Mythos Preview reportedly exploits zero-day vulnerabilities autonomously. Here's what the claims mean for security practitioners and what remains unverified.
Anthropic Tests Claude Opus 4.7 With Agentic Focus
Anthropic's next flagship model targets autonomous multi-agent workflows while a new design tool and Claude Code redesign signal a broader platform strategy.
Claude Mythos Autonomously Hacks Networks, UK Safety Lab Warns
The UK's AI Security Institute confirmed Anthropic's Claude Mythos executes autonomous multi-step cyberattacks at expert level, prompting urgent calls for cyber defense investment.
Nvidia Open-Sources Ising Models to Speed Quantum Calibration
Nvidia's open-source Ising model collection cuts quantum processor calibration from days to hours and claims 2.5x faster decoding than existing open-source tools.
Google DeepMind Hires a Philosopher to Study Machine Consciousness
Google DeepMind brought Henry Shevlin in-house as a philosopher to tackle machine consciousness and AGI readiness, signaling a shift beyond ethics advisory boards.
Anthropic's Claude Mythos Can Hack Networks Autonomously, AISI Warns
The UK's AI Security Institute confirms Claude Mythos autonomously executes multi-step network attacks, marking a new threshold in AI offensive cyber capabilities.
UK AI watchdog: Claude Mythos can autonomously breach IT networks
Britain's AISI found that Anthropic's Claude Mythos executes multi-step cyberattacks autonomously, leading Anthropic to withhold the model from public release.
DeepMind and Microsoft Propose Financial Risk Standard for AI Agents
Researchers from Google DeepMind, Microsoft Research and Columbia University propose escrow-based financial safeguards for autonomous AI agents handling real economic transactions.
Endee Labs Ships Managed Cloud for Open-Source Vector Database
Endee Cloud enters the managed vector database market claiming benchmark wins over Pinecone, Qdrant, and Milvus on throughput, recall, latency, and cost simultaneously.
Anthropic Restricts Claude Mythos Preview to Cybersecurity Consortium
Anthropic's Claude Mythos Preview vastly outperforms Opus 4.6 on exploit generation but stays gated behind Project Glasswing, a ten-company defensive security consortium.
Anthropic Withholds Claude Mythos After Zero-Day Exploit Spree
Anthropic's most capable model autonomously found decade-old vulnerabilities in major OSes, then the company locked it behind a $100M partner consortium.
Anthropic's Claude Mythos Cracks Zero-Days, Skips Public Launch
Anthropic withholds its most capable model from public release after it autonomously exploited vulnerabilities in every major OS and browser during testing.