As artificial intelligence systems grow increasingly capable, the specter of loss of control (LoC) has moved from science fiction to a pressing policy concern. Recent legislative efforts, such as California Senate Bill 53 and the EU AI Act's General-Purpose AI Code of Practice, explicitly address LoC, mandating that developers assess and mitigate these risks. However, a critical gap remains: the absence of a clear, actionable definition of LoC, which could lead to either overreaction to minor incidents or underestimation of catastrophic threats. This ambiguity is compounded by divergences in existing definitions, such as those in the International AI Safety Report and the EU AI Act, which differ in scope and timelines, underscoring the urgent need for a unified conceptual framework to guide decision-makers in an era of rapid AI advancement.
In response, researchers from Apollo Research have developed a novel taxonomy to categorize LoC into three distinct degrees based on severity and persistence. Severity refers to the number of people affected and the extent of harm, while persistence measures the difficulty of interrupting the harm trajectory of an event. The taxonomy identifies Deviation as events causing minor harm that are easy to contain, Bounded LoC as incidents with significant damage that are difficult but possible to control at high cost, and Strict LoC as maximally severe and permanent outcomes, such as human extinction. This classification emerged from an extensive literature review of 130 works, which analyzed 40 LoC scenarios and plotted 12 concrete examples on a graph using economic impact as a proxy, revealing that most scholarly attention focuses on Bounded LoC, highlighting its relevance for near-term policy interventions.
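The two axes of the taxonomy can be made concrete with a toy classifier. This is an illustrative sketch only: the paper defines the degrees qualitatively, and the normalized scales and cutoff values below are invented for demonstration, not taken from the research.

```python
from enum import Enum

class LoCDegree(Enum):
    DEVIATION = "Deviation"   # minor harm, easy to contain
    BOUNDED = "Bounded LoC"   # significant damage, containable only at high cost
    STRICT = "Strict LoC"     # maximally severe and permanent (e.g. extinction)

def classify_loc(severity: float, persistence: float) -> LoCDegree:
    """Toy mapping over the taxonomy's two axes.

    severity and persistence are assumed normalized to [0, 1];
    the threshold values are illustrative placeholders.
    """
    if severity >= 0.9 and persistence >= 0.9:
        return LoCDegree.STRICT
    if severity >= 0.4 or persistence >= 0.4:
        return LoCDegree.BOUNDED
    return LoCDegree.DEVIATION
```

The point of the sketch is the joint condition: an event only reaches Strict LoC when it is extreme on both axes, whereas either axis alone can push an incident out of the Deviation category.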
The methodology behind this taxonomy involved a rigorous process of filtering scenarios based on causal detail, alignment with existing LoC definitions, and concreteness in terms of economic impact estimates. For instance, scenarios like AI-induced blackouts or engineered pandemics were assessed using back-of-the-envelope calculations or matched to pre-existing economic data, such as the costs of historical hurricanes or the COVID-19 pandemic. This approach allowed researchers to contextualize LoC events against benchmarks like the U.S. Strategic National Risk Assessment threshold and existential catastrophe boundaries, providing a visual and quantitative foundation for distinguishing between different levels of risk and informing targeted mitigation strategies.
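The benchmarking step described above amounts to placing a scenario's estimated cost against a ladder of reference thresholds. A minimal sketch of that idea follows; the dollar figures are placeholders for illustration, not the values used in the paper.

```python
# Reference thresholds in ascending order.
# Dollar values are hypothetical placeholders, not the paper's figures.
BENCHMARKS = [
    ("below national-risk threshold", 0.0),
    ("exceeds SNRA-style national-risk threshold", 100e9),
    ("comparable to COVID-19 pandemic scale", 10e12),
]

def contextualize(cost_usd: float) -> str:
    """Return the highest benchmark an estimated economic impact clears."""
    label = BENCHMARKS[0][0]
    for name, threshold in BENCHMARKS[1:]:
        if cost_usd >= threshold:
            label = name
    return label
```

For example, a back-of-the-envelope estimate of $200 billion for an AI-induced blackout would land in the middle band, flagging it as a candidate Bounded LoC event rather than a mere Deviation.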
Beyond taxonomy, the research proposes a practical framework—the DAP framework—focusing on Deployment context, Affordances, and Permissions to manage LoC risks today, sidestepping uncertainties around AI capabilities and propensities. Deployment context involves assessing whether an AI system's environment and use case are high-stakes, such as in critical national infrastructure or military applications, and evaluating potential cascading failures through threat modeling. Affordances refer to the environmental resources available to an AI system, like internet access, and should be limited to only what is necessary for the task, while Permissions—the authorizations to use those affordances—must adhere to the principle of least privilege to reduce oversight risks. This framework is immediately actionable, offering checklists for policymakers to implement without relying on speculative technical solutions.
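Because the DAP framework is checklist-like, it can be sketched as a simple assessment routine. The structure below is a hypothetical rendering of the three dimensions, not an implementation from the paper; the field names and findings are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DAPAssessment:
    """Hypothetical checklist over Deployment context, Affordances, Permissions."""
    high_stakes_context: bool                        # e.g. critical infrastructure, military use
    affordances: set = field(default_factory=set)    # resources reachable in the environment
    permissions: set = field(default_factory=set)    # authorizations actually granted
    required: set = field(default_factory=set)       # what the task genuinely needs

    def findings(self) -> list:
        issues = []
        if self.high_stakes_context:
            issues.append("High-stakes deployment: run cascading-failure threat model")
        excess_affordances = self.affordances - self.required
        if excess_affordances:
            issues.append(f"Unnecessary affordances: {sorted(excess_affordances)}")
        excess_permissions = self.permissions - self.required
        if excess_permissions:  # violates the principle of least privilege
            issues.append(f"Permissions beyond least privilege: {sorted(excess_permissions)}")
        return issues
```

A system with internet access and shell access but a task that only needs the internet would be flagged twice in a high-stakes context: once for the deployment setting, once for the excess affordance.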
Looking ahead, the report warns of a 'state of vulnerability' in which advanced AI systems, driven by economic and strategic pressures, gain sufficient resources and capabilities to cause LoC when catalyzed by misalignment or pure malfunction. In this scenario, society faces a precarious future where LoC is highly likely unless robust defense-in-depth measures are adopted, including governance interventions such as threat modeling and emergency plans, and technical controls such as rigorous testing and monitoring. The implications are stark: without proactive measures, the world could inch toward irreversible harm, underscoring the need for global cooperation and preparedness to sustain a state of suspended LoC and avert potential catastrophes.
Reference: Stix et al., 2025, Apollo Research
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.