Artificial intelligence systems, no matter how advanced, face an inherent limitation: they cannot always make decisions that they themselves consider optimal. This finding comes from a new mathematical analysis of Newcomb's paradox, a classic thought experiment that has divided philosophers and scientists for over half a century. The research demonstrates that separating an entity's knowledge system from its decision-making system reveals unavoidable situations where even perfect reasoning leads to knowingly poor choices.
The key discovery is that no decision system can guarantee what researchers call "counterfactual optimization"—the ability to always choose the option that would yield the best outcome if selected. The paper proves this mathematically through Theorem 1, showing that when an entity's epistemic system (which reasons about the world) has imperfect knowledge about its own decision system, counterfactual optimization becomes impossible. Essentially, the part of an AI that knows what's best cannot always control the part that makes decisions.
The methodology builds on a clear separation between what the researchers call "Emma" (the epistemic system) and "Dan" (the decision system). Emma uses pure Bayesian reasoning—applying only probability laws to compute beliefs about the world. Dan computes decisions based on available data. The critical insight is that Emma must have some uncertainty about what Dan will decide; if Emma knew Dan's decisions with perfect certainty, counterfactual reasoning would become meaningless.
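Emma's style of reasoning can be illustrated with a single Bayesian update. The sketch below is a minimal illustration, not the paper's formalism: the two hypotheses, the prior, and the likelihood values are all hypothetical numbers chosen for the example. It shows Emma revising her belief about what Dan will decide after observing one piece of evidence, while still remaining uncertain.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return the posterior P(h | e) for each hypothesis h.

    prior[h] is P(h); likelihood[h] is P(e | h) for the observed evidence e.
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(e), by the law of total probability
    return {h: weight / evidence for h, weight in unnormalized.items()}


# Hypothetical numbers: Emma starts undecided about Dan's choice.
prior = {"one_box": Fraction(1, 2), "two_box": Fraction(1, 2)}
# Hypothetical evidence that is three times likelier if Dan will one-box.
likelihood = {"one_box": Fraction(3, 4), "two_box": Fraction(1, 4)}

posterior = bayes_update(prior, likelihood)
print(posterior)
```

With these assumed inputs the belief in one-boxing rises from 1/2 to 3/4 but stays strictly below 1, which is the kind of residual uncertainty the framework requires Emma to keep about Dan.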
The analysis then applies this framework to Newcomb's paradox, where a player (Alice) faces a predictor (Omega) who has already predicted her choice. In the classic formulation, Alice chooses between taking a single box, which contains a large reward only if Omega predicted she would take that box alone, and taking both boxes, which adds a smaller guaranteed reward to whatever the first box holds. The paper shows that whether one-boxing or two-boxing is counterfactually optimal depends entirely on how much additional data Alice believes Omega possesses about her decision-making process.
When Alice doesn't expect Omega to know significantly more than she does about her own decisions, two-boxing becomes the only counterfactually optimal choice. But if Alice believes Omega has access to enough additional data to become "quasi-omniscient" about her decision system, then one-boxing becomes counterfactually preferable. This finding, formalized in Theorem 2, offers a resolution of the long-standing debate: both strategies can be rational, depending on the information context.
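A rough way to see why the answer can flip is the textbook expected-payoff calculation for Newcomb's problem. The sketch below is a simplified stand-in, not the paper's Theorem 2: it collapses "how much Omega knows" into a single hypothetical `accuracy` parameter (the probability that Omega's prediction matches Alice's choice), and it assumes the conventional reward values of 1,000,000 and 1,000, which the article does not specify.

```python
def expected_payoffs(accuracy, big=1_000_000, small=1_000):
    """Expected payoff of each strategy, given Omega's prediction accuracy.

    accuracy: assumed probability that Omega predicts Alice's choice correctly.
    big, small: assumed reward sizes (conventional values, not from the paper).
    """
    # One-boxing pays `big` only when Omega correctly predicted one-boxing.
    one_box = accuracy * big
    # Two-boxing always pays `small`, plus `big` when Omega wrongly predicted one-boxing.
    two_box = (1 - accuracy) * big + small
    return one_box, two_box


def break_even_accuracy(big=1_000_000, small=1_000):
    """Accuracy at which both strategies have the same expected payoff."""
    # Solve accuracy * big == (1 - accuracy) * big + small for accuracy.
    return 0.5 + small / (2 * big)


for accuracy in (0.5, 0.9):
    one_box, two_box = expected_payoffs(accuracy)
    better = "one-box" if one_box > two_box else "two-box"
    print(f"accuracy={accuracy}: one-box={one_box:.0f}, two-box={two_box:.0f}, better={better}")
```

Under these assumptions, a predictor that is no better than a coin flip favors two-boxing, while a predictor only slightly better than chance already tips the balance toward one-boxing. The paper's own condition is stated in terms of the additional data Omega holds about Alice's decision system, so this calculation should be read as intuition rather than as a restatement of the theorem.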
The implications extend beyond philosophical puzzles to practical AI systems. The research suggests that entities with vastly more data than us, such as social media algorithms that track our behavior, could potentially predict our decisions better than we can ourselves. This creates real-world Newcomb-like situations in which the choice that looks best to us in isolation may differ from the choice that would be best once we account for how well others can predict our behavior.
The analysis acknowledges several limitations. It assumes perfect Bayesian reasoning, which requires unrealistic computing resources in practice. The framework also doesn't account for computational irreducibility—the idea that some computations cannot be shortcut, meaning systems might not be able to derive their own decisions without actually performing the computations. Additionally, the paper focuses on a simplified reward system and doesn't fully address how imperfect data collection affects decision quality in complex real-world scenarios.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.