AI Models Can't Answer 'What If' Questions Reliably

Artificial intelligence systems that make high-stakes decisions in healthcare, law, and finance face a fundamental limitation: they can't reliably answer hypothetical 'what if' questions about their own reasoning. This gap prevents doctors from asking 'Would this diagnosis change under a different treatment?' and loan officers from asking 'Would this applicant be approved with a higher income?' These are exactly the kinds of questions people routinely use to understand decisions.

Researchers from Columbia University found that current AI models, including both standard 'black box' systems and newer 'concept-based' approaches, fail to provide consistent answers to counterfactual queries, that is, questions about how predictions would change under different conditions. The team analyzed common model architectures and found that neither type can guarantee consistent responses to hypothetical scenarios, even when they perform equally well on standard prediction tasks.

The researchers developed a mathematical framework to determine when AI models can reliably answer 'what if' questions. They found that models must be specifically designed to respect causal relationships between features. For example, to reliably answer 'Would this face be more attractive if the person smiled?', the model must use only features that aren't causally influenced by smiling, such as gender or bone structure, rather than features like cheekbone appearance that might change when someone smiles.
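
The feature-selection rule in the smiling example can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: the causal graph, the feature names, and the helper functions `descendants` and `safe_features` are all hypothetical, and the sketch assumes the intervened feature itself is treated as the query rather than as a model input.

```python
from collections import deque

# Hypothetical causal graph: each key causally influences the features in its list.
CAUSAL_GRAPH = {
    "smiling": ["cheekbone_appearance", "mouth_shape"],
    "bone_structure": ["cheekbone_appearance"],
    "gender": [],
    "cheekbone_appearance": [],
    "mouth_shape": [],
}

def descendants(graph, node):
    """Return every feature causally downstream of `node` (breadth-first search)."""
    seen = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(graph.get(current, []))
    return seen

def safe_features(graph, intervened):
    """Features a model may use to answer 'what if `intervened` changed?'.

    Assumption: the intervened feature is the query itself, so it is
    excluded along with everything it causally influences.
    """
    blocked = descendants(graph, intervened) | {intervened}
    return sorted(f for f in graph if f not in blocked)

print(safe_features(CAUSAL_GRAPH, "smiling"))
```

With this toy graph, the only features left for the smiling question are bone structure and gender. Cheekbone appearance is excluded because smiling changes it.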

Experimental results using synthetic datasets showed a fundamental trade-off between interpretability and accuracy. Models designed to answer more counterfactual questions necessarily sacrifice some predictive power, while the most accurate models can answer fewer types of hypothetical queries. In tests with facial attractiveness prediction, models using only non-descendant features of smiling (like gender) provided consistent answers about how smiles affect attractiveness, while standard models gave contradictory responses.

The team's framework provides a practical way to evaluate which questions a given AI model can reliably answer and how to design models that can handle specific types of counterfactual queries. This approach doesn't require full knowledge of all causal relationships, only an understanding of which features influence others.
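
The evaluation direction, going from an existing model to the questions it can answer, can be sketched the same way. The Python below is a hedged illustration under assumed names: the graph, the feature lists, and the function `answerable_interventions` are hypothetical and stand in for the framework's formal criterion. It needs only the "which feature influences which" structure, not a full causal model.

```python
# Hypothetical causal graph: each key causally influences the features in its list.
CAUSAL_GRAPH = {
    "smiling": ["cheekbone_appearance", "mouth_shape"],
    "bone_structure": ["cheekbone_appearance"],
    "gender": [],
    "cheekbone_appearance": [],
    "mouth_shape": [],
}

def descendants(graph, node):
    """Return every feature causally downstream of `node` (depth-first search)."""
    seen, stack = set(), list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen

def answerable_interventions(graph, model_features):
    """List the 'what if X changed?' questions a model can answer consistently.

    Assumed criterion: X is not itself a model input, and the model reads
    no feature that X causally influences.
    """
    features = set(model_features)
    return sorted(
        x for x in graph
        if x not in features and not (descendants(graph, x) & features)
    )

print(answerable_interventions(CAUSAL_GRAPH, ["gender", "bone_structure"]))
print(answerable_interventions(CAUSAL_GRAPH, ["gender", "cheekbone_appearance"]))
```

In this toy setup, the model built on gender and bone structure can handle the smiling question. The model that reads cheekbone appearance cannot, because smiling changes one of its inputs.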

For real-world applications, this means AI systems used in medical diagnosis, loan approval, or legal decisions could be designed to answer the specific 'what if' questions most relevant to their domain. A medical AI could be built to reliably answer questions about how different symptoms or test results would affect diagnoses, while a financial system could handle queries about how income changes would impact credit decisions.

The main limitation is that building causally interpretable models requires understanding which features influence others in the domain. In complex real-world scenarios, these causal relationships may not be fully known. Additionally, there's an inherent trade-off: the more types of 'what if' questions a model can answer, the less accurate it becomes at standard predictions.
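
One way to see where this trade-off comes from is to count how many features remain usable as the list of supported questions grows. The Python sketch below uses a hypothetical loan-approval graph; the feature names and the function `usable_features` are assumptions made for illustration. It shows only the shrinking feature set, not the accuracy figures from the study.

```python
# Hypothetical loan-approval causal graph (an assumption, not from the study).
LOAN_GRAPH = {
    "age": ["employment_years"],
    "employment_years": ["income"],
    "income": ["debt_to_income", "savings"],
    "debt_to_income": [],
    "savings": [],
    "credit_history_length": [],
}

def descendants(graph, node):
    """Return every feature causally downstream of `node` (depth-first search)."""
    seen, stack = set(), list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen

def usable_features(graph, interventions):
    """Features left once every listed 'what if' question must be answerable."""
    usable = set(graph)
    for x in interventions:
        # Drop the intervened feature and everything it causally influences.
        usable -= descendants(graph, x) | {x}
    return sorted(usable)

for queries in ([], ["income"], ["income", "employment_years"],
                ["income", "employment_years", "age"]):
    print(len(usable_features(LOAN_GRAPH, queries)), usable_features(LOAN_GRAPH, queries))
```

In this toy graph the usable feature count falls from six to three, two, and finally one as more questions are supported. Fewer inputs generally means less predictive signal, which is consistent with the trade-off described above.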

About the Author

Guilherme A.