AI Answers Questions More Like Humans Do

New approach helps AI determine when questions can't be answered, improving accuracy on complex reading comprehension tasks by learning probability relationships between answerability and answer positions.

AI Research
November 11, 2025

As artificial intelligence systems increasingly handle real-world question answering tasks, they face a fundamental challenge: determining when a question simply cannot be answered from the available information. Current systems often struggle with this basic human judgment, either forcing answers where none exist or missing answerable questions entirely. A new approach addresses this limitation by teaching AI to understand the natural relationship between whether a question can be answered and where potential answers might appear.

The researchers developed a method that learns the joint probability between answerability and answer positions, treating these as interconnected decisions rather than separate problems. Unlike previous systems that relied on artificial 'no-answer' tokens or sentinel values, this approach models the natural dependency where valid answer positions depend on whether a question is answerable in the first place. This more intuitive structure better reflects how humans approach reading comprehension.
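One way to picture this dependency is to factor the prediction into "is the question answerable?" and "given that it is, where is the span?". The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the sigmoid answerability score, the independent start and end softmaxes, and the 0.5 decision rule are all hypothetical choices made here for clarity.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()


def joint_distribution(answerable_logit, start_logits, end_logits):
    """Return P(no answer) and a matrix of P(answerable, start=i, end=j).

    Assumed factorization (illustrative only):
        P(answerable, i, j) = P(answerable) * P(i, j | answerable)
    """
    p_answerable = 1.0 / (1.0 + np.exp(-answerable_logit))
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    # Keep only valid spans (start <= end), then renormalize so the
    # conditional span distribution sums to 1.
    span = np.triu(np.outer(p_start, p_end))
    span = span / span.sum()
    return 1.0 - p_answerable, p_answerable * span


def predict(answerable_logit, start_logits, end_logits):
    """Return None for 'unanswerable', else the most probable (start, end)."""
    p_no_answer, joint = joint_distribution(
        answerable_logit, start_logits, end_logits
    )
    # joint.sum() equals P(answerable), so this is a 0.5 threshold.
    if p_no_answer >= joint.sum():
        return None
    i, j = np.unravel_index(np.argmax(joint), joint.shape)
    return int(i), int(j)
```

Because the span probabilities are scaled by the answerability probability, the "no answer" mass and all span masses sum to one, with no extra sentinel token needed in the context.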

The system combines two established AI architectures: BERT, which provides contextual understanding of language, and BiDAF, which handles the interactive exploration between questions and context. By integrating these at both word and character levels, the model captures nuanced linguistic patterns while maintaining computational efficiency. The architecture processes question-context pairs through multiple layers that progressively refine understanding, culminating in a probability predictor that jointly considers answerability and answer spans.
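To make the "interactive exploration between questions and context" more concrete, here is a minimal NumPy sketch of the standard BiDAF attention-flow step. It is not the paper's architecture: the article does not describe how BERT and BiDAF are wired together, so random vectors stand in for BERT's contextual embeddings, and the name `bidaf_attention` and the weight vector `w` are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def bidaf_attention(context, question, w):
    """Bidirectional attention between context (T, d) and question (J, d).

    w has shape (3d,) and scores each pair as
        S[t, j] = w . [c_t ; q_j ; c_t * q_j]
    Returns a query-aware context representation of shape (T, 4d).
    """
    T, d = context.shape
    w_c, w_q, w_cq = w[:d], w[d:2 * d], w[2 * d:]
    # Similarity matrix between every context token and question token.
    S = (
        (context @ w_c)[:, None]
        + (question @ w_q)[None, :]
        + (context * w_cq) @ question.T
    )
    # Context-to-question: each context token attends over question tokens.
    c2q = softmax(S, axis=1) @ question              # (T, d)
    # Question-to-context: weight context tokens by their best match.
    b = softmax(S.max(axis=1), axis=0)               # (T,)
    q2c = np.tile(b @ context, (T, 1))               # (T, d)
    return np.concatenate(
        [context, c2q, context * c2q, context * q2c], axis=1
    )


# Usage with random stand-ins for encoder outputs (hypothetical sizes).
rng = np.random.default_rng(0)
G = bidaf_attention(
    rng.normal(size=(6, 4)), rng.normal(size=(3, 4)), rng.normal(size=12)
)
print(G.shape)
```

In a full model, a representation like `G` would feed further layers that produce the start, end, and answerability scores.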

Experimental results on SQuAD 2.0, a benchmark that mixes answerable questions with unanswerable ones, show significant improvements. The system achieved an F1 score of 75.84% and an exact match accuracy of 72.24%, outperforming the baseline methods by substantial margins. The model did particularly well on the 'Answer vs. No Answer' metric, reaching 79.68% accuracy in determining whether questions could be answered at all. These gains came without sacrificing performance on answerable questions, indicating balanced improvement across question types.
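For readers unfamiliar with these metrics, the sketch below shows how exact match and token-level F1 are commonly computed for SQuAD-style answers, with an empty string standing for "no answer". It follows the general approach of the SQuAD evaluation procedure but is a simplified illustration, not the script used to produce the numbers above; the function names are chosen here.

```python
import collections
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold, pred):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(gold) == normalize_answer(pred))


def f1_score(gold, pred):
    """Token-overlap F1 between a gold answer and a prediction."""
    gold_tokens = normalize_answer(gold).split()
    pred_tokens = normalize_answer(pred).split()
    # If either side is "no answer", score 1 only when both agree.
    if not gold_tokens or not pred_tokens:
        return float(gold_tokens == pred_tokens)
    common = collections.Counter(gold_tokens) & collections.Counter(pred_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this scheme, a system that invents an answer for an unanswerable question scores zero on both metrics for that question, which is why answerability detection has a direct effect on the headline numbers.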

This advancement matters because real-world question answering systems, from virtual assistants to customer service chatbots, frequently encounter unanswerable questions. Current systems often produce misleading responses when they should simply acknowledge uncertainty. By better modeling the relationship between answerability and answer positions, this approach could make AI assistants more reliable and trustworthy in everyday applications. The method's natural probability structure also makes it more interpretable than previous black-box approaches.

The researchers acknowledge limitations, including computational constraints that prevented larger batch sizes and potential gradient imbalance between different model components. The system still struggles with some error types, particularly when questions contain distracting terms or when answer boundaries are imprecise. Future work will focus on addressing these challenges while maintaining the method's theoretical consistency and practical performance.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.