AI Still Can't Read PowerPoints Properly

AI models fail at basic PowerPoint comprehension, struggling with slide layouts and narrative flow—exposing critical weaknesses in business and education tools we rely on daily.

AI Research
November 14, 2025

A new study reveals that today's most advanced AI models struggle with a task millions of humans perform daily: understanding PowerPoint presentations. Despite impressive capabilities in other areas, these vision-language models frequently misinterpret slide layouts, fail to recognize fonts accurately, and cannot reconstruct the narrative flow of presentation decks. The findings from Microsoft researchers highlight significant limitations in AI's ability to comprehend structured visual documents that are fundamental to business, education, and scientific communication.

The research team developed VLM-SlideEval, a comprehensive framework to test AI models on three critical aspects of slide comprehension. First, they evaluated basic element extraction—how well models could identify text boxes, images, and other components from slide images. Second, they tested sensitivity to controlled perturbations by systematically altering slide layouts, content, and styling. Third, they assessed narrative understanding by asking models to reorder shuffled presentation slides into their original sequence.

The methodology involved analyzing 1,948 PowerPoint slides from publicly available Zenodo datasets. Researchers created a standardized evaluation pipeline that combined rasterized slide images with ground truth data extracted from the original PowerPoint files. This allowed precise comparison between what AI models perceived and the actual slide content. The team tested multiple state-of-the-art models including GPT-4o, GPT-4.1, o3, and GPT-5 variants under controlled conditions.

Results showed stark limitations across all tested dimensions. While newer models like GPT-5 achieved near-perfect success rates (99.5%+) in parsing simple slides, performance dropped dramatically as slide complexity increased. For slides with 32 or more elements, GPT-4o's parseability rate plummeted to 66.7%. In element matching accuracy, GPT-5 variants led with F1 scores around 0.71-0.72, but still fell short of perfect comprehension.
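To make the element-matching F1 figure concrete, here is a minimal sketch of how such a score could be computed. The matching rule is an assumption for illustration (two elements match when their type and whitespace-normalized, lowercased text agree); the article does not describe the criterion VLM-SlideEval actually uses, and the `element_f1` helper and the dictionary fields are hypothetical.

```python
from collections import Counter


def element_f1(predicted, truth):
    """F1 over slide elements, matching on (type, normalized text).

    The matching rule is an illustrative assumption, not the study's method.
    """
    def key(element):
        text = " ".join(element.get("text", "").lower().split())
        return (element["type"], text)

    pred_counts = Counter(key(e) for e in predicted)
    true_counts = Counter(key(e) for e in truth)
    # Multiset intersection: each ground-truth element is matched at most once.
    matched = sum((pred_counts & true_counts).values())

    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth for one slide and a model's extraction of it.
truth = [
    {"type": "title", "text": "Quarterly Results"},
    {"type": "text", "text": "Revenue grew 12%"},
    {"type": "image", "text": ""},
]
predicted = [
    {"type": "title", "text": "quarterly  results"},
    {"type": "text", "text": "Revenue grew 21%"},  # misread digits, no match
]

score = element_f1(predicted, truth)
print(round(score, 2))
```

In this toy case only the title matches, so precision is 1/2, recall is 1/3, and F1 is 0.4. A score around 0.71-0.72 therefore still leaves a meaningful share of elements either missed or wrongly extracted.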

Font recognition proved particularly challenging. The best-performing models achieved only 17-42% accuracy in identifying font families, even though fonts are crucial for understanding document hierarchy and emphasis. Geometry comprehension was similarly limited: models showed 1-IoU scores (an error measure where 0 means a predicted box overlaps the true one perfectly) of around 0.55-0.65, indicating substantial errors in locating and sizing slide elements.
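For readers unfamiliar with the geometry metric, the following sketch shows how a 1-IoU error can be computed for a single element. The box format `(left, top, width, height)` in pixels and the example coordinates are assumptions chosen for illustration, not values from the study.

```python
def iou(a, b):
    """Intersection over union of two (left, top, width, height) boxes."""
    a_left, a_top, a_w, a_h = a
    b_left, b_top, b_w, b_h = b
    a_right, a_bottom = a_left + a_w, a_top + a_h
    b_right, b_bottom = b_left + b_w, b_top + b_h

    # Overlap is zero when the boxes do not intersect on either axis.
    inter_w = max(0.0, min(a_right, b_right) - max(a_left, b_left))
    inter_h = max(0.0, min(a_bottom, b_bottom) - max(a_top, b_top))
    inter = inter_w * inter_h

    union = a_w * a_h + b_w * b_h - inter
    return inter / union if union > 0 else 0.0


# Hypothetical text box: ground truth vs. a prediction shifted 50 px right, 20 px down.
truth_box = (100, 100, 400, 200)
pred_box = (150, 120, 400, 200)

geometry_error = 1.0 - iou(pred_box, truth_box)
print(round(geometry_error, 3))
```

Here the two boxes share 63,000 of 97,000 square pixels, giving a 1-IoU of roughly 0.35. The reported 0.55-0.65 range corresponds to predicted boxes that overlap the true element even less than in this example.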

Most strikingly, models demonstrated poor narrative understanding. When asked to reorder shuffled presentation decks, all models performed only marginally better than random guessing, with Spearman correlation coefficients ranging from 0.05 to 0.13. This suggests that while AI can identify individual slide components, it struggles to comprehend the logical flow and storytelling structure that make presentations effective communication tools.
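The Spearman coefficient used to score the reordering task can be sketched as follows. This uses the standard no-ties formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between a slide's predicted and true position. The `spearman_rho` helper and the five-slide deck are hypothetical; the study's exact scoring code is not described in the article.

```python
def spearman_rho(predicted_order, true_order):
    """Spearman rank correlation between two orderings of the same slides (no ties)."""
    n = len(true_order)
    if n < 2 or sorted(predicted_order) != sorted(true_order):
        raise ValueError("orderings must contain the same slides, at least two")

    # Position of each slide in the original deck.
    true_rank = {slide: position for position, slide in enumerate(true_order)}
    d_squared = sum(
        (position - true_rank[slide]) ** 2
        for position, slide in enumerate(predicted_order)
    )
    return 1 - 6 * d_squared / (n * (n * n - 1))


original = [1, 2, 3, 4, 5]            # slide ids in their true order
perfect = spearman_rho([1, 2, 3, 4, 5], original)
reversed_deck = spearman_rho([5, 4, 3, 2, 1], original)
two_swaps = spearman_rho([2, 1, 3, 5, 4], original)
print(perfect, reversed_deck, two_swaps)
```

A perfect reconstruction scores 1.0, a fully reversed deck scores -1.0, and a deck with two adjacent swaps scores 0.8. Random orderings average about 0, which is why coefficients of 0.05 to 0.13 indicate the models recovered very little of the original sequence.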

The study's controlled perturbation tests revealed another critical finding: AI models show inconsistent behavior when slides are slightly altered. While models maintained reasonable consistency in evaluating text quality across different perturbation levels, they exhibited significant variability in assessing layout geometry and styling. This inconsistency could lead to unreliable performance in real-world applications where presentation formats frequently vary.
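As a rough illustration of what a controlled layout perturbation might look like, the sketch below shifts each element by a bounded random offset at several severity levels. Everything here is an assumption: the `perturb_boxes` helper, the 1280x720 slide size, the `(left, top, width, height)` box format, and the levels are hypothetical, and the article does not specify which perturbations the researchers applied.

```python
import random


def perturb_boxes(boxes, level, slide_w=1280, slide_h=720, seed=0):
    """Shift each (left, top, width, height) box by up to `level` of the slide size.

    Illustrative only: sizes are left unchanged and boxes are clamped to the slide.
    """
    rng = random.Random(seed)  # fixed seed keeps the perturbation reproducible
    shifted = []
    for left, top, width, height in boxes:
        dx = rng.uniform(-level, level) * slide_w
        dy = rng.uniform(-level, level) * slide_h
        new_left = min(max(0.0, left + dx), slide_w - width)
        new_top = min(max(0.0, top + dy), slide_h - height)
        shifted.append((new_left, new_top, width, height))
    return shifted


boxes = [(100, 100, 400, 200), (600, 400, 300, 150)]
variants = {level: perturb_boxes(boxes, level) for level in (0.0, 0.02, 0.05)}
for level, layout in variants.items():
    print(level, layout)
```

Under these assumptions, a consistent evaluator would give the level 0.0 layout the same score as the original and degrade smoothly as the level rises, which is the kind of behavior such perturbation tests are designed to probe.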

These limitations matter because PowerPoint and similar presentation tools are ubiquitous in professional and educational settings. As organizations increasingly look to automate document processing and analysis, understanding AI's current capabilities and shortcomings becomes crucial. The findings suggest that while AI can handle basic slide parsing, it's not yet ready to replace human comprehension of complex visual documents.

The research acknowledges several limitations, including its focus on publicly available PowerPoint files and a specific set of evaluation metrics. Future work could explore broader document types, richer narrative understanding tasks, and more diverse perturbation scenarios. For now, the study serves as a caution against over-reliance on AI for critical document comprehension tasks and highlights the need for continued improvement in multimodal AI systems.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
