Flight Delays Reveal AI's Hidden Weakness

Flight delays cost billions, but AI keeps failing to predict them. Discover how a new dataset exposes AI's blind spots and pushes for smarter solutions.

AI Research
November 14, 2025

Flight delays cost billions and disrupt travel worldwide, yet predicting them accurately has stumped artificial intelligence systems. A new dataset called Aeolus exposes why: AI models often fail because they ignore the complex, interconnected nature of real-world data. By integrating flight schedules, weather, and airport networks, Aeolus provides a realistic testbed that reveals critical gaps in current machine learning approaches, pushing researchers toward more robust and practical solutions.

The researchers found that AI models struggle with flight delay prediction largely because the data they train on is oversimplified. Traditional datasets treat flights as isolated events, but delays cascade through networks: a late arrival in Chicago can disrupt departures in Atlanta. Aeolus captures this by combining three data types: tabular details like flight times and weather, sequences of flights by the same aircraft, and graphs showing airport and crew connections. This multi-modal approach mirrors real aviation dynamics, where delays propagate through shared resources and schedules.
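
To make the sequence and graph views concrete, here is a minimal sketch of how flat flight records could be regrouped into per-aircraft chains and airport-to-airport edges. It is not the Aeolus code: the field names (`tail`, `origin`, `dest`, `dep`, `arr_delay`) and the sample records are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records; field names and values are illustrative, not the Aeolus schema.
flights = [
    {"tail": "N101", "origin": "ORD", "dest": "ATL",
     "dep": datetime(2024, 3, 1, 8, 0), "arr_delay": 45},
    {"tail": "N202", "origin": "DEN", "dest": "ORD",
     "dep": datetime(2024, 3, 1, 9, 0), "arr_delay": 0},
    {"tail": "N101", "origin": "ATL", "dest": "MIA",
     "dep": datetime(2024, 3, 1, 12, 0), "arr_delay": 30},
]


def build_chains(records):
    """Group flights by aircraft and order each group by departure time."""
    chains = defaultdict(list)
    for rec in records:
        chains[rec["tail"]].append(rec)
    for legs in chains.values():
        legs.sort(key=lambda r: r["dep"])
    return dict(chains)


def airport_edges(records):
    """Count flights on each directed (origin, dest) airport pair."""
    edges = defaultdict(int)
    for rec in records:
        edges[(rec["origin"], rec["dest"])] += 1
    return dict(edges)


chains = build_chains(flights)
edges = airport_edges(flights)

# The chain for N101 shows a late Chicago-Atlanta leg followed by the Atlanta-Miami leg.
for leg in chains["N101"]:
    print(leg["origin"], "->", leg["dest"], "arrival delay:", leg["arr_delay"])
```

In this toy data, the same aircraft flies both legs, so the chain makes it visible that the 45-minute late arrival in Atlanta precedes the delayed onward leg, which a row-by-row table would hide.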

The team built Aeolus using nine years of U.S. flight data from 2016 to 2024, including over 54 million flights, airport metadata, and hourly weather measurements. They organized the flights into chains to model how delays move from one leg to the next, and into graphs to represent airport congestion and crew rotations. To prevent data leakage, where a model effectively cheats by seeing information from the future, they split the data by time so that evaluations reflect real-world scenarios. The dataset supports tasks like regression (predicting delay minutes), classification (identifying delays), and uncertainty estimation, with tools for preprocessing and benchmarking.
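
A time-based split can be sketched in a few lines. The cutoff dates below are assumptions chosen for illustration, not the boundaries used by the Aeolus authors, and the `date` and `delay_min` fields are hypothetical.

```python
from datetime import date


def temporal_split(records, train_end, val_end):
    """Split records chronologically: train <= train_end < val <= val_end < test."""
    train, val, test = [], [], []
    for rec in records:
        if rec["date"] <= train_end:
            train.append(rec)
        elif rec["date"] <= val_end:
            val.append(rec)
        else:
            test.append(rec)
    return train, val, test


# One hypothetical record per year, 2016 through 2024.
records = [{"date": date(year, 6, 1), "delay_min": 10} for year in range(2016, 2025)]

# Assumed cutoffs: train through 2022, validate on 2023, test on 2024.
train, val, test = temporal_split(records, date(2022, 12, 31), date(2023, 12, 31))

# Every training record predates every validation and test record,
# so the model never sees the future during training.
print(len(train), len(val), len(test))
```

A random shuffle would instead mix flights from the same days into both training and test sets, which is the leakage the time-based split is meant to rule out.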

Results show that no single AI model excels across all tasks. In tabular data experiments, FTTransformer performed best for arrival delay regression (mean squared error of 0.914), while TabulaRNN led in classification accuracy (up to 77.2%). For sequential data, models like MogrifierLSTM achieved stable but modest performance (AUC around 69%), suggesting that flight chains preserve useful dependencies but leave considerable room for improvement. Graph-based approaches, such as combining VGAE embeddings with AFM, improved prediction by 0.71% in AUC by capturing multi-hop delay propagation. Crucially, tests revealed that random data splits inflate performance by an average of 0.057 AUC (5.7 points on the percentage scale used above), highlighting the risk of overoptimistic results without proper temporal safeguards.
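
For readers unfamiliar with the AUC figures quoted above: AUC is the probability that a model scores a randomly chosen delayed flight higher than a randomly chosen on-time one. The sketch below computes it by direct pairwise comparison on made-up labels and scores; it is a textbook definition, not the benchmark's evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = delayed, 0 = on time; scores are predicted delay probabilities.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.2]
print(roc_auc(labels, scores))  # 3 of 4 pairs are ordered correctly -> 0.75
```

On this scale 0.5 is chance and 1.0 is perfect ranking, so a 0.057 inflation from random splits is large relative to the gaps that separate the models compared here.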

This work matters because accurate delay prediction can help airlines and passengers avoid economic losses, reduce fuel waste, and cut carbon emissions. In 2022, U.S. flight disruptions led to an estimated $30–34 billion in costs and added millions of tons of CO2. Aeolus enables researchers to develop AI that handles real-world complexities, potentially improving air traffic management and traveler experiences. However, its focus on North America and its exclusion of factors like air traffic control decisions mean it is not a complete solution, so models built on it should be applied to live operations with caution.

Limitations include geographic bias, with 78.4% of data from North America, and missing real-time operational signals. The COVID-19 period also introduced anomalies, with average delays dropping sharply, which may affect how well models generalize. Future efforts could expand to global data and incorporate finer details like maintenance logs. For now, Aeolus serves as a wake-up call for AI researchers to account for the networked, time-dependent structure of real-world data.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
