FOOTPASS: A New Dataset for AI-Powered Soccer Analysis

AI Research
November 22, 2025
4 min read
Soccer, the world's most popular sport, generates vast amounts of data from broadcast video, fueling the rise of soccer analytics, a field that blends computer vision, machine learning, and domain expertise. Traditionally, extracting detailed play-by-play data (who passed the ball, when, and where) has relied heavily on manual annotation by experts, a time-consuming process that limits scalability. This gap has spurred research into automated methods, but existing datasets often lack the multi-modal, multi-agent context needed to capture the tactical nuances of the game. FOOTPASS, a dataset introduced by researchers from Mines Paris and Footovision, is designed to bridge this divide by providing full-length soccer broadcast videos aligned with rich tactical data, including player positions, velocities, and identities. It aims to support the development of AI systems that can reliably spot actions in real time, changing how teams and analysts derive insights from match footage.

FOOTPASS was built to provide high-quality, diverse, and realistic data. The dataset comprises 54 full matches from major European competitions such as Ligue 1, the Bundesliga, and the UEFA Champions League, totaling 102,992 annotated on-ball events such as passes, drives, and shots. Each event is manually labeled with a precise temporal anchor, the player's jersey number, and the team affiliation, while game-state data, such as player locations and roles, is reconstructed using computer vision techniques. This reconstruction includes field line detection, camera calibration, player tracking, and imputation of missing values, all certified under FIFA EPTS (Electronic Performance and Tracking Systems) standards. The annotations cover eight action classes, with a natural imbalance that reflects real soccer dynamics: passes and drives together account for nearly 90% of events, while shots and tackles are rare. The dataset also reflects challenges such as occlusions and broadcast replays: about 81.5% of events have bounding boxes for their actors, and classes like throw-ins and headers show lower coverage because of crowded areas or editing cuts.
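To make the annotation structure concrete, here is a minimal Python sketch of how one on-ball event might be represented and how class imbalance and bounding-box coverage could be measured. The `OnBallEvent` record, its field names, and the toy events are assumptions for illustration only; they are not the dataset's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class OnBallEvent:
    """Hypothetical event record; field names are illustrative, not the real schema."""
    frame: int    # temporal anchor, assumed here to be a video frame index
    team: str     # team affiliation, e.g. "home" or "away"
    jersey: int   # jersey number of the player performing the action
    action: str   # action class, e.g. "pass", "drive", "shot"
    bbox: Optional[Tuple[float, float, float, float]] = None  # actor box, if visible


def class_shares(events: List[OnBallEvent]) -> Dict[str, float]:
    """Fraction of events belonging to each action class."""
    counts = Counter(e.action for e in events)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}


def bbox_coverage(events: List[OnBallEvent]) -> float:
    """Fraction of events whose actor has a bounding box."""
    return sum(e.bbox is not None for e in events) / len(events)


# Toy data (made up): 5 passes, 4 drives, 1 shot; 2 events lack a box.
box = (0.0, 0.0, 10.0, 20.0)
events = (
    [OnBallEvent(100 + 30 * i, "home", 10, "pass", box) for i in range(5)]
    + [OnBallEvent(400 + 30 * i, "away", 7, "drive", box) for i in range(3)]
    + [OnBallEvent(600, "away", 7, "drive", None)]
    + [OnBallEvent(700, "home", 9, "shot", None)]
)

shares = class_shares(events)
coverage = bbox_coverage(events)
print(shares, coverage)
```

Running the same two functions over a real annotation file would show the kind of skew the article describes, with passes and drives dominating and some events missing an actor box.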

Benchmarking experiments on FOOTPASS evaluated several state-of-the-art methods on player-centric action spotting. The baseline, TAAD (Track-Aware Action Detector), a purely visual approach, achieved high recall but low precision, generating many false positives. TAAD+GNN, which adds a graph neural network to model spatiotemporal relationships between players, improved both metrics, raising the F1-score from 35.9% to 52.1%. The top performer, TAAD+DST, applies denoising sequence transduction with long-range temporal and tactical context, nearly doubling the baseline F1-score to 67.5% by refining noisy predictions into coherent action sequences. Class-specific analysis revealed significant gains for drives and passes, which benefit from contextual reasoning, while sparse actions like crosses and blocks saw precision improvements due to role-based priors. However, tackles remained challenging across methods, highlighting the handling of rare events as an area for future refinement.
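The interplay between recall, precision, and F1 can be illustrated with a small Python sketch. It assumes a simple evaluation rule that is not taken from the paper: a prediction counts as correct when it names the same team, jersey number, and action as an unmatched ground-truth event within a fixed frame tolerance. The tolerance of 12 frames, the tuple layout, and the toy events are all hypothetical.

```python
from typing import List, Tuple

# An event is (frame, team, jersey, action); this layout is an assumption.
Event = Tuple[int, str, int, str]


def count_true_positives(preds: List[Event], truths: List[Event], tol: int = 12) -> int:
    """Greedy one-to-one matching of predictions to ground-truth events.

    A prediction matches the closest unused ground-truth event that has the
    same team, jersey and action and lies within `tol` frames.
    """
    used = set()
    tp = 0
    for pred in sorted(preds, key=lambda e: e[0]):
        best = None  # (frame distance, index of ground-truth event)
        for i, truth in enumerate(truths):
            if i in used or pred[1:] != truth[1:]:
                continue
            dist = abs(pred[0] - truth[0])
            if dist <= tol and (best is None or dist < best[0]):
                best = (dist, i)
        if best is not None:
            used.add(best[1])
            tp += 1
    return tp


def precision_recall_f1(tp: int, n_pred: int, n_true: int) -> Tuple[float, float, float]:
    """Standard definitions; F1 is the harmonic mean of precision and recall."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (made up): every true event is found, but half the predictions are spurious.
truths = [(100, "home", 10, "pass"), (160, "home", 7, "drive"), (240, "away", 4, "pass")]
preds = [
    (103, "home", 10, "pass"), (130, "home", 10, "pass"), (158, "home", 7, "drive"),
    (200, "away", 9, "pass"), (238, "away", 4, "pass"), (300, "home", 7, "shot"),
]

tp = count_true_positives(preds, truths)
precision, recall, f1 = precision_recall_f1(tp, len(preds), len(truths))
print(tp, precision, recall, round(f1, 3))
```

In this toy case recall is perfect while precision is only one half, so F1 lands at two thirds. This is the pattern the article attributes to the visual baseline, and it is why removing false positives through context can lift F1 substantially.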

The implications of FOOTPASS extend beyond academic research, offering practical benefits for sports analytics and AI development. By providing a public benchmark with multi-modal data, it enables the training of models that integrate visual cues with tactical knowledge, leading to more reliable automated annotation systems. This could reduce the manual effort required of clubs and broadcasters and allow for real-time performance analysis and tactical modeling. For instance, tools like TacticAI could be enhanced with FOOTPASS data to predict player movements and game outcomes more accurately. Moreover, the dataset's focus on high-recall scenarios supports assisted annotation, where filtering out false positives is easier than locating missed events, which streamlines workflows in professional soccer environments.
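One way to picture the assisted-annotation idea is to choose a confidence cutoff that keeps recall above a target, then let a human reviewer discard the false positives that remain. The Python sketch below is an assumption-laden illustration, not a method from the paper: the `pick_threshold` helper, the confidence scores, and the recall target are all hypothetical.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    scored: List[Tuple[float, bool]], n_true: int, target_recall: float
) -> Optional[float]:
    """Return the highest confidence cutoff whose recall meets the target.

    `scored` holds (confidence, is_correct) pairs for predictions on a
    validation set; `n_true` is the number of ground-truth events there.
    Keeping every prediction with confidence >= the returned value yields
    recall >= target_recall. Returns None if the target cannot be reached.
    """
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    hits = 0
    for confidence, is_correct in ranked:
        hits += int(is_correct)
        if hits / n_true >= target_recall:
            return confidence
    return None


# Toy validation predictions (made up): 4 true events, 6 candidate detections.
scored = [(0.9, True), (0.8, False), (0.7, True), (0.4, True), (0.3, False), (0.2, True)]

cutoff = pick_threshold(scored, n_true=4, target_recall=0.75)
to_review = [s for s in scored if cutoff is not None and s[0] >= cutoff]
print(cutoff, len(to_review))
```

With these toy numbers the reviewer sees four candidates, three of them correct, instead of searching the full match for events. Raising the recall target lowers the cutoff and increases the review load.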

Despite its strengths, FOOTPASS has limitations that warrant consideration. The dataset, while substantial, is smaller than some predecessors, which may affect the robustness of models trained on it, especially for imbalanced classes. Additionally, the reliance on player positions inferred from broadcast footage, rather than ground truth from multi-camera systems, introduces uncertainty, and events without bounding boxes pose challenges for purely visual methods. Future work could expand annotations to include sparse events like fouls or referee interventions, incorporate audio and commentary for richer context, and explore end-to-end learning approaches. FOOTPASS is publicly available for research: annotations are hosted on Hugging Face and code on GitHub, though video access requires an agreement with SoccerNet. It is positioned as a foundational resource for advancing AI in sports analytics.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
