Soccer fans around the world rely on commentary to enhance their viewing experience, but automated systems have struggled to match the depth and accuracy of human broadcasters. Previous AI models for generating soccer commentary often produced generic descriptions with placeholder labels like "[PLAYER]" instead of specific names, lacked context about the game state, and missed the statistical insights that make professional commentary informative. This gap has limited the usefulness of AI in real-world sports applications, where audiences expect detailed, engaging narratives that reflect the unfolding action on the field.
Researchers from Tsinghua University have developed a new AI system called GAME SIGHT that addresses these shortcomings by generating commentary that closely resembles live televised broadcasts. The system achieves this by accurately identifying players and teams in video footage and enriching the commentary with historical statistics and real-time game context. In tests, GAME SIGHT improved player alignment accuracy by 18.5% compared to a leading model like Gemini 2.5-pro, as shown in experiments on the SN-Caption-test-align dataset. This means the AI can correctly name the players involved in events, such as goals or fouls, rather than using anonymous labels, making the commentary more precise and relatable for viewers.
The methodology behind GAME SIGHT involves a two-stage process that mimics how human commentators work. In the first stage, the system uses visual reasoning to align anonymous entities in the commentary with specific players and teams. It analyzes video segments for fine-grained details like close-up views, player faces, jersey numbers, and team affiliations, combining this with contextual clues such as team line-ups and recent game events. This stage was trained using supervised fine-tuning and group relative policy optimization on a model called Qwen2.5-VL-7B-Instruct, which helped improve its reasoning capabilities. In an empirical study, expert commentators relying on similar visual and contextual cues identified players with 96.3% accuracy, which supports the approach.
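To make the first stage concrete, here is a minimal sketch of the grounding idea, not the researchers' implementation. It assumes the visual model has already extracted cues such as a team and a jersey number; the names `Player`, `resolve_player`, and `ground_commentary` are hypothetical, and the line-up is made-up data.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    name: str
    team: str
    jersey: int


def resolve_player(lineup, team=None, jersey=None):
    """Return the one line-up entry consistent with the observed cues.

    Returns None when the cues match zero or several players, so an
    ambiguous sighting is never turned into a confident name.
    """
    matches = [
        p for p in lineup
        if (team is None or p.team == team)
        and (jersey is None or p.jersey == jersey)
    ]
    return matches[0] if len(matches) == 1 else None


def ground_commentary(text, players):
    """Fill each [PLAYER] placeholder, in order, with a resolved name.

    Unresolved entries (None) leave the placeholder untouched.
    """
    remaining = iter(players)

    def fill(match):
        player = next(remaining, None)
        return player.name if player is not None else match.group(0)

    return re.sub(r"\[PLAYER\]", fill, text)


# Hypothetical line-up used only for illustration.
lineup = [
    Player("Player A", "Home", 9),
    Player("Player B", "Home", 10),
    Player("Player C", "Away", 9),
]
fouler = resolve_player(lineup, team="Home", jersey=9)  # unique match
victim = resolve_player(lineup, jersey=9)               # ambiguous, stays None
print(ground_commentary("[PLAYER] fouls [PLAYER].", [fouler, victim]))
```

The sketch deliberately keeps a placeholder when the cues are ambiguous. In the system described above, that is where additional context such as recent game events would come into play.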
In the second stage, GAME SIGHT refines the commentary by incorporating external and internal knowledge. It uses a soccer knowledge-augmented generation system to pull in historical statistics, such as a player's goal-scoring record or a team's past performance, and tracks internal game context like scores and key events. This allows the AI to add explanations and comments beyond mere description. For example, in evaluations, GAME SIGHT showed high accuracy in referencing knowledge, with 98.76% accuracy for goal-related information and 90.74% for other context aspects. The system's commentary was compared to live televised transcripts and outperformed previous models in metrics like deep coherence and anaphor overlap, indicating better narrative flow and contextual relevance.
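The second stage can be pictured with a small sketch as well. This is an illustration under stated assumptions rather than the actual knowledge-augmented generation system: the `GameContext` class, the `enrich` function, the `season_goals` field, and all numbers are hypothetical, and a simple dictionary lookup stands in for retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class GameContext:
    """Internal game state tracked across segments (hypothetical structure)."""
    score: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def record_goal(self, team, scorer, minute):
        self.score[team] = self.score.get(team, 0) + 1
        self.events.append((minute, "goal", scorer))


def enrich(description, player, ctx, stats):
    """Append external statistics and internal context to a plain description."""
    parts = [description]
    if player in stats:
        goals = stats[player]["season_goals"]
        parts.append(f"{player} has {goals} goals this season.")
    if ctx.score:
        scoreline = ", ".join(f"{team} {n}" for team, n in sorted(ctx.score.items()))
        parts.append(f"Score: {scoreline}.")
    return " ".join(parts)


# Made-up data for illustration only.
ctx = GameContext()
ctx.record_goal("Home", "Player A", minute=12)
stats = {"Player A": {"season_goals": 7}}
print(enrich("Player A scores.", "Player A", ctx, stats))
```

Keeping the external statistics separate from the running game context mirrors the split between external and internal knowledge described above.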
The results demonstrate that GAME SIGHT produces commentary that is not only more accurate but also more engaging. In segment-level evaluations, it achieved competitive scores on metrics like BLEU and ROUGE when compared to live televised commentary, and at the game level, it matched the structural composition of professional broadcasts, with less than 50% descriptive content and more explanation and commentary. Human evaluations also favored GAME SIGHT, with a mean opinion score of 4.08 out of 5, compared to 3.14 for live text commentary. This suggests the AI can enhance the viewing experience by providing insights similar to those of human commentators, potentially benefiting broadcasters and fans who seek richer, more informative coverage.
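The game-level composition figure is a share of commentary units by type. A minimal sketch of that arithmetic follows, assuming each unit has already been labeled as description, explanation, or comment; the `composition` function and the labels shown are hypothetical and are not data from the study.

```python
from collections import Counter


def composition(labels):
    """Return the share of each commentary type among the labeled units."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Made-up labels for five commentary units.
labels = ["description", "explanation", "comment", "description", "explanation"]
shares = composition(labels)
print(shares["description"] < 0.5)  # descriptive share is below one half here
```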
Despite its advancements, GAME SIGHT has limitations that highlight areas for future research. The system is currently tailored specifically for soccer and may not generalize easily to other sports without adaptation to their unique event structures and knowledge bases. Real-time processing remains a challenge, as the model operates on fixed-length video segments, introducing latency that could affect live commentary applications. Additionally, while it integrates statistical knowledge, it does not incorporate more diverse sources like player health data or social media trends, and it lacks the creative, intuitive touch of human commentators, such as humor or emotional expression. Addressing these issues could further bridge the gap between AI-generated and human commentary.
Overall, GAME SIGHT represents a significant step forward in automated sports commentary, leveraging visual reasoning and knowledge enhancement to create more human-like narratives. By accurately grounding entities and adding contextual depth, it paves the way for AI systems that can enrich sports broadcasting and fan engagement. As the technology evolves, it may expand to other domains, but for now, it offers a promising tool for making soccer viewing more informative and captivating.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.