AI Is Changing How Gymnastics Gets Judged

TL;DR

A multi-camera AI system from Fujitsu Research uses gymnastics domain knowledge to track gymnasts in 3D in real time, supporting more consistent and objective judging. Here is how the technology works.

In international gymnastics, precise and objective judging has long been an elusive goal, and a study from Fujitsu Research takes a step toward it. Researchers Fan Yang, Shigeyuki Odashima, Shoichi Masui, Ikuo Kusajima, Sosuke Yamao, and Shan Jiang have developed a robust multi-camera tracking system that uses artificial intelligence to monitor gymnasts in real time, addressing unique challenges such as limited camera installations and variable venue conditions. The system, detailed in their paper, has already been deployed at the Gymnastics World Championships, earning accolades from the International Gymnastics Federation for providing near-instant feedback and improving the fairness of competitions. By building domain-specific knowledge into its AI algorithms, it marks a significant step forward in sports technology and could set a new standard for how athletic performances are analyzed and scored.

The methodology behind this system is both intricate and adaptive, designed to overcome the difficulty of tracking gymnasts who perform rapid, airborne movements in crowded stadiums. Using four calibrated RGB cameras positioned around the performance area, the framework first runs person detectors, such as YOLOX-S, in parallel to generate bounding boxes from each video stream. These 2D detections then pass through a cascaded data association stage that switches dynamically between traditional triangulation and a novel ray-plane intersection technique, depending on how many cross-view detections are available. When detections from opposing cameras are sparse (a common result of occlusions or lighting changes), the system draws on gymnastics domain knowledge: it assumes the gymnast's center lies within a predefined vertical plane during much of the routine, and intersects a single camera ray with that plane to estimate the 3D position more reliably. This hybrid approach reduces tracking failures and yields robust 3D trajectories even where conventional methods would falter, such as when only two opposing views are available.
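The switch between triangulation and ray-plane intersection described above can be illustrated with a little geometry. The following is a minimal sketch, not the authors' implementation: it assumes each 2D detection has already been back-projected into a world-space ray using the camera calibration, and the names `estimate_position`, `triangulate_midpoint`, `ray_plane_intersection` and the `min_sin2` threshold are hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Ray = Tuple[Vec3, Vec3]  # (camera center, viewing direction) in world coordinates


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def along(origin: Vec3, direction: Vec3, t: float) -> Vec3:
    """Point at parameter t on the ray origin + t * direction."""
    return (origin[0] + t * direction[0],
            origin[1] + t * direction[1],
            origin[2] + t * direction[2])


def ray_plane_intersection(ray: Ray, plane_point: Vec3, plane_normal: Vec3,
                           eps: float = 1e-9) -> Optional[Vec3]:
    """Intersect one camera ray with the assumed vertical plane."""
    origin, direction = ray
    denom = dot(direction, plane_normal)
    if abs(denom) < eps:
        return None  # ray runs parallel to the plane
    offset = (plane_point[0] - origin[0],
              plane_point[1] - origin[1],
              plane_point[2] - origin[2])
    t = dot(offset, plane_normal) / denom
    if t < 0:
        return None  # plane lies behind the camera
    return along(origin, direction, t)


def triangulate_midpoint(ray1: Ray, ray2: Ray,
                         min_sin2: float = 0.1) -> Optional[Vec3]:
    """Midpoint of the shortest segment between two rays.

    Returns None when the rays are close to parallel or anti-parallel
    (e.g. two opposing cameras), where the estimate is ill-conditioned.
    The threshold min_sin2 is an assumed value, not one from the paper.
    """
    (o1, d1), (o2, d2) = ray1, ray2
    w = (o1[0] - o2[0], o1[1] - o2[1], o1[2] - o2[2])
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b  # equals |d1|^2 |d2|^2 sin^2(angle between rays)
    if denom <= min_sin2 * a * c:
        return None
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1, p2 = along(o1, d1, s), along(o2, d2, t)
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)


def estimate_position(rays: List[Ray], plane_point: Vec3,
                      plane_normal: Vec3) -> Tuple[Optional[Vec3], str]:
    """Cascade: try triangulation first, fall back to ray-plane intersection."""
    if len(rays) >= 2:
        point = triangulate_midpoint(rays[0], rays[1])
        if point is not None:
            return point, "triangulation"
    for ray in rays:
        point = ray_plane_intersection(ray, plane_point, plane_normal)
        if point is not None:
            return point, "ray-plane"
    return None, "none"
```

With two well-separated views the sketch triangulates; with two nearly opposing views, or a single view, it falls back to the plane assumption. A real system would also need to associate detections across cameras and over time, which this example leaves out.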

Experimental results validate the approach, with extensive testing on a diverse dataset covering seven gymnastics disciplines, including Balance Beam, Pommel Horse, and Uneven Bars. Compared to state-of-the-art multi-camera tracking methods like MvPT and UniMMT, the proposed system significantly reduces identity switches and average Euclidean distance errors in 3D tracklets, particularly in challenging cases with limited camera views. For instance, in Uneven Bars routines with only two opposing views, the system cut ID switches by up to 89.5% and improved tracking accuracy by over 90%, as measured against ground truth annotations. It also remains computationally efficient, running at over 400 frames per second without relying on appearance features, which makes it suitable for real-time use in live competitions and training sessions.
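To make the two reported metrics concrete, here is a minimal sketch of how an average Euclidean distance error and an identity-switch count might be computed for one tracked gymnast. The function names and the toy numbers are hypothetical, chosen only for illustration, and the paper's exact evaluation protocol may differ.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


def mean_euclidean_error(predicted: Sequence[Point3],
                         ground_truth: Sequence[Point3]) -> float:
    """Average 3D distance between predicted and annotated positions."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("tracklets must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


def count_id_switches(track_ids: List[Optional[int]]) -> int:
    """Count how often the track ID matched to one person changes.

    track_ids holds, per frame, the tracker ID matched to the same
    ground-truth gymnast; None marks a frame with no match and is skipped.
    """
    switches = 0
    last_id = None
    for track_id in track_ids:
        if track_id is None:
            continue
        if last_id is not None and track_id != last_id:
            switches += 1
        last_id = track_id
    return switches


def percent_reduction(baseline: float, proposed: float) -> float:
    """Relative improvement of the proposed value over a non-zero baseline."""
    return 100.0 * (baseline - proposed) / baseline
```

As a toy example with made-up values, a baseline tracker with 10 identity switches and a proposed tracker with 2 gives `percent_reduction(10, 2)`, an 80% reduction. The percentages quoted above are the same kind of relative comparison against baseline methods.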

The implications of this technology extend far beyond gymnastics, offering a blueprint for integrating domain knowledge into AI systems in other fields. In sports, it could improve performance analysis and injury prevention by providing precise, data-driven insights into athlete movements, while in broader contexts like surveillance or robotics, similar approaches could improve object tracking in complex environments. Socially, the system promotes fairness and transparency in judging, helping to reduce human bias and enabling coaches to deliver more targeted feedback. Its success at international events also underscores the potential for AI to augment human expertise in high-pressure settings.

Despite its achievements, the study acknowledges certain limitations, such as occasional identity switches in extremely crowded scenes and an advantage that depends on context in environments with ample camera coverage. Because the system relies on geometric cues alone for speed, it omits appearance features, which could improve robustness but at a computational cost. Future work could explore hybrid models that balance accuracy and efficiency, or adapt the framework to other sports with similar constraints. Overall, this research advances multi-camera tracking and highlights how domain-specific insight can make AI systems more reliable, in sports and beyond.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
