A new artificial intelligence system can translate individual sign language gestures into text in real time, potentially breaking down communication barriers for deaf communities worldwide. Developed by researchers at Vietnam National University, this technology could transform how deaf individuals access healthcare, education, and public services that often exclude them because of language barriers.
The key breakthrough is a hybrid AI system that achieves 92% accuracy in recognizing sign language gestures, with particularly strong performance on distinct signs like "Hello" (95% accuracy) and "Thank you" (91% accuracy). The system works by combining two types of neural networks—one that analyzes spatial patterns in hand and body positions, and another that tracks how these positions change over time to understand complete gestures.
Researchers built the system so that it needs no special gloves or sensors. Instead, it uses ordinary camera footage processed through Google's MediaPipe technology to identify 522 key points on a person's body, including 21 points on each hand, 468 facial points, and 33 body joint positions. (The 522 total equals 21 + 468 + 33, so it appears to count the hand points once; counting both hands separately would give 543.) Think of it like creating a digital skeleton that captures not just hand shapes but also facial expressions and body movements, all essential components of sign language communication.
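To make the "digital skeleton" idea concrete, here is a minimal sketch of how one frame's key points might be flattened into a single feature vector. It is not the researchers' code: the function names, the (x, y, z) tuple layout, the choice to track both hands, and the zero-filling of undetected parts are all assumptions made for illustration. Only the per-part counts (33, 468, 21) come from the article.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) for one landmark (assumed layout)

# Landmark counts per body part, as quoted in the article.
POSE_POINTS = 33
FACE_POINTS = 468
HAND_POINTS = 21


def _flatten_part(points: Optional[Sequence[Point]], expected: int) -> List[float]:
    """Flatten one body part to [x0, y0, z0, x1, ...]; zero-fill if undetected."""
    if points is None:
        # Assumption: a hand or face that is out of view becomes all zeros,
        # so every frame has the same vector length.
        return [0.0] * (expected * 3)
    if len(points) != expected:
        raise ValueError(f"expected {expected} landmarks, got {len(points)}")
    return [coord for point in points for coord in point]


def frame_to_features(
    pose: Optional[Sequence[Point]],
    face: Optional[Sequence[Point]],
    left_hand: Optional[Sequence[Point]],
    right_hand: Optional[Sequence[Point]],
) -> List[float]:
    """Concatenate pose, face, and both hands into one fixed-length vector."""
    return (
        _flatten_part(pose, POSE_POINTS)
        + _flatten_part(face, FACE_POINTS)
        + _flatten_part(left_hand, HAND_POINTS)
        + _flatten_part(right_hand, HAND_POINTS)
    )
```

With both hands tracked, each frame yields (33 + 468 + 21 + 21) × 3 = 1,629 numbers. In a real pipeline the four inputs would come from a landmark detector such as MediaPipe rather than being built by hand.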
The system combines convolutional neural networks (CNNs), which excel at analyzing spatial relationships in single frames, with long short-term memory (LSTM) networks that track how these relationships evolve across 30 consecutive video frames. This dual approach allows the AI to understand both the shape of each gesture and its timing and flow, much like how humans process sign language by watching both hand positions and movement patterns.
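The 30-frame window is the piece that turns a stream of single frames into something a sequence model can read. The article does not say how the windows are formed, so the sketch below assumes a simple rolling buffer that keeps the latest 30 frames; the class name `FrameWindow` and its `push` method are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional

WINDOW_SIZE = 30  # frames per gesture window, as stated in the article


class FrameWindow:
    """Rolling buffer that holds the most recent per-frame feature vectors."""

    def __init__(self, size: int = WINDOW_SIZE) -> None:
        self._size = size
        # deque(maxlen=...) silently drops the oldest frame when full.
        self._frames: Deque[List[float]] = deque(maxlen=size)

    def push(self, features: List[float]) -> Optional[List[List[float]]]:
        """Add one frame; return the full window once enough frames exist."""
        self._frames.append(features)
        if len(self._frames) < self._size:
            return None  # not enough context yet to classify a gesture
        return list(self._frames)
```

Each returned window has shape 30 × (features per frame) and is what a CNN-LSTM classifier would consume. Whether the real system classifies on every new frame or only on non-overlapping windows is not stated in the article.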
Performance results show the system achieves an average accuracy of 92% across different signs, with a recall rate of 89% and F1-score of 90.5%. The detailed breakdown reveals it performs exceptionally well on clearly distinct gestures but struggles somewhat with visually similar signs. For example, while it accurately recognizes "Hello" 95% of the time, it confuses "Call" and "Yes" in some cases because these gestures share similar hand positions and movements.
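The three headline numbers fit together if the 92% figure plays the role of precision, since the F1-score is the harmonic mean of precision and recall. That reading is an assumption (the article calls the 92% figure accuracy), but the arithmetic is easy to check:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: the reported 92% is used as precision, 89% as recall.
print(round(f1_score(0.92, 0.89), 3))  # 0.905
```

The result, 0.905, matches the reported F1-score of 90.5%.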
The researchers developed a practical demonstration using Streamlit, creating an interactive interface that shows real-time translation. When users sign in front of a camera, the system immediately displays the predicted meaning as text. This immediate feedback makes the technology accessible and demonstrates its potential for real-world applications.
What makes this development significant is its potential to address real communication gaps. In healthcare settings, it could enable deaf patients to communicate directly with medical staff without interpreters. In education, it could help integrate deaf students into mainstream classrooms and serve as a learning tool for both deaf communities and hearing individuals wanting to learn sign language. For public services like government offices and transportation, it could provide immediate translation assistance.
However, the technology still faces limitations. The system sometimes confuses visually similar gestures, its performance can drop in poor lighting, and it does not yet fully account for individual variations in how people execute signs. The researchers note that these challenges represent opportunities for improvement rather than fundamental flaws.
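One common way to pin down which gestures get mixed up is to count the (true sign, predicted sign) pairs where the model was wrong. The helper below is a generic sketch of that idea, not part of the published code; the function name is hypothetical and the labels in the usage example are toy values invented for illustration, not the study's results.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def most_confused_pairs(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    top: int = 3,
) -> List[Tuple[Tuple[str, str], int]]:
    """Return the most frequent (true, predicted) mistakes, largest first."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    errors = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    return errors.most_common(top)


# Toy example (made-up labels, for illustration only).
truth = ["Hello", "Call", "Call", "Call", "Yes", "Yes"]
preds = ["Hello", "Yes", "Yes", "Call", "Call", "Yes"]
print(most_confused_pairs(truth, preds))
# [(('Call', 'Yes'), 2), (('Yes', 'Call'), 1)]
```

A list like this shows at a glance where extra training data or a more discriminative model would help most.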
The team has made their code publicly available on GitHub, including training scripts, evaluation tools, and the interactive demonstration interface. This openness allows other researchers to build upon their work and could accelerate development of more robust sign language recognition systems.
Future improvements could include adding more contextual information, using advanced attention-based models like Transformers to better distinguish similar gestures, and expanding the vocabulary of recognized signs. As the technology evolves, it could eventually handle complete sentences and conversations rather than individual gestures.
This research represents an important step toward creating more inclusive technology that bridges communication divides. While the system is not perfect, its 92% accuracy in real-time gesture recognition suggests that AI systems are becoming capable enough to handle the complex, nuanced nature of sign language communication.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.