
AI Translators Master Chinese-English With New Training

Tencent's WeChat system took the top constrained score in the WMT20 Chinese-to-English translation task by combining multiple neural architectures with innovative data generation and fine-tuning techniques.

AI Research
November 14, 2025
2 min read

Machine translation between Chinese and English represents one of the most challenging tasks in artificial intelligence, with implications for global communication, business, and cross-cultural exchange. The WMT20 competition serves as a benchmark for measuring progress in this field, where Tencent's WeChat team developed a system that achieved the highest score among constrained submissions.

The key finding from this research is that combining multiple neural network architectures with sophisticated data generation and fine-tuning techniques can significantly improve translation quality. The system achieved a case-sensitive BLEU score of 36.9 on the newstest2020 evaluation, outperforming other submissions in the Chinese-to-English translation task.
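To make the competition metric concrete, here is a simplified, case-sensitive BLEU calculation in plain Python. This is a minimal sentence-level sketch: official WMT scoring uses corpus-level statistics, clipping against multiple references, standard tokenization, and proper smoothing (typically via the sacreBLEU toolkit), none of which are reproduced here.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    """Case-sensitive BLEU (0-100) for one hypothesis/reference pair.
    Uses clipped n-gram precision and the brevity penalty; zero counts
    are floored to avoid log(0), a crude stand-in for real smoothing."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        overlap = sum((hyp_ngrams & ngram_counts(ref, n)).values())  # clipped matches
        total = max(sum(hyp_ngrams.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    geo_mean = math.exp(sum(log_precisions) / max_n)
    # Brevity penalty discourages artificially short hypotheses.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return 100.0 * brevity * geo_mean
```

An exact match scores 100; a hypothesis sharing no n-grams with the reference scores near zero, which is why the 26-to-39 BLEU progression described below represents a large quality gap.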

Methodologically, the researchers employed four different neural network architectures: Deeper Transformer, Wider Transformer, Average Attention Transformer, and DTMT (an RNN-based deep-transition model). Rather than relying on a single approach, they trained multiple variants of each architecture and combined them through ensemble methods. The team generated synthetic training data using back-translation (translating monolingual English text into Chinese to create synthetic source sentences) and knowledge distillation (using a strong teacher model's translations as targets for student models). They also implemented iterative knowledge transfer, where models trained on synthetic data were used to generate new training examples, creating a self-improving cycle.
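The data-generation pipeline described above can be sketched as follows. Everything here is a toy stand-in, not Tencent's code: the `ToyModel` word-for-word translator and the `train()` helper are hypothetical, chosen only to make the back-translation, distillation, and iterative-transfer steps concrete and runnable.

```python
class ToyModel:
    """Hypothetical word-for-word translator standing in for a real NMT model."""
    def __init__(self, table):
        self.table = table  # source word -> target word

    def translate(self, sentence):
        return " ".join(self.table.get(w, w) for w in sentence.split())

def back_translate(en_mono, en_to_zh):
    """Monolingual English -> synthetic (zh, en) pairs via a reverse-direction model."""
    return [(en_to_zh.translate(s), s) for s in en_mono]

def distill(zh_mono, teacher):
    """Teacher translations of monolingual Chinese become student training targets."""
    return [(s, teacher.translate(s)) for s in zh_mono]

def train(pairs):
    """Toy 'training': learn a word table from aligned (zh, en) pairs."""
    table = {}
    for zh, en in pairs:
        table.update(zip(zh.split(), en.split()))
    return ToyModel(table)

def iterative_transfer(parallel, zh_mono, rounds=2):
    """Iterative knowledge transfer: each round's model labels the
    monolingual data that the next round trains on."""
    model = train(parallel)
    for _ in range(rounds):
        model = train(parallel + distill(zh_mono, model))
    return model
```

In the real system each `train()` call is a full GPU training run and the synthetic corpora number in the millions of sentences, but the control flow is the same: model quality and synthetic-data quality improve each other across rounds.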

Results analysis shows progressive improvement through each training stage on the team's development evaluation. Starting from baseline scores around 26.2 BLEU, back-translation provided substantial gains to approximately 29.6 BLEU. The first knowledge-transfer iteration boosted scores to around 38.1 BLEU, with the second transfer providing smaller but consistent improvements. Advanced fine-tuning techniques, including parallel scheduled sampling, target denoising, and minimum risk training, pushed performance to 39.1 BLEU. On the official newstest2020 test set, the final ensemble of 20 diverse models scored the 36.9 BLEU that topped the constrained submissions.
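Of the fine-tuning techniques listed, target denoising is the simplest to illustrate: a fraction of the decoder's input tokens are randomly corrupted during training, so the model learns to keep translating well even when earlier predictions were wrong. The function below is a sketch of that idea; the 15% noise rate and uniform replacement scheme are illustrative choices, not the paper's exact settings.

```python
import random

def denoise_targets(target_tokens, vocab, noise_prob=0.15, seed=0):
    """Replace each target-side token with a random vocabulary word with
    probability noise_prob. Applied only to decoder *inputs* during training;
    the loss is still computed against the clean target sequence."""
    rng = random.Random(seed)
    return [rng.choice(vocab) if rng.random() < noise_prob else tok
            for tok in target_tokens]
```

This narrows the train/inference mismatch (often called exposure bias): at training time the decoder normally sees gold-standard history, while at inference time it sees its own, occasionally wrong, outputs.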

This advancement matters because high-quality Chinese-English translation has real-world applications in international business, diplomacy, and cross-cultural communication. As China's global influence grows, accurate machine translation becomes increasingly important for breaking down language barriers. The techniques demonstrated here—particularly the combination of multiple architectures and iterative training—could be applied to other language pairs and domains beyond translation.

Limitations noted in the research include the computational intensity of the approach, requiring weeks of training on multiple GPUs. The paper also acknowledges that simply combining top-performing models provided only marginal gains (0.1 BLEU), suggesting that model diversity rather than individual performance drives ensemble effectiveness. The research focused specifically on Chinese-English translation, leaving open questions about how well these techniques generalize to other language pairs with different linguistic structures.
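The ensemble finding, that diversity matters more than individual strength, comes down to how per-step predictions are combined. Below is a minimal sketch of uniform probability averaging over a single decoding step; real NMT ensembles typically average log-probabilities inside beam search rather than raw probabilities over a toy vocabulary.

```python
def ensemble_step(distributions):
    """Average next-token probability distributions from several models.

    Each input is a dict mapping token -> probability for one model's
    prediction at the current decoding step."""
    vocab = set().union(*distributions)
    n = len(distributions)
    return {tok: sum(d.get(tok, 0.0) for d in distributions) / n
            for tok in vocab}
```

The intuition for the 0.1-BLEU result: two strong models that agree almost everywhere average to nearly the same distribution as either one alone, so only models that make *different* errors can cancel each other out in the average.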

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn