Hospitals worldwide collect vast amounts of electronic health records (EHRs), but using this data to train artificial intelligence models for predicting patient outcomes has been hampered by a major obstacle: the data is stored in different languages, formats, and coding systems. Traditionally, researchers have had to manually standardize these records, a time-consuming and expensive process that limits the scalability of medical AI. A new study demonstrates a practical solution by using AI to automatically translate and harmonize multilingual EHRs, allowing a single predictive model to learn from diverse international datasets without manual intervention. This approach could accelerate the development of robust AI tools for critical care, making them more accessible across global healthcare systems.
The researchers found that by converting structured EHR data into text and using large language models (LLMs) for word-level translation into English, they could train a unified model that outperforms existing methods. They evaluated this approach on seven publicly available ICU datasets from the United States, the Netherlands, Switzerland, and Austria, covering English, Dutch, and German. The translation-based approach, which aligned non-English terms to English using LLMs such as Qwen3-Instruct-8B, achieved an average AUROC of 0.788 across all datasets when trained jointly, compared to 0.776 for models trained on single datasets. This indicates that pooling data across languages and institutions improves predictive accuracy on tasks such as mortality and acute kidney injury prediction.
The methodology involved a text-based framework in which raw EHR tables were linearized into hierarchical textual sequences without any manual harmonization. For multilingual handling, the team implemented two strategies: applying multilingual encoders directly to mixed-language text, and translating non-English records into English via LLM-based word-level translation. They applied token-level language identification to classify terms as English, Dutch, German, or undetected, then translated Dutch and German tokens using dictionary-based or LLM-based approaches. The translated text was then processed with a shared English tokenizer and encoder, enabling pooled training across all datasets. This schema-agnostic approach eliminated the need for common data models such as OMOP, which typically require extensive manual mapping.
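To make the pipeline concrete, here is a minimal Python sketch of the linearize-then-translate idea. The column names, the tiny Dutch/German dictionary, and the helper functions are illustrative assumptions; the paper applies token-level language identification and LLM-based word translation rather than a fixed lookup table.

```python
# Toy Dutch/German -> English lookup standing in for token-level
# language identification plus LLM-based word translation.
TRANSLATIONS = {
    "hartfrequentie": "heart rate",   # Dutch
    "blutdruck": "blood pressure",    # German
}

def translate_token(token: str) -> str:
    """English or undetected tokens pass through unchanged."""
    return TRANSLATIONS.get(token.lower(), token)

def linearize_record(record: dict) -> str:
    """Flatten one structured EHR row into a textual sequence,
    translating each column name token by token."""
    parts = []
    for column, value in record.items():
        column_en = " ".join(translate_token(t) for t in column.split())
        parts.append(f"{column_en}: {value}")
    return " | ".join(parts)

row = {"hartfrequentie": "92 bpm", "blutdruck": "118/76 mmHg", "age": "64"}
print(linearize_record(row))
# -> heart rate: 92 bpm | blood pressure: 118/76 mmHg | age: 64
```

Because every record ends up as plain English text, a single shared tokenizer and encoder can consume rows from all institutions, regardless of their original schema or language.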
Results from the study, detailed in Table II, show that the translation-based text model consistently outperformed baselines, including feature-aligned methods such as YAIB and code-based approaches. In multi-institutional learning, the LLM-aligned model achieved the highest average AUROC of 0.788, compared to 0.776 for single-dataset training. The researchers also tested transfer learning, in which models pre-trained on source datasets were fine-tuned on target datasets with limited data. As shown in Figure 4, the text-based model with LLM alignment performed comparably to feature-aligned baselines in few-shot scenarios, demonstrating its adaptability to new hospitals without extensive retraining. Ablation studies in Table III confirmed that LLM-based alignment, particularly with Qwen3-Instruct-8B, yielded the best performance under pooled training.
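As a small illustration of the evaluation metric reported above, the sketch below computes AUROC per dataset (via the Mann-Whitney form) and averages it across datasets, mirroring how the pooled-training comparison is summarized. The dataset names, labels, and scores are made up for illustration, not taken from the paper.

```python
def auroc(labels, scores):
    """AUROC as the Mann-Whitney statistic: the probability that a
    randomly chosen positive example is scored above a random negative
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-dataset (labels, model scores) pairs.
datasets = {
    "icu_us": ([0, 1, 1, 0], [0.2, 0.7, 0.3, 0.4]),
    "icu_nl": ([1, 0, 1, 0], [0.9, 0.3, 0.5, 0.1]),
}

per_dataset = {name: auroc(y, s) for name, (y, s) in datasets.items()}
average = sum(per_dataset.values()) / len(per_dataset)
print(per_dataset, round(average, 3))
# -> {'icu_us': 0.75, 'icu_nl': 1.0} 0.875
```

Averaging a per-dataset metric in this way keeps each institution's contribution equal, so a large dataset cannot dominate the headline number.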
The implications of this work are significant for global health AI: it provides a scalable path to leveraging diverse EHR data without the high cost of manual standardization. By enabling language-agnostic predictive models, the approach could help deploy AI tools in resource-limited settings where data harmonization is impractical. It also supports few-shot transfer learning, allowing hospitals to quickly adapt pre-trained models with minimal labeled data. However, the study has limitations: it covers only a small set of languages and relies on off-the-shelf LLMs for translation, which may introduce errors for rare clinical terms. Future work could expand language coverage and explore more robust alignment strategies to further improve model reliability and fairness in multilingual healthcare applications.
Original Source
Read the complete research paper
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
Connect on LinkedIn