AI Fails at Non-English Medical Records, Study Finds

TL;DR

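A new study shows that simple rule-based AI beats advanced language models at extracting patient age and sex from Polish medical records, while the language models hold a slight edge on drug names. The results carry lessons for healthcare systems that work in languages other than English.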

Extracting vital medical information from unstructured clinical notes is a persistent challenge in healthcare, especially in non-English settings where resources are limited. The issue matters because accurate data retrieval can improve diagnoses and treatments, yet many hospitals lack the infrastructure for sophisticated AI tools. A recent study from Poland compares traditional rule-based methods with modern large language models (LLMs) for pulling details such as patient age, sex, medications, and skin lesions from electronic health records. The findings underscore the importance of balancing accuracy, cost, and adaptability in real-world medical applications, particularly in languages with complex grammar like Polish.

Researchers found that rule-based approaches significantly outperformed LLMs in key areas, most notably the extraction of patient age and sex. For instance, rule-based methods built on spaCy's Matcher achieved accuracy above 0.95 for age and above 0.98 for sex across different doctors' records, as reported in the paper. In contrast, LLMs such as Llama-3-8B-it and Gemma-7b-it were less accurate, with Gemma falling as low as 0.160 on age extraction in some cases. The rule-based systems were not only more accurate but also required less computational power, making them suitable for hospitals with limited IT resources. This advantage stems from their ability to handle the grammatical nuances of Polish, where word forms change based on gender and other factors.
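To make the point about gendered word forms concrete, here is a minimal sketch of a dictionary-based sex detector. It uses plain regular expressions rather than spaCy's Matcher, and the cue lists (`FEMALE_CUES`, `MALE_CUES`) and the function name `guess_sex` are hypothetical, chosen for illustration only. The study's actual rules and dictionaries are not reproduced here.

```python
import re
from typing import Optional

# Hypothetical cue lists for illustration; not the study's dictionaries.
# Polish nouns and participles change form with grammatical gender.
FEMALE_CUES = {"pacjentka", "dziewczynka", "przyjęta", "urodzona"}
MALE_CUES = {"pacjent", "chłopiec", "przyjęty", "urodzony"}


def guess_sex(record: str) -> Optional[str]:
    """Return 'F', 'M', or None using gendered words in the first sentence."""
    # Keep only the text before the first sentence-ending punctuation mark.
    first_sentence = re.split(r"[.!?]", record, maxsplit=1)[0].lower()
    # \w is Unicode-aware for str patterns, so Polish letters are kept.
    tokens = set(re.findall(r"\w+", first_sentence))
    has_female = bool(tokens & FEMALE_CUES)
    has_male = bool(tokens & MALE_CUES)
    if has_female == has_male:
        # Either no cue or conflicting cues: refuse to guess.
        return None
    return "F" if has_female else "M"


if __name__ == "__main__":
    print(guess_sex("Dziewczynka 5 4/12 lat przyjęta do oddziału. Wysypka."))
    print(guess_sex("Chłopiec przyjęty z powodu wysypki."))
```

A rule like this costs almost nothing to run, but it only recognizes the word forms someone has listed, which is the brittleness the paper flags for rule-based systems.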

The methodology involved analyzing 1,679 pediatric allergy records from a Polish hospital using two main approaches. Rule-based methods used pattern matching and dictionaries for tasks such as identifying age from formats like '5 4/12' (meaning 5 years and 4 months) and sex from linguistic cues in the first sentence of each record. LLMs, including Llama-3-8B-it and Gemma-7b-it, were given structured prompts to extract the same information, and their performance was also tested on texts translated from Polish to English. The researchers measured accuracy and mean absolute error (MAE) to assess how well each method captured details, ensuring a fair comparison between low-compute rule-based systems and resource-intensive LLMs.
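The age format mentioned above can be handled with a short pattern. The sketch below is an assumption about how such a rule might look, not the study's implementation: the pattern `AGE_PATTERN` and the function `parse_age_years` are hypothetical names, and only the 'years months/12' form described in the article is covered.

```python
import re
from typing import Optional

# Matches "<years> <months>/12", e.g. "5 4/12" for 5 years and 4 months.
# Hypothetical pattern; other age notations are deliberately not handled.
AGE_PATTERN = re.compile(r"(?<![\d/])(\d{1,2})\s+(\d{1,2})/12(?!\d)")


def parse_age_years(text: str) -> Optional[float]:
    """Return the age in years as a float, or None if no valid age is found."""
    match = AGE_PATTERN.search(text)
    if match is None:
        return None
    years = int(match.group(1))
    months = int(match.group(2))
    if months >= 12:
        # "14/12" is not a valid month count, so treat it as unparseable.
        return None
    return years + months / 12


if __name__ == "__main__":
    age = parse_age_years("Dziewczynka 5 4/12 lat przyjęta do oddziału.")
    print(f"{age:.2f}")  # prints 5.33
```

Expressing the age in years keeps it directly comparable with an error metric that is also reported in years.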

Results from the analysis showed clear trade-offs. Rule-based methods excelled at demographic extraction, with an MAE for age as low as 0.150 years (about 1.8 months) for one doctor, while LLMs had errors of up to 0.763 years (over 9 months) in some instances. For drug name recognition, however, LLMs such as Llama outperformed rule-based systems, achieving an average score of 0.858 compared to 0.843, without requiring manually built dictionaries. Translation experiments revealed that converting Polish text to English improved drug extraction but led to information loss, such as gender cues that disappear in English's simpler grammar. These outcomes, illustrated in figures throughout the paper, emphasize that no single method is universally best and that context matters greatly in medical data handling.
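For readers unfamiliar with the two metrics, the following sketch shows how accuracy and MAE are commonly computed, and how an MAE in years converts to months. The function names are illustrative, and the assumption that ages are expressed in years is inferred from the month equivalents quoted in the article. The sample inputs are made up and are not data from the study.

```python
from typing import Sequence


def accuracy(predicted: Sequence[object], actual: Sequence[object]) -> float:
    """Fraction of predictions that exactly match the reference labels."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(predicted)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def years_to_months(years: float) -> float:
    """Convert a duration in years to months."""
    return years * 12


if __name__ == "__main__":
    # Made-up toy values, only to exercise the functions.
    print(mean_absolute_error([5.0, 3.5], [5.5, 3.5]))  # prints 0.25
    # The two MAE figures quoted in the article, converted to months.
    print(f"{years_to_months(0.150):.1f}")  # prints 1.8
    print(f"{years_to_months(0.763):.1f}")  # prints 9.2
```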

The implications of this research are significant for healthcare systems worldwide, especially in non-English-speaking countries where data extraction can affect patient care. By demonstrating that simpler, rule-based AI can be highly effective, the study offers a practical path for hospitals to improve data reliability without heavy investment. However, the paper notes limitations, such as the potential for rule-based systems to fail on new terminology and the tendency of LLMs toward hallucinations or inaccuracies in underrepresented languages. Future work should explore hybrid models that combine the strengths of both approaches, aiming for more robust and efficient clinical NLP tools in diverse settings.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
