AI Ranks Genes to Diagnose Rare Diseases Faster

TL;DR

A new AI system combines language models with medical data to rank genes accurately for rare disease diagnosis, cutting errors and saving clinicians time.

Rare diseases affect millions worldwide, yet patients often wait years for a diagnosis due to the complexity of linking genetic variants to symptoms. Clinicians typically sift through extensive medical literature and databases manually, a process prone to oversight and delay. A new AI tool, LA-MARRVEL, addresses this by improving the accuracy and interpretability of gene prioritization in rare disease cases, offering a more reliable aid for time-strapped medical teams.

LA-MARRVEL operates as a reranker on top of an existing high-recall pipeline, AI-MARRVEL, which supplies curated gene and variant context. It queries a large language model multiple times using structured prompts that include Human Phenotype Ontology (HPO) terms, variant details, and inheritance patterns. Because each run can return a different, possibly partial ranking, the system aggregates the runs with Tideman's ranked-pairs voting method, which locks in pairwise preferences from the strongest majority to the weakest and skips any preference that would create a cycle. The result is a stable, consensus-based gene order that is less sensitive to the randomness of any single run.
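To make the aggregation step concrete, here is a minimal sketch of ranked-pairs voting over several LLM runs. It is an illustration under stated assumptions, not the authors' implementation: the function names, the alphabetical tie-breaking, and the rule that a pair of genes is only compared in runs that list both are all choices made for this example, and GENE_X is a placeholder gene name.

```python
from itertools import combinations


def _reaches(graph, start, target):
    """Return True if `target` is reachable from `start` along locked edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


def ranked_pairs(rankings):
    """Aggregate (possibly partial) rankings into one consensus order."""
    candidates = sorted({gene for ranking in rankings for gene in ranking})

    # wins[(a, b)] = number of runs that place a above b.
    # Assumption: a pair is only counted in runs that list both genes.
    wins = {}
    for ranking in rankings:
        pos = {gene: i for i, gene in enumerate(ranking)}
        for a, b in combinations(candidates, 2):
            if a in pos and b in pos:
                pair = (a, b) if pos[a] < pos[b] else (b, a)
                wins[pair] = wins.get(pair, 0) + 1

    # Keep each pairwise majority with its margin of victory.
    majorities = []
    for a, b in combinations(candidates, 2):
        ab, ba = wins.get((a, b), 0), wins.get((b, a), 0)
        if ab > ba:
            majorities.append((ab - ba, a, b))
        elif ba > ab:
            majorities.append((ba - ab, b, a))
    # Strongest margin first; names break ties (an assumption of this sketch).
    majorities.sort(key=lambda m: (-m[0], m[1], m[2]))

    # Lock each majority in unless it would close a cycle.
    locked = {c: set() for c in candidates}
    for _, winner, loser in majorities:
        if not _reaches(locked, loser, winner):
            locked[winner].add(loser)

    # Read the consensus order off the locked graph, sources first.
    order, remaining = [], set(candidates)
    while remaining:
        sources = sorted(
            c for c in remaining
            if not any(c in locked[other] for other in remaining)
        )
        order.append(sources[0])
        remaining.remove(sources[0])
    return order


# Three hypothetical LLM runs over the same candidate genes.
runs = [
    ["SPG7", "CLDN16", "GENE_X"],
    ["SPG7", "GENE_X", "CLDN16"],
    ["CLDN16", "SPG7", "GENE_X"],
]
print(ranked_pairs(runs))
```

In this toy input, SPG7 beats both other genes in a majority of runs, so it ends up first even though one run ranked it second. That is the sense in which repeated runs smooth out a single noisy ranking.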

LA-MARRVEL was evaluated on three independent clinical cohorts: Baylor Genetics (BG), Deciphering Developmental Disorders (DDD), and the Undiagnosed Diseases Network (UDN). In all three it consistently outperformed established tools like Exomiser and LIRICAL. As shown in Figure 2, it achieved the highest recall (the fraction of cases where the causal gene appears among the top-K candidates) at every K from 1 to 10. Notably, at the Top-1 and Top-3 thresholds, LA-MARRVEL improved recall by roughly 5 to 10 percentage points over AI-MARRVEL, 20 to 30 points over Exomiser, and 30 to 45 points over LIRICAL. In the UDN cohort, it reached about 90% recall at Top-3, while the next best method trailed by a wide margin. The tool was particularly effective at rescuing cases where the causal gene was initially poorly ranked, with 100% improvement reported for many genes originally placed lower in the list.
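The recall metric used in these comparisons is simple to compute. The sketch below shows one way to do it. The function name and the toy cases are hypothetical and are not data from the study; the GENE_ names are placeholders.

```python
def recall_at_k(ranked_lists, causal_genes, k):
    """Fraction of cases whose causal gene appears in the top-k candidates."""
    if len(ranked_lists) != len(causal_genes):
        raise ValueError("need exactly one causal gene per ranked list")
    if not causal_genes:
        raise ValueError("need at least one case")
    hits = sum(
        1
        for ranked, causal in zip(ranked_lists, causal_genes)
        if causal in ranked[:k]
    )
    return hits / len(causal_genes)


# Toy cases for illustration only (not results from the study).
ranked_lists = [
    ["SPG7", "CLDN16"],              # causal gene ranked first
    ["GENE_A", "GENE_B", "GENE_C"],  # causal gene ranked third
    ["GENE_D", "GENE_E"],            # causal gene missing from the list
]
causal_genes = ["SPG7", "GENE_C", "GENE_Z"]

for k in (1, 3):
    print(f"Recall@{k}: {recall_at_k(ranked_lists, causal_genes, k):.2f}")
```

Here one of the three cases is solved at Top-1 and two of three at Top-3. A case whose causal gene never enters the candidate list counts as a miss at every K, which is why a reranker depends on a high-recall upstream pipeline.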

LA-MARRVEL's success hinges on its knowledge-grounded prompts: in ablation, removing HPO information caused the largest drop in accuracy, including a 20.22 percentage point decrease in Recall@1. The system also includes an explainer feature that generates plain-language justifications for gene rankings, detailing phenotypic matches, variant assessments, and inheritance checks based on ACMG evidence codes. In one case study, for example, it correctly promoted the SPG7 gene by highlighting strong phenotype alignment despite molecular concerns, and demoted CLDN16 because of weak support and inheritance mismatches. This transparency helps clinicians quickly verify results and focus on substantive decisions.

The implications for real-world healthcare are substantial: by delivering more accurate and interpretable gene lists, LA-MARRVEL can reduce the burden on clinicians, shorten diagnostic odysseys, and improve patient outcomes. However, the study notes limitations, including a trade-off between the number of genes considered and performance: enlarging the candidate set can boost recall at higher ranks but may slightly reduce Top-1 accuracy. Future work could explore integrating richer data sources and adaptive ensembling to enhance efficiency in clinical settings.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
