AIResearch
Science

AI Finds Medical Terms That Don't Officially Exist

A new method uses hyperbolic geometry to locate concepts in medical databases even when the exact words aren't listed, improving search for electronic health records.

AI Research
March 26, 2026
3 min read

When doctors or researchers search medical databases for terms like 'tingling pins sensation,' they often hit a dead end because such phrases aren't officially listed in standardized ontologies like SNOMED CT. This gap can hinder clinical decision-making and data analysis, as these out-of-vocabulary queries represent real-world concepts that lack direct matches. A new study from the University of Manchester addresses this by applying artificial intelligence to map these missing terms onto existing hierarchical structures, offering a more intuitive way to navigate complex biomedical knowledge.

The researchers developed a retrieval method that uses language model-based ontology embeddings to find relevant concepts for out-of-vocabulary queries. They focused on SNOMED CT, a widely used biomedical ontology with over 350,000 concepts organized hierarchically, where concepts like 'Pins and needles' are subsumed under broader categories like 'Paresthesia.' The key finding is that their approach, particularly the Ontology Transformer model, outperformed existing methods by accurately identifying parent and ancestor concepts for queries that have no equivalent in the ontology. For example, for the query 'tingling pins sensation,' the system could correctly retrieve 'Pins and needles' as the most direct subsumer, as illustrated in Figure 1 of the paper.
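At its core, this kind of retrieval is a nearest-neighbour search over concept embeddings. The sketch below uses made-up toy vectors and a hypothetical `rank_subsumers` helper purely for illustration; the actual system embeds queries and SNOMED CT concepts with its fine-tuned language models rather than hand-written vectors.

```python
import numpy as np

# Toy 2-D "embeddings" for a few SNOMED CT-style concepts. In the real
# system these come from a fine-tuned language model, not hand-picked values.
concept_embeddings = {
    "Pins and needles": np.array([0.9, 0.1]),
    "Paresthesia": np.array([0.7, 0.3]),
    "Pain": np.array([0.1, 0.9]),
}

def rank_subsumers(query_vec, embeddings):
    """Rank ontology concepts by cosine similarity to the query embedding."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(embeddings, key=lambda c: cos(query_vec, embeddings[c]),
                  reverse=True)

# An out-of-vocabulary query such as 'tingling pins sensation' would be
# embedded by the same model; here we fake a vector near 'Pins and needles'.
query = np.array([0.88, 0.12])
print(rank_subsumers(query, concept_embeddings))  # 'Pins and needles' first
```

The paper's hyperbolic scoring replaces plain cosine similarity, but the retrieval loop (embed query, score every concept, sort) has this shape.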

To achieve this, the team employed two main techniques: the Hierarchy Transformer and the Ontology Transformer. These models embed SNOMED CT concepts into a hyperbolic space—a geometric representation where concepts are positioned according to their hierarchical relationships, with more general concepts near the origin and more specific ones toward the edges. They fine-tuned pre-trained language models using contrastive learning to capture the ontology's structure, then used scoring functions that combine hyperbolic distance and depth-based metrics to rank potential matches. This allowed them to handle not only simple hierarchies but also complex logical relationships expressed in OWL, such as conjunctions and existential restrictions.
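The geometry here can be made concrete. The Poincaré ball is a standard model of hyperbolic space, and its geodesic distance has a closed form. The combined score below is an illustrative assumption, not the paper's exact formula: it prefers concepts that are both close to the query and deep (specific), since the most direct subsumer is the deepest ancestor. The `alpha` trade-off weight is made up.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance in the Poincaré ball, a standard model of
    hyperbolic space. Points must have Euclidean norm < 1."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return float(np.arccosh(1 + 2 * sq / denom))

def subsumer_score(query, concept, alpha=1.0):
    """Illustrative score: reward closeness to the query and depth
    (norm ~ specificity in hyperbolic embeddings), so the most direct
    subsumer outranks very general ancestors. The paper's actual
    scoring functions differ in detail."""
    depth = np.linalg.norm(concept)
    return -poincare_distance(query, concept) + alpha * depth

general = np.array([0.1, 0.1])    # near the origin: a general concept
specific = np.array([0.6, 0.6])   # toward the boundary: a specific concept
query = np.array([0.55, 0.55])    # embedded out-of-vocabulary query

print(subsumer_score(query, general), subsumer_score(query, specific))
```

Because hyperbolic volume grows exponentially toward the boundary, this space can pack a tree with 350,000+ concepts into few dimensions, which is why it suits ontology hierarchies.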

The results, detailed in Tables 1, 2, and 3 of the paper, show that the Ontology Transformer model achieved a mean reciprocal rank of 0.63 in the single-target setting, meaning it ranked the correct most direct subsumer highly on average. In multi-target evaluations, where ancestors within up to five hops were considered relevant, performance improved further, with hit rates reaching 88% for the top five retrieved concepts. The model consistently outperformed baselines such as SBERT and lexical matching methods, with a median rank of 1 across all tests, indicating it often placed the correct concept at the top of the list. The researchers also found that training on semantically coherent fragments of the ontology, rather than the entire SNOMED CT, yielded better results, as seen with the 'OnT Mini' variant.
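Both reported metrics follow directly from the per-query rank of the correct concept. The definitions below are the standard ones; the five example ranks are hypothetical (the paper's evaluation used 50 annotated queries).

```python
def mean_reciprocal_rank(ranks):
    """MRR: average of 1/rank over queries, using 1-based ranks of the
    correct (most direct) subsumer."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hit_rate(ranks, k):
    """Hits@k: fraction of queries whose correct concept appears in the
    top k retrieved results."""
    return sum(r <= k for r in ranks) / len(ranks)

# Hypothetical ranks for five queries, for illustration only.
ranks = [1, 1, 2, 5, 10]
print(mean_reciprocal_rank(ranks))  # perfect ranking would give 1.0
print(hit_rate(ranks, 5))
```

A median rank of 1, as reported in the paper, simply means that for at least half the queries the correct concept was the first result.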

This advancement has significant implications for healthcare and data management, as it enables more effective retrieval in electronic health records and clinical decision support systems. By allowing users to find relevant concepts even when their search terms aren't explicitly listed, the method can improve terminology navigation and semantic interoperability between different medical systems. The approach is generalizable to ontologies beyond SNOMED CT, potentially benefiting fields like biology or law where hierarchical knowledge structures are common. The researchers have released their code and datasets publicly, encouraging further development and application in real-world settings.

However, the study has limitations, primarily its small evaluation dataset of only 50 manually annotated queries, constrained by time and annotation effort. This limits the statistical robustness of the results and may not fully capture the diversity of out-of-vocabulary terms in practice. Future work should expand the dataset, involve domain experts in annotation, and explore training strategies for larger ontologies. Despite these constraints, the work represents a step forward in making AI-driven retrieval more adaptable and useful in complex, hierarchical knowledge systems.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn