Medical billing errors cost the U.S. healthcare system billions annually, with Medicare alone making $6.7 billion in inappropriate payments for evaluation and management services in 2010. A new artificial intelligence system developed by Oracle Health researchers tackles this problem head-on, reporting accuracy more than 36% higher than a commercial coding system while providing transparent reasoning for every decision.
The ProFees framework uses large language models to automate the complex process of assigning Current Procedural Terminology (CPT) codes to patient encounters. The system addresses five major challenges in medical coding: the absence of intermediate decision labels, disagreement among human coders, lack of explainability, model inconsistency, and the need for broad clinical knowledge. On a test set of 99 real-world patient encounters, ProFees achieved 36.85% higher accuracy than a commercial product identified only as System A and outperformed the strongest single-prompt baseline by 4.73%.
The researchers took a modular approach in which separate AI components handle different aspects of the coding process. The system first classifies the encounter type, then assesses medical decision-making complexity across three elements: problems addressed, data reviewed, and risk of complications. Each prediction then undergoes recursive criticism and improvement, in which the model systematically audits its own decisions against comprehensive checklists aligned with official coding guidelines.
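The critique-and-revise loop described above can be pictured with a small sketch. This is not the ProFees implementation: the function name refine_with_checklist, the critique and revise callables (which would wrap language model calls in a real system), the two-round limit, and the toy stand-ins below are all assumptions made for illustration.

```python
from typing import Callable, List


def refine_with_checklist(
    draft: str,
    checklist: List[str],
    critique: Callable[[str, str], str],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 2,
) -> str:
    """Audit a draft against every checklist item and revise until no issues remain.

    critique(draft, item) returns "" when the item is satisfied, else an issue message.
    revise(draft, issues) returns an improved draft. Both are hypothetical hooks.
    """
    current = draft
    for _ in range(max_rounds):
        issues = []
        for item in checklist:
            message = critique(current, item)
            if message:
                issues.append(message)
        if not issues:
            break  # the draft passed every checklist item
        current = revise(current, issues)
    return current


# Toy stand-ins so the loop can run without a language model (assumptions).
def toy_critique(draft: str, item: str) -> str:
    return "" if item in draft else f"missing: {item}"


def toy_revise(draft: str, issues: List[str]) -> str:
    additions = [issue.split(": ", 1)[1] for issue in issues]
    return draft + " | " + ", ".join(additions)


result = refine_with_checklist(
    "problems", ["problems", "data", "risk"], toy_critique, toy_revise
)
print(result)
```

In this toy run the first pass flags the two elements the draft never mentions, the revision adds them, and the second pass finds nothing left to fix, so the loop stops early.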
To ensure reliability, the system runs three parallel inferences for every encounter and uses majority voting to produce final predictions. This self-consistency approach addresses the inherent variability in language models while maintaining reasonable computational costs. The framework also incorporates dynamic few-shot prompting, retrieving relevant examples from a curated database of expert-annotated cases to guide its reasoning process.
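As a rough illustration of the two ideas in this paragraph, the sketch below retrieves similar annotated cases and then takes a majority vote over three predictions. Everything in it is an assumption rather than the researchers' code: the word-overlap similarity is a simple stand-in for whatever retrieval method ProFees uses, the tie-breaking rule (first prediction seen wins) is arbitrary, and the example notes and code labels are toy data, not coding guidance.

```python
from collections import Counter
from typing import List, Tuple


def retrieve_examples(
    note: str, bank: List[Tuple[str, str]], k: int = 2
) -> List[Tuple[str, str]]:
    """Return the k (text, label) cases sharing the most words with the note.

    Uses Jaccard word overlap as a stand-in similarity measure (assumption).
    """
    note_words = set(note.lower().split())

    def overlap(case: Tuple[str, str]) -> float:
        words = set(case[0].lower().split())
        union = note_words | words
        return len(note_words & words) / len(union) if union else 0.0

    return sorted(bank, key=overlap, reverse=True)[:k]


def majority_vote(predictions: List[str]) -> Tuple[str, float]:
    """Return the most common prediction and the share of runs that agreed.

    On a tie, the prediction that appeared first wins (assumption).
    """
    winner, count = Counter(predictions).most_common(1)[0]
    return winner, count / len(predictions)


# Toy annotated cases (made up for illustration only).
bank = [
    ("established patient diabetes follow-up", "99214"),
    ("new patient knee pain", "99203"),
    ("established patient blood pressure check", "99213"),
]
examples = retrieve_examples("established patient diabetes medication review", bank)

# Three hypothetical model outputs for the same encounter.
final_code, agreement = majority_vote(["99214", "99213", "99214"])
print(examples, final_code, agreement)
```

Reporting the agreement share alongside the winning code is a design choice of this sketch: a two-of-three result could be routed to a human reviewer, in keeping with the assistive role the researchers describe.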
The practical impact is substantial. In Florida alone, nearly 9% of primary care visits were undercoded, resulting in approximately $114 million in lost hospital revenue annually. ProFees not only improves accuracy but also provides human-readable justifications for every coding decision, which is crucial for audits and compliance. The system's chain-of-thought approach generates step-by-step reasoning that clinicians and auditors can review and verify.
Despite these advances, the system currently predicts only one code per encounter and relies on limited real-world datasets due to privacy constraints. The researchers acknowledge potential algorithmic biases and are developing synthetic datasets to enable broader evaluation. They emphasize that ProFees serves as an assistive tool rather than an autonomous decision-maker, with human professionals maintaining ultimate responsibility for coding accuracy.
The work demonstrates how AI can handle complex, rule-based decision-making in regulated domains while providing the transparency needed for real-world deployment. Beyond healthcare billing, the approach could extend to other domains requiring auditable, rule-aligned predictions where accuracy and explainability are equally important.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.