Medical education faces a critical gap in radiology training: novice readers need iterative feedback to build the pattern-recognition and diagnostic-reasoning skills required to interpret chest X-rays, the most common radiological examination. Traditional simulators often rely on static quizzes that never explain why a diagnosis is correct, while human tutoring is limited by faculty availability. Intelligent tutoring systems have shown promise but typically focus on knowledge assessment and concept mapping, with limited integration of image-based learning. Recent vision-language models like CheXagent and Radiology-GPT demonstrate image-grounded reasoning but lack formal multi-agent orchestration, mastery tracking, and eye-gaze validation. IMACT-CXR addresses these gaps with an interactive multi-agent conversational tutor that unifies spatial annotation, gaze analysis, knowledge retrieval, and image-grounded reasoning in a single workflow, offering trainees a more comprehensive and adaptive learning tool.
IMACT-CXR helps trainees interpret chest X-rays by simultaneously processing their bounding boxes, gaze samples, and free-text observations through a coordinated system of specialized agents. The system validates spatial annotations using an IoU threshold of 0.6 to measure overlap between student and expert bounding boxes, and it analyzes gaze data with a lung-lobe segmentation module derived from a TensorFlow U-Net, computing metrics such as coverage ratio, dwell-time ratio, and sequence score. Bayesian Knowledge Tracing maintains skill-specific mastery estimates that drive adaptive feedback, such as deciding when to reinforce knowledge with PubMed evidence, suggest similar cases from the REFLACX dataset, or trigger NV-Reason-CXR-3B for vision-language reasoning. Safety mechanisms, including ground-truth sanitization and progressive disclosure, prevent premature answer leakage, preserving discovery-based learning while ensuring the tutor responds only after validating all learner evidence.
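To make the spatial-validation gate concrete, here is a minimal sketch of an IoU check against the paper's 0.6 threshold. The function names and the `(x1, y1, x2, y2)` box format are illustrative assumptions, not the system's actual API.

```python
# Illustrative sketch of the IoU gate described above; the 0.6 threshold
# comes from the paper, everything else is an assumption.

IOU_THRESHOLD = 0.6  # minimum overlap for a student box to "pass"

def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def passes_iou_gate(student_box, expert_box):
    """True when the student's annotation overlaps the expert's enough."""
    return iou(student_box, expert_box) >= IOU_THRESHOLD
```

A box that covers 80% of the expert region passes the gate, while one shifted to a neighboring area typically falls well below 0.6 and triggers localization feedback instead of diagnostic discussion.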
The methodology centers on an AutoGen-based orchestration workflow that executes stages synchronously: focus validation; assessment and mastery update; decision routing among Socratic prompts, knowledge snippets, case-similarity suggestions, and NV-Reason reasoning; and finally faculty-style response generation. Key components include a Socratic tutor agent that generates open-ended coaching statements, a knowledge-base agent that retrieves PubMed evidence or synthesizes summaries, a case-similarity agent that surfaces up to three similar cases from REFLACX, and a reasoning agent that provides step-by-step image-grounded guidance using NV-Reason-CXR-3B. The system is deployed on a workstation with 8 NVIDIA RTX 8000 GPUs and uses GPT-4 for the LLM-based agents, with performance profiling showing bounded latency: 24.07 seconds for a full turn without reasoning and 66.50 seconds with reasoning.
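The mastery update and decision routing above can be sketched with the standard Bayesian Knowledge Tracing recurrence. The parameter values (slip, guess, transition) and the routing thresholds below are illustrative assumptions; the paper does not report them.

```python
# Standard BKT step: posterior over mastery given the response, then a
# learning transition. Parameter values here are assumed for illustration.

def bkt_update(p_mastery, correct, p_slip=0.1, p_guess=0.2, p_transit=0.15):
    """Return the updated mastery estimate after one observed response."""
    if correct:
        post = p_mastery * (1 - p_slip) / (
            p_mastery * (1 - p_slip) + (1 - p_mastery) * p_guess)
    else:
        post = p_mastery * p_slip / (
            p_mastery * p_slip + (1 - p_mastery) * (1 - p_guess))
    return post + (1 - post) * p_transit

def route(p_mastery):
    """Hypothetical decision routing keyed on the mastery estimate."""
    if p_mastery < 0.4:
        return "knowledge_snippet"   # reinforce with PubMed evidence
    if p_mastery < 0.7:
        return "socratic_prompt"     # open-ended coaching question
    return "case_similarity"         # suggest similar REFLACX cases
```

A correct response moves a 0.30 estimate to roughly 0.71 under these parameters, which would shift the learner from evidence reinforcement to case-similarity suggestions on the next turn.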
Preliminary evaluation with a single novice participant over 20 cases shows that IMACT-CXR outperforms baselines in localization and diagnostic reasoning. For localization, measured by mean Intersection over Union, IMACT-CXR achieved 0.59, compared to 0.51 for Radiology-GPT and 0.43 for a text-based tutor, with 63% of cases passing the IoU gate versus 45% and 38%, respectively. In diagnostic reasoning accuracy, IMACT-CXR improved from 0.46 at the start to 0.71 at the end, surpassing Radiology-GPT's 0.48 to 0.62 and the text-based tutor's 0.42 to 0.54. The system also reached mastery thresholds in fewer turns (average 4.2) compared to the traditional tutor (6.1) and Radiology-GPT (5.3), with an average mastery progression increase of 0.38. Ablation studies confirm each component's contribution: removing gaze analytics reduced localization by 6.8% and diagnostic accuracy by 5.6%, disabling BKT increased turns to mastery by 21.4% and reduced final accuracy by 8.5%, omitting reasoning guidance reduced diagnostic accuracy by 4.2%, and excluding knowledge retrieval reduced localization by 8.5% and diagnostic accuracy by 9.9%.
The implications of this research are significant for radiology education and beyond, as IMACT-CXR demonstrates how AI can emulate expert mentorship by integrating multimodal inputs to provide personalized, adaptive feedback. By combining spatial validation, gaze analytics, and mastery tracking, the system offers more actionable guidance than previous models, such as referencing specific lung lobes based on gaze data. This approach could improve training efficiency, reduce reliance on scarce human tutors, and raise diagnostic accuracy among novices. The system's extensibility suggests potential applications in other imaging modalities or medical fields, though it currently focuses on chest X-rays using the REFLACX dataset. Its ability to operate with bounded response times, as shown in the latency measurements, makes it feasible for interactive use in clinical or educational settings, paving the way for live residency deployment.
However, the system has limitations that must be addressed in future work. It relies on proprietary LLMs like GPT-4 for assessment and responder stages, which may raise cost and accessibility concerns. The computational cost of NV-Reason-CXR-3B is high, with local inference taking about 41.88 seconds per call, potentially limiting use in resource-constrained environments. The need for reliable eye-tracking hardware could be a barrier to widespread adoption, and the preliminary evaluation involved only a single participant, requiring expansion to formal user studies with multiple participants and statistical validation to establish generalizability. Future efforts may include compressing the reasoning model for on-premise deployment, integrating additional vision-language models for comparison, and conducting longitudinal studies to measure knowledge retention and long-term impact on clinical skills.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.