Simple AI Outperforms Complex Models at Spotting Tooth Decay

TL;DR

A new study shows older, simpler AI beats advanced architectures at detecting cavities in X-rays, raising questions about complexity in medical imaging.

A new study reveals that simpler artificial intelligence models can outperform more advanced architectures in a critical medical task: detecting tooth decay from dental X-rays. This finding challenges the prevailing trend in AI research, where newer, more complex models like transformers and Mambas are often assumed to be superior. The research, conducted on a dataset of panoramic radiographs, shows that convolutional neural networks (CNNs), a well-established type of AI, achieved the highest accuracy in segmenting dental caries, which are areas of tooth decay. This has significant implications for clinical practice, as accurate early detection of caries is essential for preventing tooth loss and oral discomfort, conditions that affect over one-third of the global population, including 514 million children worldwide.

The key finding from the study is that the CNN-based DoubleU-Net model outperformed all transformer and Mamba variants across multiple performance metrics. Specifically, DoubleU-Net achieved a Dice coefficient of 0.7345, a mean Intersection over Union (mIoU) of 0.5978, and a precision of 0.8145 on the DC1000 dataset, which consists of 597 high-resolution panoramic radiographs annotated by experienced dentists. The top three results across all metrics came from CNN-based architectures, including U-Net and ColonSegNet. In contrast, transformer-based models like PVTFormer and Mamba-based models like RMAMamba-S showed lower performance, despite their theoretical advantages in modeling global context. This indicates that for dental caries segmentation, architectural simplicity and effective spatial feature aggregation can be more beneficial than complexity.
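For readers unfamiliar with these metrics, the following is a minimal sketch of how Dice, IoU, precision, and recall can be computed from a predicted and a ground-truth binary mask. It is not the study's evaluation code: the function names and the toy 4x4 masks are illustrative assumptions, and the paper's mIoU may be averaged over classes or images in a way this sketch does not reproduce.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives
    between two binary masks given as equally sized nested lists."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def dice(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # two empty masks agree perfectly


def iou(tp, fp, fn):
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy masks: 1 marks a caries pixel, 0 marks background.
target = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]

tp, fp, fn = confusion_counts(pred, target)
print(dice(tp, fp, fn))   # 0.75
print(iou(tp, fp, fn))    # 0.6
print(precision(tp, fp))  # 0.75
print(recall(tp, fn))     # 0.75
```

In this toy case one lesion pixel is missed and one background pixel is wrongly flagged, which is enough to pull IoU down to 0.6 even though three of the four lesion pixels are found.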

The researchers conducted a comprehensive benchmarking study to compare 12 state-of-the-art segmentation architectures under identical experimental conditions. They used the DC1000 dataset, which includes 497 training images and 100 test images, with pixel-level masks for caries regions. All models were trained on a single NVIDIA V100 GPU using the PyTorch framework, with consistent data splits, augmentations, and optimization settings. Data augmentations included horizontal flipping, random shifts, rotations, and contrast adjustments to improve model robustness. Performance was evaluated using standard metrics such as mIoU, dice coefficient, precision, recall, and Hausdorff Distance, ensuring a fair comparison across different AI approaches, from CNNs to transformers and Mambas.
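As an illustration of the augmentation step, here is a minimal sketch of paired augmentation for segmentation, in which geometric changes are applied identically to the radiograph and its mask while contrast changes touch only the image. This is an assumption-laden toy, not the study's pipeline: the function names, the probability of flipping, the shift range, and the contrast range are all hypothetical, rotations are omitted for brevity, and images are represented as nested lists of 0-255 integers.

```python
import random


def hflip(grid):
    """Mirror every row left to right."""
    return [row[::-1] for row in grid]


def shift(grid, dx, fill=0):
    """Shift columns right (dx > 0) or left (dx < 0), padding with `fill`."""
    width = len(grid[0])
    dx = max(-width, min(width, dx))  # clamp so slicing stays valid
    shifted = []
    for row in grid:
        if dx >= 0:
            shifted.append([fill] * dx + row[:width - dx])
        else:
            shifted.append(row[-dx:] + [fill] * (-dx))
    return shifted


def adjust_contrast(image, factor):
    """Scale pixel deviations from the mean intensity, clamped to 0..255."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [
        [min(255, max(0, round(mean + factor * (p - mean)))) for p in row]
        for row in image
    ]


def augment(image, mask, rng):
    """Apply the same geometric transform to image and mask,
    then a contrast change to the image only."""
    if rng.random() < 0.5:  # hypothetical flip probability
        image, mask = hflip(image), hflip(mask)
    dx = rng.randint(-1, 1)  # hypothetical shift range, in pixels
    image, mask = shift(image, dx), shift(mask, dx)
    image = adjust_contrast(image, rng.uniform(0.8, 1.2))  # hypothetical range
    return image, mask


# Hypothetical 3x4 image and its caries mask.
image = [[10, 50, 200, 30], [20, 60, 210, 40], [15, 55, 205, 35]]
mask = [[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
aug_image, aug_mask = augment(image, mask, random.Random(0))
```

Keeping the mask in lockstep with the image matters because a flipped radiograph paired with an unflipped mask would teach the model the wrong lesion locations.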

The results, detailed in Table 2 of the paper, show that DoubleU-Net not only led in accuracy but also demonstrated strong computational efficiency. For instance, U-Net achieved the fastest inference speed at 40.83 frames per second (FPS), while ResUNet++ and DoubleU-Net followed with 34.22 and 33.09 FPS, respectively. In contrast, transformer and Mamba models achieved high area under the curve (AUC) scores in ROC analysis, with MambaUNet reaching an AUC of 0.9816, but often failed to translate this into precise boundary localization, as seen in the qualitative comparisons in Figure 3. The study also highlighted that carious regions cover only a small fraction of each radiograph, amplifying the impact of missing pixels on metrics like mIoU, which may explain why models with better global context modeling struggled with fine details.
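The point about small lesions can be made concrete with a short worked example. The sketch below assumes a simplified case in which the prediction lies entirely inside the true lesion and misses a fixed number of pixels, so IoU reduces to the fraction of lesion pixels recovered. The function name and the pixel counts are hypothetical and are not taken from the paper.

```python
def iou_after_missing(lesion_pixels, missed_pixels):
    """IoU when the prediction equals the true lesion minus `missed_pixels`,
    with no false positives (so the union is the lesion itself)."""
    if lesion_pixels <= 0 or not 0 <= missed_pixels <= lesion_pixels:
        raise ValueError("need 0 <= missed_pixels <= lesion_pixels and lesion_pixels > 0")
    return (lesion_pixels - missed_pixels) / lesion_pixels


# The same 10-pixel error costs far more on a small lesion.
for size in (50, 500, 5000):  # hypothetical lesion areas in pixels
    print(size, f"{iou_after_missing(size, 10):.3f}")
# 50 0.800
# 500 0.980
# 5000 0.998
```

A boundary error of ten pixels is barely visible in the score for a large region but removes a fifth of the IoU for a 50-pixel lesion, which is the regime the article describes for caries.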

The implications of this research are substantial for both AI development and clinical dentistry. By showing that simpler CNN models can outperform more complex architectures in data-limited medical imaging, the study underscores the importance of architecture-task alignment over mere model complexity. This could lead to more practical and efficient AI tools for dental diagnostics, potentially improving early detection and treatment planning in real-world settings where computational resources and annotated data are often limited. The findings suggest that clinicians might benefit from adopting AI systems based on proven, efficient models like DoubleU-Net, which offer a favorable trade-off between accuracy and speed, as shown in Table 3's computational analysis.

However, the study has limitations that must be considered. The DC1000 dataset, while diverse, is relatively small at 597 images, which may not allow transformer and Mamba models, which typically require larger datasets to excel, to reach their full potential. Additionally, the dataset contains class imbalance and numerous low-contrast or subtle lesions, which could have constrained performance across all architectures. These factors highlight the need for future work to expand datasets through multi-institutional data collection and to test algorithms on more heterogeneous cohorts, potentially including other dental conditions like impacted teeth or periapical radiolucencies.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
