AIResearch
Science

AI Improves Breast Cancer Detection Across Hospitals

A new AI method uses consistency regularization to enhance tumor segmentation in ultrasound images, achieving significant improvements in external validation across three international datasets.

AI Research
March 26, 2026
3 min read

A new AI method has demonstrated substantial improvements in detecting breast tumors from ultrasound images across different hospitals and countries, addressing a critical challenge in medical imaging. Breast cancer remains a leading cause of mortality among women worldwide, with early detection being vital for better patient outcomes. Ultrasound is particularly useful for dense breast tissue, but AI models often struggle when applied to data from different clinical centers due to variations in imaging systems and protocols. This research introduces a consistency regularization approach that mitigates destructive interference between segmentation and classification tasks, enabling more reliable generalization to external datasets.

The key finding is that the proposed multi-task learning method achieves statistically significant improvements in tumor segmentation performance compared to baseline approaches. On three external datasets—UDIAT from Spain, BUSI from Egypt, and BUS-UCLM from Spain—the method showed Dice coefficient increases of 37%, 18%, and 41%, respectively, all with p<0.001. For example, on the UDIAT dataset, the Dice coefficient improved from 0.59 to 0.81, matching state-of-the-art single-task models while also providing malignancy classification. This indicates that the AI can accurately outline tumor boundaries in ultrasound images from diverse sources, a crucial step for assisting radiologists in diagnosis.
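The Dice coefficient quantifies the overlap between a predicted tumor mask and the ground-truth annotation, ranging from 0 (no overlap) to 1 (perfect overlap). A minimal sketch of how it is computed (the function and the toy masks below are illustrative, not taken from the paper):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks (1 = tumor pixel)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # 2 * |A ∩ B| / (|A| + |B|); eps guards against two empty masks.
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Toy 2x2 masks: prediction covers 3 pixels, ground truth covers 2,
# and they agree on 2 of them.
a = np.array([[0, 1], [1, 1]])
b = np.array([[0, 1], [1, 0]])
print(round(dice_coefficient(a, b), 2))  # 2*2 / (3+2) = 0.8
```

A jump from 0.59 to 0.81 on this scale is substantial: it means the predicted boundaries align much more closely with the radiologist's annotations.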

The methodology involves an encoder-decoder architecture with a shared backbone for simultaneous segmentation and classification, using an EfficientNet-B1 encoder and a U-Net decoder with attention gates. The innovation lies in consistency regularization through differentiable BI-RADS-inspired morphological features. These features—Area, Boundary Roughness, Compactness, and Texture—are computed from soft segmentation masks and combined into a composite malignancy prior using learned weights. A consistency loss enforces agreement between the predicted malignancy score and this morphology-based prior, resolving task interference by aligning segmentation with clinical characteristics. The training objective balances segmentation and classification losses, with higher weight given to segmentation to prioritize spatial accuracy.
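The consistency mechanism can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: only two of the four features (Area and a Compactness proxy) are shown, and all function names, weights, and feature transforms here are hypothetical; in the actual method the features are computed differentiably inside the training graph and the combination weights are learned.

```python
import numpy as np

def soft_area(mask: np.ndarray) -> float:
    # Soft area: sum of per-pixel probabilities, normalized by image size.
    return float(mask.sum() / mask.size)

def soft_compactness(mask: np.ndarray) -> float:
    # Rough compactness proxy: perimeter^2 / area, with a "soft perimeter"
    # taken as the total gradient magnitude of the probability map.
    gy, gx = np.gradient(mask)
    perimeter = np.abs(gy).sum() + np.abs(gx).sum()
    area = mask.sum() + 1e-7
    return float(perimeter ** 2 / area)

def morphology_prior(mask: np.ndarray, weights=(0.5, 0.5)) -> float:
    # Composite malignancy prior: a weighted sum of morphological features,
    # squashed to (0, 1). The weights here are placeholders; the paper
    # learns them during training.
    feats = np.array([soft_area(mask), soft_compactness(mask)])
    z = float(np.dot(weights, feats))
    return float(1.0 / (1.0 + np.exp(-z)))  # sigmoid -> pseudo-probability

def consistency_loss(pred_malignancy: float, mask: np.ndarray) -> float:
    # Penalize disagreement between the classifier head's malignancy score
    # and the morphology-based prior derived from the soft mask.
    return (pred_malignancy - morphology_prior(mask)) ** 2
```

The point of the mechanism is that both heads are pulled toward the same clinically grounded description of the lesion, so gradients from the classification task no longer degrade the segmentation head.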

The analysis, as shown in Table 2, reveals that baseline multi-task learning suffers from destructive interference, underperforming single-task models on external datasets. In contrast, the proposed method not only overcomes this but achieves Dice coefficients of 0.66 on BUSI, 0.81 on UDIAT, and 0.69 on BUS-UCLM, with statistical significance confirmed via paired Wilcoxon signed-rank tests. Figure 2 illustrates that the method produces cleaner boundaries and better handles challenging cases compared to baselines. For classification, it maintains strong performance with AUC values ranging from 0.74 to 0.79 externally, demonstrating that consistency regularization enables beneficial synergy without trade-offs. The model selection strategy, based on internal validation AUC, proved effective for external generalization, as the Proposed-best-cls variant consistently matched or outperformed variants optimized solely for segmentation.
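A paired Wilcoxon signed-rank test, as used for the significance claims above, compares two models on the same set of images, so each image contributes one paired difference. It can be run with SciPy on per-image Dice scores; the numbers below are synthetic, purely to illustrate the procedure:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Hypothetical per-image Dice scores for a baseline vs. a proposed model
# on the same 30-image external test set (paired by image).
baseline = rng.uniform(0.4, 0.7, size=30)
proposed = baseline + rng.uniform(0.05, 0.25, size=30)  # consistent gain

# The test ranks the paired differences; a small p-value means the
# improvement is unlikely to be due to chance.
stat, p = wilcoxon(baseline, proposed)
print(f"W={stat:.1f}, p={p:.2g}")
```

Because the test is non-parametric and paired, it is a common choice for per-image metrics like Dice, which are bounded and rarely normally distributed.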

The implications are significant for real-world clinical applications, as the model's ability to generalize across different hospitals and ultrasound systems enhances its utility in diverse healthcare settings. By leveraging BI-RADS morphological priors, which are domain-robust, the AI can adapt to varying imaging protocols while maintaining accuracy. This could lead to more reliable computer-aided diagnosis systems that assist radiologists in early detection, potentially improving patient outcomes through timely intervention. The approach also sets a precedent for using consistency regularization in other medical imaging tasks where multi-task learning is applied.

Limitations include the study's primary focus on segmentation improvement, with classification performance kept competitive but not optimized; future work could explore classifier-optimized variants. The research used datasets from specific regions, and further validation on more diverse global populations is needed to confirm broader applicability. Additionally, while the approach mitigates destructive interference, it relies on predefined morphological features, which may not capture all clinical nuances. The paper notes that model selection based on internal segmentation performance alone may not yield optimal external generalization, highlighting the need for careful validation protocols in AI development for healthcare.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.


Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn