Set Theory Predicts Breast Cancer Risk More Accurately

TL;DR

A fuzzy set-based system uses routinely collected clinical data to flag breast cancer risk. It reached 70% accuracy on a 10-patient test sample and could help avoid unnecessary invasive procedures.

Breast cancer remains a leading cause of death worldwide, but early detection significantly improves survival rates. A new mathematical approach using set theory could help identify high-risk patients more efficiently, potentially reducing the need for invasive diagnostic procedures like biopsies. This system analyzes common clinical measurements that are already collected during routine medical visits, making it accessible and non-invasive.

The researchers developed a system based on fuzzy set theory that assesses breast cancer risk from five clinical parameters: age, body mass index (BMI), insulin levels, leptin levels, and adiponectin levels. These factors were selected because they can be obtained from standard blood tests and basic patient information. The system calculates a risk score by applying fuzzy set inference computations to these parameters, which amounts to measuring how strongly each patient's characteristics align with known risk patterns for breast cancer.
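To make the idea of a risk score concrete, here is a minimal sketch in Python. It assumes each patient has already been given one membership degree per parameter (a number between 0 and 1) and simply adds them up. The aggregation rule, the function name `risk_score`, and all the numbers are illustrative assumptions, not the scoring rule used in the study.

```python
# Hypothetical aggregation: NOT the study's scoring rule, only an illustration
# of turning per-parameter membership degrees into a single score.
PARAMETERS = ("age", "bmi", "insulin", "leptin", "adiponectin")


def risk_score(memberships):
    """Sum the membership degrees of one patient across the five parameters."""
    missing = [p for p in PARAMETERS if p not in memberships]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    for name in PARAMETERS:
        if not 0.0 <= memberships[name] <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return sum(memberships[name] for name in PARAMETERS)


# Invented membership degrees for two made-up patients.
patients = {
    "patient_a": {"age": 0.2, "bmi": 0.4, "insulin": 0.1,
                  "leptin": 0.3, "adiponectin": 0.5},
    "patient_b": {"age": 0.7, "bmi": 0.8, "insulin": 0.6,
                  "leptin": 0.9, "adiponectin": 0.4},
}

# Rank patients from highest to lowest score.
ranked = sorted(patients, key=lambda name: risk_score(patients[name]), reverse=True)
```

With these invented values, `patient_b` ranks above `patient_a`. A real system would also need a validated rule for turning the score into a decision.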

The methodology involves transforming raw clinical data into fuzzy sets, which can handle the uncertainty and vagueness inherent in medical diagnosis. For example, instead of simply categorizing a patient as "young" or "old," the system assigns membership values across multiple age categories (Child, Young, Mild, Old) using mathematical functions. This same approach was applied to all five parameters, with BMI divided into obesity classes, insulin into hypoglycemia/normal/hyperinsulinemia categories, leptin into low/medium/high/very high levels, and adiponectin into low/medium/high concentrations.
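The fuzzification step can be sketched in a few lines of Python. The example below maps an age to membership degrees in the four categories named above, using triangular and shoulder-shaped functions. The function shapes and every breakpoint (12, 20, 25, and so on) are assumptions chosen for illustration, since this summary does not give the values used in the study.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, rising to 1 at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def left_shoulder(x, a, b):
    """Membership 1 up to a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def right_shoulder(x, a, b):
    """Membership 0 up to a, rising linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Hypothetical breakpoints in years; the study's actual values are not given here.
AGE_SETS = {
    "Child": lambda x: left_shoulder(x, 12, 20),
    "Young": lambda x: triangular(x, 12, 25, 40),
    "Mild": lambda x: triangular(x, 30, 45, 60),
    "Old": lambda x: right_shoulder(x, 50, 65),
}


def fuzzify_age(age):
    """Return the membership degree of an age in every age category."""
    return {label: round(mu(age), 3) for label, mu in AGE_SETS.items()}
```

Under these made-up breakpoints, `fuzzify_age(35)` gives a degree of about 0.33 in both Young and Mild and 0 in Child and Old, so the patient is not forced into a single category. The same pattern would apply to BMI, insulin, leptin, and adiponectin, each with its own categories.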

The researchers used data from the UCI Machine Learning Repository, working with 116 patient instances. After fuzzifying the data, they applied Kong's algorithm to forecast which patients would develop breast cancer. In a test sample of 10 patients, the system classified 7 correctly, for 70% accuracy. Patients μ3, μ11, μ19, μ82, and μ91 were correctly classified as healthy controls, and μ60 and μ104 were correctly identified as high-risk patients. The remaining three were misclassified, among them patient μ31.
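The accuracy figure follows directly from the counts reported above, as this short Python tally shows. The patient labels come from the article, while the helper name `accuracy` is only illustrative.

```python
def accuracy(correct, total):
    """Fraction of correctly classified cases."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return correct / total


# Correct classifications as listed in the article.
correct_controls = ["μ3", "μ11", "μ19", "μ82", "μ91"]
correct_patients = ["μ60", "μ104"]
sample_size = 10

n_correct = len(correct_controls) + len(correct_patients)
n_wrong = sample_size - n_correct

print(f"{n_correct}/{sample_size} correct = {accuracy(n_correct, sample_size):.0%}")
```

Five correctly classified controls plus two correctly classified patients give 7 of 10, which leaves 3 misclassifications in the sample.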

This approach matters because it provides a quantitative method for preliminary risk assessment using data that's already routinely collected. Healthcare professionals could use such a system to determine which patients truly need further diagnostic procedures, potentially reducing unnecessary biopsies and other invasive tests. The non-invasive nature of the assessment makes it particularly valuable for initial screening in resource-limited settings or for patients who may be hesitant about more invasive procedures.

The study acknowledges several limitations. The 70% accuracy rate, while promising, indicates room for improvement before clinical implementation. The researchers suggest that future work should extend the system using fuzzy-rough nearest neighbor techniques to better handle the inherent uncertainty in medical data. Additionally, the current validation was performed on a relatively small dataset, and broader testing would be necessary to confirm the system's reliability across diverse patient populations.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
