AI Detects Rare Objects Better With Balanced Training

TL;DR

A new method groups object classes by how often they appear and applies a class-weighted loss, raising mean Average Precision (mAP) to 24.5% on the challenging LVISv1 benchmark.

Object detection systems, which identify and locate items in images, often struggle in real-world settings where some categories appear much more frequently than others. This imbalance, known as a long-tailed distribution, causes models to favor common objects like cars or people over rare ones like specific animals or tools, limiting their usefulness in applications from autonomous driving to medical imaging. A new study addresses this issue by enhancing an existing approach called Balanced Group Softmax (BAGS), achieving a state-of-the-art mean Average Precision (mAP) of 24.5% on the LVISv1 dataset, which includes 1,203 categories and 164,000 images. This improvement, up from 24.0%, demonstrates progress in making AI systems more reliable across diverse scenarios where data is unevenly distributed.

The researchers found that by modifying the BAGS framework, they could boost performance for rare and common categories without sacrificing accuracy on frequent ones. In their experiments, they tested several extensions to BAGS, such as increasing the number of bins from four to five, using clustered bins based on instance frequencies, and applying class weighting within bins. The most effective strategy was a hybrid approach that used softmax with class weights for rare and common categories and standard softmax for frequent categories, resulting in an overall mAP of 24.5%. Specifically, this improved the Average Precision (AP) for rare categories to 16.6%, up from 15.6% in the original BAGS, and for common categories to 23.1%, while maintaining a high AP of 29.4% for frequent categories. These gains highlight how targeted adjustments to training can mitigate bias in imbalanced datasets.
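The hybrid strategy described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the group labels, and the weight values are all hypothetical, and it assumes each group of categories has its own set of logits and a target index within that group.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one group's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def group_loss(logits, target, class_weights=None):
    """Cross-entropy over one group; scaled by the target's weight if given."""
    p = softmax(logits)[target]
    w = 1.0 if class_weights is None else class_weights[target]
    return -w * math.log(p)

def hybrid_loss(group_logits, group_targets, group_weights, weighted_groups):
    """Sum per-group losses, applying class weights only to selected groups.

    Assumption: groups named in `weighted_groups` (e.g. rare and common)
    use weighted softmax; every other group uses standard softmax.
    """
    total = 0.0
    for g, logits in group_logits.items():
        weights = group_weights[g] if g in weighted_groups else None
        total += group_loss(logits, group_targets[g], weights)
    return total

# Hypothetical example: two groups with two slots each.
example = hybrid_loss(
    group_logits={"rare": [0.0, 0.0], "frequent": [0.0, 0.0]},
    group_targets={"rare": 0, "frequent": 1},
    group_weights={"rare": [2.0, 0.5], "frequent": [3.0, 3.0]},
    weighted_groups={"rare", "common"},
)
print(example)
```

In this toy call the weights listed for the "frequent" group are ignored, because only the rare and common groups are marked as weighted. That mirrors the idea of leaving frequent categories on standard softmax so their accuracy is not traded away.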

To achieve these results, the study employed a two-stage Faster R-CNN architecture, a common object detection model that uses a Region Proposal Network to generate bounding boxes and a classifier to identify objects. The key innovation involved grouping categories into bins based on their frequency in the dataset, such as separating rare classes with fewer than 10 instances from more common ones. Within each bin, the researchers applied modifications like class weighting, where each category's influence was adjusted inversely to its instance count, and Focal Loss, which focuses training on harder-to-classify examples. They also explored metric learning techniques, including Center Loss and Large Margin Cosine Loss, to create feature embeddings that are tightly clustered within classes and well separated between them, though these methods showed mixed results in practice.
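As a rough illustration of frequency-based binning and inverse-frequency class weighting, consider the Python sketch below. Only the "fewer than 10 instances" boundary comes from the description above; the other bin edges, the function names, the instance counts, and the choice to rescale weights so that each bin averages 1 are assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical bin edges on training-instance counts. Only the first edge
# (fewer than 10 instances) is taken from the article; the rest are assumed.
BIN_EDGES = [10, 100, 1000]

def assign_bin(instance_count, edges=BIN_EDGES):
    """Return the bin index for a category; bin 0 holds the rarest classes."""
    return bisect_right(edges, instance_count)

def group_by_bin(counts, edges=BIN_EDGES):
    """Map bin index -> list of category names, given {name: instance_count}."""
    bins = {}
    for name, n in counts.items():
        bins.setdefault(assign_bin(n, edges), []).append(name)
    return bins

def inverse_frequency_weights(names, counts):
    """Weights proportional to 1/count, rescaled so the bin's mean weight is 1."""
    raw = [1.0 / counts[name] for name in names]
    scale = len(raw) / sum(raw)
    return {name: w * scale for name, w in zip(names, raw)}

# Made-up instance counts, for illustration only.
toy_counts = {"cucumber": 5, "ferret": 8, "kettle": 40, "car": 5000}
toy_bins = group_by_bin(toy_counts)
rare_weights = inverse_frequency_weights(toy_bins[0], toy_counts)
print(toy_bins)
print(rare_weights)
```

Within the rarest bin, the category with fewer instances receives the larger weight, so its training examples count for more in the loss than those of its slightly more common neighbors.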

The experimental data, detailed in tables in the paper, show the impact of these modifications. For example, with class weighting within bins, the AP for rare categories rose to 16.1% and for common categories to 23.6%, though the AP for frequent categories dropped slightly to 28.7%. In contrast, metric learning approaches, such as training with Center Loss and using k-Nearest Neighbors for inference, performed worse overall, with mAP values as low as 5.0% on the full dataset. t-SNE visualizations of the features showed that tail-class features, like those for "cucumber," were scattered and did not form tight clusters, which helps explain the difficulties in classification. These findings underscore that while grouping and weighting are effective, creating distinct feature representations for rare classes remains a hurdle.
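A toy Python example can show why scattered features hurt nearest-neighbor inference. The 2-D points and labels below are invented for illustration and are not taken from the paper; the sketch simply assumes Euclidean distance and a majority vote among the k nearest stored embeddings.

```python
import math
from collections import Counter

def knn_predict(query, gallery, k=3):
    """Majority vote among the k stored embeddings nearest to `query`.

    `gallery` is a list of (embedding, label) pairs.
    """
    nearest = sorted(gallery, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented embeddings: "car" forms a tight cluster, "cucumber" is scattered.
TOY_GALLERY = [
    ((0.0, 0.0), "car"),
    ((0.1, 0.0), "car"),
    ((0.0, 0.1), "car"),
    ((5.0, 5.0), "cucumber"),
    ((1.0, 4.0), "cucumber"),
    ((4.0, 0.5), "cucumber"),
]

query = (3.5, 0.6)  # lies right next to one of the cucumber points
print(knn_predict(query, TOY_GALLERY, k=1))
print(knn_predict(query, TOY_GALLERY, k=3))
```

With k=1 the query is matched to its single closest neighbor, a cucumber. With k=3 the other cucumber points are too far away to help, so the two next-closest car points outvote it. A tail class whose features do not cluster tightly is vulnerable to exactly this kind of misclassification.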

This research matters because it brings object detection systems closer to handling real-world environments where data imbalance is the norm, not the exception. By improving detection of rare categories, such methods could enhance applications in fields like wildlife monitoring, where spotting endangered species is critical, or in retail, where identifying less common products improves inventory management. The study also points to future directions, such as using segmentation masks from the LVIS dataset to boost accuracy or exploring generative AI to augment data for tail classes. However, the authors note limitations, including the difficulty of achieving tight feature clusters for highly variable tail classes and the trade-offs in performance when prioritizing rare over frequent categories. As AI systems become more integrated into daily life, advances like these are essential for ensuring they work reliably across all scenarios, not just the most common ones.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
