
AI Sharpens Blurry Image Boundaries to Boost Robot Vision

New method cleans up noisy image annotations, helping computers see object edges more clearly—with potential applications in robotics and medical imaging.

AI Research
November 11, 2025
2 min read

Computers that can accurately identify object boundaries in images are crucial for technologies from self-driving cars to medical imaging, but current systems often struggle with the messy, imprecise annotations that human labelers create. A new approach called STEAL (Semantically Thinned Edge Alignment Learning) addresses this by teaching AI to predict sharper, more precise boundaries despite training on noisy data. This improvement could enhance how robots interact with objects and how computers analyze complex visual scenes.

The researchers developed a method that forces boundary detection networks to predict responses specifically at edge locations, regularizing the direction of these responses. This approach, which can be added to any existing boundary detection architecture, significantly improves performance. On standard benchmarks, STEAL boosted the backbone network's performance by 4% in maximum F-measure and 18.61% in average precision, outperforming state-of-the-art methods.

The team implemented their approach by adding a thinning layer and alignment framework to existing boundary detection networks. The thinning layer encourages pixels to achieve their highest response in the direction normal to the boundary, while the alignment framework treats ground-truth boundaries as variables that can be optimized during training. This two-step process—evolving boundaries toward areas where the network is highly confident, then optimizing network parameters—allows the system to learn from noisy annotations while producing cleaner predictions.
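To make the thinning idea concrete, here is a simplified NumPy sketch of non-maximum suppression along the boundary normal: a pixel's boundary response survives only if it is the local maximum in the direction perpendicular to the edge. This is an illustrative stand-in, not the paper's actual layer, which implements thinning differentiably so it can be trained end-to-end; the function name and the use of a precomputed normal-direction map are assumptions made for this example.

```python
import numpy as np

def thin_edges(prob, theta):
    """Suppress boundary responses that are not maximal along the
    direction normal to the boundary (a simple NMS sketch).

    prob  : HxW array of boundary probabilities in [0, 1]
    theta : HxW array of normal directions (radians) per pixel
    """
    H, W = prob.shape
    out = np.zeros_like(prob)
    dy, dx = np.sin(theta), np.cos(theta)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            # Sample the two neighbours one step along the normal.
            ni, nj = int(round(i + dy[i, j])), int(round(j + dx[i, j]))
            pi, pj = int(round(i - dy[i, j])), int(round(j - dx[i, j]))
            # Keep the pixel only if it is a local maximum along the normal.
            if prob[i, j] >= prob[ni, nj] and prob[i, j] >= prob[pi, pj]:
                out[i, j] = prob[i, j]
    return out
```

Run on a blurry vertical edge (a thick column of high responses), this keeps only the single strongest column, which is exactly the "thin, crisp boundary" behavior the training objective encourages.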

Experimental results demonstrate clear improvements. On the Semantic Boundaries Dataset, STEAL achieved 68.15% maximum F-measure compared to 64.03% for the baseline CASENet architecture. More impressively, the method showed even greater gains on high-quality re-annotated data, improving from 64.84% to 68.15% in maximum F-measure. The approach also proved effective at refining coarse annotations, improving intersection-over-union scores by 20-30% for coarse masks with 16- and 32-pixel boundary errors.
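For readers unfamiliar with the intersection-over-union (IoU) score behind that last result, a minimal NumPy illustration follows. The masks here are invented for the example, not taken from the paper: a ground-truth square and a coarse annotation that overshoots it by four pixels on every side.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection-over-union between two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union)

# Ground-truth object: a 16x16 square.
truth = np.zeros((32, 32), dtype=bool)
truth[8:24, 8:24] = True

# Coarse annotation: overshoots the object by 4 pixels per side.
coarse = np.zeros((32, 32), dtype=bool)
coarse[4:28, 4:28] = True

print(round(iou(truth, coarse), 3))  # → 0.444
```

Even a modest few-pixel error around the boundary drags IoU well below 0.5, which is why refining coarse masks toward the true edges yields such large gains on this metric.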

This advancement matters because precise boundary detection is fundamental to many real-world applications. In robotics, accurate object boundaries enable better manipulation and grasping. In medical imaging, clearer boundaries can improve diagnosis and treatment planning. The method's ability to work with noisy training data also addresses a practical challenge: creating perfectly annotated datasets is extremely time-consuming, often taking 30-60 seconds per object boundary.

The approach does have limitations. The alignment framework relies on the network becoming sufficiently accurate during training before it can effectively refine boundaries. Additionally, while the method handles various levels of annotation noise, its performance depends on having reasonable initial training data. The researchers note that during early training stages, when the network lacks confidence, the inferred boundaries may remain noisy.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
