A new study reveals that robots can dramatically improve their vision by copying a simple behavior from insects: peering side to side. This approach allows robots to see through clutter and obstacles in real time, overcoming a major limitation of conventional camera systems. The research, conducted by an international team of scientists, bridges biology and robotics, showing how animal-inspired motions can enhance artificial perception. This breakthrough has immediate applications in fields like search-and-rescue, surveillance, and terrain exploration, where clear vision is critical but often obstructed.
The key finding is that by performing peering motions—similar to how insects like locusts and katydids move their heads—robots can suppress partial occlusion and see backgrounds that would otherwise be hidden. In the paper, the researchers demonstrate this using a quadrupedal robot, the ANYbotics ANYmal, which captures hundreds of images during a peering motion. These images are then computationally combined to create a synthetic aperture integral image, where occluding objects in the foreground are blurred out, while the background remains in sharp focus. For example, in Figure 1, a peering motion of approximately 11 centimeters was used to sample and integrate 300 images, effectively revealing occluded scenes in both visible and near-infrared light.
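For readers who want to see how such an integral image comes together, here is a minimal Python sketch of the shift-and-average idea, assuming a pure horizontal peering sweep, a simple pinhole camera, and a flat focal plane at the background's depth. The function name, parameters, and use of OpenCV are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np
import cv2  # OpenCV, assumed available for warping; any image library would do

def integrate_peering_images(images, offsets_m, focal_px, focus_depth_m):
    """Average a peering sweep into a synthetic aperture integral image.

    images         frames captured during the sweep (H x W x 3 arrays)
    offsets_m      lateral camera offset of each frame, in meters,
                   relative to the reference (center) view
    focal_px       camera focal length in pixels (pinhole model)
    focus_depth_m  depth of the synthetic focal plane, i.e. the background
                   we want to keep sharp
    """
    acc = np.zeros(images[0].shape, dtype=np.float64)
    for img, offset in zip(images, offsets_m):
        # A camera shifted right by `offset` sees a point at depth Z moved
        # left by focal_px * offset / Z pixels. Shifting the frame back by
        # the disparity of the focal plane re-aligns the background; nearer
        # occluders have larger disparity, stay misaligned, and average away.
        disparity = focal_px * offset / focus_depth_m
        M = np.float32([[1, 0, disparity], [0, 1, 0]])
        h, w = img.shape[:2]
        acc += cv2.warpAffine(img, M, (w, h)).astype(np.float64)
    return (acc / len(images)).astype(images[0].dtype)
```

In the paper itself, the frames are projected onto a general synthetic focal surface using precise camera poses rather than a single horizontal shift, but the principle is the same: register the background and let the foreground smear out.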
The methodology involves synthetic aperture sensing, an optical imaging technique in which a moving camera collects multiple images from different positions over time. During peering, the robot's head sweeps out a synthetic aperture several centimeters wide, and the captured images are projected onto a synthetic focal surface and averaged. This process, detailed in Figure S2 of the supplementary materials, requires precise camera pose data but is efficient enough for real-time operation on mobile processors. The researchers tested various peering motions, including lateral shifts and rotations, and found that horizontal shifts maximize motion parallax and yield superior results, as shown in Figure S3. The technique is wavelength-independent, working across spectral bands such as RGB, thermal, and near-infrared, which makes it versatile across environments.
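A quick back-of-the-envelope calculation shows why lateral shifts beat rotations here: under a lateral baseline b, a point at depth Z shifts by f·b/Z pixels (motion parallax), while a pure rotation warps every depth identically and so produces no parallax at all. The numbers below are illustrative; the focal length and depths are assumptions, and only the roughly 11-centimeter sweep comes from the paper:

```python
focal_px = 600.0           # assumed focal length in pixels (illustrative)
baseline_m = 0.11          # ~11 cm peering sweep, as in Figure 1
depth_occluder_m = 0.5     # a nearby branch or leaf (illustrative)
depth_background_m = 5.0   # the hidden scene behind it (illustrative)

def disparity_px(depth_m):
    """Pixel shift induced by the lateral sweep (motion parallax)."""
    return focal_px * baseline_m / depth_m

# The foreground moves an order of magnitude more than the background
# across the sweep, which is what lets averaging blur the occluder
# while the background stays registered.
print(disparity_px(depth_occluder_m))    # 132.0 px
print(disparity_px(depth_background_m))  # 13.2 px

# A pure rotation, by contrast, induces a depth-independent warp
# (zero parallax), so rotational peering separates nothing.
```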
Results from the study show that this bio-inspired approach significantly enhances scene understanding. In Figure 1 and Figure S4, the researchers used large multimodal models, such as ChatGPT-5.0, to interpret scenes. These models struggled with occluded conventional images but successfully identified content in synthetic aperture integral images where occlusion was suppressed. For instance, in one test, the model correctly described a car approximately 5 meters away after peering removed foreground clutter. The paper also compares synthetic aperture sensing to alternative 3D vision techniques such as structure-from-motion and neural radiance fields, which failed due to occluded features and required hours of computation on high-end GPUs, whereas the peering approach operates in real time.
The implications of this research are broad: it enables robots to navigate and perceive in cluttered environments without relying on expensive or slow technologies. The ability to peer is not limited to quadrupedal robots; it can be adapted to bipedal, wheeled, or other platforms, as noted in the paper. This could revolutionize applications in surveillance, where robots need to see through vegetation, or in inspection tasks where obstacles obscure critical infrastructure. Moreover, the study opens a bidirectional exchange between robotics and animal behavior research, suggesting that further insights from evolved animal motions could lead to even more effective peering patterns.
However, the approach has limitations. The efficacy of synthetic aperture sensing for occlusion removal is highest at 50% occlusion but degrades with increasing occlusion density, as mentioned in the paper. Additionally, the technique requires precise camera poses and may struggle with large horizontal occluders if only horizontal peering motions are used. The researchers suggest that future work should explore optimal peering motions, integrate omnidirectional and multi-spectral vision, and leverage motion parallax to break camouflage. These advancements could further enhance the robustness and applicability of this bio-inspired vision system in real-world scenarios.