
Robots Navigate Like Humans Using AI Vision

Robots can now find target objects in unknown spaces about 65% of the time without prior maps. The approach could eventually let autonomous helpers work in homes and disaster zones without human guidance.

AI Research
November 14, 2025

Robots can now navigate unfamiliar environments in a more human-like way by combining two types of visual perception, immediate surroundings and broader context, using advanced vision-language models. This advance in object-oriented navigation means autonomous robots could eventually perform complex tasks in homes, offices, and disaster zones without prior mapping or human guidance.

The key finding from researchers at Tsinghua University and the Chinese University of Hong Kong is that robots using their HyPerNav system find target objects in completely unknown environments 65.4% of the time, outperforming most existing methods. The system also scores 43.7% on Success weighted by Path Length (SPL), a metric that discounts each successful run by how much longer the robot's path was than the shortest possible route. Together, the two figures indicate gains in both reliability and efficiency for autonomous navigation.

The methodology combines what the researchers call "hybrid perception"—using both egocentric views (what the robot sees directly in front of it) and global top-down maps it builds as it explores. Like humans who simultaneously pay attention to immediate obstacles while maintaining awareness of the overall layout, HyPerNav uses Qwen-VL vision-language models to analyze both perspectives. The system processes local RGB-D camera data for precise object detection while constructing a real-time map of the environment for strategic planning.
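To make the "global top-down map" idea concrete, here is a minimal Python sketch of how egocentric range readings could be projected into a world-frame grid as a robot explores. It is an illustration only, not the HyPerNav implementation: the function name `update_top_down_map`, the `(bearing, distance)` reading format, and the `cell_size` parameter are all assumptions made for this example.

```python
import math


def update_top_down_map(grid, pose, readings, cell_size=0.1):
    """Mark obstacle cells in a top-down grid from egocentric range readings.

    Hypothetical helper for illustration; not taken from the HyPerNav paper.

    grid:      list of lists of ints (0 = unknown or free, 1 = obstacle),
               indexed as grid[row][col], with row along y and col along x
    pose:      (x, y, heading) of the robot in metres and radians, world frame
    readings:  iterable of (bearing, distance) pairs, where bearing is in
               radians relative to the robot's heading
    cell_size: side length of one grid cell in metres
    """
    x, y, heading = pose
    rows, cols = len(grid), len(grid[0])
    for bearing, distance in readings:
        # Rotate the egocentric reading into the world frame.
        angle = heading + bearing
        wx = x + distance * math.cos(angle)
        wy = y + distance * math.sin(angle)
        # Convert world coordinates to grid indices.
        col = int(wx // cell_size)
        row = int(wy // cell_size)
        # Ignore hits that fall outside the mapped area.
        if 0 <= row < rows and 0 <= col < cols:
            grid[row][col] = 1
    return grid


if __name__ == "__main__":
    grid = [[0] * 5 for _ in range(5)]
    # Robot at (0.5, 0.5) facing along +x, with one obstacle 2 m straight ahead.
    update_top_down_map(grid, (0.5, 0.5, 0.0), [(0.0, 2.0)], cell_size=1.0)
    for row in grid:
        print(row)
```

In a hybrid setup like the one described, a map of this kind would be rendered and passed to the vision-language model alongside the current camera frame; how HyPerNav does this in detail is not covered here.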

Results from testing on two major navigation datasets, HM3D and the more challenging OVON, show consistent performance improvements. In HM3D environments, HyPerNav achieved the highest success rate among training-free methods at 65.4% and the second-highest success rate across all methods. Its Success weighted by Path Length (SPL) score of 43.7% reflects how direct its successful routes are relative to the shortest possible path. The system proved particularly effective at handling complex object descriptions such as "L-shaped sofa" and "clothes dryer" in the OVON dataset, though it occasionally confused semantically similar objects, for example mistaking "window curtain" for "shower curtain."
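For readers unfamiliar with the two metrics, the sketch below computes success rate and SPL using the definition commonly used in embodied navigation benchmarks: each successful episode contributes the ratio of the shortest path length to the longer of the shortest and actual path lengths, and failed episodes contribute zero. The episode values are made up for illustration and are not results from the paper.

```python
def success_rate(episodes):
    """Fraction of episodes in which the target was reached."""
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length.

    episodes: list of (success, shortest_path_m, actual_path_m) tuples.
    Assumes shortest_path_m > 0 for every episode.
    """
    total = 0.0
    for success, shortest, actual in episodes:
        if success:
            # A path equal to the shortest route scores 1.0; detours score less.
            total += shortest / max(actual, shortest)
    return total / len(episodes)


if __name__ == "__main__":
    # Hypothetical episodes: (reached target?, shortest path, path taken), in metres.
    episodes = [
        (True, 5.0, 5.0),    # optimal route
        (True, 5.0, 10.0),   # reached target with a 2x detour
        (False, 5.0, 20.0),  # never reached target
        (True, 4.0, 8.0),    # reached target with a 2x detour
    ]
    print(f"success rate: {success_rate(episodes):.2f}")
    print(f"SPL:          {spl(episodes):.2f}")
```

Because failures count as zero, SPL can never exceed the success rate, which is why a 65.4% success rate paired with a 43.7% SPL is plausible.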

This advancement matters because it addresses a fundamental limitation in current robotics: the disconnect between local perception and global understanding. Previous systems typically relied on one perspective or the other, causing robots to get stuck in corners or miss obvious navigation cues. The hybrid approach enables more intelligent decision-making, similar to how humans naturally navigate. Real-world validation using a wheeled robot in laboratory settings confirmed the method's effectiveness, with the system successfully locating beds and umbrellas in office environments.

The research acknowledges several limitations. The system still fails about one-third of the time, primarily due to poor object detection, incomplete mapping data, or ineffective navigation guidance. When objects are completely surrounded by obstacles, the robot sometimes cannot reach them even when detected. The method also requires significant computational resources, though at 1.2 seconds per decision, it's faster than comparable VLM-based approaches that take 3.3 seconds per step.

What remains unknown is how well this approach scales to more complex, dynamic environments with moving obstacles or multiple simultaneous targets. The current evaluation focused on static indoor environments, and the researchers note that advancing toward human-level navigation will require addressing these more challenging scenarios.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
