AIResearch AIResearch
Back to articles
AI

Robots Learn Complex Tasks with Minimal Training

Robots can now master intricate manipulation tasks with dramatically less training data than previously required, thanks to a new hierarchical learning approach that enables them to generalize skills …

AI Research
November 14, 2025
3 min read
Robots Learn Complex Tasks with Minimal Training

Robots can now master intricate manipulation tasks with dramatically less training data than previously required, thanks to a new hierarchical learning approach that enables them to generalize skills across different objects and environments. This breakthrough addresses a fundamental challenge in robotics: teaching machines to adapt to new situations without extensive retraining.

The researchers developed Generalizable Hierarchical Skill Learning (GSL), a framework that allows robots to decompose complex tasks into reusable object-centric skills. The system bridges high-level vision-language understanding with low-level motor control through a novel interface that represents skills relative to individual objects rather than the entire scene. This object-centric approach enables robots to transfer learned skills to new object arrangements, appearances, and compositions they haven't encountered during training.

The method works through a two-level architecture. First, during training, the system automatically breaks down human demonstrations into object-focused skill primitives. Each skill is defined by a mask identifying the target object, a language label describing the action, and a trajectory segment showing how to manipulate that object. These skills are then "canonicalized" - transformed into a consistent object-centered coordinate system that makes them independent of the object's position or orientation in the world.

At execution time, the high-level agent analyzes the current scene and predicts which skill-object pairs to execute next. The low-level controller then generates precise manipulation trajectories in the object's coordinate frame, which are mapped back to the robot's actual environment for execution. This separation allows the system to focus on local object interactions while maintaining semantic understanding of the overall task.

Experimental results demonstrate substantial improvements in both sample efficiency and generalization. In simulation tests, GSL achieved success rates averaging 91.5% across 10 training tasks while using only 1 demonstration per task - outperforming baseline methods that required 10 times more training data. More impressively, in zero-shot transfer tests where robots faced completely new object configurations, GSL showed a 23.4% performance improvement over the best existing methods. Real-world experiments confirmed these gains, with the system successfully completing tasks like opening drawers, placing objects, and preparing tea using skills learned from minimal demonstrations.

The practical implications are significant for applications ranging from manufacturing to household assistance. Robots could be deployed more quickly in new environments, adapting to variations in object placement, appearance, and task requirements without costly retraining. This could make robotic systems more practical for small-batch production, customized manufacturing, and dynamic home environments where conditions constantly change.

However, the approach has limitations. It currently focuses on interactions with single objects, struggling with tasks that require coordinated manipulation of multiple objects simultaneously. The system also depends on accurate object segmentation from vision models, and performance degrades when these models fail to properly identify object boundaries or parts. In testing, drawer-related tasks proved particularly challenging because the vision system sometimes treated entire cabinets as single objects rather than identifying individual drawers.

Future work will need to address multi-object interactions and improve robustness to imperfect visual perception. The researchers suggest that incorporating part-aware segmentation and extending the object-centric interface to handle multiple masks could help overcome these limitations. Despite these challenges, the framework represents a substantial step toward robots that can learn efficiently and adapt flexibly to real-world variability.

About the Author

Guilherme A.

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn