Robots Learn to Climb Stairs Using 38% Less Energy

TL;DR

A new AI method trains hybrid robots to blend flying and driving, cutting energy use by 38% on real-world stair-climbing tasks.

Hybrid robots that can both fly and drive offer a promising solution for navigating complex environments, but they often struggle with energy efficiency on challenging terrains like staircases. Researchers have developed an AI approach that enables these robots to coordinate their aerial and ground actuators more effectively, reducing power consumption significantly. This breakthrough could make hybrid robots more practical for long-duration missions in urban or disaster-response scenarios where stairs and gaps are common obstacles.

The key finding from the study is that a reinforcement learning framework can train a single continuous policy to blend propeller thrust, wheel traction, and tilt servos without predefined modes, leading to emergent energy-efficient behaviors. In simulation, this approach achieved about four times lower energy consumption compared to using propellers alone, as shown in Figure 3. When transferred to a real DoubleBee prototype robot, the learned policy reduced average power by 38% and total energy by 37.7% on an 8 cm gap-climbing task, outperforming a rule-based decoupled controller that maintained constant thrust regardless of terrain.

The methodology involved training the policy in the Isaac Lab simulation environment with 4096 parallel environments, using hardware-calibrated models to map actuator commands to true electrical energy. The robot's observation space included wheel velocities, base motion, gravity projection, contact indicators, goal direction, height scans, and action history, while the action space controlled propellers, wheels, and servos. A reward function combined task completion, energy penalties based on actual power consumption, and stability terms, with domain randomization applied to improve sim-to-real transfer by varying parameters like mass and friction within ±20% ranges.
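
To make the reward structure and the ±20% randomization more concrete, here is a minimal Python sketch. It is not the paper's implementation: the field names, the weights, and the helper functions `step_reward` and `randomize` are all hypothetical and chosen only for illustration. It assumes the simulator can report per-step progress toward the goal, electrical power from a calibrated actuator model, and base tilt.

```python
import random
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Per-step quantities a simulator might expose (hypothetical names)."""
    progress_m: float          # distance gained toward the goal this step
    electrical_power_w: float  # power estimated by a calibrated actuator model
    tilt_rad: float            # base tilt away from upright
    reached_goal: bool


# Illustrative weights only; the article does not report the actual values.
W_PROGRESS = 1.0
W_ENERGY = 0.002
W_TILT = 0.5
GOAL_BONUS = 10.0


def step_reward(info: StepInfo, dt: float) -> float:
    """Combine a task term, an energy penalty, and a stability penalty."""
    task = W_PROGRESS * info.progress_m
    if info.reached_goal:
        task += GOAL_BONUS
    energy_j = info.electrical_power_w * dt  # energy spent during this step
    return task - W_ENERGY * energy_j - W_TILT * abs(info.tilt_rad)


def randomize(nominal: float, spread: float = 0.20, rng=random) -> float:
    """Scale a nominal parameter by a uniform factor in [1 - spread, 1 + spread]."""
    return nominal * rng.uniform(1.0 - spread, 1.0 + spread)


if __name__ == "__main__":
    rng = random.Random(0)
    mass_kg = randomize(2.0, rng=rng)   # 2.0 kg is a made-up nominal mass
    friction = randomize(0.8, rng=rng)  # 0.8 is a made-up nominal friction
    info = StepInfo(progress_m=0.01, electrical_power_w=120.0,
                    tilt_rad=0.05, reached_goal=False)
    print(mass_kg, friction, step_reward(info, dt=0.02))
```

Because the energy term is tied to electrical power rather than to raw actuator commands, a policy trained against a reward of this shape is penalized for holding high thrust when the wheels alone would do, which is the behavior the article describes.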

Analysis from Figure 6 shows that the learned policy modulated propeller output selectively, with thrust bursts during climbing phases and reduced usage otherwise, unlike the decoupled controller which maintained steady high thrust. This led to an average power of 119 W for the RL policy versus 192 W for the successful decoupled controller, with the RL policy achieving a mean ground speed of 0.38 m/s compared to 0.31 m/s for the decoupled one. In simulation, the hybrid policy reached a success rate above 55% on multi-stair terrain, while propellers-only control plateaued at 30%, and wheels-only failed entirely, as detailed in Figure 3.
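
The power figures above can be checked with a few lines of arithmetic. The Python sketch below recomputes the percentage reduction from the reported 119 W and 192 W, and also derives a rough energy-per-meter estimate by dividing mean power by mean ground speed. That second metric is a back-of-the-envelope illustration and is not reported in the study; the function names are hypothetical.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return 100.0 * (baseline - new) / baseline


def energy_per_meter(power_w: float, speed_m_s: float) -> float:
    """Rough energy cost of travel in J/m, assuming steady power and speed."""
    return power_w / speed_m_s


# Figures quoted in the article.
rl_power_w, decoupled_power_w = 119.0, 192.0
rl_speed_m_s, decoupled_speed_m_s = 0.38, 0.31

power_cut = percent_reduction(decoupled_power_w, rl_power_w)
rl_j_per_m = energy_per_meter(rl_power_w, rl_speed_m_s)
decoupled_j_per_m = energy_per_meter(decoupled_power_w, decoupled_speed_m_s)

print(f"Average power reduction: {power_cut:.1f}%")
print(f"RL policy: {rl_j_per_m:.1f} J/m")
print(f"Decoupled controller: {decoupled_j_per_m:.1f} J/m")
```

The power reduction works out to roughly 38%, matching the reported figure. The energy-per-meter estimate comes to about 313 J/m for the learned policy and about 619 J/m for the decoupled controller, which suggests the higher ground speed compounds the power savings. This estimate treats mean power and mean speed as steady over the whole trial, so it should be read as indicative only.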

The implications of this research are significant for real-world applications where energy efficiency is critical, such as search-and-rescue operations or infrastructure inspection. By enabling robots to use thrust only when necessary, the approach extends operational endurance without sacrificing mobility on discontinuous terrains. The study demonstrates that learning can discover efficient hybrid actuation strategies, such as thrust-assisted driving, which traditional rule-based controllers might miss due to their reliance on discrete modes.

Limitations of the approach include a success rate of 3 out of 5 trials in real-world tests, with failures occurring when the robot became mechanically stuck on step edges or pitched forward excessively. The paper notes that improving robustness to these failure modes, possibly through safety constraints on thrust transients, is left for future work. Additionally, the simulation training focused on inverted-pyramid stair terrains, and extending the framework to more complex outdoor surfaces remains an open question.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
