AI Makes Humanoid Robots Walk More Accurately

TL;DR

A hybrid method blends classic robotics with Transformer networks to improve state estimation and give robots more stable movement on complex terrain.

Humanoid robots are moving out of controlled lab settings and into real-world applications such as healthcare and search-and-rescue missions, but walking stably on varied terrain remains a significant challenge. Accurate state estimation, the process of determining a robot's position, orientation, and velocity from noisy sensor data, is crucial for stable locomotion because it provides real-time feedback to motion controllers. Traditional methods, such as Kalman filters, require expert tuning and can struggle with model mismatches, while purely data-driven approaches demand large datasets and substantial computational resources. A new study introduces a hybrid approach called InEKFormer, which merges an invariant extended Kalman filter (InEKF) with a Transformer network to improve state estimation for humanoid robots, potentially making them more reliable in dynamic environments.

The researchers developed InEKFormer to predict the Kalman gain with a Transformer model, rather than relying on hand-tuned noise parameters as classical methods do. The Kalman gain is a key quantity in state estimation: it determines how strongly each sensor measurement corrects the robot's predicted state. This hybrid method integrates the model-based InEKF, which exploits Lie group symmetries for better convergence, with a data-driven Transformer network that processes sequences of input features such as observation differences and contact states. As shown in Figure 1 of the paper, the approach combines propagation steps driven by IMU data with correction steps driven by leg odometry, and the Transformer predicts the gains to account for unmodeled dynamics and sensor noise. The goal is to provide more accurate and robust state estimates for the RH5 humanoid robot, a 32-degree-of-freedom platform used in the experiments.
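To make the role of the Kalman gain concrete, here is a minimal Python sketch of a scalar correction step. It is a plain one-dimensional Kalman update, not the Lie-group InEKF or the Transformer from the paper. The function `predicted_gain`, its weights, and all numbers are hypothetical stand-ins chosen for illustration.

```python
import math


def classical_gain(p_prior, h, r):
    """Scalar Kalman gain from the prior variance p_prior, the measurement
    model h, and the hand-tuned measurement noise variance r."""
    return p_prior * h / (h * p_prior * h + r)


def correct(x_prior, z, h, gain):
    """Correction step: move the predicted state by gain * innovation."""
    innovation = z - h * x_prior
    return x_prior + gain * innovation


def predicted_gain(features, weights):
    """Hypothetical stand-in for a learned gain estimator. This is a single
    logistic unit, NOT the paper's Transformer; it only shows where a
    learned gain would plug into the correction step."""
    s = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-s))


x_prior, z, h = 0.0, 2.0, 1.0  # made-up prediction and measurement

# Classical route: the gain follows from assumed noise variances.
k_classic = classical_gain(p_prior=1.0, h=h, r=1.0)
x_classic = correct(x_prior, z, h, k_classic)

# Learned route: the gain comes from input features instead
# (here an innovation value and a contact flag, both illustrative).
k_learned = predicted_gain(features=[z - h * x_prior, 1.0], weights=[0.5, 0.0])
x_learned = correct(x_prior, z, h, k_learned)

print(k_classic, x_classic)
print(k_learned, x_learned)
```

In both routes the correction formula is identical. Only the source of the gain changes, which is the part the hybrid approach hands over to a learned model.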

To train and test InEKFormer, the team created a new dataset from the RH5 robot, comprising over 575,000 data samples collected over 50 minutes from both simulation and real-world experiments. The dataset includes five motion types: walking, squatting, turning, hip movement, and single-leg balancing, with ground truth data from a motion capture system. The Transformer model was trained with a mean squared error loss function, using the Adam optimizer and a OneCycleLR scheduler, as detailed in the methodology section. Input features for the gain estimator included observation differences, innovation differences, and contact states, processed through an encoder-decoder architecture with scaled dot-product attention, illustrated in Figure 3. Training modes varied: some models used teacher forcing and others attempted autoregressive training, though stable autoregressive training on larger datasets proved challenging.
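For readers unfamiliar with scaled dot-product attention, the following Python sketch shows the standard operation, softmax(QK^T / sqrt(d_k)) V, on toy data. It is a generic illustration using only the standard library. The input values are made up, and nothing here reflects the paper's actual layer sizes or feature encoding.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of query, key, and value vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors, one column at a time.
        outputs.append([
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights


# Toy sequence: two time steps with identical keys, so the query
# attends to both equally and the output is the mean of the values.
queries = [[1.0, 0.0]]
keys = [[1.0, 1.0], [1.0, 1.0]]
values = [[0.0, 2.0], [4.0, 6.0]]

outputs, weights = scaled_dot_product_attention(queries, keys, values)
print(weights)
print(outputs)
```

Each output row is a weighted mix of the value vectors, which is how an attention layer lets one time step draw on information from others in the sequence.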

Results, summarized in Table III, show that InEKFormer outperformed the hybrid KalmanNet in most tests, particularly on high-dimensional data from the RH5 robot. For example, in baseline tests on short sequences, InEKFormer models achieved root-mean-square error (RMSE) values up to 103 times lower than KalmanNet models when trained in autoregressive mode. In single-trajectory tests, model Ω6, trained on real robot data, showed lower orientation and position errors than models trained on simulation data alone, though the pure InEKF still outperformed it in some dimensions, such as z-axis velocity. Figure 7 illustrates that InEKFormer estimates closely followed ground truth trajectories, while KalmanNet models failed to learn the patterns effectively. However, the study noted limitations, such as transversal oscillations in some estimates caused by inconsistent sensor sampling frequencies, and the difficulty of achieving stable autoregressive training for online execution.
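As a reminder of what the RMSE figures measure, here is a small Python sketch that computes it for one state dimension. The trajectories below are made-up numbers for illustration only and are not taken from the paper or its dataset.

```python
import math


def rmse(estimates, ground_truth):
    """Root-mean-square error between two equally long sequences."""
    if not estimates or len(estimates) != len(ground_truth):
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(e - g) ** 2 for e, g in zip(estimates, ground_truth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical position values along one axis, in metres.
truth = [0.00, 0.10, 0.20, 0.30]
estimate_a = [0.01, 0.11, 0.19, 0.31]  # stays close to the ground truth
estimate_b = [0.10, 0.25, 0.05, 0.50]  # drifts away from it

print(rmse(estimate_a, truth))
print(rmse(estimate_b, truth))
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones, so a lower value indicates an estimate that tracks the motion-capture ground truth more tightly.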

The implications of this research extend beyond humanoid robots to other robotic platforms such as quadrupeds or aerial vehicles, since the hybrid approach could improve state estimation in many applications that require robust navigation. By reducing reliance on expert tuning and improving accuracy, InEKFormer may contribute to safer and more efficient robots in industrial, domestic, and rescue scenarios. The openly available dataset also supports further research in sim-to-real transfer learning, though the paper highlights gaps, such as the need for better simulation models to reduce discrepancies between simulated and real data. Future work should focus on advanced regularization techniques and more efficient Transformer architectures to enable real-time, online deployment, moving robotics toward more autonomous and adaptable systems.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
