Robots Learn from Mistakes to Pick Objects Faster

TL;DR

Memory-based policies help robots avoid repeated failures in bin picking, more than doubling mean picks per hour in physical experiments.

In automated warehouses and e-commerce fulfillment centers, robots that pick items from bins are essential for meeting growing demand, but they often get stuck repeatedly failing on the same objects, slowing down operations. This research introduces a smarter approach where robots remember past mistakes to adjust their actions, significantly improving performance without needing complex new hardware. For businesses relying on automation, this could mean faster order processing and fewer delays, making systems more reliable and cost-effective.

The key finding is that non-Markov policies, which incorporate memory of past actions and failures, outperform traditional methods that only consider the current observation. In physical experiments with 50 heaps of objects, the most effective non-Markov policy increased the mean picks per hour (MPPH) by 107% compared to the baseline Markov policy. This improvement stems from reducing repeated failures on sequential failure objects (SFOs): items that cause repeated grasp failures due to unobservable properties like porous surfaces or insufficient friction.

To achieve this, the researchers developed three non-Markov policies: cluster, circle, and swap. These policies use techniques such as masking grasp spaces or tracking objects over time to avoid repeated errors. For example, the cluster policy segments objects in the bin and masks gripper types that have failed on specific objects, while the circle policy restricts grasps to a small area around a failed attempt. The swap policy alternates between grippers and resorts to circle-like restrictions if failures persist. All policies build on the Dex-Net system, which uses a grasp quality convolutional neural network (GQCNN) to evaluate potential grasps, and they were tested in simulations and on a physical ABB YuMi robot equipped with parallel-jaw and suction-cup grippers.
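The memory idea behind the cluster policy can be illustrated with a minimal sketch. It is not the authors' implementation: the `Grasp` and `ClusterMemory` classes, the `select_grasp` function, and the integer object labels are all hypothetical, and the quality scores stand in for whatever a grasp quality network such as GQCNN would predict. The sketch assumes segmentation has already assigned each candidate grasp to an object.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Grasp:
    object_id: int   # hypothetical segment label of the object under the grasp
    gripper: str     # "suction" or "parallel_jaw"
    quality: float   # predicted grasp quality in [0, 1] (stand-in value)


@dataclass
class ClusterMemory:
    # object_id -> gripper types that have already failed on that object
    failed: Dict[int, Set[str]] = field(default_factory=dict)

    def record(self, grasp: Grasp, success: bool) -> None:
        if success:
            # The object has left the bin, so its failure history is dropped.
            self.failed.pop(grasp.object_id, None)
        else:
            self.failed.setdefault(grasp.object_id, set()).add(grasp.gripper)

    def allowed(self, grasp: Grasp) -> bool:
        return grasp.gripper not in self.failed.get(grasp.object_id, set())


def select_grasp(candidates: List[Grasp], memory: ClusterMemory) -> Optional[Grasp]:
    """Return the highest-quality candidate that memory has not masked."""
    unmasked = [g for g in candidates if memory.allowed(g)]
    if not unmasked:
        return None
    return max(unmasked, key=lambda g: g.quality)


# Same observation twice: a memoryless policy would repeat the failed grasp.
memory = ClusterMemory()
candidates = [
    Grasp(0, "suction", 0.9),
    Grasp(0, "parallel_jaw", 0.7),
    Grasp(1, "suction", 0.6),
]
first = select_grasp(candidates, memory)    # suction on object 0
memory.record(first, success=False)
second = select_grasp(candidates, memory)   # parallel jaw on object 0
```

With an unchanged observation, a Markov policy would pick the suction grasp on object 0 again. The memory masks that gripper for that object, so the next attempt falls to the parallel-jaw grasp.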

The results show clear benefits. In simulations with synthetic SFOs, non-Markov policies consistently reduced the sequential failure rate (SFR) and increased MPPH across different failure scenarios. For instance, in environments with both gripper and placement failures, the swap policy achieved an SFR of 1.64 compared to 3.25 for the Markov policy, and MPPH rose from 578 to 726. Physical experiments confirmed these gains, with the swap policy cutting the median length of failure sequences from 4.0 to 2.0 and raising MPPH from 81.4 to 168.3. These metrics indicate not only fewer repeated errors but also higher overall productivity, as robots spend less time on unsuccessful grasps.
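The reported gains can be checked with a few lines of arithmetic. The sketch below is only illustrative: `percent_change` and `failure_run_lengths` are hypothetical helper names, the outcome log is made up, and because this article does not give the paper's exact SFR formula, the code only measures runs of consecutive failures rather than reproducing that metric.

```python
from itertools import groupby
from statistics import median
from typing import List


def percent_change(baseline: float, improved: float) -> float:
    """Relative change from baseline to improved, in percent."""
    return 100.0 * (improved - baseline) / baseline


def failure_run_lengths(outcomes: List[bool]) -> List[int]:
    """Length of each maximal run of consecutive failed grasps (False)."""
    return [len(list(run)) for success, run in groupby(outcomes) if not success]


# MPPH figures quoted above.
physical_gain = percent_change(81.4, 168.3)   # about 106.8, i.e. the 107% headline
simulated_gain = percent_change(578, 726)     # about 25.6

# Made-up grasp log: True = successful pick, False = failed grasp.
log = [True, False, False, True, False, False, False, False, True]
runs = failure_run_lengths(log)               # [2, 4]
median_run = median(runs)                     # 3.0
```

The physical MPPH figures reproduce the 107% headline number, while the simulated figures correspond to a smaller gain of roughly 26%.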

This advancement matters because it addresses a common bottleneck in robotics for logistics and manufacturing. By enabling robots to adapt to their own errors in real time, systems can handle a wider variety of objects without manual intervention, leading to smoother operations in settings like Amazon warehouses or automotive assembly lines. For everyday consumers, this could translate to faster delivery times and lower costs, as automated systems become more adept at managing unpredictable items.

Limitations include the reliance on specific sensor setups, such as depth and weight sensors, which may not generalize to all environments. The paper notes that uncertainty in object segmentation and tracking could affect performance, and future work is needed to explore neural network corrections for grasp quality predictions. Additionally, the policies were tested primarily in controlled settings, and their effectiveness in highly dynamic or cluttered real-world scenarios remains to be fully validated.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
