Artificial intelligence has taken a significant step forward in mastering complex environments that require long-term planning and strategic exploration. Researchers have developed a method called Abstract Exploration (ABSPLORE) that enables AI agents to excel in some of the most challenging Atari games, including achieving superhuman performance in Pitfall! without needing human demonstrations. This breakthrough addresses a fundamental limitation in reinforcement learning: the difficulty of planning in high-dimensional environments where small errors compound over time, leading to poor long-term predictions.
The key discovery is that by working with simplified representations of game states rather than raw pixels, the AI can maintain highly accurate models that enable effective planning. The researchers found that their approach constructs what they call an 'abstract Markov Decision Process': a simplified map of the game world that captures essential information, such as the agent's position and inventory, while ignoring less critical details. This abstraction lets the AI plan strategic exploration paths and reliably navigate between game states.
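To make the idea concrete, here is a minimal sketch of what such an abstraction might look like. The specific fields (room, grid cell, inventory) and the grid size are assumptions for illustration, not the paper's actual code; the point is that many raw states collapse into one discrete abstract state, and the abstract MDP is just a graph of transitions observed between those states.

```python
# Hypothetical abstraction sketch: field names and the 16-pixel grid are assumed.
from collections import defaultdict

def abstract_state(info):
    """Map a rich game state to a small, discrete abstract state."""
    # Coarsen pixel coordinates into grid cells so nearby positions
    # collapse to the same abstract state.
    return (info["room"], info["x"] // 16, info["y"] // 16,
            frozenset(info["inventory"]))

# The abstract MDP is a graph of observed transitions between abstract states.
transitions = defaultdict(set)

def record_transition(info_before, info_after):
    """Record an edge whenever the agent's abstract state changes."""
    s, s2 = abstract_state(info_before), abstract_state(info_after)
    if s != s2:
        transitions[s].add(s2)
```

Because the abstract state is discrete and low-dimensional, the model never accumulates pixel-level prediction error: an edge either exists in the graph or it does not.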
The methodology operates through a manager-worker system. The manager handles high-level planning using the abstract game representation, while the worker deals with the low-level details of actual gameplay. As shown in Figure 2 of the research, the manager navigates the agent to the edges of known territory, then explores randomly to discover new transitions. Once a potential path between abstract states is identified, the worker learns the specific skills needed to reliably traverse that path. This separation of concerns prevents the compounding errors that typically plague model-based reinforcement learning in complex environments.
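The manager-worker split described above can be sketched as follows. This is an illustrative simplification under assumed interfaces (a transition graph, one worker "skill" per abstract edge), not the paper's implementation: the manager plans a path through the abstract graph with ordinary graph search, and the worker executes one edge at a time.

```python
# Illustrative manager-worker sketch; all names and interfaces are assumed.
from collections import deque

def plan(transitions, start, goal):
    """Manager: breadth-first search for a path of abstract states."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:
            path = []
            while s is not None:       # walk parent pointers back to start
                path.append(s)
                s = parent[s]
            return path[::-1]
        for s2 in transitions.get(s, ()):
            if s2 not in parent:
                parent[s2] = s
                frontier.append(s2)
    return None                        # goal not yet reachable in the graph

def execute(path, workers):
    """Worker side: call one learned skill per abstract transition."""
    for s, s2 in zip(path, path[1:]):
        workers[(s, s2)]()             # each skill reliably traverses one edge
```

Planning happens entirely in the small abstract graph, so it stays exact; only the per-edge skills have to cope with raw pixels, which keeps their errors from compounding over long horizons.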
The results demonstrate remarkable performance improvements. In Pitfall!, ABSPLORE achieved an average score of 9,959.6 after 8 billion training frames, significantly exceeding human performance of 6,464. The method also showed strong results in Montezuma's Revenge and Private Eye, outperforming previous state-of-the-art approaches. Perhaps most impressively, the learned abstract models enable rapid transfer to new tasks. When presented with new reward functions never seen during training, ABSPLORE achieved about 3 times higher reward using 1,000 times fewer samples than model-free methods trained from scratch.
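The rapid transfer result follows from the structure of the method: once the abstract model is learned, adapting to a new reward function only requires re-planning, not relearning dynamics. A minimal sketch of that idea, assuming deterministic transitions and plain discounted value iteration (the paper's actual planner may differ):

```python
# Re-planning in a learned abstract MDP for a new reward; details assumed.
def value_iteration(transitions, reward, gamma=0.99, iters=200):
    """Compute state values in a deterministic abstract MDP."""
    states = set(transitions) | {s2 for succ in transitions.values() for s2 in succ}
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            succ = transitions.get(s, ())
            best = max((V[s2] for s2 in succ), default=0.0)
            V[s] = reward(s) + gamma * best   # greedy over known successors
    return V
```

Because the abstract graph might contain only thousands of states, this re-planning takes a fraction of a second, versus billions of frames for model-free training from scratch.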
This research matters because it addresses real-world challenges where multiple tasks share the same underlying dynamics but differ in their goals. The paper notes practical applications like navigating the internet or manipulating objects with a robotic arm, where the same basic skills can be applied to different objectives. The ability to quickly adapt to new tasks without extensive retraining could significantly reduce the computational costs of deploying AI systems in changing environments.
The approach does have limitations, primarily its reliance on a predefined abstraction function that maps complex game states to simplified representations. While the paper shows promising results for automatically learning this abstraction from pixels alone, current performance using learned abstractions (75% accuracy in Montezuma's Revenge, 64% in Pitfall!, and 81% in Private Eye) still falls short of hand-designed abstractions. Additionally, the method assumes that rewards depend only on the abstract states, which may not hold in all real-world scenarios.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.