
AI Learns Faster Without Forgetting Past Tasks

A new method enables AI agents to quickly master new skills by reusing knowledge from previous experiences, avoiding the common problem of catastrophic forgetting in reinforcement learning.

AI Research
November 14, 2025
3 min read

Artificial intelligence systems that learn through trial and error, known as reinforcement learning agents, often struggle when faced with multiple tasks over their lifetime. They typically require extensive exploration for each new challenge and tend to forget how to perform earlier ones, an issue called catastrophic forgetting. This inefficiency makes training costly, especially in real-world applications like robotics or medical treatment, where experience is expensive. A new approach addresses this by allowing AI to build on past knowledge, accelerating learning without sacrificing performance on previous tasks.

The researchers developed a method called Lifelong Policy Gradient Learning with Factored Policies (LPG-FTW), which enables agents to learn new tasks faster by reusing information accumulated from earlier ones. Unlike traditional methods that train each task independently or use a single model for all tasks, this approach factors the agent's policy into shared knowledge components and task-specific mappings. This means the AI can search for solutions within the span of previously learned factors, leveraging past experience to find high-performing policies more quickly. The method ensures that as the agent encounters new tasks, it updates its shared knowledge base every few steps, incorporating relevant information without modifying the core components needed for earlier tasks.
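The core idea of the factorization can be sketched in a few lines of NumPy. This is an illustrative toy (the dimensions, names, and random initialization are assumptions, not the paper's actual setup): the policy parameters for task t are a linear combination of shared dictionary columns, so learning a new task means searching over the small coefficient vector rather than the full parameter space.

```python
import numpy as np

# Hypothetical dimensions: d policy parameters, k shared knowledge factors (k << d).
d, k = 6, 3
rng = np.random.default_rng(0)

# Shared dictionary L: each column is a knowledge component reusable across tasks.
L = rng.normal(size=(d, k))

# Task-specific coefficients s_t: how task t combines the shared factors.
s_t = rng.normal(size=(k,))

# The policy parameters for task t lie in the span of the dictionary columns.
theta_t = L @ s_t

# Searching "within the span" of past knowledge means optimizing only s_t for a
# new task: a k-dimensional search instead of a d-dimensional one.
assert theta_t.shape == (d,)
```

The payoff is that a new task only needs to fit k coefficients, while the shared dictionary concentrates what all tasks have in common.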

To implement this, the algorithm uses policy gradient methods, which are effective for controlling high-dimensional systems like robots. It factors the policy parameters into a shared dictionary and task-specific coefficients, optimizing them directly via gradients. Every M steps, the shared knowledge is updated based on information from the current task, using a second-order approximation to maintain performance on all previous tasks. This process avoids the need to store large amounts of data or recompute parameters for earlier tasks, making it efficient for lifelong learning. The initialization phase starts with an empty dictionary and adds columns as new tasks are encountered, preventing redundant discoveries and enabling early tasks to benefit from subsequent learning.
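To make the second-order dictionary update concrete, here is a minimal sketch of one way such a refit can look. The function name, the closed-form least-squares solve, and the per-task inputs are assumptions for illustration, not the paper's exact algorithm: each task contributes its coefficients, a Hessian approximation of its objective, and the parameters it converged to, and the dictionary is refit to keep every task's reconstructed policy close to its optimum, weighted by curvature.

```python
import numpy as np

def update_dictionary(L, tasks, lam=1e-3):
    """Refit a shared dictionary from per-task second-order information.

    Each entry of `tasks` is (s, H, theta_star): task coefficients, a
    positive-definite Hessian approximation, and the converged parameters.
    We minimize the quadratic surrogate
        sum_t (L @ s_t - theta_t)^T H_t (L @ s_t - theta_t) + lam * ||L||^2,
    which is a linear system in vec(L) (hypothetical sketch, not the
    paper's exact update).
    """
    d, k = L.shape
    A = lam * np.eye(d * k)  # small ridge term keeps the system well-posed
    b = np.zeros(d * k)
    for s, H, theta_star in tasks:
        # vec(H L s s^T) = (s s^T  kron  H) vec(L), with column-stacked vec
        A += np.kron(np.outer(s, s), H)
        b += np.kron(s, H @ theta_star)
    # Solve for vec(L) and reshape back to d x k (column-stacked ordering).
    return np.linalg.solve(A, b).reshape(k, d).T
```

Because only the per-task (s, H, theta_star) summaries are stored, the update never replays raw experience from earlier tasks, which is what keeps the memory cost of lifelong learning modest in this kind of scheme.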

Empirical evaluations on benchmark domains, including MuJoCo simulations and the challenging Meta-World benchmark, demonstrate that LPG-FTW learns significantly faster than single-task learning and other lifelong baselines. For example, in HalfCheetah and Hopper environments with varying gravity or body part sizes, LPG-FTW achieved higher performance in five out of six domains with less experience, as shown in Figure 1. It completely avoided catastrophic forgetting, maintaining proficiency on all earlier tasks, whereas methods like Elastic Weight Consolidation (EWC) failed in some domains and PG-ELLA often performed worse than single-task learning. In Meta-World, which involves a robotic arm manipulating objects, LPG-FTW accelerated learning and suffered no forgetting in the MT50 benchmark, outperforming all baselines.

This advancement matters because it reduces the amount of experience needed for AI to become proficient at diverse tasks, making reinforcement learning more practical for real-world systems. In settings like robotics or personalized medicine, where training data is limited and costly, this could lead to more efficient and adaptable AI agents. By avoiding catastrophic forgetting, the method ensures that agents retain their skills over time, similar to how humans build on past knowledge to learn new things faster.

The study notes limitations, such as reliance on task indicators to reconstruct individual policies and the assumption that tasks are drawn independently from a stationary distribution. Future work could address non-stationarity by dynamically adding or removing factors as the environment changes. Despite these limitations, the method represents a step toward scalable lifelong learning, with theoretical guarantees of convergence to an approximate multi-task objective.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn