Multi-agent reinforcement learning (MARL) enables AI systems to learn cooperative or competitive behaviors through repeated interactions, but most research assumes stable environments. In reality, AI teams often operate in open systems where agents join or leave, tasks appear or vanish, and agent capabilities change over time. This study investigates how such dynamics complicate a core challenge in MARL: the credit assignment problem (CAP), the problem of determining how much each agent contributed to the overall outcome. Accurate credit assignment is crucial for effective learning and coordination, yet open systems violate the static assumptions of traditional methods, leading to misattribution and degraded performance.
The key finding is that openness in multi-agent systems directly causes credit misassignment, resulting in unstable learning and reduced coordination. The researchers distinguish three types of openness: agent openness (agents entering or leaving), task openness (tasks emerging or disappearing), and type openness (agent capabilities or preferences changing). In experiments, these dynamics led to significant performance drops, with systems struggling to assign credit correctly when team composition or tasks shifted mid-operation.
The methodology combined conceptual analysis with empirical evaluations using established MARL algorithms. The team adapted Deep Q-Networks (DQN) to study temporal credit assignment (TCA), which links actions to delayed rewards, and Multi-Agent Proximal Policy Optimization (MAPPO) to study structural credit assignment (SCA), which attributes outcomes to specific agents or components. Both were tested in a wildfire suppression simulation from the MOASEI benchmark, where firefighters (agents) must coordinate to put out fires (tasks) while dealing with recharging, equipment changes, and dynamic fire intensities. Padding and action masking were used to handle variable agent numbers, though these techniques bound the team size and so do not fully capture unbounded openness.
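To make the padding and action-masking idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the constants `MAX_AGENTS`, `OBS_DIM`, and `NUM_ACTIONS`, the helper names, and the convention that action 0 is a no-op are all assumptions made for illustration. The idea is that the team is stored in a fixed number of slots, absent agents get zero-filled observations, and their action choices are restricted so that a fixed-size network can still be used.

```python
import math

MAX_AGENTS = 4    # hypothetical upper bound on team size
OBS_DIM = 3       # hypothetical per-agent observation size
NUM_ACTIONS = 3   # hypothetical action count; action 0 is assumed to be a no-op


def pad_observations(active_obs):
    """Pad a dict {slot: observation} to MAX_AGENTS rows plus a presence mask."""
    padded = [[0.0] * OBS_DIM for _ in range(MAX_AGENTS)]
    present = [False] * MAX_AGENTS
    for slot, obs in active_obs.items():
        padded[slot] = list(obs)
        present[slot] = True
    return padded, present


def action_mask(is_present):
    """Present agents may take any action; absent slots may only take the no-op."""
    if is_present:
        return [True] * NUM_ACTIONS
    return [True] + [False] * (NUM_ACTIONS - 1)


def masked_greedy_action(q_values, valid):
    """Pick the highest-valued action among those marked valid."""
    masked = [q if ok else -math.inf for q, ok in zip(q_values, valid)]
    return max(range(len(masked)), key=masked.__getitem__)


# Usage: only slots 0 and 2 are currently on the team.
obs, present = pad_observations({0: [0.2, 0.5, 1.0], 2: [0.9, 0.1, 0.0]})
q = [0.1, 0.7, 0.4]  # made-up Q-values, shared across slots for simplicity
actions = [masked_greedy_action(q, action_mask(p)) for p in present]
```

Because the number of slots is fixed in advance, this kind of scheme can only represent teams up to `MAX_AGENTS`, which is why bounded implementations of this sort stop short of truly unbounded openness.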
The results showed that in static conditions (no openness), both DQN and MAPPO achieved high average rewards and stable learning curves, indicating reliable credit assignment. Under agent openness, such as agents leaving or returning, performance degraded sharply. DQN's Q-value estimates became volatile, with training curves showing high variance and slower convergence, as broken temporal connections made reward propagation unreliable. MAPPO also suffered, with critic loss increasing and actor updates becoming erratic, reflecting an inability to attribute credit accurately in changing teams. In the wildfire domain, this led to under-coordination, with more fires burning out of control and penalties accumulating. Combined openness conditions (agent, task, and type changes together) caused the worst performance, with coordination collapsing and systems failing to learn effective policies.
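A toy calculation can illustrate what a broken temporal connection means for reward propagation. The sketch below is not taken from the paper: the discount factor `GAMMA`, the function name, the reward sequence, and above all the rule that an agent's absence resets the running return are assumptions chosen to keep the example small. It computes discounted returns backward through an episode and cuts the chain wherever the agent is absent.

```python
GAMMA = 0.9  # hypothetical discount factor


def discounted_returns(rewards, present):
    """Backward pass G_t = r_t + GAMMA * G_{t+1}, cut wherever the agent is absent."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if not present[t]:
            running = 0.0  # assumption: absence severs the credit chain
            continue
        running = rewards[t] + GAMMA * running
        returns[t] = running
    return returns


# A single delayed reward arrives at the final step.
rewards = [0.0, 0.0, 0.0, 10.0]
static_team = discounted_returns(rewards, [True, True, True, True])
open_team = discounted_returns(rewards, [True, True, False, True])
```

With the agent present throughout, the delayed reward flows back to every earlier step, discounted by `GAMMA` each time. With the agent absent at the third step, the earlier steps receive no credit at all under this assumed rule, even though the actions taken there may have contributed to the final outcome.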
These findings matter because real-world AI applications, from autonomous vehicles to disaster response robots, often involve open systems. Misassigned credit can lead to slow learning, suboptimal decisions, and even catastrophic failures in cooperative tasks. For instance, in a ridesharing system where drivers (agents) or ride requests (tasks) change dynamically, the AI might misattribute successes or failures, reducing efficiency and reliability. This research highlights the need for algorithms that adapt to fluid environments so that AI teams remain robust in practical scenarios.
The paper's limitations include its reliance on bounded implementations (via padding and masking), which do not fully address unbounded openness, where the number of agents or tasks is not artificially constrained. The study also focused on cooperative settings, leaving competitive and mixed environments less explored. Future work should develop adaptive mechanisms, such as context-sensitive bootstrapping or graph-based attribution, to handle non-stationarity and evolving team structures effectively.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.