AIResearch
Ethics

AI Agents Can't Handle Helpful Advice

AI collaborators struggle with helpful advice, often derailing group tasks and reducing success. New method creates 'partner-aware' AI that knows when to listen and when to resist.

AI Research
November 14, 2025
4 min read

Large language models (LLMs) are increasingly deployed as AI agents that collaborate with humans in group settings, from classrooms to workplaces. However, these AI collaborators often struggle with a critical skill: distinguishing between helpful suggestions and misleading advice from their partners. This limitation can derail collaborative tasks, as AI agents may either ignore valuable input or uncritically adopt flawed reasoning, reducing the group's shared understanding and success. A new study reveals why this happens and introduces a method to create 'partner-aware' AI collaborators that significantly improve group performance by learning when to listen and when to resist.

The researchers discovered that standard AI training methods, such as Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO), produce suboptimal collaborators. These methods optimize AI behavior based on reward structures that treat partner interventions as static context rather than dynamic inputs with variable quality. As a result, AI agents fail to genuinely contribute to group success when faced with interventions that are noisy, irrelevant, or misleading. In experiments, these baseline agents showed lower common-ground convergence and task accuracy compared to the new approach, highlighting a fundamental gap in how AI collaborators reason about partner input.
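To make the "static context" critique concrete, here is a minimal sketch of a standard DPO-style preference loss. The function names and signature are illustrative, not the paper's code; the point is structural: the partner's intervention is baked into the log-probabilities via the prompt, and nothing in the objective asks whether that intervention was actually helpful or misleading.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Toy DPO preference loss (illustrative sketch).

    Each log-probability is the policy's (or reference model's) score for a
    response *conditioned on the full prompt* -- including any partner
    intervention. The intervention is static context: the loss only compares
    chosen vs. rejected responses, so the model is never asked to judge
    whether the intervention itself was helpful, irrelevant, or misleading.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the chosen response is clearly
    # preferred over the rejected one relative to the reference model.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With equal margins the loss sits at `-log(0.5)`, and it falls as the policy increasingly prefers the chosen response; at no point does intervention quality enter the objective, which is the gap the study identifies.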

To address this, the team developed the Interruptible Collaborative Roleplayer (ICR) algorithm, which trains AI agents using a Modified-Action Markov Decision Process (MAMDP) framework. This approach simulates multi-party collaborations where an intervention agent provides suggestions, and the collaborator agent must respond. ICR incorporates a counterfactual invariance regularization term during training, encouraging the AI to maintain consistent reasoning even when interventions do not improve task utility. Essentially, the AI learns to evaluate whether an intervention is helpful by comparing its actions in the actual scenario to what it would do in a counterfactual state where the intervention is neutralized. This method does not require explicit common-ground rewards during training; instead, robust collaboration emerges as the AI optimizes for general utility while resisting misleading input.
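The counterfactual invariance idea can be sketched as a simple penalty term. This is a toy illustration under stated assumptions, not the paper's exact formulation: we assume the trainer can estimate an intervention's utility gain and can re-run the policy in a counterfactual state where the intervention is neutralized. When the intervention did not improve utility, the agent is penalized for letting it change its action distribution.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution over actions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how much distribution p diverges from q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def icr_invariance_penalty(logits_actual, logits_counterfactual, utility_gain):
    """Toy counterfactual-invariance regularizer (illustrative names).

    logits_actual: the agent's action scores given the partner's intervention.
    logits_counterfactual: action scores with the intervention neutralized.
    utility_gain: estimated change in task utility attributable to the
        intervention (positive = helpful).

    A helpful intervention incurs no penalty -- the agent is free to change
    its behaviour. An unhelpful one is penalized in proportion to how much
    it shifted the agent's action distribution.
    """
    if utility_gain > 0:
        return 0.0
    return kl_divergence(softmax(logits_actual), softmax(logits_counterfactual))
```

In training, a term like this would be added to the task-utility objective, pushing the agent to act as if unhelpful interventions had never happened while remaining responsive to helpful ones.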

Results from experiments on two collaborative tasks—the Wason Card Selection task and the Weights Task—demonstrate ICR's superiority. In the full-press condition, where agents communicate in natural language, ICR achieved a common-ground score of 14.06 on the Weights Task, a 47% improvement over the next best method (DPO at 9.56). It also reached 88% solution accuracy, outperforming baselines by up to 24%. In the no-press condition, with limited communication, ICR maintained strong performance, indicating its ability to foster alignment without extensive dialogue. Analysis showed that ICR agents progressively built common ground over turns, especially in complex relational tasks, by selectively integrating helpful interventions and discarding unhelpful ones.

The implications extend to real-world applications such as educational tutoring and team-based workflows, where AI assistants must adapt to diverse partner styles without compromising task integrity. For instance, in a classroom setting, an ICR-trained AI could help students resolve misunderstandings by critically evaluating suggestions, much like an experienced peer. This partner-aware behavior enhances group learning and decision-making, making AI collaborations more reliable and effective.

However, the study notes limitations, including the use of fixed intervention agents and the focus on text-based interactions. Future work should explore generalization to varied partners, multimodal inputs, and adversarial scenarios where deception may occur. Additionally, ethical considerations arise, as partner-aware methods could potentially be misused for manipulative purposes if not paired with safeguards. The researchers emphasize the need for collusion-focused testing and ethical deployment frameworks to mitigate risks.

In summary, this research uncovers a critical flaw in current AI collaborators and offers a principled solution through the ICR algorithm. By learning to be safely interruptible—open to valuable input yet resilient to noise—AI agents can significantly improve group outcomes, paving the way for more intelligent and trustworthy human-AI partnerships.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
