As artificial intelligence becomes increasingly capable of generating human-like text, a new study reveals how these systems can be weaponized to undermine official communications from government agencies. Researchers have created the first large-scale dataset of AI-generated persuasion attacks, showing how models like GPT-4, Gemma 2, and Llama 3.1 systematically exploit fundamental human values to create competing narratives that undermine official messaging. This work comes at a critical time, as organizations struggle to maintain public trust amid the rapid proliferation of AI-generated content that can be produced at unprecedented speed and scale.
The study's key finding is that different AI models employ distinct moral strategies when generating persuasive attacks against government press releases. GPT-4 primarily focuses on appeals to Care (with an average resonance of 0.32), emphasizing protection and emotional concern, while also incorporating Authority (0.26) and Loyalty (0.26) to reinforce trust in leadership and group cohesion. Gemma 2 shows an even stronger emphasis on Care (0.42), particularly through techniques like Exaggeration, which scores 0.62 for Care, creating urgent emotional appeals. Llama 3.1 takes a different approach, prioritizing Loyalty (0.32) through techniques like Appeal to Time, which scores 0.53 for Loyalty, emphasizing tradition and group identity alongside Care (0.31). These patterns demonstrate that AI models don't just generate random criticism but systematically target the moral foundations that shape human judgment and decision-making.
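To make the aggregation behind these numbers concrete, here is a minimal sketch of how such model-level profiles could be computed from the released attacks, assuming a flat export with one row per generated attack and one resonance column per foundation; the file name persuasion_attacks.csv and the column names are hypothetical, not the dataset's actual schema.

```python
# Minimal sketch: aggregate per-attack moral-resonance scores into
# model-level profiles. The CSV path and column names are hypothetical
# stand-ins for the dataset's actual schema.
import pandas as pd

FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "purity"]

attacks = pd.read_csv("persuasion_attacks.csv")  # hypothetical export

# Mean resonance per model and foundation, e.g. GPT-4 -> Care ~0.32.
profile = attacks.groupby("model")[FOUNDATIONS].mean().round(2)
print(profile)

# Mean resonance per (model, technique) pair, which would surface
# results like Gemma 2's Exaggeration scoring highest on Care.
by_technique = attacks.groupby(["model", "technique"])[FOUNDATIONS].mean()
print(by_technique.sort_values("care", ascending=False).head())
```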
The researchers developed their methodology by collecting 972 press releases from ten government agencies, including the Department of Defense, DARPA, and Sandia National Laboratories, covering topics from military technology to artificial intelligence research. They then used three leading language models (GPT-4, Gemma 2, and Llama 3.1) to generate 23 different types of persuasive attacks against each article, creating a total of 134,136 attacks. Each attack was generated in two formats: press release statements for long-form communication and social media posts for short-form messaging. The persuasive techniques included common rhetorical strategies such as Appeal to Authority, Appeal to Fear, False Dilemma, Whataboutism, and Loaded Language, drawn from established persuasion taxonomies used in computational linguistics research.
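Since the authors deliberately withhold their exact prompts, the sketch below only illustrates the combinatorics of the generation loop (972 articles × 3 models × 23 techniques × 2 formats = 134,136 attacks); the generate stub and the prompt wording are placeholders, and only five of the 23 techniques are listed.

```python
# Hypothetical sketch of the generation loop implied by the paper's
# counts: 972 articles x 3 models x 23 techniques x 2 formats = 134,136.
from itertools import product

MODELS = ["gpt-4", "gemma-2", "llama-3.1"]
TECHNIQUES = ["Appeal to Authority", "Appeal to Fear", "False Dilemma",
              "Whataboutism", "Loaded Language"]  # 5 of the 23 used
FORMATS = ["press_release", "social_media_post"]

def generate(model: str, prompt: str) -> str:
    # Stand-in for the model-specific inference call.
    return f"[{model} output elided]"

def build_prompt(article: str, technique: str, fmt: str) -> str:
    # Placeholder wording; the paper shares only a high-level structure.
    return (f"Write a {fmt.replace('_', ' ')} arguing against the press "
            f"release below using the '{technique}' technique.\n\n{article}")

def generate_attacks(articles: list[str]) -> list[dict]:
    return [
        {"model": m, "technique": t, "format": f,
         "text": generate(m, build_prompt(article, t, f))}
        for article, m, t, f in product(articles, MODELS, TECHNIQUES, FORMATS)
    ]
```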
The results, detailed in Tables 1-3 of the paper, show consistent patterns in how different models align with specific moral foundations. GPT-4's Flag Waving technique scored 0.38 for Care, while its Appeal to Authority scored 0.42 for Authority, demonstrating how different techniques target different values. Gemma 2 showed particularly strong emotional appeals, with Exaggeration scoring 0.62 for Care and Repetition scoring 0.51 for Authority. Llama 3.1's Appeal to Time scored 0.53 for Loyalty, the highest single score across all models for any moral foundation. The researchers analyzed these patterns using Moral Foundations Theory, which identifies five core values: Care (compassion and harm prevention), Fairness (justice and equality), Loyalty (group belonging), Authority (respect for leadership), and Purity (sanctity and discipline). Their analysis revealed that while all models emphasize Care, they differ significantly in their secondary focuses: GPT-4 and Gemma 2 rely more on Authority, while Llama 3.1 emphasizes Loyalty.
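The paper's exact resonance metric is not spelled out here, but a common baseline in Moral Foundations Theory work is lexicon matching: count foundation-associated words and normalize by text length. The toy lexicon below is illustrative only, standing in for a full resource such as the Moral Foundations Dictionary.

```python
# Illustrative baseline only: score a text against the five moral
# foundations by exact-token lexicon matching. The mini-lexicon is a
# toy stand-in for a full Moral Foundations Dictionary.
import re

LEXICON = {
    "care":      {"protect", "harm", "suffer", "safety", "compassion"},
    "fairness":  {"fair", "justice", "equal", "rights", "honest"},
    "loyalty":   {"loyal", "betray", "nation", "tradition", "unity"},
    "authority": {"authority", "leader", "obey", "law", "order"},
    "purity":    {"pure", "sacred", "disgust", "integrity", "clean"},
}

def foundation_scores(text: str) -> dict[str, float]:
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {f: 0.0 for f in LEXICON}
    return {
        foundation: sum(t in words for t in tokens) / len(tokens)
        for foundation, words in LEXICON.items()
    }

print(foundation_scores(
    "Only a loyal nation can protect its people from harm."
))
```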
These findings have significant implications for how organizations communicate in an AI-dominated information environment. Government agencies and other institutions can use this dataset to understand the specific vulnerabilities in their messaging and develop more resilient communication strategies. The research enables proactive defense against AI-generated persuasion by revealing the moral frameworks that different models employ. This is particularly important because AI systems can generate persuasive content at a scale and speed that human communicators cannot match, creating what the researchers describe as "an uphill battle for effective and resilient communication." By understanding how AI models construct competing narratives, organizations can anticipate attack vectors and design messaging that maintains integrity even when subjected to sophisticated AI-generated criticism.
The study acknowledges several limitations, including potential biases in the training data of the AI models used, which may reflect societal, political, or cultural biases that influence the persuasive attacks they generate. The concept of moral resonance, while useful for analysis, may not always correspond to real-world outcomes, as the impact of these appeals can vary depending on interpretation and context. Additionally, the tone and content of the original press releases may influence the form of the persuasive attacks, creating variability that future research will need to explore. The researchers also made an ethical decision not to disclose the exact prompts used to generate attacks, providing only a high-level structural overview to prevent malicious use while still enabling technical discussion. These limitations highlight the need for continued investigation into how contextual factors and model biases interact in persuasive strategies, but the dataset provides a valuable foundation for such research.
The operational implications of this work are substantial, as the dataset allows organizations to develop targeted countermeasures against AI-generated persuasion. By providing unprecedented insight into how different models construct attacks across various techniques, agencies can build what the researchers call "reputation armor": communication frameworks that withstand sophisticated persuasion attempts. This is particularly crucial as information operations increasingly leverage AI systems to shape public discourse. The dataset's scale, covering 23 persuasive techniques across 972 articles from 10 agencies, makes it a comprehensive resource for understanding the mechanisms behind AI-driven persuasive attacks. As the information landscape becomes more contested, this research represents a critical step toward maintaining public trust in official communications against the backdrop of rapidly advancing AI capabilities.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.