AI Fails to Forget Visual Misinformation

Multimodal large language models (MLLMs) are increasingly used in applications like image understanding and question answering, but they often learn and spread false information, raising privacy and security concerns. A new study reveals that current methods to remove such learned misinformation are inadequate, especially when it involves images, exposing vulnerabilities that could affect real-world uses in social media, news, and personal data handling.

The researchers found that MLLMs struggle to unlearn visual rumors (false information embedded in images) and that supposedly erased knowledge can be recovered through simple attacks or relearning. In tests, methods designed to forget specific data failed to sever the associations between images and text, and performance dropped significantly in selective unlearning scenarios. For example, when asked to erase only private details such as a player's transfer fee while keeping shared facts, models often damaged both, indicating poor precision. The study also found that apparent unlearning efficacy is largely driven by catastrophic forgetting, in which models lose unrelated knowledge as well, and that all evaluated methods are vulnerable to adversarial prompts that trick the model into recalling deleted information.
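
To make the precision problem concrete, here is a minimal Python sketch of how selective unlearning could be scored: one rate for private facts that were successfully forgotten and one for shared facts that survived. The Probe structure, the substring matching rule, and the sample answers are all illustrative assumptions, not the study's actual metrics or data.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One question put to the model after unlearning (hypothetical structure)."""
    kind: str      # "private" (should be forgotten) or "shared" (should be kept)
    expected: str  # the fact the model knew before unlearning
    answer: str    # what the model says after unlearning


def _still_known(probe: Probe) -> bool:
    # Assumption: a fact counts as "still known" if it appears in the answer.
    return probe.expected.strip().lower() in probe.answer.strip().lower()


def selective_scores(probes):
    """Return (forget_rate, retain_rate) for a list of probes."""
    private = [p for p in probes if p.kind == "private"]
    shared = [p for p in probes if p.kind == "shared"]
    if not private or not shared:
        raise ValueError("need at least one private and one shared probe")
    forget_rate = sum(not _still_known(p) for p in private) / len(private)
    retain_rate = sum(_still_known(p) for p in shared) / len(shared)
    return forget_rate, retain_rate


# Made-up answers for an anonymous "Player A"; not taken from the benchmark.
probes = [
    Probe("private", "50 million", "I do not have that information."),
    Probe("private", "2027", "The contract runs until 2027."),
    Probe("shared", "forward", "He plays as a forward."),
    Probe("shared", "Club B", "I am not sure."),
]

forget_rate, retain_rate = selective_scores(probes)
print(f"forget rate: {forget_rate:.2f}, retain rate: {retain_rate:.2f}")
```

A precise method would push both rates toward 1.0. The failure described above would show up as a retain rate that falls together with, or instead of, a rising forget rate.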

To assess these issues, the team developed OFFSIDE, a benchmark based on real-world data from football players, including 15.68K records with images and text. It evaluates unlearning in four settings: complete unlearning (removing all entity connections), selective unlearning (erasing only private attributes), corrective relearning (reintroducing corrected facts), and unimodal unlearning (using text-only inputs). The data was manually curated from sources like Google, ensuring diversity and realism, and split into subsets for training, testing, and evaluation. Models such as Qwen2.5-VL-3B and Qwen2.5-VL-7B were fine-tuned and then subjected to unlearning algorithms like gradient ascent and preference optimization, with metrics tracking accuracy, generation quality, and factuality.
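
For readers unfamiliar with gradient ascent unlearning, the idea is to reverse ordinary training on the data to be forgotten: instead of lowering the loss on those examples, the model is updated to raise it. The sketch below illustrates this with a tiny logistic model in plain Python. The model, the three made-up features, and the toy data are assumptions chosen for illustration; this is not the paper's implementation and it is far simpler than fine-tuning a model like Qwen2.5-VL.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(weights, data):
    """Average binary cross-entropy of a logistic model on (features, label) pairs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)


def mean_gradient(weights, data):
    """Gradient of mean_loss with respect to the weights."""
    grad = [0.0] * len(weights)
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for i, xi in enumerate(x):
            grad[i] += (p - y) * xi
    return [g / len(data) for g in grad]


def step(weights, data, lr, ascent=False):
    """One gradient step: descent to learn, ascent to unlearn."""
    sign = 1.0 if ascent else -1.0
    grad = mean_gradient(weights, data)
    return [w + sign * lr * g for w, g in zip(weights, grad)]


# Toy data: features are [bias, image_cue, text_cue], all hypothetical.
forget_set = [([1.0, 1.0, 0.2], 1), ([1.0, 0.9, 0.1], 1)]
retain_set = [([1.0, 0.1, 1.0], 1), ([1.0, 0.0, -1.0], 0), ([1.0, 0.2, 0.9], 1)]

# Phase 1: ordinary fine-tuning on everything (gradient descent).
weights = [0.0, 0.0, 0.0]
for _ in range(200):
    weights = step(weights, forget_set + retain_set, lr=0.5)

forget_before = mean_loss(weights, forget_set)
retain_before = mean_loss(weights, retain_set)

# Phase 2: unlearning by gradient ascent on the forget set only.
for _ in range(20):
    weights = step(weights, forget_set, lr=0.5, ascent=True)

forget_after = mean_loss(weights, forget_set)
retain_after = mean_loss(weights, retain_set)

print(f"forget loss: {forget_before:.4f} -> {forget_after:.4f}")
print(f"retain loss: {retain_before:.4f} -> {retain_after:.4f}")
```

Because both sets share parameters (here, the bias weight and the cue weights), pushing the forget loss up also moves the model's behavior on retained data. Comparing the two printed retain losses is a small-scale way to look for the collateral damage that the study attributes to catastrophic forgetting.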

Results from the paper show that in complete unlearning, methods like gradient ascent caused significant performance declines, while in selective unlearning, all baselines struggled to preserve shared knowledge. Corrective relearning revealed a 'bounce-back' effect, where unlearned information resurfaced after retraining, and unimodal unlearning proved ineffective in multimodal contexts. These findings highlight that current approaches do not truly erase data but conceal it, making models prone to misinformation propagation. For everyday users, this means AI systems might retain and reveal sensitive or false details despite attempts to delete them, impacting trust in technologies used for content moderation or personal assistants.

Limitations noted in the study include the focus on football-related data, which may not cover all misinformation types, and the inability of unlearning methods to handle complex visual-text associations robustly. The researchers emphasize that more work is needed to develop techniques that can securely forget information without compromising model utility, ensuring AI systems align with ethical standards and real-world demands.


About the Author

Guilherme A.