AI Text Watermarks Fall Short of EU Rules

TL;DR

A new study finds no current AI watermarking method meets EU reliability standards, putting content trust and regulation at risk.

As artificial intelligence systems generate more content, distinguishing AI-generated from human work becomes critical for trust and accountability. The European Union's AI Act mandates that large language model outputs be marked with reliable watermarks, but a recent analysis shows existing techniques fall short, raising the risk of misinformation and unethical use.

The researchers found that no current watermarking method satisfies all four EU criteria: reliability, robustness, effectiveness, and interoperability. They categorized techniques by when the watermark is applied in the LLM lifecycle (before, during, or after training) and evaluated each category against these standards. For example, pre- and post-processing methods, such as character substitutions, are easy to implement but vulnerable to removal, while in-processing approaches that embed the watermark in model weights resist removal better but may degrade output quality.
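To make the fragility of character substitution concrete, here is a minimal Python sketch. It is not a method from the study: the homoglyph table and the function names are assumptions chosen for illustration. It marks text by swapping a few Latin letters for look-alike Cyrillic ones, then shows that a one-line mapping undoes the mark.

```python
# Hypothetical homoglyph table (an assumption for illustration):
# Latin letter -> visually similar Cyrillic letter.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}
REVERSE = {v: k for k, v in HOMOGLYPHS.items()}


def embed(text: str) -> str:
    """Mark text by swapping every mapped Latin letter for its look-alike."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def detect(text: str) -> bool:
    """Report a watermark if any look-alike character is present."""
    return any(ch in REVERSE for ch in text)


def strip_mark(text: str) -> str:
    """The removal attack: map look-alikes back to plain Latin letters."""
    return "".join(REVERSE.get(ch, ch) for ch in text)


original = "a model wrote these words"
marked = embed(original)      # looks the same on screen, differs in code points
cleaned = strip_mark(marked)  # the mark is gone and the text is unchanged

print(detect(original), detect(marked), detect(cleaned))
```

The mark costs one dictionary lookup per character to add and the same to remove, which is the kind of weakness the robustness criterion is meant to rule out.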

The methodology was a systematic review of state-of-the-art watermarking, focused on how well each technique aligns with the AI Act's requirements. The study introduced a taxonomy that clarifies when a watermark is applied, for example during training (via distillation or reinforcement learning) or during inference (by biasing token selection). This framework allowed a clear comparison of methods; the paper's Figure 1 gives a schematic overview of the watermark styles.
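Biasing token selection at inference can be sketched as follows. This is a toy illustration rather than the procedure of any method reviewed in the study: the vocabulary size, the values of GAMMA and DELTA, the key, and the random stand-in for model logits are all assumptions. A keyed hash of the previous token picks a "green" subset of the vocabulary, and green tokens get a logit bonus before sampling.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50    # assumed toy vocabulary size
GAMMA = 0.5        # assumed fraction of the vocabulary marked green
DELTA = 4.0        # assumed logit bonus added to green tokens
KEY = "demo-key"   # hypothetical secret shared with the detector


def green_list(prev_token):
    """Derive a pseudo-random green subset of the vocabulary from the key and previous token."""
    digest = hashlib.sha256(f"{KEY}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(VOCAB_SIZE))
    rng.shuffle(ids)
    return set(ids[: int(GAMMA * VOCAB_SIZE)])


def sample_next(logits, prev_token, rng):
    """Add DELTA to green logits, then sample from the resulting softmax."""
    green = green_list(prev_token)
    biased = [x + DELTA if i in green else x for i, x in enumerate(logits)]
    top = max(biased)
    weights = [math.exp(x - top) for x in biased]  # unnormalized softmax
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate(length, seed=0):
    """Generate token ids; random Gaussian logits stand in for a real model."""
    rng = random.Random(seed)
    tokens = [0]
    for _ in range(length):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        tokens.append(sample_next(logits, tokens[-1], rng))
    return tokens


tokens = generate(200)
green_hits = sum(b in green_list(a) for a, b in zip(tokens, tokens[1:]))
print(f"{green_hits} of {len(tokens) - 1} tokens are green")
```

A larger DELTA makes the mark easier to detect but pushes sampling further from the model's own preferences, which is one way the quality trade-off shows up.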

Results indicate that trade-offs are inherent: methods that ensure high detectability often lack robustness against attacks such as paraphrasing or model modifications. For instance, token-coloring techniques such as that of Kirchenbauer et al. (shown in the paper's evaluations) achieve good detectability but can be erased by simple text alterations. The analysis, summarized in the paper's comparative tables, shows that no single method meets all four EU criteria, with gaps particularly in interoperability, meaning the ability of different watermarks to work together across systems.
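The detectability and robustness trade-off can be illustrated with a simplified detector. The sketch below is an assumption-laden toy, not the evaluation used in the study: it decides "green" membership with a keyed hash of each token pair, scores a sequence with a one-proportion z-test, z = (hits - GAMMA * T) / sqrt(T * GAMMA * (1 - GAMMA)), and uses random token substitution as a crude stand-in for paraphrasing.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000  # assumed toy vocabulary size
GAMMA = 0.5        # assumed fraction of token pairs that count as green
KEY = "demo-key"   # hypothetical secret key


def is_green(prev_token, token):
    """A keyed hash of the (previous, current) pair marks about GAMMA of pairs green."""
    digest = hashlib.sha256(f"{KEY}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def z_score(tokens):
    """One-proportion z-test: how far the green count sits above chance."""
    t = len(tokens) - 1
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


def watermarked_sequence(length, rng):
    """Idealized watermarking: keep drawing until the next token is green."""
    tokens = [0]
    while len(tokens) <= length:
        candidate = rng.randrange(VOCAB_SIZE)
        if is_green(tokens[-1], candidate):
            tokens.append(candidate)
    return tokens


def substitute(tokens, fraction, rng):
    """Crude stand-in for paraphrasing: replace a fraction of tokens at random."""
    out = list(tokens)
    for i in range(1, len(out)):
        if rng.random() < fraction:
            out[i] = rng.randrange(VOCAB_SIZE)
    return out


rng = random.Random(0)
marked = watermarked_sequence(200, rng)
attacked = substitute(marked, 0.8, rng)
z_marked = z_score(marked)      # every pair is green, so z = 100 / sqrt(50), about 14.1
z_attacked = z_score(attacked)  # most pairs are now green only by chance
print(f"z before attack: {z_marked:.1f}, after: {z_attacked:.1f}")
```

Because every altered token breaks the two pairs it belongs to, the score falls toward the chance level as more of the text is rewritten.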

This matters because without effective watermarks, AI-generated content could spread unchecked, leading to issues like plagiarism, misinformation, and eroded public trust. The EU's push for standardization aims to foster trustworthy AI, but the study highlights that current techniques are not ready for real-world deployment, affecting sectors from education to journalism where authenticity is paramount.

Limitations of the research include the lack of empirical evidence for interoperability and the need for more nuanced comparisons between watermarking families. The paper notes that future work should test methods in realistic environments and address how watermarks perform under common LLM modifications, such as fine-tuning or quantization.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
