Generative AI models like large language models and image generators have transformed how people create content, get programming help, and learn. However, these powerful tools come with serious societal risks, including the spread of misinformation, creation of deepfakes for defamation, perpetuation of social biases, and use in cyberattacks like phishing. Current approaches to evaluating AI safety often fall short because they rely heavily on English-centric datasets, missing the unique linguistic nuances and socio-cultural contexts of languages like Korean. This gap leaves Korean users vulnerable to AI-generated harms that existing benchmarks cannot detect, from culturally specific hate speech to multimodal threats in images and videos.
The researchers behind AssurAI have developed a comprehensive taxonomy of 35 distinct AI risk factors, tailored to both universal harms and Korea's specific socio-cultural context. This taxonomy, created by a multidisciplinary expert group, includes categories like Harmful & Violent Content, Interpersonal Harm, Sensitive & Adult Content, Misinformation & Manipulation, Illegal & Unethical Activities, and Socioeconomic & Cognitive Risks. It covers everything from explicit threats like hate speech and illegal activities to subtle, long-term risks such as the devaluation of human labor and loss of user autonomy. Based on this taxonomy, they constructed AssurAI, a large-scale Korean multimodal dataset with 11,480 instances across text, image, video, and audio, designed to systematically assess generative AI safety in a Korean context.
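To make the structure concrete, here is a minimal sketch of how the six top-level categories might map to individual risk factors in code. The category names come from the article; the factors listed under each are illustrative placeholders, not the paper's exact 35-factor breakdown.

```python
# Sketch of the AssurAI risk taxonomy as a simple mapping.
# Top-level categories are from the article; the factors under each
# are illustrative examples, not the paper's full list of 35.
RISK_TAXONOMY: dict[str, list[str]] = {
    "Harmful & Violent Content": ["hate speech", "incitement to violence"],
    "Interpersonal Harm": ["harassment", "defamation via deepfakes"],
    "Sensitive & Adult Content": ["sexually explicit content"],
    "Misinformation & Manipulation": ["fabricated news", "manipulated media"],
    "Illegal & Unethical Activities": ["discriminatory activities",
                                       "unauthorized privacy violations"],
    "Socioeconomic & Cognitive Risks": ["devaluation of human labor",
                                        "loss of user autonomy"],
}

def factors_in(category: str) -> list[str]:
    """Return the risk factors filed under a top-level category."""
    return RISK_TAXONOMY.get(category, [])
```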
To build AssurAI, the team employed a rigorous, multi-stage process focused on quality control. They began with expert-led seed data generation, where specialists manually crafted high-quality samples for each of the 35 risk factors, serving as guidelines for later stages. This was followed by crowdsourced mass production, where trained workers scaled up data construction using eight diverse prompt types, such as Multiple-Choice, Role-Playing, and Chain-of-Thought, to test AI models from various angles. Throughout, they implemented triple independent annotation to ensure objectivity, iterative expert review using a red-team approach to identify errors, and pilot testing with actual generative models to validate effectiveness. This systematic methodology aimed to balance depth and scale while minimizing biases and ensuring data reliability.
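As an illustration of how triple independent annotation might be adjudicated, the sketch below applies a simple majority vote over three labels per instance. The paper does not spell out its exact adjudication rule, so the two-of-three threshold, the `Instance` fields, and the prompt-type set shown here are assumptions for illustration.

```python
from dataclasses import dataclass, field

# Three of the eight prompt types named in the article; the remaining
# five are not enumerated here.
PROMPT_TYPES = {"multiple_choice", "role_playing", "chain_of_thought"}

@dataclass
class Instance:
    prompt: str
    risk_factor: str
    prompt_type: str
    labels: list[int] = field(default_factory=list)  # one label per annotator

def majority_label(inst: Instance) -> int | None:
    """Return the label at least two of the three independent annotators
    agreed on, or None if all three disagree (such cases would plausibly
    be escalated to expert review)."""
    assert len(inst.labels) == 3, "triple independent annotation expected"
    for lbl in set(inst.labels):
        if inst.labels.count(lbl) >= 2:
            return lbl
    return None
```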
The dataset's composition reveals key insights: text data dominates at 83.3% (9,560 instances), with images at 10.1% (1,160), videos at 3.7% (430), and audio at 2.9% (330). Risk factors like 'Discriminatory Activities' and 'Unauthorized Privacy Violations' have the highest instance counts, at 1,000 and 900 respectively, due to their broad sub-scenarios. In pilot experiments, the team evaluated models like EXAONE 3.5, Llama 3.1, Mistral, and Qwen 2.5 on the text track, using a Judge Model to score responses on a 5-point safety scale. Results showed mean scores ranging from 3.3 to 3.9, with low standard deviations indicating stable evaluation, but significant differences emerged: Llama 3.1 scored lower, suggesting a more permissive approach, while models like EXAONE 3.5 performed better in categories like hate speech. Multimodal tests with Gemini and Veo models revealed varied safety patterns, with audio and video modalities showing risk-averse policies but vulnerabilities in complex prompts like Chain-of-Thought.
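The composition figures can be reproduced directly from the reported counts, and the same kind of aggregation underlies the mean and standard-deviation summaries of judge scores. The snippet below checks the percentages and demonstrates the aggregation; only the modality counts are taken from the article, and the sample scores are made up for illustration.

```python
from statistics import mean, stdev

# Modality counts reported for AssurAI; the printed shares reproduce
# the stated 83.3% / 10.1% / 3.7% / 2.9% composition.
counts = {"text": 9_560, "image": 1_160, "video": 430, "audio": 330}
total = sum(counts.values())  # 11,480 instances
for modality, n in counts.items():
    print(f"{modality}: {n} ({n / total:.1%})")

def summarize(scores: list[float]) -> tuple[float, float]:
    """Mean and standard deviation of judge-model safety scores (1-5)."""
    return mean(scores), stdev(scores)

# Illustrative scores only; not actual judge outputs from the paper.
m, s = summarize([4, 3, 4, 5, 3, 4])
print(f"mean={m:.2f}, std={s:.2f}")
```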
The implications of AssurAI are profound for both the Korean AI community and global AI safety efforts. By providing a culturally tailored benchmark, it enables researchers and developers to quantitatively evaluate Korean language models, identify vulnerabilities, and enhance safety mechanisms. This can help mitigate social side effects, such as the spread of harmful content or erosion of democratic participation, and foster a more trustworthy AI ecosystem. The dataset's release promotes transparency and collaboration, encouraging further research into dynamic evaluation frameworks that adapt to evolving AI technologies. For everyday users, it means safer AI interactions in Korean, reducing risks from deepfakes to biased outputs that current English-focused tools might overlook.
Despite its strengths, AssurAI has limitations. The 35 risk factors, while comprehensive, cannot anticipate every risk posed by future AI technologies. The dataset is deeply customized to Korea's context, making direct application to other cultures challenging. Judgments of 'harmfulness' involve some subjectivity; although this was mitigated through triple annotation and expert review, some bias may persist. Additionally, as a static resource built at a specific point in time, it requires continuous expansion to keep pace with rapidly updating AI models. The pilot multimodal evaluations also had constraints, such as reliance on single-frame analysis for videos and limited judge-model configurations, highlighting areas for future refinement.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.