In an era where passwords are the primary defense for online accounts, a new study reveals a troubling vulnerability: the personal information we freely share on social media can be exploited to guess or evaluate our passwords with alarming accuracy. Researchers from the University of Cagliari and University of Salerno have developed a tool called SODA ADVANCE that reconstructs user data from platforms like Facebook, LinkedIn, and Instagram to assess password strength, and they've tested how large language models (LLMs) like ChatGPT and Claude can use this information to generate or evaluate passwords. This work highlights a critical gap in traditional password security, which often relies on syntax rules without considering the semantic context of user choices, making it easier for attackers to infer passwords from publicly available details.
The key finding from the study is that LLMs can effectively generate strong, personalized passwords and evaluate password strength when they have access to user profile data from social networks. In experiments with 100 real users, LLMs like Claude, Google Gemini, and ChatGPT outperformed others, with Claude achieving a Cumulative Password Strength (CPS) score of 0.82 on average for generated passwords. The CPS metric, introduced in the SODA ADVANCE tool, combines four scores (CUPP, LEET, COVERAGE, and FORCE) to produce a cumulative strength value between 0 and 1, where scores above 0.55 are considered strong. This demonstrates that LLMs can create passwords that are both secure and easy to remember by leveraging personal information, but it also exposes a risk: attackers could use similar techniques to guess passwords based on public data.
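The CPS combination can be sketched in a few lines. The paper does not spell out here how the four sub-scores are weighted, so this minimal sketch assumes a simple average of the CUPP, LEET, COVERAGE, and FORCE components, together with the 0.55 threshold for a "strong" verdict; the function names are illustrative, not the tool's API.

```python
# Minimal sketch of a Cumulative Password Strength (CPS) style score.
# Assumption: the four sub-scores (each in [0, 1]) are simply averaged;
# the actual SODA ADVANCE combination rule may differ.

STRONG_THRESHOLD = 0.55  # per the study, CPS above 0.55 counts as strong

def cumulative_password_strength(cupp: float, leet: float,
                                 coverage: float, force: float) -> float:
    """Combine four sub-scores (each in [0, 1]) into one CPS value."""
    scores = (cupp, leet, coverage, force)
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each sub-score must lie in [0, 1]")
    return sum(scores) / len(scores)

def is_strong(cps: float) -> bool:
    """Classify a CPS value using the study's 0.55 cutoff."""
    return cps > STRONG_THRESHOLD

cps = cumulative_password_strength(0.9, 0.8, 0.7, 0.9)
print(f"CPS={cps:.3f}, strong={is_strong(cps)}")
```

Any monotone combination (weighted mean, minimum, etc.) would fit the same 0-to-1 interface; the averaging choice here is only a placeholder.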
The methodology involved three pipelines combining SODA ADVANCE with LLMs. First, SODA ADVANCE reconstructs user data from social networks using web crawling, scraping, and facial recognition to verify identity across platforms. Then, it merges this information to evaluate password strength with the CPS metric. In the first pipeline, LLMs generate passwords based on user input, which are then evaluated by SODA ADVANCE. The second pipeline has LLMs evaluate password strength by considering user data, while the third pipeline integrates SODA ADVANCE's evaluation directly into prompts for LLMs, enhancing accuracy. Prompt engineering was crucial, with functions like fpassword-generation and fevaluate-password designed to interact with LLMs, such as one prompt asking ChatGPT to assess passwords like 'OrangeSystems23' based on user details like name and city.
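The prompt-engineering step in the second pipeline can be sketched as follows. The function names fpassword-generation and fevaluate-password come from the paper, but the prompt wording and profile fields below are assumptions for illustration, not the authors' actual templates.

```python
# Hedged sketch of building an evaluate-password prompt (second pipeline):
# the LLM is asked to judge a password in light of reconstructed profile
# data. The exact instruction text is illustrative.

def build_evaluate_prompt(password: str, profile: dict) -> str:
    """Build an LLM prompt that asks for a strength verdict on `password`,
    given profile data (e.g. name, city) reconstructed from social media."""
    facts = "; ".join(f"{key}: {value}" for key, value in profile.items())
    return (
        "You are a password auditor. Given this public profile data, "
        "classify the password as 'weak' or 'strong'.\n"
        f"Profile: {facts}\n"
        f"Password: {password}\n"
        "Answer with a single word."
    )

prompt = build_evaluate_prompt("OrangeSystems23",
                               {"name": "Alice", "city": "Cagliari"})
print(prompt)
```

In the third pipeline, the same template would additionally embed SODA ADVANCE's own CPS evaluation as extra context before the question.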
Results from the experimental evaluation show significant improvements when social media data is included. For RQ1, LLMs generated strong passwords, with Claude leading at a CPS score of 0.82. For RQ2, in binary classification (weak vs. strong), Claude achieved an accuracy of 0.75 and precision of 0.76, outperforming other LLMs and ensemble models. For RQ3, adding broader personal information boosted performance; for example, Falcon's precision improved from 0.48 to 0.77, and ChatGPT reached high scores. However, for RQ4, when evaluating passwords with three categories (weak, medium, strong), LLMs struggled, with performance dropping compared to binary tasks. SODA ADVANCE performed well, classifying passwords as weak if they contained user-linked information, while tools like Zxcvbn and Semantic PCFG often overestimated strength for syntactically complex passwords unrelated to users. In a comparison with the PassBERT model for targeted password guessing, only 22 out of 25,000 strong passwords generated by LLMs were inferred, highlighting the robustness of LLM-generated passwords.
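The rule attributed to SODA ADVANCE above (a password is weak if it contains user-linked information) can be sketched with a simple substring check. The leet mapping and the minimum token length are assumptions; the real tool's matching is presumably more sophisticated.

```python
# Hedged sketch of a "contains user-linked information" check.
# Assumption: a password is flagged if any profile token of length >= 3,
# or its simple leet-speak variant, appears in it (case-insensitively).

LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})

def contains_user_info(password: str, profile_tokens: list[str]) -> bool:
    """Return True if any profile token (or its leet variant)
    appears in the password, case-insensitively."""
    pw = password.lower()
    for token in profile_tokens:
        t = token.lower()
        if len(t) >= 3 and (t in pw or t.translate(LEET) in pw):
            return True
    return False

print(contains_user_info("OrangeSystems23", ["Orange", "Alice"]))  # True
print(contains_user_info("Xk9#vQ2!", ["Orange", "Alice"]))         # False
```

This is exactly the failure mode the study describes for syntax-only tools: 'OrangeSystems23' looks complex to a rule-based meter, yet is weak for a user publicly linked to the token 'Orange'.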
The implications of this research are profound for everyday internet users. It underscores the privacy risks of sharing personal data online, as attackers could use LLMs to guess passwords based on public information. The study suggests that users should adopt stronger privacy settings and secure password practices, and it calls for clearer ethical and legal guidelines regarding LLM use. For the general public, this means being more cautious about what is posted on social media, as even seemingly innocuous details can be pieced together to compromise security. The combination of data reconstruction tools and AI models presents both an opportunity for better password management and a threat that requires proactive measures to mitigate.
Limitations of the study include the difficulty of evaluating password strength with three categories, where LLMs performed poorly compared to binary classification. SODA ADVANCE tends to overestimate password complexity when words are not semantically linked to the user, while other tools like CKL_PSM and Zxcvbn may underestimate strength for complex syntax with user-related information. Additionally, LLMs did not excel at generating medium-strength passwords, and the research is limited to data from specific social networks, with future directions suggesting expansion to other web platforms. The study also notes the need for further investigation into LLM compliance with regulations like GDPR, emphasizing that while AI can enhance security, it also introduces new vulnerabilities that must be addressed.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.