
AI Transforms Cybersecurity Reports into Actionable Data

A new method uses large language models to automatically extract and standardize vulnerabilities from complex security reports, enabling better risk management without manual effort.

AI Research
November 21, 2025
3 min read

In the fast-evolving landscape of cybersecurity, organizations grapple with thousands of vulnerabilities detected by tools like OpenVAS and Tenable WAS, but inconsistent report formats hinder automated analysis and prioritization. The challenge is amplified by limited resources, as highlighted by data showing over 500,000 unaddressed vulnerabilities in 2017, a number that continues to grow with expanding digital threats. The new AI-driven approach addresses this by converting unstructured reports into standardized datasets, making it easier to manage risks and share data securely across institutions without labor-intensive manual processing.

Researchers developed an approach that leverages large language models (LLMs) to automatically extract and structure vulnerabilities from OpenVAS and Tenable WAS reports, which are widely used for identifying flaws in web applications. The tool, called Vulnerability Extractor, processes PDF reports by first reading and dividing the text into logical chunks to handle the models' token limits, then uses specific prompts to identify key fields like description, impact, solution, and references. This process includes explicit mapping of tool-specific fields to a unified schema, ensuring consistency and filling missing data with NULL values to avoid generating false information, thus preserving the original report's fidelity.
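The pipeline above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the field names, the alias table, and the word-count token proxy are all hypothetical, and the LLM extraction step is omitted (only the chunking and schema-mapping stages are shown).

```python
# Illustrative sketch of the chunking and schema-mapping stages.
# UNIFIED_FIELDS and the alias table are assumptions, not the
# published tool's schema; words stand in for tokens here.

UNIFIED_FIELDS = ["name", "description", "impact", "solution", "references"]

def chunk_text(text: str, max_tokens: int = 512) -> list[str]:
    """Split report text into chunks that respect a rough token
    budget, so each chunk fits within the model's context window."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

def map_to_schema(raw: dict) -> dict:
    """Map tool-specific field names onto the unified schema, filling
    any missing field with None (NULL) rather than inventing content."""
    aliases = {"summary": "description", "fix": "solution"}  # illustrative
    normalized = {aliases.get(k, k): v for k, v in raw.items()}
    return {f: normalized.get(f) for f in UNIFIED_FIELDS}

# A hypothetical OpenVAS-style finding with two fields absent:
finding = {"name": "TLS weak cipher", "summary": "Server accepts RC4",
           "fix": "Disable RC4 cipher suites"}
record = map_to_schema(finding)
# 'impact' and 'references' were not in the source, so they stay NULL:
print(record["impact"], record["references"])  # → None None
```

Filling absent fields with None instead of letting the model guess is what preserves fidelity to the original report, as the paragraph above notes.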

In evaluations using an OpenVAS report with 34 vulnerabilities, models like GPT-4.1 and DeepSeek achieved the highest similarity to a manually validated baseline, with ROUGE-L scores greater than 0.7, indicating highly similar extractions. As shown in Figure 2, these models outperformed others such as Llama-3 and GPT-4, which had lower scores due to factors like computational efficiency priorities and architectural limitations. Qualitative analysis revealed occasional issues such as duplications, omissions, and labeling errors, particularly in vulnerabilities related to SSL/TLS protocols, where semantic substitutions and context limitations from chunking reduced precision.

The implications of this research are significant for cybersecurity practices, as it enables automated prioritization of vulnerabilities based on standardized data, helping organizations focus on high-risk issues more efficiently. By transforming heterogeneous reports into usable datasets, the approach supports future integration with anonymization modules, allowing secure data sharing among institutions without exposing sensitive details. This advancement could streamline risk management in sectors like finance and healthcare, where rapid response to threats is critical, and reduce the manual overhead that often leads to vulnerabilities being overlooked.

Despite its successes, the approach has limitations, including precision degradation from factors like semantic truncation, delimiter loss during PDF extraction, and hallucinations in technical sections, as illustrated in Figure 3, where some fields fell below 70% similarity. These issues stem from chunking strategies that limit global context and from tokenization variations, even with low temperature settings, highlighting the need for improved segmentation and validation mechanisms in future work to enhance reliability and consistency in automated extractions.
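One simple validation mechanism of the kind the paragraph above calls for would deduplicate extracted records and flag omissions before the data reaches downstream prioritization. The sketch below is purely illustrative, assuming records shaped like the unified schema; it is not part of the published tool.

```python
# Hypothetical post-extraction validation pass: drop duplicate
# records (same vulnerability extracted twice) and flag records
# where the LLM left a required field as None (an omission).

def validate(records: list[dict],
             required: tuple = ("description", "solution")) -> dict:
    seen, unique, flagged = set(), [], []
    for rec in records:
        key = rec.get("name")
        if key in seen:          # duplication: already extracted
            continue
        seen.add(key)
        unique.append(rec)
        missing = [f for f in required if rec.get(f) is None]
        if missing:              # omission: field left NULL
            flagged.append((key, missing))
    return {"records": unique, "flagged": flagged}

reports = [
    {"name": "A", "description": "d", "solution": "s"},
    {"name": "A", "description": "d", "solution": "s"},   # duplicate
    {"name": "B", "description": None, "solution": "s"},  # omission
]
result = validate(reports)
print(len(result["records"]), result["flagged"])  # → 2 [('B', ['description'])]
```

Flagged records could then be routed back for re-extraction with a larger chunk or for manual review, addressing the duplication and omission errors observed in the qualitative analysis.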

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.


Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn