As cyber threats evolve faster than ever, artificial intelligence systems struggle to keep up, often making dangerous errors when identifying new vulnerabilities. Researchers have developed a hybrid retrieval method that significantly improves AI accuracy in cybersecurity tasks, offering a more reliable approach to threat detection without the high costs of retraining large language models.
The key finding is that combining two retrieval techniques, semantic search and keyword matching, substantially improves AI performance. Using the Llama-3-8B-Instruct model, the hybrid approach achieved 72.7% accuracy on Common Vulnerabilities and Exposures (CVE) identification tasks, compared to 59.2% without retrieval augmentation, a gain of 13.5 percentage points. For Common Weakness Enumeration (CWE) tasks, accuracy reached 92.2%, nearly eliminating errors attributed to random guessing.
The methodology centers on a sparse-dense retriever that integrates semantic similarity search with BM25 keyword-based retrieval. The system chunks cybersecurity documents into manageable segments, creates vector embeddings of those chunks for semantic search, and applies keyword matching in parallel for precise identifier detection. A regular expression filter specifically targets CVE identifiers such as 'CVE-2024-5022', so that vulnerability-specific queries receive the appropriate context.
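The retrieval pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the function names (`chunk`, `bm25_scores`, `hybrid_search`), the weight `alpha`, the min-max score fusion, and the rule that exact CVE matches are ranked first are all hypothetical choices. A toy bag-of-words cosine stands in for real vector embeddings so that the example runs with the standard library alone.

```python
import math
import re
from collections import Counter

# Matches identifiers such as CVE-2024-5022 (year, then 4 or more digits).
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)
# Keeps hyphenated identifiers together as a single keyword token.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())


def chunk(text, size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized chunk for the query (sparse side)."""
    n = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n
    doc_freq = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity of term-count vectors (assumed stand-in for embeddings)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def hybrid_search(query, chunks, alpha=0.5, top_k=3):
    """Rank chunks by fused score; chunks naming a queried CVE ID come first."""
    if not chunks:
        return []
    query_tokens = tokenize(query)
    docs_tokens = [tokenize(c) for c in chunks]
    sparse = min_max(bm25_scores(query_tokens, docs_tokens))
    query_vec = Counter(query_tokens)
    dense = min_max([cosine(query_vec, Counter(d)) for d in docs_tokens])
    wanted = {m.upper() for m in CVE_PATTERN.findall(query)}
    ranked = []
    for i, text in enumerate(chunks):
        found = {m.upper() for m in CVE_PATTERN.findall(text)}
        exact_hit = bool(wanted & found)
        fused = alpha * dense[i] + (1 - alpha) * sparse[i]
        ranked.append((exact_hit, fused, text))
    ranked.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return [text for _, _, text in ranked[:top_k]]


# Placeholder corpus: the texts are invented and do not describe real advisories.
corpus = [
    "CVE-2024-5022 placeholder advisory text about a browser flaw",
    "CVE-2023-1111 placeholder advisory text about a kernel flaw",
    "General guidance on patching browser software",
]
print(hybrid_search("What is CVE-2024-5022?", corpus, top_k=1))
```

In this sketch, a query that names a CVE identifier always surfaces the chunk containing that exact identifier, while queries without an identifier fall back to the blended keyword and similarity score. A real system would replace `cosine` with similarity over embeddings from a trained model.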
The results show consistent improvements across multiple metrics. The hybrid method outperformed baseline retrieval-augmented generation (RAG) by approximately 5 percentage points on CVE tasks and nearly 7 percentage points on CWE tasks. Temperature settings significantly affected performance: a low temperature (0.01) yielded more stable results than a high one (1.0), where accuracy dropped by over 5 percentage points due to increased response variability.
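The link between temperature and response variability can be illustrated with temperature-scaled softmax sampling, the general mechanism by which language models pick tokens. The sketch below is not the study's evaluation code: the logits for four hypothetical answer options are made up, and the helper names `softmax_with_temperature` and `sample_answers` are assumptions for illustration only.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_answers(logits, temperature, n, seed=0):
    """Draw n answer indices from the temperature-scaled distribution."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)


# Made-up logits for four answer options; index 0 is the model's top choice.
logits = [2.0, 1.0, 0.5, 0.1]

for temperature in (0.01, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    picks = sample_answers(logits, temperature, n=100)
    print(temperature, [round(p, 3) for p in probs], sorted(set(picks)))
```

At a temperature of 0.01 nearly all probability mass sits on the top-scoring option, so repeated runs give the same answer. At 1.0 the lower-scoring options keep a substantial share of the probability, so answers vary from run to run.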
These results matter because cybersecurity AI systems frequently misinterpret technical terminology and struggle with temporal reasoning, the ability to understand how threats evolve over time. The hybrid approach addresses this by making semantic understanding and precise keyword matching work together, which makes AI systems more trustworthy for real-world security applications where errors can have serious consequences.
Limitations include the study's focus on CVE and CWE datasets only, which leaves broader cybersecurity contexts unexplored. The study also restricted evaluation to a single AI model architecture, and the regex matcher currently handles only CVE identifiers, not other cybersecurity taxonomies such as ATT&CK or CAPEC. Future work should test generalization across diverse security domains and model architectures.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.