Cybersecurity teams are overwhelmed by a rising tide of incidents and need faster ways to sort and prioritize threats. Automated tools based on artificial intelligence promise to help, but using large, cloud-based models often comes with high costs, delays, and privacy risks. A new study investigates whether smaller, locally run AI models can handle this work effectively, providing a practical solution for organizations with limited resources.
The researchers found that the temperature setting, a common adjustment in AI models to control randomness in responses, has little influence on how well these small models categorize security incidents. Instead, the number of parameters in the model and the capacity of the graphics processing unit (GPU) are the key factors determining performance. For example, medium-sized models like DeepSeek-R1 14B and Qwen3 4B achieved the highest precision rates, balancing accuracy with computational cost, while smaller models such as Gemma3 1B were faster but less accurate.
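For context on the temperature finding: temperature rescales a model's raw output scores (logits) before sampling, so lower values sharpen the probability distribution and higher values flatten it. A minimal sketch of this mechanism (illustrative only, not the implementation used by any of the models in the study):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores (logits) into sampling probabilities.

    Lower temperature sharpens the distribution (more deterministic
    output); higher temperature flattens it (more randomness).
    Temperature 0 is conventionally treated as greedy selection.
    """
    if temperature == 0:
        # Greedy: all probability mass on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# At temperature 0.4 the top token dominates; at 1.0 the
# distribution is noticeably flatter.
sharp = softmax_with_temperature([2.0, 1.0, 0.1], 0.4)
flat = softmax_with_temperature([2.0, 1.0, 0.1], 1.0)
```

Because the classification task asks for a single category label rather than creative text, it is plausible that this randomness simply has little room to change the top-ranked answer, which would be consistent with the study's result.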
To test this, the team evaluated 21 small language models ranging from 1 billion to 20 billion parameters, using a dataset of real, anonymized security incidents from a Computer Security Incident Response Team (CSIRT). They employed two different computer setups: one with an AMD Ryzen 7 processor and an NVIDIA GeForce GTX 1650 GPU, and another with an Intel Core i7 processor and an NVIDIA RTX A4000 GPU. The experiments varied the temperature hyperparameter across four settings (0, 0.4, 0.7, and 1) and measured both execution time and precision in categorizing incidents into six balanced categories.
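The evaluation protocol described above, looping over models and temperature settings while recording execution time and precision, can be sketched roughly as follows. The category names and the `classify_incident` stub are hypothetical placeholders, not the paper's actual CSIRT categories or model-serving code:

```python
import time

# Illustrative labels only; the study used six balanced CSIRT
# categories that are not reproduced here.
CATEGORIES = ["phishing", "malware", "dos", "scan", "fraud", "other"]

def classify_incident(model, incident_text, temperature):
    # Placeholder for a call to a locally hosted language model.
    # The temperature argument is accepted but unused by this stub.
    return CATEGORIES[hash((model, incident_text)) % len(CATEGORIES)]

def macro_precision(y_true, y_pred):
    """Average per-category precision: TP / (TP + FP) for each label."""
    scores = []
    for category in set(y_true):
        true_of_predicted = [t for t, p in zip(y_true, y_pred) if p == category]
        if true_of_predicted:
            hits = sum(1 for t in true_of_predicted if t == category)
            scores.append(hits / len(true_of_predicted))
        else:
            scores.append(0.0)  # category never predicted
    return sum(scores) / len(scores)

def run_experiment(models, incidents, labels, temperatures):
    """Time each (model, temperature) run and score its precision."""
    results = {}
    for model in models:
        for temp in temperatures:
            start = time.perf_counter()
            preds = [classify_incident(model, text, temp) for text in incidents]
            elapsed = time.perf_counter() - start
            results[(model, temp)] = (elapsed, macro_precision(labels, preds))
    return results
```

In the actual study, a grid like this would cover 21 models, four temperatures (0, 0.4, 0.7, 1), and two hardware setups, with the model call dominating the measured time.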
The results, detailed in Table I and Figure 2, show that execution times were significantly faster on the more powerful i7/RTX A4000 architecture than on the Ryzen 7/GTX 1650 setup. For instance, the Gemma3 1B model completed tasks in about 7 minutes on the i7 system versus over 18 minutes on the Ryzen 7 system. Precision, as shown in Table II and Figure 3, remained relatively stable across different temperature settings, with average values indicating that temperature had limited impact. The DeepSeek-R1 14B model achieved the highest precision on the Ryzen 7 architecture, while GPT-OSS 20B performed best on the i7 system, highlighting how hardware resources affect model outcomes.
This research matters because it demonstrates that organizations can use small, locally deployed AI models to automate cybersecurity tasks without relying on expensive cloud services, reducing costs and enhancing data privacy. For security operations centers (SOCs) and incident response teams, this means faster, more scalable solutions for handling threats, especially in environments with constrained budgets or sensitive information. The findings suggest that investing in better hardware, such as advanced GPUs, can further improve performance, making AI-driven automation more accessible and effective.
However, the study has limitations. It focused only on the temperature hyperparameter and did not explore other sampling settings like top-k or top-p, which might also influence categorization performance. The dataset was relatively small, with 24 incidents across six categories, potentially limiting generalizability to larger or more diverse threat landscapes. Future work could investigate detailed CPU and GPU usage, lightweight optimization techniques such as quantization, and the combined effects of multiple hyperparameters to refine these models for broader applications.
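For readers unfamiliar with the unexplored hyperparameters: top-k and top-p (nucleus) sampling restrict which tokens a model is allowed to sample from, complementing temperature's rescaling. A minimal illustrative sketch of both filters over a probability distribution:

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalize.

    Ties at the cutoff probability are all kept, so slightly more
    than k tokens may survive in that edge case.
    """
    threshold = sorted(probs, reverse=True)[k - 1]
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest set of tokens whose
    cumulative probability reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [probs[i] if i in kept else 0.0 for i in range(len(probs))]
    total = sum(filtered)
    return [q / total for q in filtered]
```

Whether these filters matter more than temperature for a constrained classification task is exactly the kind of question the authors leave to future work.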
Original Source
Read the complete research paper
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
Connect on LinkedIn