Abstract
Understanding the tactics (why), techniques (how), and procedures (methods) behind a cybersecurity attack is paramount to developing defenses against it or mitigating its effects. However, this task requires a high level of technical expertise, is time-consuming, and is error-prone. In this work we verify that open-source Llama 3.1 LLMs (Large Language Models) cannot automatically identify which of the 625 MITRE techniques is used within a cybersecurity attack procedure. We evaluate two RAG (Retrieval Augmented Generation) approaches to improve classification accuracy. Our experiments show the importance of the embedding model in information retrieval. Moreover, our analysis shows that selecting appropriate examples helps the language model reduce ambiguity. Specifically, a dynamic few-shot learning strategy performs best for larger models, whereas a multiple-choice strategy is more appropriate for smaller models. In contrast, corrective RAG techniques fail to provide significant improvements, highlighting current methodological limitations and the inherent complexity of this task.
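To make the dynamic few-shot idea concrete, the following is a minimal sketch of retrieving the labeled procedures most similar to a query and placing them in the prompt as examples. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the names `select_examples` and `build_prompt` are hypothetical, and the three labeled procedures are illustrative sentences paired with MITRE ATT&CK technique IDs.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_examples(query, labeled, k=2):
    """Return the k labeled (procedure, technique) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(labeled, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]

def build_prompt(query, labeled, k=2):
    """Assemble a few-shot prompt whose examples depend on the query."""
    parts = ["Classify the procedure into a MITRE ATT&CK technique."]
    for text, label in select_examples(query, labeled, k):
        parts.append(f"Procedure: {text}\nTechnique: {label}")
    parts.append(f"Procedure: {query}\nTechnique:")
    return "\n\n".join(parts)

# Illustrative labeled procedures (hypothetical sentences, not from the paper's dataset).
EXAMPLES = [
    ("The malware sends spearphishing emails with a malicious attachment", "T1566.001"),
    ("The group uses PowerShell scripts to execute commands", "T1059.001"),
    ("The actor dumps credentials from LSASS memory", "T1003.001"),
]

if __name__ == "__main__":
    print(build_prompt("The attacker runs PowerShell commands to download a payload", EXAMPLES))
```

In a realistic setup the toy `embed` would be replaced by a sentence-embedding model, which is where the abstract's point about the importance of the embedding model would come into play. A multiple-choice variant could reuse the same retrieval step but list the retrieved technique labels as candidate answers instead of worked examples.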
| Translated title of the contribution | Clasificación de Procedimientos de Ataques de Ciberseguridad mediante Generación Aumentada por Recuperación |
|---|---|
| Original language | English |
| Pages (from-to) | 199-210 |
| Number of pages | 12 |
| Journal | Procesamiento del Lenguaje Natural |
| Volume | 75 |
| Status | Published - Sep 2025 |