TY - JOUR
T1 - A pattern-aware LSTM-based approach for APT detection leveraging a realistic dataset for critical infrastructure security
AU - Iturbe, Eider
AU - Dalamagkas, Christos
AU - Radoglou-Grammatikis, Panagiotis
AU - Rios, Erkuden
AU - Toledo, Nerea
N1 - Publisher Copyright: © 2025 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license. http://creativecommons.org/licenses/by-nc-nd/4.0/
PY - 2026/5
Y1 - 2026/5
N2 - Advanced Persistent Threats (APTs) represent some of the most sophisticated and coordinated cyberattacks, often targeting critical infrastructure with stealthy, multi-stage techniques. Despite the availability of numerous intrusion detection datasets, most fail to capture the sequential and strategic nature of APT campaigns as outlined in frameworks like MITRE ATT&CK. This paper introduces a novel dataset based on a realistic emulation of the Sandworm APT group targeting the Supervisory Control and Data Acquisition (SCADA) system of a Wide Area Measurement System (WAMS). The dataset captures the full lifecycle of an APT attack, from initial access to impact, in a structured and time-ordered manner, enabling the study of both atomic and multi-step intrusion behaviours. We train and evaluate supervised multiclass sequence-aware models, specifically Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM) architectures, to detect these behaviours using network flow data, assessing their performance and analysing their strengths and limitations. Our results show that BiLSTM models offer greater stability and generalization, while LSTM models achieve competitive performance with optimal configurations. These findings highlight the importance of realistic, sequence-aware datasets for developing robust intrusion detection systems tailored to modern APT threats.
KW - Advanced persistent threat
KW - APT
KW - Dataset
KW - Intrusion detection
KW - LSTM
KW - Multi-stage attack
UR - https://www.scopus.com/pages/publications/105030020461
U2 - 10.1016/j.future.2025.108308
DO - 10.1016/j.future.2025.108308
M3 - Article
AN - SCOPUS:105030020461
SN - 0167-739X
VL - 178
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
M1 - 108308
ER -