TY - JOUR
T1 - Reinforcement Learning in action
T2 - Powering intelligent intrusion responses to advanced cyber threats in realistic scenarios
AU - Iturbe, Eider
AU - Rego, Angel
AU - Llorente-Vazquez, Oscar
AU - Rios, Erkuden
AU - Dalamagkas, Christos
AU - Merkouris, Dimitris
AU - Toledo, Nerea
N1 - Publisher Copyright:
© 2025 The Authors
PY - 2026/1/15
Y1 - 2026/1/15
N2 - Given the increasing incidence of sophisticated cyber-attacks, particularly Advanced Persistent Threats (APTs), there is a growing need for intelligent and adaptive intrusion response solutions. In this paper, we propose a Reinforcement Learning (RL)-based model for APT intrusion response that can manage dynamic, multi-stage attacks and large observation spaces. The model supports both policy-based and value-based learning approaches, enabling comparative evaluation between the two strategies. We introduce a realistic RL training environment based on emulation infrastructure, which accurately reproduces APT scenarios using real systems and executes a wide range of authentic Intrusion Response System (IRS) actions. This setup includes the time and variability constraints commonly encountered in operational environments, offering a more practical alternative to traditional simulations. The RL agents, implemented using the Proximal Policy Optimization (PPO) and Deep Q-Network (DQN) algorithms, were both trained and evaluated within this industrial-style emulated environment. Empirical results demonstrate that both deep reinforcement learning (DRL) algorithms successfully learned effective and well-timed defensive actions under realistic constraints, confirming their capability to operate in dynamic, real-world APT scenarios.
AB - Given the increasing incidence of sophisticated cyber-attacks, particularly Advanced Persistent Threats (APTs), there is a growing need for intelligent and adaptive intrusion response solutions. In this paper, we propose a Reinforcement Learning (RL)-based model for APT intrusion response that can manage dynamic, multi-stage attacks and large observation spaces. The model supports both policy-based and value-based learning approaches, enabling comparative evaluation between the two strategies. We introduce a realistic RL training environment based on emulation infrastructure, which accurately reproduces APT scenarios using real systems and executes a wide range of authentic Intrusion Response System (IRS) actions. This setup includes the time and variability constraints commonly encountered in operational environments, offering a more practical alternative to traditional simulations. The RL agents, implemented using the Proximal Policy Optimization (PPO) and Deep Q-Network (DQN) algorithms, were both trained and evaluated within this industrial-style emulated environment. Empirical results demonstrate that both deep reinforcement learning (DRL) algorithms successfully learned effective and well-timed defensive actions under realistic constraints, confirming their capability to operate in dynamic, real-world APT scenarios.
KW - APT
KW - Advanced persistent threat
KW - Decision making
KW - IRS
KW - Intrusion response system
KW - Multi-stage attack
KW - Reinforcement learning
UR - https://www.scopus.com/pages/publications/105012177214
U2 - 10.1016/j.eswa.2025.129168
DO - 10.1016/j.eswa.2025.129168
M3 - Article
AN - SCOPUS:105012177214
SN - 0957-4174
VL - 296
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 129168
ER -