Abstract
In cybersecurity digital twin environments, the ability to simulate realistic network traffic is critical for validating and training intrusion detection systems. However, generating synthetic data that faithfully captures the complex, time-dependent nature of network flows remains a significant challenge. This paper presents an AI-based approach for generating multivariate temporal network flow data that realistically reflects adversarial scenarios. The proposed method uses a Long Short-Term Memory (LSTM) architecture trained to capture the temporal dynamics of both normal and attack traffic, so that the synthetic data preserves realistic, sequence-aware behavioral patterns. To further improve fidelity, it combines deep learning-based generative models with statistical techniques to synthesize both numerical and categorical features while maintaining the correct proportions and temporal relationships between attack and normal traffic. A key contribution of the framework is its ability to generate high-fidelity synthetic data that supports the simulation of realistic, production-like cybersecurity scenarios. Experimental results show that the generated data supports robust machine learning-based detection systems, making the approach a valuable tool for cybersecurity validation and training in digital twin environments.
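The abstract does not say how the proportions and temporal relationships between attack and normal traffic are preserved. Purely as a hedged illustration, the sketch below swaps in a simple statistical stand-in rather than the paper's LSTM and deep generative models. It fits a first-order Markov chain over traffic labels and samples one categorical feature from its per-label empirical distribution. The function names (`fit_label_transitions`, `fit_categorical`, `sample_sequence`) and the `(label, value)` flow format are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical flow format: a time-ordered list of (label, categorical_value) pairs,
# e.g. ("attack", "TCP"). Illustrative statistical stand-in, NOT the paper's
# LSTM-based generator.

def fit_label_transitions(labels):
    """Estimate first-order transition probabilities P(next label | current label)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(labels, labels[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for prev, row in counts.items()
    }

def fit_categorical(flows):
    """Count the empirical distribution of the categorical feature for each label."""
    dist = defaultdict(Counter)
    for label, value in flows:
        dist[label][value] += 1
    return dist

def sample_sequence(transitions, cat_dist, start, length, rng):
    """Generate (label, value) pairs: labels follow the fitted Markov chain,
    values are drawn from the per-label empirical distribution."""
    seq = []
    label = start
    for _ in range(length):
        values, weights = zip(*cat_dist[label].items())
        seq.append((label, rng.choices(values, weights=weights)[0]))
        # A label never observed with a successor simply repeats (simplifying assumption).
        row = transitions.get(label, {label: 1.0})
        next_labels, probs = zip(*row.items())
        label = rng.choices(next_labels, weights=probs)[0]
    return seq

if __name__ == "__main__":
    flows = [("normal", "TCP"), ("normal", "UDP"), ("attack", "TCP"),
             ("attack", "TCP"), ("normal", "TCP")]
    transitions = fit_label_transitions([label for label, _ in flows])
    cat_dist = fit_categorical(flows)
    synthetic = sample_sequence(transitions, cat_dist, "normal", 10, random.Random(42))
    print(synthetic)
```

Because labels are drawn from fitted transition probabilities, the long-run attack/normal ratio and typical attack-burst lengths approximately track the training sequence. A first-order chain only remembers the previous label, whereas an LSTM like the one described in the abstract can model longer-range temporal dependencies and numerical features jointly.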
| Original language | English |
|---|---|
| Article number | 11574 |
| Journal | Applied Sciences (Switzerland) |
| Volume | 15 |
| Issue number | 21 |
| Publication status | Published - Nov 2025 |
Keywords
- AI-based simulation
- cybersecurity
- digital twin
- network flow data
- synthetic data generation