Abstract
The deployment of Large Language Models (LLMs) in multi-Graphics Processing Unit (GPU) environments faces significant challenges regarding energy consumption and load distribution. While most research focuses on optimizing inference throughput, there is a critical lack of frameworks bridging fine-grained telemetry with proactive, energy-aware load balancing. This paper presents a modular, forecasting-driven perception layer that leverages near-real-time GPU power telemetry to enable optimized workload allocation. Using fine-grained telemetry from an operational High Performance Computing (HPC) cluster, we evaluate state-of-the-art time-series architectures, including Spiking Neural Networks (SNN), Recurrent Neural Networks (RNN), Transformers, and Structured State Space Models (SSSM). These models are assessed across operational horizons of 30 s for near-instantaneous balancing and 1 min for near-future system stability. Our results demonstrate that the Gated Recurrent Unit (GRU) achieves superior performance, with a Mean Absolute Error (MAE) of 7.97 W for the 30 s window and 9.7 W for the 1 min window. By establishing a validated forecasting backbone, this approach provides a plug-and-play forecasting component that can be integrated into Deep Reinforcement Learning (DRL) or heuristic schedulers, offering a scalable solution to improve the sustainability and efficiency of large-scale LLM serving.
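The evaluation setup described above can be sketched in minimal form: split a GPU power trace into sliding history/target windows for a fixed horizon, then score predictions with MAE in watts. The synthetic trace, window sizes, and persistence baseline below are illustrative assumptions, not the paper's actual data or models.

```python
# Sketch of horizon-based power forecasting evaluation (assumed setup,
# not the paper's pipeline): windowed targets plus MAE scoring in watts.

def make_windows(trace, history, horizon):
    """Split a 1-D power trace into (past window, target) pairs,
    where the target lies `horizon` steps past the window's end."""
    pairs = []
    for i in range(len(trace) - history - horizon + 1):
        past = trace[i : i + history]
        target = trace[i + history + horizon - 1]
        pairs.append((past, target))
    return pairs

def mae(preds, targets):
    """Mean Absolute Error, in the same unit as the inputs (here: W)."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

if __name__ == "__main__":
    # Synthetic 1 Hz power trace (W); a 30 s horizon = 30 steps ahead.
    trace = [250 + 5 * (i % 10) for i in range(120)]
    pairs = make_windows(trace, history=30, horizon=30)
    # Persistence baseline: predict the last observed power value.
    preds = [past[-1] for past, _ in pairs]
    targets = [t for _, t in pairs]
    print(f"persistence MAE: {mae(preds, targets):.2f} W")
```

Any learned forecaster (GRU, Transformer, SNN, SSM) slots in by replacing the persistence baseline's prediction rule; the windowing and MAE scoring stay the same.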
| Original language | English |
|---|---|
| Article number | 101333 |
| Journal | Sustainable Computing: Informatics and Systems |
| Volume | 50 |
| DOIs | |
| Publication status | Published - Jun 2026 |
Keywords
- Energy forecasting
- Green AI
- Green LLM
- HPC
- LLM
- LLM energy consumption
- Load-balancing
- RNN
- S4
- SNN
- Sustainable computing
- Time-series forecasting
- Transformers
Fingerprint
Dive into the research topics of 'A forecasting-based perception layer for energy-aware resource management in LLM serving deployed on high-performance computing clusters'. Together they form a unique fingerprint.