A forecasting-based perception layer for energy-aware resource management in LLM serving deployed on high-performance computing clusters

  • University of Deusto

Research output: Contribution to journal › Article › peer-review

Abstract

The deployment of Large Language Models (LLMs) in multi-Graphics Processing Unit (GPU) environments faces significant challenges regarding energy consumption and load distribution. While most research focuses on optimizing inference throughput, there is a critical lack of frameworks bridging fine-grained telemetry with proactive, energy-aware load balancing. This paper presents a modular, forecasting-driven perception layer that leverages near-real-time GPU power telemetry to enable optimized workload allocation. Using fine-grained telemetry from an operational High Performance Computing (HPC) cluster, we evaluate state-of-the-art time-series architectures, including Spiking Neural Networks (SNN), Recurrent Neural Networks (RNN), Transformers, and Structured State Space Models (SSSM). These models are assessed across operational horizons of 30 s for near-instantaneous balancing and 1 min for near-future system stability. Our results demonstrate that the Gated Recurrent Unit (GRU) achieves superior performance, with a Mean Absolute Error (MAE) of 7.97 W for the 30 s window and 9.7 W for the 1 min window. By establishing a validated forecasting backbone, this approach provides a plug-and-play forecasting component that can be integrated into Deep Reinforcement Learning (DRL) or heuristic schedulers, offering a scalable solution to improve the sustainability and efficiency of large-scale LLM serving.
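To make the evaluation setup concrete, the sketch below shows how a GPU power trace can be split into sliding windows for the 30 s forecasting horizon and scored with MAE, the metric used in the abstract. This is an illustrative reconstruction, not the authors' implementation: the function names, the 1 Hz sampling rate, the 60-sample history length, and the synthetic trace are all assumptions, and a persistence baseline stands in for the trained GRU.

```python
import numpy as np

def make_windows(series, history, horizon):
    """Split a 1-D power trace into (input, target) pairs.

    Each input is `history` consecutive samples; the target is the value
    `horizon` steps ahead of the input's last sample.
    """
    X, y = [], []
    for t in range(len(series) - history - horizon + 1):
        X.append(series[t:t + history])
        y.append(series[t + history + horizon - 1])
    return np.array(X), np.array(y)

def mae(pred, target):
    """Mean Absolute Error in the same units as the trace (watts here)."""
    return float(np.mean(np.abs(pred - target)))

# Synthetic power trace in watts (placeholder for real GPU telemetry).
rng = np.random.default_rng(0)
trace = 250 + 40 * np.sin(np.arange(600) / 30) + rng.normal(0, 5, 600)

# 30 s horizon at an assumed 1 Hz sampling rate, with a 60-sample history.
X, y = make_windows(trace, history=60, horizon=30)

# Persistence baseline: predict the last observed value. A trained
# forecaster (e.g. a GRU) would replace this line.
pred = X[:, -1]
print(f"persistence MAE: {mae(pred, y):.2f} W")
```

A learned model such as a GRU would consume the same `(X, y)` pairs; the persistence baseline merely gives a floor against which the reported 7.97 W MAE can be interpreted.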

Original language: English
Article number: 101333
Journal: Sustainable Computing: Informatics and Systems
Volume: 50
Publication status: Published - Jun 2026

Keywords

  • Energy forecasting
  • Green AI
  • Green LLM
  • HPC
  • LLM
  • LLM energy consumption
  • Load-balancing
  • RNN
  • S4
  • SNN
  • Sustainable computing
  • Time-series forecasting
  • Transformers
