TY - JOUR
T1 - Large Language Models for Structured Task Decomposition in Reinforcement Learning Problems with Sparse Rewards
AU - Ruiz-Gonzalez, Unai
AU - Andres, Alain
AU - Del Ser, Javier
N1 - Publisher Copyright: © 2025 by the authors.
PY - 2025/12
Y1 - 2025/12
AB - Reinforcement learning (RL) agents face significant challenges in sparse-reward environments, as insufficient exploration of the state space can result in inefficient training or incomplete policy learning. To address this challenge, this work proposes a teacher–student framework for RL that leverages the inherent knowledge of large language models (LLMs) to decompose complex tasks into manageable subgoals. The capabilities of LLMs to comprehend problem structure and objectives, based on textual descriptions, can be harnessed to generate subgoals, similar to the guidance a human supervisor would provide. For this purpose, we introduce the following three subgoal types: positional, representation-based, and language-based. Moreover, we propose an LLM surrogate model to reduce computational overhead and demonstrate that the supervisor can be decoupled once the policy has been learned, further lowering computational costs. Under this framework, we evaluate the performance of three open-source LLMs (namely, Llama, DeepSeek, and Qwen). Furthermore, we assess our teacher–student framework on the MiniGrid benchmark—a collection of procedurally generated environments that demand generalization to previously unseen tasks. Experimental results indicate that our teacher–student framework facilitates more efficient learning and encourages enhanced exploration in complex tasks, resulting in faster training convergence and outperforming recent teacher–student methods designed for sparse-reward environments.
KW - goal-oriented reinforcement learning
KW - sparse-reward environments
KW - teacher–student
UR - https://www.scopus.com/pages/publications/105025903388
U2 - 10.3390/make7040126
DO - 10.3390/make7040126
M3 - Article
AN - SCOPUS:105025903388
SN - 2504-4990
VL - 7
JO - Machine Learning and Knowledge Extraction
JF - Machine Learning and Knowledge Extraction
IS - 4
M1 - 126
ER -