TY - GEN
T1 - An Evaluation Study of Intrinsic Motivation Techniques Applied to Reinforcement Learning over Hard Exploration Environments
AU - Andres, Alain
AU - Villar-Rodriguez, Esther
AU - Del Ser, Javier
N1 - Publisher Copyright:
© 2022, IFIP International Federation for Information Processing.
PY - 2022
Y1 - 2022
N2 - In the last few years, research activity around reinforcement learning tasks formulated over environments with sparse rewards has been especially notable. Among the numerous approaches proposed to deal with these hard exploration problems, intrinsic motivation mechanisms are arguably among the most studied alternatives to date. Advances reported in this area over time have tackled the exploration issue by proposing new algorithmic ideas that generate alternative mechanisms to measure novelty. However, most efforts in this direction have overlooked the influence of different design choices and parameter settings that have also been introduced to improve the effect of the generated intrinsic bonus, neglecting the application of those choices to other intrinsic motivation techniques that may also benefit from them. Furthermore, some of those intrinsic methods are applied with different base reinforcement learning algorithms (e.g. PPO, IMPALA) and neural network architectures, making it hard to fairly compare the reported results and the actual progress contributed by each solution. The goal of this work is to stress this crucial matter in reinforcement learning over hard exploration environments, exposing the variability and susceptibility of avant-garde intrinsic motivation techniques to diverse design factors. Ultimately, the experiments reported herein underscore the importance of carefully selecting these design aspects in line with the exploration requirements of the environment and the task in question under the same setup, so that fair comparisons can be guaranteed.
KW - Exploration-exploitation
KW - Hard exploration
KW - Intrinsic motivation
KW - Reinforcement learning
KW - Sparse rewards
UR - http://www.scopus.com/inward/record.url?scp=85136921132&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-14463-9_13
DO - 10.1007/978-3-031-14463-9_13
M3 - Conference contribution
AN - SCOPUS:85136921132
SN - 9783031144622
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 201
EP - 220
BT - Machine Learning and Knowledge Extraction - 6th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2022, Proceedings
A2 - Holzinger, Andreas
A2 - Kieseberg, Peter
A2 - Tjoa, A Min
A2 - Weippl, Edgar
PB - Springer Science and Business Media Deutschland GmbH
T2 - 6th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference for Machine Learning and Knowledge Extraction, CD-MAKE 2022, held in conjunction with the 17th International Conference on Availability, Reliability and Security, ARES 2022
Y2 - 23 August 2022 through 26 August 2022
ER -