Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation

Alain Andres*, Esther Villar-Rodriguez, Javier Del Ser

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

Reinforcement Learning has emerged as a strong alternative for solving optimization tasks efficiently. The performance of these algorithms depends heavily on the feedback signals provided by the environment, which indicate how good (or bad) the decisions made by the learning agent are. Unfortunately, in a broad range of problems the design of a good reward function is not trivial, so sparse reward signals are adopted instead. The lack of a dense reward function poses new challenges, mostly related to exploration. Imitation Learning addresses these problems by leveraging demonstrations from experts. In the absence of an expert (and hence of demonstrations), an option is to prioritize well-suited exploration experiences collected by the agent itself in order to bootstrap its learning process with good exploration behaviors. However, this solution depends heavily on the ability of the agent to discover such trajectories in the early stages of its learning process. To tackle this issue, we propose to combine imitation learning with intrinsic motivation, two of the most widely adopted techniques to address problems with sparse rewards. In this work intrinsic motivation is used to encourage the agent to explore the environment based on its curiosity, whereas imitation learning allows repeating the most promising experiences to accelerate the learning process. This combination is shown to yield improved performance and better generalization in procedurally-generated environments, outperforming previously reported self-imitation learning methods and achieving equal or better sample efficiency than intrinsic motivation in isolation.
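
To make the combination described above concrete, the following is a minimal sketch, assuming a simple count-based curiosity bonus as the intrinsic motivation signal and a RAPID-style episode-ranking buffer for self-imitation (the acknowledgments below indicate the work builds on RAPID). All names here (CountBasedBonus, EpisodeRankingBuffer, beta, w_bc) and the scoring and loss details are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Illustrative sketch, not the paper's code: a count-based curiosity bonus
# plus a RAPID-style buffer that ranks episodes and replays the best ones.

class CountBasedBonus:
    """Intrinsic reward r_int = beta / sqrt(N(s)) for (discretized) states."""

    def __init__(self, beta: float = 0.005):
        self.beta = beta
        self.counts: dict = {}

    def __call__(self, obs: np.ndarray) -> float:
        key = obs.tobytes()  # hashable key for the visited observation
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.beta / np.sqrt(self.counts[key])


class EpisodeRankingBuffer:
    """Keeps the highest-scoring episodes; their transitions are later
    replayed with a behavior-cloning loss (self-imitation)."""

    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity  # maximum number of stored transitions
        self.episodes: list = []  # list of (score, [(obs, action), ...])

    def add(self, score: float, transitions: list) -> None:
        self.episodes.append((score, transitions))
        self.episodes.sort(key=lambda e: e[0], reverse=True)
        # Evict the lowest-ranked episodes once capacity is exceeded.
        total, kept = 0, []
        for s, tr in self.episodes:
            if total + len(tr) <= self.capacity:
                kept.append((s, tr))
                total += len(tr)
        self.episodes = kept

    def sample(self, batch_size: int) -> list:
        flat = [t for _, tr in self.episodes for t in tr]
        idx = np.random.randint(len(flat), size=batch_size)
        return [flat[i] for i in idx]


# In a hypothetical PPO training loop, the two parts would meet as:
#   r_t   = r_ext + bonus(obs_t)             # curiosity-shaped step reward
#   score = episodic_return + w * coverage   # RAPID-style episode score
#   loss  = ppo_loss + w_bc * bc_loss(buffer.sample(256))
```

The intended division of labor is that the two signals act in different places: the intrinsic bonus shapes the per-step reward that drives on-policy exploration, while the ranking buffer decides which past episodes are worth imitating through an auxiliary behavior-cloning loss.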

Original language: English
Title of host publication: Proceedings of the 2022 IEEE Symposium Series on Computational Intelligence, SSCI 2022
Editors: Hisao Ishibuchi, Chee-Keong Kwoh, Ah-Hwee Tan, Dipti Srinivasan, Chunyan Miao, Anupam Trivedi, Keeley Crockett
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 890-899
Number of pages: 10
ISBN (Electronic): 9781665487689
DOIs
Publication status: Published - 2022
Event: 2022 IEEE Symposium Series on Computational Intelligence, SSCI 2022 - Singapore, Singapore
Duration: 4 Dec 2022 - 7 Dec 2022

Publication series

Name: Proceedings of the 2022 IEEE Symposium Series on Computational Intelligence, SSCI 2022

Conference

Conference: 2022 IEEE Symposium Series on Computational Intelligence, SSCI 2022
Country/Territory: Singapore
City: Singapore
Period: 4/12/22 - 7/12/22

Funding

Acknowledgments: The authors would like to thank Daochen Zha from Rice University for the help provided in the implementation of RAPID. A. Andres receives funding support from the Basque Government through its BIKAINTEK PhD support program. J. Del Ser also acknowledges funding support from the Department of Education of the Basque Government (Consolidated Research Group MATHMODE, IT1456-22).

Funders and funder numbers:

• Department of Education of the Basque Government (IT1456-22)
• Eusko Jaurlaritza

Keywords

• Generalization
• Intrinsic Motivation
• Reinforcement Learning
• Self Imitation Learning
• Sparse Rewards
