Memory Reduction through Experience Classification for Deep Reinforcement Learning with Prioritized Experience Replay

Kai Huan Shen, Pei Yun Tsai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Prioritized experience replay is widely used in online reinforcement learning algorithms because it exploits past experiences efficiently. However, a large replay buffer consumes significant system memory. In this paper, a segmentation and classification scheme is therefore proposed. The distribution of temporal-difference (TD) errors is first segmented, and each experience used for network training is classified according to its updated TD error. A swap mechanism for similar experiences then adjusts the lifetimes of experiences in the replay buffer. The proposed scheme is incorporated into the Deep Deterministic Policy Gradient (DDPG) algorithm and verified on the Inverted Pendulum and Inverted Double Pendulum tasks. The experiments show that the proposed mechanism effectively removes buffer redundancy and further reduces the correlation among experiences in the replay buffer, achieving better learning performance with a smaller memory footprint at the cost of additional computation for updating TD errors.
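The abstract describes classifying experiences by TD-error segment and swapping similar experiences to bound buffer growth. The following is a minimal, hypothetical sketch of that idea, not the paper's actual implementation: the segment boundaries, per-segment capacity, and the "swap oldest within the same segment" rule are all illustrative assumptions.

```python
import random
from collections import deque

class ClassifiedReplayBuffer:
    """Hypothetical sketch of a segmented replay buffer.

    Experiences are bucketed by which TD-error segment they fall into.
    When a segment is full, a newly added experience swaps out the
    oldest similar one (same segment), shortening its lifetime instead
    of growing total memory use.
    """

    def __init__(self, capacity_per_segment, boundaries):
        # boundaries: ascending |TD error| thresholds defining segments
        self.boundaries = boundaries
        self.segments = [deque(maxlen=capacity_per_segment)
                         for _ in range(len(boundaries) + 1)]

    def _segment(self, td_error):
        # Index of the first boundary exceeding |td_error|
        for i, b in enumerate(self.boundaries):
            if abs(td_error) < b:
                return i
        return len(self.boundaries)

    def add(self, experience, td_error):
        bucket = self.segments[self._segment(td_error)]
        if len(bucket) == bucket.maxlen:
            # Swap: evict the oldest experience in this segment only,
            # so other segments' experiences keep their lifetimes.
            bucket.popleft()
        bucket.append((experience, td_error))

    def sample(self, batch_size):
        pool = [item for seg in self.segments for item in seg]
        return random.sample(pool, min(batch_size, len(pool)))
```

For example, with boundaries `[0.5, 1.0]` and two slots per segment, a third low-error experience replaces the oldest low-error one rather than enlarging the buffer, which is the memory-reduction behavior the abstract attributes to the swap mechanism.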

Original language: English
Title of host publication: 2019 IEEE International Workshop on Signal Processing Systems, SiPS 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 166-171
Number of pages: 6
ISBN (Electronic): 9781728119274
DOIs
State: Published - Oct 2019
Event: 33rd IEEE International Workshop on Signal Processing Systems, SiPS 2019 - Nanjing, China
Duration: 20 Oct 2019 to 23 Oct 2019

Publication series

Name: IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
Volume: 2019-October
ISSN (Print): 1520-6130

Conference

Conference: 33rd IEEE International Workshop on Signal Processing Systems, SiPS 2019
Country/Territory: China
City: Nanjing
Period: 20/10/19 to 23/10/19

Keywords

  • Deep Deterministic Policy Gradient (DDPG)
  • prioritized experience replay (PER)
  • Reinforcement learning (RL)

