TY - GEN
T1 - Memory Reduction through Experience Classification for Deep Reinforcement Learning with Prioritized Experience Replay
AU - Shen, Kai Huan
AU - Tsai, Pei Yun
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - Prioritized experience replay has been widely used in many online reinforcement learning algorithms, providing high efficiency in exploiting past experiences. However, a large replay buffer consumes significant system storage. Thus, in this paper, a segmentation and classification scheme is proposed. The distribution of temporal-difference errors (TD errors) is first segmented. Each experience used for network training is classified according to its updated TD error. Then, a swap mechanism for similar experiences is implemented to change the lifetimes of experiences in the replay buffer. The proposed scheme is incorporated into the Deep Deterministic Policy Gradient (DDPG) algorithm, and the Inverted Pendulum and Inverted Double Pendulum tasks are used for verification. The experiments show that the proposed mechanism effectively removes buffer redundancy and further reduces the correlation of experiences in the replay buffer. Thus, better learning performance with a reduced memory size is achieved at the cost of additional computations of updated TD errors.
AB - Prioritized experience replay has been widely used in many online reinforcement learning algorithms, providing high efficiency in exploiting past experiences. However, a large replay buffer consumes significant system storage. Thus, in this paper, a segmentation and classification scheme is proposed. The distribution of temporal-difference errors (TD errors) is first segmented. Each experience used for network training is classified according to its updated TD error. Then, a swap mechanism for similar experiences is implemented to change the lifetimes of experiences in the replay buffer. The proposed scheme is incorporated into the Deep Deterministic Policy Gradient (DDPG) algorithm, and the Inverted Pendulum and Inverted Double Pendulum tasks are used for verification. The experiments show that the proposed mechanism effectively removes buffer redundancy and further reduces the correlation of experiences in the replay buffer. Thus, better learning performance with a reduced memory size is achieved at the cost of additional computations of updated TD errors.
KW - Reinforcement learning (RL)
KW - deep deterministic policy gradient (DDPG)
KW - prioritized experience replay (PER)
UR - http://www.scopus.com/inward/record.url?scp=85082394901&partnerID=8YFLogxK
U2 - 10.1109/SiPS47522.2019.9020610
DO - 10.1109/SiPS47522.2019.9020610
M3 - Conference contribution
AN - SCOPUS:85082394901
T3 - IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
SP - 166
EP - 171
BT - 2019 IEEE International Workshop on Signal Processing Systems, SiPS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 33rd IEEE International Workshop on Signal Processing Systems, SiPS 2019
Y2 - 20 October 2019 through 23 October 2019
ER -