Processing Element Architecture Design for Deep Reinforcement Learning with Flexible Block Floating Point Exploiting Signal Statistics

Juyn Da Su, Pei Yun Tsai

研究成果: 書貢獻/報告類型會議論文篇章同行評審

1 引文 斯高帕斯(Scopus)

摘要

Deep reinforcement learning is a technique that allows the agent to have evolving learning capability for unknown environments and thus has the potential to surpass human expertise. The hardware architecture for DRL supporting on-line Q-learning and on-line training is presented in this paper. Two processing element (PE) arrays are used for handling evaluation network and target network respectively. Through configuration of two modes for PE operations, all required forward and backward computations can be accomplished and the number of processing cycles can be derived. Due to the precision required for on-line Q-learning and training, we propose flexible block floating-point (FBFP) to reduce the overhead of floating-point adders. The FBFP exploits different signal statistics during the learning process. Furthermore, the respective block exponents of gradients are adjusted following the variation of temporal difference (TD) error to reserve resolution. From the simulation results, the FBFP multiplier-and-accumulator (MAC) can reduce 15.8% of complexity compared to FP MAC while good learning performance can be maintained.

原文???core.languages.en_GB???
主出版物標題2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings
發行者Institute of Electrical and Electronics Engineers Inc.
頁面82-87
頁數6
ISBN(電子)9789881476883
出版狀態已出版 - 7 12月 2020
事件2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Virtual, Auckland, New Zealand
持續時間: 7 12月 202010 12月 2020

出版系列

名字2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings

???event.eventtypes.event.conference???

???event.eventtypes.event.conference???2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020
國家/地區New Zealand
城市Virtual, Auckland
期間7/12/2010/12/20

指紋

深入研究「Processing Element Architecture Design for Deep Reinforcement Learning with Flexible Block Floating Point Exploiting Signal Statistics」主題。共同形成了獨特的指紋。

引用此