TY - JOUR
T1 - An On-Chip Fully Connected Neural Network Training Hardware Accelerator Based on Brain Float Point and Sparsity Awareness
AU - Tsai, Tsung Han
AU - Lin, Ding Bang
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2023
Y1 - 2023
N2 - In recent years, deep neural networks (DNNs) have brought revolutionary progress in various fields with the advent of technology. They are widely used in applications such as image pre-processing, image enhancement, face recognition, and voice recognition, gradually replacing traditional algorithms. The rise of neural networks has thus driven a transformation of artificial intelligence. Since neural network algorithms are computationally intensive, they require GPUs or dedicated accelerator hardware for real-time computation. However, the high cost and high power consumption of GPUs result in low energy efficiency, which has recently led to extensive research on digital hardware accelerator design for deep neural networks. In this paper, we propose an efficient and flexible neural network training processor for fully connected layers. The proposed training processor features low power consumption, high throughput, and high energy efficiency. It exploits the sparsity of neuron activations to reduce the number of memory accesses and the memory footprint, achieving an efficient training accelerator. The processor uses a novel reconfigurable computing architecture to maintain high performance during both forward propagation and backward propagation. It is implemented on a Xilinx Zynq UltraScale+ MPSoC ZCU104 FPGA, operates at 200 MHz with a power consumption of 6.444 W, and achieves 102.43 GOPS.
AB - In recent years, deep neural networks (DNNs) have brought revolutionary progress in various fields with the advent of technology. They are widely used in applications such as image pre-processing, image enhancement, face recognition, and voice recognition, gradually replacing traditional algorithms. The rise of neural networks has thus driven a transformation of artificial intelligence. Since neural network algorithms are computationally intensive, they require GPUs or dedicated accelerator hardware for real-time computation. However, the high cost and high power consumption of GPUs result in low energy efficiency, which has recently led to extensive research on digital hardware accelerator design for deep neural networks. In this paper, we propose an efficient and flexible neural network training processor for fully connected layers. The proposed training processor features low power consumption, high throughput, and high energy efficiency. It exploits the sparsity of neuron activations to reduce the number of memory accesses and the memory footprint, achieving an efficient training accelerator. The processor uses a novel reconfigurable computing architecture to maintain high performance during both forward propagation and backward propagation. It is implemented on a Xilinx Zynq UltraScale+ MPSoC ZCU104 FPGA, operates at 200 MHz with a power consumption of 6.444 W, and achieves 102.43 GOPS.
KW - Fully connected layers
KW - energy efficient
KW - learning on edge
KW - on-chip training
KW - optimized memory access
KW - sparsity
UR - http://www.scopus.com/inward/record.url?scp=85173702166&partnerID=8YFLogxK
U2 - 10.1109/OJCAS.2023.3245061
DO - 10.1109/OJCAS.2023.3245061
M3 - Journal article
AN - SCOPUS:85173702166
SN - 2644-1225
VL - 4
SP - 85
EP - 98
JO - IEEE Open Journal of Circuits and Systems
JF - IEEE Open Journal of Circuits and Systems
ER -