TY - GEN
T1 - A Many-core Architecture for an Ensemble Ternary Neural Network Toward High-Throughput Inference
AU - Kayanoma, Ryota
AU - Jinguji, Akira
AU - Nakahara, Hiroki
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Machine learning is expanding into various applications, such as image processing in data centers. With the spread of deep learning, neural-network-based models have been widely adopted in recent years. Because machine learning inference on a CPU is slow, dedicated high-speed hardware accelerators are often used. In particular, demand for hardware accelerators in data centers is increasing, with a need for low power consumption and high-speed processing in a limited space. Here, we propose an implementation method for a ternary neural network (TNN) that utilizes the rewritable look-up tables (LUTs) of a field-programmable gate array (FPGA). TNNs, quantized to 2 bits, can be realized as LUT-based combinational circuits, allowing inference to complete in a single cycle and thus enabling a very high-speed inference system. Moreover, we reduced the hardware resources by 70% by introducing sparsity, i.e., approximating parameters to zero. However, the low-bit representation reduced recognition accuracy. In this paper, we use an ensemble to achieve recognition accuracy equivalent to that of the 32-bit floating-point model, and we design a voting circuit for the ensemble TNN that does not decrease throughput. By implementing the system on an AMD Alveo U50 FPGA card, we achieved a processing speed of 100 mega frames per second (MFPS). Our FPGA-based system was 1,286 times faster than the CPU and 1,364 times faster than the GPU. Therefore, we achieve a high-speed inference system without compromising recognition accuracy.
KW - Ensemble neural network
KW - FPGA
KW - Many-core
KW - Neural network
UR - http://www.scopus.com/inward/record.url?scp=85184657248&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184657248&partnerID=8YFLogxK
U2 - 10.1109/MCSoC60832.2023.00073
DO - 10.1109/MCSoC60832.2023.00073
M3 - Conference contribution
AN - SCOPUS:85184657248
T3 - Proceedings - 2023 16th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2023
SP - 446
EP - 453
BT - Proceedings - 2023 16th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2023
Y2 - 18 December 2023 through 21 December 2023
ER -