TY - GEN
T1 - A Light-Weight Vision Transformer Toward Near Memory Computation on an FPGA
AU - Senoo, Takeshi
AU - Kayanoma, Ryota
AU - Jinguji, Akira
AU - Nakahara, Hiroki
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - Computer vision AI is making remarkable advances in image recognition, object detection, and segmentation. However, model sizes continue to grow, so real-time processing of these tasks on embedded systems requires dedicated hardware acceleration. The Vision Transformer (ViT) is gaining attention as an approach to replace Convolutional Neural Networks (CNNs) in image recognition tasks. While ViT achieves high recognition accuracy, its complex structure and large number of parameters make real-time implementation difficult. Near-memory computing enables faster processing by placing computation close to memory. We optimize ViT for near-memory computation, designing a distributed on-chip memory suited to near-memory computing and a calculation flow that integrates closely with it on an FPGA. This enables real-time image AI processing with high recognition accuracy. On ImageNet2012 test images, our light-weight ViT (LW-ViT) achieved a recognition accuracy of 78.38% Top-1 and 94.12% Top-5. Our implementation was 1.6 times faster than an embedded GPU at the same recognition accuracy. Compared with other FPGA implementations, it processes camera images in real time at 29.97 fps while achieving 6.6–10.2 points higher recognition accuracy. Our implementation is therefore well suited to high-accuracy, real-time image recognition.
UR - http://www.scopus.com/inward/record.url?scp=85174439147&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174439147&partnerID=8YFLogxK
DO - 10.1007/978-3-031-42921-7_23
M3 - Conference contribution
AN - SCOPUS:85174439147
SN - 9783031429200
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 338
EP - 353
BT - Applied Reconfigurable Computing. Architectures, Tools, and Applications - 19th International Symposium, ARC 2023, Proceedings
A2 - Palumbo, Francesca
A2 - Keramidas, Georgios
A2 - Voros, Nikolaos
A2 - Diniz, Pedro C.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 19th International Symposium on Applied Reconfigurable Computing, ARC 2023
Y2 - 27 September 2023 through 29 September 2023
ER -