TY - GEN
T1 - A memory-bandwidth-efficient word2vec accelerator using OpenCL for FPGA
AU - Shoji, Tomoki
AU - Waidyasooriya, Hasitha Muthumala
AU - Ono, Taisuke
AU - Hariyama, Masanori
AU - Aoki, Yuichiro
AU - Kondoh, Yuki
AU - Nakagawa, Yaoko
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - Word2vec is a word embedding method that converts words into vectors in such a way that semantically and syntactically related words are close to each other in the vector space. FPGAs can be used to design low-power accelerators for word2vec. FPGAs rely on highly parallel computations, which require parallel data access. Since FPGAs generally have a smaller external memory access bandwidth than CPUs and GPUs, the processing speed is often restricted. We evaluate the trade-off between bandwidth and accuracy using different fixed-point formats, and propose a memory-bandwidth-efficient FPGA accelerator that uses 19-bit fixed-point data. We have implemented the proposed accelerator on an Intel Arria 10 FPGA using OpenCL, and achieved up to a 28% bandwidth reduction without any degradation in computation accuracy. Since the reduced bandwidth allows more data to be accessed without a data-access bottleneck, the processing speed can be increased by raising the degree of parallelism.
AB - Word2vec is a word embedding method that converts words into vectors in such a way that semantically and syntactically related words are close to each other in the vector space. FPGAs can be used to design low-power accelerators for word2vec. FPGAs rely on highly parallel computations, which require parallel data access. Since FPGAs generally have a smaller external memory access bandwidth than CPUs and GPUs, the processing speed is often restricted. We evaluate the trade-off between bandwidth and accuracy using different fixed-point formats, and propose a memory-bandwidth-efficient FPGA accelerator that uses 19-bit fixed-point data. We have implemented the proposed accelerator on an Intel Arria 10 FPGA using OpenCL, and achieved up to a 28% bandwidth reduction without any degradation in computation accuracy. Since the reduced bandwidth allows more data to be accessed without a data-access bottleneck, the processing speed can be increased by raising the degree of parallelism.
KW - Data compression
KW - FPGA
KW - Machine learning
KW - Natural language processing
KW - Word embedding
UR - http://www.scopus.com/inward/record.url?scp=85078835410&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078835410&partnerID=8YFLogxK
U2 - 10.1109/CANDARW.2019.00026
DO - 10.1109/CANDARW.2019.00026
M3 - Conference contribution
AN - SCOPUS:85078835410
T3 - Proceedings - 2019 7th International Symposium on Computing and Networking Workshops, CANDARW 2019
SP - 103
EP - 108
BT - Proceedings - 2019 7th International Symposium on Computing and Networking Workshops, CANDARW 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Symposium on Computing and Networking Workshops, CANDARW 2019
Y2 - 26 November 2019 through 29 November 2019
ER -