TY - GEN
T1 - Design and scalability analysis of bandwidth-compressed stream computing with multiple FPGAs
AU - Mondigo, Antoniette
AU - Ueno, Tomohiro
AU - Tanaka, Daichi
AU - Sano, Kentaro
AU - Yamamoto, Satoru
N1 - Funding Information:
This research was partially supported by Grant-in-Aid for Scientific Research (B) No. 17H01706 from MEXT, Japan. We thank the Intel University Program for its support.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/8/23
Y1 - 2017/8/23
N2 - Stream computing on field-programmable gate arrays (FPGAs) is a promising way to deliver the performance and energy efficiency required by compute-intensive applications such as numerical simulations. The inherent structure and customizability of FPGAs make them well suited to highly scalable computing designs. This paper presents a scalable custom computing approach that exploits temporal parallelism by increasing the depth of a computing pipeline across a 1D ring of cascaded FPGAs connected by high-speed, low-latency communication links. Spatial parallelism is also explored by replicating the computing core inside the FPGAs to further increase throughput. Because communication bandwidth is limited, a hardware-based lossless bandwidth compression scheme is used to alleviate this bottleneck and transfer more data streams. A performance model is presented for scalability analysis and performance estimation of this approach. For evaluation and verification, a numerical simulation was implemented on an Intel Arria 10 FPGA with spatially parallel computing cores. Initial results show that the measured performance is close to the values predicted by the performance model. It was also demonstrated that the 1D ring topology of multiple FPGAs with bandwidth-compressed links can scale performance when a sufficiently large data set is computed, even with a deeper pipeline and insufficient inter-FPGA bandwidth.
UR - http://www.scopus.com/inward/record.url?scp=85030627736&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85030627736&partnerID=8YFLogxK
U2 - 10.1109/ReCoSoC.2017.8016148
DO - 10.1109/ReCoSoC.2017.8016148
M3 - Conference contribution
AN - SCOPUS:85030627736
T3 - 12th International Symposium on Reconfigurable Communication-Centric Systems-on-Chip, ReCoSoC 2017 - Proceedings
BT - 12th International Symposium on Reconfigurable Communication-Centric Systems-on-Chip, ReCoSoC 2017 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th International Symposium on Reconfigurable Communication-Centric Systems-on-Chip, ReCoSoC 2017
Y2 - 12 July 2017 through 14 July 2017
ER -