TY - GEN
T1 - Scalability analysis of tightly-coupled FPGA-cluster for lattice Boltzmann computation
AU - Kono, Yoshiaki
AU - Sano, Kentaro
AU - Yamamoto, Satoru
PY - 2012
Y1 - 2012
N2 - This paper presents a performance model of a lattice Boltzmann method (LBM) accelerator to be implemented on a tightly-coupled FPGA cluster. In strong scaling, the computation assigned to each accelerator node shrinks as the number of nodes increases, so communication overhead becomes apparent and limits scalability. Our tightly-coupled FPGA cluster has a 1D ring accelerator-domain network (ADN), which allows FPGAs to send and receive data with low communication overhead. We propose an LBM accelerator architecture and a stream computation scheme suited to the ADN. We formulate a sustained-performance model of the accelerator, which consists of three cases, each determined by one of resource availability, network bandwidth, or shift-register size. With the model, we show that network bandwidth is much more important than memory bandwidth. The wider the network bandwidth, the more FPGAs the sustained performance can scale to when computing a lattice of constant size. This result demonstrates the importance of the ADN in the tightly-coupled FPGA cluster.
UR - http://www.scopus.com/inward/record.url?scp=84870685431&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870685431&partnerID=8YFLogxK
U2 - 10.1109/FPL.2012.6339275
DO - 10.1109/FPL.2012.6339275
M3 - Conference contribution
AN - SCOPUS:84870685431
SN - 9781467322560
T3 - Proceedings - 22nd International Conference on Field Programmable Logic and Applications, FPL 2012
SP - 120
EP - 127
BT - Proceedings - 22nd International Conference on Field Programmable Logic and Applications, FPL 2012
T2 - 22nd International Conference on Field Programmable Logic and Applications, FPL 2012
Y2 - 29 August 2012 through 31 August 2012
ER -