TY - GEN
T1 - Systolic architecture for computational fluid dynamics on FPGAs
AU - Sano, Kentaro
AU - Iizuka, Takanori
AU - Yamamoto, Satoru
PY - 2007
Y1 - 2007
N2 - This paper presents an FPGA-based flow solver based on the systolic architecture. We show that the fractional-step method employing central difference schemes can be expressed as a systolic algorithm, and therefore the systolic architecture is suitable for a dedicated processor for the flow solver. We have designed a 2D systolic array of cells, each of which has a micro-programmable data-path containing a MAC (multiplication and accumulation) unit and a local memory to store necessary data for computational fluid dynamics. With ALTERA Stratix II FPGA, we implemented 96(= 12 x 8) cells running at 60MHz. Since the MAC unit has both an adder and a multiplier for single-precision floating-point numbers, the total peak performance is 11.5(= 96 x 60MHz x 2) GFlops. We made a choice of 2D square driven cavity flow as a benchmark computation based on the fractional-step method. For this computation, the FPGA-based processor running only at 60MHz achieved 7.14 and 6.41 times faster computations than Pentium4 processor at 3.2 GHz and Itanium2 at 1.4 GHz, respectively.
AB - This paper presents an FPGA-based flow solver based on the systolic architecture. We show that the fractional-step method employing central difference schemes can be expressed as a systolic algorithm, and therefore the systolic architecture is suitable for a dedicated processor for the flow solver. We have designed a 2D systolic array of cells, each of which has a micro-programmable data-path containing a MAC (multiplication and accumulation) unit and a local memory to store necessary data for computational fluid dynamics. With ALTERA Stratix II FPGA, we implemented 96(= 12 x 8) cells running at 60MHz. Since the MAC unit has both an adder and a multiplier for single-precision floating-point numbers, the total peak performance is 11.5(= 96 x 60MHz x 2) GFlops. We made a choice of 2D square driven cavity flow as a benchmark computation based on the fractional-step method. For this computation, the FPGA-based processor running only at 60MHz achieved 7.14 and 6.41 times faster computations than Pentium4 processor at 3.2 GHz and Itanium2 at 1.4 GHz, respectively.
UR - http://www.scopus.com/inward/record.url?scp=47349089632&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47349089632&partnerID=8YFLogxK
U2 - 10.1109/FCCM.2007.20
DO - 10.1109/FCCM.2007.20
M3 - Conference contribution
AN - SCOPUS:47349089632
SN - 0769529402
SN - 9780769529400
T3 - Proceedings 2007 IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2007
SP - 107
EP - 116
BT - Proceedings 2007 IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2007
PB - IEEE Computer Society
T2 - 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2007
Y2 - 23 April 2007 through 25 April 2007
ER -