TY - JOUR
T1 - Performance evaluation of the LBM simulations in fluid dynamics on SX-Aurora TSUBASA vector engine
AU - Sun, Xiangcheng
AU - Takahashi, Keichi
AU - Shimomura, Yoichi
AU - Takizawa, Hiroyuki
AU - Wang, Xian
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2025/2
Y1 - 2025/2
N2 - The lattice Boltzmann method (LBM), combined with high-performance computing (HPC) technologies such as graphics processing units (GPUs), has been widely adopted to solve complex problems in fluid dynamics. In addition to GPUs, the vector engine (VE) developed by NEC Corporation has emerged as an effective solution for memory-intensive numerical simulations such as LBM. It is therefore important to evaluate the performance of LBM simulations accelerated by the VE. This study discusses our self-developed LBM code for both classical and fused implementations on the VE. Through numerical simulations of 2D and 3D lid-driven cavity flows, the performance of the new VE Type 30A (VE30) on large-scale grids is evaluated and analyzed, and compared against results obtained with VE Type 20B (VE20), NVIDIA A100 GPU (A100), and NVIDIA H100 GPU (H100). The results indicate that, regardless of the LBM implementation, H100 achieves the highest performance. Furthermore, owing to the substantial enhancements in VE30's memory hierarchy, the performance of the streaming kernel in the classical implementation of LBM is significantly improved compared to VE20 and A100, approaching that of H100. However, because the fused implementation requires fewer memory accesses, VE30 performs worse than H100 in the fused implementation. Additionally, it is anticipated that, for specific physical problems and requirements, VE30 will show clear performance potential in LBM simulations with large grid sizes.
AB - The lattice Boltzmann method (LBM), combined with high-performance computing (HPC) technologies such as graphics processing units (GPUs), has been widely adopted to solve complex problems in fluid dynamics. In addition to GPUs, the vector engine (VE) developed by NEC Corporation has emerged as an effective solution for memory-intensive numerical simulations such as LBM. It is therefore important to evaluate the performance of LBM simulations accelerated by the VE. This study discusses our self-developed LBM code for both classical and fused implementations on the VE. Through numerical simulations of 2D and 3D lid-driven cavity flows, the performance of the new VE Type 30A (VE30) on large-scale grids is evaluated and analyzed, and compared against results obtained with VE Type 20B (VE20), NVIDIA A100 GPU (A100), and NVIDIA H100 GPU (H100). The results indicate that, regardless of the LBM implementation, H100 achieves the highest performance. Furthermore, owing to the substantial enhancements in VE30's memory hierarchy, the performance of the streaming kernel in the classical implementation of LBM is significantly improved compared to VE20 and A100, approaching that of H100. However, because the fused implementation requires fewer memory accesses, VE30 performs worse than H100 in the fused implementation. Additionally, it is anticipated that, for specific physical problems and requirements, VE30 will show clear performance potential in LBM simulations with large grid sizes.
KW - Computational fluid dynamics
KW - Lattice Boltzmann method
KW - Memory-intensive application
KW - Performance evaluation
KW - SX-Aurora TSUBASA
KW - Vector engine
UR - http://www.scopus.com/inward/record.url?scp=85207322705&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85207322705&partnerID=8YFLogxK
U2 - 10.1016/j.cpc.2024.109411
DO - 10.1016/j.cpc.2024.109411
M3 - Article
AN - SCOPUS:85207322705
SN - 0010-4655
VL - 307
JO - Computer Physics Communications
JF - Computer Physics Communications
M1 - 109411
ER -