TY - JOUR
T1 - A full GPU implementation of a numerical method for simulating capsule suspensions
AU - Matsunaga, Daiki
AU - Imai, Yohsuke
AU - Omori, Toshihiro
AU - Ishikawa, Takuji
AU - Yamaguchi, Takami
N1 - Publisher Copyright:
© 2014 The Japan Society of Mechanical Engineers.
PY - 2014
Y1 - 2014
AB - Although boundary element (BE) based methods are highly accurate for simulating capsule suspensions in Stokes flows, their computational time has been a major issue, even when only a few capsules are simulated. We propose a full graphics processing unit (GPU) implementation of a numerical method that couples the BE method for fluid mechanics with the finite element method for membrane mechanics. On a single GPU, the performance reaches 0.12 TFlop/s when computing one capsule (2562 nodes and 5120 elements) and 0.29 TFlop/s for two capsules. The performance increases with the number of capsules, reaching a maximum of 0.59 TFlop/s. We also implement a multi-GPU method in which data communication overlaps the computation. A weak scaling test shows perfect scalability for any number of computational nodes per GPU, indicating that the communication time is completely hidden. For practical use of the present results, we estimate the computational time required for 10000 time steps. Simulating one capsule or two capsules on a single GPU requires only 2.0 and 9.1 minutes, respectively, while a simulation of 256 capsules on 16 GPUs takes 3.8 days.
KW - Boundary integral formulation
KW - Multi-GPU
KW - Parallel computing
KW - Stokes flow
KW - Suspension
UR - http://www.scopus.com/inward/record.url?scp=84923650040&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84923650040&partnerID=8YFLogxK
U2 - 10.1299/jbse.14-00039
DO - 10.1299/jbse.14-00039
M3 - Article
AN - SCOPUS:84923650040
SN - 1880-9863
VL - 9
SP - 1
EP - 16
JO - Journal of Biomechanical Science and Engineering
JF - Journal of Biomechanical Science and Engineering
IS - 3
ER -