TY - GEN
T1 - CheCL
T2 - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011
AU - Takizawa, Hiroyuki
AU - Koyama, Kentaro
AU - Sato, Katsuto
AU - Komatsu, Kazuhiko
AU - Kobayashi, Hiroaki
N1 - Copyright:
Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011
Y1 - 2011
N2 - In this paper, we propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high-performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification and recompilation of its code. A conventional check pointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is forwarded to another process called an API proxy, and the API proxy invokes the API function, two processes, an application process and an API proxy, are launched for an OpenCL application. In this case, as the application process is not an OpenCL process but a standard process, it can be safely check pointed. While CheCL intercepts all API calls, it records the information necessary for restoring OpenCL objects. The application process does not hold any OpenCL handles, but CheCL handles to keep such information. Those handles are automatically converted to OpenCL handles and then passed to API functions. Upon restart, OpenCL objects are automatically restored based on the recorded information. This paper demonstrates the feasibility of transparent check pointing of OpenCL programs including MPI applications, and quantitatively evaluates the runtime overheads. It is also discussed that CheCL can enable process migration of OpenCL applications among distinct nodes, and among different kinds of compute devices such as a CPU and a GPU.
AB - In this paper, we propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high-performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification and recompilation of its code. A conventional check pointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is forwarded to another process called an API proxy, and the API proxy invokes the API function, two processes, an application process and an API proxy, are launched for an OpenCL application. In this case, as the application process is not an OpenCL process but a standard process, it can be safely check pointed. While CheCL intercepts all API calls, it records the information necessary for restoring OpenCL objects. The application process does not hold any OpenCL handles, but CheCL handles to keep such information. Those handles are automatically converted to OpenCL handles and then passed to API functions. Upon restart, OpenCL objects are automatically restored based on the recorded information. This paper demonstrates the feasibility of transparent check pointing of OpenCL programs including MPI applications, and quantitatively evaluates the runtime overheads. It is also discussed that CheCL can enable process migration of OpenCL applications among distinct nodes, and among different kinds of compute devices such as a CPU and a GPU.
UR - http://www.scopus.com/inward/record.url?scp=80053270870&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053270870&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2011.85
DO - 10.1109/IPDPS.2011.85
M3 - Conference contribution
AN - SCOPUS:80053270870
SN - 9780769543857
T3 - Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011
SP - 864
EP - 876
BT - Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011
Y2 - 16 May 2011 through 20 May 2011
ER -