TY - GEN
T1 - A comparison of performance tunabilities between OpenCL and OpenACC
AU - Sugawara, Makoto
AU - Hirasawa, Shoichi
AU - Komatsu, Kazuhiko
AU - Takizawa, Hiroyuki
AU - Kobayashi, Hiroaki
PY - 2013/1/1
Y1 - 2013/1/1
N2 - To design and develop any auto tuning mechanisms for OpenACC, it is important to clarify the differences between conventional GPU programming models and OpenACC in terms of available programming and tuning techniques, called performance tunabilities. This paper hence discusses the performance tunabilities of OpenACC and OpenCL. As OpenACC cannot synchronize threads running on GPUs, some important techniques are not available to OpenACC. Therefore, we also design an additional compiler directive for thread synchronization. Evaluation results show that both OpenCL and OpenACC need architecture-aware optimizations, and similar approaches to performance optimization are effective for both OpenCL and OpenACC. The additional directive can allow OpenACC to describe more tuning techniques in the same approach as OpenCL. As it is obvious that OpenACC is more productive than OpenCL especially for legacy application migration, OpenACC is a very promising programming model if it can achieve the same performance as the conventional GPU programming models such as CUDA and OpenCL.
KW - Autotuning
KW - OpenACC
KW - OpenCL
UR - http://www.scopus.com/inward/record.url?scp=84892638831&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84892638831&partnerID=8YFLogxK
U2 - 10.1109/MCSoC.2013.31
DO - 10.1109/MCSoC.2013.31
M3 - Conference contribution
AN - SCOPUS:84892638831
SN - 9780769550862
T3 - Proceedings - IEEE 7th International Symposium on Embedded Multicore/Manycore System-on-Chip, MCSoC 2013
SP - 147
EP - 152
BT - Proceedings - IEEE 7th International Symposium on Embedded Multicore/Manycore System-on-Chip, MCSoC 2013
PB - IEEE Computer Society
T2 - 2013 IEEE 7th International Symposium on Embedded Multicore/Manycore System-on-Chip, MCSoC 2013
Y2 - 26 September 2013 through 28 September 2013
ER -