OpenCL-like offloading with metaprogramming for SX-Aurora TSUBASA

Hiroyuki Takizawa, Shinji Shiotsuki, Naoki Ebata, Ryusuke Egawa

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)


This paper presents an OpenCL-like offload programming framework for NEC SX-Aurora TSUBASA (SX-Aurora) and discusses the benefit of employing metaprogramming to describe the architecture-specific parts of a program. Unlike traditional vector systems, one node of an SX-Aurora system consists of a host processor and one or more vector processors on PCI-Express cards, called the vector host and vector engines, respectively. Since the standard OpenCL execution model does not naturally fit the vector engine, this paper discusses how to adapt the OpenCL specification to SX-Aurora while considering the trade-off between performance and code portability. The framework uses OpenCL to minimize the non-portable parts of an application's offload code, and metaprogramming to describe the non-portable parts that remain. Performance evaluation results demonstrate that, with moderate programming effort, the proposed framework can express the collaboration between a vector host and a vector engine so as to make good use of both processors. By delegating the right task to the right processor, an OpenCL-like program can fully exploit the performance of SX-Aurora. Moreover, metaprogramming can express vectorization-aware performance optimizations that enhance performance portability across different architectures, including SX-Aurora.

Original language: English
Article number: 102754
Journal: Parallel Computing
Publication status: Published - May 2021


Keywords

  • Code reusability
  • Metaprogramming
  • Offload programming
  • OpenCL
  • SX-Aurora TSUBASA

