Automatic tuning of CUDA execution parameters for stencil processing

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

8 Citations (Scopus)


Compute Unified Device Architecture (CUDA) has enabled Graphics Processing Units (GPUs) to accelerate a wide range of applications. However, to fully exploit a GPU's computing power, a programmer has to carefully adjust several CUDA execution parameters, even for simple stencil processing kernels. This paper therefore develops an automatic parameter tuning mechanism that uses profiling to predict the optimal execution parameters. The paper first discusses the scope of the parameter exploration space, which is determined by the GPU's architectural restrictions. To find the optimal execution parameters, performance models are built by profiling the kernel's execution time under each promising parameter configuration, and the execution parameters are then chosen using those models. The performance improvement due to the proposed mechanism is evaluated using two benchmark programs. The evaluation results show that the proposed mechanism can appropriately select a suboptimal Cooperative Thread Array (CTA) configuration whose performance is comparable to that of the optimal one.
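
As a rough illustration of the idea in the abstract, the following Python sketch enumerates candidate CTA shapes under architectural limits and picks the one with the lowest measured cost. It is not the chapter's mechanism, which builds performance models from profiles; this is plain exhaustive selection. The names `candidate_cta_shapes` and `pick_fastest`, the power-of-two restriction, and the 512-thread limit are all assumptions made for the example; the real limit depends on the device.

```python
from typing import Callable

WARP_SIZE = 32             # threads per warp on NVIDIA GPUs
MAX_THREADS_PER_CTA = 512  # assumed per-CTA limit; the real value is device-dependent


def candidate_cta_shapes(max_threads: int = MAX_THREADS_PER_CTA,
                         warp_size: int = WARP_SIZE) -> list[tuple[int, int]]:
    """Enumerate 2D CTA shapes (x, y) with power-of-two sides.

    A shape is kept only if its thread count is a multiple of the warp
    size and does not exceed the per-CTA thread limit.
    """
    shapes = []
    x = 1
    while x <= max_threads:
        y = 1
        while x * y <= max_threads:
            if (x * y) % warp_size == 0:
                shapes.append((x, y))
            y *= 2
        x *= 2
    return shapes


def pick_fastest(shapes: list[tuple[int, int]],
                 measure: Callable[[tuple[int, int]], float]) -> tuple[int, int]:
    """Return the shape with the smallest measured cost.

    `measure` is a hypothetical callback that would launch the kernel with
    the given CTA shape and return its execution time.
    """
    return min(shapes, key=measure)


if __name__ == "__main__":
    # Stand-in cost function for demonstration only; real use would time a kernel.
    fake_time = lambda s: abs(s[0] - 32) + abs(s[1] - 8)
    print(pick_fastest(candidate_cta_shapes(), fake_time))
```

In practice `measure` would wrap an actual kernel launch and timer, and the candidate set would also be pruned by register and shared-memory limits, which this sketch ignores.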

Original language: English
Title of host publication: Software Automatic Tuning
Subtitle of host publication: From Concepts to State-of-the-Art Results
Publisher: Springer New York
Number of pages: 20
ISBN (Print): 9781441969347
Publication status: Published - 2010

