An Efficient Skinny Matrix-Matrix Multiplication Method by Folding Input Matrices into Tensor Core Operations

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

1 Citation (Scopus)

Abstract

A specialized unit in NVIDIA's GPUs, called the Tensor Core, has attracted attention over the last couple of years because of its high computational throughput for general matrix-matrix multiplications (GEMMs). A Tensor Core unit computes a matrix multiply-accumulate (MMA) operation of a fixed size. However, if the input matrices are skinnier than the operand size of a Tensor Core operation, part of that operation's computation is wasted. This paper therefore presents a method for optimizing skinny matrix-matrix multiplication that exploits the potential of Tensor Core units. The proposed method feeds multiple segments of an input matrix into a single Tensor Core operation so that more of its computations are put to use. Experimental results show that the proposed method achieves up to a 2.7× speedup over the cuBLAS 11.0 library.
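The abstract does not specify how segments are packed into a Tensor Core operation. The sketch below shows one hypothetical folding scheme, emulated in plain Python rather than on a GPU: it assumes a 16×16×16 MMA shape (one of the half-precision shapes exposed by CUDA's WMMA API), places several 16-row segments of a tall-and-skinny A side by side along the MMA's K dimension, and repeats B on the diagonal of one tile so that a single MMA yields several output segments. The names `T`, `mma`, and `folded_skinny_gemm` are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch: folding a tall-and-skinny GEMM into fixed-size MMA
# tiles. NOT the paper's kernel; the block-diagonal packing of B is an
# assumed folding scheme, and the "Tensor Core" is emulated in pure Python.

T = 16  # assumed MMA shape: m = n = k = 16


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def mma(a, b, c):
    """Emulate one T x T x T multiply-accumulate on T x T tiles: c += a @ b."""
    for i in range(T):
        for j in range(T):
            acc = c[i][j]
            for p in range(T):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc


def folded_skinny_gemm(A, B):
    """Return (C, mma_calls) with C = A @ B, A: M x k, B: k x n, k, n <= T.

    f = T // max(k, n) segments of T rows of A are laid side by side along
    the MMA's K dimension, and B is repeated f times on the diagonal of one
    T x T tile, so a single MMA produces f output segments instead of one.
    """
    M, k, n = len(A), len(B), len(B[0])
    assert 1 <= k <= T and 1 <= n <= T, "sketch only handles k, n <= T"
    f = T // max(k, n)  # number of segments folded into one MMA
    b_tile = zeros(T, T)  # block-diagonal copy of B, built once
    for s in range(f):
        for p in range(k):
            for j in range(n):
                b_tile[s * k + p][s * n + j] = B[p][j]
    C = zeros(M, n)
    calls = 0
    for base in range(0, M, f * T):
        a_tile = zeros(T, T)  # fold: segment s goes to columns s*k .. s*k+k-1
        for s in range(f):
            for i in range(T):
                r = base + s * T + i
                if r < M:
                    a_tile[i][s * k:s * k + k] = A[r]
        c_tile = zeros(T, T)
        mma(a_tile, b_tile, c_tile)
        calls += 1
        for s in range(f):  # unfold: segment s sits in columns s*n .. s*n+n-1
            for i in range(T):
                r = base + s * T + i
                if r < M:
                    C[r] = c_tile[i][s * n:s * n + n]
    return C, calls


if __name__ == "__main__":
    A = [[float(i + p) for p in range(4)] for i in range(64)]  # 64 x 4
    B = [[float(p * 4 + j) for j in range(4)] for p in range(4)]  # 4 x 4
    C, calls = folded_skinny_gemm(A, B)
    print(calls)  # 1 folded MMA instead of 4 unfolded ones
```

With k = n = 4, an unfolded 16×16×16 MMA performs 16·4·4 = 256 useful multiply-adds out of 16³ = 4096 (1/16), while the folded tile with f = 4 performs 4·256 = 1024 (1/4). The zeros left in the block-diagonal B tile show that this particular scheme trades extra operand layout work for higher utilization; whether it matches the folding used in the paper is an assumption.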

Original language: English
Title of host publication: Proceedings - 2020 8th International Symposium on Computing and Networking Workshops, CANDARW 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 164-167
Number of pages: 4
ISBN (Electronic): 9781728199191
Publication status: Published, 2020 Nov
Event: 8th International Symposium on Computing and Networking Workshops, CANDARW 2020 - Virtual, Naha, Japan
Duration: 2020 Nov 24 to 2020 Nov 27

Publication series

Name: Proceedings - 2020 8th International Symposium on Computing and Networking Workshops, CANDARW 2020

Conference

Conference: 8th International Symposium on Computing and Networking Workshops, CANDARW 2020
Country/Territory: Japan
City: Virtual, Naha
Period: 2020 Nov 24 to 2020 Nov 27

Keywords

  • GEMM
  • GPU
  • Tensor Core
  • optimization
  • tall-and-skinny
