TY - GEN
T1 - A hardware prefetching mechanism for vector gather instructions
AU - Takayashiki, Hikaru
AU - Sato, Masayuki
AU - Komatsu, Kazuhiko
AU - Kobayashi, Hiroaki
N1 - Funding Information:
This work is partially supported by MEXT Next Generation High-Performance Computing Infrastructures and Applications R&D Program, entitled "R&D of A Quantum-Annealing-Assisted Next Generation HPC Infrastructure and its Applications", Grants-in-Aid for Early-Career Scientists No. 19K20232, and Grant-in-Aid for Scientific Research (A) No. 19H01095. The experimental results in this research were partially obtained by using supercomputing resources at Cyberscience Center, Tohoku University.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - Vector gather instructions handle indirect memory accesses in vector processing. Because indirect memory accesses usually exhibit irregular access patterns, they have lower spatial and temporal locality than regular accesses. As a result, an application with many vector gather instructions suffers from long indirect memory access latencies, which significantly degrade vector processing performance. This paper proposes a hardware prefetching mechanism to hide the latencies of indirect memory accesses. The mechanism prefetches cacheable index data before a vector gather instruction executes and predicts the addresses of the memory requests that the instruction will issue. It then attempts to prefetch the data at the predicted addresses, reducing the memory access latencies of vector gather instructions. The paper also discusses how many cache blocks should be loaded per prediction for a single vector gather instruction by varying the prefetching distance and degree. In the evaluation, the performance of a simple kernel is examined with two types of index data: sequential and random. The results show that the prefetching mechanism improves the performance of the sequential-indexed and random-indexed kernels by 2.2x and 1.2x, respectively.
KW - Hardware prefetching
KW - Indirect memory access
KW - Vector instructions
UR - http://www.scopus.com/inward/record.url?scp=85078189097&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078189097&partnerID=8YFLogxK
DO - 10.1109/IA349570.2019.00015
M3 - Conference contribution
AN - SCOPUS:85078189097
T3 - 2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2019
SP - 59
EP - 66
BT - 2019 IEEE/ACM 9th Workshop on Irregular Applications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, IA3 2019
Y2 - 18 November 2019
ER -