TY - GEN
T1 - Register Flush-free Runahead Execution for Modern Vector Processors
AU - Takayashiki, Hikaru
AU - Sato, Masayuki
AU - Komatsu, Kazuhiko
AU - Kobayashi, Hiroaki
N1 - Funding Information:
This work is partially supported by MEXT Next Generation High-Performance Computing Infrastructures and Applications R&D Program, entitled “R&D of A Quantum-Annealing-Assisted Next Generation HPC Infrastructure and its Applications”, Grants-in-Aid for Early-Career Scientists No. 19K20232, Grants-in-Aid for Scientific Research (A) No. 19H01095, Grants-in-Aid for Scientific Research (C) No. 20K11838, and Japan-Russia Research Co-operative Program between JSPS and RFBR, Grant number JPJSBP120214801. The experimental results in this research were partially obtained by using supercomputing resources at Cyberscience Center, Tohoku University.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Modern vector processors have been designed to achieve high sustained performance, especially in HPC applications, because of their powerful instruction set oriented to data-level parallelism. Additionally, the latest vector processor adopts the out-of-order execution of the vector instructions to exploit instruction-level parallelism due to a significant gap in latency between vector arithmetic instructions and vector load/store instructions. In spite of the effort, this gap still brings a deterioration of sustained performance of the modern vector processors. This paper proposes a runahead execution mechanism for the modern vector processors to fill the latency gap by further exploiting instruction-level parallelism. If the processor stalls due to a long latency instruction, the conventional runahead execution mechanism changes the processor state from a normal mode to a runahead mode, and the processor speculatively executes the subsequent instructions that can cause stalls and their dependencies. However, the conventional runahead execution mechanisms flush the registers' values calculated in the runahead mode after finishing this mode and cannot reuse them in the subsequent normal mode. Since the vector processors have many values even in one vector register, these flushes and re-executions waste the bandwidth between cores and caches. Thus, to solve this problem of the conventional runahead mechanism, our proposed mechanism leaves the registers containing the results in the runahead mode in order for the processor to use the registers even after returning to the normal mode. For correctly using these registers after exiting the runahead mode, the proposed mechanism newly realizes functions to inherit the commit order information and the register aliasing information of the runahead-executed instructions into the normal mode. The evaluation results show that the proposed mechanism improves the performance by up to 20% and 3% on average by the conventional mechanism.
AB - Modern vector processors have been designed to achieve high sustained performance, especially in HPC applications, because of their powerful instruction set oriented to data-level parallelism. Additionally, the latest vector processor adopts the out-of-order execution of the vector instructions to exploit instruction-level parallelism due to a significant gap in latency between vector arithmetic instructions and vector load/store instructions. In spite of the effort, this gap still brings a deterioration of sustained performance of the modern vector processors. This paper proposes a runahead execution mechanism for the modern vector processors to fill the latency gap by further exploiting instruction-level parallelism. If the processor stalls due to a long latency instruction, the conventional runahead execution mechanism changes the processor state from a normal mode to a runahead mode, and the processor speculatively executes the subsequent instructions that can cause stalls and their dependencies. However, the conventional runahead execution mechanisms flush the registers' values calculated in the runahead mode after finishing this mode and cannot reuse them in the subsequent normal mode. Since the vector processors have many values even in one vector register, these flushes and re-executions waste the bandwidth between cores and caches. Thus, to solve this problem of the conventional runahead mechanism, our proposed mechanism leaves the registers containing the results in the runahead mode in order for the processor to use the registers even after returning to the normal mode. For correctly using these registers after exiting the runahead mode, the proposed mechanism newly realizes functions to inherit the commit order information and the register aliasing information of the runahead-executed instructions into the normal mode. The evaluation results show that the proposed mechanism improves the performance by up to 20% and 3% on average by the conventional mechanism.
UR - http://www.scopus.com/inward/record.url?scp=85124368952&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124368952&partnerID=8YFLogxK
U2 - 10.1109/SBAC-PAD53543.2021.00023
DO - 10.1109/SBAC-PAD53543.2021.00023
M3 - Conference contribution
AN - SCOPUS:85124368952
T3 - Proceedings - Symposium on Computer Architecture and High Performance Computing
SP - 114
EP - 125
BT - Proceedings - 2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2021
PB - IEEE Computer Society
T2 - 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2021
Y2 - 26 October 2021 through 29 October 2021
ER -