TY - JOUR
T1 - DeLoc
T2 - A Locality and Memory-Congestion-Aware Task Mapping Method for Modern NUMA Systems
AU - Agung, Mulya
AU - Amrizal, Muhammad Alfian
AU - Egawa, Ryusuke
AU - Takizawa, Hiroyuki
N1 - Funding Information:
This work was partially supported by Japan’s Ministry of Education, Culture, Sports, Science and Technology Next Generation High-Performance Computing Infrastructures and Applications R&D Program “R&D of A Quantum-Annealing-Assisted Next Generation HPC Infrastructure and its Applications” and Grant-in-Aid for Scientific Research (B) #16H02822 and #17H01706.
Publisher Copyright:
© 2013 IEEE.
PY - 2020
Y1 - 2020
N2 - The mapping of tasks to processor cores, called task mapping, is crucial to achieving scalable performance on multicore processors. On modern NUMA (non-uniform memory access) systems, the memory congestion problem could degrade the performance more severely than the data locality problem because heavy congestion on shared caches and memory controllers could cause long latencies. Conventional work on task mapping mostly focuses on improving the locality of memory accesses. However, our previous work showed that on modern NUMA systems, maximizing the locality can degrade the performance due to memory congestion. In this work, we propose a task mapping method that addresses the locality and the memory congestion problems to improve the performance of parallel applications. In the proposed method, first, the spatial and temporal communication behaviors of the applications are analyzed from the time-series dataset of communications among the parallel tasks. Then, a data clustering technique is employed to detect groups of tasks that potentially cause the memory congestion. Finally, this information is used to compute the task mapping to improve the locality and reduce the memory congestion. We also provide a set of metrics to describe the communication behaviors and to evaluate if the target application can benefit from our method. The proposed method is evaluated with the NPB and PARSEC applications on a real NUMA system and a multicore simulator. A detailed analysis of the sources of performance gain is also provided. Experimental results show that our method can achieve up to a 61% performance improvement compared with the state-of-the-art locality-based method.
AB - The mapping of tasks to processor cores, called task mapping, is crucial to achieving scalable performance on multicore processors. On modern NUMA (non-uniform memory access) systems, the memory congestion problem could degrade the performance more severely than the data locality problem because heavy congestion on shared caches and memory controllers could cause long latencies. Conventional work on task mapping mostly focuses on improving the locality of memory accesses. However, our previous work showed that on modern NUMA systems, maximizing the locality can degrade the performance due to memory congestion. In this work, we propose a task mapping method that addresses the locality and the memory congestion problems to improve the performance of parallel applications. In the proposed method, first, the spatial and temporal communication behaviors of the applications are analyzed from the time-series dataset of communications among the parallel tasks. Then, a data clustering technique is employed to detect groups of tasks that potentially cause the memory congestion. Finally, this information is used to compute the task mapping to improve the locality and reduce the memory congestion. We also provide a set of metrics to describe the communication behaviors and to evaluate if the target application can benefit from our method. The proposed method is evaluated with the NPB and PARSEC applications on a real NUMA system and a multicore simulator. A detailed analysis of the sources of performance gain is also provided. Experimental results show that our method can achieve up to a 61% performance improvement compared with the state-of-the-art locality-based method.
KW - High-performance computing
KW - locality
KW - memory congestion
KW - NUMA
KW - process mapping
KW - task mapping
KW - thread mapping
UR - http://www.scopus.com/inward/record.url?scp=85078332079&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078332079&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2019.2963726
DO - 10.1109/ACCESS.2019.2963726
M3 - Article
AN - SCOPUS:85078332079
SN - 2169-3536
VL - 8
SP - 6937
EP - 6953
JO - IEEE Access
JF - IEEE Access
M1 - 8949493
ER -