An automatic mpi process mapping method considering locality and memory congestion on numa systems

Mulya Agung, Muhammad Alfian Amrizal, Ryusuke Egawa, Hiroyuki Takizawa

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

5 Citations (Scopus)

Abstract

MPI process mapping is an important step to achieve scalable performance on non-uniform memory access (NUMA) systems. Conventional approaches have focused only on improving the locality of communication. However, related studies have shown that on modern NUMA systems, the memory congestion problem could cause more severe performance degradation than the locality problem because a high number of processor cores in the systems can cause heavy congestion on shared caches and memory controllers. To optimize the process mapping, it is necessary to determine the communication behavior of the MPI processes. Previous methods rely on offline profiling to analyze the communication behavior, which incurs a high overhead and is potentially time-consuming. In this paper, we propose a method that automatically performs MPI process mapping for adapting to communication behaviors while considering both locality and memory congestion. Our method works at runtime during the execution of an MPI application. It does not require modifications to the application, previous knowledge of the communication behavior, or changes to the hardware and operating system. The proposed method has been evaluated with the NAS parallel benchmarks on a NUMA system. Experimental results show that our method can achieve performance close to an oracle-based mapping method with low overhead to the application execution. The performance improvement is up to 27.4% (13.4% on average) compared with the default mapping of the MPI runtime system.

Original languageEnglish
Title of host publicationProceedings - 2019 IEEE 13th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages17-24
Number of pages8
ISBN (Electronic)9781728148823
DOIs
Publication statusPublished - 2019 Oct
Event13th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019 - Singapore, Singapore
Duration: 2019 Oct 12019 Oct 4

Publication series

NameProceedings - 2019 IEEE 13th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019

Conference

Conference13th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2019
Country/TerritorySingapore
CitySingapore
Period19/10/119/10/4

Keywords

  • Congestion
  • Locality
  • MPI
  • Multi-core
  • NUMA
  • Process mapping

Fingerprint

Dive into the research topics of 'An automatic mpi process mapping method considering locality and memory congestion on numa systems'. Together they form a unique fingerprint.

Cite this