A QA-Assisted Job Scheduler for Minimizing the Impact of Urgent Computing on HPC System Operation

Tatsuyoshi Ohmura, Keichi Takahashi, Ryusuke Egawa, Hiroyuki Takizawa

Research output: Contribution to book chapter/report/conference proceedings, peer-reviewed

Abstract

In recent years, large-scale natural disasters such as earthquakes, tsunamis, and storms have increased, raising the importance of disaster prevention and mitigation. Extensive studies on urgent computing therefore use High Performance Computing (HPC) systems to run simulations rapidly and enable prompt countermeasures. To meet the deadlines of urgent jobs, job schedulers may provide features that give urgent jobs a higher execution priority than other jobs. As a result, executing an urgent job can negatively affect lower-priority jobs and, hence, overall HPC system operation. The goal of this paper is to execute urgent jobs efficiently so that they meet their deadlines while minimizing the negative impact on HPC system operation. This paper assumes that some running jobs can be suspended (and resumed later) so that an urgent job can start immediately. It further assumes that, if the total memory usage of the urgent and suspended jobs exceeds the memory capacity, a suspended job may have to be terminated to meet the deadline, in which case its intermediate computation results are lost. The proposed method employs quantum annealing or quantum-inspired annealing (QA) techniques to find a combination of jobs to suspend that minimizes the loss of computational results while meeting the deadlines of urgent jobs. The evaluation results show that the proposed method selects an appropriate combination of jobs to suspend and thereby minimizes computational losses. The results also demonstrate that the advantage of the proposed method becomes more pronounced in practical situations where the power-saving feature of HPC systems is enabled.
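The job-selection problem described in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' formulation: the abstract does not give the cost function or constraints, so every modeling choice here is hypothetical. It assumes that each running job holds some nodes, that suspending it frees those nodes for the urgent job, that a suspended job whose per-node memory plus the urgent job's per-node memory exceeds node capacity must be terminated (losing its elapsed node-hours), and that any unmet node demand is charged a large penalty. The names `Job`, `cost`, `anneal`, `brute_force`, `suspend_weight`, and `penalty` are illustrative only. Classical simulated annealing over the binary suspend/keep variables stands in for quantum or quantum-inspired annealing; running on actual QA hardware would additionally require recasting the node-demand penalty as a QUBO term (for example, with slack variables).

```python
import math
import random
from dataclasses import dataclass
from itertools import product


@dataclass
class Job:
    name: str
    nodes: int        # nodes held by the running job (hypothetical field)
    elapsed_h: float  # hours of computation completed so far
    mem_gb: float     # per-node memory footprint in GB


def cost(x, jobs, need, urgent_mem_gb, node_mem_gb,
         suspend_weight=0.01, penalty=1000.0):
    """Energy of a suspension plan; x[i] == 1 means suspend jobs[i].

    Assumed model: a suspended job that cannot share node memory with the
    urgent job is terminated and loses nodes * elapsed_h node-hours.
    """
    lost = 0.0
    freed = 0
    for xi, job in zip(x, jobs):
        if not xi:
            continue
        freed += job.nodes
        if job.mem_gb + urgent_mem_gb > node_mem_gb:
            lost += job.nodes * job.elapsed_h  # killed: progress is lost
        lost += suspend_weight * job.nodes      # discourage needless suspension
    shortfall = max(0, need - freed)            # nodes still missing
    return lost + penalty * shortfall


def anneal(jobs, need, urgent_mem_gb, node_mem_gb,
           steps=20000, t0=50.0, t1=0.01, seed=0):
    """Classical simulated annealing over binary suspend/keep decisions."""
    rng = random.Random(seed)
    x = [0] * len(jobs)
    e = cost(x, jobs, need, urgent_mem_gb, node_mem_gb)
    best, best_e = x[:], e
    for k in range(steps):
        t = t0 * (t1 / t0) ** (k / (steps - 1))  # geometric cooling schedule
        i = rng.randrange(len(jobs))
        x[i] ^= 1                                 # propose a single-bit flip
        e_new = cost(x, jobs, need, urgent_mem_gb, node_mem_gb)
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new
            if e < best_e:
                best, best_e = x[:], e
        else:
            x[i] ^= 1                             # rejected: undo the flip
    return best, best_e


def brute_force(jobs, need, urgent_mem_gb, node_mem_gb):
    """Exhaustive search, usable as a reference on small instances."""
    return min(
        (cost(x, jobs, need, urgent_mem_gb, node_mem_gb), list(x))
        for x in product((0, 1), repeat=len(jobs))
    )


if __name__ == "__main__":
    # Hypothetical cluster: 256 GB per node; the urgent job needs 6 nodes
    # at 100 GB per node.
    jobs = [
        Job("A", nodes=4, elapsed_h=10.0, mem_gb=200.0),  # would be killed
        Job("B", nodes=4, elapsed_h=1.0, mem_gb=60.0),    # fits alongside
        Job("C", nodes=2, elapsed_h=5.0, mem_gb=50.0),    # fits alongside
        Job("D", nodes=2, elapsed_h=0.5, mem_gb=220.0),   # would be killed
    ]
    plan, energy = anneal(jobs, need=6, urgent_mem_gb=100.0, node_mem_gb=256.0)
    print([j.name for j, s in zip(jobs, plan) if s], round(energy, 2))
    # ['B', 'C'] 0.06
```

In this toy instance, suspending B and C frees exactly six nodes without killing anything, whereas any plan involving A or D destroys progress. With four jobs the exhaustive search is trivial; an annealer only becomes worthwhile as the number of candidate jobs grows, since the search space doubles with each additional job.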

Original language: English
Title of host publication: Proceedings - 2024 12th International Symposium on Computing and Networking Workshops, CANDARW 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 197-203
Number of pages: 7
ISBN (electronic): 9798331505349
DOI
Publication status: Published - 2024
Event: 12th International Symposium on Computing and Networking Workshops, CANDARW 2024 - Naha, Japan
Duration: 26 Nov 2024 to 29 Nov 2024

Publication series

Name: Proceedings - 2024 12th International Symposium on Computing and Networking Workshops, CANDARW 2024

Conference

Conference: 12th International Symposium on Computing and Networking Workshops, CANDARW 2024
Country/Territory: Japan
City: Naha
Period: 24/11/26 to 24/11/29
