TY - GEN
T1 - Improving Resource Utilization in Data Centers using an LSTM-based Prediction Model
AU - Thonglek, Kundjanasith
AU - Ichikawa, Kohei
AU - Takahashi, Keichi
AU - Iida, Hajimu
AU - Nakasan, Chawanat
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Number JP18K11326.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - Data centers are centralized facilities where computing and networking hardware is aggregated to handle large amounts of data and computation. In a data center, computing resources such as CPU and memory are usually managed by a resource manager. The resource manager accepts resource requests from users and allocates resources to their applications. A well-known problem in resource management is that users often request more resources than their applications actually use, which degrades overall resource utilization in the data center. This paper aims to improve resource utilization in data centers by predicting the resources each application requires. We designed and implemented a neural network model based on Long Short-Term Memory (LSTM) to predict a more efficient resource allocation for a job from historical data. Our model has two LSTM layers, each of which learns the relationship between (1) allocation and usage and (2) CPU and memory. We trained the network on Google's cluster-usage trace, which records resource allocation and usage for each job executed in a Google data center, and evaluated the proposed method with Google's cluster scheduler simulator. Our simulation indicated that the proposed method improved CPU utilization and memory utilization by 10.71% and 47.36%, respectively, compared to a conventional resource manager. Moreover, we found that increasing the memory cell size of our LSTM model improves prediction accuracy at the cost of longer training time.
KW - Computing Resources
KW - Long Short-Term Memory
KW - Resource Management
KW - Resource Utilization
UR - http://www.scopus.com/inward/record.url?scp=85075269447&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075269447&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER.2019.8891022
DO - 10.1109/CLUSTER.2019.8891022
M3 - Conference contribution
AN - SCOPUS:85075269447
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
BT - Proceedings - 2019 IEEE International Conference on Cluster Computing, CLUSTER 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Cluster Computing, CLUSTER 2019
Y2 - 23 September 2019 through 26 September 2019
ER -