TY - JOUR
T1 - Evidence-based data mining method to reveal similarities between materials based on physical mechanisms
AU - Ha, Minh Quyet
AU - Nguyen, Duong Nguyen
AU - Nguyen, Viet Cuong
AU - Kino, Hiori
AU - Ando, Yasunobu
AU - Miyake, Takashi
AU - Denœux, Thierry
AU - Huynh, Van Nam
AU - Dam, Hieu Chi
N1 - Funding Information:
This work was supported by the Ministry of Education, Culture, Sports, Science, and Technology of Japan (MEXT) with the Program for Promoting Research on the Supercomputer Fugaku (DPMSD), JSPS KAKENHI grants 20K05301, JP19H05815 (Grants-in-Aid for Scientific Research on Innovative Areas Interface Ionics), 21K14396 (Grant-in-Aid for Early-Career Scientists), and 20K05068, Japan.
Publisher Copyright:
© 2023 Author(s).
PY - 2023/2/7
Y1 - 2023/2/7
N2 - Measuring the similarity between materials is essential for estimating their properties and revealing the associated physical mechanisms. However, current methods for measuring the similarity between materials rely on theoretically derived descriptors and parameters fitted from experimental or computational data, which are often insufficient and biased. Furthermore, outliers and data generated by multiple mechanisms are usually included in the dataset, making the data-driven approach challenging and mathematically complicated. To overcome such issues, we apply the Dempster-Shafer theory to develop an evidential regression-based similarity measurement (eRSM) method, which can rationally transform data into evidence. It then combines such evidence to conclude the similarities between materials, considering their physical properties. To evaluate the eRSM, we used two material datasets, including 3d transition metal-4f rare-earth binary and quaternary high-entropy alloys with target properties, Curie temperature, and magnetization. Based on the information obtained on the similarities between the materials, a clustering technique is applied to learn the cluster structures of the materials that facilitate the interpretation of the mechanism. The unsupervised learning experiments demonstrate that the obtained similarities are applicable to detect anomalies and appropriately identify groups of materials whose properties correlate differently with their compositions. Furthermore, significant improvements in the accuracies of the predictions for the Curie temperature and magnetization of the quaternary alloys are obtained by introducing the similarities, with the reduction in mean absolute errors of 36% and 18%, respectively. The results show that the eRSM can adequately measure the similarities and dissimilarities between materials in these datasets with respect to mechanisms of the target properties.
UR - http://www.scopus.com/inward/record.url?scp=85147968182&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147968182&partnerID=8YFLogxK
U2 - 10.1063/5.0134999
DO - 10.1063/5.0134999
M3 - Article
AN - SCOPUS:85147968182
SN - 0021-8979
VL - 133
JO - Journal of Applied Physics
JF - Journal of Applied Physics
IS - 5
M1 - 053904
ER -