TY - GEN
T1 - Comparative Study of Outlier Detection Algorithms for Machine Learning
AU - Nazari, Zahra
AU - Yu, Seong Mi
AU - Kang, Dongshik
AU - Kawachi, Yousuke
N1 - Publisher Copyright: © 2018 ACM.
PY - 2018/6/27
Y1 - 2018/6/27
N2 - Outliers are unusual data points that are inconsistent with other observations. Human error, mechanical faults, fraudulent behavior, instrument error, and changes in the environment are among the causes of outliers. Several types of outlier detection algorithms have been developed, and a number of surveys and overviews have been carried out to identify their advantages and disadvantages. Multivariate outlier detection algorithms are widely used, so we concentrate on this type. In this work, we compare the effects of multivariate outlier detection algorithms on machine learning problems. For this purpose, three multivariate outlier detection algorithms, namely distance-based, statistical-based, and clustering-based, are evaluated. The Heart disease, Breast cancer, and Liver disorder benchmark datasets are used for the experiments. To assess the effectiveness of these algorithms, the datasets are classified by Support Vector Machines (SVM) before and after outlier detection. Finally, a comparative review is performed to identify the advantages and disadvantages of each algorithm and their respective effects on the accuracy of SVM classifiers.
AB - Outliers are unusual data points that are inconsistent with other observations. Human error, mechanical faults, fraudulent behavior, instrument error, and changes in the environment are among the causes of outliers. Several types of outlier detection algorithms have been developed, and a number of surveys and overviews have been carried out to identify their advantages and disadvantages. Multivariate outlier detection algorithms are widely used, so we concentrate on this type. In this work, we compare the effects of multivariate outlier detection algorithms on machine learning problems. For this purpose, three multivariate outlier detection algorithms, namely distance-based, statistical-based, and clustering-based, are evaluated. The Heart disease, Breast cancer, and Liver disorder benchmark datasets are used for the experiments. To assess the effectiveness of these algorithms, the datasets are classified by Support Vector Machines (SVM) before and after outlier detection. Finally, a comparative review is performed to identify the advantages and disadvantages of each algorithm and their respective effects on the accuracy of SVM classifiers.
KW - Machine Learning
KW - Outlier Detection
KW - Support Vector Machines
UR - http://www.scopus.com/inward/record.url?scp=85055525196&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055525196&partnerID=8YFLogxK
U2 - 10.1145/3234804.3234817
DO - 10.1145/3234804.3234817
M3 - Conference contribution
AN - SCOPUS:85055525196
T3 - ACM International Conference Proceeding Series
SP - 47
EP - 51
BT - ICDLT 2018 - 2018 2nd International Conference on Deep Learning Technologies
PB - Association for Computing Machinery
T2 - 2nd International Conference on Deep Learning Technologies, ICDLT 2018
Y2 - 27 June 2018 through 29 June 2018
ER -