Comparative Study of Outlier Detection Algorithms for Machine Learning

Zahra Nazari, Seong Mi Yu, Dongshik Kang, Yousuke Kawachi

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

1 citation (Scopus)

Abstract

Outliers are unusual data points that are inconsistent with other observations. Human error, mechanical faults, fraudulent behavior, instrument error, and changes in the environment are among the causes of outliers. Several types of outlier detection algorithms have been developed, and a number of surveys and overviews have examined their advantages and disadvantages. Multivariate outlier detection algorithms are widely used, so we concentrate on this type. This work compares the effects of multivariate outlier detection algorithms on machine learning problems. For this purpose, three multivariate outlier detection algorithms are evaluated: distance-based, statistical-based, and clustering-based. Benchmark datasets for heart disease, breast cancer, and liver disorder are used in the experiments. To assess the effectiveness of these algorithms, the datasets are classified by Support Vector Machines (SVM) before and after outlier detection. Finally, a comparative review identifies the advantages and disadvantages of each algorithm and its effect on the accuracy of the SVM classifiers.
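The abstract does not give the exact detectors, parameters, or thresholds used in the study. As a rough illustration of the distance-based family only, the following minimal sketch scores each point by the distance to its k-th nearest neighbour and flags points whose score is far above the median. The function names `knn_distance_scores` and `flag_outliers`, the parameters `k` and `threshold`, and the toy data are all assumptions made for this example, not details from the paper.

```python
import math


def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(points, k=2, threshold=3.0):
    """Return indices whose score exceeds `threshold` times the median score.

    The median-multiple rule is an illustrative choice, not the paper's rule.
    """
    scores = knn_distance_scores(points, k)
    median = sorted(scores)[len(scores) // 2]
    return [i for i, s in enumerate(scores) if s > threshold * median]


# Toy data (hypothetical, not from the benchmark datasets):
# a tight cluster near (1, 1) plus one stray point.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8), (8.0, 8.0)]

outlier_idx = flag_outliers(data, k=2, threshold=3.0)
cleaned = [p for i, p in enumerate(data) if i not in outlier_idx]
```

In a before-and-after comparison of the kind the abstract describes, a classifier would be trained once on `data` and once on `cleaned`, and the two accuracies compared. The statistical-based and clustering-based detectors would replace only the scoring step.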

Original language: English
Title of host publication: ICDLT 2018 - 2018 2nd International Conference on Deep Learning Technologies
Publisher: Association for Computing Machinery
Pages: 47-51
Number of pages: 5
ISBN (electronic): 9781450364737
Publication status: Published - 27 Jun 2018
Event: 2nd International Conference on Deep Learning Technologies, ICDLT 2018 - Chongqing, China
Duration: 27 Jun 2018 to 29 Jun 2018

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2nd International Conference on Deep Learning Technologies, ICDLT 2018
Country/Territory: China
City: Chongqing
Period: 27 Jun 2018 to 29 Jun 2018
