
Cluster-Based Improved Isolation Forest

Authors :
Chen Shao
Xusheng Du
Jiong Yu
Jiaying Chen
Source :
Entropy, Vol 24, Iss 5, p 611 (2022)
Publication Year :
2022
Publisher :
MDPI AG, 2022.

Abstract

Outlier detection is an important research direction in data mining. The Isolation Forest algorithm divides the features of a data set at random, which leads to unstable detection results and low efficiency. To address this problem, an algorithm that combines clustering with Isolation Forest, CIIF (Cluster-based Improved Isolation Forest), is proposed. CIIF first clusters the data set with the k-means method, selects a specific cluster based on the clustering results to construct a selection matrix, and implements the algorithm's selection mechanism through that matrix; it then builds multiple isolation trees. Finally, an outlier score is calculated for each sample from its average search length across the isolation trees, and the Top-n objects with the highest outlier scores are regarded as outliers. In comparative experiments against six algorithms on eleven real data sets, the CIIF algorithm shows better performance. Compared with the Isolation Forest algorithm, the average AUC (area under the ROC curve) of the proposed CIIF algorithm is improved by 7%.
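
The final scoring step in the abstract (average search length per sample, then Top-n ranking) can be illustrated with a minimal sketch. It assumes the normalization used by the standard Isolation Forest, s = 2^(-E[h(x)] / c(n)); the abstract does not state whether CIIF uses exactly this formula. The k-means clustering and selection-matrix steps are left out because the abstract does not specify them. All function names and the sample depths are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """c(n): expected path length of an unsuccessful binary-search-tree
    lookup over n points, as used by the standard Isolation Forest."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_depth: float, subsample_size: int) -> float:
    """Standard Isolation Forest score (assumed here, not confirmed for CIIF).
    A shorter average search length gives a score closer to 1."""
    return 2.0 ** (-mean_depth / average_path_length(subsample_size))


def top_n_outliers(depths_per_sample, subsample_size, n):
    """depths_per_sample[i] lists the search length of sample i in each tree.
    Returns the indices of the n highest-scoring samples and all scores."""
    scores = [
        anomaly_score(sum(depths) / len(depths), subsample_size)
        for depths in depths_per_sample
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n], scores


if __name__ == "__main__":
    # Hypothetical search lengths for 4 samples across 3 isolation trees.
    depths = [[8, 9, 7], [2, 1, 3], [8, 8, 8], [3, 2, 2]]
    outliers, scores = top_n_outliers(depths, subsample_size=256, n=2)
    print(outliers)
    print([round(s, 3) for s in scores])
```

Samples that are isolated after only a few splits (short search lengths) receive the highest scores, so they rank first in the Top-n list.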

Details

Language :
English
ISSN :
10994300
Volume :
24
Issue :
5
Database :
Directory of Open Access Journals
Journal :
Entropy
Publication Type :
Academic Journal
Accession number :
edsdoj.38a473f6192e44ff88975e83ecf2dd5b
Document Type :
article
Full Text :
https://doi.org/10.3390/e24050611