1. An Efficient Density-Based Local Outlier Detection Approach for Scattered Data
- Author
- Shubin Su, Rongbin Xu, Limin Xiao, Shupan Li, Li Ruan, Fei Gu, and Zhaokai Wang
- Subjects
General Computer Science; Computer science; Nearest neighbor search; 02 engineering and technology; 020204 information systems; Outlier detection; 0202 electrical engineering, electronic engineering, information engineering; General Materials Science; Cluster analysis; local outlier factor; neighborhood variance; Local outlier factor; Degree (graph theory); business.industry; General Engineering; Pattern recognition; Object (computer science); rough clustering; ComputingMethodologies_PATTERNRECOGNITION; Outlier; 020201 artificial intelligence & image processing; Anomaly detection; scattered dataset; lcsh:Electrical engineering. Electronics. Nuclear engineering; Artificial intelligence; Focus (optics); business; lcsh:TK1-9971
- Abstract
Since the local outlier factor was first proposed, a large family of local outlier detection approaches has been derived from it. Because the existing approaches focus only on the overall separation between an object and its neighbors and ignore the degree of dispersion among them, their precision degrades to varying degrees on scattered datasets. In addition, outliers make up a relatively small share of a dataset, yet the existing approaches compute the local outlier factor for every object during detection, which greatly reduces their efficiency. In this paper, we redefine the local outlier factor as a local deviation coefficient (LDC) that takes full advantage of the distribution of the object and its neighbors. We then propose a safe non-outlier elimination approach, rough clustering based on multi-level queries (RCMLQ), which preprocesses the dataset to eliminate as many non-outlier objects as possible. Finally, we propose an efficient local outlier detection approach, efficient density-based local outlier detection for scattered data (E2DLOS), based on the LDC and RCMLQ. RCMLQ greatly reduces the number of objects for which the local outlier factor must be computed, and the LDC is more sensitive to the degree of anomaly in scattered datasets, so E2DLOS improves on existing local outlier detection approaches in both time efficiency and detection accuracy. Experiments show that the LDC better reflects the true abnormality of the data in scattered datasets, and that RCMLQ can be used alongside traditional methods for speeding up nearest neighbor search, which further improves the efficiency of the E2DLOS algorithm by about 16%.
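For orientation, the baseline that the abstract contrasts against can be sketched as follows. This is the classical local outlier factor computed for every object, not the paper's LDC or RCMLQ, whose definitions are not given in this record. The function names `knn` and `lof_scores` are hypothetical, the neighbor search is brute force, and the sketch assumes the dataset contains no more than `k` exact duplicates of any point.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def lof_scores(points, k):
    """Classical LOF for every point (brute-force neighbour search).

    Assumption: fewer than k + 1 identical points, so no reachability
    sum is zero.
    """
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neigh]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]
```

Scores near 1 indicate an object about as dense as its neighbors, and larger scores indicate a more isolated object. Every object is scored here, which is the cost the abstract says a pre-filtering step such as RCMLQ is meant to reduce.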
- Published
- 2019