Data-Defect Inspection With Kernel-Neighbor-Density-Change Outlier Factor.

Authors :
Cao, Hui
Ma, Rui
Ren, Hongliang
Ge, Shuzhi Sam
Source :
IEEE Transactions on Automation Science & Engineering. Jan2018, Vol. 15 Issue 1, p225-238. 14p.
Publication Year :
2018

Abstract

Data defects degrade data quality and the results of data-mining analysis. This paper presents a data-defect inspection method based on the kernel-neighbor-density-change outlier factor (KNDCOF). A kernel neighbor density is defined to represent the density of each object in the database, and the ascending distance series (ADS) of each object is calculated from the kernel distance between the object and its neighbors. The average density fluctuation (ADF) of the object is then defined as the weighted sum of the squared density differences between the object and the other objects in its ADS. Finally, the KNDCOF of the object is the ratio of the object's ADF to the average ADF of its neighbors. The KNDCOF value indicates the degree to which the object is an outlier. Experiments on three real data sets evaluate the effectiveness of the proposed method. The experimental results verify that the proposed method achieves higher-quality data-defect inspection and does not increase the time complexity.

Note to Practitioners: Data-defect inspection is an important data-preprocessing step for real industrial processes. This paper presents a data-defect inspection method that uses the kernel-neighbor-density-change outlier factor to identify outliers, and it addresses the challenges posed by the strong correlation and nonlinearity of industrial data. The proposed method calculates an outlier factor for each object, which quantifies how outlying the object is. The outlier factor is based on the density difference between the object and its neighbors: the larger the outlier factor of an object, the higher its outlierness. The proposed method could be widely used on complex industrial data sets with regions of different density. In the industrial field, engineers can handle objects with high outlier factor values according to the actual requirements. [ABSTRACT FROM AUTHOR]
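
The pipeline described in the abstract (kernel distance, ADS, kernel neighbor density, ADF, ratio) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so every concrete choice here is an assumption made for illustration: the Gaussian kernel and its `gamma` parameter, the density defined as the inverse mean kernel distance to the `k` nearest neighbors, and the `1/rank` weights in the ADF. The function names `kernel_distance` and `kndcof_sketch` are hypothetical. This is not the authors' implementation.

```python
import math


def kernel_distance(x, y, gamma=0.5):
    """Distance induced by a Gaussian kernel (assumed kernel choice).

    For k(x, y) = exp(-gamma * ||x - y||^2), the feature-space distance is
    sqrt(k(x, x) + k(y, y) - 2 k(x, y)) = sqrt(2 - 2 k(x, y)).
    """
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.sqrt(max(0.0, 2.0 - 2.0 * math.exp(-gamma * sq)))


def kndcof_sketch(points, k=3, gamma=0.5):
    """Return one outlier score per point; larger means more outlying."""
    n = len(points)
    eps = 1e-12  # guards against division by zero

    # Ascending distance series: each object's k nearest neighbors,
    # sorted by increasing kernel distance, stored as (distance, index).
    ads = []
    for i in range(n):
        dists = sorted(
            (kernel_distance(points[i], points[j], gamma), j)
            for j in range(n) if j != i
        )
        ads.append(dists[:k])

    # Kernel neighbor density (assumed form): inverse of the mean
    # kernel distance to the objects in the ADS.
    density = [k / (sum(d for d, _ in ads[i]) + eps) for i in range(n)]

    # Average density fluctuation: weighted sum of squared density
    # differences over the ADS. The 1/rank weights are an assumption.
    weights = [1.0 / (rank + 1) for rank in range(k)]
    total = sum(weights)
    adf = [
        sum(w * (density[i] - density[j]) ** 2
            for w, (_, j) in zip(weights, ads[i])) / total
        for i in range(n)
    ]

    # Outlier factor: the object's ADF divided by the mean ADF of its neighbors.
    return [
        adf[i] / (sum(adf[j] for _, j in ads[i]) / k + eps)
        for i in range(n)
    ]


# A tight cluster of five points plus one distant point (index 5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (3.0, 3.0)]
scores = kndcof_sketch(points, k=3)
```

In this toy data set the distant point receives the largest score, which matches the abstract's statement that a larger outlier factor indicates higher outlierness. The scores depend on `k`, `gamma`, and the assumed weighting, so they are not comparable with values reported in the paper.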

Details

Language :
English
ISSN :
15455955
Volume :
15
Issue :
1
Database :
Academic Search Index
Journal :
IEEE Transactions on Automation Science & Engineering
Publication Type :
Academic Journal
Accession number :
127154251
Full Text :
https://doi.org/10.1109/TASE.2016.2603420