A Supervised Approach for Detection of Outliers in Healthcare Claims Data.

Authors :: Jyothi, P. Naga
Lakshmi, D. Rajya
Rao, K. V. S. N. Rama
Source :: Journal of Engineering Science & Technology Review. 2020, Vol. 13 Issue 1, p204-213. 10p.
Publication Year :: 2020
Abstract: Outlier detection is a fast-moving area in healthcare data analysis and a major concern for health insurance providers. Most Medicare data is real-world data. Outlier analysis plays a crucial role in data validity and reliability. Detecting outliers in medical data is a complex task because the data has many variables and is multivariate in nature. This paper presents a model-based approach in which outliers are detected and assigned labels. An outlier, or suspicious instance, is defined as an outcome that is expected to involve fraud. The methodology is carried out in two phases to develop a Supervised Outlier Detection Approach in healthcare Claims (SODAC). First, the data preprocessing phase performs feature selection, using the filter method and a set grouping hierarchy to select the best feature subset and to organize the features. The outlier detection phase then combines classic statistical and distance-based methods. To evaluate the distribution of the data, the Gaussian probability density function is applied to the data values. The distance-based approach reports its outputs as outlier codes. It partitions the input dataset, applies the statistical mean to each subset, and then uses a derived multi-aggregate metric to consolidate the data instances of the partitions (subsets). Distance-based outlier detection (dod) is performed by calculating the maximum distance between the input data objects (instances) and the inner average (statistical mean) measure of their neighborhood. A data object whose value goes beyond the maximum or minimum of the calculated measure is termed suspicious. This type of outlier detection is called indicative fraud potential. The approach performed relatively stably on large-scale data, as illustrated in the experiments on publicly available real-world data. [ABSTRACT FROM AUTHOR]
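The abstract does not give the exact SODAC formulas, so the following is only a minimal illustrative sketch of the general idea it describes (a Gaussian density, per-partition means consolidated into one measure, and a distance bound beyond which a value is flagged). The function names, the contiguous partitioning rule, the use of the mean of partition means as the consolidated measure, the threshold `k`, and the sample values are all assumptions made for illustration and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def flag_suspicious(values, n_partitions=2, k=2.0):
    """Return the values lying more than k standard deviations from the centre.

    Assumptions (not from the paper): contiguous partitions of equal size,
    the mean of the partition means as the consolidated centre, and
    k * sigma as the maximum allowed distance.
    """
    if not values:
        return []
    # Split the input into contiguous partitions (hypothetical rule).
    size = math.ceil(len(values) / n_partitions)
    partitions = [values[i:i + size] for i in range(0, len(values), size)]
    # Apply the mean to each partition, then consolidate into one centre.
    centre = mean(mean(p) for p in partitions)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing can be flagged
    max_distance = k * spread
    # A value beyond centre +/- max_distance is treated as suspicious.
    return [v for v in values if abs(v - centre) > max_distance]


claims = [100, 110, 95, 105, 98, 102, 1000, 101]  # made-up claim amounts
suspicious = flag_suspicious(claims)
```

In this toy input only the claim of 1000 lies outside the bound. A lower `k` flags more values, and `gaussian_pdf` can be used alongside the flag to see how unlikely a given value is under the fitted distribution.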