Local outlier factor for anomaly detection in HPCC systems.

Authors :
Adesh, Arya
G, Shobha
Shetty, Jyoti
Xu, Lili
Source :
Journal of Parallel & Distributed Computing. Oct2024, Vol. 192, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

• LOF is an unsupervised anomaly detection algorithm that mines anomalies by calculating the local density of data points relative to their neighborhood. In this work, the LOF algorithm was implemented using the ECL (Enterprise Control Language) programming language on the HPCC Systems (High-Performance Computing Cluster) platform, an open-source distributed computing platform.
• Improved LOF is a modified version of standard LOF, designed to handle datasets with duplicates. This work discusses the implementation of both standard LOF and improved LOF in HPCC Systems, applied to the credit card fraud detection and localization data for person activity datasets.
• Segmented k-d tree and unsegmented k-d tree are techniques proposed for neighbor search in a distributed system, with worst-case time complexity of O((MinPts * |D|) * log(|D|)), where |D| is the number of data points in the dataset and MinPts is the hyperparameter value.
• LOF was compared with other anomaly detection algorithms (COF, LoOP, and kNN) across 6 benchmark datasets on the HPCC Systems platform, demonstrating a favorable balance between execution time and precision in anomaly detection.
• The LOF implementation was compared across the big-data frameworks Spark, Hadoop, and HPCC Systems, revealing superior scalability and performance in HPCC Systems, especially with larger datasets and higher MinPts values.

Local Outlier Factor (LOF) is an unsupervised anomaly detection algorithm that finds anomalies by assessing the local density of a data point relative to its neighborhood. Anomaly detection is the process of finding anomalies in datasets. Anomalies in real-time datasets may indicate critical events such as bank fraud, data compromise, and network threats. This paper deals with the implementation of the LOF algorithm on the HPCC Systems platform, an open-source distributed computing platform for big data analytics. An improved LOF is also proposed, which efficiently detects anomalies in datasets rich in duplicates. The impact of varying hyperparameters on the performance of LOF is examined in HPCC Systems. This paper compares the performance of LOF with other algorithms (COF, LoOP, and kNN) over several datasets in HPCC Systems. Additionally, the efficacy of LOF is evaluated across the big-data frameworks Spark, Hadoop, and HPCC Systems by comparing their runtime performance. [ABSTRACT FROM AUTHOR]
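As an illustration of the local-density idea the abstract describes, the following is a minimal single-machine sketch of standard LOF in Python. It is not the paper's ECL implementation: it uses a brute-force neighbor search instead of the k-d tree techniques mentioned above, and it does not reproduce the paper's improved LOF. The function name `lof_scores` and the parameter name `min_pts` are assumptions made for this example.

```python
import math


def lof_scores(points, min_pts):
    """Return one LOF score per point (standard definition, brute force).

    Scores near 1 mean a point is about as dense as its neighbors;
    scores well above 1 suggest an outlier.
    """
    n = len(points)
    if not 1 <= min_pts < n:
        raise ValueError("min_pts must satisfy 1 <= min_pts < len(points)")

    # For each point: (distance, index) pairs to all other points, sorted.
    dists = [
        sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for i in range(n)
    ]

    # k-distance: distance to the min_pts-th nearest neighbor.
    k_dist = [d[min_pts - 1][0] for d in dists]

    # Neighborhood: every point within the k-distance (ties included).
    neigh = [
        [(dist, j) for dist, j in d if dist <= k_dist[i]]
        for i, d in enumerate(dists)
    ]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(p, o) = max(k_dist(o), dist(p, o)).
    lrd = []
    for i in range(n):
        total = sum(max(k_dist[j], dist) for dist, j in neigh[i])
        # Assumption of this sketch: a zero sum (only possible with
        # duplicate points) is mapped to infinite density.
        lrd.append(len(neigh[i]) / total if total > 0 else math.inf)

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] / lrd[i] for _, j in neigh[i]) / len(neigh[i])
        for i in range(n)
    ]


# Four tightly grouped points and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = lof_scores(points, min_pts=2)
```

In this sketch the four corner points receive scores close to 1, while the distant point receives a much larger score. When many points coincide, the reachability sum can become zero and the density infinite, which is the kind of duplicate-heavy case the abstract says the improved LOF is designed to handle.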

Details

Language :
English
ISSN :
07437315
Volume :
192
Database :
Academic Search Index
Journal :
Journal of Parallel & Distributed Computing
Publication Type :
Academic Journal
Accession number :
178536421
Full Text :
https://doi.org/10.1016/j.jpdc.2024.104923