Detection of Rare Events: Cluster Based Preprocessing of the Training Set: The Case on Complaints for Invoice Time Series

Authors :
Huseyin Carpanali
Ayse Humeyra Bilge
Arif Selcuk Ogrenci
Tarkan Ozmen
Ayse Tosun
Kubra Cakar
Source :
American Scientific Research Journal for Engineering, Technology, and Sciences; Vol. 97 No. 1 (2024); 188-202; 2313-4402; 2313-4410
Publication Year :
2024

Abstract

Detection of rare events is a major problem when dealing with unbalanced data. In the application of machine learning tools, data is split into training and test samples, and preprocessing is applied to the training set with the aim of obtaining a more balanced sample. In this paper we discuss preprocessing methods applied to heterogeneous data clustered with respect to expected anomaly types. We propose a method for deciding on oversampling and undersampling from each cluster, based on the variability of the items in each cluster, using Principal Component Analysis. The method is applied to the problem of detecting anomalies in a time series of invoices, with an average complaint rate on the order of 10^-4.
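
The abstract does not state the exact rule that maps per-cluster variability to a sampling decision, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that variability is measured as the total variance of a cluster (the sum of its PCA explained variances) and that a fixed sample budget is split across clusters in proportion to that score. The names `pca_variances`, `variability_score`, `allocate_targets`, `resample`, and the toy clusters are all hypothetical.

```python
import numpy as np

def pca_variances(X):
    """Return the PCA explained variances (covariance eigenvalues), largest first."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Singular values s of the centered data give the eigenvalues as s**2 / (n - 1).
    s = np.linalg.svd(centered, compute_uv=False)
    return s ** 2 / (X.shape[0] - 1)

def variability_score(X):
    """Assumed score: total variance, i.e. the sum of all PCA explained variances."""
    return float(pca_variances(X).sum())

def allocate_targets(clusters, budget):
    """Split a total sample budget across clusters in proportion to their scores."""
    scores = {name: variability_score(X) for name, X in clusters.items()}
    total = sum(scores.values())
    return {name: max(1, round(budget * s / total)) for name, s in scores.items()}

def resample(X, target, rng):
    """Undersample without replacement, or oversample with replacement, to `target` rows."""
    X = np.asarray(X)
    n = X.shape[0]
    idx = rng.choice(n, size=target, replace=target > n)
    return X[idx]

# Toy data (not from the paper): a large low-variance cluster and a small high-variance one.
rng = np.random.default_rng(0)
clusters = {
    "tight": rng.normal(0.0, 1.0, size=(500, 3)),
    "spread": rng.normal(0.0, 2.0, size=(20, 3)),
}
targets = allocate_targets(clusters, budget=200)
balanced = {name: resample(X, targets[name], rng) for name, X in clusters.items()}
```

Under these assumptions the large, homogeneous cluster is undersampled and the small, more variable cluster is oversampled. Other variability measures, such as the number of principal components needed to reach a given share of explained variance, would fit the same structure.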

Details

Database :
OAIster
Journal :
American Scientific Research Journal for Engineering, Technology, and Sciences; Vol. 97 No. 1 (2024); 188-202; 2313-4402; 2313-4410
Notes :
application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1439337506
Document Type :
Electronic Resource