Detection of Rare Events: Cluster Based Preprocessing of the Training Set: The Case on Complaints for Invoice Time Series
- Source :
- American Scientific Research Journal for Engineering, Technology, and Sciences; Vol. 97 No. 1 (2024); 188-202; 2313-4402; 2313-4410
- Publication Year :
- 2024
Abstract
- Detection of rare events is a major problem when dealing with unbalanced data. In the application of machine learning tools, data is split into training and test samples, and preprocessing is applied to the training set with the aim of obtaining a more balanced sample. In this paper we discuss preprocessing methods applied to heterogeneous data clustered with respect to expected anomaly types. We propose a method for deciding on oversampling and under-sampling from each cluster, based on the variability of the items in each cluster, using Principal Component Analysis. The method is applied to the problem of detecting anomalies in a time series of invoices, with an average rate of complaints of the order of 10⁻⁴.
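The abstract does not spell out the resampling rule, so the following is only a minimal sketch of one way the idea could look, not the authors' method. It assumes that a cluster's variability is scored by the variance captured by its leading principal components, and that a fixed sample budget is shared among clusters in proportion to that score. The function names, the `n_components` and `total` parameters, and the proportional allocation rule are all hypothetical.

```python
import numpy as np

def pca_variability(X, n_components=2):
    """Variance captured by the leading principal components of X (rows = items).

    Assumption: this score stands in for the 'variability' of a cluster.
    """
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data give the PCA eigenvalues: s**2 / (n - 1).
    s = np.linalg.svd(Xc, compute_uv=False)
    eigvals = s**2 / (len(X) - 1)
    return float(eigvals[:n_components].sum())

def resample_by_variability(clusters, total, n_components=2, seed=0):
    """Resample each cluster to a size proportional to its PCA variability.

    Assumption (hypothetical rule): clusters smaller than their target are
    oversampled with replacement; larger ones are undersampled without it.
    Each cluster needs at least two items and some nonzero variance.
    """
    rng = np.random.default_rng(seed)
    scores = {k: pca_variability(X, n_components) for k, X in clusters.items()}
    norm = sum(scores.values())
    out = {}
    for k, X in clusters.items():
        target = max(1, round(total * scores[k] / norm))
        replace = target > len(X)  # True means oversampling
        idx = rng.choice(len(X), size=target, replace=replace)
        out[k] = X[idx]
    return out
```

Under this rule, a small but highly variable cluster grows while a large, tightly packed cluster shrinks. Other allocation rules, such as weighting by the number of components needed to explain a given share of variance, would fit the abstract's description equally well.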
Details
- Database :
- OAIster
- Journal :
- American Scientific Research Journal for Engineering, Technology, and Sciences; Vol. 97 No. 1 (2024); 188-202; 2313-4402; 2313-4410
- Notes :
- application/pdf, English
- Publication Type :
- Electronic Resource
- Accession number :
- edsoai.on1439337506
- Document Type :
- Electronic Resource