
Generating virtual samples to improve learning performance in small datasets with non-linear and asymmetric distributions.

Authors :
Lin, Liang-Sian
Lin, Yao-San
Li, Der-Chiang
Source :
Neurocomputing. Sep 2023, Vol. 548.
Publication Year :
2023

Abstract

• The small dataset problem is an important issue in manufacturing and academia.
• A new Newton-VSG approach generates virtual samples with non-linear distributions.
• The suggested method's efficacy is verified using two real datasets.
• A paired Wilcoxon test elucidates the significance of differences among four VSG methods.

In today's highly competitive environment, modeling the relationship between inputs and outputs from limited data for a management system at the early stages is important but difficult. Virtual sample generation (VSG) methods have been proposed in many studies to extract potential information and improve the prediction performance of learning models on small datasets. However, those studies generally must assume an underlying distribution, such as a linear triangular membership function or a Gaussian distribution, to generate virtual samples. Thus, previous VSG methods may not effectively improve learning performance when the assumed distribution is not flexible enough for small datasets. To address this issue, in this paper we propose a novel VSG method, called the Newton-VSG method, to generate virtual samples for small datasets with non-linear and asymmetric distributions. In the suggested method, we use Newton's method to estimate the minimum value of the data range and the shape of the data distribution. Further, we develop two deep learning models, a Siamese network (SN) model for screening virtual sample input values and a bagging auto-encoder (AE) model for predicting virtual sample outputs, to ensure the quality of the virtual samples. One real dataset of solidification cracking susceptibility test data and another real dataset obtained from the TFT-LCD process of a leading company in Taiwan were used to demonstrate the efficacy of the proposed method. Using partial least squares regression (PLSR) and back-propagation neural network (BPNN) predictive models, we compare the proposed method with three state-of-the-art VSG methods in terms of the mean absolute error (MAE) and the root mean squared error (RMSE). The experimental results demonstrate that the proposed method outperforms the other three VSG methods in prediction accuracy for small datasets. [ABSTRACT FROM AUTHOR]
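The abstract does not give the objective function the authors optimize, so the following is only a minimal sketch under assumptions: a generic scalar Newton iteration of the kind the method relies on (x_{k+1} = x_k - f(x_k)/f'(x_k)), together with the standard MAE and RMSE metrics used in the reported comparison. The function names and the example data are hypothetical, not taken from the paper.

```python
import numpy as np

def newton_root(f, df, x0, tol=1e-8, max_iter=100):
    """Generic Newton iteration: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

if __name__ == "__main__":
    # Illustrative only: Newton's method on f(x) = x^2 - 2 converges to sqrt(2).
    root = newton_root(lambda x: x ** 2 - 2, lambda x: 2 * x, x0=1.0)
    print(f"Newton root: {root:.6f}")

    # Hypothetical predictions from a small-dataset model, scored with MAE/RMSE.
    y_true = [3.1, 2.7, 4.0, 3.6]
    y_pred = [3.0, 2.9, 3.8, 3.7]
    print(f"MAE:  {mae(y_true, y_pred):.4f}")
    print(f"RMSE: {rmse(y_true, y_pred):.4f}")
```

In the paper itself, the Newton iteration is applied to estimate the lower bound and shape of the small-sample data distribution; the root-finding example above only illustrates the numerical machinery, not the authors' specific objective.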

Details

Language :
English
ISSN :
0925-2312
Volume :
548
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
164857487
Full Text :
https://doi.org/10.1016/j.neucom.2023.126408