An Iterative Missing-Value Handling Technique for Mitigating Performance Degradation in Machine Learning Models.

Authors :: 이종관; 이민우
Source :: Journal of the Korea Institute of Information & Communication Engineering; Apr2024, Vol. 28 Issue 4, p387-394, 8p
Publication Year :: 2024
Abstract :: Machine learning models are applied across diverse domains, and their performance depends heavily on the quality of the data used during training. However, real-world datasets contain some missing values because of limitations and errors in data collection methods, incomplete or inconsistent data-gathering processes, and human error during processing. Effective handling of missing values is therefore essential to model performance. Missing data is commonly handled either by deleting the records that contain missing values or by imputing them. Deletion is straightforward but loses information. Imputation, on the other hand, can reduce the variability of the dataset and distort correlations between variables. The proposed scheme reduces dimensionality using the variables that have no missing values and uses the result to estimate the missing values. Experiments show that the proposed scheme mitigates the performance degradation of various machine learning models compared with existing methods. [ABSTRACT FROM AUTHOR]
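
The idea stated in the abstract (reduce the fully observed variables to a lower dimension, then use the result to estimate missing values) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which reduction method, estimator, or iteration scheme the paper uses, so the choices here are assumptions: a single leading principal component found by power iteration, and a one-variable least-squares fit. The function names `first_component` and `impute_with_projection` and the toy data are hypothetical.

```python
import math


def first_component(rows, iters=200):
    """Return column means and the leading principal direction of `rows`.

    Assumption: one principal component, found by power iteration, stands in
    for the paper's (unspecified) dimensionality-reduction step.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Covariance matrix of the fully observed variables (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    # Asymmetric start vector; a start orthogonal to the leading direction
    # would make power iteration miss it, which this sketch does not guard against.
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return means, v


def impute_with_projection(complete, target):
    """Fill None entries of `target` from a 1-D projection of `complete`.

    complete: rows holding only variables without missing values.
    target:   one variable with None marking missing entries.
    """
    means, v = first_component(complete)
    # Project every row onto the leading component (the reduced representation).
    scores = [sum((r[j] - means[j]) * v[j] for j in range(len(v)))
              for r in complete]
    # Fit target ~ score by least squares on the rows where target is observed.
    obs = [(s, t) for s, t in zip(scores, target) if t is not None]
    s_mean = sum(s for s, _ in obs) / len(obs)
    t_mean = sum(t for _, t in obs) / len(obs)
    var = sum((s - s_mean) ** 2 for s, _ in obs)
    slope = (sum((s - s_mean) * (t - t_mean) for s, t in obs) / var
             if var else 0.0)
    # Keep observed values; predict the missing ones from the reduced score.
    return [t if t is not None else t_mean + slope * (s - s_mean)
            for s, t in zip(scores, target)]


# Toy data (hypothetical): two complete variables and one incomplete variable.
complete = [[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]]
target = [4, 7, None, 13, 16]
filled = impute_with_projection(complete, target)
print(filled)
```

In this toy case the incomplete variable is an exact linear function of the complete ones, so the estimate for the missing entry recovers the underlying value up to floating-point error. Real data would call for more components and a held-out evaluation against deletion and standard imputation baselines.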

Language :: Korean
ISSN :: 2234-4772
Volume :: 28
Issue :: 4
Database :: Complementary Index
Journal :: Journal of the Korea Institute of Information & Communication Engineering
Publication Type :: Academic Journal
Accession number :: 177021027
Full Text :: https://doi.org/10.6109/jkiice.2024.28.4.387