Correlation visualization under missing values: a comparison between imputation and direct parameter estimation methods

Authors :
Pham, Nhat-Hao
Vo, Khanh-Linh
Vu, Mai Anh
Nguyen, Thu
Riegler, Michael A.
Halvorsen, Pål
Nguyen, Binh T.
Publication Year :
2023

Abstract

Correlation matrix visualization is essential for understanding the relationships between variables in a dataset, but missing data can pose a significant challenge in estimating correlation coefficients. In this paper, we compare the effects of various missing data methods on the correlation plot, focusing on two common missingness patterns: random and monotone. We aim to provide practical strategies and recommendations for researchers and practitioners who create and analyze correlation plots. Our experimental results suggest that, although imputation is commonly used to handle missing data, plotting the correlation matrix from imputed data can lead to significantly misleading inferences about the relationships between features. Based on its performance in the experiments, we recommend DPER, a direct parameter estimation approach, for plotting the correlation matrix.
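The contrast the abstract draws between imputing first and estimating the correlation directly can be illustrated with a small simulation. The sketch below is not the paper's experiment and does not implement DPER: it assumes synthetic bivariate Gaussian data with entries removed completely at random, uses column-mean imputation as the imputation baseline, and uses a pairwise-complete estimator as a simple stand-in for direct estimation. All function names and parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n=2000, rho=0.8, missing_rate=0.4, seed=0):
    """Draw correlated Gaussian pairs and blank out entries at random.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    data = rng.multivariate_normal(np.zeros(2), cov, size=n)
    # Each cell is removed independently (a "random" missingness pattern).
    mask = rng.random(data.shape) < missing_rate
    data_missing = data.copy()
    data_missing[mask] = np.nan
    return data, data_missing


def corr_mean_imputed(x):
    """Correlation matrix after filling each gap with its column mean."""
    col_means = np.nanmean(x, axis=0)
    filled = np.where(np.isnan(x), col_means, x)
    return np.corrcoef(filled, rowvar=False)


def corr_pairwise(x):
    """Correlation matrix from rows where both variables are observed.

    A simple direct estimator used here as a stand-in; it is NOT DPER.
    """
    p = x.shape[1]
    out = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            ok = ~np.isnan(x[:, i]) & ~np.isnan(x[:, j])
            r = np.corrcoef(x[ok, i], x[ok, j])[0, 1]
            out[i, j] = out[j, i] = r
    return out


if __name__ == "__main__":
    full, missing = simulate()
    print("complete data :", np.corrcoef(full, rowvar=False)[0, 1])
    print("mean-imputed  :", corr_mean_imputed(missing)[0, 1])
    print("pairwise      :", corr_pairwise(missing)[0, 1])
```

Under these assumptions, mean imputation tends to pull the estimated correlation toward zero, because each imputed value contributes nothing to the covariance, while the estimate from jointly observed rows stays near the value used to generate the data. A heatmap drawn from the imputed matrix would therefore understate the relationship, which is the kind of misleading plot the abstract warns about.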

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2305.06044
Document Type :
Working Paper