Regression models tolerant to massively missing data: a case study in solar radiation nowcasting.

Authors :
Žliobaitė, I.
Hollmén, J.
Junninen, H.
Source :
Atmospheric Measurement Techniques Discussions; 2014, Vol. 7 Issue 7, p7137-7174, 38p
Publication Year :
2014

Abstract

Statistical models for environmental monitoring rely heavily on automatic data acquisition systems that use various physical sensors. Sensor readings are often missing for extended periods, while model outputs must remain continuously available in real time. Using a case study in solar radiation nowcasting, we investigate how to deal with massively missing data (around 50% of the time some data are unavailable) in such situations. Our goal is to analyze the characteristics of the missing data and to recommend a strategy for deploying regression models that remain robust when data are massively missing. We seek a single model that performs well at all times, with and without data gaps. Because outputs must be provided instantaneously and with minimal energy consumption for computing in the data streaming setting, we dismiss computationally demanding data imputation methods and resort to simple mean replacement. We use an established strategy for comparing different regression models, which makes it possible to determine how many missing sensor readings can be tolerated before model outputs become obsolete. We experimentally analyze the accuracy and robustness to missing data of seven linear regression models and recommend regularized PCA regression. We also recommend following our established guideline when training regression models, so that the models themselves are robust to missing data. [ABSTRACT FROM AUTHOR]
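
Illustration (not part of the original record): the abstract names two techniques, mean replacement of missing sensor readings and regularized PCA regression, without giving their formulation. The sketch below shows one plausible way to combine them in Python with NumPy. The function names `fit_pca_ridge` and `predict`, the parameters `n_components` and `ridge`, and the synthetic data are all assumptions made for illustration; the authors' actual models and settings may differ.

```python
import numpy as np


def fit_pca_ridge(X, y, n_components=2, ridge=1.0):
    """Fit ridge regression on the leading principal components of X.

    Hypothetical helper: missing readings (NaN) are replaced by the
    per-sensor training mean before the components are computed.
    """
    mean = np.nanmean(X, axis=0)                 # per-sensor means, ignoring gaps
    Xc = np.where(np.isnan(X), mean, X) - mean   # mean replacement, then centering
    # Principal directions from the SVD of the centered training data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    W = vt[:n_components].T                      # shape (n_sensors, n_components)
    Z = Xc @ W                                   # component scores
    y_mean = y.mean()
    # Ridge (L2-regularized) least squares in component space
    A = Z.T @ Z + ridge * np.eye(n_components)
    beta = np.linalg.solve(A, Z.T @ (y - y_mean))
    return {"mean": mean, "W": W, "beta": beta, "y_mean": y_mean}


def predict(model, X):
    """Predict with mean replacement applied to any missing readings."""
    Xf = np.where(np.isnan(X), model["mean"], X)
    return (Xf - model["mean"]) @ model["W"] @ model["beta"] + model["y_mean"]


# Synthetic stand-in for sensor data (assumed, not from the study)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = X_train @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=200)
X_train[rng.random(X_train.shape) < 0.1] = np.nan   # knock out about 10% of readings

model = fit_pca_ridge(X_train, y_train)
predictions = predict(model, X_train)
```

A property of this sketch worth noting: when every sensor reading in a row is missing, mean replacement makes the centered input zero, so the prediction falls back to the training mean of the target. A model of this kind therefore always returns an output, which matches the abstract's requirement that outputs stay continuously available.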

Details

Language :
English
ISSN :
1867-8610
Volume :
7
Issue :
7
Database :
Complementary Index
Journal :
Atmospheric Measurement Techniques Discussions
Publication Type :
Academic Journal
Accession number :
97447159
Full Text :
https://doi.org/10.5194/amtd-7-7137-2014