1. Weighted Machine Learning for Spatial-Temporal Data
- Author
Mahdi Hashemi and Hassan A. Karimi
- Subjects
Atmospheric Science, inductive learning, 010504 meteorology & atmospheric sciences, Computer science, Feature vector, autocorrelation, Geophysics. Cosmic physics, Sample (statistics), 010502 geochemistry & geophysics, Machine learning, computer.software_genre, 01 natural sciences, spatial data, Relevance (information retrieval), Computers in Earth Sciences, TC1501-1800, Analytical learning, temporal data, 0105 earth and related environmental sciences, business.industry, QC801-809, Autocorrelation, Process (computing), Temporal database, Support vector machine, Ocean engineering, machine learning, Kernel (statistics), Artificial intelligence, business, computer
- Abstract
Applying machine learning techniques to spatial-temporal data raises the question of how the recorded location and time of training samples should contribute to the training and testing process. Prior knowledge of how spatial-temporal phenomena are autocorrelated cannot be properly captured by machine learning techniques that either ignore location and time altogether or treat them as input features. The latter approach also slightly increases the sparseness of data in the feature space and adds free parameters to the predictor, thus demanding larger training datasets. We use prior knowledge about spatial-temporal autocorrelation to determine how relevant each training sample is, given its spatial and temporal distances to the irresponsive (unlabeled) sample. Weighted machine learning techniques use this prior knowledge by taking the relevance of each training sample to the irresponsive sample into account as that training sample's weight. The proposed approach overcomes the aforementioned issues by enriching the training process with prior knowledge about spatial-temporal autocorrelation. Because the spatial-temporal weight of a training sample depends on the irresponsive sample's location and time, the machine needs to be trained separately for each irresponsive sample. However, we show that in practice using only a small subset of training samples with the largest spatial-temporal weights not only reduces the training time but also results in the best accuracy in most cases.
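The workflow described in the abstract (weight each training sample by its spatial and temporal distance to the unlabeled sample, keep only the highest-weighted subset, and train a separate weighted model per unlabeled sample) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the Gaussian kernel, the bandwidths `h_space` and `h_time`, the subset size `k`, the `Sample` type, and the use of a one-feature weighted linear regression as the learner are all assumptions chosen for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical training record: location (x, y), time t, one feature, one label."""
    x: float
    y: float
    t: float
    feature: float
    label: float


def st_weight(s, qx, qy, qt, h_space=1.0, h_time=1.0):
    """Spatial-temporal weight of training sample s relative to a query.

    Assumption: a product of Gaussian kernels over spatial and temporal
    distance; the abstract does not specify the weighting function.
    """
    d_space = math.hypot(s.x - qx, s.y - qy)
    d_time = abs(s.t - qt)
    return math.exp(-(d_space / h_space) ** 2) * math.exp(-(d_time / h_time) ** 2)


def predict(train, qx, qy, qt, q_feature, k=3, h_space=1.0, h_time=1.0):
    """Fit a weighted linear model for one unlabeled sample and predict its label."""
    # Weight every training sample by its relevance to this query.
    weighted = [(st_weight(s, qx, qy, qt, h_space, h_time), s) for s in train]
    # Keep only the k samples with the largest spatial-temporal weights.
    weighted.sort(key=lambda ws: ws[0], reverse=True)
    top = weighted[:k]

    w_sum = sum(w for w, _ in top)
    if w_sum == 0.0:
        raise ValueError("all weights underflowed to zero; increase the bandwidths")

    # Weighted least squares for label = intercept + slope * feature.
    f_bar = sum(w * s.feature for w, s in top) / w_sum
    l_bar = sum(w * s.label for w, s in top) / w_sum
    var = sum(w * (s.feature - f_bar) ** 2 for w, s in top)
    if var == 0.0:
        # No spread in the feature: fall back to the weighted mean label.
        return l_bar
    cov = sum(w * (s.feature - f_bar) * (s.label - l_bar) for w, s in top)
    slope = cov / var
    return l_bar + slope * (q_feature - f_bar)
```

Because the weights depend on the query's location and time, `predict` refits the model on every call; restricting the fit to the `k` highest-weighted samples is what keeps that per-query cost small.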
- Published
- 2020