1. A Novel Strategy for Automatic Selection of Cross‐Basin Data to Improve Local Machine Learning‐Based Runoff Models.
- Author
Nai, Congyi; Liu, Xingcai; Tang, Qiuhong; Liu, Liu; Sun, Siao; and Gaffney, Paul P. J.
- Subjects
RUNOFF models, DEEP learning, DATA integrity, WATERSHEDS, RUNOFF
- Abstract
Previous studies have shown that regional deep learning (DL) models can improve runoff prediction by leveraging large hydrological datasets. However, training a regional DL model on all available data without screening may degrade local performance. This study focuses on building enhanced local models from cross‐basin data. To this end, we propose an approach that employs a novel training strategy to optimize DL model training for specific basins. The approach measures the impact of any one basin's gradient on the loss of the basin of interest, providing insight into the relationships between basins. The approach was validated on 531 basins from the CAMELS dataset. Results suggest that local performance degradation is common in regional models, and that imbalanced data are likely to let a specific pattern dominate the entire regional model. Compared with a regional model trained simply on all basins, our models achieve a median Nash‐Sutcliffe efficiency (NSE) that is 0.031 higher. For some dry basins, the increase in NSE can exceed 0.2. Our findings indicate that this novel DL strategy can significantly improve model performance in specific basins using large hydrological datasets while mitigating local performance loss.

Plain Language Summary: In deep learning, incorporating more data into training typically yields a more capable model. Conventionally, large datasets have been used to train regional models intended to predict rainfall‐runoff processes across all basins. However, such regional models often perform worse when applied to individual basins. This degradation can be attributed to the extraction of overly general features, which leads to a loss of specificity. In this paper, our objective is to harness information from a large dataset to build a more robust local model. We introduce a method that autonomously learns the similarities in rainfall‐runoff behavior among basins and then uses these learned similarities to select the data that benefit training of the local model. Our results demonstrate that this strategy can significantly enhance runoff prediction, particularly in arid basins.

Key Points:
- Training a regional DL model on all data without screening may degrade local performance
- Proper selection of training data is crucial for enhancing DL model training for individual basins, especially arid basins
- Data from different basins might act as mutual noise during the training process

[ABSTRACT FROM AUTHOR]
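The abstract describes scoring other basins by how their gradient affects the loss of the basin of interest, keeping only helpful data, and reporting results in NSE. The sketch below is a hypothetical, heavily simplified illustration of that idea, not the authors' implementation. It assumes a linear model in place of the DL model, uses a first-order Taylor estimate (minus the learning rate times the dot product of the two basins' loss gradients) as the influence measure, and uses synthetic data; every function and basin name here is invented for illustration. Only the NSE formula is the standard definition.

```python
# Hypothetical toy sketch, NOT the paper's code: a linear model stands in for the
# DL model, and a first-order gradient dot product stands in for the authors'
# basin-impact measure. All names and data are illustrative assumptions.

def predict(w, xs):
    """Linear model y_hat = w . x (stand-in for the DL runoff model)."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]

def mse_grad(w, xs, ys):
    """Gradient of the mean squared error of the linear model with respect to w."""
    grad = [0.0] * len(w)
    for y_hat, x, y in zip(predict(w, xs), xs, ys):
        for j, xj in enumerate(x):
            grad[j] += 2.0 * (y_hat - y) * xj / len(xs)
    return grad

def influence(w, source, target, lr=0.01):
    """First-order estimate of L_target(w - lr * g_source) - L_target(w),
    i.e. -lr * (g_target . g_source). Negative means the source basin helps."""
    g_s = mse_grad(w, *source)
    g_t = mse_grad(w, *target)
    return -lr * sum(a * b for a, b in zip(g_t, g_s))

def select_basins(w, basins, target_id, lr=0.01):
    """Ids of other basins whose gradient step is estimated to reduce the target loss."""
    target = basins[target_id]
    return sorted(bid for bid, data in basins.items()
                  if bid != target_id and influence(w, data, target, lr) < 0)

def fit(datasets, dim, lr=0.01, steps=2000):
    """Plain gradient descent on the pooled (xs, ys) data of the given basins."""
    xs = [x for d in datasets for x in d[0]]
    ys = [y for d in datasets for y in d[1]]
    w = [0.0] * dim
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, mse_grad(w, xs, ys))]
    return w

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mean_obs) ** 2 for o in obs)

if __name__ == "__main__":
    xs = [[1.0], [2.0], [3.0]]  # synthetic single-feature "forcing" values
    basins = {
        "target": (xs, [2.0, 4.0, 6.0]),
        "similar": (xs, [2.1, 4.2, 6.3]),      # response close to the target's
        "opposite": (xs, [-2.0, -4.0, -6.0]),  # conflicting response ("noise")
    }
    chosen = select_basins([0.0], basins, "target")
    print(chosen)  # -> ['similar']
    w_sel = fit([basins["target"]] + [basins[b] for b in chosen], dim=1)
    w_all = fit(list(basins.values()), dim=1)
    obs = basins["target"][1]
    print(nse(obs, predict(w_sel, xs)), nse(obs, predict(w_all, xs)))
```

With these toy data the conflicting basin is screened out, and the model fit on the target plus the selected basin scores a far higher NSE on the target than the model fit on all three basins. This only qualitatively mirrors the local degradation the abstract attributes to unscreened regional training.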
- Published
2024