1. From calibration to parameter learning: Harnessing the scaling effects of big data in geoscientific modeling
- Author
Chaopeng Shen, Ming Pan, Jiangtao Liu, Dapeng Feng, Kathryn Lawson, Hylke E. Beck, Yuan Yang, and Wen-Ping Tsai
- Subjects
Scheme (programming language); FOS: Computer and information sciences; Computer Science - Machine Learning; Water resources; Process (engineering); Science; Big data; General Physics and Astronomy; FOS: Physical sciences; Machine learning; General Biochemistry, Genetics and Molecular Biology; Article; Machine Learning (cs.LG); Generalizability theory; Differentiable function; Scaling; Multidisciplinary; Deep learning; General Chemistry; Computational Physics (physics.comp-ph); Environmental sciences; Environmental science; Artificial intelligence; Hydrology; Physics - Computational Physics; Coherence (physics)
- Abstract
The behaviors and skills of models in many geosciences (e.g., hydrology and ecosystem sciences) strongly depend on spatially varying parameters that need calibration. A well-calibrated model can reasonably propagate information from observations to unobserved variables via model physics, but traditional calibration is highly inefficient and results in non-unique solutions. Here we propose a novel differentiable parameter learning (dPL) framework that efficiently learns a global mapping between inputs (and optionally responses) and parameters. Crucially, dPL exhibits beneficial scaling curves not previously demonstrated to geoscientists: as training data increases, dPL achieves better performance, more physical coherence, and better generalizability (across space and uncalibrated variables), all with orders-of-magnitude lower computational cost. We demonstrate examples that learned from soil moisture and streamflow, where dPL drastically outperformed existing evolutionary and regionalization methods, or required only ~12.5% of the training data to achieve similar performance. The generic scheme promotes the integration of deep learning and process-based models, without mandating reimplementation.

Much effort is invested in calibrating model parameters for accurate outputs, but established methods can be inefficient and generic. By learning from big datasets, a new differentiable framework for model parameterization outperforms state-of-the-art methods and produces more physically coherent results, while using a fraction of the training data, computational power, and time. The method promotes a deep integration of machine learning with process-based geoscientific models.
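The core idea in the abstract, learning one global mapping from site inputs to model parameters instead of calibrating each site separately, can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the "process-based model" is a hypothetical one-parameter linear reservoir (`process_model`), the mapping is a single sigmoid unit rather than a deep network, the data are synthetic, and the gradients are written out by hand instead of using an automatic-differentiation library.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def process_model(k, storage):
    # Hypothetical differentiable process model: outflow = k * storage.
    return k * storage


def make_sites(rng, n_sites=20, n_steps=10, w_true=1.5, b_true=-0.5):
    # Synthetic sites: each site's true parameter k depends on one attribute.
    sites = []
    for _ in range(n_sites):
        attr = rng.uniform(-1.0, 1.0)
        k_true = sigmoid(w_true * attr + b_true)
        storages = [rng.uniform(0.5, 2.0) for _ in range(n_steps)]
        observed = [process_model(k_true, s) for s in storages]
        sites.append((attr, storages, observed))
    return sites


def train_global_mapping(sites, epochs=3000, lr=1.0):
    # Learn k = sigmoid(w * attr + b) for ALL sites at once by minimizing
    # the squared error of the process-model output (gradient descent).
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        count = 0
        for attr, storages, observed in sites:
            k = sigmoid(w * attr + b)
            dk_dz = k * (1.0 - k)  # derivative of the sigmoid
            for s, q_obs in zip(storages, observed):
                err = process_model(k, s) - q_obs
                # Chain rule: d(err^2)/dz = 2 * err * d(model)/dk * dk/dz
                g = 2.0 * err * s * dk_dz
                grad_w += g * attr
                grad_b += g
                count += 1
        w -= lr * grad_w / count
        b -= lr * grad_b / count
    return w, b


def demo():
    rng = random.Random(0)
    w, b = train_global_mapping(make_sites(rng))
    unseen_attr = 0.3  # a site that was never used in training
    k_pred = sigmoid(w * unseen_attr + b)
    k_true = sigmoid(1.5 * unseen_attr - 0.5)
    return k_pred, k_true
```

Calling `demo()` returns the parameter predicted for a site that was not used in training alongside the value used to generate the synthetic data. Because the mapping is shared across sites, a parameter estimate is available wherever the attribute is known, which is the sense in which such a mapping can generalize across space.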
- Published
- 2021