1. Distribution‐Based Model Evaluation and Diagnostics: Elicitability, Propriety, and Scoring Rules for Hydrograph Functionals
- Author
- Vrugt, Jasper A.
- Abstract
Distribution forecasts over future quantities or events are routinely made in hydrology but usually traded for a (likelihood‐weighted) mean or median prediction to accommodate error measures or scoring functions such as the mean absolute error or mean squared error. A case in point is the so‐called Kling‐Gupta efficiency (KGE) of Gupta et al. (2009, https://doi.org/10.1016/j.jhydrol.2009.08.003) and improvements thereof (Lamontagne et al., 2020, https://doi.org/10.1029/2020wr027101), which have rapidly gained popularity among hydrologists as alternative scoring functions to the commonly used Nash and Sutcliffe (1970, https://doi.org/10.1016/0022-1694(70)90255-6) efficiency, but are equally exclusive in how they quantify model performance using only single‐valued output of the quantities of interest. This point‐valued mapping necessarily implies a loss of information about model performance. This paper advocates the use of probabilistic watershed model training, evaluation and diagnostics. Distribution evaluation opens a mature literature on scoring rules whose strong statistical underpinning provides, as we will demonstrate, the theory, context and guidelines necessary for the development of robust, information‐theoretically principled metrics for watershed signatures. These so‐called hydrograph functionals are scalar‐valued mappings of major behavioral watershed functions embodied in a strictly proper scoring rule. We discuss past developments that led to the current state of the art of distribution evaluation in hydrology and review scoring rules for dichotomous and categorical events, quantiles (intervals) and density forecasts. We are particularly concerned with elicitable functionals and scoring rule propriety, discuss the decomposition of scoring rules into sharpness, reliability and entropy terms, and present diagnostically appealing, strictly proper divergence scores of hydrograph functionals for flood frequency analysis, flow duration and recession curves.
The usefulness and power of distribution‐based model evaluation and diagnostics by means of scoring rules are demonstrated on simple illustrative problems and discharge distributions simulated with watershed models using random sampling and Bayesian model averaging. The presented theory (a) enables a more complete evaluation of distribution forecasts, (b) offers a statistically principled means for watershed model training, evaluation, diagnostics and selection using hydrograph functionals and/or extreme events and (c) provides a universal framework for metric development of watershed signatures, promoting metric standardization and reproducibility.
The past decades have witnessed an unbridled growth in goodness‐of‐fit metrics of hydrologic models. These metrics may satisfy the needs of hydrologists but lack conforming theory and principles. This state of affairs (a) elicits improper model training and evaluation, (b) provokes and supports misguided inferences, (c) impedes statistically principled uncertainty quantification, metric standardization and development of universal model benchmarks and (d) obfuscates determination of whether the model has finished learning. What is more, most hydrologic model evaluation metrics in use today are rather exclusive in how they quantify model performance using only single‐valued simulated output of the quantities of interest. Predictive distributions derived from (quasi‐)Bayesian methods or ensembles are usually traded for a (likelihood‐weighted) mean or median prediction to accommodate error measures (scoring functions) such as the mean absolute error. This implies a large loss of information. This paper develops a distribution‐based approach to hydrologic model evaluation and diagnostics. Distribution evaluation provides the theory and guidelines necessary for the development of robust, information‐theoretically principled metrics of watershed signatures.
These so‐called hydrograph functionals are scalar‐valued mappings of major behavioral watershed functions embodied in a strictly proper scoring rule. The hydrograph functionals offer a statistically principled means for hydrologic model evaluation, diagnostics and selection.
Scoring rules of hydrograph functionals provide an information‐theoretically principled means for watershed model training, evaluation, and diagnostics.
We present strictly proper (divergence) scores for flood frequency analysis, flow duration, and recession curves.
Propriety and elicitability offer useful working paradigms for metric development of hydrograph functionals.
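The information loss described above can be illustrated with a minimal sketch. It is not taken from the paper: the continuous ranked probability score (CRPS) is used here only as one familiar example of a strictly proper scoring rule, and the ensemble values, the observation and the function name `crps_ensemble` are hypothetical. Two ensemble forecasts with the same mean receive the same absolute error once collapsed to that mean, whereas the distribution-based score separates the sharp forecast from the overly wide one.

```python
import numpy as np


def crps_ensemble(members, obs):
    """Sample-based CRPS estimate, E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    members = np.asarray(members, dtype=float)
    # Mean absolute distance between ensemble members and the observation
    term_obs = np.mean(np.abs(members - obs))
    # Mean absolute distance between all pairs of ensemble members
    term_spread = np.mean(np.abs(members[:, None] - members[None, :]))
    return term_obs - 0.5 * term_spread


# Hypothetical observed discharge (units are arbitrary for this illustration)
obs = 10.0

# Two hypothetical ensembles with the same mean (11.0) but different spread
z = np.linspace(-2.0, 2.0, 41)
sharp = 11.0 + 0.5 * z
wide = 11.0 + 5.0 * z

# Point-valued evaluation: absolute error of the ensemble mean
abs_err_sharp = abs(sharp.mean() - obs)
abs_err_wide = abs(wide.mean() - obs)

# Distribution-based evaluation: CRPS of the full ensemble
crps_sharp = crps_ensemble(sharp, obs)
crps_wide = crps_ensemble(wide, obs)

print("absolute error of mean:", abs_err_sharp, abs_err_wide)
print("CRPS:", crps_sharp, crps_wide)
```

In this sketch the two absolute errors coincide, so the point-valued measure cannot distinguish the forecasts, while the CRPS assigns the lower (better) score to the sharper ensemble.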
- Published
- 2024