
Explaining parameter-based models through Executable Research Compendia: support or barrier?

Authors :
Konkol, Markus
Publication Year :
2022
Publisher :
Open Science Framework, 2022.

Abstract

Introduction

In many scientific articles, the computational results are based on models, e.g. for calculating the damage costs caused by a flood event. To make these models applicable to other scenarios, they are often based on parameters that describe, for example, the properties of a flood event, such as its velocity and duration. Because the parameters affect the outcome of the model and thus the conclusions, reviewers and readers need to understand how these parameters work in order to evaluate the quality of the model or to reuse the source code. Figures are frequently used to support this task, but explaining the behavior of the parameters through static text and figures is challenging for paper authors. If the source code is open and reproducible, readers can investigate the code themselves and rerun the analysis with different parameter values. However, this is time-consuming and becomes more complex as the number of parameters grows: readers need to find out which parameters are used in the model and how they can be changed. More advanced approaches like the Executable Research Compendium (ERC) support this task with the help of so-called bindings, fine-grained links between the source code lines and data subsets that are needed to reproduce a specific computational result, e.g., a figure (Nüst et al. 2017, Konkol et al. 2019). These links can be used to create interactive figures that allow readers to change parameter values using, for example, a slider or radio buttons. Despite this benefit, researchers are often reluctant to publish their research in this format since they fear additional work. Furthermore, it remains open whether ERCs ultimately help authors communicate results so that readers understand them better, in less time, and with more confidence than with papers that are supplemented by a folder containing the code and data.
To shed light on these issues, the experiment reported in this study examines the following research questions. Compared to static articles supplemented by source code and data: (RQ1) How effective are ERCs in helping readers understand the model parameters underlying the computational results reported in a scientific paper? (RQ2) How confident do readers feel when answering questions while working with ERCs? (RQ3) How much time do readers need for studying ERCs? And finally, (RQ4) which of the two approaches do readers prefer?

Selecting and preparing a use case

Use case

We selected the paper “INSYDE: a synthetic, probabilistic flood damage model based on explicit cost analysis” (Dottori et al. 2016) since it is open access, is supplemented by open and reproducible data and R code, includes a computational analysis that is visualized in a figure, and is based on parameters. Changing these parameters affects the outcome of the analysis. Furthermore, the analysis and the figure are computed quickly, which avoids long waiting times after changing parameter values. The paper describes a model that calculates damage costs after a flood event based on several parameters, e.g. flood duration and water flow velocity. The authors discuss these parameters and suggest alternative values. Moreover, they report on a sensitivity analysis showing that the parameters have an important influence on the results.

Parameters

We prepared the paper for the study as follows: After downloading the code and the data and reproducing the analysis, we looked for parameters in the analysis and investigated how we could make them interactive (see Konkol et al. (2019) for further details). From the numerous parameters used in the analysis, we selected “duration” of the flood event, “velocity” of the water flow, and “sediment concentration” of the water.
We assumed these parameters to be of particular importance since they are mentioned in the corresponding figure caption, are part of a separate R file dedicated to parameters, and affect the damage calculation when changed within the range suggested in the paper. The fourth parameter mentioned in the caption, “water quality”, was not considered since it only indicates the presence of pollutants in the water and thus cannot be changed within a predefined range. Hence, it is not comparable to the other parameters.

To reduce the reading time to a level that can be handled in a study, we excluded the following sections from the text: Related Work, Discussion, Conclusion, sections reporting on the uncertainty of the analysis, parts of the abstract that refer to excluded sections, and citations. Although the section Sensitivity Analysis explains some of the model behavior, we also excluded it since we had no code to reproduce the resulting figure. The final text is available online (https://uni-muenster.sciebo.de/s/6UxNyvgG3Bb1n7U).

We also adapted the source code. We gave the parameters more meaningful names that reflect their actual purpose, e.g. “v” became “velocity”. Afterwards, we created two versions, the Executable Research Compendium (https://o2r.uni-muenster.de/#/erc/tRnj5) and the folder version (https://uni-muenster.sciebo.de/s/QptKFlUumKRVAXh). For the Executable Research Compendium, we created an R Markdown document including the entire source code as well as the manuscript and bindings, and uploaded the ERC to the o2r platform. The folder contains the HTML as well as the PDF version of the manuscript and the attached materials, i.e., all datasets, R scripts, and the same R Markdown document that was used for the ERC. Except for the bindings, both versions are thus identical; however, the ERC is used in combination with the platform, whereas the folder is used locally.
The three selected parameters have the following properties and affect the damage cost calculation as follows:

The “duration” parameter is given in hours and the default value is 24. The range is indicated as 0 to any value larger than 0. Since 24 is the largest value we could find in the paper, we defined the range as 1-24. We thus implemented the interactive figure with a slider that ranges from 1 to 24 with a step size of 1, since only integer values were used in the text. Changing the duration from 1 to 12 hours while keeping the other values constant leads to no increase in damage costs. From 13 hours onward, the damage costs increase progressively.

The “velocity” parameter is given in meters per second and the default value is 0.5. The range is indicated as 0 to any value larger than 0. Since 2.0 is the largest value we could find in the paper, we defined the range as 0.0-2.0. We thus implemented the interactive figure with a slider that ranges from 0.0 to 2.0 with a step size of 0.1, since the values in the text were given with one digit after the decimal point. When the velocity is changed while the other values are kept constant, the first increase in damage costs occurs at 1 m/s. Further increases occur between 1.3 and 1.5 m/s as well as at 2 m/s.

The “sediment concentration” parameter is given in percent and the default value is 0.05. The range is indicated as 0-1. We thus implemented the interactive figure with a slider that ranges from 0.00 to 1.00 with a step size of 0.05. Increasing the sediment concentration while keeping the other values constant consistently results in higher damage costs.

The experimental setup is explained in the following sections.
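The slider settings described above can be summarized in a short sketch. It is written in Python for illustration only and is neither the study's R code nor part of the o2r platform; the class `SliderSpec`, the dictionary `SLIDERS`, and the unit strings are hypothetical names introduced here. Only the ranges, step sizes, and default values are taken from the text.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class SliderSpec:
    """Hypothetical description of one slider (not an o2r data structure)."""
    name: str
    unit: str
    minimum: Decimal
    maximum: Decimal
    step: Decimal
    default: Decimal

    def values(self) -> List[Decimal]:
        """Return every position the slider can take, minimum to maximum."""
        # Decimal arithmetic avoids the rounding drift that repeated
        # float additions of 0.1 or 0.05 would introduce.
        count = int((self.maximum - self.minimum) / self.step)
        return [self.minimum + i * self.step for i in range(count + 1)]


def spec(name: str, unit: str, lo: str, hi: str, step: str, default: str) -> SliderSpec:
    """Build a SliderSpec from plain strings to keep the table below readable."""
    return SliderSpec(name, unit, Decimal(lo), Decimal(hi), Decimal(step), Decimal(default))


# Ranges, step sizes, and defaults as stated in the text.
SLIDERS = {
    "duration": spec("duration", "h", "1", "24", "1", "24"),
    "velocity": spec("velocity", "m/s", "0.0", "2.0", "0.1", "0.5"),
    "sediment concentration": spec(
        "sediment concentration", "percent", "0.00", "1.00", "0.05", "0.05"
    ),
}

if __name__ == "__main__":
    for s in SLIDERS.values():
        positions = s.values()
        print(f"{s.name}: {len(positions)} positions, "
              f"{positions[0]} to {positions[-1]} {s.unit}, default {s.default}")
```

Listing the slider positions explicitly shows how many distinct parameter values a reader can select for each parameter, and checking that each default lies on its grid is a quick sanity test for such a configuration.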

Details

Database :
OpenAIRE
Accession number :
edsair.doi...........be1d5963c492e0221c67d62a7c09a2d0
Full Text :
https://doi.org/10.17605/osf.io/95tza