1. Increasing the Reproducibility and Replicability of Supervised AI/ML in the Earth Systems Science by Leveraging Social Science Methods.
- Author
Wirz, Christopher D., Sutter, Carly, Demuth, Julie L., Mayer, Kirsten J., Chapman, William E., Cains, Mariana Goodall, Radford, Jacob, Przybylo, Vanessa, Evans, Aaron, Martin, Thomas, Gaudet, Lauriana C., Sulia, Kara, Bostrom, Ann, Gagne, David John, Bassill, Nick, Schumacher, Andrea, and Thorncroft, Christopher
- Subjects
EARTH system science, SUPERVISED learning, ARTIFICIAL intelligence, SOCIAL science research, ARTIFICIAL hands
- Abstract
Artificial intelligence (AI) and machine learning (ML) pose a challenge for achieving science that is both reproducible and replicable. The challenge is compounded in supervised models that depend on manually labeled training data, as they introduce additional decision-making and processes that require thorough documentation and reporting. We address these limitations by providing an approach to hand labeling training data for supervised ML that integrates quantitative content analysis (QCA), a method from social science research. The QCA approach provides a rigorous and well-documented hand labeling procedure to improve the replicability and reproducibility of supervised ML applications in Earth systems science (ESS), as well as the ability to evaluate them. Specifically, the approach requires (a) the articulation and documentation of the exact decision-making process used for assigning hand labels in a "codebook" and (b) an empirical evaluation of the "reliability" of the hand labelers. In this paper, we outline the contributions of QCA to the field, along with an overview of the general approach. We then provide a case study to further demonstrate how this framework has been, and can be, applied when developing supervised ML models for applications in ESS. With this approach, we provide an actionable path forward for addressing ethical considerations and goals outlined by recent AGU work on ML ethics in ESS.

Plain Language Summary: Artificial intelligence and machine learning can make it hard to do science in a way that can be repeated. This can mean redoing a study in the exact same way to see if you can get the same or similar results (reproducibility) or trying to use the same study design on a new problem to see if the results are the same or similar (replicability). These types of scientific repetition are important for developing robust knowledge, but are hard to do with certain types of machine learning that rely on data that were categorized by researchers. The researchers have to make decisions and categorize their data, which the machine learning algorithm then uses as a guide to make its own decisions. Generally, there is not enough information shared by the researchers about how these decisions were made to repeat the science or evaluate how good it is. In this paper, we provide a way to address these shortcomings. The approach and example we offer illustrate how to (a) create a rulebook that can be shared for how to make decisions and (b) quantitatively measure how consistent the researchers are at using that rulebook to make their decisions.

Key Points: We provide a rigorous hand labeling procedure to improve the replicability and reproducibility of supervised machine learning (ML). Our case study and step-by-step guide clearly outline how the procedure can be applied. The procedure is an actionable path forward for addressing ethical considerations and goals for ML development in Earth systems science. [ABSTRACT FROM AUTHOR]
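The "reliability" evaluation described in the abstract is typically quantified with an intercoder agreement statistic. The sketch below is not from the paper: the label values, variable names, and the choice of Cohen's kappa via scikit-learn are illustrative assumptions, showing one way two hand labelers' decisions on the same samples following a shared codebook could be compared.

```python
# Illustrative sketch (hypothetical data): quantifying inter-labeler reliability
# for hand-labeled training data using raw agreement and Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary labels assigned independently by two coders to the same
# samples under a shared codebook (e.g., 1 = "event present", 0 = "event absent").
labeler_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
labeler_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Raw percent agreement: fraction of samples where the two coders agree.
agreement = sum(a == b for a, b in zip(labeler_a, labeler_b)) / len(labeler_a)

# Cohen's kappa corrects raw agreement for agreement expected by chance;
# values near 1 indicate high reliability, values near 0 indicate chance-level.
kappa = cohen_kappa_score(labeler_a, labeler_b)

print(f"Percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```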
- Published
- 2024