Why Openness and Reproducibility in Machine Learning Matter

Authors :: Messerschmidt, Lena
Schrader, Antonia C.
Steinbach, Peter
Ferguson, Lea Maria
Pampel, Heinz
Publication Year :: 2023
Publisher :: Zenodo, 2023.
Abstract: In research fields with complex scientific and technical infrastructures that generate large volumes of research data, Artificial Intelligence (AI) and Machine Learning (ML) methods are ubiquitous and offer promising possibilities for reusing these unique data treasures. In this endeavour, it is ever more important that these methods are trustworthy and reliable. This includes transparency and openness of the infrastructure, tools, workflows, and resources used, which enable (computational) reproducibility of research results. Research is reproducible when sufficient detail (about data, code, software, hardware, and implementation) is provided to run the analysis again and re-create the results. This is a key quality indicator in research and is in line with established principles of good scientific practice. In Data Science, reproducibility is an important requirement for the integrity of model results and for building trust in the rapidly expanding applications of AI systems. However, the field of Machine Learning (e.g. Large Language Models) is experiencing what has been called a reproducibility crisis: important results are difficult to reproduce. Experience reports describe many publications as not replicable, statistically insignificant, or suffering from narrative fallacy. The endeavour of Open Science, making scientific outputs as easily accessible as possible for everyone, is closely linked to the reproducibility of research. Applying open and reproducible practices in ML research has the potential to promote responsible use of AI by openly describing procedures and applications, thus promoting the overall integrity of scientific outputs and applications. Open and reproducible practices are therefore essential pillars of the democratization of AI sciences.
This presentation focuses on Open Science practices to improve reproducibility at Helmholtz and highlights their importance for robust, reproducible and trustworthy research in ML.