1. Workflow and Utilities for Research Data Management in Applied Plasma Physics
- Author
- Nick Plathe, Hans Höft, Steffen Franke, Ihda Chaerony Siffa, and Markus M. Becker
- Subjects
- Data Science, Research Data Management, Applied Plasma Physics
- Abstract
With the ever-increasing size of data and metadata sets in applied plasma physics, there is a need for robust, reliable and reproducible methods and workflows that support researchers in their work. The tracking and storage of metadata has become increasingly important in recent years, evolving from paper-based notes to computer-assisted collection and storage using modern, scalable and flexible approaches. However, despite several interoperable standard data formats, the FAIR data principles and a growing number of tools, collecting and maintaining data and metadata remains a lengthy and complicated process. This is due not only to vendor-dependent, non-standard data formats and complex datasets, but also to the differing needs that arise in the course of laboratory work, reflected in different experimental setups, internal and external conditions, and the equipment used and installed, even within the same laboratory. To address these challenges, in the present contribution we propose several applications assembled into an exemplary research data management workflow, which is currently under development. The aim of this workflow is to simplify the annotation of data obtained from electrical measurements using an oscilloscope, a frequently used diagnostic technique in applied plasma physics. The workflow is implemented in Python and uses JSON Schema to gather and organise metadata. Since relevant metadata is partly stored in the data files generated by the oscilloscopes and by other devices that may be used, a modularly structured library providing read functions for raw and converted oscilloscope data from different manufacturers is used to capture and organise the metadata. Furthermore, JSON Schema validation is applied to ensure compliant metadata structures, so that the generated metadata can be further processed in a simple way.
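The modular reader library and the structure check described above could be organised roughly as follows. This is only a minimal sketch under stated assumptions: the registry, the function names (`register_reader`, `read_metadata`, `missing_fields`), the file suffixes, the assumed header layout and the metadata keys are all hypothetical and not taken from the actual workflow. The required-field check is a simplified stand-in for full JSON Schema validation, not a replacement for it.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical registry: file suffix -> vendor-specific reader function.
READERS: Dict[str, Callable[[str], dict]] = {}


def register_reader(suffix: str):
    """Decorator that registers a reader function for a file suffix."""
    def decorator(func: Callable[[str], dict]) -> Callable[[str], dict]:
        READERS[suffix.lower()] = func
        return func
    return decorator


@register_reader(".csv")
def read_csv_header(text: str) -> dict:
    """Assumed layout: 'key,value' header lines, a blank line, then samples."""
    metadata = {}
    for line in text.splitlines():
        if not line.strip():
            break  # the blank line ends the header block
        key, _, value = line.partition(",")
        metadata[key.strip()] = value.strip()
    return metadata


@register_reader(".json")
def read_json_export(text: str) -> dict:
    """Assumed layout: a JSON object with a top-level 'metadata' entry."""
    return dict(json.loads(text).get("metadata", {}))


def read_metadata(filename: str, text: str) -> dict:
    """Dispatch to the reader registered for the file's suffix."""
    suffix = Path(filename).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"no reader registered for {suffix!r}")
    return READERS[suffix](text)


# Illustrative field names only; a real schema would define these.
REQUIRED_FIELDS = ("device", "sample_interval", "vertical_unit")


def missing_fields(metadata: dict, required=REQUIRED_FIELDS) -> List[str]:
    """Simplified stand-in for schema validation: list absent required keys."""
    return [name for name in required if name not in metadata]


# Usage: extract header metadata and report what still has to be annotated.
sample = "device,ExampleScope\nsample_interval,1e-9\n\n0.0,0.12\n"
meta = read_metadata("trace.csv", sample)
print(missing_fields(meta))  # prints ['vertical_unit']
```

Supporting a further vendor format in this sketch only requires one more decorated function; the dispatch and the check stay unchanged.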
The modularity makes the code easier to maintain and provides a straightforward way to add functionality for reading further vendor-dependent data or to change the way data and metadata are acquired. In order to document and manage missing metadata, a second library, responsible for the communication between the other workflow tools, provides a template storage and invokes software that supports editing the data based on the schema in use. Future versions of the workflow implementation will feature data wrangling and analysis as well. Finally, a third library provides capabilities to store the data either locally or in an electronic laboratory notebook. The standardised device data are intended to be retrieved from either local storage or an electronic laboratory notebook. All of these functions are controlled and executed via a graphical user interface, but can also be used from the command line. This enables further automation and integration of the provided functionality into existing solutions. It is planned to publish the workflow implementation as open source in the near future. Two dedicated tools for interacting with metadata stored in the JSON file format are part of the presented workflow. Both tools are also written in Python and can be used as standalone applications in a local environment. Both feature a plain user interface that hides technical details from the user in order to simplify working with metadata, while structural concepts are retained. The first tool is a metadata editor based on JSON Schema templates. Information can be entered and edited through a table-like structure, which most end users are familiar with. The second tool finds, indexes and searches metadata stored on a local hard drive, with an emphasis on usability and (re-)findability of data files described by metadata. Several search modes are intended to provide access to and organisation of experimental data together with their metadata.
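To illustrate what indexing and searching locally stored JSON metadata with several search modes might look like, here is a small sketch using only the Python standard library. The function names (`flatten`, `build_index`, `search`), the dotted key notation and the three modes (`"key"`, `"value"`, `"any"`) are assumptions made for illustration; the abstract does not specify how the actual tool is implemented or which modes it offers.

```python
import json
from pathlib import Path


def flatten(obj, prefix=""):
    """Yield (dotted_key, value) pairs for every leaf of a nested JSON value."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for pos, value in enumerate(obj):
            yield from flatten(value, f"{prefix}{pos}.")
    else:
        yield prefix[:-1], obj  # drop the trailing dot


def build_index(root):
    """Map each readable JSON file below root to its flattened metadata."""
    index = {}
    for path in sorted(Path(root).rglob("*.json")):
        try:
            content = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed files
        index[str(path)] = dict(flatten(content))
    return index


def search(index, term, mode="any"):
    """Return paths whose metadata contains term (case-insensitive).

    mode "key" searches key names, "value" searches values, "any" both.
    """
    if mode not in ("key", "value", "any"):
        raise ValueError(f"unknown search mode: {mode!r}")
    term = term.lower()
    hits = []
    for path, entries in index.items():
        for key, value in entries.items():
            in_key = mode in ("key", "any") and term in key.lower()
            in_value = mode in ("value", "any") and term in str(value).lower()
            if in_key or in_value:
                hits.append(path)
                break  # one match per file is enough
    return hits
```

A call such as `search(build_index("measurements"), "argon", mode="value")` would then list every metadata file below the hypothetical `measurements` directory in which some value contains "argon".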
- Published
- 2022