
Setting up an Interdisciplinary Data Infrastructure: Why Cooperation between Domain Experts and Computer Scientists Matters - An Experience Report from the GFBio Project

Authors :
Jana Hoffmann
Felicitas Löffler
Dagmar Triebel
Birgitta König-Ries
Falko Glöckler
Anton Güntsch
Robert Huber
Janine Felden
Source :
Biodiversity Information Science and Standards 1: e20198
Publication Year :
2017
Publisher :
Pensoft Publishers, 2017.

Abstract

The German Federation for Biological Data (GFBio; Diepenbroek et al. 2014) is implementing a national infrastructure for the preservation, integration, and publication of biological data collected in German research projects. GFBio is built upon an archive infrastructure comprising nine data centers, including PANGAEA and the major German Natural Science Collections (German Federation for Biological Data (GFBio) 2017a). Creating and running GFBio requires close collaboration within a highly interdisciplinary consortium. Bringing together expertise from collections, scientists in the relevant fields, biodiversity informaticians, and computer scientists proved to be essential for designing and building this system. GFBio is currently in its second funding phase. Essential services required for the operation of the future infrastructure have been successfully implemented. The realized technologies and tools use globally accepted standards as well as innovative concepts, e.g., for data visualisation or semantic integration. A portal (https://www.gfbio.org) provides a common point of access to all GFBio services: data submission, data discovery, data visualisation and analysis, a terminology service, and a help desk. In addition, archived research data is shared with international information infrastructures such as the Global Biodiversity Information Facility (GBIF) and the Biological Collection Access Service (BioCASE). As the data centers use different systems and thus internally build upon different data structures (German Federation for Biological Data (GFBio) 2017b), the search functionality integrated in the portal is a good example of the collaboration between teams of different expertise. Since the aim was to provide an integrated, faceted search, it was necessary to agree on common fields that can be used to feed the facets.
Therefore, the GFBio data centers agreed on using ABCD 2.06 (Access to Biological Collection Data) as a common standard and specified thirty elements for data exchange. Here, it was essential to bring together (1) domain experts for defining which facets they consider useful for an effective search, (2) computer scientists for providing the implementation based on Elasticsearch (Elasticsearch 2017), (3) biodiversity informaticians for defining mappings between different standards and (4) data curators from the GFBio data centers and long-term repositories for negotiating the set of mandatory fields. The starting point for broader research data management workflows was derived from high-quality data provided via publishing pipelines established at each data center. With that, primary collection and research data are available with metadata and data units according to the ABCD community standard and are ready to be reused following the FAIR data principles (Wilkinson et al. 2016): Findable, Accessible, Interoperable, Re-usable. Consequently, interdisciplinary cooperation is the GFBio data portal's measure of success.
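The idea of feeding a faceted search from agreed common fields can be illustrated with a minimal Python sketch. It is not the GFBio implementation: the data center names, the source field names, and the shared facet fields (taxon, country, data_center) are hypothetical and are not the thirty ABCD elements the data centers agreed on. The sketch renames center-specific fields to shared ones, counts facet values locally, and builds a request body that uses Elasticsearch's terms aggregation, which is the usual mechanism for facet counts there.

```python
from collections import Counter

# Hypothetical per-data-center mappings onto shared facet fields.
# These names are illustrative only; they are not GFBio's agreed ABCD elements.
FIELD_MAPPINGS = {
    "center_a": {"scientificName": "taxon", "countryName": "country"},
    "center_b": {"species": "taxon", "nation": "country"},
}


def to_common(center, record):
    """Rename a center-specific record's fields to the shared facet fields."""
    mapping = FIELD_MAPPINGS[center]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    common["data_center"] = center
    return common


def facet_counts(records, field):
    """Count how many records carry each value of one facet field."""
    return Counter(r[field] for r in records if field in r)


def terms_aggregation(fields, size=10):
    """Build an Elasticsearch request body with one terms aggregation per facet.

    "size": 0 at the top level asks for aggregation results only, no hits.
    """
    return {
        "size": 0,
        "aggs": {f: {"terms": {"field": f, "size": size}} for f in fields},
    }


# Made-up example records from two centers with different internal structures.
raw = [
    ("center_a", {"scientificName": "Apis mellifera", "countryName": "Germany"}),
    ("center_b", {"species": "Apis mellifera", "nation": "France"}),
    ("center_b", {"species": "Bombus terrestris", "nation": "Germany"}),
]
records = [to_common(center, record) for center, record in raw]
```

Calling facet_counts(records, "taxon") on these records groups the two "Apis mellifera" entries together even though they came from differently structured sources, which is the point of agreeing on common fields before building the facets.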

Details

Language :
English
ISSN :
2535-0897
Database :
OpenAIRE
Journal :
Biodiversity Information Science and Standards
Accession number :
edsair.doi.dedup.....cc4b7586741c0c17936063cab03033eb