Setting up an Interdisciplinary Data Infrastructure: Why Cooperation between Domain Experts and Computer Scientists Matters - An Experience Report from the GFBio Project
- Source :
- Biodiversity Information Science and Standards 1: e20198
- Publication Year :
- 2017
- Publisher :
- Pensoft Publishers, 2017.
Abstract
- The German Federation for Biological Data (GFBio; Diepenbroek et al. 2014) is implementing a national infrastructure for the preservation, integration, and publication of biological data collected in German research projects. GFBio is built upon an archive infrastructure comprising nine data centers, including PANGAEA and the major German Natural Science Collections (German Federation for Biological Data (GFBio) 2017a). Creating and running GFBio requires close collaboration within a highly interdisciplinary consortium. Bringing together expertise from collections, scientists in the relevant fields, biodiversity informaticians, and computer scientists proved to be essential for designing and building this system. GFBio is currently in its second funding phase. Essential services required for the operation of the future infrastructure have been successfully implemented. The realized technologies and tools use globally accepted standards as well as innovative concepts, e.g., for data visualisation or semantic integration. A portal (https://www.gfbio.org) provides a common point of access to all GFBio services: data submission, data discovery, data visualisation and analysis, a terminology service, and a help desk. In addition, archived research data is shared with international information infrastructures such as the Global Biodiversity Information Facility (GBIF) and the Biological Collection Access Service (BioCASE). As the data centers use different systems and thus internally build upon different data structures (German Federation for Biological Data (GFBio) 2017b), the search functionality integrated in the portal is a good example of the collaboration between teams of different expertise. Since the aim was to provide an integrated, faceted search, it was necessary to agree on common fields that can be used to feed the facets.
Therefore, the GFBio data centers agreed on using ABCD 2.06 (Access to Biological Collection Data) as a common standard and specified thirty elements for data exchange. Here, it was essential to bring together (1) domain experts for defining which facets they consider useful for an effective search, (2) computer scientists for providing the implementation based on Elasticsearch (Elasticsearch 2017), (3) biodiversity informaticians for defining mappings between different standards, and (4) data curators from the GFBio data centers and long-term repositories for negotiating the set of mandatory fields. The starting point for broader research data management workflows was derived from high-quality data provided via publishing pipelines established at each data center. With that, primary collection and research data are available with metadata and data units according to the ABCD community standard and are ready to be reused following the FAIR data principles (Wilkinson et al. 2016): Findable, Accessible, Interoperable, Reusable. Consequently, interdisciplinary cooperation is the GFBio data portal's measure of success.
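As a rough illustration of why agreed common fields are needed to feed facets, the following minimal Python sketch maps records from two data centers with different internal field names onto a shared set of fields and then counts facet values. It is not GFBio's implementation: the center names, native field names, common field names (`taxon`, `country`, `record_type`), and sample records are all hypothetical, and plain Python counting stands in for the Elasticsearch-based search described in the abstract.

```python
from collections import Counter

# Hypothetical mappings from each data center's native field names
# to a small set of agreed common fields (not the actual ABCD elements).
FIELD_MAPPINGS = {
    "center_a": {"sci_name": "taxon", "ctry": "country", "kind": "record_type"},
    "center_b": {"scientificName": "taxon", "countryName": "country", "basis": "record_type"},
}


def to_common(center, record):
    """Rename a record's native fields to the common fields; skip missing ones."""
    mapping = FIELD_MAPPINGS[center]
    return {common: record[native] for native, common in mapping.items() if native in record}


def facet_counts(records, field):
    """Count how often each value of a common field occurs across all records."""
    return Counter(r[field] for r in records if field in r)


# Made-up sample records in each center's native structure.
raw_records = [
    ("center_a", {"sci_name": "Quercus robur", "ctry": "Germany", "kind": "specimen"}),
    ("center_a", {"sci_name": "Fagus sylvatica", "ctry": "Germany"}),
    ("center_b", {"scientificName": "Quercus robur", "countryName": "Norway", "basis": "observation"}),
]

common_records = [to_common(center, rec) for center, rec in raw_records]
country_facet = facet_counts(common_records, "country")
taxon_facet = facet_counts(common_records, "taxon")

print(dict(country_facet))
print(dict(taxon_facet))
```

In this sketch, a facet can only be counted across centers once every center's records expose the same field name, which is the role the negotiated set of common and mandatory fields plays in the portal's search.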
- Subjects :
- 0301 basic medicine
010504 meteorology & atmospheric sciences
Computer science
Data management
01 natural sciences
Domain (software engineering)
03 medical and health sciences
data portal
Experience report
0105 earth and related environmental sciences
Research data
data curation
Spatial data infrastructure
Access to Biological Collection Data
Data curation
FAIR principles
business.industry
Data management plan
General Medicine
research data
Data science
data archiving
Data portal
030104 developmental biology
GFBio
data management
business
data standards
Details
- Language :
- English
- ISSN :
- 2535-0897
- Database :
- OpenAIRE
- Journal :
- Biodiversity Information Science and Standards
- Accession number :
- edsair.doi.dedup.....cc4b7586741c0c17936063cab03033eb