1. Conceptual design blueprint for the DiSSCo digitization infrastructure - DELIVERABLE D8.1.
- Author
-
Hardisty, Alex, Saarenmaa, Hannu, Casino, Ana, Dillen, Mathias, Gödderz, Karsten, Groom, Quentin, Hardy, Helen, Koureas, Dimitris, Nieva de la Hidalga, Abraham, Paul, Deborah L., Runnel, Veljo, Vermeersch, Xavier, van Walsum, Myriam, and Willemse, Luc
- Subjects
DIGITIZATION ,GENETIC barcoding ,SCIENTIFIC community ,DATA analysis - Abstract
DiSSCo, the Distributed System of Scientific Collections, is a pan-European Research Infrastructure (RI) mobilising, unifying bio- and geo-diversity information connected to the specimens held in natural science collections and delivering it to scientific communities and beyond. Bringing together 120 institutions across 21 countries and combining earlier investments in data interoperability practices with technological advancements in digitisation, cloud services and semantic linking, DiSSCo makes the data from natural science collections available as one virtual data cloud, connected with data emerging from new techniques and not already linked to specimens. These new data include DNA barcodes, whole genome sequences, proteomics and metabolomics data, chemical data, trait data, and imaging data (Computer-assisted Tomography (CT), Synchrotron, etc.), to name but a few; and will lead to a wide range of end-user services that begins with finding, accessing, using and improving data. DiSSCo will deliver the diagnostic information required for novel approaches and new services that will transform the landscape of what is possible in ways that are hard to imagine today. With approximately 1.5 billion objects to be digitised, bringing natural science collections to the information age is expected to result in many tens of petabytes of new data over the next decades, used on average by 5,000 - 15,000 unique users every day. This requires new skills, clear policies and robust procedures and new technologies to create, work with and manage large digital datasets over their entire research data lifecycle, including their long-term storage and preservation and open access. Such processes and procedures must match and be derived from the latest thinking in open science and data management, realising the core principles of 'findable, accessible, interoperable and reusable' (FAIR). Synthesised from results of the ICEDIG project ("Innovation and Consolidation for Large Scale Digitisation of Natural Heritage", EU Horizon 2020 grant agreement No. 777483) the DiSSCo Conceptual Design Blueprint covers the organisational arrangements, processes and practices, the architecture, tools and technologies, culture, skills and capacity building and governance and business model proposals for constructing the digitisation infrastructure of DiSSCo. In this context, the digitisation infrastructure of DiSSCo must be interpreted as that infrastructure (machinery, processing, procedures, personnel, organisation) offering Europe-wide capabilities for mass digitisation and digitisation-ondemand, and for the subsequent management (i.e., curation, publication, processing) and use of the resulting data. The blueprint constitutes the essential background needed to continue work to raise the overall maturity of the DiSSCo Programme across multiple dimensions (organisational, technical, scientific, data, financial) to achieve readiness to begin construction. Today, collection digitisation efforts have reached most collection-holding institutions across Europe. Much of the leadership and many of the people involved in digitisation and working with digital collections wish to take steps forward and expand the efforts to benefit further from the already noticeable positive effects. The collective results of examining technical, financial, policy and governance aspects show the way forward to operating a large distributed initiative i.e., the Distributed System of Scientific Collections (DiSSCo) for natural science collections across Europe. Ample examples, opportunities and need for innovation and consolidation for large scale digitisation of natural heritage have been described. The blueprint makes one hundred and four (104) recommendations to be considered by other elements of the DiSSCo Programme of linked projects (i.e., SYNTHESYS+, COST MOBILISE, DiSSCo Prepare, and others to follow) and the DiSSCo Programme leadership as the journey towards organisational, technical, scientific, data and financial readiness continues. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF