1. Research data sharing, reuse, and metrics: adoption and challenges across disciplines and repositories
- Author
- Khan, Nushrat and Thelwall, Mike
- Subjects
- data sharing, data reuse, data repositories, research data
- Abstract
- Data sharing is widely believed to be beneficial to science and is now supported by digitization and new online infrastructures for sharing datasets. Nevertheless, differences in research cultures and the sporadic development of data repositories, support services, guidelines, and policies have resulted in uneven data sharing and reuse practices. An overall understanding of the current situation is therefore needed to identify gaps and next steps. In response, using two case studies and two surveys, this dissertation explores the current landscape and identifies challenges within data sharing and reuse practices. The results demonstrate how present systems and policies could be modified to support and encourage these activities. The researcher survey found that the type and format of data produced, as well as systematic data sharing, varied between disciplines, with Physical Sciences and Earth and Planetary Sciences leading and Business and Economics, Engineering, and Medicine lagging in some respects. Surveys and observations were frequently produced in most fields, with samples and simulations being common in science and engineering and qualitative data being more prevalent in the social sciences, business, and humanities. Researchers who had prior data reuse experience shared data more frequently (56.8%, n=1,004) than those who only used their primary data for research (32.6%, n=396). The biodiversity case study and surveys show that secondary data are valuable for many purposes, but most researchers struggle to find datasets to reuse. Data citations can incentivize data sharing, although a lack of appropriate data citations and reliable technologies makes it difficult to track them efficiently. In biodiversity, where the sharing and reuse of open data via mature infrastructures is common, citing secondary datasets in references or data access statements has been increasing (48%, n=99). However, users simultaneously exploiting many data subsets in this field complicate the situation. This thesis makes recommendations for handling large numbers of biodiversity data subsets to attribute citations accurately. It also suggests further enhancements for the article-dataset linking service, Scholexplorer, to automatically capture such links. Based on responses from data repository managers, this research further identifies nine objectives for future repository systems. Specifically, 30% (n=34) of the surveyed managers would like integration and interoperability between data and systems, 19% (n=22) want better research data management tools, 16% (n=18) want tools that allow computation without downloading datasets, and 16% (n=18) want automated systems. It also makes 23 recommendations in three categories to support data sharing and promote further data reuse: 1) improved access and usability of data, as well as formal data citations; 2) improved search systems with suggested new features; and 3) cultural and policy-related issues around awareness and acceptance, incentives, collaboration, guidelines, and documentation. Finally, based on researcher feedback, this study proposes an alternative scoring model that combines a dataset quality score and a data reuse indicator and that can be incorporated in academic evaluation systems. The outcomes from this research will help funders, policymakers, and technology developers prioritize areas of improvement to incentivize data sharing and support data reuse with easily discoverable and usable data.
- Published
- 2021