
Unlocking web archives through metadata, seed lists and derived data

Authors :
Luxembourg Centre for Contemporary and Digital History (C2DH) > Contemporary European History (EHI) [research center]
Clavert, Frédéric
Schafer, Valerie
Publication Year :
2022

Abstract

This presentation addresses the use, re-use, access and dissemination of data related to web archives. Web archives (Brügger, 2018) have for several years been in a hybrid position regarding access, depending on the institutions preserving them. While the Internet Archive has made its collections available online since 2001 through the Wayback Machine (but with limited features for scholars wishing to conduct a distant reading based on data, WARC files, etc.), most national libraries have allowed only onsite access because of authors' rights restrictions (and in some cases the framework of legal deposit), while starting to provide interesting metadata for research projects wishing to explore them. However, the situation is currently evolving through several research projects that give access to a vast amount of (international) metadata and datasets. Taking two research projects in progress as case studies, WARCnet and AWAC2, this paper presents the move towards the use of metadata and derived data related to huge collections of web archives of the COVID crisis. WARCnet (Web ARChive studies network researching web domains and events) is a network whose activities, funded by the Independent Research Fund Denmark | Humanities (grant no 9055-00005B), run from 2020 to 2023. The networking activities are guided by overarching research questions, one of them being “How transnational events developed on the European web?” (notably the COVID crisis, which is explored in WG2 (https://cc.au.dk/en/warcnet/working-groups)). AWAC2 (Analysing Web Archives of the COVID Crisis through the IIPC Novel Coronavirus dataset) is a project within the Archives Unleashed Cohort Program, which supports and facilitates research engagement with web archives. It aims to explore a unique collection of web material (https://archive-it.org/collections/13529) related to the pandemic, with contributions from over 30 members of IIPC (International Internet Preservation Consortium) as well as

Details

Database :
OAIster
Notes :
English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1333446868
Document Type :
Electronic Resource