Drobac, Senka; Enqvist, Johanna; Leskinen, Petri; Wahjoe, Muhammad Faiz; Rantala, Heikki; Koho, Mikko; Pikkanen, Ilona; Jauhiainen, Iida; Tuominen, Jouni; Paloposki, Hanna-Leena; La Mela, Matti; Hyvönen, Eero; Scholger, Walter; Vogeler, Georg; Tasovac, Toma; Baillot, Anne; Raunig, Elisabeth; Scholger, Martina; Steiner, Elisabeth; Centre for Information Modelling; and Helling, Patrick
This paper describes the process of gathering, aggregating, harmonizing, and publishing epistolary metadata in collaboration with Finnish cultural heritage (CH) organizations, with the aim of creating an inclusive archive for bottom-up analyses of 19th-century epistolary culture in the Grand Duchy of Finland (1808/09–1917). The authors work in the digital humanities consortium project Constellations of Correspondence (CoCo) [1]. The unified metadata collections are harmonized, linked, enriched, and published both as a Linked Open Data (LOD) service and as a semantic web portal.

Several European digital humanities projects use well-curated metadata (detailed information about senders, recipients, dates, and places) from edited letter collections, such as CKCC [2], correspSearch [3, 4], Early Modern Letters Online (EMLO) [5, 6], Norkorr [7], and SKILLNET [8]. In our project, by contrast, most of the data come from unpublished collections scattered across different Finnish CH organizations.

Collaboration with these CH organizations is pivotal to the success of the project. It requires dialogue throughout the whole project period in the form of seminars and site visits, as well as shared blogs and newsletters, continuing after the organizations have provided their letter metadata. We have already seen that some participating organizations are prepared to clean their metadata or to catalogue previously uncatalogued archival material in order to provide more and better metadata for the project. We will discuss this two-way process using the Finnish National Gallery as a case study. An important challenge that has yet to be studied in depth is whether and how the CoCo project will be able to return the metadata to the CH organizations in an enriched format.

In the first phase of the project, we conducted a survey that was sent to over 100 CH organizations (ranging from small local museums to official central archives).
The paper describes how the information was collected and how the survey was constructed to give us sufficiently detailed information about the organizations' 19th-century collections and metadata formats. At the same time, we had to keep the questionnaire succinct to make answering as effortless as possible.

For the data processing, we began with more than 350 000 letters from eight different sources, each in its own digital format. Although the received data are mostly structured, we needed to parse running text to retrieve metadata in nearly every collection. Moreover, we had to analyze each dataset to identify possible structural errors. Furthermore, some records required Natural Language Processing to extract actor names (e.g., senders, recipients) in dictionary format. The most difficult task has been processing Word files that contain correspondence metadata in a variety of formats, which are easy for humans to understand but difficult to process computationally.

A harmonizing data model for epistolary metadata collections was developed; it builds on international standards such as CIDOC CRM to promote interoperability. The most central classes are Letter, Place, and Actor. Provenance and archival information are also included. Finally, the actor data are enriched by linking them to external databases such as Wikidata and the Finnish AcademySampo and BiographySampo. These external sources provide detailed biographical information, e.g., times and places of birth and death, name variations, occupations, and genealogical relationships. Information present in the letter metadata, such as actor names and times of sending and receiving, is used to match entities between our data and the external databases and, further, to reconcile actors across data sources.

References

[1] J. Tuominen, et al., Constellations of Correspondence: a linked data service and portal for studying large and small networks of epistolary exchange in the Grand Duchy of Finland, in: 6th Digital Humanities in Nordic and Baltic Countries Conference, 2022. URL: http://ceur-ws.org/Vol-3232/paper41.pdf.
[2] C. van den Heuvel, Mapping knowledge exchange in Early Modern Europe: Intellectual and technological geographies and network representations, International Journal of Humanities and Arts Computing 9 (2015) 95–114. doi:10.3366/ijhac.2015.0140.
[3] S. Dumont, S. Grabsch, J. Müller-Laackman, correspSearch – connect scholarly editions of correspondence (2.0.0) [web service], Berlin–Brandenburg Academy of Sciences and Humanities, 2021. URL: https://correspSearch.net.
[4] S. Dumont, correspSearch – connecting scholarly editions of letters, Journal of the Text Encoding Initiative (2016). doi:10.4000/jtei.1742.
[5] Early Modern Letters Online (EMLO). URL: http://emlo.bodleian.ox.ac.uk.
[6] H. Hotson, T. Wallnig (Eds.), Reassembling the Republic of Letters in the Digital Age: Standards, Systems, Scholarship, Göttingen University Press, 2019.
[7] A. Rockenberger, et al., Norwegian correspondences and linked open data, in: Proceedings of the Digital Humanities in the Nordic Countries 4th Conference, volume 2364 of CEUR Workshop Proceedings, 2019, pp. 365–375. URL: http://ceur-ws.org/Vol-2364/33_paper.pdf.
[8] Sharing Knowledge in Learned and Literary Networks – The Republic of Letters as a Pan-European Knowledge Society (SKILLNET). URL: https://skillnet.nl.