1. D2.6: Ontology of licencing, ownership and conditions of use (V1.0)
- Author
- Daga, Enrico; Carvalho, Jason; Gurrieri, Marco; Scharnhorst, Andrea
- Abstract
In research workflows under the paradigm of Open Science (standing for reproducibility of research, open access to knowledge, and societal responsibility of research), licences play an increasing role. With digitisation and automatic information processing, licences also become important for guiding the actions of machines, for example in supporting the exploration and selection of resources and auditing their fair reuse. In the context of Polifonia, we deal primarily with licences that come with content provided in the public sphere by cultural heritage institutions. But we are also dealing with other source material, for instance information scraped from websites, and we produce and re-use software which also comes with a licence, such as the resources catalogued by the musoW registry of musical resources on the Web.
There are various issues when it comes to licences: (1) there is a large variety of licences and copyright statements used in the domain of musical content; (2) licence information is not always added to metadata, or not added in a standardised way, but is often 'hidden' in plain text on websites; (3) licences regulating access to and use of a web service (e.g. a repository) and licences regulating access to and use of the content provided via that web service (e.g. datasets in a repository) are often entangled; (4) a given data collection may come with several, sometimes mutually contradictory, pieces of licence information.
In this deliverable, we focus on the problem of extracting licence information from Web resources. More specifically, we look into the coverage of licence metadata in data registries such as musoW, a catalogue in which all main data components used by Polifonia are registered, alongside a large number of other sources. We set up pipelines to check for licence information and, where possible, to enrich it by text-mining the original websites/sources to which the catalogue refers. We do so with the aid of Large Language Mode
- Published
- 2024