39 results on '"Ionov, Maxim"'
Search Results
2. Querying the Lexicon der indogermanischen Verben in the LiLa Knowledge Base: Two Use Cases
- Author
-
Chiarcos, Christian, Gkirtzou, Katerina, Ionov, Maxim, Khan, Fahad, McCrae, John P., Montiel-Ponsoda, Elena, Martín-Chozas, Patricia, Boano, Valeria Irene, Passarotti, Marco Carlo (ORCID:0000-0002-9806-7187), and Ginevra, Riccardo (ORCID:0000-0002-6731-6494)
- Abstract
This paper presents two use cases of the etymological data provided by the *Lexicon der indogermanischen Verben* (LIV) after their publication as Linked Open Data and their linking to the LiLa Knowledge Base (KB) of interoperable linguistic resources for Latin. The first part of the paper briefly describes the LiLa KB and its structure. Then, the LIV and the information it contains are introduced, followed by a short description of the ontologies and the extensions used for modelling the LIV's data and interlinking them to the LiLa ecosystem. The last section details the two use cases. The first case concerns the inflection types of the Latin verbs that reflect Proto-Indo-European stems, while the second one focusses on the Latin derivatives of the inherited stems. The results of the investigations are put in relation to current research topics in Historical Linguistics, demonstrating their relevance to the discipline.
- Published
- 2024
3. The MOLOR Lemma Bank: a New LLOD Resource for Old Irish
- Author
-
Chiarcos, Christian, Gkirtzou, Katerina, Ionov, Maxim, Khan, Fahad, McCrae, John P., Montiel-Ponsoda, Elena, Martín-Chozas, Patricia, Fransen, Theodorus (ORCID:0000-0001-5639-8626), Anderson, Cormac, Beniamine, Sacha, and Passarotti, Marco Carlo (ORCID:0000-0002-9806-7187)
- Abstract
This paper describes the first steps in creating a Lemma Bank for Old Irish (600-900 CE) within the Linked Data paradigm, taking inspiration from a similar resource for Latin built as part of the LiLa project (2018–2023). The focus is on the extraction and RDF conversion of nouns from Goidelex, a novel and highly structured morphological resource for Old Irish. The aim is to strike a good balance between retaining a representative level of morphological granularity and at the same time keeping the amount of lemma variants within workable limits, to facilitate straightforward resource interlinking for Old Irish, planned as future work.
- Published
- 2024
4. The Services of the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin
- Author
-
Chiarcos, Christian, Gkirtzou, Katerina, Ionov, Maxim, Khan, Fahad, McCrae, John P., Montiel-Ponsoda, Elena, Martín-Chozas, Patricia, Passarotti, Marco Carlo (ORCID:0000-0002-9806-7187), Mambrini, Francesco (ORCID:0000-0003-0834-7562), and Moretti, Giovanni
- Abstract
This paper describes three online services designed to ease the tasks of querying and populating the linguistic resources for Latin made interoperable through their publication as Linked Open Data in the LiLa Knowledge Base. As for querying the KB, we present an interface to search the collection of lemmas that represents the core of the Knowledge Base, and an interactive, graphical platform to run queries on the resources currently interlinked. As for populating the KB with new textual resources, we describe a tool that performs automatic tokenization, lemmatization and Part-of-Speech tagging of a raw text in Latin and links its tokens to LiLa.
- Published
- 2024
5. cqp4rdf: Towards a Suite for RDF-Based Corpus Linguistics
- Author
-
Ionov, Maxim, Stein, Florian, Sehgal, Sagar, and Chiarcos, Christian
- Published
- 2020
6. LLODifying Linguistic Glosses
- Author
-
Chiarcos, Christian, Ionov, Maxim, Rind-Pawlowski, Monika, Fäth, Christian, Schreur, Jesse Wichers, Nevskaya, Irina, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Gracia, Jorge, editor, Bond, Francis, editor, McCrae, John P., editor, Buitelaar, Paul, editor, Chiarcos, Christian, editor, and Hellmann, Sebastian, editor
- Published
- 2017
7. Universal Morphology zwischen Sprachtechnologie und Sprachwissenschaft: Sprachressourcen für Kaukasussprachen
- Author
-
Chiarcos, Christian, Donandt, Kathrin, Ionov, Maxim, Rind-Pawlowski, Monika, Sargsian, Hasmik, Wichers Schreur, Jesse, Vogeler, Georg, and Helling, Patrick
- Subjects
Universal Morphology ,Morphologie(generierung) ,DHd2018 ,Sprachdokumentation ,Community-Standards ,ddc:004 ,Schnittstellenprobleme (NLP vs. Sprachwissenschaft)
- Abstract
A single abstract from the DHd-2018 Book of Abstracts. Where editorial work was carried out on this publication, it consisted of removing hyphens in headings that resulted from faulty hyphenation, standardizing the authors' names to the scheme "Last name, First name", and/or separating title and subtitle with a full stop where necessary. References: https://doi.org/10.5281/zenodo.3684897, https://github.com/DHd-Verband/DHd-Abstracts-2018
- Published
- 2023
8. Linking the Tower of Babel: modelling a massive set of etymological dictionaries as RDF
- Author
-
Abromeit, Frank, Chiarcos, Christian, Fäth, Christian, and Ionov, Maxim
- Published
- 2023
9. Etymology meets linked data: a case study in Turkic [Abstract]
- Author
-
Chiarcos, Christian, Abromeit, Frank, Fäth, Christian, and Ionov, Maxim
- Published
- 2023
10. Embeddings for the lexicon: modelling and representation
- Author
-
Chiarcos, Christian, Declerck, Thierry, and Ionov, Maxim
- Subjects
ddc:004
- Published
- 2023
11. Querying a dozen corpora and a thousand years with Fintan
- Author
-
Chiarcos, Christian, Fäth, Christian, and Ionov, Maxim
- Subjects
ddc:400
- Abstract
Large-scale diachronic corpus studies covering longer time periods are difficult if more than one corpus is to be consulted and, as a result, different formats and annotation schemas need to be processed and queried in a uniform, comparable and replicable manner. We describe the application of the Flexible Integrated Transformation and Annotation eNgineering (Fintan) platform for studying word order in German using syntactically annotated corpora that represent its entire written history. Focusing on nominal dative and accusative arguments, this study hints at two major phases in the development of scrambling in modern German. Against more recent assumptions, it supports the traditional view that word order flexibility decreased over time, but it also indicates that this was a relatively sharp transition in Early New High German. The successful case study demonstrates the potential of Fintan and the underlying LLOD technology for historical linguistics, linguistic typology and corpus linguistics. The technological contribution of this paper is to demonstrate the applicability of Fintan for querying across heterogeneously annotated corpora, as previously, it had only been applied for transformation tasks. With its focus on quantitative analysis, Fintan is a natural complement for existing multi-layer technologies that focus on query and exploration.
- Published
- 2023
12. Balancing the digital presence of languages in and for technological development: a policy brief on the inclusion of data of under-resourced languages into the linked data cloud
- Author
-
Bosque-Gil, Julia, Mititelu, Verginica Barbu, Oliveira, Hugo Gonçalo, Ionov, Maxim, Gracia, Jorge, Rychkova, Liudmila, Valunaite Oleskeviciene, Giedre, Chiarcos, Christian, Declerck, Thierry, and Dojchinovsk, M.
- Published
- 2023
13. An ontology for CoNLL-RDF: formal data structures for TSV formats in language technology
- Author
-
Chiarcos, Christian, Ionov, Maxim, Glaser, Luis, and Fäth, Christian
- Subjects
data models ,Information systems → Graph-based database models ,Computing methodologies → Knowledge representation and reasoning ,language technology ,Computing methodologies → Language resources ,CoNLL-RDF ,ontology ,ddc:400
- Abstract
In language technology and language sciences, tab-separated values (TSV) represent a frequently used formalism to represent linguistically annotated natural language, often addressed as "CoNLL formats". A large number of such formats do exist, but although they share a number of common features, they are not interoperable, as different pieces of information are encoded differently in these dialects. CoNLL-RDF refers to a programming library and the associated data model that has been introduced to facilitate processing and transforming such TSV formats in a serialization-independent way. CoNLL-RDF represents CoNLL data by means of RDF graphs and SPARQL update operations, but so far, without machine-readable semantics, with annotation properties created dynamically on the basis of a user-defined mapping from columns to labels. Current applications of CoNLL-RDF include linking between corpora and dictionaries [Mambrini and Passarotti, 2019] and knowledge graphs [Tamper et al., 2018], syntactic parsing of historical languages [Chiarcos et al., 2018; Chiarcos et al., 2018], the consolidation of syntactic and semantic annotations [Chiarcos and Fäth, 2019], a bridge between RDF corpora and a traditional corpus query language [Ionov et al., 2020], and language contact studies [Chiarcos et al., 2018]. We describe a novel extension of CoNLL-RDF, introducing a formal data model, formalized as an ontology. The ontology is a basis for linking RDF corpora with other Semantic Web resources, but more importantly, its application for transformation between different TSV formats is a major step for providing interoperability between CoNLL formats. OASIcs, Vol. 93, 3rd Conference on Language, Data and Knowledge (LDK 2021), pages 20:1-20:14
- Published
- 2023
14. OntoLex-Morph: morphology for the web of data [Abstract]
- Author
-
Chiarcos, Christian, Ionov, Maxim, Gkirtzou, Katerina, Khan, Anas Fahad, Labropoulou, Penny, Passarotti, Marco, and Pellegrini, Matteo
- Published
- 2023
15. Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference (LREC2022), 20-25 June 2022, Marseille, France
- Author
-
Declerck, Thierry, McCrae, John Philip, Montiel, Elena, Chiarcos, Christian, and Ionov, Maxim
- Subjects
ddc:400
- Published
- 2023
16. Modelling collocations in OntoLex-FrAC
- Author
-
Chiarcos, Christian, Gkirtzou, Katerina, Ionov, Maxim, Kabashi, Besim, Khan, Fahad, and Truică, Ciprian-Octavian
- Subjects
ddc:400
- Abstract
Following presentations of frequency and attestations, and embeddings and distributional similarity, this paper introduces the third cornerstone of the emerging OntoLex module for Frequency, Attestation and Corpus-based Information, OntoLex-FrAC. We provide an RDF vocabulary for collocations, established as a consensus over contributions from five different institutions and numerous data sets, with the goal of eliciting feedback from reviewers, workshop audience and the scientific community in preparation of the final consolidation of the OntoLex-FrAC module, whose publication as a W3C community report is foreseen for the end of this year. The novel collocation component of OntoLex-FrAC is described in application to a lexicographic resource and corpus-based collocation scores available from the web, and finally, we demonstrate the capability and genericity of the model by showing how to retrieve and aggregate collocation information by means of SPARQL, and its export to a tabular format, so that it can be easily processed in downstream applications.
- Published
- 2023
17. Unifying morphology resources with OntoLex-Morph: a case study in German
- Author
-
Chiarcos, Christian, Fäth, Christian, and Ionov, Maxim
- Subjects
ddc:400
- Abstract
The OntoLex vocabulary has become a widely used community standard for machine-readable lexical resources on the web. The primary motivation to use OntoLex in favor of tool- or application-specific formalisms is to facilitate interoperability and information integration across different resources. One of its extensions currently being developed is a module for representing morphology, OntoLex-Morph. In this paper, we show how OntoLex-Morph can be used for the encoding and integration of different types of morphological resources on a unified basis. With German as the example, we demonstrate it for (a) a full-form dictionary with inflection information (Unimorph), (b) a dictionary of base forms and their derivations (UDer), (c) a dictionary of compounds (from GermaNet), and (d) lexicon and inflection rules of a finite-state parser/generator (SMOR/Morphisto). These data are converted to OntoLex-Morph, their linguistic information is consolidated and corresponding lexical entries are linked with each other. The main contribution of this paper is the discussion of the current state of OntoLex-Morph and its validation on different types of real-world resources for a single language. In the longer term, the successful application of OntoLex-Morph to such diverse data, along with the adjustments to the vocabulary observed in the process, will be a means to establish interoperability among morphological resources as well as between them and classical lexical data such as dictionaries, WordNets, or thesauri.
- Published
- 2023
18. Untangling the Semantic Web: Microdata Use in Russian Video Content Delivery Sites
- Author
-
Kutuzov, Andrey, Ionov, Maxim, Ignatov, Dmitry I., editor, Khachay, Mikhail Yu., editor, Panchenko, Alexander, editor, Konstantinova, Natalia, editor, and Yavorsky, Rostislav E., editor
- Published
- 2014
19. When linguistics meets web technologies. Recent advances in modelling linguistic linked data
- Author
-
Khan, Anas Fahad, Chiarcos, Christian, Declerck, Thierry, Gifu, Daniela, González-Blanco García, Elena, Gracia, Jorge, Ionov, Maxim, Labropoulou, Penny, Mambrini, Francesco, McCrae, John P., Pagé-Perron, Émilie, Passarotti, Marco, Muñoz, Salvador Ros, and Truică, Ciprian-Octavian
- Subjects
FAIR principles ,Computer Networks and Communications ,Settore L-LIN/01 - GLOTTOLOGIA E LINGUISTICA ,Linguistic Linked Data ,ddc:400 ,Language resources ,Linguistic Linked Open Data ,Computer Science Applications ,Information Systems ,Semantic Web
- Abstract
This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD), focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant impact on linked data models and vocabularies. Next, we give a general overview of existing vocabularies and models for different categories of LLD resource. After that, we look at some of the latest developments in community standards and initiatives, including descriptions of recent work on the OntoLex-Lemon model, a survey of recent initiatives in linguistic annotation and LLD, and a discussion of the LLD metadata vocabularies META-SHARE and lime. In the next part of the paper, we focus on the influence of projects on LLD models and vocabularies, starting with a general survey of relevant projects, before dedicating individual sections to a number of recent projects and their impact on LLD vocabularies and models. Finally, in the conclusion, we look ahead at some future challenges for LLD models and vocabularies. The appendix to the paper consists of a brief introduction to the OntoLex-Lemon model.
- Published
- 2022
20. When linguistics meets web technologies. Recent advances in modelling linguistic linked data
- Author
-
Khan, Anas Fahad, Chiarcos, Christian, Declerck, Thierry, Gifu, Daniela, García, Elena González-Blanco, Gracia, Jorge, Ionov, Maxim, Labropoulou, Penny, Mambrini, Francesco, McCrae, John P., Pagé-Perron, Émilie, Passarotti, Marco, Muñoz, Salvador Ros, and Truică, Ciprian-Octavian
- Published
- 2022
21. Linking the LASLA Corpus in the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin
- Author
-
Declerck, Thierry, McCrae, John P., Montiel, Elena, Chiarcos, Christian, Ionov, Maxim, Fantoli, Margherita, Passarotti, Marco Carlo (ORCID:0000-0002-9806-7187), Mambrini, Francesco (ORCID:0000-0003-0834-7562), Moretti, Giovanni, and Ruffolo, Paolo
- Abstract
This paper describes the process of interlinking the 130 Classical Latin texts provided by an annotated corpus developed at the LASLA laboratory with the LiLa Knowledge Base, which makes linguistic resources for Latin interoperable by following the principles of the Linked Data paradigm and making reference to classes and properties of widely adopted ontologies to model the relevant information. After introducing the overall architecture of the LiLa Knowledge Base and the LASLA corpus, the paper details the phases of the process of linking the corpus with the collection of lemmas of LiLa and presents a federated query to exemplify the added value of interoperability of LASLA's texts with other resources for Latin.
- Published
- 2022
22. Computational Morphology with OntoLex-Morph
- Author
-
Declerck, Thierry, McCrae, John P., Montiel, Elena, Chiarcos, Christian, Ionov, Maxim, Gkirtzou, Katerina, Khan, Fahad, Labropoulou, Penny, Passarotti, Marco Carlo (ORCID:0000-0002-9806-7187), and Pellegrini, Matteo (ORCID:0000-0003-4378-5824)
- Abstract
This paper describes the current status of the emerging OntoLex module for linguistic morphology. It serves as an update to the previous version of the vocabulary (Klimek et al. 2019). Whereas this earlier model was exclusively focusing on descriptive morphology and focused on applications in lexicography, we now present a novel part and a novel application of the vocabulary to applications in language technology, i.e., the rule-based generation of lexicons, introducing a dynamic component into OntoLex.
- Published
- 2022
23. D3.3 Language Resource Transformation Software
- Author
-
Fäth, Christian, Ionov, Maxim, and Chiarcos, Christian
- Abstract
Within the Prêt-à-LLOD project, five major challenges when working with linguistic resources are addressed (cf. Fig. 1): discovery of resources; data management and licensing; transformation of heterogeneous resources; interlinking resources; and embedding resources and algorithms into complex workflows. This short report accompanies the Prêt-à-LLOD software deliverable D3.3 “Resource Transformation Software”. It is meant to provide a quick overview of the motivation, software architecture and its basic functionalities. It also serves as a pointer to the repositories, where both the code and more detailed user guidelines are available.
- Published
- 2021
24. Linking Discourse Marker Inventories
- Author
-
Chiarcos, Christian and Ionov, Maxim
- Subjects
Information systems → Graph-based database models ,OntoLex ,Computing methodologies → Discourse, dialogue and pragmatics ,OLiA ,linked data ,ddc:400 ,discourse markers ,discourse processing
- Abstract
The paper describes the first comprehensive edition of machine-readable discourse marker lexicons. Discourse markers such as and, because, but, though or thereafter are essential communicative signals in human conversation, as they indicate how an utterance relates to its communicative context. As much of this information is implicit or expressed differently in different languages, discourse parsing, context-adequate natural language generation and machine translation are considered particularly challenging aspects of Natural Language Processing. Providing this data in machine-readable, standard-compliant form will thus facilitate such technical tasks, and moreover, allow to explore techniques for translation inference to be applied to this particular group of lexical resources that was previously largely neglected in the context of Linguistic Linked (Open) Data. OASIcs, Vol. 93, 3rd Conference on Language, Data and Knowledge (LDK 2021), pages 40:1-40:15
- Published
- 2021
25. APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text
- Author
-
Ionov, Maxim
- Subjects
Information systems → Graph-based database models ,Computing methodologies → Knowledge representation and reasoning ,interlinear glossed text (IGT) ,Computing methodologies → Language resources ,Linguistic Linked Open Data (LLOD) ,less-resourced languages in the (multilingual) Semantic Web ,data modeling
- Abstract
This paper presents APiCS-Ligt, an LLOD version of a collection of interlinear glossed linguistic examples from APiCS, the Atlas of Pidgin and Creole Language Structures. Interlinear glossed text (IGT) plays an important role in typological and theoretical linguistic research, especially with understudied and endangered languages: It provides a way to understand linguistic phenomena without necessarily knowing the source language, which is crucial for these languages since native speakers are not always easily accessible. Previously, we presented Ligt, an RDF vocabulary created for representing interlinear glosses in text segments. In this paper, we present our conversion of the APiCS IGT dataset into this model and describe our efforts in linking linguistic annotations to an external ontology to add semantic representation. OASIcs, Vol. 93, 3rd Conference on Language, Data and Knowledge (LDK 2021), pages 27:1-27:8
- Published
- 2021
26. An Ontology for CoNLL-RDF: Formal Data Structures for TSV Formats in Language Technology
- Author
-
Chiarcos, Christian, Ionov, Maxim, and Glaser, Luis
- Published
- 2021
27. An Ontology for CoNLL-RDF: Formal Data Structures for TSV Formats in Language Technology
- Author
-
Chiarcos, Christian, Ionov, Maxim, Glaser, Luis, and Fäth, Christian
- Abstract
In language technology and language sciences, tab-separated values (TSV) represent a frequently used formalism to represent linguistically annotated natural language, often addressed as "CoNLL formats". A large number of such formats do exist, but although they share a number of common features, they are not interoperable, as different pieces of information are encoded differently in these dialects. CoNLL-RDF refers to a programming library and the associated data model that has been introduced to facilitate processing and transforming such TSV formats in a serialization-independent way. CoNLL-RDF represents CoNLL data, by means of RDF graphs and SPARQL update operations, but so far, without machine-readable semantics, with annotation properties created dynamically on the basis of a user-defined mapping from columns to labels. Current applications of CoNLL-RDF include linking between corpora and dictionaries [Mambrini and Passarotti, 2019] and knowledge graphs [Tamper et al., 2018], syntactic parsing of historical languages [Chiarcos et al., 2018; Chiarcos et al., 2018], the consolidation of syntactic and semantic annotations [Chiarcos and Fäth, 2019], a bridge between RDF corpora and a traditional corpus query language [Ionov et al., 2020], and language contact studies [Chiarcos et al., 2018]. We describe a novel extension of CoNLL-RDF, introducing a formal data model, formalized as an ontology. The ontology is a basis for linking RDF corpora with other Semantic Web resources, but more importantly, its application for transformation between different TSV formats is a major step for providing interoperability between CoNLL formats.
- Published
- 2021
28. APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text
- Author
-
Ionov, Maxim
- Abstract
This paper presents APiCS-Ligt, an LLOD version of a collection of interlinear glossed linguistic examples from APiCS, the Atlas of Pidgin and Creole Language Structures. Interlinear glossed text (IGT) plays an important role in typological and theoretical linguistic research, especially with understudied and endangered languages: It provides a way to understand linguistic phenomena without necessarily knowing the source language which is crucial for these languages since native speakers are not always easily accessible. Previously, we presented Ligt, RDF vocabulary created for representing interlinear glosses in text segments. In this paper, we present our conversion of the APiCS IGT dataset into this model and describe our efforts in linking linguistic annotations to an external ontology to add semantic representation.
- Published
- 2021
29. Linking Discourse Marker Inventories
- Author
-
Chiarcos, Christian and Ionov, Maxim
- Abstract
The paper describes the first comprehensive edition of machine-readable discourse marker lexicons. Discourse markers such as and, because, but, though or thereafter are essential communicative signals in human conversation, as they indicate how an utterance relates to its communicative context. As much of this information is implicit or expressed differently in different languages, discourse parsing, context-adequate natural language generation and machine translation are considered particularly challenging aspects of Natural Language Processing. Providing this data in machine-readable, standard-compliant form will thus facilitate such technical tasks, and moreover, allow to explore techniques for translation inference to be applied to this particular group of lexical resources that was previously largely neglected in the context of Linguistic Linked (Open) Data.
- Published
- 2021
30. Use of polyamidoamine dendrimers to engineer BDNF-producing human mesenchymal stem cells
- Author
-
Shakhbazau, Antos, Shcharbin, Dzmitry, Seviaryn, Ihar, Goncharova, Natalya, Kosmacheva, Svetlana, Potapnev, Mihail, Gabara, Barbara, Ionov, Maxim, and Bryszewska, Maria
- Published
- 2010
31. D5.1 Report on Vocabularies for Interoperable Language Resources and Services
- Author
-
Chiarcos, Christian, Cimiano, Philipp, Bosque-Gil, Julia, Declerck, Thierry, Fäth, Christian, Gracia, Jorge, Ionov, Maxim, McCrae, John P., Montiel-Ponsoda, Elena, Pia di Buono, Maria, Saurí, Roser, Bobillo, Fernando, and Elahi, Mohammad Fazleh
- Abstract
This document provides a survey of vocabularies for language resources and services and sketches necessary extensions and the expected contribution of the Prêt-à-LLOD project to their further development for phenomena currently not sufficiently covered. Future updates with respect to this will be documented within Task 5.4. We focus on three main aspects of linguistically analyzed data: (1) lexical-conceptual resources, i.e., repositories of terminology, lexical data, translation, and semantics; (2) linguistically annotated data, concerning linguistic analysis of textual or transcribed data; and (3) language resource terminology, i.e., linguistic data categories and metadata. For these areas, we describe representative vocabularies from the Linguistic Linked Open Data community (RDF-based vocabularies) as well as other approaches (e.g., ISO TC37 standards), we identify a number of gaps, and we describe ongoing efforts to address these gaps within the Prêt-à-LLOD project.
- Published
- 2020
32. Proceedings of the LREC 2020 7th Workshop on Linked Data in Linguistics
- Author
-
Ionov, Maxim, McCrae, John, Chiarcos, Christian, Declerck, Thierry, Bosque-Gil, Julia, and Gracia, Jorge
- Subjects
WP7 ,strategies, tools, standards for lexicographic resources (objective 3) ,WP2 ,Linguistic Linked Open Data, Standards, Infrastructure ,WP3 ,Linguistic Linked Open Data
- Abstract
Past years have seen a growing interest in the application of knowledge graphs and Semantic Web technologies to language resources, and their publication as linked data on the Web. As of today, a large number of language resources have been either converted or created natively as linked data on the basis of data models specifically designed for the representation of linguistic content. Examples are wordnets, dictionaries, corpora, culminating in the emergence of a Linguistic Linked Open Data (LLOD) cloud (http://linguistic-lod.org/). Since its establishment in 2012, the Linked Data in Linguistics (LDL) workshop series has become the major forum for presenting, discussing and disseminating technologies, vocabularies, resources and experiences regarding the application of semantic technologies and the Linked Open Data (LOD) paradigm to language resources in order to facilitate their visibility, accessibility, interoperability, reusability, enrichment, combined evaluation and integration. The LDL workshops contribute to the discussion, dissemination and establishment of community standards that drive this development, most notably the OntoLex-lemon model for lexical resources, as well as standards for other types of language resources still under development. The workshop series is organized by Open Linguistics, founded in 2010 as a Working Group of the Open Knowledge Foundation with close involvement of related communities, such as W3C Community Groups, and international research projects. It takes a general focus on LOD-based resources, vocabularies, infrastructures and technologies as means for managing, improving and using language resources on the Web. As technology and resources increasingly converge towards a LOD-based ecosystem, this year we particularly encouraged submissions on Linked-Data Aware Tools and Services and Linked Language Resources Infrastructure, i.e. managing, curating and applying LLOD technologies and resources in a reliable and reproducible way for the needs of linguistics, NLP and digital humanities. The Workshop is also part of the dissemination activities of the COST Action CA18209, NexusLinguarum: European network for Web-centred linguistic data science, and ELEXIS.
- Published
- 2020
33. Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF
- Author
-
Chiarcos, Christian and Ionov, Maxim
- Subjects
000 Computer science, knowledge, general works; Computer Science; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 02 engineering and technology; ddc:400
- Abstract
The paper introduces Ligt, a native RDF vocabulary for representing linguistic examples as text with interlinear glosses (IGT) in a linked data formalism. Interlinear glossing is a notation used in various fields of linguistics to provide readers with a way to understand linguistic phenomena and to provide corpus data when documenting endangered languages. This data is usually provided with morpheme-by-morpheme correspondence which is not supported by any established vocabularies for representing linguistic corpora or automated annotations. Interlinear Glossed Text can be stored and exchanged in several formats specifically designed for the purpose, but these differ in their designs and concepts, and they are tied to particular tools, so the reusability of the annotated data is limited. To improve interoperability and reusability, we propose to convert such glosses to a tool-independent representation well-suited for the Web of Data, i.e., a representation in RDF. Beyond establishing structural (format) interoperability by means of a common data representation, our approach also allows using shared vocabularies and terminology repositories available from the (Linguistic) Linked Open Data cloud. We describe the core vocabulary and the converters that use this vocabulary to convert IGT in a format of various widely-used tools into RDF. Ultimately, a Linked Data representation will facilitate the accessibility of language data from less-resourced language varieties within the (Linguistic) Linked Open Data cloud, as well as enable novel ways to access and integrate this information with (L)LOD dictionary data and other types of lexical-semantic resources. In a longer perspective, data currently only available through these formats will become more visible and reusable and contribute to the development of a truly multilingual (semantic) web.
- Published
- 2019
34. Challenges for the Representations for Morphology in Ontology Lexicons
- Author
-
Klimek, Bettina, McCrae, John P., Ionov, Maxim, Tauber, James K., Chiarcos, Christian, Bosque-Gil, Julia, and Buitelaar, Paul
- Abstract
Recent years have seen a growing trend in the publication of language resources as Linguistic Linked Data (LLD) to enhance their discovery, reuse and the interoperability of tools that consume language data. To this end, the OntoLex-lemon model has emerged as a de facto standard for representing lexical data on the Web. However, traditional dictionaries contain a considerable amount of morphological information which is not straightforwardly representable as LLD within the current model. To fill this gap, a new Morphology Module of OntoLex-lemon is currently being developed. This paper presents the current state of this module as ongoing work, along with the challenges that emerged during its development. Based on the MMoOn Core ontology, the module aims to account for a wide range of morphological information, from endings used to derive whole paradigms to the decomposition and generation of lexical entries, in a way that is compliant with other OntoLex-lemon modules and facilitates the encoding of complex morphological data in ontology lexicons.
- Published
- 2019
35. Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF
- Author
-
Chiarcos, Christian and Ionov, Maxim
- Abstract
The paper introduces Ligt, a native RDF vocabulary for representing linguistic examples as text with interlinear glosses (IGT) in a linked data formalism. Interlinear glossing is a notation used in various fields of linguistics to provide readers with a way to understand linguistic phenomena and to provide corpus data when documenting endangered languages. This data is usually provided with morpheme-by-morpheme correspondence which is not supported by any established vocabularies for representing linguistic corpora or automated annotations. Interlinear Glossed Text can be stored and exchanged in several formats specifically designed for the purpose, but these differ in their designs and concepts, and they are tied to particular tools, so the reusability of the annotated data is limited. To improve interoperability and reusability, we propose to convert such glosses to a tool-independent representation well-suited for the Web of Data, i.e., a representation in RDF. Beyond establishing structural (format) interoperability by means of a common data representation, our approach also allows using shared vocabularies and terminology repositories available from the (Linguistic) Linked Open Data cloud. We describe the core vocabulary and the converters that use this vocabulary to convert IGT in a format of various widely-used tools into RDF. Ultimately, a Linked Data representation will facilitate the accessibility of language data from less-resourced language varieties within the (Linguistic) Linked Open Data cloud, as well as enable novel ways to access and integrate this information with (L)LOD dictionary data and other types of lexical-semantic resources. In a longer perspective, data currently only available through these formats will become more visible and reusable and contribute to the development of a truly multilingual (semantic) web.
- Published
- 2019
36. Expanding the horizons: adding a new language to the news personalization system
- Author
-
Fedorovsky, Andrey, Ionov, Maxim, Litvinova, Varvara, Olenina, Tatyana, and Trofimova, Darya
- Published
- 2015
37. Use of polyamidoamine dendrimers to engineer BDNF-producing human mesenchymal stem cells
- Author
-
Shakhbazau, Antos, Shcharbin, Dzmitry, Seviaryn, Ihar, Goncharova, Natalya, Kosmacheva, Svetlana, Potapnev, Mihail, Gabara, Barbara, Ionov, Maxim, and Bryszewska, Maria
- Published
- 2009
38. Fintan - Flexible, integrated transformation and annotation engineering
- Author
-
Fäth, Christian, Chiarcos, Christian, Ebbrecht, Björn, and Ionov, Maxim
- Subjects
TSV / CSV; Corpora; Dictionaries; Linked Data; Ontologies; Lexical Data; CoNLL; ddc:400; NLP; Graphs; Semantic Web; RDF
- Abstract
We introduce the Flexible and Integrated Transformation and Annotation eNgineering (Fintan) platform for converting heterogeneous linguistic resources to RDF. With its modular architecture, workflow management and visualization features, Fintan facilitates the development of complex transformation pipelines by integrating generic RDF converters and augmenting them with extended graph processing capabilities: existing converters can be easily deployed to the system by means of an ontological data structure which renders their properties and the dependencies between transformation steps. Development of subsequent graph transformation steps for resource transformation, annotation engineering or entity linking is further facilitated by a novel visual rendering of SPARQL queries. A graphical workflow manager makes it easy to manage the converter modules and combine them into new transformation pipelines. Employing the stream-based graph processing approach first implemented with CoNLL-RDF, we address common challenges and scalability issues when transforming resources and showcase the performance of Fintan by means of a purely graph-based transformation of the Universal Morphology data to RDF.
39. Proceedings of the LREC 2020 7th Workshop on Linked Data in Linguistics
- Author
-
Ionov, Maxim