Author: "Ionov, Maxim" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Ionov, Maxim"' showing total 39 results


1. cqp4rdf: Towards a Suite for RDF-Based Corpus Linguistics

Author: Ionov, Maxim, Stein, Florian, Sehgal, Sagar, Chiarcos, Christian, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Harth, Andreas, editor, Presutti, Valentina, editor, Troncy, Raphaël, editor, Acosta, Maribel, editor, Polleres, Axel, editor, Fernández, Javier D., editor, Xavier Parreira, Josiane, editor, Hartig, Olaf, editor, Hose, Katja, editor, and Cochez, Michael, editor
Published: 2020
Full Text: View/download PDF

2. Querying the Lexicon der indogermanischen Verben in the LiLa Knowledge Base: Two Use Cases

Author: Chiarcos, Christian, Gkirtzou, Katerina, Ionov, Maxim, Khan, Fahad, McCrae, John P., Montiel-Ponsoda, Elena, Martín-Chozas, Patricia, Boano Valeria, Irene, Passarotti, Marco Carlo (ORCID:0000-0002-9806-7187), and Ginevra, Riccardo (ORCID:0000-0002-6731-6494)
Abstract: This paper presents two use cases of the etymological data provided by the *Lexicon der indogermanischen Verben* (LIV) after their publication as Linked Open Data and their linking to the LiLa Knowledge Base (KB) of interoperable linguistic resources for Latin. The first part of the paper briefly describes the LiLa KB and its structure. Then, the LIV and the information it contains are introduced, followed by a short description of the ontologies and the extensions used for modelling the LIV's data and interlinking them to the LiLa ecosystem. The last section details the two use cases. The first case concerns the inflection types of the Latin verbs that reflect Proto-Indo-European stems, while the second one focusses on the Latin derivatives of the inherited stems. The results of the investigations are put in relation to current research topics in Historical Linguistics, demonstrating their relevance to the discipline.
Published: 2024

3. The MOLOR Lemma Bank: a New LLOD Resource for Old Irish

Author: Chiarcos, Christian, Gkirtzou, Katerina, Ionov, Maxim, Khan, Fahad, McCrae, John P., Montiel-Ponsoda, Elena, Martín-Chozas, Patricia, Fransen, Theodorus (ORCID:0000-0001-5639-8626), Anderson, Cormac, Beniamine, Sacha, and Passarotti, Marco Carlo (ORCID:0000-0002-9806-7187)
Abstract: This paper describes the first steps in creating a Lemma Bank for Old Irish (600-900 CE) within the Linked Data paradigm, taking inspiration from a similar resource for Latin built as part of the LiLa project (2018-2023). The focus is on the extraction and RDF conversion of nouns from Goidelex, a novel and highly structured morphological resource for Old Irish. The aim is to strike a good balance between retaining a representative level of morphological granularity and at the same time keeping the amount of lemma variants within workable limits, to facilitate straightforward resource interlinking for Old Irish, planned as future work.
Published: 2024

4. The Services of the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin

Author: Chiarcos, Christian, Gkirtzou, Katerina, Ionov, Maxim, Khan, Fahad, McCrae, John P., Montiel-Ponsoda, Elena, Martín-Chozas, Patricia, Passarotti, Marco Carlo (ORCID:0000-0002-9806-7187), Mambrini, Francesco (ORCID:0000-0003-0834-7562), and Moretti, Giovanni
Abstract: This paper describes three online services designed to ease the tasks of querying and populating the linguistic resources for Latin made interoperable through their publication as Linked Open Data in the LiLa Knowledge Base. As for querying the KB, we present an interface to search the collection of lemmas that represents the core of the Knowledge Base, and an interactive, graphical platform to run queries on the resources currently interlinked. As for populating the KB with new textual resources, we describe a tool that performs automatic tokenization, lemmatization and Part-of-Speech tagging of a raw text in Latin and links its tokens to LiLa.
Published: 2024

5. cqp4rdf: Towards a Suite for RDF-Based Corpus Linguistics

Author: Ionov, Maxim, primary, Stein, Florian, additional, Sehgal, Sagar, additional, and Chiarcos, Christian, additional
Published: 2020
Full Text: View/download PDF

6. LLODifying Linguistic Glosses

Author: Chiarcos, Christian, Ionov, Maxim, Rind-Pawlowski, Monika, Fäth, Christian, Schreur, Jesse Wichers, Nevskaya, Irina, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Gracia, Jorge, editor, Bond, Francis, editor, McCrae, John P., editor, Buitelaar, Paul, editor, Chiarcos, Christian, editor, and Hellmann, Sebastian, editor
Published: 2017
Full Text: View/download PDF

7. Universal Morphology zwischen Sprachtechnologie und Sprachwissenschaft: Sprachressourcen für Kaukasussprachen

Author: Chiarcos, Christian, Donandt, Kathrin, Ionov, Maxim, Rind-Pawlowski, Monika, Sargsian, Hasmik, Wichers Schreur, Jesse, Vogeler, Georg, and Helling, Patrick
Subjects: Universal Morphology, Morphologie(generierung), DHd2018, Sprachdokumentation, Community-Standards, ddc:004, Schnittstellenprobleme (NLP vs. Sprachwissenschaft)
Abstract: A single abstract from the DHd-2018 Book of Abstracts. If any editorial work was carried out on this publication, it consisted of removing hyphens in headings that arose from faulty hyphenation, standardizing the authors' names to the scheme "Last name, First name", and/or separating title and subtitle with a period where necessary. References: https://doi.org/10.5281/zenodo.3684897, https://github.com/DHd-Verband/DHd-Abstracts-2018
Published: 2023

8. Linking the Tower of Babel: modelling a massive set of etymological dictionaries as RDF

Author: Abromeit, Frank, Chiarcos, Christian, Fäth, Christian, and Ionov, Maxim
Published: 2023

9. Etymology meets linked data: a case study in Turkic [Abstract]

Author: Chiarcos, Christian, Abromeit, Frank, Fäth, Christian, and Ionov, Maxim
Published: 2023

10. Embeddings for the lexicon: modelling and representation

Author: Chiarcos, Christian, Declerck, Thierry, and Ionov, Maxim
Subjects: ddc:004
Published: 2023

11. Querying a dozen corpora and a thousand years with Fintan

Author: Chiarcos, Christian, Fäth, Christian, and Ionov, Maxim
Subjects: ddc:400
Abstract: Large-scale diachronic corpus studies covering longer time periods are difficult if more than one corpus is to be consulted and, as a result, different formats and annotation schemas need to be processed and queried in a uniform, comparable and replicable manner. We describe the application of the Flexible Integrated Transformation and Annotation eNgineering (Fintan) platform for studying word order in German using syntactically annotated corpora that represent its entire written history. Focusing on nominal dative and accusative arguments, this study hints at two major phases in the development of scrambling in modern German. Against more recent assumptions, it supports the traditional view that word order flexibility decreased over time, but it also indicates that this was a relatively sharp transition in Early New High German. The successful case study demonstrates the potential of Fintan and the underlying LLOD technology for historical linguistics, linguistic typology and corpus linguistics. The technological contribution of this paper is to demonstrate the applicability of Fintan for querying across heterogeneously annotated corpora, as previously, it had only been applied for transformation tasks. With its focus on quantitative analysis, Fintan is a natural complement for existing multi-layer technologies that focus on query and exploration.
Published: 2023

12. Balancing the digital presence of languages in and for technological development: a policy brief on the inclusion of data of under-resourced languages into the linked data cloud

Author: Bosque-Gil, Julia, Mititelu, Verginica Barbu, Oliveira, Hugo Gonçalo, Ionov, Maxim, Gracia, Jorge, Rychkova, Liudmila, Valunaite Oleskeviciene, Giedre, Chiarcos, Christian, Declerck, Thierry, and Dojchinovski, M.
Published: 2023

13. An ontology for CoNLL-RDF: formal data structures for TSV formats in language technology

Author: Chiarcos, Christian, Ionov, Maxim, Glaser, Luis, and Fäth, Christian
Subjects: data models, Information systems → Graph-based database models, Computing methodologies → Knowledge representation and reasoning, language technology, Computing methodologies → Language resources, CoNLL-RDF, ontology, ddc:400
Abstract: In language technology and language sciences, tab-separated values (TSV) represent a frequently used formalism to represent linguistically annotated natural language, often addressed as "CoNLL formats". A large number of such formats do exist, but although they share a number of common features, they are not interoperable, as different pieces of information are encoded differently in these dialects. CoNLL-RDF refers to a programming library and the associated data model that has been introduced to facilitate processing and transforming such TSV formats in a serialization-independent way. CoNLL-RDF represents CoNLL data, by means of RDF graphs and SPARQL update operations, but so far, without machine-readable semantics, with annotation properties created dynamically on the basis of a user-defined mapping from columns to labels. Current applications of CoNLL-RDF include linking between corpora and dictionaries [Mambrini and Passarotti, 2019] and knowledge graphs [Tamper et al., 2018], syntactic parsing of historical languages [Chiarcos et al., 2018; Chiarcos et al., 2018], the consolidation of syntactic and semantic annotations [Chiarcos and Fäth, 2019], a bridge between RDF corpora and a traditional corpus query language [Ionov et al., 2020], and language contact studies [Chiarcos et al., 2018]. We describe a novel extension of CoNLL-RDF, introducing a formal data model, formalized as an ontology. The ontology is a basis for linking RDF corpora with other Semantic Web resources, but more importantly, its application for transformation between different TSV formats is a major step for providing interoperability between CoNLL formats. OASIcs, Vol. 93, 3rd Conference on Language, Data and Knowledge (LDK 2021), pages 20:1-20:14
Published: 2023

14. OntoLex-Morph: morphology for the web of data [Abstract]

Author: Chiarcos, Christian, Ionov, Maxim, Gkirtzou, Katerina, Khan, Anas Fahad, Labropoulou, Penny, Passarotti, Marco, and Pellegrini, Matteo
Published: 2023

15. Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference (LREC2022), 20-25 June 2022, Marseille, France

Author: Declerck, Thierry, McCrae, John Philip, Montiel, Elena, Chiarcos, Christian, and Ionov, Maxim
Subjects: ddc:400
Published: 2023

16. Modelling collocations in OntoLex-FrAC

Author: Chiarcos, Christian, Gkirtzou, Katerina, Ionov, Maxim, Kabashi, Besim, Khan, Fahad, and Truică, Ciprian-Octavian
Subjects: ddc:400
Abstract: Following presentations of frequency and attestations, and embeddings and distributional similarity, this paper introduces the third cornerstone of the emerging OntoLex module for Frequency, Attestation and Corpus-based Information, OntoLex-FrAC. We provide an RDF vocabulary for collocations, established as a consensus over contributions from five different institutions and numerous data sets, with the goal of eliciting feedback from reviewers, workshop audience and the scientific community in preparation of the final consolidation of the OntoLex-FrAC module, whose publication as a W3C community report is foreseen for the end of this year. The novel collocation component of OntoLex-FrAC is described in application to a lexicographic resource and corpus-based collocation scores available from the web, and finally, we demonstrate the capability and genericity of the model by showing how to retrieve and aggregate collocation information by means of SPARQL, and its export to a tabular format, so that it can be easily processed in downstream applications.
Published: 2023

17. Unifying morphology resources with OntoLex-Morph: a case study in German

Author: Chiarcos, Christian, Fäth, Christian, and Ionov, Maxim
Subjects: ddc:400
Abstract: The OntoLex vocabulary has become a widely used community standard for machine-readable lexical resources on the web. The primary motivation to use OntoLex in favor of tool- or application-specific formalisms is to facilitate interoperability and information integration across different resources. One of its extensions that is currently being developed is a module for representing morphology, OntoLex-Morph. In this paper, we show how OntoLex-Morph can be used for the encoding and integration of different types of morphological resources on a unified basis. With German as the example, we demonstrate it for (a) a full-form dictionary with inflection information (Unimorph), (b) a dictionary of base forms and their derivations (UDer), (c) a dictionary of compounds (from GermaNet), and (d) lexicon and inflection rules of a finite-state parser/generator (SMOR/Morphisto). These data are converted to OntoLex-Morph, their linguistic information is consolidated and corresponding lexical entries are linked with each other. The main contribution of this paper is the discussion of the current state of OntoLex-Morph and its validation on different types of real-world resources for a single language. In the longer term, the successful application of OntoLex-Morph to such diverse data, along with the adjustments to the vocabulary observed in the process, will be a means to establish interoperability among morphological resources as well as between them and classical lexical data such as dictionaries, WordNets, or thesauri.
Published: 2023

18. Untangling the Semantic Web: Microdata Use in Russian Video Content Delivery Sites

Author: Kutuzov, Andrey, Ionov, Maxim, Ignatov, Dmitry I., editor, Khachay, Mikhail Yu., editor, Panchenko, Alexander, editor, Konstantinova, Natalia, editor, and Yavorsky, Rostislav E., editor
Published: 2014
Full Text: View/download PDF

19. When linguistics meets web technologies. Recent advances in modelling linguistic linked data

Author: Khan, Anas Fahad, Chiarcos, Christian, Declerck, Thierry, Gifu, Daniela, González-Blanco García, Elena, Gracia, Jorge, Ionov, Maxim, Labropoulou, Penny, Mambrini, Francesco, McCrae, John P., Pagé-Perron, Émilie, Passarotti, Marco, Muñoz, Salvador Ros, and Truică, Ciprian-Octavian
Subjects: FAIR principles, Computer Networks and Communications, Settore L-LIN/01 - GLOTTOLOGIA E LINGUISTICA, Linguistic Linked Data, ddc:400, Language resources, Linguistic Linked Open Data, Computer Science Applications, Information Systems, Semantic Web
Abstract: This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD) focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant impact on linked data models and vocabularies. Next, we give a general overview of existing vocabularies and models for different categories of LLD resource. After that, we look at some of the latest developments in community standards and initiatives including descriptions of recent work on the OntoLex-Lemon model, a survey of recent initiatives in linguistic annotation and LLD, and a discussion of the LLD metadata vocabularies META-SHARE and lime. In the next part of the paper, we focus on the influence of projects on LLD models and vocabularies, starting with a general survey of relevant projects, before dedicating individual sections to a number of recent projects and their impact on LLD vocabularies and models. Finally, in the conclusion, we look ahead at some future challenges for LLD models and vocabularies. The appendix to the paper consists of a brief introduction to the OntoLex-Lemon model.
Published: 2022
Full Text: View/download PDF

20. When linguistics meets web technologies. Recent advances in modelling linguistic linked data

Author: Khan, Anas Fahad, primary, Chiarcos, Christian, additional, Declerck, Thierry, additional, Gifu, Daniela, additional, García, Elena González-Blanco, additional, Gracia, Jorge, additional, Ionov, Maxim, additional, Labropoulou, Penny, additional, Mambrini, Francesco, additional, McCrae, John P., additional, Pagé-Perron, Émilie, additional, Passarotti, Marco, additional, Muñoz, Salvador Ros, additional, and Truică, Ciprian-Octavian, additional
Published: 2022
Full Text: View/download PDF

21. Linking the LASLA Corpus in the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin

Author: Declerck, Thierry, McCrae, John P., Montiel, Elena, Chiarcos, Christian, Ionov, Maxim, Fantoli, Margherita, Passarotti, Marco Carlo (ORCID:0000-0002-9806-7187), Mambrini, Francesco (ORCID:0000-0003-0834-7562), Moretti, Giovanni, and Ruffolo, Paolo
Abstract: This paper describes the process of interlinking the 130 Classical Latin texts provided by an annotated corpus developed at the LASLA laboratory with the LiLa Knowledge Base, which makes linguistic resources for Latin interoperable by following the principles of the Linked Data paradigm and making reference to classes and properties of widely adopted ontologies to model the relevant information. After introducing the overall architecture of the LiLa Knowledge Base and the LASLA corpus, the paper details the phases of the process of linking the corpus with the collection of lemmas of LiLa and presents a federated query to exemplify the added value of interoperability of LASLA's texts with other resources for Latin.
Published: 2022

22. Computational Morphology with OntoLex-Morph

Author: Declerck, Thierry, McCrae, John P., Montiel, Elena, Chiarcos, Christian, Ionov, Maxim, Gkirtzou, Katerina, Khan, Fahad, Labropoulou, Penny, Passarotti, Marco Carlo (ORCID:0000-0002-9806-7187), and Pellegrini, Matteo (ORCID:0000-0003-4378-5824)
Abstract: This paper describes the current status of the emerging OntoLex module for linguistic morphology. It serves as an update to the previous version of the vocabulary (Klimek et al. 2019). Whereas this earlier model was exclusively focusing on descriptive morphology and focused on applications in lexicography, we now present a novel part and a novel application of the vocabulary to applications in language technology, i.e., the rule-based generation of lexicons, introducing a dynamic component into OntoLex.
Published: 2022

23. D3.3 Language Resource Transformation Software

Author: Fäth, Christian, Ionov, Maxim, and Chiarcos, Christian
Abstract: Within the Prêt-à-LLOD project, five major challenges when working with linguistic resources are addressed (cf. Fig. 1): discovery of resources; data management and licensing; transformation of heterogeneous resources; interlinking resources; and embedding resources and algorithms into complex workflows. This short report accompanies the Prêt-à-LLOD software deliverable D3.3 “Resource Transformation Software”. It is meant to provide a quick overview of the motivation, software architecture and its basic functionalities. It also serves as a pointer to the repositories, where both the code and more detailed user guidelines are available.
Published: 2021
Full Text: View/download PDF

24. Linking Discourse Marker Inventories

Author: Chiarcos, Christian and Ionov, Maxim
Subjects: Information systems → Graph-based database models, OntoLex, Computing methodologies → Discourse, dialogue and pragmatics, OLiA, linked data, ddc:400, discourse markers, discourse processing
Abstract: The paper describes the first comprehensive edition of machine-readable discourse marker lexicons. Discourse markers such as and, because, but, though or thereafter are essential communicative signals in human conversation, as they indicate how an utterance relates to its communicative context. As much of this information is implicit or expressed differently in different languages, discourse parsing, context-adequate natural language generation and machine translation are considered particularly challenging aspects of Natural Language Processing. Providing this data in machine-readable, standard-compliant form will thus facilitate such technical tasks and, moreover, make it possible to explore how techniques for translation inference can be applied to this particular group of lexical resources that was previously largely neglected in the context of Linguistic Linked (Open) Data. OASIcs, Vol. 93, 3rd Conference on Language, Data and Knowledge (LDK 2021), pages 40:1-40:15
Published: 2021
Full Text: View/download PDF

25. APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text

Author: Ionov, Maxim
Subjects: Information systems → Graph-based database models, Computing methodologies → Knowledge representation and reasoning, interlinear glossed text (IGT), Computing methodologies → Language resources, Linguistic Linked Open Data (LLOD), less-resourced languages in the (multilingual) Semantic Web, data modeling
Abstract: This paper presents APiCS-Ligt, an LLOD version of a collection of interlinear glossed linguistic examples from APiCS, the Atlas of Pidgin and Creole Language Structures. Interlinear glossed text (IGT) plays an important role in typological and theoretical linguistic research, especially with understudied and endangered languages: It provides a way to understand linguistic phenomena without necessarily knowing the source language, which is crucial for these languages since native speakers are not always easily accessible. Previously, we presented Ligt, an RDF vocabulary created for representing interlinear glosses in text segments. In this paper, we present our conversion of the APiCS IGT dataset into this model and describe our efforts in linking linguistic annotations to an external ontology to add semantic representation. OASIcs, Vol. 93, 3rd Conference on Language, Data and Knowledge (LDK 2021), pages 27:1-27:8
Published: 2021
Full Text: View/download PDF

26. An Ontology for CoNLL-RDF: Formal Data Structures for TSV Formats in Language Technology

Author: Chiarcos, Christian, Ionov, Maxim, and Glaser, Luis
Published: 2021
Full Text: View/download PDF

27. An Ontology for CoNLL-RDF: Formal Data Structures for TSV Formats in Language Technology

Author: Chiarcos, Christian, Ionov, Maxim, Glaser, Luis, and Fäth, Christian
Abstract: In language technology and language sciences, tab-separated values (TSV) represent a frequently used formalism to represent linguistically annotated natural language, often addressed as "CoNLL formats". A large number of such formats do exist, but although they share a number of common features, they are not interoperable, as different pieces of information are encoded differently in these dialects. CoNLL-RDF refers to a programming library and the associated data model that has been introduced to facilitate processing and transforming such TSV formats in a serialization-independent way. CoNLL-RDF represents CoNLL data, by means of RDF graphs and SPARQL update operations, but so far, without machine-readable semantics, with annotation properties created dynamically on the basis of a user-defined mapping from columns to labels. Current applications of CoNLL-RDF include linking between corpora and dictionaries [Mambrini and Passarotti, 2019] and knowledge graphs [Tamper et al., 2018], syntactic parsing of historical languages [Chiarcos et al., 2018; Chiarcos et al., 2018], the consolidation of syntactic and semantic annotations [Chiarcos and Fäth, 2019], a bridge between RDF corpora and a traditional corpus query language [Ionov et al., 2020], and language contact studies [Chiarcos et al., 2018]. We describe a novel extension of CoNLL-RDF, introducing a formal data model, formalized as an ontology. The ontology is a basis for linking RDF corpora with other Semantic Web resources, but more importantly, its application for transformation between different TSV formats is a major step for providing interoperability between CoNLL formats.
Published: 2021
Full Text: View/download PDF

28. APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text

Author: Ionov, Maxim
Abstract: This paper presents APiCS-Ligt, an LLOD version of a collection of interlinear glossed linguistic examples from APiCS, the Atlas of Pidgin and Creole Language Structures. Interlinear glossed text (IGT) plays an important role in typological and theoretical linguistic research, especially with understudied and endangered languages: It provides a way to understand linguistic phenomena without necessarily knowing the source language, which is crucial for these languages since native speakers are not always easily accessible. Previously, we presented Ligt, an RDF vocabulary created for representing interlinear glosses in text segments. In this paper, we present our conversion of the APiCS IGT dataset into this model and describe our efforts in linking linguistic annotations to an external ontology to add semantic representation.
Published: 2021
Full Text: View/download PDF

29. Linking Discourse Marker Inventories

Author: Chiarcos, Christian and Ionov, Maxim
Abstract: The paper describes the first comprehensive edition of machine-readable discourse marker lexicons. Discourse markers such as and, because, but, though or thereafter are essential communicative signals in human conversation, as they indicate how an utterance relates to its communicative context. As much of this information is implicit or expressed differently in different languages, discourse parsing, context-adequate natural language generation and machine translation are considered particularly challenging aspects of Natural Language Processing. Providing this data in machine-readable, standard-compliant form will thus facilitate such technical tasks and, moreover, make it possible to explore how techniques for translation inference can be applied to this particular group of lexical resources that was previously largely neglected in the context of Linguistic Linked (Open) Data.
Published: 2021
Full Text: View/download PDF

30. Use of polyamidoamine dendrimers to engineer BDNF-producing human mesenchymal stem cells

Author: Shakhbazau, Antos, Shcharbin, Dzmitry, Seviaryn, Ihar, Goncharova, Natalya, Kosmacheva, Svetlana, Potapnev, Mihail, Gabara, Barbara, Ionov, Maxim, and Bryszewska, Maria
Published: 2010
Full Text: View/download PDF

31. D5.1 Report on Vocabularies for Interoperable Language Resources and Services

Author: Chiarcos, Christian, Cimiano, Philipp, Bosque-Gil, Julia, Declerck, Thierry, Fäth, Christian, Gracia, Jorge, Ionov, Maxim, McCrae, John P., Montiel-Ponsoda, Elena, Pia di Buono, Maria, Saurí, Roser, Bobillo, Fernando, and Elahi, Mohammad Fazleh
Abstract: This document provides a survey of vocabularies for language resources and services and sketches necessary extensions and the expected contribution of the Prêt-à-LLOD project to their further development for phenomena currently not sufficiently covered. Future updates with respect to this will be documented within Task 5.4. We focus on three main aspects of linguistically analyzed data: 1. lexical-conceptual resources, i.e., repositories of terminology, lexical data, translation, and semantics, 2. linguistically annotated data, concerning linguistic analysis of textual or transcribed data, and 3. language resource terminology, i.e., linguistic data categories and metadata. For these areas, we describe representative vocabularies from the Linguistic Linked Open Data community (RDF-based vocabularies) as well as other approaches (e.g., ISO TC37 standards), we identify a number of gaps, and we describe ongoing efforts to address these gaps within the Prêt-à-LLOD project.
Published: 2020
Full Text: View/download PDF

32. Proceedings of the LREC 2020 7th Workshop on Linked Data in Linguistics

Author: Ionov, Maxim, McCrae, John, Chiarcos, Christian, Declerck, Thierry, Bosque-Gil, Julia, and Gracia, Jorge
Subjects: WP7, strategies, tools, standards for lexicographic resources (objective 3), WP2, Linguistic Linked Open Data, Standards, Infrastructure, WP3, Linguistic Linked Open Data
Abstract: Past years have seen a growing interest in the application of knowledge graphs and Semantic Web technologies to language resources, and their publication as linked data on the Web. As of today, a large number of language resources have been either converted or created natively as linked data on the basis of data models specifically designed for the representation of linguistic content. Examples are wordnets, dictionaries, corpora, culminating in the emergence of a Linguistic Linked Open Data (LLOD) cloud (http://linguistic-lod.org/). Since its establishment in 2012, the Linked Data in Linguistics (LDL) workshop series has become the major forum for presenting, discussing and disseminating technologies, vocabularies, resources and experiences regarding the application of semantic technologies and the Linked Open Data (LOD) paradigm to language resources in order to facilitate their visibility, accessibility, interoperability, reusability, enrichment, combined evaluation and integration. The LDL workshops contribute to the discussion, dissemination and establishment of community standards that drive this development, most notably the OntoLex-lemon model for lexical resources, as well as standards for other types of language resources still under development. The workshop series is organized by Open Linguistics, founded in 2010 as a Working Group of the Open Knowledge Foundation with close involvement of related communities, such as W3C Community Groups, and international research projects. It takes a general focus on LOD-based resources, vocabularies, infrastructures and technologies as means for managing, improving and using language resources on the Web. As technology and resources increasingly converge towards a LOD-based ecosystem, this year we particularly encouraged submissions on Linked-Data Aware Tools and Services and Linked Language Resources Infrastructure, i.e. managing, curating and applying LLOD technologies and resources in a reliable and reproducible way for the needs of linguistics, NLP and digital humanities. The Workshop is also part of the dissemination activities of the COST Action CA18209, NexusLinguarum: European network for Web-centred linguistic data science, and ELEXIS.
Published: 2020
Full Text: View/download PDF

33. Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF

Author: Chiarcos, Christian and Ionov, Maxim
Subjects: 000 Computer science, knowledge, general works, Computer Science, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 02 engineering and technology, ddc:400
Abstract: The paper introduces Ligt, a native RDF vocabulary for representing linguistic examples as text with interlinear glosses (IGT) in a linked data formalism. Interlinear glossing is a notation used in various fields of linguistics to provide readers with a way to understand linguistic phenomena and to provide corpus data when documenting endangered languages. This data is usually provided with morpheme-by-morpheme correspondence which is not supported by any established vocabularies for representing linguistic corpora or automated annotations. Interlinear Glossed Text can be stored and exchanged in several formats specifically designed for the purpose, but these differ in their designs and concepts, and they are tied to particular tools, so the reusability of the annotated data is limited. To improve interoperability and reusability, we propose to convert such glosses to a tool-independent representation well-suited for the Web of Data, i.e., a representation in RDF. Beyond establishing structural (format) interoperability by means of a common data representation, our approach also allows using shared vocabularies and terminology repositories available from the (Linguistic) Linked Open Data cloud. We describe the core vocabulary and the converters that use this vocabulary to convert IGT in a format of various widely-used tools into RDF. Ultimately, a Linked Data representation will facilitate the accessibility of language data from less-resourced language varieties within the (Linguistic) Linked Open Data cloud, as well as enable novel ways to access and integrate this information with (L)LOD dictionary data and other types of lexical-semantic resources. In a longer perspective, data currently only available through these formats will become more visible and reusable and contribute to the development of a truly multilingual (semantic) web.
Published: 2019
Full Text: View/download PDF

34. Challenges for the Representations for Morphology in Ontology Lexicons

Author: Klimek, Bettina, McCrae, John P., Ionov, Maxim, Tauber, James K., Chiarcos, Christian, Bosque-Gil, Julia, and Buitelaar, Paul
Abstract: Recent years have experienced a growing trend in the publication of language resources as Linguistic Linked Data (LLD) to enhance their discovery, reuse and the interoperability of tools that consume language data. To this aim, the OntoLex-lemon model has emerged as a de-facto standard to represent lexical data on the Web. However, traditional dictionaries contain a considerable amount of morphological information which is not straightforwardly representable as LLD within the current model. In order to fill this gap a new Morphology Module of OntoLex-lemon is currently being developed. This paper presents the results of this model as on-going work as well as the underlying challenges that emerged during the module development. Based on the MMoOn Core ontology, it aims to account for a wide range of morphological information, ranging from endings to derive whole paradigms to the decomposition and generation of lexical entries which is in compliance with other OntoLex-lemon modules and facilitates the encoding of complex morphological data in ontology lexicons.
Published: 2019
Full Text: View/download PDF

35. Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF

Author: Chiarcos, Christian and Ionov, Maxim
Abstract: The paper introduces Ligt, a native RDF vocabulary for representing linguistic examples as text with interlinear glosses (IGT) in a linked data formalism. Interlinear glossing is a notation used in various fields of linguistics to provide readers with a way to understand linguistic phenomena and to provide corpus data when documenting endangered languages. This data is usually provided with morpheme-by-morpheme correspondence which is not supported by any established vocabularies for representing linguistic corpora or automated annotations. Interlinear Glossed Text can be stored and exchanged in several formats specifically designed for the purpose, but these differ in their designs and concepts, and they are tied to particular tools, so the reusability of the annotated data is limited. To improve interoperability and reusability, we propose to convert such glosses to a tool-independent representation well-suited for the Web of Data, i.e., a representation in RDF. Beyond establishing structural (format) interoperability by means of a common data representation, our approach also allows using shared vocabularies and terminology repositories available from the (Linguistic) Linked Open Data cloud. We describe the core vocabulary and the converters that use this vocabulary to convert IGT in a format of various widely-used tools into RDF. Ultimately, a Linked Data representation will facilitate the accessibility of language data from less-resourced language varieties within the (Linguistic) Linked Open Data cloud, as well as enable novel ways to access and integrate this information with (L)LOD dictionary data and other types of lexical-semantic resources. In a longer perspective, data currently only available through these formats will become more visible and reusable and contribute to the development of a truly multilingual (semantic) web.
Published: 2019
Full Text: View/download PDF

36. Expanding the horizons: adding a new language to the news personalization system

Author: Fedorovsky, Andrey, primary, Ionov, Maxim, additional, Litvinova, Varvara, additional, Olenina, Tatyana, additional, and Trofimova, Darya, additional
Published: 2015
Full Text: View/download PDF

37. Use of polyamidoamine dendrimers to engineer BDNF-producing human mesenchymal stem cells

Author: Shakhbazau, Antos, primary, Shcharbin, Dzmitry, additional, Seviaryn, Ihar, additional, Goncharova, Natalya, additional, Kosmacheva, Svetlana, additional, Potapnev, Mihail, additional, Gabara, Barbara, additional, Ionov, Maxim, additional, and Bryszewska, Maria, additional
Published: 2009
Full Text: View/download PDF

38. Fintan - Flexible, integrated transformation and annotation engineering

Author: Fäth, Christian, Chiarcos, Christian, Ebbrecht, Björn, and Ionov, Maxim
Subjects: TSV / CSV, Corpora, Dictionaries, Linked Data, Ontologies, Lexical Data, CoNLL, ddc:400, NLP, Graphs, Semantic Web, RDF
Abstract: We introduce the Flexible and Integrated Transformation and Annotation eNgeneering (Fintan) platform for converting heterogeneous linguistic resources to RDF. With its modular architecture, workflow management and visualization features, Fintan facilitates the development of complex transformation pipelines by integrating generic RDF converters and augmenting them with extended graph processing capabilities: Existing converters can be easily deployed to the system by means of an ontological data structure which renders their properties and the dependencies between transformation steps. Development of subsequent graph transformation steps for resource transformation, annotation engineering or entity linking is further facilitated by a novel visual rendering of SPARQL queries. A graphical workflow manager allows to easily manage the converter modules and combine them to new transformation pipelines. Employing the stream-based graph processing approach first implemented with CoNLL-RDF, we address common challenges and scalability issues when transforming resources and showcase the performance of Fintan by means of a purely graph-based transformation of the Universal Morphology data to RDF.

39. Proceedings of the LREC 2020 7th Workshop on Linked Data in Linguistics

Author: Ionov, Maxim
