Author: "Declerck, Thierry" / Publication Year Range: Last 10 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Declerck, Thierry"' showing total 333 results

1. What's the Meaning of Superhuman Performance in Today's NLU?

Author: Tedeschi, Simone, Bos, Johan, Declerck, Thierry, Hajic, Jan, Hershcovich, Daniel, Hovy, Eduard H., Koller, Alexander, Krek, Simon, Schockaert, Steven, Sennrich, Rico, Shutova, Ekaterina, and Navigli, Roberto
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: In the last five years, there has been a significant focus in Natural Language Processing (NLP) on developing larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in some cases. This has led to claims of superhuman capabilities and the provocative idea that certain tasks have been solved. In this position paper, we take a critical look at these claims and ask whether PLMs truly have superhuman abilities and what the current benchmarks are really evaluating. We show that these benchmarks have serious limitations affecting the comparison between humans and PLMs and provide recommendations for fairer and more transparent benchmarks. Comment: 9 pages, long paper at ACL 2023 proceedings
Published: 2023

2. Representing terminological data in the Semantic Web

Author: Martín-Chozas, Patricia, primary, Declerck, Thierry, additional, Montiel-Ponsoda, Elena, additional, and Rodríguez-Doncel, Víctor, additional
Published: 2024
Full Text: View/download PDF

3. Ontological Modelling of Rumors

Author: Declerck, Thierry, Osenova, Petya, Georgiev, Georgi, Lendvai, Piroska, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Liu, Ting, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Trandabăţ, Diana, editor, and Gîfu, Daniela, editor
Published: 2016
Full Text: View/download PDF

4. The Generation of a Corpus for Clinical Sentiment Analysis

Author: Deng, Yihan, Declerck, Thierry, Lendvai, Piroska, Denecke, Kerstin, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Sack, Harald, editor, Rizzo, Giuseppe, editor, Steinmetz, Nadine, editor, Mladenić, Dunja, editor, Auer, Sören, editor, and Lange, Christoph, editor
Published: 2016
Full Text: View/download PDF

5. Language Technologies for the Challenges of the Digital Age: 27th International Conference, GSCL 2017, Berlin, Germany, September 13-14, 2017, Proceedings

Author: Rehm, Georg and Declerck, Thierry
Subjects: Technology & Engineering, Education, Language Arts & Disciplines
Abstract: semantics; artificial intelligence; natural language processing systems; natural language processing; NLP; machine learning; social networking; named entities; support vector machines; SVM
Published: 2018

6. A uniform RDF-based Representation of the Interlinking of Wordnets and Sign Language Data

Author: Carvalho, Sara, Khan, Anas Fahad, Anić, Ana Ostroški, Spahiu, Blerina, Gracia, Jorge, McCrae, John P., Gromann, Dagmar, Heinisch, Barbara, Salgado, Ana, Declerck, Thierry, Bigeard, Sam, Callus, Dorianne, Matthews, Benjamin, Olsen, Sussi, and Xuereb, Loran Ripard
Published: 2023

7. A Linked Data Approach for linking and aligning Sign Language and Spoken Language Data

Author: Declerck, Thierry, Bigeard, Sam, Khan, Anas Fahad, Murtagh, Irene, Olsen, Sussi, Rosner, Michael, Schuurman, Ineke, Tchechmedjiev, Andon, and Way, Andy
Published: 2023

8. Towards an RDF Representation of the Infrastructure consisting in using Wordnets as a conceptual Interlingua between multilingual Sign Language Datasets

Author: Rigau, German, Bond, Francis, Rademaker, Alexandre, Declerck, Thierry, Troelsgaard, Thomas, and Olsen, Sussi
Published: 2023

9. Linked Open Data compliant Representation of the Interlinking of Nordic Wordnets and Sign Language Data

Author: Ilinykh, Nikolai, Morger, Felix, Dannélls, Dana, Dobnik, Simon, Megyesi, Beáta, Nivre, Joakim, Declerck, Thierry, and Olsen, Sussi
Published: 2023

10. Proceedings of the 5th Workshop on Linked Data in Linguistics: Managing, Building and Using Linked Language Resources (LDL-2016), 24 May 2016, Portorož, Slovenia

Author: McCrae, John P., Chiarcos, Christian, Montiel Ponsoda, Elena, Declerck, Thierry, Osenova, Petya, and Hellmann, Sebastian
Subjects: ddc:004
Published: 2023

11. Recent developments for the linguistic linked open data infrastructure

Author: Declerck, Thierry, McCrae, John, Hartung, Matthias, Gracia, Jorge, Chiarcos, Christian, Montiel, Elena, Cimiano, Philipp, Revenko, Artem, Lee, Deidre, Racioppa, Stefania, Nasir, Jamal, Orlikowski, Matthias, Lanau-Coronas, Marta, Fäth, Christian, Rico, Mariano, Elahi, Mohammad Fazleh, Khvalchik, Maria, Sauri, Roser, Gonzalez, Meritxell, and Cooney, Katharine
Subjects: Standards, Infrastructure, strategies, tools, standards for lexicographic resources (objective 3), WP2, Linguistic Linked Open Data, ddc:400
Abstract: In this paper we describe the contributions made by the European H2020 project “Prêt-à-LLOD” (‘Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors’) to the further development of the Linguistic Linked Open Data (LLOD) infrastructure. Prêt-à-LLOD aims to develop a new methodology for building data value chains applicable to a wide range of sectors and applications and based around language resources and language technologies that can be integrated by means of semantic technologies. We describe the methods implemented for increasing the number of language data sets in the LLOD. We also present the approach for ensuring interoperability and for porting LLOD data sets and services to other infrastructures, as well as the contribution of the projects to existing standards. This paper is partially based upon work from COST Action CA18209 - NexusLinguarum "European Network for Web-centred Linguistic Data Science", supported by COST (European Cooperation in Science and Technology).
Published: 2023

12. Embeddings for the lexicon: modelling and representation

Author: Chiarcos, Christian, Declerck, Thierry, and Ionov, Maxim
Subjects: ddc:004
Published: 2023

13. Balancing the digital presence of languages in and for technological development: a policy brief on the inclusion of data of under-resourced languages into the linked data cloud

Author: Bosque-Gil, Julia, Mititelu, Verginica Barbu, Oliveira, Hugo Gonçalo, Ionov, Maxim, Gracia, Jorge, Rychkova, Liudmila, Valunaite Oleskeviciene, Giedre, Chiarcos, Christian, Declerck, Thierry, and Dojchinovski, M.
Published: 2023

14. A survey of guidelines and best practices for the generation, interlinking, publication, and validation of linguistic linked data

Author: Khan, Fahad, Chiarcos, Christian, Declerck, Thierry, Di Buono, Maria Pia, Dojchinovski, Milan, Gracia, Jorge, Valunaite Oleskeviciene, Giedre, and Gifu, Daniela
Subjects: ddc:400
Abstract: This article discusses a survey carried out within the NexusLinguarum COST Action which aimed to give an overview of existing guidelines (GLs) and best practices (BPs) in linguistic linked data. In particular it focused on four core tasks in the production/publication of linked data: generation, interlinking, publication, and validation. We discuss the importance of GLs and BPs for LLD before describing the survey and its results in full. Finally we offer a number of directions for future work in order to address the findings of the survey.
Published: 2023

15. Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference (LREC2022), 20-25 June 2022, Marseille, France

Author: Declerck, Thierry, McCrae, John Philip, Montiel, Elena, Chiarcos, Christian, and Ionov, Maxim
Subjects: ddc:400
Published: 2023

16. Enriching Multiword Terms in Wiktionary with Pronunciation Information

Author: Bajcetic, Lenka, primary, Declerck, Thierry, additional, and Sérasset, Gilles, additional
Published: 2023
Full Text: View/download PDF

17. FINDINGS OF THE IWSLT 2023 EVALUATION CAMPAIGN

Author: Agarwal, Milind, primary, Agrawal, Sweta, additional, Anastasopoulos, Antonios, additional, Bentivogli, Luisa, additional, Bojar, Ondřej, additional, Borg, Claudia, additional, Carpuat, Marine, additional, Cattoni, Roldano, additional, Cettolo, Mauro, additional, Chen, Mingda, additional, Chen, William, additional, Choukri, Khalid, additional, Chronopoulou, Alexandra, additional, Currey, Anna, additional, Declerck, Thierry, additional, Dong, Qianqian, additional, Duh, Kevin, additional, Estève, Yannick, additional, Federico, Marcello, additional, Gahbiche, Souhir, additional, Haddow, Barry, additional, Hsu, Benjamin, additional, Mon Htut, Phu, additional, Inaguma, Hirofumi, additional, Javorský, Dávid, additional, Judge, John, additional, Kano, Yasumasa, additional, Ko, Tom, additional, Kumar, Rishu, additional, Li, Pengwei, additional, Ma, Xutai, additional, Mathur, Prashant, additional, Matusov, Evgeny, additional, McNamee, Paul, additional, P. McCrae, John, additional, Murray, Kenton, additional, Nadejde, Maria, additional, Nakamura, Satoshi, additional, Negri, Matteo, additional, Nguyen, Ha, additional, Niehues, Jan, additional, Niu, Xing, additional, Kr. Ojha, Atul, additional, E. Ortega, John, additional, Pal, Proyag, additional, Pino, Juan, additional, van der Plas, Lonneke, additional, Polák, Peter, additional, Rippeth, Elijah, additional, Salesky, Elizabeth, additional, Shi, Jiatong, additional, Sperber, Matthias, additional, Stüker, Sebastian, additional, Sudoh, Katsuhito, additional, Tang, Yun, additional, Thompson, Brian, additional, Tran, Kevin, additional, Turchi, Marco, additional, Waibel, Alex, additional, Wang, Mingxuan, additional, Watanabe, Shinji, additional, and Zevallos, Rodolfo, additional
Published: 2023
Full Text: View/download PDF

18. What’s the Meaning of Superhuman Performance in Today’s NLU?

Author: Tedeschi, Simone, primary, Bos, Johan, additional, Declerck, Thierry, additional, Hajič, Jan, additional, Hershcovich, Daniel, additional, Hovy, Eduard, additional, Koller, Alexander, additional, Krek, Simon, additional, Schockaert, Steven, additional, Sennrich, Rico, additional, Shutova, Ekaterina, additional, and Navigli, Roberto, additional
Published: 2023
Full Text: View/download PDF

19. Are there just WordNets or also SignNets?

Author: Schuurman, Ineke, Declerck, Thierry, Brosens, Caro, Janssens, Margot, Vandeghinste, Vincent, and Vanroy, Bram
Subjects: computational linguistics, lt3, Leuven.ai, sign language, signon, wordnet, signnet, Languages and Literatures
Abstract: For Sign Languages (SLs), can we create a SignNet, like a WordNet for spoken languages: a network of semantic relations between constitutive elements of SLs? We first discuss approaches that link SL data to wordnets, or integrate such elements with some adaptations into the structure of WordNet. Then, we present requirements for a SignNet, which is built on SL data and then linked to WordNet.
Published: 2023

20. When linguistics meets web technologies. Recent advances in modelling linguistic linked data

Author: Khan, Anas Fahad, Chiarcos, Christian, Declerck, Thierry, Gifu, Daniela, González-Blanco García, Elena, Gracia, Jorge, Ionov, Maxim, Labropoulou, Penny, Mambrini, Francesco, McCrae, John P., Pagé-Perron, Émilie, Passarotti, Marco, Muñoz, Salvador Ros, and Truică, Ciprian-Octavian
Subjects: FAIR principles, Computer Networks and Communications, Settore L-LIN/01 - GLOTTOLOGIA E LINGUISTICA, Linguistic Linked Data, ddc:400, Language resources, Linguistic Linked Open Data, Computer Science Applications, Information Systems, Semantic Web
Abstract: This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD) focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant impact on linked data models and vocabularies. Next, we give a general overview of existing vocabularies and models for different categories of LLD resource. After which we look at some of the latest developments in community standards and initiatives including descriptions of recent work on the OntoLex-Lemon model, a survey of recent initiatives in linguistic annotation and LLD, and a discussion of the LLD metadata vocabularies META-SHARE and lime. In the next part of the paper, we focus on the influence of projects on LLD models and vocabularies, starting with a general survey of relevant projects, before dedicating individual sections to a number of recent projects and their impact on LLD vocabularies and models. Finally, in the conclusion, we look ahead at some future challenges for LLD models and vocabularies. The appendix to the paper consists of a brief introduction to the OntoLex-Lemon model.
Published: 2022
Full Text: View/download PDF

21. Using Wiktionary to Create Specialized Lexical Resources and Datasets

Author: Bajčetić, Lenka and Declerck, Thierry
Subjects: ComputingMethodologies_PATTERNRECOGNITION, Ambiguities, strategies, tools, standards for lexicographic resources (objective 3), WP2, WP4, Wiktionary
Abstract: This paper describes an approach aiming at utilizing Wiktionary data for creating specialized lexical datasets which can be used for enriching other lexical (semantic) resources or for generating datasets that can be used for evaluating or improving NLP tasks, like Word Sense Disambiguation, Word-in-Context challenges, or Sense Linking across lexicons and dictionaries. We have focused on Wiktionary data about pronunciation information in English, and grammatical number and grammatical gender in German.
Published: 2022
Full Text: View/download PDF

22. When linguistics meets web technologies. Recent advances in modelling linguistic linked data

Author: Khan, Anas Fahad, primary, Chiarcos, Christian, additional, Declerck, Thierry, additional, Gifu, Daniela, additional, García, Elena González-Blanco, additional, Gracia, Jorge, additional, Ionov, Maxim, additional, Labropoulou, Penny, additional, Mambrini, Francesco, additional, McCrae, John P., additional, Pagé-Perron, Émilie, additional, Passarotti, Marco, additional, Muñoz, Salvador Ros, additional, and Truică, Ciprian-Octavian, additional
Published: 2022
Full Text: View/download PDF

23. Towards the Profiling of Linked Lexicographic Resources

Author: Bajčetić, Lenka, Yim, Seung-Bin, and Declerck, Thierry
Subjects: ComputerApplications_COMPUTERSINOTHERSYSTEMS, ELEXIS, Lexicographic Profiling, Dictionary evaluation
Abstract: This paper presents Edie: ELEXIS Dictionary Evaluator. Edie is designed to create profiles for lexicographic resources accessible through the ELEXIS platform. These profiles can be used to evaluate and compare lexicographic resources, and in particular they can be used to identify potential data that could be linked.
Published: 2022
Full Text: View/download PDF

24. Towards the Linking of a Sign Language Ontology with Lexical Data

Author: Declerck, Thierry
Subjects: OntoLex-Lemon, Linked Data, Sign Languages
Abstract: We describe our current work for linking a new ontology for representing constitutive elements of Sign Languages with lexical data encoded within the OntoLex-Lemon framework. We first present very briefly the current state of the ontology, and show how transcriptions of signs can be represented in OntoLex-Lemon, in a minimalist manner, before addressing the challenges of linking the elements of the ontology to full lexical descriptions of the spoken languages.
Published: 2022
Full Text: View/download PDF

25. Proceedings of the 8th Workshop on Linked Data in Linguistics (LDL-2022)

Author: Declerck, Thierry, McCrae, John P., Montiel, Elena, Chiarcos, Christian, and Ionov, Max
Subjects: WP7, cooperation and knowledge exchange (objective 1)
Abstract: This volume documents the Proceedings of the 8th Workshop on Linked Data in Linguistics, held on Friday the 24th of June as part of the LREC 2022 conference (International Conference on Language Resources and Evaluation). Since its inception, the workshop series on Linked Data in Linguistics (LDL) established itself as the main venue for discussing how Linked Open Data (LOD) and semantic web technologies can be used for processing, analysing, publishing, and managing linguistic data. This includes the fields of natural language processing (NLP), language resources (LRs), lexicography and digital humanities (DH), and has been leading to the development of linguistic data science as a new area of study. The LDL workshop series has contributed greatly to the development of the Linguistic Linked Open Data (LLOD) cloud and the development of best practices for publishing and accessing language resources and providing language technology services on the web. Most notably, this includes community standards such as the NLP Interchange Format (NIF), the OntoLex-Lemon model of the W3C Community Group Ontology-Lexica, and numerous domain-specific adaptations and extensions that these models have had an influence on. In addition, there are an increasing number of national, European, and international research projects that build on LLOD technology. These will contribute to its further development and will help ensure the success of this workshop and a high attendance rate. The 10th anniversary edition of the LDL can count on the support of the COST action “NexusLinguarum: European Network for Web-centered Linguistic Data Science”, as well as two Horizon 2020 projects. Firstly, the Prêt-à-LLOD project, which is making linguistic linked open data ready-to-use, and, secondly, the ELEXIS project on building a lexicographic infrastructure.
Published: 2022
Full Text: View/download PDF

26. AGILe: The First Lemmatizer for Ancient Greek Inscriptions

Author: de Graaf, Evelien, Stopponi, Silvia, Bos, Jasper, Peels-Matthey, Saskia, Nissim, Malvina, Calzolari, Nicoletta, Béchet, Frédéric, Blache, Philippe, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Hélène, Odijk, Jan, Piperidis, Stelios, Computational Linguistics (CL), Theoretical and Empirical Linguistics (TEL), and Research Centre for Historical Studies (CHS)
Subjects: ancient Greek, lemmatizer, digital classics
Abstract: To facilitate corpus searches by classicists as well as to reduce data sparsity when training models, we focus on the automatic lemmatization of ancient Greek inscriptions, which have not received as much attention in this sense as literary text data has. We show that existing lemmatizers for ancient Greek, trained on literary data, are not performant on epigraphic data, due to major language differences between the two types of texts. We thus train the first inscription-specific lemmatizer achieving above 80% accuracy, and make both the models and the lemmatized data available to the community. We also provide a detailed error analysis highlighting peculiarities of inscriptions which again highlights the importance of a lemmatizer dedicated to inscriptions.
Published: 2022

27. Proceedings of the Language Resources and Evaluation Conference

Author: Calzolari, Nicoletta, Béchet, Frédéric, Blache, Philippe, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Hélène, Odijk, Jan, Piperidis, Stelios, LS OZ Taal en spraaktechnologie, and ILS LLI
Subjects: Artificial Intelligence, Language and Linguistics
Published: 2022

28. Evaluating Pre-training Objectives for Low-Resource Translation into Morphologically Rich Languages

Author: Dhar, Prajit, Bisazza, Arianna, van Noord, Gertjan, Calzolari, Nicoletta, Béchet, Frédéric, Blache, Philippe, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Hélène, Odijk, Jan, Piperidis, Stelios, and Computational Linguistics (CL)
Abstract: The scarcity of parallel data is a major limitation for Neural Machine Translation (NMT) systems, in particular for translation into morphologically rich languages (MRLs). An important way to overcome the lack of parallel data is to leverage target monolingual data, which is typically more abundant and easier to collect. We evaluate a number of techniques to achieve this, ranging from back-translation to random token masking, on the challenging task of translating English into four typologically diverse MRLs, under low-resource settings. Additionally, we introduce Inflection Pre-Training (or PT-Inflect), a novel pre-training objective whereby the NMT system is pre-trained on the task of re-inflecting lemmatized target sentences before being trained on standard source-to-target language translation. We conduct our evaluation on four typologically diverse target MRLs, and find that PT-Inflect surpasses NMT systems trained only on parallel data. While PT-Inflect is outperformed by back-translation overall, combining the two techniques leads to gains in some of the evaluated language pairs.
Published: 2022

29. Introducing Frege to Fillmore: A FrameNet Dataset that Captures both Sense and Reference

Author: Remijnse, Levi, Vossen, Piek, Fokkens, Antske, Titarsolej, Sam, Calzolari, Nicoletta, Bechet, Frederic, Blache, Philippe, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Helene, Odijk, Jan, Piperidis, Stelios, Language, and Network Institute
Subjects: lexicon, annotation tool, frame semantics, reference, events, SDG 4 - Quality Education
Abstract: This article presents the first output of the Dutch FrameNet annotation tool, which facilitates both referential- and frame-annotations of language-independent corpora. On the referential level, the tool links in-text mentions to structured data, grounding the text in the real world. On the frame level, those same mentions are annotated with respect to their semantic sense. This way of annotating not only generates a rich linguistic dataset that is grounded in real-world event instances, but also guides the annotators in frame identification, resulting in high inter-annotator-agreement and consistent annotations across documents and at discourse level, exceeding traditional sentence level annotations of frame elements. Moreover, the annotation tool features a dynamic lexical lookup that increases the development of a cross-domain FrameNet lexicon.
Published: 2022

30. Efficiently and Thoroughly Anonymizing a Transformer Language Model for Dutch Electronic Health Records: a Two-Step Method

Author: Verkijk, Stella, Vossen, Piek, Calzolari, Nicoletta, Bechet, Frederic, Blache, Philippe, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Helene, Odijk, Jan, Piperidis, Stelios, Language, and Network Institute
Subjects: SDG 17 - Partnerships for the Goals, Medical Text Data, Anonymization, Language Model
Abstract: Neural Network (NN) architectures are used more and more to model large amounts of data, such as text data available online. Transformer-based NN architectures have been shown to be very useful for language modelling. Although many researchers study how such Language Models (LMs) work, not much attention has been paid to the privacy risks of training LMs on large amounts of data and publishing them online. This paper presents a new method for anonymizing a language model by presenting the way in which MedRoBERTa.nl, a Dutch language model for hospital notes, was anonymized. The two-step method involves i) automatic anonymization of the training data and ii) semi-automatic anonymization of the LM's vocabulary. Adopting the fill-mask task where the model predicts what tokens are most probable to appear in a certain context, it was tested how often the model will predict a name in a context where a name should be. It was shown that it predicts a name-like token 0.2% of the time. Any name-like token that was predicted was never the name originally presented in the training data. By explaining how an LM trained on highly private real-world medical data can be safely published with open access, we hope that more language resources will be published openly and responsibly so the community can profit from them.
Published: 2022

31. Story Trees: Representing Documents using Topological Persistence

Author: Haghighatkhah, Pantea, Fokkens, Antske, Sommerauer, Pia, Speckmann, Bettina, Verbeek, Kevin, Calzolari, Nicoletta, Bechet, Frederic, Blache, Philippe, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Helene, Odijk, Jan, Piperidis, Stelios, Language, and Network Institute
Subjects: Semantic Vectors, SDG 4 - Quality Education, Document level discourse, Topical Data Analysis
Abstract: Topological Data Analysis (TDA) focuses on the inherent shape of (spatial) data. As such, it may provide useful methods to explore spatial representations of linguistic data (embeddings) which have become central in NLP. In this paper we aim to introduce TDA to researchers in language technology. We use TDA to represent document structure as so-called story trees. Story trees are hierarchical representations created from semantic vector representations of sentences via persistent homology. They can be used to identify and clearly visualize prominent components of a story line. We showcase their potential by using story trees to create extractive summaries for news stories.
Published: 2022

32. Proceedings of the Language Resources and Evaluation Conference

Author: LS OZ Taal en spraaktechnologie, ILS LLI, Calzolari, Nicoletta, Béchet, Frédéric, Blache, Philippe, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Hélène, Odijk, Jan, and Piperidis, Stelios
Published: 2022

33. The Universal Anaphora Scorer

Author: Sub Natural Language Processing, Natural Language Processing, Calzolari, Nicoletta, Bechet, Frederic, Blache, Philippe, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Helene, Odijk, Jan, Piperidis, Stelios, Yu, Juntao, Khosla, Sopan, Moosavi, Nafise Sadat, Paun, Silviu, Pradhan, Sameer, and Poesio, Massimo
Published: 2022

34. What a Creole Wants, What a Creole Needs

Author: Calzolari, Nicoletta, Bechet, Frederic, Blache, Philippe, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Helene, Odijk, Jan, Piperidis, Stelios, Lent, Heather, Ogueji, Kelechi, de Lhoneux, Miryam, Ahia, Orevaoghene, and Søgaard, Anders
Abstract: In recent years, the natural language processing (NLP) community has given increased attention to the disparity of efforts directed towards high-resource languages over low-resource ones. Efforts to remedy this delta often begin with translations of existing English datasets into other languages. However, this approach ignores that different language communities have different needs. We consider a group of low-resource languages, Creole languages. Creoles are both largely absent from the NLP literature, and also often ignored by society at large due to stigma, despite these languages having sizable and vibrant communities. We demonstrate, through conversations with Creole experts and surveys of Creole-speaking communities, how the things needed from language technology can change dramatically from one language to another, even when the languages are considered to be very similar to each other, as with Creoles. We discuss the prominent themes arising from these conversations, and ultimately demonstrate that useful language technology cannot be built without involving the relevant community.
Published: 2022

35. Linking the LASLA Corpus in the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin

Author: Declerck, Thierry, McCrae, John P., Montiel, Elena, Chiarcos, Christian, Ionov, Maxim, Fantoli, Margherita, Passarotti, Marco Carlo, Mambrini, Francesco, Moretti, Giovanni, Ruffolo, Paolo, Marco Passarotti (ORCID:0000-0002-9806-7187), and Francesco Mambrini (ORCID:0000-0003-0834-7562)
Abstract: This paper describes the process of interlinking the 130 Classical Latin texts provided by an annotated corpus developed at the LASLA laboratory with the LiLa Knowledge Base, which makes linguistic resources for Latin interoperable by following the principles of the Linked Data paradigm and making reference to classes and properties of widely adopted ontologies to model the relevant information. After introducing the overall architecture of the LiLa Knowledge Base and the LASLA corpus, the paper details the phases of the process of linking the corpus with the collection of lemmas of LiLa and presents a federated query to exemplify the added value of interoperability of LASLA's texts with other resources for Latin.
Published: 2022

36. Computational Morphology with OntoLex-Morph

Author: Declerck, Thierry, McCrae, John P., Montiel, Elena, Chiarcos, Christian, Ionov, Maxim, Gkirtzou, Katerina, Khan, Fahad, Labropoulou, Penny, Passarotti, Marco Carlo (ORCID:0000-0002-9806-7187), and Pellegrini, Matteo (ORCID:0000-0003-4378-5824)
Abstract: This paper describes the current status of the emerging OntoLex module for linguistic morphology. It serves as an update to the previous version of the vocabulary (Klimek et al. 2019). Whereas this earlier model focused exclusively on descriptive morphology and on applications in lexicography, we now present a novel part of the vocabulary and a novel application of it in language technology, i.e., the rule-based generation of lexicons, which introduces a dynamic component into OntoLex.
Published: 2022

37. The Index Thomisticus Treebank as Linked Data in the LiLa Knowledge Base

Author: Calzolari, Nicoletta, Béchet, Frédéric, Blache, Philippe, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Hélène, Odijk, Jan, Piperidis, Stelios, Mambrini, Francesco (ORCID:0000-0003-0834-7562), Passarotti, Marco (ORCID:0000-0002-9806-7187), Moretti, Giovanni, and Pellegrini, Matteo (ORCID:0000-0003-4378-5824)
Abstract: Although the Universal Dependencies initiative today allows for cross-linguistically consistent annotation of morphology and syntax in treebanks for several languages, syntactically annotated corpora are not yet interoperable with many lexical resources that describe properties of the words that occur therein. To cope with this limitation, we propose to adopt the principles of the Linguistic Linked Open Data community, to describe and publish dependency treebanks as LLOD. In particular, this paper illustrates the approach pursued in the LiLa Knowledge Base, which enables interoperability between corpora and lexical resources for Latin, to publish as Linguistic Linked Open Data the annotation layers of two versions of a Medieval Latin treebank (the Index Thomisticus Treebank).
Published: 2022

38. The Generation of a Corpus for Clinical Sentiment Analysis

Author: Deng, Yihan, primary, Declerck, Thierry, additional, Lendvai, Piroska, additional, and Denecke, Kerstin, additional
Published: 2016
Full Text: View/download PDF

39. Integration of sign language lexical data in the OntoLex-Lemon framework

Author: Declerck, Thierry
Subjects: strategies, tools, standards for lexicographic resources (objective 3), OntoLex-Lemon, WP4, Sign Languages, LLOD, 420 English
Abstract: We describe the status of work aimed at including sign language lexical data within the OntoLex-Lemon framework. Our general goal is to provide a multimodal extension to this framework, which was originally conceived for covering only the written and phonetic representation of lexical data. Our aim is to achieve in the longer term the same type of semantic interoperability between sign language lexical data as has been achieved for their spoken or written counterparts. We also want to achieve this goal across modalities: between sign language lexical data and spoken/written lexical data.
Published: 2022
Full Text: View/download PDF

40. SuMe: A Dataset Towards Summarizing Biomedical Mechanisms

Author: Bastan, Mohaddeseh, Shankar, N., Surdeanu, Mihai, Balasubramanian, Niranjan, Calzolari, Nicoletta, Bechet, Frederic, Blache, Philippe, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Helene, Odijk, Jan, and Piperidis, Stelios
Subjects: Biomedical NLP, Summarization, Text Generation, Explanation Generation, Relation Extraction
Abstract: Can language models read biomedical texts and explain the biomedical mechanisms discussed? In this work we introduce a biomedical mechanism summarization task. Biomedical studies often investigate the mechanisms behind how one entity (e.g., a protein or a chemical) affects another in a biological context. The abstracts of these publications often include a focused set of sentences that present relevant supporting statements regarding such relationships, associated experimental evidence, and a concluding sentence that summarizes the mechanism underlying the relationship. We leverage this structure and create a summarization task, where the input is a collection of sentences and the main entities in an abstract, and the output includes the relationship and a sentence that summarizes the mechanism. Using a small amount of manually labeled mechanism sentences, we train a mechanism sentence classifier to filter a large biomedical abstract collection and create a summarization dataset with 22k instances. We also introduce conclusion sentence generation as a pretraining task with 611k instances. We benchmark the performance of large bio-domain language models. We find that while the pretraining task helps improve performance, the best model produces acceptable mechanism outputs in only 32% of the instances, which shows the task presents significant challenges in biomedical language understanding and summarization.
Published: 2022

41. Multilingual SynSemClass for the Semantic Web (MSSW)

Author: Fučíková, Eva, Urešová, Zdeňka, Declerck, Thierry, Hajič, Jan, Straková, Jana, and Fernández-Alcaina, Cristina
Abstract: LLOD (Linguistic Linked Open Data) is a generic name for a set of mutually connected language resources, using ontological relations. The connections between concepts and between concepts and their expressions in natural language make them suitable for both research and industrial applications in the area of content analysis, natural language understanding, (language- and knowledge-based) inferencing and other tasks. In the presented task, the concrete work will be on converting the SynSemClass project dataset (in part as a result of a previous Humane AI Net microproject called META-O-NLU) into LLOD, connecting it to the huge amount of interlinked data already available. A partner is involved in the Prêt-à-LLOD H2020 project, making this project synergistic in nature and multiplicative in terms of results in previous projects. Partners are also involved in the COST Action “European network for Web-centered linguistic data science” (NexusLinguarum).
Published: 2022

42. D2.4 Strategic Report on Business Plan Development v2

Author: Baviera, Pierre, Cooney, Katharine, Hartung, Matthias, Martin, Katherine, Thurner, Thomas, and Declerck, Thierry
Abstract: This deliverable presents the second version of the strategic business development plan for Prêt-à-LLOD, developing and reporting the commercial partners’ specific business plans and achievements from version 1 of the report, delivered in month 18. The Prêt-à-LLOD commercial partners are: Semantic Web Company (SWC), Oxford University Press (OUP), Derilinx (DLX), and Semalytix (SEM). All Prêt-à-LLOD commercial partners have been involved in the documentation of their respective business development plans and thereby in the creation of this deliverable with the aim of creating a comprehensive picture across the pilot projects of the Prêt-à-LLOD project. Updates to version 1 of the deliverable have been made in the following sections: 1.3.1 Impact on Technology companies; 2.3 Business Model Canvas for pilot II - Linking Lexical Knowledge to Facilitate Rapid Integration and Wider Application of Lexicographic Resources for Technology Companies; 2.4 BMC for pilot III - Supporting the Development of Public Services in Open Government both within and across borders; 3.1 General market analysis; 3.3.3 Competitor Overview: Pilot II - Linking Lexical Knowledge to Facilitate Rapid Integration and Wider Application of Lexicographic Resources for Technology Companies - Oxford University Press (OUP); 3.3.4 Market Analysis: Pilot II - Oxford University Press (OUP); 3.3.5 Competitor Overview: Pilot III - Derilinx; 3.3.6 Market Analysis: Pilot III - Derilinx; 4.1 SWOT Analysis for Prêt-à-LLOD; 5 Collaboration with Academic Partners; 6 Conclusion.
Published: 2021
Full Text: View/download PDF

43. D5.5 Community Building Report

Author: Declerck, Thierry, McCrae, John P., Gracia, Jorge, and Bosque-Gil, Julia
Abstract: This is the second deliverable reporting on activities pursued in the context of Task 5.4. “Community Building”, which is a part of WP5 “Language Resource and Service Sustainability”. While there might be some overlap with activities reported in the first version of this report (D.5.4), the focus is on activities taking place in the time between January 2020 and March 2021.
Published: 2021
Full Text: View/download PDF

44. Covid-19 MLIA Information Extraction Task Round 1 Presentation and Main Findings

Author: Grouin, Cyril, Declerck, Thierry, Zweigenbaum, Pierre, Information, Langue Ecrite et Signée (ILES), Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), CentraleSupélec, Université Paris-Saclay, Centre National de la Recherche Scientifique (CNRS), Sciences et Technologies des Langues (STL), Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence (DFKI), and Institut National de Recherche en Informatique et en Automatique (Inria)
Subjects: [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
Abstract: In this paper, we present the information extraction task proposed in the first round of the Covid-19 MLIA @ Eval Initiative. During this first round, we proposed to identify six categories of information potentially relevant for the Covid-19 issue (sign or symptom and disease, medical test, drug and treatment, legal rules, everyday life actions, and findings), in texts available in seven languages (English, French, German, Greek, Italian, Spanish, and Swedish). Since no gold standard annotations were given, the participants reused their existing tools and resources. Four teams participated in this task.
Published: 2021

45. OASIcs, Volume 93, LDK 2021, Complete Volume

Author: Gromann, Dagmar, Sérasset, Gilles, Declerck, Thierry, McCrae, John P., Gracia, Jorge, Bosque-Gil, Julia, Bobillo, Fernando, and Heinisch, Barbara
Subjects: Data processing Computer science, Computing methodologies → Knowledge representation and reasoning, OASIcs, Volume 93, LDK 2021, Complete Volume, ddc:004, Computing methodologies → Natural language processing
Abstract: OASIcs, Volume 93, LDK 2021, Complete Volume, OASIcs, Vol. 93, 3rd Conference on Language, Data and Knowledge (LDK 2021), pages 1-516
Published: 2021
Full Text: View/download PDF

46. Towards the Addition of Pronunciation Information to Lexical Semantic Resources

Author: Declerck, Thierry and Bajčetić, Lenka
Subjects: strategies, tools, standards for lexicographic resources (objective 3), WP2, lexicographic standards (objective 2)
Abstract: This paper describes ongoing work aiming at adding pronunciation information to lexical semantic resources, with a focus on open wordnets. Our goal is not only to add a new modality to those semantic networks, but also to mark heteronyms listed in them with the pronunciation information associated with their different meanings. This work could contribute in the longer term to the disambiguation of multi-modal resources that combine text and speech.
Published: 2021
Full Text: View/download PDF

47. Front Matter, Table of Contents, Preface, Conference Organization

Author: Gromann, Dagmar, Sérasset, Gilles, Declerck, Thierry, McCrae, John P., Gracia, Jorge, Bosque-Gil, Julia, Bobillo, Fernando, and Heinisch, Barbara
Subjects: Conference Organization, Table of Contents, Computing methodologies → Knowledge representation and reasoning, Preface, Computing methodologies → Natural language processing, Front Matter
Abstract: Front Matter, Table of Contents, Preface, Conference Organization, OASIcs, Vol. 93, 3rd Conference on Language, Data and Knowledge (LDK 2021), pages 0:i-0:xvi
Published: 2021
Full Text: View/download PDF

48. Front Matter, Table of Contents, Preface, Conference Organization

Author: Gromann, Dagmar, Sérasset, Gilles, Declerck, Thierry, McCrae, John P., Gracia, Jorge, Bosque-Gil, Julia, Bobillo, Fernando, and Heinisch, Barbara
Abstract: Front Matter, Table of Contents, Preface, Conference Organization
Published: 2021
Full Text: View/download PDF

49. OASIcs, Volume 93, LDK 2021, Complete Volume

Author: Gromann, Dagmar, Sérasset, Gilles, Declerck, Thierry, McCrae, John P., Gracia, Jorge, Bosque-Gil, Julia, Bobillo, Fernando, and Heinisch, Barbara
Abstract: OASIcs, Volume 93, LDK 2021, Complete Volume
Published: 2021
Full Text: View/download PDF

50. D5.1 Report on Vocabularies for Interoperable Language Resources and Services

Author: Chiarcos, Christian, Cimiano, Philipp, Bosque-Gil, Julia, Declerck, Thierry, Fäth, Christian, Gracia, Jorge, Ionov, Maxim, McCrae, John P., Montiel-Ponsoda, Elena, Pia di Buono, Maria, Saurí, Roser, Bobillo, Fernando, and Elahi, Mohammad Fazleh
Abstract: This document provides a survey of vocabularies for language resources and services and sketches necessary extensions and the expected contribution of the Prêt-à-LLOD project to their further development for phenomena currently not sufficiently covered. Future updates with respect to this will be documented within Task 5.4. We focus on three main aspects of linguistically analyzed data: 1. lexical-conceptual resources, i.e., repositories of terminology, lexical data, translation, and semantics, 2. linguistically annotated data, concerning linguistic analysis of textual or transcribed data, and 3. language resource terminology, i.e., linguistic data categories and metadata. For these areas, we describe representative vocabularies from the Linguistic Linked Open Data community (RDF-based vocabularies) as well as other approaches (e.g., ISO TC37 standards), we identify a number of gaps, and we describe ongoing efforts to address these gaps within the Prêt-à-LLOD project.
Published: 2020
Full Text: View/download PDF
