13 results on '"Morphological lexicon"'
Search Results
2. Developing Morpho-SLaWS: An API for the Morphosyntactic Annotation of the Serbian Language
- Author
-
Tasovac, Toma, Rudan, Saša, Rudan, Siniša, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Mahlow, Cerstin, editor, and Piotrowski, Michael, editor
- Published
- 2015
3. A computational morphological lexicon for Turkish: TrLex.
- Author
-
Aslan, Ozkan, Gunal, Serkan, and Dincer, B. Taner
- Subjects
*COMPUTATIONAL linguistics, *LEXICON, *TURKISH language, *STEMMING (Linguistics), *MORPHOLOGY (Grammar)
- Abstract
A morphological lexicon that is a computational source should be considered together with derivational morphology, especially for agglutinative languages. To the best of our knowledge, in the Turkish language there has been no study that analyzes the derivational suffixes on the lexicon in a computational paradigm. This study provides a very rich lexical resource, filling a gap in the field, and would hopefully lead to new related studies as well. The morphological lexicon can be used in morphological analysis as well as in several other tasks, such as stemming and part-of-speech (POS) tagging. In this study, we introduce a morphological lexicon named TrLex and present its components, preparation processes and some statistics. We observed that more than half of the single-word lemmas (56.7%) are in the derived structure. Since word formation in Turkish prefers morphological processes, this number is higher than the rate of compound-type words (2.7%). As a result of the work, we obtained a knowledge-intensive data table including several fields such as form, structure and semantic information. We also extracted a Lexical Markup Framework (LMF) formatted file containing only morphological and POS information and made the file freely available.
- Published
- 2018
4. OFrLex: A Computational Morphological and Syntactic Lexicon for Old French
- Author
-
Guibon, Gaël, Sagot, Benoît, Laboratoire de Linguistique Formelle (LLF - UMR7110), Centre National de la Recherche Scientifique (CNRS)-Université Paris Cité (UPCité), Automatic Language Modelling and ANAlysis & Computational Humanities (ALMAnaCH), Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), This work was partly funded by the French national ANR grant PROFITEROLE (ANR-16-CE38-0010) headed by Sophie Prévost, as well as by the second author’s chair in the PRAIRIE institute, funded by the French national agency ANR as part of the 'Investissements d’avenir' programme under the reference ANR-19-P3IA-0001, ANR-16-CE38-0010,PROFITEROLE,Modélisation de l'évolution de la langue à partir de textes d'ancien français instrumentés(2016), ANR-19-P3IA-0001,PRAIRIE,PaRis Artificial Intelligence Research InstitutE(2019), Laboratoire de Linguistique Formelle (LLF UMR7110), and Centre National de la Recherche Scientifique (CNRS)-Université de Paris (UP)
- Subjects
Lexicon Enrichment, Morphological lexicon, Old French, Syntactic lexicon, [SHS.LANGUE]Humanities and Social Sciences/Linguistics, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
- Abstract
Due to the COVID-19 pandemic, the 12th edition is cancelled. The LREC 2020 Proceedings are available at http://www.lrec-conf.org/proceedings/lrec2020/index.html. Version 2 of the paper is an updated version of the originally published one (minor corrections). In this paper we describe our work on the development and enrichment of OFrLex, a freely available, large-coverage morphological and syntactic Old French lexicon. We rely on several heterogeneous language resources to extract structured and exploitable information. The extraction follows a semi-automatic procedure with substantial manual steps to respond to difficulties encountered while aligning lexical entries from distinct language resources. OFrLex aims at improving natural language processing tasks on Old French such as part-of-speech tagging and dependency parsing. We provide quantitative information on OFrLex and discuss its reliability. We also describe and evaluate a semi-automatic, word-embedding-based lexical enrichment process aimed at increasing the accuracy of the resource. Results of this extension technique will be manually validated in the near future, a step that will take advantage of OFrLex's viewing, searching and editing interface, which is already accessible online.
- Published
- 2020
5. Développement d'un lexique morphologique et syntaxique de l'ancien français
- Author
-
Sagot, Benoît, Automatic Language Modelling and ANAlysis & Computational Humanities (ALMAnaCH), Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), ANR-16-CE38-0010,PROFITEROLE,Modélisation de l'évolution de la langue à partir de textes d'ancien français instrumentés(2016), Sagot, Benoît, and Modélisation de l'évolution de la langue à partir de textes d'ancien français instrumentés - - PROFITEROLE2016 - ANR-16-CE38-0010 - AAPG2016 - VALID
- Subjects
Lexique syntaxique, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], Old French, Morphological lexicon, Ancien français, Syntactic lexicon, [INFO]Computer Science [cs], Lexique morphologique
- Abstract
In this paper we describe our work on the development of a large-scale morphological and syntactic lexicon of Old French for natural language processing. We rely on dictionary and lexical resources, from which the extraction of structured and exploitable information required specific developments. In addition, matching information from these different sources posed difficulties. We provide quantitative information on the resulting lexicon, and discuss its reliability in its current version and the prospects for improvement allowed by the existence of a first version, in particular through the automatic analysis of textual data.
- Published
- 2019
6. Computerising the lexicon: Modelling, development and use of morphological, syntactic and semantic lexicons
- Author
-
Sagot, Benoît, Automatic Language Modelling and ANAlysis & Computational Humanities (ALMAnaCH), Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Sorbonne Université, and Ludovic Denoyer
- Subjects
Parsing, Lexique morphosyntaxique, Analyse syntaxique, Morphosyntactic tagging, Natural language processing, WordNet, Morphologie computationnelle, [SCCO.LING]Cognitive science/Linguistics, Traitement automatique des langues, Computational morphology, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, Lexique syntaxique, Morphological lexicon, Part-of-speech tagging, Développement de ressources lexicales, Syntactic lexicon, Lexical resource development, [SHS.LANGUE]Humanities and Social Sciences/Linguistics, Analyse morphosyntaxique, Lexicon, Lexique
- Published
- 2018
7. Computerising the lexicon
- Author
-
Sagot, Benoît, Automatic Language Modelling and ANAlysis & Computational Humanities (ALMAnaCH), Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Sorbonne Université, and Ludovic Denoyer
- Subjects
Parsing, Lexique morphosyntaxique, Analyse syntaxique, Morphosyntactic tagging, Natural language processing, WordNet, Morphologie computationnelle, [SCCO.LING]Cognitive science/Linguistics, Traitement automatique des langues, Computational morphology, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, Lexique syntaxique, Morphological lexicon, Part-of-speech tagging, Développement de ressources lexicales, Syntactic lexicon, Lexical resource development, [SHS.LANGUE]Humanities and Social Sciences/Linguistics, Analyse morphosyntaxique, Lexicon, Lexique
- Published
- 2018
8. A computational morphological lexicon for Turkish: TrLex
- Author
-
Ozkan Aslan, Serkan Gunal, B. Taner Dinçer, Anadolu Üniversitesi, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü, and Günal, Serkan
- Subjects
Agglutinative language, Morphology, Linguistics and Language, Computer science, Turkish, Compounding, 02 engineering and technology, computer.software_genre, Lexicon, Language and Linguistics, Field (computer science), Morphological Lexicon, 0202 electrical engineering, electronic engineering, information engineering, Lexical Markup Framework, 060201 languages & linguistics, business.industry, 06 humanities and the arts, Word formation, Part of speech, language.human_language, 0602 languages and literature, Morphological analysis, language, 020201 artificial intelligence & image processing, Artificial intelligence, Derivation, business, computer, Natural language processing
- Abstract
WOS: 000430906100002. A morphological lexicon that is a computational source should be considered together with derivational morphology, especially for agglutinative languages. To the best of our knowledge, in the Turkish language there has been no study that analyzes the derivational suffixes on the lexicon in a computational paradigm. This study provides a very rich lexical resource, filling a gap in the field, and would hopefully lead to new related studies as well. The morphological lexicon can be used in morphological analysis as well as in several other tasks, such as stemming and part-of-speech (POS) tagging. In this study, we introduce a morphological lexicon named TrLex and present its components, preparation processes and some statistics. We observed that more than half of the single-word lemmas (56.7%) are in the derived structure. Since word formation in Turkish prefers morphological processes, this number is higher than the rate of compound-type words (2.7%). As a result of the work, we obtained a knowledge-intensive data table including several fields such as form, structure and semantic information. We also extracted a Lexical Markup Framework (LMF) formatted file containing only morphological and POS information and made the file freely available. This work was supported by Anadolu University, Fund of Scientific Research Projects [grant number 1410F415].
- Published
- 2018
9. Automatic acquisition of inflectional lexica for morphological normalisation
- Author
-
Jan Šnajder, Marko Tadić, and B. Dalbelo Basic
- Subjects
Computer science, business.industry, Lemmatisation, Speech recognition, Library and Information Sciences, Management Science and Operations Research, Lexicon, computer.software_genre, Knowledge acquisition, Computer Science Applications, Focus (linguistics), Inflection, Media Technology, Artificial intelligence, Morphological normalisation, morphological lexicon, lexicon acquisition, inflection, Croatian language, text mining, information retrieval, Computational linguistics, business, computer, Word (computer architecture), Natural language processing, Natural language, Information Systems
- Abstract
Due to natural language morphology, words can take on various morphological forms. Morphological normalisation – often used in information retrieval and text mining systems – conflates morphological variants of a word to a single representative form. In this paper, we describe an approach to lexicon-based inflectional normalisation. This approach is in between stemming and lemmatisation, and is suitable for morphological normalisation of inflectionally complex languages. To eliminate the immense effort required to compile the lexicon by hand, we focus on the problem of acquiring automatically an inflectional morphological lexicon from raw corpora. We propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. Our approach is applied to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. Experimental results show that our approach can be used to acquire a lexicon whose linguistic quality allows for rather good normalisation performance.
- Published
- 2008
10. Extending the adverbial coverage of a French morphological lexicon
- Author
-
Tolone, Elsa, Voyatzi, Stavroula, Martineau, Claude, Constant, Mathieu, Equipe d'Informatique Linguistique - Grupo de Procesamiento de Lenguaje Natural (PLN), Laboratoire d'Informatique Gaspard-Monge (LIGM), Centre National de la Recherche Scientifique (CNRS)-Fédération de Recherche Bézout-ESIEE Paris-École des Ponts ParisTech (ENPC)-Université Paris-Est Marne-la-Vallée (UPEM)-Centre National de la Recherche Scientifique (CNRS)-Fédération de Recherche Bézout-ESIEE Paris-École des Ponts ParisTech (ENPC)-Université Paris-Est Marne-la-Vallée (UPEM)-Facultad de Matemática, Astronomía y Física [Cordoba] (FaMAF), Universidad Nacional de Córdoba [Argentina]-Universidad Nacional de Córdoba [Argentina], Centre National de la Recherche Scientifique (CNRS)-Fédération de Recherche Bézout-ESIEE Paris-École des Ponts ParisTech (ENPC)-Université Paris-Est Marne-la-Vallée (UPEM), Viavoo [Boulogne Billancourt], Université Paris-Est Marne-la-Vallée (UPEM)-École des Ponts ParisTech (ENPC)-ESIEE Paris-Fédération de Recherche Bézout-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Est Marne-la-Vallée (UPEM)-École des Ponts ParisTech (ENPC)-ESIEE Paris-Fédération de Recherche Bézout-Centre National de la Recherche Scientifique (CNRS)-Facultad de Matemática, Astronomía y Física [Cordoba] (FaMAF), Université Paris-Est Marne-la-Vallée (UPEM)-École des Ponts ParisTech (ENPC)-ESIEE Paris-Fédération de Recherche Bézout-Centre National de la Recherche Scientifique (CNRS), and Tolone, Elsa
- Subjects
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], [SCCO.COMP]Cognitive science/Computer science, Classification ACM - I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing, morphological lexicon, paraphrase, [SCCO.LING]Cognitive science/Linguistics, adverb
- Abstract
We present an extension of the adverbial entries of the French morphological lexicon DELA (Dictionnaires Electroniques du LADL / LADL electronic dictionaries). Adverbs were extracted from LGLex, an NLP-oriented syntactic resource for French, which in turn contains all adverbs extracted from the Lexicon-Grammar tables of both simple adverbs ending in -ment (i.e., '-ly') (Molinier and Levrier, 2000) and compound adverbs (Gross, 1986b; Gross, 1986a). This work exploits fine-grained linguistic information provided in existing resources. The resulting resource is reviewed in order to delete duplicates and is freely available under the LGPL-LR license.
- Published
- 2012
11. Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search
- Author
-
Merkler, Danijela, Agić, Željko, and Tadić, Marko
- Subjects
morphological lexicon, automatic enlargement, Croatian language, automatic enrichment, large corpora, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL
- Abstract
The first version of the Croatian Morphological Lexicon (HML) was developed as early as 1994 and was utilized in the implementation of various experiments and systems dealing with Croatian. Since the HML is frequently used both as a stand-alone application and as a module in many other systems for processing Croatian, the lexicon is constantly being updated to newer versions by manually inserting unknown wordforms (i.e. the corresponding 3-tuples of lemmas, wordforms and morphosyntactic tags) in batches. The current version of HML consists of 110,000 lemmas and more than 4,000,000 lexicon entries. Due to limitations in the availability of expert human annotators and various other constraints, the process of manual inspection, lemma assignment and inflectional pattern selection for unknown wordforms is a rather slow procedure. Accordingly, in this paper, we propose a generic approach to (semi-)automatic generation of new candidate lemmas for HML, their verification, assignment of inflectional patterns and finally creation and insertion of new lexicon entries into HML in a single processing pipeline.
- Published
- 2012
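The entry format mentioned in the abstract of result 11, 3-tuples of lemma, wordform and morphosyntactic tag, can be pictured with a minimal Python sketch. The sample entries, the tag strings and the names `Entry`, `build_index` and `analyse` are illustrative assumptions; none of them come from HML or from the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    lemma: str
    wordform: str
    tag: str  # morphosyntactic tag; the tag strings below are made up


def build_index(entries):
    """Map each wordform to every (lemma, tag) analysis that lists it."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.wordform].append((entry.lemma, entry.tag))
    return index


def analyse(index, wordform):
    """Return the known analyses, or an empty list for an unknown wordform."""
    return index.get(wordform, [])


# Hypothetical sample entries (not taken from HML).
SAMPLE = [
    Entry("kuća", "kuća", "N-fsn"),
    Entry("kuća", "kuće", "N-fsg"),
    Entry("kuća", "kuće", "N-fpn"),
]
```

In this sketch, a wordform for which `analyse` returns an empty list is the kind of unknown wordform that, according to the abstract, would need a lemma and an inflectional pattern assigned before it can be inserted.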
12. Generating a Morphological Lexicon of Organization Entity Names
- Author
-
Ljubešić, Nikola, Lauc, Tomislava, Boras, Damir, and Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odjik, Stelios Piperidis, Daniel Tapias
- Subjects
morphological lexicon, lexicon generation, organization entity names, linear successive abstraction
- Abstract
This paper describes methods used for generating a morphological lexicon of organization entity names in Croatian. This resource is intended for two primary tasks: template-based natural language generation and named entity identification. The main problems concerning the lexicon generation are the high level of inflection in Croatian and the low linguistic quality of the primary resource containing named entities in normal form. The problem is divided into two subproblems concerning single-word and multi-word expressions. The single-word problem is solved by training a supervised learning algorithm called linear successive abstraction. With existing common language morphological resources and two simple hand-crafted rules backing up the algorithm, accuracy of 98.70% on the test set is achieved. The multi-word problem is solved through a semi-automated process for multi-word entities occurring in the first 10,000 named entities. The generated multi-word lexicon will be used for natural language generation only, while named entity identification will be solved algorithmically in forthcoming research. The single-word lexicon is capable of handling both tasks.
- Published
- 2008
13. Generation of verbal stems in derivationally rich language
- Author
-
Sojat, K., Preradovic, N. M., Marko Tadić, Calzolari, Nicoletta, Choukri, Khalid, Declerck, Thierry, Ugur Dogan, Mehmet, Maegaard, Bente, Mariani, Joseph, Odijk, Jan, and Piperidis, Stelios
- Subjects
computational morphology, language generation, morphological lexicon, valency lexicon, Croatian language
- Abstract
The paper presents a procedure for generating prefixed verbs in Croatian comprising combinations of one, two or three prefixes. The result of this generation process is a pool of derivationally valid prefixed verbs, although not necessarily occurring in corpora. The statistics of occurrences of generated verbs in the Croatian National Corpus have been calculated. Further usage of such a language resource with generated potential verbs is also suggested, namely, enrichment of the Croatian Morphological Lexicon, Croatian Wordnet and CROVALLEX.
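The combinatorial side of the procedure in result 13 (one, two or three prefixes in front of a base verb) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `candidate_verbs`, the prefix list and the base verb are hypothetical, prefixes are assumed not to repeat within one verb, and no derivational validity check or spelling adjustment is applied, whereas the paper describes a pool of derivationally valid verbs.

```python
from itertools import permutations


def candidate_verbs(base, prefixes, max_prefixes=3):
    """Yield the base verb preceded by every ordered choice of
    1..max_prefixes distinct prefixes (plain concatenation only)."""
    for n in range(1, max_prefixes + 1):
        for combo in permutations(prefixes, n):
            yield "".join(combo) + base


# Hypothetical input: three prefixes and one base verb.
verbs = list(candidate_verbs("pisati", ["na", "po", "do"]))
```

Counting how many of these candidates are attested in a corpus, as the abstract describes, would be a separate lookup step over the generated list.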