Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

Authors :
Edoardo Maria Ponti
Helen O'Horan
Ekaterina Shutova
Yevgeni Berzak
Anna Korhonen
Roi Reichart
Thierry Poibeau
Ivan Vulić
Massachusetts Institute of Technology (MIT)
LIIR - Department of Computer Science - KU Leuven
Technion - Israel Institute of Technology [Haifa]
Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice)
Département Littératures et langage - ENS Paris (LILA)
École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)
Centre National de la Recherche Scientifique (CNRS)
Université Sorbonne Paris Cité (USPC)
Université Sorbonne Nouvelle - Paris 3
University of Amsterdam [Amsterdam] (UvA)
Computer Laboratory [Cambridge]
University of Cambridge [UK] (CAM)
ANR-19-P3IA-0001, PRAIRIE, PaRis Artificial Intelligence Research InstitutE (2019)
Ponti, Edoardo [0000-0002-6308-1050]
Apollo - University of Cambridge Repository
Projet ERC Lexical
Institut 3IA Prairie
ILLC (FNWI)
Language and Computation (ILLC, FNWI/FGw)
Source :
Ponti, E. M., O'Horan, H., Berzak, Y., Vulić, I., Reichart, R., Poibeau, T., Shutova, E., & Korhonen, A. (2019). Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing. Computational Linguistics, 45(3), pp. 559-601. Massachusetts Institute of Technology Press (MIT Press). https://doi.org/10.1162/coli_a_00357
Publication Year :
2019
Publisher :
HAL CCSD, 2019.

Abstract

Addressing the cross-lingual variation of grammatical structures and meaning categorization is a key challenge for multilingual Natural Language Processing. The lack of resources for the majority of the world's languages makes supervised learning not viable. Moreover, the performance of most algorithms is hampered by language-specific biases and the neglect of informative multilingual data. The discipline of Linguistic Typology provides a principled framework to compare languages systematically and empirically and documents their variation in publicly available databases. These enshrine crucial information to design language-independent algorithms and refine techniques devised to mitigate the above-mentioned issues, including cross-lingual transfer and multilingual joint models, with typological features. In this survey, we demonstrate that typology is beneficial to several NLP applications, involving both semantic and syntactic tasks. Moreover, we outline several techniques to extract features from databases or acquire them automatically: these features can be subsequently integrated into multilingual models to tie parameters together cross-lingually or gear a model towards a specific language. Finally, we advocate for a new typology that accounts for the patterns within individual examples rather than entire languages, and for graded categories rather than discrete ones, in order to bridge the gap with the contextual and continuous nature of machine learning algorithms.

Subjects

Subjects :
FOS: Computer and information sciences
[INFO.INFO-TT] Computer Science [cs]/Document and Text Processing
Computational linguistics
02 engineering and technology
Language and Linguistics
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]
[SCCO] Cognitive science
0302 clinical medicine
Language typology
[SHS.STAT] Humanities and Social Sciences/Methods and statistics
0202 electrical engineering, electronic engineering, information engineering
Sociology
[SHS.LANGUE] Humanities and Social Sciences/Linguistics
Computer Science - Computation and Language
Problem of universals
Linguistics
Computer Science Applications
Variation (linguistics)
47 Language, Communication and Culture
020201 artificial intelligence & image processing
Computation and Language (cs.CL)
Natural language processing
Natural language
Typology
Linguistics and Language
Modeling language
Process (engineering)
[SHS.INFO] Humanities and Social Sciences/Library and information sciences
[SCCO.COMP] Cognitive science/Computer science
03 medical and health sciences
46 Information and Computing Sciences
Artificial Intelligence
Machine learning
Semantic variation
4704 Linguistics
lcsh:P98-98.5
[SCCO.LING] Cognitive science/Linguistics
Linguistic typology
4605 Data Management and Data Science
030221 ophthalmology & optometry
Artificial intelligence
lcsh:Computational linguistics. Natural language processing

Details

Language :
English
ISSN :
0891-2017 and 1530-9312
Database :
OpenAIRE
Journal :
Computational Linguistics, Massachusetts Institute of Technology Press (MIT Press), 2019, 45 (3), pp. 559-601
Accession number :
edsair.doi.dedup.....de4af499f2fb2acee71a65ff23f1946b
Full Text :
https://doi.org/10.1162/coli_a_00357