Multiword expressions we live by: a validated usage-based dataset from corpora of written Italian
- Author
Sara Castagnoli, M. Silvia Micheli, Malvina Nissim, Francesca Masini, Andrea Zaninello, J. Monti, F. Dell'Orletta, F. Tamburini
- Subjects
Distribution (number theory), Italian, multiword expressions, corpora, Natural Language Processing, Computer science, AriEmozione, computer.software_genre, Settore L-LIN/01 - Glottologia e Linguistica, Online Hate Speech, Resource (project management), CBX, Multilingual NLU, Twitter during Pandemic, Lemma (mathematics), Automatic Sarcasm Detection, Linguistic Ostracism in Social Networks, business.industry, COVID-19, MWE dataset, computational linguistics, corpus linguistics, Italian MWE, Linguistics, LAN000000, Quantitative Linguistic Investigations, Fine-grained sentiment analysis, DistilBERT, Depression from Social Media, Distributional Semantics, Gender Bias, AEREST, E3C Project, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, TrAVaSI, Artificial intelligence, business, computer
- Abstract
The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications.
- Published
- 2020