Naive Bayes-based Experiments in Romanian Dialect Identification

Authors :
Tommi Jauhiainen
Heidi Annika Jauhiainen
Bo Krister Johan Linden
Department of Digital Humanities
Language Technology
Centre of Excellence in Ancient Near Eastern Empires (ANEE)
Department of Modern Languages 2010-2017
Zampieri, Marcos
Nakov, Preslav
Ljubešić, Nikola
Tiedemann, Jörg
Scherrer, Yves
Jauhiainen, Tommi
Source :
University of Helsinki
Publication Year :
2021

Abstract

This article describes the experiments and systems developed by the SUKI team for the second edition of the Romanian Dialect Identification (RDI) shared task, which was organized as part of the 2021 VarDial Evaluation Campaign. We submitted two runs to the shared task, and our second submission was the overall best submission by a noticeable margin. Our best submission used a character n-gram-based naive Bayes classifier with adaptive language models. We describe our experiments on the development set leading to both submissions.

Details

Language :
English
Database :
OpenAIRE
Journal :
University of Helsinki
Accession number :
edsair.dedup.wf.001..33ba0aee7769a043e434417842a66cef