All that is English may be Hindi: Enhancing language identification through automatic ranking of likeliness of word borrowing in social media

Authors :
Patro, Jasabanta
Samanta, Bidisha
Singh, Saurabh
Basu, Abhipsa
Mukherjee, Prithwish
Choudhury, Monojit
Mukherjee, Animesh
Publication Year :
2017

Abstract

In this paper, we present a set of computational methods to identify the likeliness of a word being borrowed, based on signals from social media. Measured by Spearman correlation coefficient, our methods achieve nearly 0.62 in predicting borrowing likeliness, more than twice the value of the best-performing baseline reported in the literature (nearly 0.26). Based on this likeliness estimate, we asked annotators to re-annotate the language tags of foreign words in predominantly native contexts. In 88 percent of cases, the annotators felt that the foreign language tag should be replaced by the native language tag, indicating substantial scope for improving automatic language identification systems.

Comment: 11 pages, accepted at the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). arXiv admin note: substantial text overlap with arXiv:1703.05122

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.1707.08446
Document Type :
Working Paper
Full Text :
https://doi.org/10.18653/v1/D17-1240