Unsupervised learning of morphology in the USSR

Authors :
Burlot, Franck
Yvon, François
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)
Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919)
Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11)
Source :
Proceedings of the 13th International Conference on Statistical Analysis of Textual Data (Journées internationales d'Analyse statistique des Données Textuelles, JADT 2016), Damon Mayaffre, Céline Poudat, Laurent Vanni, Véronique Magri, Peter Follette, Nice, France, 7-10 June 2016, pp. [1]-10.
Publication Year :
2016
Publisher :
HAL CCSD, 2016.

Abstract

This article addresses an important task in the processing of morphologically rich languages: the unsupervised learning of morphology, which mainly consists of learning a grammar that segments words into morphemes without any prior knowledge of the language being analysed. The origins of this task are usually traced back to Zellig Harris, an assumption that overlooks the important contribution of his contemporary, the Soviet linguist Nikolaj Dmitrievič Andreev, who developed a statistico-combinatorial model for learning morphology in the 1960s. We give a critical description of Andreev’s model and attempt to bring to light its pioneering aspects as well as its weaknesses. Finally, we report results on several European languages. Our implementation of the model can be downloaded from https://github.com/franckbrl/stat_comb_model.

Details

Language :
English
Database :
OpenAIRE
Journal :
Proceedings of the 13th International Conference on Statistical Analysis of Textual Data (Journées internationales d'Analyse statistique des Données Textuelles, JADT 2016)
Accession number :
edsair.dedup.wf.001..9c9f2143866d4db9fc2705c4141f532e