Comparative Analysis of Automatic POS Taggers Applied to German Learner Texts

Authors :
Irina Kotiurova
Polina Trenina
Source :
Proceedings of the XXth Conference of Open Innovations Association FRUCT, Vol 31, Iss 1, Pp 115-124 (2022)
Publication Year :
2022
Publisher :
FRUCT, 2022.

Abstract

Part-of-speech (POS) tagging is the process of assigning a morpho-syntactic category to each element of a sentence, including punctuation marks, according to its context. The article tests and compares five part-of-speech taggers (CoreNLP, spaCy, TextBlob, RFTagger and TreeTagger) on texts from the annotated learner corpus of Petrozavodsk State University (PACT, Petrozavodsk Annotated Corpus of Texts). All of these tools support German; learner texts, however, have their own characteristics, chiefly a large number of errors. The research question is how well conventional tools cope with automatic annotation when they encounter many grammatical, lexical and spelling mistakes. Conclusions are drawn about the frequency of part-of-speech identification errors, the tagging quality, and the strengths and weaknesses of each tagger.

Details

Language :
English
ISSN :
2305-7254 and 2343-0737
Volume :
31
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Proceedings of the XXth Conference of Open Innovations Association FRUCT
Publication Type :
Academic Journal
Accession number :
edsdoj.62b35d32f854ff98835032b2f3da2f5
Document Type :
article
Full Text :
https://doi.org/10.23919/FRUCT54823.2022.9770886