1. Comparative Analysis of Automatic POS Taggers Applied to German Learner Texts
- Author
- Irina Kotiurova and Polina Trenina
- Subjects
learner corpus, part-of-speech tagger, German, pos-tagging, Telecommunication, TK5101-6720
- Abstract
Part-of-speech (POS) tagging is the process of assigning a morpho-syntactic category to each element of a sentence, including punctuation marks, according to its context. This article tests and compares five part-of-speech taggers (CoreNLP, spaCy, TextBlob, RFTagger, and TreeTagger) on texts from the annotated learner corpus of Petrozavodsk State University (PACT, Petrozavodsk Annotated Corpus of Texts). All of these tools support German; however, learner texts have their own characteristics, chiefly a large number of errors. The research question is how well conventional tools cope with automatic annotation when they encounter numerous grammatical, lexical, and spelling mistakes. Conclusions are drawn about the frequency of part-of-speech identification errors, the tagging quality, and the strengths and weaknesses of each tagger.
- Published
- 2022