Comparative Analysis of Automatic POS Taggers Applied to German Learner Texts
- Source :
- Proceedings of the XXth Conference of Open Innovations Association FRUCT, Vol 31, Iss 1, Pp 115-124 (2022)
- Publication Year :
- 2022
- Publisher :
- FRUCT, 2022.
Abstract
- Part-of-speech (POS) tagging is the process of assigning a morpho-syntactic category to each element of a sentence, including punctuation marks, according to its context. The article tests and compares five part-of-speech taggers, CoreNLP, spaCy, TextBlob, RFTagger and TreeTagger, on texts from the annotated learner corpus of Petrozavodsk State University (PACT, Petrozavodsk Annotated Corpus of Texts). All of these tools support German; learner texts, however, have distinctive characteristics, chiefly a large number of errors. The research question is how well conventional tools cope with automatic annotation when they encounter many grammatical, lexical and spelling mistakes. Conclusions are drawn about the frequency of part-of-speech identification errors, the tagging quality, and the strengths and weaknesses of each tagger.
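As a rough illustration of the kind of comparison the abstract describes, the sketch below scores several taggers against a gold annotation and counts which tags they confuse. It is not the authors' method. The sentence, the tagger names (`tagger_a`, `tagger_b`), their outputs, and the choice of STTS-style tags are all hypothetical assumptions; no real tagger is called.

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be token-aligned")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def confusion_counts(gold, predicted):
    """Count (gold_tag, predicted_tag) pairs for mismatched tokens only."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)


# Hypothetical learner sentence: "gehhe" is a misspelling of "gehe".
tokens = ["Ich", "gehhe", "in", "die", "Schule", "."]

# Assumed gold annotation in STTS-style tags (an assumption, not from the source).
gold = ["PPER", "VVFIN", "APPR", "ART", "NN", "$."]

# Made-up outputs of two imaginary taggers; tagger_a mistakes the
# misspelled verb for a noun, tagger_b tags every token correctly.
outputs = {
    "tagger_a": ["PPER", "NN", "APPR", "ART", "NN", "$."],
    "tagger_b": ["PPER", "VVFIN", "APPR", "ART", "NN", "$."],
}

scores = {name: tagging_accuracy(gold, tags) for name, tags in outputs.items()}
errors = {name: confusion_counts(gold, tags) for name, tags in outputs.items()}

for name in outputs:
    print(name, round(scores[name], 3), dict(errors[name]))
```

In practice the predicted tags would come from the taggers themselves, and their tagsets and tokenization would have to be aligned with the gold annotation before a token-by-token comparison like this is meaningful.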
- Subjects :
- learner corpus
part-of-speech tagger
German
pos-tagging
Telecommunication
TK5101-6720
Details
- Language :
- English
- ISSN :
- 2305-7254 and 2343-0737
- Volume :
- 31
- Issue :
- 1
- Database :
- Directory of Open Access Journals
- Journal :
- Proceedings of the XXth Conference of Open Innovations Association FRUCT
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.62b35d32f854ff98835032b2f3da2f5
- Document Type :
- article
- Full Text :
- https://doi.org/10.23919/FRUCT54823.2022.9770886