Named Entity Normalization in Czech Texts
- Author
Kubát, Petr; Vidová Hladká, Barbora; Popel, Martin
- Subjects
named entities, normalization, rule-based system
- Abstract
Named entities are collocations used to refer to real-world objects in text. Named entity normalization is the process of generating the base form of a given named entity. This thesis focuses on creating a rule-based procedure for named entity normalization in Czech texts, and the design of the individual rules is examined closely. Emphasis is placed on motivating each rule by entities that occur in real-world texts. In addition, selected aspects of Czech syntax are analyzed to achieve the highest possible accuracy. Based on the theoretical description of the procedure, a normalization application is implemented, and its accuracy is evaluated against manually normalized entities. Combined with existing tools for automatic named entity recognition, the normalizer can be used in other text processing tasks such as machine translation, search, and categorization.
- Published
- 2014
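The abstract does not spell out the rule formalism, so the following Python sketch is only a rough illustration under strong simplifying assumptions. It assumes each rule rewrites one inflected word ending to its nominative ending for a given entity type, that each rule records the real-world form that motivated it, and that accuracy is exact-match agreement with manually normalized entities. The entity type labels, the `Rule` class, the rules, and the gold sample are all hypothetical and are not taken from the thesis.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical suffix-rewriting rule for a single entity type."""
    entity_type: str   # illustrative label such as "person_male" (assumption)
    suffix: str        # inflected word ending the rule matches
    replacement: str   # ending of the base (nominative) form
    example: str       # real-world inflected form that motivated the rule


# Toy rule set, invented for illustration; not the thesis's actual rules.
RULES = [
    Rule("person_male", "a", "", "Petra"),          # genitive/accusative: Petra -> Petr
    Rule("person_male", "ovi", "", "Petrovi"),      # dative: Petrovi -> Petr
    Rule("person_male", "em", "", "Petrem"),        # instrumental: Petrem -> Petr
    Rule("person_female", "y", "a", "Barbory"),     # genitive: Barbory -> Barbora
    Rule("person_female", "ou", "a", "Barborou"),   # instrumental: Barborou -> Barbora
    Rule("person_female", "é", "á", "Hladké"),      # genitive/dative/locative: Hladké -> Hladká
    Rule("location", "ze", "ha", "Praze"),          # locative: Praze -> Praha
    Rule("location", "ě", "a", "Ostravě"),          # locative: Ostravě -> Ostrava
]


def normalize_token(token: str, entity_type: str, rules: list[Rule]) -> str:
    """Rewrite one word with the longest matching suffix rule; keep it if none match."""
    matching = [r for r in rules
                if r.entity_type == entity_type and token.endswith(r.suffix)]
    if not matching:
        return token
    best = max(matching, key=lambda r: len(r.suffix))
    return token[: len(token) - len(best.suffix)] + best.replacement


def normalize_entity(entity: str, entity_type: str, rules: list[Rule]) -> str:
    """Normalize a multi-word entity word by word (a strong simplification)."""
    return " ".join(normalize_token(t, entity_type, rules) for t in entity.split())


def accuracy(gold: list[tuple[str, str, str]], rules: list[Rule]) -> float:
    """Share of entities whose output exactly matches the manual normalization."""
    if not gold:
        return 0.0
    correct = sum(normalize_entity(form, etype, rules) == expected
                  for form, etype, expected in gold)
    return correct / len(gold)


# Hypothetical manually normalized sample: (inflected form, type, base form).
GOLD = [
    ("Petra Kubáta", "person_male", "Petr Kubát"),
    ("Barbory Vidové Hladké", "person_female", "Barbora Vidová Hladká"),
    ("Praze", "location", "Praha"),
    ("Brně", "location", "Brno"),  # the "ě" -> "a" rule misfires here, yielding "Brna"
]

if __name__ == "__main__":
    for form, etype, expected in GOLD:
        print(f"{form!r} -> {normalize_entity(form, etype, RULES)!r} (gold {expected!r})")
    print(f"accuracy = {accuracy(GOLD, RULES):.2f}")
```

On this toy sample three of the four entities match the gold forms, giving an accuracy of 0.75. The miss on *Brně* (normalized to *Brna* instead of *Brno*) shows why a word ending alone is a weak signal: a rule motivated by *Ostravě* does not transfer to a noun from a different declension paradigm, which is one reason real-world examples and additional linguistic information matter when designing rules.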