Towards A Welsh Semantic Annotation System

Authors :
Calzolari, Nicoletta
Choukri, Khalid
Cieri, Christopher
Declerck, Thierry
Goggi, Sara
Hasida, Koiti
Isahara, Hitoshi
Maegaard, Bente
Mariani, Joseph
Mazo, Helene
Moreno, Asuncion
Odijk, Jan
Piperidis, Stelios
Tokunaga, Takenobu
Piao, Scott Songlin
Rayson, Paul Edward
Knight, Dawn
Watkins, Gareth
Publication Year :
2018

Abstract

Automatic semantic annotation of natural language data is an important task in Natural Language Processing, and a variety of semantic taggers have been developed for it, particularly for English. For many languages, however, and especially for low-resource languages, such tools are yet to be developed. In this paper, we report on the development of an automatic Welsh semantic annotation tool (named CySemTagger) in the CorCenCC Project, which will facilitate semantic-level analysis of Welsh language data on a large scale. Based on Lancaster’s USAS semantic tagger framework, this tool tags words in Welsh texts with semantic tags from a semantic classification scheme. It is designed to be compatible with multiple Welsh POS taggers and POS tagsets by mapping the different tagsets into a core shared POS tagset that CySemTagger uses internally. Our initial evaluation shows that the tagger can cover up to 91.78% of words in Welsh text. The tagger is under continuous development and will provide a critical tool for Welsh language corpus and information processing at the semantic level.
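
The tagset-mapping design described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the tagger names, the POS tags, the core tagset labels, the toy lexicon entries with their semantic tags, and the unmatched-word marker are all hypothetical placeholders, and CySemTagger's actual implementation may differ substantially.

```python
# Hypothetical mappings from two external POS tagsets into one shared core
# tagset. The tag names are placeholders, not real Welsh tagset inventories.
TAGSET_MAPS = {
    "tagger_a": {"NN": "noun", "VB": "verb", "JJ": "adj"},
    "tagger_b": {"E": "noun", "B": "verb", "Ans": "adj"},
}

# Toy semantic lexicon keyed by (lowercased word, core POS tag).
# The semantic tags are illustrative placeholders only.
LEXICON = {
    ("ci", "noun"): ["L2"],
    ("rhedeg", "verb"): ["M1"],
}

# Placeholder marker for words that the lexicon does not cover.
UNMATCHED = "UNMATCHED"


def to_core_pos(tagger, tag):
    """Map a tagger-specific POS tag into the shared core tagset."""
    return TAGSET_MAPS.get(tagger, {}).get(tag, "unknown")


def tag_tokens(tagger, tokens):
    """Attach semantic tags to (word, pos) pairs via the core tagset."""
    tagged = []
    for word, pos in tokens:
        core = to_core_pos(tagger, pos)
        sem = LEXICON.get((word.lower(), core), [UNMATCHED])
        tagged.append((word, core, sem))
    return tagged


def coverage(tagged):
    """Percentage of tokens that received a semantic tag from the lexicon."""
    if not tagged:
        return 0.0
    covered = sum(1 for _, _, sem in tagged if sem != [UNMATCHED])
    return 100.0 * covered / len(tagged)


if __name__ == "__main__":
    sample = [("Ci", "NN"), ("rhedeg", "VB"), ("xyz", "NN")]
    result = tag_tokens("tagger_a", sample)
    print(result)
    print(f"coverage: {coverage(result):.2f}%")
```

Because the lexicon lookup uses only the core tag, the same word tagged by either hypothetical tagger resolves to the same entry, which is the compatibility property the abstract describes. The coverage function shows one plausible reading of a lexical coverage figure: the share of tokens for which the lexicon supplies at least one semantic tag.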

Details

Database :
OAIster
Notes :
application/pdf, https://eprints.lancs.ac.uk/id/eprint/123588/1/lrec2018_cysemtagger.pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1425707944
Document Type :
Electronic Resource