Spanish all-words semantic class disambiguation using Cast3LB corpus

Authors :
Izquierdo Beviá, Rubén
Moreno Monteagudo, Lorenza
Navarro Colorado, Borja
Suárez Cueto, Armando
Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos
Procesamiento del Lenguaje Natural y Sistemas de Información
Source :
RUA. Repositorio Institucional de la Universidad de Alicante, Universidad de Alicante (UA)
Publication Year :
2006
Publisher :
Springer Berlin / Heidelberg, 2006.

Abstract

This paper presents a machine learning approach to semantic disambiguation for Spanish based on semantic classes. A critical issue in corpus-based approaches to Word Sense Disambiguation (WSD) is the lack of wide-coverage resources from which linguistic information can be learned automatically. In particular, all-words sense-annotated corpora such as SemCor do not contain enough examples of many senses to train a machine learning method. Using semantic classes instead of senses makes it possible to collect a larger number of examples for each class while reducing polysemy, which improves the accuracy of semantic disambiguation. Cast3LB, a SemCor-like corpus manually annotated with Spanish WordNet 1.5 senses, is used in this paper to perform semantic disambiguation based on several sets of classes: the lexicographer files of WordNet, WordNet Domains, and the SUMO ontology. This paper has been supported by the Spanish Government under projects CESS-ECE (HUM2004-21127-E) and R2D2 (TIC2003-07158-C04-01).
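
The effect described in the abstract, more training examples per label and lower polysemy once senses are grouped into classes, can be illustrated with a minimal Python sketch. Everything in it is a toy assumption made for illustration: the sense identifiers (`banco#1`, etc.), the six annotated tokens, and the sense-to-class mapping are hypothetical and are not taken from Cast3LB or Spanish WordNet. Only the label names `noun.group` and `noun.artifact` follow the naming of WordNet lexicographer files. It does not reproduce the authors' system.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from fine-grained senses to coarse semantic classes.
# The class names mimic WordNet lexicographer files; the sense ids are invented.
SENSE_TO_CLASS = {
    "banco#1": "noun.group",     # assumed: financial institution
    "banco#2": "noun.artifact",  # assumed: bench
    "banco#3": "noun.group",     # assumed: another institution-like sense
    "silla#1": "noun.artifact",
    "empresa#1": "noun.group",
}

# Hypothetical sense-annotated tokens: (lemma, sense id).
ANNOTATED = [
    ("banco", "banco#1"),
    ("banco", "banco#2"),
    ("banco", "banco#3"),
    ("silla", "silla#1"),
    ("empresa", "empresa#1"),
    ("banco", "banco#1"),
]


def examples_per_label(tokens, relabel=lambda sense: sense):
    """Count how many training examples each label receives."""
    return Counter(relabel(sense) for _, sense in tokens)


def polysemy(tokens, relabel=lambda sense: sense):
    """Number of distinct labels observed for each lemma."""
    labels = defaultdict(set)
    for lemma, sense in tokens:
        labels[lemma].add(relabel(sense))
    return {lemma: len(seen) for lemma, seen in labels.items()}


to_class = SENSE_TO_CLASS.__getitem__

sense_counts = examples_per_label(ANNOTATED)
class_counts = examples_per_label(ANNOTATED, to_class)
sense_polysemy = polysemy(ANNOTATED)
class_polysemy = polysemy(ANNOTATED, to_class)

print("examples per sense:", dict(sense_counts))
print("examples per class:", dict(class_counts))
print("polysemy with senses:", sense_polysemy)
print("polysemy with classes:", class_polysemy)
```

In this toy data the sparsest sense labels have a single example each, while every class has at least two, and the lemma `banco` drops from three candidate labels to two. A classifier trained on class labels therefore sees more examples per label and fewer competing labels per word, which is the trade-off the abstract describes.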

Details

Language :
English
Database :
OpenAIRE
Journal :
RUA. Repositorio Institucional de la Universidad de Alicante, Universidad de Alicante (UA)
Accession number :
edsair.dedup.wf.001..6b35b2ada7f521d64dc6dacf1cd55781