
Integrating Linguistic Resources in TC through WSD.

Authors :
Ureña-López, L. Alfonso
Buenaga, Manuel
Gómez, José M.
Source :
Computers & the Humanities. May 2001, Vol. 35 Issue 2, p215-230. 16p.
Publication Year :
2001

Abstract

Information access methods must be improved to overcome the information overload that most professionals face today. Text classification tasks, such as Text Categorization (TC), help users access the large amount of text they find on the Internet and in their organizations. TC is the classification of documents into a predefined set of categories. Most approaches to automatic TC rely on a training collection, that is, a set of manually classified documents. Other emerging linguistic resources, such as lexical databases, can also be used for classification tasks. This article describes an approach to TC that integrates a training collection (Reuters-21578) and a lexical database (WordNet 1.6) as knowledge sources. Lexical databases accumulate information on the lexical items of one or more languages. This information must be filtered before our TC model can use it effectively, and this filtering process is a Word Sense Disambiguation (WSD) task. WSD is the identification of the sense of words in context; it is an intermediate process in many natural language processing tasks, such as machine translation and multilingual information retrieval. We present the use of WSD as an aid for TC. Our approach to WSD is also based on the integration of two linguistic resources: a training collection (SemCor and Reuters-21578) and a lexical database (WordNet 1.6). Our experiments show that TC and WSD based on the integration of linguistic resources are very effective, and that WSD is necessary to integrate linguistic resources effectively in TC. [ABSTRACT FROM AUTHOR]
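The abstract does not give the model's equations, so the following is only an illustrative sketch of the general idea and should not be read as the authors' method. It builds a centroid (positive-only, Rocchio-style) prototype per category from training documents. It then adds lexical-database terms for the category name, keeping only the sense selected by a disambiguation step. That step is a simplified Lesk overlap, a technique swapped in here for illustration. `LEXICON` is a hypothetical toy stand-in for WordNet 1.6, and all function names and the `lexical_weight` parameter are hypothetical.

```python
from collections import Counter
import math

# Hypothetical toy lexicon standing in for WordNet 1.6: each word maps to
# candidate senses, each sense given as a set of related words.
LEXICON = {
    "interest": {
        "interest.finance": {"rate", "loan", "bank", "percent"},
        "interest.curiosity": {"curiosity", "attention", "hobby"},
    },
    "crude": {
        "crude.oil": {"oil", "petroleum", "barrel", "refinery"},
        "crude.rough": {"rough", "unrefined", "vulgar"},
    },
}

def simplified_lesk(word, context):
    """WSD filter: pick the sense whose related words overlap most with the context."""
    senses = LEXICON.get(word, {})
    if not senses:
        return set()
    best = max(senses, key=lambda s: len(senses[s] & context))
    return senses[best]

def vectorize(tokens):
    """Raw term-frequency vector for a tokenized document."""
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prototypes(training, lexical_weight=1.0):
    """training maps category name -> list of token lists (manually labeled docs)."""
    prototypes = {}
    for cat, docs in training.items():
        # Knowledge source 1: centroid of the training documents.
        proto = Counter()
        for doc in docs:
            for term, tf in vectorize(doc).items():
                proto[term] += tf / len(docs)
        # Knowledge source 2: lexical terms for the category name, filtered
        # by WSD using the training vocabulary as disambiguation context.
        for term in simplified_lesk(cat, set(proto)):
            proto[term] += lexical_weight
        proto[cat] += lexical_weight
        prototypes[cat] = proto
    return prototypes

def classify(tokens, prototypes):
    """Assign the category whose prototype is most similar to the document."""
    vec = vectorize(tokens)
    return max(prototypes, key=lambda c: cosine(vec, prototypes[c]))

training = {
    "interest": [["bank", "rate", "rise"], ["loan", "rate", "cut"]],
    "crude": [["oil", "price", "barrel"], ["opec", "oil", "output"]],
}
protos = build_prototypes(training)
print(classify(["refinery", "petroleum"], protos))  # prints "crude"
```

In this toy setup, neither "refinery" nor "petroleum" occurs in the training documents. The document is matched to `crude` only through terms contributed by the disambiguated lexical entry. Without the WSD step, the unrelated "rough" sense would add noise terms to the same prototype.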

Details

Language :
English
ISSN :
00104817
Volume :
35
Issue :
2
Database :
Academic Search Index
Journal :
Computers & the Humanities
Publication Type :
Academic Journal
Accession number :
16898801
Full Text :
https://doi.org/10.1023/a:1002632712378