A Multi-word Term Extraction System.

Authors :
Qiang Yang
Geoff Webb
Jisong Chen
Chung-Hsing Yeh
Rowena Chau
Source :
PRICAI 2006: Trends in Artificial Intelligence; 2006, p1160-1165, 6p
Publication Year :
2006

Abstract

Traditional statistical approaches to identifying multi-word terms must handle large amounts of noisy data and are extremely time-consuming. This paper introduces a system that extracts multi-word terms from a set of documents based on the correlated text-segments in those documents. The system takes a short predefined stoplist as its initial input and uses it to segment the documents into text-segments. It then calculates a segment-weight for every text-segment and, guided by the weight values, recursively uses the shorter text-segments to split the longer ones until no text-segment can be divided further. The resulting text-segments are identified as terms according to a specified threshold. Initial experiments on a set of traditional Chinese documents show that the system achieves a recall of at least 76.39% and a precision of at least 91.05% when retrieving terms that occur multiple times, including 18.30% newly identified terms. [ABSTRACT FROM AUTHOR]
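
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the segment-weight, so the sketch assumes it is a plain occurrence count, and the names `segment_by_stoplist`, `extract_terms`, `threshold`, and `min_len`, along with the sample documents and stoplist, are hypothetical. It also assumes that only segments at or above the threshold may split longer ones.

```python
from collections import Counter


def segment_by_stoplist(text, stoplist):
    """Cut text into segments at every character found in the stoplist."""
    segments, current = [], []
    for ch in text:
        if ch in stoplist:
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))
    return segments


def extract_terms(documents, stoplist, threshold=2, min_len=2):
    """Return {segment: weight} for segments whose weight meets the threshold.

    Assumption: the weight is the occurrence count of a segment, and a
    segment may split longer ones only once its weight reaches the threshold.
    """
    weights = Counter()
    for doc in documents:
        weights.update(segment_by_stoplist(doc, stoplist))

    changed = True
    while changed:  # repeat until no segment can be divided further
        changed = False
        for short in sorted(weights, key=len):  # shorter segments first
            if len(short) < min_len or weights[short] < threshold:
                continue
            for long in list(weights):
                if len(long) > len(short) and short in long:
                    count = weights.pop(long)
                    # The leftover pieces inherit the weight of the split segment.
                    for piece in long.split(short):
                        if piece:
                            weights[piece] += count
                    weights[short] += count * long.count(short)
                    changed = True

    return {seg: w for seg, w in weights.items()
            if w >= threshold and len(seg) >= min_len}


# Hypothetical sample input: four short traditional Chinese strings.
docs = ["資料探勘的方法", "資料探勘系統的設計", "資料探勘，系統設計", "系統的方法"]
stoplist = {"的", "，"}
print(extract_terms(docs, stoplist))
```

In this toy input, the segment 資料探勘 occurs twice on its own, so it is allowed to split the longer segment 資料探勘系統; the freed piece 系統 then reaches the threshold and in turn splits 系統設計. Each split replaces a segment with strictly shorter ones, so the loop terminates.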

Details

Language :
English
ISBNs :
9783540366676
Database :
Complementary Index
Journal :
PRICAI 2006: Trends in Artificial Intelligence
Publication Type :
Book
Accession number :
32907676
Full Text :
https://doi.org/10.1007/11801603_153