THE DESIGN OF A STATISTICAL ALGORITHM FOR RESOLVING STRUCTURAL AMBIGUITY IN “V NP[sub 1] usde NP[sub 0]”.
- Source :
- Computational Intelligence. Feb2003, Vol. 19 Issue 1, p64-85. 22p.
- Publication Year :
- 2003
Abstract
- The existence of structural ambiguity in modifying clauses renders noun phrase (NP) extraction from running Chinese texts complicated. It is shown from previous experiments that nearly 33% of the errors in an NP extractor were actually caused by the use of clause modifiers. For example, consider the sequence “V + NP[sub 1] + de (of) + NP[sub 0].” It can be interpreted as two alternatives, a verb phrase (i.e., [V [NP[sub 1] + de + NP[sub 0]][sub NP]][sub VP]) or a noun phrase (i.e., [[V NP[sub 1]][sub VP] + de + NP[sub 0]][sub NP]). To resolve this ambiguity, syntactical, contextual, and semantics-based approaches are investigated in this article. The conclusion is that the problem can be overcome only when the semantic knowledge about words is adopted. Therefore, a structural disambiguation algorithm based on lexical association is proposed. The algorithm uses the semantic class relation between a word pair, derived from a standard Chinese thesaurus, to work out whether a noun phrase or a verb phrase has a stronger lexical association within the collocation. This can, in turn, determine the intended phrase structure. With the proposed algorithm, the best accuracy and coverage are 79% and 100%, respectively. The experiment also shows that the backed-off model is more effective for this purpose. With this disambiguation algorithm, parsing performance can be significantly improved. [ABSTRACT FROM AUTHOR]
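The abstract names a backed-off model over thesaurus semantic classes but gives no formulas, so the following is only a minimal sketch of how such a back-off could work, not the authors' algorithm. It assumes labeled training tuples (V, NP[sub 1] head, NP[sub 0] head, reading), consults word-level counts first, falls back to class-level counts, and finally to a default reading. The names `WORD_CLASS`, `EXAMPLES`, `train`, and `disambiguate`, the English glosses, the class labels, and the default of "NP" are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy thesaurus: English glosses and class labels are invented
# for illustration and do not come from the article or its thesaurus.
WORD_CLASS = {
    "write": "ACT_CREATE", "compose": "ACT_CREATE", "read": "ACT_PERCEIVE",
    "novel": "TEXT", "poem": "TEXT", "book": "TEXT", "letter": "TEXT",
    "author": "HUMAN", "poet": "HUMAN", "teacher": "HUMAN", "student": "HUMAN",
}

# Hypothetical labeled examples: (V, NP1 head, NP0 head, intended reading).
EXAMPLES = [
    ("write", "novel", "author", "NP"),  # "the author who writes novels"
    ("read", "teacher", "book", "VP"),   # "read the teacher's book"
]


def class_key(v, n1, n0):
    """Map each word to its semantic class (None if the word is unlisted)."""
    return tuple(WORD_CLASS.get(w) for w in (v, n1, n0))


def train(examples):
    """Count readings per word triple and per semantic-class triple."""
    word_counts = defaultdict(Counter)
    class_counts = defaultdict(Counter)
    for v, n1, n0, label in examples:
        word_counts[(v, n1, n0)][label] += 1
        key = class_key(v, n1, n0)
        if None not in key:
            class_counts[key][label] += 1
    return word_counts, class_counts


def disambiguate(model, v, n1, n0, default="NP"):
    """Return (reading, level), backing off from words to classes to a default."""
    word_counts, class_counts = model
    levels = (
        ("word", word_counts, (v, n1, n0)),
        ("class", class_counts, class_key(v, n1, n0)),
    )
    for level, table, key in levels:
        votes = table.get(key)
        # Use this level only if it has evidence and the evidence is not tied.
        if votes and votes["NP"] != votes["VP"]:
            return ("NP" if votes["NP"] > votes["VP"] else "VP"), level
    return default, "default"


MODEL = train(EXAMPLES)
print(disambiguate(MODEL, "write", "novel", "author"))
print(disambiguate(MODEL, "compose", "poem", "poet"))
print(disambiguate(MODEL, "read", "student", "letter"))
```

In this sketch the second and third calls involve word triples never seen in training, so the decision comes from the class-level counts. Because the final step always returns a default, every input receives some answer, which is one simple way a system could reach full coverage at the cost of accuracy on unseen cases.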
- Subjects :
- *ALGORITHMS
- *STATISTICS
- *SEMANTICS
- *CHINESE language
- *NOUN phrases (Grammar)
Details
- Language :
- English
- ISSN :
- 08247935
- Volume :
- 19
- Issue :
- 1
- Database :
- Academic Search Index
- Journal :
- Computational Intelligence
- Publication Type :
- Academic Journal
- Accession number :
- 10153752
- Full Text :
- https://doi.org/10.1111/1467-8640.00214