Syntactic Simplification for Improving Content Selection in Multi-Document Summarization

Authors :
COLUMBIA UNIV NEW YORK DEPT OF COMPUTER SCIENCE
Siddharthan, Advaith
Nenkova, Ani
McKeown, Kathleen
Source :
DTIC
Publication Year :
2004

Abstract

In this paper, we explore the use of automatic syntactic simplification for improving content selection in multi-document summarization. In particular, we show how simplifying parentheticals by removing relative clauses and appositives results in improved sentence clustering, by forcing clustering based on central rather than background information. We argue that the inclusion of parenthetical information in a summary is a reference-generation task rather than a content-selection one, and implement a baseline reference rewriting module. We perform our evaluations on the test sets from the 2003 and 2004 Document Understanding Conference and report that simplifying parentheticals results in significant improvement on the automated evaluation metric Rouge.

Syntactic simplification is an NLP task, the goal of which is to rewrite sentences to reduce their grammatical complexity while preserving their meaning and information content. Text simplification is a useful task for varied reasons. Chandrasekar et al. (1996) viewed text simplification as a preprocessing tool to improve the performance of their parser. The PSET project (Carroll et al., 1999), on the other hand, focused its research on simplifying newspaper text for aphasics, who have trouble with long sentences and complicated grammatical constructs. We have previously (Siddharthan, 2002; Siddharthan, 2003) developed a shallow and robust syntactic simplification system for news reports that simplifies relative clauses, apposition and conjunction. In this paper, we explore the use of syntactic simplification in multi-document summarization.

Details

Database :
OAIster
Journal :
DTIC
Notes :
text/html, English
Publication Type :
Electronic Resource
Accession number :
edsoai.ocn831966731
Document Type :
Electronic Resource