1. Identifying High Quality Document-Summary Pairs through Text Matching.
- Author
- Yongshuai Hou, Yang Xiang, Buzhou Tang, Qingcai Chen, Xiaolong Wang, and Fangze Zhu
- Subjects
NATURAL language processing, ARTIFICIAL intelligence, ELECTRONIC data processing, BIG data, MICROBLOGS
- Abstract
Text summarization, namely the automatic generation of a short summary of a given document, is a difficult task in natural language processing. Deep learning has gradually been applied to text summarization, but large-scale, high-quality datasets for this technique are still lacking. In this paper, we proposed a novel deep learning method to identify high-quality document-summary pairs for building a large-scale dataset of such pairs. Concretely, a long short-term memory (LSTM)-based model was designed to measure the quality of document-summary pairs. To leverage information across all parts of each document, we further proposed an improved LSTM-based model that removes the forget gate from the LSTM unit. Experiments on training and test sets built from Sina Weibo (a Chinese microblog website similar to Twitter) showed that the LSTM-based models significantly outperformed baseline models in terms of the area under the receiver operating characteristic curve (AUC). [ABSTRACT FROM AUTHOR]
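The abstract does not give the equations of the modified unit, so the following is only a minimal sketch of one plausible reading: an LSTM step whose forget gate is removed, which is treated here as fixing it at 1 so that the previous cell state is carried forward unchanged. The function names (`lstm_step_no_forget`, `encode`, `pair_score`), the stacked weight layout, and the cosine-similarity scoring of a document-summary pair are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    """Logistic function used for the gates."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step_no_forget(x, h_prev, c_prev, W, U, b):
    """One step of an LSTM unit without a forget gate (assumed form).

    W has shape (3*H, D), U has shape (3*H, H), b has shape (3*H,).
    The three stacked blocks are: input gate, output gate, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:H])            # input gate
    o = sigmoid(z[H:2 * H])       # output gate
    g = np.tanh(z[2 * H:])        # candidate cell update
    # Assumption: with no forget gate, c_prev is kept in full
    # (equivalent to a forget gate fixed at 1).
    c = c_prev + i * g
    h = o * np.tanh(c)
    return h, c


def encode(seq, W, U, b):
    """Run the unit over a sequence of input vectors; return the last hidden state."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in seq:
        h, c = lstm_step_no_forget(x, h, c, W, U, b)
    return h


def pair_score(doc_seq, sum_seq, W, U, b, eps=1e-12):
    """Hypothetical pair-quality score: cosine similarity of the two encodings."""
    d = encode(doc_seq, W, U, b)
    s = encode(sum_seq, W, U, b)
    return float(d @ s / (np.linalg.norm(d) * np.linalg.norm(s) + eps))
```

In this sketch the cell state only accumulates gated updates, which is one way earlier parts of a document could keep contributing to the final encoding. A trained model would learn `W`, `U`, and `b` from labeled pairs, and its scores could then be evaluated with AUC as the abstract describes.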
- Published
- 2017