
CNN-IETS

Authors :
Yongxin Shen
An Liu
Meng Hu
Lei Zhao
Guanfeng Liu
Zhixu Li
Kai Zheng
Source :
CIKM
Publication Year :
2017
Publisher :
ACM, 2017.

Abstract

Information Extraction by Text Segmentation (IETS) aims at segmenting text inputs to extract the implicit data values contained in them. State-of-the-art IETS approaches mainly rely on machine learning techniques, either supervised or unsupervised. However, while the supervised approaches require a large amount of labelled training data, the performance of the unsupervised ones can be unstable across data sets. To overcome these weaknesses, this paper introduces CNN-IETS, a novel unsupervised probabilistic approach that takes advantage of pre-existing data and a Convolutional Neural Network (CNN)-based probabilistic classification model. While using the CNN model eases the burden of selecting high-quality features for associating text segments with attributes of a given domain, the pre-existing data, serving as a domain knowledge base, provides training data with a comprehensive list of features for building the CNN model. Given an input text, we perform an initial segmentation (according to the occurrences of its words in the knowledge base) to generate text segments, which the CNN then classifies with probabilities. Then, based on the probabilistic CNN classification results, we find the most probable labelling of the whole input text. As a complement, a bidirectional sequencing model learned on demand from test data is finally deployed to adjust problematically labelled segments. Our experimental study, conducted on several real data collections, shows that CNN-IETS improves on the extraction quality of state-of-the-art approaches by more than 10%.
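
The first three steps named in the abstract (knowledge-base-driven segmentation, probabilistic classification of segments, and selection of the most probable labelling) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the knowledge base `KB`, the function names, and the grouping rule are hypothetical, a smoothed word-overlap scorer stands in for the CNN, and the sketch assumes each attribute labels at most one segment, which the abstract does not state. The bidirectional sequencing adjustment is omitted.

```python
from itertools import permutations
from math import prod

# Hypothetical toy knowledge base: attribute -> known values (not from the paper).
KB = {
    "name": ["pizza hut", "golden dragon"],
    "street": ["main street", "elm road"],
    "city": ["springfield", "new york"],
}


def kb_attributes(word):
    """Return the attributes whose known values contain the word."""
    return {a for a, values in KB.items() if any(word in v.split() for v in values)}


def initial_segments(text):
    """Group consecutive words that share at least one knowledge-base attribute."""
    segments, current, current_attrs = [], [], set()
    for word in text.lower().split():
        attrs = kb_attributes(word)
        shared = current_attrs & attrs
        if current and shared:
            current.append(word)
            current_attrs = shared
        else:
            if current:
                segments.append(" ".join(current))
            current, current_attrs = [word], attrs
    if current:
        segments.append(" ".join(current))
    return segments


def classify(segment):
    """Stand-in for the CNN: smoothed word-overlap scores normalised to probabilities."""
    words = segment.split()
    scores = {}
    for attr, values in KB.items():
        vocab = {w for v in values for w in v.split()}
        scores[attr] = 0.1 + sum(w in vocab for w in words)
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}


def best_labelling(segments):
    """Pick the labelling with the highest joint probability.

    Assumption (not from the abstract): each attribute is used at most once,
    so candidate labellings are permutations of the attributes.
    """
    attrs = list(KB)
    if len(segments) > len(attrs):
        raise ValueError("sketch assumes no more segments than attributes")
    probs = [classify(s) for s in segments]
    best = max(
        permutations(attrs, len(segments)),
        key=lambda labels: prod(p[a] for p, a in zip(probs, labels)),
    )
    return list(zip(segments, best))


if __name__ == "__main__":
    print(best_labelling(initial_segments("Pizza Hut Main Street Springfield")))
```

Exhaustive enumeration is only practical for the handful of attributes in this toy setting; it is used here to keep the selection step easy to read.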

Details

Database :
OpenAIRE
Journal :
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
Accession number :
edsair.doi...........9b9587d8dc897d34e89481c5f3c4f923