1. Social relation extraction with improved distant supervised and word embedding features
- Author
- Jinwen Liu, Weikang Rui, Liping Zhang, and Yawei Jia
- Subjects
Word embedding; business.industry; Computer science; Feature extraction; Context (language use); 02 engineering and technology; 010501 environmental sciences; Machine learning; computer.software_genre; Semantics; 01 natural sciences; Relationship extraction; Knowledge-based systems; Information extraction; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; The Internet; Chinese language; Artificial intelligence; business; F1 score; computer; Natural language processing; 0105 earth and related environmental sciences
- Abstract
With the rapid development of the Internet, extracting personal relations from the Internet has become an important research topic in information extraction. However, current relation extraction research focuses mainly on English, and comparatively little work addresses Chinese. In addition, current personal relation extraction approaches have two main problems: 1) it is difficult to obtain a large amount of high-quality training data without manual labeling effort; 2) the performance of personal relation extraction from Chinese text is unsatisfactory. To address the first problem, we propose an improved distant supervision method that is applied to Chinese and can automatically label large-scale, high-quality training data. To address the second problem, we extract three features based on word embeddings and combine them with the basic features. In the experiments, the improved distant supervision method significantly improves the quality of the training data, and the word embedding features improve the average F1 score by 4% over the basic features alone.
- Published
- 2016