1. A new unsupervised algorithm for extracting relationship words between two entities
- Author
Li Yao, Honghai Feng, Fan Wu, and Taihao Zheng
- Subjects
Relation (database), Computer science, Feature extraction, Supervised learning, Semantics, Relationship extraction, Unsupervised algorithm, Knowledge graph, Artificial intelligence, Web crawler, Natural language processing
- Abstract
Purpose: Using a popular supervised learning algorithm such as BERT to extract relationships between concepts (triple relationship extraction) requires labeling the relationship types manually. If some relation words are not labeled in the training stage, they will probably not be recognized in the test stage, and the corresponding entities will not be recognized either. This paper proposes a new unsupervised algorithm that extracts as many relation words between two entities as possible, especially those that are easily overlooked. Methods: Taking the disease-cause relationship as an example, 10,204 valid sentences describing diseases and their corresponding causes were collected by a web crawler. Relation words were then extracted in an unsupervised manner under syntactic, semantic, and lexical feature constraints, and the automatically extracted results were summarized. Results: Some specific relation words that were ignored in the manual labeling stage were found; conjoined relation words that often appear together in the texts were recognized; and several types and features of relation words were obtained. These types and features can help relation labeling in the supervised learning stage, help expand the relevant knowledge graphs, and improve the accuracy of information retrieval.
- Published
- 2021