Short Texts Feature Enrichment Method Based on Heterogeneous Information Network

Authors :
LYU Xiao-feng, ZHAO Shu-liang, GAO Heng-da, WU Yong-liang, ZHANG Bao-qi
Source :
Jisuanji kexue, Vol 49, Iss 9, Pp 92-100 (2022)
Publication Year :
2022
Publisher :
Editorial office of Computer Science, 2022.

Abstract

With the deep integration of computer technology into social life, more and more short text messages are spread across web platforms. To address the data sparsity of short texts, a robust heterogeneous information network framework (HTE) for modeling short texts is constructed. The framework can integrate any type of additional information and capture the relationships among these information types. Based on this framework, six short text expansion methods are designed using different external knowledge. The short text features are enriched with entity information (entities, entity categories, and inter-entity relationships) and with textual information such as text topics, drawn from the Wikipedia and Freebase knowledge bases. Finally, similarity measurement results are used to verify the experimental effect. The six text expansion methods are compared with three traditional similarity measures and with current mainstream short text matching algorithms on two short text datasets, and all six proposed methods show improved results. Compared with BERT, the similarity measurement result of the best method improves by 5.97%. The proposed framework is robust and can include any type of external knowledge, and the proposed method can overcome the data sparsity problem of short texts and measure short text similarity with high accuracy in an unsupervised manner.
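The general idea of enriching sparse short texts with external knowledge before measuring similarity can be illustrated with a minimal sketch. This is not the HTE framework or any of the six methods from the article, whose details the abstract does not give. The toy knowledge base `TOY_KB`, the helper names `bag`, `enrich`, and `cosine`, and the choice of cosine similarity are all assumptions made for illustration; a real system would look up entities and categories in Wikipedia or Freebase.

```python
from collections import Counter
from math import sqrt

# Hypothetical stand-in for a knowledge base lookup (entity -> category).
TOY_KB = {
    "python": "programming_language",
    "java": "programming_language",
}

def bag(text):
    """Bag-of-words features for a short text."""
    return Counter(text.lower().split())

def enrich(text, kb=TOY_KB):
    """Add entity and category features on top of the word features."""
    words = bag(text)
    features = Counter({f"word:{w}": c for w, c in words.items()})
    for word in words:
        if word in kb:
            features[f"entity:{word}"] += 1
            features[f"category:{kb[word]}"] += 1
    return features

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

text_a, text_b = "python tutorial", "java guide"
plain = cosine(bag(text_a), bag(text_b))            # no shared words
enriched = cosine(enrich(text_a), enrich(text_b))   # shared category feature
print(plain, enriched)
```

In this toy case the two texts share no words, so their plain similarity is zero, while the shared category feature gives the enriched vectors a nonzero similarity. This is the sparsity effect that the abstract says external knowledge is meant to counter.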

Details

Language :
Chinese
ISSN :
1002137X, 21070024, and 79993389
Volume :
49
Issue :
9
Database :
Directory of Open Access Journals
Journal :
Jisuanji kexue
Publication Type :
Academic Journal
Accession number :
edsdoj.79993389f7ac4720aeaaab762a9ab5a2
Document Type :
article
Full Text :
https://doi.org/10.11896/jsjkx.210700241