1. An integrated framework of learning and evidential reasoning for user profiling using short texts
- Author
-
Van-Nam Huynh, Jessada Karnjana, and Duc-Vinh Vo
- Subjects
Information retrieval ,User profile ,Word embedding ,Computer science ,User modeling ,Evidential reasoning approach ,020206 networking & telecommunications ,02 engineering and technology ,Recommender system ,Hardware and Architecture ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,Profiling (information science) ,020201 artificial intelligence & image processing ,Cluster analysis ,Pignistic probability ,Software ,Information Systems - Abstract
Inferring user profiles based on texts created by users on social networks has a variety of applications in recommender systems such as job offering, item recommendation, and targeted advertisement. The problem becomes more challenging when working with short texts like tweets on Twitter, or posts on Facebook. This work aims at proposing an integrated framework based on Dempster–Shafer theory of evidence, word embedding, and k -means clustering for user profiling problem, which is capable of not only working well with short texts but also dealing with uncertainty inherently in user texts. The proposed framework is essentially composed of three phases: (1) Learning abstract concepts at multiple levels of abstraction from user corpora; (2) Evidential inference and combination for user modeling; and (3) User profile extraction. Particularly, in the first phase, a word embedding technique is used to convert preprocessed texts into vectors which capture semantics of words in user corpus, and then k -means clustering is utilized for learning abstract concepts at multiple levels of abstraction, each of which reflects appropriate semantics of user profiles. In the second phase, by considering each document in user corpus as an evidential source that carries some partial information for inferring user profiles, we first infer a mass function associated with each user document by maximum a posterior estimation, and then apply Dempster’s rule of combination for fusing all documents’ mass functions into an overall one for the user corpus. Finally, in the third phase, we apply the so-called pignistic probability principle to extract top- n keywords from user’s overall mass function to define the user profile. Thanks to the ability of combining pieces of information from many documents, the proposed framework is flexible enough to be scaled when input data coming from not only multiple modes but different sources on web environments. Besides, the resulting profiles are interpretable, visualizable, and compatible in practical applications. The effectiveness of the proposed framework is validated by experimental studies conducted on datasets crawled from Twitter and Facebook.
- Published
- 2021