Community detection using hierarchical clustering based on edge-weighted similarity in cloud environment
- Source :
- Information Processing & Management. 56:91-109
- Publication Year :
- 2019
- Publisher :
- Elsevier BV, 2019.
Abstract
- Social networks have recently attracted growing attention. Accurate community detection in social networks can support better product design, more accurate information recommendation, and better public services. This paper therefore proposes a community detection (CD) algorithm based on network topology and user interests. The paper has two parts. In the first part, a focused crawler algorithm acquires personal tags from the tags posted by other users. Tags are then selected from the tag set based on the TFIDF weighting scheme, the semantic extension of tags, and the user semantic model. The tag vector of user interests is then derived, with each tag weight calculated by an improved PageRank algorithm. In the second part, to detect communities, an initial social network consisting of directed, unweighted edges and vertices with interest vectors is constructed from the following/follower relationships. This initial network is then converted into a new social network with undirected, weighted edges. The edge weights are calculated from the edge directions and the interest vectors in the initial network, and the similarity between edges is calculated from the edge weights. Communities are detected by a hierarchical clustering algorithm based on this edge-weighted similarity. Finally, the number of detected communities is determined by the partition density. An extensive experimental study shows that the proposed user interest detection (PUID) algorithm outperforms the CF and TFIDF algorithms with respect to F-measure, Precision, and Recall. Moreover, the Precision of the proposed community detection (PCD) algorithm improves, on average, by up to 8.21% compared with the Newman algorithm and by up to 41.17% compared with the CPM algorithm.
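The abstract states that the number of communities is chosen by partition density but gives no formula. The sketch below is a minimal illustration that assumes the standard unweighted partition density used in link-clustering methods, D = (2/M) Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)), where M is the total number of edges and community c has m_c edges and n_c vertices. Whether the paper uses this form or a weighted variant is not stated here. The names `partition_density` and `best_level` are hypothetical helpers, not taken from the paper.

```python
def partition_density(edge_communities):
    """Partition density of an edge partition (assumed unweighted form).

    edge_communities: list of communities, each a list of (u, v) edges.
    """
    total_edges = sum(len(edges) for edges in edge_communities)
    if total_edges == 0:
        return 0.0
    acc = 0.0
    for edges in edge_communities:
        m = len(edges)
        n = len({node for edge in edges for node in edge})
        if n <= 2:
            continue  # a community of a single edge contributes zero
        acc += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * acc / total_edges


def best_level(levels):
    """Pick the dendrogram level (an edge partition) with maximal density.

    levels: list of candidate edge partitions, e.g. one per cut height
    of a hierarchical clustering of edges (hypothetical input format).
    """
    return max(levels, key=partition_density)


# Toy graph: a triangle a-b-c plus a pendant edge c-d.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
pendant = [("c", "d")]

one_community = [triangle + pendant]
two_communities = [triangle, pendant]

chosen = best_level([one_community, two_communities])
print(len(chosen), partition_density(chosen))
```

On this toy graph, separating the triangle from the pendant edge gives a higher density than keeping all four edges together, so the two-community cut is selected.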
- Subjects :
- Computer science
020206 networking & telecommunications
02 engineering and technology
Library and Information Sciences
Management Science and Operations Research
Focused crawler
Semantic data model
Network topology
Partition (database)
Computer Science Applications
Hierarchical clustering
Weighting
0202 electrical engineering, electronic engineering, information engineering
Media Technology
020201 artificial intelligence & image processing
Data mining
Precision and recall
tf–idf
Information Systems
Details
- ISSN :
- 03064573
- Volume :
- 56
- Database :
- OpenAIRE
- Journal :
- Information Processing & Management
- Accession number :
- edsair.doi...........a4117262590ba5d70a2192cf19ef9681