101. Micro-blog commercial word extraction based on improved TF-IDF algorithm.
- Author
- Huang, Xing and Wu, Qing
- Abstract
Some existing micro-blog commercial word extraction algorithms consider only the relationship between a keyword and the number of times it appears in the text, and ignore the keyword's distribution within a given category, which reduces extraction accuracy. To address this problem, this paper studies the application of the TF-IDF algorithm to term weight calculation. Drawing on information theory and an analysis of the distribution of keywords within a class, it proposes an improved TF-IDF algorithm and applies it to term weight calculation. To test the feasibility of the improved algorithm, the paper first classifies a large volume of micro-blog information into categories and then uses the improved TF-IDF algorithm to calculate term weights across the categories; the calculation is implemented on the Hadoop distributed framework. The experimental results show that the improved TF-IDF algorithm is effective and feasible for micro-blog commercial word extraction. Compared with traditional algorithms, the improved algorithm greatly improves accuracy, and data processing speed also improves greatly under the Hadoop framework. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
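The abstract describes weighting terms by TF-IDF across categories and then adjusting for how a keyword is distributed within a class, but it does not give the improved formula. The following is a minimal Python sketch of that general idea, not the authors' algorithm: `tf_idf` is the classic formulation with each category treated as one document, while `within_class_spread` (a normalized entropy of a term's counts across the posts of one class) and `class_aware_weights` are hypothetical names and an assumed stand-in for the paper's information-theoretic factor. The sample posts are invented for illustration, and the Hadoop implementation is not reproduced.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF. `docs` maps a category name to its flat token list."""
    n_docs = len(docs)
    # Document frequency: in how many categories does each term occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    weights = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


def within_class_spread(term, posts):
    """ASSUMPTION, not the paper's formula: normalized entropy of the term's
    counts across the posts of one class. Close to 1.0 when the term is spread
    evenly over the posts, 0.0 when it is confined to a single post."""
    counts = [post.count(term) for post in posts]
    total = sum(counts)
    if total == 0 or len(posts) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return entropy / math.log(len(posts))


def class_aware_weights(categories):
    """Scale each classic TF-IDF weight by the assumed within-class factor.
    `categories` maps a category name to a list of posts (token lists)."""
    flat = {name: [tok for post in posts for tok in post]
            for name, posts in categories.items()}
    base = tf_idf(flat)
    return {
        name: {term: w * within_class_spread(term, categories[name])
               for term, w in base[name].items()}
        for name in categories
    }


# Invented sample data: two categories of already-tokenized posts.
categories = {
    "retail": [["sale", "discount", "shop"], ["sale", "coupon"],
               ["sale", "shop", "today"]],
    "sports": [["match", "goal", "today"], ["team", "goal"]],
}
weights = class_aware_weights(categories)
```

In this sketch a term that occurs in every category ("today") gets zero weight from the IDF term, and a term confined to a single post of its class ("coupon") is zeroed by the assumed spread factor, whereas a term that recurs across the posts of one class ("sale") keeps its full TF-IDF weight.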