Research on Chinese Text and Application Based on the Latent Dirichlet Allocation

Authors :
Mingzhu Wei
Zhang Junqing
Mai Weijie
Feng Yuan
Source :
2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE).
Publication Year :
2020
Publisher :
IEEE, 2020.

Abstract

Chinese text processing has long been both a popular and a difficult topic in natural language processing. On the one hand, Chinese and English differ greatly in grammar, which makes Chinese word segmentation a distinct problem. On the other hand, Chinese part-of-speech tagging methods also differ from those used for English. To some extent, these two points hinder the research and application of natural language processing technology for Chinese text. This paper presents a word segmentation algorithm for Chinese text and then introduces how to use the vector space model to represent Chinese text. The number of topics k is then chosen according to different criteria for measuring text clustering, in order to build a Latent Dirichlet Allocation model based on consumer complaint text from Guangdong Province. The model is intended to assist administrative departments in formulating corresponding policies and to help administrative personnel grasp the key points of complaints accurately and quickly.
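
The first two steps named in the abstract, word segmentation and a vector space representation, can be illustrated with a minimal sketch. The abstract does not say which segmentation algorithm the paper uses, so forward maximum matching is swapped in here purely as an illustration. The toy dictionary, the sample sentences, and the function names `segment` and `count_vectors` are all assumptions and do not come from the paper.

```python
from collections import Counter

# Hypothetical toy dictionary (an assumption, not from the paper);
# a real system would load a large lexicon.
DICTIONARY = {"消费者", "投诉", "商品", "质量", "问题", "退款", "商家", "拒绝"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if none matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def count_vectors(docs_tokens):
    """Vector space model with raw term counts: one dimension per
    vocabulary term, one vector per document."""
    vocab = sorted({token for doc in docs_tokens for token in doc})
    vectors = []
    for doc in docs_tokens:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


# Made-up complaint-style sentences, for illustration only.
docs = ["消费者投诉商品质量问题", "商家拒绝退款", "消费者投诉商家"]
docs_tokens = [segment(doc) for doc in docs]
vocab, vectors = count_vectors(docs_tokens)
```

Term-count vectors of this kind are the usual input to a Latent Dirichlet Allocation model. Choosing the topic number k would then mean fitting the model for several candidate values and comparing them under whichever clustering criteria are adopted, which this sketch does not attempt.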

Details

Database :
OpenAIRE
Journal :
2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE)
Accession number :
edsair.doi...........33ef9f3ce481c63d553f24bd431ef88d