Research on Chinese Text and Application Based on the Latent Dirichlet Allocation
- Source :
- 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE).
- Publication Year :
- 2020
- Publisher :
- IEEE, 2020.
Abstract
- Chinese text processing has long been an active and difficult area of natural language processing. On the one hand, Chinese and English differ greatly in grammar, which makes Chinese word segmentation a distinct problem. On the other hand, Chinese part-of-speech tagging methods also differ from those used for English. To some extent, these two points hinder the research and application of natural language processing technology for Chinese text. The paper presents a word segmentation algorithm for Chinese text and then introduces how to use the vector space model to represent Chinese text. The topic number k is then chosen according to different criteria for measuring text clustering, in order to build a Latent Dirichlet Allocation model based on consumer complaint text from Guangdong Province. The model is intended to help administrative departments formulate corresponding policies and to help administrative personnel grasp the key points of complaints accurately and quickly.
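The vector space representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which term weighting the paper uses, so the smoothed TF-IDF weighting below is an assumption, as are the function names `tfidf_vectors` and `cosine` and the sample tokens. The input is assumed to be already segmented into words; the paper's own segmentation algorithm is not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map pre-segmented documents (lists of tokens) to TF-IDF vectors.

    Assumption: smoothed TF-IDF weighting; the paper's actual weighting
    scheme is not given in the abstract.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, so a term found in every document keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([tf[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical, already-segmented complaint texts (illustrative data only).
docs = [
    ["手机", "质量", "问题", "退货"],
    ["手机", "屏幕", "质量", "问题"],
    ["快递", "延误", "投诉"],
]
vocab, vecs = tfidf_vectors(docs)
```

Each document becomes one row of term weights over a shared vocabulary, so documents with overlapping terms have a higher cosine similarity. Vectors or term counts of this kind are the usual input to clustering and topic models such as Latent Dirichlet Allocation.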
- Subjects :
- Point (typography)
Grammar
Computer science
GRASP
Text segmentation
Document clustering
Latent Dirichlet allocation
Vector space model
Segmentation
Artificial intelligence
Natural language processing
Details
- Database :
- OpenAIRE
- Journal :
- 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE)
- Accession number :
- edsair.doi...........33ef9f3ce481c63d553f24bd431ef88d