101. Chinese semantic document classification based on strategies of semantic similarity computation and correlation analysis
- Author
- Shuo Yang, Jingzhi Guo, Hengliang Tan, and Ran Wei
- Subjects
Word embedding, Computer Networks and Communications, Computer science, Document classification, Semantic analysis (machine learning), 02 engineering and technology, Ambiguity, Human-Computer Interaction, Semantic similarity, 020204 information systems, Synonym (database), 0202 electrical engineering, electronic engineering, information engineering, Information system, 020201 artificial intelligence & image processing, Artificial intelligence, Polysemy, Software, Natural language processing
- Abstract
Document classification has become an indispensable technology for intelligent information services. It is often applied to tasks such as document organization, analysis, and archiving, or implemented as a submodule that supports higher-level applications. Semantic analysis has been shown to improve document classification performance. Although earlier automatic document classification methods already incorporated it, the growing number of documents stored online has drawn greater attention to the use of semantic information for document classification, because it can greatly reduce human effort. In this paper, we propose two semantic document classification strategies for two types of semantic problems: (1) a novel semantic similarity computation (SSC) method to address polysemy and (2) a strong correlation analysis method (SCM) to address synonymy. Experimental results indicate that, compared with traditional machine learning, n-gram, and contextualized word embedding methods, the efficient semantic similarity computation and correlation analysis eliminate word ambiguity and extract useful features, improving the accuracy of semantic document classification for Chinese texts.
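The abstract does not specify how SSC or SCM are computed. As a generic illustration only, and explicitly not the authors' method, the sketch below shows the two underlying ideas on hypothetical toy data: cosine similarity selects the sense of a polysemous word whose prototype vector best matches its context, and Pearson correlation of per-document term counts flags candidate synonym pairs whose features could be merged. The function names `pick_sense` and `synonym_pairs`, the 0.9 threshold, the sense labels, and all vectors are assumptions made for this example.

```python
import math
from itertools import combinations

# Hypothetical illustration only: NOT the SSC or SCM methods from the paper.


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def pick_sense(context_vec, sense_vecs):
    """Polysemy step (assumed): return the sense label whose vector is most similar to the context."""
    return max(sense_vecs, key=lambda s: cosine(context_vec, sense_vecs[s]))


def synonym_pairs(term_doc_counts, threshold=0.9):
    """Synonym step (assumed): term pairs whose per-document counts correlate at or above threshold."""
    pairs = []
    for t1, t2 in combinations(sorted(term_doc_counts), 2):
        if pearson(term_doc_counts[t1], term_doc_counts[t2]) >= threshold:
            pairs.append((t1, t2))
    return pairs


if __name__ == "__main__":
    # Toy 3-dimensional vectors for the polysemous word 苹果 (fruit vs. company).
    senses = {"苹果/水果": [0.9, 0.1, 0.0], "苹果/公司": [0.1, 0.8, 0.4]}
    context = [0.2, 0.9, 0.3]  # a context leaning toward technology
    print(pick_sense(context, senses))  # 苹果/公司

    # Toy term counts across four documents; 电脑 and 计算机 both mean "computer".
    counts = {"电脑": [3, 0, 5, 1], "计算机": [4, 0, 6, 1], "水果": [0, 5, 0, 2]}
    print(synonym_pairs(counts))  # [('电脑', '计算机')]
```

Correlating document-level counts is only one simple way to detect synonym candidates; distributional signals such as embedding similarity could be substituted without changing the overall structure.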
- Published
- 2020