
Natural Language Processing Using Neighbour Entropy-based Segmentation

Authors :
Qiao, Jianfeng
Yan, Xingzhi
Lv, Shuran
Source :
Journal of computing and information technology; ISSN 1330-1136 (Print); ISSN 1846-3908 (Online); Volume 29; Issue 2
Publication Year :
2021

Abstract

In natural language processing (NLP) of Chinese hazard texts collected during hazard identification, Chinese word segmentation (CWS) is the first step in extracting meaningful information from these semi-structured texts. This paper proposes a new neighbor entropy-based segmentation (NES) model for CWS. The model considers the segmentation benefits of neighbor entropies and adopts the concept of a "neighbor" from optimization research. It is defined by the benefit ratio of text segmentation, which accounts for both the benefits and the losses of combining segmentation units, and therefore draws on more information than other popular statistical models. In the experiments, combined with the maximum-based segmentation algorithm, the NES model achieves 99.3% precision, 98.7% recall, and a 99.0% F-measure for text segmentation, outperforming existing tools based on seven other popular statistical models. The results show that the NES model is a valid CWS method, especially when segmentation must handle longer character strings. The text corpus comes from the Beijing Municipal Administration of Work Safety and was recorded in the fourth quarter of 2018.
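The abstract does not give the NES benefit-ratio formula, so the following is only a minimal sketch of the general neighbor (branching) entropy statistic that entropy-based CWS methods build on: for a candidate string, it measures how varied the characters immediately to its left and right are across a corpus. High entropy on both sides suggests the string is a free-standing word. The function names `neighbor_entropy` and `boundary_score`, the base-2 logarithm, and the "minimum of both sides" scoring rule are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter


def _entropy(counts: Counter) -> float:
    """Shannon entropy (base 2) of a frequency table; 0.0 if empty."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def neighbor_entropy(corpus: str, word: str) -> tuple[float, float]:
    """Return (left, right) neighbor entropy of `word` in `corpus`.

    Each occurrence contributes the single character just before it (left)
    and just after it (right); occurrences at the corpus edges contribute
    no neighbor on that side.
    """
    left: Counter = Counter()
    right: Counter = Counter()
    start = corpus.find(word)
    while start != -1:
        end = start + len(word)
        if start > 0:
            left[corpus[start - 1]] += 1
        if end < len(corpus):
            right[corpus[end]] += 1
        start = corpus.find(word, start + 1)  # allow overlapping matches
    return _entropy(left), _entropy(right)


def boundary_score(corpus: str, word: str) -> float:
    """Hypothetical word-likeness score: the weaker of the two sides."""
    return min(neighbor_entropy(corpus, word))


if __name__ == "__main__":
    # Toy hazard-style corpus (hypothetical, not from the paper's data).
    corpus = "安全隐患排查安全隐患整改消防隐患排查"
    print(neighbor_entropy(corpus, "隐患"))
    print(boundary_score(corpus, "安全"))
```

In the toy corpus, "隐患" is preceded by 全, 全, 防 and followed by 排, 整, 排, so both sides have the same nonzero entropy, while "安全" always precedes 隐 and scores 0. As a separate arithmetic check on the reported figures, the F-measure F = 2PR / (P + R) with P = 0.993 and R = 0.987 gives about 0.990, consistent with the stated 99.0%.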

Details

Database :
OAIster
Notes :
application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1337493789
Document Type :
Electronic Resource