Research on feature extraction of unstructured large power texts

Authors :
WANG Jiakai
HUANG Peizhuo
LI Yongle
SHENG Shuang
LIU Yang
ZHENG Ling
WEI Zhenhua
Source :
Zhejiang dianli, Vol 43, Iss 6, Pp 117-124 (2024)
Publication Year :
2024
Publisher :
Zhejiang Electric Power, 2024.

Abstract

Large unstructured texts in the power industry contain numerous abbreviations of technical terms, alternative names, and irregular expressions. Existing word segmentation tools often fail to identify specialized electrical engineering vocabulary, which hinders the analysis and use of these texts. To address this problem, the paper proposes a set of indexing rules tailored to the characteristics of unstructured electrical engineering texts. Segmentation based on these rules improves segmentation accuracy and lays the foundation for feature extraction from power texts. The paper then applies a long-text segmentation algorithm that preserves the semantic information of the original text, and fuses the text-level features extracted by the BERT model with the vocabulary-level features extracted by Word2Vec into a single embedding. This combined approach yields more precise features for large unstructured power texts. Experimental results demonstrate the effectiveness of the proposed method.
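The abstract does not say which long-text segmentation algorithm or fusion scheme the paper uses, so the following is only a minimal sketch of one common way to combine the two feature sources. It assumes overlapping sliding windows (standard BERT models accept at most 512 tokens, so the default window of 510 leaves room for the two special tokens), mean pooling over windows, and plain concatenation with an averaged word vector. The names `split_long_text`, `mean_pool`, `fuse_features`, and `encode_chunk` are hypothetical; `encode_chunk` stands in for a BERT encoder and `word_vectors` for a trained Word2Vec lookup table.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def split_long_text(tokens: Sequence[str], window: int = 510,
                    stride: int = 255) -> List[List[str]]:
    """Split tokens into overlapping windows that each fit the encoder limit."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) <= window:
        return [list(tokens)]
    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the text
        start += stride
    return chunks


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average equal-length vectors element by element."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse_features(tokens: Sequence[str],
                  encode_chunk: Callable[[List[str]], Vector],
                  word_vectors: Dict[str, Vector],
                  word_dim: int,
                  window: int = 510,
                  stride: int = 255) -> Vector:
    """Concatenate a pooled text-level vector with an averaged word-level vector.

    Assumption: concatenation is used for fusion; the paper may fuse differently.
    """
    chunks = split_long_text(tokens, window, stride)
    # Text-level feature: encode each window, then average across windows.
    text_vec = mean_pool([encode_chunk(chunk) for chunk in chunks])
    # Word-level feature: average the vectors of in-vocabulary tokens.
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    word_vec = mean_pool(known) if known else [0.0] * word_dim
    return text_vec + word_vec


if __name__ == "__main__":
    # Toy stand-ins for the real models, for illustration only.
    toy_encoder = lambda chunk: [float(len(chunk)), 1.0]
    toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
    print(fuse_features(["a", "b", "c"], toy_encoder, toy_vectors,
                        word_dim=2, window=2, stride=1))
```

In practice `encode_chunk` would wrap a real BERT model and `word_vectors` would come from a Word2Vec model trained on tokens produced by the rule-based segmentation. The overlap between windows is one way to reduce the chance that a sentence is cut off from its context at a window boundary.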

Details

Language :
Chinese
ISSN :
1007-1881
Volume :
43
Issue :
6
Database :
Directory of Open Access Journals
Journal :
Zhejiang dianli
Publication Type :
Academic Journal
Accession number :
edsdoj.439b32b3d8f2448fb4b05b4df0ec2933
Document Type :
article
Full Text :
https://doi.org/10.19585/j.zjdl.202406013