Fusing information entropy and similarity: A novel active learning strategy for chemical process fault classifications.

Authors :
Wu, Shuhui
Zhao, Zihao
Yin, Min
Li, Hongguang
Source :
Chemometrics & Intelligent Laboratory Systems. Jun2023, Vol. 237, pN.PAG-N.PAG. 1p.
Publication Year :
2023

Abstract

Supervised learning for chemical process fault classification requires a large amount of sampled data with fault labels, and producing those labels generally consumes significant expert manpower. Active learning is becoming an effective solution to this problem. To improve the quality of the samples submitted for expert labeling, this paper proposes sample evaluation metrics that combine information entropy and similarity for active learning strategies in chemical process fault classification. Specifically, because cosine similarity compares only the direction of samples and is insensitive to their magnitude, the Euclidean distance is merged with it to evaluate sample representativeness. Further, in combination with information entropy, a comprehensive evaluation index (E-ECos) is established to select the most valuable samples for expert labeling. Once experts have labeled these samples and they have been added to the classifier's supervised training set, classification performance can be effectively improved with a relatively small number of actively labeled samples. On the Tennessee Eastman (TE) process simulation platform, classifiers based on a deep learning network and on an ensemble learning network, respectively, are employed to classify industrial process faults. Experimental results demonstrate the effectiveness of the proposed method.

• This paper proposes a fault classification method with improved active learning. To improve the quality of the samples submitted for expert labeling, the method combines information entropy and similarity for active learning strategies in chemical process fault classification.
• Sample similarities: This paper comprehensively evaluates the similarity of samples using both the cosine similarity and the Euclidean distance between them. On the one hand, the Euclidean distance compensates for the insensitivity of cosine similarity to magnitude. On the other hand, the Euclidean distance is based on the absolute values of the dimensional features.
• E-ECos: Taking advantage of the different sample selection metrics of the two active learning algorithms, we establish a comprehensive evaluation index in terms of sample informativeness and sample representativeness.
• Future potential: Future research is suggested to combine active learning with automatic machine labeling to further reduce the manpower consumed by fault sample labeling. [ABSTRACT FROM AUTHOR]
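The abstract does not give the formula for E-ECos, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that informativeness is the Shannon entropy of a classifier's predicted class probabilities, that representativeness is the average over the unlabeled pool of a cosine term multiplied by a Euclidean term, and that the two are combined by a simple product. The function names, the rescaling of cosine similarity to [0, 1], the 1 / (1 + distance) mapping, and the product rule are all assumptions made for this example.

```python
import numpy as np


def prediction_entropy(probs):
    """Shannon entropy of each row of predicted class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=1)


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity; depends on direction only, not magnitude."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    return Xn @ Xn.T


def euclidean_similarity_matrix(X):
    """Pairwise similarity 1 / (1 + Euclidean distance) (assumed mapping)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    return 1.0 / (1.0 + dist)


def representativeness(X):
    """Mean combined similarity of each sample to the rest of the pool.

    Assumption: the cosine term is rescaled to [0, 1] and multiplied by the
    Euclidean term, so two samples count as similar only if they agree in
    both direction and magnitude.
    """
    cos01 = (1.0 + cosine_similarity_matrix(X)) / 2.0
    sim = cos01 * euclidean_similarity_matrix(X)
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    return sim.sum(axis=1) / max(X.shape[0] - 1, 1)


def select_queries(X, probs, k):
    """Indices of the k pool samples with the highest combined score.

    Assumption: score = entropy * representativeness (hypothetical rule).
    """
    score = prediction_entropy(probs) * representativeness(X)
    return np.argsort(-score)[:k]
```

In an active learning loop, `select_queries` would be called on the unlabeled pool after each training round, with `probs` taken from the current classifier; the returned samples would go to an expert for labeling and then join the training set. Note that samples such as `[1, 0]` and `[10, 0]` have a cosine similarity of 1 despite very different magnitudes, which is the gap the Euclidean term is meant to close.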

Details

Language :
English
ISSN :
01697439
Volume :
237
Database :
Academic Search Index
Journal :
Chemometrics & Intelligent Laboratory Systems
Publication Type :
Academic Journal
Accession number :
163656973
Full Text :
https://doi.org/10.1016/j.chemolab.2023.104821