1. Fusing information entropy and similarity: A novel active learning strategy for chemical process fault classifications.
- Authors
- Wu, Shuhui; Zhao, Zihao; Yin, Min; Li, Hongguang
- Subjects
- *ACTIVE learning, *SUPERVISED learning, *ENTROPY (Information theory), *CHEMICAL processes, *DEEP learning, *LEARNING strategies, *EUCLIDEAN distance
- Abstract
Supervised learning for chemical process fault classification generally requires a large amount of labeled fault data, and producing those labels consumes significant expert manpower. Active learning is becoming an effective solution to this problem. To improve the quality of the samples selected for expert labeling, this paper proposes a sampling evaluation metric that combines information entropy and similarity for active learning strategies in chemical process fault classification. Specifically, because cosine similarity measures only direction and is insensitive to absolute feature values, the Euclidean distance is merged with it to evaluate sample representativeness. Combined with information entropy, a comprehensive evaluation index (E-ECos) is established to select the most valuable samples for expert labeling; by labeling these samples and adding them to the supervised training set of the classifier, classification performance can be effectively improved with a relatively small number of actively labeled samples. On the TE process simulation platform, classifiers based on a deep learning network and an ensemble learning network are employed, respectively, to classify industrial process faults. Experimental results demonstrate the effectiveness of the proposed method.
• This paper proposes a fault classification method with improved active learning. To improve the quality of the samples selected for expert labeling, the method combines information entropy and similarity in its active learning strategy for chemical process fault classification.
• Sample similarities: The similarity between samples is evaluated comprehensively via both cosine similarity and Euclidean distance. On the one hand, Euclidean distance compensates for cosine similarity's insensitivity to absolute feature values; on the other hand, Euclidean distance is based on the absolute values of the dimensional features.
• E-ECos: Taking advantage of the different sample-selection evaluation metrics of the two active learning algorithms, a comprehensive evaluation index is established in terms of sample informativeness and sample representativeness.
• Future potential: Follow-up research is suggested to combine active learning with automatic machine labeling to further reduce the manpower consumed by fault sample labeling. [ABSTRACT FROM AUTHOR]
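The record does not give the exact form of the E-ECos index, but the selection idea it describes can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the function name `select_samples`, the weight `alpha`, and the way the cosine and Euclidean similarity views are averaged are all illustrative choices, not the authors' formula.

```python
import numpy as np

def select_samples(probs, X_unlabeled, X_labeled, k=5, alpha=0.5):
    """Score unlabeled samples by predictive entropy (informativeness)
    minus similarity to already-labeled data (redundancy), then return
    the indices of the top-k candidates for expert labeling."""
    # Informativeness: entropy of the classifier's predicted class probabilities
    p = np.clip(probs, 1e-12, 1.0)
    ent = -np.sum(p * np.log(p), axis=1)

    # Cosine similarity to the nearest labeled sample (direction only)
    Xu = X_unlabeled / np.linalg.norm(X_unlabeled, axis=1, keepdims=True)
    Xl = X_labeled / np.linalg.norm(X_labeled, axis=1, keepdims=True)
    cos_sim = (Xu @ Xl.T).max(axis=1)

    # Euclidean distance to the nearest labeled sample, mapped to (0, 1]
    # so that small distances (high redundancy) give values near 1;
    # this view depends on absolute feature values, complementing cosine
    dist = np.linalg.norm(
        X_unlabeled[:, None, :] - X_labeled[None, :, :], axis=2
    ).min(axis=1)
    euc_sim = 1.0 / (1.0 + dist)

    # Merge the two similarity views (equal weighting is an assumption)
    sim = 0.5 * (cos_sim + euc_sim)

    # Combined score: prefer samples that are both uncertain for the
    # classifier and dissimilar to what experts have already labeled
    score = ent - alpha * sim
    return np.argsort(score)[::-1][:k]
```

In an active learning loop, the returned indices would be sent to an expert for labeling, the labeled samples moved into the training set, and the classifier retrained before the next query round.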
- Published
- 2023