1. Fully unsupervised fault detection in solar power plants using physics-informed deep learning
- Author
- Zgraggen, Jannik; Guo, Yuyan; Notaristefano, Antonio; Goren Huber, Lilach
- Abstract
Machine learning algorithms for anomaly detection often assume training with historical data gathered under normal conditions, and detect anomalies based on large residuals at inference time. In real-world applications, labelled anomaly-free data is usually unavailable. In fact, a common situation is that the training data is contaminated with an unknown fraction of anomalies or faults of the same type we aim to detect. In this case, training residual-based models with the contaminated data often leads to increased missed detections and/or false alarms. Although this challenge is common, particularly in technical fault detection setups, it is only rarely addressed in the scientific literature. In this paper we address this problem by introducing a data refinement algorithm that is capable of cleaning the contaminated training data in a fully unsupervised manner, and apply the algorithm to a problem of fault detection in grid-scale solar power plants. The data refinement framework is based on an original physics-informed deep learning classification algorithm that requires healthy data as its input, from which it generates synthetic faulty data and trains a binary classifier. We show that in order to achieve high fault detection performance, it is essential to avoid contamination of the original healthy data with unlabelled faults. To this end, we introduce an algorithm that isolates the healthy data in a fully unsupervised manner prior to training the binary classifier. We test our algorithm on field data from an operational solar power plant, which is contaminated with unlabelled faulty data, and demonstrate its high performance. In addition, we demonstrate the robustness of the proposed refinement method against an increasing fraction of faults in the training data.
- Published
- 2023