Descriptor: "isolation forest" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"isolation forest"' showing total 566 results

351. TiWS-iForest: Isolation Forest in Weakly Supervised and Tiny ML scenarios

Author: Tommaso Barbariol and Gian Antonio Susto
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, TinyML, Weakly supervised, Information Systems and Management, Anomaly detection, Decision support systems, Isolation forest, Computer Science Applications, Theoretical Computer Science, Machine Learning (cs.LG), Artificial Intelligence, Control and Systems Engineering, Outlier detection, Software
Abstract: Unsupervised anomaly detection tackles the problem of finding anomalies in datasets when labels are unavailable; since data labels are typically hard or expensive to obtain, such approaches have seen wide applicability in recent years. In this context, Isolation Forest is a popular algorithm that defines an anomaly score by means of an ensemble of special trees called isolation trees. These are built using a random partitioning procedure that is extremely fast and cheap to train. However, we find that the standard algorithm can be improved in terms of memory requirements, latency, and performance; this is of particular importance in low-resource scenarios and in TinyML implementations on ultra-constrained microprocessors. Moreover, anomaly detection approaches currently do not take advantage of weak supervision: since they are typically used in Decision Support Systems, feedback from users, even if rare, can be a valuable source of information that is currently unexplored. Besides showing iForest training limitations, we propose TiWS-iForest, an approach that, by leveraging weak supervision, reduces Isolation Forest complexity and enhances detection performance. We show the effectiveness of TiWS-iForest on real-world datasets and share the code in a public repository to enhance reproducibility.
Published: 2021
Full Text: View/download PDF
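
Note on the anomaly score mentioned in the abstract above: the sketch below illustrates the standard Isolation Forest scoring rule, s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length of a point across the isolation trees and c(n) is the average path length used to normalize for a subsample of n points. It is a minimal illustration of the baseline algorithm only, not the TiWS-iForest method or the authors' code; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """c(n): normalizing average path length for a subsample of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # H(n - 1) is approximated by ln(n - 1) + Euler's constant.
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, subsample_size: int) -> float:
    """Score in (0, 1]; values near 1 suggest an anomaly, well below 0.5 a normal point."""
    if subsample_size < 2:
        raise ValueError("subsample_size must be at least 2")
    return 2.0 ** (-mean_path_length / average_path_length(subsample_size))


if __name__ == "__main__":
    # A point isolated after only 3 splits on average scores higher
    # than one that needs 12 splits (subsample size 256 is an assumed value).
    print(anomaly_score(3.0, 256), anomaly_score(12.0, 256))
```

Short average path lengths give scores close to 1, which is why points that are easy to isolate are flagged as anomalies.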

352. A Comparison of Anomaly Detection Methods for Industrial Screw Tightening

Author: Guilherme Moreira, André Luiz Pilastri, Luís Miguel Matos, Diogo Aires Gonçalves Ribeiro, Paulo Cortez, and Universidade do Minho
Subjects: Isolation Forest, One-class classification, Computer science, Context (language use), 02 engineering and technology, Indústria, inovação e infraestruturas, Unsupervised learning, Deep Learning, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Random Forest, Science & Technology, Local outlier factor, business.industry, Deep learning, Ciências Naturais::Ciências da Computação e da Informação, Pattern recognition, Autoencoder, Industry 4.0, Random forest, 020201 artificial intelligence & image processing, Anomaly detection, Artificial intelligence, business
Abstract: Within the context of Industry 4.0, quality assessment procedures using data-driven techniques are becoming more critical due to the generation of massive amounts of production data. In this paper, we address the detection of abnormal screw tightening processes, which is a relevant industrial task. Since labeling is costly, requiring a manual effort, we focus on unsupervised approaches. In particular, we assume a low-dimensional input screw fastening approach that is based only on angle-torque pairs. Using such pairs, we explore three main unsupervised Machine Learning (ML) algorithms: Local Outlier Factor (LOF), Isolation Forest (iForest) and a deep learning Autoencoder (AE). For benchmarking purposes, we also explore a supervised Random Forest (RF) algorithm. Several computational experiments were held using recent industrial data with 2.8 million angle-torque pair records and a realistic and robust rolling window evaluation. Overall, high quality anomaly discrimination results were achieved by the iForest (99%) and AE (95% and 96%) unsupervised methods, which compared well against the supervised RF (99% and 91%). When compared with iForest, the AE requires less computation effort and provides faster anomaly detection response times. This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project n 39479; Funding Reference: POCI-01-0247-FEDER-39479].
Published: 2021
Full Text: View/download PDF

353. Development of behavioral data analytics SIEM models for detecting cybersecurity incidents

Subjects: autoencoder, long short-term memory, lstm, computer security, isolation forest, anomaly detection, audit, audit service, one-class svm, ubuntu, neural networks, recurrent neural network
Abstract: This work develops an algorithm for detecting atypical behavior of a user executing terminal commands on a remote server. The tasks addressed during development were: 1. Modeling the behavior of the system administrator and of the attacker. 2. Configuring the audit service to collect logs and convert them into data. 3. Applying and analyzing various machine learning methods for the anomaly detection problem. Models were proposed to simulate the behavior of the system administrator and the attacker, an audit service configuration was proposed, a mathematical model of the data was developed, and the isolation forest, one-class SVM, and neural network algorithms were considered for detecting atypical user behavior. Finally, a comparative analysis of the considered algorithms was performed and the applicability of the methods to this problem was evaluated.
Published: 2021
Full Text: View/download PDF

354. Robust soft computing control algorithm for sustainable enhancement of renewable energy sources based microgrid: A hybrid Garra rufa fish optimization – Isolation forest approach.

Author: Anantha Krishnan, V. and Senthil Kumar, N.
Subjects: RENEWABLE energy sources, MICROGRIDS, SOFT computing, VOLTAGE-frequency converters, IDEAL sources (Electric circuits)
Abstract: A novel hybrid Garra rufa Fish Optimization (GRFO) – isolation Forest (iForest) soft computing approach is proposed in this paper for optimizing the controller parameters in an isolated test microgrid. The test microgrid comprises two PV units and one wind generator synchronized via Voltage Source Converters (VSC) on the AC side. Each VSC is regulated based on the power generation of the individual microsources. The operation of the VSC is the key factor in maintaining the stability of the microgrid. Each VSC is controlled by an outer power loop and an inner current regulation loop employing PI controllers. The performance of the VSC is enhanced using the proposed hybrid GRFO – iForest algorithm for tuning the PI controller gains. The performance of the proposed approach is then compared with a conventionally tuned PI controller and with optimization techniques such as Ant Lion Optimization (ALO) and Modified Ant Lion Optimization based Artificial Neural Network (MALANN). The gain parameters of the proposed controller are optimally tuned and the controller provides reliable, sustainable microgrid system operation. The proposed approach is simulated in MATLAB/Simulink and validated using the OP-4500 Real Time Simulator environment. Based on the results, the performance of the GRFO-iForest approach is superior during steady-state and transient operation. It enhances the sustainability of the microgrid by restoring normal operating conditions after a small physical disturbance, and the microgrid remains stable with the optimized regulator. The stability of the overall system is proved with the mathematical model. • A hybrid optimization technique is proposed to optimize the gains of the PI controller. • The control of the Voltage Source Converter (VSC) is optimized in accordance with the available generation from renewable energy sources. • The optimized controller stabilizes the output power of the microgrid generators effectively under various operating conditions. • The steady-state error on the power output of distributed generators is reduced by 88% compared to other methods. • A sustainable microgrid system is developed which is fault tolerant in various operating conditions. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

355. Explainable anomaly detection framework for predictive maintenance in manufacturing systems.

Author: Choi, Heejeong, Kim, Donghwa, Kim, Jounghee, Kim, Jina, and Kang, Pilsung
Subjects: MANUFACTURING processes, PREDICTIVE control systems
Abstract: To enable preemptive essential maintenance, predictive maintenance detects the risk of unexpected shutdowns in a manufacturing system, thereby ensuring operational continuity. Traditional methods that rely heavily on the domain knowledge of expert engineers to detect any abnormal status in processing facilities are extremely time-consuming and domain-dependent. Conversely, recently studied data-driven approaches that require little domain knowledge have yielded fairly good performance. However, most only identify whether the current status is normal or abnormal and do not offer any explanations or analyses. In this paper, we propose a real-time explainable anomaly detection framework for predictive maintenance in a manufacturing system. Various well-known anomaly detection algorithms are investigated to construct a framework suitable for shutdown prognosis. In addition, model interpretation techniques are employed to provide a reasonable explanation for a detected shutdown. The experimental results on a real-world dataset derived from a chemical process show that the proposed framework could identify abnormal signs early and derive significant causes for each detected shutdown. • We propose an explainable anomaly detection framework for predictive maintenance. • The framework detects the warning signs of a shutdown early in real-time. • The framework detects anomalies using only the characteristics of normal instances. • The framework provides the root causes of a detected shutdown. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

356. Reconstruction of Sentinel-2 derived time series using robust Gaussian mixture models — Application to the detection of anomalous crop development.

Author: Mouret, Florian, Albughdadi, Mohanad, Duthoit, Sylvie, Kouamé, Denis, Rieu, Guillaume, and Tourneret, Jean-Yves
Subjects: *GAUSSIAN mixture models, *CROP development, *TIME series analysis, *SYNTHETIC aperture radar, *MULTISPECTRAL imaging, *RAPESEED
Abstract: • We show that GMMs are efficient for the joint reconstruction of vegetation indices. • A robust GMM is proposed to take into account the presence of irrelevant samples. • The proposed approaches are validated on wheat and rapeseed parcels in France. • Using Sentinel-1 data can improve the imputation of vegetation indices. • The proposed algorithms are applied to the detection of anomalous crop development. Missing data is a recurrent problem in remote sensing, mainly due to cloud coverage for multispectral images and acquisition problems. This can be a critical issue for crop monitoring, especially for applications relying on machine learning techniques, which generally assume that the feature matrix does not have missing values. This paper proposes a Gaussian Mixture Model (GMM) for the reconstruction of parcel-level features extracted from multispectral images. A robust version of the GMM is also investigated, since datasets can be contaminated by inaccurate samples or features (e.g., wrong crop type reported, inaccurate boundaries, undetected clouds, etc.). Additional features extracted from Synthetic Aperture Radar (SAR) images using Sentinel-1 data are also used to provide complementary information and improve the imputations. The robust GMM investigated in this work assigns reduced weights to the outliers during the estimation of the GMM parameters, which improves the final reconstruction. These weights are computed at each step of an Expectation–Maximization (EM) algorithm by using outlier scores provided by the isolation forest (IF) algorithm. Experimental validation is conducted on rapeseed and wheat parcels located in the Beauce region (France). Overall, we show that the GMM imputation method outperforms other reconstruction strategies. A mean absolute error (MAE) of 0.013 (resp. 0.019) is obtained for the imputation of the median Normalized Difference Vegetation Index (NDVI) of the rapeseed (resp. wheat) parcels. Other indicators (e.g., Normalized Difference Water Index) and statistics (for instance the interquartile range, which captures heterogeneity among the parcel indicator) are reconstructed at the same time with good accuracy. In a dataset contaminated by irrelevant samples, using the robust GMM is recommended since the standard GMM imputation can lead to inaccurate imputed values. An application to the monitoring of anomalous crop development in the presence of missing data is finally considered. In this application, using the proposed method leads to the best detection results, especially when SAR data are used jointly with multispectral images. Exploiting the information contained in cloudy multispectral images instead of removing these images is beneficial for this application. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

357. iBeacon Indoor Positioning Method Combined with Real-Time Anomaly Rate to Determine Weight Matrix

Author: Weizhu Zhu, Guo Yu, Jiazhu Zheng, Guiqiu Xiang, and Di Shaoning
Subjects: Computer science, 02 engineering and technology, lcsh:Chemical technology, 01 natural sciences, Biochemistry, Article, Analytical Chemistry, iBeacon, Base station, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, lcsh:TP1-1185, Electrical and Electronic Engineering, Instrumentation, business.industry, 010401 analytical chemistry, Levenberg-Marquadt, indoor positioning, isolation forest, Atomic and Molecular Physics, and Optics, anomaly detection, 0104 chemical sciences, iBeacon-based positioning, 020201 artificial intelligence & image processing, Anomaly detection, Artificial intelligence, business
Abstract: This paper proposes an indoor positioning method based on iBeacon technology that combines anomaly detection and a weighted Levenberg–Marquardt (LM) algorithm. The proposed solution uses the isolation forest algorithm for anomaly detection on the collected Received Signal Strength Indicator (RSSI) data from different iBeacon base stations, and calculates the anomaly rate of each signal source while eliminating abnormal signals. Then, a weight matrix is set by using each anomaly rate and the RSSI value after eliminating the abnormal signal. Finally, the constructed weight matrix and the weighted LM algorithm are combined to solve the positioning coordinates. An Android smartphone was used to verify the positioning method proposed in this paper in an indoor scene. This experimental scenario revealed an average positioning error of 1.540 m and a root mean square error (RMSE) of 1.748 m. A large majority (85.71%) of the positioning point errors were less than 3 m. Furthermore, the RMSE of the method proposed in this paper was, respectively, 38.69%, 36.60%, and 29.52% lower than the RMSE of three other methods used for comparison. The experimental results show that the iBeacon-based indoor positioning method proposed in this paper can improve the precision of indoor positioning and has strong practicability.
Published: 2020
Full Text: View/download PDF

358. THE EFFECTIVENESS OF MACHINE LEARNING-BASED ANOMALY DETECTION ALGORITHMS APPLIED TO DEFENSE CONTRACT FINANCIAL DATA

Author: Edmonds, Keith D., Koyak, Robert A., Smithmeyer, Colby J., and Operations Research (OR)
Subjects: auditors, USAspending.gov, machine learning, benchmark, IF, defense contract, Army, contracts, isolation forest, benchmarking, anomaly detection, financial data, dimensionality reduction
Abstract: In fiscal year 2020, the U.S. Army spent nearly $77 billion on contracts. Auditors employ various techniques, including anomaly detection, to select contracts that merit scrutiny. But in a resource-constrained environment, auditors can review only a limited number of contracts. Using data obtained from USAspending.gov, we consider how anomaly detection combined with dimensionality reduction can be used to recommend contracts for investigation. We analyze over 20,000 fixed-price Army contracts from fiscal years 2017 to 2020, using more than one hundred combinations of dimensionality reduction and anomaly detection techniques, and formations of artificial anomalies. A consistent finding is that dimensionality reduction using principal components or autoencoders is not demonstrably beneficial. This finding may be due to the discrete nature of the USAspending.gov data and may not apply to other data sets. The best performance is obtained using isolation forests for anomaly detection without dimensionality reduction. Outstanding Thesis. Major, United States Army. Approved for public release. Distribution is unlimited.
Published: 2020

359. Convolutional Autoencoder-Based Flaw Detection for Steel Wire Ropes

Author: Zhang Guoyong, Weihua Gui, Zhaohui Tang, and Jin Zhang
Subjects: flaw detection, Computer science, 02 engineering and technology, Iterative reconstruction, engineering.material, lcsh:Chemical technology, Biochemistry, Article, Analytical Chemistry, Discriminative model, 0202 electrical engineering, electronic engineering, information engineering, lcsh:TP1-1185, Electrical and Electronic Engineering, Instrumentation, autoencoder, Training set, business.industry, 020208 electrical & electronic engineering, isolation forest, Wire rope, Pattern recognition, Autoencoder, Atomic and Molecular Physics, and Optics, wire rope, few training data, engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Feature learning, Classifier (UML)
Abstract: Visual perception-based methods are a promising means of capturing the surface damage state of wire ropes and hence provide a potential way to monitor the condition of wire ropes. Previous methods mainly concentrated on handcrafted feature-based flaw representation, with a classifier constructed to perform fault recognition. However, the appearance of outdoor wire ropes is seriously affected by noise such as lubricating oil, dust, and light. In addition, in real applications, it is difficult to prepare a sufficient amount of flaw data to train a fault classifier. To address these issues, this study proposes a new flaw detection method based on the convolutional denoising autoencoder (CDAE) and Isolation Forest (iForest). The CDAE is first trained using an image reconstruction loss. Then, it is fine-tuned to minimize a cost function that penalizes the iForest-based flaw score difference between normal data and flaw data. Real hauling rope images of mine cableways were used to test the effectiveness and advantages of the newly developed method. Comparisons of various methods showed that the CDAE-iForest method performed better in discriminative feature learning and flaw isolation with a small amount of flaw training data.
Published: 2020
Full Text: View/download PDF

360. Comparison of New Anomaly Detection Technique for Wind Turbine Condition Monitoring Using Gearbox SCADA Data

Author: David Infield, Alasdair McDonald, Sofia Koukoura, Conor McKinnon, Conaill Soraghan, and James Carroll
Subjects: Isolation Forest, anomaly detection, gearbox, SCADA, condition monitoring, One Class Support Vector Machine, Elliptical Envelope, Control and Optimization, Computer science, 020209 energy, Energy Engineering and Power Technology, 02 engineering and technology, Fault (power engineering), Turbine, lcsh:Technology, Fault detection and isolation, 0202 electrical engineering, electronic engineering, information engineering, elliptical envelope, Electrical and Electronic Engineering, Engineering (miscellaneous), Wind power, Renewable Energy, Sustainability and the Environment, business.industry, lcsh:T, 020208 electrical & electronic engineering, isolation forest, Condition monitoring, one class support vector machine, Reliability engineering, Support vector machine, TA170, Anomaly detection, business, Energy (miscellaneous)
Abstract: Anomaly detection for wind turbine condition monitoring is an active area of research within the wind energy operations and maintenance (O&M) community. In this paper three models were compared for multi-megawatt operational wind turbine SCADA data. The models used for comparison were One-Class Support Vector Machine (OCSVM), Isolation Forest (IF), and Elliptical Envelope (EE). Each of these was compared for the same fault, and tested under various data configurations. IF and EE have not previously been used for fault detection for wind turbines, and OCSVM has not been used for SCADA data. This paper presents a novel method of condition monitoring that only requires two months of data per turbine. These months were separated by a year, the first being healthy and the second unhealthy. The number of anomalies is compared, with a greater number in the unhealthy month being considered correct. It was found that for accuracy IF and OCSVM had similar performances in both training regimes presented. OCSVM performed better for generic training, and IF performed better for specific training. Overall, IF and OCSVM had an average accuracy of 82% for all configurations considered, compared to 77% for EE.
Published: 2020
Full Text: View/download PDF
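
Note on the evaluation rule described in the abstract above: a detection is counted as correct when the model flags more anomalies in the unhealthy month than in the healthy month, and accuracy is the share of configurations where that holds. The sketch below is one plausible reading of that rule, not the authors' code; the function names and the anomaly counts are hypothetical and for illustration only.

```python
from typing import Iterable, Tuple


def detection_is_correct(healthy_anomalies: int, unhealthy_anomalies: int) -> bool:
    """Correct when the unhealthy month shows strictly more anomalies."""
    return unhealthy_anomalies > healthy_anomalies


def accuracy(count_pairs: Iterable[Tuple[int, int]]) -> float:
    """Fraction of (healthy, unhealthy) count pairs judged correct."""
    pairs = list(count_pairs)
    if not pairs:
        raise ValueError("at least one configuration is required")
    correct = sum(detection_is_correct(h, u) for h, u in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    # Made-up anomaly counts per configuration: (healthy month, unhealthy month).
    example_counts = [(12, 30), (8, 8), (5, 19), (14, 9)]
    print(accuracy(example_counts))
```

Treating a tie as incorrect is an assumption here; the abstract only states that a greater number in the unhealthy month counts as correct.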

361. Data science for tax administration

Author: Pijnenburg, M.G.F., Kowalczyk, W.J., Kraaij, W., Kok, J.N., Kowalczyk W.J., Plaat A., Bäck T.H.W, Hel-van Dijk E.C.J.M. van der, Blockeel H., Siebes A.P.J.M., Arendsen R., Veenman C.J., Knobbe A.J., and Leiden University
Subjects: Isolation Forest, Statistical tests in process mining, Parallel Coordinates Plots, Restricted Boltzmann Machines, Singular Outliers, SODA, Logistic Regression, Factorization Machines, Tax Administration, Tax Analytics
Abstract: In this PhD thesis several new and existing data science applications are described, with a particular focus on applications for tax administrations. The thesis contains a chapter on the managerial side of analytics with a balanced overview of the pros and cons of applying analytics within taxpayer supervision. Another topic is (tax) fraud detection with unsupervised anomaly detection techniques. Here a new type of outlier is described (singular outliers) and an algorithm is provided for finding them. Attention is also paid to improving risk selection models. It is noted that most current algorithms cannot treat interactions of categorical variables with many levels very well. An extension of logistic regression is provided that uses Factorization Machines, which resulted in a ten percent improvement in precision. A fourth topic is statistical testing of whether similar cases receive similar treatment. A contribution is made by providing an algorithm that statistically tests for similar treatment based on process logs. The thesis further contains a benchmark study of different anomaly detection algorithms. Finally, HR Analytics, Reinforcement Learning and applications of fuzzy sets are briefly described.
Published: 2020

362. Mašininio mokymosi pritaikymas blokų grandinių tyrimui ir sukčiavimo aptikimui [Application of machine learning to blockchain analysis and fraud detection]

Author: Žilinskas, Evaldas, Alzbutas, Robertas, and Šapkauskienė, Alfreda
Subjects: fraud detection, isolation forest, k-means, bitcoin, ethereum
Abstract: Interest in blockchain technology has been growing since 2008, when the concept was created. It is a relatively new technology that has been around for only 12 years but has received much attention in the media and from academics. The main object of the media's focus is the bitcoin cryptocurrency, for which blockchain technology was first developed. The popularity of cryptocurrencies also attracts various scammers who engage in improper activities and seek financial gain. So far, the most significant damage was done in 2019 and is estimated at $4.3 billion. As bitcoin is the most popular cryptocurrency, it bears the largest share of the damage caused by these thefts and frauds. Ethereum, the second-largest cryptocurrency by capitalization, is also receiving attention from scammers. Fraud detection is the first step in reducing risk and preventing potential theft and fraud. This study aims to develop a machine learning model using big data analytics methods that is able to process large amounts of data and successfully identify fraud within the bitcoin and ethereum blockchains. All bitcoin and ethereum transactions are publicly available. From these data, the features used to develop the models (number of transactions received, average value of the received transactions, etc.) were extracted. The k-means and isolation forest methods were applied to create fraud detection models. Due to the large amount of data available, ensembles of these models were developed. The developed machine learning models identified addresses that are associated with cases of fraud and scams. Looking at the overall results, one k-means model, an ensemble of k-means models, and an ensemble of isolation forest models found almost the same number of frauds in the bitcoin blockchain (29–30). In the ethereum blockchain, frauds were best detected by an ensemble of k-means models, which caught a total of 65 scams. Three different fraud datasets were used to verify the results. In the BitcoinTalk dataset, the developed models identified 15 of the 16 bitcoin addresses associated with frauds. This is a very good result, as at most 5 cases of fraud from this dataset were detected in similar earlier studies. In the Ponzi schemes dataset, 64 of 102 scams in the ethereum blockchain were identified. In the CryptoScamDB dataset, the developed models identified only 14 of 140 scams in the bitcoin blockchain, because this dataset includes smaller-scale scams. Studies by other authors using the Ponzi schemes and CryptoScamDB datasets use different methods (e.g., classification methods are used, results are calculated differently) and therefore the results are not comparable. This study has also shown that machine learning models developed using bitcoin transaction data can be successfully used to detect fraud in the ethereum blockchain. However, models developed using bitcoin transaction data detect fewer scams in the ethereum blockchain than models developed using ethereum transaction data.
Published: 2020

363. Quality Monitoring for Micro Resistance Spot Welding with Class-Imbalanced Data Based on Anomaly Detection

Author: Ran Tian, Jiaquan Zeng, and Biao Cao
Subjects: Production line, 0209 industrial biotechnology, Computer science, Feature extraction, 02 engineering and technology, Welding, lcsh:Technology, Automotive engineering, law.invention, lcsh:Chemistry, 020901 industrial engineering & automation, law, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, class-imbalanced data, Instrumentation, Spot welding, lcsh:QH301-705.5, micro resistance spot welding, Fluid Flow and Transfer Processes, Local outlier factor, lcsh:T, Process Chemistry and Technology, General Engineering, isolation forest, anomaly detection, lcsh:QC1-999, Computer Science Applications, Support vector machine, quality monitoring, lcsh:Biology (General), lcsh:QD1-999, lcsh:TA1-2040, 020201 artificial intelligence & image processing, Anomaly detection, lcsh:Engineering (General). Civil engineering (General), lcsh:Physics, Voltage
Abstract: Micro resistance spot welding (MRSW) is an important technology widely used in electronics manufacturing for micro component joining. For the joining of micro enameled wire, quality control has so far depended heavily on manual inspection. In this paper, a quality monitoring approach based on isolation forest (iForest) is proposed to distinguish abnormal welds from normal welds. Electrode voltage and welding current of over 110,000 spot welds were collected from a production line. The dynamic resistance and heat input were calculated for all welds and used for feature extraction. The collected dataset was class-imbalanced because abnormal welds were far fewer than normal welds. An anomaly detection model based on iForest was established for the imbalanced data classification after comparison with other methods such as one-class support vector machine (SVM) and local outlier factor. Test results show that the similarity of the dynamic resistance profile and of the heat input to those of the previous ten welds are valid features for detecting some of the abnormal welds. The iForest model distinguishes incomplete fusion welds from normal welds with high efficiency. It can assist in the on-line quality monitoring of the enameled wire welding process in production.
Published: 2020

364. Performance Evaluation of the Multiple Quantile Regression Model for Estimating Spatial Soil Moisture after Filtering Soil Moisture Outliers

Author: Chung-Gil Jung, Jiwan Lee, Yonggwan Lee, and Seong-Joon Kim
Subjects: Terra MODIS, 010504 meteorology & atmospheric sciences, Science, 0208 environmental biotechnology, multiple quantile regression, isolation forest, Quantile regression model, 02 engineering and technology, outlier detection, 01 natural sciences, 020801 environmental engineering, spatial soil moisture, Outlier, Statistics, General Earth and Planetary Sciences, Environmental science, Water content, 0105 earth and related environmental sciences
Abstract: The spatial distribution of soil moisture (SM) was estimated by a multiple quantile regression (MQR) model with Terra Moderate Resolution Imaging Spectroradiometer (MODIS) and filtered SM data from 2013 to 2015 in South Korea. For input data, observed precipitation and SM data were collected from the Korea Meteorological Administration and various institutions monitoring SM. To improve the work of a previous study, prior to the estimation of SM, outlier detection using the isolation forest (IF) algorithm was applied to the observed SM data. The original observed SM data resulted in IF_SM data following outlier detection. This study obtained an average data removal rate of 20.1% at 58 stations. For various reasons, such as instrumentation, environment, and random errors, the original observed SM data contained approximately 20% uncertain data. After outlier detection, this study performed a regression analysis by estimating land surface temperature quantiles. The soil characteristics were considered through reclassification into four soil types (clay, loam, silt, and sand), and the five-day antecedent precipitation was considered in order to estimate the regression coefficient of the MQR model. For all soil types, the coefficient of determination (R²) and root mean square error (RMSE) values ranged from 0.25 to 0.77 and 1.86% to 12.21%, respectively. The MQR results showed a much better performance than that of the multiple linear regression (MLR) results, which yielded R² and RMSE values of 0.20 to 0.66 and 1.08% to 7.23%, respectively. As a further illustration of improvement, the box plots of the MQR SM were closer to those of the observed SM than those of the MLR SM. This result indicates that the cumulative distribution functions (CDF) of MQR SM matched the CDF of the observed SM. Thus, the MQR algorithm with outlier detection can overcome the limitations of the MLR algorithm by reducing both the bias and variance.
Published: 2020
Full Text: View/download PDF

365. Unsupervised crop anomaly detection at the parcel-level using optical and SAR images: application to wheat and rapeseed crops

Author: Mouret, Florian, Albughdadi, Mohanad, Duthoit, Sylvie, Kouamé, Denis, Poilvé, Hervé, Rieu, Guillaume, Tourneret, Jean-Yves, CoMputational imagINg anD viSion (IRIT-MINDS), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées, TerraNIS, Airbus Defence and Space [Toulouse], and Research project funded by TerraNIS SAS
Subjects: Isolation Forest, Unsupervised crop monitoring, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Parcel-level analysis, Sentinel-2 images, Sentinel-1, Crop anomaly detection, Wheat crops, Local Outlier Factor, Rapeseed crops, One-Class SVM
Abstract: This paper proposes a generic approach for crop anomaly detection at the parcel-level based on unsupervised point anomaly detection techniques. The input data is derived from synthetic aperture radar (SAR) and optical images acquired using Sentinel-1 and Sentinel-2 satellites. The proposed strategy consists of four sequential steps: acquisition and preprocessing of optical and SAR images, extraction of optical and SAR indicators, computation of zonal statistics at the parcel-level and point anomaly detection. This paper analyzes different factors that can affect the results of anomaly detection such as the considered features and the anomaly detection algorithm used. The proposed procedure is validated on two crop types in Beauce (France), namely, rapeseed and wheat crops. Two different parcel delineation databases are considered to validate the robustness of the strategy to changes in parcel boundaries.
Published: 2020

366. Malware triage for early identification of Advanced Persistent Threat activities

Author: Luca Mazzotti, Riccardo Lazzeretti, and Giuseppe Laurenza
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Advanced persistent threat, Isolation Forest, Computer Science - Cryptography and Security, Computer Networks and Communications, Computer science, computer.software_genre, Computer security, 01 natural sciences, Modularity, Machine Learning (cs.LG), 010104 statistics & probability, 0101 mathematics, Malware analysis, Malware Analysis, Class (computer programming), business.industry, Triage, Computer Science Applications, Identification (information), Knowledge base, Advanced Persistent Threats, Hardware and Architecture, Malware, business, Cryptography and Security (cs.CR), Safety Research, computer, Software, Information Systems
Abstract: In the last decade, a new class of cyber-threats has emerged. This new cybersecurity adversary is known as the "Advanced Persistent Threat" (APT), a name that refers to different organizations that in recent years have been "in the center of the eye" because of multiple dangerous and effective attacks targeting financial and political targets, news headlines, embassies, critical infrastructures, TV programs, etc. To identify APT-related malware early, a semi-automatic approach to malware sample analysis is needed. In our previous work we introduced a "malware triage" step for a semi-automatic malware analysis architecture. This step must analyze new incoming samples as fast as possible and, among all the malware delivered per day in cyberspace, immediately dispatch the samples that are really worth further examination by analysts. Our paper focuses on malware developed by APTs, and we build the knowledge base used in the triage on known APTs obtained from publicly available reports. To make the triage as fast as possible, we rely only on static malware features, which can be extracted with negligible delay, and use machine learning techniques for the identification. In this work we move from multiclass classification to a group of one-class classifiers, which simplifies training and allows higher modularity. The results of the proposed framework show high performance, reaching a precision of 100% and an accuracy over 95%.
Published: 2020

367. Anomaly detection of user behaviour for cybersecurity

Author: Rodríguez Formisano, Emer, Béjar Alonso, Javier, and Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
Subjects: CI/CD, cybersecurity, proxy web, isolation forest, Dades massives, quadre de comandament, Seguretat informàtica, outlier detection, ciberseguretat, anomaly detection, detecció d'anomalies, Big data, núvol, analítica, Computer security, Informàtica [Àrees temàtiques de la UPC], agrupació, dashboard, Machine learning, Aprenentatge automàtic, analytics, cloud, clustering
Published: 2020

368. Tracing outliers in the dataset of Drosophila suzukii records with the Isolation Forest method

Author: Alessandro Cini, Alessio Papini, and Ugo Santosuosso
Subjects: 0106 biological sciences, lcsh:Computer engineering. Computer hardware, Information Systems and Management, Geospatial analysis, Computer Networks and Communications, Computer science, Big data, lcsh:TK7885-7895, computer.software_genre, 010603 evolutionary biology, 01 natural sciences, Isolation forest, lcsh:QA75.5-76.95, Silhouette, invasive species, Invasion, Drosophila suzukii, Cluster analysis, K-means, Geoprofiling, lcsh:T58.5-58.64, lcsh:Information technology, business.industry, k-means clustering, Data set, 010602 entomology, Hardware and Architecture, Outlier, Geographic profiling, lcsh:Electronic computers. Computer science, Data mining, business, computer, Information Systems
Abstract: The analysis of big data is a fundamental challenge for the current and future stream of data coming from many different sources. Geospatial data is one of the sources currently less investigated. A typical example of an ever-growing dataset is the distribution data of invasive species in the affected territories. The dataset of Drosophila suzukii invasion sites in Europe up to 2011 was used to test a possible method for pinpointing its outliers (anomalies). Our aim was to find a method of analysis able to handle large amounts of data and produce easily readable outputs that summarize and predict the status and, possibly, the future development of a biological invasion. To do that, we identified the so-called anomalies of the dataset with a Python script based on the machine learning algorithm “Isolation Forest”. We also used the K-Means clustering method to partition the dataset. In our test, based on a real dataset, the Silhouette method yielded 10 clusters as the best result. The clusters were drawn on the map with a Voronoi tessellation, showing that 8 clusters were centered on industrial harbours, while the other two were in the hinterland. This led us to conjecture that: (1) the main mechanism of entry into Europe may be the import of goods through ports, apparently occurring several times; (2) the spread inland may be due to road transportation of goods; (3) the outliers (anomalies) found with the isolation forest method identify individuals or populations that tend to detach from their original cluster and hence indicate the lines of further spread of the invasion. This type of analysis therefore aims to identify the future direction of an invasion, rather than the center of origin as in geographic profiling. Isolation Forest therefore provides results complementary to PGP. The recent records of the invasive species, mainly located close to the outlier positions, indicate that the isolation forest method can be considered predictive and is a useful method for handling large geospatial datasets.
Published: 2020

369. Development of a system for the analysis of anomalies in magnetometric diagnostics

Subjects: python, machine learning, it-technologies, lof algorithm, software development, isolation forest, anomaly detection
Abstract: The final qualification work is devoted to the development of a system for detecting anomalies in magnetometric diagnostics. This system is designed to detect defects in the pipeline. In the course of the work, existing solutions based on machine learning algorithms were considered, and existing methods for recognizing anomalies using machine learning were analyzed. The result of the work is a system trained to recognize defects in pipeline sections.
Published: 2020
Full Text: View/download PDF

370. Analysis of averaged electrocardiograms in the normal state using machine learning methods

Author: Попов, Антон Олександрович
Subjects: machine learning, anomaly detection, ECG, averaged cardiac complex, signal-averaged electrocardiogram, isolation forest, ECG parameters analysis, outlier detection
Abstract: The purpose of this thesis is to determine the differences between patients with normal ECG signals. To do so, a norm (a “normal cloud”) is created for a group of young people and their “distance” to that norm is determined. Based on the results, conclusions are drawn about the differences between clinically healthy and similar patients, partially answering the question “How healthy is the patient?”. The first section gives theoretical background on the history of the ECG, the method of acquiring an ECG signal, methods of obtaining an averaged ECG complex, and their possible uses. The second section describes the theory of machine learning and the proposed method for solving the task set in the thesis. The available machine learning algorithms are considered against the proposed method, and the most suitable one (Isolation Forest) is selected. Its advantages are: 1. the ability to work with multidimensional input data; 2. low consumption of computing resources; 3. the possibility of forming a “normal” sample automatically. The third section describes the input data, obtained from a database of cadets of a military school. The age category is 17-19 years, and all patients are considered clinically healthy. At the stage of formulating the thesis task, it was decided to use an averaged cardiac complex. In the practical part, code was written to read and format the input data from files. Using the selected method, the “norm cloud” was constructed and the anomalies relative to it were identified. The results were compared with the analysis produced by the Cardiol program. According to that program, all patients have some degree of deviation in the set of parameters that describe the state of the myocardium. Notably, 36% of them do not belong to the given age category; they account for 67% of similar records in the entire database. From this it is concluded that the concept of the norm differs between age categories of patients. The following steps are proposed to improve the outcomes: 1. refine and enlarge the database; 2. find the most significant parameters for the algorithm used; 3. determine the parameters more accurately.
Published: 2020

371. Outlier detection at the parcel-level in wheat and rapeseed crops using multispectral and SAR time series

Author: Florian Mouret, Denis Kouame, Mohanad Albughdadi, Sylvie Duthoit, Jean-Yves Tourneret, Guillaume Rieu, CoMputational imagINg anD viSion (IRIT-MINDS), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, TerraNIS, Centre National de la Recherche Scientifique - CNRS (FRANCE), Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), TerraNIS (FRANCE), Laboratoire de recherche en télécommunications spatiales et aéronautiques - TéSA (FRANCE), and Institut de Recherche en Informatique de Toulouse - IRIT (Toulouse, France)
Subjects: Synthetic aperture radar, FOS: Computer and information sciences, Computer Science - Machine Learning, 010504 meteorology & atmospheric sciences, Science, Multispectral image, 0211 other engineering and technologies, Machine Learning (stat.ML), Anomaly detection, 02 engineering and technology, Isolation forest, Unsupervised learning, 01 natural sciences, Normalized Difference Vegetation Index, Machine Learning (cs.LG), vigor, Statistics - Machine Learning, [SDV.SA.STA]Life Sciences [q-bio]/Agricultural sciences/Sciences and technics of agriculture, FOS: Electrical engineering, electronic engineering, information engineering, Traitement du signal et de l'image, unsupervised, Preprocessor, crop monitoring, Sentinel-1, Sentinel-2, isolation forest, anomaly detection, heterogeneity, 021101 geological & geomatics engineering, 0105 earth and related environmental sciences, Remote sensing, Mathematics, 2. Zero hunger, [SDE.IE]Environmental Sciences/Environmental Engineering, Image and Video Processing (eess.IV), Vegetation, Electrical Engineering and Systems Science - Image and Video Processing, Crop monitoring, [INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV], Outlier, General Earth and Planetary Sciences
Abstract: This paper studies the detection of anomalous crop development at the parcel-level based on an unsupervised outlier detection technique. The experimental validation is conducted on rapeseed and wheat parcels located in Beauce (France). The proposed methodology consists of four sequential steps: (1) preprocessing of synthetic aperture radar (SAR) and multispectral images acquired using Sentinel-1 and Sentinel-2 satellites, (2) extraction of SAR and multispectral pixel-level features, (3) computation of parcel-level features using zonal statistics and (4) outlier detection. The different types of anomalies that can affect the studied crops are analyzed and described. The different factors that can influence the outlier detection results are investigated, with particular attention devoted to the synergy between Sentinel-1 and Sentinel-2 data. Overall, the best performance is obtained when jointly using a selection of Sentinel-1 and Sentinel-2 features with the isolation forest algorithm. The selected features are co-polarized (VV) and cross-polarized (VH) backscattering coefficients for Sentinel-1 and five Vegetation Indexes for Sentinel-2 (among others, the Normalized Difference Vegetation Index and two variants of the Normalized Difference Water Index). When using these features with an outlier ratio of 10%, the percentage of detected true positives (i.e., crop anomalies) is equal to 94.1% for rapeseed parcels and 95.5% for wheat parcels.
Published: 2020
Full Text: View/download PDF
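Note: the abstract above pairs the isolation forest algorithm with a fixed outlier ratio of 10%. As a rough illustration of what that combination means, here is a minimal pure-Python sketch of an isolation forest: random axis-aligned splits, an anomaly score derived from the average path length, and flagging of the top-scoring 10% of samples. It is not the authors' implementation; the function names, the default parameters, and the synthetic 2-D "parcel features" are all assumptions made for the example.

```python
import math
import random


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random threshold until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(tree, point, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "dim" not in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if point[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Score in (0, 1); higher means easier to isolate. Needs at least 2 points."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(t, p) for t in trees) / (n_trees * norm))
            for p in points]


def flag_outliers(scores, outlier_ratio=0.10):
    """Indices of the highest-scoring samples, sized by the outlier ratio."""
    k = max(1, round(outlier_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


# Synthetic stand-in for parcel-level features: 38 similar parcels and 2 odd ones.
data_rng = random.Random(1)
parcels = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(38)]
parcels += [(8.0, 8.0), (-9.0, 7.0)]
flagged = flag_outliers(anomaly_scores(parcels), outlier_ratio=0.10)
print(sorted(flagged))
```

Because the outlier ratio fixes how many samples are flagged, the two odd parcels are reported together with two ordinary ones here; the ratio controls the size of the flagged set, not how anomalous its members are.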

372. Exploring the Quality of Dynamic Open Government Data Using Statistical and Machine Learning Methods.

Author: Karamanou A, Brimos P, Kalampokis E, and Tarabanis K
Subjects: Algorithms, Government, Artificial Intelligence, Machine Learning
Abstract: Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., failures of sensors and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised Isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percent of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors' knowledge, this is the first time a study has explored the quality of dynamic OGD.
Published: 2022
Full Text: View/download PDF

373. A Self-Calibrating Localization Solution for Sport Applications with UWB Technology.

Author: Piavanini M, Barbieri L, Brambilla M, Cerutti M, Ercoli S, Agili A, and Nicoli M
Subjects: Algorithms, Computers, Technology, Wireless Technology, Sports
Abstract: This study addressed the problem of localization in an ultrawide-band (UWB) network, where the positions of both the access points and the tags needed to be estimated. We considered a fully wireless UWB localization system, comprising both software and hardware, featuring easy plug-and-play usability for the consumer, primarily targeting sport and leisure applications. Anchor self-localization was addressed by two-way ranging, also embedding a Gauss-Newton algorithm for the estimation and compensation of antenna delays, and a modified isolation forest algorithm working with a low-dimensional set of measurements for outlier identification and removal. This approach avoids time-consuming calibration procedures, and it enables accurate tag localization by the multilateration of time difference of arrival measurements. For the assessment of performance and the comparison of different algorithms, we considered an experimental campaign with data gathered by a proprietary UWB localization system.
Published: 2022
Full Text: View/download PDF

374. EEG-Based Mental Tasks Recognition via a Deep Learning-Driven Anomaly Detector.

Author: Dairi A, Zerrouki N, Harrou F, and Sun Y
Abstract: This paper introduces an unsupervised deep learning-driven scheme for mental tasks' recognition using EEG signals. To this end, the Multichannel Wiener filter was first applied to EEG signals as an artifact removal algorithm to achieve robust recognition. Then, a quadratic time-frequency distribution (QTFD) was applied to extract effective time-frequency signal representation of the EEG signals and catch the EEG signals' spectral variations over time to improve the recognition of mental tasks. The QTFD time-frequency features are employed as input for the proposed deep belief network (DBN)-driven Isolation Forest (iF) scheme to classify the EEG signals. Indeed, a single DBN-based iF detector is constructed based on each class's training data, with the class's samples as inliers and all other samples as anomalies (i.e., one-vs.-rest). The DBN is considered to learn pertinent information without assumptions on the data distribution, and the iF scheme is used for data discrimination. This approach is assessed using experimental data comprising five mental tasks from a publicly available database from the Graz University of Technology. Compared to the DBN-based Elliptical Envelope, Local Outlier Factor, and state-of-the-art EEG-based classification methods, the proposed DBN-based iF detector offers superior discrimination performance of mental tasks.
Published: 2022
Full Text: View/download PDF

375. GNSS vector quality modelling combining Isolation Forest and Independent Vortices Search

Author: Ismael Érique Koch, Luiz Gonzaga, João Francisco Galera Monico, Vinicius Francisco Rofatto, Mauricio Roberto Veronez, Marcelo Tomio Matsuoka, Ivandro Klein, Unisinos University, Federal Institute of Santa Catarina, Federal University of Paraná, Universidade Federal de Uberlândia (UFU), Federal University of Rio Grande do Sul, and Universidade Estadual Paulista (UNESP)
Subjects: Isolation Forest, Computer science, Applied Mathematics, Linear model, Metaheuristics, Condensed Matter Physics, Ephemeris, GNSS vectors, Independent Vortices Search, Modelling, Identification (information), Variable (computer science), GNSS applications, Linear regression, Outlier detection, Penalty method, Electrical and Electronic Engineering, Instrumentation, Algorithm, Metaheuristic
Abstract: Estimating the quality of GNSS vectors is decisive in planning GNSS networks and in several land surveying activities. These vectors are indirectly present in many civil infrastructures, usually in the first stages of construction, and are thus of high importance. In this research we assembled over 1000 baselines, producing an extensive database with over 170,000 processed vectors. We propose a novel identification of outlying GNSS vectors using the Isolation Forest (IF) algorithm applied to the vectors' deviations, and a new procedure to build linear models based on metaheuristics and a penalty function. The linear regressions produced models with a coefficient of determination R² up to 0.996. The observation time span variable appeared at least twice in all equations, showing its importance for the resulting quality of a vector. Overall, the three-dimensional deviation of vectors processed with broadcast ephemeris is 2.4 times higher than for precise ephemerides. Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES: 001). Issue date: 2022-02-15.
Published: 2022
Full Text: View/download PDF

376. Learning to Detect Local Overheating of the High-Power Microwave Heating Process With Deep Learning

Author: Yao Zheng, Ma Longkun, Shan Liang, Kai Wang, Sun Guotan, Yu Xing, Qingyu Xiong, and Tong Liu
Subjects: Electromagnetic field, Pollution, General Computer Science, Computer science, media_common.quotation_subject, Feature extraction, Overheating (economics), 02 engineering and technology, Convolutional neural network, Thermal, convolutional neural networks, 0202 electrical engineering, electronic engineering, information engineering, Electronic engineering, General Materials Science, Overheating (electricity), media_common, business.industry, Deep learning, 020208 electrical & electronic engineering, General Engineering, isolation forest, Electromagnetic heating, local overheating, Microwave heating, 020201 artificial intelligence & image processing, Anomaly detection, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971
Abstract: As a new kind of heating technology, microwave heating could replace traditional heating methods, because it has the advantages of high efficiency, no secondary pollution, and rapid heating. But the microwave heating process, which involves complex coupling between a time-varying electromagnetic field and a thermal field, is extremely complicated. As a result, the heated medium may overheat locally. Worse, this may cause unexpected safety accidents, such as burning and even explosion. However, the temperature variation during microwave heating can barely be measured. To address the problem of local overheating, this paper proposes a deep learning algorithm based on multi-dimensional data to construct an anomaly detection model for detecting local overheating. The algorithm consists of convolutional neural networks (CNNs) and an unsupervised learning method, the isolation forest algorithm (IFA). First, CNNs are used to extract features from the data collected from a WXD15S microwave heating system. Then, IFA detects the local overheating. Compared with an algorithm based on a common model, experimental results show that the proposed algorithm achieves better measurement performance and higher precision.
Published: 2018

377. An enhanced variable selection and Isolation Forest based methodology for anomaly detection with OES data

Author: Sen McLoone and Luca Puggini
Subjects: 0209 industrial biotechnology, Computer science, Dimensionality reduction, isolation forest, Context (language use), Feature selection, Semiconductor, forward selection component analysis, 02 engineering and technology, computer.software_genre, fault detection, Fault detection and isolation, 020901 industrial engineering & automation, Artificial Intelligence, Control and Systems Engineering, OES spectrum, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Anomaly detection, Data mining, Electrical and Electronic Engineering, computer, Dimensionality Reduction, Interpretability, Curse of dimensionality
Abstract: The development of efficient and interpretable anomaly detection systems is fundamental to keeping production costs low, and is an active area of research in semiconductor manufacturing, particularly in the context of using Optical Emission Spectroscopy (OES) data. The high dimension and correlated nature of OES data can limit the performance achievable with anomaly detection systems. In this paper we present a dimensionality reducing variable selection and isolation forest based anomaly detection and diagnosis methodology that addresses these issues. In particular, it takes account of isolated variables that can be overlooked when using conventional approaches such as PCA, and provides greater interpretability than afforded by PCA. The proposed methodology is illustrated with the aid of simulated and industrial plasma etch case studies.
Published: 2018
Full Text: View/download PDF

378. Research on false alarm detection algorithm of nuclear power system based on BERT-SAE-iForest combined algorithm.

Author: Li, Xiangyu, Cheng, Kun, Huang, Tao, and Tan, Sichao
Subjects: *FALSE alarms, *DETECTION alarms, *ALGORITHMS, *SYSTEM identification, *NUCLEAR energy
Abstract: During the operation of nuclear power system, the instrument detection data often deviates from the normal operating state for a short time due to system or environmental fluctuations. And the control system will send an alarm signal, resulting in the false alarm. Aiming at the problem of false alarm, three algorithms are improved and combined to form a false alarm algorithm in this paper. The algorithm consists of the transient operating parameters processing algorithm and the nuclear power system anomaly identification algorithm. The transient operating parameters processing algorithm is based on the Bidirectional Encoder Representations from Transformers (BERT) algorithm and is used to determine whether the deviation between the measured values of the instrument and the theoretical calculated values exceeds the set threshold. The anomaly identification algorithm of nuclear power system is based on Sparse Auto Encoder (SAE) and Isolation Forest (iForest) algorithm to judge the operating state of nuclear power system. When the transient operating parameters processing algorithm judges that the deviation values have exceeded the set threshold, but the anomaly identification algorithm judges that the nuclear power system is in normal operation, it can determine that the current alarm signal is the false alarm. The false alarm detection algorithm of nuclear power system can not only provide judgment basis for the operators to analyze the state of nuclear power system, but also improve the intelligent level of nuclear power system, so as to further improve the safety and reliability of nuclear power system. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

379. Unsupervised Outlier Detection Ensembles and Sentinel-2 Time Series for Land Cover Map Confidence Assessment. A Case Study in Upper Austria with a special Focus on Isolation Forests.

Author: Lackner, Stefan
Abstract: In this study, the feasibility of unsupervised ensemble outlier detection for land cover map confidence assessment is investigated. Isolation Forest (IF) and Extended Isolation Forest (EIF) algorithms are tested over a range of parameters and datasets to investigate their usefulness for producing pixel-wise quality measures of land cover maps. Benchmark experiments are based on synthetic data, machine learning data from the literature and land cover data derived from a habitat map produced at the Environment Agency Austria (EAA). Manually created inlier-outlier datasets and mono- as well as multi-temporal Sentinel-2 satellite data with different band combinations are used for performance evaluation. Results show that the EIF is not superior to the IF in terms of Area Under the Curve (AUC), either on machine learning benchmark data or on land cover data. A few special cases in which the oblique splitting scheme of the EIF is beneficial are identified and the impacts of different parameter settings for both algorithms are investigated. While the number of trees is found to have a minor impact on performance, the number of samples is found to be positively related to AUC results. The use of a high number of bands and time series data is also beneficial in the conducted experiments. These findings are used to produce a confidence assessment layer for the EAA habitat map within the Austrian part of Sentinel-2 tile T33UVP. Results show that unsupervised ensemble outlier detection can be successfully used to highlight areas of problematic quality in a land cover map (LCM).
Published: 2019

380. Latent space conditioning for improved classification and anomaly detection

Author: Norlander, Erik and Sopasakis, Alexandros
Abstract: We propose a variational autoencoder to perform improved pre-processing for clustering and anomaly detection on data with a given label. Anomalies, however, are not known or labeled. We call our method conditioned variational autoencoder since it separates the latent space by conditioning on information within the data. The method fits one prior distribution to each class in the dataset, effectively expanding the prior distribution to include a Gaussian mixture model. Our approach is compared against the capabilities of a typical variational autoencoder by measuring their V-score during cluster formation with respect to the k-means and EM algorithms. For anomaly detection, we use a new metric composed of the mass-volume and excess-mass curves which can work in an unsupervised setting. We compare the results between established methods such as isolation forest, local outlier factor and one-class support vector machine.
Published: 2019

381. Relevance feature mapping for content-based multimedia information retrieval

Author: Zhou, Guang-Tong, Ting, Kai Ming, Liu, Fei Tony, and Yin, Yilong
Subjects: *INFORMATION retrieval, *MULTIMEDIA systems, *FEATURE extraction, *DIGITAL images, *MULTIMEDIA cartography, *DATABASES
Abstract: This paper presents a novel ranking framework for content-based multimedia information retrieval (CBMIR). The framework introduces relevance features and a new ranking scheme. Each relevance feature measures the relevance of an instance with respect to a profile of the targeted multimedia database. We show that the task of CBMIR can be done more effectively using the relevance features than the original features. Furthermore, additional performance gain is achieved by incorporating our new ranking scheme which modifies instance rankings based on the weighted average of relevance feature values. Experiments on image and music databases validate the efficacy and efficiency of the proposed framework. [Copyright © Elsevier]
Published: 2012
Full Text: View/download PDF

382. Isolation-Based Anomaly Detection.

Author: Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou
Subjects: OUTLIERS (Statistics), BINARY number system, TREE graphs, DIMENSIONAL analysis, DETECTORS, COMPUTATIONAL complexity, TRAINING
Abstract: Anomalies are data points that are few and different. As a result of these properties, we show that anomalies are susceptible to a mechanism called isolation. This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation without employing any distance or density measure, which makes it fundamentally different from all existing methods. As a result, iForest is able to exploit subsampling (i) to achieve a low linear time-complexity and a small memory-requirement and (ii) to deal with the effects of swamping and masking effectively. Our empirical evaluation shows that iForest outperforms ORCA, one-class SVM, LOF and Random Forests in terms of AUC and processing time, and that it is robust against masking and swamping effects. iForest also works well in high dimensional problems containing a large number of irrelevant attributes, and when anomalies are not available in the training sample. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF
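
Illustrative note on record 382 (not part of the catalogued abstract): the isolation mechanism the abstract describes can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the sample size of 64, the depth cap of 50 and the toy data are all hypothetical choices made for illustration. It follows one point through random axis-aligned splits of a subsample and counts how many splits are needed before the point is alone.

```python
import random


def path_length(x, data, rng, depth=0, max_depth=50):
    """Number of random axis-aligned splits needed to isolate x within data."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(x))              # pick a random attribute
    lo = min(p[dim] for p in data)
    hi = max(p[dim] for p in data)
    if lo == hi:                             # cannot split on this attribute
        return depth
    split = rng.uniform(lo, hi)              # pick a random split value
    # Keep only the points that fall on the same side of the split as x.
    if x[dim] < split:
        side = [p for p in data if p[dim] < split]
    else:
        side = [p for p in data if p[dim] >= split]
    return path_length(x, side, rng, depth + 1, max_depth)


def mean_path_length(x, data, n_trees=200, sample_size=64, seed=0):
    """Average isolation depth of x over n_trees random subsamples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(x, sample, rng)
    return total / n_trees


# Hypothetical toy data: a dense Gaussian cluster plus one distant point.
_rng = random.Random(42)
cluster = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(500)]
outlier = (8.0, 8.0)
inlier = (0.0, 0.0)
data = cluster + [outlier]

print("outlier:", mean_path_length(outlier, data))
print("inlier: ", mean_path_length(inlier, data))
```

On data like this the distant point should be isolated after noticeably fewer splits than a point in the middle of the cluster, which is the property the anomaly ranking relies on. The subsampling in `mean_path_length` mirrors the abstract's remark that small subsamples suffice.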

383. Assessing Electrocardiogram and Respiratory Signal Quality of a Wearable Device (SensEcho): Semisupervised Machine Learning-Based Validation Study

Author: Zhengbo Zhang, Zhicheng Yang, Ke Lan, Di Wu, Jiachen Wang, Chenbin Ma, Anshuo Wu, Haoran Xu, Yaning Zang, Wei Yan, and Yan Muyang
Subjects: Support Vector Machine, Computer science, signal quality, Health Informatics, electrocardiogram, Signal, Constant false alarm rate, Set (abstract data type), Electrocardiography, Wearable Electronic Devices, Humans, mobile health, Original Paper, business.industry, isolation forest, Arrhythmias, Cardiac, Pattern recognition, Random forest, Support vector machine, machine learning, Test set, Anomaly detection, Artificial intelligence, Noise (video), business, respiratory signal, Algorithms
Abstract: Background: With the development and promotion of wearable devices and their mobile health (mHealth) apps, physiological signals have become a research hotspot. However, noise is complex in signals obtained from daily lives, making it difficult to analyze the signals automatically and resulting in a high false alarm rate. At present, screening out the high-quality segments of the signals from huge-volume data with few labels remains a problem. Signal quality assessment (SQA) is essential and is able to advance the valuable information mining of signals. Objective: The aims of this study were to design an SQA algorithm based on the unsupervised isolation forest model to classify the signal quality into 3 grades: good, acceptable, and unacceptable; validate the algorithm on labeled data sets; and apply the algorithm on real-world data to evaluate its efficacy. Methods: Data used in this study were collected by a wearable device (SensEcho) from healthy individuals and patients. The observation windows for electrocardiogram (ECG) and respiratory signals were 10 and 30 seconds, respectively. In the experimental procedure, the unlabeled training set was used to train the models. The validation and test sets were labeled according to preset criteria and used to evaluate the classification performance quantitatively. The validation set consisted of 3460 and 2086 windows of ECG and respiratory signals, respectively, whereas the test set was made up of 4686 and 3341 windows of signals, respectively. The algorithm was also compared with self-organizing maps (SOMs) and 4 classic supervised models (logistic regression, random forest, support vector machine, and extreme gradient boosting). One case validation was illustrated to show the application effect. The algorithm was then applied to 1144 cases of ECG signals collected from patients and the detected arrhythmia false alarms were calculated.
Results: The quantitative results showed that the ECG SQA model achieved 94.97% and 95.58% accuracy on the validation and test sets, respectively, whereas the respiratory SQA model achieved 81.06% and 86.20% accuracy on the validation and test sets, respectively. The algorithm was superior to SOM and achieved moderate performance when compared with the supervised models. The example case showed that the algorithm was able to correctly classify the signal quality even when there were complex pathological changes in the signals. The algorithm application results indicated that some specific types of arrhythmia false alarms such as tachycardia, atrial premature beat, and ventricular premature beat could be significantly reduced with the help of the algorithm. Conclusions: This study verified the feasibility of applying the anomaly detection unsupervised model to SQA. The application scenarios include reducing the false alarm rate of the device and selecting signal segments that can be used for further research.
Published: 2021
Full Text: View/download PDF

384. Research on anomaly detection method of nuclear power plant operation state based on unsupervised deep generative model.

Author: Li, Xiangyu, Huang, Tao, Cheng, Kun, Qiu, Zhifang, and Tan, Sichao
Subjects: *NUCLEAR power plants, *STEAM generators, *MACHINE learning, *REAL-time control
Abstract: In the field of traditional industrial control, anomaly detection methods are mainly used to identify data items that do not match the normal operation state of the system. Traditional machine learning algorithms need transient operation data from both normal and accident conditions to identify the abnormal state of a nuclear power plant. Transient operation data for normal conditions are plentiful, but transient operation data for accident conditions are lacking. To solve this problem, an abnormal operation state detection method for nuclear power plants based on an unsupervised deep generative model is established in this paper by using Variational Auto Encoders (VAE) and Isolation Forest (iForest). The biggest advantage of this method is that it needs only the normal operation data of the nuclear power plant to let the plant control system effectively identify whether the current state is normal operation or an accident condition. In the unsupervised deep generative model, VAE is used for data preprocessing, and iForest is used to identify abnormal operation data. The method is then verified in seven variable and accident conditions, such as a power reduction condition and a steam generator tube rupture accident. The verification results show that the anomaly detection method can recognize the current abnormal condition immediately when the accident happens. The time consumed to identify a group of operation parameters corresponding to the operation state of the nuclear power plant is about 3 ms, which can satisfy the real-time requirements of the control system. Therefore, the anomaly detection method based on the unsupervised deep generative model can distinguish the normal or abnormal operation state of the nuclear power plant in real time and effectively, and provide a judgment basis for accident classification and subsequent rescue. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

385. GNSS vector quality modelling combining Isolation Forest and Independent Vortices Search.

Author: Koch, Ismael É., Klein, Ivandro, Gonzaga, Luiz, Rofatto, Vinicius F., Matsuoka, Marcelo T., Monico, João F.G., and Veronez, Maurício R.
Subjects: *BASE isolation system, *SURVEYING (Engineering), *METAHEURISTIC algorithms, *REGRESSION analysis, *OUTLIER detection
Abstract: Estimating the quality of GNSS vectors is decisive in planning GNSS networks and in several land surveying activities. These vectors are indirectly present in many civil infrastructures, usually in the first stages of construction, and are thus of high importance. In this research we assembled over 1000 baselines producing an extensive database with over 170,000 processed vectors. We propose a novel identification of outlying GNSS vectors using the Isolation Forest (IF) algorithm applied to the vectors' deviations, and a new procedure to build linear models based on metaheuristics and a penalty function. The linear regressions yielded models with a coefficient of determination R² up to 0.996. The observation time span variable remained in all equations at least twice, showing its importance for the outcome quality of a vector. Overall, the three-dimensional deviation of vectors processed with broadcast ephemeris is 2.4 times higher than for precise ephemerides. • Identification of outlying GNSS vectors based on the Isolation Forest algorithm. • GNSS vectors' models for deviation estimation with precise and broadcast ephemeris. • New procedure for models building combining metaheuristic and penalty function. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

386. Fault anomaly detection of synchronous machine winding based on isolation forest and impulse frequency response analysis.

Author: Chen, Yu, Zhao, Zhongyong, Wu, Hanzhi, Chen, Xi, Xiao, Qianbo, and Yu, Yueqiang
Subjects: *IMPULSE response, *BASE isolation system, *WINDING machines, *SUPERVISED learning, *AIRBORNE lasers, *DATA structures, *SYNCHRONOUS electric motors
Abstract: • This method adopts unsupervised learning and does not need to label the experimental data. • The method proposed in this paper is more suitable for data structures in real life. • A winding fault detection method of synchronous machine based on isolation forest and IFRA is proposed, which overcomes the shortcomings of traditional methods, such as long computation time, low accuracy, and demand for massive fault experimental data. The synchronous machine is one of the critical power generation parts in the power system. Its stable operation ensures people's normal economic activities. The winding is an essential component of a synchronous machine, and the winding fault is a common fault type. The reliable and efficient fault diagnosis of synchronous machine winding is of great significance to ensure the stability of the power system. Therefore, this paper proposes an anomaly detection method of synchronous machine winding fault based on isolation forest (IF) and impulse frequency response analysis (IFRA). Firstly, the basic principle of the anomaly detection method is introduced, and the mathematical-statistical indicators of the IFRA signatures used are then explained. Besides, the experimental verification is carried out on a 5 kW synchronous machine, and the performance of the anomaly detection method for winding fault is compared with other conventional methods. The experimental results show that the proposed method is feasible and effective, and that it generalizes well across the data. The comparative experimental results show that the proposed method is superior to the existing conventional supervised learning method. It has a shorter calculation time and higher accuracy, with stronger robustness, which is more suitable for the actual data structure. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

387. Automatic fault detection system for mining conveyor using distributed acoustic sensor.

Author: Wijaya, Hendrik, Rajeev, Pathmanathan, Gad, Emad, and Vivekanantham, Ravi
Subjects: *CONVEYING machinery, *OPTICAL signal detection, *SUPPLY & demand, *SYSTEM downtime, *MINES & mineral resources, *OPTICAL sensors, *ACOUSTIC transducers, *ALGORITHMS
Abstract: • Monitoring of mining conveyor using distributed optical fibre sensor is reported. • Optical signal is used to classify the various type of fault and its progression. • System architecture for automatic large data analysis and management is developed. • Automatic fault detection is proposed by modifying iForest algorithm. Condition monitoring of mining conveyor is a highly essential task to ensure minimum disruption to the mining operational system. Failure of one or more conveyor components can result in significant operational downtime, economic loss, and safety risks. The current monitoring method still involves subjective measure from maintenance engineers, where at some cases, fault can be left undetected and leads into site incident. Therefore, there is a high demand for real-time condition monitoring technology to detect early fault on conveyor. In this study, the effective application of distributed optical fibre sensor (DOFS) was explored for long distance real-time condition monitoring of mining conveyor. The fault detection framework was developed by integrating and modifying the Isolation Forest algorithm to analyse optical signals for effective detection of defective idlers. Further, the optical signal was analysed to extract the damage progression of defective idler with time and space. The results were used to classify various levels of damage and to set appropriate damage thresholds. Also, software interface, that can be used to set the sensing parameters, to collect, analyse, and visualise the signal in real-time, was developed. Finally, the developed condition monitoring system was used to monitor a 1.6 km long section of a conveyor structure in Western Australia for a period of 10 months. The results and findings from the field monitoring were presented together with automated fault detection framework for condition monitoring of mining conveyor. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

388. Functional Isolation Forest

Author: Staerman, Guillaume, Mozharovskyi, Pavlo, Clémençon, Stephan, d'Alché-Buc, Florence, Signal, Statistique et Apprentissage (S2A), Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris-Institut Mines-Télécom [Paris] (IMT)-Télécom Paris, Département Images, Données, Signal (IDS), Télécom ParisTech, and Institut Polytechnique de Paris (IP Paris)
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Statistics - Machine Learning, isolation forest, Machine Learning (stat.ML), Anomaly detection, unsupervised learning, Machine Learning (cs.LG), functional data analysis
Abstract: For the purpose of monitoring the behavior of complex infrastructures (e.g. aircraft, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi continuous-time to detect quickly the occurrence of anomalies that may jeopardize the smooth operation of the system of interest. The statistical analysis of such massive data of functional nature raises many challenging methodological questions. The primary goal of this paper is to extend the popular Isolation Forest (IF) approach to Anomaly Detection, originally dedicated to finite dimensional observations, to functional data. The major difficulty lies in the wide variety of topological structures that may equip a space of functions and the great variety of patterns that may characterize abnormal curves. We address the issue of (randomly) splitting the functional space in a flexible manner in order to isolate progressively any trajectory from the others, a key ingredient to the efficiency of the algorithm. Beyond a detailed description of the algorithm, computational complexity and stability issues are investigated at length. From the scoring function measuring the degree of abnormality of an observation provided by the proposed variant of the IF algorithm, a Functional Statistical Depth function is defined and discussed as well as a multivariate functional extension. Numerical experiments provide strong empirical evidence of the accuracy of the extension proposed.
Published: 2019

389. Explainable Machine Learning in Industry 4.0: Evaluating Feature Importance in Anomaly Detection to Enable Root Cause Analysis

Author: Chiara Masiero, Gian Antonio Susto, Alessandro Beghi, and Mattia Carletti
Subjects: Isolation Forest, 0209 industrial biotechnology, Anomaly Detection, Industry 4.0, Interpretability, Machine Learning, Computer science, Feature extraction, 02 engineering and technology, Machine learning, computer.software_genre, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, business.industry, Feature (computer vision), Task analysis, Key (cryptography), 020201 artificial intelligence & image processing, Anomaly detection, Artificial intelligence, business, Root cause analysis, computer
Abstract: In recent years, Machine Learning methodologies have been applied in countless application areas. In particular, they play a key role in enabling Industry 4.0. However, one of the main obstacles to the diffusion of Machine Learning-based applications is related to the lack of interpretability of most of these methods. In this work, we propose an approach for defining a ‘feature importance’ in Anomaly Detection problems. Anomaly Detection is an important Machine Learning task that has an enormous applicability in industrial scenarios. Indeed, it is extremely relevant for the purpose of quality monitoring. Moreover, it is often the first step towards the design of a Machine Learning-based smart monitoring solution because Anomaly Detection can be implemented without the need of labelled data. The proposed feature importance evaluation approach is designed for Isolation Forest, one of the most commonly used algorithms for Anomaly Detection. The efficacy of the proposed method is tested on synthetic and real industrial datasets.
Published: 2019
Full Text: View/download PDF

390. Combining the outputs of various k-nearest neighbor anomaly detectors to form a robust ensemble model for high-dimensional geochemical anomaly detection.

Author: Chen, Yongliang, Zhao, Qingying, and Lu, Laijun
Subjects: *ANOMALY detection (Computer security), *GEOCHEMICAL modeling, *INTRUSION detection systems (Computer security), *GAUSSIAN mixture models, *DETECTORS, *SUPPORT vector machines, *RANDOM forest algorithms
Abstract: Machine learning techniques provide useful methods for high-dimensional geochemical anomaly detection for mineral exploration targeting. However, the instability of the machine learning models often leads to uncertainty in high-dimensional geochemical anomaly detection results. Combining various individual models to form an adaptive ensemble anomaly detector is a feasible way to enhance the robustness of machine learning anomaly detectors. In this study, the average method, maximization method, average of maximum (AOM) method, and maximum of average (MOA) method were adopted to combine the outputs of various k-nearest neighbor (KNN) anomaly detectors to improve the robustness of the KNN models in the high-dimensional geochemical anomaly detection in the Baishan district (Jilin Province, China). The effectiveness of the four combination algorithms for high-dimensional geochemical anomaly detection was evaluated by comparing the ensemble models obtained by using the four combination algorithms with the single KNN model, Gaussian mixture model (GMM), one-class support vector machine (OCSVM), and isolation forest (IForest) in the case study. It is found that the four ensemble models (a) perform similarly well in high-dimensional geochemical anomaly detection, and (b) perform better than the single KNN model, GMM, OCSVM, and IForest in high-dimensional geochemical anomaly detection. Therefore, the average method, maximization method, AOM method, and MOA method are potentially useful algorithms for combining the outputs of various KNN models to form robust ensemble models for high-dimensional geochemical anomaly detection. • A KNN model is used to detect high-dimensional geochemical anomalies. • Robust high-dimensional geochemical anomaly detection models are formed by combining the outputs of various KNN models. • The robust models are compared with KNN, GMM, OCSVM and IForest. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
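
Illustrative note on record 390 (not part of the catalogued abstract): the four combination rules named in the abstract can be sketched as below. This is a hedged sketch based on the usual definitions of these rules in the outlier-ensemble literature, not the authors' code. It assumes the detector scores have already been put on a comparable scale, and it assigns detectors to buckets deterministically, whereas practical implementations typically assign them at random. All function names and the toy scores are hypothetical.

```python
from statistics import mean


def average(scores):
    """Average method: mean score per sample across detectors."""
    return [mean(col) for col in zip(*scores)]


def maximum(scores):
    """Maximization method: highest score per sample across detectors."""
    return [max(col) for col in zip(*scores)]


def make_buckets(scores, n_buckets):
    """Split the detectors into n_buckets groups (round-robin, for illustration)."""
    return [scores[i::n_buckets] for i in range(n_buckets)]


def average_of_maximum(scores, n_buckets):
    """AOM: take the maximum within each bucket, then average the bucket results."""
    return average([maximum(b) for b in make_buckets(scores, n_buckets)])


def maximum_of_average(scores, n_buckets):
    """MOA: take the average within each bucket, then the maximum over buckets."""
    return maximum([average(b) for b in make_buckets(scores, n_buckets)])


# Hypothetical scores: 4 detectors (rows) scoring 3 samples (columns).
scores = [
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.7],
    [0.0, 0.4, 0.8],
    [0.3, 0.3, 1.0],
]

print("average:", average(scores))
print("maximum:", maximum(scores))
print("AOM:    ", average_of_maximum(scores, 2))
print("MOA:    ", maximum_of_average(scores, 2))
```

In this toy input all four rules rank the third sample as the most anomalous, which loosely mirrors the abstract's finding that the four ensembles perform similarly.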

391. Enhanced anomaly scores for isolation forests.

Author: Mensi, Antonella and Bicego, Manuele
Subjects: *OUTLIER detection, *RANDOM forest algorithms
Abstract: • We propose novel anomaly scores for Isolation Forests. • We design weighted variants of the anomaly score that embed additional information. • We propose a novel aggregation function to combine the scores at forest level. • Experiments confirm the suitability of the approach. • The novel proposed scores outperform the original one. Isolation Forest represents a variant of Random Forest largely and successfully employed for outlier detection. The main idea is that outliers are likely to get isolated in a tree after few splits. The anomaly score is therefore a function inversely related to the leaf depth. This paper proposes enhanced anomaly scores of the Isolation Forest by making two different contributions. The first consists in weighing the path traversed by an object to obtain a more informative anomaly score. The second contribution employs a different aggregation function to combine the tree scores. We thoroughly evaluate the proposed methodology by testing it on sixteen datasets. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
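
Illustrative note on record 391 (not part of the catalogued abstract): the abstract describes the anomaly score as a function inversely related to leaf depth. The baseline score that such work builds on is the one from the original Isolation Forest formulation, s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length of x over the trees and c(n) is the average path length of an unsuccessful binary-search-tree lookup among n points. The sketch below computes that baseline only. It does not reproduce the weighted or re-aggregated scores proposed in the paper, and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def c(n):
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA   # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(path_lengths, n):
    """Baseline iForest score from per-tree path lengths and subsample size n."""
    mean_depth = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_depth / c(n))


# Hypothetical per-tree depths for two points, with subsample size 256.
print(anomaly_score([3, 4, 2, 3], 256))     # shallow leaves: score above 0.5
print(anomaly_score([12, 11, 13, 12], 256))  # deep leaves: score below 0.5
```

A score near 1 indicates a point isolated almost immediately, while a mean depth equal to c(n) gives exactly 0.5, the value usually read as "no distinct anomaly".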

392. A new method for fault detection of aero-engine based on isolation forest.

Author: Wang, Hongfei, Jiang, Wen, Deng, Xinyang, and Geng, Jie
Subjects: *BASE isolation system, *TURBOFAN engines, *PROBLEM solving, *INTERNAL combustion engines
Abstract: The research on fault detection of aero-engine is of great significance to its safe and reliable operation. In this paper, a dynamic threshold method for aero-engine fault detection based on Isolation Forest (iForest) is proposed. The proposed method can use only normal aero-engine data for training to build the fault detection model, which solves the problem that there is no large amount of fault data for training in the field of aero-engine fault detection due to the limitations of actual conditions. The method is verified by the residual data of the turbofan engine gas path system which is generated by the state variable model under three different fault states. Compared with the results of other methods, it is found that the proposed method can not only achieve high detection accuracy but also has a short running time. It is proved that the proposed method is suitable for fault detection of aero-engine. • An effective aero-engine fault detection model based on Isolation Forest. • The proposed model can construct an adaptive dynamic threshold. • The proposed model can not only achieve high detection accuracy but also has a short running time. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

393. Investigation of Isolation Forest for Wind Turbine Pitch System Condition Monitoring Using SCADA Data.

Author: McKinnon, Conor, Carroll, James, McDonald, Alasdair, Koukoura, Sofia, and Plumley, Charlie
Subjects: *SUPERVISORY control & data acquisition systems, *HYDRAULIC turbines, *WIND turbines
Abstract: Wind turbine pitch system condition monitoring is an active area of research, and this paper investigates the use of the Isolation Forest Machine Learning model and Supervisory Control and Data Acquisition system data for this task. This paper examines two case studies, turbines with hydraulic or electric pitch systems, and uses an Isolation Forest to predict failure ahead of time. This novel technique compared several models per turbine, each trained on a different number of months of data. An anomaly proportion for three different time-series window lengths was compared, to observe trends and peaks before failure. The two cases were compared, and it was found that this technique could detect abnormal activity roughly 12 to 18 months before failure for both the hydraulic and electric pitch systems for all unhealthy turbines, and a trend upwards in anomalies could be found in the immediate run up to failure. These peaks in anomalous behaviour could indicate a future failure and this would allow for on-site maintenance to be scheduled. Therefore, this method could improve scheduling planned maintenance activity for pitch systems, regardless of the pitch system employed. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

394. A two-stage model for forecasting consumers’ intention to purchase with e-coupons

Author: Xianhao Xu, Jingjing Cao, Xinxin Ren, Yeming Gong, and emlyon business school
Subjects: Marketing, Consumption (economics), Consumer segmentation, business.industry, 05 social sciences, Big data, Logistic regression, Advertising, Electronic coupon redemption, [SHS.ECO]Humanities and Social Sciences/Economics and Finance, Data imbalance, Isolation forest, Online advertising, Preference, Data-driven strategy, 0502 economics and business, [SHS.GESTION]Humanities and Social Sciences/Business administration, 050211 marketing, Segmentation, Business, Isolation (database systems), 050203 business & management
Abstract: E-coupons (electronic coupons) have been a mainstay of online marketing for attracting consumers and encouraging repeat purchases; distributing the right e-coupons to the right consumers is of critical importance. In the big data era, consumers' preferences for e-coupons as revealed by their online behavior, and the impact of the data imbalance caused by low-activity consumers, are rarely studied. Thus, we propose a two-stage hybrid model. First, consumer segmentation is implemented to analyze the behavioral characteristics of each segment and to distinguish low-activity consumers; then models are constructed for the different consumer segments. The proposed model is applied to real online consumption data. Consumers are aggregated into four segments: potential e-coupon users, low discount-sensitive users, and high discount-sensitive users (including discount preference and fixed preference). The first is defined as the low-activity consumer segment and the others as high-activity consumer segments. An isolation forest model and a logistic regression model are constructed for them, respectively. The results show that data imbalance is effectively relieved and that prediction performance is significantly better than that of traditional approaches. Finally, e-coupon usage characteristics are summarized for each consumer segment, from which companies can increase sales and improve consumer satisfaction.
Published: 2021
Full Text: View/download PDF

395. Explainable Anomaly Detection Framework for Maritime Main Engine Sensor Data.

Author: Kim, Donghyun, Antariksa, Gian, Handayani, Melia Putri, Lee, Sangbong, and Lee, Jihwan
Subjects: *ANOMALY detection (Computer security), *MARINE engines, *DETECTORS, *MACHINE learning, *HIERARCHICAL clustering (Cluster analysis)
Abstract: In this study, we propose a data-driven approach to the condition monitoring of marine engines. Although several unsupervised methods exist in the maritime industry, their common limitation is the interpretation of the anomaly: they do not explain why the model classifies specific data instances as anomalies. This study combines explainable AI techniques with an anomaly detection algorithm to overcome this limitation. As an explainable AI method, this study adopts Shapley Additive exPlanations (SHAP), which is theoretically solid and compatible with any kind of machine learning algorithm. SHAP enables us to measure the marginal contribution of each sensor variable to an anomaly. Thus, one can easily specify which sensor is responsible for a specific anomaly. To illustrate our framework, an actual sensor stream collected from a cargo vessel over 10 months was analyzed. In this analysis, we performed hierarchical clustering with transformed SHAP values to interpret and group common anomaly patterns. We showed that anomaly interpretation and segmentation using SHAP values provide a more useful interpretation than the case without SHAP values. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

396. Fuzzy C-Means-based Isolation Forest.

Author: Karczmarek, Paweł, Kiersztyn, Adam, Pedrycz, Witold, and Czerwiński, Dariusz
Subjects: ANOMALY detection (Computer security), DATA analysis, FRAUD investigation, OUTLIER detection
Abstract: The problem of finding anomalies (outliers) in databases is one of the most important issues in modern data analysis. One of the reasons is the occurrence of this issue in almost every type of database, including numerical, categorical, time, mixed, or graphic data. There are currently many methods often dedicated to specific data analysis. Finally, this topic is extremely interesting per se, as a research problem that intrigues researchers. One of the classic methods of data analysis dedicated to finding the anomalies in the data is Isolation Forest. However, this method, with a few exceptions, has not been modified from the time of its first publication, and, in particular, it has not yet appeared in combination with the typical fuzzy methods used for grouping such as Fuzzy C-Means (FCM) clustering. In this study, we thoroughly analyze this approach, as well as several related ones. We examine the possibilities of this technique and analyze it in detail for characteristics of data (database size, number of attributes, records, their type, etc.). It is worth noting that FCM allows to obtain membership grades of elements forming Isolation Forest nodes to clusters on the basis of which these nodes are built. Hence, at the stage of calculating the anomaly scores, this information is effectively used, in particular to express how much a given element may belong to a group of similar elements, which can be inferred from the characteristics of the cluster in which it lies. In this study, we propose a set of methods enhancing the Isolation Forest on a basis of Fuzzy C-Means. The results of numerical experiments carried using 27 various datasets and reported in this paper lead us to the conclusion that FCM can play a pivotal role in an enhancement of Isolation Forest approach and raises up the values of particular measures of effectiveness of the anomaly detection methods. • Fuzzy enhancements of Isolation Forest are proposed. • Fuzzy C-Means method is applied to split the tree nodes. • Different variants of search tree building are considered. • Numerical experiments' results demonstrating the efficiency are presented. • The experiments on real world logistic dataset are conducted. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

397. Research on the fault monitoring method of marine diesel engines based on the manifold learning and isolation forest.

Author: Wang, Ruihan, Chen, Hui, Guan, Cong, Gong, Wenfeng, and Zhang, Zehui
Subjects: *MARINE engines, *DIESEL motors, *AIRBORNE lasers, *DATA reduction, *FALSE alarms
Abstract: In this paper, an innovative hybrid fault monitoring scheme integrating the manifold learning and the isolation forest was established to monitor the state of marine diesel engine. The manifold learning was used to extract the useful feature and realize data dimension reduction, and these extracted features could ameliorate the fault monitoring process. Then, the isolation forest only utilized the normal operating data to realize the fault monitoring, and with manifold learning, the hybrid model can reduce computation complexity and improve diagnostic accuracy. However, the conventional isolation forest may ignore some fault information and cannot provide satisfactory fault detection performance. Therefore, a threshold based on partial monitoring fault data and the clustering algorithm was set to provide more transparent and accurate diagnostic results. For validating the proposed scheme, a two-stroke marine diesel engine was developed in MATLAB/Simulink environment based on zero-dimensional approach to represent a real engine behavior, and an in-service marine diesel engine provided reliable normal and fault condition datasets. Finally, comparisons of fault detection rate and false alarm rate of other state-of-the-art methods on simulated and measured datasets demonstrated the excellent performance of the proposed hybrid fault monitoring scheme. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

398. Insider Threat detection using Isolation Forest

Author: Scherman, Maja and Bülow, Joakim
Abstract: In contrast to the need for companies to get real time information about insider threats, there is a privacy and integrity based limitation of what the individual accepts as acceptable surveillance. This creates a problem since performing online surveillance would pose an infringement on the employees' privacy and integrity. Therefore we present a model using Isolation Forest to solve this problem. We focus on analyzing the non-intrusive features in a real time, event based approach. We process our features using periodic features, which we have statistically proven to be more effective than periodic features used with Isolation Forest. Our results show that by analyzing employees' login and logout times, we can detect 76% of all insider threats while only falsely classifying 7% of all normal instances. The recall rate, which shows how complete the results are, is 76%. Digitalization has brought us great opportunities for economic growth and there has been a global trend for companies to store more and more of their assets and products in digital form for many years. But digitalization also brought new types of risks and vulnerabilities, and to stay secure companies need to invest in countermeasures. Companies are prone to put great resources into securing their digital perimeters and exposure to the Internet to prevent cyber-crime and digital theft. What is commonly ignored is the possibility of threats from within the company perimeters, so called insider threats. What if we could detect and prevent insider threats before they ever occurred? For example, if an employee feels let down by the company and decides to sell information to the highest bidder; this might be preceded by certain actions that could be detected. In this thesis we have implemented a model for detection of insider threats adapted for companies which is usable when trying to reduce the risk of insider threats.
Detection of insider threats can be done with the help of machine learning. Machine learning is when a computer learns from input data without being specifically programmed. In our case we use a machine learning algorithm, called Isolation Forest, which is specialized in detecting anomalies. Machine learning typically needs a large amount of data to be able to perform well. This leads us to a common problem among researchers of insider threats: real data is often not made public. Most companies treat data of internal attacks, insider threats, with high confidentiality, and do not make it publicly available. This led a team of researchers to develop a synthetic data set, adapted for researchers who study insider threats. The data set consists of lots of normal employee behavior as well as a small part of suspicious events that indicate insider threats. The small part of only 0.023% suspicious events introduces problems that several m
Published: 2018

399. Peak-Load Forecasting for Small Industries: A Machine Learning Approach

Author: Dong-Hoon Kim, Eun-Kyu Lee, and Naik Bakht Sania Qureshi
Subjects: Computer science, Process (engineering), 020209 energy, small industry, lcsh:TJ807-830, Geography, Planning and Development, lcsh:Renewable energy sources, 02 engineering and technology, Management, Monitoring, Policy and Law, Compensation (engineering), 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), peak-load forecasting, lcsh:Environmental sciences, lcsh:GE1-350, Ensemble forecasting, Renewable Energy, Sustainability and the Environment, business.industry, lcsh:Environmental effects of industries and plants, ensemble, 020208 electrical & electronic engineering, isolation forest, Industrial engineering, Renewable energy, machine learning, lcsh:TD194-195, Peak load, business, Energy (signal processing)
Abstract: Peak-load forecasting prevents energy waste and helps with environmental issues by establishing plans for the use of renewable energy. For that reason, the subject is still actively studied. Most of these studies are focused on improving predictive performance by using varying feature information, but most small industrial facilities cannot provide such information because of a lack of infrastructure. Therefore, we introduce a series of studies to implement a generalized prediction model that is applicable to these small industrial facilities. On the basis of the pattern of load information of most industrial facilities, new features were selected, and a generalized model was developed through the aggregation of ensemble models. In addition, a new method is proposed to improve prediction performance by providing additional compensation to the prediction results by reflecting the fewest opinions among the prediction results of each model. Actual data from two small industrial facilities were applied to our process, and the results proved the effectiveness of our proposed method.
Published: 2020
Full Text: View/download PDF

400. Anomaly detection for elderly home care

Author: Widyawan, Anton Satria Prabuwono, Lutfan Lazuardi, Lukito Edi Nugroho, Kurnianingsih, and Mahardhika Pratama
Subjects: Information Systems and Management, Isolation (health care), Computer science, Small number, elderly home care, Decision tree, Vital signs, Skin temperature, isolation forest, anomaly detection, Random forest, Management Information Systems, Statistics, Anomaly detection, Statistics, Probability and Uncertainty, Reliability (statistics)
Abstract: In this paper, we propose a model for detecting anomalies in elderly home care. Two scenarios are investigated in detecting anomalies: 1) the elderly person's vital signs and their surrounding environment; 2) the mobility patterns of the elderly. We evaluated our proposed model by employing the isolation forest which detects anomalies using an isolation approach on a random forest of decision trees. We compare isolation forest on unlabeled data with statistical methods on labelled data. Subsequently, to show the reliability of the isolation concept, we compare it with a distance measure concept. The experiment shows that isolation forest has higher detection accuracy and lower error prediction for two attributes in the first scenario: skin temperature and heart rate, whereas, in the second scenario, multi-covariance determinant has a slightly better accuracy compared to isolation forest (3.9% difference in accuracy) and has a small number of prediction errors compared to isolation forest.
Published: 2020
Full Text: View/download PDF
