11 results
Search Results
2. Effect of feature optimization on performance of machine learning models for predicting traffic incident duration.
- Author
-
Obaid, Lubna, Hamad, Khaled, Khalil, Mohamad Ali, and Nassif, Ali Bou
- Subjects
-
*ARTIFICIAL neural networks, *MACHINE performance, *DATA distribution, *K-nearest neighbor classification, *PRINCIPAL components analysis, *MACHINE learning
- Abstract
Developing a high-performing traffic incident-duration prediction model is considered a key component for evaluating the impact of these incidents on the roadway network. Various research studies have developed robust incident-duration prediction models. Still, they have faced many issues in providing an accurate prediction result due to the countless data modeling issues, such as complex correlations, highly skewed data distributions, heteroscedasticity, and outliers. This paper investigates the impact of feature optimization (FO) - a relatively new term encompassing two already-known topics: feature engineering (FE) and feature selection (FS) techniques - on the performance of several machine learning models developed for predicting incident durations. The models developed included multivariate linear regression, decision trees, support vector regressors, K-Nearest Neighbors regression, ensembles, and artificial neural networks. Various FO techniques have been used for each model to derive the massive traffic incidents dataset and repeat the prediction process. Our results show that the proposed filtering, wrapper, and embedded FS techniques can successfully reduce the number of features without sacrificing the prediction performance. Using log-normal transformation to deal with continuous data skewness, min-max normalization to deal with data variability, and principal component analysis (PCA) to reform the dataset into a smaller independent feature subset, FE techniques can enhance the accuracy of incident duration estimation over the assessed ML models. The best-performing FE technique was the PCA since performance improvements were observed across all developed ML models. The best-performing FS technique was the Recursive Feature Elimination, outperforming other tested techniques in reducing model complexity while maintaining model accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. A systematic literature review of recent lightweight detection approaches leveraging machine and deep learning mechanisms in Internet of Things networks.
- Author
-
Mukhaini, Ghada AL, Anbar, Mohammed, Manickam, Selvakumar, Al-Amiedy, Taief Alaa, and Momani, Ammar Al
- Subjects
INTERNET of things, MACHINE learning, DEEP learning, FEATURE selection, DENIAL of service attacks, SCIENCE databases
- Abstract
The Internet of Things (IoT) connects daily use devices to the Internet, such as home appliances, health care equipment, sensors, and industrial devices. Concurrently, numerous cyber-attacks target those objects and their backbone IoT networks consecutively. Therefore, several researchers have adopted Machine Learning (ML) and Deep Learning (DL) algorithms to develop efficient Intrusion Detection Systems (IDSs). However, the restricted resources of IoT devices hinder integrating those systems with those tiny devices. Hence, designing lightweight IDSs gets more interest from researchers to build efficient detection models to discard attacks in IoT networks. To give a holistic insight into this research domain, this paper presents a Systematic Literature Review (SLR) to review and analyse the recent ML and DL techniques to lighten the IDS models for detecting attacks in IoT devices. In addition, the literature studies were retrieved from six scientific databases Google Scholar, Science Direct, IEEE Xplore®, Scopus, Web of Science, and Springer. From 4,703 identified records, 57 studies were adopted based on predesigned research questions and inclusion/exclusion criteria. The study's findings illustrate the most recently used ML and DL mechanisms and feature engineering techniques to lighten the proposed IDS models. It also shows the most attacks detected, datasets used, tools and network simulators employed, and evaluation metrics and parameters. Furthermore, it suggests the research challenges and future direction after discussing the limitations of the currently proposed techniques. This study shows that most selected studies are journal articles published in IEEE Xplore®. Furthermore, the most used feature engineering techniques are filter-based, as they deliver better performance and lightness than the developed models. Most studies use correlation algorithms as a feature selection technique. Finally, the most discussed attack in the selected studies is the DoS attack. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Very short-term load forecasting on factory level – A machine learning approach.
- Author
-
Walther, Jessica, Spanier, Dario, Panten, Niklas, and Abele, Eberhard
- Abstract
In the context of energy transition in Germany, precise load forecasting enables reducing the impact of increased volatility in power generation induced by renewable energies. This paper presents a machine learning approach to generate a 15 minutes forecasting model of the electric load for the ETA research factory at TU Darmstadt on a factory level. In the first iteration, a feature selection process was conducted to select significant features for machine learning datasets. The raw data contained 1,554 features from machine tools, technical building equipment, the building itself and external factors like the weather. The second iteration examined the forecasting capabilities of six hyperparameter tuned algorithms on the feature selected datasets. In the third iteration, feature engineering and hyperparameter tuning led to an optimized Gradient Boosting Regression Trees (GBRT) algorithm. The results indicate that the utilized machine learning approach is feasible and creates a precise very short term load forecasting model, depending on the use case of the load forecast. [ABSTRACT FROM AUTHOR]
- Published
- 2019
5. Deeppipe: Theory-guided prediction method based automatic machine learning for maximum pitting corrosion depth of oil and gas pipeline.
- Author
-
Du, Jian, Zheng, Jianqin, Liang, Yongtu, Xu, Ning, Liao, Qi, Wang, Bohong, and Zhang, Haoran
- Subjects
-
*PIPELINES, *PITTING corrosion, *FEATURE selection, *PETROLEUM pipelines, *MACHINE learning, *STANDARD deviations, *EPOXY coatings, PIPELINE corrosion
- Abstract
• Scientific theory of pitting corrosion is firstly integrated into machine learning model.
• The prediction model of maximum pitting corrosion depth is established via automatic model generation.
• A real-world pitting corrosion dataset is employed to verify the superiority of the proposed model.
• The proposed model achieves better accuracy and efficiency than other models.
Accurate monitoring of pipeline corrosion is important and necessary not only for the normal operation of oil and gas pipelines but also for the reliable and stable supply of energy. To avoid failures of buried steel pipelines, the precise prediction of maximum pitting corrosion depth should be conducted to prevent accidents. In this paper, an automatic machine learning (AML) based approach is developed to automate the construction of corrosion depth prediction model. The engineering theory and domain knowledge are integrated into feature engineering, which is an important part of the machine learning (ML) modeling process, to overcome the drawback of the conventional modeling method of ML. Subsequently, a novel prediction method, so-called theory-guided AML (Tg-AML) is proposed for the maximum depth prediction of pitting corrosion pipeline. Firstly, several new feature variables are constructed based on the corrosion mechanism (empirical model and interaction between input variables). Then, seven different feature subsets are developed based on correlation analysis. To select the suitable feature subset and verify the superiority of Tg-AML, a real-world pitting corrosion dataset is utilized for performance comparison based on evaluation metrics. After acquiring the suitable feature subset, the maximum pitting corrosion depth prediction model that fits the data and is guided by both engineering theory and domain knowledge is established. The results indicate that the proposed model achieves better accuracy and efficiency than other models, such as neural network, and decision tree, with root mean square error (RMSE) being 0.288, mean absolute error (MAE) being 0.174, confidence index (Cl) being 0.933. [ABSTRACT FROM AUTHOR]
- Published
- 2023
6. A novel aging characteristics-based feature engineering for battery state of health estimation.
- Author
-
Wang, Jinyu, Zhang, Caiping, Zhang, Linjing, Su, Xiaojia, Zhang, Weige, Li, Xu, and Du, Jingcai
- Subjects
-
*FEATURE selection, *STANDARD deviations, *MACHINE learning, *ELECTRIC batteries
- Abstract
State of health (SOH) estimation is essential for lithium-ion battery systems to ensure safe and reliable operation. The existing SOH estimation considers a few available signals, such as voltage and current, to extract specified and limited capacity-related features. Once the cell or materials is changed, features require manual re-built as the construction is specific and unsystematic. This paper proposes a novel aging information-based feature engineering framework for SOH diagnosis, which combines a comprehensive feature library driven by three-step construction strategy and an automatic feature selection pipeline fused with embedded-based and filter-based methods. In the feature space, the role played by each feature type and the extent to which the combination of features affects SOH estimation are explored by accuracy and robustness. For the collected datasets, a library of 206 features is generated as inputs for feature selection which eventually output a space with 7 features to track SOH change. These features perform well under all three typical machine learning models, with the maximum absolute error within 1% and the root mean square error (RMSE) below 0.29% for all cells of transfer operations. Compared to the existing literature using the features of discharge capacity differences between two cycles [ΔQ(V) curve], the RMSE is reduced by up to 85.1%. The approach is automated to produce a highly robust feature subset for accurate SOH estimation across usage protocols and multiple battery chemistries due to the wide range of feature sets and the superiority of feature selection.
• A comprehensive feature library driven by a three-step construction method is generated.
• An automatic feature selection pipeline for generating an algorithm-free feature subset is developed.
• The performance of feature space-based SOH estimation is verified by three machine learning models on cross-service cells.
• The role played by each feature type and the influence of feature combination on SOH are investigated.
[ABSTRACT FROM AUTHOR]
- Published
- 2023
7. A review of machine learning in building load prediction.
- Author
-
Zhang, Liang, Wen, Jin, Li, Yanfei, Chen, Jianli, Ye, Yunyang, Fu, Yangyang, and Livingood, William
- Subjects
-
*MACHINE learning, *FEATURE selection, *BUILDING operation management, *MACHINE performance, *FORECASTING, *MAXIMUM power point trackers
- Abstract
• This paper reviews building load prediction with machine learning techniques.
• Review and technical papers are searched by Sub-keyword Synonym Searching method.
• Technical papers are reviewed in terms of application, algorithms, and data.
• Primary limitations and gaps are identified; future trends are predicted.
• A guidance for future technical paper on building load prediction is proposed.
The surge of machine learning and increasing data accessibility in buildings provide great opportunities for applying machine learning to building energy system modeling and analysis. Building load prediction is one of the most critical components for many building control and analytics activities, as well as grid-interactive and energy efficiency building operation. While a large number of research papers exist on the topic of machine-learning-based building load prediction, a comprehensive review from the perspective of machine learning is missing. In this paper, we review the application of machine learning techniques in building load prediction under the organization and logic of the machine learning, which is to perform tasks T using Performance measure P and based on learning from Experience E. Firstly, we review the applications of building load prediction model (task T). Then, we review the modeling algorithms that improve machine learning performance and accuracy (performance P). Throughout the papers, we also review the literature from the data perspective for modeling (experience E), including data engineering from the sensor level to data level, pre-processing, feature extraction and selection. Finally, we conclude with a discussion of well-studied and relatively unexplored fields for future research reference. We also identify the gaps in current machine learning application and predict for future trends and development. [ABSTRACT FROM AUTHOR]
- Published
- 2021
8. A Dual-Staged heterogeneous stacked ensemble model for gender recognition using speech signal.
- Author
-
Kala, Jaideep, Taran, Sachin, and Pandey, Anukul
- Subjects
-
*AUTOMATIC speech recognition, *SPEECH perception, *RECEIVER operating characteristic curves, *FEATURE selection, *RANDOM forest algorithms, *MACHINE learning
- Abstract
• A dual-staged heterogeneous stacked ensemble model is introduced.
• A variance-based feature selection provides the lower computational complexity model.
• Meta-learner-controlled the over-fitting and improved the predictive accuracy.
• The proposed technique obtaining better performance for gender recognition.
Gender classification is one of the most popular topics in the field of machine learning. It can be done with images or sounds. With the aim of developing a low-complexity and highly accurate gender recognition model, this paper proposes a Dual-staged Heterogeneous Stacked Ensemble Model (DH-SEM). In the proposed ensemble model, the classification problem is addressed by the amalgamation of three supervised base learners at stage-1 and random forest as a Meta learner in stage-2. The proposed work uses a speech dataset compiled from various sources (Harvard-Haskins, Carnegie Mellon and McGill University) and consists of statistical features. Rigorous feature engineering has been performed to select the most discriminative information from the dataset. For gender recognition, the DH-SEM method adopts the reduced feature space. The gender recognition accuracy of the proposed DH-SEM model is 99.36%, which is highest as compared to the state-of-the-art methods. The robustness of the proposed technique is also validated by other performance evaluation metrics and receiver operating characteristics. [ABSTRACT FROM AUTHOR]
- Published
- 2023
9. GeoClust: Feature engineering based framework for location-sensitive disaster event detection using AHP-TOPSIS.
- Author
-
Rani, Monika and Kaushal, Sakshi
- Subjects
-
*MACHINE learning, *FEATURE selection, *BUILDING failures, *DISASTER relief, *EMERGENCY management, *DISASTERS, *TERRORISM, *DECISION making, *MULTIPLE criteria decision making
- Abstract
• Feature engineering based framework for location-sensitive disaster event detection.
• Augmentation of context-free and context-based features with place of occurrence.
• Evaluation with unsupervised machine learning algorithm with 09 performance metrics.
• AHP-TOPSIS based selection of an efficient machine learning algorithm and feature set.
• Location-augmented context-based features outperformed traditional textual features.
Disaster event detection aims to identify events like terrorist attacks, fire incidents, stampede incidents, building collapse, etc., reported in the online news articles or social media. Place of occurrence of disaster event is a significant feature associated with events for location-sensitive disaster event detection. Efficient feature selection and their augmentation with location information can contribute towards the evolution of traditional approaches and their adoption for location-sensitive disaster event detection leading to improvement in the overall process as a whole. Since the evaluation of event detection techniques deliberates various intrinsic and extrinsic performance metrics, the decision-making for the selection of feature sets is treated as a Multiple-Criteria Decision Making (MCDM) problem. This paper proposes a framework, GeoClust, that is based on feature engineering of traditional textual features in order to enhance their capability for improved location-sensitive disaster event detection. The framework augments context-free and context-based textual feature sets with feature sets of place of occurrence of the events and evaluates their performance using unsupervised machine learning algorithms for various performance metrics. Finally, the best feature set is selected using AHP-TOPSIS technique of MCDM in order to tune the system for automatic and efficient location-sensitive disaster event detection in real-time. Extensive set of experiments have been performed in order to evaluate the framework on a dataset of online news articles reporting disaster events about terrorist attacks, fire incidents, stampede incidents, building collapse and maoist attacks happened at different locations in India. The results show that the location-augmented feature sets significantly improve performance of location-sensitive disaster event detection as compared with traditional feature sets. The results also demonstrate that the context-based feature sets with location-augmentation are ranked higher than the context-free feature sets in MCDM analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2022
10. Automatic recommendation of feature selection algorithms based on dataset characteristics.
- Author
-
Parmezan, Antonio Rafael Sabino, Lee, Huei Diana, Spolaôr, Newton, and Wu, Feng Chung
- Subjects
-
*FEATURE selection, *ENGINEERING models, *MACHINE learning, *ALGORITHMS, *DATA mining, *FEATURE extraction
- Abstract
Feature selection in real-world data mining problems is essential to make the learning task efficient and more accurate. Identifying the best feature selection algorithm, among the many available, is a complex activity that still relies heavily on human experts or some random trial-and-error procedure. Thus, the automated machine learning community has taken some steps towards the automation of this process. In this paper, we address the metalearning challenge of recommending feature selection algorithms by proposing a novel meta-feature engineering model. Our model considers a broad collection of meta-features that enable the study of the relationship between the dataset properties and the feature selection algorithm performance in terms of several criteria. We arrange the input meta-features into eight categories: (i) simple, (ii) statistical, (iii) information-theoretical, (iv) complexity, (v) landmarking, (vi) based on symbolic models, (vii) based on images, and (viii) based on complex networks (graphs). The target meta-features emerge from a multi-criteria performance measure, based on five individual performance indexes, that assesses feature selection methods grounded in information, distance, dependence, consistency, and precision measures. We evaluate our proposal using a recently developed framework that extracts the input meta-features from 213 benchmark datasets, and ranks the assessed feature selection algorithms, to fill in the target meta-features in meta-bases. This evaluation uses five state-of-the-art classification methods to induce recommendation models from meta-bases: C4.5, Random Forest, XGBoost, ANN, and SVM. The results showed that it is possible to reach an average accuracy of up to 90% applying our meta-feature engineering model. This work is the first to use an extensive empirical evaluation to provide a careful discussion of the strengths and limitations of more than 160 meta-features. These meta-features, while designed to aid the task of feature selection algorithm recommendation, can readily be employed in other metalearning scenarios. Therefore, we believe our findings are a valuable contribution to the fields of automated machine learning and data mining, as well as to the feature extraction and pattern recognition communities.
• A novel meta-feature engineering model recommends feature selection algorithms.
• The proposal obtains promising results from 213 datasets with hit rates of up to 90%.
• Some simple, landmarking, image, and graph-based input meta-features highlighted.
• A multi-criteria performance measure rigorously assesses candidate algorithms.
• Chains of binary or multiclass classifiers can efficiently rank candidate algorithms.
[ABSTRACT FROM AUTHOR]
- Published
- 2021
11. Machine learning based very short term load forecasting of machine tools.
- Author
-
Dietrich, Bastian, Walther, Jessica, Weigold, Matthias, and Abele, Eberhard
- Subjects
-
*LOAD forecasting (Electric power systems), *MACHINE tools, *MACHINE learning, *LOADERS (Machines), *ELECTRIC power distribution grids, *ARTIFICIAL neural networks, *FEATURE selection
- Abstract
• Accurate transferrable machine learning based load forecasting of machine tools.
• Automated data preprocessing, feature construction and selection process.
• Time lag and moving average feature construction increases forecasting accuracy.
• Autocorrelation function is a valuable information source of feature construction.
With the ongoing integration of renewable energies into the electrical power grid, industrial energy flexibility gains importance. To enable demand response applications, knowledge about the future energy demand is necessary. This paper presents a machine learning process to forecast the very short term load of two machine tools, which can be utilized as a decision support basis for control schemes and measures to increase energy flexibility and decrease energy cost in manufacturing. The presented process is developed and evaluated on production machines in a research factory. The results indicate that the developed machine learning process is feasible and creates an accurate very short term load forecasting model for different production machines. It can be used as a blueprint to develop load forecasting models for other production machines using the historic load profile and various machine and process data. A combination of time series features and an Artificial Neural Network proves to be the most robust model regarding the presented machine tools with achieved coefficients of determination between 0.57 and 0.64 for a 100 step forecast. Improvements are still needed regarding the forecasting accuracy, especially of load peaks, for which different measures are proposed. [ABSTRACT FROM AUTHOR]
- Published
- 2020