1,044 results
Search Results
2. Attribute-based quality classification of academic papers.
- Author
- Nakatoh, Tetsuya, Hirokawa, Sachio, Minami, Toshiro, Nanri, Takeshi, and Funamori, Miho
- Abstract
Investigating the relevant literature is very important for research activities. However, it is difficult to select the most appropriate and important academic papers from the enormous number published annually. Researchers search paper databases by combining keywords, and then select papers to read using some evaluation measure, often citation count. However, the citation count of recently published papers tends to be very small, because citation count measures accumulated importance. This paper focuses on the possibility of classifying high-quality papers using superficial attributes such as publication year, publisher, and words in the abstract. To examine this idea, we construct classifiers by applying machine-learning algorithms and evaluate these classifiers using cross-validation. The results show that our approach effectively finds high-quality papers. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
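The attribute-based classification the abstract describes could be sketched with scikit-learn as below. All data, feature choices, and the "high quality" label rule here are illustrative assumptions, not the authors' setup: superficial attributes (year, publisher, abstract words) are encoded, a classifier is trained, and cross-validation scores it.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 80
papers = pd.DataFrame({
    "year": rng.integers(2010, 2019, n),
    "publisher": rng.choice(["PubA", "PubB", "PubC"], n),
    "abstract": rng.choice(["novel deep model evaluation",
                            "survey of existing methods",
                            "benchmark study with baselines",
                            "short technical note"], n),
})
# Hypothetical "high quality" label with a purely superficial signal in it.
y = ((papers["publisher"] == "PubA")
     | papers["abstract"].str.contains("benchmark")).astype(int).to_numpy()

pre = ColumnTransformer([
    ("year", "passthrough", ["year"]),        # publication year as-is
    ("pub", OneHotEncoder(), ["publisher"]),  # publisher, one-hot encoded
    ("text", TfidfVectorizer(), "abstract"),  # words in the abstract
])
clf = Pipeline([("prep", pre), ("lr", LogisticRegression(max_iter=1000))])

scores = cross_val_score(clf, papers, y, cv=5)  # cross-validated evaluation
print(f"mean CV accuracy: {scores.mean():.3f}")
```

The point of the sketch is the shape of the pipeline, not the numbers: any classifier evaluated this way sees only surface attributes, never citation counts.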
3. How to read and review papers on machine learning and artificial intelligence in radiology: a survival guide to key methodological concepts.
- Author
- Kocak, Burak, Kus, Ece Ates, and Kilickesmez, Ozgur
- Subjects
- ARTIFICIAL intelligence, MACHINE learning, FEATURE selection, INFORMATION sharing, RADIOLOGY
- Abstract
In recent years, there has been a dramatic increase in research papers about machine learning (ML) and artificial intelligence in radiology. With so many papers around, it is of paramount importance to make a proper scientific quality assessment as to their validity, reliability, effectiveness, and clinical applicability. Due to methodological complexity, the papers on ML in radiology are often hard to evaluate, requiring a good understanding of key methodological issues. In this review, we aimed to guide the radiology community about key methodological aspects of ML to improve their academic reading and peer-review experience. Key aspects of the ML pipeline were presented within four broad categories: study design, data handling, modelling, and reporting. Sixteen key methodological items and related common pitfalls were reviewed with a fresh perspective: database size, robustness of reference standard, information leakage, feature scaling, reliability of features, high dimensionality, perturbations in feature selection, class balance, bias-variance trade-off, hyperparameter tuning, performance metrics, generalisability, clinical utility, comparison with traditional tools, data sharing, and transparent reporting. Key Points:
- Machine learning is new and rather complex for the radiology community.
- Validity, reliability, effectiveness, and clinical applicability of studies on machine learning can be evaluated with a proper understanding of key methodological concepts about study design, data handling, modelling, and reporting.
- Understanding key methodological concepts will provide a better academic reading and peer-review experience for the radiology community. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
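One pitfall this review names, information leakage via feature scaling, has a standard remedy: fit the scaler inside each cross-validation fold rather than on the full dataset. A minimal scikit-learn sketch (synthetic tabular data, not radiology images):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Leaky: scaling the full dataset first lets test-fold statistics
# (mean/std) influence the training folds.
# X_scaled = StandardScaler().fit_transform(X)  # then cross-validate on X_scaled

# Leakage-free: the Pipeline refits the scaler on each training fold only.
model = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(model, X, y, cv=5)
print(f"leakage-free CV accuracy: {scores.mean():.3f}")
```

The same pattern applies to any fold-dependent preprocessing step (feature selection included), which is exactly why the review lists leakage and feature scaling together.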
4. Feature engineering of EEG applied to mental disorders: a systematic mapping study.
- Author
- García-Ponsoda, Sandra, García-Carrasco, Jorge, Teruel, Miguel A., Maté, Alejandro, and Trujillo, Juan
- Subjects
- MENTAL illness, MACHINE learning, ELECTROENCEPHALOGRAPHY, ARTIFICIAL intelligence, ENGINEERING
- Abstract
Around a third of the total population of Europe suffers from mental disorders. The use of electroencephalography (EEG) together with Machine Learning (ML) algorithms to diagnose mental disorders has recently been shown to be a prominent research area, as exposed by several reviews focused on the field. Nevertheless, prior to applying ML algorithms, EEG data must be correctly preprocessed and prepared via Feature Engineering (FE). In fact, the choice of FE techniques can make the difference between an unusable ML model and a simple, effective model. In other words, it can be said that FE is crucial, especially when using complex, non-stationary data such as EEG. To this end, in this paper we present a Systematic Mapping Study (SMS) focused on FE from EEG data used to identify mental disorders. Our SMS covers more than 900 papers, making it one of the most comprehensive to date, to the best of our knowledge. We gathered the mental disorder addressed, all the FE techniques used, and the Artificial Intelligence (AI) algorithm applied for classification from each paper. Our main contributions are: (i) we offer a starting point for new researchers on these topics, (ii) we extract the most used FE techniques to classify mental disorders, (iii) we show several graphical distributions of all used techniques, and (iv) we provide critical conclusions for detecting mental disorders. To provide a better overview of existing techniques, the FE process is divided into three parts: (i) signal transformation, (ii) feature extraction, and (iii) feature selection. Moreover, we classify and analyze the distribution of existing papers according to the mental disorder they treat, the FE processes used, and the ML techniques applied. As a result, we provide a valuable reference for the scientific community to identify which techniques have been proven and tested and where the gaps are located in the current state of the art. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
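The three-stage FE process the abstract describes (signal transformation, feature extraction, feature selection) can be illustrated on a synthetic one-channel signal. The band definitions are the standard clinical ones; the signal, sampling rate, and pipeline shape are illustrative assumptions, not taken from the study:

```python
import numpy as np
from scipy.signal import welch

fs = 256  # sampling rate in Hz (assumed)
rng = np.random.default_rng(1)
t = np.arange(0, 4, 1 / fs)
# Synthetic one-channel "EEG": a 10 Hz alpha rhythm plus noise.
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

# (i) signal transformation: power spectral density via Welch's method
freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)

# (ii) feature extraction: average power per clinical frequency band
bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
features = {name: psd[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in bands.items()}
# (iii) feature selection would then rank such features (e.g. with a
# mutual-information filter) before an ML classifier sees them.
print(features)
```

For this signal the alpha-band power dominates, which is the kind of per-band feature vector most EEG classifiers in the surveyed papers consume.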
5. Comprehensive review of solar radiation modeling based on artificial intelligence and optimization techniques: future concerns and considerations.
- Author
- Attar, Nasrin Fathollahzadeh, Sattari, Mohammad Taghi, Prasad, Ramendra, and Apaydin, Halit
- Subjects
- SOLAR radiation, ARTIFICIAL intelligence, MATHEMATICAL optimization, RENEWABLE energy sources, SOLAR energy, FEATURE selection
- Abstract
Solar energy is one of the most important renewable resources. Reliable solar radiation prediction is essential for various applications in agriculture, industry, transport, and the environment, because solar power reduces greenhouse gas emissions and is environmentally friendly. Solar radiation data series have embedded fluctuations and noise signals due to complexity, stochasticity, non-stationarity, and nonlinearity with an uncertain and time-varying nature. Aside from being highly nonlinear, solar radiation is highly influenced by environmental parameters such as air temperature, cloud cover, surface reflectivity, and aerosols. In addition, spatial measurements of these variables are not readily available. To tackle these challenges, it is necessary to consider data preprocessing techniques and to develop and test precise solar radiation prediction models at different forecast horizons. There is, however, controversy regarding the performance of such models across studies, and comparisons among the different studies have not been conducted systematically. Using a critical literature review, the authors hope to resolve this controversy and believe that further investigation of solar radiation can benefit researchers and practitioners alike. This study presents a comprehensive evaluation of solar radiation modeling using artificial intelligence in the last 15 years and provides a novel detailed analysis of the available models. Studies conducted in different climates of the world and published in distinguished journals were considered (90 papers in total) for this purpose. Newly discovered procedures for optimizing forecasts, data cleaning, feature selection, classification methods, and stand-alone or hybrid data-driven models for solar radiation prediction and modeling were evaluated.
The results strikingly showed that the most used artificial intelligence methods were artificial neural networks, adaptive neuro-fuzzy inference systems, and the decision tree family of models. In addition, the extreme learning machine, support vector machine, and particle swarm optimization were the most used optimization techniques in solar radiation modeling. In terms of forecast horizons, the most common forecast horizon found in the papers was the daily scale (51% of studies), followed by the hourly scale (26%), with the monthly scale the least common (18%). Regionally, the highest number of solar radiation papers originated from Asia, with Europe in second place and African countries in third. The number of papers increased from 2011 to 2015, and a second peak began in 2018 and continues to the present. Under each section, a summary of findings is provided. The paper concludes with future thoughts and directions on solar radiation modeling. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
6. Application of machine learning in predicting mechanical properties of sandcrete blocks made from quarry dust: a review.
- Author
- Braimah, John Igeimokhia, Ajagbe, Wasiu Olabamiji, and Olonade, Kolawole Adisa
- Subjects
- SUSTAINABILITY, FEATURE selection, MATERIALS science, QUARRIES & quarrying, MACHINE learning
- Abstract
Quarry dust, conventionally considered waste, has emerged as a potential solution for sustainable construction materials. This paper comprehensively reviews the mechanical properties of blocks manufactured from quarry dust, with a particular focus on the transformative role of machine learning (ML) in predicting and optimizing these properties. By systematically reviewing existing literature and case studies, this paper evaluates the efficacy of ML methodologies, addressing challenges related to data quality, feature selection, and model optimization. It underscores how ML can enhance accuracy in predicting mechanical properties, providing a valuable tool for engineers and researchers to optimize the design and composition of blocks made from quarry dust. This synthesis of mechanical properties and ML applications contributes to advancing sustainable construction practices, offering insights into the future integration of technology for predictive modeling in material science. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Review of islanding detection using advanced signal processing techniques.
- Author
- Vadlamudi, Bindu and Anuradha, T.
- Subjects
- DEEP learning, SIGNAL processing, MACHINE learning, DISTRIBUTED power generation, FEATURE selection, RELIABILITY in engineering
- Abstract
Increasing integration of distributed generation (DG) into distribution networks provides many technological benefits, including improved system security, performance, and reliability. The intermittent nature of renewable DGs poses certain difficulties for this integration, and large-scale integration of DGs can lead to islanding conditions in the power system. In islanded operation, the microgrid keeps injecting power into the network. An islanding event can occur intentionally or unintentionally; the former is controllable and required for maintaining the main utility, whereas the latter is uncontrollable and caused by ordinary faults. Islanding detection is therefore important for ensuring the system's reliability and operation. Hence, this paper presents a comprehensive review of methods for islanding detection in the power system. Unlike previous review papers, this review focuses on the different types of islanding detection methods, including active, passive, hybrid active–passive, deep learning, and machine learning techniques. All of these methods have their advantages and disadvantages. Moreover, the complexity of power systems increases with growing penetration of DGs, so rapid islanding detection techniques are necessary for improving system performance. The review also covers recent literature on signal processing, including recent feature selection techniques and advances in detecting islanding occurrences. The comparative analysis shows that the non-detection zone (NDZ) of passive methods is larger than that of active and hybrid active–passive methods. Remote islanding detection methods are NDZ-free but computationally more complex than the other methods. Moreover, DL-based methods have higher computational times due to their large training and testing data.
Hybrid methods are found to be the most feasible for providing accurate islanding detection results. In addition, a feasible and economical solution reflecting recent research trends is provided. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Machine learning in crime prediction.
- Author
- Jenga, Karabo, Catal, Cagatay, and Kar, Gorkem
- Abstract
Predicting crimes before they occur can save lives and prevent losses of property. With the help of machine learning, many researchers have studied crime prediction extensively. In this paper, we evaluate state-of-the-art crime prediction techniques from the last decade, discuss possible challenges, and provide a discussion of future work that could be conducted in the field of crime prediction. Although many works aim to predict crimes, the datasets and methods they apply are numerous. Using a Systematic Literature Review (SLR) methodology, we aim to collect and synthesize the required knowledge regarding machine learning-based crime prediction and help both law enforcement authorities and scientists to mitigate and prevent future crime occurrences. We focus primarily on 68 selected machine learning papers that predict crime. We formulate eight research questions and observe that the majority of the papers used a supervised machine learning approach, which assumes prior labeled data; however, in some real-world scenarios no labeled data is available. We also discuss the main challenges the researchers encountered while conducting their studies. We consider that this research paves the way for further work to help governments and countries fight crime and reduce it for better safety and security. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
9. Successful intrusion detection with a single deep autoencoder: theory and practice.
- Author
- Catillo, Marta, Pecchia, Antonio, and Villano, Umberto
- Subjects
- INTRUSION detection systems (Computer security), PATTERN recognition systems, COMPUTER security, FEATURE selection, MACHINE learning, THEORY-practice relationship
- Abstract
Intrusion detection is a key topic in computer security. Due to the ever-increasing number of network attacks, several accurate anomaly-based techniques have been proposed for intrusion detection, wherein pattern recognition through machine learning is typically used. Many proposals rely on autoencoders, due to their capability to analyze complex, high-dimensional, and large-scale data. They capitalize on composite architectures and accurate learning approaches, possibly in combination with sophisticated feature selection techniques. However, due to their high complexity and the lack of transferability of their impressive intrusion detection results, they are hardly ever used in production environments. This paper is developed around the intuition that such complexity is not necessarily justified, because a single autoencoder is enough to obtain similar, if not better, intrusion detection results compared to related proposals. The wide-ranging study presented here addresses the effect of the random seed, provides a deep investigation of the training loss, and examines feature selection across different hardware platforms. The best practices presented, regarding set-up and training, threshold setting, and the possible use of feature selection techniques for performance improvement, can be valuable for any future work on the use of autoencoders for successful intrusion detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
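The single-autoencoder recipe the abstract argues for (train on benign traffic only, then flag records whose reconstruction error exceeds a threshold set on the training data) can be sketched as below. scikit-learn's MLPRegressor stands in for a deep autoencoder here, and the 2-D data, bottleneck size, and 99th-percentile threshold are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Benign traffic lives near the line x2 = x1 (a 1-D manifold in 2-D).
x = rng.standard_normal((300, 1))
X_benign = np.hstack([x, x]) + 0.05 * rng.standard_normal((300, 2))

# A 1-unit bottleneck forces the network to learn that manifold.
ae = MLPRegressor(hidden_layer_sizes=(1,), activation="tanh",
                  max_iter=3000, random_state=0)
ae.fit(X_benign, X_benign)  # autoencoder: learn to reconstruct the input

def recon_error(X):
    # Per-record mean squared reconstruction error.
    return ((ae.predict(X) - X) ** 2).mean(axis=1)

# Threshold = a high percentile of the benign reconstruction error.
threshold = np.percentile(recon_error(X_benign), 99)
attack = np.array([[2.0, -2.0]])  # far from the benign manifold
print(recon_error(attack)[0] > threshold)
```

Anything the autoencoder cannot reconstruct well, because it never saw such patterns in benign training data, is flagged as an intrusion, which is the whole mechanism the paper claims a single model already delivers.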
10. Applications of deep learning for mobile malware detection: A systematic literature review.
- Author
- Catal, Cagatay, Giray, Görkem, and Tekinerdogan, Bedir
- Subjects
- FEATURE selection, DEEP learning, SUPERVISED learning, MALWARE, CONVOLUTIONAL neural networks, MACHINE learning
- Abstract
For detecting and resolving the various types of malware, novel techniques have been proposed, among which deep learning (DL) algorithms play a crucial role. Although there has been a lot of research on the development of DL-based mobile malware detection approaches, they had not yet been reviewed in detail. This paper aims to identify, assess, and synthesize the reported articles related to the application of DL techniques for mobile malware detection. A Systematic Literature Review (SLR) is performed in which we selected 40 journal articles for in-depth analysis. This SLR presents and categorizes these articles based on machine learning categories, data sources, DL algorithms, evaluation parameters & approaches, feature selection techniques, datasets, and DL implementation platforms. The study also highlights the challenges, proposed solutions, and future research directions on the use of DL in mobile malware detection. This study showed that Convolutional Neural Network and Deep Neural Network algorithms are the most used DL algorithms. API calls, permissions, and system calls are the most dominant features utilized. Keras and TensorFlow are the most popular platforms. Drebin and VirusShare are the most widely used datasets. Supervised learning and static features are the most preferred machine learning and data source categories. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
11. A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications.
- Author
- Uddin, Islam, Awan, Hamid Hussain, Khalid, Majdi, Khan, Salman, Akbar, Shahid, Sarker, Mahidur R., Abdolrasol, Maher G. M., and Alghamdi, Thamer A. H.
- Subjects
- MACHINE learning, RNA modification & restriction, BOOSTING algorithms, RNA analysis, FEATURE selection, CYTOSINE
- Abstract
RNA modifications play an important role in cellular regulation mechanisms, linking RNA to gene expression and protein production. RNA modifications comprise numerous alterations, offering broad glimpses of RNA's operations and character. The modification produced by TET-enzyme oxidation is the crucial change associated with cytosine hydroxymethylation, and its effect is an alteration in specific biochemical pathways of the organism, such as gene expression and epigenetic changes. Traditional laboratory methods that identify 5-hydroxymethylcytosine (5hmC) samples are expensive and time-consuming compared to other methods. To address this challenge, this paper proposes XGB5hmC, a machine learning model based on a robust gradient boosting algorithm (XGBoost) with different residue-based formulation methods to identify 5hmC samples. Their results were amalgamated, and six different frequency residue-based encoding features were fused to form a hybrid vector in order to enhance model discrimination capabilities. In addition, the proposed model incorporates SHAP (Shapley Additive Explanations)-based feature selection to provide model interpretability by highlighting the highest-contributing features. Among the applied machine learning algorithms, the XGBoost ensemble model achieved better results than existing state-of-the-art models using the tenfold cross-validation test. Our model reported an accuracy of 89.97%, sensitivity of 87.78%, specificity of 94.45%, F1-score of 0.8934, and MCC of 0.8764. This study highlights the potential to provide valuable insights for enhancing medical assessment and treatment protocols, representing a significant advancement in RNA modification analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
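The core loop the abstract describes, encode sequences with residue frequencies, train a gradient-boosted ensemble, and evaluate with tenfold cross-validation, can be sketched as below. scikit-learn's GradientBoostingClassifier is a stand-in for XGBoost, the sequences are synthetic, and mono-nucleotide composition is just one simple residue-frequency encoding (the paper fuses six):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def composition(seq):
    # Simplest residue-frequency encoding: fraction of each nucleotide.
    return [seq.count(b) / len(seq) for b in "ACGU"]

def random_seq(c_bias):
    # Sequence of 41 residues; c_bias shifts probability mass toward C.
    probs = [0.25, 0.25 + c_bias, 0.25, 0.25 - c_bias]
    return "".join(rng.choice(list("ACGU"), size=41, p=probs))

# Purely synthetic: positive (5hmC-like) sequences are made slightly C-rich.
pos = [random_seq(0.10) for _ in range(100)]
neg = [random_seq(0.00) for _ in range(100)]
X = np.array([composition(s) for s in pos + neg])
y = np.array([1] * 100 + [0] * 100)

clf = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(clf, X, y, cv=10)  # tenfold cross-validation
print(f"mean CV accuracy: {scores.mean():.3f}")
```

A real reproduction would swap in `xgboost.XGBClassifier`, the fused six-encoding hybrid vector, and SHAP-based feature selection, but the evaluation scaffold stays the same.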
12. Timely detection of DDoS attacks in IoT with dimensionality reduction.
- Author
- Kumari, Pooja and Jain, Ankit Kumar
- Subjects
- MACHINE learning, DENIAL of service attacks, FISHER discriminant analysis, FEATURE selection, PRINCIPAL components analysis
- Abstract
The exponential growth of IoT devices and their interdependency makes the technology more vulnerable to network attacks like Distributed Denial of Service (DDoS) that interrupt network resources. The prevalence of these attacks necessitates the development of robust and effective defense mechanisms. In recent years, many machine learning defense methodologies have been developed to address the ubiquitous growth of DDoS attacks on IoT, and the majority of them suffer from detection-time delay issues. Thus, the paper presents an approach focusing on dimensionality reduction and feature selection techniques to minimize detection time without compromising accuracy. The proposed approach uses Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Factor Analysis, and Recursive Feature Elimination with Cross Validation (RFECV) as the dimensionality reduction and feature selection techniques, and Gaussian Naïve Bayes (GNB), Decision Tree (DT), Random Forest (RF), AdaBoost, and Logistic Regression (LR) as machine learning models to classify the malicious traffic. The approach provides a reliable DDoS detection model that effectively reduces detection delay by combining GNB with LDA, achieving 99.98% accuracy with a detection time of 0.582 s. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
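The winning GNB-with-LDA combination reported in the abstract is straightforward to sketch in scikit-learn. The data below is synthetic (not an IoT/DDoS dataset) and the split is illustrative; the point is why the pairing is fast: LDA projects the features down to at most one dimension for two classes, so the downstream Gaussian Naive Bayes scores tiny inputs.

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for IoT traffic records (benign vs DDoS-like).
X, y = make_classification(n_samples=5000, n_features=30, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# LDA reduces 30 features to a single discriminant axis (n_classes - 1),
# then GNB classifies in that 1-D space.
model = make_pipeline(LinearDiscriminantAnalysis(), GaussianNB())
model.fit(X_tr, y_tr)

start = time.perf_counter()
acc = model.score(X_te, y_te)
elapsed = time.perf_counter() - start
print(f"accuracy={acc:.4f}, detection time={elapsed:.3f}s")
```

Timing only the scoring pass mirrors the paper's framing: training can be slow offline, but detection on live traffic must be fast.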
13. Unsupervised text feature selection by binary fire hawk optimizer for text clustering.
- Author
- Msallam, Mohammed M. and Bin Idris, Syahril Anuar
- Subjects
- FEATURE selection, DOCUMENT clustering, K-means clustering, METAHEURISTIC algorithms, MACHINE learning
- Abstract
Feature selection plays a critical role in reducing high-dimensional feature space in machine learning applications without affecting the accuracy of performance. The feature selection problem has been extensively studied in the literature. Nevertheless, few studies have been conducted on unsupervised text feature selection because of the absence of feature class labels and local optimization limitations. To that end, this paper proposes three binary versions of the fire hawk optimizer, based on different transfer functions, for unsupervised text feature selection that can be used to select the most informative features for text clustering. The internal feature subset was evaluated using a mean absolute difference filter. The performance of the proposed methods is tested and compared with other state-of-the-art metaheuristic algorithms on several benchmark text datasets from different sources using various evaluation metrics. K-means clustering is applied to cluster documents based on the features selected by the methods. The results of the experiments indicate the effectiveness of the proposed method based on the S-shaped transfer function for improving the performance of document clustering compared with other feature selection methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
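The S-shaped transfer function mentioned in the abstract is the standard device for making a continuous metaheuristic binary: each continuous position component is squashed through a sigmoid into a probability of selecting that feature. The sketch below shows only this generic binarization step, not the fire hawk optimizer's actual update rules; the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def s_shaped(v):
    """S-shaped transfer function (sigmoid): maps a continuous position
    component to a probability of the corresponding bit being 1."""
    return 1.0 / (1.0 + np.exp(-v))

def binarize(positions):
    """Turn continuous search-agent positions into 0/1 feature masks."""
    probs = s_shaped(positions)
    return (rng.random(positions.shape) < probs).astype(int)

# Ten search agents over 15 candidate text features (illustrative sizes).
continuous = rng.normal(0, 2, size=(10, 15))
masks = binarize(continuous)
print(masks[0])  # a 0/1 mask selecting one candidate feature subset
```

Each mask would then be scored without labels, e.g. by a mean absolute difference filter on the selected columns, and the best-scoring subsets fed to K-means for clustering.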
14. FSSDroid: Feature subset selection for Android malware detection.
- Author
- Polatidis, Nikolaos, Kapetanakis, Stelios, Trovati, Marcello, Korkontzelos, Ioannis, and Manolopoulos, Yannis
- Subjects
- FEATURE selection, SUBSET selection, DATA security, MALWARE
- Abstract
Android malware has become an increasingly important threat to individuals, organizations, and society, posing significant risks to data security, privacy, and infrastructure. As malware evolves in sophistication and complexity, detecting and mitigating these malicious software instances has become more challenging and time-consuming, since the number of features required to identify potential malware can be very high. To address this issue, we have developed an effective feature selection methodology for malware detection in Android. A critical concern in the field of malware detection is the complexity of the algorithms and of the features used to detect malware. The present paper delivers a methodology for pre-processing datasets to select the optimal features for detecting malware while maintaining very high accuracy. The proposed methodology has been tested on two real-world datasets, and the results indicate that the number of features is significantly reduced from 489 to between 19 and 28 for the first dataset, and from 9503 to between 9 and 27 for the second dataset, while accuracy is maintained as if all features were used. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
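The headline claim, shrinking 489 features to a few dozen while keeping accuracy, can be illustrated with a generic filter-based subset selection. The chi-squared filter below is a stand-in (the paper's own pre-processing methodology is not specified in the abstract), and the binary "permission-style" features are synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in: 489 binary "permission/API" features, few informative.
X, y = make_classification(n_samples=600, n_features=489, n_informative=15,
                           n_clusters_per_class=1, random_state=0)
X = (X > 0).astype(int)  # binarize, like presence/absence of a permission

rf = RandomForestClassifier(random_state=0)
full = cross_val_score(rf, X, y, cv=5).mean()

# Keep only the 25 features most associated with the label (chi-squared).
selected = make_pipeline(SelectKBest(chi2, k=25),
                         RandomForestClassifier(random_state=0))
reduced = cross_val_score(selected, X, y, cv=5).mean()
print(f"all 489 features: {full:.3f}; best 25 features: {reduced:.3f}")
```

Putting the selector inside the pipeline means it is refit per fold, so the reported reduced-set accuracy is not inflated by selection leakage.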
15. Toward a globally lunar calendar: a machine learning-driven approach for crescent moon visibility prediction.
- Author
- Loucif, Samia, Al-Rajab, Murad, Abu Zitar, Raed, and Rezk, Mahmoud
- Subjects
- MACHINE learning, LUNAR calendar, FEATURE selection, ASTRONOMY, MOON
- Abstract
This paper presents a comprehensive approach to harmonizing lunar calendars across different global regions, addressing the long-standing challenge of variations in new crescent Moon sightings that mark the beginning of lunar months. We propose a machine learning (ML)-based framework to predict the visibility of the new crescent Moon, representing a significant advancement toward a globally unified lunar calendar. Our study utilized a dataset covering various countries globally, making it the first to analyze all 12 lunar months over a span of 13 years. We applied a wide array of ML algorithms and techniques, including feature selection, hyperparameter tuning, ensemble learning, and region-based clustering, all aimed at maximizing the model's performance. The overall results reveal that the gradient boosting (GB) model surpasses all other models, achieving the highest F1 score of 0.882469 and an area under the curve (AUC) of 0.901009. However, with selected features identified through the ANOVA F-test and optimized parameters, the Extra Trees model exhibited the best performance, with an F1 score of 0.887872 and an AUC of 0.906242. We expanded our analysis to explore ensemble models, aiming to understand how a combination of models might boost predictive accuracy. The ensemble model exhibited a slight improvement, with an F1 score of 0.888058 and an AUC of 0.907482. Additionally, geographical segmentation of the dataset enhanced predictive performance in certain areas, such as Africa and Asia. In conclusion, ML techniques can provide an efficient and reliable tool for predicting new crescent Moon visibility that would support decisions on marking the beginning of new lunar months. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
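The best-performing recipe the abstract reports, ANOVA F-test feature selection feeding an Extra Trees classifier scored by F1 and AUC, can be sketched as below. The data is synthetic (not real crescent-visibility records), and the feature count, `k`, and split are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for visibility records (moon age, lag time, etc.).
X, y = make_classification(n_samples=1000, n_features=12, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# The ANOVA F-test keeps the k features whose class means differ the most.
model = make_pipeline(SelectKBest(f_classif, k=6),
                      ExtraTreesClassifier(random_state=0))
model.fit(X_tr, y_tr)

f1 = f1_score(y_te, model.predict(X_te))
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"F1={f1:.3f}, AUC={auc:.3f}")
```

Reporting both F1 and AUC, as the paper does, matters because visibility labels can be imbalanced across regions and months.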
16. Utilizing machine learning to analyze trunk movement patterns in women with postpartum low back pain.
- Author
- A. Abdel Hady, Doaa and Abd El-Hafeez, Tarek
- Subjects
- LUMBAR pain, RECEIVER operating characteristic curves, FEATURE selection, MACHINE learning, RANGE of motion of joints
- Abstract
This paper presents an analysis of trunk movement in women with postnatal low back pain using machine learning techniques. The study aims to identify the most important features related to low back pain and to develop accurate models for predicting low back pain. Machine learning approaches showed promise for analyzing biomechanical factors related to postnatal low back pain (LBP). This study applied regression and classification algorithms to the trunk movement proposed dataset from 100 postpartum women, 50 with LBP and 50 without. The Optimized optuna Regressor achieved the best regression performance with a mean squared error (MSE) of 0.000273, mean absolute error (MAE) of 0.0039, and R2 score of 0.9968. In classification, the Basic CNN and Random Forest Classifier both attained near-perfect accuracy of 1.0, the area under the receiver operating characteristic curve (AUC) of 1.0, precision of 1.0, recall of 1.0, and F1-score of 1.0, outperforming other models. Key predictive features included pain (correlation of -0.732 with flexion range of motion), range of motion measures (flexion and extension correlation of 0.662), and average movements (correlation of 0.957 with flexion). Feature selection consistently identified pain, flexion, extension, lateral flexion, and average movement as influential across methods. While limited to this initial dataset and constrained by generalizability, machine learning offered quantitative insight. Models accurately regressed (MSE < 0.01, R2 > 0.95) and classified (accuracy > 0.94) trunk biomechanics distinguishing LBP. Incorporating additional demographic, clinical, and patient-reported factors may enhance individualized risk prediction and treatment personalization. This preliminary application of advanced analytics supported machine learning's potential utility for both LBP risk determination and outcome improvement. 
This study provides valuable insights into the use of machine learning techniques for analyzing trunk movement in women with postnatal low back pain and can potentially inform the development of more effective treatments. Trial registration: The trial was designed as an observational, cross-sectional study. The study was approved by the Ethical Committee of Deraya University, Faculty of Pharmacy (No: 10/2023). In accordance with the ethical standards of the Declaration of Helsinki, this study complies with the principles of human research. Each patient signed a written consent form after being given a thorough description of the trial. The study was conducted at the outpatient clinic from February 2023 until June 30, 2023. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Ensemble multi-view feature set partitioning method for effective multi-view learning.
- Author
- Singh, Ritika and Kumar, Vipin
- Subjects
- FEATURE selection, STATISTICS
- Abstract
Multi-view learning consistently outperforms traditional single-view learning by leveraging multiple perspectives of data. However, the effectiveness of multi-view learning heavily relies on how the data are partitioned into feature sets. In many cases, different datasets may require different partitioning methods to capture their unique characteristics, making a single partitioning method insufficient. Finding an optimal feature set partitioning (FSP) for each dataset can be time-consuming, and the optimal FSP may still not be sufficient for all types of datasets. Therefore, the paper presents a novel approach called ensemble multi-view feature set partitioning (EMvFSP) to improve the performance of multi-view learning, a technique that uses multiple data sources to make predictions. The proposed EMvFSP method combines the different views produced by multiple partitioning methods to achieve better classification performance than any single partitioning method alone. The experiments were conducted on 15 structured datasets with varying ratios of samples, features, and labels, and the results showed that the proposed EMvFSP method effectively improved classification performance. The paper also includes statistical analyses using Friedman ranking and Holm's procedure to demonstrate the effectiveness of the proposed method. This approach provides a robust solution for multi-view learning that can adapt to different types of datasets and partitioning methods, making it suitable for a wide range of applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
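The general idea behind ensembling feature set partitionings (not the authors' exact EMvFSP algorithm) can be sketched as: split the features into views under several partitioning schemes, train one learner per view, and combine all views by voting. The two schemes, data, and base learner below are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

X, y = make_classification(n_samples=400, n_features=12, random_state=0)

def view(cols):
    # A "view" = a learner restricted to one feature subset.
    pick = FunctionTransformer(lambda A, c=tuple(cols): A[:, list(c)])
    return make_pipeline(pick, LogisticRegression(max_iter=500))

# Two illustrative partitioning schemes: contiguous blocks vs interleaved.
blocks = [list(range(0, 6)), list(range(6, 12))]
stripes = [list(range(0, 12, 2)), list(range(1, 12, 2))]
views = [(f"v{i}", view(c)) for i, c in enumerate(blocks + stripes)]

# Soft voting averages the per-view class probabilities.
ensemble = VotingClassifier(views, voting="soft")
acc = cross_val_score(ensemble, X, y, cv=5).mean()
print(f"ensemble-of-partitionings CV accuracy: {acc:.3f}")
```

Because no single partitioning has to be the right one, views that happen to isolate informative features dominate the averaged vote, which is the intuition the paper formalizes.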
18. Sparse semi-supervised multi-label feature selection based on latent representation.
- Author
- Zhao, Xue, Li, Qiaoyan, Xing, Zhiwei, Yang, Xiaofei, and Dai, Xuezhen
- Subjects
FEATURE selection, MACHINE learning, DATA mining, SUPERVISED learning - Abstract
With the rapid development of the Internet, there is a large amount of high-dimensional multi-label data to be processed in real life. To save resources and time, semi-supervised multi-label feature selection, as a dimension reduction method, has been widely used in many machine learning and data mining tasks. In this paper, we design a new semi-supervised multi-label feature selection algorithm. First, we construct an initial similarity matrix with supervised information by considering the similarity between labels, so as to learn a more ideal similarity matrix, which can better guide feature selection. By combining latent representation with semi-supervised information, a more ideal pseudo-label matrix is learned. Second, the local manifold structure of the original data space is preserved by a graph-based manifold regularization term. Finally, an effective alternating iterative updating algorithm is applied to optimize the proposed model, and the experimental results on several datasets prove the effectiveness of the approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
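Objectives of the family the abstract above describes — a pseudo-label fit, a graph-based manifold term, and a sparsity-inducing norm for feature selection — commonly take a form like the following. This is a generic sketch of that family, not the authors' exact model:

```latex
\min_{W,\,F}\;\; \|X^{\top} W - F\|_F^2
  \;+\; \alpha\,\mathrm{Tr}\!\left(F^{\top} L F\right)
  \;+\; \beta\,\|W\|_{2,1}
```

Here $W$ maps features to the pseudo-label matrix $F$, $L$ is the graph Laplacian of the learned similarity matrix (the trace term keeps pseudo-labels of nearby samples close, preserving the local manifold structure), and the $\ell_{2,1}$ norm zeroes out whole rows of $W$, which is what performs the feature selection.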
19. A critical systematic review on spectral-based soil nutrient prediction using machine learning.
- Author
-
Jain, Shagun, Sethia, Divyashikha, and Tiwari, Kailash Chandra
- Subjects
SUSTAINABLE agriculture, AGRICULTURE, MACHINE learning, FEATURE selection, DEEP learning, MULTISPECTRAL imaging - Abstract
The United Nations (UN) emphasizes the pivotal role of sustainable agriculture in addressing persistent starvation and working towards zero hunger by 2030 through global development. Intensive agricultural practices have adversely impacted soil quality, necessitating soil nutrient analysis for enhancing farm productivity and environmental sustainability. Researchers increasingly turn to Artificial Intelligence (AI) techniques to improve crop yield estimation and optimize soil nutrition management. This study reviews 155 papers published from 2014 to 2024, assessing the use of machine learning (ML) and deep learning (DL) in predicting soil nutrients. It highlights the potential of hyperspectral and multispectral sensors, which enable precise nutrient identification through spectral analysis across multiple bands. The study underscores the importance of feature selection techniques to improve model performance by eliminating redundant spectral bands with weak correlations to targeted nutrients. Additionally, the use of spectral indices, derived from mathematical ratios of spectral bands based on absorption spectra, is examined for its effectiveness in accurately predicting soil nutrient levels. By evaluating various performance measures and datasets related to soil nutrient prediction, this paper offers comprehensive insights into the applicability of AI techniques in optimizing soil nutrition management. The insights gained from this review can inform future research and policy decisions to achieve global development goals and promote environmental sustainability. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Classification and prediction of drought and salinity stress tolerance in barley using GenPhenML.
- Author
-
Akbari, Mahjoubeh, Sabouri, Hossein, Sajadi, Sayed Javad, Yarahmadi, Saeed, and Ahangar, Leila
- Subjects
ARTIFICIAL neural networks, MACHINE learning, FEATURE selection, ARTIFICIAL intelligence, EVIDENCE gaps, BARLEY - Abstract
Genetic and agronomic advances consistently lead to an annual increase in global barley yield. Since abiotic stresses (physical environmental factors that negatively affect plant growth) reduce barley yield, it is necessary to predict barley resistance. Artificial intelligence and machine learning (ML) models are new and powerful tools for predicting product resilience. Considering the research gap in the use of molecular markers in predicting abiotic stresses, this paper introduces a new approach called GenPhenML that combines molecular markers and phenotypic traits to predict the resistance of barley genotypes to drought and salinity stresses by ML models. GenPhenML uses feature selection algorithms to determine the most important molecular markers. It then identifies the best model, namely the one that predicts abiotic stress resistance with lower MAE and RMSE and higher R2. The results showed that GenPhenML with a neural network model predicted the salinity stress resistance score with MAE, RMSE and R2 values of 0.1206, 0.0308 and 0.9995, respectively. Also, the NN model predicted drought stress scores with MAE, RMSE and R2 values of 0.0727, 0.0105 and 0.9999, respectively. The GenPhenML approach was also used to classify barley genotypes as resistant and stress-sensitive. The results showed that the accuracy, precision and F1 score of the proposed approach for salinity and drought stress classification were higher than 97%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
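The three metrics used above to rank GenPhenML's regression models (MAE, RMSE, R2) are quick to compute with sklearn. The numbers below are toy values, not the paper's data:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative true vs. predicted resistance scores (invented toy values).
y_true = np.array([3.0, 2.5, 4.0, 5.5, 3.5])
y_pred = np.array([2.8, 2.7, 4.1, 5.2, 3.6])

mae = mean_absolute_error(y_true, y_pred)           # mean |error|
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # sqrt of mean squared error
r2 = r2_score(y_true, y_pred)                       # 1 - SS_res / SS_tot
```

RMSE is always at least as large as MAE for the same predictions (it penalises large errors more), which is worth remembering when comparing reported metric pairs.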
21. Enhancing intrusion detection: a hybrid machine and deep learning approach.
- Author
-
Sajid, Muhammad, Malik, Kaleem Razzaq, Almogren, Ahmad, Malik, Tauqeer Safdar, Khan, Ali Haider, Tanveer, Jawad, and Rehman, Ateeq Ur
- Subjects
MACHINE learning, TELECOMMUNICATION, FEATURE selection, TECHNOLOGICAL innovations, DEEP learning, CONVOLUTIONAL neural networks - Abstract
The volume of data transferred across communication infrastructures has recently increased due to technological advancements in cloud computing, the Internet of Things (IoT), and automobile networks. The network systems transmit diverse and heterogeneous data in dispersed environments as communication technology develops. The communications using these networks and daily interactions depend on network security systems to provide secure and reliable information. On the other hand, attackers have increased their efforts to render systems on networks susceptible. An efficient intrusion detection system is essential, since technological advancements bring new kinds of attacks and security limitations. This paper implements a hybrid model for Intrusion Detection (ID) with Machine Learning (ML) and Deep Learning (DL) techniques to tackle these limitations. The proposed model makes use of Extreme Gradient Boosting (XGBoost) and convolutional neural networks (CNN) for feature extraction and then combines each of these with long short-term memory networks (LSTM) for classification. Four benchmark datasets, CIC-IDS2017, UNSW-NB15, NSL-KDD, and WSN-DS, were used to train the model for binary and multi-class classification. With the increase in feature dimensions, current intrusion detection systems have trouble identifying new threats due to low test accuracy scores. To narrow down each dataset's feature space, XGBoost and CNN feature selection algorithms are used in this work for each separate model. The experimental findings demonstrate a high detection rate and good accuracy with a relatively low False Acceptance Rate (FAR), proving the usefulness of the proposed hybrid model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
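The "tree-based feature selection, then a downstream classifier" pattern in the abstract above can be sketched with stdlib sklearn. As assumptions: GradientBoostingClassifier stands in for XGBoost, a logistic model stands in for the LSTM stage, and the data are synthetic — this is not the paper's pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a flow-feature intrusion dataset.
X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Stage 1: boosted trees rank features by importance.
gb = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
top = np.argsort(gb.feature_importances_)[::-1][:10]  # keep 10 strongest features

# Stage 2: train the final classifier on the reduced feature space.
clf = LogisticRegression(max_iter=1000).fit(X_tr[:, top], y_tr)
acc = clf.score(X_te[:, top], y_te)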
22. Cloud-based email phishing attack using machine and deep learning algorithm.
- Author
-
Butt, Umer Ahmed, Amin, Rashid, Aldabbas, Hamza, Mohan, Senthilkumar, Alouffi, Bader, and Ahmadian, Ali
- Subjects
MACHINE learning, PHISHING, DEEP learning, SPAM email, SUPPORT vector machines, COMPUTER systems - Abstract
Cloud computing refers to the on-demand availability of personal computer system assets, specifically data storage and processing power, without the client's direct management. Emails are commonly used to send and receive data for individuals or groups. Financial data, credit reports, and other sensitive data are often sent via the Internet. Phishing is a fraudster's technique for obtaining sensitive data from users by appearing to come from trusted sources. The sender can persuade a user to give up secret data through misdirection in a phished email. The main problem is email phishing attacks while sending and receiving email. The attacker sends spam data using email and obtains your data when you open and read the email. In recent years, this has been a big problem for everyone. This paper uses different legitimate and phishing data sizes, detects new emails, and uses different features and algorithms for classification. A modified dataset is created after measuring the existing approaches. We created a feature-extracted comma-separated values (CSV) file and label file, and applied the support vector machine (SVM), Naive Bayes (NB), and long short-term memory (LSTM) algorithms. This experimentation treats the recognition of a phished email as a classification issue. According to the comparison and implementation, SVM, NB and LSTM perform better and are more accurate at detecting email phishing attacks. The classification of email attacks using SVM, NB, and LSTM classifiers achieves accuracies of 99.62%, 97% and 98%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
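Treating phishing recognition as a text-classification problem, as the abstract above does, looks roughly like this with sklearn's SVM and Naive Bayes. The six toy emails and TF-IDF features are invented for illustration; the paper used pre-extracted CSV feature files and also an LSTM, neither reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy corpus: 1 = phishing, 0 = legitimate.
emails = [
    "verify your account now click this link",
    "urgent your password expires confirm billing details",
    "meeting moved to 3pm see agenda attached",
    "quarterly report draft attached for review",
    "you won a prize claim your reward immediately",
    "lunch tomorrow with the project team",
]
labels = [1, 1, 0, 0, 1, 0]

# Fit both an SVM and a Naive Bayes pipeline on TF-IDF features.
for model in (MultinomialNB(), LinearSVC()):
    pipe = make_pipeline(TfidfVectorizer(), model).fit(emails, labels)
    pred = pipe.predict(["claim your prize account link"])
```

With real data one would of course hold out a test set and compare accuracies, as the paper does across SVM, NB and LSTM.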
23. Short-term inflow forecasting in a dam-regulated river in Southwest Norway using causal variational mode decomposition.
- Author
-
Yousefi, Mojtaba, Wang, Jinghao, Fandrem Høivik, Øivind, Rajasekharan, Jayaprakash, Hubert Wierling, August, Farahmand, Hossein, and Arghandeh, Reza
- Subjects
FORECASTING, MACHINE learning, DAMS, FEATURE selection, WATERSHEDS, CAUSAL inference, WATER power, RESERVOIRS - Abstract
Climate change affects patterns and uncertainties associated with river water regimes, which significantly impact hydropower generation and reservoir storage operation. Hence, reliable and accurate short-term inflow forecasting is vital to better face climate effects and improve hydropower scheduling performance. This paper proposes a Causal Variational Mode Decomposition (CVD) preprocessing framework for the inflow forecasting problem. CVD is a preprocessing feature selection framework built upon multiresolution analysis and causal inference. CVD can reduce computation time while increasing forecasting accuracy by down-selecting the features most relevant to the target value (inflow at a specific location). Moreover, the proposed CVD framework is a complementary step to any machine learning-based forecasting method, as it is tested with four different forecasting algorithms in this paper. CVD is validated using actual data from a river system downstream of a hydropower reservoir in the southwest of Norway. The experimental results show that CVD-LSTM reduces the forecasting error metric by almost 70% compared with a baseline (scenario 1) and by 25% compared to an LSTM with the same composition of input data (scenario 4). [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
24. Feature mining and classifier selection for API calls-based malware detection.
- Author
-
Balan, Gheorghe, Simion, Ciprian-Alin, Gavriluţ, Dragoş Teodor, and Luchian, Henri
- Subjects
MACHINE learning, MALWARE, DATABASES, FEATURE selection, APPLICATION program interfaces, MACHINE performance, DECISION trees - Abstract
This paper deals with a major challenge in cyber-security: the need to respond to ever-renewed techniques used by attackers in order to avoid detection based on analysing static features of malware. These constantly renewed techniques consist of various changes in file geometry, entropy, and so on. As a consequence, static malware feature sets describe the malicious files less and less accurately; hence, the performance of machine learning models in detecting new variants of the same malware family may be severely impaired. The paper focuses on a promising approach to this detection challenge: defining file features based on OS (operating system) API (Application Program Interface) call sequences. We explore in detail the detection potential of such features since, for a file to act maliciously, these features are highly unlikely to be hidden. We studied several tens of thousands of such features, a modest-sized subset of which were subsequently fed to several machine learning models. The database used for training and testing consists of 1.5 million files, including malicious files from the polymorphic families Emotet and Trickbot. Using this database, nearly 4,000 pairings (classifier, feature selection algorithm) were trained/tested. Our experimental results show that the API calls-oriented feature mining process is well suited for detecting polymorphic malware. A comparative discussion of the detection results of the various models is presented; depending on the target optimisation criterion (detection rate/false positive rate/saving resources), three of the 4,000 classification models turn out to be best suited for real-world applications: Random Forest, Legacy Neural Networks and Decision Tree. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
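Turning API-call sequences into classifier features, as described above, is often done with call n-grams. A minimal sketch on invented toy traces (the paper mined tens of thousands of features from 1.5 million real files; the API names and labels below are illustrative assumptions):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Invented toy API-call traces; 1 = malicious-looking, 0 = benign-looking.
traces = [
    "CreateFile WriteFile RegSetValue CreateRemoteThread",
    "VirtualAlloc WriteProcessMemory CreateRemoteThread",
    "CreateFile ReadFile CloseHandle",
    "GetMessage DispatchMessage CloseHandle",
]
labels = [1, 1, 0, 0]

# Unigrams and bigrams of consecutive API calls become count features.
vec = CountVectorizer(token_pattern=r"\S+", ngram_range=(1, 2))
X = vec.fit_transform(traces)

# bootstrap=False so every tree sees all four samples and fits them exactly.
clf = RandomForestClassifier(n_estimators=25, bootstrap=False,
                             random_state=0).fit(X, labels)
train_acc = clf.score(X, labels)
```

Bigrams matter here because the order of calls (e.g. a memory write followed by remote thread creation) carries more signal than the individual calls alone.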
25. Revisiting of peer-to-peer traffic: taxonomy, applications, identification techniques, new trends and challenges.
- Author
-
Ansari, Md. Sarfaraj Alam, Pal, Kunwar, and Govil, Mahesh Chandra
- Subjects
MACHINE learning, FEATURE selection, INTERNET traffic, TRANSMISSION of texts, COMPUTER network architectures, TAXONOMY, PEER-to-peer architecture (Computer networks) - Abstract
The services provided through peer-to-peer (P2P) architecture involve the transmission of text, images, documents, and multimedia. The distribution of multimedia content like video and audio in particular is in high demand by clients and has become the major source of traffic, consuming significant bandwidth. This traffic is mostly generated by P2P applications like Napster, Gnutella, BitTorrent, PPTV, YuppTV, and many more. To use the network bandwidth efficiently, classification and identification of this Internet traffic have become necessary. Moreover, it is necessary to classify the traffic of specific P2P applications so that data distribution over the P2P network can be improved. This survey paper discusses the working of the different P2P applications that generate this traffic and raises related issues. The paper discusses the various techniques and overlays that are used to provide services over P2P networks. This paper includes the various techniques of feature selection and the machine learning algorithms for the identification and classification of Internet traffic. The paper also reviews recent developments and highlights future directions of research work in P2P networks. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
26. Soft computing techniques for biomedical data analysis: open issues and challenges.
- Author
-
Houssein, Essam H., Hosney, Mosa E., Emam, Marwa M., Younis, Eman M. G., Ali, Abdelmgeid A., and Mohamed, Waleed M.
- Subjects
SOFT computing, DATA analysis, MACHINE learning, FEATURE selection, MEDICAL personnel, MEDICAL databases - Abstract
In recent years, medical data analysis has become paramount in delivering accurate diagnoses for various diseases. The plethora of medical data sources, encompassing disease types, disease-related proteins, ligands for proteins, and molecular drug components, necessitates adopting effective disease analysis and diagnosis methods. Soft computing techniques, including swarm algorithms and machine learning (ML) methods, have emerged as superior approaches. While ML techniques such as classification and clustering have gained prominence, feature selection methods are crucial in extracting optimal features and reducing data dimensions. This review paper presents a comprehensive overview of soft computing techniques for tackling medical data problems through classifying and analyzing medical data. The focus lies mainly on the classification of medical data resources. A detailed examination of various techniques developed for classifying numerous diseases is provided. The review encompasses an in-depth exploration of multiple ML methods designed explicitly for disease detection and classification. Additionally, the review paper offers insights into the underlying biological disease mechanisms and highlights several medical and chemical databases that facilitate research in this field. Furthermore, the review paper outlines emerging trends and identifies the key challenges in biomedical data analysis. It sheds light on this research domain's exciting possibilities and future directions. The enhanced understanding of soft computing techniques and their practical applications and limitations will contribute to advancing biomedical data analysis and support healthcare professionals in making accurate diagnoses. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
27. A novel ensemble learning-based model for network intrusion detection.
- Author
-
Thockchom, Ngamba, Singh, Moirangthem Marjit, and Nandi, Utpal
- Subjects
INTRUSION detection systems (Computer security), MACHINE learning, FEATURE selection, CHI-squared test, COMPUTER network security, REGRESSION trees - Abstract
The Internet and the services it provides have been growing exponentially in the past few decades. With such growth, there is also an ever-increasing threat to the security of networks. Several efficient countermeasures have been put in place to deal with these threats in the network, such as the intrusion detection system (IDS). This paper proposes an ensemble learning-based method for building an intrusion detection model. The model proposed in this paper has relatively better overall performance than its individual classifiers. This ensemble model is constructed using lightweight machine learning models, i.e., Gaussian naive Bayes, logistic regression and decision tree as the base classifiers and stochastic gradient descent as the meta-classifier. The performance of this proposed model and of the individual classifiers used to build the ensemble is trained and evaluated using three datasets, namely, KDD Cup 1999, UNSW-NB15 and CIC-IDS2017. The performance is evaluated for binary as well as multiclass classification. The proposed method also incorporates a feature selection method, the Chi-square test, to select only the most relevant features. The empirical results show that using an ensemble classifier can be immensely helpful in the field of intrusion detection with unbalanced datasets, where misclassifications can be costly. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
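The design described in the abstract above maps directly onto sklearn primitives: Chi-square feature selection, then a stacking ensemble with Gaussian NB, logistic regression and a decision tree as base learners and SGD as the meta-classifier. The sketch below uses synthetic data rather than KDD Cup 1999/UNSW-NB15/CIC-IDS2017, and the MinMaxScaler is an added assumption (chi2 requires non-negative inputs):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a network-flow dataset.
X, y = make_classification(n_samples=500, n_features=30, n_informative=6,
                           random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# Lightweight base learners, SGD meta-classifier.
stack = StackingClassifier(
    estimators=[("gnb", GaussianNB()),
                ("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(random_state=2))],
    final_estimator=SGDClassifier(random_state=2),
)

# chi2 needs non-negative features, hence the MinMaxScaler before SelectKBest.
pipe = make_pipeline(MinMaxScaler(), SelectKBest(chi2, k=10), stack).fit(X_tr, y_tr)
acc = pipe.score(X_te, y_te)
```

StackingClassifier trains the meta-classifier on cross-validated base predictions, which is what gives the ensemble its edge over any single base learner.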
28. Classification and spectrum optimization method of grease based on infrared spectrum.
- Author
-
Feng, Xin, Xia, Yanqiu, Xie, Peiyuan, and Li, Xiaohe
- Subjects
INFRARED spectra, SELF-organizing maps, FEATURE selection, MACHINE learning, ABSORPTION spectra, CLASSIFICATION - Abstract
The infrared (IR) absorption spectral data of 63 kinds of lubricating greases containing six different types of thickeners were obtained using IR spectroscopy. The Kohonen neural network algorithm was used to identify the type of lubricating grease. The results show that this machine learning method can effectively eliminate the interference fringes in the IR spectrum and complete the feature selection and dimensionality reduction of the high-dimensional spectral data. The 63 kinds of greases exhibit spatial clustering under certain IR recognition spectral bands, which are linked to characteristic peaks of the lubricating greases and improve the recognition accuracy of these greases. The model achieved recognition accuracies of 100.00%, 96.08%, 94.87%, 100.00%, and 87.50% for polyurea grease, calcium sulfonate composite grease, aluminum (Al)-based grease, bentonite grease, and lithium-based grease, respectively. The three-dimensional spatial distribution map drawn from the different IR absorption spectrum bands produced by each kind of lubricating grease also verifies the accuracy of the classification. The method offers fast recognition speed and high accuracy, proving that the Kohonen neural network algorithm has an efficient recognition ability for identifying types of lubricating grease. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
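A Kohonen network (self-organizing map) like the one used above fits in a few dozen lines of NumPy: each sample pulls its best-matching unit, and its grid neighbours, toward itself while the learning rate and neighbourhood radius decay. The toy "spectra" below are two invented Gaussian clusters, not the paper's 63 IR spectra:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(X, grid=(4, 4), epochs=30, lr0=0.5, sigma0=1.5):
    """Minimal SOM: returns the trained codebook, one row per map unit."""
    h, w = grid
    W = rng.random((h * w, X.shape[1]))              # random codebook vectors
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    n_steps = epochs * len(X)
    t = 0
    for _ in range(epochs):
        for x in rng.permutation(X):
            frac = t / n_steps
            lr = lr0 * (1 - frac)                    # decaying learning rate
            sigma = sigma0 * (1 - frac) + 1e-3       # shrinking neighbourhood
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))      # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # grid distance to BMU
            nb = np.exp(-d2 / (2 * sigma ** 2))              # neighbourhood weights
            W += lr * nb[:, None] * (x - W)
            t += 1
    return W

# Two well-separated toy "spectra" clusters should map to different units.
X = np.vstack([rng.normal(0.2, 0.02, (20, 8)),
               rng.normal(0.8, 0.02, (20, 8))])
W = train_som(X)
bmu_a = np.argmin(((W - X[0]) ** 2).sum(axis=1))
bmu_b = np.argmin(((W - X[-1]) ** 2).sum(axis=1))
```

The spatial clustering the abstract reports corresponds to samples of the same grease type sharing nearby best-matching units on the trained grid.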
29. Unveiling hidden factors: explainable AI for feature boosting in speech emotion recognition.
- Author
-
Nfissi, Alaa, Bouachir, Wassim, Bouguila, Nizar, and Mishara, Brian
- Subjects
EMOTION recognition, MACHINE learning, ARTIFICIAL intelligence, DATABASES, FEATURE selection, BOOSTING algorithms - Abstract
Speech emotion recognition (SER) has gained significant attention due to its many application fields, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, this study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to enhance machine learning model performance. Our approach involves meticulous feature selection and analysis to build efficient SER systems. In addressing our main problem through model explainability, we employ a feature evaluation loop with Shapley values to iteratively refine feature sets. This process strikes a balance between model performance and transparency, which enables a comprehensive understanding of the model's predictions. The proposed approach offers several advantages, including the identification and removal of irrelevant and redundant features, leading to a more effective model. Additionally, it promotes explainability, facilitating comprehension of the model's predictions and the identification of crucial features for emotion determination. The effectiveness of the proposed method is validated on the SER benchmarks Toronto emotional speech set (TESS), Berlin Database of Emotional Speech (EMO-DB), Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Surrey Audio-Visual Expressed Emotion (SAVEE) dataset, outperforming state-of-the-art methods. These results highlight the potential of the proposed technique in developing accurate and explainable SER systems. To the best of our knowledge, this is the first work to incorporate model explainability into an SER framework. The source code of this paper is publicly available at https://github.com/alaaNfissi/Unveiling-Hidden-Factors-Explainable-AI-for-Feature-Boosting-in-Speech-Emotion-Recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
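The iterative "evaluate importances, drop the weakest, refit" loop described above can be sketched generically. As stated assumptions: permutation importance stands in here for the Shapley values the paper uses, logistic regression stands in for its SER model, and the data are synthetic; the loop structure is the transferable part.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a high-dimensional acoustic feature set.
X, y = make_classification(n_samples=400, n_features=25, n_informative=5,
                           random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

kept = np.arange(X.shape[1])
for _ in range(3):                                   # three refinement rounds
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:, kept], y_tr)
    imp = permutation_importance(clf, X_te[:, kept], y_te,
                                 n_repeats=5, random_state=3).importances_mean
    kept = kept[imp > np.quantile(imp, 0.25)]        # drop the weakest quarter

final = LogisticRegression(max_iter=1000).fit(X_tr[:, kept], y_tr)
acc = final.score(X_te[:, kept], y_te)
```

Each round both shrinks the feature set and re-ranks the survivors, so features that only looked useful alongside redundant ones get pruned in later rounds.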
30. IDS-XGbFS: a smart intrusion detection system using XGBoost with recent feature selection for VANET safety.
- Author
-
Amaouche, Sara, Guezzaz, Azidine, Benkirane, Said, and Azrour, Mourade
- Subjects
FEATURE selection, INTRUSION detection systems (Computer security), INTELLIGENT transportation systems, CONVOLUTIONAL neural networks, TECHNOLOGICAL innovations, VEHICULAR ad hoc networks - Abstract
The Vehicular Ad Hoc Network (VANET) is a novel and innovative technology which is part of Intelligent Transportation Systems (ITS). VANET is a network composed of a collection of vehicles and other roadside components that are interconnected wirelessly. The intention underlying the development of this technology is the improvement of the vehicle environment and the enhancement of vehicle and driver safety. Nevertheless, since VANETs operate wirelessly and under complicated conditions, they are susceptible to a variety of attacks by malicious actors. Traditional techniques such as encryption are no longer effective, so new techniques using intrusion detection systems (IDS) have attracted the attention of a large number of researchers. The IDS scans the entire network and identifies all the possible harmful nodes present in the network. The present paper addresses the problem of the identification of attacks in VANET by using XGBoost. The effectiveness analysis of the proposed models has been carried out on the NSL-KDD and 5RoutingMetrics datasets combined with the feature selection technique Boruta and the Adaptive Synthetic Sampling Approach (ADASYN). Furthermore, the acquired results are compared to two of the most widely used recent methods, CatBoost and convolutional neural networks (CNN). In comparison with the other IDSs, our approach achieves high performance in accuracy, recall and precision. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Machine learning-based intrusion detection: feature selection versus feature extraction.
- Author
-
Ngo, Vu-Duc, Vuong, Tuan-Cuong, Van Luong, Thien, and Tran, Hung
- Subjects
INTRUSION detection systems (Computer security), FEATURE selection, MACHINE learning, FEATURE extraction, DATA security failures, SMART cities - Abstract
Internet of Things (IoT) has been playing an important role in many sectors, such as smart cities, smart agriculture, smart healthcare, and smart manufacturing. However, IoT devices are highly vulnerable to cyber-attacks, which may result in security breaches and data leakages. To effectively prevent these attacks, a variety of machine learning-based network intrusion detection methods for IoT networks have been developed, which often rely on either feature extraction or feature selection techniques for reducing the dimension of input data before it is fed into machine learning models. This aims to make the detection complexity low enough for real-time operations, which is particularly vital in any intrusion detection system. This paper provides a comprehensive comparison between these two feature reduction methods for intrusion detection in terms of various performance metrics, namely, precision rate, recall rate, detection accuracy, and runtime complexity, using the modern UNSW-NB15 dataset with both binary and multiclass classification. In general, the feature selection method not only provides better detection performance but also lower training and inference time compared to its feature extraction counterpart, especially when the number of reduced features K increases. However, the feature extraction method is much more reliable than its selection counterpart, particularly when K is very small, such as K = 4. Additionally, feature extraction is less sensitive to changing the number of reduced features K than feature selection, and this holds true for both binary and multiclass classifications. Based on this comparison, we provide a useful guideline for selecting a suitable intrusion detection type for each specific scenario, as detailed in Table 14 at the end of Sect. 4. Note that such a comparison between feature selection and feature extraction over UNSW-NB15, as well as the resulting guideline, has been overlooked in the literature. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
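The selection-versus-extraction comparison in the abstract above boils down to two interchangeable reducers in front of the same classifier: keep K original features, or project onto K new components. A sketch on synthetic data (not UNSW-NB15; the mutual-information scorer and k-NN classifier are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for an IoT traffic dataset.
X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                           random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

# Same classifier, two reduction strategies, two values of K.
results = {}
for K in (4, 10):
    for name, reducer in (("selection", SelectKBest(mutual_info_classif, k=K)),
                          ("extraction", PCA(n_components=K))):
        pipe = make_pipeline(reducer, KNeighborsClassifier()).fit(X_tr, y_tr)
        results[(name, K)] = pipe.score(X_te, y_te)
```

Timing each `fit`/`predict` alongside the accuracies would reproduce the abstract's other axis of comparison, runtime complexity.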
32. Selecting EEG channels and features using multi-objective optimization for accurate MCI detection: validation using leave-one-subject-out strategy.
- Author
-
Aljalal, Majid, Aldosari, Saeed A., Molinas, Marta, and Alturki, Fahd A.
- Subjects
MULTI-objective optimization, ELECTROENCEPHALOGRAPHY, DISCRETE wavelet transforms, UNCERTAINTY (Information theory), FEATURE extraction, MILD cognitive impairment, FRACTAL dimensions - Abstract
Effective management of dementia requires the timely detection of mild cognitive impairment (MCI). This paper introduces a multi-objective optimization approach for selecting EEG channels (and features) for the purpose of detecting MCI. Firstly, each EEG signal from each channel is decomposed into subbands using either variational mode decomposition (VMD) or discrete wavelet transform (DWT). A feature is then extracted from each subband using one of the following measures: standard deviation, interquartile range, band power, Teager energy, Katz's and Higuchi's fractal dimensions, Shannon entropy, sure entropy, or threshold entropy. Different machine learning techniques are used to classify the features of MCI cases from those of healthy controls. The classifier's performance is validated using leave-one-subject-out (LOSO) cross-validation (CV). The non-dominated sorting genetic algorithm II (NSGA-II) is designed with the aim of minimizing the number of EEG channels (or features) and maximizing classification accuracy. The performance is evaluated using a publicly available online dataset containing EEGs from 19 channels recorded from 24 participants. The results demonstrate a significant improvement in performance when utilizing the NSGA-II algorithm. By selecting only a few appropriate EEG channels, the LOSO CV-based results show a significant improvement compared to using all 19 channels. Additionally, the outcomes indicate that accuracy can be further improved by selecting suitable features from different channels. For instance, by combining VMD and Teager energy, the SVM accuracy obtained using all channels is 74.24%. Interestingly, when only five channels are selected using NSGA-II, the accuracy increases to 91.56%. The accuracy is further improved to 95.28% when using only 8 features selected from 7 channels. 
This demonstrates that by choosing informative features or channels while excluding noisy or irrelevant information, the impact of noise is reduced, resulting in improved accuracy. These promising findings indicate that, with a limited number of channels and features, accurate diagnosis of MCI is achievable, which opens the door for its application in clinical practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Cognitive pairwise comparison forward feature selection with deep learning for astronomical object classification with sloan digital sky survey.
- Author
-
Yuen, Kevin Kam Fung
- Subjects
DEEP learning, FEATURE selection, ASTRONOMICAL surveys, MACHINE learning, CLASSIFICATION, DATA science, REDSHIFT - Abstract
This paper proposes a hybrid approach integrating an expert knowledge judgment approach, the Cognitive Pairwise Comparison (CPC), with Deep Learning, a modern classification approach, for astronomical object classification. The astronomical data, with ten thousand samples retrieved from Sloan Digital Sky Survey SkyServer Data Release 15 (SDSS SkyServer DR15), are used for this study. The CPC is an approach to elicit and encode expert knowledge in the format of a Pairwise Opposite Matrix (POM) to evaluate expert preferences for the features. A forward feature selection algorithm, taking the expert choices using CPC for the ordered features, performs the feature selection, allowing the deep learning algorithm to build a heuristic training model based on the astronomical data. Whilst the accuracy in the case of improper feature selection is just 37.1%, the proposed hybrid approach can obtain a very high accuracy of 97.9% for the classification of the astronomical objects using the eight scaled features (u, g, r, i, z, redshift, ra, dec). To extend this research, the proposed CPC can be used as a human-centered tool applicable to other areas of data science. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
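Generic forward feature selection, the wrapper half of the hybrid described above, is available directly in sklearn. The sketch below uses synthetic data and a logistic model; the paper instead orders candidate features by expert CPC scores and feeds the selection into a deep learning model, neither of which is reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for scaled survey features.
X, y = make_classification(n_samples=300, n_features=15, n_informative=5,
                           random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)

# Greedy forward selection: add the feature that most improves CV score.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=5, direction="forward")
sfs.fit(X_tr, y_tr)

# Refit on the selected subset and evaluate.
clf = LogisticRegression(max_iter=1000).fit(sfs.transform(X_tr), y_tr)
acc = clf.score(sfs.transform(X_te), y_te)
```

The CPC twist in the paper is that the candidate ordering comes from encoded expert judgments rather than from cross-validated scores alone.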
34. PermDroid a framework developed using proposed feature selection approach and machine learning techniques for Android malware detection.
- Author
-
Mahindru, Arvind, Arora, Himani, Kumar, Abhinav, Gupta, Sachin Kumar, Mahajan, Shubham, Kadry, Seifedine, and Kim, Jungeun
- Subjects
FEATURE selection, MACHINE learning, MALWARE, RESEARCH personnel - Abstract
The challenge of developing an Android malware detection framework that can identify malware in real-world apps is difficult for academicians and researchers. The vulnerability lies in the permission model of Android. Therefore, it has attracted the attention of various researchers aiming to develop an Android malware detection model using a permission or a set of permissions. Academicians and researchers have used all extracted features in previous studies, resulting in overburdened malware detection models. However, the effectiveness of a machine learning model depends on the relevant features, which help in reducing the value of misclassification errors and have excellent discriminative power. A feature selection framework is proposed in this research paper that helps in selecting the relevant features. In the first stage of the proposed framework, a t-test and univariate logistic regression are applied to our collected feature data set to assess the features' capacity for detecting malware. Multivariate linear regression with stepwise forward selection and correlation analysis are applied in the second stage to evaluate the correctness of the features selected in the first stage. Furthermore, the resulting features are used as input in the development of malware detection models using three ensemble methods and a neural network with six different machine-learning algorithms. The developed models' performance is compared using two performance parameters: F-measure and Accuracy. The experiment is performed using half a million different Android apps. The empirical findings reveal that the malware detection model developed using features selected by the proposed feature selection framework achieved a higher detection rate compared to the model developed using the full extracted feature data set. Further, when compared to previously developed frameworks or methodologies, the experimental results indicate that the model developed in this study achieved an accuracy of 98.8%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
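The two-stage filtering idea in the abstract above can be sketched as a simple univariate screen. This is a minimal illustration, not the authors' implementation: it uses Welch's t-statistic as the stage-one test, and the permission-count matrix, threshold, and labels are all invented for the example.

```python
import math

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def t_test_filter(features, labels, threshold=2.0):
    """Keep feature indices whose |t| between classes exceeds the threshold."""
    kept = []
    for j in range(len(features[0])):
        malware = [row[j] for row, y in zip(features, labels) if y == 1]
        benign = [row[j] for row, y in zip(features, labels) if y == 0]
        if abs(welch_t(malware, benign)) > threshold:
            kept.append(j)
    return kept

# Toy permission-usage matrix: feature 0 discriminates, feature 1 is noise.
X = [[5, 3], [6, 3], [5, 4], [6, 4],   # malware (label 1)
     [1, 3], [2, 4], [1, 3], [2, 4]]   # benign (label 0)
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(t_test_filter(X, y))  # → [0]
```

A stage-two redundancy check (correlation analysis) would then be run only on the survivors of this screen.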
35. Classification of mental workload using brain connectivity and machine learning on electroencephalogram data.
- Author
-
Safari, MohammadReza, Shalbaf, Reza, Bagherzadeh, Sara, and Shalbaf, Ahmad
- Subjects
MACHINE learning ,FISHER discriminant analysis ,FEATURE selection ,SUPPORT vector machines ,ELECTROENCEPHALOGRAPHY ,SYSTEMS design - Abstract
Mental workload refers to the cognitive effort required to perform tasks, and it is an important factor in various fields, including system design, clinical medicine, and industrial applications. In this paper, we propose innovative methods to assess mental workload from EEG data that use effective brain connectivity to extract features, a hierarchical feature selection algorithm to select the most significant features, and machine learning models for classification. We used the Simultaneous Task EEG Workload (STEW) dataset, an open-access collection of raw EEG data from 48 subjects. We extracted effective brain connectivities with the direct directed transfer function and then selected the top 30 connectivities for each standard frequency band. We then applied three feature selection algorithms (forward feature selection, Relief-F, and minimum-redundancy-maximum-relevance) to the top 150 features from all frequencies. Finally, we applied sevenfold cross-validation to four machine learning models (support vector machine (SVM), linear discriminant analysis, random forest, and decision tree). The results revealed that SVM as the machine learning model and forward feature selection as the feature selection method outperform the alternatives, classifying mental workload levels with an accuracy of 89.53% (±1.36). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
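The forward feature selection loop described above can be sketched with leave-one-out validation. As a hedge: the paper pairs forward selection with an SVM; here a 1-nearest-neighbour classifier stands in so the sketch stays dependency-free, and the toy "EEG" features are invented.

```python
def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)

def forward_select(X, y, k):
    """Greedily add the feature that most improves LOO accuracy."""
    selected, remaining = [], list(range(len(X[0])))
    while len(selected) < k:
        best = max(remaining, key=lambda f: loo_accuracy(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy EEG-style features: only feature 0 separates the two workload levels.
X = [[0.0, 5, 1], [0.2, 1, 9], [0.1, 9, 4],   # low workload
     [1.0, 5, 2], [1.2, 2, 8], [1.1, 8, 3]]   # high workload
y = [0, 0, 0, 1, 1, 1]
print(forward_select(X, y, 1))  # → [0]
```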
36. A machine learning-based optimization approach for pre-copy live virtual machine migration.
- Author
-
Haris, Raseena M., Khan, Khaled M., Nhlabatsi, Armstrong, and Barhamgi, Mahmoud
- Subjects
VIRTUAL machine systems ,MACHINE learning ,SYSTEM downtime ,CLOUD computing ,FEATURE selection ,COMMUNICATION infrastructure ,CYBERTERRORISM - Abstract
Organizations widely use cloud computing to outsource their computing needs. One crucial issue of cloud computing is that services must be available to clients at all times. However, cloud services may be temporarily unavailable due to maintenance of the cloud infrastructure, load balancing of services, defense against cyber attacks, power management, proactive fault tolerance, or resource usage. The unavailability of cloud services negatively impacts the business model of cloud providers. One solution to tackle service unavailability is Live Virtual Machine Migration (LVM), that is, moving virtual machines (VMs) from the source host machine to the destination host without disrupting the running application. Pre-copy memory migration is a common LVM approach used in most networked systems such as the cloud. The main difficulty with this approach is the high rate of frequently updated memory pages, referred to as "dirty pages". Transferring these updated or dirty pages during the pre-copy migration approach prolongs the total migration time. After a predefined iteration, the pre-copy approach enters the stop-and-copy phase and transfers the remaining memory pages. If the remaining pages are huge, the downtime or service unavailability will be very high, resulting in a negative impact on the availability of the running services. To minimize such service downtime, it is critical to find an optimal time to migrate a virtual machine in the pre-copy approach. To address the issue, this paper proposes a machine learning-based method to optimize pre-copy migration. It has three main stages: (i) feature selection, (ii) model generation, and (iii) application of the proposed model in pre-copy migration. The experiment results show that our proposed model outperforms other machine learning models in terms of prediction accuracy and significantly reduces downtime or service unavailability during the migration process. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
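The pre-copy behaviour the abstract describes (dirty pages re-sent each round until a final stop-and-copy phase) follows a well-known analytical model, sketched below. This is not the paper's ML predictor; the bandwidth, dirty rate, and stop threshold are illustrative numbers.

```python
def precopy_rounds(memory_mb, bandwidth_mbps, dirty_rate_mbps, stop_threshold_mb):
    """Simulate pre-copy iterations: each round re-transfers the pages dirtied
    during the previous round; stop when the remainder is small enough."""
    remaining, rounds, total_time = memory_mb, 0, 0.0
    while remaining >= stop_threshold_mb:
        t = remaining / bandwidth_mbps      # time to send this round
        total_time += t
        remaining = dirty_rate_mbps * t     # pages dirtied meanwhile
        rounds += 1
    downtime = remaining / bandwidth_mbps   # stop-and-copy phase
    return rounds, total_time + downtime, downtime

# 1000 MB VM, 100 MB/s link, 20 MB/s dirty rate, stop below 50 MB remaining.
rounds, total, downtime = precopy_rounds(1000, 100, 20, 50)
print(rounds, downtime)  # → 2 0.4
```

The geometric decay (remaining pages shrink by dirty_rate/bandwidth each round) is why choosing *when* to stop iterating, as the paper's model does, directly controls the downtime.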
37. Adaptive federated learning scheme for recognition of malicious attacks in an IoT network.
- Author
-
Chhikara, Prateek, Tekchandani, Rajkumar, and Kumar, Neeraj
- Subjects
FEDERATED learning ,MACHINE learning ,DATA privacy ,ARTIFICIAL intelligence ,INTERNET of things ,FEATURE selection - Abstract
The Internet of Things (IoT) is crucial for deploying a novel Artificial Intelligence (AI) model for both network and application management. However, using classical centralized learning algorithms in the IoT environment is challenging, given massively distributed private datasets. Advancements in AI have helped us solve various use cases, but AI faces two significant challenges. First, the data exist in separate clusters; second, current AI offers limited data privacy and security. Federated learning (FL) aims to preserve data privacy through distributed learning methods that keep the data in storage silos. Likewise, differential privacy improves data privacy by measuring the privacy loss in communication among the elements of FL. The paper proposes two adaptive approaches for making model training differentially private in a vertical federated environment. The first uses random feature selection to train different machine learning models, together with a proposed performance improvement. The second approach uses a tree structure, i.e., Classification and Regression Trees, under some defined constraints. Further, we created a scheme to help identify malicious users/devices in a federated network cluster using parity checks at every FL iteration. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Logging evaluation of favorable areas of a low porosity and permeability sandy conglomerate reservoir based on machine learning.
- Author
-
Jiang, Yanjiao, Zhou, Jian, Song, Yanjie, Song, Lijun, Guo, Zhihua, and Shen, Peng
- Subjects
MACHINE learning ,CONGLOMERATE ,SUPPORT vector machines ,FEATURE selection ,PERMEABILITY ,POROSITY - Abstract
The sandy conglomerate reservoir in layer Es3 of the Liaohe Eastern Depression has good potential for oil reservoir exploration and has been identified as a key area for future exploration. The low porosity and permeability, complex lithology, and strong heterogeneity of the target layer make it difficult to predict favorable reservoirs. The objective of this study is to analyze and process conventional logging data to extract feature parameters that affect lithology by establishing a decision tree lithology classifier. Principal component analysis is used to reduce data dimensionality, and the elbow method is applied to the clustering algorithm to establish the optimal number of clusters for the automatic classification of reservoir types. Further, support vector machines are used for lithology classification based on features with higher classification capabilities. The results show that the support vector machine lithology recognition method based on feature selection achieved an accuracy of 91.8%. The processing of actual well data has verified the feasibility of the method. Based on the combination of core experiments and oil testing results, the characteristics of three types of reservoirs were presented, and potential reservoir zones were proposed for drilling wells. The comprehensive analysis and the practical application of the developed method reveal that the class I reservoir has high hydrocarbon production and could be the most favorable reservoir in the Es3 sandy conglomerate. The processing data of lithology identification and reservoir classification evaluation are consistent with core data and hydrocarbon production data, verifying the effectiveness and practicability of the method proposed in this paper. The results of this study will serve as a reference for low porosity and permeability sandy conglomerate reservoir evaluation based on machine learning in the target area. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
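The elbow-method step mentioned above can be sketched in one dimension (the PCA step is omitted for brevity): run k-means for increasing k and pick the k at the sharpest bend of the inertia curve, here via the largest second difference. A toy Lloyd's k-means with quantile initialisation; the "log response" values are invented.

```python
def kmeans_1d(values, k, iters=20):
    """Lloyd's algorithm on 1-D data with evenly spaced quantile init."""
    data = sorted(values)
    centroids = [data[int(i * (len(data) - 1) / max(k - 1, 1))] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            clusters[min(range(k), key=lambda c: (x - centroids[c]) ** 2)].append(x)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sum(min((x - c) ** 2 for c in centroids) for x in data)  # inertia

def elbow(values, k_max=4):
    """Pick k at the largest second difference of the inertia curve."""
    inertia = [kmeans_1d(values, k) for k in range(1, k_max + 1)]
    return max(range(1, k_max - 1),
               key=lambda i: inertia[i - 1] - 2 * inertia[i] + inertia[i + 1]) + 1

logs = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]  # two well-separated log-response groups
print(elbow(logs))  # → 2
```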
39. A novel feature selection approach with integrated feature sensitivity and feature correlation for improved prediction of heart disease.
- Author
-
Saranya, G. and Pravin, A.
- Abstract
This paper presents a random forest-feature sensitivity and feature correlation (RF-FSFC) technique for enhanced heart disease prediction. The proposed methodology is implemented using the Cleveland heart disease dataset, which comprises a total of 120 heart disease patient records. Data imputation was utilized for missing values, and min–max normalization was utilized for data transformation. We attempted to construct an RF-based classifier for coronary heart disease in this paper by combining feature sensitivity and correlation analysis. The sensitivity-based feature selection process ranks features according to their value in assessing CHD risk, and the feature correlation analysis phase analyses whether there are correlations between features. A heart disease prediction accuracy of 81.16% was obtained using the proposed RF-FSFC technique by omitting five features (sex, hemoglobin, TD, CRF, and cirrhosis). When compared to the Naïve Bayes, decision tree, regression analysis, and support vector machine models, the proposed model offered a higher accuracy of 86.141% without omitting any features. It also offered sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) scores of 87.321%, 87.364%, 91.23, and 91.02, respectively. Experimental findings demonstrated that the proposed RF-FSFC approach significantly improves prediction accuracy compared to approaches that do not use the integrated feature selection method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
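The feature-correlation stage of an RF-FSFC-style pipeline can be sketched as a rank-aware redundancy filter: walk features in sensitivity-rank order and drop any feature that is highly correlated with an already-kept, higher-ranked feature. The sensitivity ranking and clinical columns below are invented stand-ins, not the paper's data.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def drop_redundant(columns, ranking, max_corr=0.9):
    """Keep features in rank order; drop any feature too correlated
    with an already-kept, higher-ranked feature."""
    kept = []
    for f in ranking:
        if all(abs(pearson(columns[f], columns[g])) <= max_corr for g in kept):
            kept.append(f)
    return kept

# Toy clinical columns: feature 1 duplicates feature 0; feature 2 is independent.
cols = [[1, 2, 3, 4, 5],        # feature 0
        [2, 4, 6, 8, 10],       # feature 1 (= 2 * feature 0, redundant)
        [5, 1, 4, 2, 3]]        # feature 2
print(drop_redundant(cols, ranking=[0, 1, 2]))  # → [0, 2]
```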
40. A novel sample and feature dependent ensemble approach for Parkinson's disease detection.
- Author
-
Ali, Liaqat, Chakraborty, Chinmay, He, Zhiquan, Cao, Wenming, Imrana, Yakubu, and Rodrigues, Joel J. P. C.
- Subjects
ARTIFICIAL neural networks ,PARKINSON'S disease ,MACHINE learning ,FEATURE selection ,VOICE disorders ,PLURALITY voting ,AUTOMATIC speech recognition - Abstract
Parkinson's disease (PD) is a neurological disease that has been reported to affect many people worldwide. Recent research pointed out that about 90% of PD patients have voice disorders. Motivated by this fact, many researchers have proposed methods based on multiple types of speech data for PD prediction. However, these methods either suffer from a low rate of accuracy or lack generalization. To develop an approach free of these issues, in this paper we propose a novel ensemble approach. This paper's contributions are twofold. First, we investigate the integration of feature selection with a deep neural network (DNN) and validate its effectiveness by comparing its performance with a conventional DNN and other similar integrated systems. Second, we develop a novel ensemble model, EOFSC (Ensemble model with Optimal Features and Sample-Dependent Base Classifiers), that exploits the findings of recently published studies. Recent research pointed out that for different types of voice data, different optimal models are obtained, each sensitive to different types of samples and subsets of features. In this paper, we further consolidate these findings by utilizing the proposed integrated system in the development of EOFSC. For multiple types of vowel phonations, multiple base classifiers are obtained, each sensitive to a different subset of features. These feature- and sample-dependent base classifiers are integrated, and the proposed EOFSC model is constructed. To evaluate the final prediction of the EOFSC model, the majority voting methodology is adopted. Experimental results point out that integrating feature selection with neural networks improves the performance of conventional neural networks. Additionally, feature selection integration with a DNN outperforms feature selection integration with conventional machine learning models. Finally, the newly developed ensemble model is observed to improve PD detection accuracy by 6.5%.
[ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
41. Novel Feature Selection Using Machine Learning Algorithm for Breast Cancer Screening of Thermography Images.
- Author
-
Gupta, Kumod Kumar, Ritu Vijay, Pahadiya, Pallavi, Saxena, Shivani, and Gupta, Meenakshi
- Subjects
MACHINE learning ,FEATURE selection ,EARLY detection of cancer ,OPTIMIZATION algorithms ,BREAST cancer - Abstract
Early diagnosis and treatment are the keys to managing patients with breast cancer. Screening methods make it possible to identify patients even before they present with symptoms. Thermography is a screening tool for carcinoma of the breast that can reduce the associated morbidity and mortality. This paper proposes a novel feature selection method using a machine learning algorithm, namely the greedy search optimization algorithm. This algorithm is applied to compare various feature selection techniques: sequential backward selection (SBS), sequential forward selection, and exhaustive feature selection. The comparison shows that sequential backward selection is the best technique for breast cancer diagnosis. The average SBS score is 88.5714%, with a computation time of 87.4 s. For classification, we used an artificial neural network. The classification result varies with the age group and the physiology of the breast. Accordingly, we selected features for each age group using the sequential backward technique. The classification accuracies for patients in the (20–40), (41–60), and (61–80) year age groups are 79.349%, 80.711%, and 74.76%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
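Sequential backward selection, which the comparison above favours, can be sketched as follows: start from all features and repeatedly drop the one whose removal best preserves accuracy. The paper classifies with an ANN; a leave-one-out 1-nearest-neighbour criterion stands in here, and the toy "thermogram" features are invented.

```python
def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        j_best = min((j for j in range(len(X)) if j != i),
                     key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in feats))
        correct += y[j_best] == y[i]
    return correct / len(X)

def backward_select(X, y, k):
    """Repeatedly drop the feature whose removal best preserves accuracy."""
    selected = list(range(len(X[0])))
    while len(selected) > k:
        drop = max(selected,
                   key=lambda f: loo_accuracy(X, y, [g for g in selected if g != f]))
        selected.remove(drop)
    return selected

# Toy thermogram features: only feature 0 carries the class signal.
X = [[0.0, 1, 3], [0.1, 3, 0], [0.2, 2, 2],      # benign
     [10.0, 3, 1], [10.1, 0, 3], [10.2, 2, 2]]   # suspicious
y = [0, 0, 0, 1, 1, 1]
print(backward_select(X, y, 1))  # → [0]
```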
42. An efficient hybrid stock trend prediction system during COVID-19 pandemic based on stacked-LSTM and news sentiment analysis.
- Author
-
Sharaf, Marwa, Hemdan, Ezz El-Din, El-Sayed, Ayman, and El-Bahnasawy, Nirmeen A.
- Subjects
COVID-19 pandemic ,SENTIMENT analysis ,HYBRID systems ,STOCKS (Finance) ,INVESTORS ,FEATURE selection ,FORECASTING - Abstract
The coronavirus is a highly infectious virus that mainly affects the respiratory system. It has had a marked impact on the global economy, specifically on the financial movement of stock markets. Recently, accurate stock market prediction has been of great interest to investors. The sudden change in stock movement caused by the emergence of COVID-19 poses problems for investors. From this point, we propose an efficient system that applies sentiment analysis of COVID-19 news and articles to extract the final impact of COVID-19 on the financial stock market. In this paper, we propose a stock market prediction system that extracts the stock movement associated with the spread of COVID. It is important to predict the effect of such diseases on the economy so that we are ready for any change and can protect our economy. We apply sentiment analysis to stock news headlines to predict the daily future trend of stocks in the COVID-19 period. We also use machine learning classifiers to predict the final impact of COVID-19 on selected stocks such as TSLA, AMZ, and GOOG. To improve the performance and quality of future trend predictions, feature selection and spam tweet reduction are performed on the data sets. Finally, our proposed system is a hybrid system that applies text mining to social media data and data mining to the historical stock dataset to improve the overall prediction performance. The proposed system predicts stock movement for TSLA, AMZ, and GOOG with average prediction accuracies of 90%, 91.6%, and 92.3%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
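The headline sentiment step can be illustrated with a toy lexicon scorer. This is not the paper's sentiment model; the word lists and headlines are invented for the example.

```python
POSITIVE = {"surge", "gain", "record", "beat", "recovery"}
NEGATIVE = {"drop", "loss", "lockdown", "fear", "decline"}

def headline_score(headline):
    """Net sentiment: +1 per positive word, -1 per negative word."""
    words = headline.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def daily_trend(headlines):
    """Aggregate headline scores into an up/down trend feature."""
    total = sum(headline_score(h) for h in headlines)
    return "up" if total > 0 else "down"

day = ["Tech stocks surge on vaccine recovery hopes",
       "Lockdown fear drives travel sector loss"]
print([headline_score(h) for h in day], daily_trend(day))  # → [2, -3] down
```

In a full pipeline, such per-day scores would join the historical price features fed to the classifiers.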
43. Enhanced chimp hierarchy optimization algorithm with adaptive lens imaging for feature selection in data classification.
- Author
-
Zhang, Li and Chen, XiaoBo
- Subjects
OPTIMIZATION algorithms ,CHIMPANZEES ,SOCIAL classes ,FEATURE selection ,MACHINE learning ,SOCIAL hierarchies - Abstract
Feature selection is a critical component of machine learning and data mining, used to remove redundant and irrelevant features from a dataset. The Chimp Optimization Algorithm (CHoA) is widely applicable to various optimization problems due to its small number of parameters and fast convergence rate. However, CHoA has weak exploration capability and tends to fall into local optima during the feature selection process, leading to ineffective removal of irrelevant and redundant features. To solve this problem, this paper proposes the Enhanced Chimp Hierarchy Optimization Algorithm with adaptive lens imaging (ALI-CHoASH) for searching for the optimal subset of features in classification problems. Specifically, to enhance the exploration and exploitation capability of CHoA, we design a chimp social hierarchy, employing a novel social-class factor to label the class situation of each chimp and thereby effectively model and optimize the relationships among chimp individuals. Then, to capture the social and collaborative behaviours of chimps in different social classes, we introduce attacking-prey and autonomous search strategies that help chimp individuals approach the optimal solution faster. In addition, considering the poor diversity of chimp groups in late iterations, we propose an adaptive lens-imaging back-learning strategy to prevent the algorithm from falling into a local optimum. Finally, we validate the improvement of ALI-CHoASH in exploration and exploitation capabilities using several high-dimensional datasets. We also compare ALI-CHoASH with eight state-of-the-art methods in classification accuracy, feature subset size, and computation time to demonstrate its superiority. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Color-CADx: a deep learning approach for colorectal cancer classification through triple convolutional neural networks and discrete cosine transform.
- Author
-
Sharkas, Maha and Attallah, Omneya
- Subjects
DEEP learning ,DISCRETE cosine transforms ,CONVOLUTIONAL neural networks ,MACHINE learning ,COLORECTAL cancer ,FEATURE selection - Abstract
Colorectal cancer (CRC) has a significant death rate and consistently impacts human lives worldwide. Histopathological examination is the standard method for CRC diagnosis. However, it is complicated, time-consuming, and subjective. Computer-aided diagnostic (CAD) systems using digital pathology can help pathologists diagnose CRC faster and more accurately than manual histopathology examinations. Deep learning algorithms, especially convolutional neural networks (CNNs), are advocated for the diagnosis of CRC. Nevertheless, most previous CAD systems obtained features from a single CNN, and these features are of very high dimension; they also relied on spatial information alone to achieve classification. In this paper, a CAD system called "Color-CADx" is proposed for CRC recognition. Different CNNs, namely ResNet50, DenseNet201, and AlexNet, are used for end-to-end classification at different training–testing ratios. Moreover, features are extracted from these CNNs and reduced using the discrete cosine transform (DCT). DCT is also utilized to acquire a spectral representation and is then used to further select a reduced set of deep features. Furthermore, the DCT coefficients obtained in the previous step are concatenated, and the analysis of variance (ANOVA) feature selection approach is applied to choose significant features. Finally, machine learning classifiers are employed for CRC classification. Two publicly available datasets were investigated: the NCT-CRC-HE-100 K dataset and the Kather_texture_2016_image_tiles dataset. The highest achieved accuracy reached 99.3% for the NCT-CRC-HE-100 K dataset and 96.8% for the Kather_texture_2016_image_tiles dataset. DCT and ANOVA have successfully lowered feature dimensionality, thus reducing complexity. Color-CADx has demonstrated efficacy in terms of accuracy, as its performance surpasses that of the most recent advancements. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
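The DCT-based reduction described above relies on energy compaction: for smooth feature vectors, most energy lands in the low-order coefficients, so keeping only the first few coefficients is a cheap reduction. A minimal type-II DCT sketch (unnormalised, with a toy constant input; not the paper's feature vectors):

```python
import math

def dct2(x):
    """Type-II discrete cosine transform (unnormalised)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]

def compress(features, keep):
    """Keep only the first `keep` DCT coefficients as the reduced feature set."""
    return dct2(features)[:keep]

smooth = [1.0, 1.0, 1.0, 1.0]   # constant 'deep feature' vector
coeffs = dct2(smooth)
print(coeffs[0])                # → 4.0 (all energy in the DC term)
print(abs(coeffs[1]) < 1e-9)    # → True (higher coefficients vanish)
print(len(compress(smooth, 2))) # → 2
```

An ANOVA filter would then rank the surviving coefficients by class separability before classification.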
45. Divorce prediction using machine learning algorithms in Ha'il region, KSA.
- Author
-
Moumen, Abdelkader, Shafqat, Ayesha, Alraqad, Tariq, Alshawarbeh, Etaf Saleh, Saber, Hicham, and Shafqat, Ramsha
- Subjects
MACHINE learning ,ARTIFICIAL neural networks ,DIVORCE ,COUPLES therapy ,FAMILY therapists ,FEATURE selection - Abstract
The application of artificial intelligence (AI) in predictive analytics is growing in popularity. It has the power to offer ground-breaking solutions for a range of social problems and real-world societal difficulties, and it is helpful in addressing some of the social issues that today's world seems incapable of solving. One of the most significant phenomena affecting people's lives is divorce. The goal of this paper is to study the use of machine learning algorithms to determine the effectiveness of the divorce predictor scale (DPS) and to identify the reasons that usually lead to divorce in the Ha'il region, KSA. For this purpose, the DPS, based on Gottman couples therapy, was used to predict divorce by applying different machine learning algorithms. The 54 items of the DPS were used as features or attributes for data collection. In addition to the DPS, a personal information form was utilized to gather participants' personal data in order to conduct this study in a more structured and traditional manner. Of the 148 participants, 116 were married and 32 were divorced. Using artificial neural network (ANN), naïve Bayes (NB), and random forest (RF) algorithms, the effectiveness of the DPS was examined. The correlation-based feature selection method was used to identify the top six features from the dataset, and the highest accuracy rate was 91.66%, achieved with RF. The results show that the DPS can predict divorce. This scale can help family counselors and therapists in case formulation and intervention plan development. Additionally, it may be argued that the Ha'il region, KSA sample confirmed the Gottman couples therapy predictors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Examining user behavior with machine learning for effective mobile peer-to-peer payment adoption.
- Author
-
Antonio, Blanco-Oliver, Juan, Lara-Rubio, Ana, Irimia-Diéguez, and Francisco, Liébana-Cabanillas
- Subjects
MOBILE commerce ,MACHINE learning ,MOBILE learning ,TECHNOLOGICAL innovations ,FEATURE selection ,RANDOM forest algorithms ,PEERS - Abstract
Disruptive innovations caused by FinTech (i.e., technology-assisted customized financial services) have brought digital peer-to-peer (P2P) payments to the fore. In this challenging environment, and drawing on theories of customer behavior in response to technological innovations, this paper identifies the drivers of consumer adoption of mobile P2P payments and develops a machine learning model to predict the use of this thriving payment option. To do so, we use a unique data set with information from 701 participants (observations) who completed a questionnaire about the adoption of Bizum, a leading mobile P2P platform worldwide. The respondent profile was the average Spanish citizen within the framework of European culture and lifestyle. We document (in this order of priority) that the usefulness of mobile P2P payments, the influence of peers and other social groups such as friends, family, and colleagues on individual behavior (that is, subjective norms), perceived trust, and enjoyment of the user experience within the digital context are the attributes that best classify (potential) users of mobile P2P payments. We also find that nonparametric approaches based on machine learning algorithms outperform traditional parametric methods. Finally, our results show that feature selection based on random forest, such as the Boruta procedure, used as a preprocessing technique substantially increases prediction performance while reducing noise, model redundancy, and computational costs. The main limitation of this research is that it is confined to the sociocultural and institutional framework of the Spanish population. It is therefore desirable to replicate this study by surveying people from other countries to analyze the effects of the institutional environment on the adoption of mobile P2P payments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
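The Boruta procedure mentioned above compares each feature's importance against "shadow" (scrambled) copies of the features, confirming only features that beat the best shadow. The sketch below keeps that logic but substitutes two labelled assumptions so it stays small and reproducible: absolute Pearson correlation stands in for random-forest importance, and a deterministic rotation stands in for Boruta's random shuffling.

```python
import math

def abs_corr(a, b):
    """Absolute Pearson correlation, used as a toy importance score."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return abs(cov / (sa * sb))

def boruta_like(columns, y):
    """Confirm a feature only if it beats every 'shadow' (scrambled) copy."""
    shadows = [col[1:] + col[:1] for col in columns]   # rotation as a
    threshold = max(abs_corr(s, y) for s in shadows)   # deterministic scramble
    return [i for i, col in enumerate(columns) if abs_corr(col, y) > threshold]

# Toy survey answers: feature 0 tracks adoption, feature 1 is noise.
y = [0, 0, 0, 0, 1, 1, 1, 1]
cols = [[0.1, 0.2, 0.1, 0.2, 1.1, 1.2, 1.1, 1.2],
        [0.5, 0.9, 0.1, 0.7, 0.3, 0.8, 0.2, 0.6]]
print(boruta_like(cols, y))  # → [0]
```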
47. Feature selection and prediction of small-for-gestational-age infants.
- Author
-
Li, Jianqiang, Liu, Lu, Zhou, MengChu, Yang, Ji-Jiang, Chen, Shi, Liu, HuiTing, Wang, Qing, Pan, Hui, Sun, ZhiHua, and Tan, Feng
- Abstract
The small-for-gestational-age (SGA) condition often causes serious problems, so identifying the risk factors for SGA is important. Traditional statistical methods such as stepwise logistic regression (LR) have been widely utilized to discover possible risk factors, but other feature selection methods from the machine learning field have rarely been employed for the task. In this paper, a comparison of five feature selection methods from both fields for SGA risk-factor analysis is conducted for the first time. To evaluate their performance, four classification algorithms are used to construct SGA prediction models. The evaluation criteria are precision and the area under the receiver operating characteristic curve. Stepwise LR achieves the best performance among the five feature selection methods because it conducts both a univariate significance test and a model significance test, which makes it more suitable for handling the complex relations among features. The top 20 features selected by each feature selection method, and the 27 features selected by four or five of them, could assist physicians in revising traditional SGA evaluation models. An ensemble method is also exploited to build effective SGA prediction models based on the feature subsets; the results show it is indeed superior to the individual models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
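One of the evaluation criteria above, the area under the ROC curve, can also drive a simple univariate feature ranking: score each feature by how far its standalone AUC is from chance (0.5). This is an illustrative filter, not one of the five methods the paper compares, and the toy measurement columns are invented.

```python
def auc(scores, labels):
    """Probability a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_by_auc(columns, y):
    """Rank features by how far their univariate AUC is from chance (0.5)."""
    return sorted(range(len(columns)),
                  key=lambda j: abs(auc(columns[j], y) - 0.5), reverse=True)

# Toy antenatal measurements: feature 0 separates the classes perfectly.
y = [0, 0, 0, 1, 1, 1]
cols = [[1.0, 1.2, 1.1, 2.0, 2.2, 2.1],   # feature 0: informative
        [3.0, 1.0, 2.0, 2.1, 0.9, 3.1]]   # feature 1: near-chance
print(rank_by_auc(cols, y))  # → [0, 1]
```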
48. Incremental feature selection for large-scale hierarchical classification with the arrival of new samples.
- Author
-
Tian, Yang and She, Yanhong
- Subjects
FEATURE selection ,ROUGH sets ,MACHINE learning ,CLASSIFICATION - Abstract
In the era of big data, the number of class labels is growing rapidly, which poses a great challenge to classification tasks. Hierarchical classification was thus introduced to address this issue by considering the structural information between different class labels. In this paper, we propose an incremental feature selection algorithm that handles the arrival of new samples using the theory of fuzzy rough sets. As a preliminary step, we propose a non-incremental hierarchical feature selection algorithm, which is an improved version of an existing method. Then, utilizing the sibling strategy, the incremental calculation of the dependency degree upon the arrival of new samples is discussed. Based on the analysis of dependency-degree change, we design feature addition and deletion strategies, as well as the incremental feature selection algorithm. In the experimental section, two versions of the algorithm are designed. The experimental results show that our improvement of the existing method is highly effective and can significantly accelerate the process of feature selection. In addition, version 2 of the incremental algorithm exhibits much higher efficiency than the improved non-incremental algorithm on several datasets, as well as the existing method. Compared to six hierarchical feature selection algorithms, our algorithm achieves better results on classification accuracy and three hierarchical evaluation metrics. The effectiveness and efficiency of version 1 are also verified by comparison with version 2 and the other results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Influence of One-Way ANOVA and Kruskal–Wallis Based Feature Ranking on the Performance of ML Classifiers for Bearing Fault Diagnosis.
- Author
-
Jamil, Mohd Atif and Khanam, Sidra
- Subjects
FAULT diagnosis ,ONE-way analysis of variance ,ROLLER bearings ,MACHINE performance ,KRUSKAL-Wallis Test ,MACHINE learning ,FEATURE selection - Abstract
Purpose: Condition monitoring and fault diagnosis of Rolling Element Bearings (REBs) is a crucial and time-consuming task. The advantage of automating fault diagnosis using intelligent techniques is that it reduces the need for experienced and skilled personnel. The accuracy of fault detection methods is influenced by selecting the most discriminative, higher-ranked features from the original high-dimensional feature space. It is therefore significant to explore different feature ranking techniques and investigate their impact on the performance of Machine Learning (ML) techniques in diagnosing bearing faults. Method: This paper presents the influence of One-way ANOVA and Kruskal–Wallis feature ranking techniques on the performance of four ML classifiers, namely Decision Tree, SVM, KNN, and ANN, for fault classification in a ball bearing. For this, two open data sources, from Case Western Reserve University (CWRU) and Paderborn University (PU), are employed. The influence of feature ranking on the fault diagnostic performance of the classifiers is investigated through classification accuracy, training time, prediction speed, recall, precision, and F1-score. Results and Conclusion: It is observed that selecting distinctive features through feature ranking significantly influences the performance of ML models. The maximum fault classification accuracy is achieved with the top 15 features ranked by the Kruskal–Wallis test for all the classifiers. The accuracies attained are 98.6, 99.4, 96.9, and 97.8% for Decision Tree, SVM, KNN, and ANN, respectively, with the CWRU dataset. A similar trend is obtained with the PU dataset, giving corresponding maximum accuracies of 95.0, 97.8, 97.2, and 95.0%. The Kruskal–Wallis test outperforms One-way ANOVA for both data sources. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
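The Kruskal–Wallis ranking used above reduces to computing the H statistic per feature over the fault classes and keeping the top-ranked features. A minimal sketch with tie-aware average ranks; the vibration features and class labels are invented.

```python
def avg_ranks(values):
    """Average ranks (1-based), sharing ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_h(values, groups):
    """Kruskal-Wallis H statistic across the given group labels."""
    n = len(values)
    ranks = avg_ranks(values)
    h = 0.0
    for g in set(groups):
        r = [ranks[i] for i in range(n) if groups[i] == g]
        h += len(r) * (sum(r) / len(r)) ** 2
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)

# Toy vibration features over three fault classes (3 samples each).
classes = [0, 0, 0, 1, 1, 1, 2, 2, 2]
f_sep = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # class-separated feature: high H
f_mix = [1, 5, 9, 2, 6, 7, 3, 4, 8]   # interleaved feature: H near zero
print(round(kruskal_h(f_sep, classes), 2), round(kruskal_h(f_mix, classes), 2))
# → 7.2 0.0
```

Ranking features by H and keeping the top 15, as in the paper, is then a one-line sort over the per-feature statistics.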
50. Optimizing IoT intrusion detection system: feature selection versus feature extraction in machine learning.
- Author
-
Li, Jing, Othman, Mohd Shahizan, Chen, Hewan, and Yusuf, Lizawati Mi
- Subjects
INTRUSION detection systems (Computer security) ,FEATURE selection ,FEATURE extraction ,MACHINE learning ,INTERNET of things ,CYBERTERRORISM - Abstract
Internet of Things (IoT) devices are widely used but also vulnerable to cyberattacks that can cause security issues. To protect against this, machine learning approaches have been developed for network intrusion detection in IoT. These often use feature reduction techniques like feature selection or extraction before feeding data to models, which helps make detection efficient for real-time needs. This paper thoroughly compares feature extraction and selection for IoT network intrusion detection in a machine learning-based attack classification framework. It compares performance metrics such as accuracy, F1-score, and runtime on the heterogeneous Network TON-IoT dataset using binary and multiclass classification. Overall, feature extraction gives better detection performance than feature selection when the number of retained features is small. Moreover, extraction achieves less feature reduction than selection and is less sensitive to changes in the number of features. However, feature selection requires less model training and inference time than extraction, and offers more room to improve accuracy as the number of features changes. This holds for both binary and multiclass classification. The study provides guidelines for selecting appropriate intrusion detection methods for particular scenarios. Previously, such a comparison and recommendations on the heterogeneous TON-IoT dataset were lacking. Overall, the research presents a thorough comparison of feature reduction techniques for machine learning-driven intrusion detection in IoT networks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
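The selection-versus-extraction contrast above can be made concrete: selection keeps a subset of the original columns (interpretable, fast inference), while extraction builds new combined features from all columns. A toy sketch with variance-based selection and a class-mean-difference projection standing in for the paper's methods; the flow records are invented.

```python
def select_top_variance(rows, k):
    """Feature selection: keep the k highest-variance original columns."""
    n, m = len(rows), len(rows[0])
    def var(j):
        col = [r[j] for r in rows]
        mu = sum(col) / n
        return sum((x - mu) ** 2 for x in col) / n
    keep = sorted(range(m), key=var, reverse=True)[:k]
    return keep, [[r[j] for j in keep] for r in rows]

def extract_mean_diff(rows, labels):
    """Feature extraction: project onto the class-mean difference direction,
    producing one new feature that mixes all original columns."""
    m = len(rows[0])
    mean = lambda cls: [sum(r[j] for r, l in zip(rows, labels) if l == cls) /
                        labels.count(cls) for j in range(m)]
    m0, m1 = mean(0), mean(1)
    w = [a - b for a, b in zip(m1, m0)]
    return [sum(x * wj for x, wj in zip(r, w)) for r in rows]

# Toy flow records: feature 0 informative, feature 1 low-variance noise.
X = [[0.0, 0.5], [0.2, 0.5], [5.0, 0.6], [5.2, 0.6]]
y = [0, 0, 1, 1]
keep, _ = select_top_variance(X, 1)
z = extract_mean_diff(X, y)
print(keep)                              # → [0]
print(all(z[i] < z[2] for i in (0, 1)))  # → True (classes separate on z)
```

Note the trade-off the paper measures: the selected column is still a named, original feature, while the extracted `z` mixes every column and loses that interpretability.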
Discovery Service for Jio Institute Digital Library