Descriptor: "RANDOM forest algorithms" / Search Limiters: Available in Library Collection / Topic: deep learning - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "RANDOM forest algorithms" returned 686 results.

1. A novel calculation approach for price range prediction with general optimization of features using deep neural network and random forest algorithms.

Author: Reddy, M. Dinesh Kumar and Geetha, R.
Subjects: *ARTIFICIAL neural networks, *RANDOM forest algorithms, *PRICES, *DEEP learning, *VALUE (Economics)
Abstract: This study's goal is to estimate the price range of mobile devices using a Deep Neural Network algorithm and a Random Forest model. Two groups, the Deep Neural Network algorithm and the Random Forest model, were studied with a G-power of 0.8, using N = 10 sample iterations to test each model's accuracy in validating the mobile price range. The Deep Neural Network model reached an accuracy of 84.183 percent, whereas the Random Forest model reached 74.80 percent. The difference between the Deep Neural Network and the Random Forest was statistically significant (p = 0.005). According to the study findings for validating the price range of mobile devices, the Deep Neural Network technique outperforms the Random Forest approach in terms of performance and significance. [ABSTRACT FROM AUTHOR]
Published: 2024

2. MOSAIC: An Artificial Intelligence–Based Framework for Multimodal Analysis, Classification, and Personalized Prognostic Assessment in Rare Cancers.

Author: D'Amico, Saverio, Dall'Olio, Lorenzo, Rollo, Cesare, Alonso, Patricia, Prada-Luengo, Iñigo, Dall'Olio, Daniele, Sala, Claudia, Sauta, Elisabetta, Asti, Gianluca, Lanino, Luca, Maggioni, Giulia, Campagna, Alessia, Zazzetti, Elena, Delleani, Mattia, Bicchieri, Maria Elena, Morandini, Pierandrea, Savevski, Victor, Arroyo, Borja, Parras, Juan, and Zhao, Lin Pierre
Subjects: *ARTIFICIAL intelligence, *FEDERATED learning, *MULTIMODAL user interfaces, *CLASSIFICATION, *HEMATOLOGIC malignancies, *RANDOM forest algorithms, *DEEP learning
Abstract: PURPOSE: Rare cancers constitute over 20% of human neoplasms, often affecting patients with unmet medical needs. The development of effective classification and prognostication systems is crucial to improve the decision-making process and drive innovative treatment strategies. We have created and implemented MOSAIC, an artificial intelligence (AI)–based framework designed for multimodal analysis, classification, and personalized prognostic assessment in rare cancers. Clinical validation was performed on myelodysplastic syndrome (MDS), a rare hematologic cancer with clinical and genomic heterogeneities. METHODS: We analyzed 4,427 patients with MDS divided into training and validation cohorts. Deep learning methods were applied to integrate and impute clinical/genomic features. Clustering was performed by combining Uniform Manifold Approximation and Projection for Dimension Reduction + Hierarchical Density-Based Spatial Clustering of Applications with Noise (UMAP + HDBSCAN) methods and compared with the conventional Hierarchical Dirichlet Process (HDP). Linear and AI-based nonlinear approaches were compared for survival prediction. Explainable AI (Shapley Additive Explanations approach [SHAP]) and federated learning were used to improve the interpretation and the performance of the clinical models, integrating them into a distributed infrastructure. RESULTS: UMAP + HDBSCAN clustering obtained a more granular patient stratification, achieving a higher average silhouette coefficient (0.16) with respect to HDP (0.01) and higher balanced accuracy in cluster classification by Random Forest (92.7% ± 1.3% and 85.8% ± 0.8%). AI methods for survival prediction outperform conventional statistical techniques and the reference prognostic tool for MDS. Nonlinear Gradient Boosting Survival stands out in both internal validation (Concordance-Index [C-Index], 0.77; SD, 0.01) and external validation (C-Index, 0.74; SD, 0.02).
SHAP analysis revealed that similar features drove patients' subgroups and outcomes in both training and validation cohorts. Federated implementation improved the accuracy of developed models. CONCLUSION: MOSAIC provides an explainable and robust framework to optimize classification and prognostic assessment of rare cancers. AI-based approaches demonstrated superior accuracy in capturing genomic similarities and providing individual prognostic information compared with conventional statistical methods. Its federated implementation ensures broad clinical application, guaranteeing high performance and data protection. MOSAIC: AI-based Framework for Multi-Modal Analysis, Classification and Prognostic Assessment in Rare Cancers. [ABSTRACT FROM AUTHOR]
Published: 2024
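The C-Index figures above summarize how well a survival model ranks patients by risk. As an illustration only (not the MOSAIC code), here is a minimal sketch of Harrell's concordance index, assuming right-censored data where a higher risk score means an earlier expected event, and simply skipping pairs with tied follow-up times:

```python
from itertools import combinations


def concordance_index(times, risks, events):
    """Harrell's C-index: fraction of comparable pairs ordered correctly by risk.

    times:  observed follow-up times
    risks:  predicted risk scores (higher = event expected sooner)
    events: 1 if the event was observed, 0 if the patient was censored
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the shorter time ends in an observed event;
        # tied times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half-concordant
    return concordant / comparable if comparable else float("nan")
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking. Survival libraries provide tested implementations that also handle tied times; this version only makes the pairwise definition concrete.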

3. Deep random forest with ferroelectric analog content addressable memory.

Author: Xunzhao Yin, Müller, Franz, Laguna, Ann Franchesca, Chao Li, Qingrong Huang, Zhiguo Shi, Lederer, Maximilian, Laleni, Nellie, Shan Deng, Zijian Zhao, Imani, Mohsen, Yiyu Shi, Niemier, Michael, Xiaobo Sharon Hu, Cheng Zhuo, Kämpfe, Thomas, and Kai Ni
Subjects: *RANDOM forest algorithms, *ARTIFICIAL neural networks, *FIELD-effect transistors, *FERROELECTRICITY, *DEEP learning
Abstract: Deep random forest (DRF), which combines deep learning and random forest, offers accuracy comparable to that of deep neural networks (DNNs) in edge intelligence tasks, together with interpretability and low memory and computational overhead. However, efficient DRF accelerators lag behind their DNN counterparts. The key to DRF acceleration lies in realizing the branch-split operation at decision nodes. In this work, we propose implementing DRF through associative searches realized with ferroelectric analog content addressable memory (ACAM). Using only two ferroelectric field-effect transistors (FeFETs), the ultra-compact ACAM cell performs energy-efficient branch-split operations by storing decision boundaries as analog polarization states in FeFETs. The DRF accelerator architecture and its model mapping to ACAM arrays are presented. The functionality, characteristics, and scalability of the FeFET ACAM DRF and its robustness against FeFET device non-idealities are validated in experiments and simulations. Evaluations show that the FeFET ACAM DRF accelerator achieves ~10^6x/10x and ~10^6x/2.5x improvements in energy and latency, respectively, compared to other DRF hardware implementations on state-of-the-art CPU/ReRAM. [ABSTRACT FROM AUTHOR]
Published: 2024
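The branch-split idea behind the ACAM mapping can be sketched in software. Every root-to-leaf path of a decision tree is a conjunction of per-feature intervals, so each path can occupy one memory row that stores those bounds, and inference becomes a search for the row whose intervals all contain the input. The toy tree and the function names below are hypothetical. The rows are scanned sequentially here rather than searched in parallel, and nothing about FeFET devices is modeled:

```python
import math

# Hypothetical toy tree: internal nodes are (feature, threshold, left, right),
# leaves are class labels. A sample goes left when x[feature] <= threshold.
TREE = (0, 0.5,
        (1, 0.3, "A", "B"),
        "C")


def tree_to_rows(node, n_features, bounds=None):
    """Flatten each root-to-leaf path into one row of per-feature intervals (lo, hi]."""
    if bounds is None:
        bounds = [(-math.inf, math.inf)] * n_features
    if not isinstance(node, tuple):  # leaf: emit the accumulated decision boundaries
        return [(list(bounds), node)]
    feature, threshold, left, right = node
    lo, hi = bounds[feature]
    left_bounds = list(bounds)
    left_bounds[feature] = (lo, min(hi, threshold))   # x <= threshold
    right_bounds = list(bounds)
    right_bounds[feature] = (max(lo, threshold), hi)  # x > threshold
    return (tree_to_rows(left, n_features, left_bounds)
            + tree_to_rows(right, n_features, right_bounds))


def acam_search(rows, x):
    """Emulate the associative search: return the label of the row whose
    every stored interval contains the corresponding input feature."""
    for intervals, label in rows:
        if all(lo < v <= hi for (lo, hi), v in zip(intervals, x)):
            return label
    return None
```

Because the rows of one tree are mutually exclusive, at most one row matches any input, which is what lets hardware evaluate all paths at once instead of walking the tree node by node.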

4. Improved Classification of Coastal Wetlands in Yellow River Delta of China Using ResNet Combined with Feature-Preferred Bands Based on Attention Mechanism.

Author: Li, Yirong, Yu, Xiang, Zhang, Jiahua, Zhang, Shichao, Wang, Xiaopeng, Kong, Delong, Yao, Lulu, and Lu, He
Subjects: *COASTAL wetlands, *STORM surges, *MACHINE learning, *DEEP learning, *CLASSIFICATION, *RANDOM forest algorithms, *WETLANDS
Abstract: The Yellow River Delta wetlands in China are part of a coastal wetland ecosystem and are among the youngest and most characteristic wetlands in the world. Because these wetlands are constantly reshaped by inland sediment and by waves and storm surges, accurate classification of the coastal wetlands in the Yellow River Delta is of great significance for the rational use, development, and protection of wetland resources. In this study, Sentinel-2 multispectral data of the Yellow River Delta were processed by super-resolution synthesis, and the feature bands were optimized. The optimal feature-band combination was screened using the OIF algorithm. A deep learning model, the attention mechanism ResNet (AM_ResNet), is proposed; it builds on the optimized features and integrates an attention mechanism into the ResNet network. Compared with classical machine learning models, the AM_ResNet model effectively improves the classification accuracy of the Yellow River Delta wetlands. It achieved an overall accuracy of 94.61% and a Kappa of 0.93, improvements of about 6.99% and 0.1, respectively, over Random Forest, the best-performing classical machine learning classifier. The results show that the method can effectively improve the classification accuracy of the wetlands in the Yellow River Delta. [ABSTRACT FROM AUTHOR]
Published: 2024
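Overall accuracy and the Kappa coefficient are both computed from a confusion matrix. A minimal sketch of Cohen's kappa, which discounts the agreement expected by chance, assuming rows hold true classes and columns hold predicted classes:

```python
def overall_accuracy_and_kappa(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: the diagonal share, i.e. overall accuracy.
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if predictions were independent of the truth.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

For example, a hypothetical two-class matrix [[45, 5], [10, 40]] gives an overall accuracy of 0.85 and a kappa of about 0.7, because the marginal totals alone would already produce 50% agreement by chance.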

5. Forecasting material quantity using machine learning and times series techniques.

Author: Zermane, Hanane, Madjour, Hassina, Ziar, Ahcene, and Zermane, Abderrahim
Subjects: *TIME series analysis, *DEEP learning, *FORECASTING, *RANDOM forest algorithms, *MANUFACTURING processes, *MACHINE learning
Abstract: The current research is dedicated to harnessing cutting-edge technologies within the paradigm of Industry 5.0. The objective is to capitalize on advancements in Machine and Deep Learning techniques. This research endeavors to construct robust predictive models from historical data for precise real-time estimation of material quantities within a cement workshop. The Machine Learning regressors were evaluated on several metrics: SVR (R-squared 0.9739, MAE 0.0403), Random Forest (R-squared 0.9990, MAE 0.0026), MLP (R-squared 0.9890, MAE 0.0255), and Gradient Boosting (R-squared 0.9989, MAE 0.0042). The time series models LSTM and GRU yielded R-squared 0.9978, MAE 0.0100, and R-squared 0.9980, MAE 0.0099, respectively. The ultimate outcomes include improved and efficient production, optimization of production processes, streamlined operations, reduced downtime, mitigation of potential disruptions, and the facilitation of the factory's evolution towards intelligent manufacturing processes embedded within the framework of Industry 5.0. These achievements underscore the potential impact of leveraging advanced machine learning techniques to enhance the operational dynamics and overall efficiency of manufacturing facilities. [ABSTRACT FROM AUTHOR]
Published: 2024
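The regressors above are compared on R-squared and mean absolute error (MAE). A minimal sketch of both metrics in plain Python, shown only to make the definitions concrete:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For instance, true values [1, 2, 3, 4] and predictions [1.1, 1.9, 3.2, 3.8] give an MAE of about 0.15 and an R-squared of about 0.98. MAE is in the units of the target, so it is only comparable across models evaluated on the same (here apparently normalized) scale.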

6. Prediction of Wave Spectral Parameters Using Multiple-Output Regression Models to Support the Execution of Marine Operations.

Author: Prócel, Jonathan, Alarcón, Marco Guamán, and Guachamin-Acero, Wilson
Subjects: *REGRESSION analysis, *SCIENTIFIC literature, *RANDOM forest algorithms, *TECHNICAL reports, *FORECASTING, *DEEP learning
Abstract: Execution of a marine operation (MO) requires coordinated actions of several vessels conducting simultaneous and sequential offshore activities. These activities have operational limits given in terms of environmental parameters. Wave parameters are important because of their high energy level. During the execution of an MO, forecast wave spectral parameters, i.e., significant wave height (Hs), peak period (Tp), and peak direction, are used to make on-board decisions. For critical operations, forecasts can be complemented with buoy measurements. This paper proposes using synthetic statistics of vessel dynamic responses to predict "real-time" wave spectral parameters with multi-output machine learning (ML) regression algorithms. For a case study of a vessel with no forward speed, the random forest model is observed to predict Hs and Tp accurately. The prediction of wave direction is not very accurate, but it can be corrected with on-board observations. The random forest model performs well; it is efficient, useful for practical purposes, and comparable with other deep learning models reported in the scientific literature. Findings from this research can be valuable for real-time assessment of wave spectral parameters, which are necessary to support decision-making during the execution of MOs. [ABSTRACT FROM AUTHOR]
Published: 2024

7. Evaluating Multi-target Regression Framework for Dynamic Condition Prediction in Wellbore.

Author: Keshavarz, Sahar, Elmgerbi, Asad, Vita, Petr, and Thonhauser, Gerhard
Subjects: *MACHINE learning, *DEEP learning, *DRILL stem, *GAUSSIAN processes, *RANDOM forest algorithms, *STOCHASTIC processes
Abstract: In recent years, the focus has shifted towards leveraging physics-based modelling and data-driven analysis to predict drilling incidents and anomalies in real time, with the goal of reducing non-productive periods. However, much of this attention has been directed at specific drilling operations such as drilling and tripping, leaving other vital processes, such as wellbore conditioning, comparatively overlooked. The primary objective of this study is to employ data-driven techniques to predict the dynamic state of the wellbore using sensor data, operating parameters, and surface measurements. Accurate predictions are pivotal for automating these processes, promising significant savings in both redundant time and associated costs and ultimately elevating operational efficiency. In this research, surface drilling parameters such as flowrate, rotation speed, block position, and drill string length are combined with surface measurements such as hookload, pressure, and torque during the wellbore conditioning operation to predict further surface sensor measurements. Different parameter settings are evaluated to find the best approach, and six supervised learning algorithms are compared to select the best prediction method. The findings reveal that considering all surface parameters and measurements yields the most accurate predictions. Among various single- and multi-target regression methods, including deep learning approaches, the Gaussian process and random forest models exhibit the lowest prediction errors. By reliably predicting and understanding wellbore behaviour, this research paves the way for more efficient and autonomous drilling operations in the future, bridging a critical gap in the industry's automation capabilities. [ABSTRACT FROM AUTHOR]
Published: 2024

8. A model for skin cancer using combination of ensemble learning and deep learning.

Author: Hosseinzadeh, Mehdi, Hussain, Dildar, Zeki Mahmood, Firas Muhammad, A. Alenizi, Farhan, Varzeghani, Amirhossein Noroozi, Asghari, Parvaneh, Darwesh, Aso, Malik, Mazhar Hussain, and Lee, Sang-Woong
Subjects: *SKIN cancer, *DEEP learning, *FEATURE selection, *MACHINE learning, *RANDOM forest algorithms, *SURVIVAL rate
Abstract: Skin cancer affects the lives of many individuals every year and is recognized as the most prevalent type of cancer. In the United States, an estimated 3.5 million people are diagnosed with skin cancer annually, underscoring its widespread prevalence. Furthermore, survival rates decline substantially for individuals with advanced stages of skin cancer. This paper aims to help healthcare experts distinguish between benign and malignant skin cancer cases by employing a range of machine learning and deep learning techniques, together with different feature extractors and feature selectors, to improve the evaluation metrics. Different transfer learning models are employed as feature extractors, and a feature selection layer is designed that includes diverse techniques such as Univariate, Mutual Information, ANOVA, PCA, XGB, Lasso, Random Forest, and Variance. Among the transfer models, DenseNet-201 was selected as the primary feature extractor. The Lasso method was then applied for feature selection, and diverse machine learning approaches such as MLP, XGB, RF, and NB were used for classification. To optimize accuracy and precision, ensemble methods were employed to identify and enhance the best-performing models. The study reports accuracy and sensitivity rates of 87.72% and 92.15%, respectively. [ABSTRACT FROM AUTHOR]
Published: 2024
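The abstract does not specify which ensemble method was used. Soft voting, which averages the class probabilities produced by several classifiers, is one common choice; the sketch below assumes it, and the function name and optional weighting scheme are illustrative rather than taken from the paper:

```python
def soft_vote(prob_lists, weights=None):
    """Average per-class probabilities from several models and pick the top class.

    prob_lists: one list of class probabilities per model,
                e.g. [P(benign), P(malignant)]
    weights:    optional per-model weights (defaults to equal weighting)
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: averaged[c])
    return best, averaged
```

With three hypothetical models outputting [0.6, 0.4], [0.3, 0.7], and [0.45, 0.55], the averaged probabilities are about [0.45, 0.55], so class 1 is chosen even though one model leaned the other way.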

9. An optimal and smart E-waste collection using neural network based on sine cosine optimization.

Author: Ravi, Srivel, Venkatesan, S., Arun kumar, and Lakshmi Kanth Reddy, K.
Subjects: *CONVOLUTIONAL neural networks, *ELECTRONIC waste, *RANDOM forest algorithms, *ELECTRONIC waste management, *POLLUTION, *ANIMAL health, *DEEP learning
Abstract: Electronic waste (e-waste) is a major issue the world faces today. E-waste causes various health problems for animals as well as humans and results in environmental pollution in developing countries such as India. To overcome these issues, an e-waste collection scheme is proposed using the dynamic sine cosine-based neural network optimization (DSCNN) approach. The approach is aimed at collecting waste from individuals, taking advantage of the widespread adoption and use of smartphones. To improve waste collection planning, residents upload a photograph of their waste to the waste collection company's server, which automatically recognizes and categorizes the image. A new classification and detection scheme using the DSCNN approach is proposed for efficient e-waste collection planning; it correctly detects the type and quantity of waste components in images. The uploaded images are identified and classified with high accuracy, and the method is applied to the e-waste collection process in various streets and buildings in Maharashtra, India. Experimental results show that the proposed approach readily achieves proper allocation of collection vehicles, vehicle routing plans, and household e-waste collection, resulting in reduced collection costs. Moreover, the proposed DSCNN method is compared with various other methods such as the random forest algorithm (RFA), fractional Henry gas optimization (FHGO), the behavior-based swarm model by the fuzzy controller (BSFC), and the deep learning convolutional neural network (DL-CNN). The DSCNN approach yielded an e-waste collection detection accuracy of 97%, while DL-CNN, FHGO, BSFC, and RFA achieved accuracies of 94%, 95%, 93%, and 92.15%, respectively. [ABSTRACT FROM AUTHOR]
Published: 2024

10. Autoencoder-based feature extraction for the automatic detection of snow avalanches in seismic data.

Author: Simeon, Andri, Pérez-Guillén, Cristina, Volpi, Michele, Seupel, Christine, and Herwijnen, Alec van
Subjects: *FEATURE extraction, *AVALANCHES, *DEEP learning, *CLASSIFICATION algorithms, *DATABASES, *RANDOM forest algorithms, *MACHINE learning
Abstract: Monitoring snow avalanche activity is essential for operational avalanche forecasting and the successful implementation of mitigation measures to ensure safety in mountain regions. To facilitate and automate the monitoring process, avalanche detection systems equipped with seismic sensors can provide a cost-effective solution. Still, automatically differentiating avalanche signals from other sources in seismic data remains challenging, mainly due to the complexity of seismic signals generated by avalanches, the complex signal transmission through the ground, the relatively rare occurrence of avalanches, and the presence of multiple sources in the continuous seismic data. One approach to automate avalanche detection is by applying machine learning methods. So far, research in this area has mainly focused on extracting standard domain-specific signal attributes in the time and frequency domains as input features for statistical models. In this study, we propose a novel application of deep learning autoencoder models for the automatic and unsupervised extraction of features from seismic recordings. These new features are then fed into classifiers for discriminating snow avalanches. To this end, we trained three Random forest classifiers based on different feature extraction approaches. The first set of 32 features was automatically extracted from the time-series signals by an autoencoder consisting of convolutional layers and a recurrent long short-term memory unit. The second autoencoder applies a series of fully connected layers to extract 16 features from the spectrum of the signals. As a benchmark, a third random forest was trained with typical waveform, spectral and spectrogram attributes used to discriminate seismic events. We extracted all these features from 10-second windows of the seismograms recorded with an array of five seismometers installed in an avalanche test site located above Davos, Switzerland. 
The database used to train and test the models contained 84 avalanches and 828 noise events (unrelated to avalanches) recorded during the winter seasons of 2020–2021 and 2021–2022. Finally, we assessed the performance of each classifier, compared the results, and proposed different aggregation methods to improve the predictive performance of the developed seismic detection algorithms. The classifiers achieved an avalanche F1-score of 0.61 (seismic attributes), 0.49 (temporal autoencoder), and 0.60 (spectral autoencoder) and an avalanche recall of 0.68, 0.71, and 0.71, respectively. Overall, the macro F1-score ranged from 0.70 (temporal autoencoder) to 0.78 (seismic attributes). After a post-processing step was applied to event-based predictions, the avalanche recall of the three models increased significantly, reaching values between 0.82 and 0.91. The developed approach could potentially be used as an operational, near-real-time avalanche detection system. However, the relatively high number of false alarms means that the current automated seismic classification algorithms need further improvement before they can be used as the sole method for detecting avalanches effectively. [ABSTRACT FROM AUTHOR]
Published: 2024
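The avalanche F1-score, recall, and macro F1-score reported above are per-class and class-averaged metrics. A minimal sketch, with "avalanche" treated as just another label; the macro average weights every class equally, so a weak score on the rare avalanche class counts as much as the score on the far more numerous noise class:

```python
def per_class_f1(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, lab)[2] for lab in labels) / len(labels)
```

Recall is the share of true avalanches that were detected, while precision drops with every false alarm, which is why the false-alarm problem described above shows up in the F1-score even when recall is high.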

11. A hybrid feature weighted attention based deep learning approach for an intrusion detection system using the random forest algorithm.

Author: Hashmi, Arshad, Barukab, Omar M., and Hamza Osman, Ahmad
Subjects: *INTRUSION detection systems (Computer security), *RANDOM forest algorithms, *DEEP learning, *COMPUTER network traffic, *TELECOMMUNICATION, *COMPUTER network security, *COMPUTER networks
Abstract: Due to recent advances in Internet and communication technologies, network systems and data have evolved rapidly. The emergence of new attacks jeopardizes network security and makes intrusions very difficult to detect. Multiple network attacks by an intruder are unavoidable. Our research targets the critical issue of class imbalance in intrusion detection, which reflects the real-world scenario where legitimate network activities significantly outnumber malicious ones. This imbalance can adversely affect the learning process of predictive models, often resulting in high false-negative rates, a major concern in Intrusion Detection Systems (IDS). By focusing on datasets with this imbalance, we aim to develop and refine advanced algorithms and techniques, such as anomaly detection, cost-sensitive learning, and oversampling methods, to effectively handle such disparities. The primary goal is to create models that are highly sensitive to intrusions while minimizing false alarms, an essential aspect of effective IDS. This approach is not only practical for real-world applications but also enhances the theoretical understanding of managing class imbalance in machine learning. By addressing these significant challenges, our research is positioned to make substantial contributions to cybersecurity, providing valuable insights and applicable solutions in the fight against digital threats and ensuring robustness and relevance in IDS development. An IDS checks network traffic to protect its security, availability, and confidentiality. Despite the efforts of many researchers, contemporary IDSs still need to further improve detection accuracy, reduce false alarms, and detect new intrusions. The mean convolutional layer (MCL), feature-weighted attention (FWA) learning, a bidirectional long short-term memory (BILSTM) network, and the random forest algorithm are all parts of our unique hybrid model called MCL-FWA-BILSTM.
The CNN-MCL layer for feature extraction receives data after preprocessing. After the convolution, pooling, and flattening phases, feature vectors are obtained. The BI-LSTM and self-attention feature weights are used in the suggested method to mitigate the effects of class imbalance. The attention layer and the BI-LSTM features are concatenated to create mapped features before they are fed to the random forest algorithm for classification. Our methodology and model performance were validated using NSL-KDD and UNSW-NB15, two widely available IDS datasets. The suggested model's accuracies on binary and multi-class classification tasks using the NSL-KDD dataset are 99.67% and 99.88%, respectively. The model's binary and multi-class classification accuracies on the UNSW-NB15 dataset are 99.56% and 99.45%, respectively. Further, we compared the suggested approach with previous machine learning and deep learning models and found that it outperforms them in detection rate, FPR, and F-score. For both binary and multiclass classification, the proposed method reduces false positives while increasing the number of true positives. The model proficiently identifies diverse network intrusions on computer networks and accomplishes its intended purpose. The suggested model will be helpful in a variety of network security research fields and applications. [ABSTRACT FROM AUTHOR]
Published: 2024
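Oversampling is one of the class-imbalance techniques the abstract lists. Its simplest form, random oversampling, duplicates minority-class examples until the classes are balanced. It is shown here as a generic sketch, not as part of the MCL-FWA-BILSTM pipeline, which mitigates imbalance with BI-LSTM and self-attention feature weights; the function name and the fixed seed are illustrative:

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)  # fixed seed keeps the duplication reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for label, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == label]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(label)
    return out_x, out_y
```

Oversampling should be applied only to the training split, so that duplicated attack records do not leak into the evaluation data and inflate the reported scores.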

12. AI-based disease category prediction model using symptoms from low-resource Ethiopian language: Afaan Oromo text.

Author: Dinsa, Etana Fikadu, Das, Mrinal, and Abebe, Teklu Urgessa
Subjects: *MACHINE learning, *DEEP learning, *ARTIFICIAL intelligence, *PREDICTION models, *SUPPORT vector machines, *RANDOM forest algorithms
Abstract: Automated disease diagnosis and prediction, powered by AI, play a crucial role in enabling medical professionals to deliver effective care to patients. While such predictive tools have been extensively explored in resource-rich languages such as English, this manuscript focuses on automatically predicting disease categories from symptoms documented in the Afaan Oromo language, employing various classification algorithms. This study encompasses machine learning techniques such as support vector machines, random forests, logistic regression, and Naïve Bayes, as well as deep learning approaches including LSTM, GRU, and Bi-LSTM. Due to the unavailability of a standard corpus, we prepared three data sets with different numbers of patient symptoms arranged into 10 categories. Two feature representations, TF-IDF and word embedding, were employed. The performance of the proposed methodology was evaluated using accuracy, recall, precision, and F1 score. The experimental results show that, among the machine learning models, the SVM model using TF-IDF had the highest accuracy and F1 score of 94.7%, while among the deep learning models, the LSTM model using word2vec embedding achieved an accuracy of 95.7% and an F1 score of 96.0%. Several hyperparameter settings were tuned to optimize the performance of each model. This study shows that the LSTM model proves to be the best of all the models over the entire dataset. [ABSTRACT FROM AUTHOR]
Published: 2024
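TF-IDF weights a term by how often it appears in a document and how rare it is across the corpus. A minimal sketch using raw-frequency TF and an unsmoothed IDF of log(N / df); libraries often use smoothed or normalized variants, and the input is assumed to be already tokenized:

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = log(N / df), where df = number of documents containing the term
    """
    n_docs = len(documents)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that appears in every document gets an IDF of log(1) = 0, so it contributes nothing to distinguishing one disease category from another, while rare symptom terms receive the largest weights.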

13. Explainable AI: Machine Learning Interpretation in Blackcurrant Powders.

Author: Przybył, Krzysztof
Subjects: *MACHINE learning, *DEEP learning, *ARTIFICIAL intelligence, *RANDOM forest algorithms, *DECISION trees, *DATA mining, *POWDERS
Abstract: Recently, explainability in machine and deep learning has become an important area of research and interest, due both to the increasing use of artificial intelligence (AI) methods and to the need to understand the decisions made by models. Interest in explainable artificial intelligence (XAI) stems from growing awareness of, among other things, data mining, error elimination, and learning performance across various AI algorithms. Moreover, XAI can make the decisions that models reach on a given problem more transparent and effective. In this study, models from the 'glass box' group (Decision Tree, among others) and the 'black box' group (Random Forest, among others) were proposed to understand the identification of selected types of currant powders. The learning process of these models was evaluated with indicators such as accuracy, precision, recall, and F1-score. It was visualized using Local Interpretable Model-agnostic Explanations (LIME) to predict the effectiveness of identifying specific types of blackcurrant powders based on texture descriptors such as entropy, contrast, correlation, dissimilarity, and homogeneity. Bagging (Bagging_100), Decision Tree (DT0), and Random Forest (RF7_gini) proved to be the most effective models in terms of currant powder interpretability. The accuracy, precision, recall, and F1-score of Bagging_100 each reached approximately 0.979. In comparison, DT0 reached values of 0.968, 0.972, 0.968, and 0.969, and RF7_gini reached values of 0.963, 0.964, 0.963, and 0.963. These models achieved classifier performance measures greater than 96%. In the future, XAI using agnostic models can be an additional important tool to help analyze data, including data on food products, even online. [ABSTRACT FROM AUTHOR]
Published: 2024
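The suffix in RF7_gini suggests that the random forest used the Gini impurity split criterion; that reading is an assumption. A minimal sketch of how a single tree node might choose a threshold on one texture descriptor (for example, entropy) by minimizing the weighted Gini impurity of the two child nodes:

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split on one feature."""
    best = (None, float("inf"))
    n = len(labels)
    # Candidate thresholds: every distinct value except the largest; x <= t goes left.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best
```

A random forest repeats this search at every node, but only over a random subset of features each time, and averages many such trees trained on bootstrap samples.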

14. Natural language processing augments comorbidity documentation in neurosurgical inpatient admissions.

Author: Sastry, Rahul A., Setty, Aayush, Liu, David D., Zheng, Bryan, Ali, Rohaid, Weil, Robert J., Roye, G. Dean, Doberstein, Curtis E., Oyelese, Adetokunbo A., Niu, Tianyi, Gokaslan, Ziya L., and Telfeian, Albert E.
Subjects: *NATURAL language processing, *DEEP learning, *MACHINE learning, *RANDOM forest algorithms, *CEREBRAL edema, *COMORBIDITY
Abstract: Objective: To establish whether or not a natural language processing technique could identify two common inpatient neurosurgical comorbidities using only text reports of inpatient head imaging. Materials and methods: A training and testing dataset of reports of 979 CT or MRI scans of the brain for patients admitted to the neurosurgery service of a single hospital in June 2021 or to the Emergency Department between July 1–8, 2021, was identified. A variety of machine learning and deep learning algorithms utilizing natural language processing were trained on the training set (84% of the total cohort) and tested on the remaining images. A subset comparison cohort (n = 76) was then assessed to compare output of the best algorithm against real-life inpatient documentation. Results: For "brain compression", a random forest classifier outperformed other candidate algorithms with an accuracy of 0.81 and area under the curve of 0.90 in the testing dataset. For "brain edema", a random forest classifier again outperformed other candidate algorithms with an accuracy of 0.92 and AUC of 0.94 in the testing dataset. In the provider comparison dataset, for "brain compression," the random forest algorithm demonstrated better accuracy (0.76 vs 0.70) and sensitivity (0.73 vs 0.43) than provider documentation. For "brain edema," the algorithm again demonstrated better accuracy (0.92 vs 0.84) and AUC (0.45 vs 0.09) than provider documentation. Discussion: A natural language processing-based machine learning algorithm can reliably and reproducibly identify selected common neurosurgical comorbidities from radiology reports. Conclusion: This result may justify the use of machine learning-based decision support to augment provider documentation. [ABSTRACT FROM AUTHOR]
Published: 2024
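The AUC values above can be read as the probability that the classifier scores a randomly chosen positive report higher than a randomly chosen negative one. A minimal sketch of that pairwise (Mann-Whitney) formulation, assuming binary labels coded as 1 and 0 and ties counted as half:

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive outscores a
    randomly chosen negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The quadratic pair loop is adequate for a few hundred reports; a rank-based computation gives the same value and scales better to larger test sets.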
Full Text: View/download PDF

15. Application of Deep Learning in Sea Surface Height Estimation of GNSS Data Sets.

Author: Su, Yucheng, Fu, Shuai, Jiao, Boyang, Su, Yekang, Mao, Taoning, He, Yuping, and Jiang, Yi
Subjects: *CONVOLUTIONAL neural networks, *GLOBAL Positioning System, *DEEP learning, *STANDARD deviations, *RANDOM forest algorithms
Abstract: In this work, we used the convolutional neural network (CNN) method to invert sea surface height (SSH) from Global Navigation Satellite System (GNSS) delay-Doppler map (DDM) data collected during 2009–2017 and compared the CNN inversion results with those obtained from the traditional random forest (RF) method. SSH observations from the OSTM/Jason-2 satellite were used to judge the merits of the two methods. The results show that both methods yield good SSH inversion results. With a training set of 9000 samples, the root mean square errors of the SSH inversion results based on the CNN and RF methods are 16.78 and 15.96, respectively; as the training set grows beyond 9000 samples, the CNN method becomes significantly more accurate than the RF method. This suggests that SSH inversion based on the CNN method will become more advantageous as more data become available. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

16. Comparing of brain tumor diagnosis with developed local binary patterns methods.

Author: Gül, Mehmet and Kaya, Yılmaz
Subjects: *CANCER diagnosis, *CLASSIFICATION algorithms, *BRAIN tumors, *DIAGNOSTIC imaging, *RANDOM forest algorithms
Abstract: A brain tumor is one of the most lethal diseases affecting human health. Invasive biopsy is one of the most common methods of identifying brain tumors, but bleeding may occur during the procedure and could harm some brain functions, so the invasive biopsy process may be extremely dangerous. To avoid this risk, experts in the field can use medical imaging techniques to conduct a thorough examination and obtain detailed information about the type and stage of the disease. The dataset examined in this study consisted of brain images with tumors and brain images of normal patients. Numerous earlier studies on medical images have achieved high accuracy with hybrid model algorithms. In the developed model, the dataset's images were transformed using three distinct local binary pattern (LBP) algorithms: LBP, step-LBP (nLBP), and angle-LBP (αLBP). In the second stage, classification algorithms were used to evaluate the results from the LBP, nLBP, and αLBP algorithms. Of the 11 classification algorithms tested, the four that produced the best results in the experiments were retained: random forest (RF), optimized forest (OF), rotation forest (RF), and instance-based learner (IBk). With the developed model, the IBk algorithm achieved an extremely high success rate of 99.12%. Consequently, clinical services can use the developed method to diagnose tumors from medical images. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
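The abstract names the LBP operator without spelling it out. As a hedged illustration, the sketch below implements only the classic 3×3, 8-neighbour LBP (not the paper's nLBP or αLBP variants, whose parameters are not given here): each neighbour at least as bright as the centre pixel sets one bit, and the histogram of the resulting 0–255 codes can serve as a feature vector for classifiers such as random forest or IBk. The function names and the clockwise bit order starting at the top-left neighbour are assumptions for illustration.

```python
def lbp_code(img, r, c):
    """Basic 8-neighbour LBP code of interior pixel (r, c) in a 2-D list."""
    center = img[r][c]
    # Neighbours clockwise from the top-left; neighbour i sets bit i.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels (usable as features)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

A perfectly flat patch yields code 255 (every neighbour ties with the centre), while a bright isolated centre pixel yields code 0.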

17. Mapping the Continuous Cover of Invasive Noxious Weed Species Using Sentinel-2 Imagery and a Novel Convolutional Neural Regression Network.

Author: Xing, Fei, An, Ru, Guo, Xulin, and Shen, Xiaoji
Subjects: *CONVOLUTIONAL neural networks, *NOXIOUS weeds, *MOUNTAIN ecology, *GRASSLANDS, *ECOSYSTEMS, *POISONOUS plants, *RANDOM forest algorithms
Abstract: Invasive noxious weed species (INWS) are typical poisonous plants and forbs that are considered an increasing threat to the native alpine grassland ecosystems of the Qinghai–Tibetan Plateau (QTP). Accurate knowledge of the continuous cover of INWS across complex alpine grassland ecosystems over a large scale is required for their control and management. However, the co-occurrence of INWS and native grass species results in highly heterogeneous grass communities and produces mixed pixels in remotely sensed data, which causes uncertainty in classification. Continuous INWS cover at the pixel level has not yet been mapped. In this study, objective 1 was to test the capability of Sentinel-2 imagery for estimating continuous INWS cover across complex alpine grasslands over a large scale, and objective 2 was to assess the performance of a state-of-the-art convolutional neural network-based regression (CNNR) model in estimating continuous INWS cover. To this end, a novel CNNR model and a random forest regression (RFR) model were evaluated for estimating continuous INWS cover from Sentinel-2 imagery. The CNNR model estimated continuous INWS cover directly from Sentinel-2 imagery with an R2 ranging from 0.88 to 0.93. The RFR model combined with multiple features achieved comparable accuracy, slightly lower than that of the CNNR model, with an R2 of approximately 0.85. Twelve features related to the green, red-edge, and near-infrared bands contributed importantly to the RFR model. Our results demonstrate that the CNNR model performs well when estimating continuous INWS cover directly from Sentinel-2 imagery, and that the RFR model combined with multiple features derived from Sentinel-2 imagery can also be used for mapping continuous INWS cover. Sentinel-2 imagery is suitable for mapping continuous INWS cover across complex alpine grasslands over a large scale. Our research provides information for the advanced mapping of the continuous cover of invasive species across complex grassland ecosystems or, more widely, terrestrial ecosystems over large spatial areas using remote sensors such as Sentinel-2. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

18. A Transformer and LSTM-Based Approach for Blind Well Lithology Prediction.

Author: Xie, Danyan, Liu, Zeyang, Wang, Fuhao, and Song, Zhenyu
Subjects: *DEEP learning, *K-nearest neighbor classification, *PETROLOGY, *NATURAL gas prospecting, *PETROLEUM prospecting, *RANDOM forest algorithms, *MACHINE learning
Abstract: Petrographic prediction is crucial for identifying target areas and understanding reservoir lithology in oil and gas exploration. Traditional logging methods often rely on manual interpretation and experiential judgment, which can introduce subjectivity as well as constraints arising from data quality and geological variability. To enhance the precision and efficacy of lithology prediction, this study employed a Savitzky–Golay filter with a symmetric window to process anomalous data, coupled with a residual temporal convolutional network (ResTCN) model to complete missing segments of logging data. A comparative analysis against support vector regression and random forest regression models revealed that the ResTCN achieves the smallest MAE, at 0.030, and the highest coefficient of determination, at 0.716, indicating that its output is closest to the ground truth. These methods significantly enhance the quality of the training data. Subsequently, a Transformer–long short-term memory (T-LS) model was applied to identify and classify the lithology of unexplored wells. The input layer of the Transformer model follows an embedding-like principle for data preprocessing, while the encoding block comprises multi-head attention, Add & Norm, and feedforward components. The output layer connects to the LSTM layer through dropout. A performance evaluation of the T-LS model against established lithology prediction techniques such as logistic regression, k-nearest neighbor, and random forest demonstrated its superior identification and classification capabilities. Specifically, the T-LS model achieved a precision of 0.88 and a recall of 0.89 across nine distinct lithology features. A Shapley analysis of the T-LS model underscored the value of combining multiple logging data sources for lithology classification. This advancement partially addresses the imprecise predictions and limited generalization inherent in traditional machine learning and deep learning models applied to lithology identification; it also helps optimize oil and gas exploration and development strategies and improve the efficiency of resource extraction. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
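The abstract does not state the Savitzky–Golay window length or polynomial order. The sketch below assumes a symmetric 5-point window with a quadratic fit, whose standard smoothing weights are (-3, 12, 17, 12, -3)/35, and simply leaves the two samples at each end unchanged; it illustrates the filter type only and is not the study's preprocessing code. For real data, `scipy.signal.savgol_filter(x, window_length, polyorder)` offers a general implementation with proper edge handling.

```python
# Assumed configuration: 5-point symmetric window, 2nd-order polynomial.
SG_COEFFS = (-3, 12, 17, 12, -3)
SG_NORM = 35


def savgol5(y):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    The first two and last two samples are returned unchanged (a
    simplification; library implementations fit the edges instead).
    """
    out = [float(v) for v in y]
    for i in range(2, len(y) - 2):
        window = y[i - 2:i + 3]  # symmetric window centred on sample i
        out[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / SG_NORM
    return out
```

Because the weights come from a least-squares quadratic fit, any quadratic trend (for example y = x²) passes through unchanged, while an isolated spike is spread over its neighbours.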

19. Predicting element concentrations by machine learning models in neutron activation analysis.

Author: Nguyen, Huu Nghia, Tran, Quang Thien, Tran, Tuan Anh, Phan, Quang Trung, Nguyen, Minh Dao, Tuong, Thi Thu Huong, and Chau, Thi Nhu Quynh
Subjects: *MACHINE learning, *DEEP learning, *NUCLEAR activation analysis, *RARE earth oxides, *STANDARD deviations, *ARTIFICIAL intelligence, *RANDOM forest algorithms
Abstract: Machine learning (ML), deep learning, and other artificial intelligence models have shown great promise in nuclear physics, not only in classification systems but also in the analysis of numerical data. This study used various ML algorithms to estimate the concentrations of six rare earth elements (Sm, La, Ce, Sc, Eu, and Tb) in archaeological and marine sediment samples. Of the 235 data points, 80% were used as training data for model development; each data point comprises two parameters: the specific activity (A_sp) and the concentration (ρ) determined by the k0-method. The remaining 20% of the dataset was held out for testing the models' accuracy. The fundamental principle of this approach is to use regression analysis between A_sp and ρ to construct a machine learning regression model, which was then applied to estimate element concentrations from A_sp values obtained from gamma spectra. The mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2) were used to evaluate the accuracy of the predictive models. The random forest (RF) algorithm produced smaller MAE and RMSE values and better R2 values than the other algorithms. In addition, RF showed the lowest relative bias in the element concentrations of the reference material (NIST 2711a) compared with the other prediction models. The work demonstrates that machine learning models can effectively predict the concentrations of rare earth elements, a fundamental problem in neutron activation analysis (NAA) that one previous study has addressed for a single element. Extensions of the current work and potential directions for further development are presented in the results and discussion section. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
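The three evaluation metrics named in the abstract can be written out directly. A minimal sketch, assuming plain Python lists of measured and predicted concentrations (the function name is hypothetical): MAE averages the absolute errors, RMSE is the square root of the mean squared error, and R2 is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the measured values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) for paired measured/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all measured values are equal")
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2
```

For measured values [1, 2, 3, 4] and predictions [1, 2, 3, 5], this gives MAE 0.25, RMSE 0.5, and R2 0.8.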

20. Impact of oxymoron features and deep learning techniques in the detection of sarcastic contents.

Author: Seethappan, K. and Premalatha, K.
Subjects: *DEEP learning, *MACHINE learning, *NATURAL language processing, *FIGURES of speech, *RANDOM forest algorithms
Abstract: Although various features have been investigated for detecting figurative language, oxymoron features have not been considered in the classification of sarcastic content. The main objective of this work is to present a system that automatically classifies sarcastic phrases in multi-domain data. The multi-domain dataset, consisting of 67,850 sarcastic and non-sarcastic samples, was collected from various websites. Multiple approaches are examined in this work to improve sarcasm identification: (1) a combination of fastText embedding, syntactic, semantic, lexical n-gram, and oxymoron features; (2) the TF-IDF feature weighting scheme; and (3) three machine learning algorithms (SVM, Multinomial Naïve Bayes, and Random Forest), three deep learning algorithms (CNN, LSTM, and MLP), and one ensemble model (CNN + LSTM). By combining fastText embedding, bigram, syntactic, semantic, and oxymoron features with the TF-IDF method, the CNN + LSTM model achieves a precision of 91.32%, recall of 92.85%, F-score of 92.08%, accuracy of 92.01%, and kappa of 0.84. These experimental results show that CNN + LSTM with the combination of all features outperforms the other algorithms in classifying sarcasm in both datasets. The sarcasm classification performance of the model on our dataset was compared with its performance on another sarcasm news dataset. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
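TF-IDF, one of the approaches listed in this abstract, weights a term by how often it appears in a document and down-weights terms that appear in many documents. A minimal sketch under stated assumptions: whitespace tokenization, term frequency normalized by document length, and the unsmoothed idf log(N/df). Libraries differ in these choices (scikit-learn's `TfidfVectorizer`, for example, smooths idf and L2-normalizes vectors by default), so the values are illustrative rather than a reproduction of the study's features.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (assumed tf/idf variants)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

For the two documents "great just great" and "great weather today", the shared term "great" receives weight 0, while "just" (first document only) receives (1/3)·log 2.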

21. Identifying strawberry appearance quality based on unsupervised deep learning.

Author: Zhu, Hongfei, Liu, Xingyu, Zheng, Hao, Yang, Lianhe, Li, Xuchen, and Han, Zhongzhi
Subjects: *DEEP learning, *STRAWBERRIES, *GAUSSIAN mixture models, *SUPERVISED learning, *RANDOM forest algorithms
Abstract: Appearance is an essential criterion for judging strawberry quality, so accurately identifying strawberry appearance quality is crucial for intelligent picking. This study proposed a new strawberry appearance quality detection method based on unsupervised deep learning. First, deep learning models (ResNet18, ResNet50, and ResNet101) were used to extract feature information from strawberry images. Next, t-SNE (t-distributed stochastic neighbor embedding) was used to reduce the dimension of the feature vectors. Finally, an unsupervised learning method (Gaussian mixture model) was used to cluster the strawberries' feature points. The results showed that: (1) clustering based on ResNet101 was effective in 2-dimensional space, with a cluster accuracy of 94.89% and a validation accuracy of 91.79%; (2) clustering based on ResNet50 performed well in 3-dimensional space, with a cluster accuracy of 96.10% and a validation accuracy of 93.08%; and (3) deep features plus RF (random forest) achieved an accuracy of 95.00% under limited data. Thus, this method will promote intelligent strawberry-picking equipment and overcome the drawback of supervised learning, which divides image datasets according to prior knowledge. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

22. Hybrid deep learning approach for product categorization in e-commerce.

Author: Gupta, Meenu, Kumar, Rakesh, Ved, Chetanya, and Taneja, Soham
Subjects: *MACHINE learning, *LANGUAGE models, *ELECTRONIC commerce, *RANDOM forest algorithms, *DEEP learning
Abstract: E-commerce makes it possible to sell and purchase products online. For both service providers and clients, organizing and searching for items is a tedious procedure, and organizing and labeling items takes a lot of time. Product categorization is the process of automatically predicting a product's catalog path based on a predetermined catalog hierarchy in which all categories are defined. Knowing how to add your goods to the most relevant category on any marketplace, including Flipkart and Amazon, is crucial to selling them. Categorization is a lengthy process that requires extensive study of the platform; this work improves it with several methodologies. Machine learning (ML) and deep learning (DL) models are used to sort items into recognized categories. Using information such as the item's title and summary, the model can assign each item to the correct category. Random Forest (RF) outperformed the other ML models (SVM, KNN, and Naive Bayes (NB)), with an F1-score of 97 percent and a macro average of 94 percent. The BERT model performed best among the DL models (LSTM, CNN, BERT, and a hybrid CNN-LSTM model), with an F1-score of 97 percent and a macro average of 88 percent. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
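This abstract reports both an F1-score and a "macro average". A macro average is typically the unweighted mean of the per-class F1 scores, so rare categories count as much as common ones, which is one reason it can sit below an overall F1 figure. A minimal sketch of that computation (the function name is hypothetical):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall (0 if both are 0).
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)
```

With true labels a, a, b, b and predictions a, b, b, b, class a scores F1 = 2/3 and class b scores 0.8, giving a macro average of about 0.733.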

23. Phishing website detection using ensemble learning models.

Author: Iffath, Needa, Mummadi, Upendra Kumar, Taranum, Fahmina, Ahmad, Syed Shabbeer, Khan, Imtiyaz, and Shravani, D.
Subjects: *DEEP learning, *COMPUTER vision, *ARTIFICIAL intelligence, *PHISHING, *MACHINE learning, *RANDOM forest algorithms
Abstract: A malicious website, often referred to as a malicious URL, is a platform that hosts unwanted content, including spam, harmful advertisements, and dangerous websites. It is crucial to identify dangerous URLs quickly. Blacklisting, regular-expression, and signature-matching techniques have all been employed in earlier investigations, but these methods are utterly useless for identifying new URLs, malicious URL variants, or URLs that have never been seen before. A machine learning-based solution, as has been suggested, can help solve this problem. Such a solution requires in-depth study of feature engineering and feature representation for security artifact types such as URLs. In addition, resources for feature engineering and feature representation must be continuously improved to support variations of current URLs or completely new URLs. Deep learning, machine learning, and artificial intelligence (AI) systems have recently achieved human-level performance in a number of areas and have even surpassed human eyesight in several computer vision applications. They can automatically extract the best feature representation from raw inputs. We propose various algorithms, including SVM, Random Forest, XGBoost, and AdaBoost, to capitalize on their performance gains in the cybersecurity domain. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

24. Estimating the impact of engineering education among students in India using machine learning and deep learning techniques.

Author: Ahmed, Syed Thouheed, Bhushan, S. Bharath, Srinivas, Aditya Sai, Basha, Syed Muzamil, and Reddy, Bhaskar
Subjects: *ENGINEERING education, *DEEP learning, *MACHINE learning, *CONVOLUTIONAL neural networks, *SUPPORT vector machines, *RANDOM forest algorithms
Abstract: A detailed questionnaire among students, faculty members, and higher education experts indicates that demand for qualified engineers in specific fields has understandably gone down. The situation is grimmer for tier-2 and tier-3 colleges in India. Students opt for an engineering course not to gain expertise in that particular field but to obtain a valid degree for a job in the government sector. This problem in engineering education is addressed with the help of exploratory data analysis, with the aim of improving educational quality. The dataset used in our experiment was collected from the UCI Machine Learning Repository and has 33 variables and 1044 observations. The paper's contribution is to identify the vital attributes using univariate and multivariate regression techniques, to perform prediction using Decision Tree, Random Forest, and Support Vector Machine classifiers, and to compare their performance in terms of classification accuracy and F-score. In addition, a convolutional neural network (CNN) model is built in which the vital attributes identified by the regression techniques are provided as inputs, and the weights at each stage are estimated using the gradient descent algorithm with a step size of 0.5. The CNN model improves the classification accuracy from 63% to 97% in 26,664 iterations. The findings of the present research indicate that students with backlogs opt for higher studies less frequently. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. Learning vs. understanding: When does artificial intelligence outperform process-based modeling in soil organic carbon prediction?

Author: Bernardini, Luca G., Rosinger, Christoph, Bodner, Gernot, Keiblinger, Katharina M., Izquierdo-Verdiguier, Emma, Spiegel, Heide, Retzlaff, Carl O., and Holzinger, Andreas
Subjects: *ARTIFICIAL intelligence, *CARBON in soils, *SUPPORT vector machines, *SOIL dynamics, *RANDOM forest algorithms, *MACHINE learning, *DEEP learning
Abstract: In recent years, machine learning (ML) algorithms have gained substantial recognition for ecological modeling across various temporal and spatial scales. However, little evaluation has been conducted of soil organic carbon (SOC) prediction on the small data sets commonly inherent to long-term soil ecological research. In particular, the performance of ML algorithms for SOC prediction has never been tested against traditional process-based modeling approaches. Here, we compare ML algorithms, calibrated and uncalibrated process-based models, and multiple ensembles on their performance in predicting SOC using data from five long-term experimental sites (comprising 256 independent data points) in Austria. Using all available data, the ML-based approaches using Random forest and Support vector machines with a polynomial kernel were superior to all process-based models. However, the ML algorithms performed similarly or worse when the number of training samples was reduced or when leave-one-site-out cross-validation was applied. This emphasizes that the performance of ML algorithms depends strongly on the size-related quality of the training data, in line with the well-known curse of dimensionality, while the accuracy of process-based models relies significantly on proper calibration and on combining different modeling approaches. Our study thus suggests that ML-based SOC prediction is superior at scales where larger datasets are available, while process-based models are superior tools for exploring the underlying biophysical and biochemical mechanisms of SOC dynamics in soils. Therefore, we recommend applying ensembles of ML algorithms with process-based models to combine the advantages inherent to both approaches.
• We evaluated process-based (PB) and machine-learning (ML) models for SOC prediction.
• Random forest and support vector machines predicted SOC better than PB models.
• Reduced data availability and quality significantly weakened ML model performance.
• ML-assisted PB model ensembles strongly improved model performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
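Leave-one-site-out cross-validation, mentioned in this abstract, trains on all sites except one and tests on the held-out site, repeating for every site. A minimal sketch of the index splitting, assuming each data point carries a site label (the labels and function name are hypothetical; scikit-learn's `LeaveOneGroupOut` provides the same splitting for real pipelines):

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), one fold per group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test
```

For site labels A, A, B, C, B, the first fold holds out site A, training on indices 2, 3, and 4 and testing on indices 0 and 1. Because whole sites are withheld, this checks transfer to unseen locations rather than interpolation within a site.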

26. Utilizing genomic signatures to gain insights into the dynamics of SARS-CoV-2 through Machine and Deep Learning techniques.

Author: Elsherbini, Ahmed M. A., Elkholy, Amr Hassan, Fadel, Youssef M., Goussarov, Gleb, Elshal, Ahmed Mohamed, El-Hadidi, Mohamed, and Mysara, Mohamed
Subjects: *DEEP learning, *MACHINE learning, *SARS-CoV-2, *COVID-19 pandemic, *RANDOM forest algorithms
Abstract: The global spread of the SARS-CoV-2 pandemic, which originated in Wuhan, China, has had profound consequences for both health and the economy. Traditional alignment-based phylogenetic tree methods for tracking epidemic dynamics demand substantial computational power because of the growing number of sequenced strains. Consequently, there is a pressing need for an alignment-free approach to characterize these strains and monitor the dynamics of the various variants. In this work, we introduce a fast and straightforward tool named GenoSig, implemented in C++. The tool exploits di- and trinucleotide frequency signatures to delineate the taxonomic lineages of SARS-CoV-2 using diverse machine learning (ML) and deep learning (DL) models. Our approach achieved a tenfold cross-validation accuracy of 87.88% (± 0.013) for DL and 86.37% (± 0.0009) for the Random Forest (RF) model, surpassing the performance of other ML models. Validation on an additional unseen dataset yielded comparable results. Despite the architectural differences between DL and RF, later clades, specifically GRA, GRY, and GK, were classified better than the earlier clades G and GH. In predicting the continental origin of the virus, both DL and RF models performed worse than in predicting clades. However, both models achieved relatively higher accuracy for Europe, North America, and South America than for other continents, with DL outperforming RF. Both models consistently favored cytosine and guanine over adenine and thymine in both the clade and continental analyses, for both di- and trinucleotide frequency signatures. Our findings suggest that GenoSig provides a straightforward approach to taxonomic, epidemiological, and biological questions, using a reductive method applicable not only to SARS-CoV-2 but also to similar research questions in an alignment-free context. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
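The di- and trinucleotide frequency signatures described in this abstract are k-mer frequency profiles with k = 2 (16 features) and k = 3 (64 features). GenoSig itself is implemented in C++; the Python sketch below only illustrates the idea and is not its code. Skipping windows that contain ambiguous bases such as N is an assumption made here.

```python
from itertools import product


def kmer_signature(seq, k):
    """Relative frequency of each of the 4**k k-mers over A, C, G, T.

    Windows containing any other character (e.g. N) are skipped
    (an assumed convention for ambiguous bases).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}
```

For "ACGT" with k = 2, the overlapping windows AC, CG, and GT each receive frequency 1/3 and the other 13 dinucleotides receive 0; such fixed-length vectors can be fed directly to a random forest or a neural network without any alignment.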

27. Aeroengine Remaining Life Prediction Using Feature Selection and Improved SE Blocks.

Author: Wang, Hairui, Xu, Shijie, Zhu, Guifu, and Li, Ya
Subjects: *DEEP learning, *CONVOLUTIONAL neural networks, *REMAINING useful life, *FEATURE selection, *RANDOM forest algorithms
Abstract: Aeroengines use numerous sensors to monitor equipment health and ensure proper operation. Currently, separating useful sensor data from useless data is a challenge when predicting the remaining useful life (RUL) of an aeroengine with deep learning. To reduce computational costs and improve prediction performance, we use random forest to evaluate the feature importance of the sensor data and select the appropriate sensors according to their Gini index-based importance. This determines which sensors to use and ensures that computational resources are not wasted on unnecessary sensors. Because the RUL of equipment changes in a progressively more complex manner as the equipment is used over time, we propose an improved squeeze and excitation block (SSE) and combine it with a convolutional neural network (CNN). By enhancing the feature selection ability of the CNN through the segmented squeeze and excitation block, the model can focus on important information within features and thereby improve prediction performance. We compared our results with those of other RUL experiments on the CMAPSS aeroengine dataset and then conducted ablation experiments to verify the critical role of the methods we used. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
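Random forest feature importance based on the Gini index (often called mean decrease in impurity) accumulates, over the splits on a feature in each tree, the sample-weighted drop in Gini impurity, then averages across trees. The sketch below is a deliberately simplified, hypothetical stand-in rather than the paper's procedure: it scores a single feature by the best single-threshold Gini decrease it can achieve, which shows the quantity being accumulated without building a forest. Ranking sensors by this score would be a crude proxy for the forest-based ranking.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 for empty)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_decrease(values, labels):
    """Largest weighted Gini decrease from one threshold split on a feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Candidate thresholds: every distinct value except the largest.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best
```

For a feature whose values 1, 2, 3, 4 line up with labels 0, 0, 1, 1, the threshold 2 separates the classes perfectly and the decrease equals the parent impurity of 0.5; the same values paired with labels 0, 1, 0, 1 reach only about 0.167.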

28. A Lightning Classification Method Based on Convolutional Encoding Features.

Author: Zhu, Shunxing, Zhang, Yang, Fan, Yanfeng, Sun, Xiubin, Zheng, Dong, Zhang, Yijun, Lyu, Weitao, Zhang, Huiyi, and Wang, Jingxuan
Subjects: *CONVOLUTIONAL neural networks, *RANDOM forest algorithms, *THUNDERSTORMS
Abstract: At present, for business lightning positioning systems, the classification of lightning discharge types is mostly based on lightning pulse signal features, and there is still a lot of room for improvement. We propose a lightning discharge classification method based on convolutional encoding features. This method utilizes convolutional neural networks to extract encoding features, and uses random forests to classify the extracted encoding features, achieving high accuracy discrimination for various lightning discharge events. Compared with traditional multi-parameter-based methods, the new method proposed in this paper has the ability to identify multiple lightning discharge events and does not require precise detailed feature engineering to extract individual pulse parameters. The accuracy of this method for identifying lightning discharge types in intra-cloud flash (IC), cloud-to-ground flash (CG), and narrow bipolar events (NBEs) is 97%, which is higher than that of multi-parameter methods. Moreover, our method can complete the classification task of lightning signals at a faster speed. Under the same conditions, the new method only requires 28.2 µs to identify one pulse, while deep learning-based methods require 300 µs. This method has faster recognition speed and higher accuracy in identifying multiple discharge types, which can better meet the needs of real-time business positioning. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

29. Predicting lncRNA–protein interactions through deep learning framework employing multiple features and random forest algorithm.

Author: Liang, Ying, Yin, XingRui, Zhang, YangSen, Guo, You, and Wang, YingLong
Subjects: *RANDOM forest algorithms, *DEEP learning, *CONVOLUTIONAL neural networks, *RNA-protein interactions, *MICE, *MACHINE learning
Abstract: RNA-protein interaction (RPI) is crucial to the life processes of diverse organisms. Researchers have identified RPIs through lengthy and costly biological experiments. Although numerous machine learning and deep learning-based methods for predicting RPI currently exist, their robustness and generalizability leave significant room for improvement. To address these issues, this study proposes LPI-MFF, an RPI prediction model based on multi-source information fusion. LPI-MFF uses protein–protein interaction features, sequence features, secondary structure features, and physical and chemical properties as information sources, each with a corresponding coding scheme, followed by the random forest algorithm for feature screening. Finally, all information is combined and classified using a method based on convolutional neural networks. In fivefold cross-validation, LPI-MFF achieved accuracies of 97.60% on RPI1807 and 97.67% on NPInter. In addition, its accuracy was 84.9% on the independent test set RPI1168 and 90.91% on the Mus musculus dataset. Accordingly, LPI-MFF demonstrated greater robustness and generalization than other prevalent RPI prediction methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

30. Explanation of the influence of geomorphometric variables on the landform classification based on selected areas in Poland.

Author: Dyba, Krzysztof
Subjects: *DEEP learning, *IMAGE recognition (Computer vision), *RANDOM forest algorithms, *MACHINE learning, *CONVOLUTIONAL neural networks, *GEOMORPHOLOGICAL mapping, *GEODIVERSITY
Abstract: In recent years, automatic image classification methods have progressed significantly, notably black box algorithms such as machine learning and deep learning. Unfortunately, such efforts have focused only on improving performance rather than on explaining and interpreting how classification models actually operate. This article compares three state-of-the-art algorithms incorporating random forests, gradient boosting, and convolutional neural networks for geomorphological mapping. It also attempts to explain how the most effective classifier makes decisions by evaluating which geomorphometric variables are most important for automatic mapping and how they affect the classification results, using one of the explainable artificial intelligence techniques, namely accumulated local effects (ALE). This method makes it possible to understand the relationship between the predictors and the model's outcome. For these purposes, eight sheets of the digital geomorphological map of Poland at a scale of 1:100,000 were used as the reference material. The classification results were validated using the holdout method and cross-validation for individual sheets representing different morphogenetic zones. Among the 15 geomorphometric variables considered, terrain elevation entropy, absolute elevation, aggregated median elevation, and standard deviation of elevation had the greatest impact on the classification results. The ALE analysis was conducted for the XGBoost classifier, which achieved the highest accuracy (92.8%), ahead of Random Forests (84%), LightGBM (73.7%), and U-Net (59.8%). We conclude that automatic classification can support geomorphological mapping only if the geomorphological characteristics of the predicted area are similar to those of the training dataset. The ALE plots allow us to analyze the relationship between geomorphometric variables and landform membership, which helps clarify their role in the classification process. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
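The abstract above relies on accumulated local effects (ALE) to interpret the XGBoost classifier. As a rough illustration of what a first-order ALE curve computes, here is a minimal pure-Python sketch for a single numeric feature. It is not the authors' implementation: `ale_1d` and `linear_model` are hypothetical names, `predict` is assumed to be any callable mapping rows to numeric scores, and the quantile binning and count-weighted centring are one common formulation chosen for simplicity.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def ale_1d(predict: Callable[[List[Row]], List[float]], X: Sequence[Row],
           feature: int, n_bins: int = 10) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: first-order ALE for one numeric feature.

    Returns (edges, ale), where ale[k] is the centred accumulated effect at edges[k].
    """
    values = sorted(row[feature] for row in X)
    n = len(values)
    # Quantile-based edges (an assumption) so each interval holds a similar number of rows.
    edges = sorted({values[min(n - 1, round(q * (n - 1) / n_bins))]
                    for q in range(n_bins + 1)})
    if len(edges) < 2:
        return edges, [0.0]  # constant feature: no measurable effect
    effects: List[float] = []
    counts: List[int] = []
    for k in range(1, len(edges)):
        lo, hi = edges[k - 1], edges[k]
        # Rows in the interval (lo, hi]; the first interval also keeps its left edge.
        in_bin = [list(row) for row in X
                  if lo < row[feature] <= hi or (k == 1 and row[feature] == lo)]
        if not in_bin:
            effects.append(0.0)
            counts.append(0)
            continue
        # Move only the chosen feature to the interval edges; other features stay as observed.
        at_hi = [row[:feature] + [hi] + row[feature + 1:] for row in in_bin]
        at_lo = [row[:feature] + [lo] + row[feature + 1:] for row in in_bin]
        diffs = [a - b for a, b in zip(predict(at_hi), predict(at_lo))]
        effects.append(sum(diffs) / len(diffs))
        counts.append(len(in_bin))
    # Accumulate the local effects, starting from 0 at the lowest edge.
    ale = [0.0]
    for effect in effects:
        ale.append(ale[-1] + effect)
    # Centre so that the count-weighted mean over intervals is zero.
    total = sum(counts)
    mean = sum(c * (ale[k] + ale[k + 1]) / 2 for k, c in enumerate(counts)) / total
    return edges, [value - mean for value in ale]


# Usage with a toy (hypothetical) model that is linear in both features.
def linear_model(rows: List[Row]) -> List[float]:
    return [2.0 * r[0] + 3.0 * r[1] for r in rows]


X = [[float(i), float(i % 7)] for i in range(100)]
edges, ale = ale_1d(linear_model, X, feature=0, n_bins=5)
```

For a model that is linear in the chosen feature, the ALE curve should be a straight line whose slope equals that feature's coefficient (2.0 here), which makes this a convenient sanity check before applying the idea to a tree ensemble.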

31. Identification of apple varieties using hybrid transfer learning and multi-level feature extraction.

Author: Kılıçarslan, Serhat, Dönmez, Emrah, and Kılıçarslan, Sabire
Subjects: *MACHINE learning, *FEATURE extraction, *BLENDED learning, *FEATURE selection, *ORCHARDS, *RANDOM forest algorithms, *APPLES, *CULTIVARS
Abstract: The process of identifying apple varieties holds pivotal importance in pomology and agricultural science. This intricate task not only aids growers in optimizing orchard management, but also profoundly impacts consumers and the apple industry as a whole. Selecting the right apple varieties for specific environmental conditions and market demands is instrumental for the sustainability and economic viability of apple cultivation. Accurate apple variety identification further contributes to maintaining product quality and ensuring consumer satisfaction. Traditional identification methods, however, are susceptible to human error given the vast diversity of apple cultivars. In response, the integration of advanced technologies, including image processing and machine learning, has emerged as a promising approach to improve the accuracy and efficiency of apple variety identification, benefiting both the agricultural and commercial sectors. Features were extracted from apple images using three methods: MobileNetV2, EfficientNetV2B0, and a combination of GLCM and color-space algorithms. Machine learning models were then built to classify apple varieties using algorithms such as support vector machine (SVM), k-nearest neighbors (kNN), random subspace (RSS), and random forest. With the "EfficientNetV2B0 + GLCM + Color-Space" features and the ReliefF feature selection method, the random forest algorithm achieves its best performance, with accuracy, precision, recall, and F-score all at 98.33%. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Interpretable Convolutional Neural Network for Analyzing Precipitation in the Pre-Rainy Season of South China.

Author: Liu, Shengjun, Yan, Wenjie, Liu, Xinru, Hu, Yamin, and Yang, Dangfu
Subjects: *CONVOLUTIONAL neural networks, *DOWNSCALING (Climatology), *DEEP learning, *RANDOM forest algorithms, *SEASONS
Abstract: The research and application of convolutional neural networks (CNNs) in statistical downscaling have been hampered by the fact that deep learning is highly dependent on sample size and is considered to be a black-box model. Therefore, a CNN model with transfer learning (CNN-TL) is proposed to study pre-rainy season precipitation in South China. First, an augmented monthly dataset is created by sliding a fixed-length window over the daily circulation field and precipitation data for the entire year. Next, a base CNN is pretrained on the augmented dataset, and its parameters are then fine-tuned on the actual monthly dataset from South China. Then, guided backpropagation is used to obtain the distribution regions of the key features and explain the network. The coefficient of determination (R²) and root-mean-square error (RMSE) show that the CNN-TL model has higher explanatory power and better fitting performance than the feature extraction–based random forest. Compared with the base CNN, the transfer learning approach improves the explanatory power of the model by 10.29% and reduces the average RMSE by 6.82%. In addition, the interpretation results show that the critical regions are primarily South China and its surrounding areas, including the Indochina Peninsula, the Bay of Bengal, and the South China Sea. Furthermore, ablation experiments and composite analysis show that these regions are very important. Significance Statement: To mitigate the challenges posed by small sample sizes and the lack of transparency of deep learning in downscaling problems, we propose a convolutional neural network based on sample augmentation and transfer learning to study monthly precipitation downscaling during the preflood period in South China. Compared with random forests and conventional convolutional neural networks, our model achieves the highest interpretation rate and the best stability. In addition, we explore the interpretability of the model using guided backpropagation to find the distribution of key features within the large-scale circulation field, thus increasing the credibility of the model. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
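The augmentation step described above, sliding a fixed-length window over a year of daily data to obtain many "monthly" samples, can be sketched in a few lines. This is only an illustration under stated assumptions: `sliding_window_means` is a hypothetical name, the 30-day window and stride of 1 are assumed values (the abstract gives neither), and each window is averaged, whereas monthly precipitation totals would use a sum instead.

```python
from typing import List, Sequence


def sliding_window_means(daily: Sequence[float], window: int = 30,
                         stride: int = 1) -> List[float]:
    """Hypothetical helper: average each fixed-length window of a daily series.

    Returns an empty list when the series is shorter than the window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # One pseudo-monthly sample per window start position.
    return [sum(daily[start:start + window]) / window
            for start in range(0, len(daily) - window + 1, stride)]


# A 365-day year yields 336 overlapping 30-day samples instead of 12 calendar months.
year = [1.0] * 365
samples = sliding_window_means(year, window=30)
```

Overlapping windows are highly correlated with one another, which is presumably why the abstract pretrains on the augmented set and then fine-tunes on the actual monthly data rather than treating the windows as independent samples.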

33. Multiclass Sentiment Prediction of Airport Service Online Reviews Using Aspect-Based Sentimental Analysis and Machine Learning.

Author: Alanazi, Mohammed Saad M., Li, Jun, and Jenkins, Karl W.
Subjects: *MACHINE learning, *RANDOM forest algorithms, *AIRPORT management, *CONSUMERS' reviews, *DEEP learning, *USER-generated content
Abstract: Airport service quality ratings found on social media such as Airline Quality and Google Maps offer invaluable insights that airport management can use to improve service quality. However, there is currently little research analysing these reviews by airport service using sentiment analysis approaches. This research applies multiclass models based on Aspect-Based Sentiment Analysis to conduct a comprehensive analysis of travellers' reviews, in which the major airport services are tagged with positive, negative, or non-existent sentiment. Seven airport services commonly used in previous studies are also introduced. Subsequently, various Deep Learning architectures and Machine Learning classification algorithms are developed, tested, and compared using data collected from Twitter, Google Maps, and Airline Quality, encompassing travellers' feedback on airport service quality. The results show that traditional Machine Learning algorithms such as Random Forest outperform Deep Learning models in the multiclass prediction of airport service quality from travellers' feedback. These findings offer concrete justification for using multiclass Machine Learning models to understand travellers' sentiments and thereby identify airport services that need improvement. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

34. Off-design performance analysis of a radial fan using experimental, computational, and artificial intelligence approaches.

Author: Moradihaji, Kowsar, Ghassemi, Majid, and Pourbagian, Mahdi
Subjects: *ARTIFICIAL intelligence, *COMPUTATIONAL fluid dynamics, *SUPPORT vector machines, *RANDOM forest algorithms, *DEEP learning, *INTERPOLATION algorithms, *ARTIFICIAL neural networks
Abstract: Radial fans are indispensable turbomachines that play a critical role in various industrial sectors. However, the conventional manufacturing process for these fans is often resource-intensive and time-consuming. Traditionally, computational fluid dynamics (CFD) has been the go-to method for predicting and analyzing the performance of radial fans under different geometrical and operational conditions. In recent years, however, rapid advances in machine learning (ML) and deep learning (DL) techniques, particularly artificial neural networks (ANNs), have driven significant progress in predicting and optimizing radial fan performance. The present study analyzes the performance of a radial fan through a comprehensive experimental investigation and a detailed three-dimensional numerical simulation. Subsequently, to predict the off-design performance of the fan, an extensive set of numerical simulations is conducted at various volumetric flow rates and rotational speeds. These simulations are used to analyze fan performance and identify the most efficient operating condition, and they also serve as inputs for a finely tuned ANN architecture. The predictive accuracy of the ANN model for both interpolation and extrapolation is then compared with two alternative techniques, support vector machine (SVM) and random forest (RF). The results highlight the superior predictive accuracy of the ANN model, indicating that it is the most reliable of the three methods for predicting radial fan performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

35. Interpretable feature extraction and dimensionality reduction in ESM2 for protein localization prediction.

Author: Luo, Zeyu, Wang, Rui, Sun, Yawen, Liu, Junhao, Chen, Zongqing, and Zhang, Yu-Juan
Subjects: *ARTIFICIAL neural networks, *LANGUAGE models, *AMINO acid sequence, *DEEP learning, *PLANT mitochondria, *RANDOM forest algorithms, *STRUCTURAL design, *FEATURE extraction, *GOLGI apparatus
Abstract: As the application of large language models (LLMs) has broadened into biological prediction, leveraging their capacity for self-supervised learning to create feature representations of amino acid sequences, these models have set a new benchmark on downstream challenges such as subcellular localization. However, previous studies have primarily focused on either the structural design of models or different fine-tuning strategies, largely overlooking the nature of the features derived from LLMs. In this research, we propose different ESM2 representation extraction strategies that consider both the character type and the position within the ESM2 input sequence. Using model dimensionality reduction, predictive analysis, and interpretability techniques, we have illuminated potential associations between diverse feature types and specific subcellular localizations. In particular, predictions for the mitochondrion and the Golgi apparatus favor segment features closer to the N-terminus, and phosphorylation site-based features could mirror phosphorylation properties. We also evaluate the prediction performance and interpretability robustness of Random Forest and Deep Neural Networks with varied feature inputs. This work offers novel insights into maximizing LLMs' utility, understanding their mechanisms, and extracting biological domain knowledge. Furthermore, we have made the code, feature extraction API, and all relevant materials available at https://github.com/yujuan-zhang/feature-representation-for-LLMs. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. Visual Detection of Road Cracks for Autonomous Vehicles Based on Deep Learning.

Author: Meftah, Ibrahim, Hu, Junping, Asham, Mohammed A., Meftah, Asma, Zhen, Li, and Wu, Ruihuan
Subjects: *AUTONOMOUS vehicles, *DEEP learning, *CONVOLUTIONAL neural networks, *CONCRETE pavements, *RANDOM forest algorithms, *MACHINE learning
Abstract: Detecting road cracks is essential for inspecting and assessing the integrity of concrete pavement structures. Traditional image-based methods often require complex preprocessing to extract crack features, which makes them difficult to apply to noisy concrete surfaces in diverse real-world scenarios such as road detection for autonomous vehicles. This study introduces an image-based crack detection approach that combines a Random Forest machine learning classifier with a deep convolutional neural network (CNN) to address these challenges. Three state-of-the-art models, MobileNet, InceptionV3, and Xception, were trained on a dataset of 30,000 images to build an effective CNN. A systematic comparison of validation accuracy across several base learning rates identified 0.001 as optimal, with a maximum validation accuracy of 99.97%; this learning rate was then used in the testing phase. The robustness and flexibility of the trained models were evaluated on 6,000 test images of 224 × 224 pixels that were not part of the training or validation sets. The results (99.95% accuracy, 99.95% precision, 99.94% recall, and 99.94% F1 score) support the efficacy of the proposed technique for identifying road cracks in photographs of real concrete surfaces. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. Automated Mapping of Land Cover Type within International Heterogenous Landscapes Using Sentinel-2 Imagery with Ancillary Geospatial Data.

Author: Lasko, Kristofer, O'Neill, Francis D., and Sava, Elena
Subjects: *LAND cover, *GEOSPATIAL data, *ZONING, *DEEP learning, *RANDOM forest algorithms, *EUCLIDEAN distance
Abstract: A near-global framework for automated training data generation and land cover classification using shallow machine learning with low-density time series imagery does not exist. This study presents a methodology to map nine-class, six-class, and five-class land cover using two dates (winter and non-winter) of a Sentinel-2 granule across seven international sites. The approach uses a series of spectral, textural, and distance decision functions combined with modified ancillary layers (such as global impervious surface and global tree cover) to create binary masks from which to generate a balanced set of training data applied to a random forest classifier. For the land cover masks, stepwise threshold adjustments were applied to reflectance, spectral index values, and Euclidean distance layers, with 62 combinations evaluated. Global (all seven scenes) and regional (arid, tropics, and temperate) adaptive thresholds were computed. An annual 95th and 5th percentile NDVI composite was used to provide temporal corrections to the decision functions, and these corrections were compared against the original model. The accuracy assessment found that the regional adaptive thresholds for both the two-date land cover and the temporally corrected land cover could accurately map land cover type within nine-class (68.4% vs. 73.1%), six-class (79.8% vs. 82.8%), and five-class (80.1% vs. 85.1%) schemes. Lastly, the five-class and six-class models were compared with a manually labeled deep learning model (Esri), where they performed with similar accuracies (five classes: Esri 80.0 ± 3.4%, region corrected 85.1 ± 2.9%). The results highlight not only performance in line with an intensive deep learning approach, but also that reasonably accurate models can be created without a full annual time series of imagery. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. Extraction of Lilium davidii var. unicolor Planting Information Based on Deep Learning and Multi-Source Data.

Author: Shi, Yinfang, Zhang, Puhan, and Wang, Zhaoyang
Subjects: *LILIES, *DEEP learning, *PLANTING, *FEATURE selection, *RANDOM forest algorithms
Abstract: Accurate extraction of crop acreage is an important element of digital agriculture. This study uses Sentinel-2A, Sentinel-1, and DEM data to construct a multidimensional feature dataset encompassing spectral features, vegetation indices, texture features, terrain features, and radar features. The Relief-F algorithm is applied for feature selection to identify the optimal feature dataset. Deep learning and random forest (RF) classification methods are then used to identify lilies in Qilihe District and Yuzhong County of Lanzhou City, Gansu Province, to obtain their planting structure, and to analyze their spatial distribution characteristics. The findings indicate that terrain features contribute significantly to ground object classification, and that classification accuracy is highest when the feature dataset contains 36 features. The deep learning classification method is more precise than RF, with an overall classification accuracy of 95.9% and a kappa coefficient of 0.934. The Lanzhou lily planting area is 137.24 km², with a predominantly concentrated and contiguous distribution. The study's findings can serve as a scientific foundation for adjusting and optimizing Lanzhou City's lily planting structure, and as a data basis for local lily yield forecasting, development, and application. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

39. Automated Built-Up Infrastructure Land Cover Extraction Using Index Ensembles with Machine Learning, Automated Training Data, and Red Band Texture Layers.

Author: Maloney, Megan C., Becker, Sarah J., Griffin, Andrew W. H., Lyon, Susan L., and Lasko, Kristofer
Subjects: *MACHINE learning, *DEEP learning, *RANDOM forest algorithms, *LAND management, *STATISTICAL sampling
Abstract: Automated built-up infrastructure classification is a global need for planning. However, individual indices have weaknesses, including spectral confusion with bare ground, and computational requirements for deep learning are intensive. We present a computationally lightweight method to classify built-up infrastructure. We use an ensemble of spectral indices and a novel red-band texture layer with global thresholds determined from 12 diverse sites (two seasonally varied images per site). Multiple spectral indexes were evaluated using Sentinel-2 imagery. Our texture metric uses the red band to separate built-up infrastructure from spectrally similar bare ground. Our evaluation produced global thresholds by evaluating ground truth points against a range of site-specific optimal index thresholds across the 24 images. These were used to classify an ensemble, and then spectral indexes, texture, and stratified random sampling guided training data selection. The training data fit a random forest classifier to create final binary maps. Validation found an average overall accuracy of 79.95% (±4%) and an F1 score of 0.5304 (±0.07). The inclusion of the texture metric improved overall accuracy by 14–21%. A comparison to site-specific thresholds and a deep learning-derived layer is provided. This automated built-up infrastructure mapping framework requires only public imagery to support time-sensitive land management workflows. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. Deep-Learning-Based Automatic Extraction of Aquatic Vegetation from Sentinel-2 Images—A Case Study of Lake Honghu.

Author: Gao, Hangyu, Li, Ruren, Shen, Qian, Yao, Yue, Shao, Yifan, Zhou, Yuting, Li, Wenxin, Li, Jinzhi, Zhang, Yuting, and Liu, Mingxia
Subjects: *POTAMOGETON, *PLANKTON blooms, *LAKE management, *LAKES, *VEGETATION mapping, *RANDOM forest algorithms
Abstract: Aquatic vegetation is an important component of aquatic ecosystems; therefore, the classification and mapping of aquatic vegetation is an important aspect of lake management. Currently, the decision tree (DT) classification method based on spectral indices is widely used to extract aquatic vegetation data, but its disadvantage is that the threshold values are difficult to fix, which in turn affects automatic classification. In this study, Sentinel-2 MSI data were used to produce a sample set (about 930 samples) of aquatic vegetation, including emergent, floating-leaved, and submerged vegetation, in four inland lakes (Lake Taihu, Lake Caohai, Lake Honghu, and Lake Dongtinghu) using visual interpretation. Based on this sample set, a DL model (Res-U-Net) was trained to extract aquatic vegetation automatically. Using the visual interpretation results as the ground truth, the DL model achieved an overall accuracy, relative error, and kappa coefficient of 90%, 8.18%, and 0.86, respectively, outperforming the DT method (79%, 23.07%, and 0.77) and random forest (78%, 10.62%, and 0.77). Using measured point data as the ground truth, the DL model achieved accuracies of 59%, 78%, and 91% for submerged, floating-leaved, and emergent vegetation, respectively. In addition, the model still recognizes vegetation well in the presence of clouds and water blooms. When the model was applied to Lake Honghu from January 2017 to October 2023, the resulting temporal variation patterns in aquatic vegetation were consistent with other studies. The study shows that the proposed DL model has good potential for extracting aquatic vegetation data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
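The abstract above reports overall accuracy together with a kappa coefficient. As a reminder of how the two relate, here is a minimal sketch of Cohen's kappa computed from paired reference and predicted labels: overall accuracy is the observed agreement, and kappa discounts the agreement expected by chance from the two label distributions. `cohens_kappa` is a hypothetical helper, not the authors' code, and it assumes one label per sample.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Hypothetical helper: Cohen's kappa for two equal-length label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("sequences must be non-empty and of equal length")
    # Observed agreement is the same quantity as overall accuracy.
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal label frequencies of each sequence.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both sequences use one identical class: perfect agreement
    return (p_observed - p_expected) / (1.0 - p_expected)


reference = ["emergent", "emergent", "submerged", "submerged"]
predicted = ["emergent", "submerged", "submerged", "submerged"]
kappa = cohens_kappa(reference, predicted)  # accuracy 0.75, chance 0.5
```

In this toy example, 75% accuracy shrinks to a kappa of 0.5 because half of the agreement would be expected by chance, which is why accuracy and kappa are reported side by side.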

41. Fast building detection using new feature sets derived from a very high-resolution image, digital elevation and surface model.

Author: Günen, Mehmet Akif
Subjects: *DIGITAL elevation models, *MACHINE learning, *RANDOM forest algorithms, *DECISION trees, *ROOFTOP construction, *OBJECT tracking (Computer vision)
Abstract: Detecting building rooftops in very high-resolution (VHR) images is an important issue in many fields, including disaster management, urban planning, and climate change research. Buildings with varying geometrical features are challenging to detect accurately in VHR images because of complicated scenes containing spectrally similar objects, illumination effects, occlusions, varying viewing angles, and shadows. This study aims to detect building rooftops with high accuracy using a new framework that combines a VHR image, the visible-band difference vegetation index, digital surface and elevation models, terrain ruggedness, and the topographic position index. Five distinct feature sets, ordered by importance, were generated by applying the maximum relevance minimum redundancy (mRMR) feature selection method to the ten stacked features. Then, Auto-Encoder, k-NN, decision tree, RUSBoost, and random forest machine learning algorithms were used for binary classification. Random forest yielded the highest accuracy (97.2% F-score, 98.72% accuracy) when all features (F10) were used, while decision tree was the least successful (59.16% F-score, 83.56% accuracy) with the RGB feature set (FRGB). Classifying F10 with random forest increased the F-score by about 23% compared with classifying FRGB. Additionally, McNemar's tests showed no statistically significant difference between random forest and k-NN, or between decision tree and RUSBoost. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
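The abstract above uses McNemar's test to compare pairs of classifiers evaluated on the same samples. A minimal sketch of the continuity-corrected chi-square form of the test follows. It is an assumption that this is the variant the authors used (an exact binomial version also exists), and `mcnemar_test` is a hypothetical helper. Only discordant samples, where exactly one classifier is correct, enter the statistic; the p-value uses the identity that a chi-square variable with one degree of freedom exceeds x with probability erfc(sqrt(x / 2)).

```python
import math
from typing import Hashable, Sequence, Tuple


def mcnemar_test(y_true: Sequence[Hashable], pred_a: Sequence[Hashable],
                 pred_b: Sequence[Hashable]) -> Tuple[float, float]:
    """Hypothetical helper: continuity-corrected McNemar test; returns (statistic, p-value)."""
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all sequences must have the same length")
    # Discordant pairs: classifier A right and B wrong, and vice versa.
    only_a = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    if only_a + only_b == 0:
        return 0.0, 1.0  # the classifiers never disagree on correctness
    # Continuity correction, floored at zero so equal counts give a statistic of 0.
    stat = max(abs(only_a - only_b) - 1, 0) ** 2 / (only_a + only_b)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


truth = [1] * 20
model_a = [1] * 20             # always correct
model_b = [0] * 10 + [1] * 10  # wrong on the first ten samples
stat, p = mcnemar_test(truth, model_a, model_b)
```

A large p-value, as reported in the abstract for random forest vs k-NN, means the two classifiers' error patterns on the test set are not distinguishable at the chosen significance level; it does not show that their accuracies are identical.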

42. Pentad-mean air temperature prediction using spatial autocorrelation and attention-based deep learning model.

Author: Xu, Lei, Zhang, Xi, Du, Wenying, Yu, Hongchu, Chen, Zeqiang, and Chen, Nengcheng
Subjects: *ATMOSPHERIC temperature, *DEEP learning, *HAILSTORMS, *RANDOM forest algorithms, *NATURAL disasters, *STORMS
Abstract: Abnormal changes in air temperature cause natural disasters such as droughts, hailstorms, and storms, thereby affecting people's daily lives. Consequently, timely and accurate air temperature prediction is essential for human production and livelihoods. Traditional air temperature prediction methods are less accurate and give little consideration to the spatial relationships between air temperatures in different regions. In this paper, we propose a new deep learning model, convolutional long short-term memory based on channel attention and spatial autocorrelation (ConvLSTM-CASA), which focuses on the spatial correlation between ambient air temperatures and can effectively capture the interaction of air temperatures in different regions. The results show that the ConvLSTM-CASA model achieves an average R² of 0.954 and an MSE of 5.245 for pentad-mean temperature prediction over the Yangtze River basin. Compared with the baseline forecasting models, the ConvLSTM-CASA model improved the MSE by 72.45%, 48.95%, 48.97%, 47.79%, and 22.63% relative to the decision tree regression (DTR), multiple linear regression (MLR), random forest (RF), long short-term memory (LSTM), and ConvLSTM models, respectively. The ConvLSTM-CASA model is expected to outperform the ConvLSTM model over 90% of the area, suggesting a robust improvement in prediction skill across space. By including elaborate channel and spatial feature modeling, the ConvLSTM-CASA model provides new insights for data-driven pentad-mean air temperature prediction and helps individuals understand the intricate patterns of air temperature fluctuations. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

43. Multitask Learning for Mental Health: Depression, Anxiety, Stress (DAS) Using Wearables.

Author: Saylam, Berrenur and İncel, Özlem Durmaz
Subjects: *MENTAL depression, *MENTAL health, *PSYCHOLOGICAL literature, *ANXIETY, *RANDOM forest algorithms
Abstract: This study investigates the prediction of mental well-being factors (depression, stress, and anxiety) using the NetHealth dataset collected from college students. The research addresses four key questions: the impact of digital biomarkers on these factors, their alignment with the conventional psychology literature, the time-based performance of the applied methods, and potential enhancements through multitask learning. The findings reveal modality rankings that align with the psychology literature, validated against paper-based studies. Predictions improve when temporal aspects are considered and improve further with multitask learning. Baseline and multitask performance are broadly aligned, with notable gains from temporal aspects, particularly with the random forest (RF) classifier. Multitask learning improves outcomes for depression and stress, but not for anxiety, using RF and XGBoost. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

44. Analyzing threat flow over network using ensemble-based dense network model.

Author: Harita, U. and Mohammed, Moulana
Subjects: *CYBERTERRORISM, *SUPERVISED learning, *RANDOM forest algorithms, *INTERNET access, *DEEP learning, *INTRUSION detection systems (Computer security), *MACHINE learning
Abstract: Cyberattacks may occur on any device with an Internet connection. Most businesses either recommend preventive measures or build devices with integrated cyber-threat protection mechanisms. However, the available tools and methods need to go beyond standard preventive measures, which on their own make cyber threats more difficult to identify. One important tool for combating these intrusions is an intrusion detection system (IDS) based on deep learning. To analyze intrusion detection systems, this study proposes random forest-based ensemble methods. In the first phase, tests were carried out using random forest. In the subsequent stage, random forest is used because of its recent notable advances in supervised learning performance. Deep learning methods such as long short-term memory (LSTM) and autoencoder (AE) networks are used in the experiment, and the work is optimized using Harris hawks optimization (HHO). A Kaggle dataset is used for the experiments. On this dataset, the results show that the IDS improves substantially, surpassing the state of the art, and this improvement strengthens the model's applicability to intrusion detection. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

45. Development and Application of Traditional Chinese Medicine Using AI Machine Learning and Deep Learning Strategies.

Author: Pan, Danping, Guo, Yilei, Fan, Yongfu, and Wan, Haitong
Subjects: *TONGUE disease diagnosis, *CHINESE medicine, *RANDOM forest algorithms, *HUMAN services programs, *ARTIFICIAL intelligence, *LOGISTIC regression analysis, *NATURAL language processing, *SOFTWARE analytics, *SUPPORT vector machines, *DEEP learning, *ARTIFICIAL neural networks, *MACHINE learning, *DECISION trees, *DIGITAL image processing, *PULSE (Heart beat), *ALGORITHMS, *REGRESSION analysis, *CLOUD computing
Abstract: Traditional Chinese medicine (TCM) has been used for thousands of years and has been proven to be effective at treating many complicated illnesses with minimal side effects. The application and advancement of TCM are, however, constrained by the absence of objective measuring standards due to its relatively abstract diagnostic methods and syndrome differentiation theories. Ongoing developments in machine learning (ML) and deep learning (DL), specifically in computer vision (CV) and natural language processing (NLP), offer novel opportunities to modernize TCM by exploring the profound connotations of its theory. This review begins with an overview of the ML and DL methods employed in TCM; this is followed by practical instances of these applications. Furthermore, extensive discussions emphasize the mature integration of ML and DL in TCM, such as tongue diagnosis, pulse diagnosis, and syndrome differentiation treatment, highlighting their early successful application in the TCM field. Finally, this study validates the accomplishments and addresses the problems and challenges posed by the application and development of TCM powered by ML and DL. As ML and DL techniques continue to evolve, modern technology will spark new advances in TCM. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

46. Optimizing anomaly-based attack detection using classification machine learning.

Author: Gouda, Hany Abdelghany, Ahmed, Mohamed Abdelslam, and Roushdy, Mohamed Ismail
Subjects: *DEEP learning, *MACHINE learning, *CONVOLUTIONAL neural networks, *K-nearest neighbor classification, *DIGITAL technology, *RANDOM forest algorithms
Abstract: One significant aspect of our digital world is that data are literally everywhere, and their volume is increasing. At the same time, the number of cyberattacks aiming to seize these data and use them illegally is increasing at an exponential rate, and this is the challenge. Therefore, intrusion detection systems (IDS) have attracted considerable interest from researchers and industry. In this regard, machine learning (ML) techniques play a pivotal role, as they shift the responsibility for analyzing enormous amounts of data, finding patterns, classifying intrusions, and solving issues from humans to computers. This paper implements two separate classification layers of ML-based algorithms on the recently published NF-UQ-NIDS-v2 dataset: it preprocesses two sample volumes (100 k and 10 million records) using MinMaxScaler and LabelEncoder, selects the best features by recursive feature elimination, normalizes the data, and optimizes hyperparameters for both classical algorithms and neural networks. With the small dataset volume, the classical algorithms layer shows high detection accuracy for support vector machine (98.26%), decision tree (98.78%), random forest (99.07%), K-nearest neighbors (98.16%), CatBoost (99.04%), and gradient boosting (98.80%). In addition, the neural network layer proved very powerful, particularly because of deep learning's ability to handle enormous amounts of data and detect hidden correlations and patterns, with detection results of 98.87% for long short-term memory and 98.56% for convolutional neural networks. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
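The preprocessing in the abstract above names scikit-learn's MinMaxScaler and LabelEncoder. To show what those two steps do to the data without depending on scikit-learn, here is a minimal pure-Python sketch; `min_max_scale` and `label_encode` are hypothetical stand-ins, not the library API and not the authors' pipeline. Following the library's behaviour, constant columns are mapped to 0 and class labels receive integer codes in sorted order. In a real pipeline the minima and maxima should be fitted on the training split only and reused on the test split, to avoid leaking test information.

```python
from typing import Hashable, List, Sequence, Tuple


def min_max_scale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: rescale each column to [0, 1] using that column's min and max."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
             for v, lo, hi in zip(row, mins, maxs)]
            for row in rows]


def label_encode(labels: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Hypothetical helper: map class labels to integer codes in sorted order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[label] for label in labels], classes


flows = [[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]]
scaled = min_max_scale(flows)
codes, classes = label_encode(["dos", "benign", "dos", "ddos"])
```

Label encoding of this kind is meant for target classes; applying it to nominal input features imposes an arbitrary ordering that tree ensembles tolerate but distance-based models such as k-nearest neighbors may not.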

47. Speech emotion recognition via graph-based representations.

Author: Pentari, Anastasia, Kafentzis, George, and Tsiknakis, Manolis
Subjects: *EMOTION recognition, *DEEP learning, *AFFECTIVE computing, *MACHINE learning, *COMPARATIVE method, *RANDOM forest algorithms
Abstract: Speech emotion recognition (SER) has attracted increasing interest over the last decades as part of enriched affective computing. As a consequence, a variety of engineering approaches have been developed to address the SER problem, exploiting different features, learning algorithms, and datasets. In this paper, we propose applying graph theory to classify emotionally colored speech signals. Graph theory provides tools for extracting statistical as well as structural information from any time series, and we propose using this information as a novel feature set. Furthermore, we suggest setting a unique feature-based identity for each emotion of each speaker. Emotion classification is performed by a Random Forest classifier in a Leave-One-Speaker-Out Cross Validation (LOSO-CV) scheme. The proposed method is compared with two state-of-the-art approaches involving well-known hand-crafted features and deep learning architectures operating on mel-spectrograms. Experimental results on three datasets, EMODB (German, acted), AESDD (Greek, acted), and DEMoS (Italian, in-the-wild), show that the proposed method outperforms the comparative methods on these datasets, with average unweighted average recall (UAR) increases of almost 18%, 8%, and 13%, respectively. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

48. Suboptimal capability of individual machine learning algorithms in modeling small-scale imbalanced clinical data of local hospital.

Author: Li, Gang, Li, Chenbi, Wang, Chengli, and Wang, Zeheng
Subjects: *MACHINE learning, *DATA augmentation, *RANDOM forest algorithms, *DEEP learning, *ARTIFICIAL intelligence, *BIOCHEMICAL models
Abstract: In recent years, artificial intelligence (AI) has shown promising applications in various scientific domains, including biochemical analysis research. However, the effectiveness of AI in modeling small-scale, imbalanced datasets remains an open question in such fields. This study explores the capabilities of eight basic AI algorithms, including ridge regression, logistic regression, random forest regression, and others, in modeling a small, imbalanced clinical dataset (total n = 387, class 0 = 27, class 1 = 360) of biochemical blood test records from patients with multiple wasp stings (MWS). Through rigorous evaluation using k-fold cross-validation and comprehensive scoring, we found that none of the models could effectively model the data. Even after fine-tuning the hyperparameters of the best-performing models, the results remained below acceptable thresholds. The study highlights the challenges of applying AI to small-scale datasets with imbalanced groups in biochemical or clinical research and emphasizes the need for novel algorithms tailored to small-scale data. The findings also call for further exploration of techniques such as transfer learning and data augmentation, and they underline the importance of understanding the minimum dataset scale required for effective AI modeling in biochemical contexts. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

49. An Application of Machine Learning Algorithms on the Prediction of the Damage Level of Rubble-Mound Breakwaters.

Author: Saha, Susmita, De, Soumen, and Changdar, Satyasaran
Subjects: *MACHINE learning, *BREAKWATERS, *BOOSTING algorithms, *DEEP learning, *RANDOM forest algorithms
Abstract: Stability analysis of breakwaters is essential for the safe and economical design of these coastal protective structures, and the damage level is one of the most important parameters in this context. In the recent past, machine learning techniques have shown immense potential to transform many industries and processes, making them more efficient and accurate. In this study, five advanced machine learning algorithms (support vector regression, random forest, AdaBoost, gradient boosting, and a deep artificial neural network) were employed and analyzed for estimating the damage level of rubble-mound breakwaters. A large experimental dataset, covering almost every stability variable over its whole range, was used for this purpose. A detailed feature analysis is also presented to give insight into the relations between these variables. It was found that the present study overcame all of the limitations of existing studies in this field and delivered the highest level of accuracy. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

50. Attribution of Runoff Variation in Reservoir Construction Area: Based on a Merged Deep Learning Model and the Budyko Framework.

Author: Zhang, Lilan, Chen, Xiaohong, Huang, Bensheng, Chen, Liangxiong, and Liu, Jie
Subjects: *DEEP learning, *RUNOFF, *EXTREME value theory, *RANDOM forest algorithms, *CLIMATE change
Abstract: This study presents a framework to attribute river runoff variations to the combined effects of reservoir operations, land surface changes, and climate variability. We delineated the data into natural and impacted periods. For the natural period, an integrated Long Short-Term Memory and Random Forest model was developed to accurately simulate both mean and extreme runoff values, outperforming existing models. This model was then used to estimate runoff unaffected by human activities in the impacted period. Our findings indicate stable annual and wet season mean runoff, with a decrease in wet season maximums and an increase in dry season means, while extreme values remained largely unchanged. A Budyko framework incorporating reconstructed runoff revealed that rainfall and land surface changes are the predominant factors influencing runoff variations in wet and dry seasons, respectively, and land surface impacts become more pronounced during the impacted period for both seasons. Human activities dominate dry season runoff variation (93.9%), with climate change at 6.1%, while in the wet season, the split is 64.5% to 35.5%. Climate change and human activities have spontaneously led to reduced runoff during the wet season and increased runoff during the dry season. Only reservoir regulation is found to be linked to human-induced runoff changes, while the effects of land surface changes remain ambiguous. These insights underscore the growing influence of anthropogenic factors on hydrological extremes and quantify the role of reservoirs within the impacts of human activities on runoff. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
