18,111 results for "RANDOM forest algorithms"
Search Results
2. A cascading model for nudging employees towards energy-efficient behaviour in tertiary buildings.
- Author
Kalamaras, Ilias, Sánchez-Corcuera, Rubén, Casado-Mansilla, Diego, Tsolakis, Apostolos C., Gómez-Carmona, Oihane, Krinidis, Stelios, Borges, Cruz E., Tzovaras, Dimitrios, and López-de-Ipiña, Diego
- Subjects
NUDGE theory, GREEN behavior, BUILT environment, RANDOM forest algorithms, ENERGY consumption, COMMERCIAL buildings
- Abstract
Energy-related occupant behaviour in the built environment is considered crucial when aiming towards Energy Efficiency (EE), especially given the notion that people are most often unaware of and disengaged from the impacts of energy-consuming habits. In order to affect such energy-related behaviour, various approaches have been employed, the most common being the provision of recommendations towards more energy-efficient actions. In this work, the authors extend prior research findings in an effort to automatically identify the optimal Persuasion Strategy (PS), out of ten pre-selected by experts, tailored to a user (i.e., the context in which to trigger a message, allocate a task, or provide cues to enact an action). This process aims to successfully influence the employees' decisions about EE in tertiary buildings. The framework presented in this study utilizes cultural traits and socio-economic information. It is based on one of the largest survey datasets on this subject, comprising responses from 743 users collected through an online survey in four countries across Europe (Spain, Greece, Austria and the UK). The resulting framework was designed as a cascade of sequential data-driven prediction models. The first step employs a particular case of matrix factorisation to rank the ten PS in terms of preference for each user, followed by a random forest regression model that uses these rankings as a filtering step to compute scores for each PS and conclude with the best selection for each user. An ex-post assessment of the individual steps and the combined ensemble revealed increased accuracy over baseline non-personalised methods. Furthermore, the analysis also sheds light on important user characteristics to take into account for future interventions related to EE and the most effective persuasion strategies to adopt based on user data.
Discussion and implications of the reported results are provided in the text regarding the flourishing field of personalisation to motivate pro-environmental behaviour change in tertiary buildings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
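The two-step cascade this abstract describes can be sketched with stand-in components: a plain truncated SVD in place of the paper's particular matrix factorisation, and a scikit-learn random forest as the scoring step. Data, shapes, and variable names are all illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_users, n_strategies = 100, 10

# Stage 1: factorise the user x strategy preference matrix and use the
# low-rank reconstruction to rank the ten strategies for each user.
prefs = rng.random((n_users, n_strategies))        # stand-in survey-derived preferences
svd = TruncatedSVD(n_components=3, random_state=0)
recon = svd.inverse_transform(svd.fit_transform(prefs))
rankings = np.argsort(-recon, axis=1)              # best-first strategy indices per user

# Stage 2: a random forest regressor scores candidates from user features
# (cultural traits and socio-economic information in the study).
X_user = rng.random((n_users, 5))                  # stand-in user features
y_top = prefs[np.arange(n_users), rankings[:, 0]]  # observed score of each user's top pick
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_user, y_top)
best_per_user = rankings[:, 0]                     # final strategy selection per user
```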
3. Predicting malaria outbreak in The Gambia using machine learning techniques.
- Author
Khan, Ousman, Ajadi, Jimoh Olawale, and Hossain, M. Pear
- Subjects
MACHINE learning, MALARIA, ARTIFICIAL neural networks, SUPPORT vector machines, RANDOM forest algorithms, PARASITIC diseases
- Abstract
Malaria is the most common cause of death among parasitic diseases and continues to pose a growing threat to the public health and economic growth of nations in the tropical and subtropical parts of the world. This study aims to address this challenge by developing a predictive model for malaria outbreaks in each district of The Gambia, leveraging historical meteorological data. To achieve this objective, we employ and compare the performance of eight machine learning algorithms: C5.0 decision trees, artificial neural networks, k-nearest neighbors, support vector machines with linear and radial kernels, logistic regression, extreme gradient boosting, and random forests. The models are evaluated using 10-fold cross-validation during the training phase, repeated five times to ensure robust validation. Our findings reveal that extreme gradient boosting and decision trees exhibit the highest prediction accuracy on the testing set, achieving 93.3% accuracy, followed closely by random forests with 91.5% accuracy. In contrast, the support vector machine with a linear kernel performs less favorably, showing a prediction accuracy of 84.8% and underperforming in specificity analysis. Notably, the integration of both climatic and non-climatic features proves to be a crucial factor in accurately predicting malaria outbreaks in The Gambia. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
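The evaluation protocol above (10-fold cross-validation repeated five times, compared across classifiers) maps directly onto scikit-learn. A minimal sketch on synthetic data, with only three of the eight algorithms shown; all settings are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Stand-in for district-level meteorological features and outbreak labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 10-fold cross-validation repeated five times, as in the abstract.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm_linear": SVC(kernel="linear"),
    "logistic": LogisticRegression(max_iter=1000),
}
# Mean accuracy over the 50 folds, per model.
acc = {name: cross_val_score(m, X, y, cv=cv, scoring="accuracy").mean()
       for name, m in models.items()}
```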
4. Using optical coherence tomography to assess luster of pearls: technique suitability and insights.
- Author
Zhou, Yang, Zhou, Lifeng, Yan, Jun, Yan, Xuejun, and Chen, Zhengwei
- Subjects
OPTICAL coherence tomography, SPECKLE interference, SUPPORT vector machines, FRACTAL dimensions, RANDOM forest algorithms
- Abstract
Luster is one of the vital indexes in pearl grading. To find a fast, nondestructive, and low-cost grading method, optical coherence tomography (OCT) is introduced to predict the luster grade through texture features. After background removal, flattening, and segmentation, the speckle pattern of the region of interest is described by seven kinds of texture features, including center-symmetric auto-correlation (CSAC), fractal dimension (FD), Gabor, gray level co-occurrence matrix (GLCM), histogram of oriented gradients (HOG), laws texture energy (LAWS), and local binary patterns (LBP). To investigate the relations between speckle-derived texture features and luster grades, four groups of pearl samples were used in the experiment to detect texture differences based on support vector machines (SVMs) and a random forest classifier (RFC). The precision, recall, F1-score, and accuracy are greater than 0.9 in several simulations, even after dimension reduction. This demonstrates that texture features from OCT images can be applied to classify pearl luster based on speckle changes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
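Of the texture descriptors listed above, local binary patterns (LBP) are the simplest to illustrate. A minimal numpy sketch of the basic 8-neighbour variant (the paper's exact LBP configuration is not stated; this is a generic implementation):

```python
import numpy as np

def lbp_8neighbour(img):
    """Basic 8-neighbour local binary pattern for a 2-D grayscale image.

    Each interior pixel is encoded as an 8-bit number whose bits mark
    which neighbours are >= the centre pixel.
    """
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)
    return code

# A histogram of the codes is the texture feature vector that a
# classifier such as an SVM or random forest would consume.
img = np.arange(25, dtype=np.uint8).reshape(5, 5)  # toy "speckle" patch
codes = lbp_8neighbour(img)
hist = np.bincount(codes.ravel(), minlength=256)
```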
5. Employing machine learning for enhanced abdominal fat prediction in cavitation post-treatment.
- Author
Abdel Hady, Doaa A., Mabrouk, Omar M., and Abd El-Hafeez, Tarek
- Subjects
ABDOMINAL adipose tissue, CAVITATION, FAT, RANDOM forest algorithms, FEATURE selection, WAIST circumference, MACHINE learning
- Abstract
This study investigates the application of cavitation in non-invasive abdominal fat reduction and body contouring, a topic of considerable interest in the medical and aesthetic fields. We explore the potential of cavitation to alter abdominal fat composition and delve into the optimization of fat prediction models using advanced hyperparameter optimization techniques, Hyperopt and Optuna. Our objective is to enhance the predictive accuracy of abdominal fat dynamics post-cavitation treatment. Employing a robust dataset with abdominal fat measurements and cavitation treatment parameters, we evaluate the efficacy of our approach through regression analysis. The performance of Hyperopt and Optuna regression models is assessed using metrics such as mean squared error, mean absolute error, and R-squared score. Our results reveal that both models exhibit strong predictive capabilities, with R-squared scores reaching 94.12% and 94.11% for post-treatment visceral fat, and 71.15% and 70.48% for post-treatment subcutaneous fat predictions, respectively. Additionally, we investigate feature selection techniques to pinpoint critical predictors within the fat prediction models. Techniques including F-value selection, mutual information, recursive feature elimination with logistic regression and random forests, variance thresholding, and feature importance evaluation are utilized. The analysis identifies key features such as BMI, waist circumference, and pretreatment fat levels as significant predictors of post-treatment fat outcomes. Our findings underscore the effectiveness of hyperparameter optimization in refining fat prediction models and offer valuable insights for the advancement of non-invasive fat reduction methods. This research holds important implications for both the scientific community and clinical practitioners, paving the way for improved treatment strategies in the realm of body contouring. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
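Several of the feature selection techniques named in this abstract (F-value selection, mutual information, recursive feature elimination with random forests, variance thresholding) are available directly in scikit-learn. A sketch on synthetic data; the feature count, `k`, and data are illustrative, not the study's:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import (RFE, SelectKBest, VarianceThreshold,
                                       f_regression, mutual_info_regression)

# Stand-in data; in the paper the predictors include BMI, waist
# circumference, and pre-treatment fat levels.
X, y = make_regression(n_samples=200, n_features=10, noise=1.0, random_state=0)

selected = {}
selected["f_test"] = SelectKBest(f_regression, k=4).fit(X, y).get_support(indices=True)
selected["mutual_info"] = SelectKBest(mutual_info_regression, k=4).fit(X, y).get_support(indices=True)
selected["rfe_rf"] = RFE(RandomForestRegressor(n_estimators=50, random_state=0),
                         n_features_to_select=4).fit(X, y).get_support(indices=True)
selected["variance"] = VarianceThreshold(threshold=0.5).fit(X).get_support(indices=True)

# Features flagged by several methods are the most defensible predictors.
votes = np.bincount(np.concatenate(list(selected.values())), minlength=X.shape[1])
```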
6. Veillonella and Bacteroides are associated with gestational diabetes mellitus exposure and gut microbiota immaturity.
- Author
Valdez-Palomares, Fernanda, Aguilar, Jaqueline Reyes, Pérez-Campos, Eduardo, Mayoral, Laura Pérez-Campos, Meraz-Cruz, Noemi, and Palacios-González, Berenice
- Subjects
GUT microbiome, BACTEROIDES fragilis, GESTATIONAL diabetes, BACTEROIDES, VITAMIN B12, FUNCTIONAL groups, RANDOM forest algorithms
- Abstract
Background: Dysbiosis during childhood impacts the configuration and maturation of the microbiota. The immaturity of the infant microbiota is linked with the development of inflammatory, allergic, and dysmetabolic diseases. Aims: To identify taxonomic changes associated with age and GDM and to classify the maturity of the intestinal microbiota of children of mothers with GDM and children without GDM exposure (n-GDM). Methods: Next-generation sequencing was used to analyze the V3–V4 region of the 16S rRNA gene. QIIME2 and Picrust2 were used to determine differences in the relative abundance of bacterial genera between the study groups and to predict the functional profile of the intestinal microbiota. Results: According to age, the older GDM groups showed lower alpha diversity and differing abundances of Enterobacteriaceae, Veillonella, Clostridiales, and Bacteroides. Regarding the functional profile, PWY-7377 and K05895, associated with vitamin B12 metabolism, were reduced in the GDM groups. Compared to the n-GDM group, GDM offspring showed microbiota immaturity, as age-discriminatory taxa in a random forest model failed to classify GDM offspring according to developmental age (OOB error 81%). Conclusion: Offspring from mothers with GDM have a distinctive taxonomic profile related to taxa associated with gut microbiota immaturity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
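The out-of-bag (OOB) error quoted in this abstract is a built-in property of scikit-learn random forests; each tree is scored on the samples left out of its bootstrap. A minimal sketch on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for taxa abundances (features) and developmental-age class labels.
X, y = make_classification(n_samples=120, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)

# oob_score=True records out-of-bag accuracy; the abstract's "OOB error 81%"
# corresponds to 1 - oob_score_.
rf = RandomForestClassifier(n_estimators=300, oob_score=True,
                            random_state=0).fit(X, y)
oob_error = 1 - rf.oob_score_
```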
7. Bi-stage feature selection for crop mapping using grey wolf metaheuristic optimization.
- Author
Moustafa, Marwa S., Mahmoud, Amira S., Farg, Eslam, Nabil, Mohsen, and Arafat, Sayed M.
- Subjects
METAHEURISTIC algorithms, FEATURE selection, MACHINE learning, RANDOM forest algorithms, CROPS, REMOTE sensing, MATHEMATICAL optimization
- Abstract
Machine learning classifiers have been widely used for crop mapping; however, employing numerous features can reduce classifier accuracy. Feature selection is a widely recognized technique that addresses the issue of high dimensionality by choosing the most relevant subset of features with the highest degree of relevance and practicality. Therefore, the optimal combination of feature selection methods and classifiers for high-precision crop mapping remains an open problem. In this study, we introduce a novel hybrid feature selection approach to obtain the optimal subset of features for effective crop mapping. Initially, spectro-temporal remote sensing features from the dataset were ranked using two distinct feature selection techniques: Mutual Information (MI) and ReliefF filter. The most relevant features from each filter-based approach were combined into a union subset. Subsequently, the Gray Wolf Optimization (GWO), a metaheuristic optimization technique, was applied to enhance the feature set generated in the initial step. In the final stage, a random forest classifier leveraged the optimized feature subset for accurate crop type prediction. The effectiveness of the proposed approach was evaluated in Behiera governorate, Egypt. Comparative analysis against existing crop mapping methods demonstrated the superior performance of the proposed approach, achieving an accuracy rate of 82%. Furthermore, this approach showcased a significant reduction in the feature count, streamlining the set from 308 down to only 26 features. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
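The first stage of the bi-stage approach — rank features with two filters and take the union of each filter's top-k — can be sketched as follows. Note the substitution: scikit-learn has no ReliefF, so an ANOVA F-test stands in for it here; data and `k` are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Stand-in for spectro-temporal remote sensing features and crop labels.
X, y = make_classification(n_samples=200, n_features=50, n_informative=8,
                           random_state=0)

# Stage 1: two filter rankings, then the union of each filter's top-k.
# (The paper pairs mutual information with ReliefF; ANOVA F stands in here.)
k = 15
top_mi = SelectKBest(mutual_info_classif, k=k).fit(X, y).get_support(indices=True)
top_f = SelectKBest(f_classif, k=k).fit(X, y).get_support(indices=True)
union = np.union1d(top_mi, top_f)

# Stage 2 (not shown): a metaheuristic such as Grey Wolf Optimization searches
# binary masks over `union`, scoring each mask with a random forest, to shrink
# the set further - in the paper, from 308 down to 26 features.
```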
8. Spatiotemporal Landsat-Sentinel-2 satellite imagery-based Hybrid Deep Neural network for paddy crop prediction using Google Earth engine.
- Author
Saini, Preeti and Nagpal, Bharti
- Subjects
ARTIFICIAL neural networks, CROP yields, REMOTE-sensing images, RANDOM forest algorithms, PADDY fields, NUTRITIONAL requirements
- Abstract
• Proposed a hybrid model for paddy crop prediction based on phenological parameters.
• CNN-LSTM attained an RMSE value of 0.219 t/ha compared to other existing methods.
• The Random Forest method achieved the highest accuracy of 96.6% in paddy classification.

Rice is one of the predominant food sources to fulfill the dietary requirements of well-being in India. Therefore, accurate and timely paddy crop yield prediction is crucial to ensure the food security of the country. In this direction, the present study proposed a hybrid deep-learning method based on Conv-1D and LSTM layers using classification-derived phenological and meteorological parameters for paddy crop yield prediction. The paddy crop classification has been conducted using high-resolution (10 m) multispectral imagery based on GPS coordinates collected during the paddy field visits to extract the phenological parameters for input to the prediction model. In this context, the efficiency of Random Forest, Naïve Bayes, SVM, and Gradient Tree Boost classifiers was assessed. Furthermore, we have also analyzed the accuracy of Landsat-8, Sentinel-1 GRD, and Sentinel-2 satellite imagery in paddy crop classification based on area estimation. The Statistical Abstract of Haryana was utilized to validate the paddy crop area estimation and yield prediction. The classification outcomes showed that the Random Forest method attained the highest accuracy of 96.6% compared to other GEE-based classifiers. The proposed hybrid deep learning approach achieved an RMSE value of 0.219 t/ha compared to CNN, LSTM, CNN-Bi-LSTM, and regression techniques for crop yield prediction. The study concluded that Sentinel-2 satellite imagery performed well and that the proposed hybrid approach provides a viable alternative for paddy crop yield prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Comparative analysis of multi-source data for machine learning-based LAI estimation in Argania spinosa.
- Author
Mouafik, Mohamed, Fouad, Mounir, Audet, Felix Antoine, and El Aboudi, Ahmed
- Subjects
LEAF area index, RANDOM forest algorithms, DATA analysis, COMPARATIVE studies, REMOTE-sensing images
- Abstract
• Synergistic use of drone, satellite, and machine learning technologies improves LAI estimation.
• Robust validation and advanced performance metrics confirm model accuracy and precision.
• Utilizing vegetation indices enhances the precision of LAI quantification.

In this study, we conducted a comprehensive assessment of Leaf Area Index (LAI) estimation using three distinct data sources: Sentinel-2 satellite imagery, drone (UAV) imagery, and Mohammed VI satellite data. The main objective was to identify the most reliable and precise dataset for predicting LAI, with a focus on evaluating the performance of Random Forest models. For Sentinel-2 imagery, our Random Forest model achieves a robust R-squared (R2) value of 0.89, signifying a strong alignment between predicted and measured LAI values. The associated root-mean-square error (RMSE) is 0.4, indicating high predictive accuracy. In the context of UAVs, our Random Forest model excels, exhibiting an impressive R2 value of 0.93, highlighting a substantial correlation between predicted and measured LAI. The RMSE for drone imagery stands at 0.37, showcasing exceptional predictive accuracy. Finally, the Random Forest model trained on Mohammed VI satellite data yields an R2 value of 0.92, underlining its strong fit with measured LAI values. The RMSE for Mohammed VI imagery is 0.39, further underscoring the model's predictive accuracy. This comparative analysis underscores the importance of selecting the most suitable data source for LAI estimation in Argania spinosa. UAV imagery emerges as the most accurate choice, closely followed by Mohammed VI and Sentinel-2 imagery. These findings offer valuable insights for effective monitoring of Argania spinosa and advancing sustainable land management practices in rural ecosystems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
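The R2/RMSE evaluation pattern used throughout this abstract is a standard train/test random forest regression. A minimal sketch on synthetic stand-in data (the real inputs would be vegetation indices per tree and field-measured LAI):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in for per-tree spectral features and field-measured LAI values.
X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
r2 = r2_score(y_te, pred)                      # the study reports 0.89-0.93
rmse = mean_squared_error(y_te, pred) ** 0.5   # paired with RMSE, as in the study
```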
10. Machine Learning Algorithms for Predicting Energy Consumption in Educational Buildings.
- Author
Elhabyb, Khaoula, Baina, Amine, Bellafkih, Mostafa, and Deifalla, Ahmed Farouk
- Subjects
MACHINE learning, ENERGY consumption of buildings, ENERGY consumption, STANDARD deviations, CONSUMPTION (Economics), COMMERCIAL buildings, INTELLIGENT buildings, RANDOM forest algorithms
- Abstract
In the past few years, there has been a notable interest in the application of machine learning methods to enhance energy efficiency in the smart building industry. The paper discusses the use of machine learning in smart buildings to improve energy efficiency by analyzing data on energy usage, occupancy patterns, and environmental conditions. The study focuses on implementing and evaluating energy consumption prediction models using algorithms like long short-term memory (LSTM), random forest, and gradient boosting regressor. Real-life case studies on educational buildings are conducted to assess the practical applicability of these models. The data is rigorously analyzed and preprocessed, and performance metrics such as root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are used to compare the effectiveness of the algorithms. The results highlight the importance of tailoring predictive models to the specific characteristics of each building's energy consumption. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
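The three metrics used to compare models in this abstract (RMSE, MAE, MAPE) are simple to compute directly; a self-contained helper with toy numbers (not the study's data):

```python
import numpy as np

def regression_report(y_true, y_pred):
    """RMSE, MAE, and MAPE - the three metrics used to compare the models."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100   # y_true must be non-zero
    return {"rmse": rmse, "mae": mae, "mape": mape}

# Toy consumption values (e.g. kWh) and predictions.
report = regression_report([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```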
11. Advancing 100m sprint performance prediction: A machine learning approach to velocity curve modeling and performance correlation.
- Author
Tam, Chung Kit and Yao, Zai-Fu
- Subjects
SPRINTING, MACHINE learning, RUNNING speed, MACHINE performance, VELOCITY, SPORTS sciences, RANDOM forest algorithms
- Abstract
This study presents a novel approach to modeling the velocity-time curve in 100m sprinting by integrating machine learning algorithms. It critically addresses the limitations of traditional speed models, which often require extensive and intricate data collection, by proposing a more accessible and accurate method using fewer variables. The research utilized data from various international track events from 1987 to 2019. Two machine learning models, Random Forest (RF) and Neural Network (NN), were employed to predict the velocity-time curve, focusing on the acceleration phase of the sprint. The models were evaluated against the traditional exponential speed model using Mean Squared Error (MSE), with the NN model demonstrating superior performance. Additionally, the study explored the correlation between maximum velocity, the time of maximum velocity occurrence, the duration of the maximum speed phase, and the overall 100m sprint time. The findings indicate a strong negative correlation between maximum velocity and final time, offering new insights into the dynamics of sprinting performance. This research contributes significantly to the field of sports science, particularly in optimizing training and performance analysis in sprinting. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
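The "traditional exponential speed model" that the ML models are benchmarked against is commonly written as v(t) = v_max(1 − e^(−t/τ)) for the acceleration phase. A sketch of fitting it with scipy; the parameter values and noise level here are hypothetical, not from the study's data:

```python
import numpy as np
from scipy.optimize import curve_fit

# Classical exponential speed model for the acceleration phase:
# v(t) = v_max * (1 - exp(-t / tau)).
def v_model(t, v_max, tau):
    return v_max * (1.0 - np.exp(-t / tau))

t = np.linspace(0.0, 6.0, 60)
v_true = v_model(t, 11.8, 1.25)   # hypothetical elite-level curve
v_obs = v_true + np.random.default_rng(0).normal(0, 0.05, t.size)

# Fit the baseline model; the study compares its MSE against RF and NN models.
(v_max_hat, tau_hat), _ = curve_fit(v_model, t, v_obs, p0=(10.0, 1.0))
mse = np.mean((v_model(t, v_max_hat, tau_hat) - v_obs) ** 2)
```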
12. Artificial neural network, machine learning modelling of compressive strength of recycled coarse aggregate based self-compacting concrete.
- Author
Jagadesh, P., Khan, Afzal Hussain, Priya, B. Shanmuga, Asheeka, A., Zoubir, Zineb, Magbool, Hassan M., Alam, Shamshad, and Bakather, Omer Y.
- Subjects
SELF-consolidating concrete, ARTIFICIAL neural networks, MACHINE learning, COMPRESSIVE strength, RANDOM forest algorithms, DATA distribution, SILICATE cements (Dentistry)
- Abstract
This research study aims to understand the application of Artificial Neural Networks (ANNs) to forecast the Self-Compacting Recycled Coarse Aggregate Concrete (SCRCAC) compressive strength. From the literature, 602 available data sets from SCRCAC mix designs were collected, and the data were rearranged, reconstructed, trained and tested for the ANN model development. The models were established using seven input variables: the mass of cementitious content, water, natural coarse aggregate content, natural fine aggregate content, recycled coarse aggregate content, chemical admixture and mineral admixture used in the SCRCAC mix designs. Two normalization techniques were used for data normalization to visualize the data distribution. For each normalization technique, three transfer functions were used for modelling. In total, six different types of models were run in MATLAB and used to estimate the 28th day SCRCAC compressive strength. Normalization technique 2 performs better than technique 1, and TANSIG is the best transfer function. The best k-fold cross-validation fold is k = 7. The coefficient of determination for predicted and actual compressive strength is 0.78 for training and 0.86 for testing. The impact of the number of neurons and layers on the model was also evaluated. Inputs from standards were used to forecast the 28th day compressive strength. Apart from ANN, Machine Learning (ML) techniques like random forest, extra trees, extreme boosting and light gradient boosting were adopted to predict the 28th day compressive strength of SCRCAC. Compared to ML, ANN prediction shows better results in terms of sensitivity analysis. The study also extended to determining the 28th day compressive strength from experimental work and comparing it with the 28th day compressive strength from the best ANN model. Standard and ANN mix designs have similar fresh and hardened properties. 
The average compressive strengths from the ANN model and experimental results are 39.067 and 38.36 MPa, respectively, with a correlation coefficient of 1. It appears that ANN can validly predict the compressive strength of concrete. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Conservation units alone are insufficient to protect Brazilian Amazonian chelonians.
- Author
Sousa, Loyriane Moura, Correia, Letícia Lima, Alexandre, Rafaela Jemely Rodrigues, Pena, Simone Almeida, and Vieira, Thiago Bernardi
- Subjects
INDIGENOUS peoples of South America, SPECIES distribution, LAND cover, PROTECTED areas, SUPPORT vector machines, RANDOM forest algorithms
- Abstract
The creation of protected areas (PAs) is not always based on science; consequently, some aquatic species may not receive the same level of protection as terrestrial ones. The objective of this study was to identify priority areas for the conservation of chelonians in the Brazilian Amazon basin and assess the contribution of PAs, distinguishing between Full Protection Areas, Sustainable Use Areas, and Indigenous Lands for group protection. The entire species modeling procedure was carried out using Species Distribution Models. Location records were obtained from platforms such as SpeciesLink and GBIF, environmental layers from the Hydroatlas database, and bioclimatic variables from WorldClim; models were fitted with algorithms including Maximum Entropy, Random Forest, Support Vector Machine, and Gaussian-Bayesian approaches. Indigenous Lands cover more than 50% of the distribution areas of chelonian species in the Brazilian Amazon. Protected areas with higher conservation importance (Full Protection Areas and Sustainable Use Areas) hold less than 15% of the combined species distribution. Researchers face significant challenges when making decisions with models, especially in conservation efforts involving diverse taxa that differ significantly from one another within a group of individuals. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Hspb1 and Lgals3 in spinal neurons are closely associated with autophagy following excitotoxicity based on machine learning algorithms.
- Author
Yan, Lei, Li, Zihao, Li, Chuanbo, Chen, Jingyu, Zhou, Xun, Cui, Jiaming, Liu, Peng, Shen, Chong, Chen, Chu, Hong, Hongxiang, Xu, Guanhua, and Cui, Zhiming
- Subjects
AUTOPHAGY, MACHINE learning, SPINAL cord injuries, GLUTAMIC acid, NEURONS, RANDOM forest algorithms, GENE regulatory networks, PROTEIN-protein interactions
- Abstract
Excitotoxicity represents the primary cause of neuronal death following spinal cord injury (SCI). While autophagy plays a critical and intricate role in SCI, the specific mechanism underlying the relationship between excitotoxicity and autophagy in SCI has been largely overlooked. In this study, we isolated primary spinal cord neurons from neonatal rats and induced excitotoxic neuronal injury by high concentrations of glutamic acid, mimicking an excitotoxic injury model. Subsequently, we performed transcriptome sequencing. Leveraging machine learning algorithms, including weighted correlation network analysis (WGCNA), random forest analysis (RF), and least absolute shrinkage and selection operator analysis (LASSO), we conducted a comprehensive investigation into key genes associated with spinal cord neuron injury. We also utilized protein-protein interaction network (PPI) analysis to identify pivotal proteins regulating key gene expression and analyzed key genes from public datasets (GSE2599, GSE20907, GSE45006, and GSE174549). Our findings revealed that six genes—Anxa2, S100a10, Ccng1, Timp1, Hspb1, and Lgals3—were significantly upregulated not only in vitro in neurons subjected to excitotoxic injury but also in rats with subacute SCI. Furthermore, Hspb1 and Lgals3 were closely linked to neuronal autophagy induced by excitotoxicity. Our findings contribute to a better understanding of excitotoxicity and autophagy, offering potential targets and a theoretical foundation for SCI diagnosis and treatment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
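Two of the three key-gene selection methods named above (random forest importance and LASSO) combine naturally by intersecting their selections. A hedged sketch on synthetic data; WGCNA is omitted, and all sample/gene counts are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV

# Stand-in expression matrix (samples x genes) and injury/control labels.
X, y = make_classification(n_samples=80, n_features=100, n_informative=10,
                           random_state=0)

# Random forest: rank genes by impurity-based importance.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
rf_top = set(np.argsort(rf.feature_importances_)[::-1][:20])

# LASSO: genes with non-zero coefficients survive the L1 penalty.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_top = set(np.flatnonzero(lasso.coef_))

# Genes flagged by both methods are candidate key genes (cf. Hspb1, Lgals3).
consensus = rf_top & lasso_top
```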
15. Explainable machine learning approach for cancer prediction through binarilization of RNA sequencing data.
- Author
Chen, Tianjie and Kabir, Md Faisal
- Subjects
RNA sequencing, RECEIVER operating characteristic curves, SUPPORT vector machines, DECISION trees, RANDOM forest algorithms
- Abstract
In recent years, researchers have proven the effectiveness and speed of machine learning-based cancer diagnosis models. However, it is difficult to explain the results generated by machine learning models, especially ones that utilize complex high-dimensional data like RNA sequencing data. In this study, we propose the binarilization technique as a novel way to treat RNA sequencing data and use it to construct explainable cancer prediction models. We tested our proposed data processing technique on five different models, namely neural network, random forest, xgboost, support vector machine, and decision tree, using four cancer datasets collected from the National Cancer Institute Genomic Data Commons. Since our datasets are imbalanced, we evaluated the performance of all models using metrics designed for imbalanced data, such as geometric mean, Matthews correlation coefficient, F-measure, and area under the receiver operating characteristic curve. Our approach showed comparable performance while relying on fewer features. Additionally, we demonstrated that data binarilization offers higher explainability by revealing how each feature affects the prediction. These results demonstrate the potential of the data binarilization technique in improving the performance and explainability of RNA sequencing-based cancer prediction models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
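The core preprocessing idea — turn each continuous expression value into a binary expressed/not-expressed flag — can be sketched with a simple per-gene median threshold (the paper's exact thresholding rule is not stated here; the median cut and the data are assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Stand-in for expression values with imbalanced labels, as in the paper.
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           weights=[0.85], random_state=0)

# Binarilization: each gene becomes expressed / not-expressed relative to its
# per-gene median, making "feature = 1 pushed the prediction up" readable.
X_bin = (X > np.median(X, axis=0)).astype(np.uint8)

X_tr, X_te, y_tr, y_te = train_test_split(X_bin, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
mcc = matthews_corrcoef(y_te, clf.predict(X_te))  # imbalance-aware metric
```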
16. Machine learning approach for the development of a crucial tool in suicide prevention: The Suicide Crisis Inventory-2 (SCI-2) Short Form.
- Author
De Luca, Gabriele P., Parghi, Neelang, El Hayek, Rawad, Bloch-Elkouby, Sarah, Peterkin, Devon, Wolfe, Amber, Rogers, Megan L., and Galynker, Igor
- Subjects
SUICIDE prevention, MACHINE learning, SUICIDAL behavior, SUICIDAL ideation, BOOSTING algorithms, RANDOM forest algorithms
- Abstract
The Suicide Crisis Syndrome (SCS) describes a suicidal mental state marked by entrapment, affective disturbance, loss of cognitive control, hyperarousal, and social withdrawal that has predictive capacity for near-term suicidal behavior. The Suicide Crisis Inventory-2 (SCI-2), a reliable clinical tool that assesses SCS, lacks a short form for use in clinical settings. To address this need, a community sample of 10,357 participants responded to an anonymous survey, after which predictive performance for suicidal ideation (SI) and SI with preparatory behavior (SI-P) was measured using logistic regression, random forest, and gradient boosting algorithms. Four-fold cross-validation was used to split the dataset in 1,000 iterations. We compared item rankings to the SCI–Short Form to inform the short form of the SCI-2. Logistic regression performed best in every analysis. The SI results were used to build the SCI-2-Short Form (SCI-2-SF) utilizing the two top-ranking items from each SCS criterion. SHAP analysis of the SCI-2 resulted in meaningful rankings of its items. The SCI-2-SF, derived from these rankings, will be tested for predictive validity and utility in future studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Preoperative prediction model for risk of readmission after total joint replacement surgery: a random forest approach leveraging NLP and unfairness mitigation for improved patient care and cost-effectiveness.
- Author
Digumarthi, Varun, Amin, Tapan, Kanu, Samuel, Mathew, Joshua, Edwards, Bryan, Peterson, Lisa A, Lundy, Matthew E, and Hegarty, Karen E
- Subjects
RISK assessment, PREOPERATIVE period, RANDOM forest algorithms, PREDICTION models, PATIENT readmissions, DESCRIPTIVE statistics, NATURAL language processing, NURSING care facilities, ARTIFICIAL joints, ELECTRONIC health records, DATA analysis software, MACHINE learning, ALGORITHMS
- Abstract
Background: The Center for Medicare and Medicaid Services (CMS) imposes payment penalties for readmissions following total joint replacement surgeries. This study focuses on total hip, knee, and shoulder arthroplasty procedures as they account for most joint replacement surgeries. Apart from being a burden to healthcare systems, readmissions are also troublesome for patients. Several prior studies utilized only structured data from Electronic Health Records (EHR) without considering any gender or payor bias adjustments. Methods: For this study, a dataset of 38,581 total knee, hip, and shoulder replacement surgeries performed from 2015 to 2021 at Novant Health was gathered. This data was used to train a random forest machine learning model to predict the combined endpoint of emergency department (ED) visit, unplanned readmission within 30 days of discharge, or discharge to a Skilled Nursing Facility (SNF) following the surgery. Ninety-eight features covering laboratory results, diagnoses, vitals, medications, and utilization history were extracted. A natural language processing (NLP) model fine-tuned from Clinical BERT was used to generate an NLP risk score feature for each patient based on their clinical notes. To address societal biases, a feature bias analysis was performed in conjunction with propensity score matching. A threshold optimization algorithm from the Fairlearn toolkit was used to mitigate gender and payor biases to promote fairness in predictions. Results: The model achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.738 (95% confidence interval, 0.724 to 0.754) and an Area Under the Precision-Recall Curve (AUPRC) of 0.406 (95% confidence interval, 0.384 to 0.433). 
Considering an outcome prevalence of 16%, these metrics indicate the model's ability to accurately discriminate between readmission and non-readmission cases within the context of total arthroplasty surgeries while adjusting patient scores in the model to mitigate bias based on patient gender and payor. Conclusion: This work culminated in a model that identifies the most predictive and protective features associated with the combined endpoint. This model serves as a tool to empower healthcare providers to proactively intervene based on these influential factors without introducing bias towards protected patient classes, effectively mitigating the risk of negative outcomes and ultimately improving quality of care regardless of socioeconomic factors. [ABSTRACT FROM AUTHOR]
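The workflow above (a random forest risk model scored by AUROC and AUPRC on an imbalanced outcome) can be sketched minimally with scikit-learn. This is a hedged illustration on synthetic data, not the study's 98 EHR features or its Fairlearn post-processing step:

```python
# Minimal sketch: random forest risk model evaluated with AUROC and AUPRC.
# Data and feature names are synthetic stand-ins, not the study's EHR features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 10))                      # stand-in clinical features
logits = X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2]  # synthetic risk signal
y = (logits + rng.normal(size=n) > 1.0).astype(int)  # imbalanced binary endpoint

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

auroc = roc_auc_score(y_te, scores)            # discrimination across thresholds
auprc = average_precision_score(y_te, scores)  # precision-recall summary
print(round(auroc, 3), round(auprc, 3))
```

AUPRC is the more informative of the two metrics at low outcome prevalence, which is why the abstract reports both.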
- Published
- 2024
- Full Text
- View/download PDF
18. GNNGL-PPI: multi-category prediction of protein-protein interactions using graph neural networks based on global graphs and local subgraphs.
- Author
-
Zeng, Xin, Meng, Fan-Fang, Wen, Meng-Liang, Li, Shu-Juan, and Li, Yi
- Subjects
- *
GRAPH neural networks , *PROTEIN-protein interactions , *PARALLEL algorithms , *ISOMORPHISMS , *RANDOM forest algorithms - Abstract
Most proteins exert their functions by interacting with other proteins, making the identification of protein-protein interactions (PPI) crucial for understanding biological activities, pathological mechanisms, and clinical therapies. Developing effective and reliable computational methods for predicting PPI can significantly reduce the time and labor associated with traditional biological experiments. However, accurately identifying the specific categories of protein-protein interactions and improving the prediction accuracy of computational methods remain dual challenges. To tackle these challenges, we proposed a novel graph neural network method called GNNGL-PPI for multi-category prediction of PPI based on global graphs and local subgraphs. GNNGL-PPI consists of two main components: a Graph Isomorphism Network (GIN) that extracts global graph features from the PPI network graph, and GIN As Kernel (GIN-AK), which extracts local subgraph features from the subgraphs of protein vertices. Additionally, considering the imbalanced distribution of samples in each category within the benchmark datasets, we introduced an Asymmetric Loss (ASL) function to further enhance the predictive performance of the method. In evaluations on six benchmark test sets formed by three different dataset partitioning algorithms (Random, BFS, DFS), GNNGL-PPI outperformed the state-of-the-art multi-category PPI prediction methods, as measured by the comprehensive performance evaluation metric F1-measure. Furthermore, interpretability analysis confirmed the effectiveness of GNNGL-PPI as a reliable multi-category prediction method for protein-protein interactions. [ABSTRACT FROM AUTHOR]
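The Asymmetric Loss mentioned above handles class imbalance by focusing the positive and negative terms differently. A hedged NumPy sketch of the commonly used formulation follows; the focusing parameters and margin are illustrative defaults, not values from this paper:

```python
# Sketch of an Asymmetric Loss (ASL)-style objective: easy negatives are
# hard-discounted via a probability margin and a large negative focusing
# exponent. Hyperparameters below are illustrative, not the paper's.
import numpy as np

def asymmetric_loss(p, y, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """p: predicted probabilities in (0, 1); y: binary labels {0, 1}."""
    p = np.clip(p, eps, 1 - eps)
    p_m = np.clip(p - margin, eps, 1 - eps)           # shifted prob. for negatives
    loss_pos = -((1 - p) ** gamma_pos) * np.log(p)    # down-weights easy positives
    loss_neg = -(p_m ** gamma_neg) * np.log(1 - p_m)  # suppresses easy negatives
    return np.mean(y * loss_pos + (1 - y) * loss_neg)

p = np.array([0.9, 0.2, 0.05, 0.7])
y = np.array([1, 0, 0, 1])
print(asymmetric_loss(p, y))
```

With `gamma_neg > gamma_pos`, the gradient contribution of abundant, easily classified negatives shrinks, which is the mechanism the abstract invokes for imbalanced PPI categories.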
- Published
- 2024
- Full Text
- View/download PDF
19. Machine learning-based algorithm identifies key mitochondria-related genes in non-alcoholic steatohepatitis.
- Author
-
Dai, Longfei, Jiang, Renao, Zhan, Zhicheng, Zhang, Liangliang, Qian, Yuyang, Xu, Xinjian, Yang, Wenqi, and Zhang, Zhen
- Subjects
- *
NON-alcoholic fatty liver disease , *MACHINE learning , *RANDOM forest algorithms , *THYMIDYLATE synthase , *PLANT mitochondria , *APOPTOSIS , *LIPID synthesis - Abstract
Background: Evidence suggests that hepatocyte mitochondrial dysfunction leads to abnormal lipid metabolism, redox imbalance, and programmed cell death, driving the onset and progression of non-alcoholic steatohepatitis (NASH). Identifying hub mitochondrial genes linked to NASH may unveil potential therapeutic targets. Methods: Mitochondrial hub genes implicated in NASH were identified via analysis using 134 algorithms. Results: The Random Forest algorithm (RF), the most effective of the 134 algorithms, identified three genes: aldo–keto reductase family 1 member B10 (AKR1B10), thymidylate synthase (TYMS), and triggering receptor expressed on myeloid cells 2 (TREM2). All three were upregulated in patients with NASH and positively associated with genes promoting inflammation, genes involved in lipid synthesis, fibrosis, and non-alcoholic steatohepatitis activity scores. Moreover, using these three genes, patients with NASH were accurately categorized into cluster 1, exhibiting heightened disease severity, and cluster 2, distinguished by milder disease activity. Conclusion: These three genes are pivotal mitochondrial genes implicated in NASH progression. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Comparing fatal crash risk factors by age and crash type by using machine learning techniques.
- Author
-
Alshehri, Abdulaziz H., Alanazi, Fayez, Yosri, Ahmed. M., and Yasir, Muhammad
- Subjects
- *
TRAFFIC regulations , *RANDOM forest algorithms , *SAFETY regulations , *MACHINE learning , *TRAFFIC accidents , *WEATHER , *TRAFFIC safety - Abstract
This study aims to use machine learning methods to examine the causative factors of significant crashes, focusing on accident type and driver age. A wide-ranging dataset from Jeddah city is employed to examine factors such as driver gender, vehicle location, and the prevailing weather conditions, and to compare the performance of four machine learning algorithms: XGBoost, CatBoost, LightGBM, and Random Forest. The results show that the XGBoost model (95.4% accuracy), the CatBoost model (94% accuracy), and the LightGBM model (94.9% accuracy) were all superior to the Random Forest model, which achieved 89.1% accuracy; of the four, XGBoost attained the highest accuracy. These differences between models illustrate the need for further analysis when assessing vehicle accidents. Machine learning is also a transformative tool in traffic safety analysis, providing vital guidance for developing accurate traffic safety regulations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Machine learning-empowered sleep staging classification using multi-modality signals.
- Author
-
Satapathy, Santosh Kumar, Brahma, Biswajit, Panda, Baidyanath, Barsocchi, Paolo, and Bhoi, Akash Kumar
- Subjects
- *
SLEEP stages , *FEATURE extraction , *RANDOM forest algorithms , *FEATURE selection , *DATABASES - Abstract
The goal is to enhance an automated sleep staging system's performance by leveraging the diverse signals captured through multi-modal polysomnography (PSG) recordings. Three modalities of PSG signals, namely electroencephalogram (EEG), electrooculogram (EOG), and electromyogram (EMG), were considered to obtain optimal fusions of the PSG signals, from which 63 features were extracted. These include frequency-based, time-based, statistical, entropy-based, and non-linear features. We adopted the ReliefF (ReF) feature selection algorithm to find suitable features for each signal and for the superposition of PSG signals. The twelve top features, those most correlated with the sleep stages across the extracted feature sets, were selected. The selected features were fed into an AdaBoost with Random Forest (ADB + RF) classifier to validate the chosen segments and classify the sleep stages. Experiments were conducted under two testing schemes: epoch-wise testing and subject-wise testing. The research was conducted using four publicly available datasets: ISRUC-Sleep subgroup 1 (ISRUC-SG1), Sleep-EDF (S-EDF), the PhysioBank CAP sleep database (PB-CAPSDB), and S-EDF-78. This work demonstrated that the proposed fusion strategy outperforms the common individual usage of PSG signals. [ABSTRACT FROM AUTHOR]
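The "ADB + RF" classifier described above, AdaBoost boosting over random-forest base learners, can be sketched with scikit-learn. This is a hedged illustration: the features and sleep-stage labels below are synthetic stand-ins for the twelve selected PSG features:

```python
# Sketch of an AdaBoost ensemble whose base learner is a random forest
# ("ADB + RF"). Features and stage labels are simulated, not PSG data.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 12))  # stand-in for the 12 selected features per epoch
y = (X[:, 0] + X[:, 1] > 0).astype(int) + (X[:, 2] > 1).astype(int)  # 3 "stages"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
adb_rf = AdaBoostClassifier(
    RandomForestClassifier(n_estimators=25, max_depth=4, random_state=1),
    n_estimators=10,
    random_state=1,
)
adb_rf.fit(X_tr, y_tr)
acc = adb_rf.score(X_te, y_te)
print(round(acc, 3))
```

Boosting re-weights epochs the forest misclassifies, which is one plausible motivation for pairing the two in a sleep-staging pipeline.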
- Published
- 2024
- Full Text
- View/download PDF
22. Development and validation of machine learning models to predict the need for haemostatic therapy in acute upper gastrointestinal bleeding.
- Author
-
Nazarian, Scarlet, Lo, Frank Po Wen, Qiu, Jianing, Patel, Nisha, Lo, Benny, and Ayaru, Lakshmana
- Subjects
- *
HEMOSTATICS , *RISK assessment , *RANDOM forest algorithms , *PREDICTIVE tests , *GASTROINTESTINAL hemorrhage , *ACUTE diseases , *PREDICTION models , *HUMAN services programs , *RECEIVER operating characteristic curves , *RESEARCH funding , *EVALUATION of human services programs , *RETROSPECTIVE studies , *DESCRIPTIVE statistics , *LONGITUDINAL method , *OPERATIVE surgery , *ENDOSCOPIC gastrointestinal surgery , *INTERVENTIONAL radiology , *MACHINE learning , *COMPARATIVE studies , *CONFIDENCE intervals , *DATA analysis software , *SENSITIVITY & specificity (Statistics) - Abstract
Background: Acute upper gastrointestinal bleeding (AUGIB) is a major cause of morbidity and mortality. This presentation, however, is not universally high risk, as only 20–30% of bleeds require urgent haemostatic therapy. Nevertheless, the current standard of care is for all patients admitted to an inpatient bed to undergo endoscopy within 24 h for risk stratification, which is invasive, costly and difficult to achieve in routine clinical practice. Objectives: To develop novel non-endoscopic machine learning models for AUGIB to predict the need for haemostatic therapy by endoscopic, radiological or surgical intervention. Design: A retrospective cohort study. Methods: We analysed data from patients admitted with AUGIB to hospitals from 2015 to 2020 (n = 970). Machine learning models were internally validated to predict the need for haemostatic therapy. The performance of the models was compared to the Glasgow-Blatchford score (GBS) using the area under the receiver operating characteristic curve (AUROC). Results: The random forest classifier [AUROC 0.84 (0.80–0.87)] had the best performance and was superior to the GBS [AUROC 0.75 (0.72–0.78), p < 0.001] in predicting the need for haemostatic therapy in patients with AUGIB. A GBS cut-off of ⩾12 was associated with an accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of 0.74, 0.49, 0.81, 0.41 and 0.85, respectively. The random forest model performed better, with an accuracy, sensitivity, specificity, PPV and NPV of 0.82, 0.54, 0.90, 0.60 and 0.88, respectively. Conclusion: We developed and validated a machine learning algorithm with high accuracy and specificity in predicting the need for haemostatic therapy in AUGIB. This could be used to risk stratify high-risk patients to urgent endoscopy. [ABSTRACT FROM AUTHOR]
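The five metrics quoted above all derive from a 2×2 confusion matrix. A small sketch, using hypothetical counts chosen so the output matches the random forest's reported figures (the study's actual counts are not given in the abstract):

```python
# Accuracy / sensitivity / specificity / PPV / NPV from a confusion matrix.
# The counts below are hypothetical, chosen to reproduce the random forest's
# reported metrics (0.82 / 0.54 / 0.90 / 0.60 / 0.88); not the study's data.
def binary_metrics(tp, fp, tn, fn):
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall for "needs haemostatic therapy"
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

m = binary_metrics(tp=54, fp=36, tn=324, fn=46)
print({k: round(v, 2) for k, v in m.items()})
```

With roughly 20% prevalence, a high NPV alongside modest sensitivity is typical: the model is better at safely ruling out the need for therapy than at catching every case.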
- Published
- 2024
- Full Text
- View/download PDF
23. Predictive modelling of transport decisions and resources optimisation in pre-hospital setting using machine learning techniques.
- Author
-
Farhat, Hassan, Makhlouf, Ahmed, Gangaram, Padarath, El Aifa, Kawther, Howland, Ian, Babay Ep Rekik, Fatma, Abid, Cyrine, Khenissi, Mohamed Chaker, Castle, Nicholas, Al-Shaikh, Loua, Khadhraoui, Moncef, Gargouri, Imed, Laughton, James, and Alinier, Guillaume
- Subjects
- *
EMERGENCY medical services , *PREDICTION models , *TRANSPORTATION of patients , *SUPPORT vector machines , *RANDOM forest algorithms - Abstract
Background: The global evolution of pre-hospital care systems faces dynamic challenges, particularly in multinational settings. Machine learning (ML) techniques enable the exploration of deeply embedded data patterns for improved patient care and resource optimisation. This study's objective was to accurately predict cases that necessitated transportation versus those that did not, using ML techniques, thereby facilitating efficient resource allocation. Methods: ML algorithms were utilised to predict patient transport decisions for a Middle Eastern national pre-hospital emergency medical care provider. A comprehensive dataset comprising 93,712 emergency calls from the 999-call centre was analysed using the R programming language. Demographic and clinical variables were incorporated to enhance predictive accuracy. Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost) algorithms were trained and validated. Results: All the trained models, particularly XGBoost (accuracy = 83.1%), accurately predicted patients' transportation decisions. Further, they indicated statistically significant patterns that could be leveraged for targeted resource deployment. Moreover, the specificity rates were high: 97.96% for RF and 95.39% for XGBoost, minimising the incidence of incorrectly identified "Transported" cases (false positives). Conclusion: The study demonstrated the transformative potential of ML algorithms for enhancing the quality of pre-hospital care in Qatar. The high predictive accuracy of the employed models suggested actionable avenues for day- and time-specific resource planning and patient triaging, with the potential to contribute to pre-hospital quality, safety, and value improvement. These findings pave the way for more nuanced, data-driven quality improvement interventions with significant implications for future operational strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Detecting the symptoms of Parkinson's disease with non-standard video.
- Author
-
Mifsud, Joseph, Embry, Kyle R., Macaluso, Rebecca, Lonini, Luca, Cotton, R. James, Simuni, Tanya, and Jayaraman, Arun
- Subjects
- *
PARKINSON'S disease , *SUPPORT vector machines , *RANDOM forest algorithms , *MACHINE learning , *SYMPTOMS , *MOTORS - Abstract
Background: Neurodegenerative diseases, such as Parkinson's disease (PD), necessitate frequent clinical visits and monitoring to identify changes in motor symptoms and provide appropriate care. By applying machine learning techniques to video data, automated video analysis has emerged as a promising approach to track and analyze motor symptoms, which could facilitate more timely intervention. However, existing solutions often rely on specialized equipment and recording procedures, which limits their usability in unstructured settings like the home. In this study, we developed a method to detect PD symptoms from unstructured videos of clinical assessments, without the need for specialized equipment or recording procedures. Methods: Twenty-eight individuals with Parkinson's disease completed a video-recorded motor examination that included the finger-to-nose and hand pronation-supination tasks. Clinical staff provided ground truth scores for the level of Parkinsonian symptoms present. For each video, we used a pre-existing model called PIXIE to measure the location of several joints on the person's body and quantify how they were moving. Features derived from the joint angles and trajectories, designed to be robust to recording angle, were then used to train two types of machine-learning classifiers (random forests and support vector machines) to detect the presence of PD symptoms. Results: The support vector machine trained on the finger-to-nose task had an F1 score of 0.93 while the random forest trained on the same task yielded an F1 score of 0.85. The support vector machine and random forest trained on the hand pronation-supination task had F1 scores of 0.20 and 0.33, respectively. Conclusion: These results demonstrate the feasibility of developing video analysis tools to track motor symptoms across variable perspectives. These tools do not work equally well for all tasks, however. 
This technology has the potential to overcome barriers to access for many individuals with degenerative neurological diseases like PD, providing them with a more convenient and timely method to monitor symptom progression, without requiring a structured video recording procedure. Ultimately, more frequent and objective home assessments of motor function could enable more precise telehealth optimization of interventions to improve clinical outcomes inside and outside of the clinic. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Application of Deep Learning in Sea Surface Height Estimation of GNSS Data Sets.
- Author
-
Su, Yucheng, Fu, Shuai, Jiao, Boyang, Su, Yekang, Mao, Taoning, He, Yuping, and Jiang, Yi
- Subjects
- *
CONVOLUTIONAL neural networks , *GLOBAL Positioning System , *DEEP learning , *STANDARD deviations , *RANDOM forest algorithms - Abstract
In this work, we used the convolutional neural network (CNN) method to invert sea surface height (SSH) from Global Navigation Satellite System (GNSS) delay-Doppler map (DDM) data spanning 2009–2017 and compared the CNN inversion data with those obtained from the traditional random forest (RF) method. SSH observations from the OSTM/Jason-2 satellite were used to judge the merits of the two methods. The results show that both methods yield good SSH inversion results; with a training set of 9000 samples, the root mean square errors of the SSH inversions based on the CNN and the RF method are 16.78 and 15.96, respectively. As the training set grows beyond 9000 samples, the accuracy of the CNN method becomes significantly better than that of the RF method. This suggests that SSH inversion based on the CNN method will become more advantageous as more data become available. [ABSTRACT FROM AUTHOR]
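The comparison above hinges on the root mean square error between inverted and reference SSH. A minimal sketch of that metric (the arrays are illustrative placeholders, not Jason-2 data):

```python
# Root-mean-square error between an inversion and a reference series,
# the metric used to compare the CNN and RF SSH inversions.
# Arrays below are illustrative placeholders, not altimetry data.
import numpy as np

def rmse(pred, ref):
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

ssh_ref = np.array([10.0, 12.0, 9.5, 11.0])   # reference SSH
ssh_cnn = np.array([10.5, 11.5, 9.0, 11.5])   # inverted SSH
print(rmse(ssh_cnn, ssh_ref))  # → 0.5
```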
- Published
- 2024
- Full Text
- View/download PDF
26. Leveraging Temporal Information to Improve Machine Learning-Based Calibration Techniques for Low-Cost Air Quality Sensors.
- Author
-
Ali, Sharafat, Alam, Fakhrul, Potgieter, Johan, and Arif, Khalid Mahmood
- Subjects
- *
AIR pollution monitoring , *CALIBRATION , *DETECTORS , *GAS detectors , *RANDOM forest algorithms , *CARBON monoxide detectors - Abstract
Low-cost ambient sensors have been identified as a promising technology for monitoring air pollution at a high spatio-temporal resolution. However, the pollutant data captured by these cost-effective sensors are less accurate than their conventional counterparts and require careful calibration to improve their accuracy and reliability. In this paper, we propose to leverage temporal information, such as the duration of time a sensor has been deployed and the time of day the reading was taken, in order to improve the calibration of low-cost sensors. This information is readily available and has so far not been utilized in the reported literature for the calibration of cost-effective ambient gas pollutant sensors. We make use of three data sets collected by research groups around the world, who gathered the data from field-deployed low-cost CO and NO2 sensors co-located with accurate reference sensors. Our investigation shows that using the temporal information as a co-variate can significantly improve the accuracy of common machine learning-based calibration techniques, such as Random Forest and Long Short-Term Memory. [ABSTRACT FROM AUTHOR]
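The idea above, feeding deployment age and time of day into the calibration model as co-variates, can be sketched with scikit-learn. A sin/cos cyclical encoding of time of day is one common choice; the paper's exact feature construction may differ, and all data below is simulated:

```python
# Sketch: temporal co-variates (time of day, days since deployment) added to
# a random-forest calibration model. Cyclical sin/cos encoding is one common
# choice; data and the drift/diurnal model are simulated assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 1000
raw = rng.uniform(0.0, 1.0, n)         # raw low-cost sensor reading
hour = rng.uniform(0.0, 24.0, n)       # time of day of the reading
age_days = rng.uniform(0.0, 365.0, n)  # days since deployment (drift proxy)

# Simulated "true" concentration: raw signal distorted by diurnal bias + drift.
truth = 2.0 * raw + 0.3 * np.sin(2 * np.pi * hour / 24.0) + 0.001 * age_days

X = np.column_stack([
    raw,
    np.sin(2 * np.pi * hour / 24.0),   # cyclical time-of-day encoding
    np.cos(2 * np.pi * hour / 24.0),
    age_days,
])
model = RandomForestRegressor(n_estimators=100, random_state=2).fit(X, truth)
r2 = model.score(X, truth)
print(round(r2, 3))
```

The sin/cos pair avoids the artificial discontinuity at midnight that a raw 0–24 hour feature would introduce for tree and neural calibrators alike.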
- Published
- 2024
- Full Text
- View/download PDF
27. Aviation fuel pump health state assessment based on evidential reasoning and random forests.
- Author
-
Zhang, Bangcheng, Chen, Dianxin, Su, Wei, Liu, Tiejun, and Shao, Yubo
- Subjects
- *
AIRCRAFT fuels , *RANDOM forest algorithms , *FUEL pumps , *EVOLUTIONARY algorithms , *COVARIANCE matrices - Abstract
As the power source of the engine, the Fuel Pump (FP) plays a vital role in the safe operation of an aircraft. Owing to the complexity of the working mechanism of Aviation Fuel Pumps (AFP) and the strong correlation between components, existing assessment models cannot adequately reflect the health state of FPs, and the initial parameters of an assessment model affect its assessment performance. Therefore, this paper proposes a health state assessment model that can fully integrate monitoring data. To improve the accuracy of the model parameters, the Random Forest algorithm is used to assign feature weights, compensating for the limitations of relying on expert knowledge, and the model parameters are optimised by the Covariance Matrix Adaptation Evolution Strategy algorithm, achieving an accurate assessment of the health state. Finally, an AFP test bed was built and the AFP was tested. The accuracy of the proposed method reaches 96%, substantially outperforming the compared methods and verifying its effectiveness. The paper also provides an outlook on future research directions for health state assessment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Digital descriptors sharpen classical descriptors, for improving genebank accession management: A case study on Arachis spp. and Phaseolus spp.
- Author
-
Conejo-Rodríguez, Diego Felipe, Gonzalez-Guzman, Juan José, Ramirez-Gil, Joaquín Guillermo, Wenzl, Peter, and Urban, Milan Oldřich
- Subjects
- *
ARACHIS , *PLANT germplasm , *MACHINE learning , *RANDOM forest algorithms , *AGROBIODIVERSITY - Abstract
High-throughput phenotyping brings new opportunities for the detailed characterization of genebank accessions based on image-processing techniques and data analysis using machine learning algorithms. Our work proposes to improve the characterization processes of bean and peanut accessions in the CIAT genebank through the identification of phenomic descriptors comparable to classical descriptors, and to integrate the methodology into the genebank workflow. To meet these goals, morphometric and colorimetric traits of 14 bean and 16 forage peanut accessions were determined and compared to the classical International Board for Plant Genetic Resources (IBPGR) descriptors. Descriptors discriminating most accessions were identified using a random forest algorithm. The most valuable classification descriptors were 100-seed weight and days to flowering for peanuts, and days to flowering and primary seed color for beans. The combination of phenomic and classical descriptors increased the accuracy of the classification of Phaseolus and Arachis accessions. Functional diversity indices are recommended to genebank curators for evaluating phenotypic variability, in order to identify accessions with unique traits or accessions that represent the greatest phenotypic variation of the species (functional agrobiodiversity collections). Artificial intelligence algorithms are capable of characterizing accessions, which reduces the costs of additional phenotyping. Even though deep analysis of the data requires new skills, associating genetic, morphological and ecogeographic diversity gives us an opportunity to establish unique functional agrobiodiversity collections with new potential traits. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Enhancing winter road maintenance with explainable AI: SHAP analysis for interpreting machine learning models in road friction estimation.
- Author
-
Ding, Xueru and Kwon, Tae J.
- Subjects
- *
ROAD maintenance , *FRICTION , *ARTIFICIAL intelligence , *REGRESSION trees , *RANDOM forest algorithms , *MACHINE learning - Abstract
Effective winter road maintenance relies on precise road friction estimation. Machine learning (ML) models have shown significant promise in this task; however, their inherent complexity makes understanding their inner workings challenging. This paper addresses this issue by conducting a comparative analysis of road friction estimation models using four ML methods: regression tree, random forest, eXtreme Gradient Boosting (XGBoost), and support vector regression (SVR). We then employ SHapley Additive exPlanations (SHAP), an explainable artificial intelligence (AI) technique, to enhance model interpretability. Our analysis on an Alberta dataset reveals that the XGBoost model performs best, with an accuracy of 91.39%. The SHAP analysis illustrates the logical relationships between predictor features and friction within all three tree-based models, but it also uncovers inconsistencies within the SVR model, potentially attributable to insufficient feature interactions. Thus, this paper not only showcases the role of explainable AI in improving the interpretability of ML models for road friction estimation, but also provides practical insights that could improve winter road maintenance decisions. [ABSTRACT FROM AUTHOR]
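SHAP attributions, as used above, decompose an individual prediction into per-feature contributions that sum to the gap between the prediction and the model's average output ("local accuracy"). For tree models this is computed by the `shap` package; the property itself can be sketched exactly for a linear model with independent features, where the attribution has the closed form φᵢ = wᵢ·(xᵢ − E[xᵢ]). Everything below is an illustrative stand-in, not the paper's friction model:

```python
# Sketch of SHAP's local-accuracy property using the closed form for a linear
# model with independent features: phi_i = w_i * (x_i - E[x_i]). The paper's
# tree-model SHAP values satisfy the same property; data here is synthetic.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
w = np.array([2.0, -1.0, 0.5])
f = X @ w + 1.0                        # a toy linear "friction" model

x = X[0]
phi = w * (x - X.mean(axis=0))         # linear-SHAP attributions for one case
base = f.mean()                        # expected model output (baseline)

# Local accuracy: attributions sum to the gap between f(x) and the baseline.
print(np.allclose(base + phi.sum(), x @ w + 1.0))  # → True
```

It is this additivity that lets SHAP plots rank features by their contribution to each friction estimate rather than only globally.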
- Published
- 2024
- Full Text
- View/download PDF
30. Effects of lifestyle and associated diseases on serum CC16 suggest complex interactions among metabolism, heart and lungs.
- Author
-
Rohmann, Nathalie, Stürmer, Paula, Geisler, Corinna, Schlicht, Kristina, Knappe, Carina, Hartmann, Katharina, Türk, Kathrin, Hollstein, Tim, Beckmann, Alexia, Seoudy, Anna K., Becker, Ulla, Wietzke-Braun, Perdita, Settgast, Ute, Tran, Florian, Rosenstiel, Philip, Beckmann, Jan H., von Schönfels, Witigo, Seifert, Stephan, Heyckendorf, Jan, and Franke, Andre
- Subjects
- *
LUNGS , *DISEASE complications , *WEIGHT loss , *RANDOM forest algorithms , *CARDIOVASCULAR system , *METABOLISM , *WAIST-hip ratio - Abstract
[Display omitted] • CC16 is an anti-inflammatory, immunomodulatory protein expressed in respiratory club cells. • Severe abdominal obesity and arterial hypertension robustly decrease serum CC16. • ACEi/ARBs, uricosurics and chronic heart failure robustly increase serum CC16. • Effects might be mediated by adipose tissue inflammation as well as RAAS and uric acid disturbance. • Findings indicate a complex interplay of the metabolic, respiratory and cardiovascular systems. Clara cell 16-kDa protein (CC16) is an anti-inflammatory, immunomodulatory secreted pulmonary protein with reduced serum concentrations in obesity according to recent data. Previous studies focused solely on bodyweight, which does not properly reflect obesity-associated implications for the metabolic and reno-cardio-vascular system. The purpose of this study was therefore to examine CC16 in a broad physiological context, considering cardio-metabolic comorbidities of primary pulmonary diseases. CC16 was quantified by ELISA in serum samples from a subset of the FoCus cohort (N = 497) and two weight loss intervention cohorts (N = 99). Correlation and general linear regression analyses were applied to assess the effects of lifestyle, gut microbiota, disease occurrence and treatment strategies on CC16. The importance and intercorrelation of determinants were validated using random forest algorithms. The CC16 A38G gene mutation, smoking and low microbial diversity significantly decreased CC16. Pre-menopausal female participants displayed lower CC16 than post-menopausal female and male participants. Biological age and uricosuric medications increased CC16 (all p < 0.01). Adjusted linear regression revealed CC16-lowering effects of high waist-to-hip ratio (est. −11.19 [−19.4; −2.97], p = 7.99 × 10⁻³), severe obesity (est. −2.58 [−4.33; −0.82], p = 4.14 × 10⁻³) and hypertension (est. −4.31 [−7.5; −1.12], p = 8.48 × 10⁻³). ACEi/ARB medication (p = 2.5 × 10⁻²) and chronic heart failure (est. 4.69 [1.37; 8.02], p = 5.91 × 10⁻³) had increasing effects on CC16. Mild associations of CC16 were observed with blood pressure, HOMA-IR and NT-proBNP, but not with manifest hyperlipidemia, type 2 diabetes, diet quality or dietary weight loss intervention. This indicates a role of metabolic and cardiovascular abnormalities in the regulation of CC16 and its modifiability by behavioral and pharmacological interventions. Alterations by ACEi/ARB and uricosurics could point towards regulatory axes comprising the renin-angiotensin-aldosterone system and purine metabolism. Altogether, the findings strengthen the importance of interactions among metabolism, heart and lungs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Beyond salinity: Plants show divergent responses to soil ion composition.
- Author
-
Pätsch, Ricarda, Midolo, Gabriele, Dítě, Zuzana, Dítě, Daniel, Wagner, Viktoria, Pavonič, Michal, Danihelka, Jiří, Preislerová, Zdenka, Ćuk, Mirjana, Stroh, Hans Georg, Tóth, Tibor, Chytrá, Helena, and Chytrý, Milan
- Subjects
- *
SOIL composition , *SOIL salinity , *SALINITY , *PRINCIPAL components analysis , *RANDOM forest algorithms - Abstract
Aim: In salt‐affected environments, salinity shapes ecosystem functions and species composition. Apart from salinity, however, we know little about how soil chemical factors affect plant species. We hypothesized that specific ions, most of which contribute to salinity, co‐determine plant niche differentiation. We asked if the importance of ions differs for species with (halophytes) and without (associated species) physiological adaptations to saline soils. Location: Carpatho‐Pannonian region (Central and Eastern Europe). Time period: 2005–2021. Major taxa studied: Vascular plants. Methods: We recorded species occurrences and collected soil samples in 433 plots in saline habitats. We measured pH, salinity (electrical conductivity), and concentrations of Ca²⁺, K⁺, Mg²⁺, Na⁺, SO₄²⁻, Cl⁻, CO₃²⁻ and mineral nitrogen (mN), and calculated the sodium adsorption ratio (SAR). For 88 species, we fitted response curves with Huisman–Olff–Fresco (HOF) models. To study the ions' effects on species composition and the ions' variance, we compared unconstrained and constrained ordinations and performed a principal component analysis. We used random forests to analyse the importance of ions for individual species and created two‐dimensional species niche plots for key ions. Results: Ion concentration niches varied among species and did not necessarily correspond to soil salinity or alkalinity. We frequently observed monotonic, sigmoidal model responses, while skewed unimodal responses were rare. Ions explained a considerable proportion of species compositional variation. In particular, Na⁺, SO₄²⁻, Cl⁻, and CO₃²⁻ contributed to the ions' variance. Na⁺, followed by SO₄²⁻, Cl⁻, CO₃²⁻, Ca²⁺, Mg²⁺, and mN, was most important for the occurrence of individual species. Compared to associated species, Na⁺, SO₄²⁻, and mN were significantly less important for halophytes, whereas Cl⁻ and CO₃²⁻ played a significant role. 
Main conclusions: We show that ion composition co‐determines niche differentiation in saline soils, suggesting evolved physiological adaptations in halophytes. Our study calls for incorporating high‐resolution data on soil ion composition in ecological research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Discrimination between icequakes and earthquakes in southern Alaska: an exploration of waveform features using Random Forest algorithm.
- Author
-
Kharita, Akash, Denolle, Marine A, and West, Michael E
- Subjects
- *
RANDOM forest algorithms , *EARTHQUAKES , *ICE calving , *MACHINE learning , *SIGNAL-to-noise ratio - Abstract
This study examines the feature space of seismic waveforms often used in machine learning applications for seismic event detection and classification problems. Our investigation centres on the southern Alaska region, where the seismic record captures diverse seismic activity, notably from the calving of marine-terminating glaciers and tectonic earthquakes along active plate boundaries. While the automated discrimination of earthquakes and glacier quakes is our nominal goal, this data set provides an outstanding opportunity to explore the general feature space of regional seismic phases. That objective has applicability beyond ice quakes and our geographic region of study. We make a noteworthy discovery that features rooted in the spectral content of seismic waveforms consistently outperform statistical and temporal features. Spectral features demonstrate robust performance, exhibiting resilience to class imbalance while being minimally impacted by factors such as epicentral distance and signal-to-noise ratio. We also conduct experiments on the transferability of the model and find that transferability primarily depends on the appearance of the waveforms. Finally, we analyse misclassified events and find examples that are identified incorrectly in the original regional catalogue. [ABSTRACT FROM AUTHOR]
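The finding above, that spectral-content features outperform statistical and temporal ones, rests on frequency-domain descriptors of the waveform. A hedged sketch of one such family, band-limited spectral energy fractions, follows; the band edges and the synthetic two-tone trace are illustrative, not the study's feature definitions:

```python
# Sketch: fraction of spectral energy in fixed frequency bands, a typical
# spectral feature for seismic event classification. Band edges and the
# synthetic trace are illustrative assumptions, not the study's features.
import numpy as np

def band_energies(trace, fs, bands=((1, 5), (5, 10), (10, 20))):
    """Fraction of total spectral energy in each frequency band (Hz)."""
    spec = np.abs(np.fft.rfft(trace)) ** 2
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fs)
    total = spec.sum()
    return np.array([spec[(freqs >= lo) & (freqs < hi)].sum() / total
                     for lo, hi in bands])

fs = 100.0                              # 100 Hz sampling rate
t = np.arange(0, 10, 1 / fs)
trace = np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 12 * t)
feats = band_energies(trace, fs)
print(np.round(feats, 3))
```

Because such energy fractions are normalised per trace, they are comparatively insensitive to overall amplitude, one plausible reason spectral features tolerate varying epicentral distance and signal-to-noise ratio.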
- Published
- 2024
- Full Text
- View/download PDF
33. Clinical variables contributing to the identification of biologically defined subgroups within cognitively unimpaired and mild cognitive impairment individuals.
- Author
-
Marcolini, Sofia, Mondragón, Jaime D., Dominguez‐Vega, Zeus T., De Deyn, Peter P., and Maurits, Natasha M.
- Subjects
- *
ALZHEIMER'S disease , *RANDOM forest algorithms , *RACE , *MARITAL status , *PSYCHOLOGICAL tests , *MILD cognitive impairment - Abstract
Background: A lack of consensus exists in linking demographic, behavioral, and cognitive characteristics to biological stages of dementia, defined by the ATN (amyloid, tau, neurodegeneration) classification incorporating amyloid, tau, and neuronal injury biomarkers. Methods: Using a random forest classifier we investigated whether 27 demographic, behavioral, and cognitive characteristics allowed distinction between ATN‐defined groups with the same cognitive profile. This was done separately for three cognitively unimpaired (CU) (112 A‐T‐N‐; 46 A+T+N+/−; 65 A‐T+/‐N+/−) and three mild cognitive impairment (MCI) (128 A‐T‐N‐; 223 A+T+N+/−; 94 A‐T+/‐N+/−) subgroups. Results: Balanced classification accuracy reached 39% for the CU and 52% for the MCI subgroups. Logical Delayed Recall (explaining 16% of the variance), followed by the Alzheimer's Disease Assessment Scale 13 (14%) and Everyday Cognition Informant (10%), were the most relevant characteristics for classification of the MCI subgroups. Race and ethnicity, marital status, and Everyday Cognition Patient were not relevant (0%). Conclusions: The demographic, behavioral, and cognitive measures used in our model were not informative in differentiating ATN‐defined CU profiles. Measures of delayed memory, general cognition, and activities of daily living were the most informative in differentiating ATN‐defined MCI profiles; however, these measures alone were not sufficient to reach high classification performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Potential value of CT-based comprehensive nomogram in predicting occult lymph node metastasis along paralaryngeal nerves in esophageal squamous cell carcinoma: a two-center study.
- Author
-
Xue, Ting, Wan, Xinyi, Zhou, Taohu, Zou, Qin, Ma, Chao, and Chen, Jieqiong
- Subjects
- *
LYMPHATIC metastasis , *RECURRENT laryngeal nerve , *RANDOM forest algorithms , *RECEIVER operating characteristic curves , *MACHINE learning - Abstract
Purpose: The aim of this study is to construct a combined model that integrates radiomics, clinical risk factors and machine learning algorithms to predict para-laryngeal lymph node metastasis in esophageal squamous cell carcinoma. Methods: A retrospective study included 361 patients with esophageal squamous cell carcinoma from 2 centers. Radiomics features were extracted from the computed tomography scans. Logistic regression, k-nearest neighbor, multilayer perceptron, Light Gradient Boosting Machine, support vector machine, and random forest algorithms were used to construct radiomics models. The receiver operating characteristic (ROC) curve and the Hosmer–Lemeshow test were employed to select the better-performing model. Clinical risk factors were identified through univariate and multivariate logistic regression analyses and utilized to develop a clinical model. A combined model was then created by merging radiomics and clinical risk factors. The performance of the models was evaluated using ROC curve analysis, and the clinical value of the models was assessed using decision curve analysis. Results: A total of 1024 radiomics features were extracted. Among the radiomics models, the KNN model demonstrated the best diagnostic capability and accuracy, with an area under the curve (AUC) of 0.84 in the training cohort and 0.62 in the internal test cohort. Furthermore, the combined model exhibited an AUC of 0.97 in the training cohort and 0.86 in the internal test cohort. Conclusion: A clinical-radiomics integrated nomogram can predict occult para-laryngeal lymph node metastasis in esophageal squamous cell carcinoma and provide guidance for personalized treatment. Highlights: To investigate the value of CT-based radiomics features in predicting occult lymph node metastasis adjacent to the recurrent laryngeal nerve in esophageal squamous cell carcinoma.
The radiomics nomogram showed better performance than the clinical model in predicting occult lymph node metastasis adjacent to the recurrent laryngeal nerve in esophageal squamous cell carcinoma, with AUCs of 0.97, 0.86, and 0.63 in the training, internal validation, and external validation cohorts, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Ensemble Learning-Based Seismic Response Prediction of Isolated Structure Considering Soil–Structure Interaction.
- Author
-
Fu, Bo, Liu, Xinrui, and Chen, Jin
- Subjects
- *
SOIL-structure interaction , *SEISMIC response , *GROUND motion , *REGRESSION trees , *RANDOM forest algorithms , *DATABASES - Abstract
To accurately and rapidly predict seismic responses, including the maximum displacement (MaxD) and maximum acceleration (MaxA), of the isolated structure considering the soil–structure interaction (SSI), five ensemble learning models, i.e. random forest (RF), gradient boosting regression tree (GBRT), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM) and stacking model, are constructed. Firstly, a total of 96 000 nonlinear time history analyses of the isolated structure considering the SSI are conducted with the aid of OpenSees. The generated database is used for training and testing ensemble learning models. The ensemble learning models have 12 input variables in four categories, i.e. ground motion parameters, structural parameter, isolation parameters and soil parameter, and two output variables, i.e. MaxD and MaxA. The study shows that all ensemble learning models have excellent prediction performance for both training and testing datasets. The determination coefficients are larger than 0.96 and root-mean-square errors (RMSEs) are relatively small. Among the five ensemble learning models, the stacking model exhibits the best performance. In addition, the calculation method of feature importance score for the stacking model is provided. According to the feature importance analysis, the ground motion parameters have a greater impact on seismic responses than the other three categories of inputs. Finally, six ground motions are randomly selected to verify the generalization ability of the proposed ensemble learning models. The results show that the stacking model has a favorable generalization ability with relatively small prediction errors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
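A stacking model of the kind the abstract above found best can be sketched with scikit-learn's StackingRegressor. Synthetic data with 12 features stands in for the paper's OpenSees-generated database, and the two base learners plus a ridge meta-learner are illustrative choices, not the paper's exact configuration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# 12 inputs standing in for ground-motion, structural, isolation and soil parameters.
X, y = make_regression(n_samples=2000, n_features=12, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
                ("gbrt", GradientBoostingRegressor(random_state=0))],
    final_estimator=RidgeCV(),  # meta-learner combining the base predictions
)
stack.fit(X_tr, y_tr)
pred = stack.predict(X_te)

r2 = r2_score(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5
```

The meta-learner is trained on out-of-fold predictions of the base models, which is why stacking can outperform any single base learner, consistent with the study's finding.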
36. Development of new materials for electrothermal metals using data driven and machine learning.
- Author
-
Zhou, Chengqun, Pei, Muyang, Wu, Chao, Xu, Degang, Peng, Qiang, and He, Guoai
- Subjects
- *
TEMPERATURE coefficient of electric resistance , *FEATURE selection , *RADIAL basis functions , *SUPPORT vector machines , *TITANIUM alloys , *RANDOM forest algorithms , *MACHINE learning - Abstract
After adopting a combined approach of data-driven methods and machine learning, the prediction of material performance and the optimization of composition design can significantly reduce the development time of materials at a lower cost. In this research, we employed four machine learning algorithms, including linear regression, ridge regression, support vector regression, and backpropagation neural networks, to develop predictive models for the electrical performance data of titanium alloys. Our focus was on two key objectives: resistivity and the temperature coefficient of resistance (TCR). Subsequently, leveraging the results of feature selection, we conducted an analysis to discern the impact of alloying elements on these two electrical properties. The prediction results indicate that for the resistivity data prediction task, the radial basis function kernel-based support vector machine model performs the best, with a correlation coefficient above 0.995 and a percentage error within 2%, demonstrating high predictive capability. For the TCR data prediction task, the best-performing model is a backpropagation neural network with two hidden layers, also with a correlation coefficient above 0.995 and a percentage error within 3%, demonstrating good generalization ability. The feature selection results using random forest and XGBoost indicate that Al and Zr have a significant positive effect on resistivity, while Al, Zr, and V have a significant negative effect on TCR. The conclusion of the composition optimization design suggests that to achieve both high resistivity and TCR, it is recommended to set the Al content in the range of 1.5% to 2% and the Zr content in the range of 2.5% to 3%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
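The best resistivity model in the abstract above is an RBF-kernel support vector machine. A minimal sketch follows; the three composition-like features and the smooth resistivity-like response are invented for illustration and are not the study's alloy data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
# Hypothetical alloying-element contents (three features, 0-3 wt%).
X = rng.uniform(0.0, 3.0, size=(300, 3))
# Hypothetical smooth resistivity-like response plus small measurement noise.
y = 100.0 + 20.0 * X[:, 0] + 15.0 * X[:, 1] - 5.0 * X[:, 2] + rng.normal(0.0, 0.5, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.1))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

corr = np.corrcoef(y_te, pred)[0, 1]             # correlation coefficient
rel_err = np.mean(np.abs((pred - y_te) / y_te))  # mean relative (percentage) error
```

Feature scaling before the RBF kernel matters: without it, the kernel width would be dominated by whichever feature has the largest numeric range.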
37. Improving predictions of compound amenability for liquid chromatography–mass spectrometry to enhance non-targeted analysis.
- Author
-
Charest, Nathaniel, Lowe, Charles N., Ramsland, Christian, Meyer, Brian, Samano, Vicente, and Williams, Antony J.
- Subjects
- *
MACHINE learning , *LIQUID chromatography-mass spectrometry , *ENVIRONMENTAL health , *RANDOM forest algorithms , *CHEMICAL structure , *STATISTICAL correlation - Abstract
Mass-spectrometry-based non-targeted analysis (NTA), in which mass spectrometric signals are assigned chemical identities based on a systematic collation of evidence, is a growing area of interest for toxicological risk assessment. Successful NTA results in better identification of potentially hazardous pollutants within the environment, facilitating the development of targeted analytical strategies to best characterize risks to human and ecological health. A supporting component of the NTA process involves assessing whether suspected chemicals are amenable to the mass spectrometric method, which is necessary in order to assign an observed signal to the chemical structure. Prior work from this group involved the development of a random forest model for predicting the amenability of 5517 unique chemical structures to liquid chromatography–mass spectrometry (LC-MS). This work improves the interpretability of the group's prior model of the same endpoint, as well as integrating 1348 more data points across negative and positive ionization modes. We enhance interpretability by feature engineering, a machine learning practice that reduces the input dimensionality while attempting to preserve performance statistics. We emphasize the importance of interpretable machine learning models within the context of building confidence in NTA identification. The novel data were curated by the labeling of compounds as amenable or unamenable by expert curators, resulting in an enhanced set of chemical compounds to expand the applicability domain of the prior model. The balanced accuracy benchmark of the newly developed model is comparable to performance previously reported (mean CV BA is 0.84 vs. 0.82 in positive mode, and 0.85 vs. 0.82 in negative mode), while on a novel external set, derived from this work's data, the Matthews correlation coefficients (MCC) for the novel models are 0.66 and 0.68 for positive and negative mode, respectively. 
Our group's prior published models scored MCC of 0.55 and 0.54 on the same external sets. This demonstrates appreciable improvement across the chemical space captured by the expanded dataset. This work forms part of our ongoing efforts to develop models with higher interpretability and higher performance to support NTA efforts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Explainable prediction of node labels in multilayer networks: a case study of turnover prediction in organizations.
- Author
-
Gadár, László and Abonyi, János
- Subjects
- *
DECISION trees , *RANDOM forest algorithms , *LABOR turnover , *FORECASTING , *PREDICTION models - Abstract
In real-world classification problems, it is important to build accurate prediction models and provide information that can improve decision-making. Decision-support tools are often based on network models, and this article uses information encoded by social networks to solve the problem of employee turnover. However, understanding the factors behind black-box prediction models can be challenging. Our question was about the predictability of employee turnover, given information from the multilayer network that describes collaborations and the perceptions of organizational performance that indicate the success of cooperation. Our goal was to develop an accurate prediction procedure, preserve the interpretability of the classification, and capture the wide variety of specific reasons that explain positive cases. After feature engineering, we identified variables with the best predictive power using decision trees and ranked them based on their added value considering their frequent co-occurrence. We applied Random Forest with the SMOTE balancing technique for prediction. We calculated the SHAP values to identify the variables that contribute the most to individual predictions. As a last step, we clustered the sample based on SHAP values to fine-tune the explanations for quitting due to different background factors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
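The prediction step in the abstract above (SMOTE balancing followed by a Random Forest and an importance-based explanation) can be sketched as below. Assumptions to note: the data are synthetic, SMOTE proper (k-nearest-neighbour interpolation, as in the imbalanced-learn library) is approximated here by random minority-pair interpolation, and the forest's impurity-based importances stand in for the SHAP values the study uses.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def smote_like(X_min, n_new):
    """SMOTE-style oversampling: interpolate between random minority pairs
    (real SMOTE interpolates towards k-nearest neighbours instead)."""
    i = rng.integers(0, len(X_min), n_new)
    j = rng.integers(0, len(X_min), n_new)
    lam = rng.uniform(0.0, 1.0, (n_new, 1))
    return X_min[i] + lam * (X_min[j] - X_min[i])

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

X_min = X_tr[y_tr == 1]
n_new = int((y_tr == 0).sum() - len(X_min))        # synthesize up to a 1:1 class ratio
X_bal = np.vstack([X_tr, smote_like(X_min, n_new)])
y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
bal_acc = balanced_accuracy_score(y_te, clf.predict(X_te))
top_features = np.argsort(clf.feature_importances_)[::-1][:3]  # stand-in for a SHAP ranking
```

Only the training split is oversampled; evaluating on the untouched, still-imbalanced test split is what keeps the balanced-accuracy estimate honest.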
39. Saliva‑microbiome‑derived signatures: expected to become a potential biomarker for pulmonary nodules (MCEPN-1).
- Author
-
Ren, Yifeng, Ma, Qiong, Zeng, Xiao, Huang, Chunxia, Tan, Shiyan, Fu, Xi, Zheng, Chuan, You, Fengming, and Li, Xueke
- Subjects
- *
PULMONARY nodules , *NEISSERIA , *NICOTINAMIDE adenine dinucleotide phosphate , *LUNG cancer , *RANDOM forest algorithms , *ORAL microbiology , *ORAL mucosa - Abstract
Background: Oral microbiota imbalance is associated with the progression of various lung diseases, including lung cancer. Pulmonary nodules (PNs) are often considered a critical stage for the early detection of lung cancer; however, the relationship between oral microbiota and PNs remains unknown. Methods: We conducted a 'Microbiome with pulmonary nodule series study 1' (MCEPN-1) in which we compared PN patients and healthy controls (HCs), aiming to identify differences in oral microbiota characteristics and discover potential microbiota biomarkers for non-invasive, radiation-free PN diagnosis and warning in the future. We performed 16S rRNA amplicon sequencing on saliva samples from 173 PN patients and 40 HCs to compare the characteristics and functional changes in oral microbiota between the two groups. The random forest algorithm was used to identify PN salivary microbial markers. Biological functions and potential mechanisms of differential genes in saliva samples were preliminarily explored using the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Cluster of Orthologous Groups (COG) analyses. Results: The diversity of salivary microorganisms was higher in the PN group than in the HC group. Significant differences were noted in community composition and abundance of oral microorganisms between the two groups. Neisseria, Prevotella, Haemophilus and Actinomyces, Porphyromonas, Fusobacterium, 7M7x, Granulicatella and Selenomonas were the main differential genera between the PN and HC groups. Fusobacterium, Porphyromonas, Parvimonas, Peptostreptococcus and Haemophilus constituted the optimal marker set (area under curve, AUC = 0.80), which can distinguish between patients with PNs and HCs. Further, the salivary microbiota composition was significantly correlated with age, sex, and smoking history (P < 0.001), but not with personal history of cancer (P > 0.05).
Bioinformatics analysis of differential genes showed that patients with PNs exhibited significant enrichment in protein/molecular functions related to immune deficiency and energy metabolism, such as the cytoskeleton protein RodZ, nicotinamide adenine dinucleotide phosphate (NADPH) dehydrogenase, major facilitator superfamily transporters and AraC family transcription regulators. Conclusions: Our study provides the first evidence that the salivary microbiota can serve as a potential biomarker for identifying PNs. We observed a significant association between changes in the oral microbiota and PNs, indicating the potential of salivary microbiota as a new non-invasive biomarker for PNs. Trial registration: Clinical trial registration number: ChiCTR2200062140; Date of registration: 07/25/2022. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Employing machine learning algorithm for properties of wood ceramics prediction: A case study of ammonia nitrogen adsorption capacity, apparent porosity, surface hardness and burn-off for wood ceramics.
- Author
-
Jiang, Wenjun, Guo, Xiurong, Guan, Qi, Zhang, Yanlin, and Du, Danfeng
- Subjects
- *
MACHINE learning , *WOOD , *ATMOSPHERIC ammonia , *ADSORPTION capacity , *BOOSTING algorithms , *RANDOM forest algorithms , *CERAMICS - Abstract
The estimation of material performance plays a crucial role in practical life, enabling the rational allocation of time and resources while enhancing the practical application of materials. Therefore, this study investigates the analytical and predictive capabilities of five machine learning (ML) algorithms (the Random Forest algorithm - RF, the Adaboost algorithm - AB, the Gradient Boosting algorithm - GB, the Extra Trees algorithm - ET and linear models - LM) for comprehensive performance parameters of wood ceramics (ammonia nitrogen adsorption quantity - Q, open porosity - P, burn-off, and hardness). In the analysis of model prediction capabilities, five key statistical parameters (Root Mean Square Error - RMSE, Mean Square Error - MSE, Coefficient of Correlation - R, Determination Coefficient - R², and Mean Absolute Relative Error - MARE) were calculated. The results indicate the following: (1) Among the ML models, the GB model exhibits the best performance, with R² ≥ 0.9594. (2) In predicting the four performance parameters of wood ceramics, the AB model, with an R² ranging from 0.0078 to 0.45, shows notably poor predictive capability; even after integrating a Support Vector Regression (SVR) module, no enhancement in predictive accuracy was observed. (3) When forecasting the four performance parameters, the LM model demonstrates predictive accuracy comparable to that of the RF and ET models. (4) Although the feature importance scores vary across distinct input-variable models, they follow a consistent trend. (5) Variables B and D exhibit some level of correlation with other variables. In summary, the results of this study suggest that, for the analysis and prediction of wood ceramic performance, the GB model demonstrates outstanding simulation performance among the five regression models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
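The five statistical parameters used in the abstract above (RMSE, MSE, R, R², MARE) can be computed directly with NumPy around a gradient-boosting model; the synthetic regression data below stands in for the wood-ceramics measurements.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=2)
y = y + 500.0  # shift targets away from zero so relative error is well defined
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=2)

gb = GradientBoostingRegressor(n_estimators=300, random_state=2).fit(X_tr, y_tr)
pred = gb.predict(X_te)

mse = np.mean((pred - y_te) ** 2)                    # Mean Square Error
rmse = np.sqrt(mse)                                  # Root Mean Square Error
r = np.corrcoef(pred, y_te)[0, 1]                    # Coefficient of Correlation
r2 = 1.0 - np.sum((y_te - pred) ** 2) / np.sum((y_te - y_te.mean()) ** 2)  # Determination
mare = np.mean(np.abs((pred - y_te) / y_te))         # Mean Absolute Relative Error
```

Note that R and R² answer different questions for a nonlinear predictor: R measures linear association between predictions and targets, while R² compares the model's squared error against a constant-mean baseline.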
41. Prediction of jumbo drill penetration rate in underground mines using various machine learning approaches and traditional models.
- Author
-
Heydari, Sasan, Hoseinie, Seyed Hadi, and Bagherpour, Raheb
- Subjects
- *
MINES & mineral resources , *PENETRATION mechanics , *STANDARD deviations , *RANDOM forest algorithms , *MACHINE learning , *ROCK properties - Abstract
Estimating penetration rates of Jumbo drills is crucial for optimizing underground mining drilling processes, aiming to reduce costs and time. This study investigates various regression and machine learning methods, including Multilayer Perceptron (MLP), Support Vector Regression (SVR), and Random Forests (RF), to predict the penetration rates (ROP) using multivariate inputs such as operation parameters and rock mass characteristics. The Rock Mass Drillability Index (RDi), incorporating both intact rock properties and structural parameters, was utilized to characterize the rock mass. The dataset was split into 80% for training and 20% for testing. Performance metrics including correlation coefficient (R2), variance accounted for (VAF), mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) were calculated for each method to evaluate the accuracy of the predictions. SVR exhibited the best prediction performance for ROP, achieving the highest R2, lowest RMSE, MAE, and MAPE, as well as the largest VAF values of 0.94, 0.15, 0.11, 4.84, and 94.13 during training, and 0.91, 0.19, 0.13, 6.02, and 91.11 during testing, respectively. With this high accuracy, we conclude that the proposed machine learning algorithms are valuable and efficient predictors for estimating jumbo drill penetration rates in underground mining operations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
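Of the performance metrics reported in the abstract above, VAF (variance accounted for) is the least standard. A minimal NumPy definition, shown next to MAPE, might read as follows; the sample vector is purely illustrative.

```python
import numpy as np

def vaf(y_true, y_pred):
    """Variance accounted for, in percent: 100 * (1 - Var(y - y_hat) / Var(y))."""
    return 100.0 * (1.0 - np.var(y_true - y_pred) / np.var(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
perfect = vaf(y, y)                           # perfect prediction explains all variance
baseline = vaf(y, np.full_like(y, y.mean()))  # constant prediction explains none
```

Unlike R², VAF ignores any constant bias in the predictions (a uniformly shifted prediction has the same residual variance), which is why the two can disagree.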
42. M6A-related bioinformatics analysis indicates that LRPPRC is an immune marker for ischemic stroke.
- Author
-
Shen, Lianwei and Yue, Shouwei
- Subjects
- *
ISCHEMIC stroke , *REGULATOR genes , *RANDOM forest algorithms , *GENE expression , *CEREBROVASCULAR disease , *DRUG interactions - Abstract
Ischemic stroke (IS) is a common cerebrovascular disease whose pathogenesis involves a variety of immune molecules, immune channels and immune processes. N6-methyladenosine (m6A) modification regulates a variety of immune metabolic and immunopathological processes, but the role of m6A in IS is not yet understood. We downloaded the data set GSE58294 from the GEO database and screened for m6A-regulated differential expression genes. The RF algorithm was selected to screen the m6A key regulatory genes. Clinical prediction models were constructed and validated based on m6A key regulatory genes. IS patients were grouped according to the expression of m6A key regulatory genes, and immune markers of IS were identified based on immune infiltration characteristics and correlation. Finally, we performed functional enrichment, protein interaction network analysis and molecular prediction of the immune biomarkers. We identified a total of 7 differentially expressed genes in the dataset, namely METTL3, WTAP, YWHAG, TRA2A, YTHDF3, LRPPRC and HNRNPA2B1. The random forest algorithm indicated that all 7 genes were m6A key regulatory genes of IS, and the credibility of the above key regulatory genes was verified by constructing a clinical prediction model. Based on the expression of key regulatory genes, we divided IS patients into 2 groups. Based on the expression of the gene LRPPRC and the correlation of immune infiltration under different subgroups, LRPPRC was identified as an immune biomarker for IS. GO enrichment analyses indicate that LRPPRC is associated with a variety of cellular functions. Protein interaction network analysis and molecular prediction indicated that LRPPRC correlates with a variety of immune proteins, and LRPPRC may serve as a target for IS drug therapy. Our findings suggest that LRPPRC is an immune marker for IS. Further analysis based on LRPPRC could elucidate its role in the immune microenvironment of IS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data.
- Author
-
Uddin, Shahadat and Lu, Haohui
- Subjects
- *
MACHINE learning , *REGRESSION trees , *RANDOM forest algorithms , *SUPPORT vector machines , *DECISION trees - Abstract
Many individual studies in the literature observed the superiority of tree-based machine learning (ML) algorithms. However, the current body of literature lacks statistical validation of this superiority. This study addresses this gap by employing five ML algorithms on 200 open-access datasets from a wide range of research contexts to statistically confirm the superiority of tree-based ML algorithms over their counterparts. Specifically, it examines two tree-based ML (Decision tree and Random forest) and three non-tree-based ML (Support vector machine, Logistic regression and k-nearest neighbour) algorithms. Results from paired-sample t-tests show that both tree-based ML algorithms reveal better performance than each non-tree-based ML algorithm for the four ML performance measures (accuracy, precision, recall and F1 score) considered in this study, each at p<0.001 significance level. This performance superiority is consistent across both the model development and test phases. This study also used paired-sample t-tests for the subsets of the research datasets from disease prediction (66) and university-ranking (50) research contexts for further validation. The observed superiority of the tree-based ML algorithms remains valid for these subsets. Tree-based ML algorithms significantly outperformed non-tree-based algorithms for these two research contexts for all four performance measures. We discuss the research implications of these findings in detail in this article. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
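The paired-sample t-test at the heart of the study above pairs the two algorithms' scores by dataset. A sketch with SciPy on simulated per-dataset accuracies follows; the 200 values and the simulated performance deficit are invented for illustration, not the study's results.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(42)

# Simulated accuracies of a tree-based and a non-tree-based model on the same
# 200 datasets; pairing by dataset is what makes the paired test appropriate.
acc_tree = rng.normal(0.85, 0.05, 200)
acc_nontree = acc_tree - rng.normal(0.03, 0.02, 200)  # simulated consistent deficit

# One-sided test: is the tree-based model's mean score greater?
t_stat, p_value = ttest_rel(acc_tree, acc_nontree, alternative="greater")
```

Pairing removes the large between-dataset variance from the comparison, which is why a modest but consistent per-dataset advantage yields a very small p-value here.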
44. A machine learning-based predictive model of causality in orthopaedic medical malpractice cases in China.
- Author
-
Yang, Qingxin, Luo, Li, Lin, Zhangpeng, Wen, Wei, Zeng, Wenbo, and Deng, Hong
- Subjects
- *
MEDICAL malpractice , *MACHINE learning , *PREDICTION models , *RANDOM forest algorithms , *IDENTIFICATION , *DATABASES - Abstract
Purpose: To explore the feasibility and validity of machine learning models in determining causality in medical malpractice cases and to improve the scientific rigor and reliability of identification opinions. Methods: We collected 13,245 written judgments from PKULAW.COM, a public database. 963 cases were included after the initial screening. 21 medical and 10 patient factors were selected as characteristic variables by summarising previous literature and cases. Random Forest, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM) were each used to establish causality prediction models for the two data sets. Finally, the optimal model was obtained by hyperparameter tuning of the six models. Results: We built three real-data-set models and three virtual-data-set models using the three algorithms, and their confusion matrices differed. XGBoost performed best on the real data set, with a model accuracy of 66%. On the virtual data set, the performance of XGBoost and LightGBM was essentially the same, with a model accuracy of 80%. The overall accuracy of external verification was 72.7%. Conclusions: The optimal model of this study is expected to predict causality accurately. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Carpal tunnel syndrome prediction with machine learning algorithms using anthropometric and strength-based measurement.
- Author
-
Yetiş, Mehmet, Kocaman, Hikmet, Canlı, Mehmet, Yıldırım, Hasan, Yetiş, Aysu, and Ceylan, İsmail
- Subjects
- *
GRIP strength , *MACHINE learning , *CARPAL tunnel syndrome , *WRIST , *RANDOM forest algorithms , *MEDICAL personnel , *FORELIMB - Abstract
Objectives: Carpal tunnel syndrome (CTS) stands as the most prevalent upper extremity entrapment neuropathy, with a multifaceted etiology encompassing various risk factors. This study aimed to investigate whether anthropometric measurements of the hand, grip strength, and pinch strength could serve as predictive indicators for CTS through machine learning techniques. Methods: Enrollment encompassed patients exhibiting CTS symptoms (n = 56) and asymptomatic healthy controls (n = 56), with confirmation via electrophysiological assessments. Anthropometric measurements of the hand were obtained using a digital caliper, grip strength was gauged via a digital handgrip dynamometer, and pinch strengths were assessed using a pinchmeter. A comprehensive analysis was conducted employing four of the most common and effective machine learning algorithms, integrating thorough parameter tuning and cross-validation procedures. Additionally, the variable importance results were presented. Results: Among the diverse algorithms, Random Forests (accuracy of 89.474%, F1-score of 0.905, and kappa value of 0.789) and XGBoost (accuracy of 86.842%, F1-score of 0.878, and kappa value of 0.736) emerged as the top-performing choices based on distinct classification metrics. In addition, using variable importance calculations specific to these models, the most important variables were found to be wrist circumference, hand width, hand grip strength, tip pinch, key pinch, and middle finger length. Conclusion: The findings of this study demonstrated that wrist circumference, hand width, hand grip strength, tip pinch, key pinch, and middle finger length can be utilized as reliable indicators of CTS. Also, the model developed herein, along with the identified crucial variables, could serve as an informative guide for healthcare professionals, enhancing precision and efficacy in CTS prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. A Cost-Effective Model for Predicting Recurrent Gastric Cancer Using Clinical Features.
- Author
-
Chen, Chun-Chia, Ting, Wen-Chien, Lee, Hsi-Chieh, Chang, Chi-Chang, Lin, Tsung-Chieh, and Yang, Shun-Fa
- Subjects
- *
STOMACH cancer , *BOOTSTRAP aggregation (Algorithms) , *RANDOM forest algorithms , *BODY mass index , *ARTIFICIAL intelligence , *HELICOBACTER pylori infections - Abstract
This study used artificial intelligence techniques to identify clinical cancer biomarkers for recurrent gastric cancer survivors. From a hospital-based cancer registry database in Taiwan, the datasets of the incidence of recurrence and clinical risk features were included in 2476 gastric cancer survivors. We benchmarked Random Forest against MLP, C4.5, AdaBoost, and Bagging algorithms on several metrics, and leveraged the synthetic minority oversampling technique (SMOTE) for the imbalanced dataset issue, cost-sensitive learning for risk assessment, and SHapley Additive exPlanations (SHAP) for feature importance analysis in this study. Our proposed Random Forest outperformed the other models with an accuracy of 87.9%, a recall of 90.5%, a precision of 86%, and an F1 of 88.2% on the recurrent category by 10-fold cross-validation on a balanced dataset. We identified clinical features of recurrent gastric cancer; the top five features are stage, number of regional lymph nodes involved, Helicobacter pylori infection, BMI (body mass index), and gender. These features significantly affect the prediction model's output and are worth attention in the subsequent causal effect analysis. Using an artificial intelligence model, the risk factors for recurrent gastric cancer could be identified and cost-effectively ranked according to their feature importance. In addition, they should be crucial clinical features to provide physicians with the knowledge to screen high-risk patients among gastric cancer survivors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Short-Term Load Forecasting Based on Optimized Random Forest and Optimal Feature Selection.
- Author
-
Magalhães, Bianca, Bento, Pedro, Pombo, José, Calado, Maria do Rosário, and Mariano, Sílvio
- Subjects
- *
RANDOM forest algorithms , *FEATURE selection , *REGRESSION trees , *FORECASTING , *ELECTRIC power consumption , *COST control - Abstract
Short-term load forecasting (STLF) plays a vital role in ensuring the safe, efficient, and economical operation of power systems. Accurate load forecasting provides numerous benefits for power suppliers, such as cost reduction, increased reliability, and informed decision-making. However, STLF is a complex task due to various factors, including non-linear trends, multiple seasonality, variable variance, and significant random interruptions in electricity demand time series. To address these challenges, advanced techniques and models are required. This study focuses on the development of an efficient short-term power load forecasting model using the random forest (RF) algorithm. RF combines regression trees through bagging and random subspace techniques to improve prediction accuracy and reduce model variability. The algorithm constructs a forest of trees using bootstrap samples and selects random feature subsets at each node to enhance diversity. Hyperparameters such as the number of trees, minimum sample leaf size, and maximum features for each split are tuned to optimize forecasting results. The proposed model was tested using historical hourly load data from four transformer substations supplying different campus areas of the University of Beira Interior, Portugal. The training data were from January 2018 to December 2021, while the data from 2022 were used for testing. The results demonstrate the effectiveness of the RF model in forecasting short-term hourly and day-ahead load and its potential to enhance decision-making processes in smart grid operations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
48. Application of Machine Learning for Productivity Prediction in Tight Gas Reservoirs.
- Author
-
Fang, Maojun, Shi, Hengyu, Li, Hao, and Liu, Tongjing
- Subjects
- *
GAS reservoirs , *BACK propagation , *GAS wells , *RANDOM forest algorithms , *FORECASTING - Abstract
Accurate well productivity prediction plays a significant role in formulating reservoir development plans. However, traditional well productivity prediction methods lack accuracy in tight gas reservoirs; therefore, this paper quantitatively evaluates the correlations between absolute open flow and the critical parameters for Linxing tight gas reservoirs through statistical analysis. Dominant control factors are obtained by considering reservoir engineering theories, and a novel machine learning-based well productivity prediction method is proposed for tight gas reservoirs. The adaptability of the productivity prediction model is assessed through machine learning and field data analysis. Combined with typical decline curve analysis, the estimated ultimate recovery (EUR) of a single well in the tight gas reservoir is forecasted within an appropriate range. The study identifies 10 parameters (such as gas saturation) as the dominant controlling factors for well productivity and finds that geological factors affect productivity in this area more strongly than fracturing parameters. Among the three models compared, the R² values of the Support Vector Regression (SVR), Back Propagation (BP), and Random Forest (RF) models are 0.72, 0.87, and 0.91, respectively, indicating that RF yields the most accurate predictions. In addition, the RF model is better suited to medium- and high-production wells based on the actual field data. Based on this model, it is verified that the productivity of low-producing wells is affected by water production. This study confirms the model's reliability and application value by predicting recoverable reserves for a single well. [ABSTRACT FROM AUTHOR]
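The three-way model comparison reported here (SVR vs. BP neural network vs. RF, ranked by held-out R²) can be reproduced in miniature with scikit-learn. The data below are synthetic stand-ins for the paper's 10 dominant controlling factors, and the model settings are assumptions; the point is only the comparison workflow, not the reported 0.72/0.87/0.91 scores.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# synthetic stand-in: 10 features mimic the 10 dominant controlling factors
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
y = (y - y.mean()) / y.std()  # standardise the target so all models train comparably
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "BP": make_pipeline(StandardScaler(),  # BP network ~= a small MLP regressor
                        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                                     random_state=0)),
    "RF": RandomForestRegressor(n_estimators=300, random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
print(scores)  # held-out R² for each model; higher is better
```

As in the paper, the model with the highest held-out R² would be the one carried forward for EUR forecasting.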
- Published
- 2024
49. Early Prediction of Regional Red Needle Cast Outbreaks Using Climatic Data Trends and Satellite-Derived Observations.
- Author
-
Watt, Michael S., Holdaway, Andrew, Watt, Pete, Pearse, Grant D., Palmer, Melanie E., Steer, Benjamin S. C., Camarretta, Nicolò, McLay, Emily, and Fraser, Stuart
- Subjects
- *
REMOTE-sensing images , *PINUS radiata , *RANDOM forest algorithms , *SOLAR radiation , *DISEASE incidence , *SUMMER - Abstract
Red needle cast (RNC), caused mainly by Phytophthora pluvialis, is a highly damaging disease of radiata pine, a species widely grown in New Zealand. Using a combination of satellite imagery and weather data, a novel methodology was developed to pre-visually predict the incidence of RNC on radiata pine within the Gisborne region of New Zealand over a five-year period from 2019 to 2023. Sentinel-2 satellite imagery was used to classify areas within the region as disease-free or showing RNC expression, based on the difference in the red/green index (R/Gdiff) between a disease-free time of the year and the time of maximum disease expression in the upper canopy (early spring, September). Within these two classes, 1976 plots were extracted, and a classification model was used to predict disease incidence from mean monthly weather data for key variables during the 11 months prior to disease expression. The variables in the final random forest model included solar radiation, relative humidity, rainfall, and the maximum air temperature recorded during mid–late summer, which provided a pre-visual prediction of the disease 7–8 months before its peak expression. Using a hold-out test dataset, the final random forest model had an accuracy of 89% and an F1 score of 0.89. This approach can be used to mitigate the impact of RNC by focusing early surveillance and treatment measures. [ABSTRACT FROM AUTHOR]
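The two-stage design in this abstract, labelling plots from a vegetation-index difference and then training a random forest on prior-months weather, can be sketched as follows. Everything here is synthetic and illustrative: the 0.05 labelling threshold, the four weather columns, and the strength of the weather–disease link are all assumptions, not values from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_plots = 1000  # stand-in for the study's 1976 plots

# stage 1: label plots from the change in the red/green index between a
# disease-free date and the date of peak expression (threshold is hypothetical)
rg_diff = rng.normal(0.05, 0.06, n_plots)
diseased = (rg_diff > 0.05).astype(int)

# stage 2 inputs: mean monthly weather features for the preceding months
# (synthetic values, weakly linked to the label purely for illustration)
weather = rng.normal(0.0, 1.0, (n_plots, 4)) + diseased[:, None] * 0.8

X_tr, X_te, y_tr, y_te = train_test_split(weather, diseased, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
acc, f1 = accuracy_score(y_te, pred), f1_score(y_te, pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Because the weather features predate disease expression by months, a classifier of this shape gives the pre-visual early warning the study aims for.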
- Published
- 2024
50. Forest Habitat Mapping in Natura2000 Regions in Cyprus Using Sentinel-1, Sentinel-2 and Topographical Features.
- Author
-
Prodromou, Maria, Theocharidis, Christos, Gitas, Ioannis Z., Eliades, Filippos, Themistocleous, Kyriacos, Papasavvas, Konstantinos, Dimitrakopoulos, Constantinos, Danezis, Chris, and Hadjimitsis, Diofantos
- Subjects
- *
FOREST mapping , *FOREST monitoring , *RANDOM forest algorithms , *REMOTE sensing - Abstract
Accurate mapping of forest habitats, especially in NATURA sites, provides essential information for forest monitoring and sustainable management, as well as for habitat characterisation and ecosystem functioning. Remote sensing data and spatial modelling allow accurate mapping of the presence and distribution of tree species and habitats and are valuable tools for the long-term assessment of habitat status required by the European Commission. To this end, the present study proposes a methodology to accurately map the spatial distribution of forest habitats in three NATURA2000 sites of Cyprus by employing Sentinel-1 and Sentinel-2 data as well as topographic features using Google Earth Engine (GEE). A pivotal aspect of the methodology was identifying the band combination with which the Random Forest (RF) classifier achieves the highest performance for mapping the dominant habitats in the three case studies. Specifically, eight habitat types have been mapped in the Akamas region, nine in Paphos, and six in Troodos. These habitat types fall into three of the nine habitat groups defined by the EU's Habitats Directive: sclerophyllous scrub, rocky habitats and caves, and forests. The results show that the RF algorithm achieves the highest performance using Dataset 6, which is based on S2 bands, spectral indices, and topographical features, and Dataset 13, which includes S2 and S1 bands, spectral indices, and topographical features; these datasets achieve an overall accuracy (OA) of approximately 91–94%. In contrast, Dataset 7, which includes only S1 bands, and Dataset 9, which combines S1 bands and spectral indices, achieve the lowest performance, with an OA of approximately 25–43%. [ABSTRACT FROM AUTHOR]
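The dataset-comparison exercise described here, training the same RF classifier on different feature subsets and comparing overall accuracy, can be sketched with scikit-learn. The features below are synthetic, and the two column groups are hypothetical stand-ins for the study's S1-only dataset versus its S2 + spectral indices + topography dataset; the study itself runs in Google Earth Engine on real imagery.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# synthetic per-pixel features for a 6-class habitat problem; shuffle=False
# keeps the informative columns first so the slices below are meaningful
X, y = make_classification(n_samples=1200, n_features=20, n_informative=12,
                           n_classes=6, shuffle=False, random_state=0)
datasets = {
    "S1 only": X[:, :2],              # few, weakly separating features
    "S2 + indices + topo": X[:, 2:],  # richer feature set
}
oas = {}
for name, feats in datasets.items():
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    oas[name] = cross_val_score(clf, feats, y, cv=3).mean()  # overall accuracy
    print(f"{name}: OA = {oas[name]:.2f}")
```

As in the study, the richer feature set yields the higher overall accuracy, which is the criterion used to pick the operational dataset.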
- Published
- 2024