718 results on '"classification and regression tree"'
Search Results
2. Land-use classification of Malaysian soils by ultra-high performance liquid chromatography (UHPLC)-based untargeted data combined with chemometrics for forensic provenance
- Author
- Mohd Rosdi, Nur Ain Najihah Binti, Abdul Halim, Nur Izzma Hanis, Sashidharan, Jeevna A/P, Abd Hamid, Nadirah, Abdul Halim, Azhar, Sino, Hukil, and Lee, Loong Chuen
- Published
- 2024
3. Evaluation of ensemble data preprocessing strategy on forensic gasoline classification using untargeted GC–MS data and classification and regression tree (CART) algorithm
- Author
- Md Ghazi, Md Gezani Bin, Lee, Loong Chuen, Samsudin, Aznor Sheda Binti, and Sino, Hukil
- Published
- 2022
4. Estimating body weight in Sujiang pigs using artificial neural network, nearest neighbor, and CART algorithms: a comparative study using morphological measurements.
- Author
- Ergin, Malik and Koşkan, Özgür
- Abstract
The objectives of this study were to evaluate different machine learning algorithms for predicting body weight (BW) in Sujiang pigs using the following morphological traits: age, body length (BL), backfat thickness (BFT), chest circumference (CC), body height (BH), chest width (CW), and hip width (HW). Additionally, this study also investigated which machine learning algorithms could accurately and efficiently predict body weight in pigs using a limited set of morphological traits. For this purpose, morphological measurements of 365 mature (180 ± 5 days) Sujiang pigs from the Jiangsu Sujiang Pig Breeding Farm in Taizhou, Jiangsu Province, China were used. The age of the pigs (180 ± 5 days) was also included as a nominal predictor. In total, 218 individual measurements were obtained after data preprocessing. In the Sujiang pig dataset, BW had a significantly positive and high linear relationship with BH, BL, CW, HW, and CC resulting in values of 0.66, 0.72, 0.81, 0.84, and 0.88, respectively (p < 0.01). Artificial neural network (ANN), K-nearest neighbors (KNN), and classification and regression tree (CART) algorithms were used to predict BW. Overall, the ANN algorithm outperformed the other algorithms in this pig dataset according to the goodness of fit criteria of R² = 0.85, RMSE = 3.98, MAD = 3.25, MAPE = 4.25, SDR = 0.39, RAE = 0.002, MRAE = 0.008, and AIC = 97.96. The ANN algorithm was trained using several training algorithms, such as the Levenberg‒Marquardt algorithm, Bayesian regularization, and scaled conjugate gradient. In addition, the number of neurons in the hidden layer was manipulated to 2, 3, or 4. All training algorithms yielded similar results. However, when the predictor variables were HW, BL, and BH, the Levenberg–Marquardt network had the best ability to predict body weight in Sujiang pigs (R² = 0.83). When BH measurements were not included in the model, the model’s predictive ability decreased by approximately 5%. According to the results, the use of Levenberg‒Marquardt and Bayesian Regularization in the ANN algorithm can help improve breeding strategies. The traits determined to be the best predictors of BW in Sujiang pigs using the ANN algorithm can be used as indirect selection criteria in the future. This study suggests that different age stages, breeds, and morphological traits can be used to accurately predict BW in pigs in future research. These findings indicate that the ANN algorithm is a powerful tool for accurately predicting pig BW using a limited set of traits. The results of the ANN model can be used to establish selection criteria and breed standards for Sujiang pigs. [ABSTRACT FROM AUTHOR]
- Published
- 2025
5. Development of a clinical prediction rule for mobility status at discharge in patients with total knee arthroplasty: Using a decision tree model.
- Author
- Kuwahara, Kenta, Kato, Toshihiro, Akatsuka, Yuko, Nakazora, Shigeto, Fukuda, Aki, and Asada, Keiji
- Subjects
- *TOTAL knee replacement, *RECEIVER operating characteristic curves, *DECISION making, *KNEE osteoarthritis, *DECISION trees, *CLINICAL prediction rules, *KNEE pain
- Abstract
Total knee arthroplasty (TKA) is an effective treatment to improve mobility in patients with severe knee osteoarthritis. However, some patients continue to have poor mobility after surgery. The preoperative identification of patients with poor mobility after TKA allows for better treatment selection and appropriate goal setting. The purpose of this study was to develop a clinical prediction rule (CPR) to predict mobility after TKA. This study included patients undergoing primary TKA. Predictors of outcome included patient characteristics, physical function, and psychological factors, which were measured preoperatively. The outcome measure was the Timed Up and Go test, which was measured at discharge. Patients with a score of ≥11 s were considered having a low-level of mobility. The classification and regression tree methodology of decision tree analysis was used for developing a CPR. Of the 101 cases (mean age, 72.2 years; 71.3 % female), 26 (25.7 %) were classified as low-mobility. Predictors were the modified Gait Efficacy Scale, age, knee pain on the operated side, knee extension range of motion on the non-operated side, and Somatic Focus, a subscale of the Tampa Scale for Kinesiophobia (short version). The model had a sensitivity of 50.0 %, a specificity of 98.7 %, a positive predictive value of 92.9 %, a positive likelihood ratio of 37.5, and an area under the receiver operating characteristic curve of 0.853. We have developed a CPR that, with some accuracy, predicts the mobility outcomes of patients after TKA. This CPR may be useful for predicting postoperative mobility and clinical goal setting. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Design and performance analysis of decision tree learning model for classification of dry and cooked rice samples.
- Author
- Bhattacharyya, Suman Kumar and Pal, Sagarika
- Subjects
- *UPLAND rice, *FEATURE extraction, *MACHINE learning, *FOOD chemistry, *FOOD safety, *RICE quality
- Abstract
Ensuring accurate classification of rice as either cooked or dry is vital for food safety, as improperly stored or cooked rice can contain harmful bacteria, emphasizing the importance of maintaining and monitoring food safety standards. In the field of image analysis and food classification, classifying dry and cooked rice samples using photographs is an interesting but difficult task. The main challenge stems from the minute visual variations between cooked and dry rice, which do not always display distinct traits that are easily observable by machines. Hence, various machine learning algorithms were implemented to effectively mitigate this issue. However, the existing works have not analysed the physicochemical characteristics due to the non-destructive type of experimentation method with image processing. To overcome this issue, this work develops the Classification and Regression Tree (CART) of Decision Tree Learning method for classifying the rice grain samples as dry or cooked based on the physicochemical characteristics such as morphological, texture and color features, which in turn give an exhaustive picture of rice quality in diverse states. Initially, the images are captured and pre-processed, and the features are extracted. From the extracted features, the rice samples are classified as dry and cooked using DT and the results are compared with the existing algorithms like K-Nearest Neighbour (KNN) and Support Vector Machine (SVM). The comparative analysis of these classification algorithms shows that the DT learning model outperforms them under morphological, texture and color features in terms of accuracy, error, precision, recall and F-score. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Identification and prediction of association patterns between nutrient intake and anemia using machine learning techniques: results from a cross-sectional study with university female students from Palestine.
- Author
- Qasrawi, Radwan, Badrasawi, Manal, Al-Halawa, Diala Abu, Polo, Stephanny Vicuna, Khader, Rami Abu, Al-Taweel, Haneen, Alwafa, Reem Abu, Zahdeh, Rana, Hahn, Andreas, and Schuchardt, Jan Philipp
- Subjects
- *IRON deficiency anemia, *RISK assessment, *CROSS-sectional method, *PROTEINS, *DATA mining, *FOOD consumption, *MALNUTRITION, *DIETARY patterns, *CLUSTER analysis (Statistics), *T-test (Statistics), *HEALTH status indicators, *RESEARCH funding, *NUTRITIONAL requirements, *DESCRIPTIVE statistics, *MICRONUTRIENTS, *ANALYSIS of variance, *VITAMINS, *COLLEGE students, *MACHINE learning, *WOMEN'S health, *DECISION trees, *MINERALS, *ALGORITHMS, *DISEASE risk factors, *DISEASE complications
- Abstract
Purpose: This study utilized data mining and machine learning (ML) techniques to identify new patterns and classifications of the associations between nutrient intake and anemia among university students. Methods: We employed K-means clustering analysis algorithm and Decision Tree (DT) technique to identify the association between anemia and vitamin and mineral intakes. We normalized and balanced the data based on anemia weighted clusters for improving ML models' accuracy. In addition, t-tests and Analysis of Variance (ANOVA) were performed to identify significant differences between the clusters. We evaluated the models on a balanced dataset of 755 female participants from the Hebron district in Palestine. Results: Our study found that 34.8% of the participants were anemic. The intake of various micronutrients (i.e., folate, Vit A, B5, B6, B12, C, E, Ca, Fe, and Mg) was below RDA/AI values, which indicated an overall unbalanced malnutrition in the present cohort. Anemia was significantly associated with intakes of energy, protein, fat, Vit B1, B5, B6, C, Mg, Cu and Zn. On the other hand, intakes of protein, Vit B2, B5, B6, C, E, choline, folate, phosphorus, Mn and Zn were significantly lower in anemic than in non-anemic subjects. DT classification models for vitamins and minerals (accuracy rate: 82.1%) identified an inverse association between intakes of Vit B2, B3, B5, B6, B12, E, folate, Zn, Mg, Fe and Mn and prevalence of anemia. Conclusions: Besides the nutrients commonly known to be linked to anemia—like folate, Vit B6, C, B12, or Fe—the cluster analyses in the present cohort of young female university students have also found choline, Vit E, B2, Zn, Mg, Mn, and phosphorus as additional nutrients that might relate to the development of anemia. Further research is needed to elucidate if the intake of these nutrients might influence the risk of anemia. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Temporal–Spatial Fluctuations of a Phytoplankton Community and Their Association with Environmental Variables Based on Classification and Regression Tree in a Shallow Temperate Mountain River.
- Author
- Tian, Wang, Wang, Zhongyu, Kong, Haifei, Tian, Yonglan, and Huang, Tousheng
- Subjects
- OXIDATION-reduction potential, REGRESSION trees, MICROCYSTIS aeruginosa, SPECIES diversity, WATER temperature
- Abstract
The effects of environmental factors on phytoplankton are not simply positive or negative but complex and dependent on the combination of their concentrations in a fluctuating environment. Traditional statistical methods may miss some of the complex interactions between the environment and phytoplankton. In this study, the temporal–spatial fluctuations of phytoplankton diversity and abundance were investigated in a shallow temperate mountain river. The machine learning method classification and regression tree (CART) was used to explore the effects of environmental variables on the phytoplankton community. The results showed that both phytoplankton species diversity and abundance varied fiercely due to environmental fluctuation. Microcystis aeruginosa, Amphiprora sp., Anabaena oscillarioides, and Gymnodinium sp. were the dominant species. The CART analysis indicated that dissolved oxygen, oxidation-reduction potential, total nitrogen (TN), total phosphorus (TP), and water temperature (WT) explained 36.00%, 13.81%, 11.35%, 9.96%, and 8.80%, respectively, of phytoplankton diversity variance. Phytoplankton abundance was mainly affected by TN, WT, and TP, with variance explanations of 39.40%, 15.70%, and 14.09%, respectively. Most environmental factors had a complex influence on phytoplankton diversity and abundance: their effects were positive under some conditions but negative under other combinations. The results and methodology in this study are important in quantitatively understanding and exploring aquatic ecosystems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. Juvenile Albacore tuna (Thunnus alalunga) foraging ecology varies with environmental conditions in the California Current Large Marine Ecosystem
- Author
- Nickels, Catherine F, Portner, Elan J, Snodgrass, Owyn, Muhling, Barbara, and Dewar, Heidi
- Subjects
- Earth Sciences, Oceanography, Agricultural, Veterinary and Food Sciences, Fisheries Sciences, Life Below Water, classification and regression tree, diet, environmental drivers, fisheries interactions, foraging strategy, prey, stomach contents, Fisheries, Fisheries sciences
- Abstract
Juvenile North Pacific Albacore tuna (Thunnus alalunga) support commercial and recreational fisheries in the California Current Large Marine Ecosystem (CCLME), where they forage during summer and fall. The distributions of the commercial and recreational fisheries and estimates of forage availability have varied substantially over the past century. Time-series quantifying Albacore diet can help link forage composition to variability in Albacore abundance and distribution and, consequently, their availability to fishers. Previous diet studies in the CCLME are of relatively short duration, and long-term variability in Albacore diet remains poorly understood. We describe the diets of juvenile Albacore from three regions in the CCLME from 2007 to 2019 and use classification and regression tree analysis to explore environmental drivers of variability. Important prey include Northern Anchovy (Engraulis mordax), rockfishes (Sebastes spp.), Boreal Clubhook Squid (Onychoteuthis borealijaponica), euphausiids (Order: Euphausiidae), and amphipods (Order: Amphipoda), each contributing >5% mean proportional abundance. Most prey items were short lived species or young-of-the-year smaller than 10 cm. Diet variability was related to environmental conditions over the first 6 months of the year (PDO, sea surface temperature, and NPGO) and conditions concurrent with Albacore capture (region and surface nitrate flux). We describe foraging flexibility over regional and annual scales associated with these environmental influences. Continuous, long-term studies offer the opportunity to identify flexibility in Albacore foraging behavior and begin to make a predictive link between environmental conditions early in the year and Albacore foraging during summer and fall.
- Published
- 2023
10. A user guide of CART and random forests with applications in FinTech and InsurTech
- Author
- Chen, Yongzhao, Cheung, Ka Chun, Sun, Ross Zhengyao, and Yam, Sheung Chi Phillip
- Published
- 2024
11. Assessing the importance of risk factors for diabetic retinopathy in patients with type 2 diabetes mellitus: Results from the classification and regression tree models
- Author
- Ziyang Zhang, Deliang Lv, Yueyue You, Zhiguang Zhao, Wei Hu, Fengzhu Xie, Yali Lin, Wei Xie, and Xiaobing Wu
- Subjects
- classification and regression tree, diabetes mellitus, diabetic retinopathy, hemoglobin a1c, risk factors, Public aspects of medicine, RA1-1270
- Abstract
BACKGROUND: Diabetic retinopathy (DR) is one of the serious complications of diabetes mellitus (DM). Many studies have identified the risk factors associated with DR, but there is not much evidence on the importance of these factors for DR. This study aimed to investigate the associated factors for patients with type 2 DM (T2DM) and calculate the importance of the identified factors. MATERIALS AND METHODS: Using probability proportionate to size sampling method in this community-based cross-sectional study, 22 community health service centers were selected from 10 administrative districts in Shenzhen, China. Approximately 60 T2DM patients were recruited from each center. The participants completed a structural questionnaire, had their venous blood collected, and underwent medical examinations and fundus photography. Logistic regression models were used to identify the risk factors of DR. The classification and regression tree (CART) model was used to calculate the importance of the identified risk factors. RESULTS: This study recruited 1097 T2DM patients, 266 of whom were identified as having DR, yielding a prevalence rate of 24.3% (95% confidence interval [CI]: 21.7%–26.9%). Results showed that a longer duration of DM, indoor-type lifestyle, and higher levels of hemoglobin A1c (HbA1c) or urea increased the risk of DR. Patients with HbA1c values ≥7% were about 2.45 times (odds ratio: 2.45; 95% CI: 1.83–3.29) more likely to have DR than their counterparts. The CART model found that the values of variable importance for HbA1c, DM duration, lifestyle (i.e., indoor type), and urea were 48%, 37%, 10%, and 4%, respectively. CONCLUSION: The prevalence of DR is high for T2DM patients who receive DM health management services from the primary healthcare system. HbA1c is the most important risk factor for DR. Integration of DR screening and HbA1c testing into the healthcare services for T2DM to reduce vision impairment and blindness is urgently warranted.
- Published
- 2024
12. Stratifying Mortality Risk in Intensive Care: A Comprehensive Analysis Using Cluster Analysis and Classification and Regression Tree Algorithms
- Author
- Antonio Romanelli, Salvatore Palmese, Serena De Vita, Alessandro Calicchio, and Renato Gammaldi
- Subjects
- Charlson Comorbidity Index, Simplified Acute Physiology Score II, Sequential Organ Failure Assessment, Cluster analysis, Classification and regression tree, Mortality, Medical emergencies. Critical care. Intensive care. First aid, RC86-88.9, Medicine
- Abstract
Background: Machine learning (ML) can be promising for stratifying patients into homogeneous groups and assessing mortality based on score combination. Using ML, we compared mortality prediction performance for clustered and non-clustered models and tried to develop a simple decision algorithm to predict the patient’s cluster membership with classification and regression trees (CART). Methods: Retrospective study involving patients requiring ICU admission (1st January 2011–16th September 2022). Clusters were identified by combining Charlson Comorbidity Index (CCI) plus Simplified Acute Physiology Score II (SAPS II) or Sequential Organ Failure Assessment (SOFA). Intercluster and survival analyses were performed. We analyzed the relationship with mortality with multivariate logistic regressions and receiver operating characteristic curves (ROC) for models with and without clusters. Nested models were compared with Likelihood Ratio Tests (LRT). Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were compared for non-nested models. With the best model, we used CART to build a decision tree for patient’s membership. Results: Our sample consisted of 2605 patients (mortality 59.7%). For both score combinations, we identified two clusters (A and B for CCI + SAPS II, α and β for CCI + SOFA). Belonging to cluster B/β was associated with shorter survival times (Peto-Peto p-values
- Published
- 2024
13. Assessing the importance of risk factors for diabetic retinopathy in patients with type 2 diabetes mellitus: Results from the classification and regression tree models.
- Author
- Zhang, Ziyang, Lv, Deliang, You, Yueyue, Zhao, Zhiguang, Hu, Wei, Xie, Fengzhu, Lin, Yali, Xie, Wei, and Wu, Xiaobing
- Subjects
- TYPE 2 diabetes, COMMUNITY health services, VISION disorders, DIABETES, REGRESSION trees
- Abstract
BACKGROUND: Diabetic retinopathy (DR) is one of the serious complications of diabetes mellitus (DM). Many studies have identified the risk factors associated with DR, but there is not much evidence on the importance of these factors for DR. This study aimed to investigate the associated factors for patients with type 2 DM (T2DM) and calculate the importance of the identified factors. MATERIALS AND METHODS: Using probability proportionate to size sampling method in this community-based cross-sectional study, 22 community health service centers were selected from 10 administrative districts in Shenzhen, China. Approximately 60 T2DM patients were recruited from each center. The participants completed a structural questionnaire, had their venous blood collected, and underwent medical examinations and fundus photography. Logistic regression models were used to identify the risk factors of DR. The classification and regression tree (CART) model was used to calculate the importance of the identified risk factors. RESULTS: This study recruited 1097 T2DM patients, 266 of whom were identified as having DR, yielding a prevalence rate of 24.3% (95% confidence interval [CI]: 21.7%–26.9%). Results showed that a longer duration of DM, indoor-type lifestyle, and higher levels of hemoglobin A1c (HbA1c) or urea increased the risk of DR. Patients with HbA1c values ≥7% were about 2.45 times (odds ratio: 2.45; 95% CI: 1.83–3.29) more likely to have DR than their counterparts. The CART model found that the values of variable importance for HbA1c, DM duration, lifestyle (i.e., indoor type), and urea were 48%, 37%, 10%, and 4%, respectively. CONCLUSION: The prevalence of DR is high for T2DM patients who receive DM health management services from the primary healthcare system. HbA1c is the most important risk factor for DR. Integration of DR screening and HbA1c testing into the healthcare services for T2DM to reduce vision impairment and blindness is urgently warranted. 
[ABSTRACT FROM AUTHOR]
- Published
- 2024
14. Machine Learning-Driven Classification for Enhanced Rule Proposal Framework.
- Author
- Gomathi, B., Manimegalai, R., Santhanam, Srivatsan, and Biswas, Atreya
- Subjects
- MACHINE learning, TRUST, REGRESSION trees, DECISION trees, SELF-efficacy
- Abstract
In enterprise operations, maintaining manual rules for enterprise processes can be expensive, time-consuming, and dependent on specialized domain knowledge in that enterprise domain. Recently, rule-generation has been automated in enterprises, particularly through Machine Learning, to streamline routine tasks. Typically, these machine models are black boxes where the reasons for the decisions are not always transparent, and the end users need to verify the model proposals as a part of the user acceptance testing to trust it. In such scenarios, rules excel over Machine Learning models as the end-users can verify the rules and have more trust. In many scenarios, the truth label changes frequently thus, it becomes difficult for the Machine Learning model to learn till a considerable amount of data has been accumulated, but with rules, the truth can be adapted. This paper presents a novel framework for generating human-understandable rules using the Classification and Regression Tree (CART) decision tree method, which ensures both optimization and user trust in automated decision-making processes. The framework generates comprehensible rules in the form of if condition and then predicts class even in domains where noise is present. The proposed system transforms enterprise operations by automating the production of human-readable rules from structured data, resulting in increased efficiency and transparency. Removing the need for human rule construction saves time and money while guaranteeing that users can readily check and trust the automatic judgments of the system. The remarkable performance metrics of the framework, which achieve 99.85% accuracy and 96.30% precision, further support its efficiency in translating complex data into comprehensible rules, eventually empowering users and enhancing organizational decision-making processes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. Development of clinical screening tool for exocrine pancreatic insufficiency in patients with definite chronic pancreatitis.
- Author
- Othman, Mohamed O., Forsmark, Christopher, Yadav, Dhiraj, Singh, Vikesh K., Lara, Luis F., Park, Walter, Zhang, Zuoyi, Yu, Jun, and Kort, Jens J.
- Abstract
No simple, accurate diagnostic tests exist for exocrine pancreatic insufficiency (EPI), and EPI remains underdiagnosed in chronic pancreatitis (CP). We sought to develop a digital screening tool to assist clinicians to predict EPI in patients with definite CP. This was a retrospective case-control study of patients with definite CP with/without EPI. Overall, 49 candidate predictor variables were utilized to train a Classification and Regression Tree (CART) model to rank all predictors and select a parsimonious set of predictors for EPI status. Five-fold cross-validation was used to assess generalizability, and the full CART model was compared with 4 additional predictive models. EPI misclassification rate (mRate) served as primary endpoint metric. 274 patients with definite CP from 6 pancreatitis centers across the United States were included, of which 58 % had EPI based on predetermined criteria. The optimal CART decision tree included 10 variables. The mRate without/with 5-fold cross-validation of the CART was 0.153 (training error) and 0.314 (prediction error), and the area under the receiver operating characteristic curve was 0.889 and 0.682, respectively. Sensitivity and specificity without/with 5-fold cross-validation was 0.888/0.789 and 0.794/0.535, respectively. A trained second CART without pancreas imaging variables (n = 6), yielded 8 variables. Training error/prediction error was 0.190/0.351; sensitivity was 0.869/0.650, and specificity was 0.728/0.649, each without/with 5-fold cross-validation. We developed two CART models that were integrated into one digital screening tool to assess for EPI in patients with definite CP and with two to six input variables needed for predicting EPI status. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. Stratifying Mortality Risk in Intensive Care: A Comprehensive Analysis Using Cluster Analysis and Classification and Regression Tree Algorithms.
- Author
- Romanelli, Antonio, Palmese, Salvatore, De Vita, Serena, Calicchio, Alessandro, and Gammaldi, Renato
- Subjects
- MORTALITY, CRITICAL care medicine, MACHINE learning, DECISION making, REGRESSION trees
- Abstract
Background: Machine learning (ML) can be promising for stratifying patients into homogeneous groups and assessing mortality based on score combination. Using ML, we compared mortality prediction performance for clustered and non-clustered models and tried to develop a simple decision algorithm to predict the patient's cluster membership with classification and regression trees (CART). Methods: Retrospective study involving patients requiring ICU admission (1st January 2011–16th September 2022). Clusters were identified by combining Charlson Comorbidity Index (CCI) plus Simplified Acute Physiology Score II (SAPS II) or Sequential Organ Failure Assessment (SOFA). Intercluster and survival analyses were performed. We analyzed the relationship with mortality with multivariate logistic regressions and receiver operating characteristic curves (ROC) for models with and without clusters. Nested models were compared with Likelihood Ratio Tests (LRT). Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were compared for non-nested models. With the best model, we used CART to build a decision tree for patient's membership. Results: Our sample consisted of 2605 patients (mortality 59.7%). For both score combinations, we identified two clusters (A and B for CCI + SAPS II, α and β for CCI + SOFA). Belonging to cluster B/β was associated with shorter survival times (Peto-Peto p-values < 0.0001) and increased mortality (Odds-ratio 4.65 and 5.44, respectively). According to LRT and ROC analysis, clustered models performed better, and CCI + SOFA showed the lowest AIC and BIC values (AIC = 3021.21, BIC = 3132.65). Using CART (β cluster positive case) the accuracy of the decision tree was 94.8%. Conclusion: Clustered models significantly improved mortality prediction. The CCI + SOFA clustered model showed the best balance between complexity and data fit and should be preferred. 
Developing a user-friendly decision-making algorithm for cluster membership with CART showed high accuracy. Further validation studies are needed to confirm these findings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
17. Classification and regression tree (CART) for predicting cadmium (Cd) uptake by rice (Oryza sativa L.) and its application to derive soil Cd threshold based on field data
- Author
- Haoting Tian, Yan Zhang, Xiaohui Yang, Huan Zhang, Dengfeng Wang, Pengbao Wu, Aijing Yin, and Chao Gao
- Subjects
- Cadmium, Rice, Soil threshold, Classification and regression tree, Field validation, Environmental pollution, TD172-193.5, Environmental sciences, GE1-350
- Abstract
The entry of Cd into soil-rice systems is a growing concern as it can pose potential risks to public health. To derive regional soil Cd threshold, a total of 333 paired soil and rice samples was collected in Anhui Province, Eastern China. The results showed that the total soil Cd and soil Zn/Cd were the most significant variables contributing to Cd content in polished rice. The Chinese Soil Quality Standards might overestimate risk posed by Cd-contaminated soil for rice production in the mining area due to high Zn/Cd values of some mining-associated soils. Cd levels in polished rice can be predictable using stepwise multiple linear regression (MLR) model. However, the derived soil Cd threshold based on the MLR model would be unrealistically high. The classification and regression tree method (CART) performed well in simulating Cd levels in polished rice and can be used to derive soil Cd threshold instead of MLR to minimize the uncertainty.
- Published
- 2024
18. Network Screening on Low-Volume Roads Using Risk Factors
- Author
- Kazi Tahsin Huda and Ahmed Al-Kaisy
- Subjects
- low-volume roads, crash prediction, Empirical Bayes, network screening, classification and regression tree, risk factors, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
This paper proposes a new method for network screening on rural low-volume roads. These roads are important as they provide critical access to agricultural land and tourist attractions. Most low-volume roads belong to the lowest functional class (local rural roads) and thus are built to lower design standards. Conventional hot spot network screening techniques may not be appropriate for low-volume roads due to the sporadic nature of crashes occurring on these roads. Conversely, sophisticated network screening approaches require extensive roadway and traffic data that are often unavailable to local agencies due to a lack of resources, and/or a lack of technical expertise. This research attempts to address these obstacles to low-volume road network screening which aims to identify candidate sites for safety improvements. The research used an extensive low-volume road sample from the state of Oregon and Empirical Bayes expected number of crashes in developing the proposed models for network screening. The proposed models do not require exact measurements of roadway geometric features as all geometric variables were classified into categories that are easy to compile by local agencies. Further, the method could be used with and without traffic data, without compromising the effectiveness of the network screening process.
- Published
- 2024
- Full Text
- View/download PDF
19. Thermal decomposition kinetics of poly(vinyl chloride) insulation for overloaded and non-overloaded wires
- Author
-
Man, Peng Rui, Lin, Qing Wen, Xu, Jie, Wang, Huai Bin, Zhao, Yan Hong, Su, Wen Wei, Lyu, Hui Fei, and Li, Yang
- Published
- 2024
- Full Text
- View/download PDF
20. Development of Novel Hybrid Intelligent Predictive Models for Dilution Prediction in Underground Sub-level Mining
- Author
-
Chimunhu, Prosper, Faradonbeh, Roohollah Shirani, Topal, Erkan, Asad, Mohammad Waqar Ali, and Ajak, Ajak Duany
- Published
- 2024
- Full Text
- View/download PDF
21. Evolution of Crop Planting Structure in Traditional Agricultural Areas and Its Influence Factors: A Case Study in Alar Reclamation.
- Author
-
Jiang, Shuqi, Yu, Jiankui, Li, Shenglin, Liu, Junming, Yang, Guang, Wang, Guangshuai, Wang, Jinglei, and Song, Ni
- Subjects
*COTTON, *CART algorithms, *AGRICULTURE, *AGRICULTURAL prices, *LANDSAT satellites, *FARM mechanization, *REMOTE-sensing images
- Abstract
This research provides a comprehensive analysis of the spatiotemporal evolution of the regional cropping structure and its influencing factors. Using Landsat satellite images, field surveys, and yearbook data, we developed a planting structure extraction model employing the classification and regression tree algorithm to obtain data on the major crop cultivation and structural characteristics of the Alar reclamation area from 1990 to 2023. A dynamic model and transfer matrix were used to analyze temporal changes, and a centroid migration model was used to study spatial changes in the cropping structure. Nonparametric mutation tests and through-traffic coefficient analysis were utilized to quantify the main driving factors influencing the cropping structure. During the period of 1990–2023, the cotton area in the Alar reclamation region expanded by 722.08 km², while the jujube area exhibited an initial increase followed by a decrease in the same period. The primary reasons are linked to the cost of purchase, agricultural mechanization, and crop compatibility. In the Alar reclamation area, cotton, chili, and jujube are the primary cultivated crops. Cotton is mainly grown on the southern side of the Tarim River, while chili cultivation is concentrated on the northern bank of the river. Over the years, there has been a noticeable spatial complementarity in the distribution and density of rice and cotton crops in this region. In the Alar reclamation area, the main factors influencing the change in cultivated land area are cotton price, agricultural machinery gross power, and population. Consequently, implementing measures such as providing planting subsidies and other policy incentives to enhance planting income can effectively stimulate farmers' willingness to engage in planting activities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Network Screening on Low-Volume Roads Using Risk Factors.
- Author
-
Huda, Kazi Tahsin and Al-Kaisy, Ahmed
- Subjects
LOW-volume roads, TRAFFIC accidents, TOURIST attractions, REGRESSION trees, DATA analysis
- Abstract
This paper proposes a new method for network screening on rural low-volume roads. These roads are important as they provide critical access to agricultural land and tourist attractions. Most low-volume roads belong to the lowest functional class (local rural roads) and thus are built to lower design standards. Conventional hot spot network screening techniques may not be appropriate for low-volume roads due to the sporadic nature of crashes occurring on these roads. Conversely, sophisticated network screening approaches require extensive roadway and traffic data that are often unavailable to local agencies due to a lack of resources, and/or a lack of technical expertise. This research attempts to address these obstacles to low-volume road network screening which aims to identify candidate sites for safety improvements. The research used an extensive low-volume road sample from the state of Oregon and Empirical Bayes expected number of crashes in developing the proposed models for network screening. The proposed models do not require exact measurements of roadway geometric features as all geometric variables were classified into categories that are easy to compile by local agencies. Further, the method could be used with and without traffic data, without compromising the effectiveness of the network screening process. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. ESTIMATING THE POWER DRAW OF GRIZZLY FEEDERS USED IN CRUSHING-SCREENING PLANTS THROUGH SOFT COMPUTING ALGORITHMS.
- Author
-
KÖKEN, Ekin
- Subjects
COMMUTER aircraft, RANDOM forest algorithms, REGRESSION analysis, SOFT computing, P-value (Statistics)
- Abstract
In this study, the power draw (P) of several grizzly feeders used in the Turkish Mining Industry (TMI) is investigated by considering the classification and regression tree (CART), random forest (RF) and adaptive neuro-fuzzy inference system (ANFIS) algorithms. For this purpose, a comprehensive field survey is performed to collect quantitative data, including the power draw (P) of some grizzly feeders and their working conditions such as feeder width (W), feeder length (L), feeder capacity (Q), and characteristic feed size (F80). Before applying the soft computing methodologies, correlation analyses are performed between the input parameters and the output (P). According to these analyses, W and L are highly associated with P, whereas Q is moderately correlated with P. Consequently, numerous soft computing models were run to estimate the P of the grizzly feeders. The soft computing results show that neither the RF nor the CART model outperforms the other. The RF analysis results indicate that W is necessary for evaluating P for grizzly feeders. On the other hand, the ANFIS-based predictive model is found to be the best tool to estimate varying P values, and it yields promising results with a coefficient of determination (R²) of 0.97. It is believed that the findings obtained from the present study can guide relevant engineers in selecting the proper motors propelling grizzly feeders. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Hybridized intelligent multi-class classifiers for rockburst risk assessment in deep underground mines.
- Author
-
Shirani Faradonbeh, Roohollah, Vaisey, Will, Sharifzadeh, Mostafa, and Zhou, Jian
- Subjects
*MINES & mineral resources, *K-means clustering, *CART algorithms, *CLUSTER analysis (Statistics), *REGRESSION trees, *MINE safety, *HAZARD Analysis & Critical Control Point (Food safety system)
- Abstract
The rockburst hazard induced by the extreme release of the stress concentrated in rock mass in deep underground mines poses a significant threat to the safety and economy of mining projects. Therefore, properly managing this hazard is critical for ensuring rock engineering projects' sustainability. This study proposes comprehensible and practical classifiers for rockburst risk level appraisal by hybridizing K-means clustering with gene expression programming (GEP), logistic regression (LR), and classification and regression tree (CART), i.e., the K-means-GEP-LR and K-means-CART classifiers. A database containing 246 rockburst events with four risk levels of none, light, moderate, and severe was compiled from previous practices. Preliminary statistical analyses were conducted to detect the extreme outliers and determine the critical rockburst indicators. The K-means clustering analysis was performed to identify the main clusters within the database and relabel the rockburst events. The GEP algorithm was then utilized to develop binary models for predicting the occurrence of each class. Then, the likelihood of each class occurrence was determined using LR. Furthermore, the K-means clustering was combined with the CART algorithm to provide another visual tree structure model. The classifiers' performance evaluation showed 96% and 95% accuracy values in the training and testing stages, respectively, for the K-means-GEP-LR model, while accuracy values of 98.8% and 93.0% were obtained in the same stages for the K-means-CART classifier. The results showed the robustness and high classification capability of both models. MATLAB codes were also provided for the K-means-GEP-LR model, which assists other researchers/engineers in implementing the model in practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. An Early Evaluation of the Long-Term Influence of Academic Papers Based on Machine Learning Algorithms
- Author
-
Junping Qiu and Xiaolin Han
- Subjects
Academic papers, multiple linear regression, artificial neural network, classification and regression tree, prediction model, research value, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Evaluating the long-term influence of academic papers in the early stage after publication is crucial for research management and decision-making. Although the total number of citations has been considered as a valid indicator to measure the academic influence of papers, there are still some bottleneck problems in predicting future citation counts. Firstly, it remains a challenging task to select the reasonable features due to the diversity of model features. Secondly, some prediction models are overly complex and their defects may cause prediction bias or even errors. Finally, the most important thing is that very few models possess long-term predictive ability. This paper proposes a prediction model that can provide an early evaluation of the long-term citations, or the long-term influence, of academic papers. The model uses features available immediately after publication or after a certain period of time. Taking academic papers in Information Science and Library Science as our experimental subject, we train the model using three machine learning algorithms and find that the artificial neural network performs satisfactorily. Our findings may offer guidance for research management and decision-making.
- Published
- 2024
- Full Text
- View/download PDF
26. Alzheimer’s disease susceptibility in African American elders: a classification and regression tree (CART) analysis approach
- Author
-
Sung Seek Moon, Lindsey Anderson, Jinwon Lee, and Youngkwang Moon
- Subjects
alzheimer's disease, african american, antidepressant usage, bmi, classification and regression tree, risk factors, smoking, Social Sciences, Medicine (General), R5-920
- Abstract
Alzheimer's disease (AD) is increasingly prevalent, especially among African American older adults. Despite its widespread nature, accurate and timely diagnosis of AD remains challenging. Addressing the research gap in sociodemographic and cardiovascular risk factor research associated with AD in African American older adults, this study aimed to identify and analyze distinct subgroups within this population that are particularly vulnerable to AD, thereby contributing to the development of targeted interventions and healthcare strategies. This study employs a rigorous methodology utilizing classification and regression tree (CART) analysis to examine data from the 2017 Uniform Data Set (UDS). This approach enables a nuanced analysis of AD susceptibility among African American older adults. The CART analysis revealed significant associations between the studied sociodemographic and cardiovascular risk factors and AD susceptibility among African American older adults. The results indicate the presence of specific subgroups with increased vulnerability to AD, shaped by varying levels of education [relative importance (RI): 100%], antidepressant usage (RI: 83.1%), BMI (RI: 71.2%), use of antipsychotic agents (RI: 35.5%), and age of smoking cessation (RI: 21.5%). These findings underscore the importance of culturally specific research and interventions for addressing AD among African Americans. This study's findings, revealing significant associations between sociodemographic and cardiovascular risk factors and AD susceptibility among African American older adults, underscore the necessity of developing healthcare policies and interventions specifically tailored to address these risks.
- Published
- 2023
- Full Text
- View/download PDF
27. Temporal–Spatial Fluctuations of a Phytoplankton Community and Their Association with Environmental Variables Based on Classification and Regression Tree in a Shallow Temperate Mountain River
- Author
-
Wang Tian, Zhongyu Wang, Haifei Kong, Yonglan Tian, and Tousheng Huang
- Subjects
phytoplankton, species diversity, abundance, environmental variables, classification and regression tree, Biology (General), QH301-705.5
- Abstract
The effects of environmental factors on phytoplankton are not simply positive or negative but complex and dependent on the combination of their concentrations in a fluctuating environment. Traditional statistical methods may miss some of the complex interactions between the environment and phytoplankton. In this study, the temporal–spatial fluctuations of phytoplankton diversity and abundance were investigated in a shallow temperate mountain river. The machine learning method classification and regression tree (CART) was used to explore the effects of environmental variables on the phytoplankton community. The results showed that both phytoplankton species diversity and abundance varied fiercely due to environmental fluctuation. Microcystis aeruginosa, Amphiprora sp., Anabaena oscillarioides, and Gymnodinium sp. were the dominant species. The CART analysis indicated that dissolved oxygen, oxidation-reduction potential, total nitrogen (TN), total phosphorus (TP), and water temperature (WT) explained 36.00%, 13.81%, 11.35%, 9.96%, and 8.80%, respectively, of phytoplankton diversity variance. Phytoplankton abundance was mainly affected by TN, WT, and TP, with variance explanations of 39.40%, 15.70%, and 14.09%, respectively. Most environmental factors had a complex influence on phytoplankton diversity and abundance: their effects were positive under some conditions but negative under other combinations. The results and methodology in this study are important in quantitatively understanding and exploring aquatic ecosystems.
- Published
- 2024
- Full Text
- View/download PDF
28. Assessing Derawan Island's Coral Reefs over Two Decades: A Machine Learning Classification Perspective.
- Author
-
Manessa, Masita Dwi Mandini, Ummam, Muhammad Al Fadio, Efriana, Anisya Feby, Semedi, Jarot Mulyo, and Ayu, Farida
- Subjects
*MACHINE learning, *CORAL reefs & islands, *CORALS, *SUPPORT vector machines, *LANDSAT satellites, *REGRESSION trees, *MULTISPECTRAL imaging
- Abstract
This study aims to understand the dynamic changes in the coral reef habitats of Derawan Island over two decades (2003, 2011, and 2021) using advanced machine learning classification techniques. The motivation stems from the urgent need for accurate, detailed environmental monitoring to inform conservation strategies, particularly in ecologically sensitive areas like coral reefs. We employed non-parametric machine learning algorithms, including Random Forest (RF), Support Vector Machine (SVM), and Classification and Regression Tree (CART), to assess spatial and temporal changes in coral habitats. Our analysis utilized high-resolution data from Landsat 9, Landsat 7, Sentinel-2, and Multispectral Aerial Photos. The RF algorithm proved to be the most accurate, achieving an accuracy of 71.43% with Landsat 9, 73.68% with Sentinel-2, and 78.28% with Multispectral Aerial Photos. Our findings indicate that the classification accuracy is significantly influenced by the geographic resolution and the quality of the field and satellite/aerial image data. Over the two decades, there was a notable decrease in the coral reef area from 2003 to 2011, with a reduction to 16 hectares, followed by a slight increase in area but with more heterogeneous densities between 2011 and 2021. The study underscores the dynamic nature of coral reef habitats and the efficacy of machine learning in environmental monitoring. The insights gained highlight the importance of advanced analytical methods in guiding conservation efforts and understanding ecological changes over time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Two Stage Classification Model to Classify Patients into Lower Variability Resource User Groups.
- Author
-
Anitha, M., Sahithi, P., Lavanya, P., and Amrin, S. K.
- Subjects
*MEDICAL care costs, *INDEPENDENT variables, *REGRESSION trees, *RANDOM forest algorithms, *ELDER care
- Abstract
Healthcare demand is growing in Australia and across the world. In Australia, the healthcare system comprises a mix of private and public organizations, such as hospitals, clinics, and aged care facilities. The Australian healthcare system is quite affordable and accessible because a large proportion of the expenditure, around 68%, is funded by the Australian government. The healthcare expenditure in 2015-16 was AUD 170.4 billion, which was 10.0% of the GDP. Soaring healthcare costs and growing demand for services are increasing the pressure on the sustainability of the government-funded healthcare system. To be sustainable, we need to be more efficient in delivering healthcare services. We can schedule the care delivery process optimally and subsequently improve the efficiency of the system if demand for services is well known. However, there is randomness in demand for services, and it is a cause of inefficiency in the healthcare delivery process. Soaring healthcare costs and the growing demand for services require us to use healthcare resources more efficiently. Randomness in resource requirements makes the care delivery process less efficient. Our aim is to reduce the uncertainty in patients’ resource requirements, and we achieve that objective by classifying patients into similar resource user groups. The conventional random forest and k-nearest neighbour (KNN) methods resulted in poor classification and prediction performance. In this work, we develop a two-stage classification model to classify patients into lower variability resource user groups by using electronic patient records. There are various statistical tools for classifying patients into lower variability resource user groups. However, classification and regression tree (CART) analysis is a more suitable method for analyzing healthcare data because it has some distinct features. For example, it can handle the interaction between predictor variables naturally, it is nonparametric in nature, and it is relatively insensitive to the curse of dimensionality. [ABSTRACT FROM AUTHOR]
- Published
- 2023
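The abstract of entry 29 describes CART as nonparametric and able to handle predictor interactions naturally. A minimal sketch of the split search at the heart of a classification tree may make that concrete. It assumes the Gini impurity criterion (the abstract does not say which criterion was used), and the data, variable names, and helper functions below are hypothetical illustrations, not the authors' model.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one numeric feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # start from the unsplit node
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        threshold = (lo + hi) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: length of stay in days and a made-up resource-use group.
stay_days = [1, 2, 2, 3, 8, 9, 10, 12]
group = ["low", "low", "low", "low", "high", "high", "high", "high"]
threshold, impurity = best_split(stay_days, group)
print(threshold, impurity)  # 5.5 0.0
```

A full tree would apply the same search recursively to each child node and across all predictors, which is how interactions between predictors arise without being specified in advance.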
30. Combining return to sport, psychological readiness, body mass, hamstring strength symmetry, and hamstring/quadriceps ratio increases the risk of a second anterior cruciate ligament injury.
- Author
-
Almeida, Gabriel Peixoto Leão, Albano, Thamyla Rocha, Rodrigues, Carlos Augusto Silva, Tavares, Maria Larissa Azevedo, and de Paula Lima, Pedro Olavo
- Subjects
*ANTERIOR cruciate ligament injuries, *SPORTS re-entry, *STRENGTH training, *QUADRICEPS muscle, *SPORTS injuries, *RECEIVER operating characteristic curves
- Abstract
Purpose: To investigate the combinations of variables that comprise the biopsychosocial model domains to identify clinical profiles of risk and protection of second anterior cruciate ligament injury. Methods: One hundred and forty-five patients for return-to-sport testing after anterior cruciate ligament (ACL) reconstruction (ACLR) were contacted, and 97 were deemed eligible. All were evaluated between 6 and 24 months and followed up for 2 years. Participants answered the International Knee Documentation Committee (IKDC) and Anterior Cruciate Ligament–Return to Sport after Injury Scale (ACL-RSI), performed the postural stability assessment using the Biodex Balance System, and assessed muscle strength at 60° and 300°/s on the isokinetic dynamometer. Personal factors (age, gender, body mass index), body structures (graft type and concomitant injuries), and environmental factors (time between surgery and evaluation) were also collected. The participants were asked about the occurrence of a second ACL injury and return to sport after 2 years of follow-up. Classification and regression tree (CART) analysis was used to determine predictors of a second ACL injury. The receiver operating characteristic (ROC) curve was performed to verify the accuracy of the CART analysis, in addition to the sensitivity, specificity, and relative risk (RR) of the model. Results: Of the initial 97 participants, 88 (89.8%) responded to follow-up and 14 (15.9%) had a second ACL injury (11 graft ruptures and three contralateral ACL). CART analysis identified the following variables as predictors of second ACL injury: return to sport, hamstring strength symmetry at 300°/s, ACL-RSI score, hamstrings/quadriceps ratio at 60°/s, and body mass index (BMI). CART correctly identified 9 (64.3%) of the 14 participants who were reinjured and 71 (95.9%) of the 74 participants who were not. The total correct classification was 90.9%. The area under the ROC curve was 0.88 (95% CI 0.72–0.99; p < 0.001), and the model showed a sensitivity of 75% (95% CI 42.8–94.5), specificity of 93.4% (95% CI 85.3–97.8), and RR of 15.9 (95% CI 4.9–51.4; p < 0.0001). Conclusion: The combination of hamstring strength symmetry, hamstring/quadriceps ratio (body functions); return to sport (activity and participation); psychological readiness; and BMI (personal factors) could identify three clinical risk profiles for a second ACL injury with good accuracy. Level of Evidence: IV. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
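The classification percentages reported in the abstract of entry 30 can be rechecked from the stated counts. The short sketch below does only that arithmetic; the function and key names are hypothetical, and the separately reported ROC-based sensitivity (75%) and specificity (93.4%) are not derived here, since the abstract does not give the counts behind them.

```python
def classification_rates(tp, fn, tn, fp):
    """Percent correctly classified per group and overall, rounded to one decimal."""
    return {
        "reinjured_correct": round(100 * tp / (tp + fn), 1),
        "not_reinjured_correct": round(100 * tn / (tn + fp), 1),
        "overall_correct": round(100 * (tp + tn) / (tp + fn + tn + fp), 1),
    }


# Counts from the abstract: 9 of 14 reinjured and 71 of 74 non-reinjured
# participants were classified correctly by the CART model.
rates = classification_rates(tp=9, fn=5, tn=71, fp=3)
print(rates)
# {'reinjured_correct': 64.3, 'not_reinjured_correct': 95.9, 'overall_correct': 90.9}
```

The three values match the 64.3%, 95.9%, and 90.9% quoted in the abstract.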
31. Extraction of Decision Association Rules for Bridge Seismic Damage States Based on DRSA and CART.
- Author
-
马东辉, 罗立, 王威, 郭小东, and 刘朝峰
- Abstract
Copyright of Journal of Beijing University of Technology is the property of Journal of Beijing University of Technology, Editorial Department and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
32. Tree-Structured Model with Unbiased Variable Selection and Interaction Detection for Ranking Data
- Author
-
Yu-Shan Shih and Yi-Hung Kung
- Subjects
classification and regression tree, distance-based model, independence test, selection bias, Computer engineering. Computer hardware, TK7885-7895
- Abstract
In this article, we propose a tree-structured method for either complete or partial rank data that incorporates covariate information into the analysis. We use conditional independence tests based on hierarchical log-linear models for three-way contingency tables to select split variables and cut points, and apply a simple Bonferroni rule to decide whether a node is worth splitting. Through simulations, we also demonstrate that the proposed method is unbiased and effective in selecting informative split variables. Our proposed method can be applied across various fields to provide a flexible and robust framework for analyzing rank data and understanding how various factors affect individual judgments on ranking. This can help improve the quality of products or services and assist with informed decision making.
- Published
- 2023
- Full Text
- View/download PDF
33. Estimation Recurrence Free Survival of the Epithelial Ovarian Cancer Using Classification and Regression Tree.
- Author
-
Deldar, Maryam, Sayehmiri, Kourosh, Anbiaee, Robab, and Jalilian, Anahita
- Subjects
DECISION trees, OVARIAN epithelial cancer, CANCER chemotherapy, CANCER relapse, RETROSPECTIVE studies, METASTASIS, RISK assessment, DESCRIPTIVE statistics, PROGRESSION-free survival, PREDICTION models, DATA analysis software, TUMOR markers, DISEASE risk factors
- Abstract
Background: Epithelial ovarian cancer is one of the leading causes of death from gynecological cancers in the Western world. One of the important objectives of medical research is to determine predictors of an event. Regarding the interaction of risk factors, regression methods are unsuitable when the number of factors is high. Objectives: Regarding frequency predictors of recurrence-free survival in epithelial ovarian cancer, our aim in this article is to determine predictors and time to first recurrence using a classification and regression tree model. Methods: This retrospective analysis used medical and chemotherapy records of 141 patients with epithelial ovarian cancer who were referred to Imam Hossein Hospital in Tehran between 2007 and 2018. Data were analyzed using classification and regression trees in R version 3.4.3. Results: The regression tree results showed that the worst recurrence-free survival among metastatic patients was in grade II patients (15.03 ± 11 months), while among patients without metastases it was in patients with a CA125 tumor marker above 207 who received 3-week chemotherapy courses (14.53 ± 6.4 months). The classification tree also showed that the highest probability of first recurrence among metastatic patients was in patients with adjuvant chemotherapy (0.81), and among patients without metastases it was in those with stages 2, 3, and 4, a maximum platelet count above 305,000, and an age of less than 35 years (0.75). Conclusions: The classification and regression tree models, without any assumptions, can estimate the probability of recurrence in different subgroups. These models can be used in deciding due to the ease of interpretation by physicians and paramedic [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
34. Investigating major cause of crashes on Indian expressways and developing strategies for traffic safety management.
- Author
-
Khan, Abdul Basit, Agrawal, Rajat, and Jain, S.S.
- Subjects
REGRESSION trees, COMPACT discs, FATIGUE (Physiology), CELL phones, WEATHER, TRAFFIC safety
- Abstract
Distraction is one of the most prominent driver characteristics; it accounts for several road crashes and is a major cause of concern for almost all road safety institutions. The main objective of this study was to explore and identify various causes of crashes on the Yamuna Expressway. A further objective was a detailed study of driver's fault, including in-vehicle distraction, to identify the most and least distracting activities. Moreover, other driver characteristics such as fatigue and the reaction of the driver when someone suddenly comes in the way were also analyzed for their contributions towards crash occurrence. In this study, a supervised machine learning technique, viz. Classification and Regression Tree (CART), was used to predict the possibility of crash occurrence. The model results show that using a cell phone in adverse weather conditions has the highest likelihood of crash occurrence. Moreover, among other distraction tasks, adjusting the radio/compact disc (CD) while driving has the least likelihood of a crash. Further, it was also found that the reaction of the driver when someone comes suddenly in the way (e.g. an animal, another crossing vehicle, or a pedestrian) and long driving hours leading to headache/backache/fatigue also affect the possibility of crash occurrence. Model results were then validated using the testing dataset, and it was found that the model is as efficient in predicting crashes as it was with the training dataset. This study provides insight into several important driver characteristics and will be helpful in developing educational and enforcement strategies, thereby reducing crash risk. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
35. Comparison of decision tree and naïve Bayes algorithms in detecting trace residue of gasoline based on gas chromatography–mass spectrometry data.
- Author
-
Ghazi, Md Gezani Bin Md, Lee, Loong Chuen, Samsudin, Aznor S, and Sino, Hukil
- Subjects
GAS chromatography/Mass spectrometry (GC-MS), DECISION trees, CART algorithms, PRINCIPAL components analysis, MARINE debris, REGRESSION trees, GASOLINE
- Abstract
Fire debris analysis aims to detect and identify any ignitable liquid residues in burnt residues collected at a fire scene. Typically, the burnt residues are analysed using gas chromatography–mass spectrometry (GC–MS) and are manually interpreted. The interpretation process can be laborious due to the complexity and high dimensionality of the GC–MS data. Therefore, this study aims to compare the potential of classification and regression tree (CART) and naïve Bayes (NB) algorithms in analysing the pixel-level GC–MS data of fire debris. The data comprise 14 positive (i.e. fire debris with traces of gasoline) and 24 negative (i.e. fire debris without traces of gasoline) samples. The differences between the positive and negative samples were first inspected based on the mean chromatograms and scores plots of the principal component analysis technique. Then, CART and NB algorithms were independently applied to the GC–MS data. Stratified random resampling was applied to prepare three sets of 200 pairs of training and testing samples (i.e. split ratio of 7:3, 8:2, and 9:1) for estimating the prediction accuracies. Although both the positive and negative samples were hardly differentiated based on the mean chromatograms and scores plots of principal component analysis, the respective NB and CART predictive models produced satisfactory performances with the normalized GC–MS data, i.e. majority achieved prediction accuracy >70%. NB consistently outperformed CART based on the prediction accuracies of testing samples and the corresponding risk of overfitting except when evaluated using only 10% of samples. The accuracy of CART was found to be inversely proportional to the number of testing samples; meanwhile, NB demonstrated rather consistent performances across the three split ratios. In conclusion, NB seems to be much better than CART based on the robustness against the number of testing samples and the consistent lower risk of overfitting. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
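The resampling design described in the abstract of entry 35 (200 stratified random train/test pairs at each of the 7:3, 8:2, and 9:1 split ratios, drawn from 14 positive and 24 negative samples) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' procedure: the function name, the fixed seed, and the rule of rounding each class's training count to the nearest integer are all hypothetical choices.

```python
import random


def stratified_split(labels, train_fraction, rng):
    """Return (train_idx, test_idx), sampling each class separately so that
    both sets keep roughly the original class proportions."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)  # assumed rounding rule
        train_idx += idx[:n_train]
        test_idx += idx[n_train:]
    return train_idx, test_idx


# Class sizes taken from the abstract: 14 positive and 24 negative samples.
labels = ["positive"] * 14 + ["negative"] * 24
rng = random.Random(0)  # arbitrary fixed seed so the sketch is repeatable

# 200 train/test pairs for each split ratio (7:3, 8:2, 9:1).
splits = {
    fraction: [stratified_split(labels, fraction, rng) for _ in range(200)]
    for fraction in (0.7, 0.8, 0.9)
}

for fraction, pairs in splits.items():
    train_idx, test_idx = pairs[0]
    print(fraction, len(train_idx), len(test_idx))
```

Under these assumptions the 9:1 ratio leaves only three test samples per pair, which is consistent with the abstract's remark that results behave differently when only 10% of the samples are used for evaluation.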
36. A machine learning model for predicting innovation effort of firms.
- Author
-
Rani, Ruchi, Kumar, Sumit, Patil, Rutuja Rajendra, and Pippal, Sanjeev Kumar
- Subjects
MACHINE learning, INNOVATIONS in business, DATA mining, BUSINESSPEOPLE, TECHNOLOGICAL innovations, REGRESSION trees
- Abstract
Classification and regression tree (CART) data mining models have been used in several scientific fields for building efficient and accurate predictive models. Some of the application areas are prediction of disease, targeted marketing, and fraud detection. In this paper we use CART, a widely used machine learning technique, to predict the research and development (R&D) intensity or innovation effort of firms using several relevant variables such as technical opportunity, knowledge spillover, and absorptive capacity. We found that the accuracy of CART models is superior to that of the often-used linear parametric models. The results of this study are relevant to both financial analysts and practitioners. For financial analysts, it establishes the power of data-driven prototypes to understand the innovation thinking of employees, whereas policymakers and business entrepreneurs can take advantage of evidence-based tools in the decision-making process. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
37. RR Interval-based atrial fibrillation detection using traditional and ensemble machine learning algorithms
- Author
-
S K Shrikanth Rao and Roshan Joy Martis
- Subjects
atrial fibrillation ,area under the curve ,c4.5 ,classification and regression tree ,discrete wavelet transform ,electrocardiogram ,iterative dichotomiser 3 ,k-nn ,random forest ,rotation forest ,support vector machine ,Medical technology ,R855-855.5 - Abstract
Atrial fibrillation (AF) is a life-threatening disease and can cause stroke, heart failure, and sometimes death. To reduce the rate of mortality and morbidity due to increased prevalence of AF, early detection becomes a priority. Traditional machine learning (TML) algorithms and ensemble machine learning (EML) algorithms are proposed to detect AF in this article. The performances of both these methods are compared in this study. Methodology involves computation of RR interval features extracted from electrocardiogram and its classification into: normal, AF, and other rhythms. TML techniques such as Classification and Regression Tree, K Nearest Neighbor, C4.5, Iterative Dichotomiser 3, Support Vector Machine and EML classifiers such as Random Forest (RF) and Rotation Forest are used for classification. The proposed method is evaluated using PhysioNet challenge 2017. During the tenfold cross validation, it is observed that the RF classifier provided good classification accuracy of 99.10% with area under the curve of 0.998. Apart from contributing a new methodology, the proposed study also experimentally proves higher performance with the ensemble learning method, RF. The methodology has many applications in health care management systems including defibrillators, cardiac pacemakers, etc.
- Published
- 2023
- Full Text
- View/download PDF
38. Multitemporal impervious surface estimation via an optimized stable/change pixel detection approach
- Author
-
Wei Fan, Jinsong Chen, Xiaoli Li, Paolo Tarolli, and Jin Wang
- Subjects
impervious surface ,change detection ,classification and regression tree ,remote sensing ,Mathematical geography. Cartography ,GA1-1776 ,Environmental sciences ,GE1-350 - Abstract
Remote sensing techniques have proved their efficacy for impervious surface mapping, which is a significant indicator of urbanization process and environmental status. However, systematic and random errors in the existing methods still impact the reliability of subpixel impervious surface estimation, generating compounded errors when conducting multitemporal monitoring. The compounded errors of the conventional methods often significantly impact the temporal consistency of the results. In this study, a novel method based on a straightforward pixel change detection approach was put forward to improve the estimation of multitemporal impervious surface area. Two experimental areas located in Rome in Italy and Shenzhen in China were chosen to test the generality of the proposed method to estimate different types of impervious surfaces worldwide. By reducing the compounded errors, the proposed method demonstrated its efficiency in achieving higher accuracy in both study areas without involving extensive data sources and intensive manual tasks. Compared with the conventional classification and regression tree algorithm, the overall mean average error and root mean square error of this study declined by more than 15.55% and 8.63%, respectively, and R² increased from approximately 0.93 to 0.96. The proposed method also drastically reduced the standard deviation of the multitemporal percent ISA of the stable pixels. The accurate change estimation of percent ISA has been a fundamental but challenging issue associated with monitoring and understanding the urban environment. Therefore, our proposed method, with its improved ability to estimate impervious surface change both spatially and temporally, can provide accurate information required for urban environment research.
- Published
- 2022
- Full Text
- View/download PDF
39. What cultural values determine student self-efficacy? An empirical study for 42 countries and economies.
- Author
-
Rui Jin, Rongxiu Wu, Yuyan Xia, and Mingren Zhao
- Subjects
SELF-efficacy in students ,CULTURAL values ,EDUCATIONAL exchanges ,REGRESSION trees ,EMPIRICAL research ,POWER (Social sciences) - Abstract
Self-efficacy is a vital personal characteristic for student success. However, the challenge of cross-cultural comparisons remains as scalar invariance is hard to satisfy. Also, it is unclear how to contextually understand student self-efficacy in light of cultural values in different countries. This study implements a novel alignment optimization method to rank the latent means of student self-efficacy of 308,849 students in 11,574 schools across 42 countries and economies that participated in the 2018 Programme for International Student Assessment. We then used classification and regression trees to classify countries with differential latent means of student self-efficacy into groups according to Hofstede’s six cultural dimensions theory. The results of the alignment method showed that Albania, Colombia, and Peru had students with the highest mean self-efficacy, while Slovak Republic, Moscow Region (RUS), and Lebanon had the lowest. Moreover, the CART analysis indicated a low student self-efficacy for countries presenting three features: (1) extremely high power distance; (2) restraint; and (3) collectivism. These findings theoretically highlighted the significance of cultural values in shaping student self-efficacy across countries and practically provided concrete suggestions to educators on which countries to emulate such that student self-efficacy could be promoted and informed educators in secondary education institutes on the international expansion of academic exchanges. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
40. Using a decision tree approach to determine hearing aid ownership in older adults.
- Author
-
Tran, Yvonne, Tang, Diana, McMahon, Catherine, Mitchell, Paul, and Gopinath, Bamini
- Subjects
*TREATMENT of hearing disorders , *DECISION trees , *AUDITORY perception testing , *STATISTICS , *CONFIDENCE intervals , *HEARING aids , *MACHINE learning , *T-test (Statistics) , *COMPARATIVE studies , *AUDIOMETRY , *RESEARCH funding , *DESCRIPTIVE statistics , *CHI-squared test , *PREDICTION models , *DATA analysis software , *LOGISTIC regression analysis , *ODDS ratio , *SENSITIVITY & specificity (Statistics) , *ALGORITHMS , *OLD age - Abstract
The main clinical intervention for older adults with hearing loss is the provision of hearing aids. However, uptake and usage in this population have historically been reported as low. The aim of this study was to understand the hearing loss characteristics, from measured audiometric hearing loss and self-perceived hearing handicap, that contribute to the decision of hearing aid ownership. A total of 2833 adults aged 50+ years, of which 329 reported hearing aid ownership, were involved with a population-based survey with audiometric hearing assessments. Classification and regression tree (CART) analysis was used to classify hearing aid ownership from audiometric measurements and hearing disability outcomes. An overall accuracy of 92.5% was found for the performance of the CART analysis in predicting hearing aid ownership from hearing loss characteristics. By including hearing disability, sensitivity for predicting hearing aid ownership increased by up to 40% compared with just audiometric hearing loss measurements alone. A decision tree approach that considers both objectively measured hearing loss and self-perceived hearing disability, could facilitate a more tailored and personalised approach for determining hearing aid needs in the older population. Without intervention, older adults with hearing loss are at higher risk of cognitive decline and higher rates of depression, anxiety, social isolation. The provision of hearing aids can compensate hearing function, however, uptake and usage have been reported as low. Using a more precise cut-off from audiometric measures and self-perceived hearing disability scores could facilitate a tailored and personalised approach to screen and identify older adults for hearing aid needs. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
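Because only 329 of the 2833 participants in the abstract above reported hearing aid ownership, overall accuracy and sensitivity can tell quite different stories. The sketch below is an illustration rather than the authors' analysis: it computes both measures for a trivial baseline that predicts "no hearing aid" for everyone. The function name `confusion_metrics` is hypothetical; only the two counts come from the abstract.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


owners = 329                 # reported hearing aid ownership (from the abstract)
non_owners = 2833 - owners   # remaining participants

# Trivial baseline: predict "no hearing aid" for every participant.
baseline = confusion_metrics(tp=0, fn=owners, fp=0, tn=non_owners)
```

This baseline reaches roughly 88% accuracy with zero sensitivity, which is a useful reference point when reading the reported 92.5% accuracy and helps explain why the abstract reports the sensitivity gain separately.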
41. RR Interval-based Atrial Fibrillation Detection using Traditional and Ensemble Machine Learning Algorithms.
- Author
-
Shrikanth Rao, S. K. and Martis, Roshan Joy
- Subjects
MACHINE learning ,ATRIAL fibrillation ,CARDIAC pacemakers ,K-nearest neighbor classification ,SUPPORT vector machines ,REGRESSION trees ,RANDOM forest algorithms - Abstract
Atrial fibrillation (AF) is a life-threatening disease and can cause stroke, heart failure, and sometimes death. To reduce the rate of mortality and morbidity due to increased prevalence of AF, early detection becomes a priority. Traditional machine learning (TML) algorithms and ensemble machine learning (EML) algorithms are proposed to detect AF in this article. The performances of both these methods are compared in this study. Methodology involves computation of RR interval features extracted from electrocardiogram and its classification into: normal, AF, and other rhythms. TML techniques such as Classification and Regression Tree, K Nearest Neighbor, C4.5, Iterative Dichotomiser 3, Support Vector Machine and EML classifiers such as Random Forest (RF) and Rotation Forest are used for classification. The proposed method is evaluated using PhysioNet challenge 2017. During the tenfold cross validation, it is observed that the RF classifier provided good classification accuracy of 99.10% with area under the curve of 0.998. Apart from contributing a new methodology, the proposed study also experimentally proves higher performance with the ensemble learning method, RF. The methodology has many applications in health care management systems including defibrillators, cardiac pacemakers, etc. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
42. Severity modeling of work zone crashes in New Jersey using machine learning models.
- Author
-
Hasan, Ahmed Sajid, Kabir, Md Asif Bin, Jalayer, Mohammad, and Das, Subasish
- Subjects
*MACHINE learning , *SUPPORT vector machines , *RANDOM forest algorithms , *INFRASTRUCTURE (Economics) , *SPEED limits - Abstract
In the United States, the probability of work zone crashes has increased due to an increase in renovation works by transportation infrastructures. The severity of work zone crashes is associated with multiple contributing factors such as the roadway's geometric design features, temporal variables, environmental conditions, types of vehicles, and driver behaviors. For this study, we acquired and analyzed three years (2016–2018) of work zone crash data from the state of New Jersey. We investigated the performance of several machine learning methods, including Support Vector Machine, Random Forest, Catboost, Light GBM, and XGBoost to predict the type of injury severity resulting from work zone crashes. To evaluate models' performances, some statistical evaluation parameters such as accuracy, precision, and recall scores were calculated. In addition, a sensitivity analysis was conducted to assess the impact of the most influential factors in work zone-related crashes. Random Forest and Catboost outperformed the other models in terms of predicting fatal, major, and minor injuries. According to the sensitivity analysis, crash type and speed limit were the most significantly associated variables with crash severity. The findings of this study are expected to facilitate the identification of appropriate countermeasures for reducing the severity of work zone crashes. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
43. Tree-Structured Model with Unbiased Variable Selection and Interaction Detection for Ranking Data.
- Author
-
Shih, Yu-Shan and Kung, Yi-Hung
- Subjects
REGRESSION trees ,BONFERRONI correction ,SELECTION bias (Statistics) ,SIMULATION methods & models ,ROBUST control - Abstract
In this article, we propose a tree-structured method for either complete or partial rank data that incorporates covariate information into the analysis. We use conditional independence tests based on hierarchical log-linear models for three-way contingency tables to select split variables and cut points, and apply a simple Bonferroni rule to declare whether a node is worth splitting or not. Through simulations, we also demonstrate that the proposed method is unbiased and effective in selecting informative split variables. Our proposed method can be applied across various fields to provide a flexible and robust framework for analyzing rank data and understanding how various factors affect individual judgments on ranking. This can help improve the quality of products or services and assist with informed decision making. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
44. How to better balance academic achievement and learning anxiety from time on homework? A multilevel and classification and regression tree analyses.
- Author
-
Xiaopeng Wu, Rongxiu Wu, Hanley, Carol, Hongyun Liu, and Jian Liu
- Subjects
REGRESSION trees ,ACADEMIC achievement ,REGRESSION analysis ,HOMEWORK ,ANXIETY - Abstract
Using education survey data from 153,317 Grade 4 students and 150,040 Grade 8 students in China, this study examined the relationship between time on homework and academic achievement and learning anxiety with hierarchical linear modeling (HLM) and classification and regression tree (CART) approaches. With a classification of time spent on homework into four related variables, this study found that, firstly, time spent on in-school homework during weekdays had positive effects on students' achievement for both grades, and the positive effect was stronger for Grade 8 students than Grade 4 students. Moreover, a maximum of 1 h was recommended for Grade 4 students. Secondly, time spent on out-of-school homework on weekdays was negatively correlated with students' academic achievement and positively with learning anxieties. It had greater detrimental effect on Grade 8 than Grade 4. Thirdly, Grade 8 students were encouraged to have more out-of-school homework on weekend with more than 2.8 h on average recommended. It was expected to complement extant studies and provide the practical findings for teachers, practitioners and school policy makers in making any homework assignment planning or conducting interventions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
45. Hospital-based prostate cancer screening in Vietnamese men with lower urinary tract symptoms: a classification and regression tree model
- Author
-
Nguyen Chi Cuong, Nguyen Truong Vien, Nguyen Minh Thien, Phan Thanh Hai, and Tran Ngoc Dang
- Subjects
Prostate cancer ,PSA ,I-PSS ,Vietnamese patients ,Classification and regression tree ,Bayesian modeling averaging ,Diseases of the genitourinary system. Urology ,RC870-923 - Abstract
Background: Prostate cancer (PCa) is a common disease in men over 65 years of age, and should be detected early, while reducing unnecessary biopsies. This study aims to construct a classification and regression tree (CART) model (i.e., risk stratification algorithm) using a multivariable approach to select Vietnamese men with lower urinary tract symptoms (LUTS) for PCa biopsy. Methods: We conducted a case-control study on 260 men aged ≥ 50 years who visited MEDIC Medical Center, Vietnam in 2017–2018 with self-reported LUTS. The case group included patients with a positive biopsy and the control group included patients with a negative biopsy diagnosis of PCa. Bayesian Model Averaging (BMA) was used for selecting the most parsimonious prediction model. Then the CART with 5-fold cross-validation was constructed for selecting men who can benefit from PCa biopsy in a step-by-step and intuitive way. Results: BMA suggested five potential prediction models, of which the most parsimonious model included PSA, I-PSS, and age. CART advised the following cut-off points in the marked screening sequence: 18
- Published
- 2022
- Full Text
- View/download PDF
46. Interaction Effect Between Hemoglobin and Hypoxemia on COVID-19 Mortality: an observational study from Bogotá, Colombia
- Author
-
Patiño-Aldana AF, Ruíz Sternberg M, Pinzón Rondón M, Molano-Gonzalez N, and Rodriguez Lima DR
- Subjects
hypoxia ,erythrocytosis ,acute respiratory infection ,sars-cov-2 ,neutrophil-to-lymphocyte ratio ,generalized additive model ,classification and regression tree ,inflammation ,Medicine (General) ,R5-920 - Abstract
Andrés Felipe Patiño-Aldana (1), Ángela María Ruíz Sternberg (1), Ángela María Pinzón Rondón (1), Nicolás Molano-Gonzalez (1), David Rene Rodriguez Lima (1, 2). (1) Grupo de Investigación Clínica, Escuela de Medicina y Ciencias de la Salud, Universidad del Rosario, Bogotá, Colombia; (2) CIMED, Hospital Universitario Mayor - Méderi, Bogotá, Colombia. Correspondence: Andrés Felipe Patiño-Aldana, Email andresf.patino@urosario.edu.co. Purpose: We aimed to assess the effect of hemoglobin (Hb) concentration and oxygenation index on COVID-19 patients’ mortality risk. Patients and Methods: We retrospectively reviewed sociodemographic and clinical characteristics, laboratory findings, and clinical outcomes from patients admitted to a tertiary care hospital in Bogotá, Colombia, from March to July 2020. We assessed exploratory associations between oxygenation index and Hb concentration at admission and clinical outcomes. We used a generalized additive model (GAM) to evaluate the observed nonlinear relations and the classification and regression trees (CART) algorithm to assess the interaction effects. Results: We included 550 patients, of which 52% were male. The median age was 57 years old, and the most frequent comorbidity was hypertension (29%). The median value of SpO2/FiO2 was 424, and the median Hb concentration was 15 g/dL. The mortality was 15.1% (83 patients). Age, sex, and SpO2/FiO2 were independently associated with mortality. We described a nonlinear relationship between Hb concentration and neutrophil-to-lymphocyte ratio with mortality and an interaction effect between SpO2/FiO2 and Hb concentration. Patients with a similar oxygenation index had different mortality likelihoods based upon their Hb at admission. CART showed that patients with SpO2/FiO2 < 324, who were less than 81 years with an NLR > 9.9, and Hb > 15 g/dL had the highest mortality risk (91%). Additionally, patients with SpO2/FiO2 > 324 but Hb of < 12 g/dL and a history of hypertension had a higher mortality likelihood (59%). In contrast, patients with SpO2/FiO2 > 324 and Hb of > 12 g/dL had the lowest mortality risk (9%). Conclusion: We found that a decreased SpO2/FiO2 increased mortality risk. Extreme values of Hb, either low or high, showed an increase in the likelihood of mortality. However, Hb concentration modified the SpO2/FiO2 effect on mortality; the probability of death in patients with low SpO2/FiO2 increased as Hb increased. Keywords: hypoxia, erythrocytosis, acute respiratory infection, SARS-CoV-2, neutrophil-to-lymphocyte ratio, generalized additive model, classification and regression tree, inflammation, altitude, acute lung injury
- Published
- 2022
47. Depression Detection on Twitter Social Media Using Decision Tree
- Author
-
Marcello Rasel Hidayatullah and Warih Maharani
- Subjects
depression ,tweet ,depression anxiety and stress scale 42 ,classification and regression tree ,Systems engineering ,TA168 ,Information technology ,T58.5-58.64 - Abstract
Depression is a major mood illness that causes patients to experience significant symptoms that interfere with their daily activities. As technology has developed, people now frequently express themselves through social media, especially Twitter. Twitter is a social media platform that allows users to post tweets and communicate with each other. Therefore, detecting depression based on social media can help in early treatment for sufferers before further treatment. This study created a system to detect if a person is indicating depression or not based on Depression Anxiety and Stress Scale - 42 (DASS-42) and their tweets using the Classification and Regression Tree (CART) method with TF-IDF feature extraction. The results show that the most optimal model achieved an accuracy score of 81.25% and an f1 score of 85.71%, which are higher than baseline results with an accuracy score of 62.50% and an f1 score of 66.66%. In addition, we found that there were significant effects on changing the value of the maximum features in TF-IDF and changing the maximum depth of the tree to the model performance.
- Published
- 2022
- Full Text
- View/download PDF
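The abstract above notes that the maximum-features setting of TF-IDF affected model performance. A minimal sketch of what that setting controls is given below, using only the Python standard library. It is not the authors' pipeline: the function `tfidf`, the un-smoothed textbook formula (term frequency times log of inverse document frequency), the whitespace tokenizer, and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs, max_features=None):
    """Return (vocab, rows) of textbook TF-IDF weights.

    Assumes every document is non-empty. `max_features` keeps only the
    most frequent terms in the corpus; ties are broken alphabetically.
    """
    tokenized = [doc.lower().split() for doc in docs]
    corpus_counts = Counter(tok for doc in tokenized for tok in doc)
    ranked = sorted(corpus_counts, key=lambda t: (-corpus_counts[t], t))
    vocab = sorted(ranked[:max_features])

    n_docs = len(tokenized)
    doc_freq = {t: sum(t in doc for doc in tokenized) for t in vocab}

    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([
            (counts[t] / len(doc)) * math.log(n_docs / doc_freq[t])
            for t in vocab
        ])
    return vocab, rows


# Hypothetical example texts, not taken from the study's data.
docs = ["i feel tired and sad", "feel great today", "sad and tired again"]
vocab, rows = tfidf(docs, max_features=4)
```

A smaller `max_features` gives the tree fewer candidate split variables, which is one way the setting could interact with the maximum tree depth mentioned in the abstract. Library implementations typically add smoothing and normalization, so their numbers would differ from this sketch.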
48. Utilizing Different Machine Learning Techniques to Examine Speeding Violations.
- Author
-
Alomari, Ahmad H., Al-Mistarehi, Bara' W., Alnaasan, Tasneem K., and Obeidat, Motasem S.
- Subjects
SPEEDING violations ,MACHINE learning ,STANDARD deviations ,RECEIVER operating characteristic curves ,REGRESSION trees ,HURRICANE Irma, 2017 - Abstract
This study investigated the potential impacts on speeding violations in the United States, including the top ten states in terms of crashes: California, Florida, Georgia, Illinois, Michigan, North Carolina, Ohio, Pennsylvania, Tennessee, and Texas. Several variables connected to the driver, surroundings, vehicle, road, and weather were investigated. Three different machine learning algorithms—Random Forest (RF), Classification and Regression Tree (CART), and Multi-Layer Perceptron (MLP)—were applied to predict speeding violations. Accuracy, F-measure, Kappa statistic, Root Mean Squared Error (RMSE), Area Under Curve (AUC), and Receiver Operating Characteristic (ROC) were used to evaluate the algorithms' performance. Findings showed that age, accident year, road alignment, weather, accident time, and speed limits are the most significant variables. The algorithms used showed excellent ability in analyzing and predicting speeding violations. The RF was the best method for analyzing and predicting speeding violations. Understanding how these factors affect speeding violations helps decision-makers devise ways to cut down on these violations and make the roads safer. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
49. Picking Winners: Identifying Features of High-Performing Special Purpose Acquisition Companies (SPACs) with Machine Learning.
- Author
-
Williams, Caleb J.
- Subjects
SPECIAL purpose acquisition companies ,MACHINE learning ,PRIVATE equity ,PRIVATE sector ,INDIVIDUAL investors ,LOGISTIC regression analysis - Abstract
Special Purpose Acquisition Companies (SPACs) are publicly listed "blank check" firms with a sole purpose: to merge with a private company and take it public. Selecting a target to take public via SPACs is a complex affair led by SPAC sponsors who seek to deliver investor value by effectively "picking winners" from the private sector. A key question for all sponsors is what they should be searching for. This paper aims to identify the characteristics of SPACs and their target companies that are relevant to market performance at sponsor lock-up windows. To achieve this goal, the study breaks market performance into a binary classification problem and uses a machine learning approach comprised of decision trees, logistic regression, and LASSO regression to identify features that exhibit a distinct relationship with market performance. The obtained results demonstrate that corporate or private equity backing in target firms greatly improves the odds of market outperformance one-year post-merger. This finding is novel in indicating that characteristics of target firms may also be deterministic of SPAC performance, in addition to SPACs, transaction, and the market features identified in the prior literature. It further suggests that a viable sponsor strategy could be constructed for generating outsized market returns at share lock-up windows by simply "following the money" and choosing target firms with prior involvement from corporate or private equity investors. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
50. An Electricity Consumption Disaggregation Method for HVAC Terminal Units in Sub-Metered Buildings Based on CART Algorithm.
- Author
-
Yang, Xinyu, Ji, Ying, Gu, Jiefan, and Niu, Menghan
- Subjects
CART algorithms ,ENERGY consumption of buildings ,ELECTRIC power consumption ,ELECTRIC circuits ,ENERGY consumption ,HEATING & ventilation industry ,MIXING circuits ,BUILDING-integrated photovoltaic systems - Abstract
Obtaining reliable and detailed energy consumption information about building service (BS) systems is an essential prerequisite for identifying energy-saving potential and improving energy efficiency of a building. Therefore, in recent years, energy sub-metering systems have been widely implemented in public buildings in China. A majority of electrical systems and equipment can be directly metered. However, in actual sub-metering systems, the terminal units of heating, ventilation and air conditioning (HVAC) systems, such as fan coils, air handling units and so on, are often mixed with the lighting-plug circuit. This mismatch between theoretical sub-metering systems and actual electricity supply circuits constitutes a lot of challenges in BS system management and control optimization. This study proposed an indirect method to disaggregate the energy consumption of HVAC terminal units from mixed sub-metering data based on the CART algorithm. This method was demonstrated in two buildings in Shanghai. The case study results show that the weighted mean absolute percentage errors (WMAPE) are within 5% and 15% during working hours in the cooling and heating seasons, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
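The abstract above reports results as weighted mean absolute percentage error (WMAPE) without giving the formula. A common definition divides the summed absolute errors by the summed absolute actual values; the sketch below assumes that definition, which may differ from the one the authors used. The function name `wmape` and the example readings are hypothetical.

```python
def wmape(actual, predicted):
    """Weighted MAPE under the assumed definition: sum|a - p| / sum|a|."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    denominator = sum(abs(a) for a in actual)
    if denominator == 0:
        raise ValueError("WMAPE is undefined when all actual values are zero")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / denominator


# Hypothetical hourly energy readings (kWh), not data from the study.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
error = wmape(actual, predicted)  # 10 / 100
```

Unlike a plain mean of per-hour percentage errors, this form stays well defined when individual hours have near-zero consumption, which may be why it suits sub-metered building data.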