16,143 results on '"Decision Trees"'
Search Results
252. Improvements in Typhoon Intensity Change Classification by Incorporating an Ocean Coupling Potential Intensity Index into Decision Trees.
- Author
- Gao, Si, Zhang, Wei, Liu, Jia, Lin, I.-I., Chiu, Long S., and Cao, Kai
- Subjects
- *TYPHOONS , *DECISION trees , *OCEAN temperature , *PREDICTION models , *STORMS
- Abstract
Tropical cyclone (TC) intensity prediction, especially in the 24-48 h warning time frame and for rapid intensification (RI), remains a major operational challenge. Sea surface temperature (SST)-based empirical or theoretical maximum potential intensity (MPI) is the most important predictor in statistical intensity prediction schemes and in rules derived by data mining techniques. Because the SSTs beneath TCs usually cannot be observed well by satellites owing to rain contamination, and cannot be produced in time for operational statistical prediction, an ocean coupling potential intensity index (OC_PI), calculated from pre-TC averaged ocean temperatures from the surface down to 100 m, is shown to be important in building a decision tree that classifies the 24-h TC intensity change ΔV24 as RI (ΔV24 ≥ 25 kt, where 1 kt = 0.51 m s−1) or non-RI (ΔV24 < 25 kt). Cross-validation using 2000-10 data and independent verification using 2011 data are performed. The decision tree with the OC_PI achieves a cross-validation accuracy of 83.5% and an independent verification accuracy of 89.6%, outperforming the decision tree without the OC_PI, whose corresponding accuracies are 83.2% and 83.9%. Specifically for RI classification in independent verification, the former tree shows a much higher probability of detection and a lower false alarm ratio than the latter. This study is significant for operational TC RI prediction because pre-TC OC_PI can skillfully reduce the overestimation of storm potential intensity by traditional SST-based MPI, especially for non-RI TCs. [ABSTRACT FROM AUTHOR]
- Published
- 2016
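Entry 252 frames RI prediction as a binary label on ΔV24 and scores the RI class by probability of detection (POD) and false alarm ratio (FAR). The minimal Python sketch below shows only that labelling and scoring step, assuming the standard contingency-table definitions POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms). The function names and sample values are hypothetical, and no decision tree is trained here.

```python
from __future__ import annotations

from typing import Sequence

# RI threshold as defined in the abstract: ΔV24 >= 25 kt.
RI_THRESHOLD_KT = 25.0


def label_ri(delta_v24_kt: Sequence[float]) -> list[bool]:
    """Label each 24-h intensity change as RI (True) or non-RI (False)."""
    return [dv >= RI_THRESHOLD_KT for dv in delta_v24_kt]


def pod_far(observed: Sequence[bool], forecast: Sequence[bool]) -> tuple[float, float]:
    """Probability of detection and false alarm ratio for the RI class.

    POD = hits / (hits + misses); FAR = false alarms / (hits + false alarms).
    Returns NaN for a ratio whose denominator is zero.
    """
    pairs = list(zip(observed, forecast))
    hits = sum(1 for o, f in pairs if o and f)
    misses = sum(1 for o, f in pairs if o and not f)
    false_alarms = sum(1 for o, f in pairs if f and not o)
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far


# Hypothetical observed ΔV24 values (kt) and hypothetical tree outputs.
observed = label_ri([30.0, 10.0, 26.0, 5.0, 40.0])
forecast = [True, True, False, False, True]
pod, far = pod_far(observed, forecast)
print(f"POD={pod:.3f}, FAR={far:.3f}")  # POD=0.667, FAR=0.333
```

Because FAR divides by the number of RI forecasts rather than by the number of non-RI cases, a tree that rarely predicts RI can show a low FAR while still missing many events, which is why POD and FAR are reported together.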
253. Response frequencies in the conjoint recognition memory task as predictors of developmental dyslexia diagnosis: A decision-trees approach.
- Author
- Obidziński, Michał
- Subjects
- *DYSLEXIA , *MEMORY testing , *DECISION trees , *MEMORY , *ALGORITHMS , *HIGH school students , *SECONDARY education
- Abstract
The presented study applies data mining methods and prediction models to memory functioning in developmental dyslexia. This article sets forth the results of an analysis using a decision tree algorithm to classify dyslexia/non-dyslexia, based on frequency data from a modified simplified conjoint recognition experiment, a paradigm based on fuzzy-trace theory that is used to investigate verbatim and gist memory. The decision tree model was created with the C&RT algorithm, which predicts the classification using four predictors: the numbers of different types of answers depending on the specific stimuli presented. Seventy-one high school students, 33 with developmental dyslexia, took part in a memory experiment. The model created using the decision tree algorithm has very good overall validity. Excellent classification of developmental dyslexia was accompanied by satisfactory classification of non-dyslexia. The predictors proposed by the decision tree are supported both theoretically and empirically. The results show an important role of verbatim and gist memory functioning in developmental dyslexia and suggest that the pattern of performance observed in the memory tests can be used as a predictor of developmental dyslexia. The results encourage further use of decision trees. [ABSTRACT FROM AUTHOR]
- Published
- 2021
254. Comparative Study Between Decision Trees and Neural Networks to Predict Fatal Road Accidents in Lebanon
- Author
- Pierre Chauvet, Bassam Daya, Zeinab Farhat, Ali Karouni, Université Libanaise, Laboratoire Angevin de Recherche en Ingénierie des Systèmes (LARIS), and Université d'Angers (UA)
- Subjects
Artificial neural network ,Neural Networks ,business.industry ,Computer science ,Fatal Road Accident Prediction ,Decision tree ,Machine learning ,computer.software_genre ,Decision trees ,Cross-validation ,[SPI]Engineering Sciences [physics] ,Area under curve ,Artificial intelligence ,business ,Road traffic ,computer ,Data mining
- Abstract
Nowadays, road traffic accidents are one of the leading causes of death in the world. They are a complex phenomenon with a significant negative impact on human lives and property. Data mining classification techniques have been found to be efficient for dealing with such phenomena. After data were collected from the Lebanese Internal Security Forces, they were split into training and testing sets using 10-fold cross-validation. This paper applies two decision tree algorithms, C4.5 and CART, and various artificial neural networks (MLP) to predict the fatality of road accidents in Lebanon. A comparative study is then made to find the best-performing algorithm. The results show that an MLP with 2 hidden layers of 42 neurons each is the best algorithm, with a prediction accuracy of 94.6% and an area under the curve (AUC) of 95.71%.
- Published
- 2019
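Entry 254 compares the C4.5 and CART decision tree algorithms. A main textbook difference between them is the split criterion: CART ranks classification splits by the decrease in Gini impurity, whereas C4.5 uses the gain ratio (information gain normalized by split information). The sketch below computes both for one candidate split; the labels are hypothetical, and it does not reproduce the paper's models or data.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Hashable, Sequence

Labels = Sequence[Hashable]


def gini(labels: Labels) -> float:
    """Gini impurity of a label set (CART's classification criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels: Labels) -> float:
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_decrease(parent: Labels, children: Sequence[Labels]) -> float:
    """Weighted impurity decrease of a split, as used to rank splits in CART."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)


def gain_ratio(parent: Labels, children: Sequence[Labels]) -> float:
    """C4.5's criterion: information gain divided by split information."""
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    info_gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical split of 4 fatal and 4 non-fatal accidents into 3:1 and 1:3 children.
parent = ["fatal"] * 4 + ["non-fatal"] * 4
split = [["fatal"] * 3 + ["non-fatal"], ["fatal"] + ["non-fatal"] * 3]
print(gini_decrease(parent, split), round(gain_ratio(parent, split), 4))  # 0.125 0.1887
```

Normalizing by split information is what keeps C4.5 from favoring attributes with many distinct values, which raw information gain tends to do.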
255. Analysis of the sense of integration of youth of Maghrebi origin in Catalonia through the decision trees technique
- Author
- Vilà Baños, Ruth, Palou Julián, Berta, and Rubio Hurtado, María José
- Subjects
flow chart ,Youth ,Decision trees ,Education ,Spain ,Integration ,lcsh:Education (General) ,Social sciences ,intercultural education ,Citizenship ,Catalonia ,immigrant ,Data mining ,civic education ,Immigrant adolescents ,Race relations ,lcsh:L ,lcsh:L7-991 ,Teenage immigrants ,dialogue ,lcsh:Education
- Abstract
This article presents the application of decision trees, a data mining analysis technique, in a study of the factors influencing young people's sense of integration, where integration is understood from an intercultural approach of dialogue and exchange, as a dynamic and multidimensional process. The study seeks to identify the key elements of intercultural coexistence that make young people of Maghrebi origin feel integrated into Catalan society. For the analysis, the Social Cohesion among Young People questionnaire was administered to a sample of 3,498 young people from 47 secondary schools in 37 municipalities of Catalonia. The results indicate that youth of Maghrebi origin report a higher rate of feeling non-integrated than other young people. The key elements of intercultural coexistence for feeling integrated are the cultural origin of their friends at school and their length of residence in Catalonia.
- Published
- 2019
256. Fuzzy decision trees embedded with evolutionary fuzzy clustering for locating users using wireless signal strength in an indoor environment
- Author
- Boominathan Perumal, Xiaochun Cheng, Cyril Joe Baby, Rajen B. Bhatt, Swathi Jamjala Narayanan, Achyut Shankar, and Muhammad Rukunuddin Ghalib
- Subjects
0209 industrial biotechnology ,Fuzzy clustering ,Computer science ,Ant colony optimization algorithms ,Particle swarm optimization ,02 engineering and technology ,computer.software_genre ,Hybrid algorithm ,Theoretical Computer Science ,Human-Computer Interaction ,Evolutionary clustering ,020901 industrial engineering & automation ,Wireless signal strength ,Fuzzy decision tree ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Smart environment ,Data mining ,computer ,Software
- Abstract
Location estimation is one of the critical requirements for developing smart environment products. Because WiFi infrastructure is widely used and accessible in indoor environments, researchers have studied this technology extensively to locate users accurately and provide various services instantly. In this research, a hybrid algorithm, a fuzzy decision tree (FDT) with evolutionary fuzzy clustering methods, is adopted for optimal user localization in a closed environment. The wireless signal strengths received from smartphones are used as predictors, and the user's location as the classification label. The data for this research were collected from the physical facility at an office location in the USA. The classification results are promising and show that the evolutionary clustering approaches provide good fuzzy clusters for FDT induction, yielding better accuracy.
- Published
- 2021
257. Predicting Bus Passenger Flow and Prioritizing Influential Factors Using Multi-Source Data: Scaled Stacking Gradient Boosting Decision Trees
- Author
- Wenzhou Jin, Yisong Xia, and Weitiao Wu
- Subjects
050210 logistics & transportation ,Boosting (machine learning) ,Computer science ,business.industry ,Mechanical Engineering ,05 social sciences ,Big data ,Stability (learning theory) ,Decision tree ,computer.software_genre ,Computer Science Applications ,Data modeling ,Multicollinearity ,0502 economics and business ,Automotive Engineering ,Scalability ,Data mining ,Gradient boosting ,business ,computer
- Abstract
Accurate bus passenger flow prediction contributes to informed decisions and full utilization of transit supply. Passenger flow is affected by a wide range of attributes characterizing the travel environment, which can be collected from multiple information sources. A successful prediction model should not only fully exploit the latent knowledge hidden in multi-source data but also address the resulting multicollinearity issue. Based on this principle, we propose a novel scaled stacking gradient boosting decision tree (SS-GBDT) model to predict bus passenger flow with multi-source datasets. SS-GBDT includes two modules: a prior feature-generation module and a subsequent GBDT-prediction module. The prior module comprises a couple of base models with similar performance, which generate several enhanced features from the multi-source data through stacking. In particular, we devise a scaled stacking method by introducing a quasi-attention mechanism (precision-based scaling and time-based scaling). Taking the newly generated features as input, the prediction module forecasts passenger flow using a GBDT model on the stacked data, thereby enhancing prediction performance. The proposed model is tested on two real-life bus lines in Guangzhou, China. Results show that SS-GBDT not only performs better in both prediction accuracy and stability but also better handles the multicollinearity issue with multi-source data. It can also prioritize the factors influencing passenger flow prediction. The prediction model is flexible and scalable, enabling the integration of various influential factors in the presence of big data.
- Published
- 2021
258. Offline and Online Learning of Signal Temporal Logic Formulae Using Decision Trees
- Author
- Calin Belta and Giuseppe Bombara
- Subjects
0209 industrial biotechnology ,Control and Optimization ,Computer Networks and Communications ,Computer science ,Supervised learning ,Decision tree ,020207 software engineering ,02 engineering and technology ,computer.software_genre ,Formal methods ,Fault detection and isolation ,Human-Computer Interaction ,020901 industrial engineering & automation ,Artificial Intelligence ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Anomaly detection ,Data mining ,Focus (optics) ,computer ,Finite set ,TRACE (psycholinguistics)
- Abstract
In this article, we focus on inferring high-level descriptions of a system from its execution traces. Specifically, we consider a classification problem where system behaviors are described using formulae of Signal Temporal Logic (STL). Given a finite set of pairs of system traces and labels, where each label indicates whether the corresponding trace exhibits some system property, we devised a decision-tree-based framework that outputs an STL formula that can distinguish the traces. We also extend this approach to the online learning scenario. In this setting, it is assumed that new signals may arrive over time and the previously inferred formula should be updated to accommodate the new data. The proposed approach presents some advantages over traditional machine learning classifiers. In particular, the produced formulae are interpretable and can be used in other phases of the system’s operation, such as monitoring and control. We present two case studies to illustrate the effectiveness of the proposed algorithms: (1) a fault detection problem in an automotive system and (2) an anomaly detection problem in a maritime environment.
- Published
- 2021
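Entry 258 learns STL formulae with a decision tree. The sketch below assumes, for illustration, that each tree node tests a simple primitive such as G_[a,b](x > c) ("always between steps a and b") or F_[a,b](x > c) ("eventually between steps a and b"), evaluated with the standard quantitative (robustness) semantics of STL, in which a positive value means the trace satisfies the formula. The primitives, thresholds, and two-node tree are hypothetical and fixed by hand; learning them from labeled traces, as the paper does, is not shown.

```python
from __future__ import annotations

from typing import Sequence

# Hypothetical primitive parameters (start step, end step, threshold).
PHI1 = (0, 3, 2.0)  # G_[0,3] (x > 2.0)
PHI2 = (2, 5, 5.0)  # F_[2,5] (x > 5.0)


def rob_always_gt(trace: Sequence[float], a: int, b: int, c: float) -> float:
    """Robustness of G_[a,b](x > c): the worst margin over time steps a..b."""
    return min(x - c for x in trace[a:b + 1])


def rob_eventually_gt(trace: Sequence[float], a: int, b: int, c: float) -> float:
    """Robustness of F_[a,b](x > c): the best margin over time steps a..b."""
    return max(x - c for x in trace[a:b + 1])


def tree_formula_robustness(trace: Sequence[float]) -> float:
    """Robustness of phi1 OR (NOT phi1 AND phi2).

    This is the formula read off a hypothetical two-node tree whose root
    tests phi1 (true branch -> positive leaf) and whose false branch tests
    phi2 (true -> positive, false -> negative). AND is min, OR is max,
    and NOT negates the robustness value.
    """
    r1 = rob_always_gt(trace, *PHI1)
    r2 = rob_eventually_gt(trace, *PHI2)
    return max(r1, min(-r1, r2))


def classify(trace: Sequence[float]) -> bool:
    """Positive label when the formula is satisfied (robustness > 0)."""
    return tree_formula_robustness(trace) > 0


for trace in ([3, 3, 3, 3, 1, 1], [3, 1, 6, 2, 2, 2], [0] * 6):
    print(trace, tree_formula_robustness(trace), classify(trace))
```

Reading the formula off the tree as the disjunction of its positive root-to-leaf paths keeps the result interpretable: phi1 OR (NOT phi1 AND phi2) can be stated directly as a monitoring requirement.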
259. A Comparative Analysis of Decision Trees Vis-à-vis Other Computational Data Mining Techniques in Automotive Insurance Fraud Detection
- Author
- Sukanto Bhattacharya, Kuldeep Kumar, J. Holton Wilson, and Adrian Gepp
- Subjects
0301 basic medicine ,Insurance fraud ,Computer science ,business.industry ,Decision tree ,Automotive industry ,Context (language use) ,computer.software_genre ,Insurance claims ,03 medical and health sciences ,030104 developmental biology ,0302 clinical medicine ,030220 oncology & carcinogenesis ,Business failure prediction ,Data mining ,business ,Insurance industry ,Financial fraud ,computer
- Abstract
The development and application of computational data mining techniques in financial fraud detection and business failure prediction has recently become a popular cross-disciplinary research area involving financial economists, forensic accountants, and computational modellers. Some of the computational techniques popularly used for financial fraud detection and business failure prediction can also be applied effectively to detect fraudulent insurance claims and can therefore be of immense practical value to the insurance industry. We provide a comparative analysis of the prediction performance of a battery of data mining techniques using real-life automotive insurance fraud data. While the data used in our paper are US-based, the computational techniques we tested can be adapted and applied generally to detect similar insurance fraud in other countries where an organized automotive insurance industry exists.
- Published
- 2021
260. Knowledge extraction for solving resource-constrained project scheduling problem through decision tree.
- Author
- Xie, Lin-Lin, Chen, Yajiao, Wu, Sisi, Chang, Rui-Dong, and Han, Yilong
- Subjects
DECISION trees ,OPTIMIZATION algorithms ,STATISTICAL decision making ,TIME complexity ,SCHEDULING ,GENETIC algorithms
- Abstract
Purpose: Project scheduling plays an essential role in project implementation because resources are limited in practical projects. However, existing research tends to focus on finding suitable algorithms to solve various scheduling problems and fails to find the potential scheduling rules in these optimal or near-optimal solutions, that is, the possible intrinsic relationships between attributes related to the scheduling of activity sequences. Data mining (DM) is used to analyze and interpret data to obtain valuable information stored in large-scale data. The goal of this paper is to use DM to discover scheduling concepts and obtain a set of rules that approximate effective solutions to resource-constrained project scheduling problems. These rules require no search or simulation, have extremely low time complexity, and support real-time decision-making to improve planning and scheduling. Design/methodology/approach: The resource-constrained project scheduling problem can be described as scheduling a group of interrelated activities to optimize the project completion time and other objectives while satisfying the activity priority relationships and resource constraints. This paper proposes a new approach to the resource-constrained project scheduling problem that combines DM technology with the genetic algorithm (GA). More specifically, the GA is used to generate various optimal project scheduling schemes, after which a C4.5 decision tree (DT) is used to extract valuable knowledge from these schemes for predicting and solving new scheduling problems. Findings: In this study, the authors use GA and DM technology to analyze and extract knowledge from a large number of scheduling schemes and determine the scheduling rule set that minimizes the completion time. To verify the effectiveness of the proposed DT classification model, the J30, J60, and J120 datasets in PSPLIB are used to test the validity of the scheduling rules.
The results show that DT can readily reproduce the excellent performance of GA for scheduling problems of different scales. In addition, the DT prediction model developed in this study is applied to a high-rise residential project consisting of 117 activities. The results show that, compared with the completion time obtained by GA, the DT model can rapidly adjust the project schedule to cope with dynamic environmental interference. In short, the data-based approach is feasible, practical, and effective. It not only captures the knowledge contained in known optimal scheduling schemes but also provides a flexible scheduling decision-making approach for project implementation. Originality/value: This paper proposes a novel knowledge-based project scheduling approach. In previous studies, intelligent optimization algorithms are often used to solve the project scheduling problem. However, although these algorithms can generate a set of effective solutions for problem instances, they cannot explain the decision-making process or identify the characteristics of the good scheduling decisions generated by the optimization process. Moreover, their computation is slow and complex, which makes them unsuitable for planning and scheduling complex projects. In this study, the set of effective solutions to problem instances is used as the training dataset for the DM algorithm, and the extracted scheduling rules can be used to predict and solve new scheduling problems. The proposed method focuses on identifying the key parameters of a specific dynamic scheduling environment; it not only reproduces the scheduling performance of the original algorithm well but can also make decisions quickly under dynamic interference in construction scenarios.
It is helpful for project managers to implement quick decisions in response to construction emergencies, which is of great practical significance for improving the flexibility and efficiency of construction projects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
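Entry 260 trains a C4.5 tree on GA-generated schedules and then uses the tree as a set of scheduling rules. The sketch below illustrates only that last step, under stated assumptions: a fitted binary tree with numeric threshold tests is converted into IF-THEN rules, one per root-to-leaf path. The tree, the attribute names (`num_successors`, `resource_demand`), and the priority labels are hypothetical; C4.5 induction itself is not implemented.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    attribute: str
    threshold: float
    left: Tree   # branch taken when value <= threshold
    right: Tree  # branch taken when value > threshold


Tree = Union[Leaf, Node]


def extract_rules(tree: Tree, conditions: tuple[str, ...] = ()) -> list[str]:
    """Turn every root-to-leaf path into one IF ... THEN ... rule."""
    if isinstance(tree, Leaf):
        body = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {body} THEN {tree.label}"]
    le = f"{tree.attribute} <= {tree.threshold}"
    gt = f"{tree.attribute} > {tree.threshold}"
    return (extract_rules(tree.left, conditions + (le,))
            + extract_rules(tree.right, conditions + (gt,)))


def classify(tree: Tree, sample: dict[str, float]) -> str:
    """Follow the tree for one sample; equivalent to firing the matching rule."""
    while isinstance(tree, Node):
        tree = tree.left if sample[tree.attribute] <= tree.threshold else tree.right
    return tree.label


# Hypothetical fitted tree over hypothetical activity attributes.
tree = Node("num_successors", 2,
            Leaf("low_priority"),
            Node("resource_demand", 5, Leaf("high_priority"), Leaf("low_priority")))
for rule in extract_rules(tree):
    print(rule)
print(classify(tree, {"num_successors": 4, "resource_demand": 3}))
```

Because each rule is a plain conjunction of threshold tests, applying it needs no search or simulation, which is consistent with the low time complexity the abstract attributes to the extracted rules.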
261. Identification and prediction of association patterns between nutrient intake and anemia using machine learning techniques: results from a cross-sectional study with university female students from Palestine.
- Author
- Qasrawi, Radwan, Badrasawi, Manal, Al-Halawa, Diala Abu, Polo, Stephanny Vicuna, Khader, Rami Abu, Al-Taweel, Haneen, Alwafa, Reem Abu, Zahdeh, Rana, Hahn, Andreas, and Schuchardt, Jan Philipp
- Subjects
IRON deficiency anemia ,RISK assessment ,CROSS-sectional method ,PROTEINS ,DATA mining ,FOOD consumption ,MALNUTRITION ,DIETARY patterns ,CLUSTER analysis (Statistics) ,T-test (Statistics) ,HEALTH status indicators ,RESEARCH funding ,NUTRITIONAL requirements ,DESCRIPTIVE statistics ,MICRONUTRIENTS ,ANALYSIS of variance ,VITAMINS ,COLLEGE students ,MACHINE learning ,WOMEN'S health ,DECISION trees ,MINERALS ,ALGORITHMS ,DISEASE risk factors ,DISEASE complications
- Abstract
Purpose: This study used data mining and machine learning (ML) techniques to identify new patterns and classifications of the associations between nutrient intake and anemia among university students. Methods: We employed the K-means clustering algorithm and the Decision Tree (DT) technique to identify the association between anemia and vitamin and mineral intakes. We normalized and balanced the data based on anemia-weighted clusters to improve the ML models' accuracy. In addition, t-tests and analysis of variance (ANOVA) were performed to identify significant differences between the clusters. We evaluated the models on a balanced dataset of 755 female participants from the Hebron district in Palestine. Results: Our study found that 34.8% of the participants were anemic. The intake of various micronutrients (i.e., folate, Vit A, B5, B6, B12, C, E, Ca, Fe, and Mg) was below RDA/AI values, indicating overall unbalanced malnutrition in the present cohort. Anemia was significantly associated with intakes of energy, protein, fat, Vit B1, B5, B6, C, Mg, Cu, and Zn. In addition, intakes of protein, Vit B2, B5, B6, C, E, choline, folate, phosphorus, Mn, and Zn were significantly lower in anemic than in non-anemic subjects. DT classification models for vitamins and minerals (accuracy: 82.1%) identified an inverse association between intakes of Vit B2, B3, B5, B6, B12, E, folate, Zn, Mg, Fe, and Mn and the prevalence of anemia. Conclusions: Besides the nutrients commonly known to be linked to anemia, such as folate, Vit B6, C, B12, or Fe, the cluster analyses in the present cohort of young female university students also identified choline, Vit E, B2, Zn, Mg, Mn, and phosphorus as additional nutrients that might relate to the development of anemia. Further research is needed to determine whether the intake of these nutrients influences the risk of anemia. [ABSTRACT FROM AUTHOR]
- Published
- 2024
262. A comprehensive review for chronic disease prediction using machine learning algorithms.
- Author
- Islam, Rakibul, Sultana, Azrin, and Islam, Mohammad Rashedul
- Subjects
CHRONIC diseases ,TECHNOLOGICAL innovations ,SUPPORT vector machines ,DECISION trees ,RANDOM forest algorithms ,DEEP learning ,MACHINE learning
- Abstract
The past few years have seen growing interest in examining the significance of machine learning (ML) in the medical field. Diseases, health emergencies, and medical disorders may now be identified with greater accuracy thanks to technological advances, including advances in ML. It is especially essential to diagnose individuals with chronic diseases (CD) as early as possible. Our study focuses on analyzing ML's applicability to predicting CD, including cardiovascular disease, diabetes, cancer, and liver and neurological disorders. This study offers a high-level summary of previous research on ML-based approaches for predicting CD and some instances of their applications. Finally, we compare the results obtained by various studies and the methodologies and tools employed by the researchers. The factors or parameters responsible for improving the accuracy of the prediction models in previous works are also identified. To identify significant features, most authors employed a variety of strategies, among which least absolute shrinkage and selection operator (LASSO), minimal-redundancy-maximum-relevance (mRMR), and RELIEF are used extensively. A wide range of ML approaches, including support vector machine (SVM), random forest (RF), decision tree (DT), and naïve Bayes (NB), have been widely used. Several deep learning techniques and hybrid models have also been employed to create CD prediction models, resulting in efficient and reliable clinical decision-making models. For the benefit of the whole healthcare system, we also offer suggestions for enhancing CD prediction results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
263. The most important variables associated with death due to COVID‐19 disease, based on three data mining models Decision Tree, AdaBoost, and Support Vector Machine: A cross‐sectional study.
- Author
- Gharehhasani, Bita Shokri, Rezaei, Mansour, Naghipour, Armin, Sayad, Nazanine, Mostafaei, Shayan, and Alimohammadi, Ehsan
- Subjects
SUPPORT vector machines ,COVID-19 ,DECISION trees ,DATA mining ,BLOOD diseases ,TASTE disorders ,COUGH
- Abstract
Introduction: Death due to COVID‐19 is one of the biggest health challenges in the world, and many models can predict it. This study aimed to fit and compare Decision Tree (DT), Support Vector Machine (SVM), and AdaBoost models to predict death due to COVID‐19. Methods: Variables were described using mean (SD) and frequency (%). The relationship between the variables and death caused by COVID‐19 was assessed with the chi‐square test at a significance level of 0.05. The DT, SVM, and AdaBoost models for predicting death due to COVID‐19 were compared in terms of sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve in R, using the psych, caTools, ROSE (random over‐sampling examples), rpart, and rpart.plot packages. Results: Of the 23,054 patients studied, 10,935 (46.5%) were women and 12,569 (53.5%) were men. The mean age of the patients was 54.9 ± 21.0 years. Gender, fever, cough, muscle pain, smell and taste disorders, abdominal pain, nausea and vomiting, diarrhea, anorexia, dizziness, chest pain, intubation, cancer, diabetes, chronic blood disease, impaired immunity, pregnancy, dialysis, and chronic lung disease showed a statistically significant relationship with death in COVID‐19 patients (p < 0.05). Sensitivity, specificity, accuracy, and area under the ROC curve were, respectively, 0.60, 0.68, 0.71, and 0.75 for the DT model; 0.54, 0.62, 0.63, and 0.71 for the SVM model; and 0.59, 0.65, 0.69, and 0.74 for the AdaBoost model. Conclusion: DT had higher predictive power than the other data mining models, so researchers in different fields are encouraged to use DT to predict the studied variables. Future studies could also use other approaches, such as random forest or XGBoost, to improve accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
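Entry 263 reports sensitivity, specificity, accuracy, and the area under the ROC curve for each model. The sketch below shows how these four numbers are commonly computed for a binary outcome, using the rank-based (Mann-Whitney) form of the AUC rather than tracing the curve. It is written in Python for illustration, whereas the study used R packages, and the labels, predictions, and scores are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Sensitivity, specificity, and accuracy for binary labels (1 = died).

    Assumes both classes are present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as 1/2 (the Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions, and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
scores = [0.9, 0.4, 0.8, 0.3, 0.6, 0.4, 0.1, 0.7]
print(confusion_metrics(y_true, y_pred))  # {'sensitivity': 0.75, 'specificity': 0.5, 'accuracy': 0.625}
print(roc_auc(y_true, scores))  # 0.90625
```

Sensitivity and specificity depend on the hard predictions, and hence on the chosen probability cut-off, while the AUC summarizes the ranking over all cut-offs, so the two kinds of metric can rank models differently.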
264. Analysis of CO selectivity during electroreduction of CO2 in deep eutectic solvents by machine learning.
- Author
- Günay, M. Erdem and Tapan, N. Alper
- Subjects
MACHINE learning ,MELTING points ,EUTECTICS ,SOLVENTS ,PRINCIPAL components analysis ,DECISION trees ,ELECTROLYTIC reduction
- Abstract
In this work, supervised and unsupervised machine learning approaches were applied to determine routes to high CO selectivity during the electroreduction of CO2 in deep eutectic solvents (DES), using the molecular, chemical, and physical characteristics of hydrogen bond donors and acceptors, as well as the properties of different electrodes and DES solvents. In addition, effective data visualization and machine learning techniques were employed to identify relationships between descriptor variables and CO faradaic efficiency. First, SHAP (Shapley Additive exPlanations) analysis was applied to determine the positive and negative effects of the descriptor variables on the target, and it was found that urea in the HBD (hydrogen bond donor) has the greatest impact on the target. Then, principal component analysis (PCA) was used to identify the combinations that lead to low, medium, and high levels of the target. PCA indicated that high-level clusters may be linked with HBA (hydrogen bond acceptor) molecular properties rather than HBD properties, in addition to choline chloride-type HBA, HBA/HBD ratio, HBD density, HBD melting point, and urea-type HBD. Finally, decision tree classification was used to discover the variables leading to very high levels of the target. The decision tree revealed one pathway with very high CO faradaic efficiency and two pathways with high CO faradaic efficiency. As a result of the machine learning study and exploratory data analysis performed here, future researchers will be able to design new experiments with less effort and time when analyzing the effect of new DES components for high-performance CO2 electrolyzers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
265. Intelligent financial management system based on Data Mining Technology.
- Author
- Du, Chunyu
- Subjects
DATABASE management ,BUDGET ,FINANCIAL management ,DATA mining ,FINANCIAL institutions ,DECISION trees
- Abstract
This article uses a decision tree algorithm and data mining analysis technology to make financial management and decision-making in enterprises intelligent, providing effective support for financial management. An improved metric-based C4.5 decision tree algorithm is proposed that combines data warehousing, data mining, and analysis techniques and adopts a data-driven approach to establish intelligent financial management in enterprises. The algorithm is applied to early warning of budget execution progress in financial projects, followed by experimental analysis of the results. In this experiment, data mining achieved significant results. The root node of the decision tree shows that whether a project's budget execution progress at the end of July exceeds 22.79% is highly significant in determining whether the project can reach 97% budget execution progress by year-end. Based on the experimental results, the authors conclude that more attention should be paid to project budget execution as of July, and that effective data management is the primary task and goal of financial institutions in enterprises. [ABSTRACT FROM AUTHOR]
- Published
- 2024
266. Data mining for predictive analysis in gynecology: a focus on cervical health.
- Author
- Andrade-Arenas, Laberiano, Rubio-Paucar, Inoc, and Yactayo-Arias, Cesar
- Subjects
DATA mining ,DECISION trees ,CERVICAL cancer ,GYNECOLOGY ,DECISION making
- Abstract
Cervical cancer is a problem that affects women aged 24 years and older, and data mining can be applied to detect important patterns that support decisions about it. For this purpose, the RapidMiner Studio tool was used to analyze the data by age. The analysis followed the stages of the knowledge discovery in databases (KDD) methodology: data selection, data preparation, data mining, and evaluation and interpretation. In addition, the cross-industry standard process for data mining (CRISP-DM), KDD, and sample, explore, modify, model, assess (SEMMA) methodologies are compared dimension by dimension. A graph was also created comparing algorithmic models such as naive Bayes, decision tree, and rule induction. The study concludes that the most notable result was a value of -1.424, located in cluster 4 for the attribute "result date". [ABSTRACT FROM AUTHOR]
- Published
- 2024
267. Predictive models in Alzheimer's disease: an evaluation based on data mining techniques.
- Author
Andrade-Arenas, Laberiano, Rubio-Paucar, Inoc, and Yactayo-Arias, Cesar
- Subjects
ALZHEIMER'S disease, DATA mining, DISEASE risk factors, PREDICTION models, DECISION trees
- Abstract
The increasing prevalence of Alzheimer's disease in older adults has raised significant concern in recent years. In response, this research set out to develop predictive models that allow early identification of people at risk of Alzheimer's disease, considering several variables associated with the disease. To achieve this objective, data mining techniques were employed, specifically the decision tree algorithm, using the RapidMiner Studio tool. The sample, explore, modify, model, and assess (SEMMA) methodology was applied systematically at each stage of model development, ensuring an orderly and structured approach. The results revealed that 45.00% of people with dementia present characteristics that identify them as candidates for confirmation of an Alzheimer's disease diagnosis. In contrast, 52.78% of those without dementia show no risk of developing the disease. The research concludes that most patients diagnosed with Alzheimer's are older than 65 years, indicating that this stage of life tends to trigger brain changes associated with the disease. This finding underscores the importance of considering age as a key factor in the early identification of the disease. [ABSTRACT FROM AUTHOR]
- Published
- 2024
268. A Method for Supporting the Domain Expert by the Interpretation of Different Decision Trees Learnt from the Same Domain.
- Author
Perner, Petra
- Subjects
*DATA mining, *DECISION trees, *TREE graphs, *CLASSIFICATION algorithms, *ACQUISITION of data
- Abstract
Data mining methods are widely used across many disciplines to identify patterns, rules, or associations among huge volumes of data. Data mining methods with explanation capability such as decision tree induction are preferred in many domains. The aim of this paper is to discuss how to deal with the result of decision tree induction methods. This paper has been prompted by the fact that domain experts are able to use the tools for decision tree induction but have great difficulties in interpreting the results. When the domain expert has learnt two decision trees that are from the same domain but based on different data sets as a result of further data collection, he is faced with the problem of how to interpret the different trees. The comparison of two decision trees is therefore an important issue as the user needs such a comparison in order to understand what has changed. We have proposed to provide him with a measure of correspondence between the two trees that allows him to judge if he can accept the changes or not. In this paper, we propose a proper similarity measure. In case of a low similarity value, the expert has evidence to start exploring the reason for this change. Often, he can find things in the data acquisition that might have resulted in some noise and might be fixed. Copyright © 2014 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2014
269. Presenting a Prediction Model for Successful Allogenic Hematopoietic Stem Cell Transplantation in Adults with Acute Myeloid Leukemia.
- Author
Langarizadeh, Mostafa, Farajollahi, Boshra, and Hajifathali, Abbas
- Subjects
*HOSPITALS, *DECISION trees, *HOMOGRAFTS, *GRAFT versus host disease, *DESCRIPTIVE statistics, *RESEARCH funding, *HEMATOPOIETIC stem cell transplantation, *PREDICTION models, *SENSITIVITY & specificity (Statistics), *DATA mining, *DISEASE complications
- Abstract
Background: Allogenic hematopoietic stem cell transplantation is considered an effective treatment for patients with acute myeloid leukemia. However, complications of transplantation, such as acute graft-versus-host disease (aGVHD), affect the efficiency of allogenic hematopoietic stem cell transplantation. The present study aimed to implement different data mining (DM) models (single and ensemble) to predict the outcome of allogenic hematopoietic stem cell transplantation (graft-versus-host disease) in patients with acute myeloid leukemia. Method: We conducted this developmental study on 94 patients with 34 attributes at Taleghani Hospital, Tehran, Iran, from 2009 to 2017. Data were analyzed with decision tree (DT) algorithms, including decision tree, random forest, and gradient boosting (ensemble learning), as well as artificial neural network (single learning) and support vector machine. Specificity, accuracy, F-measure, AUC (area under the curve), and sensitivity were reported to evaluate the algorithms. Results: There were 34 transplantation-related variables; some predictors, such as liver, hemoglobin, and donor blood group, were found to be the most important. To predict aGVHD, the two most appropriate DM models were the artificial neural network and support vector machine classifiers, with an ROC of 100. Conclusion: This study indicated that DT algorithms could be successfully used to confirm the efficiency of models predicting the outcome of allogenic hematopoietic stem cell transplantation. [ABSTRACT FROM AUTHOR]
- Published
- 2023
270. The data sampling effect on financial distress prediction by single and ensemble learning techniques.
- Author
Sue, Kuen-Liang, Tsai, Chih-Fong, and Chiu, Andy
- Subjects
*DECISION trees, *PREDICTION models, *CREDIT ratings, *FORECASTING
- Abstract
Financial distress datasets are usually class imbalanced. In the literature, data sampling is one of the most widely used solutions to the class imbalance problem. This article examines the effect of data sampling on financial distress prediction models built with single and ensemble learning techniques. The experiments are based on three bankruptcy prediction and credit scoring datasets, and twelve different single classifiers and classifier ensembles are constructed. We find that although some prediction models trained on the original class-imbalanced datasets provide reasonable AUC, their type II errors are too high for practical use. However, when data sampling is performed over the datasets, all of the prediction models slightly increase their AUC and greatly reduce their type II errors. More specifically, decision tree ensembles built by bagging and boosting are the better choices for financial distress prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2023
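The abstract does not say which sampling method was used. A minimal sketch of one common option, random oversampling of the minority class, together with a type II error measure. Here "type II error" is assumed to mean the share of actual distressed firms (the positive class) that a model labels as healthy; conventions vary, and the toy data is hypothetical.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Randomly duplicate rows of every smaller class until all classes match the largest one."""
    rng = random.Random(seed)  # seeded for reproducibility
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, cnt in counts.items():
        members = [x for x, lab in zip(X, y) if lab == cls]
        for _ in range(target - cnt):
            X_out.append(rng.choice(members))
            y_out.append(cls)
    return X_out, y_out

def type_ii_error(y_true, y_pred, positive=1):
    """Assumed definition: share of actual positives (distressed firms) predicted as negative."""
    preds_for_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p != positive for p in preds_for_pos) / len(preds_for_pos)

# Hypothetical toy data: one financial ratio per firm, 1 = distressed (minority class).
X = [[0.10], [0.40], [0.35], [0.80], [0.70], [0.90], [0.20], [0.05]]
y = [1, 0, 0, 0, 0, 0, 1, 0]
X_bal, y_bal = random_oversample(X, y)
print(sorted(Counter(y_bal).items()))  # → [(0, 6), (1, 6)]
print(type_ii_error([1, 1, 0, 0], [0, 1, 0, 0]))  # → 0.5
```

Sampling of this kind is normally applied to the training split only, so that the test set keeps the original class imbalance and the measured type II error reflects real conditions.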
271. Classification of transformed anchovy products based on the use of element patterns and decision trees to assess traceability and country of origin labelling.
- Author
Varrà, Maria Olga, Husáková, Lenka, Patočka, Jan, Ghidini, Sergio, and Zanardi, Emanuela
- Subjects
*INDUCTIVELY coupled plasma mass spectrometry, *DECISION trees, *COUNTRY of origin (Commerce), *ANCHOVIES, *NUCLEOSYNTHESIS, *STRONTIUM, *MERCURY vapor
- Abstract
• Elemental profiles of 180 anchovy samples are determined by ICP-MS.
• The influence of the production process on the element composition can be disregarded.
• Pivotal elements for origin discrimination are identified.
• Element compositions provide a highly specific fingerprint of anchovy products.
• CHAID and QUEST decision tree algorithms are recommended for origin prediction.
Quadrupole inductively coupled plasma mass spectrometry (Q-ICP-MS) and direct mercury analysis were used to determine the elemental composition of 180 transformed (salt-ripened) anchovies from three different fishing areas before and after packaging. To this end, four decision-tree-based algorithms, namely C5.0, classification and regression trees (CART), chi-square automatic interaction detection (CHAID), and quick unbiased efficient statistical tree (QUEST), were applied to the elemental datasets to find the most accurate data mining procedure for predicting fish origin. Classification rules generated by the trained CHAID model optimally identified unlabelled testing bulk anchovies (93.9% F-score) using just 6 of 52 elements (As, K, P, Cd, Li, and Sr). The finished packaged product was better modelled by the QUEST algorithm, which recognised the origin of anchovies with an F-score of 97.7% using the information carried by 5 elements (B, As, K, Cd, and Pd). The results suggest that traceability systems in the fishery sector may be supported by simplified machine learning techniques applied to a limited but effective number of inorganic predictors of origin. [ABSTRACT FROM AUTHOR]
- Published
- 2021
272. Optimization of Dominance Testing in Skyline Queries Using Decision Trees
- Author
Fei Hao, Jong-Hyeok Choi, Yoo-Sung Kim, and Aziz Nasridinov
- Subjects
Skyline, General Computer Science, Computer science, query processing, General Engineering, Decision tree, Sorting, incomparability, InformationSystems_DATABASEMANAGEMENT, Decision rule, computer.software_genre, TK1-9971, Database, Set (abstract data type), Data point, skyline query, decision tree, Entropy (information theory), General Materials Science, Pairwise comparison, Electrical engineering. Electronics. Nuclear engineering, Data mining, Electrical and Electronic Engineering, computer
- Abstract
Skyline queries identify skyline points: the data points in a large dataset that are not dominated by any other data point. The main challenge with skyline queries is executing them in the shortest possible time. To address skyline query performance issues, we propose a decision tree-based method called the decision tree-based comparator (DC). This method minimizes unnecessary dominance tests (i.e., pairwise comparisons) by constructing a decision tree based on dominance testing. DC uses dominance relations obtained from the decision rules of the decision tree to determine incomparability between data points. DC can also be easily applied to improve the performance of various existing skyline query methods. After describing the theoretical background of DC and applying it to existing skyline queries, we present the results of various experiments showing that DC can improve skyline query performance by up to 23.15 times.
- Published
- 2021
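The dominance test is the pairwise comparison that DC tries to avoid. A minimal sketch of the test and of the naive skyline computation it would accelerate, assuming smaller values are better in every dimension; this is the baseline, not the DC method itself, and the hotel data is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q (smaller is better): no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points: List[Point]) -> List[Point]:
    """Naive skyline: keep every point that no other point dominates (O(n^2) dominance tests)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical hotels as (price, distance to beach); lower is better on both.
hotels = [(50, 3.0), (80, 1.0), (60, 2.5), (90, 4.0), (55, 3.5)]
print(skyline(hotels))  # → [(50, 3.0), (80, 1.0), (60, 2.5)]
```

A point never dominates itself, because the strict-improvement condition fails, so the inner loop does not need to skip `p`. The quadratic number of `dominates` calls is exactly the cost that pruning incomparable pairs is meant to reduce.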
273. A Comparative Analysis of Decision Trees Vis-à-vis Other Computational Data Mining Techniques in Automotive Insurance Fraud Detection.
- Author
Gepp, Adrian, Wilson, J. Holton, Kumar, Kuldeep, and Bhattacharya, Sukanto
- Subjects
*COMPARATIVE studies, *DECISION trees, *DATA mining, *INSURANCE crimes, *PREDICTION models, *BUSINESS failures, *AUTOMOBILE industry
- Abstract
The development and application of computational data mining techniques in financial fraud detection and business failure prediction has become a popular cross-disciplinary research area in recent times, involving financial economists, forensic accountants and computational modellers. Some of the computational techniques popularly used in the context of financial fraud detection and business failure prediction can also be effectively applied in the detection of fraudulent insurance claims and therefore can be of immense practical value to the insurance industry. We provide a comparative analysis of prediction performance of a battery of data mining techniques using real-life automotive insurance fraud data. While the data we have used in our paper is US-based, the computational techniques we have tested can be adapted and generally applied to detect similar insurance frauds in other countries as well where an organized automotive insurance industry exists. [ABSTRACT FROM AUTHOR]
- Published
- 2012
274. Integration of Decision Trees Using Distance to Centroid and to Decision Boundary
- Author
Robert Burduk and Jedrzej Biedrzycki
- Subjects
classifier integration, General Computer Science, Computer science, Decision tree, Centroid, QA75.5-76.95, computer.software_genre, Theoretical Computer Science, ComputingMethodologies_PATTERNRECOGNITION, ensemble of classifiers, Electronic computers. Computer science, Decision boundary, Data mining, computer, distance to decision boundary
- Abstract
A plethora of ensemble techniques have been implemented and studied to achieve better classification results than base classifiers. In this paper, an algorithm for the integration of decision trees is proposed, which means that homogeneous base classifiers are used. The novelty of the approach is the simultaneous use of the object's distance from the decision boundary and from the center of mass of the objects belonging to one class label to determine the score functions of the base classifiers. That is, the score each classifier assigns to a class label depends on the distance of the classified object from the decision boundary and from the centroid. The algorithm was evaluated on an open-source benchmark dataset. The results indicate an improvement in classification quality compared with the reference method, majority voting.
- Published
- 2020
275. Academic performance of students in higher education: predictions of influential factors using decision trees.
- Author
Díaz-Landa, Brenda, Meleán-Romero, Rosana, and Marín-Rodriguez, William
- Subjects
ARTIFICIAL intelligence, DATA mining, MASTER'S degree, DECISION trees, ALGORITHMS
- Abstract
Copyright of Revista Telos is the property of Revista Telos and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2021
276. Privately Evaluating Decision Trees and Random Forests
- Author
David J. Wu, Tony Feng, Michael Naehrig, and Kristin E. Lauter
- Subjects
Computer science, Feature vector, Decision tree, 0102 computer and information sciences, 02 engineering and technology, Machine learning, computer.software_genre, privacy, 01 natural sciences, 0202 electrical engineering, electronic engineering, information engineering, Bandwidth (computing), Protocol (object-oriented programming), General Environmental Science, Ethics, decision trees, business.industry, Process (computing), Information technology, QA75.5-76.95, BJ1-1725, secure computation, Random forest, Tree (data structure), 010201 computation theory & mathematics, Electronic computers. Computer science, General Earth and Planetary Sciences, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, business, computer
- Abstract
Decision trees and random forests are common classifiers with widespread use. In this paper, we develop two protocols for privately evaluating decision trees and random forests. We operate in the standard two-party setting where the server holds a model (either a tree or a forest), and the client holds an input (a feature vector). At the conclusion of the protocol, the client learns only the model’s output on its input and a few generic parameters concerning the model; the server learns nothing. The first protocol we develop provides security against semi-honest adversaries. We then give an extension of the semi-honest protocol that is robust against malicious adversaries. We implement both protocols and show that both variants are able to process trees with several hundred decision nodes in just a few seconds and a modest amount of bandwidth. Compared to previous semi-honest protocols for private decision tree evaluation, we demonstrate a tenfold improvement in computation and bandwidth.
- Published
- 2016
277. Image Processing and Image Mining using Decision Trees.
- Author
KUN-CHE LU and DON-LIN YANG
- Subjects
DIGITAL image processing, DATA mining, DECISION trees, PIXELS, IMAGE retrieval, IMAGE databases
- Abstract
Valuable information can be hidden in images; however, little research discusses data mining on them. In this paper, we propose a general framework based on the decision tree for mining and processing image data. Pixel-wise image features were extracted and transformed into a database-like table on which various data mining algorithms can operate. Each tuple of the transformed table has a feature descriptor formed by a set of features together with the target label of a particular pixel. With the label feature, decision tree induction can learn the relationships between attributes and the target label from image pixels and construct a model for pixel-wise image processing from a given training image dataset. Both experimental and theoretical analyses were performed in this study. Their results show that the proposed model can be very efficient and effective for image processing and image mining. It is anticipated that, using the proposed model, various existing data mining and image processing methods could be combined in different ways. Our model can also be used to create new image processing methodologies, refine existing image processing methods, or act as a powerful image filter. [ABSTRACT FROM AUTHOR]
- Published
- 2009
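The transformation described above turns every pixel into one row of a table. A minimal sketch, assuming two hypothetical per-pixel features (the raw intensity and the mean of its 3×3 neighbourhood) plus the pixel's target label; the abstract does not list the paper's actual features. Any decision tree learner can then be trained on the resulting rows.

```python
from typing import List, Tuple

Image = List[List[int]]

def pixel_table(image: Image, labels: Image) -> List[Tuple[int, float, int]]:
    """Turn an image into rows (intensity, 3x3 neighbourhood mean, label), one row per pixel.

    Border pixels average only the neighbours that lie inside the image.
    """
    h, w = len(image), len(image[0])
    rows = []
    for r in range(h):
        for c in range(w):
            neigh = [image[i][j]
                     for i in range(max(0, r - 1), min(h, r + 2))
                     for j in range(max(0, c - 1), min(w, c + 2))]
            rows.append((image[r][c], sum(neigh) / len(neigh), labels[r][c]))
    return rows

# Hypothetical 3x3 grey-level image and its per-pixel target mask.
img = [[0, 0, 255],
       [0, 255, 255],
       [0, 0, 255]]
mask = [[0, 0, 1],
        [0, 1, 1],
        [0, 0, 1]]
table = pixel_table(img, mask)
print(len(table), table[0])  # → 9 (0, 63.75, 0)
```

Adding a feature only means appending another column to each tuple, which is what lets existing table-based mining algorithms be reused unchanged on image data.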
278. Factors associated with academic performance in Critical Reading on the Saber 11 tests using decision trees.
- Author
Timarán Pereira, Ricardo, Hidalgo Troya, Arsenio, and Caicedo Zambrano, Javier
- Subjects
HIGH school students, DATA mining, DECISION trees, READING
- Abstract
Copyright of Investigación e Innovación en Ingenierías is the property of Universidad Simon Bolivar and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2020
279. Explainable AI: Machine Learning Interpretation in Blackcurrant Powders.
- Author
Przybył, Krzysztof
- Subjects
MACHINE learning, DEEP learning, ARTIFICIAL intelligence, RANDOM forest algorithms, DECISION trees, DATA mining, POWDERS
- Abstract
Recently, explainability in machine and deep learning has become an important area of research and interest, owing both to the increasing use of artificial intelligence (AI) methods and to the need to understand the decisions made by models. Interest in explainable artificial intelligence (XAI) stems from growing awareness of, among other things, data mining, error elimination, and the learning performance of various AI algorithms. Moreover, XAI will make the decisions models take more transparent as well as more effective. In this study, models from the 'glass box' group (including Decision Tree) and the 'black box' group (including Random Forest) were proposed to understand the identification of selected types of currant powders. The models were trained and evaluated with accuracy, precision, recall, and F1-score. The results were visualized using Local Interpretable Model-agnostic Explanations (LIME) to predict the effectiveness of identifying specific types of blackcurrant powders based on texture descriptors such as entropy, contrast, correlation, dissimilarity, and homogeneity. Bagging (Bagging_100), Decision Tree (DT0), and Random Forest (RF7_gini) proved to be the most effective models in the framework of currant powder interpretability. For Bagging_100, accuracy, precision, recall, and F1-score each reached approximately 0.979. In comparison, DT0 reached values of 0.968, 0.972, 0.968, and 0.969, and RF7_gini reached values of 0.963, 0.964, 0.963, and 0.963. These models achieved classifier performance measures greater than 96%. In the future, XAI using agnostic models can be an additional important tool to help analyze data, including food products, even online. [ABSTRACT FROM AUTHOR]
- Published
- 2024
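The abstract reports accuracy, precision, recall, and F1 for a multi-class problem without naming the averaging scheme. A minimal sketch assuming support-weighted averaging: under that scheme, weighted recall always equals accuracy, which is consistent with the identical accuracy and recall values reported for DT0 (0.968) and RF7_gini (0.963), though the abstract does not confirm the scheme. The class labels below are hypothetical.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 over all true classes."""
    n = len(y_true)
    support = Counter(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    prec = rec = f1 = 0.0
    for cls, sup in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        pred_pos = sum(p == cls for p in y_pred)
        p_c = tp / pred_pos if pred_pos else 0.0  # per-class precision
        r_c = tp / sup                            # per-class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = sup / n                               # weight = class share of the data
        prec += w * p_c
        rec += w * r_c
        f1 += w * f_c
    return acc, prec, rec, f1

# Hypothetical powder types A, B, C.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
print(round(acc, 4), round(rec, 4))  # → 0.6667 0.6667
```

The equality holds because each class contributes (support / n) × (tp / support) = tp / n to weighted recall, and these terms sum to the overall fraction of correct predictions.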
280. Predicting Academic Success of College Students Using Machine Learning Techniques.
- Author
Guanin-Fajardo, Jorge Humberto, Guaña-Moya, Javier, and Casillas, Jorge
- Subjects
COLLEGE students, DECISION trees, DATA mining, ACADEMIC achievement, CLASSIFICATION algorithms, SCHOOL dropout prevention, MACHINE learning
- Abstract
College context and academic performance are important determinants of academic success; applying machine learning techniques to students' prior experience to predict academic success before the end of the first year reinforces college self-efficacy. Dropout prediction is related to student retention and has been studied extensively in recent work; however, there is little literature on predicting academic success using educational machine learning. For this reason, the CRISP-DM methodology was applied to extract relevant knowledge and features from the data. The dataset examined consists of 6690 records and 21 variables with academic and socioeconomic information. Preprocessing techniques and classification algorithms were analyzed. The area under the curve (AUC) was used to measure the effectiveness of the algorithms; XGBoost had an AUC of 87.75% and correctly classified eight out of ten cases, while the decision tree improved interpretability with ten rules, correctly classifying seven out of ten cases. Recognizing the gaps in the study and that on-time completion of college consolidates college self-efficacy, creating intervention and support strategies to retain students is a priority for decision makers. Assessing the fairness and discrimination of the algorithms was the main limitation of this work. In the future, we intend to apply the extracted knowledge and learn about its influence on university management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
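The AUC reported above (87.75% for XGBoost) can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that rank-based definition, with ties counted as one half; the scores and outcomes are hypothetical, and the pairwise loop is O(P·N), chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of academic success and true outcomes (1 = success).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
print(round(auc(scores, labels), 4))  # → 0.8333
```

Five of the six positive/negative pairs are ordered correctly here; only the positive scored 0.3 falls below the negative scored 0.4.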
281. Developing a prototype system of computer-aided appointment scheduling: A radiology department case study.
- Author
Chen, Ping-Shun, Lai, Chin-Hui, Chen, Ying-Tzu, and Lung, Ting-Yu
- Subjects
MULTIPLE regression analysis, K-means clustering, HOSPITAL patients, DECISION trees, SCHEDULING
- Abstract
BACKGROUND: Scheduling patient appointments in hospitals is complicated due to various types of patient examinations, different departments and physicians accessed, and different body parts affected. OBJECTIVE: This study focuses on the radiology scheduling problem, which involves multiple radiological technologists in multiple examination rooms, and then proposes a prototype system of computer-aided appointment scheduling based on information such as the examining radiological technologists, examination departments, the patient's body parts being examined, the patient's gender, and the patient's age. METHODS: The system incorporated a stepwise multiple regression analysis (SMRA) model to predict the number of examination images and then used the K-Means clustering with a decision tree classification model to classify the patient's examination time within an appropriate time interval. RESULTS: The constructed prototype creates a feasible patient appointment schedule by classifying patient examination times into different categories for different patients according to the four types of body parts, eight hospital departments, and 10 radiological technologists. CONCLUSION: The proposed patient appointment scheduling system can schedule appointment times for different types of patients according to the type of visit, thereby addressing the challenges associated with diversity and uncertainty in radiological examination services. It can also improve the quality of medical treatment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
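The METHODS step groups examination times into intervals with K-means before a decision tree classifies patients into them. The abstract gives neither k nor the clustering features, so this minimal sketch assumes a single hypothetical feature (examination duration in minutes) and k = 3, and shows only the clustering half: the resulting category index is what a tree would then learn to predict from patient attributes.

```python
from typing import List

def kmeans_1d(values: List[float], k: int, iters: int = 100) -> List[float]:
    """Lloyd's k-means on 1-D data (assumes 2 <= k <= len(values)); returns sorted centres."""
    data = sorted(values)
    # Deterministic start: k evenly spaced order statistics of the sorted data.
    centres = [float(data[i * (len(data) - 1) // (k - 1)]) for i in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Move each centre to its group mean; an empty group keeps its old centre.
        new = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if new == centres:
            break  # converged: no centre moved
        centres = new
    return sorted(centres)

def time_slot(duration: float, centres: List[float]) -> int:
    """Index of the nearest centre, i.e. the examination-time category of a duration."""
    return min(range(len(centres)), key=lambda i: abs(duration - centres[i]))

# Hypothetical examination durations in minutes.
durations = [5, 6, 7, 14, 15, 16, 30, 31, 32]
centres = kmeans_1d(durations, k=3)
print(centres)                 # → [6.0, 15.0, 31.0]
print(time_slot(12, centres))  # → 1
```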
282. Nature of Occupational Incidents among Roofing Contractors: A Data Mining Approach.
- Author
Onuchukwu, Ikechukwu Sylvester, Gholizadeh, Pouya, Liko, Gentian, and Esmaeili, Behzad
- Subjects
ROOFING contractors, DATA mining, DECISION trees, DECISION making, CONSTRUCTION contractors
- Abstract
Given that roofing contractors in the construction industry have the highest fatality rate among specialty contractors, understanding the root cause of incidents among roofers is critical for improving safety outcomes. This study applied frequency analysis and decision tree data-mining techniques to analyze roofers' fatal and non-fatal accident reports. The frequency analysis yielded insights into the leading causes of accidents, with falls to a lower level (83%) being the most frequent, followed by incident sources related to structures and surfaces (56%). The most common injuries experienced by roofing contractors were fractures (49%) and concussions (15%), especially for events occurring in residential buildings, maintenance and repair works, small projects (i.e., $50,000 or less), and on Mondays. According to the decision tree analysis, the most important factor for determining the nature of the injury is the nonfragile injured body part, followed by injury caused by coating works. The decision tree also produced decision rules that provide an easy interpretation of the underlying association between the factors leading to incidents. The decision tree models developed in this study can be used to predict the nature of potential injuries for strategically selecting the most effective injury-prevention strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
283. Knowledge discovery and data mining in psychology: Using decision trees to predict the Sensation Seeking Scale score
- Author
Andrej Kastrin
- Subjects
knowledge discovery from data, data mining, psychological assessment, Psychology, BF1-990
- Abstract
Knowledge discovery from data is an interdisciplinary research field combining technology and knowledge from the domains of statistics, databases, machine learning, and artificial intelligence. Data mining is the most important part of the knowledge discovery process. The objective of this paper is twofold. The first objective is to point out the qualitative shift in research methodology due to evolving knowledge discovery technology. The second objective is to introduce the technique of decision trees to psychological domain experts. We illustrate the utility of decision trees with a prediction model of sensation seeking. Prediction of the Zuckerman Sensation Seeking Scale (SSS-V) score was based on a set of Eysenck's personality traits and Pavlovian temperament properties. Predictors were operationalized on the basis of the Eysenck Personality Questionnaire (EPQ) and the Slovenian adaptation of the Pavlovian Temperament Survey (SVTP). The standard statistical technique of multiple regression was used as a baseline method to evaluate the decision tree methodology. The multiple regression model was the most accurate model in terms of predictive accuracy. However, decision trees could serve as a powerful general method for initial exploratory data analysis, data visualization, and knowledge discovery.
- Published
- 2008
284. Data mining decision tree algorithm C4.5 classification of student personality characteristics.
- Author
Selvida, Desilia, Pulungan, Annisa Fadhillah, and Elveny, Marischa
- Subjects
CLASSIFICATION algorithms, ABILITY grouping (Education), DATA mining, DECISION trees, ERROR rates, ALGORITHMS
- Abstract
The C4.5 algorithm still has weaknesses in predicting or classifying large amounts of data. Its performance can be improved by selecting the split attribute using the average gain value. C4.5 is a decision tree classification method based on entropy. The classification obtained from the analysis covers 8 of the 100 student records tested and yields information on the Sanguine, Choleric, Melancholy, and Phlegmatic personality types. The C4.5 decision tree classifier achieved an accuracy of 86.36%, corresponding to an error rate of 13.64%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
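Using "the average gain value" to select the split attribute matches a heuristic from Quinlan's original C4.5: among the attributes whose information gain is at least the average gain, choose the one with the highest gain ratio. Whether the authors' modification is exactly this is not stated, so treat the sketch as an assumption; the student records are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_and_split_info(rows, attr, target):
    """Information gain and split information of a categorical attribute."""
    n = len(rows)
    parts = defaultdict(list)
    for r in rows:
        parts[r[attr]].append(r[target])
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    return entropy([r[target] for r in rows]) - remainder, split_info

def choose_attribute(rows, attrs, target):
    """C4.5 heuristic: among attributes whose gain is at least the average gain,
    return the one with the highest gain ratio (gain / split information)."""
    stats = {a: gain_and_split_info(rows, a, target) for a in attrs}
    avg_gain = sum(g for g, _ in stats.values()) / len(stats)
    candidates = [a for a, (g, s) in stats.items() if g >= avg_gain and s > 0]
    if not candidates:
        return None  # no attribute splits the data
    return max(candidates, key=lambda a: stats[a][0] / stats[a][1])

# Hypothetical student records; "student_id" is unique per row.
rows = [
    {"extroversion": "high", "group": "p", "student_id": 1, "type": "Sanguine"},
    {"extroversion": "high", "group": "q", "student_id": 2, "type": "Sanguine"},
    {"extroversion": "low", "group": "p", "student_id": 3, "type": "Melancholy"},
    {"extroversion": "low", "group": "q", "student_id": 4, "type": "Melancholy"},
]
print(choose_attribute(rows, ["extroversion", "group", "student_id"], "type"))  # → extroversion
```

Plain information gain would tie `extroversion` and `student_id` at 1 bit; the gain ratio halves the score of the four-valued identifier, while the average-gain filter keeps low-gain attributes such as `group` from winning on a small split information.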
285. Building customer churn prediction models in Indonesian telecommunication company using decision tree algorithm.
- Author
Ramadhanti, Darin, Larasati, Aisyah, Muid, Abdul, and Mohamad, Effendi
- Subjects
*DECISION trees, *CONSUMERS, *PREDICTION models, *CUSTOMER satisfaction, *TELECOMMUNICATION, *DATA mining
- Abstract
Customer churn has become a big problem for telecommunication companies, and preventive efforts require predicting future churn. This study uses data mining techniques with decision tree algorithms to predict customer churn in an Indonesian telecommunication company. The best decision tree model uses the information gain criterion with a minimal gain of 0.01 and a maximum depth of 6. This model has an accuracy of 78.28%, with a 19.6% customer churn rate. Based on this model, the company's customers tend toward voluntary churn. Important factors affecting customer churn include contract type, number of monthly downloads, tenure, customer satisfaction value, and add-ons. Contract type has the highest impact on customer churn in this company. Based on the results, the company is advised to promote a retention program to decrease its customer churn rate. [ABSTRACT FROM AUTHOR]
- Published
- 2023
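The two parameters of the best model, a minimal gain of 0.01 and a maximum depth of 6, are pre-pruning rules that stop the tree from growing. A minimal ID3-style sketch of how such limits act, assuming "minimal gain" means "do not split when the best information gain falls below the threshold"; the exact semantics in the authors' tool may differ, and the churn records are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    """Entropy reduction obtained by splitting rows on a categorical attribute."""
    n = len(rows)
    parts = defaultdict(list)
    for r in rows:
        parts[r[attr]].append(r[target])
    return entropy([r[target] for r in rows]) - sum(len(p) / n * entropy(p) for p in parts.values())

def build_tree(rows, attrs, target, depth=0, max_depth=6, min_gain=0.01):
    """ID3-style tree with pre-pruning; returns a class label (leaf) or (attribute, {value: subtree})."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1 or not attrs:
        return majority
    gains = {a: info_gain(rows, a, target) for a in attrs}
    best = max(gains, key=gains.get)
    if gains[best] < min_gain:
        return majority  # best available split is not worth making
    children = defaultdict(list)
    for r in rows:
        children[r[best]].append(r)
    rest = [a for a in attrs if a != best]
    return (best, {v: build_tree(sub, rest, target, depth + 1, max_depth, min_gain)
                   for v, sub in children.items()})

# Hypothetical churn records.
rows = [
    {"contract": "monthly", "downloads": "low", "churn": "yes"},
    {"contract": "monthly", "downloads": "high", "churn": "yes"},
    {"contract": "monthly", "downloads": "low", "churn": "yes"},
    {"contract": "yearly", "downloads": "low", "churn": "no"},
    {"contract": "yearly", "downloads": "high", "churn": "no"},
]
print(build_tree(rows, ["contract", "downloads"], "churn"))  # → ('contract', {'monthly': 'yes', 'yearly': 'no'})
print(build_tree(rows, ["downloads"], "churn", min_gain=0.05))  # → yes
```

In the second call the only available split (on `downloads`) gains about 0.02 bits, below the 0.05 threshold, so the node becomes a majority-class leaf; with the default 0.01 threshold the same split would be made.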
286. Decision tree using ant colony for classification of health data.
- Author
Nugroho, Arief Kelik, Permadi, Ipung, Kurniawan, Yogiek Indra, Hanifa, Aini, and Nofiyati
- Subjects
*DECISION trees, *ANT colonies, *ANT algorithms, *CLASSIFICATION algorithms, *ANT behavior, *DATA mining
- Abstract
The goal of a classification algorithm is to build a model that maximizes the number of correct predictions, although the completeness of the model also plays an important role in many application areas. Ant Colony Optimization (ACO) is a relatively simple way to mimic the behavior of ant colonies, whose members cooperate with each other to find a path from the nest to a food source. A system capable of executing a search to discover the optimal solution to an optimization problem with a vast search space is referred to as a colony generation system. Applying ACO to classification in data mining has the advantage of searching over flexible values and value combinations. One of the many uses of ACO is building a decision tree. As a model representation, a decision tree is easy to understand and can be represented as a graph. Using the ACO-modified decision tree with the pruning technique, the result is 76.1%. [ABSTRACT FROM AUTHOR]
- Published
- 2023
287. Analyze the soil attributes and sugarcane yield culture with the use of geostatistics and decision trees
- Author
Zigomar Menezes de Souza, Domingos Guilherme Pellegrino Cerri, Marcelo José Colet, Luiz Henrique Antunes Rodrigues, Paulo Sérgio Graziano Magalhães, and Rafael Junqueira Araújo Mandoni
- Subjects
precision agriculture, spatial variability, data mining, yield monitor, Agriculture, Agriculture (General), S1-972
- Abstract
One of the challenges of precision agriculture is to provide support for defining management units for subsequent interventions. The objective of this work was therefore to evaluate soil chemical attributes and sugarcane yield using geostatistics and data mining by decision tree induction. Sugarcane yield was mapped in a field of approximately 23 ha, applying the cell criterion, with a yield monitor that allowed the production of a digital map representing the production surface of the studied area. To determine the attributes of a Red-Yellow Argisol (Argissolo Vermelho-Amarelo), soil samples were collected at the beginning of the 2006/2007 harvest on a regular 50 x 50 m grid, at depths of 0.0-0.2 m and 0.2-0.4 m. Soil attribute and sugarcane yield data were analyzed with geostatistical techniques and classified into three yield levels for decision tree induction. The decision tree was induced in the SAS Enterprise Miner software, using an algorithm based on entropy reduction. Altitude and potassium showed the highest correlations with sugarcane yield. Decision tree induction showed that altitude is the variable with the greatest potential for interpreting sugarcane yield maps, thereby assisting precision agriculture and proving a suitable tool for defining management zones in areas cropped with sugarcane.
- Published
- 2010
288. Online adaptive decision trees based on concentration inequalities
- Author
André C. P. L. F. de Carvalho, Gonzalo Ramos-Jiménez, Rafael Morales-Bueno, José del Campo-Ávila, Agustín Alejandro Ortiz-Díaz, and Isvani Frias-Blanco
- Subjects
Information Systems and Management ,Concept drift ,Computer science ,Decision tree ,02 engineering and technology ,Machine learning ,computer.software_genre ,Management Information Systems ,Artificial Intelligence ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Pruning (decision trees) ,Online algorithm ,Incremental decision tree ,Data stream mining ,business.industry ,Decision tree learning ,Tree (data structure) ,Bounded function ,020201 artificial intelligence & image processing ,Data mining ,Artificial intelligence ,business ,COMPUTATIONAL LEARNING ,computer ,Software
- Abstract
Classification trees are a powerful tool for mining non-stationary data streams, in which massive data are constantly generated at high speed and the underlying target function can change over time. The iadem family of algorithms is based on Hoeffding's and Chernoff's bounds and induces online decision trees from data streams, but it cannot handle concept drift. This study extends the family to deal with time-changing data streams. The new online algorithm, named iadem-3, performs two main actions in response to a concept drift. First, it resets the variables affected by the change while keeping the tree structure intact, which suits changes in which consecutive target functions are very similar. Second, it creates alternative models that replace parts of the main tree when they significantly improve the model's accuracy, thereby rebuilding the main tree if needed. An online change detector and a non-parametric statistical test based on Hoeffding's bounds are used to guarantee this significance. iadem-3 also incorporates a new pruning method that ensures all split tests previously installed in decision nodes are useful. The learning model is also viewed as an ensemble of classifiers, and the predictions of the main and alternative models are combined to classify unlabeled examples. iadem-3 is empirically compared with various well-known decision tree induction algorithms for concept drift detection. We show empirically that the new algorithm often reaches higher accuracy with smaller decision tree models while keeping processing time bounded, regardless of the number of instances processed.
- Published
- 2016
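For background, the concentration-inequality idea behind this family can be illustrated with the classic Hoeffding-bound split test popularized by Hoeffding trees (VFDT). This is a minimal sketch under that assumption, not iadem-3's own test, which also uses Chernoff's bound and drift handling that the abstract only summarizes; the helper names and the default δ are hypothetical.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability at least 1 - delta, the true mean of
    a variable with the given range lies within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def hoeffding_split_ok(best_gain, second_gain, n, n_classes, delta=1e-7):
    """VFDT-style test (hypothetical helper): split when the observed information
    gain of the best attribute beats the runner-up by more than epsilon.
    The range of information gain with n_classes classes is log2(n_classes)."""
    eps = hoeffding_bound(math.log2(n_classes), delta, n)
    return best_gain - second_gain > eps


print(round(hoeffding_bound(1.0, 1e-7, 1000), 4))  # 0.0898
print(hoeffding_split_ok(0.30, 0.20, n=1000, n_classes=2))  # True
print(hoeffding_split_ok(0.30, 0.25, n=1000, n_classes=2))  # False
```

With n = 1000 examples, two classes and δ = 1e-7, ε is about 0.0898, so a gain advantage of 0.10 justifies a split while 0.05 does not; ε shrinks as more examples arrive.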
289. Predicting vessel speed in the Arctic without knowing ice conditions using AIS data and decision trees
- Author
Bjørnar Brende Smestad, Prithvi S Rao, Ekaterina Kim, Anirban Bhattacharyya, and Bjørn Egil Asbjørnslett
- Subjects
Shipment of goods. Delivery of goods ,Knowledge-based systems: 425 [VDP] ,Decision trees ,AIS ,Perspective (graphical) ,Decision tree ,Mode (statistics) ,HF5761-5780 ,Statistical model ,Speed ,computer.software_genre ,Random forest ,Shipping ,Arctic ,Machine learning ,Daylight ,Data mining ,Explicit knowledge ,computer ,Communication channel
- Abstract
Vessel speed is one of the important parameters governing safety, emergency, and transport planning in the Arctic. While previous studies have traditionally relied on physics-based simulations to predict vessel speed in ice-covered waters, most have not fully explored data-driven approaches and powerful supervised machine learning tools for speed prediction. This study applies supervised machine learning models to predict MV SOG (speed over ground) using historical Automatic Identification System (AIS) data and without explicit knowledge of local ice conditions. This paper presents a case study from the Eastern Barents Sea and the Southern Kara Sea. We first analyzed the vessel traffic situation for 2017 and 2018, then used this knowledge to build statistical models to predict vessel speeds. Finally, we evaluated the models' performance on a test dataset from January 2019. Three models (Random Forest, XGBoost, and LightGBM) were tested with a variety of date-time handling techniques, and the data input mode was permuted to find the best model. The results demonstrate the models' ability to predict a vessel's speed from its geographical location, time of year, and other engineered features such as daylight information and route. With the proposed approach, we achieved a mean absolute error of 3.5 knots on average on a test dataset without explicit knowledge of local ice conditions around the vessel, with most errors occurring in the Kara Strait region and the Sabetta Channel.
- Published
- 2021
290. Decision trees for informative process alarm definition and alarm-based fault classification.
- Author
Dorgo, Gyula, Palazoglu, Ahmet, and Abonyi, Janos
- Abstract
Alarm messages in industrial processes are designed to draw attention to abnormalities that require timely assessment or intervention. In practice, however, process operators define alarms arbitrarily and excessively, resulting in numerous nuisance and chattering alarms that are simply a source of distraction. Countless techniques are available for the retrospective filtering of alarm data, e.g., adding time delays and deadbands to existing alarm settings. As an alternative, instead of filtering or modifying existing alarms, the present paper proposes a method for designing alarm messages that are informative for fault detection, on the principle that alarm messages should be optimal for fault detection and identification from the outset. The methodology uses a machine learning technique, the decision tree classifier, which provides linguistically well-interpretable models without modifying the measured process variables. Furthermore, an online application of the defined alarm messages for fault identification is presented using a sliding-window-based data preprocessing approach. The effectiveness of the proposed methodology is demonstrated on a well-known benchmark simulator of a vinyl acetate production technology, whose complexity is considered sufficient for testing alarm systems. Note to practitioners: Process-specific knowledge can be used to label historical process data into normal operating and fault-specific periods. Alarm generation should be designed to detect and isolate faulty states. Using decision trees, optimal "cuts" or alarm limits for fault classification can be defined from a labelled dataset. The results apply to a variety of industries operating with online control systems, and are especially timely for the chemical industry. [ABSTRACT FROM AUTHOR]
- Published
- 2021
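The practitioners' note says decision trees can define optimal "cuts" (alarm limits) from a labelled dataset. A minimal sketch of that idea for a single process variable, assuming a one-level tree (a decision stump) with Gini impurity as the split measure; the variable, readings and labels are hypothetical, and the paper's own procedure may differ.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_alarm_limit(values, labels):
    """One-level tree (decision stump) on a single variable: return the cut
    x <= t with the lowest size-weighted Gini impurity, as (t, impurity).
    Candidate cuts are midpoints between consecutive distinct sorted values."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut can separate equal readings
        t = (lo + hi) / 2.0
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical temperature readings labelled as normal or fault periods.
temps = [70.0, 71.5, 72.0, 73.0, 78.0, 79.5, 80.0]
states = ["normal"] * 4 + ["fault"] * 3
limit, impurity = best_alarm_limit(temps, states)
print(limit, impurity)  # 75.5 0.0
```

The returned limit (75.5 here) would serve as an alarm threshold that perfectly separates the labelled normal and fault periods in this toy data; a full tree would repeat the search recursively over several variables.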
291. New Splitting Criteria for Decision Trees in Stationary Data Streams.
- Author
Jaworski, Maciej, Duda, Piotr, and Rutkowski, Leszek
- Subjects
DECISION trees ,DATA transmission systems ,ARTIFICIAL intelligence
- Abstract
The most popular tools for stream data mining are based on decision trees. Over the previous 15 years, all the methods designed, led by the very fast decision tree algorithm, relied on Hoeffding's inequality, and hundreds of researchers followed this scheme. We recently demonstrated that, although Hoeffding decision trees are an effective tool for dealing with stream data, they are a purely heuristic procedure; for example, classical decision trees such as ID3 or CART cannot be adapted to data stream mining using Hoeffding's inequality. Therefore, there is an urgent need to develop new algorithms that are both mathematically justified and characterized by good performance. In this paper, we address this problem by developing a family of new splitting criteria for classification in stationary data streams and investigating their probabilistic properties. The new criteria, derived using appropriate statistical tools, are based on the misclassification error and the Gini index impurity measures. A general division of splitting criteria into two types is proposed. Attributes chosen by type-I splitting criteria guarantee, with high probability, the highest expected value of the split measure. Type-II criteria ensure that, with high probability, the chosen attribute is the same as the one that would be chosen based on the whole infinite data stream. Moreover, two hybrid splitting criteria are proposed, which combine single criteria based on the misclassification error and the Gini index. [ABSTRACT FROM AUTHOR]
- Published
- 2018
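Both impurity measures named in this abstract can be computed from the per-leaf class-count tables that stream decision trees accumulate. The sketch below shows only these plain impurity-reduction split measures; it does not reproduce the paper's statistically derived type-I/type-II criteria or their bounds, and the function names and the toy table are hypothetical.

```python
def gini_from_counts(counts):
    """Gini impurity from a list of per-class counts."""
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def misclassification_from_counts(counts):
    """Misclassification error: 1 minus the majority-class fraction."""
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - max(counts) / n


def split_gain(table, impurity):
    """Impurity reduction for one attribute. table[v][k] counts examples with
    attribute value v and class k, i.e. the statistics a stream tree keeps per leaf."""
    class_totals = [sum(col) for col in zip(*table)]
    n = sum(class_totals)
    children = sum(sum(row) / n * impurity(row) for row in table)
    return impurity(class_totals) - children


# Hypothetical counts for a binary attribute and two classes.
table = [[8, 2], [2, 8]]
print(round(split_gain(table, gini_from_counts), 2))  # 0.18
print(round(split_gain(table, misclassification_from_counts), 2))  # 0.3
```

On this table the Gini-based gain is 0.18 and the misclassification-based gain is 0.30; the criteria described in the abstract add probabilistic guarantees on whether such estimates, computed from a finite part of the stream, identify the right attribute.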
292. DATA MINING ANALYSIS FOR IMPROVING DECISION-MAKING IN COMPUTER MANAGEMENT INFORMATION SYSTEMS.
- Author
XIAOHONG DONG and BING XIANG
- Subjects
INFORMATION resources management ,DATA mining ,ASSOCIATION rule mining ,INFORMATION storage & retrieval systems ,DECISION trees ,MANAGEMENT information systems ,ENTROPY (Information theory)
- Abstract
This paper proposes a decision-tree-based data mining procedure for information systems to enhance the accuracy and efficiency of data mining. An enhanced C4.5 decision tree method based on cosine similarity is suggested to evaluate the information gain rate of attributes and the information entropy of their values. When the difference in information entropy between any two values of an attribute is within the threshold range, the cosine similarity of the merged attribute values is determined and the information gain rate of the attribute is recalculated. The field of data mining arose from large-scale data sets that conventional data processing methods cannot handle successfully. The prime objective is to examine how data mining technology is used in computer management information systems, and the benefits of data mining technologies in these systems are examined from a variety of angles. To analyze and comprehend huge data sets and derive knowledge that can enhance decision-making in computer management information systems, the suggested solution uses several data mining techniques, including clustering, classification, and association rule mining. The experimental analysis indicates that the proposed method needs less time to construct a decision tree than the GBDT, P-GBDT method and the C5.0 decision tree method for fine classification of forest types in Hyperion images. Its minimum construction time is no more than 15 seconds, and the C5.0-based Hyperion method always requires the most time. The classification accuracy of the proposed method exceeds 95 percent on various datasets, and its data mining efficacy is high. The method enhances the precision and efficacy of data mining in order to uncover valuable information concealed in large volumes of data and maximize its value. [ABSTRACT FROM AUTHOR]
- Published
- 2023
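The enhanced method builds on C4.5's information gain rate (gain ratio). A minimal sketch of the standard gain ratio only, assuming discrete attributes; the paper's cosine-similarity merging of attribute values is not reproduced, and the function names and data are illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(labels, attribute_values):
    """C4.5 gain ratio: information gain of partitioning `labels` by
    `attribute_values`, divided by the split information (the entropy of
    the attribute's own value distribution)."""
    n = len(labels)
    groups = {}
    for v, y in zip(attribute_values, labels):
        groups.setdefault(v, []).append(y)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(attribute_values)
    return gain / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no"]
print(gain_ratio(labels, ["a", "a", "b", "b"]))  # 1.0
print(gain_ratio(labels, ["a", "b", "c", "d"]))  # 0.5
```

Both attributes have the same information gain (1 bit), but the identifier-like attribute with four distinct values has 2 bits of split information, which halves its gain ratio; this is the bias against many-valued attributes that C4.5 corrects.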
293. Processing Decision Tree Data Using Internet of Things (IoT) and Artificial Intelligence Technologies with Special Reference to Medical Application.
- Author
Al Fryan, Latefa Hamad, Shomo, Mahasin Ibrahim, Alazzam, Malik Bader, and Rahman, Md Adnan
- Subjects
*DECISION trees , *MEDICINE , *PSORIASIS , *INTERNET of things , *ARTIFICIAL intelligence , *DATABASE management , *ALGORITHMS , *DATA mining
- Abstract
Alternative methods are available for a wide range of medical conditions. Ideally, doctors would have a tool that analyses their patients' symptoms and suggests the most accurate diagnosis and treatment plan. Artificial intelligence uses decision trees to predict and classify large datasets. A decision tree is a versatile prediction model whose main purpose is to learn from observations and logic. Rule-based prediction systems represent and categorize events. We discuss the basic properties of decision trees and successful medical alternatives to the classic induction strategy. The study reviews some of the most important medical applications of decision trees (classification). We show researchers and managers how to accurately assess hospital and epidemic management behaviour. Additionally, we discuss decision trees and their applications. The results showed the effectiveness of decision trees in processing medical data using internet of things (IoT) and artificial intelligence technologies in medical applications. Accordingly, the researchers recommend using these technologies in other fields of study. [ABSTRACT FROM AUTHOR]
- Published
- 2022
294. Real-time discrete event simulation: a framework for an intelligent expert system approach utilising decision trees
- Author
Divya Tiwari, Neha Prajapat, Ashutosh Tiwari, Windo Hutabarat, and Christopher Turner
- Subjects
0209 industrial biotechnology ,Computer science ,Mechanical Engineering ,Flexible manufacturing system ,Decision tree ,02 engineering and technology ,computer.software_genre ,Multi-objective optimization ,Industrial and Manufacturing Engineering ,Expert system ,Computer Science Applications ,Random forest ,020901 industrial engineering & automation ,Control and Systems Engineering ,Resource allocation ,Scenario analysis ,Data mining ,Industrial and production engineering ,Discrete event simulation ,computer ,Software ,Lead time ,Test data
- Abstract
This paper explores the use of discrete event simulation (DES) for real-time decision making based on data potentially streamed from production line sensors. Technological innovations in data collection and an increasingly competitive global market have led manufacturing companies to apply discrete event simulation more widely in recent years. Scenario analysis and optimisation methods are often applied to these simulation models to improve objectives such as cost, profit and throughput. The literature review identified two key research gaps: the lack of example cases in which multi-objective optimisation methods have been applied to simulation models, and the need for a framework to visualise the relationship between the inputs and outputs of simulation models. A framework is presented that enables the optimisation of DES models for multiple objectives simultaneously, using design of experiments and meta-models to create a Pareto front of solutions. The results show that the resource allocation meta-model provides acceptable prediction accuracy, whilst the lead time meta-model was not able to provide accurate predictions. Regression trees are proposed to help stakeholders understand the relationships between input and output variables. The framework uses regression and classification trees with overlaid values for multiple objectives, and random forests to improve prediction accuracy for new points. A real-life test case involving a turbine assembly process is presented to illustrate the use and validity of the framework. The generated regression tree expressed a general trend by demonstrating relationships between the input variables and two conflicting objectives. Random forests were implemented to produce more accurate predictions and gave a mean square error of ~0.066 on the training data and ~0.081 on the test data.
- Published
- 2020
295. Forest data visualization and land mapping using support vector machines and decision trees
- Author
Sujatha Radhakrishnan, Jyotir Moy Chatterjee, Aarthy Seshadri Lakshminarayanan, and D. Jude Hemanth
- Subjects
Box plot ,010504 meteorology & atmospheric sciences ,Computer science ,Rule induction ,business.industry ,Decision tree ,Confusion matrix ,Terrain ,010502 geochemistry & geophysics ,computer.software_genre ,01 natural sciences ,Support vector machine ,Advanced Spaceborne Thermal Emission and Reflection Radiometer ,Data visualization ,General Earth and Planetary Sciences ,Data mining ,business ,computer ,0105 earth and related environmental sciences
- Abstract
Forests play a vital role in regulating climate, absorbing carbon dioxide, the hydrological cycle, and conserving water, soil and biodiversity, and they help mitigate natural disasters. Various remote sensors now collect high-resolution satellite images, which helps tackle the global challenge of mapping forests in remote areas. Each landscape grows different types of trees, which in turn support part of a country's economy. This paper applies visualization (box plots and heat maps) and machine learning (ML) to classify forest land in a terrain dataset acquired by the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) imaging instrument and to gain insight into the accumulated data. The accuracies obtained with different ML techniques were: Support Vector Machine (SVM), 95.4%; Logistic Regression (LR), 94.5%; K-Nearest Neighbor (K-NN), 93.7%; Decision Tree (DT), 89.5%; Stochastic Gradient Descent (SGD), 92.4%; and CN2 Rule Induction (RI), 85.3%. These results are appreciable for forest mapping and were substantiated with confusion matrices and ROC curves. We also obtained the DT and rules for the considered dataset.
- Published
- 2020
296. Improve Intrusion Detection Using Grasshopper Optimization Algorithm and Decision Trees
- Author
Seiyed Hosseiny, Morad Derakhshan, and Akram Isvand Rahmani
- Subjects
Optimization algorithm ,biology ,Computer science ,Decision tree ,Intrusion detection system ,Data mining ,Safety, Risk, Reliability and Quality ,computer.software_genre ,Grasshopper ,biology.organism_classification ,computer ,General Environmental Science
- Published
- 2020
297. From Data to Insights: Advancing Anomaly Detection with a Robust Modeling Pipeline.
- Author
Trardi, Youssef, Ananou, Bouchra, and Ouladsine, Mustapha Ouladsine
- Subjects
DATA mining ,DECISION trees ,DECISION making ,MACHINE learning ,DIGITAL technology
- Abstract
The field of multivariate time series (MTS) anomaly detection has gained significant importance in data mining and industrial applications. However, existing studies often overlook crucial variables and fail to address their relationships, resulting in information loss and false alarms. To address this, we propose a supervised approach for detecting anomalies in MTS using multi-level processing techniques and decision tree learning algorithms. We apply our approach to detect abnormal wafers in an intermediate manufacturing chain, utilizing feedback from an eleven-sensor array over 150-second intervals. Each sensor's time series is processed in two layers: the first layer applies dimensionality reduction techniques, and the second layer employs a meta-transformer to evaluate feature importance. We train decision trees and boosting methods on the processed data and validate the models using a dataset of 7000 samples. Our results demonstrate the effectiveness of our approach in supporting decision-making for detecting abnormal wafers in semiconductor manufacturing processes. [ABSTRACT FROM AUTHOR]
- Published
- 2023
298. Data Mining in Social Sciences: A Decision Tree Application Using Social and Political Concepts.
- Author
Massou, Efthalia, Prodromitis, Gerasimos, and Papastamou, Stamos
- Subjects
*DECISION trees , *DATA mining , *SOCIAL psychology , *POLITICAL attitudes , *SOCIAL attitudes , *LATENT variables , *PREDICTIVE validity
- Abstract
In this paper, we investigated the utility of data mining for classifying individuals into predefined categories of a target variable based on their social and political attitudes. Data collected for a social psychology study conducted in Greece in 1994 were used for this purpose. We established the theoretical background of our analysis through explanatory factor analysis. We ran the decision tree algorithm CHAID to build a predictive model that classifies the study participants by their attitude toward physical and symbolic violence. The CHAID algorithm produced a decision tree that was easily interpreted and revealed meaningful predictive patterns. It showed satisfactory predictive ability and offers a promising alternative for social psychology data analysis. To the best of our knowledge, there is no other evidence in the literature that decision tree algorithms can be used to identify latent variables. [ABSTRACT FROM AUTHOR]
- Published
- 2022
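CHAID selects splits with chi-square tests of independence between each candidate predictor and the target. A minimal sketch of the core statistic, assuming a predictor whose categories have already been merged and a table with no empty rows or columns; CHAID's category-merging steps, Bonferroni adjustment and p-value computation are omitted, and the counts are hypothetical.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a contingency table whose rows are
    predictor categories and whose columns are target classes.
    Assumes every row and column total is positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for an r x c contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical counts: two predictor categories by two target classes.
counts = [[30, 10], [10, 30]]
print(pearson_chi_square(counts), degrees_of_freedom(counts))  # 20.0 1
```

A statistic of 20.0 with 1 degree of freedom indicates a strong association in this toy table; CHAID would compare the adjusted p-values of such tests across predictors to choose the split at each node.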
299. Missing data imputation using decision trees and fuzzy clustering with iterative learning
- Author
Susan E. Bedingfield, Sanaz Nikfalazar, Chung-Hsing Yeh, and Hadi Akbarzadeh Khorshidi
- Subjects
Fuzzy clustering ,Computer science ,Iterative learning control ,Decision tree ,02 engineering and technology ,computer.software_genre ,Missing data ,Human-Computer Interaction ,Artificial Intelligence ,Hardware and Architecture ,Robustness (computer science) ,020204 information systems ,Missing data imputation ,0202 electrical engineering, electronic engineering, information engineering ,Data mining ,Imputation (statistics) ,computer ,Categorical variable ,Software ,Information Systems
- Abstract
Various imputation approaches have been proposed to address missing values in data mining and machine learning applications. To improve the accuracy of missing data imputation, this paper proposes a new method, called DIFC, that integrates the merits of decision trees and fuzzy clustering into an iterative learning approach. To compare the performance of DIFC against five effective imputation methods, extensive experiments are conducted on six widely used datasets with numerical and categorical missing data, and with various amounts and types of missing values. The experimental results show that DIFC outperforms the other methods in imputation accuracy. Further experiments on the effect of missing value types demonstrate the robustness of DIFC in dealing with different types of missing values. This paper contributes to missing data imputation research by providing an accurate and robust method.
- Published
- 2019
300. A new methodology to speed-up fuel lattice design optimization using decision trees and new objective functions.
- Author
Ortiz-Servin, Juan José, Pelta, David A., Cadenas, José Manuel, and Castillo, Alejandro
- Subjects
*BOILING water reactors , *DECISION trees , *RECURRENT neural networks , *LATTICE constants
- Abstract
• In this paper, a methodology to speed up fuel lattice optimization is shown.
• A recurrent neural network was used to explore the solution space.
• Two new objective functions were studied.
• Decision trees were constructed to estimate variables involved in the objective functions.
• The methodology was tested, obtaining good NN-DT performance.
In this paper, a new methodology to speed up fuel lattice design optimization in a boiling water reactor (BWR) is explored. In previous works, fuel lattice optimization was performed using the LPPF (Local Power Peaking Factor) at the beginning of the fuel lattice life. However, undesirable LPPF vs. fuel lattice exposure behaviors were observed, so LPPF vs. fuel lattice exposure was calculated throughout the fuel lattice burnup life. From a computational point of view, this calculation is very expensive when done with the CASMO-4 code. A new methodology to speed up the optimization was proposed based on two aspects: on one hand, objective functions that take into account LPPF vs. fuel lattice exposure and residual gadolinia; on the other hand, decision trees that estimate some fuel lattice parameters quickly and reliably. The decision tree estimates were verified to be reliable enough to be used in an optimization process to discard bad fuel lattice configurations and speed up the process. At the end of the process, the CASMO-4 code is used to calculate the final fuel lattice parameters. In this way, fuel lattice optimization time was reduced from 6 hours to 15 minutes while obtaining good LPPF vs. exposure behaviors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
Discovery Service for Jio Institute Digital Library