Descriptor: "Decision tree" / Topic: 02 engineering and technology - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Decision tree"' returned 8,275 results in total.

1. Analysis of different machine learning classifiers on MP election commission and breast cancer big dataset

Author: Priyank Jain and Shriya Sahu
Subjects: 010302 applied physics, Learning classifier system, Computer science, business.industry, Big data, Decision tree, 02 engineering and technology, General Medicine, 021001 nanoscience & nanotechnology, Machine learning, computer.software_genre, 01 natural sciences, Class (biology), Support vector machine, Naive Bayes classifier, ComputingMethodologies_PATTERNRECOGNITION, Software, 0103 physical sciences, Scalability, Artificial intelligence, 0210 nano-technology, business, computer
Abstract: This paper is a unique effort to resolve the scaling issues of machine learning using the multi-node environment of Big Data. For this purpose, it combines machine learning with big data. Machine learning is a branch of Artificial Intelligence that trains computers to learn without being explicitly programmed; it is concerned with developing computer programs and software that behave according to the input dataset. In machine learning, classification is used to identify the class of instances: a new observation is assigned to the category, class, or group to which it belongs on the basis of the training dataset, which contains instances whose classes are already known. An algorithm that implements classification is known as a classifier. The main aim of this paper is to evaluate how different classifiers perform on two different datasets: a real-time dataset of MP Election nominations (2018) and the standard Breast Cancer dataset released by the University of Wisconsin (1995). Various classifiers are available, such as Decision Trees, K-Nearest Neighbors, Support Vector Machine, Logistic Regression, and Naive Bayes. Different classifiers behave differently on different datasets, giving different accuracy depending on the kind and size of the dataset, so we used several classifiers in this work. When the classifiers were applied to the real-time MPSEC dataset, accuracy dropped; by contrast, on the standard Breast Cancer dataset the results showed outstanding accuracy. This research reveals interesting patterns when real-time and standard datasets are studied with different machine learning classifiers in combination with big data. It also incorporates parallelization in a multi-node environment to deal with scalability issues.
Published: 2023

2. Forecasting Airport Transfer Passenger Flow Using Real-Time Data and Machine Learning

Author: Yael Grushka-Cockayne, Bert De Reyck, and Xiaojia Guo
Subjects: 050208 finance, 021103 operations research, Computer science, Strategy and Management, 05 social sciences, 0211 other engineering and technologies, Decision tree, ComputerApplications_COMPUTERSINOTHERSYSTEMS, 02 engineering and technology, Management Science and Operations Research, computer.software_genre, Flow (mathematics), Transfer (computing), 0502 economics and business, Real-time data, Data mining, computer
Abstract: Problem definition: Airports and airlines have been challenged to improve decision making by producing accurate forecasts in real time. We develop a two-phased predictive system that produces forecasts of transfer passenger flows at an airport. In the first phase, the system predicts the distribution of individual transfer passengers’ connection times. In the second phase, the system samples from the distribution of individual connection times and produces distributional forecasts for the number of passengers arriving at the immigration and security areas. Academic/practical relevance: To our knowledge, this work is the first to apply machine learning for predicting real-time distributional forecasts of journeys in an airport using passenger level data. Better forecasts of these journeys can help optimize passenger experience and improve airport resource deployment. Methodology: The predictive system developed is based on a regression tree combined with copula-based simulations. We generalize the tree method to predict distributions, moving beyond point forecasts. We also formulate a newsvendor-based resourcing problem to evaluate decisions made by applying the new predictive system. Results: We show that, when compared with benchmarks, our two-phased approach is more accurate in predicting both connection times and passenger flows. Our approach also has the potential to reduce resourcing costs at the immigration and transfer security areas. Managerial implications: Our predictive system can produce accurate forecasts frequently and in real time. With these forecasts, an airport’s operating team can make data-driven decisions, identify late passengers, and assist them to make their connections. The airport can also update its resourcing plans based on the prediction of passenger flows. Our predictive system can be generalized to other operations management domains, such as hospitals or theme parks, in which customer flows need to be accurately predicted. 
Funding: This work was funded by Eurocontrol (APOC Business Process Reengineering Big Data Study) [Grant 15-220643-AWP6.3.1]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2021.0975.
Published: 2022
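
The abstract mentions a newsvendor-based resourcing problem evaluated on distributional forecasts. As a minimal sketch, assuming the textbook single-period newsvendor with a per-unit understaffing cost `cu` and overstaffing cost `co` (both hypothetical parameters, not taken from the paper), the cost-minimizing capacity is the critical-fractile quantile cu/(cu+co) of the forecast distribution, which can be read directly off simulated passenger counts. This is the classic newsvendor rule, not the authors' exact formulation.

```python
import math
from typing import Sequence


def newsvendor_level(samples: Sequence[float], cu: float, co: float) -> float:
    """Return the critical-fractile quantile of sampled demand.

    cu: assumed cost per unit of unmet demand (e.g. a passenger left waiting).
    co: assumed cost per unit of excess capacity (e.g. idle staff).
    """
    if not samples:
        raise ValueError("need at least one sample")
    if cu <= 0 or co <= 0:
        raise ValueError("costs must be positive")
    critical_ratio = cu / (cu + co)
    ordered = sorted(samples)
    # Smallest sample whose empirical CDF reaches the critical ratio.
    index = max(math.ceil(critical_ratio * len(ordered)) - 1, 0)
    return ordered[index]
```

For example, if understaffing is assumed to be three times as costly as overstaffing (`cu=3, co=1`), the function returns the 75th percentile of the simulated counts.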

3. Analyzing acetylene adsorption of metal–organic frameworks based on machine learning

Author: Peisong Yang, Lei Liu, Xin Lai, Qingyuan Yang, Duli Yu, and Gang Lu
Subjects: Quantitative structure–activity relationship, Mean squared error, Renewable Energy, Sustainability and the Environment, Computer science, business.industry, Decision tree, 02 engineering and technology, 010402 general chemistry, 021001 nanoscience & nanotechnology, Machine learning, computer.software_genre, 01 natural sciences, 0104 chemical sciences, Support vector machine, Adsorption, Feature (machine learning), Metal-organic framework, Artificial intelligence, 0210 nano-technology, business, computer, Topology (chemistry)
Abstract: Metal–organic frameworks (MOFs) containing open metal sites are important materials for acetylene (C2H2) adsorption. However, searching for suitable MOFs by molecular simulation is inefficient or even impossible in the nearly infinite space of MOFs. Therefore, machine learning (ML) methods are adopted for material screening and for predicting high-performance MOFs. In this paper, architecture, chemical, and structural features are used to analyze the C2H2 adsorption performance of the MOFs. Different ML algorithms are applied to perform classification and regression analysis of the factors affecting material adsorption. Using the decision tree (DT) algorithm, it is found that PV, GSA, and Cu-OMS alone are sufficient to determine the high adsorption of the MOFs. Furthermore, the influence of topology on the performance of MOFs is obtained. Gradient Boosting Decision Tree (GBDT), Support Vector Machine (SVM), and Back Propagation Neural Network (BPNN) models are introduced to analyze the quantitative structure–property relationship (QSPR) between C2H2 adsorption and the features of MOFs. The GBDT model gives the most accurate predictions, with R² = 0.93 and RMSE = 11.58. In addition, the GBDT model is used for feature analysis, and the contribution of each feature to performance is obtained, which is of great significance for the design and analysis of MOFs. The successful application of ML to MOF screening greatly reduces calculation time and provides an important reference for the design and synthesis of new MOFs.
Published: 2022
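
The GBDT results are reported as R² and RMSE. For readers less familiar with these metrics, a minimal sketch of their standard definitions in plain Python is shown below; it does not reproduce the paper's data or units.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

RMSE is in the units of the target (here, adsorption uptake), whereas R² is unitless, which is why both are usually reported together.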

4. A Comparative Approach for MI-Based EEG Signals Classification Using Energy, Power and Entropy

Author: Subhasis Bhaumik, Akash Kumar Bhoi, and Ganesh Roy
Subjects: business.industry, Computer science, 0206 medical engineering, Feature extraction, Biomedical Engineering, Biophysics, Decision tree, Pattern recognition, 02 engineering and technology, Quadratic classifier, Linear discriminant analysis, 020601 biomedical engineering, Haar wavelet, 030218 nuclear medicine & medical imaging, 03 medical and health sciences, Naive Bayes classifier, Statistical classification, 0302 clinical medicine, Artificial intelligence, Entropy (energy dispersal), business
Abstract: Objective: A primary initial task in Brain-Computer Interface (BCI) research is to extract the best feature set from a raw EEG (electroencephalogram) signal so that it can be used to classify two or more different events. The main goal of the paper is to carry out a comparative analysis of different feature extraction techniques and classification algorithms. Materials and methods: In this investigation, four different methodologies were adopted to classify the recorded MI (motor imagery) EEG signals, and their comparative study is reported. Haar Wavelet Energy (HWE), Band Power, Cross-correlation, and Spectral Entropy (SE) based Cross-correlation feature extraction techniques were used to obtain the necessary feature sets from the raw EEG signals. Four different machine learning algorithms, viz. LDA (Linear Discriminant Analysis), QDA (Quadratic Discriminant Analysis), Naive Bayes, and Decision Tree, were used to classify the features. Results: The best average classification accuracies are 92.50%, 93.12%, 72.26%, and 98.71% using the four methods. These results were further compared with some recent existing methods. Conclusion: The comparative results indicate a significant improvement in accuracy of the proposed methods over the existing ones. Hence, this work can guide the selection of the best feature extraction method and classifier algorithm for MI-based EEG signals.
Published: 2022
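
Haar Wavelet Energy (HWE) is one of the feature extraction techniques compared. A minimal sketch, assuming a plain multi-level orthonormal Haar decomposition and taking the energy (sum of squares) of the detail coefficients at each level, plus the final approximation energy, as the feature vector; the decomposition depth, channels, and normalization used in the paper are not stated in the abstract.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform; the signal length must be even."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]  # low-pass (averages)
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]  # high-pass (differences)
    return approx, detail


def haar_energy_features(signal: Sequence[float], levels: int) -> List[float]:
    """Detail-coefficient energy at each level, followed by the final approximation energy."""
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in current))
    return features
```

Because this Haar transform is orthonormal, the features sum to the total energy of the input window, so each entry can be read as the share of signal energy in one frequency band.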

5. Optimization of decision trees using modified African buffalo algorithm

Author: Dharmpal D. Doye and Archana R. Panhalkar
Subjects: Optimization problem, General Computer Science, Optimization algorithm, Computer science, Decision tree learning, Decision tree, Volume (computing), 020206 networking & telecommunications, 02 engineering and technology, Overfitting, 0202 electrical engineering, electronic engineering, information engineering, Extensive data, 020201 artificial intelligence & image processing, Algorithm, Optimal decision
Abstract: Decision tree induction is a simple yet powerful learning and classification tool for discovering knowledge from databases. The volume of data in databases is growing to very large sizes, in both the number of attributes and the number of instances. Important limitations of decision trees on such extensive data are instability, local decisions, and overfitting. The simple, effective, and non-convergence nature of the African Buffalo Optimization (ABO) algorithm makes it suitable for solving complex optimization problems. In this paper, we propose the African Buffalo Optimized Decision Tree (ABODT) algorithm, which creates globally optimized decision trees using the intelligent, collective behaviour of African buffalo. The modified African Buffalo Optimization algorithm is used to create efficient and optimal decision trees. To evaluate the efficiency of the proposed algorithm, experiments are performed on 15 standard datasets of various sizes and domains from the UCI Machine Learning Repository. Results show that the ABODT algorithm globally optimizes decision trees, increases accuracy, and reduces tree size. These optimized trees are more stable and efficient than conventional decision trees.
Published: 2022

6. Music rhythm tree based partitioning approach to decision tree classifier

Author: V. Umadevi, Vijayakumar Kadappa, Ajith Abraham, and Shankru Guggari
Subjects: General Computer Science, Computer science, business.industry, Decision tree learning, Decision tree, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Class (biology), Random forest, Tree (data structure), Pattern recognition (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, AdaBoost, Artificial intelligence, business, Statistical hypothesis testing
Abstract: The decision tree is a widely used non-parametric technique in machine learning, data mining, and pattern recognition. It is simple to understand and interpret, but it faces challenges such as handling high-dimensional and class-imbalanced datasets, overfitting, and instability. To overcome some of these issues, vertical partitioning approaches such as serial partitioning and theme-based partitioning have been used in the literature. A vertical partitioning approach divides the feature set into subsets of features (blocks) and uses these subsets for subsequent tasks. In this work, we use the ideas of the music rhythm tree to propose a novel vertical partitioning technique. It orders the features by their average correlation strength before partitioning the feature set. The proposed method is shown to be superior, achieving on average 13.8%, 6%, 9.8%, 19.7%, 9.4%, and 29.4% higher classification accuracy than C4.5, Random Forest, Bagging, AdaBoost, an ensemble technique, and a vertical partitioning technique, respectively. Our empirical results on 15 datasets demonstrate that the proposed vertical partitioning method is more stable and better at handling class-imbalanced data. Finally, popular statistical tests are conducted to validate the statistical significance of the results of the proposed method.
Published: 2022
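
The method orders features by their average correlation strength before partitioning them into blocks. The music-rhythm-tree partitioning rule itself is not described in the abstract, so the sketch below only illustrates the ordering step under stated assumptions: "correlation strength" is taken to be the mean absolute Pearson correlation with every other feature, and the ordered features are simply cut into `k` contiguous blocks, a hypothetical stand-in for the paper's partitioning rule.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def order_and_partition(columns: List[Sequence[float]], k: int) -> List[List[int]]:
    """Rank features by mean |correlation| with all other features, then cut into k blocks."""
    m = len(columns)
    if m < 2 or not 1 <= k <= m:
        raise ValueError("need at least two features and 1 <= k <= number of features")
    strength = [
        sum(abs(pearson(columns[i], columns[j])) for j in range(m) if j != i) / (m - 1)
        for i in range(m)
    ]
    # Most strongly correlated features first.
    order = sorted(range(m), key=lambda i: strength[i], reverse=True)
    # Near-equal contiguous blocks; every block is non-empty because k <= m.
    return [order[b * m // k:(b + 1) * m // k] for b in range(k)]
```

Each returned block is a list of feature indices that could then be used to train a separate tree on that vertical slice of the data.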

7. How fair can we go in machine learning? Assessing the boundaries of accuracy and fairness

Author: Ana Valdivia, Jorge Casillas, and Javier Sánchez-Monedero
Subjects: Sociotechnical system, business.industry, Computer science, Decision tree, 02 engineering and technology, Machine learning, computer.software_genre, Multi-objective optimization, Theoretical Computer Science, Human-Computer Interaction, Artificial Intelligence, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Software
Abstract: Fair machine learning has focused on developing equitable algorithms that address discrimination. Yet many of these fairness-aware approaches aim to obtain a single solution to the problem, which leads to a poor understanding of the statistical limits of bias mitigation interventions. In this study, a novel methodology is presented to explore the tradeoff between accuracy and fairness in terms of a Pareto front. To this end, we propose a multiobjective framework that seeks to optimize both measures. The experimental framework focuses on logistic regression and decision tree classifiers, since they are well known in the machine learning community. We show experimentally that our method can make classifiers fairer at a small cost in classification accuracy. We believe that our contribution will help stakeholders of sociotechnical systems assess how far they can go in being both fair and accurate, thus supporting better decision making where machine learning is used.
Published: 2023
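
The study frames the accuracy/fairness tradeoff as a Pareto front. As a minimal sketch, assuming each candidate classifier has already been scored on two objectives that are both to be maximized (accuracy and a fairness score; the specific fairness metric is an assumption), the non-dominated set can be extracted as follows. This is a generic Pareto filter, not the authors' multiobjective optimizer.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (accuracy, fairness score); higher is better for both


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the distinct non-dominated points, ordered from most to least accurate."""
    front = {p for p in points if not any(dominates(q, p) for q in points)}
    return sorted(front, key=lambda p: p[0], reverse=True)
```

Fairness metrics such as a demographic-parity gap are usually expressed as a disparity to be minimized; such a metric would need to be converted (for example, to 1 minus the disparity) before using this helper.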

8. Prognosis and Treatment Prediction of Type-2 Diabetes Using Deep Neural Network and Machine Learning Classifiers

Author: Md. Kowsher, Mahbuba Yesmin Turaba, Tanvir Sajed, and Mahfuzur Rahman
Subjects: Chronic metabolic disorder, FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial neural network, Computer science, business.industry, Decision tree, 020206 networking & telecommunications, 02 engineering and technology, Type 2 diabetes, Overfitting, Machine learning, computer.software_genre, medicine.disease, Random forest, Machine Learning (cs.LG), Support vector machine, Naive Bayes classifier, Diabetes mellitus, 0202 electrical engineering, electronic engineering, information engineering, medicine, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: Type 2 diabetes is a fast-growing, chronic metabolic disorder caused by imbalanced insulin activity. Because many people suffer from it, access to proper treatment is necessary to control the problem. Most patients are unaware of the health complications, symptoms, and risk factors before developing diabetes. The aim of this research is a comparative study of seven machine learning classifiers and an artificial neural network method for predicting the detection and treatment of diabetes with high accuracy, in order to identify and treat diabetes patients at an early stage. Our training and test data comprise the information of 9483 diabetes patients. The training dataset is large enough to counter overfitting and support highly accurate test performance. Using performance measures such as accuracy and precision, we find that the deep ANN performs best, achieving 95.14% accuracy and outperforming all the other tested machine learning classifiers. We hope our high-performing model can be used by hospitals to predict diabetes and can drive research into more accurate prediction models.
Published: 2023

9. Architecture and optimization of data mining modeling for visualization of knowledge extraction: Patient safety care

Author: Gebeyehu Belay Gebremeskel, Birhanu Hailu, and Belete Biazen
Subjects: General Computer Science, Computer science, Process (engineering), Dynamic data, 020206 networking & telecommunications, Context (language use), Architectures, 02 engineering and technology, QA75.5-76.95, computer.software_genre, Data structure, Visualization, Data modeling, Knowledge extraction, Pattern analysis, Clinical datasets, Electronic computers. Computer science, 0202 electrical engineering, electronic engineering, information engineering, Decision tree, 020201 artificial intelligence & image processing, Data mining, Dimension (data warehouse), computer
Abstract: Visualizing the knowledge extraction process is a front-line means of revealing the detailed process and data structure, and it is an advanced technique for presenting data models. However, healthcare processes are challenging and dynamic, which makes it difficult to gain clear insight into patient care. In this paper, we propose a new architecture and optimization approach for data mining modeling to visualize knowledge extraction, analyzing clinical datasets to identify the determinant attributes through modeling techniques. The architecture for visualizing the knowledge extraction process is a systematic approach that supports users in overcoming the challenges of visualization techniques. The proposed approach can dynamically handle and analyze large-scale data across its dimensions and contexts. Variables are characterized using various techniques to detect the determinant variables and their influencing circumstances. We focused on modeling-based visualization in terms of model representation, factor interaction, and integration. The detection process was tested with different approaches, and its justification is discussed in Section 5 of the paper. The findings show that advanced, dynamic data mining modeling techniques make it possible to integrate applications with domain contexts for an optimal and understandable decision process. The strength of this approach lies in the depth of its visualization of the knowledge extraction process and its understandability for users with different backgrounds and circumstances. It also supports inference for architecture-based modeling and visualization of large-scale data. Researchers, physicians, experts, and other users may draw on these ideas and findings.
Published: 2022

10. Disjunctive Fuzzy Neural Networks: A New Splitting-Based Approach to Designing a T–S Fuzzy Model

Author: Witold Pedrycz, Xiaoqian Chen, Wen Yao, Yong Zhao, and Ning Wang
Subjects: Network architecture, business.industry, Computer science, Applied Mathematics, Fuzzy set, Decision tree, 02 engineering and technology, Grid, Fuzzy logic, Computational Theory and Mathematics, Artificial Intelligence, Control and Systems Engineering, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Greedy algorithm, Curse of dimensionality, Interpretability
Abstract: This paper proposes a new network approach to implementing T-S fuzzy models, referred to as disjunctive fuzzy neural networks (DJFNN). The proposed DJFNN involves a novel network architecture and a greedy learning algorithm. Unlike existing grid-based and clustering-based network architectures, the proposed architecture adds an OR neural layer between the fuzzification layer and the rule layer. In this way, the implied constraint between the number of rules and the number of fuzzy labels is removed, so that the curse of dimensionality can be overcome and more interpretable models can be formed. Further, inspired by the core algorithm for building a decision tree, a top-down, non-backtracking, greedy algorithm is proposed to learn the unknown parameters of the networks. The input space is split into progressively smaller subspaces along predefined fuzzy grids in a supervised manner, while the associated conditions of the T-S fuzzy model are identified. The greedy algorithm is applicable to high-dimensional problems, since there is no exponential growth in time or space as the dimensionality increases. The new network architecture and greedy learning algorithm make the proposed DJFNN a regression model with high interpretability and good prediction capability, particularly suitable for high-dimensional problems. DJFNN was evaluated on a synthetic dataset and 28 real-world datasets and compared with classical and state-of-the-art methods through non-parametric statistical tests. The results confirmed the effectiveness of DJFNN in terms of accuracy, interpretability, and computational cost.
Published: 2022
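
The greedy learning algorithm is described as inspired by the core of decision tree building: top-down, non-backtracking splits. The sketch below shows only that classic ingredient, a greedy search for the single threshold on one input that minimizes the summed squared error of a regression split. It is not DJFNN, which splits along predefined fuzzy grids rather than at crisp thresholds.

```python
from typing import Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Greedily pick the threshold t (x <= t vs x > t) with the lowest total SSE.

    Returns (threshold, total_sse), or None if x has fewer than two distinct values.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)  # keep the split; never revisited (no backtracking)
    return best
```

Applying `best_split` recursively to each side yields a top-down tree; the greedy, non-backtracking choice is what keeps the cost from growing exponentially with the number of inputs.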

11. Prediction of second-order rate constants between carbonate radical and organics by deep neural network combined with molecular fingerprints

Author: Peizhe Sun, Hong Yao, Shangyu Li, Ruochun Zhang, and Huixin Ma
Subjects: Artificial neural network, Decision tree, 02 engineering and technology, General Chemistry, 010402 general chemistry, 021001 nanoscience & nanotechnology, 01 natural sciences, 0104 chemical sciences, Support vector machine, Reaction rate, chemistry.chemical_compound, Reaction rate constant, chemistry, Molecular descriptor, Yield (chemistry), Carbonate, 0210 nano-technology, Biological system
Abstract: The carbonate radical is among the most important environmentally relevant reactive species governing the transformation and fate of pharmaceutical contaminants (PCs). However, the reaction rate constants between the carbonate radical and most PCs have not been experimentally determined, and quantitative structure-activity relationships (QSARs) have not been established for rate estimation. This study applied the MaxMin data processing method and used molecular fingerprints (MF) as the input of a deep neural network (DNN) to predict the rate constants between the carbonate radical and organic compounds. The MF parameters and the hyper-structure of the DNN were adjusted to yield satisfactory accuracy of rate prediction. An MF vector length of 512 bits with a radius of 1, together with 5 hidden layers, gave the best performance. The optimized MaxMin-MF-DNN model was compared with some of the most commonly used QSARs and machine learning methods, including random data splitting, molecular descriptors, support vector machine, decision tree, etc. Results showed that the MF-DNN model outperformed the other methods, increasing prediction accuracy by more than 10%. Applying this MF-DNN model, we estimated reaction rates between the carbonate radical and pharmaceuticals used in human medicine (1576) and veterinary practice (390). Among them, 46 drugs were identified as fast-reacting compounds, suggesting that the carbonate radical plays an important role in their environmental fate.
Published: 2022
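
The abstract credits a "MaxMin" data processing step, contrasted with random data splitting. Assuming this refers to the common MaxMin diversity-picking algorithm (an interpretation the abstract does not confirm), a minimal sketch with Tanimoto distance on fingerprints stored as sets of on-bit indices is shown below; the picked molecules could then form, for example, a structurally diverse training set.

```python
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """1 - |a & b| / |a | b|; two empty fingerprints are treated as identical."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def maxmin_pick(fps: Sequence[Fingerprint], n_pick: int, seed_index: int = 0) -> List[int]:
    """Greedy MaxMin selection: repeatedly add the item farthest from everything already picked."""
    if not 0 < n_pick <= len(fps):
        raise ValueError("n_pick must be between 1 and the number of fingerprints")
    picked = [seed_index]
    # min_dist[i] is the distance from item i to its nearest already-picked item.
    min_dist = [tanimoto_distance(fp, fps[seed_index]) for fp in fps]
    while len(picked) < n_pick:
        candidates = [i for i in range(len(fps)) if i not in picked]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[nxt]))
    return picked
```

The seed choice is arbitrary here (`seed_index=0`); different seeds give different but similarly diverse subsets.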

12. Feature based classification of voice based biometric data through Machine learning algorithm

Author: Samiya Shakil, Taskeen Zaidi, and Deepak Arora
Subjects: 010302 applied physics, Biometrics, Computer science, business.industry, Dimensionality reduction, Decision tree learning, Big data, Decision tree, Sample (statistics), 02 engineering and technology, 021001 nanoscience & nanotechnology, Machine learning, computer.software_genre, 01 natural sciences, Identification (information), 0103 physical sciences, Artificial intelligence, 0210 nano-technology, business, computer, Algorithm, Selection (genetic algorithm)
Abstract: In the era of big data and growing artificial intelligence, the need for biometric identification is increasing rapidly. Digitalization and the recent pandemic crisis have boosted the need for authorized identification, which biometric identification can fulfil. Our paper focuses on checking the identification accuracy of the machine learning algorithm REPTree on a selected biometric dataset, deployed and evaluated in the data mining tool WEKA. Our target is an accuracy of at least 95% in classifying the given sample data into the values of our target variable, i.e., male and female. REPTree is a decision tree classification algorithm that works on the same principle as C4.5, with the distinctive ability to generate both kinds of output, i.e., discrete and continuous. The choice of algorithm helped us achieve higher accuracy, and the dataset was made suitable through some required modifications and preprocessing, including dimensionality reduction filters.
Published: 2022

13. Prediction of Cancer Disease using Machine learning Approach

Author: F.J. Shaikh and D.S. Rao
Subjects: 010302 applied physics, Artificial neural network, business.industry, Computer science, Deep learning, Decision tree, Cancer, 02 engineering and technology, Disease, 021001 nanoscience & nanotechnology, Machine learning, computer.software_genre, medicine.disease, 01 natural sciences, Field (computer science), Support vector machine, 0103 physical sciences, medicine, Artificial intelligence, 0210 nano-technology, business, computer, Biomedicine
Abstract: Cancer has been identified as a heterogeneous condition with many different subtypes. Timely screening and treatment of a given cancer type is now a requirement in early cancer research, because it supports the medical treatment of patients. Many research teams have studied the application of ML and Deep Learning methods in biomedicine and bioinformatics to classify cancer patients into high- or low-risk categories. These techniques have therefore been used to model the development and treatment of cancer. It is important that ML tools are capable of detecting key features in complex datasets. Many of these methods, such as artificial neural networks (ANNs), support vector machines (SVMs), and decision trees (DTs), are widely used to develop predictive models for cancer treatment. While ML methods can help us understand cancer progression, an adequate level of validation is needed before they can be considered for everyday clinical practice. In this study, the ML and DL approaches used in cancer progression modeling are reviewed. The predictions addressed are mostly linked to the specific ML method, the inputs, and the supervision of data samples.
Published: 2022

14. Automatic recognition of Indian vehicles license plates using machine learning approaches

Author: Ashok Kumar Sahoo
Subjects: 010302 applied physics, Artificial neural network, business.industry, Computer science, Character (computing), Feature extraction, Decision tree, Pattern recognition, 02 engineering and technology, 021001 nanoscience & nanotechnology, 01 natural sciences, Automation, Prewitt operator, 0103 physical sciences, Segmentation, Artificial intelligence, 0210 nano-technology, business, Connected-component labeling
Abstract: This paper presents a method for automating the recognition of Indian vehicle number plates. Many vehicles of varying size and structure ply on Indian roads, so number plates vary considerably as well. Although there are strict norms for number plate design, many deviations can be seen in the actual number plates of vehicles. Number plates are also written in many Indian languages, although the norm prescribed by the licensing authority is Hindu-Arabic numerals with Latin letters. Number plate designs also vary with the type of vehicle, such as personal, commercial, or public transport. An automatic number plate recognition system must take these factors into account. The design steps include segmenting the number plate, extracting characters, and recognizing the segmented characters in order to obtain the complete information from the plate. The work in this paper focuses on Hindu-Arabic numerals with Latin letters. For feature extraction, the Prewitt filter is used to extract characters from vehicle number plates, and connected-component analysis is used to segment the characters. Three different classifiers, namely k-Nearest Neighbor, Artificial Neural Network, and Decision Tree, are used in the experiments. A recognition accuracy of up to 98.10% is achieved, which is promising for future research in this direction.
Published: 2022
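
Characters are segmented with connected-component analysis. A minimal sketch of 4-connected component labeling on a binary image (represented here, as an assumption, as a list of 0/1 rows after binarization) using breadth-first search; the paper's connectivity choice and any size filtering of components are not stated, so treat both as assumptions.

```python
from collections import deque
from typing import List, Tuple


def label_components(image: List[List[int]]) -> Tuple[List[List[int]], int]:
    """Label 4-connected foreground (value 1) regions; background pixels keep label 0."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or labels[r][c]:
                continue
            count += 1  # start a new component and flood-fill it
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and image[ny][nx] == 1 and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Each labeled region is a candidate character; its bounding box can be cropped and passed to a classifier such as k-NN or a decision tree.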

15. Warpage prediction of Injection-molded PVC part using ensemble machine learning algorithm

Author: Sumaiya Benta Nasir, C. L. Karmaker, Tazim Ahmed, and Parveen Sharma
Subjects: 010302 applied physics, Process (computing), Decision tree, 02 engineering and technology, Molding (process), 021001 nanoscience & nanotechnology, 01 natural sciences, Ensemble learning, Cooling time, Random forest, 0103 physical sciences, 0210 nano-technology, Algorithm, Predictive modelling, Holding time, Mathematics
Abstract: In this study, warpage prediction models have been developed for an injection-molded PVC component called a drip chamber. Two popular, widely used ensemble machine learning algorithms, namely random forest and gradient boosted regression tree, have been used to develop the predictive models. Forty experiments were carried out with various input process parameters to build the warpage prediction dataset. ANOVA was performed to identify the significant input process parameters: barrel temperature, holding pressure, holding time, mold temperature, and cooling time. The results show that the mean absolute percentage errors of the random forest and gradient boosted regression tree models are 3.25% and 9.37%, respectively. This indicates that the random forest ensemble algorithm outperforms the gradient boosted model in predicting the warpage of the injection-molded part. The model allows production managers to monitor injection molding parameters and control warpage before actual production, with minimal waste.
Published: 2022
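The comparison above rests on the mean absolute percentage error, MAPE = (100/n) Σ |y_i - ŷ_i| / |y_i|. A minimal sketch of that metric follows. The function name `mape` and the warpage values are invented for illustration and are not data from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is 0")
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)

# Hypothetical measured vs. predicted warpage (mm).
measured = [0.40, 0.50]
predicted = [0.42, 0.45]
print(round(mape(measured, predicted), 2))  # 7.5
```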

16. Score level fusion in multi-biometric identification based on zones of interest

Author: Kamel Aizi and Mohamed Ouslim
Subjects: Multibiometric fusion, Identification, Modality (human–computer interaction), General Computer Science, Biometrics, business.industry, Computer science, Decision tree, Zones of interest, 020206 networking & telecommunications, Pattern recognition, Score level, QA75.5-76.95, 02 engineering and technology, Rejection rate, Fuzzy logic, Clustering, Identification (information), Fingerprint, Electronic computers. Computer science, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Cluster analysis
Abstract: In this paper, we present a new multibiometric fusion method for identifying persons using two modalities, the iris and the fingerprint. Each modality is processed separately to generate a vector of scores, and fusion is applied at the score level. A preliminary study based on k-means clustering of each modality led us to split the score range into three zones of interest relevant to the proposed identification method. Fusion is then applied to the extracted regions using two approaches: the first performs classification with a decision tree combined with the weighted sum (BCC), while the second is based on fuzzy logic (BFL). Several tests were conducted to evaluate the proposed methods on standard biometric databases using four metrics, namely False Accept Rate, False Reject Rate, Enrollee False Accept Rate, and Recognition Rate. The results clearly show that the proposed fusion approaches outperform those based on a single modality. In addition, we show that the BCC fusion approach performs slightly better than BFL.
Published: 2022
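To make score-level fusion concrete, here is a generic weighted-sum sketch. Each modality's scores are min-max normalized and then combined with a weight, and the enrolled identity with the highest fused score is returned. This does not reproduce the paper's zone-based BCC or BFL methods. The weight value, the "higher score means better match" convention, the function names, and the scores are all assumptions.

```python
def min_max(scores):
    """Rescale a list of match scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def weighted_sum_fusion(iris_scores, finger_scores, w_iris=0.5):
    """Fuse per-identity scores from two modalities after normalization."""
    iris_n = min_max(iris_scores)
    finger_n = min_max(finger_scores)
    return [w_iris * a + (1.0 - w_iris) * b for a, b in zip(iris_n, finger_n)]

# Scores of one probe against three enrolled identities (made-up numbers).
iris = [0.2, 0.9, 0.4]
finger = [30.0, 70.0, 90.0]
fused = weighted_sum_fusion(iris, finger, w_iris=0.6)
best = max(range(len(fused)), key=fused.__getitem__)
print(best)  # 1
```

Normalizing first matters because the two matchers produce scores on unrelated scales (here 0 to 1 versus 30 to 90).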

17. Comparing the HRV Time-Series Signals Acquired from Cannabis Consuming and Non-Consuming Indian Paddy-Field Workers by Recurrence Quantification Analysis

Author: Kunal Pal, Indranil Banerjee, S. Banani, Kishore K. Tarafdar, Suraj K. Nayak, and Doman Kim
Subjects: biology, Computer science, Dimensionality reduction, 0206 medical engineering, Biomedical Engineering, Biophysics, Decision tree, Feature selection, 02 engineering and technology, biology.organism_classification, 020601 biomedical engineering, 030218 nuclear medicine & medical imaging, Support vector machine, 03 medical and health sciences, 0302 clinical medicine, Recurrence quantification analysis, Statistics, Mann–Whitney U test, Information gain ratio, Cannabis
Abstract: Objective: In the last few decades, the consumption of cannabis-based products for recreational purposes has increased dramatically. Unfortunately, cannabis consumption has been associated with cardiovascular disease. Hence, it is necessary to understand the plausible mechanisms of cardiophysiological changes due to cannabis consumption. Accordingly, the current study was designed to assess the suitability of the recurrence quantification analysis (RQA) method for detecting changes in heart rate variability (HRV) time-series signals due to the consumption of cannabis (bhang). Further, a machine learning model has been proposed for the automated detection of cannabis takers. Materials and Methods: RQA of the HRV time-series signals from 200 healthy Indian male paddy-field workers was carried out. The obtained parameters were statistically analyzed using the Mann-Whitney U test. Further, decision trees, weight-based feature ranking, and dimensionality reduction methods were employed to identify the relevant features for developing a suitable machine learning model. Results: Observable changes in the patterns of the recurrence plots were noticed between the bhang-consuming and non-consuming groups. However, there were no significant differences in the RQA parameters. Among the developed machine learning models, the SVM model obtained with the "information gain ratio" feature selection method exhibited the highest accuracy, 69.09 ± 9.33%. Conclusion: Our study suggests that the RQA method is not as effective as time- and frequency-domain methods for detecting alterations in HRV time-series signals due to cannabis consumption. The SVM model was found to be the best model for the automated detection of cannabis takers, and feature selection by the information gain ratio method played an important role in developing the optimized SVM model.
Published: 2021
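The information gain ratio used for feature selection divides a feature's information gain by its split information, that is, the entropy of the feature's own value distribution. A minimal sketch for discrete features follows. Continuous RQA parameters would first need to be discretized. The function names, labels, and toy data are assumptions, not the study's code.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature's values.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0

labels = ["taker", "taker", "non-taker", "non-taker"]
print(gain_ratio(["high", "high", "low", "low"], labels))  # 1.0
print(gain_ratio(["high", "low", "high", "low"], labels))  # 0.0
```

The first feature separates the classes perfectly, while the second carries no information about them. Dividing by split information penalizes features that fragment the data into many small groups.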

18. Textual analysis of traitor-based dataset through semi supervised machine learning

Author: Asif Masood, Malik Muhammad Zaki Murtaza Khan, Faisal Janjua, Imran Rashid, and Haider Abbas
Subjects: Artificial neural network, Computer Networks and Communications, Computer science, business.industry, Decision tree, Insider threat, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Random forest, Insider, Support vector machine, Identification (information), Naive Bayes classifier, Hardware and Architecture, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Software
Abstract: Insider threats are among the most challenging and growing security threats faced by government agencies, organizations, and institutions. In such scenarios, malicious (red) activities are performed by authorized individuals within the organization, which makes insider threats harder to identify than other attacks. Along with other monitoring parameters, email logs play a vital role in many research areas, such as tracking insider threats involving collaborating traitors, textual analysis, and social media exploration. This paper presents a semi-supervised machine learning framework that combines pre-processing and classification techniques for an unlabeled dataset, i.e. emails. The Enron Corporation dataset was used for experiments and TWOS for evaluating the proposed framework. First, the dataset is transformed into vector form using Term Frequency–Inverse Document Frequency (TF–IDF). Next, K-Means is used to group emails into classes based on message content. Finally, a Decision Tree (DT) classifier is applied to classify the malicious activities. The proposed framework was also tested with other algorithms, such as Logistic Regression (LR), Naive Bayes (NB), KNN, Support Vector Machine (SVM), Random Forest (RF), and Neural Network (NN). However, the Decision Tree (DT) combined with the pre-processing steps achieved the desired results, with 99.96% accuracy and 0.994 AUC for identifying malicious content.
Published: 2021
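The TF–IDF step can be sketched in a few lines. This version assumes whitespace tokenization, term frequency as count divided by document length, and idf = ln(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries use several other weighting and smoothing variants, and the sample emails are invented.

```python
from collections import Counter
from math import log

def tf_idf(docs):
    """Return one {term: weight} dict per document (tf = count / length, idf = ln(N / df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: (c / len(tokens)) * log(n_docs / df[term])
                        for term, c in counts.items()})
    return vectors

emails = [
    "meeting moved to friday",
    "send the client list to my personal account",
    "friday meeting agenda attached",
]
vectors = tf_idf(emails)
print(round(vectors[1]["personal"], 3))  # 0.137
```

A term that appears in every document gets weight 0, so rare, content-specific words dominate the vectors that are later clustered.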

19. Prediction of Drug–Target Interactions Based on Network Representation Learning and Ensemble Learning

Author: Ping Xuan, Tiangang Zhang, Yan Yang, and Bingxu Chen
Subjects: Computer science, Association (object-oriented programming), 0206 medical engineering, Drug target, Decision tree, 02 engineering and technology, Machine learning, computer.software_genre, Machine Learning, Drug Development, Genetics, Humans, Drug Interactions, Basis (linear algebra), business.industry, Applied Mathematics, Decision Trees, Computational Biology, Proteins, Network representation learning, Ensemble learning, Pharmaceutical Preparations, Drug development, Node (circuits), Artificial intelligence, business, computer, Algorithms, 020602 bioinformatics, Biotechnology
Abstract: Identifying interactions between drugs and target proteins is a critical step in the drug development process, as it helps identify new drug targets and accelerates drug development. The number of known drug-protein interactions (positive samples) is much lower than that of unknown ones (negative samples), which creates a class imbalance. Most previous methods used only part of the negative samples to train the prediction model, so most of the information in the negative samples was neglected. Therefore, a new method is needed that predicts candidate drug-related proteins and fully uses the negative samples to improve prediction performance. We present a method based on non-negative matrix factorisation and gradient boosting decision tree (GBDT), named NGDTP, to identify candidate drug-protein interactions. NGDTP integrates multiple kinds of protein similarities, drug-protein interactions, and multiple kinds of drug similarities at different levels, including target proteins of drugs, drug-related diseases, and side effects of drugs. We propose a network representation learning method based on matrix factorisation to learn low-dimensional vector representations of drug and protein nodes. On the basis of these low-dimensional node representations, a GBDT-based prediction model was constructed; it obtains association scores for drug-protein pairs by building multiple decision trees. NGDTP is an ensemble learning model that fully uses all the negative samples to effectively alleviate the problem of class imbalance. NGDTP achieves superior prediction performance compared with several state-of-the-art methods. The experimental results indicate that NGDTP also retrieves more actual drug-protein interactions in the top part of the prediction results, which draw the most attention from biologists. In addition, case studies on 10 drugs further confirmed the ability of NGDTP to identify potential candidate proteins for drugs.
Published: 2021

20. Fitness-Aware Containerization Service Leveraging Machine Learning

Author: Santonu Sarkar and Sreekrishnan Venkateswaran
Subjects: 020203 distributed computing, Service (systems architecture), Information Systems and Management, Computer Networks and Communications, business.industry, Computer science, Software as a service, Decision tree, Containerization, Cloud computing, 02 engineering and technology, Microservices, Machine learning, computer.software_genre, Computer Science Applications, Hardware and Architecture, Container (abstract data type), 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, business, Cluster analysis, computer
Abstract: Containerized deployment of microservices has gained immense traction across industries. To meet demand, traditional cloud providers offer container-as-a-service, where selecting the container and containerizing workloads remain the developer's responsibility. This task is arduous for a developer, since there are many container choices across different cloud providers. Furthermore, there is no mechanism for comparing the capabilities of containers across different providers. In this scenario, we envisage the need for a smart cloud broker that can automatically deploy a chosen IT service into the best-fit container environment, mapped to performance requirements, from among the available underlying brokered container hosting systems spread across multiple cloud providers. We propose a novel fitness-aware containerization-as-a-service to achieve this. We show why best-fit container selection is operationally complex and time consuming, and how we heuristically prune the associated decision tree in two phases so that it becomes viable to implement as an on-demand service. We propose a new metric called fitness quotient (FQ) to evaluate containers obtained from heterogeneous providers. We leverage machine learning techniques to automate these two phases: unsupervised K-Means clustering in the first-level build-time phase to accurately classify IaaS cost and performance data, and polynomial regression during the second-level provisioning-time phase to discover relationships between SaaS performance and container strength.
Published: 2021

21. Machine learning-based real-time visible fatigue crack growth detection

Author: Lin Meng, Xu Chen, Zhichen Wang, Zhe Zhang, Le Zhang, and Lei Wang
Subjects: Structural safety, Computer Networks and Communications, Computer science, Decision tree, Information technology, 02 engineering and technology, Machine learning, computer.software_genre, Mechanoresponsive luminogen, Length measurement, 020210 optoelectronics & photonics, 0202 electrical engineering, electronic engineering, information engineering, Monitoring methods, Structural health monitoring, business.industry, Fatigue testing, 020206 networking & telecommunications, Paris' law, Growth prediction, T58.5-58.64, Hardware and Architecture, Fatigue crack, Path (graph theory), Fracture (geology), Computer vision, Artificial intelligence, business, computer
Abstract: Many large-scale and complex structural components are used in the aeronautics and automobile industries. However, repeated alternating or cyclic loads in service tend to cause unexpected fatigue fracture. Therefore, developing real-time and visible monitoring methods for fatigue crack initiation and propagation is critically important for structural safety. This paper proposes a fatigue crack growth detection method that combines computer vision and machine learning. In our model, computer vision is used for data creation, the machine learning model is used for crack detection, and computer vision is then used to mark and analyze the crack growth path and length. We apply seven models for crack classification and find that the decision tree is the best model in this research. The experimental results demonstrate the effectiveness of our method, and the crack length measurement accuracy achieved is 0.6 mm. Furthermore, lightweight machine learning models enable real-time and visible fatigue crack detection.
Published: 2021

22. A fear detection method based on palpebral fissure

Author: Bhattarasiri Slakkham, Pattarasinee Bhattarakosol, and Rawinan Praditsangthong
Subjects: General Computer Science, Computer science, Decision tree, 02 engineering and technology, Eye, 0202 electrical engineering, electronic engineering, information engineering, medicine, Emotion, Landmark, business.industry, Fissure, Interpalpebral fissure, ID3 algorithm, 020206 networking & telecommunications, Pattern recognition, QA75.5-76.95, Classification, Data set, Palpebral fissure, medicine.anatomical_structure, Face (geometry), Electronic computers. Computer science, Palpebral fissure region, 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: Human emotions can be expressed in various forms through the face, such as smiling or laughing, whenever there are stimuli. These facial changes can reflect emotional states that are used to identify normal or abnormal behaviour. This research aims to study the patterns in human faces and identify areas of interest (AOI), a process called Facial Landmark Detection (FLD). The external elements of the eyes are investigated: the interpalpebral fissure (IPF), the palpebral fissure length (PFL), and the palpebral fissure region (PFR). These elements are used to classify the differences between neutral and fearful emotions. A method for emotional classification was designed according to the changing values of the IPF, PFL, and PFR, and an ID3 algorithm was used to classify the emotions. Three hundred sixty images were taken from horror, thriller, and murder movies selected on the basis of IMDb. This data set was used to generate the proposed pattern, which was then used to classify the emotions with a decision tree technique, leading to an emotional classification model. The accuracy of the model in distinguishing neutral from fearful emotions was 92.50%, indicating that the proposed model is effective.
Published: 2021

23. Gaussian model based hybrid technique for infection level identification in TB diagnosis

Author: W. R. Sam Emmanuel and K. S. Mithra
Subjects: Infection level, Bacilli, Tuberculosis, General Computer Science, Mean squared error, Computer science, Gaussian, 0206 medical engineering, Decision tree, Deep Belief Networks, 02 engineering and technology, TB diagnosis, symbols.namesake, Deep belief network, 0202 electrical engineering, electronic engineering, information engineering, medicine, Sputum smear images, biology, business.industry, Decision Trees, Pattern recognition, QA75.5-76.95, medicine.disease, Mixture model, biology.organism_classification, 020601 biomedical engineering, Gaussian mixture model, Electronic computers. Computer science, symbols, 020201 artificial intelligence & image processing, Artificial intelligence, business, Classifier (UML)
Abstract: Tuberculosis (TB) is caused by Mycobacterium tuberculosis; it is a common disease all over the world and can be deadly if not diagnosed at an early stage. Thus, an accurate and effective technique is required for diagnosing TB. Accordingly, a hybrid classifier named Gaussian Decision Tree based Deep Belief Network (GDT-DBN) is proposed to diagnose the infection level of TB from sputum smear microscopic images. A two-level classification is performed using the proposed GDT-DBN classifier, which combines a Decision Tree (DT), a Deep Belief Network (DBN), and a Gaussian Mixture Model (GMM). The first-level classification categorizes the image into three classes, namely few bacilli, no bacilli, and overlapping bacilli, whereas the second-level classification counts the bacilli present; based on the bacilli count, the density ratio is measured to determine the infection level. Mean square error, missing count, and infection level difference were calculated and compared with existing methods, and the proposed method performed better.
Published: 2021

24. Detecting anonymous attacks in wireless communication medium using adaptive grasshopper optimization algorithm

Author: Shubhra Dwivedi
Subjects: Fitness function, business.industry, Computer science, Cognitive Neuroscience, Decision tree, Experimental and Cognitive Psychology, 02 engineering and technology, Intrusion detection system, Machine learning, computer.software_genre, Random forest, Constant false alarm rate, 03 medical and health sciences, Attack model, Naive Bayes classifier, 0302 clinical medicine, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, 030217 neurology & neurosurgery, Software, Extreme learning machine
Abstract: An Intrusion Detection System (IDS) monitors network traffic for suspicious activity and issues alerts when such activity is detected. However, existing IDS-based methods are built on outdated attacks and are unable to identify modern attacks or malicious trends. For this reason, in this study we developed a new multi-swarm adaptive grasshopper optimization algorithm (MSAGOA) that applies an adaptation mechanism, based on fuzzy logic, to a group of swarms to protect against sophisticated attacks. The proposed MSAGOA technique is capable of global optimization and rapid convergence, which are used to obtain optimal feature subsets for identifying attack types on IDS datasets. In the MSAGOA technique, a learning engine (Extreme Learning Machine, Naive Bayes, Random Forest, or Decision Tree) is applied as a fitness function to select highly discriminating features and maximize classification performance. The best classifier is then selected to serve as the fitness function in our approach, with performance measured in terms of accuracy, detection rate, and false alarm rate. Simulations were performed on three IDS datasets: NSL-KDD, AWID-ATK-R, and NGIDS-DS. The experimental results demonstrate that the MSAGOA method performs better, achieving a detection rate of 99.86% and accuracy of 99.89% on NSL-KDD, a detection rate of 98.73% and accuracy of 99.67% on AWID-ATK-R, and a detection rate of 89.50% and accuracy of 90.23% on NGIDS-DS. In addition, the performance is compared with several other existing techniques to show the efficacy of the proposed approach.
Published: 2021

25. Crop phenology and object-based crop classification approach using SENTINEL-2A NDVI time series: Kırklareli sunflower fields

Author: Nihal Ceylan, Erdem Bahar, Armağan Aloe Karabulut, and İlker Kurşun
Subjects: Crop phenology, Object-oriented programming, Series (stratigraphy), 010504 meteorology & atmospheric sciences, 0211 other engineering and technologies, Decision tree, 02 engineering and technology, General Medicine, Agricultural engineering, 01 natural sciences, Sunflower, Normalized Difference Vegetation Index, Environmental science, 021101 geological & geomatics engineering, 0105 earth and related environmental sciences
Abstract: The aim of this study is to develop a methodology for determining sunflower cultivation areas with the help of high-resolution SENTINEL-2A satellite image time series representing the phenological stages of the crop growth cycle, and to apply it in Kırklareli province. Spectral information representing phenological periods was obtained from satellite images and normalized difference vegetation index (NDVI) time series, and an object-oriented classification approach was developed based on this spectral information database. Segmentation and classification decision tree algorithms were produced using this spectral information database, object shape criteria, and other auxiliary thematic maps. The best segmentation performance was achieved by increasing the weight coefficient of the "Canny edge" layer, the edge-determination layer of the "Canny edge" algorithm defined in the multiresolution method, to delineate the agricultural parcels. Object-oriented classification was carried out on these segmented parcels. First, summer, winter, fallow, and continuous green areas were determined through the classification decision tree algorithms. The summer and winter crops were classified using the parcel spectral information of crop-based learning samples collected during field work. Crops for which no class definition could be made were passed to the "unclassified" group for a second elimination and later assigned to their classes. In the last stage, parcels whose class could still not be defined were labelled as the "other" class. According to the confusion matrix and accuracy analysis, sunflower, which was determined in two classes (early and late sowing), was classified with 98% and 92% accuracy, respectively.
Published: 2021
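NDVI is computed per pixel (or per parcel) as (NIR - Red) / (NIR + Red). For Sentinel-2, the near-infrared and red reflectances are usually taken from bands B8 and B4. The sketch below builds an NDVI profile over a few acquisition dates and finds its peak, which is the kind of phenological signal the study relies on. The reflectance values and day-of-year dates are invented for illustration.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index; returns 0.0 if both bands are 0."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total

# Hypothetical parcel-mean reflectances: (day of year, NIR/B8, red/B4).
series = [
    (121, 0.25, 0.12),
    (166, 0.38, 0.07),
    (201, 0.45, 0.05),
    (244, 0.28, 0.10),
]
profile = [(day, ndvi(nir, red)) for day, nir, red in series]
peak_day, peak_value = max(profile, key=lambda item: item[1])
print(peak_day, round(peak_value, 2))  # 201 0.8
```

Comparing the timing of such peaks across parcels is one way early- and late-sown crops can be told apart.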

26. Analysis of surface roughness of rock dust reinforced AA6061-Mg matrix composite in turning

Author: R. Balasundaram, Manickam Ravichandran, and R. Balachandhar
Subjects: Materials science, Turning, Composite number, Composite, 02 engineering and technology, Surface finish, 01 natural sciences, Matrix (geology), Surface roughness, Machining, 0103 physical sciences, Linear regression, Aluminium alloy, Decision tree, Composite material, 010302 applied physics, Mining engineering. Metallurgy, Metals and Alloys, TN1-997, Regression analysis, 021001 nanoscience & nanotechnology, Regression, Anova, Mechanics of Materials, visual_art, visual_art.visual_art_medium, 0210 nano-technology
Abstract: This work examines the surface roughness (SR) of a composite consisting of aluminium alloy (AA6061), magnesium, and rock dust during turning. To study performance, three test specimens with different compositions of Al 6061-T6, AZ31, and rock dust were prepared by stir casting. Turning experiments were carried out on an MTAB Siemens CNC lathe. The input machining parameters are speed, depth of cut, and feed, and the output response is surface roughness. For each test specimen, 15 turning operations were performed using the Box-Behnken design approach. To analyze the process parameters for SR, ANOVA and Decision Tree (DT) models were applied; both confirmed that speed is the most significant factor for SR. Adding 1% AZ31 and 2% rock dust to AA6061 produced a better surface finish. Linear regression, multilayer perceptron, and support vector regression models were formulated to find the relationship between the variables. Among these models, the multilayer perceptron produced the minimum root mean square error.
Published: 2021

27. A cluster-based oversampling algorithm combining SMOTE and k-means for imbalanced medical data

Author: Nan Yin, Zhaozhao Xu, Xi Han, Yue Kou, Tiezheng Nie, and Derong Shen
Subjects: Information Systems and Management, Computer science, 05 social sciences, Decision tree, k-means clustering, 050301 education, Sample (statistics), 02 engineering and technology, Computer Science Applications, Theoretical Computer Science, Random forest, Artificial Intelligence, Control and Systems Engineering, 0202 electrical engineering, electronic engineering, information engineering, Oversampling, 020201 artificial intelligence & image processing, Sensitivity (control systems), 0503 education, Algorithm, Software, Cluster based, Interpolation
Abstract: The C4.5 decision tree algorithm offers high classification accuracy, fast calculation, and comprehensible classification rules, so it is widely used for medical data analysis. However, for imbalanced medical data, the classification accuracy of decision-tree-based models is not ideal. Therefore, this paper proposes a cluster-based oversampling algorithm (KNSMOTE) combining the synthetic minority oversampling technique (SMOTE) and the k-means algorithm. The sample classes assigned by k-means are compared with the original sample classes to select the "safe samples" whose classes have not changed. The "safe samples" are linearly interpolated to synthesize new samples. The improved SMOTE sets the oversampling ratio according to the imbalance ratio of the original samples and synthesizes samples so that their number equals that of the original samples. Compared with other oversampling algorithms on 8 UCI datasets, our algorithm showed significant advantages. When it was applied to the medical datasets, the average Sensitivity and Specificity of the Random forest (RF) algorithm were 99.84% and 99.56%, respectively.
Published: 2021
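The interpolation at the core of SMOTE creates a synthetic point x_new = x + δ(x_nn - x). Here δ is drawn uniformly between 0 and 1, and x_nn is one of the k nearest minority-class neighbours of x. The sketch below shows only that basic SMOTE step. It does not include KNSMOTE's k-means-based selection of "safe samples" or its adaptive oversampling ratio. The function name, the parameter defaults, and the toy points are assumptions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Basic SMOTE: synthesize n_new points between minority samples and their neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x (excluding x itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist2(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # delta in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=5)
print(len(new_points))  # 5
```

Each synthetic point lies on the segment between two existing minority samples, so the new data stays inside the region the minority class already occupies.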

28. Efficient traversal of decision tree ensembles with FPGAs

Author: Veronica Gil-Costa, Fernando Loor, Salvatore Trani, Franco Maria Nardini, Romina Soledad Molina, and Raffaele Perego
Subjects: System on Chip, Computer Networks and Communications, Computer science, Decision trees, Decision tree, Inference, 02 engineering and technology, Theoretical Computer Science, FPGA, Machine learning, Constant (computer programming), Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Field-programmable gate array, 020206 networking & telecommunications, Tree (data structure), Tree traversal, Computer engineering, Hardware and Architecture, Scalability, Hardware acceleration, 020201 artificial intelligence & image processing, Ranking, Software
Abstract: System-on-Chip (SoC) based Field Programmable Gate Arrays (FPGAs) provide a hardware acceleration technology that can be rapidly deployed and tuned, offering a flexible solution adaptable to specific design requirements and changing demands. In this paper, we present three SoC architecture designs for speeding up inference with machine-learned ensembles of decision trees. We focus on QuickScorer, the state-of-the-art algorithm for efficient traversal of tree ensembles, and present the issues and advantages of deploying it on two SoC devices with different capacities. The results of experiments conducted on publicly available datasets show that the proposed solution is very efficient and scalable. More importantly, it provides almost constant inference times, independently of the number of trees in the model and the number of instances to score. This allows the deployed SoC solution to be fine-tuned on the basis of the accuracy and latency constraints of the application scenario considered.
Published: 2021
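For context, the baseline that QuickScorer is designed to beat is plain root-to-leaf traversal of every tree, summing the leaf outputs. The sketch below shows that naive approach with trees stored as parallel arrays. It is neither QuickScorer nor the paper's FPGA design. The `Tree` layout and the two toy stumps are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Tree:
    # Parallel arrays: node i is internal if feature[i] >= 0, otherwise a leaf.
    feature: List[int]
    threshold: List[float]
    left: List[int]
    right: List[int]
    value: List[float]  # leaf output (ignored for internal nodes)

def score_tree(tree, x):
    """Walk from the root to a leaf, branching left when x[feature] <= threshold."""
    node = 0
    while tree.feature[node] >= 0:
        if x[tree.feature[node]] <= tree.threshold[node]:
            node = tree.left[node]
        else:
            node = tree.right[node]
    return tree.value[node]

def score_ensemble(trees, x):
    """Additive ensemble score: the sum of the leaf values reached in every tree."""
    return sum(score_tree(t, x) for t in trees)

# Two depth-1 stumps forming a hypothetical model.
t1 = Tree([0, -1, -1], [0.5, 0.0, 0.0], [1, -1, -1], [2, -1, -1], [0.0, -1.0, 1.0])
t2 = Tree([1, -1, -1], [2.0, 0.0, 0.0], [1, -1, -1], [2, -1, -1], [0.0, 0.25, 0.75])
print(score_ensemble([t1, t2], [0.7, 1.0]))  # 1.25
```

The cost of this loop grows with both the number of trees and their depth, and its data-dependent branches are what make it hard to accelerate. That motivates traversal schemes such as QuickScorer.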

29. Predicting metal-organic frameworks as catalysts to fix carbon dioxide to cyclic carbonate by machine learning

Author: Bijin Wang, Xinwu Yang, Shaorui Sun, Shuyuan Li, Hong He, Yuxuan Hu, and Yunjiang Zhang
Subjects: Speedup, Materials science, Decision tree, 02 engineering and technology, 010402 general chemistry, Machine learning, computer.software_genre, 01 natural sciences, Catalysis, Cyclic carbonate, Materials of engineering and construction. Mechanics of materials, CO2 fixation, Artificial neural network, Catalysts, business.industry, Carbon fixation, Metals and Alloys, Metal-organic frameworks, 021001 nanoscience & nanotechnology, 0104 chemical sciences, Surfaces, Coatings and Films, Electronic, Optical and Magnetic Materials, Support vector machine, Stochastic gradient descent, TA401-492, Metal-organic framework, Artificial intelligence, 0210 nano-technology, business, computer
Abstract: Discovering and developing new materials currently requires considerable effort, time, and expense. Machine learning (ML) algorithms can potentially provide quick and accurate methods for screening new materials. In the present work, features of metal-organic frameworks (MOFs) acting as catalysts for fixing carbon dioxide into cyclic carbonate were extracted to build a data set collected from the experimental results of approximately 100 published papers. Classifiers were trained on the data set with various ML algorithms, including support vector machine (SVM), K-nearest neighbor classification (KNN), decision trees (DT), stochastic gradient descent (SGD), and neural networks (NN), to predict catalytic performance. The ML models were trained on 80% of the data set and tested on the remaining 20% to predict carbon dioxide fixation ability. The trained ML model was then extended to explore 1311 hypothetical MOFs, and some structures displayed strong catalytic ability. Finally, the six best metal ions (Mn, V, Cu, Ni, Zr, and Y) and four best ligands (tactmb, tdcbpp, TCPP, H3L) were determined. These six metals and four ligands could be combined into 24 MOFs, which are strong potential catalysts for carbon dioxide fixation. Machine learning methods can speed up the screening of materials, and this methodology is promising for application not only to MOFs as catalysts but also to many other materials science projects.
Published: 2021

30. Characteristics of cyclists using fitness tracker apps and its implications for planning of bicycle transport systems

Author: Valerian Kwigizile, Sia Macmillan Lyimo, and Keneth Morgan Kwayu
Subjects: 050210 logistics & transportation, education.field_of_study, Transportation planning, business.industry, Computer science, education, 05 social sciences, Geography, Planning and Development, Population, Physical fitness, 0211 other engineering and technologies, Decision tree, 021107 urban & regional planning, Transportation, 02 engineering and technology, Odds, Urban Studies, Transport engineering, 0502 economics and business, Tracking data, business, Cycling, human activities, Recreation
Abstract: The surge of crowdsourced cycling data from various fitness tracker apps has attracted the attention of transportation planners, as it has the potential to provide rapid, cheaper, and high-resolution data for planning cycling infrastructure. However, questions have been raised about whether the data from these fitness tracker apps represent the demographics and trip characteristics of the total cycling population. To explore this question, this study conducted a field intercept survey of 320 cyclists in the cities of Ann Arbor and Grand Rapids, Michigan, United States. The cyclists were categorized into three main groups based on reported fitness tracker usage: cyclists using the Strava app, cyclists using other fitness tracker apps, and cyclists who reported not using any fitness tracker app. Logistic regression and bagging (bootstrap aggregating) decision trees were used in the analysis. The survey results suggest that cycling data from the Strava app and other fitness apps can jointly represent different cycling populations by age and gender. As for trip purpose, cyclists who reported using fitness tracker apps had higher odds of being on a recreational trip as opposed to a commute trip. This indicates that fusing different crowdsourced fitness tracking data alone cannot offset the bias in trip characteristics, despite the observed diversity in demographic characteristics. Consequently, the study emphasizes the need to complement crowdsourced cycling data from multiple fitness tracker apps with other data sources to capture the travel behaviors of both commuting and recreational riders.
Published: 2021
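The abstract above pairs logistic regression with a bagged (bootstrap aggregating) decision tree. As a rough illustration of bagging only, not the authors' model, the Python sketch below trains depth-1 trees (stumps) on bootstrap resamples of a single numeric feature and combines them by majority vote. The toy data, the function names `fit_stump` and `bagging`, and the use of stumps instead of full trees are all assumptions made for brevity.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a depth-1 decision tree on one numeric feature.

    Every midpoint between consecutive distinct values is tried; each side
    predicts its majority label, and the threshold with the fewest training
    errors is kept.
    """
    values = sorted(set(xs))
    majority = Counter(ys).most_common(1)[0][0]
    # Fallback when no split is possible: predict the majority label everywhere.
    best = (len(ys) + 1, values[0], majority, majority)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def bagging(xs, ys, n_estimators=25, seed=0):
    """Bootstrap aggregating: one stump per resample, majority vote at predict time."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict


# Hypothetical toy data: one numeric feature, two trip-purpose labels.
predict = bagging(list(range(1, 9)), ["commute"] * 4 + ["recreation"] * 4)
print(predict(1), predict(8))
```

Each resample sees a slightly different subset of rows, so individual stumps disagree near the class boundary; the vote averages that variance away, which is the point of bagging.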

31. Approximating XGBoost with an interpretable decision tree

Author: Omer Sagi and Lior Rokach
Subjects: Information Systems and Management, Exploit, Computer science, Decision tree, 02 engineering and technology, Machine learning, computer.software_genre, Theoretical Computer Science, Artificial Intelligence, Order (exchange), 0202 electrical engineering, electronic engineering, information engineering, Interpretability, business.industry, 05 social sciences, 050301 education, Computer Science Applications, Random forest, Tree (data structure), Control and Systems Engineering, Transparency (graphic), 020201 artificial intelligence & image processing, Gradient boosting, Artificial intelligence, business, 0503 education, computer, Software
Abstract: The increasing use of machine-learning models in critical domains has recently underscored the need for interpretable machine-learning models. In areas like healthcare and finance, the model consumer must understand the rationale behind the model output in order to use it when making a decision. For this reason, black-box models cannot be used in these scenarios, regardless of their high predictive performance. Decision forests, and in particular Gradient Boosting Decision Trees (GBDT), are examples of this kind of model. GBDT models are considered the state of the art in many classification challenges, as reflected by the fact that the majority of Kaggle’s recent winners used GBDT methods (such as XGBoost) as part of their solution. But despite their superior predictive performance, they cannot be used in tasks that require transparency. This paper presents a novel method for transforming a decision forest of any kind into an interpretable decision tree. The method extends the toolset available to machine learning practitioners who want to exploit the interpretability of decision trees without significantly impairing the predictive performance gained by GBDT models like XGBoost. We show in an empirical evaluation that in some cases the generated tree is able to approximate the predictive performance of an XGBoost model while providing better transparency of the outputs.
Published: 2021

32. An argumentation enabled decision making approach for Fall Activity Recognition in Social IoT based Ambient Assisted Living systems

Author: Nancy Gulati and Pankaj Deep Kaur
Subjects: Computer Networks and Communications, business.industry, Computer science, Weighted voting, Decision tree, Wearable computer, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Support vector machine, Activity recognition, Statistical classification, Naive Bayes classifier, Hardware and Architecture, Home automation, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Software
Abstract: With the advancement of Information and Communication Technologies (ICTs), smart devices are becoming smarter and more intelligent with every passing day. Further, the evolution of speaking- and hearing-enabled devices in IoT networks is transforming research in the Social IoT domain. However, the integration of argumentation-enabled devices in Social IoT networks has not been fully explored by researchers in the past. Therefore, this research work focuses on the development of argument-enabled Social IoT networks. In this paper, a fuzzy argument-based classification scheme termed Classification Enhanced with Fuzzy Argumentation (CleFAR) is proposed. The proposed scheme is deployed for classifying fall activities in fall prevention applications. A novel framework for a fall prevention system using Fall Activity Recognition (FAR) is presented. The proposed system is designed for fall activity recognition in smart home Ambient Assisted Living (AAL) systems. To experimentally evaluate the system’s performance, a smart home AAL environment is simulated and a dataset of the inhabitant’s routine activities is generated. The fall activities are simulated using wearable fall detection systems. The proposed scheme is trained and tested on the generated datasets, and its performance is compared with traditional classification algorithms such as Random Forest (RF), Support Vector Machines (SVM), Naive Bayes (NB), Decision Tree (DT) and Artificial Neural Networks (ANN), as well as the existing argumentation-based game-theoretic Weighted Voting Scheme (WVS). Experimental results indicate that the proposed scheme outperforms the traditional classification schemes and the WVS approach, with prediction accuracy of up to 91%, a significant improvement over the existing schemes.
Published: 2021

33. Artificial Intelligence outflanks all other machine learning classifiers in Network Intrusion Detection System on the realistic cyber dataset CSE-CIC-IDS2018 using cloud computing

Author: T. Prem Jacob and V. Kanimozhi
Subjects: Computer Networks and Communications, Computer science, Botnet, Decision tree, Denial-of-service attack, Information technology, 02 engineering and technology, Machine learning, computer.software_genre, AWS, Naive Bayes classifier, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, AdaBoost, CSE-CIC-IDS2018, business.industry, 020208 electrical & electronic engineering, Password cracking, 020206 networking & telecommunications, T58.5-58.64, Various machine learning classifiers, Hardware and Architecture, Malware, Artificial intelligence, Calibration curve, business, computer, Classifier (UML), Software, Information Systems
Abstract: Examining and detecting network attacks is a daunting task because the variety of attacks grows day by day and attacks now exist in colossal numbers. The proposed program detects botnet attacks using the recent CSE-CIC-IDS2018 cyber dataset published by the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC). The dataset can be accessed on AWS (Amazon Web Services). This realistic network dataset covers modern and existing attacks such as brute-force attacks and password cracking, Heartbleed, botnets, DoS (Denial of Service), DDoS (Distributed Denial of Service), web attacks (i.e., attacks on vulnerable web apps), and infiltration of the network from inside. The objective of the proposed research is to classify botnet attacks. A botnet attack is a Trojan horse malware attack that poses a serious security threat to the banking and financial sectors. Since only certain classifiers may work well on such datasets, a comparative examination of classifiers is crucial to achieve the highest performance in this basic task of detecting network attacks. The proposed framework incorporates different classifier methods, namely K-Nearest Neighbor, Naive Bayes, AdaBoost with Decision Tree, Support Vector Machine, Random Forest, and Artificial Intelligence, to classify botnet attacks on the recent and realistic CSE-CIC-IDS2018 dataset. The classification results are reported as precision for each classifier. Furthermore, the proposed framework uses calibration curves, a standard analytical approach, to generate reliability diagrams that check whether the predicted probabilities of the various classifiers are well calibrated. Finally, the displayed graph shows how well the artificial intelligence technique outperforms all other classifiers.
Published: 2021
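The calibration curves mentioned above group predictions by predicted probability and compare each group's mean prediction with the observed fraction of positives; a well-calibrated classifier lies close to the diagonal. The sketch below computes those points with equal-width bins, similar in spirit to scikit-learn's `sklearn.calibration.calibration_curve` with uniform binning. The function name `reliability_curve` and the toy inputs are illustrative assumptions, not the paper's code.

```python
def reliability_curve(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, fraction of positives) for each non-empty bin.

    y_true holds 0/1 labels; y_prob holds predicted probabilities of class 1.
    Bins are equal-width intervals over [0, 1].
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[i].append((y, p))
    curve = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            frac_pos = sum(y for y, _ in b) / len(b)
            curve.append((mean_pred, frac_pos))
    return curve


# Hypothetical scores from a botnet detector (1 = botnet flow).
print(reliability_curve([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9]))
```

Plotting the returned pairs against the line y = x gives the reliability diagram: points below the diagonal indicate over-confident predictions, points above it under-confident ones.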

34. Lameness prediction in broiler chicken using a machine learning technique

Author: Luiz Antônio Lima, Irenilza de Alencar Nääs, Jair Minoro Abe, Rodrigo Franco Gonçalves, Henry Costa Ungaro, and Nilsa Duarte da Silva Lima
Subjects: 020209 energy, Agriculture (General), Decision tree, 02 engineering and technology, Information technology, Aquatic Science, Machine learning, computer.software_genre, 01 natural sciences, S1-972, Gait (human), Broiler welfare, 0202 electrical engineering, electronic engineering, information engineering, Broiler walking speed, Pruning (decision trees), Mathematics, business.industry, 010401 analytical chemistry, Genetic strain, Broiler, Confusion matrix, Forestry, T58.5-58.64, 0104 chemical sciences, Computer Science Applications, Preferred walking speed, Lameness, Gait score, Animal Science and Zoology, Artificial intelligence, business, Agronomy and Crop Science, computer
Abstract: Broiler flock welfare is usually assessed through mortality, physiology, behavior, and walking ability. The possibility of assessing broiler chicken lameness from the birds' walking ability was investigated using a machine learning approach for the first time. Data on broiler walking speed and acceleration, genetic strain, and sex were recorded and entered into a dataset. Broilers were classified according to the 6-point gait score (GS0 is a sound bird, and GS5 is a severely lame bird). Decision trees were initially built using the full dataset, and the confusion matrix of each developed model was analyzed. Pruning was then applied, removing from the dataset the variables that did not influence the classification results. We reorganized the dataset and regrouped the intermediate target classes of gait score using the Borda Count method. After reprocessing the data, we obtained a new set of decision trees. Using the 3-point gait score (GS0 is a sound bird, and GS2 is a lame bird), we obtained a new model with better accuracy (78%); however, the model had a lower accuracy for classifying lame broilers (GS2, 5%). The final decision tree was selected for classifying broilers as either sound or lame according to their walking speed. The developed model presented good accuracy (91%), and it correctly classified sound (86%) and lame (92%) birds. The novel model might be used to assess broiler lameness on-farm by recording the bird's displacement velocity. Further developments of the model might allow automatic detection of flock lameness.
Published: 2021
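The abstract reports an overall accuracy together with separate accuracies for sound and lame birds; both kinds of figure can be read off a confusion matrix. The sketch below shows the arithmetic on hypothetical labels. Treating "per-class accuracy" as class-wise recall (correct predictions divided by the number of birds truly in that class) is an assumption about how the study's figures were computed.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in the order of `labels`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Diagonal (correct predictions) over all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


def per_class_accuracy(matrix, labels):
    """Share of each true class that was predicted correctly (class-wise recall)."""
    return {
        label: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }


labels = ["sound", "lame"]
# Hypothetical gait-score predictions, not data from the study.
y_true = ["sound"] * 4 + ["lame"] * 4
y_pred = ["sound", "sound", "sound", "lame", "lame", "lame", "lame", "lame"]
m = confusion_matrix(y_true, y_pred, labels)
print(m, accuracy(m), per_class_accuracy(m, labels))
```

Reporting the per-class figures alongside overall accuracy matters here because, as the abstract notes for the 3-point model, a tree can score well overall while almost never recognising the minority class.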

35. Machine-Learned Decision Trees for Predicting Gold Nanorod Sizes from Spectra

Author: Peter J. Rossky, Behnaz Ostovar, Stephan Link, Logan D. C. Bishop, Katsuya Shiratori, Rashad Baiyasi, Christy F. Landes, and Yi-Yu Cai
Subjects: Gold nanorod, Materials science, business.industry, Decision tree, Pattern recognition, 02 engineering and technology, 010402 general chemistry, 021001 nanoscience & nanotechnology, 01 natural sciences, 0104 chemical sciences, Surfaces, Coatings and Films, Electronic, Optical and Magnetic Materials, General Energy, Artificial intelligence, Physical and Theoretical Chemistry, 0210 nano-technology, business
Published: 2021

36. Deep Convolution Neural Network Based System for Early Diagnosis of Alzheimer's Disease

Author: Yogesh Rathore and Rekh Ram Janghel
Subjects: business.industry, Computer science, Deep learning, 0206 medical engineering, Feature extraction, Biomedical Engineering, Biophysics, Decision tree, k-means clustering, Pattern recognition, 02 engineering and technology, 020601 biomedical engineering, Convolutional neural network, 030218 nuclear medicine & medical imaging, Support vector machine, 03 medical and health sciences, ComputingMethodologies_PATTERNRECOGNITION, 0302 clinical medicine, Preprocessor, Artificial intelligence, Sensitivity (control systems), business
Abstract: Objectives: Alzheimer's Disease (AD) is the most common type of dementia. In all leading countries, it is one of the primary causes of death among senior citizens. Currently, it is diagnosed by calculating the MMSE score and by manual study of MRI scans. Different machine learning methods are also used for automatic diagnosis, but existing methods have some limitations in terms of accuracy. The main objective of this paper is therefore to include a preprocessing method before the CNN model to increase classification accuracy. Materials and method: In this paper, we present a deep learning-based approach for detecting Alzheimer's Disease using the ADNI database of Alzheimer's disease patients; the dataset contains fMRI and PET images of Alzheimer's patients along with images of normal subjects. We applied 3D-to-2D conversion and resizing of images before applying the VGG-16 convolutional neural network architecture for feature extraction. Finally, SVM, Linear Discriminant, k-means clustering, and Decision tree classifiers are used for classification. Results: The experimental results show an average accuracy of 99.95% for classification of the fMRI dataset and an average accuracy of 73.46% for the PET dataset. Comparing the results on the basis of accuracy, specificity, sensitivity, and some other parameters, we found that these results are better than those of existing methods. Conclusions: This paper suggests a unique way to increase the performance of CNN models by applying preprocessing to the image dataset before sending it to the CNN architecture for feature extraction. We applied this method to the ADNI database, and compared with other similar approaches it shows better results.
Published: 2021

37. Recognition of human locomotion on various transportations fusing smartphone sensors

Author: Anindya Das Antar, Atiqur Rahman Ahad, and Masud Ahmed
Subjects: Artificial neural network, Computer science, business.industry, Deep learning, Decision tree, 02 engineering and technology, Overfitting, Machine learning, computer.software_genre, Linear discriminant analysis, 01 natural sciences, Random forest, Activity recognition, Recurrent neural network, Artificial Intelligence, 0103 physical sciences, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, 010306 general physics, business, computer, Software
Abstract: Recognition of daily human activities in various locomotion and transportation modes has numerous applications, such as coaching users for behavior modification and maintaining a healthy lifestyle. Besides, applications and user interfaces aware of user mobility through their smartphones can also aid in urban transportation planning, smart parking, and vehicular traffic monitoring. In this paper, we explored two smartphone-sensor-based benchmark datasets (Sussex Huawei Locomotion (SHL) and Transportation Mode Detection (TMD)). First, we demonstrated preprocessing of sensor data and window length optimization based on the Akaike Information Criterion (AIC), and introduced smartphone-orientation-independent features. We also provided an in-depth analysis of the importance of different smartphone sensors for classifying daily activities and transportation modes. We justified the sensor relevance by showing how performance varies with the number of sensors explored. To refine classifier predictions, we also proposed a post-processing approach named the “Mode technique”. This method primarily concentrates on the statistical analysis of transportation modes and improves the activity recognition rate of the statistical classifiers (Decision Tree, K-Nearest Neighbors, Linear Discriminant Analysis, Logistic Regression, Support Vector Machine with RBF kernel, Random Forest) and the deep learning-based methods (Artificial Neural Network and Recurrent Neural Network) by smoothing the outputs of these classifiers. In addition, we showed the use of magnitude- and jerk-based features to overcome the overfitting problem due to smartphone orientation. We obtained 97.2% accuracy on the SHL dataset and 99.13% accuracy on the TMD dataset. These results demonstrate that our approach can recognize various activities in advanced locomotion and transportation modes more accurately than existing methods on two large-scale datasets.
Published: 2021
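The abstract describes the "Mode technique" only as post-processing that smooths classifier outputs. One plausible reading, assumed here rather than taken from the paper, is a sliding-window majority (mode) filter over consecutive window-level predictions, which removes isolated misclassifications inside a longer trip. The window size and the function name `mode_smooth` are hypothetical.

```python
from collections import Counter


def mode_smooth(predictions, window=5):
    """Replace each prediction with the most frequent label in a centred window."""
    half = window // 2
    smoothed = []
    for i, current in enumerate(predictions):
        lo, hi = max(0, i - half), min(len(predictions), i + half + 1)
        counts = Counter(predictions[lo:hi])
        top = max(counts.values())
        if counts[current] == top:
            smoothed.append(current)  # on ties, keep the classifier's own output
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


# Hypothetical per-window predictions with one isolated error.
raw = ["bus", "bus", "bus", "car", "bus", "bus", "bus"]
print(mode_smooth(raw))
```

The filter relies on the observation that transportation modes persist over many consecutive sensor windows, so a single label that disagrees with all its neighbours is more likely a classifier error than a real mode change.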

38. Developing a Low-Order Statistical Feature Set Based on Received Samples for Signal Classification in Wireless Sensor Networks and Edge Devices

Author: Colin C. Murphy, Kevin G. McCarthy, Philip J. Harris, and George D. O'Mahony
Subjects: IoT, Technology, Edge device, Computer science, Real-time computing, 02 engineering and technology, k-nearest neighbors algorithm, law.invention, Bluetooth, ZigBee, QA76.75-76.765, 0203 mechanical engineering, law, decision tree, 0202 electrical engineering, electronic engineering, information engineering, Wireless, Computer software, 020301 aerospace & aeronautics, Artificial neural network, Network packet, business.industry, 020206 networking & telecommunications, edge devices, Support vector machine, WSNs, machine learning, classification, Q300-390, business, Wireless sensor network, Cybernetics, XGBoost
Abstract: Classifying fluctuating wireless operating environments can be crucial for successfully delivering authentic and confidential packets and for identifying legitimate signals. This study uses raw in-phase (I) and quadrature-phase (Q) samples exclusively to develop a low-order statistical feature set for wireless signal classification. Edge devices that make decentralized decisions from I/Q sample analysis are beneficial, for example for implementing appropriate security and transmission mechanisms, reducing retransmissions, and increasing energy efficiency. Wireless sensor networks (WSNs) and their Internet of Things (IoT) use emphasize the significance of this time series classification problem. Here, I/Q samples of typical WSN and industrial, scientific and medical band transmissions are collected in a live operating environment. Analog Pluto software-defined radios and Raspberry Pi devices are used to achieve a low-cost yet high-performance testbed. Features are extracted from Matlab-based statistical analysis of the I/Q samples across time, frequency (fast Fourier transform) and space (probability density function). Noise, ZigBee, continuous wave jamming, WiFi and Bluetooth signal data are examined. Supervised machine learning approaches, including support vector machines, Random Forest, XGBoost, k-nearest neighbors and a deep neural network (DNN), are used to evaluate the developed feature set. The optimal approach is determined to be an XGBoost/SVM classifier. This classifier achieves accuracy and generalization results on unseen data similar to those of the DNN, but for a fraction of the time and computation requirements. Compared to existing approaches, this study’s principal contribution is the developed low-order feature set, which achieves signal classification without prior network knowledge or channel assumptions and is validated in a real-world wireless operating environment. The feature set can extend the development of resource-constrained edge devices, as it is widely deployable because it requires only received I/Q samples, and such features are warranted as IoT devices become widely used in various modern applications.
Published: 2021
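As an illustration of what "low-order statistical features" of I/Q samples can look like, the sketch below computes the mean, variance, skewness, and kurtosis of the sample amplitude sqrt(I^2 + Q^2) in the time domain. The specific features, the use of amplitude, and the function name `low_order_features` are assumptions; the study's actual feature set also covers the frequency domain and the probability density function, and was built in Matlab.

```python
import math
import statistics


def low_order_features(i_samples, q_samples):
    """Time-domain moment features of the amplitude of complex I/Q samples."""
    amplitude = [math.hypot(i, q) for i, q in zip(i_samples, q_samples)]
    mean = statistics.fmean(amplitude)
    variance = statistics.pvariance(amplitude, mu=mean)
    std = math.sqrt(variance)
    if std == 0:
        # Constant amplitude: higher moments are undefined; report 0.0 by convention here.
        skewness = kurtosis = 0.0
    else:
        z = [(a - mean) / std for a in amplitude]
        skewness = sum(v ** 3 for v in z) / len(z)
        kurtosis = sum(v ** 4 for v in z) / len(z)  # non-excess kurtosis
    return {"mean": mean, "variance": variance, "skewness": skewness, "kurtosis": kurtosis}


# Hypothetical I/Q samples; a real capture would hold thousands of points.
print(low_order_features([1.0, 3.0], [0.0, 0.0]))
```

Features like these need only a single pass over the received samples, which is consistent with the abstract's emphasis on resource-constrained edge devices.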

39. BDF: A new decision forest algorithm

Author: Michael Bewong, Zahidul Islam, Ryan H.L. Ip, and Nasim Adnan
Subjects: Information Systems and Management, Computer science, 05 social sciences, Rank (computer programming), Decision tree, 050301 education, 02 engineering and technology, Base (topology), Computer Science Applications, Theoretical Computer Science, Random forest, Set (abstract data type), Artificial Intelligence, Control and Systems Engineering, Synchronization (computer science), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 0503 education, Algorithm, Software, Diversity (business)
Abstract: The foremost requirement for a decision forest to achieve better ensemble accuracy is building a set of accurate and diverse individual decision trees as base classifiers. Existing decision forest algorithms mainly differ from each other in how they induce diversity among the decision trees. At the same time, most of the drawbacks of existing algorithms originate from their processes for inducing diversity. In this paper, we propose a new decision forest algorithm that is more balanced through effective synchronization between different sources of diversity. The proposed algorithm is balanced both theoretically and empirically. We carried out experiments on 25 well-known data sets that are publicly available from the UCI Machine Learning Repository to perform an extensive empirical evaluation. The experimental results indicate that the proposed algorithm has the best average ensemble accuracy rank of 1.8, compared to its closest competitor at 3.5. Using the Friedman and Bonferroni-Dunn tests, we also show that this improvement is statistically significant. In addition, the proposed algorithm is found to be competitive in terms of complexity and other relevant parameters.
Published: 2021
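The abstract compares algorithms by their average accuracy rank across data sets and checks significance with the Friedman test. The sketch below ranks algorithms on each data set (rank 1 = most accurate, ties share the mean rank), averages the ranks, and computes the Friedman chi-square statistic chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), where N is the number of data sets, k the number of algorithms, and R_j the average rank of algorithm j. The accuracy table is hypothetical, and the Bonferroni-Dunn post-hoc step is not shown.

```python
def average_ranks(scores):
    """scores[d][a] is the accuracy of algorithm a on data set d (higher is better).

    Returns each algorithm's rank averaged over data sets (1 = best); tied
    algorithms on a data set share the mean of the positions they occupy.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda a: -row[a])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of 1-based positions pos..end
            for j in range(pos, end + 1):
                totals[order[j]] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic computed from average ranks."""
    k = len(avg_ranks)
    return 12 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )


# Hypothetical accuracies: rows are data sets, columns are three algorithms.
scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.75],
    [0.90, 0.70, 0.80],
]
ranks = average_ranks(scores)
print(ranks, friedman_statistic(ranks, len(scores)))
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; only when it is significant does a post-hoc test such as Bonferroni-Dunn compare individual algorithms.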

40. Optimization of air pollution measurements with unmanned aerial vehicle low-cost sensor based on an inductive knowledge management method

Author: Dawid Przysiężniuk, Arkadiusz Gardecki, Sławomir Pochwała, Piotr Lewandowski, Adam Deptuła, and Stanisław Anweiler
Subjects: Pollution, Control and Optimization, Knowledge management, media_common.quotation_subject, Decision tree, Air pollution, Aerospace Engineering, 02 engineering and technology, medicine.disease_cause, computer.software_genre, 01 natural sciences, Altitude, 0202 electrical engineering, electronic engineering, information engineering, medicine, Electrical and Electronic Engineering, Civil and Structural Engineering, media_common, business.industry, Mechanical Engineering, System of measurement, 010401 analytical chemistry, 020206 networking & telecommunications, Particulates, Drone, Expert system, 0104 chemical sciences, Environmental science, business, computer, Software
Abstract: The article presents a study of particulate matter (PM1, PM2.5 and PM10) air pollution measured with low-cost sensors mounted on Unmanned Aerial Vehicles. The article is divided into two parts: the first describes the pollution measurement system, and the second describes an expert system for optimizing flight parameters. The research was conducted over a municipal cemetery area in Poland. The obtained results were analyzed with an inductive knowledge management system (decision tree method) for classification analysis of air pollution. The decision tree mechanism is used to optimize flight parameters, taking the air pollution parameters into account. The analysis examined how PM concentration depends on altitude. The decision tree method made it possible to determine, among other aspects, which PM indicator should be measured and which altitude plays a greater role in optimizing air pollution measurements with cheap sensors mounted on drones. As a result of the analysis, the optimum flight altitude of the measurement drone in the specified area was determined.
Published: 2021

41. Decision tree classifiers for unmanned aircraft configuration selection

Author: Ney Rafael Secco and João Antônio Dantas de Jesus Ferreira
Subjects: 020301 aerospace & aeronautics, Traceability, Computer science, Decision tree, Aerospace Engineering, Wing configuration, ComputerApplications_COMPUTERSINOTHERSYSTEMS, 02 engineering and technology, 01 natural sciences, Phase (combat), Operational requirements, 010305 fluids & plasmas, 0203 mechanical engineering, Work (electrical), 0103 physical sciences, Configuration selection, Systems engineering
Abstract: Purpose: This paper aims to investigate the possibility of reducing the time taken during aircraft design for unmanned aerial vehicles by using machine learning (ML) for the configuration selection phase. In this work, a database of unmanned aircraft is compiled, and it is proposed that decision tree classifiers (DTC) can learn the relations between mission and operational requirements and the resulting aircraft configuration. Design/methodology/approach: This paper presents an ML-based approach to configuration selection of unmanned aircraft. Multiple DTC are built to predict the overall configuration. The classifiers are trained with a database of 118 unmanned aircraft with 57 characteristics, 47 of which are inputs for the classification problem and 10 of which are the desired outputs, such as wing configuration or engine type. Findings: This paper shows that DTC can be used for the configuration selection of unmanned aircraft with reasonable accuracy, capturing the connections between the different mission requirements and the resulting configuration. The framework is also capable of dealing with incomplete databases, maximizing the available knowledge. Originality/value: This paper increases the use of computation in aircraft design while retaining requirements’ traceability and increasing decision awareness.
Published: 2021

42. Modeling on Disruption Risk Prediction of Manufacturing Supply Chain Based on C4.5 Algorithm

Author: Weibin Wang, Renyong Chi, and Caihong Liu
Subjects: 0209 industrial biotechnology, Supply chain, Decision tree, 02 engineering and technology, Factor analysis of information risk, 020901 industrial engineering & automation, C4.5 algorithm, Incentive, Risk analysis (engineering), Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Customer satisfaction, Business, Information flow (information theory), Electrical and Electronic Engineering, Robustness (economics)
Abstract: Under the impact of COVID-19, global and domestic manufacturing supply chains have nearly all suffered serious disruptions to the flow of manpower, logistics, information, and capital. The risk of supply chain disruption has become the primary risk of the supply chain. However, the inducements of supply chain disruption risk are complex and diverse, so it is very difficult to identify and screen the risk data needed for research from supply chain operation data. To improve supply chain robustness and boost the domestic and international circulation of China's manufacturing, this paper, based on the characteristics of China's manufacturing supply chain and its risk incentives, collects the data needed for risk prediction modeling through a questionnaire survey and puts forward an empirical regression model of risk prediction for the manufacturing supply chain. The C4.5 decision tree method is then used to train and evaluate the risk prediction model. The conclusion shows that customer satisfaction has great diagnostic value for risk, and that the model is highly sensitive to market information risk and market order risk. The conclusion is consistent with general understanding and the model fits well, indicating that the proposed model has some theoretical significance; its practical application value is worthy of further testing. © 2021, North Atlantic University Union NAUN. All rights reserved.
Published: 2021
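C4.5 chooses splits by gain ratio: the information gain of an attribute divided by its split information (the entropy of the partition the attribute induces), which penalizes attributes with many distinct values. The sketch below computes it for categorical attributes. The attribute values and labels are hypothetical, and C4.5's handling of continuous attributes, missing values, and pruning is omitted.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute, labels):
    """C4.5-style gain ratio of splitting `labels` on a categorical `attribute`."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    # Expected entropy of the labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attribute)  # entropy of the partition induced by the attribute
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical survey attribute (customer satisfaction) vs. disruption outcome.
satisfaction = ["low", "low", "high", "high"]
outcome = ["disrupted", "disrupted", "stable", "stable"]
print(gain_ratio(satisfaction, outcome))
```

At each node the tree-building step evaluates every candidate attribute this way and splits on the one with the highest gain ratio, recursing on each resulting subset.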

43. DEVELOPING A PARALLEL CLASSIFIER FOR MINING IN BIG DATA SETS

Author: Ahad Shamseen, Morteza Mohammadi Zanjireh, Mahdi Bahaghighat, and Qin Xin
Subjects: General Computer Science, Applied Mathematics, General Chemical Engineering, General Engineering, 020206 networking & telecommunications, 02 engineering and technology, Engineering (General). Civil engineering (General), Big data, Parallel classifier, Decision tree, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, TA1-2040, Data mining, SPRINT classifier
Abstract: Data mining is the extraction of information from vast amounts of data, and it is one of the most important topics today. Massive amounts of data are generated and stored every day, and these data contain useful information in many fields, attracting the attention of programmers and engineers. One of the primary data mining classification algorithms is the decision tree. Decision tree techniques have several advantages but also present drawbacks; one of the main drawbacks is the need for the data to reside in main memory. SPRINT is a decision tree classifier that proposed a fix for this problem. In this paper, we develop a new parallel decision tree classifier that builds on the SPRINT results. Our experimental results show considerable improvements in runtime and memory requirements compared with the SPRINT classifier. The proposed classifier can be implemented in both serial and parallel environments and can deal with big data.
Published: 2021
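SPRINT keeps a separate attribute list per attribute (value, class label, record id), sorts each numeric list once, and evaluates candidate splits by scanning the list while updating class histograms for the records below and above the split point, scoring each candidate with the gini index. The sketch below shows that scan for a single numeric attribute. It illustrates SPRINT's split evaluation only, not the parallel classifier proposed in the paper, and the sample records and function names are hypothetical.

```python
from collections import Counter


def gini(counts, total):
    """Gini index of a class histogram."""
    return 1.0 - sum((c / total) ** 2 for c in counts.values()) if total else 0.0


def best_split(attribute_list):
    """Scan a pre-sorted attribute list of (value, class_label, record_id) tuples.

    Returns (weighted gini, threshold) for the best split of the form
    value <= threshold, using running 'below' and 'above' class histograms.
    """
    total = len(attribute_list)
    below = Counter()
    above = Counter(label for _, label, _ in attribute_list)
    best = (float("inf"), None)
    for i in range(total - 1):
        value, label, _ = attribute_list[i]
        below[label] += 1  # move this record from 'above' to 'below'
        above[label] -= 1
        next_value = attribute_list[i + 1][0]
        if value == next_value:
            continue  # cannot split between equal values
        n_below = i + 1
        n_above = total - n_below
        score = (n_below * gini(below, n_below) + n_above * gini(above, n_above)) / total
        if score < best[0]:
            best = (score, (value + next_value) / 2)
    return best


# Hypothetical records: (age, class, record id), sorted once by value.
records = sorted([(40, "low", 2), (20, "high", 0), (45, "low", 3), (25, "high", 1)])
print(best_split(records))
```

Because each list is sorted only once and then scanned linearly, the data for a node never has to be re-sorted or held in memory as a whole, which is the property the abstract refers to.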

44. A Comparative Analysis of Machine Learning Algorithms to Predict Alzheimer’s Disease

Author: Manjit Kaur, Mohammad Monirujjaman Khan, Maliha Mamtaz, Morshedul Bari Antor, Parminder Singh, Sultan Aljahdali, Mehedi Masud, and A H M Shafayet Jamil
Subjects: Medicine (General), Support Vector Machine, Article Subject, Computer science, media_common.quotation_subject, Biomedical Engineering, MEDLINE, Decision tree, Health Informatics, 02 engineering and technology, Disease, Machine learning, computer.software_genre, Logistic regression, Machine Learning, 03 medical and health sciences, R5-920, 0302 clinical medicine, Alzheimer Disease, Reading (process), Medical technology, 0202 electrical engineering, electronic engineering, information engineering, medicine, Humans, Dementia, R855-855.5, Aged, media_common, business.industry, Brain, medicine.disease, Random forest, Support vector machine, 020201 artificial intelligence & image processing, Surgery, Artificial intelligence, business, computer, Algorithms, 030217 neurology & neurosurgery, Research Article, Biotechnology
Abstract: Alzheimer’s disease has been one of the major health concerns in recent years. Around 45 million people suffer from this disease. Alzheimer’s is a degenerative brain disease of unspecified cause and pathogenesis that primarily affects older people. Alzheimer’s disease is the main cause of dementia and progressively damages brain cells. People with the disease lose their ability to think, read, and much more. A machine learning system can help address this problem by predicting the disease. The main aim is to recognize dementia among various patients. This paper presents the results and analysis of detecting dementia with various machine learning models. The Open Access Series of Imaging Studies (OASIS) dataset has been used to develop the system. The dataset is small, but it contains some significant values. The dataset has been analyzed and applied to several machine learning models: support vector machine, logistic regression, decision tree, and random forest have been used for prediction. The system was first run without fine-tuning and then with fine-tuning. Comparing the results, the support vector machine provides the best results among the models, with the best accuracy in detecting dementia among numerous patients. The system is simple and can easily help people by detecting dementia.
Published: 2021

45. An Adaptive Approach for Dynamic Load Modeling in Microgrids

Author: L. Chavarro-Barrera, Juan Mora-Flórez, and Sandra Pérez-Londoño
Subjects: General Computer Science, Computer science, 020209 energy, 020208 electrical & electronic engineering, Decision tree, Stability (learning theory), Parameterized complexity, 02 engineering and technology, Ant colony, Dynamic load testing, Control theory, Classifier (linguistics), 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), Microgrid
Abstract: Electric microgrids require accurate dynamic models for operation, control, stability, and protection studies, so adequate load modeling plays an important role. This paper presents a two-stage adaptive approach to improve the generalization capability of load models obtained through measurement-based modeling. The load models and their respective parameters are obtained through machine learning tools such as decision trees (DTs) and optimization algorithms such as ant colony optimization (ACO). In the off-line stage of the proposed approach, several parameterized load models are optimally obtained using a database of microgrid disturbances. Then, the best model to represent each disturbance is selected using a similarity criterion. This model and the disturbance characteristics are integrated into one DT (classifier), while the characteristics and the model parameters are related in a second DT (predictor). These DTs are used in an on-line stage to swiftly determine the adequate parameterized load model when a new disturbance occurs in the microgrid. The approach’s performance is compared with conventional measurement-based load modeling in a modified CIGRE benchmark low-voltage microgrid. The results show the advantages of the proposed adaptive approach for dynamic load modeling.
Published: 2021

46. Efficient and Secure Decision Tree Classification for Cloud-Assisted Online Diagnosis Services

Author: Zheng Qin, Lu Ou, Sheng Xiao, Jinwen Liang, and Xiaodong Lin
Subjects: 021110 strategic, defence & security studies, Information retrieval, business.industry, Computer science, Decision tree learning, 0211 other engineering and technologies, Decision tree, Cryptography, Cloud computing, 02 engineering and technology, Encryption, Symmetric-key algorithm, Server, Electrical and Electronic Engineering, business, Decision table
Abstract: Decision tree classification has become a prevailing technique for online diagnosis services. By outsourcing computation-intensive tasks to a cloud server, cloud-assisted online diagnosis services are a better option when storage and computation requirements exceed the capabilities of medical institutions. Because of privacy concerns and intellectual property protection issues, the valuable diagnosis classifier and the sensitive user data should be protected against the cloud server. In this paper, we identify a workflow for cloud-assisted online diagnosis services and propose an efficient and secure decision tree classification scheme within this workflow. Specifically, the medical institution transforms a locally pre-trained decision tree classifier into a decision table, and then uses searchable symmetric encryption to encrypt the decision table. The encrypted table is outsourced to the cloud server, and a user can submit encrypted physiological features to the cloud server and obtain an encrypted diagnosis prediction in return. We provide formal security proofs to demonstrate that our scheme protects the confidentiality of the decision tree classifier and the user’s data. The performance analysis shows that our scheme achieves faster-than-linear classification speed. Experimental evaluations show that our scheme requires several microseconds to process a diagnosis request on the tested datasets.
Published: 2021

47. Evaluation and determinants of preschool effectiveness in Chile

Author: Emili Tortosa-Ausina, Víctor Giménez, Claudio Thieme, and Diego Prior
Subjects: Economics and Econometrics, Index (economics), Strategy and Management, Geography, Planning and Development, 0211 other engineering and technologies, Decision tree, Public policy, effectiveness, 02 engineering and technology, Management Science and Operations Research, Family income, assurance region, preschool, 0502 economics and business, 050207 economics, Socioeconomic status, 021103 operations research, 05 social sciences, Intervention (law), Premise, centralized data envelopment analysis, Job satisfaction, Demographic economics, Statistics, Probability and Uncertainty, Psychology, composite indicators
Abstract: Early intervention in quality education is a way to equalize opportunities, a premise assumed by several countries. As a result, interest has grown in preschool education, which is now an important area of public policy. However, the performance of preschool education centers has received relatively little research attention. Accordingly, the aim of this study is to examine the problem and evaluate the performance of preschool education centers that serve children from lower socioeconomic families in Chile. To this end, a centralized DEA Assurance Region model is proposed and compared with two other models. In addition, a second-stage analysis is carried out using decision trees to identify variables that determine the composition of homogeneous preschool groups according to their effectiveness. The results show an average efficiency index of 70.54%, with important heterogeneities between regions of the country, which is significantly lower than the 84.47% obtained through a traditional benefit-of-the-doubt model. The centralized (unique) weights obtained were 45% for learning, and 32% for user satisfaction and job satisfaction. The second-stage analysis reveals the importance of three structural factors that determine the effectiveness of kindergartens: (i) size of the kindergarten; (ii) family income; and (iii) urban or rural location of the kindergarten. These results are relevant to public policy since they propose a valid and useful methodology for decision makers, quantify the levels of effectiveness, and emphasize the importance of focusing efforts on the centers that, due to their structural conditions, are more prone to lower levels of effectiveness. They also validate and align with the changes at this level of education in Chile in recent years.
Published: 2022

48. Toward Integrated CNN-based Sentiment Analysis of Tweets for Scarce-resource Language—Hindi

Author: Agam Madan, Qin Xin, Ankit Chaudhary, Shubham Shubham, Vedika Gupta, and Nikita Jain
Subjects: Hindi, General Computer Science, Computer science, business.industry, Sentiment analysis, Decision tree, 020206 networking & telecommunications, 02 engineering and technology, computer.software_genre, Lexicon, Convolutional neural network, Mandarin Chinese, language.human_language, Support vector machine, Resource (project management), 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Natural language processing
Abstract: Linguistic resources for commonly used languages such as English and Mandarin Chinese are available in abundance, hence the extensive existing research in these languages. However, there are languages for which linguistic resources are scarce. One of these languages is Hindi. Although Hindi is the fourth most popular language, it still lacks richly populated linguistic resources, owing to the challenges involved in processing it. This article first explores machine learning-based approaches (Naïve Bayes, Support Vector Machine, Decision Tree, and Logistic Regression) to analyze the sentiment contained in Hindi language text derived from Twitter. Further, the article presents lexicon-based approaches (Hindi Senti-WordNet, NRC Emotion Lexicon) for sentiment analysis in Hindi while also proposing a Domain-specific Sentiment Dictionary. Finally, an integrated model combining a convolutional neural network (CNN) with a Recurrent Neural Network and Long Short-term Memory is proposed to analyze sentiment in Hindi language tweets, using a total of 23,767 tweets classified as positive, negative, or neutral. The proposed CNN approach achieves an accuracy of 85%.
Published: 2021

49. Deterministic Local Interpretable Model-Agnostic Explanations for Stable Explainability

Author: Naimul Mefraz Khan and Muhammad Rehman Zafar
Subjects: explainable artificial intelligence (XAI), Computer engineering. Computer hardware, Computer science, Decision tree, Stability (learning theory), Feature selection, Linear classifier, 02 engineering and technology, Machine learning, computer.software_genre, 01 natural sciences, TK7885-7895, 010104 statistics & probability, Black box, deterministic explanations, local explanations, 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), stable explanations, 0101 mathematics, Interpretability, business.industry, human interpretable explanations, Linear model, interpretable machine learning, model agnostic explanations, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: Local Interpretable Model-Agnostic Explanations (LIME) is a popular technique used to increase the interpretability and explainability of black box Machine Learning (ML) algorithms. LIME typically creates an explanation for a single prediction of any ML model by learning a simpler interpretable model (e.g., a linear classifier) around the prediction: it generates simulated data around the instance by random perturbation and obtains feature importance by applying some form of feature selection. While LIME and similar local algorithms have gained popularity due to their simplicity, the random perturbation methods result in shifts in data and instability in the generated explanations, so that different explanations can be generated for the same prediction. These are critical issues that can prevent the deployment of LIME in sensitive domains. We propose a deterministic version of LIME. Instead of random perturbation, we use Agglomerative Hierarchical Clustering (AHC) to group the training data and K-Nearest Neighbour (KNN) to select the relevant cluster for the new instance being explained. After the relevant cluster is found, a simple model (i.e., a linear model or decision tree) is trained over the selected cluster to generate the explanations. Experimental results on six public (three binary and three multi-class) and six synthetic datasets show the superiority of Deterministic Local Interpretable Model-Agnostic Explanations (DLIME), where we quantitatively determine the stability and faithfulness of DLIME compared to LIME.
Published: 2021

50. Application of Fuzzy Decision Tree Algorithm Based on Mobile Computing in Sports Fitness Member Management

Author: Zhu Gu and Chaohu He
Subjects: Technology, Computer Networks and Communications, Computer science, ID3 algorithm, Decision tree, Mobile computing, 020206 networking & telecommunications, TK5101-6720, 02 engineering and technology, Iris flower data set, Data set, Tree (data structure), Data file, Telecommunication, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Electrical and Electronic Engineering, Algorithm, Information Systems, Pace
Abstract: Since the reform and opening-up, the economy of our country has developed rapidly, and people’s living conditions have steadily improved. As a result, people have more time to pay attention to their health, which has promoted the rapid development of the sports and fitness industry in our country. However, the current management of members in the sports fitness industry has not kept pace with the industry’s development. To address this, this article uses a fuzzy decision tree algorithm to build a decision tree from the characteristics of customer data and to analyze the loss of existing customers, an analysis that is of strategic significance for improving a club’s competitiveness. This article selects the 7 most commonly used data sets from the UCI repository as the initial experimental data for model training in three different formats, and then uses the data of a specific club’s members to conduct experiments, using these data files as training samples to construct a fuzzy decision tree for customer churn analysis and identify the main factors behind customer churn. Experiments show that the mobile computing-based fuzzy decision tree ID3 algorithm has the highest accuracy on the Iris data set, reaching 97.8%, and the lowest on the Wine data set, only 65.2%. The mobile computing-based fuzzy decision tree ID3 algorithm proposed in this paper obtained the highest accuracy (86.32%). This shows that, compared with traditional analysis methods, the fuzzy decision tree obtained for customer churn analysis has the advantages of high classification accuracy and interpretability, so that good classification accuracy can be achieved even when the tree is small.
Published: 2021
