Descriptor: "RANDOM forest algorithms" / Publication Year Range: Last 3 years / Publisher: springer nature / Topic: decision trees and supervised learning - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"RANDOM forest algorithms"' showing total 10 results

1. Hellinger distance decision trees for PU learning in imbalanced data sets.

Author: Ortega Vázquez, Carlos, vanden Broucke, Seppe, and De Weerdt, Jochen
Subjects: DECISION trees, RANDOM forest algorithms, FRAUD investigation, SUPERVISED learning
Abstract: Learning from positive and unlabeled data, or PU learning, is the setting in which a binary classifier can only train from positive and unlabeled instances, the latter containing both positive as well as negative instances. Many PU applications, e.g., fraud detection, are also characterized by class imbalance, which creates a challenging setting. Not only are there fewer minority class examples compared to the case where all labels are known, there is also only a small fraction of unlabeled observations that would actually be positive. Despite the relevance of the topic, only a few studies have considered a class imbalance setting in PU learning. In this paper, we propose a novel technique that can directly handle imbalanced PU data, named the PU Hellinger Decision Tree (PU-HDT). Our technique exploits the class prior to estimate the counts of positives and negatives in every node in the tree. Moreover, the Hellinger distance is used instead of more conventional splitting criteria because it has been shown to be class-imbalance insensitive. This simple yet effective adaptation allows PU-HDT to perform well in highly imbalanced PU data sets. We also introduce PU Stratified Hellinger Random Forest (PU-SHRF), which uses PU-HDT as its base learner and integrates a stratified bootstrap sampling. Our empirical analysis shows that PU-SHRF substantially outperforms state-of-the-art PU learning methods for imbalanced data sets in most experimental settings. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
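
Illustrative note: the abstract above names the Hellinger distance as the splitting criterion. A minimal sketch of the commonly used two-class form for a binary split follows. The function name `hellinger_split_score` is hypothetical, and the counts are taken as inputs because the abstract does not say how PU-HDT derives its per-node estimates from the class prior; this is not the authors' implementation.

```python
import math


def hellinger_split_score(pos_left, neg_left, pos_right, neg_right):
    """Hellinger distance between the positive and negative class
    distributions over the two children of a binary split.

    Counts may be estimates (floats), as they would be in a PU setting.
    """
    pos_total = pos_left + pos_right
    neg_total = neg_left + neg_right
    if pos_total <= 0 or neg_total <= 0:
        raise ValueError("both classes need a positive (estimated) count")

    total = 0.0
    for pos, neg in ((pos_left, neg_left), (pos_right, neg_right)):
        # Each class is normalized by its own total, so the score does
        # not depend on how many examples the other class has.
        total += (math.sqrt(pos / pos_total) - math.sqrt(neg / neg_total)) ** 2
    return math.sqrt(total)
```

Because each class is normalized separately, a split that isolates 10 positives from 10 negatives scores the same as one that isolates 10 positives from 1000 negatives, which is the sense in which the criterion is described as class-imbalance insensitive.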

2. Explainable Ensemble Trees.

Author: Aria, Massimo, Gnasso, Agostino, Iorio, Carmela, and Pandolfo, Giuseppe
Subjects: *RANDOM forest algorithms, *MACHINE learning, *MAJORITIES, *DECISION trees, *INDEPENDENT variables, *PLURALITY voting, *DECISION making, *SUPERVISED learning
Abstract: Ensemble methods are supervised learning algorithms that provide highly accurate solutions by training many models. Random forest is probably the most widely used in regression and classification problems. It builds decision trees on different samples and takes their majority vote for classification and average in case of regression. However, such an algorithm suffers from a lack of explainability and thus does not allow users to understand how particular decisions are made. To improve on that, we propose a new way of interpreting an ensemble tree structure. Starting from a random forest model, our approach is able to explain graphically the relationship structure between the response variable and predictors. The proposed method appears to be useful in all real-world cases where model interpretation for predictive purposes is crucial. The proposal is evaluated by means of real data sets. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
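
Illustrative note: the abstract above describes how a random forest combines its trees, by majority vote for classification and by averaging for regression. A minimal sketch of that aggregation step alone is given here; the function names are hypothetical, and the per-tree predictions are assumed to be already computed.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_predictions):
    """Return the class label predicted by the most trees."""
    # most_common(1) yields [(label, count)] for the top label.
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Return the average of the per-tree numeric predictions."""
    return mean(tree_predictions)
```

For example, three trees voting `"fraud"`, `"ok"`, `"fraud"` would yield `"fraud"`, and three regression outputs 1.0, 2.0, 3.0 would yield 2.0.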

3. logicDT: a procedure for identifying response-associated interactions between binary predictors.

Author: Lau, Michael, Schikowski, Tamara, and Schwender, Holger
Subjects: REGRESSION trees, MACHINE learning, RANDOM forest algorithms, DECISION trees, SUPERVISED learning, STATISTICAL learning
Abstract: Interactions between predictors play an important role in many applications. Popular and successful tree-based supervised learning methods such as random forests or logic regression can incorporate interactions associated with the considered outcome without specifying which variables might interact. Nonetheless, these algorithms suffer from certain drawbacks such as limited interpretability of model predictions and difficulties with negligible marginal effects in the case of random forests or not being able to incorporate interactions with continuous variables, being restricted to additive structures between Boolean terms, and not directly considering conjunctions that reveal the interactions in the case of logic regression. We, therefore, propose a novel method called logic decision trees (logicDT) that is specifically tailored to binary input data and helps to overcome the drawbacks of existing methods. The main idea consists of considering sets of Boolean conjunctions, using these terms as input variables for decision trees, and searching for the best performing model. logicDT is also accompanied by a framework for estimating the importance of identified terms, i.e., input variables and interactions between input variables. This new method is compared to other popular statistical learning algorithms in simulations and real data applications. As these evaluations show, logicDT is able to yield high prediction performances while maintaining interpretability. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. Uncovering the factors that affect earthquake insurance uptake using supervised machine learning.

Author: Ng'ombe, John N., Addai, Kwabena Nyarko, Mzyece, Agness, Han, Joohun, and Temoso, Omphile
Subjects: *EARTHQUAKE insurance, *SUPERVISED learning, *MACHINE learning, *RANDOM forest algorithms, *DECISION trees
Abstract: The escalating threat of natural disasters to public safety worldwide underlines the crucial role of effective environmental risk management tools, such as insurance. This is particularly evident in the case of earthquakes that occurred in Oklahoma between 2011 and 2020, which were linked to wastewater injection, underscoring the need for earthquake insurance. In this regard, from a survey of 812 respondents in Oklahoma, USA, we used supervised machine learning techniques (i.e., logit, ridge, least absolute shrinkage and selection operator (LASSO), decision tree, and random forest classifiers) to identify the factors that influence earthquake insurance uptake and to predict individuals who would acquire earthquake insurance. Our findings reveal that influential factors that affect earthquake insurance uptake include demographic factors such as older age, male gender, race, and ethnicity. These were found to significantly influence the decision to purchase earthquake insurance. Additionally, individuals residing in rental properties were less likely to purchase earthquake insurance, while longer residency in Oklahoma had a positive influence. Past experience of earthquakes was also found to positively influence the decision to purchase earthquake insurance. Both decision trees and random forests demonstrated good predictive capabilities for identifying earthquake insurance uptake. Notably, random forests exhibited higher precision and robustness, emerging as an encouraging choice for earthquake insurance modeling and other classification problems. Empirically, we highlight the importance of insurance as an environmental risk management tool and emphasize the need for awareness and education on earthquake insurance as well as the use of supervised machine learning algorithms for classification problems. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

5. Machine learning based analytical approach for geographical analysis and prediction of Boston City crime using geospatial dataset.

Author: Sharma, Hitesh Kumar, Choudhury, Tanupriya, and Kandwal, Adarsh
Subjects: CRIME, MACHINE learning, SUPERVISED learning, ARTIFICIAL intelligence, DECISION trees, RANDOM forest algorithms
Abstract: Machine learning algorithms have proved their significant contribution in all major domains of technical and non-technical sectors. At present, the Intelligence Bureau is also using artificial intelligence and machine learning based analytical approaches to predict crime locations from past crime data for a given geographical location. The availability of digital records of crimes committed over the last few years in a given geographical location helps the crime control division predict the zones where the next crime is likely to occur and take precautionary action to reduce the probability of the unwanted event. The government of Boston took the initiative to improve the city by releasing the Crime Incident Report dataset of Boston to the public. Researchers and analysts can use it to develop crime prediction models that help the Boston Police Department identify crime-prone locations in Boston and take measures well in advance to reduce crime. In this research work, we analyzed the provided dataset and performed an exploratory data analysis to identify the high and low crime-prone locations, the most severe and least severe crimes, year-wise and month-wise crime, and crime cases with and without shootings in Boston. The results presented in this study show that random forest with Principal Component Analysis (PCA) improves classification accuracy by 9% compared to a simple decision tree, and PCA with a decision tree gives 5% more accuracy than a decision tree alone, although computation time increases for the PCA-based algorithms compared to the simple decision tree. The proposed research work opens the door to applying these supervised learning algorithms to predicting and classifying crimes in other states. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

6. A semi-supervised coarse-to-fine approach with bayesian optimization for lithology identification.

Author: Xie, Yunxin, Jin, Liangyu, Zhu, Chenyang, and Wu, Siyu
Subjects: *MACHINE learning, *PETROLOGY, *BOOSTING algorithms, *SUPERVISED learning, *PETROLEUM prospecting, *RANDOM forest algorithms, *DECISION trees
Abstract: Lithology identification is critical in the interpretation of well-logging data for petroleum exploration and development. However, the limited availability of labeled well-logging data for machine learning model training can lead to compromised accuracy in lithology classification models. Here, we propose a semi-supervised lithology identification model to overcome this challenge. Our framework consists of Bayesian optimization for tuning ensemble algorithms, including random forest, gradient boosting decision tree, extremely randomized trees, and adaptive boosting, to establish a high-quality baseline model for semi-supervised learning. We also employ a self-training strategy to increase the number of labeled samples in the training set and use the predicted label with the highest confidence as a pseudo-label to reduce the accumulation of deviation caused by incorrect pseudo-labels. Our semi-supervised coarse-to-fine framework improves rock classification accuracy, particularly for sandstone. Testing our model on well-logging data from two real regions, we found that the ExtraRF-based semi-supervised model in the HGF area performs the best, with a maximum classification accuracy of 91.6%, which is 5% higher than the original coarse-to-fine model without using Bayesian optimization and pseudo-labeling techniques. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
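
Illustrative note: the abstract above mentions a self-training step that keeps the predicted label with the highest confidence as a pseudo-label. One possible reading of that selection step is sketched here, assuming each unlabeled sample comes with a list of per-class probabilities from some already trained classifier. The function name `pick_most_confident` is hypothetical and this is not the authors' code.

```python
def pick_most_confident(probabilities):
    """Select the unlabeled sample whose top class probability is highest.

    probabilities: list of per-sample lists of class probabilities.
    Returns (sample_index, predicted_class_index, confidence).
    """
    if not probabilities:
        raise ValueError("need at least one unlabeled sample")

    best_index, best_label, best_conf = None, None, -1.0
    for index, row in enumerate(probabilities):
        # Predicted class for this sample is the arg-max of its row.
        label = max(range(len(row)), key=row.__getitem__)
        if row[label] > best_conf:
            best_index, best_label, best_conf = index, label, row[label]
    return best_index, best_label, best_conf
```

In a self-training loop, the selected sample would be moved into the labeled set with its pseudo-label and the classifier retrained; taking only the most confident prediction each round is one way to limit errors from incorrect pseudo-labels.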

7. Web-S4AE: a semi-supervised stacked sparse autoencoder model for web robot detection.

Author: Jagat, Rikhi Ram, Sisodia, Dilip Singh, and Singh, Pradeep
Subjects: *BLOGS, *ROBOTS, *DEEP learning, *COMPUTER software, *RANDOM forest algorithms, *DECISION trees, *SUPERVISED learning
Abstract: Web robots are automated computer programs that can be exploited for benign and malicious activities such as website indexing, monitoring, or unauthorized content scraping and scalping. Several methods are available to detect automated web robots through their footprints and behaviors. Although the accuracy and efficiency of existing methods depend highly on the labeled web log data, countless web requests are generated daily with the help of web robots. Exhaustive and accurate manual labeling of reconstructed sessions is time-consuming and challenging. Further, effective detection of web robots is more challenging with unlabeled or partially labeled data. To address the aforementioned issues, we reformulated web robot detection as a semi-supervised learning problem. In this paper, we propose a deep learning-based Semi-Supervised Stacked Sparse AutoEncoder (Web-S4AE) for web robot detection. The proposed model uses content-based features and features extracted from web access log data to effectively classify web robots. The experiments were conducted on publicly available web log data from a library and information portal to assess the performance of Web-S4AE. The Web-S4AE model was trained in two phases. The first phase comprises training the model with unlabeled data to extract the hidden information, and in the second phase, the model is fine-tuned using labeled data. The results suggest that incorporating more unlabeled data can significantly improve the classifier's performance. The Web-S4AE model's performance was also compared with other models such as the Decision Tree (DT), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Multi-Layer Perceptron (MLP). [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

8. A Tri-Training method for lithofacies identification under scarce labeled logging data.

Author: Zhu, Xinyi, Zhang, Hongbing, Ren, Quan, Zhang, Dailu, Zeng, Fanxing, Zhu, Xinjie, and Zhang, Lingyuan
Subjects: *DATA logging, *LITHOFACIES, *SUPPORT vector machines, *RANDOM forest algorithms, *SECURE Sockets Layer (Computer network protocol), *DECISION trees, *SUPERVISED learning
Abstract: Lithofacies identification is critical to energy exploration and reservoir evaluation. Machine learning provides a way to use logging data for intelligent lithofacies identification. However, labeled logging data are usually scarce, which makes the currently used supervised algorithms less effective, so semi-supervised methods have received attention from researchers. In this paper, we propose to apply Tri-Training to the field of lithofacies recognition. The framework uses Random Forest (RF), Gradient-Boosted Decision Trees (GBDT), and Support Vector Machine (SVM) as the baseline supervised classifiers and is based on the idea of inductive semi-supervised methods and ensemble learning. Baseline classifiers are trained and iterated using unlabeled data to improve their performance. The final results are output in an ensemble paradigm. We used seven logging parameters from two wells as input and divided the data randomly 10 times for training and testing. With only five samples of each lithology, the prediction accuracy improved by an average of 2.1% and 14.5% in the two wells compared to the baseline methods. In addition, we also compared against two commonly used semi-supervised methods, label propagation algorithm (LPA) and Co-Training. The experimental results also confirm that Tri-Training has better and more stable performance. The Tri-Training method in this paper can be effectively applied to lithofacies identification under scarce labeled logging data. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

9. An SSH predictive model using machine learning with web proxy session logs.

Author: Lee, Junwon and Lee, Heejo
Subjects: *MACHINE learning, *PREDICTION models, *SUPERVISED learning, *DEEP learning, *DECISION trees, *RANDOM forest algorithms
Abstract: An adversary can use SSH communication as a route for information leakage or hacking. Many studies have focused on TCP header analysis to detect encrypted communication. However, SSH detection using TCP header analysis is limited when changing TCP port information or modifying components of the SSH protocol. Various machine-learning (ML) techniques have been introduced to enhance network traffic classification by analyzing TCP headers. Most ML-based traffic classification research has analyzed network packet flows. However, because of the complex structures and the various implementations of the TCP protocol, a lot of time and resources are required for the recombination of network packet flows. This paper presents a novel contribution to overcome the problems of network packet analysis that employs web proxy session logs, which do not require the recombination of packets to prepare a dataset for analysis. Moreover, we propose a hybrid predictive model that is useful for web proxy session log analysis. In the modeling process, we collected the web proxy logs from an actual network of ICT companies (more than 10,000 employees, Seoul, South Korea) and used the random forest and decision tree algorithms for the supervised learning. The detection rate (DR) for the training dataset was 99.9%, which is similar to or higher than that of other studies using ML and deep learning. Using the dataset of DARPA99, we proved that the DR and FPR for our proposed model were better than those achieved by Alshammari et al.'s model. We expect that the proposed predictive model can be used to block illegal attempts at SSH communication over HTTP CONNECT by changing the destination port and to detect novel illegal communication protocols. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

10. Selecting an appropriate supervised machine learning algorithm for predictive maintenance.

Author: Ouadah, Abdelfettah, Zemmouchi-Ghomari, Leila, and Salhi, Nedjma
Subjects: *MACHINE learning, *SUPERVISED learning, *RANDOM forest algorithms, *DECISION trees, *CLASSIFICATION algorithms
Abstract: Predictive maintenance refers to predicting malfunctions using data from monitoring equipment and process performance measurements. Machine learning algorithms and techniques are often used to analyze equipment monitoring data. Machine learning is the process in which a computer can work more precisely by collecting and analyzing data. It is often the case that machine learning algorithms use supervised learning, in which labelled data is used to feed the algorithm. However, there are many supervised machine learning algorithms available. Therefore, choosing the best supervised machine learning algorithm to resolve predictive maintenance issues is not trivial. This paper aims to increase the performance of predictive maintenance and achieve its goals by selecting the most suitable supervised machine learning algorithm. Based on the most commonly used criteria in research articles, we selected three supervised machine learning algorithms from a comparative study: Random forest, Decision tree and KNN. We then tested the selected algorithms on data from real-world and simulation scenarios. Finally, we conducted the experiment based on vibration analysis and reliability evaluation. We noticed that Random forests and Decision trees obtained nearly the same performance. KNN is a better classification algorithm for extensive volumes of data; in contrast, Random forest performs better in the case of small datasets. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
