356 results on '"RANDOM forest algorithms"'
Search Results
2. An innovative method for predicting Alzheimer's disease using the random forest classifier algorithm and compared with extreme gradient boosting algorithm to enhance the accuracy of prediction.
- Author
- Sai, R. Venkata Hruthick and Gayathri, A.
- Subjects
- *MACHINE learning, *RANDOM forest algorithms, *ALZHEIMER'S disease, *STATISTICS, *CONFIDENCE intervals, *BOOSTING algorithms
- Abstract
The project's goal is to find a technique to detect Alzheimer's disease in people using two machine learning algorithms, the extreme Gradient Boost method and the Novel Random Forest Classifier. Consequently, we will evaluate the Novel Random Forest Classifier alongside the extreme Gradient Boost method to determine their relative merits. Two machine learning algorithms—the revolutionary Random Forest Classifier and the extreme Gradient Boost Algorithm—were used to assess the accuracy of Alzheimer's disease prediction. Using clinical data, we ran a series of iterations with 373 samples, each repeated 10 times, to get the optimal sample size. In both cases, we used a power of 80% and a confidence interval of 95%. The Novel Random Forest Classifier technique achieved a 98.24% improvement in performance compared to the extreme Gradient Boost method. Results that are statistically noteworthy were produced by using the Novel Random Forest Classifier approach and the extreme Gradient Boost strategy (p=0.003, p<0.05). Statistical analysis using the independent sample T-test confirms the relevance of this work. The Novel Random Forest Classifier approach outperformed the extreme Gradient Boost method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Comparison of random forest with K-nearest neighbors to detect fake news with improved accuracy.
- Author
- Saranya, K. S. Sri, Juliet, A. Hency, and Nataraj, Chandrasekharan
- Subjects
- *K-nearest neighbor classification, *FAKE news, *RANDOM forest algorithms, *MACHINE learning, *TRUST
- Abstract
To develop an automated, reliable and effective system that detects fake news articles, with the goal of reducing the spread of misinformation and promoting the dissemination of accurate and trustworthy information because the rapid spread of fake innovative news has turned out to be a foremost concern in modern years. Materials and Methods: The effectiveness of two methods Random Forest and K Nearest Neighbor (KNN) are compared in predicting fake news. The evaluation was carried out using a Github dataset of 2000 newscast informations labeled as either counterfeit or unaffected, with performance metrics such as accuracy used to compare the two algorithms. The model size of the group is 10. Results and Discussions: The result shows that KNN outperformed Random Forest (RF) in terms of all the performance metrics, suggesting that KNN is a more effective method for detecting fake news. The significance value for this study is p=0.001 which is less than 0.05. Hence, there is a statistically significant difference between the two groups. Conclusion: The results suggest that KNN with 81.20% accuracy is a more effective algorithm for fake news detection. This study provides valuable insights into the effectiveness of Machine Learning (ML) algorithms in detecting fake-news. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Maximising the accuracy of handwritten alphabet recognition using Bayesian regression over random forest.
- Author
- Prakash, J., Dass, P., Kavitha, N., and Thiruchelvam, V.
- Subjects
- *MACHINE learning, *RANDOM forest algorithms, *STATISTICS, *RECOGNITION (Psychology), *INSTITUTIONAL repositories
- Abstract
This research paper deals with maximising the accuracy of recognising Handwritten Alphabet using Bayesian Recognition over Random Forest. The dataset named A-Z Handwritten Alphabets consists of 370,000 images collected from the Kaggle repository. The suggested ML classifier models, namely Bayesian Regression and Random Forest, are used in this phase. Nearly 10 iterative values from each group were taken for statistical analysis. For SPSS calculation done using G power by presetting value of 0.95 is used. The Bayesian Regression, which has an accuracy of 92.52%, outperforms the Random Forest technique, which has an accuracy of 83.42%, according to the data. Thus, it demonstrates that the Novel Bayesian Regression and Random Forest differ statistically significantly with p=0.004 (T test on independent sample p<0.05). The suggested technology gained more attention in the field of Handwritten Alphabet Recognition, and it can make number recognition easier. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. A study and survey of chrome extension to detect phishing websites.
- Author
- Machap, Kamalakannan, Murakami, Rin, and Rahman, Nor Azlina Abdul
- Subjects
- *MACHINE learning, *INTERNET privacy, *PHISHING, *RANDOM forest algorithms, *SELF-efficacy
- Abstract
This paper focuses on the development of an efficient Chrome extension designed to detect phishing websites. Phishing attacks continue to pose a significant threat to online users, compromising their sensitive information and causing financial losses. The proposed extension utilizes a random forest machine learning algorithm to analyze website URLs, enabling the identification and alerting of potential phishing attempts. By integrating with the user's browsing experience, the extension provides a proactive defense mechanism, empowering users to make informed decisions and stay protected from phishing attacks. The effectiveness of the research work is evaluated through extensive testing. Overall, this research contributes to enhancing user security and privacy in the online ecosystem, with implications for both individual users and organizations concerned with cybersecurity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Mitigating post-harvest losses through IoT-based machine learning algorithms in smart farming.
- Author
- Kanna, R. Rajesh, Priya, T. Mohana, Sivakumar, V., Nataraj, Chandrasekharan, and Thomas, Jishamol
- Subjects
- *MACHINE learning, *RANDOM forest algorithms, *INTERNET of things, *AGRICULTURE, *SELF-efficacy, *AGRICULTURAL technology
- Abstract
This research paper explores the transformative potential of Internet of Things (IoT) technology in mitigating the longstanding issue of post-harvest losses within the agriculture sector. These losses, which encompass both quantitative and qualitative deterioration of food commodities from harvest to consumption, have posed persistent challenges, resulting in economic losses and food wastage. By delving into the current landscape of post-harvest losses and the application of IoT technology, the paper offers valuable insights into how IoT can be harnessed to reduce these losses effectively. It not only highlights the benefits and existing IoT solutions but also addresses the inherent challenges, providing recommendations for their resolution. Moreover, the research introduces a machine learning-based model, specifically Random Forest ML, to identify and prevent losses in tandem with IoT devices, empowering farmers with timely alert messages for informed decision-making, thus fostering a more sustainable and efficient agricultural ecosystem. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Utilization of machine learning for predictive maintenance in improving productivity in manufacturing industry.
- Author
- Agustina, Dina, Fitri, Fadhilah, Zilrahmi, Winanda, Rara Sandhy, and Sari, Devni Prima
- Subjects
- *INDUSTRY 4.0, *SUPPORT vector machines, *FEATURE selection, *RANDOM forest algorithms, *MACHINE learning
- Abstract
The fourth industrial revolution, also known as Industry 4.0, is driven by the combination of IoT, AI, and big data in the manufacturing industry. One of the challenges for manufacturers is machine failures or downtimes, which can significantly hinder production processes. Predictive maintenance (PdM) is a solution to this problem and is widely used in the industry. In this study, Support Vector Machine (SVM) and Random Forest (RF) algorithms were used to predict the Overall Equipment Effectiveness (OEE) of a production machine, and the best model was selected based on accuracy using a confusion matrix. The study involved data preprocessing, exploratory data analysis, feature selection, and training the models to generate predictive classification models. The accuracy of the SVM algorithm was found to be 87%, while the RF algorithm achieved an accuracy of 91%. Therefore, the RF algorithm can be considered a better choice for forecasting OEE using these two features. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Optimizing sales strategy in the Indian automobile industry: Predicting future car prices using machine learning and demographic data.
- Author
- Khan, M. Reyasudin Basir, Islam, Gazi Md. Nurul, Ng, Poh Kiat, Zainuddin, Ahmad Anwar, Lean, Chong Peng, Al-Fattah, Jabbar, and Kamarudin, Nazhatul Hafizah
- Subjects
- *MACHINE learning, *RANDOM forest algorithms, *BUSINESS planning, *AUTOMOBILE sales & prices, *DECISION trees, *BIG data
- Abstract
Demographics play a vital role in defining the size, distribution, and structure of a population. In the context of the automobile industry, business owners can leverage demographic insights to gauge the demand for vehicles and strategically align their sales efforts. Accurate sales forecasting is essential for long-term business strategy, providing manufacturers with a competitive advantage in optimizing production planning methods. This project utilizes large-scale automobile sales data to forecast car price variations in the coming months, considering factors such as purchase patterns, car models, and other relevant data. By analyzing different attributes from a past-year dataset, three machine learning algorithms: Linear Regression, Decision Tree Regression, and Random Forest Regression were employed to predict future car prices. The performance of each algorithm is evaluated using the R-squared value. Notably, the Random Forest regression model achieves a higher accuracy of 93%, outperforming both Decision Tree regression and Linear regression. These results demonstrate the suitability of Random Forest regression in predicting big data for the industry's future product production plan and overall strategy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Facial and vocal expression-based comprehensive framework for real-time stress monitoring.
- Author
- Karlapudi, Ajay Pavan, Cherukuri, Krishna Bhargav, Badigama, Sai Kumar, and Irudayaraj, Juvvana
- Subjects
- *MACHINE learning, *MENTAL health personnel, *RANDOM forest algorithms, *FACIAL expression, *SIGNAL processing
- Abstract
Many people in today's society experience anxiety, depression, and heart disease as direct results of stress. An increasing number of people and medical professionals recognize the need for efficient stress monitoring and management tools to track and alleviate stress in real time. In order to fill this gap, the proposed stress monitoring framework makes use of vocal and facial expressions as indicators of stress. Frowziness, furrowed brows, and narrowed eyes are all telltale signs of stress, as changes in vocal pitch, volume, and rate. The proposed system uses signal processing techniques to extract stress-related features from these expressions and classify them as indicative of stress or not. Stress-related characteristics can be accurately classified using machine learning models like neural networks, SVM, and Random Forest in this setting. The proposed system enhances the accuracy and robustness of the stress monitoring tool by combining the results of multiple decision trees trained using Random Forest on different subsets of data and features. An intuitive interface that shows current stress levels has been created to make the system more approachable. The mental health field, the medical field, and related fields can all benefit from this interface. For instance, it can be used by mental health professionals to better diagnose and track their patients' stress levels over time, allowing for more precise and timely interventions. In addition, it can teach people how to control their own stress and show them how their thoughts, feelings, and actions all contribute to their health. Finally, the proposed stress monitoring framework provides a robust and effective method for tracking stress levels in real-time. The system's robustness and accuracy are due to the integration of signal processing techniques and machine learning algorithms, which can be applied in a number of fields, including healthcare and personal wellness. Designed with simplicity in mind, the system is a great resource for coping with the stresses of modern life. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Predicting model for multiclass imbalanced data using pipeline sampling technique with dynamic ensemble selection.
- Author
- Kamaladevi, M., Venkataraman, V., and Umamaheswari, P.
- Subjects
- *MACHINE learning, *CLASSIFICATION algorithms, *RANDOM forest algorithms, *MEDICAL decision making, *WORK design, *BOOSTING algorithms
- Abstract
Machine learning algorithms find patterns in data that help to make better predictions and decisions. In day-to-day life these algorithms help to make critical decisions such as medical diagnosis, stock prediction, fault detection, etc. Classification algorithms predict labels from trained patterns. In imbalanced data classification, data are distributed unevenly among classes, i.e., the majority class has a high proportion of the data whereas the minority class has a low proportion. Common machine learning algorithms have poor prediction accuracy for the minority class, which leads to the data imbalance problem. Besides, multiclass imbalanced learning poses greater challenges than binary classification. To address this issue, this work designs a new classifier model that combines pipeline sampling to resample the data with dynamic ensemble classifier selection for prediction on a multiclass imbalanced dataset taken from the UCI repository. Performance is evaluated through multiclass classification metrics such as weighted average accuracy, weighted average precision, weighted average F1 score, ROC AUC score, Cohen's kappa score, and Matthews correlation coefficient. A thorough empirical comparison is conducted to analyze the performance of the proposed model against the existing ensemble algorithms Gradient Boosting classifier, Bagging classifier, and Random Forest classifier. The dynamic ensemble algorithm outperforms the existing ensemble algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Rank based random forest model for gestational diabetes mellitus prediction.
- Author
- Amarnath, Sumathi and Selvamani, Meganathan
- Subjects
- *PREGNANCY complications, *FIRST trimester of pregnancy, *MACHINE learning, *RANDOM forest algorithms, *MATERNAL health, *PREGNANCY
- Abstract
Gestational Diabetes Mellitus (GDM) is a type of diabetes that can occur during pregnancy in women. Due to its great prevalence and potential negative effects on the health of mothers and children, GDM is a challenge for world health. GDM is estimated to affect around 7-10% of pregnancies worldwide. The prevalence of GDM has enormously increased over the last few years, especially in developing countries like India. GDM is higher in women whose age is over 25, overweight or obese, have a family history of diabetes, GDM in a previous pregnancy, PCOS, etc. GDM-related complications during and after pregnancy are increasing across the Country. Early prediction and timely treatment allow women to avoid pregnancy-related complications. Early recognition of GDM during the first trimester of pregnancy improves maternal and fetus health and helps to overcome future diabetes in the mother and future generations. RF is an ensemble learning method that gives more accurate predictions than other machine learning algorithms, and its result is closer to the ensemble method. The Rank based Random Forest (RRF) approach is used to improve classification performance. Various weight values have been examined, and the chosen weight values (0.6, 0.4) are assigned to the dataset. The new features, Sum and Rank, were generated. Based on the threshold value, the diabetes prediction was performed on the dataset. The proposed RRF improved classification accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. MVP of the NBA season prediction using machine learning, data analysis and web scraping.
- Author
- Parihar, Abhyuday Singh, Jaiswal, Amisha, and Malarselvi, G.
- Subjects
- *BASKETBALL, *DATA analysis, *MACHINE learning, *RANDOM forest algorithms, *MULTICOLLINEARITY, *GAMBLING
- Abstract
People try to anticipate the highest scorers in each season as the NBA (National Basketball Association) becomes more and more popular for a variety of reasons, including gambling, online tournaments, or competitions. Numerous variables could have an impact on the outcome. The purpose of this article is to develop an effective yet straightforward model to forecast the most valuable player of the current season based on the results of the previous year, as well as to derive team abilities and distinct home advantages based on data from the previous 30 seasons. The model accurately predicts outcomes by looking at some key variables, such as team talent and home field advantage, even when contrasted to the bookmaker's point spread by looking at the games from 1991 to 2022. Even while the data still does not account for additional factors like injuries, fouls, and trades, the forecast produced by this model is still correct. In order to forecast the outcomes, this study uses the ridge regression and random forest regressor models. By comparing the standard errors of the results, the best model is created, and the top player for the upcoming year is determined, resulting in precise projections. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Customer purchasing intention prediction using machine learning.
- Author
- Kulandaivel, Madhumitha, Agarwal, Rashi, and Singh, Aryan
- Subjects
- *CONSUMERS, *SUPPORT vector machines, *K-nearest neighbor classification, *RANDOM forest algorithms, *IMPULSE buying, *MACHINE learning
- Abstract
Online purchasing services are quickly displacing traditional or physical retailers. Online shopping websites have seen a considerable increase in customer confidence over time. On the one hand, the proliferation of these websites has encouraged fierce competition, which is excellent for customers because it leads to better and more affordable items. This makes online purchasing a fascinating subject for academic study. Retailers are observing a rise in online transactions from their clients as a result of how simple it is to use E-commerce platforms to make purchases. Predictive analytics can be applied to analyze these interactions and provide intricate behavioral patterns that assist businesses in better comprehending the needs of their clients. Online trust and previous online purchase experiences, along with elements like impulse purchase orientation, brand orientation, and quality orientation, all influence customer online purchase intention and shopping orientation. The objective of our project is to build a prediction model that will help in the increment of the profitability of a marketing campaign for a hypothetical corporation. By applying various preprocessing techniques on data and multiple feature engineering methods, along with four machine learning models we have tried to achieve our goal. The final model should allow the company to focus its advertising on customers who are most likely to respond to the campaign while excluding non-respondents. As a result, four different learning classifiers K-Nearest Neighbors Algorithm, Support Vector Machine Algorithm, Logistic Regression, and Random Forest were tested and optimized, and we have achieved the best classification performance using Random Forest. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. A comparative study of machine learning techniques for chronic kidney disease prediction.
- Author
- Jansi, K. R., Kosireddy, Kishore, and Kodumuru, Adarsh
- Subjects
- *CHRONIC kidney failure, *FEATURE selection, *MACHINE learning, *RANDOM forest algorithms
- Abstract
A major global health issue, chronic kidney disease affects millions of individuals. Machine learning algorithms have showed promise in predicting the risk of developing disease, and early detection of chronic kidney disease is essential to preventing or slowing down its course. Machine learning techniques for chronic kidney disease prediction are investigated. Also, the dataset was subjected to the appropriate feature selection technique. The wrapper technique, feature selection, and complete features were used, respectively, to calculate the output of each classifier. Logistic regression classifier, KNN, random forest classifier, Ada Boost, Gradient Boosting, Stochastic Gradient Boosting (SGB), Extra Trees Classifier, and LGBM Classifier are a few of the techniques and models that are examined for chronic kidney disease prediction. Extra Trees Classifier, LGBM Classifier are discussed. Additionally, different features and datasets used in chronic kidney disease prediction are analyzed. Finally, the performance of various machine learning models is evaluated, and future directions for chronic kidney disease prediction research are outlined. Overall, Machine learning algorithms have the potential to significantly improve early detection and management of CKD, thus reducing the burden of this disease on healthcare systems and individuals. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Email spam detection and filtering using machine learning.
- Author
- Asha, P., Siddhartha, Katakam, Manikanta, Kodati Naga Satya Sai, Gopi, Chilukuri, and Mayan, J. Albert
- Subjects
- *SPAM filtering (Email), *SPAM email, *MACHINE learning, *RANDOM forest algorithms, *PHISHING
- Abstract
Phishing assaults, in which the perpetrator masquerades as a legitimate source in order to obtain confidential material, are now a serious threat due to the rapid growth of online consumers damaging one's credibility, costing one's money, or infecting one's computer with spyware and perhaps other viruses. Due to their capacity to sift through large amounts of data in search of patterns that can be used to make predictions, intelligent approaches like ML & DL were finding growing usage in the realm of cybersecurity. In this study, we explore the efficacy of using such clever methods to identify phishing websites. We utilized two different data sets and picked the most highly linked attributes, which included both content-based and URL-lexical/domain-based characteristics. After that, many ML models were implemented, and their relative efficacy was assessed. The results demonstrated the significance of selecting features in raising the quality of the models. In addition, the findings attempted to determine the most useful factors that affect the model when it comes to recognizing phishing websites. When it came to classifying data, the Random Forest (RF) algorithm performed best across the board. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Personality prediction based on Twitter stream.
- Author
- Duggal, Sanchit and Kayalvizhi, R.
- Subjects
- *RANDOM forest algorithms, *NATURAL language processing, *K-nearest neighbor classification, *PERSONALITY, *MACHINE learning, *SOCIAL media
- Abstract
The trend of social media interaction, texts, and hidden identity communication has increased drastically over recent years. With the evolution of various methods for convenient techniques of messaging, social media has evolved to the top most position, for almost anyone in the modern world. A major drawback to this "Anonymous Identity" communication is that it becomes very difficult to identify the actual reality of the person communicating from the other end. This project is purposefully made to implement Natural Language Processing (NLP), Machine Learning (ML), Random Forest Algorithm, and K-Nearest Neighbors (KNN) to understand behavior traits and search techniques, interests, and literal processing of the person, to identify his/her personality, by the input from the most popularly used platform, Twitter. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Rainfall prediction system using machine learning algorithms.
- Author
- Brindha, R., Firoz, Sk. Md Khaja, and Reddy, C. Ramnath
- Subjects
- *ARTIFICIAL neural networks, *BACK propagation, *RANDOM forest algorithms, *PRECIPITATION forecasting, *DECISION trees, *MACHINE learning
- Abstract
Agriculture is vital for survival in India. Rainfall is crucial to agriculture. Predicting rainfall has become a significant issue recently. Rainfall forecasting helps people be prepared and informed of impending rain so they can take the necessary safety measures to preserve their crops from the rain. There are numerous methods available to predict rainfall. Predicting rainfall is where machine learning techniques are most beneficial. XGBoost, Decision Tree, Random Forest, LightGBM, and Logistic Regression are some of the most important machine learning algorithms. The linear and non-linear models, which are both often used, forecast seasonal precipitation. Logistic regression models are the initial models. Rainfall can be predicted when utilising Artificial Neural Networks (ANN) by employing Back Propagation Neural Networks, Decision Trees, and regression models like Random Forest. Due to the atmosphere's dynamic character, applied mathematics techniques are unable to guarantee reliable precision for a statement about precipitation. Regression may be used in the prediction of precipitation utilising machine learning approaches. The goal of this project is to provide non-experts with simple access to the methods and approaches used in the field of precipitation prediction as well as to compare different machine learning techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Cryptocurrency trend analysis and prediction.
- Author
- Ramalingam, V. V., Taruun, P., and Raj, Sarang
- Subjects
- *CRYPTOCURRENCIES, *TREND analysis, *SUPERVISED learning, *VALUE (Economics), *PRICES, *MACHINE learning, *RANDOM forest algorithms
- Abstract
This paper aims to predict the price of a cryptocurrency by taking into account various factors that impact its value. Initially, the researchers analyze the daily market trends and study the optimal conditions affecting the cryptocurrency's pricing. They collect data on multiple aspects of cryptocurrency pricing and payment networks that are recorded daily. Using this information, they intend to make the most accurate prediction of the cryptocurrency's daily price. To achieve this, we use a semi-supervised machine learning model called the Transformer package for sentimental analysis and store the data in a file. We use the Random Forest Classifier as a baseline training machine learning model, and the XGBoost Classifier for improved accuracy and precision to predict the target value for the next day. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Comparison of machine learning techniques for weather prediction.
- Author
- Kothari, Rohit, Kanchan, Anant, and Kanchana, M.
- Subjects
- *MACHINE learning, *RANDOM forest algorithms, *LOGISTIC regression analysis
- Abstract
The direction of numerical projections has actively been developed in recent years by an intensive investigation of processed observational data to identify trends of change and generate numerical data of climatic parameters for the future. Weather forecasting offers knowledge that people and organizations may use to lessen weather-related losses and improve social benefits. This study compares different machine learning models in an effort to identify which one provides the most accurate weather prediction data. In our proposed approach, we employed the models such as GridSearch Cross Validation, Random Forest, Logistic Regression, and Gaussian Naïve Bayes. The Random Forest Tree method, which has a very high accuracy of 99%, is judged to be the best for predicting the weather after evaluation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Book recommendation system using machine learning.
- Author
- Duhan, Anant and Arunachalam, N.
- Subjects
- *RECOMMENDER systems, *MACHINE learning, *RESERVATION systems, *K-nearest neighbor classification, *RANDOM forest algorithms, *SUPERVISED learning
- Abstract
Book suggestions may be used by users to explore and search for books on the internet. Given a vast number of items and descriptions that correlate to the user's requirements, our recommendation system will assist the user in picking the book that best matches the description. The following criteria impact recommendation algorithms: rating, reviews, description, and author. The effectiveness of Book Recommendation Systems is greatly dependent on the classifier utilized. As a result, developing an accurate classifier is critical for improving the performance of recommendation systems. Decision Tree Classifiers stand out among many supervised learning approaches and algorithms due to their high accuracy, fast classification speed, strong learning ability, and ease of design. The framework for a Decision Tree-Based recommendation system is proposed in this study. Among the other significant supervised learning techniques and algorithms are Naïve Bayes, Random Forest, Logistic Regression, and K-Nearest Neighbor. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Detection of Phishing websites using various machine learning techniques.
- Author
- Eliazer, M., Baalaji, Haree, and Abhilash, Chalamcharla Naga
- Subjects
- *PHISHING, *FEATURE selection, *WEBSITES, *RANDOM forest algorithms, *MACHINE learning, *PHISHING prevention
- Abstract
Phishing is a type of cybercrime in which unsuspecting individuals are persuaded to give crucial information to the phishers through spammed messages and phony websites. This is how confidential information gathered is utilized to access money or take people. This study aims to build a phished channel using several machine learning methods. Classification is a machine learning approach that may be used to identify phishers. It creates and tests models using a number of setting combinations, contrasts different machine learning techniques, assesses the accuracy of a created model, and calculates a range of assessment metrics. In the current study, Naïve Bayes (NB) and Random Forest (RF) are two machine learning techniques that are compared for their forecast performance, F1 score, precision, and recall. The approach is also improved by employing feature selection methods, which increases the accuracy in detecting phishing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Design of predictive analysis system for crops using machine learning approaches.
- Author
- Shanmugam, S. and Bardhan, Harsh
- Subjects
- *MACHINE learning, *DATA mining, *AGRICULTURAL forecasts, *RANDOM forest algorithms, *AGRICULTURAL productivity
- Abstract
India depends substantially on agriculture. Agriculture output is impacted by seasonal, organic, and economic factors. The current demographics of our country make it challenging to estimate agricultural production. Long-term crop output forecasts enable farmers to prepare for activities such as selling and storage. Given the huge amount of data involved in crop output predictions, data mining techniques are the right approach. An approach known as data mining serves to harvest anticipated information from enormous datasets that have never been accessed previously. Because data mining helps organizations foresee future trends and behavior, decisions may be made with knowledge. This research uses the Random Forest approach to provide a brief overview of the agriculture yield estimate for a certain area. [ABSTRACT FROM AUTHOR]
- Published
- 2024
23. Lung cancer prediction using random forest, image segmentation and CNN.
- Author
-
Kaushal, Khushi and Balasaigayathri, B.
- Subjects
- *
RANDOM forest algorithms , *LUNG cancer , *MACHINE learning , *EARLY diagnosis , *IMAGE segmentation - Abstract
Cancer is one of the most rapidly increasing diseases and one of the major causes of death today. Lung cancer in particular is increasing at a very fast rate. One defining feature of cancer is the rapid creation of abnormal cells that grow beyond their usual boundaries and can then invade adjoining parts of the body and spread to other organs; the latter process is referred to as metastasis. Many researchers are working on solutions to this rapidly increasing disease, but it will surely take at least a decade to find a proper solution to such a problem. In the meantime, we can work on early detection of the disease so that treatment can be started at an early stage and therefore save the precious lives of patients. In this project, the aim is to develop different algorithms with the help of machine learning and deploy these models to predict whether or not a patient is suffering from lung cancer. I have implemented a Random Forest model, using machine learning and analysis of lung cancer tissues, which gave good accuracy. I have also worked on a CNN model and an image segmentation model and found their respective accuracies. Furthermore, I have compared the models based on their precision, F1-score, recall and accuracy, and I have shown which model most accurately predicts whether or not patients have lung cancer. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. Detection of cyber-attacks using machine learning.
- Author
-
Bandi, Sunil Reddy, Vemula, Deshna, Amaran, Sibi, and SreeKumar, K.
- Subjects
- *
MACHINE learning , *SUPPORT vector machines , *RANDOM forest algorithms , *CYBERTERRORISM , *COMPUTER literacy - Abstract
Cyber-attack detection can identify unknown attacks from network traffic and has been an effective means of network security. Nowadays, existing methods for network anomaly detection are usually based on traditional machine learning models, such as KNN, SVM, etc. In imbalanced network traffic, malicious cyber-attacks can often hide in large amounts of normal data. They show a high degree of stealth and obfuscation on the internet, making it difficult to guarantee the accuracy and timeliness of detection. This paper investigates machine learning and deep learning for cyber-attack detection in imbalanced network traffic. It proposes a new Difficult Set Sampling Technique (DSSTE) algorithm to tackle the class imbalance problem. First, the edited Nearest Neighbor algorithm is used to divide the imbalanced training set into the difficult set and the easy set. Next, the KMeans algorithm is used to compress the majority samples within the difficult set in order to reduce the majority. To verify the proposed method, we conduct experiments on the classic intrusion dataset NSL-KDD and the more up-to-date and comprehensive intrusion dataset CSE-CIC-IDS2018. We use classical classification models such as Random Forest (RF), Support Vector Machine (SVM), XGBoost, MLP, AlexNet, and Mini-VGGNet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
25. Mobile advertisements click through rate prediction using machine learning.
- Author
-
Jacob, Jacinta Ann and Gnanavel, S.
- Subjects
- *
RANDOM forest algorithms , *DECISION trees , *ADVERTISING , *INTERNET advertising , *PREDICTION models , *MACHINE learning - Abstract
Online advertising has a big impact on whether your business succeeds or fails. Because of this, it is crucial to assess your advertisement's effectiveness before posting it online. Finding the Click-Through Rate (CTR) allows for this. Unfortunately, because you must gather user clicks before determining Click-Through Rate, this method is not environmentally friendly. In this situation, CTR prediction is helpful. For forecasting ad Click-Through Rate, user click data is a crucial source of information. Accurate Click-Through Rate prediction for contemporary e-advertising platforms is a challenging and crucial undertaking. Click-Through Rate prediction employs machine learning methods to determine how many times a potential consumer has clicked on an online ad. The more clicks an advertisement receives, the more successful it is. In this paper, we create a machine learning-based Click-Through Rate prediction model. The proposed study defines a model that produces accurate results with minimal use of computational resources. Three classification methods were used, namely logistic regression, a decision tree classifier and a random forest classifier. The Avazu dataset was used for analysis. The click data was generated over a 10-day period and is sorted chronologically.
This study answers the following question: given a user and the page they visit, what is the likelihood that they will click on a particular ad? The Random Forest classifier proved to be the best model, with an accuracy score of 96%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
26. Applying XGBoost, neural networks, and oversampling in the undernutrition classification of school-aged children in the Philippines.
- Author
-
Yiu, Mark Kevin A. Ong, Pastor, Carlo Gabriel M., Candano, Gabrielle Jackie C., Miro, Eden Delight P., Antonio, Victor Andrew A., and Go, Clark Kendrick C.
- Subjects
- *
SCHOOL children , *NUTRITIONAL status , *MACHINE learning , *MALNUTRITION in children , *MALNUTRITION , *BODY mass index , *DEMOGRAPHIC characteristics , *RANDOM forest algorithms - Abstract
In the Philippines, one in five school-aged children is affected by undernutrition, increasing their risk of impaired physical and cognitive development. The Department of Education (DepEd) attempts to address this issue by targeting children with low body mass index (BMI) for its school-based feeding program (SBFP). However, challenges like inadequate measuring tools and supervision in low-resource communities have led to large discrepancies in the nutritional status of SBFP beneficiaries and non-beneficiaries. Siy Van et al. [1] address the difficulties associated with BMI by using machine learning (ML) to predict undernutrition among school-aged children based on socioeconomic and demographic characteristics, dietary diversity scores, and food insecurity scores. Their study compared several ML algorithms and found that their best performing model in terms of accuracy was a random forest (RF) model. However, the RF model had high sensitivity with low specificity, indicating a bias towards the positive class. This study aims to improve these results by employing oversampling techniques and other ML algorithms that were not used in that study. Using the same data set as [1], this study compares four machine learning algorithms (RF, XGBoost, DNN, and NNRF) to predict undernutrition among school-aged children, managing imbalanced data using three oversampling techniques (SMOTE, Borderline-SMOTE, and ADASYN). Eight independent classification tasks for predicting undernutrition were performed, and results showed that an RF-Borderline model performed the best in terms of Cohen's κ (0.3662), with an accuracy of 71.61%, a sensitivity of 71.13%, and a specificity of 73.08%. While RF performed the best overall, XGBoost and NNRF performed better than RF on specific tasks. Notably, incorporating oversampling consistently enhanced model performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
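The metrics named in the abstract above (accuracy, sensitivity, specificity, Cohen's κ) can all be derived from a binary confusion matrix. The following is a minimal sketch only: the function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute accuracy, sensitivity, specificity and Cohen's kappa
    from the four cells of a binary confusion matrix.

    Assumes each class has at least one true example and that chance
    agreement is below 1, so no denominator is zero.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n          # observed agreement
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate

    # Agreement expected by chance, from the row and column marginals.
    p_positive = ((tp + fn) / n) * ((tp + fp) / n)
    p_negative = ((fp + tn) / n) * ((fn + tn) / n)
    expected = p_positive + p_negative

    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

A model biased towards the positive class shows up here as sensitivity well above specificity, while κ discounts the agreement that the class marginals alone would produce.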
27. Analyzing technical, sentimental, and machine learning algorithms for stock market prediction.
- Author
-
Sharma, Akshat, Lohumi, Yogesh, Gangodkar, Durgaprasad, and Goyal, Ashtha
- Subjects
- *
PROCESS capability , *STOCK prices , *INVESTORS , *STOCKS (Finance) , *RANDOM forest algorithms , *MACHINE learning - Abstract
In the financial market, the use of machine learning algorithms for predicting stock prices is particularly advantageous. Stock prices, a critical factor influencing decisions made by traders, investors, and large firms, are subject to volatility influenced by various external factors, rendering prediction a complex endeavor. Accurate predictions, however, play a pivotal role in optimizing profitability within the realm of stock trading. Machine learning algorithms, endowed with the capacity to autonomously learn and improve, are well-suited to this task when integrated with sentiment and technical indicators. They possess the capability to process vast volumes of historical data, unveiling patterns that may elude immediate human perception. This paper presents an analysis that incorporates sentimental and technical analyses and employs machine learning algorithms viz. Logistic Regression, Random Forest, and Decision Tree for the prediction of stock prices. These algorithms, when combined with technical and emotional analysis, serve to efficiently forecast market behavior. [ABSTRACT FROM AUTHOR]
- Published
- 2024
28. Predictive modeling of car sales using random forest regression: Leveraging diverse features for accurate sales projections.
- Author
-
Swami, Bhavya, Singh, Prabh Deep, and Chauhan, Akash
- Subjects
- *
RANDOM forest algorithms , *PREDICTION models , *AUTOMOBILE sales & prices , *AUTOMOBILE detailing , *AUTOMOBILE industry , *SALES forecasting , *MACHINE learning - Abstract
This work addresses the urgent demand for exact sales estimates in the automotive industry by utilising Random Forest Regression, a potent predictive modelling technique. Compared to conventional forecasting techniques, this approach gives a more detailed grasp of market trends and consumer preferences. It works especially well in the dynamic automotive business, which is characterised by quick technological change and shifting consumer preferences. The study stresses how important accurate sales forecasts are for manufacturers and dealerships to plan production schedules, manage supply chains, and handle inventory. In order to provide a thorough overview of the industry, the study takes into account a wide range of vehicle characteristics, including engine characteristics, safety features, decorative components, and technology improvements. The methodology covers data collection, preprocessing, feature engineering, model selection, training, and evaluation, and uses the RandomForestRegressor model because it can handle complicated interactions and non-linear patterns. The study also includes a user-friendly web interface that allows users to enter specific automobile details and get sales projections. The project aims to improve the accuracy and granularity of automobile sales predictions by merging cutting-edge machine learning algorithms with substantial car characteristic data, giving industry stakeholders useful insights for informed decision-making in this quickly evolving sector. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Urban SO2 levels prediction using machine learning.
- Author
-
Prabha, Gayathri Narayanan, Harshith, Akula Venkat, Rajesh, Uthara, Omanakuttan, Vishnu Kesav, and Shiju, Amrita Varshini
- Subjects
- *
CITIES & towns , *RANDOM forest algorithms , *DECISION trees , *MANUFACTURING processes , *REGRESSION trees , *MACHINE learning - Abstract
The pristine images of the skies are deceiving. In the air that living beings breathe lie invisible hazardous particles that can harm them. One such compound is sulfur dioxide. This gaseous compound is produced by the combustion of sulfur-containing fuels and by various industrial processes. Accurate prediction of urban air quality is crucial for well-being. This research delves into a novel approach for forecasting SO2 concentrations in cities. The proposed method leverages readily available hourly data, over the span of a month, extracting informative trend attributes from past SO2 values. Through the application of machine learning regression models, this paper provides an innovative approach to predicting these SO2 values, utilizing data from 5 diverse cities. We achieved our aim of identifying the best performing models, Decision Tree Regression and Random Forest Regression, among all the models compared in this study according to the performance metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
30. Fake review detection using machine learning.
- Author
-
Kumar, G. Kiran, Mahammad, Farooq Sunar, Chaitanya, V. Lakshmi, Moulali, Ch., Rajasekhar, K., Rangaiah, D. N. L., and ImamVali, D.
- Subjects
- *
MACHINE learning , *RANDOM forest algorithms , *K-nearest neighbor classification , *TRAIL running , *ONLINE shopping , *CONSUMERS - Abstract
With the continuous growth of online purchasing, customers read the reviews of a product before buying it. Reviews play the main role in purchasing products online. Positive reviews mainly lead customers to buy a product online; similarly, negative reviews create confusion about whether or not the customer should buy the product. Product sellers write fake reviews to satisfy customers. Thus, identifying fake reviews is an active and ongoing research area. Identifying a fake review depends not only on key features of the review but also on the behavior of the reviewers. This paper proposes a machine learning approach to the identification of fake reviews. In addition to the feature extraction process for the reviews, this paper applies several feature engineering techniques to extract various behaviors of the reviewers. This paper compares the performance of several trial runs on a real dataset of online shopping reviews, with and without features extracted from users' behavior. In both cases, we compare the performance of several classifiers: K-Nearest Neighbor (KNN), Naive Bayes, Logistic Regression, and the Random Forest algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
31. Comparison of heart disease prediction between basics and a Hybrid machine learning (ML) technique.
- Author
-
Yu, Soo Jun and Dollmat, Khairi Shazwan
- Subjects
- *
HEART diseases , *FEATURE selection , *MACHINE learning , *SUPPORT vector machines , *K-nearest neighbor classification , *RANDOM forest algorithms - Abstract
Heart disease is one of the main causes of death worldwide. Machine learning has been found to be useful in making predictions from massive amounts of data. Machine learning techniques have also been used in recent advances in a variety of fields such as medicine, finance and even retail. In this research, we used several traditional ML techniques, namely K-Nearest Neighbors (K-NN), Support Vector Machine (SVM), Naïve Bayes, Decision Tree, Random Forest and Logistic Regression, and a hybrid ML technique that combines Random Forest, SVM and K-NN. We achieved a great performance level, with a 63.33% accuracy rate, using the hybrid ML model to predict heart disease. Before applying machine learning techniques, we used feature selection methods, including BORUTA and RFE, to identify the top 10 variables in the dataset, and compared the results with models built without feature selection in order to build an effective predictive model. In addition, several performance metrics are used to evaluate the results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
32. Predicting graduate-on-time using machine learning.
- Author
-
Ahmad, Intan Khairina Adlina, Ting, Choo-Yee, Goh, Hui-Ngo, Quek, Albert, and Cham, Chin-Leei
- Subjects
- *
GRADUATE education , *RANDOM forest algorithms , *DECISION trees , *GRADE point average , *REGRESSION analysis , *MACHINE learning - Abstract
Predicting academic performance is a crucial task for educators and institutions because it enables the early identification of at-risk students and helps provide targeted interventions to improve their academic outcomes. Existing research often focuses on predicting academic performance using CGPA; less work, however, has used graduate-on-time (GOT) as a dependent variable. In this study, the objectives were to (i) determine the optimal set of features that influence the predictions and (ii) construct a predictive model that predicts academic performance with a focus on graduating on time (GOT). The data, obtained from the Ministry of Higher Education Graduates Tracer Study, contains information about graduated MMU students. It has 2382 entries and 95 columns, which include records of Graduate On Time (GOT), Cumulative Grade Point Average, Estimated Terms, and many more. This paper employed machine learning techniques such as Gaussian Naive Bayes, Decision Tree, Logistic Regression, Random Forest, Gradient Boosting, Stacking Ensemble methods and Multilayer Perceptron. The results showed that, among all the techniques, the Ensemble model exhibits the highest accuracy (84.03%), precision (84.86%), and recall (90.57%), as well as an F1-score of 87.62%. The Random Forest model and the Logistic Regression model both have an F1-score of 84%, which places them second after the strong results of the Ensemble technique. [ABSTRACT FROM AUTHOR]
- Published
- 2024
33. Predicting ethylene production in oxidative coupling of methane: A machine learning approach.
- Author
-
Dhivyaprabha, T. T., Susan, M. B. Jennyfer, Lalitha, P., Subashini, P., Kamilini, D., and Vijayabhanu, R.
- Subjects
- *
MACHINE learning , *ETHYLENE , *NATURAL gas , *SUPPORT vector machines , *RANDOM forest algorithms , *OXIDATIVE coupling - Abstract
Oxidative Coupling of Methane (OCM) is a chemical reaction that directly converts natural gas, which consists primarily of methane, into value-added chemicals, specifically ethylene. The OCM process faces challenges of limited yield and high separation cost in transforming natural gas into useful chemical products. Ethylene serves various purposes, including the production of fabricated plastics, and is widely used as a raw material. This paper focuses on machine learning algorithms to predict ethylene production using a catalysis dataset. Initially, high ethylene yield is predicted for the 18 catalysts collected from the dataset. The K-Means and agglomerative clustering algorithms are used to identify high-yield ethylene. Among the 18 catalysts, 9 elements are identified as producing the largest amounts of ethylene in the catalysis data. Prediction models are then used to predict high ethylene yield from the catalysis data. The prediction models Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Gradient Boosting (GB) are compared to identify the best model for predicting ethylene yield. Based on the analysis, agglomerative clustering together with Random Forest predicts the higher ethylene composition from the OCM dataset. The ethylene production of the Magnesium (Mg), Barium (Ba) and Zirconium (Zr) chemical compounds, along with oxygen and methane, is predicted with error rates of 0.17, 0.89 and 0.99 respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
34. Stellar classification using supervised machine learning.
- Author
-
Swathi, S., Saranya, S., Vijayalakshmi, K., and Aswini, V.
- Subjects
- *
SUPERVISED learning , *NAIVE Bayes classification , *MACHINE learning , *RANDOM forest algorithms , *DECISION trees , *K-nearest neighbor classification - Abstract
In this work, objects from Sloan Digital Sky Survey Data Release 17 (SDSS DR17) [1] were classified as dwarfs or giants using learning methods. The classification was created by supervised learning. Machine learning algorithms like K Nearest Neighbors, Naive Bayes Classifier, Support Vector Classification and Random Forest were developed in addition to Decision Trees. The algorithm that performed the worst was naive Bayes. Random Forest, which performed the best, was able to successfully classify the cases in the dataset that were marked as stars. [ABSTRACT FROM AUTHOR]
- Published
- 2024
35. An intelligent intruder framework for cyber-attacks using machine learning techniques.
- Author
-
Sarla, Pushpalatha, Jamalpur, Bhavana, and Chandhar, K.
- Subjects
- *
DENIAL of service attacks , *MACHINE learning , *INTERNET protocol address , *RANDOM forest algorithms , *COMPUTER network security - Abstract
Attacks like Distributed Denial of Service (DDoS) pose a major threat to the network's security. Many different firms' servers have fallen prey to such unusual types of attacks. These attacks from the many bots under the direction of the botmaster (cracker) can possibly result in the victim's computational and communication capabilities being severely impaired. To create an effective NIDS, the researchers used datasets that were made accessible to the public. Existing studies' datasets, however, are insufficient since they exclude the most commonly used protocols, such as DHCP, which is essential to network architecture. In a network, IP addresses and other crucial network setup settings are dynamically assigned via the Dynamic Host Configuration Protocol (DHCP). Two research inquiries serve as the foundation for this work: 1) what algorithm will get the best results for identifying Distributed Denial of Service attacks? 2) How accurate would these algorithms be if they were trained on real-world data? We exceeded 96% accuracy with the Random Forest Classifier, and we confirmed our findings using two measures. The results were also compared to other works to ensure that they were adequate. We also provide a thorough study to back up our conclusions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
36. Spectacles prediction using machine learning algorithms.
- Author
-
Amulya, Soma, Sudeeptha, K., Sathvika, G., and Veeramsetty, Venkataramana
- Subjects
- *
RANDOM forest algorithms , *DECISION trees , *LOGISTIC regression analysis , *EYESTRAIN , *REGRESSION analysis , *MACHINE learning - Abstract
Eye defects in teenagers are becoming more prevalent these days, especially due to the pandemic situation. Digital eye strain due to excessive usage of gadgets can result in a prescription for spectacles, which is a basic test in any ophthalmology clinic, but due to the current pandemic situation, visiting a clinic might be risky. In this project, we propose an AI (ML) model for spectacles prediction based on a few input parameters entered through an app. AI machine learning models are used widely in the medical sector. Here, we applied three different machine learning models (logistic regression, decision tree and random forest) to obtain the most accurate prediction using the dataset we collected from teenagers. We achieved the highest accuracy, 97%, using the logistic regression model and built a website to predict spectacles. [ABSTRACT FROM AUTHOR]
- Published
- 2024
37. Prediction and analysis of cricket batsman performance using neural networks.
- Author
-
Sreenivasgoud, Pulluri, Sirajuddin, Mohammad, Sridhar, Kankanala, Sagar, Rachoori, and Venkatesh, Thudum
- Subjects
- *
ARTIFICIAL neural networks , *MACHINE learning , *CRICKET (Sport) , *AWARENESS advertising , *RANDOM forest algorithms - Abstract
Cricket is a highly popular game with great uncertainty, in which a single ball can change the result. Players must concentrate fully in T20 and One-Day matches without losing their presence of mind. Sponsors invest heavily in the game and its players for brand advertising. The outcome of a match depends on various parameters such as player performance, pitch, team bonding, etc. So, selecting players and forming a team is essential for sponsors. Predicting player performance based on previous records takes a lot of work, so many researchers have taken steps to analyze the data. Machine learning algorithms, such as Linear Regression, SVM, Random Forest, etc., have been implemented in various research studies. In our study, we have implemented an Artificial Neural Network (ANN) model to predict player batting performance based on the kaggle.com dataset. In our results, the ANN provided 86.21% accuracy in predicting a player. [ABSTRACT FROM AUTHOR]
- Published
- 2024
38. Prediction of cardiac attack using ensemble learning.
- Author
-
Eranki, Kiran L. N., Kalerua, Sai Chandu, Pilli, Tejaswi, Alajpur, Chandana Priya, and Tejaswini, Velpula
- Subjects
- *
MACHINE learning , *DIETARY patterns , *SUPPORT vector machines , *K-nearest neighbor classification , *FEATURE selection , *RANDOM forest algorithms , *DATA privacy , *MYOCARDIAL infarction , *HUMAN fingerprints - Abstract
Cardiac attack is considered to have high mortality due to lifestyle habits and dietary regimes. Predicting heart disease based on patients' medical history is critical. As the privacy of sensitive data is also a concern, collecting digital fingerprints from the vast amount of data generated by the healthcare sector is made easier with the help of Machine Learning (ML). With advances in technology, the application of the internet of things (IoT) in healthcare is also gaining popularity. The application of algorithms to predict heart attack is also evident. In the current work, we propose to predict cardiac arrest using patients' medical history data and to select the most suitable features based on a comparative analysis among the ML models. Different feature selection metrics and classification models are used for prediction. ML algorithms such as Regression, Decision Tree, Random Forest, Support Vector Machine, Naive Bayes, and K-Nearest Neighbor have been used. Our results show the importance of medical history in the prediction and diagnosis of cardiac disease. [ABSTRACT FROM AUTHOR]
- Published
- 2024
39. Mineral identification on hyperspectral imagery of rock samples using machine learning.
- Author
-
Qudsi, Izzul, Fakhrurrozi, Afnindar, Mordekhai, Gustandika, Praviandy, and Noor, Muhammad Rifat
- Subjects
- *
CONVOLUTIONAL neural networks , *SUPERVISED learning , *RANDOM forest algorithms , *MINERALS , *IDENTIFICATION , *GEOLOGICAL mapping , *MACHINE learning - Abstract
Mineralogy plays an important role in the early stages of earth resource exploration. In the past decades, geologists started to use hyperspectral in various scales of mineral identification, from hand specimen samples to airborne geological mapping surveys. Performing conventional mineral classification with hyperspectral methods is time-consuming and requires significant human intervention, from the endmember selection for each mineral to validating the classes manually in laboratory analysis. In this study, we introduced supervised machine learning algorithms to stimulate the mineral mapping process of a large dataset of core data. Three hyperspectral imageries of milled pebbled samples were used where one of the samples was pre-identified and used for training machine learning models to identify the mineralogy of the other two samples. The samples contain four minerals; namely Muscovite, Tourmaline, Illite and High-Crystallinity Illite, that will be auto-identified by the machine learning algorithms. In this study, Random Forest and Convolutional Neural Network were the selected algorithms to perform the mineral identification. Both algorithms produce high-accuracy mineral maps compared to the existing mineral maps from the previous study. The Convolutional Neural Network struggled to identify High-Crystallinity Illite, whilst Random Forest succeeded in separating High-Crystallinity Illite from other minerals. Thus, the Random Forest algorithm produces higher accuracy results. The proposed workflow provides a time-efficient alternative methodology for further mineral mapping process on a larger scale dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
40. Rate of penetration prediction using machine learning.
- Author
-
Murti, Gendro Wisnu and Wardana, Raka Sudira
- Subjects
- *
RANDOM forest algorithms , *FORECASTING , *PREDICTION models , *ARCHITECTURAL design , *MACHINE learning - Abstract
Rate of penetration (ROP) prediction is carried out using a recorded drilling dataset from 02-Well. The prediction approaches use a machine learning random forest regressor model and an artificial neural network, specifically the MLP regressor. The aim is to build the machine learning model that most accurately predicts the ROP parameter at 02-Well. The method consists of designing the architecture of the machine learning model, divided into five stages: exploratory data analysis, data pre-processing, prediction and modeling using the selected algorithm, hyper-parameter tuning, and model evaluation. The 02-Well dataset is divided into a 70% training set and a 30% test set as the base case. The model evaluation results show that modeling using a random forest regressor has a mean absolute percentage error (MAPE) score of 19.81%, which falls under the "Good Forecasting" criteria. Meanwhile, modeling using the MLP regressor has a MAPE score of 22.84%, which falls under the "Reasonable Forecasting" criteria. [ABSTRACT FROM AUTHOR]
- Published
- 2024
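The evaluation described in the abstract above (a 70%/30% split scored by MAPE) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the helper names `split_70_30`, `mape` and `forecast_label` are hypothetical, the split is assumed to be a simple ordered cut, and the interpretation bands (under 10%, 10 to 20%, 20 to 50%, above 50%) follow a commonly cited convention that the abstract itself does not spell out.

```python
def split_70_30(rows):
    """Split an ordered list into the first 70% (train) and last 30% (test).

    Assumption: a simple ordered cut; the abstract does not say whether
    the split was ordered or shuffled.
    """
    cut = (len(rows) * 7) // 10  # integer arithmetic avoids float rounding
    return rows[:cut], rows[cut:]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes equal-length, non-empty inputs with no zero actual values.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def forecast_label(mape_percent):
    """Map a MAPE score to a qualitative band (assumed thresholds)."""
    if mape_percent < 10:
        return "highly accurate"
    if mape_percent < 20:
        return "good"
    if mape_percent < 50:
        return "reasonable"
    return "inaccurate"


# Hypothetical values, for illustration only.
train, test = split_70_30(list(range(10)))
print(len(train), len(test))
print(mape([100.0, 200.0], [110.0, 180.0]))
print(forecast_label(19.81), forecast_label(22.84))
```

Under these assumed bands, the two MAPE scores quoted in the abstract fall on either side of the 20% boundary, which matches the "Good" and "Reasonable" labels it reports.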
41. A comparative analysis for detecting fake news using supervised learning algorithms.
- Author
-
Dixit, Dheeraj Kumar, Bhagat, Amit, and Dangi, Dharmendra
- Subjects
- *
FAKE news , *COMPARATIVE studies , *RANDOM forest algorithms , *DECISION trees , *DATA scrubbing , *MACHINE learning - Abstract
Fake news is a serious problem on social media. The rapid circulation of fake news can have disastrous effects on individuals and society. Thus, it has become increasingly important to detect fake news on social sites and the internet. Recently, many models have been developed to detect fake news from publicly available datasets. This paper discusses various machine learning algorithms and analyzes their performance on two different news datasets. The proposed framework is a two-step process. In the first step, the data is cleaned and features are extracted using TF-IDF and the Hashing Vectorizer. In the second step, machine learning algorithms (Logistic Regression, Decision Tree, Random Forest, Multinomial Naive Bayes, and Passive Aggressive classifier) are applied in an effective and efficient manner. Comparative analysis revealed that the optimal performance is achieved by Logistic Regression and the Passive Aggressive classifier, 95.45% and 97.35% respectively, for the two public datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
42. A prediction of operating systems vulnerabilities using machine learning algorithms.
- Author
-
Barge, Komal and Kamble, Vijaya
- Subjects
- *
RANDOM forest algorithms , *SUPPORT vector machines , *MACHINE learning , *LOGISTIC regression analysis - Abstract
Cybercrime has been on a massive increase for several years. As more individuals conduct business and live their lives online, more criminals are turning to the internet to steal. With developments in cybersecurity, the adoption of best practices, cybersecurity awareness initiatives, and increased regulations and partnerships between businesses and governments to resolve the problems, cybercriminals may be feeling the pressure. In this paper, we predict operating system vulnerabilities using machine learning algorithms and evaluate the accuracy of those algorithms after training them. The aim of the paper is to use machine learning algorithms to accurately predict operating system vulnerabilities. Different ML algorithms have been analyzed, namely Logistic Regression, Support Vector Machine and Random Forest. Of these algorithms, Random Forest had the highest accuracy, 99.33%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
43. Efficient prediction in forest fire alert system using logistic regression and a novel tree specific random forest classifiers.
- Author
-
Krishna, A. Jaya and Padmakala, S.
- Subjects
- *
REGRESSION trees , *RANDOM forest algorithms , *LOGISTIC regression analysis , *FOREST fires , *MATHEMATICAL logic , *FOREST fire management , *FOREST fire prevention & control , *MACHINE learning - Abstract
This study compares the accuracy and precision with which Logistic Regression (LR) and Random Forest (RF) can predict forest fires. The Random Forest method outperforms other machine learning techniques in fire detection speed and accuracy. To better detect forest fires, a system that compares Random Forest with Logistic Regression was developed. Using G power, we determined that 28 people were needed for each experiment. The sample size for the pilot study was calculated at a 95 percent confidence level. The dataset shows that the Random Forest (RF) model is 95 percent accurate in predicting forest fires, whereas the Logistic Regression (LR) model is 62 percent accurate. The p-value for Logistic Regression is 0.005, but Random Forest is a superior classifier. In terms of accuracy and precision, Random Forest is superior to Logistic Regression. [ABSTRACT FROM AUTHOR]
- Published
- 2024
44. Improving the accuracy of predicting the cybercrime using novel random forest algorithm over support vector machine.
- Author
-
Kumar, K. Ajith and Kumaran, J. Chenni
- Subjects
- *
RANDOM forest algorithms , *SUPPORT vector machines , *MACHINE learning - Abstract
The purpose of this study was to assess the accuracy of cybercrime predictions provided using Novel Random Forest and Support Vector Machine. Material and Procedure: Cybercrime predictions were made using the new random forest (N=10) and the support vector machine (N=10), which were then analysed. Random forest trumps SVM in terms of precision (by a margin of 84 percent to 85 percent). 81 percent is the percentage. The advanced random forest classifier performed better than the conventional SVM classifier. There is a significant difference in accuracy between the two approaches (p>0.005). The development of the new cybercrime prediction system used machine learning. This novel random forest technique outperformed SVM. [ABSTRACT FROM AUTHOR]
- Published
- 2024
45. Comparison of novel reinforcement learning with random forest algorithm to improve prediction rate of social vulnerability in web application.
- Author
-
Koushik, P. N. N. J. and Rama, A.
- Subjects
- *
RANDOM forest algorithms , *MACHINE learning , *WEB-based user interfaces , *SOCIAL prediction - Abstract
This study evaluates the prediction rate of social vulnerability in a web application by comparing the F1-measure of a novel Reinforcement Learning algorithm with that of the Random Forest method. Components and Techniques: The outcome is the F1-measure of the Random Forest algorithm and of the novel Reinforcement Learning algorithm (83.85 percent). A total of 55,336 samples are analysed, split evenly between two groups. Discussion and Results: With an F1-measure of 83.85 percent, the novel Reinforcement Learning algorithm outperforms the more traditional Random Forest approach (74.58 percent) for detecting social vulnerability. In this study's final analysis, the novel Reinforcement Learning algorithm was shown to be more accurate in predicting social vulnerability than the Random Forest technique. [ABSTRACT FROM AUTHOR]
- Published
- 2024
46. Medical image prediction for diagnosis of breast cancer disease comparing the machine learning algorithms: SVM, KNN, logistic regression, random forest and decision tree to measure accuracy.
- Author
-
Dinesh, Paidipati, Vickram, A. S., and Kalyanasundaram, P.
- Subjects
- *
RANDOM forest algorithms , *DECISION trees , *COMPUTER-assisted image analysis (Medicine) , *CANCER diagnosis , *DIAGNOSTIC imaging , *LOGISTIC regression analysis , *MACHINE learning - Abstract
The study's primary objective is to compare the efficacy of the state-of-the-art SVM method for image prediction with that of KNN, Logistic Regression, Random Forest, and Decision Tree. The UCI Machine Learning Repository provides a total of 569 samples. SVM, KNN, Decision Tree, Random Forest, and Logistic Regression are applied to the samples after they have been separated into benign and malignant cells so that their respective performances may be compared. G power calculation is used to determine how many samples are needed for this analysis. The maximum acceptable error is set at 0.5, and the minimum power of analysis is set at 0.8. Predictions made using Logistic Regression appear to have a higher accuracy (95%) than those made using SVM, KNN, Decision Tree, or Random Forest (92%, 90%, 85%, and 91%, respectively). This proposed system has a probability importance of 0.55. The Wisconsin dataset was used to compare Logistic Regression against SVM, KNN, Decision Tree, and Random Forest for the detection of breast cancer. [ABSTRACT FROM AUTHOR]
- Published
- 2024
47. Comparative study on improved support vector machine and random forest classifier for efficient classification of music genre based on accuracy.
- Author
-
Rajyalakshmi, Kona and Gunasekaran, M.
- Subjects
- *
SUPPORT vector machines , *RANDOM forest algorithms , *POPULAR music genres , *DIGITAL music , *SIGNAL processing , *MACHINE learning - Abstract
The purpose of this study is to apply machine learning methods to the problem of classifying musical genres. Here, signal processing is used to extract the rhythm pattern feature, which is then used in conjunction with machine learning algorithms to provide a multiclass categorization of musical genres. A preexisting internet music collection is used to generate two distinct datasets, one with undersampling and the other with oversampling to achieve class balance. Two types of models were compared for their efficacy in this research. The first model is an enhanced support vector machine that is fully trained to identify the musical genre of an input signal from its spectrogram. The Random Forest Classifier is the second model. G power for this research is 80%; there are two groups in total, and each has a sample size of 20. Experiments performed on the audio dataset show that an ensemble classifier that makes use of both techniques achieves an AUC of 0.894. According to the findings, the support vector machine method provides a higher accuracy (69.24%) than the Random Forest Classifier (64.26%). The independent sample T-test gave a significance level of P=0 (P<0.05, 2-tailed). The findings showed that the accuracy of the music genre classification using the enhanced support vector machine with a unique rhythm pattern feature method was higher than using the Random Forest Classifier. [ABSTRACT FROM AUTHOR]
- Published
- 2024
48. Classification of fire and smoke images using random forest algorithm in comparison with decision tree to measure accuracy, precision, recall, F-score.
- Author
-
Reddy, B. Haranadh, Vickram, A. S., and Karthikeyan, P. R.
- Subjects
- *
DECISION trees , *RANDOM forest algorithms , *TOBACCO smoke , *SMOKE , *FIRE detectors , *MACHINE learning , *CLASSIFICATION - Abstract
The purpose of this research is to compare and contrast how well the random forest algorithm and the decision tree method perform in classifying fire and smoke photographs. With a g power of 0.8 and an alpha of 0.05, we find that out of a total of 4000 photos, 2000 depict fire and the remaining 2000 depict smoke. In this case, we split the dataset into a training set (n=3200, or 80%) and a validation set (n=800, or 20%). Classification tasks are executed with the help of the Sklearn machine learning package in Python. Precision, recall, F-score, and accuracy numbers are some of the metrics used to measure an algorithm's effectiveness. When compared to the Decision Tree algorithm's 90.54, 90.00, 90.9, and 89.70 percent accuracy, F-score, recall, and precision, Random Forest achieved 97.42, 97.28, 97.43, and 97.15 percent values, respectively (p<0.001, 2-tailed). The results of this research show that the random forest method outperforms the decision tree algorithm by a wide margin. [ABSTRACT FROM AUTHOR]
- Published
- 2024
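The four metrics reported in the fire and smoke abstract above (accuracy, precision, recall, F-score) can be sketched from first principles as follows. This is a minimal illustration, not the authors' code: the function name `classification_metrics`, the choice of "fire" as the positive class, and the five toy labels are all assumptions made for the example.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Hypothetical validation labels, not data from the paper.
y_true = ["fire", "fire", "fire", "smoke", "smoke"]
y_pred = ["fire", "fire", "smoke", "smoke", "fire"]
print(classification_metrics(y_true, y_pred, positive="fire"))
```

In the study itself these values would be computed on the 800-image validation split (20% of 4000) rather than on a handful of labels.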
49. Air pollution hotspot identification to prevent post effects of pollution using naive bayes over random forest.
- Author
-
Sairam, M. Jagadeesh and Nagaraju, V.
- Subjects
- *
RANDOM forest algorithms , *AIR pollution , *MACHINE learning , *POLLUTION , *IDENTIFICATION , *STANDARD deviations - Abstract
In this study, we compare Naive Bayes (NB) and Novel Tree Specific Random Forest (RF) for their ability to forecast the downstream impacts of air pollution. Naive Bayes has a greater recognition rate than other machine learning algorithms and is thus more effective in detecting air pollution. Naive Bayes over Novel Tree Specific Random Forest is presented and developed as a framework for identifying air pollution to mitigate its aftereffects. Using G power, we determined that we needed 96 participants in each group. The sample size, consisting of 48 individuals per group, was determined using a pretest power of 95%. The goal of this study is to enhance the performance of the algorithm throughout the harvesting process. Average detection accuracy is within one standard deviation, suggesting that the suggested model is both effective and quicker than the current approach. The level of significance is 0.006 (p<0.05). Findings highlight Naive Bayes' superior accuracy and precision compared to Random Forest. [ABSTRACT FROM AUTHOR]
- Published
- 2024
50. Evaluation of income class classification using random forest algorithm.
- Author
-
Zaid, Mohamed, Sathish, T., and Shibi, C. Sherin
- Subjects
- *
RANDOM forest algorithms , *INCOME , *LOGISTIC regression analysis , *STATISTICS , *MACHINE learning - Abstract
The goal of this study is to utilise machine learning classifiers to categorise people's income levels, namely those with and without a household income of $50,000 or more. We examine two common algorithms in this study: Random Forest and Logistic Regression Algorithm. This experimental inquiry makes use of the Adult Income dataset, which has 32516 items. An experiment with N=10 repetitions was carried out in order to discriminate between income groups of more than $500,000 and less than $40,000. For statistical reasons, the G-power test is run at 80%. The trials demonstrate that the Random Forest Algorithm has an average accuracy of 84.1840, whereas the Logistic Regression Algorithm has an average accuracy of 79.6410. Statistical analysis reveals a substantial difference in accuracy between the two techniques, with a p-value of 0.032 for the t-test on independent samples. The results reveal that the Random Forest Algorithm outperforms the Logistic Regression Algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
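The income classification abstract above compares two classifiers with a t-test on independent samples over N=10 repetitions. A minimal sketch of that statistic, assuming the classic pooled-variance (Student) form, is shown below. The function name `independent_t_test` and the per-run accuracies are hypothetical and are not taken from the paper; the abstract does not state which variant of the test was used.

```python
import math
from statistics import mean, variance


def independent_t_test(a, b):
    """Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, df


# Hypothetical accuracies (percent) from 10 repeated runs of each model.
rf_runs = [84.0, 84.5, 83.9, 84.3, 84.1, 84.6, 83.8, 84.2, 84.4, 84.0]
lr_runs = [79.5, 79.9, 79.4, 79.8, 79.6, 80.0, 79.3, 79.7, 79.6, 79.5]

t_stat, df = independent_t_test(rf_runs, lr_runs)
print(f"t = {t_stat:.2f}, df = {df}")
```

Turning the statistic into a p-value requires the t distribution; where SciPy is available, `scipy.stats.ttest_ind(rf_runs, lr_runs)` returns both the statistic and a two-tailed p-value.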