728 results on '"RANDOM forest algorithms"'
Search Results
2. Improved accuracy in automatic deduction of cyberbullying using recurrent neural network compare accuracy with random forest.
- Author
- Reddy, Babu and Ramkumar, G.
- Subjects
- *CYBERBULLYING, *RANDOM forest algorithms, *SAMPLE size (Statistics), *RECURRENT neural networks
- Abstract
This study compares the accuracy of a random forest classifier with a revolutionary recurrent neural network-based cyber bullying deduction. Supplies and Methods: Two businesses employ this strategy. This research used the Random Forest technique in Group 2 & Recurrent Neural Networks in Group 1 to analyze 20 samples from each group in order to evaluate the validity of the new Deduction of Cyber bullying. G power was used to calculate the sample size, and the pretest power was fixed at 80%. Findings: Random forest has an accuracy of 92.36%, whereas RNN has 94.55%. A statistically significant difference of 0.18 (p>0.05) has been discovered. To sum up, recurrent neural network methods outperform random forest classifiers in terms of accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Improving the accuracy of intrusion detection system in the detection of DoS using Naive Bayes with random forest feature elimination and comparing with Naive Bayes without feature elimination in wireless ad hoc network.
- Author
- Kumar, A. Senthil and Nagalakshmi, T. J.
- Subjects
- *AD hoc computer networks, *RANDOM forest algorithms, *STATISTICS, *INTRUSION detection systems (Computer security)
- Abstract
Developing and evaluating a new intrusion detection system (IDS) that combines random forest component removal with naive bayes is the main goal of the study. We used the NSL-KDD Dataset to evaluate the system and methodology. In each group, 19 out of the 38 samples that were gathered were used. We performed the statistical analysis using the SPSS software. For both groups, an independent sample T test was run at an accuracy level of 0.002 (p<0.05). When features are eliminated from an Innovative Naive Bayes Intrusion Detection System (IDS), the average accuracy is 0.7432; when features are maintained, the average accuracy is 0.5874. Thus, there is no appreciable difference in the two groups' mean accuracy when employing the Innovative Naive Bayes with Random Forest Feature Elimination Intrusion Detection System (IDS). [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Air prediction analysis based on accuracy for air quality index using modified random forest novel technique in comparison with support vector regression.
- Author
- Maheshwar, Thaginapelli and Sharma, K. Jai
- Subjects
- *AIR quality indexes, *AIR analysis, *RANDOM forest algorithms, *AIR quality, *CITIES & towns
- Abstract
A prototype model to predict the pollutants level based on air quality index using Modified Random Forest Novel Technique (MRFNT) in comparison with Support Vector Regression (SVR). The proposed model, MRFNT uses the bootstrap and bagging technique on the nonlinear data, the pollutants level is predicted accurately. To predict the air quality, the Air Quality Index CPCB dataset was collected from the National Air Quality Index aided by the Indian government. For better proficiency, bagging error and unbiased data points pre-processed to remove outliers and mix up with continuous categorical variables. The statistical analysis was performed using sample size 20 for each group to perform comparison. From the observed results, MRFNT has 99.96% and SVR has 98.36% accuracy and has better significance of 0.001 for the Confidence Interval (CI) 95% and Significance Value (p<0.05). Based on the result comparison, it shows that MRFNT significantly better than SVR in air quality prediction in various cities in India. [ABSTRACT FROM AUTHOR]
- Published
- 2024
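Note on entry 4: the abstract mentions bootstrap and bagging but does not say how MRFNT modifies a standard random forest. The sketch below shows only plain bagging for regression, assuming a one-feature regression stump as the base learner. The data pairs and all function names are hypothetical and are not taken from the CPCB dataset or the paper.

```python
import random
import statistics


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Fit a one-split regression tree on a single feature (toy base learner)."""
    xs = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Every sampled x is identical: fall back to predicting the mean target.
        mean_target = statistics.fmean(y for _, y in rows)
        return lambda x: mean_target
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_predict(models, x):
    """Bagging for regression: average the base learners' predictions."""
    return statistics.fmean(model(x) for model in models)


rng = random.Random(0)
# Hypothetical (feature, AQI) pairs, made up for illustration only.
data = [(1.0, 50.0), (2.0, 80.0), (3.0, 120.0), (4.0, 95.0), (5.0, 60.0)]
models = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]
prediction = bagged_predict(models, 2.5)
```

A full random forest additionally restricts each split to a random subset of the features, which this single-feature sketch cannot show.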
5. Accuracy improvement of flooded area detection from satellite images using novel convolutional neural network in comparison with random forest.
- Author
- Vamsi, Cheruvupalli and Amudha, V.
- Subjects
- *CONVOLUTIONAL neural networks, *RANDOM forest algorithms, *SAMPLE size (Statistics)
- Abstract
Examining how well the Random Forest algorithm and the Novel Convolutional Neural Network detect floods is the main objective of this study. With 70 samples used for training and 48 samples for testing, the total number of samples employed for this research is 118. An overall sample size was calculated using a 0.05 threshold of significance and an 80% power. The accuracy of the Novel Convolutional Neural Network is 85.25%, whereas the Random Forest only manages 77.12% accuracy at a significance level of 0.010 (p < 0.05). When it came to predicting the identification of flooded areas, the Random Forest performed far worse than the Novel Convolutional Neural Network in this investigation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Comparing random forest algorithm and gradient boosting algorithm for efficient detection of arrhythmia.
- Author
- Harshitha, L. and Raja, A.
- Subjects
- *BOOSTING algorithms, *RANDOM forest algorithms, *SENSITIVITY & specificity (Statistics), *POWER tools, *SAMPLE size (Statistics)
- Abstract
This paper aims to compare the efficiency of networks using random forest and gradient boosting algorithms for efficient detection of Arrhythmia. The G power tool is used to calculate the samples with parameters such as alpha value is 0.05, power is about 80% and unity environment ratio. Sum of 602 samples is collected from two arrhythmia disease datasets available in Kaggle. The total training sample is divided as training data of n = 451 (75%) and test data of n = 151 (25%). Accuracy, precision, specificity and sensitivity values are estimated to evaluate the overall performance of the system using random forest. Result G power tool is used for calculating sample size which is 0.8. Random forest achieved values of accuracy, precision, sensitivity and specificity as 88.88%, 85%, 94% and 80% respectively with better significance (p<0.05) compared to 76%, 87%, 56%, 90% by gradient boosting algorithm. Conclusion: The performance of random forest is significant in differentiating abnormality from normal controls. [ABSTRACT FROM AUTHOR]
- Published
- 2024
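Note on entry 6: the 75%/25% split of 602 samples and the four reported metrics can be reproduced with a few lines of arithmetic. The sketch below assumes the training size is rounded down. The confusion-matrix counts are hypothetical, chosen only to sum to the 151 test samples; they are not the paper's results, and the function names are illustrative.

```python
def split_sizes(n_total, train_fraction):
    """Return (n_train, n_test), rounding the training size down."""
    n_train = int(n_total * train_fraction)
    return n_train, n_total - n_train


def binary_metrics(tp, fp, tn, fn):
    """Compute the four metrics named in the abstract from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),
    }


n_train, n_test = split_sizes(602, 0.75)  # 451 training and 151 test samples

# Hypothetical counts for a 151-sample test set (not taken from the paper).
metrics = binary_metrics(tp=80, fp=12, tn=54, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```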
7. Real time loan prediction system using novel logistic regression algorithm compared random forest algorithm for increased accuracy rate.
- Author
- Charan, B. V. Sai and Logu, K.
- Subjects
- *RANDOM forest algorithms, *LOANS, *REGRESSION analysis, *FORECASTING, *LOGISTIC regression analysis
- Abstract
Continuous loan forecasts are made more accurate in this study by using new logistic regression (NLR) and random forest techniques. These are estimations that are close to what is reported. By playing about with the NLRA value, we may attempt to mimic the pH-altering effects of a 10-number random forest method and a 10-number new logistic regression calculation. Twenty instances were employed for this study, with Gpower 80% for both groups used to choose the test size. The basic accuracy achieved by NLRA is greater (83.29% vs. 81.64%), when comparing the two approaches. When comparing the new logistic regression model with the random forest model, a statistically significant difference was observed (p<0.05, 2 tailed, 0.003). Compared to random forest, the novel logistic regression approach outperforms it when it comes to predicting the results of loans. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Comparison of SAR image classification using novel convolutional neural network with random forest for enhancing accuracy.
- Author
- Prasad, C. H. Vaikunta Krishna and Amudha, V.
- Subjects
- *CONVOLUTIONAL neural networks, *IMAGE recognition (Computer vision), *RANDOM forest algorithms, *SYNTHETIC aperture radar, *SAMPLE size (Statistics)
- Abstract
The primary goal of this study is to compare a new convolutional neural network with a random forest model in order to improve the accuracy of SAR picture categorization. Approach and methodology: Twenty samples from the dataset were extracted using the Kaggle platform. Twenty datasets were used; ten were utilized for training purposes, and ten were reserved for testing. We use MATLAB to categorize synthetic aperture radar images and evaluate the detection accuracy of random forests and new convolutional neural networks. After zeroing in on an alpha value of 0.05 and an 80% power, the G power is used to calculate the sample size. Using MATLAB as our model, we found that the unique convolutional neural network technique achieved a classification accuracy of 95.31%, which was higher than the random forest model. The random forest model achieved a mere 71.06%. A level of significance of 0.038 (p<0.05) was achieved, as per the SPSS analysis. Finally, when it comes to identifying synthetic aperture radar pictures using the supplied dataset, the innovative convolutional neural network achieves better accuracy than the random forest. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. Accuracy improvement of flooded area detection from satellite images using random forest in comparison with novel K-Nearest Neighbours.
- Author
- Vamsi, Cheruvupalli and Amudha, V.
- Subjects
- *RANDOM forest algorithms, *SAMPLE size (Statistics), *NEIGHBORS
- Abstract
Using Random Forest and Novel K-nearest neighbours, this study aims to find flooded regions and evaluate its effectiveness. In all, 118 samples made up the data set for this research, with 70 serving as training samples and 48 as testing samples. For the purpose of determining the sample size, a combined 80% G-power and 0.05 alphas were used. The random forest approach achieves an accuracy rate of 77.12% while the K-nearest neighbours method achieves an accuracy rate of 80.26%, given a significance threshold of 0.032 (p<0.05). It seems that Novel k-nearest neighbours perform much better than Random Forest when it comes to recognizing regions that have been flooded. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. Detection of bone tumor from bone x-ray images using KNN classifier comparing with random forest classifier to improve accuracy rate.
- Author
- Kumar, T. Sanjay and Jagadeesh, P.
- Subjects
- *K-nearest neighbor classification, *RANDOM forest algorithms, *X-ray imaging, *BONE cells, *BONE cancer
- Abstract
Finding a way to make KNN Classifier, instead of Random Forest Classifier, better at identifying bone tumors in x-ray pictures is the main goal of this study. The data used in this paper's dataset is obtained from the NTHU Computer Vision Lab, which is open to the public. We used a 95% confidence range with alpha and beta values of 0.05 and 0.2, respectively. An analysis of bone x-ray images was conducted using G-power 0.8 to ascertain the likelihood of tumor cell identification and categorization. We used a total of 180 participants, 90 from Group 1 & 90 from Group 2. A combination of K-Nearest Neighbor (KNN) & Random Forest (RF) is employed, along with a sample size of 10 individuals, to identify and classify cancer cells in bone x-ray images. According to the results, the Novel K-Nearest Neighbor (KNN) classifier outperforms the Random Forest (RF) classifier with an accuracy rating of 95.2905. The study's results are statistically significant (p=0.027). In conclusion, when it comes to classifying tumor cells from bone x-ray pictures, K-Nearest Neighbor (KNN) outperforms Random Forest (RF) in terms of accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. Detection of fake reviews in social media with F score using novel support vector machine over random forest.
- Author
- Irfan, U. Ahamed and Pushpalatha, Charlyn
- Subjects
- *SUPPORT vector machines, *RANDOM forest algorithms, *SOCIAL media, *SAMPLE size (Statistics)
- Abstract
This research aims to detect fake reviews accurately using a Novel Support Vector Machine (SVM) in comparison with Random Forest (RF). The research incorporates both the Novel SVM and RF. Our research parameters are as follows: beta=0.2, power=0.8, and alpha=0.05, with 10 participants in each group. Furthermore, we assess their accuracy using various sample sizes. Whereas Random Forest only had a 78.24% success rate, the Novel Support Vector Machine managed 98.47%. Both the performance and loss metrics are at the 0.376 level of significance (p>0.05), which is reported as statistically significant. Compared to the RF, the SVM model does far better in detecting fraudulent reviews. For the purpose of identifying disingenuous news reports, it may be the best option. [ABSTRACT FROM AUTHOR]
- Published
- 2024
12. Detection and classification of melanoma skin cancer images based on convolution neural network in comparison with random forest classification approach.
- Author
- Muniteja, M., Vidhya, K., and Shibi, S. Rimlon
- Subjects
- *CONVOLUTIONAL neural networks, *SKIN disease diagnosis, *RANDOM forest algorithms, *SKIN imaging, *MACHINE learning
- Abstract
The primary goal is to enhance the accuracy of skin disease and cancer diagnosis by replacing Random Forest with a Convolutional Neural Network. An open-access machine learning repository called uci is used to get the datasets. Each of the two groups uses twenty images of melanoma to make a diagnosis: one employs a Convolution Neural Network and the other a Random Forest. The maximum allowable inaccuracy is 0.8, which is half of the G power. With a significance level of 0.001, the suggested technique outperformed the first research utilizing Random Forest (RF), which had an accuracy of 86.1%, sensitivity of 94.02%, and specificity of 87.11%. Convolution Neural Networks seem to be much more effective than Random Forests when it comes to skin cancer diagnosis, according to this perspective. [ABSTRACT FROM AUTHOR]
- Published
- 2024
13. An innovative method for predicting Alzheimer's disease using the random forest classifier algorithm and compared with extreme gradient boosting algorithm to enhance the accuracy of prediction.
- Author
- Sai, R. Venkata Hruthick and Gayathri, A.
- Subjects
- *MACHINE learning, *RANDOM forest algorithms, *ALZHEIMER'S disease, *STATISTICS, *CONFIDENCE intervals, *BOOSTING algorithms
- Abstract
The project's goal is to find a technique to detect Alzheimer's disease in people using two machine learning algorithms, the extreme Gradient Boost method and the Novel Random Forest Classifier. Consequently, we will evaluate the Novel Random Forest Classifier alongside the extreme Gradient Boost method to determine their relative merits. Two machine learning algorithms—the revolutionary Random Forest Classifier and the extreme Gradient Boost Algorithm—were used to assess the accuracy of Alzheimer's disease prediction. Using clinical data, we ran a series of iterations with 373 samples, each repeated 10 times, to get the optimal sample size. In both cases, we used a power of 80% and a confidence interval of 95%. The Novel Random Forest Classifier technique achieved a 98.24% improvement in performance compared to the extreme Gradient Boost method. Results that are statistically noteworthy were produced by using the Novel Random Forest Classifier approach and the extreme Gradient Boost strategy (p=0.003, p<0.05). Statistical analysis using the independent sample T-test confirms the relevance of this work. The Novel Random Forest Classifier approach outperformed the extreme Gradient Boost method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
14. Analyze the lack of accuracy in loan prediction using support vector machine compared with random forest to improve accuracy.
- Author
- Vivek, Ramini and Mahaveerakannan, R.
- Subjects
- *SUPPORT vector machines, *RANDOM forest algorithms, *PERSONAL loans, *STANDARD deviations, *ACQUISITION of data
- Abstract
The main goal of this research is to increase the accuracy of the loan prediction over the Random Forest approach. The most recent results used data collected from several sources, a 0.05 percent threshold, and a 95% confidence range that includes the mean and standard deviation. Using an 80% G-power estimate, two distinct sets of algorithms are combined to get the classification for n = 10 samples. The Random Forest approach only scores 75.60%, whereas the Novel Support Vector Machine achieves a high accuracy score of 79.00%, according to the study's findings. Between the two groups, there is a little difference of 0.001, or less than 0.05. Compared to Random Forest, Support Vector Machine generates estimates of consumer loans that are more accurate. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. Prediction of academic performance of students using novel classification technique of decision tree comparing with random forest.
- Author
- Krishna, C. Vamsi and Ashokkumar, S.
- Subjects
- *ACADEMIC achievement, *MATHEMATICS, *ACHIEVEMENT, *CALCULATORS, *CLASSIFICATION, *DECISION trees, *RANDOM forest algorithms
- Abstract
The goal of this research is to find an alternative to the Random Forests Model that can categorize pupil success through assessments that have a 97% precision using the Decision Tree. Utilizing a sampling value calculator, we find out how much of an area to cover for Decision Tree as well as Random Forest, applying 10 along with 10 tests, accordingly. A Decision Tree is used to categorize pupil achievement in the proposed method. When comparing Decision Tree results, the Random Forest method is utilized. Math, reading, as well as written test results formed the basis of pupil evaluations as well as rankings. While the Decision Tree technique achieves a higher accuracy rate of 97%, the approach using Random Forests only manages 93%. Decision Tree as well as Random Forest have a p-value of 0.001 (p<0.05) for dual-tailed statistical importance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. Building interpretable predictive model for hospital readmission.
- Author
- Miswan, Nor Hamizah, Nazar, Roslinda Mohd, and Seng, Chan Chee
- Subjects
- *ASSOCIATION rule mining, *PATIENT readmissions, *MEDICAL care costs, *RANDOM forest algorithms, *PATIENTS' rights
- Abstract
Hospital readmission poses a significant cost for healthcare systems worldwide. If patients at a higher risk of readmission could be identified at the outset, appropriate plans to reduce the risk of readmission could be implemented. It is crucial to predict the right target patients and provide interpretable insights from the model's predictions. This study develops a hospital readmission prediction framework using random forest, a widely-used machine learning classifier. Association rule mining (ARM) was employed to identify hidden patterns and relationships among readmission factors. Regarding ARM model interpretation, the overall dataset demonstrated that the main rules for readmission were associated with multiple past hospitalisations and hospital visits, particularly among the elderly. This framework focuses on predictive modelling and provides model interpretation insights that can aid in decision-making. [ABSTRACT FROM AUTHOR]
- Published
- 2024
17. Analyzing optional retirement in Royal Malaysia Police Force (PDRM) using machine learning techniques.
- Author
- Halid, Hazwani, Bakar, Mohd Aftar Abu, and Ariff, Noratiqah Mohd
- Subjects
- *EARLY retirement, *RETIREMENT age, *TEXT mining, *LABOR turnover, *RANDOM forest algorithms
- Abstract
Employee turnover is a problem that affects every organization, whether in government or private sector. Employee attrition leads to high costs for any organization, especially in terms of training. In Royal Malaysia Police Force (PDRM), the optional retirement rate is higher compared to those that retire at retirement age. This study will focus on the factors that cause PDRM officers to choose optional retirement using Machine Learning (ML) techniques. In this study, k-prototype cluster analysis, Random Forest, and text analytics were performed for various analysis purposes. The results show that an officer's age is the primary motivator for electing optional retirement over mandatory retirement. Various health issues cause lower productivity and enthusiasm in performing tasks in their work. The work placement, the job ranks, the remainder service year, and the time period of the last promotion is among those identified crucial factors that contributed towards early retirement in PDRM. Since family concerns are frequently cited as a reason for retirees choosing early retirement, the work-life balanced in the police force profession was also noted as another early retirement factor. The findings of this study may assist PDRM in revamping the career in the police force so that the problem of high attrition rate can be curbed and also make the profession more attractive, hence attracting more people to join the police force. [ABSTRACT FROM AUTHOR]
- Published
- 2024
18. Detecting sarcasm in public opinion about COVID-19 using NBC and RBF.
- Author
- Alita, Debby, Hendraastuty, Nirwana, Priyanta, Sigit, Nurkholis, Andi, Aldino, Ahmad Ari, Afifah, Sofie Mutia, and Shafira, Salsa
- Subjects
- *COVID-19 pandemic, *SENTIMENT analysis, *RANDOM forest algorithms, *PUBLIC opinion, *SARCASM
- Abstract
During the Covid-19 pandemic, many opinions were voiced by the public on social media platforms, one of which was Twitter. By analyzing an opinion, it can be classified as a positive opinion, which is a supportive opinion, or a negative opinion, which is a derogatory opinion. But there is another kind of opinion, called a sarcastic opinion. This study analyzes the sarcastic opinions contained in Twitter. Sentiment analysis uses unigram, select k-best, and TF-IDF features, with Naive Bayes for classification. The classification of sarcasm uses the Random Forest Classifier, which has 4 features, namely Sentiment-related, Punctuation-related, Lexical and Syntactic, and Pattern-related, with the Decision Tree for classification. The results of this study show an accuracy rate of 76% on the training data and 92% on the test data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Comparison of random forest with K-nearest neighbors to detect fake news with improved accuracy.
- Author
- Saranya, K. S. Sri, Juliet, A. Hency, and Nataraj, Chandrasekharan
- Subjects
- *K-nearest neighbor classification, *FAKE news, *RANDOM forest algorithms, *MACHINE learning, *TRUST
- Abstract
To develop an automated, reliable and effective system that detects fake news articles, with the goal of reducing the spread of misinformation and promoting the dissemination of accurate and trustworthy information, because the rapid spread of fake news has turned out to be a foremost concern in recent years. Materials and Methods: The effectiveness of two methods, Random Forest and K Nearest Neighbor (KNN), is compared in predicting fake news. The evaluation was carried out using a Github dataset of 2000 newscast informations labeled as either counterfeit or unaffected, with performance metrics such as accuracy used to compare the two algorithms. The model size of the group is 10. Results and Discussions: The result shows that KNN outperformed Random Forest (RF) in terms of all the performance metrics, suggesting that KNN is a more effective method for detecting fake news. The significance value for this study is p=0.001, which is less than 0.05. Hence, there is a statistically significant difference between the two groups. Conclusion: The results suggest that KNN with 81.20% accuracy is a more effective algorithm for fake news detection. This study provides valuable insights into the effectiveness of Machine Learning (ML) algorithms in detecting fake news. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. Exploring the superiority of encoder-decoder architecture over traditional image processing techniques for binary segmentation of underwater images.
- Author
- George, G., Anusuya, Kanakala, A., and Lau, C. Y.
- Subjects
- *UNDERWATER exploration, *VERNACULAR architecture, *RANDOM forest algorithms, *MARINE biology, *HABITATS, *IMAGE segmentation
- Abstract
Applications like underwater exploration, marine life monitoring, and underwater item detection depend heavily on the ability to segment underwater images. However, underwater habitats' varied sights and challenging qualities, such as reduced visibility, make proper segmentation challenging. In this study, the performance of an encoder-decoder architecture, especially the Unet model, is compared to that of conventional image processing methods for binary segmentation of underwater images. Performance evaluation is based on critical criteria, such as sensitivity (recall), F1 score, precision, and accuracy. The evaluation's findings show that, among the methods examined, the Unet model has the highest sensitivity (95.16%). This demonstrates how well it works to locate bright areas in underwater photos. The Unet model also achieves a noteworthy F1 score of 86.71%, illustrating a favorable balance between precision and recall. While the Random Forest method's F1 score is slightly lower at 78.80%, it still displays comparable precision (75.46%) and accuracy (79.45%) values. The Otsu Level set technique performs worse on all metrics. The encoder-decoder architecture of the Unet model, which efficiently absorbs and uses both local and global contextual information throughout the segmentation process, can be credited with the model's better performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
21. Comparing the accuracy of gold price prediction using linear regression and random forest regression algorithms for investment.
- Author
- Jayendrakamesh, S., Michaelraj, T. F., and Yong, L. C.
- Subjects
- *GOLD sales & prices, *RANDOM forest algorithms, *REGRESSION analysis, *INVESTORS, *STATISTICS
- Abstract
This investigation aimed to optimize gold price forecasting strategies to support investors' decision-making. We compared the predictive capabilities of two algorithms using the "Gold Price Data". Thorough data pre-processing, including feature scaling, one-hot encoding, and outlier handling, preceded the comparative analysis. Our results demonstrate the Random Forest Regression model's significantly superior accuracy (mean: 98.53%) and lower variability compared to the Linear Regression model (mean accuracy: 85.86%). This statistical analysis (p<0.05) underscores the RFR model's potential in predicting gold prices, offering a promising tool for investors. Further research should explore the model's performance in real-world investment scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
22. A novel approach for prediction of eye microcirculatory disorder using enhanced random forest algorithm and accuracy comparison with Naive Bayes algorithm.
- Author
- Krishna, T. M. S., Mohana, J., Selvaperumal, S. K., and Kumar, P. N.
- Subjects
- *RANDOM forest algorithms, *MICROCIRCULATION disorders, *STATISTICS, *MEDICAL technology, *CONFIDENCE intervals
- Abstract
The proposed study aims to develop a novel diagnostic and identification strategy for ocular microcirculatory disorders by replacing the conventional Bayes algorithm with a novel Random Forest implementation. 20 images for group 1 (Random Forest) and 20 images for group 2 (Naive Bayes) made up the sample size with a total of 40 images. The dataset is derived from actual patient data that Shanggong Medical Technology gathered. After performing examinations, 3200 images were extracted. The Random Forest model was utilized to predict microcirculation disorders in the eyes. For statistical analysis, a G-power of 0.8, along with alpha and beta values of 0.05 and 0.2 respectively, were employed, alongside a 95% confidence interval. In this study, the Random Forest model achieved an accuracy of 95.06%, while the Naive Bayes model attained 85.22% accuracy. A statistical disparity was observed between the Random Forest and Naive Bayes groups, with a p-value of 0.001 (independent sample T-test p<0.05). The suggested Random Forest model outperforms the Naive Bayes model at predicting eye microcirculation disorder. [ABSTRACT FROM AUTHOR]
- Published
- 2024
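Note on entry 22: several abstracts in this listing report an independent-sample T-test between two groups of accuracy values. The sketch below computes the pooled-variance (Student) t statistic for two such groups, assuming equal variances. The per-run accuracies are hypothetical and are not the paper's data. The critical value 2.101 is the standard two-tailed table value for alpha = 0.05 at 18 degrees of freedom.

```python
import math
import statistics


def independent_t(sample_a, sample_b):
    """Return (t, df) for Student's independent-samples t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical per-run accuracies (%) for two groups of 10 runs each.
rf_runs = [94.1, 95.3, 95.0, 94.8, 95.6, 95.2, 94.9, 95.4, 95.1, 95.2]
nb_runs = [84.9, 85.5, 85.1, 85.3, 85.0, 85.6, 85.2, 85.4, 84.8, 85.4]

t_stat, df = independent_t(rf_runs, nb_runs)

# Two-tailed critical value for alpha = 0.05 at df = 18, from standard t tables.
T_CRITICAL_DF18 = 2.101
significant = abs(t_stat) > T_CRITICAL_DF18
```

Comparing |t| with the critical value is equivalent to checking p<0.05 for this test. An exact p-value needs the t distribution's CDF, which the Python standard library does not provide.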
23. An accuracy analysis and prediction of daily workout using smart phone dataset using novel random forest algorithm over linear regression.
- Author
- Brindha, C. S., Sivanantham, S., Nataraj, C., and Talasila, V. S. N.
- Subjects
- *RANDOM forest algorithms, *SMARTPHONES, *SAMPLE size (Statistics), *ACQUISITION of data, *ALGORITHMS
- Abstract
The proposed research work anticipates the regular workout patterns of users by utilising the data collected from their smartphones. The study indicates that the Random Forest algorithm is more effective in predicting human exercise routines with higher precision than the Linear Regression algorithm. A dataset of 3000 entries with the headings 'user', 'activity', 'Timestamp', and so on is used for the comparison, with two groups of sample size 10 each being studied. Linear Regression recorded an accuracy of 73.67%, while the innovative Random Forest Algorithm gives a precision of 76.51. The difference between these groups is statistically significant at 0.033, which is smaller than 0.05 (p<0.05). It is obvious that these groups differ statistically significantly from one another. Compared to the results of the linear regression, the predictions made by the Random Forest algorithm model are more accurate. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. Accuracy improvement for personality prediction using logistic regression in comparison with random forest algorithm.
- Author
- Dinesh, K., Kamatchi, S., Mangaiyarkarasi, K., and Selvaperumal, S. K.
- Subjects
- *MYERS-Briggs Type Indicator, *RANDOM forest algorithms, *PSYCHOLOGICAL typologies, *STATISTICAL significance, *PERSONALITY, *LOGISTIC regression analysis
- Abstract
The research aims to discern people's personality types by examining four dimensions of personality traits derived from their cognition and ideas through the application of logistic regression and Random Forest methodologies. Two cohorts were established, with one cohort implementing Logistic Regression and the other cohort adopting Random Forest. The cohorts underwent around 38 iterations. The sample size was established using a personality prediction analysis with a significance level (alpha) of 0.005, a pretest power of 80%, and a confidence level of 95%. A Myers-Briggs Type Indicator (MBTI) dataset, containing 8675 samples, was divided into two sets: 6000 samples for training and 2700 samples for testing. The simulation using Logistic Regression produced a personality prediction accuracy of 97.86%, whereas Random Forest achieved an accuracy of 82.18%. The independent sample T-test resulted in a p-value of 0.003, which indicates statistical significance at a significance level of p<0.05. These findings indicate that Logistic Regression performs much better than Random Forest in predicting personality based on the supplied dataset. Logistic Regression achieves an improved accuracy rate of 97.86% compared to Random Forest. [ABSTRACT FROM AUTHOR]
- Published
- 2024
25. An analysis on obesity levels prediction based on smoking habits using stepwise linear regression algorithm in comparison with random forest classifier for improved accuracy.
- Author
-
Kumar, K. S., Bee, M. K. Mariam, and Thiruchelvam, V.
- Subjects
- *
RANDOM forest algorithms , *SMOKING , *DATA recorders & recording , *OBESITY , *ALGORITHMS - Abstract
This study analyzes the prediction of obesity levels based on smoking habits, comparing the stepwise linear regression algorithm with the random forest classifier for improved accuracy. The incidence of obesity is estimated using data from the open-access website Kaggle. The dataset has 2111 records with 17 attributes, grouped into the categories insufficient weight, normal weight, overweight levels I and II, and obesity types I, II, and III. The random forest algorithm (Group 2) and the stepwise linear regression algorithm (Group 1) are compared using 20 records each. The G power is 80% (the values for G power are alpha (α)=0.05 and power=0.85). The stepwise linear regression algorithm obtained an accuracy of 84.8%, while the random forest classifier obtained 81.9%. The significance value is found to be p=0.001 (p<0.05) after analyzing the results from Independent. In this comparison, the stepwise linear regression method predicts obesity levels more accurately than the random forest classifier. [ABSTRACT FROM AUTHOR]
- Published
- 2024
26. A comparison of K-nearest neighbour algorithm with random forest to improve the accuracy for prediction of blast disease in rice crop.
- Author
-
Tharun, K. R. M., Ramesh, S., Thiruchelvam, V., and Prasanth, V.
- Subjects
- *
RICE blast disease , *RANDOM forest algorithms , *SUPERVISED learning , *SUPPORT vector machines , *PLANT diseases , *K-nearest neighbor classification , *RICE quality - Abstract
The aim of this research is to improve accuracy in predicting blast disease in rice crops using the K-Nearest Neighbour algorithm, compared with the Random Forest algorithm. To predict rice blast disease, K-Nearest Neighbour and Random Forest are both used with different training and testing splits. In this case, 169 samples were employed in each of the two groups for an analysis that involved 10 iterations in total. The ClinCalc programme is used as a tool to calculate the setup's correctness for supervised learning. This test's average G power is about 80%, using G power setting values of 0.05 and 0.80. The performance of the Novel Support Vector Machine and K-Nearest Neighbour at identifying blast disease in rice crops will depend on the quality of the chosen dataset. The K-Nearest Neighbour approach (88.595%) outperforms Random Forest (78.100%) in terms of both accuracy and loss, with an independent-sample t-test giving p=0.001, which is considered statistically significant. K-Nearest Neighbour can be more effective than Random Forest when dealing with high-dimensional data and when there is a complex relationship between the features and the label for the selected dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
27. Maximising the accuracy of handwritten alphabet recognition using Bayesian regression over random forest.
- Author
-
Prakash, J., Dass, P., Kavitha, N., and Thiruchelvam, V.
- Subjects
- *
MACHINE learning , *RANDOM forest algorithms , *STATISTICS , *RECOGNITION (Psychology) , *INSTITUTIONAL repositories - Abstract
This research paper deals with maximising the accuracy of handwritten alphabet recognition using Bayesian Regression over Random Forest. The dataset, named A-Z Handwritten Alphabets and consisting of 370,000 images, was collected from the Kaggle repository. The suggested ML classifier models, namely Bayesian Regression and Random Forest, are used in this phase. Nearly 10 iterative values from each group were taken for statistical analysis. The SPSS calculation was done using G power with a preset value of 0.95. According to the data, Bayesian Regression, with an accuracy of 92.52%, outperforms the Random Forest technique, which has an accuracy of 83.42%. The Novel Bayesian Regression and Random Forest thus differ statistically significantly, with p=0.004 (independent-sample t-test, p<0.05). The suggested technology gained more attention in the field of Handwritten Alphabet Recognition, and it can make number recognition easier. [ABSTRACT FROM AUTHOR]
- Published
- 2024
28. Blockchain based agriculture product supply chain management system using K nearest neighbor to enhance the accuracy and comparing with random forest algorithm.
- Author
-
Royal, G. Harshith, Gomathi, S., Sungeetha, D., and Sooriamoorthy, D.
- Subjects
- *
RANDOM forest algorithms , *SUPPLY chain management , *FARM supplies , *RUNNING training , *BLOCKCHAINS - Abstract
This study compares K Nearest Neighbor and Random Forest to increase accuracy in blockchain-based agriculture product supply chain management systems. The K Nearest Neighbor and Random Forest algorithms are tested once the data sets are imported, and are run with varied training and testing splits to improve accuracy. There are two groups, one for each algorithm, with 20 samples in total (10 in each group) and G power setting parameters of α=0.05 and power=0.80. Our research demonstrates a statistically significant difference between the K Nearest Neighbor algorithm's accuracy of 83.0% and the Random Forest algorithm's accuracy of 77.0%. Furthermore, the t-test for independent samples, with a statistically significant value of p=0.000 (p<0.05), was applied to estimate the mean, deviation, and standard error. According to the data obtained for this research, the Innovative K Nearest Neighbour Algorithm demonstrates superior performance in accuracy (83.0%) compared to the Random Forest Algorithm (77.0%). [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Object classification based-on patterns using random forest classifier compared with enhanced K-nearest neighbor algorithm.
- Author
-
Kumar, G. P., Anbazhagan, K., and Ramasenderan, N.
- Subjects
- *
PATTERN recognition systems , *RANDOM forest algorithms , *IMAGE recognition (Computer vision) , *DATABASES , *STATISTICAL significance - Abstract
This article's main objectives are to recognize the sequence in the image, distinguish the object it is, and analyse it correctly. Using the nearest neighbor classifier and the novel random forest classifier, the input picture is used to predict the image recognition. The Kaggle database served as the source of the study dataset for this examination. Larger accuracy was predicted for visual pattern analysis with a total sample size of 20 (10 from G1 and 10 from G2). The computation involved the use of a 95% confidence interval, an alpha and beta value of 0.2 and 0.05, and a G-power of 0.8. With 91.54 percent accuracy, the suggested novel RF outperforms the K-nearest neighbor classifier, which has an accuracy rate of 85.33 percent. p = 0.001 (Independent Sample T Test p = 0.05) indicates the statistical significance of the difference between the two algorithms. Data analysis shows that for image pattern recognition, the novel random forest model that has been proposed performs better than the K nearest neighbor algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
30. Evaluation and comparison of random forest algorithm with Novel Extended Artificial Immuno System algorithm for predicting stroke.
- Author
-
Dinesh, P. S., Muneeshwari, P., Selvaperumal, S. K., and Venu, D.
- Subjects
- *
RANDOM forest algorithms , *MAGNETIC resonance imaging , *SUPERVISED learning , *STATISTICAL significance , *SAMPLE size (Statistics) - Abstract
Owing to its unmatched ability to disclose soft tissue characterisation and 3-D visualization, magnetic resonance imaging (MRI) has evolved into a useful technique for stroke prognosis. A Novel Extended Artificial Immuno System is used with a sample size of 20 sets, and the Random Forest algorithm is used with a sample size of 20 sets, so that a total of 40 sets are compared to improve the accuracy of the present research. The mean accuracy has been calculated using the ClinCalc software under supervised learning with 0.8 as the alpha value, a G-Power value of 0.8, and a CI of 95%. The Novel Extended Artificial Immuno System obtained an accuracy of 98.61% and the Random Forest achieved an accuracy of 96.31%. An independent samples T-Test gave a significance value of p=0.000 (p<0.05), suggesting statistical significance. In this research, the Novel Extended Artificial Immuno System is compared with the Random Forest algorithm and is found to be more accurate. [ABSTRACT FROM AUTHOR]
- Published
- 2024
31. An effective method of improving accuracy in temperament identification using text messages and social media history with gated recurrent unit algorithm in comparison with random forest algorithm.
- Author
-
Kumar, M. Akshay, Radhika, S., and Alexandar, C. H. C.
- Subjects
- *
RANDOM forest algorithms , *SOCIAL history , *SAMPLE size (Statistics) , *SOCIAL media , *TEMPERAMENT - Abstract
This research proposes an effective method for improving accuracy in temperament identification from social media history and text messages, and compares the performance of the Gated Recurrent Unit (GRU) and Random Forest algorithms. Iterations were conducted during the research for a total sample size of 3062. The csv file repositories on the Kaggle website provided the data set for the study. The analysis is carried out by setting the G-power to 0.80, alpha to 0.05, and beta to 0.2, with a confidence range of 95.0%. An accuracy of 93.40% and a loss of 6.59% were achieved with the GRU model, which outperformed the Random Forest model with its accuracy of 81.52% and loss of 13.33%. The analysis uncovered a substantial difference between the 2 groups, as evidenced by a calculated p-value of 0.001. The GRU algorithm, with 93.40%, outperforms the RF algorithm, with 81.52%, in terms of accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
32. A study and survey of chrome extension to detect phishing websites.
- Author
-
Machap, Kamalakannan, Murakami, Rin, and Rahman, Nor Azlina Abdul
- Subjects
- *
MACHINE learning , *INTERNET privacy , *PHISHING , *RANDOM forest algorithms , *SELF-efficacy - Abstract
This paper focuses on the development of an efficient Chrome extension designed to detect phishing websites. Phishing attacks continue to pose a significant threat to online users, compromising their sensitive information and causing financial losses. The proposed extension uses a random forest machine learning algorithm to analyze website URLs, enabling the identification of potential phishing attempts and alerting the user to them. By integrating with the user's browsing experience, the extension provides a proactive defense mechanism, empowering users to make informed decisions and stay protected from phishing attacks. The effectiveness of the work is evaluated through extensive testing. Overall, this research contributes to enhancing user security and privacy in the online ecosystem, with implications for both individual users and organizations concerned with cybersecurity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
33. Improving accuracy for classifying audio based mel frequency features using artificial neural network in comparison with Random Forest algorithm.
- Author
-
Rahul, B., Rashmita, K., Thiruchelvam, V., and Susiapan, Y.
- Subjects
- *
ARTIFICIAL neural networks , *RANDOM forest algorithms , *STATISTICS , *RESEARCH personnel , *CLASSIFICATION algorithms - Abstract
The primary objective of the study is to explore the effectiveness of audio classification using a Novel Artificial Neural Network (ANN) algorithm in comparison with Random Forest, classifying different sounds into specific groups and representing them as waveforms. Material and Methods: The dataset was collected from Urban Sounds 8k and includes a total of 8700 audio clips. Each clip is composed of 40 rows and 40 columns for classification based on mel frequency features. Researchers compared the performance of a novel Artificial Neural Network (ANN) with a Random Forest algorithm for audio classification. A statistical analysis ensured a high probability of detecting true effects (power of 0.8) while minimizing false positives (alpha of 0.05). The ANN achieved a significantly higher mean accuracy (94.34%) compared to Random Forest (67.26%) over 10 trials (p-value = 0.004). This suggests that ANNs are a promising approach for audio classification tasks based on the findings of this study. [ABSTRACT FROM AUTHOR]
- Published
- 2024
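The comparison described in entry 33 (mean accuracy of two algorithms over 10 trials each, tested for a significant difference) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' analysis: the per-trial accuracies are made-up placeholders, the function name `welch_t` is hypothetical, and the sketch computes only Welch's unequal-variance t statistic and its degrees of freedom, because the abstract does not say which t-test variant was used.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Sample variances (n - 1 denominator), each divided by its sample size.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-trial accuracies in percent; NOT data from the paper.
ann_acc = [94.1, 94.6, 93.9, 94.8, 94.2, 94.5, 94.0, 94.7, 94.3, 94.4]
rf_acc = [67.0, 67.5, 66.9, 67.8, 67.1, 67.4, 66.8, 67.6, 67.2, 67.3]

t_stat, dof = welch_t(ann_acc, rf_acc)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A p-value would then be read from the t distribution with `dof` degrees of freedom, for example with a statistics package; that step is left out here to keep the sketch dependency-free.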
34. Optimization of significant fluid parameters in the success of hydraulic fracturing in shale gas using the random forest algorithm.
- Author
-
Cahyani, A. D., Erfando, T., Serasa, N. A. S., and Afdhol, M. K.
- Subjects
- *
RANDOM forest algorithms , *SHALE gas reservoirs , *STANDARD deviations , *HYDRAULIC fracturing , *YIELD stress - Abstract
Hydraulic fracturing is carried out in an unconventional reservoir, namely a shale gas reservoir, which is influenced by several factors, such as porosity and low permeability. This study aims to determine the fluid parameters that most influence the success of hydraulic fracturing and to determine the optimum ratio using the random forest algorithm. The parameters tested in this research are the power law index, yield stress, fluid consistency, proppant concentration, and particle diameter. These five parameters are simulated using FracCADE 7.0 to obtain the half-length and effective permeability, which are then analyzed using CMOST. An effective permeability of 799.4 md and a half-length of 66.3 ft indicate an optimal state, with a cumulative gas production of 907500.1 ft3. Using the random forest algorithm at a ratio of 0.7:0.3, with an RMSE (Root Mean Squared Error) of 0.14 and a MAPE (Mean Absolute Percentage Error) of 0.00%, the results show that they are in optimal condition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
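The two error measures named in entry 34, RMSE and MAPE, follow standard definitions and can be sketched as below. The function names and the sample values are illustrative assumptions; they are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Every actual value must be nonzero, since each error is divided by it.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Placeholder values for illustration only; NOT data from the paper.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

print(f"RMSE = {rmse(observed, forecast):.3f}")
print(f"MAPE = {mape(observed, forecast):.2f}%")
```

RMSE is expressed in the units of the target variable, while MAPE is unit-free, which is why the two are often reported together.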
35. Mitigating post-harvest losses through IoT-based machine learning algorithms in smart farming.
- Author
-
Kanna, R. Rajesh, Priya, T. Mohana, Sivakumar, V., Nataraj, Chandrasekharan, and Thomas, Jishamol
- Subjects
- *
MACHINE learning , *RANDOM forest algorithms , *INTERNET of things , *AGRICULTURE , *SELF-efficacy , *AGRICULTURAL technology - Abstract
This research paper explores the transformative potential of Internet of Things (IoT) technology in mitigating the longstanding issue of post-harvest losses within the agriculture sector. These losses, which encompass both quantitative and qualitative deterioration of food commodities from harvest to consumption, have posed persistent challenges, resulting in economic losses and food wastage. By delving into the current landscape of post-harvest losses and the application of IoT technology, the paper offers valuable insights into how IoT can be harnessed to reduce these losses effectively. It not only highlights the benefits and existing IoT solutions but also addresses the inherent challenges, providing recommendations for their resolution. Moreover, the research introduces a machine learning-based model, specifically Random Forest ML, to identify and prevent losses in tandem with IoT devices, empowering farmers with timely alert messages for informed decision-making, thus fostering a more sustainable and efficient agricultural ecosystem. [ABSTRACT FROM AUTHOR]
- Published
- 2024
36. Improved diabetes disease prediction IWFO model using machine learning algorithms.
- Author
-
Gunavathi, R., Sivakumar, V., Kumar, B. Senthil, and Vijayalakshmi, S.
- Subjects
- *
FEATURE selection , *DECISION making , *RANDOM forest algorithms , *MATHEMATICAL optimization , *PREDICTION models - Abstract
Diabetes is one of the most widespread diseases globally. Diagnosing diabetes early helps clinicians provide improved and up-to-date clinical treatment. Healthcare specialists use many machine learning techniques, methodologies, and tools for decision making in the diabetic field. Machine learning techniques are used to predict diabetic disease at an early stage. To eliminate such issues, optimized detection techniques are proposed. First, the training samples are increased using the sliding window protocol. Next, class-imbalanced training data are balanced using the adaptive and gradient booster technique. The diabetic feature selection process is then improved by the Intensity Weighted Firefly Optimization (IWFO) technique, in which irrelevant features are removed based on the correlation between features. The feature transformation problem is handled by the PCA technique, which manages the several types of features. Finally, the improved and optimal hybrid random forest is applied to classify samples into the normal and diabetes classes. The proposed system predicts diabetic disease efficiently and maximizes the precision of the prediction. The proposed approach is compared with different classifiers to determine the efficiency of the work. Overall, the proposed system improves on the accuracy level of present studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
37. Prediction of heart disease using XGB classifier.
- Author
-
Vijayalakshmi, S., Sivakumar, V., Nataraj, C., and Kanth, P. C.
- Subjects
- *
ARTIFICIAL neural networks , *CORONARY disease , *RANDOM forest algorithms , *SUPPORT vector machines , *HEART diseases - Abstract
Predicting heart disease in advance could be a significant medical breakthrough because the disease is widespread. Machine learning is a reliable strategy for doing this. Decision tree classifiers, random forests, and multilayer perceptrons have all been used in studies to predict heart disease. However, several of these techniques have shortcomings, such as poor precision. In our research, we took the South African Heart Disease dataset and implemented a few models: Support Vector Machine (SVM), K Neighbors (KNN), Artificial Neural Network (ANN), and XG Boost Classifier. We used different methods for measuring performance. SVM achieved an accuracy of 69.0%, KNN 86.0%, and ANN 80.0%. The XGB classifier, however, showed promising results in predicting heart disease, with an accuracy of 90%. Further, when the hyperparameters were tuned using the random search method, the accuracy increased to 92.8%. The benefit of this work is that it uses machine-learning approaches to enhance the performance of coronary heart disease prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
38. Utilization of machine learning for predictive maintenance in improving productivity in manufacturing industry.
- Author
-
Agustina, Dina, Fitri, Fadhilah, Zilrahmi, Winanda, Rara Sandhy, and Sari, Devni Prima
- Subjects
- *
INDUSTRY 4.0 , *SUPPORT vector machines , *FEATURE selection , *RANDOM forest algorithms , *MACHINE learning - Abstract
The fourth industrial revolution, also known as Industry 4.0, is driven by the combination of IoT, AI, and big data in the manufacturing industry. One of the challenges for manufacturers is machine failures or downtimes, which can significantly hinder production processes. Predictive maintenance (PdM) is a solution to this problem and is widely used in the industry. In this study, Support Vector Machine (SVM) and Random Forest (RF) algorithms were used to predict the Overall Equipment Effectiveness (OEE) of a production machine, and the best model was selected based on accuracy using a confusion matrix. The study involved data preprocessing, exploratory data analysis, feature selection, and training the models to generate predictive classification models. The accuracy of the SVM algorithm was found to be 87%, while the RF algorithm achieved an accuracy of 91%. Therefore, the RF algorithm can be considered a better choice for forecasting OEE using these two features. [ABSTRACT FROM AUTHOR]
- Published
- 2024
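Entry 38 selects the best model by accuracy read from a confusion matrix. A minimal sketch of that calculation follows, assuming rows are true classes and columns are predicted classes. The function name and the counts are hypothetical and are not the paper's data.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correctly classified samples (the diagonal) / all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical 2x2 confusion matrix (rows: true class, columns: predicted).
example = [
    [50, 5],
    [4, 41],
]

print(f"accuracy = {accuracy_from_confusion(example):.2%}")
```

The same function works for more than two classes, since it only relies on the diagonal holding the correct predictions.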
39. Optimizing sales strategy in the Indian automobile industry: Predicting future car prices using machine learning and demographic data.
- Author
-
Khan, M. Reyasudin Basir, Islam, Gazi Md. Nurul, Ng, Poh Kiat, Zainuddin, Ahmad Anwar, Lean, Chong Peng, Al-Fattah, Jabbar, and Kamarudin, Nazhatul Hafizah
- Subjects
- *
MACHINE learning , *RANDOM forest algorithms , *BUSINESS planning , *AUTOMOBILE sales & prices , *DECISION trees , *BIG data - Abstract
Demographics play a vital role in defining the size, distribution, and structure of a population. In the context of the automobile industry, business owners can leverage demographic insights to gauge the demand for vehicles and strategically align their sales efforts. Accurate sales forecasting is essential for long-term business strategy, providing manufacturers with a competitive advantage in optimizing production planning methods. This project utilizes large-scale automobile sales data to forecast car price variations in the coming months, considering factors such as purchase patterns, car models, and other relevant data. By analyzing different attributes from a past-year dataset, three machine learning algorithms: Linear Regression, Decision Tree Regression, and Random Forest Regression were employed to predict future car prices. The performance of each algorithm is evaluated using the R-squared value. Notably, the Random Forest regression model achieves a higher accuracy of 93%, outperforming both Decision Tree regression and Linear regression. These results demonstrate the suitability of Random Forest regression in predicting big data for the industry's future product production plan and overall strategy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
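Entry 39 evaluates its regression models with the R-squared value. The standard definition, one minus the ratio of residual to total sum of squares, can be sketched as below. The function name and the sample prices are illustrative assumptions only.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Placeholder values for illustration only; NOT data from the paper.
prices = [3.0, 5.0, 7.0, 9.0]
model_output = [2.5, 5.5, 7.0, 9.5]

print(f"R^2 = {r_squared(prices, model_output):.4f}")
```

An R-squared of 1 means the predictions match the targets exactly, and a value near 0 means the model does no better than always predicting the mean.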
40. Application of machine learning models in predicting motorcyclist severity in heavy good vehicles (HGV) crashes in Malaysia.
- Author
-
Kang, Ho Ming, Musa, Sarah, Darman, Hazlina, Hamidun, Rizati, and Roslan, Azzuhana
- Subjects
- *
SUPPORT vector machines , *RANDOM forest algorithms , *DECISION trees , *SPEED limits , *ROAD safety measures - Abstract
Prediction of motorcyclist severity is a critical task for transportation systems and a promising research topic in road safety studies. Machine learning models have gained popularity in recent years due to their strong prediction accuracy. Therefore, we aim to compare the predictive performance, including prediction accuracy and estimation of variable importance, among machine learning models. In this study, crash data from Malaysia is used to predict motorcyclist severity using variables such as road type, speed limit, location type and collision type. The analysis begins with the use of random forest (RF) to select important features for prediction. Then, three of the most often used machine learning models, namely multinomial logistic regression (MLR), decision tree (DT) and support vector machine (SVM), are applied and their performances are evaluated. The results indicated that the most important features in predicting motorcyclist severity are the number of drivers killed, and environmental factors such as traffic system, collision type and light condition. Among the three models used in this study, SVM has shown better performance, with 82.14% accuracy, than DT and MLR. [ABSTRACT FROM AUTHOR]
- Published
- 2024
41. Facial and vocal expression-based comprehensive framework for real-time stress monitoring.
- Author
-
Karlapudi, Ajay Pavan, Cherukuri, Krishna Bhargav, Badigama, Sai Kumar, and Irudayaraj, Juvvana
- Subjects
- *
MACHINE learning , *MENTAL health personnel , *RANDOM forest algorithms , *FACIAL expression , *SIGNAL processing - Abstract
Many people in today's society experience anxiety, depression, and heart disease as direct results of stress. An increasing number of people and medical professionals recognize the need for efficient stress monitoring and management tools to track and alleviate stress in real time. In order to fill this gap, the proposed stress monitoring framework makes use of vocal and facial expressions as indicators of stress. Frowning, furrowed brows, and narrowed eyes are all telltale signs of stress, as are changes in vocal pitch, volume, and rate. The proposed system uses signal processing techniques to extract stress-related features from these expressions and classify them as indicative of stress or not. Stress-related characteristics can be accurately classified using machine learning models like neural networks, SVM, and Random Forest in this setting. The proposed system enhances the accuracy and robustness of the stress monitoring tool by combining the results of multiple decision trees trained using Random Forest on different subsets of data and features. An intuitive interface that shows current stress levels has been created to make the system more approachable. The mental health field, the medical field, and related fields can all benefit from this interface. For instance, it can be used by mental health professionals to better diagnose and track their patients' stress levels over time, allowing for more precise and timely interventions. In addition, it can teach people how to control their own stress and show them how their thoughts, feelings, and actions all contribute to their health. Finally, the proposed stress monitoring framework provides a robust and effective method for tracking stress levels in real time. The system's robustness and accuracy are due to the integration of signal processing techniques and machine learning algorithms, which can be applied in a number of fields, including healthcare and personal wellness. Designed with simplicity in mind, the system is a great resource for coping with the stresses of modern life. [ABSTRACT FROM AUTHOR]
- Published
- 2024
42. Predicting model for multiclass imbalanced data using pipeline sampling technique with dynamic ensemble selection.
- Author
-
Kamaladevi, M., Venkataraman, V., and Umamaheswari, P.
- Subjects
- *
MACHINE learning , *CLASSIFICATION algorithms , *RANDOM forest algorithms , *MEDICAL decision making , *WORK design , *BOOSTING algorithms - Abstract
Machine learning algorithms find patterns in data that help to make better predictions and decisions. In day-to-day life these algorithms help to make critical decisions such as medical diagnosis, stock prediction, and fault detection. Classification algorithms predict labels from trained patterns. In imbalanced data classification, data are distributed unevenly among classes, i.e. the majority class has a high proportion of the data whereas the minority class has a low proportion. Common machine learning algorithms have poor prediction accuracy for the minority class, which leads to the data imbalance problem. Moreover, multiclass imbalanced learning poses greater challenges than binary classification. To address this issue, this work designs a new classifier model that combines pipeline sampling to resample the data with dynamic ensemble classifier selection for prediction on multiclass imbalanced datasets taken from the UCI repository. Performance is evaluated through multiclass classification metrics such as weighted average accuracy, weighted average precision, weighted average F1 score, ROC AUC score, Cohen's kappa score, and Matthews correlation coefficient. A thorough empirical comparison is conducted to analyze the performance of the proposed model against the existing ensemble algorithms Gradient Boosting classifier, Bagging classifier, and Random Forest classifier. The dynamic ensemble algorithm outperforms the existing ensemble algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
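Two of the multiclass metrics listed in entry 42, Cohen's kappa and support-weighted average precision, can be sketched from their standard definitions as below. The function names and the label lists are hypothetical, and a class that is never predicted is given a precision of 0, which is a convention assumed here rather than something stated in the abstract.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1.0 - expected)


def weighted_precision(y_true, y_pred):
    """Per-class precision averaged with weights equal to class support."""
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = 0.0
    for label, support in true_counts.items():
        # Assumed convention: precision is 0 for a class never predicted.
        precision = hits[label] / pred_counts[label] if pred_counts[label] else 0.0
        total += precision * support
    return total / len(y_true)


# Hypothetical labels for a three-class problem; NOT data from the paper.
truth = ["a", "a", "b", "b", "c", "c"]
guess = ["a", "a", "b", "c", "c", "c"]

print(f"kappa = {cohen_kappa(truth, guess):.3f}")
print(f"weighted precision = {weighted_precision(truth, guess):.3f}")
```

Kappa is useful on imbalanced data because a classifier that always predicts the majority class scores 0 even when its plain accuracy looks high.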
43. Rank based random forest model for gestational diabetes mellitus prediction.
- Author
-
Amarnath, Sumathi and Selvamani, Meganathan
- Subjects
- *
PREGNANCY complications , *FIRST trimester of pregnancy , *MACHINE learning , *RANDOM forest algorithms , *MATERNAL health , *PREGNANCY - Abstract
Gestational Diabetes Mellitus (GDM) is a type of diabetes that can occur in women during pregnancy. Due to its great prevalence and potential negative effects on the health of mothers and children, GDM is a challenge for world health. GDM is estimated to affect around 7-10% of pregnancies worldwide. The prevalence of GDM has increased enormously over the last few years, especially in developing countries like India. The risk of GDM is higher in women who are over 25, are overweight or obese, or have a family history of diabetes, GDM in a previous pregnancy, PCOS, etc. GDM-related complications during and after pregnancy are increasing across the country. Early prediction and timely treatment allow women to avoid pregnancy-related complications. Early recognition of GDM during the first trimester of pregnancy improves maternal and fetal health and helps to overcome future diabetes in the mother and future generations. RF is an ensemble learning method that gives more accurate predictions than other machine learning algorithms, and its result is closer to the ensemble method. The Rank based Random Forest (RRF) approach is used to improve classification performance. Various weight values have been examined, and the chosen weight values (0.6, 0.4) are assigned to the dataset. The new features, Sum and Rank, were generated. Based on the threshold value, the diabetes prediction was performed on the dataset. The proposed RRF improved classification accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
44. Disease prediction system using machine learning.
- Author
-
Jayapradha, J., Singh, Neetish Kumar, Dwivedi, Vishal, and Devi, M. Uma
- Subjects
- *
PREDICTION models , *DECISION trees , *RANDOM forest algorithms , *CLASSIFICATION algorithms , *FORECASTING - Abstract
In this era, technology has revolutionized the health industry to a great extent. The proposed model aims to design a diagnostic system for various diseases based on their symptoms. The Disease Prediction System implements different ML prediction models to predict the user's disease from the symptoms the user enters. Machine learning classification algorithms analyse the inputs given by the user and then output the predicted disease and its probability of occurrence. The proposed system predicts diseases such as i) Diabetes, ii) Kidney, iii) Cancer, iv) Heart and v) Liver. Four prediction models, Naive Bayes, Decision Tree, Random Forest and Logistic Regression, have been implemented in the proposed system for the various diseases. The dataset "Disease Prediction Using Machine Learning," with a count of 132 symptoms, has been used in the proposed model. The main goal of the proposed model is to predict the disease; the user doesn't need a medical report to use this system, as the prediction is based on the symptoms, which saves time and money. The system also has an easy-to-use user interface, and all users can use it to predict genetic diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
45. MVP of the NBA season prediction using machine learning, data analysis and web scraping.
- Author
-
Parihar, Abhyuday Singh, Jaiswal, Amisha, and Malarselvi, G.
- Subjects
- *
BASKETBALL , *DATA analysis , *MACHINE learning , *RANDOM forest algorithms , *MULTICOLLINEARITY , *GAMBLING - Abstract
People try to anticipate the highest scorers in each season as the NBA (National Basketball Association) becomes more and more popular for a variety of reasons, including gambling, online tournaments, or competitions. Numerous variables could have an impact on the outcome. The purpose of this article is to develop an effective yet straightforward model to forecast the most valuable player of the current season based on the results of the previous year, as well as to derive team abilities and distinct home advantages based on data from the previous 30 seasons. The model accurately predicts outcomes by looking at some key variables, such as team talent and home field advantage, even when compared with the bookmaker's point spread, by looking at the games from 1991 to 2022. Even though the data still does not account for additional factors like injuries, fouls, and trades, the forecast produced by this model is still correct. In order to forecast the outcomes, this study uses the ridge regression and random forest regressor models. By comparing the standard errors of the results, the best model is created, and the top player for the upcoming year is determined, resulting in precise projections. [ABSTRACT FROM AUTHOR]
- Published
- 2024
46. Customer purchasing intention prediction using machine learning.
- Author
-
Kulandaivel, Madhumitha, Agarwal, Rashi, and Singh, Aryan
- Subjects
- *
CONSUMERS , *SUPPORT vector machines , *K-nearest neighbor classification , *RANDOM forest algorithms , *IMPULSE buying , *MACHINE learning - Abstract
Online purchasing services are quickly displacing traditional or physical retailers. Online shopping websites have seen a considerable increase in customer confidence over time. The proliferation of these websites has encouraged fierce competition, which is excellent for customers because it leads to better and more affordable items. This makes online purchasing a fascinating subject for academic study. Retailers are observing a rise in online transactions from their clients as a result of how simple it is to use e-commerce platforms to make purchases. Predictive analytics can be applied to analyze these interactions and reveal intricate behavioral patterns that help businesses better understand the needs of their clients. Online trust and previous online purchase experiences, along with elements like impulse purchase orientation, brand orientation, and quality orientation, all influence customer online purchase intention and shopping orientation. The objective of our project is to build a prediction model that will help increase the profitability of a marketing campaign for a hypothetical corporation. By applying various preprocessing techniques to the data and multiple feature engineering methods, along with four machine learning models, we have tried to achieve our goal. The final model should allow the company to focus its advertising on customers who are most likely to respond to the campaign while excluding non-respondents. As a result, four different learning classifiers, K-Nearest Neighbors, Support Vector Machine, Logistic Regression, and Random Forest, were tested and optimized, and we achieved the best classification performance using Random Forest. [ABSTRACT FROM AUTHOR]
- Published
- 2024
47. A comparative study of machine learning techniques for chronic kidney disease prediction.
- Author
-
Jansi, K. R., Kosireddy, Kishore, and Kodumuru, Adarsh
- Subjects
- *
CHRONIC kidney failure , *FEATURE selection , *MACHINE learning , *RANDOM forest algorithms - Abstract
A major global health issue, chronic kidney disease affects millions of individuals. Machine learning algorithms have shown promise in predicting the risk of developing disease, and early detection of chronic kidney disease is essential to preventing or slowing down its course. Machine learning techniques for chronic kidney disease prediction are investigated. Also, the dataset was subjected to the appropriate feature selection technique. The wrapper technique, feature selection, and complete features were used, respectively, to calculate the output of each classifier. Logistic regression classifier, KNN, random forest classifier, Ada Boost, Gradient Boosting, Stochastic Gradient Boosting (SGB), Extra Trees Classifier, and LGBM Classifier are a few of the techniques and models that are examined for chronic kidney disease prediction. Extra Trees Classifier and LGBM Classifier are discussed. Additionally, different features and datasets used in chronic kidney disease prediction are analyzed. Finally, the performance of various machine learning models is evaluated, and future directions for chronic kidney disease prediction research are outlined. Overall, machine learning algorithms have the potential to significantly improve early detection and management of CKD, thus reducing the burden of this disease on healthcare systems and individuals. [ABSTRACT FROM AUTHOR]
- Published
- 2024
48. Estimation of NPK requirements using random forest.
- Author
-
Bhansali, Tilak, Bhardwaj, Sagar, and Ramalingam, Anita
- Subjects
- *
RANDOM forest algorithms , *SOIL fertility , *RAINFALL , *SOIL drying , *NUTRITIONAL requirements - Abstract
Fertilizer use is usually under the limited control of the farmer. Competent advice on the optimal use of these fertilizers is needed so that farmers can achieve higher yields and reduce fertilizer losses. There is a correlation between nutrient losses. The right amount of rainfall at the right time allows nutrients to penetrate the root zone of the soil and loosen the dry manure. When there is an excess of rain, it can cause more runoff, and soils contain important nutrients like manganese (Mn), phosphorus (P), boron (B), nitrogen (N), and potassium (K). The study employs iterative random forest algorithms that are regularly updated with time-series data. These algorithms are used to determine nutrient recommendations by considering rainfall patterns and soil fertility. The aim is to estimate the nutrient requirements for various crops. The method proposed in this study can help improve soil fertility by providing nutrient recommendations for optimal conditions for plant growth and reducing the potential for leaching and runoff. The proposed system is able to achieve 92% accuracy. A user-friendly system has been implemented in the form of a website to provide cross-platform functionality and suggest appropriate timings and amounts of nutrients required for an inputted crop, with an alert system for heavy rainfall. [ABSTRACT FROM AUTHOR]
- Published
- 2024
49. Email spam detection and filtering using machine learning.
- Author
-
Asha, P., Siddhartha, Katakam, Manikanta, Kodati Naga Satya Sai, Gopi, Chilukuri, and Mayan, J. Albert
- Subjects
- *
SPAM filtering (Email) , *SPAM email , *MACHINE learning , *RANDOM forest algorithms , *PHISHING - Abstract
Phishing assaults, in which the perpetrator masquerades as a legitimate source in order to obtain confidential material, are now a serious threat due to the rapid growth of online consumers, damaging one's credibility, costing one money, or infecting one's computer with spyware and perhaps other viruses. Due to their capacity to sift through large amounts of data in search of patterns that can be used to make predictions, intelligent approaches like ML and DL are finding growing usage in the realm of cybersecurity. In this study, we explore the efficacy of using such intelligent methods to identify phishing websites. We utilized two different data sets and picked the most highly correlated attributes, which included both content-based and URL-lexical/domain-based characteristics. After that, many ML models were implemented, and their relative efficacy was assessed. The results demonstrated the significance of feature selection in raising the quality of the models. In addition, the findings attempted to determine the most useful factors that affect the model when it comes to recognizing phishing websites. When it came to classifying data, the Random Forest (RF) algorithm performed best across the board. [ABSTRACT FROM AUTHOR]
- Published
- 2024
50. Personality prediction based on Twitter stream.
- Author
-
Duggal, Sanchit and Kayalvizhi, R.
- Subjects
- *
RANDOM forest algorithms , *NATURAL language processing , *K-nearest neighbor classification , *PERSONALITY , *MACHINE learning , *SOCIAL media - Abstract
The trend of social media interaction, texts, and hidden identity communication has increased drastically over recent years. With the evolution of various convenient messaging techniques, social media has risen to the topmost position for almost anyone in the modern world. A major drawback of this "Anonymous Identity" communication is that it becomes very difficult to identify the actual reality of the person communicating from the other end. This project is purposefully made to implement Natural Language Processing (NLP), Machine Learning (ML), the Random Forest Algorithm, and K-Nearest Neighbors (KNN) to understand the behavior traits, search techniques, interests, and literal processing of a person, in order to identify his/her personality from input taken from the popular platform Twitter. [ABSTRACT FROM AUTHOR]
- Published
- 2024