13,892 results
Search Results
2. Acute Lymphoblastic Leukemia Disease Detection Using Image Processing and Machine Learning
- Author
- Chavan, Abhishek D., Thakre, Anuradha, Chopade, Tulsi Vijay, Fernandes, Jessica, Gawari, Omkar S., and Gore, Sonal
- Editorial Board Members: Filipe, Joaquim, Ghosh, Ashish, Prates, Raquel Oliveira, and Zhou, Lizhu
- Editors: Singh, Mayank, Tyagi, Vipin, Gupta, P. K., Flusser, Jan, and Ören, Tuncer
- Published
- 2022
3. Paper spray mass spectrometry combined with machine learning as a rapid diagnostic for chronic kidney disease.
- Author
- Pereira, Igor, Sboto, Jindar N. S., Robinson, Jason L., and Gill, Chris G.
- Subjects
- *CHRONIC kidney failure, *MASS spectrometry, *RANDOM forest algorithms, *HIGH voltages, *MASS measurement
- Abstract
A new analytical method for chronic kidney disease (CKD) detection utilizing paper spray mass spectrometry (PS-MS) combined with machine learning is presented. The analytical protocol is rapid and simple, based on metabolic profile alterations in urine. Anonymized raw urine samples were deposited (10 μL each) onto pointed PS-MS sample strips. Without waiting for the sample to dry, 75 μL of acetonitrile and high voltage were applied to the strips, using high resolution mass spectrometry measurement (15 s per sample) with polarity switching to detect a wide range of metabolites. Random forest machine learning was used to classify the resulting data. The diagnostic performance for the potential diagnosis of CKD was evaluated for accuracy, sensitivity, and specificity, achieving results >96% for the training data and >91% for validation and test data sets. Metabolites selected by the classification model as up- or down-regulated in healthy or CKD samples were tentatively identified and in agreement with previously reported literature. The potential utilization of this approach to discriminate albuminuria categories (normo, micro, and macroalbuminuria) was also demonstrated. This study indicates that PS-MS combined with machine learning has the potential to be used as a rapid and simple diagnostic tool for CKD. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Content-based quality evaluation of scientific papers using coarse feature and knowledge entity network.
- Author
- Wang, Zhongyi, Zhang, Haoxuan, Chen, Haihua, Feng, Yunhe, and Ding, Junhua
- Subjects
- MACHINE learning, SCIENCE education, COMPUTER science, PEER pressure, RANDOM forest algorithms
- Abstract
Pre-evaluating scientific paper quality aids in alleviating peer review pressure and fostering scientific advancement. Although prior studies have identified numerous quality-related features, their effectiveness and representativeness of paper content remain to be comprehensively investigated. Addressing this issue, we propose a content-based interpretable method for pre-evaluating the quality of scientific papers. Firstly, we define quality attributes of computer science (CS) papers as integrity, clarity, novelty, and significance, based on peer review criteria from 11 top-tier CS conferences. We formulate the problem as two classification tasks: Accepted/Disputed/Rejected (ADR) and Accepted/Rejected (AR). Subsequently, we construct fine-grained features from metadata and knowledge entity networks, including text structure, readability, references, citations, semantic novelty, and network structure. We empirically evaluate our method using the ICLR paper dataset, achieving optimal performance with the Random Forest model, yielding F1 scores of 0.715 and 0.762 for the two tasks, respectively. Through feature analysis and case studies employing SHAP interpretable methods, we demonstrate that the proposed features enhance the performance of machine learning models in scientific paper quality evaluation, offering interpretable evidence for model decisions.
• Define four criteria for quality evaluation of scientific papers: integrity, clarity, novelty, and significance.
• Propose a framework for quality evaluation of scientific papers based on coarse features and knowledge entity network.
• An effective algorithm for measuring the novelty and significance of scientific papers based on knowledge entity networks.
• Create and release a rigorous dataset, which could serve as the gold standard for quality evaluation of scientific papers.
• Conduct extensive experiments to validate the effectiveness of the proposed framework.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Machine Learning Methods for Evaluation of Technical Factors of Spraying in Permanent Plantations.
- Author
- Tadić, Vjekoslav, Radočaj, Dorijan, and Jurišić, Mladen
- Subjects
- MACHINE learning, RADIAL basis functions, ELECTRONIC paper, KERNEL functions, RANDOM forest algorithms
- Abstract
Considering the demand for the optimization of the technical factors of spraying for a greater area coverage and minimal drift, field tests were carried out to determine the interaction between the area coverage, number of droplets per cm², droplet diameter, and drift. The studies were conducted with two different types of sprayers (axial and radial fan) in an apple orchard and a vineyard. The technical factors of the spraying interactions were nozzle type (ISO code 015, code 02, and code 03), working speed (6 and 8 km h⁻¹), and spraying norm (250–400 L h⁻¹). The airflow of both sprayers was adjusted to the plantation leaf mass and the working pressure was set for each repetition separately. A method using water-sensitive paper and a digital image analysis was used to collect data on coverage factors. The data from the field research were processed using four machine learning models: quantile random forest (QRF), support vector regression with radial basis function kernel (SVR), Bayesian Regularization for Feed-Forward Neural Networks (BRNN), and Ensemble Machine Learning (ENS). Nozzle type had the highest predictive value for the properties of number of droplets per cm² (axial = 69.1%; radial = 66.0%), droplet diameter (axial = 30.6%; radial = 38.2%), and area coverage (axial = 24.6%; radial = 34.8%). Spraying norm had the greatest predictive value for area coverage (axial = 43.3%; radial = 26.9%) and drift (axial = 72.4%; radial = 62.3%). Greater coverage of the treated area and a greater number of droplets were achieved with the radial sprayer, as well as less drift. The accuracy of the machine learning model for the prediction of the treated surface showed a satisfactory accuracy for most properties (R² = 0.694–0.984), except for the estimation of the droplet diameter for an axial sprayer (R² = 0.437–0.503). [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Raman spectrum characteristics and aging diagnosis of oil-paper insulation with different oil-paper ratios.
- Author
- Zhou, Yongkuo, Chen, Weigen, Yang, Dingkun, and Zhang, Ruyue
- Subjects
- RAMAN spectroscopy, RANDOM forest algorithms, INSULATING oils, SUPPORT vector machines
- Abstract
In this paper, a method of aging diagnosis under different oil-paper ratios is proposed by investigating accelerated thermal aging data obtained at different oil-paper ratios. Aged oil-paper insulation samples with oil-paper ratios of 10:1, 15:1 and 20:1 are obtained by accelerated thermal aging tests. The Raman spectra of the transformer oil samples are measured by a Raman spectrometer and screened by a random forest algorithm to extract the features. A modified model is established to correct the features of the aged oil samples under the 15:1 and 20:1 oil-paper ratios to the features under the 10:1 oil-paper ratio. A grid-search-optimized support vector machine (SVM) classification model is established and verified with the corrected features. The experimental results show that the proposed method can effectively identify the aging of oil-paper insulation with different oil-paper ratios. [ABSTRACT FROM AUTHOR]
- Published
- 2020
7. Comprehensive evaluation of the transformer oil-paper insulation state based on RF-combination weighting and an improved TOPSIS method.
- Author
- Fugen Song and Shichao Tong
- Subjects
- *ELECTRIC transformers, *RANDOM forest algorithms, *FEATURE extraction, *TOPSIS method, *DATA analysis
- Abstract
The accurate identification of the oil-paper insulation state of a transformer is crucial for most maintenance strategies. This paper presents a multi-feature comprehensive evaluation model based on combination weighting and an improved technique for order of preference by similarity to ideal solution (TOPSIS) method to perform an objective and scientific evaluation of the transformer oil-paper insulation state. Firstly, multiple aging features are extracted from the recovery voltage polarization spectrum and the extended Debye equivalent circuit owing to the limitations of using a single feature for evaluation. A standard evaluation index system is then established by using the collected time-domain dielectric spectrum data. Secondly, this study implements the per-unit value concept to integrate the dimension of the index matrix and calculates the objective weight by using the random forest algorithm. Furthermore, it combines the weighting model to overcome the drawbacks of the single weighting method by using the indicators and considering the subjective experience of experts and the random forest algorithm. Lastly, the enhanced TOPSIS approach is used to determine the insulation quality of an oil-paper transformer. A verification example demonstrates that the evaluation model developed in this study can efficiently and accurately diagnose the insulation status of transformers. Essentially, this study presents a novel approach for the assessment of transformer oil-paper insulation. [ABSTRACT FROM AUTHOR]
- Published
- 2022
8. A Machine Learning Model to Predict Citation Counts of Scientific Papers in Otology Field.
- Author
- Alohali, Yousef A., Fayed, Mahmoud S., Mesallam, Tamer, Abdelsamad, Yassin, Almuhawas, Fida, and Hagr, Abdulrahman
- Subjects
- DECISION trees, SERIAL publications, NATURAL language processing, BIBLIOMETRICS, MACHINE learning, REGRESSION analysis, RANDOM forest algorithms, CITATION analysis, DESCRIPTIVE statistics, PREDICTION models, ARTIFICIAL neural networks, MEDICAL research, MEDICAL specialties & specialists, ALGORITHMS
- Abstract
One of the most widely used measures of scientific impact is the number of citations. However, because citation counts follow a heavy-tailed distribution, they are fundamentally difficult to predict, although prediction can be improved. This study aimed to investigate the factors and parts of a paper that influence the citation count of a scientific paper in the otology field. To that end, this work proposes a new solution that uses machine learning and natural language processing to process English text and outputs a predicted citation count for a paper. Different algorithms are implemented in this solution, such as linear regression, boosted decision tree, decision forest, and neural networks. The application of neural network regression revealed that papers' abstracts have more influence on the citation numbers of otological articles. This new solution has been developed in visual programming using Microsoft Azure machine learning at the back end and Programming Without Coding Technology at the front end. We recommend using machine learning models to improve the abstracts of research articles to get more citations. [ABSTRACT FROM AUTHOR]
- Published
- 2022
9. Analysis of handmade paper by Raman spectroscopy combined with machine learning.
- Author
- Yan, Chunsheng, Cheng, Zhongyi, Luo, Si, Huang, Chen, Han, Songtao, Han, Xiuli, Du, Yuandong, and Ying, Chaonan
- Subjects
- *MACHINE learning, *RAMAN spectroscopy, *SUPPORT vector machines, *K-nearest neighbor classification, *PRINCIPAL components analysis, *RANDOM forest algorithms, *SPECTRAL imaging, *MULTISPECTRAL imaging
- Abstract
Handmade paper is a major carrier and restoration material of traditional Chinese ancient books, calligraphies, and paintings. In this study, we carried out a Raman spectroscopy analysis of 18 types of handmade paper samples. The main components of the handmade paper were cellulose and lignin, according to the wavenumber and Raman vibration assignment. We divided its Raman spectrum into eight subbands. Five machine learning models were employed: principal component analysis (PCA), partial least squares (PLS), support vector machine (SVM), k‐nearest neighbors (KNN), and random forest (RF). The Raman spectral data were normalized, and the fluorescence envelope was subtracted using the airPLS algorithm to obtain four types of data: raw, normalized, defluorescence, and fluorescence data. An RF variable importance analysis of data processing showed that data normalization eliminated the intensity differences of fluorescence signals caused by lignin, which contained important information of raw materials and papermaking technology, let alone the data defluorescence. The data processing also reduced the importance of the average variables in almost all spectral bands. Nevertheless, the data processing is worthwhile because it significantly improves the accuracy of machine learning, and the information loss does not affect the prediction. Using the machine learning models of PCA, PLS, and SVM combined with linear regression (LR), KNN, and RF, the classification and prediction of handmade paper samples were realized. For almost all processed data, including the fluorescence data, PCA‐LR had the highest classification and prediction accuracy (R² = 1) in almost all spectral bands. PLS‐LR and SVM‐LR had the second‐highest accuracies (R² = 0.4–0.9), whereas KNN and RF had the lowest accuracies (R² = 0.1–0.4) for full band spectral data. Our results suggest that the abundant information contained in Raman spectroscopy combined with powerful machine learning models could inspire further studies on handmade paper and related cultural relics. [ABSTRACT FROM AUTHOR]
- Published
- 2022
10. Application of Selected Machine Learning Techniques for Identification of Basic Classes of Partial Discharges Occurring in Paper-Oil Insulation Measured by Acoustic Emission Technique.
- Author
- Boczar, Tomasz, Borucki, Sebastian, Jancarczyk, Daniel, Bernas, Marcin, and Kurtasz, Pawel
- Subjects
- *ACOUSTIC emission, *PARTIAL discharges, *NAIVE Bayes classification, *SUPPORT vector machines, *MACHINE learning, *RANDOM forest algorithms, *CLASSIFICATION algorithms, *K-nearest neighbor classification
- Abstract
The paper reports the results of a comparative assessment of the effectiveness of selected machine learning methods in identifying the basic forms of partial discharges (PD) measured by the acoustic emission technique (AE). As part of the research, the identification involved AE signals registered in laboratory conditions for eight basic classes of PDs that occur in paper-oil insulation systems of high-voltage power equipment. On the basis of acoustic signals emitted by PDs and by application of the frequency descriptor that took the form of a signal power density spectrum (PSD), the assessment involved the possibility of identifying individual types of PD by the analyzed classification algorithms. The results obtained with five independent classification mechanisms were analyzed, namely: k-Nearest Neighbors method (kNN), Naive Bayes Classification, Support Vector Machine (SVM), Random Forests and Probabilistic Neural Network (PNN). The best results were achieved using the SVM classification tuned with polynomial core, which obtained 100% accuracy. Similar results were achieved with the kNN classifier. Random Forests and Naive Bayes obtained high accuracy, over 97%. Throughout the study, identification algorithms with the highest effectiveness in identifying specific forms of PD were established. [ABSTRACT FROM AUTHOR]
- Published
- 2022
11. Evaluation method for moisture content of oil‐paper insulation based on segmented frequency domain spectroscopy: From curve fitting to machine learning.
- Author
- Yao, Huanmin, Mu, Haibao, Ding, Ning, Zhang, Daning, Liang, ZhaoJie, Tian, Jie, and Zhang, Guanjun
- Subjects
- *SPECTROMETRY, *MOISTURE, *MACHINE learning, *RANDOM forest algorithms, *DECISION trees
- Abstract
In recent years, frequency domain spectroscopy (FDS) has often been used to evaluate the oil-paper insulation state in power transformer bushings, but it is still very difficult to evaluate the moisture content accurately and quickly. To solve this problem, this paper proposes an intelligent algorithm based on random forest regression (RFR) to construct an efficient evaluation method through segmented FDS curves. Furthermore, the characteristics of FDS curves were studied and the intelligent method was compared with support vector regression (SVR) and deep neural networks (DNN). The results show that the dielectric loss and the real and imaginary parts of the complex capacitance all move upward as the moisture increases, so they can be used as the input features of the evaluation model; the moisture content evaluation accuracy of the RFR model over the whole frequency band is higher than that of the SVR and DNN models; and as the lower cut-off frequency (FDS test stop frequency) increases, the FDS test time is greatly shortened while the accuracy of the RFR model can still meet the evaluation requirements. Therefore, the data in a compromise frequency band can be used to evaluate the moisture content of oil-paper insulation accurately and quickly. [ABSTRACT FROM AUTHOR]
- Published
- 2021
12. Optimal Feature Selection through Search-Based Optimizer in Cross Project.
- Author
- Faiz, Rizwan bin, Shaheen, Saman, Sharaf, Mohamed, and Rauf, Hafiz Tayyab
- Subjects
- FEATURE selection, RANDOM forest algorithms, GENETIC algorithms, INDEPENDENT variables, FILTER paper
- Abstract
Cross project defect prediction (CPDP) is a key method for estimating defect-prone modules of software products. CPDP is a tempting approach since it provides information about predicted defects for those projects in which data are insufficient. Recent studies specifically include instructions on how to pick training data from large datasets using a feature selection (FS) process, which contributes the most to the end results. The classifier helps classify the selected dataset into specified classes in order to predict the defective and non-defective classes. The aim of our research is to select the optimal set of features from multi-class data through a search-based optimizer for CPDP. We used the explanatory research type and a quantitative approach for our experimentation. The F1 measure is our dependent variable, while the KNN filter, ANN filter, random forest ensemble (RFE) model, genetic algorithm (GA), and classifiers are our manipulated independent variables. Our experiment follows a 1 factor 1 treatment (1F1T) design for RQ1, whereas RQ2, RQ3, and RQ4 follow a 1 factor 2 treatments (1F2T) design. We first carried out exploratory data analysis (EDA) to understand the nature of our dataset. Then we pre-processed our data by resolving the issues identified. During data preprocessing, we found that we have multi-class data; therefore, we first rank features and select multiple feature sets using the info gain algorithm to get maximum variation in features for the multi-class dataset. To remove noise, we use the ANN filter and obtain significant improvements of more than 40% to 60% compared to the NN filter with the base paper (all, ckloc, IG). Then we applied a search-based optimizer, i.e., random forest ensemble (RFE), to get the best feature set for a software prediction model, and we obtained 30% to 50% significant improvements compared with genetic instance selection (GIS). Then we used a classifier to predict defects for CPDP. We compare the results of the classifier with the base paper classifier using the F1 measure and obtain almost 35% more than the base paper. We validate the experiment using the Wilcoxon and Cohen's d tests. [ABSTRACT FROM AUTHOR]
- Published
- 2023
13. E-commerce demand forecasting based on an LSTM model that accounts for lag effects.
- Author
- 包吉祥, 李林, and 赵梦鸽
- Subjects
- CONSUMER behavior, CONSUMER goods, RANDOM forest algorithms, PAPER products, ONLINE shopping, DEMAND forecasting
- Abstract
Copyright of Journal of Computer Engineering & Applications is the property of Beijing Journal of Computer Engineering & Applications Journal Co Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2022
14. Predicting the clinical citation count of biomedical papers using multilayer perceptron neural network.
- Author
- Li, Xin, Tang, Xuli, and Cheng, Qikai
- Subjects
- CITATION networks, SUPPORT vector machines, RANDOM forest algorithms, SCIENTIFIC community
- Abstract
• This study predicted the clinical citation count of biomedical papers using a multilayer perceptron neural network model, which outperformed five other baseline models (i.e., linear regression, support vector machine, random forest, KNN, and XGBoost).
• The most important features for predicting the clinical citation count of biomedical papers are features in the reference dimension, which have been ignored in previous research. Meanwhile, clinical translation-related features are important for predicting the clinical citation count of basic papers but not of papers closer to clinical research.
• Features that have previously been demonstrated to be highly related to the citation count of academic papers are not important for the clinical citation count prediction of biomedical papers.
• This study could be useful for policymakers and pharmaceutical companies to assess the translational progress of biomedical research early and to monitor, in real time, biomedical research with high potential to be clinically translated.
The number of clinical citations received from clinical guidelines or clinical trials has been considered as one of the most appropriate indicators for quantifying the clinical impact of biomedical papers. Therefore, the early prediction of clinical citation count of biomedical papers is critical to scientific activities in biomedicine, such as research evaluation, resource allocation, and clinical translation. In this study, we designed a four-layer multilayer perceptron neural network (MPNN) model to predict the clinical citation count of biomedical papers in the future by using 9,822,620 biomedical papers published from 1985 to 2005. We extracted ninety-one paper features from three dimensions as the input of the model, including twenty-one features in the paper dimension, thirty-five in the reference dimension, and thirty-five in the citing paper dimension. In each dimension, the features can be classified into three categories, i.e., the citation-related features, the clinical translation-related features, and the topic-related features. Besides, in the paper dimension, we also considered the features that have previously been demonstrated to be related to the citation counts of research papers. The results showed that the proposed MPNN model outperformed the other five baseline models, and the features in the reference dimension were the most important. In all the three dimensions, the citation-related and topic-related features were more important than the clinical translation-related features for the prediction. It also turned out that the features helpful in predicting the citation count of papers are not important for predicting the clinical citation count of biomedical papers. Furthermore, we explored the MPNN model based on different categories of biomedical papers. The results showed that the clinical translation-related features were more important for the prediction of clinical citation count of basic papers rather than those papers closer to clinical science. This study provided a novel dimension (i.e., the reference dimension) for the research community and could be applied to other related research tasks, such as the research assessment for translational programs. In addition, the findings in this study could be useful for biomedical authors (especially for those in basic science) to get more attention from clinical research. [ABSTRACT FROM AUTHOR]
- Published
- 2022
15. Consumer behaviour in e-Tourism: Exploring new applications of machine learning in tourism studies.
- Author
- Mendieta-Aragón, Adrián and Garín-Muñoz, Teresa
- Subjects
- CONSUMER behavior, MACHINE learning, TOURISM websites, RANDOM forest algorithms, CONSUMERS' reviews, FOOD tourism, ELECTRONIC paper, TOURISM, REGIONAL economic disparities
- Abstract
Copyright of Investigaciones Turisticas is the property of Investigaciones Turisticas and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
16. 31‐2: Student Paper: Fermi Level Prediction of Solution‐processed Ultra‐wide Band gap a‐Ga2Ox via Supervised Machine Learning Models.
- Author
- Purnawati, Diki, Regonia, Paul Rossener, Bermundo, Juan Paolo, Ikeda, Kazushi, and Uraoka, Yukiharu
- Subjects
- SUPERVISED learning, FERMI level, BAND gaps, MACHINE learning, RANDOM forest algorithms, ULTRA-wideband radar
- Abstract
This work presents machine learning (ML) assisted Fermi level prediction of solution‐processed ultra‐wide bandgap (UWB) amorphous gallium oxide (a‐Ga2Ox) which can significantly accelerate the fabrication of semiconducting UWB a‐Ga2Ox‐based material for future display application. Different models such as Kernel Ridge Regression (KRR), Support Vector Regression (SVR) and Random Forest Regression (RFR) were trained with empirical features, including experimental thickness, annealing temperature and environment during the solution‐processed UWB a‐Ga2Ox film fabrication. This work is a big step towards rapid and cost‐effective optimization method of fabricating UWB a‐Ga2Ox‐based devices. [ABSTRACT FROM AUTHOR]
- Published
- 2022
17. The use of artificial intelligence in learning management systems in the context of higher education: Systematic literature review.
- Author
- Manhiça, Ruben, Santos, Arnaldo, and Cravino, José
- Subjects
- ARTIFICIAL intelligence, LEARNING Management System, HIGHER education, EDUCATIONAL technology, RANDOM forest algorithms
- Abstract
Artificial intelligence (AI) has been developing, and its application has been spreading at a good pace in recent years, so much so that AI has become part of everyday life in various sectors. According to several international reports, AI in Education is one of the emerging fields of technology in the education sector, in which much research is being developed to support educational processes. This paper aims to provide an overview of the research on AI applications in learning management systems (LMS) in higher education through a systematic literature review following the protocol proposed by Kitchenham [1]. Three hundred six papers published from 2010 to 2022 were initially identified in the Scopus and EBSCOhost databases, from which 33 papers were selected for final analysis according to the defined inclusion and exclusion criteria. The research results show that the LMS most used for implementing AI solutions in education is Moodle and that AI has been most used for student performance assessment based on student data. Among the AI algorithms used, Random Forest, Neural Networks, K-means, Naive Bayes, Support Vector Machine, and decision trees stand out. [ABSTRACT FROM AUTHOR]
- Published
- 2022
18. 2021 Georisk best paper award, most cited paper award and best EBM award.
- Author
- Zhang, Limin
- Subjects
- AWARDS, RANDOM forest algorithms
- Published
- 2022
19. Ink analysis based forensic investigation of handwritten legal documents.
- Author
- Roy, Priyanka and Bag, Soumen
- Subjects
- LEGAL documents, FORENSIC sciences, NEW words, RANDOM forest algorithms, ELECTRONIC paper
- Abstract
Document falsification is among the fastest growing problems all over the world. Disclosure of such documents is not always possible due to the conspiracy of attorney bodies, especially for legal documents such as bank cheques, contracts, cash memos, and so on. Detecting tampering of handwritten judicial documents through the addition of new word(s) is the prime objective of this research. Minute alteration in writing causes financial loss to a person or to an organization and decreases the global economy. Such intangible assets remain undiscovered owing to lack of proper forensic techniques. Though imitating a writing style is possible, it is nearly impossible for an imitator to obtain exactly the same pen used for the authorized document. Hence, the paper introduces a solution to detect forgery in handwritten legal documents by analyzing perceptually similar pen ink. Forgery happens at either end of a written document by appending new word(s)/letter(s) with a similar type of pen. The work is formulated as a binary classification problem and established with the help of several statistical features and three different classifiers: Multilayer Perceptron (MLP), RBF-SVM, and Random Forest (RF). Besides, the problem has also been implemented through some DCNN approaches to check whether it is possible to reflect the forgery by direct approaches. The efficiency of the proposed method is quite promising for involvement in the examination of forensic documents. [ABSTRACT FROM AUTHOR]
- Published
- 2022
20. Research on the improvement of teachers' teaching ability based on machine learning and digital twin technology.
- Author
-
Siyan, Chen, Tinghuai, Wang, Xiaomei, Li, Liu, Zhu, Danying, Wu, Paul, Anand, Cheung, Simon K.S., Ho, Chiung Ching, and Din, Sadia
- Subjects
TEACHER development ,MACHINE learning ,RANDOM forest algorithms ,ELECTRONIC paper ,DATA scrubbing - Abstract
The qualitative analysis results of teachers' abilities are difficult to quantify, and ability problems in the teaching process are difficult to be effectively measured. In order to study methods to improve teachers' teaching abilities, this paper builds a corresponding teacher competence evaluation model based on machine learning and digital twin technology, establishes a data collection model for teachers' professional competence, and establishes a data fusion model. It includes data cleaning model based on XML information template, data integration model, multi-index screening mechanism and clustering strategy based on perturbation attributes. On this basis, this paper uses decision tree algorithm, random forest algorithm and neural network algorithm to construct three scheduling rule mining models aiming at teachers' professional ability. In addition, this paper establishes a digital twin-driven multi-knowledge model scheduling optimization architecture that uses the three scheduling rules mined. The research results show that the model constructed in this paper has good performance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
21. Quantum Inspired Evolutionary Computing Algorithms for Complex Optimization Problems.
- Author
-
JIA-BAO LIU
- Subjects
QUANTUM computers ,ARTIFICIAL intelligence ,CONVOLUTIONAL neural networks ,EVOLUTIONARY algorithms ,RANDOM forest algorithms ,FEATURE extraction ,SOFT computing - Published
- 2024
22. CRIME STATUS PREDICTION USING ENSEMBLE LEARNING.
- Author
-
JAIN, SANJAY and SINGH, PRASHANT
- Subjects
LOCATION data ,CRIME ,RANDOM forest algorithms ,DECISION trees ,CRIME statistics ,DATA analysis ,FORECASTING - Abstract
This paper focuses on crime status prediction through an ensemble methodology applied to extensive datasets obtained from catalog.data.gov, specifically targeting Los Angeles crime incidents since 2020. The research methodology comprises meticulous data collection, rigorous preprocessing, exploratory data analysis, model selection, and comprehensive model evaluation. Initial challenges included data inaccuracies and privacy-preserving measures in location data, necessitating thorough cleaning and transformation processes. Exploratory data analysis revealed crucial insights, including the 'Status' attribute's limited correlation, crime code distributions, areawise crime counts, and temporal patterns. To address class imbalance within 'Status', the Synthetic Minority Oversampling Technique (SMOTE) was applied to balance the dataset. Model evaluation highlighted the superiority of random forest models employing 10 and 20 decision trees, alongside KNN, which demonstrated consistent high accuracy, balanced precision-recall trade-offs, and notable F1 scores in crime status prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
23. Applied Computing and Artificial Intelligence.
- Author
-
Li, Xiang, Zhang, Shuo, and Zhang, Wei
- Subjects
ARTIFICIAL intelligence ,DEEP learning ,RANDOM forest algorithms ,DIFFERENTIAL evolution ,CONVOLUTIONAL neural networks ,REMAINING useful life ,DRIVER assistance systems - Abstract
Applied computing and artificial intelligence methods have been attracting growing interest in recent years due to their effectiveness in solving technical problems. Hao et al. [[14]] present an unsupervised fault diagnosis methodology to leverage the generated MPCs of different working conditions to diagnose the actual unlabeled MPCs. The paper by Ainapure et al. [[17]] proposes a new cross-domain fault diagnosis method with enhanced robustness. The paper by Saeed et al. [[1]] proposes an approach to building an AutoML data-dependent CNN model (DeepPCANet) customized for DR screening automatically. [Extracted from the article]
- Published
- 2023
24. Natural killer cell detection, quantification, and subpopulation identification on paper microfluidic cell chromatography using smartphone-based machine learning classification.
- Author
-
Zenhausern, Ryan, Day, Alexander S., Safavinia, Babak, Han, Seungmin, Rudy, Paige E., Won, Young-Wook, and Yoon, Jeong-Yeol
- Subjects
- *
MACHINE learning , *MICROFLUIDIC devices , *SMARTPHONES , *MICROFLUIDICS , *RANDOM forest algorithms , *CELL analysis , *CHROMATOGRAPHIC analysis , *KILLER cells - Abstract
Natural killer (NK) cells are immune cells that defend against viral infections and cancer and are used in cancer immunotherapies. Subpopulations of NK cells include CD56dim and CD56bright which either produce cytokines or cytotoxically kill cells directly. The absolute number and proportion of these cells in peripheral blood are tied to proper immune function. Current methods of cytokine detection and proportion of NK cell subpopulations require fluorescent dyes and highly specialized equipment, e.g., flow cytometry, thus rapid cell quantification and subpopulation analysis are needed in the clinical setting. Here, a smartphone-based device and a two-component paper microfluidic chip were used towards identifying NK cell subpopulation and inflammatory markers. One unit measured flow velocity via smartphone-captured video, determining cytokine (IL-2) and total NK cell concentrations in undiluted buffy coat blood samples. The other, single flow lane unit performs spatial separation of CD56dim and CD56bright and cells over its length using differential binding of anti-CD56 nanoparticles. A smartphone microscope combined with cloud-based machine learning predictive modeling (utilizing a random forest classification algorithm) analyzed both flow data and NK cell subpopulation differentiation. Limits of detection for cytokine and cell concentrations were 98 IU/mL and 68 cells/mL, respectively, and cell subpopulation analysis showed 89% accuracy. • First smartphone-based paper microfluidic cell chromatography that can identify cell subpopulation. • Machine learning predictive modeling for NK cell subpopulation differentiation. • Integration of both cell chromatography and flow rate analysis on a single platform. • Potential application to many other cytokines and cell subpopulation analyses. [ABSTRACT FROM AUTHOR]
- Published
- 2022
25. PREDICTIVE CULTIVATION: INTEGRATING METEOROLOGICAL DATA AND MACHINE LEARNING FOR ENHANCED CROP YIELD FORECAST.
- Author
-
KALYANI, BJD, SHAHANAZ, SHAIK, and SAI, KOPPARTHI PRANEETH
- Subjects
MACHINE learning ,CHICKPEA ,RANDOM forest algorithms ,CROP yields ,AGRICULTURE - Abstract
Agriculture is a key component of Telangana's economy, and greater performance in this sector is crucial for inclusive growth. A central challenge is yield estimation: predicting crop yields before harvesting. This paper addresses this challenge with machine learning approaches including Naive Bayes, KNN, and Random Forest. The parameters considered for model testing are crop, season, rainfall, and location. The paper includes a case study of Telangana, using the Telangana weather data set, to analyze key factors such as overall rainfall recorded for each Mandal, overall seasonal yield in selected years, seasonal yield of major crops such as Bengal gram, groundnut, and maize, and overall yield in two agricultural seasons: rabi and kharif. The Random Forest model produces the highest accuracy, 99.32%, compared with the other models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
26. Assessment of Ensemble-Based Machine Learning Algorithms for Exoplanet Identification.
- Author
-
Luz, Thiago S. F., Braga, Rodrigo A. S., and Ribeiro, Enio R.
- Subjects
RANDOM forest algorithms ,ALGORITHMS ,EXTRASOLAR planets ,MATRICES (Mathematics) ,CLASSIFICATION - Abstract
This paper presents a comprehensive assessment procedure for evaluating Ensemble-based Machine Learning algorithms in the context of exoplanet classification. Each of the algorithm hyperparameter values were tuned. Deployments were carried out using the cross-validation method. Performance metrics, including accuracy, sensitivity, specificity, precision, and F1 score, were evaluated using confusion matrices generated from each implementation. Machine Learning (ML) algorithms were trained and used to identify exoplanet data. Most of the current research deals with traditional ML algorithms for this purpose. The Ensemble algorithm is another type of ML technique that combines the prediction performance of two or more algorithms to obtain an improved final prediction. Few studies have applied Ensemble algorithms to predict exoplanets. To the best of our knowledge, no paper that has exclusively assessed Ensemble algorithms exists, highlighting a significant gap in the literature about the potential of Ensemble methods. Five Ensemble algorithms were evaluated in this paper: Adaboost, Random Forest, Stacking, Random Subspace Method, and Extremely Randomized Trees. They achieved an average performance of more than 80% in all metrics. The results underscore the substantial benefits of fine tuning hyperparameters to enhance predictive performance. The Stacking algorithm achieved a higher performance than the other algorithms. This aspect is discussed in this paper. The results of this work show that it is worth increasing the use of Ensemble algorithms to improve exoplanet identification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
27. Startup Sustainability Forecasting with Artificial Intelligence.
- Author
-
Takas, Nikolaos, Kouloumpris, Eleftherios, Moutsianas, Konstantinos, Liapis, Georgios, Vlahavas, Ioannis, and Kousenidis, Dimitrios
- Subjects
NATURAL language processing ,DECISION support systems ,ARTIFICIAL intelligence ,ATTENTION span ,RANDOM forest algorithms - Abstract
In recent years, we have witnessed a massive increase in the number of startups, which are also producing significant amounts of digital data. This poses a new challenge for expert analysts due to their limited attention spans and knowledge, also considering the low success rate of empirical startup evaluation. However, this new era also presents a great opportunity for the application of artificial intelligence (AI) towards intelligent startup investments. There are only a few works that have considered the potential of AI for startup recommendation, and they have not paid attention to the actual requirements of investors, also neglecting to investigate the desirability, feasibility, and value proposition of this venture. In this paper, we answer these questions by conducting a survey in collaboration with three major organizations of the Greek startup ecosystem. Furthermore, this paper also presents the design specifications for an AI-based decision support system for forecasting startup sustainability that is aligned with the requirements of expert analysts. Preliminary experiments with 44 Greek startups demonstrate Random Forest's strong ability to predict sustainability scores. [ABSTRACT FROM AUTHOR]
- Published
- 2024
28. Interactive 3D Vase Design Based on Gradient Boosting Decision Trees.
- Author
-
Wang, Dongming, Xu, Xing, Xia, Xuewen, and Jia, Heming
- Subjects
OPTIMIZATION algorithms ,TECHNOLOGICAL innovations ,COMPUTER-aided design software ,GENETIC algorithms ,RANDOM forest algorithms - Abstract
Traditionally, ceramic design began with sketches on rough paper and later evolved into using CAD software for more complex designs and simulations. With technological advancements, optimization algorithms have gradually been introduced into ceramic design to enhance design efficiency and creative diversity. The use of Interactive Genetic Algorithms (IGAs) for ceramic design is a new approach, but an IGA requires a significant amount of user evaluation, which can result in user fatigue. To overcome this problem, this paper introduces the LightGBM algorithm and the CatBoost algorithm to improve the IGA because they have excellent predictive capabilities that can assist users in evaluations. The algorithms are also applied to a vase design platform for validation. First, bicubic Bézier surfaces are used for modeling, and the genetic encoding of the vase is designed with appropriate evolutionary operators selected. Second, user data from the online platform are collected to train and optimize the LightGBM and CatBoost algorithms. Finally, LightGBM and CatBoost are combined with an IGA and applied to the vase design platform to verify their effectiveness. Comparing the improved algorithms to traditional IGAs, KD trees, Random Forest, and XGBoost, it is found that the IGAs improved with LightGBM and CatBoost perform better overall, requiring fewer evaluations and less time. Their R² values are higher than those of other proxy models, reaching 0.816 and 0.839, respectively. The improved method proposed in this paper can effectively alleviate user fatigue and enhance the user experience in product design participation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. ANALYSIS OF AN ENHANCED RANDOM FOREST ALGORITHM FOR IDENTIFYING ENCRYPTED NETWORK TRAFFIC.
- Author
-
Xiaoqing Yang, Angkawisittpan, Niwat, and Xinyue Feng
- Subjects
RANDOM forest algorithms ,MACHINE learning ,ARTIFICIAL intelligence ,INTERNET security ,DECISION making - Abstract
The focus of this paper is to apply an improved machine learning algorithm to realize the efficient and reliable identification and classification of encrypted network traffic, and to solve the challenges that traditional algorithms face in analyzing traffic once encryption protocols are added. In this study, an enhanced random forest (ERF) algorithm is introduced to optimize the accuracy and efficiency of the identification and classification of encrypted network traffic. Compared with traditional methods, it aims to improve the identification ability of encrypted traffic and fill the knowledge gap in this field. Using publicly available datasets and preprocessing the original PCAP-format packets, the optimal combination of the relevant tree parameters was determined by grid search with cross-validation. The experimental results were evaluated using accuracy, precision, recall, and F1 score. The average precision was more than 98 %, and, compared with the traditional algorithm, the error rate on the traffic test set was reduced and every performance evaluation index was better, which shows that the advantages of the improved algorithm are obvious. In the experiment, the enhanced random forest and traditional random forest models were trained and tested on a series of data sets, and the corresponding test errors were listed as the basis for judging model quality. The experimental results show that the enhanced algorithm is competitive. These findings have implications for cybersecurity professionals, researchers, and organizations, providing a practical solution to enhance threat detection and data privacy in the face of evolving encryption technologies. This study provides valuable insights for practitioners and decision-makers in the cybersecurity field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
30. GLC_FCS30: Global land-cover product with fine classification system at 30 m using time-series Landsat imagery.
- Author
-
Xiao Zhang, Liangyun Liu, Xidong Chen, Yuan Gao, Shuai Xie, and Jun Mi
- Subjects
SPATIAL systems ,COMPUTING platforms ,LAND cover ,CLASSIFICATION ,RANDOM forest algorithms ,PAPER products ,TIME series analysis - Abstract
Over the past decades, many global land-cover products have been released; however, there is still a lack of a global land-cover map that offers both a fine classification system and a fine spatial resolution. In this study, a novel global 30-m land-cover classification with a fine classification system for the year 2015 (GLC_FCS30-2015) was produced by combining time-series of Landsat imagery and high-quality training data from the GSPECLib (Global Spatial Temporal Spectra Library) on the Google Earth Engine computing platform. First, the global training data from the GSPECLib were developed by applying a series of rigorous filters to the MCD43A4 NBAR and CCI_LC land-cover products. Secondly, a local adaptive random forest model was built for each 5° x 5° geographical tile by using the multi-temporal Landsat spectral and texture features of the corresponding training data, and the GLC_FCS30-2015 land-cover product containing 30 land-cover types was generated for each tile. Lastly, the GLC_FCS30-2015 was validated using three different validation systems (containing different land-cover details) using 44 043 validation samples. The validation results indicated that the GLC_FCS30-2015 achieved an overall accuracy of 82.5 % and a kappa coefficient of 0.784 for the level-0 validation system (9 basic land-cover types), an overall accuracy of 71.4 % and kappa coefficient of 0.686 for the UN-LCCS (United Nations Land Cover Classification System) level-1 system (16 LCCS land-cover types), and an overall accuracy of 68.7 % and kappa coefficient of 0.662 for the UN-LCCS level-2 system (24 fine land-cover types).
The comparisons against other land-cover products (CCI_LC, MCD12Q1, FROM_GLC and GlobeLand30) indicated that GLC_FCS30-2015 provides more spatial details than CCI_LC-2015 and MCD12Q1-2015 and a greater diversity of land-cover types than FROM_GLC-2015 and GlobeLand30-2010, and that GLC_FCS30-2015 achieved the best overall accuracy of 82.5 % against FROM_GLC-2015 of 59.1 % and GlobeLand30-2010 of 75.9 %. Therefore, it is concluded that the GLC_FCS30-2015 product is the first global land-cover dataset that provides a fine classification system with high classification accuracy at 30 m. The GLC_FCS30-2015 global land-cover product generated in this paper is available at https://doi.org/10.5281/zenodo.3986871 (Liu et al., 2020). [ABSTRACT FROM AUTHOR]
- Published
- 2020
31. Sentiment Analysis of Coastal Karnataka Daijiworld users with Classic ML Models.
- Author
-
D., Sushma M., Geethalaxmi, and K., Ranganath
- Subjects
MACHINE learning ,SENTIMENT analysis ,K-nearest neighbor classification ,SUPPORT vector machines ,RANDOM forest algorithms - Abstract
This paper presents a sentiment analysis study of about 15,000 reader comments from the "Daijiworld News" forum, a well-known news website in coastal Karnataka. The comments were scraped using Beautiful Soup, a popular web scraping library, and labelled as positive, negative, or neutral. The comments were pre-processed using techniques such as stop word removal, tokenization, stemming, lemmatization, and lowercase conversion. Logistic regression, support vector machine (SVM), naive Bayes, random forest, K-nearest neighbors (KNN), AdaBoost, gradient boosting, and neural networks were used for classification. Performance metrics including accuracy, precision, recall, and F1 score were evaluated. Logistic regression achieved the highest precision (0.75), recall (0.74), accuracy (0.74), and F1 score (0.74), followed closely by the neural network classifier with a precision of 0.670, recall of 0.670, accuracy of 0.670, and F1 score of 0.669. The study demonstrates the effectiveness of logistic regression and neural networks in sentiment analysis of news forum comments, giving useful information for understanding public opinion and improving user engagement. The findings contribute to the field of sentiment analysis, emphasising the significance of web scraping and pre-processing techniques in enhancing sentiment classification accuracy. The results serve as a reference for researchers and practitioners, assisting in the selection of appropriate classifiers for sentiment analysis in similar contexts. The study encourages further exploration of advanced techniques to enhance sentiment classification accuracy in regional news forums, paving the way for future research in sentiment analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
32. ENGLISH DISTANCE TEACHING BASED ON SPOC CLASSROOM AND ONLINE MIXED TEACHING MODE.
- Author
-
MEIJUAN ZHANG and XIAOLI ZHU
- Subjects
ONLINE education ,RANDOM forest algorithms ,EDUCATIONAL outcomes ,LEARNING strategies ,SATISFACTION ,STOCHASTIC learning models - Abstract
At present, the English Small Private Online Course (SPOC) online mixed teaching model has problems in evaluating students' learning and organizing teaching papers. For example, the evaluation is chaotic and unable to meet the key points of organizing the paper. Starting from the thinking chain that accepting learning outcomes can promote learning behavior, a score prediction method and test paper generation algorithm (TPGA) based on a learning evaluation diagnostic model are designed. Among them, the performance prediction algorithm is designed by combining multiple linear regression (MLR) and random forest (RF). The TPGA is based on students' learning status. The research results show that most of the predicted values output by the performance prediction model are not significantly different from the actual values. They are within a reasonable range. Meanwhile, under the influence of TPGA, the number of students in the experimental group is higher in the 70-80 and 80-90 segments, with 27 and 6, respectively. The experimental group has a higher average score rate on each type of question and knowledge point. Both models have high student satisfaction, indicating that the results oriented online mixed learning strategy designed in the study can effectively improve students' learning outcomes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
33. Random forest model in tax risk identification of real estate enterprise income tax.
- Author
-
Xu, Chunmei and Kong, Yan
- Subjects
INCOME tax ,RANDOM forest algorithms ,CORPORATE taxes ,TAXPAYER compliance ,REAL property tax ,TAXATION software - Abstract
The text describes improvements made to the random forest model to enhance its distinctiveness in addressing tax risks within the real estate industry, thereby tackling issues related to tax losses. Firstly, the paper introduces the potential application of the random forest model in identifying tax risks. Subsequently, the experimental analysis focuses on the selection of indicators for tax risk. Finally, the paper develops and utilizes actual taxpayer data to test a risk identification model, confirming its effectiveness. The experimental results indicate that the model's output report includes basic taxpayer information, a summary of tax compliance risks, value-added tax refund situations, directions of suspicious items, and detailed information on common indicators. This paper comprehensively presents detailed taxpayer data, providing an intuitive understanding of tax-related risks. Additionally, the paper reveals the level of enterprise risk registration assessment, risk probability, risk value, and risk assessment ranking. Further analysis shows that enterprise risk points primarily exist in operating income, selling expenses, financial expenses, and total profit. Additionally, the results indicate significant differences between the model's judgment values and declared values, especially in the high-risk probability of total operating income and profit. This implies a significant underreporting issue concerning corporate income tax for real estate enterprises. Therefore, this paper contributes to enhancing the identification of tax risks for real estate enterprises. Using the optimized random forest model makes it possible to accurately assess enterprises' tax compliance risks and identify specific risk points. [ABSTRACT FROM AUTHOR]
- Published
- 2024
34. DISEASE PREDICTION USING NAIVE BAYES, RANDOM FOREST, DECISION TREE, KNN ALGORITHMS.
- Author
-
JYOTHI, PYLA, KUMAR, A. LOKESH, SRI, G. KAVYA, DAKSHAYANI, D., and KAVYA, K. SRI
- Subjects
RANDOM forest algorithms ,K-nearest neighbor classification ,DECISION trees ,NAIVE Bayes classification ,MACHINE learning ,MODERN society ,MEDICAL databases - Abstract
In contemporary society, encountering individuals afflicted with various diseases is a common occurrence, emphasizing the critical need for accurate disease prediction as an integral facet of effective treatment. This paper focuses on leveraging classification algorithms such as Naive Bayes, Random Forest, Decision Tree, and KNN to predict diseases based on patient symptoms. This system enables users to input symptoms and, through meticulous analysis, accurately forecast the disease the patient may be suffering from. The prediction model extends to specific diseases like heart disease and diabetes, providing the outcome of the presence or absence of a particular ailment. The potential impact of such a predictive system on the future of medical treatment is substantial. Upon disease prediction, the system not only identifies the ailment but also recommends the appropriate type of doctor for consultation. This paper reviews recent advancements in utilizing machine learning for disease prediction and emphasizes the creation of an interactive interface as the front-end for user-friendly symptom input. By leveraging machine learning algorithms, this system extracts valuable insights from medical databases, aiding in early disease prediction, patient care, and community services. A comprehensive analysis was conducted using a dataset comprising 4920 patient records with 41 diseases. This integrated machine learning-based disease prediction system represents a significant step forward in leveraging advanced technologies for enhancing healthcare outcomes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
35. An overview of hand gesture recognition based on computer vision.
- Author
-
Tasfia, Rifa, Mohd Yusoh, Zeratul Izzah, Habib, Adria Binte, and Mohaimen, Tousif
- Subjects
CONVOLUTIONAL neural networks ,COMPUTER vision ,RANDOM forest algorithms ,RESEARCH personnel ,GESTURE - Abstract
Hand gesture recognition has emerged as one of the foremost areas of pattern recognition and has gone through several developments. Numerous studies have explored methodologies grounded in computer vision within this domain. Despite extensive research, there is still a need for a more thorough evaluation of the efficiency of various methods in different environments, along with the challenges encountered when applying these methods. The focal point of this paper is the comparison of different research in the domain of vision-based hand gesture recognition. The objective is to identify the most prominent methods by reviewing their efficiency. Concurrently, the paper presents potential solutions for challenges faced in different research. A comparative analysis is presented, centered on traditional methods and convolutional neural networks, such as random forest, long short-term memory (LSTM), heatmap, and you only look once (YOLO), considering their efficacy. Convolutional neural network-based algorithms performed best at recognizing the gestures and gave effective solutions for the challenges faced by the researchers. In essence, the findings of this review paper aim to contribute to future implementations and the discovery of more efficient approaches in the gesture recognition sector. [ABSTRACT FROM AUTHOR]
- Published
- 2024
36. A Hybrid Data Envelopment Analysis–Random Forest Methodology for Evaluating Green Innovation Efficiency in an Asymmetric Environment.
- Author
-
Chen, Limei, Xie, Xiaohan, Yao, Yao, Huang, Weidong, and Luo, Gongzhi
- Subjects
DATA envelopment analysis ,RANDOM forest algorithms ,SUSTAINABLE design ,ECONOMIC efficiency ,SUSTAINABLE development ,TECHNOLOGICAL innovations - Abstract
The accurate evaluation of green innovation efficiency is a critical prerequisite for enterprises to achieve sustainable development goals and improve environmental performance and economic efficiency. This paper evaluates the green innovation efficiency of 72 new-energy enterprises by using a hybrid method of Data Envelopment Analysis (DEA) and a random forest model. The non-parametric DEA model is combined with the parametric SFA model to analyze the real green innovation efficiency on the basis of removing environmental factors and random factors. Then, the random forest model based on a nonlinear relationship is used to evaluate factors impacting green innovation efficiency. This paper proposes a comprehensive evaluation method designed to assess the green innovation efficiency of new-energy enterprises. By applying this method, companies can gain a comprehensive understanding of the current performance in green innovation, facilitating informed decision-making and accelerating sustainable development. [ABSTRACT FROM AUTHOR]
- Published
- 2024
37. Comprehensive Review On The Analysis Of Various Machine Learning Algorithms For Early Detection Of Critical Diseases.
- Author
-
Chitre, Divya, Bhushan, Shivendu, and Patil, Manisha S.
- Subjects
MACHINE learning ,EARLY diagnosis ,SUPPORT vector machines ,K-nearest neighbor classification ,RANDOM forest algorithms - Abstract
Early detection of critical diseases is a pivotal aspect of modern healthcare, significantly impacting patient outcomes and healthcare costs. This research paper provides a comprehensive review and analysis of various machine learning algorithms employed in the realm of early disease detection. The study explores the strengths, limitations, and overall efficacy of prominent algorithms, including Logistic Regression, Support Vector Machines, Random Forests, Neural Networks, K-Nearest Neighbors, and Ensemble Learning. Each algorithm's suitability for early detection is assessed based on factors such as interpretability, scalability, and performance in handling diverse data types. Furthermore, the review discusses the specific applications of these algorithms in different medical contexts, highlighting their contributions to the early identification of critical diseases. By synthesizing the current state of research, this paper aims to provide valuable insights for researchers, and policymakers working towards advancing the field of early disease detection through machine learning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
38. Phishing Attacks Detection Using Ensemble Machine Learning Algorithms.
- Author
-
Innab, Nisreen, Osman, Ahmed Abdelgader Fadol, Ataelfadiel, Mohammed Awad Mohammed, Abu-Zanona, Marwan, Elzaghmouri, Bassam Mohammad, Zawaideh, Farah H., and Alawneh, Mouiad Fadeil
- Subjects
SOCIAL engineering (Fraud) ,INTERNET fraud ,ARTIFICIAL intelligence ,PHISHING ,RANDOM forest algorithms - Abstract
Phishing, an Internet fraud where individuals are deceived into revealing critical personal and account information, poses a significant risk to both consumers and web-based institutions. Data indicates a persistent rise in phishing attacks. Moreover, these fraudulent schemes are progressively becoming more intricate, thereby rendering them more challenging to identify. Hence, it is imperative to utilize sophisticated algorithms to address this issue. Machine learning is a highly effective approach for identifying and uncovering these harmful behaviors. Machine learning (ML) approaches can identify common characteristics in most phishing assaults. In this paper, we propose an ensemble approach and compare it with six machine learning techniques to determine the type of website and whether it is normal or not based on two phishing datasets. After that, we used the normalization technique on the dataset to transform the range of all the features into the same range. The findings of this paper for all algorithms are as follows in the first dataset based on accuracy, precision, recall, and F1-score, respectively: Decision Tree (DT) (0.964, 0.961, 0.976, 0.968), Random Forest (RF) (0.970, 0.964, 0.984, 0.974), Gradient Boosting (GB) (0.960, 0.959, 0.971, 0.965), XGBoost (XGB) (0.973, 0.976, 0.976, 0.976), AdaBoost (0.934, 0.934, 0.950, 0.942), Multi Layer Perceptron (MLP) (0.970, 0.971, 0.976, 0.974) and Voting (0.978, 0.975, 0.987, 0.981). So, the Voting classifier gave the best results. While in the second dataset, all the algorithms gave the same results in four evaluation metrics, which indicates that each of them can effectively accomplish the prediction process. Also, this approach outperformed the previous work in detecting phishing websites with high accuracy, a lower false negative rate, a shorter prediction time, and a lower false positive rate. [ABSTRACT FROM AUTHOR]
- Published
- 2024
39. Comparative Performance Analysis of Filling Missing Values Algorithms in PdM Systems of UAV.
- Author
-
ANDRIOAIA, Dragoş Alexandru, GĂITAN, Vasile Gheorghiţă, PĂTRUŢ, Bogdan, and FURDU, Iulian
- Subjects
REMAINING useful life, MISSING data (Statistics), SYSTEM failures, K-nearest neighbor classification, RANDOM forest algorithms
- Abstract
With the development of the IoT domain, the volume of data produced by various applications has also increased. For multiple reasons, such as sensor failure, communication system failure, and human error, the data acquired from sensors contain missing values. The presence of missing values affects the informational content of the dataset and thus the process of extracting knowledge from the data. In this paper, the authors present a comparative analysis of the performance of methods for filling in missing values, such as method, Interpolation, Mean, K-Nearest Neighbors (KNN), and Random Forests (RF), on data coming from a Predictive Maintenance (PdM) system that can be used on Unmanned Aerial Vehicles (UAVs). The data on which the performance of these methods has been studied come from a UAV PdM system used to identify defects in Brushless DC (BLDC) motors and to estimate the Remaining Useful Life (RUL) of Li-ion batteries. [ABSTRACT FROM AUTHOR]
- Published
- 2024
40. Quality design based on kernel trick and Bayesian semiparametric model for multi-response processes with complex correlations.
- Author
-
Yang, Shijuan, Wang, Jianjun, Cheng, Xiaoying, Wu, Jiawei, and Liu, Jinpei
- Subjects
PRINCIPAL components analysis, EVOLUTIONARY algorithms, RANDOM forest algorithms, LEAST squares
- Abstract
Processes or products are typically complex systems with numerous interrelated procedures and interdependent components. This results in complex relationships between responses and input factors, as well as complex nonlinear correlations among multiple responses. If the two types of complex correlations in the quality design cannot be properly dealt with, it will affect the prediction accuracy of the response surface model, as well as the accuracy and reliability of the recommended optimal solutions. In this paper, we combine kernel trick-based kernel principal component analysis, spline-based Bayesian semiparametric additive model, and normal boundary intersection-based evolutionary algorithm to address these two types of complex correlations. The effectiveness of the proposed method in modeling and optimisation is validated through a simulation study and a case study. The results show that the proposed Bayesian semiparametric additive model can better describe the process relationships compared to least squares regression, random forest regression, and support vector basis regression, and the proposed multi-objective optimisation method performs well on several indicators mentioned in the paper. [ABSTRACT FROM AUTHOR]
- Published
- 2024
41. Breathing site classification via joint mel frequency cepstral coefficients and gammatone frequency cepstral coefficients approach.
- Author
-
Zhang, Jiarui and Ling, Bingo Wing-Kuen
- Subjects
AUTOMATIC speech recognition, SIGNAL-to-noise ratio, NASOPHARYNX cancer, RESPIRATION, RANDOM forest algorithms, CLASSIFICATION
- Abstract
Patients with nasopharyngeal cancer are required to breathe through their mouth after surgery. Hence, it is necessary to perform breathing site classification and to use the classification results to indicate whether the patients are breathing correctly. Nevertheless, there is currently no such medical aid tool on the market. To address this issue, this paper extracts both mel frequency cepstral coefficient (MFCC) based features and gammatone frequency cepstral coefficient (GFCC) based features, and employs a random forest as the classifier for breathing site classification. Data lasting a few minutes, acquired from 10 volunteers, are employed to demonstrate the effectiveness of the proposed method. The computer numerical simulation results show that the average accuracy, the average specificity and the average sensitivity yielded by the proposed method are 95.30±2.00%, 93.27±3.87% and 97.15±1.87%, respectively. Although this paper proposes a method based on the fusion of two types of acoustic features for classifying different breathing sites, the computer numerical simulation results show that the proposed method outperforms the common respiration or speech processing based methods. The proposed method is also compared to a series of relevant methods. It is found that the proposed method achieves the highest classification results at the majority of signal-to-noise ratios among the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
42. Revolutionizing Shopping Experience: Smart Carts and Shelves.
- Author
-
Rane, Milind, Ghusarkar, Ajinkya Ghansham, Nikam, Girish Anil, Thokal, Atharva Rajesh, Shintre, Sourabh Bholanath, and Kulkarni, Atul
- Subjects
INVENTORY control, RADIO frequency identification systems, CUSTOMER satisfaction, SHOPPING, SHELVING (Furniture), RANDOM forest algorithms, MACHINE learning
- Abstract
Efficient inventory management is essential for businesses to optimize their supply chain, reduce costs, and enhance operational efficiency. At the same time, because of manual scanning at checkout, customers have to wait in the queue for a long time. As a solution to these problems, we propose a system using IoT. This research paper presents a novel approach to inventory management by integrating two technologies: the Load Cell Sensor HX711 and a Smart Cart System using Radio Frequency Identification (RFID) technology. By combining the capabilities of these technologies, the paper aims to automate inventory tracking, monitoring, and replenishment, providing real-time data for accurate inventory control and decision-making. The Smart Cart System enhances customer satisfaction, while the Smart Shelves increase the efficiency of inventory management by automating the data capture process. The system provides convenience by eliminating the need for manual scanning or checkout processes and by generating the bill on a webpage, allowing customers to simply place items in the cart and proceed with their shopping. We also present the machine learning aspect of our project, aimed at revolutionizing the shopping experience using RFID technology and load cells. We explore the implementation of three prominent algorithms (Linear Regression, Random Forest, and XGBoost) to predict future sales. [ABSTRACT FROM AUTHOR]
- Published
- 2024
43. Contents list.
- Subjects
X-ray spectroscopy, RAMAN spectroscopy, CAREER development, LASER-induced breakdown spectroscopy, VACUUM circuit breakers, RANDOM forest algorithms, URANIUM, X-ray fluorescence, SAMARIUM
- Abstract
The document is a contents list for the Journal of Analytical Atomic Spectrometry. It provides information about the articles and papers included in the journal's latest issue. Topics covered include advances in environmental analysis, managing transition metal interferences, stability of iron single atoms in graphene structures, and various other topics related to atomic spectrometry. The journal is published by The Royal Society of Chemistry, a leading chemistry community. [Extracted from the article]
- Published
- 2024
44. Detecting Brute Force Attacks on SSH and FTP Protocol Using Machine Learning: A Survey.
- Author
-
Hamza, Amer Ali and surayh Al-Janabi, Rana Jumma
- Subjects
COMPUTER network traffic, NAIVE Bayes classification, RANDOM forest algorithms, CYBERTERRORISM, COMPUTER network protocols, MACHINE learning, COMPUTER networks
- Abstract
The significance of detecting network traffic anomalies in cybersecurity cannot be overstated, especially given the increasing frequency and complexity of computer network attacks. As new Internet-related technologies emerge, so do more intricate attacks. One particularly daunting challenge is represented by dictionary-based brute-force attacks, which require effective real-time detection and mitigation methods. In this paper, we investigate brute-force attack detection for two protocols: Secure Shell (SSH), also called Secure Socket Shell, a network protocol that gives users, particularly system administrators, a secure way to access a computer over an unsecured network; and File Transfer Protocol (FTP), a standard network protocol used to transfer files from one host to another over a TCP-based network such as the Internet. Our research focuses on using the machine learning approach to detect SSH and FTP brute-force attacks. A reasonably thorough investigation of machine learners' efficacy in identifying brute-force attacks on SSH and FTP is made possible by employing several classifiers. Brute-force attacks are a popular and risky method of obtaining usernames and passwords. Applying ethical hacking is an excellent technique to examine the effects of a brute-force attack. This article discusses many defense strategies and approaches to brute-force attacks. The pros and cons of several defense strategies are enumerated, along with information on which kind of attack is easiest to identify. We made use of three machine learning classifiers: Naive Bayes, Random Forest, and Logistic Regression. We determined that the Random Forest algorithm achieved the highest accuracy. The contribution lies in demonstrating the feasibility of training and evaluating basic Random Forest models with two independent variables to classify the CSE-CIC-IDS2018 dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
45. A novel intelligent method for inter-shaft bearing-fault diagnosis based on hierarchical permutation entropy and LLE-RF.
- Author
-
Tian, Jing, Zhang, Yuwei, Zhang, Fengling, Ai, Xinping, and Wang, Zhi
- Subjects
ENTROPY, RANDOM forest algorithms, FAULT diagnosis, FEATURE extraction
- Abstract
Since the transmission path of the inter-shaft bearing-fault signal is complex, a fault feature extraction method based on hierarchical permutation entropy (HPE) and the locally linear embedding (LLE) algorithm is proposed in this paper. In this method, HPE is utilized to extract fault information from signals, and LLE is utilized to reduce and fuse high-dimensional fault features from multiple sensors to construct fault samples. Then, a random forest (RF) model is established to diagnose the faults of the inter-shaft bearings. A fault simulation test rig with an inter-shaft bearing is built to simulate the normal bearing, inner ring fault, outer ring fault, and rolling ball fault, and the data are collected to verify the HPE-LLE-RF fault diagnosis algorithm for inter-shaft bearings established in this paper. The experimental results show that the proposed algorithm can extract the fault features of inter-shaft bearings effectively, with a fault diagnosis accuracy of 93.3% and no overfitting. [ABSTRACT FROM AUTHOR]
- Published
- 2023
46. Enhancing stress detection in wearable IoT devices using federated learning and LSTM based hybrid model.
- Author
-
Mouhni, Naoual, Amalou, Ibtissam, Chakri, Sana, Tourad, Mohamedou Cheikh, Chakraoui, Mohamed, and Abdali, Abdelmounaim
- Subjects
CONVOLUTIONAL neural networks, FEDERATED learning, BLENDED learning, RANDOM forest algorithms, DEEP learning
- Abstract
In the domain of smart health devices, the accurate detection of physical indicator levels plays a crucial role in enhancing safety and well-being. This paper introduces a cross-device federated learning framework using a hybrid deep learning model. Specifically, the paper presents a comprehensive comparison of different combinations of long short-term memory (LSTM), gated recurrent unit (GRU), convolutional neural network (CNN), random forest (RF), and extreme gradient boosting (XGBoost) in order to forecast stress levels from time series information derived from wearable smart gadgets. Out of the 23 users studied, the LSTM-RF model demonstrated the highest level of accuracy, achieving 93.53% for user 1, 99.40% for user 2, and 97.88% for user 3. Similarly, the LSTM-XGBoost model yielded favorable outcomes, with accuracy rates of 85.88%, 98.55%, and 92.02% for users 1, 2, and 3, respectively. These findings highlight the efficacy of federated learning and the utilization of hybrid models in stress detection. Unlike traditional centralized learning paradigms, the presented federated approach ensures privacy preservation and reduces data transmission requirements by processing data locally on edge devices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
47. Enhancing phishing URL detection through comprehensive feature selection: a comparative analysis across diverse datasets.
- Author
-
Preeti and Sharma, Priti
- Subjects
UNIFORM Resource Locators, FEATURE selection, RANDOM forest algorithms, DEEP learning, DECISION trees
- Abstract
Malicious attacks have become a prominent risk to the safety of online users, with attackers employing increasingly sophisticated systems to deceive unsuspecting victims. This research focuses on the critical aspect of feature selection in optimizing phishing uniform resource locator (URL) detection systems. Feature selection boosts machine learning (ML) and deep learning (DL) by efficiently picking vital attributes. This research paper provides a comprehensive examination of feature selection techniques using five diverse datasets. Various methods were employed, including random forest (RF) select from model, SelectKBest with the chi-square statistic, principal component analysis (PCA), and recursive feature elimination (RFE). The experiments, with a particular emphasis on PCA and the fourth dataset, revealed that all four models (RF, decision trees (DTs), XGBoost, and multilayer perceptron) achieved 100% accuracy in detecting phishing URL attacks. This underscores the efficacy of feature selection methods and contributes to a deeper understanding of feature selection's role in bolstering the effectiveness of phishing detection systems across diverse datasets, highlighting the importance of leveraging techniques such as PCA for optimal results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
48. Enhancing typhoon wave hindcasting with random forests and BP neural networks in the SWAN model.
- Author
-
Cheng Chen, Hongkun Lin, Dawei Guan, Feng Cai, Qiaoyi Wang, and Qingchun Liu
- Subjects
BACK propagation, RANDOM forest algorithms, TYPHOONS, DATABASES, STATISTICAL correlation
- Abstract
Forecasting waves during typhoons is crucial. In this paper, the numerical wave model SWAN was enhanced through integration with two machine learning methods: the Back Propagation Neural Network and Random Forest. This integration facilitated the development of two distinct models, namely SWAN-BP and SWAN-Tree. Through correlation analysis, key input features were identified for the machine learning models. The forecasts from the SWAN model were subsequently utilized as inputs to further enhance wave prediction. These hybrid models were validated using data from Typhoon Doksuri (2023) and Typhoon Nesat (2017). The results indicated significant improvements in predicting typhoon-induced wave heights with both the SWAN-BP and SWAN-Tree models compared to the original SWAN model. Specifically, the SWAN-BP model demonstrated a 33% improvement in accuracy for Typhoon Doksuri, whereas the SWAN-Tree model exhibited a 24% improvement. For Typhoon Nesat, the accuracy improvements were 23% for the SWAN-BP model and 21% for the SWAN-Tree model. These findings demonstrate that integrating wave numerical models with machine learning techniques can significantly enhance the predictive accuracy of numerical models. This approach offers a cost-effective means to improve the existing wave forecasting database. Traditionally, the direct use of meteorological and oceanographic data for typhoon wave prediction might be compromised by biases inherent in the numerical wave models. However, the SWAN-BP and SWAN-Tree models effectively reduce these biases, thereby providing more accurate and robust predictions. In conclusion, this paper enhances the predictive accuracy of the SWAN model and establishes a crucial foundation for more precise typhoon wave forecasting through the application of machine learning techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
49. Explainable Machine Learning for Fallout Prediction in the Mortgage Pipeline.
- Author
-
Purohit, Preetam and Verma, Amit
- Subjects
MACHINE learning, REGRESSION analysis, LOANS, RANDOM forest algorithms, FINANCIAL institutions
- Abstract
This study examines mortgage loan fallout using data provided by a leading financial institution. By accurately predicting mortgage loan fallout, lenders can protect their bottom line, maintain financial stability, and contribute to a healthier economy. The paper employs various machine learning models to predict mortgage fallout based on loan, market, property, and borrower characteristics. A large dataset of locked mortgage applications from a major U.S. lender was analyzed. The random forest model demonstrated superior predictive efficiency and stability. To understand the factors influencing mortgage fallout, the SHAP method, along with empirical analysis with logistic regression, was utilized to identify key determinants. The paper discusses the implications of these findings for mortgage lenders and future research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
50. Prediction of Biochar Adsorption of Uranium in Wastewater and Inversion of Key Influencing Parameters Based on Ensemble Learning.
- Author
-
Qu, Zening, Wang, Wei, and He, Yan
- Subjects
WASTEWATER treatment, FEATURE selection, ADSORPTION capacity, RANDOM forest algorithms, HEAVY metals, BIOCHAR
- Abstract
With the rapid development of industrialization, the problem of heavy metal wastewater treatment has become increasingly serious, posing a serious threat to the environment and human health. Biochar shows great potential for application in the field of wastewater treatment; however, biochars prepared from different biomass sources and experimental conditions have different physicochemical properties, resulting in differences in their adsorption capacity for uranium, which limits their wide application in wastewater treatment. Therefore, there is an urgent need to deeply explore and optimize the key parameter settings of biochar to significantly improve its adsorption capacity. This paper combines the nonlinear mapping capability of SCN and the ensemble learning advantage of the Adaboost algorithm based on existing experimental data on wastewater treatment. The accuracy of the model is evaluated by metrics such as coefficient of determination (R²) and error rate. It was found that the Adaboost–SCN model showed significant advantages in terms of prediction accuracy, precision, model stability and generalization ability compared to the SCN model alone. In order to further improve the performance of the model, this paper combined Adaboost–SCN with maximum information coefficient (MIC), random forest (RF) and energy valley optimizer (EVO) feature selection methods to construct three models, namely, MIC-Adaboost–SCN, RF-Adaboost–SCN and EVO-Adaboost–SCN. The results show that the prediction model with added feature selection is significantly better than the Adaboost–SCN model without feature selection in each evaluation index, and EVO has the most significant effect on feature selection. Finally, the correlation between biochar adsorption properties and production parameters was discussed through the inversion study of key parameters, and optimal parameter intervals were proposed to improve the adsorption properties. Providing strong support for the wide application of biochar in the field of wastewater treatment helps to solve the urgent environmental problem of heavy metal wastewater treatment. [ABSTRACT FROM AUTHOR]
- Published
- 2024