770 results on '"classification models"'
Search Results
2. Enhanced Semantic Natural Scenery Retrieval System Through Novel Dominant Colour and Multi‐Resolution Texture Feature Learning Model.
- Author
- Pavithra, L. K., Subbulakshmi, P., Paramanandham, Nirmala, Vimal, S., Alghamdi, Norah Saleh, and Dhiman, Gaurav
- Abstract
A conventional content‐based image retrieval (CBIR) system extracts image features from every pixel of the images, and its depiction of the features is entirely different from human perception. Additionally, it takes a significant amount of time for retrieval. An optimal combination of appropriate image features is necessary to bridge the semantic gap between user queries and retrieval responses. Furthermore, users should need only minimal interaction with the CBIR system to obtain accurate responses. Therefore, the proposed work focuses on extracting highly relevant feature information from a set of images in various natural image databases. Subsequently, a feature‐based learning/classification model is introduced before similarity measure calculations, aiming to minimise retrieval time and the number of comparisons. The proposed work analyses the learning models based on the retrieval system's performance separately for the following features: (i) dominant colour, (ii) multi‐resolution radial difference texture patterns, and (iii) a combination of both. The developed work is assessed against other techniques, and the results are reported. The results demonstrate that the implemented ensemble learning model‐based CBIR outperforms recent CBIR techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Regulatory‐based classification of rums: a chemometric and machine learning analysis.
- Author
- Rincón‐López, Juliana, Chica, Juanita Castro, Rojas, Victoria Eugenia Recalde, Martínez, Liliana Moncayo, Gartner, Ángela María Arango, Rosero‐Moreano, Milton, and Taborda‐Ocampo, Gonzalo
- Subjects
- *MACHINE learning, *LIQUORS, *CHEMOMETRICS, *ISOBUTANOL, *CLASSIFICATION
- Abstract
Summary: The Industria Licorera de Caldas (ILC) stands as a major liquor factory in Colombia, specialising in the production of various rum types including Tradicional, Juan de la Cruz, Carta de Oro, and Reserva Especial. These rums, as congeneric drinks, are known for their rich content of volatile compounds that define their sensory characteristics. To be commercialised, each rum batch must comply with Colombian standard NTC278 which defines rigorous assessment of congener content and various physicochemical parameters. Thus, the ILC has accumulated a vast amount of data over the years. This study conducts a comprehensive analysis of ILC rums, using chemometric techniques and machine‐learning classification models such as PCA, KNN, LDA, and RF. The aim was to distinguish between rum types based on parameters specified for standard compliance, streamlining the process without the need for additional or extensive new methodologies. As a result, through PCA data exploration, it was revealed that acetaldehyde, ethyl acetate, and isobutanol levels are instrumental in differentiating rum variants. Similarly, all classification models achieved accuracy levels exceeding 0.83 and precision surpassing 0.93. These findings pave the way for further research in the development of an ILC‐specific sensor for rapid and reliable liquor authenticity testing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Enhancing Sex Estimation Accuracy with Cranial Angle Measurements and Machine Learning.
- Author
- Toneva, Diana, Nikolova, Silviya, Agre, Gennady, Harizanov, Stanislav, Fileva, Nevena, Milenov, Georgi, and Zlatareva, Dora
- Subjects
- *MACHINE learning, *DIAGNOSTIC sex determination, *SEX discrimination, *COMPUTED tomography, *ANTHROPOMETRY, *ANGLES, *SUPPORT vector machines
- Abstract
Simple Summary: Sex estimation based on bones is a technique used to determine the biological sex of an individual from skeletal remains. It relies on the anatomical differences between male and female skeletons. Various bone characteristics have been incorporated into methods for sex estimation. Linear measurements are commonly used features in classification models for sex estimation. On the other hand, angle measurements are rarely included in such models, although they are important characteristics of the geometry of the bones and could provide essential information for the discrimination between the male and female bones. The goal of this research is to examine the potential of cranial angles for sex estimation and to identify the set of the most dimorphic angles by applying machine learning algorithms. The development of current sexing methods largely depends on the use of adequate sources of data and adjustable classification techniques. Most sex estimation methods have been based on linear measurements, while the angles have been largely ignored, potentially leading to the loss of valuable information for sex discrimination. This study aims to evaluate the usefulness of cranial angles for sex estimation and to differentiate the most dimorphic ones by training machine learning algorithms. Computed tomography images of 154 males and 180 females were used to derive data of 36 cranial angles. The classification models were created by support vector machines, naïve Bayes, logistic regression, and the rule-induction algorithm CN2. A series of cranial angle subsets was arranged by an attribute selection scheme. The algorithms achieved the highest accuracy on subsets of cranial angles, most of which correspond to well-known features for sex discrimination. Angles characterizing the lower forehead and upper midface were included in the best-performing models of all algorithms. 
The accuracy results showed the considerable classification potential of the cranial angles. The study demonstrates the value of the cranial angles as sex indicators and the possibility to enhance the sex estimation accuracy by using them. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Selection of the level of vibration signal decomposition and mother wavelets to determine the level of failure severity in spur gearboxes.
- Author
- Pérez‐Torres, Antonio, Sánchez, Rene‐Vinicio, and Barceló‐Cerdá, Susana
- Subjects
- *SENSOR placement, *WAVELET transforms, *SIGNAL processing, *POSITION sensors, *FEATURE extraction, *GEARBOXES
- Abstract
Spur gearboxes are an integral component in the operation of rotary machines. Hence, the early determination of the severity level of a failure is crucial. This manuscript delineates a methodology for selecting essential mother wavelets and filters from the wavelet transform (WT) to process the vibration signal within the time‐frequency domain, aiming to ascertain the severity level of failures in spur gearboxes. Initially, information is garnered from the gearbox through vibration signals in the time domain, utilising six accelerometers. Subsequently, the signal is partitioned into various levels, and information from each level is extracted using diverse mother wavelets and their respective filters. The signal is segmented into sub‐bands, from which the condition state is ascertained using an energy operator. After that, the appropriate level of wave decomposition is determined through ANOVA tests and post‐hoc Tukey analyses, evaluating performance in failure classification via the Random Forest (RF) model. Upon establishing the decomposition level, the analysis proceeds to identify which mother wavelets and filters are most suitable for determining the severity level of different types of failure in spur gearboxes. Moreover, this study investigates the impact of sensor positioning and inclination on acquiring the vibration signal. This aspect is explored through factorial ANOVA tests and multiple comparisons of the data derived from the sensors. The RF classification model achieved exceedingly favourable results (accuracy > 96% and AUC > 98%), with minimal practical influence from the positioning and inclination of a sensor, thereby affirming the proposed methodology's suitability for this type of analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. An approach to the detection of pain from autonomic and cortical correlates.
- Author
- Chouchou, F., Fauchon, C., Perchet, C., and Garcia-Larrea, L.
- Subjects
- *MACHINE learning, *PAIN perception, *COGNITIVE testing, *BLOOD pressure, *ELECTROPHYSIOLOGY
- Abstract
• Tonic painful stimulation entailed a decrease in EEG alpha power, together with cardiac, electrodermal and pupil activation.
• Taken in isolation, none of these activities was specific to pain.
• Powerful discrimination between painful and non-painful conditions was achieved only when EEG and autonomic changes were combined.
To assess the value of combining brain and autonomic measures to discriminate the subjective perception of pain from other sensory-cognitive activations, 20 healthy individuals received 2 types of tonic painful stimulation delivered to the hand: electrical stimuli and immersion in 10 °C water, which were contrasted with non-painful immersion in 15 °C water and stressful cognitive testing. High-density electroencephalography (EEG) and autonomic measures (pupillary, electrodermal and cardiovascular) were continuously recorded, and the accuracy of pain detection based on combinations of electrophysiological features was assessed using machine learning procedures. Painful stimuli induced a significant decrease in contralateral EEG alpha power. Cardiac, electrodermal and pupillary reactivities occurred in both painful and stressful conditions. Classification models, trained on leave-one-out cross-validation folds, showed low accuracy (61–73%) of cortical and autonomic features taken independently, while their combination significantly improved accuracy to 93% in individual reports. Changes in cortical oscillations reflecting somatosensory salience and autonomic changes reflecting arousal can be triggered by many activating signals other than pain; conversely, the simultaneous occurrence of somatosensory activation plus strong autonomic arousal is highly likely to reflect pain specifically. Combining changes in cortical and autonomic reactivities appears critical to derive accurate indexes of acute pain perception. [ABSTRACT FROM AUTHOR]
- Published
- 2024
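The leave-one-out procedure named in the abstract above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the abstract does not name the classifier, so a 1-nearest-neighbour rule is used as a stand-in, and the `toy` feature values, `nearest_label`, `loo_accuracy` and `keep_features` are all hypothetical. The toy data are constructed only to echo the qualitative pattern (single feature groups discriminate poorly, the combination discriminates well), not the reported accuracies.

```python
def nearest_label(train, x):
    """1-nearest-neighbour: return the label of the closest training sample."""
    closest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    return closest[1]


def loo_accuracy(samples):
    """Leave-one-out cross-validation: hold out each sample once, fit on the rest."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the held-out one
        hits += nearest_label(train, x) == y
    return hits / len(samples)


def keep_features(samples, idx):
    """Restrict every sample to the feature columns listed in idx."""
    return [(tuple(x[i] for i in idx), y) for x, y in samples]


# Made-up (cortical, autonomic) feature pairs; label 1 = painful, 0 = non-painful.
toy = [
    ((1.00, 1.00), 1), ((0.90, 1.10), 1), ((1.10, 0.90), 1),
    ((1.02, 0.00), 0), ((0.88, 0.10), 0), ((0.00, 1.00), 0), ((0.10, 0.90), 0),
]

print("cortical only :", loo_accuracy(keep_features(toy, [0])))
print("autonomic only:", loo_accuracy(keep_features(toy, [1])))
print("combined      :", loo_accuracy(toy))
```

With N samples the loop fits N models, each tested on exactly one held-out sample, which is why the scheme suits small cohorts such as the 20 participants mentioned above.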
7. Human Activity Recognition from Accelerometry, Based on a Radius of Curvature Feature.
- Author
- Cavita-Huerta, Elizabeth, Reyes-Reyes, Juan, Romero-Ugalde, Héctor M., Osorio-Gordillo, Gloria L., Escobar-Jiménez, Ricardo F., and Alvarado-Martínez, Victor M.
- Subjects
- FEEDFORWARD neural networks, MACHINE learning, ARTIFICIAL neural networks, WEARABLE technology, SPORTS sciences, HUMAN activity recognition
- Abstract
Physical activity recognition using accelerometry is a rapidly advancing field with significant implications for healthcare, sports science, and wearable technology. This research presents an interesting approach for classifying physical activities using solely accelerometry data, signals that were taken from the available "MHEALTH dataset" and processed through artificial neural networks (ANNs). The methodology involves data acquisition, preprocessing, feature extraction, and the application of deep learning algorithms to accurately identify activity patterns. A major innovation in this study is the incorporation of a new feature derived from the radius of curvature. This time-domain feature is computed by segmenting accelerometry signals into windows, conducting double integration to derive positional data, and subsequently estimating a circumference based on the positional data obtained within each window. This characteristic is computed across the three movement planes, providing a robust and comprehensive feature for activity classification. The integration of the radius of curvature into the ANN models significantly enhances their accuracy, achieving over 95%. In comparison with other methodologies, our proposed approach, which utilizes a feedforward neural network (FFNN), demonstrates superior performance. This outperforms previous methods such as logistic regression, which achieved 93%, KNN models with 90%, and the InceptTime model with 88%. The findings demonstrate the potential of this model to improve the precision and reliability of physical activity recognition in wearable health monitoring systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
8. Towards a Taxonomy of Textbooks as a Genre: the Case of Russian Textbooks
- Author
- Marina I. Solnyshkina, Gulnoza N. Shoeva, and Ksenia O. Kosova
- Subjects
- text profiling, multi-dimensional analysis, rulingva, parametrisation indices, classification models, ru lingva, Language. Linguistic theory. Comparative grammar, P101-410, Semantics, P325-325.5
- Abstract
The project presented in this paper was launched to design a functional recognition or classification model of the modern Russian school textbook as a genre. In this study we test and confirm the hypothesis that detection of the domain (subject area) and complexity level of a textbook can be reduced to a limited number of quantitative linguistic parameters provided with accurately identified and verified value ranges. We outlined our approach to genre analysis as multi-dimensional, compiled a corpus of over 1 mln. tokens, measured values of 15 linguistic parameters in 19 textbooks of two different subject areas and complexity levels, and revealed 7 complexity predictors, 7 subject area predictors, and one metaparameter, frequency, able to discriminate textbooks of History and Social Studies from texts of other genres. Our findings highlight the significance of the following parameters for textbooks across the selected subject areas: incidence of nouns, verb tenses (present, past and future), local and global argument overlap, and type-token ratio. The complexity classification model is ascertained to be a function of sentence length, word length, incidence of nouns in the genitive case and of verbs, Abstractness score, verb/noun ratio, and adjective/noun ratio. The outcomes of this analysis will be used to interpret quantitative linguistic descriptions and classify texts.
- Published
- 2024
9. Classification of RTV-coated porcelain insulator condition under different profiles and levels of pollution
- Author
- Ali Ahmed Salem, Samir Ahmed Al-Gailani, Abdulrahman Ahmed Ghaleb Amer, Mohammad Alsharef, Mohit Bajaj, Ievgen Zaitsev, Razli Ngah, and Sherif S. M. Ghoneim
- Subjects
- Classification models, Coating distribution, Pollution flashover, Porcelain insulators, Medicine, Science
- Abstract
Due to the limited hydrophobic properties of porcelain insulators, applying anti-pollution flashover coatings is crucial to enhance their functionality. This research outlines a classification system for assessing contamination levels on 22 kV porcelain insulators, both with and without coatings. It synthesizes six classification criteria derived through both numerical simulations and experimental studies to effectively gauge contamination severity. The study examined insulators treated with Room Temperature Vulcanizing (RTV) silicone under three different conditions: uncoated, partially coated, and fully coated. Additionally, the research assessed the effects of humidity on these polluted insulators to understand environmental impacts on their performance. The criteria, which are the flashover voltage (x1), fifth to third harmonics of leakage current (x2), maximum electric field (x3), total harmonic index (x4), insulation resistance (x5) and dielectric loss (x6), were proposed for evaluating the condition of the insulator string. The finite element method (FEM) was used to simulate the electric field. Then, based on the proposed criteria, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Multi-layer Perceptron (MLP) models were trained and their performances compared in classifying polluted insulator conditions with and without coating. The established criteria facilitate precise monitoring of the condition of high-voltage insulators, ensuring quick and effective responses that support the stability of the electrical power system.
- Published
- 2024
10. Exploring the complexity of obstructive sleep apnea: findings from machine learning on diagnosis and predictive capacity of individual factors.
- Author
- Russo, Simone, Martini, Agnese, Luzzi, Valeria, Garbarino, Sergio, Pietrafesa, Emma, and Polimeni, Antonella
- Abstract
Purpose: Obstructive sleep apnoea (OSA) is a prevalent sleep disorder characterized by pharyngeal airway collapse during sleep, leading to intermittent hypoxia, intrathoracic pressure swings, and sleep fragmentation. OSA is associated with various comorbidities and risk factors, contributing to its substantial economic and social burden. Machine learning (ML) techniques offer promise in predicting OSA severity and understanding its complex pathogenesis. This study aims to compare the accuracy of different ML techniques in predicting OSA severity and identify key associated factors contributing to OSA. Methods: Adult patients suspected of OSA underwent clinical assessments and polysomnography. Demographic, anthropometric and clinical data were collected. Five supervised ML models (logistic regression, decision tree, random forest, extreme gradient boosting, support vector machine) were employed, optimized through grid search and cross-validation. Results: ML models exhibited varied performance across OSA severity levels. SVM demonstrated the highest accuracy for mild OSA, XGBoost for moderate OSA, and random forest for severe OSA. Logistic regression showed the highest AUC for moderate and severe OSA. Anthropometric measures, gender, and hypertension were significant predictors of OSA severity. Conclusion: ML models offer valuable insights into predicting OSA severity and identifying associated factors. Our findings support the relevant potential clinical utility of ML in OSA management, although further validation and refinement are warranted. [ABSTRACT FROM AUTHOR]
- Published
- 2025
11. Predictive algorithms for supply chain management: a comprehensive approach to forecasting delivery times and managing risk
- Author
- Dariusz Woźniak, Michał Warszycki, Józef Stokłosa, Rafał Szrajnert, and Robert Chmura
- Subjects
- classification models, regression models, supply chain, forecasting delivery times, xgboost, random forest, lightgbm, Social Sciences
- Abstract
The article delves into the challenges of delivery time management in today's business landscape. The authors underscore the need for precise delivery time forecasts, a key factor in maintaining a competitive edge and meeting customer expectations. They outline various methods for estimating the time of a selected commodity based on historical data, and stress the necessity of modern tools that can adapt to the intricate web of factors influencing delivery times and facilitate swift responses to changes. The following article presents an innovative delivery time forecasting application that integrates advanced predictive algorithms with historical data, current data, and external factors affecting the delivery process. The application was developed to provide more accurate delivery time forecasts and optimize logistics processes. Through advanced technologies, it can consider even the most complex scenarios and changes, allowing companies to plan and manage their logistics operations more effectively.
- Published
- 2024
12. Flow Analysis of Mastectomy Patients Using Length of Stay: A Single-Center Study
- Author
- Teresa Angela Trunfio and Giovanni Improta
- Subjects
- mastectomy, length of stay, regression models, classification models, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Background: Malignant breast cancer is the most common cancer affecting women worldwide. The COVID-19 pandemic appears to have slowed the diagnostic process, leading to an enhanced use of invasive approaches such as mastectomy. The increased use of a surgical procedure pushes towards an objective analysis of patient flow with measurable quality indicators such as length of stay (LOS) in order to optimize it. Methods: In this work, different regression and classification models were implemented to analyze the total LOS as a function of a set of independent variables (age, gender, pre-op LOS, discharge ward, year of discharge, type of procedure, presence of hypertension, diabetes, cardiovascular disease, respiratory disease, secondary tumors, and surgery with complications) extracted from the discharge records of patients undergoing mastectomy at the ‘San Giovanni di Dio e Ruggi d’Aragona’ University Hospital of Salerno (Italy) in the years 2011–2021. In addition, the impact of COVID-19 was assessed by statistically comparing data from patients discharged in 2018–2019 with those discharged in 2020–2021. Results: The results obtained generally show the good performance of the regression models in characterizing the particular case studies. Among the models, the best at predicting the LOS from the set of variables described above was polynomial regression, with an R² value above 0.689. The classification algorithms that operated on a LOS divided into 3 arbitrary classes also proved to be good tools, reaching 79% accuracy with the voting classifier. Among the independent variables, both implemented models showed that the ward of discharge, year of discharge, type of procedure and complications during surgery had the greatest impact on LOS. The final focus to assess the impact of COVID-19 showed a statistically significant increase in surgical complications.
Conclusion: Through this study, it was possible to validate the use of regression and classification models to characterize the total LOS of mastectomy patients. LOS proves to be an excellent indicator of performance, and through its analysis with advanced methods, such as machine learning algorithms, it is possible to understand which of the demographic and organizational variables collected have a significant impact and thus build simple predictors to support healthcare management.
- Published
- 2024
13. On Quantification of the Nonlinearity of PPV in Model Evaluation with Imbalanced Datasets.
- Author
- Zhu, Qiuming
- Subjects
- *HESSIAN matrices, *SURFACE analysis, *QUANTITATIVE research, *CLASSIFICATION, *CURVATURE
- Abstract
This paper presents a quantitative analysis of the nonlinearities of the positive predictive value (PPV) and its effect in evaluating two-class pattern classification models with imbalanced datasets. The analysis is made through an expression of the PPV as a function of two other classification ratios that are invariant to the data imbalance, the true positive rate (TPR) and the false positive rate (FPR), and of σ, the imbalance ratio (IR) of the dataset, such that PPV = σ TPR/(σ TPR + FPR). The curvatures of PPV in the three-dimensional TPR–FPR–σ space are studied using the Hessian matrix, from which a saddle-shaped 3D surface in the space is revealed. This paper explores the nonlinear behaviors of PPV around the critical points, identified at FPR = σ TPR on the saddle surface, along with its scaling and sensitivity issues as performance measurements in model evaluation. The effect of the nonlinearities of PPV on the F1 and MCC metrics for imbalanced datasets is also studied. The results of this study warn that evaluations of classification models can be misleading without an awareness and understanding of the nonlinearities associated with the PPV and its related metrics on imbalanced datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
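The PPV expression quoted in the abstract above can be checked numerically with a short sketch. It assumes σ is the ratio of positive to negative samples, which is the reading under which PPV = TP/(TP + FP) reduces to the quoted formula; the function names `ppv` and `ppv_from_counts` and the example rates are illustrative, not taken from the paper.

```python
def ppv(tpr: float, fpr: float, sigma: float) -> float:
    """PPV = sigma*TPR / (sigma*TPR + FPR), assuming sigma = positives / negatives."""
    denom = sigma * tpr + fpr
    if denom == 0:
        raise ValueError("PPV is undefined when no sample is predicted positive")
    return sigma * tpr / denom


def ppv_from_counts(tp: int, fp: int) -> float:
    """Textbook definition, used here only as a cross-check."""
    return tp / (tp + fp)


# Cross-check on made-up counts: 100 positives, 1000 negatives,
# TPR = 0.9 (90 true positives), FPR = 0.1 (100 false positives).
print(ppv(0.9, 0.1, 100 / 1000), ppv_from_counts(90, 100))

# The same classifier (fixed TPR and FPR) on increasingly imbalanced data.
for sigma in (1.0, 0.1, 0.01):
    print(f"sigma={sigma:<5} PPV={ppv(0.9, 0.1, sigma):.3f}")
```

Holding TPR and FPR fixed while shrinking σ shows how strongly PPV depends on the class balance alone, which is the sensitivity the abstract cautions about.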
14. Flow Analysis of Mastectomy Patients Using Length of Stay: A Single-Center Study.
- Author
- Trunfio, Teresa Angela and Improta, Giovanni
- Subjects
- *REGRESSION analysis, *MACHINE learning, *SURGICAL complications, *LENGTH of stay in hospitals, *RESPIRATORY diseases
- Abstract
Background: Malignant breast cancer is the most common cancer affecting women worldwide. The COVID-19 pandemic appears to have slowed the diagnostic process, leading to an enhanced use of invasive approaches such as mastectomy. The increased use of a surgical procedure pushes towards an objective analysis of patient flow with measurable quality indicators such as length of stay (LOS) in order to optimize it. Methods: In this work, different regression and classification models were implemented to analyze the total LOS as a function of a set of independent variables (age, gender, pre-op LOS, discharge ward, year of discharge, type of procedure, presence of hypertension, diabetes, cardiovascular disease, respiratory disease, secondary tumors, and surgery with complications) extracted from the discharge records of patients undergoing mastectomy at the 'San Giovanni di Dio e Ruggi d'Aragona' University Hospital of Salerno (Italy) in the years 2011–2021. In addition, the impact of COVID-19 was assessed by statistically comparing data from patients discharged in 2018–2019 with those discharged in 2020–2021. Results: The results obtained generally show the good performance of the regression models in characterizing the particular case studies. Among the models, the best at predicting the LOS from the set of variables described above was polynomial regression, with an R² value above 0.689. The classification algorithms that operated on a LOS divided into 3 arbitrary classes also proved to be good tools, reaching 79% accuracy with the voting classifier. Among the independent variables, both implemented models showed that the ward of discharge, year of discharge, type of procedure and complications during surgery had the greatest impact on LOS. The final focus to assess the impact of COVID-19 showed a statistically significant increase in surgical complications.
Conclusion: Through this study, it was possible to validate the use of regression and classification models to characterize the total LOS of mastectomy patients. LOS proves to be an excellent indicator of performance, and through its analysis with advanced methods, such as machine learning algorithms, it is possible to understand which of the demographic and organizational variables collected have a significant impact and thus build simple predictors to support healthcare management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. Optimizing machine learning models for classification of stroke patients with epileptiform EEG pattern: the impact of dataset balancing techniques.
- Author
- Iscra, Katerina, Biscontin, Alessandro, Miladinovic, Aleksandar, Bonini, Andrea, Furlanis, Giovanni, Prandin, Gabriele, Malesani, Michele, Naccarato, Marcello, Manganotti, Paolo, Accardo, Agostino, and Ajčević, Miloš
- Abstract
Epileptiform electroencephalogram (EEG) patterns are commonly observed in stroke patients and can significantly impact clinical management and patient outcomes. Therefore, classifying stroke patients to identify those with a high probability of epileptiform EEG patterns may improve stroke management. In recent years, there has been a notable increase in interest in and utilization of machine learning, especially in the domain of classification tasks. Nevertheless, the presence of imbalanced datasets presents hurdles for machine learning algorithms, resulting in predictions skewed toward dominant classes and diminished accuracy, especially for underrepresented ones. Hence, the study aims to evaluate the effects of dataset balancing methods on the classification efficacy of machine learning models for classification of stroke patients with epileptiform EEG patterns by conducting a comparative analysis between models trained on imbalanced and balanced datasets. Four different sampling techniques were employed: an oversampling technique, SMOTENC; an undersampling technique, NearMiss; and two techniques that combine over- and undersampling methods, SMOTETomek and SMOTEENN. Feature selection was performed using the ReliefF scoring method, and only features with a contribution value greater than 0.01 were used for model construction. Five different machine learning models were considered in the study: classification tree, logistic regression, naïve Bayes, artificial neural network and support vector machine. The models were trained on the original and resampled training sets, and their performances were subsequently evaluated on the test set. The results showed that SMOTENC was the most effective among the considered dataset balancing techniques, showing superior classification performance compared to other methods and the original dataset. 
Models utilizing SMOTENC exhibited significant improvements in AUC (0.76 vs 0.67) and specificity (0.73 vs 0.43) while maintaining comparable accuracy (0.72 vs 0.74) relative to those trained on the original dataset. Furthermore, different sampling techniques were found to result in different selections of the most predictive features. In conclusion, our study highlights the crucial role of dataset balancing methods in improving the classification performance of predictive models on highly unbalanced datasets, such as in the stratification of stroke patients with epileptiform EEG patterns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
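The oversampling idea behind the SMOTE family mentioned in the abstract above can be sketched in plain Python. This is a simplified stand-in, not the study's method: it interpolates numeric features only, whereas SMOTENC also handles categorical features, and the function name `smote_like`, its parameters and the sample points are all hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Other minority points, closest first (squared Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != base_idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Made-up minority-class points with two numeric features each.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
for point in smote_like(minority_points, n_new=5):
    print(point)
```

As the abstract describes, resampling of this kind is applied to the training set only; the test set keeps its original class balance so that the evaluation reflects real conditions.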
16. Auto-BCS: A Hybrid System for Real-Time Breast Cancer Screening from Pathological Images.
- Author
- Ekta and Bhatia, Vandana
- Subjects
- BREAST tumor diagnosis, COMPUTER-assisted image analysis (Medicine), BREAST tumors, EARLY detection of cancer, DEEP learning, COMPUTER-aided diagnosis, DIGITAL image processing, MACHINE learning
- Abstract
Breast cancer is recognized as a prominent cause of cancer-related mortality among women globally, emphasizing the critical need for early diagnosis, which improves survival rates. Current breast cancer diagnostic procedures depend on manual assessments of pathological images by medical professionals. However, in remote or underserved regions, the scarcity of expert healthcare resources often compromises diagnostic accuracy. Machine learning holds great promise for early detection, yet existing breast cancer screening algorithms are frequently characterized by significant computational demands, rendering them unsuitable for deployment on low-processing-power mobile devices. In this paper, a real-time automated system, "Auto-BCS", is introduced that significantly enhances the efficiency of early breast cancer screening. The system is structured into three distinct phases. In the initial phase, images undergo a pre-processing stage aimed at noise reduction. Subsequently, feature extraction is carried out using a lightweight and optimized deep learning model, followed by an extreme gradient boosting classifier, strategically employed to optimize the overall performance and prevent overfitting in the deep learning model. The system's performance is gauged through essential metrics, including accuracy, precision, recall, F1 score, and inference time. Comparative evaluations against state-of-the-art algorithms affirm that Auto-BCS outperforms existing models, excelling in both efficiency and processing speed. Computational efficiency is prioritized by Auto-BCS, making it particularly adaptable to low-processing-power mobile devices. Comparative assessments confirm the superior performance of Auto-BCS, signifying its potential to advance breast cancer screening technology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Machine learning-based classification models for non-covalent Bruton's tyrosine kinase inhibitors: predictive ability and interpretability.
- Author
-
Li, Guo, Li, Jiaxuan, Tian, Yujia, Zhao, Yunyang, Pang, Xiaoyang, and Yan, Aixia
- Abstract
In this study, we built classification models using machine learning techniques to predict the bioactivity of non-covalent inhibitors of Bruton's tyrosine kinase (BTK) and to provide interpretable and transparent explanations for these predictions. To achieve this, we gathered data on BTK inhibitors from the Reaxys and ChEMBL databases, removing compounds with covalent bonds and duplicates to obtain a dataset of 3895 non-covalent inhibitors. These inhibitors were characterized using MACCS fingerprints and Morgan fingerprints, and four traditional machine learning algorithms (decision trees (DT), random forests (RF), support vector machines (SVM), and extreme gradient boosting (XGBoost)) were used to build 16 classification models. In addition, four deep learning models were developed using deep neural networks (DNN). The best model, Model D_4, which was built using XGBoost and MACCS fingerprints, achieved an accuracy of 94.1% and a Matthews correlation coefficient (MCC) of 0.75 on the test set. To provide interpretable explanations, we employed the SHAP method to decompose the predicted values into the contributions of each feature. We also used K-means dimensionality reduction and hierarchical clustering to visualize the clustering effects of molecular structures of the inhibitors. The results of this study were validated using crystal structures, and we found that the interaction between the BTK amino acid residue and the important features of clustered scaffold was consistent with the known properties of the complex crystal structures. Overall, our models demonstrated high predictive ability and a qualitative model can be converted to a quantitative model to some extent by SHAP, making them valuable for guiding the design of new BTK inhibitors with desired activity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Classification of FLT3 inhibitors and SAR analysis by machine learning methods.
- Author
-
Zhao, Yunyang, Tian, Yujia, Pang, Xiaoyang, Li, Guo, Shi, Shenghui, and Yan, Aixia
- Abstract
FMS-like tyrosine kinase 3 (FLT3) is a type III receptor tyrosine kinase, which is an important target for anti-cancer therapy. In this work, we conducted a structure–activity relationship (SAR) study on 3867 FLT3 inhibitors we collected. MACCS fingerprints, ECFP4 fingerprints, and TT fingerprints were used to represent the inhibitors in the dataset. A total of 36 classification models were built based on support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and deep neural networks (DNN) algorithms. Model 3D_3 built by deep neural networks (DNN) and TT fingerprints performed best on the test set with the highest prediction accuracy of 85.83% and Matthews correlation coefficient (MCC) of 0.72 and also performed well on the external test set. In addition, we clustered 3867 inhibitors into 11 subsets by the K-Means algorithm to figure out the structural characteristics of the reported FLT3 inhibitors. Finally, we analyzed the SAR of FLT3 inhibitors by RF algorithm based on ECFP4 fingerprints. The results showed that 2-aminopyrimidine, 1-ethylpiperidine, 2,4-bis(methylamino)pyrimidine, amino-aromatic heterocycle, [(2E)-but-2-enyl]dimethylamine, but-2-enyl, and alkynyl were typical fragments among highly active inhibitors. Besides, three scaffolds in Subset_A (Subset 4), Subset_B, and Subset_C showed a significant relationship to inhibition activity targeting FLT3. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Near infrared spectroscopy for cooking time classification of cassava genotypes.
- Author
-
Bandeira e Sousa, Massaine, Garcia Morales, Cinara Fernanda, Mbanjo, Edwige Gaby Nkouaya, Egesi, Chiedozie, and Jorge de Oliveira, Eder
- Subjects
NEAR infrared spectroscopy ,K-nearest neighbor classification ,ACQUISITION of data ,GENOTYPES ,DATA analysis ,CASSAVA - Abstract
Cooking time is a crucial determinant of culinary quality of cassava roots and incorporating it into the early stages of breeding selection is vital for breeders. This study aimed to assess the potential of near-infrared spectroscopy (NIRS) in classifying cassava genotypes based on their cooking times. Five cooking times (15, 20, 25, 30, and 40 minutes) were assessed and 888 genotypes evaluated over three crop seasons (2019/2020, 2020/2021, and 2021/2022). Fifteen roots from five plants per plot, featuring diameters ranging from 4 to 7 cm, were randomly chosen for cooking analysis and spectral data collection. Two root samples (15 slices each) per genotype were collected, with the first set aside for spectral data collection, processed, and placed in two petri dishes, while the second set was utilized for cooking assessment. Cooking data were classified into binary and multiclass variables (CT4C and CT6C). Two NIRs devices, the portable QualitySpec® Trek (QST) and the benchtop NIRFlex N-500 were used to collect spectral data. Classification of genotypes was carried out using the K-nearest neighbor algorithm (KNN) and partial least squares (PLS) models. The spectral data were split into a training set (80%) and an external validation set (20%). For binary variables, the classification accuracy for cassava cooking time was notably high (R²Cal ranging from 0.72 to 0.99). Regarding multiclass variables, accuracy remained consistent across classes, models, and NIR instruments (~0.63). However, the KNN model demonstrated slightly superior accuracy in classifying all cooking time classes, except for the CT4C variable (QST) in the NoCook and 25 min classes. Despite the increased complexity associated with binary classification, it remained more efficient, offering higher classification accuracy for samples and facilitating the selection of the most relevant time or variables, such as cooking time ≤ 30 minutes. The accuracy of the optimal scenario for classifying samples with a cooking time of 30 minutes reached R²Cal = 0.86 and R²Val = 0.84, with a Kappa value of 0.53. Overall, the models exhibited a robust fit for all cooking times, showcasing the significant potential of NIRs as a high-throughput phenotyping tool for classifying cassava genotypes based on cooking time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
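Entry 19 reports a Kappa value alongside accuracy. As a rough illustration of what that statistic measures, here is a minimal sketch of Cohen's kappa (agreement corrected for chance) in plain Python. The function name, the class labels, and the toy data are assumptions made for this example and do not come from the study.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Fraction of samples on which the two labelings agree.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected if labels were assigned independently
    # with the same marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when only one class occurs; this sketch returns 0.0.
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: 8 samples, two cooking-time classes.
y_true = ["fast"] * 4 + ["slow"] * 4
y_pred = ["fast", "fast", "fast", "slow", "slow", "slow", "slow", "fast"]
print(cohen_kappa(y_true, y_pred))
```

Kappa can sit well below raw accuracy when chance agreement is high, which is one reason the two figures are often reported together.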
20. An analysis on classification models for customer churn prediction
- Author
-
Kathi Chandra Mouli, Ch. V. Raghavendran, V. Y. Bharadwaj, G. Y. Vybhavi, C. Sravani, Khristina Maksudovna Vafaeva, Rajesh Deorari, and Laith Hussein
- Subjects
Customer churn ,classification models ,class imbalance ,accuracy metrics ,cross validation ,hyper parameters ,Engineering (General). Civil engineering (General) ,TA1-2040 - Abstract
The rapid expansion of technical infrastructure has brought about transformative changes in business operations. A notable consequence of this digital evolution is the proliferation of subscription-based services. With an increasing array of options for goods and services, customer churn has emerged as a significant challenge, posing a threat to businesses across sectors. The direct impact on earnings has prompted businesses to proactively develop tools for predicting potential client turnover. Identifying the underlying factors contributing to churn is crucial for implementing effective retention strategies. Our research makes a pivotal contribution by presenting a churn prediction model designed to assist businesses in identifying clients at risk of churn. The proposed model leverages machine learning classification techniques, with the customer data undergoing thorough pre-processing phases prior to model application. We systematically evaluated ten classification techniques, including Logistic Regression, Support Vector Classifier, Kernel SVM, KNN, Gaussian Naïve Bayes, Decision Tree Classifier, Random Forest, ADA Boost, XGBoost, and Gradient Boost. The assessment encompassed various evaluation metrics, such as ROC AUC Mean, ROC AUC STD, Accuracy Mean, Accuracy STD, Accuracy, Precision, Recall, F1 Score, and F2 Score. Employing 10-fold cross-validation and hyper parameter tuning through GridSearchCV and RandomizedSearchCV, we identified Random Forest as the most effective classifier, achieving an 85% Area Under the Curve (AUC) for optimal results.
- Published
- 2024
- Full Text
- View/download PDF
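Entry 20 mentions 10-fold cross-validation on a class-imbalanced churn dataset. The sketch below shows one common way such folds can be built so that each fold keeps roughly the same class ratio (stratification). Whether the authors stratified their folds is not stated in the abstract; the function name `stratified_kfold`, the labels, and the 80/20 class ratio are all assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold keeps roughly the class ratio."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    # Shuffle each class separately, then deal its indices round-robin
    # across the folds so every fold receives a share of every class.
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical imbalanced data: 80 retained customers, 20 churned.
labels = ["stay"] * 80 + ["churn"] * 20
for train, test in stratified_kfold(labels, k=10, seed=1):
    churned = sum(labels[i] == "churn" for i in test)
    print(len(train), len(test), churned)
```

Each candidate hyper-parameter setting would then be scored on every fold and the scores summarized by mean and standard deviation, which matches the kind of "ROC AUC Mean" and "ROC AUC STD" figures the abstract lists.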
21. Conspiracy Detection Beyond Text: Exploring the Feasibility of Adding Psycho-Linguistic Features to Enhance Conspiracy Detection Models
- Author
-
George, Anna R., Ahrens, Maximilian, Pierrehumbert, Janet B., McMahon, Michael, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Preuss, Mike, editor, Leszkiewicz, Agata, editor, Boucher, Jean-Christopher, editor, Fridman, Ofer, editor, and Stampe, Lucas, editor
- Published
- 2024
- Full Text
- View/download PDF
22. Machine Learning for ESG Rating Classification: An Integrated Replicable Model with Financial and Systemic Risk Parameters
- Author
-
Castellano, Rosella, Cini, Federico, Ferrari, Annalisa, Corazza, Marco, editor, Gannon, Frédéric, editor, Legros, Florence, editor, Pizzi, Claudio, editor, and Touzé, Vincent, editor
- Published
- 2024
- Full Text
- View/download PDF
23. New Approach to Facial Expression Recognition and Classification Using Typical Testors
- Author
-
Alvarado-Moreira, Roberto, Ibarra-Fiallo, Julio, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2024
- Full Text
- View/download PDF
24. Synergistic Performance Forecasting: Harnessing Gradient Boost and Linear Discriminant Analysis for Student Achievement Prediction
- Author
-
Tamilkodi, R., Madhavi, K. Valli, Christi, J. Annie, Kumar, S. Revanth, Pavan, M. Naga, Manohar, A. Phani, Fournier-Viger, Philippe, Series Editor, Madhavi, K. Reddy, editor, Subba Rao, P., editor, Avanija, J., editor, Manikyamba, I. Lakshmi, editor, and Unhelkar, Bhuvan, editor
- Published
- 2024
- Full Text
- View/download PDF
25. AI-Infused Finance: Predicting Stock Prices Through News and Market Data Analysis
- Author
-
Sangala, Veena Madhuri, Alamanda, Sirisha, Tirumalareddy, Prathima, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Abraham, Ajith, editor, Bajaj, Anu, editor, Hanne, Thomas, editor, and Siarry, Patrick, editor
- Published
- 2024
- Full Text
- View/download PDF
26. Performance Analysis of Classifying URL Phishing Using Recursive Feature Elimination
- Author
-
Albaser, Marwa, Ali, Salwa, Chantar, Hamouda, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Benmusa, Tammam A. T., editor, Elbuni, Mohamed Samir, editor, Saleh, Ibrahim M., editor, Ashur, Ahmed S., editor, Drawil, Nabil M., editor, and Ellabib, Issmail M., editor
- Published
- 2024
- Full Text
- View/download PDF
27. Boosting Customer Retention in Pharmaceutical Retail: A Predictive Approach Based on Machine Learning Models
- Author
-
Espinoza-Vega, Angel, Roa, Henry N., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2024
- Full Text
- View/download PDF
28. Detecting Abnormal Authentication Delays In Identity And Access Management Using Machine Learning
- Author
-
Xiang, Jiahui, Salem, Osman, Mehaoua, Ahmed, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, van Leeuwen, Jan, Series Editor, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Kobsa, Alfred, Series Editor, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Nierstrasz, Oscar, Series Editor, Pandu Rangan, C., Editorial Board Member, Sudan, Madhu, Series Editor, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Vardi, Moshe Y, Series Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Renault, Éric, editor, Boumerdassi, Selma, editor, and Mühlethaler, Paul, editor
- Published
- 2024
- Full Text
- View/download PDF
29. Mental Health Predictive Analysis Using Machine-Learning Techniques
- Author
-
Jain, Vanshika, Kumari, Ritika, Bansal, Poonam, Dev, Amita, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Senjyu, Tomonobu, editor, So–In, Chakchai, editor, and Joshi, Amit, editor
- Published
- 2024
- Full Text
- View/download PDF
30. Lead Conversion and Scoring with Machine Learning
- Author
-
Singh, Sunil Kumar, Abhi, Shinu, Agarwal, Rashmi, Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, and Uddin, Mohammad Shorif, editor
- Published
- 2024
- Full Text
- View/download PDF
31. Predictive Analytics for Non-performing Loans and Bank Vulnerability During Crises in Philippines
- Author
-
De Guzman, Jessie James C., Lazo, Macrina P., Balan, Ariel Kelly D., De Goma, Joel C., Intal, Grace Lorraine D., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2024
- Full Text
- View/download PDF
32. Schools Students Performance with Artificial Intelligence Machine Learning: Features Taxonomy, Methods and Evaluation
- Author
-
Hennebelle, Alain, Ismail, Leila, Linden, Tanya, and Khine, Myint Swe, editor
- Published
- 2024
- Full Text
- View/download PDF
33. Hybrid Principal Component Analysis Using Boosting Classification Techniques: Categorical Boosting
- Author
-
Lalwani, Pooja, Ramasamy, Ganeshan, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Nanda, Satyasai Jagannath, editor, Yadav, Rajendra Prasad, editor, Gandomi, Amir H., editor, and Saraswat, Mukesh, editor
- Published
- 2024
- Full Text
- View/download PDF
34. Synthetic Hyperspectral Data for Avocado Maturity Classification
- Author
-
Sanchez, Froylan Jimenez, Tabares, Marta Silvia, Aguilar, Jose, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Tabares, Marta, editor, Vallejo, Paola, editor, Suarez, Biviana, editor, Suarez, Marco, editor, Ruiz, Oscar, editor, and Aguilar, Jose, editor
- Published
- 2024
- Full Text
- View/download PDF
35. Machine learning-based model for customer emotion detection in hotel booking services
- Author
-
Nguyen, Nghia, Nguyen, Thuy-Hien, Nguyen, Yen-Nhi, Doan, Dung, Nguyen, Minh, and Nguyen, Van-Ho
- Published
- 2024
- Full Text
- View/download PDF
36. Transfer learning for probabilistic localization of hidden cracks in concrete structures
- Author
-
Miele, S., Karve, P., and Mahadevan, S.
- Published
- 2024
- Full Text
- View/download PDF
37. Toward Better Education Quality through Students' Sentiment Analysis Using AutoML.
- Author
-
SIMIONESCU, Corina, MARCU, Daniela, and MĂCIUCĂ, Marius Silviu
- Subjects
- *
SENTIMENT analysis , *ACHIEVEMENT gains (Education) , *RECOMMENDER systems , *HIGH school students , *MACHINE learning - Abstract
Sentiment analysis from students' interactions with learning environments is a topic of interest for researchers in the field of education because it can make important contributions to improving the quality of instructional processes through recommendation systems integrated into learning applications, or by improving the quality of courses, by grouping students according to their common interests and providing feedback on school progress. There are two approaches to sentiment analysis: one lexicon-based and another that uses machine learning. In this study, we present a sentiment analysis from two data sets of our own that represent students' opinions about school. Our goal is to create a model that helps us to automatically label students' opinions, assigning sentiment scores between 0 and 4 (0 for an extremely negative opinion). To train and evaluate the performance of the model, we used opinions collected from 1443 Romanian high school students. The novelty that we propose is the manual labeling system. Our current research, which uses a machine learning approach to classify students' opinions, obtains an accuracy of 86.507%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. A Study on Comparative Analysis of Feature Selection Algorithms for Students Grades Prediction.
- Author
-
Tariq, Muhammad Arham
- Subjects
- *
FEATURE selection , *RANDOM forest algorithms , *SUPPORT vector machines , *K-nearest neighbor classification , *DECISION trees , *DATA mining - Abstract
Education data mining (EDM) applies data mining techniques to extract insights from educational data, enabling educators to evaluate their teaching methods and improve student outcomes. Feature selection algorithms play a crucial role in improving classifier accuracy by reducing redundant features. However, a detailed and diverse comparative analysis of feature selection algorithms on multiclass educational datasets is missing. This paper presents a study that compares ten different feature selection algorithms for predicting student grades. The goal is to identify the most effective feature selection technique for multi-class student grades prediction. Five classifiers, including Support Vector Machines (SVM), Decision Trees (DT), Random Forests (RF), Gradient Boosting (GB), and k-Nearest Neighbors (KNN), are trained and tested on ten different feature selection algorithms. The results show that SelectFwe (SFWEF) performed best, achieving an accuracy of 74.3% with Random Forests (RF) across all ten feature selection algorithms. This algorithm selects features based on their relationship with the target variable while controlling the family-wise error rate. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
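Entry 38 describes a selector that keeps features related to the target while controlling the family-wise error rate. One standard way to do that is a Bonferroni-style cut: keep a feature only if its p-value is below alpha divided by the number of features tested. The sketch below assumes that rule and assumes per-feature p-values have already been computed by some univariate test; the function name and the p-values are made up for illustration and are not taken from the paper.

```python
def select_fwe(p_values, alpha=0.05):
    """Return indices of features whose p-value passes a Bonferroni-style cut.

    Testing n features at level alpha / n each keeps the probability of
    selecting at least one irrelevant feature at or below alpha.
    """
    n = len(p_values)
    if n == 0:
        return []
    threshold = alpha / n
    return [i for i, p in enumerate(p_values) if p < threshold]


# Hypothetical p-values from a univariate test of five features against the grade.
p_values = [0.001, 0.04, 0.0004, 0.2, 0.012]
print(select_fwe(p_values, alpha=0.05))
```

The cut becomes stricter as more features are tested, so a feature with p = 0.012 passes an uncorrected 0.05 level but not the corrected level for five features.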
39. Exploration of Convective and Infrared Drying Effect on Image Texture Parameters of 'Mejhoul' and 'Boufeggous' Date Palm Fruit Using Machine Learning Models.
- Author
-
Noutfia, Younes and Ropelewska, Ewa
- Subjects
DATE palm ,MACHINE learning ,DATES (Fruit) ,CULTIVARS ,RANDOM forest algorithms ,FRUIT extracts - Abstract
Date palm (Phoenix dactylifera L.) fruit samples belonging to the 'Mejhoul' and 'Boufeggous' cultivars were harvested at the Tamar stage and used in our experiments. Before scanning, date samples were dried using convective drying at 60 °C and infrared drying at 60 °C with a frequency of 50 Hz, and then they were scanned. The scanning trials were performed for two hundred date palm fruit in fresh, convective-dried, and infrared-dried forms of each cultivar using a flatbed scanner. The image-texture parameters of date fruit were extracted from images converted to individual color channels in RGB, Lab, XYZ, and UVS color models. The models to classify fresh and dried samples were developed based on selected image textures using machine learning algorithms belonging to the groups of Bayes, Trees, Lazy, Functions, and Meta. For both the 'Mejhoul' and 'Boufeggous' cultivars, models built using Random Forest from the group of Trees turned out to be accurate and successful. The average classification accuracy for fresh, convective-dried, and infrared-dried 'Mejhoul' reached 99.33%, whereas fresh, convective-dried, and infrared-dried samples of 'Boufeggous' were distinguished with an average accuracy of 94.33%. In the case of both cultivars and each model, the higher correctness of discrimination was between fresh and infrared-dried samples, whereas the highest number of misclassified cases occurred between fresh and convective-dried fruit. Thus, the developed procedure may be considered an innovative approach to the non-destructive assessment of drying impact on the external quality characteristics of date palm fruit. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Classification Model Using Transfer Learning for the Detection of Pneumonia in Chest X-Ray Images.
- Author
-
Maquen-Niño, Gisella Luisa Elena, Nuñez-Fernandez, Jhojan Genaro, Taquila-Calderon, Fany Yesica, Adrianzén-Olano, Ivan, De-La-Cruz-VdV, Percy, and Carrión-Barco, Gilberto
- Subjects
X-rays ,X-ray imaging ,IMAGE recognition (Computer vision) ,CONVOLUTIONAL neural networks ,PNEUMONIA ,RESPIRATORY organs - Abstract
In the current global context, there has been a significant increase in respiratory system diseases, particularly pneumonia. This disease has a higher incidence of mortality in children under five years old and adults over 60 years old because it leads to complications if not treated in time. This research leverages convolutional neural networks (CNNs) to classify images, specifically to detect the presence of pneumonia. The data processing methodology utilized in this study is CRISP-DM. The dataset consists of 5,856 images of anteroposterior chest X-rays downloaded from the open repository "Kaggle," divided into 5,216 images for training, 16 for validation, and 624 for testing. Preprocessing involved image augmentation through modifications to the original images, scaling, and batch division in tensor format. A comparative analysis was conducted among the transfer models: DenseNet, VGG19, and ResNet50 version 2. Each transfer model was the header of a CNN with four subsequent layers. The models underwent training, validation, and testing phases. The test's results showed that DenseNet achieved an accuracy of 0.87, VGG19 achieved 0.86, and ResNet50 achieved 0.91. These results affirm the effectiveness of ResNet50 in image classification, considering that the model's output is binary, where 0 represents that the patient does not have pneumonia and 1 indicates that the patient has pneumonia. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Metabolomics on Apple (Malus domestica) Cuticle—Search for Authenticity Markers.
- Author
-
Bechynska, Kamila, Sedlak, Jiri, Uttl, Leos, Kosek, Vit, Vackova, Petra, Kocourek, Vladimir, and Hajslova, Jana
- Subjects
METABOLOMIC fingerprinting ,TANDEM mass spectrometry ,METABOLOMICS ,CUTICLE ,METABOLITES ,APPLES ,ORCHARDS - Abstract
The profile of secondary metabolites present in the apple cuticular layer is not only characteristic of a particular apple cultivar; it also dynamically reflects various external factors in the growing environment. In this study, the possibility of authenticating apple samples by analyzing their cuticular layer extracts was investigated. Ultra-high-performance liquid chromatography coupled with high-resolution tandem mass spectrometry (UHPLC-HRMS/MS) was employed for obtaining metabolomic fingerprints. A total of 274 authentic apple samples from four cultivars harvested in the Czech Republic and Poland between 2020 and 2022 were analyzed. The complex data generated, processed using univariate and multivariate statistical methods, enabled the building of classification models to distinguish apple cultivars as well as their geographical origin. The models showed very good performance in discriminating Czech and Polish samples for three out of four cultivars: "Gala", "Golden Delicious" and "Idared". Moreover, the validity of the models was tested over several harvest seasons. In addition to metabolites of the triterpene biosynthetic pathway, the diagnostic markers were mainly wax esters. "Jonagold", which is known to be susceptible to mutations, was the only cultivar for which an unambiguous classification of geographical origin was not possible. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Machine learning for persistent free radicals in biochar: dual prediction of contents and types using regression and classification models.
- Author
-
Latif, Junaid, Chen, Na, Saleem, Azka, Li, Kai, Qin, Jianjun, Yang, Huiqiang, and Jia, Hanzhong
- Subjects
BIOCHAR ,MACHINE learning ,FREE radicals ,SUPERVISED learning ,REGRESSION analysis ,GRAPHICAL user interfaces - Abstract
Persistent free radicals (PFRs) are emerging substances with diverse impacts in biochar applications, necessitating accurate prediction of their content and types prior to their optimal use and minimal adverse effects. This prediction task is challenging due to the nonlinearity and intricate variable relationships of biochar. Herein, we employed data-driven techniques to compile a dataset from peer-reviewed publications, aiming to systematically predict the PFRs by developing supervised machine learning models. Notably, extreme gradient boosting (XGBoost) model exhibited the best predictive performance for both regression and classification tasks in predicting the PFRs, achieving a test R² value of 0.95 for PFR content prediction, along with an Area Under the Receiver Operating Curve (AUROC) of 0.92 for PFR type prediction, respectively. Based on XGBoost model, a graphical user interface (GUI) was developed to access PFRs predictions. Analysis of feature importance revealed that the biochar properties, such as metal/non-metal doping, pyrolysis temperature, carbon content, and specific surface area were identified as the four most significant factors influencing PFRs contents. Regarding the types of PFRs in biochar, specific surface area, pyrolysis temperature, carbon content, and feedstock were top-ranked influencing factors. These findings provide valuable guidance for accurately predicting both the contents and types of PFRs in biochar, and also hold significant potential for highly efficient utilization of biochar across various applications. Highlights: • Recognizing dual nature of PFRs, a machine-learning framework predicts them in biochar. • XGBoost excels, achieving an R² (0.95) for PFR content and an AUROC (0.92) for PFR type. • Important factors of PFR: doping, pyrolysis temp, carbon, and surface area. • GUI enhances accessibility, enabling PFR predictions before biochar preparation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Mind your prevalence!
- Author
-
Guesné, Sébastien J. J., Hanser, Thierry, Werner, Stéphane, Boobier, Samuel, and Scott, Shaylyn
- Subjects
- *
EQUILIBRIUM testing , *STRUCTURE-activity relationships , *MODEL validation , *STATISTICAL correlation , *SENSITIVITY & specificity (Statistics) - Abstract
Multiple metrics are used when assessing and validating the performance of quantitative structure–activity relationship (QSAR) models. In the case of binary classification, balanced accuracy is a metric to assess the global performance of such models. In contrast to accuracy, balanced accuracy does not depend on the respective prevalence of the two categories in the test set that is used to validate a QSAR classifier. As such, balanced accuracy is used to overcome the effect of imbalanced test sets on the model's perceived accuracy. Matthews' correlation coefficient (MCC), an alternative global performance metric, is also known to mitigate the imbalance of the test set. However, in contrast to the balanced accuracy, MCC remains dependent on the respective prevalence of the predicted categories. For simplicity, the rest of this work is based on the positive prevalence. The MCC value may be underestimated at high or extremely low positive prevalence. It contributes to more challenging comparisons between experiments using test sets with different positive prevalences and may lead to incorrect interpretations. The concept of balanced metrics beyond balanced accuracy is, to the best of our knowledge, not yet described in the cheminformatic literature. Therefore, after describing the relevant literature, this manuscript will first formally define a confusion matrix, sensitivity and specificity and then present, with synthetic data, the danger of comparing performance metrics under nonconstant prevalence. Second, it will demonstrate that balanced accuracy is the performance metric accuracy calibrated to a test set with a positive prevalence of 50% (i.e., balanced test set). This concept of balanced accuracy will then be extended to the MCC after showing its dependency on the positive prevalence. Applying the same concept to any other performance metric and widening it to the concept of calibrated metrics will then be briefly discussed. 
We will show that, like balanced accuracy, any balanced performance metric may be expressed as a function of the well-known values of sensitivity and specificity. Finally, a tale of two MCCs will exemplify the use of this concept of balanced MCC versus MCC with four use cases using synthetic data. Scientific contribution: This work provides a formal, unified framework for understanding prevalence dependence in model validation metrics, deriving balanced metric expressions beyond balanced accuracy, and demonstrating their practical utility for common use cases. In contrast to prior literature, it introduces the derived confusion matrix to express metrics as functions of sensitivity, specificity and prevalence without needing additional coefficients. The manuscript extends the concept of balanced metrics to Matthews' correlation coefficient and other widely used performance indicators, enabling robust comparisons under prevalence shifts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
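Entry 43 argues that accuracy and MCC depend on the positive prevalence of the test set, and that "balanced" versions fix the prevalence at 50%. A minimal sketch of that idea follows, assuming the textbook definitions of sensitivity, specificity and MCC and writing the confusion matrix as proportions of the test set. The function names are illustrative and are not taken from the paper.

```python
import math


def derived_confusion(sens, spec, prev):
    """Confusion-matrix cells as proportions, given sensitivity, specificity, prevalence."""
    tp = sens * prev
    fn = (1.0 - sens) * prev
    tn = spec * (1.0 - prev)
    fp = (1.0 - spec) * (1.0 - prev)
    return tp, fp, tn, fn


def accuracy(sens, spec, prev):
    """Accuracy is the share of true positives plus true negatives."""
    tp, fp, tn, fn = derived_confusion(sens, spec, prev)
    return tp + tn


def mcc(sens, spec, prev):
    """Matthews correlation coefficient at a given positive prevalence."""
    tp, fp, tn, fn = derived_confusion(sens, spec, prev)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def balanced_accuracy(sens, spec):
    """Accuracy calibrated to a test set with 50% positives."""
    return accuracy(sens, spec, 0.5)


def balanced_mcc(sens, spec):
    """MCC calibrated to a test set with 50% positives."""
    return mcc(sens, spec, 0.5)


# Same classifier (fixed sensitivity and specificity), different test-set prevalence.
for prev in (0.05, 0.5, 0.95):
    print(prev, round(accuracy(0.8, 0.95, prev), 3), round(mcc(0.8, 0.95, prev), 3))
```

Under these definitions, fixing the prevalence at 0.5 reduces the two balanced metrics to (sensitivity + specificity) / 2 and (sensitivity + specificity - 1) / sqrt(1 - (sensitivity - specificity)^2), so both depend only on sensitivity and specificity.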
44. Machine Learning-Driven Classification of Urease Inhibitors Leveraging Physicochemical Properties as Effective Filter Criteria.
- Author
-
Morales, Natalia, Valdés-Muñoz, Elizabeth, González, Jaime, Valenzuela-Hormazábal, Paulina, Palma, Jonathan M., Galarza, Christian, Catagua-González, Ángel, Yáñez, Osvaldo, Pereira, Alfredo, and Bustos, Daniel
- Subjects
- *
UREASE , *MACHINE learning , *HELICOBACTER pylori , *ENZYME metabolism , *CHRONIC kidney failure , *GASTRIC diseases - Abstract
Urease, a pivotal enzyme in nitrogen metabolism, plays a crucial role in various microorganisms, including the pathogenic Helicobacter pylori. Inhibiting urease activity offers a promising approach to combating infections and associated ailments, such as chronic kidney diseases and gastric cancer. However, identifying potent urease inhibitors remains challenging due to resistance issues that hinder traditional approaches. Recently, machine learning (ML)-based models have demonstrated the ability to predict the bioactivity of molecules rapidly and effectively. In this study, we present ML models designed to predict urease inhibitors by leveraging essential physicochemical properties. The methodological approach involved constructing a dataset of urease inhibitors through an extensive literature search. Subsequently, these inhibitors were characterized based on physicochemical properties calculations. An exploratory data analysis was then conducted to identify and analyze critical features. Ultimately, 252 classification models were trained, utilizing a combination of seven ML algorithms, three attribute selection methods, and six different strategies for categorizing inhibitory activity. The investigation unveiled discernible trends distinguishing urease inhibitors from non-inhibitors. This differentiation enabled the identification of essential features that are crucial for precise classification. Through a comprehensive comparison of ML algorithms, tree-based methods like random forest, decision tree, and XGBoost exhibited superior performance. Additionally, incorporating the "chemical family type" attribute significantly enhanced model accuracy. Strategies involving a gray-zone categorization demonstrated marked improvements in predictive precision. This research underscores the transformative potential of ML in predicting urease inhibitors. 
The meticulous methodology outlined herein offers actionable insights for developing robust predictive models within biochemical systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Contrails and Their Dependence on Meteorological Situations.
- Author
-
Kameníková, Iveta, Nagy, Ivan, and Hospodka, Jakub
- Subjects
CONDENSATION trails ,CIRRUS clouds ,TRAFFIC density ,AIR traffic ,TRAFFIC flow - Abstract
Contrails created by aircraft are a very hot topic today because they contribute to the warming of the atmosphere. Air traffic density is very high, and current forecasts predict a further significant increase. Increased air traffic volume is associated with an increased occurrence of contrails and induced cirrus clouds. The level of scientific understanding of contrails and their impact on the Earth's climate is surprisingly low. The scientific studies published so far are mainly based on global models, in situ measurements, and satellite observations of contrails. This research is based on observations of contrails in flight paths in the vicinity of Děčín and Prague, and on the collection of flight and meteorological data. It focused on the influence of the meteorological situation on the formation of persistent contrails. The collected data on contrails and meteorological variables were statistically processed using machine learning classification models. Several models were developed to predict and simulate the properties of contrails as a function of given air traffic and meteorological conditions. The Random Forests model produced the best results. Dependencies between meteorological conditions, formation, and contrail lifetime were found. The aim of the study was to identify the possibility of using available meteorological data to predict persistent contrails. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Performance Assessment of Ensemble-Tree Learning Models on Breast Cancer Dataset
- Author
-
O. I. ALABI, O. J. FATABO, S. C. OKENU, T. A. ONIMISI, G. O. OYERINDE, I. J. UDOH, A. I. OZOEZE, and A. C. EGBA
- Subjects
Breast cancer ,Classification models ,Tree-based Ensemble ,Supervised learning ,Bibliography. Library science. Information resources - Abstract
Advances in feature extraction enable the collection of prognostic data values which can be used to distinguish between benign and malignant tumours. While single learning models are capable of making predictions, combining weak learners to form an ensemble can improve predictive performance. This study evaluates and compares the performance of a few selected ensemble-tree machine learning models as applied to the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The dataset is split into a 60% training set and a 40% test set. Random Forest classifier, Extremely Randomized Trees classifier, Gradient Boosting machine classifier and Extreme Gradient Boosting classifier were initialized with 3 weak learners and fit to the training set, with subsequent predictions made on the test set. Evaluation metrics used include Accuracy, Area under Receiver Operating Characteristic curves (AUROC), Precision-Recall curves and F2 scores, followed by a Stratified 5-fold cross-validation procedure. Giving greater weight to Precision and Recall, Extreme Gradient Boosting classifier and Extremely Randomized Trees classifier produced better performances with an average accuracy of 0.9386 and 0.9460 respectively. Overall, the Extremely Randomized Trees classifier outperforms the rest of the models with an average F2 score of 0.4232. Keywords: Breast cancer; Classification models; Tree-based Ensemble; Supervised learning
- Published
- 2024
- Full Text
- View/download PDF
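Illustrative note on entry 46: the abstract reports F2 scores, a metric that weights recall more heavily than precision. The following is a minimal sketch of how an F-beta score can be computed from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study.

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion-matrix counts; beta > 1 favours recall."""
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only (not results from the study).
print(round(fbeta_score(tp=80, fp=10, fn=20), 4))  # 0.8163
```

With beta = 2, a missed malignant case (false negative) lowers the score more than a false alarm does, which is one plausible reason for preferring F2 in a diagnostic setting.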
47. Near infrared spectroscopy for cooking time classification of cassava genotypes
- Author
-
Massaine Bandeira e Sousa, Cinara Fernanda Garcia Morales, Edwige Gaby Nkouaya Mbanjo, Chiedozie Egesi, and Eder Jorge de Oliveira
- Subjects
Manihot esculenta Crantz ,classification models ,portable NIR ,accuracy ,root quality ,Plant culture ,SB1-1110 - Abstract
Cooking time is a crucial determinant of culinary quality of cassava roots and incorporating it into the early stages of breeding selection is vital for breeders. This study aimed to assess the potential of near-infrared spectroscopy (NIRS) in classifying cassava genotypes based on their cooking times. Five cooking times (15, 20, 25, 30, and 40 minutes) were assessed and 888 genotypes evaluated over three crop seasons (2019/2020, 2020/2021, and 2021/2022). Fifteen roots from five plants per plot, featuring diameters ranging from 4 to 7 cm, were randomly chosen for cooking analysis and spectral data collection. Two root samples (15 slices each) per genotype were collected, with the first set aside for spectral data collection, processed, and placed in two petri dishes, while the second set was utilized for cooking assessment. Cooking data were classified into binary and multiclass variables (CT4C and CT6C). Two NIRS devices, the portable QualitySpec® Trek (QST) and the benchtop NIRFlex N-500, were used to collect spectral data. Classification of genotypes was carried out using the K-nearest neighbor algorithm (KNN) and partial least squares (PLS) models. The spectral data were split into a training set (80%) and an external validation set (20%). For binary variables, the classification accuracy for cassava cooking time was notably high (R²Cal ranging from 0.72 to 0.99). Regarding multiclass variables, accuracy remained consistent across classes, models, and NIR instruments (~0.63). However, the KNN model demonstrated slightly superior accuracy in classifying all cooking time classes, except for the CT4C variable (QST) in the NoCook and 25 min classes. Despite the increased complexity associated with binary classification, it remained more efficient, offering higher classification accuracy for samples and facilitating the selection of the most relevant time or variables, such as cooking time ≤ 30 minutes. 
The accuracy of the optimal scenario for classifying samples with a cooking time of 30 minutes reached R²Cal = 0.86 and R²Val = 0.84, with a Kappa value of 0.53. Overall, the models exhibited a robust fit for all cooking times, showcasing the significant potential of NIRS as a high-throughput phenotyping tool for classifying cassava genotypes based on cooking time.
- Published
- 2024
- Full Text
- View/download PDF
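Illustrative note on entry 47: the abstract reports a Kappa value alongside accuracy. Assuming this refers to Cohen's kappa (the abstract does not say which variant was used), a minimal sketch of the calculation is shown here. The function name and the labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Agreement between two label sequences, corrected for chance agreement."""
    n = len(y_true)
    # Observed agreement: fraction of samples where the labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement if both labelings were independent.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Note: undefined (division by zero) when p_e == 1, i.e. a single class.
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: 1 = cooks within the time limit, 0 = does not.
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Kappa can be much lower than raw accuracy when one class dominates, which is why the two are often reported together.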
48. Experimental data manipulations to assess performance of hyperspectral classification models of crop seeds and other objects
- Author
-
Nansen, Christian, Imtiaz, Mohammad S, Mesgaran, Mohsen B, and Lee, Hyoseok
- Subjects
Agricultural ,Veterinary and Food Sciences ,Biological Sciences ,Bioinformatics and Computational Biology ,Plant Biology ,Agricultural Biotechnology ,Classification performance ,Machine vision ,Proximal sensing ,Classification models ,Seed analysis ,Optical sensing ,Biochemistry and Cell Biology ,Plant Biology & Botany ,Agricultural biotechnology ,Bioinformatics and computational biology ,Plant biology - Abstract
Background: Optical sensing solutions are being developed and adopted to classify a wide range of biological objects, including crop seeds. Performance assessment of optical classification models remains both a priority and a challenge. Methods: As training data, we acquired hyperspectral imaging data from 3646 individual tomato seeds (germination yes/no) from two tomato varieties. We performed three experimental data manipulations: (1) Object assignment error: effect of individual objects in the training data being assigned to the wrong class. (2) Spectral repeatability: effect of introducing known ranges (0-10%) of stochastic noise to individual reflectance values. (3) Size of training data set: effect of reducing numbers of observations in training data. Effects of each of these experimental data manipulations were characterized and quantified based on classifications with two functions [linear discriminant analysis (LDA) and support vector machine (SVM)]. Results: For both classification functions, accuracy decreased linearly in response to introduction of object assignment error and to experimental reduction of spectral repeatability. We also demonstrated that experimental reduction of training data by 20% had negligible effect on classification accuracy. LDA and SVM classification algorithms were applied to independent validation seed samples. LDA-based classifications predicted seed germination with RMSE = 10.56 (variety 1) and 26.15 (variety 2), and SVM-based classifications predicted seed germination with RMSE = 10.44 (variety 1) and 12.58 (variety 2). Conclusion: We believe this study represents the first in which optical seed classification included both a thorough performance evaluation of two separate classification functions based on experimental data manipulations, and application of classification models to validation seed samples not included in training data. 
Proposed experimental data manipulations are discussed in broader contexts and general relevance, and they are suggested as methods for in-depth performance assessments of optical classification models.
- Published
- 2022
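Illustrative note on entry 48: the three data manipulations named in the abstract (wrong-class assignment, stochastic noise on reflectance values, and reduced training set size) can be sketched roughly as follows. This is not the authors' code. All function names are hypothetical, and treating the noise as a relative, uniformly distributed perturbation is an assumption.

```python
import random


def flip_labels(labels, error_rate, rng):
    """Object assignment error: move a fraction of binary labels to the wrong class."""
    n_flip = round(error_rate * len(labels))
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped


def add_noise(spectrum, max_noise, rng):
    """Spectral repeatability: perturb each value by up to +/- max_noise (relative)."""
    return [v * (1 + rng.uniform(-max_noise, max_noise)) for v in spectrum]


def subsample(rows, fraction_removed, rng):
    """Training set size: randomly drop a fraction of the observations."""
    keep = round(len(rows) * (1 - fraction_removed))
    return rng.sample(rows, keep)


rng = random.Random(0)  # fixed seed so the manipulations are repeatable
labels = [0, 1] * 50    # hypothetical germination labels (no/yes)
noisy_labels = flip_labels(labels, 0.10, rng)
print(sum(a != b for a, b in zip(labels, noisy_labels)))  # 10
```

A classifier would then be retrained on each manipulated copy, and the accuracy plotted against the level of manipulation.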
49. Doctor or AI? Efficient Neural Network for Response Classification in Health Consultations
- Author
-
Olumide E. Ojo, Olaronke O. Adebanji, Hiram Calvo, Alexander Gelbukh, Anna Feldman, and Ofir Ben Shoham
- Subjects
Artificial intelligence ,classification models ,deep learning ,machine learning ,MedXNet ,neural network ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Patients seek quality healthcare because they trust their doctors and the healthcare system. However, the use of AI models in medical consultations has undermined this trust. AI systems typically depend on accurate and large volumes of data for training, but in cases of insufficient or incorrect data, this can lead to incomplete or flawed outputs. The inaccuracies in the response generated by AI systems may result in biased outcomes, compromising patient care and further eroding the trust patients place in the healthcare system. In this paper, we describe an innovative approach to distinguishing between responses generated by AI and those written by human doctors during health consultations, using an efficient neural network. As part of our feature extraction approach, we converted text into numerical representations via word-level tokenization, mapping to integer sequences. This allows the neural network to efficiently process text while preserving semantic structure and handling a large vocabulary with fixed sequence lengths. Through rigorous experimentation and evaluation, we showcase the effectiveness and reliability of our proposed neural network architecture, MedXNet, in accurately classifying diverse responses encountered in health consultations. For the classification approach, we combined BiLSTM, Transformer, and CNN layers to capture local and global dependencies in sequence inputs and a dense layer that was fully connected with dropout regularization and softmax activation. We compared MedXNet performance with different RNNs, including LSTM, Bi-LSTM, GRU, and 1D-CNN, across three datasets of increasing complexity. Dataset A represents simple data, dataset B introduces greater complexity, and dataset C poses the highest level of challenge. Our findings revealed that MedXNet outperforms the others with an accuracy of 98.74% on dataset A. Although the accuracy of MedXNet decreased on B, it remains the top performer. 
With 94.63% accuracy, MedXNet still achieves the highest accuracy in dataset C. Based on these findings, MedXNet demonstrated robustness across a wide range of data complexity levels, making it an ideal classification tool for doctor-written and AI-generated text in health consultations. This can enhance the trust patients have in the responses they receive during online medical consultations.
- Published
- 2024
- Full Text
- View/download PDF
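Illustrative note on entry 49: the abstract describes converting text into integer sequences of fixed length via word-level tokenization. A minimal sketch of that general idea is shown here. The function names, the reserved `<pad>` and `<unk>` tokens, and the whitespace splitting are assumptions, not details from the paper.

```python
def build_vocab(texts):
    """Map each distinct lowercase word to an integer id (0 = padding, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab, max_len):
    """Convert text to a fixed-length integer sequence, truncating or padding."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Hypothetical consultation text for illustration only.
vocab = build_vocab(["take rest and fluids"])
print(encode("Take rest now", vocab, max_len=5))  # [2, 3, 1, 0, 0]
```

Sequences of equal length can be stacked into a single batch, which is what lets recurrent and convolutional layers process them efficiently.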
50. Prediction of Chemical Compounds Biodegradability: Molecular Fingerprint-Based Machine Learning Models
- Author
-
Alaa M. Elsayad, Medien Zeghid, Hassan Yousif Ahmed, and Khaled A. Elsayad
- Subjects
Biodegradability prediction ,classification models ,environmental risk assessment ,molecular fingerprints ,genetic algorithm ,feature ranking ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
This work evaluates the performance of four machine learning models (MLMs): support vector machine (SVM), K-nearest neighbor (KNN), discriminant analysis (DA), and logistic regression (LR) in predicting the biodegradability of chemicals, a critical factor for assessing environmental risks. For this purpose, the RDKit library was initially employed to extract nine fingerprints from a dataset consisting of 1717 chemical compounds. Subsequently, the Information Gain (IG) feature ranking algorithm was used to identify the top 100 predictive features for each fingerprint. The MLM hyperparameters were optimized, and informative features were selected using a multi-objective genetic algorithm (MOGA), which identified dominant pairs (MLM/fingerprint) maximizing the F1-score while minimizing the number of features. MACCS, Layered, and Avalon molecular fingerprints associated with the different MLMs showed superior performance, with KNN/MACCS achieving a cross-validated F1-score of 87.63% using only 13 features. Afterwards, the final classification models were constructed using the solution with the highest cross-validated F1-score complement for each combination of MLM/fingerprint. These models were then evaluated on both the training and test datasets. In the training subset, the MACCS fingerprint consistently outperformed others, achieving cross-validated AUC scores above 90% for all models (SVM: 91%, KNN: 90.4%, DA: 90.9%, LR: 91.32%). The SVM/MACCS pair demonstrated the highest accuracy (94.17%), specificity (95.84%), and F1-score (93.14%). In the test subset, the SVM/Layered pair exhibited the highest accuracy (84.01%) and specificity (88.09%). The DA/Avalon combination achieved the highest sensitivity (84.40%), while the SVM/Avalon pair reached the highest F1-score (82.58%). These results underscore the effectiveness of MACCS fingerprints for biodegradation classification across various models. 
Additionally, Permutation Feature Importance (PFI) and Shapley Additive Explanations (SHAP) have identified the key MACCS features influencing biodegradation classifications in the SVM model. These methods highlighted MACCS bit numbers 154, 145, 142, 144, and 156 as the most crucial contributors to the SVM model’s predictions.
- Published
- 2024
- Full Text
- View/download PDF
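Illustrative note on entry 50: the abstract mentions ranking fingerprint features by Information Gain (IG) before model fitting. A minimal sketch of IG ranking for binary fingerprint bits is shown here. The function names and the toy data are hypothetical, and the paper's exact IG implementation is not specified in the abstract.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(bits, labels):
    """Reduction in label entropy after splitting on one fingerprint bit."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(bits):
        subset = [y for b, y in zip(bits, labels) if b == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def rank_features(fingerprints, labels, top_k):
    """Return the indices of the top_k bits with the highest information gain."""
    n_bits = len(fingerprints[0])
    gains = [information_gain([fp[j] for fp in fingerprints], labels)
             for j in range(n_bits)]
    return sorted(range(n_bits), key=lambda j: gains[j], reverse=True)[:top_k]


# Toy data: 4 compounds, 3 fingerprint bits; bit 0 separates the classes perfectly.
fingerprints = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
labels = [1, 1, 0, 0]  # hypothetical: 1 = biodegradable, 0 = not
print(rank_features(fingerprints, labels, top_k=1))  # [0]
```

In a pipeline like the one the abstract describes, `top_k` would be 100 and the surviving bits would then be passed to the feature-selection and model-fitting stages.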