72,528 results on '"bayes classifier"'
Search Results
2. XNB: Explainable Class-Specific Naive-Bayes Classifier
- Author
-
Aguilar-Ruiz, Jesus S., Romero, Cayetano, and Cicconardi, Andrea
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
- Abstract
In today's data-intensive landscape, where high-dimensional datasets are increasingly common, reducing the number of input features is essential to prevent overfitting and improve model accuracy. Despite numerous efforts to tackle dimensionality reduction, most approaches apply a universal set of features across all classes, potentially missing the unique characteristics of individual classes. This paper presents the Explainable Class-Specific Naive Bayes (XNB) classifier, which introduces two critical innovations: 1) the use of Kernel Density Estimation to calculate posterior probabilities, allowing for a more accurate and flexible estimation process, and 2) the selection of class-specific feature subsets, ensuring that only the most relevant variables for each class are utilized. Extensive empirical analysis on high-dimensional genomic datasets shows that XNB matches the classification performance of traditional Naive Bayes while drastically improving model interpretability. By isolating the most relevant features for each class, XNB not only reduces the feature set to a minimal, distinct subset for each class but also provides deeper insights into how the model makes predictions. This approach offers significant advantages in fields where both precision and explainability are critical.
- Published
- 2024
3. Fractional Naive Bayes (FNB): non-convex optimization for a parsimonious weighted selective naive Bayes classifier
- Author
-
Hue, Carine and Boullé, Marc
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We study supervised classification for datasets with a very large number of input variables. The naïve Bayes classifier is attractive for its simplicity, scalability and effectiveness in many real data applications. When the strong naïve Bayes assumption of conditional independence of the input variables given the target variable is not valid, variable selection and model averaging are two common ways to improve the performance. In the case of the naïve Bayes classifier, the resulting weighting scheme on the models reduces to a weighting scheme on the variables. Here we focus on direct estimation of variable weights in such a weighted naïve Bayes classifier. We propose a sparse regularization of the model log-likelihood, which takes into account prior penalization costs related to each input variable. Compared to the averaging-based classifiers used up until now, our main goal is to obtain parsimonious robust models with fewer variables and equivalent performance. The direct estimation of the variable weights amounts to a non-convex optimization problem for which we propose and compare several two-stage algorithms. First, the criterion obtained by convex relaxation is minimized using several variants of standard gradient methods. Then, the initial non-convex optimization problem is solved using local optimization methods initialized with the result of the first stage. The various proposed algorithms result in optimization-based weighted naïve Bayes classifiers that are evaluated on benchmark datasets and positioned w.r.t. a reference averaging-based classifier.
- Published
- 2024
4. Diagnosis and multiclass classification of diabetic retinopathy using enhanced multi thresholding optimization algorithms and improved Naive Bayes classifier
- Author
-
Bhimavarapu, Usharani
- Published
- 2024
5. Comparison of Naïve Bayes Classifier and Decision Tree Algorithms for Sentiment Analysis on the House of Representatives' Right of Inquiry on Twitter
- Author
-
Putri Wahyuni and Moh. Ali Romli
- Subjects
house of representatives' right of inquiry, public sentiment, twitter, naïve bayes classifier, decision tree, Electronic computers. Computer science, QA75.5-76.95
- Abstract
This research analyzes public sentiment towards the topic of the House of Representatives' Right of Inquiry on Twitter using Naïve Bayes Classifier and Decision Tree algorithms. The goal is to compare the effectiveness of the two algorithms in political sentiment analysis. The research methodology includes data collection from Twitter, data pre-processing, sentiment classification, and result analysis. Sentiment analysis reveals the dominance of positive sentiment related to the DPR's Right of Inquiry. However, this study has limitations in terms of dataset size and depth of text-based sentiment analysis. This research contributes to a better understanding of public sentiment towards political issues in Indonesia and highlights the importance of proper algorithm selection in social media sentiment analysis. Development suggestions include exploration of deep learning techniques, integration of multimodal analysis, data balancing (oversampling or undersampling) and improvement of pre-processing so that the model is better able to capture negative contexts. The results of the study showed excellent performance of both Naïve Bayes Classifier and Decision Tree algorithms, with accuracy above 95%. Decision Tree performs best with an accuracy of 99%, while Naïve Bayes Classifier achieves an accuracy of 96%. The results with the Confusion Matrix test are precision 0.98, recall 1.00, and F1-Score 0.99.
- Published
- 2024
6. Implementation of the Naive Bayes Classifier Algorithm for Classifying Toddler Nutritional Status
- Author
-
Muhammad Insan Kamil and Adityo Permana Wibowo
- Subjects
naive bayes classifier, nutritional status, toddlers, malnutrition, class imbalance, Electronic computers. Computer science, QA75.5-76.95
- Abstract
This research addresses the pressing issue of malnutrition among toddlers in Indonesia, aiming to classify their nutritional status using the Naive Bayes Classifier (NBC). The study utilizes a dataset comprising 958 records from Puskesmas Cilandak and categorizes nutritional status into six class labels: good nutrition, at risk of excess nutrition, excess nutrition, obesity, undernutrition, and severe malnutrition. The methodology includes data preprocessing techniques such as class weighting to tackle class imbalance and Principal Component Analysis (PCA) for effective feature extraction. The model's performance is evaluated using metrics such as accuracy, precision, recall, and F1 score, achieving an impressive accuracy of 85.76% when class weighting is applied, which significantly enhances the recall and F1 scores for minority classes. The findings highlight the critical importance of robust preprocessing and evaluation metrics in improving machine learning models for public health applications. Furthermore, they suggest that further exploration of alternative algorithms and dataset expansion could yield more comprehensive insights into the classification of toddler nutritional status.
- Published
- 2024
7. Naïve Bayes classifier for Kashmiri word sense disambiguation
- Author
-
Mir, Tawseef Ahmad and Lawaye, Aadil Ahmad
- Published
- 2024
8. A Notion of Uniqueness for the Adversarial Bayes Classifier
- Author
-
Frank, Natalie S.
- Subjects
Computer Science - Machine Learning, Mathematics - Statistics Theory, Statistics - Machine Learning
- Abstract
We propose a new notion of uniqueness for the adversarial Bayes classifier in the setting of binary classification. Analyzing this concept produces a simple procedure for computing all adversarial Bayes classifiers for a well-motivated family of one-dimensional data distributions. This characterization is then leveraged to show that as the perturbation radius increases, the regularity of adversarial Bayes classifiers improves. Various examples demonstrate that the boundary of the adversarial Bayes classifier frequently lies near the boundary of the Bayes classifier. Comment: 49 pages, 7 figures; v2: fixed typos, notation errors, and a mistake in example 7
- Published
- 2024
9. Adversarial Consistency and the Uniqueness of the Adversarial Bayes Classifier
- Author
-
Frank, Natalie S.
- Subjects
Computer Science - Machine Learning, Mathematics - Statistics Theory, Statistics - Machine Learning
- Abstract
Minimizing an adversarial surrogate risk is a common technique for learning robust classifiers. Prior work showed that convex surrogate losses are not statistically consistent in the adversarial context; in other words, a minimizing sequence of the adversarial surrogate risk will not necessarily minimize the adversarial classification error. We connect the consistency of adversarial surrogate losses to properties of minimizers of the adversarial classification risk, known as adversarial Bayes classifiers. Specifically, under reasonable distributional assumptions, a convex surrogate loss is statistically consistent for adversarial learning iff the adversarial Bayes classifier satisfies a certain notion of uniqueness. Comment: 2 figures, 20 pages; v2: fixed typos; v3: improved organization of paper and added figures
- Published
- 2024
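10. Comparison of Binary Logistic Regression Analysis and Naïve Bayes Classifier for Predicting Diabetes Risk Factors [PERBANDINGAN ANALISIS REGRESI LOGISTIK BINER DAN NAÏVE BAYES CLASSIFIER UNTUK MEMPREDIKSI FAKTOR RESIKO DIABETES]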
- Author
-
Rafika Aristawidya, Indahwati Indahwati, Erfiani Erfiani, Anwar Fitrianto, and Muftih A. A.
- Subjects
diabetes, naïve bayes classifier, binary logistic regression, Mathematics, QA1-939
- Abstract
Diabetes is a global health problem that is increasing in prevalence worldwide. This study compares the performance of two data analysis methods, binary logistic regression and the naïve Bayes classifier, in predicting diabetes risk. It aims to identify factors that significantly affect diabetes risk and to classify diabetes risk using binary logistic regression, then compare the classification with that of the naïve Bayes classifier algorithm. Binary logistic regression models the relationship between independent predictor variables and a binary dependent variable, while the naïve Bayes classifier assumes independence between variables. In this study, both methods were evaluated based on accuracy, sensitivity, specificity and positive predictive value. The results show that the factors that influence the risk of diabetes are Age, Gender, Polyuria, Polydipsia, Genital thrush, Itching, Irritability, and Partial paresis. Furthermore, binary logistic regression has a higher classification accuracy (92.31%) than the naïve Bayes classifier (84.61%). Therefore, binary logistic regression was identified as the best method to predict diabetes risk in the context of this study.
- Published
- 2024
11. Optimizing Mail Sorting with Naive Bayes Classifier and Enhanced Feature Extraction Method
- Author
-
Pavithra, C., Saradha, M., and Nisha, B. Antline
- Published
- 2024
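12. Application of SMOTE to Address Class Imbalance in MBTI Personality Classification Using the Naive Bayes Classifier [Penerapan SMOTE untuk Mengatasi Imbalance Class dalam Klasifikasi Kepribadian MBTI Menggunakan Naive Bayes Classifier]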
- Author
-
Mutiara Persada Pulungan, Andi Purnomo, and Aliyah Kurniasih
- Subjects
Myers-Briggs Type Indicator (MBTI), Imbalance Class, Synthetic Minority Over-sampling Technique (SMOTE), Term Frequency-Inverse Document Frequency (TF-IDF), Naive Bayes Classifier, Technology, Information technology, T58.5-58.64
- Abstract
Myers-Briggs Type Indicator (MBTI) personality has become a popular topic in understanding individual characteristics and their impact on social interaction, career, and decision-making. Machine learning models with Naive Bayes Classifier algorithms are often used to predict MBTI personalities from Twitter data. However, there is often a class imbalance, with some personality types having fewer samples. To overcome this, this study uses the Synthetic Minority Over-sampling Technique (SMOTE) to increase the number of samples in minority classes. Additionally, the Term Frequency-Inverse Document Frequency (TF-IDF) method is used to extract important features from text. This study aims to apply SMOTE to address class imbalance in MBTI personality classification using several Naive Bayes Classifier algorithms (Gaussian, Multinomial, Bernoulli, and Complement) as well as Logistic Regression, based on Keirsey's model: Artisan, Guardian, Rational, and Idealist. Evaluation uses the Hold-Out-Validation method, dividing the data into 90% training data and 10% test data. The evaluation results show low performance of the Naive Bayes Classifier algorithms for the Artisan and Guardian classes, but good performance for the Rational and Idealist classes. The Logistic Regression algorithm has the highest accuracy (reported as 80% in the Indonesian abstract and 79% in the English abstract) and better performance overall, although it is still low for the Artisan and Guardian classes. Thus, this study provides insight into the use of the Naive Bayes Classifier algorithm and the SMOTE technique in MBTI personality prediction, with potential performance improvement through the use of the Logistic Regression algorithm.
- Published
- 2024
13. Sampling Audit Evidence Using a Naive Bayes Classifier
- Author
-
Sheu, Guang-Yih and Liu, Nai-Ru
- Subjects
Computer Science - Machine Learning, 62D05, 62H30
- Abstract
Taiwan's auditors have suffered from processing excessive audit data, including drawing audit evidence. This study advances sampling techniques by integrating machine learning with sampling. This machine learning integration helps avoid sampling bias, keep randomness and variability, and target riskier samples. We first classify data using a Naive Bayes classifier into some classes. Next, a user-based, item-based, or hybrid approach is employed to draw audit evidence. The representativeness index is the primary metric for measuring its representativeness. The user-based approach samples data symmetric around the median of a class as audit evidence. It may be equivalent to a combination of monetary and variable samplings. The item-based approach represents asymmetric sampling based on posterior probabilities for obtaining risky samples as audit evidence. It may be identical to a combination of non-statistical and monetary samplings. Auditors can hybridize those user-based and item-based approaches to balance representativeness and riskiness in selecting audit evidence. Three experiments show that sampling using machine learning integration has the benefits of drawing unbiased samples, handling complex patterns, correlations, and unstructured data, and improving efficiency in sampling big data. However, the limitations are the classification accuracy output by machine learning algorithms and the range of prior probabilities. Comment: 16 pages, 11 figures, 4 tables
- Published
- 2024
14. A hyper-heuristic based approach with naive Bayes classifier for the reliability p-median problem
- Author
-
Chappidi, Edukondalu and Singh, Alok
- Published
- 2023
15. Threshold-based Naïve Bayes classifier
- Author
-
Romano, Maurizio, Contu, Giulia, Mola, Francesco, and Conversano, Claudio
- Published
- 2024
16. An Efficient Shapley Value Computation for the Naive Bayes Classifier
- Author
-
Lemaire, Vincent, Clérot, Fabrice, and Boullé, Marc
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Variable selection or importance measurement of input variables to a machine learning model has become the focus of much research. It is no longer enough to have a good model; one also must explain its decisions. This is why there are so many intelligibility algorithms available today. Among them, Shapley value estimation algorithms are intelligibility methods based on cooperative game theory. In the case of the naive Bayes classifier, and to our knowledge, there is no "analytical" formulation of Shapley values. This article proposes an exact analytic expression of Shapley values in the special case of the naive Bayes classifier. We analytically compare this Shapley proposal to another frequently used indicator, the Weight of Evidence (WoE), and provide an empirical comparison of our proposal with (i) the WoE and (ii) KernelShap results on real-world datasets, discussing similar and dissimilar results. The results show that our Shapley proposal for the naive Bayes classifier provides informative results with low algorithmic complexity, so that it can be used on very large datasets with extremely low computation time. Comment: 15 pages, 3 figures
- Published
- 2023
17. Prediction of depression status in college students using a Naive Bayes classifier based machine learning model
- Author
-
Cruz, Fred Torres, Flores, Evelyn Eliana Coaquira, and Quispe, Sebastian Jarom Condori
- Subjects
Statistics - Other Statistics
- Abstract
This study presents a machine learning model based on the Naive Bayes classifier for predicting the level of depression in university students. The objective was to improve prediction accuracy using a Naive Bayes model trained on 70% of the data and validated on the remaining 30%. The collected data includes factors associated with depression from 519 university students. The results showed an accuracy of 78.03%, high sensitivity in detecting positive cases of depression, especially at moderate and severe levels, and significant specificity in correctly classifying negative cases. These findings highlight the effectiveness of the model in early detection and treatment of depression, benefiting vulnerable sectors and contributing to the improvement of mental health in the student population.
- Published
- 2023
18. Naive Bayes classifier – An ensemble procedure for recall and precision enrichment
- Author
-
Peretz, Or, Koren, Michal, and Koren, Oded
- Published
- 2024
19. Aspect-Based Sentiment Analysis of User Reviews on the Game 'Honkai: Star Rail' Using Naïve Bayes Classifier
- Author
-
Hisyam Agus Setiawan and Herman Yuliansyah
- Subjects
Technology, Information technology, T58.5-58.64
- Abstract
Games are a form of entertainment often used to refresh the mind from the fatigue of daily activities and routines. Honkai: Star Rail is a popular turn-based game from Hoyoverse available on Google Play Store. Several studies have proposed sentiment analysis with the Naïve Bayes classification method. However, few have analyzed the reviews of a game down to the level of its individual aspects. In aspect-based sentiment analysis, text is analyzed to identify various attributes or components, then the relevant sentiment (positive, negative, or neutral) for each of these attributes is determined. This research aims to analyze aspect-based sentiment using the Naïve Bayes Classifier method, as well as categorize sentiment into positive and negative, and classify reviews into certain aspects. The 5-fold iteration gave a best average accuracy of 79%. The evaluation results show that it is necessary to tune the model using Grid Search Hyperparameter Tuning. Optimization of the smoothing parameter with alpha = 0.1 proved effective in improving model performance, with the highest weighted average accuracy of 93%. The evaluation results show that Grid Search Hyperparameter Tuning optimization gives better performance to the Naïve Bayes algorithm model in multi-label classification.
- Published
- 2024
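20. Use of the Naïve Bayes Classifier Method in Sentiment Analysis of Public Assessments of Hospital Services in Malang [Penggunaan Metode Naïve Bayes Classifier pada Analisis Sentimen Penilaian Masyarakat Terhadap Pelayanan Rumah Sakit di Malang]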
- Author
-
Tsania Dzulkarnain, Dian Eka Ratnawati, and Bayu Rahayudi
- Subjects
Sentiment Analysis, hospital, Naive Bayes, visitor reviews, Google Maps Reviews, Technology, Information technology, T58.5-58.64
- Abstract
The role of hospitals in society is crucial, and it is tied to the community's level of satisfaction with their services, facilities, and other aspects. Public opinions and ratings also serve as an assessment of hospital service performance. Google Maps Reviews contains numerous reviews of various hospitals, and reading through such a large volume of reviews is time-consuming for the public. Complaints from the community around the author regarding hospital services in Malang make the assessment of hospital services in Malang the subject of this basic research. The author uses the Naïve Bayes Classifier algorithm and Cross Validation to categorize reviews by positive and negative sentiment, as well as by aspect, to facilitate categorization. The aspects used are treatment, facilities, administration, and costs. The author also employs Root Cause analysis to help the public and relevant parties identify problems and recommended solutions. The data is first processed with text preprocessing, followed by TF-IDF word weighting, data labeling, application of the Naïve Bayes Classifier algorithm, and extraction of negative sentiments to determine the Root Cause. Testing using Cross Validation with fold k-9 yields an accuracy of 82.97%, precision of 83.13%, recall of 82.93%, and an f-measure of 82.92%. Testing on 20% test data yields an accuracy of 90%.
- Published
- 2024
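21. Sentiment Analysis of YouTube Users Toward the New 2022 Emission Year Banknotes Using the Naïve Bayes Classifier Method [Analisis Sentimen Pengguna Youtube Terhadap Uang Baru Tahun Emisi 2022 Menggunakan Metode Naïve Bayes Classifier]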
- Author
-
Aji Akbar Mirinda Putra, Islamiyah, and Muhammad Labib Jundillah
- Abstract
Money is a vital commodity in economic activity, and in 2022 Bank Indonesia launched the 2022 Emission Year Rupiah banknotes with a more attractive design and better security features. In this context social media, particularly YouTube, has become an important platform for Indonesians to express their opinions about the change. The Naïve Bayes Classifier method is used to classify user sentiment into positive or negative categories. This study aims to analyze YouTube users' sentiment toward the 2022 Emission Year Rupiah using the Naïve Bayes Classifier method. The main objective is to identify whether user sentiment toward the 2022 Emission Year Rupiah is positive or negative and to measure the accuracy, precision, recall, and F1-score of the sentiment analysis. The analysis shows that 55.1% of YouTube users responded positively to the 2022 Emission Year Rupiah banknotes, while 44.9% responded negatively. Model evaluation yielded an accuracy of 80%, precision of 87%, recall of 73%, and F1-score of 80%. These results indicate that the model performs well in classifying user sentiment.
- Published
- 2024
22. Bayes classifier cannot be learned from noisy responses with unknown noise rates
- Author
-
Bakshi, Soham and Maity, Subha
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning
- Abstract
Training a classifier with noisy labels typically requires the learner to specify the distribution of label noise, which is often unknown in practice. Although there have been some recent attempts to relax that requirement, we show that the Bayes decision rule is unidentified in most classification problems with noisy labels. This suggests it is generally not possible to bypass or relax the requirement. In the special cases in which the Bayes decision rule is identified, we develop a simple algorithm to learn the Bayes decision rule that does not require knowledge of the noise distribution. Comment: Invited to present in ICLR Tiny Paper 2023
- Published
- 2023
23. Iterative threshold-based Naïve bayes classifier
- Author
-
Romano, Maurizio, Zammarchi, Gianpaolo, and Conversano, Claudio
- Published
- 2024
24. A Naive Bayes Classifier for identifying Class II YSOs
- Author
-
Wilson, Andrew J., Lakeland, Ben S., Wilson, Tom J., and Naylor, Tim
- Subjects
Astrophysics - Solar and Stellar Astrophysics
- Abstract
A naive Bayes classifier for identifying Class II YSOs has been constructed and applied to a region of the Northern Galactic Plane containing 8 million sources with good quality Gaia EDR3 parallaxes. The classifier uses five features: Gaia G-band variability, WISE mid-infrared excess, UKIDSS and 2MASS near-infrared excess, IGAPS Hα excess and overluminosity with respect to the main sequence. A list of candidate Class II YSOs is obtained by choosing a posterior threshold appropriate to the task at hand, balancing the competing demands of completeness and purity. At a threshold posterior greater than 0.5 our classifier identifies 6504 candidate Class II YSOs. At this threshold we find a false positive rate around 0.02 per cent and a true positive rate of approximately 87 per cent for identifying Class II YSOs. The ROC curve rises rapidly to almost one with an area under the curve around 0.998 or better, indicating the classifier is efficient at identifying candidate Class II YSOs. Our map of these candidates shows what are potentially three previously undiscovered clusters or associations. When comparing our results to published catalogues from other young star classifiers, we find between one quarter and three quarters of high probability candidates are unique to each classifier, telling us no single classifier is finding all young stars. Comment: 38 pages, 28 figures, 15 tables. Accepted for publication in MNRAS
- Published
- 2023
25. Structure of Classifier Boundaries: Case Study for a Naive Bayes Classifier
- Author
-
Karr, Alan F., Bowen, Zac, and Porter, Adam A.
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning
- Abstract
Whether based on models, training data or a combination, classifiers place (possibly complex) input data into one of a relatively small number of output categories. In this paper, we study the structure of the boundary (those points for which a neighbor is classified differently) in the context of an input space that is a graph, so that there is a concept of neighboring inputs. The scientific setting is a model-based naive Bayes classifier for DNA reads produced by Next Generation Sequencers. We show that the boundary is both large and complicated in structure. We create a new measure of uncertainty, called Neighbor Similarity, that compares the result for a point to the distribution of results for its neighbors. This measure not only tracks two inherent uncertainty measures for the Bayes classifier, but also can be implemented, at a computational cost, for classifiers without inherent measures of uncertainty.
- Published
- 2022
26. Naïve Bayes classifier based on reliability measurement for datasets with noisy labels
- Author
-
Zhu, Yingqiu, Wang, Yinzhi, Qin, Lei, Zhang, Bo, Shia, Ben-Chang, and Chen, MingChih
- Published
- 2023
27. Sentiment Analysis of Telemedicine Applications on Twitter Using Lexicon-Based and Naive Bayes Classifier Methods
- Author
-
Arid Hasan, Yudhi Raymond Ramadhan, and Minarto Minarto
- Subjects
lexicon based, naive bayes classifier, sentiment analysis, telemedicine applications, Electronic computers. Computer science, QA75.5-76.95, Computer engineering. Computer hardware, TK7885-7895
- Abstract
Since the onset of the COVID-19 pandemic in Indonesia, many people have turned to telemedicine programs as an alternative to minimize social interactions, opting for consultations from the safety of their homes using smartphones and internet connectivity. Given the necessity for physical distancing and avoiding crowded places, these applications have become indispensable substitutes for in-person medical consultations. Numerous apps facilitating access to healthcare services have been introduced in Indonesia, ranging from business startups to initiatives by the Ministry of Health. Telemedicine can potentially revolutionize healthcare in Indonesia, addressing critical health challenges. A significant issue within Indonesia's healthcare system is the scarcity of doctors and their uneven distribution. With only four doctors per 10,000 people, this figure falls far below the WHO guideline of 10 doctors per 1,000. Sentiment analysis of these applications was conducted to evaluate how telemedicine applications meet public needs and offer an alternative solution. Lexicon-based and naive Bayes methods were employed to classify tweet data into positive, neutral, and negative sentiments. The results revealed 908 positive tweets, 172 negative tweets, and 168 neutral tweets, indicating predominantly positive public perceptions of telemedicine applications. The naive Bayes classifier exhibited a 74% accuracy rate, with a precision of 98% and a recall of 86%. These findings underscore the positive impact and acceptance of telemedicine applications among the Indonesian populace, emphasizing their significance in augmenting the nation's healthcare landscape.
- Published
- 2023
- Full Text
- View/download PDF
28. Sentiment Analysis of Social Media Platform Reviews Using the Naïve Bayes Classifier Algorithm
- Author
-
Sudin Saepudin, Selviani Widiastuti, and Carti Irawan
- Subjects
sentiment analysis, google play store, nbc (naïve bayes classifier), social media platform, python ,Information technology ,T58.5-58.64 - Abstract
The COVID-19 pandemic has caused significant changes in people's lifestyles, which are further reinforced by the rapid development of technology. This has increased internet use and accelerated the dissemination of information through social media platforms. Beyond self-expression, social media can also serve as a means of communication, information, and education, and even as a marketing tool. Several social media platforms have recently become popular and widely used; their user numbers increase from year to year, and each user can leave a rating and review of the application. To gauge public opinion on social media platforms, sentiment analysis is carried out on several social media applications on the Google Play Store, namely Twitter, Instagram and TikTok, and the results can later be used as material for evaluating these applications. In this study, the dataset was built from the ratings in user reviews on the Google Play Store, and classification used the NBC (Naïve Bayes Classifier) method implemented in the Python programming language. Based on testing of 1000 review comments from each application, the majority expressed positive sentiment (Twitter 57.2%, Instagram 74.1%, TikTok 83.9%), with the remainder negative (Twitter 42.8%, Instagram 25.9%, TikTok 16.1%); the accuracy rate was 85.6% for the Twitter application, 83.6% for the Instagram application, and 84.8% for the TikTok application.
- Published
- 2023
- Full Text
- View/download PDF
29. Fault Diagnosis Method of Impulse Impedance Characteristic Spectrum Based on Naive Bayes Classifier
- Author
-
Huang, Baoming, primary
- Published
- 2024
- Full Text
- View/download PDF
30. Naive Bayes Classifier-Based Smishing Detection Framework to Reduce Cyber Attack
- Author
-
Kaur, Gaganpreet, primary, Singh, Kiran Deep, additional, Arora, Jatin, additional, Bagchi, Susama, additional, Debnath, Sanjoy Kumar, additional, and Kumar, A. V. Senthil, additional
- Published
- 2024
- Full Text
- View/download PDF
31. Use of Naïve Bayes Classifier to Assess the Effects of Antipsychotic Agents on Brain Electrical Activity Parameters in Rats
- Author
-
Sysoev, Yu. I., Shits, D. D., Puchik, M. M., Prikhodko, V. A., Idiyatullin, R. D., Kotelnikova, A. A., and Okovityi, S. V.
- Published
- 2022
- Full Text
- View/download PDF
32. HabagatPlus: Providing Recommendations for Localized Class Suspension in Schools During Inclement Weather Conditions Using Naive Bayes Classifier.
- Author
-
Michael Junice M. Salvilla and Aleta C. Fabregas
- Published
- 2024
- Full Text
- View/download PDF
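33. Implementasi Smote untuk Mengatasi Class Imbalance pada Naive Bayes Classifier dalam Analisis Sentimen Calon Presiden di Pemilihan Umum 2024 [Implementing SMOTE to Address Class Imbalance in the Naive Bayes Classifier for Sentiment Analysis of Presidential Candidates in the 2024 General Election]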
- Author
-
SETIANINGSIH, Susi
- Abstract
The euphoria of the 2024 general election has been felt since 2022, particularly around the presidential and vice-presidential election. Many institutions in Indonesia conduct surveys to map the candidates' strength and gauge their chances in the contest. However, this approach is considered less effective, so alternatives are needed, such as acquiring social media data, for example from Twitter, or 'X'. Sentiment analysis on the social media platform X can be used to extract information from the various sentiments users express in tweets. The Naive Bayes Classifier technique is used for sentiment classification because it gives reasonably good results. However, tweet data tends to be imbalanced across classes, so an approach is needed to address class imbalance, namely SMOTE (Synthetic Minority Oversampling Technique). The dataset was collected with an API token from the X application. The tweet data is labelled and then used for modelling. The final result is obtained by selecting the model with the highest accuracy. To make the analysis easier to carry out, a website interface was chosen to support the sentiment analysis. The website is built with the Streamlit framework in the Python programming language. The results show that Test 3, with 2400 training data, achieved the best accuracy, reaching 72.80%. Accuracy was 15-23% higher in the test groups that applied SMOTE to the Naive Bayes model. Differences in the amount of data used also affected test accuracy, improving it by 1-10%.
- Published
- 2024
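Illustrative note on the SMOTE step named in record 33: the abstract does not give implementation details, so the following is only a minimal sketch of the core SMOTE idea, which creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy 2-D data are assumptions made for illustration; tweets would first need to be turned into numeric feature vectors.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class vectors.

    Hypothetical helper: requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (squared Euclidean distance),
        # excluding x itself.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between x and the chosen neighbour.
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic


minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extra = smote(minority_class, n_new=5)
print(len(extra))
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already covered by the minority class; oversampling would normally be applied to the training split only.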
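34. Analisis Sentimen Twitter Terhadap Indeks Harga Saham Gabungan Menggunakan Metode Term Frequency – Inverse Document Frequency dan Naïve Bayes Classifier [Twitter Sentiment Analysis of the Composite Stock Price Index Using the Term Frequency – Inverse Document Frequency and Naïve Bayes Classifier Methods]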
- Author
-
RISBIYANTO, Ihsan Puntadewa
- Abstract
Stocks are one of the legal investment instruments in Indonesia; the average price of all stocks traded in Indonesia is called the IHSG (Indeks Harga Saham Gabungan, the composite stock price index). Many factors influence the rise and fall of the IHSG, one of which is the sentiment of the Indonesian public itself: the IHSG tends to fall when public sentiment is negative and tends to rise when public sentiment is positive. Twitter is a social media platform on which users can freely express opinions, and many Indonesians use Twitter to write their opinions about the IHSG. A method is therefore needed to classify public sentiment on Twitter. Here, sentiment analysis is used to classify tweets about the IHSG to determine whether a tweet is positive or negative. The sentiment analysis model is built by applying the TF-IDF method to tweet data from Twitter. TF-IDF is used to determine the relation of a word (term) to a document or sentence by assigning a weight or value to each word. The sentiment class is then determined using the Naïve Bayes method. The result of this research is an application for sentiment analysis of Twitter users toward the composite stock price index using the Term Frequency - Inverse Document Frequency and Naïve Bayes classifier methods, with a model accuracy of 90.3%. Precision is 90.3% and recall is 95.9%, with a total dataset of 6019, split into 4815 (80%) training data and 1203 (20%) testing data. The application has 2 types of pages: a landing page and an admin page. The website can classify tweet data from 2 types of input: text and CSV.
- Published
- 2024
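Illustrative note on the pipeline described in record 34: the abstract weights terms with TF-IDF and then assigns a sentiment class with Naïve Bayes. The sketch below shows one common way to combine the two in plain Python, treating TF-IDF weights as fractional counts in a multinomial naive Bayes model with Laplace smoothing. The function names, the smoothed IDF formula, the `alpha` parameter, and the toy English tokens are all assumptions for illustration and are not taken from the study.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return ({term: tf-idf weight} per tokenized document, idf table)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant) keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def train_nb(vectors, labels, alpha=1.0):
    """Fit multinomial naive Bayes on TF-IDF weights used as fractional counts."""
    vocab = {t for v in vectors for t in v}
    class_docs = Counter(labels)
    weight = defaultdict(lambda: defaultdict(float))
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            weight[y][t] += w
    model = {}
    for y in class_docs:
        total = sum(weight[y].values()) + alpha * len(vocab)
        model[y] = {
            "log_prior": math.log(class_docs[y] / len(labels)),
            # Laplace-smoothed log-likelihood of each vocabulary term.
            "log_lik": {t: math.log((weight[y][t] + alpha) / total) for t in vocab},
        }
    return model


def predict(model, idf, doc):
    """Return the class with the highest log-posterior score for a token list."""
    tf = Counter(t for t in doc if t in idf)  # terms unseen in training are ignored
    n = sum(tf.values()) or 1
    scores = {}
    for y, params in model.items():
        score = params["log_prior"]
        for t, c in tf.items():
            score += (c / n) * idf[t] * params["log_lik"][t]
        scores[y] = score
    return max(scores, key=scores.get)


train_docs = [
    ["index", "up", "optimistic"],
    ["stock", "profit", "up"],
    ["index", "down", "loss"],
    ["stock", "crash", "down"],
]
train_labels = ["positive", "positive", "negative", "negative"]
vectors, idf = tfidf_vectors(train_docs)
model = train_nb(vectors, train_labels)
print(predict(model, idf, ["index", "up"]))
```

Working in log space avoids numeric underflow when many term likelihoods are multiplied, and the smoothing term keeps a word that never appeared in one class from driving that class's score to negative infinity.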
35. Symmetrical and Asymmetrical Sampling Audit Evidence Using a Naive Bayes Classifier
- Author
-
Guang-Yih Sheu and Nai-Ru Liu
- Subjects
symmetrical sampling ,asymmetrical sampling ,audit evidence ,representativeness index ,Naive Bayes classifier ,Mathematics ,QA1-939 - Abstract
Taiwan’s auditors have suffered from processing excessive audit data, including drawing audit evidence. This study advances sampling techniques by integrating machine learning with sampling. This machine learning integration helps avoid sampling bias, keep randomness and variability, and target riskier samples. We first classify data using a Naive Bayes classifier into some classes. Next, a user-based, item-based, or hybrid approach is employed to draw audit evidence. The representativeness index is the primary metric for measuring its representativeness. The user-based approach samples data symmetrically around the median of a class as audit evidence. It may be equivalent to a combination of monetary and variable samplings. The item-based approach represents asymmetric sampling based on posterior probabilities for obtaining risky samples as audit evidence. It may be identical to a combination of non-statistical and monetary samplings. Auditors can hybridize those user-based and item-based approaches to balance representativeness and riskiness in selecting audit evidence. Three experiments show that sampling using machine learning integration has the benefits of drawing unbiased samples; handling complex patterns, correlations, and unstructured data; and improving efficiency in sampling big data. However, the limitations are the classification accuracy output by machine learning algorithms and the range of prior probabilities.
- Published
- 2024
- Full Text
- View/download PDF
36. Naive Bayes classifier assisted automated detection of cerebral microbleeds in susceptibility-weighted imaging brain images
- Author
-
Ateeq, Tayyab, Faheem, Zaid Bin, Ghoneimy, Mohamed, Ali, Jehad, Li, Yang, and Baz, Abdullah
- Subjects
Intracerebral hemorrhage -- Diagnosis ,Magnetic resonance imaging -- Methods ,Algorithms -- Usage ,Algorithm ,Biological sciences - Abstract
Cerebral microbleeds (CMBs) in the brain are the essential indicators of critical brain disorders such as dementia and ischemic stroke. Generally, CMBs are detected manually by experts, which is an exhaustive task with limited productivity. Since CMBs have complex morphological nature, manual detection is prone to errors. This paper presents a machine learning-based automated CMB detection technique in the brain susceptibility-weighted imaging (SWI) scans based on statistical feature extraction and classification. The proposed method consists of three steps: (1) removal of the skull and extraction of the brain; (2) thresholding for the extraction of initial candidates; and (3) extracting features and applying classification models such as random forest and naive Bayes classifiers for the detection of true positive CMBs. The proposed technique is validated on a dataset consisting of 20 subjects. The dataset is divided into training data that consist of 14 subjects with 104 microbleeds and testing data that consist of 6 subjects with 63 microbleeds. We were able to achieve 85.7% sensitivity using the random forest classifier with 4.2 false positives per CMB, and the naive Bayes classifier achieved 90.5% sensitivity with 5.5 false positives per CMB. The proposed technique outperformed many state-of-the-art methods proposed in previous studies. Key words: cerebral microbleeds, random forest classifier, naive Bayes classifier, brain bleeds, hemosiderin deposits, 1. Introduction Cerebral microbleeds (CMBs) are small bleedings inside the brain that are believed to be caused by leakage of blood from blood vessels due to some medical conditions (Greenberg [...]
- Published
- 2023
- Full Text
- View/download PDF
37. Automated Classification of Cancer using Heuristic Class Topper Optimization based Naïve Bayes Classifier
- Author
-
Kukreja, Sonia, Sabharwal, Munish, Katiyar, Alok, and Gill, D. S.
- Published
- 2024
- Full Text
- View/download PDF
38. Comparison of correlated algorithm accuracy Naive Bayes Classifier and Naive Bayes Classifier for heart failure classification
- Author
-
Pungkas Subarkah, Wenti Risma Damayanti, and Reza Aditya Permana
- Subjects
classification ,correlated naive bayes classifier ,naive bayes classifier ,heart failure ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Heart failure (HF) is a health problem with relatively high mortality and morbidity rates in both developed and developing countries, including Indonesia. In 2016, WHO stated that 17.5 million people died from cardiovascular disease, while in 2008, HF disease represented 31% of patient deaths worldwide. One of the new breakthroughs for early diagnosis is utilizing data mining techniques. In this study, the Correlated Naive Bayes Classifier (C-NBC) and Naive Bayes Classifier (NBC) algorithms are used to obtain the best accuracy results on the Heart Failure dataset. The test results show that the C-NBC algorithm, with an accuracy of 80.6%, outperforms the NBC algorithm, with an accuracy of 67.5%. Based on these results, the C-NBC algorithm can be used to diagnose patients with heart failure because it has a high level of accuracy and is categorized as Good Classification.
- Published
- 2022
- Full Text
- View/download PDF
39. Emotion Classification in Bangla Text Data Using Gaussian Naive Bayes Classifier: A Computational Linguistic Study
- Author
-
S M Abdullah Shafi, Myesha Samia, and Sultanul Arifeen Hamim
- Subjects
Emotion Detection ,Natural Language Processing ,Naïve Bayes ,Machine Learning ,Technology - Abstract
Emotion analysis from Bengali text data is challenging due to the intricate structure of the language itself and lack of resource availability tailored to Sentiment Classification. In this paper, the authors have used machine learning algorithms, particularly Gaussian Naive Bayes and Support Vector Machine, for the classification of six emotions in Bengali text. The data is comprehensively pre-processed through segmentation, emoticon handling, removal of stop words, and stemming. It uses feature selection techniques like unigram, bi-gram, and term frequency-inverse document frequency to improve classification accuracy. The main aim of the paper is to present an in-depth analysis of emotion detection in Bengali text, which would be very helpful to scholars working on NLP problems in non-English languages. This research, hence, fills up the gap in emotion analysis research for Bengali text, which has comparatively remained underexplored compared to other languages. The methodology involves dataset preparation, extensive preprocessing, feature extraction with selection, and classification. After rigorous experimentation, the accuracy attained with the GNB classifier is 93.83%, proving the effectiveness of the proposed model in capturing subtle emotional nuances in Bengali text.
- Published
- 2024
40. Diagnosis and multiclass classification of diabetic retinopathy using enhanced multi thresholding optimization algorithms and improved Naive Bayes classifier.
- Author
-
Usharani Bhimavarapu
- Published
- 2024
- Full Text
- View/download PDF
41. Ok-NB: An Enhanced OPTICS and k-Naive Bayes Classifier for Imbalance Classification With Overlapping.
- Author
-
Zahid Ahmed, Biju Issac, and Sufal Das
- Published
- 2024
- Full Text
- View/download PDF
42. Threshold-based Naïve Bayes classifier.
- Author
-
Maurizio Romano, Giulia Contu, Francesco Mola, and Claudio Conversano
- Published
- 2024
- Full Text
- View/download PDF
43. Application of The Naïve Bayes Classifier Algorithm to Classify Community Complaints
- Author
-
Keszya Wabang, Oky Dwi Nurhayati, and Farikhin
- Subjects
classification ,complaints/ community reports ,naive bayes classifier ,Systems engineering ,TA168 ,Information technology ,T58.5-58.64 - Abstract
Unsatisfactory public services encourage the public to submit complaints/ reports to public service providers to improve their services. However, each complaint/ report submitted varies. Therefore, the first step of the community complaint resolution process is to classify every incoming community complaint. The Ombudsman of The Republic of Indonesia annually receives a minimum of 10,000 complaints, with an average of 300-500 reports per province per year, and classifies complaints/ community reports into three classes, namely simple reports, medium reports, and heavy reports. The classification process is carried out by weighting each complaint/ report on 5 (five) attributes. This is a big job if done manually, and it makes the working time of complaint management officers inefficient. As an alternative solution, in this study, a machine learning method with the Naïve Bayes Classifier algorithm was applied to classify complaints/ community reports automatically, making the process more effective and efficient. The results showed that classifying complaints/ community reports with the Naïve Bayes Classifier algorithm gives a high accuracy value of 92%. In addition, the average precision, recall, and f1-score values are 91%, 93%, and 92%, respectively.
- Published
- 2022
- Full Text
- View/download PDF
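44. SISTEM PENDUKUNG KEPUTUSAN REKOMENDASI TOPIK JUDUL SKRIPSI MENGGUNAKAN METODE NAIVE BAYES CLASSIFIER [Decision Support System for Recommending Thesis Title Topics Using the Naive Bayes Classifier Method]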
- Author
-
Andi Wijaya
- Subjects
decision support system, thesis, naïve bayes classifier, prototyping ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
A Decision Support System (DSS) is a computer-based interactive application that combines data and mathematical models to assist the decision-making process in dealing with a problem.[1] In the Informatics Study Program of the Faculty of Engineering at Nurul Jadid University, students go through several stages in preparing a thesis: submission of thesis titles, submission of thesis proposals, proposal seminars, research, and thesis guidance. Once the writing is considered finished, students present their thesis results at the thesis exam before the examining lecturers; students who pass with revisions then revise according to the examiners' input. A problem that students in the program often experience arises at the thesis title submission stage, where they have difficulty determining a thesis title topic. A Decision Support System for Thesis Title Topic Recommendations was therefore created using the Naive Bayes Classifier method. It aims to help lecturers and students manage the recommendation criteria for thesis title topics, the thesis title recommendations themselves, and thesis title submissions. The system was developed with the prototyping method, using the PHP programming language, a MySQL database, and the Naive Bayes Classifier method, and it was designed using flowcharts, DFD, and ERD. Testing of the Naive Bayes Classifier method produced very good results on Likert-scale calculations, with a high accuracy value of 96.6%.
- Published
- 2022
- Full Text
- View/download PDF
45. Positive Feature Values Prioritized Hierarchical Redundancy Eliminated Tree Augmented Naive Bayes Classifier for Hierarchical Feature Spaces
- Author
-
Wan, Cen
- Subjects
Computer Science - Machine Learning ,Quantitative Biology - Genomics - Abstract
The Hierarchical Redundancy Eliminated Tree Augmented Naive Bayes (HRE-TAN) classifier is a semi-naive Bayesian model that learns a type of hierarchical redundancy-free tree-like feature representation to estimate the data distribution. In this work, we propose two new types of positive feature values prioritized hierarchical redundancy eliminated tree augmented naive Bayes classifiers that focus on features bearing positive instance values. The two newly proposed methods are applied to 28 real-world bioinformatics datasets showing better predictive performance than the conventional HRE-TAN classifier. Comment: arXiv admin note: substantial text overlap with arXiv:2202.04105
- Published
- 2022
46. On the Existence of the Adversarial Bayes Classifier (Extended Version)
- Author
-
Awasthi, Pranjal, Frank, Natalie S., and Mohri, Mehryar
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Adversarial robustness is a critical property in a variety of modern machine learning applications. While it has been the subject of several recent theoretical studies, many important questions related to adversarial robustness are still open. In this work, we study a fundamental question regarding Bayes optimality for adversarial robustness. We provide general sufficient conditions under which the existence of a Bayes optimal classifier can be guaranteed for adversarial robustness. Our results can provide a useful tool for a subsequent study of surrogate losses in adversarial robustness and their consistency properties. This manuscript is the extended and corrected version of the paper "On the Existence of the Adversarial Bayes Classifier" published in NeurIPS 2021. There were two errors in theorem statements in the original paper -- one in the definition of pseudo-certifiable robustness and the other in the measurability of $A^\epsilon$ for arbitrary metric spaces. In this version we correct the errors. Furthermore, the results of the original paper did not apply to some non-strictly convex norms and here we extend our results to all possible norms. Comment: 27 pages, 3 figures. Version 2: Corrects 2 errors in the paper "On the Existence of the Adversarial Bayes Classifier" published in NeurIPS. Version 3: Update to acknowledgements
- Published
- 2021
47. A New Approach for Discontinuity Extraction Based on an Improved Naive Bayes Classifier
- Author
-
Guangyin Lu, Xudong Zhu, Bei Cao, Yani Li, Chuanyi Tao, and Zicheng Yang
- Subjects
rock mass ,point cloud ,discontinuity ,automatic extraction ,machine learning ,Naive Bayes classifier ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
An increasing number of methods are being used to extract rock discontinuities from 3D point cloud data of rock surfaces. In this paper, a new method for automatic extraction of rock discontinuity based on an improved Naive Bayes classifier is proposed. The method first uses principal component analysis to find the normal vectors of the points, and then generates a certain number of random point sets around the selected training points for training the classifier. The trained, improved Naive Bayes classifier is based on point normal vectors and is able to automatically remove noise points due to various reasons in conjunction with the knee point algorithm, realizing high-precision extraction of the discontinuity sets. Subsequently, the individual discontinuities are segmented using a hierarchical density-based spatial clustering method with noise application. Finally, the PCA algorithm is used to complete the orientation by plane fitting the individual discontinuities. The method was applied in two cases, Kingston and Colorado, and the reliability and advantages of the new method were verified by comparing the results with those of previous research, and the discussion and analysis determined the optimal values of the relevant parameters in the algorithm.
- Published
- 2024
- Full Text
- View/download PDF
48. MNBC: a multithreaded Minimizer-based Naïve Bayes Classifier for improved metagenomic sequence classification.
- Author
-
Lu, Ruipeng, Dumonceaux, Tim, Anzar, Muhammad, Zovoilis, Athanasios, Antonation, Kym, Barker, Dillon, Corbett, Cindi, Nadon, Celine, Robertson, James, Eagle, Shannon H C, Lung, Oliver, Rudar, Josip, Surujballi, Om, and Laing, Chad
- Subjects
*PLURALITY voting , *METAGENOMICS , *DATABASES , *CLASSIFICATION , *CENTRIFUGES - Abstract
Motivation: State-of-the-art tools for classifying metagenomic sequencing reads provide both rapid and accurate options, although the combination of both in a single tool is a constantly improving area of research. The machine learning-based Naïve Bayes Classifier (NBC) approach provides a theoretical basis for accurate classification of all reads in a sample. Results: We developed the multithreaded Minimizer-based Naïve Bayes Classifier (MNBC) tool to improve the NBC approach by applying minimizers, as well as plurality voting for closely related classification scores. A standard reference- and test-sequence framework using simulated variable-length reads benchmarked MNBC with six other state-of-the-art tools: MetaMaps, Ganon, Kraken2, KrakenUniq, CLARK, and Centrifuge. We also applied MNBC to the "marine" and "strain-madness" short-read metagenomic datasets in the Critical Assessment of Metagenome Interpretation (CAMI) II challenge using a corresponding database from the time. MNBC efficiently identified reads from unknown microorganisms, and exhibited the highest species- and genus-level precision and recall on short reads, as well as the highest species-level precision on long reads. It also achieved the highest accuracy on the "strain-madness" dataset. Availability and implementation: MNBC is freely available at: https://github.com/ComputationalPathogens/MNBC.
- Published
- 2024
- Full Text
- View/download PDF
49. Urban agglomeration waterlogging hazard exposure assessment based on an integrated Naive Bayes classifier and complex network analysis
- Author
-
Wang, Mo, Fu, Xiaoping, Zhang, Dongqing, Lou, Siwei, Li, Jianjun, Chen, Furong, Li, Shan, and Tan, Soon Keat
- Published
- 2023
- Full Text
- View/download PDF
50. Fairness-Aware Naive Bayes Classifier for Data with Multiple Sensitive Features
- Author
-
Boulitsakis-Logothetis, Stelios
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Fairness-aware machine learning seeks to maximise utility in generating predictions while avoiding unfair discrimination based on sensitive attributes such as race, sex, religion, etc. An important line of work in this field is enforcing fairness during the training step of a classifier. A simple yet effective binary classification algorithm that follows this strategy is two-naive-Bayes (2NB), which enforces statistical parity - requiring that the groups comprising the dataset receive positive labels with the same likelihood. In this paper, we generalise this algorithm into N-naive-Bayes (NNB) to eliminate the simplification of assuming only two sensitive groups in the data and instead apply it to an arbitrary number of groups. We propose an extension of the original algorithm's statistical parity constraint and the post-processing routine that enforces statistical independence of the label and the single sensitive attribute. Then, we investigate its application on data with multiple sensitive features and propose a new constraint and post-processing routine to enforce differential fairness, an extension of established group-fairness constraints focused on intersectionalities. We empirically demonstrate the effectiveness of the NNB algorithm on US Census datasets and compare its accuracy and debiasing performance, as measured by disparate impact and DF-$\epsilon$ score, with similar group-fairness algorithms. Finally, we lay out important considerations users should be aware of before incorporating this algorithm into their application, and direct them to further reading on the pros, cons, and ethical implications of using statistical parity as a fairness criterion. Comment: To be published in the Proceedings of the AAAI 2022 Spring Symposium on Achieving Wellbeing in AI, Stanford University, Palo Alto, California, USA, Mar. 21-23, 2022
- Published
- 2022
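Illustrative note on the fairness criteria named in record 50: statistical parity asks that every group receive positive labels at the same rate, and disparate impact compares those rates as a ratio. The sketch below only measures these quantities on a list of predictions; it is not the NNB algorithm or its post-processing routine. The function names, the min/max convention for more than two groups, and the toy data are assumptions for illustration.

```python
from collections import defaultdict


def positive_rates(predictions, groups):
    """Share of positive (1) predictions within each sensitive group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for y_hat, g in zip(predictions, groups):
        counts[g][0] += int(y_hat == 1)
        counts[g][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def parity_gap(rates):
    """Largest difference in positive rate between any two groups (0 = parity)."""
    return max(rates.values()) - min(rates.values())


def disparate_impact(rates):
    """Lowest positive rate divided by the highest (1 = parity).

    Assumed convention for N groups; with two groups this is the usual ratio.
    """
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


preds = [1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0]
group = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
rates = positive_rates(preds, group)
print(rates, parity_gap(rates), disparate_impact(rates))
```

A parity gap near 0 and a disparate impact ratio near 1 both indicate that groups receive positive labels at similar rates; neither says anything about accuracy, which has to be tracked separately.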