Descriptor: "bayes classifier" / Topic: computer - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"bayes classifier"' showing total 949 results

1. Predicting at-risk university students based on their e-book reading behaviours by using machine learning classifiers

Author: Jian-Xuan Weng, Hiroaki Ogata, Chien-Yuan Su, Stephen J.H. Yang, and Cheng-Huan Chen
Subjects: Learning classifier system, business.industry, media_common.quotation_subject, Decision tree, Context (language use), Academic achievement, Bayes classifier, Machine learning, computer.software_genre, Education, Random forest, Naive Bayes classifier, Reading (process), Artificial intelligence, business, Psychology, computer, media_common
Abstract: Providing early predictions of academic performance is necessary for identifying at-risk students and subsequently providing them with timely intervention for critical factors affecting their academic performance. Although e-book systems are often used to provide students with teaching/learning materials in university courses, research has seldom attempted early prediction based on their online reading behaviours by implementing machine learning classifiers. This study explored to what extent university students’ academic achievement can be predicted, based on their reading behaviours in an e-book supported course, using these classifiers. It further investigated which of the features extracted from the reading logs influence the predictions. The participants were 100 first-year undergraduates enrolled in a compulsory course at a university in Taiwan. The results suggest that logistic regression, support vector classification, decision trees, random forests, and neural networks achieved moderate prediction performance on the accuracy, precision, and recall metrics. The Bayes classifier identified almost all at-risk students. Additionally, student online reading behaviours affecting the prediction models included: turning pages, going back to previous pages and jumping to other pages, adding/deleting markers, and editing/removing memos. These behaviours were significantly positively correlated to academic achievement and should be encouraged during courses supported by e-books. Implications for practice or policy: For identifying at-risk students, educators could prioritise using Gaussian naïve Bayes in an e-book supported course, as it shows almost perfect recall performance. Assessors could give priority to logistic regression and neural networks in this context because they have stable achievement prediction performance with different evaluation metrics. 
The prediction models are strongly affected by student online reading behaviours, in particular by locating/returning to relevant pages and modifying markers.
Published: 2021
Full Text: View/download PDF
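
Illustration for record 1: the abstract recommends Gaussian naïve Bayes because of its near-perfect recall on at-risk students. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The two features (page turns, markers added), the toy values, and the labels `at_risk` / `ok` are all hypothetical and exist only to make the example runnable.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, row):
    """Return the class with the largest log posterior for one sample."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

def recall(y_true, y_pred, positive):
    """Share of truly positive samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0

# Hypothetical reading-log features: [page turns, markers added].
X = [[5, 0], [8, 1], [6, 0], [40, 6], [35, 5], [50, 8]]
y = ["at_risk", "at_risk", "at_risk", "ok", "ok", "ok"]

model = fit_gaussian_nb(X, y)
predictions = [predict(model, row) for row in X]
at_risk_recall = recall(y, predictions, positive="at_risk")
```

Recall is the metric of interest here because a missed at-risk student (a false negative) is the costly error; precision and accuracy would be computed the same way from the confusion counts.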

2. Implementasi Algoritma Naïve Bayes Classifier (NBC) Untuk Analisis Sentimen Komentar Kebijakan Full Day School

Author: Muhammad Halmi Dar, Volvo Sihombing, and Yarma Agustya Dewi Utami
Subjects: Computer science, business.industry, Sentiment analysis, Feature selection, Bayes classifier, Lexicon, Machine learning, computer.software_genre, Naive Bayes classifier, Trigram, Artificial intelligence, business, computer, Selection (genetic algorithm), Test data
Abstract: Sentiment analysis is an important research topic and is currently being developed. Sentiment analysis is carried out to see the opinion or tendency of a person's opinion on a problem or object, whether it tends to have a negative or positive view. The main purpose of this research is to find out public sentiment towards the Full Day School policy from comments on the Facebook Page of the Ministry of Education and Culture of the Republic of Indonesia and to determine the performance of the Naïve Bayes Classifier (NBC) algorithm. The results of this study indicate that the public's negative sentiment towards the Full Day School policy is higher than positive or neutral sentiment. The highest accuracy value, 80%, is achieved by the Naïve Bayes Classifier algorithm with trigram feature selection on the 300-data training model. The simulation shows that the size of the training data and the selection of features used in the NBC algorithm affect the accuracy of the results. Meanwhile, the simulation results from 10 test data with 5 different NBC and Lexicon algorithms also show that the Full Day School policy proposed by the Indonesian Minister of Education and Culture has a higher negative sentiment than positive or neutral among most Facebook users who express opinions through comments.
Published: 2021
Full Text: View/download PDF

3. Implementation of Data Mining using Naïve Bayes Classifier Method in Food Crop Prediction

Author: Halim Fathoni, Oki Arifin, and Kurniawan Saputra
Subjects: Crop, Naive Bayes classifier, Mean squared error, Data mining, Bayes classifier, Prediction system, computer.software_genre, computer, Mathematics
Abstract: Development in Lampung Province is oriented toward the potential of its agricultural sector, mainly food crops. Estimating food crop yield is a crucial problem in this sector, because farmers lack knowledge about expected harvests and climate change has a large impact on yields. A food crop prediction system was therefore developed using data mining with the Naïve Bayes Classifier (NBC), which is expected to provide information that farmers and the food crop industry can use. The classification attributes are temperature (°C), humidity (%), rainfall (mm), and photoperiodicity (hours), with production result (tons) as the class attribute. The research data consist of climate data and food crop yields from the Central Bureau of Statistics (BPS) and the Meteorology, Climatology and Geophysics Agency (BMKG) for Lampung Province from 2010 to 2017. The food crops covered are paddy, maize, and soybean. Evaluated with 10-fold cross-validation, the model achieved an average accuracy of 72.78% and a Root Mean Square Error (RMSE) of 0.438.
Published: 2021
Full Text: View/download PDF

4. An Improved Feature Selection Based on Naive Bayes with Kernel Density Estimator for Opinion Mining

Author: Raja Rajeswari Sethuraman and John Sanjeev Kumar Athisayam
Subjects: Multidisciplinary, business.industry, Computer science, 010102 general mathematics, Kernel density estimation, Estimator, Feature selection, Bayes classifier, Machine learning, computer.software_genre, 01 natural sciences, Naive Bayes classifier, ComputingMethodologies_PATTERNRECOGNITION, Kernel (statistics), Information gain ratio, Artificial intelligence, 0101 mathematics, business, Classifier (UML), computer
Abstract: Opinion mining has gained much attention in recent years due to the rapid growth of social media. It is the task of analyzing customer reviews to make decisions by classifying the reviews as positive or negative. These text reviews have high dimensionality, which leads to the curse of dimensionality. To handle this high dimensionality of text data, an improved gain ratio is proposed to select the features with the highest ranking. A Naïve Bayes classifier with a kernel density function is used to evaluate the feature set. The Naïve Bayes classifier with kernel density estimation is a nonparametric classifier that computes the probability density function based on the kernel estimator. This classifier produces higher accuracy on various benchmark datasets.
Published: 2021
Full Text: View/download PDF
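
Illustration for record 4: the abstract describes a naïve Bayes classifier whose per-feature densities come from a kernel estimator rather than a fitted Gaussian. The sketch below shows that general technique in plain Python under stated assumptions: a Gaussian kernel, a fixed hand-picked bandwidth of 0.5, and a hypothetical one-feature toy data set. It does not reproduce the paper's improved gain ratio feature selection.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of Gaussian kernels centred on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return density

def fit_kde_nb(X, y, bandwidth=0.5):
    """Build one kernel density per (class, feature) pair, plus class priors."""
    model = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(X)
        densities = [gaussian_kde(list(col), bandwidth) for col in zip(*rows)]
        model[label] = (prior, densities)
    return model

def predict_kde_nb(model, row):
    """Pick the class maximising log prior plus summed per-feature log densities."""
    def log_posterior(label):
        prior, densities = model[label]
        # The tiny constant guards against log(0) far from every training sample.
        return math.log(prior) + sum(
            math.log(d(v) + 1e-300) for d, v in zip(densities, row)
        )
    return max(model, key=log_posterior)

# Hypothetical data: the "pos" class is bimodal, which a single Gaussian fits poorly.
X = [[-3.0], [-3.2], [3.0], [3.1], [0.0], [0.2], [-0.1], [0.1]]
y = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]

kde_model = fit_kde_nb(X, y)
label_near_mode = predict_kde_nb(kde_model, [2.9])
label_near_zero = predict_kde_nb(kde_model, [0.05])
```

The bimodal `pos` class is the point of the example: its mean sits near zero, on top of the `neg` class, so a single fitted Gaussian would blur the two modes, while the kernel estimate keeps them separate.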

5. An efficient classification of secure and non-secure bug report material using machine learning method for cyber security

Author: Ravi Rastogi and Zaher Shuraym M. Alharthi
Subjects: 010302 applied physics, Artificial neural network, Computer science, business.industry, Software development, Natural language programming, 02 engineering and technology, Bayes classifier, 021001 nanoscience & nanotechnology, Machine learning, computer.software_genre, 01 natural sciences, Field (computer science), ComputingMethodologies_PATTERNRECOGNITION, n-gram, Software, 0103 physical sciences, Feature (machine learning), Artificial intelligence, 0210 nano-technology, business, computer
Abstract: In the field of software development, a main problem is recognizing security-oriented issues within reported bugs, owing to their unacceptable rate for providing satisfactory reliability to customers and software datasets. The objective is to propose a novel machine learning approach for multiclass supervised classification, named Bug Severity classification, that overcomes these challenges by using a supervised Artificial Neural Network and a stacking-based Naive Bayes classifier. The proposed approach directly examines the latent and highly descriptive features. First, the bug report text is preprocessed using natural language processing approaches. Then, n-grams are employed to extract features while overcoming data sparsity problems. Next, the supervised Artificial Neural Network extracts the salient feature patterns of the corresponding severity classes. Finally, the stacking-based Naive Bayes classifier is employed to classify multiple bug severity classes.
Published: 2021
Full Text: View/download PDF

6. Bayes Imbalance Impact Index: A Measure of Class Imbalanced Data Set for Classification Problem

Author: Yang Lu, Yuan Yan Tang, and Yiu-ming Cheung
Subjects: Computer Networks and Communications, Computer science, 02 engineering and technology, Bayes classifier, computer.software_genre, Imbalanced data, Computer Science Applications, Impact index, Bayes' theorem, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, F1 score, Classifier (UML), computer, Software
Abstract: Recent studies of imbalanced data classification have shown that the imbalance ratio (IR) is not the only cause of performance loss in a classifier, as other data factors, such as small disjuncts, noise, and overlapping, can also make the problem difficult. The relationship between the IR and other data factors has been demonstrated, but to the best of our knowledge, there is no measurement of the extent to which class imbalance influences the classification performance of imbalanced data. In addition, it is also unknown which data factor serves as the main barrier for classification in a data set. In this article, we focus on the Bayes optimal classifier and examine the influence of class imbalance from a theoretical perspective. We propose an instance measure called the Individual Bayes Imbalance Impact Index (IBI3) and a data measure called the Bayes Imbalance Impact Index (BI3). IBI3 and BI3 reflect the extent of influence using only the imbalance factor, in terms of each minority class sample and the whole data set, respectively. Therefore, IBI3 can be used as an instance complexity measure of imbalance and BI3 as a criterion to demonstrate the degree to which imbalance deteriorates the classification of a data set. We can, therefore, use BI3 to assess whether it is worth using imbalance recovery methods, such as sampling or cost-sensitive methods, to recover the performance loss of a classifier. The experiments show that IBI3 is highly consistent with the increase of the prediction score obtained by the imbalance recovery methods and that BI3 is highly consistent with the improvement in the F1 score obtained by the imbalance recovery methods on both synthetic and real benchmark data sets.
Published: 2020
Full Text: View/download PDF

7. Document-level emotion detection using graph-based margin regularization

Author: Jie Zhang, Pengcheng Hu, and Ruihua Cheng
Subjects: 0209 industrial biotechnology, Computer science, business.industry, Cognitive Neuroscience, Bigram, Sentiment analysis, Unstructured data, 02 engineering and technology, Bayes classifier, Machine learning, computer.software_genre, Regularization (mathematics), Computer Science Applications, 020901 industrial engineering & automation, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: Sentiment analysis aims to automatically detect the underlying attitudes that users express. For the documents with complex unstructured data, such as reviews, emojis and surveys, it is usually hard to precisely identify the real emotion. Thus, it becomes urgent, yet challenging, to develop a technique that can process and make use of the unstructured information. In this article, we consider sentiment classification for those unstructured features extracted from texts. We propose a regularization-based framework to pursue better classification performance by (1) introducing polarity shifters assembled with sentiment words to create novel bigram features and (2) simultaneously constructing a constraint graph to encode the relative polarity among unstructured features to improve the parameter estimation procedure. Under these settings, our approach can uncover the intrinsic semantic information from the unstructured text data. Theoretically, we justify its underlying equivalent connection with the standard Bayes classifier, which is ideally optimal when the sample distribution is known. Moreover, we show that our new method yields better generalization ability due to the reduced solution search space and the appealing asymptotic consistency. The superior performance from real data experiments demonstrates the robustness and effectiveness of the proposed method.
Published: 2020
Full Text: View/download PDF

8. Nonparametric 'anti-Bayesian' quantile-based pattern classification

Author: B. John Oommen, Mostafa Razmkhah, and Fatemeh Mahmoudi
Subjects: Computer science, business.industry, Bayesian probability, Kernel density estimation, Nonparametric statistics, 020207 software engineering, 02 engineering and technology, Bayes classifier, Mixture model, Machine learning, computer.software_genre, Bayes' theorem, ComputingMethodologies_PATTERNRECOGNITION, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, computer, Parametric statistics, Quantile
Abstract: Parametric and nonparametric pattern recognition have been studied for almost a century based on a Bayesian paradigm, which is, in turn, founded on the principles of Bayes theorem. It is well known that the accuracy of the Bayes classifier cannot be exceeded. Typically, this reduces to comparing the testing sample to mean or median of the respective distributions. Recently, Oommen and his co-authors have presented a pioneering and non-intuitive paradigm, namely that of achieving the classification by comparing the testing sample with another descriptor, which could also be quite distant from the mean. This paradigm has been termed as being “anti-Bayesian,” and it essentially uses the quantiles of the distributions to achieve the pattern recognition. Such classifiers attain the optimal Bayesian accuracy for symmetric distributions even though they operate with a non-intuitive philosophy. While this paradigm has been applied in a number of domains (briefly explained in the body of this paper), its application for nonparametric domains has been limited. This paper explains, in detail, how such quantile-based classification can be extended to the nonparametric world, using both traditional and kernel-based strategies. The paper analyzes the methodology of such nonparametric schemes and their robustness. From a fundamental perspective, the paper utilizes the so-called large sample theory to derive strong asymptotic results that pertain to the equivalence between the parametric and nonparametric schemes for large samples. Apart from the new theoretical results, the paper also presents experimental results demonstrating their power. These results pertain to artificial data sets and also involve a real-life breast cancer data set obtained from the University Hospital Centre of Coimbra. The experimental results clearly confirm the power of the proposed “anti-Bayesian” procedure, especially when approached from a nonparametric perspective.
Published: 2020
Full Text: View/download PDF
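
Illustration for record 8: the abstract says the "anti-Bayesian" paradigm compares the test sample with quantiles of the class distributions instead of their means, and attains the optimal Bayesian accuracy for symmetric distributions. The sketch below is a simplified one-dimensional, two-class reading of that idea in plain Python, not the authors' procedure. The choice of quantile level `q = 2/3`, the pairing of an upper quantile for the left class with a lower quantile for the right class, and the toy samples are all assumptions made for illustration.

```python
import statistics

def empirical_quantile(samples, q):
    """Linear-interpolation quantile of a sample, for 0 <= q <= 1."""
    ordered = sorted(samples)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def fit_quantile_points(left, right, q=2 / 3):
    """Assumed pairing: upper q-quantile for the left class, lower (1 - q)-quantile for the right."""
    return empirical_quantile(left, q), empirical_quantile(right, 1 - q)

def classify_by_points(x, points):
    """Assign x to the class whose reference point is nearer."""
    left_point, right_point = points
    return "left" if abs(x - left_point) <= abs(x - right_point) else "right"

# Hypothetical symmetric samples with identical shape, shifted by 4.
left = [0, 1, 2, 3, 4, 5, 6]
right = [4, 5, 6, 7, 8, 9, 10]

quantile_points = fit_quantile_points(left, right)
mean_points = (statistics.mean(left), statistics.mean(right))

# For these symmetric samples both rules share the same decision boundary.
quantile_labels = [classify_by_points(x, quantile_points) for x in (1.0, 4.9, 5.1, 9.0)]
mean_labels = [classify_by_points(x, mean_points) for x in (1.0, 4.9, 5.1, 9.0)]
```

With symmetric, equally shaped classes the two quantile points sit at equal distances on either side of the midpoint between the class means, which is why the quantile rule and the mean rule agree on these test values.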

9. Flexible Machine Learning-Based Cyberattack Detection Using Spatiotemporal Patterns for Distribution Systems

Author: Bo Chen, Jianhui Wang, and Mingjian Cui
Subjects: Thesaurus (information retrieval), General Computer Science, business.industry, Computer science, 020209 energy, 020208 electrical & electronic engineering, 02 engineering and technology, Bayes classifier, Machine learning, computer.software_genre, Distribution system, Search engine, 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, Laplacian matrix, business, computer
Abstract: This letter develops a flexible machine learning detection method for cyberattacks in distribution systems considering spatiotemporal patterns. Spatiotemporal patterns are recognized by the graph Laplacian based on system-wide measurements. A flexible Bayes classifier (BC) is used to train spatiotemporal patterns which could be violated when cyberattacks occur. Cyberattacks are detected by using flexible BCs online. The effectiveness of the developed method is demonstrated through standard IEEE 13- and 123-node test feeders.
Published: 2020
Full Text: View/download PDF

10. Predicting Traffic Conditions Using Knowledge-Growing Bayes Classifier

Author: Emir Husni, Kuspriyanto, Surya Michrandi Nasution, and Rahadian Yusuf
Subjects: General Computer Science, Computer science, Decision tree, 02 engineering and technology, Bayes classifier, Machine learning, computer.software_genre, 0502 economics and business, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, traffic prediction, knowledge growing Bayes classifier, 050210 logistics & transportation, Training set, Artificial neural network, business.industry, congestion, Deep learning, 020208 electrical & electronic engineering, 05 social sciences, General Engineering, Support vector machine, machine learning, Traffic conditions, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, F1 score, business, lcsh:TK1-9971, computer
Abstract: Congestion often hinders human mobility. This problem occurs due to the constant increase in vehicles every year. Reliable predictions of traffic conditions would allow drivers to choose their routes to avoid traffic jams while providing the police with traffic management strategies. Therefore, this paper tests the ability of various machine learning methods to predict traffic conditions. The study assesses Neural Networks, Bayes Classifier, Decision Trees, SVM, Deep Neural Network, and Deep Learning. Of these methods, the Decision Tree, Deep Neural Network, and Bayes Classifier show the highest performance in predicting traffic conditions using static data testing. However, in dynamic testing to assess the growth of knowledge, the performance of the Knowledge-Growing Decision Tree tends to decrease as the training data grows. Its performance decreased 3.89 points (88.24% to 84.35%) in accuracy, and 7.55 points (76.25% to 68.70%) for each of precision, recall, and F1 score. Conversely, the Knowledge-Growing Deep Neural Network and Bayes Classifier performed better than the Decision Tree. The performance of the Knowledge-Growing Deep Neural Network increased slightly by 0.35 points (93.38% to 93.73%) for accuracy and 0.69 points (86.77% to 87.64%) in other measurements. Although its performance increased, its processing time is very long, namely 139452.76 seconds and 318832.80 seconds for sub-schemes (a) and (b), respectively. Meanwhile, the Knowledge-Growing Bayes Classifier offers a greater performance increase of 2.3 points (80.52% to 82.82%) for accuracy and 4.6 points (65.63% to 61.03%) for the other performance measurements. In addition, it also scored better for processing time, as predictions only take 3 seconds using sub-scheme (a), and 7 seconds when using sub-scheme (b). Therefore, the paper proposes the Knowledge-Growing Bayes Classifier to predict rapidly changing traffic conditions. This method outperforms the others.
This can be attributed to its ability to 1) adjust to ever-changing traffic conditions; 2) predict the result as soon as the data are acquired; and 3) make decentralized predictions.
Published: 2020
Full Text: View/download PDF

11. Deep Learning vs. Traditional Approaches to Malware Traffic Classification – A Comparative Study

Author: Waldemar Graniszewski, Marcin Iwanowski, Damian Rybicki, and Jacek Krupski
Subjects: Computer science, business.industry, Deep learning, Decision tree, Bayes classifier, Machine learning, computer.software_genre, Convolutional neural network, Bayes' theorem, Traffic classification, Malware, Artificial intelligence, business, Reference model, computer
Abstract: In the paper, decision trees and Bayes classifiers are applied to detect malware traffic in computer networks and compared with the state-of-the-art convolutional neural network reference model. The experiments have been conducted using the known USTC-TFC2016 data set. The results obtained are close to 100% accuracy and are better than the original results produced by the reference method.
Published: 2021
Full Text: View/download PDF

12. Detecting Phishing Websites Using Neural Network and Bayes Classifier

Author: Ravinthiran Partheepan
Subjects: Password, Spoofing attack, business.industry, Computer science, Whitelist, Bayes classifier, Perceptron, Machine learning, computer.software_genre, Phishing, Blacklist, ComputingMilieux_MANAGEMENTOFCOMPUTINGANDINFORMATIONSYSTEMS, Naive Bayes classifier, Artificial intelligence, business, computer
Abstract: Phishing is a social engineering cyberattack that targets naive online users by tricking them into providing sensitive credentials such as passwords, usernames, security numbers, or debit card numbers. Phishing can be performed by masking a webpage as a legitimate page to pull the personal credentials of the user. Many methodologies have been introduced for detecting phishing websites, such as the whitelist or blacklist approach, the visual similarity-based approach, and the meta-heuristic approach; however, online users are still being scammed into revealing sensitive credentials on phishing websites. In this research paper, a novel hybrid methodology, the PB-CUP learner, is proposed, which is based on integrating dimensional and neural learning drawn from the source code, uniform resource locator, and representational state transfer (REST) API to overcome the drawbacks of existing phishing detection techniques. The model gives an accuracy analysis of the Naive Bayes Classifier, Genetic Algorithm, Multi-Layer Perceptron, Multiple Linear Regression, and PB-CUP neural learner, of which the Multi-Layer Perceptron algorithm performed best with an accuracy of 99.17%. The experiments were iteratively analyzed with different orthogonal algorithms to find the best classifier accuracy for phishing website detection.
Published: 2021
Full Text: View/download PDF

13. Botnet Detection Using Bayes Classifier

Author: Deepak Kshirsagar and Prapti Kolpe
Subjects: Bayes' theorem, Software_OPERATINGSYSTEMS, business.industry, Computer science, ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS, Botnet, The Internet, Artificial intelligence, Bayes classifier, Machine learning, computer.software_genre, business, computer
Abstract: In today’s connected world, the risk of being attacked over the internet has increased, which plays a major role in the infection of internet-connected devices. The internet is flooded with different kinds of malware, but we have focused on the harmful effects of botnets. A botnet is a group of devices controlled by a single device to attack and infect other devices over the internet. The devices are called bots and can be any internet-connected device; the single device controlling them is called a botmaster or bot driver. It is crucial to detect botnets quickly since they can perform various malicious activities. We performed different experiments to detect botnets. For experimentation, we used the CICIDS2017 dataset and different machine learning algorithms from Weka. With the ML algorithms, we achieved the highest accuracy of 98.9146% with the NaiveBayesMultinomialText algorithm.
Published: 2021
Full Text: View/download PDF

14. Classification of EEG signals to detect alcoholism using machine learning techniques

Author: Pedro Pedrosa Rebouças Filho, Jardel das C. Rodrigues, Eugenio Peixoto, Victor Hugo C. de Albuquerque, and Arun kumar N
Subjects: Computer science, 02 engineering and technology, Bayes classifier, Electroencephalography, Machine learning, computer.software_genre, 01 natural sciences, Standard deviation, Wavelet packet decomposition, Bayes' theorem, Immune system, Wavelet, Artificial Intelligence, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, medicine, 010306 general physics, medicine.diagnostic_test, business.industry, Perceptron, Support vector machine, Signal Processing, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Biorthogonal wavelet, computer, Software
Abstract: The diagnosis of alcoholism is of great importance not only due to its effects on the individual and society but also the costs to the national health systems. Moreover, there are a large number of people suffering from this disease worldwide. Alcoholism has critical pathological effects on the liver, immune system, brain, and heart. Machine learning techniques are already well known for the classification of biosignals as they offer an efficient way to assist professionals in the automated diagnosis of various diseases, with high accuracy rates. This work presents the classification of alcoholic electroencephalographic (EEG) signals using Wavelet Packet Decomposition (WPD) and machine learning techniques. The experiments were realized using the minimum value, maximum value, mean, standard deviation, power value, the ratio of absolute mean and the absolute mean as features to feed the classifiers. These features were combined with the objective of exploring the feasibility of such features to classify alcoholism. The classification task was performed using Support Vector Machine (SVM), Optimum-Path Forest (OPF), Naïve Bayes, k-Nearest Neighbors (k-NN) and Multi-layer Perceptron (MLP). The results showed maximum values of 99.87% for specificity, sensitivity, positive predictive value (PPV), and accuracy. These results were generated using the Naïve Bayes classifier and the Biorthogonal wavelet family. A comparison with other techniques was performed aiming to validate our approach. The promising results, the inclusion of OPF classifier, and the specific combinations involving the chosen classifiers and wavelet families are the main contributions of this work. Finally, our strategy proved to be very effective in classifying alcoholic EEG signals.
Published: 2019
Full Text: View/download PDF

15. An Intelligently-Focused Crawling for Filtering the e-Learning Documents Using Optimized Hidden Naïve Bayes Classifier

Author: A., S. A. Sahaaya Arul Ramachandran, Mary
Subjects: Numerical Analysis, business.industry, Computer science, Applied Mathematics, E-learning (theory), Crawling, Bayes classifier, Machine learning, computer.software_genre, Computer Science Applications, Computational Theory and Mathematics, Artificial intelligence, business, computer, Analysis
Published: 2019
Full Text: View/download PDF

16. Bayesian network models for probabilistic evaluation of earthquake-induced liquefaction based on CPT and Vs databases

Author: Jilei Hu and Huabei Liu
Subjects: Database, Wave velocity, Probabilistic logic, Liquefaction, Bayesian network, Geology, Bayes classifier, Geotechnical Engineering and Engineering Geology, computer.software_genre, Cone penetration test, Standard penetration test, Soil liquefaction, computer, Mathematics
Abstract: Cone penetration test (CPT) and shear wave velocity (Vs) based databases have been used for the evaluation of earthquake-induced soil liquefaction, but probabilistic evaluation of soil liquefaction using Bayesian network methods has seldom been attempted using CPT and Vs results. In this study, these databases are first used to construct two new Bayesian network (BN) models for predicting the probability of the occurrence of soil liquefaction and then compared with four simplified procedures and a Bayes classifier for soil liquefaction evaluation. The present study shows that the two new BN models are preferred over the simplified procedures and the Bayes classifier. The reasons for the better performance and advantages of the BN models are discussed. In addition, a converging BN model combining CPT, SPT (standard penetration test), and Vs databases is also attempted to further improve the prediction performance and applicability.
Published: 2019
Full Text: View/download PDF

17. Training Normal Bayes Classifier on Distributed Data

Author: Ivan Kholod, S.S. Sokolov, Andrey Shorov, I.G. Mironenko, E.V. Postnikov, E.V. Titkov, and Mikhail Kuprianov
Subjects: Java, Computer science, Computation, Process (computing), 020206 networking & telecommunications, 02 engineering and technology, Extension (predicate logic), Bayes classifier, Transfer (computing), 0202 electrical engineering, electronic engineering, information engineering, General Earth and Planetary Sciences, Join (sigma algebra), 020201 artificial intelligence & image processing, Representation (mathematics), computer, Algorithm, General Environmental Science, computer.programming_language
Abstract: The paper describes an approach to parallelizing the Normal Bayes classifier training algorithm for distributed data. When distributed data are analyzed and the algorithm is executed, the results fail to join properly, so the algorithm must be performed in a distributed manner. For this purpose, we represent the algorithm as a sequential composition of functions. The algorithm is parallelized to work with data distributed horizontally and vertically. This allows the parallel functions of the algorithm to be placed at the data nodes. Experiments show that transferring computations to the data sources decreases training time and network traffic. We implement the algorithm variants as an extension of the industrial-strength Java-based library Xelopes.
Published: 2019
Full Text: View/download PDF

18. Second-Order Markov Assumption Based Bayes Classifier for Networked Data With Heterophily

Author: Dayou Liu, Lina Li, Dong Sa, Yungang Zhu, Jie Liu, Ruochuan Ouyang, and Tingting Li
Subjects: Artificial intelligence, General Computer Science, Computer science, heterophilous networks, Inference, Markov process, Bayes classifier, computer.software_genre, Homophily, symbols.namesake, Naive Bayes classifier, General Materials Science, Node (networking), General Engineering, data mining, Heterophily, machine learning, ComputingMethodologies_PATTERNRECOGNITION, networked data classification, relational classifier, symbols, Markov property, Multinomial distribution, lcsh:Electrical engineering. Electronics. Nuclear engineering, Data mining, lcsh:TK1-9971, computer
Abstract: The classification of networked data is an interesting and challenging problem. Most traditional relational classifiers that are based on the principle of homophily have an unsatisfactory classification performance in networks with heterophily. This is because these methods treat inhomogeneous networks homogeneously. To address this problem, this paper proposes an extension of the network-only Bayes classifier, based on a second-order Markov assumption, for heterophilous networks. First, we estimate the class distribution of an unlabeled node according to the class distribution of its neighbors' neighbors. In this process, we perform this computation on the known and unknown neighbors separately. Second, we combine the two parts using multinomial naïve Bayesian classification. Meanwhile, we pair a relaxation labeling collective inference method (which imports simulated annealing) with this new method to update the class distributions at each iteration. Comparisons of the experimental results demonstrate that the proposed method performs better when the networks are heterophilous.
Published: 2019
Full Text: View/download PDF

19. Towards Learning Spatio-Temporal Data Stream Relationships for Failure Detection in Avionics

Author: Sida Chen, Shigeru Imai, Wennan Zhu, and Carlos A. Varela
Subjects: 020301 aerospace & aeronautics, Computer science, Data stream mining, Dynamic data, Linear model, Statistical model, 02 engineering and technology, Bayes classifier, computer.software_genre, Synthetic data, Temporal database, Flight planning, 0203 mechanical engineering, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, computer
Abstract: Spatio-temporal data streams are often related in complex ways. For example, while the airspeed that an aircraft attains in cruise phase depends on the weight it carries, it also depends on many other factors. Some of these factors are controllable, such as engine inputs or the airframe’s angle of attack, while others are contextual, such as air density or turbulence. It is therefore critical to develop failure models that can help recognize errors in the data, such as an incorrect fuel quantity, a malfunctioning pitot-static system, or other abnormal flight conditions. In this paper, we extend our PILOTS programming language [1] to support machine learning techniques that will help data scientists: (1) create parameterized failure models from data and (2) continuously train a statistical model as new evidence (data) arrives. The linear regression approach learns parameters of a linear model to minimize least squares error for given training data. The Bayesian approach classifies operating modes according to supervised offline training and can discover new statistically significant modes online. As shown in the Tuninter 1153 simulation results, the dynamic Bayes classifier finds discrete error states on the fly, while the error signatures approach requires every error state to be predefined. Using synthetic data, we compare the accuracy, response time, and adaptability of these machine learning techniques. Future dynamic data driven applications systems (DDDAS) using machine learning can identify complex dynamic data-driven failure models, which will in turn enable more accurate flight planning and control for emergency conditions.
Published: 2021
Full Text: View/download PDF

20. CT-Based Radiomics Signature With Machine Learning Predicts MYCN Amplification in Pediatric Abdominal Neuroblastoma

Author: Kaiping Huang, Hao Ding, Ting Zhang, Li Zhang, Ling He, Xin Chen, Huan Liu, Wenqing Yu, and Haoru Wang
Subjects: Cancer Research, Computer science, Bayes classifier, Machine learning, computer.software_genre, Logistic regression, 030218 nuclear medicine & medical imaging, 03 medical and health sciences, Bayes' theorem, neuroblastoma, 0302 clinical medicine, Lasso (statistics), children, MYCN, RC254-282, Original Research, Receiver operating characteristic, business.industry, Neoplasms. Tumors. Oncology. Including cancer and carcinogens, prediction, Nomogram, Random forest, Support vector machine, Oncology, radiomics, 030220 oncology & carcinogenesis, Artificial intelligence, business, computer, abdomen
Abstract: Purpose: MYCN amplification plays a critical role in defining the high-risk subgroup of patients with neuroblastoma. We aimed to develop and validate CT-based machine learning models for predicting MYCN amplification in pediatric abdominal neuroblastoma. Methods: A total of 172 patients, MYCN amplified (n = 47) and non-amplified (n = 125), were enrolled. The cohort was divided into training and testing groups by stratified random sampling. Clinicopathological parameters and radiographic features were selected to construct the clinical predictive model. The regions of interest (ROIs) were segmented on three-phase CT images to extract first-, second- and higher-order radiomics features. The ICC, mRMR and LASSO methods were used for dimensionality reduction. The selected features from the training group were used to establish radiomics models using Logistic regression, Support Vector Machine (SVM), Bayes and Random Forest methods. The performance of the four radiomics models was evaluated according to the area under the receiver operating characteristic (ROC) curve (AUC), and then compared by the DeLong test. A nomogram incorporating clinicopathological parameters, radiographic features and the radiomics signature was developed through multivariate logistic regression. Finally, the predictive performance of the clinical model, radiomics models, and nomogram was evaluated in both training and testing groups. Results: In total, 1,218 radiomics features were extracted from the ROIs on three-phase CT images, and then 14 optimal features, including one original first-order feature, eight wavelet-transformed features and five LoG-transformed features, were identified and selected to construct the radiomics models. In the training group, the AUC of the Logistic, SVM, Bayes and Random Forest models was 0.940, 0.940, 0.780 and 0.927, respectively, and the corresponding AUC in the testing group was 0.909, 0.909, 0.729 and 0.851, respectively. There was no significant difference among the Logistic, SVM and Random Forest models, but all performed better than the Bayes model (p …). Conclusion: The CT-based radiomics signature is able to predict MYCN amplification of pediatric abdominal NB with high accuracy based on SVM, Logistic and Random Forest classifiers, while the Bayes classifier yields lower predictive performance. When combined with clinical and radiographic qualitative features, the clinics-radiomics nomogram can improve the performance of predicting MYCN amplification.
Published: 2021

21. Comparing Conventional and Machine-Learning Approaches to Risk Assessment in Domestic Abuse Cases

Author: Jeffrey Grogger, Sean Gupta, Ria Ivandic, and Tom Kirchmaier
Subjects: Protocol (science), Service (systems architecture), Computer science, business.industry, Download, Failure rate, Bayes classifier, Machine learning, computer.software_genre, False positive paradox, Domestic violence, Artificial intelligence, Risk assessment, business, computer
Abstract: We compare predictions from a conventional protocol-based approach to risk assessment with those based on a machine-learning approach. We first show that the conventional predictions are less accurate than, and have similar rates of negative prediction error as, a simple Bayes classifier that makes use only of the base failure rate. Machine learning algorithms based on the underlying risk assessment questionnaire do better under the assumption that negative prediction errors are more costly than positive prediction errors. Machine learning models based on two-year criminal histories do even better. Indeed, adding the protocol-based features to the criminal histories adds little to the predictive adequacy of the model. We suggest using the predictions based on criminal histories to prioritize incoming calls for service, and devising a more sensitive instrument to distinguish true from false positives that result from this initial screening. Institutional subscribers to the NBER working paper series, and residents of developing countries may download this paper without additional charge at www.nber.org.
Published: 2021
Full Text: View/download PDF
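
Illustrative note on entry 21: a Bayes classifier that uses only the base failure rate has no features to condition on, so it reduces to comparing cost-weighted class priors and assigns every case the same label. The following is a minimal sketch of that idea, not the authors' implementation; the function name, the sample `history` data, and the `cost_fn`/`cost_fp` parameters are hypothetical.

```python
def base_rate_classifier(labels, cost_fn=1.0, cost_fp=1.0):
    """Return (constant_prediction, base_rate) using only the positive base rate.

    labels: iterable of 0/1 outcomes (1 = failure); hypothetical data.
    cost_fn / cost_fp: assumed costs of a false negative / false positive.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("labels must be non-empty")
    base_rate = sum(labels) / len(labels)
    # Expected cost of always predicting 0 is base_rate * cost_fn;
    # expected cost of always predicting 1 is (1 - base_rate) * cost_fp.
    predict_positive = base_rate * cost_fn > (1 - base_rate) * cost_fp
    return (1 if predict_positive else 0), base_rate


# Hypothetical outcome history: 2 failures in 10 cases.
history = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
prediction, rate = base_rate_classifier(history)
```

With equal costs the sketch predicts the majority class for every case; raising `cost_fn` relative to `cost_fp` can flip the constant prediction, which mirrors the abstract's point that results depend on how negative prediction errors are weighted.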

22. Analysis on Protocol-Based Intrusion Detection System Using Artificial Intelligence

Author: Savitri Mandal, A. Sai Sabitha, and Deepti Mehrotra
Subjects: Computer science, Network security, business.industry, Bayesian network, Intrusion detection system, Bayes classifier, computer.software_genre, Random forest, Host-based intrusion detection system, Naive Bayes classifier, Data mining, Protocol-based intrusion detection system, business, computer
Abstract: Network security is one of the major challenges in every field, and intrusion detection systems are used to protect a system, its data, and sensitive information from unauthorized access or harmful activity. The objective of this research work is threefold. The first objective is to apply machine learning approaches such as the Bayes classifier and random forests to an intrusion detection system in order to detect any type of malicious activity. The second objective is to compare the accuracy of the random forest and Bayes classifier methods. The third objective is to find out which algorithm is faster and provides the best results for detecting various attacks. The work focuses on a protocol-based intrusion detection system that understands the HTTP traffic running on a particular web server or system. It can be used on an online server that monitors HTTP or HTTPS. Because HTTP is a basic protocol used for communication between client and server, attackers can use it to exploit web application vulnerabilities. The system analyzes and monitors the dynamic state and behavior of the protocols, thereby protecting the system. This research work examines the performance of several classification algorithms: Bayesian network, Naive Bayes, random forest, and random tree. Feature selection can reduce the amount of information as well as the computational complexity, thus producing more efficient results.
Published: 2021
Full Text: View/download PDF

23. Introduction to Healthcare Information Management and Machine Learning

Author: S. R. Mani Sekhar, Vivek Dosaya, G. M. Siddesh, and Sunilkumar S. Manvi
Subjects: Information management, business.industry, Computer science, Bayes classifier, Machine learning, computer.software_genre, Information science, Field (computer science), Support vector machine, Health care, Data analysis, Artificial intelligence, business, Classifier (UML), computer
Abstract: Health information management deals with the collection, storage, analysis, and management of health data. It draws on various fields such as computer science, information science, information management, medicine, business, and data analytics. Machine learning (ML) is evolving as an emerging field for the healthcare industry. ML methods help in analyzing health data more effectively and in a more timely manner. ML is one of the vital areas of computer science. It offers streamlined solutions to real-world problems by using past learning or past experience. There are different kinds of machine learning algorithms in computer science. This chapter gives an outline of some selected machine learning methods: linear regression, linear discriminant analysis, support vector machine, naive Bayes classifier, neural networks, and decision trees. Each of these techniques is described in detail, which in turn helps readers develop their own solutions to the given problems.
Published: 2021
Full Text: View/download PDF

24. Comparing Conventional and Machine-Learning Approaches to Risk Assessment in Domestic Abuse Cases

Author: Ria Ivandic, Sean Gupta, Tom Kirchmaier, and Jeffrey Grogger
Subjects: Protocol (science), Computer science, business.industry, Mean squared prediction error, 05 social sciences, Failure rate, Bayes classifier, Machine learning, computer.software_genre, 0506 political science, Random forest, 0502 economics and business, 050602 political science & public administration, False positive paradox, Domestic violence, Artificial intelligence, 050207 economics, Risk assessment, business, computer
Abstract: We compare predictions from a conventional protocol-based approach to risk assessment with those based on a machine-learning approach. We first show that the conventional predictions are less accurate than, and have similar rates of negative prediction error as, a simple Bayes classifier that makes use only of the base failure rate. A random forest based on the underlying risk assessment questionnaire does better under the assumption that negative prediction errors are more costly than positive prediction errors. A random forest based on two-year criminal histories does better still. Indeed, adding the protocol-based features to the criminal histories adds almost nothing to the predictive adequacy of the model. We suggest using the predictions based on criminal histories to prioritize incoming calls for service, and devising a more sensitive instrument to distinguish true from false positives that result from this initial screening.
Published: 2020
Full Text: View/download PDF

25. User Location and Collaborative based Recommender System using Naive Bayes Classifier and UIR Matrix

Author: P Sathishkumar, S Deepa, and R Suguna
Subjects: Naive Bayes classifier, Information extraction, Information retrieval, Computer science, Scalability, Collaborative filtering, Recommender system, Bayes classifier, Cluster analysis, computer.software_genre, computer, MovieLens
Abstract: The world is filled with information, and getting the right information is a challenging task for internet users and online buyers. A recommender system helps internet users get the information they need in a short span of time. It acts as an information extraction system that works behind the scenes to make users' searches easier. Recommender systems are classed as content- or item-based (driven by the user's own content or item searches), collaborative (driven by the browsing behavior of similar users), or hybrid (a combination of both). Here a collaborative approach is adopted, which recommends items to users based on their past browsing behavior. In this article, the User-Item-Rating matrix is formulated from the user's personal profile, product ratings, and reviews given by users during their previous browsing history. In this research, user location is considered an important attribute for grouping similar users. The approach also attempts to mitigate the scalability and sparsity problems of traditional collaborative filtering. Thus, the User-Item-Rating (UIR) matrix takes location, ratings, and reviews into account for future recommendations. The Naive Bayes classifier algorithm is used to provide accurate top recommendations to internet users. The data set is taken from the MovieLens and IMDb databases. The accuracy of the recommender system is measured primarily by the F-measure. The experimental results show that the added attributes improve the recommender system.
Published: 2020
Full Text: View/download PDF

26. Enhancing the Data Mining Tool WEKA

Author: Pranav Kotak and Hiral Modi
Subjects: Computer science, business.industry, Decision tree, Bayes classifier, computer.software_genre, Data mining algorithm, Support vector machine, Statistical classification, Data mining, business, Cluster analysis, computer, Graphical user interface, Intuition
Abstract: Data mining is the key to drawing sound conclusions from a large body of data. WEKA has attracted attention because of its wide functionality, which includes data mining algorithms such as the Naïve Bayes Classifier, Support Vector Machine, Decision Tree, and KStar, and which helps various industries work faster and better. Although it provides this functionality, it also has some disadvantages. This paper aims to overcome the major drawbacks of the tool by proposing different approaches. Experiments were carried out for a few of the approaches to validate them.
Published: 2020
Full Text: View/download PDF

27. Optimization of Naïve Bayes Classifier By Implemented Unigram, Bigram, Trigram for Sentiment Analysis of Hotel Review

Author: Ilham Esa Tiffani
Subjects: Computer science, business.industry, Bigram, Sentiment analysis, Bayes classifier, computer.software_genre, Naive Bayes classifier, n-gram, Text mining, Feature (machine learning), Trigram, Artificial intelligence, business, computer, Natural language processing
Abstract: Sound decision-making requires proper analysis of the available information. Sentiment analysis is a data processing technique that can provide such analysis. This study aims to make it easy to classify hotels based on sentiment analysis using the Naïve Bayes Classifier algorithm. As a classification tool, the Naïve Bayes Classifier is considered efficient and simple. The study consists of three stages of the sentiment analysis process. The first stage is text pre-processing, which consists of case transformation, stopword removal, and stemming. The second stage is the implementation of the N-Gram features, namely Unigram, Bigram, and Trigram. An N-Gram feature is a feature that contains a collection of words to be used in the next process. The last stage is the hotel review classification process using the Naïve Bayes Classifier. The Naïve Bayes Classifier was applied to the OpinRank Hotels Review dataset using the Unigram, Bigram, and Trigram features; the results show that Unigram provides better test results than Bigram and Trigram, with an average accuracy of 81.30%.
Published: 2020
Full Text: View/download PDF

28. On the Detection of Shilling Attacks in Federated Collaborative Filtering

Author: Jiang Yangfan, Di Wu, Yipeng Zhou, Yan Wang, and Chao Li
Subjects: User information, business.industry, Computer science, Context (language use), 02 engineering and technology, Recommender system, Bayes classifier, Machine learning, computer.software_genre, Federated learning, Recommendation model, User privacy, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Collaborative filtering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: Federated collaborative filtering (Fed-CF) is a variant of federated learning (FL) models, which can protect user privacy in recommender systems. In Fed-CF, the recommendation model is collectively trained across multiple decentralized clients by exchanging gradients only. However, the decentralized nature of Fed-CF makes it vulnerable to shilling attacks, which can be realized by inserting fake ratings of target items to distort recommendation results. Unfortunately, previous detection algorithms cannot work well in the FL framework, as the original data samples are never disclosed. In this paper, we are the first to systematically study the problem of shilling attacks in the context of federated learning, and propose an effective detection method called Federated Shilling Attack Detector (FSAD) to detect shilling attackers in Fed-CF. We first show the feasibility of shilling attacks in Fed-CF. Next, we design four novel features based on the gradients exchanged among clients. By incorporating these gradient-based features, we train a semi-supervised Bayes classifier to identify shilling attackers effectively. Finally, we conduct extensive experiments based on real-world datasets to evaluate the performance of our proposed FSAD method. The experimental results show that FSAD can detect shilling attackers in Fed-CF with high accuracy, with the F1 value as high as 0.90 on the Netflix dataset, which approaches the performance of the optimal detector that utilizes complete private user information for detection.
Published: 2020
Full Text: View/download PDF

29. Cyberbully Detection Using Term Weighting Scheme and Naïve Bayes Classifier

Author: Rafeena Mohamad Rabii and Maheyzah Md Siraj
Subjects: Emoji, business.industry, Computer science, Feature selection, Bayes classifier, Machine learning, computer.software_genre, Weighting, Support vector machine, Statistical classification, Bayes' theorem, The Internet, Artificial intelligence, business, computer
Abstract: The internet, especially social media, has become a major platform where people interact with each other. Advances in technology allow us to interact regardless of time and place. Unfortunately, not all of these interactions are good or positive. One negative interaction that can happen online is cyberbullying, which has increased rapidly over the years, whether through social media, emails, or texting. It is therefore important to prevent cyberbullying from occurring, which is the motivation for this research. Detecting the presence of cyberbullying is one of the main issues in preventing it. Cyberbullying detection can be challenging because many languages are used around the world, slang and informal language are common, and special characters such as emoji are also used in online conversation. The aim of this research is to detect the presence of text-based cyberbullying in online posts. Two term weighting schemes and two classification algorithms are compared. The weighting schemes, Entropy and Term Frequency - Inverse Document Frequency (TF-IDF), are used for feature selection, and the Naïve Bayes algorithm is compared with the Support Vector Machine (SVM) algorithm. The results show that the Naïve Bayes classifier yields better accuracy when used with TF-IDF, reaching 97.60%. We hope this research gives other researchers insight, particularly those interested in a similar area.
Published: 2020
Full Text: View/download PDF

30. A generative semi-supervised classifier for datasets with unknown classes

Author: Anja Zernig, Stefan Schrunner, Roman Kern, and Bernhard C. Geiger
Subjects: Training set, Computer science, business.industry, Bayesian probability, Supervised learning, 020207 software engineering, 02 engineering and technology, Semi-supervised learning, Bayes classifier, Mixture model, Machine learning, computer.software_genre, Generative model, ComputingMethodologies_PATTERNRECOGNITION, 020204 information systems, Classifier (linguistics), 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, business, Classifier (UML), computer
Abstract: Classification has been tackled by a large number of algorithms, predominantly following a supervised learning setting. Surprisingly little research has been devoted to the problem setting where a dataset is only partially labeled, including even instances of entirely unlabeled classes. Algorithmic solutions that are suited for such problems are especially important in practical scenarios, where the labelling of data is prohibitively expensive, or the understanding of the data is lacking, including cases, where only a subset of the classes is known. We present a generative method to address the problem of semi-supervised classification with unknown classes, whereby we follow a Bayesian perspective. In detail, we apply a two-step procedure based on Bayesian classifiers and exploit information from both a small set of labeled data in combination with a larger set of unlabeled training data, allowing that the labeled dataset does not contain samples from all present classes. This represents a common practical application setup, where the labeled training set is not exhaustive. We show in a series of experiments that our approach outperforms state-of-the-art methods tackling similar semi-supervised learning problems. Since our approach yields a generative model, which aids the understanding of the data, it is particularly suited for practical applications.
Published: 2020
Full Text: View/download PDF

31. BrainChain -A Machine learning Approach for protecting Blockchain applications using SDN

Author: Abdelhakim Hafid, Lyes Khoukhi, Zakaria Abou El Houda, VU VAN, Jean-Baptiste, Département d'Informatique et de Recherche Opérationnelle [Montreal] (DIRO), Université de Montréal (UdeM), Environnement de Réseaux Autonomes (ERA), Institut Charles Delaunay (ICD), and Université de Technologie de Troyes (UTT)-Centre National de la Recherche Scientifique (CNRS)
Subjects: Blockchain, business.industry, Computer science, Entropy, 05 social sciences, 050801 communication & media studies, Denial-of-service attack, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Bayes Classifier, Computer security, computer.software_genre, SDN, 0508 media and communications, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], DNS Amplification, 0502 economics and business, 050211 marketing, The Internet, business, DDoS, computer, [INFO.INFO-CR] Computer Science [cs]/Cryptography and Security [cs.CR]
Abstract: Nowadays, blockchain technology is seen as one of the main technological innovations to emerge since the advent of the internet. Many applications can benefit from blockchain to protect their exchanges. Nonetheless, applications with more restricted interests cannot use public blockchains. Permissioned blockchains promise to combine effectiveness of blockchains with stricter permissions to join blockchain's network. In permissioned blockchain, the number of participating entities is limited compared to public blockchain. However, by targeting the peers of the blockchain, the attackers can easily take control of consensus process and halt the blockchain operations. In this paper, we propose BrainChain, a scalable and efficient scheme to protect permissioned blockchain nodes from the largest ever Distributed Denial of Service (DDoS) attack (i.e., Domain Name System (DNS) amplification attack) in the context of software defined networks (SDN). BrainChain consists of 4 schemes: (1) Flow statistics collection scheme (FS) to gather the features of flows in an efficient way using sFlow; (2) Entropy based scheme (ES) to measure disorder of network features; (3) Bayes Network based Filtering scheme (BF) to classify, based on entropy values, illegitimate DNS requests; and (4) DNS Mitigation (DM) scheme to mitigate in an effective way the illegitimate flows (i.e., illegitimate DNS requests). Experimental results show that BrainChain can quickly and effectively detect and mitigate the attacks (i.e., DNS amplification attacks) with a high accuracy and a small false positive rate making it a promising scheme to protect blockchain applications from DNS Amplification attacks.
Published: 2020
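
Illustrative note on entry 31: the abstract's entropy-based scheme measures the "disorder" of network features. One common way to quantify that is Shannon entropy over the values a feature takes within a time window. The sketch below assumes that reading; the choice of feature (destination addresses) and the sample windows are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical windows of destination addresses seen in DNS responses.
spread_window = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
focused_window = ["10.0.0.9"] * 4

spread_entropy = shannon_entropy(spread_window)    # traffic spread evenly
focused_entropy = shannon_entropy(focused_window)  # traffic aimed at one host
```

A window dominated by a single value scores near zero, while evenly spread traffic scores higher; a downstream classifier could use such entropy values as input features.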

32. Bayes Classifier Chain Based on SVM for Traditional Chinese Medical Prescription Generation

Author: Chaohan Pei, Yun Yang, Chunyang Ruan, and Yanchun Zhang
Subjects: Multi-label classification, 050101 languages & linguistics, Standardization, business.industry, Computer science, 05 social sciences, 02 engineering and technology, Traditional Chinese medicine, Bayes classifier, Machine learning, computer.software_genre, Support vector machine, Class imbalance, Classifier (linguistics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 0501 psychology and cognitive sciences, Artificial intelligence, Medical prescription, business, computer
Abstract: Traditional Chinese Medicine (TCM) plays an important role in the comprehensive treatment of lung cancer. However, the quality of prescriptions from TCM doctors depends on the doctor’s personal experience, which leads to a lack of standardization in TCM prescriptions. We use original clinical TCM prescription data to train a standardized prescription-generating model for TCM therapy. Our model adopts the Bayes Classifier Chain (BCC) algorithm to solve the label correlation problem; its base classifier is a cost-sensitive SVM that targets the class imbalance of the labels. Experimental results on the prescription dataset demonstrate the effectiveness and practicability of the proposed model for prescription generation.
Published: 2020
Full Text: View/download PDF

33. Source Code Authorship Identification Using Tokenization and Boosting Algorithms

Author: Sergey Gorshkov, Dmitry Namiot, Eugene Ilyushin, Vladimir Sukhomlin, and Maxim Nered
Subjects: Source code, Boosting (machine learning), Word embedding, Computer science, media_common.quotation_subject, Bayes classifier, computer.software_genre, Random forest, Abstract syntax, Malware, Programmer, computer, Algorithm, media_common
Abstract: Each programmer has a unique coding style. Source code authorship identification addresses the problem of determining the most likely creator of a piece of source code, in particular for plagiarism cases and disputes about intellectual property violations, as well as to help find the creators of malware. Extracting a unique style also helps maintain the uniformity of code in repositories, taking into account the different influence of individual programmers. Methods proposed so far include those based on random forests and abstract syntax trees, and those based on short n-grams for structure preservation with a Bayes classifier, among others. We present a new model, called StyleIndex, based on tokenization and on tools for analyzing the semantics of programming languages and the context of tokens in the program text, which extracts a unique author style index. The algorithm applies to various programming languages and shows very high classification accuracy. Moreover, our algorithm can not only match source code to a creator whose example programs are available for training, but also, having been trained on other authors, divide programs into categories by their presumed authors, thereby extracting the components that define style as a global concept, independent of specific authors. The main factors that determine style are also identified.
Published: 2020
Full Text: View/download PDF

34. Performance analysis of one-dimensional naïve bayes as a data imputation method for car insurance problems

Author: Hendri Murfi and Natalia Aji Yuwanti
Subjects: Support vector machine, Bayes' theorem, Statistics::Applications, Computer science, Data_GENERAL, Mode (statistics), Statistics::Methodology, Imputation (statistics), Data mining, Bayes classifier, Missing data, computer.software_genre, computer
Abstract: Machine learning methods are widely used to assist human work. Not all data are as expected, however: some data have missing values. Data with missing values must be handled first at the pre-processing stage, for example by imputing the missing values. This study compares missing-value imputation using the mode with imputation using the One-Dimensional Naïve Bayes Classifier (1DNBC), evaluating performance with a Support Vector Machine (SVM) for the prediction of car insurance participation. The better method is judged by accuracy. In the simulation, imputation using the mode and imputation using One-Dimensional Naïve Bayes give the same result, 1.00; on closer examination, the values imputed by the mode and those predicted by One-Dimensional Naïve Bayes turn out to be identical.
Published: 2020
Full Text: View/download PDF
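
Note: the mode baseline mentioned in the abstract of record 34 can be illustrated with a minimal sketch. It covers only mode imputation, not the 1DNBC method, and the function name `impute_mode` and the sample column are hypothetical, not taken from the paper.

```python
from collections import Counter

def impute_mode(values):
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to learn the mode from; return the column unchanged.
        return list(values)
    # most_common(1) returns the value with the highest count
    # (ties are resolved in favor of the value seen first).
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

# Made-up categorical column with two missing entries.
column = ["yes", "no", None, "no", None, "no"]
filled = impute_mode(column)
```

Here both missing entries are filled with "no", the most frequent observed value.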

35. Twitter Sentiment Analysis using Naïve Bayes Classifier with Mutual Information Feature Selection

Author: Maria Arista Ulfa, Budi Irmawati, and Ario Yudo Husodo
Subjects: Application programming interface, Computer science, business.industry, Sentiment analysis, Mutual information, Bayes classifier, computer.software_genre, Cross-validation, Naive Bayes classifier, Identification (information), Artificial intelligence, business, computer, Natural language processing, Sentence
Abstract: Sentiment analysis is a technique for identifying emotions expressed through text. The aim of sentiment analysis is to determine whether an opinion in a sentence or document falls into the positive or the negative category. Twitter is one of the social media platforms frequently used to express opinions. Twitter allows its users to write their opinions on various topics in a tweet. The Twitter data in this study were downloaded through the Twitter Application Programming Interface (API). The data consist of 500 tweets about Lombok tourism with the hashtags #lombok and #woderfullombok. The informative features of each tweet were selected using the Mutual Information method and analyzed using the Naïve Bayes classification model (Naïve Bayes Classifier). Testing the classification of Twitter sentiment into positive and negative categories using 10-fold cross-validation yielded an average accuracy of 97.9%. Keywords: Sentiment Analysis, Twitter, Naïve Bayes Classifier, Mutual Information
Published: 2018
Full Text: View/download PDF

36. Sentiment analysis: a review and comparative analysis over social media

Author: Deepak Singh Tomar, Nikhil Kumar Singh, and Arun Kumar Sangaiah
Subjects: General Computer Science, Computer science, business.industry, Feature extraction, Sentiment analysis, 020206 networking & telecommunications, Context (language use), 02 engineering and technology, Bayes classifier, computer.software_genre, Random forest, Support vector machine, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Social media, Artificial intelligence, Dimension (data warehouse), business, computer, Natural language processing
Abstract: Sentiment analysis is the computational examination of end users' opinions, attitudes and emotions towards a particular topic or product. Sentiment analysis classifies messages according to their polarity: positive, negative, or neutral. Recently, researchers have focused on lexical and machine-learning-based methods for sentiment analysis of social media posts. Social media are micro-blogging sites on which end users can post comments in slang that contains symbols, idioms, misspelled words and sarcastic sentences. Social media data also suffer from the curse of dimensionality, i.e. the high-dimensional nature of the data, which requires specific pre-processing and feature extraction in order to improve classification accuracy. This paper presents a comprehensive overview of sentiment analysis techniques based on recent research and subsequently explores machine learning (SVM, Naive Bayes, Linear Regression and Random Forest) and feature extraction techniques (POS, BOW and HASS tagging) in the context of sentiment analysis over social media data sets. Further, Twitter data sets are scrutinized and pre-processed with the proposed framework, which yields interesting facts about the capabilities and deficiencies of sentiment analysis methods. POS is the most suitable feature extraction technique with the SVM and Naive Bayes classifiers, whereas Random Forest and linear regression provide better results with HASS tagging.
Published: 2018
Full Text: View/download PDF

37. A novel Bayes defect predictor based on information diffusion function

Author: Chengzu Bai, Yaning Wu, Song Huang, Changyou Zheng, and Haijin Ji
Subjects: Information Systems and Management, Computer science, 02 engineering and technology, Bayes classifier, Machine learning, computer.software_genre, Logistic regression, Management Information Systems, Normal distribution, Naive Bayes classifier, Bayes' theorem, Artificial Intelligence, Statistical significance, 0202 electrical engineering, electronic engineering, information engineering, Training set, business.industry, 020207 software engineering, Software metric, Support vector machine, Statistical classification, 020201 artificial intelligence & image processing, Artificial intelligence, business, Classifier (UML), computer, Software
Abstract: Software defect prediction plays a significant part in identifying the most defect-prone modules before software testing. Quite a number of researchers have made great efforts to improve prediction accuracy. However, the problem of insufficient historical data available for within- or cross-project prediction still remains unresolved. Further, it is common practice to use the probability density function of a normal distribution in the Naive Bayes (NB) classifier. Nevertheless, after performing a Kolmogorov–Smirnov test, we find that the 21 main software metrics are not normally distributed at the 5% significance level. Therefore, this paper proposes a new Bayes classifier, which extends the NB classifier with a non-normal information diffusion function, to help solve the problem of lacking appropriate training data for new projects. We conduct three experiments on 34 data sets obtained from 10 open source projects, using only 10%, 6.67%, 5%, 3.33% and 2% of the total data for training, respectively. Four well-known classification algorithms are also included for comparison, namely Logistic Regression, Naive Bayes, Random Tree and Support Vector Machine. All experimental results demonstrate the efficiency and practicability of the new classifier.
Published: 2018
Full Text: View/download PDF

38. Optimization of classifier chains via conditional likelihood maximization

Author: Lu Sun and Mineichi Kudo
Subjects: Conditional likelihood, Computer science, business.industry, Feature selection, Pattern recognition, 02 engineering and technology, Maximization, Bayes classifier, Quadratic classifier, Machine learning, computer.software_genre, ComputingMethodologies_PATTERNRECOGNITION, Artificial Intelligence, 020204 information systems, Signal Processing, Margin classifier, 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Classifier chains, business, computer, Software
Abstract: Multi-label classification associates an unseen instance with multiple relevant labels. In recent years, a variety of methods have been proposed to handle multi-label problems. Classifier chains is one of the most popular multi-label methods because of its efficiency and simplicity. In this paper, we consider optimizing classifier chains from the viewpoint of conditional likelihood maximization. In the proposed unified framework, classifier chains can be optimized in either or both of two aspects: label correlation modeling and multi-label feature selection. We show that previous classifier chains algorithms are special cases of the unified framework. In addition, previous information-theoretic multi-label feature selection algorithms are special cases with different assumptions on the feature and label spaces. Based on these analyses, we propose a novel multi-label method, k-dependence classifier chains with label-specific features, and demonstrate the effectiveness of the method.
Published: 2018
Full Text: View/download PDF

39. Characterizing information propagation patterns in emergencies: A case study with Yiliang Earthquake

Author: Jun Tian, Qingpeng Zhang, Haolin Wang, and Lifang Li
Subjects: Event monitoring, Service (systems architecture), Information retrieval, Emergency management, Computer Networks and Communications, Computer science, Microblogging, business.industry, 05 social sciences, 02 engineering and technology, Library and Information Sciences, Bayes classifier, computer.software_genre, Social media analytics, Categorization, 020204 information systems, 0502 economics and business, 0202 electrical engineering, electronic engineering, information engineering, 050211 marketing, Social media, Data mining, business, computer, Information Systems
Abstract: We developed a Multinomial Naïve Bayes Classifier to categorize microblog posts into five types according to the text content of the posts. Different types of information had significantly different propagation patterns in terms of scale and topological features. Social media users exhibited significantly different interaction patterns for different types of information at different stages. Social media has been playing an increasingly important role in information publishing and event monitoring in emergencies like natural disasters. The propagation of different types of information on social media is critical in understanding the reaction and mobility of social media users during natural disasters. In this research, we analyzed the dynamic social networks formed by reposting (retweeting) behaviors on Weibo.com (the major microblog service in China) during the Yiliang Earthquake. We developed a Multinomial Naïve Bayes Classifier to categorize the microblog posts into five types based on their content, and then characterized the information propagation patterns of the five types of information at different stages after the earthquake occurred. We found that the type of information has a significant influence on the propagation patterns in terms of scale and topological features. This research revealed the important role of information type in the publicity and propagation of disaster-related information, thus generating data-driven insights for timely and efficient emergency management using publicly available social media data.
Published: 2018
Full Text: View/download PDF

40. Comparison of Classifiers for Leak Location in Water Distribution Networks

Author: José M. Bernal de Lázaro, Cristina Verde, Alberto Prieto-Moreno, Marcos Quiñones-Grueiro, and Orestes Llanes-Santiago
Subjects: 0209 industrial biotechnology, Leak, Artificial neural network, Distribution networks, Computer science, 0208 environmental biotechnology, 02 engineering and technology, Bayes classifier, computer.software_genre, 020801 environmental engineering, k-nearest neighbors algorithm, Support vector machine, ComputingMethodologies_PATTERNRECOGNITION, 020901 industrial engineering & automation, Control and Systems Engineering, Robustness (computer science), Data mining, computer
Abstract: In this paper, the use of supervised classifiers for leak location in water distribution networks (WDN) is discussed. A comparative study is presented in the context of a benchmark network under the same leak and sensor placement scenarios. The comparison considers four classification tools widely used in the pattern recognition framework: Nearest Neighbor, Bayes Classifier, Artificial Neural Networks and Support Vector Machines. The classifiers were selected by considering their different working principles and application advantages. Training and testing sets are formed from the residuals generated using the EPANET hydraulic simulator. The robustness of the methods is compared with respect to leak location performance under model parameter uncertainty, demand uncertainty, leak size uncertainty and sensor noise. The SVM performs similarly to or better than the other classifiers when all uncertainties are present.
Published: 2018
Full Text: View/download PDF

41. Identification with machine learning techniques of a classification model for the degree of damage to rubber-textile conveyor belts with the aim to achieve sustainability

Author: Miriam Andrejiová, Daniela Marasova, and Anna Grincova
Subjects: Computer science, business.industry, General Engineering, Decision tree, 020101 civil engineering, Conveyor belt, Regression analysis, 02 engineering and technology, Bayes classifier, Machine learning, computer.software_genre, Logistic regression, 0201 civil engineering, Naive Bayes classifier, Identification (information), 020303 mechanical engineering & transports, Cohen's kappa, 0203 mechanical engineering, General Materials Science, Artificial intelligence, business, computer
Abstract: This article presents the results of experimental research on belt conveyance systems. The main objective was to identify the correlations between the occurrence of significant damage in rubber-textile conveyor belts and the selected parameters (the type of falling material and the impact height). The conveyor belt specimens used in the experimental research were extracted from both a new and a renovated conveyor belt. Within the experimental research, four classification models were created, while the conveyor belt specimens used in the predefined experimental conditions were classified by assigning them to one of the two determined degrees of damage (significant or insignificant damage). The classification models were created by applying several machine learning methods, such as a regression analysis, logistic regression, decision trees, and the Naive Bayes classifier. The quality of the models was verified using the training and testing groups and three coefficients (overall accuracy, Kappa coefficient and AUC). An analysis of the results indicated that the type of falling material and the impact height had significant effects on the degree of conveyor belt damage, regardless of the conveyor belt type (new or renovated). An evaluation of the models indicated that all the designed classification models provided similar results. As for the quality coefficients, the classification models that were created by applying the decision tree and the Naive Bayes classifier exhibited the best classification and prediction abilities.
Published: 2021
Full Text: View/download PDF

42. A probabilistic clustering model for hate speech classification in twitter

Author: Friday Thomas Ibharalu, Idowu Ademola Osinuga, Adebayo Abayomi-Alli, Femi Emmanuel Ayo, and Olusegun Folorunso
Subjects: 0209 industrial biotechnology, Voice activity detection, Computer science, business.industry, General Engineering, 02 engineering and technology, Bayes classifier, computer.software_genre, Class (biology), Fuzzy logic, Cross-validation, Computer Science Applications, Metadata, ComputingMethodologies_PATTERNRECOGNITION, 020901 industrial engineering & automation, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, tf–idf, business, Cluster analysis, computer, ComputingMilieux_MISCELLANEOUS, Natural language processing
Abstract: The key challenges for automatic hate speech classification on Twitter are the lack of a generic architecture, imprecision, threshold settings and fragmentation issues. Most studies have used binary classifiers for hate speech classification, but these classifiers cannot really capture other emotions that may overlap between the positive and negative classes. Hence, a probabilistic clustering model for hate speech classification on Twitter was developed to tackle these problems. A metadata extractor was used to collect tweets containing hate speech keywords, and crowd-sourced experts were employed to label the collected tweets into two categories: hate speech and non-hate speech. Feature representation was done with the Term Frequency-Inverse Document Frequency (TF-IDF) model and enhanced with topics inferred by a Bayes classifier. A rule-based clustering method was used to automatically classify real-time tweets into the correct topic clusters. Fuzzy logic was then used for hate speech classification using semantic fuzzy rules and a score computation module. From the evaluation results, it was observed that the developed model performed better in hate speech detection, with an F1-score of 0.9256 using 5-fold cross-validation. Similarly, the developed model for hate speech classification performed better, with an F1-score of 91.5, compared to related models. The developed model also proved to be a better test, with an AUC of 0.9645, when compared to similar methods. The paired-sample t-test validated the efficiency of the developed model for hate speech classification.
Published: 2021
Full Text: View/download PDF

43. Dynamic data-driven learning for self-healing avionics

Author: Carlos A. Varela, Wennan Zhu, Shigeru Imai, and Sida Chen
Subjects: Computer Networks and Communications, Computer science, Data stream mining, Dynamic data, Airspeed, Real-time computing, ComputerApplications_COMPUTERSINOTHERSYSTEMS, 020206 networking & telecommunications, 02 engineering and technology, Avionics, Bayes classifier, Integrated modular avionics, computer.software_genre, Bayesian inference, Outlier, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, computer, Software
Abstract: In sensor-based systems, spatio-temporal data streams are often related in non-trivial ways. For example in avionics, while the airspeed that an aircraft attains in cruise phase depends on the weight it carries, it also depends on many other factors such as engine inputs, angle of attack, and air density. It is therefore a challenge to develop failure models that can help recognize errors in the data, such as an incorrect fuel quantity or an incorrect airspeed. In this paper, we present a highly-declarative programming framework that facilitates the development of self-healing avionics applications, which can detect and recover from data errors. Our programming framework enables specifying expert-created failure models using error signatures, as well as learning failure models from data. To account for unanticipated failure modes, we propose a new dynamic Bayes classifier, that detects outliers and upgrades them to new modes when statistically significant. We evaluate error signatures and our dynamic Bayes classifier for accuracy, response time, and adaptability of error detection. While error signatures can be more accurate and responsive than dynamic Bayesian learning, the latter method adapts better due to its data-driven nature.
Published: 2017
Full Text: View/download PDF

44. How to use negative class information for Naive Bayes classification

Author: Youngjoong Ko
Subjects: 02 engineering and technology, Library and Information Sciences, Management Science and Operations Research, Bayes classifier, Machine learning, computer.software_genre, Naive Bayes classifier, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, Mathematics, Soft independent modelling of class analogies, business.industry, 05 social sciences, Supervised learning, Search engine indexing, Pattern recognition, Quadratic classifier, Computer Science Applications, ComputingMethodologies_PATTERNRECOGNITION, Bayes error rate, 020201 artificial intelligence & image processing, Artificial intelligence, 0509 other social sciences, 050904 information & library sciences, business, Classifier (UML), computer, Information Systems
Abstract: The Naive Bayes (NB) classifier is a popular classifier for text classification problems due to its simple, flexible framework and its reasonable performance. In this paper, we present how to effectively utilize negative class information to improve NB classification. As opposed to information retrieval, supervised learning based text classification already obtains class information, a negative class as well as a positive class, from a labeled training dataset. Since the negative class can also provide significant information to improve the NB classifier, the negative class information is applied to the NB classifier through two phases of indexing and class prediction tasks. As a result, the new classifier using the negative class information consistently performs better than the traditional multinomial NB classifier.
Published: 2017
Full Text: View/download PDF
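
Note: the traditional multinomial NB classifier that record 44 uses as its baseline can be sketched as follows. This is a minimal illustration with Laplace smoothing, not the negative-class method proposed in the paper; the function names, the `alpha` parameter and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods from tokenized docs."""
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # per-class word frequencies
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_docs.items():
        # Denominator: total words in the class plus alpha for every vocabulary word.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_like = {w: math.log((word_counts[label][w] + alpha) / total) for w in vocab}
        model[label] = (log_prior, log_like)
    return model

def predict(model, tokens):
    """Return the class with the highest log posterior score; unseen words are ignored."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[w] for w in tokens if w in log_like)
    return max(model, key=score)

# Made-up training documents, already tokenized.
docs = [["good", "great", "fun"], ["great", "plot"], ["bad", "boring"], ["bad", "plot", "dull"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_multinomial_nb(docs, labels)
```

With this toy data, `predict(model, ["great", "fun"])` selects "pos" and `predict(model, ["bad", "dull"])` selects "neg". Working in log space avoids numerical underflow when many word probabilities are multiplied.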

45. PGNBC: Pearson Gaussian Naïve Bayes classifier for data stream classification with recurring concept drift

Author: D. Kishore Babu, Y. Ramadevi, and K.V. Ramana
Subjects: Data stream, Concept drift, Computer science, business.industry, Gaussian, Pattern recognition, 02 engineering and technology, Bayes classifier, Machine learning, computer.software_genre, Theoretical Computer Science, symbols.namesake, Naive Bayes classifier, Artificial Intelligence, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, symbols, Bayes error rate, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, computer
Published: 2017
Full Text: View/download PDF

46. Music Emotions Recognition by Machine Learning With Cognitive Classification Methodologies

Author: Shi Jinliang, Ying Wu, Jun Peng, Kan Luo, Yingxu Wang, Feng Lixiao, Jianqing Li, and Junjie Bai
Subjects: Computer science, business.industry, Emotion classification, Cognitive computing, Feature extraction, 0102 computer and information sciences, 02 engineering and technology, Bayes classifier, Linear discriminant analysis, Machine learning, computer.software_genre, 01 natural sciences, Human-Computer Interaction, Support vector machine, ComputingMethodologies_PATTERNRECOGNITION, 010201 computation theory & mathematics, Artificial Intelligence, Pattern recognition (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Affective computing, computer, Software
Abstract: Music emotions recognition (MER) is a challenging field of study addressed in multiple disciplines such as musicology, cognitive science, physiology, psychology, arts and affective computing. In this article, music emotions are classified into four types: pleasing, angry, sad and relaxing. MER is formulated as a classification problem in cognitive computing where 548 dimensions of music features are extracted and modeled. A set of classification and machine learning algorithms is explored and comparatively studied for MER, including Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Neuro-Fuzzy Networks Classification (NFNC), Fuzzy KNN (FKNN), Bayes classifier and Linear Discriminant Analysis (LDA). Experimental results show that the SVM, FKNN and LDA algorithms are the most effective methodologies, obtaining more than 80% accuracy for MER.
Published: 2017
Full Text: View/download PDF

47. Intrusion Detection System Based On Flows Using Machine Learning Algorithms

Author: Danillo Roberto Pereira, Helton Molina Sapia, João Paulo Papa, Ronaldo Toshiaki Oiakawa, Victor Hugo C. de Albuquerque, Eduardo Massato Kakihata, Francisco Assis da Silva, Universidade Do Oeste Paulista (Unoeste), Universidade Estadual Paulista (Unesp), and Universidade de Fortaleza (Unifor)
Subjects: General Computer Science, Computer science, SVM, KNN, 02 engineering and technology, Intrusion detection system, Bayes Classifier, Bayes classifier, computer.software_genre, Machine learning, Machine Learning, NetFlow, 0202 electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, Intrusion Detection System, business.industry, Network packet, Netflow, 020207 software engineering, Flow network, Support vector machine, Flow (mathematics), OPF, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, business, Personally identifiable information, computer
Abstract: The use of information and communication technology by different types of devices generates a large quantity of data packets that contain confidential and personal information. Data packet traffic can be summarized as network flows. For this reason, it is necessary to use computer security tools, such as Intrusion Detection Systems (IDS). This work presents an IDS that can perform flow-based analysis (NetFlow). This research conducted an analysis on previously collected flows and properly detected three different types of attacks. The flows were organized to be processed by machine learning methods. The results obtained by the proposed approach were very promising. This work also aimed at building a public dataset to be used by researchers worldwide in order to foster IDS-related research.
Published: 2017
Full Text: View/download PDF

48. Linear classifier design under heteroscedasticity in Linear Discriminant Analysis

Author: Andrew Hunt, Elena Gaura, Kojo Sarfo Gyamfi, and James Brusey
Subjects: FOS: Computer and information sciences, Heteroscedasticity, Linear classifier, 02 engineering and technology, Bayes classifier, Machine learning, computer.software_genre, Machine Learning (cs.LG), Artificial Intelligence, Homoscedasticity, 0202 electrical engineering, electronic engineering, information engineering, Engineering(all), Mathematics, business.industry, General Engineering, 020206 networking & telecommunications, Pattern recognition, Linear discriminant analysis, Computer Science Applications, Support vector machine, Computer Science - Learning, ComputingMethodologies_PATTERNRECOGNITION, Binary classification, Bayes error rate, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: We derive a linear classifier for heteroscedastic linear discriminant analysis. The proposed scheme efficiently minimises the Bayes error for binary classification. A local neighbourhood search is also proposed for non-normal distributions. The proposed schemes are experimentally validated on twelve datasets. Under normality and homoscedasticity assumptions, Linear Discriminant Analysis (LDA) is known to be optimal in terms of minimising the Bayes error for binary classification. In the heteroscedastic case, LDA is not guaranteed to minimise this error. Assuming heteroscedasticity, we derive a linear classifier, the Gaussian Linear Discriminant (GLD), that directly minimises the Bayes error for binary classification. In addition, we propose a local neighbourhood search (LNS) algorithm to obtain a more robust classifier if the data is known to have a non-normal distribution. We evaluate the proposed classifiers on two artificial and ten real-world datasets that cut across a wide range of application areas including handwriting recognition, medical diagnosis and remote sensing, and then compare our algorithm against existing LDA approaches and other linear classifiers. The GLD is shown to outperform the original LDA procedure in terms of classification accuracy under heteroscedasticity. While it compares favourably with other existing heteroscedastic LDA approaches, the GLD requires as much as 60 times lower training time on some datasets. Our comparison with the support vector machine (SVM) also shows that the GLD, together with the LNS, requires as much as 150 times lower training time to achieve an equivalent classification accuracy on some of the datasets. Thus, our algorithms can provide a cheap and reliable option for classification in many expert systems.
Published: 2017
Full Text: View/download PDF

49. When is the Naive Bayes approximation not so naive?

Author: Ana Ruiz Linares, Christopher R. Stephens, and Hugo Flores Huerta
Subjects: business.industry, Generalization, 02 engineering and technology, Bayes classifier, Machine learning, computer.software_genre, Naive Bayes classifier, Artificial Intelligence, 020204 information systems, Phenomenon, 0202 electrical engineering, electronic engineering, information engineering, Performance prediction, A priori and a posteriori, Bayes error rate, 020201 artificial intelligence & image processing, Artificial intelligence, business, Classifier (UML), computer, Software, Mathematics
Abstract: The Naive Bayes approximation (NBA) and associated classifier are widely used and offer robust performance across a large spectrum of problem domains. As it depends on a very strong assumption--independence among features--this has been somewhat puzzling. Various hypotheses have been put forward to explain its success and many generalizations have been proposed. In this paper we propose a set of "local" error measures--associated with the likelihood functions for subsets of attributes and for each class--and show explicitly how these local errors combine to give a "global" error associated to the full attribute set. By so doing we formulate a framework within which the phenomenon of error cancelation, or augmentation, can be quantified and its impact on classifier performance estimated and predicted a priori. These diagnostics allow us to develop a deeper and more quantitative understanding of why the NBA is so robust and under what circumstances one expects it to break down. We show how these diagnostics can be used to select which features to combine and use them in a simple generalization of the NBA, applying the resulting classifier to a set of real world data sets.
Published: 2017
Full Text: View/download PDF

50. A Novel Technique for Fingerprint Classification based on Naive Bayes Classifier and Support Vector Machine

Author: Ashish Mishra and Preeti Maheshwary
Subjects: Probabilistic classification, Structured support vector machine, business.industry, Computer science, Pattern recognition, Quadratic classifier, Bayes classifier, Machine learning, computer.software_genre, Relevance vector machine, Naive Bayes classifier, Margin classifier, Bayes error rate, Artificial intelligence, business, computer
Published: 2017
Full Text: View/download PDF
