20 results for "Bernhard Pfahringer"
Search Results
2. Predicting COVID-19 Patient Shielding: A Comprehensive Study
- Author
- Vithya Yogarajan, Jacob Montiel, Tony Smith, and Bernhard Pfahringer
- Published
- 2022
3. Multiclass Malware Classification Using Either Static Opcodes or Dynamic API Calls
- Author
- Rajchada Chanajitt, Bernhard Pfahringer, Heitor Murilo Gomes, and Vithya Yogarajan
- Published
- 2022
4. A Comparison of Neural Network Architectures for Malware Classification Based on Noriben Operation Sequences
- Author
- Rajchada Chanajitt, Bernhard Pfahringer, and Heitor Murilo Gomes
- Published
- 2022
5. Better Self-training for Image Classification Through Self-supervision
- Author
- Attaullah Sahito, Eibe Frank, and Bernhard Pfahringer
- Published
- 2022
6. Towards Automated Configuration of Stream Clustering Algorithms
- Author
- Matthias Carnein, Albert Bifet, Bernhard Pfahringer, and Heike Trautmann
- Subjects
- Computer science, Data stream mining, Algorithm selection, Automated algorithm configuration, Software deployment, Data mining, Cluster analysis
- Abstract
Clustering is an important technique in data analysis which can reveal hidden patterns and unknown relationships in the data. A common problem in clustering is the proper choice of parameter settings. To tackle this, automated algorithm configuration can be used to find the best parameter settings automatically. In practice, however, many of today's data sources are data streams due to the widespread deployment of sensors, the Internet of Things or (social) media. Stream clustering aims to tackle this challenge by identifying, tracking and updating clusters over time. Unfortunately, none of the existing approaches for automated algorithm configuration are directly applicable to the streaming scenario. In this paper, we explore the possibility of automated algorithm configuration for stream clustering algorithms using an ensemble of different configurations. In initial experiments, we demonstrate that our approach is able to automatically find superior configurations and refine them over time.
- Published
- 2020
7. Comparing High Dimensional Word Embeddings Trained on Medical Text to Bag-of-Words for Predicting Medical Codes
- Author
- Tony C. Smith, Michael Mayo, Vithya Yogarajan, Bernhard Pfahringer, and Henry Gouk
- Subjects
- Computer science, Binary classification, Bag-of-words model, Feature (machine learning), Embedding, Natural language processing, Artificial intelligence, Curse of dimensionality
- Abstract
Word embeddings are a useful tool for extracting knowledge from the free-form text contained in electronic health records, but it has become commonplace to train such word embeddings on data that do not accurately reflect how language is used in a healthcare context. We use prediction of medical codes as an example application to compare the accuracy of word embeddings trained on health corpora to those trained on more general collections of text. It is shown that both an increase in embedding dimensionality and an increase in the volume of health-related training data improve prediction accuracy. We also present a comparison to the traditional bag-of-words feature representation, demonstrating that in many cases, this conceptually simple method for representing text results in superior accuracy to that of word embeddings.
- Published
- 2020
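A minimal sketch of the comparison in the entry above, on toy data: the same binary code-prediction task is modelled once with a bag-of-words representation and once with averaged word vectors. The notes, the code labels, and the random 50-dimensional embedding table are hypothetical stand-ins, not the paper's health records or health-corpus embeddings.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy clinical notes and one binary medical code per note (hypothetical examples;
# the paper predicts many codes from real electronic health records).
notes = [
    "patient presents with chest pain and shortness of breath",
    "routine follow up no acute distress",
    "chest pain radiating to left arm troponin elevated",
    "well child visit immunisations up to date",
]
codes = np.array([1, 0, 1, 0])

# Representation 1: bag-of-words counts.
bow = CountVectorizer().fit_transform(notes)
bow_model = LogisticRegression(max_iter=1000).fit(bow, codes)

# Representation 2: average of per-word embedding vectors. The random table below
# stands in for embeddings trained on a health corpus.
rng = np.random.default_rng(0)
vocab = sorted(set(" ".join(notes).split()))
embeddings = {w: rng.normal(size=50) for w in vocab}

def note_vector(note):
    vectors = [embeddings[w] for w in note.split() if w in embeddings]
    return np.mean(vectors, axis=0)

emb_features = np.stack([note_vector(n) for n in notes])
emb_model = LogisticRegression(max_iter=1000).fit(emb_features, codes)

print("bag-of-words training accuracy:", bow_model.score(bow, codes))
print("embedding training accuracy:   ", emb_model.score(emb_features, codes))
```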
8. A Comparison of Machine Learning Methods for Cross-Domain Few-Shot Learning
- Author
- Michael Mayo, Hongyu Wang, Henry Gouk, Eibe Frank, and Bernhard Pfahringer
- Subjects
- Computer science, Machine learning, Feature vector, Cosine similarity, Quadratic classifier, Random forest, Support vector machine, Naive Bayes classifier, Artificial intelligence
- Abstract
We present an empirical evaluation of machine learning algorithms in cross-domain few-shot learning based on a fixed pre-trained feature extractor. Experiments were performed in five target domains (CropDisease, EuroSAT, Food101, ISIC and ChestX) and using two feature extractors: a ResNet10 model trained on a subset of ImageNet known as miniImageNet and a ResNet152 model trained on the ILSVRC 2012 subset of ImageNet. Commonly used machine learning algorithms including logistic regression, support vector machines, random forests, nearest neighbour classification, naive Bayes, and linear and quadratic discriminant analysis were evaluated on the extracted feature vectors. We also evaluated classification accuracy when subjecting the feature vectors to normalisation using p-norms. Algorithms originally developed for the classification of gene expression data—the nearest shrunken centroid algorithm and LDA ensembles obtained with random projections—were also included in the experiments, in addition to a cosine similarity classifier that has recently proved popular in few-shot learning. The results enable us to identify algorithms, normalisation methods and pre-trained feature extractors that perform well in cross-domain few-shot learning. We show that the cosine similarity classifier and \(\ell ^2\)-regularised 1-vs-rest logistic regression are generally the best-performing algorithms. We also show that algorithms such as LDA yield consistently higher accuracy when applied to \(\ell ^2\)-normalised feature vectors. In addition, all classifiers generally perform better when extracting feature vectors using the ResNet152 model instead of the ResNet10 model.
- Published
- 2020
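The pipeline in the entry above amounts to: take fixed feature vectors from a frozen pre-trained extractor, optionally ℓ2-normalise them, and fit simple classifiers. The sketch below uses synthetic vectors in place of ResNet features and compares L2-regularised logistic regression with a cosine-similarity (nearest normalised class mean) classifier; dataset, dimensionality and split sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for feature vectors produced by a frozen pre-trained extractor.
X, y = make_classification(n_samples=400, n_features=64, n_informative=20,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]  # few-shot-sized training set

def l2_normalise(Z):
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

Xn_train, Xn_test = l2_normalise(X_train), l2_normalise(X_test)

# L2-regularised logistic regression on the normalised features.
logreg = LogisticRegression(max_iter=1000).fit(Xn_train, y_train)

# Cosine-similarity classifier: nearest normalised class mean under the dot product.
classes = np.unique(y_train)
means = l2_normalise(np.stack([Xn_train[y_train == c].mean(axis=0) for c in classes]))
cosine_pred = classes[np.argmax(Xn_test @ means.T, axis=1)]

print("logistic regression accuracy:", logreg.score(Xn_test, y_test))
print("cosine similarity accuracy:  ", (cosine_pred == y_test).mean())
```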
9. confStream: Automated Algorithm Selection and Configuration of Stream Clustering Algorithms
- Author
- Heike Trautmann, Matthias Carnein, Albert Bifet, and Bernhard Pfahringer
- Subjects
- Hyperparameter, Computer science, Data stream mining, Regression, Algorithm selection, Automated algorithm configuration, Data mining, Cluster analysis
- Abstract
Machine learning has become one of the most important tools in data analysis. However, selecting the most appropriate machine learning algorithm and tuning its hyperparameters to their optimal values remains a difficult task. This is even more difficult for streaming applications where automated approaches are often not available to help during algorithm selection and configuration. This paper proposes the first approach for automated algorithm selection and configuration of stream clustering algorithms. We train an ensemble of different stream clustering algorithms and configurations in parallel and use the best performing configuration to obtain a clustering solution. By drawing new configurations from better performing ones, we are able to improve the ensemble performance over time. In large experiments on real and artificial data we show how our ensemble approach can improve upon default configurations and can also compete with a-posteriori algorithm configuration. Our approach is considerably faster than a-posteriori approaches and applicable in real-time. In addition, it is not limited to stream clustering and can be generalised to all streaming applications, including stream classification and regression.
- Published
- 2020
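A rough sketch of the ensemble-of-configurations idea from the entry above, using scikit-learn's MiniBatchKMeans as the stream clusterer and the silhouette score as the quality measure. confStream itself handles multiple algorithms and also draws new configurations from well-performing ones, which is omitted here; the synthetic stream and candidate cluster counts are illustrative only.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Candidate configurations (here just the number of clusters), maintained in parallel.
candidate_k = [2, 3, 4, 6, 8]
models = {k: MiniBatchKMeans(n_clusters=k, random_state=0) for k in candidate_k}

def stream_batches(n_batches=20, batch_size=150):
    """Synthetic two-dimensional stream with three underlying clusters."""
    for _ in range(n_batches):
        centres = rng.choice([-4.0, 0.0, 4.0], size=(batch_size, 1))
        yield centres + rng.normal(scale=0.5, size=(batch_size, 2))

best_k = None
for batch in stream_batches():
    scores = {}
    for k, model in models.items():
        model.partial_fit(batch)               # incremental update on the new batch
        labels = model.predict(batch)
        if len(np.unique(labels)) > 1:
            scores[k] = silhouette_score(batch, labels)
    if scores:
        best_k = max(scores, key=scores.get)   # best configuration on this batch

print("configuration chosen after the last batch: n_clusters =", best_k)
```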
10. Transfer of Pretrained Model Weights Substantially Improves Semi-supervised Image Classification
- Author
- Attaullah Sahito, Bernhard Pfahringer, and Eibe Frank
- Subjects
- Artificial neural network, Image classification, Computer science, Supervised learning, Overfitting, Machine learning, Transfer learning, Artificial intelligence
- Abstract
Deep neural networks produce state-of-the-art results when trained on a large number of labeled examples but tend to overfit when small amounts of labeled examples are used for training. Creating a large number of labeled examples requires considerable resources, time, and effort. If labeling new data is not feasible, so-called semi-supervised learning can achieve better generalisation than purely supervised learning by employing unlabeled instances as well as labeled ones. The work presented in this paper is motivated by the observation that transfer learning provides the opportunity to potentially further improve performance by exploiting models pretrained on a similar domain. More specifically, we explore the use of transfer learning when performing semi-supervised learning using self-learning. The main contribution is an empirical evaluation of transfer learning using different combinations of similarity metric learning methods and label propagation algorithms in semi-supervised learning. We find that transfer learning always substantially improves the model's accuracy when few labeled examples are available, regardless of the type of loss used for training the neural network. This finding is obtained by performing extensive experiments on the SVHN, CIFAR10, and Plant Village image classification datasets and applying pretrained weights from ImageNet for transfer learning.
- Published
- 2020
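The weight-transfer step described above, sketched with torchvision's resnet18 as a stand-in architecture (the paper's networks and training setup may differ): load ImageNet-pretrained weights, replace the classification head for the target task, and fine-tune on the small labelled set before any self-training rounds. The class count, optimiser and learning rate are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # e.g. SVHN or CIFAR10

# Start from ImageNet-pretrained weights and replace the final classification layer.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Fine-tune on the few labelled examples; in the paper's setting, the fine-tuned model
# would then pseudo-label unlabelled images and be retrained iteratively (self-learning).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Sanity check with a random batch shaped like ImageNet-sized images.
images = torch.randn(4, 3, 224, 224)
targets = torch.randint(0, num_classes, (4,))
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
print("one fine-tuning step done, loss =", float(loss))
```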
11. Semi-supervised Learning Using Siamese Networks
- Author
- Eibe Frank, Attaullah Sahito, and Bernhard Pfahringer
- Subjects
- Artificial neural network, Computer science, Semi-supervised learning, Machine learning, Euclidean space, Discriminative model, Embedding, Artificial intelligence
- Abstract
Neural networks have been successfully used as classification models yielding state-of-the-art results when trained on a large number of labeled samples. These models, however, are more difficult to train successfully for semi-supervised problems where small amounts of labeled instances are available along with a large number of unlabeled instances. This work explores a new training method for semi-supervised learning that is based on similarity function learning using a Siamese network to obtain a suitable embedding. The learned representations are discriminative in Euclidean space, and hence can be used for labeling unlabeled instances using a nearest-neighbor classifier. Confident predictions of unlabeled instances are used as true labels for retraining the Siamese network on the expanded training set. This process is applied iteratively. We perform an empirical study of this iterative self-training algorithm. For improving unlabeled predictions, local learning with global consistency [22] is also evaluated.
- Published
- 2019
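The labelling step of the self-training loop above, sketched in isolation: given embeddings from an already-trained Siamese network, each unlabelled point takes the label of its nearest labelled neighbour in Euclidean space, and only confident assignments are added to the training set before retraining. The embeddings below are synthetic stand-ins and the distance threshold is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for embeddings produced by a trained Siamese network:
# two classes forming two clusters in an 8-dimensional embedding space.
labelled = np.vstack([rng.normal(0.0, 1.0, (10, 8)), rng.normal(5.0, 1.0, (10, 8))])
labels = np.array([0] * 10 + [1] * 10)
unlabelled = np.vstack([rng.normal(0.0, 1.0, (50, 8)), rng.normal(5.0, 1.0, (50, 8))])

# Nearest-neighbour labelling in Euclidean space.
distances = np.linalg.norm(unlabelled[:, None, :] - labelled[None, :, :], axis=2)
nearest = distances.argmin(axis=1)
pseudo_labels = labels[nearest]

# Keep only confident assignments (small distance to the nearest labelled embedding),
# add them to the labelled set, then retrain the Siamese network and repeat.
confident = distances.min(axis=1) < 3.0
labelled = np.vstack([labelled, unlabelled[confident]])
labels = np.concatenate([labels, pseudo_labels[confident]])
print(int(confident.sum()), "unlabelled embeddings were pseudo-labelled this round")
```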
12. MaxGain: Regularisation of Neural Networks by Constraining Activation Magnitudes
- Author
- Michael J. Cree, Henry Gouk, Bernhard Pfahringer, and Eibe Frank
- Subjects
- Artificial neural network, Computer science, Overfitting, Lipschitz continuity, Benchmark (computing), Algorithm
- Abstract
Effective regularisation of neural networks is essential to combat overfitting due to the large number of parameters involved. We present an empirical analogue to the Lipschitz constant of a feed-forward neural network, which we refer to as the maximum gain. We hypothesise that constraining the gain of a network will have a regularising effect, similar to how constraining the Lipschitz constant of a network has been shown to improve generalisation. A simple algorithm is provided that involves rescaling the weight matrix of each layer after each parameter update. We conduct a series of studies on common benchmark datasets, and also a novel dataset that we introduce to enable easier significance testing for experiments using convolutional networks. Performance on these datasets compares favourably with other common regularisation techniques. Data related to this paper is available at: https://www.cs.waikato.ac.nz/~ml/sins10/.
- Published
- 2019
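A simplified numpy sketch of the rescaling idea in the MaxGain entry above: measure an empirical gain of a layer's weight matrix on a batch of its inputs and shrink the matrix whenever that gain exceeds a chosen constant. During training this would be applied to every layer after each parameter update; the gain definition and threshold here are simplifications of the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_gain(W, X):
    """Largest ratio of output norm to input norm over the batch (rows of X are inputs)."""
    outputs = X @ W.T
    return np.max(np.linalg.norm(outputs, axis=1) / np.linalg.norm(X, axis=1))

def constrain_gain(W, X, max_gain=2.0):
    """Rescale W so its empirical gain on this batch is at most max_gain."""
    gain = empirical_gain(W, X)
    return W * (max_gain / gain) if gain > max_gain else W

# One illustrative 'parameter update': a layer's weight matrix and a mini-batch of inputs.
W = rng.normal(size=(32, 64))
X = rng.normal(size=(128, 64))
W = constrain_gain(W, X)
print("gain after rescaling:", empirical_gain(W, X))
```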
13. Using Supervised Pretraining to Improve Generalization of Neural Networks on Binary Classification Problems
- Author
- Patricia Riddle, Alex Yuxuan Peng, Yun Sing Koh, and Bernhard Pfahringer
- Subjects
- Artificial neural network, Computer science, Generalization, Initialization, Machine learning, Synthetic data, Binary classification, High dimensionality, Artificial intelligence
- Abstract
Neural networks are known to be very sensitive to the initial weights. There has been a lot of research on initialization that aims to stabilize the training process. However, very little research has studied the relationship between initialization and generalization. We demonstrate that a poorly initialized model leads to lower test accuracy. We propose a supervised pretraining technique that helps improve generalization on binary classification problems. The experimental results on four UCI datasets show that the proposed pretraining leads to higher test accuracy compared to the he_normal initialization when the training set is small. In further experiments on synthetic data, the improvement on test accuracy using the proposed pretraining reaches more than 30% when the data has high dimensionality and noisy features. Code related to this paper is available at: https://github.com/superRookie007/supervised_pretraining.
- Published
- 2019
14. Ensembles of Nested Dichotomies with Multiple Subset Evaluation
- Author
- Bernhard Pfahringer, Tim Leathart, Eibe Frank, and Geoffrey Holmes
- Subjects
- Computer science, Mean squared error, Binary classification, Randomness, Algorithm
- Abstract
A system of nested dichotomies (NDs) is a method of decomposing a multiclass problem into a collection of binary problems. Such a system recursively applies binary splits to divide the set of classes into two subsets, and trains a binary classifier for each split. Many methods have been proposed to perform this split, each with various advantages and disadvantages. In this paper, we present a simple, general method for improving the predictive performance of NDs produced by any subset selection technique that employs randomness to construct the subsets. We provide a theoretical expectation for performance improvements, as well as empirical results showing that our method improves the root mean squared error of NDs, regardless of whether they are employed as an individual model or in an ensemble setting.
- Published
- 2019
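A small sketch of a single randomly built system of nested dichotomies as described above: classes are recursively split into two random subsets, a binary logistic regression is trained at each split, and class probabilities are obtained by multiplying branch probabilities along the tree. The paper's actual contribution, evaluating multiple candidate subsets per split and ensembling, is not reproduced here; data and base learner are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def build_nd(classes):
    """Recursively split the class set into two random non-empty subsets."""
    classes = list(rng.permutation(list(classes)))
    if len(classes) == 1:
        return {"leaf": classes[0]}
    cut = int(rng.integers(1, len(classes)))
    left, right = classes[:cut], classes[cut:]
    return {"left": build_nd(left), "right": build_nd(right),
            "left_classes": set(left), "right_classes": set(right)}

def train_nd(node, X, y):
    """Train a binary classifier at every internal node of the dichotomy tree."""
    if "leaf" in node:
        return
    relevant = np.isin(y, list(node["left_classes"] | node["right_classes"]))
    target = np.isin(y[relevant], list(node["left_classes"])).astype(int)
    node["clf"] = LogisticRegression(max_iter=1000).fit(X[relevant], target)
    train_nd(node["left"], X, y)
    train_nd(node["right"], X, y)

def class_probabilities(node, x, p=1.0, out=None):
    """Multiply branch probabilities along each root-to-leaf path."""
    out = {} if out is None else out
    if "leaf" in node:
        out[node["leaf"]] = out.get(node["leaf"], 0.0) + p
        return out
    p_left = node["clf"].predict_proba(x.reshape(1, -1))[0, 1]
    class_probabilities(node["left"], x, p * p_left, out)
    class_probabilities(node["right"], x, p * (1.0 - p_left), out)
    return out

X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
nd = build_nd(np.unique(y))
train_nd(nd, X, y)
print(class_probabilities(nd, X[0]), "true class:", y[0])
```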
15. On Calibration of Nested Dichotomies
- Author
- Eibe Frank, Geoffrey Holmes, Tim Leathart, and Bernhard Pfahringer
- Subjects
- Computer science, Multiclass classification, Tree structure, Binary classification, Algorithm
- Abstract
Nested dichotomies (NDs) are used as a method of transforming a multiclass classification problem into a series of binary problems. A tree structure is induced that recursively splits the set of classes into subsets, and a binary classification model learns to discriminate between the two subsets of classes at each node. In this paper, we demonstrate that these NDs typically exhibit poor probability calibration, even when the binary base models are well-calibrated. We also show that this problem is exacerbated when the binary models are poorly calibrated. We discuss the effectiveness of different calibration strategies and show that accuracy and log-loss can be significantly improved by calibrating both the internal base models and the full ND structure, especially when the number of classes is high.
- Published
- 2019
16. Building Ensembles of Adaptive Nested Dichotomies with Random-Pair Selection
- Author
- Eibe Frank, Bernhard Pfahringer, and Tim Leathart
- Subjects
- Computer science, Theoretical computer science, Dichotomy, Binary classification, Random-pair selection, Software
- Abstract
A system of nested dichotomies is a method of decomposing a multi-class problem into a collection of binary problems. Such a system recursively applies binary splits to divide the set of classes into two subsets, and trains a binary classifier for each split. Although ensembles of nested dichotomies with random structure have been shown to perform well in practice, a more sophisticated class subset selection method can improve classification accuracy. We investigate an approach to this problem called random-pair selection, and evaluate its effectiveness compared to other published methods of subset selection. We show that our method outperforms other methods in many cases when forming ensembles of nested dichotomies, and is at least on par in all other cases. The software related to this paper is available at https://svn.cms.waikato.ac.nz/svn/weka/trunk/packages/internal/ensemblesOfNestedDichotomies/.
- Published
- 2016
17. On Dynamic Feature Weighting for Feature Drifting Data Streams
- Author
- Jean Paul Barddal, Bernhard Pfahringer, Albert Bifet, Heitor Murilo Gomes, and Fabrício Enembreck
- Subjects
- Data stream, Concept drift, Computer science, Data stream mining, Machine learning, Feature weighting, Naive Bayes classifier, Adaptive learning, Data mining, Artificial intelligence
- Abstract
The ubiquity of data streams has been encouraging the development of new incremental and adaptive learning algorithms. Data stream learners must be fast, memory-bounded, but mainly, tailored to adapt to possible changes in the data distribution, a phenomenon named concept drift. Recently, several works have shown the impact of a so far nearly neglected type of drift: feature drifts. Feature drifts occur whenever a subset of features becomes, or ceases to be, relevant to the learning task. In this paper we (i) provide insights into how the relevance of features can be tracked as a stream progresses according to information-theoretical Symmetrical Uncertainty; and (ii) show how it can be used to boost two learning schemes: Naive Bayesian and k-Nearest Neighbor. Furthermore, we investigate the usage of these two new dynamically weighted learners as prediction models in the leaves of the Hoeffding Adaptive Tree classifier. Results show improvements in accuracy (an average of 10.69 % for k-Nearest Neighbor, 6.23 % for Naive Bayes and 4.42 % for Hoeffding Adaptive Trees) in both synthetic and real-world datasets at the expense of a bounded increase in both memory consumption and processing time.
- Published
- 2016
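A sketch of the weighting idea from the entry above: Symmetrical Uncertainty between each (discrete) feature and the class serves as a feature weight for a k-nearest-neighbour learner. In the streaming setting of the paper these weights would be re-estimated as the stream progresses; here they are computed once on a synthetic batch, which is purely illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def entropy(values):
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def joint_entropy(a, b):
    _, counts = np.unique(np.stack([a, b], axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(feature, target):
    h_f, h_t = entropy(feature), entropy(target)
    gain = h_f + h_t - joint_entropy(feature, target)   # mutual information
    return 2.0 * gain / (h_f + h_t) if (h_f + h_t) > 0 else 0.0

# Synthetic discrete data: only the first feature is related to the class.
X = rng.integers(0, 3, size=(300, 5))
y = ((X[:, 0] + rng.integers(0, 2, size=300)) >= 2).astype(int)

weights = np.array([symmetrical_uncertainty(X[:, j], y) for j in range(X.shape[1])])
print("per-feature SU weights:", np.round(weights, 3))

# Scale features by their relevance before distance computation.
weighted_knn = KNeighborsClassifier(n_neighbors=5).fit(X * weights, y)
print("training accuracy with weighted features:", weighted_knn.score(X * weights, y))
```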
18. AI 2015: Advances in Artificial Intelligence
- Author
- Jochen Renz and Bernhard Pfahringer
- Subjects
- Engineering, Artificial intelligence
- Published
- 2015
19. Hierarchical Meta-Rules for Scalable Meta-Learning
- Author
- Bernhard Pfahringer and Quan Sun
- Subjects
- Meta learning (computer science), Computer science, Scalability, Pairwise comparison, Ranking (information retrieval), Algorithm
- Abstract
The Pairwise Meta-Rules (PMR) method proposed in [18] has been shown to improve the predictive performances of several meta-learning algorithms for the algorithm ranking problem. Given m target objects (e.g., algorithms), the training complexity of the PMR method with respect to m is quadratic: \(\binom{m}{2} = m \times (m - 1) / 2\). This is usually not a problem when m is moderate, such as when ranking 20 different learning algorithms. However, for problems with a much larger m, such as the meta-learning-based parameter ranking problem, where m can be 100+, the PMR method is less efficient. In this paper, we propose a novel method named Hierarchical Meta-Rules (HMR), which is based on the theory of orthogonal contrasts. The proposed HMR method has a linear training complexity with respect to m, providing a way of dealing with a large number of objects that the PMR method cannot handle efficiently. Our experimental results demonstrate the benefit of the new method in the context of meta-learning.
- Published
- 2014
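The complexity argument above comes down to how many binary ranking models must be trained: the pairwise approach needs one per pair of the m target objects, which grows quadratically, whereas the proposed hierarchical approach grows linearly in m. A quick worked check of the quoted formula:

```python
from math import comb

# Pairwise Meta-Rules train one model per pair of the m target objects,
# i.e. C(m, 2) = m * (m - 1) / 2; a linear-in-m method trains on the order of m.
for m in (20, 100, 500):
    print(f"m = {m:>3}: pairwise models = {comb(m, 2):>6}, linear in m = {m}")
```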
20. Propositionalisation of Multi-instance Data Using Random Forests
- Author
- Eibe Frank and Bernhard Pfahringer
- Subjects
- Computer science, Machine learning, Boosting (machine learning), Support vector machine, Artificial intelligence
- Abstract
Multi-instance learning is a generalisation of attribute-value learning where examples for learning consist of labeled bags (i.e. multi-sets) of instances. This learning setting is more computationally challenging than attribute-value learning and a natural fit for important application areas of machine learning such as classification of molecules and image classification. One approach to solve multi-instance learning problems is to apply propositionalisation, where bags of data are converted into vectors of attribute-value pairs so that a standard propositional (i.e. attribute-value) learning algorithm can be applied. This approach is attractive because of the large number of propositional learning algorithms that have been developed and can thus be applied to the propositionalised data. In this paper, we empirically investigate a variant of an existing propositionalisation method called TLC. TLC uses a single decision tree to obtain propositionalised data. Our variant applies a random forest instead and is motivated by the potential increase in robustness that this may yield. We present results on synthetic and real-world data from the above two application domains showing that it indeed yields increased classification accuracy when applying boosting and support vector machines to classify the propositionalised data.
- Published
- 2013
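A hedged sketch of the random-forest variant of propositionalisation described above: each instance inherits its bag's label, a random forest is grown on the instances, and every bag is summarised by the fraction of its instances reaching each leaf of each tree, yielding a fixed-length attribute-value vector for a standard learner. The bag construction, forest size and final SVM are illustrative assumptions rather than the paper's exact TLC-style setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic multi-instance data: 40 bags, each a small set of 5-dimensional instances,
# with positive bags drawn around a shifted centre.
bags = []
for i in range(40):
    label = i % 2
    instances = rng.normal(loc=2.0 * label, scale=1.0, size=(int(rng.integers(3, 8)), 5))
    bags.append((instances, label))

# Grow a random forest on the individual instances, each labelled with its bag's label.
X_inst = np.vstack([inst for inst, _ in bags])
y_inst = np.concatenate([[label] * len(inst) for inst, label in bags])
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_inst, y_inst)

def bag_vector(instances):
    """Fraction of the bag's instances that land in each leaf of each tree."""
    leaves = forest.apply(instances)                       # (n_instances, n_trees)
    parts = []
    for t, tree in enumerate(forest.estimators_):
        counts = np.bincount(leaves[:, t], minlength=tree.tree_.node_count)
        parts.append(counts / len(instances))
    return np.concatenate(parts)

# Propositionalised data: one fixed-length vector and one label per bag.
X_bags = np.stack([bag_vector(inst) for inst, _ in bags])
y_bags = np.array([label for _, label in bags])
print("propositionalised shape:", X_bags.shape)
print("bag-level SVM training accuracy:", SVC().fit(X_bags, y_bags).score(X_bags, y_bags))
```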