Database: 3 selected / Journal: machine learning / Topic: algorithms and machine learning - Searchworks@Jio Institute Digital Library Search Results

1. Finite-time error bounds for Greedy-GQ.

Author: Wang, Yue, Zhou, Yi, and Zou, Shaofeng
Subjects: MACHINE learning, ALGORITHMS, CONFERENCES & conventions
Abstract: Greedy-GQ with linear function approximation, originally proposed in Maei et al. (in: Proceedings of the international conference on machine learning (ICML), 2010), is a value-based off-policy algorithm for optimal control in reinforcement learning, and it has a non-linear two timescale structure with a non-convex objective function. This paper develops its tightest finite-time error bounds. We show that the Greedy-GQ algorithm converges as fast as O(1/√T) under the i.i.d. setting and O(log T/√T) under the Markovian setting. We further design a variant of the vanilla Greedy-GQ algorithm using the nested-loop approach, and show that its sample complexity is O(log(1/ϵ) ϵ^(-2)), which matches that of the vanilla Greedy-GQ. Our finite-time error bounds match those of the stochastic gradient descent algorithm for general smooth non-convex optimization problems, despite the additional challenge of the two time-scale updates. Our finite-sample analysis provides theoretical guidance on choosing step-sizes for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our techniques provide a general approach for finite-sample analysis of non-convex two timescale value-based reinforcement learning algorithms. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

2. Unified SVM algorithm based on LS-DC loss.

Author: Zhou, Shuisheng and Zhou, Wendi
Subjects: MACHINE learning, NONCONVEX programming, CLASSIFICATION algorithms, SUPERVISED learning, SUPPORT vector machines, ALGORITHMS, NONLINEAR equations
Abstract: Over the past two decades, support vector machines (SVMs) have become a popular supervised machine learning model, and plenty of distinct algorithms have been designed separately based on different KKT conditions of the SVM model for classification/regression with different losses, including convex and nonconvex losses. In this paper, we propose an algorithm that can train different SVM models in a unified scheme. First, we introduce a definition of the least squares type of difference of convex loss (LS-DC) and show that the most commonly used losses in the SVM community are LS-DC losses or can be approximated by LS-DC losses. Based on the difference of convex algorithm (DCA), we then propose a unified algorithm called UniSVM which can solve the SVM model with any convex or nonconvex LS-DC loss, wherein only a vector is computed by the specifically chosen loss. UniSVM has a dominant advantage over all existing algorithms for training robust SVM models with nonconvex losses because it has a closed-form solution per iteration, while the existing algorithms always need to solve an L1SVM/L2SVM per iteration. Furthermore, by the low-rank approximation of the kernel matrix, UniSVM can solve large-scale nonlinear problems efficiently. To verify the efficacy and feasibility of the proposed algorithm, we perform many experiments on small artificial problems and large benchmark tasks both with and without outliers for classification and regression for comparison with state-of-the-art algorithms. The experimental results demonstrate that UniSVM can achieve comparable performance in less training time. The foremost advantage of UniSVM is that its core code in Matlab is fewer than 10 lines; hence, it can be easily grasped by users or researchers. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

3. Algorithm selection on a meta level.

Author: Tornede, Alexander, Gehring, Lukas, Tornede, Tanja, Wever, Marcel, and Hüllermeier, Eyke
Subjects: ALGORITHMS, MACHINE learning, LINEAR complementarity problem
Abstract: The problem of selecting an algorithm that appears most suitable for a specific instance of an algorithmic problem class, such as the Boolean satisfiability problem, is called instance-specific algorithm selection. Over the past decade, the problem has received considerable attention, resulting in a number of different methods for algorithm selection. Although most of these methods are based on machine learning, surprisingly little work has been done on meta learning, that is, on taking advantage of the complementarity of existing algorithm selection methods in order to combine them into a single superior algorithm selector. In this paper, we introduce the problem of meta algorithm selection, which essentially asks for the best way to combine a given set of algorithm selectors. We present a general methodological framework for meta algorithm selection as well as several concrete learning methods as instantiations of this framework, essentially combining ideas of meta learning and ensemble learning. In an extensive experimental evaluation, we demonstrate that ensembles of algorithm selectors can significantly outperform single algorithm selectors and have the potential to form the new state of the art in algorithm selection. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

4. Unsupervised feature selection based on kernel fisher discriminant analysis and regression learning.

Author: Shang, Ronghua, Meng, Yang, Liu, Chiyang, Jiao, Licheng, Esfahani, Amir M. Ghalamzan, and Stolkin, Rustam
Subjects: FEATURE selection, FISHER discriminant analysis, MACHINE learning, CLUSTER analysis (Statistics), MATHEMATICAL optimization, ALGORITHMS
Abstract: In this paper, we propose a new feature selection method called kernel Fisher discriminant analysis and regression learning based algorithm for unsupervised feature selection. The existing feature selection methods are based on either manifold learning or discriminative techniques, each of which has some shortcomings. Although some studies show the advantages of two-step methods benefiting from both manifold learning and discriminative techniques, a joint formulation has been shown to be more efficient. To do so, we construct a global discriminant objective term of a clustering framework based on the kernel method. We add another term of regression learning into the objective function, which drives the optimization to select a low-dimensional representation of the original dataset. We use the L2,1-norm of the features to impose a sparse structure upon features, which can result in more discriminative features. We propose an algorithm to solve the optimization problem introduced in this paper. We further discuss convergence, parameter sensitivity, computational complexity, as well as the clustering and classification accuracy of the proposed algorithm. In order to demonstrate the effectiveness of the proposed algorithm, we perform a set of experiments with different available datasets. The results obtained by the proposed algorithm are compared against the state-of-the-art algorithms. These results show that our method outperforms the existing state-of-the-art methods in many cases on different datasets, but the improved performance comes at the cost of increased time complexity. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

5. Transfer learning by mapping and revising boosted relational dependency networks.

Author: Azevedo Santos, Rodrigo, Paes, Aline, and Zaverucha, Gerson
Subjects: STATISTICAL learning, REGRESSION trees, TECHNOLOGY transfer, OPERATOR theory, MACHINE learning, ALGORITHMS
Abstract: Statistical machine learning algorithms usually assume the availability of data of considerable size to train the models. However, they would fail in addressing domains where data is difficult or expensive to obtain. Transfer learning has emerged to address this problem of learning from scarce data by relying on a model learned in a source domain where data is easy to obtain to be a starting point for the target domain. On the other hand, real-world data contains objects and their relations, usually gathered from noisy environments. Finding patterns through such uncertain relational data has been the focus of the Statistical Relational Learning (SRL) area. Thus, to address domains with scarce, relational, and uncertain data, in this paper, we propose TreeBoostler, an algorithm that transfers the SRL state-of-the-art Boosted Relational Dependency Networks learned in a source domain to the target domain. TreeBoostler first finds a mapping between pairs of predicates to accommodate the additive trees into the target vocabulary. Afterwards, it employs two theory revision operators devised to handle incorrect relational regression trees, aiming at improving the performance of the mapped trees. In the experiments presented in this paper, TreeBoostler has successfully transferred knowledge between several distinct domains. Moreover, it performs comparably to or better than learning-from-scratch methods in terms of accuracy and outperforms a transfer learning approach in terms of accuracy and runtime. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

6. JGPR: a computationally efficient multi-target Gaussian process regression algorithm.

Author: Nabati, Mohammad, Ghorashi, Seyed Ali, and Shahbazian, Reza
Subjects: KRIGING, ALGORITHMS, COVARIANCE matrices, COMPUTATIONAL complexity
Abstract: Multi-target regression algorithms are designed to predict multiple outputs at the same time, and allow us to take all output variables into account during the training phase. Despite recent advances, developing a low-cost and highly accurate algorithm in this area of machine learning remains an open challenge. The main challenge in multi-target regression algorithms is how to use different targets' information in the training and/or test phases. In this paper, we introduce a low-cost multi-target Gaussian process regression (GPR) algorithm, called joint GPR (JGPR), that employs a shared covariance matrix among the targets during the training phase and solves a sub-optimal cost function for optimization of hyperparameters. The proposed strategy reduces the computational complexity considerably during the training and test phases and simultaneously avoids overfitting of the multi-target regression algorithm upon the targets. We have performed extensive experiments on both simulated data and 18 benchmark datasets to assess the proposed method compared with other multi-target regression algorithms. Experimental results show that the proposed JGPR outperforms the state-of-the-art approaches on most of the given benchmark datasets. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

7. ReliefE: feature ranking in high-dimensional spaces via manifold embeddings.

Author: Škrlj, Blaž, Džeroski, Sašo, Lavrač, Nada, and Petković, Matej
Subjects: SPARSE matrices, MACHINE learning, ALGORITHMS
Abstract: Feature ranking has been widely adopted in machine learning applications such as high-throughput biology and social sciences. The approaches of the popular Relief family of algorithms assign importances to features by iteratively accounting for nearest relevant and irrelevant instances. Despite their high utility, these algorithms can be computationally expensive and not well suited for high-dimensional sparse input spaces. In contrast, recent embedding-based methods learn compact, low-dimensional representations, potentially facilitating downstream learning capabilities of conventional learners. This paper explores how the Relief branch of algorithms can be adapted to benefit from (Riemannian) manifold-based embeddings of instance and target spaces, where a given embedding's dimensionality is intrinsic to the dimensionality of the considered data set. The developed ReliefE algorithm is faster and can result in better feature rankings, as shown by our evaluation on 20 real-life data sets for multi-class and multi-label classification tasks. The utility of ReliefE for high-dimensional data sets is ensured by its implementation that utilizes sparse matrix algebraic operations. Finally, the relation of ReliefE to other ranking algorithms is studied via the Fuzzy Jaccard Index. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

8. Learning from interpretation transition using differentiable logic programming semantics.

Author: Gao, Kun, Wang, Hanpin, Cao, Yongzhi, and Inoue, Katsumi
Subjects: LOGIC programming, MATHEMATICAL logic, SEMANTICS (Philosophy), ARTIFICIAL neural networks, FIRST-order logic, ALGORITHMS, INDUCTION (Logic)
Abstract: The combination of learning and reasoning is an essential and challenging topic in neuro-symbolic research. Differentiable inductive logic programming is a technique for learning a symbolic knowledge representation from either complete, mislabeled, or incomplete observed facts using neural networks. In this paper, we propose a novel differentiable inductive logic programming system called differentiable learning from interpretation transition (D-LFIT) for learning logic programs through the proposed embeddings of logic programs, neural networks, optimization algorithms, and an adapted algebraic method to compute the logic program semantics. The proposed model has several characteristics, including a small number of parameters, the ability to generate logic programs in a curriculum-learning setting, and linear time complexity for the extraction of trained neural networks. The well-known bottom clause propositionalization algorithm is incorporated when the proposed system learns from relational datasets. We compare our model with NN-LFIT, which extracts propositional logic rules from retuned connected networks, the highly accurate rule learner RIPPER, the purely symbolic LFIT system LF1T, and CILP++, which integrates neural networks and the propositionalization method to handle first-order logic knowledge. From the experimental results, we conclude that D-LFIT yields comparable accuracy with respect to the baselines when given complete, incomplete, and mislabeled data. Our experimental results indicate that D-LFIT not only learns symbolic logic programs quickly and precisely but also performs robustly when processing mislabeled and incomplete datasets. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

9. Aggregating Algorithm for prediction of packs.

Author: Adamskiy, Dmitry, Bellotti, Anthony, Dzhamtyrova, Raisa, and Kalnishkan, Yuri
Subjects: HOME prices, PREDICTION theory, ALGORITHMS
Abstract: This paper formulates a protocol for prediction of packs, which is a special case of on-line prediction under delayed feedback. Under the prediction of packs protocol, the learner must make a few predictions without seeing the respective outcomes and then the outcomes are revealed in one go. The paper develops the theory of prediction with expert advice for packs by generalising the concept of mixability. We propose a number of merging algorithms for prediction of packs with tight worst case loss upper bounds similar to those for Vovk's Aggregating Algorithm. Unlike existing algorithms for delayed feedback settings, our algorithms do not depend on the order of outcomes in a pack. Empirical experiments on sports and house price datasets are carried out to study the performance of the new algorithms and compare them against an existing method. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

10. Extreme value correction: a method for correcting optimistic estimations in rule learning.

Author: Možina, Martin, Demšar, Janez, Bratko, Ivan, and Žabkar, Jure
Subjects: ERROR correction (Information theory), MACHINE learning, ALGORITHMS, VALUE distribution theory, DATABASES
Abstract: Machine learning algorithms rely on their ability to evaluate the constructed hypotheses for choosing the optimal hypothesis during learning and assessing the quality of the model afterwards. Since these estimates, in particular the former ones, are based on the training data from which the hypotheses themselves were constructed, they are usually optimistic. The paper shows three different solutions; two for the artificial boundary cases with the smallest and the largest optimism and a general correction procedure called extreme value correction (EVC) based on extreme value distribution. We demonstrate the application of the technique to rule learning, specifically to estimating classification accuracy of a single rule, and evaluate it on an artificial data set and on a number of UCI data sets. We observed that the correction successfully improved the accuracy estimates. We also describe an approach for combining rules into a linear global classifier and show that using EVC estimates leads to more accurate classifiers. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

11. RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for imbalanced data classification.

Author: Koziarski, Michał, Bellinger, Colin, and Woźniak, Michał
Subjects: RESAMPLING (Statistics), ALGORITHMS, DATA distribution, RADIAL basis functions, CLASSIFICATION
Abstract: Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asymmetric misclassification costs. In such cases, the classification model must achieve a high recall without significantly impacting precision. Resampling the training data is the standard approach to improving classification performance on imbalanced binary data. However, the state-of-the-art methods ignore the local joint distribution of the data or correct it as a post-processing step. This can cause sub-optimal shifts in the training distribution, particularly when the target data distribution is complex. In this paper, we propose Radial-Based Combined Cleaning and Resampling (RB-CCR). RB-CCR utilizes the concept of class potential to refine the energy-based resampling approach of CCR. In particular, RB-CCR exploits the class potential to accurately locate sub-regions of the data-space for synthetic oversampling. The category sub-region for oversampling can be specified as an input parameter to meet domain-specific needs or be automatically selected via cross-validation. Our 5 × 2 cross-validated results on 57 benchmark binary datasets with 9 classifiers show that RB-CCR achieves a better precision-recall trade-off than CCR and generally outperforms the state-of-the-art resampling methods in terms of AUC and G-mean. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

12. Ordinal regression with explainable distance metric learning based on ordered sequences.

Author: Suárez, Juan Luis, García, Salvador, and Herrera, Francisco
Subjects: DISTANCE education, CASE-based reasoning, MACHINE learning, KERNEL functions, ALGORITHMS
Abstract: The purpose of this paper is to introduce a new distance metric learning algorithm for ordinal regression. Ordinal regression addresses the problem of predicting classes for which there is a natural ordering, but the real distances between classes are unknown. Since ordinal regression walks a fine line between standard regression and classification, it is a common pitfall to either apply a regression-like numerical treatment of variables or underrate the ordinal information applying nominal classification techniques. On a different note, distance metric learning is a discipline that has proven to be very useful when improving distance-based algorithms such as the nearest neighbors classifier. In addition, an appropriate distance can enhance the explainability of this model. In our study we propose an ordinal approach to learning a distance, called chain maximizing ordinal metric learning. It is based on the maximization of ordered sequences in local neighborhoods of the data. This approach takes into account all the ordinal information in the data without making use of any of the two extremes of classification or regression, and it is able to adapt to data for which the class separations are not clear. We also show how to extend the algorithm to learn in a non-linear setup using kernel functions. We have tested our algorithm on several ordinal regression problems, showing a high performance under the usual evaluation metrics in this domain. Results are verified through Bayesian non-parametric testing. Finally, we explore the capabilities of our algorithm in terms of explainability using the case-based reasoning approach. We show these capabilities empirically on two different datasets, experiencing significant improvements over the case-based reasoning with the traditional Euclidean nearest neighbors. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

13. An empirical analysis of binary transformation strategies and base algorithms for multi-label learning.

Author: Rivolli, Adriano, Read, Jesse, Soares, Carlos, Pfahringer, Bernhard, and de Carvalho, André C. P. L. F.
Subjects: MACHINE learning, ALGORITHMS, LABELS, EVALUATION methodology
Abstract: Investigating strategies that are able to efficiently deal with multi-label classification tasks is a current research topic in machine learning. Many methods have been proposed, making the selection of the most suitable strategy a challenging issue. From this premise, this paper presents an extensive empirical analysis of the binary transformation strategies and base algorithms for multi-label learning. This subset of strategies uses the one-versus-all approach to transform the original data, generating one binary data set per label, upon which any binary base algorithm can be applied. Considering that the influence of the base algorithm on the predictive performance obtained by the strategies has not been considered in depth by many empirical studies, we investigated the influence of distinct base algorithms on the performance of several strategies. Thus, this study covers a family of multi-label strategies using a diversified range of base algorithms, exploring their relationship over different perspectives. This finding has significant implications concerning the methodology of evaluation adopted in multi-label experiments containing binary transformation strategies, given that multiple base algorithms should be considered. Despite these improvements in strategy and base algorithms, for many data sets, a large number of labels, mainly those less frequent, were either never predicted, or always misclassified. We conclude the experimental analysis by recommending strategies and base algorithms in accordance with different performance criteria. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
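
The one-versus-all (binary relevance) transformation described in this abstract can be sketched as follows. This is an illustrative sketch only, not code from the paper: the function names and the 1-nearest-neighbour base learner `fit_one_nn` are assumptions, and any binary base algorithm could be plugged in through `fit_base`.

```python
def binary_relevance_targets(Y, labels):
    """Build one 0/1 target vector per label from a list of label sets."""
    return {label: [int(label in y) for y in Y] for label in labels}


def train_binary_relevance(X, Y, labels, fit_base):
    """Train one independent binary model per label with the given base learner."""
    targets = binary_relevance_targets(Y, labels)
    return {label: fit_base(X, t) for label, t in targets.items()}


def predict_labels(models, x):
    """Return the set of labels whose binary model predicts 1 for instance x."""
    return {label for label, model in models.items() if model(x) == 1}


def fit_one_nn(X, y):
    """Hypothetical base learner: 1-nearest neighbour on squared Euclidean distance."""
    def predict(x):
        nearest = min(
            range(len(X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
        )
        return y[nearest]
    return predict


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    Y = [set(), {"b"}, {"a"}, {"a", "b"}]
    models = train_binary_relevance(X, Y, ["a", "b"], fit_one_nn)
    print(predict_labels(models, (0.9, 0.1)))
```

Swapping `fit_one_nn` for another learner changes only the `fit_base` argument, which is why the choice of base algorithm can be studied separately from the transformation strategy.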

14. Fast and scalable Lasso via stochastic Frank-Wolfe methods with a convergence guarantee.

Author: Frandi, Emanuele, Ñanculef, Ricardo, Lodi, Stefano, Sartori, Claudio, and Suykens, Johan
Subjects: STOCHASTIC convergence, ALGORITHMS, MACHINE learning, MATHEMATICAL optimization, REGRESSION analysis, MATHEMATICAL regularization
Abstract: Frank-Wolfe (FW) algorithms have often been proposed over the last few years as efficient solvers for a variety of optimization problems arising in the field of machine learning. The ability to work with cheap projection-free iterations and the incremental nature of the method make FW a very effective choice for many large-scale problems where computing a sparse model is desirable. In this paper, we present a high-performance implementation of the FW method tailored to solve large-scale Lasso regression problems, based on a randomized iteration, and prove that the convergence guarantees of the standard FW method are preserved in the stochastic setting. We show experimentally that our algorithm outperforms several existing state-of-the-art methods, including the Coordinate Descent algorithm by Friedman et al. (one of the fastest known Lasso solvers), on several benchmark datasets with a very large number of features, without sacrificing the accuracy of the model. Our results illustrate that the algorithm is able to generate the complete regularization path on problems of size up to four million variables in <1 min. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF
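
A minimal sketch of a randomized Frank-Wolfe iteration for the constrained Lasso, min 1/2 ||X a - y||^2 subject to ||a||_1 <= delta. It is not the authors' implementation: the function name `frank_wolfe_lasso`, the parameters `delta`, `sample_size`, and `seed`, the uniform coordinate sampling, and the exact line search are all assumptions chosen for illustration.

```python
import random


def frank_wolfe_lasso(X, y, delta, iterations=100, sample_size=None, seed=0):
    """Projection-free solver sketch for least squares over the L1 ball of radius delta."""
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    coef = [0.0] * p
    residual = [-yi for yi in y]  # X @ coef - y for coef = 0
    for _ in range(iterations):
        # Randomization (assumed form): inspect only a sampled subset of coordinates.
        if sample_size is None:
            candidates = range(p)
        else:
            candidates = rng.sample(range(p), min(sample_size, p))
        # Gradient coordinate j is the inner product of column j with the residual.
        grads = {j: sum(X[i][j] * residual[i] for i in range(n)) for j in candidates}
        j_star = max(grads, key=lambda j: abs(grads[j]))
        # The linear subproblem over the L1 ball is solved by a signed vertex.
        vertex_value = -delta if grads[j_star] > 0 else delta
        fitted = [r + yi for r, yi in zip(residual, y)]  # X @ coef
        x_dir = [vertex_value * X[i][j_star] - fitted[i] for i in range(n)]
        slope = sum(r * v for r, v in zip(residual, x_dir))  # gradient . direction
        curvature = sum(v * v for v in x_dir)
        if slope >= 0 or curvature == 0:
            continue  # this sampled vertex gives no descent direction
        gamma = min(1.0, -slope / curvature)  # exact line search clipped to [0, 1]
        coef = [(1.0 - gamma) * c for c in coef]
        coef[j_star] += gamma * vertex_value
        residual = [r + gamma * v for r, v in zip(residual, x_dir)]
    return coef


if __name__ == "__main__":
    print(frank_wolfe_lasso([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], delta=1.0))
```

Each iterate is a convex combination of vertices of the L1 ball, so the constraint holds without any projection, and each step changes the scale of all coefficients but adds at most one new nonzero, which is where the sparsity of the model comes from.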

15. A survey of class-imbalanced semi-supervised learning.

Author: Gui, Qian, Zhou, Hong, Guo, Na, and Niu, Baoning
Subjects: ARTIFICIAL neural networks, SUPERVISED learning, MACHINE learning, DEEP learning, ALGORITHMS
Abstract: Semi-supervised learning (SSL) can substantially improve the performance of deep neural networks by utilizing unlabeled data when labeled data is scarce. The state-of-the-art (SOTA) semi-supervised algorithms implicitly assume that the class distributions of the labeled and unlabeled datasets are balanced, which means the different classes have the same number of training samples. However, they can hardly perform well on minority classes when the class distribution of the training data is imbalanced. Recent work has found several ways to decrease the degeneration of semi-supervised learning models in class-imbalanced learning. In this article, we comprehensively review class-imbalanced semi-supervised learning (CISSL), starting with an introduction to this field, followed by a realistic evaluation of existing class-imbalanced semi-supervised learning algorithms and a brief summary of them. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

16. New algorithms for trace-ratio problem with application to high-dimension and large-sample data dimensionality reduction.

Author: Shi, Wenya and Wu, Gang
Subjects: SINGULAR value decomposition, DATA reduction, ALGORITHMS, LINEAR systems, MACHINE learning
Abstract: Learning large-scale data sets with high dimensionality is a main concern in research areas including machine learning, visual recognition, and information retrieval, to name a few. In many practical uses such as image, video, audio, and text processing, we have to face high-dimension and large-sample data problems. The trace-ratio problem is a key problem for feature extraction and dimensionality reduction to circumvent the high dimensional space. However, it has long been believed that this problem has no closed-form solution, and one has to solve it by using some inner-outer iterative algorithms that are very time consuming. Therefore, efficient algorithms for high-dimension and large-sample trace-ratio problems are still lacking, especially for dense data problems. In this work, we present a closed-form solution for the trace-ratio problem, and propose two algorithms to solve it. Based on the formula and the randomized singular value decomposition, we first propose a randomized algorithm for solving high-dimension and large-sample dense trace-ratio problems. For high-dimension and large-sample sparse trace-ratio problems, we then propose an algorithm based on the closed-form solution and solving some consistent under-determined linear systems. Theoretical results are established to show the rationality and efficiency of the proposed methods. Numerical experiments are performed on some real-world data sets, which illustrate the superiority of the proposed algorithms over many state-of-the-art algorithms for high-dimension and large-sample dimensionality reduction problems. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. Lifted discriminative learning of probabilistic logic programs.

Author: Nguembang Fadja, Arnaud and Riguzzi, Fabrizio
Subjects: LOGIC programming, STATISTICAL learning, ALGORITHMS, MACHINE learning, INDUCTIVE logic programming
Abstract: Probabilistic logic programming (PLP) provides a powerful tool for reasoning with uncertain relational models. However, learning probabilistic logic programs is expensive due to the high cost of inference. Among the proposals to overcome this problem, one of the most promising is lifted inference. In this paper we consider PLP models that are amenable to lifted inference and present an algorithm for performing parameter and structure learning of these models from positive and negative examples. We discuss parameter learning with EM and LBFGS and structure learning with LIFTCOVER, an algorithm similar to SLIPCOVER. The results of the comparison of LIFTCOVER with SLIPCOVER on 12 datasets show that it can achieve solutions of similar or better quality in a fraction of the time. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

18. Improved linear embeddings via Lagrange duality.

Author: Sheth, Kshiteej, Garg, Dinesh, and Dasgupta, Anirban
Subjects: EMBEDDINGS (Mathematics), DATA science, MACHINE learning, POLYNOMIAL time algorithms, MATHEMATICAL optimization, ALGORITHMS, APPROXIMATION theory
Abstract: Near-isometric orthogonal embeddings to lower dimensions are a fundamental tool in data science and machine learning. In this paper, we present the construction of such embeddings that minimizes the maximum distortion for a given set of points. We formulate the problem as a non-convex constrained optimization problem. We first construct a primal relaxation and then use the theory of Lagrange duality to create a dual relaxation. We also suggest a polynomial time algorithm based on the theory of convex optimization to solve the dual relaxation provably. We provide a theoretical upper bound on the approximation guarantees for our algorithm, which depends only on the spectral properties of the dataset. We experimentally demonstrate the superiority of our algorithm compared to baselines in terms of scalability and the ability to achieve lower distortion. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

19. A decomposition of the outlier detection problem into a set of supervised learning problems.

Author: Paulheim, Heiko and Meusel, Robert
Subjects: OUTLIER detection, SUPERVISED learning, DECOMPOSITION method, DATA analysis, ALGORITHMS, DEVIATION (Statistics)
Abstract: Outlier detection methods automatically identify instances that deviate from the majority of the data. In this paper, we propose a novel approach for unsupervised outlier detection, which re-formulates the outlier detection problem in numerical data as a set of supervised regression learning problems. For each attribute, we learn a predictive model which predicts the values of that attribute from the values of all other attributes, and compute the deviations between the predictions and the actual values. From those deviations, we derive both a weight for each attribute, and a final outlier score using those weights. The weights help separate the relevant attributes from the irrelevant ones, and thus make the approach well suited for discovering outliers otherwise masked in high-dimensional data. An empirical evaluation shows that our approach outperforms existing algorithms, and is particularly robust in datasets with many irrelevant attributes. Furthermore, we show that if a symbolic machine learning method is used to solve the individual learning problems, the approach is also capable of generating concise explanations for the detected outliers. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF
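
The attribute-wise decomposition described in this abstract can be illustrated with a small sketch. Everything specific here is an assumption rather than the paper's method: the leave-one-out k-nearest-neighbour regressor stands in for whatever regression learner is used, and the weighting (one minus the relative root squared error, floored at zero) is one plausible way to down-weight attributes that cannot be predicted from the others.

```python
import math


def knn_loo_predictions(rows, target, k=2):
    """Leave-one-out k-NN regression: predict column `target` from all other columns."""
    others = [j for j in range(len(rows[0])) if j != target]
    preds = []
    for i, row in enumerate(rows):
        # Distances to every other instance, measured on the non-target attributes only.
        dists = sorted(
            (math.dist([row[j] for j in others], [r[j] for j in others]), idx)
            for idx, r in enumerate(rows)
            if idx != i
        )
        neighbours = [idx for _, idx in dists[:k]]
        preds.append(sum(rows[idx][target] for idx in neighbours) / len(neighbours))
    return preds


def outlier_scores(rows, k=2):
    """Weighted root-mean-square prediction deviation per instance."""
    n, d = len(rows), len(rows[0])
    sq_err = [[0.0] * d for _ in range(n)]
    weights = []
    for a in range(d):
        preds = knn_loo_predictions(rows, a, k)
        mean_a = sum(r[a] for r in rows) / n
        for i in range(n):
            sq_err[i][a] = (rows[i][a] - preds[i]) ** 2
        model_sse = sum(sq_err[i][a] for i in range(n))
        baseline_sse = sum((r[a] - mean_a) ** 2 for r in rows)
        # Assumed weighting: an attribute predicted no better than its mean gets weight 0.
        rrse = math.sqrt(model_sse / baseline_sse) if baseline_sse > 0 else 1.0
        weights.append(1.0 - min(1.0, rrse))
    total = sum(weights)
    if total == 0:
        return [0.0] * n
    return [
        math.sqrt(sum(w * e for w, e in zip(weights, sq_err[i])) / total)
        for i in range(n)
    ]


if __name__ == "__main__":
    data = [(float(x), 2.0 * x) for x in range(10)] + [(5.0, 30.0)]
    scores = outlier_scores(data)
    print(max(range(len(scores)), key=scores.__getitem__))
```

In the usage example the last point breaks the linear relation between the two attributes, so its deviations, and therefore its score, are the largest.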

20. On evaluating stream learning algorithms.

Author: Gama, João, Sebastião, Raquel, and Rodrigues, Pedro
Subjects: MACHINE learning, ALGORITHMS, STREAMING technology, STATISTICAL hypothesis testing, DECISION making, ERRORS
Abstract: Most streaming decision models evolve continuously over time, run in resource-aware environments, and detect and react to changes in the environment generating data. One important issue, not yet convincingly addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of prequential error with forgetting mechanisms to provide reliable error estimators. We prove that, in stationary data and for consistent learning algorithms, the holdout estimator, the prequential error and the prequential error estimated over a sliding window or using fading factors, all converge to the Bayes error. The use of prequential error with forgetting mechanisms proves to be advantageous in assessing performance and in comparing stream learning algorithms. It is also worthwhile to use the proposed methods for hypothesis testing and for change detection. In a set of experiments in drift scenarios, we evaluate the ability of a standard change detection algorithm to detect change using three prequential error estimators. These experiments point out that the use of forgetting mechanisms (sliding windows or fading factors) is required for fast and efficient change detection. In comparison to sliding windows, fading factors are faster and memoryless, both important requirements for streaming applications. Overall, this paper is a contribution to a discussion on best practice for performance assessment when learning is a continuous process, and the decision models are dynamic and evolve over time. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF
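
Illustrative note on entry 20: the abstract mentions estimating prequential error with fading factors. The following is a minimal sketch of one common form of such an estimator, in which both the accumulated loss and the observation count are discounted by a factor alpha at every step. The function name `prequential_fading` and the default `alpha=0.99` are assumptions made for illustration and are not taken from the paper.

```python
def prequential_fading(losses, alpha=0.99):
    """Return the faded prequential error after each observed loss.

    losses: iterable of per-example losses (e.g. 0/1 errors), each obtained
            by testing the model on an example before training on it.
    alpha:  fading factor in (0, 1]; alpha = 1 gives the plain running mean.
    """
    faded_sum = 0.0    # discounted sum of losses
    faded_count = 0.0  # discounted number of observations
    estimates = []
    for loss in losses:
        faded_sum = loss + alpha * faded_sum
        faded_count = 1.0 + alpha * faded_count
        estimates.append(faded_sum / faded_count)
    return estimates
```

Only two running numbers are stored, which is consistent with the abstract's remark that fading factors are memoryless compared with sliding windows.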

21. Experiment databases.

Author: Vanschoren, Joaquin, Blockeel, Hendrik, Pfahringer, Bernhard, and Holmes, Geoffrey
Subjects: DATABASES, MACHINE learning, EXPERIMENTS, EXPERIMENTAL design, ALGORITHMS
Abstract: Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse these experiments in further research, or reproduce them to verify the claims made. In this paper, we present a collaboration framework designed to easily share machine learning experiments with the community, and automatically organize them in public databases. This enables immediate reuse of experiments for subsequent, possibly much broader investigation and offers faster and more thorough analysis based on a large set of varied results. We describe how we designed such an experiment database, currently holding over 650,000 classification experiments, and demonstrate its use by answering a wide range of interesting research questions and by verifying a number of recent studies. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

22. Adaptive partitioning schemes for bipartite ranking.

Author: Clémençon, Stéphan, Depecker, Marine, and Vayatis, Nicolas
Subjects: RECURSIVE partitioning, BIPARTITE graphs, MACHINE learning, ALGORITHMS, MATHEMATICAL variables
Abstract: Recursive partitioning methods are among the most popular techniques in machine learning. The purpose of this paper is to investigate how to adapt this methodology to the bipartite ranking problem. Following in the footsteps of the TreeRank approach developed in Clémençon and Vayatis (Proceedings of the 2008 Conference on Algorithmic Learning Theory, and IEEE Trans. Inf. Theory 55(9):4316-4336), we present tree-structured algorithms designed for learning to rank instances based on classification data. The main contributions of the present work are the following: the practical implementation of the TreeRank algorithm, and well-founded solutions to the crucial issues related to the splitting rule and the choice of the 'right' size for the ranking tree. From the angle embraced in this paper, splitting is viewed as a cost-sensitive classification task with data-dependent cost. Hence, up to straightforward modifications, any classification algorithm may serve as a splitting rule. Also, we propose to implement a cost-complexity pruning method after the growing stage in order to produce a 'right-sized' ranking sub-tree with large AUC. In particular, performance bounds are established for pruning schemes inspired by recent work on nonparametric model selection. Finally, we propose indicators for variable importance and variable dependence, plus various simulation studies illustrating the potential of our method. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

23. Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning.

Author: Crandall, Jacob and Goodrich, Michael
Subjects: REINFORCEMENT learning, MACHINE learning, ALGORITHMS, GAME theory, LEARNING
Abstract: We consider the problem of learning in repeated general-sum matrix games when a learning algorithm can observe the actions but not the payoffs of its associates. Due to the non-stationarity of the environment caused by learning associates in these games, most state-of-the-art algorithms perform poorly in some important repeated games due to an inability to make profitable compromises. To make these compromises, an agent must effectively balance competing objectives, including bounding losses, playing optimally with respect to current beliefs, and taking calculated, but profitable, risks. In this paper, we present, discuss, and analyze M-Qubed, a reinforcement learning algorithm designed to overcome these deficiencies by encoding and balancing best-response, cautious, and optimistic learning biases. We show that M-Qubed learns to make profitable compromises across a wide-range of repeated matrix games played with many kinds of learners. Specifically, we prove that M-Qubed's average payoffs meet or exceed its maximin value in the limit. Additionally, we show that, in two-player games, M-Qubed's average payoffs approach the value of the Nash bargaining solution in self play. Furthermore, it performs very well when associating with other learners, as evidenced by its robust behavior in round-robin and evolutionary tournaments of two-player games. These results demonstrate that an agent can learn to make good compromises, and hence receive high payoffs, in repeated games by effectively encoding and balancing best-response, cautious, and optimistic learning biases. [ABSTRACT FROM AUTHOR]
Published: 2011
Full Text: View/download PDF

24. On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms.

Author: Shalev-Shwartz, Shai and Singer, Yoram
Subjects: ALGORITHMS, MACHINE learning, MACHINE theory, ARTIFICIAL intelligence, LEARNING
Abstract: Boosting algorithms build highly accurate prediction mechanisms from a collection of low-accuracy predictors. To do so, they employ the notion of weak-learnability. The starting point of this paper is a proof which shows that weak learnability is equivalent to linear separability with ℓ1 margin. The equivalence is a direct consequence of von Neumann’s minimax theorem. Nonetheless, we derive the equivalence directly using Fenchel duality. We then use our derivation to describe a family of relaxations to the weak-learnability assumption that readily translates to a family of relaxations of linear separability with margin. This alternative perspective sheds new light on known soft-margin boosting algorithms and also enables us to derive several new relaxations of the notion of linear separability. Last, we describe and analyze an efficient boosting framework that can be used for minimizing the loss functions derived from our family of relaxations. In particular, we obtain efficient boosting algorithms for maximizing hard and soft versions of the ℓ1 margin. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF

25. The generalization performance of ERM algorithm with strongly mixing observations.

Author: Zou, Bin, Li, Luoqing, and Xu, Zongben
Subjects: GENERALIZATION, ALGORITHMS, MACHINE learning, STOCHASTIC convergence, EXPONENTIAL functions, RISK
Abstract: The generalization performance is the main concern of machine learning theoretical research. The previous main bounds describing the generalization ability of the Empirical Risk Minimization (ERM) algorithm are based on independent and identically distributed (i.i.d.) samples. In order to study the generalization performance of the ERM algorithm with dependent observations, we first establish the exponential bound on the rate of relative uniform convergence of the ERM algorithm with exponentially strongly mixing observations, and then we obtain the generalization bounds and prove that the ERM algorithm with exponentially strongly mixing observations is consistent. The main results obtained in this paper not only extend the previously known results for i.i.d. observations to the case of exponentially strongly mixing observations, but also improve the previous results for strongly mixing samples. Because the ERM algorithm is usually very time-consuming and overfitting may happen when the complexity of the hypothesis space is high, as an application of our main results we also explore a new strategy to implement the ERM algorithm in high complexity hypothesis space. [ABSTRACT FROM AUTHOR]
Published: 2009
Full Text: View/download PDF

26. Joint feature re-extraction and classification using an iterative semi-supervised support vector machine algorithm.

Author: Yuanqing Li and Cuntai Guan
Subjects: MACHINE learning, ARTIFICIAL intelligence, MACHINE theory, ALGORITHMS, FEATURE extraction, ELECTROENCEPHALOGRAPHY
Abstract: The focus of this paper is on joint feature re-extraction and classification in cases when the training data set is small. An iterative semi-supervised support vector machine (SVM) algorithm is proposed, where each iteration consists of both feature re-extraction and classification, and the feature re-extraction is based on the classification results from the previous iteration. Feature extraction is first discussed in the framework of Rayleigh coefficient maximization. The effectiveness of common spatial pattern (CSP) feature, which is commonly used in Electroencephalogram (EEG) data analysis and EEG-based brain computer interfaces (BCIs), can be explained by Rayleigh coefficient maximization. Two other features are also defined using the Rayleigh coefficient. These features are effective for discriminating two classes with different means or different variances. If we extract features based on Rayleigh coefficient maximization, a large training data set with labels is required in general; otherwise, the extracted features are not reliable. Thus we present an iterative semi-supervised SVM algorithm embedded with feature re-extraction. This iterative algorithm can be used to extract these three features reliably and perform classification simultaneously in cases where the training data set is small. Each iteration is composed of two main steps: (i) the training data set is updated/augmented using unlabeled test data with their predicted labels; features are re-extracted based on the augmented training data set. (ii) The re-extracted features are classified by a standard SVM. Regarding parameter setting and model selection of our algorithm, we also propose a semi-supervised learning-based method using the Rayleigh coefficient, in which both training data and test data are used. This method is suitable when cross-validation model selection may not work for a small training data set. Finally, the results of data analysis are presented to demonstrate the validity of our approach. [ABSTRACT FROM AUTHOR]
Published: 2008
Full Text: View/download PDF

27. Generalized exploration in policy search.

Author: Hoof, Herke, Tanneberg, Daniel, and Peters, Jan
Subjects: REINFORCEMENT learning, MACHINE learning, ALGORITHMS, ENTROPY (Information theory), ERGODIC theory
Abstract: To learn control policies in unknown environments, learning agents need to explore by trying actions deemed suboptimal. In prior work, such exploration is performed by either perturbing the actions at each time-step independently, or by perturbing policy parameters over an entire episode. Since both of these strategies have certain advantages, a more balanced trade-off could be beneficial. We introduce a unifying view on step-based and episode-based exploration that allows for such balanced trade-offs. This trade-off strategy can be used with various reinforcement learning algorithms. In this paper, we study this generalized exploration strategy in a policy gradient method and in relative entropy policy search. We evaluate the exploration strategy on four dynamical systems and compare the results to the established step-based and episode-based exploration strategies. Our results show that a more balanced trade-off can yield faster learning and better final policies, and illustrate some of the effects that cause these performance differences. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

28. Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors.

Author: Ting, Kai, Washio, Takashi, Wells, Jonathan, and Aryal, Sunil
Subjects: LEARNING curve, ALGORITHMS, DETECTORS, MACHINE learning, INFORMATION processing
Abstract: Conventional wisdom in machine learning says that all algorithms are expected to follow the trajectory of a learning curve which is often colloquially referred to as 'more data the better'. We call this 'the gravity of learning curve', and it is assumed that no learning algorithms are 'gravity-defiant'. Contrary to the conventional wisdom, this paper provides the theoretical analysis and the empirical evidence that nearest neighbour anomaly detectors are gravity-defiant algorithms. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

29. Triadic Formal Concept Analysis and triclustering: searching for optimal patterns.

Author: Mirkin, Boris, Ignatov, Dmitry, Gnatyshak, Dmitry, and Kuznetsov, Sergei
Subjects: CLUSTER analysis (Statistics), SEQUENTIAL pattern mining, ALGORITHMS, DENSITY, NP-complete problems, LEAST squares, MACHINE learning, LAPLACIAN matrices
Abstract: This paper presents several definitions of 'optimal patterns' in triadic data and results of experimental comparison of five triclustering algorithms on real-world and synthetic datasets. The evaluation is carried over such criteria as resource efficiency, noise tolerance and quality scores involving cardinality, density, coverage, and diversity of the patterns. An ideal triadic pattern is a totally dense maximal cuboid (formal triconcept). Relaxations of this notion under consideration are: OAC-triclusters; triclusters optimal with respect to the least-square criterion; and graph partitions obtained by using spectral clustering. We show that searching for an optimal tricluster cover is an NP-complete problem, whereas determining the number of such covers is #P-complete. Our extensive computational experiments lead us to a clear strategy for choosing a solution at a given dataset guided by the principle of Pareto-optimality according to the proposed criteria. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

30. Consensus hashing.

Author: Leng, Cong and Cheng, Jian
Subjects: HASHING, NEAREST neighbor analysis (Statistics), MACHINE learning, INFORMATION retrieval, ALGORITHMS, BIG data, IMAGE recognition (Computer vision)
Abstract: Hashing techniques have been widely used in many machine learning applications because of their efficiency in both computation and storage. Although a variety of hashing methods have been proposed, most of them make some implicit assumptions about the statistical or geometrical structure of data. In fact, few hashing algorithms can adequately handle all kinds of data with different structures. When considering hybrid structure datasets, different hashing algorithms might produce different and possibly inconsistent binary codes. Inspired by the successes of classifier combination and clustering ensembles, in this paper, we present a novel combination strategy for multiple hashing results, named consensus hashing. By defining the measure of consensus of two hashing results, we put forward a simple yet effective model to learn consensus hash functions which generate binary codes consistent with the existing ones. Extensive experiments on several large scale benchmarks demonstrate the overall superiority of the proposed method compared with state-of-the-art hashing algorithms. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

31. Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning.

Author: Lin, Zhouchen, Liu, Risheng, and Li, Huan
Subjects: MACHINE learning, CONVEX programming, SUPERVISED learning, ALGORITHMS, MACHINE theory
Abstract: Many problems in machine learning and other fields can be (re)formulated as linearly constrained separable convex programs. In most of the cases, there are multiple blocks of variables. However, the traditional alternating direction method (ADM) and its linearized version (LADM, obtained by linearizing the quadratic penalty term) are for the two-block case and cannot be naively generalized to solve the multi-block case. So there is great demand on extending the ADM based methods for the multi-block case. In this paper, we propose LADM with parallel splitting and adaptive penalty (LADMPSAP) to solve multi-block separable convex programs efficiently. When all the component objective functions have bounded subgradients, we obtain convergence results that are stronger than those of ADM and LADM, e.g., allowing the penalty parameter to be unbounded and proving the sufficient and necessary conditions for global convergence. We further propose a simple optimality measure and reveal the convergence rate of LADMPSAP in an ergodic sense. For programs with extra convex set constraints, with refined parameter estimation we devise a practical version of LADMPSAP for faster convergence. Finally, we generalize LADMPSAP to handle programs with more difficult objective functions by linearizing part of the objective function as well. LADMPSAP is particularly suitable for sparse representation and low-rank recovery problems because its subproblems have closed form solutions and the sparsity and low-rankness of the iterates can be preserved during the iteration. It is also highly parallelizable and hence fits for parallel or distributed computing. Numerical experiments testify to the advantages of LADMPSAP in speed and numerical accuracy. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

32. Adaptive Euclidean maps for histograms: generalized Aitchison embeddings.

Author: Le, Tam and Cuturi, Marco
Subjects: HISTOGRAMS, MACHINE learning, EMBEDDINGS (Mathematics), EUCLIDEAN distance, ALGORITHMS
Abstract: Learning distances that are specifically designed to compare histograms in the probability simplex has recently attracted the attention of the machine learning community. Learning such distances is important because most machine learning problems involve bags of features rather than simple vectors. Ample empirical evidence suggests that the Euclidean distance in general and Mahalanobis metric learning in particular may not be suitable to quantify distances between points in the simplex. We propose in this paper a new contribution to address this problem by generalizing a family of embeddings proposed by Aitchison (J R Stat Soc 44:139-177) to map the probability simplex onto a suitable Euclidean space. We provide algorithms to estimate the parameters of such maps by building on previous work on metric learning approaches. The criterion we study is not convex, and we consider alternating optimization schemes as well as accelerated gradient descent approaches. These algorithms lead to representations that outperform alternative approaches to compare histograms in a variety of contexts. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

33. Learning policies for battery usage optimization in electric vehicles.

Author: Ermon, Stefano, Xue, Yexiang, Gomes, Carla, and Selman, Bart
Subjects: ELECTRIC vehicles, ELECTRIC batteries, MACHINE learning, SUPERCAPACITORS, SERVICE life, ALGORITHMS
Abstract: The high cost, limited capacity, and long recharge time of batteries pose a number of obstacles for the widespread adoption of electric vehicles. Multi-battery systems that combine a standard battery with supercapacitors are currently one of the most promising ways to increase battery lifespan and reduce operating costs. However, their performance crucially depends on how they are designed and operated. In this paper, we formalize the problem of optimizing real-time energy management of multi-battery systems as a stochastic planning problem, and we propose a novel solution based on a combination of optimization, machine learning and data-mining techniques. We evaluate the performance of our intelligent energy management system on various large datasets of commuter trips crowdsourced in the United States. We show that our policy significantly outperforms the leading algorithms that were previously proposed as part of an open algorithmic challenge. Further, we show how to extend our approach to an incremental learning setting, where the policy is capable of improving and adapting as new data is being collected over time. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

34. Preference-based learning to rank.

Author: Ailon, Nir and Mohri, Mehryar
Subjects: MACHINE learning, ALGORITHMS, ALGEBRA, MACHINE theory, ARTIFICIAL intelligence
Abstract: This paper presents an efficient preference-based ranking algorithm running in two stages. In the first stage, the algorithm learns a preference function defined over pairs, as in a standard binary classification problem. In the second stage, it makes use of that preference function to produce an accurate ranking, thereby reducing the learning problem of ranking to binary classification. This reduction is based on the familiar QuickSort and guarantees an expected pairwise misranking loss of at most twice that of the binary classifier derived in the first stage. Furthermore, in the important special case of bipartite ranking, the factor of two in loss is reduced to one. This improved bound also applies to the regret achieved by our ranking and that of the binary classifier obtained. Our algorithm is randomized, but we prove a lower bound for any deterministic reduction of ranking to binary classification showing that randomization is necessary to achieve our guarantees. This, and a recent result by Balcan et al., who show a regret bound of two for a deterministic algorithm in the bipartite case, suggest a trade-off between achieving low regret and determinism in this context. Our reduction also admits an improved running time guarantee with respect to that deterministic algorithm. In particular, the number of calls to the preference function in the reduction is improved from Ω(n²) to O(n log n). In addition, when the top k ranked elements only are required (k ≪ n), as in many applications in information extraction or search engine design, the time complexity of our algorithm can be further reduced to O(k log k + n). Our algorithm is thus practical for realistic applications where the number of points to rank exceeds several thousand. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF
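
Illustrative note on entry 34: the abstract describes a second stage that turns a learned pairwise preference function into a ranking through a randomized QuickSort. The sketch below shows that general idea only and is not the authors' implementation. The names `quicksort_rank` and `prefers` are hypothetical, and `prefers(a, b)` stands in for whatever binary classifier the first stage produced; it is assumed to return True when `a` should be ranked ahead of `b`.

```python
import random


def quicksort_rank(items, prefers, rng=None):
    """Rank items with a randomized QuickSort driven by a preference function.

    items:   sequence of objects to rank.
    prefers: callable (a, b) -> bool, True if a should precede b.
             It may be noisy or non-transitive; each item is compared
             only against the current pivot.
    rng:     optional random.Random instance, for reproducible pivots.
    """
    if rng is None:
        rng = random.Random()
    if len(items) <= 1:
        return list(items)
    pivot_index = rng.randrange(len(items))  # random pivot choice
    pivot = items[pivot_index]
    ahead, behind = [], []
    for index, item in enumerate(items):
        if index == pivot_index:
            continue
        # One preference call per non-pivot item at this recursion level.
        if prefers(item, pivot):
            ahead.append(item)
        else:
            behind.append(item)
    return (quicksort_rank(ahead, prefers, rng)
            + [pivot]
            + quicksort_rank(behind, prefers, rng))
```

With a perfectly consistent preference such as `lambda a, b: a > b`, the result is simply the items in descending order; the expected number of preference calls is O(n log n), the usual bound for randomized QuickSort.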

35. Extracting certainty from uncertainty: regret bounded by variation in costs.

Author: Hazan, Elad and Kale, Satyen
Subjects: MACHINE learning, LOGICAL prediction, FORECASTING, ALGORITHMS, ONLINE algorithms, COMPUTER algorithms
Abstract: Prediction from expert advice is a fundamental problem in machine learning. A major pillar of the field is the existence of learning algorithms whose average loss approaches that of the best expert in hindsight (in other words, whose average regret approaches zero). Traditionally the regret of online algorithms was bounded in terms of the number of prediction rounds. Cesa-Bianchi, Mansour and Stoltz (Mach. Learn. 66(2–3):321–352) posed the question whether it is possible to bound the regret of an online algorithm by the variation of the observed costs. In this paper we resolve this question, and prove such bounds in the fully adversarial setting, in two important online learning scenarios: prediction from expert advice, and online linear optimization. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF

36. Concept learning in description logics using refinement operators.

Author: Lehmann, Jens and Hitzler, Pascal
Subjects: SEMANTIC Web, MACHINE learning, LOGIC programming, DESCRIPTION logics, REASONING, ALGORITHMS
Abstract: With the advent of the Semantic Web, description logics have become one of the most prominent paradigms for knowledge representation and reasoning. Progress in research and applications, however, is constrained by the lack of well-structured knowledge bases consisting of a sophisticated schema and instance data adhering to this schema. It is paramount that suitable automated methods for their acquisition, maintenance, and evolution be developed. In this paper, we provide a learning algorithm based on refinement operators for the description logic ALCQ including support for concrete roles. We develop the algorithm from thorough theoretical foundations by identifying possible abstract property combinations which refinement operators for description logics can have. Using these investigations as a basis, we derive a practically useful complete and proper refinement operator. The operator is then cast into a learning algorithm and evaluated using our implementation DL-Learner. The results of the evaluation show that our approach is superior to other learning approaches on description logics, and is competitive with established ILP systems. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF

37. On structured output training: hard cases and an efficient alternative.

Author: Gärtner, Thomas and Vembu, Shankar
Subjects: ALGORITHMS, PREDICTION models, MATHEMATICAL models, MACHINE learning, MACHINE theory, ARTIFICIAL intelligence
Abstract: We consider a class of structured prediction problems for which the assumptions made by state-of-the-art algorithms fail. To deal with exponentially sized output sets, these algorithms assume, for instance, that the best output for a given input can be found efficiently. While this holds for many important real world problems, there are also many relevant and seemingly simple problems where these assumptions do not hold. In this paper, we consider route prediction, which is the problem of finding a cyclic permutation of some points of interest, as an example and show that state-of-the-art approaches cannot guarantee polynomial runtime for this output set. We then present a novel formulation of the learning problem that can be trained efficiently whenever a particular ‘super-structure counting’ problem can be solved efficiently for the output set. We also list several output sets for which this assumption holds and report experimental results. [ABSTRACT FROM AUTHOR]
Published: 2009
Full Text: View/download PDF

38. An efficient algorithm for learning to rank from preference graphs.

Author: Pahikkala, Tapio, Tsivtsivadze, Evgeni, Airola, Antti, Järvinen, Jouni, and Boberg, Jorma
Subjects: RANKING, ALGORITHMS, MACHINE learning, LEAST squares, KERNEL functions, GEOMETRIC function theory, MATHEMATICAL statistics, REGRESSION analysis
Abstract: In this paper, we introduce a framework for regularized least-squares (RLS) type of ranking cost functions and we propose three such cost functions. Further, we propose a kernel-based preference learning algorithm, which we call RankRLS, for minimizing these functions. It is shown that RankRLS has many computational advantages compared to the ranking algorithms that are based on minimizing other types of costs, such as the hinge cost. In particular, we present efficient algorithms for training, parameter selection, multiple output learning, cross-validation, and large-scale learning. Circumstances under which these computational benefits make RankRLS preferable to RankSVM are considered. We evaluate RankRLS on four different types of ranking tasks using RankSVM and the standard RLS regression as the baselines. RankRLS outperforms the standard RLS regression and its performance is very similar to that of RankSVM, while RankRLS has several computational benefits over RankSVM. [ABSTRACT FROM AUTHOR]
Published: 2009
Full Text: View/download PDF

39. Layered critical values: a powerful direct-adjustment approach to discovering significant patterns.

Author: Geoffrey Webb
Subjects: MACHINE learning, ARTIFICIAL intelligence, COMPUTATIONAL learning theory, CONTROL theory (Engineering), CODING theory, MATHEMATICAL models, ALGORITHMS, MATHEMATICAL logic
Abstract: Standard pattern discovery techniques, such as association rules, suffer an extreme risk of finding very large numbers of spurious patterns for many knowledge discovery tasks. The direct-adjustment approach to controlling this risk applies a statistical test during the discovery process, using a critical value adjusted to take account of the size of the search space. However, a problem with the direct-adjustment strategy is that it may discard numerous true patterns. This paper investigates the assignment of different critical values to different areas of the search space as an approach to alleviating this problem, using a variant of a technique originally developed for other purposes. This approach is shown to be effective at increasing the number of discoveries while still maintaining strict control over the risk of false discoveries. [ABSTRACT FROM AUTHOR]
Published: 2008
Full Text: View/download PDF

40. Inductive process modeling.

Author: Will Bridewell, Pat Langley, Ljupčo Todorovski, and Sašo Džeroski
Subjects: MACHINE learning, ARTIFICIAL intelligence, MACHINE theory, DATA mining, ALGORITHMS, MATHEMATICAL models
Abstract: In this paper, we pose a novel research problem for machine learning that involves constructing a process model from continuous data. We claim that casting learned knowledge in terms of processes with associated equations is desirable for scientific and engineering domains, where such notations are commonly used. We also argue that existing induction methods are not well suited to this task, although some techniques hold partial solutions. In response, we describe an approach to learning process models from time-series data and illustrate its behavior in three domains. In closing, we describe open issues in process model induction and encourage other researchers to tackle this important problem. [ABSTRACT FROM AUTHOR]
Published: 2008
Full Text: View/download PDF

41. Multi-Class Learning by Smoothed Boosting.

Author: Rong Jin and Jian Zhang
Subjects: ALGORITHMS, MACHINE learning, ALGEBRA, FOUNDATIONS of arithmetic
Abstract: AdaBoost.OC has been shown to be an effective method in boosting “weak” binary classifiers for multi-class learning. It employs the Error-Correcting Output Code (ECOC) method to convert a multi-class learning problem into a set of binary classification problems, and applies the AdaBoost algorithm to solve them efficiently. One of the main drawbacks with the AdaBoost.OC algorithm is that it is sensitive to the noisy examples and tends to overfit training examples when they are noisy. In this paper, we propose a new boosting algorithm, named “MSmoothBoost”, which introduces a smoothing mechanism into the boosting procedure to explicitly address the overfitting problem with AdaBoost.OC. We proved the bounds for both the empirical training error and the marginal training error of the proposed boosting algorithm. Empirical studies with seven UCI datasets and one real-world application have indicated that the proposed boosting algorithm is more robust and effective than the AdaBoost.OC algorithm for multi-class learning. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

42. TAN Classifiers Based on Decomposable Distributions.

Author: Jesús Cerquides and Ramon López de Mántaras
Subjects: MACHINE learning, ALGORITHMS, MATHEMATICAL variables, ARTIFICIAL intelligence, DATA mining
Abstract: In this paper we present several Bayesian algorithms for learning Tree Augmented Naive Bayes (TAN) models. We extend the results in Meila & Jaakkola (2000a) to TANs by proving that accepting a prior decomposable distribution over TANs, we can compute the exact Bayesian model averaging over TAN structures and parameters in polynomial time. Furthermore, we prove that the k-maximum a posteriori (MAP) TAN structures can also be computed in polynomial time. We use these results to correct minor errors in Meila & Jaakkola (2000a) and to construct several TAN based classifiers. We show that these classifiers provide consistently better predictions over Irvine datasets and artificially generated data than TAN based classifiers proposed in the literature. [ABSTRACT FROM AUTHOR]
Published: 2005
Full Text: View/download PDF

43. Microchoice Bounds and Self Bounding Learning Algorithms.

Author: John Langford and Avrim Blum
Subjects: ALGORITHMS, MACHINE learning, ARTIFICIAL intelligence, MACHINE theory
Abstract: A major topic in machine learning is to determine good upper bounds on the true error rates of learned hypotheses based upon their empirical performance on training data. In this paper, we demonstrate new adaptive bounds designed for learning algorithms that operate by making a sequence of choices. These bounds, which we call Microchoice bounds, are similar to Occam-style bounds and can be used to make learning algorithms self-bounding in the style of Freund (1998). We then show how to combine these bounds with Freund's query-tree approach producing a version of Freund's query-tree structure that can be implemented with much more algorithmic efficiency. [ABSTRACT FROM AUTHOR]
Published: 2003
Full Text: View/download PDF

44. Faster Riemannian Newton-type optimization by subsampling and cubic regularization.

Author: Deng, Yian and Mu, Tingting
Subjects: OPTIMIZATION algorithms, MACHINE learning, PROBLEM solving, MATHEMATICAL regularization, RIEMANNIAN manifolds, ALGORITHMS
Abstract: This work is on constrained large-scale non-convex optimization where the constraint set implies a manifold structure. Solving such problems is important in a multitude of fundamental machine learning tasks. Recent advances on Riemannian optimization have enabled the convenient recovery of solutions by adapting unconstrained optimization algorithms over manifolds. However, it remains challenging to scale up and meanwhile maintain stable convergence rates and handle saddle points. We propose a new second-order Riemannian optimization algorithm, aiming at improving convergence rate and reducing computational cost. It enhances the Riemannian trust-region algorithm that explores curvature information to escape saddle points through a mixture of subsampling and cubic regularization techniques. We conduct rigorous analysis to study the convergence behavior of the proposed algorithm. We also perform extensive experiments to evaluate it based on two general machine learning tasks using multiple datasets. The proposed algorithm exhibits improved computational speed, e.g., a speed improvement from 12% to 227%, and improved convergence behavior, e.g., an iteration number reduction from O(max(ϵ_g^{-2} ϵ_H^{-1}, ϵ_H^{-3})) to O(max(ϵ_g^{-2}, ϵ_H^{-3})), compared to a large set of state-of-the-art Riemannian optimization algorithms. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

45. An accelerated proximal algorithm for regularized nonconvex and nonsmooth bi-level optimization.

Author: Chen, Ziyi, Kailkhura, Bhavya, and Zhou, Yi
Subjects: BILEVEL programming, NONSMOOTH optimization, OPTIMIZATION algorithms, ALGORITHMS, MACHINE learning, CONVEX geometry
Abstract: Many important machine learning applications involve regularized nonconvex bi-level optimization. However, the existing gradient-based bi-level optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and they suffer from a high computation complexity in nonconvex bi-level optimization. In this work, we study a proximal gradient-type algorithm that adopts the approximate implicit differentiation (AID) scheme for nonconvex bi-level optimization with possibly nonconvex and nonsmooth regularizers. In particular, the algorithm applies Nesterov's momentum to accelerate the computation of the implicit gradient involved in AID. We provide a comprehensive analysis of the global convergence properties of this algorithm through identifying its intrinsic potential function. In particular, we formally establish the convergence of the model parameters to a critical point of the bi-level problem, and obtain an improved computation complexity Õ(κ^{3.5} ϵ^{-2}) over the state-of-the-art result. Moreover, we analyze the asymptotic convergence rates of this algorithm under a class of local nonconvex geometries characterized by a Łojasiewicz-type gradient inequality. Experiment on hyper-parameter optimization demonstrates the effectiveness of our algorithm. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

46. Imbalanced regression using regressor-classifier ensembles.

Author: Orhobor, Oghenejokpeme I., Grinberg, Nastasiya F., Soldatova, Larisa N., and King, Ross D.
Subjects: MACHINE learning, ALGORITHMS, CLASSIFICATION algorithms
Abstract: We present an extension to the federated ensemble regression using classification algorithm, an ensemble learning algorithm for regression problems which leverages the distribution of the samples in a learning set to achieve improved performance. We evaluated the extension using four classifiers and four regressors, two discretizers, and 119 responses from a wide variety of datasets in different domains. Additionally, we compared our algorithm to two resampling methods aimed at addressing imbalanced datasets. Our results show that the proposed extension is highly unlikely to perform worse than the base case, and on average outperforms the two resampling methods with significant differences in performance. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

47. A parameter-less algorithm for tensor co-clustering.

Author: Battaglia, Elena and Pensa, Ruggero G.
Subjects: CYBER physical systems, ALGORITHMS, RANDOM variables, MACHINE learning, FACTORIZATION, LATENT variables
Abstract: The majority of the data produced by human activities and modern cyber-physical systems involve complex relations among their features. Such relations can be often represented by means of tensors, which can be viewed as generalization of matrices and, as such, can be analyzed by using higher-order extensions of existing machine learning methods, such as clustering and co-clustering. Tensor co-clustering, in particular, has been proven useful in many applications, due to its ability of coping with n-modal data and sparsity. However, setting up a co-clustering algorithm properly requires the specification of the desired number of clusters for each mode as input parameters. This choice is already difficult in relatively easy settings, like flat clustering on data matrices, but on tensors it could be even more frustrating. To face this issue, we propose a new tensor co-clustering algorithm that does not require the number of desired co-clusters as input, as it optimizes an objective function based on a measure of association across discrete random variables (called Goodman and Kruskal's τ) that is not affected by their cardinality. We introduce different optimization schemes and show their theoretical and empirical convergence properties. Additionally, we show the effectiveness of our algorithm on both synthetic and real-world datasets, also in comparison with state-of-the-art co-clustering methods based on tensor factorization and latent block models. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

48. Unconfused ultraconservative multiclass algorithms.

Author: Louche, Ugo and Ralaivola, Liva
Subjects: ALGORITHMS, MACHINE learning, PERCEPTRONS, LEARNING classifier systems, KERNEL functions
Abstract: We tackle the problem of learning linear classifiers from noisy datasets in a multiclass setting. The two-class version of this problem was studied a few years ago where the proposed approaches to combat the noise revolve around a Perceptron learning scheme fed with peculiar examples computed through a weighted average of points from the noisy training set. We propose to build upon these approaches and we introduce a new algorithm called Unconfused Multiclass additive Algorithm (UMA) which may be seen as a generalization to the multiclass setting of the previous approaches. In order to characterize the noise we use the confusion matrix as a multiclass extension of the classification noise studied in the aforementioned literature. Theoretically well-founded, UMA furthermore displays very good empirical noise robustness, as evidenced by numerical simulations conducted on both synthetic and real data. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

49. Quantum speed-up for unsupervised learning.

Author: Aïmeur, Esma, Brassard, Gilles, and Gambs, Sébastien
Subjects: MACHINE learning, ALGORITHMS, DATA mining, MATHEMATICAL analysis, MEDIAN (Mathematics)
Abstract: We show how the quantum paradigm can be used to speed up unsupervised learning algorithms. More precisely, we explain how it is possible to accelerate learning algorithms by quantizing some of their subroutines. Quantization refers to the process that partially or totally converts a classical algorithm to its quantum counterpart in order to improve performance. In particular, we give quantized versions of clustering via minimum spanning tree, divisive clustering and k-medians that are faster than their classical analogues. We also describe a distributed version of k-medians that allows the participants to save on the global communication cost of the protocol compared to the classical version. Finally, we design quantum algorithms for the construction of a neighbourhood graph, outlier detection as well as smart initialization of the cluster centres. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

50. An extended DEIM algorithm for subset selection and class identification.

Author: Hendryx, Emily P., Rivière, Béatrice M., and Rusin, Craig G.
Subjects: SUBSET selection, MATRIX decomposition, ALGORITHMS, DIFFERENTIAL equations, MACHINE learning
Abstract: The discrete empirical interpolation method (DEIM) has been shown to be a viable index-selection technique for identifying representative subsets in data. Having gained some popularity in reducing dimensionality of physical models involving differential equations, its use in subset-/pattern-identification tasks is not yet broadly known within the machine learning community. While it has much to offer as is, the DEIM algorithm is limited in that the number of selected indices cannot exceed the rank of the corresponding data matrix. Although this is not an issue for many data sets, there are cases in which the number of classes represented in a given data set is greater than the rank of the data matrix; in such cases, it is impossible for the standard DEIM algorithm to identify all classes. To overcome this issue, we present a novel extension of DEIM, called E-DEIM. With the proposed algorithm, we also provide some theoretical results for using extensions of DEIM to form the CUR matrix factorization in identifying both rows and columns to approximate the original data matrix. Results from applying variations of E-DEIM to two different data sets indicate that the presented extension can indeed allow for the identification of additional classes along with those selected by standard DEIM. In addition, comparing these results to those of some more familiar methods demonstrates that the proposed deterministic E-DEIM approach including coherence performs comparably to or better than the other evaluated methods and should be considered in future class-identification tasks. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

86 results
