Descriptor: "statistical learning theory" / Database: OAIster - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"statistical learning theory"' showing total 36 results

1. Learning from fuzzy labels: Theoretical issues and algorithmic solutions

Author: Campagner, A.
Abstract: In this article we study the problem of learning from fuzzy labels (LFL), a form of weakly supervised learning in which the supervision target is not precisely specified but is instead given in the form of possibility distributions, that express the imprecise knowledge of the annotating agent. While several approaches for LFL have been proposed in the literature, including generalized risk minimization (GRM), instance-based methods and pseudo label-based learning, both their theoretical properties and their empirical performance have scarcely been studied. We address this gap by: first, presenting a review of the previous results relative to the sample complexity and generalization bounds for GRM and instance-based methods; second, studying both their computational complexity, by proving in particular the impossibility of efficiently solving LFL using GRM, as well as impossibility theorems. We then propose a novel pseudo label-based learning method, called Random Resampling-based Learning (RRL), which directly draws from ensemble learning and possibility theory and study its learning- and complexity-theoretic properties, showing that it achieves guarantees similar to those for GRM while being computationally efficient. Finally, we study the empirical performance of several state-of-the-art LFL algorithms on a wide set of synthetic and real-world benchmark datasets, by which we confirm the effectiveness of the proposed RRL method. Additionally, we describe directions for future research, and highlight opportunities for further interaction between machine learning and uncertainty representation theories.
Published: 2024

2. Improving the interpretation of data-driven water consumption models via the use of social norms

Author: Obringer, R., Nateghi, R., Ma, Z., and Kumar, Rohini
Abstract: Water is essential to improving social equity, promoting just economic development and protecting the function of the Earth system. It is therefore important to have access to credible models of water consumption, so as to ensure that water utilities can adequately supply water to meet the growing demand. Within the literature, there are a variety of models, but often these models evaluate the water consumption at aggregate scales (e.g., city or regional), thus overlooking intra-city differences. Conversely, the models that evaluate intra-city differences tend to rely heavily on one or two sources of quantitative data (e.g., climate variables or demographics), potentially missing key cultural aspects that may act as confounding factors in quantitative models. Here, we present a novel mixed-methods approach to predict intra-city residential water consumption patterns by integrating climate and demographic data, and by incorporating social norm data to aid the interpretation of model results. Using Indianapolis, Indiana as a test case, we show the value in adopting a more integrative approach to modeling residential water consumption. In particular, we leverage qualitative interview data to interpret the results from a predictive model based on a state-of-the-art machine learning algorithm. This integrative approach provides community-specific interpretations of model results that would otherwise not be observed by considering demographics alone. Ultimately, the results demonstrate the value and importance of such approaches when working on complex problems.
Published: 2022

3. Robustness Should Not Be at Odds with Accuracy

Author: Chowdhury, Sadia and Urner, Ruth
Abstract: The phenomenon of adversarial examples in deep learning models has caused substantial concern over their reliability and trustworthiness: in many instances an imperceptible perturbation can falsely flip a neural network’s prediction. Applied research in this area has mostly focused on developing novel adversarial attack strategies or building better defenses against such. It has repeatedly been pointed out that adversarial robustness may be in conflict with requirements for high accuracy. In this work, we take a more principled look at modeling the phenomenon of adversarial examples. We argue that deciding whether a model’s label change under a small perturbation is justified should be done in compliance with the underlying data-generating process. Through a series of formal constructions, systematically analyzing the relation between standard Bayes classifiers and robust-Bayes classifiers, we make the case for adversarial robustness as a locally adaptive measure. We propose a novel way of defining such a locally adaptive robust loss, show that it has a natural empirical counterpart, and develop resulting algorithmic guidance in the form of a data-informed adaptive robustness radius. We prove that our adaptive robust data-augmentation maintains consistency of 1-nearest neighbor classification under deterministic labels and thereby argue that robustness should not be at odds with accuracy.
Published: 2022

4. Advances and open problems in federated learning

Author: Kairouz, P. (Peter), McMahan, H. B. (H. Brendan), Avent, B. (Brendan), Bellet, A. (Aurélien), Bennis, M. (Mehdi), Bhagoji, A. N. (Arjun Nitin), Bonawitz, K. (Kallista), Charles, Z. (Zachary), Cormode, G. (Graham), Cummings, R. (Rachel), D’Oliveira, R. G. (Rafael G.L.), Eichner, H. (Hubert), El Rouayheb, S. (Salim), Evans, D. (David), Gardner, J. (Josh), Garrett, Z. (Zachary), Gascón, A. (Adriá), Ghazi, B. (Badih), Gibbons, P. B. (Phillip B.), Gruteser, M. (Marco), Harchaoui, Z. (Zaid), He, C. (Chaoyang), He, L. (Lie), Huo, Z. (Zhouyuan), Hutchinson, B. (Ben), Hsu, J. (Justin), Jaggi, M. (Martin), Javidi, T. (Tara), Joshi, G. (Gauri), Khodak, M. (Mikhail), Konecny, J. (Jakub), Korolova, A. (Aleksandra), Koushanfar, F. (Farinaz), Koyejo, S. (Sanmi), Lepoint, T. (Tancrede), Liu, Y. (Yang), Mittal, P. (Prateek), Mohri, M. (Mehryar), Nock, R. (Richard), Özgür, A. (Ayfer), Pagh, R. (Rasmus), Qi, H. (Hang), Ramage, D. (Daniel), Raskar, R. (Ramesh), Raykova, M. (Mariana), Song, D. (Dawn), Song, W. (Weikang), Stich, S. U. (Sebastian U.), Sun, Z. (Ziteng), Theertha Suresh, A. (Ananda), Tramér, F. (Florian), Vepakomma, P. (Praneeth), Wang, J. (Jianyu), Xiong, L. (Li), Xu, Z. (Zheng), Yang, Q. (Qiang), Yu, F. X. (Felix X.), Yu, H. (Han), and Zhao, S. (Sen)
Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g., service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this monograph discusses recent advances and presents an extensive collection of open problems and challenges.
Published: 2021

5. Fast rates for general unbounded loss functions: From ERM to generalized Bayes

Author: Grünwald, P.D. (Peter) and Mehta, N.A. (Nishant)
Abstract: We present new excess risk bounds for general unbounded loss functions including log loss and squared loss, where the distribution of the losses may be heavy-tailed. The bounds hold for general estimators, but they are optimized when applied to η-generalized Bayesian, MDL, and empirical risk minimization estimators. In the case of log loss, the bounds imply convergence rates for generalized Bayesian inference under misspecification in terms of a generalization of the Hellinger metric as long as the learning rate η is set correctly. For general loss functions, our bounds rely on two separate conditions: the v-GRIP (generalized reversed information projection) conditions, which control the lower tail of the excess loss; and the newly introduced witness condition, which controls the upper tail. The parameter v in the v-GRIP conditions determines the achievable rate and is akin to the exponent in the Tsybakov margin condition and the Bernstein condition for bounded losses, which the v-GRIP conditions generalize; favorable v in combination with small model complexity leads to Õ(1/n) rates. The witness condition allows us to connect the excess risk to an “annealed” version thereof, by which we generalize several previous results connecting Hellinger and Rényi divergence to KL divergence.
Published: 2020

6. Assumptions & Expectations in Semi-Supervised Machine Learning

Author: Mey, A. (author)
Abstract: The goal of this thesis is to investigate theoretical results in the field of semi-supervised learning, while also linking them to problems in related subjects such as class probability estimation. (Interactive Intelligence)
Published: 2020

7. Information Losses in Neural Classifiers From Sampling

Author: Foggo, Brandon, Yu, Nanpeng, Shi, Jie, and Gao, Yuanqi
Abstract: This article considers the subject of information losses arising from the finite data sets used in the training of neural classifiers. It proves a relationship between such losses as the product of the expected total variation of the estimated neural model with the information about the feature space contained in the hidden representation of that model. It then bounds this expected total variation as a function of the size of randomly sampled data sets in a fairly general setting, and without bringing in any additional dependence on model complexity. It ultimately obtains bounds on information losses that are less sensitive to input compression and in general much smaller than existing bounds. This article then uses these bounds to explain some recent experimental findings of information compression in neural networks that cannot be explained by previous work. Finally, this article shows that not only are these bounds much smaller than existing ones, but they also correspond well with experiments.
Published: 2020

8. Benign overfitting in linear regression

Author: Bartlett, Peter L, Long, Philip M, Lugosi, Gábor, and Tsigler, Alexander
Abstract: The phenomenon of benign overfitting is one of the key mysteries uncovered by deep learning methodology: deep neural networks seem to predict well, even with a perfect fit to noisy training data. Motivated by this phenomenon, we consider when a perfect fit to training data in linear regression is compatible with accurate prediction. We give a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization is in terms of two notions of the effective rank of the data covariance. It shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size. By studying examples of data covariance properties that this characterization shows are required for benign overfitting, we find an important role for finite-dimensional data: the accuracy of the minimum norm interpolating prediction rule approaches the best possible accuracy for a much narrower range of properties of the data distribution when the data lie in an infinite-dimensional space vs. when the data lie in a finite-dimensional space with dimension that grows faster than the sample size.
Published: 2020

9. Assumptions & Expectations in Semi-Supervised Machine Learning

Author: Mey, A. (author)
Abstract: The goal of this thesis is to investigate theoretical results in the field of semi-supervised learning, while also linking them to problems in related subjects such as class probability estimation. (Interactive Intelligence)
Published: 2020

10. Algorithms for Query-Efficient Active Learning

Author: Yan, Songbai, Chaudhuri, Kamalika, and Javidi, Tara
Abstract: Recent decades have witnessed great success of machine learning, especially for tasks where large annotated datasets are available for training models. However, in many applications, raw data, such as images, are abundant, but annotations, such as descriptions of images, are scarce. Annotating data requires human effort and can be expensive. Consequently, one of the central problems in machine learning is how to train an accurate model with as few human annotations as possible. Active learning addresses this problem by bringing the annotator to work together with the learner in the learning process. In active learning, a learner can sequentially select examples and ask the annotator for labels, so that it may require fewer annotations if the learning algorithm avoids querying less informative examples. This dissertation focuses on designing provable query-efficient active learning algorithms. The main contributions are as follows. First, we study noise-tolerant active learning in the standard stream-based setting. We propose a computationally efficient algorithm for actively learning homogeneous halfspaces under bounded noise, and prove it achieves nearly optimal label complexity. Second, we theoretically investigate a novel interactive model where the annotator can not only return noisy labels, but also abstain from labeling. We propose an algorithm which utilizes abstention responses, and analyze its statistical consistency and query complexity under different conditions of the noise and abstention rate. Finally, we study how to utilize auxiliary datasets in active learning. We consider a scenario where the learner has access to a logged observational dataset where labeled examples are observed conditioned on a selection policy. We propose algorithms that effectively take advantage of both auxiliary datasets and active learning. We prove that these algorithms are statistically consistent, and achieve a lower label requirement than alternative methods theoretically.
Published: 2019

11. Algorithms for Query-Efficient Active Learning

Author: Yan, Songbai, Chaudhuri, Kamalika, and Javidi, Tara
Abstract: Recent decades have witnessed great success of machine learning, especially for tasks where large annotated datasets are available for training models. However, in many applications, raw data, such as images, are abundant, but annotations, such as descriptions of images, are scarce. Annotating data requires human effort and can be expensive. Consequently, one of the central problems in machine learning is how to train an accurate model with as few human annotations as possible. Active learning addresses this problem by bringing the annotator to work together with the learner in the learning process. In active learning, a learner can sequentially select examples and ask the annotator for labels, so that it may require fewer annotations if the learning algorithm avoids querying less informative examples. This dissertation focuses on designing provable query-efficient active learning algorithms. The main contributions are as follows. First, we study noise-tolerant active learning in the standard stream-based setting. We propose a computationally efficient algorithm for actively learning homogeneous halfspaces under bounded noise, and prove it achieves nearly optimal label complexity. Second, we theoretically investigate a novel interactive model where the annotator can not only return noisy labels, but also abstain from labeling. We propose an algorithm which utilizes abstention responses, and analyze its statistical consistency and query complexity under different conditions of the noise and abstention rate. Finally, we study how to utilize auxiliary datasets in active learning. We consider a scenario where the learner has access to a logged observational dataset where labeled examples are observed conditioned on a selection policy. We propose algorithms that effectively take advantage of both auxiliary datasets and active learning. We prove that these algorithms are statistically consistent, and achieve a lower label requirement than alternative methods theoretically.
Published: 2019

12. Robust Phoneme Recognition with Little Data

Author: Shulby, Christopher Dane, Ferreira, Martha Dais, de Mello, Rodrigo F., and Aluisio, Sandra Maria
Abstract: A common belief in the community is that deep learning requires large datasets to be effective. We show that with careful parameter selection, deep feature extraction can be applied even to small datasets. We also explore exactly how much data is necessary to guarantee learning by convergence analysis and calculating the shattering coefficient for the algorithms used. Another problem is that state-of-the-art results are rarely reproducible because they use proprietary datasets, pretrained networks and/or weight initializations from other larger networks. We present a two-fold novelty for this situation where a carefully designed CNN architecture, together with a knowledge-driven classifier achieves nearly state-of-the-art phoneme recognition results with absolutely no pretraining or external weight initialization. We also beat the best replication study of the state of the art with a 28% FER. More importantly, we are able to achieve transparent, reproducible frame-level accuracy and, additionally, perform a convergence analysis to show the generalization capacity of the model providing statistical evidence that our results are not obtained by chance. Furthermore, we show how algorithms with strong learning guarantees can not only benefit from raw data extraction but contribute with more robust results.
Published: 2019

13. Hydrological Interpretation of a Statistical Measure of Basin Complexity

Author: Pande, S. (author) and Moayeri, M. (author)
Abstract: This paper studies how streamflow predictability varies with basin characteristics. We introduce an index of basin complexity that is based on a model of least statistical complexity that is needed to reliably predict daily streamflow of the basin. We then relate it with climate, vegetation and soil characteristics of the basin. Daily streamflow is modeled using a k nearest neighbor model of lagged streamflow that predicts next time step streamflow based on the occurrences of similar streamflow events from the past. In order to calculate basin complexity, we identify difficult streamflow events of the basin and then use Vapnik-Chervonenkis generalization theory, which trades off model performance with Vapnik-Chervonenkis dimension (i.e., a measure of model complexity), to find a k nearest neighbor model of appropriate complexity for predicting a difficult streamflow event of the basin. The average of selected model complexities corresponding to difficult events is then defined as the basin's complexity. Basin complexity of 412 Model Parameter Estimation Experiment basins from the continental United States is then related to six basin characteristics. All the characteristics have been derived from the Model Parameter Estimation Experiment database to represent climate, vegetation and soil characteristics of the basins in a concise manner. Results find that more complex basins that are drier have less seasonal rainfall, vegetation with more storage capacity (i.e., smaller 5-week Normalized Difference Vegetation Index gradient), and faster responsive soils. The results reaffirm prior observations that minimum complexity that is required to model a basin depends on its climate and landscape characteristics (e.g., complex models do not perform well in dry basins). (Water Resources)
Published: 2018

14. Hydrological Interpretation of a Statistical Measure of Basin Complexity

Author: Pande, S. (author) and Moayeri, M. (author)
Abstract: This paper studies how streamflow predictability varies with basin characteristics. We introduce an index of basin complexity that is based on a model of least statistical complexity that is needed to reliably predict daily streamflow of the basin. We then relate it with climate, vegetation and soil characteristics of the basin. Daily streamflow is modeled using a k nearest neighbor model of lagged streamflow that predicts next time step streamflow based on the occurrences of similar streamflow events from the past. In order to calculate basin complexity, we identify difficult streamflow events of the basin and then use Vapnik-Chervonenkis generalization theory, which trades off model performance with Vapnik-Chervonenkis dimension (i.e., a measure of model complexity), to find a k nearest neighbor model of appropriate complexity for predicting a difficult streamflow event of the basin. The average of selected model complexities corresponding to difficult events is then defined as the basin's complexity. Basin complexity of 412 Model Parameter Estimation Experiment basins from the continental United States is then related to six basin characteristics. All the characteristics have been derived from the Model Parameter Estimation Experiment database to represent climate, vegetation and soil characteristics of the basins in a concise manner. Results find that more complex basins that are drier have less seasonal rainfall, vegetation with more storage capacity (i.e., smaller 5-week Normalized Difference Vegetation Index gradient), and faster responsive soils. The results reaffirm prior observations that minimum complexity that is required to model a basin depends on its climate and landscape characteristics (e.g., complex models do not perform well in dry basins). (Water Resources)
Published: 2018

15. Fast Rates in Statistical and Online Learning

Author: Erven, T.A.L. (Tim) van, Grünwald, P.D. (Peter), Mehta, N.A. (Nishant), Reid, M.D., and Williamson, R.C.
Abstract: The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning --- a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition, that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable to deal with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting.
Published: 2015

16. On the Stability of Structured Prediction

Author: London, Benjamin Alexei
Abstract: Many important applications of artificial intelligence---such as image segmentation, part-of-speech tagging and network classification---are framed as multiple, interdependent prediction tasks. These structured prediction problems are typically modeled using some form of joint inference over the outputs, to exploit the relational dependencies. Joint reasoning can significantly improve predictive accuracy, but it introduces a complication in the analysis of structured models: the stability of inference. In optimizations involving multiple interdependent variables, such as joint inference, a small change to the input or parameters could induce drastic changes in the solution. In this dissertation, I investigate the impact of stability in structured prediction. I explore two topics, connected by the stability of inference. First, I provide generalization bounds for learning from a limited number of examples with large internal structure. The effective learning rate can be significantly sharper than rates given in related work. Under certain conditions on the data distribution and stability of the predictor, the bounds decrease with both the number of examples and the size of each example, meaning one could potentially learn from a single giant example. Secondly, I investigate the benefits of learning with strongly convex variational inference. Using the duality between strong convexity and stability, I demonstrate, both theoretically and empirically, that learning with a strongly convex free energy can result in significantly more accurate marginal probabilities. One consequence of this work is a new technique that "strongly convexifies" many free energies used in practice. These two seemingly unrelated threads are tied by the idea that stable inference leads to lower error, particularly in the limited example setting, thereby demonstrating that inference stability is of critical importance to the study and practice of structured prediction.
Published: 2015

17. Fast Rates in Statistical and Online Learning

Author: Erven, T.A.L. (Tim) van, Grünwald, P.D. (Peter), Mehta, N.A. (Nishant), Reid, M.D., and Williamson, R.C.
Abstract: The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning --- a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition, that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable to deal with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting.
Published: 2015

18. Randomized Algorithms for Systems and Control: Theory and Applications

Author: INSTITUTE FOR ELECTRONICS ENGINEERING INFORMATION AND TELECOMMUNICATIONS TURIN (ITALY) and Tempo, Roberto
Abstract: The main objective of this NATO lecture series is the introduction of rigorous study of randomized algorithms for uncertain systems and control, with specific UAV applications. A Randomized Algorithm (RA) is an algorithm that makes random choices during its execution to produce a result. This briefing covers 1) Probabilistic Robustness Analysis and Synthesis; 2) Sequential Methods for Convex Problems; 3) Non-Sequential Methods; 4) A Posteriori Analysis; 5) RACT; and 6) Systems and Control Applications. Presented at the NATO/RTO Systems Concepts and Integration Panel Lecture Series SCI-195 on Advanced Autonomous Formation Control and Trajectory Management Techniques for Multiple Micro UAV Applications held in Glasgow, United Kingdom on 19-21 May 2008. See also ADM002223. The original document contains color images.
Published: 2008

19. Randomized Algorithms for Systems and Control: Theory and Applications

Author: INSTITUTE FOR ELECTRONICS ENGINEERING INFORMATION AND TELECOMMUNICATIONS TURIN (ITALY) and Tempo, Roberto
Abstract: Randomized algorithms (RA) are frequently used in many areas of engineering, computer science, physics, finance, and optimization, but their appearance in systems and control is mostly limited to Monte Carlo simulations. The main objective of this mini-course is the introduction to rigorous study of RAs for uncertain systems and control, with specific applications. Randomized algorithms are Probably Approximately Correct (PAC). This implies accepting a "small" risk of giving a wrong solution. The risk can be made arbitrarily small (but not zero) by taking suitable values of so-called confidence and accuracy. Presented at the NATO/RTO Systems Concepts and Integration Panel Lecture Series SCI-195 on Advanced Autonomous Formation Control and Trajectory Management Techniques for Multiple Micro UAV Applications held in Glasgow, United Kingdom on 19-21 May 2008. See also ADM002223. The original document contains color images.
Published: 2008

20. The Default Risk of Firms Examined with Smooth Support Vector Machines

Author: Härdle, Wolfgang, Lee, Yuh-Jye, Schäfer, Dorothea, and Yeh, Yi-Ren
Abstract: In the era of Basel II a powerful tool for bankruptcy prognosis is vital for banks. The tool must be precise but also easily adaptable to the bank's objectives regarding the relation of false acceptances (Type I error) and false rejections (Type II error). We explore the suitability of Smooth Support Vector Machines (SSVM), and investigate how important factors such as the selection of appropriate accounting ratios (predictors), the length of the training period, and the structure of the training sample influence the precision of prediction. Furthermore, we show that oversampling can be employed to adjust the tradeoff between error types. Finally, we illustrate graphically how different variants of SSVM can be used jointly to support the decision task of loan officers.
Published: 2008

21. The Default Risk of Firms Examined with Smooth Support Vector Machines

Author: Härdle, Wolfgang Karl, Lee, Yuh-Jye, Schäfer, Dorothea, and Yeh, Yi-Ren
Abstract: In the era of Basel II a powerful tool for bankruptcy prognosis is vital for banks. The tool must be precise but also easily adaptable to the bank's objectives regarding the relation of false acceptances (Type I error) and false rejections (Type II error). We explore the suitability of Smooth Support Vector Machines (SSVM), and investigate how important factors such as the selection of appropriate accounting ratios (predictors), the length of the training period, and the structure of the training sample influence the precision of prediction. Furthermore, we show that oversampling can be employed to adjust the tradeoff between error types. Finally, we illustrate graphically how different variants of SSVM can be used jointly to support the decision task of loan officers.
Published: 2008

22. A Note on Perturbation Results for Learning Empirical Operators

Author: Tomaso Poggio, Center for Biological and Computational Learning (CBCL), De Vito, Ernesto, Belkin, Mikhail, and Rosasco, Lorenzo
Abstract: A large number of learning algorithms, for example spectral clustering, kernel Principal Components Analysis, and many manifold methods, are based on estimating eigenvalues and eigenfunctions of operators defined by a similarity function or a kernel, given empirical data. Thus, for the analysis of algorithms, it is important to be able to assess the quality of such approximations. The contribution of our paper is two-fold: 1. We use a technique based on a concentration inequality for Hilbert spaces to provide new, much simplified proofs for a number of results in spectral approximation. 2. Using these methods, we provide several new results for estimating spectral properties of the graph Laplacian operator, extending and strengthening results from [26].
Published: 2008

23. A Note on Perturbation Results for Learning Empirical Operators

Author: Tomaso Poggio, Center for Biological and Computational Learning (CBCL), De Vito, Ernesto, Belkin, Mikhail, and Rosasco, Lorenzo
Abstract: A large number of learning algorithms, for example spectral clustering, kernel Principal Components Analysis, and many manifold methods, are based on estimating eigenvalues and eigenfunctions of operators defined by a similarity function or a kernel, given empirical data. Thus, for the analysis of algorithms, it is important to be able to assess the quality of such approximations. The contribution of our paper is two-fold: 1. We use a technique based on a concentration inequality for Hilbert spaces to provide new, much simplified proofs for a number of results in spectral approximation. 2. Using these methods, we provide several new results for estimating spectral properties of the graph Laplacian operator, extending and strengthening results from [26].
Published: 2008

24. What do people want to know about their food? Measuring Central Coast consumers' interest in food systems issues

Author: Howard, Phil
Abstract: What Do People Want to Know About Their Food? Measuring Central Coast Consumers’ Interest in Food Systems Issues reports on consumers’ interest in how their food is produced, processed, transported, and sold; and the criteria that influence their purchasing decisions. In 2004 Phil Howard and Jan Perez conducted five focus groups and mailed a 26-question survey to 1,000 randomly selected households in San Mateo, Santa Clara, Santa Cruz, San Benito, and Monterey Counties; the survey response rate was 48 percent. The study was funded by a U.S. Department of Agriculture (USDA) grant to foster sustainable agriculture on the Central Coast as part of the Center’s Central Coast Research Project. The focus groups and survey found that the majority of consumers want more information about how their food is grown and processed, how it reaches them, or what’s involved in food marketing. They’d like to see a system of eco-labels that would provide point-of-purchase information on such criteria as whether the workers receive a living wage, whether the animals were treated humanely, and whether the food was locally grown. When asked to rank five potential “eco-labels,” respondents were most enthusiastic about the idea of a “humane” label, with more than 30 percent citing it as their first choice, followed by “locally grown” (22 percent), “living wage” (16.5 percent), “U.S. grown” (5.9 percent), and “small-scale” (5.2 percent).
Published: 2005

25. What do people want to know about their food? Measuring Central Coast consumers' interest in food systems issues

Author: Howard, Phil
Abstract: What Do People Want to Know About Their Food? Measuring Central Coast Consumers’ Interest in Food Systems Issues reports on consumers’ interest in how their food is produced, processed, transported, and sold; and the criteria that influence their purchasing decisions. In 2004 Phil Howard and Jan Perez conducted five focus groups and mailed a 26-question survey to 1,000 randomly selected households in San Mateo, Santa Clara, Santa Cruz, San Benito, and Monterey Counties; the survey response rate was 48 percent. The study was funded by a U.S. Department of Agriculture (USDA) grant to foster sustainable agriculture on the Central Coast as part of the Center’s Central Coast Research Project. The focus groups and survey found that the majority of consumers want more information about how their food is grown and processed, how it reaches them, or what’s involved in food marketing. They’d like to see a system of eco-labels that would provide point-of-purchase information on such criteria as whether the workers receive a living wage, whether the animals were treated humanely, and whether the food was locally grown. When asked to rank five potential “eco-labels,” respondents were most enthusiastic about the idea of a “humane” label, with more than 30 percent citing it as their first choice, followed by “locally grown” (22 percent), “living wage” (16.5 percent), “U.S. grown” (5.9 percent), and “small-scale” (5.2 percent).
Published: 2005

26. Local complexities for empirical risk minimization

Author: Bartlett, Peter L, Mendelson, Shahar, and Philips, Petra
Abstract: We present sharp bounds on the risk of the empirical minimization algorithm under mild assumptions on the class. We introduce the notion of isomorphic coordinate projections and show that this leads to a sharper error bound than the best previously known. The quantity which governs this bound on the empirical minimizer is the largest fixed point of the function $\xi_n(r) = \mathbb{E} \sup \{ \mathbb{E}f - \mathbb{E}_n f : f \in F, \mathbb{E}f = r \}$. We prove that this is the best estimate one can obtain using "structural results", and that it is possible to estimate the error rate from data. We then prove that the bound on the empirical minimization algorithm can be improved further by a direct analysis, and that the correct error rate is the maximizer of $\xi'_n(r) - r$, where $\xi'_n(r) = \mathbb{E} \sup \{ \mathbb{E}f - \mathbb{E}_n f : f \in F, \mathbb{E}f = r \}$.
Published: 2004

27. Neural Networks

Author: Jordan, Michael I. and Bishop, Christopher M.
Abstract: We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.
Published: 2004

28. Local complexities for empirical risk minimization

Author: Bartlett, Peter L, Mendelson, Shahar, and Philips, Petra
Abstract: We present sharp bounds on the risk of the empirical minimization algorithm under mild assumptions on the class. We introduce the notion of isomorphic coordinate projections and show that this leads to a sharper error bound than the best previously known. The quantity which governs this bound on the empirical minimizer is the largest fixed point of the function $\xi_n(r) = \mathbb{E} \sup \{ \mathbb{E}f - \mathbb{E}_n f : f \in F, \mathbb{E}f = r \}$. We prove that this is the best estimate one can obtain using "structural results", and that it is possible to estimate the error rate from data. We then prove that the bound on the empirical minimization algorithm can be improved further by a direct analysis, and that the correct error rate is the maximizer of $\xi'_n(r) - r$, where $\xi'_n(r) = \mathbb{E} \sup \{ \mathbb{E}f - \mathbb{E}_n f : f \in F, \mathbb{E}f = r \}$.
Published: 2004

29. On the importance of small coordinate projections

Author: Mendelson, Shahar and Philips, Petra
Abstract: It has been recently shown that sharp generalization bounds can be obtained when the function class from which the algorithm chooses its hypotheses is “small” in the sense that the Rademacher averages of this function class are small. We show that a new, more general principle guarantees good generalization bounds. The new principle requires that random coordinate projections of the function class evaluated on random samples are “small” with high probability and that the random class of functions allows symmetrization. As an example, we prove that this geometric property of the function class is exactly the reason why two recently proposed frameworks, luckiness (Shawe-Taylor et al., 1998) and algorithmic luckiness (Herbrich and Williamson, 2002), can be used to establish generalization bounds.
Published: 2004

30. Neural Networks

Author: Jordan, Michael I. and Bishop, Christopher M.
Abstract: We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.
Published: 2004

31. A Note on the Generalization Performance of Kernel Classifiers with Margin

Author: Evgeniou, Theodoros and Pontil, Massimiliano
Abstract: We present distribution-independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers arise from this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e., functions of the slack variables of SVM) are also derived.
Published: 2004

32. Neural Networks

Author: Jordan, Michael I. and Bishop, Christopher M.
Abstract: We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.
Published: 2004

33. A Note on the Generalization Performance of Kernel Classifiers with Margin

Author: Evgeniou, Theodoros and Pontil, Massimiliano
Abstract: We present distribution-independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers arise from this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e., functions of the slack variables of SVM) are also derived.
Published: 2004

34. Neural Networks

Author: Jordan, Michael I. and Bishop, Christopher M.
Abstract: We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.
Published: 2004

35. A Note on the Generalization Performance of Kernel Classifiers with Margin

Author: Evgeniou, Theodoros and Pontil, Massimiliano
Abstract: We present distribution-independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers arise from this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e., functions of the slack variables of SVM) are also derived.
Published: 2004

36. Multidisciplinary Research for Demining

Author: DUKE UNIV DURHAM NC DEPT OF ELECTRICAL AND COMPUTER ENGINEERING and Carin, Lawrence
Abstract: This report summarizes research progress on the Duke University-led demining MURI (Multidisciplinary University Research Initiative), encompassing researchers from Duke, Caltech, Georgia Tech and Ohio State University. Sensors examined by this team include radar, electromagnetic induction, acoustic, olfactory and MEMS. In addition to sensor development, significant effort has been directed toward development of optimal signal processing algorithms. Prepared in cooperation with California Institute of Technology, Georgia Institute of Technology, and Ohio State University.
Published: 2002
