1,097 results for '"Empirical Risk Minimization"'
Search Results
2. Empirical risk minimization for big data driven prescriptive analytics: An exploration of two-stage stochastic programs with recourse
- Author
-
Clausen, Johan Bjerre Bach, Li, Hongyan, and Forget, Nicolas
- Published
- 2025
- Full Text
- View/download PDF
3. Multiple-Instance Learning from Pairwise Comparison Bags.
- Author
-
Zhou, Shengjie, Shu, Senlin, Wang, Haobo, Wei, Hongxin, Xiang, Tao, and Li, Beibei
- Subjects
- *
PROBLEM solving , *ACQUISITION of data , *GENERALIZATION , *PRIVACY , *COST - Abstract
Multiple-instance learning (MIL) is a significant weakly supervised learning problem, where the training data consist of bags containing multiple instances and bag-level labels. Most previous MIL research required fully labeled bags. However, collecting such data is challenging due to labeling costs or privacy concerns. Fortunately, we can easily collect pairwise comparison information, indicating one bag is more likely to be positive than the other. Therefore, we investigate a novel MIL problem of learning a bag-level binary classifier only from pairwise comparison bags. To solve this problem, we describe the data generation process and provide a baseline method to train an instance-level classifier based on unlabeled-unlabeled learning. To achieve better performance, we propose a convex formulation to train a bag-level classifier and give a generalization error bound. Comprehensive experiments show that both the baseline method and the convex formulation achieve satisfactory performance, while the convex formulation performs better. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Minimum Resource Threshold Policy Under Partial Interference.
- Author
-
Park, Chan, Chen, Guanhua, Yu, Menggang, and Kang, Hyunseung
- Subjects
- *
DEMOGRAPHIC surveys , *COMMUNICABLE diseases , *CAUSAL inference , *VACCINE development , *HEALTH surveys , *SANITATION - Abstract
When developing policies for prevention of infectious diseases, policymakers often set specific, outcome-oriented targets to achieve. For example, when developing a vaccine allocation policy, policymakers may want to distribute them so that at least a certain fraction of individuals in a census block are disease-free and spillover effects due to interference within blocks are accounted for. The article proposes methods to estimate a block-level treatment policy that achieves a predefined, outcome-oriented target while accounting for spillover effects due to interference. Our policy, the minimum resource threshold policy (MRTP), suggests the minimum fraction of treated units required within a block to meet or exceed the target level of the outcome. We estimate the MRTP from empirical risk minimization using a novel, nonparametric, doubly robust loss function. We then characterize statistical properties of the estimated MRTP in terms of the excess risk bound. We apply our methodology to design a water, sanitation, and hygiene allocation policy for Senegal with the goal of increasing the proportion of households with no children experiencing diarrhea to a level exceeding a specified threshold. Our policy outperforms competing policies and offers new approaches to design allocation policies, especially in international development for communicable diseases. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. On the consistency of supervised learning with missing values.
- Author
-
Josse, Julie, Chen, Jacob M., Prost, Nicolas, Varoquaux, Gaël, and Scornet, Erwan
- Subjects
MISSING data (Statistics) ,DECISION trees ,MULTIPLE imputation (Statistics) ,FORECASTING ,SUPERVISED learning - Abstract
In many application settings, data have missing entries, which makes subsequent analyses challenging. An abundant literature addresses missing values in an inferential framework, aiming at estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and test data. We first rewrite classic missing values results for this setting. We then show the consistency of two approaches, test-time multiple imputation and single imputation in prediction. A striking result is that the widely-used method of imputing with a constant prior to learning is consistent when missing values are not informative. This contrasts with inferential settings where mean imputation is frowned upon as it distorts the distribution of the data. The consistency of such a popular simple approach is important in practice. Finally, to contrast procedures based on imputation prior to learning with procedures that optimize the missing-value handling for prediction, we consider decision trees. Indeed, decision trees are among the few methods that can tackle empirical risk minimization with missing values, due to their ability to handle the half-discrete nature of incomplete variables. After comparing empirically different missing values strategies in trees, we recommend using the "missing incorporated in attribute" method as it can handle both non-informative and informative missing values. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
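The entry above contrasts imputation prior to learning with trees that handle missing values natively. Below is a minimal sketch of that contrast using generic scikit-learn tools rather than the authors' code; the dataset and the missing-completely-at-random mask are invented for illustration.

```python
# Hypothetical illustration of two ways to handle missing values in prediction,
# in the spirit of the entry above (not the authors' code). Requires scikit-learn.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(n)

# Missing completely at random (non-informative missingness).
mask = rng.random(X.shape) < 0.2
X_missing = X.copy()
X_missing[mask] = np.nan

X_tr, X_te, y_tr, y_te = train_test_split(X_missing, y, random_state=0)

# 1) Constant imputation before learning (consistent when missingness is not informative).
impute_then_regress = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0.0), LinearRegression()
)
impute_then_regress.fit(X_tr, y_tr)

# 2) A tree-based learner that routes NaN values inside its splits, similar in
#    spirit to handling missingness within the attribute.
tree_model = HistGradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

print("impute-then-regress R^2:", impute_then_regress.score(X_te, y_te))
print("native-NaN trees    R^2:", tree_model.score(X_te, y_te))
```

Under informative missingness, the impute-then-regress pipeline would typically need the missingness indicators appended as extra features, whereas the tree-based learner can exploit the missingness pattern directly.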
6. Specifying and Solving Robust Empirical Risk Minimization Problems Using CVXPY.
- Author
-
Luxenberg, Eric, Malik, Dhruv, Li, Yuanzhi, Singh, Aarti, and Boyd, Stephen
- Subjects
- *
ROBUST optimization , *CONVEX programming , *CONVEX sets , *EXPERTISE , *CLASSIFICATION - Abstract
We consider robust empirical risk minimization (ERM), where model parameters are chosen to minimize the worst-case empirical loss when each data point varies over a given convex uncertainty set. In some simple cases, such problems can be expressed in an analytical form. In general the problem can be made tractable via dualization, which turns a min-max problem into a min-min problem. Dualization requires expertise and is tedious and error-prone. We demonstrate how CVXPY can be used to automate this dualization procedure in a user-friendly manner. Our framework allows practitioners to specify and solve robust ERM problems with a general class of convex losses, capturing many standard regression and classification problems. Users can easily specify any complex uncertainty set that is representable via disciplined convex programming (DCP) constraints. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
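The entry above automates dualization inside CVXPY. As a point of reference, here is a hand-derived robust ERM instance that needs no dualization because the worst case has a closed form: hinge loss with an l-infinity box of radius rho around each data point. The data and parameter values are invented for illustration, and this is not the paper's framework.

```python
# A minimal sketch of one robust ERM special case written directly in CVXPY
# (not the automated dualization framework from the entry above).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n))

rho = 0.1   # radius of the l-infinity uncertainty box around each x_i (illustrative)
lam = 1e-2  # ridge penalty (illustrative)

w = cp.Variable(d)
b = cp.Variable()

# The worst case of the hinge loss over ||delta||_inf <= rho has the closed form
#   max_delta hinge(y_i * (w @ (x_i + delta) + b))
#     = pos(1 - y_i * (w @ x_i + b) + rho * ||w||_1),
# which is DCP-representable, so no manual dualization is needed in this case.
margins = cp.multiply(y, X @ w + b)
worst_case_hinge = cp.pos(1 - margins + rho * cp.norm1(w))
objective = cp.sum(worst_case_hinge) / n + lam * cp.sum_squares(w)

cp.Problem(cp.Minimize(objective)).solve()
print("robust ERM objective value:", objective.value)
```

The closed-form maximum over the box becomes the extra rho * ||w||_1 term inside the hinge, which is exactly what keeps this special case DCP-compliant without any dual variables.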
7. Why and how to construct an epistemic justification of machine learning?
- Author
-
Spelda, Petr and Stritecky, Vit
- Abstract
Consider a set of shuffled observations drawn from a fixed probability distribution over some instance domain. What enables learning of inductive generalizations which proceed from such a set of observations? The scenario is worthwhile because it epistemically characterizes most of machine learning. This kind of learning from observations is also inverse and ill-posed. What reduces the non-uniqueness of its result and, thus, its problematic epistemic justification, which stems from a one-to-many relation between the observations and many learnable generalizations? The paper argues that this role belongs to any complexity regularization which satisfies Norton’s Material Theory of Induction (MTI) by localizing the inductive risk to facts in the given domain. A prime example of the localization is the Lottery Ticket Hypothesis (LTH) about overparameterized neural networks. The explanation of MTI’s role in complexity regularization of neural networks is provided by analyzing the stability of Empirical Risk Minimization (ERM), an inductive rule that controls the learning process and leads to an inductive generalization on the given set of observations. In cases where ERM might become asymptotically unstable, making the justification of the generalization by uniform convergence unavailable, LTH and MTI can be used to define a local stability. A priori, overparameterized neural networks are such cases and the combination of LTH and MTI can block ERM’s trivialization caused by equalizing the strengths of its inductive support for risk minimization. We bring closer the investigation of generalization in artificial neural networks and the study of inductive inference and show the division of labor between MTI and the optimality justifications (developed by Gerhard Schurz) in machine learning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Stochastic Functions Learning from Distribution-Driven Data: Generalization Bound and Algorithms
- Author
-
Wu, Jia, Xiao, Xian-Tao, and Zhang, Li-Wei
- Published
- 2024
- Full Text
- View/download PDF
9. USING TAYLOR-APPROXIMATED GRADIENTS TO IMPROVE THE FRANK-WOLFE METHOD FOR EMPIRICAL RISK MINIMIZATION.
- Author
-
Xiong, Zikai and Freund, Robert M.
- Subjects
- *
STATISTICAL learning , *MACHINE learning , *SMOOTHNESS of functions , *CONVEX sets , *COMPUTATIONAL complexity - Abstract
The Frank-Wolfe method has become increasingly useful in statistical and machine learning applications due to the structure-inducing properties of the iterates, and especially in settings where linear minimization over the feasible set is more computationally efficient than projection. In the setting of empirical risk minimization, one of the fundamental optimization problems in statistical and machine learning, the computational effectiveness of Frank-Wolfe methods typically grows linearly in the number of data observations n. This is in stark contrast to the case for typical stochastic projection methods. In order to reduce this dependence on n, we look to second-order smoothness of typical smooth loss functions (least squares loss and logistic loss, for example), and we propose amending the Frank-Wolfe method with Taylor series-approximated gradients, including variants for both deterministic and stochastic settings. Compared with current state-of-the-art methods in the regime where the optimality tolerance ε is sufficiently small, our methods are able to simultaneously reduce the dependence on large n while obtaining optimal convergence rates of Frank-Wolfe methods in both convex and nonconvex settings. We also propose a novel adaptive step-size approach for which we have computational guarantees. Finally, we present computational experiments which show that our methods exhibit very significant speedups over existing methods on real-world datasets for both convex and nonconvex binary classification problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
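For readers unfamiliar with the baseline the entry builds on, here is plain Frank-Wolfe for least squares over an l1 ball, showing the linear-minimization oracle used in place of projection. This is a synthetic-data sketch of the classical method, not the Taylor-approximated variant proposed above.

```python
# Plain Frank-Wolfe for least squares over an l1 ball (classical baseline only;
# NOT the Taylor-approximated variant from the entry above). Synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 100
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]
b = A @ x_true + 0.01 * rng.standard_normal(n)

tau = 10.0          # radius of the l1-ball constraint ||x||_1 <= tau (illustrative)
x = np.zeros(d)

for k in range(200):
    grad = A.T @ (A @ x - b) / n
    # Linear minimization oracle over the l1 ball: a signed vertex tau * e_i.
    i = np.argmax(np.abs(grad))
    s = np.zeros(d)
    s[i] = -tau * np.sign(grad[i])
    gamma = 2.0 / (k + 2.0)          # standard open-loop step size
    x = (1.0 - gamma) * x + gamma * s

print("objective:", 0.5 * np.mean((A @ x - b) ** 2))
print("nonzeros in iterate:", np.count_nonzero(x))
```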
10. A multi-domain adaptive neural machine translation method based on domain data balancer.
- Author
-
Xu, Jinlei, Wen, Yonghua, Huang, Shuanghong, and Yu, Zhengtao
- Subjects
- *
MACHINE translating , *MAXIMUM likelihood statistics , *REINFORCEMENT learning - Abstract
Most methods for multi-domain adaptive neural machine translation (NMT) currently rely on mixing data from multiple domains in a single model to achieve multi-domain translation. However, this mixing can lead to imbalanced training data, causing the model to focus on training for the large-scale general domain while ignoring the scarce resources of specific domains, resulting in a decrease in translation performance. In this paper, we propose a multi-domain adaptive NMT method based on Domain Data Balancer (DDB) to address the problems of imbalanced data caused by simple fine-tuning. By adding DDB to the Transformer model, we adaptively learn the sampling distribution of each group of training data, replace the maximum likelihood estimation criterion with empirical risk minimization training, and introduce a reward-based iterative update of the bilevel optimizer based on reinforcement learning. Experimental results show that the proposed method improves the baseline model by an average of 1.55 and 0.14 BLEU (Bilingual Evaluation Understudy) scores respectively in English-German and Chinese-English multi-domain NMT. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Multiple-instance Learning from Triplet Comparison Bags.
- Author
-
Shu, Senlin, Wang, Deng-Bao, Yuan, Suqin, Wei, Hongxin, Jiang, Jiuchuan, Feng, Lei, and Zhang, Min-Ling
- Subjects
PROBLEM solving ,ELECTRONIC data processing ,ACQUISITION of data - Abstract
Multiple-instance learning (MIL) solves the problem where training instances are grouped in bags, and a binary (positive or negative) label is provided for each bag. Most of the existing MIL studies need fully labeled bags for training an effective classifier, while it could be quite hard to collect such data in many real-world scenarios, due to the high cost of the data labeling process. Fortunately, unlike fully labeled data, triplet comparison data can be collected in a more accurate and human-friendly way. Therefore, in this article, we for the first time investigate MIL from only triplet comparison bags, where a triplet (X_a, X_b, X_c) contains the weak supervision information that bag X_a is more similar to X_b than to X_c. To solve this problem, we propose to train a bag-level classifier by the empirical risk minimization framework and theoretically provide a generalization error bound. We also show that a convex formulation can be obtained only when specific convex binary losses such as the square loss and the double hinge loss are used. Extensive experiments validate that our proposed method significantly outperforms other baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Strong Overall Error Analysis for the Training of Artificial Neural Networks Via Random Initializations
- Author
-
Jentzen, Arnulf and Riekert, Adrian
- Published
- 2024
- Full Text
- View/download PDF
13. Explainable empirical risk minimization.
- Author
-
Zhang, Linli, Karakasidis, Georgios, Odnoblyudova, Arina, Dogruel, Leyla, Tian, Yu, and Jung, Alex
- Subjects
- *
MACHINE learning , *ARTIFICIAL intelligence , *WEATHER forecasting , *DECISION trees , *TRUST - Abstract
The successful application of machine learning (ML) methods increasingly depends on their interpretability or explainability. Designing explainable ML (XML) systems is instrumental for ensuring transparency of automated decision-making that targets humans. The explainability of ML methods is also an essential ingredient for trustworthy artificial intelligence. A key challenge in ensuring explainability is its dependence on the specific human end user of an ML system. The users of ML methods might have vastly different background knowledge about ML principles, with some having formal training in the specific field and others having none. We use information-theoretic concepts to develop a novel measure for the subjective explainability of predictions delivered by an ML method. We construct this measure via the conditional entropy of predictions, given the user signal. Our approach allows for a wide range of user signals, ranging from responses to surveys to biophysical measurements. We use this measure of subjective explainability as a regularizer for model training. The resulting explainable empirical risk minimization (EERM) principle strives to balance subjective explainability and risk. The EERM principle is flexible and can be combined with arbitrary ML models. We present several practical implementations of EERM for linear models and decision trees. Numerical experiments demonstrate the application of EERM to weather prediction and detecting inappropriate language in social media. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Differentially private Riemannian optimization.
- Author
-
Han, Andi, Mishra, Bamdev, Jawanpuria, Pratik, and Gao, Junbin
- Subjects
METRIC spaces ,RIEMANNIAN metric ,GAUSSIAN distribution ,GENERALIZED spaces ,RANDOM noise theory ,RIEMANNIAN manifolds - Abstract
In this paper, we study the differentially private empirical risk minimization problem where the parameter is constrained to a Riemannian manifold. We introduce a framework for performing differentially private Riemannian optimization by adding noise to the Riemannian gradient on the tangent space. The noise follows a Gaussian distribution intrinsically defined with respect to the Riemannian metric on the tangent space. We adapt the Gaussian mechanism from the Euclidean space to the tangent space, making it compatible with such a generalized Gaussian distribution. This approach presents a novel analysis as compared to directly adding noise on the manifold. We further prove privacy guarantees of the proposed differentially private Riemannian (stochastic) gradient descent using an extension of the moments accountant technique. Overall, we provide utility guarantees under geodesically (strongly) convex and general nonconvex objectives, as well as under the Riemannian Polyak-Łojasiewicz condition. Empirical results illustrate the versatility and efficacy of the proposed framework in several applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
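As a much simpler Euclidean reference point for the mechanism discussed above, the following sketch runs noisy gradient descent for private ERM: clip each per-example gradient and add Gaussian noise. It is not the paper's tangent-space mechanism, and the clip, sigma, and learning-rate values are illustrative rather than calibrated to a formal (ε, δ) budget.

```python
# A minimal Euclidean sketch of noisy gradient descent for private ERM
# (clip per-example gradients, add Gaussian noise). This is NOT the Riemannian
# tangent-space mechanism from the entry above, and sigma/clip/lr below are
# illustrative, not calibrated to a formal (epsilon, delta) guarantee.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.standard_normal((n, d))
y = (X @ rng.standard_normal(d) > 0).astype(float)

w = np.zeros(d)
clip, sigma, lr = 1.0, 1.0, 0.5

for step in range(200):
    p = 1.0 / (1.0 + np.exp(-X @ w))             # logistic predictions
    per_example = (p - y)[:, None] * X            # per-example gradients, shape (n, d)
    norms = np.maximum(np.linalg.norm(per_example, axis=1) / clip, 1.0)
    clipped = per_example / norms[:, None]        # each row now has norm <= clip
    noise = sigma * clip * rng.standard_normal(d)
    w -= lr * (clipped.sum(axis=0) + noise) / n   # noisy average-gradient step

print("training accuracy:", np.mean((X @ w > 0) == (y > 0.5)))
```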
15. A mini-batch stochastic conjugate gradient algorithm with variance reduction.
- Author
-
Kou, Caixia and Yang, Han
- Subjects
INFORMATION storage & retrieval systems ,CONJUGATE gradient methods ,ALGORITHMS ,CONVEX functions ,ONLINE algorithms - Abstract
The stochastic gradient descent method is popular for large-scale optimization but has slow asymptotic convergence due to the inherent variance. To remedy this problem, there have been many explicit variance reduction methods for stochastic descent, such as SVRG Johnson and Zhang [Advances in neural information processing systems, (2013), pp. 315–323], SAG Roux et al. [Advances in neural information processing systems, (2012), pp. 2663–2671], SAGA Defazio et al. [Advances in neural information processing systems, (2014), pp. 1646–1654] and so on. The conjugate gradient method, which has the same computation cost as the gradient descent method, is also considered. In this paper, in the spirit of SAGA, we propose a stochastic conjugate gradient algorithm which we call SCGA. With the Fletcher and Reeves type choices, we prove a linear convergence rate for smooth and strongly convex functions. We experimentally demonstrate that SCGA converges faster than the popular SGD type algorithms for four machine learning models, which may be convex, nonconvex or nonsmooth. Solving regression problems, SCGA is competitive with CGVR, which, to our knowledge, is so far the only stochastic conjugate gradient algorithm with variance reduction. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
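For context on the variance-reduction baselines named above (SVRG, SAG, SAGA), a textbook SVRG loop for l2-regularized logistic regression on synthetic data looks as follows; this is not the proposed SCGA method.

```python
# Textbook SVRG for l2-regularized logistic regression (one of the variance-reduced
# baselines cited in the entry above; NOT the proposed SCGA method). Synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 50
X = rng.standard_normal((n, d))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ rng.standard_normal(d)))).astype(float)
lam = 1e-3

def grad_i(w, i):
    """Gradient of the i-th regularized logistic loss term."""
    p = 1.0 / (1.0 + np.exp(-X[i] @ w))
    return (p - y[i]) * X[i] + lam * w

def full_grad(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / n + lam * w

w = np.zeros(d)
eta, epochs, m = 0.01, 30, n      # step size, outer epochs, inner loop length

for _ in range(epochs):
    w_snapshot = w.copy()
    mu = full_grad(w_snapshot)                # full gradient at the snapshot
    for _ in range(m):
        i = rng.integers(n)
        # Variance-reduced stochastic gradient.
        g = grad_i(w, i) - grad_i(w_snapshot, i) + mu
        w -= eta * g

print("final full-gradient norm:", np.linalg.norm(full_grad(w)))
```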
16. Diametrical Risk Minimization: theory and computations.
- Author
-
Norton, Matthew D. and Royset, Johannes O.
- Subjects
GENERALIZATION ,NEIGHBORHOODS - Abstract
The theoretical and empirical performance of Empirical Risk Minimization (ERM) often suffers when loss functions are poorly behaved with large Lipschitz moduli and spurious sharp minimizers. We propose and analyze a counterpart to ERM called Diametrical Risk Minimization (DRM), which accounts for worst-case empirical risks within neighborhoods in parameter space. DRM has generalization bounds that are independent of Lipschitz moduli for convex as well as nonconvex problems and it can be implemented using a practical algorithm based on stochastic gradient descent. Numerical results illustrate the ability of DRM to find quality solutions with low generalization error in sharp empirical risk landscapes from benchmark neural network classification problems with corrupted labels. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
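A naive way to see what the DRM objective asks for is to approximate the worst-case empirical risk over a small parameter-space ball by sampling a handful of perturbations and descending from the worst one. The sketch below does exactly that on synthetic logistic-loss data; it illustrates the idea only and is not the authors' algorithm or its guarantees.

```python
# A naive sketch of the Diametrical Risk Minimization idea from the entry above:
# approximate the worst-case empirical risk over a small ball in parameter space
# by sampling a few perturbations, then descend from the worst sampled neighbor.
# Illustrative approximation only; not the paper's algorithm or analysis.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))

def risk_and_grad(w):
    """Empirical logistic risk and its gradient for labels in {-1, +1}."""
    z = y * (X @ w)
    risk = np.mean(np.log1p(np.exp(-z)))
    grad = X.T @ (-y / (1.0 + np.exp(z))) / n
    return risk, grad

w = np.zeros(d)
radius, n_samples, lr = 0.05, 10, 0.5   # illustrative choices

for step in range(300):
    # Sample perturbations on the sphere of the given radius around w.
    deltas = rng.standard_normal((n_samples, d))
    deltas *= radius / np.linalg.norm(deltas, axis=1, keepdims=True)
    risks = [risk_and_grad(w + delta)[0] for delta in deltas]
    worst = deltas[int(np.argmax(risks))]
    # Descend using the gradient evaluated at the worst sampled neighbor.
    _, g = risk_and_grad(w + worst)
    w -= lr * g

print("nominal empirical risk:", risk_and_grad(w)[0])
```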
17. Bootstrap-Based Inference for Cube Root Asymptotics
- Author
-
Cattaneo, MD, Jansson, M, and Nagasawa, K
- Subjects
Cube root asymptotics ,bootstrapping ,maximum score ,empirical risk minimization ,Economic Theory ,Applied Economics ,Econometrics - Abstract
This paper proposes a valid bootstrap-based distributional approximation for M-estimators exhibiting a Chernoff (1964)-type limiting distribution. For estimators of this kind, the standard nonparametric bootstrap is inconsistent. The method proposed herein is based on the nonparametric bootstrap, but restores consistency by altering the shape of the criterion function defining the estimator whose distribution we seek to approximate. This modification leads to a generic and easy-to-implement resampling method for inference that is conceptually distinct from other available distributional approximations. We illustrate the applicability of our results with four examples in econometrics and machine learning.
- Published
- 2020
18. ERMPD: causal intervention for popularity debiasing in recommendation via empirical risk minimization
- Author
-
He, Ming, Wen, Hao, Hu, Xinlei, and An, Boyang
- Published
- 2024
- Full Text
- View/download PDF
19. Towards Understanding the fairness of differentially private margin classifiers.
- Author
-
Ruan, Wenqiang, Xu, Mingxin, Jing, Yinan, and Han, Weili
- Subjects
- *
FAIRNESS , *SUPPORT vector machines - Abstract
Margin classifiers, such as the Support Vector Machine, are often critical in high-stakes decision domains. In recent years, differential privacy has been widely employed in margin classifiers to protect user privacy. However, incorporating differential privacy into margin classifiers can introduce fairness issues, in the sense that differentially private margin classifiers have significantly different true positive rates on different groups that are determined by sensitive attributes (e.g. race). To address this issue, we identify the factor that dominates the fairness of differentially private margin classifiers based on well-designed experiments and further analysis. We first conduct an empirical study on three classical margin classifiers learned via three representative differentially private empirical risk minimization algorithms, respectively. The empirical result shows that the fairness of differentially private margin classifiers strongly depends on the fairness of their non-private versions. We then analyze how differential privacy impacts the fairness of margin classifiers and confirm the empirical study results. In a general sense, our study shows that when non-private margin classifiers are fair, the fairness of their differentially private counterparts can be ensured. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
20. A Stochastic Gradient Descent Algorithm Based on Adaptive Differential Privacy
- Author
-
Deng, Yupeng, Li, Xiong, He, Jiabei, Liu, Yuzhen, Liang, Wei, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Gao, Honghao, editor, Wang, Xinheng, editor, Wei, Wei, editor, and Dagiuklas, Tasos, editor
- Published
- 2022
- Full Text
- View/download PDF
21. Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-Purpose Regularizer and Privacy Budget Retrieval and Recycling
- Author
-
Li, Yinan, Liu, Fang, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2022
- Full Text
- View/download PDF
22. Classification Methods Based on Fitting Logistic Regression to Positive and Unlabeled Data
- Author
-
Furmańczyk, Konrad, Paczutkowski, Kacper, Dudziński, Marcin, Dziewa-Dawidczyk, Diana, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Groen, Derek, editor, de Mulatier, Clélia, editor, Paszynski, Maciej, editor, Krzhizhanovskaya, Valeria V., editor, Dongarra, Jack J., editor, and Sloot, Peter M. A., editor
- Published
- 2022
- Full Text
- View/download PDF
23. Improving Recognition of Handwritten Kannada Characters Using Mixup Regularization
- Author
-
Hebbi, Chandravva, Maiya, Anirudh, Mamatha, H. R., Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Garg, Deepak, editor, Jagannathan, Sarangapani, editor, Gupta, Ankur, editor, Garg, Lalit, editor, and Gupta, Suneet, editor
- Published
- 2022
- Full Text
- View/download PDF
24. Joint Feature Selection and Classification for Positive Unlabelled Multi–Label Data Using Weighted Penalized Empirical Risk Minimization
- Author
-
Teisseyre, Paweł
- Subjects
positive and unlabelled data ,multi-label classification ,feature selection ,empirical risk minimization ,Mathematics ,QA1-939 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
We consider the positive-unlabelled multi-label scenario in which multiple target variables are not observed directly. Instead, we observe surrogate variables indicating whether or not the target variables are labelled. The presence of a label means that the corresponding variable is positive. The absence of the label means that the variable can be either positive or negative. We analyze embedded feature selection methods based on two weighted penalized empirical risk minimization frameworks. In the first approach, we introduce weights of observations. The idea is to assign larger weights to observations for which there is a consistency between the values of the true target variable and the corresponding surrogate variable. In the second approach, we consider a weighted empirical risk function which corresponds to the risk function for the true unobserved target variables. The weights in both the methods depend on the unknown propensity score functions, whose estimation is a challenging problem. We propose to use very simple bounds for the propensity score, which leads to relatively simple forms of weights. In the experiments we analyze the predictive power of the methods considered for different labelling schemes.
- Published
- 2022
- Full Text
- View/download PDF
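The core mechanism in both weighted approaches above is ordinary penalized ERM with per-observation weights. The sketch below shows that mechanism with scikit-learn's sample_weight on simulated positive-unlabelled data, using a crude constant bound on the propensity score; the data, the bound, and the weighting rule are illustrative assumptions, not the estimators from the paper.

```python
# A sketch of weighted penalized ERM for positive-unlabelled data: penalized
# logistic regression where each observation gets a weight derived from an
# assumed propensity-score bound. The data, the constant bound c, and the
# weighting rule are illustrative assumptions, not the paper's estimators.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 3000, 10
X = rng.standard_normal((n, d))
true_y = (X @ rng.standard_normal(d) + 0.2 * rng.standard_normal(n) > 0).astype(int)

# Positive-unlabelled labelling: positives are observed (labelled) with probability c.
c = 0.4                               # assumed lower bound on the propensity score
s = true_y * (rng.random(n) < c)      # surrogate label: 1 = labelled positive, 0 = unlabelled

# Simple weighting: labelled examples count as positives with weight 1/c,
# unlabelled examples count as negatives with weight 1 (a crude, bound-based scheme).
weights = np.where(s == 1, 1.0 / c, 1.0)

clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
clf.fit(X, s, sample_weight=weights)

print("accuracy against the (normally unobserved) true labels:", clf.score(X, true_y))
```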
25. Accelerated Stochastic Peaceman–Rachford Method for Empirical Risk Minimization
- Author
-
Bai, Jian-Chao, Bian, Feng-Miao, Chang, Xiao-Kai, and Du, Lin
- Published
- 2023
- Full Text
- View/download PDF
26. Error Density-dependent Empirical Risk Minimization.
- Author
-
Chen, Hong, Zhang, Xuelin, Gong, Tieliang, Gu, Bin, and Zheng, Feng
- Subjects
- *
MACHINE learning , *REGRESSION analysis , *GENERALIZATION , *DENSITY , *CLASSIFICATION - Abstract
Empirical Risk Minimization (ERM) with the squared loss has become one of the most popular principles for designing learning algorithms. However, the existing mean regression models under ERM usually suffer from poor generalization performance due to their sensitivity to atypical observations (e.g., outliers). For alleviating this sensitivity, some strategies have been proposed by utilizing the quantitative relationship of error values with different observations to form robust learning objectives. Instead of focusing on error values, this paper considers the error density to uncover the structure information of observations and proposes a new learning objective, called Error Density-dependent Empirical Risk Minimization (EDERM), for robust regression under complex data environments. Property characterizations and experimental analysis validate the robustness and competitiveness of the proposed EDERM-based learning models. The implemented codes can be found at https://github.com/zhangxuelincode/EDERM. • Error density-dependent learning paradigm beyond the error value modeling. • Robust estimation for both regression and classification. • Property characterization and empirical validations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Portfolio optimization with feedback strategies based on artificial neural networks.
- Author
-
Kopeliovich, Yaacov and Pokojovy, Michael
- Abstract
Dynamic portfolio optimization has significantly benefited from a wider adoption of deep learning (DL). While existing research has focused on how DL can be applied to solving the Hamilton–Jacobi–Bellman (HJB) equation, some very recent developments propose to forego the derivation of HJB in favor of empirical utility maximization over dynamic allocation strategies expressed through artificial neural networks. In addition to simplicity and transparency, this approach is universally applicable, as it is essentially agnostic about market dynamics. We apply it to optimal portfolio allocation between a cash account and a risky asset following the Heston model. The results appear on par with theoretical ones. • Deep learning for asset allocation via empirical utility maximization is studied. • The approach is validated for the Heston model calibrated on S&P 500/VIX and USO/OVX data. • The developments can serve as a blueprint for other types of asset allocation problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Set-valued Classification with Out-of-distribution Detection for Many Classes.
- Author
-
Zhou Wang and Xingye Qiao
- Subjects
- *
STATISTICAL learning , *FEATURE selection , *NUMERICAL analysis - Abstract
Set-valued classification, a new classification paradigm that aims to identify all the plausible classes that an observation belongs to, improves over the traditional classification paradigms in multiple aspects. Existing set-valued classification methods do not consider the possibility that the test set may contain out-of-distribution data, that is, the emergence of a new class that never appeared in the training data. Moreover, they are computationally expensive when the number of classes is large. We propose a Generalized Prediction Set (GPS) approach to set-valued classification while considering the possibility of a new class in the test data. The proposed classifier uses kernel learning and empirical risk minimization to encourage a small expected size of the prediction set while guaranteeing that the class-specific accuracy is at least some value specified by the user. For high-dimensional data, further improvement is obtained through kernel feature selection. Unlike previous methods, the proposed method achieves a good balance between accuracy, efficiency, and out-of-distribution detection rate. Moreover, our method can be applied in parallel to all the classes to alleviate the computational burden. Both theoretical analysis and numerical experiments are conducted to illustrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
29. On Tilted Losses in Machine Learning: Theory and Applications.
- Author
-
Li, Tian, Beirami, Ahmad, Sanjabi, Maziar, and Smith, Virginia
- Subjects
- *
MACHINE learning , *MACHINE theory , *INFORMATION theory , *ROBUST optimization , *VALUE at risk - Abstract
Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to ERM, tilted empirical risk minimization (TERM), which uses exponential tilting to flexibly tune the impact of individual losses. The resulting framework has several useful properties: We show that TERM can increase or decrease the influence of outliers, respectively, to enable fairness or robustness; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to the tail probability of losses. Our work makes connections between TERM and related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO). We develop batch and stochastic first-order optimization methods for solving TERM, provide convergence guarantees for the solvers, and show that the framework can be efficiently solved relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
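The TERM objective described above is (1/t) * log((1/n) * sum_i exp(t * loss_i(w))), and its gradient is a softmax(t * loss)-weighted average of the per-example gradients. Below is a minimal numpy sketch with a negative tilt (down-weighting outliers) on synthetic regression data; the model, data, and hyperparameters are invented for illustration.

```python
# A minimal sketch of tilted ERM (TERM) from the entry above. The objective is
#   (1/t) * log( (1/n) * sum_i exp(t * loss_i(w)) ),
# whose gradient is a softmax(t * loss)-weighted average of per-example gradients.
# Linear regression, data, and the tilt value are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)
y[:25] += 8.0                     # a few corrupted targets (outliers)

t = -1.0                          # t < 0 downweights outliers; t > 0 emphasizes them
w = np.zeros(d)
lr = 0.05

for step in range(2000):
    residuals = X @ w - y
    losses = 0.5 * residuals ** 2                 # per-example squared losses
    z = t * losses
    tilts = np.exp(z - z.max())                   # numerically stable softmax weights
    weights = tilts / tilts.sum()
    grad = X.T @ (weights * residuals)            # gradient of the tilted risk
    w -= lr * grad

print("parameter error ||w - w_true||:", np.linalg.norm(w - w_true))
print("median absolute residual:", np.median(np.abs(X @ w - y)))
```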
30. Minimax Estimation for Personalized Federated Learning: An Alternative between FedAvg and Local Training?
- Author
-
Shuxiao Chen, Qinqing Zheng, Qi Long, and Weijie J. Su
- Subjects
- *
FEDERATED learning , *INDIVIDUALIZED instruction , *MACHINE learning , *DISTRIBUTION (Probability theory) , *STATISTICAL learning , *HETEROGENEITY , *NP-complete problems - Abstract
A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often originate from distinct yet not entirely unrelated probability distributions, and personalization is, therefore, necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning using a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view, with a focus on the FEDAVG algorithm (McMahan et al., 2017) and pure local training (i.e., clients solve empirical risk minimization problems on their local datasets without any communication). Our main result reveals an approximate alternative between these two baseline algorithms for federated learning: the former algorithm is minimax rate optimal over a collection of instances when data heterogeneity is small, whereas the latter is minimax rate optimal when data heterogeneity is large, and the threshold is sharp up to a constant. As an implication, our results show that from a worst-case point of view, a dichotomous strategy that makes a choice between the two baseline algorithms is rate-optimal. Another implication is that the popular strategy of FEDAVG followed by local fine-tuning is also minimax optimal under additional regularity conditions. Our analysis relies on a new notion of algorithmic stability that takes into account the nature of federated learning. [ABSTRACT FROM AUTHOR]
- Published
- 2023
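A toy version of the two baselines contrasted above, FedAvg (local steps plus server averaging) versus pure local training, on synthetic heterogeneous linear-regression clients; everything here (heterogeneity model, client counts, step sizes) is invented for illustration and carries none of the paper's minimax analysis.

```python
# Toy sketch of FedAvg versus pure local training (the two baselines from the
# entry above). Synthetic heterogeneous clients; all settings are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_per_client, d = 10, 50, 5
heterogeneity = 0.5               # how far client optima spread around a shared center

w_center = rng.standard_normal(d)
client_w = w_center + heterogeneity * rng.standard_normal((n_clients, d))
data = []
for k in range(n_clients):
    Xk = rng.standard_normal((n_per_client, d))
    yk = Xk @ client_w[k] + 0.1 * rng.standard_normal(n_per_client)
    data.append((Xk, yk))

def local_gd(w0, Xk, yk, steps=20, lr=0.05):
    """A few local gradient steps on one client's least-squares objective."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * Xk.T @ (Xk @ w - yk) / len(yk)
    return w

# FedAvg: rounds of local steps followed by server-side averaging.
w_global = np.zeros(d)
for _ in range(50):
    w_global = np.mean([local_gd(w_global, Xk, yk) for Xk, yk in data], axis=0)

# Pure local training: each client fits only its own data, no communication.
w_local = [local_gd(np.zeros(d), Xk, yk, steps=1000) for Xk, yk in data]

fedavg_err = np.mean([np.linalg.norm(w_global - client_w[k]) for k in range(n_clients)])
local_err = np.mean([np.linalg.norm(w_local[k] - client_w[k]) for k in range(n_clients)])
print("mean distance to client optima - FedAvg:", fedavg_err, " local:", local_err)
```

Varying the heterogeneity constant in this toy setup is a quick way to see the trade-off the entry formalizes: with small heterogeneity the averaged model sits close to every client's optimum, while with large heterogeneity purely local fits win.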
31. Asynchronous Parallel, Sparse Approximated SVRG for High-Dimensional Machine Learning.
- Author
-
Shang, Fanhua, Huang, Hua, Fan, Jun, Liu, Yuanyuan, Liu, Hongying, and Liu, Jianhui
- Subjects
- *
MACHINE learning , *SPARSE approximations , *ASYNCHRONOUS learning , *MATHEMATICAL optimization , *PARALLEL algorithms - Abstract
With increasing data sizes and the development of multi-core computers, asynchronous parallel stochastic optimization algorithms such as KroMagnon have gained significant attention. In this paper, we propose a new Sparse approximation and asynchronous parallel Stochastic Variance Reduced Gradient (SSVRG) method for sparse and high-dimensional machine learning problems. Unlike standard SVRG and its asynchronous parallel variant, KroMagnon, the snapshot point of SSVRG is set to the average of all the iterates in the previous epoch, which allows it to take much larger learning rates and also makes it more robust to the choice of learning rates. In particular, we use the sparse approximation of the popular SVRG estimator to perform completely sparse updates at all iterations. Therefore, SSVRG has a much lower per-iteration computational cost than its dense counterpart, SVRG++, and is very friendly to asynchronous parallel implementation. Moreover, we provide the convergence guarantees of SSVRG for both strongly convex and non-strongly convex problems, while existing asynchronous algorithms (e.g., KroMagnon and ASAGA) only have convergence guarantees for strongly convex problems. Finally, we extend SSVRG to non-smooth and asynchronous parallel settings. Numerical experimental results demonstrate that SSVRG converges significantly faster than the state-of-the-art asynchronous parallel methods, e.g., KroMagnon, and is usually more than three orders of magnitude faster than SVRG++. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
32. Some Proposal of the High Dimensional PU Learning Classification Procedure
- Author
-
Furmańczyk, Konrad, Dudziński, Marcin, Dziewa-Dawidczyk, Diana, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Paszynski, Maciej, editor, Kranzlmüller, Dieter, editor, Krzhizhanovskaya, Valeria V., editor, Dongarra, Jack J., editor, and Sloot, Peter M.A., editor
- Published
- 2021
- Full Text
- View/download PDF
33. Escaping Saddle Points of Empirical Risk Privately and Scalably via DP-Trust Region Method
- Author
-
Wang, Di, Xu, Jinhui, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Hutter, Frank, editor, Kersting, Kristian, editor, Lijffijt, Jefrey, editor, and Valera, Isabel, editor
- Published
- 2021
- Full Text
- View/download PDF
34. Data sampling strategies in stochastic algorithms for empirical risk minimization
- Author
-
Csiba, Dominik, Richtarik, Peter, and Sutton, Charles
- Subjects
005.7, big data, stochastic dual coordinate ascent, empirical risk minimization, AdaSDCA+, minibatches, sampling, Polyak-Łojasiewicz condition, block selection rules - Abstract
Gradient descent methods and especially their stochastic variants have become highly popular in the last decade due to their efficiency on big data optimization problems. In this thesis we present the development of data sampling strategies for these methods. In the first four chapters we focus on four views on the sampling for convex problems, developing and analyzing new state-of-the-art methods using non-standard data sampling strategies. Finally, in the last chapter we present a more flexible framework, which generalizes to more problems as well as more sampling rules. In the first chapter we propose an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving the regularized empirical risk minimization (ERM) problem. Our modification consists in allowing the method to adaptively change the probability distribution over the dual variables throughout the iterative process. AdaSDCA achieves a provably better complexity bound than SDCA with the best fixed probability distribution, known as importance sampling. However, it is of a theoretical character as it is expensive to implement. We also propose AdaSDCA+: a practical variant which in our experiments outperforms existing non-adaptive methods. In the second chapter we extend the dual-free analysis of SDCA to arbitrary mini-batching schemes. Our method is able to better utilize the information in the data defining the ERM problem. For convex loss functions, our complexity results match those of QUARTZ, which is a primal-dual method also allowing for arbitrary mini-batching schemes. The advantage of a dual-free analysis comes from the fact that it guarantees convergence even for non-convex loss functions, as long as the average loss is convex. We illustrate through experiments the utility of being able to design arbitrary mini-batching schemes. In the third chapter we study importance sampling of minibatches. Minibatching is a well studied and highly popular technique in supervised learning, used by practitioners due to its ability to accelerate training through better utilization of parallel processing power and reduction of stochastic variance. Another popular technique is importance sampling, a strategy for preferential sampling of more important examples that is also capable of accelerating the training process. However, despite considerable effort by the community in these areas, and due to the inherent technical difficulty of the problem, there is no existing work combining the power of importance sampling with the strength of minibatching. In this chapter we propose the first importance sampling for minibatches and give a simple and rigorous complexity analysis of its performance. We illustrate on synthetic problems that for training data with certain properties, our sampling can lead to several orders of magnitude improvement in training time. We then test the new sampling on several popular datasets, and show that the improvement can reach an order of magnitude. In the fourth chapter we ask whether randomized coordinate descent (RCD) methods should be applied to the ERM problem or rather to its dual. When the number of examples (n) is much larger than the number of features (d), a common strategy is to apply RCD to the dual problem. On the other hand, when the number of features is much larger than the number of examples, it makes sense to apply RCD directly to the primal problem. In this paper we provide the first joint study of these two approaches when applied to L2-regularized ERM.
First, we show through a rigorous analysis that for dense data, the above intuition is precisely correct. However, we find that for sparse and structured data, primal RCD can significantly outperform dual RCD even if d ≪ n, and vice versa, dual RCD can be much faster than primal RCD even if n ≫ d. Moreover, we show that, surprisingly, a single sampling strategy minimizes both the (bound on the) number of iterations and the overall expected complexity of RCD. Note that the latter complexity measure also takes into account the average cost of the iterations, which depends on the structure and sparsity of the data, and on the sampling strategy employed. We confirm our theoretical predictions using extensive experiments with both synthetic and real data sets. In the last chapter we introduce two novel generalizations of the theory for gradient descent type methods in the proximal setting. Firstly, we introduce the proportion function, which we further use to analyze all the known block-selection rules for coordinate descent methods under a single framework. This framework includes randomized methods with uniform, non-uniform or even adaptive sampling strategies, as well as deterministic methods with batch, greedy or cyclic selection rules. We additionally introduce a novel block selection technique called greedy minibatches, for which we provide competitive convergence guarantees. Secondly, the whole theory of strongly-convex optimization was recently generalized to a specific class of non-convex functions satisfying the so-called Polyak-Łojasiewicz condition. To mirror this generalization in the weakly convex case, we introduce the Weak Polyak-Łojasiewicz condition, using which we give global convergence guarantees for a class of non-convex functions previously not considered in theory. Additionally, we give local convergence guarantees for an even larger class of non-convex functions satisfying only a certain smoothness assumption. By combining the two above mentioned generalizations we recover the state-of-the-art convergence guarantees for a large class of previously known methods and setups as special cases of our framework. Also, we provide new guarantees for many previously not considered combinations of methods and setups, as well as a huge class of novel non-convex objectives. The flexibility of our approach offers a lot of potential for future research, as any new block selection procedure will have a convergence guarantee for all objectives considered in our framework, while any new objective analyzed under our approach will have a whole fleet of block selection rules with convergence guarantees readily available.
- Published
- 2018
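One sampling idea running through the thesis above, non-uniform (importance) sampling, can be sketched generically: draw example i with probability p_i proportional to a smoothness proxy and rescale its gradient by 1/(n * p_i) so the step stays unbiased. The snippet below does this for logistic-loss SGD on synthetic data with a few large-norm rows; it is a textbook illustration, not the specific minibatch schemes developed in the thesis.

```python
# Textbook importance sampling for SGD on an ERM objective: sample index i with
# probability p_i proportional to the squared row norm, and rescale the gradient
# by 1/(n * p_i) to keep the update unbiased. Generic illustration only; this is
# not the minibatch importance-sampling scheme from the thesis above.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 20
X = rng.standard_normal((n, d))
X[: n // 20] *= 10.0                           # a few rows with much larger norms
y = (X @ rng.standard_normal(d) > 0).astype(float)

row_norms_sq = np.sum(X ** 2, axis=1)
p = row_norms_sq / row_norms_sq.sum()          # importance sampling distribution

def sgd(probabilities, steps=20000, lr=0.01):
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.choice(n, p=probabilities)
        pred = 1.0 / (1.0 + np.exp(-X[i] @ w))
        grad_i = (pred - y[i]) * X[i]                # gradient of the i-th logistic loss
        w -= lr * grad_i / (n * probabilities[i])    # unbiased reweighted step
    return w

def full_risk(w):
    z = X @ w
    return np.mean(np.logaddexp(0.0, -z) + (1.0 - y) * z)

print("uniform sampling risk:   ", full_risk(sgd(np.full(n, 1.0 / n))))
print("importance sampling risk:", full_risk(sgd(p)))
```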
35. Stochastic approximation versus sample average approximation for Wasserstein barycenters.
- Author
-
Dvinskikh, Darina
- Subjects
- *
STOCHASTIC approximation , *CENTROID , *STOCHASTIC programming , *MACHINE learning , *LEARNING communities , *CONFIDENCE intervals - Abstract
In the machine learning and optimization community, there are two main approaches for the convex risk minimization problem, namely the Stochastic Approximation (SA) and the Sample Average Approximation (SAA). In terms of the oracle complexity (required number of stochastic gradient evaluations), both approaches are considered equivalent on average (up to a logarithmic factor). The total complexity depends on a specific problem, however, starting from the work [A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM. J. Opt. 19 (2009), pp. 1574–1609] it was generally accepted that the SA is better than the SAA. We show that for the Wasserstein barycenter problem, this superiority can be inverted. We provide a detailed comparison by stating the complexity bounds for the SA and SAA implementations calculating barycenters defined with respect to optimal transport distances and entropy-regularized optimal transport distances. As a byproduct, we also construct confidence intervals for the barycenter defined with respect to entropy-regularized optimal transport distances in the ℓ 2 -norm. The preliminary results are derived for a general convex optimization problem given by the expectation to have other applications besides the Wasserstein barycenter problem. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. Deviation inequalities for stochastic approximation by averaging.
- Author
-
Fan, Xiequan, Alquier, Pierre, and Doukhan, Paul
- Subjects
- *
STOCHASTIC approximation , *RANDOM variables , *MARKOV processes , *STOCHASTIC models , *MARTINGALES (Mathematics) - Abstract
We introduce a class of Markov chains that includes models of stochastic approximation by averaging and non-averaging. Using a martingale approximation method, we establish various deviation inequalities for separately Lipschitz functions of such a chain, with different moment conditions on some dominating random variables of martingale differences. Finally, we apply these inequalities to stochastic approximation by averaging and empirical risk minimization. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
37. Data‐driven platelet inventory management under uncertainty in the remaining shelf life of units.
- Author
-
Abouee‐Mehrizi, Hossein, Mirjalili, Mahdi, and Sarhangian, Vahid
- Subjects
INVENTORY control ,BLOOD platelets ,BLOOD products ,AGE distribution ,RATINGS of hospitals - Abstract
Platelets are perishable (5–7 day shelf life) blood products required for a variety of clinical treatments. In North America, hospitals typically procure platelet units from a central supplier. As such, the remaining shelf life of the delivered units could be subject to high uncertainty. Our work focuses on developing new models that leverage the increasingly available data from hospital information systems to prescribe ordering decisions in the presence of this uncertainty. Specifically, we consider a periodic review, perishable inventory system with zero lead time and uncertainty in demand and remaining shelf life of orders, operating under an oldest‐unit, first‐out allocation policy. We consider a family of base stock policies and adopt an empirical risk minimization approach to estimate the required inventory at the beginning of each period. The required inventory level for each period is assumed to be a linear function of a set of observed features in that period and the coefficients of the linear model are obtained by minimizing an approximate measure of the in‐sample empirical cost, comprised of a weighted sum of shortage and expiry costs. Our fixed initial age model assumes a constant remaining shelf life for all units. Our robust model assumes that an adversary selects the remaining shelf life of units subject to an uncertainty budget determined through an endogenous uncertainty set. We investigate the out‐of‐sample performance of the proposed models in a case study using data from two Canadian hospitals and in comparison to the hospitals' historical performances as well as other benchmarks. Both models achieve significant improvements over the historical decisions. For instance, the fixed initial age model achieves a 53% and 93% reduction in the expiry rate and an 82% and 99% reduction in the shortage rate for the two hospitals, respectively. Further, it either outperforms or performs as well as the other benchmarks. The robust model achieves better out‐of‐sample generalizability and demonstrates a more "robust" performance under counterfactual remaining age distributions. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
38. Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Nonconvex Stochastic Optimization: Nonasymptotic Performance Bounds and Momentum-Based Acceleration.
- Author
-
Gao, Xuefeng, Gürbüzbalaban, Mert, and Zhu, Lingjiong
- Subjects
STOCHASTIC convergence ,RANDOM noise theory ,DEEP learning ,MACHINE learning - Abstract
Nonconvex Stochastic Optimization Nonconvex stochastic optimization problems arise in many machine learning problems, including deep learning. The stochastic gradient Hamiltonian Monte Carlo (SGHMC) is a variant of stochastic gradients with a momentum method in which a controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates toward a global minimum. SGHMC has shown empirical success in practice for solving nonconvex stochastic optimization problems. In "Global convergence of stochastic gradient Hamiltonian Monte Carlo for nonconvex stochastic optimization: Nonasymptotic performance bounds and momentum-based acceleration," Gao, Gürbüzbalaban, and Zhu provide, for the first time, the finite-time performance bounds for the global convergence of SGHMC in the context of both population and empirical risk minimization problems and show that acceleration with momentum is possible in the context of global nonconvex stochastic optimization. Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is a variant of stochastic gradients with momentum where a controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates toward a global minimum. Many works report its empirical success in practice for solving stochastic nonconvex optimization problems; in particular, it has been observed to outperform overdamped Langevin Monte Carlo–based methods, such as stochastic gradient Langevin dynamics (SGLD), in many applications. Although the asymptotic global convergence properties of SGHMC are well known, its finite-time performance is not well understood. In this work, we study two variants of SGHMC based on two alternative discretizations of the underdamped Langevin diffusion. We provide finite-time performance bounds for the global convergence of both SGHMC variants for solving stochastic nonconvex optimization problems with explicit constants. Our results lead to nonasymptotic guarantees for both population and empirical risk minimization problems. For a fixed target accuracy level on a class of nonconvex problems, we obtain complexity bounds for SGHMC that can be tighter than those available for SGLD. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
39. ASPDC: Accelerated SPDC Regularized Empirical Risk Minimization for Ill-Conditioned Problems in Large-Scale Machine Learning.
- Author
-
Liang, Haobang, Cai, Hao, Wu, Hejun, Shang, Fanhua, Cheng, James, and Li, Xiying
- Subjects
MACHINE learning ,INTERIOR-point methods - Abstract
This paper aims to improve the response speed of SPDC (stochastic primal–dual coordinate ascent) in large-scale machine learning, as the complexity of per-iteration of SPDC is not satisfactory. We propose an accelerated stochastic primal–dual coordinate ascent called ASPDC and its further accelerated variant, ASPDC-i. Our proposed ASPDC methods achieve a good balance between low per-iteration computation complexity and fast convergence speed, even when the condition number becomes very large. The large condition number causes ill-conditioned problems, which usually requires many more iterations before convergence and longer per-iteration times in data training for machine learning. We performed experiments on various machine learning problems. The experimental results demonstrate that ASPDC and ASPDC-i converge faster than their counterparts, and enjoy low per-iteration complexity as well. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
40. Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data
- Author
-
Teisseyre, Paweł, Mielniczuk, Jan, Łazęcka, Małgorzata, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Krzhizhanovskaya, Valeria V., editor, Závodszky, Gábor, editor, Lees, Michael H., editor, Dongarra, Jack J., editor, Sloot, Peter M. A., editor, Brissos, Sérgio, editor, and Teixeira, João, editor
- Published
- 2020
- Full Text
- View/download PDF
41. The Support Vector Machine
- Author
-
Land, Walker H., Jr. and Schaffer, J. David
- Published
- 2020
- Full Text
- View/download PDF
42. Distributed empirical risk minimization with differential privacy
- Author
-
Liu, Changxin, Johansson, Karl H., and Shi, Yang
- Abstract
This work studies the distributed empirical risk minimization (ERM) problem under a differential privacy (DP) constraint. Standard distributed algorithms typically achieve DP by perturbing all local subgradients with noise, leading to significantly degraded utility. To tackle this issue, we develop a class of private distributed dual averaging (DDA) algorithms, which activate a fraction of nodes to perform optimization. Such a subsampling procedure provably amplifies the DP guarantee, thereby achieving an equivalent level of DP with reduced noise. We prove that the proposed algorithms have utility loss comparable to centralized private algorithms for both general and strongly convex problems. When removing the noise, our algorithm attains the optimal O(1/t) convergence for non-smooth stochastic optimization. Finally, experimental results on two benchmark datasets are given to verify the effectiveness of the proposed algorithms.
- Published
- 2024
- Full Text
- View/download PDF
43. JOINT FEATURE SELECTION AND CLASSIFICATION FOR POSITIVE UNLABELLED MULTI–LABEL DATA USING WEIGHTED PENALIZED EMPIRICAL RISK MINIMIZATION.
- Author
-
Teisseyre, Paweł
- Subjects
FEATURE selection ,CLASSIFICATION - Abstract
We consider the positive-unlabelled multi-label scenario in which multiple target variables are not observed directly. Instead, we observe surrogate variables indicating whether or not the target variables are labelled. The presence of a label means that the corresponding variable is positive. The absence of the label means that the variable can be either positive or negative. We analyze embedded feature selection methods based on two weighted penalized empirical risk minimization frameworks. In the first approach, we introduce weights of observations. The idea is to assign larger weights to observations for which there is a consistency between the values of the true target variable and the corresponding surrogate variable. In the second approach, we consider a weighted empirical risk function which corresponds to the risk function for the true unobserved target variables. The weights in both the methods depend on the unknown propensity score functions, whose estimation is a challenging problem. We propose to use very simple bounds for the propensity score, which leads to relatively simple forms of weights. In the experiments we analyze the predictive power of the methods considered for different labelling schemes. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
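As a rough illustration of the observation-weighting idea in entry 43, the sketch below fits an L1-penalized, weighted logistic regression to single-label positive-unlabelled data. The multi-label setting, the second (risk-weighting) framework, and the paper's actual propensity-score bounds are not reproduced; the constant propensity_lb is an assumed lower bound introduced here for illustration only.

```python
# Hypothetical single-label illustration of weighted penalized ERM for
# positive-unlabelled data; `propensity_lb` is an assumed lower bound on the
# propensity score, not a quantity from the paper.
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y_true = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)   # latent labels
propensity_lb = 0.4                      # assumed lower bound on P(labelled | positive)
s = y_true * (rng.random(n) < propensity_lb)    # observed surrogate label

# Observation weights: labelled positives are up-weighted by 1/propensity_lb,
# unlabelled examples (treated as negatives) keep weight 1.
weights = np.where(s == 1, 1.0 / propensity_lb, 1.0)

def weighted_logistic_grad(w):
    """Gradient of the weighted logistic empirical risk."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (weights * (p - s)) / n

# Proximal gradient (ISTA) for the L1-penalized, weighted logistic risk.
lam_l1, step, w = 0.01, 0.5, np.zeros(d)
for _ in range(500):
    w = w - step * weighted_logistic_grad(w)
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam_l1, 0.0)   # soft-threshold

print("selected features:", np.nonzero(w)[0])
```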
44. A review on instance ranking problems in statistical learning.
- Author
-
Werner, Tino
- Subjects
STATISTICAL learning ,CREDIT risk ,FRAUD investigation ,VECTOR data ,SET-valued maps - Abstract
Ranking problems, also known as preference learning problems, form a widely studied class of statistical learning problems with many applications, including fraud detection, document ranking, medicine, chemistry, credit risk screening, image ranking and media memorability. While reviews already exist that concentrate on specific types of ranking problems, such as label and object ranking, there does not yet seem to be an overview devoted to instance ranking problems that both covers the developments in distinguishing between different types of instance ranking problems and carefully discusses their differences and the applicability of existing ranking algorithms to them. In instance ranking, one explicitly takes the responses into account, with the goal of inferring a scoring function that directly maps feature vectors to real-valued ranking scores; in contrast, in object ranking problems the ranks are given as preference information and the goal is to learn a permutation. In this article, we systematically review different types of instance ranking problems and the corresponding loss functions and goodness criteria, and we discuss the difficulties that arise when trying to optimize those criteria. To give a detailed and comprehensive overview of existing machine learning techniques for such ranking problems, we systematize them and recapitulate the corresponding optimization problems in a unified notation. We also discuss which of the instance ranking problems the respective algorithms are tailored to and identify their strengths and limitations. Computational aspects and open research problems are also considered. [ABSTRACT FROM AUTHOR] (A minimal pairwise-surrogate ranking sketch follows this entry.)
- Published
- 2022
- Full Text
- View/download PDF
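Entry 44 surveys instance ranking, where a scoring function maps feature vectors to real-valued scores. One standard member of this family, used here purely as a generic illustration and not taken from the review, replaces the hard-to-optimize ordering criteria with a pairwise hinge surrogate (RankSVM-style):

```python
# Generic sketch of a pairwise-surrogate instance-ranking method: learn a linear
# scoring function by minimizing a hinge loss over pairs (i, j) with y_i > y_j.
# All details are illustrative assumptions, not taken from the review.
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.3 * rng.normal(size=n)   # continuous responses

# All pairs (i, j) with y_i > y_j; the scorer should rank i above j.
ii, jj = np.where(y[:, None] > y[None, :])
D = X[ii] - X[jj]          # feature differences for all "should rank higher" pairs

w, step, lam = np.zeros(d), 0.01, 1e-3
for _ in range(300):
    margins = D @ w                      # s(x_i) - s(x_j) for every pair
    active = margins < 1.0               # pairs violating the unit margin
    grad = -D[active].sum(axis=0) / len(D) + lam * w
    w -= step * grad

# A simple goodness criterion: fraction of correctly ordered pairs (pairwise AUC).
print("correctly ordered pairs:", np.mean(D @ w > 0))
```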
45. Interpretation of Variable Consistency Dominance-Based Rough Set Approach by Minimization of Asymmetric Loss Function
- Author
-
Kusunoki, Yoshifumi, Błaszczyński, Jerzy, Inuiguchi, Masahiro, Słowiński, Roman, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Seki, Hirosato, editor, Nguyen, Canh Hao, editor, Huynh, Van-Nam, editor, and Inuiguchi, Masahiro, editor
- Published
- 2019
- Full Text
- View/download PDF
46. Hyperfast second-order local solvers for efficient statistically preconditioned distributed optimization
- Author
-
Pavel Dvurechensky, Dmitry Kamzolov, Aleksandr Lukashevich, Soomin Lee, Erik Ordentlich, César A. Uribe, and Alexander Gasnikov
- Subjects
Empirical risk minimization ,Distributed optimization ,Statistical preconditioning ,Tensor optimization methods ,Applied mathematics. Quantitative methods ,T57-57.97 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Statistical preconditioning enables fast methods for distributed large-scale empirical risk minimization problems. In this approach, multiple worker nodes compute gradients in parallel, which are then used by the central node to update the parameter by solving an auxiliary (preconditioned) smaller-scale optimization problem. The recently proposed Statistically Preconditioned Accelerated Gradient (SPAG) method [1] has complexity bounds superior to other such algorithms but requires an exact solution of a computationally intensive auxiliary optimization problem at every iteration. In this paper, we propose an Inexact SPAG (InSPAG) method and explicitly characterize the accuracy to which the corresponding auxiliary subproblem needs to be solved to guarantee the same convergence rate as the exact method. We build our results by first developing an inexact adaptive accelerated Bregman proximal gradient method for general optimization problems under relative smoothness and strong convexity assumptions, which may be of independent interest. Moreover, we explore the properties of the auxiliary problem in the InSPAG algorithm under the assumption of Lipschitz third-order derivatives and strong convexity. For this problem class, we develop a linearly convergent hyperfast second-order method and estimate the total complexity of InSPAG with the hyperfast auxiliary-problem solver. Finally, we illustrate the proposed method's practical efficiency by performing large-scale numerical experiments on logistic regression models. To the best of our knowledge, these are the first empirical results on implementing high-order methods on large-scale problems, as we work with data whose dimension is of the order of 3 million and whose number of samples is 700 million. (A sketch of a generic statistically preconditioned update follows this entry.)
- Published
- 2022
- Full Text
- View/download PDF
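A minimal sketch of the statistical-preconditioning idea described in entry 46 is given below: workers return gradients of the global logistic loss, and the server solves a gradient-corrected auxiliary problem built from its own local data, here only inexactly via a few gradient steps. This is a DANE/SPAG-flavoured toy, not the InSPAG method itself; all sizes and constants are hypothetical.

```python
# Sketch of statistical preconditioning for distributed logistic regression (not
# InSPAG): the server's local loss f_0 defines the preconditioner, and the
# auxiliary problem is solved inexactly by a handful of gradient steps.
import numpy as np

rng = np.random.default_rng(3)
n_workers, n_local, d = 8, 200, 10
w_true = rng.normal(size=d)
Xs = [rng.normal(size=(n_local, d)) for _ in range(n_workers)]
ys = [(rng.random(n_local) < 1 / (1 + np.exp(-(Xi @ w_true)))).astype(float) for Xi in Xs]

def grad(w, Xi, yi):
    """Gradient of the local logistic loss."""
    return Xi.T @ (1 / (1 + np.exp(-(Xi @ w))) - yi) / len(yi)

mu, w = 0.1, np.zeros(d)          # mu: proximal/regularization parameter (assumed)
X0, y0 = Xs[0], ys[0]             # the server's local data defines the preconditioner
for _ in range(30):
    g_full = np.mean([grad(w, Xi, yi) for Xi, yi in zip(Xs, ys)], axis=0)
    # Auxiliary problem: min_v f_0(v) + <g_full - grad f_0(w), v> + (mu/2)||v - w||^2,
    # solved only inexactly by a few gradient steps started from w.
    v = w.copy()
    for _ in range(20):
        aux_grad = grad(v, X0, y0) + (g_full - grad(w, X0, y0)) + mu * (v - w)
        v -= 0.5 * aux_grad
    w = v

print("parameter error:", np.linalg.norm(w - w_true))
```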
47. Empirical Risk Minimization under Random Censorship.
- Author
-
Ausset, Guillaume, Clémençon, Stephan, and Portier, François
- Subjects
- *
CREDIT analysis , *KAPLAN-Meier estimator , *STATISTICAL learning , *CREDIT risk , *ACCELERATED life testing , *CENSORSHIP , *SUPERVISED learning , *CENSORING (Statistics) - Abstract
We consider the classic supervised learning problem where a continuous non-negative random label Y (e.g. a random duration) is to be predicted, based upon observing a random vector X valued in R^d with d ≥ 1, by means of a regression rule with minimum least-squares error. In various applications, ranging from industrial quality control to public health through credit risk analysis for instance, training observations can be right censored, meaning that, rather than on independent copies of (X, Y), statistical learning relies on a collection of n ≥ 1 independent realizations of the triplet (X, min{Y, C}, δ), where C is a non-negative random variable with unknown distribution modelling censoring, and δ = I{Y ≤ C} indicates whether the duration is right censored or not. As ignoring censoring in the risk computation may clearly lead to a severe underestimation of the target duration and jeopardize prediction, we consider a plug-in estimate of the true risk based on a Kaplan-Meier estimator of the conditional survival function of the censoring C given X, referred to as the Beran risk, in order to perform empirical risk minimization. It is established, under mild conditions, that the learning rate of minimizers of this biased/weighted empirical risk functional is of order O_P(√(log(n)/n)) when ignoring model bias issues inherent to plug-in estimation, as can be attained in the absence of censoring. Beyond theoretical results, numerical experiments are presented in order to illustrate the relevance of the approach developed. [ABSTRACT FROM AUTHOR] (A sketch of censoring-weighted ERM follows this entry.)
- Published
- 2022
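The sketch below illustrates the censoring-weighted ERM idea of entry 47 with inverse-probability-of-censoring weights. For brevity it uses a marginal Kaplan-Meier estimate of the censoring survival function, whereas the paper weights with the conditional (Beran) estimator given X; the linear model for the log-duration and all constants are illustrative assumptions.

```python
# Inverse-probability-of-censoring-weighted ERM, in the spirit of entry 47 but with
# a marginal (not conditional/Beran) Kaplan-Meier estimate of the censoring survival.
import numpy as np

rng = np.random.default_rng(4)
n, d = 400, 3
X = rng.normal(size=(n, d))
Y = np.exp(0.5 * X @ np.array([1.0, -0.5, 0.2]) + 0.3 * rng.normal(size=n))  # true duration
C = rng.exponential(scale=np.median(Y) * 2.0, size=n)                        # censoring time
T = np.minimum(Y, C)                  # observed (possibly censored) duration
delta = (Y <= C).astype(float)        # 1 if uncensored

def km_survival(times, events, t_eval):
    """Kaplan-Meier survival of the *censoring* time, evaluated at each t_eval."""
    order = np.argsort(times)
    t_sorted, e_sorted = times[order], events[order]
    at_risk = np.arange(len(times), 0, -1)
    factors = np.where(e_sorted == 1, 1.0 - 1.0 / at_risk, 1.0)
    surv = np.cumprod(factors)
    idx = np.searchsorted(t_sorted, t_eval, side="right") - 1
    return np.where(idx >= 0, surv[np.clip(idx, 0, None)], 1.0)

G_hat = np.clip(km_survival(T, 1.0 - delta, T), 0.05, None)   # censoring survival at T_i
weights = delta / G_hat                                        # zero weight if censored

# Weighted least-squares ERM for predicting the log-duration from X.
A = X * weights[:, None]
beta = np.linalg.solve(X.T @ A + 1e-6 * np.eye(d), A.T @ np.log(T))
print("estimated coefficients:", beta)
```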
48. Learning curves for the multi-class teacher–student perceptron
- Author
-
Elisabetta Cornacchia, Francesca Mignacco, Rodrigo Veiga, Cédric Gerbelot, Bruno Loureiro, and Lenka Zdeborová
- Subjects
multi-class classification ,empirical risk minimization ,high-dimensional statistics ,Computer engineering. Computer hardware ,TK7885-7895 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with a single-layer teacher–student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal (BO) estimation and empirical risk minimisation (ERM) have been extensively analysed in this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification, yet an analogous analysis for the multi-class teacher–student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for the BO and ERM generalisation errors in the high-dimensional regime. For a Gaussian teacher, we investigate the performance of ERM with both cross-entropy and square losses and explore the role of ridge regularisation in approaching Bayes-optimality; in particular, we observe that regularised cross-entropy minimisation yields close-to-optimal accuracy. For a Rademacher teacher, instead, we show that a first-order phase transition arises in the BO performance. (A toy teacher–student sketch follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
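The following is a toy version of the teacher–student set-up in entry 48: i.i.d. Gaussian inputs, labels produced by a linear teacher through an argmax, and a student trained by ERM on the ridge-regularized cross-entropy loss. Dimensions, sample sizes and the regularization strength are arbitrary choices, not values from the paper, and no asymptotic formula is evaluated here.

```python
# Toy multi-class teacher-student experiment: Gaussian inputs, argmax teacher labels,
# student fitted by gradient descent on ridge-regularized cross-entropy (ERM).
import numpy as np

rng = np.random.default_rng(5)
n_train, n_test, d, K = 2000, 2000, 100, 3
W_teacher = rng.normal(size=(K, d)) / np.sqrt(d)

def sample(n):
    X = rng.normal(size=(n, d))
    y = np.argmax(X @ W_teacher.T, axis=1)      # teacher labels
    return X, y

X, y = sample(n_train)
Xt, yt = sample(n_test)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

lam, step, W = 1e-2, 0.5, np.zeros((K, d))       # lam: assumed ridge strength
Y_onehot = np.eye(K)[y]
for _ in range(500):
    P = softmax(X @ W.T)
    grad = (P - Y_onehot).T @ X / n_train + lam * W   # ridge-regularized CE gradient
    W -= step * grad

test_acc = np.mean(np.argmax(Xt @ W.T, axis=1) == yt)
print("test accuracy (generalisation proxy):", test_acc)
```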
49. Learning without Concentration.
- Author
-
MENDELSON, SHAHAR
- Subjects
SAMPLING errors ,RISK assessment -- Mathematical models ,SET theory ,MATHEMATICAL functions ,ESTIMATION theory ,MATHEMATICAL models - Abstract
We obtain sharp bounds on the estimation error of the Empirical Risk Minimization procedure, performed in a convex class and with respect to the squared loss, without assuming that class members and the target are bounded functions or have rapidly decaying tails. Rather than resorting to a concentration-based argument, the method used here relies on a "small-ball" assumption and thus holds for classes consisting of heavy-tailed functions and for heavy-tailed targets. The resulting estimates scale correctly with the "noise level" of the problem, and when applied to the classical, bounded scenario, they always improve the known bounds. [ABSTRACT FROM AUTHOR] (A statement of the small-ball condition follows this entry.)
- Published
- 2015
- Full Text
- View/download PDF
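For reference, the "small-ball" assumption mentioned in entry 49 is usually stated along the following lines; the notation and the constants κ and ε are chosen here and may differ from the paper's exact formulation.

```latex
% Standard form of the small-ball condition for a function class F; the constants
% \kappa, \varepsilon and the precise formulation may differ from the paper's.
\[
  \exists\, \kappa > 0,\ \varepsilon > 0 \ \text{ such that } \
  \mathbb{P}\bigl( \lvert f(X) - h(X) \rvert \ge \kappa \, \lVert f - h \rVert_{L_2} \bigr)
  \ge \varepsilon
  \quad \text{for all } f, h \in F .
\]
```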
50. Differentially private empirical risk minimization for AUC maximization.
- Author
-
Wang, Puyu, Yang, Zhenhuan, Lei, Yunwen, Ying, Yiming, and Zhang, Hai
- Subjects
- *
RECEIVER operating characteristic curves , *EXPECTATION-maximization algorithms , *FRAUD investigation , *LEAST squares , *U-statistics , *DATA recorders & recording , *CANCER diagnosis - Abstract
Area under the ROC curve (AUC) is a widely used performance measure for imbalanced classification. Oftentimes, ubiquitous imbalanced data such as financial records from fraud detection or genomic data from cancer diagnosis contain sensitive information, and it is therefore of practical and theoretical importance to develop privacy-preserving AUC maximization algorithms. In this paper, we propose differentially private empirical risk minimization (ERM) algorithms for AUC maximization and systematically study their privacy and utility guarantees. In particular, we establish guarantees on the generalization (utility) performance of the proposed algorithms with fast rates. The technical novelty includes fast rates for regularized ERM in AUC maximization, established using the peeling technique for Rademacher averages [1] and properties of U-statistics [2,3] to handle the statistically non-independent pairs of examples in the objective function, together with a new error decomposition to handle strongly smooth losses (e.g. the least squares loss). In addition, we revisit private ERM with pointwise losses [4,5] and show that optimal rates can be obtained using the uniform convergence approach. [ABSTRACT FROM AUTHOR] (A sketch of a pairwise AUC surrogate with output perturbation follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
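The sketch below illustrates the kind of objective entry 50 studies: regularized ERM over a pairwise least-squares AUC surrogate, followed by a generic Gaussian output perturbation. The noise scale is a placeholder and is not calibrated to a formal (ε, δ) budget the way the paper's algorithms are.

```python
# Illustrative sketch: regularized ERM on a pairwise least-squares AUC surrogate,
# then generic output perturbation. The noise scale is a placeholder and is NOT a
# differentially private calibration as in the paper.
import numpy as np

rng = np.random.default_rng(6)
n, d = 300, 8
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + rng.normal(size=n) > 1.0).astype(int)   # imbalanced labels

pos, neg = X[y == 1], X[y == 0]
ii, jj = np.meshgrid(np.arange(len(pos)), np.arange(len(neg)), indexing="ij")
D = pos[ii.ravel()] - neg[jj.ravel()]        # all positive-negative differences

# Pairwise least-squares surrogate: mean over pairs of (1 - w^T(x_pos - x_neg))^2 / 2,
# plus an L2 regularizer.
lam, step, w = 1e-2, 0.1, np.zeros(d)
for _ in range(300):
    grad = -D.T @ (1.0 - D @ w) / len(D) + lam * w
    w -= step * grad

w_private = w + 0.1 * rng.normal(size=d)     # output perturbation (placeholder scale)
auc = np.mean((pos @ w_private)[:, None] > (neg @ w_private)[None, :])
print("empirical AUC of the perturbed model:", auc)
```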