Search Results
45,558 results
2. Position Paper: Challenges and Opportunities in Topological Deep Learning
- Author
Papamarkou, Theodore, Birdal, Tolga, Bronstein, Michael, Carlsson, Gunnar, Curry, Justin, Gao, Yue, Hajij, Mustafa, Kwitt, Roland, Liò, Pietro, Di Lorenzo, Paolo, Maroulas, Vasileios, Miolane, Nina, Nasrin, Farzana, Ramamurthy, Karthikeyan Natesan, Rieck, Bastian, Scardapane, Simone, Schaub, Michael T., Veličković, Petar, Wang, Bei, Wang, Yusu, Wei, Guo-Wei, and Zamzmi, Ghada
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. At the same time, this paper serves as an invitation to the scientific community to actively participate in TDL research to unlock the potential of this emerging field.
- Published
- 2024
3. Position Paper: Why the Shooting in the Dark Method Dominates Recommender Systems Practice; A Call to Abandon Anti-Utopian Thinking
- Author
Rohde, David
- Subjects
Computer Science - Information Retrieval, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Applied recommender systems research is in a curious position. While there is a very rigorous protocol for measuring performance by A/B testing, best practice for finding a `B' to test does not explicitly target performance but rather targets a proxy measure. The success or failure of a given A/B test then depends entirely on whether the proposed proxy is better correlated with performance than the previous proxy. No principle exists to identify whether one proxy is better than another offline, leaving the practitioners shooting in the dark. The purpose of this position paper is to question this anti-Utopian thinking and argue that a non-standard use of the deep learning stacks actually has the potential to unlock reward-optimizing recommendation., Comment: 11 pages
- Published
- 2024
4. Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI
- Author
Papamarkou, Theodore, Skoularidou, Maria, Palla, Konstantina, Aitchison, Laurence, Arbel, Julyan, Dunson, David, Filippone, Maurizio, Fortuin, Vincent, Hennig, Philipp, Lobato, Jose Miguel Hernandez, Hubin, Aliaksandr, Immer, Alexander, Karaletsos, Theofanis, Khan, Mohammad Emtiyaz, Kristiadi, Agustinus, Li, Yingzhen, Mandt, Stephan, Nemeth, Christopher, Osborne, Michael A., Rudner, Tim G. J., Rügamer, David, Teh, Yee Whye, Welling, Max, Wilson, Andrew Gordon, and Zhang, Ruqi
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.
- Published
- 2024
5. Position Paper: Generalized grammar rules and structure-based generalization beyond classical equivariance for lexical tasks and transduction
- Author
Petrache, Mircea and Trivedi, Shubhendu
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Compositional generalization is one of the main properties which differentiates lexical learning in humans from state-of-the-art neural networks. We propose a general framework for building models that can generalize compositionally using the concept of Generalized Grammar Rules (GGRs), a class of symmetry-based compositional constraints for transduction tasks, which we view as a transduction analogue of equivariance constraints in physics-inspired tasks. Besides formalizing generalized notions of symmetry for language transduction, our framework is general enough to contain many existing works as special cases. We present ideas on how GGRs might be implemented, and in the process draw connections to reinforcement learning and other areas of research., Comment: 12 pages
- Published
- 2024
6. Neural Architecture Search: Insights from 1000 Papers
- Author
White, Colin, Safari, Mahmoud, Sukthanker, Rhea, Ru, Binxin, Elsken, Thomas, Zela, Arber, Dey, Debadeepta, and Hutter, Frank
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
- Abstract
In the past decade, advances in deep learning have resulted in breakthroughs in a variety of areas, including computer vision, natural language understanding, speech recognition, and reinforcement learning. Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas. Neural architecture search (NAS), the process of automating the design of neural architectures for a given task, is an inevitable next step in automating machine learning and has already outpaced the best human-designed architectures on many tasks. In the past few years, research in NAS has been progressing rapidly, with over 1000 papers released since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized and comprehensive guide to neural architecture search. We give a taxonomy of search spaces, algorithms, and speedup techniques, and we discuss resources such as benchmarks, best practices, other surveys, and open-source libraries.
- Published
- 2023
7. Pen and Paper Exercises in Machine Learning
- Author
Gutmann, Michael U.
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
This is a collection of (mostly) pen-and-paper exercises in machine learning. The exercises are on the following topics: linear algebra, optimisation, directed graphical models, undirected graphical models, expressive power of graphical models, factor graphs and message passing, inference for hidden Markov models, model-based learning (including ICA and unnormalised models), sampling and Monte-Carlo integration, and variational inference., Comment: The associated github page is https://github.com/michaelgutmann/ml-pen-and-paper-exercises
- Published
- 2022
8. You Are the Best Reviewer of Your Own Papers: An Owner-Assisted Scoring Mechanism
- Author
Su, Weijie J.
- Subjects
Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Mathematics - Optimization and Control, Statistics - Methodology, Statistics - Machine Learning
- Abstract
I consider a setting where reviewers offer very noisy scores for several items for the selection of high-quality ones (e.g., peer review of large conference proceedings), whereas the owner of these items knows the true underlying scores but prefers not to provide this information. To address this withholding of information, in this paper, I introduce the Isotonic Mechanism, a simple and efficient approach to improving imprecise raw scores by leveraging certain information that the owner is incentivized to provide. This mechanism takes the ranking of the items from best to worst provided by the owner as input, in addition to the raw scores provided by the reviewers. It reports the adjusted scores for the items by solving a convex optimization problem. Under certain conditions, I show that the owner's optimal strategy is to honestly report the true ranking of the items to her best knowledge in order to maximize the expected utility. Moreover, I prove that the adjusted scores provided by this owner-assisted mechanism are significantly more accurate than the raw scores provided by the reviewers. This paper concludes with several extensions of the Isotonic Mechanism and some refinements of the mechanism for practical consideration., Comment: Corrected typos and added a reference
- Published
- 2021
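The adjustment step described above is, at its core, an isotonic projection: the raw scores are projected onto the set of score vectors that are non-increasing along the owner's claimed ranking. A minimal sketch with off-the-shelf isotonic regression (function and variable names are illustrative, not from the paper):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def isotonic_mechanism(raw_scores, owner_ranking):
    """Adjust noisy review scores using the owner's claimed ranking.

    raw_scores: one reviewer score per item.
    owner_ranking: item indices from best to worst, as reported by the owner.
    Returns the least-squares projection of raw_scores onto score vectors
    that are non-increasing along owner_ranking (a convex problem solved
    by the pool-adjacent-violators algorithm).
    """
    raw_scores = np.asarray(raw_scores, dtype=float)
    ordered = raw_scores[owner_ranking]          # owner's best item first
    iso = IsotonicRegression(increasing=False)   # enforce non-increasing scores
    adjusted_ordered = iso.fit_transform(np.arange(len(ordered)), ordered)
    adjusted = np.empty_like(raw_scores)
    adjusted[owner_ranking] = adjusted_ordered   # back to original positions
    return adjusted

# Example: three items; the owner claims item 2 is best, then item 0, then 1.
print(isotonic_mechanism([6.0, 7.5, 5.0], owner_ranking=[2, 0, 1]))
```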
9. Intelligent Arxiv: Sort daily papers by learning users topics preference
- Author
Alvarez, Ezequiel, Lamagna, Federico, Miquel, Cesar, and Szewc, Manuel
- Subjects
Computer Science - Machine Learning, Astrophysics - High Energy Astrophysical Phenomena, General Relativity and Quantum Cosmology, High Energy Physics - Phenomenology, High Energy Physics - Theory, Statistics - Machine Learning
- Abstract
Current daily paper releases are becoming increasingly large and areas of research are growing in diversity. This makes it harder for scientists to keep up to date with the current state of the art and identify relevant work within their lines of interest. The goal of this article is to address this problem using Machine Learning techniques. We model a scientific paper as being built from a combination of scientific knowledge drawn from diverse topics and applied to a new problem. In light of this, we implement the unsupervised Machine Learning technique of Latent Dirichlet Allocation (LDA) on the corpus of papers in a given field to: i) define and extract underlying topics in the corpus; ii) get the topics weight vector for each paper in the corpus; and iii) get the topics weight vector for new papers. By registering papers preferred by a user, we build a user vector of weights using the information of the vectors of the selected papers. Hence, by performing an inner product between the user vector and each paper in the daily Arxiv release, we can sort the papers according to the user's preference on the underlying topics. We have created the website IArxiv.org where users can read sorted daily Arxiv releases (and more) while the algorithm learns each user's preference, yielding a more accurate sorting every day. The current IArxiv.org version runs on the Arxiv categories astro-ph, gr-qc, hep-ph and hep-th, and we plan to extend to others. We propose several new useful and relevant implementations to be additionally developed, as well as new Machine Learning techniques beyond LDA, to further improve the accuracy of this new tool., Comment: We are open to new ideas and to scientists and institutions wishing to collaborate and/or partner in further improvements for this service. With this tool the time a paper is sent is irrelevant for its order of appearance
- Published
- 2020
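The pipeline this abstract outlines (fit LDA on a corpus, average the topic vectors of a user's preferred papers, rank new papers by inner product) can be reproduced with standard tooling. A rough sketch on toy data; the names and the toy corpus are mine, not IArxiv's code:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for the corpus of papers in a given arXiv category.
corpus = ["dark matter halo simulations and galaxy rotation curves",
          "neutrino oscillation parameter fits from reactor data",
          "gravitational waves from binary black hole mergers"]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(corpus)

# (i) extract underlying topics and (ii) get a topic-weight vector per paper.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
paper_topics = lda.transform(counts)

# Build the user vector from the papers the user registered as preferred.
preferred = [0, 2]
user_vector = paper_topics[preferred].mean(axis=0)

# (iii) get topic vectors for a new daily release and sort by inner product.
release = ["gravitational wave catalogs", "lattice QCD hadron spectra"]
release_topics = lda.transform(vectorizer.transform(release))
order = np.argsort(release_topics @ user_vector)[::-1]
print([release[i] for i in order])
```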
10. Method and Dataset Mining in Scientific Papers
- Author
Yao, Rujing, Hou, Linlin, Ye, Yingchun, Wu, Ou, Zhang, Ji, and Wu, Jian
- Subjects
Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Information Retrieval, Statistics - Machine Learning
- Abstract
Literature analysis helps researchers better understand the development of science and technology. Conventional literature analysis focuses on the topics, authors, abstracts, keywords, references, etc., and rarely pays attention to the content of papers. In the field of machine learning, the methods (M) and datasets (D) involved are key information in papers. The extraction and mining of M and D are useful for discipline analysis and algorithm recommendation. In this paper, we propose a novel entity recognition model, called MDER, and construct datasets from the papers of the PAKDD conferences (2009-2019). Some preliminary experiments are conducted to assess the extraction performance, and the mining results are visualized.
- Published
- 2019
11. $hv$-Block Cross Validation is not a BIBD: a Note on the Paper by Jeff Racine (2000)
- Author
Zheng, Wenjie
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Statistics Theory, Quantitative Biology - Quantitative Methods, Statistics - Methodology
- Abstract
This note corrects a mistake in the paper "Consistent cross-validatory model-selection for dependent data: $hv$-block cross-validation" by Racine (2000). In his paper, he implied that the therein-proposed $hv$-block cross-validation is consistent in the sense of Shao (1993). To get this intuition, he relied on the speculation that $hv$-block is a balanced incomplete block design (BIBD). This note demonstrates that this is not the case, and thus the theoretical consistency of $hv$-block remains an open question. In addition, I also provide a Python program counting the number of occurrences of each sample and each pair of samples., Comment: Technical report. 5 pages, 1 figure
- Published
- 2019
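The counting exercise the note mentions is straightforward to reproduce. A rough sketch under one common indexing convention for hv-block folds (the exact block construction is my assumption; check it against Racine (2000) before trusting the counts):

```python
from collections import Counter
from itertools import combinations

def hv_block_counts(n, h, v):
    """Count, over all hv-block training sets on n ordered samples,
    how often each sample and each pair of samples appears."""
    single, pair = Counter(), Counter()
    for i in range(v, n - v):  # center of each validation block
        # Remove the validation block plus h samples on each side.
        removed = set(range(max(0, i - v - h), min(n, i + v + h + 1)))
        train = [j for j in range(n) if j not in removed]
        single.update(train)
        pair.update(combinations(train, 2))
    return single, pair

single, pair = hv_block_counts(n=20, h=2, v=1)
# In a BIBD every pair would appear equally often; here the counts differ.
print(min(pair.values()), max(pair.values()))
```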
12. The Role of Publicly Available Data in MICCAI Papers from 2014 to 2018
- Author
Heller, Nicholas, Rickman, Jack, Weight, Christopher, and Papanikolopoulos, Nikolaos
- Subjects
Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing, Statistics - Machine Learning
- Abstract
Widely-used public benchmarks are of huge importance to computer vision and machine learning research, especially with the computational resources required to reproduce state of the art results quickly becoming untenable. In medical image computing, the wide variety of image modalities and problem formulations yields a huge task-space for benchmarks to cover, and thus the widespread adoption of standard benchmarks has been slow, and barriers to releasing medical data exacerbate this issue. In this paper, we examine the role that publicly available data has played in MICCAI papers from the past five years. We find that more than half of these papers are based on private data alone, although this proportion seems to be decreasing over time. Additionally, we observed that after controlling for open access publication and the release of code, papers based on public data were cited over 60% more per year than their private-data counterparts. Further, we found that more than 20% of papers using public data did not provide a citation to the dataset or associated manuscript, highlighting the "second-rate" status that data contributions often take compared to theoretical ones. We conclude by making recommendations for MICCAI policies which could help to better incentivise data sharing and move the field toward more efficient and reproducible science., Comment: 8 pages, 2 figures
- Published
- 2019
13. Viability of machine learning to reduce workload in systematic review screenings in the health sciences: a working paper
- Author
Maaz, Muhammad
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Systematic reviews, which summarize and synthesize all the current research in a specific topic, are a crucial component of academia. They are especially important in the biomedical and health sciences, where they synthesize the state of medical evidence and conclude the best course of action for various diseases, pathologies, and treatments. Due to the immense amount of literature that exists, as well as the output rate of research, reviewing abstracts can be a laborious process. Automation may be able to significantly reduce this workload. Of course, such classifications are not easily automated due to the peculiar nature of written language. Machine learning may be able to help. This paper explored the viability and effectiveness of using machine learning modelling to classify abstracts according to specific exclusion/inclusion criteria, as would be done in the first stage of a systematic review. The specific task was deciding whether an abstract describes a randomized controlled trial (RCT) or not, a very common classification made in systematic reviews in the healthcare field. Random training/testing splits of an n=2042 dataset of labelled abstracts were repeatedly created (1000 times in total), with a model trained and tested on each of these instances. A Bayes classifier as well as an SVM classifier were used, and compared to non-machine-learning, simplistic approaches to textual classification. The SVM classifier was seen to be highly effective, yielding 90% accuracy and an F1 score of 0.84, with a potential workload reduction of 70%. This shows that machine learning has the potential to significantly revolutionize the abstract screening process in healthcare systematic reviews., Comment: 10 pages, 2 figures, 6 tables
- Published
- 2019
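The stronger of the two pipelines compared above (an SVM over abstract text, evaluated on repeated random splits) is easy to approximate. A sketch on toy data, with TF-IDF features as my assumption rather than the paper's documented setup:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Toy stand-ins; the study used n=2042 labelled abstracts and 1000 splits.
abstracts = ["double-blind randomized controlled trial of drug A",
             "retrospective case series of surgical outcomes",
             "participants were randomly assigned to placebo or treatment",
             "narrative review of hypertension management"] * 50
labels = np.array([1, 0, 1, 0] * 50)  # 1 = abstract describes an RCT

scores = []
for seed in range(100):  # repeated random training/testing splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        abstracts, labels, test_size=0.25, random_state=seed, stratify=labels)
    vec = TfidfVectorizer(stop_words="english")
    clf = LinearSVC().fit(vec.fit_transform(X_tr), y_tr)
    pred = clf.predict(vec.transform(X_te))
    scores.append((accuracy_score(y_te, pred), f1_score(y_te, pred)))

print(np.mean(scores, axis=0))  # mean accuracy and F1 across splits
```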
14. On Estimating Maximum Sum Rate of MIMO Systems with Successive Zero-Forcing Dirty Paper Coding and Per-antenna Power Constraint
- Author
Pham, Thuy M., Farrell, Ronan, and Tran, Le-Nam
- Subjects
Computer Science - Information Theory, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
In this paper, we study sum rate maximization for successive zero-forcing dirty-paper coding (SZFDPC) with per-antenna power constraint (PAPC). Although SZFDPC is a low-complexity alternative to the optimal dirty paper coding (DPC), efficient algorithms to compute its sum rate are still open problems, especially under the practical PAPC. The existing solution to the considered problem is computationally inefficient because it employs a high-complexity interior-point method. In this study, we propose two new low-complexity approaches to this important problem. More specifically, the first algorithm achieves the optimal solution by transforming the original problem in the broadcast channel into an equivalent problem in the multiple access channel; the resulting problem is then solved by alternating optimization together with successive convex approximation. We also derive a suboptimal solution based on machine learning, to which simple linear regressions are applicable. The approaches are analyzed and validated extensively to demonstrate their superiority over the existing approach., Comment: 5 pages, 4 figures
- Published
- 2019
15. Topological based classification of paper domains using graph convolutional networks
- Author
Benami, Idan, Cohen, Keren, Nagar, Oved, and Louzoun, Yoram
- Subjects
Computer Science - Social and Information Networks, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
The main approaches for node classification in graphs are information propagation and the association of the class of the node with external information. State-of-the-art methods merge these approaches through Graph Convolutional Networks. We here use the association of topological features of the nodes with their class to predict this class. Moreover, combining topological information with information propagation improves classification accuracy on the standard CiteSeer and Cora paper classification task. Topological features and information propagation produce results almost as good as text-based classification, without any textual or content information. We propose to represent the topology and information propagation through a GCN with the neighboring training node classification as an input and the current node classification as output. Such a formalism outperforms state-of-the-art methods.
- Published
- 2019
16. Should we Reload Time Series Classification Performance Evaluation ? (a position paper)
- Author
Gay, Dominique and Lemaire, Vincent
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning
- Abstract
Since the introduction and public availability of the UCR time series benchmark data sets, numerous Time Series Classification (TSC) methods have been designed, evaluated, and compared to each other. We suggest a critical view of the TSC performance evaluation protocols put in place in the recent TSC literature. The main goal of this `position' paper is to stimulate discussion and reflection about performance evaluation in the TSC literature., Comment: 8 pages
- Published
- 2019
17. Learning Taxonomies of Concepts and not Words using Contextualized Word Representations: A Position Paper
- Author
Schmelzeisen, Lukas and Staab, Steffen
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Taxonomies are semantic hierarchies of concepts. One limitation of current taxonomy learning systems is that they define concepts as single words. This position paper argues that contextualized word representations, which recently achieved state-of-the-art results on many competitive NLP tasks, are a promising method to address this limitation. We outline a novel approach for taxonomy learning that (1) defines concepts as synsets, (2) learns density-based approximations of contextualized word representations, and (3) can measure similarity and hypernymy among them., Comment: 5 pages, 1 figure
- Published
- 2019
18. Machine Learning in High Energy Physics Community White Paper
- Author
Albertsson, Kim, Altoe, Piero, Anderson, Dustin, Anderson, John, Andrews, Michael, Espinosa, Juan Pedro Araque, Aurisano, Adam, Basara, Laurent, Bevan, Adrian, Bhimji, Wahid, Bonacorsi, Daniele, Burkle, Bjorn, Calafiura, Paolo, Campanelli, Mario, Capps, Louis, Carminati, Federico, Carrazza, Stefano, Chen, Yi-fan, Childers, Taylor, Coadou, Yann, Coniavitis, Elias, Cranmer, Kyle, David, Claire, Davis, Douglas, De Simone, Andrea, Duarte, Javier, Erdmann, Martin, Eschle, Jonas, Farbin, Amir, Feickert, Matthew, Castro, Nuno Filipe, Fitzpatrick, Conor, Floris, Michele, Forti, Alessandra, Garra-Tico, Jordi, Gemmler, Jochen, Girone, Maria, Glaysher, Paul, Gleyzer, Sergei, Gligorov, Vladimir, Golling, Tobias, Graw, Jonas, Gray, Lindsey, Greenwood, Dick, Hacker, Thomas, Harvey, John, Hegner, Benedikt, Heinrich, Lukas, Heintz, Ulrich, Hooberman, Ben, Junggeburth, Johannes, Kagan, Michael, Kane, Meghan, Kanishchev, Konstantin, Karpiński, Przemysław, Kassabov, Zahari, Kaul, Gaurav, Kcira, Dorian, Keck, Thomas, Klimentov, Alexei, Kowalkowski, Jim, Kreczko, Luke, Kurepin, Alexander, Kutschke, Rob, Kuznetsov, Valentin, Köhler, Nicolas, Lakomov, Igor, Lannon, Kevin, Lassnig, Mario, Limosani, Antonio, Louppe, Gilles, Mangu, Aashrita, Mato, Pere, Meenakshi, Narain, Meinhard, Helge, Menasce, Dario, Moneta, Lorenzo, Moortgat, Seth, Neubauer, Mark, Newman, Harvey, Otten, Sydney, Pabst, Hans, Paganini, Michela, Paulini, Manfred, Perdue, Gabriel, Perez, Uzziel, Picazio, Attilio, Pivarski, Jim, Prosper, Harrison, Psihas, Fernanda, Radovic, Alexander, Reece, Ryan, Rinkevicius, Aurelius, Rodrigues, Eduardo, Rorie, Jamal, Rousseau, David, Sauers, Aaron, Schramm, Steven, Schwartzman, Ariel, Severini, Horst, Seyfert, Paul, Siroky, Filip, Skazytkin, Konstantin, Sokoloff, Mike, Stewart, Graeme, Stienen, Bob, Stockdale, Ian, Strong, Giles, Sun, Wei, Thais, Savannah, Tomko, Karen, Upfal, Eli, Usai, Emanuele, Ustyuzhanin, Andrey, Vala, Martin, Vasel, Justin, Vallecorsa, Sofia, Verzetti, Mauro, Vilasís-Cardona, Xavier, Vlimant, Jean-Roch, Vukotic, Ilija, Wang, Sean-Jiun, Watts, Gordon, Williams, Michael, Wu, Wenjing, Wunsch, Stefan, Yang, Kun, and Zapata, Omar
- Subjects
Physics - Computational Physics, Computer Science - Machine Learning, High Energy Physics - Experiment, Statistics - Machine Learning
- Abstract
Machine learning has been applied to several problems in particle physics research, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas for machine learning in particle physics. We detail a roadmap for their implementation, software and hardware resource requirements, collaborative initiatives with the data science community, academia and industry, and training the particle physics community in data science. The main objective of the document is to connect and motivate these areas of research and development with the physics drivers of the High-Luminosity Large Hadron Collider and future neutrino experiments and identify the resource needs for their implementation. Additionally we identify areas where collaboration with external communities will be of great benefit., Comment: Editors: Sergei Gleyzer, Paul Seyfert and Steven Schramm
- Published
- 2018
19. When SMILES have Language: Drug Classification using Text Classification Methods on Drug SMILES Strings
- Author
Wasi, Azmine Toushik, Šerbetar, Karlo, Islam, Raima, Rafi, Taki Hasan, and Chae, Dong-Kyu
- Subjects
Quantitative Biology - Biomolecules, Computer Science - Computation and Language, Computer Science - Information Retrieval, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Complex chemical structures, like drugs, are usually defined by SMILES strings as a sequence of molecules and bonds. These SMILES strings are used in different complex machine learning-based drug-related research and representation works. Escaping from complex representation, in this work, we pose a single question: What if we treat drug SMILES as conventional sentences and engage in text classification for drug classification? Our experiments affirm the possibility with very competitive scores. The study explores the notion of viewing each atom and bond as sentence components, employing basic NLP methods to categorize drug types, proving that complex problems can also be solved with simpler perspectives. The data and code are available here: https://github.com/azminewasi/Drug-Classification-NLP., Comment: 7 pages, 2 figures, 5 tables, Accepted (invited to present) to the The Second Tiny Papers Track at ICLR 2024 (https://openreview.net/forum?id=VUYCyH8fCw)
- Published
- 2024
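The premise of the abstract above (treat a SMILES string as an ordinary sentence and apply text classification) takes only a few lines to test. A minimal sketch; the character n-gram tokenization and the toy labels are my choices, not necessarily the paper's:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy SMILES strings with made-up class labels, for illustration only.
smiles = ["CC(=O)OC1=CC=CC=C1C(=O)O",        # aspirin
          "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",    # caffeine
          "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",   # ibuprofen
          "C1=CC=C(C=C1)C=O"]                # benzaldehyde
labels = ["analgesic", "stimulant", "analgesic", "other"]

# Treat each SMILES as a "sentence" and featurize with character n-grams.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
clf.fit(smiles, labels)
print(clf.predict(["CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"]))
```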
20. PAC-Bayesian Generalization Bounds for Knowledge Graph Representation Learning
- Author
Lee, Jaejun, Hwang, Minsung, and Whang, Joyce Jiyoung
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
- Abstract
While a number of knowledge graph representation learning (KGRL) methods have been proposed over the past decade, very few theoretical analyses have been conducted on them. In this paper, we present the first PAC-Bayesian generalization bounds for KGRL methods. To analyze a broad class of KGRL models, we propose a generic framework named ReED (Relation-aware Encoder-Decoder), which consists of a relation-aware message passing encoder and a triplet classification decoder. Our ReED framework can express at least 15 different existing KGRL models, including not only graph neural network-based models such as R-GCN and CompGCN but also shallow-architecture models such as RotatE and ANALOGY. Our generalization bounds for the ReED framework provide theoretical grounds for the commonly used tricks in KGRL, e.g., parameter-sharing and weight normalization schemes, and guide desirable design choices for practical KGRL methods. We empirically show that the critical factors in our generalization bounds can explain actual generalization errors on three real-world knowledge graphs., Comment: Accepted to ICML 2024. This is not the final version of the paper. The camera-ready version will be uploaded soon
- Published
- 2024
21. Interpretability Needs a New Paradigm
- Author
Madsen, Andreas, Lakkaraju, Himabindu, Reddy, Siva, and Chandar, Sarath
- Subjects
Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
- Abstract
Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be explained, and the post-hoc paradigm, which believes that black-box models can be explained. At the core of this debate is how each paradigm ensures its explanations are faithful, i.e., true to the model's behavior. This is important, as false but convincing explanations lead to unsupported confidence in artificial intelligence (AI), which can be dangerous. This paper's position is that we should think about new paradigms while staying vigilant regarding faithfulness. First, by examining the history of paradigms in science, we see that paradigms are constantly evolving. Then, by examining the current paradigms, we can understand their underlying beliefs, the value they bring, and their limitations. Finally, this paper presents 3 emerging paradigms for interpretability. The first paradigm designs models such that faithfulness can be easily measured. Another optimizes models such that explanations become faithful. The last paradigm proposes to develop models that produce both a prediction and an explanation.
- Published
- 2024
22. Learning with Posterior Sampling for Revenue Management under Time-varying Demand
- Author
Shimizu, Kazuma, Honda, Junya, Ito, Shinji, and Nakadai, Shinji
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
This paper discusses the revenue management (RM) problem to maximize revenue by pricing items or services. One challenge in this problem is that the demand distribution is unknown and varies over time in real applications such as airline and retail industries. In particular, the time-varying demand has not been well studied under scenarios of unknown demand due to the difficulty of jointly managing the remaining inventory and estimating the demand. To tackle this challenge, we first introduce an episodic generalization of the RM problem motivated by typical application scenarios. We then propose a computationally efficient algorithm based on posterior sampling, which effectively optimizes prices by solving linear programming. We derive a Bayesian regret upper bound of this algorithm for general models where demand parameters can be correlated between time periods, while also deriving a regret lower bound for generic algorithms. Our empirical study shows that the proposed algorithm performs better than other benchmark algorithms and comparably to the optimal policy in hindsight. We also propose a heuristic modification of the proposed algorithm, which further efficiently learns the pricing policy in the experiments., Comment: An extended version of the paper accepted by the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)
- Published
- 2024
23. Causality Pursuit from Heterogeneous Environments via Neural Adversarial Invariance Learning
- Author
Gu, Yihong, Fang, Cong, Bühlmann, Peter, and Fan, Jianqing
- Subjects
Mathematics - Statistics Theory, Computer Science - Machine Learning, Statistics - Methodology, Statistics - Machine Learning, 62G08
- Abstract
Statistics suffers from a fundamental problem, "the curse of endogeneity" -- the regression function, or more broadly the prediction risk minimizer with infinite data, may not be the target we wish to pursue. This is because when complex data are collected from multiple sources, the biases deviated from the interested (causal) association inherited in individuals or sub-populations are not expected to be canceled. Traditional remedies are of hindsight and restrictive in being tailored to prior knowledge like untestable cause-effect structures, resulting in methods that risk model misspecification and lack scalable applicability. This paper seeks to offer a purely data-driven and universally applicable method that only uses the heterogeneity of the biases in the data rather than following pre-offered commandments. Such an idea is formulated as a nonparametric invariance pursuit problem, whose goal is to unveil the invariant conditional expectation $m^\star(x)\equiv \mathbb{E}[Y^{(e)}|X_{S^\star}^{(e)}=x_{S^\star}]$ with unknown important variable set $S^\star$ across heterogeneous environments $e\in \mathcal{E}$. Under the structural causal model framework, $m^\star$ can be interpreted as certain data-driven causality in general. The paper contributes to proposing a novel framework, called Focused Adversarial Invariance Regularization (FAIR), formulated as a single minimax optimization program that can solve the general invariance pursuit problem. As illustrated by the unified non-asymptotic analysis, our adversarial estimation framework can attain provable sample-efficient estimation akin to standard regression under a minimal identification condition for various tasks and models. As an application, the FAIR-NN estimator realized by two Neural Network classes is highlighted as the first approach to attain statistically efficient estimation in general nonparametric invariance learning., Comment: 47 pages, 5 figures with appendix
- Published
- 2024
24. Stability Evaluation via Distributional Perturbation Analysis
- Author
Blanchet, Jose, Cui, Peng, Li, Jiajin, and Liu, Jiashuo
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Optimization and Control
- Abstract
The performance of learning models often deteriorates when deployed in out-of-sample environments. To ensure reliable deployment, we propose a stability evaluation criterion based on distributional perturbations. Conceptually, our stability evaluation criterion is defined as the minimal perturbation required on our observed dataset to induce a prescribed deterioration in risk evaluation. In this paper, we utilize the optimal transport (OT) discrepancy with moment constraints on the (sample, density) space to quantify this perturbation. Therefore, our stability evaluation criterion can address both data corruptions and sub-population shifts -- the two most common types of distribution shifts in real-world scenarios. To further realize practical benefits, we present a series of tractable convex formulations and computational methods tailored to different classes of loss functions. The key technical tool to achieve this is the strong duality theorem provided in this paper. Empirically, we validate the practical utility of our stability evaluation criterion across a host of real-world applications. These empirical studies showcase the criterion's ability not only to compare the stability of different learning models and features but also to provide valuable guidelines and strategies to further improve models., Comment: Accepted by ICML 2024
- Published
- 2024
25. Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design
- Author
Nguyen, Quan and Dieng, Adji Bousso
- Subjects
Statistics - Machine Learning, Condensed Matter - Materials Science, Computer Science - Machine Learning, Quantitative Biology - Biomolecules
- Abstract
Experimental design techniques such as active search and Bayesian optimization are widely used in the natural sciences for data collection and discovery. However, existing techniques tend to favor exploitation over exploration of the search space, which causes them to get stuck in local optima. This "collapse" problem prevents experimental design algorithms from yielding diverse high-quality data. In this paper, we extend the Vendi scores -- a family of interpretable similarity-based diversity metrics -- to account for quality. We then leverage these quality-weighted Vendi scores to tackle experimental design problems across various applications, including drug discovery, materials discovery, and reinforcement learning. We found that quality-weighted Vendi scores allow us to construct policies for experimental design that flexibly balance quality and diversity, and ultimately assemble rich and diverse sets of high-performing data points. Our algorithms led to a 70%-170% increase in the number of effective discoveries compared to baselines., Comment: Published in International Conference on Machine Learning, ICML 2024. Code can be found in the Vertaix GitHub: https://github.com/vertaix/Quality-Weighted-Vendi-Score. Paper dedicated to Kwame Nkrumah
- Published
- 2024
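For readers unfamiliar with the underlying metric: the Vendi score of a similarity matrix is the exponential of the entropy of its normalized eigenvalues, an "effective number of distinct items". A toy sketch of that base score plus one natural quality weighting (multiplying by average quality); check the paper for the exact weighted definition it adopts:

```python
import numpy as np

def vendi_score(K):
    """Vendi score for an n x n PSD similarity matrix with unit diagonal:
    exp of the Shannon entropy of the eigenvalues of K / n."""
    lam = np.linalg.eigvalsh(K / K.shape[0])
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues
    return float(np.exp(-np.sum(lam * np.log(lam))))

def quality_weighted_vendi(K, quality):
    """One natural quality weighting: average quality times diversity."""
    return float(np.mean(quality)) * vendi_score(K)

# Toy example: two near-duplicate candidates plus one distinct candidate.
K = np.array([[1.0, 0.95, 0.10],
              [0.95, 1.0, 0.10],
              [0.10, 0.10, 1.0]])
quality = np.array([0.9, 0.8, 0.7])
print(vendi_score(K), quality_weighted_vendi(K, quality))
```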
26. The Privacy Power of Correlated Noise in Decentralized Learning
- Author
Allouah, Youssef, Koloskova, Anastasia, Firdoussi, Aymane El, Jaggi, Martin, and Guerraoui, Rachid
- Subjects
Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Distributed, Parallel, and Cluster Computing, Mathematics - Optimization and Control, Statistics - Machine Learning
- Abstract
Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose Decor, a variant of decentralized SGD with differential privacy (DP) guarantees. Essentially, in Decor, users securely exchange randomness seeds in one communication round to generate pairwise-canceling correlated Gaussian noises, which are injected to protect local models at every communication round. We theoretically and empirically show that, for arbitrary connected graphs, Decor matches the central DP optimal privacy-utility trade-off. We do so under SecLDP, our new relaxation of local DP, which protects all user communications against an external eavesdropper and curious users, assuming that every pair of connected users shares a secret, i.e., an information hidden to all others. The main theoretical challenge is to control the accumulation of non-canceling correlated noise due to network sparsity. We also propose a companion SecLDP privacy accountant for public use., Comment: Accepted as conference paper at ICML 2024
- Published
- 2024
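The pairwise-canceling noise idea is simple to illustrate: each connected pair of users draws Gaussian noise from a shared seed, one endpoint adds it and the other subtracts it, so individual models are masked while the noise cancels in aggregate. A toy sketch that elides the secure seed exchange and the DP calibration (all names are mine):

```python
import numpy as np

n_users, dim = 4, 3
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]  # connected communication graph
rng = np.random.default_rng(0)
models = [rng.normal(size=dim) for _ in range(n_users)]

noisy = [m.copy() for m in models]
for k, (i, j) in enumerate(edges):
    # In Decor, i and j would derive this noise from a securely exchanged
    # randomness seed; here we simply draw it from a per-edge seed.
    z = np.random.default_rng(seed=k).normal(scale=1.0, size=dim)
    noisy[i] += z  # user i adds the pairwise noise ...
    noisy[j] -= z  # ... user j subtracts it, so it cancels in aggregate

# Individual models are masked, but the average is unchanged.
print(np.allclose(np.mean(models, axis=0), np.mean(noisy, axis=0)))
```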
27. Error Exponent in Agnostic PAC Learning
- Author
Hendel, Adi and Feder, Meir
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Statistical learning theory and the Probably Approximately Correct (PAC) criterion are the common approach to mathematical learning theory. PAC is widely used to analyze learning problems and algorithms, and has been studied thoroughly. Uniform worst-case bounds on the convergence rate have been well established using, e.g., VC theory or Rademacher complexity. However, in a typical scenario the performance could be much better. In this paper, we consider PAC learning using a somewhat different tradeoff, the error exponent - a well-established analysis method in Information Theory - which describes the exponential behavior of the probability that the risk will exceed a certain threshold as a function of the sample size. We focus on binary classification and find, under some stability assumptions, an improved distribution-dependent error exponent for a wide range of problems, establishing the exponential behavior of the PAC error probability in agnostic learning. Interestingly, under these assumptions, agnostic learning may have the same error exponent as realizable learning. The error exponent criterion can be applied to analyze knowledge distillation, a problem that so far lacks a theoretical analysis., Comment: paper with appendix of the accepted ISIT 2024 paper with the same name
- Published
- 2024
28. Uncertainty quantification for iterative algorithms in linear models with application to early stopping
- Author
Bellec, Pierre C. and Tan, Kai
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Statistics Theory, Statistics - Computation, Statistics - Methodology
- Abstract
This paper investigates the iterates $\hat{\boldsymbol{\beta}}^1,\dots,\hat{\boldsymbol{\beta}}^T$ obtained from iterative algorithms in high-dimensional linear regression problems, in the regime where the feature dimension $p$ is comparable with the sample size $n$, i.e., $p \asymp n$. The analysis and proposed estimators are applicable to Gradient Descent (GD), proximal GD, and their accelerated variants such as Fast Iterative Soft-Thresholding (FISTA). The paper proposes novel estimators for the generalization error of the iterate $\hat{\boldsymbol{\beta}}^t$ for any fixed iteration $t$ along the trajectory. These estimators are proved to be $\sqrt{n}$-consistent under Gaussian designs. Applications to early stopping are provided: when the generalization error of the iterates is a U-shaped function of the iteration $t$, the estimates allow one to select from the data an iteration $\hat{t}$ that achieves the smallest generalization error along the trajectory. Additionally, we provide a technique for developing debiasing corrections and valid confidence intervals for the components of the true coefficient vector from the iterate $\hat{\boldsymbol{\beta}}^t$ at any finite iteration $t$. Extensive simulations on synthetic data illustrate the theoretical results.
- Published
- 2024
29. From Linear to Linearizable Optimization: A Novel Framework with Applications to Stationary and Non-stationary DR-submodular Optimization
- Author
Pedramfar, Mohammad and Aggarwal, Vaneet
- Subjects
Mathematics - Optimization and Control, Computer Science - Computational Complexity, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
This paper introduces the notion of upper linearizable/quadratizable functions, a class that extends concavity and DR-submodularity in various settings, including monotone and non-monotone cases over different convex sets. A general meta-algorithm is devised to convert algorithms for linear/quadratic maximization into ones that optimize upper quadratizable functions, offering a unified approach to tackling concave and DR-submodular optimization problems. The paper extends these results to multiple feedback settings, facilitating conversions between semi-bandit/first-order feedback and bandit/zeroth-order feedback, as well as between first/zeroth-order feedback and semi-bandit/bandit feedback. Leveraging this framework, new algorithms are derived using existing results as base algorithms for convex optimization, improving upon state-of-the-art results in various cases. Dynamic and adaptive regret guarantees are obtained for DR-submodular maximization, marking the first algorithms to achieve such guarantees in these settings. Notably, the paper achieves these advancements with fewer assumptions compared to existing state-of-the-art results, underscoring its broad applicability and theoretical contributions to non-convex optimization., Comment: arXiv admin note: text overlap with arXiv:2402.08621
- Published
- 2024
30. A Quadrature Approach for General-Purpose Batch Bayesian Optimization via Probabilistic Lifting
- Author
Adachi, Masaki, Hayakawa, Satoshi, Jørgensen, Martin, Hamid, Saad, Oberhauser, Harald, and Osborne, Michael A.
- Subjects
Computer Science - Machine Learning, Mathematics - Numerical Analysis, Statistics - Machine Learning, 62C10, 62F15
- Abstract
Parallelisation in Bayesian optimisation is a common strategy but faces several challenges: the need for flexibility in acquisition functions and kernel choices, flexibility in dealing with discrete and continuous variables simultaneously, model misspecification, and, lastly, fast massive parallelisation. To address these challenges, we introduce a versatile and modular framework for batch Bayesian optimisation via probabilistic lifting with kernel quadrature, called SOBER, which we present as a Python library based on GPyTorch/BoTorch. Our framework offers the following unique benefits: (1) Versatility in downstream tasks under a unified approach. (2) A gradient-free sampler, which does not require the gradient of acquisition functions, offering domain-agnostic sampling (e.g., discrete and mixed variables, non-Euclidean space). (3) Flexibility in domain prior distribution. (4) Adaptive batch size (autonomous determination of the optimal batch size). (5) Robustness against a misspecified reproducing kernel Hilbert space. (6) Natural stopping criterion., Comment: This work is the journal extension of the workshop paper (arXiv:2301.11832) and AISTATS paper (arXiv:2306.05843). 48 pages, 11 figures
- Published
- 2024
31. Classification Tree-based Active Learning: A Wrapper Approach
- Author
Jose, Ashna, Devijver, Emilie, Amini, Massih-Reza, Jakse, Noel, and Poloni, Roberta
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Supervised machine learning often requires large training sets to train accurate models, yet obtaining large amounts of labeled data is not always feasible. Hence, it becomes crucial to explore active learning methods for reducing the size of training sets while maintaining high accuracy. The aim is to select the optimal subset of data for labeling from an initial unlabeled set, ensuring precise prediction of outcomes. However, conventional active learning approaches often perform no better than classical random sampling. This paper proposes a wrapper active learning method for classification that organizes the sampling process into a tree structure and improves on state-of-the-art algorithms. A classification tree constructed on an initial set of labeled samples is used to decompose the space into low-entropy regions. Input-space-based criteria are then used to sub-sample from these regions, with the total number of points to be labeled divided among the regions. This adaptation proves to be a significant enhancement over existing active learning methods. Through experiments conducted on various benchmark data sets, the paper demonstrates the efficacy of the proposed framework by being effective in constructing accurate classification models, even when provided with a severely restricted labeled data set.
- Published
- 2024
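A rough sketch of the wrapper idea described in the abstract above (partition the space with a tree fit on the current labels, then spread the labeling budget across the resulting regions). The proportional budget allocation and random within-region sampling below are simplified stand-ins for the paper's input-space criteria:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_wrapper_query(X_labeled, y_labeled, X_pool, budget, seed=0):
    """Choose `budget` pool indices to label next, spread across the
    regions (leaves) of a tree fit on the currently labeled data."""
    rng = np.random.default_rng(seed)
    tree = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0)
    tree.fit(X_labeled, y_labeled)
    leaves = tree.apply(X_pool)  # region id for each unlabeled point
    chosen = []
    for leaf in np.unique(leaves):
        idx = np.flatnonzero(leaves == leaf)
        k = max(1, round(budget * len(idx) / len(X_pool)))  # proportional share
        chosen.extend(rng.choice(idx, size=min(k, len(idx)), replace=False))
    return np.asarray(chosen[:budget])

rng = np.random.default_rng(1)
X_lab = rng.normal(size=(30, 2))
y_lab = (X_lab[:, 0] > 0).astype(int)
X_pool = rng.normal(size=(200, 2))
print(tree_wrapper_query(X_lab, y_lab, X_pool, budget=10))
```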
32. An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization
- Author
Chen, Minshuo, Mei, Song, Fan, Jianqing, and Wang, Mengdi
- Subjects
Computer Science - Machine Learning, Mathematics - Statistics Theory, Statistics - Machine Learning
- Abstract
Diffusion models, a powerful and universal generative AI technology, have achieved tremendous success in computer vision, audio, reinforcement learning, and computational biology. In these applications, diffusion models provide flexible high-dimensional data modeling, and act as a sampler for generating new samples under active guidance towards task-desired properties. Despite the significant empirical success, theory of diffusion models is very limited, potentially slowing down principled methodological innovations for further harnessing and improving diffusion models. In this paper, we review emerging applications of diffusion models, understanding their sample generation under various controls. Next, we overview the existing theories of diffusion models, covering their statistical properties and sampling capabilities. We adopt a progressive routine, beginning with unconditional diffusion models and connecting to conditional counterparts. Further, we review a new avenue in high-dimensional structured optimization through conditional diffusion models, where searching for solutions is reformulated as a conditional sampling problem and solved by diffusion models. Lastly, we discuss future directions about diffusion models. The purpose of this paper is to provide a well-rounded theoretical exposure for stimulating forward-looking theories and methods of diffusion models.
- Published
- 2024
33. On the Convergence of Continual Learning with Adaptive Methods
- Author
Han, Seungyub, Kim, Yeongmo, Cho, Taehyun, and Lee, Jungwoo
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
- Abstract
One of the objectives of continual learning is to prevent catastrophic forgetting in learning multiple tasks sequentially, and the existing solutions have been driven by the conceptualization of the plasticity-stability dilemma. However, the convergence of continual learning for each sequential task is less studied so far. In this paper, we provide a convergence analysis of memory-based continual learning with stochastic gradient descent and empirical evidence that training current tasks causes the cumulative degradation of previous tasks. We propose an adaptive method for nonconvex continual learning (NCCL), which adjusts step sizes of both previous and current tasks with the gradients. The proposed method can achieve the same convergence rate as the SGD method when the catastrophic forgetting term which we define in the paper is suppressed at each iteration. Further, we demonstrate that the proposed algorithm improves the performance of continual learning over existing methods for several image classification tasks., Comment: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2023), see https://proceedings.mlr.press/v216/han23a.html
- Published
- 2024
34. On the Learnability of Out-of-distribution Detection
- Author
Fang, Zhen, Li, Yixuan, Liu, Feng, Han, Bo, and Lu, Jie
- Subjects
Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
- Abstract
Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms, and the corresponding learning theory is still an open problem. To study the generalization of OOD detection, this paper investigates the probably approximately correct (PAC) learning theory of OOD detection that fits the commonly used evaluation metrics in the literature. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we offer theoretical support for representative OOD detection works based on our OOD theory., Comment: Accepted by JMLR on 7 April 2024. This is a journal extension of the previous NeurIPS 2022 Outstanding Paper "Is Out-of-distribution Detection Learnable?" [arXiv:2210.14707]
- Published
- 2024
35. Demand Balancing in Primal-Dual Optimization for Blind Network Revenue Management
- Author
Miao, Sentao and Wang, Yining
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning
- Abstract
This paper proposes a practically efficient algorithm with optimal theoretical regret which solves the classical network revenue management (NRM) problem with unknown, nonparametric demand. Over a time horizon of length $T$, in each time period the retailer needs to decide prices of $N$ types of products which are produced based on $M$ types of resources with unreplenishable initial inventory. When demand is nonparametric with some mild assumptions, Miao and Wang (2021) is the first paper which proposes an algorithm with $O(\text{poly}(N,M,\ln(T))\sqrt{T})$ type of regret (in particular, $\tilde O(N^{3.5}\sqrt{T})$ plus additional high-order terms that are $o(\sqrt{T})$ with sufficiently large $T\gg N$). In this paper, we improve the previous result by proposing a primal-dual optimization algorithm which is not only more practical, but also with an improved regret of $\tilde O(N^{3.25}\sqrt{T})$ free from additional high-order terms. A key technical contribution of the proposed algorithm is the so-called demand balancing, which pairs the primal solution (i.e., the price) in each time period with another price to offset the violation of complementary slackness on resource inventory constraints. Numerical experiments compared with several benchmark algorithms further illustrate the effectiveness of our algorithm.
- Published
- 2024
36. Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization
- Author
Zhang, Qi, Zhou, Yi, Prater-Bennette, Ashley, Shen, Lixin, and Zou, Shaofeng
- Subjects
Statistics - Machine Learning, Computer Science - Machine Learning
- Abstract
Distributionally robust optimization (DRO) is a powerful framework for training robust models against data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions and exclude the practical and challenging case with a non-convex loss function, e.g., a neural network. This paper develops a stochastic algorithm and its performance analysis for non-convex constrained DRO. The computational complexity of our stochastic algorithm at each iteration is independent of the overall dataset size, and thus it is suitable for large-scale applications. We focus on the uncertainty set defined by the general Cressie-Read family of divergences, which includes the $\chi^2$-divergence as a special case. We prove that our algorithm finds an $\epsilon$-stationary point with a computational complexity of $\mathcal{O}(\epsilon^{-3k_*-5})$, where $k_*$ is the parameter of the Cressie-Read divergence. The numerical results indicate that our method outperforms existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO., Comment: We have corrected Theorem 1 in Sec. 4 for the AAAI 2024 version, where the order of $n_z$ changes from $\epsilon^{-k_*}$ to $\epsilon^{-2k_*-2}$
- Published
- 2024
37. Optimal Policy Learning with Observational Data in Multi-Action Scenarios: Estimation, Risk Preference, and Potential Failures
- Author
Cerulli, Giovanni
- Subjects
Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
This paper deals with optimal policy learning (OPL) with observational data, i.e. data-driven optimal decision-making, in multi-action (or multi-arm) settings, where a finite set of decision options is available. It is organized in three parts, where I discuss respectively: estimation, risk preference, and potential failures. The first part provides a brief review of the key approaches to estimating the reward (or value) function and optimal policy within this context of analysis. Here, I delineate the identification assumptions and statistical properties related to offline optimal policy learning estimators. In the second part, I delve into the analysis of decision risk. This analysis reveals that the optimal choice can be influenced by the decision maker's attitude towards risks, specifically in terms of the trade-off between reward conditional mean and conditional variance. Here, I present an application of the proposed model to real data, illustrating that the average regret of a policy with multi-valued treatment is contingent on the decision-maker's attitude towards risk. The third part of the paper discusses the limitations of optimal data-driven decision-making by highlighting conditions under which decision-making can falter. This aspect is linked to the failure of the two fundamental assumptions essential for identifying the optimal choice: (i) overlapping, and (ii) unconfoundedness. Some conclusions end the paper.
- Published
- 2024
38. Clustering Change Sign Detection by Fusing Mixture Complexity
- Author
Urano, Kento, Yuki, Ryo, and Yamanishi, Kenji
- Subjects
Statistics - Machine Learning, Computer Science - Information Theory, Computer Science - Machine Learning
- Abstract
This paper proposes an early detection method for cluster structural changes. Cluster structure refers to discrete structural characteristics, such as the number of clusters, when data are represented using finite mixture models, such as Gaussian mixture models. We focused on scenarios in which the cluster structure gradually changed over time. For finite mixture models, the concept of mixture complexity (MC) measures the continuous cluster size by considering the cluster proportion bias and overlap between clusters. In this paper, we propose MC fusion as an extension of MC to handle situations in which multiple mixture numbers are possible in a finite mixture model. By incorporating the fusion of multiple models, our approach accurately captured the cluster structure during transitional periods of gradual change. Moreover, we introduce a method for detecting changes in the cluster structure by examining the transition of MC fusion. We demonstrate the effectiveness of our method through empirical analysis using both artificial and real-world datasets., Comment: 23 pages
- Published
- 2024
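A minimal computational sketch of an MC-style quantity, assuming MC can be operationalized as the estimated mutual information between data points and latent cluster assignments (our reading of "proportion bias and overlap"; the paper's exact definition of MC and of the fusion step may differ):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def mixture_complexity(X, n_components, seed=0):
        # Entropy of the mixing proportions (captures proportion bias) minus
        # the mean entropy of per-point responsibilities (captures overlap):
        # well-separated balanced clusters give ~log(n_components), while
        # heavy overlap drives the value toward 0.
        gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X)
        resp = gmm.predict_proba(X)                     # responsibilities, (n, k)
        pi = resp.mean(axis=0)                          # estimated proportions
        h_pi = -np.sum(pi * np.log(pi + 1e-12))
        h_resp = -np.mean(np.sum(resp * np.log(resp + 1e-12), axis=1))
        return h_pi - h_resp                            # I(X; Z) estimate, in nats

Tracking such a quantity over a sliding window, and fusing it across several candidate component counts as the paper proposes, is what surfaces gradual structural change before the discrete cluster number itself switches.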
39. Usage-Specific Survival Modeling Based on Operational Data and Neural Networks
- Author
-
Holmer, Olov, Krysander, Mattias, and Frisk, Erik
- Subjects
Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Systems and Control ,Statistics - Machine Learning - Abstract
Accurate predictions of when a component will fail are crucial when planning maintenance, and by modeling the distribution of these failure times, survival models have been shown to be particularly useful in this context. The presented methodology is based on conventional neural-network-based survival models that are trained using data that is continuously gathered and stored at specific times, called snapshots. An important property of this type of training data is that it can contain more than one snapshot from a specific individual, which means that standard maximum-likelihood training cannot be applied directly since the data are not independent. However, the paper shows that if the data are in a specific format where all snapshot times are the same for all individuals, called homogeneously sampled, maximum-likelihood training can be applied and produces desirable results. In many cases, the data are not homogeneously sampled, and in this case it is proposed to resample the data to make them homogeneously sampled. How densely the dataset is sampled turns out to be an important parameter: it should be chosen large enough to produce good results, but this also increases the size of the dataset, which makes training slow. To reduce the number of samples needed during training, the paper also proposes a technique that, instead of resampling the dataset once before training starts, randomly resamples the dataset at the start of each epoch; a sketch of this per-epoch resampling follows this entry. The proposed methodology is evaluated on both a simulated dataset and an experimental dataset of starter-battery failures. The results show that if the data are homogeneously sampled, the methodology works as intended and produces accurate survival models. The results also show that randomly resampling the dataset on each epoch is an effective way to reduce the size of the training data., Comment: 7 pages
- Published
- 2024
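A sketch of the per-epoch resampling idea as we read it: draw a fresh common grid of snapshot times each epoch and keep, for every individual, the stored snapshot nearest to each grid time. The grid construction and nearest-snapshot rule here are illustrative assumptions, not the paper's actual scheme:

    import numpy as np

    def resample_epoch(snapshots, times, grid_size, rng):
        # snapshots: list over individuals, each an array of shape (m_i, d)
        # times:     list over individuals, each an array of shape (m_i,)
        # Returns one snapshot per individual per grid time, so that every
        # individual shares the same snapshot times within this epoch.
        t_lo = max(t.min() for t in times)
        t_hi = min(t.max() for t in times)
        grid = np.sort(rng.uniform(t_lo, t_hi, size=grid_size))  # fresh grid
        batch = []
        for s, t in zip(snapshots, times):
            idx = np.abs(t[None, :] - grid[:, None]).argmin(axis=1)
            batch.append(s[idx])                # nearest stored snapshot per time
        return grid, np.stack(batch)            # (grid_size,), (n, grid_size, d)

Calling this at the start of every epoch gives each epoch a small homogeneously sampled dataset, which is the trade-off the paper evaluates against resampling once with a dense grid.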
40. skscope: Fast Sparsity-Constrained Optimization in Python
- Author
-
Wang, Zezhi, Zhu, Jin, Chen, Peng, Peng, Huiyang, Zhang, Xiaoke, Wang, Anran, Zheng, Yu, Zhu, Junxian, and Wang, Xueqin
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning ,Statistics - Computation - Abstract
Applying iterative solvers to sparsity-constrained optimization (SCO) requires tedious mathematical derivation and careful programming/debugging that hinder these solvers' broad impact. In this paper, the library skscope is introduced to overcome this obstacle. With skscope, users can solve an SCO problem by programming just the objective function. The convenience of skscope is demonstrated through two examples in the paper, where sparse linear regression and trend filtering are addressed with just four lines of code; an illustrative snippet in this spirit follows this entry. More importantly, skscope's efficient implementation allows state-of-the-art solvers to quickly attain a sparse solution regardless of the high dimensionality of the parameter space. Numerical experiments reveal that the available solvers in skscope can achieve up to 80x speedup over competing relaxation solutions obtained via the benchmarked convex solver. skscope is published on the Python Package Index (PyPI) and Conda, and its source code is available at: https://github.com/abess-team/skscope., Comment: 4 pages
- Published
- 2024
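In the spirit of the four-line examples the abstract mentions, sparse linear regression might look roughly as follows. ScopeSolver and its solve method are our best reading of the library's interface from the paper's description; consult the repository above for the authoritative API:

    import numpy as np
    import jax.numpy as jnp
    from skscope import ScopeSolver      # pip install skscope

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    beta = np.zeros(20); beta[:3] = 1.0              # true 3-sparse coefficients
    y = X @ beta + 0.1 * rng.standard_normal(100)

    # min ||y - Xw||^2 subject to ||w||_0 <= 3: only the objective is programmed.
    solver = ScopeSolver(dimensionality=20, sparsity=3)
    w_hat = solver.solve(lambda w: jnp.sum((y - X @ w) ** 2))
    print(np.nonzero(w_hat)[0])                      # recovered support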
41. A note on generalization bounds for losses with finite moments
- Author
-
Rodríguez-Gálvez, Borja, Rivasplata, Omar, Thobaben, Ragnar, and Skoglund, Mikael
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
This paper studies the truncation method from Alquier [1] to derive high-probability PAC-Bayes bounds for unbounded losses with heavy tails. Assuming that the $p$-th moment is bounded, the resulting bounds interpolate between a slow rate $1/\sqrt{n}$ when $p=2$ and a fast rate $1/n$ when $p \to \infty$ and the loss is essentially bounded; one rate consistent with these endpoints is noted after this entry. Moreover, the paper derives a high-probability PAC-Bayes bound for losses with bounded variance. This bound has an exponentially better dependence on the confidence parameter and the dependency measure than previous bounds in the literature. Finally, the paper extends all results to guarantees in expectation and to single-draw PAC-Bayes. In order to do so, it obtains analogues of the PAC-Bayes fast-rate bound for bounded losses from [2] in these settings., Comment: 9 pages: 5 of main text, 1 of references, and 3 of appendices
- Published
- 2024
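For intuition about the interpolation above: the natural rate for a loss with bounded $p$-th moment that matches both quoted endpoints is (our reconstruction; the paper's bounds carry additional moment, complexity, and confidence terms):

$$n^{-\frac{p-1}{p}}, \qquad \text{which equals } n^{-1/2} \text{ at } p = 2 \text{ and tends to } n^{-1} \text{ as } p \to \infty.$$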
42. Near-Optimal Differentially Private Low-Rank Trace Regression with Guaranteed Private Initialization
- Author
-
Zha, Mengyue
- Subjects
Statistics - Machine Learning ,Computer Science - Cryptography and Security ,Computer Science - Machine Learning - Abstract
We study differentially private (DP) estimation of a rank-$r$ matrix $M \in \mathbb{R}^{d_1\times d_2}$ under the trace regression model with Gaussian measurement matrices; the model is written out after this entry. Theoretically, the sensitivity of non-private spectral initialization is precisely characterized, and the differential-privacy-constrained minimax lower bound for estimating $M$ under the Schatten-$q$ norm is established. Methodologically, the paper introduces a computationally efficient algorithm for DP-initialization with a sample size of $n \geq \widetilde O (r^2 (d_1\vee d_2))$. Under certain regularity conditions, the DP-initialization falls within a local ball surrounding $M$. We also propose a differentially private algorithm for estimating $M$ based on Riemannian optimization (DP-RGrad), which achieves a near-optimal convergence rate with the DP-initialization and a sample size of $n \geq \widetilde O(r (d_1 + d_2))$. Finally, the paper discusses the non-trivial gap between the minimax lower bound and the upper bound for low-rank matrix estimation under the trace regression model. It is shown that the estimator given by DP-RGrad attains the optimal convergence rate in a weaker notion of differential privacy. Our powerful technique for analyzing the sensitivity of the initialization requires no eigengap condition between the $r$ non-zero singular values.
- Published
- 2024
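For reference, the trace regression model with Gaussian measurement matrices referred to above reads (standard formulation, our notation):

$$y_i = \langle X_i, M \rangle + \xi_i = \mathrm{tr}\big(X_i^{\top} M\big) + \xi_i, \qquad i = 1, \dots, n,$$

where each $X_i \in \mathbb{R}^{d_1 \times d_2}$ has i.i.d. Gaussian entries, $\xi_i$ is observation noise, and the goal is to estimate the rank-$r$ matrix $M$ from the pairs $(X_i, y_i)$ subject to the differential privacy constraint.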
43. Boarding for ISS: Imbalanced Self-Supervised: Discovery of a Scaled Autoencoder for Mixed Tabular Datasets
- Author
-
Stocksieker, Samuel, Pommeret, Denys, and Charpentier, Arthur
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
The field of imbalanced self-supervised learning, especially in the context of tabular data, has not been extensively studied; existing research has predominantly focused on image datasets. This paper aims to fill this gap by examining the specific challenges posed by data imbalance in self-supervised learning on tabular data, with a primary focus on autoencoders. Autoencoders are widely employed for learning and constructing new representations of a dataset, particularly for dimensionality reduction, and are also often used for generative model learning, as in variational autoencoders. When dealing with mixed tabular data, qualitative variables are often encoded using a one-hot encoder with a standard loss function (MSE or cross-entropy). In this paper, we analyze the drawbacks of this approach, especially when the categorical variables are imbalanced. We propose a novel metric to balance learning: a Multi-Supervised Balanced MSE; a plausible sketch of such a loss follows this entry. This approach reduces the reconstruction error by balancing the influence of variables. Finally, we empirically demonstrate that this new metric, compared to the standard MSE, (i) outperforms it when the dataset is imbalanced, especially when the learning process is insufficient, and (ii) provides similar results in the opposite case.
- Published
- 2024
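The abstract does not spell out the Multi-Supervised Balanced MSE; a plausible instantiation of "balancing the influence of variables" is to reweight the reconstruction error of each one-hot category inversely to its empirical frequency, a sketch under that assumption rather than the paper's exact metric:

    import torch

    def balanced_mse(x_hat, x_onehot, eps=1e-6):
        # x_hat, x_onehot: (batch, n_categories) for one categorical variable.
        # Rare categories receive larger weights, so an imbalanced variable
        # still contributes a balanced gradient signal to the autoencoder.
        freq = x_onehot.mean(dim=0)                  # empirical category frequencies
        w = 1.0 / (freq + eps)
        w = w / w.sum() * x_onehot.shape[1]          # normalize to mean weight 1
        return ((x_hat - x_onehot) ** 2 * w).mean()

Summing such terms over the categorical blocks, plus a plain MSE over the continuous columns, yields one candidate "multi-supervised" reconstruction loss.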
44. Automatic Outlier Rectification via Optimal Transport
- Author
-
Blanchet, Jose, Li, Jiajin, Pelger, Markus, and Zanotti, Greg
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning ,Mathematics - Optimization and Control ,Statistics - Methodology - Abstract
In this paper, we propose a novel conceptual framework for detecting outliers using optimal transport with a concave cost function. Conventional outlier detection approaches typically use a two-stage procedure: first, outliers are detected and removed, and then estimation is performed on the cleaned data. However, this approach does not inform outlier removal with the estimation task, leaving room for improvement. To address this limitation, we propose an automatic outlier rectification mechanism that integrates rectification and estimation within a joint optimization framework; the resulting min-min formulation is sketched after this entry. We take the first step of utilizing an optimal transport distance with a concave cost function to construct a rectification set in the space of probability distributions. Then, we select the best distribution within the rectification set to perform the estimation task. Notably, the concave cost function introduced in this paper is the key to making our estimator effectively identify outliers during the optimization process. We discuss the fundamental differences between our estimator and the optimal-transport-based distributionally robust optimization estimator. Finally, we demonstrate the effectiveness and superiority of our approach over conventional approaches in extensive simulation and empirical analyses for mean estimation, least absolute regression, and the fitting of option implied volatility surfaces.
- Published
- 2024
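The joint rectification-estimation problem described above can be written as a min-min program over an optimal transport ball with concave ground cost $c$ (our notation; $\delta$ is the rectification budget and $P_n$ the empirical distribution):

$$\min_{\theta} \; \min_{Q:\, \mathrm{OT}_c(P_n, Q) \le \delta} \; \mathbb{E}_{Q}\big[\ell(\theta; Z)\big],$$

in contrast to OT-based DRO, which takes a supremum over the same ball. Because a concave $c$ makes long-distance transport comparatively cheap, the inner minimization prefers to relocate a few far-away outliers rather than perturb all inliers slightly, which is what rectifies the sample during optimization.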
45. A Probabilistic Approach for Alignment with Human Comparisons
- Author
-
Cao, Junyu and Bayati, Mohsen
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
A growing trend involves integrating human knowledge into learning frameworks, leveraging subtle human feedback to refine AI models. Despite these advances, no comprehensive theoretical framework has been developed that describes the specific conditions under which human comparisons improve the traditional supervised fine-tuning process. To bridge this gap, this paper studies the effective use of human comparisons to address limitations arising from noisy data and high-dimensional models. We propose a two-stage "Supervised Fine Tuning+Human Comparison" (SFT+HC) framework connecting machine learning with human feedback through a probabilistic bisection approach; the bisection primitive is sketched after this entry. The two-stage framework first learns low-dimensional representations from noisily labeled data via an SFT procedure, and then uses human comparisons to improve the model alignment. To examine the efficacy of the alignment phase, we introduce a novel concept termed the "label-noise-to-comparison-accuracy" (LNCA) ratio. This paper theoretically identifies the conditions under which the "SFT+HC" framework outperforms the pure SFT approach, leveraging this ratio to highlight the advantage of incorporating human evaluators in reducing sample complexity. We validate that the proposed conditions for the LNCA ratio are met in a case study conducted via an Amazon Mechanical Turk experiment.
- Published
- 2024
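The probabilistic bisection primitive behind the alignment phase can be sketched as follows, a classic Horstein-style scheme on a grid; this is a generic illustration, not the paper's procedure, and the comparison oracle and its accuracy q are assumptions:

    import numpy as np

    def probabilistic_bisection(compare, q=0.7, n_queries=50, grid_size=10_000):
        # Locate a scalar target in [0, 1] from noisy comparisons. compare(x)
        # returns +1 if the target is (reportedly) above x, else -1, and is
        # correct with probability q > 1/2. The posterior is kept on a grid
        # and each query is placed at its median.
        grid = np.linspace(0.0, 1.0, grid_size)
        post = np.full(grid_size, 1.0 / grid_size)
        for _ in range(n_queries):
            median = grid[np.searchsorted(np.cumsum(post), 0.5)]
            if compare(median) > 0:                  # mass shifts upward
                post = np.where(grid > median, 2 * q * post, 2 * (1 - q) * post)
            else:                                    # mass shifts downward
                post = np.where(grid > median, 2 * (1 - q) * post, 2 * q * post)
            post /= post.sum()                       # re-normalize on the grid
        return grid[np.argmax(post)]

    # Noisy oracle for a target at 0.3 that answers correctly 70% of the time.
    rng = np.random.default_rng(0)
    oracle = lambda x: (1 if 0.3 > x else -1) * (1 if rng.random() < 0.7 else -1)
    print(probabilistic_bisection(oracle))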
46. Conformal Predictions for Probabilistically Robust Scalable Machine Learning Classification
- Author
-
Carlevaro, Alberto, Cantarero, Teodoro Alamo, Dabbene, Fabrizio, and Mongelli, Maurizio
- Subjects
Statistics - Machine Learning ,Computer Science - Cryptography and Security ,Computer Science - Machine Learning ,68T37 (Primary) ,I.5.3 - Abstract
Conformal predictions make it possible to define reliable and robust learning algorithms, but they are essentially a method for evaluating whether an algorithm is good enough to be used in practice. To define a reliable learning framework for classification from the very beginning of its design, the concept of a scalable classifier was introduced to generalize the classical classifier by linking it to statistical order theory and probabilistic learning theory. In this paper, we analyze the similarities between scalable classifiers and conformal predictions by introducing a new definition of a score function and defining a special set of input variables, the conformal safety set, which can identify patterns in the input space that satisfy the error coverage guarantee, i.e., the probability of observing the wrong (possibly unsafe) label for points belonging to this set is bounded by a predefined $\varepsilon$ error level; a minimal split-conformal sketch follows this entry. We demonstrate the practical implications of this framework through an application in cybersecurity for identifying DNS tunneling attacks. Our work contributes to the development of probabilistically robust and reliable machine learning models., Comment: 19 pages, 6 figures, journal paper
- Published
- 2024
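For readers new to the conformal side of the construction above, the standard split-conformal recipe with a $1-\varepsilon$ marginal coverage guarantee is as follows; the paper's conformal safety set goes further by carving out a region of the input space, which this sketch does not do:

    import numpy as np

    def conformal_sets(probs_cal, y_cal, probs_test, eps=0.1):
        # probs_cal:  (n, K) classifier probabilities on calibration data
        # y_cal:      (n,)   true calibration labels
        # probs_test: (m, K) probabilities on test points
        # Returns a boolean (m, K) mask; each row's label set contains the
        # true label with probability >= 1 - eps under exchangeability.
        n = len(y_cal)
        scores = 1.0 - probs_cal[np.arange(n), y_cal]    # nonconformity scores
        k = int(np.ceil((n + 1) * (1 - eps)))            # finite-sample index
        qhat = np.inf if k > n else np.sort(scores)[k - 1]
        return (1.0 - probs_test) <= qhat                # labels kept in each set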
47. Recursive Causal Discovery
- Author
-
Mokhtarian, Ehsan, Elahi, Sepehr, Akbari, Sina, and Kiyavash, Negar
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Causal discovery, i.e., learning the causal graph from data, is often the first step toward the identification and estimation of causal effects, a key requirement in numerous scientific domains. Causal discovery is hampered by two main challenges: limited data, which results in errors in statistical testing, and the daunting computational complexity of the learning task. This paper builds upon and extends four of our prior publications (Mokhtarian et al., 2021; Akbari et al., 2021; Mokhtarian et al., 2022, 2023a). These works introduced the concept of removable variables, which are the only variables that can be removed recursively for the purpose of causal discovery. The presence and identification of removable variables allow recursive approaches to causal discovery, a promising solution that helps to address the aforementioned challenges by successively reducing the problem size; a schematic of the recursive loop follows this entry. This reduction not only minimizes the conditioning sets in each conditional independence (CI) test, leading to fewer errors, but also significantly decreases the number of required CI tests. The worst-case performance of these methods nearly matches the lower bound. In this paper, we present a unified framework for the proposed algorithms, refined with additional details and enhancements for a coherent presentation. A comprehensive literature review is also included, comparing the computational complexity of our methods with that of existing approaches and showcasing their state-of-the-art efficiency. Another contribution of this paper is the release of RCD, a Python package that efficiently implements these algorithms. The package is designed for practitioners and researchers interested in applying these methods in practical scenarios and is available at github.com/ban-epfl/rcd, with comprehensive documentation provided at rcdpackage.com., Comment: 50 pages, 5 tables, 11 algorithms, 5 figures
- Published
- 2024
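Schematically, the recursive strategy these works share looks like the loop below. This is illustrative pseudocode only: is_removable, ci_test, and the small-conditioning-set search are hypothetical stand-ins, and the actual removability criteria and orientation rules live in the RCD package:

    from itertools import combinations
    import networkx as nx

    def small_subsets(nodes, v, u, max_size=2):
        rest = [w for w in nodes if w not in (v, u)]
        for k in range(max_size + 1):
            yield from (set(s) for s in combinations(rest, k))

    def recursive_discovery(variables, ci_test, is_removable):
        # Repeatedly find a removable variable, learn its neighborhood with
        # small conditioning sets, then delete it and recurse on the smaller
        # problem; ci_test(x, y, S) -> bool tests X independent of Y given S.
        g = nx.Graph()
        g.add_nodes_from(variables)
        remaining = list(variables)
        while len(remaining) > 1:
            v = next(x for x in remaining if is_removable(x, remaining, ci_test))
            for u in remaining:
                if u != v and not any(
                    ci_test(v, u, s) for s in small_subsets(remaining, v, u)
                ):
                    g.add_edge(v, u)            # u survives all small CI tests
            remaining.remove(v)                 # shrink the problem and repeat
        return g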
48. On the Convergence of Locally Adaptive and Scalable Diffusion-Based Sampling Methods for Deep Bayesian Neural Network Posteriors
- Author
-
Rensmeyer, Tim and Niggemann, Oliver
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Achieving robust uncertainty quantification for deep neural networks is an important requirement in many real-world applications of deep learning, such as medical imaging, where it is necessary to assess the reliability of a neural network's predictions. Bayesian neural networks are a promising approach for modeling uncertainties in deep neural networks. Unfortunately, generating samples from the posterior distribution of neural networks is a major challenge. One significant advance in that direction would be the incorporation of adaptive step sizes, similar to those in modern neural network optimizers, into Markov chain Monte Carlo sampling algorithms without significantly increasing computational demand. In recent years, several papers have introduced sampling algorithms claiming to achieve this property. But do they indeed converge to the correct distribution? In this paper, we demonstrate that these methods can have a substantial bias in the distribution they sample, even in the limit of vanishing step sizes and at full batch size; the update rule at issue and its known correction term are recalled after this entry.
- Published
- 2024
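For context on why such bias arises: an adaptive-step-size (preconditioned) Langevin update takes the form below, and the complete-recipe analysis of stochastic-gradient MCMC shows that the correction term $\Gamma$ is needed for the chain to target the posterior. This is the standard background result, summarized in our notation, not the paper's specific analysis:

$$\theta_{t+1} = \theta_t - \eta\, G(\theta_t)\, \nabla U(\theta_t) + \eta\, \Gamma(\theta_t) + \sqrt{2\eta}\; G(\theta_t)^{1/2}\, \xi_t, \qquad \Gamma_i(\theta) = \sum_{j} \frac{\partial G_{ij}(\theta)}{\partial \theta_j},$$

where $U$ is the negative log-posterior, $G$ the adaptive preconditioner, and $\xi_t \sim \mathcal{N}(0, I)$. RMSprop-style samplers often drop $\Gamma$ because it is expensive to compute, and it is exactly this kind of shortcut whose residual bias the paper examines, even in the vanishing-step-size, full-batch limit.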
49. Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations
- Author
-
Kumar, Akshay and Haupt, Jarvis
- Subjects
Computer Science - Machine Learning ,Mathematics - Optimization and Control ,Statistics - Machine Learning - Abstract
This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks starting from small initializations. The present work considers neural networks that are assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. It demonstrates that for sufficiently small initializations, during the early stages of training, the weights of the neural network remain small in norm and approximately converge in direction toward the Karush-Kuhn-Tucker (KKT) points of the neural correlation function introduced in [1]. Additionally, for square loss and under a separability assumption on the weights of the neural network, a similar directional convergence of the gradient flow dynamics is shown near certain saddle points of the loss function.
- Published
- 2024
50. On the Approximation of Kernel Functions
- Author
-
Dommel, Paul and Pichler, Alois
- Subjects
Statistics - Machine Learning ,Computer Science - Machine Learning - Abstract
Various methods in statistical learning build on kernels considered in reproducing kernel Hilbert spaces. In applications, the kernel is often selected based on characteristics of the problem and the data, and is then employed to infer response variables at points where no explanatory data were observed. The data considered here are located in compact sets in higher dimensions, and the paper addresses approximations of the kernel itself. The new approach considers Taylor series approximations of radial kernel functions; the expansion underlying the approach is recalled after this entry. For the Gauss kernel on the unit cube, the paper establishes an upper bound on the associated eigenfunctions, which grows only polynomially with respect to the index. The novel approach substantiates smaller regularization parameters than considered in the literature, overall leading to better approximations. This improvement confirms low-rank approximation methods such as the Nystr\"om method.
- Published
- 2024
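The Taylor-series device for the Gauss kernel mentioned above rests on a standard factorization, stated here in unit-bandwidth form:

$$e^{-\|x-y\|^{2}} = e^{-\|x\|^{2}}\, e^{-\|y\|^{2}}\, e^{2 x^{\top} y} = e^{-\|x\|^{2}}\, e^{-\|y\|^{2}} \sum_{k=0}^{\infty} \frac{(2\, x^{\top} y)^{k}}{k!},$$

so truncating the series yields a finite-rank kernel built from polynomial features; on a compact set such as the unit cube the truncation error decays rapidly, and the paper's polynomially growing eigenfunction bounds quantify how this supports smaller regularization parameters and better low-rank (e.g., Nyström) approximations.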