243,023 results for "Vardi A."
Search Results
2. CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering
- Author
- Vardi, Ben, Nir, Oron, and Shamir, Ariel
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent Vision-Language Models (VLMs) have demonstrated remarkable capabilities in visual understanding and reasoning, and in particular on multiple-choice Visual Question Answering (VQA). Still, these models can make distinctly unnatural errors, for example, providing (wrong) answers to unanswerable VQA questions, such as questions asking about objects that do not appear in the image. To address this issue, we propose CLIP-UP: CLIP-based Unanswerable Problem detection, a novel lightweight method for equipping VLMs with the ability to withhold answers to unanswerable questions. By leveraging CLIP to extract question-image alignment information, CLIP-UP requires only efficient training of a few additional layers, while keeping the original VLMs' weights unchanged. Tested across LLaVA models, CLIP-UP achieves state-of-the-art results on the MM-UPD benchmark for assessing unanswerability in multiple-choice VQA, while preserving the original performance on other tasks.
- Published
- 2025
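The abstract above describes training only a few added layers on top of frozen CLIP features. As a rough illustration of that kind of lightweight head (not the authors' code; the feature construction, names, and dimensions here are assumptions), a PyTorch sketch might look like this:

```python
import torch
import torch.nn as nn

class UnanswerableHead(nn.Module):
    """Tiny classifier over precomputed CLIP embeddings (illustrative only)."""
    def __init__(self, clip_dim: int = 512, hidden: int = 256):
        super().__init__()
        # Features: image embedding, question embedding, and their elementwise product.
        self.mlp = nn.Sequential(
            nn.Linear(3 * clip_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # logit for "unanswerable"
        )

    def forward(self, img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
        img = nn.functional.normalize(img_emb, dim=-1)
        txt = nn.functional.normalize(txt_emb, dim=-1)
        feats = torch.cat([img, txt, img * txt], dim=-1)
        return self.mlp(feats).squeeze(-1)

# Toy usage with random stand-ins for CLIP image/question embeddings.
head = UnanswerableHead()
img_emb, txt_emb = torch.randn(8, 512), torch.randn(8, 512)
labels = torch.randint(0, 2, (8,)).float()  # 1 = unanswerable
loss = nn.functional.binary_cross_entropy_with_logits(head(img_emb, txt_emb), labels)
loss.backward()  # only the small head receives gradients; the VLM itself stays frozen
```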
3. Falsification of Autonomous Systems in Rich Environments
- Author
- Elimelech, Khen, Lahijanian, Morteza, Kavraki, Lydia E., and Vardi, Moshe Y.
- Subjects
- Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control
- Abstract
Validating the behavior of autonomous Cyber-Physical Systems (CPS) and Artificial Intelligence (AI) agents, which rely on automated controllers, is an objective of great importance. In recent years, Neural-Network (NN) controllers have been demonstrating great promise. Unfortunately, such learned controllers are often not certified and can cause the system to suffer from unpredictable or unsafe behavior. To mitigate this issue, a great effort has been dedicated to automated verification of systems. Specifically, works in the category of ``black-box testing'' rely on repeated system simulations to find a falsifying counterexample of a system run that violates a specification. As running high-fidelity simulations is computationally demanding, the goal of falsification approaches is to minimize the simulation effort (NN inference queries) needed to return a falsifying example. This often proves to be a great challenge, especially when the tested controller is well-trained. This work contributes a novel falsification approach for autonomous systems under formal specification operating in uncertain environments. We are especially interested in CPS operating in rich, semantically-defined, open environments, which yield high-dimensional, simulation-dependent sensor observations. Our approach introduces a novel reformulation of the falsification problem as the problem of planning a trajectory for a ``meta-system,'' which wraps and encapsulates the examined system; we call this approach: meta-planning. This formulation can be solved with standard sampling-based motion-planning techniques (like RRT) and can gradually integrate domain knowledge to improve the search. We support the suggested approach with an experimental study on falsification of an obstacle-avoiding autonomous car with a NN controller, where meta-planning demonstrates superior performance over alternative approaches.
- Published
- 2024
4. LTLf Synthesis Under Unreliable Input
- Author
- Hagemeier, Christian, de Giacomo, Giuseppe, and Vardi, Moshe Y.
- Subjects
- Computer Science - Artificial Intelligence, Computer Science - Logic in Computer Science
- Abstract
We study the problem of realizing strategies for an LTLf goal specification while ensuring that at least an LTLf backup specification is satisfied in case of unreliability of certain input variables. We formally define the problem and characterize its worst-case complexity as 2EXPTIME-complete, like standard LTLf synthesis. Then we devise three different solution techniques: one based on direct automata manipulation, which is 2EXPTIME, one disregarding unreliable input variables by adopting a belief construction, which is 3EXPTIME, and one leveraging second-order quantified LTLf (QLTLf), which is 2EXPTIME and allows for a direct encoding into monadic second-order logic, which in turn is worst-case nonelementary. We prove their correctness and evaluate them against each other empirically. Interestingly, theoretical worst-case bounds do not translate into observed performance; the MSO technique performs best, followed by belief construction and direct automata manipulation. As a byproduct of our study, we provide a general synthesis procedure for arbitrary QLTLf specifications., Comment: 8 pages, to appear at AAAI2025
- Published
- 2024
5. Generating medical screening questionnaires through analysis of social media data
- Author
- Ashkenazi, Ortal, Yom-Tov, Elad, and David, Liron Vardi
- Subjects
- Computer Science - Machine Learning, Computer Science - Computers and Society
- Abstract
Screening questionnaires are used in medicine as a diagnostic aid. Creating them is a long and expensive process, which could potentially be improved through analysis of social media posts related to symptoms and behaviors prior to diagnosis. Here we show a preliminary investigation into the feasibility of generating screening questionnaires for a given medical condition from social media postings. The method first identifies a cohort of relevant users through their posts in dedicated patient groups and a control group of users who reported similar symptoms but did not report being diagnosed with the condition of interest. Posts made prior to diagnosis are used to generate decision rules to differentiate between the different groups, by clustering symptoms mentioned by these users and training a decision tree to differentiate between the two groups. We validate the generated rules by correlating them with scores given by medical doctors to matching hypothetical cases. We demonstrate the proposed method by creating questionnaires for three conditions (endometriosis, lupus, and gout) using the data of several hundreds of users from Reddit. These questionnaires were then validated by medical doctors. The average Pearson's correlation between the latter's scores and the decision rules were 0.58 (endometriosis), 0.40 (lupus) and 0.27 (gout). Our results suggest that the process of questionnaire generation can be, at least partly, automated. These questionnaires are advantageous in that they are based on real-world experience but are currently lacking in their ability to capture the context, duration, and timing of symptoms.
- Published
- 2024
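The pipeline sketched in this abstract ends with a decision tree separating diagnosed users from controls. A minimal, hypothetical illustration of that final step with scikit-learn (synthetic stand-in data; the feature names and label rule are invented for the example):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: rows are users, columns are binary "mentioned symptom cluster" flags.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 5))   # 5 hypothetical symptom clusters
y = X[:, 0] & X[:, 2]                   # toy rule standing in for "diagnosed vs. control"

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# The learned splits play the role of candidate questionnaire rules.
print(export_text(tree, feature_names=[f"cluster_{i}" for i in range(5)]))
```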
6. LTLf+ and PPLTL+: Extending LTLf and PPLTL to Infinite Traces
- Author
- Aminof, Benjamin, De Giacomo, Giuseppe, Rubin, Sasha, and Vardi, Moshe Y.
- Subjects
- Computer Science - Logic in Computer Science, Computer Science - Artificial Intelligence, Computer Science - Formal Languages and Automata Theory
- Abstract
We introduce LTLf+ and PPLTL+, two logics to express properties of infinite traces, that are based on the linear-time temporal logics LTLf and PPLTL on finite traces. LTLf+/PPLTL+ use levels of Manna and Pnueli's LTL safety-progress hierarchy, and thus have the same expressive power as LTL. However, they also retain a crucial characteristic of the reactive synthesis problem for the base logics: the game arena for strategy extraction can be derived from deterministic finite automata (DFA). Consequently, these logics circumvent the notorious difficulties associated with determinizing infinite trace automata, typical of LTL reactive synthesis. We present DFA-based synthesis techniques for LTLf+/PPLTL+, and show that synthesis is 2EXPTIME-complete for LTLf+ (matching LTLf) and EXPTIME-complete for PPLTL+ (matching PPLTL). Notably, while PPLTL+ retains the full expressive power of LTL, reactive synthesis is EXPTIME-complete instead of 2EXPTIME-complete. The techniques are also adapted to optimally solve satisfiability, validity, and model-checking, to get EXPSPACE-complete for LTLf+ (extending a recent result for the guarantee level using LTLf), and PSPACE-complete for PPLTL+.
- Published
- 2024
7. Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks
- Author
- Tsilivis, Nikolaos, Vardi, Gal, and Kempe, Julia
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We study the implicit bias of the general family of steepest descent algorithms, which includes gradient descent, sign descent and coordinate descent, in deep homogeneous neural networks. We prove that an algorithm-dependent geometric margin starts increasing once the networks reach perfect training accuracy and characterize the late-stage bias of the algorithms. In particular, we define a generalized notion of stationarity for optimization problems and show that the algorithms progressively reduce a (generalized) Bregman divergence, which quantifies proximity to such stationary points of a margin-maximization problem. We then experimentally zoom into the trajectories of neural networks optimized with various steepest descent algorithms, highlighting connections to the implicit bias of Adam.
- Published
- 2024
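For readers who want the definition behind "the general family of steepest descent algorithms": the standard textbook update with respect to a norm ‖·‖ is shown below; this is background, not a result of the paper.

```latex
\[
  w_{t+1} \;=\; w_t \;+\; \eta_t\, \Delta_t,
  \qquad
  \Delta_t \;\in\; \operatorname*{arg\,min}_{\|v\| \le 1} \;\langle \nabla L(w_t),\, v \rangle .
\]
% The \ell_2 ball recovers (normalized) gradient descent, the \ell_\infty ball recovers
% sign descent, and the \ell_1 ball recovers greedy coordinate descent.
```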
8. Provable Tempered Overfitting of Minimal Nets and Typical Nets
- Author
- Harel, Itamar, Hoza, William M., Vardi, Gal, Evron, Itay, Srebro, Nathan, and Soudry, Daniel
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We study the overfitting behavior of fully connected deep Neural Networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weights) and a random interpolating NN. For both learning rules, we prove overfitting is tempered. Our analysis rests on a new bound on the size of a threshold circuit consistent with a partial function. To the best of our knowledge, ours are the first theoretical results on benign or tempered overfitting that: (1) apply to deep NNs, and (2) do not require a very high or very low input dimension., Comment: 60 pages, 4 figures
- Published
- 2024
9. Benign Overfitting in Single-Head Attention
- Author
- Magen, Roey, Shang, Shuning, Xu, Zhiwei, Frei, Spencer, Hu, Wei, and Vardi, Gal
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
The phenomenon of benign overfitting, where a trained neural network perfectly fits noisy training data but still achieves near-optimal test performance, has been extensively studied in recent years for linear models and fully-connected/convolutional networks. In this work, we study benign overfitting in a single-head softmax attention model, which is the fundamental building block of Transformers. We prove that under appropriate conditions, the model exhibits benign overfitting in a classification setting already after two steps of gradient descent. Moreover, we show conditions where a minimum-norm/maximum-margin interpolator exhibits benign overfitting. We study how the overfitting behavior depends on the signal-to-noise ratio (SNR) of the data distribution, namely, the ratio between norms of signal and noise tokens, and prove that a sufficiently large SNR is both necessary and sufficient for benign overfitting.
- Published
- 2024
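To fix ideas, a single-head softmax attention classifier of the kind studied here can be written in a few lines; the sketch below is a generic toy model with assumed dimensions and pooling, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class SingleHeadAttentionClassifier(nn.Module):
    """Toy single-head softmax attention followed by a linear readout."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)   # query projection
        self.k = nn.Linear(dim, dim, bias=False)   # key projection
        self.v = nn.Linear(dim, dim, bias=False)   # value projection
        self.readout = nn.Linear(dim, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        scores = self.q(x) @ self.k(x).transpose(-2, -1) / x.shape[-1] ** 0.5
        attn = torch.softmax(scores, dim=-1)       # softmax attention weights
        pooled = (attn @ self.v(x)).mean(dim=1)    # pool the attended token values
        return self.readout(pooled).squeeze(-1)    # classification logit

model = SingleHeadAttentionClassifier()
tokens = torch.randn(4, 10, 16)                    # 4 sequences of 10 tokens
print(model(tokens).shape)                         # torch.Size([4])
```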
10. Provable Privacy Attacks on Trained Shallow Neural Networks
- Author
- Smorodinsky, Guy, Vardi, Gal, and Safran, Itay
- Subjects
- Computer Science - Machine Learning, Computer Science - Cryptography and Security
- Abstract
We study what provable privacy attacks can be shown on trained, 2-layer ReLU neural networks. We explore two types of attacks; data reconstruction attacks, and membership inference attacks. We prove that theoretical results on the implicit bias of 2-layer neural networks can be used to provably reconstruct a set of which at least a constant fraction are training points in a univariate setting, and can also be used to identify with high probability whether a given point was used in the training set in a high dimensional setting. To the best of our knowledge, our work is the first to show provable vulnerabilities in this setting.
- Published
- 2024
11. Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context
- Author
- Frei, Spencer and Vardi, Gal
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the forward pass of the transformer can produce predictions for that unlabeled test example. A line of recent work has shown that when linear transformers are pre-trained on random instances for linear regression tasks, these trained transformers make predictions using an algorithm similar to that of ordinary least squares. In this work, we investigate the behavior of linear transformers trained on random linear classification tasks. Via an analysis of the implicit regularization of gradient descent, we characterize how many pre-training tasks and in-context examples are needed for the trained transformer to generalize well at test-time. We further show that in some settings, these trained transformers can exhibit "benign overfitting in-context": when in-context examples are corrupted by label flipping noise, the transformer memorizes all of its in-context examples (including those with noisy labels) yet still generalizes near-optimally for clean test examples., Comment: 36 pages; added experiments
- Published
- 2024
12. Encoding Reusable Multi-Robot Planning Strategies as Abstract Hypergraphs
- Author
- Elimelech, Khen, Motes, James, Morales, Marco, Amato, Nancy M., Vardi, Moshe Y., and Kavraki, Lydia E.
- Subjects
- Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
- Abstract
Multi-Robot Task Planning (MR-TP) is the search for a discrete-action plan a team of robots should take to complete a task. The complexity of such problems scales exponentially with the number of robots and task complexity, making them challenging for online solution. To accelerate MR-TP over a system's lifetime, this work looks at combining two recent advances: (i) Decomposable State Space Hypergraph (DaSH), a novel hypergraph-based framework to efficiently model and solve MR-TP problems; and (ii) learning-by-abstraction, a technique that enables automatic extraction of generalizable planning strategies from individual planning experiences for later reuse. Specifically, we wish to extend this strategy-learning technique, originally designed for single-robot planning, to benefit multi-robot planning using hypergraph-based MR-TP.
- Published
- 2024
13. Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality
- Author
- Medvedev, Marko, Vardi, Gal, and Srebro, Nathan
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We consider the overfitting behavior of minimum norm interpolating solutions of Gaussian kernel ridge regression (i.e. kernel ridgeless regression), when the bandwidth or input dimension varies with the sample size. For fixed dimensions, we show that even with varying or tuned bandwidth, the ridgeless solution is never consistent and, at least with large enough noise, always worse than the null predictor. For increasing dimension, we give a generic characterization of the overfitting behavior for any scaling of the dimension with sample size. We use this to provide the first example of benign overfitting using the Gaussian kernel with sub-polynomial scaling dimension. All our results are under the Gaussian universality ansatz and the (non-rigorous) risk predictions in terms of the kernel eigenstructure.
- Published
- 2024
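Operationally, "Gaussian kernel ridgeless regression" means fitting the minimum-norm interpolant by solving the kernel system with a vanishing ridge. A minimal NumPy sketch (the bandwidth, data, and tiny jitter term are illustrative choices, not the paper's setup):

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    """k(x, x') = exp(-||x - x'||^2 / (2 * bandwidth**2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 3))                        # toy inputs
y = np.sin(X.sum(axis=1)) + 0.1 * rng.standard_normal(50)   # noisy targets

bandwidth = 0.5
K = gaussian_kernel(X, X, bandwidth)
alpha = np.linalg.solve(K + 1e-10 * np.eye(len(X)), y)      # "ridgeless": vanishing regularization

X_test = rng.uniform(-1, 1, size=(5, 3))
y_hat = gaussian_kernel(X_test, X, bandwidth) @ alpha       # interpolating predictor
print(y_hat)
```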
14. Many-body adiabatic passage: Instability, chaos, and quantum classical correspondence
- Author
- Varma, Anant Vijay, Vardi, Amichay, and Cohen, Doron
- Subjects
- Quantum Physics, Condensed Matter - Quantum Gases, Nonlinear Sciences - Chaotic Dynamics
- Abstract
Adiabatic passage in systems of interacting bosons is substantially affected by interactions and inter-particle entanglement. We consider STIRAP-like schemes in Bose-Hubbard chains that exhibit low-dimensional chaos (a 3 site chain), and high-dimensional chaos (more than 3 sites). The dynamics that is generated by a transfer protocol exhibits striking classical and quantum chaos fingerprints that are manifest in the mean-field classical treatment, in the truncated-Wigner semiclassical treatment, and in the full many-body quantum simulations., Comment: 14 pages, 14 figures, including SM
- Published
- 2024
15. Non-Conventional Thermal States of Interacting Bosonic Oligomers
- Author
- Vardi, Amichay, Ramos, Alba, and Kottos, Tsampikos
- Subjects
- Physics - Optics
- Abstract
There has recently been a growing effort to understand in a comprehensive manner the physics and intricate dynamics of many-body and many-state (multimode) interacting bosonic systems. For instance, in photonics, nonlinear multimode fibers are nowadays intensely investigated due to their promise for ultra-high-bandwidth and high-power capabilities. Similar prospects are pursued in connection with magnon Bose-Einstein condensates, and ultra-cold atoms in periodic lattices for room-temperature quantum devices and quantum computation respectively. While it is practically impossible to monitor the phase space of such complex systems (classically or quantum mechanically), thermodynamics, has succeeded to predict their thermal state: the Rayleigh-Jeans (RJ) distribution for classical fields and the Bose-Einstein (BE) distribution for quantum systems. These distributions are monotonic and promote either the ground state or the most excited mode. Here, we demonstrate the possibility to advance the participation of other modes in the thermal state of bosonic oligomers. The resulting non-monotonic modal occupancies are described by a microcanonical treatment while they deviate drastically from the RJ/BE predictions of canonical and grand-canonical ensembles. Our results provide a paradigm of ensemble equivalence violation and can be used for designing the shape of thermal states., Comment: 17 pages, 11 figures
- Published
- 2024
16. Approaching Deep Learning through the Spectral Dynamics of Weights
- Author
- Yunis, David, Patel, Kumar Kshitij, Wheeler, Samuel, Savarese, Pedro, Vardi, Gal, Livescu, Karen, Maire, Michael, and Walter, Matthew R.
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
We propose an empirical approach centered on the spectral dynamics of weights -- the behavior of singular values and vectors during optimization -- to unify and clarify several phenomena in deep learning. We identify a consistent bias in optimization across various experiments, from small-scale ``grokking'' to large-scale tasks like image classification with ConvNets, image generation with UNets, speech recognition with LSTMs, and language modeling with Transformers. We also demonstrate that weight decay enhances this bias beyond its role as a norm regularizer, even in practical systems. Moreover, we show that these spectral dynamics distinguish memorizing networks from generalizing ones, offering a novel perspective on this longstanding conundrum. Additionally, we leverage spectral dynamics to explore the emergence of well-performing sparse subnetworks (lottery tickets) and the structure of the loss surface through linear mode connectivity. Our findings suggest that spectral dynamics provide a coherent framework to better understand the behavior of neural networks across diverse settings.
- Published
- 2024
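"Spectral dynamics of weights" here means tracking the singular values (and vectors) of the weight matrices over the course of training. A hedged sketch of the measurement itself (the model and layer names are placeholders, not the authors' setup):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

def spectral_snapshot(model: nn.Module) -> dict[str, torch.Tensor]:
    """Singular values of every 2-D weight matrix in the model."""
    return {
        name: torch.linalg.svdvals(param.detach())
        for name, param in model.named_parameters()
        if param.ndim == 2
    }

# Logging this snapshot every few optimization steps shows how the spectrum evolves.
for name, svals in spectral_snapshot(model).items():
    print(name, svals[:3])   # a few leading singular values per layer
```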
17. On-the-fly Synthesis for LTL over Finite Traces: An Efficient Approach that Counts
- Author
- Xiao, Shengping, Li, Yongkang, Zhu, Shufang, Sun, Jun, Li, Jianwen, Pu, Geguang, and Vardi, Moshe Y.
- Subjects
- Computer Science - Artificial Intelligence, Computer Science - Logic in Computer Science
- Abstract
We present an on-the-fly synthesis framework for Linear Temporal Logic over finite traces (LTLf) based on top-down deterministic automata construction. Existing approaches rely on constructing a complete Deterministic Finite Automaton (DFA) corresponding to the LTLf specification, a process with doubly exponential complexity relative to the formula size in the worst case. In this case, the synthesis procedure cannot be conducted until the entire DFA is constructed. This inefficiency is the main bottleneck of existing approaches. To address this challenge, we first present a method for converting LTLf into Transition-based DFA (TDFA) by directly leveraging LTLf semantics, incorporating intermediate results as direct components of the final automaton to enable parallelized synthesis and automata construction. We then explore the relationship between LTLf synthesis and TDFA games and subsequently develop an algorithm for performing LTLf synthesis using on-the-fly TDFA game solving. This algorithm traverses the state space in a global forward manner combined with a local backward method, along with the detection of strongly connected components. Moreover, we introduce two optimization techniques -- model-guided synthesis and state entailment -- to enhance the practical efficiency of our approach. Experimental results demonstrate that our on-the-fly approach achieves the best performance on the tested benchmarks and effectively complements existing tools and approaches., Comment: 32 pages, 3 figures, 3 tables
- Published
- 2024
18. Reconstructing Training Data From Real World Models Trained with Transfer Learning
- Author
- Oz, Yakir, Yehudai, Gilad, Vardi, Gal, Antebi, Itai, Irani, Michal, and Haim, Niv
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Current methods for reconstructing training data from trained classifiers are restricted to very small models, limited training set sizes, and low-resolution images. Such restrictions hinder their applicability to real-world scenarios. In this paper, we present a novel approach enabling data reconstruction in realistic settings for models trained on high-resolution images. Our method adapts the reconstruction scheme of arXiv:2206.07758 to real-world scenarios -- specifically, targeting models trained via transfer learning over image embeddings of large pre-trained models like DINO-ViT and CLIP. Our work employs data reconstruction in the embedding space rather than in the image space, showcasing its applicability beyond visual data. Moreover, we introduce a novel clustering-based method to identify good reconstructions from thousands of candidates. This significantly improves on previous works that relied on knowledge of the training set to identify good reconstructed images. Our findings shed light on a potential privacy risk for data leakage from models trained using transfer learning.
- Published
- 2024
19. A user’s guide to coral reef restoration terminologies
- Author
- Suggett, David J., Goergen, Elizabeth A., Fraser, Megan, Hein, Margaux Y., Hoot, Whitney, McLeod, Ian, Montoya-Maya, Phanor H., Moore, Tom, Ross, Andrew M., and Vardi, Tali
- Published
- 2025
20. Effects of the COVID-19 Lockdown on HbA1c Levels of Ethnic Minorities and Low-income Groups with Type 2 Diabetes in Israel
- Author
- Riklin, Galia, Friger, Michael, Shoham-Vardi, Ilana, Golan, Rachel, and Wainstock, Tamar
- Published
- 2024
21. The critical role of coral reef restoration in a changing world
- Author
- Peixoto, Raquel S., Voolstra, Christian R., Baums, Iliana B., Camp, Emma F., Guest, James, Harrison, Peter L., Montoya-Maya, Phanor H., Pollock, F. Joseph, Smith, David J., Wangpraseurt, Daniel, Banaszak, Anastazia T., Chui, Apple P. Y., Shah, Nirmal, Moore, Tom, Fabricius, Katharina E., Vardi, Tali, and Suggett, David J.
- Published
- 2024
22. MoXI: An Intermediate Language for Symbolic Model Checking
- Author
- Rozier, Kristin Yvonne, Dureja, Rohit, Irfan, Ahmed, Johannsen, Chris, Nukala, Karthik, Shankar, Natarajan, Tinelli, Cesare, Vardi, Moshe Y., Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Neele, Thomas, editor, and Wijs, Anton, editor
- Published
- 2025
23. Dynamic Programming for Symbolic Boolean Realizability and Synthesis
- Author
- Lin, Yi, Tabajara, Lucas M., and Vardi, Moshe Y.
- Subjects
- Computer Science - Formal Languages and Automata Theory, Computer Science - Logic in Computer Science
- Abstract
Inspired by recent progress in dynamic programming approaches for weighted model counting, we investigate a dynamic-programming approach in the context of boolean realizability and synthesis, which takes a conjunctive-normal-form boolean formula over input and output variables, and aims at synthesizing witness functions for the output variables in terms of the inputs. We show how graded project-join trees, obtained via tree decomposition, can be used to compute a BDD representing the realizability set for the input formulas in a bottom-up order. We then show how the intermediate BDDs generated during realizability checking phase can be applied to synthesizing the witness functions in a top-down manner. An experimental evaluation of a solver -- DPSynth -- based on these ideas demonstrates that our approach for Boolean realizabilty and synthesis has superior time and space performance over a heuristics-based approach using same symbolic representations. We discuss the advantage on scalability of the new approach, and also investigate our findings on the performance of the DP framework., Comment: 32 pages including appendices and bibliography, 5 figures, paper is to be published in CAV 2024, but this version is inclusive of the Appendix
- Published
- 2024
24. The Trembling-Hand Problem for LTLf Planning
- Author
- Yu, Pian, Zhu, Shufang, De Giacomo, Giuseppe, Kwiatkowska, Marta, and Vardi, Moshe
- Subjects
- Computer Science - Robotics, Computer Science - Formal Languages and Automata Theory
- Abstract
Consider an agent acting to achieve its temporal goal, but with a "trembling hand". In this case, the agent may mistakenly instruct, with a certain (typically small) probability, actions that are not intended due to faults or imprecision in its action selection mechanism, thereby leading to possible goal failure. We study the trembling-hand problem in the context of reasoning about actions and planning for temporally extended goals expressed in Linear Temporal Logic on finite traces (LTLf), where we want to synthesize a strategy (aka plan) that maximizes the probability of satisfying the LTLf goal in spite of the trembling hand. We consider both deterministic and nondeterministic (adversarial) domains. We propose solution techniques for both cases by relying respectively on Markov Decision Processes and on Markov Decision Processes with Set-valued Transitions with LTLf objectives, where the set-valued probabilistic transitions capture both the nondeterminism from the environment and the possible action instruction errors from the agent. We formally show the correctness of our solution techniques and demonstrate their effectiveness experimentally through a proof-of-concept implementation., Comment: The paper is accepted by IJCAI 2024
- Published
- 2024
25. Characterization of hybrid quantum eigenstates in systems with mixed classical phase space
- Author
- Varma, Anant Vijay, Vardi, Amichay, and Cohen, Doron
- Subjects
- Quantum Physics, Condensed Matter - Quantum Gases
- Abstract
Generic low-dimensional Hamiltonian systems feature a structured, mixed classical phase-space. The traditional Percival classification of quantum spectra into regular states supported by quasi-integrable regions and irregular states supported by quasi-chaotic regions turns out to be insufficient to capture the richness of the Hilbert space. Berry's conjecture and the eigenstate thermalization hypothesis are not applicable and quantum effects such as tunneling, scarring, and localization, do not obey the standard paradigms. We demonstrate these statements for a prototype Bose-Hubbard model. We highlight the hybridization of chaotic and regular regions from opposing perspectives of ergodicity and localization., Comment: 11 pages, 13 figures
- Published
- 2024
26. Stochastic Games for Interactive Manipulation Domains
- Author
- Muvvala, Karan, Wells, Andrew M., Lahijanian, Morteza, Kavraki, Lydia E., and Vardi, Moshe Y.
- Subjects
- Computer Science - Robotics, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems
- Abstract
As robots become more prevalent, the complexity of robot-robot, robot-human, and robot-environment interactions increases. In these interactions, a robot needs to consider not only the effects of its own actions, but also the effects of other agents' actions and the possible interactions between agents. Previous works have considered reactive synthesis, where the human/environment is modeled as a deterministic, adversarial agent; as well as probabilistic synthesis, where the human/environment is modeled via a Markov chain. While they provide strong theoretical frameworks, there are still many aspects of human-robot interaction that cannot be fully expressed and many assumptions that must be made in each model. In this work, we propose stochastic games as a general model for human-robot interaction, which subsumes the expressivity of all previous representations. In addition, it allows us to make fewer modeling assumptions and leads to more natural and powerful models of interaction. We introduce the semantics of this abstraction and show how existing tools can be utilized to synthesize strategies to achieve complex tasks with guarantees. Further, we discuss the current computational limitations and improve the scalability by two orders of magnitude by a new way of constructing models for PRISM-games., Comment: Accepted: ICRA 2024
- Published
- 2024
27. Spatiotemporal patterns of public attention to invasive species across an invasion front: a case study of lionfish (Pterois miles) from the Mediterranean Sea
- Author
- Fazzari, Lara, Vardi, Reut, Jaric, Ivan, Correia, Ricardo A., Coll, Marta, and Sbragaglia, Valerio
- Published
- 2024
28. Hyperelastic models for the human zona pellucida and their implications on shear modulus estimation in the clinical practice
- Author
- Priel, E., Mittelman, B., Efraim, L., Priel, T., Szaingurten-Solodkin, I., and Har-Vardi, I.
- Published
- 2024
29. Cerebral air embolism in pediatric patients undergoing cardiac surgery
- Author
- Hubara, Evyatar, Feingold, Iris Motro, Skourikhin, Yelena, Lerner, Reut Kassif, Vardi, Amir, Mishaly, David, Katz, Uriel, and Bar-Yosef, Omer
- Published
- 2024
30. Neural network analysis for predicting metrics of fragmented laminar artifacts: a case study from MPPNB sites in the Southern Levant
- Author
- Nobile, Eugenio, Troiano, Maurizio, Mangini, Fabio, Mastrogiuseppe, Marco, Vardi, Jacob, Frezza, Fabrizio, Conati Barbaro, Cecilia, and Gopher, Avi
- Published
- 2024
31. Restoration as a meaningful aid to ecological recovery of coral reefs
- Author
- Suggett, David J., Guest, James, Camp, Emma F., Edwards, Alasdair, Goergen, Liz, Hein, Margaux, Humanes, Adriana, Levy, Jessica S., Montoya-Maya, Phanor H., Smith, David J., Vardi, Tali, Winters, R. Scott, and Moore, Tom
- Published
- 2024
32. Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
- Author
- Xu, Zhiwei, Wang, Yutong, Frei, Spencer, Vardi, Gal, and Hu, Wei
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors. First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can sometimes be benign. Second, they can undergo a period of classical, harmful overfitting -- achieving a perfect fit to training data with near-random performance on test data -- before transitioning ("grokking") to near-optimal generalization later in training. In this work, we show that both of these phenomena provably occur in two-layer ReLU networks trained by GD on XOR cluster data where a constant fraction of the training labels are flipped. In this setting, we show that after the first step of GD, the network achieves 100% training accuracy, perfectly fitting the noisy labels in the training data, but achieves near-random test accuracy. At a later training step, the network achieves near-optimal test accuracy while still fitting the random labels in the training data, exhibiting a "grokking" phenomenon. This provides the first theoretical result of benign overfitting in neural network classification when the data distribution is not linearly separable. Our proofs rely on analyzing the feature learning process under GD, which reveals that the network implements a non-generalizable linear classifier after one step and gradually learns generalizable features in later steps.
- Published
- 2023
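The "XOR cluster data where a constant fraction of the training labels are flipped" can be generated in a few lines; the sketch below is one plausible construction with invented cluster means and noise rate, not the paper's exact distribution.

```python
import numpy as np

def xor_cluster_data(n: int, dim: int = 20, flip_rate: float = 0.1, seed: int = 0):
    """Four Gaussian clusters at +/-mu1, +/-mu2 with XOR labels; a fraction of labels flipped."""
    rng = np.random.default_rng(seed)
    mu1, mu2 = np.zeros(dim), np.zeros(dim)
    mu1[0], mu2[1] = 5.0, 5.0                        # two orthogonal cluster directions
    signs = rng.choice([-1, 1], size=(n, 2))
    X = signs[:, :1] * mu1 + signs[:, 1:] * mu2 + rng.standard_normal((n, dim))
    y = signs[:, 0] * signs[:, 1]                    # XOR-style label in {-1, +1}
    flip = rng.random(n) < flip_rate                 # label-flipping noise
    y[flip] *= -1
    return X, y

X, y = xor_cluster_data(200)
print(X.shape, y[:10])
```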
33. The Fourth Global Coral Bleaching Event: Where do we go from here?
- Author
- Reimer, James Davis, Peixoto, Raquel S., Davies, Sarah W., Traylor-Knowles, Nikki, Short, Morgan L., Cabral-Tena, Rafael A., Burt, John A., Pessoa, Igor, Banaszak, Anastazia T., Winters, R. Scott, Moore, Tom, Schoepf, Verena, Kaullysing, Deepeeka, Calderon-Aguilera, Luis E., Wörheide, Gert, Harding, Simon, Munbodhe, Vikash, Mayfield, Anderson, Ainsworth, Tracy, Vardi, Tali, Eakin, C. Mark, Pratchett, Morgan S., and Voolstra, Christian R.
- Published
- 2024
34. Diagnostic capabilities of ChatGPT in ophthalmology
- Author
- Shemer, Asaf, Cohen, Michal, Altarescu, Aya, Atar-Vardi, Maya, Hecht, Idan, Dubinsky-Pertzov, Biana, Shoshany, Nadav, Zmujack, Sigal, Or, Lior, Einan-Lifshitz, Adi, and Pras, Eran
- Published
- 2024
35. Single-cell RNA-seq of the rare virosphere reveals the native hosts of giant viruses in the marine environment
- Author
- Fromm, Amir, Hevroni, Gur, Vincent, Flora, Schatz, Daniella, Martinez-Gutierrez, Carolina A., Aylward, Frank O., and Vardi, Assaf
- Published
- 2024
36. Noisy Interpolation Learning with Shallow Univariate ReLU Networks
- Author
- Joshi, Nirmit, Vardi, Gal, and Srebro, Nathan
- Subjects
- Computer Science - Machine Learning
- Abstract
Understanding how overparameterized neural networks generalize despite perfect interpolation of noisy training data is a fundamental question. Mallinar et. al. 2022 noted that neural networks seem to often exhibit ``tempered overfitting'', wherein the population risk does not converge to the Bayes optimal error, but neither does it approach infinity, yielding non-trivial generalization. However, this has not been studied rigorously. We provide the first rigorous analysis of the overfitting behavior of regression with minimum norm ($\ell_2$ of weights), focusing on univariate two-layer ReLU networks. We show overfitting is tempered (with high probability) when measured with respect to the $L_1$ loss, but also show that the situation is more complex than suggested by Mallinar et. al., and overfitting is catastrophic with respect to the $L_2$ loss, or when taking an expectation over the training set., Comment: To appear at ICLR 2024. Updated version with minor changes in the presentation
- Published
- 2023
37. Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses
- Author
- Buzaglo, Gon, Haim, Niv, Yehudai, Gilad, Vardi, Gal, Oz, Yakir, Nikankin, Yaniv, and Irani, Michal
- Subjects
- Computer Science - Machine Learning
- Abstract
Memorization of training data is an active research area, yet our understanding of the inner workings of neural networks is still in its infancy. Recently, Haim et al. (2022) proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers, effectively demonstrating that a large portion of training samples are encoded in the parameters of such networks. In this work, we extend their findings in several directions, including reconstruction from multiclass and convolutional neural networks. We derive a more general reconstruction scheme which is applicable to a wider range of loss functions such as regression losses. Moreover, we study the various factors that contribute to networks' susceptibility to such reconstruction schemes. Intriguingly, we observe that using weight decay during training increases reconstructability both in terms of quantity and quality. Additionally, we examine the influence of the number of neurons relative to the number of training samples on the reconstructability. Code: https://github.com/gonbuzaglo/decoreco, Comment: Code: https://github.com/gonbuzaglo/decoreco. arXiv admin note: text overlap with arXiv:2305.03350
- Published
- 2023
38. An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression
- Author
- Zhou, Lijia, Simon, James B., Vardi, Gal, and Srebro, Nathan
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning
- Abstract
We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model. We take an "agnostic" view in the following sense: we consider the cost as a function of sample size for any target function, even if the sample size is not large enough for consistency or the target is outside the RKHS. We analyze the cost of overfitting under a Gaussian universality ansatz using recently derived (non-rigorous) risk estimates in terms of the task eigenstructure. Our analysis provides a more refined characterization of benign, tempered and catastrophic overfitting (cf. Mallinar et al. 2022)., Comment: This is the ICLR CR version
- Published
- 2023
39. Orchiopexy: one procedure, two diagnoses – different male infertility outcomes
- Author
- Nitza Heiman Newman, Idan Farber, Eitan Lunenfeld, Atif Zeadna, Iris Har Vardi, and Zaki Assi
- Subjects
- male infertility, orchiopexy, sperm analysis, testicular torsion, undescended testicle, Diseases of the genitourinary system. Urology, RC870-923
- Abstract
Infertility, affecting one in six couples, is often related to the male partner’s congenital and/or environmental conditions or complications postsurgery. This retrospective study examines the link between orchiopexy for undescended testicles (UDT) and testicular torsion (TT) in childhood and adult fertility as assessed through sperm analysis. The study involved the analysis of semen samples from 7743 patients collected at Soroka University Medical Center (Beer Sheva, Israel) between January 2009 and December 2017. Patients were classified into two groups based on sperm concentration: those with concentrations below 5 × 10⁶ sperm per ml (AS group) and those above (MN group). Medical records and surgical histories were reviewed, categorizing orchiopexies by surgical approach. Among 140 individuals who had undergone pediatric surgery, 83 (59.3%) were placed in the MN group and 57 (40.7%) in the AS group. A higher likelihood of being in the MN group was observed in Jewish compared to Arab patients (75.9% vs 24.1%, P = 0.006). In cases of childhood UDT, 45 (78.9%) patients exhibited sperm concentrations below 5 × 10⁶ sperm per ml (P < 0.001), and 66 (76.7%) had undergone unilateral and 18 (20.9%) bilateral orchiopexy. Bilateral orchiopexy was significantly associated with lower sperm concentration, total motility, and progressive motility than unilateral cases (P = 0.014, P = 0.001, and P = 0.031, respectively). Multivariate analysis identified UDT as a weak risk factor for low sperm concentration (odds ratio [OR]: 2.712, P = 0.078), with bilateral UDT further increasing this risk (OR: 6.314, P = 0.012). Jewish ethnicity and TT diagnosis were associated with a reduced risk of sperm concentrations below 5 × 10⁶ sperm per ml. The findings indicate that initial diagnosis, surgical approach, and ethnicity markedly influence male fertility outcomes following pediatric orchiopexy.
- Published
- 2024
40. The mechanism underlying successful deep learning
- Author
- Tzach, Yarden, Meir, Yuval, Tevet, Ofek, Gross, Ronit D., Hodassman, Shiri, Vardi, Roni, and Kanter, Ido
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Deep architectures consist of tens or hundreds of convolutional layers (CLs) that terminate with a few fully connected (FC) layers and an output layer representing the possible labels of a complex classification task. According to the existing deep learning (DL) rationale, the first CL reveals localized features from the raw data, whereas the subsequent layers progressively extract higher-level features required for refined classification. This article presents an efficient three-phase procedure for quantifying the mechanism underlying successful DL. First, a deep architecture is trained to maximize the success rate (SR). Next, the weights of the first several CLs are fixed and only the concatenated new FC layer connected to the output is trained, resulting in SRs that progress with the layers. Finally, the trained FC weights are silenced, except for those emerging from a single filter, enabling the quantification of the functionality of this filter using a correlation matrix between input labels and averaged output fields, hence a well-defined set of quantifiable features is obtained. Each filter essentially selects a single output label independent of the input label, which seems to prevent high SRs; however, it counterintuitively identifies a small subset of possible output labels. This feature is an essential part of the underlying DL mechanism and is progressively sharpened with layers, resulting in enhanced signal-to-noise ratios and SRs. Quantitatively, this mechanism is exemplified by the VGG-16, VGG-6, and AVGG-16. The proposed mechanism underlying DL provides an accurate tool for identifying each filter's quality and is expected to direct additional procedures to improve the SR, computational complexity, and latency of DL., Comment: 33 pages, 8 figures
- Published
- 2023
41. Most Neural Networks Are Almost Learnable
- Author
- Daniely, Amit, Srebro, Nathan, and Vardi, Gal
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We present a PTAS for learning random constant-depth networks. We show that for any fixed $\epsilon>0$ and depth $i$, there is a poly-time algorithm that for any distribution on $\sqrt{d} \cdot \mathbb{S}^{d-1}$ learns random Xavier networks of depth $i$, up to an additive error of $\epsilon$. The algorithm runs in time and sample complexity of $(\bar{d})^{\mathrm{poly}(\epsilon^{-1})}$, where $\bar d$ is the size of the network. For some cases of sigmoid and ReLU-like activations the bound can be improved to $(\bar{d})^{\mathrm{polylog}(\epsilon^{-1})}$, resulting in a quasi-poly-time algorithm for learning constant depth random networks., Comment: Small fixes after review
- Published
- 2023
42. Singly Exponential Translation of Alternating Weak Büchi Automata to Unambiguous Büchi Automata
- Author
- Li, Yong, Schewe, Sven, and Vardi, Moshe Y.
- Subjects
- Computer Science - Formal Languages and Automata Theory, F.4.3
- Abstract
We introduce a method for translating an alternating weak B\"uchi automaton (AWA), which corresponds to a Linear Dynamic Logic (LDL) formula, to an unambiguous B\"uchi automaton (UBA). Our translations generalise constructions for Linear Temporal Logic (LTL), a less expressive specification language than LDL. In classical constructions, LTL formulas are first translated to alternating \emph{very weak} automata (AVAs) -- automata that have only singleton strongly connected components (SCCs); the AVAs are then handled by efficient disambiguation procedures. However, general AWAs can have larger SCCs, which complicates disambiguation. Currently, the only available disambiguation procedure has to go through an intermediate construction of nondeterministic B\"uchi automata (NBAs), which would incur an exponential blow-up of its own. We introduce a translation from \emph{general} AWAs to UBAs with a \emph{singly} exponential blow-up, which also immediately provides a singly exponential translation from LDL to UBAs. Interestingly, the complexity of our translation is \emph{smaller} than the best known disambiguation algorithm for NBAs (broadly $(0.53n)^n$ vs. $(0.76n)^n$), while the input of our construction can be exponentially more succinct., Comment: 23 pages
- Published
- 2023
43. Model Checking Strategies from Synthesis Over Finite Traces
- Author
- Bansal, Suguman, Li, Yong, Tabajara, Lucas Martinelli, Vardi, Moshe Y., and Wells, Andrew
- Subjects
- Computer Science - Formal Languages and Automata Theory, Computer Science - Artificial Intelligence, Computer Science - Logic in Computer Science
- Abstract
The innovations in reactive synthesis from {\em Linear Temporal Logics over finite traces} (LTLf) will be amplified by the ability to verify the correctness of the strategies generated by LTLf synthesis tools. This motivates our work on {\em LTLf model checking}. LTLf model checking, however, is not straightforward. The strategies generated by LTLf synthesis may be represented using {\em terminating} transducers or {\em non-terminating} transducers where executions are of finite-but-unbounded length or infinite length, respectively. For synthesis, there is no evidence that one type of transducer is better than the other since they both demonstrate the same complexity and similar algorithms. In this work, we show that for model checking, the two types of transducers are fundamentally different. Our central result is that LTLf model checking of non-terminating transducers is \emph{exponentially harder} than that of terminating transducers. We show that the problems are EXPSPACE-complete and PSPACE-complete, respectively. Hence, considering the feasibility of verification, LTLf synthesis tools should synthesize terminating transducers. This is, to the best of our knowledge, the \emph{first} evidence to use one transducer over the other in LTLf synthesis., Comment: Accepted by ATVA 23
- Published
- 2023
44. Reconstructing Training Data from Multiclass Neural Networks
- Author
- Buzaglo, Gon, Haim, Niv, Yehudai, Gilad, Vardi, Gal, and Irani, Michal
- Subjects
- Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Reconstructing samples from the training set of trained neural networks is a major privacy concern. Haim et al. (2022) recently showed that it is possible to reconstruct training samples from neural network binary classifiers, based on theoretical results about the implicit bias of gradient methods. In this work, we present several improvements and new insights over this previous work. As our main improvement, we show that training-data reconstruction is possible in the multi-class setting and that the reconstruction quality is even higher than in the case of binary classification. Moreover, we show that using weight-decay during training increases the vulnerability to sample reconstruction. Finally, while in the previous work the training set was of size at most $1000$ from $10$ classes, we show preliminary evidence of the ability to reconstruct from a model trained on $5000$ samples from $100$ classes.
- Published
- 2023
45. Multi-Agent Systems with Quantitative Satisficing Goals
- Author
- Rajasekaran, Senthil, Bansal, Suguman, and Vardi, Moshe Y.
- Subjects
- Computer Science - Computer Science and Game Theory, Computer Science - Formal Languages and Automata Theory
- Abstract
In the study of reactive systems, qualitative properties are usually easier to model and analyze than quantitative properties. This is especially true in systems where mutually beneficial cooperation between agents is possible, such as multi-agent systems. The large number of possible payoffs available to agents in reactive systems with quantitative properties means that there are many scenarios in which agents deviate from mutually beneficial outcomes in order to gain negligible payoff improvements. This behavior often leads to less desirable outcomes for all agents involved. For this reason we study satisficing goals, derived from a decision-making approach aimed at meeting a good-enough outcome instead of pure optimization. By considering satisficing goals, we are able to employ efficient automata-based algorithms to find pure-strategy Nash equilibria. We then show that these algorithms extend to scenarios in which agents have multiple thresholds, providing an approximation of optimization while still retaining the possibility of mutually beneficial cooperation and efficient automata-based algorithms. Finally, we demonstrate a one-way correspondence between the existence of $\epsilon$-equilibria and the existence of equilibria in games where agents have multiple thresholds., Comment: Preliminary version of the technical report for a paper to appear in IJCAI'23
- Published
- 2023
46. Multi-Phase Relaxation Labeling for Square Jigsaw Puzzle Solving
- Author
- Vardi, Ben, Torcinovich, Alessandro, Khoroshiltseva, Marina, Pelillo, Marcello, and Ben-Shahar, Ohad
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
We present a novel method for solving square jigsaw puzzles based on global optimization. The method is fully automatic, assumes no prior information, and can handle puzzles with known or unknown piece orientation. At the core of the optimization process is nonlinear relaxation labeling, a well-founded approach for deducing global solutions from local constraints, but unlike the classical scheme here we propose a multi-phase approach that guarantees convergence to feasible puzzle solutions. Next to the algorithmic novelty, we also present a new compatibility function for the quantification of the affinity between adjacent puzzle pieces. Competitive results and the advantage of the multi-phase approach are demonstrated on standard datasets., Comment: 10 pages, 7 figures. Published in Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4 VISAPP: VISAPP, 785-795, 2023
- Published
- 2023
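For context, relaxation labeling maintains a probability vector over labels for each puzzle piece and iteratively re-weights it by a support term built from pairwise compatibilities. One classical (textbook) form of the update, with nonnegative compatibilities r_ij, is shown below; the paper's multi-phase scheme modifies this classical form.

```latex
\[
  q_i^{(t)}(\lambda) \;=\; \sum_{j}\sum_{\mu} r_{ij}(\lambda,\mu)\, p_j^{(t)}(\mu),
  \qquad
  p_i^{(t+1)}(\lambda) \;=\;
  \frac{p_i^{(t)}(\lambda)\, q_i^{(t)}(\lambda)}
       {\sum_{\mu} p_i^{(t)}(\mu)\, q_i^{(t)}(\mu)} .
\]
```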
47. Dynamic Combinatorial Assignment
- Author
- Nguyen, Thành, Teytelboym, Alexander, and Vardi, Shai
- Subjects
- Economics - Theoretical Economics, Computer Science - Computer Science and Game Theory
- Abstract
We study a model of dynamic combinatorial assignment of indivisible objects without money. We introduce a new solution concept called ``dynamic approximate competitive equilibrium from equal incomes'' (DACEEI), which stipulates that markets must approximately clear in almost all time periods. A naive repeated application of approximate competitive equilibrium from equal incomes (Budish, 2011) does not yield a desirable outcome because the approximation error in market-clearing compounds quickly over time. We therefore develop a new version of the static approximate competitive equilibrium from carefully constructed random budgets which ensures that, in expectation, markets clear exactly. We then use it to design the ``online combinatorial assignment mechanism'' (OCAM) which implements a DACEEI with high probability. The OCAM is (i) group-strategyproof up to one object (ii) envy-free up to one object for almost all agents (iii) approximately market-clearing in almost all periods with high probability when the market is large and arrivals are random. Applications include refugee resettlement, daycare assignment, and airport slot allocation.
- Published
- 2023
48. Enhancing the accuracies by performing pooling decisions adjacent to the output layer
- Author
- Meir, Yuval, Tzach, Yarden, Gross, Ronit D., Tevet, Ofek, Vardi, Roni, and Kanter, Ido
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Learning classification tasks of (2^n × 2^n) inputs typically consists of ≤ n (2 × 2) max-pooling (MP) operators along the entire feedforward deep architecture. Here we show, using the CIFAR-10 database, that pooling decisions adjacent to the last convolutional layer significantly enhance accuracies. In particular, average accuracies of the advanced-VGG with m layers (A-VGGm) architectures are 0.936, 0.940, 0.954, 0.955, and 0.955 for m=6, 8, 14, 13, and 16, respectively. The results indicate A-VGG8s' accuracy is superior to VGG16s', and that the accuracies of A-VGG13 and A-VGG16 are equal, and comparable to that of Wide-ResNet16. In addition, replacing the three fully connected (FC) layers with one FC layer, A-VGG6 and A-VGG14, or with several linear activation FC layers, yielded similar accuracies. These significantly enhanced accuracies stem from training the most influential input-output routes, in comparison to the inferior routes selected following multiple MP decisions along the deep architecture. In addition, accuracies are sensitive to the order of the non-commutative MP and average pooling operators adjacent to the output layer, varying the number and location of training routes. The results call for the reexamination of previously proposed deep architectures and their accuracies by utilizing the proposed pooling strategy adjacent to the output layer., Comment: 29 pages, 3 figures, 1 table, and Supplementary Information
- Published
- 2023
49. Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization
- Author
- Frei, Spencer, Vardi, Gal, Bartlett, Peter L., and Srebro, Nathan
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization. In this work we establish a number of settings where the satisfaction of these KKT conditions implies benign overfitting in linear classifiers and in two-layer leaky ReLU networks: the estimators interpolate noisy training data and simultaneously generalize well to test data. The settings include variants of the noisy class-conditional Gaussians considered in previous work as well as new distributional settings where benign overfitting has not been previously observed. The key ingredient to our proof is the observation that when the training data is nearly-orthogonal, both linear classifiers and leaky ReLU networks satisfying the KKT conditions for their respective margin maximization problems behave like a nearly uniform average of the training examples., Comment: 53 pages
- Published
- 2023
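The margin-maximization problem and its KKT conditions referred to in the abstract are, in the standard formulation used in this line of work (background, not a result of the paper):

```latex
\[
  \min_{w}\ \tfrac{1}{2}\|w\|^2
  \quad \text{s.t.} \quad y_i\, f(w; x_i) \ \ge\ 1 \ \ \text{for all } i,
\]
% KKT conditions: there exist multipliers \lambda_i \ge 0 such that
\[
  w \;=\; \sum_i \lambda_i\, y_i\, \nabla_w f(w; x_i),
  \qquad
  \lambda_i \bigl( y_i\, f(w; x_i) - 1 \bigr) \;=\; 0 \ \ \text{for all } i.
\]
```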
50. The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks
- Author
- Frei, Spencer, Vardi, Gal, Bartlett, Peter L., and Srebro, Nathan
- Subjects
- Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
In this work, we study the implications of the implicit bias of gradient flow on generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters and the correlations between cluster means are small, and show that in two-layer ReLU networks gradient flow is biased towards solutions that generalize well, but are highly vulnerable to adversarial examples. Our results hold even in cases where the network has many more parameters than training examples. Despite the potential for harmful overfitting in such overparameterized settings, we prove that the implicit bias of gradient flow prevents it. However, the implicit bias also leads to non-robust solutions (susceptible to small adversarial $\ell_2$-perturbations), even though robust networks that fit the data exist., Comment: 42 pages; NeurIPS 2023 camera ready
- Published
- 2023