Author: "Havasi, Marton" - Searchworks@Jio Institute Digital Library Search Results

15 results on '"Havasi, Marton"'

1. Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles

Author: Phan, Buu, Amos, Brandon, Gat, Itai, Havasi, Marton, Muckley, Matthew, and Ullrich, Karen
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Tokenization is associated with many poorly understood shortcomings in language models (LMs), yet remains an important component for long sequence scaling purposes. This work studies how tokenization impacts model performance by analyzing and comparing the stochastic behavior of tokenized models with their byte-level, or token-free, counterparts. We discover that, even when the two models are statistically equivalent, their predictive distributions over the next byte can be substantially different, a phenomenon we term "tokenization bias". To fully characterize this phenomenon, we introduce the Byte-Token Representation Lemma, a framework that establishes a mapping between the learned token distribution and its equivalent byte-level distribution. From this result, we develop a next-byte sampling algorithm that eliminates tokenization bias without requiring further training or optimization. In other words, this enables zero-shot conversion of tokenized LMs into statistically equivalent token-free ones. We demonstrate its broad applicability with two use cases: fill-in-the-middle (FIM) tasks and model ensembles. In FIM tasks where input prompts may terminate mid-token, leading to out-of-distribution tokenization, our method mitigates performance degradation and achieves an approximately 18% improvement in FIM coding benchmarks, consistently outperforming the standard token healing fix. For model ensembles where each model employs a distinct vocabulary, our approach enables seamless integration, resulting in improved performance (up to 3.7%) over individual models across various standard baselines in reasoning, knowledge, and coding.
Published: 2024

2. Understanding and Mitigating Tokenization Bias in Language Models

Author: Phan, Buu, Havasi, Marton, Muckley, Matthew, and Ullrich, Karen
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: State-of-the-art language models are autoregressive and operate on subword units known as tokens. Specifically, one must encode the conditioning string into a list of tokens before passing to the language models for next-token prediction. We show that popular encoding schemes, such as maximum prefix encoding (MPE) and byte-pair-encoding (BPE), induce a sampling bias that cannot be mitigated with more training or data. To counter this universal problem, for each encoding scheme above, we propose a novel algorithm to obtain unbiased estimates from any language model trained on tokenized data. Our methods do not require finetuning the model, and the complexity, defined as the number of model runs, scales linearly with the sequence length in the case of MPE. As a result, we show that one can simulate token-free behavior from a tokenized language model. We empirically verify the correctness of our method through a Markov-chain setup, where it accurately recovers the transition probabilities, as opposed to the conventional method of directly prompting tokens into the language model.
Published: 2024
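
Illustration: the sampling bias described in the abstract above can be reproduced on a toy example. The sketch below is not the authors' algorithm. It assumes a hypothetical three-token vocabulary ("AA", "A", "B"), a greedy longest-match encoder standing in for maximum prefix encoding, and a two-state Markov chain with an arbitrarily chosen stay probability of 0.6. Under these assumptions, the token "A" is only ever emitted when the next character is not "A", so token-level statistics say "A" is never followed by "A" even though the underlying characters repeat about 60% of the time.

```python
import random

# Hypothetical toy vocabulary, chosen only for illustration.
VOCAB = ["AA", "A", "B"]


def mpe_encode(text, vocab=VOCAB):
    """Greedy longest-match encoding (a stand-in for maximum prefix encoding)."""
    ordered = sorted(vocab, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for tok in ordered:
            if text.startswith(tok, i):
                tokens.append(tok)
                i += len(tok)
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return tokens


def sample_chain(n, p_stay=0.6, seed=0):
    """Sample n characters from a two-state Markov chain over 'A' and 'B'."""
    rng = random.Random(seed)
    chars = [rng.choice("AB")]
    for _ in range(n - 1):
        prev = chars[-1]
        if rng.random() < p_stay:
            chars.append(prev)
        else:
            chars.append("B" if prev == "A" else "A")
    return "".join(chars)


text = sample_chain(10_000)
tokens = mpe_encode(text)

# Character level: how often is 'A' followed by another 'A'?
pairs_after_a = [text[i + 1] for i in range(len(text) - 1) if text[i] == "A"]
char_frac = sum(c == "A" for c in pairs_after_a) / len(pairs_after_a)

# Token level: which tokens follow the token "A"?
after_a_token = [nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == "A"]
token_frac = sum(t.startswith("A") for t in after_a_token) / len(after_a_token)

print(f"P(next char is A | char A)          ~ {char_frac:.3f}")
print(f"P(next token starts with A | tok A) = {token_frac:.3f}")
```

The token-level fraction is exactly zero by construction of the encoder, which is the kind of gap between token-level and character-level conditionals that the abstract refers to.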

3. Guarantee Regions for Local Explanations

Author: Havasi, Marton, Parbhoo, Sonali, and Doshi-Velez, Finale
Subjects: Computer Science - Machine Learning
Abstract: Interpretability methods that utilise local surrogate models (e.g. LIME) are very good at describing the behaviour of the predictive model at a point of interest, but they are not guaranteed to extrapolate to the local region surrounding the point. However, overfitting to the local curvature of the predictive model and malicious tampering can significantly limit extrapolation. We propose an anchor-based algorithm for identifying regions in which local explanations are guaranteed to be correct by explicitly describing those intervals along which the input features can be trusted. Our method produces an interpretable feature-aligned box where the prediction of the local surrogate model is guaranteed to match the predictive model. We demonstrate that our algorithm can be used to find explanations with larger guarantee regions that better cover the data manifold compared to existing baselines. We also show how our method can identify misleading local explanations with significantly poorer guarantee regions.
Published: 2024

4. What Makes a Good Explanation?: A Harmonized View of Properties of Explanations

Author: Chen, Zixi, Subhash, Varshini, Havasi, Marton, Pan, Weiwei, and Doshi-Velez, Finale
Subjects: Computer Science - Machine Learning
Abstract: Interpretability provides a means for humans to verify aspects of machine learning (ML) models and empower human+ML teaming in situations where the task cannot be fully automated. Different contexts require explanations with different properties. For example, the kind of explanation required to determine if an early cardiac arrest warning system is ready to be integrated into a care setting is very different from the type of explanation required for a loan applicant to help determine the actions they might need to take to make their application successful. Unfortunately, there is a lack of standardization when it comes to properties of explanations: different papers may use the same term to mean different quantities, and different terms to mean the same quantity. This lack of a standardized terminology and categorization of the properties of ML explanations prevents us from both rigorously comparing interpretable machine learning methods and identifying what properties are needed in what contexts. In this work, we survey properties defined in interpretable machine learning papers, synthesize them based on what they actually measure, and describe the trade-offs between different formulations of these properties. In doing so, we enable more informed selection of task-appropriate formulations of explanation properties as well as standardization for future work in interpretable machine learning.
Comment: Short version accepted at NeurIPS 2022 workshops on Progress and Challenges in Building Trustworthy Embodied AI and Trustworthy and Socially Responsible Machine Learning
Published: 2022

5. Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning

Author: Nado, Zachary, Band, Neil, Collier, Mark, Djolonga, Josip, Dusenberry, Michael W., Farquhar, Sebastian, Feng, Qixuan, Filos, Angelos, Havasi, Marton, Jenatton, Rodolphe, Jerfel, Ghassen, Liu, Jeremiah, Mariet, Zelda, Nixon, Jeremy, Padhy, Shreyas, Ren, Jie, Rudner, Tim G. J., Sbahi, Faris, Wen, Yeming, Wenzel, Florian, Murphy, Kevin, Sculley, D., Lakshminarayanan, Balaji, Snoek, Jasper, Gal, Yarin, and Tran, Dustin
Subjects: Computer Science - Machine Learning
Abstract: High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning which underlies many deployed ML systems. The ability to compare techniques for improving these estimates is therefore very important for research and practice alike. Yet, competitive comparisons of methods are often lacking due to a range of reasons, including: compute availability for extensive tuning, incorporation of sufficiently many baselines, and concrete documentation for reproducibility. In this paper we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks. As of this writing, the collection spans 19 methods across 9 tasks, each with at least 5 metrics. Each baseline is a self-contained experiment pipeline with easily reusable and extendable components. Our goal is to provide immediate starting points for experimentation with new methods or applications. Additionally we provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results. Code available at https://github.com/google/uncertainty-baselines.
Published: 2021

6. Training independent subnetworks for robust prediction

Author: Havasi, Marton, Jenatton, Rodolphe, Fort, Stanislav, Liu, Jeremiah Zhe, Snoek, Jasper, Lakshminarayanan, Balaji, Dai, Andrew M., and Tran, Dustin
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Recent approaches to efficiently ensemble neural networks have shown that strong robustness and uncertainty performance can be achieved with a negligible gain in parameters over the original network. However, these methods still require multiple forward passes for prediction, leading to a significant computational cost. In this work, we show a surprising result: the benefits of using multiple predictions can be achieved 'for free' under a single model's forward pass. In particular, we show that, using a multi-input multi-output (MIMO) configuration, one can utilize a single model's capacity to train multiple subnetworks that independently learn the task at hand. By ensembling the predictions made by the subnetworks, we improve model robustness without increasing compute. We observe a significant improvement in negative log-likelihood, accuracy, and calibration error on CIFAR10, CIFAR100, ImageNet, and their out-of-distribution variants compared to previous methods.
Comment: Updated to the ICLR camera ready version, added reference to Soflaei et al. 2020
Published: 2020

7. Compressing Images by Encoding Their Latent Representations with Relative Entropy Coding

Author: Flamich, Gergely, Havasi, Marton, and Hernández-Lobato, José Miguel
Subjects: Computer Science - Information Theory, Electrical Engineering and Systems Science - Image and Video Processing, Statistics - Machine Learning, 94A08 (Primary) 94A34 (Secondary), E.4, G.3, H.1.1
Abstract: Variational Autoencoders (VAEs) have seen widespread use in learned image compression. They are used to learn expressive latent representations on which downstream compression methods can operate with high efficiency. Recently proposed 'bits-back' methods can indirectly encode the latent representation of images with codelength close to the relative entropy between the latent posterior and the prior. However, due to the underlying algorithm, these methods can only be used for lossless compression, and they only achieve their nominal efficiency when compressing multiple images simultaneously; they are inefficient for compressing single images. As an alternative, we propose a novel method, Relative Entropy Coding (REC), that can directly encode the latent representation with codelength close to the relative entropy for single images, supported by our empirical results obtained on the Cifar10, ImageNet32 and Kodak datasets. Moreover, unlike previous bits-back methods, REC is immediately applicable to lossy compression, where it is competitive with the state-of-the-art on the Kodak dataset.
Comment: Accepted at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020)
Published: 2020

8. Advances in compression using probabilistic models

Author: Havasi, Marton and Hernández-Lobato, José Miguel
Subjects: Compression, machine learning
Abstract: The increasing demand for data transmission and storage necessitates the use of efficient compression methods. Compression algorithms work by mapping data to a more compact representation from which the original data can be recovered. To operate efficiently, they need to capture the characteristics of the data distribution, which can be difficult, especially for high-dimensional data. One emerging solution lies in applying probabilistic machine learning to capture the data distribution in an unsupervised manner. Once a probabilistic model for the data is defined, variational inference can be used to infer its parameters from data. Variational inference is closely related to the optimal compression size, as stated by Hinton's bits-back argument: the evidence lower bound, the objective optimized by variational inference, corresponds to a lower bound on the optimal compression size of the average datapoint. However, current compression methods rely on variational inference merely as a heuristic, and they do not approach its postulated efficiency. In this thesis, we present principled and practical algorithms that get closer to this limit. After discussing our approach, we demonstrate its efficacy in image compression and model compression. First, we focus on image compression, where we use a variational autoencoder to learn a mapping between the images and their unobserved, latent representations. We propose a stochastic coding scheme to encode the latent representation, from which the original image can be approximately reconstructed. Next, we look at the compression of deep learning models. We use variational inference to approximate the posterior distribution of the weights in a neural network, and apply our stochastic coding scheme to encode a weight configuration. Finally, we investigate a connection between variational inference and our compression algorithm. We show that a technique we used for compression can improve variational inference by generating samples from a highly flexible posterior approximation, without significantly increasing the computational costs.
Published: 2021

9. Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters

Author: Havasi, Marton, Peharz, Robert, and Hernández-Lobato, José Miguel
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: While deep neural networks are a highly successful model class, their large memory footprint puts considerable strain on energy consumption, communication bandwidth, and storage requirements. Consequently, model size reduction has become an utmost goal in deep learning. A typical approach is to train a set of deterministic weights, while applying certain techniques such as pruning and quantization, in order that the empirical weight distribution becomes amenable to Shannon-style coding schemes. However, as shown in this paper, relaxing weight determinism and using a full variational distribution over weights allows for more efficient coding schemes and consequently higher compression rates. In particular, following the classical bits-back argument, we encode the network weights using a random sample, requiring only a number of bits corresponding to the Kullback-Leibler divergence between the sampled variational distribution and the encoding distribution. By imposing a constraint on the Kullback-Leibler divergence, we are able to explicitly control the compression rate, while optimizing the expected loss on the training set. The employed encoding scheme can be shown to be close to the optimal information-theoretical lower bound, with respect to the employed variational family. Our method sets new state-of-the-art in neural network compression, as it strictly dominates previous approaches in a Pareto sense: On the benchmarks LeNet-5/MNIST and VGG-16/CIFAR-10, our approach yields the best test performance for a fixed memory budget, and vice versa, it achieves the highest compression rates for a fixed test performance.
Comment: Under review as a conference paper at ICLR 2019
Published: 2018
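
Illustration: the abstract above ties the codelength to the Kullback-Leibler divergence between the variational distribution and the encoding distribution. The sketch below shows only that bookkeeping, not the paper's coding scheme. It assumes both distributions are independent univariate Gaussians per weight (an assumption made here for illustration), and the (mean, std) pairs are hypothetical values.

```python
import math


def kl_gaussian_bits(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) in bits for univariate Gaussians q and p."""
    nats = (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
        - 0.5
    )
    return nats / math.log(2)  # convert nats to bits


# Hypothetical variational (mean, std) per weight, for illustration only.
variational = [(0.3, 0.1), (-1.2, 0.5), (0.0, 1.0)]
# Assumed encoding distribution: standard normal for every weight.
prior_mu, prior_sigma = 0.0, 1.0

per_weight_bits = [
    kl_gaussian_bits(mu, sigma, prior_mu, prior_sigma) for mu, sigma in variational
]
total_bits = sum(per_weight_bits)

for (mu, sigma), bits in zip(variational, per_weight_bits):
    print(f"q = N({mu}, {sigma}^2): {bits:.3f} bits")
print(f"approximate budget for all weights: {total_bits:.3f} bits")
```

A weight whose variational distribution matches the encoding distribution costs zero bits in this accounting, which is why constraining the divergence acts as a direct control on the compression rate.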

10. Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo

Author: Havasi, Marton, Hernández-Lobato, José Miguel, and Murillo-Fuentes, Juan José
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Deep Gaussian Processes (DGPs) are hierarchical generalizations of Gaussian Processes that combine well calibrated uncertainty estimates with the high flexibility of multilayer models. One of the biggest challenges with these models is that exact inference is intractable. The current state-of-the-art inference method, Variational Inference (VI), employs a Gaussian approximation to the posterior distribution. This can be a potentially poor unimodal approximation of the generally multimodal posterior. In this work, we provide evidence for the non-Gaussian nature of the posterior and we apply the Stochastic Gradient Hamiltonian Monte Carlo method to generate samples. To efficiently optimize the hyperparameters, we introduce the Moving Window MCEM algorithm. This results in significantly better predictions at a lower computational cost than its VI counterpart. Thus our method establishes a new state-of-the-art for inference in DGPs.
Published: 2018

11. Deep Gaussian Processes with Decoupled Inducing Inputs

Author: Havasi, Marton, Hernández-Lobato, José Miguel, and Murillo-Fuentes, Juan José
Subjects: Statistics - Machine Learning
Abstract: Deep Gaussian Processes (DGP) are hierarchical generalizations of Gaussian Processes (GP) that have proven to work effectively on multiple supervised regression tasks. They combine the well calibrated uncertainty estimates of GPs with the great flexibility of multilayer models. In DGPs, given the inputs, the outputs of the layers are Gaussian distributions parameterized by their means and covariances. These layers are realized as Sparse GPs where the training data is approximated using a small set of pseudo points. In this work, we show that the computational cost of DGPs can be reduced with no loss in performance by using a separate, smaller set of pseudo points when calculating the layerwise variance while using a larger set of pseudo points when calculating the layerwise mean. This enabled us to train larger models that have lower cost and better predictive performance.
Published: 2018

12. Sampling the Variational Posterior with Local Refinement

Author: Havasi, Marton, primary, Snoek, Jasper, additional, Tran, Dustin, additional, Gordon, Jonathan, additional, and Hernández-Lobato, José Miguel, additional
Published: 2021

13. Minimal random code learning: Getting bits back from compressed model parameters

Author: Havasi, Marton, Peharz, Robert, Hernández-Lobato, José Miguel, and Uncertainty in Artificial Intelligence
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), SDG 7 - Affordable and Clean Energy, Machine Learning (cs.LG)
Abstract: While deep neural networks are a highly successful model class, their large memory footprint puts considerable strain on energy consumption, communication bandwidth, and storage requirements. Consequently, model size reduction has become an utmost goal in deep learning. A typical approach is to train a set of deterministic weights, while applying certain techniques such as pruning and quantization, in order that the empirical weight distribution becomes amenable to Shannon-style coding schemes. However, as shown in this paper, relaxing weight determinism and using a full variational distribution over weights allows for more efficient coding schemes and consequently higher compression rates. In particular, following the classical bits-back argument, we encode the network weights using a random sample, requiring only a number of bits corresponding to the Kullback-Leibler divergence between the sampled variational distribution and the encoding distribution. By imposing a constraint on the Kullback-Leibler divergence, we are able to explicitly control the compression rate, while optimizing the expected loss on the training set. The employed encoding scheme can be shown to be close to the optimal information-theoretical lower bound, with respect to the employed variational family. Our method sets new state-of-the-art in neural network compression, as it strictly dominates previous approaches in a Pareto sense: On the benchmarks LeNet-5/MNIST and VGG-16/CIFAR-10, our approach yields the best test performance for a fixed memory budget, and vice versa, it achieves the highest compression rates for a fixed test performance.
Comment: Under review as a conference paper at ICLR 2019
Published: 2020

14. A comprehensive methodology to determine optimal coherence interfaces for many-accelerator SoCs

Author: Bhardwaj, Kshitij, primary, Havasi, Marton, additional, Yao, Yuan, additional, Brooks, David M., additional, Hernández-Lobato, José Miguel, additional, and Wei, Gu-Yeon, additional
Published: 2020

15. Determining Optimal Coherency Interface for Many-Accelerator SoCs Using Bayesian Optimization

Author: Bhardwaj, Kshitij, primary, Havasi, Marton, additional, Yao, Yuan, additional, Brooks, David M., additional, Hernández-Lobato, José Miguel, additional, and Wei, Gu-Yeon, additional
Published: 2019
