Author: "Zhmoginov, Andrey" - Searchworks@Jio Institute Digital Library Search Results

Search keyword: "Zhmoginov, Andrey" (34 results in total)

1. Learning and Unlearning of Fabricated Knowledge in Language Models

Author: Sun, Chen, Miller, Nolan Andrew, Zhmoginov, Andrey, Vladymyrov, Max, and Sandler, Mark
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: What happens when a new piece of knowledge is introduced into the training data and how long does it last while a large language model (LM) continues to train? We investigate this question by injecting facts into LMs from a new probing dataset, "Outlandish", which is designed to permit the testing of a spectrum of different fact types. When studying how robust these memories are, there appears to be a sweet spot in the spectrum of fact novelty between consistency with world knowledge and total randomness, where the injected memory is the most enduring. Specifically, we show that facts that conflict with common knowledge are remembered for tens of thousands of training steps, while prompts not conflicting with common knowledge (mundane), as well as scrambled prompts (randomly jumbled), are both forgotten much more rapidly. Further, knowledge-conflicting facts can "prime" how the language model hallucinates on logically unrelated prompts, showing their propensity for non-target generalization, while both mundane and randomly jumbled facts prime significantly less. Finally, we show that the impacts of knowledge-conflicting facts in LMs, though they can be long-lasting, can be largely erased by a novel application of multi-step sparse updates, even while the training ability of the model is preserved. As such, this very simple procedure has direct implications for mitigating the effects of data poisoning in training.
Published: 2024
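
The abstract does not spell out what a "multi-step sparse update" is. A minimal sketch, assuming a sparse update keeps only the largest-magnitude fraction of a parameter update and zeroes the rest, applied over several steps (this is our assumption, not the authors' stated procedure); `sparsify_update`, `apply_sparse_steps`, and `keep_fraction` are hypothetical names.

```python
def sparsify_update(update, keep_fraction):
    """Keep only the largest-magnitude entries of an update vector.

    Hypothetical helper: retains roughly `keep_fraction` of the entries
    (all entries tied at the cutoff are kept) and zeroes the rest.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, int(len(update) * keep_fraction))
    # The magnitude of the k-th largest entry becomes the cutoff.
    threshold = sorted((abs(u) for u in update), reverse=True)[k - 1]
    return [u if abs(u) >= threshold else 0.0 for u in update]


def apply_sparse_steps(params, updates, keep_fraction):
    """Apply a sequence of updates, sparsifying each one before adding it."""
    for update in updates:
        sparse = sparsify_update(update, keep_fraction)
        params = [p + s for p, s in zip(params, sparse)]
    return params


print(sparsify_update([0.1, -0.5, 0.3, 0.05], keep_fraction=0.5))
```

How the update direction itself is computed (for example, which loss it descends) is not described in the abstract and is left out here.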

2. MELODI: Exploring Memory Compression for Long Contexts

Author: Chen, Yinpeng, Hutchins, DeLesley, Jansen, Aren, Zhmoginov, Andrey, Racz, David, and Andersen, Jesper
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: We present MELODI, a novel memory architecture designed to efficiently process long documents using short context windows. The key principle behind MELODI is to represent short-term and long-term memory as a hierarchical compression scheme across both network layers and context windows. Specifically, the short-term memory is achieved through recurrent compression of context windows across multiple layers, ensuring smooth transitions between windows. In contrast, the long-term memory performs further compression within a single middle layer and aggregates information across context windows, effectively consolidating crucial information from the entire history. Compared to a strong baseline - the Memorizing Transformer employing dense attention over a large long-term memory (64K key-value pairs) - our method demonstrates superior performance on various long-context datasets while remarkably reducing the memory footprint by a factor of 8.
Published: 2024

3. Narrowing the Focus: Learned Optimizers for Pretrained Models

Author: Kristiansen, Gus, Sandler, Mark, Zhmoginov, Andrey, Miller, Nolan, Goyal, Anirudh, Lee, Jihwan, and Vladymyrov, Max
Subjects: Computer Science - Machine Learning
Abstract: In modern deep learning, models are learned by applying gradient updates using an optimizer, which transforms the updates based on various statistics. Optimizers are often hand-designed, and tuning their hyperparameters is a big part of the training process. Learned optimizers have shown some initial promise, but are generally unsuccessful as a general optimization mechanism applicable to every problem. In this work we explore a different direction: instead of learning general optimizers, we specialize them to a specific training environment. We propose a novel optimizer technique that learns a layer-specific linear combination of update directions provided by a set of base optimizers, effectively adapting its strategy to the specific model and dataset. When evaluated on image classification tasks, this specialized optimizer significantly outperforms both traditional off-the-shelf methods such as Adam and existing general learned optimizers. Moreover, it demonstrates robust generalization with respect to model initialization, evaluation on unseen datasets, and training durations beyond its meta-training horizon.
Published: 2024
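
A minimal sketch of a layer-specific linear combination of base-optimizer directions, assuming two stateless base optimizers (plain SGD and signSGD) as stand-ins for the paper's set of base optimizers, and fixed per-layer coefficients in place of meta-learned ones. The layer names, coefficient values, and function names are hypothetical.

```python
import math


def sgd_direction(grad):
    """Base optimizer 1: plain negative gradient."""
    return [-g for g in grad]


def sign_direction(grad):
    """Base optimizer 2: negative sign of the gradient (signSGD)."""
    return [-math.copysign(1.0, g) if g != 0.0 else 0.0 for g in grad]


BASE_OPTIMIZERS = (sgd_direction, sign_direction)


def combined_step(params, grads, coeffs, lr):
    """Update each layer with its own linear combination of base directions.

    params, grads: dict mapping layer name -> list of floats
    coeffs: dict mapping layer name -> one weight per base optimizer
            (meta-learned in the paper's setting; given by hand here)
    """
    new_params = {}
    for name, weights in params.items():
        directions = [opt(grads[name]) for opt in BASE_OPTIMIZERS]
        step = [
            sum(c * d[i] for c, d in zip(coeffs[name], directions))
            for i in range(len(weights))
        ]
        new_params[name] = [w + lr * s for w, s in zip(weights, step)]
    return new_params


params = {"conv": [1.0, -2.0], "head": [0.5]}
grads = {"conv": [0.5, -1.0], "head": [2.0]}
coeffs = {"conv": [1.0, 0.0], "head": [0.0, 1.0]}  # conv follows SGD, head follows signSGD
print(combined_step(params, grads, coeffs, lr=0.1))
```

Stateful base optimizers such as Adam would need their moment estimates carried between steps; they are omitted to keep the sketch short.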

4. Continual HyperTransformer: A Meta-Learner for Continual Few-Shot Learning

Author: Vladymyrov, Max, Zhmoginov, Andrey, and Sandler, Mark
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined using a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. In order to learn from a continual sequence of tasks, we propose to recursively re-use the generated weights as input to the HT for the next task. This way, the generated CNN weights themselves act as a representation of previously learned tasks, and the HT is trained to update these weights so that the new task can be learned without forgetting past tasks. This approach is different from most continual learning algorithms that typically rely on using replay buffers, weight regularization or task-dependent architectural changes. We demonstrate that our proposed Continual HyperTransformer method equipped with a prototypical loss is capable of learning and retaining knowledge about past tasks for a variety of scenarios, including learning from mini-batches, and task-incremental and class-incremental learning scenarios., Comment: TMLR
Published: 2023

5. Training trajectories, mini-batch losses and the curious role of the learning rate

Author: Sandler, Mark, Zhmoginov, Andrey, Vladymyrov, Max, and Miller, Nolan
Subjects: Computer Science - Machine Learning
Abstract: Stochastic gradient descent plays a fundamental role in nearly all applications of deep learning. However, its ability to converge to a global minimum remains shrouded in mystery. In this paper we propose to study the behavior of the loss function on fixed mini-batches along SGD trajectories. We show that the loss function on a fixed batch appears to be remarkably convex-like. In particular, for ResNet the loss for any fixed mini-batch can be accurately modeled by a quadratic function, and a very low loss value can be reached in just one step of gradient descent with a sufficiently large learning rate. We propose a simple model that allows us to analyze the relationship between the gradients of stochastic mini-batches and the full batch. Our analysis allows us to discover the equivalency between iterate aggregates and specific learning rate schedules. In particular, for Exponential Moving Average (EMA) and Stochastic Weight Averaging we show that our proposed model matches the observed training trajectories on ImageNet. Our theoretical model predicts that an even simpler averaging technique, averaging just two points many steps apart, significantly improves accuracy compared to the baseline. We validated our findings on ImageNet and other datasets using the ResNet architecture., Comment: 21 pages, 14 figures
Published: 2023
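
The two iterate aggregates named in the abstract, an exponential moving average and an average of two points many steps apart, take only a few lines. A minimal sketch, assuming weights are plain lists of floats; `gap` (how many steps separate the two averaged points) is a hypothetical parameter name, and no claim is made about the values the authors used.

```python
def ema_of_iterates(iterates, decay):
    """Exponential moving average over a sequence of weight vectors."""
    avg = list(iterates[0])
    for w in iterates[1:]:
        avg = [decay * a + (1.0 - decay) * x for a, x in zip(avg, w)]
    return avg


def two_point_average(iterates, gap):
    """Average the final iterate with the iterate `gap` steps earlier."""
    if gap < 1 or gap >= len(iterates):
        raise ValueError("gap must be between 1 and len(iterates) - 1")
    last, earlier = iterates[-1], iterates[-1 - gap]
    return [(a + b) / 2.0 for a, b in zip(last, earlier)]


trajectory = [[0.0], [1.0], [2.0], [3.0]]
print(ema_of_iterates(trajectory, decay=0.5))
print(two_point_average(trajectory, gap=2))
```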

6. Transformers learn in-context by gradient descent

Author: von Oswald, Johannes, Niklasson, Eyvind, Randazzo, Ettore, Sacramento, João, Mordvintsev, Alexander, Zhmoginov, Andrey, and Vladymyrov, Max
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: At present, the mechanisms of in-context learning in Transformers are not well understood and remain mostly an intuition. In this paper, we suggest that training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations. We start by providing a simple weight construction that shows the equivalence of data transformations induced by 1) a single linear self-attention layer and by 2) gradient-descent (GD) on a regression loss. Motivated by that construction, we show empirically that when training self-attention-only Transformers on simple regression tasks either the models learned by GD and Transformers show great similarity or, remarkably, the weights found by optimization match the construction. Thus we show how trained Transformers become mesa-optimizers, i.e., learn models by gradient descent in their forward pass. This allows us, at least in the domain of regression problems, to mechanistically understand the inner workings of in-context learning in optimized Transformers. Building on this insight, we furthermore identify how Transformers surpass the performance of plain gradient descent by learning an iterative curvature correction and learn linear models on deep data representations to solve non-linear regression tasks. Finally, we discuss intriguing parallels to a mechanism identified to be crucial for in-context learning termed induction-head (Olsson et al., 2022) and show how it could be understood as a specific case of in-context learning by gradient descent learning within Transformers. Code to reproduce the experiments can be found at https://github.com/google-research/self-organising-systems/tree/master/transformers_learn_icl_by_gd .
Published: 2022
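
The simplest case of this equivalence can be checked numerically. With scalar targets, weights initialized at zero, and the loss L(W) = 1/(2N) Σ (W·x_i − y_i)², one gradient step with learning rate η predicts (η/N) Σ y_i (x_i·x_q) for a query x_q, which is exactly the output of softmax-free (linear) attention with keys x_i, values y_i, and query x_q, scaled by η/N. This sketch covers only that special case, not the paper's full weight construction; the function names are hypothetical.

```python
def gd_one_step_prediction(xs, ys, x_query, lr):
    """Predict y for x_query after one GD step from W = 0 on
    L(W) = 1/(2N) * sum_i (W . x_i - y_i)^2."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        residual = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j in range(d):
            grad[j] += residual * x[j] / n
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return sum(wj * qj for wj, qj in zip(w, x_query))


def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention: keys x_i, values y_i, query x_query, scaled by lr / N."""
    n = len(xs)
    scores = [sum(k * q for k, q in zip(x, x_query)) for x in xs]
    return lr / n * sum(y * s for y, s in zip(ys, scores))


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1.0, 2.0, 3.0]
query = [2.0, 1.0]
print(gd_one_step_prediction(xs, ys, query, lr=0.5))
print(linear_attention_prediction(xs, ys, query, lr=0.5))
```

Both functions return the same value (13/6 for this data, up to floating-point rounding), because the gradient step from zero is a weighted sum of the training inputs, and attention computes the same weighted sum through dot products with the query.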

7. Decentralized Learning with Multi-Headed Distillation

Author: Zhmoginov, Andrey, Sandler, Mark, Miller, Nolan, Kristiansen, Gus, and Vladymyrov, Max
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Decentralized learning with private data is a central problem in machine learning. We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn from each other, without having to share their data, weights or weight updates. Our approach is communication efficient, utilizes an unlabeled public dataset and uses multiple auxiliary heads for each client, greatly improving training efficiency in the case of heterogeneous data. This approach allows individual models to preserve and enhance performance on their private tasks while also dramatically improving their performance on the global aggregated data distribution. We study the effects of data and model architecture heterogeneity and the impact of the underlying communication graph topology on learning efficiency and show that our agents can significantly improve their performance compared to learning in isolation.
Published: 2022

8. Fine-tuning Image Transformers using Learnable Memory

Author: Sandler, Mark, Zhmoginov, Andrey, Vladymyrov, Max, and Jackson, Andrew
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper we propose augmenting Vision Transformer models with learnable memory tokens. Our approach allows the model to adapt to new tasks, using few parameters, while optionally preserving its capabilities on previously learned tasks. At each layer we introduce a set of learnable embedding vectors that provide contextual information useful for specific datasets. We call these "memory tokens". We show that augmenting a model with just a handful of such tokens per layer significantly improves accuracy when compared to conventional head-only fine-tuning, and performs only slightly below the significantly more expensive full fine-tuning. We then propose an attention-masking approach that enables extension to new downstream tasks with computation reuse. In this setup, in addition to being parameter-efficient, models can execute both old and new tasks as part of a single inference at a small incremental cost., Comment: CVPR 2022, to appear
Published: 2022
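
A minimal sketch of one attention layer with memory tokens, assuming single-head attention with identity query/key/value projections (real ViT layers use learned projections, several heads, and an MLP). As we read the description, the memory vectors act as extra keys and values, and outputs are kept only for the original tokens, so the next layer sees a sequence of unchanged length and can receive its own memory tokens. Function names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_with_memory(tokens, memory):
    """Self-attention in which `memory` vectors are extra keys/values.

    Outputs are produced only for the original tokens; memory positions are
    dropped, so each layer can be given its own fresh memory tokens.
    """
    context = tokens + memory
    scale = 1.0 / math.sqrt(len(tokens[0]))
    outputs = []
    for q in tokens:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in context])
        outputs.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(len(q))])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0]]
memory = [[2.0, 2.0]]  # learnable in practice; fixed here for illustration
print(attend_with_memory(tokens, []))
print(attend_with_memory(tokens, memory))
```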

9. HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning

Author: Zhmoginov, Andrey, Sandler, Mark, and Vladymyrov, Max
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work we propose a HyperTransformer, a Transformer-based model for supervised and semi-supervised few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity Transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable.
Published: 2022

10. Compositional Models: Multi-Task Learning and Knowledge Transfer with Modular Networks

Author: Zhmoginov, Andrey, Bashkirova, Dina, and Sandler, Mark
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Conditional computation and modular networks have been recently proposed for multitask learning and other problems as a way to decompose problem solving into multiple reusable computational blocks. We propose a new approach for learning modular networks based on the isometric version of ResNet with all residual blocks having the same configuration and the same number of parameters. This architectural choice allows adding, removing and changing the order of residual blocks. In our method, the modules can be invoked repeatedly and allow knowledge transfer to novel tasks by adjusting the order of computation. This allows soft weight sharing between tasks with only a small increase in the number of parameters. We show that our method leads to interpretable self-organization of modules in the case of multi-task learning, transfer learning and domain adaptation while achieving competitive results on those tasks. From a practical perspective, our approach allows us to: (a) reuse existing modules for learning a new task by adjusting the computation order, (b) use it for unsupervised multi-source domain adaptation to illustrate that adaptation to unseen data can be achieved by only manipulating the order of pretrained modules, (c) show how our approach can be used to increase accuracy of existing architectures for image classification tasks such as ImageNet, without any parameter increase, by reusing the same block multiple times.
Published: 2021
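
A minimal sketch of the reordering idea, assuming toy shape-preserving residual blocks with hand-set parameters (in the paper the blocks are learned ResNet residual blocks). Because every block maps a vector to a vector of the same size, a task can be described by a list of block indices, and a block can appear more than once. `make_block`, `BLOCKS`, `TASK_PROGRAMS`, and `run_program` are hypothetical names.

```python
def make_block(scale, shift):
    """Toy shape-preserving residual block: x -> x + relu(scale * x + shift)."""
    def block(x):
        return [xi + max(0.0, scale * xi + shift) for xi in x]
    return block


# A shared pool of modules; every block has the same input/output size.
BLOCKS = [make_block(0.5, 0.0), make_block(-1.0, 0.1), make_block(0.2, -0.3)]

# Hypothetical per-task "programs": the order in which modules are invoked.
TASK_PROGRAMS = {"task_a": [0, 1, 2], "task_b": [2, 0, 0, 1]}


def run_program(x, program):
    """Apply the modules listed in `program`, in order; repeats are allowed."""
    for index in program:
        x = BLOCKS[index](x)
    return x


for task, program in TASK_PROGRAMS.items():
    print(task, run_program([1.0, -2.0], program))
```

Adapting to a new task in this picture means searching over or learning a new program while the shared blocks stay fixed, which is why the parameter count barely grows.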

11. BasisNet: Two-stage Model Synthesis for Efficient Inference

Author: Zhang, Mingda, Chu, Chun-Te, Zhmoginov, Andrey, Howard, Andrew, Jou, Brendan, Zhu, Yukun, Zhang, Li, Hwa, Rebecca, and Kovashka, Adriana
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we present BasisNet which combines recent advancements in efficient neural network architectures, conditional computation, and early termination in a simple new form. Our approach incorporates a lightweight model to preview the input and generate input-dependent combination coefficients, which later control the synthesis of a more accurate specialist model to make the final prediction. The two-stage model synthesis strategy can be applied to any network architecture, and both stages are jointly trained. We also show that proper training recipes are critical for increasing generalizability for such high-capacity neural networks. On the ImageNet classification benchmark, our BasisNet with MobileNets as backbone demonstrated a clear advantage in accuracy-efficiency trade-off over several strong baselines. Specifically, BasisNet-MobileNetV3 obtained 80.3% top-1 accuracy with only 290M Multiply-Add operations, halving the computational cost of previous state-of-the-art without sacrificing accuracy. With early termination, the average cost can be further reduced to 198M MAdds while maintaining accuracy of 80.0% on ImageNet., Comment: To appear, 4th Workshop on Efficient Deep Learning for Computer Vision (ECV2021), CVPR2021 Workshop
Published: 2021
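
A minimal sketch of two-stage synthesis for a single linear layer, assuming the lightweight model is one linear layer followed by a softmax (in the paper it is a small network that previews the input, and the synthesized specialist is a full MobileNet). The names and the toy basis matrices are hypothetical.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def synthesize(bases, coeffs):
    """Specialist weights as a coefficient-weighted sum of basis matrices."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(c * b[i][j] for c, b in zip(coeffs, bases)) for j in range(cols)]
        for i in range(rows)
    ]


def basis_forward(x, preview_weights, bases):
    coeffs = softmax(matvec(preview_weights, x))  # stage 1: input-dependent coefficients
    specialist = synthesize(bases, coeffs)        # stage 2: per-input specialist layer
    return matvec(specialist, x)


bases = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
preview = [[0.0, 0.0], [0.0, 0.0]]  # zero logits give equal coefficients
print(basis_forward([2.0, 4.0], preview, bases))
```

Since the coefficients depend on the input, each example gets its own specialist weights, while the expensive part of inference runs only once per example.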

12. Meta-Learning Bidirectional Update Rules

Author: Sandler, Mark, Vladymyrov, Max, Zhmoginov, Andrey, Miller, Nolan, Jackson, Andrew, Madams, Tom, and Arcas, Blaise Aguera y
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: In this paper, we introduce a new type of generalized neural network where neurons and synapses maintain multiple states. We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule. In our generalized framework, networks have no explicit notion of gradients and never receive them. The synapses and neurons are updated using a bidirectional Hebb-style update rule parameterized by a shared low-dimensional "genome". We show that such genomes can be meta-learned from scratch, using either conventional optimization techniques, or evolutionary strategies, such as CMA-ES. The resulting update rules generalize to unseen tasks and train faster than gradient descent based optimizers for several standard computer vision and synthetic tasks., Comment: ICML 2021, 17 pages
Published: 2021

13. Large-Scale Generative Data-Free Distillation

Author: Luo, Liangchen, Sandler, Mark, Lin, Zi, Zhmoginov, Andrey, and Howard, Andrew
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Knowledge distillation is one of the most popular and effective techniques for knowledge transfer, model compression and semi-supervised learning. Most existing distillation approaches require access to original or augmented training samples. But this can be problematic in practice due to privacy, proprietary and availability concerns. Recent work has put forward some methods to tackle this problem, but they are either highly time-consuming or unable to scale to large datasets. To this end, we propose a new method to train a generative image model by leveraging the intrinsic normalization layers' statistics of the trained teacher network. This enables us to build an ensemble of generators without training data that can efficiently produce substitute inputs for subsequent distillation. The proposed method pushes forward the data-free distillation performance on CIFAR-10 and CIFAR-100 to 95.02% and 77.02% respectively. Furthermore, we are able to scale it to the ImageNet dataset, which, to the best of our knowledge, has never been done using generative models in a data-free setting.
Published: 2020
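
The central ingredient, using the teacher's normalization-layer statistics, can be sketched as a loss on a batch of activations produced from generated images. A minimal sketch, assuming a squared-difference penalty between the batch mean/variance and the stored running mean/variance of one normalization layer (the exact distance and weighting used in the paper are not given in the abstract); the function name is hypothetical.

```python
def bn_statistics_loss(features, running_mean, running_var):
    """Squared distance between batch statistics and stored BN statistics.

    features: list of samples, each a list of per-channel activations taken
    at the input of one normalization layer of the teacher.
    """
    n = len(features)
    channels = len(running_mean)
    mean = [sum(f[c] for f in features) / n for c in range(channels)]
    var = [sum((f[c] - mean[c]) ** 2 for f in features) / n for c in range(channels)]
    return sum(
        (m - rm) ** 2 + (v - rv) ** 2
        for m, rm, v, rv in zip(mean, running_mean, var, running_var)
    )


batch = [[1.0, 3.0], [3.0, 5.0]]
print(bn_statistics_loss(batch, running_mean=[2.0, 4.0], running_var=[1.0, 1.0]))
print(bn_statistics_loss(batch, running_mean=[0.0, 4.0], running_var=[1.0, 1.0]))
```

In a full pipeline this term would be summed over the teacher's normalization layers and minimized with respect to the generator's parameters, with the teacher kept frozen.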

14. Image segmentation via Cellular Automata

Author: Sandler, Mark, Zhmoginov, Andrey, Luo, Liangcheng, Mordvintsev, Alexander, Randazzo, Ettore, and Arcas, Blaise Agúera y
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: In this paper, we propose a new approach for building cellular automata to solve real-world segmentation problems. We design and train a cellular automaton that can successfully segment high-resolution images. We consider a colony that densely inhabits the pixel grid, and all cells are governed by a randomized update that uses the current state, the color, and the state of the 3×3 neighborhood. The space of possible rules is defined by a small neural network. The update rule is applied repeatedly in parallel to a large random subset of cells and after convergence is used to produce segmentation masks that are then back-propagated to learn the optimal update rules using standard gradient descent methods. We demonstrate that such models can be learned efficiently with only limited trajectory length and that they show remarkable ability to organize the information to produce a globally consistent segmentation result, using only local information exchange. From a practical perspective, our approach allows us to build very efficient models -- our smallest automaton uses fewer than 10,000 parameters to solve complex segmentation tasks.
Published: 2020
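
A minimal sketch of one stochastic update step, assuming a single scalar state per cell and a hand-written rule standing in for the small neural network (the paper learns the rule by back-propagating through many steps). Each selected cell reads its own state, its pixel color, and its 3×3 neighborhood from the previous grid, so all updates within a step happen in parallel. `ca_step`, `fire_rate`, and `average_rule` are hypothetical names.

```python
import random


def ca_step(state, image, rule, fire_rate, rng):
    """One update of a cellular automaton on an H x W grid.

    Each cell is updated with probability `fire_rate`; updates read only the
    previous grid, so they are applied in parallel.
    """
    height, width = len(state), len(state[0])
    new_state = [row[:] for row in state]
    for i in range(height):
        for j in range(width):
            if rng.random() >= fire_rate:
                continue  # this cell keeps its state in this step
            neighborhood = [
                state[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if 0 <= i + di < height and 0 <= j + dj < width
            ]
            new_state[i][j] = rule(state[i][j], image[i][j], neighborhood)
    return new_state


def average_rule(current, color, neighborhood):
    """Toy rule: move halfway toward the neighborhood mean, nudged by color."""
    mean = sum(neighborhood) / len(neighborhood)
    return 0.5 * current + 0.5 * mean + 0.1 * color


rng = random.Random(0)
state = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
image = [[0.0] * 3 for _ in range(3)]
for _ in range(5):
    state = ca_step(state, image, average_rule, fire_rate=0.5, rng=rng)
print(state)
```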

15. Non-discriminative data or weak model? On the relative importance of data and model resolution

Author: Sandler, Mark, Baccash, Jonathan, Zhmoginov, Andrey, and Howard, Andrew
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We explore the question of how the resolution of the input image ("input resolution") affects the performance of a neural network when compared to the resolution of the hidden layers ("internal resolution"). Adjusting these characteristics is frequently used as a hyperparameter providing a trade-off between model performance and accuracy. An intuitive interpretation is that the reduced information content in the low-resolution input causes decay in the accuracy. In this paper, we show that up to a point, the input resolution alone plays little role in the network performance, and it is the internal resolution that is the critical driver of model quality. We then build on these insights to develop novel neural network architectures that we call Isometric Neural Networks. These models maintain a fixed internal resolution throughout their entire depth. We demonstrate that they lead to high accuracy models with low activation footprint and parameter count., Comment: ICCV 2019 Workshop on Real-World Recognition from Low-Quality Images and Videos
Published: 2019

16. Information-Bottleneck Approach to Salient Region Discovery

Author: Zhmoginov, Andrey, Fischer, Ian, and Sandler, Mark
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Theory, Computer Science - Machine Learning
Abstract: We propose a new method for learning image attention masks in a semi-supervised setting based on the Information Bottleneck principle. Provided with a set of labeled images, the mask generation model minimizes the mutual information between the input and the masked image while maximizing the mutual information between the same masked image and the image label. In contrast with other approaches, our attention model produces a Boolean rather than a continuous mask, entirely concealing the information in masked-out pixels. Using synthetic datasets based on MNIST and CIFAR10, as well as the SVHN dataset, we demonstrate that our method can successfully attend to features known to define the image class.
Published: 2019

17. K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning

Author: Mudrakarta, Pramod Kaushik, Sandler, Mark, Zhmoginov, Andrey, and Howard, Andrew
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks. The basic approach is to learn a model patch - a small set of parameters - that will specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases is sufficient to convert a pretrained network to perform well on qualitatively different problems (e.g. converting a Single Shot MultiBox Detection (SSD) model into a 1000-class image classification model while reusing 98% of parameters of the SSD feature extractor). Similarly, we show that re-learning existing low-parameter layers (such as depth-wise convolutions) while keeping the rest of the network frozen also improves transfer-learning accuracy significantly. Our approach allows both simultaneous (multi-task) as well as sequential transfer learning. In several multi-task learning problems, despite using far fewer parameters than traditional logits-only fine-tuning, we match single-task performance., Comment: published at ICLR 2019
Published: 2018

18. MobileNetV2: Inverted Residuals and Linear Bottlenecks

Author: Sandler, Mark, Howard, Andrew, Zhu, Menglong, Zhmoginov, Andrey, and Chen, Liang-Chieh
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, opposite to traditional residual models which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection, and VOC image segmentation. We evaluate the trade-offs between accuracy and the number of operations measured by multiply-adds (MAdd), as well as the number of parameters.
Published: 2018
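
A minimal sketch of one stride-1 inverted residual block in plain Python, assuming a feature map stored as nested lists indexed [row][column][channel], zero padding, and no batch normalization (the published block also uses batch normalization, and strided blocks drop the residual connection). The block expands a thin input with a 1×1 convolution and ReLU6, filters each channel with a 3×3 depthwise convolution and ReLU6, projects back to a thin bottleneck with a linear 1×1 convolution (no non-linearity), and adds the input. Function names are hypothetical.

```python
def relu6(v):
    """ReLU6 activation: clip to [0, 6]."""
    return min(max(v, 0.0), 6.0)


def pointwise(fmap, weights, act=None):
    """1x1 convolution; weights[out_channel][in_channel]."""
    out = []
    for row in fmap:
        new_row = []
        for pixel in row:
            vals = [sum(w * p for w, p in zip(w_out, pixel)) for w_out in weights]
            new_row.append([act(v) for v in vals] if act else vals)
        out.append(new_row)
    return out


def depthwise3x3(fmap, kernels, act):
    """3x3 depthwise convolution, stride 1, zero padding; kernels[channel][di][dj]."""
    height, width, channels = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for i in range(height):
        new_row = []
        for j in range(width):
            pixel = []
            for c in range(channels):
                total = 0.0
                for di in range(3):
                    for dj in range(3):
                        y, x = i + di - 1, j + dj - 1
                        if 0 <= y < height and 0 <= x < width:
                            total += kernels[c][di][dj] * fmap[y][x][c]
                pixel.append(act(total))
            new_row.append(pixel)
        out.append(new_row)
    return out


def inverted_residual(fmap, expand_w, dw_kernels, project_w):
    """Stride-1 block: expand (ReLU6), depthwise (ReLU6), linear projection, add input."""
    x = pointwise(fmap, expand_w, relu6)     # thin -> wide
    x = depthwise3x3(x, dw_kernels, relu6)   # per-channel spatial filtering
    x = pointwise(x, project_w)              # wide -> thin, linear bottleneck
    return [
        [[a + b for a, b in zip(p_in, p_out)] for p_in, p_out in zip(row_in, row_out)]
        for row_in, row_out in zip(fmap, x)
    ]


fmap = [[[1.0, 2.0]]]  # 1x1 spatial map with 2 channels
expand = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # 2 -> 4 channels
identity_kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
dw = [identity_kernel] * 4
project = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # 4 -> 2 channels
print(inverted_residual(fmap, expand, dw, project))
```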

19. CycleGAN, a Master of Steganography

Author: Chu, Casey, Zhmoginov, Andrey, and Sandler, Mark
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning, Statistics - Machine Learning
Abstract: CycleGAN (Zhu et al. 2017) is one recent successful approach to learn a transformation between two image distributions. In a series of experiments, we demonstrate an intriguing property of the model: CycleGAN learns to "hide" information about a source image into the images it generates in a nearly imperceptible, high-frequency signal. This trick ensures that the generator can recover the original sample and thus satisfy the cyclic consistency requirement, while the generated image remains realistic. We connect this phenomenon with adversarial attacks by viewing CycleGAN's training procedure as training a generator of adversarial examples and demonstrate that the cyclic consistency loss causes CycleGAN to be especially vulnerable to adversarial attacks., Comment: NIPS 2017, workshop on Machine Deception
Published: 2017

20. The Power of Sparsity in Convolutional Neural Networks

Author: Changpinyo, Soravit, Sandler, Mark, and Zhmoginov, Andrey
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep convolutional networks are well-known for their high computational and memory demands. Given limited resources, how does one design a network that balances its size, training time, and prediction accuracy? A surprisingly effective approach to trade accuracy for size and speed is to simply reduce the number of channels in each convolutional layer by a fixed fraction and retrain the network. In many cases this leads to significantly smaller networks with only minimal changes to accuracy. In this paper, we take a step further by empirically examining a strategy for deactivating connections between filters in convolutional layers in a way that allows us to harvest savings both in run-time and memory for many network architectures. More specifically, we generalize 2D convolution to use a channel-wise sparse connection structure and show that this leads to significantly better results than the baseline approach for large networks including VGG and Inception V3.
Published: 2017

21. Information-Bottleneck Approach to Salient Region Discovery

Author: Zhmoginov, Andrey, Fischer, Ian, and Sandler, Mark (series founding editors: Goos, Gerhard and Hartmanis, Juris; editorial board members: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, Woeginger, Gerhard, and Yung, Moti; volume editors: Hutter, Frank, Kersting, Kristian, Lijffijt, Jefrey, and Valera, Isabel)
Published: 2021

22. Inverting face embeddings with convolutional neural networks

Author: Zhmoginov, Andrey and Sandler, Mark
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Deep neural networks have dramatically advanced the state of the art for many areas of machine learning. Recently they have been shown to have a remarkable ability to generate highly complex visual artifacts such as images and text rather than simply recognize them. In this work we use neural networks to effectively invert low-dimensional face embeddings while producing realistic-looking, consistent images. Our contribution is twofold: first, we show that gradient-ascent-style approaches can be used to reproduce consistent images with the help of a guiding image. Second, we demonstrate that we can train a separate neural network to effectively solve the minimization problem in one pass and generate images in real time. We then evaluate the loss imposed by using a neural network instead of gradient descent by comparing the final values of the minimized loss function.
Published: 2016
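
The optimization-based variant can be illustrated on a toy problem: find an input x whose embedding f(x) matches a target embedding e, with an extra term pulling x toward a guiding image g. A minimal sketch, assuming a linear embedding f(x) = A x and the loss ||A x − e||² + λ||x − g||² minimized by plain gradient descent starting from g; the paper's embedding network, exact loss terms, and the one-pass inversion network are not reproduced here, and all names are hypothetical.

```python
def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def invert_embedding(a, target, guide, lam, lr, steps):
    """Minimize ||A x - target||^2 + lam * ||x - guide||^2 by gradient descent,
    starting from the guiding image."""
    x = list(guide)
    a_t = transpose(a)
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(a, x), target)]
        grad_fit = matvec(a_t, residual)
        grad = [2.0 * gf + 2.0 * lam * (xi - gi) for gf, xi, gi in zip(grad_fit, x, guide)]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


def loss(a, x, target, guide, lam):
    fit = sum((p - t) ** 2 for p, t in zip(matvec(a, x), target))
    prior = sum((xi - gi) ** 2 for xi, gi in zip(x, guide))
    return fit + lam * prior


a = [[1.0, 0.0], [0.0, 2.0]]  # toy linear "embedding network"
target = [1.0, 2.0]           # embedding to invert
guide = [0.0, 0.0]            # guiding image
x = invert_embedding(a, target, guide, lam=0.1, lr=0.1, steps=200)
print(x, loss(a, x, target, guide, 0.1))
```

For this diagonal toy problem the minimizer is known in closed form (x_i = a_i e_i / (a_i² + λ)), which makes it easy to check that the loop converges.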

23. Antimatter interferometry for gravity measurements

Author: Hamilton, Paul, Zhmoginov, Andrey, Robicheaux, Francis, Fajans, Joel, Wurtele, Jonathan, and Mueller, Holger
Subjects: Physics - Atomic Physics, General Relativity and Quantum Cosmology, High Energy Physics - Phenomenology, Physics - Plasma Physics
Abstract: We describe a light-pulse atom interferometer that is suitable for any species of atom and even for electrons and protons as well as their antiparticles, in particular for testing the Einstein equivalence principle with antihydrogen. The design obviates the need for resonant lasers through far-off resonant Bragg beam splitters and makes efficient use of scarce atoms by magnetic confinement and atom recycling. We expect to reach an initial accuracy of better than 1% for the acceleration of free fall of antihydrogen, which can be improved to the part-per-million level., Comment: 5 pages, 4 figures. Minor changes, accepted for PRL
Published: 2013

24. Nonlinear dynamics of antihydrogen in magnetostatic traps: implications for gravitational measurements

Author: Zhmoginov, Andrey, Charman, Andrew, Fajans, Joel, and Wurtele, Jonathan
Subjects: Physics - Atomic Physics
Abstract: The influence of gravity on antihydrogen dynamics in magnetic traps is studied. The advantages and disadvantages of various techniques for measuring the ratio of the gravitational mass to the inertial mass of antihydrogen are discussed. Theoretical considerations and numerical simulations indicate that stochasticity may be especially important for some experimental techniques in vertically oriented traps., Comment: 12 pages, 9 figures
Published: 2013

25. Information-Bottleneck Approach to Salient Region Discovery

Author: Zhmoginov, Andrey, Fischer, Ian, and Sandler, Mark
Published: 2021

26. Antimatter interferometry for gravity measurements

Author: Hamilton, Paul, Zhmoginov, Andrey, Robicheaux, Francis, Fajans, Joel, Wurtele, Jonathan S, and Müller, Holger
Subjects: physics.atom-ph, gr-qc, hep-ph, physics.plasm-ph, General Physics, Physical Sciences, Mathematical Sciences, Engineering
Abstract: We describe a light-pulse atom interferometer that is suitable for any species of atom and even for electrons and protons as well as their antiparticles, in particular, for testing the Einstein equivalence principle with antihydrogen. The design obviates the need for resonant lasers through far-off resonant Bragg beam splitters and makes efficient use of scarce atoms by magnetic confinement and atom recycling. We expect to reach an initial accuracy of better than 1% for the acceleration of the free fall of antihydrogen, which can be improved to the part-per-million level.
Published: 2014

27. Decentralized Learning with Multi-Headed Distillation

Author: Zhmoginov, Andrey, Sandler, Mark, Miller, Nolan, Kristiansen, Gus, and Vladymyrov, Max
Published: 2023

28. Continual Few-Shot Learning Using HyperTransformers

Author: Vladymyrov, Max, Zhmoginov, Andrey, and Sandler, Mark
Abstract: We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined using a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. In order to learn from a continual sequence of tasks, we propose to recursively re-use the generated weights as input to the HT for the next task. This way, the generated CNN weights themselves act as a representation of previously learned tasks, and the HT is trained to update these weights so that the new task can be learned without forgetting past tasks. This approach is different from most continual learning algorithms that typically rely on using replay buffers, weight regularization or task-dependent architectural changes. We demonstrate that our proposed Continual HyperTransformer method equipped with a prototypical loss is capable of learning and retaining knowledge about past tasks for a variety of scenarios, including learning from mini-batches, and task-incremental and class-incremental learning scenarios.
Published: 2023

29. Fine-tuning Image Transformers using Learnable Memory

Author: Sandler, Mark, Zhmoginov, Andrey, Vladymyrov, Max, and Jackson, Andrew
Published: 2022

30. BasisNet: Two-stage Model Synthesis for Efficient Inference

Author: Zhang, Mingda, Chu, Chun-Te, Zhmoginov, Andrey, Howard, Andrew, Jou, Brendan, Zhu, Yukun, Zhang, Li, Hwa, Rebecca, and Kovashka, Adriana
Published: 2021

31. Non-Discriminative Data or Weak Model? On the Relative Importance of Data and Model Resolution

Author: Sandler, Mark, Baccash, Jonathan, Zhmoginov, Andrey, and Howard, Andrew
Published: 2019

32. MobileNetV2: Inverted Residuals and Linear Bottlenecks

Author: Sandler, Mark, Howard, Andrew, Zhu, Menglong, Zhmoginov, Andrey, and Chen, Liang-Chieh
Published: 2018

33. Fundamental Tests of Physics with Antihydrogen at ALPHA

Author: Miranda, Daniel, Cesar, Claudio L., Sacramento, Rodrigo L., Ahmadi, Mostafa, Nolan, Paul, Pusa, Petteri, Baquero-Ruiz, Marcelo, Carruth, Celeste, Charman, Andrew, Evans, Lenny, Fajans, Joel, Povilus, Alex, So, Chukman, Wurtele, Jonathan, Zhmoginov, Andrey, Bertsche, William, Butler, Eoin, Ishida, Akira, Capra, Andrea, Menary, Scott, Charlton, Michael, Eriksson, Stefan, Isaac, Christopher, Jones, Steve, Madsen, Niels, Maxwell, Daniel, Sameed, Muhammed, van der Werf, Dirk, Evetts, Nathan, Gutierrez, Andrea, Hardy, Walter, Friesen, Tim, Hangst, Jeffrey S., Rasmussen, Chris O., Tharp, Timothy D., Fujiwara, Makoto, Gill, David, Kurchaninov, Leonid, Mckenna, Joseph, Michan, Juan M., Olchanski, Konstantin, Olin, Art, Hayden, Michael, Munich, Justine J., Jonsell, Svante, Momose, Takamasa, Robicheaux, Francis, Sarid, Eli, and Thompson, Robert I.
Published: 2016

34. Resonant Wave-Particle Manipulation Techniques

Author: Zhmoginov, Andrey [Princeton Plasma Physics Lab. (PPPL), Princeton, NJ (United States); Princeton Univ., NJ (United States). Dept. of Astrophysical Sciences]
Published: 2012