Author: "Botev, Aleksandar" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Botev, Aleksandar"' showing total 24 results

1. RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Author: Botev, Aleksandar, De, Soham, Smith, Samuel L, Fernando, Anushan, Muraru, George-Cristian, Haroun, Ruba, Berrada, Leonard, Pascanu, Razvan, Sessa, Pier Giuseppe, Dadashi, Robert, Hussenot, Léonard, Ferret, Johan, Girgin, Sertan, Bachem, Olivier, Andreev, Alek, Kenealy, Kathleen, Mesnard, Thomas, Hardin, Cassidy, Bhupatiraju, Surya, Pathak, Shreya, Sifre, Laurent, Rivière, Morgane, Kale, Mihir Sanjay, Love, Juliette, Tafti, Pouya, Joulin, Armand, Fiedel, Noah, Senter, Evan, Chen, Yutian, Srinivasan, Srivatsan, Desjardins, Guillaume, Budden, David, Doucet, Arnaud, Vikram, Sharad, Paszke, Adam, Gale, Trevor, Borgeaud, Sebastian, Chen, Charlie, Brock, Andy, Paterson, Antonia, Brennan, Jenny, Risdal, Meg, Gundluru, Raj, Devanathan, Nesh, Mooney, Paul, Chauhan, Nilay, Culliton, Phil, Martins, Luiz Gustavo, Bandy, Elisa, Huntsperger, David, Cameron, Glenn, Zucker, Arthur, Warkentin, Tris, Peran, Ludovic, Giang, Minh, Ghahramani, Zoubin, Farabet, Clément, Kavukcuoglu, Koray, Hassabis, Demis, Hadsell, Raia, Teh, Yee Whye, and de Freitas, Nando
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
Published: 2024

2. Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Author: De, Soham, Smith, Samuel L., Fernando, Anushan, Botev, Aleksandar, Cristian-Muraru, George, Gu, Albert, Haroun, Ruba, Berrada, Leonard, Chen, Yutian, Srinivasan, Srivatsan, Desjardins, Guillaume, Doucet, Arnaud, Budden, David, Teh, Yee Whye, Pascanu, Razvan, De Freitas, Nando, and Gulcehre, Caglar
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention. Hawk exceeds the reported performance of Mamba on downstream tasks, while Griffin matches the performance of Llama-2 despite being trained on over 6 times fewer tokens. We also show that Griffin can extrapolate on sequences significantly longer than those seen during training. Our models match the hardware efficiency of Transformers during training, and during inference they have lower latency and significantly higher throughput. We scale Griffin up to 14B parameters, and explain how to shard our models for efficient distributed training.
Comment: 25 pages, 11 figures
Published: 2024

3. Applications of flow models to the generation of correlated lattice QCD ensembles

Author: Abbott, Ryan, Botev, Aleksandar, Boyda, Denis, Hackett, Daniel C., Kanwar, Gurtej, Racanière, Sébastien, Rezende, Danilo J., Romero-López, Fernando, Shanahan, Phiala E., and Urban, Julian M.
Subjects: High Energy Physics - Lattice, Computer Science - Machine Learning
Abstract: Machine-learned normalizing flows can be used in the context of lattice quantum field theory to generate statistically correlated ensembles of lattice gauge fields at different action parameters. This work demonstrates how these correlations can be exploited for variance reduction in the computation of observables. Three different proof-of-concept applications are demonstrated using a novel residual flow architecture: continuum limits of gauge theories, the mass dependence of QCD observables, and hadronic matrix elements based on the Feynman-Hellmann approach. In all three cases, it is shown that statistical uncertainties are significantly reduced when machine-learned flows are incorporated as compared with the same calculations performed with uncorrelated ensembles or direct reweighting.
Comment: 12 pages, 2 tables, 5 figures. v2: accepted for publication
Published: 2024

4. Normalizing flows for lattice gauge theory in arbitrary space-time dimension

Author: Abbott, Ryan, Albergo, Michael S., Botev, Aleksandar, Boyda, Denis, Cranmer, Kyle, Hackett, Daniel C., Kanwar, Gurtej, Matthews, Alexander G. D. G., Racanière, Sébastien, Razavi, Ali, Rezende, Danilo J., Romero-López, Fernando, Shanahan, Phiala E., and Urban, Julian M.
Subjects: High Energy Physics - Lattice, Condensed Matter - Statistical Mechanics, Computer Science - Machine Learning
Abstract: Applications of normalizing flows to the sampling of field configurations in lattice gauge theory have so far been explored almost exclusively in two space-time dimensions. We report new algorithmic developments of gauge-equivariant flow architectures facilitating the generalization to higher-dimensional lattice geometries. Specifically, we discuss masked autoregressive transformations with tractable and unbiased Jacobian determinants, a key ingredient for scalable and asymptotically exact flow-based sampling algorithms. For concreteness, results from a proof-of-principle application to SU(3) lattice gauge theory in four space-time dimensions are reported.
Published: 2023

5. Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

Author: He, Bobby, Martens, James, Zhang, Guodong, Botev, Aleksandar, Brock, Andrew, Smith, Samuel L, and Teh, Yee Whye
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: Skip connections and normalisation layers form two standard architectural components that are ubiquitous for the training of Deep Neural Networks (DNNs), but whose precise roles are poorly understood. Recent approaches such as Deep Kernel Shaping have made progress towards reducing our reliance on them, using insights from wide NN kernel theory to improve signal propagation in vanilla DNNs (which we define as networks without skips or normalisation). However, these approaches are incompatible with the self-attention layers present in transformers, whose kernels are intrinsically more complicated to analyse and control. And so the question remains: is it possible to train deep vanilla transformers? We answer this question in the affirmative by designing several approaches that use combinations of parameter initialisations, bias matrices and location-dependent rescaling to achieve faithful signal propagation in vanilla transformers. Our methods address various intricacies specific to signal propagation in transformers, including the interaction with positional encoding and causal masking. In experiments on WikiText-103 and C4, our approaches enable deep transformers without normalisation to train at speeds matching their standard counterparts, and deep vanilla transformers to reach the same performance as standard ones after about 5 times more iterations.
Comment: ICLR 2023
Published: 2023

6. Aspects of scaling and scalability for flow-based sampling of lattice QCD

Author: Abbott, Ryan, Albergo, Michael S., Botev, Aleksandar, Boyda, Denis, Cranmer, Kyle, Hackett, Daniel C., Matthews, Alexander G. D. G., Racanière, Sébastien, Razavi, Ali, Rezende, Danilo J., Romero-López, Fernando, Shanahan, Phiala E., and Urban, Julian M.
Subjects: High Energy Physics - Lattice, Condensed Matter - Statistical Mechanics, Computer Science - Machine Learning
Abstract: Recent applications of machine-learned normalizing flows to sampling in lattice field theory suggest that such methods may be able to mitigate critical slowing down and topological freezing. However, these demonstrations have been at the scale of toy models, and it remains to be determined whether they can be applied to state-of-the-art lattice quantum chromodynamics calculations. Assessing the viability of sampling algorithms for lattice field theory at scale has traditionally been accomplished using simple cost scaling laws, but as we discuss in this work, their utility is limited for flow-based approaches. We conclude that flow-based approaches to sampling are better thought of as a broad family of algorithms with different scaling properties, and that scalability must be assessed experimentally.
Comment: 22 pages, 8 figures
Published: 2022

7. Sampling QCD field configurations with gauge-equivariant flow models

Author: Abbott, Ryan, Albergo, Michael S., Botev, Aleksandar, Boyda, Denis, Cranmer, Kyle, Hackett, Daniel C., Kanwar, Gurtej, Matthews, Alexander G. D. G., Racanière, Sébastien, Razavi, Ali, Rezende, Danilo J., Romero-López, Fernando, Shanahan, Phiala E., and Urban, Julian M.
Subjects: High Energy Physics - Lattice
Abstract: Machine learning methods based on normalizing flows have been shown to address important challenges, such as critical slowing-down and topological freezing, in the sampling of gauge field configurations in simple lattice field theories. A critical question is whether this success will translate to studies of QCD. This Proceedings presents a status update on advances in this area. In particular, it is illustrated how recently developed algorithmic components may be combined to construct flow-based sampling algorithms for QCD in four dimensions. The prospects and challenges for future use of this approach in at-scale applications are summarized.
Comment: Submitted as a proceedings to the 39th International Symposium on Lattice Field Theory (Lattice 2022)
Published: 2022

8. Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers

Author: Zhang, Guodong, Botev, Aleksandar, and Martens, James
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Training very deep neural networks is still an extremely challenging task. The common solution is to use shortcut connections and normalization layers, which are both crucial ingredients in the popular ResNet architecture. However, there is strong evidence to suggest that ResNets behave more like ensembles of shallower networks than truly deep ones. Recently, it was shown that deep vanilla networks (i.e. networks without normalization layers or shortcut connections) can be trained as fast as ResNets by applying certain transformations to their activation functions. However, this method (called Deep Kernel Shaping) isn't fully compatible with ReLUs, and produces networks that overfit significantly more than ResNets on ImageNet. In this work, we rectify this situation by developing a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs. We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets (of the same width/depth), and significantly higher than those obtained with the Edge of Chaos (EOC) method. And unlike with EOC, the validation accuracies we obtain do not get worse with depth.
Comment: ICLR 2022
Published: 2022

9. SyMetric: Measuring the Quality of Learnt Hamiltonian Dynamics Inferred from Vision

Author: Higgins, Irina, Wirnsberger, Peter, Jaegle, Andrew, and Botev, Aleksandar
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: A recently proposed class of models attempts to learn latent dynamics from high-dimensional observations, like images, using priors informed by Hamiltonian mechanics. While these models have important potential applications in areas like robotics or autonomous driving, there is currently no good way to evaluate their performance: existing methods primarily rely on image reconstruction quality, which does not always reflect the quality of the learnt latent dynamics. In this work, we empirically highlight the problems with the existing measures and develop a set of new measures, including a binary indicator of whether the underlying Hamiltonian dynamics have been faithfully captured, which we call Symplecticity Metric or SyMetric. Our measures take advantage of the known properties of Hamiltonian dynamics and are more discriminative of the model's ability to capture the underlying dynamics than reconstruction error. Using SyMetric, we identify a set of architectural choices that significantly improve the performance of a previously proposed model for inferring latent dynamics from pixels, the Hamiltonian Generative Network (HGN). Unlike the original HGN, the new HGN++ is able to discover an interpretable phase space with physically meaningful latents on some datasets. Furthermore, it is stable for significantly longer rollouts on a diverse range of 13 datasets, producing rollouts of essentially infinite length both forward and backwards in time with no degradation in quality on a subset of the datasets.
Published: 2021

10. Which priors matter? Benchmarking models for learning latent dynamics

Author: Botev, Aleksandar, Jaegle, Andrew, Wirnsberger, Peter, Hennes, Daniel, and Higgins, Irina
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Learning dynamics is at the heart of many important applications of machine learning (ML), such as robotics and autonomous driving. In these settings, ML algorithms typically need to reason about a physical system using high dimensional observations, such as images, without access to the underlying state. Recently, several methods have proposed to integrate priors from classical mechanics into ML models to address the challenge of physical reasoning from images. In this work, we take a sober look at the current capabilities of these models. To this end, we introduce a suite consisting of 17 datasets with visual observations based on physical systems exhibiting a wide range of dynamics. We conduct a thorough and detailed comparison of the major classes of physically inspired methods alongside several strong baselines. While models that incorporate physical priors can often learn latent spaces with desirable properties, our results demonstrate that these methods fail to significantly improve upon standard techniques. Nonetheless, we find that the use of continuous and time-reversible dynamics benefits models of all classes.
Published: 2021

11. Better, Faster Fermionic Neural Networks

Author: Spencer, James S., Pfau, David, Botev, Aleksandar, and Foulkes, W. M. C.
Subjects: Physics - Computational Physics, Computer Science - Machine Learning, Physics - Chemical Physics
Abstract: The Fermionic Neural Network (FermiNet) is a recently-developed neural network architecture that can be used as a wavefunction Ansatz for many-electron systems, and has already demonstrated high accuracy on small systems. Here we present several improvements to the FermiNet that allow us to set new records for speed and accuracy on challenging systems. We find that increasing the size of the network is sufficient to reach chemical accuracy on atoms as large as argon. Through a combination of implementing FermiNet in JAX and simplifying several parts of the network, we are able to reduce the number of GPU hours needed to train the FermiNet on large systems by an order of magnitude. This enables us to run the FermiNet on the challenging transition of bicyclobutane to butadiene and compare against the PauliNet on the automerization of cyclobutadiene, and we achieve results near the state of the art for both.
Comment: To appear at the 3rd NeurIPS Workshop on Machine Learning and Physical Science
Published: 2020

12. Disentangling by Subspace Diffusion

Author: Pfau, David, Higgins, Irina, Botev, Aleksandar, and Racanière, Sébastien
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We present a novel nonparametric algorithm for symmetry-based disentangling of data manifolds, the Geometric Manifold Component Estimator (GEOMANCER). GEOMANCER provides a partial answer to the question posed by Higgins et al. (2018): is it possible to learn how to factorize a Lie group solely from observations of the orbit of an object it acts on? We show that fully unsupervised factorization of a data manifold is possible if the true metric of the manifold is known and each factor manifold has nontrivial holonomy -- for example, rotation in 3D. Our algorithm works by estimating the subspaces that are invariant under random walk diffusion, giving an approximation to the de Rham decomposition from differential geometry. We demonstrate the efficacy of GEOMANCER on several complex synthetic manifolds. Our work reduces the question of whether unsupervised disentangling is possible to the question of whether unsupervised metric learning is possible, providing a unifying insight into the geometric nature of representation learning.
Comment: Camera-ready version for NeurIPS 2020
Published: 2020

13. Aspects of scaling and scalability for flow-based sampling of lattice QCD

Author: Abbott, Ryan, Albergo, Michael S., Botev, Aleksandar, Boyda, Denis, Cranmer, Kyle, Hackett, Daniel C., Matthews, Alexander G. D. G., Racanière, Sébastien, Razavi, Ali, Rezende, Danilo J., Romero-López, Fernando, Shanahan, Phiala E., and Urban, Julian M.
Published: 2023
Full Text: View/download PDF

14. The Gauss-Newton matrix for Deep Learning models and its applications

Author: Botev, Aleksandar
Subjects: 006.3
Abstract: Deep Learning has recently become one of the most predominantly used techniques in the field of Machine Learning. Optimising these models, however, is very difficult and in order to scale the training to large datasets and model sizes practitioners use first-order optimisation methods. One of the main challenges of using the more sophisticated second-order optimisation methods is that the curvature matrices of the loss surfaces of neural networks are usually intractable, which is an open avenue for research. In this work, we investigate the Gauss-Newton matrix for neural networks and its application in different areas of Machine Learning. Firstly, we analyse the structure of the Hessian and Gauss-Newton matrices for Feed Forward Neural Networks. Several insightful results are presented, and the relationship of these two matrices to each other and to the Fisher matrix is discussed. Based on this analysis, we develop a block-diagonal Kronecker Factored approximation to the Gauss-Newton matrix. The method is experimentally validated in the context of second-order optimisation, where it achieves competitive performance to other approaches on three datasets. In the last part of this work, we investigate the application of the proposed method for constructing an approximation to the posterior distribution of the parameters of a neural network. The approximation is constructed by adapting the well known Laplace approximation using the Kronecker factored Gauss-Newton matrix approximation. The method is compared against Dropout, a commonly used technique for uncertainty estimation, and achieves better uncertainty estimates on out of distribution data and is less susceptible to adversarial attacks. By combining the Laplace approximation with the Bayesian framework for online learning, we develop a scalable method for overcoming catastrophic forgetting. It achieves significantly better results than other approaches in the literature on several sequential learning tasks. The final chapter discusses potential future research directions that could be of interest to the curious reader.
Published: 2020

15. Hamiltonian Generative Networks

Author: Toth, Peter, Rezende, Danilo Jimenez, Jaegle, Andrew, Racanière, Sébastien, Botev, Aleksandar, and Higgins, Irina
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The Hamiltonian formalism plays a central role in classical and quantum physics. Hamiltonians are the main tool for modelling the continuous time evolution of systems with conserved quantities, and they come equipped with many useful properties, like time reversibility and smooth interpolation in time. These properties are important for many machine learning problems - from sequence prediction to reinforcement learning and density modelling - but are not typically provided out of the box by standard tools such as recurrent neural networks. In this paper, we introduce the Hamiltonian Generative Network (HGN), the first approach capable of consistently learning Hamiltonian dynamics from high-dimensional observations (such as images) without restrictive domain assumptions. Once trained, we can use HGN to sample new trajectories, perform rollouts both forward and backward in time and even speed up or slow down the learned dynamics. We demonstrate how a simple modification of the network architecture turns HGN into a powerful normalising flow model, called Neural Hamiltonian Flow (NHF), that uses Hamiltonian dynamics to model expressive densities. We hope that our work serves as a first practical demonstration of the value that the Hamiltonian formalism can bring to deep learning.
Published: 2019

16. Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting

Author: Ritter, Hippolyt, Botev, Aleksandar, and Barber, David
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: We introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. The method is grounded in a Bayesian online learning framework, where we recursively approximate the posterior after every task with a Gaussian, leading to a quadratic penalty on changes to the weights. The Laplace approximation requires calculating the Hessian around a mode, which is typically intractable for modern architectures. In order to make our method scalable, we leverage recent block-diagonal Kronecker factored approximations to the curvature. Our algorithm achieves over 90% test accuracy across a sequence of 50 instantiations of the permuted MNIST dataset, substantially outperforming related methods for overcoming catastrophic forgetting.
Comment: 13 pages, 6 figures
Published: 2018

17. Practical Gauss-Newton Optimisation for Deep Learning

Author: Botev, Aleksandar, Ritter, Hippolyt, and Barber, David
Subjects: Statistics - Machine Learning
Abstract: We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. Our resulting algorithm is competitive against state-of-the-art first-order optimisation methods, with sometimes significant improvement in optimisation performance. Unlike first-order methods, for which hyperparameter tuning of the optimisation parameters is often a laborious process, our approach can provide good performance even when used with default settings. A side result of our work is that for piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimisation.
Comment: ICML 2017
Published: 2017

18. Applications of flow models to the generation of correlated lattice QCD ensembles

Author: Abbott, Ryan, Botev, Aleksandar, Boyda, Denis, Hackett, Daniel C., Kanwar, Gurtej, Racanière, Sébastien, Rezende, Danilo J., Romero-López, Fernando, Shanahan, Phiala E., and Urban, Julian M.
Published: 2024
Full Text: View/download PDF

19. Nesterov's Accelerated Gradient and Momentum as approximations to Regularised Update Descent

Author: Botev, Aleksandar, Lever, Guy, and Barber, David
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: We present a unifying framework for adapting the update direction in gradient-based iterative optimization methods. As natural special cases we re-derive classical momentum and Nesterov's accelerated gradient method, lending a new intuitive interpretation to the latter algorithm. We show that a new algorithm, which we term Regularised Gradient Descent, can converge more quickly than either Nesterov's algorithm or the classical momentum algorithm.
Published: 2016

20. Dealing with a large number of classes -- Likelihood, Discrimination or Ranking?

Author: Barber, David and Botev, Aleksandar
Subjects: Statistics - Machine Learning
Abstract: We consider training probabilistic classifiers in the case of a large number of classes. The number of classes is assumed too large to perform exact normalisation over all classes. To account for this we consider a simple approach that directly approximates the likelihood. We show that this simple approach works well on toy problems and is competitive with recently introduced alternative non-likelihood based approximations. Furthermore, we relate this approach to a simple ranking objective. This leads us to suggest a specific setting for the optimal threshold in the ranking objective.
Published: 2016

21. Aspects of scaling and scalability for flow-based sampling of lattice QCD

Author: Massachusetts Institute of Technology. Center for Theoretical Physics, Abbott, Ryan, Albergo, Michael S., Botev, Aleksandar, Boyda, Denis, Cranmer, Kyle, Hackett, Daniel C., Matthews, Alexander G. D. G., Racanière, Sébastien, Razavi, Ali, Rezende, Danilo J., Romero-López, Fernando, Shanahan, Phiala E., and Urban, Julian M.
Abstract: Recent applications of machine-learned normalizing flows to sampling in lattice field theory suggest that such methods may be able to mitigate critical slowing down and topological freezing. However, these demonstrations have been at the scale of toy models, and it remains to be determined whether they can be applied to state-of-the-art lattice quantum chromodynamics calculations. Assessing the viability of sampling algorithms for lattice field theory at scale has traditionally been accomplished using simple cost scaling laws, but as we discuss in this work, their utility is limited for flow-based approaches. We conclude that flow-based approaches to sampling are better thought of as a broad family of algorithms with different scaling properties, and that scalability must be assessed experimentally.
Published: 2023

22. Sampling QCD field configurations with gauge-equivariant flow models

Author: Shanahan, Phiala, Abbott, Ryan, Albergo, Michael, Botev, Aleksandar, Boyda, Denis, Cranmer, Kyle, Hackett, Daniel, Kanwar, Gurtej, Matthews, Alexander, Racaniere, Sebastien, Razavi, Ali, Rezende, Danilo, Romero-Lopez, Fernando, and Urban, Julian
Published: 2023
Full Text: View/download PDF

23. Nesterov's accelerated gradient and momentum as approximations to regularised update descent

Author: Botev, Aleksandar, Lever, Guy, and Barber, David
Published: 2017
Full Text: View/download PDF

24. Overdispersed variational autoencoders

Author: Shah, Harshil, Barber, David, and Botev, Aleksandar
Published: 2017
Full Text: View/download PDF
