Search

Your search for author "Sohl-Dickstein, Jascha" returned 298 results.


Search Results

1. Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability

2. Scaling Exponents Across Parameterizations and Optimizers

3. Training LLMs over Neurally Compressed Text

4. The boundary of neural network trainability is fractal

5. Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

6. Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

7. Levels of AGI for Operationalizing Progress on the Path to AGI

8. Small-scale proxies for large-scale Transformer training instabilities

10. Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies

11. Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

12. General-Purpose In-Context Learning by Meta-Learning Transformers

13. VeLO: Training Versatile Learned Optimizers by Scaling Up

14. A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases

15. Language Model Cascades

16. Fast Finite Width Neural Tangent Kernel

17. Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling

18. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

19. Practical tradeoffs between memory, compute, and performance in learned optimizers

20. Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies

21. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

22. Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping

24. Training Learned Optimizers with Randomly Initialized Learned Optimizers

25. Parallel Training of Deep Networks with Local Updates

26. Score-Based Generative Modeling through Stochastic Differential Equations

27. Towards NNGP-guided Neural Architecture Search

28. Reverse engineering learned optimizers reveals known and novel mechanisms

29. Is Batch Norm unique? An empirical investigation and prescription to emulate the best properties of common normalizers without batch dependence

30. Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

31. Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization

32. Finite Versus Infinite Neural Networks: an Empirical Study

33. A new method for parameter estimation in probabilistic models: Minimum probability flow

34. Infinite attention: NNGP and NTK for deep attention networks

35. Exact posterior distributions of wide Bayesian neural networks

36. Two equalities expressing the determinant of a matrix in terms of expectations over matrix-vector products

37. Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling

38. The large learning rate phase of deep learning: the catapult mechanism

39. Using a thousand optimization tasks to learn hyperparameter search strategies

40. On the infinite width limit of neural networks with a standard parameterization

41. Neural Tangents: Fast and Easy Infinite Neural Networks in Python

42. Neural reparameterization improves structural optimization

43. Using learned optimizers to make models robust to input noise

44. The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

45. A RAD approach to deep mixture models

46. A Mean Field Theory of Batch Normalization

47. Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

48. Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit

49. Measuring the Effects of Data Parallelism on Neural Network Training

50. Understanding and correcting pathologies in the training of learned optimizers
