Search

Your search for "Kumar Sanjiv" returned 47 results

Search Constraints

Author: "Kumar Sanjiv"; Topic: machine learning (stat.ml)

Search Results

1. On student-teacher deviations in distillation: does it pay to disobey?

2. Leveraging Importance Weights in Subset Selection

3. Depth Dependence of $μ$P Learning Rates in ReLU MLPs

4. ResMem: Learn what you can and memorize the rest

5. When Does Confidence-Based Cascade Deferral Suffice?

6. The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers

7. Predicting on the Edge: Identifying Where a Larger Model Does Better

8. Robust Training of Neural Networks Using Scale Invariant Architectures

9. ELM: Embedding and Logit Margins for Long-Tail Learning

10. Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

11. Multi-Stage Influence Function

12. $O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers

13. Doubly-stochastic mining for heterogeneous retrieval

14. Federated Learning with Only Positive Labels

15. Low-Rank Bottleneck in Multi-head Attention Models

16. Pre-training Tasks for Embedding-based Large-scale Retrieval

17. Learning discrete distributions: user vs item-level privacy

18. Does label smoothing mitigate label noise?

19. Robust Large-Margin Learning in Hyperbolic Space

20. Why distillation helps: a statistical perspective

21. Adaptive Federated Optimization

22. Long-tail learning via logit adjustment

23. Learning to Learn by Zeroth-Order Oracle

24. Online Hierarchical Clustering Approximations

25. Accelerating Large-Scale Inference with Anisotropic Vector Quantization

26. Sampled Softmax with Random Fourier Features

27. Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise

28. On the Convergence of Adam and Beyond

29. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

30. Local Orthogonal Decomposition for Maximum Inner Product Search

31. Efficient Inner Product Approximation in Hybrid Spaces

32. AdaCliP: Adaptive Clipping for Private SGD

33. Escaping Saddle Points with Adaptive Gradient Methods

34. Are Transformers universal approximators of sequence-to-sequence functions?

35. cpSGD: Communication-efficient and differentially-private distributed SGD

36. Stochastic Negative Mining for Learning with Large Output Spaces

37. Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

38. Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks

39. Stochastic Generative Hashing

40. Orthogonal Random Features

41. Structured Transforms for Small-Footprint Deep Learning

42. Compact Nonlinear Maps and Circulant Extensions

43. Quantization based Fast Inner Product Search

44. Circulant Binary Embedding

45. On Learning from Label Proportions

46. $\propto$SVM for learning with label proportions

47. On the Difficulty of Nearest Neighbor Search
