Search

Your search for "Raj, Bhiksha" returned 676 results

Search Constraints

Author: "Raj, Bhiksha"
Publication Year Range: Last 10 years

Search Results

1. ADIFF: Explaining audio difference using natural language

2. Masked Autoencoders Are Effective Tokenizers for Diffusion Models

3. Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video

4. Tessellated Linear Model for Age Prediction from Voice

5. SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

6. XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation

7. Perturbation Ontology based Graph Attention Networks

8. MACE: Leveraging Audio for Evaluating Audio Captioning Systems

9. FLAASH: Flow-Attention Adaptive Semantic Hierarchical Fusion for Multi-Modal Tobacco Content Analysis

10. On the Diversity of Synthetic Data and its Impact on Training Large Language Models

11. What Do Speech Foundation Models Not Learn About Speech?

12. Improving Speaker Representations Using Contrastive Losses on Multi-scale Features

13. RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement

14. Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection

15. ImageFolder: Autoregressive Image Generation with Folded Tokens

16. Revisiting Acoustic Features for Robust ASR

17. ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

18. DeWinder: Single-Channel Wind Noise Reduction using Ultrasound Sensing

19. PDAF: A Phonetic Debiasing Attention Framework For Speaker Verification

20. Efficient Autoregressive Audio Modeling via Next-Scale Prediction

21. Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?

22. Audio Entailment: Assessing Deductive Reasoning for Audio Understanding

23. SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios

24. Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

25. uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes

26. From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking

27. ControlVAR: Exploring Controllable Visual Autoregressive Modeling

28. ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models

29. EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding

30. Fashion Image Retrieval with Occlusion

31. R-Bench: Benchmarking the Robustness of Referring Perception Models Under Perturbations

32. Slight Corruption in Pre-training Data Makes Better Diffusion Models

33. Synergistic Global-space Camera and Human Reconstruction from Videos

34. Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features

35. Learning with Noisy Foundation Models

36. Speech Robust Bench: A Robustness Benchmark For Speech Recognition

37. $\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

38. AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition

39. Evaluating and Improving Continual Learning in Spoken Language Understanding

40. Domain Adaptation for Contrastive Audio-Language Models

41. Customizable Perturbation Synthesis for Robust SLAM Benchmarking

42. A General Framework for Learning from Weak Supervision

43. On Catastrophic Inheritance of Large Foundation Models

44. PAM: Prompting Audio-Language Models for Audio Quality Assessment

45. AugSumm: towards generalizable speech summarization using synthetic labels from large language model

46. FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding

47. Token Prediction as Implicit Classification to Identify LLM-Generated Text

48. Pairwise Similarity Learning is SimPLE

49. Privacy-oriented manipulation of speaker representations

50. Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms
