Author: "Karanam, Srikrishna" / Publisher: arxiv - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Karanam, Srikrishna"' showing total 4 results



1. Learning with Difference Attention for Visually Grounded Self-supervised Representations

Author: Agarwal, Aishwarya; Karanam, Srikrishna; and Srinivasan, Balaji Vasan
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent works in self-supervised learning have shown impressive results on single-object images, but they struggle to perform well on complex multi-object images as evidenced by their poor visual grounding. To demonstrate this concretely, we propose visual difference attention (VDA) to compute visual attention maps in an unsupervised fashion by comparing an image with its salient-regions-masked-out version. We use VDA to derive attention maps for state-of-the-art SSL methods and show they do not highlight all salient regions in an image accurately, suggesting their inability to learn strong representations for downstream tasks like segmentation. Motivated by these limitations, we cast VDA as a differentiable operation and propose a new learning objective, Differentiable Difference Attention (DiDA) loss, which leads to substantial improvements in an SSL model's visual grounding to an image's salient regions.
Comment: 15 pages, 14 figures
Published: 2023

2. Audio Retrieval for Multimodal Design Documents: A New Dataset and Algorithms

Author: Singh, Prachi; Karanam, Srikrishna; and Shekhar, Sumit
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Multimedia, Computer Science - Sound, Information Retrieval (cs.IR), Computer Science - Information Retrieval, Electrical Engineering and Systems Science - Audio and Speech Processing, Multimedia (cs.MM)
Abstract: We consider and propose a new problem of retrieving audio files relevant to multimodal design document inputs comprising both textual elements and visual imagery, e.g., birthday/greeting cards. In addition to enhancing user experience, integrating audio that matches the theme/style of these inputs also helps improve the accessibility of these documents (e.g., visually impaired people can listen to the audio instead). While recent work in audio retrieval exists, these methods and datasets are targeted explicitly towards natural images. However, our problem considers multimodal design documents (created by users using creative software) substantially different from a naturally clicked photograph. To this end, our first contribution is collecting and curating a new large-scale dataset called Melodic-Design (or MELON), comprising design documents representing various styles, themes, templates, illustrations, etc., paired with music audio. Given our paired image-text-audio dataset, our next contribution is a novel multimodal cross-attention audio retrieval (MMCAR) algorithm that enables training neural networks to learn a common shared feature space across image, text, and audio dimensions. We use these learned features to demonstrate that our method outperforms existing state-of-the-art methods and produce a new reference benchmark for the research community on our new dataset.
Comment: 5 pages including references
Published: 2023

3. Hierarchical Kinematic Human Mesh Recovery

Author: Georgakis, Georgios; Li, Ren; Karanam, Srikrishna; Chen, Terrence; Kosecka, Jana; and Wu, Ziyan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Robotics, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Robotics (cs.RO), Machine Learning (cs.LG)
Abstract: We consider the problem of estimating a parametric model of 3D human mesh from a single image. While there has been substantial recent progress in this area with direct regression of model parameters, these methods only implicitly exploit the human body kinematic structure, leading to sub-optimal use of the model prior. In this work, we address this gap by proposing a new technique for regression of a human parametric model that is explicitly informed by the known hierarchical structure, including joint interdependencies of the model. This results in a strong prior-informed design of the regressor architecture and an associated hierarchical optimization that is flexible enough to be used in conjunction with the current standard frameworks for 3D human mesh recovery. We demonstrate these aspects by means of extensive experiments on standard benchmark datasets, showing how our proposed new design outperforms several existing and popular methods, establishing new state-of-the-art results. By considering joint interdependencies, our method is equipped to infer joints even under data corruptions, which we demonstrate by conducting experiments under varying degrees of occlusion.
Comment: 17 pages, 8 figures, 5 tables, ECCV 2020
Published: 2020

4. Towards Robust RGB-D Human Mesh Recovery

Author: Li, Ren; Cai, Changjiang; Georgakis, Georgios; Karanam, Srikrishna; Chen, Terrence; and Wu, Ziyan
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Robotics, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Robotics (cs.RO), Machine Learning (cs.LG)
Abstract: We consider the problem of human pose estimation. While much recent work has focused on the RGB domain, these techniques are inherently under-constrained since there can be many 3D configurations that explain the same 2D projection. To this end, we propose a new method that uses RGB-D data to estimate a parametric human mesh model. Our key innovations include (a) the design of a new dynamic data fusion module that facilitates learning with a combination of RGB-only and RGB-D datasets, (b) a new constraint generator module that provides SMPL supervisory signals when explicit SMPL annotations are not available, and (c) the design of a new depth ranking learning objective, all of which enable principled model training with RGB-D data. We conduct extensive experiments on a variety of RGB-D datasets to demonstrate efficacy.
Comment: 10 pages, 4 figures, 4 tables
Published: 2019
