9,732 results for "Agrawal, A"
Search Results
2. Spatio-Temporal Variability and Trend Analysis of Long-Term Rainfall in Parbati River Basin, Rajasthan
- Author
Agrawal, Abhishek, Kothari, Mahesh, Jaiswal, Rahul Kumar, Singh, Pradeep Kumar, Bhakar, Sita Ram, Yadav, Kamal Kishore, and Jain, Sanjay Kumar
- Published
- 2024
3. Dietary supplementation of licorice (Glycyrrhiza glabra) powder protects white pekin ducks exposed to hot and humid shed environment during summer from stress-induced alterations in the serum biochemical parameters
- Author
Jena, P.P., Patra, R.C., Agrawal, A., Jena, B.R., Sahoo, Rajasri, Das, D.P., Kumar, Dhirendra, Mishra, S.K., and Beura, C.K.
- Published
- 2024
4. Using Augmented Reality in Molecular Case Studies to Enhance Biomolecular Structure-Function Explorations in Undergraduate Classrooms
- Author
Didem Vardar-Ulu, Saif Eldeen Ragab, Swati Agrawal, and Shuchismita Dutta
- Abstract
Molecular case studies (MCSs) are open educational resources that use a storytelling approach to engage students in biomolecular structure-function explorations at the interface of biology and chemistry. Although MCSs are developed for a particular target audience with specific learning goals, they are suitable for implementation in multiple disciplinary course contexts. Detailed teaching notes included in the case study help instructors plan and prepare for their implementation in diverse contexts. A newly developed MCS was simultaneously implemented in a biochemistry and a molecular parasitology course at two different institutions. Instructors participating in this cross-institutional and multidisciplinary implementation collaboratively identified the need for quick and effective ways to bridge the gap between the MCS authors' vision and the implementing instructor's interpretation of the case-related molecular structure-function discussions. Augmented reality (AR) is an interactive and engaging experience that has been used effectively in teaching molecular sciences. Its accessibility and ease of use with smart devices (e.g., phones and tablets) make it an attractive option for expediting and improving both instructor preparation and classroom implementation of MCSs. In this work, we report the incorporation of ready-to-use AR objects as checkpoints in the MCS. Interacting with these AR objects facilitated instructor preparation, reduced students' cognitive load, and provided clear expectations for their learning. Based on our classroom observations, we propose that the incorporation of AR in MCSs can facilitate their successful implementation, improve the classroom experience for educators and students, and make MCSs more broadly accessible in diverse curricular settings.
- Published
- 2024
5. Next-to-leading order QCD corrections to $Z\to q\bar{q}\gamma$, $q\bar{q}\gamma\gamma$
- Author
Agrawal, Pankaj, Bisal, Subhadip, Das, Biswajit, and Das, Debottam
- Subjects
High Energy Physics - Phenomenology
- Abstract
We consider the rare decay channels of the $Z$ boson: $Z \to \text{two}\ \textrm{jets} + \gamma$ and $Z \to \text{two}\ \textrm{jets} +2\, \gamma$. To obtain the widths and distributions for these processes, we compute the effect of NLO QCD corrections to the processes $Z \to q {\bar q}+ \gamma$ and $Z \to q {\bar q} +2\, \gamma$. We find that these corrections reduce the widths of these processes by about $6.03\%$ and $12.39\%$, respectively. The reduction in the partial widths is larger at the jet level. These NLO-improved decay observables may be tested in future runs of the LHC or at future $e^{+}e^{-}$ colliders. Comment: 20 pages, 16 figures, 2 tables
- Published
- 2024
6. Fair Summarization: Bridging Quality and Diversity in Extractive Summaries
- Author
Nezhad, Sina Bagheri, Bandyapadhyay, Sayan, and Agrawal, Ameeta
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Fairness in multi-document summarization of user-generated content remains a critical challenge in natural language processing (NLP). Existing summarization methods often fail to ensure equitable representation across different social groups, leading to biased outputs. In this paper, we introduce two novel methods for fair extractive summarization: FairExtract, a clustering-based approach, and FairGPT, which leverages GPT-3.5-turbo with fairness constraints. We evaluate these methods using the Divsumm summarization dataset of White-aligned, Hispanic, and African-American dialect tweets and compare them against relevant baselines. The results obtained using a comprehensive set of summarization quality metrics such as SUPERT, BLANC, SummaQA, BARTScore, and UniEval, as well as a fairness metric F, demonstrate that FairExtract and FairGPT achieve superior fairness while maintaining competitive summarization quality. Additionally, we introduce composite metrics (e.g., SUPERT+F, BLANC+F) that integrate quality and fairness into a single evaluation framework, offering a more nuanced understanding of the trade-offs between these objectives. This work highlights the importance of fairness in summarization and sets a benchmark for future research in fairness-aware NLP models. Comment: Accepted at Algorithmic Fairness through the Lens of Metrics and Evaluation Workshop @ NeurIPS 2024
- Published
- 2024
7. Control Protocol for Entangled Pair Verification in Quantum Optical Networks
- Author
Vasan, Vivek, Agrawal, Anuj, Nico-Katz, Alexander, Horgan, Jerry, Bash, Boulat A., Kilper, Daniel C., and Ruffini, Marco
- Subjects
Computer Science - Networking and Internet Architecture
- Abstract
We consider quantum networks, where entangled photon pairs are distributed using fibre optic links from a centralized source to entangling nodes. The entanglement is then stored (via an entanglement swap) in entangling nodes' quantum memories until used in, e.g., distributed quantum computing, quantum key distribution, quantum sensing, and other applications. Due to the fibre loss, some photons are lost in transmission. Noise in the transmission link and the quantum memory also reduces fidelity. Thus, entangling nodes must keep updated records of photon-pair arrivals to each destination, and their use by the applications. This coordination requires classical information exchange between each entangled node pair. However, the same fibre link may not admit both classical and quantum transmissions, as the classical channels can generate enough noise (i.e., via spontaneous Raman scattering) to make the quantum link unusable. Here, we consider coordinating entanglement distribution using a standard Internet protocol (IP) network instead, and propose a control protocol to enable this. We analyse the increase in latency from transmission over an IP network, together with the effect of photon loss, quantum memory noise and buffer size, to determine the fidelity and rate of entangled pairs. We characterize the relationship between the latency of the non-ideal IP network and the decoherence time of the quantum memories, providing a comparison of promising quantum memory technologies.
- Published
- 2024
8. Strategies for entanglement distribution in optical fiber networks
- Author
McAleese, Hannah, Agrawal, Anuj, Vasan, Vivek, Campbell, Conall J., Hawkins, Adam G., Kilper, Daniel C., Paternostro, Mauro, and Ruffini, Marco
- Subjects
Quantum Physics
- Abstract
Distributing entanglement over long distances remains a challenge due to its fragility when exposed to environmental effects. In this work, we compare various entanglement distribution protocols in a realistic noisy fiber network. We focus specifically on two schemes that only require the sending of a non-entangled carrier photon to remote nodes of the network. These protocols rely on optical CNOT gates and we vary the probability with which they can be successfully performed. Encoding our entangled states in photon polarization, we analyse the effect of depolarizing noise on the photonic states as the carrier passes through the fibers. Building a robust model of photon loss and calculating the distillable entanglement of the noisy states, we find the entanglement distribution rate. We discover that methods involving a separable carrier can reach a higher rate than the standard entanglement distribution protocol, provided that the success probability of the optical CNOT gates is sufficiently high. Comment: 12 pages, 10 figures. Comments welcome
- Published
- 2024
9. Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
- Author
Ramos, Miguel Moura, Almeida, Tomás, Vareta, Daniel, Azevedo, Filipe, Agrawal, Sweta, Fernandes, Patrick, and Martins, André F. T.
- Subjects
Computer Science - Computation and Language
- Abstract
Reinforcement learning (RL) has been proven to be an effective and robust method for training neural machine translation systems, especially when paired with powerful reward models that accurately assess translation quality. However, most research has focused on RL methods that use sentence-level feedback, which leads to inefficient learning signals due to the reward sparsity problem: the model receives a single score for the entire sentence. To address this, we introduce a novel approach that leverages fine-grained token-level reward mechanisms with RL methods. We use xCOMET, a state-of-the-art quality estimation system, as our token-level reward model. xCOMET provides detailed feedback by predicting fine-grained error spans and their severity given source-translation pairs. We conduct experiments on small and large translation datasets to compare the impact of sentence-level versus fine-grained reward signals on translation quality. Our results show that training with token-level rewards improves translation quality across language pairs over baselines according to automatic and human evaluation. Furthermore, token-level reward optimization also improves training stability, evidenced by a steady increase in mean rewards over training epochs. Comment: 10 pages, work-in-progress
- Published
- 2024
10. Few-Shot Task Learning through Inverse Generative Modeling
- Author
Netanyahu, Aviv, Du, Yilun, Bronars, Antonia, Pari, Jyothish, Tenenbaum, Joshua, Shu, Tianmin, and Agrawal, Pulkit
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Robotics
- Abstract
Learning the intents of an agent, defined by its goals or motion style, is often extremely challenging from just a few examples. We refer to this problem as task concept learning and present our approach, Few-Shot Task Learning through Inverse Generative Modeling (FTL-IGM), which learns new task concepts by leveraging invertible neural generative models. The core idea is to pretrain a generative model on a set of basic concepts and their demonstrations. Then, given a few demonstrations of a new concept (such as a new goal or a new action), our method learns the underlying concepts through backpropagation without updating the model weights, thanks to the invertibility of the generative model. We evaluate our method in five domains: object rearrangement, goal-oriented navigation, motion capture of human actions, autonomous driving, and real-world table-top manipulation. Our experimental results demonstrate that via the pretrained generative model, we successfully learn novel concepts and generate agent plans or motion corresponding to these concepts (1) in unseen environments and (2) in composition with training concepts.
- Published
- 2024
11. Ultralow loss torsion micropendula for chipscale gravimetry
- Author
Condos, C. A., Pratt, J. R., Manley, J., Agrawal, A. R., Schlamminger, S., Pluchar, C. M., and Wilson, D. J.
- Subjects
Physics - Applied Physics, Condensed Matter - Mesoscale and Nanoscale Physics, Physics - Geophysics
- Abstract
The pendulum is one of the oldest gravimeters, featuring frequency-based readout limited by geometric nonlinearity. While modern gravimeters focus on displacement-based spring-mass or free-fall designs, the advent of nanofabrication techniques invites a revisiting of the pendulum, motivated by the prospect of low-loss, compact, isochronous operation, leveraging precise dimensional control. Here we exploit advances in strain-engineered nanomechanics (specifically, strained Si$_3$N$_4$ nanoribbon suspensions) to realize a $0.1$ mg, $32$ Hz torsion pendulum with an ultralow damping rate of $16\,\mu$Hz and a parametric gravity sensitivity of $5$ Hz/$g_0$ ($g_0 = 9.8\;\text{m}/\text{s}^2$). The low thermal acceleration of the pendulum, $2\times 10^{-9}g_0/\sqrt{\text{Hz}}$, gives access to a parametric gravity resolution of $10^{-8}g_0$ for drive amplitudes of $10\;\text{mrad}$ and integration times within the free decay time, of interest for both commercial applications and fundamental experiments. We present progress toward this goal, demonstrating free and self-sustained oscillators with frequency stabilities as low as $2.5\,\mu$Hz at 200 s, corresponding to a gravity resolution of $5\times 10^{-7}g_0$. We also show how the Duffing nonlinearity of the suspension can be used to cancel the pendulum nonlinearity, paving the way toward a fully isochronous, high-$Q$ micromechanical clock. Comment: 9 pages, 7 figures
- Published
- 2024
12. SpiDR: A Reconfigurable Digital Compute-in-Memory Spiking Neural Network Accelerator for Event-based Perception
- Author
Sharma, Deepika, Negi, Shubham, Dutta, Trishit, Agrawal, Amogh, and Roy, Kaushik
- Subjects
Computer Science - Hardware Architecture, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
- Abstract
Spiking Neural Networks (SNNs), with their inherent recurrence, offer an efficient method for processing the asynchronous temporal data generated by Dynamic Vision Sensors (DVS), making them well-suited for event-based vision applications. However, existing SNN accelerators suffer from limited adaptability to diverse neuron models, bit precisions, and network sizes, inefficient membrane potential (Vmem) handling, and limited sparse optimizations. In response to these challenges, we propose SpiDR, a scalable and reconfigurable digital compute-in-memory (CIM) SNN accelerator with a set of key features: 1) It uses in-memory computations and reconfigurable operating modes to minimize data movement associated with weight and Vmem data structures while efficiently adapting to different workloads. 2) It supports multiple weight/Vmem bit precision values, enabling a trade-off between accuracy and energy efficiency and enhancing adaptability to diverse application demands. 3) A zero-skipping mechanism for sparse inputs significantly reduces energy usage by leveraging the inherent sparsity of spikes without introducing high overheads for low sparsity. 4) Finally, the asynchronous handshaking mechanism maintains the computational efficiency of the pipeline for variable execution times of different computation units. We fabricated SpiDR in 65 nm Taiwan Semiconductor Manufacturing Company (TSMC) low-power (LP) technology. It demonstrates competitive performance (scaled to the same technology node) to other digital SNN accelerators proposed in the recent literature and supports advanced reconfigurability. It achieves up to 5 TOPS/W energy efficiency at 95% input sparsity with 4-bit weights and 7-bit Vmem precision. Comment: 9 pages, 17 figures
- Published
- 2024
13. DexHub and DART: Towards Internet Scale Robot Data Collection
- Author
Park, Younghyo, Bhatia, Jagdeep Singh, Ankile, Lars, and Agrawal, Pulkit
- Subjects
Computer Science - Robotics
- Abstract
The quest to build a generalist robotic system is impeded by the scarcity of diverse and high-quality data. While real-world data collection efforts exist, requirements for robot hardware, physical environment setups, and frequent resets significantly impede the scalability needed for modern learning frameworks. We introduce DART, a teleoperation platform designed for crowdsourcing that reimagines robotic data collection by leveraging cloud-based simulation and augmented reality (AR) to address many limitations of prior data collection efforts. Our user studies highlight that DART enables higher data collection throughput and lower physical fatigue compared to real-world teleoperation. We also demonstrate that policies trained using DART-collected datasets successfully transfer to reality and are robust to unseen visual disturbances. All data collected through DART is automatically stored in our cloud-hosted database, DexHub, which will be made publicly available upon curation, paving the path for DexHub to become an ever-growing data hub for robot learning. Videos are available at https://dexhub.ai/project. Comment: Visit https://dexhub.ai/project for more details
- Published
- 2024
14. Collective Model Intelligence Requires Compatible Specialization
- Author
Pari, Jyothish, Jelassi, Samy, and Agrawal, Pulkit
- Subjects
Computer Science - Machine Learning
- Abstract
In this work, we explore the limitations of combining models by averaging intermediate features, referred to as model merging, and propose a new direction for achieving collective model intelligence through what we call compatible specialization. Current methods for model merging, such as parameter and feature averaging, struggle to effectively combine specialized models due to representational divergence during fine-tuning. As models specialize to their individual domains, their internal feature representations become increasingly incompatible, leading to poor performance when attempting to merge them for new tasks. We analyze this phenomenon using centered kernel alignment (CKA) and show that as models specialize, the similarity in their feature space structure diminishes, hindering their capacity for collective use. To address these challenges, we investigate routing-based merging strategies, which offer more flexible methods for combining specialized models by dynamically routing across different layers. This allows us to improve on existing methods by combining features from multiple layers rather than relying on fixed, layer-wise combinations. However, we find that these approaches still face limitations when layers within models are representationally incompatible. Our findings highlight the importance of designing new approaches for model merging that operate on well-defined input and output spaces, similar to how humans communicate through language rather than intermediate neural activations.
- Published
- 2024
15. AM Flow: Adapters for Temporal Processing in Action Recognition
- Author
Agrawal, Tanay, Ali, Abid, Dantcheva, Antitza, and Bremond, Francois
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Deep learning models, in particular image models, have recently gained generalisability and robustness. In this work, we propose to exploit such advances in the realm of video classification. Video foundation models suffer from the requirement of extensive pretraining and a large training time. Towards mitigating such limitations, we propose "Attention Map (AM) Flow" for image models, a method for identifying pixels relevant to motion in each input video frame. In this context, we propose two methods to compute AM flow, depending on camera motion. AM flow allows the separation of spatial and temporal processing, while providing improved results over combined spatio-temporal processing (as in video models). Adapters, one of the popular techniques in parameter-efficient transfer learning, facilitate the incorporation of AM flow into pretrained image models, mitigating the need for full fine-tuning. We extend adapters to "temporal processing adapters" by incorporating a temporal processing unit into the adapters. Our work achieves faster convergence, therefore reducing the number of epochs needed for training. Moreover, we endow an image model with the ability to achieve state-of-the-art results on popular action recognition datasets. This reduces training time and simplifies pretraining. We present experiments on Kinetics-400, Something-Something v2, and Toyota Smarthome datasets, showcasing state-of-the-art or comparable results.
- Published
- 2024
16. Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
- Author
Soman, Karthik, Langdon, Andrew, Villouta, Catalina, Agrawal, Chinmay, Salta, Lashaw, Peetoom, Braian, Bellucci, Gianmarco, and Buske, Orion J
- Subjects
Computer Science - Computation and Language
- Abstract
Rare diseases present unique challenges in healthcare, often suffering from delayed diagnosis and fragmented information landscapes. The scarcity of reliable knowledge in these conditions poses a distinct challenge for Large Language Models (LLMs) in supporting clinical management and delivering precise patient information, underscoring the need for focused training on these 'zebra' cases. We present Zebra-Llama, a specialized context-aware language model with high precision Retrieval Augmented Generation (RAG) capability, focusing on Ehlers-Danlos Syndrome (EDS) as our case study. EDS, affecting 1 in 5,000 individuals, exemplifies the complexities of rare diseases with its diverse symptoms, multiple subtypes, and evolving diagnostic criteria. By implementing a novel context-aware fine-tuning methodology trained on questions derived from medical literature, patient experiences, and clinical resources, along with expertly curated responses, Zebra-Llama demonstrates unprecedented capabilities in handling EDS-related queries. On a test set of real-world questions collected from EDS patients and clinicians, medical experts evaluated the responses generated by both models, revealing Zebra-Llama's substantial improvements over the base model (Llama 3.1-8B-Instruct) in thoroughness (77.5% vs. 70.1%), accuracy (83.0% vs. 78.8%), clarity (74.7% vs. 72.0%), and citation reliability (70.6% vs. 52.3%). Released as an open-source resource, Zebra-Llama not only provides more accessible and reliable EDS information but also establishes a framework for developing specialized AI solutions for other rare conditions. This work represents a crucial step towards democratizing expert-level knowledge in rare disease management, potentially transforming how healthcare providers and patients navigate the complex landscape of rare diseases. Comment: 26 pages, 4 figures, 1 supplementary figure
- Published
- 2024
17. Humidity-enhanced NO$_2$ gas sensing using atomically sharp edges in multilayer MoS$_2$
- Author
Agrawal, Abhay V., Polyakov, Alexander Yu., Eriksson, Jens, Antosiewicz, Tomasz J., and Shegai, Timur O.
- Subjects
Physics - Applied Physics
- Abstract
Ambient humidity poses a significant challenge in the development of practical room temperature NO$_2$ gas sensors. Here, we employ atomically precise zigzag edges in multilayer MoS$_2$, fabricated using electron beam lithography and anisotropic wet etching, to achieve highly sensitive and selective gas sensing performance that is humidity-tolerant at elevated temperatures and humidity-enhanced at room temperature under ultraviolet illumination. Notably, exposure to 2.5 parts per billion (ppb) NO$_2$ at 70% relative humidity under ultraviolet illumination and at room-temperature resulted in a 33-fold increase in response and a 6-fold faster recovery compared to 0% relative humidity, leading to response values exceeding 1100%. The optimized samples demonstrated a theoretical detection limit ranging from 4 to 400 parts per trillion (ppt) NO$_2$. The enhanced NO$_2$ sensing capabilities of MoS$_2$ edges have been further confirmed through first-principles calculations. Our study expands the applications of nanostructured MoS$_2$ and highlights its potential for detecting NO$_2$ at sub-ppb levels in complex scenarios, such as high humidity conditions. Comment: 40 pages, 7 figures
- Published
- 2024
18. Learning to Look Around: Enhancing Teleoperation and Learning with a Human-like Actuated Neck
- Author
Sen, Bipasha, Wang, Michelle, Thakur, Nandini, Agarwal, Aditya, and Agrawal, Pulkit
- Subjects
Computer Science - Robotics
- Abstract
We introduce a teleoperation system that integrates a 5 DOF actuated neck, designed to replicate natural human head movements and perception. By enabling behaviors like peeking or tilting, the system provides operators with a more intuitive and comprehensive view of the environment, improving task performance, reducing cognitive load, and facilitating complex whole-body manipulation. We demonstrate the benefits of natural perception across seven challenging teleoperation tasks, showing how the actuated neck enhances the scope and efficiency of remote operation. Furthermore, we investigate its role in training autonomous policies through imitation learning. In three distinct tasks, the actuated neck supports better spatial awareness, reduces distribution shift, and enables adaptive task-specific adjustments compared to a static wide-angle camera.
- Published
- 2024
19. Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model
- Author
Nandi, Subhadip and Agrawal, Neeraj
- Subjects
Computer Science - Machine Learning, Computer Science - Information Retrieval
- Abstract
Few-Shot Cross-Domain NER is the process of leveraging knowledge from data-rich source domains to perform entity recognition on data-scarce target domains. Most previous state-of-the-art (SOTA) approaches use pre-trained language models (PLMs) for cross-domain NER. However, these models are often domain specific. To successfully use these models for new target domains, we need to either modify the model architecture or fine-tune the model using data from the new domains. Both of these result in the creation of entirely new NER models for each target domain, which is infeasible in practical scenarios. Recently, several works have attempted to use LLMs to solve Few-Shot Cross-Domain NER. However, most of these are either too expensive for practical purposes or struggle to follow LLM prompt instructions. In this paper, we propose IF-WRANER (Instruction Finetuned Word-embedding based Retrieval Augmented large language model for Named Entity Recognition), a retrieval augmented LLM, finetuned for the NER task. By virtue of the regularization techniques used during LLM finetuning and the adoption of word-level embedding over sentence-level embedding during the retrieval of in-prompt examples, IF-WRANER is able to outperform previous SOTA Few-Shot Cross-Domain NER approaches. We have demonstrated the effectiveness of our model by benchmarking its performance on the open-source CrossNER dataset, on which it shows more than 2% F1 score improvement over the previous SOTA model. We have deployed the model for multiple customer care domains of an enterprise. Accurate entity prediction through IF-WRANER helps direct customers to automated workflows for the domains, thereby reducing escalations to human agents by almost 15% and leading to millions of dollars in yearly savings for the company.
- Published
- 2024
20. ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising
- Author
Chaubey, Ashutosh, Agarwaal, Anoubhav, Roy, Sartaki Sinha, Agrawal, Aayush, and Ghose, Susmita
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval
- Abstract
Contextual advertising serves ads that are aligned to the content that the user is viewing. The rapid growth of video content on social platforms and streaming services, along with privacy concerns, has increased the need for contextual advertising. Placing the right ad in the right context creates a seamless and pleasant ad viewing experience, resulting in higher audience engagement and, ultimately, better ad monetization. From a technology standpoint, effective contextual advertising requires a video retrieval system capable of understanding complex video content at a very granular level. Current text-to-video retrieval models based on joint multimodal training demand large datasets and computational resources, limiting their practicality and lacking the key functionalities required for ad ecosystem integration. We introduce ContextIQ, a multimodal expert-based video retrieval system designed specifically for contextual advertising. ContextIQ utilizes modality-specific experts for video, audio, transcripts (captions), and metadata such as objects, actions, and emotion to create semantically rich video representations. We show that our system, without joint training, achieves better or comparable results to state-of-the-art models and commercial solutions on multiple text-to-video retrieval benchmarks. Our ablation studies highlight the benefits of leveraging multiple modalities for enhanced video retrieval accuracy instead of using a vision-language model alone. Furthermore, we show how video retrieval systems such as ContextIQ can be used for contextual advertising in an ad ecosystem while also addressing concerns related to brand safety and filtering inappropriate content. Comment: Accepted at WACV 2025
- Published
- 2024
21. Ultrathin 3R-MoS$_2$ metasurfaces with atomically precise edges for efficient nonlinear nanophotonics
- Author
Zograf, George, Küçüköz, Betül, Polyakov, Alexander Yu., Bancerek, Maria, Agrawal, Abhay V., Wieczorek, Witlef, Antosiewicz, Tomasz J., and Shegai, Timur O.
- Subjects
Physics - Optics, Condensed Matter - Materials Science
- Abstract
Dielectric metasurfaces that combine high-index materials with optical nonlinearities are widely recognized for their potential in various quantum and classical nanophotonic applications. However, the fabrication of high-quality metasurfaces poses significant material-dependent challenges, as their designs are often susceptible to disorder, defects, and scattering losses, which are particularly prone to occur at the edges of nanostructured features. Additionally, the choice of the material platforms featuring second-order optical nonlinearities, $\chi^{(2)}$, is limited to broken-inversion symmetry crystals such as GaAs, GaP, LiNbO$_3$, and various bulk van der Waals materials, including GaSe and NbOCl$_2$. Here, we use a combination of top-down lithography and anisotropic wet etching of a specially stacked van der Waals crystal -- 3R-MoS$_2$, which exhibits both a high refractive index and exceptional $\chi^{(2)}$ nonlinearity, to produce metasurfaces consisting of perfect equilateral triangle nanoholes with atomically precise zigzag edges. Due to the geometry of the triangle, the etching process is accompanied by a transition from an in-plane $C_4$ symmetric structure to a broken-in-plane symmetry configuration, thereby allowing for the realization of the quasi-bound-state-in-the-continuum (q-BIC) concept. The resulting ultrathin metasurface ($\sim$ 20-25 nm) demonstrates a remarkable enhancement in second-harmonic generation (SHG) -- over three orders of magnitude at specific wavelengths and linear polarization directions compared to a host flake.
- Published
- 2024
22. SCULPT: Systematic Tuning of Long Prompts
- Author
Kumar, Shanu, Venkata, Akhila Yesantarao, Khandelwal, Shubhanshu, Santra, Bishal, Agrawal, Parag, and Gupta, Manish
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
As large language models become increasingly central to solving complex tasks, the challenge of optimizing long, unstructured prompts has become critical. Existing optimization techniques often struggle to effectively handle such prompts, leading to suboptimal performance. We introduce SCULPT (Systematic Tuning of Long Prompts), a novel framework that systematically refines long prompts by structuring them hierarchically and applying an iterative actor-critic mechanism. To enhance robustness and generalizability, SCULPT utilizes two complementary feedback mechanisms: Preliminary Assessment, which assesses the prompt's structure before execution, and Error Assessment, which diagnoses and addresses errors post-execution. By aggregating feedback from these mechanisms, SCULPT avoids overfitting and ensures consistent improvements in performance. Our experimental results demonstrate significant accuracy gains and enhanced robustness, particularly in handling erroneous and misaligned prompts. SCULPT consistently outperforms existing approaches, establishing itself as a scalable solution for optimizing long prompts across diverse and real-world tasks.
- Published
- 2024
23. Constrained Nonlinear Kaczmarz Projection on Intersections of Manifolds for Coordinated Multi-Robot Mobile Manipulation
- Author
-
Agrawal, Akshaya, Mayer, Parker, Kingston, Zachary, and Hollinger, Geoffrey A.
- Subjects
Computer Science - Robotics
- Abstract
Cooperative manipulation tasks impose various structure-, task-, and robot-specific constraints on mobile manipulators. However, current methods struggle to model and solve these myriad constraints simultaneously. We propose a twofold solution: first, we model constraints as a family of manifolds amenable to simultaneous solving. Second, we introduce the constrained nonlinear Kaczmarz (cNKZ) projection technique to produce constraint-satisfying solutions. Experiments show that cNKZ dramatically outperforms baseline approaches, which cannot find solutions at all. We integrate cNKZ with a sampling-based motion planning algorithm to generate complex, coordinated motions for 3 to 6 mobile manipulators (18-36 DoF), with cNKZ solving up to 80 nonlinear constraints simultaneously and achieving up to a 92% success rate in cluttered environments. We also demonstrate our approach on hardware using three Turtlebot3 Waffle Pi robots with OpenMANIPULATOR-X arms.
- Published
- 2024
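The cNKZ entry above names a constrained nonlinear Kaczmarz projection without giving its update rule. For orientation only, here is a minimal sketch of the classic cyclic nonlinear Kaczmarz iteration that the method's name presumably refers to, assuming each constraint is a scalar equation $f_i(x) = 0$ with a known gradient: each step projects the current point onto the linearization of one constraint. It does not model the manifold families, constraint handling, or motion-planning integration described in the paper, and the function name `nonlinear_kaczmarz` and its parameters are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Constraint = Callable[[Sequence[float]], float]
Gradient = Callable[[Sequence[float]], Vector]


def nonlinear_kaczmarz(
    x0: Sequence[float],
    constraints: Sequence[Constraint],
    gradients: Sequence[Gradient],
    tol: float = 1e-9,
    max_sweeps: int = 1000,
) -> Vector:
    """Hypothetical helper: cyclic nonlinear Kaczmarz for f_i(x) = 0.

    Each step projects x onto the first-order (linearized) zero set of one
    constraint:  x <- x - f_i(x) * grad_i(x) / ||grad_i(x)||^2
    """
    x = list(x0)
    for _ in range(max_sweeps):
        for f, grad in zip(constraints, gradients):
            g = grad(x)
            norm_sq = sum(gi * gi for gi in g)
            if norm_sq == 0.0:
                continue  # degenerate gradient: skip this constraint this sweep
            step = f(x) / norm_sq
            x = [xi - step * gi for xi, gi in zip(x, g)]
        # Stop once every constraint residual is within tolerance.
        if max((abs(c(x)) for c in constraints), default=0.0) < tol:
            break
    return x


# Usage: intersect the unit circle with the line x = y.
def circle(p: Sequence[float]) -> float:
    return p[0] ** 2 + p[1] ** 2 - 1.0


def line(p: Sequence[float]) -> float:
    return p[0] - p[1]


point = nonlinear_kaczmarz(
    [1.0, 0.2],
    [circle, line],
    [lambda p: [2.0 * p[0], 2.0 * p[1]], lambda p: [1.0, -1.0]],
)
print(point)  # approximately [0.7071, 0.7071]
```

Starting from (1.0, 0.2), alternating projections onto the circle and the line converge toward their intersection at $(1/\sqrt{2}, 1/\sqrt{2})$; cycling through all constraints once per sweep is the simplest ordering, and randomized orderings are a common alternative.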
24. ImageNet-RIB Benchmark: Large Pre-Training Datasets Don't Guarantee Robustness after Fine-Tuning
- Author
-
Hwang, Jaedong, Cheung, Brian, Hong, Zhang-Wei, Boopathy, Akhilan, Agrawal, Pulkit, and Fiete, Ila
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Highly performant large-scale pre-trained models promise to also provide a valuable foundation for learning specialized tasks, by fine-tuning the model to the desired task. By starting from a good general-purpose model, the goal is both to specialize in the target task and to maintain robustness. To assess the robustness of models to out-of-distribution samples after fine-tuning on downstream datasets, we introduce a new robust fine-tuning benchmark, ImageNet-RIB (Robustness Inheritance Benchmark). The benchmark consists of a set of related but distinct specialized (downstream) tasks; pre-trained models are fine-tuned on one task in the set and their robustness is assessed on the rest, iterating across all tasks for fine-tuning and assessment. We find that the continual learning methods EWC and LwF maintain robustness after fine-tuning, though fine-tuning generally does reduce generalization performance on related downstream tasks across models. Not surprisingly, models pre-trained on large and rich datasets exhibit higher initial robustness across datasets and suffer more pronounced degradation during fine-tuning. The distance between the pre-training and downstream datasets, measured by optimal transport, predicts this performance degradation on the pre-training dataset. However, counterintuitively, model robustness after fine-tuning on related downstream tasks is the worst when the pre-training dataset is the richest and the most diverse. This suggests that starting with the strongest foundation model is not necessarily the best approach for performance on specialist tasks. The benchmark thus offers key insights for developing more resilient fine-tuning strategies and building robust machine learning models. https://jd730.github.io/projects/ImageNet-RIB
- Published
- 2024
25. Parameterized Saga of First-Fit and Last-Fit Coloring
- Author
-
Agrawal, Akanksha, Lokshtanov, Daniel, Panolan, Fahad, Saurabh, Saket, and Verma, Shaily
- Subjects
Computer Science - Discrete Mathematics, Computer Science - Data Structures and Algorithms
- Abstract
The classic greedy coloring (first-fit) algorithm considers the vertices of an input graph $G$ in a given order and assigns the first available color to each vertex $v$ in $G$. In the Grundy Coloring problem, the task is to find an ordering of the vertices that will force the greedy algorithm to use as many colors as possible. In Partial Grundy Coloring, the task is also to color the graph using as many colors as possible. This time, however, we may select both the ordering in which the vertices are considered and which color to assign to each vertex. The only constraint is that the color assigned to a vertex $v$ is a color previously used for another vertex if such a color is available. Whether Grundy Coloring and Partial Grundy Coloring admit fixed-parameter tractable (FPT) algorithms, i.e., algorithms with running time $f(k)n^{O(1)}$, where $k$ is the number of colors, was posed as an open problem by Zaker and by Effantin et al., respectively. Recently, Aboulker et al. (STACS 2020 and Algorithmica 2022) resolved the question for Grundy Coloring in the negative by showing that the problem is W[1]-hard. For Partial Grundy Coloring, they obtain an FPT algorithm on graphs that do not contain $K_{i,j}$ as a subgraph (a.k.a. $K_{i,j}$-free graphs). Aboulker et al. reiterate the question of whether there exists an FPT algorithm for Partial Grundy Coloring on general graphs and also ask whether Grundy Coloring admits an FPT algorithm on $K_{i,j}$-free graphs. We give FPT algorithms for Partial Grundy Coloring on general graphs and for Grundy Coloring on $K_{i,j}$-free graphs, resolving both questions in the affirmative. We believe that our new structural theorems for partial Grundy coloring and "representative-family"-like sets for $K_{i,j}$-free graphs that we use in obtaining our results may have wider algorithmic applications.
- Published
- 2024
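The first-fit rule and the Grundy Coloring objective described in the entry above can be made concrete with a short sketch, assuming an undirected graph given as an adjacency dictionary. `first_fit` applies the greedy rule to one fixed ordering; `grundy_number_bruteforce` is a hypothetical helper, not the paper's FPT algorithm: it simply tries every ordering, which takes exponential time and only serves to illustrate the definition on tiny graphs.

```python
from itertools import permutations
from typing import Dict, Hashable, Iterable, Mapping, Set

Graph = Mapping[Hashable, Set[Hashable]]


def first_fit(graph: Graph, order: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Greedy (first-fit) coloring: give each vertex, in the given order, the
    smallest color 1, 2, 3, ... not used by an already-colored neighbor."""
    color: Dict[Hashable, int] = {}
    for v in order:
        taken = {color[u] for u in graph[v] if u in color}
        c = 1
        while c in taken:
            c += 1
        color[v] = c
    return color


def grundy_number_bruteforce(graph: Graph) -> int:
    """Hypothetical helper: the most colors first-fit can be forced to use,
    found by trying every vertex ordering (exponential; tiny graphs only)."""
    return max(
        (max(first_fit(graph, order).values(), default=0)
         for order in permutations(graph)),
        default=0,
    )


# Usage: the path 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(first_fit(path, [0, 1, 2, 3]))  # {0: 1, 1: 2, 2: 1, 3: 2}
print(first_fit(path, [0, 3, 1, 2]))  # {0: 1, 3: 1, 1: 2, 2: 3}
print(grundy_number_bruteforce(path))  # 3
```

On this path, the natural order needs only 2 colors, but coloring both endpoints first forces the middle vertices to take colors 2 and 3, so the Grundy number is 3.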
26. ChartA11y: Designing Accessible Touch Experiences of Visualizations with Blind Smartphone Users
- Author
-
Zhang, Zhuohao Jerry, Thompson, John R., Shah, Aditi, Agrawal, Manish, Sarikaya, Alper, Wobbrock, Jacob O., Cutrell, Edward, and Lee, Bongshin
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
We introduce ChartA11y, an app developed to enable accessible 2-D visualizations on smartphones for blind users through a participatory and iterative design process involving 13 sessions with two blind partners. We also present a design journey for making accessible touch experiences that go beyond simple auditory feedback, incorporating multimodal interactions and multisensory data representations. ChartA11y aims to provide direct chart access and comprehensive chart understanding through two modes: a semantic navigation framework mode and a direct touch mapping mode. By redesigning traditional touch-to-audio interactions, ChartA11y also extends to accessible scatter plots, addressing the under-explored challenges posed by their non-linear data distribution. Our main contributions are the detailed participatory design process and the resulting system, ChartA11y, which offers a novel approach for blind users to access visualizations on their smartphones.
- Published
- 2024
27. Microscopy of bosonic charge carriers in staggered magnetic fields
- Author
-
Bohrdt, Annabelle, Wei, David, Adler, Daniel, Srakaew, Kritsana, Agrawal, Suchita, Weckesser, Pascal, Bloch, Immanuel, Grusdt, Fabian, and Zeiher, Johannes
- Subjects
Condensed Matter - Quantum Gases, Condensed Matter - Strongly Correlated Electrons
- Abstract
The interplay of spin and charge degrees of freedom is believed to underlie various unresolved phenomena in strongly correlated systems. Quantum simulators based on neutral atoms provide an excellent testbed for investigating such phenomena and resolving their microscopic origins. Up to now, the majority of experimental and theoretical studies have focused on systems with fermionic exchange statistics. Here we expand the existing cold atom toolbox through the use of negative temperature states, enabling us to realize an antiferromagnetic, bosonic $t$-$J$ model in two spatial dimensions, subject to a strong staggered magnetic field in a quantum gas microscope. By comparing the spreading dynamics of a single hole in a Néel versus a spin-polarized initial state, we establish the relevance of memory effects resulting from the buildup of strong spin-charge correlations in the dynamics of charge carriers in antiferromagnets. We further numerically predict rich dynamics of pairs of doped holes, which we demonstrate to be bound by a similar memory effect, while their center of mass can expand freely. Our work paves the way for the systematic exploration of the effect of antiferromagnetic spin ordering on the properties of individual charge carriers as well as finite doping phases. Our study demonstrates that the staggered field can be used to single out the effect of antiferromagnetism and holds the prospect of preparing low-temperature states in the near future.
Comment: 9+3 pages, 4+5 figures
- Published
- 2024
28. Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
- Author
-
Li, Zhangheng, You, Keen, Zhang, Haotian, Feng, Di, Agrawal, Harsh, Li, Xiujun, Moorthy, Mohana Prasad Sathya, Nichols, Jeff, Yang, Yinfei, and Gan, Zhe
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and data limitation. In this paper, we introduce Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, Webpage, and AppleTV. Building on the foundation of Ferret-UI, Ferret-UI 2 introduces three key innovations: support for multiple platform types, high-resolution perception through adaptive scaling, and advanced task training data generation powered by GPT-4o with set-of-mark visual prompting. These advancements enable Ferret-UI 2 to perform complex, user-centered interactions, making it highly versatile and adaptable for the expanding diversity of platform ecosystems. Extensive empirical experiments on referring, grounding, user-centric advanced tasks (comprising 9 subtasks $\times$ 5 platforms), GUIDE next-action prediction dataset, and GUI-World multi-platform benchmark demonstrate that Ferret-UI 2 significantly outperforms Ferret-UI, and also shows strong cross-platform transfer capabilities.
- Published
- 2024
29. AdaEDL: Early Draft Stopping for Speculative Decoding of Large Language Models via an Entropy-based Lower Bound on Token Acceptance Probability
- Author
-
Agrawal, Sudhanshu, Jeon, Wonseok, and Lee, Mingu
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Speculative decoding is a powerful technique that attempts to circumvent the autoregressive constraint of modern Large Language Models (LLMs). The aim of speculative decoding techniques is to improve the average inference time of a large, target model without sacrificing its accuracy, by using a more efficient draft model to propose draft tokens which are then verified in parallel. The number of draft tokens produced in each drafting round is referred to as the draft length and is often a static hyperparameter chosen based on the acceptance rate statistics of the draft tokens. However, setting a static draft length can negatively impact performance, especially in scenarios where drafting is expensive and there is a high variance in the number of tokens accepted. Adaptive Entropy-based Draft Length (AdaEDL) is a simple, training-free and parameter-free criterion which allows for early stopping of the token drafting process by approximating a lower bound on the expected acceptance probability of the drafted token based on the currently observed entropy of the drafted logits. We show that AdaEDL consistently outperforms static draft-length speculative decoding by 10%-57%, as well as other training-free draft-stopping techniques by up to 10%, in a variety of settings and datasets. At the same time, we show that AdaEDL is more robust than these techniques and preserves performance in high-sampling-temperature scenarios. Since it is training-free, in contrast to techniques that rely on the training of dataset-specific draft-stopping predictors, AdaEDL can be seamlessly integrated into a variety of pre-existing LLM systems.
Comment: Workshop on Efficient Natural Language and Signal Processing at NeurIPS 2024
- Published
- 2024
30. Deoxys: A Causal Inference Engine for Unhealthy Node Mitigation in Large-scale Cloud Infrastructure
- Author
-
Zhang, Chaoyun, Yao, Randolph, Qin, Si, Li, Ze, Agrawal, Shekhar, Mishra, Binit R., Tran, Tri, Ma, Minghua, Lin, Qingwei, Chintalapati, Murali, and Zhang, Dongmei
- Subjects
Electrical Engineering and Systems Science - Systems and Control, Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
The presence of unhealthy nodes in cloud infrastructure signals the potential failure of machines, which can significantly impact the availability and reliability of cloud services, resulting in negative customer experiences. Effectively mitigating unhealthy nodes is therefore vital for sustaining cloud system performance. This paper introduces Deoxys, a causal inference engine tailored to recommending mitigation actions for unhealthy nodes in cloud systems to minimize virtual machine downtime and interruptions during unhealthy events. It employs double machine learning combined with causal forest to produce precise and reliable mitigation recommendations based solely on limited observational data collected from historical unhealthy events. To enhance the causal inference model, Deoxys further incorporates a policy fallback mechanism based on model uncertainty and action-overriding mechanisms to (i) improve the reliability of the system, and (ii) strike a good tradeoff between downtime reduction and resource utilization, thereby enhancing overall system performance. After deploying Deoxys in a large-scale cloud infrastructure at Microsoft, we observe that Deoxys reduces average VM downtime by 53% compared with a legacy policy, while yielding a 49.5% lower VM interruption rate. This substantial improvement enhances the reliability and stability of cloud platforms, resulting in a seamless customer experience.
- Published
- 2024
31. Enhancing Trust and Safety in Digital Payments: An LLM-Powered Approach
- Author
-
Dahiphale, Devendra, Madiraju, Naveen, Lin, Justin, Karve, Rutvik, Agrawal, Monu, Modwal, Anant, Balakrishnan, Ramanan, Shah, Shanay, Kaushal, Govind, Mandawat, Priya, Hariramani, Prakash, and Merchant, Arif
- Subjects
Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science, Computer Science - Machine Learning
- Abstract
Digital payment systems have revolutionized financial transactions, offering unparalleled convenience and accessibility to users worldwide. However, the increasing popularity of these platforms has also attracted malicious actors seeking to exploit their vulnerabilities for financial gain. To address this challenge, robust and adaptable scam detection mechanisms are crucial for maintaining the trust and safety of digital payment ecosystems. This paper presents a comprehensive approach to scam detection, focusing on the Unified Payments Interface (UPI) in India, with Google Pay (GPay) as a specific use case. The approach leverages Large Language Models (LLMs) to enhance scam classification accuracy and designs a digital assistant to aid human reviewers in identifying and mitigating fraudulent activities. The results demonstrate the potential of LLMs in augmenting existing machine learning models and improving the efficiency, accuracy, quality, and consistency of scam reviews, ultimately contributing to a safer and more secure digital payment landscape. Our evaluation of the Gemini Ultra model on curated transaction data showed 93.33% accuracy in scam classification. Furthermore, the model demonstrated 89% accuracy in generating reasoning for these classifications. Promisingly, the model identified 32% new, accurate reasons for suspected scams that human reviewers had not included in their review notes.
Comment: 10 pages, 7 figures
- Published
- 2024
32. Centrality-aware Product Retrieval and Ranking
- Author
-
Saadany, Hadeel, Bhosale, Swapnil, Agrawal, Samarth, Kanojia, Diptesh, Orasan, Constantin, and Wu, Zhe
- Subjects
Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
- Abstract
This paper addresses the challenge of improving user experience on e-commerce platforms by enhancing product ranking relevant to users' search queries. Ambiguity and complexity of user queries often lead to a mismatch between the user's intent and retrieved product titles or documents. Recent approaches have proposed the use of Transformer-based models, which need millions of annotated query-title pairs during the pre-training stage, and this data often does not take user intent into account. To tackle this, we curate samples from existing datasets at eBay, manually annotated with buyer-centric relevance scores and centrality scores, which reflect how well the product title matches the users' intent. We introduce a User-intent Centrality Optimization (UCO) approach for existing models, which optimises for the user intent in semantic product search. To that end, we propose a dual-loss based optimisation to handle hard negatives, i.e., product titles that are semantically relevant but do not reflect the user's intent. Our contributions include curating challenging evaluation sets and implementing UCO, resulting in significant product ranking efficiency improvements observed for different evaluation metrics. Our work aims to ensure that the most buyer-centric titles for a query are ranked higher, thereby enhancing the user experience on e-commerce platforms.
Comment: EMNLP 2024: Industry track
- Published
- 2024
33. Mitigating the impact of noise transients in gravitational-wave searches using reduced basis timeseries and convolutional neural networks
- Author
-
Magee, Ryan, Sharma, Ritwik, Agrawal, Ananya, and Udall, Rhiannon
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics
- Abstract
Gravitational-wave detection pipelines have helped to identify over one hundred compact binary mergers in the data collected by the Advanced LIGO and Advanced Virgo interferometers, whose sensitivity has provided unprecedented access to the workings of the gravitational universe. The detectors are, however, subject to a wide variety of noise transients (or glitches) that can contaminate the data. Although detection pipelines utilize a variety of noise mitigation techniques, glitches can occasionally bypass these checks and produce false positives. One class of mitigation techniques is the signal consistency check, which aims to quantify how similar the observed data is to the expected signal. In this work, we describe a new signal consistency check that utilizes a set of bases that spans the gravitational-wave signal space and convolutional neural networks (CNN) to probabilistically identify glitches. We recast the basis response as a grayscale image and train a CNN to distinguish between gravitational waves and glitches with similar morphologies. We find that the CNN accurately classifies $\gtrsim 99\%$ of the responses it is shown. We compare these results to a toy detection pipeline, finding that the two methods produce similar false positive rates, but that the CNN has a significantly higher true positive rate. We modify our toy model detection pipeline and demonstrate that including information from the network increases the toy pipeline's true positive rate by $4-7\%$ while decreasing the false positive rate to a data-limited bound of $\lesssim 0.1\%$.
- Published
- 2024
34. Comparing Surface Landmine Object Detection Models on a New Drone Flyby Dataset
- Author
-
Agrawal-Chung, Navin and Moin, Zohran
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Landmine detection using traditional methods is slow, dangerous, and prohibitively expensive. Applying deep learning-based object detection algorithms to drone videos is promising but poses multiple challenges due to the small, soda-can size of recently prevalent surface landmines. The literature currently lacks scientific evaluation of optimal ML models for this problem, since most object detection research focuses on analysis of ground video surveillance images. To help train comprehensive models and drive research on surface landmine detection, we first create a custom dataset comprising drone images of POM-2 and POM-3 Russian surface landmines. Using this dataset, we train, test, and compare four computer vision foundation models: YOLOF, DETR, Sparse-RCNN, and VFNet. Generally, all four detectors perform well, with YOLOF outperforming the other models with a mAP score of 0.89, while the DETR, VFNet, and Sparse-RCNN mAP scores are all around 0.82 for drone images taken from 10 m AGL. YOLOF is also quicker to train, requiring 56 min of training time on an NVIDIA V100 compute cluster. Finally, this research contributes landmine image and video datasets and model Jupyter notebooks at https://github.com/UnVeilX/ to enable future research in surface landmine detection.
Comment: 9 pages, 22 figures, 7 tables
- Published
- 2024
35. Cthulhu: An Open Source Molecular and Atomic Cross Section Computation Code for Substellar Atmospheres
- Author
-
Agrawal, Arnav and MacDonald, Ryan J.
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Earth and Planetary Astrophysics, Astrophysics - Solar and Stellar Astrophysics
- Abstract
Atmospheric studies of exoplanets and brown dwarfs are a cutting-edge and rapidly evolving area of astrophysics research. Calculating models of exoplanet or brown dwarf spectra requires knowledge of the wavelength-dependent absorption of light (cross sections) by the molecules and atoms in the atmosphere. Here we introduce Cthulhu, a pure Python package that rapidly calculates cross sections from atomic and molecular line lists. Cthulhu includes modules to automatically download molecular line lists from online databases (e.g., ExoMol and HITRAN) and compute cross sections on a user-specified temperature, pressure, and wavenumber grid. Cthulhu requires only CPUs and can run on a user's laptop (for smaller line lists with < 100 million lines) or on a large cluster in parallel (for many billion lines). Cthulhu includes in-depth Jupyter tutorials in the online documentation. Finally, Cthulhu can be used as an educational tool to demystify the process of making cross sections for atmospheric models.
Comment: 7 pages, 1 figure, published in JOSS. Summon Cthulhu at https://cthulhu.readthedocs.io/en/latest/
- Published
- 2024
36. ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
- Author
-
Zhang, Chen Bo Calvin, Hong, Zhang-Wei, Pacchiano, Aldo, and Agrawal, Pulkit
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Robotics
- Abstract
Reward shaping is a critical component in reinforcement learning (RL), particularly for complex tasks where sparse rewards can hinder learning. While shaping rewards have been introduced to provide additional guidance, selecting effective shaping functions remains challenging and computationally expensive. This paper introduces Online Reward Selection and Policy Optimization (ORSO), a novel approach that frames shaping reward selection as an online model selection problem. ORSO employs principled exploration strategies to automatically identify promising shaping reward functions without human intervention, balancing exploration and exploitation with provable regret guarantees. We demonstrate ORSO's effectiveness across various continuous control tasks using the Isaac Gym simulator. Compared to traditional methods that fully evaluate each shaping reward function, ORSO significantly improves sample efficiency, reduces computational time, and consistently identifies high-quality reward functions that produce policies comparable to those generated by domain experts through hand-engineered rewards.
Comment: preprint, 35 pages, 23 figures
- Published
- 2024
37. Syn2Real Domain Generalization for Underwater Mine-like Object Detection Using Side-Scan Sonar
- Author
-
Agrawal, Aayush, Sikdar, Aniruddh, Makam, Rajini, Sundaram, Suresh, Besai, Suresh Kumar, and Gopi, Mahesh
- Subjects
Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
Underwater mine detection with deep learning suffers from limitations due to the scarcity of real-world data. This scarcity leads to overfitting, where models perform well on training data but poorly on unseen data. This paper proposes a Syn2Real (Synthetic to Real) domain generalization approach using diffusion models to address this challenge. We demonstrate that synthetic data generated with noise by DDPM and DDIM models, even if not perfectly realistic, can effectively augment real-world samples for training. The residual noise in the final sampled images improves the model's ability to generalize to real-world data with inherent noise and high variation. The baseline Mask-RCNN model, when trained on a combination of synthetic and original training data, exhibited an approximately 60% increase in Average Precision (AP) compared with training solely on the original training data. This significant improvement highlights the potential of Syn2Real domain generalization for underwater mine detection tasks.
Comment: 7 pages, 4 figures and 3 tables
- Published
- 2024
38. Leveraging Augmented Reality for Improved Situational Awareness During UAV-Driven Search and Rescue Missions
- Author
-
Nalamothu, Rushikesh, Sontha, Puneet, Karravula, Janardhan, and Agrawal, Ankit
- Subjects
Computer Science - Robotics
- Abstract
In the high-stakes domain of search-and-rescue missions, the deployment of Unmanned Aerial Vehicles (UAVs) has become increasingly pivotal. These missions require seamless, real-time communication among diverse roles within response teams, particularly between Remote Operators (ROs) and On-Site Operators (OSOs). Traditionally, ROs and OSOs have relied on radio communication to exchange critical information, such as the geolocation of victims, hazardous areas, and points of interest. However, radio communication lacks information visualization, suffers from noise, and requires mental effort to interpret, leading to miscommunications and misunderstandings. To address these challenges, this paper presents VizCom-AR, an Augmented Reality system designed to facilitate visual communication between ROs and OSOs and to improve their situational awareness during UAV-driven search-and-rescue missions. Our experiments, focus group sessions with police officers, and field study showed that VizCom-AR enhances the spatial awareness of both ROs and OSOs, facilitates geolocation information exchange, and effectively complements existing communication tools in UAV-driven emergency response missions. Overall, VizCom-AR offers a fundamental framework for designing Augmented Reality systems for large-scale UAV-driven rescue missions.
Comment: 8 pages
- Published
- 2024
39. Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models
- Author
-
Banerjee, Somnath, Layek, Sayan, Shrawgi, Hari, Mandal, Rajarshi, Halder, Avik, Kumar, Shanu, Basu, Sagnik, Agrawal, Parag, Hazra, Rima, and Mukherjee, Animesh
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
- Abstract
As LLMs are increasingly deployed in global applications, the importance of cultural sensitivity becomes paramount, ensuring that users from diverse backgrounds feel respected and understood. Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values. This work addresses the challenges of ensuring cultural sensitivity in LLMs, especially in small-parameter models that often lack the extensive training data needed to capture global cultural nuances. We present two key contributions: (1) A cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and (2) A culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators. These datasets facilitate the evaluation and enhancement of LLMs, ensuring their ethical and safe deployment across different cultural landscapes. Our results show that integrating culturally aligned feedback leads to a marked improvement in model behavior, significantly reducing the likelihood of generating culturally insensitive or harmful content. Ultimately, this work paves the way for more inclusive and respectful AI systems, fostering a future where LLMs can safely and ethically navigate the complexities of diverse cultural landscapes.
- Published
- 2024
40. Findings of the WMT 2024 Shared Task on Chat Translation
- Author
-
Mohammed, Wafaa, Agrawal, Sweta, Farajian, M. Amin, Cabarrão, Vera, Eikema, Bryan, Farinha, Ana C., and de Souza, José G. C.
- Subjects
Computer Science - Computation and Language
- Abstract
This paper presents the findings from the third edition of the Chat Translation Shared Task. As with previous editions, the task involved translating bilingual customer support conversations, specifically focusing on the impact of conversation context on translation quality and evaluation. We also include two new language pairs, English-Korean and English-Dutch, in addition to the set of language pairs from previous editions: English-German, English-French, and English-Brazilian Portuguese. We received 22 primary submissions and 32 contrastive submissions from eight teams, with each language pair having participation from at least three teams. We evaluated the systems comprehensively using both automatic metrics and human judgments via a direct assessment framework. The official rankings for each language pair were determined based on human evaluation scores, considering performance in both translation directions (agent and customer). Our analysis shows that while the systems excelled at translating individual turns, there is room for improvement in overall conversation-level translation quality.
Comment: 12 pages, 5 figures, 13 tables
- Published
- 2024
41. Preliminary Evaluation of an Ultrasound-Guided Robotic System for Autonomous Percutaneous Intervention
- Author
-
Mohan, Pratima, Agrawal, Aayush, and Patel, Niravkumar A.
- Subjects
Computer Science - Robotics
- Abstract
Cancer cases have been rising globally, resulting in nearly 10 million deaths in 2023. Biopsy, crucial for diagnosis, is often performed under ultrasound (US) guidance, demanding precise hand coordination and cognitive decision-making. Robot-assisted interventions have shown improved accuracy in lesion targeting by addressing challenges such as noisy 2D images and maintaining consistent probe-to-surface contact. Recent research has focused on fully autonomous robotic US systems to enable standardized diagnostic procedures and reproducible US-guided therapy. This study presents a fully autonomous system for US-guided needle placement capable of performing the end-to-end clinical workflow. The system autonomously: 1) identifies the liver region on the patient's abdomen surface, 2) plans and executes the US scanning path using impedance control, 3) localizes lesions from the US images in real time, and 4) targets the identified lesions, all without human intervention. This study evaluates both position- and impedance-controlled systems. Validation on agar phantoms demonstrated a targeting error of 5.74 ± 2.70 mm, highlighting its potential for accurately targeting tumors larger than 5 mm. These results show the potential of a fully autonomous system for US-guided biopsies.
Comment: 7 pages and 6 figures
- Published
- 2024
42. Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation
- Author
-
Zaranis, Emmanouil, Attanasio, Giuseppe, Agrawal, Sweta, and Martins, André F. T.
- Subjects
Computer Science - Computation and Language
- Abstract
The automatic assessment of translation quality has recently become crucial across several stages of the translation pipeline, from data curation to training and decoding. Although quality estimation (QE) metrics have been optimized to align with human judgments, no attention has been given to these metrics' potential biases, particularly in reinforcing visibility and usability for some demographic groups over others. This study is the first to investigate gender bias in QE metrics and its downstream impact on machine translation (MT). Focusing on out-of-English translations into languages with grammatical gender, we ask: Do contemporary QE metrics exhibit gender bias? Can the use of contextual information mitigate this bias? How does QE influence gender bias in MT outputs? Experiments with state-of-the-art QE metrics across multiple domains, datasets, and languages reveal significant bias. Masculine-inflected translations score higher than feminine-inflected ones, and gender-neutral translations are penalized. Moreover, context-aware QE metrics reduce errors for masculine-inflected references but fail to address feminine referents, exacerbating gender disparities. Additionally, QE metrics can perpetuate gender bias in MT systems when used in quality-aware decoding. Our findings underscore the need to address gender bias in QE metrics to ensure equitable and unbiased MT systems. Comment: Work in progress
- Published
- 2024
43. Beyond-RAG: Question Identification and Answer Generation in Real-Time Conversations
- Author
-
Agrawal, Garima, Gummuluri, Sashank, and Spera, Cosimo
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
In customer contact centers, human agents often struggle with long average handling times (AHT) because they must manually interpret queries and retrieve relevant knowledge base (KB) articles. While retrieval-augmented generation (RAG) systems using large language models (LLMs) have been widely adopted in industry to assist with such tasks, RAG faces challenges in real-time conversations, such as inaccurate query formulation and redundant retrieval of frequently asked questions (FAQs). To address these limitations, we propose a decision support system that looks beyond RAG by first identifying customer questions in real time. If the query matches an FAQ, the system retrieves the answer directly from the FAQ database; otherwise, it generates answers via RAG. Our approach reduces reliance on manual queries, providing responses to agents within 2 seconds. Deployed in an AI-powered human-agent assist solution at Minerva CQ, this system improves efficiency, reduces AHT, and lowers operational costs. We also introduce an automated LLM-agentic workflow to identify FAQs from historical transcripts when no predefined FAQs exist.
- Published
- 2024
44. Search for non-virialized axions with 3.3-4.2 $\mu$eV mass at selected resolving powers
- Author
-
Hipp, A. T., Quiskamp, A., Caligiure, T. J., Gleason, J. R., Han, Y., Jois, S., Sikivie, P., Solano, M. E., Sullivan, N. S., Tanner, D. B., Goryachev, M., Hartman, E., Tobar, M. E., McAllister, B. T., Duffy, L. D., Braine, T., Burns, E., Cervantes, R., Crisosto, N., Goodman, C., Guzzetti, M., Hanretty, C., Lee, S., Korandla, H., Leum, G., Mohapatra, P., Nitta, T., Rosenberg, L. J., Rybka, G., Sinnis, J., Zhang, D., Bartram, C., Dyson, T. A., Kuo, C. L., Ruppert, S., Withers, M. O., Awida, M. H., Bowring, D., Chou, A. S., Hollister, M., Knirck, S., Sonnenschein, A., Wester, W., Brodsky, J., Carosi, G., Du, N., Roberston, N., Woollett, N., Boutan, C., Jones, A. M., LaRoque, B. H., Lentz, E., Man, N. E., Oblath, N. S., Taubman, M. S., Yang, J., Khatiwada, R., Clarke, John, Siddiqi, I., Agrawal, A., Dixit, A. V., Daw, E. J., Perry, M. G., Buckley, J. H., Gaikwad, C., Hoffman, J., Murch, K. W., and Russell, J.
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics
- Abstract
The Axion Dark Matter eXperiment is sensitive to narrow axion flows, provided that axions make up a fraction of the dark matter with a non-negligible local density. Detecting these low-velocity-dispersion flows requires high spectral resolution and careful attention to the expected signal modulation due to Earth's motion. We report an exclusion of the local axion dark matter density in narrow flows of $\rho_a \gtrsim 0.03\,\mathrm{GeV/cm^3}$ and $\rho_a \gtrsim 0.004\,\mathrm{GeV/cm^3}$ for Dine-Fischler-Srednicki-Zhitnitsky and Kim-Shifman-Vainshtein-Zakharov axion-photon couplings, respectively, over the mass range $3.3-4.2\,\mu\text{eV}$. Measurements were made at selected resolving powers to allow for a range of possible velocity dispersions. Comment: 7 pages, 3 figures
- Published
- 2024
45. Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms
- Author
-
Kumar, Navdeep, Agrawal, Priyank, Ramponi, Giorgia, Levy, Kfir Yehuda, and Mannor, Shie
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
In this paper, we establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(\epsilon^{-3})$, advancing beyond the existing local convergence results. Previous works provide local convergence guarantees with a sample complexity of $O(\epsilon^{-2})$ for bounding the squared gradient of the return, which translates to a global sample complexity of $O(\epsilon^{-4})$ using the gradient domination lemma. In contrast to traditional methods that employ decreasing step sizes for both the actor and critic, we demonstrate that a constant step size for the critic is sufficient to ensure convergence in expectation. This key insight reveals that using a decreasing step size for the actor alone is sufficient to handle the noise for both the actor and critic. Our findings provide theoretical support for the practical success of many algorithms that rely on constant step sizes.
- Published
- 2024
46. Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation
- Author
-
Agrawal, Sweta, de Souza, José G. C., Rei, Ricardo, Farinhas, António, Faria, Gonçalo, Fernandes, Patrick, Guerreiro, Nuno M, and Martins, Andre
- Subjects
Computer Science - Computation and Language
- Abstract
Alignment with human preferences is an important step in developing accurate and safe large language models. Machine translation (MT) is no exception: better handling of language nuances and context-specific variations leads to improved quality. However, preference data based on human feedback can be very expensive to obtain and curate at a large scale. Automatic metrics, on the other hand, can induce preferences, but they might not match human expectations perfectly. In this paper, we propose an approach that leverages the best of both worlds. We first collect sentence-level quality assessments from professional linguists on translations generated by multiple high-quality MT systems and evaluate the ability of current automatic metrics to recover these preferences. We then use this analysis to curate a new dataset, MT-Pref (metric-induced translation preference), which comprises 18k instances covering 18 language directions, using post-2022 texts sourced from multiple domains. We show that aligning TOWER models on MT-Pref significantly improves translation quality on the WMT23 and FLORES benchmarks. Comment: Accepted at EMNLP Main 2024
- Published
- 2024
47. Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
- Author
-
Wu, Tong, Zhang, Shujian, Song, Kaiqiang, Xu, Silei, Zhao, Sanqiang, Agrawal, Ravi, Indurthi, Sathish Reddy, Xiang, Chong, Mittal, Prateek, and Zhou, Wenxuan
- Subjects
Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Large Language Models (LLMs) are susceptible to security and safety threats, such as prompt injection, prompt extraction, and harmful requests. One major cause of these vulnerabilities is the lack of an instruction hierarchy. Modern LLM architectures treat all inputs equally, failing to distinguish between and prioritize various types of instructions, such as system messages, user prompts, and data. As a result, lower-priority user prompts may override more critical system instructions, including safety protocols. Existing approaches to achieving instruction hierarchy, such as delimiters and instruction-based training, do not address this issue at the architectural level. Inspired by BERT, we bring the Instructional Segment Embedding (ISE) technique to modern large language models; it embeds instruction priority information directly into the model. This approach enables models to explicitly differentiate and prioritize various instruction types, significantly improving safety against malicious prompts that attempt to override priority rules. Our experiments on the Structured Query and Instruction Hierarchy benchmarks demonstrate average robust accuracy increases of up to 15.75% and 18.68%, respectively. Furthermore, we observe an improvement of up to 4.1% in instruction-following capability, as evaluated on AlpacaEval. Overall, our approach offers a promising direction for enhancing the safety and effectiveness of LLM architectures. Comment: Preprint
- Published
- 2024
48. Pixtral 12B
- Author
-
Agrawal, Pravesh, Antoniak, Szymon, Hanna, Emma Bou, Bout, Baptiste, Chaplot, Devendra, Chudnovsky, Jessica, Costa, Diogo, De Monicault, Baudouin, Garg, Saurabh, Gervet, Theophile, Ghosh, Soham, Héliou, Amélie, Jacob, Paul, Jiang, Albert Q., Khandelwal, Kartik, Lacroix, Timothée, Lample, Guillaume, Casas, Diego Las, Lavril, Thibaut, Scao, Teven Le, Lo, Andy, Marshall, William, Martin, Louis, Mensch, Arthur, Muddireddy, Pavankumar, Nemychnikova, Valera, Pellat, Marie, Von Platen, Patrick, Raghuraman, Nikhil, Rozière, Baptiste, Sablayrolles, Alexandre, Saulnier, Lucile, Sauvestre, Romain, Shang, Wendy, Soletskyi, Roman, Stewart, Lawrence, Stock, Pierre, Studnia, Joachim, Subramanian, Sandeep, Vaze, Sagar, Wang, Thomas, and Yang, Sophia
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
- Abstract
We introduce Pixtral-12B, a 12-billion-parameter multimodal language model. Pixtral-12B is trained to understand both natural images and documents, achieving leading performance on various multimodal benchmarks and surpassing a number of larger models. Unlike many open-source models, Pixtral is also a cutting-edge text model for its size and does not compromise on natural language performance to excel in multimodal tasks. Pixtral uses a new vision encoder trained from scratch, which allows it to ingest images at their natural resolution and aspect ratio. This gives users flexibility in the number of tokens used to process an image. Pixtral can also process any number of images within its long context window of 128K tokens. Pixtral 12B substantially outperforms other open models of similar size (Llama-3.2 11B and Qwen-2-VL 7B). It also outperforms much larger open models such as Llama-3.2 90B while being 7x smaller. We further contribute an open-source benchmark, MM-MT-Bench, for evaluating vision-language models in practical scenarios, and provide detailed analysis and code for standardized evaluation protocols for multimodal LLMs. Pixtral-12B is released under the Apache 2.0 license.
- Published
- 2024
49. Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning
- Author
-
Singh, Ayush, Gupta, Mansi, Garg, Shivank, Kumar, Abhinav, and Agrawal, Vansh
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Vision-Language Models (VLMs) have transformed tasks requiring visual and reasoning abilities, such as image retrieval and Visual Question Answering (VQA). Despite their success, VLMs face significant challenges with tasks involving geometric reasoning, algebraic problem-solving, and counting. These limitations stem from difficulties in effectively integrating multiple modalities and accurately interpreting geometry-related tasks. Various works claim that introducing a captioning pipeline before VQA tasks enhances performance. We incorporated this pipeline for tasks involving geometry, algebra, and counting. We found that captioning results do not generalize; in particular, larger VLMs trained primarily on downstream QnA tasks showed random performance on math-related challenges. However, we present a promising alternative: task-based prompting, which enriches the prompt with task-specific guidance. This approach proves more effective than direct captioning methods for math-heavy problems.
- Published
- 2024
50. Give me a hint: Can LLMs take a hint to solve math problems?
- Author
-
Agrawal, Vansh, Singla, Pratham, Miglani, Amitoj Singh, Garg, Shivank, and Mangal, Ayush
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
State-of-the-art LLMs have shown poor logical and basic mathematical reasoning, and recent works try to improve their problem-solving abilities using prompting techniques. We propose giving "hints" to improve the language model's performance on advanced mathematical problems, taking inspiration from how humans approach math pedagogically. We also test robustness to adversarial hints and show that the models are sensitive to them. We demonstrate the effectiveness of our approach by evaluating diverse LLMs on a broad set of problems of varying difficulty and topic from the MATH dataset, comparing against techniques such as one-shot, few-shot, and chain-of-thought prompting.
- Published
- 2024