Author: "Bedi, P" / Database: arXiv - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Bedi, P" returned 115 results.

1. Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction

Author: Singh, Utsav, Chakraborty, Souradip, Suttle, Wesley A., Sadler, Brian M., Sahu, Anit Kumar, Shah, Mubarak, Namboodiri, Vinay P., and Bedi, Amrit Singh
Subjects: Computer Science - Machine Learning
Abstract: This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL) that addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks. HPO leverages maximum entropy reinforcement learning combined with token-level Direct Preference Optimization (DPO), eliminating the need for pre-trained reference policies that are typically unavailable in challenging robotic scenarios. Mathematically, we formulate HRL as a bi-level optimization problem and transform it into a primitive-regularized DPO formulation, ensuring feasible subgoal generation and avoiding degenerate solutions. Extensive experiments on challenging robotic navigation and manipulation tasks demonstrate impressive performance of HPO, where it shows an improvement of up to 35% over the baselines. Furthermore, ablation studies validate our design choices, and quantitative analyses confirm the ability of HPO to mitigate non-stationarity and infeasible subgoal generation issues in HRL.
Published: 2024

2. EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering

Author: Cheng, Kai, Li, Zhengyuan, Sun, Xingpeng, Min, Byung-Cheol, Bedi, Amrit Singh, and Bera, Aniket
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Embodied Question Answering (EQA) is an essential yet challenging task for robotic home assistants. Recent studies have shown that large vision-language models (VLMs) can be effectively utilized for EQA, but existing works either focus on video-based question answering without embodied exploration or rely on closed-form choice sets. In real-world scenarios, a robotic agent must efficiently explore and accurately answer questions in open-vocabulary settings. To address these challenges, we propose a novel framework called EfficientEQA for open-vocabulary EQA, which enables efficient exploration and accurate answering. In EfficientEQA, the robot actively explores unknown environments using Semantic-Value-Weighted Frontier Exploration, a strategy that prioritizes exploration based on semantic importance provided by calibrated confidence from black-box VLMs to quickly gather relevant information. To generate accurate answers, we employ Retrieval-Augmented Generation (RAG), which utilizes BLIP to retrieve useful images from accumulated observations and VLM reasoning to produce responses without relying on predefined answer choices. Additionally, we detect observations that are highly relevant to the question as outliers, allowing the robot to determine when it has sufficient information to stop exploring and provide an answer. Experimental results demonstrate the effectiveness of our approach, showing an improvement in answering accuracy by over 15% and efficiency, measured in running steps, by over 20% compared to state-of-the-art methods.
Published: 2024

3. ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment

Author: Obbad, Elyas, Mlauzi, Iddah, Miranda, Brando, Schaeffer, Rylan, Obbad, Kamal, Bedi, Suhana, and Koyejo, Sanmi
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Data selection is crucial for optimizing language model (LM) performance on specific tasks, yet most existing methods fail to effectively consider the target task distribution. Current approaches either ignore task-specific requirements entirely or rely on approximations that fail to capture the nuanced patterns needed for tasks like Autoformalization or code generation. Methods that do consider the target distribution often rely on simplistic, sometimes noisy, representations, like hashed n-gram features, which can lead to collisions and introduce noise. We introduce ZIP-FIT, a data selection framework that uses gzip compression to directly measure alignment between potential training data and the target task distribution. In extensive evaluations on Autoformalization and Python code generation, ZIP-FIT significantly outperforms leading baselines like DSIR and D4. Models trained on ZIP-FIT-selected data achieve their lowest cross-entropy loss up to 85.1% faster than baselines, demonstrating that better task alignment leads to more efficient learning. In addition, ZIP-FIT performs selection up to 65.8% faster than DSIR and two orders of magnitude faster than D4. Notably, ZIP-FIT shows that smaller, well-aligned datasets often outperform larger but less targeted ones, demonstrating that a small amount of higher quality data is superior to a large amount of lower quality data. Our results imply that task-aware data selection is crucial for efficient domain adaptation, and that compression offers a principled way to measure task alignment. By showing that targeted data selection can dramatically improve task-specific performance, our work provides new insights into the relationship between data quality, task alignment, and model learning efficiency.
Published: 2024
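
The ZIP-FIT abstract says candidate data are scored with gzip compression but does not state the scoring formula. A minimal sketch, assuming the standard normalized compression distance (NCD) as the alignment measure and a simple top-k selection; the function names are hypothetical and this is not the authors' implementation:

```python
import gzip


def compressed_len(data: bytes) -> int:
    """Length in bytes of the gzip-compressed input."""
    return len(gzip.compress(data))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: near 0 for similar texts, near 1 for unrelated ones."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_len(bx), compressed_len(by)
    cxy = compressed_len(bx + by)  # shared patterns make the concatenation compress better
    return (cxy - min(cx, cy)) / max(cx, cy)


def alignment(candidate: str, targets: list[str]) -> float:
    """Assumed alignment score: mean of (1 - NCD) against the target-task examples."""
    return sum(1.0 - ncd(candidate, t) for t in targets) / len(targets)


def select_top_k(candidates: list[str], targets: list[str], k: int) -> list[str]:
    """Keep the k candidates most aligned with the target examples."""
    return sorted(candidates, key=lambda c: alignment(c, targets), reverse=True)[:k]


if __name__ == "__main__":
    target = ["def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"]
    pool = [
        "The quick brown fox jumps over the lazy dog near the quiet river bank.",
        "def mul(a, b):\n    return a * b\n\ndef add(a, b):\n    return a + b\n",
    ]
    print(select_top_k(pool, target, k=1))
```

NCD needs no embeddings or tokenizer, which matches the "embedding-free" framing; the real system may normalize, batch, or aggregate scores differently.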

4. On The Global Convergence Of Online RLHF With Neural Parametrization

Author: Gaur, Mudit, Bedi, Amrit Singh, Pasupathy, Raghu, and Aggarwal, Vaneet
Subjects: Computer Science - Machine Learning
Abstract: The importance of Reinforcement Learning from Human Feedback (RLHF) in aligning large language models (LLMs) with human values cannot be overstated. RLHF is a three-stage process that includes supervised fine-tuning (SFT), reward learning, and policy learning. Although there are several offline and online approaches to aligning LLMs, they often suffer from distribution shift issues. These issues arise from the inability to accurately capture the distributional interdependence between the reward learning and policy learning stages. Consequently, this has led to various approximated approaches, but the theoretical insights and motivations remain largely limited to tabular settings, which do not hold in practice. This gap between theoretical insights and practical implementations is critical. It is challenging to address this gap as it requires analyzing the performance of AI alignment algorithms in neural network-parameterized settings. Although bi-level formulations have shown promise in addressing distribution shift issues, they suffer from the hyper-gradient problem, and current approaches lack efficient algorithms to solve this. In this work, we tackle these challenges by employing the bi-level formulation laid out in Kwon et al. (2024) along with the Weak Gradient Domination assumption to demonstrate convergence in an RLHF setup, obtaining a sample complexity of $\epsilon^{-\frac{7}{2}}$. Our key contributions are twofold: (i) We propose a bi-level formulation for AI alignment in parameterized settings and introduce a first-order approach to solve this problem. (ii) We analyze the theoretical convergence rates of the proposed algorithm and derive state-of-the-art bounds. To the best of our knowledge, this is the first work to establish convergence rate bounds and global optimality for the RLHF framework in neural network-parameterized settings.
Published: 2024

5. rECGnition_v1.0: Arrhythmia detection using cardiologist-inspired multi-modal architecture incorporating demographic attributes in ECG

Author: Srivastava, Shreya, Kumar, Durgesh, Bedi, Jatin, Seth, Sandeep, and Sharma, Deepak
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Substantial variability in ECGs arising from patient characteristics hinders the adoption of automated analysis algorithms in clinical practice. None of the ECG annotators developed to date considers patient characteristics in a multi-modal architecture. We employed the XGBoost model to analyze the UCI Arrhythmia dataset, linking patient characteristics to ECG morphological changes. The model accurately classified patient gender using discriminative ECG features with 87.75% confidence. We propose a novel multi-modal methodology for ECG analysis and arrhythmia classification that can help overcome the variability in ECG related to patient-specific conditions. This deep learning algorithm, named rECGnition_v1.0 (robust ECG abnormality detection Version 1), fuses Beat Morphology with Patient Characteristics to create a discriminative feature map that captures the internal correlation between both modalities. A Squeeze-and-Excitation-based Patient characteristic Encoding Network (SEPcEnet) is introduced to encode the patient's demographics. The trained model outperformed various existing algorithms, achieving an overall F1-score of 0.986 for ten-class arrhythmia classification on MITDB and near-perfect prediction scores of ~0.99 for LBBB, RBBB, Premature ventricular contraction beat, Atrial premature beat and Paced beat. Subsequently, the methodology was validated across INCARTDB, EDB and different class groups of MITDB using transfer learning. The generalizability test provided F1-scores of 0.980, 0.946, 0.977, and 0.980 for INCARTDB, EDB, MITDB AAMI, and MITDB Normal vs. Abnormal Classification, respectively. Therefore, with an enhanced and more comprehensive understanding of the patient being examined and their ECG for diverse CVD manifestations, the proposed rECGnition_v1.0 algorithm paves the way for its deployment in clinics.
Published: 2024

6. On the Sample Complexity of a Policy Gradient Algorithm with Occupancy Approximation for General Utility Reinforcement Learning

Author: Barakat, Anas, Chakraborty, Souradip, Yu, Peihong, Tokekar, Pratap, and Bedi, Amrit Singh
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Reinforcement learning with general utilities has recently gained attention thanks to its ability to unify several problems, including imitation learning, pure exploration, and safe RL. However, prior work for solving this general problem in a unified way has mainly focused on the tabular setting. This is restrictive when considering larger state-action spaces because of the need to estimate occupancy measures during policy optimization. In this work, we address this issue and propose to approximate occupancy measures within a function approximation class using maximum likelihood estimation (MLE). We propose a simple policy gradient algorithm (PG-OMA) where an actor updates the policy parameters to maximize the general utility objective whereas a critic approximates the occupancy measure using MLE. We provide a sample complexity analysis of PG-OMA showing that our occupancy measure estimation error only scales with the dimension of our function approximation class rather than the size of the state-action space. Under suitable assumptions, we establish first-order stationarity and global optimality performance bounds for the proposed PG-OMA algorithm for nonconcave and concave general utilities, respectively. We complement our methodological and theoretical findings with promising empirical results showing the scalability potential of our approach compared to existing tabular count-based approaches.
Comment: 26 pages, 5 figures
Published: 2024

7. AIME: AI System Optimization via Multiple LLM Evaluators

Author: Patel, Bhrij, Chakraborty, Souradip, Suttle, Wesley A., Wang, Mengdi, Bedi, Amrit Singh, and Manocha, Dinesh
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Text-based AI system optimization typically involves a feedback loop scheme where a single LLM generates an evaluation in natural language of the current output to improve the next iteration's output. However, in this work, we empirically demonstrate that for a practical and complex task (code generation) with multiple criteria to evaluate, utilizing only one LLM evaluator tends to let errors in generated code go undetected, thus leading to incorrect evaluations and ultimately suboptimal test case performance. Motivated by this failure case, we assume there exists an optimal evaluation policy that samples an evaluation between response and ground truth. We then theoretically prove that a linear combination of multiple evaluators can approximate this optimal policy. From this insight, we propose AI system optimization via Multiple LLM Evaluators (AIME). AIME is an evaluation protocol that utilizes multiple LLMs that each independently generate an evaluation on separate criteria and then combine them via concatenation. We provide an extensive empirical study showing AIME outperforming baseline methods in code generation tasks, with up to $62\%$ higher error detection rate and up to $16\%$ higher success rate than a single LLM evaluation protocol on LeetCodeHard and HumanEval datasets. We also show that the selection of the number of evaluators and which criteria to utilize is non-trivial, as it can affect the success rate by up to $12\%$.
Comment: 21 pages, 10 Figures, 4 Tables
Published: 2024
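
The AIME abstract describes combining several independent, criterion-specific evaluations by concatenation. A minimal sketch of that combination step, assuming each evaluator is a plain Python callable standing in for an LLM call; `combine_evaluations` and the stub evaluators are hypothetical names, not the paper's code:

```python
from typing import Callable, Dict

# Hypothetical stand-in for an LLM evaluator: (task, response) -> natural-language evaluation.
Evaluator = Callable[[str, str], str]


def combine_evaluations(task: str, response: str, evaluators: Dict[str, Evaluator]) -> str:
    """Query one evaluator per criterion independently, then concatenate the outputs."""
    sections = []
    for criterion, evaluate in evaluators.items():  # dicts preserve insertion order
        feedback = evaluate(task, response)  # evaluators run independently; none sees another's output
        sections.append(f"[{criterion}]\n{feedback}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    stubs = {
        "correctness": lambda task, resp: "Fails when the input list is empty.",
        "readability": lambda task, resp: "Variable names are clear.",
    }
    print(combine_evaluations("Sum a list.", "def f(xs): return xs[0] + sum(xs[1:])", stubs))
```

The concatenated text would then serve as the feedback for the next optimization iteration; how the paper phrases, orders, or selects criteria is not specified in the abstract.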

8. Auction-Based Regulation for Artificial Intelligence

Author: Bornstein, Marco, Che, Zora, Julapalli, Suhas, Mohamed, Abdirisak, Bedi, Amrit Singh, and Huang, Furong
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Economics - General Economics
Abstract: In an era of "moving fast and breaking things", regulators have moved slowly to pick up the safety, bias, and legal pieces left in the wake of broken Artificial Intelligence (AI) deployment. Since AI models, such as large language models, are able to push misinformation and stoke division within our society, it is imperative for regulators to employ a framework that mitigates these dangers and ensures user safety. While there is much-warranted discussion about how to address the safety, bias, and legal woes of state-of-the-art AI models, rigorous and realistic mathematical frameworks for regulating AI safety are lacking. We take on this challenge, proposing an auction-based regulatory mechanism that provably incentivizes model-building agents (i) to deploy safer models and (ii) to participate in the regulation process. We provably guarantee, via derived Nash Equilibria, that each participating agent's best strategy is to submit a model safer than a prescribed minimum-safety threshold. Empirical results show that our regulatory auction boosts safety and participation rates by 20% and 15% respectively, outperforming simple regulatory frameworks that merely enforce minimum safety standards.
Comment: 20 pages, 7 figures
Published: 2024

9. meds_reader: A fast and efficient EHR processing library

Author: Steinberg, Ethan, Wornow, Michael, Bedi, Suhana, Fries, Jason Alan, McDermott, Matthew B. A., and Shah, Nigam H.
Subjects: Computer Science - Machine Learning, Computer Science - Databases
Abstract: The growing demand for machine learning in healthcare requires processing increasingly large electronic health record (EHR) datasets, but existing pipelines are not computationally efficient or scalable. In this paper, we introduce meds_reader, an optimized Python package for efficient EHR data processing that is designed to take advantage of many intrinsic properties of EHR data for improved speed. We then demonstrate the benefits of meds_reader by reimplementing key components of two major EHR processing pipelines, achieving 10-100x improvements in memory, speed, and disk usage. The code for meds_reader can be found at https://github.com/som-shahlab/meds_reader.
Published: 2024

10. Solving the strong CP problem with massless grand-color quarks

Author: Bedi, Ravneet, Gherghetta, Tony, and Harigaya, Keisuke
Subjects: High Energy Physics - Phenomenology
Abstract: We propose a solution to the strong CP problem that specifically relies on massless quarks and has no light axion. The QCD color group $SU(3)_c$ is embedded into a larger, simple gauge group (grand-color) where one of the massless, colored fermions enjoys an anomalous chiral symmetry, rendering the strong CP phase unphysical. The grand-color gauge group $G_{\rm GC}$ is Higgsed down to $SU(3)_c\times G_{c'}$, after which $G_{c'}$ eventually confines at a lower scale, spontaneously breaking the chiral symmetry and generating a real, positive mass for the massless, colored fermion. Since the chiral symmetry has a $G_{c'}$ anomaly, there is no corresponding light Nambu-Goldstone boson. The anomalous chiral symmetry can be an accidental symmetry that arises from an exact discrete symmetry without introducing a domain wall problem. Potential experimental signals of our mechanism include vector-like quarks near the TeV scale, pseudo Nambu-Goldstone bosons below the 10 GeV scale, light dark matter decay, and primordial gravitational waves from the new strong dynamics.
Comment: 62 pages, 12 figures
Published: 2024

11. CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk

Author: Chehade, Mohamad Fares El Hajj, Bedi, Amrit Singh, Zhang, Amy, and Zhu, Hao
Subjects: Computer Science - Machine Learning
Abstract: Transfer learning in reinforcement learning (RL) has become a pivotal strategy for improving data efficiency in new, unseen tasks by utilizing knowledge from previously learned tasks. This approach is especially beneficial in real-world deployment scenarios where computational resources are constrained and agents must adapt rapidly to novel environments. However, current state-of-the-art methods often fall short in ensuring safety during the transfer process, particularly when unforeseen risks emerge in the deployment phase. In this work, we address these limitations by introducing a novel Caution-Aware Transfer Learning (CAT) framework. Unlike traditional approaches that limit risk considerations to mean-variance, we define "caution" as a more generalized and comprehensive notion of risk. Our core innovation lies in optimizing a weighted sum of reward return and caution (based on state-action occupancy measures) during the transfer process, allowing for a rich representation of diverse risk factors. To the best of our knowledge, this is the first work to explore the optimization of such a generalized risk notion within the context of transfer RL. Our contributions are threefold: (1) We propose a Caution-Aware Transfer (CAT) framework that evaluates source policies within the test environment and constructs a new policy that balances reward maximization and caution. (2) We derive theoretical sub-optimality bounds for our method, providing rigorous guarantees of its efficacy. (3) We empirically validate CAT, demonstrating that it consistently outperforms existing methods by delivering safer policies under varying risk conditions in the test tasks.
Published: 2024

12. TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation

Author: Sun, Xingpeng, Zhang, Yiran, Tang, Xindi, Bedi, Amrit Singh, and Bera, Aniket
Subjects: Computer Science - Robotics
Abstract: While LLMs are proficient at processing text in human conversations, they often struggle with the nuances of verbal instructions and thus remain prone to hallucinating trust in human commands. In this work, we present TrustNavGPT, an LLM-based, audio-guided navigation agent that uses affective cues in spoken communication, such as tone and inflection, that convey meaning beyond words, allowing it to assess the trustworthiness of human commands and make effective, safe decisions. Our approach is lightweight yet effective: it extends existing LLMs to model the vocal features embedded in a voice command and to model uncertainty for safe robotic navigation.
Published: 2024

13. SAIL: Self-Improving Efficient Online Alignment of Large Language Models

Author: Ding, Mucong, Chakraborty, Souradip, Agrawal, Vibhu, Che, Zora, Koppel, Alec, Wang, Mengdi, Bedi, Amrit, and Huang, Furong
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which can lead to sub-optimal performance. On the other hand, recent literature has focused on designing online RLHF methods but still lacks a unified conceptual formulation and suffers from distribution shift issues. To address this, we establish that online LLM alignment is underpinned by bilevel optimization. By reducing this formulation to an efficient single-level first-order method (using the reward-policy equivalence), our approach generates new samples and iteratively refines model alignment by exploring responses and regulating preference labels. In doing so, we permit alignment methods to operate in an online and self-improving manner, as well as generalize prior online RLHF methods as special cases. Compared to state-of-the-art iterative RLHF methods, our approach significantly improves alignment performance on open-sourced datasets with minimal computational overhead.
Comment: 24 pages, 6 figures, 3 tables
Published: 2024

14. Multi-LLM QA with Embodied Exploration

Author: Patel, Bhrij, Dorbala, Vishnu Sashank, Bedi, Amrit Singh, and Manocha, Dinesh
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models (LLMs) have grown in popularity due to their natural language interface and pre-trained knowledge, leading to rapidly increasing success in question-answering (QA) tasks. More recently, multi-agent systems with LLM-based agents (Multi-LLM) have been increasingly used for QA. In these scenarios, the models may each answer the question and reach a consensus, or each model may be specialized to answer questions from a different domain. However, most prior work dealing with Multi-LLM QA has focused on scenarios where the models are asked in a zero-shot manner or are given information sources to extract the answer. For question answering of an unknown environment, embodied exploration of the environment is first needed to answer the question. This skill is necessary for personalizing embodied AI to environments such as households. There is a lack of insight into whether a Multi-LLM system can handle question-answering based on observations from embodied exploration. In this work, we address this gap by investigating the use of Multi-Embodied LLM Explorers (MELE) for QA in an unknown environment. Multiple LLM-based agents independently explore and then answer queries about a household environment. We analyze different aggregation methods to generate a single, final answer for each query: debating, majority voting, and training a central answer module (CAM). Using CAM, we observe $46\%$ higher accuracy compared with the other non-learning-based aggregation methods. We provide code and the query dataset for further research.
Comment: 16 pages, 9 Figures, 5 Tables
Published: 2024
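
Of the three aggregation methods listed in the MELE abstract, majority voting is the simplest to illustrate. A minimal sketch, assuming each agent returns a short free-text answer, and that case/whitespace normalization and earliest-agent tie-breaking are acceptable; these choices and the `majority_vote` name are assumptions, not details from the paper:

```python
from collections import Counter
from typing import List


def majority_vote(answers: List[str]) -> str:
    """Aggregate one answer per agent by plurality vote after light normalization."""
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [a.strip().lower() for a in answers]  # treat "Kitchen" and "kitchen " as equal
    counts = Counter(normalized)
    top = max(counts.values())
    # Tie-break deterministically: the earliest agent whose answer has the top count wins.
    return next(a for a in normalized if counts[a] == top)


if __name__ == "__main__":
    print(majority_vote(["Kitchen", "bedroom", "kitchen "]))
```

Exact-string voting only works for short, canonical answers; free-form answers would need semantic matching, which is one reason a learned aggregator such as the paper's CAM could help.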

15. DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning

Author: Singh, Utsav, Chakraborty, Souradip, Suttle, Wesley A., Sadler, Brian M., Namboodiri, Vinay P, and Bedi, Amrit Singh
Subjects: Computer Science - Machine Learning
Abstract: Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, while limited human preference data often is available, so making efficient use of such data to guide learning is essential. Methods for learning to perform complex robotics tasks from human preference data must overcome both these challenges simultaneously. In this work, we introduce DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning, an efficient hierarchical approach that leverages direct preference optimization to learn a higher-level policy and reinforcement learning to learn a lower-level policy. DIPPER enjoys improved computational efficiency due to its use of direct preference optimization instead of standard preference-based approaches such as reinforcement learning from human feedback, while it also mitigates the well-known hierarchical reinforcement learning issues of non-stationarity and infeasible subgoal generation due to our use of primitive-informed regularization inspired by a novel bi-level optimization formulation of the hierarchical reinforcement learning problem. To validate our approach, we perform extensive experimental analysis on a variety of challenging robotics tasks, demonstrating that DIPPER outperforms hierarchical and non-hierarchical baselines, while ameliorating the non-stationarity and infeasible subgoal generation issues of hierarchical reinforcement learning.
Published: 2024
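
DIPPER, like HPO earlier in this list, builds on direct preference optimization. For reference, a minimal sketch of the standard per-pair DPO loss, $-\log\sigma\left(\beta\left[(\log\pi(y_w\mid x)-\log\pi_{\mathrm{ref}}(y_w\mid x))-(\log\pi(y_l\mid x)-\log\pi_{\mathrm{ref}}(y_l\mid x))\right]\right)$. This is the generic objective with a reference policy, not the primitive-regularized formulations these papers derive, and `beta=0.1` is an arbitrary illustrative value:

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w: float, logp_l: float, ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair where response w is preferred over response l.

    Arguments are sequence log-probabilities under the policy being trained
    (logp_*) and under a frozen reference policy (ref_logp_*).
    """
    # Implicit reward margin: beta times the difference of policy/reference log-ratios.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


if __name__ == "__main__":
    # Policy raised the preferred response's log-prob relative to the reference.
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

Minimizing this loss pushes the policy's log-ratio for the preferred response above that of the rejected one, without fitting an explicit reward model; that is the source of the efficiency gain over RLHF that the DIPPER abstract mentions.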

16. Transfer Q Star: Principled Decoding for LLM Alignment

Author: Chakraborty, Souradip, Ghosal, Soumya Suvra, Yin, Ming, Manocha, Dinesh, Wang, Mengdi, Bedi, Amrit Singh, and Huang, Furong
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function ($Q^*$), which is often unavailable in practice. Hence, prior SoTA methods either approximate this $Q^*$ using $Q^{\pi_{\texttt{sft}}}$ (derived from the reference $\texttt{SFT}$ model) or rely on short-term rewards, resulting in sub-optimal decoding performance. In this work, we propose Transfer $Q^*$, which implicitly estimates the optimal value function for a target reward $r$ through a baseline model $\rho_{\texttt{BL}}$ aligned with a baseline reward $r_{\texttt{BL}}$ (which can be different from the target reward $r$). Theoretical analyses of Transfer $Q^*$ provide a rigorous characterization of its optimality, deriving an upper bound on the sub-optimality gap and identifying a hyperparameter to control the deviation from the pre-trained reference $\texttt{SFT}$ model based on user needs. Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods and demonstrates superior empirical performance across key metrics such as coherence, diversity, and quality in extensive tests on several synthetic and real datasets.
Published: 2024

17. FACT or Fiction: Can Truthful Mechanisms Eliminate Federated Free Riding?

Author: Bornstein, Marco, Bedi, Amrit Singh, Mohamed, Abdirisak, and Huang, Furong
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning, Economics - Theoretical Economics
Abstract: Standard federated learning (FL) approaches are vulnerable to the free-rider dilemma: participating agents can contribute little to nothing yet receive a well-trained aggregated model. While prior mechanisms attempt to solve the free-rider dilemma, none have addressed the issue of truthfulness. In practice, adversarial agents can provide false information to the server in order to cheat their way out of contributing to federated training. In an effort to make free-riding-averse federated mechanisms truthful, and consequently less prone to breaking down in practice, we propose FACT. FACT is the first federated mechanism that: (1) eliminates federated free riding by using a penalty system, (2) ensures agents provide truthful information by creating a competitive environment, and (3) encourages agent participation by offering better performance than training alone. Empirically, FACT avoids free-riding when agents are untruthful, and reduces agent loss by over 4x.
Comment: NeurIPS 2024, 19 pages, 7 figures
Published: 2024

18. Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization

Author: Gaur, Mudit, Bedi, Amrit Singh, Wang, Di, and Aggarwal, Vaneet
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: \textbf{M}ulti-layer neural network parametrization for actor/critic, \textbf{M}arkovian sampling, \textbf{C}ontinuous state-action spaces, the performance of the \textbf{L}ast iterate, and \textbf{G}lobal optimality. These aspects are practically significant and have been largely overlooked in existing theoretical analyses of AC algorithms. In this work, we address these gaps by providing the first comprehensive theoretical analysis of AC algorithms that encompasses all five crucial practical aspects (i.e., it covers the MMCLG criteria). We establish global convergence sample complexity bounds of $\tilde{\mathcal{O}}\left({\epsilon^{-3}}\right)$. We achieve this result through our novel use of the weak gradient domination property of MDPs and our unique analysis of the error in critic estimation.
Comment: Accepted at ICML 2024. This is a revised version of arXiv:2306.10486, where we have gone from finite action space to continuous action space, from average iterate convergence to last iterate convergence and from $\epsilon^{-4}$ to $\epsilon^{-3}$ sample complexity. The sample complexity of (Xu et al., 2020a) has been corrected to $\epsilon^{-3}$; it was incorrectly listed as $\epsilon^{-2}$ in prior versions.
Published: 2024

19. PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

Author: Singh, Utsav, Suttle, Wesley A., Sadler, Brian M., Namboodiri, Vinay P., and Bedi, Amrit Singh
Subjects: Computer Science - Machine Learning
Abstract: In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mitigate non-stationarity, which is common in existing hierarchical approaches, and demonstrates impressive performance across a range of challenging sparse-reward tasks. Since obtaining human feedback is typically impractical, we propose to replace the human-in-the-loop approach with our primitive-in-the-loop approach, which generates feedback using sparse rewards provided by the environment. Moreover, in order to prevent infeasible subgoal prediction and avoid degenerate solutions, we propose primitive-informed regularization that conditions higher-level policies to generate feasible subgoals for lower-level policies. We perform extensive experiments to show that PIPER mitigates non-stationarity in hierarchical reinforcement learning and achieves greater than 50$\%$ success rates in challenging, sparse-reward robotic environments, where most other baselines fail to achieve any significant progress.
Published: 2024

20. Bayesian modeling of co-occurrence microbial interaction networks

Author: Bedi, Tejasv, Zhu, Bencong, Neugent, Michael L., Lutz, Kevin C., De Nisco, Nicole J., and Li, Qiwei
Subjects: Statistics - Methodology
Abstract: The human body hosts microbiomes associated with the development and prevention of several diseases. These microbial organisms form complex interactions that are informative to the scientific community for explaining disease progression and prevention. Contrary to the traditional view of the microbiome as a singular, assortative network, we introduce a novel statistical approach using a weighted stochastic infinite block model to analyze the complex community structures within microbial co-occurrence interaction networks. Our model defines connections between microbial taxa using a novel semi-parametric rank-based correlation method on their transformed relative abundances within a fully connected network framework. Employing a Bayesian nonparametric approach, the proposed model effectively clusters taxa into distinct communities while estimating the number of communities. The posterior summary of taxa community membership is obtained from the posterior probability matrix, which can naturally resolve the label-switching problem. Through simulation studies and a real-world application to microbiome data from postmenopausal patients with recurrent urinary tract infections, we demonstrate that our method achieves superior clustering accuracy over alternative approaches. This advancement provides a more nuanced understanding of microbiome organization, with significant implications for disease research., Comment: 25 pages
Published: 2024

21. Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

Author: Patel, Bhrij, Suttle, Wesley A., Koppel, Alec, Aggarwal, Vaneet, Sadler, Brian M., Bedi, Amrit Singh, and Manocha, Dinesh
Subjects: Computer Science - Machine Learning
Abstract: In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of how long a Markov chain under a fixed policy takes to reach its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic because estimating the mixing time is difficult and expensive in environments with large state spaces, which in practice necessitates impractically long trajectories for effective gradient estimation. To address this limitation, we consider the Multi-level Actor-Critic (MAC) framework, which incorporates a Multi-level Monte-Carlo (MLMC) gradient estimator. With our approach, we effectively alleviate the dependency on mixing time knowledge, a first for global convergence in average-reward MDPs. Furthermore, our approach exhibits the tightest dependence on the mixing time known from prior work, $\mathcal{O}\left( \sqrt{\tau_{mix}} \right)$. With a 2D grid world goal-reaching navigation experiment, we demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average-reward settings., Comment: 26 Pages, 2 Figures
Published: 2024

22. Right Place, Right Time! Towards ObjectNav for Non-Stationary Goals

Author: Dorbala, Vishnu Sashank, Patel, Bhrij, Bedi, Amrit Singh, and Manocha, Dinesh
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a novel approach to tackle the ObjectNav task for non-stationary and potentially occluded targets in an indoor environment. We refer to this task as Portable ObjectNav (or P-ObjectNav), and in this work, present its formulation, feasibility, and a navigation benchmark using a novel memory-enhanced LLM-based policy. In contrast to ObjectNav, where target object locations are fixed for each episode, P-ObjectNav tackles the challenging case where the target objects move during the episode. This adds a layer of time-sensitivity to navigation and is particularly relevant in scenarios where the agent needs to find portable targets (e.g., misplaced wallets) in human-centric environments. The agent needs to estimate not just the correct location of the target, but also the time at which the target is at that location for visual grounding, which raises the question of whether the task is feasible. We address this concern by examining two cases of object placement: one where the placed objects follow a routine or a path, and another where they are placed at random. We dynamize Matterport3D for these experiments and modify PPO and LLM-based navigation policies for evaluation. Using PPO, we observe that agent performance in the random case stagnates, while the agent in the routine-following environment continues to improve, allowing us to infer that P-ObjectNav is solvable in environments with routine-following object placement. Using memory enhancement on an LLM-based policy, we set a benchmark for P-ObjectNav. Our memory-enhanced agent significantly outperforms its non-memory-based counterparts across object placement scenarios, by 71.76% and 74.68% on average as measured by Success Rate (SR) and Success Rate weighted by Path Length (SRPL), showing the influence of memory on improving P-ObjectNav performance. Our code and dataset will be made publicly available., Comment: 32
Published: 2024

23. Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

Author: Yu, Peihong, Mishra, Manav, Koppel, Alec, Busart, Carl, Narayan, Priya, Manocha, Dinesh, Bedi, Amrit, and Tokekar, Pratap
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential growth of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent settings, its direct applicability to MARL is hindered by the practical difficulty of obtaining joint expert demonstrations. In this work, we introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team. These demonstrations pertain solely to single-agent behaviors and how each agent can achieve personal goals, without encompassing any cooperative elements; thus, naively imitating them will not achieve cooperation due to potential conflicts. To this end, we propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate, namely personalized expert-guided MARL (PegMARL). This algorithm utilizes two discriminators: the first provides incentives based on the alignment of policy behavior with demonstrations, and the second regulates incentives based on whether the behavior leads to the desired objective. We evaluate PegMARL using personalized demonstrations in both discrete and continuous environments. The results demonstrate that PegMARL learns near-optimal policies even when provided with suboptimal demonstrations and outperforms state-of-the-art MARL algorithms in solving coordinated tasks. We also showcase PegMARL's capability to leverage joint demonstrations in the StarCraft scenario and to converge effectively even with demonstrations from non-co-trained policies.
Published: 2024

24. Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics

Author: Wu, Xiyang, Chakraborty, Souradip, Xian, Ruiqi, Liang, Jing, Guan, Tianrui, Liu, Fuxiao, Sadler, Brian M., Manocha, Dinesh, and Bedi, Amrit Singh
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: In this paper, we highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications. Recent works focus on using LLMs and VLMs to improve the performance of robotics tasks, such as manipulation and navigation. Despite these improvements, the safety of such systems remains underexplored, although it is extremely critical. LLMs and VLMs are highly susceptible to adversarial inputs, which raises significant questions about the safety of robotic systems. This concern is important because robots operate in the physical world, where erroneous actions can result in severe consequences. This paper explores this issue thoroughly, presenting a mathematical formulation of potential attacks on LLM/VLM-based robotic systems and offering experimental evidence of the safety challenges. Our empirical findings highlight a significant vulnerability: simple modifications to the input can drastically reduce system effectiveness. Specifically, our results demonstrate an average performance deterioration of 19.4% under minor input prompt modifications and a more alarming 29.1% under slight perceptual changes. These findings underscore the urgent need for robust countermeasures to ensure the safe and reliable deployment of advanced LLM/VLM-based robotic systems.
Published: 2024

25. Small instanton-induced flavor invariants and the axion potential

Author: Bedi, Ravneet, Gherghetta, Tony, Grojean, Christophe, Guedes, Guilherme, Kley, Jonathan, and Vuong, Pham Ngoc Hoa
Subjects: High Energy Physics - Phenomenology
Abstract: Small instantons, which increase the axion mass due to an appropriate modification of QCD at a UV scale $\Lambda_{\rm SI}$, can also enhance the effect of CP-violating operators to shift the axion potential minimum by an amount, $\theta_{\rm ind}$, proportional to the flavorful couplings in the SMEFT. Since physical observables must be flavor-basis independent, we construct a basis of determinant-like flavor invariants that arise from instanton calculations containing the effects of dimension-six CP-odd operators at the scale $\Lambda_{\cancel{\rm CP}}$. This new basis provides a more reliable estimate of the shift $\theta_{\rm ind}$, which is severely constrained by neutron electric dipole moment experiments. In particular, for the case of four-quark, semi-leptonic and gluon dipole operators, these invariants are then used to provide improved limits on the ratio of scales $\Lambda_{\rm SI}/\Lambda_{\cancel{\rm CP}}$ for different flavor scenarios. The CP-odd flavor invariants also provide a classification of the leading effects from Wilson coefficients, and as an example, we show that a semi-leptonic four-fermion operator is subdominant compared to the four-quark operators. More generally, the flavor invariants, together with an instanton NDA, can be used to more accurately estimate small instanton effects in the axion potential that arise from any SMEFT operator., Comment: 49 pages, 4 figures; matches version published in JHEP
Published: 2024

26. MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

Author: Chakraborty, Souradip, Qiu, Jiahao, Yuan, Hui, Koppel, Alec, Huang, Furong, Manocha, Dinesh, Bedi, Amrit Singh, and Wang, Mengdi
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a single reward model derived from preference data. However, such an approach overlooks the rich diversity of human preferences inherent in data collected from multiple users. In this work, we first derive an impossibility result for alignment with single-reward RLHF, thereby highlighting its insufficiency in representing diverse human preferences. To provide an equitable solution to the problem, we learn a mixture of preference distributions via an expectation-maximization algorithm and propose a MaxMin alignment objective for policy learning, inspired by the egalitarian principle in social choice theory, to better represent diverse human preferences. We elucidate the connection of our proposed approach to distributionally robust optimization and general utility RL, thereby highlighting the generality and robustness of our proposed solution. We present comprehensive experimental results on small-scale (GPT-2) and large-scale (Tulu2-7B) language models and show the efficacy of the proposed approach in the presence of diversity among human preferences. Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms and improves the win-rate (accuracy) for minority groups by over 33% without compromising the performance of majority groups, showcasing the robustness and fairness of our approach. We remark that our findings in this work are not limited to language models but also extend to reinforcement learning in general.
Published: 2024

27. Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks

Author: Sun, Xingpeng, Meng, Haoming, Chakraborty, Souradip, Bedi, Amrit Singh, and Bera, Aniket
Subjects: Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: While LLMs excel at processing the text of human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of audio responses. These features are the aspects of spoken communication that do not involve the literal wording (lexical content) but convey meaning and nuance through how something is said. We present Beyond Text, an approach that improves LLM decision-making by integrating audio transcription with a subset of these features that focus on affect and are more relevant in human-robot conversations. This approach not only achieves a 70.26% winning rate, outperforming existing LLMs by 22.16% to 48.30% (gemini-1.5-pro and gpt-3.5, respectively), but also enhances robustness against token-manipulation adversarial attacks, as shown by a winning-rate decrease ratio 22.44% lower than that of the text-only language model. Beyond Text marks an advancement in social robot navigation and broader human-robot interaction, seamlessly integrating text-based guidance with human-audio-informed language models., Comment: 28 pages, 7 figures
Published: 2024

28. REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback

Author: Chakraborty, Souradip, Singh, Anukriti, Bhaskar, Amisha, Tokekar, Pratap, Manocha, Dinesh, and Bedi, Amrit Singh
Subjects: Computer Science - Robotics, Computer Science - Machine Learning
Abstract: The effectiveness of reinforcement learning (RL) agents in continuous control robotics tasks is heavily dependent on the design of the underlying reward function. However, a misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world. Current methods mitigate this misalignment by learning reward functions from human preferences; however, they inadvertently introduce a risk of reward overoptimization. In this work, we address this challenge by advocating for the adoption of regularized reward functions that more accurately mirror the intended behaviors. We propose a novel concept of reward regularization within the robotic RLHF (RL from Human Feedback) framework, which we refer to as agent preferences. Our approach uniquely incorporates not just human feedback in the form of preferences but also the preferences of the RL agent itself during the reward function learning process. This dual consideration significantly mitigates reward function overoptimization in RL. We provide a theoretical justification for the proposed approach by formulating the robotic RLHF problem as a bilevel optimization problem. We demonstrate the efficiency of our algorithm REBEL in several continuous control benchmarks, including DeepMind Control Suite [tassa2018deepmind] and MetaWorld [yu2021metaworld], and in high-dimensional visual environments, with an improvement of more than 70% in sample efficiency compared to current SOTA baselines. This showcases our approach's effectiveness in aligning reward functions with true behavioral intentions, setting a new benchmark in the field.
Published: 2023

29. Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

Author: Ghosal, Soumya Suvra, Chakraborty, Souradip, Geiping, Jonas, Huang, Furong, Manocha, Dinesh, and Bedi, Amrit Singh
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities for generating human-like text responses. However, despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs, such as spreading misinformation, generating fake news, plagiarism in academia, and contaminating the web. To address these concerns, the research community has broadly converged on developing algorithmic solutions to detect AI-generated text. The basic idea is that if we can tell whether a given text was written by a human or an AI, we can use this information to address the above-mentioned concerns. To that end, a plethora of detection frameworks have been proposed, highlighting the possibilities of AI-generated text detection. In parallel with the development of detection frameworks, however, researchers have also concentrated on designing strategies to evade detection, i.e., focusing on the impossibilities of AI-generated text detection. This is a crucial step toward ensuring that detection frameworks are robust and that detectors are not too easy to fool. Despite the huge interest and the flurry of research in this domain, the community currently lacks a comprehensive analysis of recent developments. In this survey, we aim to provide a concise categorization and overview of current work encompassing both the prospects and the limitations of AI-generated text detection. To enrich the collective knowledge, we engage in an exhaustive discussion of critical and challenging open questions related to ongoing research on AI-generated text detection.
Published: 2023

30. Towards Realistic Mechanisms That Incentivize Federated Participation and Contribution

Author: Bornstein, Marco, Bedi, Amrit Singh, Sahu, Anit Kumar, Khan, Furqan, and Huang, Furong
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Computers and Society, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning, Economics - Theoretical Economics
Abstract: Edge device participation in federated learning (FL) is typically studied through the lens of device-server communication (e.g., device dropout) and assumes that edge devices are always willing to participate in FL. As a result, current FL frameworks are flawed when implemented in realistic settings, with many encountering the free-rider dilemma. In a step toward pushing FL into realistic settings, we propose RealFM: the first federated mechanism that (1) realistically models device utility, (2) incentivizes data contribution and device participation, (3) provably removes the free-rider dilemma, and (4) relaxes assumptions on data homogeneity and data sharing. Compared to previous FL mechanisms, RealFM allows for a non-linear relationship between model accuracy and utility, which improves the utility gained by the server and participating devices. On real-world data, RealFM improves device and server utility, as well as data contribution, by over 3 and 4 orders of magnitude, respectively, compared to baselines., Comment: 24 pages, 11 figures
Published: 2023

31. Minimal surfaces over harmonic shears

Author: Bedi, Simran and Kumar, Sanjay
Subjects: Mathematics - Complex Variables
Abstract: Harmonic mappings have long intrigued researchers due to their intrinsic connection with minimal surfaces. In this paper, we investigate the shearing of two distinct classes of univalent conformal mappings that are convex in the horizontal direction, with appropriate dilatations. Subsequently, we present a family of minimal surfaces constructed by lifting the harmonic mappings obtained through the shear construction method of Clunie and Sheil-Small. Furthermore, we partially address an open problem proposed by Boyd and Dorff by identifying the resulting minimal surfaces for certain values of the parameters in one of the classes of mappings. Notably, this family of minimal surfaces transforms from the well-established Enneper's surface to a helicoid., Comment: 17 pages, 6 figures
Published: 2023

32. LANCAR: Leveraging Language for Context-Aware Robot Locomotion in Unstructured Environments

Author: Shek, Chak Lam, Wu, Xiyang, Suttle, Wesley A., Busart, Carl, Zaroukian, Erin, Manocha, Dinesh, Tokekar, Pratap, and Bedi, Amrit Singh
Subjects: Computer Science - Robotics
Abstract: Navigating robots through unstructured terrains is challenging, primarily due to dynamic environmental changes. While humans adeptly navigate such terrains by using context from their observations, creating a similar context-aware navigation system for robots is difficult. The essence of the issue lies in the acquisition and interpretation of context information, a task complicated by the inherent ambiguity of human language. In this work, we introduce LANCAR, which addresses this issue by combining a context translator with reinforcement learning (RL) agents for context-aware locomotion. LANCAR allows robots to comprehend context information sourced from human observers through Large Language Models (LLMs) and to convert this information into actionable context embeddings. These embeddings, combined with the robot's sensor data, provide a complete input for the RL agent's policy network. We provide an extensive evaluation of LANCAR under different levels of context ambiguity and compare it with alternative methods. The experimental results showcase superior generalizability and adaptability across different terrains. Notably, LANCAR shows at least a 7.4% increase in episodic reward over the best alternatives, highlighting its potential to enhance robotic navigation in unstructured environments. More details and experiment videos can be found at http://raaslab.org/projects/LLM_Context_Estimation/
Published: 2023

33. A generalized Bayesian stochastic block model for microbiome community detection

Author: Lutz, Kevin C., Neugent, Michael L., Bedi, Tejasv, De Nisco, Nicole J., and Li, Qiwei
Subjects: Statistics - Methodology, Statistics - Applications, Statistics - Computation, Statistics - Other Statistics
Abstract: Advances in next-generation sequencing technology have enabled the high-throughput profiling of metagenomes and accelerated microbiome research. Recently, there has been a rise in quantitative studies that aim to decipher the microbiome co-occurrence network and its underlying community structure based on metagenomic sequence data. Uncovering the complex microbiome community structure is essential to understanding the role of the microbiome in disease progression and susceptibility. Taxonomic abundance data generated from metagenomic sequencing technologies are high-dimensional and compositional, and suffer from uneven sampling depth, over-dispersion, and zero-inflation. These characteristics often challenge the reliability of current methods for microbiome community detection. To this end, we propose a Bayesian stochastic block model to study the microbiome co-occurrence network based on the recently developed modified centered log-ratio transformation tailored for microbiome data analysis. Our model allows us to incorporate taxonomic tree information using a Markov random field prior. The model parameters are jointly inferred using Markov chain Monte Carlo sampling techniques. Our simulation study showed that the proposed approach performs better than competing methods even when taxonomic tree information is non-informative. We applied our approach to a real urinary microbiome dataset from postmenopausal women, the first time the urinary microbiome co-occurrence network structure has been studied. In summary, this statistical methodology provides a new tool for facilitating advanced microbiome studies.
Published: 2023

34. PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback

Author: Chakraborty, Souradip, Bedi, Amrit Singh, Koppel, Alec, Manocha, Dinesh, Wang, Huazheng, Wang, Mengdi, and Huang, Furong
Subjects: Computer Science - Machine Learning
Abstract: We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback. We identify a major gap within current algorithmic designs for solving policy alignment: the lack of a precise characterization of how the alignment objective depends on the data generated by policy trajectories. This shortfall contributes to the sub-optimal performance observed in contemporary algorithms. Our framework addresses these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower optimal variable (optimal policy for the designed reward). Interestingly, from an optimization perspective, our formulation leads to a new class of stochastic bilevel problems where the stochasticity at the upper objective depends upon the lower-level variable. To the best of our knowledge, this work presents the first formulation of RLHF as a bilevel optimization problem, which generalizes the existing RLHF formulations and addresses their existing distribution shift issues. To demonstrate the efficacy of our formulation in resolving alignment issues in RL, we devised an algorithm named A-PARL to solve the PARL problem, establishing sample complexity bounds of order $\mathcal{O}(1/T)$. Our empirical results substantiate that the proposed PARL can address the alignment concerns in RL by showing significant improvements (up to 63% in terms of required samples) for policy alignment in large-scale environments of the DeepMind Control Suite and Meta-World tasks.
Published: 2023

35. On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization

Author: Gaur, Mudit, Bedi, Amrit Singh, Wang, Di, and Aggarwal, Vaneet
Subjects: Computer Science - Machine Learning, F.2.1
Abstract: Actor-critic algorithms have shown remarkable success in solving state-of-the-art decision-making problems. However, despite their empirical effectiveness, their theoretical underpinnings remain relatively unexplored, especially with neural network parametrization. In this paper, we delve into the study of a natural actor-critic algorithm that utilizes neural networks to represent the critic. Our aim is to establish sample complexity guarantees for this algorithm, achieving a deeper understanding of its performance characteristics. To achieve that, we propose a Natural Actor-Critic algorithm with 2-Layer critic parametrization (NAC2L). Our approach involves estimating the $Q$-function in each iteration through a convex optimization problem. We establish that our proposed approach attains a sample complexity of $\tilde{\mathcal{O}}\left(\frac{1}{\epsilon^{4}(1-\gamma)^{4}}\right)$. In contrast, the existing sample complexity results in the literature only hold for a tabular or linear MDP. Our result, on the other hand, holds for countable state spaces and does not require a linear or low-rank structure on the MDP., Comment: arXiv admin note: text overlap with arXiv:2211.07675
Published: 2023

36. iPLAN: Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning

Author: Wu, Xiyang, Chandra, Rohan, Guan, Tianrui, Bedi, Amrit Singh, and Manocha, Dinesh
Subjects: Computer Science - Multiagent Systems, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Navigating safely and efficiently in dense and heterogeneous traffic scenarios is challenging for autonomous vehicles (AVs) due to their inability to infer the behaviors or intentions of nearby drivers. In this work, we introduce a distributed multi-agent reinforcement learning (MARL) algorithm that can predict trajectories and intents in dense and heterogeneous traffic scenarios. Our approach for intent-aware planning, iPLAN, allows agents to infer nearby drivers' intents solely from their local observations. We model two distinct incentives for agents' strategies: a Behavioral Incentive for high-level decision-making based on their driving behavior or personality, and an Instant Incentive for motion planning for collision avoidance based on the current traffic state. Our approach enables agents to infer their opponents' behavioral incentives and integrate this inferred information into their decision-making and motion-planning processes. We perform experiments in two simulation environments, Non-Cooperative Navigation and Heterogeneous Highway. In Heterogeneous Highway, results show that, compared with centralized training decentralized execution (CTDE) MARL baselines such as QMIX and MAPPO, our method yields 4.3% and 38.4% higher episodic reward in mild and chaotic traffic, respectively, with a 48.1% higher success rate and 80.6% longer survival time in chaotic traffic. We also compare with the decentralized training decentralized execution (DTDE) baseline IPPO and demonstrate 12.7% and 6.3% higher episodic reward in mild and chaotic traffic, respectively, a 25.3% higher success rate, and a 13.7% longer survival time.
Published: 2023

37. CCE: Sample Efficient Sparse Reward Policy Learning for Robotic Navigation via Confidence-Controlled Exploration

Author: Patel, Bhrij, Weerakoon, Kasun, Suttle, Wesley A., Koppel, Alec, Sadler, Brian M., Zhou, Tianyi, Bedi, Amrit Singh, and Manocha, Dinesh
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We introduce Confidence-Controlled Exploration (CCE), a novel exploration scheme designed to enhance the training sample efficiency of reinforcement learning (RL) algorithms in sparse-reward settings such as robot navigation. Sparse rewards are common in RL and convenient to design and implement, but are typically hard to deal with due to the challenges of exploration. Existing methods use regularization to deal with these exploration challenges. However, it is hard to characterize the balance between exploration and exploitation because regularization modifies the reward function itself, thereby changing the objective being optimized. In contrast to regularization-based approaches in the existing literature, our approach, CCE, is based on a novel relationship we provide between gradient estimation and policy entropy. CCE dynamically adjusts the number of samples used in each gradient update during training to control exploration. Interestingly, CCE can be applied to both existing on-policy and off-policy RL methods, which we demonstrate by empirically validating its efficacy on three popular RL methods: REINFORCE, Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC) for goal-reaching robotic navigation tasks. We demonstrate through simulated and real-world experiments that, when the sample budget is constrained, CCE outperforms conventional methods that employ constant trajectory lengths and entropy regularization. For a fixed sample budget, CCE achieves an 18% increase in navigation success rate, a 20-38% reduction in navigation path length, and a 9.32% decrease in elevation costs. Furthermore, we showcase the versatility of CCE by integrating it with the Clearpath Husky robot, illustrating its applicability in complex outdoor environments., Comment: 11 pages, 9 figures, 2 tables
Published: 2023

38. Bayesian Segmentation Modeling of Epidemic Growth

Author: Bedi, Tejasv, Xu, Yanxun, and Li, Qiwei
Subjects: Statistics - Methodology, Physics - Physics and Society
Abstract: Tracking the spread of infectious disease during a pandemic has posed a great challenge to the governments and health sectors on a global scale. To facilitate informed public health decision-making, the concerned parties usually rely on short-term daily and weekly projections generated via predictive modeling. Several deterministic and stochastic epidemiological models, including growth and compartmental models, have been proposed in the literature. These models assume that an epidemic would last over a short duration and the observed cases/deaths would attain a single peak. However, some infectious diseases, such as COVID-19, extend over a longer duration than expected. Moreover, time-varying disease transmission rates due to government interventions have made the observed data multi-modal. To address these challenges, this work proposes stochastic epidemiological models under a unified Bayesian framework augmented by a change-point detection mechanism to account for multiple peaks. The Bayesian framework allows us to incorporate prior knowledge, such as dates of influential policy changes, to predict the change-point locations precisely. We develop a trans-dimensional reversible jump Markov chain Monte Carlo algorithm to sample the posterior distributions of epidemiological parameters while estimating the number of change points and the resulting parameters. The proposed method is evaluated and compared to alternative methods in terms of change-point detection, parameter estimation, and long-term forecasting accuracy on both simulated and COVID-19 data of several major states in the United States.
Published: 2023

39. Image-based Indian Sign Language Recognition: A Practical Review using Deep Neural Networks

Author: K, Mallikharjuna Rao, Kaur, Harleen, Bedi, Sanjam Kaur, and Lekhana, M A
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: People with vocal and hearing disabilities use sign language to express themselves using visual gestures and signs. Although sign language is a solution for communication difficulties faced by deaf people, problems remain because most of the general population cannot understand this language, creating a communication barrier, especially in places such as banks, airports, and supermarkets [1]. A sign language recognition (SLR) system is needed to solve this problem. The main focus of this work is to develop a real-time word-level sign language recognition system that translates sign language to text. Much research has been done on ASL (American Sign Language). Thus, we have worked on ISL (Indian Sign Language) to cater to the needs of the deaf and hard-of-hearing community of India [2]. In this research, we provide an Indian Sign Language-based sign language recognition system. For this analysis, the user must be able to take pictures of hand movements using a web camera, and the system must predict and display the name of the sign in the captured picture. The acquired image goes through several processing phases, some of which use computer vision techniques, including grayscale conversion, dilation, and masking. Our model is trained using a convolutional neural network (CNN), which is then used to recognize the images. Our best model has a 99% accuracy rate [3]. Comment: 14 pages, 20 figures, 1 table
Published: 2023

40. On the Possibilities of AI-Generated Text Detection

Author: Chakraborty, Souradip, Bedi, Amrit Singh, Zhu, Sicheng, An, Bang, Manocha, Dinesh, and Huang, Furong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Our work addresses the critical issue of distinguishing text generated by Large Language Models (LLMs) from human-produced text, a task essential for numerous applications. Despite ongoing debate about the feasibility of such differentiation, we present evidence supporting its consistent achievability, except when human and machine text distributions are indistinguishable across their entire support. Drawing from information theory, we argue that as machine-generated text approximates human-like quality, the sample size needed for detection increases. We establish precise sample complexity bounds for detecting AI-generated text, laying groundwork for future research aimed at developing advanced, multi-sample detectors. Our empirical evaluations across multiple datasets (Xsum, Squad, IMDb, and Kaggle FakeNews) confirm the viability of enhanced detection methods. We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including RoBERTa-Large/Base-Detector and GPTZero. Our findings align with OpenAI's empirical data related to sequence length, marking the first theoretical substantiation for these observations.
Published: 2023

41. RE-MOVE: An Adaptive Policy Design for Robotic Navigation Tasks in Dynamic Environments via Language-Based Feedback

Author: Chakraborty, Souradip, Weerakoon, Kasun, Poddar, Prithvi, Elnoor, Mohamed, Narayanan, Priya, Busart, Carl, Tokekar, Pratap, Bedi, Amrit Singh, and Manocha, Dinesh
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Reinforcement learning-based policies for continuous control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures. To address this limitation, we propose a novel approach called RE-MOVE (REquest help and MOVE on) to adapt an already trained policy to real-time changes in the environment without re-training by utilizing language-based feedback. The proposed approach essentially boils down to addressing two main challenges: (1) when to ask for feedback and, if received, (2) how to incorporate feedback into trained policies. RE-MOVE incorporates an epistemic uncertainty-based framework to determine the optimal time to request instruction-based feedback. For the second challenge, we employ a zero-shot learning natural language processing (NLP) paradigm with efficient prompt design and leverage the state-of-the-art GPT-3.5 and Llama-2 language models. To show the efficacy of the proposed approach, we performed extensive synthetic and real-world evaluations in several test-time dynamic navigation scenarios. Utilizing RE-MOVE results in up to 80% enhancement in the attainment of successful goals, coupled with a reduction of 13.50% in the normalized trajectory length, as compared to alternative approaches, particularly in demanding real-world environments with perceptual challenges.
Published: 2023

42. Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

Author: Suttle, Wesley A., Bedi, Amrit Singh, Patel, Bhrij, Sadler, Brian M., Koppel, Alec, and Manocha, Dinesh
Subjects: Computer Science - Machine Learning, Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated for large state spaces or settings with sparse rewards, and the mixing time is unknown, making the step size inoperable. In this work, we propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm. This method, which we call Multi-level Actor-Critic (MAC), is developed especially for infinite-horizon average-reward settings and neither relies on oracle knowledge of the mixing time in its parameter selection nor assumes its exponential decay; it is therefore readily applicable to applications with slower mixing times. Nonetheless, it achieves a convergence rate comparable to the state-of-the-art AC algorithms. We experimentally show that these alleviated restrictions on the technical conditions required for stability translate to superior performance in practice for RL problems with sparse rewards.
Published: 2023

43. STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning

Author: Chakraborty, Souradip, Bedi, Amrit Singh, Koppel, Alec, Wang, Mengdi, Huang, Furong, and Manocha, Dinesh
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Directed exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse. Information-directed sampling (IDS), which optimizes the information ratio, seeks to do so by augmenting regret with information gain. However, estimating information gain is computationally intractable or relies on restrictive assumptions which prohibit its use in many practical instances. In this work, we posit an alternative exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal model, which, under suitable conditions, can be computed in closed form with the kernelized Stein discrepancy (KSD). Based on KSD, we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING. To enable its derivation, we develop fundamentally new variants of KSD for discrete conditional distributions. We further establish that STEERING achieves sublinear Bayesian regret, improving upon prior learning rates of information-augmented MBRL. Experimentally, we show that the proposed algorithm is computationally affordable and outperforms several prior approaches.
Published: 2023

44. SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Author: Bornstein, Marco, Rabbani, Tahseen, Wang, Evan, Bedi, Amrit Singh, and Huang, Furong
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: The decentralized Federated Learning (FL) setting avoids the role of a potentially unreliable or untrustworthy central host by utilizing groups of clients to collaboratively train a model via localized training and model/gradient sharing. Most existing decentralized FL algorithms require synchronization of client models, where the speed of synchronization depends upon the slowest client. In this work, we propose SWIFT: a novel wait-free decentralized FL algorithm that allows clients to conduct training at their own speed. Theoretically, we prove that SWIFT matches the gold-standard iteration convergence rate $\mathcal{O}(1/\sqrt{T})$ of parallel stochastic gradient descent for convex and non-convex smooth optimization (total iterations $T$). Furthermore, we provide theoretical results for IID and non-IID settings without any bounded-delay assumption for slow clients, which is required by other asynchronous decentralized FL algorithms. Although SWIFT achieves the same iteration convergence rate with respect to $T$ as other state-of-the-art (SOTA) parallel stochastic algorithms, it converges faster with respect to run-time due to its wait-free structure. Our experimental results demonstrate that SWIFT's run-time is reduced due to a large reduction in communication time per epoch, which falls by an order of magnitude compared to synchronous counterparts. Furthermore, SWIFT produces loss levels for image classification, over IID and non-IID data settings, upwards of 50% faster than existing SOTA algorithms. Comment: 30 pages, 9 figures
Published: 2022

45. DMCA: Dense Multi-agent Navigation using Attention and Communication

Author: Arul, Senthil Hariharan, Bedi, Amrit Singh, and Manocha, Dinesh
Subjects: Computer Science - Robotics
Abstract: In decentralized multi-robot navigation, ensuring safe and efficient movement with limited environmental awareness remains a challenge. While robots traditionally navigate based on local observations, this approach falters in complex environments. A possible solution is to enhance understanding of the world through inter-agent communication, but mere information broadcasting falls short in efficiency. In this work, we address this problem by simultaneously learning decentralized multi-robot collision avoidance and selective inter-agent communication. We use a multi-head self-attention mechanism that encodes observable information from neighboring robots into a concise and fixed-length observation vector, thereby handling varying numbers of neighbors. Our method focuses on improving navigation performance through selective communication. We cast the communication selection as a link prediction problem, where the network determines the necessity of establishing a communication link with a specific neighbor based on the observable state information. The communicated information enhances the neighbor's observation and aids in selecting an appropriate navigation plan. By training the network end-to-end, we concurrently learn the optimal weights for the observation encoder, communication selection, and navigation components. We showcase the benefits of our approach by achieving safe and efficient navigation among multiple robots, even in dense and challenging environments. Comparative evaluations against various learning-based and model-based baselines demonstrate our superior navigation performance, resulting in an impressive improvement of up to 24% in success rate within complex evaluation scenarios.
Published: 2022

46. RTAW: An Attention Inspired Reinforcement Learning Method for Multi-Robot Task Allocation in Warehouse Environments

Author: Agrawal, Aakriti, Bedi, Amrit Singh, and Manocha, Dinesh
Subjects: Computer Science - Robotics, Computer Science - Multiagent Systems
Abstract: We present a novel reinforcement learning-based algorithm for the multi-robot task allocation problem in warehouse environments. We formulate it as a Markov Decision Process and solve it via a novel deep multi-agent reinforcement learning method (called RTAW) with an attention-inspired policy architecture. Hence, our proposed policy network uses global embeddings that are independent of the number of robots/tasks. We utilize the proximal policy optimization algorithm for training and use a carefully designed reward to obtain a converged policy. The converged policy ensures cooperation among different robots to minimize total travel delay (TTD), which ultimately improves the makespan for a sufficiently large task list. In our extensive experiments, we compare the performance of our RTAW algorithm to state-of-the-art methods such as myopic pickup distance minimization (greedy) and regret-based baselines on different navigation schemes. We show an improvement of up to 14% (25-1000 seconds) in TTD on scenarios with hundreds or thousands of tasks for different challenging warehouse layouts and task generation schemes. We also demonstrate the scalability of our approach by showing performance with up to 1000 robots in simulations.
Published: 2022

47. DC-MRTA: Decentralized Multi-Robot Task Allocation and Navigation in Complex Environments

Author: Agrawal, Aakriti, Hariharan, Senthil, Bedi, Amrit Singh, and Manocha, Dinesh
Subjects: Computer Science - Robotics, Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: We present a novel reinforcement learning (RL)-based task allocation and decentralized navigation algorithm for mobile robots in warehouse environments. Our approach is designed for scenarios in which multiple robots are used to perform various pickup and delivery tasks. We consider the problem of joint decentralized task allocation and navigation and present a two-level approach to solve it. At the higher level, we solve the task allocation by formulating it in terms of Markov Decision Processes and choosing the appropriate rewards to minimize the Total Travel Delay (TTD). At the lower level, we use a decentralized navigation scheme based on ORCA that enables each robot to perform these tasks independently and avoid collisions with other robots and dynamic obstacles. We combine the lower and upper levels by defining rewards for the higher level as the feedback from the lower-level navigation algorithm. We perform extensive evaluation in complex warehouse layouts with large numbers of agents and highlight the benefits over state-of-the-art algorithms based on myopic pickup distance minimization and regret-based task selection. We observe an improvement of up to 14% in terms of task completion time and up to 40% in terms of computing collision-free trajectories for the robots.
Published: 2022

48. Predicting Future Mosquito Larval Habitats Using Time Series Climate Forecasting and Deep Learning

Author: Sun, Christopher, Nimbalkar, Jay, and Bedi, Ravnoor
Subjects: Computer Science - Machine Learning
Abstract: Mosquito habitat ranges are projected to expand due to climate change. This investigation aims to identify future mosquito habitats by analyzing preferred ecological conditions of mosquito larvae. After assembling a data set with atmospheric records and larvae observations, a neural network is trained to predict larvae counts from ecological inputs. Time series forecasting is conducted on these variables, and climate projections are passed into the initial deep learning model to generate location-specific larvae abundance predictions. The results support the notion of regional ecosystem-driven changes in mosquito spread, with high-elevation regions in particular experiencing an increase in susceptibility to mosquito infestation. Comment: 2022 MIT IEEE Undergraduate Research Technology Conference
Published: 2022

49. HTRON: Efficient Outdoor Navigation with Sparse Rewards via Heavy Tailed Adaptive Reinforce Algorithm

Author: Weerakoon, Kasun, Chakraborty, Souradip, Karapetyan, Nare, Sathyamoorthy, Adarsh Jagan, Bedi, Amrit Singh, and Manocha, Dinesh
Subjects: Computer Science - Robotics
Abstract: We present a novel approach to improve the performance of deep reinforcement learning (DRL) based outdoor robot navigation systems. Most existing DRL methods rely on carefully designed dense reward functions to learn efficient behavior in an environment. We circumvent this issue by working only with sparse rewards (which are easy to design), and propose a novel adaptive Heavy-Tailed Reinforce algorithm for Outdoor Navigation called HTRON. Our main idea is to utilize heavy-tailed policy parametrizations, which implicitly induce exploration in sparse reward settings. We evaluate the performance of HTRON against the Reinforce, PPO, and TRPO algorithms in three different outdoor scenarios: goal-reaching, obstacle avoidance, and uneven terrain navigation. On average, we observe an increase of 34.41% in success rate, a 15.15% decrease in the average time steps taken to reach the goal, and a 24.9% decrease in the elevation cost compared to the navigation policies obtained by the other methods. Further, we demonstrate that our algorithm can be transferred directly to a Clearpath Husky robot to perform outdoor terrain navigation in real-world scenarios.
Published: 2022

50. FedBC: Calibrating Global and Local Models via Federated Learning Beyond Consensus

Author: Bedi, Amrit Singh, Fan, Chen, Koppel, Alec, Sahu, Anit Kumar, Sadler, Brian M., Huang, Furong, and Manocha, Dinesh
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing, Mathematics - Optimization and Control
Abstract: In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program. The objective of a device is its local objective, which it seeks to minimize while satisfying nonlinear constraints that quantify the proximity between the local and the global model. By considering the Lagrangian relaxation of this problem, we develop a novel primal-dual method called Federated Learning Beyond Consensus (FedBC). Theoretically, we establish that FedBC converges to a first-order stationary point at rates that match the state of the art, up to an additional error term that depends on a tolerance parameter introduced to scalarize the multi-criterion formulation. Finally, we demonstrate that FedBC balances the global and local model test accuracy metrics across a suite of datasets (Synthetic, MNIST, CIFAR-10, Shakespeare), achieving competitive performance with the state of the art.
Published: 2022
