Author: "Kalwar, Durgesh" - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Kalwar, Durgesh"' returned 6 results

1. Extracting Heuristics from Large Language Models for Reward Shaping in Reinforcement Learning

Author: Bhambri, Siddhant, Bhattacharjee, Amrita, Kalwar, Durgesh, Guan, Lin, Liu, Huan, and Kambhampati, Subbarao
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Reinforcement Learning (RL) suffers from sample inefficiency in sparse reward domains, and the problem is even more pronounced in the case of stochastic transitions. To improve the sample efficiency, reward shaping is a well-studied approach to introduce intrinsic rewards that can help the RL agent converge to an optimal policy faster. However, designing a useful reward shaping function for all desirable states in the Markov Decision Process (MDP) is challenging, even for domain experts. Given that Large Language Models (LLMs) have demonstrated impressive performance across a multitude of natural language tasks, we aim to answer the following question: "Can we obtain heuristics using LLMs for constructing a reward shaping function that can boost an RL agent's sample efficiency?" To this end, we aim to leverage off-the-shelf LLMs to generate a plan for an abstraction of the underlying MDP. We further use this LLM-generated plan as a heuristic to construct the reward shaping signal for the downstream RL agent. By characterizing the type of abstraction based on the MDP horizon length, we analyze the quality of heuristics when generated using an LLM, with and without a verifier in the loop. Our experiments across multiple domains with varying horizon lengths and numbers of sub-goals, drawn from the BabyAI environment suite and the Household, Mario, and Minecraft domains, show 1) the advantages and limitations of querying LLMs with and without a verifier to generate a reward shaping heuristic, and 2) a significant improvement in the sample efficiency of PPO, A2C, and Q-learning when guided by the LLM-generated heuristics.
Published: 2024

2. Using General Value Functions to Learn Domain-Backed Inventory Management Policies

Author: Kalwar, Durgesh, Shelke, Omkar, and Khadilkar, Harshad
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control
Abstract: We consider the inventory management problem, where the goal is to balance conflicting objectives such as availability and wastage of a large range of products in a store. We propose a reinforcement learning (RL) approach that utilises General Value Functions (GVFs) to derive domain-backed inventory replenishment policies. The inventory replenishment decisions are modelled as a sequential decision making problem, which is challenging due to uncertain demand and the existence of aggregate (cross-product) constraints. In existing literature, GVFs have primarily been used for auxiliary task learning. We use this capability to train GVFs on domain-critical characteristics such as prediction of stock-out probability and wastage quantity. Using this domain expertise for more effective exploration, we train an RL agent to compute the inventory replenishment quantities for a large range of products (up to 6000 in the reported experiments), which share aggregate constraints such as the total weight/volume per delivery. Additionally, we show that the GVF predictions can be used to provide additional domain-backed insights into the decisions proposed by the RL agent. Finally, since the environment dynamics are fully transferred, the trained GVFs can be used for faster adaptation to vastly different business objectives (for example, due to the start of a promotional period or due to deployment in a new customer environment).
Published: 2023

3. Safe Sequential Optimization for Switching Environments

Author: Kalwar, Durgesh and S, Vineeth B.
Subjects: Mathematics - Optimization and Control, Computer Science - Artificial Intelligence
Abstract: We consider the problem of designing a sequential decision making agent to maximize an unknown time-varying function which switches with time. At each step, the agent receives an observation of the function's value at a point decided by the agent. The observation could be corrupted by noise. The agent is also constrained to take safe decisions with high probability, i.e., the chosen points should have a function value greater than a threshold. For this switching environment, we propose a policy called Adaptive-SafeOpt and evaluate its performance via simulations. The policy incorporates Bayesian optimization and change point detection for the safe sequential optimization problem. We observe that a major challenge in adapting to the switching change is to identify safe decisions when the change point is detected and prevent attraction to local optima.
Published: 2023

4. Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

Author: Kalwar, Durgesh, Shelke, Omkar, Nath, Somjit, Meisheri, Hardik, and Khadilkar, Harshad
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Improving sample efficiency is a key challenge in reinforcement learning, especially in environments with large state spaces and sparse rewards. In the literature, this is addressed either through the use of auxiliary tasks (subgoals) or through clever exploration strategies. Exploration methods have been used to sample better trajectories in large environments, while auxiliary tasks have been incorporated where the reward is sparse. However, few studies have attempted to tackle both large scale and reward sparsity at the same time. This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy. We present a way to learn value functions which can be used to sample actions and provide directed exploration. Experiments on navigation tasks with varying grid sizes demonstrate the performance advantages over several competitive baselines.
Published: 2022

5. Guiding Offline Reinforcement Learning Using a Safety Expert

Author: Verma, Richa, Kalwar, Durgesh, Khadilkar, Harshad, and Ravindran, Balaraman
Published: 2024

6. Safe Sequential Optimization in Switching Environments

Author: Kalwar, Durgesh and Sukumaran, Vineeth Bala
Published: 2021
