Author: "Zhao, Xufeng" / Topic: computer science - robotics - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zhao, Xufeng"' showing total 8 results

Start Over Author "Zhao, Xufeng" Topic computer science - robotics

8 results on '"Zhao, Xufeng"'

1. Mental Modeling of Reinforcement Learning Agents by Language Models

Author: Lu, Wenhao, Zhao, Xufeng, Spisak, Josua, Lee, Jae Hee, and Wermter, Stefan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Robotics
Abstract: Can emergent language models faithfully model the intelligence of decision-making agents? Though modern language models exhibit already some reasoning ability, and theoretically can potentially express any probable distribution over tokens, it remains underexplored how the world knowledge these pretrained models have memorized can be utilized to comprehend an agent's behaviour in the physical world. This study empirically examines, for the first time, how well large language models (LLMs) can build a mental model of agents, termed agent mental modelling, by reasoning about an agent's behaviour and its effect on states from agent interaction history. This research may unveil the potential of leveraging LLMs for elucidating RL agent behaviour, addressing a key challenge in eXplainable reinforcement learning (XRL). To this end, we propose specific evaluation metrics and test them on selected RL task datasets of varying complexity, reporting findings on agent mental model establishment. Our results disclose that LLMs are not yet capable of fully mental modelling agents through inference alone without further innovations. This work thus provides new insights into the capabilities and limitations of modern LLMs., Comment: https://lukaswill.github.io/
Published: 2024

2. Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning

Author: Sun, Xiaowen, Zhao, Xufeng, Lee, Jae Hee, Lu, Wenhao, Kerzel, Matthias, and Wermter, Stefan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Robotics
Abstract: The state of an object reflects its current status or condition and is important for a robot's task planning and manipulation. However, detecting an object's state and generating a state-sensitive plan for robots is challenging. Recently, pre-trained Large Language Models (LLMs) and Vision-Language Models (VLMs) have shown impressive capabilities in generating plans. However, to the best of our knowledge, there is hardly any investigation on whether LLMs or VLMs can also generate object state-sensitive plans. To study this, we introduce an Object State-Sensitive Agent (OSSA), a task-planning agent empowered by pre-trained neural networks. We propose two methods for OSSA: (i) a modular model consisting of a pre-trained vision processing module (dense captioning model, DCM) and a natural language processing model (LLM), and (ii) a monolithic model consisting only of a VLM. To quantitatively evaluate the performances of the two methods, we use tabletop scenarios where the task is to clear the table. We contribute a multimodal benchmark dataset that takes object states into consideration. Our results show that both methods can be used for object state-sensitive tasks, but the monolithic approach outperforms the modular approach. The code for OSSA is available at https://github.com/Xiao-wen-Sun/OSSA, Comment: ICANN24, Switzerland
Published: 2024

3. Agentic Skill Discovery

Author: Zhao, Xufeng, Weber, Cornelius, and Wermter, Stefan
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Language-conditioned robotic skills make it possible to apply the high-level reasoning of Large Language Models (LLMs) to low-level robotic control. A remaining challenge is to acquire a diverse set of fundamental skills. Existing approaches either manually decompose a complex task into atomic robotic actions in a top-down fashion, or bootstrap as many combinations as possible in a bottom-up fashion to cover a wider range of task possibilities. These decompositions or combinations, however, require an initial skill library. For example, a ``grasping'' capability can never emerge from a skill library containing only diverse ``pushing'' skills. Existing skill discovery techniques with reinforcement learning acquire skills by an exhaustive exploration but often yield non-meaningful behaviors. In this study, we introduce a novel framework for skill discovery that is entirely driven by LLMs. The framework begins with an LLM generating task proposals based on the provided scene description and the robot's configurations, aiming to incrementally acquire new skills upon task completion. For each proposed task, a series of reinforcement learning processes are initiated, utilizing reward and success determination functions sampled by the LLM to develop the corresponding policy. The reliability and trustworthiness of learned behaviors are further ensured by an independent vision-language model. We show that starting with zero skill, the skill library emerges and expands to more and more meaningful and reliable skills, enabling the robot to efficiently further propose and complete advanced tasks. Project page: \url{https://agentic-skill-discovery.github.io}., Comment: Webpage see https://agentic-skill-discovery.github.io/
Published: 2024

4. Large Language Models for Orchestrating Bimanual Robots

Author: Chu, Kun, Zhao, Xufeng, Weber, Cornelius, Li, Mengdi, Lu, Wenhao, and Wermter, Stefan
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Although there has been rapid progress in endowing robots with the ability to solve complex manipulation tasks, generating control policies for bimanual robots to solve tasks involving two hands is still challenging because of the difficulties in effective temporal and spatial coordination. With emergent abilities in terms of step-by-step reasoning and in-context learning, Large Language Models (LLMs) have demonstrated promising potential in a variety of robotic tasks. However, the nature of language communication via a single sequence of discrete symbols makes LLM-based coordination in continuous space a particular challenge for bimanual tasks. To tackle this challenge, we present LAnguage-model-based Bimanual ORchestration (LABOR), an agent utilizing an LLM to analyze task configurations and devise coordination control policies for addressing long-horizon bimanual tasks. We evaluate our method through simulated experiments involving two classes of long-horizon tasks using the NICOL humanoid robot. Our results demonstrate that our method outperforms the baseline in terms of success rate. Additionally, we thoroughly analyze failure cases, offering insights into LLM-based approaches in bimanual robotic control and revealing future research trends. The project website can be found at http://labor-agent.github.io., Comment: Accepted in Humanoids 2024. The project website can be found at http://labor-agent.github.io
Published: 2024

5. Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

Author: Chu, Kun, Zhao, Xufeng, Weber, Cornelius, Li, Mengdi, and Wermter, Stefan
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Reinforcement Learning (RL) plays an important role in the robotic manipulation domain since it allows self-learning from trial-and-error interactions with the environment. Still, sample efficiency and reward specification seriously limit its potential. One possible solution involves learning from expert guidance. However, obtaining a human expert is impractical due to the high cost of supervising an RL agent, and developing an automatic supervisor is a challenging endeavor. Large Language Models (LLMs) demonstrate remarkable abilities to provide human-like feedback on user inputs in natural language. Nevertheless, they are not designed to directly control low-level robotic motions, as their pretraining is based on vast internet data rather than specific robotics data. In this paper, we introduce the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework, which enables RL agents to learn robotic tasks efficiently by taking advantage of LLMs' timely feedback. Our experiments conducted on RLBench tasks illustrate that, with simple prompt design in natural language, the Lafite-RL agent exhibits improved learning capabilities when guided by an LLM. It outperforms the baseline in terms of both learning efficiency and success rate, underscoring the efficacy of the rewards provided by an LLM., Comment: CoRL 2023 Workshop (oral)
Published: 2023

6. A Closer Look at Reward Decomposition for High-Level Robotic Explanations

Author: Lu, Wenhao, Zhao, Xufeng, Magg, Sven, Gromniak, Martin, Li, Mengdi, and Wermter, Stefan
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: Explaining the behaviour of intelligent agents learned by reinforcement learning (RL) to humans is challenging yet crucial due to their incomprehensible proprioceptive states, variational intermediate goals, and resultant unpredictability. Moreover, one-step explanations for RL agents can be ambiguous as they fail to account for the agent's future behaviour at each transition, adding to the complexity of explaining robot actions. By leveraging abstracted actions that map to task-specific primitives, we avoid explanations on the movement level. To further improve the transparency and explainability of robotic systems, we propose an explainable Q-Map learning framework that combines reward decomposition (RD) with abstracted action spaces, allowing for non-ambiguous and high-level explanations based on object properties in the task. We demonstrate the effectiveness of our framework through quantitative and qualitative analysis of two robotic scenarios, showcasing visual and textual explanations, from output artefacts of RD explanations, that are easy for humans to comprehend. Additionally, we demonstrate the versatility of integrating these artefacts with large language models (LLMs) for reasoning and interactive querying., Comment: https://lukaswill.github.io
Published: 2023

7. Chat with the Environment: Interactive Multimodal Perception Using Large Language Models

Author: Zhao, Xufeng, Li, Mengdi, Weber, Cornelius, Hafez, Muhammad Burhan, and Wermter, Stefan
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Programming robot behavior in a complex world faces challenges on multiple levels, from dextrous low-level skills to high-level planning and reasoning. Recent pre-trained Large Language Models (LLMs) have shown remarkable reasoning ability in few-shot robotic planning. However, it remains challenging to ground LLMs in multimodal sensory input and continuous action output, while enabling a robot to interact with its environment and acquire novel information as its policies unfold. We develop a robot interaction scenario with a partially observable state, which necessitates a robot to decide on a range of epistemic actions in order to sample sensory information among multiple modalities, before being able to execute the task correctly. Matcha (Multimodal environment chatting) agent, an interactive perception framework, is therefore proposed with an LLM as its backbone, whose ability is exploited to instruct epistemic actions and to reason over the resulting multimodal sensations (vision, sound, haptics, proprioception), as well as to plan an entire task execution based on the interactively acquired information. Our study demonstrates that LLMs can provide high-level planning and reasoning skills and control interactive robot behavior in a multimodal environment, while multimodal modules with the context of the environmental state help ground the LLMs and extend their processing ability. The project website can be found at https://matcha-agent.github.io., Comment: IROS2023, Detroit. See the project website at https://matcha-agent.github.io
Published: 2023

8. Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations

Author: Zhao, Xufeng, Weber, Cornelius, Hafez, Muhammad Burhan, and Wermter, Stefan
Subjects: Computer Science - Robotics, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Sound is one of the most informative and abundant modalities in the real world while being robust to sense without contacts by small and cheap sensors that can be placed on mobile devices. Although deep learning is capable of extracting information from multiple sensory inputs, there has been little use of sound for the control and learning of robotic actions. For unsupervised reinforcement learning, an agent is expected to actively collect experiences and jointly learn representations and policies in a self-supervised way. We build realistic robotic manipulation scenarios with physics-based sound simulation and propose the Intrinsic Sound Curiosity Module (ISCM). The ISCM provides feedback to a reinforcement learner to learn robust representations and to reward a more efficient exploration behavior. We perform experiments with sound enabled during pre-training and disabled during adaptation, and show that representations learned by ISCM outperform the ones by vision-only baselines and pre-trained policies can accelerate the learning process when applied to downstream tasks., Comment: Accepted at IROS 2022
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

8 results on '"Zhao, Xufeng"'

1. Mental Modeling of Reinforcement Learning Agents by Language Models

2. Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning

3. Agentic Skill Discovery

4. Large Language Models for Orchestrating Bimanual Robots

5. Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

6. A Closer Look at Reward Decomposition for High-Level Robotic Explanations

7. Chat with the Environment: Interactive Multimodal Perception Using Large Language Models

8. Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

8 results on '"Zhao, Xufeng"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources