Proactive Human-Robot Interaction using Visuo-Lingual Transformers
- Publication Year :
- 2023
Abstract
- Humans possess the innate ability to extract latent visuo-lingual cues and infer context from human interaction. During collaboration, this enables proactive prediction of the underlying intention behind a series of tasks. In contrast, robotic agents collaborating with humans naively follow elementary instructions to complete tasks, or use specific hand-crafted triggers to initiate proactive collaboration when working towards the completion of a goal. Endowing such robots with the ability to reason about the end goal and proactively suggest intermediate tasks will engender a much more intuitive method for human-robot collaboration. To this end, we propose a learning-based method that uses visual cues from the scene, lingual commands from a user, and knowledge of prior object-object interactions to identify and proactively predict the underlying goal the user intends to achieve. Specifically, we propose ViLing-MMT, a vision-language multimodal transformer-based architecture that captures inter- and intra-modal dependencies to provide accurate scene descriptions and proactively suggest tasks where applicable. We evaluate our proposed model in simulation and real-world scenarios.
- Comment: Accepted to the IROS'23 workshops "Geriatronics: AI and Robotics for Health & Well-Being in Older Age" and "Assistive Robotics for Citizens"
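- Note: The record does not include the paper's code, so the sketch below is only a rough illustration of what "inter- and intra-modal dependencies" typically means in a vision-language transformer block: self-attention within each modality plus cross-attention between them. All class, variable, and parameter names here are hypothetical and not taken from ViLing-MMT.

```python
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    """Illustrative transformer block mixing vision and language tokens.

    Intra-modal dependencies: self-attention within each modality.
    Inter-modal dependencies: cross-attention between modalities.
    """
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.self_attn_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn_lang = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_lang_to_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_vis_to_lang = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_vis = nn.LayerNorm(dim)
        self.norm_lang = nn.LayerNorm(dim)

    def forward(self, vis: torch.Tensor, lang: torch.Tensor):
        # Intra-modal: each modality attends to its own tokens.
        vis = vis + self.self_attn_vis(vis, vis, vis, need_weights=False)[0]
        lang = lang + self.self_attn_lang(lang, lang, lang, need_weights=False)[0]
        # Inter-modal: language queries attend to vision tokens, and vice versa.
        lang = lang + self.cross_vis_to_lang(lang, vis, vis, need_weights=False)[0]
        vis = vis + self.cross_lang_to_vis(vis, lang, lang, need_weights=False)[0]
        return self.norm_vis(vis), self.norm_lang(lang)

if __name__ == "__main__":
    vis = torch.randn(1, 49, 256)   # e.g. a 7x7 grid of visual patch features
    lang = torch.randn(1, 12, 256)  # e.g. 12 embedded command tokens
    vis_out, lang_out = CrossModalBlock()(vis, lang)
    print(vis_out.shape, lang_out.shape)  # torch.Size([1, 49, 256]) torch.Size([1, 12, 256])
```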
- Subjects :
- Computer Science - Robotics
- Computer Science - Artificial Intelligence
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2310.02506
- Document Type :
- Working Paper