Author: "Huang, Zhiao" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Huang, Zhiao"' showing total 32 results

1. Reward-free World Models for Online Imitation Learning

Author: Li, Shangzhe, Huang, Zhiao, and Su, Hao
Subjects: Computer Science - Machine Learning
Abstract: Imitation learning (IL) enables agents to acquire skills directly from expert demonstrations, providing a compelling alternative to reinforcement learning. However, prior online IL approaches struggle with complex tasks characterized by high-dimensional inputs and complex dynamics. In this work, we propose a novel approach to online imitation learning that leverages reward-free world models. Our method learns environmental dynamics entirely in latent spaces without reconstruction, enabling efficient and accurate modeling. We adopt the inverse soft-Q learning objective, reformulating the optimization process in the Q-policy space to mitigate the instability associated with traditional optimization in the reward-policy space. By employing a learned latent dynamics model and planning for control, our approach consistently achieves stable, expert-level performance in tasks with high-dimensional observation or action spaces and intricate dynamics. We evaluate our method on a diverse set of benchmarks, including DMControl, MyoSuite, and ManiSkill2, demonstrating superior empirical performance compared to existing approaches.
Published: 2024

2. ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI

Author: Tao, Stone, Xiang, Fanbo, Shukla, Arth, Qin, Yuzhe, Hinrichsen, Xander, Yuan, Xiaodi, Bao, Chen, Lin, Xinsong, Liu, Yulin, Chan, Tse-kai, Gao, Yuan, Li, Xuanlin, Mu, Tongzhou, Xiao, Nan, Gurha, Arnav, Huang, Zhiao, Calandra, Roberto, Chen, Rui, Luo, Shan, and Su, Hao
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Simulation has enabled unprecedented compute-scalable approaches to robot learning. However, many existing simulation frameworks typically support a narrow range of scenes/tasks and lack features critical for scaling generalizable robotics and sim2real. We introduce and open source ManiSkill3, the fastest state-visual GPU parallelized robotics simulator with contact-rich physics targeting generalizable manipulation. ManiSkill3 supports GPU parallelization of many aspects including simulation+rendering, heterogeneous simulation, pointclouds/voxels visual input, and more. Simulation with rendering on ManiSkill3 can run 10-1000x faster with 2-3x less GPU memory usage than other platforms, achieving up to 30,000+ FPS in benchmarked environments due to minimal python/pytorch overhead in the system, simulation on the GPU, and the use of the SAPIEN parallel rendering system. Tasks that used to take hours to train can now take minutes. We further provide the most comprehensive range of GPU parallelized environments/tasks spanning 12 distinct domains including but not limited to mobile manipulation for tasks such as drawing, humanoids, and dextrous manipulation in realistic scenes designed by artists or real-world digital twins. In addition, millions of demonstration frames are provided from motion planning, RL, and teleoperation. ManiSkill3 also provides a comprehensive set of baselines that span popular RL and learning-from-demonstrations algorithms., Comment: Project website: http://maniskill.ai/
Published: 2024

3. DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics

Author: Huang, Zhiao, Chen, Feng, Pu, Yewen, Lin, Chunru, Su, Hao, and Gan, Chuang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: Combining gradient-based trajectory optimization with differentiable physics simulation is an efficient technique for solving soft-body manipulation problems. Using a well-crafted optimization objective, the solver can quickly converge onto a valid trajectory. However, writing the appropriate objective functions requires expert knowledge, making it difficult to collect a large set of naturalistic problems from non-expert users. We introduce DiffVL, a method that enables non-expert users to communicate soft-body manipulation tasks -- a combination of vision and natural language, given in multiple stages -- that can be readily leveraged by a differentiable physics solver. We have developed GUI tools that enable non-expert users to specify 100 tasks inspired by real-life soft-body manipulations from online videos, which we'll make public. We leverage large language models to translate task descriptions into machine-interpretable optimization objectives. The optimization objectives can help differentiable physics solvers to solve these long-horizon multistage tasks that are challenging for previous baselines.
Published: 2023

4. Robo360: A 3D Omnispective Multi-Material Robotic Manipulation Dataset

Author: Liang, Litian, Bian, Liuyu, Xiao, Caiwei, Zhang, Jialin, Chen, Linghao, Liu, Isabella, Xiang, Fanbo, Huang, Zhiao, and Su, Hao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Building robots that can automate labor-intensive tasks has long been the core motivation behind the advancements in computer vision and the robotics community. Recent interest in leveraging 3D algorithms, particularly neural fields, has led to advancements in robot perception and physical understanding in manipulation scenarios. However, the real world's complexity poses significant challenges. To tackle these challenges, we present Robo360, a dataset that features robotic manipulation with a dense view coverage, which enables high-quality 3D neural representation learning, and a diverse set of objects with various physical and optical properties and facilitates research in various object manipulation and physical world modeling tasks. We confirm the effectiveness of our dataset using existing dynamic NeRF and evaluate its potential in learning multi-view policies. We hope that Robo360 can open new research directions yet to be explored at the intersection of understanding the physical world in 3D and robot control.
Published: 2023

5. Reparameterized Policy Learning for Multimodal Trajectory Optimization

Author: Huang, Zhiao, Liang, Litian, Ling, Zhan, Li, Xuanlin, Gan, Chuang, and Su, Hao
Subjects: Computer Science - Machine Learning
Abstract: We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used Gaussian parameterization. To achieve this, we propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. By conditioning the policy on a latent variable, we derive a novel variational bound as the optimization objective, which promotes exploration of the environment. We then present a practical model-based RL method, called Reparameterized Policy Gradient (RPG), which leverages the multimodal policy parameterization and learned world model to achieve strong exploration capabilities and high data efficiency. Empirical results demonstrate that our method can help agents evade local optima in tasks with dense rewards and solve challenging sparse-reward environments by incorporating an object-centric intrinsic reward. Our method consistently outperforms previous approaches across a range of tasks. Code and supplementary materials are available on the project page https://haosulab.github.io/RPG/
Published: 2023

6. Deductive Verification of Chain-of-Thought Reasoning

Author: Ling, Zhan, Fang, Yunhao, Li, Xuanlin, Huang, Zhiao, Lee, Mingu, Memisevic, Roland, and Su, Hao
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) significantly benefit from Chain-of-Thought (CoT) prompting in performing various reasoning tasks. While CoT allows models to produce more comprehensive reasoning processes, its emphasis on intermediate reasoning steps can inadvertently introduce hallucinations and accumulated errors, thereby limiting models' ability to solve complex reasoning tasks. Inspired by how humans engage in careful and meticulous deductive logical reasoning processes to solve tasks, we seek to enable language models to perform explicit and rigorous deductive reasoning, and also ensure the trustworthiness of their reasoning process through self-verification. However, directly verifying the validity of an entire deductive reasoning process is challenging, even with advanced models like ChatGPT. In light of this, we propose to decompose a reasoning verification process into a series of step-by-step subprocesses, each only receiving their necessary context and premises. To facilitate this procedure, we propose Natural Program, a natural language-based deductive reasoning format. Our approach enables models to generate precise reasoning steps where subsequent steps are more rigorously grounded on prior steps. It also empowers language models to carry out reasoning self-verification in a step-by-step manner. By integrating this verification process into each deductive reasoning stage, we significantly enhance the rigor and trustfulness of generated reasoning steps. Along this process, we also improve the answer correctness on complex reasoning tasks. Code will be released at https://github.com/lz1oceani/verify_cot., Comment: Published at NeurIPS 2023
Published: 2023

7. Chain-of-Thought Predictive Control

Author: Jia, Zhiwei, Thumuluri, Vineet, Liu, Fangchen, Chen, Linghao, Huang, Zhiao, and Su, Hao
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Robotics
Abstract: We study generalizable policy learning from demonstrations for complex low-level control (e.g., contact-rich object manipulations). We propose a novel hierarchical imitation learning method that utilizes sub-optimal demos. Firstly, we propose an observation space-agnostic approach that efficiently discovers the multi-step subskill decomposition of the demos in an unsupervised manner. By grouping temporally close and functionally similar actions into subskill-level demo segments, the observations at the segment boundaries constitute a chain of planning steps for the task, which we refer to as the chain-of-thought (CoT). Next, we propose a Transformer-based design that effectively learns to predict the CoT as the subskill-level guidance. We couple action and subskill predictions via learnable prompt tokens and a hybrid masking strategy, which enable dynamically updated guidance at test time and improve feature representation of the trajectory for generalizable policy learning. Our method, Chain-of-Thought Predictive Control (CoTPC), consistently surpasses existing strong baselines on challenging manipulation tasks with sub-optimal demos., Comment: ICML 2024; project page at https://sites.google.com/view/cotpc
Published: 2023

8. DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics

Author: Li, Sizhe, Huang, Zhiao, Chen, Tao, Du, Tao, Su, Hao, Tenenbaum, Joshua B., and Gan, Chuang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: In this work, we aim to learn dexterous manipulation of deformable objects using multi-fingered hands. Reinforcement learning approaches for dexterous rigid object manipulation would struggle in this setting due to the complexity of physics interaction with deformable objects. At the same time, previous trajectory optimization approaches with differentiable physics for deformable manipulation would suffer from local optima caused by the explosion of contact modes from hand-object interactions. To address these challenges, we propose DexDeform, a principled framework that abstracts dexterous manipulation skills from human demonstration and refines the learned skills with differentiable physics. Concretely, we first collect a small set of human demonstrations using teleoperation. And we then train a skill model using demonstrations for planning over action abstractions in imagination. To explore the goal space, we further apply augmentations to the existing deformable shapes in demonstrations and use a gradient optimizer to refine the actions planned by the skill model. Finally, we adopt the refined trajectories as new demonstrations for finetuning the skill model. To evaluate the effectiveness of our approach, we introduce a suite of six challenging dexterous deformable object manipulation tasks. Compared with baselines, DexDeform is able to better explore and generalize across novel goals unseen in the initial human demonstrations., Comment: ICLR 2023. Project page: https://sites.google.com/view/dexdeform
Published: 2023

9. MovingParts: Motion-based 3D Part Discovery in Dynamic Radiance Field

Author: Yang, Kaizhi, Zhang, Xiaoshuai, Huang, Zhiao, Chen, Xuejin, Xu, Zexiang, and Su, Hao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: We present MovingParts, a NeRF-based method for dynamic scene reconstruction and part discovery. We consider motion as an important cue for identifying parts, that all particles on the same part share the common motion pattern. From the perspective of fluid simulation, existing deformation-based methods for dynamic NeRF can be seen as parameterizing the scene motion under the Eulerian view, i.e., focusing on specific locations in space through which the fluid flows as time passes. However, it is intractable to extract the motion of constituting objects or parts using the Eulerian view representation. In this work, we introduce the dual Lagrangian view and enforce representations under the Eulerian/Lagrangian views to be cycle-consistent. Under the Lagrangian view, we parameterize the scene motion by tracking the trajectory of particles on objects. The Lagrangian view makes it convenient to discover parts by factorizing the scene motion as a composition of part-level rigid motions. Experimentally, our method can achieve fast and high-quality dynamic scene reconstruction from even a single moving camera, and the induced part-based representation allows direct applications of part tracking, animation, 3D scene editing, etc., Comment: Project Page: https://silenkzyoung.github.io/MovingParts-WebPage/
Published: 2023

10. RoboNinja: Learning an Adaptive Cutting Policy for Multi-Material Objects

Author: Xu, Zhenjia, Xian, Zhou, Lin, Xingyu, Chi, Cheng, Huang, Zhiao, Gan, Chuang, and Song, Shuran
Subjects: Computer Science - Robotics
Abstract: We introduce RoboNinja, a learning-based cutting system for multi-material objects (i.e., soft objects with rigid cores such as avocados or mangos). In contrast to prior works using open-loop cutting actions to cut through single-material objects (e.g., slicing a cucumber), RoboNinja aims to remove the soft part of an object while preserving the rigid core, thereby maximizing the yield. To achieve this, our system closes the perception-action loop by utilizing an interactive state estimator and an adaptive cutting policy. The system first employs sparse collision information to iteratively estimate the position and geometry of an object's core and then generates closed-loop cutting actions based on the estimated state and a tolerance value. The "adaptiveness" of the policy is achieved through the tolerance value, which modulates the policy's conservativeness when encountering collisions, maintaining an adaptive safety distance from the estimated core. Learning such cutting skills directly on a real-world robot is challenging. Yet, existing simulators are limited in simulating multi-material objects or computing the energy consumption during the cutting process. To address this issue, we develop a differentiable cutting simulator that supports multi-material coupling and allows for the generation of optimized trajectories as demonstrations for policy learning. Furthermore, by using a low-cost force sensor to capture collision feedback, we were able to successfully deploy the learned model in real-world scenarios, including objects with diverse core geometries and soft materials.
Published: 2023

11. ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills

Author: Gu, Jiayuan, Xiang, Fanbo, Li, Xuanlin, Ling, Zhan, Liu, Xiqiang, Mu, Tongzhou, Tang, Yihe, Tao, Stone, Wei, Xinyue, Yao, Yunchao, Yuan, Xiaodi, Xie, Pengwei, Huang, Zhiao, Chen, Rui, and Su, Hao
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Generalizable manipulation skills, which can be composed to tackle long-horizon and complex daily chores, are one of the cornerstones of Embodied AI. However, existing benchmarks, mostly composed of a suite of simulatable environments, are insufficient to push cutting-edge research works because they lack object-level topological and geometric variations, are not based on fully dynamic simulation, or are short of native support for multiple types of manipulation tasks. To this end, we present ManiSkill2, the next generation of the SAPIEN ManiSkill benchmark, to address critical pain points often encountered by researchers when using benchmarks for generalizable manipulation skills. ManiSkill2 includes 20 manipulation task families with 2000+ object models and 4M+ demonstration frames, which cover stationary/mobile-base, single/dual-arm, and rigid/soft-body manipulation tasks with 2D/3D-input data simulated by fully dynamic engines. It defines a unified interface and evaluation protocol to support a wide range of algorithms (e.g., classic sense-plan-act, RL, IL), visual observations (point cloud, RGBD), and controllers (e.g., action type and parameterization). Moreover, it empowers fast visual input learning algorithms so that a CNN-based policy can collect samples at about 2000 FPS with 1 GPU and 16 processes on a regular workstation. It implements a render server infrastructure to allow sharing rendering resources across all environments, thereby significantly reducing memory usage. We open-source all codes of our benchmark (simulator, environments, and baselines) and host an online challenge open to interdisciplinary researchers., Comment: Published as a conference paper at ICLR 2023. Project website: https://maniskill2.github.io/
Published: 2023

12. Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation

Author: Lin, Xingyu, Qi, Carl, Zhang, Yunchu, Huang, Zhiao, Fragkiadaki, Katerina, Li, Yunzhu, Gan, Chuang, and Held, David
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Effective planning of long-horizon deformable object manipulation requires suitable abstractions at both the spatial and temporal levels. Previous methods typically either focus on short-horizon tasks or make strong assumptions that full-state information is available, which prevents their use on deformable objects. In this paper, we propose PlAnning with Spatial-Temporal Abstraction (PASTA), which incorporates both spatial abstraction (reasoning about objects and their relations to each other) and temporal abstraction (reasoning over skills instead of low-level actions). Our framework maps high-dimension 3D observations such as point clouds into a set of latent vectors and plans over skill sequences on top of the latent set representation. We show that our method can effectively perform challenging sequential deformable object manipulation tasks in the real world, which require combining multiple tool-use skills such as cutting with a knife, pushing with a pusher, and spreading the dough with a roller., Comment: Published at the Conference on Robot Learning (CoRL 2022)
Published: 2022

13. Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization

Author: Tao, Stone, Li, Xiaochen, Mu, Tongzhou, Huang, Zhiao, Qin, Yuzhe, and Su, Hao
Subjects: Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Training long-horizon robotic policies in complex physical environments is essential for many applications, such as robotic manipulation. However, learning a policy that can generalize to unseen tasks is challenging. In this work, we propose to achieve one-shot task generalization by decoupling plan generation and plan execution. Specifically, our method solves complex long-horizon tasks in three steps: build a paired abstract environment by simplifying geometry and physics, generate abstract trajectories, and solve the original task by an abstract-to-executable trajectory translator. In the abstract environment, complex dynamics such as physical manipulation are removed, making abstract trajectories easier to generate. However, this introduces a large domain gap between abstract trajectories and the actual executed trajectories as abstract trajectories lack low-level details and are not aligned frame-to-frame with the executed trajectory. In a manner reminiscent of language translation, our approach leverages a seq-to-seq model to overcome the large domain gap between the abstract and executable trajectories, enabling the low-level policy to follow the abstract trajectory. Experimental results on various unseen long-horizon tasks with different robot embodiments demonstrate the practicability of our methods to achieve one-shot task generalization., Comment: ICML 2023. Code and visualizations: https://trajectorytranslation.github.io/
Published: 2022

14. RoboCraft: Learning to See, Simulate, and Shape Elasto-Plastic Objects with Graph Networks

Author: Shi, Haochen, Xu, Huazhe, Huang, Zhiao, Li, Yunzhu, and Wu, Jiajun
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Modeling and manipulating elasto-plastic objects are essential capabilities for robots to perform complex industrial and household interaction tasks (e.g., stuffing dumplings, rolling sushi, and making pottery). However, due to the high degree of freedom of elasto-plastic objects, significant challenges exist in virtually every aspect of the robotic manipulation pipeline, e.g., representing the states, modeling the dynamics, and synthesizing the control signals. We propose to tackle these challenges by employing a particle-based representation for elasto-plastic objects in a model-based planning framework. Our system, RoboCraft, only assumes access to raw RGBD visual observations. It transforms the sensing data into particles and learns a particle-based dynamics model using graph neural networks (GNNs) to capture the structure of the underlying system. The learned model can then be coupled with model-predictive control (MPC) algorithms to plan the robot's behavior. We show through experiments that with just 10 minutes of real-world robotic interaction data, our robot can learn a dynamics model that can be used to synthesize control signals to deform elasto-plastic objects into various target shapes, including shapes that the robot has never encountered before. We perform systematic evaluations in both simulation and the real world to demonstrate the robot's manipulation capabilities and ability to generalize to a more complex action space, different tool shapes, and a mixture of motion modes. We also conduct comparisons between RoboCraft and untrained human subjects controlling the gripper to manipulate deformable objects in both simulation and the real world. Our learned model-based planning framework is comparable to and sometimes better than human subjects on the tested tasks., Comment: Accepted by Robotics: Science and Systems 2022; Project page: http://hxu.rocks/robocraft/
Published: 2022

15. Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics

Author: Li, Sizhe, Huang, Zhiao, Du, Tao, Su, Hao, Tenenbaum, Joshua B., and Gan, Chuang
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: Differentiable physics has recently been shown as a powerful tool for solving soft-body manipulation tasks. However, the differentiable physics solver often gets stuck when the initial contact points of the end effectors are sub-optimal or when performing multi-stage tasks that require contact point switching, which often leads to local minima. To address this challenge, we propose a contact point discovery approach (CPDeform) that guides the stand-alone differentiable physics solver to deform various soft-body plasticines. The key idea of our approach is to integrate optimal transport-based contact points discovery into the differentiable physics solver to overcome the local minima from initial contact points or contact switching. On single-stage tasks, our method can automatically find suitable initial contact points based on transport priorities. On complex multi-stage tasks, we can iteratively switch the contact points of end-effectors based on transport priorities. To evaluate the effectiveness of our method, we introduce PlasticineLab-M that extends the existing differentiable physics benchmark PlasticineLab to seven new challenging multi-stage soft-body manipulation tasks. Extensive experimental results suggest that: 1) on multi-stage tasks that are infeasible for the vanilla differentiable physics solver, our approach discovers contact points that efficiently guide the solver to completion; 2) on tasks where the vanilla solver performs sub-optimally or near-optimally, our contact point discovery method performs better than or on par with the manipulation performance obtained with handcrafted contact points., Comment: ICLR 2022 Spotlight. Project page: http://cpdeform.csail.mit.edu
Published: 2022

16. DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools

Author: Lin, Xingyu, Huang, Zhiao, Li, Yunzhu, Tenenbaum, Joshua B., Held, David, and Gan, Chuang
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Robotics
Abstract: We consider the problem of sequential robotic manipulation of deformable objects using tools. Previous works have shown that differentiable physics simulators provide gradients to the environment state and help trajectory optimization to converge orders of magnitude faster than model-free reinforcement learning algorithms for deformable object manipulation. However, such gradient-based trajectory optimization typically requires access to the full simulator states and can only solve short-horizon, single-skill tasks due to local optima. In this work, we propose a novel framework, named DiffSkill, that uses a differentiable physics simulator for skill abstraction to solve long-horizon deformable object manipulation tasks from sensory observations. In particular, we first obtain short-horizon skills using individual tools from a gradient-based optimizer, using the full state information in a differentiable simulator; we then learn a neural skill abstractor from the demonstration trajectories which takes RGBD images as input. Finally, we plan over the skills by finding the intermediate goals and then solve long-horizon tasks. We show the advantages of our method in a new set of sequential deformable object manipulation tasks compared to previous reinforcement learning algorithms and compared to the trajectory optimizer., Comment: ICLR 2022. Project page: https://xingyu-lin.github.io/diffskill/
Published: 2022

17. Learning Multi-Object Dynamics with Compositional Neural Radiance Fields

Author: Driess, Danny, Huang, Zhiao, Li, Yunzhu, Tedrake, Russ, and Toussaint, Marc
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: We present a method to learn compositional multi-object dynamics models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks. NeRFs have become a popular choice for representing scenes due to their strong 3D prior. However, most NeRF approaches are trained on a single scene, representing the whole scene with a global model, making generalization to novel scenes, containing different numbers of objects, challenging. Instead, we present a compositional, object-centric auto-encoder framework that maps multiple views of the scene to a set of latent vectors representing each object separately. The latent vectors parameterize individual NeRFs from which the scene can be reconstructed. Based on those latent vectors, we train a graph neural network dynamics model in the latent space to achieve compositionality for dynamics prediction. A key feature of our approach is that the latent vectors are forced to encode 3D information through the NeRF decoder, which enables us to incorporate structural priors in learning the dynamics models, making long-term predictions more stable compared to several baselines. Simulated and real world experiments show that our method can model and learn the dynamics of compositional scenes including rigid and deformable objects. Video: https://dannydriess.github.io/compnerfdyn/, Comment: v3: real robot exp
Published: 2022

18. Close the Optical Sensing Domain Gap by Physics-Grounded Active Stereo Sensor Simulation

Author: Zhang, Xiaoshuai, Chen, Rui, Li, Ang, Xiang, Fanbo, Qin, Yuzhe, Gu, Jiayuan, Ling, Zhan, Liu, Minghua, Zeng, Peiyu, Han, Songfang, Huang, Zhiao, Mu, Tongzhou, Xu, Jing, and Su, Hao
Subjects: Computer Science - Robotics
Abstract: In this paper, we focus on the simulation of active stereovision depth sensors, which are popular in both academic and industry communities. Inspired by the underlying mechanism of the sensors, we designed a fully physics-grounded simulation pipeline that includes material acquisition, ray-tracing-based infrared (IR) image rendering, IR noise simulation, and depth estimation. The pipeline is able to generate depth maps with material-dependent error patterns similar to a real depth sensor in real time. We conduct real experiments to show that perception algorithms and reinforcement learning policies trained in our simulation platform could transfer well to the real-world test cases without any fine-tuning. Furthermore, due to the high degree of realism of this simulation, our depth sensor simulator can be used as a convenient testbed to evaluate the algorithm performance in the real world, which will largely reduce the human effort in developing robotic algorithms. The entire pipeline has been integrated into the SAPIEN simulator and is open-sourced to promote the research of vision and robotics communities., Comment: The paper will appear in the IEEE Transactions on Robotics. 20 pages, 14 figures, 10 tables
Published: 2022

19. ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations

Author: Mu, Tongzhou, Ling, Zhan, Xiang, Fanbo, Yang, Derek, Li, Xuanlin, Tao, Stone, Huang, Zhiao, Jia, Zhiwei, and Su, Hao
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Object manipulation from 3D visual inputs poses many challenges on building generalizable perception and policy models. However, 3D assets in existing benchmarks mostly lack the diversity of 3D shapes that align with real-world intra-class complexity in topology and geometry. Here we propose SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator. 3D assets in ManiSkill include large intra-class topological and geometric variations. Tasks are carefully chosen to cover distinct types of manipulation challenges. Latest progress in 3D vision also makes us believe that we should customize the benchmark so that the challenge is inviting to researchers working on 3D deep learning. To this end, we simulate a moving panoramic camera that returns ego-centric point clouds or RGB-D images. In addition, we would like ManiSkill to serve a broad set of researchers interested in manipulation research. Besides supporting the learning of policies from interactions, we also support learning-from-demonstrations (LfD) methods, by providing a large number of high-quality demonstrations (~36,000 successful trajectories, ~1.5M point cloud/RGB-D frames in total). We provide baselines using 3D deep learning and LfD algorithms. All code of our benchmark (simulator, environment, SDK, and baselines) is open-sourced, and a challenge facing interdisciplinary researchers will be held based on the benchmark., Comment: NeurIPS 2021 Track on Datasets and Benchmarks; code: https://github.com/haosulab/ManiSkill
Published: 2021

20. PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics

Author: Huang, Zhiao, Hu, Yuanming, Du, Tao, Zhou, Siyuan, Su, Hao, Tenenbaum, Joshua B., and Gan, Chuang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Robotics
Abstract: Simulated virtual environments serve as one of the main driving forces behind developing and evaluating skill learning algorithms. However, existing environments typically only simulate rigid body physics. Additionally, the simulation process usually does not provide gradients that might be useful for planning and control optimizations. We introduce a new differentiable physics benchmark called PlasticineLab, which includes a diverse collection of soft body manipulation tasks. In each task, the agent uses manipulators to deform the plasticine into the desired configuration. The underlying physics engine supports differentiable elastic and plastic deformation using the DiffTaichi system, posing many under-explored challenges to robotic agents. We evaluate several existing reinforcement learning (RL) methods and gradient-based methods on this benchmark. Experimental results suggest that 1) RL-based approaches struggle to solve most of the tasks efficiently; 2) gradient-based approaches, by optimizing open-loop control sequences with the built-in differentiable physics engine, can rapidly find a solution within tens of iterations, but still fall short on multi-stage tasks that require long-term planning. We expect that PlasticineLab will encourage the development of novel algorithms that combine differentiable physics and RL for more complex physics-based skill learning tasks., Comment: Accepted to ICLR 2021 as a spotlight presentation. Project page: http://plasticinelab.csail.mit.edu/
Published: 2021

21. Towards Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous Graph Neural Networks

Author: Tang, Hao, Huang, Zhiao, Gu, Jiayuan, Lu, Bao-Liang, and Su, Hao
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Current graph neural networks (GNNs) lack generalizability with respect to scales (graph sizes, graph diameters, edge weights, etc.) when solving many graph analysis problems. Taking the perspective of synthesizing graph theory programs, we propose several extensions to address the issue. First, inspired by the dependency of the iteration number of common graph theory algorithms on graph size, we learn to terminate the message passing process in GNNs adaptively according to the computation progress. Second, inspired by the fact that many graph theory algorithms are homogeneous with respect to graph weights, we introduce homogeneous transformation layers that are universal homogeneous function approximators, to convert ordinary GNNs to be homogeneous. Experimentally, we show that our GNN can be trained from small-scale graphs but generalize well to large-scale graphs for a number of basic graph theory problems. It also shows generalizability for applications of multi-body physical simulation and image-based navigation problems., Comment: To appear at NeurIPS 2020
Published: 2020

22. Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories

Author: Luo, Tiange, Mo, Kaichun, Huang, Zhiao, Xu, Jiarui, Hu, Siyu, Wang, Liwei, and Su, Hao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: We address the problem of discovering 3D parts for objects in unseen categories. Being able to learn the geometry prior of parts and transfer this prior to unseen categories poses fundamental challenges on data-driven shape segmentation approaches. Formulated as a contextual bandit problem, we propose a learning-based agglomerative clustering framework which learns a grouping policy to progressively group small part proposals into bigger ones in a bottom-up fashion. At the core of our approach is to restrict the local context for extracting part-level features, which encourages the generalizability to unseen categories. On the large-scale fine-grained 3D part dataset, PartNet, we demonstrate that our method can transfer knowledge of parts learned from 3 training categories to 21 unseen testing categories without seeing any annotated samples. Quantitative comparisons against four shape segmentation baselines show that our approach achieves state-of-the-art performance., Comment: ICLR 2020
Published: 2020

23. Modeling-based Optimization for Robotic Manipulation

Author: Huang, Zhiao
Subjects: Computer science, Generative Modeling, Optimization, Robotics, Simulation, Soft Body
Abstract: This dissertation explores the intersection of modeling and optimization in robotics, focusing on the development of efficient and effective systems for robotic manipulation. The primary objective is to study how to integrate modeling techniques with optimization processes, a concept we term "modeling-based optimization." We first introduce a differentiable physics simulator for soft-body manipulation, demonstrating the power of environment modeling in policy learning. By simulating elastoplastic materials such as plasticine, we benchmark reinforcement learning (RL) and gradient-based optimization methods, highlighting the strengths and limitations of each approach. The findings reveal that while gradient-based methods excel in environments with well-modeled physics, they struggle with long-term planning and multi-stage tasks. To address these challenges, we propose a reparameterized policy gradient method, which leverages latent variable models to facilitate exploration and avoid local minima. This approach integrates generative models to enhance policy expressiveness and improve performance in hard-exploration tasks. We further extend the concept of hierarchical policy modeling by introducing graph-based and vision-language-driven methods. These techniques enable robots to plan and execute long-horizon tasks by abstracting the search space and using human-like instructions to guide complex manipulations. The contributions of this thesis include the development of novel algorithms for soft-body manipulation, hierarchical policy modeling, and the integration of generative models with reinforcement learning. These advancements offer new insights into the relationship between learning, modeling, and optimization in robotics.
Published: 2024

24. Mapping State Space using Landmarks for Universal Goal Reaching

Author: Huang, Zhiao, Liu, Fangchen, and Su, Hao
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: An agent that has well understood the environment should be able to apply its skills for any given goals, leading to the fundamental problem of learning the Universal Value Function Approximator (UVFA). A UVFA learns to predict the cumulative rewards between all state-goal pairs. However, empirically, the value function for long-range goals is always hard to estimate and may consequently result in failed policy. This has presented challenges to the learning process and the capability of neural networks. We propose a method to address this issue in large MDPs with sparse rewards, in which exploration and routing across remote states are both extremely challenging. Our method explicitly models the environment in a hierarchical manner, with a high-level dynamic landmark-based map abstracting the visited state space, and a low-level value network to derive precise local decisions. We use farthest point sampling to select landmark states from past experience, which has improved exploration compared with simple uniform sampling. Experimentally we showed that our method enables the agent to reach long-range goals at the early training stage, and achieve better performance than standard RL algorithms for a number of challenging tasks.
Published: 2019

25. Object-Oriented Dynamics Predictor

Author: Zhu, Guangxiang, Huang, Zhiao, and Zhang, Chongjie
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Generalization has been one of the major challenges for learning dynamics models in model-based reinforcement learning. However, previous work on action-conditioned dynamics prediction focuses on learning the pixel-level motion and thus does not generalize well to novel environments with different object layouts. In this paper, we present a novel object-oriented framework, called object-oriented dynamics predictor (OODP), which decomposes the environment into objects and predicts the dynamics of objects conditioned on both actions and object-to-object relations. It is an end-to-end neural network and can be trained in an unsupervised manner. To enable the generalization ability of dynamics learning, we design a novel CNN-based relation mechanism that is class-specific (rather than object-specific) and exploits the locality principle. Empirical results show that OODP significantly outperforms previous methods in terms of generalization over novel environments with various object layouts. OODP is able to learn from very few environments and accurately predict dynamics in a large number of unseen environments. In addition, OODP learns semantically and visually interpretable dynamics models., Comment: Accepted to NIPS 2018
Published: 2018

26. Associative Embedding: End-to-End Learning for Joint Detection and Grouping

Author: Newell, Alejandro, Huang, Zhiao, and Deng, Jia
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce associative embedding, a novel method for supervising convolutional neural networks for the task of detection and grouping. A number of computer vision problems can be framed in this manner including multi-person pose estimation, instance segmentation, and multi-object tracking. Usually the grouping of detections is achieved with multi-stage pipelines, instead we propose an approach that teaches a network to simultaneously output detections and group assignments. This technique can be easily integrated into any state-of-the-art network architecture that produces pixel-wise predictions. We show how to apply this method to both multi-person pose estimation and instance segmentation and report state-of-the-art performance for multi-person pose on the MPII and MS-COCO datasets., Comment: Added results on MS-COCO and updated results on MPII
Published: 2016

27. Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression

Author: Huang, Zhiao, Zhou, Erjin, and Cao, Zhimin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Facial landmark localization plays an important role in face recognition and analysis applications. In this paper, we give a brief introduction to a coarse-to-fine pipeline with neural networks and sequential regression. First, a global convolutional network is applied to the holistic facial image to give an initial landmark prediction. A pyramid of multi-scale local image patches is then cropped to feed to a new network for each landmark to refine the prediction. As the refinement network outputs a more accurate position estimation than the input, such procedure could be repeated several times until the estimation converges. We evaluate our system on the 300-W dataset [11] and it outperforms the recent state-of-the-arts.
Published: 2015

28. RoboCraft: Learning to see, simulate, and shape elasto-plastic objects in 3D with graph networks.

Author: Shi, Haochen, Xu, Huazhe, Huang, Zhiao, Li, Yunzhu, and Wu, Jiajun
Abstract: Modeling and manipulating elasto-plastic objects are essential capabilities for robots to perform complex industrial and household interaction tasks (e.g., stuffing dumplings, rolling sushi, and making pottery). However, due to the high degrees of freedom of elasto-plastic objects, significant challenges exist in virtually every aspect of the robotic manipulation pipeline, for example, representing the states, modeling the dynamics, and synthesizing the control signals. We propose to tackle these challenges by employing a particle-based representation for elasto-plastic objects in a model-based planning framework. Our system, RoboCraft, only assumes access to raw RGBD visual observations. It transforms the sensory data into particles and learns a particle-based dynamics model using graph neural networks (GNNs) to capture the structure of the underlying system. The learned model can then be coupled with model predictive control (MPC) algorithms to plan the robot's behavior. We show through experiments that with just 10 min of real-world robot interaction data, our robot can learn a dynamics model that can be used to synthesize control signals to deform elasto-plastic objects into various complex target shapes, including shapes that the robot has never encountered before. We perform systematic evaluations in both simulation and the real world to demonstrate the robot's manipulation capabilities.
Published: 2024
Full Text: View/download PDF

29. Close the Optical Sensing Domain Gap by Physics-Grounded Active Stereo Sensor Simulation

Author: Zhang, Xiaoshuai, primary, Chen, Rui, additional, Li, Ang, additional, Xiang, Fanbo, additional, Qin, Yuzhe, additional, Gu, Jiayuan, additional, Ling, Zhan, additional, Liu, Minghua, additional, Zeng, Peiyu, additional, Han, Songfang, additional, Huang, Zhiao, additional, Mu, Tongzhou, additional, Xu, Jing, additional, and Su, Hao, additional
Published: 2023
Full Text: View/download PDF

30. RoboCraft: Learning to See, Simulate, and Shape Elasto-Plastic Objects with Graph Networks

Author: Shi, Haochen, primary, Xu, Huazhe, additional, Huang, Zhiao, additional, Li, Yunzhu, additional, and Wu, Jiajun, additional
Published: 2022
Full Text: View/download PDF

31. Utilization of Multilevel Flow Modelling to Support Passive Safety System Reliability Assessment

Author: Huang, Zhiao, primary, Lind, Morten, additional, Zhang, Xinxin, additional, Wu, Jing, additional, and Miao, Huifang, additional
Published: 2021
Full Text: View/download PDF

32. ManiSkill-2021

Author: Mu, Tongzhou, primary, Ling, Zhan, additional, Xiang, Fanbo, additional, Yang, Derek, additional, Li, Xuanlin, additional, Tao, Stone, additional, Huang, Zhiao, additional, Jia, Zhiwei, additional, and Su, Hao, additional
Full Text: View/download PDF
