Author: "Huang, Sandy H." - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Huang, Sandy H." returned 26 results.

1. Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots

Author: Lampe, Thomas, Abdolmaleki, Abbas, Bechtle, Sarah, Huang, Sandy H., Springenberg, Jost Tobias, Bloesch, Michael, Groth, Oliver, Hafner, Roland, Hertweck, Tim, Neunert, Michael, Wulfmeier, Markus, Zhang, Jingwei, Nori, Francesco, Heess, Nicolas, and Riedmiller, Martin
Subjects: Computer Science - Robotics
Abstract: Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient through re-using previously collected sub-optimal data. In this paper we demonstrate how the increased understanding of off-policy learning methods and their embedding in an iterative online/offline scheme ("collect and infer") can drastically improve data-efficiency by using all the collected experience, which empowers learning from real robot experience only. Moreover, the resulting policy improves significantly over the state of the art on a recently proposed real robot manipulation benchmark. Our approach learns end-to-end, directly from pixels, and does not rely on additional human domain knowledge such as a simulator or demonstrations.
Published: 2023

2. Coherent Soft Imitation Learning

Author: Watson, Joe, Huang, Sandy H., and Heess, Nicolas
Subjects: Computer Science - Machine Learning
Abstract: Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward. Such methods enable agents to learn complex tasks from humans that are difficult to capture with hand-designed reward functions. Choosing BC or IRL for imitation depends on the quality and state-action coverage of the demonstrations, as well as additional access to the Markov decision process. Hybrid strategies that combine BC and IRL are not common, as initial policy optimization against inaccurate rewards diminishes the benefit of pretraining the policy with BC. This work derives an imitation method that captures the strengths of both BC and IRL. In the entropy-regularized ('soft') reinforcement learning setting, we show that the behaviour-cloned policy can be used as both a shaped reward and a critic hypothesis space by inverting the regularized policy update. This coherency facilitates fine-tuning cloned policies using the reward estimate and additional interactions with the environment. This approach conveniently achieves imitation learning through initial behaviour cloning, followed by refinement via RL with online or offline data sources. The simplicity of the approach enables graceful scaling to high-dimensional and vision-based tasks, with stable learning and minimal hyperparameter tuning, in contrast to adversarial approaches. For the open-source implementation and simulation results, see https://joemwatson.github.io/csil/.
Comment: 51 pages, 49 figures. DeepMind internship report. Accepted as a spotlight paper at Advances in Neural Information Processing Systems 2023
Published: 2023

3. Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Author: Haarnoja, Tuomas, Moran, Ben, Lever, Guy, Huang, Sandy H., Tirumala, Dhruva, Humplik, Jan, Wulfmeier, Markus, Tunyasuvunakool, Saran, Siegel, Noah Y., Hafner, Roland, Bloesch, Michael, Hartikainen, Kristian, Byravan, Arunkumar, Hasenclever, Leonard, Tassa, Yuval, Sadeghi, Fereshteh, Batchelor, Nathan, Casarini, Federico, Saliceti, Stefano, Game, Charles, Sreendra, Neil, Patel, Kushal, Gwira, Marlon, Huber, Andrea, Hurley, Nicole, Nori, Francesco, Hadsell, Raia, and Heess, Nicolas
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and it transitions between them in a smooth, stable, and efficient manner. The agent's locomotion and tactical behavior adapt to specific game contexts in a way that would be impractical to manually design. The agent also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. Our agent was trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer. Although the robots are inherently fragile, basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way, well beyond what is intuitively expected from the robot. Indeed, in experiments, they walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline, while efficiently combining the skills to achieve the longer-term objectives.
Comment: Project website: https://sites.google.com/view/op3-soccer
Published: 2023

4. Learning Causal Overhypotheses through Exploration in Children and Computational Models

Author: Kosoy, Eliza, Liu, Adrian, Collins, Jasmine, Chan, David M., Hamrick, Jessica B., Ke, Nan Rosemary, Huang, Sandy H., Kaufmann, Bryanna, Canny, John, and Gopnik, Alison
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing
Abstract: Despite recent progress in reinforcement learning (RL), RL algorithms for exploration remain an active area of research. Existing methods often focus on state-based metrics, which do not consider the underlying causal structures of the environment, and while recent research has begun to explore RL environments for causal learning, these environments primarily leverage causal information through causal inference or induction rather than exploration. In contrast, human children, some of the most proficient explorers, have been shown to use causal information to great benefit. In this work, we introduce a novel RL environment designed with a controllable causal structure, which allows us to evaluate exploration strategies used by both agents and children in a unified environment. In addition, through experimentation on both computational models and children, we demonstrate that there are significant differences between information-gain optimal RL exploration in causal environments and the exploration of children in the same environments. We conclude with a discussion of how these findings may inspire new directions of research into efficient exploration and disambiguation of causal structures for RL algorithms.
Published: 2022

5. On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

Author: Abdolmaleki, Abbas, Huang, Sandy H., Vezzani, Giulia, Shahriari, Bobak, Springenberg, Jost Tobias, Mishra, Shruti, TB, Dhruva, Byravan, Arunkumar, Bousmalis, Konstantinos, Gyorgy, Andras, Szepesvari, Csaba, Hadsell, Raia, Heess, Nicolas, and Riedmiller, Martin
Subjects: Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step. This includes ideas as far ranging as exploration bonuses, entropy regularization, and regularization toward teachers or data priors. Often, the task reward and auxiliary objectives are in conflict, and in this paper we argue that this makes it natural to treat these cases as instances of multi-objective (MO) optimization problems. We demonstrate how this perspective allows us to develop novel and more effective RL algorithms. In particular, we focus on offline RL and finetuning as case studies, and show that existing approaches can be understood as MO algorithms relying on linear scalarization. We hypothesize that replacing linear scalarization with a better algorithm can improve performance. We introduce Distillation of a Mixture of Experts (DiME), a new multi-objective RL (MORL) algorithm that outperforms linear scalarization and can be applied to these non-standard MO problems. We demonstrate that for offline RL, DiME leads to a simple new algorithm that outperforms the state of the art. For finetuning, we derive new algorithms that learn to outperform the teacher policy.
Published: 2021

6. A Distributional View on Multi-Objective Policy Optimization

Author: Abdolmaleki, Abbas, Huang, Sandy H., Hasenclever, Leonard, Neunert, Michael, Song, H. Francis, Zambelli, Martina, Martins, Murilo F., Heess, Nicolas, Hadsell, Raia, and Riedmiller, Martin
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Robotics, Statistics - Machine Learning
Abstract: Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We propose to learn an action distribution for each objective, and we use supervised learning to fit a parametric policy to a combination of these distributions. We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
Published: 2020

7. Nonverbal Robot Feedback for Human Teachers

Author: Huang, Sandy H., Huang, Isabella, Pandya, Ravi, and Dragan, Anca D.
Subjects: Computer Science - Robotics, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: Robots can learn preferences from human demonstrations, but their success depends on how informative these demonstrations are. Being informative is unfortunately very challenging, because during teaching, people typically get no transparency into what the robot already knows or has learned so far. In contrast, human students naturally provide a wealth of nonverbal feedback that reveals their level of understanding and engagement. In this work, we study how a robot can similarly provide feedback that is minimally disruptive, yet gives human teachers a better mental model of the robot learner, and thus enables them to teach more effectively. Our idea is that at any point, the robot can indicate what it thinks the correct next action is, shedding light on its current estimate of the human's preferences. We analyze how useful this feedback is, both in theory and with two user studies: one with a virtual character that tests the feedback itself, and one with a PR2 robot that uses gaze as the feedback mechanism. We find that feedback can be useful for improving both the quality of teaching and teachers' understanding of the robot's capability.
Comment: CoRL 2019
Published: 2019

8. Learning Gentle Object Manipulation with Curiosity-Driven Deep Reinforcement Learning

Author: Huang, Sandy H., Zambelli, Martina, Kay, Jackie, Martins, Murilo F., Tassa, Yuval, Pilarski, Patrick M., and Hadsell, Raia
Subjects: Computer Science - Robotics
Abstract: Robots must know how to be gentle when they need to interact with fragile objects, or when the robot itself is prone to wear and tear. We propose an approach that enables deep reinforcement learning to train policies that are gentle, both during exploration and task execution. In a reward-based learning environment, a natural approach involves augmenting the (task) reward with a penalty for non-gentleness, which can be defined as excessive impact force. However, augmenting with only this penalty impairs learning: policies get stuck in a local optimum which avoids all contact with the environment. Prior research has shown that combining auxiliary tasks or intrinsic rewards can be beneficial for stabilizing and accelerating learning in sparse-reward domains, and indeed we find that introducing a surprise-based intrinsic reward does avoid the no-contact failure case. However, we show that a simple dynamics-based surprise is not as effective as penalty-based surprise. Penalty-based surprise, based on predicting forceful contacts, has a further benefit: it encourages exploration which is contact-rich yet gentle. We demonstrate the effectiveness of the approach using a complex, tendon-powered robot hand with tactile sensors. Videos are available at http://sites.google.com/view/gentlemanipulation.
Published: 2019

9. Human-AI Learning Performance in Multi-Armed Bandits

Author: Pandya, Ravi, Huang, Sandy H., Hadfield-Menell, Dylan, and Dragan, Anca D.
Subjects: Computer Science - Artificial Intelligence
Abstract: People frequently face challenging decision-making problems in which outcomes are uncertain or unknown. Artificial intelligence (AI) algorithms exist that can outperform humans at learning such tasks. Thus, there is an opportunity for AI agents to assist people in learning these tasks more effectively. In this work, we use a multi-armed bandit as a controlled setting in which to explore this direction. We pair humans with a selection of agents and observe how well each human-agent team performs. We find that team performance can beat both human and agent performance in isolation. Interestingly, we also find that an agent's performance in isolation does not necessarily correlate with the human-agent team's performance. A drop in agent performance can lead to a disproportionately large drop in team performance, or in some settings can even improve team performance. Pairing a human with an agent that performs slightly better than them can make them perform much better, while pairing them with an agent that performs the same can make them perform much worse. Further, our results suggest that people have different exploration strategies and might perform better with agents that match their strategy. Overall, optimizing human-agent team performance requires going beyond optimizing agent performance, to understanding how the agent's suggestions will influence human decision-making.
Comment: Artificial Intelligence, Ethics and Society (AIES) 2019
Published: 2018

10. Establishing Appropriate Trust via Critical States

Author: Huang, Sandy H., Bhatia, Kush, Abbeel, Pieter, and Dragan, Anca D.
Subjects: Computer Science - Robotics
Abstract: In order to effectively interact with or supervise a robot, humans need to have an accurate mental model of its capabilities and how it acts. Learned neural network policies make that particularly challenging. We propose an approach for helping end-users build a mental model of such policies. Our key observation is that for most tasks, the essence of the policy is captured in a few critical states: states in which it is very important to take a certain action. Our user studies show that if the robot shows a human what its understanding of the task's critical states is, then the human can make a more informed decision about whether to deploy the policy, and if she does deploy it, when she needs to take control from it at execution time.
Comment: IROS 2018
Published: 2018

11. Expressing Robot Incapability

Author: Kwon, Minae, Huang, Sandy H., and Dragan, Anca D.
Subjects: Computer Science - Robotics
Abstract: Our goal is to enable robots to express their incapability, and to do so in a way that communicates both what they are trying to accomplish and why they are unable to accomplish it. We frame this as a trajectory optimization problem: maximize the similarity between the motion expressing incapability and what would amount to successful task execution, while obeying the physical limits of the robot. We introduce and evaluate candidate similarity measures, and show that one in particular generalizes to a range of tasks, while producing expressive motions that are tailored to each task. Our user study indicates that our approach automatically generates motions expressing incapability that communicate both what and why to end-users, and improve their overall perception of the robot and willingness to collaborate with it in the future.
Comment: HRI 2018
Published: 2018

12. Enabling Robots to Communicate their Objectives

Author: Huang, Sandy H., Held, David, Abbeel, Pieter, and Dragan, Anca D.
Subjects: Computer Science - Robotics, Computer Science - Machine Learning
Abstract: The overarching goal of this work is to efficiently enable end-users to correctly anticipate a robot's behavior in novel situations. Since a robot's behavior is often a direct result of its underlying objective function, our insight is that end-users need to have an accurate mental model of this objective function in order to understand and predict what the robot will do. While people naturally develop such a mental model over time through observing the robot act, this familiarization process may be lengthy. Our approach reduces this time by having the robot model how people infer objectives from observed behavior, and then it selects those behaviors that are maximally informative. The problem of computing a posterior over objectives from observed behavior is known as Inverse Reinforcement Learning (IRL), and has been applied to robots learning human objectives. We consider the problem where the roles of human and robot are swapped. Our main contribution is to recognize that unlike robots, humans will not be exact in their IRL inference. We thus introduce two factors to define candidate approximate-inference models for human learning in this setting, and analyze them in a user study in the autonomous driving domain. We show that certain approximate-inference models lead to the robot generating example behaviors that better enable users to anticipate what it will do in novel situations. Our results also suggest, however, that additional research is needed in modeling how humans extrapolate from examples of robot behavior.
Comment: RSS 2017
Published: 2017

13. Learning agile soccer skills for a bipedal robot with deep reinforcement learning

Author: Haarnoja, Tuomas, Moran, Ben, Lever, Guy, Huang, Sandy H., Tirumala, Dhruva, Humplik, Jan, Wulfmeier, Markus, Tunyasuvunakool, Saran, Siegel, Noah Y., Hafner, Roland, Bloesch, Michael, Hartikainen, Kristian, Byravan, Arunkumar, Hasenclever, Leonard, Tassa, Yuval, Sadeghi, Fereshteh, Batchelor, Nathan, Casarini, Federico, Saliceti, Stefano, Game, Charles, Sreendra, Neil, Patel, Kushal, Gwira, Marlon, Huber, Andrea, Hurley, Nicole, Nori, Francesco, Hadsell, Raia, and Heess, Nicolas
Published: 2024

14. Toward personalizing treatment for depression: predicting diagnosis and severity

Author: Huang, Sandy H., LePendu, Paea, Iyer, Srinivasan V., Tai-Seale, Ming, Carrell, David, and Shah, Nigam H.
Subjects: Humans, Diagnosis, Differential, Severity of Illness Index, ROC Curve, Depressive Disorder, Models, Psychological, Female, Male, Electronic Health Records, Precision Medicine, data mining, depression, electronic health records, ontology, personalized medicine, Patient Safety, Behavioral and Social Science, Brain Disorders, Clinical Research, Mental Health, Health Services, Serious Mental Illness, Depression, 4.1 Discovery and preclinical testing of markers and technologies, Detection, screening and diagnosis, Mental health, Information and Computing Sciences, Engineering, Medical and Health Sciences, Medical Informatics
Abstract: Objective: Depression is a prevalent disorder that is difficult to diagnose and treat. In particular, depressed patients exhibit largely unpredictable responses to treatment. Toward the goal of personalizing treatment for depression, we develop and evaluate computational models that use electronic health record (EHR) data for predicting the diagnosis and severity of depression, and response to treatment. Materials and methods: We develop regression-based models for predicting depression, its severity, and response to treatment from EHR data, using structured diagnosis and medication codes as well as free-text clinical reports. We used two datasets: 35,000 patients (5000 depressed) from the Palo Alto Medical Foundation and 5651 patients treated for depression from the Group Health Research Institute. Results: Our models are able to predict a future diagnosis of depression up to 12 months in advance (area under the receiver operating characteristic curve [AUC] 0.70-0.80). We can differentiate patients with severe baseline depression from those with minimal or mild baseline depression (AUC 0.72). Baseline depression severity was the strongest predictor of treatment response for medication and psychotherapy. Conclusions: It is possible to use EHR data to predict a diagnosis of depression up to 12 months in advance and to differentiate between extreme baseline levels of depression. The models use commonly available data on diagnosis, medication, and clinical progress notes, making them easily portable. The ability to automatically determine severity can facilitate assembly of large patient cohorts with similar severity from multiple sites, which may enable elucidation of the moderators of treatment response in the future.
Published: 2014

15. Trajectories of resilience and dysfunction following potential trauma: A review and statistical evaluation

Author: Galatzer-Levy, Isaac R., Huang, Sandy H., and Bonanno, George A.
Published: 2018

16. Enabling robots to communicate their objectives

Author: Huang, Sandy H., Held, David, Abbeel, Pieter, and Dragan, Anca D.
Published: 2019

17. Looking back and moving forward: dimensions of coping flexibility divergently predict long-term bereavement outcomes

Author: Huang, Sandy H., Birk, Jeffrey L., and Bonanno, George A.
Published: 2022

18. Looking back and moving forward: dimensions of coping flexibility divergently predict long-term bereavement outcomes

Author: Huang, Sandy H., Birk, Jeffrey L., and Bonanno, George A.
Subjects: Bereavement, Complicated grief, Life change events, Post-traumatic stress disorder, Psychological adaptation
Abstract: Bereavement is a serious public health concern. Some people suffer prolonged and debilitating functional impairment after the death of a loved one. Evidence suggests that flexibility in coping approaches predicts resilience after stressful life events, but its long-term effects after the unique experience of bereavement are unknown. Which strategies of coping flexibility predict better (or worse) adjustment over time for bereaved people, and at what times? The present study used path analyses to investigate longitudinal effects of forward-focus and loss-focus coping strategies on symptoms of persistent complex bereavement disorder (PCBD), depression, and posttraumatic stress disorder in a spousally bereaved adult sample (N = 248) at three time-points after the loss (∼3 months, ∼14 months, and ∼25 months). Forward-focus coping demonstrated adaptive utility overall, with earlier effects on PCBD than on depression. By contrast, loss-focus coping demonstrated a delayed-onset, maladaptive pattern. The findings contribute to the coping flexibility literature by suggesting that the adaptiveness or maladaptiveness of different coping strategies may depend on the context that requires coping. In particular, forward-focus coping may be substantially more advantageous than loss-focus coping in the context of bereavement. Implications, limitations, and future research directions are discussed.
Published: 2023

19. Layers of Flexibility and the Prediction of Adaptation to Major Life Stressors

Author: Huang, Sandy H.
Published: 2022

20. On Multi-objective Policy Optimization as a Tool for Reinforcement Learning

Author: Abdolmaleki, Abbas, Huang, Sandy H., Vezzani, Giulia, Shahriari, Bobak, Springenberg, Jost Tobias, Mishra, Shruti, TB, Dhruva, Byravan, Arunkumar, Bousmalis, Konstantinos, Gyorgy, Andras, Szepesvari, Csaba, Hadsell, Raia, Heess, Nicolas, and Riedmiller, Martin
Subjects: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives, or constraints, in the policy optimization step. This includes ideas as far ranging as exploration bonuses, entropy regularization, and regularization toward teachers or data priors when learning from experts or in offline RL. Often, task reward and auxiliary objectives are in conflict with each other, and it is therefore natural to treat these examples as instances of multi-objective (MO) optimization problems. We study the principles underlying multi-objective RL (MORL) and introduce a new algorithm, Distillation of a Mixture of Experts (DiME), that is intuitive and scale-invariant under some conditions. We highlight its strengths on standard MO benchmark problems and consider case studies in which we recast offline RL and learning from experts as MO problems. This leads to a natural algorithmic formulation that sheds light on the connection between existing approaches. For offline RL, we use the MO perspective to derive a simple algorithm that optimizes for the standard RL objective plus a behavioral cloning term. This outperforms the state of the art on two established offline RL benchmarks.
Published: 2021

21. Human-AI Learning Performance in Multi-Armed Bandits

Author: Pandya, Ravi, Huang, Sandy H., Hadfield-Menell, Dylan, and Dragan, Anca D.
Published: 2019

22. Establishing Appropriate Trust via Critical States

Author: Huang, Sandy H., Bhatia, Kush, Abbeel, Pieter, and Dragan, Anca D.
Published: 2018

23. Enabling robots to communicate their objectives

Author: Huang, Sandy H., Held, David, Abbeel, Pieter, and Dragan, Anca D.
Published: 2018

24. Expressing Robot Incapability

Author: Kwon, Minae, Huang, Sandy H., and Dragan, Anca D.
Published: 2018

25. Leveraging appearance priors in non-rigid registration, with application to manipulation of deformable objects

Author: Huang, Sandy H., Pan, Jia, Mulcaire, George, and Abbeel, Pieter
Published: 2015

26. Unifying scene registration and trajectory optimization for learning from demonstrations with application to manipulation of deformable objects

Author: Lee, Alex X., Huang, Sandy H., Hadfield-Menell, Dylan, Tzeng, Eric, and Abbeel, Pieter
Published: 2014
