Author: "Ma, Haitong" / Publication Type: Electronic Resources - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Ma, Haitong"' showing total 17 results

1. Efficient Duple Perturbation Robustness in Low-rank MDPs

Author: Hu, Yang; Ma, Haitong; Dai, Bo; Li, Na
Abstract: The pursuit of robustness has recently been a popular topic in reinforcement learning (RL) research, yet the existing methods generally suffer from efficiency issues that obstruct their real-world implementation. In this paper, we introduce duple perturbation robustness, i.e. perturbation on both the feature and factor vectors for low-rank Markov decision processes (MDPs), via a novel characterization of $(\xi,\eta)$-ambiguity sets. The novel robust MDP formulation is compatible with the function representation view, and therefore, is naturally applicable to practical RL problems with large or even continuous state-action spaces. Meanwhile, it also gives rise to a provably efficient and practical algorithm with theoretical convergence rate guarantee. Examples are designed to justify the new robustness concept, and algorithmic efficiency is supported by both theoretical bounds and numerical simulations.
Comment: 25 pages, 8 figures, in submission to ICML'24
Published: 2024

2. Skill Transfer and Discovery for Sim-to-Real Learning: A Representation-Based Viewpoint

Author: Ma, Haitong; Ren, Zhaolin; Dai, Bo; Li, Na
Abstract: We study sim-to-real skill transfer and discovery in the context of robotics control using representation learning. We draw inspiration from spectral decomposition of Markov decision processes. The spectral decomposition brings about representation that can linearly represent the state-action value function induced by any policies, thus can be regarded as skills. The skill representations are transferable across arbitrary tasks with the same transition dynamics. Moreover, to handle the sim-to-real gap in the dynamics, we propose a skill discovery algorithm that learns new skills caused by the sim-to-real gap from real-world data. We promote the discovery of new skills by enforcing orthogonal constraints between the skills to learn and the skills from simulators, and then synthesize the policy using the enlarged skill sets. We demonstrate our methodology by transferring quadrotor controllers from simulators to Crazyflie 2.1 quadrotors. We show that we can learn the skill representations from a single simulator task and transfer these to multiple different real-world tasks including hovering, taking off, landing and trajectory tracking. Our skill discovery approach helps narrow the sim-to-real gap and improve the real-world controller performance by up to 30.2%.
Comment: 9 pages, 6 figures. Project page: https://congharvard.github.io/steady-sim-to-real
Published: 2024

3. Autonomous Robotic Ultrasound System for Liver Follow-up Diagnosis: Pilot Phantom Study

Author: Zhang, Tianpeng; Kim, Sekeun; Charton, Jerome; Ma, Haitong; Kim, Kyungsang; Li, Na; Li, Quanzheng
Abstract: The paper introduces a novel autonomous robot ultrasound (US) system targeting liver follow-up scans for outpatients in local communities. Given a computed tomography (CT) image with specific target regions of interest, the proposed system carries out the autonomous follow-up scan in three steps: (i) initial robot contact to surface, (ii) coordinate mapping between CT image and robot, and (iii) target US scan. Utilizing 3D US-CT registration and deep learning-based segmentation networks, we can achieve precise imaging of 3D hepatic veins, facilitating accurate coordinate mapping between CT and the robot. This enables the automatic localization of follow-up targets within the CT image, allowing the robot to navigate precisely to the target's surface. Evaluation of the ultrasound phantom confirms the quality of the US-CT registration and shows the robot reliably locates the targets in repeated trials. The proposed framework holds the potential to significantly reduce time and costs for healthcare providers, clinicians, and follow-up patients, thereby addressing the increasing healthcare burden associated with chronic disease in local communities.
Published: 2024

4. Gaussian Max-Value Entropy Search for Multi-Agent Bayesian Optimization

Author: Ma, Haitong; Zhang, Tianpeng; Wu, Yixuan; Calmon, Flavio P.; Li, Na
Abstract: We study the multi-agent Bayesian optimization (BO) problem, where multiple agents maximize a black-box function via iterative queries. We focus on Entropy Search (ES), a sample-efficient BO algorithm that selects queries to maximize the mutual information about the maximum of the black-box function. One of the main challenges of ES is that calculating the mutual information requires computationally-costly approximation techniques. For multi-agent BO problems, the computational cost of ES is exponential in the number of agents. To address this challenge, we propose the Gaussian Max-value Entropy Search, a multi-agent BO algorithm with favorable sample and computational efficiency. The key to our idea is to use a normal distribution to approximate the function maximum and calculate its mutual information accordingly. The resulting approximation allows queries to be cast as the solution of a closed-form optimization problem which, in turn, can be solved via a modified gradient ascent algorithm and scaled to a large number of agents. We demonstrate the effectiveness of Gaussian max-value Entropy Search through numerical experiments on standard test functions and real-robot experiments on the source-seeking problem. Results show that the proposed algorithm outperforms the multi-agent BO baselines in the numerical experiments and can stably seek the source with a limited number of noisy observations on real robots.
Comment: 10 pages, 9 figures
Published: 2023

5. Policy Iteration Based Approximate Dynamic Programming Toward Autonomous Driving in Constrained Dynamic Environment

Author: Lin, Ziyu; Ma, Jun; Duan, Jingliang; Li, Shengbo Eben; Ma, Haitong; Cheng, Bo; Lee, Tong Heng
Abstract: In the area of autonomous driving, it typically brings great difficulty in solving the motion planning problem since the vehicle model is nonlinear and the driving scenarios are complex. Particularly, most of the existing methods cannot be generalized to dynamically changing scenarios with varying surrounding vehicles. To address this problem, this development here investigates the framework of integrated decision and control. As part of the modules, static path planning determines the reference candidates ahead, and then the optimal path-tracking controller realizes the specific autonomous driving task. An innovative and effective constrained finite-horizon approximate dynamic programming (ADP) algorithm is herein presented to generate the desired control policy for effective path tracking. With the generalized policy neural network that maps from the state to the control input, the proposed algorithm preserves the high effectiveness for the motion planning problem towards changing driving environments with varying surrounding vehicles. Moreover, the algorithm attains the noteworthy advantage of alleviating the typically heavy computational loads with the mode of offline training and online execution. As a result of the utilization of multi-layer neural networks in conjunction with the actor-critic framework, the constrained ADP method is capable of handling complex and multidimensional scenarios. Finally, various simulations have been carried out to show that the constrained ADP algorithm is effective. IEEE
Published: 2023

6. Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding

Author: Ren, Tongzheng; Ren, Zhaolin; Ma, Haitong; Li, Na; Dai, Bo
Abstract: This paper presents an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear stochastic systems. This method leverages an infinite-dimensional feature to linearly represent the state-action value function and exploits finite-dimensional truncation approximation for practical implementation. To characterize the effectiveness of these finite dimensional approximations, we provide an in-depth theoretical analysis to characterize the approximation error induced by the finite-dimension truncation and statistical error induced by finite-sample approximation in both policy evaluation and policy optimization. Our analysis includes two prominent kernel approximation methods: truncations onto random features and Nystrom features. We also empirically test the algorithm and compare the performance with Koopman-based, iLQR, and energy-based methods on a few benchmark problems.
Comment: Compared to v1, added analysis of Nystrom features, more streamlined proofs, and more extensive numerical studies; compared to v2, corrected a small error in ordering of author list
Published: 2023

7. Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate

Author: Yu, Dongjie; Zou, Wenjun; Yang, Yujie; Ma, Haitong; Li, Shengbo Eben; Duan, Jingliang; Chen, Jianyu
Abstract: Safe reinforcement learning (RL) that solves constraint-satisfactory policies provides a promising way to the broader safety-critical applications of RL in real-world problems such as robotics. Among all safe RL approaches, model-based methods reduce training time violations further due to their high sample efficiency. However, lacking safety robustness against the model uncertainties remains an issue in safe model-based RL, especially in training time safety. In this paper, we propose a distributional reachability certificate (DRC) and its Bellman equation to address model uncertainties and characterize robust persistently safe states. Furthermore, we build a safe RL framework to resolve constraints required by the DRC and its corresponding shield policy. We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy. Comprehensive experiments on classical benchmarks such as constrained tracking and navigation indicate that the proposed algorithm achieves comparable returns with much fewer constraint violations during training.
Comment: 12 pages, 6 figures
Published: 2022

8. Synthesize Efficient Safety Certificates for Learning-Based Safe Control using Magnitude Regularization

Author: Zheng, Haotian; Ma, Haitong; Zheng, Sifa; Li, Shengbo Eben; Wang, Jianqiang
Abstract: Energy-function-based safety certificates can provide provable safety guarantees for the safe control tasks of complex robotic systems. However, all recent studies about learning-based energy function synthesis only consider the feasibility, which might cause over-conservativeness and result in less efficient controllers. In this work, we proposed the magnitude regularization technique to improve the efficiency of safe controllers by reducing the conservativeness inside the energy function while keeping the promising provable safety guarantees. Specifically, we quantify the conservativeness by the magnitude of the energy function, and we reduce the conservativeness by adding a magnitude regularization term to the synthesis loss. We propose the SafeMR algorithm that uses reinforcement learning (RL) for the synthesis to unify the learning processes of safe controllers and energy functions. Experimental results show that the proposed method does reduce the conservativeness of the energy functions and outperforms the baselines in terms of the controller efficiency while guaranteeing safety.
Comment: 8 pages, 6 figures
Published: 2022

9. Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control

Author: Lin, Ziyu; Duan, Jingliang; Li, Shengbo Eben; Ma, Haitong; Li, Jie; Chen, Jianyu; Cheng, Bo; Ma, Jun
Abstract: The Hamilton-Jacobi-Bellman (HJB) equation serves as the necessary and sufficient condition for the optimal solution to the continuous-time (CT) optimal control problem (OCP). Compared with the infinite-horizon HJB equation, solving the finite-horizon (FH) HJB equation has been a long-standing challenge, because the partial time derivative of the value function is involved as an additional unknown term. To address this problem, this study establishes, for the first time, the link between the partial time derivative and the terminal-time utility function, and thus facilitates the use of the policy iteration (PI) technique to solve CT FH OCPs. Based on this key finding, the FH approximate dynamic programming (ADP) algorithm is proposed, leveraging an actor-critic framework. It is shown that the algorithm exhibits important properties in terms of convergence and optimality. Importantly, with the use of multilayer neural networks (NNs) in the actor-critic architecture, the algorithm is suitable for CT FH OCPs toward more general nonlinear and complex systems. Finally, the effectiveness of the proposed algorithm is demonstrated by conducting a series of simulations on both a linear quadratic regulator (LQR) problem and a nonlinear vehicle tracking problem.
Published: 2022

10. Reachability Constrained Reinforcement Learning

Author: Yu, Dongjie; Ma, Haitong; Li, Shengbo Eben; Chen, Jianyu
Abstract: Constrained reinforcement learning (CRL) has gained significant interest recently, since safety constraints satisfaction is critical for real-world problems. However, existing CRL methods constraining discounted cumulative costs generally lack rigorous definition and guarantee of safety. In contrast, in the safe control research, safety is defined as persistently satisfying certain state constraints. Such persistent safety is possible only on a subset of the state space, called feasible set, where an optimal largest feasible set exists for a given environment. Recent studies incorporate feasible sets into CRL with energy-based methods such as control barrier function (CBF), safety index (SI), and leverage prior conservative estimations of feasible sets, which harms the performance of the learned policy. To deal with this problem, this paper proposes the reachability CRL (RCRL) method by using reachability analysis to establish the novel self-consistency condition and characterize the feasible sets. The feasible sets are represented by the safety value function, which is used as the constraint in CRL. We use the multi-time scale stochastic approximation theory to prove that the proposed algorithm converges to a local optimum, where the largest feasible set can be guaranteed. Empirical results on different benchmarks validate the learned feasible set, the policy performance, and constraint satisfaction of RCRL, compared to CRL and safe control baselines.
Comment: Accepted by ICML 2022
Published: 2022

11. Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning

Author: Ma, Haitong; Liu, Changliu; Li, Shengbo Eben; Zheng, Sifa; Sun, Wenchao; Chen, Jianyu
Abstract: In the trial-and-error mechanism of reinforcement learning (RL), a notorious contradiction arises when we expect to learn a safe policy: how to learn a safe policy without enough data and prior model about the dangerous region? Existing methods mostly use the posterior penalty for dangerous actions, which means that the agent is not penalized until experiencing danger. As a result, the agent cannot learn a zero-violation policy even after convergence. Otherwise, it would not receive any penalty and lose the knowledge about danger. In this paper, we propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions, or the safety indexes. The safety index is designed to increase rapidly for potentially dangerous actions, which allows us to locate the safe set on the action space, or the control safe set. Therefore, we can identify the dangerous actions prior to taking them, and further obtain a zero constraint-violation policy after convergence. We claim that we can learn the energy function in a model-free manner similar to learning a value function. By using the energy function transition as the constraint objective, we formulate a constrained RL problem. We prove that our Lagrangian-based solutions make sure that the learned policy will converge to the constrained optimum under some assumptions. The proposed algorithm is evaluated on both the complex simulation environments and a hardware-in-loop (HIL) experiment with a real controller from the autonomous vehicle. Experimental results suggest that the converged policy in all environments achieves zero constraint violation and comparable performance with model-based baselines.
Published: 2021

12. Joint Synthesis of Safety Certificate and Safe Control Policy using Constrained Reinforcement Learning

Author: Ma, Haitong; Liu, Changliu; Li, Shengbo Eben; Zheng, Sifa; Chen, Jianyu
Abstract: Safety is the major consideration in controlling complex dynamical systems using reinforcement learning (RL), where the safety certificate can provide provable safety guarantee. A valid safety certificate is an energy function indicating that safe states are with low energy, and there exists a corresponding safe control policy that allows the energy function to always dissipate. The safety certificate and the safe control policy are closely related to each other and both challenging to synthesize. Therefore, existing learning-based studies treat either of them as prior knowledge to learn the other, which limits their applicability with general unknown dynamics. This paper proposes a novel approach that simultaneously synthesizes the energy-function-based safety certificate and learns the safe control policy with CRL. We do not rely on prior knowledge about either an available model-based controller or a perfect safety certificate. In particular, we formulate a loss function to optimize the safety certificate parameters by minimizing the occurrence of energy increases. By adding this optimization procedure as an outer loop to the Lagrangian-based constrained reinforcement learning (CRL), we jointly update the policy and safety certificate parameters and prove that they will converge to their respective local optima, the optimal safe policy and a valid safety certificate. We evaluate our algorithms on multiple safety-critical benchmark environments. The results show that the proposed algorithm learns provably safe policies with no constraint violation. The validity or feasibility of synthesized safety certificate is also verified numerically.
Comment: 24 pages, 8 figures, accepted for oral presentation at L4DC 2022
Published: 2021

13. Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety

Author: Ma, Haitong; Guan, Yang; Li, Shengbo Eben; Zhang, Xiangteng; Zheng, Sifa; Chen, Jianyu
Abstract: The safety constraints commonly used by existing safe reinforcement learning (RL) methods are defined only on expectation of initial states, but allow each certain state to be unsafe, which is unsatisfying for real-world safety-critical tasks. In this paper, we introduce the feasible actor-critic (FAC) algorithm, which is the first model-free constrained RL method that considers statewise safety, e.g., safety for each initial state. We claim that some states are inherently unsafe no matter what policy we choose, while for other states there exist policies ensuring safety, where we say such states and policies are feasible. By constructing a statewise Lagrange function available on RL sampling and adopting an additional neural network to approximate the statewise Lagrange multiplier, we manage to obtain the optimal feasible policy which ensures safety for each feasible state and the safest possible policy for infeasible states. Furthermore, the trained multiplier net can indicate whether a given state is feasible or not through the statewise complementary slackness condition. We provide theoretical guarantees that FAC outperforms previous expectation-based constrained RL methods in terms of both constraint satisfaction and reward optimization. Experimental results on both robot locomotive tasks and safe exploration tasks verify the safety enhancement and feasibility interpretation of the proposed method.
Published: 2021

14. Integrated Decision and Control: Towards Interpretable and Computationally Efficient Driving Intelligence

Author: Guan, Yang; Ren, Yangang; Sun, Qi; Li, Shengbo Eben; Ma, Haitong; Duan, Jingliang; Dai, Yifan; Cheng, Bo
Abstract: Decision and control are core functionalities of high-level automated vehicles. Current mainstream methods, such as functionality decomposition and end-to-end reinforcement learning (RL), either suffer high time complexity or poor interpretability and adaptability on real-world autonomous driving tasks. In this paper, we present an interpretable and computationally efficient framework called integrated decision and control (IDC) for automated vehicles, which decomposes the driving task into static path planning and dynamic optimal tracking that are structured hierarchically. First, the static path planning generates several candidate paths only considering static traffic elements. Then, the dynamic optimal tracking is designed to track the optimal path while considering the dynamic obstacles. To that end, we formulate a constrained optimal control problem (OCP) for each candidate path, optimize them separately and follow the one with the best tracking performance. To unload the heavy online computation, we propose a model-based reinforcement learning (RL) algorithm that can be served as an approximate constrained OCP solver. Specifically, the OCPs for all paths are considered together to construct a single complete RL problem and then solved offline in the form of value and policy networks, for real-time online path selecting and tracking respectively. We verify our framework in both simulations and the real world. Results show that compared with baseline methods IDC has an order of magnitude higher online computing efficiency, as well as better driving performance including traffic efficiency and safety. In addition, it yields great interpretability and adaptability among different driving tasks. The effectiveness of the proposed method is also demonstrated in real road tests with complicated traffic conditions.
Published: 2021

15. Model-based Constrained Reinforcement Learning using Generalized Control Barrier Function

Author: Ma, Haitong; Chen, Jianyu; Li, Shengbo Eben; Lin, Ziyu; Guan, Yang; Ren, Yangang; Zheng, Sifa
Abstract: Model information can be used to predict future trajectories, so it has huge potential to avoid dangerous region when implementing reinforcement learning (RL) on real-world tasks, like autonomous driving. However, existing studies mostly use model-free constrained RL, which causes inevitable constraint violations. This paper proposes a model-based feasibility enhancement technique of constrained RL, which enhances the feasibility of policy using generalized control barrier function (GCBF) defined on the distance to constraint boundary. By using the model information, the policy can be optimized safely without violating actual safety constraints, and the sample efficiency is increased. The major difficulty of infeasibility in solving the constrained policy gradient is handled by an adaptive coefficient mechanism. We evaluate the proposed method in both simulations and real vehicle experiments in a complex autonomous driving collision avoidance task. The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.
Published: 2021

16. Feasibility Enhancement of Constrained Receding Horizon Control Using Generalized Control Barrier Function

Author: Ma, Haitong; Zhang, Xiangteng; Li, Shengbo Eben; Lin, Ziyu; Lyu, Yao; Zheng, Sifa
Abstract: Receding horizon control (RHC) is a popular procedure to deal with optimal control problems. Due to the existence of state constraints, optimization-based RHC often suffers the notorious issue of infeasibility, which strongly shrinks the region of controllable state. This paper proposes a generalized control barrier function (CBF) to enlarge the feasible region of constrained RHC with only a one-step constraint on the prediction horizon. This design can reduce the constrained steps by penalizing the tendency to move towards the constraint boundary. Additionally, generalized CBF is able to handle high-order equality or inequality constraints through extending the constrained step to nonadjacent nodes. We apply this technique on an automated vehicle control task. The results show that compared to multi-step pointwise constraints, generalized CBF can effectively avoid the infeasibility issue in a larger partition of the state space, and the computing efficiency is also improved by 14%-23%.
Published: 2021

17. Continuous-time finite-horizon ADP for automated vehicle controller design with high efficiency

Author: Lin, Ziyu; Duan, Jingliang; Li, Shengbo Eben; Ma, Haitong; Yin, Yuming
Abstract: The design of an automated vehicle controller can be generally formulated into an optimal control problem. This paper proposes a continuous-time finite-horizon approximate dynamic programming (ADP) method, which can synthesize an offline near-optimal control policy with analytical vehicle dynamics. Building on the general Policy Iteration framework, it employs value and policy neural networks to approximate the mappings from the system states to value function and control inputs, respectively. The proposed method can converge to the near-optimal solution of the finite-horizon Hamilton-Jacobi-Bellman (HJB) equation. We further applied our algorithm to the simulation of automated vehicle control for the path tracking maneuver. The results suggest that the proposed ADP method can obtain the near-optimal policy with 1% error and less calculation time. What is more, the proposed ADP algorithm is also suitable for nonlinear control systems, where ADP is almost 500 times faster than the nonlinear MPC ipopt solver.
Comment: 7 pages, conference
Published: 2020
