385 results for "Evangelos A. Theodorou"
Search Results
252. Sampling-Based Nonlinear Stochastic Optimal Control for Neuromechanical Systems
- Author
-
Marcus A. Pereira, Evangelos A. Theodorou, Emily A. Reed, and Francisco J. Valero-Cuevas
- Subjects
Stochastic control, Computer science, Normal distribution, Kinematics, Variance, Linear-quadratic-Gaussian control, Biomechanical phenomena, Fingers, Tendons, Stochastic differential equations, Nonlinear systems, Control theory, Robots, Humans, Algorithms - Abstract
Determining how the nervous system controls tendon-driven bodies remains an open question. Stochastic optimal control (SOC) has been proposed as a plausible analogy in the neuroscience community. SOC relies on solving the Hamilton-Jacobi-Bellman equation, which seeks to minimize a desired cost function for a given task with noisy controls. We evaluate and compare three SOC methodologies to produce tapping by a simulated planar 3-joint human index finger: iterative Linear Quadratic Gaussian (iLQG) control, Model-Predictive Path Integral (MPPI) control, and Deep Forward-Backward Stochastic Differential Equations (FBSDE). We show that, averaged over 128 repeats, these methodologies can place the fingertip at the desired final joint angles, but, because of kinematic redundancy and the presence of noise, they each produce joint trajectories and final postures with different means and variances. iLQG, in particular, had the largest kinematic variance and departure from the desired final joint angles. We demonstrate that MPPI and FBSDE have superior performance for such nonlinear, tendon-driven systems with noisy controls. Clinical relevance: the mathematical framework provided by MPPI and FBSDE may be best suited for tendon-driven anthropomorphic robots, exoskeletons, and prostheses for amputees.
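As a concrete illustration of the sampling-based SOC idea compared above, the following is a minimal sketch of one MPPI-style update on a 1D point mass. The dynamics, the running cost x^2 + 0.1 u^2, and all parameter values are illustrative assumptions, not the paper's finger model:

```python
import math
import random

def rollout_cost(u_seq, x0, dt=0.05):
    """Accumulate the running cost x^2 + 0.1 u^2 along a 1D point-mass rollout."""
    x, v, cost = x0[0], x0[1], 0.0
    for u in u_seq:
        v += u * dt
        x += v * dt
        cost += (x * x + 0.1 * u * u) * dt
    return cost

def mppi_step(u_nom, x0, samples=1024, lam=0.3, sigma=0.7, seed=0):
    """One Model-Predictive Path Integral update: sample noisy control
    sequences, weight each by exp(-cost / lambda), and average the noise."""
    rng = random.Random(seed)
    T = len(u_nom)
    noises = [[rng.gauss(0.0, sigma) for _ in range(T)] for _ in range(samples)]
    costs = [rollout_cost([u_nom[t] + eps[t] for t in range(T)], x0)
             for eps in noises]
    beta = min(costs)                                  # shift for numerical stability
    w = [math.exp(-(c - beta) / lam) for c in costs]
    z = sum(w)
    return [u_nom[t] + sum(w[k] * noises[k][t] for k in range(samples)) / z
            for t in range(T)]

# starting at rest at x = 1, the weighted update should beat the zero-control plan
u_new = mppi_step([0.0] * 20, x0=(1.0, 0.0))
```

The exponentiated-cost weighting is the core of the path-integral update; sharper lambda concentrates the average on the lowest-cost rollouts.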
- Published
- 2020
253. Cooperative Path Integral Control for Stochastic Multi-Agent Systems
- Author
-
Evangelos A. Theodorou, Petros G. Voulgaris, Neng Wan, Naira Hovakimyan, and Aditya Gahlawat
- Subjects
FOS: Computer and information sciences, Mathematical optimization, Machine Learning (cs.LG), Systems and Control (eess.SY), Robotics (cs.RO), Multiagent Systems (cs.MA), Optimization and Control (math.OC), Stochastic control, Partial differential equations, Multi-agent systems, Approximation algorithms, Optimal control, Decentralized systems, Joint cost - Abstract
A distributed stochastic optimal control solution is presented for cooperative multi-agent systems. The network of agents is partitioned into multiple factorial subsystems, each of which consists of a central agent and neighboring agents. Local control actions that rely only on agents' local observations are designed to optimize the joint cost functions of subsystems. When solving for the local control actions, the joint optimality equation for each subsystem is cast as a linear partial differential equation and solved using the Feynman-Kac formula. The solution and the optimal control action are then formulated as path integrals and approximated by a Monte-Carlo method. Numerical verification is provided through a simulation example consisting of a team of cooperative UAVs. (To appear in the American Control Conference 2021, New Orleans, LA, USA.)
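The Feynman-Kac step above, solving a linear PDE by averaging a terminal payoff over diffusion sample paths, can be illustrated on the scalar heat equation, where the answer is known in closed form. The terminal condition and all constants below are illustrative assumptions:

```python
import random

def feynman_kac(x0, T=1.0, dt=0.01, n_paths=4000, seed=0):
    """Monte-Carlo solution of the backward heat equation u_t + 0.5 u_xx = 0
    with terminal condition psi(x) = x^2. Feynman-Kac gives
    u(0, x0) = E[psi(x0 + W_T)] = x0^2 + T, so the estimate should be near 2
    for x0 = 1 and T = 1."""
    rng = random.Random(seed)
    steps = int(T / dt)
    acc = 0.0
    for _ in range(n_paths):
        x = x0
        for _ in range(steps):
            x += rng.gauss(0.0, dt ** 0.5)  # Brownian increment
        acc += x * x                         # terminal payoff psi(x) = x^2
    return acc / n_paths

u0 = feynman_kac(1.0)
```

In the paper the same representation is evaluated along sampled trajectories of the subsystem dynamics rather than plain Brownian motion, but the averaging principle is identical.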
- Published
- 2020
254. Deep Learning Tubes for Tube MPC
- Author
-
Evangelos A. Theodorou, Ali Agha, and David D. Fan
- Subjects
FOS: Computer and information sciences, Machine Learning (cs.LG), Mathematical optimization, Computer science, Deep learning, Probabilistic logic, Systems and Control (eess.SY), Trajectory optimization, Model predictive control, Optimization and Control (math.OC), Robotics (cs.RO), Reinforcement learning, Artificial intelligence, Uncertainty quantification - Abstract
Learning-based control aims to construct models of a system to use for planning or trajectory optimization, e.g. in model-based reinforcement learning. In order to obtain guarantees of safety in this context, uncertainty must be accurately quantified. This uncertainty may come from errors in learning (due to a lack of data, for example), or may be inherent to the system. Propagating uncertainty forward in learned dynamics models is a difficult problem. In this work we use deep learning to obtain expressive and flexible models of how distributions of trajectories behave, which we then use for nonlinear Model Predictive Control (MPC). We introduce a deep quantile regression framework for control that enforces probabilistic quantile bounds and quantifies epistemic uncertainty. Using our method we explore three different approaches for learning tubes that contain the possible trajectories of the system, and demonstrate how to use each of them in a Tube MPC scheme. We prove these schemes are recursively feasible and satisfy constraints with a desired margin of probability. We present experiments in simulation on a nonlinear quadrotor system, demonstrating the practical efficacy of these ideas. (RSS 2020 camera-ready version.)
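The quantile regression component trains tube bounds with the pinball loss, whose expectation is minimized at the target quantile. A minimal sketch, with made-up numbers standing in for sampled trajectory deviations:

```python
def pinball_loss(y, q, tau):
    """Quantile (pinball) loss: its average over data is minimized when q
    is the empirical tau-quantile of the y values."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1.0) * diff

# deviations of sampled trajectories from a nominal plan (illustrative numbers,
# including one large outlier)
deviations = [1.0, 2.0, 3.0, 4.0, 100.0]

def avg_loss(q, tau=0.8):
    return sum(pinball_loss(y, q, tau) for y in deviations) / len(deviations)
```

A tube bound fit at tau = 0.8 settles near the empirical 0.8-quantile (here 4.0) rather than being dragged out by the outlier the way a squared-error fit would be; this is what makes quantile bounds natural for tubes.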
- Published
- 2020
255. Nonlinear Covariance Control via Differential Dynamic Programming
- Author
-
Yongxin Chen, Evangelos A. Theodorou, Zeji Yi, and Zhefeng Cao
- Subjects
Mathematical optimization, Computer science, Linear systems, Systems and Control (eess.SY), Covariance, Optimal control, Nonlinear systems, Differential dynamic programming, State - Abstract
We consider covariance control problems for nonlinear stochastic systems. Our objective is to find an optimal control strategy to steer the state from an initial distribution to a terminal one with specified mean and covariance. This problem is considerably more complicated than previous studies on covariance control for linear systems. To achieve our goal, we leverage differential dynamic programming, a technique widely used in nonlinear optimal control. In particular, we adopt the stochastic differential dynamic programming framework to handle the stochastic dynamics. Additionally, to enforce the terminal statistical constraints, we construct a Lagrangian and apply a primal-dual type algorithm. Several examples are presented to demonstrate the effectiveness of our framework.
- Published
- 2020
256. Constrained Differential Dynamic Programming Revisited
- Author
-
Evangelos A. Theodorou, Akash Patel, Yuichiro Aoyama, and George I. Boutselis
- Subjects
Mathematical optimization, 65K10, G.1.6, Augmented Lagrangian method, Computer science, Trajectory optimization, Optimal control, Slack variables, Dynamic programming, Optimization and Control (math.OC), Bellman equation, Penalty methods, Differential dynamic programming - Abstract
Differential Dynamic Programming (DDP) has become a well-established method for unconstrained trajectory optimization. Despite its many applications in robotics and controls, however, a widely successful constrained version of the algorithm has yet to be developed. This paper builds upon penalty methods and active-set approaches to design a Dynamic Programming-based methodology for constrained optimal control. Regarding the former, our derivation employs a constrained version of Bellman's principle of optimality, introducing a set of auxiliary slack variables in the backward pass. In parallel, we show how Augmented Lagrangian methods can be naturally incorporated within DDP by utilizing a particular set of penalty-Lagrangian functions that preserve second-order differentiability. We demonstrate experimentally that our extensions (individually and in combination) significantly enhance the convergence properties of the algorithm and outperform previous approaches on a large number of simulated scenarios.
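The Augmented Lagrangian ingredient can be seen in isolation on a scalar toy problem (minimize x^2 subject to x = 1), where the inner minimization is analytic. This is a hedged sketch of the classical method of multipliers, not the paper's DDP machinery:

```python
def augmented_lagrangian(mu=10.0, iters=20):
    """Method of multipliers for  min x^2  s.t.  x - 1 = 0.
    The augmented Lagrangian is x^2 + lam*(x - 1) + (mu/2)*(x - 1)^2;
    its minimizer in x is analytic, and lam is updated to first order."""
    lam = 0.0
    for _ in range(iters):
        x = (mu - lam) / (2.0 + mu)   # argmin_x of the augmented Lagrangian
        lam += mu * (x - 1.0)         # multiplier update on the residual
    return x, lam

x_opt, lam_opt = augmented_lagrangian()
```

The iteration converges to the KKT pair x = 1, lam = -2 (from 2x + lam = 0 at x = 1) without driving the penalty weight mu to infinity, which is the property that keeps the second-order information well conditioned inside DDP.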
- Published
- 2020
257. Optimally Adhering to Behavioral Priors in Unknown Environments with RRTs
- Author
-
Evangelos A. Theodorou, Dimitri N. Mavris, Ethan N. Evans, Matthew J. Bays, and Patrick Meyer
- Subjects
Computer science, Prior probability, Artificial intelligence, Machine learning - Published
- 2020
258. Approximate Inverse Reinforcement Learning from Vision-based Imitation Learning
- Author
-
Bogdan Vlahov, Evangelos A. Theodorou, Keuntaek Lee, Jason Gibson, and James M. Rehg
- Subjects
FOS: Computer and information sciences, Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Visualization, Model predictive control, Robotics (cs.RO), Robustness, Reinforcement learning, Artificial intelligence - Abstract
In this work, we present a method for obtaining an implicit objective function for vision-based navigation. The proposed methodology relies on Imitation Learning, Model Predictive Control (MPC), and an interpretation technique used in Deep Neural Networks. We use Imitation Learning as a means to do Inverse Reinforcement Learning in order to create an approximate cost function generator for a visual navigation challenge. The resulting cost function, the costmap, is used in conjunction with MPC for real-time control and outperforms other state-of-the-art costmap generators in novel environments. The proposed process allows for simple training and robustness to out-of-sample data. We apply our method to the task of vision-based autonomous driving in multiple real and simulated environments and show its generalizability. Supplementary video: https://youtu.be/WyJfT5lc0aQ
- Published
- 2020
259. Semi-parametric Approaches to Learning in Model-Based Hierarchical Control of Complex Systems
- Author
-
Areeb Mehmood, Victor Aladele, Byron Boots, Evangelos A. Theodorou, Shimin Zhang, Muhammad Ali Murtaza, Seth Hutchinson, Mouhyemen Khan, and Munzir Zafar
- Subjects
Theoretical computer science, Stability, Probabilistic logic, Complex systems, Robotics, Reinforcement learning, State space, Differential dynamic programming, Artificial intelligence - Abstract
For systems with complex and unstable dynamics, such as humanoids, model-based control within a hierarchical framework remains the tool of choice. This is due to the challenges of applying model-free reinforcement learning to such problems, such as sample inefficiency and limited exploration of the state space in the absence of safety/stability guarantees. However, relying purely on physics-based models comes with its own set of problems: the limits on expressiveness imposed by committing to fixed basis functions and, consequently, a limited ability to learn from data gathered online. This gap between theoretical models and real-world dynamics gives rise to a need to incorporate a learning component at some level within the model-based control framework. In this work, we present a highly redundant wheeled inverted-pendulum humanoid as a testbed for experimental validation of recent approaches proposed to deal with these fundamental issues in robotics: 1. semi-parametric Gaussian Process-based approaches to computed-torque control of serial robots [1]; 2. the Probabilistic Differential Dynamic Programming framework for trajectory planning by high-level controllers [2, 3]; and 3. Barrier Certificate-based safe-learning approaches for data collection to learn the dynamics of inherently unstable systems [4]. We discuss how a typical model-based hierarchical control framework can be extended to incorporate learning at various stages of the control design and hierarchy, based on the aforementioned tools.
- Published
- 2020
260. Ensemble Bayesian Decision Making with Redundant Deep Perceptual Control Policies
- Author
-
Harleen K. Brar, Keuntaek Lee, Ziyi Wang, Evangelos A. Theodorou, and Bogdan Vlahov
- Subjects
FOS: Computer and information sciences, Artificial neural networks, Computer science, Bayesian probability, Control, Machine learning, Robotics (cs.RO), Artificial intelligence, Noise - Abstract
This work presents a novel ensemble of Bayesian Neural Networks (BNNs) for the control of safety-critical systems. Decision making for safety-critical systems is challenging due to performance requirements that carry significant consequences in the event of failure. In practice, failure of such systems can be avoided by introducing redundancies of control. Neural Networks (NNs) are generally not used for safety-critical systems, as they can behave in unexpected ways in response to novel inputs, and there may not be any indication as to when they will fail. BNNs have been recognized for their ability to produce not only viable outputs but also a measure of uncertainty in these outputs. This work combines the prediction uncertainty obtained from BNNs with ensemble control to form a redundant control methodology. Our technique is applied to an agile autonomous driving task. Multiple BNNs are trained to control a vehicle in an end-to-end fashion from different sensor inputs provided by the system. We show that an individual network is successful in maneuvering around the track but crashes in the presence of unforeseen input noise. Our proposed ensemble of BNNs shows successful task performance even in the event of multiple sensor failures.
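One simple way to realize the redundant-control idea, picking the sensor head whose Bayesian predictive samples show the least uncertainty, can be sketched as follows; the sensor names and numbers are invented for illustration:

```python
import statistics

def ensemble_select(predictions):
    """predictions: dict mapping sensor name -> list of Monte-Carlo action
    samples (e.g. from repeated stochastic forward passes of a BNN).
    Returns the name and mean action of the head with the lowest
    predictive variance, i.e. the most confident ensemble member."""
    best = min(predictions, key=lambda s: statistics.variance(predictions[s]))
    return best, statistics.mean(predictions[best])

preds = {
    "camera": [0.10, 0.12, 0.11, 0.09],   # tight samples: confident head
    "lidar":  [0.50, -0.40, 0.90, -0.80], # degraded sensor: high variance
}
sensor, action = ensemble_select(preds)
```

If the lidar head's samples scatter wildly (as after a sensor failure), the switch to the camera head happens automatically, with no explicit failure detector.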
- Published
- 2019
261. Stochastic Differential Games: A Sampling Approach via FBSDEs
- Author
-
Evangelos A. Theodorou, Panagiotis Tsiotras, and Ioannis Exarchos
- Subjects
Statistics and Probability, Economics and Econometrics, Computer science, Applied Mathematics, Numerical analysis, Probabilistic methods, Sampling, Computer Graphics and Computer-Aided Design, Computer Science Applications, Computational Mathematics, Stochastic differential equations, Nonlinear systems, Computational Theory and Mathematics, Applied mathematics - Abstract
The aim of this work is to present a sampling-based algorithm designed to solve various classes of stochastic differential games. The foundation of the proposed approach lies in the formulation of the game solution in terms of a decoupled pair of forward and backward stochastic differential equations (FBSDEs). In light of the nonlinear version of the Feynman–Kac lemma, probabilistic representations of the solutions to the nonlinear Hamilton–Jacobi–Isaacs equations that arise for each class are obtained. These representations are in the form of decoupled systems of FBSDEs, which may be solved numerically.
- Published
- 2018
262. Stochastic optimal control via forward and backward stochastic differential equations and importance sampling
- Author
-
Evangelos A. Theodorou and Ioannis Exarchos
- Subjects
Stochastic control, Mathematical optimization, Girsanov theorem, Iterative methods, Optimal control, Stochastic partial differential equations, Nonlinear systems, Stochastic differential equations, Control and Systems Engineering, Electrical and Electronic Engineering, Importance sampling, Mathematics - Abstract
The aim of this work is to present a novel sampling-based numerical scheme designed to solve a certain class of stochastic optimal control problems, utilizing forward and backward stochastic differential equations (FBSDEs). By means of a nonlinear version of the Feynman–Kac lemma, we obtain a probabilistic representation of the solution to the nonlinear Hamilton–Jacobi–Bellman equation, expressed in the form of a system of decoupled FBSDEs. This system of FBSDEs can be solved by employing linear regression techniques. The proposed framework relaxes some of the restrictive conditions present in recent sampling-based methods within the Linearly Solvable Optimal Control framework, and furthermore addresses problems in which the time horizon is not prespecified. To enhance the efficiency of the proposed scheme when treating more complex nonlinear systems, we then derive an iterative algorithm based on Girsanov’s theorem on the change of measure, which features importance sampling. This scheme is shown to be capable of learning the optimal control without requiring an initial guess.
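The change-of-measure idea behind the Girsanov-based importance sampling can be illustrated in its simplest static form: estimating a Gaussian tail probability by sampling from a shifted proposal and reweighting with the likelihood ratio. The constants are illustrative:

```python
import math
import random

def importance_sample_tail(c=3.0, n=20000, seed=1):
    """Estimate P(Z > c) for Z ~ N(0,1) by drawing from the shifted proposal
    N(c,1) and reweighting each hit with the likelihood ratio
    phi(z) / phi(z - c) = exp(-c*z + c^2/2).
    A naive estimator would almost never see a sample past c = 3; the shifted
    measure places roughly half its samples there."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(c, 1.0)
        if z > c:
            total += math.exp(-c * z + 0.5 * c * c)  # Radon-Nikodym weight
    return total / n

est = importance_sample_tail()  # true value is about 1.35e-3
```

In the paper the same reweighting is applied along SDE sample paths (Girsanov's theorem supplies the path-space likelihood ratio), concentrating samples where the current control estimate says the optimal trajectories lie.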
- Published
- 2018
263. Autonomous Suspended Load Operations via Trajectory Optimization and Variational Integrators
- Author
-
Eric N. Johnson, Gerardo De La Torre, and Evangelos A. Theodorou
- Subjects
Mathematical optimization, Discretization, Applied Mathematics, Aerospace Engineering, Linear-quadratic regulator, Trajectory optimization, Optimal control, Dynamic programming, Discrete time, Space and Planetary Science, Control and Systems Engineering, Control theory, Differential dynamic programming, Electrical and Electronic Engineering, Variational integrators, Mathematics - Abstract
This paper presents a real-time implementable trajectory optimization framework for autonomous suspended load operations in outdoor environments. The framework solves the posed optimal control problem with the iteration-based differential dynamic programming algorithm. The algorithm uses a variational integrator to propagate the modeled system’s state configuration and linearize the resulting discrete dynamics. The variational integrator is an excellent candidate for real-time implementation because it remains accurate despite relatively large discretization time steps. Therefore, the computational effort of the differential dynamic programming algorithm can be mitigated through the reduction of discrete time points. The state of the slung load is estimated via an augmentation to the existing navigation system that only uses vision-based measurements of the load. Simulation studies and a flight test are presented to demonstrate the effectiveness of the proposed framework.
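The key property claimed for the variational integrator, remaining accurate (in particular, keeping energy error bounded) despite relatively large time steps, can be demonstrated with the simplest such integrator: symplectic Euler, which arises from a rectangle-rule discrete Lagrangian, applied to a pendulum. The model and step size are illustrative assumptions, not the paper's slung-load dynamics:

```python
import math

def pendulum_symplectic(theta0, omega0, dt, steps, g_over_l=1.0):
    """Symplectic (variational) Euler for theta'' = -(g/l) sin(theta):
    update the velocity with the current angle, then the angle with the
    new velocity. This preserves a modified energy, so the true energy
    oscillates but does not drift."""
    th, om = theta0, omega0
    for _ in range(steps):
        om -= g_over_l * math.sin(th) * dt
        th += om * dt
    return th, om

def energy(th, om, g_over_l=1.0):
    return 0.5 * om * om - g_over_l * math.cos(th)

e0 = energy(0.5, 0.0)
th, om = pendulum_symplectic(0.5, 0.0, dt=0.1, steps=10000)
drift = abs(energy(th, om) - e0)  # stays small even after 10,000 coarse steps
```

An explicit (non-variational) Euler step at the same dt would show secular energy growth over this horizon; the bounded drift is what lets the DDP loop use coarse time grids.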
- Published
- 2017
264. Model Predictive Path Integral Control: From Theory to Parallel Computation
- Author
-
Andrew Aldrich, Grady Williams, and Evangelos A. Theodorou
- Subjects
Stochastic control, Mathematical optimization, Iterative methods, Computer science, Applied Mathematics, Graphics processing units, Aerospace Engineering, Probability density functions, Nonlinear systems, Space and Planetary Science, Control and Systems Engineering, Path integral formulation, Electrical and Electronic Engineering, Importance sampling - Abstract
In this paper, a model predictive path integral control algorithm based on a generalized importance sampling scheme is developed and parallel optimization via sampling is performed using a graphics...
- Published
- 2017
265. Practical System Identification for Small VTOL Unmanned Aerial Vehicle
- Author
-
Manan Gandhi, Evangelos A. Theodorou, Eric N. Johnson, and Lee Whitcher
- Subjects
Computer science, System identification, Control engineering - Published
- 2019
266. Deep Forward-Backward SDEs for Min-max Control
- Author
-
Keuntaek Lee, Ioannis Exarchos, Ziyi Wang, Evangelos A. Theodorou, and Marcus A. Pereira
- Subjects
Stochastic control, FOS: Computer and information sciences, Machine Learning (cs.LG), Machine Learning (stat.ML), Partial differential equations, Differential equations, Optimal control, Stochastic differential equations, Nonlinear systems, Optimization and Control (math.OC), Applied mathematics, Mathematics - Abstract
This paper presents a novel approach to numerically solve stochastic differential games for nonlinear systems. The proposed approach relies on the nonlinear Feynman-Kac theorem that establishes a connection between parabolic deterministic partial differential equations and forward-backward stochastic differential equations. Using this theorem the Hamilton-Jacobi-Isaacs partial differential equation associated with differential games is represented by a system of forward-backward stochastic differential equations. Numerical solution of the aforementioned system of stochastic differential equations is performed using importance sampling and a Long Short-Term Memory recurrent neural network, which is trained in an offline fashion. The resulting algorithm is tested on two example systems in simulation and compared against the standard risk-neutral stochastic optimal control formulations.
- Published
- 2019
267. Bayesian Learning-Based Adaptive Control for Safety Critical Systems
- Author
-
Ali-akbar Agha-mohammadi, Rohan Thakker, Evangelos A. Theodorou, Jennifer Nguyen, David D. Fan, and Nikhilesh Alatur
- Subjects
Lyapunov functions, FOS: Computer and information sciences, Machine Learning (cs.LG), Adaptive control, Computer science, Stability, Systems and Control (eess.SY), Bayesian inference, Robotics (cs.RO), Gaussian processes, Stochastic processes, Deep learning, Safety-critical systems, Artificial intelligence - Abstract
Deep learning has enjoyed much recent success, and applying state-of-the-art model learning methods to controls is an exciting prospect. However, there is a strong reluctance to use these methods on safety-critical systems, which have constraints on safety, stability, and real-time performance. We propose a framework which satisfies these constraints while allowing the use of deep neural networks for learning model uncertainties. Central to our method is the use of Bayesian model learning, which provides an avenue for maintaining appropriate degrees of caution in the face of the unknown. In the proposed approach, we develop an adaptive control framework leveraging the theory of stochastic CLFs (Control Lyapunov Functions) and stochastic CBFs (Control Barrier Functions) along with tractable Bayesian model learning via Gaussian Processes or Bayesian neural networks. Under reasonable assumptions, we guarantee stability and safety while adapting to unknown dynamics with probability 1. We demonstrate this architecture for high-speed terrestrial mobility targeting potential applications in safety-critical high-speed Mars rover missions. (Comment: corrected an error in Section II, where previously the problem was introduced in a non-stochastic setting and the solution to an ODE with Gaussian-distributed parametric uncertainty was wrongly assumed to be equivalent to an SDE with a learned diffusion term; see Lew, T., et al., "On the Problem of Reformulating Systems with Uncertain Dynamics as a Stochastic Differential Equation".)
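The stochastic CBF machinery is well beyond a snippet, but the deterministic core, a min-norm safety filter, has a closed form in one dimension. A hedged sketch under that drastic simplification (scalar system, known dynamics, no learned uncertainty):

```python
def cbf_filter(u_des, x, alpha=1.0):
    """Control barrier function filter for the scalar system x' = u with
    safe set {h(x) = x >= 0}. The QP
        min (u - u_des)^2   s.t.   u >= -alpha * h(x)
    has the closed-form solution below: follow u_des unless it would let
    h decay faster than the barrier condition allows."""
    return max(u_des, -alpha * x)

# drive toward x = -1 with u_des = -1; the filter keeps the state nonnegative
x, dt = 0.2, 0.01
for _ in range(2000):
    x += cbf_filter(-1.0, x) * dt
```

The barrier constraint only activates near the boundary, so the desired control passes through unchanged in the interior of the safe set; the paper's contribution is making this guarantee hold with probability 1 under learned, stochastic dynamics.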
- Published
- 2019
268. Learning Deep Stochastic Optimal Control Policies using Forward-Backward SDEs
- Author
-
Evangelos A. Theodorou, Ziyi Wang, Marcus A. Pereira, and Ioannis Exarchos
- Subjects
Stochastic control, FOS: Computer and information sciences, Mathematical optimization, Partial differential equations, Artificial neural networks, Computer science, Robotics (cs.RO), Stochastic differential equations, Nonlinear systems, Artificial intelligence - Abstract
In this paper we propose a new methodology for decision-making under uncertainty using recent advancements in the areas of nonlinear stochastic optimal control theory, applied mathematics, and machine learning. Grounded on the fundamental relation between certain nonlinear partial differential equations and forward-backward stochastic differential equations, we develop a control framework that is scalable and applicable to general classes of stochastic systems and decision-making problem formulations in robotics and autonomy. The proposed deep neural network architectures for stochastic control consist of recurrent and fully connected layers. The performance and scalability of the aforementioned algorithm are investigated in three non-linear systems in simulation with and without control constraints. We conclude with a discussion on future directions and their implications to robotics.
- Published
- 2019
269. Constrained Sampling-based Trajectory Optimization using Stochastic Approximation
- Author
-
George I. Boutselis, Ziyi Wang, and Evangelos A. Theodorou
- Subjects
Quadcopter, Mathematical optimization, Dynamical systems theory, Computer science, Sampling, Trajectory optimization, Stochastic approximation, Nonlinear systems, Optimization and Control (math.OC) - Abstract
We propose a sampling-based trajectory optimization methodology for constrained problems. We extend recent works on stochastic search to deal with box control constraints, as well as nonlinear state constraints for discrete dynamical systems. Regarding the former, our strategy is to optimize over truncated parameterized distributions on control inputs. Furthermore, we show how non-smooth penalty functions can be incorporated into our framework to handle state constraints. Simulations on cartpole and quadcopter show that our approach outperforms previous methods on constrained sampling-based optimization, in terms of quality of solutions and convergence speed.
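The idea of optimizing over truncated parameterized distributions can be sketched with a cross-entropy-style stochastic search on a scalar box-constrained problem; the cost function and all constants are illustrative, and the paper's method additionally handles state constraints via penalties:

```python
import random

def truncated_gauss(rng, mu, sigma, lo, hi):
    """Rejection-sample a Gaussian truncated to [lo, hi], so every
    candidate control automatically satisfies the box constraint."""
    while True:
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x

def stochastic_search(cost, lo, hi, iters=30, pop=64, elite=8, seed=0):
    """CEM-style stochastic search: sample from a truncated Gaussian,
    refit (mu, sigma) to the lowest-cost elites, repeat."""
    rng = random.Random(seed)
    mu, sigma = 0.5 * (lo + hi), 0.5 * (hi - lo)
    for _ in range(iters):
        xs = sorted((truncated_gauss(rng, mu, sigma, lo, hi)
                     for _ in range(pop)), key=cost)
        top = xs[:elite]
        mu = sum(top) / elite
        sigma = max(1e-6, (sum((x - mu) ** 2 for x in top) / elite) ** 0.5)
    return mu

# unconstrained optimum is x = 2, so the box-constrained answer is the bound x = 1
u_star = stochastic_search(lambda x: (x - 2.0) ** 2, lo=-1.0, hi=1.0)
```

Because sampling itself respects the box, no post-hoc clipping is needed, and the distribution can concentrate on the active constraint boundary, which is exactly the situation where naive Gaussian sampling wastes most of its candidates.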
- Published
- 2019
270. Path Integral Control on Lie Groups
- Author
-
Evangelos A. Theodorou and George I. Boutselis
- Subjects
Stochastic control, Computer science, Stochastic calculus, Lie groups, Nonlinear systems, Control theory, Lie algebras, Euclidean geometry, Path integral formulation, Applied mathematics - Abstract
Path Integral control theory yields a sampling-based methodology for solving stochastic optimal control problems. Motivated by its computational efficiency, we extend this framework to account for systems evolving on Lie groups. Our derivation relies on recursive mappings between system poses and corresponding Lie algebra elements. This allows us to apply standard facts from stochastic calculus, and obtain expressions analogous to those of Euclidean problems. The results imply that stochastic optimal control can be applied in a parameterization-free manner, even when nonlinear configuration spaces are considered. The method is tested in simulation on a simple model of a rigid satellite.
- Published
- 2018
271. Linearly Solvable Stochastic Optimal Control for Infinite-Dimensional Systems
- Author
-
George I. Boutselis, Evangelos A. Theodorou, and Kaivalya Bakshi
- Subjects
Stochastic control, Dynamical systems theory, Generalization, Observables, Stochastic differential equations, Applied mathematics, Representation (mathematics) - Abstract
In this paper we investigate whether the linearly solvable stochastic optimal control framework generalizes to the case of stochastic differential equations in infinite-dimensional spaces. In particular, we show that the connection between the relative entropy-free energy relation and dynamic programming principles carries over to infinite-dimensional spaces. Our analysis is based on a generalization of the Feynman-Kac lemma for certain classes of infinite-dimensional diffusions and Hilbert space-valued Q-Wiener processes. We observe that the utilized information-theoretic representation allows the formulation of variational problems for parameterized policy optimization of infinite-dimensional systems. This work creates new research avenues towards the development of parallelizable stochastic control and inference algorithms for infinite-dimensional dynamical systems in physics, fluid mechanics, partially observable stochastic control, and open quantum systems.
- Published
- 2018
272. Hierarchical Optimization for Whole-Body Control of Wheeled Inverted Pendulum Humanoids
- Author
-
Seth Hutchinson, Evangelos A. Theodorou, and Munzir Zafar
- Subjects
FOS: Computer and information sciences ,0209 industrial biotechnology ,Computer science ,020208 electrical & electronic engineering ,Control (management) ,Constraint (computer-aided design) ,02 engineering and technology ,Systems and Control (eess.SY) ,Degrees of freedom (mechanics) ,Inverted pendulum ,Computer Science - Robotics ,020901 industrial engineering & automation ,Control theory ,0202 electrical engineering, electronic engineering, information engineering ,FOS: Electrical engineering, electronic engineering, information engineering ,Torque ,Robot ,Computer Science - Systems and Control ,Center of mass ,Robotics (cs.RO) - Abstract
In this paper, we present a whole-body control framework for Wheeled Inverted Pendulum (WIP) Humanoids. WIP humanoids are redundant manipulators that dynamically balance themselves on wheels. With several degrees of freedom, they can perform multiple tasks simultaneously, such as balancing, maintaining a body pose, controlling the gaze, lifting a load, or maintaining an end-effector configuration in operational space. The whole-body control problem is to perform these tasks simultaneously, with optimal participation of all degrees of freedom at specified priorities for each objective, while obeying joint angle and torque limits. The proposed approach is hierarchical, with a low-level controller for body joint manipulation and a high-level controller that defines center of mass (CoM) targets for the low-level controller so as to control the zero dynamics of the system driving the wheels. The low-level controller plans over shorter horizons with a more complete dynamic model of the system, while the high-level controller plans over a longer horizon using an approximate model of the robot for computational efficiency.
- Published
- 2018
273. Locally Adaptive Online Trajectory Optimization in Unknown Environments With RRTs
- Author
-
Patrick Meyer, Ethan N. Evans, Dimitri N. Mavris, Samuel Seifert, and Evangelos A. Theodorou
- Subjects
Mathematical optimization ,Optimization algorithm ,Computer science ,Trajectory optimization - Abstract
Rapidly Exploring Random Trees (RRTs) have gained significant attention due to provable properties such as completeness and asymptotic optimality. However, offline methods are only useful when the entire problem landscape is known a priori. Furthermore, many real-world applications have problem scopes orders of magnitude larger than typical mazes and bug traps, and therefore require large numbers of samples to match typical sample densities, resulting in high computational effort to obtain reasonably low-cost trajectories. In this paper we propose an online trajectory optimization algorithm for uncertain, large environments using RRTs, which we call the Locally Adaptive Rapidly Exploring Random Tree (LARRT). This is achieved through two main contributions. First, we use an adaptive local sampling region and adaptive sampling scheme that depend on the state of the dynamic system and observations of obstacles. Second, we localize planning and re-planning by fixing the root node to the current vehicle state and adding tree update functions. LARRT is designed to leverage local problem scope to reduce computational complexity and obtain a lower total-cost solution compared to a classical RRT with a similar number of nodes. Using this technique, popular RRT variants can remain online even for prohibitively large planning problems, by transforming a large trajectory optimization problem into one that resembles receding-horizon optimization. Finally, we demonstrate our approach in simulation and discuss the algorithmic trade-offs of the proposed approach.
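As a rough illustration of the local sampling idea in this abstract (a minimal sketch assuming a 2-D planar state; the paper's region additionally adapts to the system state and observed obstacles, and this is not the authors' implementation):

```python
import numpy as np

def sample_local(x_current, radius, goal, goal_bias=0.1, rng=None):
    """Draw one planner sample from a disk of the given radius around the
    current vehicle state, occasionally sampling the goal directly.
    Hypothetical sketch of local adaptive sampling for a 2-D state."""
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < goal_bias:
        return np.asarray(goal, dtype=float)
    angle = rng.uniform(0.0, 2.0 * np.pi)
    r = radius * np.sqrt(rng.uniform())  # sqrt gives uniform density over the disk
    offset = r * np.array([np.cos(angle), np.sin(angle)])
    return np.asarray(x_current, dtype=float) + offset
```

Restricting samples to such a moving local region is what keeps the per-step computation bounded as the environment grows.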
- Published
- 2018
274. Information Theoretic Model Predictive Control on Jump Diffusion Processes
- Author
-
Grady Williams, Evangelos A. Theodorou, and Ziyi Wang
- Subjects
Stochastic control ,0303 health sciences ,0209 industrial biotechnology ,Mathematical optimization ,Computer science ,Computation ,Jump diffusion ,Shot noise ,Sampling (statistics) ,02 engineering and technology ,03 medical and health sciences ,Model predictive control ,020901 industrial engineering & automation ,Optimization and Control (math.OC) ,Path integral formulation ,FOS: Mathematics ,Stochastic optimization ,Mathematics - Optimization and Control ,030304 developmental biology - Abstract
In this paper we present an information theoretic approach to stochastic optimal control problems for systems with compound Poisson noise. We generalize previous work on information theoretic path integral control to discontinuous dynamics with compound Poisson noise. We also derive a control update law of the same form using a stochastic optimization approach. We develop a sampling-based iterative model predictive control (MPC) algorithm. The proposed algorithm is parallelizable and when implemented on a Graphical Processing Unit (GPU) can run in real time. We test the performance of the proposed algorithm in simulation for two control tasks using a cartpole and a quadrotor system. Our simulations demonstrate improved performance of the new scheme and indicate the importance of incorporating the statistical characteristics of stochastic disturbances in the computation of the stochastic optimal control policies.
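The compound Poisson noise class the abstract refers to can be sketched with a simple Euler-type simulation; the scalar dynamics and Gaussian jump marks below are illustrative assumptions, not the paper's benchmark systems:

```python
import numpy as np

def simulate_jump_diffusion(x0, drift, sigma, jump_rate, jump_scale, dt, steps, rng=None):
    """Euler-type simulation of a scalar SDE driven by Brownian motion
    plus compound Poisson jumps (shot noise)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = float(x0)
    path = [x]
    for _ in range(steps):
        dw = rng.normal(0.0, np.sqrt(dt))            # Brownian increment
        n_jumps = rng.poisson(jump_rate * dt)        # number of jumps in [t, t+dt)
        jump = rng.normal(0.0, jump_scale, n_jumps).sum()  # compound Poisson increment
        x = x + drift(x) * dt + sigma * dw + jump
        path.append(x)
    return np.array(path)
```

Rollouts of this kind are what a sampling-based MPC scheme would average over when the disturbance statistics include jumps.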
- Published
- 2018
275. A mean-field game model for homogeneous flocking
- Author
-
Evangelos A. Theodorou, Kaivalya Bakshi, and Piyush Grover
- Subjects
0209 industrial biotechnology ,Mathematical optimization ,Collective behavior ,Population ,FOS: Physical sciences ,General Physics and Astronomy ,Inverse ,Systems and Control (eess.SY) ,Dynamical Systems (math.DS) ,02 engineering and technology ,System of linear equations ,37N35, 34C23, 37L15, 91A13, 91A10, 93E20 ,01 natural sciences ,symbols.namesake ,020901 industrial engineering & automation ,FOS: Electrical engineering, electronic engineering, information engineering ,FOS: Mathematics ,Mathematics - Dynamical Systems ,0101 mathematics ,education ,Mathematics - Optimization and Control ,Mathematical Physics ,education.field_of_study ,Flocking (behavior) ,Applied Mathematics ,010102 general mathematics ,Statistical and Nonlinear Physics ,Optimal control ,Nonlinear Sciences - Adaptation and Self-Organizing Systems ,Optimization and Control (math.OC) ,Nash equilibrium ,Phase space ,symbols ,Computer Science - Systems and Control ,Adaptation and Self-Organizing Systems (nlin.AO) - Abstract
Empirically derived continuum models of collective behavior among large populations of dynamic agents are a subject of intense study in several fields, including biology, engineering and finance. We formulate and study a mean-field game model whose behavior mimics an empirically derived non-local homogeneous flocking model for agents with gradient self-propulsion dynamics. The mean-field game framework provides a non-cooperative optimal control description of the behavior of a population of agents in a distributed setting. In this description, each agent's state is driven by optimally controlled dynamics that result in a Nash equilibrium between itself and the population. The optimal control is computed by minimizing a cost that depends only on its own state, and a mean-field term. The agent distribution in phase space evolves under the optimal feedback control policy. We exploit the low-rank perturbative nature of the non-local term in the forward-backward system of equations governing the state and control distributions, and provide a linear stability analysis demonstrating that our model exhibits bifurcations similar to those found in the empirical model. The present work is a step towards developing a set of tools for systematic analysis, and eventually design, of collective behavior of non-cooperative dynamic agents via an inverse modeling approach., Comment: Revised to incorporate reviewers' suggestions. Accepted to Chaos journal
- Published
- 2018
276. Seizure Reduction using Model Predictive Control
- Author
-
Ioannis Exarchos, Evangelos A. Theodorou, Harleen K. Brar, Yunpeng Pan, and Babak Mahmoudi
- Subjects
Epilepsy ,021103 operations research ,Quantitative Biology::Neurons and Cognition ,Computer science ,Oscillation ,Physics::Medical Physics ,0211 other engineering and technologies ,Chaotic ,Brain ,PID controller ,Electroencephalography ,02 engineering and technology ,Optimal control ,medicine.disease ,Dynamic programming ,Model predictive control ,Seizures ,Control theory ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Humans ,020201 artificial intelligence & image processing ,Differential dynamic programming ,Algorithms - Abstract
This study presents a model predictive control approach for seizure reduction in a computational model of epilepsy. The differential dynamic programming (DDP) algorithm is implemented in a model predictive fashion to optimize a controller for suppressing seizures in a chaotic oscillator model. The chaotic oscillator model uses a proportional-integral (PI) controller to represent the internal control mechanism that maintains stable neural activity in a healthy brain. In the pathological case, the gains of this PI controller are reduced, preventing sufficient feedback to suppress the increase in correlation between normal and pathological brain dynamics. This increase in correlation synchronizes the oscillator dynamics, destabilizing neural activity and producing epileptic behavior. We formulate the pathological case of the chaotic oscillator model as an optimal control problem, which we solve using the dynamic programming principle, and propose model predictive control with differential dynamic programming optimization as a possible method for controlling epileptic seizures in known models of epilepsy.
- Published
- 2018
277. Robust Sampling Based Model Predictive Control with Sparse Objective Information
- Author
-
Evangelos A. Theodorou, Paul Drews, Grady Williams, James M. Rehg, Brian Goldfain, and Kamil Saigol
- Subjects
0209 industrial biotechnology ,Model predictive control ,020901 industrial engineering & automation ,Computer science ,Sampling (statistics) ,02 engineering and technology ,Data mining ,Objective information ,computer.software_genre ,computer - Published
- 2018
278. A Fixed-Architecture Framework for Stochastic Nonlinear Controller Synthesis
- Author
-
Wassim M. Haddad, Ethan N. Evans, and Evangelos A. Theodorou
- Subjects
Lyapunov function ,symbols.namesake ,Polynomial ,Nonlinear system ,Exponential stability ,Computer science ,Control theory ,Stochastic process ,MathematicsofComputing_NUMERICALANALYSIS ,symbols ,Applied mathematics ,Optimal control ,Dynamical system - Abstract
In this paper, we present a Lyapunov function-based optimization approach for designing state and output feedback control laws for systems with polynomial nonlinearities. We use local polynomial expansions of a chosen order to approximate a higher-order nonlinear stochastic dynamical system, reformulate stochastic asymptotic stability conditions as a nonlinear constrained optimization problem, and computationally determine the domain of attraction of the synthesized nonlinear controller on the original system. Finally, we demonstrate the effectiveness of the proposed algorithm on two numerical examples.
- Published
- 2018
279. Safe Learning of Quadrotor Dynamics Using Barrier Certificates
- Author
-
Magnus Egerstedt, Evangelos A. Theodorou, and Li Wang
- Subjects
FOS: Computer and information sciences ,0209 industrial biotechnology ,Quadcopter ,Adaptive sampling ,Dynamical systems theory ,Computer science ,Process (engineering) ,Control engineering ,Systems and Control (eess.SY) ,02 engineering and technology ,Machine Learning (cs.LG) ,Computer Science - Learning ,symbols.namesake ,Nonlinear system ,020901 industrial engineering & automation ,Control system ,FOS: Electrical engineering, electronic engineering, information engineering ,0202 electrical engineering, electronic engineering, information engineering ,symbols ,Computer Science - Systems and Control ,020201 artificial intelligence & image processing ,Invariant (mathematics) ,Gaussian process ,Invariant (computer science) - Abstract
To effectively control complex dynamical systems, accurate nonlinear models are typically needed. However, these models are not always known. In this paper, we present a data-driven approach based on Gaussian processes that learns models of quadrotors operating in partially unknown environments. What makes this challenging is that if the learning process is not carefully controlled, the system will go unstable, i.e., the quadcopter will crash. To this end, barrier certificates are employed for safe learning. The barrier certificates establish a non-conservative forward invariant safe region, in which high probability safety guarantees are provided based on the statistics of the Gaussian Process. A learning controller is designed to efficiently explore those uncertain states and expand the barrier certified safe region based on an adaptive sampling scheme. In addition, a recursive Gaussian Process prediction method is developed to learn the complex quadrotor dynamics in real-time. Simulation results are provided to demonstrate the effectiveness of the proposed approach., Comment: Submitted to ICRA 2018, 8 pages
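The predictive statistics that the barrier certificates consult can be sketched with textbook Gaussian process regression; this is a generic 1-D GP with a squared-exponential kernel, not the paper's recursive quadrotor model:

```python
import numpy as np

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two 1-D input sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_predict(X, y, Xs, noise=1e-4, length=0.3):
    """Posterior mean and variance of 1-D Gaussian process regression.
    The predictive variance is the quantity a barrier-certificate scheme
    would consult to decide which states are safe to visit."""
    K = rbf(X, X, length) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xs, length)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(Xs, Xs, length) - v.T @ v).copy()
    return mean, var
```

Far from the training data the posterior variance reverts to the prior, which is exactly why exploration must be throttled near the boundary of the certified region.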
- Published
- 2018
280. Model predictive PseudoSpectral Optimal Control with semi-parametric dynamics
- Author
-
Evangelos A. Theodorou, Kamil Saigol, Manan Gandhi, and Yunpeng Pan
- Subjects
0209 industrial biotechnology ,Mathematical optimization ,Computer science ,Constrained optimization ,02 engineering and technology ,Trajectory optimization ,System dynamics ,Semiparametric model ,symbols.namesake ,020901 industrial engineering & automation ,Parametric model ,0202 electrical engineering, electronic engineering, information engineering ,symbols ,020201 artificial intelligence & image processing ,Pseudospectral optimal control ,Gaussian process ,Parametric statistics - Abstract
Trajectory optimization of a controlled dynamical system is an essential part of autonomy; however, many trajectory optimization techniques are limited by the fidelity of the underlying parametric model. In the field of robotics, a lack of model knowledge can be overcome with machine learning techniques that use measurements to build a dynamical model from data. This paper takes the middle ground between these two approaches by introducing a semi-parametric representation of the underlying system dynamics. Our goal is to leverage the considerable information contained in a traditional physics-based model and combine it with a data-driven, non-parametric regression technique known as a Gaussian Process. Integrating this semi-parametric model with PseudoSpectral Optimal Control (PSOC), we demonstrate model learning in an episodic and receding-horizon fashion. To manage parametric uncertainty, we introduce an algorithm that uses Sparse Spectrum Gaussian Processes (SSGP) for incremental learning after each rollout. We thereby motivate and demonstrate constrained optimization techniques with semi-parametric models for online learning.
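The semi-parametric idea, a physics prior plus a learned residual, can be sketched as follows; kernel ridge regression stands in for the paper's sparse spectrum Gaussian process, and the linear physics model is hypothetical:

```python
import numpy as np

def physics_model(x):
    """Hypothetical parametric (physics-based) prediction, standing in
    for the traditional model the abstract starts from."""
    return -0.5 * x

def fit_semi_parametric(X, y, length=0.5, reg=1e-3):
    """Fit the residual y - physics_model(X) with kernel ridge regression,
    a close stand-in for the GP regression named in the abstract."""
    r = y - physics_model(X)
    K = np.exp(-0.5 * ((X[:, None] - X[None, :]) / length) ** 2)
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), r)

    def predict(xs):
        k = np.exp(-0.5 * ((xs[:, None] - X[None, :]) / length) ** 2)
        return physics_model(xs) + k @ alpha   # physics prior + learned residual
    return predict
```

Because the regressor only has to capture the model error rather than the full dynamics, far less data is needed than in a purely non-parametric approach.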
- Published
- 2017
281. Evolving cost functions for model predictive control of multi-agent UAV combat swarms
- Author
-
Evangelos A. Theodorou, David D. Fan, and John Reeder
- Subjects
0301 basic medicine ,Mathematical optimization ,Artificial neural network ,Dynamical systems theory ,Computer science ,Multi-agent system ,Swarm behaviour ,Time horizon ,02 engineering and technology ,Trajectory optimization ,Optimal control ,03 medical and health sciences ,Model predictive control ,Nonlinear system ,030104 developmental biology ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Neuroevolution of augmenting topologies - Abstract
Recent advances in sampling-based Model Predictive Control (MPC) methods have enabled the control of nonlinear stochastic dynamical systems with complex and non-smooth cost functions. However, the main drawback of these methods is that they can be myopic with respect to high-level tasks, since MPC relies on predicting dynamics within a short time horizon. Furthermore, designing cost functions that capture high-level information may be prohibitive for complex tasks, especially multi-agent scenarios. Here we propose a hierarchical approach to this problem where the NeuroEvolution of Augmenting Topologies (NEAT) algorithm is used to build cost functions for an MPC trajectory optimization algorithm known as Model-Predictive Path Integral (MPPI) control. MPPI and NEAT are particularly well-suited to one another, since MPPI can control an agent in a way that minimizes a non-differentiable cost function (including logic-based or non-smooth functions), while NEAT can build a neural network composed of arbitrary activation functions, including those that are non-differentiable or logic-based. We utilize this approach in controlling agile swarms of unmanned aerial vehicles (UAVs) in a simulated swarm vs. swarm combat scenario.
- Published
- 2017
282. Stochastic control of systems with control multiplicative noise using second order FBSDEs
- Author
-
David D. Fan, Evangelos A. Theodorou, and Kaivalya Bakshi
- Subjects
Stochastic control ,0209 industrial biotechnology ,Mathematical optimization ,010102 general mathematics ,MathematicsofComputing_NUMERICALANALYSIS ,Mathematics::Optimization and Control ,Hamilton–Jacobi–Bellman equation ,02 engineering and technology ,Optimal control ,01 natural sciences ,Hamilton–Jacobi equation ,Multiplicative noise ,Dynamic programming ,symbols.namesake ,Nonlinear system ,020901 industrial engineering & automation ,symbols ,0101 mathematics ,Mathematics ,Gibbs sampling - Abstract
The Hamilton-Jacobi-Bellman (HJB) PDE for the stochastic optimal control (SOC) problem for diffusion SDE dynamics with affine controls and state- and control-multiplicative noise is a second-order fully nonlinear PDE. The previously known linearly solvable optimal control framework, as well as first-order forward-backward SDE (FBSDE) frameworks, are therefore unable to support sampling algorithms for this SOC problem. In this paper, we present a framework of second-order FBSDEs for solving this SOC problem. We derive the nonlinear Feynman-Kac representation of the second-order fully nonlinear HJB PDE corresponding to diffusions with state- and control-multiplicative noise. The Feynman-Kac representation enables a sampling-based scheme for solving this SOC problem, which we leverage to develop a least-squares Monte Carlo regression-based algorithm. The algorithm is validated on examples of simulated control of an underactuated system and by comparison against an analytical characterization.
- Published
- 2017
283. Belief space stochastic control under unknown dynamics
- Author
-
Evangelos A. Theodorou, Yunpeng Pan, and Kamil Saigol
- Subjects
Stochastic control ,0209 industrial biotechnology ,Mathematical optimization ,Stochastic process ,Probabilistic logic ,Sampling (statistics) ,02 engineering and technology ,Optimal control ,Approximate inference ,020901 industrial engineering & automation ,Convergence (routing) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,CMA-ES ,Mathematics - Abstract
We present a sampling-based stochastic optimal control (SOC) framework for systems with unknown dynamics based on the path integral formulation and probabilistic inference. This work is motivated by three major limitations of related SOC methods: first, full knowledge of the dynamics model is usually required. Second, model uncertainty is neglected. Third, convergence of the iterative scheme is quite slow. In order to cope with these issues, our method performs sampling in belief space using approximate inference in probabilistic models. When performing probability-weighted averaging, each sample is weighted by its predictive uncertainty. In addition, our method leverages covariance matrix adaptation to achieve faster convergence. We demonstrate the effectiveness and efficiency of the proposed method using a simulated cart-pole swing up task.
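A minimal sketch of the probability-weighted averaging with uncertainty discounting that the abstract describes; adding the predictive variance directly to the cost is an illustrative choice, not necessarily the paper's exact weighting:

```python
import numpy as np

def uncertainty_weighted_update(controls, costs, variances, lam=1.0):
    """Probability-weighted averaging of sampled control sequences, with
    each rollout's weight discounted by its predictive uncertainty.
    Shapes: controls (K, T), costs (K,), variances (K,)."""
    penalized = costs + variances     # uncertain rollouts are penalized (assumption)
    w = np.exp(-(penalized - penalized.min()) / lam)
    w /= w.sum()                      # normalized importance weights
    return (w[:, None] * controls).sum(axis=0)
```

Down-weighting uncertain rollouts keeps the averaged control from being dominated by samples whose predicted cost the learned model cannot be trusted on.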
- Published
- 2017
284. Information theoretic MPC for model-based reinforcement learning
- Author
-
Nolan Wagener, James M. Rehg, Byron Boots, Brian Goldfain, Paul Drews, Grady Williams, and Evangelos A. Theodorou
- Subjects
0209 industrial biotechnology ,Artificial neural network ,Computer science ,business.industry ,020208 electrical & electronic engineering ,02 engineering and technology ,Optimal control ,Machine learning ,computer.software_genre ,Task (project management) ,Nonlinear system ,Model predictive control ,020901 industrial engineering & automation ,0202 electrical engineering, electronic engineering, information engineering ,Trajectory ,Robot ,Reinforcement learning ,Artificial intelligence ,business ,computer - Abstract
We introduce an information theoretic model predictive control (MPC) algorithm capable of handling complex cost criteria and general nonlinear dynamics. The generality of the approach makes it possible to use multi-layer neural networks as dynamics models, which we incorporate into our MPC algorithm in order to solve model-based reinforcement learning tasks. We test the algorithm in simulation on a cart-pole swing up and quadrotor navigation task, as well as on actual hardware in an aggressive driving task. Empirical results demonstrate that the algorithm is capable of achieving a high level of performance and does so only utilizing data collected from the system.
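The core information-theoretic update the abstract describes can be sketched as a softmax-weighted average over sampled control perturbations; the receding-horizon loop, GPU parallelism, and neural-network dynamics model are omitted here:

```python
import numpy as np

def mppi_update(u_nominal, noise, costs, lam=1.0):
    """One information-theoretic MPPI update: exponentiate the negative
    trajectory costs and average the sampled control perturbations.
    Shapes: u_nominal (T, m), noise (K, T, m), costs (K,)."""
    beta = costs.min()                       # subtracted for numerical stability
    w = np.exp(-(costs - beta) / lam)
    w /= w.sum()                             # normalized trajectory weights
    return u_nominal + (w[:, None, None] * noise).sum(axis=0)
```

Because the update only needs cost evaluations of forward rollouts, the dynamics model can be any black box, which is what makes neural-network models usable here.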
- Published
- 2017
285. Agile Autonomous Driving using End-to-End Deep Imitation Learning
- Author
-
Keuntaek Lee, Kamil Saigol, Byron Boots, Xinyan Yan, Evangelos A. Theodorou, Yunpeng Pan, and Ching-An Cheng
- Subjects
FOS: Computer and information sciences ,0209 industrial biotechnology ,Matching (statistics) ,Artificial neural network ,Computer science ,business.industry ,Control (management) ,02 engineering and technology ,Imitation learning ,Throttle ,Computer Science - Robotics ,020901 industrial engineering & automation ,End-to-end principle ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,State (computer science) ,business ,Robotics (cs.RO) ,Agile software development - Abstract
We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost sensors. By imitating a model predictive controller equipped with advanced sensors, we train a deep neural network control policy to map raw, high-dimensional observations to continuous steering and throttle commands. Compared with recent approaches to similar tasks, our method requires neither state estimation nor on-the-fly planning to navigate the vehicle. Our approach relies on, and experimentally validates, recent imitation learning theory. Empirically, we show that policies trained with online imitation learning overcome well-known challenges related to covariate shift and generalize better than policies trained with batch imitation learning. Built on these insights, our autonomous driving system demonstrates successful high-speed off-road driving, matching the state-of-the-art performance., Comment: 13 pages, Robotics: Science and Systems (RSS) 2018
- Published
- 2017
- Full Text
- View/download PDF
286. Pseudospectral Model Predictive Control under Partially Learned Dynamics
- Author
-
Manan Gandhi, Yunpeng Pan, and Evangelos A. Theodorou
- Subjects
business.industry ,Computer science ,Trajectory optimization ,Systems and Control (eess.SY) ,Machine learning ,computer.software_genre ,System dynamics ,Vehicle dynamics ,Model predictive control ,symbols.namesake ,Parametric model ,Obstacle avoidance ,symbols ,FOS: Electrical engineering, electronic engineering, information engineering ,Computer Science - Systems and Control ,Artificial intelligence ,business ,computer ,Gaussian process ,Parametric statistics - Abstract
Trajectory optimization of a controlled dynamical system is an essential part of autonomy, however many trajectory optimization techniques are limited by the fidelity of the underlying parametric model. In the field of robotics, a lack of model knowledge can be overcome with machine learning techniques, utilizing measurements to build a dynamical model from the data. This paper aims to take the middle ground between these two approaches by introducing a semi-parametric representation of the underlying system dynamics. Our goal is to leverage the considerable information contained in a traditional physics based model and combine it with a data-driven, non-parametric regression technique known as a Gaussian Process. Integrating this semi-parametric model with model predictive pseudospectral control, we demonstrate this technique on both a cart pole and quadrotor simulation with unmodeled damping and parametric error. In order to manage parametric uncertainty, we introduce an algorithm that utilizes Sparse Spectrum Gaussian Processes (SSGP) for online learning after each rollout. We implement this online learning technique on a cart pole and quadrotor, then demonstrate the use of online learning and obstacle avoidance for the Dubins vehicle dynamics., Comment: Accepted but withdrawn from AIAA Scitech 2017
- Published
- 2017
- Full Text
- View/download PDF
287. Game-theoretic and risk-sensitive stochastic optimal control via forward and backward stochastic differential equations
- Author
-
Evangelos A. Theodorou, Panagiotis Tsiotras, and Ioannis Exarchos
- Subjects
Stochastic control ,0209 industrial biotechnology ,Mathematical optimization ,Differential equation ,MathematicsofComputing_NUMERICALANALYSIS ,Mathematics::Optimization and Control ,Probabilistic logic ,02 engineering and technology ,Optimal control ,01 natural sciences ,Electronic mail ,Stochastic partial differential equation ,010104 statistics & probability ,Nonlinear system ,Stochastic differential equation ,020901 industrial engineering & automation ,0101 mathematics ,Mathematics - Abstract
In this work we present a sampling-based algorithm designed to solve game-theoretic control problems and risk-sensitive stochastic optimal control problems. The cornerstone of the proposed approach is the formulation of the problem in terms of forward and backward stochastic differential equations (FBSDEs). By means of a nonlinear version of the Feynman-Kac lemma, we obtain a probabilistic representation of the solution to the nonlinear Hamilton-Jacobi-Isaacs equation, expressed in the form of a decoupled system of FBSDEs. This system of FBSDEs can then be simulated by employing linear regression techniques. Utilizing the connection between stochastic differential games and risk-sensitive optimal control, we demonstrate that the proposed algorithm is also applicable to the latter class of problems. Simulation results validate the algorithm.
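The linear regression step used to simulate the decoupled FBSDE system is essentially a least-squares Monte Carlo fit of a conditional expectation; a generic sketch with an assumed polynomial basis:

```python
import numpy as np

def regress_conditional_expectation(X, Y_next, degree=3):
    """Least-squares Monte Carlo step: fit E[Y_{t+1} | X_t = x] over
    sampled trajectories with a polynomial basis. The basis choice is an
    illustrative assumption; any regression scheme could fill this role."""
    Phi = np.vander(X, degree + 1)       # polynomial features of the state samples
    coeffs, *_ = np.linalg.lstsq(Phi, Y_next, rcond=None)
    return Phi @ coeffs                  # fitted conditional expectation at X
```

Repeating this fit backward in time over a cloud of forward-simulated trajectories is what turns the probabilistic Feynman-Kac representation into a computable algorithm.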
- Published
- 2016
288. Stochastic Game Theoretic trajectory optimization in continuous time
- Author
-
Evangelos A. Theodorou, Wei Sun, and Panagiotis Tsiotras
- Subjects
0209 industrial biotechnology ,Mathematical optimization ,Differential equation ,Stochastic game ,Approximation algorithm ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Inverted pendulum ,Nonlinear system ,020901 industrial engineering & automation ,Bellman equation ,Differential game ,Differential dynamic programming ,0105 earth and related environmental sciences ,Mathematics - Abstract
A Stochastic Game Theoretic Differential Dynamic Programming (SGT-DDP) algorithm is derived to solve a differential game under stochastic dynamics. We present the update laws for the minimizing and maximizing controls of both players and provide a set of backward differential equations for the second-order value function approximation. We compute the extra terms in the backward propagation equations that arise from the stochastic assumption, compared with the original GT-DDP. We present the SGT-DDP algorithm and analyze how the design of the cost function affects the feed-forward and feedback parts of the control policies under the game theoretic formulation. The performance of SGT-DDP is then investigated through simulations on examples with conflicting controls, namely a first-order nonlinear system, the inverted pendulum, and the cart-pole problem. We conclude with some possible future extensions.
- Published
- 2016
289. Infinite dimensional control of doubly stochastic Jump Diffusions
- Author
-
Evangelos A. Theodorou and Kaivalya Bakshi
- Subjects
Stochastic control ,0209 industrial biotechnology ,Mathematical optimization ,Markov process ,Probability density function ,02 engineering and technology ,Trajectory optimization ,Optimal control ,Stochastic programming ,03 medical and health sciences ,Nonlinear system ,symbols.namesake ,020901 industrial engineering & automation ,0302 clinical medicine ,Integro-differential equation ,symbols ,Applied mathematics ,030217 neurology & neurosurgery ,Mathematics - Abstract
We present an infinite-dimensional approach to the control of a general class of doubly stochastic, otherwise known as Q-mark Markov Jump Diffusion (Q-MJD), processes. The governing dynamics for the probability density function (PDF) of this class of Q-MJD processes is a Partial Integro-Differential Equation (PIDE). The infinite-dimensional Minimum Principle (MP) is applied to control these PIDE dynamics. We qualitatively compare the infinite-dimensional MP and stochastic Dynamic Programming Principle (DPP) frameworks as applied to the control of Q-MJD processes. The developed sampling-based algorithms illustrate how the presented framework is a multi-trajectory optimization method for solving nonlinear stochastic optimal control problems for Q-MJD processes.
- Published
- 2016
290. Reinforcement Learning and Synergistic Control of the ACT Hand
- Author
-
Evangelos A. Theodorou, Yoky Matsuoka, Emo Todorov, Mark Malhotra, and Eric Rombokas
- Subjects
Engineering ,business.industry ,Control engineering ,Index finger ,Space (commercial competition) ,Computer Science Applications ,Task (project management) ,medicine.anatomical_structure ,Control and Systems Engineering ,medicine ,Reinforcement learning ,Robot ,Motion planning ,Electrical and Electronic Engineering ,Control (linguistics) ,business ,Curse of dimensionality - Abstract
Tendon-driven systems are ubiquitous in biology and provide considerable advantages for robotic manipulators, but control of these systems is challenging because of the increase in dimensionality and intrinsic nonlinearities. Researchers in biological movement control have suggested that the brain may employ “muscle synergies” to make planning, control, and learning more tractable by expressing the tendon space in a lower dimensional virtual synergistic space. We employ synergies that respect the differing constraints of actuation and sensation, and apply path integral reinforcement learning in the virtual synergistic space as well as the full tendon space. Path integral reinforcement learning has been used successfully on torque-driven systems to learn episodic tasks without using explicit models, which is particularly important for difficult-to-model dynamics like tendon networks and contact transitions. We show that optimizing a small number of trajectories in virtual synergy space can produce comparable performance to optimizing the trajectories of the tendons individually. The six tendons of the index finger and eight tendons of the thumb, each actuating four degrees of joint freedom, are used to slide a switch and turn a knob. The learned control strategies provide a method for discovery of novel task strategies and system phenomena without explicitly modeling the physics of the robot and environment.
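The virtual synergy idea amounts to commanding the tendons through a low-rank linear map; a toy sketch with a hypothetical synergy matrix (the paper's synergies are designed to respect the hand's actuation and sensing constraints):

```python
import numpy as np

def tendon_commands(synergy_activations, W):
    """Map a low-dimensional virtual synergy activation vector to full
    tendon commands through a fixed synergy matrix W (tendons x synergies).
    W and the activations are illustrative, not the ACT hand's synergies."""
    return W @ synergy_activations
```

Learning then proceeds over the few synergy activations rather than every tendon, which is how the dimensionality of the policy search is reduced.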
- Published
- 2013
291. Model-Free Reinforcement Learning of Impedance Control in Stochastic Environments
- Author
-
A. Ellmer, Evangelos A. Theodorou, Michael Mistry, Freek Stulp, Jonas Buchli, and Stefan Schaal
- Subjects
0209 industrial biotechnology ,Computer science ,02 engineering and technology ,Robot end effector ,Robot learning ,Human–robot interaction ,Force field (chemistry) ,Robot control ,law.invention ,03 medical and health sciences ,020901 industrial engineering & automation ,0302 clinical medicine ,Impedance control ,Artificial Intelligence ,law ,Control theory ,Robot ,Reinforcement learning ,030217 neurology & neurosurgery ,Software ,Simulation - Abstract
For humans and robots, variable impedance control is an essential component for ensuring robust and safe physical interaction with the environment. Humans learn to adapt their impedance to specific tasks and environments; a capability which we continually develop and improve until we are well into our twenties. In this article, we reproduce functionally interesting aspects of learning impedance control in humans on a simulated robot platform. As demonstrated in numerous force field tasks, humans combine two strategies to adapt their impedance to perturbations, thereby minimizing position error and energy consumption: 1) if perturbations are unpredictable, subjects increase their impedance through cocontraction; and 2) if perturbations are predictable, subjects learn a feed-forward command to offset the perturbation. We show how a 7-DOF simulated robot demonstrates similar behavior with our model-free reinforcement learning algorithm PI2, by applying deterministic and stochastic force fields to the robot's end-effector. We show the qualitative similarity between the robot and human movements. Our results provide a biologically plausible approach to learning appropriate impedances purely from experience, without requiring a model of either body or environment dynamics. Not requiring models also facilitates autonomous development for robots, as prespecified models cannot be provided for each environment a robot might encounter.
- Published
- 2013
- Full Text
- View/download PDF
292. Cross-entropy optimization for neuromodulation
- Author
-
Evangelos A. Theodorou, Harleen K. Brar, Yunpeng Pan, and Babak Mahmoudi
- Subjects
Neurons ,Epilepsy ,021103 operations research ,Computer science ,Oscillation ,Entropy ,Models, Neurological ,0211 other engineering and technologies ,Chaotic ,Brain ,Control engineering ,0102 computer and information sciences ,02 engineering and technology ,01 natural sciences ,Feedback ,Machine Learning ,Cross entropy ,010201 computation theory & mathematics ,Control theory ,medicine ,Reinforcement learning ,Epileptic seizure ,medicine.symptom - Abstract
This study presents a reinforcement learning approach for the optimization of the proportional-integral gains of the feedback controller represented in a computational model of epilepsy. The chaotic oscillator model provides a feedback control systems view of the dynamics of an epileptic brain with an internal feedback controller representative of the natural seizure suppression mechanism within the brain circuitry. Normal and pathological brain activity is simulated in this model by adjusting the feedback gain values of the internal controller. With insufficient gains, the internal controller cannot provide enough feedback to the brain dynamics causing an increase in correlation between different brain sites. This increase in synchronization results in the destabilization of the brain dynamics, which is representative of an epileptic seizure. To provide compensation for an insufficient internal controller an external controller is designed using proportional-integral feedback control strategy. A cross-entropy optimization algorithm is applied to the chaotic oscillator network model to learn the optimal feedback gains for the external controller instead of hand-tuning the gains to provide sufficient control to the pathological brain and prevent seizure generation. The correlation between the dynamics of neural activity within different brain sites is calculated for experimental data to show similar dynamics of epileptic neural activity as simulated by the network of chaotic oscillators.
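The cross-entropy search over feedback gains can be sketched as follows; the quadratic toy cost below stands in for the chaotic-oscillator simulation (not reproduced here), and all names and parameter values are illustrative:

```python
import numpy as np

def cross_entropy_minimize(cost, mu0, sigma0, n_samples=100, n_elite=10,
                           n_iters=30, seed=0):
    """Cross-entropy method: sample candidate gain vectors from a Gaussian,
    keep the lowest-cost elite fraction, refit the Gaussian to the elites."""
    rng = np.random.default_rng(seed)
    mu = np.array(mu0, dtype=float)
    sigma = np.array(sigma0, dtype=float)
    for _ in range(n_iters):
        samples = rng.normal(mu, sigma, size=(n_samples, mu.size))
        costs = np.array([cost(s) for s in samples])
        elites = samples[np.argsort(costs)[:n_elite]]
        mu = elites.mean(axis=0)
        sigma = elites.std(axis=0) + 1e-6  # floor keeps exploration alive
    return mu
```

In the study's setting, `cost` would run the oscillator network with the candidate proportional-integral gains and score the resulting synchronization between brain sites.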
- Published
- 2016
293. Stochastic Optimal Control using polynomial chaos variational integrators
- Author
-
Evangelos A. Theodorou, George I. Boutselis, and Gerardo De La Torre
- Subjects
Stochastic control ,0209 industrial biotechnology ,Mathematical optimization ,Polynomial chaos ,Iterative method ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Chaos theory ,020901 industrial engineering & automation ,Convergence (routing) ,Differential dynamic programming ,Variational integrator ,0105 earth and related environmental sciences ,Mathematics ,Numerical stability - Abstract
In this paper a novel method towards solving the Stochastic Optimal Control problem is proposed, which is based on the combination of the generalized Polynomial Chaos theory and the Differential Dynamic Programming framework. Utilizing the Polynomial Chaos theory allows us to handle a wide range of uncertainties, without having to rely on limiting assumptions regarding the form of stochasticity. In addition, the Differential Dynamic Programming framework provides an iterative algorithm for finding optimal controls, which attains scalability and, under mild assumptions, fast convergence. Last but not least, towards increasing the numerical stability of our algorithm, the concept of Variational Integrators is incorporated. Numerical examples validate the applicability of the proposed approach.
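The generalized Polynomial Chaos machinery reduces expectations over Gaussian uncertainty to deterministic quadratures. A minimal sketch of that building block, in the probabilists' Hermite convention (the full method couples such expansions with the Differential Dynamic Programming recursion and variational integrators):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gpc_moments(f, degree=8):
    """Propagate a standard-normal uncertainty through f using Gauss-Hermite
    quadrature: returns the mean and variance of f(xi), xi ~ N(0, 1)."""
    nodes, weights = hermegauss(degree)
    weights = weights / np.sqrt(2 * np.pi)  # normalize to the Gaussian measure
    vals = f(nodes)
    mean = weights @ vals
    var = weights @ (vals - mean) ** 2
    return mean, var
```

Exact for polynomial nonlinearities up to degree 2*degree - 1, which is what makes the approach attractive for smooth dynamics with parametric uncertainty.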
- Published
- 2016
294. Aggressive driving with model predictive path integral control
- Author
-
Paul Drews, James M. Rehg, Brian Goldfain, Evangelos A. Theodorou, and Grady Williams
- Subjects
Stochastic control ,0209 industrial biotechnology ,Mathematical optimization ,Kullback–Leibler divergence ,Computer science ,02 engineering and technology ,Aggressive driving ,Model predictive control ,020901 industrial engineering & automation ,Control theory ,Path integral formulation ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Importance sampling ,Energy (signal processing) - Abstract
In this paper we present a model predictive control algorithm designed for optimizing non-linear systems subject to complex cost criteria. The algorithm is based on a stochastic optimal control framework using a fundamental relationship between the information theoretic notions of free energy and relative entropy. The optimal controls in this setting take the form of a path integral, which we approximate using an efficient importance sampling scheme. We experimentally verify the algorithm by implementing it on a Graphics Processing Unit (GPU) and apply it to the problem of controlling a fifth-scale AutoRally vehicle in an aggressive driving task.
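At its core, the algorithm re-weights sampled control perturbations by the exponentiated trajectory cost (a softmin), which follows from the free-energy/relative-entropy relationship above. A minimal single-update sketch with illustrative names and parameters (the paper's GPU implementation parallelizes the K rollouts):

```python
import numpy as np

def mppi_weights(costs, lam):
    """Information-theoretic weights: softmin of trajectory costs."""
    beta = np.min(costs)  # subtract the minimum for numerical stability
    w = np.exp(-(costs - beta) / lam)
    return w / np.sum(w)

def mppi_step(dynamics, cost, x0, u_nom, lam=1.0, sigma=0.5, K=256, seed=0):
    """One MPPI update: sample K perturbed control sequences, roll them out,
    and average the perturbations weighted by exponentiated cost."""
    rng = np.random.default_rng(seed)
    T = len(u_nom)
    eps = rng.normal(0.0, sigma, size=(K, T))
    costs = np.empty(K)
    for k in range(K):
        x, c = x0, 0.0
        for t in range(T):
            x = dynamics(x, u_nom[t] + eps[k, t])
            c += cost(x)
        costs[k] = c
    return u_nom + mppi_weights(costs, lam) @ eps
```

In receding-horizon use, the updated sequence is executed for one step, shifted, and re-optimized at the next sampling instant.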
- Published
- 2016
295. An Information-Theoretic Active Localization Approach during Relative Circumnavigation in Orbit
- Author
-
Evangelos A. Theodorou, Michail Kontitsis, and Panagiotis Tsiotras
- Subjects
030213 general clinical medicine ,0209 industrial biotechnology ,Landmark ,Computer science ,02 engineering and technology ,Circumnavigation ,03 medical and health sciences ,020901 industrial engineering & automation ,0302 clinical medicine ,Cross entropy ,Feature (computer vision) ,Satellite ,Circular orbit ,Orbit (control theory) ,Algorithm ,Uncertainty reduction theory - Abstract
This paper presents an information-theoretic active localization technique applied to the problem of relative navigation and self-localization in orbit. We apply the approach to the problem of a chaser satellite circumnavigating a target satellite in a circular orbit, while observing a set of feature points (landmarks) on the target satellite. The approach relies on the Cross Entropy (CE) optimization method to select camera orientation trajectories that minimize both the localization error and the corresponding uncertainty bounds, while moderating the required control effort. The proposed method provides a framework for near-optimal solutions by jointly considering control, planning and estimation. We show the benefits of the method in terms of landmark uncertainty reduction and we compare the proposed approach against an open-loop strategy and a “greedy” feedback strategy.
- Published
- 2016
296. A Comparison between Trajectory Optimization Methods: Differential Dynamic Programming and Pseudospectral Optimal Control
- Author
-
Evangelos A. Theodorou and Manan Gandhi
- Subjects
Gauss pseudospectral method ,Computer science ,Control theory ,Differential dynamic programming ,Ross–Fahroo pseudospectral method ,010103 numerical & computational mathematics ,Trajectory optimization ,Pseudospectral optimal control ,0101 mathematics ,01 natural sciences - Published
- 2016
297. Learning variable impedance control
- Author
-
Stefan Schaal, Jonas Buchli, Evangelos A. Theodorou, and Freek Stulp
- Subjects
Stochastic control ,business.industry ,Computer science ,Applied Mathematics ,Mechanical Engineering ,Control engineering ,Robotics ,Variable (computer science) ,Gain scheduling ,Impedance control ,Artificial Intelligence ,Control theory ,Robustness (computer science) ,Modeling and Simulation ,Robot ,Reinforcement learning ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Software - Abstract
One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high degree-of-freedom (DOF) robotic tasks. In this contribution, we accomplish such variable impedance control with the reinforcement learning (RL) algorithm PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling-based learning method derived from first principles of stochastic optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on the cost function design to specify the task. From the viewpoint of robotics, a particularly useful property of PI2 is that it can scale to problems of many DOFs, so that reinforcement learning on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and how it is applied to gain scheduling for variable impedance control. We evaluate our approach by presenting results on several simulated and real robots. We consider tasks involving accurate tracking through via points, and manipulation tasks requiring physical contact with the environment. In these tasks, the optimal strategy requires both tuning of a reference trajectory and the impedance of the end-effector. The results show that we can use path integral based reinforcement learning not only for planning but also to derive variable gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.
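The parameter update behind PI2 is a probability-weighted average of the exploration noise, with weights given by the exponentiated (normalized) rollout cost; the same parameter vector can encode both the reference trajectory and the scheduled gains. A hedged sketch of that update, not the authors' implementation:

```python
import numpy as np

def pi2_update(theta, rollouts, lam=0.1):
    """One PI2 parameter update. `rollouts` is a list of (eps, cost) pairs:
    the exploration noise applied to theta and the resulting rollout cost."""
    eps = np.array([e for e, _ in rollouts])
    S = np.array([c for _, c in rollouts], dtype=float)
    # Normalize costs to [0, 1] so lam has a consistent effect across tasks.
    S = (S - S.min()) / max(S.max() - S.min(), 1e-12)
    P = np.exp(-S / lam)
    P /= P.sum()
    return theta + P @ eps  # probability-weighted average of the noise
```

Low-cost rollouts dominate the average, so theta drifts toward perturbations that performed well, without any model of the dynamics.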
- Published
- 2011
298. Computational Models for Neuromuscular Function
- Author
-
Heiko Hoffmann, Evangelos A. Theodorou, Francisco J. Valero-Cuevas, Jason J. Kutch, and M. U. Kurse
- Subjects
Computational model ,business.industry ,Computer science ,Biomedical Engineering ,Experimental data ,Statistical model ,Machine learning ,computer.software_genre ,Article ,Neuromuscular stimulation ,Leverage (statistics) ,Experimental work ,Statistical analysis ,Artificial intelligence ,Neuromuscular control ,business ,computer - Abstract
Computational models of the neuromuscular system hold the potential to allow us to reach a deeper understanding of neuromuscular function and clinical rehabilitation by complementing experimentation. By serving as a means to distill and explore specific hypotheses, computational models emerge from prior experimental data and motivate future experimental work. Here we review computational tools used to understand neuromuscular function including musculoskeletal modeling, machine learning, control theory, and statistical model analysis. We conclude that these tools, when used in combination, have the potential to further our understanding of neuromuscular function by serving as a rigorous means to test scientific hypotheses in ways that complement and leverage experimental data.
- Published
- 2009
299. Learning Optimal Control via Forward and Backward Stochastic Differential Equations
- Author
-
Evangelos A. Theodorou and Ioannis Exarchos
- Subjects
0209 industrial biotechnology ,Girsanov theorem ,Differential equation ,Computer science ,MathematicsofComputing_NUMERICALANALYSIS ,Systems and Control (eess.SY) ,02 engineering and technology ,01 natural sciences ,Electrical Engineering and Systems Science - Systems and Control ,010104 statistics & probability ,Stochastic differential equation ,Viscosity ,020901 industrial engineering & automation ,Linear regression ,FOS: Electrical engineering, electronic engineering, information engineering ,FOS: Mathematics ,Applied mathematics ,0101 mathematics ,Mathematics - Optimization and Control ,Stochastic control ,Probabilistic logic ,Sampling (statistics) ,Optimal control ,Nonlinear system ,Optimization and Control (math.OC) ,Trajectory ,Importance sampling - Abstract
In this paper we present a novel sampling-based numerical scheme designed to solve a certain class of stochastic optimal control problems, utilizing forward and backward stochastic differential equations (FBSDEs). By means of a nonlinear version of the Feynman-Kac lemma, we obtain a probabilistic representation of the solution to the nonlinear Hamilton-Jacobi-Bellman equation, expressed in the form of a decoupled system of FBSDEs. This system of FBSDEs can then be simulated by employing linear regression techniques. To enhance the efficiency of the proposed scheme when treating more complex nonlinear systems, we then derive an iterative modification based on Girsanov's theorem on the change of measure, which features importance sampling. The modified scheme is capable of learning the optimal control without requiring an initial guess. We present simulations that validate the algorithm and demonstrate its efficiency in treating nonlinear dynamics.
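The regression step in such schemes estimates a conditional expectation at each backward pass by least squares on features of the forward state (Longstaff-Schwartz style). A minimal sketch for a decoupled, uncontrolled scalar case with hypothetical names, assuming forward paths have already been simulated:

```python
import numpy as np

def lsmc_backward(X, terminal, running, dt, degree=2):
    """Least-squares Monte Carlo backward pass: regress the target value
    Y_{t+1} + dt * f(X_t) on polynomial features of X_t at each step.

    X: (n_paths, n_steps) array of simulated forward-SDE paths.
    Returns the value estimate at the initial time, averaged over paths."""
    n_paths, n_steps = X.shape
    Y = terminal(X[:, -1])
    for t in range(n_steps - 2, -1, -1):
        target = Y + dt * running(X[:, t])
        A = np.vander(X[:, t], degree + 1)  # polynomial basis in the state
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        Y = A @ coef  # fitted conditional expectation at time t
    return Y.mean()
```

The paper's iterative, Girsanov-based modification additionally re-samples the forward paths under a control-adjusted measure so the regression is trained where the optimal trajectories actually live.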
- Published
- 2015
300. Robust Trajectory Optimization: A Cooperative Stochastic Game Theoretic Approach
- Author
-
Kaivalya Bakshi, Evangelos A. Theodorou, and Yunpeng Pan
- Subjects
Dynamic programming ,Mathematical optimization ,Robustness (computer science) ,Computer science ,Stochastic game ,Differential game ,Reinforcement learning ,Stochastic optimization ,Trajectory optimization ,Optimal control - Abstract
We present a novel trajectory optimization framework to address the issue of robustness, scalability and efficiency in optimal control and reinforcement learning. Based on prior work in Cooperative Stochastic Differential Game (CSDG) theory, our method performs local trajectory optimization using cooperative controllers. The resulting framework is called Cooperative Game-Differential Dynamic Programming (CG-DDP). Compared to related methods, CG-DDP exhibits improved performance in terms of robustness and efficiency. The proposed framework is also applied in a data-driven fashion for belief space trajectory optimization under learned dynamics. We present experiments showing that CG-DDP can be used for optimal control and reinforcement learning under external disturbances and internal model uncertainties.
- Published
- 2015