Iterative Linear Quadratic Regulator with auto-differentiatiable dynamics models
.. image:: https://travis-ci.org/anassinator/ilqr.svg?branch=master :target: https://travis-ci.org/anassinator/ilqr
This is an implementation of the Iterative Linear Quadratic Regulator (iLQR) for non-linear trajectory optimization based on Yuval Tassa's
paper_.
It is compatible with both Python 2 and 3 and has built-in support for auto-differentiating both the dynamics model and the cost function using
Theano_.
To install, clone and run:
.. code-block:: bash
python setup.py install
You may also install the dependencies with
pipenvas follows:
.. code-block:: bash
pipenv install
After installing, :code:
importas follows:
.. code-block:: python
from ilqr import iLQR
You can see the
examples_ directory for
Jupyter_ notebooks to see how common control problems can be solved through iLQR.
Dynamics model ^^^^^^^^^^^^^^
You can set up your own dynamics model by either extending the :code:
Dynamicsclass and hard-coding it and its partial derivatives. Alternatively, you can write it up as a
Theanoexpression and use the :code:
AutoDiffDynamicsclass for it to be auto-differentiated. Finally, if all you have is a function, you can use the :code:
FiniteDiffDynamicsclass to approximate the derivatives with finite difference approximation.
This section demonstrates how to implement the following dynamics model:
.. math::
m \dot{v} = F - \alpha v
where :math:
mis the object's mass in :math:
kg, :math:
alphais the friction coefficient, :math:
vis the object's velocity in :math:
m/s, :math:
\dot{v}is the object's acceleration in :math:
m/s^2, and :math:
Fis the control (or force) you're applying to the object in :math:
N.
Automatic differentiation """""""""""""""""""""""""
.. code-block:: python
import theano.tensor as T from ilqr.dynamics import AutoDiffDynamics
x = T.dscalar("x") # Position. xdot = T.dscalar("xdot") # Velocity. F = T.dscalar("F") # Force.
dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.
# Acceleration. xdotdot = x_dot * (1 - alpha * dt / m) + F * dt / m
# Discrete dynamics model definition. f = T.stack([ x + xdot * dt, xdot + xdotdot * dt, ])
xinputs = [x, xdot] # State vector. u_inputs = [F] # Control vector.
# Compile the dynamics. # NOTE: This can be slow as it's computing and compiling the derivatives. # But that's okay since it's only a one-time cost on startup. dynamics = AutoDiffDynamics(f, xinputs, uinputs)
Note: If you want to be able to use the Hessians (:code:
f_xx, :code:
f_ux, and :code:
f_uu), you need to pass the :code:
hessians=Trueargument to the constructor. This will increase compilation time. Note that :code:
iLQRdoes not require second-order derivatives to function.
Batch automatic differentiation """""""""""""""""""""""""""""""
.. code-block:: python
import theano.tensor as T from ilqr.dynamics import BatchAutoDiffDynamics
statesize = 2 # [position, velocity] actionsize = 1 # [force]
dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.
def f(x, u, i): """Batched implementation of the dynamics model.
Args: x: State vector [*, state_size]. u: Control vector [*, action_size]. i: Current time step [*, 1].Returns: Next state vector [*, state_size]. """ x_ = x[..., 0] x_dot = x[..., 1] F = u[..., 0]
Acceleration.
x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m
Discrete dynamics model definition.
return T.stack([ x_ + x_dot * dt, x_dot + x_dot_dot * dt, ]).T
# Compile the dynamics. # NOTE: This can be slow as it's computing and compiling the derivatives. # But that's okay since it's only a one-time cost on startup. dynamics = BatchAutoDiffDynamics(f, statesize, actionsize)
Note: This is a faster version of :code:
AutoDiffDynamicsthat doesn't support Hessians.
Finite difference approximation """""""""""""""""""""""""""""""
.. code-block:: python
from ilqr.dynamics import FiniteDiffDynamics
statesize = 2 # [position, velocity] actionsize = 1 # [force]
dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.
def f(x, u, i): """Dynamics model function.
Args: x: State vector [state_size]. u: Control vector [action_size]. i: Current time step.Returns: Next state vector [state_size]. """ [x, x_dot] = x [F] = u
Acceleration.
x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m
return np.array([ x + x_dot * dt, x_dot + x_dot_dot * dt, ])
# NOTE: Unlike with AutoDiffDynamics, this is instantaneous, but will not be # as accurate. dynamics = FiniteDiffDynamics(f, statesize, actionsize)
Note: It is possible you might need to play with the epsilon values (:code:
x_epsand :code:
u_eps) used when computing the approximation if you run into numerical instability issues.
Usage """""
Regardless of the method used for constructing your dynamics model, you can use them as follows:
.. code-block:: python
currx = np.array([1.0, 2.0]) curru = np.array([0.0]) i = 0 # This dynamics model is not time-varying, so this doesn't matter.
dynamics.f(currx, curru, i) ... array([ 1.02 , 2.01998]) dynamics.fx(currx, curru, i) ... array([[ 1. , 0.01 ], [ 0. , 1.00999]]) dynamics.fu(currx, curru, i) ... array([[ 0. ], [ 0.0001]])
Comparing the output of the :code:
AutoDiffDynamicsand the :code:
FiniteDiffDynamicsmodels should generally yield consistent results, but the auto-differentiated method will always be more accurate. Generally, the finite difference approximation will be faster unless you're also computing the Hessians: in which case, Theano's compiled derivatives are more optimized.
Cost function ^^^^^^^^^^^^^
Similarly, you can set up your own cost function by either extending the :code:
Costclass and hard-coding it and its partial derivatives. Alternatively, you can write it up as a
Theanoexpression and use the :code:
AutoDiffCostclass for it to be auto-differentiated. Finally, if all you have are a loss functions, you can use the :code:
FiniteDiffCostclass to approximate the derivatives with finite difference approximation.
The most common cost function is the quadratic format used by Linear Quadratic Regulators:
.. math::
(x - x{goal})^T Q (x - x{goal}) + (u - u{goal})^T R (u - u{goal})
where :math:
Qand :math:
Rare matrices defining your quadratic state error and quadratic control errors and :math:
x_{goal}is your target state. For convenience, an implementation of this cost function is made available as the :code:
QRCostclass.
:code:
QRCostclass """"""""""""""""""""
.. code-block:: python
import numpy as np from ilqr.cost import QRCost
statesize = 2 # [position, velocity] actionsize = 1 # [force]
# The coefficients weigh how much your state error is worth to you vs # the size of your controls. You can favor a solution that uses smaller # controls by increasing R's coefficient. Q = 100 * np.eye(statesize) R = 0.01 * np.eye(actionsize)
# This is optional if you want your cost to be computed differently at a # terminal state. Q_terminal = np.array([[100.0, 0.0], [0.0, 0.1]])
# State goal is set to a position of 1 m with no velocity. x_goal = np.array([1.0, 0.0])
# NOTE: This is instantaneous and completely accurate. cost = QRCost(Q, R, Qterminal=Qterminal, xgoal=xgoal)
Automatic differentiation """""""""""""""""""""""""
.. code-block:: python
import theano.tensor as T from ilqr.cost import AutoDiffCost
xinputs = [T.dscalar("x"), T.dscalar("xdot")] u_inputs = [T.dscalar("F")]
x = T.stack(xinputs) u = T.stack(uinputs)
xdiff = x - xgoal l = xdiff.T.dot(Q).dot(xdiff) + u.T.dot(R).dot(u) lterminal = xdiff.T.dot(Qterminal).dot(xdiff)
# Compile the cost. # NOTE: This can be slow as it's computing and compiling the derivatives. # But that's okay since it's only a one-time cost on startup. cost = AutoDiffCost(l, lterminal, xinputs, u_inputs)
Batch automatic differentiation """""""""""""""""""""""""""""""
.. code-block:: python
import theano.tensor as T from ilqr.cost import BatchAutoDiffCost
def cost_function(x, u, i, terminal): """Batched implementation of the quadratic cost function.
Args: x: State vector [*, state_size]. u: Control vector [*, action_size]. i: Current time step [*, 1]. terminal: Whether to compute the terminal cost.Returns: Instantaneous cost [*]. """ Q_ = Q_terminal if terminal else Q l = x.dot(Q_).dot(x.T) if l.ndim == 2: l = T.diag(l)
if not terminal: l_u = u.dot(R).dot(u.T) if l_u.ndim == 2: l_u = T.diag(l_u) l += l_u
return l
# Compile the cost. # NOTE: This can be slow as it's computing and compiling the derivatives. # But that's okay since it's only a one-time cost on startup. cost = BatchAutoDiffCost(costfunction, statesize, action_size)
Finite difference approximation """""""""""""""""""""""""""""""
.. code-block:: python
from ilqr.cost import FiniteDiffCost
def l(x, u, i): """Instantaneous cost function.
Args: x: State vector [state_size]. u: Control vector [action_size]. i: Current time step.Returns: Instantaneous cost [scalar]. """ x_diff = x - x_goal return x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)
def l_terminal(x, i): """Terminal cost function.
Args: x: State vector [state_size]. i: Current time step.Returns: Terminal cost [scalar]. """ x_diff = x - x_goal return x_diff.T.dot(Q_terminal).dot(x_diff)
# NOTE: Unlike with AutoDiffCost, this is instantaneous, but will not be as # accurate. cost = FiniteDiffCost(l, lterminal, statesize, action_size)
Note: It is possible you might need to play with the epsilon values (:code:
x_epsand :code:
u_eps) used when computing the approximation if you run into numerical instability issues.
Usage """""
Regardless of the method used for constructing your cost function, you can use them as follows:
.. code-block:: python
cost.l(currx, curru, i) ... 400.0 cost.lx(currx, curru, i) ... array([ 0., 400.]) cost.lu(currx, curru, i) ... array([ 0.]) cost.lxx(currx, curru, i) ... array([[ 200., 0.], [ 0., 200.]]) cost.lux(currx, curru, i) ... array([[ 0., 0.]]) cost.luu(currx, curr_u, i) ... array([[ 0.02]])
Putting it all together ^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: python
N = 1000 # Number of time-steps in trajectory. x0 = np.array([0.0, -0.1]) # Initial state. us_init = np.random.uniform(-1, 1, (N, 1)) # Random initial action path.
ilqr = iLQR(dynamics, cost, N) xs, us = ilqr.fit(x0, us_init)
:code:
xsand :code:
usnow hold the optimal state and control trajectory that reaches the desired goal state with minimum cost.
Finally, a :code:
RecedingHorizonControlleris also bundled with this package to use the :code:
iLQRcontroller in Model Predictive Control.
Important notes ^^^^^^^^^^^^^^^
To quote from Tassa's paper: "Two important parameters which have a direct impact on performance are the simulation time-step :code:
dtand the horizon length :code:
N. Since speed is of the essence, the goal is to choose those values which minimize the number of steps in the trajectory, i.e. the largest possible time-step and the shortest possible horizon. The size of :code:
dtis limited by our use of Euler integration; beyond some value the simulation becomes unstable. The minimum length of the horizon :code:
Nis a problem-dependent quantity which must be found by trial-and-error."
Contributions are welcome. Simply open an issue or pull request on the matter.
We use
YAPF_ for all Python formatting needs. You can auto-format your changes with the following command:
.. code-block:: bash
yapf --recursive --in-place --parallel .
You may install the linter as follows:
.. code-block:: bash
pipenv install --dev
See
LICENSE_.
This implementation was partially based on Yuval Tassa's :code:
MATLAB
implementation, and
navigator8972's
implementation_.