Need help with ilqr?

Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.

135 Stars 32 Forks GNU General Public License v3.0 40 Commits 5 Opened issues

Iterative Linear Quadratic Regulator with auto-differentiatiable dynamics models

Readme

.. image:: https://travis-ci.org/anassinator/ilqr.svg?branch=master :target: https://travis-ci.org/anassinator/ilqr

This is an implementation of the Iterative Linear Quadratic Regulator (iLQR) for non-linear trajectory optimization based on Yuval Tassa's

paper_.

It is compatible with both Python 2 and 3 and has built-in support for auto-differentiating both the dynamics model and the cost function using

Theano_.

To install, clone and run:

.. code-block:: bash

python setup.py install

You may also install the dependencies with

pipenvas follows:

.. code-block:: bash

pipenv install

After installing, :code:

importas follows:

.. code-block:: python

from ilqr import iLQR

You can see the

examples_ directory for

Jupyter_ notebooks to see how common control problems can be solved through iLQR.

Dynamics model ^^^^^^^^^^^^^^

You can set up your own dynamics model by either extending the :code:

Dynamicsclass and hard-coding it and its partial derivatives. Alternatively, you can write it up as a

Theanoexpression and use the :code:

AutoDiffDynamicsclass for it to be auto-differentiated. Finally, if all you have is a function, you can use the :code:

FiniteDiffDynamicsclass to approximate the derivatives with finite difference approximation.

This section demonstrates how to implement the following dynamics model:

.. math::

m \dot{v} = F - \alpha v

where :math:

mis the object's mass in :math:

kg, :math:

alphais the friction coefficient, :math:

vis the object's velocity in :math:

m/s, :math:

\dot{v}is the object's acceleration in :math:

m/s^2, and :math:

Fis the control (or force) you're applying to the object in :math:

N.

Automatic differentiation """""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.dynamics import AutoDiffDynamics

x = T.dscalar("x") # Position.
x*dot = T.dscalar("x*dot") # Velocity.
F = T.dscalar("F") # Force.

dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.

# Acceleration.
x*dot*dot = x_dot * (1 - alpha * dt / m) + F * dt / m

# Discrete dynamics model definition.
f = T.stack([
x + x*dot * dt,
x*dot + x*dot*dot * dt,
])

x*inputs = [x, x*dot] # State vector.
u_inputs = [F] # Control vector.

# Compile the dynamics.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
dynamics = AutoDiffDynamics(f, x*inputs, u*inputs)

*Note*: If you want to be able to use the Hessians (:code:

f_xx, :code:

f_ux, and :code:

f_uu), you need to pass the :code:

hessians=Trueargument to the constructor. This will increase compilation time. Note that :code:

iLQRdoes not require second-order derivatives to function.

Batch automatic differentiation """""""""""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.dynamics import BatchAutoDiffDynamics

state*size = 2 # [position, velocity]
action*size = 1 # [force]

dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.

def f(x, u, i): """Batched implementation of the dynamics model.

Args: x: State vector [*, state_size]. u: Control vector [*, action_size]. i: Current time step [*, 1].Returns: Next state vector [*, state_size]. """ x_ = x[..., 0] x_dot = x[..., 1] F = u[..., 0]

## Acceleration.

x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

## Discrete dynamics model definition.

return T.stack([ x_ + x_dot * dt, x_dot + x_dot_dot * dt, ]).T

# Compile the dynamics.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
dynamics = BatchAutoDiffDynamics(f, state*size, action*size)

*Note*: This is a faster version of :code:

AutoDiffDynamicsthat doesn't support Hessians.

Finite difference approximation """""""""""""""""""""""""""""""

.. code-block:: python

from ilqr.dynamics import FiniteDiffDynamics

state*size = 2 # [position, velocity]
action*size = 1 # [force]

dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.

def f(x, u, i): """Dynamics model function.

Args: x: State vector [state_size]. u: Control vector [action_size]. i: Current time step.Returns: Next state vector [state_size]. """ [x, x_dot] = x [F] = u

## Acceleration.

x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

return np.array([ x + x_dot * dt, x_dot + x_dot_dot * dt, ])

# NOTE: Unlike with AutoDiffDynamics, this is instantaneous, but will not be
# as accurate.
dynamics = FiniteDiffDynamics(f, state*size, action*size)

*Note*: It is possible you might need to play with the epsilon values
(:code:

x_epsand :code:

u_eps) used when computing the approximation if you run into numerical instability issues.

Usage """""

Regardless of the method used for constructing your dynamics model, you can use them as follows:

.. code-block:: python

curr*x = np.array([1.0, 2.0])
curr*u = np.array([0.0])
i = 0 # This dynamics model is not time-varying, so this doesn't matter.

dynamics.f(curr

x, curru, i) ... array([ 1.02 , 2.01998]) dynamics.fx(currx, curru, i) ... array([[ 1. , 0.01 ], [ 0. , 1.00999]]) dynamics.fu(currx, curru, i) ... array([[ 0. ], [ 0.0001]])

Comparing the output of the :code:

AutoDiffDynamicsand the :code:

FiniteDiffDynamicsmodels should generally yield consistent results, but the auto-differentiated method will always be more accurate. Generally, the finite difference approximation will be faster unless you're also computing the Hessians: in which case, Theano's compiled derivatives are more optimized.

Cost function ^^^^^^^^^^^^^

Similarly, you can set up your own cost function by either extending the :code:

Costclass and hard-coding it and its partial derivatives. Alternatively, you can write it up as a

Theanoexpression and use the :code:

AutoDiffCostclass for it to be auto-differentiated. Finally, if all you have are a loss functions, you can use the :code:

FiniteDiffCostclass to approximate the derivatives with finite difference approximation.

The most common cost function is the quadratic format used by Linear Quadratic Regulators:

.. math::

(x - x*{goal})^T Q (x - x*{goal}) + (u - u*{goal})^T R (u - u*{goal})

where :math:

Qand :math:

Rare matrices defining your quadratic state error and quadratic control errors and :math:

x_{goal}is your target state. For convenience, an implementation of this cost function is made available as the :code:

QRCostclass.

:code:

QRCostclass """"""""""""""""""""

.. code-block:: python

import numpy as np from ilqr.cost import QRCost

state*size = 2 # [position, velocity]
action*size = 1 # [force]

# The coefficients weigh how much your state error is worth to you vs
# the size of your controls. You can favor a solution that uses smaller
# controls by increasing R's coefficient.
Q = 100 * np.eye(state*size)
R = 0.01 * np.eye(action*size)

# This is optional if you want your cost to be computed differently at a # terminal state. Q_terminal = np.array([[100.0, 0.0], [0.0, 0.1]])

# State goal is set to a position of 1 m with no velocity. x_goal = np.array([1.0, 0.0])

# NOTE: This is instantaneous and completely accurate.
cost = QRCost(Q, R, Q*terminal=Q*terminal, x*goal=x*goal)

Automatic differentiation """""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.cost import AutoDiffCost

x*inputs = [T.dscalar("x"), T.dscalar("x*dot")]
u_inputs = [T.dscalar("F")]

x = T.stack(x*inputs)
u = T.stack(u*inputs)

x*diff = x - x*goal
l = x*diff.T.dot(Q).dot(x*diff) + u.T.dot(R).dot(u)
l*terminal = x*diff.T.dot(Q*terminal).dot(x*diff)

# Compile the cost.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
cost = AutoDiffCost(l, l*terminal, x*inputs, u_inputs)

Batch automatic differentiation """""""""""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.cost import BatchAutoDiffCost

def cost_function(x, u, i, terminal): """Batched implementation of the quadratic cost function.

Args: x: State vector [*, state_size]. u: Control vector [*, action_size]. i: Current time step [*, 1]. terminal: Whether to compute the terminal cost.Returns: Instantaneous cost [*]. """ Q_ = Q_terminal if terminal else Q l = x.dot(Q_).dot(x.T) if l.ndim == 2: l = T.diag(l)

if not terminal: l_u = u.dot(R).dot(u.T) if l_u.ndim == 2: l_u = T.diag(l_u) l += l_u

return l

# Compile the cost.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
cost = BatchAutoDiffCost(cost*function, state*size, action_size)

Finite difference approximation """""""""""""""""""""""""""""""

.. code-block:: python

from ilqr.cost import FiniteDiffCost

def l(x, u, i): """Instantaneous cost function.

Args: x: State vector [state_size]. u: Control vector [action_size]. i: Current time step.Returns: Instantaneous cost [scalar]. """ x_diff = x - x_goal return x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)

def l_terminal(x, i): """Terminal cost function.

Args: x: State vector [state_size]. i: Current time step.Returns: Terminal cost [scalar]. """ x_diff = x - x_goal return x_diff.T.dot(Q_terminal).dot(x_diff)

# NOTE: Unlike with AutoDiffCost, this is instantaneous, but will not be as
# accurate.
cost = FiniteDiffCost(l, l*terminal, state*size, action_size)

*Note*: It is possible you might need to play with the epsilon values
(:code:

x_epsand :code:

u_eps) used when computing the approximation if you run into numerical instability issues.

Usage """""

Regardless of the method used for constructing your cost function, you can use them as follows:

.. code-block:: python

cost.l(curr

x, curru, i) ... 400.0 cost.lx(currx, curru, i) ... array([ 0., 400.]) cost.lu(currx, curru, i) ... array([ 0.]) cost.lxx(currx, curru, i) ... array([[ 200., 0.], [ 0., 200.]]) cost.lux(currx, curru, i) ... array([[ 0., 0.]]) cost.luu(currx, curr_u, i) ... array([[ 0.02]])

Putting it all together ^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

N = 1000 # Number of time-steps in trajectory. x0 = np.array([0.0, -0.1]) # Initial state. us_init = np.random.uniform(-1, 1, (N, 1)) # Random initial action path.

ilqr = iLQR(dynamics, cost, N) xs, us = ilqr.fit(x0, us_init)

:code:

xsand :code:

usnow hold the optimal state and control trajectory that reaches the desired goal state with minimum cost.

Finally, a :code:

RecedingHorizonControlleris also bundled with this package to use the :code:

iLQRcontroller in Model Predictive Control.

Important notes ^^^^^^^^^^^^^^^

To quote from Tassa's paper: "Two important parameters which have a direct impact on performance are the simulation time-step :code:

dtand the horizon length :code:

N. Since speed is of the essence, the goal is to choose those values which minimize the number of steps in the trajectory, i.e. the largest possible time-step and the shortest possible horizon. The size of :code:

dtis limited by our use of Euler integration; beyond some value the simulation becomes unstable. The minimum length of the horizon :code:

Nis a problem-dependent quantity which must be found by trial-and-error."

Contributions are welcome. Simply open an issue or pull request on the matter.

We use

YAPF_ for all Python formatting needs. You can auto-format your changes with the following command:

.. code-block:: bash

yapf --recursive --in-place --parallel .

You may install the linter as follows:

.. code-block:: bash

pipenv install --dev

See

LICENSE_.

This implementation was partially based on Yuval Tassa's :code:

MATLAB

implementation

navigator8972

implementation_.