Need help with ilqr?
Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.

#### About the developer

anassinator
135 Stars 32 Forks GNU General Public License v3.0 40 Commits 5 Opened issues

#### Description

Iterative Linear Quadratic Regulator with auto-differentiatiable dynamics models

!
?

# 40,590
Shell
gRPC
c-plus-...
c-sharp
38 commits
# 45,171
Python
Shell
HTML
reprodu...
2 commits

# Iterative Linear Quadratic Regulator

.. image:: https://travis-ci.org/anassinator/ilqr.svg?branch=master :target: https://travis-ci.org/anassinator/ilqr

This is an implementation of the Iterative Linear Quadratic Regulator (iLQR) for non-linear trajectory optimization based on Yuval Tassa's

paper
_.

It is compatible with both Python 2 and 3 and has built-in support for auto-differentiating both the dynamics model and the cost function using

Theano
_.

## Install

To install, clone and run:

.. code-block:: bash

python setup.py install

You may also install the dependencies with

pipenv
as follows:

.. code-block:: bash

pipenv install

## Usage

After installing, :code:

import
as follows:

.. code-block:: python

from ilqr import iLQR

You can see the

examples
_ directory for
Jupyter
_ notebooks to see how common control problems can be solved through iLQR.

Dynamics model ^^^^^^^^^^^^^^

You can set up your own dynamics model by either extending the :code:

Dynamics
class and hard-coding it and its partial derivatives. Alternatively, you can write it up as a
Theano
expression and use the :code:
AutoDiffDynamics
class for it to be auto-differentiated. Finally, if all you have is a function, you can use the :code:
FiniteDiffDynamics
class to approximate the derivatives with finite difference approximation.

This section demonstrates how to implement the following dynamics model:

.. math::

m \dot{v} = F - \alpha v

where :math:

m
is the object's mass in :math:
kg
, :math:
alpha
is the friction coefficient, :math:
v
is the object's velocity in :math:
m/s
, :math:
\dot{v}
is the object's acceleration in :math:
m/s^2
, and :math:
F
is the control (or force) you're applying to the object in :math:
N
.

Automatic differentiation """""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.dynamics import AutoDiffDynamics

x = T.dscalar("x") # Position. xdot = T.dscalar("xdot") # Velocity. F = T.dscalar("F") # Force.

dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.

# Acceleration. xdotdot = x_dot * (1 - alpha * dt / m) + F * dt / m

# Discrete dynamics model definition. f = T.stack([ x + xdot * dt, xdot + xdotdot * dt, ])

xinputs = [x, xdot] # State vector. u_inputs = [F] # Control vector.

# Compile the dynamics. # NOTE: This can be slow as it's computing and compiling the derivatives. # But that's okay since it's only a one-time cost on startup. dynamics = AutoDiffDynamics(f, xinputs, uinputs)

Note: If you want to be able to use the Hessians (:code:

f_xx
, :code:
f_ux
, and :code:
f_uu
), you need to pass the :code:
hessians=True
argument to the constructor. This will increase compilation time. Note that :code:
iLQR
does not require second-order derivatives to function.

Batch automatic differentiation """""""""""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.dynamics import BatchAutoDiffDynamics

statesize = 2 # [position, velocity] actionsize = 1 # [force]

dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.

def f(x, u, i): """Batched implementation of the dynamics model.

  Args:
x: State vector [*, state_size].
u: Control vector [*, action_size].
i: Current time step [*, 1].

Returns:
Next state vector [*, state_size].
"""
x_ = x[..., 0]
x_dot = x[..., 1]
F = u[..., 0]
Acceleration.
x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m
Discrete dynamics model definition.
return T.stack([
x_ + x_dot * dt,
x_dot + x_dot_dot * dt,
]).T


# Compile the dynamics. # NOTE: This can be slow as it's computing and compiling the derivatives. # But that's okay since it's only a one-time cost on startup. dynamics = BatchAutoDiffDynamics(f, statesize, actionsize)

Note: This is a faster version of :code:

AutoDiffDynamics
that doesn't support Hessians.

Finite difference approximation """""""""""""""""""""""""""""""

.. code-block:: python

from ilqr.dynamics import FiniteDiffDynamics

statesize = 2 # [position, velocity] actionsize = 1 # [force]

dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.

def f(x, u, i): """Dynamics model function.

  Args:
x: State vector [state_size].
u: Control vector [action_size].
i: Current time step.

Returns:
Next state vector [state_size].
"""
[x, x_dot] = x
[F] = u
Acceleration.
x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m
return np.array([
x + x_dot * dt,
x_dot + x_dot_dot * dt,
])


# NOTE: Unlike with AutoDiffDynamics, this is instantaneous, but will not be # as accurate. dynamics = FiniteDiffDynamics(f, statesize, actionsize)

Note: It is possible you might need to play with the epsilon values (:code:

x_eps
and :code:
u_eps
) used when computing the approximation if you run into numerical instability issues.

Usage """""

Regardless of the method used for constructing your dynamics model, you can use them as follows:

.. code-block:: python

currx = np.array([1.0, 2.0]) curru = np.array([0.0]) i = 0 # This dynamics model is not time-varying, so this doesn't matter.

dynamics.f(currx, curru, i) ... array([ 1.02 , 2.01998]) dynamics.fx(currx, curru, i) ... array([[ 1. , 0.01 ], [ 0. , 1.00999]]) dynamics.fu(currx, curru, i) ... array([[ 0. ], [ 0.0001]])

Comparing the output of the :code:

AutoDiffDynamics
and the :code:
FiniteDiffDynamics
models should generally yield consistent results, but the auto-differentiated method will always be more accurate. Generally, the finite difference approximation will be faster unless you're also computing the Hessians: in which case, Theano's compiled derivatives are more optimized.

Cost function ^^^^^^^^^^^^^

Similarly, you can set up your own cost function by either extending the :code:

Cost
class and hard-coding it and its partial derivatives. Alternatively, you can write it up as a
Theano
expression and use the :code:
AutoDiffCost
class for it to be auto-differentiated. Finally, if all you have are a loss functions, you can use the :code:
FiniteDiffCost
class to approximate the derivatives with finite difference approximation.

The most common cost function is the quadratic format used by Linear Quadratic Regulators:

.. math::

(x - x{goal})^T Q (x - x{goal}) + (u - u{goal})^T R (u - u{goal})

where :math:

Q
and :math:
R
are matrices defining your quadratic state error and quadratic control errors and :math:
x_{goal}
is your target state. For convenience, an implementation of this cost function is made available as the :code:
QRCost
class.

:code:

QRCost
class """"""""""""""""""""

.. code-block:: python

import numpy as np from ilqr.cost import QRCost

statesize = 2 # [position, velocity] actionsize = 1 # [force]

# The coefficients weigh how much your state error is worth to you vs # the size of your controls. You can favor a solution that uses smaller # controls by increasing R's coefficient. Q = 100 * np.eye(statesize) R = 0.01 * np.eye(actionsize)

# This is optional if you want your cost to be computed differently at a # terminal state. Q_terminal = np.array([[100.0, 0.0], [0.0, 0.1]])

# State goal is set to a position of 1 m with no velocity. x_goal = np.array([1.0, 0.0])

# NOTE: This is instantaneous and completely accurate. cost = QRCost(Q, R, Qterminal=Qterminal, xgoal=xgoal)

Automatic differentiation """""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.cost import AutoDiffCost

xinputs = [T.dscalar("x"), T.dscalar("xdot")] u_inputs = [T.dscalar("F")]

x = T.stack(xinputs) u = T.stack(uinputs)

xdiff = x - xgoal l = xdiff.T.dot(Q).dot(xdiff) + u.T.dot(R).dot(u) lterminal = xdiff.T.dot(Qterminal).dot(xdiff)

# Compile the cost. # NOTE: This can be slow as it's computing and compiling the derivatives. # But that's okay since it's only a one-time cost on startup. cost = AutoDiffCost(l, lterminal, xinputs, u_inputs)

Batch automatic differentiation """""""""""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.cost import BatchAutoDiffCost

def cost_function(x, u, i, terminal): """Batched implementation of the quadratic cost function.

  Args:
x: State vector [*, state_size].
u: Control vector [*, action_size].
i: Current time step [*, 1].
terminal: Whether to compute the terminal cost.

Returns:
Instantaneous cost [*].
"""
Q_ = Q_terminal if terminal else Q
l = x.dot(Q_).dot(x.T)
if l.ndim == 2:
l = T.diag(l)
if not terminal:
l_u = u.dot(R).dot(u.T)
if l_u.ndim == 2:
l_u = T.diag(l_u)
l += l_u
return l


# Compile the cost. # NOTE: This can be slow as it's computing and compiling the derivatives. # But that's okay since it's only a one-time cost on startup. cost = BatchAutoDiffCost(costfunction, statesize, action_size)

Finite difference approximation """""""""""""""""""""""""""""""

.. code-block:: python

from ilqr.cost import FiniteDiffCost

def l(x, u, i): """Instantaneous cost function.

  Args:
x: State vector [state_size].
u: Control vector [action_size].
i: Current time step.

Returns:
Instantaneous cost [scalar].
"""
x_diff = x - x_goal
return x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)


def l_terminal(x, i): """Terminal cost function.

  Args:
x: State vector [state_size].
i: Current time step.

Returns:
Terminal cost [scalar].
"""
x_diff = x - x_goal
return x_diff.T.dot(Q_terminal).dot(x_diff)


# NOTE: Unlike with AutoDiffCost, this is instantaneous, but will not be as # accurate. cost = FiniteDiffCost(l, lterminal, statesize, action_size)

Note: It is possible you might need to play with the epsilon values (:code:

x_eps
and :code:
u_eps
) used when computing the approximation if you run into numerical instability issues.

Usage """""

Regardless of the method used for constructing your cost function, you can use them as follows:

.. code-block:: python

cost.l(currx, curru, i) ... 400.0 cost.lx(currx, curru, i) ... array([ 0., 400.]) cost.lu(currx, curru, i) ... array([ 0.]) cost.lxx(currx, curru, i) ... array([[ 200., 0.], [ 0., 200.]]) cost.lux(currx, curru, i) ... array([[ 0., 0.]]) cost.luu(currx, curr_u, i) ... array([[ 0.02]])

Putting it all together ^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

N = 1000 # Number of time-steps in trajectory. x0 = np.array([0.0, -0.1]) # Initial state. us_init = np.random.uniform(-1, 1, (N, 1)) # Random initial action path.

ilqr = iLQR(dynamics, cost, N) xs, us = ilqr.fit(x0, us_init)

:code:

xs
and :code:
us
now hold the optimal state and control trajectory that reaches the desired goal state with minimum cost.

Finally, a :code:

RecedingHorizonController
is also bundled with this package to use the :code:
iLQR
controller in Model Predictive Control.

Important notes ^^^^^^^^^^^^^^^

To quote from Tassa's paper: "Two important parameters which have a direct impact on performance are the simulation time-step :code:

dt
and the horizon length :code:
N
. Since speed is of the essence, the goal is to choose those values which minimize the number of steps in the trajectory, i.e. the largest possible time-step and the shortest possible horizon. The size of :code:
dt
is limited by our use of Euler integration; beyond some value the simulation becomes unstable. The minimum length of the horizon :code:
N
is a problem-dependent quantity which must be found by trial-and-error."

## Contributing

Contributions are welcome. Simply open an issue or pull request on the matter.

## Linting

We use

YAPF
_ for all Python formatting needs. You can auto-format your changes with the following command:

.. code-block:: bash

yapf --recursive --in-place --parallel .

You may install the linter as follows:

.. code-block:: bash

pipenv install --dev

See

LICENSE
_.

## Credits

This implementation was partially based on Yuval Tassa's :code:

MATLAB
implementation
, and
navigator8972
's
implementation
_.

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.