# NVIDIA Container Runtime
A modified version of runc adding a custom pre-start hook to all containers.
If the environment variable `NVIDIA_VISIBLE_DEVICES` is set in the OCI spec, the hook will configure GPU access for the container by leveraging `nvidia-container-cli` from project libnvidia-container.
## Usage example

```sh
# Setup a rootfs based on Ubuntu 16.04
cd $(mktemp -d) && mkdir rootfs
curl -sS http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04-core-amd64.tar.gz | tar --exclude 'dev/*' -C rootfs -xz

# Create an OCI runtime spec
nvidia-container-runtime spec
sed -i 's;"sh";"nvidia-smi";' config.json
sed -i 's;\("TERM=xterm"\);\1, "NVIDIA_VISIBLE_DEVICES=0";' config.json

# Run the container
sudo nvidia-container-runtime run nvidia_smi
```
## Installation

### Ubuntu distributions

Install the `nvidia-container-runtime` package:

```sh
sudo apt-get install nvidia-container-runtime
```

### CentOS distributions

Install the `nvidia-container-runtime` package:

```sh
sudo yum install nvidia-container-runtime
```
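As a quick sanity check after installation (a minimal sketch; it assumes the packages placed the binaries on your `PATH` and that an NVIDIA driver is already installed on the host):

```sh
# Confirm the runtime binary is installed and on the PATH
which nvidia-container-runtime

# Query the driver and devices that the pre-start hook will see
nvidia-container-cli info
```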
## Docker Engine setup

Do not follow this section if you installed the `nvidia-docker2` package, it already registers the runtime.

To register the `nvidia` runtime, use the method below that is best suited to your environment.
### Systemd drop-in file

```sh
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```

### Daemon configuration file

```sh
sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd
```

You can optionally reconfigure the default runtime by adding the following to `/etc/docker/daemon.json`:

```json
"default-runtime": "nvidia"
```

### Command line

```sh
sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime [...]
```
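Once the runtime is registered, a container can be launched through Docker with GPU access (a minimal sketch; the image tag is illustrative and must be pullable in your environment):

```sh
# Request all GPUs and run nvidia-smi through the nvidia runtime
sudo docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:9.0-base nvidia-smi
```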
## Environment variables (OCI spec)

Each environment variable maps to a command-line argument for `nvidia-container-cli` from libnvidia-container.
These variables are already set in our official CUDA images.
### NVIDIA_VISIBLE_DEVICES

This variable controls which GPUs will be made accessible inside the container.
#### Possible values

* `0,1,2`, `GPU-fef8089b` …: a comma-separated list of GPU UUID(s) or index(es).
* `all`: all GPUs will be accessible, this is the default value in our container images.
* `none`: no GPU will be accessible, but driver capabilities will be enabled.
* `void` or *empty* or *unset*: `nvidia-container-runtime` will have the same behavior as `runc`.
Note: When running on a MIG capable device, the following values will also be available:

* `0:0,0:1,1:0`, `MIG-GPU-fef8089b/0/1` …: a comma-separated list of MIG Device UUID(s) or index(es).

Where the MIG device indices have the form `<GPU Device Index>:<MIG Device Index>` as seen in the example output:

```
$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5)
  MIG Device 0: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/0)
  MIG Device 1: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/1)
  MIG Device 2: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/11/0)
```
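For example, to expose only the first two GPUs to a container (a sketch; it assumes Docker with the `nvidia` runtime registered, and the image tag is illustrative):

```sh
sudo docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0,1 nvidia/cuda:9.0-base nvidia-smi
```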
### NVIDIA_MIG_CONFIG_DEVICES

This variable controls which of the visible GPUs can have their MIG configuration managed from within the container. This includes enabling and disabling MIG mode, creating and destroying GPU Instances and Compute Instances, etc.

#### Possible values

* `all`: Allow all MIG-capable GPUs in the visible device list to have their MIG configurations managed.

Note:
* This feature is only available on MIG capable devices (e.g. the A100).
* To use this feature, the container must be started with `CAP_SYS_ADMIN` privileges.
* When not running as `root`, the container user must have read access to the `/proc/driver/nvidia/capabilities/mig/config` file on the host.
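A sketch of launching such a container (it assumes a MIG-capable GPU and Docker with the `nvidia` runtime registered; the image tag is illustrative):

```sh
sudo docker run --runtime=nvidia --cap-add=SYS_ADMIN \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  -e NVIDIA_MIG_CONFIG_DEVICES=all \
  nvidia/cuda:11.0-base nvidia-smi
```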
### NVIDIA_MIG_MONITOR_DEVICES

This variable controls which of the visible GPUs can have aggregate information about all of their MIG devices monitored from within the container. This includes inspecting the aggregate memory usage, listing the aggregate running processes, etc.

#### Possible values

* `all`: Allow all MIG-capable GPUs in the visible device list to have their MIG devices monitored.

Note:
* This feature is only available on MIG capable devices (e.g. the A100).
* To use this feature, the container must be started with `CAP_SYS_ADMIN` privileges.
* When not running as `root`, the container user must have read access to the `/proc/driver/nvidia/capabilities/mig/monitor` file on the host.
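The launch pattern mirrors the one above, here listing the MIG devices visible to the container (again a sketch with an illustrative image tag):

```sh
sudo docker run --runtime=nvidia --cap-add=SYS_ADMIN \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_MIG_MONITOR_DEVICES=all \
  nvidia/cuda:11.0-base nvidia-smi -L
```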
### NVIDIA_DRIVER_CAPABILITIES

This option controls which driver libraries/binaries will be mounted inside the container.

#### Possible values

* `compute,video`, `graphics,utility` …: a comma-separated list of driver features the container needs.
* `all`: enable all available driver capabilities.
* *empty* or *unset*: use the default driver capabilities: `utility,compute`.

#### Supported driver capabilities

* `compute`: required for CUDA and OpenCL applications.
* `compat32`: required for running 32-bit applications.
* `graphics`: required for running OpenGL and Vulkan applications.
* `utility`: required for using `nvidia-smi` and NVML.
* `video`: required for using the Video Codec SDK.
* `display`: required for leveraging X11 display.
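For example, a video transcoding container might request only the capabilities it needs (a sketch; the image tag is illustrative):

```sh
sudo docker run --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=video,utility \
  nvidia/cuda:9.0-base nvidia-smi
```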
### NVIDIA_REQUIRE_*

A logical expression to define constraints on the configurations supported by the container.

#### Supported constraints

* `cuda`: constraint on the CUDA driver version.
* `driver`: constraint on the driver version.
* `arch`: constraint on the compute architectures of the selected GPUs.
* `brand`: constraint on the brand of the selected GPUs (e.g. GeForce, Tesla, GRID).

#### Expressions

Multiple constraints can be expressed in a single environment variable: space-separated constraints are ORed, comma-separated constraints are ANDed.
Multiple environment variables of the form `NVIDIA_REQUIRE_*` are ANDed together.
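A sketch of the expression syntax (the version numbers are illustrative): the following accepts either a 418-series or a 440-series driver, since the space-separated groups are ORed while the comma-separated constraints within each group are ANDed:

```sh
sudo docker run --runtime=nvidia \
  -e "NVIDIA_REQUIRE_DRIVER=driver>=418,driver<419 driver>=440,driver<441" \
  nvidia/cuda:9.0-base nvidia-smi
```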
### NVIDIA_DISABLE_REQUIRE

Single switch to disable all the constraints of the form `NVIDIA_REQUIRE_*`.
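For instance, to bypass a baked-in constraint that your driver fails (a sketch; it assumes a truthy value such as `true` is accepted, and the image tag is illustrative):

```sh
sudo docker run --runtime=nvidia -e NVIDIA_DISABLE_REQUIRE=true nvidia/cuda:9.0-base nvidia-smi
```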
### NVIDIA_REQUIRE_CUDA

The version of the CUDA toolkit used by the container. It is an instance of the generic `NVIDIA_REQUIRE_*` case and it is set by official CUDA images. If the version of the NVIDIA driver is insufficient to run this version of CUDA, the container will not be started.

#### Possible values

* `cuda>=7.5`, `cuda>=8.0`, `cuda>=9.0` …: any valid CUDA version in the form `major.minor`.
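To see the constraint an official image ships with, you can print its baked-in environment (a sketch; the image tag is illustrative, and this runs under the default runtime so no GPU is required):

```sh
sudo docker run --rm nvidia/cuda:9.0-base env | grep NVIDIA_REQUIRE_CUDA
```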
### CUDA_VERSION

Similar to `NVIDIA_REQUIRE_CUDA`, for legacy CUDA images.
In addition, if `NVIDIA_REQUIRE_CUDA` is not set, `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` will default to `all`.
## Issues and Contributing

Check out the Contributing document!