NVIDIA GPU Prometheus Exporter
This is a Prometheus Exporter for exporting NVIDIA GPU metrics. It uses the Go bindings for NVIDIA Management Library (NVML) which is a C-based API that can be used for monitoring NVIDIA GPU devices. Unlike some other similar exporters, it does not call the
nvidia-smibinary.
The repository includes
nvml.h, so there are no special requirements from the build environment.
go getshould be able to build the exporter binary.
go get github.com/mindprince/nvidia_gpu_prometheus_exporter
The exporter requires the following: - access to NVML library (
libnvidia-ml.so.1). - access to the GPU devices.
To make sure that the exporter can access the NVML libraries, either add them to the search path for shared libraries. Or set
LD_LIBRARY_PATHto point to their location.
By default the metrics are exposed on port
9445. This can be updated using the
-web.listen-addressflag.
There's a docker image available on Docker Hub at mindprince/nvidiagpuprometheus_exporter
If you are running the exporter inside a container, you will need to do the following to give the container access to NVML library:
-e LD_LIBRARY_PATH= --volume :
And you will need to do one of the following to give it access to the GPU devices: - Run with
--privileged- If you are on docker v17.04.0-ce or above, run with
--device-cgroup-rule 'c 195:* mrw'- Run with
--device /dev/nvidiactl:/dev/nvidiactl /dev/nvidia0:/dev/nvidia0 /dev/nvidia1:/dev/nvidia1
If you don't want to do the above, you can run it using nvidia-docker.
nvidia-docker run -p 9445:9445 -ti mindprince/nvidia_gpu_prometheus_exporter:0.1