GPU Sharing Scheduler for Kubernetes Cluster
More and more data scientists run their NVIDIA GPU-based inference tasks on Kubernetes. Some of these tasks can share the same NVIDIA GPU device, which increases GPU utilization. An important challenge, therefore, is how to share GPUs between pods. The community is also very interested in this topic.
This project provides a GPU sharing solution for native Kubernetes. It is based on the scheduler extender and device plugin mechanisms, so you can easily reuse it in your own Kubernetes cluster.
For more details about the design of this project, please read the Design document.
To deploy, follow the Installation Guide. If you are using Alibaba Cloud Kubernetes, please follow this doc to install with Helm charts.
For usage instructions, see the User Guide.
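As a sketch of how a workload consumes shared GPU memory under this solution, the pod below requests a slice of GPU memory through the `aliyun.com/gpu-mem` extended resource instead of a whole `nvidia.com/gpu` device. The pod name, image, and the requested amount are illustrative assumptions; the resource name follows the project's convention.

```yaml
apiVersion: v1
kind: Pod
metadata:
  # Hypothetical pod name for illustration.
  name: gpu-share-demo
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:latest
    command: ["sleep", "infinity"]
    resources:
      limits:
        # Request 3 GiB of GPU memory rather than an exclusive GPU.
        # The scheduler extender bin-packs such pods onto shared devices.
        aliyun.com/gpu-mem: 3
```

Several such pods can be scheduled onto the same physical GPU as long as their combined `gpu-mem` requests fit within the device's memory.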
Build the scheduler extender image:

```
git clone https://github.com/AliyunContainerService/gpushare-scheduler-extender.git && cd gpushare-scheduler-extender
docker build -t cheyang/gpushare-scheduler-extender .
```
Build the device plugin image:

```
git clone https://github.com/AliyunContainerService/gpushare-device-plugin.git && cd gpushare-device-plugin
docker build -t cheyang/gpushare-device-plugin .
```
Build the kubectl extension binary:

```
mkdir -p $GOPATH/src/github.com/AliyunContainerService
cd $GOPATH/src/github.com/AliyunContainerService
git clone https://github.com/AliyunContainerService/gpushare-device-plugin.git
cd gpushare-device-plugin
go build -o $GOPATH/bin/kubectl-inspect-gpushare-v2 cmd/inspect/*.go
```
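Once built, the binary can be used through kubectl's plugin mechanism, which discovers executables named `kubectl-<name>` on `PATH` and exposes them as subcommands. The install path and invocation below are a hedged sketch of that mechanism; run it against a cluster where the scheduler extender and device plugin are already deployed.

```shell
# Put the plugin binary on PATH so kubectl can discover it.
cp $GOPATH/bin/kubectl-inspect-gpushare-v2 /usr/local/bin/

# Invoke it as a kubectl subcommand to inspect GPU memory
# allocation across nodes (requires a configured cluster).
kubectl inspect gpushare-v2
```
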
If you are interested in GPUShare and would like to share your experiences with others, you are warmly welcome to add your information to the ADOPTERS.md page. We will continuously discuss new requirements and feature designs with you in advance.