Kubernetes Gated Deployments
Kubernetes Gated Deployments facilitates A/B tests on application deployments to perform automated regression testing and canary analysis.
Kubernetes Gated Deployments extends the Kubernetes API with a new type of object called `GatedDeployment`, which lets engineers specify which deployments and decision plugins to include in the A/B test. A controller in the Kubernetes control plane implements this behavior: it retrieves and analyzes metrics from the specified backend, decides the outcome of the A/B test, and either rolls back or continues the deployment.
1. A `GatedDeployment` object is added to the cluster (see the section on installing the controller).
2. The controller retrieves `GatedDeployment` objects using the Kubernetes API.
3. When the `treatment` deployment specified in the `GatedDeployment` object is deployed and eligible for an A/B test, i.e., it has more than zero replicas and a different pod spec than the `control` deployment, the controller starts the experiment.
4. The decision plugins retrieve and analyze metrics to determine whether the `treatment` deployment is causing harm to the metrics measured.
5. Based on the result, the controller either rolls back the `treatment` deployment (by setting its number of replicas to zero), or promotes the `treatment` deployment by setting the `control` deployment's image to that of the `treatment` deployment and then scaling the `treatment` deployment down to zero replicas.
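The eligibility rule above (more than zero replicas and a pod spec different from the control's) can be sketched as follows. This is a minimal illustration, not the controller's actual code: the function names `isEligibleForExperiment`, `podSpecsDiffer`, and `makeDeployment` are hypothetical, deployments are assumed to be plain objects shaped like Kubernetes `Deployment` resources, and the image tags are made up for the example.

```javascript
// Hypothetical sketch of the eligibility rule; not the controller's actual code.
// Deployments are plain objects shaped like Kubernetes Deployment resources.

// Compares pod specs by JSON serialization (a simplification: it is sensitive
// to key order and to fields the API server fills in with defaults).
function podSpecsDiffer(control, treatment) {
  return (
    JSON.stringify(control.spec.template.spec) !==
    JSON.stringify(treatment.spec.template.spec)
  );
}

// Eligible: the treatment has more than zero replicas and a pod spec that
// differs from the control's.
function isEligibleForExperiment(control, treatment) {
  return treatment.spec.replicas > 0 && podSpecsDiffer(control, treatment);
}

// Builds a minimal deployment-like object for the usage examples below.
function makeDeployment(image, replicas) {
  return {
    spec: { replicas, template: { spec: { containers: [{ name: 'app', image }] } } },
  };
}

const control = makeDeployment('example-rest-service:1.0', 3);
console.log(isEligibleForExperiment(control, makeDeployment('example-rest-service:1.1', 0))); // false
console.log(isEligibleForExperiment(control, makeDeployment('example-rest-service:1.1', 1))); // true
console.log(isEligibleForExperiment(control, makeDeployment('example-rest-service:1.0', 1))); // false
```

Serializing the pod spec keeps the sketch short; a real comparison would more likely ignore defaulted fields.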
Prerequisite: `kubectl`
To create the `GatedDeployment` controller on an existing Kubernetes cluster, run the following:

```shell
kubectl apply -f gated-deployments.yml
```

This creates all the necessary resources and deploys the controller in the `kubernetes-gated-deployments` namespace.
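Alternatively, Helm can be used to install and manage the resources and controller. To install, run the following:

```shell
helm install helm/kubernetes-gated-deployments --name kubernetes-gated-deployments
```

The `--name` flag is Helm 2 syntax. With Helm 3, pass the release name as the first argument instead: `helm install kubernetes-gated-deployments helm/kubernetes-gated-deployments`.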
See the Developing section for running locally during development.
`control` and `treatment` deployments
Create two identical deployments with different names (e.g., `example-rest-service-control` and `example-rest-service-treatment`). Initially, the treatment deployment must have its number of replicas set to 0, so the control deployment is the only one serving production traffic.
Example deployment manifests are available here.
NOTE: Neither deployment name can be a prefix of the other. The gated deployment controller uses the deployment name as the host prefix (since the names of a deployment's pods begin with the deployment name followed by `-`), so if one deployment name is a prefix of the other, the shorter name matches the pods of both deployments. For example, using `example-rest-service` for control and `example-rest-service-treatment` for treatment will result in the control data including the data for treatment as well.
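The prefix problem can be illustrated with a small check. This is a sketch only: `namesOverlap` and `podsMatchingPrefix` are hypothetical helpers (the controller may not perform such a check itself), and the pod names are illustrative values of the usual `<deployment name>-<hash>-<suffix>` shape.

```javascript
// Hypothetical helpers illustrating the naming constraint; not part of the controller.

// True when one deployment name is a prefix of the other, i.e. a host-prefix
// match on the shorter name would also select the other deployment's pods.
function namesOverlap(controlName, treatmentName) {
  return controlName.startsWith(treatmentName) || treatmentName.startsWith(controlName);
}

// Shows which pods a plain prefix match on a deployment name would select.
function podsMatchingPrefix(podNames, deploymentName) {
  return podNames.filter((pod) => pod.startsWith(deploymentName));
}

// Illustrative pod names for a control named "example-rest-service" and a
// treatment named "example-rest-service-treatment".
const pods = [
  'example-rest-service-7d9f8c6b5-abcde',
  'example-rest-service-treatment-5c4b7d8f9-fghij',
];

console.log(namesOverlap('example-rest-service', 'example-rest-service-treatment'));         // true
console.log(namesOverlap('example-rest-service-control', 'example-rest-service-treatment')); // false
console.log(podsMatchingPrefix(pods, 'example-rest-service').length);                         // 2
```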
Create an `example-rest-service-gated-deployment.yml` file like the one below:
```yaml
apiVersion: 'kubernetes-client.io/v1'
kind: GatedDeployment
metadata:
  name: example-rest-service
deploymentDescriptor:
  control:
    name: example-rest-service-control
  treatment:
    name: example-rest-service-treatment
  decisionPlugins:
  - name: newRelicPerformance
    accountId: 807783
    secretName: newrelic-secrets
    secretKey: example-rest-service
    appName: example-rest-service
    minSamples: 50
    maxTime: 600
    testPath: /shopper/products
```
Save the file and run:

```shell
kubectl apply -f example-rest-service-gated-deployment.yml
```
In the example above, the `GatedDeployment` object specifies that we want to gate our deployments on the performance of the `/shopper/products` path, comparing the `example-rest-service-control` deployment with the `example-rest-service-treatment` deployment (the one we deploy new changes to). We also specify that the controller should use the `newRelicPerformance` decision plugin to analyze performance data, which is retrieved from New Relic (our application is instrumented with New Relic).
For this plugin, you also need to create a secret containing the New Relic API key; an example is shown below. In this case, `secretName` is set to `newrelic-secrets` and `secretKey` is set to `example-rest-service`. This means the controller looks in the namespace it is deployed in for a secret called `newrelic-secrets`, and reads the value stored under the key `example-rest-service` in the secret's data.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: newrelic-secrets
type: Opaque
data:
  example-rest-service: aW5zaWdodHNBcGlLZXk=
```
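Values under a Secret's `data` field are base64-encoded; the value in the example Secret above decodes to the placeholder string `insightsApiKey`. The sketch below uses Node's `Buffer` to produce or inspect such a value; the helper names are hypothetical, and a real API key would replace the placeholder.

```javascript
// Hypothetical helpers for preparing and inspecting Secret `data` values.

// Encodes a plaintext API key as base64 for the Secret's `data` field.
function toSecretDataValue(apiKey) {
  return Buffer.from(apiKey, 'utf8').toString('base64');
}

// Decodes a base64 `data` value back to plaintext.
function fromSecretDataValue(encoded) {
  return Buffer.from(encoded, 'base64').toString('utf8');
}

console.log(toSecretDataValue('insightsApiKey'));         // aW5zaWdodHNBcGlLZXk=
console.log(fromSecretDataValue('aW5zaWdodHNBcGlLZXk=')); // insightsApiKey
```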
The following options can be set in the `deploymentDescriptor` section of the `GatedDeployment` object. All options are required unless explicitly marked as optional.

|Property|Description|
|---|---|
|`control`|Section describing the control deployment.|
|`control.name`|Name of the control deployment.|
|`treatment`|Section describing the treatment deployment. This should be the one normally deployed, e.g., as part of your CI/CD pipeline.|
|`treatment.name`|Name of the treatment deployment.|
|`decisionPlugins`|List of decision plugin configuration objects. See Plugin configurations below for details on specific plugins.|
Each type of plugin requires its own configuration. The following parameters are common to all plugins:

|Property|Description|
|---|---|
|`name`|The plugin name, which the plugin factory uses to find the correct plugin class.|
|`maxTime` (optional)|The maximum duration of the experiment, in seconds. When it is reached, the A/B test stops and the treatment deployment is automatically rolled out to the control deployment. Defaults to 600 seconds (10 minutes).|
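The option tables above can be read as a small validation step. The sketch below is a hypothetical `normalizeDescriptor` helper, not the controller's code: it checks the required fields, applies the 600-second `maxTime` default, and treats an empty `decisionPlugins` list as an error, which is an assumption of this sketch.

```javascript
// Hypothetical validator for a deploymentDescriptor, following the option
// tables above. The controller's own validation may differ.
const DEFAULT_MAX_TIME_SECONDS = 600;

function normalizeDescriptor(descriptor) {
  const errors = [];
  if (!descriptor?.control?.name) errors.push('control.name is required');
  if (!descriptor?.treatment?.name) errors.push('treatment.name is required');
  const plugins = descriptor?.decisionPlugins;
  // Assumption: an empty plugin list is rejected.
  if (!Array.isArray(plugins) || plugins.length === 0) {
    errors.push('decisionPlugins must be a non-empty list');
  }
  if (errors.length > 0) throw new Error(errors.join('; '));

  return {
    ...descriptor,
    decisionPlugins: plugins.map((plugin, i) => {
      if (!plugin.name) throw new Error(`decisionPlugins[${i}].name is required`);
      // maxTime is optional and defaults to 600 seconds (10 minutes).
      return { maxTime: DEFAULT_MAX_TIME_SECONDS, ...plugin };
    }),
  };
}

const normalized = normalizeDescriptor({
  control: { name: 'example-rest-service-control' },
  treatment: { name: 'example-rest-service-treatment' },
  decisionPlugins: [{ name: 'newRelicPerformance', minSamples: 50 }],
});
console.log(normalized.decisionPlugins[0].maxTime); // 600
```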
Plugins are designed to return one of three values:
* `WAIT`: if the analysis cannot reach a conclusion about the metric yet, e.g., it requires a minimum amount of time or the result is not yet statistically significant
* `PASS`: if the treatment version does no harm to the metric analyzed
* `FAIL`: if the treatment does harm to the metric analyzed
The `newRelicPerformance` plugin accepts the following parameters:

|Property|Description|
|---|---|
|`name`|Must be `newRelicPerformance`.|
|`accountId`|Account ID of the New Relic account integrated with your application.|
|`secretName`|Name of the secret where your New Relic API keys can be found. The secret must be created in the namespace where `kubernetes-gated-deployments` is deployed.|
|`secretKey`|Key in the secret named by `secretName` whose value is the New Relic Insights API key, used to run NRQL queries that collect performance data.|
|`appName`|Name of the New Relic application.|
|`testPath`|Path whose performance is measured for both deployments.|
|`minSamples`|The minimum number of samples required from each deployment before testing for significance.|
|`zScoreThreshold` (optional)|The z-score threshold for the Mann-Whitney U test. Defaults to 1.96, which corresponds to a two-sided p-value of 0.05.|
|`harmThreshold` (optional)|Maximum allowable ratio of the treatment U value to the control U value from the Mann-Whitney U test before the treatment is marked as causing harm. Defaults to 1.5.|
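How `minSamples`, `zScoreThreshold`, and `harmThreshold` might combine can be sketched with a textbook Mann-Whitney U test on response times. This is a hypothetical decision rule assembled from the parameter descriptions, not the plugin's actual implementation: it uses the normal approximation without tie correction, returns `WAIT` until the difference is significant, and then compares the ratio of U values against `harmThreshold`.

```javascript
// Hypothetical sketch of a Mann-Whitney U decision rule for latency samples.
// The parameter names mirror the table above; the decision logic is an assumption.
const DecisionResult = Object.freeze({ WAIT: 'WAIT', PASS: 'PASS', FAIL: 'FAIL' });

// U statistic of sample a against sample b: the number of pairs (x from a, y from b)
// with x > y, where ties count as 0.5. O(n * m), which is fine for a sketch.
function uStatistic(a, b) {
  let u = 0;
  for (const x of a) {
    for (const y of b) {
      if (x > y) u += 1;
      else if (x === y) u += 0.5;
    }
  }
  return u;
}

// treatment and control are arrays of response times (higher is worse).
function decide(treatment, control, { minSamples, zScoreThreshold = 1.96, harmThreshold = 1.5 }) {
  const n1 = treatment.length;
  const n2 = control.length;
  const required = Math.max(1, minSamples);
  if (n1 < required || n2 < required) return DecisionResult.WAIT;

  const uTreatment = uStatistic(treatment, control); // pairs where treatment is slower
  const uControl = n1 * n2 - uTreatment;             // pairs where control is slower

  // Normal approximation of U under the null hypothesis (no tie correction).
  const mean = (n1 * n2) / 2;
  const sd = Math.sqrt((n1 * n2 * (n1 + n2 + 1)) / 12);
  const z = (uTreatment - mean) / sd;

  if (Math.abs(z) < zScoreThreshold) return DecisionResult.WAIT; // not significant yet
  // uControl is 0 when every treatment sample is slower; the ratio is then Infinity.
  return uTreatment / uControl > harmThreshold ? DecisionResult.FAIL : DecisionResult.PASS;
}

const fast = Array.from({ length: 50 }, (_, i) => 100 + i);
const slow = Array.from({ length: 50 }, (_, i) => 200 + i);
console.log(decide(slow, fast, { minSamples: 50 })); // FAIL
console.log(decide(fast, slow, { minSamples: 50 })); // PASS
console.log(decide(fast, fast, { minSamples: 50 })); // WAIT
```

Here the treatment is judged harmful only when it is both significantly different and slower by more than the allowed U ratio; a significant improvement yields `PASS`.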
To contribute a new plugin, create a new plugin class in `lib/plugins` that is a subclass of `Plugin`. At minimum, you should implement the following methods:
* `build`: creates the plugin and performs any necessary setup
* `_poll`: called periodically; it should fetch and analyze metrics and return a `DecisionResult`

The following methods are implemented by default:
* `onExperimentStart`: called when the experiment starts; sets the experiment start time
* `onExperimentStop`: called when an experiment ends; clears the experiment start time
* `onExperimentPoll`: called on every polling interval; returns `PASS` if the maximum experiment duration has been reached, and otherwise returns the result of `_poll`
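The plugin lifecycle described above can be sketched in a self-contained form. The `Plugin` class below is a simplified, hypothetical stand-in written from the method descriptions, not the real class in `lib/plugins`; `build` is assumed to be a static factory, the injectable `now` clock is added for testing, and `ErrorRatePlugin` with its `readErrorRate` and `maxErrorRate` options is an invented example.

```javascript
// Simplified, hypothetical stand-in for the Plugin base class, written from the
// method descriptions above. The real class in lib/plugins may differ in detail.
const DecisionResult = Object.freeze({ WAIT: 'WAIT', PASS: 'PASS', FAIL: 'FAIL' });

class Plugin {
  constructor({ maxTime = 600 } = {}) {
    this.maxTime = maxTime;          // maximum experiment duration in seconds
    this.experimentStartTime = null; // milliseconds since epoch, or null
    this.now = () => Date.now();     // clock; replaceable in tests
  }

  onExperimentStart() {
    this.experimentStartTime = this.now();
  }

  onExperimentStop() {
    this.experimentStartTime = null;
  }

  // PASS once maxTime has elapsed, otherwise defer to the subclass's _poll.
  async onExperimentPoll() {
    if (this.experimentStartTime === null) return DecisionResult.WAIT; // no experiment running
    const elapsedSeconds = (this.now() - this.experimentStartTime) / 1000;
    if (elapsedSeconds >= this.maxTime) return DecisionResult.PASS;
    return this._poll();
  }

  async _poll() {
    throw new Error('_poll must be implemented by a subclass');
  }
}

// Hypothetical plugin: compares an error rate, supplied by an injected async
// reader, against a fixed limit.
class ErrorRatePlugin extends Plugin {
  // build creates the plugin and performs any setup it needs.
  static build(config) {
    return new ErrorRatePlugin(config);
  }

  constructor({ readErrorRate, maxErrorRate = 0.01, ...rest }) {
    super(rest);
    this.readErrorRate = readErrorRate;
    this.maxErrorRate = maxErrorRate;
  }

  async _poll() {
    const rate = await this.readErrorRate();
    if (rate === null) return DecisionResult.WAIT; // no data yet
    return rate > this.maxErrorRate ? DecisionResult.FAIL : DecisionResult.PASS;
  }
}
```

A real plugin would usually return `WAIT` until it has enough data for a meaningful comparison, as the `newRelicPerformance` parameters suggest.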
To roll out a new version, update the treatment deployment with the new image and set its number of replicas to a non-zero value (depending on the percentage of traffic you want to send to the new version).
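Choosing the treatment replica count for a target traffic percentage can be sketched under an assumption this README does not state explicitly: a single Service selects the pods of both deployments and spreads requests roughly evenly across ready pods, so each deployment's share is proportional to its replica count. The function names are hypothetical.

```javascript
// Assumption: one Service selects pods from both deployments and spreads
// requests roughly evenly across ready pods, so share ~ replica count.

// Approximate fraction of traffic reaching the treatment deployment.
function treatmentTrafficShare(controlReplicas, treatmentReplicas) {
  return treatmentReplicas / (controlReplicas + treatmentReplicas);
}

// Treatment replicas for a target share p: t / (c + t) = p  =>  t = p * c / (1 - p),
// rounded to the nearest whole replica and never below 1.
function treatmentReplicasFor(controlReplicas, targetShare) {
  if (targetShare <= 0 || targetShare >= 1) {
    throw new RangeError('targetShare must be strictly between 0 and 1');
  }
  return Math.max(1, Math.round((targetShare * controlReplicas) / (1 - targetShare)));
}

console.log(treatmentReplicasFor(9, 0.1)); // 1
console.log(treatmentTrafficShare(9, 1));  // 0.1
```

Because replica counts are whole numbers, small control deployments can only approximate low percentages; with 3 control replicas, one treatment replica already receives about 25% of traffic.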
Once the treatment deployment is rolled out, the gated deployment controller starts a new experiment and begins polling the decision plugins for decisions. The experiment runs until either all plugins have returned `PASS` or any single plugin returns `FAIL`, at which point the controller sets the `gatedDeployStatus` annotation on the treatment deployment to `noHarm` or `harm`, respectively.
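The stopping rule above can be sketched as a pure function. This assumes the controller considers the latest decision from each plugin; `combineDecisions` and `gatedDeployStatusFor` are hypothetical names, while the `harm` and `noHarm` values come from this README.

```javascript
// Hypothetical sketch; assumes the controller looks at the latest decision from each plugin.
const DecisionResult = Object.freeze({ WAIT: 'WAIT', PASS: 'PASS', FAIL: 'FAIL' });

// Any FAIL ends the experiment as harmful; all PASS ends it as harmless; otherwise wait.
function combineDecisions(results) {
  if (results.includes(DecisionResult.FAIL)) return DecisionResult.FAIL;
  if (results.length > 0 && results.every((r) => r === DecisionResult.PASS)) {
    return DecisionResult.PASS;
  }
  return DecisionResult.WAIT;
}

// gatedDeployStatus annotation value for the treatment deployment,
// or null while the experiment continues.
function gatedDeployStatusFor(decision) {
  if (decision === DecisionResult.FAIL) return 'harm';
  if (decision === DecisionResult.PASS) return 'noHarm';
  return null;
}

console.log(gatedDeployStatusFor(combineDecisions(['PASS', 'WAIT']))); // null
console.log(gatedDeployStatusFor(combineDecisions(['PASS', 'FAIL']))); // harm
console.log(gatedDeployStatusFor(combineDecisions(['PASS', 'PASS']))); // noHarm
```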
For example, the following command gets the value of the annotation:

```shell
kubectl get deploy -o jsonpath='{.metadata.annotations.gatedDeployStatus}' example-rest-service-treatment
```
This value can be polled periodically in the application's CI/CD pipeline to check whether the new version is causing harm. If the deployment causes no harm, the controller automatically rolls out the new version to the control deployment. The status of that rollout can be checked with the following command:
```shell
kubectl rollout status deploy/example-rest-service-control
```
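A CI/CD step that polls the annotation might look like the sketch below. It shells out to the same `kubectl get deploy ... -o jsonpath=...` command shown earlier via Node's `child_process.execFile`; the function names, interval, and timeout are hypothetical choices, and any annotation value other than `harm` or `noHarm` is assumed to mean the experiment is still running.

```javascript
const { execFile } = require('child_process');
const { promisify } = require('util');

const execFileAsync = promisify(execFile);

// Reads the gatedDeployStatus annotation with the kubectl command shown earlier.
// execFile does not use a shell, so the jsonpath argument needs no quoting.
async function readGatedDeployStatus(deployment) {
  const { stdout } = await execFileAsync('kubectl', [
    'get', 'deploy', deployment,
    '-o', 'jsonpath={.metadata.annotations.gatedDeployStatus}',
  ]);
  return stdout.trim();
}

// Polls until the status is 'harm' or 'noHarm'; anything else (e.g. an empty
// string) is treated as "experiment still running" (an assumption).
async function waitForGatedDeployStatus(readStatus, { intervalMs = 10000, timeoutMs = 900000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await readStatus();
    if (status === 'harm' || status === 'noHarm') return status;
    if (Date.now() >= deadline) throw new Error('timed out waiting for gatedDeployStatus');
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Example CI step (requires kubectl access to the cluster):
// waitForGatedDeployStatus(() => readGatedDeployStatus('example-rest-service-treatment'))
//   .then((status) => process.exit(status === 'noHarm' ? 0 : 1));
```

Passing the status reader as a function keeps the polling loop independent of `kubectl`, so it can be exercised with a fake reader.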
See CONTRIBUTING.md for how to contribute to this project.
You can develop locally with Minikube.
On Linux, the `kvm2` driver provides better performance than the default `virtualbox` driver, but either will work:

```shell
minikube start --vm-driver=kvm2
```
`minikube start` configures your `kubeconfig` for your local Minikube cluster and sets the current context to Minikube. With that configuration, you can run the `kubernetes-gated-deployments` controller on your host operating system:

```shell
npm start
```
Kubernetes Gated Deployments is MIT licensed.