A solution to gracefully handle GCE VM terminations in Kubernetes clusters
This is not an official Google Project
This project provides an adapter for translating GCE node termination events to graceful pod terminations in Kubernetes. GCE VMs are typically live migratable. However, Preemptible VMs and VMs with Accelerators are not live migratable and are hence prone to VM terminations. Do not use this project unless you are managing Kubernetes clusters that run non-migratable VM types.
To deploy this solution to a GKE or a GCE cluster:
```shell
kubectl apply -f deploy/
```
Note: This solution requires kubernetes versions >= 1.11 to work on Preemptible nodes.
The app deployed as part of this solution does the following:
- It deletes the pods that are not in the `kube-system` namespace first, before deleting the ones in it. Certain system pods like logging agents might need more time to flush out logs prior to termination, and for this reason pods in the `kube-system` namespace are deleted last.
The agent crashes whenever it encounters an unrecoverable error with the metadata APIs. This agent is not production hardened yet, so use it with caution.
Pods that are not in the `kube-system` namespace are called regular pods in this agent. By default, regular pods are deleted immediately, before system pods are deleted. If you want to delete regular pods gracefully, add `--system-pod-grace-period=n` to the arguments according to the following rule:
- Replace `n` with a value from `0s` to the value of `(--regular-vm-timeout / 2) - 1`.
With a value in that range, `VM timeout - system-pod-grace-period` will be given as the grace period for deleting regular pods. Note that the `VM timeout` of a Preemptible VM is 30 seconds.
If you specify `0s`, the system pods will be terminated immediately and the regular pods will have about 30 seconds of grace period. If you specify `14s`, both system and regular pods will have about `14s` of grace period.
Note that half of the VM timeout (e.g. 30s / 2 for a Preemptible VM) cannot be used as the maximum value of `--system-pod-grace-period` for regular pods.
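The range and the resulting split can be checked with a little arithmetic before picking a value. The following is a minimal sketch, not part of the project: the helper names `max_system_grace`, `regular_grace`, and `is_valid_system_grace` are hypothetical, all values are whole seconds, and the formulas are simply the ones stated above.

```shell
# Hypothetical helpers for illustration only; the agent does not ship them.
# All arguments and results are whole seconds.

# Upper bound from the rule (VM timeout / 2) - 1.
# $1 = VM timeout
max_system_grace() {
  echo $(( $1 / 2 - 1 ))
}

# Grace period for regular pods: VM timeout - system pod grace period.
# $1 = VM timeout, $2 = system pod grace period
regular_grace() {
  echo $(( $1 - $2 ))
}

# Succeeds only when the requested value lies between 0 and the upper bound.
# $1 = VM timeout, $2 = requested system pod grace period
is_valid_system_grace() {
  [ "$2" -ge 0 ] && [ "$2" -le "$(max_system_grace "$1")" ]
}

preemptible_timeout=30   # Preemptible VM timeout stated in this README
echo "max system grace: $(max_system_grace "$preemptible_timeout")s"
echo "regular grace with 0s: $(regular_grace "$preemptible_timeout" 0)s"

if ! is_valid_system_grace "$preemptible_timeout" 15; then
  echo "15s is rejected: half of the VM timeout is above the upper bound"
fi
```

For a Preemptible VM this reports an upper bound of 14 seconds, which is why `14s` is the largest value used in the examples above.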
In addition, if the actual delete fails, the agent retries internally using exponential backoff. In that case, the grace period is set with the elapsed time taken into account, so the actual grace period may be shorter than the configured one.
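To see how retries eat into the grace period, the schedule can be simulated without touching a cluster. This is only a sketch under assumptions: the initial delay of 1 second and the doubling factor are illustrative choices, not the agent's actual backoff parameters, and `simulate_retries` is a hypothetical name. Nothing is deleted and nothing sleeps; the function only prints the grace period that would remain at each attempt.

```shell
# Simulated retry schedule; assumed backoff values, no real waiting or deletion.
# $1 = grace period in seconds
# $2 = initial backoff delay in seconds (assumption)
# $3 = number of failed attempts before the delete succeeds
simulate_retries() {
  grace=$1
  delay=$2
  failures=$3
  elapsed=0
  attempt=1
  while [ "$attempt" -le $(( failures + 1 )) ]; do
    # The grace period passed to each attempt shrinks by the time already spent.
    remaining=$(( grace - elapsed ))
    if [ "$remaining" -lt 0 ]; then
      remaining=0
    fi
    echo "attempt=$attempt elapsed=${elapsed}s grace=${remaining}s"
    elapsed=$(( elapsed + delay ))
    delay=$(( delay * 2 ))   # exponential backoff: double the wait each time
    attempt=$(( attempt + 1 ))
  done
}

# Three failures, then success on the fourth attempt, with 30s of grace.
simulate_retries 30 1 3
```

With these assumed values the fourth attempt starts after 7 seconds of backoff, so a pod that was meant to get 30 seconds is left with 23.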