Best practices for building an app to run on Kubernetes
This is a list of best practices for building an app to run on Kubernetes, using cloud-native technologies.
This was born from a talk I gave at KubeCon 2017 in Austin, TX.
Contributions Welcome - This is a living document. I know that the Cloud Native / Kubernetes communities have a wealth of experience, and I want to hear about yours!
Let's get to it...
Kubernetes schedules your pods, and is more effective at doing so when it knows:
Crash-only architectures mean that your software should just crash instead of trying to recover more gracefully.
Erlang popularized this concept because it introduced a supervisor principle. The supervisor watches over your code (called a process) and restarts it if it crashes. This architecture encourages you to simply crash if something goes wrong, because you know you'll be restarted and can try again.
If you run in Kubernetes, Kubernetes itself is the "supervisor." Instead of building retry loops or other recovery logic into your app, simply crash and let Kubernetes restart your pod.
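A minimal sketch of the crash-only idea, assuming a Python service. The names `FatalError` and `run` are hypothetical and exist only for illustration. On an unrecoverable failure the function logs and returns a non-zero exit code instead of looping and retrying.

```python
import sys


class FatalError(Exception):
    """Hypothetical error type for failures the process cannot fix on its own."""


def run(connect):
    """Run one startup step and return a process exit code.

    `connect` is any callable that raises FatalError when a dependency
    is unavailable. There is deliberately no retry loop here.
    """
    try:
        connect()
    except FatalError as err:
        # Log the reason and report failure; the supervisor decides what happens next.
        print(f"fatal: {err}", file=sys.stderr)
        return 1
    return 0
```

A real entry point would call something like `sys.exit(run(connect_to_database))`, where `connect_to_database` is your own function. When the container exits with a non-zero code, the kubelet restarts it according to the pod's `restartPolicy`.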
Whether you like it or not, your app is a distributed system. You can no longer think of it as a single process (even if you're just running one pod, because the pod can move, crash, etc.). One implication of this fact, among many others, is that it's hard to rely on a specific ordering of events.
Much research has been done on the topic of ordering in distributed systems, so if your app needs to rely on ordered events, you need to understand at least the basic ideas (causal ordering, wall clock ordering, etc.), decide what kind of ordering your app needs, implement it, and then test to make sure it adheres to the definition.
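To make "causal ordering" concrete, here is a sketch of a Lamport logical clock, one well-known way to order events without trusting wall clocks. It is an illustration, not a recommendation from this document. The class and method names are my own.

```python
class LamportClock:
    """Logical clock: if event A happened before event B, A's timestamp is smaller."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Record a local event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, remote_time):
        """Merge the sender's timestamp so the receive is ordered after the send."""
        self.time = max(self.time, remote_time) + 1
        return self.time
```

Each pod would keep its own clock, attach `send()` to every outgoing message, and call `receive()` on every incoming one. The timestamps give an ordering consistent with causality, but they say nothing about real time.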
Alternatively, you can avoid ordering altogether and/or use the Kubernetes primitives to take care of it for you. Examples of these primitives:
Kubernetes is "always on" and watching over your app. If something changes, it adapts and modifies your app's deployment to bring it back to the state you want it to be in (this is called reconciliation).
Your app needs to tolerate that dynamism.
This fact means a wide variety of things:
Services or a service mesh
When you look up your app through kubectl, the API, or something else, query it by labels, not the name
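The point about labels can be sketched as follows. This models equality-based label selection in plain Python; the pod names and labels are made up, and the helper functions are hypothetical rather than part of any Kubernetes client library.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(pods, selector):
    """Return the names of all pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if matches(selector, pod["labels"])]


def selector_string(selector):
    """Render the selector in the key=value,key=value form accepted by `kubectl get -l`."""
    return ",".join(f"{key}={value}" for key, value in sorted(selector.items()))


# Pod names carry generated suffixes and change on every restart; labels do not.
pods = [
    {"name": "web-7d4b9c-abcde", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-7d4b9c-fghij", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "db-0", "labels": {"app": "db", "tier": "backend"}},
]
```

With this data, `select(pods, {"app": "web"})` finds both web pods whatever their current names are, and `selector_string({"app": "web", "tier": "frontend"})` produces a string you could pass to `kubectl get pods -l`.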
The "atom" of deployment in Kubernetes is a Pod, and pods can have more than one container. This design allows for tight coupling between containers - either they all will be running or none will be running.
So, if you have components of your app that are tightly coupled (e.g. legacy systems or "sidecars"), run them all in containers in the same pod.
We see this topology in many cases, including:
localhost to access the service mesh
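As a sketch of what a tightly coupled pod looks like, the function below builds a Pod manifest with two containers as a Python dictionary. The function name, container names, and images are placeholders. The manifest fields (`apiVersion`, `kind`, `metadata`, `spec.containers`) are the standard Pod fields.

```python
import json


def pod_with_sidecar(name, app_image, sidecar_image):
    """Build a Pod manifest whose two containers are scheduled and run together."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                # Containers in one pod share a network namespace,
                # so the app can reach the sidecar on localhost.
                {"name": "app", "image": app_image},
                {"name": "sidecar", "image": sidecar_image},
            ]
        },
    }


def to_json(manifest):
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

Printing `to_json(pod_with_sidecar("web", "example/web:1.0", "example/proxy:1.0"))` yields a document that could be piped to `kubectl apply -f -`. In practice you would usually wrap the pod template in a Deployment rather than create a bare Pod.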
Kubernetes provides a declarative API that allows you to tell it the end state in which you want your app to be, without telling it how to get there. The logic that Kubernetes follows internally to change the state of the cluster from where it is to that end state is called reconciliation.
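The idea of reconciliation can be shown with a toy model. This is not how Kubernetes controllers are implemented; it only sketches the principle of comparing a desired state with the actual state and deriving the steps between them. The state here is a hypothetical mapping from workload name to replica count.

```python
def reconcile(desired, actual):
    """Return the actions that move `actual` toward `desired`.

    Both arguments map a workload name to a replica count.
    """
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    # Anything running that nobody asked for gets stopped entirely.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("stop", name, actual[name]))
    return actions


def apply_actions(actual, actions):
    """Apply the actions to a copy of the actual state and return the result."""
    result = dict(actual)
    for verb, name, count in actions:
        delta = count if verb == "start" else -count
        result[name] = result.get(name, 0) + delta
        if result[name] == 0:
            del result[name]
    return result
```

The caller only ever states the end state (`desired`). The steps are computed, and running `reconcile` again once the states match returns an empty list, which is what makes the loop safe to repeat.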
You should always keep the latest working copy of your application checked into your source repository (e.g. GitHub) so that you can always submit it to Kubernetes to get your app into a good state.
Express your app as a Helm chart, so that you are a simple helm install away from bringing your app to a good, working state.
Similar to the principle of least privilege, you should always ask Kubernetes for the fewest resources that your app needs to run properly. Leave the rest for Kubernetes to manage for you.
Here are some examples of resources to request sparingly from Kubernetes:
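As one illustration, assuming CPU and memory are among the resources meant here, the sketch below builds a container spec that requests a small, explicit amount of each. The function name and the values are placeholders; `resources.requests` is the standard container field, and quantities such as `100m` and `64Mi` use the usual Kubernetes notation.

```python
def container_spec(name, image, cpu, memory):
    """Build a container entry that states exactly how much CPU and memory it needs."""
    return {
        "name": name,
        "image": image,
        "resources": {
            # Requests tell the scheduler the minimum this container needs to run.
            "requests": {"cpu": cpu, "memory": memory},
        },
    }
```

For example, `container_spec("web", "example/web:1.0", "100m", "64Mi")` asks for a tenth of a CPU core and 64 MiB of memory. Measure what the app really uses before choosing the numbers, then request that and no more.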
The Kubernetes API abstracts away a lot of valuable functionality that is hard to get right. This functionality is built by smart people, and tested very thoroughly. Always try to use the Kubernetes API first before you build something from scratch.
If you can't, look to the cloud native ecosystem. It has a large-and-growing number of high quality software projects that your app can benefit from.
Finally, if you can't find exactly what you need, pick the next best thing and build atop that (and open source that if you can!)
We as a community should strive to follow the don't repeat yourself (DRY) principle, and this is how we do it.