Discussion on Hacker News
This article and its topic were the subject of a fairly detailed discussion in a thread on Hacker News. Reading through the comments should be well worth your time, although you might emerge completely confused by the experience. And no, tech guys do not hate each other; we just have a wide spectrum of viewpoints.
It is impossible to escape hearing about Kubernetes in your organization. The red hot container orchestration technology from Google is regularly part of elevator talk and water cooler conversations. It is not often that you find a single technology that everyone, from the DevOps intern to the CTO, swears by, whether they fully understand it or not. But reading about Kubernetes can leave you more confused than you were before, and you may fail to understand what the hoopla is all about. Also, many present Kubernetes as a silver bullet, which is far from the truth. Whether you can adopt Kubernetes, and how deeply, depends on multiple factors, and I intend to throw light on some of the key ones. I shall also give you a step-by-step approach to gradually evaluating and introducing Kubernetes into your organization. Kubernetes is great technology and there are significant benefits to be had from it, if it’s employed properly.
1. Pets to herds
There were days when servers were lovingly given host names and sticky notes and taken care of by system administrators. Cut to the present: we don’t really know what type of servers run our workloads, since those gray, rack-mounted boxes in cold server rooms and data centers have given way to public clouds run by Amazon, Microsoft and Google. For a while, system administrators continued naming virtual machines running on public clouds as well, like they would their pets. There used to be times when getting servers provisioned could take weeks of talking to sales people over the phone or over email. But with high-performance virtualization and public clouds came APIs. A simple API call could spin up, in seconds, the same machine that would earlier have taken weeks and a conversation with a sales person. And it all took off from here. Very quickly, clever tech folks realized that public clouds provided not only quick access to compute, but also the potential for automation via simple-to-use RESTful APIs. This was pure gold.
2. Infrastructure as code and DevOps
Then we saw technologies like Chef, Puppet, Ansible and SaltStack. With these, the line between dev and ops began to blur. While system administrators rarely ventured beyond shell scripts, systems like Chef, Puppet and Ansible were full-blown system orchestration frameworks. While Puppet uses a domain specific language, or DSL, to let system administrators define the end state of their infrastructure setup, Chef uses a Ruby-based DSL that brings the full power of Ruby’s expressiveness to defining infrastructure end state. These technologies put us squarely in the era of declarative and imperative infrastructure, where it was possible to define the end state of server infrastructure using code in files, which could then be versioned like regular source code. This is very different from system administrators setting up servers manually, then logging into them to set up various software services and hardware resources like storage or networking. Where the best they could do was run shell scripts that provided some repeatability, Chef and Puppet provided centralized management of system configuration and very powerful APIs that made managing a large number of servers less cumbersome.
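To make the “end state” idea a little more concrete, here is a minimal sketch in Python. It is not how Chef or Puppet are implemented; the function names plan and apply and the package data are made up purely for illustration. The point it tries to show is that you describe what the server should look like, and the tool works out the steps, so running it a second time does nothing.

```python
def plan(desired: dict[str, str], actual: dict[str, str]) -> list[tuple[str, str, str]]:
    """Work out the actions needed to move `actual` to `desired`.

    Both arguments map a package name to a version (hypothetical data model).
    """
    actions = []
    for name in sorted(desired):
        version = desired[name]
        if name not in actual:
            actions.append(("install", name, version))
        elif actual[name] != version:
            actions.append(("upgrade", name, version))
    # Anything present on the server but absent from the definition is removed.
    for name in sorted(set(actual) - set(desired)):
        actions.append(("remove", name, actual[name]))
    return actions


def apply(actions: list[tuple[str, str, str]], actual: dict[str, str]) -> dict[str, str]:
    """Return the new server state after carrying out the planned actions."""
    state = dict(actual)
    for verb, name, version in actions:
        if verb == "remove":
            state.pop(name)
        else:
            state[name] = version
    return state


if __name__ == "__main__":
    desired = {"nginx": "1.24", "openssl": "3.0"}
    actual = {"nginx": "1.18", "telnet": "0.17"}
    actions = plan(desired, actual)
    print(actions)
    # A second run against the converged state should have nothing left to do.
    print(plan(desired, apply(actions, actual)))
```

Because the definition is just data in a file, it can be versioned and reviewed like any other source code, which is what sets this apart from a one-off shell script.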
3. The rise of Docker
Another powerful technology then made it very easy to use Linux primitives to put together and manage containers: Docker. Contrary to a common misunderstanding, Docker is not itself a container technology. Linux containers are built using Linux’s operating system primitives: cgroups (control groups) and namespaces. Docker made it easy to build and manage containers. Docker deserves an article/video of its own, but that’s for another day. With Docker, it became easy to package up an application along with all its dependencies, containerizing it. Much like a real container, “shipping” this became very easy. This container could then be run on any Linux machine without the pain of having to first install all of its dependencies, which could be Linux distribution specific. This made it easy to run Nginx from Debian Linux, pair it with the Python Flask framework from Ubuntu, while using MySQL from Alpine, with these packages coming together to run the user’s application, all on the very same Linux server. Docker, in summary, became a standard way to build containers declaratively, then run and manage their lifecycle on Linux servers, harnessing Linux’s underlying container technologies and providing tremendous utility. Docker, understandably, was all the rage when it first came out.
While technologies like Chef and Puppet were good at managing the configuration of physical and virtual machines, containers were quickly becoming the standard way DevOps engineers deployed applications. Also, containers encapsulate much of the application deployment logic, making the rest of the orchestration a lot simpler. The time was right for a more modern container-centric orchestration system.
However, in the meantime, there was another rapidly evolving methodology which would move ops more and more into the realm of the developer: DevOps. Engineers had figured out that while the ops folks wanted stability, developers wanted new features to be released all the time, challenging the stability ops sought so much. These were conflicting goals and often put the teams at loggerheads, directly affecting product shipping velocity. Also, when product developers aren’t aware of the problems their code causes in the production environment, they don’t tend to put in the effort to fix them. This was because the production environment was traditionally the ops team’s headache. To solve these problems, teams started trying out a new methodology, DevOps. The idea was simple: those who wrote the code ran it in production. They were the ones who were put on call, too. No more lobbing the code over a wall and forgetting about it. If they wanted to release new features more often, they figured out what they needed to do on the ops side to support that. Technologies like the cloud and containers, with their APIs and tooling, made it possible to bring a software engineering approach to cloud operations. Fast moving DevOps teams were building large applications, often composed of microservices that could be independently developed and released without incurring the overhead of changing, testing and releasing one large monolithic app. Teams practicing DevOps with a microservices architecture loved the elasticity of the cloud and containers, both easily controlled with scriptable tools and APIs.
4. Enter Kubernetes
Where DevOps and microservices meet containers, a new, more native orchestration system made more sense. Inspired by Google’s Borg system, Kubernetes is an open source container orchestration system built initially by engineers from Google and now maintained by the Cloud Native Computing Foundation, or the CNCF. As a container orchestration platform, Kubernetes can help automate application deployment, scaling and management. While systems like Docker manage containers within a server, Kubernetes does much more; one of its main features is managing a cluster of servers, or nodes, on which containers can run. Comparable systems to Kubernetes are Apache Mesos and Docker Swarm. While there were debates as to which system was going to become a standard, it is by now more than clear that Kubernetes has emerged as the winner. Choosing Kubernetes for your container orchestration strategy is a very safe bet at this point.
4.1 Not just Docker
It is important to note that Kubernetes can orchestrate more than just containers managed by Docker. It can also orchestrate containers managed by systems similar to Docker, like containerd, CRI-O and rktlet. While you can create and manage containers with any of these systems, for the purposes of this article/video, we will use “Docker” as shorthand for container management in general. Now, let’s look at some of Kubernetes’s most important features.
4.2 Why Kubernetes
To understand why a system like Kubernetes is important, it might be a good idea to think about where the capabilities of Docker end. While Docker makes it easy to manage the lifecycle of containers within a server, Kubernetes makes it easy to manage a cluster of servers running Docker containers. Also, modern, microservices based applications are usually made up of several containers. Kubernetes provides a notion of an “application deployment”, which is essentially a set of containers that make up an app, running in a distributed fashion on the cluster. When you want an application made up of a bunch of containers to run, you just tell Kubernetes about it; it finds out which of the nodes on the cluster have enough compute resources to run the containers and schedules them there. It can also take care of restarting containers when they fail, or even scaling your app based on some parameters by running more containers to take on traffic surges. This is the essence of Kubernetes, and this is what people are referring to when they’re talking about “container orchestration”.
When running multiple applications distributed on a cluster, there are other nice-to-have features that ease their management. Kubernetes has features that make it easier to manage application configuration and credentials. Kubernetes also manages other bits of infrastructure, like storage and networking. Together with compute, these make up the three basic building blocks of any infrastructure.
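Here is a toy sketch, in Python, of the two ideas just described: placing a container on a node that has room for it, and deciding how many copies to run when load changes. The Node class and the schedule function are invented for illustration, and the first-fit placement is a deliberate simplification; the real Kubernetes scheduler filters and scores nodes in far more sophisticated ways. The desired_replicas function follows the ratio-based rule documented for the Horizontal Pod Autoscaler, but ignores its tolerances and other refinements.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    cpu_free: float  # CPU cores not yet allocated
    mem_free: int    # MiB of memory not yet allocated
    containers: list = field(default_factory=list)


def schedule(container: str, cpu: float, mem: int, nodes: list) -> Optional[str]:
    """Place the container on the first node with enough free CPU and memory."""
    for node in nodes:
        if node.cpu_free >= cpu and node.mem_free >= mem:
            node.cpu_free -= cpu
            node.mem_free -= mem
            node.containers.append(container)
            return node.name
    return None  # nothing fits, so the container stays pending


def desired_replicas(current: int, current_metric: float, target_metric: float) -> int:
    """Scale the replica count in proportion to observed vs. target load."""
    return math.ceil(current * current_metric / target_metric)


if __name__ == "__main__":
    cluster = [Node("node-a", cpu_free=1.0, mem_free=1024),
               Node("node-b", cpu_free=4.0, mem_free=8192)]
    print(schedule("web", cpu=2.0, mem=2048, nodes=cluster))
    print(schedule("cache", cpu=0.5, mem=512, nodes=cluster))
    # 4 replicas running at 90% CPU against a 50% target.
    print(desired_replicas(4, current_metric=90, target_metric=50))
```

The useful thing to notice is that you never say which server a container goes on; you state what it needs and the orchestrator finds a home for it.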
4.3 Managed or not
While it is possible to run Kubernetes on your private cloud by setting it up from scratch, it might be wiser to opt for a Kubernetes distribution like OpenShift that has the option of paid support, at least before you scale up your Kubernetes infrastructure. It is also possible to set up Kubernetes on a cluster of machines on AWS, Azure or GCP, but all these public clouds also feature managed Kubernetes offerings. Kubernetes is made up of multiple components. These, working together, help run the cluster of machines known as compute nodes. Some Kubernetes components run on each compute node as well. When you opt for a managed Kubernetes offering from any of the public cloud vendors, you are generally allowed to choose any type of node from their compute offerings, which will then be made part of the Kubernetes cluster. These are the nodes where your containerized workload will be deployed. The main Kubernetes master components, otherwise known as the “control plane”, are managed by the cloud provider.
5. Is your organization ready for Kubernetes?
This really depends on not one, but a bunch of factors. Let’s go through these factors in some detail.
5.1 Are you on private or public cloud?
Kubernetes can be deployed on both public and private clouds. On private clouds, while you can in theory install and maintain Kubernetes yourself, you might be better off running a Kubernetes distribution like OpenShift that is supported by a vendor. Kubernetes was born in the cloud and is what is known as a “cloud native” technology, but that does not make it any less suitable for managing a private cloud. In fact, it can be argued that Kubernetes is a great choice to manage your private cloud.
For public clouds like AWS, Azure or GCP, you can run Kubernetes on a set of compute nodes by setting it up yourself. For example, kops is a popular solution that lets you deploy Kubernetes on a set of EC2 instances running on AWS. However, I strongly recommend you hand off the headache of maintaining the Kubernetes control plane to the cloud provider by going with a managed Kubernetes offering, so that you can concentrate on the mechanics of running your workloads on Kubernetes.
5.2 What’s your current level of Docker adoption?
If the apps you want to run on Kubernetes are not already containerized, it’s a non-starter. Kubernetes being a container orchestration platform, you’ll have to ensure that the apps you want to run on Kubernetes are well tested on Docker in production. In just a bit, we’ll discuss a simple strategy of how you can roll this out step by step. Docker adoption is widespread, the tooling is very mature and it is safe to say now that Docker is well past the hype phase. There should be no questions around any risk pertaining to Docker adoption no matter how “careful” an IT strategy your organization has. Another real advantage is that Docker, in combination with Kubernetes, when properly implemented, can really drive up server utilization levels. So, this is something to give thought to.
5.3 How mature is your DevOps culture?
The presence of a strong DevOps culture means that devs are responsible for running the services they develop in production. They always look for ways to make themselves productive by finding means to automate things on the operations side. This is especially true if your app or service is based on a microservices architecture, which means there are many moving parts and the teams owning different microservices need to operate independently of each other. Kubernetes is a great fit if this culture exists and should find internal adoption very quickly. Don’t get me wrong here. Kubernetes can run monolithic workloads just fine. The main point is this: if there are two separate teams, dev and ops, with the ops team working on a completely new way of deploying and the dev team working to adapt to a build system for containerization and app configuration, then expecting these teams’ efforts to fit like a hand in a glove seems like a lot of wishful thinking.
5.4 Availability of Kubernetes Talent
Kubernetes can take a while to learn and it only makes sense when the person learning it has already put in the effort to learn containerization. I can imagine few Kubernetes books where the “What you should know” section does not list Docker. If, having considered the points we’ve discussed so far, you are convinced that Kubernetes is for you, you’ll need one or more champions who are capable and confident of running production workloads on Kubernetes. If you only have folks in your organization who’ve just scratched the surface as far as Kubernetes goes, you can build on this experience while reducing risk, and build a path to a future where you’ll be running production workloads on Kubernetes. In just a bit, we’ll see exactly how to get this done.
6. Kubernetes Gotchas
If someone calls Kubernetes a panacea, they’re not taking a balanced view. Here are some things you need to be aware of.
6.1 Managed Kubernetes is not a panacea
Kubernetes is a system built out of many pieces of software working in tandem. Irrespective of whether you manage it directly or opt for managed Kubernetes, things can go wrong. It is not uncommon for managed Kubernetes to have trouble with any of its various constituent components. Do not assume that just because the Kubernetes control plane is managed by your cloud provider, nothing can go wrong. You can find several cloud provider specific Kubernetes issues filed on GitHub. When something goes wrong, you might still need to contact support to get things in order. This can potentially involve downtime. Because Kubernetes components in the control plane only create and monitor containers, if they go down, they generally do not affect already running containers. It is rare for a Kubernetes control plane to go down and take all containers down with it. You can, however, be prevented from creating new containers, auto-scaling and so on.
6.2 Stateful applications on Kubernetes are still evolving
Kubernetes was made for applications that create and destroy containers like short-lived insects. A lot of containers can be created in response to a surge in traffic and be terminated once things get back to normal. The same is true for background job runners. It is meant to be a very dynamic environment where servers are truly treated like herds–a far cry from the days of naming servers lovingly, akin to pets. In that sense, Kubernetes’s support for more stateful applications like databases may seem like an afterthought. It is an area of active development and is fairly stable now, but one should not be surprised if the way stateful applications work under Kubernetes continues to change relatively rapidly in comparison to other areas.
It is possible to allocate persistent volumes, which Kubernetes supports natively, in a cloud provider neutral way for use in your stateful applications.
If you want to use stateful services managed by the underlying cloud provider (e.g. RDS, DynamoDB, etc.), the native way to do it with Kubernetes is to use a Service Catalog, which makes it easy to consume managed services from cloud providers.
6.3 Kubernetes upgrades
A quick web search should turn up enough horror stories around Kubernetes cluster upgrades to make you want to cling to your existing cluster setup. The safest approach might be to recreate a cluster with the same version that powers your production cluster, install your critical apps there and upgrade this cluster to check that everything turns out fine, before you proceed to upgrade your production cluster to that shiny new version of Kubernetes. Let’s face it: if you’re serious about Kubernetes and the benefits it brings, cluster upgrades are something you can’t escape. So, plan and execute.
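The rehearsal described above can be written down as a small planning sketch in Python. Everything here is hypothetical: the function names, the step wording and the version numbers are only for illustration. It also bakes in one assumption, which is that you upgrade one minor version at a time; kubeadm, for example, documents skipping minor versions as unsupported, but do check what your distribution or cloud provider allows.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Turn 'v1.24.3' or '1.24' into (1, 24)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def upgrade_path(current: str, target: str) -> list[str]:
    """List each minor version to pass through, assuming no minor is skipped."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not planned by this sketch")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


def rehearsal_plan(current: str, target: str) -> list[str]:
    """Rehearse every hop on a throwaway cluster before touching production."""
    hops = upgrade_path(current, target)
    steps = [
        f"create a rehearsal cluster at {current}",
        "install critical apps on the rehearsal cluster",
    ]
    steps += [f"upgrade rehearsal cluster to {v} and verify apps" for v in hops]
    steps += [f"upgrade production cluster to {v} and verify apps" for v in hops]
    return steps


if __name__ == "__main__":
    for step in rehearsal_plan("1.24", "1.26"):
        print(step)
```

The longer the list this prints, the stronger the argument for upgrading regularly rather than letting your cluster fall several versions behind.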
6.4 Many moving parts
Take virtualization. It is an abstraction we now take for granted. An abstraction we are comfortable using. In fact when someone refers to a “machine” or “server”, they are most likely referring to a virtual machine. For applications, it is very much possible Kubernetes becomes the new, standard substrate. A new level of abstraction, that becomes the new normal. With virtual machines though, with most large systems standardizing on Linux’s KVM technology, it is very much part of the operating system layer. Although there are other components involved, they are fairly low-level. It is very unlike Kubernetes where a dozen services are talking to each other across a cluster of machines, doing fairly sophisticated things with compute, storage, network and auto-scaling.
When problems rear their ugly heads, you might have to roll up your sleeves and peer under Kubernetes’s hood to make sense of what’s going wrong. We just have to assume at some level that all the different versions of these components that Kubernetes uses are somehow in harmony with each other. Or at least that is what the cloud provider would like us to believe.
6.5 Retaining Kubernetes talent
If you are itching to jump headlong into Kubernetes and you do it with just one person behind you, you’re taking this jump at your own peril. While Kubernetes proves itself worthwhile for your use case, as we will see in a bit, it might be a good idea to get your whole DevOps team trained or self-trained on Kubernetes. After all, who in their right mind would want to pass on an opportunity to get trained and then work on a red hot piece of technology? To be on the safer side, it is important that you have not one or two, but a team of DevOps folks who are comfortable dealing with Kubernetes, so that you have continuity in your Kubernetes execution should your lone Kubernetes expert choose to move on. That happens a lot, trust me. Once Kubernetes is listed as a skill on their LinkedIn page, recruiters won’t take long to call.
7. Introducing Kubernetes into your organization
By this point, if you’re convinced you can benefit from deploying Kubernetes in your organization, here are some steps you can follow to bring in change in a manner that lets you organically understand Kubernetes and what it takes to run it to manage production workloads.
7.1 Train or hire Kubernetes talent
You need people with hands-on Kubernetes knowledge from your DevOps team to execute your plan. Given the availability of high-quality training material online, they can train themselves or they can go through a more formal training program. You should check with your cloud provider if they can do a training specifically for your organization or if they have something coming up that your team can attend. You might also have other paid options in your area.
Hiring Kubernetes talent is another option. Look for folks who’ve run production workloads under Kubernetes. While hiring them, you might want to discuss the path they took to production and any challenges they faced while running production workloads on Kubernetes. Ask them if they ran any stateful workloads. Based on this discussion, you should be able to figure out if they might be a good fit for the projects you have in mind.
7.2 Move workloads to Docker
If the workloads you want to move to Kubernetes are not already running on Docker in either your staging or your production environment, then by moving these workloads directly to Kubernetes, you’re attempting a double leap. If you have any trouble, you won’t be able to point out exactly whether it’s being caused by build-time, run-time or configuration problems. On the other hand, if your team has worked on containerizing your workloads and has run them in production with Docker, you’ll have fewer variables to deal with should you have trouble.
7.3 Run non-production workloads on Kubernetes
Now that your workloads are containerized, you’re good to move some non-production workloads such as those on dev and staging to Kubernetes. This way, your larger team can get used to the new environment, get Kubernetes cleanly integrated into your Continuous Deployment pipeline, etc.
7.4 Stateless workloads first
When it comes to moving production workloads to Kubernetes, it might be a good idea to start with stateless workloads first: containers that merely serve application requests and do not directly persist any data. Running stateful workloads on Kubernetes requires deeper Kubernetes expertise and you can build that first within your organization by running stateless workloads. With stateful workloads, you need to plan a bit more carefully about how you’ll manage when nodes go down, for example. This will be especially complicated with distributed stateful apps like Elasticsearch, for instance.
7.5 Move non-critical workloads
It is also a good idea to move non-critical workloads initially on to Kubernetes and let your team gain experience running production workloads on it. There are other aspects to running workloads in production like monitoring, alerting and application upgrades they’ll need to figure out, for example.
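The ordering suggested in sections 7.3 to 7.5 can be sketched as a simple sort in Python. The Workload fields, the wave numbers and the sample workloads are all made up for illustration. One detail is my own assumption: the text doesn’t say whether critical stateless workloads should go before non-critical stateful ones, so this sketch simply puts all stateful workloads last.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    production: bool
    stateful: bool
    critical: bool


def wave(w: Workload) -> int:
    """Lower wave numbers move to Kubernetes earlier."""
    if not w.production:
        return 1  # dev and staging go first
    if not w.stateful and not w.critical:
        return 2  # stateless, non-critical production workloads
    if not w.stateful:
        return 3  # stateless but critical
    return 4      # stateful workloads last (assumption, see above)


def migration_order(workloads: list) -> list:
    """Sort workloads by wave, breaking ties by name for a stable plan."""
    return sorted(workloads, key=lambda w: (wave(w), w.name))


if __name__ == "__main__":
    inventory = [
        Workload("orders-db", production=True, stateful=True, critical=True),
        Workload("checkout-api", production=True, stateful=False, critical=True),
        Workload("thumbnailer", production=True, stateful=False, critical=False),
        Workload("staging-web", production=False, stateful=False, critical=False),
    ]
    for w in migration_order(inventory):
        print(wave(w), w.name)
```

Even if you never run something like this, writing your inventory down in this shape forces the useful conversation about which workloads are actually stateful or critical.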
7.6 Make the big leap
We saw how you can adopt Kubernetes in your organization in a fashion that builds expertise while reducing risk. We also discussed how to get ready from an engineering point of view to containerize and slowly introduce production workloads onto Kubernetes. You are the best judge to figure out when exactly you’re ready to do this. Each organization’s adoption vs. risk ratio is different and you should use your best judgement. To automate deployment of Kubernetes clusters themselves, my recommendation is you use a tool like Terraform.
8. In Conclusion
The best laid plans take into account worst-case scenarios. That’s the main idea behind this article: to tell you what could potentially go wrong and how you can mitigate those risks in your Kubernetes strategy. Kubernetes is great tech. There are many areas where it will most certainly help you. But as with any technology on which you intend to run production workloads, you’ll want to weigh your risks and see if the benefits outweigh them. Resources like Kubernetes Failure Stories are useful in figuring out what the most common problems might be and even how you can work around them.
Personally, I’m super excited about Kubernetes and have production workloads running on it. There are real benefits you get from the orchestration, auto-scaling, server utilization and self-healing capabilities that Kubernetes provides and these aren’t worth building on your own.