Don’t Break the Kubernetes Contract!

Posted by Rick Richardson on Oct 13, 2019, 5:30:00 AM

These six things from your current enterprise probably won’t (or shouldn’t) work in your new Kubernetes architecture.

As engineers, we often tie the provisioning of resources to rigid, predefined interfaces, called contracts. This saves us countless hours, because contracts remove the cognitive load of managing all of the incidentals and externalities that come along with running software infrastructure.

This both has and has not changed in the move to Kubernetes. On the one hand, there are still contracts that help manage every imaginable dependency. On the other hand, these contracts are totally different from anything you’ve encountered before.

This creates a dilemma. You’ve spent years perfecting your existing infrastructure and procedures, but you want to take advantage of the agility and speed that Kubernetes and containers offer. Do you really need to create a new infrastructure from scratch? Wouldn’t it be faster and easier to just lift-and-shift to containers?

Trying to keep the shape of your old IT infrastructure is probably going to cause more harm than good. Happily, it’s actually easy to get a dynamic, reliable environment, as long as you leverage Kubernetes and its powerful system of contracts.

The most important themes to remember about Kubernetes’ architecture are:

  1. Immutable infrastructure
  2. Location independence

Immutable Infrastructure

Fundamentally, immutable infrastructure means that the operating systems that execute your software do not change once they’re created. This is important for consistency. 

Especially in a large system, you have to architect your infrastructure with the understanding that a machine may go away at any moment. With containers and the immutable infrastructure paradigm, you can be sure that every time you execute your service, it has exactly the same operating environment as the previous instance, and so will the one after it. So whether you have one instance or 100, they should all behave identically.

Location Independence

Location independence is critical for reliability. It means that one or more instances of your critical services can be started anywhere in your k8s cluster. Thanks to the fact that its environment is immutable, every invocation of the service is the same as any other. If you lose a machine, or a switch, or a cloud region, it is no big deal, because replicas in other parts of your cluster can pick up the load. They might even do so without dropping a single request!

Immutable replicas that can spin up and down dynamically introduce a host of new engineering challenges. Kubernetes is designed to address those challenges.

Let’s dive into more detail about how Kubernetes interacts with different parts of the support infrastructure.

1. Service Discovery

In a microservices architecture, each service depends on other services. There could be multiple replicas spinning up and down at any time. On top of that, automated releases could be deploying different versions of existing services. One service might have several different incarnations, with different versions in various stages of rollout. How can you expect to maintain sanity in such a system? The answer is one of the most important resources in Kubernetes: the Service.

Services are the facade for networked containers: they provide both discovery and a single interface to any number of container replicas running in your system. Services provide incredible flexibility. They can selectively route to containers based on a system of tags and rules, called Labels and Selectors, respectively. The contract that is the Kubernetes Service is, quite possibly, the most important component in making a microservice architecture work.
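To make that concrete, here is a minimal sketch of the contract in action (the names, image, and ports are hypothetical): a Deployment labels its Pods, and a Service selects those Labels to route traffic to whichever replicas happen to be running.

    apiVersion: v1
    kind: Service
    metadata:
      name: orders                  # hypothetical service name
    spec:
      selector:
        app: orders                 # routes to any Pod carrying this Label
      ports:
        - port: 80                  # port the Service exposes
          targetPort: 8080          # port the containers listen on
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: orders
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: orders
      template:
        metadata:
          labels:
            app: orders             # the Label the Service's Selector matches
        spec:
          containers:
            - name: orders
              image: example.com/orders:1.0   # hypothetical image
              ports:
                - containerPort: 8080

Other services reach this one by its stable DNS name (orders within the namespace, or orders.<namespace>.svc.cluster.local from elsewhere), no matter how many replicas exist or where they happen to be scheduled.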

When migrating to Kubernetes and containers, you might be tempted to just drop your monolithic application into a container and call it done, but you’d be missing out on the power, flexibility, and scalability that Services provide.

2. Logging and Log Routing

Writing logs to your local disk violates the concept of immutable infrastructure. Kubernetes is designed with the idea that machines may fail at any moment. If you’ve got logs on your local machine, and it fails, odds are it will take very valuable information down with it. The default design for logging in Kubernetes is actually simple and straightforward. So simple, in fact, that many developers don’t believe it.

The right logging infrastructure takes advantage of the container executor on your system. Let’s assume it’s Docker. The paradigm for Docker is that you log to the terminal, or “stdout.” That output then becomes the responsibility of the Docker daemon, which routes those logs however it is told. Your Kubernetes deployment should have a log router installed as a logging driver in Docker, which then routes your logs to a central repository such as Elasticsearch or Graylog. As a developer, you have to do almost nothing to set up logging. Logging to stdout is almost always the default.

Since you’re now logging to a central log storage/analysis system, I’d recommend spending all of that time saved moving to structured logging in JSON, which makes the job of analysis, indexing and retrieval much easier.
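As a sketch of how little the application has to do (the Pod name and the log line are made up for illustration), here is a container that simply writes structured JSON to stdout; the runtime captures the stream, and whatever log router your cluster runs ships it onward.

    apiVersion: v1
    kind: Pod
    metadata:
      name: json-logger             # hypothetical name
    spec:
      containers:
        - name: app
          image: busybox
          command: ["sh", "-c"]
          args:
            - |
              # emit one structured JSON log line every 10 seconds
              while true; do
                echo '{"level":"info","service":"orders","msg":"order processed"}'
                sleep 10
              done

Running kubectl logs json-logger shows exactly those lines, and that same stream is what a router such as Fluentd or Fluent Bit would forward to Elasticsearch or Graylog.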

3. Ingress and Load Balancing

The Service discovery system within Kubernetes is world-changing. In addition to the flexibility that it offers, it is also a powerful contract that allows you to build very meaningful abstractions on top of it. One of the most important is Ingress. Ingress allows you to create a central load-balancing layer in front of your services, whether that’s a reverse-proxy system such as Traefik or Nginx, or even a hardware load balancer such as an F5.

These Ingress load balancers make it simple to host and expose your web services externally. They can even simplify the task of terminating SSL/TLS (more on that below). What this means is that your applications no longer need to be in the business of terminating their own external traffic. In fact, it’s probably going to cause problems if they try. With Ingress load-balancers and Services, it is probably best that you only expose the internal HTTP services of your applications, and leave the TLS termination and load balancing to the Kubernetes infrastructure.
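Here is a minimal sketch of that split (the hostname, Service name, and Secret name are hypothetical, and the exact apiVersion depends on your cluster version): the Ingress terminates TLS at the edge and forwards plain HTTP to the internal Service.

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: orders
    spec:
      tls:
        - hosts:
            - orders.example.com
          secretName: orders-tls    # the certificate lives here, not in the app
      rules:
        - host: orders.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: orders    # the plain-HTTP Service from the earlier sketch
                    port:
                      number: 80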

4. TLS Certificates

If you think that letting Kubernetes solve your load balancing and service discovery is great, it gets better! The Kubernetes ecosystem can even remove the hassle of SSL certificate registration and renewal. Let’s Encrypt is leading the charge toward a new paradigm of certificate management, one based purely on automation, where scripts handle registration and renewal. The lifetime of each certificate is only 90 days. But it doesn’t matter, because your army of robots in Kubernetes is handling renewal for you.

Since you’re already having Kubernetes Ingress manage your load balancing and SSL termination, you might as well let it manage your certificates. This can save literally hundreds of hours over the course of your application’s life. Gone are the hassles of dealing with registrars, signing requests, and so on. You literally just configure cert-manager with some basic info about your organization and you’re done. Note that if you run your own internal CA, you can probably point cert-manager at that as well, or you can give cert-manager some intermediate certificates and let it be your cluster’s CA.
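As a rough sketch of that “basic info” (the issuer name, email address, and apiVersion are assumptions; check the docs for the cert-manager release running in your cluster), a Let’s Encrypt issuer looks something like this:

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: ops@example.com              # hypothetical contact address
        privateKeySecretRef:
          name: letsencrypt-account-key     # ACME account key is stored here
        solvers:
          - http01:
              ingress:
                class: nginx                # assumes an nginx Ingress controller

From there, annotating an Ingress with cert-manager.io/cluster-issuer: letsencrypt is typically enough for cert-manager to obtain the certificate, store it in the referenced TLS Secret, and keep renewing it well inside the 90-day window.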

If you want to haul in your old certificates, you can, but keep this in mind: The dynamic nature of Kubernetes’ Ingresses allows you to create new subdomains and service names instantly. Your Ingress system combined with cert-manager can respond immediately, generating the specific certificates necessary to secure your new domains. What used to take hours or days now takes seconds. And you won’t have to worry about the expense, embarrassment or worse when an important certificate isn’t renewed on time.

5. Metrics Collection

In Kubernetes, everything in your system is running in a container, and every container is easily enumerable. In addition, Kubernetes offers powerful units of deployment, including ways to run the same container on every one of your machines. This means that you can now run metrics collection on every container and every node.

The problem in Kubernetes is that it is too easy to collect and generate far more metrics than you want or need.

You can probably bring your old metrics collection system with you, but remember to follow the rule of immutable infrastructure. Those metrics collectors should run in containers, and, if they push metrics to a central location, that central location should be exposed as a Service, so that it is free to move around for the sake of reliability.

One very powerful new paradigm enabled by Kubernetes Services is the ability for your metrics system to pull information from your services instead of having them push it. This allows your metrics collection system to pull exactly as much information as it can handle and no more.

In a push architecture, it is very common for the metrics shippers to overwhelm the central collector, which results in network and OS overload. This, in turn, affects the performance and availability of the entire system. Metrics collection has to be very finely tuned, or it can harm the availability of your infrastructure.

A pull model avoids this congestion problem while allowing you to address the scalability of the collector at your own pace.

Prometheus is an example of the pull model for metrics collection and analysis. If you’re not in love with your existing metrics analysis and collection, give Prometheus a try.
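As an illustration of the pull model, a Prometheus scrape job can discover Pods through the Kubernetes API and pull /metrics only from Pods that opt in. The annotation convention shown here is an assumption; it only works if your relabeling rules implement it, as in this sketch.

    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod                       # discover every Pod via the API server
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep                    # scrape only Pods annotated prometheus.io/scrape: "true"
            regex: "true"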

6. Dynamic DNS

It does not take long to realize how much power there is in being able to dynamically create new, uniquely named endpoints and expose them to your internal and external customers. This power does require some participation from your own infrastructure. Dynamic DNS updates are absolutely critical to success.

If your customers are internal and your DNS infrastructure is internal, then you will need to ensure that Kubernetes can effectively communicate changes to your nameservers.

I recommend taking a solid look at external-dns to automatically respond to changes in your Ingress and Services and make DNS updates accordingly. External-dns integrates with most of the popular external hosted DNS solutions, and it also speaks to internally hosted nameservers such as Bind.
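As a small sketch (the hostname and Service are hypothetical), exposing a Service and having external-dns publish the matching record can be as simple as a single annotation:

    apiVersion: v1
    kind: Service
    metadata:
      name: orders
      annotations:
        external-dns.alpha.kubernetes.io/hostname: orders.example.com
    spec:
      type: LoadBalancer                    # external-dns points the DNS record at this load balancer
      selector:
        app: orders
      ports:
        - port: 443

For Ingress resources, external-dns typically picks the hostnames straight out of the rules, so no annotation is needed at all.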

This list is not exhaustive. Moving to Kubernetes requires rethinking development and operations practices related to things like state, security, and disaster recovery. The advantages of adopting Kubernetes far outweigh the challenges Kubernetes creates. But being aware ahead of time that Kubernetes will dramatically change everything you thought you knew about application development and operations will make the process as smooth as possible.

When it comes to software architecture, contracts are great. When coupled with proper implementations, they can remove a significant cognitive load from developers: Where do I log to? What format do I use for logging? How do I handle authentication? What happens when things break? If your enterprise architecture is even remotely mature, all of these questions are answered for your developers, so they can get straight to being productive.

The good news is that Kubernetes has its share of contracts as well. The bad news is that they probably look nothing like the monolithic systems from which you’re migrating.

Now, you’re saying: “It’s all just containers! I can run whatever I want!” Well, this is true. You *could* just pick up your entire system and drop it into a massive container and declare success. But you’d be missing 99% of what makes Kubernetes great.

Topics: Cloud, CTO, Infrastructure Automation, Kubernetes
