Why Kubernetes Becomes Expensive - Allowing Rocketing Consumption

Kubernetes simplifies workload scaling but can lead to increased costs without proper management. Harness Cloud Cost Management provides visibility and optimization tools to track and reduce Kubernetes cluster spend, correlating deployment changes with cost impacts for efficient resource utilization.

The Apollo 11 Guidance Computer had only about 4KB of RAM (32,768 bits), plus roughly 72KB of read-only memory, to run all of this source code, and that was in the 1960s. Fast forward to the 2020s and your smartwatch has orders of magnitude more power, so by that logic we should be routinely launching rockets from our watches. Instead, as computers get more powerful, our consumption of resources grows right along with them, an observation known as Wirth’s law (also called Page’s law).

With every layer of abstraction and generics, we add overhead in the name of agility. Enter Kubernetes, one of the great equalizers in the modern software ecosystem: application and infrastructure teams can share a common description of what the application needs in order to be deployed and run.

Most, if not all, of the operational and infrastructure-related requirements can be described in one or several YAML files; once you have a Kubernetes cluster waiting, you are off to the races. Looking at the greater Kubernetes ecosystem today, there are many ways for you to get a cluster.

From local development with Minikube, to a managed service from one of the public cloud vendors such as EKS or GKE, to a platform-as-a-service like OpenShift, a Kubernetes cluster is just around the corner. Kubernetes is still a relatively new technology and is still proliferating in the enterprise, yet the low barrier to scaling workloads in Kubernetes has taken away some of the organizational safeguards of years gone by.

A Time Before Kubernetes

Clearly, monumental shifts in computing have occurred since the Apollo Project, so we do not need to go back that far in time. Kubernetes first appeared in 2014, and in technology years the time before it can seem like an eternity given the current pace of innovation (happy 6th birthday K8s!). Add in a few years for Kubernetes workloads to move along the technology adoption curve into the mainstream, and let’s consider what the paradigm looked like before we were reaping Kubernetes benefits.

Having a distributed application is certainly not a new creation; in the Java ecosystem, clustering has been around since the early 2000s. Testing cluster functionality, or more than one node, locally was a challenge. Depending on the technology stack, you would most likely have to water down the node sizes so that two could fit on your machine, and increment/offset the ports each node needed so there was no collision.

After local testing, heading towards an upper environment or production, what if you as a developer had to add another node of an application server, let’s say JBoss WildFly? First, you would not be the one adding the node, which is part of the organizational safeguards. Most likely a middleware/platform engineer on another team would be administering the application server clusters. This is because the steps to add a node are not trivial, and adding one would most likely incur a license cost with an application server vendor and require additional virtual or physical infrastructure.

These mythical keepers of your JEE clusters, ranging from application servers to message brokers, would keep a keen eye on the Java Virtual Machines with almost the same rigor as a system engineer monitoring infrastructure virtual machines. Performance tuning and capacity planning/optimization were part of the middleware engineers’ job, so they would prune or expand clusters on a regular basis.

Historically, since Java/JEE infrastructure and workloads were watched so closely by domain experts, reclaiming underused infrastructure was not a problem. Though if you were like me and did not want your prized infrastructure taken away, you found a workaround: I wrote the below script at a bank I used to work for to make sure my particular WebSphere cells were not reclaimed.

This was enough for the middleware engineering team to see that my cell(s) were active, which mattered because there were license costs associated with WebSphere. As for how this plays out today, you can learn more in my webinar about cloud cost basics. With paradigms shifting towards Kubernetes, adding another replica of an application is as simple as one line of YAML.

Enter Kubernetes - Understanding Kubernetes Basics

Need to scale your application right now? No worries: with Kubernetes you can increase or decrease the replica count in one line of YAML or with a simple “kubectl scale --replicas=3 rs/foo” command. Like magic, Kubernetes, as a resource manager and scheduler, will fulfill your request as quickly as possible as long as there is available space on the cluster.
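To make the “one line” idea concrete, here is a minimal Python sketch that builds a Deployment manifest as a plain dictionary and then changes nothing but the replica count. The name foo, the image example/foo:1.0, and both helper functions are hypothetical and only for illustration. The output is JSON, which kubectl apply -f accepts alongside YAML.

```python
import copy
import json


def deployment_manifest(name, image, replicas):
    """Build a minimal apps/v1 Deployment manifest (name and image are hypothetical)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # the one field that controls scale
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def with_replicas(manifest, replicas):
    """Return a copy of the manifest with only spec.replicas changed."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    scaled = copy.deepcopy(manifest)
    scaled["spec"]["replicas"] = replicas
    return scaled


if __name__ == "__main__":
    base = deployment_manifest("foo", "example/foo:1.0", replicas=1)
    # Everything except spec.replicas is identical between the two manifests.
    print(json.dumps(with_replicas(base, 3), indent=2))
```

The point of the sketch is how little changes: going from one replica to three touches a single field, which is exactly why scaling feels free even when it is not.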

Compared to what scaling your workload looked like without Kubernetes, increasing the number of replicas is a walk in the park. No pesky middleware engineer to interrogate you, no tickets to submit for review and further discussion; just fire up kubectl and bam. With this level of agility, application teams can count on scaling their workloads much more rapidly. Tuning that would have happened in the past, because scaling was time-consuming, takes a bit of a back seat once teams can rely on Kubernetes.

Even if you treat Kubernetes as a piece of infrastructure, you still need infrastructure to run the cluster on, and a Kubernetes cluster can run out of available resources. As workload demand increases, a common approach is to scale the Kubernetes cluster. With more resources comes more cost.
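A rough back-of-the-envelope sketch shows how replica counts turn into nodes and nodes turn into dollars. Everything here is an assumption for illustration: the node size (4 CPU, 16 GiB), the per-replica requests (500m CPU, 1 GiB), and the $0.20 hourly node price are made up, and the packing math ignores system reservations, DaemonSets, and fragmentation.

```python
import math

HOURS_PER_MONTH = 730  # common approximation: 8,760 hours per year / 12


def nodes_needed(replicas, cpu_request_m, mem_request_mib, node_cpu_m, node_mem_mib):
    """Estimate worker nodes needed to fit all replicas by their resource requests.

    CPU is in millicores and memory in MiB so the packing math stays in integers.
    """
    if replicas <= 0:
        return 0
    # A node holds as many replicas as its scarcer resource allows.
    per_node = min(node_cpu_m // cpu_request_m, node_mem_mib // mem_request_mib)
    if per_node < 1:
        raise ValueError("a single replica does not fit on one node")
    return math.ceil(replicas / per_node)


def monthly_cost(nodes, hourly_price):
    """Monthly cost of running the given number of nodes around the clock."""
    return nodes * HOURS_PER_MONTH * hourly_price


if __name__ == "__main__":
    for replicas in (3, 20, 50):
        nodes = nodes_needed(replicas, 500, 1024, 4000, 16384)
        cost = round(monthly_cost(nodes, 0.20), 2)
        print(replicas, "replicas ->", nodes, "nodes ->", cost, "per month")
```

The replica count is a one-line change, but the node count it implies is a step function, and each step shows up on the bill.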

As more workloads have headed towards Kubernetes, control in organizations has been shifting from development teams owning Kubernetes back to platform engineering teams that ensure the clusters are up and running. There might be quotas and chargebacks applied to teams, but they tend to be much more lax in order to allow for scaling. Typically, workload tuning falls on the application team and cluster tuning falls on the platform engineering team.

Tune Workload vs Tune Cluster or is it Both?

Kubernetes terminology matters. Kubernetes is no different from other pieces of software needing maintenance and administration; you would not leave your operating system running without updates and tuning until the end of time.

From a platform engineering standpoint, building Kubernetes clusters to support workloads for your organization is an iterative exercise. Understanding capacity needs is harder on Kubernetes than on physical or virtual hardware because the teams placing their workloads onto your prized Kubernetes clusters have a reasonable expectation of elasticity. As a purveyor of platforms, looking to the teams that placed the workloads is part of the puzzle. Tuning a workload might well be easier than making cluster-wide changes.

Application teams own the application footprint which powers their ideas. Reducing overhead and increasing performance is intrinsic to software engineers building the next set of features. The application teams usually create the Docker image and the Kubernetes Deployment YAMLs, which contain the resource specifications. As a software engineer building containerized workloads, I’ve been trained to assume, for design safety, that 100% of the resources in the container will always be used.

When running our virtual Harness Universities, given we have several dozen concurrent users and know the container resource limits, I built out a large cluster to handle the potential maximum. I felt I was a little on the low side and worried about hitting capacity so I had an autoscaler ready to add additional Kubernetes worker nodes.

Looking at the above numbers, even at the peak usage we were not even at 10% of the cluster capacity; talk about overkill. To make the above more efficient, the choices are to tune the workload and/or tune the cluster.
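To put rough numbers on that kind of overkill, here is a small sketch with hypothetical figures (not the ones from our Harness University runs): a 10-node cluster with 4 CPU per node and a peak of 3.2 CPU in use. The 60% target utilization is also just an assumption, standing in for whatever headroom you are comfortable with.

```python
import math


def utilization(used, capacity):
    """Fraction of capacity in use (both values in the same unit)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity


def nodes_for_peak(peak_used_m, node_capacity_m, target_pct=60):
    """Smallest node count that keeps peak usage at or below target_pct percent."""
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be in (0, 100]")
    usable_per_node = node_capacity_m * target_pct / 100
    return max(1, math.ceil(peak_used_m / usable_per_node))


if __name__ == "__main__":
    node_cpu_m = 4000           # hypothetical: 4 CPU per node, in millicores
    cluster_cpu_m = 10 * node_cpu_m
    peak_cpu_m = 3200           # hypothetical peak usage
    print("peak utilization:", utilization(peak_cpu_m, cluster_cpu_m))
    print("nodes needed at 60% target:", nodes_for_peak(peak_cpu_m, node_cpu_m))
```

Even with generous headroom, a cluster sized for the theoretical maximum can end up several times larger than what the observed peak calls for.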

Implementing Kubernetes Monitoring Tools: Tune the Workload + Placement

The four humors of an application are storage, compute, networking, and memory. By reducing one or more of the humors, you are reducing the burden on the cluster and thus can reduce the cluster size or have the ability to place more work in the current cluster.

Reducing the resource requests/limits is the main way to lessen the footprint of your workload. That is easier said than done, and there can be lots of factors at play, such as the availability of skills. If an application/service is not in active development, this type of tuning is risky without development support or proper metrics to back up the tune. Usually, resource requests/limits are sized with some benchmark or baseline in mind.
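One common way to back up a tune with metrics is to size a request from observed usage rather than from a guess. The sketch below is a generic approach, not a Harness or Kubernetes feature: it takes a nearest-rank percentile of usage samples and adds headroom. The sample values, the 95th percentile, and the 20% headroom are all assumptions you would replace with your own baseline.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def suggest_request(samples_m, pct=95, headroom_pct=20):
    """Suggest a CPU request in millicores: chosen percentile plus headroom."""
    base = percentile(samples_m, pct)
    return math.ceil(base * (100 + headroom_pct) / 100)


if __name__ == "__main__":
    # Hypothetical CPU usage samples in millicores, including one spike.
    usage = [120, 150, 130, 400, 140, 160, 135, 145, 155, 125]
    print("median usage:", percentile(usage, 50))
    print("suggested request:", suggest_request(usage))
```

With only a handful of samples a single spike dominates the high percentile, which is one more reason this kind of tuning needs a decent window of metrics behind it.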

[Cloud Cost Management showing actual, requested, and limits of resources]

Age is also a factor: newer applications purpose-built for Kubernetes tend to leverage newer architectures and platforms, versus a lift-and-shift into a container where the limits are whatever was once set in a traditional application server.

Workloads that are more ephemeral and/or elastic can benefit from tuning the placement strategy. Placement strategy is exactly that, how Kubernetes goes about placing your workload in the cluster. Tiffany has an excellent blog post going into different autoscaling and placement strategies that you can take advantage of.

On the flip side, if your workloads continue to tax the cluster, for example by repeatedly triggering an autoscaler such as the Horizontal Pod Autoscaler (HPA), you will eventually run out of resources. In that case, you will need to take action at the cluster level, such as adding nodes or tuning the worker nodes, to relieve the pressure that triggered the HPA.

Tune the Cluster

The mighty Kubernetes follows a worker node model in which a primary node orchestrates and the worker nodes, where the kubelets live, are where the work takes place. At the cluster level there is a broad range of tunings, from policies to kubelet tuning all the way down to the machine level.

Your Kubernetes node has to run on some operating system, on either a physical or virtual machine, and most likely that operating system is a Linux variety. In the Linux world, a common tune for Kubernetes is to disable swap, which moves memory pages from RAM to disk. By default, the kubelet has traditionally refused to start if it determines that swap is still enabled, although newer Kubernetes releases have been adding support for running with swap. Performance tunings that a system engineer would make on a Linux machine still pay dividends in helping your workloads run more efficiently.
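If you want to check a node for active swap before the kubelet complains, one option on Linux is to read /proc/swaps, which lists one swap area per line after a header row. The sketch below only parses that text; the function name is hypothetical, and actually turning swap off (for example with swapoff -a and removing the entry from /etc/fstab) is left to you.

```python
import os
from typing import List


def active_swap_devices(proc_swaps_text: str) -> List[str]:
    """Return the swap areas listed in the contents of /proc/swaps.

    The first line is the column header; every following non-empty line
    describes one active swap area, with its path in the first column.
    """
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


if __name__ == "__main__":
    path = "/proc/swaps"
    if os.path.exists(path):
        with open(path) as handle:
            devices = active_swap_devices(handle.read())
        if devices:
            print("swap is enabled on:", ", ".join(devices))
        else:
            print("no active swap found")
    else:
        print(path, "not found; this check only applies to Linux")
```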

Usually reserved for larger clusters, you can go above and beyond steering placement with node affinities, taints, and tolerations: you can actually modify the scheduling behavior by influencing how placement scores are calculated with kube-scheduler thresholds. Getting into the weeds on how Kubernetes schedules workloads might be out of scope for some organizations.

For the actual cluster size, in a previous blog post, I walk through commands to remove and drain nodes from your Kubernetes cluster. Even with a small cluster, your costs can still add up very quickly.

How costs can add up

Remember the earlier section about trying to test a distributed application on your local machine in the pre-Kubernetes days? Non-production scaling and testing was certainly a pain point before Kubernetes. Today, with Kubernetes, testing a distributed architecture at scale is simple.

Ironically, non-production workloads can take up as many resources as the actual production application, or more. Having multiple environments or namespaces to support multiple lower environments can be a costly endeavor. The nature of providing a generic platform that allows for the quick and consistent scaling of workloads lends itself to greater costs.

Since the platform is generic, Kubernetes has brought about the “death of the middleware engineer” and of intimate knowledge of the workloads. The job of middleware/platform engineers today is making sure the platform, i.e., Kubernetes, is running; the workloads inside the platform are not their responsibility per se. If the cluster is being overtaxed, the platform engineers have their work cut out for them determining the cost and the best way to justify the workloads. That challenge is eliminated with Cloud Cost Management.

Enter Cloud Cost Management - Partner with Harness

Cloud Cost Management allows you to track application- and cluster-level usage in terms of the dollar amount of your cluster spend. This is accomplished without the need for labels and tagging inside of Kubernetes since Harness is a system of record for the deployments.
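To illustrate what putting a dollar amount on usage can look like, here is a generic sketch and explicitly not how Cloud Cost Management calculates anything: it splits a monthly cluster bill across teams in proportion to their CPU requests and reports whatever capacity nobody requested as idle cost. The team names, the $1,000 bill, and the capacity figure are hypothetical.

```python
def allocate_cost(cluster_monthly_cost, cluster_capacity_m, requests_by_team_m):
    """Split cluster cost by each team's share of total CPU capacity.

    Returns (cost_by_team, idle_cost), where idle_cost covers capacity
    that no team has requested. All CPU values are in millicores.
    """
    if cluster_capacity_m <= 0:
        raise ValueError("cluster capacity must be positive")
    requested = sum(requests_by_team_m.values())
    if requested > cluster_capacity_m:
        raise ValueError("requests exceed cluster capacity")
    cost_by_team = {
        team: cluster_monthly_cost * req / cluster_capacity_m
        for team, req in requests_by_team_m.items()
    }
    idle_cost = cluster_monthly_cost * (cluster_capacity_m - requested) / cluster_capacity_m
    return cost_by_team, idle_cost


if __name__ == "__main__":
    by_team, idle = allocate_cost(1000, 40000, {"checkout": 8000, "search": 12000})
    for team, cost in by_team.items():
        print(team, "->", round(cost, 2))
    print("idle ->", round(idle, 2))
```

Surfacing the idle bucket separately is a deliberate choice in this sketch: it is the part of the bill that no application team will recognize as theirs, and usually the first place to look for savings.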

Cloud Cost Management also has the ability to correlate events such as changes/deployments to how they relate to cost. Looking at historical trends or impacting events in one concise spot is achievable with Cloud Cost Management.

The Harness Platform with Cloud Cost Management is your one-stop shop for finding and addressing where to make optimizations with your Kubernetes workloads. Harness is here to partner with you to reduce your cloud costs. If you have not seen Cloud Cost Management in action, feel free to request a demo.

Cheers,

-Ravi
