December 14, 2021

Kubernetes Cost Management Strategies: Cost Savings


Key takeaway

Effective Kubernetes cost management involves strategies like cluster downsizing to eliminate unused resources, workload rightsizing to optimize resource use, and autoscaling to automate resource adjustments. These techniques help reduce expenses and improve efficiency by leveraging tools for better visibility and automation.

Part two of our Strategies series! Last week, we started with visibility. Today, let's look at cost savings to help us achieve cloud cost optimization.

Once you have visibility into your Kubernetes costs, you can start to reduce them more effectively. A solid understanding of Kubernetes basics helps here, too. Just as it’s hard to save money personally if you don’t know where it’s going, you can only decide where to make cuts once you can see where your Kubernetes spend is going.

One of the biggest difficulties in implementing cost savings is the number of different teams that work on delivering and maintaining applications. Infrastructure, CloudOps, and Platform teams manage clusters, while Application and DevOps teams manage the services and applications deployed on those clusters. Optimal resource efficiency requires work from all of these teams, and having so many teams and infrastructure consumers involved adds complexity.

Let’s take a look at some of the ways in which you can save money on Kubernetes.

This article contains an excerpt from our eBook, Cost Management Strategies for Kubernetes. If you like the content you see, stick around to the end where we’ll link the full eBook for you. It’s free - and best of all, ungated.

Examples of What You’ll See

With a good view into cost savings, you’ll be able to find a variety of savings opportunities. Some of these include:

  • Finding unused or “zombie” clusters and nodes that you can kill
  • Rightsizing based on actual node resource requirements versus initial assumptions
  • Increasing pod density to optimally use node resources and reduce idle costs
  • Finding storage that is underutilized or unattached
  • Allocating appropriate resources to workloads with high needs and requests

Strategy One: Cluster Downsizing


Fewer clusters mean less compute cost. However, removing entire clusters isn’t the only way to reduce cluster costs. You should have visibility into how your Kubernetes resources are being utilized, and you should be able to understand which resources are unallocated. Unallocated resources are the low-hanging fruit that lets you save on Kubernetes right away.

In a lot of ways, downsizing is a fancy way of saying “find what you provisioned and paid for that nobody is using, and get rid of it.” Whether it’s entire clusters, or nodes within clusters that are overprovisioned, trimming the fat on unallocated Kubernetes resources is a quick way to introduce cost savings.

The simplest way to get started here is by hooking up Kubernetes to Prometheus (or a similar tool) so that you have monitoring in place. From there, you’ll be able to see your utilized, idle, and unallocated resources. For this strategy, you want to look at your unallocated resources and cut the ones you really don’t need.
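To make the three categories concrete, here is a minimal Python sketch of how one node’s hourly cost could be split into utilized, idle, and unallocated portions. The definitions are an assumption for illustration (allocated is the larger of requested and used CPU, idle is allocated minus used, unallocated is capacity minus allocated), and the `NodeUsage` type, its fields, and the sample numbers are hypothetical rather than the output of Prometheus or any specific tool. Only CPU is shown; memory could be handled the same way.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Hypothetical per-node figures, e.g. averaged over a day of metrics."""
    name: str
    capacity_cores: float   # CPU the node provides
    requested_cores: float  # sum of CPU requests of pods scheduled on the node
    used_cores: float       # CPU actually consumed
    hourly_cost: float      # what you pay for the node per hour


def cost_breakdown(node: NodeUsage) -> dict:
    """Split the node's hourly cost into utilized, idle, and unallocated parts.

    Assumed definitions (not taken from any specific tool):
      allocated   = max(requested, used), capped at capacity
      utilized    = used, capped at capacity
      idle        = allocated - utilized
      unallocated = capacity - allocated
    """
    cost_per_core = node.hourly_cost / node.capacity_cores
    allocated = min(max(node.requested_cores, node.used_cores), node.capacity_cores)
    utilized = min(node.used_cores, node.capacity_cores)
    return {
        "utilized": utilized * cost_per_core,
        "idle": (allocated - utilized) * cost_per_core,
        "unallocated": (node.capacity_cores - allocated) * cost_per_core,
    }


if __name__ == "__main__":
    node = NodeUsage("node-a", capacity_cores=8, requested_cores=5,
                     used_cores=2, hourly_cost=0.40)
    for category, cost in cost_breakdown(node).items():
        print(f"{category}: ${cost:.2f}/hour")
```

For downsizing, the unallocated figure is the one to watch: a node whose cost is mostly unallocated is a candidate for removal or for a smaller instance type.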

Using good Kubernetes cost management tools can also provide you this information at a glance so you can minimize the toil involved with a Prometheus setup. For example, Harness provides this visibility out of the box so that you can see your breakdown of utilized, idle, and unallocated costs by cluster. You can also see, at a workload level, how much memory, CPU, and storage are costing by the same metrics.

Figure: Harness CCM Cluster Visibility


Cluster downsizing, especially if you’re killing entire clusters, carries real risk and requires careful engineering to avoid service interruptions. Imagine if you had five people to do a job and then you cut the team down to two. How would that affect your ability to get things done? You have to be clever about how you work around those constraints while making sure you don’t drop the ball on the things that need to be done.

As with all infrastructure changes, before you kill unallocated capacity you’ll also need to confirm that nobody else is counting on it. Especially if you are doing this manually and without an autoscaler, it can be troublesome to spin resources up or down for new nodes or new clusters.

When It’s Most Useful

Downsizing is appropriate for organizations of every size. The beauty of Kubernetes is that you declare what you need to the control plane and it takes care of the result. However, waste tends to accumulate over time, and you’ll end up with resources that become “zombies”: provisioned at one time, but then left unused or forgotten.

Given the breakdown between cluster and workload managers, cluster-level optimizations like this are generally most appropriate for Infrastructure, CloudOps, and Platform teams, though of course any team that governs or manages the provisioning, usage, and operation of the clusters can use this.

Cluster downsizing is a great strategy for organizations at all levels and is one of the easiest strategies to implement and see strong cost savings. It’s recommended that you use an autoscaler to get the most out of this strategy.

Strategy Two: Workload Rightsizing

If downsizing is getting rid of unallocated resources, then rightsizing is the approach that minimizes idle resource costs. Instead of looking at which resources are completely unused, in this scenario you’re looking at which resources are being underutilized, usually at the pod level. This allows you to move workloads around and create a better profile for the compute resources you need to provision for the node.

Rightsizing often results in increased pod density, which better optimizes the use of resources across the node. To achieve this, you first need to understand historical usage or workload patterns. With this knowledge, you can see that if average CPU utilization is 40%, you probably don’t need the level of compute you initially provisioned, and you can change the configuration to use a compute resource with a smaller cost footprint.
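As a rough illustration of turning historical usage into a new CPU request, the sketch below takes a high percentile of observed usage and adds some headroom. The 95th percentile, the 15% headroom, and the function names are assumptions chosen for the example, not values prescribed by Kubernetes or by any particular tool.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct in 1..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_cpu_request(usage_millicores, pct=95, headroom_pct=15):
    """Suggest a CPU request in millicores from historical usage samples.

    Assumption: size for the pct-th percentile of usage, then add headroom_pct
    percent on top as a safety margin.
    """
    if not usage_millicores:
        raise ValueError("need at least one usage sample")
    peak = percentile(usage_millicores, pct)
    return math.ceil(peak * (100 + headroom_pct) / 100)


if __name__ == "__main__":
    # Hypothetical samples for a pod that currently requests 1000m of CPU
    # but hovers around 40% utilization.
    samples = [380, 400, 420, 390, 410, 450, 400, 395, 405, 415]
    print(f"recommended request: {recommend_cpu_request(samples)}m")
```

Sizing from a percentile rather than the average keeps short bursts from being starved, at the price of leaving a little more idle capacity.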

Figure: Cost Savings With Rightsizing

On the other side of that coin is making sure that workloads have the appropriate resources allocated to them. In the above example, what if it turns out your CPU utilization is 40%, but you consistently run out of memory? In this case, you can’t just get a smaller resource. You have to change your request and limit profile to lower the CPU allocation while raising the memory allocation. However, it’s more common that both CPU and memory are overprovisioned. In either case, you want to ensure you select the right worker nodes for the workloads that need to be handled.
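Selecting the right worker node for a CPU-light, memory-heavy profile can be sketched as picking the cheapest node type that satisfies both dimensions. The node names, sizes, and prices below are entirely hypothetical and exist only to show the selection logic.

```python
from typing import NamedTuple, Optional, Sequence


class NodeType(NamedTuple):
    name: str
    cpu_cores: float
    memory_gib: float
    hourly_cost: float


def cheapest_fitting_node(catalog: Sequence[NodeType],
                          cpu_cores: float,
                          memory_gib: float) -> Optional[NodeType]:
    """Return the cheapest node type covering both CPU and memory needs."""
    candidates = [n for n in catalog
                  if n.cpu_cores >= cpu_cores and n.memory_gib >= memory_gib]
    return min(candidates, key=lambda n: n.hourly_cost, default=None)


if __name__ == "__main__":
    # Hypothetical catalog; real instance types and prices will differ.
    catalog = [
        NodeType("small", 2, 4, 0.05),
        NodeType("compute-heavy", 8, 8, 0.17),
        NodeType("memory-heavy", 4, 32, 0.13),
        NodeType("large", 8, 32, 0.20),
    ]
    # Low CPU need, high memory need: a smaller general-purpose node won't do.
    print(cheapest_fitting_node(catalog, cpu_cores=3, memory_gib=24))
```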

As with downsizing, you want to first get visibility into utilized, idle, and unallocated costs, which you can do by hooking into Prometheus (or similar tools) and visualizing your usage. For this strategy, you want to look at your idle costs. From there, you’ll want to decide what the best path forward is in terms of resizing your resources or moving workloads around. These basic steps will set you on the path towards minimizing your idle resources and being more cost-efficient.

Alternatively, a tool like Harness can dramatically reduce the effort required to find rightsizing opportunities, and then figure out what the right requests and limits should be. In this way, your historical data is leveraged to automatically generate recommended resource profiles, so all you need to do is go and make the change.

Figure: Cost Savings With Recommended Resources


The biggest risk to implementing a good rightsizing strategy is poor understanding of the data. If you don’t have good visibility into how your resources and workloads are performing relative to the requests and limits you’ve set, it becomes impossible to rightsize. Poor or no information can result in resources being severely underprovisioned and causing performance issues, or being severely overprovisioned and raising red flags around cost.

A key consideration that is easy to forget is application-level metrics, which matter in addition to the usual workload-level metrics. For example, for JVM-based microservices you should consider JVM heap sizes. Are you appropriately sizing for these needs, too?
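One way to sanity-check a JVM-based service is to compare the configured maximum heap plus an estimate of non-heap memory against the container’s memory limit. The sketch below is only an illustration: the function name is hypothetical, and the non-heap estimate (metaspace, thread stacks, and so on) is something you would need to measure for your own application.

```python
def jvm_memory_headroom_mib(memory_limit_mib: int,
                            max_heap_mib: int,
                            non_heap_estimate_mib: int) -> int:
    """Memory left under the container limit after heap and non-heap usage.

    A negative result suggests the JVM could exceed the container's memory
    limit before its heap is full, so a rightsized limit must account for it.
    """
    return memory_limit_mib - (max_heap_mib + non_heap_estimate_mib)


if __name__ == "__main__":
    # Hypothetical values: 2 GiB limit, 1.5 GiB max heap, ~400 MiB non-heap.
    print(jvm_memory_headroom_mib(2048, 1536, 400), "MiB of headroom")
    # Shrinking the limit to match observed usage without touching the heap
    # setting can leave no room at all:
    print(jvm_memory_headroom_mib(1024, 1024, 256), "MiB of headroom")
```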

You’ll also want to make sure, as with downsizing, that you’re not stepping on anyone else’s toes when resources are shared across the Kubernetes environment.

When It’s Most Useful

Rightsizing can end up being very technically involved. As such, it’s most useful at organizations that have a good way to track their utilization metrics, which can be as simple as plugging into Prometheus to capture core infrastructure metrics. This strategy is good for teams that have the know-how to move workloads around without breaking things, and can effectively separate expected usage from the overhead safety net to ensure both performance and cost considerations can be met.

Given the breakdown between cluster and workload managers, workload-level optimizations like this are generally most appropriate for Application and DevOps teams, though of course any team that governs or manages the provisioning, usage, and operation of Kubernetes workloads can use this.

For teams that are looking to more optimally provision and use existing resources, rightsizing is a solid cost savings strategy. It’s recommended that you use an autoscaler (specifically Vertical Pod Autoscaler, in this case) to get the most out of this strategy, since autoscaling takes care of the macro-level optimizations, leaving you to focus on the more micro adjustments that come with rightsizing without too much additional overhead to cut through.

Strategy Three: Autoscaling

If you don’t have to keep track of and optimize your Kubernetes footprint manually, why not go for an automated approach? Autoscalers provide you the ability to specify the conditions under which more resources should be provisioned, or when resources should be terminated. In addition, you can set the floor and ceiling for resource provisioning so you don’t inadvertently do too much in either direction.

The Kubernetes ecosystem provides autoscalers for your workloads or pods (Horizontal Pod Autoscaler, or HPA; Vertical Pod Autoscaler, or VPA; and Kubernetes Event-driven Autoscaling, or KEDA, which is a separate open source project), as well as for your clusters or nodes (Cluster Autoscaler).

You’ll want to use workload or pod autoscaling through HPA, VPA, or KEDA when you want to autoscale workloads based on defined metrics. If usage for a pod or workload rises above the target metric, the workload is scaled up, either by running more pods (HPA, KEDA) or by raising resource requests and limits (VPA). Similarly, if usage is very low compared to the target metric, the workload is scaled down. Pod and workload scaling is bounded by the resources available on the nodes where the pods run, meaning if a node’s resources are at their limit, your autoscaling stops there.
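The Horizontal Pod Autoscaler’s core calculation, as described in the Kubernetes documentation, is desired replicas = ceil(current replicas × current metric / target metric), bounded by the configured minimum and maximum. The Python sketch below reproduces only that arithmetic; the function name and sample values are hypothetical, and the real controller also handles details such as not-yet-ready pods, missing metrics, and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA-style replica calculation with a floor and a ceiling."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The floor and ceiling keep scaling from going too far in either direction.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target, allowed range 2..10.
    print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
```

The `max_replicas` ceiling is what caps the bill when a metric spikes, so it is worth setting with cost in mind and not only performance.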

Autoscaling a cluster or node makes pod scaling more effective. While pod scaling affects the scaling and provisioning of resources within a cluster or a node, this kind of scaling effectively determines the overall amount of resources available to all pods and workloads. Where pod scaling makes the most of available resources, node scaling determines the amount and type of available resources. Cluster Autoscaler detects when pods are in a pending state (waiting for resources) and adds nodes so that those pending pods can be scheduled. It also detects the opposite, when nodes are no longer needed, and removes them.
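To give a feel for the scale-up decision, the sketch below estimates how many new nodes would be needed to place a set of pending pods, using a first-fit-decreasing packing on CPU requests alone. This is a deliberate simplification and not Cluster Autoscaler’s actual algorithm, which simulates scheduling against node groups and considers memory, affinity, and other constraints. The function name and numbers are hypothetical.

```python
def nodes_needed_for_pending(pending_millicores,
                             node_allocatable_millicores: int,
                             max_new_nodes: int) -> int:
    """Estimate new nodes required for pending pods (CPU only, first-fit decreasing)."""
    free = []  # remaining CPU on each node we have decided to add
    for request in sorted(pending_millicores, reverse=True):
        if request > node_allocatable_millicores:
            continue  # this pod can never fit on this node type
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] = remaining - request
                break
        else:
            if len(free) >= max_new_nodes:
                continue  # ceiling reached; the pod stays pending
            free.append(node_allocatable_millicores - request)
    return len(free)


if __name__ == "__main__":
    pending = [1500, 1000, 1000, 500, 2500]  # CPU requests of pending pods
    print(nodes_needed_for_pending(pending, node_allocatable_millicores=4000,
                                   max_new_nodes=5))
```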

Figure: Kubernetes Cost Savings With Autoscaling


Bad autoscaling policies are the bane of any Kubernetes cost savings effort. If limits are set incorrectly or conditions are met through strange edge cases, the result can be out-of-control autoscaling that causes a dramatic increase in costs. At the same time, autoscaling is a great litmus test, or even a leading indicator, of something going wrong in your assumptions. For example, sustained scaling that isn’t anomalous may signify user growth, while costs spiraling out of control suggest that your autoscaling automation is the first place to look.

When It’s Most Useful

Autoscaling is useful for any organization that needs some form of cloud automation. If you’re applying autoscaling policies in AWS, GCP, or Azure, chances are you’ll want to leverage autoscaling in Kubernetes. With the right limits set in place, and controls around cleaning up any issues that arise (such as cost snowballing or zombie resources), autoscaling is a tremendous step forward for any organization looking to more optimally spend money using Kubernetes.


We’ve seen Harness customers that have reduced their bill by as much as 80% using these Kubernetes cost saving strategies. While these strategies can be individually used to reduce costs in Kubernetes, they’re most effective when you can leverage all of them. The key thing to remember is that reducing costs is notoriously difficult without good visibility. As you can see in each of these strategies, having a good understanding of what’s going on is critical to maximizing the returns and mitigating the risk associated with any infrastructure change.

We have one last strategy to go over in our next post: cost forecasting. However, if you don’t want to wait for that post to go live, you can simply download the full eBook right now - it’s free and doesn’t require an email address! Download the Cost Management Strategies for Kubernetes eBook now and learn all about cost visibility, cost savings, and cost forecasting in one fell swoop!
