When was the last time you spun up a new workload in Kubernetes and set the resource requests and limits high enough that performance would never become a problem? Now, when was the last time you went back to see whether you could make that resource profile more efficient, based on actual usage? Have you ever been told you’re spending too much money on cloud resources because of this?
Chances are you do the first part fairly regularly, and the second only when someone tells you to make it more efficient because it’s costing too much. That’s a common story for engineering teams, who are the primary consumers of cloud resources and who are tasked with finding and resolving cost issues. That work is piled on top of the development queue that already exists! It can be a real hassle that takes days or even weeks to resolve.
The good news is that there are ways to make doing this less of a drag, even if cloud cost issues are a reality that will never fully go away.
The core of Kubernetes cost concerns lies in the rise of the DevOps paradigm. In no way is DevOps a bad thing. In fact, it’s the reason you’re able to move so quickly and deliver innovation to customers faster than ever before. As part of this efficiency in software delivery, the use of Kubernetes in development and deployment is here to stay, because it simplifies the deployment, management, and scaling of applications.
On the other side of that coin, however, are the costs associated with moving fast. While infrastructure is being optimized for a high-quality end user experience, it can come with a hefty price tag. This is going to be a problem whether you’re on-prem, or in AWS, GCP, Azure, or any other Kubernetes-supported platform. Getting those costs under control is important, so let’s start by exploring the issues in Kubernetes cost management today.
Speed of delivery and level of performance often stand atop the list of priorities for engineering teams. On many occasions, cost will not be a concern until the organization or the budget owner encounters bill shock. At that point, it will become a mad dash to understand why costs ballooned and what we can do about them.
The problem with that approach is that it often turns into a one-time exercise to understand costs and, potentially, to get them to an acceptable level. At the end of the day, cloud cost management still isn’t a priority for engineers the way application performance management is. Until this FinOps gap is resolved, engineers will prioritize performant applications and code over cloud cost optimization any day.
When you’re getting started with Kubernetes, it’s common to overestimate resources. This is by far the most common contributor to Kubernetes costs. It’s also the lowest-hanging fruit as far as cost management is concerned. Because engineers prioritize performance, they opt to set resource requests and resource limits above what they think they’ll need. This is to ensure performant applications and a good end user experience. Imagine if engineering teams did the opposite and ended up under-provisioning their resources. That’s not a good outcome for either end users or the organization.
At the same time, guessing what resources the application or service will need is a tall task, and sometimes impossible. In lieu of having accurate utilization metrics before the service has gone live, engineers do what makes the most sense and make large resource requests. The good news is that once it’s in production, it’s easier to see how this can be scaled down.
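As a rough illustration of scaling a request down once production data exists, the sketch below derives a suggested request from observed usage. The function name `recommend_request`, the 95th-percentile baseline, the 15% headroom, and the sample values are all assumptions chosen for the example, not a recommendation from any particular tool.

```python
def recommend_request(usage_samples, percentile=95, headroom_pct=15):
    """Suggest a resource request from observed usage samples.

    Uses the nearest-rank percentile of the samples, then adds headroom.
    Integer arithmetic keeps the result exact for millicore inputs.
    """
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: ceil(percentile/100 * n), at least 1.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    baseline = ordered[rank - 1]
    # Add headroom and round up to a whole unit.
    return (baseline * (100 + headroom_pct) + 99) // 100


# Hypothetical CPU usage in millicores, sampled over time for one container.
cpu_millicores = [180, 220, 250, 210, 900, 240, 230, 260, 200, 215]
current_request = 2000  # what was guessed before launch (hypothetical)

suggested = recommend_request(cpu_millicores)
print(f"current request: {current_request}m, suggested: {suggested}m")
```

Even with the usage spike included in the baseline, the suggested request lands well under the original guess. A longer sampling window and a review by the service owner would still be needed before changing anything in production.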
Autoscalers are powerful tools, and the Kubernetes ecosystem offers four common ones: Horizontal Pod Autoscaler, Vertical Pod Autoscaler, Kubernetes Event-Driven Autoscaling (KEDA), and Cluster Autoscaler. The first three help you autoscale your workloads or pods, and the fourth deals with autoscaling the nodes in your clusters.
In part because of the infrastructure management complexities that Kubernetes abstracts away, it’s easy to set an autoscaling policy and just let it run. Without more granular optimizations such as at the workload level or rightsizing compute instances, Kubernetes is happy to spin up more AWS EC2 instances, for example, as soon as resource needs are not being met. This can very often result in unused or idle capacity.
Bad autoscaling policies are the bane of any Kubernetes optimization effort. If limits are set incorrectly or conditions are met through strange edge cases, it can result in out-of-control autoscaling that causes a dramatic increase in costs. At the same time, autoscaling is a great litmus test of something going wrong in your assumptions. For example, you might see rampant non-anomalous scaling that signifies user growth, or costs that spiral out of control, and that automation might be the first place to look.
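To make the effect of a bad policy concrete, here is a simplified sketch of the scaling rule documented for the Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of the observed metric to the target, rounded up. The sketch ignores the tolerance band, stabilization windows, and pod readiness handling that the real controller applies, and the function name and numbers are illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Simplified HPA rule: ceil(current * observed / target), then clamp."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas averaging 90% CPU against a 50% target: scale to 8.
print(desired_replicas(4, 90, 50, min_replicas=1, max_replicas=20))

# The same load with a sensible ceiling of 6 replicas is capped at 6.
print(desired_replicas(4, 90, 50, min_replicas=1, max_replicas=6))

# A target set far too low (10%) with a loose ceiling asks for 36 replicas.
print(desired_replicas(4, 90, 10, min_replicas=1, max_replicas=100))
```

The third call shows how a single mis-set target, combined with a generous maximum, multiplies the replica count (and the bill) without any change in real load.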
How can you manage costs if you don’t even know what costs exist, and where they’re coming from? This seems like an obvious thing to say, but it’s difficult to do well in practice, especially when infrastructure resources change all the time.
You want to be able to reasonably attribute costs to their origins, otherwise called cost allocation. Depending on the context of the business, this could be down to the level of the developer, project, application, service, business unit, or anything else that makes sense for your organization’s needs. In Kubernetes, this can be achieved by looking at clusters, nodes, namespaces, workloads, and pods.
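One simple way to picture cost allocation is to split a node’s cost across namespaces in proportion to the CPU each pod requests, leaving the remainder as unallocated. The sketch below assumes exactly that model; the field names, the `__unallocated__` key, and the cost figures are hypothetical, and real tools typically also weigh memory, storage, and actual usage.

```python
from collections import defaultdict


def allocate_node_cost(node_cost, node_cpu_millicores, pods):
    """Split one node's cost across namespaces by requested CPU share."""
    costs = defaultdict(float)
    requested = 0
    for pod in pods:
        share = pod["cpu_request_m"] / node_cpu_millicores
        costs[pod["namespace"]] += node_cost * share
        requested += pod["cpu_request_m"]
    # Capacity nobody requested still costs money; track it explicitly.
    unrequested = node_cpu_millicores - requested
    costs["__unallocated__"] = node_cost * unrequested / node_cpu_millicores
    return dict(costs)


# Hypothetical pods scheduled on one node with 8 CPU cores (8000m).
pods = [
    {"namespace": "checkout", "name": "api", "cpu_request_m": 2000},
    {"namespace": "checkout", "name": "worker", "cpu_request_m": 2000},
    {"namespace": "search", "name": "indexer", "cpu_request_m": 2000},
]

# Assumed daily node cost of 80.0 in whatever currency the bill uses.
print(allocate_node_cost(80.0, 8000, pods))
```

Summing the same result over every node, and rolling namespaces up to teams or business units, gives the higher-level views described above.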
Saying that you need visibility is one thing. Getting the right kind of visibility to the right people is a layer of complexity added into that. While cost allocation is important, providing stakeholders with an accurate view of which resources are being used and how is critical.
Rather than telling someone they need to spin down a specific cluster, it’s more valuable to empower them to see how they’re using their resources, why they should scale down, and what they should do to be efficient. This additionally allows organizations to create accountability and shift down when they provide appropriate context.
How do you do any kind of Kubernetes cost management without the proper tools? Sure, you can ingest all of the data yourself and create your own solution. But will that bespoke solution be easy to modify when your needs change?
The point of tools is to create a sort of standard around the basic needs for a given problem, and there isn’t a plethora of good tools out there today that do this for Kubernetes cost management. The de facto approach is to use Kubernetes APIs to hook into Prometheus and pair it with something like Grafana to visualize and understand costs better. This works, but it can also be a hassle, and things are bound to get missed. Think also of how you’d take those cost metrics from Prometheus and turn them into legitimate cost savings opportunities. It seems like a simple two-step process, but each step comes with its own set of complexities, making it difficult to manage Kubernetes costs efficiently. If you can’t efficiently manage costs, how can you be efficient with your resource usage?
Analysis is the first step to proper management of Kubernetes costs. If you can understand what is going on with costs, you can paint a picture of what it could look like if you undertook the effort of optimization. Let’s take a look at a few ways to analyze costs; you might even be able to consider these sequential.
Labels are exactly like tags you might find in your cloud resources. They’re just called labels in Kubernetes. Labeling the resources that you’re using makes it easier to find them later, and to associate them with any logical grouping of resources that makes sense down the line.
By implementing a solid labeling strategy that maintains good coverage across your Kubernetes resource fleet, you’ll be able to slice and dice all of your resources by any context that is relevant. That can be from a high-level view, down to the most granular cost allocation use case. However, this is much easier said than done because it’s not just you, it’s your whole team, or even organization, that needs to participate to get good use out of labeling. That said, even with a minimal labeling practice, you’ll be able to understand and act on costs better than if you had nothing.
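The sketch below shows why even partial labeling is useful: costs can be grouped by any label key, and anything missing the label is collected in its own bucket so the coverage gap stays visible. The workload records, the `team` label key, and the cost values are hypothetical.

```python
def cost_by_label(workloads, label_key):
    """Total monthly cost per value of one label; unlabeled is its own bucket."""
    totals = {}
    for workload in workloads:
        value = workload["labels"].get(label_key, "<unlabeled>")
        totals[value] = totals.get(value, 0) + workload["monthly_cost"]
    return totals


def label_coverage(workloads, label_key):
    """Fraction of workloads that carry the given label."""
    labeled = sum(1 for w in workloads if label_key in w["labels"])
    return labeled / len(workloads)


# Hypothetical workloads with an assumed monthly cost for each.
workloads = [
    {"name": "api", "labels": {"team": "payments", "env": "prod"}, "monthly_cost": 400},
    {"name": "ledger", "labels": {"team": "payments", "env": "prod"}, "monthly_cost": 250},
    {"name": "indexer", "labels": {"team": "search"}, "monthly_cost": 300},
    {"name": "legacy-cron", "labels": {}, "monthly_cost": 150},
]

print(cost_by_label(workloads, "team"))
print(f"team label coverage: {label_coverage(workloads, 'team'):.0%}")
```

The size of the `<unlabeled>` bucket is a handy measure of how much spend still cannot be attributed to anyone.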
Labeling is great, and so is the open-source Prometheus. But if that’s all you have, you’re not seeing your costs. Prometheus is great for setting up monitoring of your Kubernetes services, but not so much for creating visualizations and dashboards.
Visualization does wonders when it comes to figuring out what’s going on. Something you may not see through a command line output or tabular view often becomes glaringly obvious when you render it in a graph. Analysis and understanding of costs both become easier with good visualizations.
Remember, too, that this view is not just for you, but also for those who have little context into your day-to-day and may have questions. It’s not just about creating good views for yourself, it’s about creating good visibility throughout the organization to do some great analysis and optimization.
This is the low-hanging fruit! Whether you’re working with Kubernetes APIs and Prometheus to pull this data, or creating a great visualization with Grafana or otherwise, figuring out where there’s straight wastage is a prime opportunity to manage costs, and requires less analysis and validation than other savings methods.
Idle and unutilized resources are often the result of over-provisioned infrastructure built to optimize performance with no risk of downtime or issues to the end user experience. Because engineers prioritize performance, they will over-allocate resources for an application or service and then never come back to it. However, this translates into resources that end up idle, underutilized, or unallocated, and these still cost money. In the example below, almost 75% of all Kubernetes resources are not being actively used. It would be monumentally risky to cut all of that out. But the utilization numbers can certainly be brought up, thus bringing down cost.
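The three categories can be separated with simple arithmetic, sketched here under the assumption that utilized means actually consumed, idle means requested but not consumed, and unallocated means capacity that no workload has requested. The function name and the numbers are hypothetical and are not taken from the example referenced above.

```python
def capacity_breakdown(capacity, requested, used):
    """Split capacity into utilized, idle, and unallocated fractions."""
    utilized = used / capacity
    # Idle: reserved by requests but not consumed (never negative).
    idle = max(0, requested - used) / capacity
    # Unallocated: capacity that no workload has requested.
    unallocated = max(0, capacity - requested) / capacity
    return {"utilized": utilized, "idle": idle, "unallocated": unallocated}


# Hypothetical cluster: 16 cores of capacity, 10 requested, 4 actually used.
breakdown = capacity_breakdown(capacity=16000, requested=10000, used=4000)
for category, fraction in breakdown.items():
    print(f"{category}: {fraction:.1%}")
```

Idle capacity is usually addressed by rightsizing requests, while unallocated capacity points to nodes or clusters that could be downsized.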
Finding opportunities to downsize entire clusters or nodes, or to rightsize individual workloads, takes a little more work. First, you have to find the workloads and clusters that have wastage. Then, you have to validate whether those are real opportunities to cut cost or whether it’s a healthy amount of overage. Finally, you can determine how much you can actually cut out to keep the application or service performant while keeping an optimal cost profile.
While this may take time, you might recognize it as a common exercise across organizations to find ways to create more efficient cost profiles. Does a workload really need to request 7000m CPU and 20Gi memory? Are there smaller AWS EC2 instances more optimal for the node? Will the instance perform better if it’s GPU-optimized instead of CPU-optimized, or do we need to be memory-optimized? What about shared AWS S3 storage buckets across nodes?
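Questions like these come down to comparing what a workload requests against the node shapes available. The sketch below parses Kubernetes-style quantities such as `7000m` and `20Gi` and picks the cheapest shape that fits. The shape names and hourly costs are made up for illustration, only a subset of quantity suffixes is handled, and the capacity that a node reserves for system components is ignored.

```python
MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '7000m' or '4' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory(quantity):
    """Convert a memory quantity such as '20Gi' to bytes (binary suffixes only)."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)


def cheapest_fit(cpu, memory, shapes):
    """Return the name of the cheapest shape covering the request, or None."""
    need_cpu, need_mem = parse_cpu(cpu), parse_memory(memory)
    fits = [s for s in shapes
            if parse_cpu(s["cpu"]) >= need_cpu
            and parse_memory(s["memory"]) >= need_mem]
    if not fits:
        return None
    return min(fits, key=lambda s: s["hourly_cost"])["name"]


# Hypothetical node shapes and prices, not real instance types.
shapes = [
    {"name": "small", "cpu": "4", "memory": "16Gi", "hourly_cost": 0.20},
    {"name": "medium", "cpu": "8", "memory": "32Gi", "hourly_cost": 0.40},
    {"name": "large", "cpu": "16", "memory": "64Gi", "hourly_cost": 0.80},
]

print(cheapest_fit("7000m", "20Gi", shapes))  # the original request
print(cheapest_fit("2500m", "6Gi", shapes))   # a rightsized request
```

In this made-up catalog, rightsizing the request moves the workload from the medium shape to the small one, which is where the savings come from.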
Of the Kubernetes cost analysis and management tools out there (Harness Cloud Cost Management, Kubecost), many offer free trials that you can easily sign up for to visualize, analyze, and optimize costs right away. Obviously, this will likely be a one-time solution because of the nature of free trials, but these tools will give you an idea of what you’re doing. They’ll also give you a taste of what an optimized Kubernetes cluster might look like, if not your whole Kubernetes environment.
For example, with a free trial of Harness Cloud Cost Management, you can get out-of-the-box visibility into cluster, namespace, workload, node, application, service, and environment, all without tagging or labeling. You also get insight into your utilized, idle, and unallocated resources, as well as recommendations built in to optimize your workloads based on historical usage data. Tools like Harness Cloud Cost Management remove the toil and monotony from analyzing and optimizing Kubernetes environments.
At Harness, we believe that Kubernetes cost analysis and management should be a first-class citizen. Just as a plethora of tools exist to simplify cloud cost management, Kubernetes deserves the same, given how ubiquitous it has become. Harness Cloud Cost Management provides a differentiated view of costs that focuses on analyzing, managing, and optimizing Kubernetes environments for engineering teams. In fact, it’s the first thing we built!
We believe that this analysis and management come through solving for three distinct use cases, and we’ve built to support that:
Without all three of these use cases solved for, what’s left is an incomplete picture of what Kubernetes costs look like, and what can be done to create a more cost-efficient infrastructure.
As you’ve explored Kubernetes cost analysis, you’ve also been seeing screenshots of Harness Cloud Cost Management and its dashboard visualizations in action, demonstrating just how it fits into the picture.
Managing Kubernetes costs should not be difficult. Engineering teams should not have to deal with additional toil to understand and optimize their infrastructure on Kubernetes. Organizations should not be flying blind to their costs or struggle to plan, predict, and optimize costs. That’s what Harness Cloud Cost Management is built to solve.
Whether you’re having your first experience with bill shock, or you’re a seasoned professional, managing Kubernetes costs is important. There are easier ways to do it. With Harness Cloud Cost Management, you can empower yourself and your organization with the ability to contextualize and analyze costs at any level, and simplify the effort required to find and implement optimizations.
Harness is a software delivery platform built with the needs of engineering teams in mind, making it easy for them to get visibility into their usage and take control of their optimizations. Whether it’s answering a question about why something is set up the way it is, or validating optimization opportunities, Harness has your back.
When you’re ready to simplify Kubernetes cost management, read our eBook and get one step closer to cloud bill peace of mind.