This is a guest post by Harness customer Chris Camire - Senior Manager, Technical Services, Tyler Technologies
When you host cloud infrastructure for your clients across the globe, cloud costs can easily spiral out of control, especially when client non-production environments are fully operational every hour of the day. At Tyler Technologies, that was certainly true for our Enterprise Permitting & Licensing solution. Tyler Technologies is the largest software provider in the U.S. that is solely focused on the public sector; its Enterprise Permitting & Licensing solution is used by government agencies to automate and streamline their community development and business management operations.
We originally tried to tackle this problem with AWS CloudWatch, using alarms to power down non-production infrastructure outside of working hours. We knew this wasn’t a long-term solution; our clients often work late hours and need their non-production environments available on demand. Conversely, we also had the challenge that some environments weren’t used daily, or even weekly, but could be needed at a moment’s notice.
Ultimately, we decided to adopt Harness Cloud AutoStopping™ to help us get control of our idle cloud resources. It’s been a massive success, and we’ve gotten a great deal of benefit from it. You can read more about that in the case study we did with Harness.
This blog doesn’t focus on how we’re using Cloud AutoStopping though, because we realized that before we created a single Cloud AutoStopping rule, we needed to take a good look at how our infrastructure was organized in order to really maximize cloud cost savings.
Organic Growth Leads to Infrastructure Sprawl
Each of our clients is provided with a set of non-production environments, on non-production infrastructure, organized into ‘pods’. These environments are used by our clients to test our software before deploying into their production environments. Given the regulatory nature of the permitting and licensing services we provide to government agencies, ensuring that new features or bug fixes are fully tested and vetted before pushing to production is critical.
As our client base grew over the years, we continued to build additional pods to meet demand, and ensure clients had secure, reliable environments to test in. What we didn’t do was create a strategy for how we organized our clients across that cloud infrastructure.
What we ended up with was clients in very different time zones hosted in the same pod, which meant that, from one end of the country to the other, there would always be a client in that pod that needed access to their environments.
Being Intentional About Organizing Shared Infrastructure
As we thought about reorganizing our infrastructure for cloud cost savings, we started with one key goal: organize in such a way that Cloud AutoStopping could stop the instances in our pods as much as possible. We’re taking a two-step approach to this: first organizing by client time zone, and then by client activity.
Organizing by Time Zone
Categorizing our clients by time zone was the obvious first choice, since it would group clients that start and stop their workdays at roughly the same time. Given that our client base consists solely of public agencies, we already knew which states and time zones those agencies worked in. We got to work and started migrating the underlying applications into new ‘pods’ that were designated by time zone.
This new organization has definitely helped us optimize the efficiency of our pods; they are now generally fully idle at the same time, enabling Cloud AutoStopping to power down these instances until they are needed again. When a client needs their environment, they access it just as they normally would. Cloud AutoStopping detects the incoming traffic and auto-starts the instances, and the client is off and running.
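To make the time zone effect concrete, here is a minimal Python sketch that estimates how many hours a day a pod could sit fully idle. Everything in it is an illustrative assumption rather than our real data or tooling: the client names, the UTC offsets, and the 8:00 to 18:00 local workday are hypothetical, and it ignores weekends, daylight saving time, and after-hours use. Cloud AutoStopping itself reacts to actual traffic, not to a schedule like this; the sketch only shows why grouping matters.

```python
from collections import defaultdict

# Hypothetical client records: (client name, UTC offset in hours).
# Names, offsets, and working hours are illustrative assumptions.
CLIENTS = [
    ("agency-a", -5),   # Eastern
    ("agency-b", -5),
    ("agency-c", -8),   # Pacific
    ("agency-d", -8),
    ("agency-e", -10),  # Hawaii
]

WORK_START, WORK_END = 8, 18  # assumed local workday, end hour exclusive


def active_utc_hours(utc_offset):
    """Return the set of UTC hours (0-23) during which a client is working."""
    return {(local_hour - utc_offset) % 24
            for local_hour in range(WORK_START, WORK_END)}


def pod_idle_hours(offsets):
    """Hours per day in which no client in the pod is working."""
    busy = set()
    for offset in offsets:
        busy |= active_utc_hours(offset)
    return 24 - len(busy)


def group_by_time_zone(clients):
    """Assign clients to pods keyed by UTC offset."""
    pods = defaultdict(list)
    for name, offset in clients:
        pods[offset].append(name)
    return dict(pods)


if __name__ == "__main__":
    mixed = [offset for _, offset in CLIENTS]
    print("mixed pod idle hours/day:", pod_idle_hours(mixed))
    for offset, names in group_by_time_zone(CLIENTS).items():
        print(f"UTC{offset:+d} pod {names}: idle hours/day =",
              pod_idle_hours([offset]))
```

Under these assumptions, a single pod mixing Eastern, Pacific, and Hawaii clients has 9 hours a day with nobody working, while each single-time-zone pod has 14.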
The results have been amazing since we started the process 6 months ago. Our cloud cost savings have climbed steeply as we’ve configured Cloud AutoStopping on more pods: we initially saved $15K to $20K a month, and last month alone we passed the $100K milestone.
Organizing by Activity
When we started the process 6 months ago, trying to organize our clients by their activity level was nowhere on our radar. We didn’t have any real visibility into that level of our clients' usage. But as we’ve continued to roll out Cloud AutoStopping across our infrastructure, it has become clear just how much some clients are (and aren’t) using their environments based on the idle / active times we see in the Harness Cloud AutoStopping console.
The processes and procedures that government agencies have in place to test new software patches or releases can vary greatly; some are testing or training every day, while others do so much less often, monthly or even quarterly. That means we have environments that are idle for days or weeks at a time. We realized that if we group these environments together, we can get to the point where pods could be powered down for much longer periods of time.
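The grouping idea can be sketched in a few lines of Python. This is a hedged illustration only: the usage numbers, the 90-day window, the tier names, and the thresholds are all hypothetical, and the active-day counts would have to be gathered by hand from whatever idle / active data is available, since the sketch does not call any Harness API.

```python
from collections import defaultdict

# Hypothetical usage data: number of days in the last 90 on which each
# client's environment was active. Values are illustrative assumptions.
ACTIVE_DAYS_LAST_90 = {
    "agency-a": 62,
    "agency-b": 58,
    "agency-c": 9,
    "agency-d": 3,
    "agency-e": 1,
}


def activity_tier(active_days, window_days=90):
    """Classify a client by the share of days its environment was used.

    The 50% and 5% thresholds are arbitrary example cut-offs.
    """
    share = active_days / window_days
    if share >= 0.5:
        return "daily"
    if share >= 0.05:
        return "monthly"
    return "quarterly"


def group_by_activity(usage):
    """Group clients into candidate pods keyed by activity tier."""
    pods = defaultdict(list)
    for client, active_days in sorted(usage.items()):
        pods[activity_tier(active_days)].append(client)
    return dict(pods)


if __name__ == "__main__":
    for tier, clients in group_by_activity(ACTIVE_DAYS_LAST_90).items():
        print(f"{tier}: {clients}")
```

With this sample data, the two rarely used environments land in the same candidate pod, which is the kind of pod that could stay powered down for weeks at a time.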
We’re working with Harness to get a little more intelligence here to help us be more efficient with this categorization, but we’ve already started the reorganization with the data we have in hand today.
For now, I’m focusing on getting as close to 100% coverage for Cloud AutoStopping rules as I can, since it has such a massive, ongoing positive impact on our bottom line. Then I’ll start exploring other features in more depth, like Cloud Asset Governance and implementing recommendations.
It’s been a great journey for us so far, and you can read more about it in our case study.
Editor’s Note: This is one of a new series of blogs focused on how Harness Cloud Cost Management users are “Architecting for Cost Savings”, highlighting their experiences where cloud architectural decisions and changes have positively impacted cloud costs.