If you’ve only recently come to feature flags, you may not know that the concept has existed for quite a long time and can easily be seen as an evolution of app config systems. But first things first - if you need a refresher on feature flags, head on over to our “What Are Feature Flags?” article now.
One of the least used - but most powerful - use cases for kill switches built on feature flags is supporting Ops and SRE teams. Plenty of folks never think about this potential, but it may be the way in which feature flags most transform your engineering organization. Kill switches are a great way to increase velocity and reduce risk in software development.
Kill switches create tremendous potential, both as controls over granular capabilities or features within an application and as long-lived operational switches that affect the entire application. And while feature flags are intuitively thought of as affecting frontend or client-facing applications, they can be used just as well on the backend. Let’s explore how you can leverage kill switches to change how your team delivers software.
Consider this the base case of implementing kill switches using feature flags. The concept is simple enough: you can rig any and all features and changes pushed to production (or any environment) with a toggle, or a kill switch. Feature flags, by nature, allow teams to isolate features within their applications so that they can be handled individually instead of as a deployment bundle. Kill switches themselves are the way in which teams can create a control to disable code in production at any time.
This basic kill switch functionality gives teams the ability to turn off a feature when it’s not ready. “Not ready” can mean two things in prod: the feature is still incomplete, or the feature turned out to be broken once it went live.
The first condition for not being ready is what slows down many teams during the deployment process. In particular, when many teams work on different features across different applications, integration and deployment take a long time. Until everything is rolled in, both the release teams and the software development teams that built the features find themselves slowed down.
Using feature flags as kill switches in this context allows teams to push new features to production even while they are incomplete, which can result in faster development cycles. The code simply needs to be pushed live with the kill switch activated (feature turned off) so that it’s not actually impacting anything.
Here’s a developer favorite: it worked in all the pre-production environments, and then a bug was discovered that’s now impacting production. Without the use of a feature flag, that broken feature impacting prod would now necessitate a complete rollback of the deployment - that’s right, all of the good features have to be pulled off the shelves too and production is switched back to the last working version.
On the other hand, rigging each new feature or change with a feature flag enables teams to hit the kill switch as soon as an issue is found. Now, all of the features that work get to stay in prod, while the broken features are isolated out and handled as required.
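To make the base case concrete, here is a minimal sketch of two features shipped in the same deployment, each behind its own flag. Everything in it is hypothetical: the `FLAGS` dictionary stands in for whatever flag service you use, and the flag and function names are invented for illustration rather than taken from any particular SDK.

```python
from typing import Dict, List

# Hypothetical in-memory flag store. A real system would read flag state
# from a flag service so it can change without a redeploy.
FLAGS: Dict[str, bool] = {"new_checkout": True, "recommendations": True}


def is_enabled(flag: str) -> bool:
    # Unknown flags default to off, so a missing flag never exposes a feature.
    return FLAGS.get(flag, False)


def kill(flag: str) -> None:
    # The kill switch: turn one feature off without touching the others.
    FLAGS[flag] = False


def checkout(cart_total: float) -> str:
    # The new code path ships to production but only runs while the flag is on.
    if is_enabled("new_checkout"):
        return f"new checkout flow: {cart_total:.2f}"
    return f"legacy checkout flow: {cart_total:.2f}"


def recommendations() -> List[str]:
    if is_enabled("recommendations"):
        return ["item-a", "item-b"]
    return []
```

Calling `kill("new_checkout")` sends checkout traffic back to the legacy path while `recommendations()` keeps working, which is the isolation described above: the broken feature goes dark and the rest of the deployment stays in prod.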
What happens when a severe issue occurs in production because of feature releases that are broken? Or when some infrastructure failure cascades across all of production? As it turns out, kill switches can be used to mitigate these incidents and improve the MTTR (mean time to resolution) for the teams responsible. Talk about stress management.
In scenario 1, a broken new feature is causing an incident in production. The good news is, if that new feature was rigged up to a feature flag, the kill switch could be triggered and that feature could be turned off in production right away, resulting in minimal impact. Not only is the feature no longer affecting production, but there is also suddenly no massive response team that needs to be assembled to handle the issue, which would typically mean the whole deployment had to be pulled. Instead, the feature is turned off, the team responsible for the feature is on the hook to fix it, and the one feature fix can be rolled forward when it’s ready.
In scenario 2, an infrastructure failure is just one example of what could go wrong; production outages can happen due to a variety of issues. Oftentimes, Ops teams will have fallback options or “just turn it all off” runbooks in place, and kill switches can be used here as well. If the cause is a new infrastructure change, that whole change can be wrapped in a feature flag, and triggering the kill switch will fail over to the old setup when an issue occurs. Some teams also create long-lived operational flags that can, say, put a whole site or application in “maintenance mode,” effectively triggering a kill switch for the whole application. This can be useful in P0 scenarios where things are completely broken and the preference is to not allow any access.
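Both operational patterns from scenario 2 can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed design: the `OpsFlags` class, the flag names, and the "new-cache"/"old-cache" backends are all made up for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class OpsFlags:
    # Long-lived operational switch for the whole application (assumed name).
    maintenance_mode: bool = False
    # Flag wrapping a new infrastructure change (assumed name).
    use_new_cache_cluster: bool = True


def handle_request(path: str, flags: OpsFlags) -> Tuple[int, str]:
    # Application-wide kill switch: refuse all traffic while it is on.
    if flags.maintenance_mode:
        return 503, "Down for maintenance"
    # Infrastructure kill switch: turning the flag off fails over to the old setup.
    backend = "new-cache" if flags.use_new_cache_cluster else "old-cache"
    return 200, f"{path} served via {backend}"
```

Setting `use_new_cache_cluster` to `False` routes requests to the old backend, and setting `maintenance_mode` to `True` returns a 503 for every path. Checking the application-wide switch first means it takes effect regardless of the state of any other flag.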
Let’s say your business is an e-commerce company. During the holidays, you experience high load relative to the rest of the year. Now more than ever, the business doesn’t want to have downtime or have customers experience issues! Turns out feature flagging is an incredibly low-cost way to solve this problem.
One approach to solve this problem could be to simply turn off some features that will reduce the load, and this is a great place to use a kill switch, but it might not be the most effective. Another solution might be to kill certain load balancers or servers that are useful in non-peak times, and cut over to infrastructure that’s designed specifically for high load.
At a previous job, we had log data that needed to move from the short-term cache into storage, so we had a system to collect and store it. During peak loads or other periods of instability (particularly upstream with AWS), we would often see this storage become either very expensive or very unreliable. So, we would turn this service off during these periods, but this was manual and involved a series of AWS commands and database changes to disable it and to re-enable it later on. By wiring this up to a feature flag, we were able to turn this feature off and on instantly in a much more lightweight, visible, and easy-to-govern way.
One of the most common use cases for teams doing feature flagging is testing in production. In short, this is where a specific feature (or set of features) needs to be tested against real production data. After all, pre-prod environments can only test so many things! There are two use cases that we see here:
The use of feature flags as kill switches here is a pretty simple one: have it live until the test has been run, and then turn it off. It’s also possible that a feature will fail during testing and need to be turned off to minimize impact. You can dig deeper into testing in production in this dedicated blog.
Here, let’s assume a basic knowledge of progressive delivery. It’s becoming an increasingly popular feature release methodology, very similar to the methodology used for canary releases in Continuous Delivery.
With progressive delivery, the need for a kill switch changes slightly. Yes, the main use case is still turning things off when they break or aren’t needed, but here, we have to layer in automation. By its nature, progressive delivery is something that teams run in an automated fashion. To do that, they need to pre-define in which scenarios flags are turned on, for whom, and under what conditions more users get access. Conversely, they must define failure criteria in which the kill switch is triggered and new features are turned off, or the user base is shrunk.
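As a rough sketch of what automated criteria can look like, the example below widens a rollout on each healthy check and trips the kill switch when an error-rate threshold is crossed. The `Rollout` class, the 2% threshold, and the step size are assumptions chosen for illustration. This is not how the Harness Pipeline implements it.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Rollout:
    flag: str
    percent: int = 5               # share of users who currently get the feature
    step: int = 20                 # how much to widen after each healthy check
    max_error_rate: float = 0.02   # failure criterion (assumed threshold)
    killed: bool = False

    def is_enabled_for(self, user_id: str) -> bool:
        if self.killed:
            return False
        # Hash the user into a stable bucket from 0 to 99 so the same user
        # keeps the same experience as the percentage grows.
        digest = hashlib.sha256(f"{self.flag}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < self.percent

    def evaluate(self, error_rate: float) -> None:
        # Called by automation after each monitoring window.
        if self.killed:
            return
        if error_rate > self.max_error_rate:
            # Failure criterion met: trigger the kill switch.
            self.killed = True
            self.percent = 0
        else:
            # Healthy: give more users access, capped at 100%.
            self.percent = min(100, self.percent + self.step)
```

A gentler variant would shrink `percent` instead of setting `killed`, which corresponds to shrinking the user base rather than turning the feature off outright.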
In Harness Feature Flags, we automate progressive delivery via the Pipeline. Teams can schedule releases, mandate approvals, integrate with plugins, create trigger events, and templatize rollouts - and then automate it.
Personalization is a hot topic. Kill switches using feature flags are of huge value in being able to do this well at scale. We can look at three distinct scenarios in which you can use kill switches to improve your ability to personalize experiences for customers.
When it comes to regulation, it’s a non-negotiable item to be able to “personalize” something like a mobile app release to comply with the laws in various countries, or to meet the needs of clients in specific verticals (e.g. government, finance). The most common of these is data privacy - specifically, complying with GDPR.
Considering this, it can be a mess showing a button in one place but not another. Wiring key features up as permanent feature flags allows you to easily turn something on for all your North American users, but not for your European users, where GDPR compliance is mandatory. It’s not just a matter of creating a kill switch that uses the user’s context to turn off data collection in Europe; it’s also being able to kill data collection for any user who decides to opt out. That’s much easier than maintaining a configuration file that tries to solve for all of the various scenarios.
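Here is a small sketch of such a context-aware kill switch. The region codes, the `UserContext` fields, and the rule itself are simplified assumptions for illustration; actual compliance logic depends on your legal requirements.

```python
from dataclasses import dataclass

# Assumed, illustrative region codes where data collection is switched off.
RESTRICTED_REGIONS = {"EU"}


@dataclass
class UserContext:
    user_id: str
    region: str
    opted_out: bool = False


def data_collection_enabled(user: UserContext) -> bool:
    # A per-user opt-out kills collection no matter where the user is.
    if user.opted_out:
        return False
    # A regional rule kills collection for everyone in a restricted region.
    if user.region in RESTRICTED_REGIONS:
        return False
    return True
```

For example, `data_collection_enabled(UserContext("u1", "NA"))` is `True`, while the same call for a user in `"EU"`, or for any user with `opted_out=True`, is `False`. The rule lives in one place instead of being spread across per-region configuration files.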
Think of this as giving control to the user on what features they want to kill. This isn’t dissimilar to giving them access to admin settings on the app so they can choose to have dark mode or choose what notifications they get.
On the frontend, users define what they do and don’t want in their personal experience of the app. On the backend, you’re just exposing parts of the feature flag schema to the end user and letting them decide what to keep and what to kill.
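One way to picture “exposing parts of the schema” is an allowlist of flags that users may set for themselves, layered under the operator’s own switches. The flag names and helper functions below are hypothetical.

```python
from typing import Dict

# Operator-controlled flag state (hypothetical flags).
GLOBAL_FLAGS: Dict[str, bool] = {
    "dark_mode": True,
    "push_notifications": True,
    "new_billing": False,
}

# Only this subset of the schema is exposed to end users.
USER_CONFIGURABLE = {"dark_mode", "push_notifications"}


def set_preference(prefs: Dict[str, bool], flag: str, value: bool) -> None:
    # Users can only change flags that have been exposed to them.
    if flag not in USER_CONFIGURABLE:
        raise PermissionError(f"{flag} is not user-configurable")
    prefs[flag] = value


def is_enabled(flag: str, prefs: Dict[str, bool]) -> bool:
    # The operator-level kill switch always wins over a user preference.
    if not GLOBAL_FLAGS.get(flag, False):
        return False
    # Otherwise honor the user's choice, defaulting to on.
    return prefs.get(flag, True)
```

A user who calls `set_preference(prefs, "dark_mode", False)` kills dark mode for themselves only, while an attempt to set `new_billing` raises `PermissionError` because that flag was never exposed.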
Another consideration here is the use of features like screen time or parental controls, where a kill switch can be triggered to limit the access of users to specific applications altogether based on settings elsewhere on their devices. Of course, putting the onus on the user to define what they will do with such power is a whole other problem to solve.
Here’s the plug: using Harness Feature Flags gives you the ability to implement all of these use cases right out of the box. When you set up feature flags in Harness, you’re able to work either in a visual UI or entirely in code to manage your kill switches. You also give yourself peace of mind for when you eventually want to scale your usage of kill switches or other feature flag use cases, thanks to built-in governance, compliance, and security considerations, as well as the critical integration into CI/CD (Continuous Integration and Continuous Delivery).
After all, you don’t want to set up a great feature flagging system that is entirely disconnected from the rest of your software delivery process. It doubles your work on access control, security, governance, and integration maintenance. You end up getting diminishing returns on your feature flagging platform, which is supposed to accelerate software delivery.