Continuing our theme of things people don’t often think about when using feature flags, we want to look at another place where feature flags can make a huge impact for your organization, but which you may not have considered yet: incident response. It’s true: feature flags can help you resolve incidents faster. Let’s dig into how.
We’ll use a hypothetical scenario that maybe isn’t so hypothetical.
It’s 3pm in California and suddenly, tickets start rolling in that users can’t access a key section of the application. It appears to be a frontend bug blocking the clickable area.
The engineering team on the west coast begins to investigate, but it’s not in a service most of them work on. Most of them are backend engineers anyway. For this service, the frontend team is largely based in Madrid, where it’s midnight, or London, where it’s 11pm. They’re not online, and while they can be paged, no one likes doing that.
After some digging, it appears that a PR merged by the Europe team near the end of their day is the likely root cause. Rolling back seems like the best way to fix it. However, the ops team is based in India, where it’s currently 4:30am, so they won’t be online for a few more hours.
Here’s where we enter into two different worlds.
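Without feature flags, you have three options, none of them good: page the frontend team in Europe in the middle of their night, page the ops team in India to roll back the release, or wait a few hours for the ops team to come online while customers stay blocked.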
With feature flags, the team in California can disable the flag for that specific UI update, the rest of the release without issues remains live, and the incident is over from the customer’s perspective. From reporting to conclusion takes maybe ten minutes, no one is paged, no rollback is needed, and all the good parts of your release stay active.
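To make the mechanics concrete, here is a minimal sketch of what "disable the flag for that specific UI update" can look like. Everything in it is an assumption for illustration: the `FlagStore` class, the flag names, and the component names are hypothetical, and a real setup would read flag state from a flag service rather than from memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store, standing in for a real flag service."""

    flags: dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str, default: bool = False) -> bool:
        # Unknown flags fall back to the default, so a missing flag never enables new code.
        return self.flags.get(name, default)

    def set(self, name: str, enabled: bool) -> None:
        self.flags[name] = enabled


def render_page(store: FlagStore) -> list[str]:
    """Return the UI components that would be rendered for the current flag state."""
    components = ["header"]
    # Each change in the release is guarded by its own flag (names are made up).
    if store.is_enabled("new-nav-overlay"):
        components.append("nav-overlay-v2")
    else:
        components.append("nav-overlay-v1")
    if store.is_enabled("faster-search"):
        components.append("search-v2")
    else:
        components.append("search-v1")
    return components


# The release ships with both changes switched on.
store = FlagStore({"new-nav-overlay": True, "faster-search": True})
before = render_page(store)

# Incident response: switch off only the flag tied to the broken UI change.
store.set("new-nav-overlay", False)
after = render_page(store)
```

After the toggle, the page falls back to the old overlay while the unrelated search change stays live, which is the "rest of the release remains live" outcome described above.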
What we see in this scenario is that an incident that would otherwise disrupt multiple teams, require pages, rollbacks, or both, and leave customers affected for anywhere between a few hours and a full day is over in ten minutes. One could say that the ability to fail safely is a critical advantage.
This is not limited to frontend changes. As we explored previously, feature flags can and should be used for backend changes, API changes, and more. We encourage teams to think beyond “feature flags are used to release new features” and instead think about feature flags as a critical part of change management.
All changes to your application should potentially be released behind a feature flag. The flag, in this case, is not just a tool for more effective testing and releasing, but also a way to make sure that no change to your application ever takes more than a single toggle to disable in prod.
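The same idea applies to a backend change. The sketch below is one possible shape for it, not a prescribed implementation: `run_guarded`, the `new-checkout-api` flag, and both checkout functions are hypothetical names. It keeps the old code path in place next to the new one and treats a failed flag lookup as "off", so the known-good path is always one toggle away.

```python
from typing import Callable, Dict


def run_guarded(
    is_enabled: Callable[[str], bool],
    flag_name: str,
    new_path: Callable[[], str],
    old_path: Callable[[], str],
) -> str:
    """Run new_path only when the flag is on; otherwise run old_path."""
    try:
        enabled = is_enabled(flag_name)
    except Exception:
        # If the flag lookup itself fails, fall back to the known-good path.
        enabled = False
    return new_path() if enabled else old_path()


# Hypothetical flag state and code paths, for illustration only.
flags: Dict[str, bool] = {"new-checkout-api": True}


def lookup(name: str) -> bool:
    # Raises KeyError for flags that were never defined.
    return flags[name]


def legacy_checkout() -> str:
    return "legacy-checkout"


def new_checkout() -> str:
    return "new-checkout"


flag_on = run_guarded(lookup, "new-checkout-api", new_checkout, legacy_checkout)

# A single toggle sends traffic back to the old path, with no deploy or rollback.
flags["new-checkout-api"] = False
flag_off = run_guarded(lookup, "new-checkout-api", new_checkout, legacy_checkout)

# An undefined flag also resolves to the old path instead of raising.
flag_missing = run_guarded(lookup, "undefined-flag", new_checkout, legacy_checkout)
```

Defaulting to the old path on any lookup failure is a design choice made here for safety; some teams may prefer to surface that failure instead.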
The more you dig into feature flags, the more impact you will find - and not just with faster incident resolution.
Try to think about feature flags not only as part of your release strategy, but also as part of how you design your apps to be resilient and to keep your mean time to recovery (MTTR) low. That will improve your customer experience and your support satisfaction, and it will keep your employees focused on the work they most need to do, without the constant disruption that more complicated incidents bring.
Hope this article was useful for you! If you’d like to do some further reading on feature flags, how about reading up on 5 Common Challenges When Using Feature Flags?