Architecting Feature Flags for Performance at Scale
Learn how Harness created a feature flag management solution that's performant even at the largest of scales.
What are two things that engineers are really good at? If you guessed building things and optimizing things, you’re exactly right. More importantly though, it’s building and optimizing for performance at any scale. For any application or service that eventually gets pushed to customers, engineering teams focus heavily on delivering performant code so that the user experience is consistent and pleasing.
The same goes for Harness Feature Flags. It’s not enough to just build a feature flags solution, it also needs to be able to handle load for the largest use cases, which may involve millions of users. And we talk about Harness Feature Flags as built for developers - what kind of developer tool doesn’t think about performance at scale in addition to ease of use?
Today, we want to showcase our own journey to building a scalable and performant feature flag solution - going from where we started with our MVP, to our final product that customers use in production. Plus, we want to share some of the lessons that we learned along the way.
The Initial Model
There are basically two main parts to the feature flag system. One is what we call the feature flag SDK, which is embedded in the target application and will talk to the feature flag server. The SDK receives feature flag updates from the server.
Second, there is going to be a UI/API component, which will take inputs for making changes to feature flags. Once those changes come into the system, we want those changes to stream directly to the SDK, like a push notification. So, as soon as we get the changes on the UI, we want to send a push notification down to the SDK for a feature flag change, and then we want to make the appropriate changes in the application.
There is another mechanism as well - a poll model. Think about the instance in which a subscriber might have a network disconnect - we want to be proactive about finding that with a polling-based model. We wanted to also support this mechanism to get the feature updates from the server and the system needed to support both these models. This was where we started off.
Building The Core System
We started developing our application in Golang. Primarily, we chose Golang for two reasons: it's very performant and fast; and it is very quick to develop (much faster than something like Java, which is a traditional enterprise platform for developing applications).
Obviously, we needed to select a database. We went with a relational database, and we went with Postgres primarily because it’s very popular, it has a lot of extensions, and it also will help us build our Metrics service, which we'll cover later. We started with a very simple layered application in this respect.
We started off with a UI component, which is actually shared across the Harness platform. So now we have our UI service, which is running as a separate service, and we had our feature flag server, which was serving our feature flag components. This particular server is taking load from the UI input, and streaming downstream to the SDK. Now, the number of users who will be actually using the UI server or UI service and making changes to the feature flags is going to be fairly small. And the number of applications that will be using these SDKs and to whom these changes are going to be stream is going to be fairly large.
A simple example is, you have a developer, DevOps person, or a product manager who's going to be controlling the feature flags, and these changes are going to be streamed down to maybe hundreds of thousands of iPhones or Android devices. And so we needed to break down this service into two components that could solve and scale this system independently.
The Admin service would just be in charge of taking the input from the UI or from the API, which would contain all the feature flags changes that we’re trying to make. Those changes would then be streamed down to the feature flag SDK service, and the SDK service will be in charge of communicating to all the SDKs downstream. So, we broke down the service into two components, which allowed us to scale both these services independently.
We have the Admin service, which could be a lower number of containers because it's not going to be serving a huge number of users. And the SDK service could be scaled independently, which could serve a large number of users. This was the first design decision that we modified to solve this particular use case.
Going forward, we realized that the number of components, the number of SDKs, is actually still going to be fairly large. We would still require a large number of these SDK service containers to be serving all the SDKs. And that, again, wasn't working. We investigated and figured out that we required another component that would just be in charge of communicating and taking care of maintaining the network connections to the clients.
We needed the first mechanism to maintain a persistent connection to all the SDKs for pushing down the notifications, and to do that, we introduced another component in between the SDKs and the SDK service. We leveraged an open source library called Pushpin. It's a pretty awesome library, which specializes in maintaining the service server-side connections to different SDKs. The Pushpin component is an interesting one that actually gives us maybe 50x or 100x scalability without actually adding much to our infrastructure cost. Every Pushpin service is capable of maintaining tens of thousands of concurrent SSE (Server-Side Event) connections to the clients over a standard HTTP protocol, meaning we don’t have to leverage any additional libraries for the client. Each pushpin instance, in turn, maintains a single connection to the SDK service container. From the application perspective, there is no special trick requirement of all knowledge of how to manage an SSE connection.
Think of the workflow when a change is made on the UI. The UI service calls the Admin service, which makes changes to the database, which notifies the SDK service, which then sends streaming updates to the Pushpin service. The Pushpin service is aware of all the connections and the SDK channels and it sends a notification to all interested SSE channels, meaning the clients are going to receive the notification.
The clients make an HTTP call to the SDK service and it will get the latest configuration from the SDK service. Then, the SDK will update the local cache. Finally, the application will get the latest feature configuration. This was working really well.
How Do We Scale This?
When we started, we said that the push notification is brilliant. But when the SDKs queried the SDK service for the latest configuration, we saw an influx of requests to the database. And that was, in fact, creating a bottleneck for the system. How could we optimize this? It was time to introduce a caching layer.
The obvious choice for this was introducing a Redis Cache. Now, instead of directly querying the database, the SDK service would hit the Redis cache before the Postgres database. If the data was in Postgres, it would get the data, pre-populate the feature flag configurations that it will construct from all the tables, and then store that updated configuration in Redis. This, of course, reduced the number of database queries being made, and freed up the load on the database. That helped us significantly.
The next part of the optimization was how we could scale Pushpin. Pushpin is actually a stateless service, and the push itself does not have the knowledge of which connections it has to send the data to and which ones not to send data to. We needed a mechanism to send appropriate event updates to all Pushpin servers, and without this, couldn’t scale Pushpin to more than one instance. That was actually a must-have: scaling Pushpin so that this entire system will become a horizontally-scalable system.
To solve that problem, we introduced another layer: a messaging layer. Since we already had Redis as part of infrastructure, we decided to evaluate Redis Streams, which is a Redis-based pub/sub mechanism that is different from Redis pub/sub, but more performant. The idea was to use Redis Streams so that we could scale the number of Pushpin instances to more than one. So, the new mechanism of delivering updates to the SDKs was that whenever we made any changes on the UI, the Admin service could then send notifications to the SDKs via Redis Streams.
We brought this change into our architecture since we were leveraging a messaging system that allowed us to have a message passing-based architecture and event-driven mechanism to communicate between our different microservices. We started to use Redis Streams for communicating between the Admin and SDK services, and also started using the same registry names also for communicating downstream to different Pushpin instances.
Now every Pushpin instance registers with the SDK service, we know which channels the Pushpin services are interested in, and the channels carry the environment ID that every SDK instance has registered. Every client would subscribe to the SSE channel with the environment ID, and in turn, these would be the topics that the Pushpin instances would register with the different SDK containers.
Once a feature flag update is made, the changes propagate to the SDK service, and then the SDK service sends the notification on their unique environment ID. Every Pushpin instance knows which environment ID the changes are being made to, and it sends downstream to all the different channels with the same environment ID.
Let’s say in one instance, a Pushpin instance has access to 200 feature flag SDKs with the same ID. It's going to send that push notification to those 200 SDKs; another instance has no connectivity to any SDK matching that ID, it is just going to ignore that message. In this way, we can horizontally scale Pushpin as a very lightweight microservice. A bonus is that we can multiplex this entire connectivity of the system without introducing infrastructure cost and helps us with reducing the overall latency of the system. This was a big improvement from where we started to where we got to at that point in time.
Introducing Metrics Collection
The next thing we wanted to introduce was collecting all the metrics from our SDKs. Server-side SDKs were doing a lot of evaluations, and from client-side SDKs, we needed to collect a lot of data about flag usage. Of course, this data had to be aggregated and collected, and then we had to build intelligent analytics on top of that data. To do that, we had to build an entire Metrics service.
To initialize and get this Metrics service started, we had to start with two improvements to all the SDKs. For all the SDKs, we started to collect all the feature flags usage of a given application and the evaluations on the SDK itself. We pre-aggregated the data of the other feature flag usage, and then at regular intervals, we needed to stream this data to the Metrics service.
We knew this Metrics service was going to be a very data-intensive service, and we had to think about two parts. One was the data ingestion: we wanted to make sure that this data that is being received didn’t cause any slowdown on the SDK itself. For example, if the Metrics service is down, it shouldn't stop the feature flag SDK from performing or add additional load on the system. Second was that we were going to collect a huge amount of data from all the SDKs, and this needed to be useful, while also remaining performant. And it needed to be available in near real-time as part of a scalable system.
The first thing we wanted to do is select an appropriate database, which would be able to post, process, and aggregate this data. We went with a time series-based database called TimescaleDB. TimescaleDB is a very interesting database. It's based off of Postgres, so it gave us uniformity in our infrastructure, and the flexibility and synergy to take our feature flag system on-prem too, which is one of our future goals.
Anyhow, TimescaleDB is a time series database, which specializes in storing large amounts of time series data, is horizontally scalable, and is performant. It’s great for our use case, which is having a time series-based view of how feature flag usage has evolved over a period of time for any filter criteria.
The most obvious thing to do with this Metrics service was to break it down into two parts, the ingestion service and the processing service. The ingestion service is something that sits on the very edge of the network, and its job is very simple: to get the data collected from the different SDKs and pass it downstream for processing.
The Metrics processing service is more of an asynchronous process, which consumes the data from the ingestion service and sends it for processing storage into our analytics database. We already had a messaging system, Redis Streams, so we decided to leverage this same infrastructure. The only change that we made was, instead of using the same Redis Stream instance, we decided to give this one a dedicated Redis Stream instance. This was very crucial for us to do, primarily for the same reason that I mentioned before.
The Redis Stream instance that we are using for our primary system is a critical part of our infrastructure to solve our feature flag usage, and that shouldn't interfere with our analytics DB. Our analytics DB and Metrics service is more of a secondary part of our system, which is important, but having these separate ups the resiliency in case of failure of any one part of the system. So, that is what we wanted to achieve. And that's why we broke down these two into having their own dedicated Redis Stream instances: so that the load of each one does not affect the other one.
Network, Compute, and the Current State
The last part that we would like to cover is how we have organized our networking, our compute, and where things stand from that perspective. The entire service that I have described so far - this is part of our Feature Flags server, which is still a part of the Harness platform. These are the different microservices that are dedicated just for Feature Flags. But this is a subset of the Harness platform, which is composed of different components.
We have CD, CI, Feature Flags, and Cloud Cost Management, and everything comes together as a single platform. We also leverage and communicate with other Harness platform microservices. I won’t go into the details because that’s a different scope, but think of it as there’s an entire world outside of what I just described for Feature Flags.
This entire system is sitting in our Kubernetes cluster, and all these microservices are different pods. We’re using a GCLB load balancer, which sits at the edge of our network for both the outbound Pushpin-based system, as well as for the Metrics ingestion service. This model gives us the sense of security, the scalability and the peace of mind that the system is going to be secure and scalable for whatever use cases we try to achieve.
The other thing I would like to highlight is that these components, like Redis Streams, the TimescaleDB, the database, and the caching layer, are all like plug-and-play-based infrastructure pieces. Based on requirements, the deployment strategy, SaaS or on-prem, these can be swapped to whatever infrastructure pieces we’d like to support.
This covers the platform play of how the Feature Flags architecture evolved, what the architecture is today, and how we addressed the scalability and latency issues while designing this architecture, and given the flexibility of the design that we have, how it is possible to evolve in the future as well. Thank you very much.