November 11, 2025

When Cloud Providers Have an Outage, Your Feature Flags Shouldn’t

Cloud outage? Your flags should keep running. Harness Feature Management ensures seamless feature delivery with instant SDK fallback, local decisioning, and a globally distributed streaming architecture—no redeploys required.

Over the past few weeks, the software industry has experienced multiple cloud outages that caused widespread disruptions across hundreds of applications and services. When systems went down, the difference between chaos and continuity came down to architecture. In feature management, reliability is not a nice-to-have; it has to be designed in. When an outage occurs, it’s often not the failure itself that defines the customer experience, but how the system is designed to respond.

During these outages, Harness Feature Management & Experimentation (FME) maintained 100% flag-delivery uptime across all regions, with no redeploys, no configuration changes, and no missed evaluations. This wasn’t luck. FME was built from day one with fault tolerance and continuity in mind. From automatic fallback mechanisms to distributed decision engines and managed streaming infrastructure, every layer of our architecture is designed to keep feature flag delivery resilient, even in the face of unexpected events.

Automatic Fallback: Zero-Touch Continuity

One of the most important architectural principles in FME is graceful degradation: even when one service experiences disruption, the system continues to function seamlessly. Our SDKs are designed to automatically fall back to polling if there is any issue connecting to the streaming service. This means developers and operators never have to manually intervene or redeploy code during an outage. The fallback happens instantly and intelligently, preserving continuity and minimizing operational burden. In contrast, many legacy systems in the market rely on manual configuration changes to fall back to polling and restore flag delivery, an approach that adds risk and friction exactly when teams can least afford it.
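To make the fallback pattern concrete, here is a minimal sketch in Python. It is not the FME SDK's implementation; the class name `FlagSynchronizer`, the `stream_fetch` and `poll_fetch` callables, and the flag names are all hypothetical, chosen only to illustrate the idea of preferring streaming and dropping to polling without operator action.

```python
from typing import Callable, Dict

Flags = Dict[str, str]


class FlagSynchronizer:
    """Hypothetical sync loop: prefer streaming, fall back to polling on failure."""

    def __init__(
        self,
        stream_fetch: Callable[[], Flags],
        poll_fetch: Callable[[], Flags],
    ) -> None:
        self._stream_fetch = stream_fetch
        self._poll_fetch = poll_fetch
        self.mode = "streaming"
        self.flags: Flags = {}

    def sync_once(self) -> Flags:
        """Run one sync cycle, trying streaming first so recovery is automatic too."""
        try:
            self.flags = self._stream_fetch()
            self.mode = "streaming"
        except ConnectionError:
            # Streaming is unreachable: switch to polling with no manual change.
            self.mode = "polling"
            try:
                self.flags = self._poll_fetch()
            except ConnectionError:
                # Both channels are down: keep serving the last known flags.
                pass
        return self.flags
```

Because every cycle retries streaming before polling, the same loop that handles the outage also handles the recovery, with no configuration change in either direction.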

Resilient Client-Side Decisioning: Real Evaluations, Not Stale Caches

Client-side SDKs are often the first point of impact during a network disruption. In many architectures, these SDKs can serve only cached flag values when connectivity issues arise, leaving new users or sessions without the ability to evaluate flags. Harness FME takes a different approach. Each client SDK functions as a self-contained decision engine, capable of evaluating flag rules locally and automatically switching to polling when needed. Combined with local caching and retrieval from CDN edge locations, this design ensures that even during service interruptions, both existing and new users continue to receive flag evaluations without delay or degradation.
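The difference between serving a cached value and evaluating locally can be sketched as follows. This is an illustrative example, not the FME rule engine: the `FlagRule` shape, the SHA-256 bucketing, and the `"control"` treatment for unknown flags are assumptions made for the sketch. The point is that once rule definitions are on the device, a user the SDK has never seen before can still be evaluated without a network call.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FlagRule:
    """Hypothetical rule shape: explicit targets plus a percentage rollout."""

    name: str
    allowlist: Set[str] = field(default_factory=set)
    rollout_percent: int = 0  # share of remaining users (0..100) that gets "on"
    default_treatment: str = "off"


def bucket(flag_name: str, user_key: str) -> int:
    """Map (flag, user) to a stable bucket in 0..99. SHA-256 is an assumption here."""
    digest = hashlib.sha256(f"{flag_name}:{user_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def evaluate(rules: Dict[str, FlagRule], flag_name: str, user_key: str) -> str:
    """Evaluate a flag from locally held rules, with no network access."""
    rule = rules.get(flag_name)
    if rule is None:
        return "control"  # hypothetical treatment for a flag with no local rule
    if user_key in rule.allowlist:
        return "on"
    if bucket(flag_name, user_key) < rule.rollout_percent:
        return "on"
    return rule.default_treatment
```

A cache of past results can only answer for users it has already seen. Holding the rules themselves means the answer is computed on demand, so a brand-new session during an outage gets the same deterministic treatment it would have received otherwise.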

Distributed Streaming Architecture: Built for Continuous Availability

Harness FME’s distributed streaming architecture is engineered for global reach and high availability. If a region or node experiences issues, traffic automatically reroutes to healthy endpoints. Combined with instant SDK fallback to polling, this ensures uninterrupted flag delivery and real-time responsiveness, regardless of the scale of disruption. During the recent outages, as users of our own feature flags, we served each customer their targeted experience with no disruptions.
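Rerouting of this kind normally happens inside the delivery infrastructure rather than in application code, and the details of how FME does it are not covered here. Purely as an illustration of the principle, the sketch below shows ordered failover across endpoints from a client's point of view. The function name `fetch_with_failover`, the `fetch` callable, and the endpoint hostnames are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Flags = Dict[str, str]


def fetch_with_failover(
    endpoints: List[str],
    fetch: Callable[[str], Flags],
) -> Tuple[str, Flags]:
    """Try each endpoint in order and return the first successful response."""
    failures: List[str] = []
    for endpoint in endpoints:
        try:
            return endpoint, fetch(endpoint)
        except ConnectionError as exc:
            # This endpoint is unhealthy: note the failure and move to the next one.
            failures.append(f"{endpoint}: {exc}")
    raise ConnectionError("all endpoints failed: " + "; ".join(failures))
```

A caller would pass something like `["region-a.example.invalid", "region-b.example.invalid"]` and receive both the flags and the endpoint that actually answered, which is useful for logging which region served the request.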

Separation of Control Plane and Delivery Plane: Reliability at the Core

Even with strong backend continuity, user experience matters. Both the web console and APIs are engineered for graceful degradation. During transient internet instability, a subset of users may experience slowdowns, challenges accessing the web console, or issues posting back flag evaluation records; however, feature flag delivery and evaluation remain unaffected. This separation of control plane and delivery plane ensures that UI performance issues never impact your SDK evaluations and customer traffic. It is a key architectural decision that protects live customer experiences even in volatile network conditions.
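One way to picture this separation is a client where returning a treatment never waits on reporting. The sketch below is an assumption-laden illustration, not the FME SDK: `DeliveryPlaneClient`, its `send` callable, and the impression tuple are hypothetical names. Evaluation reads only local state, while evaluation records are buffered and posted on a best-effort basis.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Impression = Tuple[str, str, str]  # (user_key, flag_name, treatment)


class DeliveryPlaneClient:
    """Hypothetical client: evaluation never depends on posting records upstream."""

    def __init__(
        self,
        treatments: Dict[str, str],
        send: Callable[[Impression], None],
        max_buffered: int = 1000,
    ) -> None:
        self._treatments = treatments
        self._send = send
        # Bounded buffer: the oldest records are dropped if the outage lasts too long.
        self._pending: Deque[Impression] = deque(maxlen=max_buffered)

    def get_treatment(self, user_key: str, flag_name: str) -> str:
        """Return a treatment from local state and queue the record for later."""
        treatment = self._treatments.get(flag_name, "control")
        self._pending.append((user_key, flag_name, treatment))
        return treatment

    def flush(self) -> int:
        """Post buffered records in order; stop at the first failure and keep the rest."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending_count(self) -> int:
        return len(self._pending)
```

If the reporting endpoint is unreachable, `flush` simply returns early and the records wait for the next attempt. The call to `get_treatment` is unaffected either way, which is the property the separation is meant to protect.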

The Outcome: Reliability as a Competitive Advantage

Reliability isn’t just about surviving outages; it’s about designing for them. Building for resilience requires intentional architectural choices such as automatic fallback mechanisms, self-sufficient SDKs, and isolation between control and delivery planes. That’s why, at Harness, we treat these events as opportunities to learn, and we follow best practices to continuously improve our products, minimize the impact of outages on our customers, and deliver uninterrupted feature management at a global scale. It’s not about avoiding every failure; that’s virtually impossible. What matters is ensuring that when failure does happen, your product continues to work for your customers.

If you’re brand new to Harness FME, get a demo here or sign up for a free trial today.

Nico Zelaya

Nico Zelaya is a seasoned Product & Engineering Director at Harness with over a decade of experience in software development and leadership. Formerly at Split, he led SDK and platform engineering, shaping developer tools used by global teams. With a background in full-stack JavaScript and systems engineering, Nico blends technical depth with strategic vision to scale teams, drive innovation, and ensure software quality and reliability. Passionate about developer experience and healthy engineering culture, he brings a trusted, hands-on approach to modern product delivery.
