
Feature flags are table stakes for modern software development. They allow teams to ship features safely, test new functionality, and iterate quickly, all without re-deploying their applications. As teams grow and ship across multiple services, environments, and languages, consistently managing feature flags becomes a significant challenge.

Harness Feature Management & Experimentation (FME) continues its investment in OpenFeature, building on our early support and adoption of the CNCF standard for feature flagging since 2022. OpenFeature provides a single, vendor-agnostic API that allows developers to interact with multiple feature management providers while maintaining consistent flag behavior.
With OpenFeature, you can standardize flag behavior across services and applications, and integrate feature flags across multiple languages and SDKs, including Node.js, Python, Java, .NET, Android, iOS, Angular, React, and Web.
Feature flagging may appear simple on the surface; you check a boolean, push up a branch, and move on. But as Pete Hodgson describes in his blog post about OpenFeature:
When I talk to people about adopting feature flags, I often describe feature flag management as a bit of an iceberg. On the surface, feature flagging seems really simple… However, once you get into it, there’s a fair bit of complexity lurking under the surface.

At scale, feature management is more than toggling booleans; it's about auditing configurations, controlling incremental rollouts, ensuring governance and operational best practices, tracking events, and integrating with analytics systems. OpenFeature provides a standard interface for consistent execution across SDKs and providers. Once teams hit those hidden layers of complexity, a standardized approach is no longer optional.
This need for standardization isn’t new. In fact, Harness FME (previously known as Split.io) was an early supporter of OpenFeature because teams were already running into the limits of proprietary, SDK-specific flag implementations. From a blog post about OpenFeature published in 2022:
While feature flags alone are very powerful, organizations that use flagging at scale quickly learn that additional functionality is needed for a proper, long-term feature management approach.
This post highlights challenges that are now commonplace in most organizations: maintaining several SDKs across services, inconsistent flag definitions between teams, and friction in integrating feature flags with analytics, monitoring, and CI/CD systems.
What’s changed since then isn’t the problem; it’s the urgency. Teams are now shipping faster, across more languages and environments, with higher expectations around governance, experimentation, and observability. OpenFeature is a solution that enables teams to meet those expectations without increasing complexity.
Feature flagging with OpenFeature provides your team with a consistent API to evaluate flags across environments and SDKs. With Harness FME, you can plug OpenFeature directly into your applications to standardize flag evaluations, simplify rollouts, and track feature impact, all from your existing workflow.

The Harness FME OpenFeature Provider wraps the Harness FME SDK, bridging the OpenFeature SDK with the Harness FME service. The provider maps OpenFeature's interface to the FME SDK, which handles communication with Harness services to evaluate feature flags and retrieve configuration updates.
In the following example, we’ll use the Harness FME Node.js OpenFeature Provider to evaluate and track feature flags in a sample application.
Before you begin, ensure you have the following requirements:
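As a minimal sketch (the provider package name and constructor options here are assumptions; check the Harness FME OpenFeature documentation for the exact import and configuration), registering the provider and evaluating a flag in Node.js looks roughly like this:

```javascript
// Minimal sketch: registering a Harness FME provider with OpenFeature in Node.js.
// The provider import and constructor shown here are illustrative placeholders.
const { OpenFeature } = require('@openfeature/server-sdk');
const { SplitProvider } = require('@splitsoftware/openfeature-js-split-provider'); // hypothetical package name

async function main() {
  // Register the provider once at startup and wait until it is ready.
  await OpenFeature.setProviderAndWait(new SplitProvider({ authorizationKey: 'YOUR_SDK_KEY' }));

  // Evaluate a flag through the standard OpenFeature client.
  const client = OpenFeature.getClient();
  const enabled = await client.getBooleanValue('enable_new_checkout', false, {
    targetingKey: 'user-123', // who is being evaluated
    plan: 'enterprise',       // custom targeting attribute
  });

  console.log(`enable_new_checkout = ${enabled}`);
}

main().catch(console.error);
```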
With the provider registered and your evaluation context configured, your Node.js service can now evaluate flags, track events, and access flag metadata through OpenFeature without needing custom clients or SDK rewrites. From here, you can add additional flags, expand your targeting attributes, configure rollout rules in Harness FME, and feed event data directly into your experimentation workflows.
Feature management at scale is a common operational challenge. Much like the feature flagging iceberg where the simple on/off switch is just the visible tip, most of the real work happens underneath the surface: consistent evaluation logic, targeting, auditing, event tracking, and rollout safety. Harness FME and OpenFeature help teams manage these hidden operational complexities in a unified, predictable way.
Looking ahead, we’re extending support to additional server-side providers such as Go and Ruby, continuing to broaden OpenFeature’s reach across your entire stack.
To learn more about supported providers and how teams use OpenFeature with Harness FME in practice, see the Harness FME OpenFeature documentation. If you’re brand new to Harness FME, sign up for a free trial today.

Product and experimentation teams need confidence in their data when making high-impact product decisions. Today, experiment results often require copying behavioral data into external systems, which creates delays, security risks, and black-box calculations that are difficult to trust or validate.
Warehouse Native Experimentation keeps experiment data directly in your data warehouse, enabling you to analyze results with full transparency and governance control.
With Warehouse Native Experimentation, you can:
Product velocity has become a competitive differentiator, but experimentation often lags behind. AI-accelerated development means teams are shipping code faster than ever, while maintaining confidence in data-driven decisions is becoming increasingly challenging.
Modern teams face increasing pressure to move faster while reducing operational costs, reducing risk when launching high-impact features, maintaining strict data compliance and governance, and aligning product decisions with reliable, shared business metrics.
Executives are recognizing that sustainable velocity requires trustworthy insights. According to the 2025 State of AI in Software Engineering report, 81% of engineering leaders surveyed agreed that:
“Purpose-built platforms that automate the end-to-end SDLC will be far more valuable than solutions that target just one specific task in the future.”
At the same time, investments in data warehouses such as Snowflake and Amazon Redshift have increased. These platforms have become the trusted source of truth for customer behavior, financial reporting, and operational metrics.
This shift creates a new expectation where experiments must run where data already lives, results must be fully transparent to data stakeholders, and insights must be trustworthy from the get-go.
Warehouse Native Experimentation enables teams to scale experimentation without relying on streaming data pipelines, vendor lock-in, or black-box calculations, as trust and speed are now critical to business success.
Warehouse Native Experimentation integrates with Snowflake and Amazon Redshift, allowing you to analyze assignments and events within your data warehouse.

Because all queries run inside your warehouse, you benefit from full visibility into data schemas and transformation logic, higher trust in experiment outcomes, and the ability to validate, troubleshoot, and customize queries.

When Warehouse Native experiment results are generated from the same source of truth for your organization, decision-making becomes faster and more confident.
Metrics define success, and Warehouse Native Experimentation enables teams to define them using data that already adheres to internal governance rules. You can build metrics using existing warehouse tables, reuse them across multiple experiments, and include guardrail metrics (such as latency, revenue, or stability) to ensure consistency and accuracy. As experimentation needs evolve, metrics evolve with them, without duplicate data definitions.

Experiments generate value when success metrics represent business reality. By codifying business logic into metrics, you can monitor the performance of what matters to your business, such as checkout conversion based on purchase events, average page load time as a performance guardrail, and revenue per user associated with e-commerce goals.
Once you've defined your metrics, Warehouse Native Experimentation automatically computes results, either on a daily recalculation schedule or on manual refresh, and provides clear statistical significance indicators.
Because every result is generated with SQL that you can view in your data warehouse, teams can validate transformations, debug anomalies, and collaborate with data stakeholders. When everyone, from product to data science, can inspect the results, everyone trusts the decision.
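For illustration only, the kind of SQL involved is the familiar join of assignment data to event data; the table and column names below are assumptions, not the schema or queries that Harness FME actually generates:

```sql
-- Illustrative only: table and column names are assumptions, not the actual
-- schema or SQL that Harness FME generates.
SELECT
  a.treatment,
  COUNT(DISTINCT a.user_id) AS users,
  AVG(e.value)              AS avg_metric_value
FROM experiment_assignments a
LEFT JOIN purchase_events e
  ON e.user_id = a.user_id
 AND e.event_ts >= a.assigned_at   -- only count events after assignment
GROUP BY a.treatment;
```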
Warehouse Native Experimentation requires connecting your data warehouse and ensuring your experiment and event data are ready for analysis. Warehouse Native Experimentation does not require streaming or ingestion; Harness FME reads directly from assignment and metric source tables.
To get started:
From setting up Warehouse Native Experimentation to accessing your first Warehouse Native experiment result, organizations can efficiently move from raw data to validated insights, without building data pipelines.
Warehouse Native Experimentation is ideal for organizations that already capture behavioral data in their warehouse, want experimentation without data exporting, and value transparency, governance, and flexibility in metrics.
Whether you're optimizing checkout or testing a new onboarding experience, Warehouse Native Experimentation enables you to make informed decisions, powered by the data sources your business already trusts.
Looking ahead, Harness FME will extend these workflows toward a shift-left approach, bringing experimentation closer to the release process with data checks in CI/CD pipelines, Harness RBAC permissioning, and policy-as-code governance. This alignment ensures product, experimentation, and engineering teams can release faster while maintaining confidence and compliance in every change.
To start running experiments in a supported data warehouse, see the Warehouse Native Experimentation documentation. If you're brand new to Harness FME, sign up for a free trial today.

Managing feature flags can be complex, especially across multiple projects and environments. Teams often need to navigate dashboards, APIs, and documentation to understand which flags exist, their configurations, and where they are deployed. What if you could handle these tasks using simple natural language prompts directly within your AI-powered IDE?

Harness Model Context Protocol (MCP) tools make this possible. By integrating with Claude Code, Windsurf, Cursor, or VS Code, developers and product managers can discover projects, list feature flags, and inspect flag definitions, all without leaving their development environment.
By using one of many AI-powered IDE agents, you can query your feature management data using natural language. The MCP tools analyze your projects and flags and return structured outputs that the agent can interpret to accurately answer questions and make recommendations for release planning.
With these agents, non-technical stakeholders can query and understand feature flags without deeper technical expertise. This approach reduces context switching, lowers the learning curve, and enables teams to make faster, data-driven decisions about feature management and rollout.
According to Harness and LeadDev’s survey of 500 engineering leaders in 2024:
82% of teams that are successful with feature management actively monitor system performance and user behavior at the feature level, and 78% prioritize risk mitigation and optimization when releasing new features.
Harness MCP tools help teams address these priorities by enabling developers and release engineers to audit, compare, and inspect feature flags across projects and environments in real time, aligning with industry best practices for governance, risk mitigation, and operational visibility.
Traditional feature flag management practices can present several challenges:
Harness MCP tools address these pain points by providing a conversational interface for interacting with your FME data, democratizing access to feature management insights across teams.
The FME MCP integration supports several capabilities:
You can also generate quick summaries of flag configurations or compare flag settings across environments directly in Claude Code using natural language prompts.
Some example prompts to get you started include the following:
"List all feature flags in the `checkout-service` project."
"Describe the rollout strategy and targeting rules for `enable_new_checkout`."
"Compare the `enable_checkout_flow` flag between staging and production."
"Show me all active flags in the `payment-service` project."
“Show me all environments defined for the `checkout-service` project.”
“Identify all flags that are fully rolled out and safe to remove from code.”
These prompts produce actionable insights in Claude Code (or your IDE of choice).
To start using Harness MCP tools for FME, ensure you have access to Claude Code and the Harness platform with FME enabled. Then, interact with the tools via natural language prompts to discover projects, explore flags, and inspect flag configurations.
Harness MCP tools transform feature management into a conversational, AI-assisted workflow, making it easier to audit and manage your feature flags consistently across environments.
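For example, registering the Harness MCP server with the FME toolset enabled in an MCP-compatible client looks like the configuration below; the command path, API key, and IDs are placeholders, and the exact configuration file location depends on your client.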
```json
{
  ...
  "mcpServers": {
    "harness": {
      "command": "/path/to/harness-mcp-server",
      "args": [
        "stdio",
        "--toolsets=fme"
      ],
      "env": {
        "HARNESS_API_KEY": "your-api-key-here",
        "HARNESS_DEFAULT_ORG_ID": "your-org-id",
        "HARNESS_DEFAULT_PROJECT_ID": "your-project-id",
        "HARNESS_BASE_URL": "https://your-harness-instance.harness.io"
      }
    }
  }
}
```

To configure additional MCP-compatible AI tools like Windsurf, Cursor, or VS Code, see the Harness MCP Server documentation, which includes detailed setup instructions for all supported platforms.


Feature management at scale is a common operational challenge. With Harness MCP tools and AI-powered IDEs, teams can already discover, inspect, and summarize flag configurations conversationally, reducing context switching and speeding up audits.
Looking ahead, this workflow extends itself towards a DevOps-focused approach, where developers and release engineers can prompt tools like Claude Code to identify inconsistencies or misconfigurations in feature flags across environments and take action to address them.
By embedding these capabilities directly into the development workflow, feature management becomes more operational and code-aware, enabling teams to maintain governance and reliability in real time.
For more information about the Harness MCP Server, see the Harness MCP Server documentation and the GitHub repository. If you’re brand new to Harness FME, sign up for a free trial today.


Over the past six months, we have been hard at work building an integrated experience to take full advantage of the new platform made available after the Split.io merger with Harness. We have shipped a unified Harness UI for migrated Split customers, added enterprise-grade controls for experiments and rollouts, and doubled down on AI to help teams see impact faster and act with confidence. Highlights include OpenFeature providers, Warehouse Native Experimentation (beta), AI Experiment Summaries, rule-based segments, SDK fallback treatments, dimensional analysis support, and new FME MCP tools that connect your flags to AI-assisted IDEs.
And our efforts are being noticed. Just last month, Forrester released the 2025 Forrester Wave™ for Continuous Delivery & Release Automation, in which Harness was named a Leader, in part due to our platform approach that includes CI/CD and FME. This helps us uniquely solve some of the most challenging problems facing DevOps teams today.
This year we completed the front-end migration path that moves customers from app.split.io to app.harness.io, giving teams a consistent, modern experience across the Harness platform with no developer code changes required. Day-to-day user flows remain familiar, while admins gain Harness-native RBAC, SSO, and API management with personal access token and service account token support.
What this means for you:
For admins, the quick confidence checklist, logging steps, and side-by-side screens make the switch straightforward. FME Settings routes you into the standard Harness RBAC screens for long-term consistency where appropriate.
Two themes shaped our AI investments: explainability and in-flow assist.
To learn more, watch this video!

Warehouse Native Experimentation lets you run analyses directly in your own data warehouse using your assignment and event data for more transparent, flexible measurement. We are pleased to announce that this feature is now available in beta. Customers can request access through their account team and read more about it in our docs.

As you can see from all the new features below, we have been running hard and we are accelerating into the turn as we head toward the end of the year. We take pride in the partnerships we have with our customers. As we listen to your concerns, our engineering teams are working hard to implement the features you need to be successful.
October 2025
September 2025
July 2025
June 2025
Foundation laid earlier in 2025
As always, you can find details on all our new features by reading our release notes.
We are excited to add more value for our customers by continuing to integrate Split with Harness to achieve the best of both worlds. Harness CI/CD customers can expect familiar and proven methodologies to show up in FME, like pipelines, RBAC, SSO support, and more. To see the full roadmap and get a sneak peek at what is coming, reach out to us to schedule a call with your account representative.
Want the full details? Read the latest FME release notes for all features, dates, and docs.
Check out The Feature Management & Experimentation Summit
Read our comparison of Harness FME with Unleash


Over the past few weeks, the software industry has experienced multiple cloud outages that have caused widespread disruptions across hundreds of applications and services. When systems went down, the difference between chaos and continuity came down to architecture. In feature management, reliability is not a nice-to-have; it is designed in. When an outage occurs, it’s often not the failure itself that defines the customer experience, but how the system is designed to respond.
During these outages, Harness Feature Management & Experimentation (FME) maintained 100% flag-delivery uptime across all regions: no redeploys, no configuration changes, no missed evaluations. This wasn't luck. FME was built from the ground up for failure resilience, with fault tolerance and continuity in mind. From automatic fallback mechanisms to distributed decision engines and managed streaming infrastructure, every layer of our architecture is designed to ensure feature flag delivery remains resilient, even in the face of unexpected events.
One of the most important architectural principles in FME is graceful degradation, ensuring that even when one service experiences disruption, the system continues to function seamlessly. Our SDKs are designed to automatically fall back to polling if there is any issue connecting to the streaming service. This means developers and operators never have to manually intervene or redeploy code during an outage. The fallback happens instantly and intelligently, preserving continuity and minimizing operational burden. In contrast, many legacy systems in the market rely on manual configuration changes to fallback to polling and restore flag delivery, an approach that adds risk and friction exactly when teams can least afford it.
Client-side SDKs are often the first point of impact during a network disruption. In many architectures, these SDKs can serve only cached flag values when connectivity issues arise, leaving new users or sessions without the ability to evaluate flags. Harness FME takes a different approach. Each client SDK functions as a self-contained decision engine, capable of evaluating flag rules locally and automatically switching to polling when needed. Combined with local caching and retrieval from CDN edge locations, this design ensures that even during service interruptions, both existing and new users continue to receive flag evaluations without delay or degradation.
Harness FME’s distributed streaming architecture is engineered for global reach and high availability. If a region or node experiences issues, traffic automatically reroutes to healthy endpoints. Combined with instant SDK fallback to polling, this ensures uninterrupted flag delivery and real-time responsiveness, regardless of the scale of disruption. During the recent outages, as users of our own feature flags, we served each customer their targeted experience with no disruptions.
Even with strong backend continuity, user experience matters. Both the web console and APIs are engineered for graceful degradation. During transient internet instability, a subset of users may experience slowdowns, challenges accessing the web console, or issues posting back flag evaluation records; however, feature flag delivery and evaluation remain unaffected. This separation of control plane and delivery plane ensures that UI performance issues never impact your SDK evaluations and customer traffic. It is a key architectural decision that protects live customer experiences even in volatile network conditions.
Reliability isn't just about surviving outages; it's about designing for them. Building for resilience requires intentional architectural choices such as automatic fallback mechanisms, self-sufficient SDKs, and isolation between the control and delivery planes. That's why, at Harness, we use these opportunities to learn and follow best practices to continuously improve our products, minimize the impact of outages on our customers, and deliver uninterrupted feature management at a global scale. It's not about avoiding every failure; that's virtually impossible. What is essential is ensuring that when failure does happen, your product continues to work for your customers.
If you’re brand new to Harness FME, get a demo here or sign up for a free trial today.



Databases have been crucial to web applications since their beginning, serving as the core storage for all functional aspects. They manage user identities, profiles, activities, and application-specific data, acting as the authoritative source of truth. Without databases, the interconnected information driving functionality and personalized experiences would not exist. Their integrity, performance, and scalability are vital for application success, and their strategic importance grows with increasing data complexity. In this article we are going to show you how you can leverage feature flags to compare different databases.
Let's say you want to test and compare two different databases against one another. A common use case could be to compare the performance of two of the most popular open source databases: MariaDB and PostgreSQL.


MariaDB and PostgreSQL logos
Let's think about how we want to do this. We want to compare the experience of our users across these different databases. In this example we will be doing a 50/50 experiment. In a production environment doing real testing, you most likely already use one database and would use a very small percentage-based rollout to the other, such as 90/10 (or even 95/5), to reduce the blast radius of potential issues.
To do this experiment, first let's make a Harness FME feature flag that distributes users 50/50 between MariaDB and PostgreSQL.

Now, for this experiment, we need a reasonable amount of sample data in the database. In this sample experiment we will just load the same data into both databases. In production, you'd want to build something like a read replica using a CDC (change data capture) tool so that your experimental database matches your production data.
Our code will generate 100,000 rows for this data table and load them into both databases before the experiment. This is not so large that it causes issues with query speed, but it is large enough to reveal differences between the database technologies. The table also has three different data types: text (varchar), numbers, and timestamps.
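As a rough sketch of that seeding step (the column names and table layout here are illustrative assumptions rather than the exact schema from our repository), the rows can be generated with standard Python and loaded into both databases:

```python
# Illustrative seeding sketch: column names and table layout are assumptions,
# not the exact schema used in the sample repository.
import random
import string
from datetime import datetime, timedelta

def generate_rows(n=100_000):
    """Generate n rows mixing text, numeric, and timestamp columns."""
    now = datetime.utcnow()
    for i in range(n):
        yield (
            i,                                                      # id (number)
            "".join(random.choices(string.ascii_lowercase, k=12)),  # label (varchar)
            round(random.uniform(0, 10_000), 2),                    # amount (number)
            now - timedelta(minutes=random.randint(0, 525_600)),    # created_at (timestamp)
        )

# The same rows are loaded into both MariaDB and PostgreSQL, e.g. with
# executemany() on each connection:
#   cursor.executemany(
#       "INSERT INTO sample_data (id, label, amount, created_at) VALUES (%s, %s, %s, %s)",
#       list(generate_rows()),
#   )
```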
Now let's make a basic app that simulates making our queries. Using Python, we will build an app that executes queries from a list and displays the result.
Below you can see the basic architecture of our design. We will run MariaDB and Postgres on Docker and the application code will connect to both, using the Harness FME feature flag to determine which one to use for the request.

The sample queries we used can be seen below. We are using 5 queries with a variety of SQL keywords. We include joins, limits, ordering, functions, and grouping.
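The exact queries are in the repository linked at the end of this post; illustrative stand-ins in the same spirit, including a self-join aliased as a, an EXTRACT date function, grouping, ordering, and limits, might look like this:

```python
# Illustrative stand-ins for the five sample queries (the exact SQL is in the repo).
SAMPLE_QUERIES = [
    # Self-join, aliased as "a" and "b"
    "SELECT a.id, b.label FROM sample_data a JOIN sample_data b ON a.id = b.id LIMIT 100",
    # Date function (EXTRACT)
    "SELECT EXTRACT(YEAR FROM created_at) AS yr, COUNT(*) FROM sample_data GROUP BY yr",
    # Grouping with a limit
    "SELECT label, COUNT(*) AS n FROM sample_data GROUP BY label ORDER BY n DESC LIMIT 10",
    # Ordering
    "SELECT id, amount FROM sample_data ORDER BY amount DESC LIMIT 50",
    # Aggregate functions
    "SELECT AVG(amount), MAX(amount), MIN(amount) FROM sample_data",
]
```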
We use the Harness FME SDK to do the decisioning here for our user id values. It will determine if the incoming user experiences the Postgres or MariaDB treatment using the get_treatment method of the SDK based upon the rules we defined in the Harness FME console above.
Afterwards, within the application, we run the query and then track the query_execution event using the SDK's track method.
See below for some key parts of our Python-based app.
This code will initialize our Split (Harness FME) client for the SDK.
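A sketch of that initialization, assuming the standard Split Python SDK factory pattern (replace YOUR_SDK_KEY with a server-side SDK key):

```python
# Initialize the Split (Harness FME) SDK client.
# YOUR_SDK_KEY is a placeholder for a server-side SDK key.
from splitio import get_factory
from splitio.exceptions import TimeoutException

factory = get_factory("YOUR_SDK_KEY")
try:
    factory.block_until_ready(5)   # wait up to 5 seconds for the SDK to be ready
except TimeoutException:
    # The SDK did not become ready in time; evaluations will return "control".
    pass

split_client = factory.client()
```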
We will generate a sample user ID, just an integer from 1–10,000.
Now we need to determine whether our user will be using Postgres or MariaDB. We also do some defensive programming here to ensure we have a default if the treatment is not either postgres or mariadb.
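Roughly, that decision step looks like the following; the flag name used here is an assumption for illustration:

```python
import random

# Pick a sample user ID between 1 and 10,000.
user_id = str(random.randint(1, 10_000))

# Ask Harness FME which database this user should hit.
treatment = split_client.get_treatment(user_id, "database_experiment")  # flag name assumed

# Defensive default: if the treatment is anything unexpected (e.g. "control"
# because the SDK is not ready), fall back to a known database.
if treatment not in ("postgres", "mariadb"):
    treatment = "mariadb"
```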
Now let's run the query and track the query_execution event. From the app you can select the query you want to run; if you don't, it'll just run one of the five sample queries at random.
The db_manager class handles maintaining the connections to the databases as well as tracking the execution time for each query. Here we can see it using Python's time module to measure how long the query took; the object that db_manager returns includes this value.
Tracking the event allows us to see which database was faster for our users. The signature of the Harness FME SDK's track method includes both a value and properties. In this case we supply the query execution time as the value and the actual query that ran as a property of the event, which can be used later for filtering and, as we will see, dimensional analysis.
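Putting those pieces together, the execute-and-track step looks roughly like this (the db_manager call is paraphrased rather than copied from the repository):

```python
import time

def run_and_track(query: str, user_id: str, treatment: str):
    """Run the query against the chosen database and track its execution time."""
    start = time.time()
    rows = db_manager.execute(treatment, query)   # paraphrased db_manager interface
    execution_time = time.time() - start

    # Report the event to Harness FME: value = query duration (seconds),
    # properties carry the query text for filtering and dimensional analysis.
    split_client.track(
        user_id,
        "user",               # traffic type
        "query_execution",    # event type
        execution_time,
        {"query": query},
    )
    return rows, execution_time
```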
You can see a screenshot of what the app looks like below. There’s a simple bootstrap themed frontend that does the display here.

app screenshot
The last step here is that we need to build a metric to do the comparison.
Here we built a metric called db_performance_comparison. In this metric we set up our desired impact: we want the query time to decrease. Our traffic type is user.

Metric configuration
One of the most important questions is what we will select for the Measure as option. Here we have a few options, as can be seen below.

Measure as options
We want to compare across users, and are interested in faster average query execution times, so we select Average of event values per user. Count, sum, ratio, and percent don’t make sense here.
Lastly, we are measuring the query_execution event.
We added this metric as a key metric for our db_performance_comparison feature flag.

Selection of our metric as a key metric
One additional thing we will want to do is set up dimensional analysis, as we mentioned above. Dimensional analysis lets us drill down into the individual queries to see which one(s) were more or less performant on each database. We can have up to 20 values here. If we've already been sending events, they can simply be selected, since we keep track of them internally; otherwise, we input our queries here.

selection of values for dimensional analysis
Now that we have our dimensions, our metric, and our application set to use our feature flag, we can now send traffic to the application.
For this example, I’ve created a load testing script that uses Selenium to load up my application. This will send enough traffic so that I’ll be able to get significance on my db_performance_comparison metric.
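A stripped-down sketch of that load script, assuming the sample app is reachable at a local placeholder URL:

```python
# Minimal load-test sketch using Selenium; the URL and pacing are placeholders.
import time
from selenium import webdriver

APP_URL = "http://localhost:5000"   # assumed local address of the sample app

driver = webdriver.Chrome()
try:
    for _ in range(500):
        driver.get(APP_URL)          # each page load triggers a flag evaluation and a query
        time.sleep(0.2)              # pace requests so both treatments accumulate events
finally:
    driver.quit()
```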
I got some pretty interesting results. If we look at the Metrics impact screen, we can see that Postgres resulted in an 84% drop in query time.


Even more, if we drill down to the dimensional analysis for the metric, we can see which queries were faster and which were actually slower using Postgres.

So some queries were faster and some were slower, but the faster queries were MUCH faster. This allows you to pinpoint the performance you would get by changing database engines.
You can also see the statistics in a table below; it seems the query with the most significant speedup was one that used grouping and limits.

However, the query that used a join was much slower in Postgres. You can see it's the query that starts with SELECT a.i...; since we are doing a self-join, the table alias is a. The query that uses EXTRACT (a SQL date function) is also nearly 56% slower.
In summary, running experiments on backend infrastructure like databases using Harness FME can yield significant insights and performance improvements. As demonstrated, testing MariaDB against PostgreSQL revealed an 84% drop in query time with Postgres. Furthermore, dimensional analysis allowed us to identify the specific queries that benefited the most (those involving grouping and limits) and the queries that were slower. This level of detailed performance data enables you to make informed decisions about your database engine and infrastructure, leading to optimization, efficiency, and, ultimately, a better user experience. Harness FME provides a robust platform for conducting such experiments and extracting actionable insights. For example, if we had an application that relied heavily on join-based queries or SQL date functions like EXTRACT, the results might show that MariaDB is faster than Postgres, and a migration wouldn't make sense.
The full code for our experiment lives here: https://github.com/Split-Community/DB-Speed-Test


Split is excited to announce participation in OpenFeature, an initiative led by Dynatrace and recently submitted to the Cloud Native Computing Foundation (CNCF) for consideration as a sandbox program.
As part of an effort to define a new open standard for feature flag management, this project brings together an industry consortium of top leaders. Together, we aim to provide a vendor-neutral approach to integrating with feature flagging and management solutions. By defining a standard API and SDK for feature flagging, OpenFeature is meant to reduce issues or friction commonly experienced today with the end goal of helping all development teams ramp reliable release cycles at scale and, ultimately, move towards a progressive delivery model.
At Split, we believe this effort is a strong signal that feature flagging is truly going “mainstream” and will be the standard best practice across all industries in the near future.
Feature flagging is a simple, yet powerful technique that can be used for a range of purposes to improve the entire software development lifecycle. Other common terms include things like “feature toggle” or “feature gate.” Despite sometimes going by different names, the basic concept underlying feature flags is the same:
A feature flag is a mechanism that allows you to decouple a feature release from a deployment and choose between different code paths in your system at runtime.
Because feature flags enable software development and delivery teams to turn functionality on and off at runtime without deploying new code, feature management has become a mission-critical component for delivering cloud-native applications. In fact, feature management supports a range of practices rooted in achieving continuous delivery, and it is especially key for progressive delivery’s goal of limiting blast radius by learning early.
Think about all the use cases. Feature flags allow you to run controlled rollouts, automate kill switches, a/b test in production, implement entitlements, manage large-scale architectural migrations, and more. More fundamentally, feature flags enable trunk-based development, which eliminates the need to maintain multiple long-lived feature branches within your source code, simplifying and accelerating release cycles.
While feature flags alone are very powerful, organizations that use flagging at scale quickly learn that additional functionality is needed for a proper, long-term feature management approach. This requires functionality like a management interface, the ability to perform controlled rollouts, automated scheduling, permissions and audit trails, integration into analytics systems, and more. For companies who want to start feature flagging at scale, and eventually move towards a true progressive delivery model, this is where companies like Split come into the mix.
Split offers full support for progressive delivery. We provide sophisticated targeting for controlled rollouts but also flag-aware monitoring to protect your KPIs for every release, as well as feature-level experimentation to optimize for impact. Additionally, we invite you to learn more about our enterprise-readiness, API-first approach, and leading integration ecosystem.
Feature flag tools, like Split, all use their proprietary SDKs with frameworks, definitions, and data/event types unique to their platform. There are differences across the feature management landscape in how we define, document, and integrate feature flags with 3rd party solutions, and with this, issues can arise.
For one, we all end up maintaining a library of feature flagging SDKs in various tech stacks. This can be quite a lot of effort, and that all is duplicated by each feature management solution. Additionally, while it is commonly accepted that feature management solutions are essential in modern software delivery, for some, these differences also make the barrier to entry seem too high. Rather, standardizing feature management will allow organizations to worry less about easy integration across their tech stack, so they can just get started using feature flags!
Ultimately, we see OpenFeature as an important opportunity to promote good software practices through developing a vendor-neutral approach and building greater feature flag awareness.
Created to support a robust feature flag ecosystem using cloud-native technologies, OpenFeature is a collective effort across multiple vendors and verticals. The mission of OpenFeature is to improve the software development lifecycle, no matter the size of the project, by standardizing feature flagging for developers.
By defining a standard API and providing a common SDK, OpenFeature will provide a language-agnostic, vendor-neutral standard for feature flagging. This provides flexibility for organizations, and their application integrators, to choose the solutions that best fit their current requirements while avoiding code-level lock-in.
Feature management solutions, like Split, will implement “providers” which integrate into the OpenFeature SDK, allowing users to rely on a single, standard API for flag evaluation across every tech stack. Ultimately, the hope is that this standardization will provide the confidence for more development teams to get started with feature flagging.
“OpenFeature is a timely initiative to promote a standardized implementation of feature flags. Time and again we’ve seen companies reinventing the wheel and hand-rolling their feature flags. At Split, we believe that every feature should be behind a feature flag, and that feature flags are best when paired with data. OpenFeature support for OpenTelemetry is a great step in the right direction,” said Pato Echagüe, Split CTO and sitting member of the OpenFeature consortium.
We are confident in the power of feature flagging and know that the future of software delivery will be done progressively using feature management solutions, like Split. Our hope is that OpenFeature provides a win for both development teams as well as vendors, including feature management tools and 3rd party solutions across the tech stack. Most importantly, this initiative will continue to push forward the concept of feature flagging as a standard best practice for all modern software delivery.
To learn more about OpenFeature, we invite you to visit: https://openfeature.dev.
Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.
The Split Feature Data Platform™ gives you the confidence to move fast without breaking things. Set up feature flags and safely deploy to production, controlling who sees which features and when. Connect every flag to contextual data, so you can know if your features are making things better or worse and act without hesitation. Effortlessly conduct feature experiments like A/B tests without slowing down. Whether you’re looking to increase your releases, to decrease your MTTR, or to ignite your dev team without burning them out–Split is both a feature management platform and partnership to revolutionize the way the work gets done. Switch on a free account today, schedule a demo, or contact us for further questions.


Delivering feature flags with lightning speed and reliability has always been one of our top priorities at Split. We’ve continuously improved our architecture as we’ve served more and more traffic over the past few years (We served half a trillion flags last month!). To support this growth, we use a stable and simple polling architecture to propagate all feature flag changes to our SDKs.
At the same time, we’ve maintained our focus on honoring one of our company values, “Every Customer”. We’ve been listening to customer feedback and weighing that feedback during each of our quarterly prioritization sessions. Over the course of those sessions, we’ve recognized that our ability to immediately propagate changes to SDKs was important for many customers so we decided to invest in a real-time streaming architecture.
Early this year we began to work on our new streaming architecture that broadcasts feature flag changes immediately. We plan for this new architecture to become the new default as we fully roll it out in the next two months.
For this streaming architecture, we chose Server-Sent Events (SSE from now on) as the preferred mechanism. SSE allows a server to send data asynchronously to a client (or another server) once a connection is established. It works over standard HTTP(S), which is an advantage over other protocols, and browsers offer a standard JavaScript client API named EventSource, implemented in most modern browsers as part of the HTML5 standard.
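As a generic illustration of the mechanism (the endpoint URL and message shape below are placeholders, not Split's actual streaming channel or payload format), an SSE client in the browser is just a few lines:

```javascript
// Generic SSE illustration: the URL and message shape are placeholders,
// not Split's actual streaming endpoint or payload format.
const source = new EventSource('https://streaming.example.com/changes?channel=feature-flags');

source.onmessage = (event) => {
  // Each push arrives as a text payload; here we assume JSON-encoded changes.
  const change = JSON.parse(event.data);
  console.log('Feature flag change received:', change);
};

source.onerror = () => {
  // On connection problems, an SDK would fall back to polling.
  console.warn('Streaming connection lost; falling back to polling.');
};
```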
While real-time streaming using SSE will be the default going forward, customers will still have the option to choose polling by setting the configuration on the SDK side.

Running a benchmark to measure latencies over the Internet is always tricky and controversial as there is a lot of variability in the networks. To that point, describing the testing scenario is a key component of such tests.
We created several testing scenarios which measured:
We then ran this test several times from different locations to see how latency varies from one place to another.
In all those scenarios, the push notifications arrived within a few hundred milliseconds, and the full message containing all the feature flag changes consistently arrived with under a second of latency. This last measurement includes the time until the last byte of the payload arrives.
As we march toward the general availability of this functionality, we’ll continue to perform more of these benchmarks and from new locations so we can continue to tune the systems to achieve acceptable performance and latency. So far we are pleased with the results and we look forward to rolling it out to everyone soon.
Both streaming and polling offer a reliable, highly performant platform to serve splits to your apps.
By default, we will move to a streaming mode because it offers:
In case the SDK detects any issues with the streaming service, it will use polling as a fallback mechanism.
In some cases, a polling technique is preferable. Rather than react to a push message, in polling mode, the client asks the server for new data on a user-defined interval. The benefits of using a polling approach include:
We are excited about the capabilities that this new streaming architecture approach to delivering feature flag changes will deliver. We’re rolling out the new streaming architecture in stages starting in early May. If you are interested in having early access to this functionality, contact your Split account manager or email support at support@split.io to be part of the beta.
To learn about other upcoming features and be the first to see all our content, we’d love to have you follow us on Twitter!


Consider the advantages and disadvantages of employing a tenant (e.g., account-based) traffic type versus a conventional user traffic type for each experiment. Unless it is crucial to provide a consistent experience for all users within a specific account, opt for a user traffic type to facilitate experimentation and measurement. This will significantly increase your sample size, unlocking greater potential for insights and analysis.
Important to note: In Split, the traffic type for an experiment can be decided on a case-by-case basis, depending on the feature change, the test’s success metrics, and the sample size needed.
Even if using a tenant traffic type is the only logical choice for your experiment, there are strategies you can employ to increase the likelihood of a successful (i.e., statistically significant) test.
Utilize the 10 Tips for Running Experiments With Low Traffic guide. You can thank us later!
Split’s deterministic hashing algorithm ensures that a 50/50 experiment divides tenants according to that percentage (and its Sample Ratio Mismatch calculator flags deviations), but it doesn’t account for the fact that some tenants may have more users than others.
This can result in an unbalanced user allocation across treatments when using “Accounts” as the tenant type.

A reminder: The numerator is set to the event you want to count (e.g., number of clicks to “download desktop app”). The denominator is set to an event that occurs leading up to the numerator event (e.g., number of impressions or screen views where the user is prompted to “download desktop app”). The denominator can also be a generic event that tracks the number of users who saw the treatment.
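For example, with the server-side Split SDK's track call, the two events might be sent like this (the key, traffic type, and event names are illustrative):

```javascript
// Assumes `client` is an initialized server-side Split SDK client (factory.client()).

// Denominator: the user was shown the "download desktop app" prompt.
client.track('user-123', 'user', 'download_desktop_app_impression');

// Numerator: the user clicked "download desktop app".
client.track('user-123', 'user', 'download_desktop_app_click');
```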
If you follow these steps, you should be able to overcome most obstacles when running a B2B experiment. And remember: Split offers the unique flexibility to run experiments based on the traffic type that suits your needs. Learn more here.


The concept of serverless computing (https://en.wikipedia.org/wiki/Serverless_computing), also called Functions as a Service (FaaS), is fast becoming a trend in software development. This blog post will highlight steps and best practices for integrating Split feature flags into a serverless environment.
Serverless architectures enable you to add custom logic to other provider services, or to break up your system (or just part of it) into a set of event-driven, stateless functions that execute on a certain trigger, perform some processing, and act on the result, either by sending it to the next function in the pipeline, returning it as the result of a request, or storing it in a database. One interesting use case for FaaS is image processing, where there is a need to validate data before storing it in a database, retrieve assets from an S3 bucket, and so on.
This architecture offers several advantages. Some of the main providers of serverless platforms are Amazon (AWS Lambda), Google (Cloud Functions), and Microsoft (Azure Functions). Regardless of which provider you choose, you still reap the benefits of feature flagging without managing real servers.
In this blog post, we’ll focus on AWS Lambda with functions written in JavaScript running on Node.js. Additionally, we’ll highlight one approach to interacting with Split feature flags in a serverless application. It’s worth noting that there are several ways to interact with Split in a serverless application, but we will highlight just one of them in this post.
If we are using Lambda functions in AWS, the best approach is to use ElastiCache (Redis) as an in-memory external data store, where we can keep the feature flag rules that the Split SDKs running in the Lambda functions will use to evaluate flags.
One way to achieve this is to set up the Split Synchronizer, a background service that keeps Split data synchronized in an external Redis cache for multiple SDKs to consume. To learn more about the Split Synchronizer, check out our recent blog post.
In turn, the Split Node.js SDK has a built-in Redis integration that can be used to communicate with a Redis ElastiCache cluster. The diagram below illustrates the setup:

Start by going to the ElastiCache console and creating a cluster within the same VPC that you’ll be running the Lambda functions from. Make sure to select Redis as the engine.

The next step is to deploy the Split Synchronizer on ECS (in synchronizer mode) using the existing Split Synchronizer Docker image. Refer to this guide on how to deploy Docker containers.
Now, from the EC2 Container Service (ECS) console, create an ECS cluster within the same VPC as before. Next, create the task definition that the service will use by going to the Task Definitions page. This is where the Docker image repository is specified, along with any environment variables that are required.
Since images on Docker Hub are available by default, you only need to specify the organization/image (in this case, splitsoftware/split-synchronizer).

Then set the required environment variables (specifics can be found in the Split Synchronizer docs).

Any Docker port mapping needed can be specified during the task creation.
At this point, we have the ECS cluster and our task definition. The next step is to create a service that uses this task: go to your new cluster and click “Create” on the Services tab. At a minimum, you need to select the task and the number of tasks running concurrently:

Finish with any custom configuration you may need, then review and create the service. This will launch as many task instances as specified. If there were no errors, the feature flag definitions provided by the Split service should already be in the external cache, ready to be used by the SDKs integrated in the Lambda functions that we’ll set up in the next section.
There are a couple of things to know before we start. On the custom function, install the @splitsoftware/splitio npm package (https://www.npmjs.com/package/@splitsoftware/splitio) and include the node_modules folder in the zip.
Step by step, the example function looks like this: create an index.js file, require the @splitsoftware/splitio package there, and export a handler function. One important thing to note: because an asynchronous storage (Redis) is used, calls to the SDK API return asynchronously.
View the example code below:
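Below is a minimal sketch of such a function, assuming the SDK's Redis-backed consumer mode; the flag name ('my_feature') and the environment variable names are illustrative, and the exact factory import may vary by SDK version:

```javascript
const { SplitFactory } = require('@splitsoftware/splitio');

// Consumer mode: flag definitions are read from the Redis (ElastiCache) cache
// that the Split Synchronizer keeps up to date.
const factory = SplitFactory({
  mode: 'consumer',
  core: {
    authorizationKey: process.env.SPLIT_SDK_KEY, // illustrative variable name
  },
  storage: {
    type: 'REDIS',
    options: {
      url: process.env.REDIS_URL, // e.g. redis://<elasticache-endpoint>:6379/0
    },
  },
});

const client = factory.client();

exports.handler = async (event) => {
  const userId = event.userId || 'anonymous';

  // In consumer mode the storage is asynchronous, so getTreatment returns a promise.
  const treatment = await client.getTreatment(userId, 'my_feature');

  return {
    statusCode: 200,
    body: JSON.stringify({ treatment }),
  };
};
```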
Once the code has been written, it’s time to prepare the deployment package by creating a zip that includes index.js and the node_modules folder. Next, go to the Lambda console and select “Create function”. On the blueprint selection page, select the “Author from scratch” option and add the trigger that will be used. It’s recommended not to enable the trigger until you’re certain the function works as expected.
In the Lambda function code section, select the “Upload a .ZIP file” option. The package can also be uploaded to S3 and the URL specified. Any environment variables required by the Lambda function can be specified here (for example, the one pointing to the Redis ElastiCache endpoint from the previous step).

Set up your handler function in the section called “Lambda function handler and role”. Leave the default as index.handler.
Note that the first part is the file name inside the zip where the handler function is exported, and the second part is the function name. For example, if a file is called app.js and the function is called myHandler, the “Handler” value would be app.myHandler.
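For instance, matching the example above:

```javascript
// app.js: with this export, the Lambda "Handler" setting would be app.myHandler.
exports.myHandler = async (event) => {
  return { statusCode: 200, body: 'ok' };
};
```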
On the Advanced settings of this step, set the VPC where the ElastiCache cluster is.
Once the roles and anything else that is required has been configured, click next, review and create the function.
That’s it! To test your function manually, just click the “Test” button, select the synthetic test event of your preference, and check that it works as expected.
There are a few ways to make use of Split feature flags in a serverless application. This blog post covered the case of using the Split Synchronizer with JavaScript functions.
In future posts, we’ll share other approaches, such as using Split “callhome” or the Split Evaluator, a microservice that can evaluate flags and return the result, in addition to the approach highlighted in this post of storing the rules used to evaluate the flags.
In case you’re wondering, “Can’t I just hit the Split servers from my Lambda function?”, the answer is yes, in “standalone” mode, but it won’t be as efficient as keeping the state in one common place, i.e., Redis. Running the SDK in standalone mode is NOT recommended here due to the latency incurred by creating one SDK object per function.
For further help using Split synchronizer in a serverless environment contact us or use the support widget in our cloud console — we’re here to help!


For any software company, reducing log volume saves money. We also know precisely how painful it is to have a production problem, or even an incident, only to find that we haven’t logged nearly enough. There are several strategies to balance these two conflicting goals, including configuration to control log levels and sampling. In this post, we will discuss how feature flags can help you improve your logging strategy so you can update logging behavior without pushing a configuration change, allowing for faster modifications in a crisis.
First, whether or not you use feature flags, we recommend wrapping your logging in an internal library. This has a few advantages. It allows you to keep a consistent format across your logs. Instead of relying on each developer to formulate their own logs, you can have them specify a few parameters and format the rest for them. Additionally, it allows you to automatically fill in fields you want everywhere, such as trace_id or user_id (or whatever applies to your application). Finally, it gives you a single location to add a feature flag.
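As a minimal sketch of that idea in Node.js (the example later in this post uses Java and Logback), assuming a generic baseLogger, an initialized Split client in standalone mode, and a hypothetical 'log-level' flag:

```javascript
const LEVELS = ['debug', 'info', 'warn', 'error'];

// Hypothetical internal logging wrapper: one consistent format, shared fields
// filled in automatically, and a single place to consult a feature flag.
function createLogger(baseLogger, splitClient, context) {
  function log(level, message, fields = {}) {
    // The flag decides the minimum level that actually gets written;
    // an unexpected treatment (e.g. 'control') falls back to logging everything.
    const minLevel = splitClient.getTreatment(context.userId, 'log-level');
    if (LEVELS.indexOf(level) < LEVELS.indexOf(minLevel)) return;

    baseLogger[level]({
      message,
      trace_id: context.traceId,
      user_id: context.userId,
      ...fields,
    });
  }

  return {
    debug: (msg, fields) => log('debug', msg, fields),
    info: (msg, fields) => log('info', msg, fields),
    warn: (msg, fields) => log('warn', msg, fields),
    error: (msg, fields) => log('error', msg, fields),
  };
}
```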
Now that we have a feature flag for our logs, how does that help? We will set it up to use that feature flag to control sampling rate and log level per class. There are a few ways to do this, and we’ll follow up with another post about how we actually did this for our own logs. For this post, though, we will explain one of the other options.
At a high level, we will set up a default logging level with the ability to override this—at the class level. To do this, we’ll start by creating a treatment for each log level.

Once we have created the Split with the log levels, we need to create a Logback Interceptor class. It will fetch Split changes periodically and set the corresponding level on the ROOT logger at runtime. The class diagram below illustrates the idea:

The LogbackInterceptor implementation periodically fetches the treatment from Split and applies the corresponding level to the ROOT logger. To get it running, add a single call to its static init() method, injecting the SplitClient (see how to set up the Split SDK here) and the Split name.
With this simple approach, you can change log levels at runtime without stopping your program execution.
Taking this further, we can add a little more complexity to handle not only the log level but to also control the number of logs by sampling. To do this, we need to create a Logback appender and use the Split feature known as Dynamic Configuration:
The first step is to configure the Split with the desired configuration approach; you can use key-value pairs or a custom JSON.
In this example, we are setting up a custom JSON value to have more flexibility in our configuration:
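A dynamic configuration payload along these lines could be attached to each treatment (the keys, logLevel and sampleRate, are illustrative, not a required schema):

```json
{
  "logLevel": "DEBUG",
  "sampleRate": 0.1
}
```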

Once we have set our dynamic configuration per treatment, we can write our code.
In this case, we will create a concurrent storage class to share the dynamic configuration across our LogbackInterceptor class. The LogbackInterceptor will fetch data from Split and write the configuration values into storage. The Logback appender will be reading from the storage when sampling log lines.
The next diagram illustrates this approach:

Following the previous diagram, the LogbackInterceptor writes the configuration it fetches from Split into the shared storage, and the Logback appender reads from that storage when deciding whether to sample each log line.
Now you can see how feature flags, and especially Split, can help improve your logging. They allow you to log less with the peace of mind that you can quickly and easily increase logging if something happens. And you can do all of this without pushing a code or configuration change, and without littering the code with separate feature flags just waiting to be flipped in case of emergency.


Companies that invest constantly in tools to help engineering teams reduce the cycle time show a higher degree of employee satisfaction, leading to less employee frustration and higher retention over time. Sounds great, right?
We also know that the most elite engineering organizations are able to move code from commit to production in less than one day. The next tier down, highly successful engineering orgs, have a cycle time of one week or less. And the rest follow. Based on this classification, how does your team rank?
This data comes from DORA research and from a joint webinar we recently conducted with Bryan Helmkamp from CodeClimate, in which we discussed one way to define development cycle time as well as tips on how to improve it. I was excited about this webinar because feature flags can help reduce the time spent in several phases of the cycle.
Before we dig deeper into how to reduce cycle time, let’s talk about what it is. Bryan defines cycle time as the amount of time between when code is committed and when it is shipped to production.
Bryan and his team have an opinionated view on how to define the phases. They recommend you start measuring code cycle time when the pull request is made, and treat the time more like a Service Level Objective (SLO) (https://www.atlassian.com/incident-management/kpis/sla-vs-slo-vs-sli), where you measure the 90th or 95th percentile rather than just averages.
I’ll go ahead and describe code cycle times below and provide my commentary regarding where feature flags can help decrease the time of some of the phases.

Time to open measures the time from first commit to the time the pull request is opened. It accounts for the largest chunk of overall cycle time. Bryan mentioned that when people create smaller pull requests they tend to be picked up quicker, reviewed faster and deployed faster as well.
Here is where feature flags have a profound positive impact on cycle time. Why is that? Because when using feature flags, you separate a push from a release. And when you do that, engineers feel safer merging code and shipping it to production. The new code path is gated by a flag that is not yet visible to users. And the byproduct of that is smaller pull requests, faster time to review and shorter Time to Open cycle time.
This is the amount of time from when the pull request is opened to the time of first review. It is an indicator of team collaboration patterns in the organization. As we all know, slow reviews increase the amount of work in progress.
Feature flags again help decrease this time by allowing engineers to create pull requests in small batches, which in turn helps reviewers review and approve outstanding pull requests faster.
Engineering leaders must make sure that coding cadence is not the only thing that gets rewarded. Code reviews have to be something that the leader rewards as well, given the implications to the cycle time and development cadence.
Other investments you can make are in tooling and integrations (like Slack) to make sure people are aware there are pull requests ready to be reviewed and make collaboration more efficient.
Time to approve refers to the time from the first review to when the pull request is ready to be merged.
This phase is about aligning on what it means for a PR to be considered ready, and it must balance speed and thoroughness.
Things to look out for here include what percentage of comments imply actions. Driving this metric to any extreme is not good. You must seek a balance. Too many comments slow things down and too few can lead to defects.
Lastly, the number of review cycles is another metric to optimize. Too much back and forth lengthens cycle time. Bryan’s team found their sweet spot at four review cycles per PR.
Time to deploy is the time from when the PR is ready to go until it is deployed to production.
You can decrease the time in this phase by investing in reliable Continuous Integration (CI) tooling to increase people’s confidence in the deploy process. When there is no investment in this area, people lose confidence in deployments as the code base grows, out of fear of breaking something. Automate as much as you can, such as testing and security checks (also known as shifting left).
Feature flags play an important role here, and it is something Bryan calls out in his presentation. Code can arrive in production much faster and with lower friction when using feature flags (remember, a code push doesn’t imply a release). This raises the question of whether there should be a fifth phase in the cycle that covers time to release. What do you think?
If you’d like to learn more about DORA, the research organization mentioned in this post, check out their website, and their survey data on high-performing companies.


How often do you build a product that you end up using every day? At Split, we “dogfood” our own product in so many ways that our engineering and product teams are using Split nearly every day. It’s how we make Split better. Using your own product as a tool to build your product gives you a front-row experience of how valuable your product is to your customers, how well it solves specific use cases, where the pain points are, and so much more.
I believe every software company should deploy feature flags in their product. Why? Because feature flags provide a safety net that makes engineering teams more productive: they allow engineers to ship code faster, open up the possibility of testing in production, and enable dev and product teams to quickly kill any feature that causes product degradation, often in a matter of seconds.
Today, I’d like to walk you through a few of the ways we’re using feature flags at Split. Some of these will hopefully be familiar and obvious, but my hope is that others will give you ideas for new ways to drive efficiency, innovation, or simply product-market fit in your organization.
We talk a lot about testing in production because it’s one of the most obvious, and obviously useful, reasons to deploy feature flags. When a feature is ready for delivery (or, at minimum, has passed all testing in your staging or pre-production environment), it can be deployed to production in a dark way. This means the binary containing the new feature is in production, but no user can access it because the flag is turned off.
At Split, we first toggle the new feature on for internal users to complete testing. Once it’s ready and the functionality has passed all testing criteria, we will ramp up the feature and expose it to 5%, 10%, 25%, 50%, and 100% of our users. For some feature releases, we’ll literally stop at each of those percentage rollouts to confirm everything is still working as intended before moving on. For others, we’ll use a subset of those steps. Only once we’ve reached 100% is the feature considered to be fully rolled out, at which point we remove the flag.
We also use flags to gate functionality based on the product tier an account or user is in. This is a really common feature flag use case. For example, if you are a free customer, you only get access to email support. For our paid customers with premium support packages, however, we enable chat support as well via feature flags.
The product can be automated so that when a user upgrades to a new product tier, a feature flag is updated to include that customer in the list that has access to premium support functionality, like chat.
In many SaaS companies, customer success and engineering teams require some degree of access to production and customer data in order to help customers with their support requests. This obviously comes with a variety of regulatory and compliance issues, depending on your industry and certifications.
A practice we’ve adopted at Split is to gate access to customer data or impersonation through a feature flag. Only a limited set of employees who have passed a rigorous background and financial check can access customer data. Every time new access is required, a feature flag change request is created, a Split administrator approves or rejects it, and upon approval the employee can access the impersonation functionality via the feature flag grant. For this, we leverage our recently released approval flows feature. This segregation of duties is a key part of SOC 2 certification, and not having this practice in place can delay the certification approval process.
Feature flags are commonly used to help with technology migrations and to migrate from monolith to microservices. At Split, we use flags where there is any migration of technologies, for example, while evaluating a migration from AWS Kinesis to Kafka. Stick with me on this one, since we’re going to dip a toe into the world of experimentation, and how it’s enabled by feature flags. In a typical scenario, you would place a flag to enable a dark-write (or double writes) operation into the new system to test traffic live and verify how it will perform in production. Then a second flag is created to enable dark-read, similar to the prior flag to verify the read performance without affecting the performance of the user (hence, dark reads). Finally, a third flag is created to switch over the traffic to send requests to the new solution.
Throughout the life of Split, we have had a few opportunities to replace existing infrastructure, typically as part of a scaling conversation. Before we dig into the migration itself, we have to answer the question, “Is the new system more expensive than the current one?” The quickest and lowest-risk way to answer it is to place the new system next to the current one, send dark traffic to it for a short period of time, and extrapolate the cost. This is resource-efficient, since you can run the evaluation for a single day with little to no side effects.
At Split, we used this technique to evaluate a migration from using Kinesis Stream as the queue to receive all incoming flag evaluation data to SQS. SQS was placed behind a feature flag that allowed dark writes with the purpose of gathering data for 24 hours to then extrapolate what it would cost if we were to run it permanently. We were surprised to find that it ended up being a more economical and more performant solution and we prioritized resources to move to SQS in the end.
Michael Nygard popularized the Circuit Breaker pattern to prevent a cascade of failures in a system. We use feature flags as a main disconnect switch for functionality that must behave within certain tolerance values. If those values are exceeded, a simple toggle can disconnect that functionality, or a percentage rollout can limit how much it is used. The end goal? Making sure downstream systems stay stable and healthy.
At Split, we use this pattern for things like external API endpoints, data collection services, frequency of synchronization with external systems, etc.
Because we use feature flags as manual circuit breakers, it is relatively easy to automate remediations when certain conditions are met. For example, if we gate certain functionality, like data ingestion from source A, and that pipeline is getting more load than the system can handle, we can enable (or disable) a flag to indicate that a certain amount of noncritical traffic should be dropped to preserve the integrity of the system.
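A rough sketch of such a flag-gated remediation (the flag name, helpers, and the Split client are all assumed for illustration):

```javascript
// Assumes an initialized Split client in standalone mode;
// 'shed-noncritical-ingestion' is a hypothetical flag.
function handleIncoming(event) {
  const treatment = client.getTreatment(event.sourceId, 'shed-noncritical-ingestion');

  // When the breaker is tripped, drop noncritical traffic to protect downstream systems.
  if (treatment === 'on' && !event.critical) {
    droppedCounter.increment(); // hypothetical metrics helper
    return;
  }

  processEvent(event); // hypothetical downstream processing
}
```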
Currently, we are experimenting with Transposit to build automated runbooks so engineers can act automatically following a pre-established process to mitigate an incident. These processes will involve disabling, enabling, or changing the exposure of a feature flag as part of the runbook and with a click of a button. As part of this work, we’ll be excited to release runbook templates for our customers to use. Stay tuned!
This approach can be controversial, since many logging frameworks let you enable debug or verbose mode natively. The advantage of using flags for this use case, wrapping a more verbose logging level in a feature flag, is that you can target a specific customer or condition, versus doing it at the logger level, which is coarser and tends to be binary: verbose on or off. With feature flags, you can enable verbose mode for network traffic for a given user, for a set of users within a certain account, or for a particular user agent, among others. Once the debugging session is done, the flag is turned back off.
We use this technique at Split when a support ticket is escalated to engineering for deeper analysis, and it has contributed to lower support request resolution times. One particular example is a flag that enables debugging for our SAML (single sign-on) functionality. Historically it has been an area with recurrent support tickets given the number of third-party identity providers, each of which has their own nuances. Having this logic toggle to turn on verbose logging has helped our support organization reduce support ticket resolution time.
I hope the use cases in this post serve as a starting point for readers who are new to experimentation and feature flags, or help deepen the usage of Split for those who are already using it.


Some experiments require a long time to reach significant results. If one option is preferable in the short term, you can split traffic unevenly for a long period of time: expose 95% of traffic to the preferred option and 5% to the other one. Both this pattern and the small group held back from the preferred option are known as a holdback.
Some changes to your web service or product, like making the purchase flow easier to navigate, are meant to raise business-critical metrics immediately. Others, like a new channel for customer service, might improve customer satisfaction rapidly but will only have a measurable, compounding effect on retention and other business-critical metrics in the long run. You can confirm that customers like the new option by looking at the Net Promoter Score (NPS). However, should you expose half of your users to a worse experience for months to measure the impact on churn?
There are many cases where an experiment should not last more than a few weeks, three months at most, to keep the product cycle manageable. However, some effects, like customer churn, can take longer to measure. Say you want to measure the impact of your change on churn, and your customers book a holiday or review their retirement plan only once a year. In either of those cases, a ten-week experiment is too short for customers to return and for you to gather the data needed to measure churn.
There are several options:
An approach we could recommend is to run the experiment as planned but set the short-term goal, like a customer satisfaction survey, as your objective criterion, and roll the change out to all customers if the impact after a few weeks is significantly positive. Months later, you can check whether your overall retention has indeed improved compared to before the experiment, with the limitations of a before-and-after comparison.
With that third approach, you can still measure what it’s like to have better customer service for a couple of purchase cycles; not only that, you can also measure the impact of expecting excellent service, time after time over extended periods. For example, it might increase entitlement, it could affect the brand positively, it could drive stories about exceptional situations where a better service was helpful.
The first question from your statisticians or analysts will likely be: “Would we be able to measure the impact over only 5% of the audience? Wouldn’t that mean three times less power?” Roughly, yes: 5% of traffic is ten times fewer units than 50%, and because the standard error of your estimate scales with 1/√n, the uncertainty is about √10 ≈ 3 times larger. But a longer test setup is more sensitive: more visitors can enroll in the experiment, and some effects compound.
More importantly, with customers being exposed to the better customer service multiple times, their retention should not just improve but compound: if retention improves by 10% over one month, it’s 21% better after two months and 77% better after six (since 1.1² = 1.21 and 1.1⁶ ≈ 1.77). Those larger, compounding effects are easier to detect.
If you run a balanced 50/50 test, you know which variant offers the most short-term positive value, or which one is the most promising overall. To minimize the negative impact of testing on the business, you want to roll out the most promising variant to 90 or 95% of the user population, especially when it leads on leading indicators: best customer satisfaction, most items marked as favorites, etc.
You can decide to pick the option that will be easiest to deactivate, in case the holdback experiment gives surprising results. Introducing new interactions means that removing them will come at a cost. Keep in mind, however, that a holdback is there to confirm a previous result, and possibly to measure its impact more accurately; it rarely flips the overall outcome.
Another way to decide which option to prioritize is to think about the possibilities that this opens. Allowing customers to identify their favorites (without buying) allows you to reactivate them with more purchase opportunities. It allows your machine learning team to train better recommendations. Those improvements can contribute to assigning more value to your preferred option.
Of course, if your users talk to each other, those left behind might resent that they don’t have a better experience. You might get bad press from the discrepancy. Exercise discretion and override the holdback when it is more expensive than interesting. Still, this effort will be beneficial in the long run to convince your executive stakeholders to invest in better service for long-term objectives.
When running this process, one of the most common issues is maintaining the old code or the previous operational processes for longer. That is a legitimate source of concern for the software engineers and the operational managers who will want to move on. Getting them engaged with the process is critical. You should explain the value of experimentation and why a holdback is useful when dealing with long-term effects. They will generally understand that this aligns with their objective of having a more streamlined experience and investing to resolve technical and operational debt in the long term too.
When you run an experiment that proves beneficial over the short term, you will want to roll it out as soon as you have significant results. However, if you still want to investigate its long-term effect, you also need to keep the rigor of a controlled experiment. To make sure as many users as possible benefit from the improved experience, roll it out to the majority of users, say 95%, and keep a minority of users in a long-term control group. This is known as a holdback.
After several months, you should have a strong signal about the long-term impact on key metrics, notably those that compound. Remember to switch the holdback to the new experience when your experiment is over.


In a digital business landscape marked by rapid evolution and customer-centricity, conversion rates have emerged as a vital metric of success. They are more than just numbers or percentages. Conversion rates signify the effectiveness of your marketing strategies and the resonance of your offerings with your target audience.
Your conversion rate is a clear indicator of how well you’re meeting your customers’ needs and wants. High conversion rates suggest that you’re providing value in a way that resonates with your audience, leading them to take the desired actions, whether it’s making a purchase, signing up for a newsletter, or any other goal you’ve set. Conversely, a lower conversion rate can signal a disconnect between your offerings and your audience’s expectations or needs. Understanding and optimizing conversion rates is, therefore, crucial for the growth and profitability of your business.
Feature flags are tools commonly used in software development for controlling the visibility and functionality of certain application features. However, their potential goes beyond just development. In the context of conversion rate optimization, feature flags can become a marketer’s secret weapon. They provide an opportunity to carry out extensive testing, refine the user experience, and, consequently, enhance the effectiveness of your conversion strategy.
The use of feature flags in conversion rate optimization represents a synergy between your development and marketing teams. It creates a pathway for these traditionally siloed units to collaborate and contribute towards a common goal—driving conversions. This collaborative approach can lead to a deeper understanding of user behavior and preferences, enabling you to tailor your offerings and user experience in a way that boosts conversion rates.
In this blog post, we’ll explore the concept of feature flags in depth, discuss how they can be leveraged for optimizing conversion rates, and illustrate how Split can support you in this endeavor.
The digital world’s dynamism means your conversion rates are never static. They can fluctuate based on myriad factors: evolving market trends, shifting user behavior, or changes in competitive dynamics.
To stay relevant and maintain high conversion rates, businesses must embrace adaptability in their strategies. This adaptability extends not just to marketing messaging but also to the user experience on your digital platform, which is where feature flags come into play.
The traditional use case for feature flags is in code deployment: they enable developers to release, test, and iterate on features safely in a live environment. But the utility of these powerful tools extends well beyond the engineering silo.
By enabling the dynamic manipulation of features, content, and overall user experience, feature flags can help marketers directly influence customer behavior and thereby optimize conversion rates.
Feature flags hold incredible potential as marketing tools. Though traditionally seen as a purely technical tool used for progressive delivery and risk mitigation, their usefulness in optimizing user experience and driving conversions has become increasingly apparent.
One of the significant benefits of feature flags is their ability to facilitate marketing experiments. By toggling features on or off for specific user segments, you can test various strategies and approaches, measure their effectiveness, and adjust accordingly. Feature flags provide the agility to test on a granular level, from modifying button colors and placement to the introduction of entirely new features. This experimental approach can help you understand what resonates best with your audience, providing valuable insights for future marketing strategies.
Today’s consumers crave personalized experiences, and feature flags can play an essential role in delivering them. Using feature flags, you can customize the features and user interface elements that different user segments experience. This high level of personalization can lead to increased engagement, better user experience, and, ultimately, higher conversion rates. For instance, a first-time visitor to your e-commerce platform might see a different set of features compared to a repeat customer, each designed to enhance their specific user journey and nudge them towards conversion.
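In practice, that kind of segment-aware personalization is just a flag evaluation with attributes (the flag name, attributes, and render functions below are illustrative):

```javascript
// Assumes an initialized server-side Split client.
const treatment = client.getTreatment(userId, 'checkout_experience', {
  is_returning_customer: false,
  plan: 'free',
});

if (treatment === 'first_time_visitor_flow') {
  renderFirstVisitCheckout(); // hypothetical UI for new visitors
} else {
  renderDefaultCheckout();    // hypothetical default experience
}
```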
Feature flags offer the ability to collect real-time feedback on the changes you implement, which can be critical in shaping your conversion rate optimization strategy. By monitoring user engagement and behavior after rolling out a feature to a small user segment, you can gain immediate insight into its impact. This fast feedback loop allows for the swift identification of features that drive conversions and those that might need further refinement.
Feature flags allow you to modify and test small elements at a time rather than implementing broad changes at once. This power of incremental changes can prove crucial for conversion rate optimization.
Numerous case studies and research suggest that cumulative, incremental changes—guided by data and user feedback—can lead to a significant boost in conversion rates over time.
Optimizing conversion rates with feature flags isn’t a one-team show. It involves close collaboration between development and marketing teams, marrying technical implementation with strategic decision-making.
In practice, organizations that have successfully leveraged this collaborative approach have reported significant improvements in their conversion rates. Their success underlines the power of breaking down silos and leveraging tools like feature flags across departments.
Feature flags offer a new way to approach conversion rate optimization—one that embraces adaptability, champions incremental improvements, and encourages collaboration across departments.
When engineering and marketing collaborate, using feature flags to align user experience with strategic objectives, businesses can make better, more informed decisions that drive conversions.
Ready to embrace this new way of conversion rate optimization? Split offers feature flagging solutions designed to empower both your development and marketing teams. Our platform supports dynamic configuration, enabling you to alter user experience in real time based on user feedback and analytics. This gives you the agility to adapt quickly and keep conversion rates high.
Dynamic configuration is an essential part of our platform’s power. It allows you to adjust the behavior of your software without needing to redeploy the entire application. With this feature, you can experiment, adjust and optimize on the go. Feature flagging is no longer just about risk mitigation; it’s about gaining actionable insights. Real-time adjustments lead to real-time insights, allowing you to stay ahead of the curve and keep conversion rates up.
Dynamic configuration empowers you to make changes that align with your users’ needs and behaviors as they evolve. When your digital platform can adapt quickly to shifting user preferences, you’ll see the impact on your conversion rates.