Testing & Resilience Blogs

Featured Blogs

Technical

Applying Feature Flag Context To Your OpenTelemetry Spans

Learn how to enrich OpenTelemetry spans with feature flag context to better understand performance and behavior across flag treatments.

When you toggle a feature flag, you're changing the behavior of your application; sometimes, in subtle ways that are hard to detect through logs or metrics alone. By adding feature flag attributes directly to spans, you can make these changes observable at the trace level. This enables you to correlate performance, errors, or unusual behavior with the exact flag treatment a user received.

In practice, adding feature flag attributes to your spans allows faster debugging, clearer insights, and more confidence when rolling out flags in production. As teams ship code faster than ever, often with the help of AI, feature flags have become a primary tactic for controlling risk in production. However, when something goes wrong, it’s not enough to know that a request was slow or errored; you need to know which feature flag configuration caused the issue.

Without surfacing feature flag context in traces, teams are left to guess which rollout, experiment, or configuration change affected the behavior. Adding feature flag treatments directly to spans closes this gap by making flag-driven behavior observable, debuggable, and auditable in real time.

Enhancing Observability with Feature Flags and OpenTelemetry

If you’re already using OpenTelemetry, you may want to understand how to surface feature flag behavior in your traces. This article walks you through one approach to achieving this: manually enriching spans with feature flag attributes, allowing you to query traces based on specific flag states.

While this isn’t a native Harness FME integration, you can apply a simple pattern in your own applications to improve observability:

Identify the spans in your code where feature flag behavior impacts execution. This could be a request handler, a background job, or any logical unit of work.
Start a span (or use an existing one) for that unit of work using your OpenTelemetry tracer.
Retrieve the relevant feature flag treatments for the context (for example, a user ID or session).
Add each flag treatment as a span attribute so your traces can capture the state of feature flags during execution.
Use these attributes in your observability platform (e.g., Honeycomb) to filter or query traces by flag state.

This approach requires adding feature flag treatments as span attributes in your application code. Feature flags are not automatically exported to OpenTelemetry in Harness FME.

For this demonstration, we will use Honeycomb’s Java Agent and a small sample application (a threaded echo server) to show how feature flag treatments can be added to spans for improved visibility. While this example uses Java, this pattern is language-agnostic and can be applied in any application that supports OpenTelemetry. The same steps apply to web services, background jobs, or any application logic where you want to track the impact of feature flags.

Prerequisites

Before you begin, ensure you have the following requirements:

Java installed (v11 or later)
A working local development environment
Basic familiarity with Java sockets and threads
Permission to bind to local ports (the sample server listens on port 5009)

Setup

Follow these instructions to prepare your workspace for running the sample threaded echo server:

Create a working directory for your project by running the following command: mkdir threaded-echo-server && cd threaded-echo-server.
Add your Java files module (for example, `ThreadedEchoServer.java` and `ClientHandler.java`).
Compile the server by running the following command: javac ThreadedEchoServer.java.
Run the server with java ThreadedEchoServer.

How the Threaded Echo Server works

To illustrate this approach, we’ll use a small Java example: a threaded socket server that listens on port 5009 and echoes back whatever text the client sends.

The example below introduces a simple Java-based Threaded Echo Server. This server acts as our testbed for adding flag-aware span instrumentation.

‍

When the feature flag next_step is on, the server sleeps for two seconds. The sleep is wrapped with a span named "next_step" / "span2". When the flag is off, the server executes the normal doSomeWork behavior without the added wait time.

This produces the visible difference in performance shown by OpenTelemetry in the chart below. With the flag turned on, the spans appear in your Honeycomb trace.
‍

**Figure A:** A Honeycomb trace, displaying an Echo Server client session with the feature flag toggled on.

In this trace, the client sends four words. Each word shows nearly two seconds of processing time, which is the exact duration introduced by the feature flag.

With the flag turned off, the resulting trace shows the normal, faster echo processing flow:

**Figure B:** A Honeycomb trace, displaying an Echo Server client session with the feature flag toggled off.

The feature flag impacts the trace in two ways:

A new nested span appears, named after the feature flag. These green bars displayed in each span show how the flag creates explicit instrumented regions within a single client session.
Two seconds of artificial latency make the spans easy to identify.

Adding Feature Flag Treatments to Spans

So far, we’ve seen that feature flags can create additional spans in a trace. We can take this a step further: making the flags themselves queryable by adding their treatments as attributes to the top-level span. This lets you filter and analyze traces based on flag behavior.

The example below shows how the server evaluates its feature flags and attaches each treatment to the root echo span.

‍

The program evaluates three feature flags: next_step, multivariant_demo, and new_onboarding. Using Harness FME, all flags are evaluated up front and stored in a flag2treatments map. Any dynamic changes to a flag during execution are ignored for the remainder of the program's run; however, there are ways to handle this in more advanced scenarios.

For this example, caching the treatments is fine, and each treatment is also added as a span attribute. By including the flag “impression” in the span, you can query traces to see which sessions were affected by a particular flag or treatment. This makes it easier to isolate and analyze trace behavior driven by specific feature flags.

**Figure C:** A Honeycomb query that filters traces by feature flag impression.

In Honeycomb, you can query traces by feature flag “impressions” by setting COUNT in the Visualize section and adding split.next_step = on in the Where section (using AND if you have multiple conditions).

Next Steps for Feature Flag Observability

Feature flags aren’t ideal candidates for bytecode instrumentation. The challenge here isn’t in the SDK itself, but rather in determining what behavior you want to observe when a flag is toggled on or off.

Looking ahead, one possible approach is to treat spans as proxies for flags: a span could represent a flag, allowing you to enable or disable entire sections of live application code by identifying the associated spans. While conceptually powerful, this approach can be complex and may not scale well, depending on the number of spans your application uses.

In the short term, a simpler pattern works well: manually wrap feature flag changes with a span and add the flag treatments as span attributes. This provides you with visibility, powered by OpenTelemetry, into how feature flags impact your application's behavior, enabling better traceability and faster debugging.

To get started with feature flags, see the Harness FME Feature Management documentation. If you’re brand new to Harness FME, sign up for a free trial today.

Recent Blogs

Company News

Harness FME Fast and Furious

Read about all the updates we have made to Harness FME

Trevor Stuart

November 20, 2025

Time to read

Over the past six months, we have been hard at work building an integrated experience to take full advantage of the new platform made available after the Split.io merger with Harness. We have shipped a unified Harness UI for migrated Split customers, added enterprise-grade controls for experiments and rollouts, and doubled down on AI to help teams see impact faster and act with confidence. Highlights include OpenFeature providers, Warehouse Native Experimentation (beta), AI Experiment Summaries, rule-based segments, SDK fallback treatments, dimensional analysis support, and new FME MCP tools that connect your flags to AI-assisted IDEs.

And our efforts are being noticed. Just last month, Forrester released the 2025 Forrester Wave™ for Continuous Delivery & Release Automation where Harness was ranked as a leader in part due to our platform approach including CI/CD and FME. This helps us uniquely solve some of the most challenging problems facing DevOps teams today.

A more integrated experience: from Split UI to Harness UI

This year we completed the front-end migration path that moves customers from app.split.io to app.harness.io, giving teams a consistent, modern experience across the Harness platform with no developer code changes required. Day-to-day user flows remain familiar, while admins gain Harness-native RBAC, SSO, and API management with personal access token and service account token support.

What this means for you:

No more switching UIs for customers who use FME and other Harness products
No SDK or proxy changes required for production apps. Your flag evaluations continue as before.
Harness RBAC and SSO now govern FME access. Migrations include clear before and after guides for roles, groups, and SCIM.
Admin API parity with a documented before and after map and examples for the few endpoints that moved.

For admins, the quick confidence checklist, logging steps, and side-by-side screens make the switch straightforward. FME Settings routes you into the standard Harness RBAC screens for long-term consistency where appropriate.

Built for AI-driven workloads

Two themes shaped our AI investments: explainability and in-flow assist.

Explainable measurement: AI Summaries help teams move faster from what happened to what should we do, without forcing deep dives into raw statistics.
AI in the developer loop: FME MCP tools help developers inspect, compare, and adjust flag states without context switching. This shortens the loop between finding an issue and safely changing a treatment.
Data where you need it: Warehouse Native Experimentation runs analyses where your data already lives, improving transparency and aligning with the way modern AI and analytics teams operate.

To learn more, watch this video!

Warehouse Native Experimentation (beta)

Warehouse Native Experimentation lets you run analyses directly in your own data warehouse using your assignment and event data for more transparent, flexible measurement. We are pleased to announce that this feature is now available in beta. Customers can request access through their account team and read more about it in our docs.

What else is new in Harness FME

As you can see from all the new features below, we have been running hard and we are accelerating into the turn as we head toward the end of the year. We take pride in the partnerships we have with our customers. As we listen to your concerns, our engineering teams are working hard to implement the features you need to be successful.

October 2025

FME MCP tools connect feature flags and experiments to AI-assisted IDEs such as Claude Code, Windsurf, Cursor, and VS Code. Explore flags, compare flag definitions, and audit rules conversationally to speed up release workflows.
OpenFeature providers for Android, iOS, Web, Java, Python, .NET, Node.js, React, and Angular help standardize evaluations and reduce lock-in across services and teams.
Harness Proxy centralizes and secures outgoing SDK traffic with support for OAuth and mTLS, easing egress-control needs at scale. It also works with other Harness products like Database DevOps.
Owners as metadata clarifies accountability while edit privileges remain governed by RBAC.

September 2025

SDK Fallback treatments allow you to avoid unexpected control treatments by offering a centralized, simple and scalable way to set these across your SDKs and applications.
Experiment entry event filter keeps analysis clean by including only users who actually hit the experiment entry point.

July 2025

Dimensional analysis on experiments reveals effects by browser, device, region, and more so you can catch segment-level regressions early.
AI Summarize on experiments and metrics gives fast, accurate summaries for non-technical stakeholders and busy teams.

June 2025

Rule-based segments target users dynamically using attribute conditions, reducing static list maintenance and simplifying targeting rules reusability.
Experiment tags improve searchability and at-scale organization.
Client-side cache controls for Browser, iOS, and Android SDKs let you tune rollout cache expiration and initialization behavior.

Foundation laid earlier in 2025

Experiments Dashboard for easier setup and multi-treatment analysis.
Release Agent, rebranded from Switch, adds follow-up Q and A on metric summaries with admin-controlled AI settings.

As always, you can find details on all our new features by reading our release notes.

What customers can expect next

We are excited to add more value for our customers by continuing to integrate Split with Harness to achieve the best of both worlds. Harness CI/CD customers can expect familiar and proven methodologies to show up in FME like pipelines, RBAC, SSO support and more. To see the full roadmap and get a sneak peak at what is coming, reach out to us to schedule a call with your account representative.

Get started

Attending AWS ReInvent? Come see us at booth #731 and get a live demo.
Already migrated from Split? Sign in at app.harness.io. If you are an admin, review the RBAC and SSO guides to validate group mappings and SCIM.
New to Harness FME? Explore the product page and datasheet, then try FME in your environment.

Want the full details? Read the latest FME release notes for all features, dates, and docs.
‍

Checkout The Feature Management & Experimentation Summit
Read comparison of Harness FME with Unleash

Intent-Driven Assertions are Redefining How We Test Software

Reimagine QA with intent-driven, AI-powered assertions that reduce flakiness and align testing with real user outcomes.

Shibam Dhar

November 3, 2025

Time to read

Picture this: your QA team just rolled out a comprehensive new test suite ; polished, precise, and built to catch every bug. Yet soon after, half the tests fail. Not because the code is broken, but because the design team shifted a button slightly. And even when the tests pass, users still find issues in production. A familiar story?

End-to-end testing was meant to bridge that gap. This is how teams verify that complete user workflows actually work the way users expect them to. It's testing from the user's perspective; can they log in, complete a transaction, see their data?

‍

The Real Problem Isn't Maintenance. It's Misplaced Focus.

Maintaining traditional UI tests often feels endless. Hard-coded selectors break with every UI tweak, which happens nearly every sprint. A clean, well-structured test suite quickly turns into a maintenance marathon. Then come the flaky tests: scripts that fail because a button isn’t visible yet or an overlay momentarily blocks it. The application might work perfectly, yet the test still fails, creating unpredictable false alarms and eroding trust in test results.

The real issue lies in what’s being validated. Conventional assertions often focus on technical details- like whether a div.class-name-xy exists or a CSS selector returns a value, rather than confirming that the user experience actually works.

The problem with this approach is that it tests how something is implemented, not whether it works for the user. As a result, a test might pass even when the actual experience is broken, giving teams a false sense of confidence and wasting valuable debugging time.

Some common solutions attempt to bridge that gap. Teams experiment with smarter locators, dynamic waits, self-healing scripts, or visual validation tools to reduce flakiness. Others lean on behavior-driven frameworks such as Cucumber, SpecFlow, or Gauge to describe tests in plain, human-readable language. These approaches make progress, but they still rely on predefined selectors and rigid code structures that don’t always adapt when the UI or business logic changes.

What’s really needed is a shift in perspective : one that focuses on intent rather than implementation. Testing should understand what you’re trying to validate, not just how the test is written.

That’s exactly where Harness builds on these foundations. By combining AI understanding with intent-driven, natural language assertions, it goes beyond behavior-driven testing, actually turning human intent directly into executable validation.

‍

What Are Intent-Driven Natural Language Assertions?

‍

Harness AI Test Automation reimagines testing from the ground up. Instead of writing brittle scripts tied to UI selectors, it allows testers to describe what they actually want to verify, in plain, human language.

Think of it as moving from technical validation to intent validation. Rather than writing code to confirm whether a button exists, you can simply ask:

“Did the login succeed?” or
“Is the latest transaction a deposit?”.

Behind the scenes, Harness AI interprets these statements dynamically, understanding both the context and the intent of the test. It evaluates the live state of the application to ensure assertions reflect real business logic, not just surface-level UI details.

This shift is more than a technical improvement; it’s a cultural one. It democratizes testing, empowering anyone on the team, from developers to product managers, to contribute meaningful, resilient checks. The result is faster test creation, easier maintenance, and validations that truly align with what users care about: a working, seamless experience.

Harness describes this as "Intent-based Testing", where tests express what matters rather than how to check it, enabling developers and QA teams to focus on outcomes, not implementation details.

‍

Harness AI Test Automation Solving Traditional Testing Issues

‍

Traditional automation for end-to-end testing/UI testing often breaks when UIs change, leading to high maintenance overhead and flaky results. Playwright, Selenium, or Cypress scripts frequently fail because they depend on exact element paths or hardcoded data, which makes CI/CD pipelines brittle.

Industry statistics reveal that 70-80% of organizations still rely heavily on manual testing methods, creating significant bottlenecks in otherwise automated DevOps toolchains. Source

Harness AI Test Automation addresses these issues by leveraging AI-powered assertions that dynamically adapt to the live page or API context. Benefits include:

Reduced flakiness: Tests automatically handle UI changes without manual intervention
Lower maintenance costs: AI-generated selectors eliminate constant rewriting of selectors or brittle logic
Focus on business logic: Teams concentrate on verifying user-centric outcomes rather than technical details
Faster and No-Code test creation: Organizations report 10x faster test creation and the ability to cut test creation time by up to 90%

Organizations using AI Test Automation see up to 70% less maintenance effort and significant improvements in release velocity.

‍

How Harness AI Test Understands and Validates Your Intent

‍

Harness uses large language models (LLMs) optimized for testing contexts. The AI:

‍

Understands Your Intent: The AI parses your natural language assertion to grasp what you're trying to verify, for example, “Did the login succeed?" or “Is the button visible after submission?"
Analyzes Real Application Context: It evaluates the live state of your application by analyzing the HTML DOM and the rendered viewport. This provides the AI with a comprehensive understanding of the app's current behavior, structure, and visual presentation.
Maintains Context History: it keeps a record of previous steps and results, so the AI can use historical context when validating new assertions.
Learns from Past Runs: Outputs from prior test executions are stored and referenced, allowing future assertions to become more accurate and context-aware over time.
Provides Detailed Reasoning: Instead of just marking a test as “pass” or “fail,” the AI explains why, offering insights backed by both visual and structural evidence.

Together, these layers of intelligence make Harness AI Assertions not just smarter but contextually aware, giving you a more human-like and reliable testing experience every time you run your pipeline.

This context-aware approach identifies subtle bugs that are often missed by traditional tests and reduces the risks associated with AI “hallucinations.” Hybrid verification techniques cross-check outputs against real-time data, ensuring reliability.

For example, when testing a dynamic transaction table, an assertion like “Verify the latest transaction is a deposit over $500” will succeed even if the table order changes or new rows are added. Harness adapts automatically without requiring code changes
‍Harness Blog on AI Test Automation.

‍

Crucially, we are not asking the AI to generate code (although for some math questions it might) and then never consult it again; we actually ask the AI this question with the context of the webpage every time you run the test.

‍

Successful or not, the assertion will also give you back reasoning as to why it is true:

‍

‍

How Teams Use Harness AI Assertions

‍

Organizations across fintech, SaaS, and e-commerce are using Harness AI to simplify complex testing scenarios:

Financial services: Validating transaction tables and workflows with natural language assertions.
SaaS platforms: Checking onboarding flows and dynamic permission rules.
E-commerce: Confirming discount logic and inventory updates dynamically.
Healthcare: Transforming test creation from days to minutes

Even less-technical users can author and maintain robust tests. Auto-suggested assertions and natural language prompts accelerate collaboration across QA, developers, and product teams.

‍

‍

‍

You can also perform assertions based on parameters.

‍

An early adopter reported that after integrating Harness AI Assertions, release verification time dropped by more than 50%, freeing QA teams to focus on higher-value work. DevOpsDigest coverage

‍

Transforming QA with Harness AI: Faster, Smarter, Reliable

‍

Harness AI Test Automation empowers teams to move faster with confidence. Key benefits include:

Faster test creation: Write robust assertions in minutes rather than hours.
Reduced test maintenance: Fewer broken scripts and less manual debugging.
Improved collaboration: Align developers, testers, and product managers around shared intent.
Future-ready QA: Supports modern DevOps practices and continuous delivery pipelines.

Harness AI Test Automation turns traditional QA challenges into opportunities for smarter, more reliable automation, enabling organizations to release software faster while maintaining high quality.

Harness AI is to test what intelligent assistants are to coding: it allows humans to focus on strategy, intent, and value, while the AI handles repetitive validation (Harness AI Test Automation).

‍

Harness AI Test Automation represents a paradigm shift in testing. By combining intent-driven natural language assertions, AI-powered context awareness, and self-adapting validation, it empowers teams to deliver reliable software faster and with less friction.

‍

If you are excited about and want to simplify maintenance while improving test reliability, contact us to learn more about how intent-driven, natural-language assertions can transform your testing experience.

‍

DB Performance Testing with Harness FME banner

Engineering Blog

DB Performance Testing with Harness FME

Explore how to effectively conduct DB performance testing using Harness FME, comparing popular databases like MariaDB and PostgreSQL through feature flags. Gain insights on optimizing database integrity and performance to enhance your web applications.

Joshua Klein

December 31, 2024

Time to read

DB Performance Testing with Harness FME

Databases have been crucial to web applications since their beginning, serving as the core storage for all functional aspects. They manage user identities, profiles, activities, and application-specific data, acting as the authoritative source of truth. Without databases, the interconnected information driving functionality and personalized experiences would not exist. Their integrity, performance, and scalability are vital for application success, and their strategic importance grows with increasing data complexity. In this article we are going to show you how you can leverage feature flags to compare different databases.

Let’s say you want to test and compare two different databases against one another. A common use case could be to compare the performance of two of the most popular open source databases. MariaDB and PostgreSQL.

MariaDB and PostgreSQL logos

Let’s think about how we want to do this. We want to compare the experience of our users with these different database. In this example we will be doing a 50/50 experiment. In a production environment doing real testing in all likelihood you already use one database and would use a very small percentage based rollout to the other one, such as a 90/10 (or even 95/5) to reduce the blast radius of potential issues.

To do this experiment, first, let’s make a Harness FME feature flag that distributes users 50/50 between MariaDB and PostgreSQL

Now for this experiment we need to have a reasonable amount of sample data in the db. In this sample experiment we will actually just load the same data into both databases. In production you’d want to build something like a read replica using a CDC (change data capture) tool so that your experimental database matches with your production data

Our code will generate 100,000 rows of this data table and load it into both before the experiment. This is not too big to cause issues with db query speed but big enough to see if some kind of change between database technologies. This table also has three different data types — text (varchar), numbers, and timestamps.

‍‍Now let’s make a basic app that simulates making our queries. Using Python we will make an app that executes queries from a list and displays the result.

Below you can see the basic architecture of our design. We will run MariaDB and Postgres on Docker and the application code will connect to both, using the Harness FME feature flag to determine which one to use for the request.

The sample queries we used can be seen below. We are using 5 queries with a variety of SQL keywords. We include joins, limits, ordering, functions, and grouping.

We use the Harness FME SDK to do the decisioning here for our user id values. It will determine if the incoming user experiences the Postgres or MariaDB treatment using the get_treatment method of the SDK based upon the rules we defined in the Harness FME console above.

Afterwards within the application we will run the query and then track the query_executionevent using the SDK’s track method.

See below for some key parts of our Python based app.

This code will initialize our Split (Harness FME) client for the SDK.

We will generate a sample user ID, just with an integer from 1–10,000

Now we need to get whether our user will be using Postgres or MariaDB. We also do some defensive programming here to ensure that we have a default if it’s not either postgres or mariadb

Now let’s run the query and track the query_executionevent. From the app you can select the query you want to run, or if you don’t it’ll just run one of the five sample queries at random.

The db_manager class handles maintaining the connections to the databases as well as tracking the execution time for the query. Here we can see it using Python’s time to track how long the query took. The object that the db_manager returns includes this value

Tracking the event allows us to see the impact of which database was faster for our users. The signature for the Harness FME SDK’s track method includes both a value and properties. In this case we supply the query execution time as the value and the actual query that ran as a property of the event that can be used later on for filtering and , as we will see later, dimensional analysis.

You can see a screenshot of what the app looks like below. There’s a simple bootstrap themed frontend that does the display here.

app screenshot

The last step here is that we need to build a metric to do the comparison.

Here we built a metric called db_performance_comparison . In this metric we set up our desired impact — we want the query time to decrease. Our traffic type is of user.

Metric configuration

One of the most important questions is what we will select for the Measure as option. Here we have a few options, as can be seen below

Measure as options

We want to compare across users, and are interested in faster average query execution times, so we select Average of event values per user. Count, sum, ratio, and percent don’t make sense here.

Lastly, we are measuring the query_execution event.

We added this metric as a key metric for our db_performance_comparison feature flag.

Selection of our metric as a key metric

One additional thing we will want to do is set up dimensional analysis, like we mentioned above. Dimensional analysis will let us drill down into the individual queries to see which one(s) were more or less performant on each database. We can have up to 20 values in here. If we’ve already been sending events they can simply be selected as we keep track of them internally — otherwise, we will input our queries here.

selection of values for dimensional analysis

Now that we have our dimensions, our metric, and our application set to use our feature flag, we can now send traffic to the application.

For this example, I’ve created a load testing script that uses Selenium to load up my application. This will send enough traffic so that I’ll be able to get significance on my db_performance_comparison metric.

I got some pretty interesting results, if we look at the metrics impact screen we can see that Postgres resulted in a 84% drop in query time.

Even more, if we drill down to the dimensional analysis for the metric, we can see which queries were faster and which were actually slower using Postgres.

So some queries were faster and some were slower, but the faster queries were MUCH faster. This allows you to pinpoint the performance you would get by changing database engines.

You can also see the statistics in a table below — seems like the query with the most significant speedup was one that used grouping and limits.

However, the query that used a join was much slower in Postgres — you can see it’s the query that starts with SELECT a.i... , since we are doing a self-join the table alias is a. Also the query that uses EXTRACT (an SQL date function) is nearly 56% slower as well.

Conclusion

In summary, running experiments on backend infrastructure like databases using Harness FME can yield significant insights and performance improvements. As demonstrated, testing MariaDB against PostgreSQL revealed an 84% drop in query time with Postgres. Furthermore, dimensional analysis allowed us to identify specific queries that benefited the most, specifically those involving grouping and limits, and which queries were slower. This level of detailed performance data enables you to make informed decisions about your database engine and infrastructure, leading to optimization, efficiency, and ultimately, better user experience. Harness FME provides a robust platform for conducting such experiments and extracting actionable insights. For example — if we had an application that used a lot of join based queries or used SQL date functions like EXTRACT it may end up showing that MariaDB would be faster than Postgres and it wouldn’t make sense to consider a migration to it.

The full code for our experiment lives here: https://github.com/Split-Community/DB-Speed-Test

AI Agents vs Real-World Web Tasks: Harness Leads the Way in Enterprise Test Automation banner

Engineering Blog

AI Agents vs Real-World Web Tasks: Harness Leads the Way in Enterprise Test Automation

This article explores the capabilities of AI agents in executing real-world enterprise web tasks, focusing on their performance in test automation for complex banking and business applications. It provides valuable benchmarks and insights for engineering teams looking to enhance their automation st…

Ben Markines

December 31, 2024

Time to read

AI Agents vs Real-World Web Tasks: Harness Leads the Way in Enterprise Test Automation

Written by Deba Chatterjee, Gurashish Brar, Shubham Agarwal, and Surya Vemuri

‍

Can an AI agent test your enterprise banking workflow without human help? We found out. AI-powered test automation will be the de facto method for engineering teams to validate applications. Following our previous work exploring AI operations on the web and test automation capabilities, we expand our evaluation to include agents from the leading model providers to execute web tasks. In this latest benchmark, we evaluate how well top AI agents, including OpenAI Operator and Anthropic Computer Use, perform real-world enterprise scenarios. From banking applications to audit trail log navigation, we tested 22 tasks inspired by our customers and users.

Building on Previous Research

Our journey began with introducing a framework to benchmark AI-powered web automation solutions. We followed up with a direct comparison between our AI Test Automation and browser-use. This latest evaluation extends our research by incorporating additional enterprise-focused tasks inspired by the demands of today’s B2B applications.

The B2B Challenge

Business applications present unique challenges for agents performing tasks through web browser interactions. They feature complex workflows, specialized interfaces, and strict security requirements. Testing these applications demands precision, adaptability, and repeatability — the ability to navigate intricate UIs while maintaining consistent results across test runs.

To properly evaluate each agent, we expanded our original test suite with three additional tasks:

A banking application workflow requiring precise transaction handling, i.e., deposit of funds into a checking account
Navigation of a business application to view audit logs filtered by date
Interacting with a messaging application and validating the conversation in the history

These additions brought the total test suite to 22 distinct tasks varying in complexity and domain specificity.

Comprehensive Evaluation Results

User tasks and Agent results

The four solutions performed very differently, especially on complex tasks. Our AI Test Automation led with an 86% success rate, followed by browser-use at 64%, while OpenAI Operator and Anthropic Computer Use achieved 45% and 41% success rates, respectively.

The performance varies as tasks interact with complex artifacts such as calendars, information-rich tables, and chat interfaces.

Additional Web Automation Tasks

As in previous research, each agent executed their tasks on popular browsers, i.e., Firefox and Chrome. Also, even though OpenAI Operator required some user interaction, no additional manual help or intervention was provided outside the evaluation task.

The first additional task involves banking. The instructions include logging into a demo banking application, depositing $350 into a checking account, and verifying the transaction. Each solution must navigate the site without prior knowledge of the interface.

Our AI Test Automation completed the workflow, correctly selecting the family checking account and verifying that the $350 deposit appeared in the transaction history. Browser-use struggled with account selection and failed to complete the deposit action. Both Anthropic Computer Use and OpenAI Operator encountered login issues. Neither solution progressed past the initial authentication step.

Finding audit trail records in a table full of data is a common enterprise requirement. We challenged each solution to navigate Harness’s Audit Trail interface to locate two-day-old entries. The AI Test Automation solution navigated to the Audit Logs and paged through the table to identify two-day-old entries. Browser-use reached the audit log UI but failed to navigate, i.e., paginate to the requested records. Anthropic Computer Use did not scroll sufficiently to find the Audit Trail tile. The default browser resolution is a limiting factor with Anthropic Computer Use. The OpenAI Operator found the two-day-old audit logs.

This task demonstrates that handling information-rich tables remains challenging for browser automation tools.

Messaging Application Interaction

The third additional task involves a messaging application. The intent is to initiate a conversation with a bot and verify the conversation in a history table. This task incorporates browser interaction and verification logic.

The AI Test Automation solution completed the chat interaction and correctly verified the conversation’s presence in the history. Browser-use also completed this task. Anthropic Computer Use, on the other hand, is unable to start a conversation. OpenAI Operator initiates the conversation but never sends a message. As a result, a new conversation does not appear in the history.

This task reveals varying levels of sophistication in executing multi-step workflows with validation.

What Makes Solutions Perform Differently?

Several factors contribute to the performance differences observed:

Specialized Architecture: Harness AI Test Automation leverages multiple agents designed for software testing use cases. Each agent has varying levels of responsibility, from planning to handling special components like calendars and data-intensive tables.

Enterprise Focus: Harness AI Test Automation is designed with enterprise use cases in mind. There are certain features to take into account from the enterprise. A sample of these features includes:

security
repeatability for CI/CD integration
precision
ability to interact with an API
uncommon interfaces that are not generally accessible via web crawling, hence not available for training

Task Complexity: Browser-use, Anthropic Computer Use, and OpenAI Operator execute many tasks. But as complexity increases, the performance gap widens significantly.

Why Harness Outperforms

Custom agents for calendars, rich tables
API-driven validation where UI alone is insufficient
Secure handling of login and secrets

Conclusion

Our evaluation demonstrates that while all four solutions handle basic web tasks, the performance diverges when faced with more complex tasks and web UI elements. In such a fast-moving environment, we will continue to evolve our solution to execute more use cases. We will stay committed to tracking performance across emerging solutions and sharing insights with the developer community.

At Harness, we continue to enhance our solution to meet enterprise challenges. Promising enhancements to the product include self-diagnosis and tighter CI/CD integrations. Intent-based software testing is easier to write, more adaptable to updates, and easier to maintain than classic solutions. We continue to enhance our AI Test Automation solution to address the unique challenges of enterprise testing, empowering development teams to deliver high-quality software confidently. After all, we’re obsessed with empowering developers to do what they love: ship great software.

AI-Powered Resilience Testing with Harness MCP Server and Windsurf

Transform chaos engineering with AI! Use Harness MCP Server and Windsurf to run resilience tests through natural language - no complex setup required.

Ashutosh Bhadauriya

September 15, 2025

Time to read

The complexity of modern distributed systems demands proactive resilience testing, yet the old-school chaos engineering often presents a steep learning curve that can slow adoption across teams. What if you could perform chaos experiments using simple, natural language conversations directly within your development environment?

The integration of Harness Chaos Engineering with Windsurf through the Model Context Protocol (MCP) makes this vision a reality. This powerful combination enables DevOps, QA, and SRE teams to discover, execute, and analyze chaos experiments without deep vendor-specific knowledge, accelerating your organization's journey toward building a resilience testing culture.

Simplifying Chaos Engineering

Chaos engineering has proven its value in identifying system weaknesses before they impact production. However, traditional implementations face common challenges:

Technical Complexity: Setting up experiments requires deep understanding of fault injection mechanisms, blast radius calculations, and monitoring configurations.

Learning Curve: Teams need extensive training on vendor-specific tools and chaos engineering principles before becoming productive.

Context Switching: Engineers constantly move between documentation, experiment configuration interfaces, and result analysis tools.

Skill Scaling: Organizations struggle to democratize chaos engineering beyond specialized reliability teams.

The Harness MCP integration changes this landscape by bringing chaos engineering capabilities directly into your AI-powered development workflow.

Understanding Harness Chaos Engineering MCP Tools

The Harness Chaos Engineering MCP server provides six specialized tools that cover the complete chaos engineering lifecycle:

Core Experiment Tools

chaos_experiments_list: Discover all available chaos experiments in your project. Perfect for understanding your resilience testing capabilities and finding experiments relevant to specific services.

chaos_experiment_describe: Get details about any experiment, including its purpose, target infrastructure, expected impact, and success criteria.

chaos_experiment_run: Execute chaos experiments with intelligent parameter detection and automatic configuration, removing the complexity of manual setup.

chaos_experiment_run_result: Retrieve detailed results including resilience scores, performance impact analysis, and actionable recommendations for improvement.

Advanced Monitoring Tools

chaos_probes_list: Discover all available monitoring probes that validate system health during experiments, giving you visibility into your monitoring capabilities.

chaos_probe_describe: Get detailed information about specific probes, including their validation criteria, monitoring setup, and configuration parameters.

Setting Up Harness MCP Server with Windsurf

Prerequisites

Before beginning the setup, ensure you have:

Windsurf IDE installed
Harness Platform access with Chaos Engineering enabled
Harness API key with appropriate permissions
Go 1.23+ (to build from source)

Step 1: Build the Harness MCP Server Binary

You have multiple installation options. Choose the one that best fits your environment:

Building from Source

For advanced users who prefer building from source:

Clone the Repository:

git clone https://github.com/harness/mcp-server
cd mcp-server

‍

Build the Binary:

go build -o cmd/harness-mcp-server/harness-mcp-server ./cmd/harness-mcp-server

Step 2: Configure the Harness MCP Server in Windsurf

Navigate to your Windsurf Settings, click on Cascade, then Manage MCPs.

Click on View raw config to open your mcp_config.json file

‍

Add the below configuration to the file

{
  "mcpServers": {
    "harness": {
      "command": "/path/to/harness-mcp-server",
      "args": ["stdio"],
      "env": {
        "HARNESS_API_KEY": "your-api-key-here",
        "HARNESS_DEFAULT_ORG_ID": "your-org-id",
        "HARNESS_DEFAULT_PROJECT_ID": "your-project-id",
        "HARNESS_BASE_URL": "https://app.harness.io"
      }
    }
  }
}

‍

Step 3: Add the Path of your Binary and Harness Credentials

Gather the following information, add it to the placeholders and save the mcp_config.json file.

Command: Path to your built harness-mcp-server binary
API Key: Generate from your Harness account settings (Profile > My API Keys)
Organization ID: Found in your Harness URL or organization settings
Project ID: The project containing your chaos experiments
Base URL: Your Harness instance URL (typically https://app.harness.io)

Step 4: Verify Installation

Restart Windsurf: Close and reopen Windsurf to load the new configuration
Go back to Mange MCPs, you should see a list of tools available

Test Connection: Try a simple prompt like:

"List all chaos experiments available in my project"

‍

If successful, you should see chaos-related tools with the "chaos" prefix and receive a response with your experiment list.

AI-Powered Chaos Engineering in Action

With your setup complete, let's explore how to leverage these tools effectively through natural language interactions.

‍

Discovery and Learning Phase

Service-Specific Exploration:

"I am interested in catalog service resilience. Can you tell me what chaos experiments are available?"

‍

Expected Output: Filtered list of experiments targeting your catalog service, categorized by fault type (network, compute, storage).

‍

Deep-Dive Analysis:

"Describe briefly what the pod deletion experiment does and what services it targets"

‍

Expected Output: Technical details about the experiment, including fault injection mechanism, expected impact, target selection criteria, and success metrics.

‍

Understanding Resilience Metrics:

"Describe the resilience score calculation details for the network latency experiment"

‍

Expected Output: Detailed explanation of scoring methodology, performance thresholds, and interpretation guidelines.

‍

Experiment Execution Phase

Targeted Experiment Execution:

"Can you run the pod deletion experiment on my payment service?"

‍

Expected Output: Automatic parameter detection, experiment configuration, execution initiation, and real-time monitoring setup.

Structured Overview Creation:

"Can you list the network chaos experiments and the corresponding services targeted? Tabulate if possible."

‍

Expected Output: Well-organized table showing experiment names, target services, fault types, and current status.

Monitoring Probe Discovery:

"Show me all available chaos probes and describe how they work"

‍

Expected Output: Complete catalog of available probes with their monitoring capabilities, validation criteria, and configuration details.

‍

Analysis and Reporting Phase

Result Interpretation:

"Summarise the result of the database connection timeout experiment"

‍

Expected Output: Comprehensive analysis including performance impact, resilience score, business implications, and specific recommendations for improvement.

Probe Configuration Details:

"Describe the HTTP probe used in the catalog service experiment"

‍

Expected Output: Detailed probe configuration, validation criteria, success/failure thresholds, and monitoring setup instructions.

Comprehensive Resilience Assessment:

"Scan the experiments that were run against the payment service in the last week and summarise the resilience posture for me"

‍

Expected Output: Executive-level resilience report with trend analysis, critical findings, and actionable improvement recommendations.

The Road Ahead

The convergence of AI and chaos engineering represents more than a technological advancement, it's a fundamental shift toward more accessible, and intelligent resilience testing. By embracing this approach with Harness and Windsurf, you're not just testing your systems' resilience, you're building the foundation for reliable, battle-tested applications that can withstand the unexpected challenges of production environments.

Start your AI-powered chaos engineering journey today and discover how natural language can transform the way your organization approaches system reliability.

‍

Technical

Resilience Testing using Harness

Resilience testing is made easy to onboard and scalable with Harness Chaos Engineering. Harness provides all the required capabilities to practice resilience testing in SDLC.

Uma Mukkara

September 11, 2025

Time to read

‍

In today's fast-paced digital landscape, ensuring the reliability and resilience of your systems is more critical than ever. Downtime can lead to significant business losses, eroded customer trust, and operational headaches. That's where Harness Chaos Engineering comes in—a powerful module within the Harness platform designed to help teams proactively test and strengthen their infrastructure. In this blog post, we'll dive into what Harness Chaos Engineering is, how it works, its key features, and how you can leverage it to build more robust systems.

What is Harness Chaos Engineering?

Harness Chaos Engineering is a dedicated module on the Harness platform that enables efficient resilience testing. It's trusted by a wide range of teams, including developers, QA engineers, performance testing specialists, and Site Reliability Engineers (SREs). By simulating real-world failures in a controlled environment, it helps uncover hidden weaknesses in your systems and identifies potential risks that could impact your business.

At its core, resilience testing involves running chaos experiments. These experiments inject faults deliberately and measure how well your system holds up. Harness uses resilience probes to verify the expected state of the system during these tests, culminating in a resilience score ranging from 0 to 100. This score quantifies how effectively your system withstands injected failures.

But Harness goes beyond just resilence scoring— it also provides resilience test coverage metrics. Together, these form what's known as your system's resilience posture. This actionable insight empowers businesses to prioritize improvements and enhance overall service reliability.

Comprehensive Capabilities for End-to-End Resilience Testing

Harness Chaos Engineering is equipped with everything you need for thorough, end-to-end resilience testing. Here's a breakdown of its standout features:

Extensive Chaos Fault Library: Access over 200 out-of-the-box chaos faults through the enterprise Chaos Hub. These cover a broad spectrum of environments, including major cloud platforms, Linux and Windows systems, Kubernetes, Pivotal Cloud Foundry (PCF), and application runtimes like JVM.
Automated Resilience Probes: Measure resilience scores effortlessly with integrations to popular monitoring tools and services. Connect seamlessly with Kubernetes, Prometheus, Dynatrace, Datadog, New Relic, Splunk, and various cloud provider monitoring solutions to automate assessments.
ChaosGuard for Governance: Maintain control over chaos experiments with robust governance features. Define policies on who can run specific types of experiments, on which systems, and during designated time windows—ensuring safe and compliant testing.
GameDay Portal: SREs can easily orchestrate GameDays in production environments using the built-in portal. This facilitates collaborative, real-time exercises to prepare teams for actual incidents.
AI Reliability Agent: Harness incorporates AI to supercharge your chaos engineering efforts. Get intelligent recommendations for creating new experiments, optimizing existing ones, and troubleshooting probe failures.

Once you've created your chaos experiments and organized them into custom Chaos Hubs, the possibilities are endless.

Real-World Use Cases for Harness Chaos Experiments

Harness Chaos Engineering isn't just theoretical—it's built for practical application across your workflows. Here are some key use cases:

Integration with Deployment Pipelines: Embed chaos experiments directly into tools like Harness Continuous Delivery (CD), GitHub Actions, Jenkins, or GitLab. This ensures resilience is validated as part of your CI/CD process.
Combining with Load Testing: Run chaos alongside performance tools such as LoadRunner, Gatling, Locust, or JMeter to simulate high-stress scenarios and measure true system behavior under pressure.
GameDays and Production Testing: Use the GameDay portal to conduct structured exercises in live environments, fostering a culture of preparedness.
Disaster Recovery (DR) Testing: Validate your DR strategies by injecting faults that mimic outages, ensuring your failover mechanisms work as intended.

These integrations make it simple to incorporate chaos engineering into your existing processes, turning potential vulnerabilities into opportunities for improvement.

Easy Onboarding and Scalability

Getting started with Harness Chaos Engineering is straightforward, and it's designed to scale with your needs. Key features that support seamless adoption and growth include:

Centralized Chaos Execution Plane (Agentless Chaos): Manage experiments from a single, agentless control plane, simplifying operations across distributed environments.
Templates and Terraform Support: Reuse proven experiment templates and automate infrastructure with Terraform for faster setup.
Platform RBACs and Custom Chaos Hubs: Fine-tune access controls with Role-Based Access Control (RBAC) and create tailored Chaos Hubs to organize experiments by team or project.

Whether you're a small team just dipping your toes into chaos engineering or a large enterprise scaling across multiple clouds, Harness makes it efficient and manageable.

Deployment Options: SaaS and On-Premise

Harness Chaos Engineering is flexible in how you deploy it. The SaaS version offers a free plan that includes all core capabilities—even AI-driven features—to help you kickstart your resilience testing journey without upfront costs. For organizations preferring more control, an On-Premise option is available, ensuring compliance with internal security and data policies.

Conclusion: Build Resilient Systems with Harness

In an era where system failures can have cascading effects, Harness Chaos Engineering empowers you to test, measure, and improve resilience proactively. By discovering weaknesses early, you not only mitigate risks but also boost confidence in your infrastructure. Whether through automated probes, AI insights, or integrated workflows, Harness provides the tools to achieve a superior resilience posture.

Ready to get started? Explore the free SaaS plan today and transform how your teams approach reliability. For more details, visit the Harness platform or check out our documentation. Let's engineer chaos—for a more reliable tomorrow!

Learn How to Build a Chaos Lab for Real-World Resilience Testing

Checkout Harness Resilience Testing

Company News

Technical

Harness launches MCP tools to enhance its AI powered Chaos Engineering Capabilities

Chaos Tools on Harness MCP Server provide the opportunity to do the resilience testing using AI Agents and AI Tools with natural language prompts. Users can discover the resilience test capabilities, learn more about them and run the tests to measure the resilience data of the business critical applications or services. Chaos Tools lay the foundation for AI Powered Chaos Engineering using Harness in the Enterprise's journey towards building a strong resilience testing culture.

Uma Mukkara

July 10, 2025

Time to read

The practice of Chaos Engineering helps in doing resilience testing to get the measurable data for resilience of services or discover the weaknesses in them. Either way, the users will have actionable resilience data around their application services to check compliance and take proactive actions for improvements. This practice is on the rise in recent years because of heavy digital modernisation and move to cloud native systems. A successful adoption of this practice in an Enterprise requires consistent skilling of developers around chaos experimentation and resilience management, which is a challenge in itself.

‍

The uprise in the availability of AI LLMs and associated technology advancements such as AI Agents and MCP Tools make it possible to significantly reduce the skills required to do efficient resilience testing. Users will be able to do the resilience testing successfully with very little knowledge of the vendor tools and the actual chaos experiments details. The MCP tools will do the job of converting simple user prompts in the natural language to the required product API and provide the responses, which then are interpreted nicely by the LLMs.

Harness has published it's MCP server in open source here and the documentation is found here. In this article we are announcing the MCP tools for Chaos Engineering on Harness.

Introducing Chaos MCP tools:

The initial set of chaos tools that is released will help in discovering, understanding and planning the orchestration of chaos experiments for the end users. Following are the tools

chaos_experiments_list: List all the chaos experiments for a specific project in the account.
chaos_experiment_describe: Get details of a specific chaos experiment.
chaos_experiment_run: Run a specific chaos experiment.
chaos_experiment_run_result: Get the result of a specific chaos experiment run.

These MCP tools will help the user to start and make progress on resilience testing using simple natural language prompts.

Following are some of the prompts that user can effectively use with the above tools:

I am interested in catalog service resilience. Can you tell me what chaos experiments are available?
Describe briefly what a particular chaos experiment does?
Describe the resilience score calculation details of a specific chaos experiment?
Can you run a specific experiment for me?
Can you list the network chaos experiments and the corresponding services targeted? Tabulate if possible.
Summarise the result of a particular chaos experiment
Scan the experiments that were run against particular service in the last one week and summarise the resilience posture for me.

‍

An example report would look like the following with Claude Desktop

‍

How to setup the Harness MCP Server?

Harness MCP server can be setup in various ways. The installation setup of MCP server is available on the documentation site. Chaos tools are part of the Harness MCP server. Follow the instructions and setup the harness-mcp-server on your AI-editors or local AI desktop application like Claude Desktop.

How do I get started with resilience testing using Harness MCP tools?

Once MCP server is setup, provide simple natural language prompts to

First, discover the list of chaos experiment capabilities. You can even describe the resilience test scenario that you have in mind and check if your Harness project has suitable chaos experiments
Then, understand what each experiment does in detail.
Then, run chaos experiment of your choice and observe the resilience reports
Generate resilience summary or brief reports or detailed reports of a particular service or a set of service
Tabulate the results of resilience tests

Video Tutorial:

In the below video you can find details of how to configure Harness MCP server on Claude Desktop and do the resilience testing using simple natural language prompts.

Chaos Engineering with Claude Desktop">

Important Links:

New to Harness Chaos Engineering ? Signup here

Trying to find the documentation for Chaos Engineering ? Go here: Chaos Engineering

Want to build the Harness MCP server here ? Go here: GitHub

Want to know how to setup Harness MCP servers with Harness API Keys ? Go here: Manage API keys

Technical

Harness AI Test Automation: End-to-End, AI-Powered Testing for Faster, Smarter DevOps

Make testing easier to create and maintain with GenAI

Chinmay Gaikwad

Debaditya Chatterjee

June 25, 2025

Time to read

We’re excited to announce the General Availability (GA) of Harness AI Test Automation – the industry’s first truly AI-native, end-to-end test automation solution, that's fully integrated across the entire CI/CD pipeline, built to meet the speed, scale, and resilience demanded by modern DevOps. With AI Test Automation, Harness is transforming the software delivery landscape by eliminating the bottlenecks of manual and brittle testing and empowering teams to deliver quality software faster than ever before.

This launch comes at a critical time. Organizations spend billions of dollars on software quality assurance each year, yet most still face the same core challenges of fragility, speed, and pace. Testing remains a significant obstacle in an otherwise automated DevOps toolchain, with around 70-80% of organizations still reliant on manual testing methods, thus slowing down delivery and introducing risks.

Harness AI Test Automation changes that paradigm, replacing outdated test frameworks with a seamless, AI-powered solution that delivers smarter, faster, and more resilient testing across the SDLC. With this offering, Harness has the industry’s first fully automated software delivery platform, where our customers can code, build, test, and deploy applications seamlessly.

What Makes Harness AI Test Automation Different?

Harness AI Test Automation has a lot of unique capabilities that make high-quality end-to-end testing effortless.

"Traditional testing methods struggled to keep up, it is too manual, fragile, and slow. So we’ve reimagined testing with AI. Intent-based testing brings greater intelligence and adaptability to automation, and it seamlessly integrates into your delivery pipeline." - Sushil Kumar, Head of Business, AI Test Automation

‍

Some of the standout benefits of AI Test Automation are:

Create High-Quality Tests in Minutes: No Code Required

AI Test Automation streamlines the creation of high-quality tests:

Live Test Authoring: Record tests automatically by simply interacting with your web app. No scripting or coding expertise needed.
Intent-Based Testing: Author test cases and assertions using natural language prompts. Just type what you want to verify—like “Did the login succeed?”—and let AI handle the rest.
AI Auto Assertions: Harness AI suggests and auto-generates assertions after each step, pre-verified for seamless functionality. This ensures robust coverage and saves hours of manual work.
Visual Testing with AI: Validate complex UI elements, including dynamic and canvas-based components, using human-like visual testing. Use natural language to verify visual states, eliminating the need for fragile scripts.

Self-Healing, Adaptive Test Maintenance

AI Test Automation eliminates manual test maintenance while improving test reliability:

AI-Generated Selectors: Tests adapt automatically to UI and workflow changes, dramatically reducing test flakiness and maintenance by up to 70%.
Run Stable Tests Across Environments: Smart Selector technology and automatic URL translation ensure tests work everywhere, with no modifications required.

AI Test Automation adapts to the UI changes with its smart selectors

Intelligent, Scalable Test Execution

AI Test Automation boosts efficiency and streamlines testing workflows:

Intelligent Retries: AI Test Automation distinguishes between transient issues and real bugs, reducing false positives and debugging time.
Parallel Execution: Effortlessly scale to thousands of tests running in parallel, optimizing validation as your application grows.
Data-Driven Testing: Parameterized tests dynamically handle data at runtime, enabling flexible and efficient workflows.

Seamless Integration and Enterprise-Grade Security

AI Test Automation enables resilient end-to-end automation:

Unified with Harness CI/CD: AI Test Automation is natively integrated with Harness pipelines, enabling true end-to-end automation from build to deploy.
SOC 2 Type 2 Compliance: Enterprise-ready security and compliance, so you can trust your test automation at scale.
No-Code Simplicity + Advanced Flexibility: Add custom JavaScript or Puppeteer scripts for complex scenarios — get the best of both worlds.

Real Results: Customer Success Stories

At Harness, we use what we build. We’ve adopted AI Test Automation across our own products, achieving 10x faster test creation and enabling visual testing with AI – all fully integrated into our CI/CD pipelines. Since AI can be nondeterministic, and testing AI workflows using traditional test automation tools can be hard, we also use AI Test Automation to test our internal AI workflows.

“With AI Test Automation, I just literally wrote out and wireframed all the test cases, and in a matter of 15–20 minutes, I was able to knock out one test. Using the templating functionality, we were able to come up from a suite of 0 to 55 tests in the span of 2 and a half weeks.” - Rohan Gupta, Principal Product Manager, Harness

‍

Our customers are seeing dramatic results, too. For example, using Harness AI Test Automation, Siemens Healthineers slashed QA bottlenecks and transformed test creation from days to minutes.

Wasimil, a hotel booking and management platform, has reduced its test maintenance time by 50%, allowing it to release twice as frequently as it used to before using Playwright.

"With AI Test Automation, we could ship features, not just bug fixes. We don't wanna spend 30 to 40% of our engineering resources on fixing bugs because we can't be proud to ship bug fixes to our customers. Right? And they expect features, not bug fixes."
- Tom, CTO, Wasimil

‍

Watch AI Test Automation in action

‍

The Future Is Now: Join the Testing Revolution

With AI Test Automation, Harness becomes the first platform to offer true, fully automated software delivery, from build → test → deploy, without manual gaps or toolchain silos.

Create robust, intent-based tests 10x faster using natural language
Slash maintenance by up to 70% with self-healing, AI-powered tests
Accelerate release cycles and improve developer experience
Achieve higher quality, lower costs, and reduced risk—at enterprise scale

‍

Be part of the revolution: start your AI-powered testing journey with Harness today.

Ready to see AI Test Automation in action? Contact us to get started!

Split Embraces OpenFeature

Split joins OpenFeature to help define a vendor-neutral standard for feature flagging, advancing progressive delivery across all industries.

Harness Team

June 1, 2022

Time to read

Driving Feature Flag Standardization with OpenFeature

Split is excited to announce participation in OpenFeature, an initiative led by Dynatrace and recently submitted to the Cloud Native Computing Foundation (CNCF) for consideration as a sandbox program.

As part of an effort to define a new open standard for feature flag management, this project brings together an industry consortium of top leaders. Together, we aim to provide a vendor-neutral approach to integrating with feature flagging and management solutions. By defining a standard API and SDK for feature flagging, OpenFeature is meant to reduce issues or friction commonly experienced today with the end goal of helping all development teams ramp reliable release cycles at scale and, ultimately, move towards a progressive delivery model.

At Split, we believe this effort is a strong signal that feature flagging is truly going “mainstream” and will be the standard best practice across all industries in the near future.

What Is Feature Flagging?

Feature flagging is a simple, yet powerful technique that can be used for a range of purposes to improve the entire software development lifecycle. Other common terms include things like “feature toggle” or “feature gate.” Despite sometimes going by different names, the basic concept underlying feature flags is the same:

A feature flag is a mechanism that allows you to decouple a feature release from a deployment and choose between different code paths in your system at runtime.

Because feature flags enable software development and delivery teams to turn functionality on and off at runtime without deploying new code, feature management has become a mission-critical component for delivering cloud-native applications. In fact, feature management supports a range of practices rooted in achieving continuous delivery, and it is especially key for progressive delivery’s goal of limiting blast radius by learning early.

Think about all the use cases. Feature flags allow you to run controlled rollouts, automate kill switches, a/b test in production, implement entitlements, manage large-scale architectural migrations, and more. More fundamentally, feature flags enable trunk-based development, which eliminates the need to maintain multiple long-lived feature branches within your source code, simplifying and accelerating release cycles.

Where Does Split Come in?

While feature flags alone are very powerful, organizations that use flagging at scale quickly learn that additional functionality is needed for a proper, long-term feature management approach. This requires functionality like a management interface, the ability to perform controlled rollouts, automated scheduling, permissions and audit trails, integration into analytics systems, and more. For companies who want to start feature flagging at scale, and eventually move towards a true progressive delivery model, this is where companies like Split come into the mix.

Split offers full support for progressive delivery. We provide sophisticated targeting for controlled rollouts but also flag-aware monitoring to protect your KPIs for every release, as well as feature-level experimentation to optimize for impact. Additionally, we invite you to learn more about our enterprise-readiness, API-first approach, and leading integration ecosystem.

So, Why Is OpenFeature Needed?

Feature flag tools, like Split, all use their proprietary SDKs with frameworks, definitions, and data/event types unique to their platform. There are differences across the feature management landscape in how we define, document, and integrate feature flags with 3rd party solutions, and with this, issues can arise.

For one, we all end up maintaining a library of feature flagging SDKs in various tech stacks. This can be quite a lot of effort, and that all is duplicated by each feature management solution. Additionally, while it is commonly accepted that feature management solutions are essential in modern software delivery, for some, these differences also make the barrier to entry seem too high. Rather, standardizing feature management will allow organizations to worry less about easy integration across their tech stack, so they can just get started using feature flags!

Ultimately, we see OpenFeature as an important opportunity to promote good software practices through developing a vendor-neutral approach and building greater feature flag awareness.

Introducing OpenFeature

Created to support a robust feature flag ecosystem using cloud-native technologies, OpenFeature is a collective effort across multiple vendors and verticals. The mission of OpenFeature is to improve the software development lifecycle, no matter the size of the project, by standardizing feature flagging for developers.

By defining a standard API and providing a common SDK, OpenFeature will provide a language-agnostic, vendor-neutral standard for feature flagging. This provides flexibility for organizations, and their application integrators, to choose the solutions that best fit their current requirements while avoiding code-level lock-in.

Feature management solutions, like Split, will implement “providers” which integrate into the OpenFeature SDK, allowing users to rely on a single, standard API for flag evaluation across every tech stack. Ultimately, the hope is that this standardization will provide the confidence for more development teams to get started with feature flagging.

Final Thoughts

“OpenFeature is a timely initiative to promote a standardized implementation of feature flags. Time and again we’ve seen companies reinventing the wheel and hand-rolling their feature flags. At Split, we believe that every feature should be behind a feature flag, and that feature flags are best when paired with data. OpenFeature support for Open Telemetry is a great step in the right direction,” Pato Echagüe, Split CTO and sitting member of the OpenFeature consortium.

We are confident in the power of feature flagging and know that the future of software delivery will be done progressively using feature management solutions, like Split. Our hope is that OpenFeature provides a win for both development teams as well as vendors, including feature management tools and 3rd party solutions across the tech stack. Most importantly, this initiative will continue to push forward the concept of feature flagging as a standard best practice for all modern software delivery.

To learn more about OpenFeature, we invite you to visit: https://openfeature.dev.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

The Split Feature Data Platform™ gives you the confidence to move fast without breaking things. Set up feature flags and safely deploy to production, controlling who sees which features and when. Connect every flag to contextual data, so you can know if your features are making things better or worse and act without hesitation. Effortlessly conduct feature experiments like A/B tests without slowing down. Whether you’re looking to increase your releases, to decrease your MTTR, or to ignite your dev team without burning them out–Split is both a feature management platform and partnership to revolutionize the way the work gets done. Switch on a free account today, schedule a demo, or contact us for further questions.

Technical

The benefits of streaming architecture for feature flags

Split introduces real-time streaming architecture for faster feature flag updates with improved performance and reduced latency.

Harness Team

January 26, 2021

Time to read

Delivering feature flags with lightning speed and reliability has always been one of our top priorities at Split. We’ve continuously improved our architecture as we’ve served more and more traffic over the past few years (We served half a trillion flags last month!). To support this growth, we use a stable and simple polling architecture to propagate all feature flag changes to our SDKs.

At the same time, we’ve maintained our focus on honoring one of our company values, “Every Customer”. We’ve been listening to customer feedback and weighing that feedback during each of our quarterly prioritization sessions. Over the course of those sessions, we’ve recognized that our ability to immediately propagate changes to SDKs was important for many customers so we decided to invest in a real-time streaming architecture.

Our Approach to Streaming Architecture

Early this year we began to work on our new streaming architecture that broadcasts feature flag changes immediately. We plan for this new architecture to become the new default as we fully roll it out in the next two months.

For this streaming architecture, we chose Server-Sent Events (SSE from now on) as the preferred mechanism. SSE allows a server to send data asynchronously to a client (or a server) once a connection is established. It works over the HTTPS transport layer, which is an advantage over other protocols as it offers a standard JavaScript client API named EventSource implemented in most modern browsers as part of the HTML5 standard.

While real-time streaming using SSE will be the default going forward, customers will still have the option to choose polling by setting the configuration on the SDK side.

Streaming Architecture Performance

Running a benchmark to measure latencies over the Internet is always tricky and controversial as there is a lot of variability in the networks. To that point, describing the testing scenario is a key component of such tests.

We created several testing scenarios which measured:

Latencies from the time in which a feature flag (split) change was made
The time the push notification arrived
The time until the last piece of the message payload was received

We then ran this test several times from different locations to see how latency varies from one place to another.

In all those scenarios, the push notifications arrived within a few hundred milliseconds and the full message containing all the feature flag changes were consistently under a second latency. This last measurement includes the time until the last byte of the payload arrives.

As we march toward the general availability of this functionality, we’ll continue to perform more of these benchmarks and from new locations so we can continue to tune the systems to achieve acceptable performance and latency. So far we are pleased with the results and we look forward to rolling it out to everyone soon.

Choosing when Streaming or Polling is Best for You

Both streaming and polling offer a reliable, highly performant platform to serve splits to your apps.

By default, we will move to a streaming mode because it offers:

Immediate propagation time when changes are made to flags.
Reduced network traffic, as the server will initiate the request when there is data to be sent (aside from small traffic being sent to keep the connection alive).
Native browser support to handle sophisticated use cases like reconnections when using SSE.

In case the SDK detects any issues with the streaming service, it will use polling as a fallback mechanism.

In some cases, a polling technique is preferable. Rather than react to a push message, in polling mode, the client asks the server for new data on a user-defined interval. The benefits of using a polling approach include:

Easier to scale, stateless, and less memory-demanding as each connection is treated as an independent request.
More tolerant of unreliable connectivity environments, such as mobile networks.
Avoids security concerns around keeping connections open through firewalls.

Streaming Architecture and an Exciting Future for Split

We are excited about the capabilities that this new streaming architecture approach to delivering feature flag changes will deliver. We’re rolling out the new streaming architecture in stages starting in early May. If you are interested in having early access to this functionality, contact your Split account manager or email support at support@split.io to be part of the beta.

To learn about other upcoming features and be the first to see all our content, we’d love to have you follow us on Twitter!

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Technical

Overcoming Experimentation Obstacles In B2B

Maximize B2B experimentation success by selecting the right traffic type and using tailored metrics to normalize data for statistically valid results.

Harness Team

May 26, 2023

Time to read

Determine the Optimal Traffic Type

Consider the advantages and disadvantages of employing a tenant (e.g., account-based) traffic type versus a conventional user traffic type for each experiment. Unless it is crucial to provide a consistent experience for all users within a specific account, opt for a user traffic type to facilitate experimentation and measurement. This will significantly increase your sample size, unlocking greater potential for insights and analysis.

Important to note: In Split, the traffic type for an experiment can be decided on a case-by-case basis, depending on the feature change, the test’s success metrics, and the sample size needed.

Even if using a tenant traffic type is the only logical choice for your experiment, there are strategies you can employ to increase the likelihood of a successful (i.e., statistically significant) test.

Make a Plan for Lower Sample Sizes

Utilize the 10 Tips for Running Experiments With Low Traffic guide. You can thank us later!

Normalize Data for Tenant (Account) Traffic Types

Split’s application ensures that a 50/50 experiment divides tenants according to that percentage utilizing its deterministic hashing algorithm and Sample Ratio Mismatch calculator, but doesn’t consider that some tenants may have more users than others.

This can result in an unbalanced user allocation across treatments, as shown below, using “Accounts” as the tenant type.

The following tips can be applied to normalize the data when users aren’t balanced across tenants:

Utilize a “percent of” metric type: Use this metric type when the event you want to measure only needs to occur once per tenant to be considered a success (e.g., percent of accounts that upgraded plans). Instead of leveling your users across treatments, it equalizes the data.
Utilize a “ratio of two events” metric type: Use this metric type when measuring total engagement volume with a feature. This lets you control the numerator and denominator to equalize the metric data.

A reminder: The numerator is set to the event you want to count (e.g., number of clicks to “download desktop app”). The denominator is set to an event that occurs leading up to the numerator event (e.g., number of impressions or screen views where the user is prompted to “download desktop app”). The denominator can also be a generic event that tracks the number of users who saw the treatment.

If you follow these steps, you should be able to overcome most obstacles when running a B2B experiment. And remember: Split offers the unique flexibility to run experiments based on the traffic type that suits your needs. Learn more here.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Switch on a free account today, schedule a demo to learn more, or contact us for further questions and support.

Technical

Serverless Applications Powered by Split Feature Flags

Learn how to integrate Split feature flags in serverless environments using AWS Lambda, Redis ElastiCache, and the Split Synchronizer.

Harness Team

September 27, 2017

Time to read

The concept of Serverless Computing(https://en.wikipedia.org/wiki/Serverless_computing), also called Functions as a Service (FaaS) is fast becoming a trend in software development. This blog post will highlight steps and best practices for integrating Split feature flags into a serverless environment.

A quick look into Serverless Architecture

Serverless architectures enable you to add custom logic to other provider services, or to break up your system (or just a part of it) into a set of event-driven stateless functions that will execute on a certain trigger, perform some processing, and act on the result — either sending it to the next function in the pipeline, or by returning it as result of a request, or by storing it in a database. One interesting use case for FaaS is image processing where there is a need to validate the data before storing it in a database, retrieving assets from an S3 bucket, etc.

Some advantages of this architecture include:

Lower costs: Pay only for what you run and eliminate paying for idle servers. With the pay-per-use model server costs will be proportional to the time required to execute only on requests made.
Low maintenance: The infrastructure provider takes care of everything required to run and scale the code on demand and with high availability, eliminating the need to pre-plan and pre-provision servers servers to host these functions.
Easier to deploy: Just upload new function code and configure a trigger to have it up and running.
Faster prototyping: Using third party API’s for authentication, social, tracking, etc. minimizes time spent, resulting in an up-and-running prototype within just minutes.

Some of the main providers for serverless architecture include, Amazon: AWS Lambda; Google: Cloud Functions; and Microsoft: Azure Functions. Regardless of which provider you may choose, you will still reap the benefits of feature flagging without real servers.

In this blog post, we’ll focus on AWS lambda with functions written in JavaScript running on Node.js. Additionally we’ll highlight one approach to interacting with Split feature flags on a serverless application. It’s worth noting that there are several ways in which one can interact with Split on a serverless application, but we will highlight just one of them in this post.

Externalizing state

If we are using Lambda functions in Amazon AWS, the best approach would be to use ElastiCache (Redis flavor) as an in-memory external data store, where we can store our feature rules that will be used by the Split SDKs running on Lambda functions to generate the feature flags.

One way to achieve this is to set up the Split Synchronizer, a background service created to synchronize Split information for multiple SDKs onto an external cache, Redis. To learn more about Split Synchronizer, check out our recent blog post.

On the other hand, the Split Node SDK has a built-in Redis integration that can be used to communicate with a Redis ElastiCache cluster. The diagram below illustrates the set up:

Step 1: Preparing the ElastiCache cluster

Start by going to the ElastiCache console and create a cluster within the same VPC that you’ll be running the Lambda functions from. Make sure to select Redis as the engine:

Step 2: Run Split Synchronizer as a Docker Container Using ECS

The next step would be to deploy the Split Synchronizer on ECS (in synchronizer mode) using the existing Split Synchronizer Docker image. Refer to this guide on how to deploy docker containers.

Now from the EC2 Container Service (ECS) console create an ECS cluster within the same VPC as before. As a next step create the task definition that will be used on the service by going to the Task Definitions page. This is where docker image repository will be specified, including any environment variables that will be required.

As images on Docker Hub are available by default, specify the organization/image:

And environment variables (specifics can be found on the Split Synchronizer docs):

Any Docker port mapping needed can be specified during the task creation.

At this point we have the EC2 cluster and we have our task. The next step is to create a service that uses this task — go to your new cluster and click “create” on the services tab. You need to at least select the task and the number of tasks running concurrently:

Finish with any custom configuration you may need, review and create the service. This will launch as many instances as specified. If there were no errors, the feature flags definitions provided by the Split service should already be in the external cache, and ready to be used by the SDKs integrated in the lambda functions that we’ll set up in the next section.

Step 3: Using Feature Flags on Lambda Functions

There are two things we need to know before we start:

The Lambda programming model uses a handler function, which Lambda will call when the function is triggered. It’s also the one that receives parameters from AWS:
- the event that triggered the function;
- the context;
- a callback function to return the results.
Lambda functions can be written directly on the AWS console as long as it doesn’t have any library dependencies. If extra dependencies are needed, a deployment package should be built, which is no more than a .zip file with the functions code, as well as the required dependencies. Since we’ll be integrating a Split SDK to provide feature flags, we will be adding extra dependencies, and as such will have to create a deployment package.

Our custom code

On the custom function, install the @splitsoftware/splitio (NPM(https://www.npmjs.com/package/@splitsoftware/splitio)) npm package and include the node_modules folder on the zip.

Step-by-step of an example function:

Go to the working directory of the project and install the @splitsoftware/splitio package.
Create an index.js file. Require the @splitsoftware/splitio package there.
Instantiate the SDK client on consumer mode making sure it points to the correct Redis cluster (we’ll use an env variable for this in the next step). Don’t set a prefix for storage configurations unless same prefix was used for the Synchronizer.
Write whatever code is required but export the function as handler.

One important thing to note — as async storage is used, async calls to the API will be received.

View the example code below:

Once the code has been written, it’s time to prepare the deployment package by creating a zip that includes index.js and the node_modules folder. Next, go to the Lambda console and select “create function”. On the blueprint selection page, select “Author from scratch” option and dd the trigger that will be used. It’s recommended not to enable it until you’re certain that the function works as expected.

Upload the code

On the Lambda function code section, select the “Upload a .ZIP file” option. It can also be uploaded to S3 and the URL specified. Any environment variables required on Lambda can specified here (for example the one pointing to Redis ElastiCache needed in the previous step):

Set up your handler function in the section called “Lambda function handler and role”. Leave the default as index.handler.

Note that the first part is the file name inside the zip where the handler function is exported, and the second part is the function name. For example, if a file is called app.js and the function is called myHandler, the “Handler” value would be app.myHandler.

On the Advanced settings of this step, set the VPC where the ElastiCache cluster is.

Once the roles and anything else that is required has been configured, click next, review and create the function.

That’s it! To test your function manually, just click the “Test” button, select a the synthetic trigger of preference and check that it works as expected.

Summary

There are few ways to make use of Split Feature Flags in a serverless application. This blog post covers the case of using Split Synchronizer and for javascript functions.

In future posts we’ll share another approach using Split “callhome” or Split Evaluator which is a microservice that can evaluate flags and return the result, in addition to storing the rules to evaluate the flags as highlighted in this post.

In case you’re wondering “can’t I hit the Split servers from my Lambda function?” The answer is yes, in a “standalone” mode, but it won’t be as efficient as having the state in one common place i.e. Redis. It’s NOT recommended to run the SDK in a standalone mode due to the latency it may incur at the creation of one SDK object per function.

For further help using Split synchronizer in a serverless environment contact us or use the support widget in our cloud console — we’re here to help!

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Technical

How Feature Flags Can Improve Your Logging

Use feature flags to dynamically control log levels and sampling, reducing noise and cost while staying ready for production incidents.

Harness Team

April 21, 2023

Time to read

Smarter Logging with Feature Flags and Dynamic Configurations

For any software company, reducing logs helps to save money. We also know precisely how painful it is to have a production problem or even an incident only to find that we haven’t logged nearly enough. There are several different strategies to try to balance these two conflicting goals, including configuration to control log levels and sampling. In this post, we will discuss how feature flags can help you improve your logging strategy. As a result, you can update variables without pushing a configuration change, allowing for faster modifications in a crisis.

First, whether or not you use feature flags, we recommend wrapping your logging in an internal library. This has a few advantages. It allows you to keep a consistent format across your logs. Instead of relying on each developer to formulate their own logs, you can have them specify a few parameters and format the rest for them. Additionally, it allows you to automatically fill in fields you want everywhere, such as trace_id or user_id (or whatever applies to your application). Finally, it gives you a single location to add a feature flag.

Now that we have a feature flag for our logs, how does that help? We will set it up to use that feature flag to control sampling rate and log level per class. There are a few ways to do this, and we’ll follow up with another post about how we actually did this for our own logs. For this post, though, we will explain one of the other options.

At a high level, we will set up a default logging level with the ability to override this—at the class level. To do this, we’ll start by creating a treatment for each log level.

Define Treatments

Once we have created the Split with the log levels, we need to create a Logback Interceptor class. This will fetch the Split changes periodically and sets up the right level to the ROOT logger in runtime. The next class diagrams illustrate the idea:

And the next code snippet implements the Logback Interceptor:

Get It Running

To get it running, add a single call to the static method init() injecting the SplitClient (see how to setup Split SDK here) and the Split name:

So, with this simple code you can handle runtime log levels without stopping your program execution.

Taking this further, we can add a little more complexity to handle not only the log level but to also control the number of logs by sampling. To do this, we need to create a Logback appender and use the Split feature known as Dynamic Configuration:

The first step is to configure the Split with the desired configuration approach; you can use key-value pairs or a custom JSON.

Dynamic Configuration

In this example, we are setting up a custom JSON value to have more flexibility in our configuration:

Once we have set our dynamic configuration per treatment, we can write our code.

In this case, we will create a concurrent storage class to share the dynamic configuration across our LogbackInterceptor class. The LogbackInterceptor will fetch data from Split and write the configuration values into storage. The Logback appender will be reading from the storage when sampling log lines.

The next diagram illustrates this approach:

So, following the previous diagram the code of each class will be:

Now you can see how feature flags, and especially Split, can help improve your logging. We allow you to log less with the piece of mind that you can quickly and easily increase logs if something happens. And we can do all of this without pushing a code or configuration change and without littering the code with separate feature flags just waiting to be flipped in case of emergency.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Split gives product development teams the confidence to release features that matter faster. It’s the only feature management and experimentation solution that automatically attributes data-driven insight to every feature that’s released—all while enabling astoundingly easy deployment, profound risk reduction, and better visibility across teams. Split offers more than a platform: It offers partnership. By sticking with customers every step of the way, Split illuminates the path toward continuous improvement and timely innovation. Switch on a trial account, schedule a demo, or contact us for further questions.

Technical

How to Reduce Code Cycle Time with Feature Flags

Improve engineering cycle time with feature flags that decouple deployment from release, enabling faster, safer delivery at every stage.

Harness Team

April 29, 2021

Time to read

Accelerating Delivery Without Sacrificing Safety

Companies that invest constantly in tools to help engineering teams reduce the cycle time show a higher degree of employee satisfaction, leading to less employee frustration and higher retention over time. Sounds great, right?

We also know that the most elite engineering organizations are able to move code from being written and committed and into production in less than one day. The next tier down, the highly successful engineering orgs have cycle time at one week or less. And the rest follow. Based on this classification, how do you rank your team?

This data comes from both DORA research and a joint webinar we recently conducted with Bryan Helmkamp from CodeClimate in which we discussed one way to define development cycle time as well as tips on how to improve it. I was excited about this webinar as there is a strong correlation with how feature flags can help reduce the time of several phases of it.

Before we dig deeper on how to reduce cycle time, let’s talk about what it is. Brian defines cycle time as the amount of time spent between when code is committed and when it is shipped to production.

Bryan and team have an opinionated view on how to define the phases. They recommend you start measuring the code cycle time when the pull request is made and to treat the time more like a Service Level Objective(https://www.atlassian.com/incident-management/kpis/sla-vs-slo-vs-sli) (SLO) where you measure 90th or 95th percentile, as opposed to just measuring averages.

I’ll go ahead and describe code cycle times below and provide my commentary regarding where feature flags can help decrease the time of some of the phases.

Time to Open

Time to open measures the time from first commit to the time the pull request is opened. It accounts for the largest chunk of overall cycle time. Bryan mentioned that when people create smaller pull requests they tend to be picked up quicker, reviewed faster and deployed faster as well.

Tips to decrease time to open

Reduce multitasking and limit work in progress.
Identify and limit rework
Track and reduce pull request size

Here is where feature flags have a profound positive impact on cycle time. Why is that? Because when using feature flags, you separate a push from a release. And when you do that, engineers feel safer merging code and shipping it to production. The new code path is gated by a flag that is not yet visible to users. And the byproduct of that is smaller pull requests, faster time to review and shorter Time to Open cycle time.

Time to First Review

This is the amount of time from when the pull request is opened to the time of first review. This is an indicator of team collaboration patterns in the organization, As we all know, slow reviews increase the amount of work-in-progress.

Feature flags again help to decrease this time by allowing engineers to create pull requests in small batches that in turn help reviewers review and approve outstanding pull requests faster.

Engineering leaders must make sure that coding cadence is not the only thing that gets rewarded. Code reviews have to be something that the leader rewards as well, given the implications to the cycle time and development cadence.

Other investments you can make are in tooling and integrations (like Slack) to make sure people are aware there are pull requests ready to be reviewed and make collaboration more efficient.

Time to Approve

Time to approve refers to the time from the first review to when the pull request is ready to be merged.

This speaks to finding alignment around the desired state of the PR to be considered ready and must balance speed and thoroughness.

Things to look out for here include what percentage of comments imply actions. Driving this metric to any extreme is not good. You must seek a balance. Too many comments slow things down and too few can lead to defects.

Lastly, the number of review cycles is another metric to optimize. Too many back and forths leads to a decrease in cycle time. Bryan’s team found their sweet spot at four review cycles per PR.

Time to Deploy

Time to deploy is the time from when the PR is ready to go until it is deployed to production.

You can decrease the time in this phase of the cycle time by investing in reliable Continuous Integration (CI) tools to increase people’s confidence in the deploy process. Usually when there is no investment in this area, as the code base grows people develop lack of confidence in the deployment due to the fear of breaking something. Automate as much as you can, such as testing and security checks (also known as shifting left).

Learn More About How Feature Flags Can Allow You to Release with Speed and Safety

Feature flags play an important role here and it is something Bryan calls out in his presentation. Code can arrive to production much faster and with lower friction when using feature flags (remember, a code push doesn’t imply a release). Now, this opens the question if there should be a fifth phase in the cycle time that includes time to release. What do you think?

If you’d like to learn more about DORA, the research organization mentioned in this post, check out their website, and their survey data on high-performing companies.

If you’re ready to get started with Split you can sign up for our forever-free tier. If you’d like to learn more, check out these resources:

If you’d like to catch our newest content right off the presses, we’d love to have you follow us on Twitter, YouTube, and Facebook!

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Technical

7 Ways We Use Feature Flags Every Day at Split

Discover 7 powerful ways Split uses feature flags to boost velocity, reduce risk, automate ops, and debug faster in production.

Harness Team

February 10, 2022

Time to read

How often do you build a product that you end up using every day? At Split, we “dogfood” our own product in so many ways that our engineering and product teams are using Split nearly every day. It’s how we make Split better. Using your own product as a tool to build your product gives you a front-row experience of how valuable your product is to your customers, how well it solves specific use cases, where the pain points are, and so much more.

I believe every software company should deploy feature flags in their product. Why? Because feature flags provide a safety net to make engineering teams more productive by allowing engineers to ship code faster, they open up the possibility of testing in production and enable devs and product teams to quickly kill any feature that causes product degradation, often in a matter of seconds.

Today, I’d like to walk you through a few of the ways we’re using feature flags at Split. Some of these will hopefully be familiar and obvious, but my hope is that others will give you ideas for new ways to drive efficiency, innovation, or simply product-market fit in your organization.

1. Testing in Production

We talk a lot about testing in production because it’s one of the most obvious, and obviously useful reasons to deploy feature flags. When a feature is ready for delivery, (or at minimum, it has passed all testing in your staging or pre-production environment) it can be deployed to production in a dark way. This means that the binary containing the new feature is in production but no user can access it as the flag is turned off.

At Split, we first toggle the new feature on for internal users to complete testing. Once it’s ready and the functionality has passed all testing criteria, we will ramp up the feature and expose it to 5%, 10%, 25%, 50%, and 100% of our users. For some feature releases, we’ll literally stop at each of those percentage rollouts to confirm everything is still working as intended before moving on. For others, we’ll use a subset of those steps. Only once we’ve reached 100% is the feature considered to be fully rolled out, at which point we remove the flag.

2. Entitlement

We also use flags to gate functionality based on the product tier an account or user is in. This is a really common feature flag use case. For example, if you are a free customer you only get access to email support. However, for our paid customers with premium support packages, we, via feature flags, enable chat support as well.

A product can be automated so that when a user upgrades to a new product tier, a feature flag is updated to include this new customer in the allowed customer list that has access to premium support functionality, like chat.

3. Compliance

In many SaaS companies, customer success and engineering teams require some degree of access to production and customer data in order to help customers with their support requests. This obviously comes with a variety of regulatory and compliance issues, depending on your industry and certifications.

A practice we’ve adopted at Split is to gate the access to customer data or impersonation through a feature flag. Only a limited set of employees who have passed a rigorous background and financial check can have access to customer data. Every time new access is required, a feature flag grant request is created, a Split administrator can approve or reject the feature flag change request, and upon approval, this employee, via a feature flag grant, can access the impersonation functionality. For this, we leverage our recently released feature; approval flows. This segregation of duties is a key part of the SOC2 certification, and not having this practice in place can delay the certification approval process.

4. Infrastructure Migration

Feature flags are commonly used to help with technology migrations and to migrate from monolith to microservices. At Split, we use flags where there is any migration of technologies, for example, while evaluating a migration from AWS Kinesis to Kafka. Stick with me on this one, since we’re going to dip a toe into the world of experimentation, and how it’s enabled by feature flags. In a typical scenario, you would place a flag to enable a dark-write (or double writes) operation into the new system to test traffic live and verify how it will perform in production. Then a second flag is created to enable dark-read, similar to the prior flag to verify the read performance without affecting the performance of the user (hence, dark reads). Finally, a third flag is created to switch over the traffic to send requests to the new solution.

Cost Analysis

Throughout the life of Split, we have had a few opportunities to replace existing infrastructure, typically as part of a scaling conversation. Before we dig into the migration itself, we have to answer the question “Is the new system more expensive than the current?”. The quickest and lowest-risk approach to answering that question is to place the new system being evaluated next to the current one and send dark traffic for a short period of time, and then extrapolate the cost. Doing that is more resource-efficient since one can run an evaluation for one day with none to little side effects and extrapolate the cost.

At Split, we used this technique to evaluate a migration from using Kinesis Stream as the queue to receive all incoming flag evaluation data to SQS. SQS was placed behind a feature flag that allowed dark writes with the purpose of gathering data for 24 hours to then extrapolate what it would cost if we were to run it permanently. We were surprised to find that it ended up being a more economical and more performant solution and we prioritized resources to move to SQS in the end.

5. Feature Flags as Circuit Breakers

Michael Nygard popularized the Circuit Breaker pattern to prevent a cascade of failures in a system. We use feature flags as a main disconnect for functionality that is critical to behave within certain values of tolerance. If those values are exceeded, a simple toggle can disconnect that functionality from being used or alternatively use percentage rollouts to prevent it from being used excessively. The end goal? Make sure that system downstream is stable and healthy.

At Split, we use this pattern for things like external API endpoints, data collection services, frequency of synchronization with external systems, etc.

6. SRE Runbook Automation

Because we use feature flags as manual circuit breakers, it is relatively easy to automate remediations when certain conditions are met. For example, if we gate certain functionality like data ingestion from source A, and that pipeline is getting more load than the system can handle, we can enable (or disable a flag) to indicate that a certain amount of noncritical traffic should be dropped to preserve the integrity of the system.

Currently, we are experimenting with Transposit to build automated runbooks so engineers can act automatically following a pre-established process to mitigate an incident. These processes will involve disabling, enabling, or changing the exposure of a feature flag as part of the runbook and with a click of a button. As part of this work, we’ll be excited to release runbook templates for our customers to use. Stay tuned!

7. Debugging

This approach can be controversial since many logger frameworks allow you to enable debug or verbose mode natively. The advantage of using flags for this use case and wrapping a more verbose logging level around a feature flag is that you can target a specific customer or condition, vs doing that at the logger level, which is more coarse and tends to be more binary; verbose on or off. With feature flags, you can target a verbose mode for network traffic for a given user and set of users within a certain account, or a user agent, among others. Once the debugging session is done, the flag is turned back off.

We use this technique at Split when a support ticket is escalated to engineering for deeper analysis, and it has contributed to lower support request resolution times. One particular example is a flag that enables debugging for our SAML (single sign-on) functionality. Historically it has been an area with recurrent support tickets given the number of third-party identity providers, each of which has their own nuances. Having this logic toggle to turn on verbose logging has helped our support organization reduce support ticket resolution time.

Learn More About How You Can Increase Productivity and Reduce Uncertainty with Feature Flags

I hope the set of use cases mentioned above in this post can serve as a starting point for those readers that are new to the concepts of experimentation and feature flags, or to deepen the usage of Split product for those who are already using Split.

If you’re ready to get started with Split you can sign up for our forever-free tier. If you’d like to learn more, check out these resources:

If you’d like to catch our newest content right off the presses, we’d love to have you follow us on Twitter, YouTube, and Facebook!

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

‍

Technical

When to Use a Holdback Pattern

Use holdback experiments to validate long-term impact while delivering short-term value to most users.

Harness Team

February 10, 2022

Time to read

What is a Holdback?

Some experiments require a long time to reach significant results. If one option is preferable in the short term, you can split unevenly for a long period of time: expose 95% of traffic to the preferred option, and 5% to the other one. That pattern and the second variant are both known as a holdback.

When is it Appropriate to Use That Approach?

Some changes to your web service or product, like making the purchase flow easier to navigate, are meant to raise business-critical metrics immediately. Others, like a new channel for customer service, might improve customer satisfaction rapidly but will only have a measurable, compounding effect on retention and other business-critical metrics in the long run. You can confirm that customers like the new option by looking at the Net Promoter Score (NPS). However, should you expose half of your users to a worse experience for months to measure the impact on churn?

What If the Experiment Can’t Last Long Enough?

There are many cases where an experiment should not last more than a few weeks, three months at most, to keep the product cycle manageable. However, some effects, like customer churn, can take longer to measure. Say you want to measure the impact of your change on churn. Say your customers book a holiday or review their retirement plan only once a year. In either of those case, a ten-week experiment is too short to expect any customer return and gather data to measure churn.

There are several options:

You can end the split early; have all the traffic revert to your default option; wait several weeks or months after both variants have merged; finally look at the impact of the split long after. In other words: did changing something in April and May have any measurable impact in October, or in April the following year? The conversion of experience in between will likely contaminate your results and require non-trivial corrections. You can limit your experiment to people who only access your service in April, for instance.
You can infer, based on previous observations, that an improvement in one leading metric should improve in a lagging metric. For instance, you have noticed that an increase of 5 points of NPS leads to a decrease of 1% in churn. That would be speculative.

An approach that we could recommend is to run the experiment as expected but to set the short-term goal, like customer satisfaction survey as your objective criteria, and roll it out to all customers if the impact after a few weeks is significantly positive. Months later, you can check whether your overall retention has indeed improved compared to before your experiment. That comes with the limit of a Before-and-After comparison.

Rather than rely on a noisy approach like Before-and-After, the current best practice to confirm the effect on business metrics would be to run an experiment that lasts longer than what quarterly planning would allow. If that seems costly because one option has clear short-term benefits, we can recommend a small twist: instead of rolling out to all and losing information, or maintaining an expensive 50/50 split, we can hold back a small portion of your traffic, say 5% or 10% for longer, say several months, two years maybe. Then, while most of your clients see improved customer service, the minority held-back will continue to experience the previous version of the service. It’s not ideal for users; we’d have to maintain two options, but you will be able to compare the actual, compounded impact of a better service over months.

With that third approach, you can still measure what it’s like to have better customer service for a couple of purchase cycles; not only that, you can also measure the impact of expecting excellent service, time after time over extended periods. For example, it might increase entitlement, it could affect the brand positively, it could drive stories about exceptional situations where a better service was helpful.

Can Keeping a Small Group Out Be Significant?

The first question from your statisticians or analysts will likely be “Would we be able to measure the impact over only 5% of the audience? Wouldn’t that mean three times less power?” It would, roughly (5% is ten times fewer units than 50% and, following the central limit theorem, √10 is about 3) but a longer test set-up would be more sensitive: more visitors can enroll in the experiment and some effects compound.

More importantly, with customers being exposed to customer service multiple times, their retention should not just improve but compound. If your retention improves by 10% over one month, it’s 21% better after two, 77% better after six months. That’s several times larger. Those more consequential effects are easier to detect.

Which Variant to Roll Out and Which One to Hold Back?

If you run a balanced 50/50 test, you know which variant offers the most short-term positive value, or which one is the most promising overall. To minimize the negative impact on the business from testing, you want to roll out to 90 or 95% of user population the variant with the most promising outcome, especially on leading indicators: best customer satisfaction, most items marked as favorites, etc.

You can decide to pick the option that will be easiest to deactivate, in case the holdback experiment gives surprising results. Introducing new interactions means that removing them will come at a cost. Keep in mind however that a hold-back is here to confirm a previous result, possibly measure its impact more accurately—it rarely flips the overall outcome.

Another way to decide which option to prioritize is to think about the possibilities that this opens. Allowing customers to identify their favorites (without buying) allows you to reactivate them with more purchase opportunities. It allows your machine learning team to train better recommendations. Those improvements can contribute to assigning more value to your preferred option.

Of course, if your users talk to each other, those left behind might resent that they don’t have a better experience. You might get bad press from the discrepancy. Exercise discretion and override the holdback when it is more expensive than interesting. Still, this effort will be beneficial in the long run to convince your executive stakeholders to invest in better service for long-term objectives.

Keep Technical Debt Under Control

When running this process, one of the most common issues is maintaining the old code or the previous operational processes for longer. That is a legitimate source of concern for the software engineers and the operational managers who will want to move on. Getting them engaged with the process is critical. You should explain the value of experimentation and why a holdback is useful when dealing with long-term effects. They will generally understand that this aligns with their objective of having a more streamlined experience and investing to resolve technical and operational debt in the long term too.

Learn More About the Holdback Pattern

When you run an experiment that proves beneficial over the short term, you will want to roll it out as soon as you have significant results. However, if you still want to investigate its long-term effect, you also need to know the reliability of an experiment. To make sure as many users as possible benefit from the improved experience, roll it out to the majority of users, say 95%. Keep a minority of users in long-term control group. This is known as a holdback.

After several months, you should have a strong signal about the long-term impact on key metrics, notably those that compound. Remember to switch the holdback to the new experience when your experiment is over.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Technical

How Feature Flags Can Help You Optimize Your Conversion Rates

Use feature flags to optimize conversion rates through personalized experiences, real-time testing, and cross-team collaboration.

Harness Team

June 9, 2023

Time to read

In a digital business landscape marked by rapid evolution and customer-centricity, conversion rates have emerged as a vital metric of success. They are more than just numbers or percentages. Conversion rates signify the effectiveness of your marketing strategies and the resonance of your offerings with your target audience.

Your conversion rate is a clear indicator of how well you’re meeting your customers’ needs and wants. High conversion rates suggest that you’re providing value in a way that resonates with your audience, leading them to take the desired actions, whether it’s making a purchase, signing up for a newsletter, or any other goal you’ve set. Conversely, a lower conversion rate can signal a disconnect between your offerings and your audience’s expectations or needs. Understanding and optimizing conversion rates is, therefore, crucial for the growth and profitability of your business.

Feature flags are tools commonly used in software development for controlling the visibility and functionality of certain application features. However, their potential goes beyond just development. In the context of conversion rate optimization, feature flags can become a marketer’s secret weapon. They provide an opportunity to carry out extensive testing, refine the user experience, and, consequently, enhance the effectiveness of your conversion strategy.

The use of feature flags in conversion rate optimization represents a synergy between your development and marketing teams. It creates a pathway for these traditionally siloed units to collaborate and contribute towards a common goal—driving conversions. This collaborative approach can lead to a deeper understanding of user behavior and preferences, enabling you to tailor your offerings and user experience in a way that boosts conversion rates.

In this blog post, we’ll explore the concept of feature flags in depth, discuss how they can be leveraged for optimizing conversion rates, and illustrate how Split can support you in this endeavor.

A New Perspective on Conversion Rates

The digital world’s dynamism means your conversion rates are never static. They can fluctuate based on myriad factors: evolving market trends, shifting user behavior, or changes in competitive dynamics.

To stay relevant and maintain high conversion rates, businesses must embrace adaptability in their strategies. This adaptability extends not just to marketing messaging but also to the user experience on your digital platform, which is where feature flags come into play.

Feature Flags: Beyond the Code

Feature flags’ traditional use-case is in code deployment—they enable developers to release, test, and iterate on features in a live environment safely. But these powerful tools’ utility extends much beyond the engineering silo.

By enabling the dynamic manipulation of features, content, and overall user experience, feature flags can help marketers directly influence customer behavior and thereby optimize conversion rates.

Feature Flags as a Marketing Tool

Feature flags hold incredible potential as marketing tools. Though traditionally seen as a purely technical tool used for progressive delivery and risk mitigation, their usefulness in optimizing user experience and driving conversions has become increasingly apparent.

One of the significant benefits of feature flags is their ability to facilitate marketing experiments. By toggling features on or off for specific user segments, you can test various strategies and approaches, measure their effectiveness, and adjust accordingly. Feature flags provide the agility to test on a granular level, from modifying button colors and placement to the introduction of entirely new features. This experimental approach can help you understand what resonates best with your audience, providing valuable insights for future marketing strategies.

Today’s consumers crave personalized experiences, and feature flags can play an essential role in delivering them. Using feature flags, you can customize the features and user interface elements that different user segments experience. This high level of personalization can lead to increased engagement, better user experience, and, ultimately, higher conversion rates. For instance, a first-time visitor to your e-commerce platform might see a different set of features compared to a repeat customer, each designed to enhance their specific user journey and nudge them towards conversion.

Feature flags offer the ability to collect real-time feedback on the changes you implement, which can be critical in shaping your conversion rate optimization strategy. By monitoring user engagement and behavior after rolling out a feature to a small user segment, you can gain immediate insight into its impact. This fast feedback loop allows for the swift identification of features that drive conversions and those that might need further refinement.

The Power of Incremental Changes

Feature flags allow you to modify and test small elements at a time rather than implementing broad changes at once. This power of incremental changes can prove crucial for conversion rate optimization.

Numerous case studies and research suggest that cumulative, incremental changes—guided by data and user feedback—can lead to a significant boost in conversion rates over time.

The Intersection of Development and Marketing

Optimizing conversion rates with feature flags isn’t a one-team show. It involves close collaboration between development and marketing teams, marrying technical implementation with strategic decision-making.

In practice, organizations that have successfully leveraged this collaborative approach have reported significant improvements in their conversion rates. Their success underlines the power of breaking down silos and leveraging tools like feature flags across departments.

Seeing Conversion Rate Optimization in a New Way

Feature flags offer a new way to approach conversion rate optimization—one that embraces adaptability, champions incremental improvements, and encourages collaboration across departments.

When engineering and marketing collaborate, using feature flags to align user experience with strategic objectives, businesses can make better, more informed decisions that drive conversions.

Ready to embrace this new way of conversion rate optimization? Split offers feature flagging solutions designed to empower both your development and marketing teams. Our platform supports dynamic configuration, enabling you to alter user experience in real time based on user feedback and analytics. This gives you the agility to adapt quickly and keep conversion rates high.

Dynamic configuration is an essential part of our platform’s power. It allows you to adjust the behavior of your software without needing to redeploy the entire application. With this feature, you can experiment, adjust and optimize on the go. Feature flagging is no longer just about risk mitigation; it’s about gaining actionable insights. Real-time adjustments lead to real-time insights, allowing you to stay ahead of the curve and keep conversion rates up.

Dynamic configuration empowers you to make changes that align with your users’ needs and behaviors as they evolve. When your digital platform can adapt quickly to shifting user preferences, you’ll see the impact on your conversion rates.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Technical

Database Migrations with Feature Flags

Use feature flags to manage database migrations safely, preserving backward compatibility and reducing risk through gradual rollout.

Harness Team

June 24, 2020

Time to read

There comes a time in every developer’s life when they need to do a database migration. Because business requirements change frequently, database schemas need to be updated from time to time. When you think of making changes to your code, it’s easy to roll back with git or feature flags, but why can’t the same principle be applied to database migrations?

What is a Database Migration?

In this post, we are talking about database migrations in the context of updating schemas rather than going from one platform to another. If you’re interested in platform migration, check out our Managing a Monolith Breakup post.

A lot of companies choose to undertake a database migration to update their schemas for a variety of reasons, including changes in the requirements of their data, to take advantage of cost reductions offered by cloud-based platforms, or simply data reorganization.

The Biggest Risk of Database Migration

The biggest risk when it comes to a database migration is, quite obviously, the loss of data. Understandably, for nearly every organization, this is an unacceptable risk. To mitigate that risk, development teams need a way to test that the data remains intact throughout the migration.

Use Feature Flags to Migrate Your Schemas

A common misconception about feature flags is that they can only be used for front end feature releases. However, feature flags can play a crucial role in your architectural strategy as well.

Let’s take an example of adding a middle name to an existing schema. Currently, your UI just asks for first name and last name, and we want to add a middle name column to the schema. The first thing you need to do is add a new column in the database. This is an additive change that is backward compatible. If no one is using that code, it’s just sitting there. When you actually make the code changes to use that middle name, you should put them behind a feature flag. The schema will work whether the flag is on or off, and if you turn the flag off it won’t break anything. If, in the future, you decide that you do not want to record people’s middle names anymore, you could perform a DB migration that removes that column. However, that is not backward compatible because, in the future, your code will look for a column that is not there anymore.

The safe and recommended way to do this is to wrap your feature flag around the elements of your codebase, not your database. If there is a problem with your code and you need to revert, or if you decide you do not want to ask for the users’ middle names anymore, all you would need to do is change the code that is behind the feature flag, without touching the database code. Then, once the feature flag is deactivated, you could do a separate migration that will just drop that column.

Preserve Backwards Compatibility in Database Migrations with Feature Flags

Let’s say you are in front of a frozen lake, and you are not sure if it is stable enough to hold you while you walk on it. You stretch out your right leg, gently apply pressure, and then slowly move your left leg to meet your right leg. You do this continuously to build your confidence until you reach the other end of the lake. If however, you feel the ice crack underneath your leg, you would slowly revert your actions and go back. This idea is called the expand-contract pattern.

Just as you need different versions of your software to work at the same time, you also need different versions of the database to work at the same time. In reality, you cant stop the world, update your code, and then start the world again. You need to do it gradually, which means more than one version will have to work at a time with the database. When you are migrating to a new database schema, you expand the schema to work with the old version and the new version and then contract to work only with the new. This allows the gradual rollout of the new changes to be successful while knowing that the old code still works for your users who haven’t gotten it yet. For this process to be effective, each change on its own should be backward compatible, so that just in case you need to roll back, your database will still be valid.

Feature flags give you the benefit of a layer of risk mitigation. For example, if you update your database and then immediately upgrade to version 2 of your code and there’s a bug, what do you do? Enter feature flags. The database still has to work after you release the feature because the change will not happen to everyone instantaneously — it will happen incrementally. By having a backward compatible migration, the risk is drastically reduced.

Learn More About Database Migrations and Feature Flags

Database migrations can cause developers problems if not done correctly. Lucky for you, we’ve got you covered! If you’re interested in learning more

Read more about feature flags and incremental, backward-compatible database changes in chapter 6 of our free Feature Flag Best Practices eBook
Learn more about database migrations
Watch this video on Feature Flag Best Practices
Stay up to date on controlled rollouts

Be sure to follow us on Twitter @splitsoftware, and subscribe to our YouTube channel!

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Technical

Testing AI Models Using Split

Use Split to safely test and fine-tune AI model parameters in production with feature flags and dynamic configuration for faster, data-driven iteration.

Harness Team

May 31, 2023

Time to read

If you haven’t read, large language model (LLM) based artificial intelligence tools have had an enormous impact on almost every industry and have taken the world by storm. From software engineers to artists, accountants, consultants, writers, and more. It’s an incredible tool that extends the creative power of humanity.

That’s a pretty weighty intro, true, but it’s undeniable how AI is changing the world. But if you want to implement AI into your own application, you should do it carefully, and with thoughtfulness. There is no need to move fast and break things, but rather, move fast with safety.

Using feature flags with AI allows you to test and measure the impact of AI on your metrics. AI tools can assist you in many fields, but you do need actual data on the outcomes that it is supposed to improve for you. For that, you need hard, statistically significant data.

This is where Split comes into play. Our measurement and learning capabilities can help you evaluate AI based approaches and iterate, ensuring that you can rapidly determine, with safety, which is best.

Some popular implementations include Google’s Bard, Microsoft’s Bing, and OpenAI’s ChatGPT.

Let’s look at some code using OpenAI in Python:

This is a pretty basic piece of code. This will have OpenAI’s Chat GPT write a funny computer joke.

In this case, it wrote for me:

Now, let’s say you want to modify the parameters, such as changing the temperature, the maximum number of tokens, or even the language model itself! These are parameters that OpenAI’s ChatGPT uses to fine tune the AI and what it provides in response to your prompts. We can store this information in Split’s handy Dynamic Configuration to allow modifying these from the Split UI without needing to deploy any new code.

To do this, first, let’s create a flag, we’ll call it AI_FLAG for this scenario:

We’ll give it two treatments that we want to test our AI parameters with. A standard treatment with the standard configuration, and a reduced_tokens treatment with a reduced number of tokens and a lower temperature.

ChatGPT’s temperature setting is the setting for the balance between factuality and creativity in the language mode. A higher temperature value means the AI gets to be more creative with responses, whereas a lower temperature means it will stick primarily to fact based knowledge that it has. It can range from zero to one. The maximum tokens setting is the maximum length of a response that ChatGPT can give you.

Now let’s create some dynamic configuration using key value pairs to hold the model, max_tokens, and temperature:

Then let’s hook up the parameters into the code using the Split SDK’s dynamic configuration to set the parameters. Note here that the different experiences are not actually based upon the name of the treatment, but rather the dynamic configuration contained within the Split web console. This is an extremely powerful set up, where parameters can be constantly iterated upon without needing to deploy any new code!

With an even more sophisticated approach, you could contain the training prompts within the Split dynamic configuration itself. This would be a highly advanced configuration but would allow not only parameterizing and iterating over the model and configuration, but also the training data. With the prompts in Split you have almost infinite possibilities for testing and iterating over AI setup.

Here is what that could look like as dynamic configuration

For this you would have to use the JSON as the dynamic configuration option.

In this example we have two treatments for this feature flag, one named chicken_jokes and one named computer_jokes—we want to see if our customers like chicken jokes instead of computer jokes. We can completely customize the prompts using dynamic configuration. Here is what the JSON could look like:

And in the code we would simply set the messages object to messages in the ChatGPT configuration:

Run this and your code will print out either a chicken joke or a computer joke, depending completely on the dynamic configuration cached from Split based upon the treatment the user has received.

This is an incredibly powerful example of how Split can be used in testing and validating AI models, allowing for extremely rapid iteration and learning.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Split gives product development teams the confidence to release features that matter faster. It’s the only feature management and experimentation platform that automatically attributes data-driven insight to every feature that’s released—all while enabling astoundingly easy deployment, profound risk reduction, and better visibility across teams. Split offers more than a platform: It offers partnership. By sticking with customers every step of the way, Split illuminates the path toward continuous improvement and timely innovation. Switch on a trial account, schedule a demo, or contact us for further questions.

Technical

Experimentation in Split: Make Your Events Work for You!

Learn how to extract, transform, and load user event data into Split to power experimentation and measure real feature impact at scale.

Harness Team

March 11, 2021

Time to read

So, let me guess… you love feature delivery with Split, and you’ve heard about experimentation. Now you think you’re ready for the next step.

Awesome!

This post is filled with details about how you can get started, and how Split makes it easy! But maybe you have questions? Things like:

What is an event?
How does one extract events?
What if my events platform doesn’t support webhooks?
How does one transform events?
How are events sent to Split?
Is this really all worth it?

The good news is that if you’re a Split customer, you’re basically already on your way!

Maybe you’ve heard, “a split is a feature flag, and an experiment rolled into one.” We say this because Split automatically keeps track of which features were given to each of your customers (we call them impressions). This alone can only tell you who is in the A group and who is in B (in your AB test, because that’s what a lot of experimentation is!). To have an experiment, you need data about what those users experienced and what they did. In other words, you need events.

KEY CONCEPT: An event is a small package of data to describe a user’s behavior or experiences.

There are events that describe things other than users, but in this post we’ll focus on the events of anonymous and logged-in users (other types can be handled similarly anyway).

The good news for many Split customers is that events are readily available. If you’re lucky enough to use mParticle, Segment, or Google Analytics, you can stop reading this post and start reading our docs (at the links in this paragraph) to see how to leverage your existing events in Split. We’ll call that early graduation.

For the rest of you, keep reading and learn. My goal is to teach you the steps necessary to extract, transform, and load events to Split. I will draw on field integration code examples for Amplitude, MixPanel, Tealium, and Rudderstack. If you’re using one of those tools, most of the work has been done for you already. If you’re not, the blueprints I provide should help you build your own. Most of these technologies are available for review on Github: Amplitude, MixPanel, and Rudderstack.

What is an event?

Instead of describing events in the abstract, let me give you a series of examples. Most tools pass a generic event with a track SDK call. These examples are all JavaScript, denuded of their initialization so you can just see the event pass itself. Witness the birth of an event.

If you choose to use Split to report your events, you don’t have to finish the rest of the post. Why? Because Split events are directly transported, asynchronously and in batch, to the Split cloud. Why wouldn’t you just use Split? In most cases, the track calls to create events are already in place with one of the tools shown. Customer Data Platforms (CDPs) can do a cloud-to-cloud transfer of events to Split, removing any need to add Split tracking on top of the existing tracking. This post exists to liberate events!

If you skimmed through the examples, you’d notice that every approach includes not just an event but some details about the event. The details are often called properties, and they make the events much more useful when you go to do analytics.

TIP: A great integration pulls across all the properties into Split events.*

The asterisk is because I mean almost all the properties. Sometimes there are “bookkeeping details” that can be left behind. Let’s see how event extraction works so we can have a mapping strategy.

How do you extract events?

In short, by webhook or stand-alone API extraction.

A webhook is a function you host in your cloud. Google has Cloud Functions, and AWS has Lambdas; both are popular, but many other providers can host a webhook.

Many tools will let you specify headers to your webhook, so you can do things like configuring the Split environment and traffic type you want to use when you start sending traffic. Let’s look at a webhook to get a clearer understanding.

We have a handleRequest instead of a service method, but otherwise the picture is almost identical. The MixPanel example is a webhook you register with Split. Split also has webhooks for exporting impression and audit data to third parties (like MixPanel).

In both cases, the request’s input stream is fully read into a string before processing continues. A more sophisticated implementation would read the input and produce Split events as a stream.

In this example, the byteStream is consumed by a BufferedReader. Each line of input is a single event, making it convenient to parse them into JSONObject instances off the stream. After a batchSize of events are consumed, they are sent to Split in-line (could have been sent on another thread).

Streaming is beneficial when the input sizes are large. If you are reading more than a megabyte, you should use streaming. The Amplitude and MixPanel event integrations stream.

What if my events platform doesn’t support webhooks?

Then it will almost certainly support a REST API you can use to extract data. You need to do a little more work to get the data than with a webhook, including using your own REST API library to call the data export API of your tool. Java has a built-in HTTP client, but there are popular third-party libraries. Languages like Node.js use built-in commands too.

At the least, extraction APIs let you say what time period you want to export. Some allow you to specify very granular timestamps, and others give you one-day increments. If you want to extract high volumes of events, you should aim to run often, grabbing short periods each time. You can batch the events you create back to Split.

The MixPanel to Split integration is an excellent example of calling an API to retrieve events; the streaming code in the example above comes from that integration.

For an advanced example, consider Amplitude’s bulk events API. This one is trickier than most because it responds with a zipped archive of JSON events. You can see the full solution on Github. Pay attention to how the input is streamed to an iterator that decompresses and resolves into events.

How do you transform events?

Let’s look at a sample event tracked to Rudderstack.

Gold! Most of this event can be passed to Split in event properties. It must be flattened to do that, though. So the rich context object in the Rudderstack event presents a challenge.

Also, timestamps are in UTC, and Split wants them in milliseconds since the epoch. We have the problem of deciding when to send the event with anonymousId as its Split key, or userId (or both). Overall, the mapping is clear. The left-hand side is what Split expects in each event. The right-hand side is the mapping to the Rudderstack property shown in the sample track event above.

What Split Expects in each event	Mapping to the Rudderstock property in the sample track event above
`key`	`userId` or `anonymousId`
`trafficType`	Pulled from configuration (stand-alone) or `HTTP header` (webhook)
`eventTypeId`	A cleaned version of event name (Split allows `[a-zA-Z0-9][-_\.a-zA-Z0-9]{0,62}`)
`environmentName`	Also pulled from configuration
`timestamp`	`originalTimestamp` converted to milliseconds since epoch
`properties`	See below

‍

How do we get all those properties recursively? Recursively!

In the above example, the context object has all of its object children flattened. Each level is separated by a period. The end result is a Split event with all the goodies preserved.

It isn’t JSON’s nature to alphabetize, but if you compare the original //Rudderstack track event and the //Split event as received and transformed from Rudderstack, you’ll discover that most of the source event’s properties have been preserved in the Split event.

How do you send events to Split?

RESTfully. Consider this code.

This example shows batching events to Split’s events endpoint. You’re sending JSON, so your programming language of choice will have clever ways to put your events together for the POST body. Note that you’ll need a server-side API token to send an event.

Is it all worth it?

Yes, it’s worth it. 80% of features have a neutral or negative impact. We call this the “impact gap” at Split, and it’s a real thing.

If you don’t measure, you won’t know if you’re a success. You could build countless features on top of one that never even resonated with your customer base in the first place. Not being able to calculate and guess the right answer isn’t a failure. Refusing to measure your results is a failure. Split is your golden ticket to measuring, getting yourself onto a sound track, and doubling down on only the innovations that matter.

Plus, you can get some incredible views of your features.

Split will study your organization’s metrics and keep you informed of how each of your features is having its impact. Did a new feature result in a drop in signups? Are those errors background noise, or are they coming from that 5% canary rollout you just kicked off? Ever wish you could just hit a (kill) button to just stop that crazy world?

You’re already in the driver’s seat. You may even have a nice dashboard. Now just make sure you can look out the windshield and see where you’re going. Try Split today! And seriously, if you’ve read this far you know you want to hear how companies like Twilio, Speedway, Comcast, and Experian are running experimentation programs, and delivering business impact. To hear those stories and more, join us at Flagship!

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Deliver Features That Matter, Faster. And Exhale.

Split is a feature management platform that attributes insightful data to everything you release. Whether your team is looking to test in production, perform gradual rollouts, or experiment with new features–Split ensures your efforts are safe, visible, and highly impactful. What a Release. Get going with a free account, schedule a demo to learn more, or contact us for further questions and support.

Technical

Managing Feature Flag Retirement and Technical Debt

Manage feature flag sprawl by planning expirations, enforcing removals, and reducing technical debt through intentional flag lifecycle practices.

Harness Team

February 2, 2018

Time to read

Teams working with feature flags usually come to the conclusion that a large number of active flags isn’t necessarily a good thing. While each active feature flag in your system delivers some benefit (I hope!), each flag also comes with a cost. I’m going to explain those costs, such as cognitive load and technical debt, and explain how to avoid them.

Feature flags ain’t free

Every feature flag under management adds cognitive load, increasing the set of flags you have to reason about when you’re working with your feature flagging system. In addition, every active flag is by definition a flag which could be either on or off for a user, which means you need to maintain test coverage for both scenarios. Perhaps the biggest cost from active feature flags comes in increased complexity within your codebase in the form of conditional statements or polymorphic behavior. This “carrying cost” for feature flagging is very real — flagged code is harder to understand, and harder to modify.

The case of the zombie feature flag

In a previous post, we saw the benefits of categorizing feature flags as either long or short-lived. However, even flags that have been explicitly identified as short-lived can still end up outstaying their welcome. A flag that’s no longer in active use might still remain in the system, with its implementation still muddying the codebase. What causes these zombie flags to stay hanging around?

Sometimes a feature flag that was intended to be short-lived is simply forgotten, perhaps lost amongst a large number of other flags. This is, in and of itself, another reason to keep the number of flags in your system low — it prevents a broken windows culture where actively managing your technical debt doesn’t seem worth the investment, creating a vicious cycle.

It’s also possible that a team is aware that a feature flag is past its expiration date but can’t quite prioritize getting rid of the flag. Retiring the flag always seems to be near the top of the task backlog for next sprint, never the current one. This is a variant of the general challenge that many delivery teams face in balancing urgent work vs important work; building a high-visibility feature vs. paying down technical debt.

Plan for feature flag retirement

The key to ensuring that feature flags live a short — but hopefully productive — life is in being intentional on retiring these flags, along with having established processes to help everyone stick to those intentions.

The first step is in explicitly identifying when a flag should be short-lived. As we’ve discussed, placing flags into defined feature management categories can help, but isn’t the only solution. Simply instituting a rule that every new flag have a stated expiration date can get you a lot of the way there. The key is in making that expiration date a requirement for every new flag. Of course, there also needs to be some mechanism to mark a feature flag that’s intended to be long-lived, with no expiration date. An example of this would be a flag controlling access to a paid-only feature. The ability to control access to that feature will always be required, so the flag should never expire.

Once you have the concept of an expiration date in place, the next step is to enforce that expiration.

A feature management technique which I would consider a bare minimum is to proactively place a flag retirement task on the team’s backlog whenever a new short-lived flag is created. This doesn’t entirely solve the issue — those tasks have a tendency of being serially deprioritized — but it’s a good start.

A rather extreme technique — and one that I’m rather fond of — is to attach a time bomb to every short-lived feature flag. The expiration date for such flags is included in the feature flagging system’s configuration, and a process simply refuses to launch if a flag that it uses has expired. Slightly less extreme variants of this approach would be for a process to alert loudly if it was using an expired flag, or having the flagging system itself send out alerts when an active flag expires, or drawing attention to expired flags in a feature management UI. I’m a fan of the time bomb though.

There’s a concept from Lean Manufacturing which can be applied to feature flag management. When running a manufacturing production line it’s beneficial to reduce the amount of stuff piling up between lines by enforcing a “Work in Progress limit” or WIP limit. The same technique can be applied with feature flags. A team declares that they will only allow themselves to have 4 (or 6, or 20) short-lived feature flags active at any one time. If a team has reached that WIP limit for flags and a product manager or tech lead wants to add a new flag they must first identify which flag the team is going to retire in order to “make room” for the new flag. This can be a very effective technique, mostly because it aligns incentives. The person who wants to add a flag is incentivized to also make sure that it will subsequently be removed — so that they can keep adding flags! For the same reason, WIP limits are best applied within the boundaries of a team — there’s nothing more frustrating than a limitation that you don’t have the power to fix.

Finish the feature management job

A feature flag should never be considered retired until the code which implements the flag is also removed. The most direct cost of the flag is on the codebase itself. Removing a flag from the feature flagging system also removes visibility of the cost of that flag, increasing the risk that a team will pay that carrying cost within their codebase for longer.

A good hard-and-fast rule to prevent this from happening is to only allow the configuration for a flag to be removed once there are no references to that flag within a codebase.

Setting feature management intention

The key to keeping the number of active feature flags in your system under control is intention, coupled with some good practices. Explicitly identifying short-lived feature flags and then applying the techniques discussed in this post will help your engineering org to succeed with feature flags at scale.

For more ideas on how to succeed with feature flags check out Chapter 3 of the O’Reilly eBook produced by Split titled Managing Feature Flags.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Technical

Feature Management Architecture & Security

Best practices for building secure and scalable feature flag systems.

Harness Team

October 9, 2022

Time to read

Feature Management: What to Consider

100% of software engineers agree that if you want to embrace CI/CD effectively, feature flags are a necessity. In many ways, they’re becoming a commodity. But don’t be fooled: not all feature management and experimentation platforms are created equal. Making comparisons across the market is a little more complex than perusing the milk cooler for parity products. There are potential security risks with the wrong feature flagging tool and without cautions on cartons.

As it goes with any piece of technology, it’s all about the build. If a new feature management platform is on your consideration list, take a deeper look at its architecture and security. Even the most subtle nuances can mean the difference between accelerated software delivery and a potential leak of sensitive information. Let’s compare two main approaches to platform architecture & security, so you can measure, learn, and launch features with more confidence.

A Thin Layer = Less Protection

All feature management tools deliver feature flags and capture impression data (an impression represents a flag evaluation, when and for whom it was evaluated) by way of software development toolkits (SDK). These SDKs are separated into two categories. One being the client-side SDKs, which sit behind web browsers, IOS, android, and IoT devices. The other is located server-side: these SDKs operate on a server inside your infrastructure or in a cloud-based server of the feature flagging system.

In most feature management platforms, the client-side SDKs are a thin layer. These platforms might argue that “thin” is a nimble design choice, but beware. They’re really just a proxy, incapable of evaluating feature flags locally. As a result, the data needed to evaluate flags (like userid and its attributes) needs to be sent away via encoded urls to a cloud-based server for evaluation. Not only does this delay the analysis process, it increases the risk of a data leak from the urls left behind in access logs. If you’re relying on feature flags to power your banking application for example, there’s a chance that personal identifiable information (PII) could wind up in these logs and in the wrong hands.

For the most secure and private feature flagging capabilities, it’s crucial to limit the exposure of information across the internet. This particularly applies to companies at enterprise scale and with applications constantly exchanging highly-sensitive data that could be breached.

Look to a Rules Engine for Maximum Security

One unique approach to architecture and security starts with the foundation of a rules engine. What does that mean? Both sides of the client-side and server-side SDKs are treated the same. They’re not thin, they’re robust and intuitive. They’re both rules engines, which means they focus on the rules, not the answers. This is a very unique architectural difference that most feature management platforms don’t offer.

On the server side of the platform, feature flag rules are written and shared with the client-SDKs. They are then cached and saved for a more intuitive and private evaluation, and the benefits are major. While traditional feature management platforms can’t make feature flag evaluations within the client-side SDKs, a rules engine-based approach can do it all locally. This can be accomplished right inside your application. For example, it can be done in your online banking app, a healthcare records portal, or any other system requiring a higher level of security and privacy.

Because the inputs needed to make the feature flagging decision don’t leave your application to make that long, treacherous trek to the cloud-based server for a feature flag evaluation, neither does your customer’s PII. Social security numbers, location information, date of birth—that information remains between you and your customers. The cloud never sees it, only the recipe for how each feature flag behaves and reaches your customers. Therefore, the chances of a privacy leak getting into the wrong hands is minimized. Rule of thumb: trust a rules engine.

Split’s Rules Engine

From the beginning and through every update along the way, Split has been designed to be private, fast, resilient, secure, and versatile. By downloading and caching the user-defined ruleset locally, Split’s SDK is able to act as an autonomous rule engine and perform all evaluations locally. Beyond elevated privacy and security, this capability gives Split some additional key advantages.

Not Just Security, Speed

Because all evaluations performed by the Split SDK rely on local data rather than a cloud evaluation, processing time is virtually instantaneous, under a few milliseconds. The on premise evaluations can be re-run and updated at any time. This enables the ability to trigger tests, to capture real-time data, and make smarter feature changes on the fly.

More Resilience

Split is hosted in multiple AWS regions for failover purposes. However, rulesets are additionally cached in our CDN, Fastly, an edge cloud platform for optimized experiences, and would be available even if the AWS hosted Split Cloud were not. Furthermore, once a ruleset has been cached, the SDK can reevaluate it as often as required against a set of ever-changing data attributes. This, in combination with our streaming support, guarantees the feature flags updates are delivered to the SDKs as soon as the updates become available. With the Client SDK caching the rules, instant local decisions can even be made when there is no network connection. Thanks to this, Split is an ideal choice for many mobile applications, where a data connection is never guaranteed.

Next-Level Versatility

With 14 unique SDKs and REST support (via the Split Evaluator), Split is compatible with virtually any programming language. Furthermore, Split is ideal for supporting multi-tenancy.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Learn about: Feature Engineering, Feature Management Maturity Model

Read about Choosing a Feature Management Solution

Technical

Why You Should Use Undefined Instead of Null in Javascript

Avoids type coercion and improves clarity by preferring `undefined` over `null` in JavaScript logic.

Harness Team

October 1, 2021

Time to read

JavaScript is an unusual programming language in that it has two null-like values: undefined and null. These are distinct values (null !== undefined). They are also different data types in JavaScript, among the primitive types are string, number, boolean, null and undefined.

After a couple of years of working with JavaScript, I came to the conclusion that null is pretty problematic and should be avoided when writing JavaScript whenever possible. I’ve since spent several years writing both Node.js and various browser-based JavaScript applications without using null (except for converting to or from undefined:

I’ve found it to be a viable and desirable approach to take.

Before discussing why it seems better to avoid null, let’s first talk about how these values, undefined and null get into your application’s data flows to start with.

Where Undefined and Null Come From

undefined is omnipresent in JavaScript. If you declare a variable without an explicit value assigned to it, it gets the value undefined:

If you do not pass a parameter to a function, and you access that parameter in the function, it will have the value undefined:

If your function returns nothing, and the return value is assigned to a variable, that variable will receive the value undefined:

If you access an element of an array that has nothing in it, you will also get undefined:

Naturally, you can also explicitly set something to undefined:

While there are other obscure ways to generate undefined, such as let myVariable = void 0, the previously listed cases are the most common ways.

In contrast, null never comes as a result of no value being assigned to a variable or parameter. You must always explicitly set something to null. This happens either through an explicit assignment:

Calling a browser function that returns null:

Or deserialization a JSON value:

Why it is Desirable to Avoid Null

Supporting both null and undefined creates more work if you are routinely operating with both null and undefined in the data flows in your application, you are forced to do one of three things:

You must be studiously aware of which value is going through each data flow in your application
- This is unrealistic in all but the most trivial of applications. (See the note about TypeScript below)
You must routinely be checking for both in your data flows
- e.g. if (myValue === undefined || myValue === null) {…}. This makes your code harder to read.
You must use the regular equality operators
- For example, if (myValue != null) {…}, rather than the type-safe equality operators, if myValue !== null {…}

This is definitely a risky pattern. There are eslint rules that can help you do the right thing here (e.g. "eqeqeq": ["error", "always", {"null": "ignore"}]), but at best this introduces what your eyes will see as unnecessary variability in your code. At worst (if you are not using eslint’s rule), it will introduce bugs that you will overlook in code reviews.

Supporting Only `null` Is Not Easy

Because undefined is so extremely easy to introduce into your data flows, it takes substantial work to keep it out. If the goal is to get the highest quality of code with the minimal amount of work, this is not a good candidate for meeting that goal.

Supporting Only `undefined` Is Not Hard

In practice, it is easy to capture the null values any time that they would get introduced into your code and immediately switch that value to undefined. e.g. for function return values you can do something like:

For both browser and third party API’s, you can often figure out if a function returns (or expects) null from the documentation and convert, as above (though, you do need to be careful because both "" (empty string) and false will also be converted to undefined in the example above). If there are TypeScript definitions for your library, that can also help considerably, since TypeScript can make explicit the types libraries use. One common case is dealing with JSON data structures (unlike JavaScript, JSON has no notion of undefined). But a simple conversion routine solves this:

Similar things can be done for return values from third-party functions.

Null Doesn’t Trigger Default Values

If you have a function function myFunction(x = 5), when you call myFunction() or myFunction(undefined), then x will be set to 5. But if you call myFunction(null), then x will be set to null. If you have variables that could be indiscriminately set to either null or undefined (a situation which is very likely if you are allowing both into your application code), these default values will not always be applied as you are likely to want them to be applied.

On the other hand, if you are treating null and undefined distinctly, then you may actively find it useful to be able to not get that default value by deliberately passing null. But as mentioned elsewhere, the effort to make this safe doesn’t seem like a good tradeoff to make.

`typeof null === "object`“

One of the most frustrating things about null is that typeof null === "object". This means that if you do allow null into your data flows, when you do typeof on any variable that might receive null, you must check whether an object result means null or some form of {…}.
Similarly, if you are checking whether something is an object, you must then make sure it isn’t a null before you dereference it.

undefined has none of these problems, since typeof myValue === "undefined" is not ambiguous.

A Note About TypeScript

If you use TypeScript and use it with a great deal of discipline, it is much more realistic to manage both null and undefined flowing through your application. However, what this will also create for you is many situations where some data structures have null values in one part of the application, and their equivalents in other parts of the application will have undefined. Because these will collide and conflict, you’ll either end up with type declarations like myProperty: null | undefined or you will end up having to do a lot of data conversions at various unpredictable places in your application. In the end, while the explicit treatment of these types is a big improvement over JavaScript, the hassle factor remains unchanged (or even worse). Even with TypeScript, then, it still seems better to simply keep null out of the data flows in your application.

Conclusion

Because undefined is, effectively, unavoidable, because null is pretty easy to keep out of the data flows in an application, and because the code is simpler by not needing to manage both data types, I’ve found it more productive just to ignore the existence of null except at necessary interface boundaries where we simply convert to undefined.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

‍

Technical

Build an API with Node.js, Express, and TypeScript

Build a RESTful API with Node, Express, and TypeScript, and integrate Split feature flags to safely manage feature rollout and experimentation.

Harness Team

February 24, 2021

Time to read

A RESTful API in JavaScript can be built from scratch very quickly. It usually means using a Node.js environment and a server run by the Express library. One could argue that a downside of such a technical stack would be the lack of types – the fact that JavaScript isn’t a strongly typed language. But you can instantly stop worrying and learn to love JavaScript, well, actually – love TypeScript – JavaScript’s type-safe superset and an excellent tool for a better and safer development experience.

Let’s run quickly over the topics of this tutorial:

Creating an Express-driven Node.js application exposing a REST API
Making the application code strongly typed with TypeScript
Adding new feature using feature flag branching (branch by abstraction) with Split
Testing the configured percentage split

Wondering what the app’ll be about? Do the time zones of the world sound interesting enough? Let’s get started!

Deploy Continuously with Confidence – Sign up for a Split demo!

Speed up development cycles, reduce release risk, and focus your team on DevOps best practices that create maximum impact. Get a personalized demo today!

Prerequisites for Node and TypeScript

For the best experience inside this tutorial, you need to have:

Basic knowledge of Node.js and Express
Basic familiarity with the command line
Node.js and npm installed
A Split account

If you want to follow along by inspecting the codebase while reading the next sections, the full code example is available on a splitio-examples GitHub repo.

Set Up Your TypeScript Development Environment

You’ll start by creating a project directory and move to its root folder:

Start the npm project by running npm init, creating a package.json file. As an alternative, you can copy the following JSON structure to the package.json that you’ll make on your own:

If you plan to use TypeScript in your application, it’s best to hook it up at the very beginning, as TypeScript will provide useful development features while you code. Not surprisingly, it is installed as an npm package called typescript, and here you’ll install yet another one – ts-node:

typescript package is the key library in all the applications using TypeScript inside the codebase. It transforms the TypeScript code to JavaScript in a process called transcompiling or transpiling. The subtle difference from the term compiling exists, as compiling defines code transformation from a high-level programming language to a low-level one. At the same time, transpiling is the term used for code transformation between high-level languages. However, in TypeScript ecosystem(s), you’ll probably run into both terms.

ts-node is a useful package that enables running TypeScript files (ones having the .ts extension) from the command line within the Node environments.

The -D, also known as --dev, means that both packages should be installed as development dependencies. After the installation, you’ll find the devDependencies property inside the package.json populated with these packages.

Note: Node environment (or browser environment in any client-side app) still only understands the JavaScript language. The TypeScript code needs to be transpiled to JavaScript before the package is used in runtime. If someone were using your app as an npm package, he wouldn’t need to install the typescript dependency, as that person would only use the runtime version of the application/package. For that reason, typescript is a development dependency.

Next, create a tsconfig.json file in the project’s root folder. The presence of a tsconfig.json file in a directory indicates that the directory is the root of a TypeScript project. Also, this file allows you to configure how the typescript library will compile the TypeScript code inside the project. Populate the file with the following JSON:

The crucial property of the configuration file is called compilerOptions. Options set here define most of the TypeScript configuration. Let’s cover some of the basic ones.

module specifies a module system to be used in the compiled JavaScript code. The standard module system inside a Node environment would be CommonJS.
target property defines the targeted JavaScript version of the compiled code. Since the code can be run on your server, inside a Node environment, the ES6 JavaScript version is good to go. But, if this was a client-side app that runs in, e.g., Internet Explorer browser, then you should aim for lower ECMAScript versions and have something like "target": "es5".
rootDir defines the root location of typescript files inside the project. It doesn’t necessarily need to be the root of the project folder, like here.
esModuleInterop flag enables default imports for TypeScript modules with export = syntax that you’ll need for importing from the Express library later on.

Bear in mind that this basic set of tsconfig.json options is just something to get you started. TypeScript and its compiler are highly configurable, and there is a lot more to learn about it. Feel free to explore the TypeScript docs for more information.

What Are Declaration Files in TypeScript?

Declaration files describe types of various JavaScript APIs to the TypeScript compiler. In your project, you’ll be defining your own types, but you’ll also need types for various Node APIs or different external packages that you’ll be using, like Express. These files often come with the .d.ts extension. They are used for TypeScript module resolution. Those files are modules that don’t have any code implementation inside but serve as a layer that describes the JavaScript implementation behind it by its type.

Some external JavaScript libraries have the TypeScript declaration files shipped within the npm package (like the one you’ll use later on – @splitsoftware/splitio). In contrast, the other declaration files need to be installed as a separate package that usually comes with a @types namespace prefix, provided by the DefinitelyTyped project. Node APIs type definitions also need to be fetched from the @types namespace, so let’s first install the @types/node package as a development dependency:

Build an Express Server in Node

The next thing you’d need is to install Express.js, a popular package to create a server in Node.

With the TypeScript context in mind, let’s also install the types for Express as a development dependency:

In the root directory, create an app.ts where the server application will run:

The express module is required to create a server. With this configuration, the server will run on port 3000, with the base URL where the application runs being http://localhost:3000.

Install the utility package Nodemon, which will speed up your development by automatically restarting the server after each change. Nodemon is also installed as a development dependency, as you only need it during the development phase.

In the package.json file, inside the scripts property, add a script named serve with nodemon app.ts command that will be used to start the server. Remember, the ts-node package makes this possible under the hood, as normally you wouldn’t be able to start typescript files from the command line.

Now you can start your server by simply running:

The following should appear in the terminal:

Alternatively and without Nodemon, you could run the server with npx ts-node app.ts.

Add an Endpoint inside the Node Server

Change the import statement on the first line, so you also import the TypeScript interfaces that will be used for request, response, and next parameters inside the Express middleware.

As mentioned, this application will be all about the time zones of the world. The REST API will have a single GET /timezones endpoint, which will return the hardcoded list of locations with a timezone name, abbreviation, and the UTC offset. Since there is only one route, let’s just put it inside app.ts, by adding this code:

Note: Hardcoded in-memory data is something you’ll be using inside this example. In a real-world app, you’ll want to replace the hardcoded data with the one stored in a database.

Notice how this endpoint returns a list of locations with a type LocationWithTimezone you just easily defined using a TypeScript interface. There is no conceptual difference between this type you made yourself or any other type, e.g., the one imported from an external declaration file. They both present the same TypeScript mechanism to describe a JavaScript entity. To make sure this route works, you can test it against a request made with curl from the terminal:

This is the response you should see in the console:

Create a Feature Flag for Your Node + TypeScript App

Feature flags are used in numerous production applications around the world, and there is sound reasoning behind that. The natural state of living applications require updates from time to time, maybe even on daily or hourly basics. Every update or every new version of the application is a potential way to introduce a new bug. Feature flags come in handy in those situations, as they give you the ability to serve the latest version of the application only to a specific target inside your audience first.

Inside this tutorial, the newly introduced feature, or a code update, will be a new location added to the list of locations returned by GET /timezones route – a warm destination in the heart of Africa, known as Kenya. You don’t want all application users to get the Kenya location’s data at first. Maybe you want to see if that data is even relevant to the users, so you’ll want to do some kind of A/B testing first – let only half of the users get the time zone information about Kenya. Let’s start with the feature flag configuration.

To create a feature flag, you’ll need access to Split application. If you don’t have a Split account yet, you should register one to continue. After you log in to Split, navigate to the Splits section on the left and click Create Split. The dialog will prompt you for the split’s name, which you can define as timezone_split. Leave the default settings for everything else and click Create to finish.

You should see a newly created split with a Staging-Default environment preselected :

If Prod-Default environment is preselected, switch to Staging-Default by using the dropdown in the upper left corner:

To configure the split settings, click Add Rules.

The state of a feature flag in Split is known as treatment. The most common values for treatments are on or off, but you can use anything else. As configured here in the Define treatments section, when the treatment is on, users will get a new location in the given response of the GET /timezones endpoint. If the treatment is off, the same endpoint will return the original list of locations with timezones, without Kenya.

Now, let’s set up the targeting rules, where you’ll define the targeted audience for this split. The split will be configured as a percentage split, and that kind of targeting is set up inside the Set The Default Rule section. Percentage split means that treatments will be randomly distributed between users in the percentage you define. As seen in the next picture, you’ll define that half of your users get the on treatment, leaving the other half with the off treatment.

Note: There can be situations in which the split won’t be active in the application for various reasons, so the users will branch according to what you’ve set up inside the Set The Default Treatment section. A good practice here is to have the off treatment as the default one, as you probably don’t want new features to be accessible to everyone without being tested first.

After that, you click Save changes and then Confirm, resulting in the split settings being saved.

Connect Your Feature Flag with Node

Back in the application code, Split Node.js SDK is needed to apply the previously set logic in the application runtime. It can be installed via npm, and it also ships with the TypeScript declaration files, so you don’t need to install a separate package for that:

Add the following code in the app.ts. Optionally, you can put the import statement at the top of the file.

The API key you’ll use here is the one you can see in the Admin Settings of the Split dashboard. The key you search for is the second one from the – staging one for the SDK. Of course, in a production app, it is a good practice to store this key as an environment variable on your machine.

Note: Stop for a second on the line in the previous snippet where you import from the Split library: import split = require('@splitsoftware/splitio'). TypeScript module provided by Split uses the export = syntax for exposing its members, so this kind of import is needed according to TypeScript documentation. More on that topic can be found here.

As SDK now communicates with the app, it can be utilized to get a respective treatment for a user’s request with the getTreatment method. The method receives two arguments – key and split name. You can use the key to identify a particular user and calculate a specific treatment for that user. Simultaneously, the split name is the identifier of the split configured in the Split dashboard.

A good idea is to put the logic for calculating the user’s treatment inside an Express middleware. Execute the function against each API request before proceeding further. That middleware can read the user’s authentication data, e.g., the data stored in the authorization header, and use it as a key for the getTreatment method. The second argument is the name of the previously configured split (timezone_split).

Note: In the live application, you’d want to have a more robust authentication mechanism to identify your users, but here we’ll just be sending the unencrypted user’s data in the authorization header of each request.

Place the following code above the app.get...; line:

The sole purpose of the getTreatmentMiddleware is to put the treatment on the request object and proceed to the next middleware, which is the getLocationsWithTimezones function.

After adding this code, you’ll be getting an error from the TypeScript compiler – and a completely legit one – as there is no treatment property present on the Request interface, which is a type assigned to the request parameter. You shouldn’t think of this as a bad thing. TypeScript is doing here what the language was made for. It warns the developer about the type errors in compile time to avoid (much more severe) errors for the end-user in the runtime. Avoid the compile error by using a technique called declaration merging. That will effectively extend the Request interface provided by Express with your custom treatment property.

The way to expand Express declaration types is to create a declaration file inside the custom @types folder, with a structure that simulates the one located in ./node_modules/@types. That means creating an index.d.ts file located at ./@types/express/index.d.ts that will expand on Express type definitions found at ./node_modules/@types/express/index.d.ts.

Create the ./@types/express/index.d.ts file with:

Your project tree should look similar to the structure on the image:

Populate the file with this code:

Were you wondering about this wandering export {} statement from above? It is needed for this file to be understood as a typescript module, as only files with top-level import or export statements are interpreted as modules.

Add the typeRoots property to the tsconfig.json file, so the file now looks like this:

This will ensure that the TypeScript compiler will search for types not only inside the ./node_modules/@types folder, which is the default but also in the custom folder ./@types that you’ve made. Finally, you can use the treatment property inside the Request interface in the app.ts, and the error in the console should disappear.

Let’s take a quick look on the angle bracket syntax (<SplitIO.SplitKey>) in this line of the middleware: const key: SplitIO.SplitKey = <SplitIO.SplitKey>request.headers['authorization'];. The angle brackets provide a nice TypeScript feature of type casting a variable from one type to another. In this particular case, request.headers['authorization'], of type string, is casted into SplitIO.SplitKey type, as getTreatment function’s type definition expects the first argument to be of the SplitIO.SplitKey type.

Edit the routing line by adding a treatment middleware:

Now it’s time to use the request.treatment value for branching inside the endpoint function.

Route Branching with Feature Flags in Node

You should rework the getLocationsWithTimezone function to perform branching regarding the request.treatment value. Users who hit the on treatment will get an extra location in the response list – Kenya.

Things will soon get more clear after you check what the endpoint returns with a curl request that uses an authorization header:

Note: The value of user1 obviously doesn’t make much sense as the real authorization header. It’s used here just as an example to show how the feature flag key is used and what calculations are made based on that value.

Since we set the treatment up with a 50/50 split, you could get either response. So the Schrodinger’s response you are getting either includes or doesn’t include Kenya. The important thing is that every subsequent request with the same header value, or the same Split key, will give you the same treatment. Split ensures a consistent user experience.

Calculations that Split performs behind the scenes guarantee that for the same key parameter (user1), the getTreatment method returns the same treatment value every time, or at least until you say so.

Now try something different; for example, increment the number in the header value:

Even a minor change in the Split key that gets sent with the getTreatment method results in a different set of treatment calculations provided by Split. Again, this request has a 50% chance of resulting in either the on or the off treatment. If you are getting back the same response for authorization:user2 as for the authorization:user1, you can keep incrementing the number in the header, and eventually, you’ll get a different treatment resulting in a different response.

Reallocate the Feature Flag

Sometimes the consistent user experience provided by percentage split isn’t what you necessarily want all the time. As previously hinted, that can be stopped by the split author. In some situations, you’d like to change the treatment some users are getting without changing the targeting rules. Easily achieve this by split reallocation. After this action, the configuration, including the targeting rules, will remain intact. But you should recalculate the treatment for each user.

To reallocate the split, click on the three-dot menu in the upper-right corner of the timezone_split inside the Split dashboard, and then click Reallocate.

In the dialog, just type REALLOCATE in the first input field and click Reallocate again.

Now restart your server and try the first request again via curl:

Calculate the treatment for this request again, thanks to the split reallocation. There is a 50% chance that this request’s treatment will be different than the one before the allocation. If you are getting the same treatment again, try to reallocate the split and restart the server again. Eventually, you’ll get a different treatment.

Feature Flags in Real World Node + TypeScript Apps

This is a vivid example of how feature flags can serve different responses based on specific targeting. In this case, the targeting was random and consistent at the same time. Imagine an A/B testing situation where you don’t know which solution is better, and you can’t target a specific slice of the users by any definable criteria. But you want them to have a consistent experience during the testing phase. That situation would require random but consistent targeting, as shown here.

If you want to see it all in one place, this is how app.ts file looks like in the end:

Using feature flags can bring your software product to a whole new level. The Node TypeScript example shown here is just a small peek inside various ways to use feature flags to give the best possible experience to your application’s end users.

Learn More About Node, TypeScript, and Feature Flags

You’ve reached the end of the tutorial here. But feel free to continue learning about Node and TypeScript – a complementary set of tools for creating great server-side applications. Here are a few links to get you started:

Bookmark Split’s Node.js SDK Documentation
Build a library book API (another self-contained Node.js tutorial you can use or share): Get Started with Feature Flags in Node
Spin up an example application showcasing integration of the Split JavaScript SDK on a React + Redux codebase.

Switch It On With Split

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Technical

10 Tools Every React Developer Needs

Explore 10 powerful tools for building better React apps faster—plus a bonus feature flagging solution to boost release safety and speed.

Dave Karow

December 4, 2020

Time to read

Creating, maintaining, and deploying your React app can be frustrating if you’re not taking advantage of modern tooling. However, with the hundreds of thousands of tools out there, it can be overwhelming to decide which ones to use. With the end goal of making feature development and coding as smooth as possible, I’ve done some research, and I’m sharing the tools I found that are making my life easier every day.

1. Create React App

Facebook’s Create React App is the known fool-proof way of creating and maintaining a React application. You don’t have to worry about project structure, or which modules to add. Create React app takes care of all of that for you. The thing that makes Create React App stand out is the setup. When you create an app with this tool, you are automatically setting up all of the files that a React app needs to run. You also don’t need to configure anything, as it’s already taken care of for you.

2. No Sweat ESLint and Prettier Setup

This ESLint and Prettier setup from Wes Bos is one of the best linting packages you can use. It lints JavaScript based on the latest standards, fixes formatting errors with Prettier, and lints and fixes everything inside of html script tags. My favorite part about this configuration is that it automatically adds semicolons where necessary!

3. XState

XState is a library for creating, interpreting, and executing JavaScript and TypeScript finite state machines and statecharts for modern web development. With useState, you can represent a piece of state in your component that can be changed at any time. But since the state is directly “changed”, it’s unclear as to how the state can change; this logic is scattered in event handlers and other parts of your component. With useReducer, the logic can be represented clearly in a centralized place — the reducer. Events can be “dispatched” to the reducer, which determines the next state based on the current state and received event. But two things are unclear: you don’t have a full picture of all the possible logical “flows” that can occur, and side-effects are handled separately. This makes it difficult to know exactly what can happen, as all of this logic is stuck in the developer’s head, instead of the code.

This is where XState and @xstate/react come in. With the useMachine hook, you represent your state and logic similar to useReducer, except you define it as a state machine. This lets you specify finite states that control the behavior of “what happens next” per state, as well as what effects should be executed on state transitions. The result is more robust logic, making it impossible to get into impossible states. Oh, and it can be visualized, automatically!

4. React Testing Library

Testing is all about confidence. You want to be able to ship bug-free applications. As Kent C. Dodds rightfully says, the more your tests resemble the way your software is used, the more confidence they can give you. React Testing Library is an absolutely critical tool to enable you to do this effectively. Plus, it’s recommended by the React team for testing React apps and it’s the de facto standard for testing your React code. Give it a try!

5. React Router

React Router is a collection of components that get composed declaratively with your application. React Router conditionally renders certain components to display depending on the route that’s being used in the URL. The power with React Router is proven when you use multiple routes to determine which component should render based on which path is active at that moment. Many times, if you’re building a single page application, you will use React Router to render specific components that act like separate pages — making it look like your website is made up of more than one page.

6. Styled Components

In a typical React application, CSS files and JS files are separated — meaning your styles and components live in different places. Although this is the traditional way of approaching a React application, it comes with its downsides. For example, if you are looking at a CSS file, you have no clue which component is using it, so if you make any changes to it, you don’t know what it’s going to potentially break. The solution to this problem is to scope your styles so that the layout, styling, and app logic all lives in one place in your code. Because of this, when you make changes, you’ll know what it’s affecting, and you will avoid surprises in production. Styled Components make it easy to understand what’s going on because you won’t have to go back and forth between the CSS and js files — It removes this mapping between components and styles, reducing confusion and time wasted going back and forth.

7. Framer Motion

Framer Motion is a production-ready motion library for React. It’s an open-source prototyping tool for React applications. The motions and animations are powered by the Framer library. Framer Motion includes spring animations, simple keyframes syntax, gestures (drag/tap/hover), layout, and shared layout animations. The beauty of Framer Motion is you don’t need to be a CSS expert to make beautiful animations anymore — you can simply create a prototype and then integrate it with your application.

8. Storybook

Storybook is an open-source tool you can use to develop and inspect UI components in isolation. It makes development faster and easier by isolating components. This allows you to work on one component at a time. You can develop entire UIs without needing to start up a complex dev stack, force certain data into your database, or navigate around your application. It also helps you document components for reuse and automatically visually test your components to prevent bugs.

9. Reach UI

Reach UI provides the foundation for making your React App accessible. Reach UI is a library of accessible React components that have each been tested with Safari + VoiceOver, Firefox + NVDA, and Edge + JAWS. Because they have minimal styling, you can go ahead and add whatever styles you desire to the components.Whether or not you choose to use one or all of the components depending on your scope, Reach UI is a great place to start with accessibility!

10. React Proto

React Proto is another prototyping tool for developers and designers. You can prototype UIs without having to write code for it — you simply drag and drop the components into your existing application or create and export a new app with create-react-app. Once the export is complete, you will have auto-generated code that you can further customize with your team. This takes away the headaches that are brought on by classic CSS.

Fork and Clone Repository.

Bonus: Split

Split is a feature-flagging tool that allows developers to separate code deployment from feature release. With Split, you can test your code in production, run AB tests and other types of experiments, migrate your monolith to microservices, and release features with the click of a button.Split combines feature flags and data. Deploy often, release without fear, and experiment to maximize impact. With Split, you can develop faster, release more often, and maximize your impact. With its numerous integrations, it will fit into your current workstream easily!

Learn More About React and Feature Flags

Using the right tools for your React App can not only make your life simpler but can make development with React more fun! With these ten tools, you will have increased ease of feature development, which will accelerate your releases, and increase your engineering culture. For more information on getting started with feature flags in React, their benefits, and use cases, check out these posts:

We are always posting new content to our Twitter page, @splitsoftware, and our YouTube channel, so be sure to follow us!

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Technical

Implementing Production Testing with Feature Flag Management

Safely test features in production using feature flags, boost release confidence, and implement a 90-day rollout plan for modern delivery.

Dave Karow

August 5, 2020

Time to read

Testing in production is becoming more and more common across tech. The most significant benefit is knowing that your features work in production before your users have access. With feature flags, you can safely deploy your code to production targeting only internal teammates, test the functionality, validate design and performance, fix any bugs or defects, and then turn the feature flag on and allow your users to access it already knowing that it works in production.

Summary

Ensure your automation framework makes it easy to write end-to-end tests and has excellent reporting so that the team knows exactly what happens when production testing fails. If you’re using a feature flag management tool you should set it up to mirror your current environment setup. You should decide with your product team which tests will run at which cadence.

Like most developers in tech, we want results fast. The following plan is both guidance and order of operations for what to implement if you want to start testing in production. If your team is still hesitant or fearful, have them watch my video on testing in production.

The Benefits of Testing in Production

Your users are not going to log into your staging environment to use your software, so why do companies use test environments to test their features before release? The answer is that it’s just been the status quo for so long in software development. The norm is to have your developers deploy their code to staging, have the QA team production testing in staging, and then deploy to production after testing. However, what do you do when your staging test results do not match production testing results? What do you tell the QA engineer who spent so much time testing a feature in staging that it broke in production?

Testing in production with feature flags is a safe way to ensure feature functionality in the environment that your features will live in where user experience is paramount. At the end of the day, no one cares if your features are working in staging, they care if they work in production, and the only way to know if it’s working in production is to test it in production.

Comparing Testing in Prod to Testing a in Staging Environment

Real World User Traffic: Testing in production exposes your software to real-world user traffic, providing invaluable insights that are often difficult to replicate in a staged environment. It allows you to see how your application behaves under real usage patterns.
Load Testing: Conducting load testing in production offers the advantage of seeing how your application handles traffic spikes and heavy usage in real time to ensure proper load balancing. This is crucial for applications that experience variable load, ensuring they remain responsive under different conditions.
Performance Tests: Performance tests in production can reveal issues that may not be visible in a staging environment, such as latency problems and resource bottlenecks, due to the interaction with live databases, APIs, and other services.
Integration Tests: While integration tests are crucial in both environments, in production, they validate the interaction between your application and external services under real operational conditions, ensuring all components work harmoniously.

Testing in a Staging Environment

Controlled Conditions: A staging environment allows for testing in a controlled, stable setting that mirrors the production environment but without affecting real users. This is essential for thorough testing before deployment.
Unit Tests and Integration Tests: Both unit tests, which test individual components for correct behavior, and integration tests, which ensure that all parts of the application work together correctly, are ideally conducted in a staging environment. This allows testers to isolate and fix issues before they impact users.
Load Testing and Performance Tests: While these tests can also be conducted in staging, the predictability of a controlled environment might not accurately capture how the application performs under unpredictable real-world conditions. However, staging allows for identifying and mitigating potential performance issues early.
Testers: Staging environments are predominantly the domain of professional testers who can extensively test and debug issues before any code reaches the production stage. This helps in ensuring that only well-tested and verified changes make it to the live environment.

Testing in production with feature flags makes code changes, rollbacks and the development process easier for software engineers. Even enabling modern methodologies like continuous delivery.

30-60-90 Day Implementation Plan

So you’re ready to implement, now what? With this sample 30-60-90 plan, you can be testing in production in just 90 days. For more complex systems or larger teams, you can expand the timeline, or for the smallest of orgs, you might be able to work through everything in the first 30!

The First 30 Days

The focus of the first 30 days should be project alignment and if you’ve chosen to implement a feature flag management tool like Split, education. This is when you hammer out the details that will make testing in production work for your team. In the first 30 days, it’s essential to revisit the team’s automation framework that’s in place to make sure it is easy to use and implement. If your organization is having trouble with their existing automation framework, it will be a roadblock in the future, especially when in a testing in production environment. Ensure your automation framework makes it easy to write end-to-end tests and has excellent reporting so that the team knows exactly what happens when production testing fails.

If you’re onboarding a tool to manage your feature flagging and experimentation, like Split, your next step is to go through the administration of said tool. This can include setting up SSO, permissions, user creation, and user maintenance. Once you have these set up, you are ready to implement the appropriate SDK.

During this phase, it is also important to gather baseline metrics for benchmarking. These metrics can include things like time to release, page load time, percentage of bugs in production vs. staging, percentage of bugs found before release vs. after release, etc. Once you have these baseline metrics, you have a standard to compare any changes against. After you release a feature, you can measure its performance and make any necessary process improvements.

Days 30-60

In the next 60 days, if you’re using a feature flag management tool you should set it up to mirror your current environment setup. For example, if you currently have Dev, Test, QA, UAT, and Prod in your SDLC, you should have those accurately reflected in your tool. These environments should mirror your current environment setup in your application. (Eventually, you will only have production and dev here to mirror true testing in production setup). Once you have the environments set up, you should add segments of teams to individually target in each environment. This can be done in the ‘Individual Targets’ Section of your feature flag configuration. For example, if your product team currently validates features after releasing to production, you can add a segment for the product team, and add them to the individual targets in the production environment. This means that while the feature flag is still off, that team will still have access to the feature.

Another important step, regardless of whether you’re using a tool or not, is to differentiate between test data and real data in production. One way to do this is to have a boolean set up for your test entities in production. With Split, you can use is_test_user = true for the test users and is_test_user = false set automatically for real production users. In your BI tool (Datadog, Looker), you can create a separate database for all of the test users’ activity so that you can make business decisions based on real user data, not from data from your automated tests in production.

Alignment on your team’s definition of done is a crucial pillar for success and should happen in this phase. Your entire team should agree that a feature is not considered “Done” until the tests are running in production, and the flag is on for 100% of the population.

Run Your First Test in Prod — Days 60-90

The last phase of your implementation plan is where we get to the fun stuff, your first real test in production. You will deploy your first feature to prod with the default rule off for safety, meaning that only the targeted users will have access to the feature. This can be set up through your feature flag configuration in Split. Then, you will run your automation scripts in production with the test users that you’ve targeted, as well as the regression suite to ensure previously released features continue to function normally. In this time when the feature flag is off and only your targeted team members have access to the feature, you will be testing in production. You will resolve any bugs and validate all proper functionality. Keep in mind that if anything does go wrong, there will be no impact to your end-users because they don’t have access yet.

Once you have confidence that the feature is working properly and you resolve all issues, you then release the feature to 1% of the population through a canary release. With Split’s percentage rollout allocation, you can easily allocate a specific percentage of users to have access to your new feature. As your confidence grows and you monitor your error logs, you can slowly increase that percentage until your entire user base has access to it. Via Split monitoring, you will be able to ensure that there is no negative impact to your baseline metrics, and you can slowly allocate more traffic as your comfort level increases. Finally, you will turn the default rule on already knowing that your feature is working in production.

Once you hit the 90-day mark, you should have a regular test cadence in place. You should decide with your product team which tests will run at which cadence. For example, you can have your high priority test suite run hourly, and a lower priority production testing suite run nightly. You should also have alerting set up for each test so that if a test fails for any reason, you will get alerted and be able to analyze as quickly and efficiently as possible.

At this point, its a good idea to do a retrospective of the process with your team and figure out what worked well and what part of the process needs improvement. Aligning on all the different stakeholders’ roles and responsibilities is imperative for optimal performance.

If you’ve implemented your feature with Split, you’ll have access to our technical documentation and support team throughout the entire onboarding process, and we’re always excited to lend a hand.

Test in Production, Now and Always

Hopefully, I have reduced the burden of creating a testing plan from scratch with this guide. Now you can set up tests in production in as little as 90 days! Remember that most of the pushback of a testing in production environment comes from fear – fear of impacting your user base, fear of impacting real data in the live environment, and fear of generally messing everything up in production. This fear and these risks can all be mitigated with feature flags. When implemented correctly, feature flags open the door to so many possibilities, not only testing in production, but canary releases, too. Don’t even get us started on the pre-production environment.

Learn More About Testing in Production and Feature Flags

Testing in production can be overwhelming when you don’t have a plan. With this implementation guide and the following resources, you’ll be on your way to testing like a pro no matter the use case.

Read about the 5 Best Practices for Testing in Production
If you’re having trouble convincing your team to test in production, link them to our Breakup Letter to Staging
Learn more about The Benefits of Feature Flags in Software Development

To stay up to date on all things testing in prod and feature flagging, follow us on Twitter @splitsoftware, and subscribe to our YouTube channel!

Switch It On With Split

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Explore more on-demand webinars: Testing in Production With Feature Flags & Continuous Deployment

‍

Technical

Why Would You Decouple Deployment from Release?

Decouple deployment from release to test safely, limit risk, and ship faster without user impact or rollback stress.

Dave Karow

August 13, 2019

Time to read

Last week, I spoke about the foundational idea of decoupling deployment from release. This week, let’s answer the question, “Why would I want to?”

At a high level, there are really just two reasons:

1. You want to prevent code that’s still a work in progress from being exposed to users before it’s ready.

If you are using Trunk-Based Development, where all code is committed to the main/trunk no less than once a day, you need this decoupling or else your work in process will go live the next time a deploy happens.

Even if you aren’t using Trunk-Based Development, you may want to put code on production in a way that only the dev team can execute it.

That leads us to the second reason you would want to decouple deployment from release:

2. You want to safely test in production, limiting the blast radius if unexpected bad things happen.

Even when you think you are “done” building and testing a feature, there’s still a chance that bad things can happen when that code hits production.

Testing in production may start with just the dev team.

From there, you can proceed through orderly stages of exposure, while checking the health of system and user behavior metrics at each step along the way.

Start small, and learn with less risk

The idea is to start with smaller, low-risk user populations and then to ramp up to larger, higher-risk user populations if things go well or to ramp down to internal users only or just developers doing debugging if things don’t go well.

Don’t roll back or roll forward, just un-release instead

When you decouple deployment from release, you can control the exposure of your code without a rollback or a roll forward. If something goes wrong, there’s no need to re-deploy the prior version or hastily build, test and deploy a patch, since you can simply “un-release” it from any population.

The bottom line is that decoupling deployment from release enables teams to ship more often with greater safety. That, by the way, is why we named this blog and videos, Safe at Any Speed.

Up Next: Progressive Delivery

Next time, we’ll start a two-part series on Progressive Delivery, a term that’s been gaining more traction in the last few months. James Governor of Redmonk coined that term after a conversation with Sam Guckenheimer, the Product Owner of Azure DevOps at Microsoft.

What was it Sam said to James?

Well, when we’re rolling out services. What we do is progressive experimentation because what really matters is the blast radius. How many people will be affected when we roll that service out and what can we learn from them?

InfoQ

It’s not just about “turning things off” when things go wrong, but about learning at each stage of the rollout. Being able to turn things off is nice, but without learning, you aren’t really safe, are you? See you next time!

Jump to the next episode of Safe at Any Speed: The Path To Progressive Delivery.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Technical

7 Ways Feature Flags Improve Software Development

Feature flags drive safer rollouts, empower experimentation, simplify migrations, and enable faster, more resilient software development.

Harness Team

September 16, 2020

Time to read

Feature flags. In modern development they’re an incredibly common building block we interact with every day. Yet they often don’t achieve their full potential by driving the kind of rapid testing and experimentation that will make your team and organization truly world-class (or truly resilient to the growing uncertainties of our world).

The bottom line is that every API, web app, SaaS platform… essentially every tech company, should be using a robust feature flag system. A well-built and well-implemented system will provide a host of value-adds and efficiencies for your dev team.

1. Feature Flags Enable Controlled Rollouts

A controlled rollout is “a feature rollout with highly granular user targeting. It allows you to release new features gradually, ensuring a good user experience for smaller groups of users before releasing them to larger groups.”

Controlled rollouts typically fall into two types, and both are best implemented via feature flags. In the first type the feature is released to a percentage of users (this can even be your internal or beta testers), and in the second the users are selected based on a specified attribute, like location, or IP address. Without a feature flag system in place, you’re likely going to be building this rollout mechanism yourself, and it will have limited reusability. A robust feature flag system will inherently enable either type of controlled rollout.

Beyond making targeting easier, feature flags make rollback a snap. There’s no code to re-deploy, you simply toggle your new feature off to the test group and relax in the knowledge that the majority of your user base were unaffected.

2. Feature Flags Centralize Visibility and Control

If all new features and updates deploy under the same feature flag system across your entire organization, you immediately ensure a level of safety and resilience in your app or API. If a rollback is needed it can be toggled by anyone with the appropriate permissions, vs. only those familiar with the specific code being deployed.

By using a master dashboard like the one in Split’s Feature Delivery Platform, you can control and monitor feature flag activity across multiple software products, and enable your Product Management team to take much of the release burden off of DevOps.

3. Feature Flags Enable Testing in Production

Let’s get real for a second. Everyone hates staging. Staging environments never exactly match production AND they’re costly to maintain. Truly, the only way to know that your features are working in production is to test them in production.

Since feature flags allow for continuous deployment of code directly into production, you can perform usability testing in production as well. Just enable your new features for devs and testers initially. Then verify the new code in production before a single customer is impacted by your changes.

4. Feature Flags Give You a Kill Switch

If you’re going to test in production, how do you do so safely? Simple, you need an easy to use kill switch for every feature flag, so that anyone on your team can immediately disable any flag if a problem is detected. (And hey, with feature flags it’ll also be easy to redeploy that feature you killed when it bugged out!)

A kill switch isn’t the same as a rollback either, if that’s what you’re thinking. It’s an off switch. When you do decide to turn it back on post-fix, you won’t have to go through a code review process or even revert the change that caused the issue. You, your teammate, or your PM can just flip the switch. (With a feature flag management platform this would also mean that those same PMs can kill a feature easily, without needing to stop the line and pull in a developer.)

5. Feature Flags Separate Code Deployment from Rollout

Code deployment and software rollout don’t need to occur simultaneously anymore.

You’re a savvy developer. You’re ready to deploy your new feature to prod weeks before your PM is ready to begin testing and optimization. Deploy your code behind a feature flag and move along with your sprint, the PM can have you toggle it on when they catch up (or, like I keep mentioning, if you have a management platform they can self-serve this function).

Bottom line? Feature flags allow you to deploy new features to production and choose when to enable them.

6. Feature Flags Enable Localization and Internationalization

This goes to one of the core functions of feature flags, user segmentation. Segmentation allows you to test in production with a canary, it enables randomized groups for experimentation, and it allows you to group your users by attributes. If localization and internationalism matter for your app or API, you can easily build that into your feature flag structure and only enable features in the appropriate regions, serve up language variations for your platform, or solve really any other demographic-specific need.

7. Feature Flags Simplify Migrations

Don’t push code changes at the same moment you’re ready to switch over to a new database or backend service: write the code, deploy and test it in production ahead of time, and use feature flags to cut over customer traffic the moment you’re ready. The benefits here are huge. Infrastructure changes are particularly fraught, and concerns can be mitigated slowly and responsibly with feature flags. Cut over one customer or region at a time to ensure the rest of your infrastructure can manage the load.

Bonus — Feature Flags Enable Automation and Experimentation

I’ve mentioned it before, but let’s recap… Feature flags, and especially a robust feature flagging platform, allow your application to automatically toggle features in response to changes in system performance or other predefined triggers without any human involvement. For example, when peak load occurs, you might turn off a recommendation engine or an inventory check before allowing an item to be put in the cart.

Feature flags also enable experimentation. While this isn’t purely a software development concern, I promise your PM cares about testing their features. With feature flags you can configure statistically rigorous tests that will drive real business impact, all based on how your customers are already using your features, and how they interact with new features you toggle on.

Learn More About Adding Feature Flags to Your Application

Here at Split we obviously think feature flags are the bomb, and that they’ll solve all your development woes. They might even be able to put to rest the age-old argument of tabs vs. space. We don’t know how, but we believe.

Anyway, if you’re interested in implementing feature flags in your application (and sharing the joy of flags with us) we’ve got some content just for you!

And as always, we’d love to have you join us on your social media platform of choice. We share all our latest content, features, and integrations on Twitter, LinkedIn, and Facebook, and we release new videos regularly on YouTube.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

‍

Technical

Automating Trunk-Based Development With CI/CD

Automate trunk-based development with CI/CD to improve speed, reduce errors, boost quality, and deliver reliable software at scale.

Harness Team

February 23, 2023

Time to read

The usage of a single, common codebase by all developers on a team is emphasized by the software development method known as trunk-based development (TBD). Instead of building feature branches, developers use this method to commit changes straight to the trunk, often known as the main branch. With frequent integrations and codebase validations, this strategy ensures that the code is constantly in a usable state.

Trunk-based development has grown in favor in recent years. This is due to its capacity to shorten development cycles, enhance teamwork, and lessen the likelihood of disputes like merge conflicts. Manual testing and validation can be laborious and prone to mistakes. As a result, Continuous Integration (CI) and Continuous Deployment (CD) are becoming a popular way to achieve trunk-based development.

What’s CI/CD? It’s an automation technique that gives developers a consistent and efficient way to integrate code changes, carry out automated tests, and release software to production. With CI/CD, teams can decrease the time and effort needed to remove new features and updates. They can also improve the quality and dependability of the product by automating build, test, and deployment processes.

Advantages of CI/CD

There are many advantages to automating trunk based development with CI/CD.

First, developers can quickly find and fix problems with automated testing and validation, which reduces the impact on the development cycle and saves time. As a result, there is a lower chance of introducing defects into the codebase. Instead, teams can find faults and flaws earlier in the development process.

Second, automation speeds up teams by minimizing human tasks like creating, testing, and deploying features. Therefore, developers are freed from having to spend time on tedious and time-consuming chores. As a result, teams are empowered to concentrate on building new features and enhancing the user experience.

Finally, automation boosts software quality assurance. Teams can guarantee that the codebase is constantly working by implementing automated testing and validation, which lowers the chance of errors and flaws in production.

Trunk-based Development is a methodology that encourages teamwork, flexibility, and effectiveness in software development. Teams can build and deliver software more quickly, confidently, and effectively by utilizing the advantages of CI/CD and trunk based development. Speed up the development cycle, decrease errors, and improve the reliability of applications by automating them with CI/CD.

Continuous Integration (CI) and Continuous Delivery (CD) in the Context of Trunk-Based Development

Software development teams may build, test, and release code fast and reliably thanks to the automation approaches of continuous integration (CI) and continuous delivery (CD). With frequent integrations and validations, CI/CD is essential for guaranteeing that the codebase is constantly in working condition.

What Are the Differences Between CI and CD?

Constantly integrating updated code into a shared repository, where it is automatically produced and tested, is known as continuous integration. This minimizes the possibility of introducing bugs into software by ensuring that errors or conflicts are found early in the development cycle. Using CI, developers may find problems fast and fix them before they have a chance to affect the rest of their team.

Automating the deployment of code changes to production is known as continuous delivery. Through CD, teams can quickly and reliably release new features and upgrades with little risk and delay. This greatly decreases the time and effort needed to remove software while simultaneously improving the quality and dependability of the product by automating the deployment process.

Teams that practice trunk-based development through CI/CD receive faster feedback. They can detect problems early and save time and effort through constant validation. This frees up developers to concentrate on building new features and enhancing the user experience rather than troubleshooting problems.

What are the appropriate platforms and technologies to help implement CI/CD in trunk-based development? Jenkins, Travis CI, CircleCI, and GitLab CI/CD are some standard CI/CD tools. These solutions allow teams to accelerate the development cycle with capabilities like automated testing, build automation, and deployment automation.

After choosing the right tools and platforms, teams must establish the CI/CD pipeline phases for trunk-based development. Any pipeline should be separated into stages for creating, testing, and deployment. Plus, each stage should be designed to give the development team prompt and accurate input. Set up the appropriate pipeline to ensure that code updates are validated quickly and accurately.

Teams should use code to put the CI/CD pipeline into practice. By specifying the pipeline steps and configuration in code, version control, collaboration, and automation become simpler. As a result, they can grow and optimize their development process with the help of code-based pipelines, guaranteeing uniformity and dependability within the group.

Benefits of Automating Trunk Based Development with CI/CD

Software development teams gain several advantages by automating trunk-based development with Continuous Integration (CI) and Continuous Delivery (CD). Teams can shorten the length of their development cycle and guarantee that the codebase is always functional by automating the build, test, and deployment procedures.

Some advantages of automating trunk-based development with CI/CD include the following:

Increased Collaboration

Trunk based development promotes a single, shared codebase, which fosters cooperation. Teams may collaborate more effectively and spend less time on manual tasks. By automating the testing and deployment process, this lowers the likelihood of mistakes and disagreements.

Faster Feedback

Teams may quickly identify problems and solve them with frequent integrations and validations. This makes moving forward with the development cycle simpler, because finding and repairing errors takes less time and effort.

Reduced Risks

Teams can lower the likelihood of bugs and errors in production by automating the testing and validation process to keep the codebase functional. This can improve the software’s quality and dependability while saving time and resources.

Better Quality

Automated testing produces more detailed and accurate results compared to manual testing. Teams may ensure that the codebase is validated and lower the chance of introducing bugs by automating the testing process.

Faster Deployment

By automating the deployment process with CD, teams can reduce the time and effort needed to release new features and upgrades. Doing this makes it possible to provide new features and upgrades with little risk and downtime.

Increased Efficiency

Automation frees developers to concentrate on developing new features and enhancing the user experience by saving time and effort. This leads to a quicker time to market, a more effective development cycle, and lower expenses.

Software development teams gain several advantages from these strategies. Faster development and deployment, confidence, and high-quality software are just a few benefits.

Setting Up a CI/CD Pipeline for Trunk-Based Development

Choosing the appropriate CI/CD tools and platforms is the first step in establishing a pipeline for CI/CD. As we’ve mentioned previously, GitLab CI/CD, Travis CI, CircleCI, and Jenkins are all great options. When making choices, consider aspects like cost, ease of usage, and compatibility with the technology stack you’re already using. Additionally, consider platforms that might provide more sophisticated automation features.

Setting Up the Pipeline Stages for Trunk-Based Development:

After choosing the platforms and tools, the pipeline stages need to be set up.
Building, testing, and deployment stages—crucial in trunk-based development—should be included in the pipeline.
The development team should receive prompt and accurate feedback from the pipeline phases.
Automated testing ought to be incorporated into the pipeline because it offers a more thorough and precise validation of the codebase.

Using Code To Implement the Pipeline

Using code is necessary to implement the pipeline.
Code-based pipelines provide more straightforward scaling and optimization and a more dependable and consistent implementation.
By specifying the pipeline steps and configuration in code, version control, collaboration, and automation are simpler.

The pipeline steps must first be specified in a configuration file to be implemented using code. A configuration language like YAML or JSON can be used for this. The pipeline steps, the tools and platforms utilized, dependencies, and environment variables should all be specified in the configuration file.

The configuration file can be saved in a version control system like Git once it has been defined. This makes it simpler to collaborate and manage versions, roll back, and recover from problems.

Finally, the pipeline can be used using a program like Jenkins or GitLab CI/CD. The tool will automatically execute the pipeline phases after reading the configuration file and provide feedback and validation to the development team at each stage.

Code-based pipelines provide more straightforward scaling, optimization, as well as dependable and consistent implementation. Choosing the appropriate tools and platforms, defining the pipeline stages, and putting the pipeline into practice using code are all necessary steps in setting up a CI/CD Pipeline for trunk-based development. By following these methods, teams can shorten their development cycle and guarantee that the codebase is always usable.

Automating Trunk-Based Development With CI/CD

More than selecting the appropriate tools and platforms are required to automate trunk-based development with continuous integration (CI) and continuous delivery (CD). Teams should adhere to best practices that ensure efficient and successful implementation to maximize the advantages of CI/CD.

Some best practices for automating trunk-based development with CI/CD include the ones listed below:

Trunk-Based Development Testing Methods and Techniques

Trunk-based development relies heavily on automated testing. However, it’s critical to employ the proper testing methodologies. Trunk-Based Development can use unit, integration, and end-to-end tests. To ensure that the codebase is fully validated, it’s crucial to strike a balance between testing speed and depth.

Processes for Code Review and Collaboration

By regularly examining the code, teams can ensure that the codebase is efficient, maintainable, and consistent. Code reviews also present a chance for learning and knowledge exchange, which can enhance the entire development process.

Ensuring the codebase’s security and quality trunk-based development can be automated with CI/CD to assist in maintaining the codebase’s quality and security. Teams should use automated testing and validation since they can find faults and problems early in the development cycle. Also, it’s critical to ensure that security testing, which can identify vulnerabilities and reduce security risks, is included in the pipeline.

Coding changes must often be merged with the main branch in a project using the trunk-based development methodology. This lowers the possibility of conflicts and merging problems and guarantees that the codebase is always functional. To ensure that the codebase is consistently updated and validated, teams should attempt to merge code changes at least once daily.

Version Control and Release Management Through Automation

Trunk-based development can be complex when managing version control and releases. Teams can handle version control and releases more effectively by employing automation. Automatic release management, tagging, and versioning can save time and labor while lowering the possibility of mistakes and inconsistencies.

Trunk-based development may lead to numerous code changes and integrations, which may put a load on the CI/CD pipeline. Scalability and optimization of the CI/CD pipeline. Teams should ensure the pipeline is flexible and tailored to the project’s size and complexity. This includes providing parallel testing and deployment, employing distributed build systems, and caching dependencies.

Challenges and Considerations for Trunk-Based Development with CI/CD

Although automating trunk-based development with CICD offers software development teams many advantages, some difficulties and factors must be considered. These difficulties include managing releases and version control, handling merge problems and conflicts, and scaling/improving the CI/CD pipeline for big, complicated projects.

Trunk-based development involves numerous developers working on the same codebase, which can lead to disagreements and merge concerns. Teams should have established procedures for handling disputes, such as using pull requests and giving real-time feedback on changes. Also, it’s critical to have plans to deal with significant code alterations and guarantee that the codebase is consistently usable.

Trunk-based development makes it more difficult to maintain version control and release processes. That’s why teams should use automation. They can lean on release branches with automatic versioning, tagging, and release management to ensure that production releases are stable and dependable.

Scaling and Improving the CI/CD Pipeline

As projects get more extensive and complicated, it gets harder to scale and improve the CI/CD pipeline. Teams should ensure the pipeline is flexible and tailored to the project’s size and complexity. This includes providing parallel testing and deployment, employing distributed build systems, and caching dependencies.

Handling Dependencies

Trunk-based Development relies on careful dependency management to keep the codebase operational. This includes, using package managers and other dependency management tools, as well as routinely checking and upgrading dependencies to make sure they are compatible with the codebase.

Security and Compliance Considerations

Trunk-based development with CI/CD can increase the risk of security and compliance concerns. Teams should ensure compliance standards, such as data privacy laws, are satisfied and security testing is included in the pipeline.

Trunk-based development can lead to problems with code quality and maintainability, including technical debt and code complexity. Teams should use automated testing and code review procedures to ensure the codebase is efficient, maintainable, and consistent.

The Future of Trunk-Based Development With CI/CD

With several trends and developments that will influence how software is developed, the future of trunk-based development with CI/CD is bright. What can we expect?

More Automation

Trunk-based development with CI/CD will become more and more common, necessitating a higher demand for automation in the development process. This will improve the software’s quality and dependability while lowering the likelihood of errors and flaws. Thanks to automation, teams will be able to develop, test, and deliver code more rapidly and effectively.

Better Collaboration

Teams will need to find ways to collaborate more successfully as software development grows more dispersed and global. This might involve streamlined code review, collaboration procedures, and tools allowing real-time feedback and communication.

AI & Machine Learning

As these fields continue to grow, they will become more and more crucial to the creation of software. This may involve using AI to code analysis, give developers feedback, or apply machine learning to enhance testing and validation procedures. Thanks to AI and machine learning, teams can automate more complex tasks like locating and resolving conflicts in sizable codebases.

DevOps and Site Reliability Engineering (SRE)

The DevOps movement and SRE will continue to influence software development in the future. SRE will ensure that software is dependable and performant in production environments while DevOps approaches, such as automated testing and deployment, will become widely used.

Containerization and Microservices

As software development becomes more dispersed and modular, containerization and microservices will gain popularity. Microservices allow teams to construct more complicated apps by dissecting larger ones into smaller, more manageable components. Containerization on the other hand will enable teams to bundle and deploy code more effectively.

Cloud Computing

In Trunk-Based Development with CI/CD, cloud computing will continue to be essential. Teams will have access to the infrastructure and resources they need to build and deploy software fast and effectively thanks to cloud services like AWS and Azure. Also, cloud computing will make it easier for teams to grow their infrastructure, which will be crucial for managing big, complex projects.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Switch It On With Split

Split gives product development teams the confidence to release features that matter faster. It’s the only feature management and experimentation platform that automatically attributes data-driven insight to every feature that’s released—all while enabling astoundingly easy deployment, profound risk reduction, and better visibility across teams. Split offers more than a platform: It offers partnership. By sticking with customers every step of the way, Split illuminates the path toward continuous improvement and timely innovation. Switch on a free account today, schedule a demo to learn more, or contact us for further questions and support.

‍

On-demand webinar: How to avoid common CI/CD problems

Technical

Dynamic Configurations: Run more experiments without changing code

Dynamic Configs let teams instantly update feature behavior and UI elements without code changes, accelerating experimentation and iteration.

Harness Team

May 20, 2019

Time to read

A cornerstone of the Split platform is to help our customers decouple the dependencies on engineers making code changes as they control the release of each of their features. To that end, we’ve designed a powerful targeting interface to allow any member of your team to easily place different customers into different variations of a feature. Over the years, we’ve added more powerful targeting capabilities (e.g. regex matchers) and have optimized our UI to make it as simple as possible to control your releases.

To further enhance these feature targeting capabilities, we’re excited to announce the release of dynamic configurations. You will now be able to attach configurations to your treatments to instantly change components of your features without needing an engineer to make any code changes. By using this functionality, you will be able to further speed up your rate of learning as you measure how these different variations of your features perform with your customers.

What should I configure?

During user testing, we have observed many different use cases that Dynamic Configs can solve. Here are some ways Dynamic Configs can be used in your application:

1. Color, Sizing, Polish of UI Elements

We all know how important it is important to critically think through how an application can best draw the attention of a user and help them intuitively navigate to the places they need. Simply changing the color of a border around a box could drastically change how often a user ends up clicking on a specific element in an app. Rather than being hampered by constantly asking an engineer to change small elements like color, users can simply define these configurations in the Split UI. As results come in on what variations perform the best, continued iteration becomes as simple as typing out a new color or size.

2. Educational Copy

Simple and concise copy can also make or break the adoption of an application as different phrases or words can change how quickly a user understands how to use different parts of an application. Users can dynamically configure phrases that appear on buttons or as help text to see what creates the best adoption and retention on different features.

3. Special Offers or Free Shipping Thresholds

Limited time promotions or sales can be a powerful way to increase volume of sales in any sector. However, one of the toughest things to figure out is what level of promotion is necessary to create the right balance between increased volume and loss of revenue per sale. By dynamically configuring percentage discount amounts and free shipping thresholds, users can instantly iterate on and measure what discount or threshold creates the best balance.

4. Algorithm Weighting

In today’s market, all developers must aim to make their application as intelligent as possible and provide personalized results to their audience. As developers tune their backend algorithms to provide intelligent results, they can leverage dynamic configuration to tune weighting on different inputs to an algorithm and measure which weights give the most relevant information back to an end user.

5. Throughput Optimization

Applications today must process and perform calculations on massive amounts of data within a short timeframe. This often leads to a large amount of tuning of a machines threading to parallelize work in the best way possible. Dynamic configurations allow developers to easily change things like how many threads should be allowed and measure what creates the best performance in a real world scenario.

6. Handling For Connection Issues

With the advent of open APIs and microservice based architecture. Rigor around the transfer of data between different machines and services in a distributed architecture has become a hot topic of balancing complexity, cost, and speed. Dynamically configuring how many retries or how long to wait until timing out a request allows engineers to measure what rates create the best balance of data retention and speed.

Changing your configs without code

The Split UI now has a section that allows you to input a set of configurations for each of your treatments. You can leverage a simple key-value pair editor to define standard configurations like copy for a part of your application or promotional values. This centralized interface gives any member of a team from product, marketing, and engineering the ability to make changes to things like the color of a button and have those changes instantly reflected in your application.

If you’re more comfortable with JSON or want to define more complex JSON objects, we’ve also built out a robust JSON editor to let you define as complex a configuration as needed for you.

Once you’ve defined your configurations, our SDKS will automatically download down this information and your engineers can easily access them by calling a simple getTreatmentWithConfig method and leverage them in your application. Below is a quick example of what this would look like in Javascript:

Start using Dynamic Configs today

To learn more about Dynamic Configurations and see how you might be able to leverage them in your application, check out our product documentation. Our team has a ton in store to further enhance our ability to support teams looking to manage their experiments and feature flags. Make sure to subscribe to our release notes to be the first to hear about all of the new features we are building.

Get Split Certified

Split Arcade includes product explainer videos, clickable product tutorials, manipulatable code examples, and interactive challenges.

Testing & Resilience Blogs

Featured Blogs

Applying Feature Flag Context To Your OpenTelemetry Spans

Enhancing Observability with Feature Flags and OpenTelemetry

Prerequisites

Setup

How the Threaded Echo Server works

Adding Feature Flag Treatments to Spans

Next Steps for Feature Flag Observability

Recent Blogs

Harness FME Fast and Furious

A more integrated experience: from Split UI to Harness UI

Built for AI-driven workloads

Warehouse Native Experimentation (beta)

What else is new in Harness FME

What customers can expect next

Get started

Intent-Driven Assertions are Redefining How We Test Software

What Are Intent-Driven Natural Language Assertions?

Harness AI Test Automation Solving Traditional Testing Issues

How Harness AI Test Understands and Validates Your Intent

How Teams Use Harness AI Assertions

Transforming QA with Harness AI: Faster, Smarter, Reliable

DB Performance Testing with Harness FME

DB Performance Testing with Harness FME

Conclusion

AI Agents vs Real-World Web Tasks: Harness Leads the Way in Enterprise Test Automation

AI Agents vs Real-World Web Tasks: Harness Leads the Way in Enterprise Test Automation

Building on Previous Research

The B2B Challenge

Comprehensive Evaluation Results

Additional Web Automation Tasks

Banking Application Navigation

Audit Trail Navigation

Messaging Application Interaction

What Makes Solutions Perform Differently?

Why Harness Outperforms

Conclusion

AI-Powered Resilience Testing with Harness MCP Server and Windsurf

Simplifying Chaos Engineering

Understanding Harness Chaos Engineering MCP Tools

Core Experiment Tools

Advanced Monitoring Tools

Setting Up Harness MCP Server with Windsurf

Prerequisites

Step 1: Build the Harness MCP Server Binary

Building from Source

Step 2: Configure the Harness MCP Server in Windsurf

Step 3: Add the Path of your Binary and Harness Credentials

Step 4: Verify Installation

AI-Powered Chaos Engineering in Action

Discovery and Learning Phase

Experiment Execution Phase

Analysis and Reporting Phase

The Road Ahead

Resilience Testing using Harness

What is Harness Chaos Engineering?

Comprehensive Capabilities for End-to-End Resilience Testing

Real-World Use Cases for Harness Chaos Experiments

Easy Onboarding and Scalability

Deployment Options: SaaS and On-Premise

Conclusion: Build Resilient Systems with Harness

Harness launches MCP tools to enhance its AI powered Chaos Engineering Capabilities

Introducing Chaos MCP tools:

How to setup the Harness MCP Server?

How do I get started with resilience testing using Harness MCP tools?

Video Tutorial:

Important Links:

Harness AI Test Automation: End-to-End, AI-Powered Testing for Faster, Smarter DevOps

What Makes Harness AI Test Automation Different?

Create High-Quality Tests in Minutes: No Code Required

Self-Healing, Adaptive Test Maintenance

Intelligent, Scalable Test Execution

Seamless Integration and Enterprise-Grade Security

Real Results: Customer Success Stories

The Future Is Now: Join the Testing Revolution

Split Embraces OpenFeature

Driving Feature Flag Standardization with OpenFeature

What Is Feature Flagging?

Where Does Split Come in?