


Your developers complain about 20-minute builds while your cloud bill spirals out of control. Pipeline sprawl across teams creates security gaps you can't even see. These aren't separate problems. They're symptoms of a lack of actionable data on what actually drives velocity and cost.
The right CI metrics transform reactive firefighting into proactive optimization. With analytics data from Harness CI, platform engineering leaders can cut build times, control spend, and maintain governance without slowing teams down.
Platform teams who track the right CI metrics can quantify exactly how much developer time they're saving, control cloud spending, and maintain security standards while preserving development velocity. The importance of tracking CI/CD metrics lies in connecting pipeline performance directly to measurable business outcomes.
Build time, queue time, and failure rates directly translate to developer hours saved or lost. Research shows that 78% of developers feel more productive with CI, and most want builds under 10 minutes. Tracking median build duration and 95th percentile outliers can reveal your productivity bottlenecks.
Harness CI delivers builds up to 8X faster than traditional tools, turning this insight into action.
Cost per build and compute minutes by pipeline eliminate the guesswork from cloud spending. AWS CodePipeline charges $0.002 per action-execution-minute, making monthly costs straightforward to calculate from your pipeline metrics.
Measuring across teams helps you spot expensive pipelines, optimize resource usage, and justify infrastructure investments with concrete ROI.
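As a quick sketch of the math, here is how per-minute pricing can be turned into a monthly cost breakdown per pipeline. The pipeline names and minute counts below are made-up placeholders; only the $0.002 rate comes from the pricing mentioned above.

# Rough monthly cost estimate from pipeline metrics, using the
# $0.002 per action-execution-minute rate cited above.
RATE_PER_ACTION_MINUTE = 0.002

monthly_action_minutes = {
    "checkout-service-build": 41_000,   # hypothetical usage numbers
    "payments-service-build": 18_500,
    "web-frontend-build": 9_200,
}

costs = {
    pipeline: minutes * RATE_PER_ACTION_MINUTE
    for pipeline, minutes in monthly_action_minutes.items()
}

for pipeline, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
    print(f"{pipeline}: ${cost:,.2f}/month")

print(f"Total: ${sum(costs.values()):,.2f}/month")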
SBOM completeness, artifact integrity, and policy pass rates ensure your software supply chain meets security standards without creating development bottlenecks. NIST and related EO 14028 guidance emphasize machine-readable SBOMs and automated hash verification for all artifacts.
However, measurement consistency remains challenging. A recent systematic review found that SBOM tooling variance creates significant detection gaps, with tools reporting between 43,553 and 309,022 vulnerabilities across the same 1,151 SBOMs.
Standardized metrics help you monitor SBOM generation rates and policy enforcement without manual oversight.
Not all metrics deserve your attention. Platform engineering leaders managing 200+ developers need measurements that reveal where time, money, and reliability break down, and where to fix them first.
So what does this look like in practice? Let's examine the specific metrics.
Build duration becomes most valuable when you track both median (p50) and 95th percentile (p95) times rather than simple averages. Research shows that timeout builds have a median duration of 19.7 minutes compared to 3.4 minutes for normal builds. That’s over five times longer.
While p50 reveals your typical developer experience, p95 exposes the worst-case delays that reduce productivity and impact developer flow. These outliers often signal deeper issues like resource constraints, flaky tests, or inefficient build steps that averages would mask. Tracking trends in both percentiles over time helps you catch regressions before they become widespread problems. Build analytics platforms can surface when your p50 increases gradually or when p95 spikes indicate new bottlenecks.
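If your CI platform exposes raw durations, the percentile calculation itself is trivial. Here is a minimal Python sketch with illustrative numbers; in practice the durations would come from your build analytics export.

import statistics

# Build durations in minutes for one repository over a week (illustrative data).
durations = [3.1, 3.4, 2.9, 4.2, 3.6, 19.7, 3.3, 5.0, 3.8, 21.4, 3.5, 4.1]

# quantiles(n=20) yields 19 cut points; index 9 is the median (p50)
# and index 18 approximates the 95th percentile (p95).
cuts = statistics.quantiles(durations, n=20)
p50, p95 = cuts[9], cuts[18]

print(f"p50 build duration: {p50:.1f} min")
print(f"p95 build duration: {p95:.1f} min")  # the tail that averages would hide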
Keep builds under seven minutes to maintain developer engagement. Anything over 15 minutes triggers costly context switching. By monitoring both typical and tail performance, you optimize for consistent, fast feedback loops that keep developers in flow. Intelligent test selection reduces overall build durations by up to 80% by selecting and running only tests affected by the code changes, rather than running all tests.

An example of a build duration dashboard (on Harness)
Queue time measures how long builds wait before execution begins. This is a direct indicator of insufficient build capacity. When developers push code, builds shouldn't sit idle while runners or compute resources are tied up. Research shows that heterogeneous infrastructure with mixed processing speeds creates excessive queue times, especially when job routing doesn't account for worker capabilities. Queue time reveals when your infrastructure can't handle developer demand.
Rising queue times signal it's time to scale infrastructure or optimize resource allocation. Per-job waiting time thresholds directly impact throughput and quality outcomes. Platform teams can reduce queue time by moving to Harness Cloud's isolated build machines, implementing intelligent caching, or adding parallel execution capacity. Analytics dashboards track queue time trends across repositories and teams, enabling data-driven infrastructure decisions that keep developers productive.
Build success rate measures the percentage of builds that complete successfully over time, revealing pipeline health and developer confidence levels. When teams consistently see success rates above 90% on their default branches, they trust their CI system to provide reliable feedback. Frequent failures signal deeper issues — flaky tests that pass and fail randomly, unstable build environments, or misconfigured pipeline steps that break under specific conditions.
Tracking success rate trends by branch, team, or service reveals where to focus improvement efforts. Slicing metrics by repository and pipeline helps you identify whether failures cluster around specific teams using legacy test frameworks or services with complex dependencies. This granular view separates legitimate experimental failures on feature branches from stability problems that undermine developer productivity and delivery confidence.

An example of a build success/failure rate dashboard (on Harness)
Mean time to recovery measures how fast your team recovers from failed builds and broken pipelines, directly impacting developer productivity. Research shows organizations with mature CI/CD implementations see MTTR improvements of over 50% through automated detection and rollback mechanisms. When builds fail, developers experience context switching costs, feature delivery slows, and team velocity drops. The best-performing teams recover from incidents in under one hour, while others struggle with multi-hour outages that cascade across multiple teams.
Automated alerts and root cause analysis tools slash recovery time by eliminating manual troubleshooting, reducing MTTR from 20 minutes to under 3 minutes for common failures. Harness CI's AI-powered troubleshooting surfaces failure patterns and provides instant remediation suggestions when builds break.
Flaky tests pass or fail non-deterministically on the same code, creating false signals that undermine developer trust in CI results. Research shows 59% of developers experience flaky tests monthly, weekly, or daily, while 47% of restarted failing builds eventually passed. This creates a cycle where developers waste time investigating false failures, rerunning builds, and questioning legitimate test results.
Tracking flaky test rate helps teams identify which tests exhibit unstable pass/fail behavior, enabling targeted stabilization efforts. Harness CI automatically detects problematic tests through failure rate analysis, quarantines flaky tests to prevent false alarms, and provides visibility into which tests exhibit the highest failure rates. This reduces developer context switching and restores confidence in CI feedback loops.
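Underneath any flaky-test detector is a simple signal: the same test producing different outcomes for the same commit. The sketch below shows that core check with hypothetical results; real detectors add failure-rate analysis and quarantine on top, as described above.

from collections import defaultdict

# Each record is (test_name, commit_sha, outcome); hypothetical CI results.
results = [
    ("test_checkout_total", "a1b2c3", "passed"),
    ("test_checkout_total", "a1b2c3", "failed"),   # failed, then passed on retry
    ("test_inventory_sync", "a1b2c3", "passed"),
    ("test_checkout_total", "d4e5f6", "passed"),
    ("test_payment_refund", "d4e5f6", "failed"),
]

outcomes_per_commit = defaultdict(set)
for test, commit, outcome in results:
    outcomes_per_commit[(test, commit)].add(outcome)

# A test that both passed and failed on the same commit is behaving
# non-deterministically and is a candidate for quarantine.
flaky = {test for (test, _), seen in outcomes_per_commit.items()
         if {"passed", "failed"} <= seen}
print(sorted(flaky))  # ['test_checkout_total']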
Cost per build divides your monthly CI infrastructure spend by the number of successful builds, revealing the true economic impact of your development velocity. CI/CD pipelines consume 15-40% of overall cloud infrastructure budgets, with per-run compute costs ranging from $0.40 to $4.20 depending on application complexity, instance type, region, and duration. This normalized metric helps platform teams compare costs across different services, identify expensive outliers, and justify infrastructure investments with concrete dollar amounts rather than abstract performance gains.
Automated caching and ephemeral infrastructure deliver the biggest cost reductions per build. Intelligent caching automatically stores dependencies and Docker layers. This cuts repeated download and compilation time that drives up compute costs.
Ephemeral build machines eliminate idle resource waste. They spin up fresh instances only when the queue builds, then terminate immediately after completion. Combine these approaches with right-sized compute types to reduce infrastructure costs by 32-43% compared to oversized instances.
Cache hit rate measures what percentage of build tasks can reuse previously cached results instead of rebuilding from scratch. When teams achieve high cache hit rates, they see dramatic build time reductions. Docker builds can drop from five to seven minutes to under 90 seconds with effective layer caching. Smart caching of dependencies like node_modules, Docker layers, and build artifacts creates these improvements by avoiding expensive regeneration of unchanged components.
Harness Build and Cache Intelligence eliminates the manual configuration overhead that traditionally plagues cache management. It handles dependency caching and Docker layer reuse automatically. No complex cache keys or storage management required.
Measure cache effectiveness by comparing clean builds against fully cached runs. Track hit rates over time to justify infrastructure investments and detect performance regressions.
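A back-of-the-envelope version of that comparison looks like the sketch below; the build times and hit/miss counts are illustrative, not real measurements.

# Cache effectiveness sketch: compare a clean (cold-cache) build against
# fully cached runs and track the hit rate over time.
clean_build_minutes = 6.5          # cold cache, everything rebuilt
cached_build_minutes = 1.4         # warm-cache run
cache_hits, cache_misses = 87, 13  # per-step cache lookups this week

hit_rate = cache_hits / (cache_hits + cache_misses)
time_saved_per_build = clean_build_minutes - cached_build_minutes

print(f"Cache hit rate: {hit_rate:.0%}")
print(f"Time saved per cached build: {time_saved_per_build:.1f} min")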
Test cycle time measures how long it takes to run your complete test suite from start to finish. This directly impacts developer productivity because longer test cycles mean developers wait longer for feedback on their code changes. When test cycles stretch beyond 10-15 minutes, developers often switch context to other tasks, losing focus and momentum. Recent research shows that optimized test selection can accelerate pipelines by 5.6x while maintaining high failure detection rates.
Smart test selection optimizes these feedback loops by running only tests relevant to code changes. Harness CI Test Intelligence can slash test cycle time by up to 80% using AI to identify which tests actually need to run. This eliminates the waste of running thousands of irrelevant tests while preserving confidence in your CI deployments.
Categorizing pipeline issues into domains like code problems, infrastructure incidents, and dependency conflicts transforms chaotic build logs into actionable insights. Harness CI's AI-powered troubleshooting provides root cause analysis and remediation suggestions for build failures. This helps platform engineers focus remediation efforts on root causes that impact the most builds rather than chasing one-off incidents.

Visualizing issue distribution reveals whether problems are systemic or isolated events. Organizations using aggregated monitoring can distinguish between infrastructure spikes and persistent issues like flaky tests. Harness CI's analytics surface which pipelines and repositories have the highest failure rates. Platform teams can reduce overall pipeline issues by 20-30%.
Artifact integrity coverage measures the percentage of builds that produce signed, traceable artifacts with complete provenance documentation. This tracks whether each build generates Software Bills of Materials (SBOMs), digital signatures, and documentation proving where artifacts came from. While most organizations sign final software products, fewer than 20% deliver provenance data and only 3% consume SBOMs for dependency management. This makes the metric a leading indicator of supply chain security maturity.
Harness CI automatically generates SBOMs and attestations for every build, ensuring 100% coverage without developer intervention. The platform's SLSA L3 compliance capabilities generate verifiable provenance and sign artifacts using industry-standard frameworks. This eliminates the manual processes and key management challenges that prevent consistent artifact signing across CI pipelines.
Tracking CI metrics effectively requires moving from raw data to measurable improvements. The most successful platform engineering teams build a systematic approach that transforms metrics into velocity gains, cost reductions, and reliable pipelines.
Tag every pipeline with service name, team identifier, repository, and cost center. This standardization creates the foundation for reliable aggregation across your entire CI infrastructure. Without consistent tags, you can't identify which teams drive the highest costs or longest build times.
Implement naming conventions that support automated analysis. Use structured formats like team-service-environment for pipeline names and standardize branch naming patterns. Centralize this metadata using automated tag enforcement to ensure organization-wide visibility.
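One lightweight way to enforce such a convention is a validation check in a pre-registration script or policy step. The regex and allowed environment names below are assumptions for the sketch, not a built-in feature of any particular platform.

import re

# Enforce a team-service-environment naming pattern before a pipeline is registered.
PIPELINE_NAME = re.compile(r"^[a-z0-9]+-[a-z0-9-]+-(dev|staging|prod)$")

def validate_pipeline_name(name: str) -> bool:
    """Return True when the name follows team-service-environment."""
    return bool(PIPELINE_NAME.match(name))

assert validate_pipeline_name("payments-checkout-api-prod")
assert not validate_pipeline_name("Checkout_Pipeline_v2")  # rejected: breaks the convention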
Modern CI platforms eliminate manual metric tracking overhead. Harness CI provides dashboards that automatically surface build success rates, duration trends, and failure patterns in real-time. Teams can also integrate with monitoring stacks like Prometheus and Grafana for live visualization across multiple tools.
Configure threshold-based alerts for build duration spikes or failure rate increases. This shifts you from fixing issues after they happen to preventing them entirely.
Focus on p95 and p99 percentiles rather than averages to identify critical performance outliers. Drill into failure causes and flaky tests to prioritize fixes with maximum developer impact. Categorize pipeline failures by root cause — environment issues, dependency problems, or test instability — then target the most frequent culprits first.
Benchmark cost per build and cache hit rates to uncover infrastructure savings. Optimized caching and build intelligence can reduce build times by 30-40% while cutting cloud expenses.
Standardize CI pipelines using centralized templates and policy enforcement to eliminate pipeline sprawl. Store reusable templates in a central repository and require teams to extend from approved templates. This reduces maintenance overhead while ensuring consistent security scanning and artifact signing.
Establish Service Level Objectives (SLOs) for your most impactful metrics: build duration, queue time, and success rate. Set measurable targets like "95% of builds complete within 10 minutes" to drive accountability. Automate remediation wherever possible — auto-retry for transient failures, automated cache invalidation, and intelligent test selection to skip irrelevant tests.
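Checking an SLO like that against exported build data takes only a few lines. This sketch uses illustrative durations and a hypothetical alerting action.

# SLO check sketch: "95% of builds complete within 10 minutes".
SLO_TARGET = 0.95
SLO_THRESHOLD_MINUTES = 10

durations = [3.2, 4.8, 6.1, 2.9, 11.4, 5.5, 7.0, 3.8, 9.2, 4.4]  # illustrative

within_slo = sum(d <= SLO_THRESHOLD_MINUTES for d in durations) / len(durations)
print(f"{within_slo:.0%} of builds finished within {SLO_THRESHOLD_MINUTES} min")

if within_slo < SLO_TARGET:
    print("SLO breached - open a remediation task for the platform team")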
The difference between successful platform teams and those drowning in dashboards comes down to focus. Elite performers track build duration, queue time, flaky test rates, and cost per build because these metrics directly impact developer productivity and infrastructure spend.
Start with the measurements covered in this guide, establish baselines, and implement governance that prevents pipeline sprawl. Focus on the metrics that reveal bottlenecks, control costs, and maintain reliability — then use that data to optimize continuously.
Ready to transform your CI metrics from vanity to velocity? Experience how Harness CI accelerates builds while cutting infrastructure costs.
Platform engineering leaders often struggle with knowing which metrics actually move the needle versus creating metric overload. These answers focus on metrics that drive measurable improvements in developer velocity, cost control, and pipeline reliability.
Actionable metrics directly connect to developer experience and business outcomes. Build duration affects daily workflow, while deployment frequency impacts feature delivery speed. Vanity metrics look impressive, but don't guide decisions. Focus on measurements that help teams optimize specific bottlenecks rather than general health scores.
Build duration, queue time, and flaky test rate directly affect how fast developers get feedback. While coverage monitoring dominates current practices, build health and time-to-fix-broken-builds offer the highest productivity gains. Focus on metrics that reduce context switching and waiting.
Cost per build and cache hit rate reveal optimization opportunities that maintain quality while cutting spend. Intelligent caching and optimized test selection can significantly reduce both build times and infrastructure costs. Running only relevant tests instead of entire suites cuts waste without compromising coverage.
Begin with pipeline metadata standardization using consistent tags for service, team, and cost center. Most CI platforms provide basic metrics through built-in dashboards. Start with DORA metrics, then add build-specific measurements as your monitoring matures.
Daily monitoring of build success rates and queue times enables immediate issue response. Weekly reviews of build duration trends and monthly cost analysis drive strategic improvements. Automated alerts for threshold breaches prevent small problems from becoming productivity killers.



Modern unit testing in CI/CD can help teams avoid slow builds by using smart strategies. Choosing the right tests, running them in parallel, and using intelligent caching all help teams get faster feedback while keeping code quality high.
Platforms like Harness CI use AI-powered test intelligence to reduce test cycles by up to 80%, showing what’s possible with the right tools. This guide shares practical ways to speed up builds and improve code quality, from basic ideas to advanced techniques that also lower costs.
Knowing what counts as a unit test is key to building software delivery pipelines that work.
A unit test exercises a single part of your code, such as a function, a class method, or a small group of related components. The main point is to test one behavior at a time. Unit tests differ from integration tests because they focus on the logic of your code in isolation, which makes it easier to pinpoint the cause when something breaks.
Unit tests should only check code that you wrote and not things like databases, file systems, or network calls. This separation makes tests quick and dependable. Tests that don't rely on outside services run in milliseconds and give the same results no matter where they are run, like on your laptop or in a CI pipeline.
Unit tests are one of the most important parts of continuous integration in CI/CD pipelines because they surface problems immediately after code changes. Because they are so fast, developers can run them many times a minute while coding. This keeps feedback loops tight, which makes bugs easier to find and stops them from reaching later stages of the pipeline.
Teams that run full test suites on every commit catch problems early by focusing on three things: making tests fast, choosing the right tests, and keeping tests organized. Good unit testing helps developers stay productive and keeps builds running quickly.
Deterministic Tests for Every Commit
Unit tests should finish in seconds, not minutes, so that every commit can be verified quickly. Google's engineering practices say that tests need to be "fast and reliable to give engineers immediate feedback on whether a change has broken expected behavior." To keep tests from being affected by outside factors, use mocks, stubs, and in-memory databases. Keep commit builds under ten minutes, with unit tests forming the base of this fast feedback loop.
As projects get bigger, running all tests on every commit can slow teams down. Test Impact Analysis looks at coverage data to figure out which tests really check the code that has been changed. AI-powered test selection chooses the right tests for you, so you don't have to guess or sort them by hand.
To get the most out of your infrastructure, use selective execution and run tests at the same time. Divide test suites into equal-sized groups and run them on different machines simultaneously. Smart caching of dependencies, build files, and test results helps you avoid doing the same work over and over. When used together, these methods cut down on build time a lot while keeping coverage high.
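To make the idea concrete, here is a deliberately simplified sketch of test impact analysis: a coverage map from source files to the tests that exercise them, used to pick only the affected tests. The file paths and mapping are hypothetical; production tools derive and maintain this mapping automatically from coverage data.

# Simplified test impact analysis: run only the tests covering changed files.
coverage_map = {
    "src/orders.py": {"tests/test_orders.py", "tests/test_checkout.py"},
    "src/payments.py": {"tests/test_payments.py"},
    "src/utils/dates.py": {"tests/test_orders.py", "tests/test_reports.py"},
}

def select_tests(changed_files: list[str]) -> set[str]:
    """Return only the tests whose covered files changed."""
    selected = set()
    for path in changed_files:
        selected |= coverage_map.get(path, set())
    return selected

print(select_tests(["src/payments.py"]))  # {'tests/test_payments.py'}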
Standardized Organization for Scale
Using consistent names, tags, and organization for tests helps teams track performance and keep quality high as they grow. Set clear rules for test types (like unit, integration, or smoke) and use names that show what each test checks. Analytics dashboards can spot flaky tests, slow tests, and common failures. This helps teams improve test suites and keep things running smoothly without slowing down developers.
A good unit test uses the Arrange-Act-Assert pattern. For example, you might test a function that calculates order totals with discounts:
def test_apply_discount_to_order_total():
    # Arrange: Set up test data
    order = Order(items=[Item(price=100), Item(price=50)])
    discount = PercentageDiscount(10)

    # Act: Execute the function under test
    final_total = order.apply_discount(discount)

    # Assert: Verify expected outcome
    assert final_total == 135  # 150 - 10% discount

In the Arrange phase, you set up the objects and data you need. In the Act phase, you call the method you want to test. In the Assert phase, you check if the result is what you expected.
Testing Edge Cases
Real-world code needs to handle more than just the usual cases. Your tests should also check edge cases and errors:
import pytest

def test_apply_discount_with_empty_cart_returns_zero():
    order = Order(items=[])
    discount = PercentageDiscount(10)
    assert order.apply_discount(discount) == 0

def test_apply_discount_rejects_negative_percentage():
    order = Order(items=[Item(price=100)])
    with pytest.raises(ValueError):
        PercentageDiscount(-5)

Notice the naming style: test_apply_discount_rejects_negative_percentage clearly shows what’s being tested and what should happen. If this test fails in your CI pipeline, you’ll know right away what went wrong, without searching through logs.
When teams want faster builds and fewer late-stage bugs, the benefits of unit testing are clear. Good unit tests help speed up development and keep quality high.
When you use smart test execution in modern CI/CD pipelines, these benefits get even bigger.
Disadvantages of Unit Testing: Recognizing the Trade-Offs
Unit testing is valuable, but knowing its limits helps teams choose the right testing strategies. These downsides matter most when you’re trying to make CI/CD pipelines faster and more cost-effective.
Research shows that automatically generated tests can be harder to understand and maintain. Studies also show that statement coverage doesn’t always mean better bug detection.
Industry surveys show that many organizations have trouble with slow test execution and unclear ROI for unit testing. Smart teams solve these problems by choosing the right tests, using smart caching, and working with modern CI platforms that make testing faster and more reliable.
Developers use unit tests in three main ways that affect build speed and code quality. These practices turn testing into a tool that catches problems early and saves time on debugging.
Before they start coding, developers write unit tests. They use test-driven development (TDD) to make the design better and cut down on debugging. According to research, TDD finds 84% of new bugs, while traditional testing only finds 62%. This method gives you feedback right away, so failing tests help you decide what to do next.
Unit tests are like automated guards that catch bugs when code changes. Developers write tests to recreate bugs that have been reported, and then they check that the fixes work by running the tests again after the fixes have been made. Automated tools now generate test cases from issue reports. They are 30.4% successful at making tests that fail for the exact problem that was reported. To stop bugs that have already been fixed from coming back, teams run these regression tests in CI pipelines.
Good developer testing doesn't look at infrastructure or glue code; it looks at business logic, edge cases, and public interfaces. Testing public methods and properties is best; private details that change often should be left out. Test doubles help developers keep business logic separate from systems outside of their control, which makes tests more reliable. Integration and system tests are better for checking how parts work together, especially when it comes to things like database connections and full workflows.
Slow, unreliable tests can slow down CI and hurt productivity, while also raising costs. The following proven strategies help teams check code quickly and cut both build times and cloud expenses.
Choosing between manual and automated unit testing directly affects how fast and reliable your pipeline is.
Manual Unit Testing: Flexibility with Limitations
Manual unit testing means developers write and run tests by hand, usually early in development or when checking tricky edge cases that need human judgment. This works for old systems where automation is hard or when you need to understand complex behavior. But manual testing can’t be repeated easily and doesn’t scale well as projects grow.
Automated Unit Testing: Speed and Consistency at Scale
Automated testing transforms test execution into fast, repeatable processes that integrate seamlessly with modern development workflows. Modern platforms leverage AI-powered optimization to run only relevant tests, cutting cycle times significantly while maintaining comprehensive coverage.
Why High-Velocity Teams Prioritize Automation
Fast-moving teams use automated unit testing to keep up speed and quality. Manual testing is still useful for exploring and handling complex cases, but automation handles the repetitive checks that make deployments reliable and regular.
Difference Between Unit Testing and Other Types of Testing
Knowing the difference between unit, integration, and other test types helps teams build faster and more reliable CI/CD pipelines. Each type has its own purpose and trade-offs in speed, cost, and confidence.
Unit Tests: Fast and Isolated Validation
Unit tests are the foundation of your testing strategy. They test single functions, methods, or classes without touching any outside systems, which avoids database and network problems and gives you the quickest feedback in your pipeline. You can run thousands of unit tests in just a few minutes on a decent machine.
Integration Tests: Validating Component Interactions
Integration testing makes sure that the different parts of your system work together. There are two main types of tests: narrow tests that use test doubles to check specific interactions (like testing an API client with a mock service) and broad tests that use real services (like checking your payment flow with real payment processors). Integration tests use real infrastructure to find problems that unit tests might miss.
End-to-End Tests: Complete User Journey Validation
End-to-end tests sit at the top of the testing pyramid. They mimic complete user journeys through your app and provide the highest confidence, but they take a long time to run and are hard to debug. A bug a unit test would catch in seconds can take an end-to-end suite days to surface, and the tests themselves tend to be brittle.
The Test Pyramid: Balancing Speed and Coverage
The best testing strategy uses a pyramid: many small, fast unit tests at the bottom, some integration tests in the middle, and just a few end-to-end tests at the top.
Modern development teams use a unit testing workflow that balances speed and quality. Knowing this process helps teams spot slow spots and find ways to speed up builds while keeping code reliable.
Developers write code and run unit tests on their own machines to catch bugs early, then push the code to version control so CI pipelines can take over. This workflow keeps developers productive by surfacing problems at the point where they are easiest to fix.
Once code is in the pipeline, automation tools run unit tests on every commit and give feedback right away. If a test fails, the pipeline stops deployment and lets developers know right away. This automation stops bad code from getting into production. Research shows this method can cut critical defects by 40% and speed up deployments.
Modern CI platforms use Test Intelligence to only run the tests that are affected by code changes in order to speed up this process. Parallel testing runs test groups in different environments at the same time. Smart caching saves dependencies and build files so you don't have to do the same work over and over. These steps can help keep coverage high while lowering the cost of infrastructure.
Teams analyze test results through dashboards that track failure rates, execution times, and coverage trends. Analytics platforms surface patterns like flaky tests or slow-running suites that need attention. This data drives decisions about test prioritization, infrastructure scaling, and process improvements. Regular analysis ensures the unit testing approach continues to deliver value as codebases grow and evolve.
Using the right unit testing techniques can turn unreliable tests into a reliable way to speed up development. These proven methods help teams trust their code and keep CI pipelines running smoothly:
These methods work together to build test suites that catch real bugs and stay easy to maintain as your codebase grows.
As we've covered with CI/CD workflows, the first step to good unit testing is isolation: test your code without relying on outside systems that might be slow or unavailable. Dependency injection helps here because it lets you substitute test doubles for real dependencies when tests run.
It is easier for developers to choose the right test double if they know the differences between them. Fakes are simple working versions, such as in-memory databases. Stubs return set data that can be used to test queries. Mocks keep track of what happens so you can see if commands work as they should.
This approach keeps tests fast and accurate no matter when or where you run them. Teams that isolate their tests well see them run around 60% faster, with far fewer flaky failures slowing development down.
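Here is a minimal sketch of that isolation pattern in Python: a service takes its repository as a constructor argument, so the test injects an in-memory fake instead of a real database. All class and method names are invented for illustration.

# Isolation via dependency injection: swap the real database for a fake in tests.
class FakeOrderRepository:
    def __init__(self):
        self._orders = {}

    def save(self, order_id, total):
        self._orders[order_id] = total

    def get(self, order_id):
        return self._orders[order_id]

class OrderService:
    def __init__(self, repository):
        self.repository = repository  # injected dependency

    def place_order(self, order_id, total):
        self.repository.save(order_id, total)
        return self.repository.get(order_id)

def test_place_order_uses_injected_repository():
    service = OrderService(FakeOrderRepository())  # no real database needed
    assert service.place_order("o-1", 150) == 150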
Beyond isolation, teams need ways to increase test coverage without writing proportionally more test code. Property-based testing lets you state rules that should always hold and automatically generates hundreds of test cases. This approach is great at finding edge cases and boundary conditions that hand-written tests might miss.
Parameterized testing gives you similar benefits, but you have more control over the inputs. You don't have to write extra code to run the same test with different data. Tools like xUnit's Theory and InlineData make this possible. This helps find more bugs and makes it easier to keep track of your test suite.
Both methods work best when you choose the right tests to run. You only run the tests you need, so platforms that know which tests matter for each code change give you full coverage without slowing things down.
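In pytest, the same parameterized approach looks like the sketch below; the discount function is a stand-in for the order logic from the earlier example.

import pytest

# One test body, many input/expected pairs.
def apply_percentage_discount(total, percent):
    return total - total * percent / 100

@pytest.mark.parametrize(
    ("total", "percent", "expected"),
    [
        (150, 10, 135),   # happy path from the earlier example
        (150, 0, 150),    # no discount
        (0, 10, 0),       # empty cart
    ],
)
def test_apply_percentage_discount(total, percent, expected):
    assert apply_percentage_discount(total, percent) == expected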
The last step is testing complex output, such as JSON responses or generated code. Golden tests and snapshot testing simplify this by saving the expected output as reference files, so you don't have to write complicated assertions.
If your code’s output changes, the test fails and shows what’s different. This makes it easy to spot mistakes, and you can approve real changes by updating the snapshot. This method works well for testing APIs, config generators, or any code that creates structured output.
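A bare-bones golden test can be as simple as comparing serialized output against a stored reference file, as in the sketch below. The snapshot path and payload are assumptions; dedicated snapshot-testing plugins add diffing and one-command updates on top of the same idea.

import json
from pathlib import Path

SNAPSHOT = Path("tests/snapshots/order_summary.json")  # hypothetical location

def build_order_summary():
    return {"order_id": "o-1", "items": 2, "total": 135}

def test_order_summary_matches_snapshot():
    actual = json.dumps(build_order_summary(), indent=2, sort_keys=True)
    if not SNAPSHOT.exists():           # first run records the golden output
        SNAPSHOT.parent.mkdir(parents=True, exist_ok=True)
        SNAPSHOT.write_text(actual)
    assert actual == SNAPSHOT.read_text()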
Teams that use full automated testing frameworks see code coverage go up by 32.8% and catch 74.2% more bugs per build. Golden tests help by making it easier to check complex cases that would otherwise need manual testing.
The main thing is to balance thoroughness with easy maintenance. Golden tests should check real behavior, not details that change often. When you get this balance right, you’ll spend less time fixing bugs and more time building features.
Picking the right unit testing tools helps your team write tests efficiently, instead of wasting time on flaky tests or slow builds. The best frameworks work well with your language and fit smoothly into your CI/CD process.
Modern teams use these frameworks along with CI platforms that offer analytics and automation. This mix of good tools and smart processes turns testing from a bottleneck into a productivity boost.
Smart unit testing can turn CI/CD from a bottleneck into an advantage. When tests are fast and reliable, developers spend less time waiting and more time releasing code. Harness Continuous Integration uses Test Intelligence, automated caching, and isolated build environments to speed up feedback without losing quality.
Want to speed up your team? Explore Harness CI and see what's possible.


Filtering data is at the heart of developer productivity. Whether you’re looking for failed builds, debugging a service or analysing deployment patterns, the ability to quickly slice and dice execution data is critical.
At Harness, users across CI, CD and other modules rely on filtering to navigate complex execution data by status, time range, triggers, services and much more. While our legacy filtering worked, it had major pain points — hidden drawers, inconsistent behaviour and lost state on refresh — that slowed both developers and users.
This blog dives into how we built a new Filters component system in React: a reusable, type-safe and feature-rich framework that powers the filtering experience on the Execution Listing page (and beyond).
Our old implementation revealed several weaknesses as Harness scaled:
These problems shaped our success criteria: discoverability, smooth UX, consistent behaviour, reusable design and decoupled components.

Building a truly reusable and powerful filtering system required exploration and iteration. Our journey involved several key stages and learning from the pitfalls of each:
Shifted to React functional components but kept logic centralised in the FilterFramework. Each filter was conditionally rendered based on visibleFilters array. Framework fetched filter options and passed them down as props.
COMPONENT FilterFramework:
    STATE activeFilters = {}
    STATE visibleFilters = []
    STATE filterOptions = {}

    ON visibleFilters CHANGE:
        FOR EACH filter IN visibleFilters:
            IF filterOptions[filter] NOT EXISTS:
                options = FETCH filterData(filter)
                filterOptions[filter] = options

    ON activeFilters CHANGE:
        makeAPICall(activeFilters)

    RENDER:
        <AllFilters setVisibleFilters={setVisibleFilters} />

        IF 'services' IN visibleFilters:
            <DropdownFilter
                name="Services"
                options={filterOptions.services}
                onAdd={updateActiveFilters}
                onRemove={removeFromVisible}
            />

        IF 'environments' IN visibleFilters:
            <DropdownFilter ... />
Pitfalls: Adding new filters required changes in multiple places, creating a maintenance nightmare and poor developer experience. The framework had minimal control over filter implementation, lacked proper abstraction and scattered filter logic across the codebase, making it neither “stupid-proof” nor scalable.
Improved the previous approach by accepting filters as children and using React.cloneElement to inject callbacks (onAdd, onRemove) from the parent framework. This gave developers a cleaner API to add filters.
const enhancedChildren = React.Children.map(children, child => {
  if (visibleFilters.includes(child.props.filterKey)) {
    return React.cloneElement(child, {
      onAdd: (label, value) => {
        activeFilters[child.props.filterKey].push({ label, value });
      },
      onRemove: () => {
        delete activeFilters[child.props.filterKey];
      }
    });
  }
  return child;
});

Pitfalls: React.cloneElement is an expensive operation that causes performance issues with frequent re-renders and it’s considered an anti-pattern by the React team. The approach tightly coupled filters to the framework’s callback signature, made prop flow implicit and difficult to debug and created type safety issues since TypeScript struggles with dynamically injected props.
The winning design uses React Context API to provide filter state and actions to child components. Individual filters access setValue and removeFilter via useFiltersContext() hook. This decouples filters from the framework while maintaining control.
COMPONENT Filters({ children, onChange }):
    STATE filtersMap = {}    // { search: { value, query, state } }
    STATE filtersOrder = []  // ['search', 'status']

    FUNCTION updateFilter(key, newValue):
        serialized = parser.serialize(newValue)  // Type → String
        filtersMap[key] = { value: newValue, query: serialized }
        updateURL(serialized)
        onChange(allValues)

    ON URL_CHANGE:
        parsed = parser.parse(urlString)  // String → Type
        filtersMap[key] = { value: parsed, query: urlString }

    RENDER:
        <Context.Provider value={{ updateFilter, filtersMap }}>
            {children}
        </Context.Provider>
END COMPONENT

Benefits: This solution eliminated the performance overhead of cloneElement, decoupled filters from framework internals and made it easy to add new filters without touching framework code. The Context API provides clear data flow that’s easy to debug and test, with type safety through TypeScript.

The Context API in React unlocks something truly powerful — Inversion of Control (IoC). This design principle is about delegating control to a framework instead of managing every detail yourself. It’s often summed up by the Hollywood Principle: “Don’t call us, we’ll call you.”
In React, this translates to building flexible components that let the consumer decide what to render, while the component itself handles how and when it happens.
Our Filters framework applies this principle: you don’t have to manage when to update state or synchronise the URL. You simply define your filter components and the framework orchestrates the rest — ensuring seamless, predictable updates without manual intervention.
Our Filters framework demonstrates Inversion of Control in three key ways.
The result? A single, reusable Filters component that works across pipelines, services, deployments or repositories. By separating UI logic from business logic, we gain flexibility, testability and cleaner architecture — the true power of Inversion of Control.
COMPONENT DemoPage:
    STATE filterValues
    FilterHandler = createFilters()

    FUNCTION applyFilters(data, filters):
        result = data
        IF filters.onlyActive == true:
            result = result WHERE item.status == "Active"
        RETURN result

    filteredData = applyFilters(SAMPLE_DATA, filterValues)

    RENDER:
        <RouterContextProvider>
            <FilterHandler onChange = (updatedFilters) => SET filterValues = updatedFilters>

                // Dropdown to add filters dynamically
                <FilterHandler.Dropdown>
                    RENDER FilterDropdownMenu with available filters
                </FilterHandler.Dropdown>

                // Active filters section
                <FilterHandler.Content>
                    <FilterHandler.Component parser = booleanParser filterKey = "onlyActive">
                        RENDER CustomActiveOnlyFilter
                    </FilterHandler.Component>
                </FilterHandler.Content>

            </FilterHandler>

            RENDER DemoTable(filteredData)
        </RouterContextProvider>
END COMPONENT

One of the key technical challenges in building a filtering system is URL synchronization. Browsers only understand strings, yet our applications deal with rich data types — dates, booleans, arrays and more. Without a structured solution, each component would need to manually convert these values, leading to repetitive, error-prone code.
The solution is our parser interface, a lightweight abstraction with just two methods: parse and serialize.
parse converts a URL string into the type your app needs. serialize does the opposite, turning that typed value back into a string for the URL. This bidirectional system runs automatically — parsing when filters load from the URL and serialising when users update filters.
const booleanParser: Parser<boolean> = {
  parse: (value: string) => value === 'true',   // "true" → true
  serialize: (value: boolean) => String(value)  // true → "true"
}

At the heart of our framework lies the FiltersMap — a single, centralized object that holds the complete state of all active filters. It acts as the bridge between your React components and the browser, keeping UI state and URL state perfectly in sync.
Each entry in the FiltersMap contains three key fields: the typed value, its serialized query string, and its state (visible, applied, or hidden), as captured in the interface below.
You might ask — why store both the typed value and its string form? The answer is performance and reliability. If we only stored the URL string, every re-render would require re-parsing, which quickly becomes inefficient for complex filters like multi-selects. By storing both, we parse only once — when the value changes — and reuse the typed version afterward. This ensures type safety, faster URL synchronization and a clean separation between UI behavior and URL representation. The result is a system that’s predictable, scalable, and easy to maintain.
interface FilterType<T = any> {
  value?: T            // The actual filter value
  query?: string       // Serialized string for URL
  state: FilterStatus  // VISIBLE | FILTER_APPLIED | HIDDEN
}
}Let’s trace how a filter value moves through the system — from user interaction to URL synchronization.
It all starts when a user interacts with a filter component — for example, selecting a date. This triggers an onChange event with a typed value, such as a Date object. Before updating the state, the parser’s serialize method converts that typed value into a URL-safe string.
The framework then updates the FiltersMap with both versions:
value and query. From here, two things happen simultaneously: the URL updates with the serialized query string, and the onChange callback fires, passing typed values back to the parent component — allowing the app to immediately fetch data or update visualizations.
The reverse flow works just as seamlessly. When the URL changes — say, the user clicks the back button — the parser’s parse method converts the string back into a typed value, updates the FiltersMap and triggers a re-render of the UI.
All of this happens within milliseconds, enabling a smooth, bidirectional synchronization between the application state and the URL — a crucial piece of what makes the Filters framework feel so effortless.

For teams tackling similar challenges — complex UI state management, URL synchronization and reusable component design — this architecture offers a practical blueprint to build upon. The patterns used are not specific to Harness; they are broadly applicable to any modern frontend system that requires scalable, stateful and user-driven filtering.
The team’s core objectives — discoverability, smooth UX, consistent behavior, reusable design and decoupled elements — directly shaped every architectural decision. Through Inversion of Control, the framework manages the when and how of state updates, lifecycle events and URL synchronization, while developers define the what — business logic, API calls and filter behavior.
By treating the URL as part of the filter state, the architecture enables shareability, bookmarkability and native browser history support. The Context API serves as the control distribution layer, removing the need for prop drilling and allowing deeply nested components to seamlessly access shared logic and state.
Ultimately, Inversion of Control also paved the way for advanced capabilities such as saved filters, conditional rendering, and sticky filters — all while keeping the framework lightweight and maintainable. This approach demonstrates how clear objectives and sound architectural principles can lead to scalable, elegant solutions in complex UI systems.


In most teams, the question is no longer "Do we need an internal developer portal?" but "Do we really want to run Backstage ourselves?"
Backstage proved the internal developer portal (IDP) pattern, and it works. It gives you a flexible framework, plugins, and a central place for services and docs. It also gives you a long-term commitment: owning a React/TypeScript application, managing plugins, chasing upgrades, and justifying a dedicated platform squad to keep it all usable.
That's why there are Backstage alternatives like Harness IDP and managed Backstage services. It's also why so many platform teams are taking a long time to look at them before making a decision.
Backstage was created by Spotify to fix real platform engineering problems: painful onboarding, scattered documentation, unclear ownership, and no clear path for spinning up new services. When Spotify open-sourced Backstage in 2020, the goal was clear, and the core value props hold up: a software catalog, templates for new services, and a single place to bring together the tools teams need to work.
The problem is not the concept. It is the operating model. Backstage is a framework, not a product. If you adopt it, you are committing to:
Once Backstage moves beyond a proof of concept, it takes a lot of engineering work to keep it reliable, secure, and up to date. Many companies don't realize how much work it takes. At the same time, platforms like Harness are showing that you don't have to build everything yourself to get good results from a portal.
When you look at how Harness connects IDP to CI, CD, IaC Management, and AI-powered workflows, you start to see an alternate model: treat the portal as a product you adopt, then spend platform engineering energy on standards, golden paths, and self-service workflows instead of plumbing.
When you strip away branding, almost every Backstage alternative fits one of three patterns. The differences are in how much you own and how much you offload:
| | Build (Self-Hosted Backstage) | Hybrid (Managed Backstage) | Buy (Commercial IDP) |
|---|---|---|---|
| You own | Everything: UI, plugins, infra, roadmap | Customization, plugin choices, catalog design | Standards, golden paths, workflows |
| Vendor owns | Nothing | Hosting, upgrades, security patches | Platform, upgrades, governance tooling, support |
| Engineering investment | High (2–5+ dedicated engineers) | Medium (1–2 engineers for customization) | Low (configuration, not code) |
| Time to value | Months | Weeks to months | Weeks |
| Flexibility | Unlimited | High, within Backstage conventions | Moderate, within vendor abstractions |
| Governance & RBAC | Build it yourself | Build or plugin-based | Built-in |
| Best for | Large orgs wanting full control | Teams standardized on Backstage who want less ops | Teams prioritizing speed, governance, and actionability |
What This Actually Means
You fork or deploy OSS Backstage, install the plugins you need, and host it yourself. Or you build your own internal portal from scratch. Either way, you now own:
Backstage gives you the most flexibility because you can add your own custom plugins, model your internal world however you want, and connect it to any tool. If you're willing to put a lot of money into it, that freedom is very powerful.
Where It Breaks Down
In practice, that freedom has a price:
This path can still work. If you run a very large organization, want to make the portal a core product, have strong React/TypeScript and platform skills, and genuinely need to customize everything, building on Backstage is a reasonable choice. Just remember that you are not choosing a tool; you are hiring people to work on a long-term project.
What This Actually Means
Managed Backstage providers run and host Backstage for you. You still get the framework and everything that goes with it, but you don't have to fix Kubernetes manifests at 2 a.m. or investigate upstream patch releases.
Vendor responsibilities typically include:
You get "Backstage without the server babysitting."
Where The Trade-Offs Show Up
You also inherit Backstage's structural limits:
Hybrid works well if you have already standardized on Backstage concepts, want to keep the ecosystem, and simply refuse to run your own instance. If you're just starting out with IDPs and are still looking into things like golden paths, self-service workflows, and platform-managed scorecards, it might be helpful to compare hybrid Backstage to commercial IDPs that were made to be products from the start.
What This Actually Means
Commercial IDPs approach the space from the opposite angle. You do not start with a framework, you start with a product. You get a portal that ships with:
The main point that sets them apart is how well that portal is connected to the systems that your developers use every day. Some products act as a metadata hub, bringing together information from your current tools. Harness does things differently. The IDP is built right on top of a software delivery platform that already has CI, CD, IaC Management, Feature Flags, and more.
Why Teams Go This Route
Teams that choose commercial Backstage alternatives tend to prioritize:
You trade some of Backstage's absolute freedom for a more focused, maintainable platform. For most organizations, that is a win.
People often think that the difference is "Backstage is free; commercial IDPs are expensive." In reality, the choice is "Where do you want to spend?"
With open source, you avoid license fees but spend engineering capacity. With commercial IDPs like Harness, you do the opposite: you pay for the product and get that engineering capacity back. Either way, the platform exists to serve the teams that build on it; build versus buy simply decides who does the heavy lifting.
This is how it works in practice:
| Dimension | Open-Source Backstage | Commercial IDP (e.g., Harness) |
|---|---|---|
| Upfront cost | Free (no license fees) | Subscription or usage-based pricing |
| Engineering staffing | 2–5+ engineers dedicated at scale | Minimal—vendor handles core platform |
| Customization freedom | Unlimited—you own the code | Flexible within vendor abstractions |
| UX consistency | Drifts as teams extend the portal | Controlled by product design |
| AI/automation depth | Add-on or custom build | Native, grounded in delivery data |
| Vendor lock-in risk | Low (open source) | Medium (tied to platform ecosystem) |
| Long-term TCO (3–5 years) | High (hidden in headcount) | Predictable (visible in contract) |
Backstage is a solid choice if you explicitly want to own design, UX, and technical debt. Just be honest about how much that will cost over the next three to five years.
Commercial IDPs like Harness come with pre-made catalogs, scorecards, workflows, and governance that show you the best ways to do things. In short, it's ready to use right away. You get faster rollout of golden paths, self-service workflows, and environment management, as well as predictable roadmaps and vendor support.
The real question is what you want your platform team to do: shipping features in your portal framework, or defining and evolving the standards that drive better software delivery.
When compared to other Backstage options, Harness IDP is best understood as a platform-based choice rather than a separate portal. It runs on Backstage where it makes sense (for example, to use the plugin ecosystem), but it is packaged as a curated product that sits on top of the Harness Software Delivery Platform as a whole.
A few design principles stand out:
When you think about Backstage alternatives in terms of "How much of this work do we want to own?" and "Should our portal be a UI or a control plane?" Harness naturally fits into the group that sees the IDP as part of a connected delivery platform rather than as a separate piece of infrastructure.
A lot of teams say, "We'll start with Backstage, and if it gets too hard, we'll move to something else." That sounds safe on paper. In production, moving from Backstage gets harder over time.
Common points where things go wrong include:
The point isn't "never choose Backstage." The point is that if you do, you should think of it as a strategic choice, not an experiment you can easily undo in a year.
Whether you are comparing Backstage alone, Backstage in a managed form, or commercial platforms like Harness, use a lens that goes beyond feature checklists. These seven questions will help you cut through the noise.
If a solution cannot give you concrete answers here, it is not the right Backstage alternative for you.
Choosing among Backstage alternatives comes down to one question: what kind of work do you want your platform team to own?
Open source Backstage gives you maximum flexibility and maximum responsibility. Managed Backstage reduces ops burden but keeps you within Backstage's conventions. Commercial IDPs like Harness narrow the surface area you maintain and connect your portal directly to CI/CD, environments, and governance.
If you want fast time to value, built-in governance, and a portal that acts rather than just displays, connect with Harness.


We’ve all seen it happen. A DevOps initiative starts with high energy, but two years later, you’re left with a sprawl of "fragile agile" pipelines. Every team has built their own bespoke scripts, security checks are inconsistent (or non-existent), and maintaining the system feels like playing whack-a-mole.
This is where the industry is shifting from simple DevOps execution to Platform Engineering.
The goal of a modern platform team isn't to be a help desk that writes YAML files for developers. The goal is to architect a "Golden Path"—a standardized, pre-vetted route to production that is actually easier to use than the alternative. It reduces the cognitive load for developers while ensuring that organizational governance isn't just a policy document, but a reality baked into every commit.
In this post, I want to walk through the architecture of a Golden Standard Pipeline. We’re going to look beyond simple task automation and explore how to weave Governance, Security, and Supply Chain integrity into a unified workflow that stands the test of time.
A Golden Standard Pipeline isn't defined by the tools you use—Harness, GitLab, GitHub Actions—but by its layers of validation. It’s not enough to simply "build and deploy" anymore. We need to architect a system that establishes trust at every single stage.
I like to break this architecture down into four distinct domains:

The Principle: Don't process what you can't approve.
In traditional pipelines, we often see compliance checks shoehorned in right before production deployment. This is painful. There is nothing worse than waiting 20 minutes for a build and test cycle, only to be told you can't deploy because you used a non-compliant base image.
In a Golden Standard architecture, we shift governance to Step Zero.
By implementing Policy as Code (using frameworks like OPA) at the very start of execution, we solve a few problems:
The Principle: Security must speed the developer up, not slow them down.
The "Inner Loop" is sacred ground. This is where developers live. If your security scanning adds friction or takes too long, developers will find a way to bypass it. To solve this, we rely on Parallel Orchestration.
Instead of running checks linearly (Lint → then SAST → then Secrets), we group "Code Smells," "Linting," and "Security Scanners" to run simultaneously.
This gives us a huge architectural advantage:
The Principle: Prove the origin and ingredients of your software.
This is the biggest evolution we've seen in CI/CD recently. We need to stop treating the build artifact (Docker image/Binary) as a black box. Instead, we generate three critical pieces of metadata that travel with the artifact:
The Principle: Build once, deploy everywhere.
A common anti-pattern I see is rebuilding artifacts for different environments—building a "QA image" and then rebuilding a "Prod image" later. This introduces risk.
In the Golden Standard, the artifact generated and signed in Layer 3 is the exact same immutable object deployed to QA and Production. We use a Rolling Deployment strategy with an Approval Gate between environments. The production stage explicitly references the digest of the artifact verified in QA, ensuring zero drift.
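Conceptually, the promotion gate reduces to a digest equality check, sketched below with a placeholder digest and helper function; in a real pipeline this check is enforced by the deployment tooling and the signed provenance from Layer 3 rather than hand-written code.

# "Build once, deploy everywhere": prod only accepts the digest verified in QA.
QA_VERIFIED_DIGEST = "sha256:<digest-verified-in-qa>"  # placeholder value

def promote_to_production(candidate_digest: str) -> None:
    if candidate_digest != QA_VERIFIED_DIGEST:
        raise RuntimeError("Digest mismatch: artifact drifted between QA and prod")
    print(f"Deploying immutable artifact {candidate_digest} to production")

promote_to_production(QA_VERIFIED_DIGEST)  # same bytes QA tested, zero drift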
To successfully build this, your platform needs to provide specific capabilities mapped to these layers.

Tools change. Jenkins, Harness, GitHub Actions—they all evolve. But the Architecture remains constant. If you adhere to these principles, you future-proof your organization:
Adopting a Golden Standard architecture transforms the CI/CD pipeline from a simple task runner into a governance engine. By abstracting security and compliance into these reusable layers, Platform Engineering teams can guarantee that every microservice—regardless of the language or framework—adheres to the organization's highest standards of trust.


Kubernetes is a powerhouse of modern infrastructure — elastic, resilient, and beautifully abstracted. It lets you scale with ease, roll out deployments seamlessly, and sleep at night knowing your apps are self-healing.
But if you’re not careful, it can also silently drain your cloud budget.
In most teams, cost comes as an afterthought — only noticed when the monthly cloud bill starts to resemble a phone number. The truth is simple:
Kubernetes isn’t expensive by default.
Inefficient scheduling decisions are.
These problems rarely come from massive architectural mistakes. It's the small, hidden, configuration-level choices that pile up into significant cloud waste.
In this post, let’s unpack the hidden costs lurking in your Kubernetes clusters and how you can take control using smarter scheduling, bin packing, right-sizing, and better node selection.
Most teams play it safe by over-provisioning resource requests — sometimes doubling or tripling what the workload needs. This leads to wasted CPU and memory that sit idle, but still costs money because the scheduler reserves them.
Your cluster is “full” — but your nodes are barely sweating.
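One way to see this reservation gap for yourself is to compare what pods request with what nodes can actually allocate. The sketch below uses the official Kubernetes Python client; it only looks at requests versus allocatable capacity, so pair it with metrics-server data to see real utilization.

```python
# Rough sketch: how much CPU is *reserved* (requested) on each node versus what
# the node can allocate. High reservation with low real usage is the classic
# symptom of over-provisioned requests.
from collections import defaultdict
from kubernetes import client, config

def cpu_millicores(value: str) -> int:
    # Handles the two most common formats: "500m" and "2" (whole cores).
    return int(value[:-1]) if value.endswith("m") else int(float(value) * 1000)

config.load_kube_config()
v1 = client.CoreV1Api()

requested = defaultdict(int)
for pod in v1.list_pod_for_all_namespaces().items:
    if not pod.spec.node_name:
        continue
    for container in pod.spec.containers:
        cpu = (container.resources.requests or {}).get("cpu")
        if cpu:
            requested[pod.spec.node_name] += cpu_millicores(cpu)

for node in v1.list_node().items:
    alloc = cpu_millicores(node.status.allocatable["cpu"])
    used = requested.get(node.metadata.name, 0)
    print(f"{node.metadata.name}: {used}m of {alloc}m allocatable reserved "
          f"({100 * used / alloc:.0f}%)")
```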

Kubernetes’s default scheduler optimizes for availability and spreading, not cost. As a result, workloads are often spread across more nodes than necessary. This leads to fragmented resource usage, like:

Choosing the wrong instance type can be surprisingly expensive:
But without node affinity, taints, or custom scheduling, workloads might not land where they should.
Old cron jobs, demo deployments, and failed jobs that never got cleaned up — they all add up. Worse, they might be on expensive nodes or keeping the autoscaler from scaling down.
Mixing too many node types across zones, architectures, or families without careful coordination leads to bin-packing failure. A pod that fits only one node type can prevent the scale-down of others, leading to stranded resources.
Many Kubernetes environments run 24/7 by default, even when there is little or no real activity. Development clusters, staging environments, and non-critical workloads often sit idle for large portions of the day, quietly accumulating cost.
This is one of the most overlooked cost traps.
Even a well-sized cluster becomes expensive if it runs continuously while doing nothing.
Because this waste doesn’t show up as obvious inefficiency — no failed pods, no over-provisioned nodes — it often goes unnoticed until teams review monthly cloud bills. By then, the cost is already sunk.
Idle infrastructure is still infrastructure you pay for.
Kubernetes doesn’t natively optimize for cost, but you can make it do so.
Encourage consolidation by:
In addition to affinity and anti-affinity, teams can use topology spread constraints to control the explicit distribution of pods across zones or nodes. While they’re often used for high availability, overly strict spread requirements can work against bin-packing and prevent efficient scale-down, making them another lever that needs cost-aware tuning.

Most of us have been in the situation where resources run 24/7 but are barely used, racking up costs even when everything is idle. A tried and proven way to avoid this is to scale these resources down based on schedules or on idleness.
Harness CCM Kubernetes AutoStopping lets you scale down Kubernetes workloads, Auto Scaling Groups, VMs, and more based on their activity or on fixed schedules, protecting you from these idle costs.
Cluster Orchestrator can help you scale down an entire cluster or specific node pools when they are not needed, based on schedules.
It’s often shocking how many pods can run on half the resources they’re requesting. Instead of guessing resource requests:
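One data-driven starting point is to look at what pods actually consume. The sketch below queries metrics-server through the Kubernetes Python client's CustomObjectsApi; it assumes metrics-server is installed and reports CPU usage in nanocores.

```python
# Rough sketch: pull live pod CPU usage from metrics-server so right-sizing
# decisions are based on observed consumption rather than guesses.
from kubernetes import client, config

config.load_kube_config()
metrics_api = client.CustomObjectsApi()

pod_metrics = metrics_api.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "pods")
for pod in pod_metrics["items"]:
    name = f'{pod["metadata"]["namespace"]}/{pod["metadata"]["name"]}'
    # metrics-server usually reports CPU in nanocores, e.g. "123456789n"
    nanocores = sum(int(c["usage"]["cpu"].rstrip("n") or 0) for c in pod["containers"])
    print(f"{name}: {nanocores / 1_000_000:.0f}m CPU in use")
```

Comparing these numbers with the configured requests from the previous sketch shows exactly where requests can be trimmed.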

Make architecture and pricing work in your favor:


Instead of 10 specialized pools, consider:
One overlooked reason why Kubernetes cost optimization is hard is that most scaling decisions are opaque. Nodes appear and disappear, but teams rarely know why a particular scale-up or scale-down happened.
Was it CPU fragmentation? A pod affinity rule? A disruption budget? A cost constraint?
Without decision-level visibility, teams are forced to guess — and that makes cost optimization feel risky instead of intentional.
Cost-aware systems work best when they don’t just act, but explain. Clear event-level insights into why a node was added, removed, or preserved help teams build trust, validate policies, and iterate safely on optimization strategies.


One of the most effective ways to eliminate idle cost is time- or activity-based scaling. Instead of keeping clusters and workloads always on, resources can be scaled down when they are not needed and restored only when activity resumes.
With Harness CCM Kubernetes AutoStopping, teams can automatically scale down Kubernetes workloads, Auto Scaling Groups, VMs, and other resources based on usage signals or fixed schedules. This removes idle spend without requiring manual intervention.
Cluster Orchestrator extends this concept to the cluster level. It enables scheduled scale-down of entire clusters or specific node pools, making it practical to turn off unused capacity during nights, weekends, or other predictable idle windows.
Sometimes, the biggest savings come from not running infrastructure at all when it isn’t needed.

Cost is not just a financial problem. It’s an engineering challenge — and one that we, as developers, can tackle with the same tools we use for performance, resilience, and scalability.
Start small. Review a few workloads. Test new node types. Measure bin-packing efficiency weekly.

You don’t need to sacrifice performance — just be intentional with your cluster design.
Check out Cluster Orchestrator by Harness CCM today!
Kubernetes doesn’t have to be expensive — just smarter.
Have you ever watched a “temporary” Infrastructure as Code script quietly become mission-critical, undocumented, and owned by someone who left the company two years ago? We can all relate to a similar scenario, if not an infrastructure-specific one, and this is usually the moment teams realise the build vs buy IaC decision was made by accident, not design.
As your teams grow from managing a handful of environments to orchestrating hundreds of workspaces across multiple clouds, the limits of homegrown IaC pipeline management show up fast. What starts as a few shell scripts wrapping OpenTofu or Terraform commands often evolves into a fragile web of CI jobs, custom glue code, and tribal knowledge that no one feels confident changing.
The real question is not whether you can build your own IaC solution. Most teams can. The question is what it costs you in velocity, governance, and reliability once the platform becomes business-critical.
Building a custom IaC solution feels empowering at first. You control every detail. You understand exactly how plan and apply flows work. You can tailor pipelines to your team’s preferences without waiting on vendors or abstractions.
For small teams with simple requirements, this works. A basic OpenTofu or Terraform pipeline in GitHub Actions or GitLab CI can handle plan-on-pull-request and apply-on-merge patterns just fine. Add a manual approval step and a notification, and you are operational.
The problem is that infrastructure rarely stays simple.
As usage grows, the cracks start to appear:
At this point, the build vs buy IaC question stops being technical and becomes strategic.
An infrastructure as code management (IaCM) platform cannot simply be labelled “CI for Terraform.” It exists to standardise how infrastructure changes are proposed, reviewed, approved, and applied across teams.
Instead of every team reinventing the same patterns, an IaCM platform provides shared primitives that scale.
Workspaces are treated as first-class entities. Plans, approvals, applies, and execution history are visible in one place. When something fails, you do not have to reconstruct context from CI logs and commit messages.
IaC governance stops being a best-practice document and becomes part of the workflow. Policy checks run automatically. Risky changes are surfaced early. Approval gates are applied consistently based on impact, not convention.
This matters regardless of whether teams are using OpenTofu as their open-source baseline or maintaining existing Terraform pipelines.
Managing environment-specific configuration across large numbers of workspaces is one of the fastest ways to introduce mistakes. IaCM platforms provide variable sets and secure secret handling so values are managed once and applied consistently.
Infrastructure drift is inevitable. Manual console changes, provider behaviour, and external automation all contribute. An IaCM platform detects drift continuously and surfaces it clearly, without relying on scheduled scripts parsing CLI output.
Reusable modules are essential for scaling IaC, but unmanaged reuse creates risk. A built-in module and provider registry ensures teams use approved, versioned components and reduces duplication across the organisation.
Most platform teams underestimate how much work lives beyond the initial pipeline.
You will eventually need:
None of these are hard in isolation. Together, they represent a long-term maintenance commitment. Unless building IaC tooling is your product, this effort rarely delivers competitive advantage.
Harness Infrastructure as Code Management (IaCM) is designed for teams that want control without rebuilding the same platform components over and over again.
It supports both OpenTofu and Terraform, allowing teams to standardise workflows even as tooling evolves. OpenTofu fits naturally as an open-source execution baseline for new workloads, while Terraform remains supported where existing investment makes sense.
Harness IaCM provides:
Instead of writing and maintaining custom orchestration logic, teams focus on infrastructure design and delivery.
Drift detection, approvals, and audit trails are handled consistently across every workspace, without bespoke scripts or CI hacks.
The build vs buy IaC decision should be intentional, not accidental.
If your organisation has a genuine need to own every layer of its tooling and the capacity to maintain it long-term, building can be justified. For most teams, however, the operational overhead outweighs the benefits.
An IaCM platform provides faster time-to-value, stronger governance, and fewer failure modes as infrastructure scales.
Harness Infrastructure as Code Management enables teams to operationalise best practices for OpenTofu and Terraform without locking themselves into brittle, homegrown solutions.
The real question is not whether you can build this yourself. It is whether you want to be maintaining it when the platform becomes critical.
Explore Harness IaCM and move beyond fragile IaC pipelines.


The rapid adoption of AI is fundamentally reshaping the software development landscape, driving an unprecedented surge in code generation speed. However, this acceleration has created a significant challenge for security teams: the AI velocity paradox. This paradox describes a situation where the benefits of accelerated code generation are being "throttled by the SDLC processes downstream," such as security, testing, deployment, and compliance, which have not matured or automated at the same pace as AI has advanced the development process.
This gap is a recognized concern among industry leaders. In Harness’s latest State of AI in Software Engineering report, 48% of surveyed organizations worry that AI coding assistants introduce vulnerabilities, and 43% fear compliance issues stemming from untested, AI-generated code.
This blog post explores strategies for closing the widening gap and defending against the new attack surfaces created by AI tooling.
The AI velocity paradox is most acutely manifested in security. The benefits gained from code generation are being slowed down by downstream SDLC processes, such as testing, deployment, security, and compliance. This is because these processes have not "matured or automated at the same pace as code generation has."
Every time a coding agent or AI agent writes code, it has the potential to expand the threat surface. This can happen if the AI spins up a new application component, such as a new API, or pulls in unvalidated open-source models or libraries. If deployed without proper testing and validation, these components "can really expand your threat surface."
The imbalance is stark: code generation is up to 25% faster, and 70% of developers are shipping more frequently, yet only 46% of security compliance workflows are automated.
The Harness report revealed that 48% of respondents were concerned that AI coding assistance introduced vulnerabilities, while 43% feared regulatory exposure. While both risks are evident in practice, they do not manifest equally.
The components that significantly expand the attack surface beyond the scope of traditional application security (appsec) tools are AI agents or LLMs integrated into applications.
Traditional non-AI applications are generally deterministic; you know exactly what payload is going into an API, and which fields are sensitive. Traditional appsec tools are designed to secure this predictable environment.
However, AI agents are non-deterministic and "can behave randomly." Security measures must focus on ensuring these agents do not receive "overly excessive permissions to access anything" and controlling the type of data they have access to.

Top challenges for AI application security
For development teams with weekly release cycles, we recommend prioritizing mitigation efforts based on the OWASP LLM Top 10. The three critical areas to test and mitigate first are:
We advise that organizations should "test all your applications" for these three issues before pushing them to production.
Here’s a walkthrough of a real-world prompt injection attack scenario to illustrate the danger of excessive agency.
The Attack Path is usually:
This type of successful attack can lead to "legal implications," data loss, and damage to the organization's reputation.
Here’s a playbook to tackle Prompt Injection attacks

Harness's approach to closing the AI security gap is built on three pillars:
Read more about Harness AI security in our blog post.
Looking six to 12 months ahead, the biggest risks come from autonomous agents, deeper tool chaining, and multimodal orchestration. The focus has shifted from AI code-based risk to decision risk.
Security teams must focus on upgrading their security and testing capabilities to understand the decision risk, specifically "what kind of data is flowing out of the system and what kind of things are getting exposed." The key is to manage the non-deterministic nature of AI applications.
To stay ahead, a phased maturity roadmap is recommended:
By focusing on automation, prioritizing the most critical threats, and adopting a platform that provides visibility, testing, and protection, organizations can manage the risks introduced by AI velocity and build resilient AI-native applications.
Learn more about tackling the AI velocity paradox in security in this webinar.


As an enterprise chaos engineering platform vendor, validating chaos faults is not optional — it’s foundational. Every fault we ship must behave predictably, fail safely, and produce measurable impact across real-world environments.
When we began building our end-to-end (E2E) testing framework, we quickly ran into a familiar problem: the barrier to entry was painfully high.
Running even a single test required a long and fragile setup process:
This approach slowed feedback loops, discouraged adoption, and made iterative testing expensive — exactly the opposite of what chaos engineering should enable.
To solve this, we built a comprehensive yet developer-friendly E2E testing framework for chaos fault validation. The goal was simple: reduce setup friction without sacrificing control or correctness.
The result is a framework that offers:
What previously took 30 minutes (or more) to set up and run can now be executed in under 5 minutes — consistently and at scale.



Purpose: Orchestrates the complete chaos experiment lifecycle from creation to validation.
Key Responsibilities:
Architecture Pattern: Template Method + Observer
type ExperimentRunner struct {
    identifiers utils.Identifiers
    config      ExperimentConfig
}

type ExperimentConfig struct {
    Name                  string
    FaultName             string
    ExperimentYAML        string
    InfraID               string
    InfraType             string
    TargetNamespace       string
    TargetLabel           string
    TargetKind            string
    FaultEnv              map[string]string
    Timeout               time.Duration
    SkipTargetDiscovery   bool
    ValidationDuringChaos ValidationFunc
    ValidationAfterChaos  ValidationFunc
    SamplingInterval      time.Duration
}

Execution Flow:
Run() →
1. getLogToken()
2. triggerExperimentWithRetry()
3. Start experimentMonitor
4. extractStreamID()
5. getTargetsFromLogs()
6. runValidationDuringChaos() [parallel]
7. waitForCompletion()
8. Validate ValidationAfterChaos

Purpose: Centralized experiment status tracking with publish-subscribe pattern.
Architecture Pattern: Observer Pattern
type experimentMonitor struct {
    experimentID string
    runResp      *experiments.ExperimentRunResponse
    identifiers  utils.Identifiers
    stopChan     chan bool
    statusChan   chan string
    subscribers  []chan string
}

Key Methods:
start(): Begin monitoring (goroutine)
subscribe(): Create subscriber channel
broadcast(status): Notify all subscribers
stop(): Signal monitoring to stop

Benefits:
Purpose: Dual-phase validation system for concrete chaos impact verification.
type ValidationFunc func(targets []string, namespace string) (bool, error)
// Returns: (passed bool, error)
Phase 1: Setup
├─ Load configuration
├─ Authenticate with API
└─ Validate environment
Phase 2: Preparation
├─ Get log stream token
├─ Resolve experiment YAML path
├─ Substitute template variables
└─ Create experiment via API
Phase 3: Execution
├─ Trigger experiment run
├─ Start status monitor
├─ Extract stream ID
└─ Discover targets from logs
Phase 4: Validation (Concurrent)
├─ Validation During Chaos (parallel)
│ ├─ Sample at intervals
│ ├─ Check fault impact
│ └─ Stop when passed/completed
└─ Wait for completion
Phase 5: Post-Validation
├─ Validation After Chaos
├─ Check recovery
└─ Final assertions
Phase 6: Cleanup
├─ Stop monitor
├─ Close channels
└─ Log results
Main Thread:
├─ Create experiment
├─ Start monitor goroutine
├─ Start target discovery goroutine
├─ Start validation goroutine [if provided]
└─ Wait for completion
Monitor Goroutine:
├─ Poll status every 5s
├─ Broadcast to subscribers
└─ Stop on terminal status
Target Discovery Goroutine:
├─ Subscribe to monitor
├─ Poll for targets every 5s
├─ Listen for failures
└─ Return when found or failed
Validation Goroutine:
├─ Subscribe to monitor
├─ Run validation at intervals
├─ Listen for completion
└─ Stop when passed or completed
Template Format: {{ VARIABLE_NAME }}
Built-in Variables:
INFRA_NAMESPACE // Infrastructure namespace
FAULT_INFRA_ID // Infrastructure ID (without env prefix)
EXPERIMENT_INFRA_ID // Full infrastructure ID (env/infra)
TARGET_WORKLOAD_KIND // deployment, statefulset, daemonset
TARGET_WORKLOAD_NAMESPACE // Target namespace
TARGET_WORKLOAD_NAMES // Specific workload names (or empty)
TARGET_WORKLOAD_LABELS // Label selector
EXPERIMENT_NAME // Experiment name
FAULT_NAME // Fault type
TOTAL_CHAOS_DURATION // Duration in seconds
CHAOS_INTERVAL // Interval between chaos actions
ADDITIONAL_ENV_VARS // Fault-specific environment variables

Custom Variables: Passed via FaultEnv map in ExperimentConfig.

1. Resource Validators
ValidatePodCPUStress(targets, namespace) (bool, error)
ValidatePodMemoryStress(targets, namespace) (bool, error)
ValidateDiskFill(targets, namespace) (bool, error)
ValidateIOStress(targets, namespace) (bool, error)

Detection Logic:
2. Network Validators
ValidateNetworkLatency(targets, namespace) (bool, error)
ValidateNetworkLoss(targets, namespace) (bool, error)
ValidateNetworkCorruption(targets, namespace) (bool, error)

Detection Methods:
3. Pod Lifecycle Validators
ValidatePodDelete(targets, namespace) (bool, error)
ValidatePodRestarted(targets, namespace) (bool, error)
ValidatePodsRunning(targets, namespace) (bool, error)

Verification:
4. Application Validators
ValidateAPIBlock(targets, namespace) (bool, error)
ValidateAPILatency(targets, namespace) (bool, error)
ValidateAPIStatusCode(targets, namespace) (bool, error)
ValidateFunctionError(targets, namespace) (bool, error)

5. Redis Validators
ValidateRedisCacheLimit(targets, namespace) (bool, error)
ValidateRedisCachePenetration(targets, namespace) (bool, error)
ValidateRedisCacheExpire(targets, namespace) (bool, error)

Direct Validation: Executes redis-cli INFO in pod, parses metrics


// Input
ExperimentConfig
↓
// API Creation
ExperimentPayload (JSON)
↓
// API Response
ExperimentResponse {ExperimentID, Name}
↓
// Run Request
ExperimentRunRequest {NotifyID}
↓
// Run Response
ExperimentRunResponse {ExperimentRunID, Status, Nodes}
↓
// Log Streaming
StreamToken + StreamID
↓
// Target Discovery
[]string (target pod names)
↓
// Validation
ValidationFunc(targets, namespace) → (bool, error)
↓
// Final Result
Test Pass/Fail with error details
RunExperiment(ExperimentConfig{
    Name:            "CPU Stress Test",
    FaultName:       "pod-cpu-hog",
    InfraID:         infraID,
    ProjectID:       projectId,
    TargetNamespace: targetNamespace,
    TargetLabel:     "app=nginx", // Customize based on your test app
    TargetKind:      "deployment",
    FaultEnv: map[string]string{
        "CPU_CORES":            "1",
        "TOTAL_CHAOS_DURATION": "60",
        "PODS_AFFECTED_PERC":   "100",
        "RAMP_TIME":            "0",
    },
    Timeout:          timeout,
    SamplingInterval: 5 * time.Second, // Check every 5 seconds during chaos
    // Verify CPU is stressed during chaos
    ValidationDuringChaos: func(targets []string, namespace string) (bool, error) {
        clientset, err := faultcommon.GetKubeClient()
        if err != nil {
            return false, err
        }
        return validations.ValidatePodCPUStress(clientset, targets, namespace)
    },
    // Verify pods recovered after chaos
    ValidationAfterChaos: func(targets []string, namespace string) (bool, error) {
        clientset, err := faultcommon.GetKubeClient()
        if err != nil {
            return false, err
        }
        return validations.ValidateTargetAppsHealthy(clientset, targets, namespace)
    },
})

While this framework is proprietary and used internally, we believe in sharing knowledge and best practices. The patterns and approaches we’ve developed can help other teams building similar testing infrastructure:
Whether you’re building a chaos engineering platform, testing distributed systems, or creating any complex testing infrastructure, these principles apply:
We hope these insights help you build better testing infrastructure for your team!
Questions? Feedback? Ideas? Join Harness community. We’d love to hear about your testing challenges and how you’re solving them!
Knowledge graphs and RAG (Retrieval-Augmented Generation) are complementary techniques for enhancing large language models with external knowledge, and each brings unique strengths for DevOps use cases. While they are often mentioned together, they are fundamentally different systems, and combining them delivers far better outcomes than relying on either approach alone.
A knowledge graph is a semantic model composed of entities and relationships that reflect how systems, services, code, environments, and people connect. These entities may come from Harness or from third-party DevOps tools. Retrieval from a knowledge graph can be:
The foundation of the knowledge graph is its semantic layer, which serves as the source of truth for the structure and meaning of the data. This semantic layer defines what an “application,” “pipeline,” “service,” “environment,” “deployment,” or “policy” means - not just how it is stored. This enforces consistent definitions across tools, eliminates ambiguity, and grounds all reasoning in shared meaning.
Because the semantic layer governs how data flows into the graph, it ensures the graph scales cleanly, remains governable, and can incorporate new tools, relationships, and metadata without becoming chaotic.
RAG, by contrast, retrieves unstructured text (documents, runbooks, incident notes, commit messages, architecture diagrams) using embedding similarity and feeds the retrieved content to an LLM. RAG does not model structure or relationships; it retrieves relevant fragments of text.
The fundamental distinction lies in structure:
This is why the two approaches excel at different types of problems.
Knowledge graphs excel at multi-hop reasoning, where answering a question requires walking multiple relationships — linking a failing service to its owning team, its CI pipeline, the associated environment, and the policies governing that environment.
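To make "multi-hop" concrete, here is a toy sketch using networkx; the entities and relationship names are hypothetical and do not reflect the Harness schema.

```python
# Toy multi-hop traversal: answering the question requires walking several
# explicit relationships, not matching keywords in text.
import networkx as nx

g = nx.DiGraph()
g.add_edge("checkout-service", "payments-team", relation="owned_by")
g.add_edge("checkout-service", "checkout-ci", relation="built_by")
g.add_edge("checkout-ci", "prod", relation="deploys_to")
g.add_edge("prod", "pci-policy", relation="governed_by")

# "Which policy ultimately governs the environment this service deploys to?"
path = nx.shortest_path(g, "checkout-service", "pci-policy")
print(" -> ".join(path))  # checkout-service -> checkout-ci -> prod -> pci-policy
```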
They offer:
The primary limitation is that a knowledge graph only knows about the data it models.
RAG systems shine when working with unstructured information at scale. They are excellent for:
However, RAG struggles with questions that require:
RAG retrieves text. It does not understand structure.
Modern DevOps AI systems increasingly combine both approaches:
The result is retrieval and reasoning that are not only relevant but also organized, contextualized, and aligned with the real structure of the software delivery environment.
DevOps environments are inherently relationship-heavy: pipelines, services, environments, teams, approvals, policies, artifacts, and dependencies all interact tightly.

A knowledge graph captures these interactions explicitly.
The semantic layer ensures that as systems evolve, definitions remain consistent.
This gives AI agents true organizational context — not just textual familiarity.
With a graph-backed semantic model, agents can reason about:
This is essential for generating pipelines, validating changes, automating deployments, and performing impact analysis.
RAG is excellent for retrieving documentation, API references, runbooks, and historical incidents. But it cannot reliably infer:
RAG retrieves text; it does not reason across structured relationships.
This limits RAG-only approaches to “chatbots over docs,” which is useful but insufficient for deeper automation.
A hybrid system uses both unstructured retrieval (RAG) and structured context (knowledge graph) to produce highly accurate, domain-aware answers. The semantic layer ensures that the graph remains consistent and scalable even as the organization grows.
This combination enables:
Knowledge graphs — and especially the semantic layer behind them — benefit the entire engineering ecosystem, not just AI.
They provide:
AI simply leverages this foundation to become more grounded, less error-prone, and deeply contextual.
Harness uses a Software Delivery Knowledge Graph built on a semantic model that continuously synchronizes entities and relationships across Harness modules and third-party DevOps tools. The semantic layer defines meaning and ensures structure, while RAG enriches the system with unstructured context.
This enables AI agents to:

Results include:
This is possible because the system blends semantic structure (knowledge graph), meaning (semantic layer), and breadth of context (RAG), producing far more reliable DevOps automation than any single method alone. We'll be writing more about the Knowledge Graph in upcoming blog posts.


When I look back at how Harness Database DevOps came to life, it feels less like building a product and more like solving a collective industry puzzle, one piece at a time. Every engineer, DBA, and DevOps practitioner I met had their own version of the same story: application delivery had evolved rapidly, but databases were still lagging behind. Schema changes were risky, rollbacks were manual, and developers hesitated to touch the database layer for fear of breaking something critical.
That was where our journey began, not with an idea, but with a question: “What if database delivery could be as effortless, safe, and auditable as application delivery?”
At Harness, we’ve always been focused on making software delivery faster, safer, and more developer-friendly. But as we worked with enterprises across industries, one recurring gap became clear: while teams were automating CI/CD pipelines for applications, database changes were still handled in silos.
The process was often manual: SQL scripts being shared over email, version control inconsistencies, and late-night hotfixes that no one wanted to own. Even with existing tools, there was a noticeable disconnect between database engineers, developers, and platform teams. The result was predictable - slow delivery cycles, high change failure rates, and limited visibility.
We didn’t want to simply build another migration tool. We wanted to redefine how databases fit into the modern CI/CD narrative, how they could become first-class citizens in the software delivery pipeline.
Before writing a single line of code, we started by listening to DBAs, developers, and release engineers who lived through these challenges every day.
Our conversations revealed a few consistent pain points:
We also studied existing open-source practices. Many of us were active contributors or long-time users of Liquibase, which had already set strong foundations for schema versioning. Our goal was not to replace those efforts, but to learn from them, build upon them, and align them with the Harness delivery ecosystem.
That’s when the real learning began, understanding how different organizations implement Liquibase, how they handle rollbacks, and how schema evolution differs between teams using PostgreSQL, MySQL, or Oracle.
This phase of research and contribution provided us with valuable insights: while the tooling existed, the real challenge was operational, integrating database changes into CI/CD pipelines without friction or risk.
Armed with insights, we began sketching the first blueprints of what would eventually become Harness Database DevOps. Our design philosophy was simple:
Early prototypes focused on automating schema migration, enforcing policy compliance, and building audit trails for database changes. But we soon realized that wasn’t enough.
Database delivery isn’t just about applying migrations; it’s about governance, visibility, and confidence. Developers needed fast feedback loops; DBAs needed assurance that governance was intact; and platform teams needed to integrate it into their broader CI/CD fabric. That realization reshaped our vision entirely.
We started with the fundamentals: source control and pipelines. Every database change, whether a script or a declarative state definition, needed to be versioned, automatically tested, and traceable.
To make this work at scale, we leveraged script-based migrations. This allowed teams to track the actual change scripts applied to reach that state, ensuring alignment and transparency. The next challenge was automation. We wanted pipelines that could handle complex database lifecycles, provisioning instances, running validations, managing approvals, and executing rollbacks, all within a CI/CD workflow familiar to developers.
This was where the engineering creativity of our team truly shined. We integrated database delivery into Harness Pipelines, enabling one-click deployments and policy-driven rollbacks with complete auditability.
Our internal mantra became: “If it’s repeatable, it’s automatable.”
Our first internal release was both exciting and humbling. We quickly learned that every organization manages database delivery differently. Some teams followed strict change control. Others moved fast and valued agility over structure.
To bridge that gap, we focused on flexibility, which allowed teams to define their own workflows, environments, and policies while keeping governance seamlessly built in.
We also realized the importance of observability. Teams didn’t just want confirmation that a migration succeeded; they wanted to understand “why something failed”, “how long it took”, and “what exactly changed” behind the scenes.
Each round of feedback, from customers and our internal teams, helped us to refine the product further. Every iteration made it stronger, smarter, and more aligned with real-world engineering needs. And the journey wasn’t just about code; it was about collaboration and teamwork. Here’s how Harness Database DevOps connects every role in the database delivery lifecycle.
Behind every release stood a passionate team: engineers, product managers, customer success engineers, and developer advocates, all with a shared mission: to make database delivery seamless, safe, and scalable.
We spent long nights debating rollback semantics, early mornings testing changelog edge cases, and countless hours perfecting pipeline behavior under real workloads. It wasn’t easy, but it mattered.
This wasn’t just about building software; it was about building trust between developers and DBAs, between automation and human oversight. When we finally launched Harness Database DevOps, it didn’t feel like a product release. It felt like the beginning of something bigger, a new way to bring automation and accountability to database delivery.
What makes us proud isn’t just the technology. It’s “how we built it”, with empathy, teamwork, and a deep partnership with our customers from day one. Together with our design partners, we shaped every iteration to ensure what we were building truly reflected their needs and that database delivery could evolve with the same innovation and collaboration that define the rest of DevOps.
After months of iteration, user testing, and refinements, Harness Database DevOps entered private beta in early 2024. The excitement was immediate. Teams finally saw their database workflows appear alongside application deployments, approvals, and governance checks, all within a single pipeline.
During the beta, more than thirty customers participated, offering feedback that directly shaped the product. Some asked for folder-based trunk deployments. Others wanted deeper rollback intelligence. Some wanted Harness to help their developers design and author changes in the first place. Many just wanted to see what was happening inside their database environments.
By the time general availability rolled around, Database DevOps had evolved into a mature platform, not just a feature. It offered migration state tracking, rollback mechanisms, environment isolation, policy enforcement, and native integration with the Harness ecosystem.
But more importantly, it delivered something intangible: trust. Teams could finally move faster without sacrificing control.
Database DevOps is still an evolving space. Every new integration, every pipeline enhancement, every database engine we support takes us closer to a world where managing schema changes is as seamless as deploying code.
Our mission remains the same: to help teams move fast without breaking things, to give developers confidence without compromising governance, and to make database delivery as modern as the rest of DevOps.
And as we continue this journey, one thing is certain: the story of Harness Database DevOps isn’t just about a product. It’s about reimagining what’s possible when empathy meets engineering.
From its earliest whiteboard sketch to production pipelines across enterprises, Harness Database DevOps is the product of curiosity, collaboration, and relentless iteration. It was never about reinventing databases. It was about rethinking how teams deliver change, safely, visibly, and confidently.
And that journey, from concept to reality, continues every day with every release, every migration, and every team that chooses to make their database a part of DevOps.


Are you still using Terraform without realizing the party has already moved on?
For years, Terraform was the default language of Infrastructure as Code (IaC). It offered predictability, community, and portability across cloud providers. But then, the music stopped. In 2023, HashiCorp changed Terraform’s license from Mozilla Public License (MPL) to the Business Source License (BSL), a move that put guardrails around what users and competitors could do with the code.
That shift opened a door for something new and truly open.
That “something” is OpenTofu.
And if you’re not already using or contributing to it, you’re missing your chance to help shape the future of infrastructure automation.
OpenTofu didn’t just appear out of thin air. It was born from community demand, a collective realization that Terraform’s BSL license could limit the open innovation that made IaC thrive in the first place.
So OpenTofu forked from Terraform’s last open source MPL version and joined the Linux Foundation, ensuring that it would remain fully open, community-governed, and vendor-neutral. A true Terraform alternative.
Unlike Terraform’s now-centralized governance, OpenTofu’s roadmap is decided by contributors, people building real infrastructure at real companies, not by a single commercial entity.
That means if you depend on IaC tools to build and scale your environments, your voice actually matters here.
OpenTofu is not a “different tool.” It’s a continuation, the same HCL syntax, same workflows, and same mental model, but under open governance and a faster, community-driven release cadence.
Let’s break down the Terraform vs OpenTofu comparison:

It’s still Terraform-compatible. You can take your existing configurations and run them with OpenTofu today. But beyond compatibility, OpenTofu is already moving faster and more freely, prioritizing developer-requested features that a commercial model might not. Some key examples of its true power and longevity include:
Packaging and sharing modules or providers privately has always been clunky. You either ran your own registry or relied on Terraform Cloud.
OpenTofu solves this with OCI Registries, i.e. using the same open container standard that Docker uses.
It’s clean, familiar, and scalable.
Your modules live in any OCI-compatible registry (Harbor, Artifactory, ECR, GCR, etc.), complete with built-in versioning, integrity checks, and discoverability. No proprietary backend required.
For organizations managing hundreds of modules or providers, this is a big deal. It means your IaC supply chain can be secured and audited with the same standards you already use for container images.
Secrets in your Terraform state have always been a headache.
Even with remote backends, you’re still left with the risk of plaintext credentials or keys living inside the state file.
OpenTofu is the only mainstream IaC framework with built-in state encryption at rest.
You can define an encryption block directly in your configuration, and OpenTofu encrypts the state transparently, with no custom wrapper scripts or external encryption logic required.
It also supports multiple key providers (AWS KMS, GCP KMS, Azure Key Vault, and more).
Coming soon in OpenTofu 1.11 (beta): ephemeral resources.
This feature lets providers mark sensitive data as transient so it never touches your state file in the first place. That’s a security level no other mainstream IaC tool currently offers.
OpenTofu’s most powerful feature isn’t in its code, it’s in its process.
Every proposal goes through a public RFC. Every contributor has a say. Every decision is archived and transparent.
If you want a feature, you can write a proposal, gather community feedback, and influence the outcome.
Contrast that with traditional vendor-driven roadmaps, where features are often prioritized by product-market fit rather than user need.
That’s what “being late to the party” really means: you miss your seat at the table where the next decade of IaC innovation is being decided.
Being early in an open-source ecosystem isn’t about bragging rights, it’s about influence.
OpenTofu is already gaining serious traction:
If you join later, you’ll still get the code. But you won’t get the same opportunity to shape it.
The longer you wait, the more you’ll be reacting to other people’s decisions instead of helping make them.
Migrating is a one-liner!
The OpenTofu migration guide shows that most users can simply install the tofu CLI and reuse their existing Terraform files.
It’s the same commands, same workflow, but under an open license. You can even use your existing Terraform state files directly; no conversion step required.
For teams already managing infrastructure at scale, the move to OpenTofu doesn’t just preserve your workflow, it future-proofs it.
When you’re ready to bring OpenTofu into a managed, collaborative environment, Harness Infrastructure as Code Management (IaCM) has you covered.
Harness IaCM natively supports both Terraform and OpenTofu. You can create a workspace, select your preferred binary, and run init, plan, and apply pipelines without changing your configurations.
That means you can:
Harness essentially gives you the sandbox to explore OpenTofu’s potential, whether you’re testing ephemeral resource behavior or building private OCI registries for module distribution.
So while the OpenTofu community defines the standards, Harness ensures you can implement them securely and at scale.
The real magic of OpenTofu lies in participation.
If you’ve ever complained about Terraform limitations, this is your moment to shape the alternative.
You can:
Everything lives in the open on the OpenTofu Repository.
Even reading a few discussions there shows how open, constructive, and fast-moving the community is.
The IaC landscape is changing, and this time, the direction isn’t being set by a vendor, but by the community.
OpenTofu brings us back to the roots of open-source infrastructure: collaboration, transparency, and freedom to innovate.
It’s more than a fork, it’s a course correction.
If you’re still watching from the sidelines, remember: the earlier you join, the more your voice matters.
The OpenTofu party is already in full swing.
Grab your seat at the table, bring your ideas, and help build the future of IaC, before someone else decides it for you.


An airgapped environment enforces strict outbound policies, preventing external network communication. This setup enhances security but presents challenges for cross-cloud data synchronization.
A proxy server is a lightweight, high-performance intermediary facilitating outbound requests from workloads in restricted environments. It acts as a bridge, enabling controlled external communication.
ClickHouse is an open-source, column-oriented OLAP (Online Analytical Processing) database known for its high-performance analytics capabilities.
This article explores how to seamlessly sync data from BigQuery, Google Cloud’s managed analytics database, to ClickHouse running in an AWS-hosted airgapped Kubernetes cluster using proxy-based networking.
Deploying ClickHouse in airgapped environments presents challenges in syncing data across isolated cloud infrastructures such as GCP, Azure, or AWS.
In our setup, ClickHouse is deployed via Helm charts in an AWS Kubernetes cluster, with strict outbound restrictions. The goal is to sync data from a BigQuery table (GCP) to ClickHouse (AWS K8S), adhering to airgap constraints.
The solution leverages a corporate proxy server to facilitate communication. By injecting a custom proxy configuration into ClickHouse, we enable HTTP/HTTPS traffic routing through the proxy, allowing controlled outbound access.


Observed proxy logs confirming outbound requests were successfully relayed to GCP.

Left window shows query to BigQuery and right window shows proxy logs — the request forwarding through proxy server
This approach successfully enabled secure communication between ClickHouse (AWS) and BigQuery (GCP) in an airgapped environment. The use of a ConfigMap-based proxy configuration made the setup:
By leveraging ClickHouse’s extensible configuration system and Kubernetes, we overcame strict network isolation to enable cross-cloud data workflows in constrained environments. This architecture can be extended to other cloud-native workloads requiring external data synchronization in airgapped environments.



Databases have been crucial to web applications since their beginning, serving as the core storage for all functional aspects. They manage user identities, profiles, activities, and application-specific data, acting as the authoritative source of truth. Without databases, the interconnected information driving functionality and personalized experiences would not exist. Their integrity, performance, and scalability are vital for application success, and their strategic importance grows with increasing data complexity. In this article we are going to show you how you can leverage feature flags to compare different databases.
Let’s say you want to test and compare two different databases against one another. A common use case could be to compare the performance of two of the most popular open source databases: MariaDB and PostgreSQL.


MariaDB and PostgreSQL logos
Let’s think about how we want to do this. We want to compare the experience of our users with these different databases. In this example we will be doing a 50/50 experiment. In a production environment doing real testing, you would in all likelihood already use one database and roll out a very small percentage to the other, such as a 90/10 (or even 95/5) split, to reduce the blast radius of potential issues.
To do this experiment, first, let’s make a Harness FME feature flag that distributes users 50/50 between MariaDB and PostgreSQL

Now for this experiment we need to have a reasonable amount of sample data in the db. In this sample experiment we will actually just load the same data into both databases. In production you’d want to build something like a read replica using a CDC (change data capture) tool so that your experimental database matches your production data.
Our code will generate 100,000 rows for this data table and load it into both databases before the experiment. This is not big enough to cause issues with query speed, but it is big enough to surface differences between the database technologies. The table also covers three different data types — text (varchar), numbers, and timestamps.
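For illustration, a data generator along these lines is enough (the column names here are hypothetical, not the exact schema we used):

```python
# Generate 100,000 rows mixing text, numeric, and timestamp values, ready to
# bulk-insert into both MariaDB and PostgreSQL with executemany().
import random
import string
from datetime import datetime, timedelta

def random_row(i: int):
    label = "".join(random.choices(string.ascii_lowercase, k=12))      # varchar
    amount = round(random.uniform(1, 10_000), 2)                       # numeric
    created = datetime(2024, 1, 1) + timedelta(minutes=random.randint(0, 525_600))
    return (i, label, amount, created)                                 # timestamp

rows = [random_row(i) for i in range(1, 100_001)]
# cursor.executemany("INSERT INTO sample_data VALUES (%s, %s, %s, %s)", rows)
```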
Now let’s make a basic app that simulates making our queries. Using Python we will make an app that executes queries from a list and displays the result.
Below you can see the basic architecture of our design. We will run MariaDB and Postgres on Docker and the application code will connect to both, using the Harness FME feature flag to determine which one to use for the request.

The sample queries we used can be seen below. We are using 5 queries with a variety of SQL keywords. We include joins, limits, ordering, functions, and grouping.
We use the Harness FME SDK to do the decisioning here for our user id values. It will determine if the incoming user experiences the Postgres or MariaDB treatment using the get_treatment method of the SDK based upon the rules we defined in the Harness FME console above.
Afterwards, within the application, we will run the query and then track the query_execution event using the SDK’s track method.
See below for some key parts of our Python based app.
This code will initialize our Split (Harness FME) client for the SDK.
We will generate a sample user ID, just with an integer from 1–10,000
Now we need to determine whether our user will be using Postgres or MariaDB. We also do some defensive programming here to ensure we fall back to a default if the treatment is not either postgres or mariadb.
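Condensed, those pieces look roughly like the sketch below, based on the Split (Harness FME) Python SDK; the SDK key and flag name are placeholders.

```python
# Minimal sketch of SDK setup, simulated user key, and treatment lookup.
import random
from splitio import get_factory

factory = get_factory("YOUR_SDK_KEY")
factory.block_until_ready(5)               # wait until the SDK has fetched flags
client = factory.client()

user_id = str(random.randint(1, 10000))    # simulated user key

treatment = client.get_treatment(user_id, "database_split")  # placeholder flag name
if treatment not in ("postgres", "mariadb"):
    treatment = "mariadb"                  # defensive default
```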
Now let’s run the query and track the query_execution event. From the app you can select the query you want to run, or if you don’t it’ll just run one of the five sample queries at random.
The db_manager class handles maintaining the connections to the databases as well as tracking the execution time for the query. Here we can see it using Python’s time to track how long the query took. The object that the db_manager returns includes this value.
Tracking the event allows us to see the impact of which database was faster for our users. The signature for the Harness FME SDK’s track method includes both a value and properties. In this case we supply the query execution time as the value and the actual query that ran as a property of the event, which can be used later on for filtering and, as we will see, dimensional analysis.
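A hedged sketch of the timing and tracking step, continuing from the previous snippet; db_manager, its run_query helper, and the sql string are simplified stand-ins for the real code.

```python
# Run the chosen query against the selected database and report the execution
# time as a query_execution event, with the SQL text attached as a property.
import time

start = time.perf_counter()
rows = db_manager.run_query(treatment, sql)            # hypothetical helper
execution_ms = (time.perf_counter() - start) * 1000

client.track(
    user_id,                # key
    "user",                 # traffic type (matches the metric definition)
    "query_execution",      # event type
    execution_ms,           # value: how long the query took
    {"query": sql},         # property used later for dimensional analysis
)
```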
You can see a screenshot of what the app looks like below. There’s a simple bootstrap themed frontend that does the display here.

app screenshot
The last step here is that we need to build a metric to do the comparison.
Here we built a metric called db_performance_comparison. In this metric we set up our desired impact — we want the query time to decrease. Our traffic type is user.

Metric configuration
One of the most important questions is what we will select for the Measure as option. Here we have a few options, as can be seen below

Measure as options
We want to compare across users, and are interested in faster average query execution times, so we select Average of event values per user. Count, sum, ratio, and percent don’t make sense here.
Lastly, we are measuring the query_execution event.
We added this metric as a key metric for our db_performance_comparison feature flag.

Selection of our metric as a key metric
One additional thing we will want to do is set up dimensional analysis, as we mentioned above. Dimensional analysis will let us drill down into the individual queries to see which one(s) were more or less performant on each database. We can have up to 20 values in here. If we’ve already been sending events, they can simply be selected, as we keep track of them internally — otherwise, we will input our queries here.

selection of values for dimensional analysis
Now that we have our dimensions, our metric, and our application set to use our feature flag, we can now send traffic to the application.
For this example, I’ve created a load testing script that uses Selenium to load up my application. This will send enough traffic so that I’ll be able to get significance on my db_performance_comparison metric.
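A bare-bones version of that script looks something like this; the URL and element id are placeholders for the demo app.

```python
# Drive the demo app with many simulated users so the experiment reaches
# statistical significance.
import random
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
for _ in range(500):
    user_id = random.randint(1, 10000)
    driver.get(f"http://localhost:5000/?user_id={user_id}")   # placeholder URL
    driver.find_element(By.ID, "run-query").click()           # placeholder element id
driver.quit()
```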
I got some pretty interesting results. If we look at the metrics impact screen, we can see that Postgres resulted in an 84% drop in query time.


Even more, if we drill down to the dimensional analysis for the metric, we can see which queries were faster and which were actually slower using Postgres.

So some queries were faster and some were slower, but the faster queries were MUCH faster. This allows you to pinpoint the performance you would get by changing database engines.
You can also see the statistics in a table below — seems like the query with the most significant speedup was one that used grouping and limits.

However, the query that used a join was much slower in Postgres — you can see it’s the query that starts with SELECT a.i... (since we are doing a self-join, the table alias is a). The query that uses EXTRACT (an SQL date function) is nearly 56% slower as well.
In summary, running experiments on backend infrastructure like databases using Harness FME can yield significant insights and performance improvements. As demonstrated, testing MariaDB against PostgreSQL revealed an 84% drop in query time with Postgres. Furthermore, dimensional analysis allowed us to identify specific queries that benefited the most, specifically those involving grouping and limits, and which queries were slower. This level of detailed performance data enables you to make informed decisions about your database engine and infrastructure, leading to optimization, efficiency, and ultimately, better user experience. Harness FME provides a robust platform for conducting such experiments and extracting actionable insights. For example — if we had an application that used a lot of join based queries or used SQL date functions like EXTRACT it may end up showing that MariaDB would be faster than Postgres and it wouldn’t make sense to consider a migration to it.
The full code for our experiment lives here: https://github.com/Split-Community/DB-Speed-Test



Modern DevOps processes are essential for ensuring efficient, reliable, and scalable software delivery. However, managing infrastructure, CI/CD pipelines, monitoring, and incident response remains a complex and time-consuming challenge for many organizations. These tasks require continuous tuning, configuration management, and rapid troubleshooting, making DevOps resource-intensive. As software systems grow in complexity, manual intervention becomes a bottleneck, increasing the risk of human error, inefficiencies, and slower deployments. This is where automation becomes a necessity, helping teams streamline workflows, reduce operational overhead, and improve deployment velocity.
The rise of artificial intelligence, particularly large language models (LLMs), has opened new possibilities for automating various aspects of software development and operations. By leveraging AI, organizations can enhance efficiency, reduce manual effort, and accelerate software delivery. LLMs bring the potential to transform DevOps by enabling intelligent automation, improving decision-making, and making systems more adaptive to changing requirements.
Our AI engineering team has been at the forefront of integrating AI into DevOps workflows. From AI-powered CI/CD optimizations to intelligent deployment strategies, we continuously explore ways to leverage AI for greater efficiency. In this blog, we share our journey in evaluating LLMs for DevOps automation, benchmarking their performance, and understanding their impact on software delivery workflows.
Before diving into the evaluation, let’s first outline the specific problem we aim to solve using large language models. (Note: In this post, I won’t go into the underlying architecture of the Harness AI DevOps Agent — stay tuned for a future blog post on that!)
Our exploration begins with the task of pipeline generation. Specifically, the AI DevOps Agent takes a user command describing the desired pipeline as input, along with relevant context information. The expected output is a pipeline YAML file generated by the AI DevOps agent, which is composed of multiple sub-agents, automating the configuration process and streamlining DevOps workflows. An example user command and the resulting YAML pipeline would be:
“Create an IACM pipeline to do create a IACM init and plan”
Response:
For simplicity, we conducted the first phase of our evaluations by focusing on generating a single step of the pipeline. Additionally, we explored two different solution designs for utilizing LLMs:
In this blog post, we focus on the generation use case — specifically, creating pipeline steps, stages, and related configurations — and introduce the metrics used to evaluate the performance of different models for this task. Our evaluations are conducted against a benchmark dataset with a known ground truth. Specifically, we have curated a dataset consisting of user commands for creating pipeline steps and their corresponding YAML configurations. Using this benchmark data, we have developed a set of metrics to assess the quality of AI-generated YAML outputs in response to user prompts.
Since we are evaluating AI-generated pipelines against known, predefined pipelines, the comparison ultimately involves measuring the differences between two YAML files. To accomplish this, we leverage and build upon DeepDiff, a framework for computing the structural differences between key-value objects. DeepDiff is conceptually inspired by Levenshtein Edit Distance, making it well-suited for quantifying variations between YAML configurations and assessing how closely the generated output matches the expected pipeline definition.
At its core, DeepDiff quantifies the difference between two objects by determining the number of operations required to transform one into the other. This difference is then normalized to produce a similarity score between 0 and 1, providing a structured way to compare data. While we utilize the standard DeepDiff library as one of our evaluation metrics, we have also developed two modified versions tailored specifically for comparing step YAMLs. These adaptations address the unique challenges of our use case, ensuring a more precise and meaningful assessment of AI-generated pipeline configurations.
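As a rough illustration of the core comparison (assuming PyYAML and deepdiff are installed; the file names and the similarity formula here are simplified placeholders, not our exact scoring):

```python
# Compare a generated step YAML against the ground truth and turn the
# structural difference into a 0-to-1 similarity score.
import yaml
from deepdiff import DeepDiff

with open("ground_truth_step.yaml") as f:
    expected = yaml.safe_load(f)
with open("llm_generated_step.yaml") as f:
    generated = yaml.safe_load(f)

diff = DeepDiff(expected, generated, ignore_order=True, get_deep_distance=True)
distance = diff.get("deep_distance", 0.0)   # 0.0 means the objects are identical
similarity = 1.0 - distance                 # normalized similarity in [0, 1]

print(f"similarity: {similarity:.3f}")
print(diff.get("values_changed", {}))       # inspect field-level mismatches
```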
In particular, we have introduced:
Benchmark Dataset
Let’s first introduce the benchmark data used for this study.
At Harness, our QA team generates numerous sample pipelines using automation tools such as APIs and Terraform Providers to simulate customer use cases and various Harness configurations. These pipelines play a crucial role in sanity testing, ensuring that when a new version of Harness is released, all steps, stages, and pipelines continue to function as expected.
For this study, we leveraged this data to create a benchmark dataset of 115 step YAMLs. For each example, we manually added a potential user command that could generate the corresponding step. The same user command was then used to generate a step YAML using an LLM. The AI-generated solutions were subsequently compared against the original YAML file to evaluate accuracy and quality.
Below is an example of a user command and its corresponding YAML file, which serves as the ground truth in our evaluation:
User Command: “Please add a Terraform plan step to the pipeline.”
Ground Truth YAML:
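For illustration, the ground truth for this command looks roughly like the sketch below. The exact identifiers and spec fields in our benchmark differ, so treat this as a sketch of the shape rather than the literal benchmark entry.

```yaml
# Sketch of a Harness Terraform plan step; values are illustrative.
- step:
    type: TerraformPlan
    name: Terraform Plan
    identifier: terraform_plan
    timeout: 10m
    spec:
      provisionerIdentifier: demo_provisioner
      configuration:
        command: Apply
        configFiles:
          store:
            type: Github
            spec:
              connectorRef: my_github_connector
              branch: main
              folderPath: terraform/
        secretManagerRef: harnessSecretManager
```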
This YAML structure represents the expected output when an LLM generates a pipeline step based on the given user command. The AI-generated YAML will be evaluated against this reference to assess its accuracy and quality.
We evaluated both an agentic framework and direct model calls for utilizing LLMs in pipeline generation. The selection of models for each approach was based on the technical adaptability of the frameworks we used. For example, AutoGen supports only a limited set of LLMs, which influenced our model choices for the agentic framework.
As a result, there isn’t a one-to-one correspondence between the models used in the agentic framework and those used in direct calls. However, there is significant overlap between the two sets.
This comparison allows us to assess how different models and methodologies perform in generating high-quality DevOps pipeline configurations.
The figure below illustrates the performance of each model based on the three evaluation metrics introduced earlier. Models that are called using an agentic framework are prefixed with “Autogen_” in the results.
Our findings indicate that using an agentic framework significantly improves response quality across all three metrics. However, AutoGen does not yet support DeepSeek models, so for these models, we only report their performance when called directly.

LLM Performance Comparison for Pipeline Step Generation
In order to gain deeper insights into the scores, we also visualize the number of samples that failed the schema verification step, where a zero score is assigned to such cases. This highlights instances where models struggle to generate valid YAML structures:

Schema Verification Failures Across Models
The plot above clearly demonstrates the effectiveness of an agentic framework with a dedicated schema verification agent. Notably, none of the models within the agentic framework produced outputs that failed schema validation.
Our evaluation of LLMs for DevOps automation provided valuable insights into their strengths, limitations, and practical applications. Below are some key takeaways:


Written by Deba Chatterjee, Gurashish Brar, Shubham Agarwal, and Surya Vemuri

Can an AI agent test your enterprise banking workflow without human help? We found out. AI-powered test automation is poised to become the de facto method for engineering teams to validate applications. Following our previous work exploring AI operations on the web and test automation capabilities, we expanded our evaluation to include agents from the leading model providers executing web tasks. In this latest benchmark, we evaluate how well top AI agents, including OpenAI Operator and Anthropic Computer Use, perform real-world enterprise scenarios. From banking applications to audit trail navigation, we tested 22 tasks inspired by our customers and users.

Our journey began with introducing a framework to benchmark AI-powered web automation solutions. We followed up with a direct comparison between our AI Test Automation and browser-use. This latest evaluation extends our research by incorporating additional enterprise-focused tasks inspired by the demands of today’s B2B applications.
Business applications present unique challenges for agents performing tasks through web browser interactions. They feature complex workflows, specialized interfaces, and strict security requirements. Testing these applications demands precision, adaptability, and repeatability — the ability to navigate intricate UIs while maintaining consistent results across test runs.
To properly evaluate each agent, we expanded our original test suite with three additional tasks: a banking deposit workflow, audit trail navigation in a data-rich table, and a chat interaction verified against a conversation history.
These additions brought the total test suite to 22 distinct tasks varying in complexity and domain specificity.

User tasks and Agent results
The four solutions performed very differently, especially on complex tasks. Our AI Test Automation led with an 86% success rate, followed by browser-use at 64%, while OpenAI Operator and Anthropic Computer Use achieved 45% and 41% success rates, respectively.
The performance varies as tasks interact with complex artifacts such as calendars, information-rich tables, and chat interfaces.
As in our previous research, each agent executed its tasks on popular browsers, i.e., Firefox and Chrome. And although OpenAI Operator required some user interaction, no additional manual help or intervention was provided beyond the evaluation task itself.
The first additional task involves banking. The instructions include logging into a demo banking application, depositing $350 into a checking account, and verifying the transaction. Each solution must navigate the site without prior knowledge of the interface.
Our AI Test Automation completed the workflow, correctly selecting the family checking account and verifying that the $350 deposit appeared in the transaction history. Browser-use struggled with account selection and failed to complete the deposit action. Both Anthropic Computer Use and OpenAI Operator encountered login issues. Neither solution progressed past the initial authentication step.

Finding audit trail records in a table full of data is a common enterprise requirement. We challenged each solution to navigate Harness’s Audit Trail interface to locate two-day-old entries. The AI Test Automation solution navigated to the Audit Logs and paged through the table to identify the two-day-old entries. Browser-use reached the audit log UI but failed to paginate to the requested records. Anthropic Computer Use did not scroll far enough to find the Audit Trail tile; the default browser resolution is a limiting factor for it. OpenAI Operator found the two-day-old audit logs.
This task demonstrates that handling information-rich tables remains challenging for browser automation tools.

The third additional task involves a messaging application. The intent is to initiate a conversation with a bot and verify the conversation in a history table. This task incorporates browser interaction and verification logic.
The AI Test Automation solution completed the chat interaction and correctly verified the conversation’s presence in the history. Browser-use also completed this task. Anthropic Computer Use, on the other hand, was unable to start a conversation. OpenAI Operator initiated the conversation but never sent a message, so a new conversation did not appear in the history.
This task reveals varying levels of sophistication in executing multi-step workflows with validation.

Several factors contribute to the performance differences observed:
Specialized Architecture: Harness AI Test Automation leverages multiple agents designed for software testing use cases. Each agent has varying levels of responsibility, from planning to handling special components like calendars and data-intensive tables.
Enterprise Focus: Harness AI Test Automation is designed with enterprise use cases in mind, accounting for requirements common to business applications such as complex workflows, specialized UI components, and strict security and repeatability expectations.
Task Complexity: Browser-use, Anthropic Computer Use, and OpenAI Operator can execute many of the tasks, but as complexity increases, the performance gap widens significantly.
Our evaluation demonstrates that while all four solutions handle basic web tasks, the performance diverges when faced with more complex tasks and web UI elements. In such a fast-moving environment, we will continue to evolve our solution to execute more use cases. We will stay committed to tracking performance across emerging solutions and sharing insights with the developer community.
At Harness, we continue to enhance our solution to meet enterprise challenges. Promising enhancements to the product include self-diagnosis and tighter CI/CD integrations. Intent-based software testing is easier to write, more adaptable to updates, and easier to maintain than classic solutions. We continue to enhance our AI Test Automation solution to address the unique challenges of enterprise testing, empowering development teams to deliver high-quality software confidently. After all, we’re obsessed with empowering developers to do what they love: ship great software.


As cloud adoption continues to rise, efficient cost management demands a robust and automated strategy. Native cloud provider recommendations, while helpful, often have limitations — they primarily focus on vendor-specific optimizations and may not fully align with unique business requirements. Additionally, cloud providers have little incentive to highlight cost-saving opportunities beyond a certain extent, making it essential for organisations to implement customised, independent cost optimization strategies.
At Harness, we developed a Policy-Based Cloud Cost Optimization Recommendations Engine that is highly customisable and operates across AWS, Azure, and Google Cloud. This engine leverages YAML-based policies powered by Cloud Custodian, allowing organisations to define and execute cost-saving rules at scale. The system continuously analyses cloud resources, estimates potential savings, and provides actionable recommendations, ensuring cost efficiency across cloud environments.
Cloud Custodian, an open-source CNCF-backed tool, is at the core of our policy-based engine. It enables defining governance rules in YAML, which are then executed as API calls against cloud accounts. This allows seamless policy execution across different cloud environments.
The system relies on detailed billing and usage reports from cloud providers to calculate cost savings: the AWS Cost and Usage Report (CUR), the Azure Billing Report, and GCP Cost Usage Data.
The solution leverages Cloud Custodian to define YAML-based policies that identify cloud resources based on specific filters. The cost of these resources is retrieved from relevant cost data sources (AWS Cost and Usage Report (CUR), Azure Billing Report, and GCP Cost Usage Data). The identified cost is then multiplied by the predefined savings percentage to estimate the potential savings from the recommendation.

The diagram above illustrates the workflow of the recommendation engine. It begins with user-defined or Harness-defined cloud custodian policies, which are executed across various accounts and regions. The Harness application processes these policies, fetches cost data from cloud provider reports (AWS CUR, Azure Billing Report, GCP Cost Usage Data), and computes savings. The final output is a set of cost-saving recommendations that help users optimize their cloud spending.
Below is an example YAML rule that deletes unattached Amazon Elastic Block Store (EBS) volumes. When this policy is executed against any account and region, it filters out and deletes all unattached EBS volumes.
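A representative policy of that kind, written in standard Cloud Custodian syntax, looks something like this; the rule shipped in Harness may differ in naming and filters.

```yaml
policies:
  - name: delete-unattached-ebs-volumes
    resource: aws.ebs
    description: Identify and delete EBS volumes that are not attached to any instance
    filters:
      - Attachments: []     # no attachments means the volume is unattached
      - State: available    # only volumes not currently in use
    actions:
      - delete
```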
Harness CCM’s Policy-Based Recommendation Engine offers an intelligent, automated, and scalable approach to optimizing cloud costs. Unlike native cloud provider tools, it is designed for multi-cloud environments, allowing organisations to define custom cost-saving policies and gain transparent, data-driven insights for continuous optimization.
With over 50 built-in policies and full support for user-defined rules, Harness enables businesses to maximise savings, enhance cost visibility, and automate cloud cost management at scale. By reducing unnecessary cloud spend, companies can reinvest those savings into innovation, growth, and core business initiatives — rather than increasing the profits of cloud vendors.
Sign up for Harness CCM today and experience the power of automated cloud cost optimization firsthand!


It’s 2025 and if you work as a software engineer, you probably have access to an AI coding assistant at work. In this blog, I’ll share with you my experience working on a project to change the API endpoints of an existing codebase while making heavy use of an AI code assistant.
There’s a lot of research making claims about how AI code assistants affect the day-to-day work of a software engineer, and frankly, the picture is as clear as mud. Many people also have their own experience of AI tooling causing massive headaches: ‘AI slop’ that is difficult to understand, only tangentially related to the original problem they were trying to address, filling up the codebase and making it impossible to understand what the code is (or is supposed to be) doing.
I was part of the Split team that was acquired by Harness in Summer 2024. I had been maintaining an API wrapper for the Split APIs for a few years at this point. This allowed our users to take their existing Python codebases and easily automate management of Split feature flags, users, groups, segments, and other administrative entities. We were getting about 12–13,000 downloads per month. Not an enormous amount of traffic, but not bad for someone who’s not officially on a Software Engineering team.
The Python API client is structured so that instantiating it constructs a client class holding a shared API key and an optional base URL configuration. Each API is served by what is called a ‘microclient’, which handles the behavior of that endpoint and returns a resource of that type during create, read, and update calls.

API Client Architecture

Example showing the call sequence of instantiating the API Client and making a list call
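In code, that call sequence looks roughly like the sketch below; the import path, option names, and microclient attribute are illustrative rather than the library's exact public API.

```python
# Illustrative sketch of instantiating the client and making a list call.
from splitapiclient.main import get_client  # import path may differ

client = get_client({
    'apikey': 'SPLIT_ADMIN_API_KEY',   # shared by every microclient
    # 'base_url': 'https://api.split.io/internal/api/v2',  # optional override
})

# 'workspaces' is one microclient; list() issues the GET and wraps each
# returned item in a Workspace resource object.
for workspace in client.workspaces.list():
    print(workspace.name)
```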
As part of the migration of Split into the Harness platform, Split will be deprecating some of its API endpoints; these endpoints, such as Users and Groups, will be maintained going forward under the banner of the Harness Platform. Split customers are going to be migrated so that their Split app is accessed from within Harness, and Users, Groups, and Split Projects will then be managed in Harness, meaning that Harness endpoints will have to be used.

How do we mate the API Client with the proper endpoints for customers post-Harness migration?
With respect to API keys: existing Split API keys will continue to work for the remaining Split endpoints, both before and after migration to Harness. Harness API keys will work for everything and will be required for the Harness endpoints post-migration.

I had some great help from the former Split (now Harness FME) PMM and Engineering teams who took on the task of actually feeding me the relevant APIs from the Harness API Docs. This gave me a good starting point to understand what I might need to do.
Essentially, to have control over Harness’s Role-Based Access Control (RBAC) and project information similar to what we had in Split, I’d need to utilize the following Harness APIs: Users, Projects, User Groups, Roles, Role Assignments, Service Accounts, and API Keys/Tokens.
Not all Split accounts will be migrating at once to the Harness platform — this will be over a period of a few months. This means that we will have to support both API access styles for at least some period of time. I also know that I still have my normal role at Harness supporting onboarding customers using our FME SDKs and don’t have a lot of free time to re-write an API client from scratch, so I got to thinking about what my options were.
I really wanted to make the API transition as seamless as possible for my API client users. So the first thing I figured was that I would need a way to determine if the API key being used was from a migrated account. Unfortunately, after discussing with some folks there simply wasn’t going to be time for building out an endpoint like this for what will be, at most, a period of a few months. As such my first design decision was how to determine which ‘mode’ the Client was going to use, the existing mode with access to the older Split API endpoints, or the ‘new’ mode with those endpoints deprecated and a collection of new Harness endpoints available.
I decided this would be done with a variable on instantiation. Since the API client’s constructor signature already accepted an options object as its argument, I thought this would be pretty straightforward.
Eg:
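(A sketch of the existing instantiation; the option name is illustrative.)

```python
client = get_client({'apikey': 'SPLIT_ADMIN_API_KEY'})
```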
Would then have an additional option for:
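(Again a sketch; harness_mode and harness_token mirror the options described later in this post.)

```python
client = get_client({
    'harness_mode': True,
    'harness_token': 'HARNESS_API_KEY',
})
```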
Now I was left thinking about how exactly I would implement this.
Recently, Harness employees were given access to the Windsurf IDE with Claude. I figured that since I could use the help, I would sign on, and that this would help me build out my code changes faster.
I had used Claude, ChatGPT, DeepSeek, and various other AI assistants through their websites for small scale problem solving (eg — fill in this function, help me with this error, write me a shell script that does XYZ) but never actually worked with something integrated into the IDE.
So I fired up Windsurf and put in a pretty ambitious prompt to see what it was capable of doing.
Split has been acquired by harness and now the harness apis will be used for some of these endpoints. I will need to implement a seperate ‘harness_mode’ boolean that is passed in at the api constructor. In harness mode there will be new endpoints available and the existing split endpoints for users, groups, restrictions, all endpoints except ‘get’ for workspaces, and all endpoints for apikeys when the type == ‘admin’ will be deprecated. I will still need to have the apikey endpoint available for type==’client_side’ and ‘server_side’ keys.
It then whirred to work and, quite frankly, I was really impressed with the results. However, it didn’t quite understand what I wanted. The Harness endpoints are completely different in structure, methods, and base URL. The result was that the microclients gained Harness methods and Harness placeholders in the URLs, but this wasn’t going to work. I should have told the AI that I really wanted separate microclients and separate resources for Harness. I reverted the changes and went back to the drawing board (but I’ll get back to this later).
OpenAPI
My second idea was to attempt to generate some API code from the Harness API docs themselves. Harness’s API docs have an OpenAPI specification available, and there are tools that can generate API clients from these specifications. However, it became clear to me that the client-generation tooling doesn’t make it easy to filter a spec down to a subset of endpoints. Harness has nearly 300 API endpoints for its rich collection of modules and features, and its nearly 10 MB OpenAPI spec would actually crash the OpenAPI generator — it was too big. So I spent some time working on code to strip the OpenAPI spec JSON down to just the endpoints I needed.
Here, the AI tooling was also helpful. I asked
how can I filter a openapi json by either tag or by endpoint resource path?
can this also remove components that aren’t part of the endpoints with tags
could you also have it remove unused tags
But the problem ended up being that the OpenAPI spec is actually more complex than I initially thought, including references, parameters, and dependencies between objects. So it wasn’t going to be as simple as picking the endpoints I needed and sending them to the API generator.
I kept attempting to run the generated filter script and then the OpenAPI generator, looping through running the script, hitting an error, and sending the error back to the AI assistant.
By the end I did seem to get a script that could do filtering, but even the spec filtered down to just what I needed was still too big for the OpenAPI generator. You can see that code here.
For a test, I did start generating with just one endpoint (harness_user) and reviewed the generated Python code. One thing that was clear after reviewing the file was that it was structured wildly differently from the API client I already have. There were also dozens of warnings inside the generated code not to make any changes or updates to it, and I was not familiar with the generated codebase.
Either manually or attempting via an AI assistant, stitching these together was not going to be easy, so I stashed this idea as well.
As an aside, I think it’s worth noting that an AI code assistant can’t help you when you don’t really know how to specify what you want or what the outcome is going to look like. I needed a better understanding of what I was trying to accomplish.
One of the things I had in my mind was that I really wanted to make the transition as seamless as possible. However, once my idea of automatic mode selection was dashed, I still thought I could, through heroic effort, automate the creation of the existing Split Python classes via the Harness APIs.
I had a deep dive into this idea and really came back with the result that it would simply be too burdensome to implement and not really give the users what they need.
For example — to create an API Key in Split, we just had one API endpoint with a json body:
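The call was a single POST to the Admin API’s apiKeys endpoint with a body along these lines; the field names here are approximate, reconstructed for illustration rather than quoted from the Split API contract.

```json
{
  "name": "my-admin-key",
  "apiKeyType": "admin",
  "workspaceId": "<workspace-id>",
  "environmentIds": ["<environment-id>"]
}
```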
However, Harness has a very rich RBAC model and with multiple modules has a far more flexible model of Service Accounts, API Keys, and individual tokens. Harness’s model allows for easy key rotation and allows the API key to really be more of a container for the actual token string that is used for authentication in the APIs.
Shown more simply in the diagrams below:

Observe the difference in structure of API Key authentication and generation
Now the Python microclient for generating API keys for Split currently makes calls structured like so:
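Something along these lines; the method and argument names are illustrative, not the client's exact signatures.

```python
# Illustrative sketch of the current Split-style call.
api_key = client.apikeys.create({
    'name': 'automation-key',
    'apiKeyType': 'admin',
    'workspaceId': workspace.id,
    'environmentIds': [environment.id],
})
print(api_key.key)  # the returned resource carries the generated key value
```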
To replicate this would mean having the client in ‘Harness Mode’ create a Service Account, API Key, and Token all at the same time, and automatically map the roles to the created service account, all seamlessly to the user.
This is a tall task, and being pragmatic, I don’t see that as a real sustainable solution for developers using my library as they get more familiar with the Harness platform. They’re going to want to use Harness objects natively.
This is especially true with the delete method of the current client,
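which, roughly speaking, is called with the key value itself (a sketch of the shape, not the exact signature):

```python
# Current Split-style delete: the argument is the API key string itself.
client.apikeys.delete(api_key.key)
```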
The Harness method for deleting a token takes the token identifier, not the token itself, making this signature impossible to reproduce with Harness’s APIs. And even if I could delete a token, would I want to delete the token and keep the service account and api key? Would I need to replicate the role assignment and roles that Split has? Much of this is very undefined.
Wanting to keep things as straightforward and maintainable as possible, and to start thinking in terms of Harness’s API schema, I settled on a design decision.
We would have a ‘Harness Mode’ for the APIs that explicitly deprecates the Split API microclients and resources and activates a separate client that uses Harness API endpoints and resources. The endpoints that are unchanged would still use the Split endpoints and API keys.

Now that I had a better understanding of how I wanted to design this, I felt I could write a better prompt.
Split has been acquired by harness and now the harness apis will be used for some of these endpoints. I will need to implement a seperate ‘harness_mode’ boolean that is passed in at the api constructor. In harness mode there will be new endpoints available and the existing split endpoints for users, groups, restrictions, all endpoints except ‘get’ for workspaces, and all endpoints for apikeys when the type == ‘admin’ will be deprecated. I will still need to have the apikey endpoint available for type==’client_side’ and ‘server_side’ keys. Make seperate microclients in harness mode for the following resources:
harness_user, harness_project, harness_group, role, role_assignment, service_account, and token
Ensure that that the harness_mode has a seperate harness_token key that it uses. It uses x-api-key as the header for auth and not bearer authentication
Claude then whirred away, this time with much better results. With the separate microclients I had a much better structure to build my code on, and it also helped clarify how I would continue building.

The next thing I asked it to do was to create resources for all of my microclient objects.

The next thing I did was a big mistake: I asked it to create tests for all of my microclients and resources. Creating the tests before I had finished implementing my code meant the AI had no way to know which side was correct, so I spent a lot of time troubleshooting test failures until I just decided to delete all of my test files and write the tests much later in my development cycle. Once the designs for the microclients and resources were reasonably implemented, I went ahead and had it write the tests for me. DO NOT have the AI write BOTH your tests and your code before you have the chance to review either of them, or you will be in a world of pain, spending hours trying to figure out what you actually want.
This was an enormous time saver for me. Having the project essentially built with custom scaffolding for me was just amazing.
The next thing I was going to do was fill in the resources. Each resource is essentially a schema, plus an init call that pulls in the endpoint data and accessors to get the fields out of that data.
I was able to pull the schemas from the apidocs.harness.io site pretty easily.
Here’s an example of the AI generated code for the harness group resource.
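The generated class looked broadly like the sketch below, trimmed and with illustrative field names; the real schema has more fields and follows the library's base-resource conventions.

```python
# Illustrative sketch only; not the exact AI-generated code.
class HarnessGroup:
    _SCHEMA = {
        'identifier': None,
        'name': None,
        'users': [],
        'notificationConfigs': [],
    }

    def __init__(self, data=None, client=None):
        self._client = client
        # Start from the schema defaults and overlay whatever the API returned.
        self._data = {**self._SCHEMA, **(data or {})}

    @property
    def identifier(self):
        return self._data.get('identifier')

    @property
    def name(self):
        return self._data.get('name')

    def to_dict(self):
        return dict(self._data)
```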
I did a few things here. I had the AI generate a generalizable getter and dict export driven by the schema itself, essentially allowing me to copy and paste a schema into a resource and have the methods it needs generated automatically.
Here’s an example of that code for the harness user class.
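Here is a trimmed sketch of that pattern; the fields are a subset, and the helper names are mine rather than verbatim from the generated code.

```python
# Illustrative sketch: accessors and dict export are driven by the schema,
# so a new resource only needs its schema pasted in.
class HarnessUser:
    _SCHEMA = {
        'uuid': None,
        'name': None,
        'email': None,
        'locked': False,
    }

    def __init__(self, data=None, client=None):
        self._client = client
        self._data = {**self._SCHEMA, **(data or {})}

    def __getattr__(self, item):
        # Generic getter: any schema field becomes a readable attribute.
        if item in self._SCHEMA:
            return self._data.get(item)
        raise AttributeError(item)

    def export_dict(self):
        # Generic dict export driven by the schema keys.
        return {key: self._data.get(key) for key in self._SCHEMA}
```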
Once this was done for all of my resources, I had the AI create tests for these resources and went through a few iterations before my tests passed.
The microclients were a bit more challenging, partly because the methods are fundamentally different in many cases between the Split and Harness ways of managing these HTTP resources.
There was more manual work here and not as much automation. That being said, the AI had a lot of helpful autocompletes.
For example, in the harness_user microclient class, the default list of endpoints looked like this
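(The sketch below approximates the structure used by the existing microclients; the placeholder paths shown are what the AI scaffolded before I corrected them.)

```python
# Illustrative sketch of the default endpoint table the AI scaffolded.
_endpoint = {
    'all_items': {
        'method': 'GET',
        'url_template': 'user',           # later corrected to 'ng/api/user'
        'headers': [{'name': 'Content-Type', 'template': 'application/json'}],
        'query_string': ['accountIdentifier'],
        'response': True,
    },
    'delete': {
        'method': 'DELETE',
        'url_template': 'user/{userId}',  # later corrected to 'ng/api/user/{userId}'
        'headers': [{'name': 'Content-Type', 'template': 'application/json'}],
        'query_string': ['accountIdentifier'],
        'response': True,
    },
}
```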
If I were to change one of them to the proper endpoint (ng/api/user) and then press tab, it would automatically fix the other endpoints. Small things like that really added up as I went through and manually set up things like endpoints and loops over the arrays returned from the GET endpoints. The AI tooling really helps speed up the implementation.
Once I had the microclients finished, I had the AI create tests and worked through running them, ensuring that we had coverage, that the tests made sense, and that they covered all of the microclient endpoints (including pagination for the list endpoints).
The last thing to clean up was the base client. The AI created a separate main harness_apiclient that would be instantiated when Harness mode was enabled. I had to review the deprecation code to ensure that deprecation warnings were indeed only fired when specified. I also cleaned up and removed some extraneous code around supporting other base URLs, and set the proper Harness base URL.
I then asked the AI to let me pass in an account_identifier at the client level, since many of the Harness endpoints require it. This makes things easier because you don’t need to pass that field in for every microclient request.
Finally, I had the AI write me a comprehensive test script that would test all endpoints in both harness mode and split mode. I ran this with a Harness account and a Split account to ensure success. I fixed a few minor issues but ultimately it worked very well and seemed extremely straightforward and easy to use.
After this whole project, I’d like to leave the reader with a few learnings. The first is that your AI assistant still requires you to have a good sense of code smell. If something looks wrong, or the implementation in your head would be different, don’t hesitate to back up and revert the changes it makes. Better safe than sorry.
You really need to have the design in your head and constantly be comparing it to what the AI is building for you when you ask it questions. Don’t just accept it — interrogate it. Save and commit often so that you can revert to known states.
Do not have it create both your tests and implementations at the same time. Only have it do one until you are finished with it and then have it do the other.
You do not want to just keep asking it for things without an understanding of what you want the outcome to look like. Keep your hand on the revert button and don’t be afraid to revert to earlier parts of your conversation with the AI. If you do not review the code coming out of your AI assistant you will be in a world of trouble. Coding with an AI assistant still uses those Senior/Staff Software Engineer skillsets, perhaps even more than ever due to the sheer volume of code that is possible to generate. Design is more important than ever.
If you’re familiar with the legend of John Henry, he was a railroad worker who challenged a steam drilling machine with his hammer. With an AI assistant, I feel like I’ve been handed the steam drill: this is the path to huge gains in efficiency in the production of software.

Learn how to work with your robot and be successful
I’m very excited for the future and how AI code assistants will grow and become part and parcel of the standard workflow for software development. I know it saved me a lot of time and from a lot of frustration and headaches.


In our staging environment, which handles the daily CI/CD workflows for all Harness developers, our Hosted Harness delegate was doing something curious: CPU and memory rose and fell in a suspiciously tight correlation, perfectly tracking system load.
(For context, Harness Delegate is a lightweight service that runs inside a customer’s infrastructure, securely connecting to Harness SaaS to orchestrate builds, deployments, and verifications. In the Hosted Delegate model, we run it in Harness’s cloud on behalf of customers, so they don’t have to manage the infrastructure themselves.)
At first glance, this looked normal. Of course, you expect CPU and memory to rise during busy periods and flatten when the system is idle. But the details told a different story:

In other words, what looked like “a busy system” was actually the fingerprint of a leak: memory piling up with load, and CPU spikes reflecting the runtime’s struggle to keep it under control.
The next step was to understand where this memory growth was coming from. We turned our attention to the core of our system: the worker pool. The delegate relies on a classic worker pool pattern, spawning thousands of long-running goroutines that poll for and execute tasks.
On the surface, the implementation seemed robust. Each worker was supposed to be independent, processing tasks and cleaning up after itself. So what was causing this leak that scaled perfectly with our workload?
We started with the usual suspects—unclosed resources, lingering goroutines, and unbounded global state—but found nothing that could explain the memory growth. What stood out instead was the pattern itself: memory increased in perfect proportion to the number of tasks being processed, then immediately plateaued during idle periods.

To dig deeper, we focused on the worker loop that handles each task:
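Simplified, the loop looked like the sketch below; Task, process, and the channel wiring are stand-ins for the delegate's real types.

```go
// Simplified sketch of the per-worker loop (types and wiring are stand-ins).
func run(ctx context.Context, tasks <-chan Task, process func(context.Context, Task)) {
	for task := range tasks {
		// Reassigning ctx wraps the previous iteration's context with this
		// task's log labels: the seemingly innocent line at the heart of the leak.
		ctx = AddLogLabelsToContext(ctx, "taskID", task.ID)
		process(ctx, task)
	}
}
```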
This seemed innocent enough. We were just reassigning ctx to add task IDs for logging and then processing each incoming task.
The breakthrough came when we reduced the number of workers to one. With thousands running in parallel, the leak was smeared across goroutines, but a single worker made it obvious how each task contributed.
To remove the noise of short-lived allocations, we forced a garbage collection after every task and logged the post-GC heap size. This way, the graph reflected only memory that was truly retained, not temporary allocations the GC would normally clean up. The result was loud and clear: memory crept upward with each task, even after a full sweep.
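The instrumentation for that experiment is small; something along these lines, using the standard runtime and log packages (task is the stand-in from the loop sketch above).

```go
// After each task: force a full GC, then record how much heap is still live.
runtime.GC()

var m runtime.MemStats
runtime.ReadMemStats(&m)
log.Printf("post-GC heap after task %s: %d bytes", task.ID, m.HeapAlloc)
```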

That was the aha moment 💡. The tasks weren't independent at all. Something was chaining them together, and the culprit was Go's context.Context.
A context in Go is immutable. Functions like context.WithValue don't actually modify the context you pass in. Instead, they return a new child context that holds a reference to its parent. Our AddLogLabelsToContext function was doing exactly that:
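In essence it looked like this; the real function attaches several labels, but the essential behavior is that it returns a child context pointing back to its parent.

```go
// Sketch: every call returns a child context that references its parent.
type logLabelKey string

func AddLogLabelsToContext(ctx context.Context, key, value string) context.Context {
	return context.WithValue(ctx, logLabelKey(key), value)
}
```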
This is fine on its own, but it becomes dangerous when used incorrectly inside a loop. By reassigning the ctx variable in every iteration, we were creating a linked list of contexts, with each new context pointing to the one from the previous iteration:
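Spelled out iteration by iteration (an illustration, not literal code from the delegate):

```go
// iteration 1: ctx = WithValue(ctx0, task1)   // ctx1 -> ctx0
// iteration 2: ctx = WithValue(ctx1, task2)   // ctx2 -> ctx1 -> ctx0
// iteration 3: ctx = WithValue(ctx2, task3)   // ctx3 -> ctx2 -> ctx1 -> ctx0
// ...the latest context keeps the entire history reachable.
```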
Each new context referenced the entire chain before it, preventing the garbage collector from ever cleaning it up.
With thousands of goroutines in our worker pool, we didn't just have one tangled chain—we had thousands of them growing in parallel. Each worker was independently leaking memory, one task at a time.
A single goroutine's context chain looked like this:
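Conceptually (reconstructed from the description above):

```
ctx(task N) -> ctx(task N-1) -> ... -> ctx(task 1) -> base worker ctx
(the worker still holds ctx(task N), so the whole chain stays reachable)
```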
...and this was happening for every single worker.
Each chain lived as long as its worker goroutine—effectively, forever.
The fix wasn't concurrency magic. It was simple variable scoping:
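The corrected loop scopes the derived context to a per-task variable (sketched to match the loop shown earlier):

```go
func run(ctx context.Context, tasks <-chan Task, process func(context.Context, Task)) {
	for task := range tasks {
		// Derive a per-task context from the stable parent. Once the task
		// finishes, nothing references taskCtx and the GC can free it.
		taskCtx := AddLogLabelsToContext(ctx, "taskID", task.ID)
		process(taskCtx, task)
	}
}
```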
The problem wasn't the function itself, but how we used its return value:
❌ ctx = AddLogLabelsToContext(ctx, ...) → chain builds forever
✅ taskCtx := AddLogLabelsToContext(ctx, ...) → no chain, GC frees it
The core problem can be distilled to this pattern:
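In its generic form, it looks roughly like this (illustrative Go-flavored pseudocode; wrap and use stand in for any immutable-wrapping call):

```go
// Anti-pattern: re-wrapping an immutable value around itself in a loop.
obj := base
for _, item := range items {
	obj = wrap(obj, item) // each new wrapper retains every previous one
	use(obj)
}
// The fix is the same everywhere: derive a loop-scoped value from the
// stable base instead, e.g. scoped := wrap(base, item).
```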
It's a universal anti-pattern that appears anywhere you wrap an immutable (or effectively immutable) object inside a loop.
Same mistake, different costumes.
After fixing this memory leak, we enabled the profiler for the delegate to get better visibility into production performance. And guess what? The profiler revealed another issue - a goroutine leak!
But that's a story for the next article...🕵️♀️
Stay tuned for "The Goroutine Leak Chronicles: When Profilers Reveal Hidden Secrets 🔍🔥"