
AI is changing both what you build and how you build it - at the same time. Today, Harness is announcing two new products to secure both: AI Security, a new product to discover, test, and protect AI running in your applications, and Secure AI Coding, a new capability of Harness SAST that secures the code your AI tools are writing. Together, they further extend Harness's DevSecOps platform into the age of AI, covering the full lifecycle from the first line of AI-generated code to the models running in production.
In November, Harness published our State of AI-Native Application Security report, a survey of hundreds of security and engineering leaders on how AI-native applications are changing your threat surface. The findings were stark: 61% of new applications are now AI-powered, yet most organizations lack the tools to discover what AI models and agents exist in their environments, test them for vulnerabilities unique to AI, or protect them at runtime. The attack surface has expanded dramatically — but the tools to defend it haven't kept up.
The picture is equally concerning on the development side. Our State of AI in Software Engineering report found that 63% of organizations are already using AI coding assistants - tools like Claude Code, Cursor, and Windsurf - to write code faster. But faster isn't safer. AI-generated code has the same vulnerabilities as human-written code, but now with larger and more frequent commits. AppSec programs that were already stretched thin are now breaking under the volume and velocity.
The result is a blind spot on both sides of the AI equation - what you're building, and what you're building with. Today, Harness is closing that gap.
Most security vendors are stuck in their lane. Shift-left tools catch vulnerabilities in code before they reach production. Runtime protection tools block attacks after applications are deployed. And the two rarely talk to each other.
Harness was built on a different premise: real DevSecOps means connecting every stage of the software delivery lifecycle, and closing the loop between what you find in production and what you fix in code.
That's what the Harness platform does today. Application Security Testing brings SAST and SCA directly into the development workflow, surfacing vulnerabilities where they're faster and cheaper to fix. SCS ensures the integrity of artifacts from build to deploy, while STO provides a unified view of security posture — along with policy and governance — across the entire organization.
As code ships to production, Web Application & API Protection monitors and defends applications and APIs in real time, detecting and blocking attacks as they happen. And critically, findings from runtime don't disappear into a security team's backlog — they flow back to developers to address root causes before the next release.
The result is a closed loop: find it in code, protect it in production, fix it fast. All on a single, unified platform.
Today, we're extending that loop into AI - on both sides. AI is reshaping what you build and how you build it simultaneously. A platform that can only address one side of that equation leaves you exposed on the other. Harness closes both gaps.
In the State of AI-Native Application Security, 66% of respondents said they are flying blind when it comes to securing AI-native apps. 72% call shadow AI a gaping chasm in their security posture. 63% believe AI-native applications are more vulnerable than traditional IT applications. They’re right to be concerned.
Harness AI Security is built on the foundation of our API security platform. Every LLM call, every MCP server, every AI agent communicating with an external service does so via APIs. Your AI attack surface isn't separate from your API attack surface; it's an expansion of it. AI threats introduce new vectors like prompt injection, model manipulation, and data poisoning on top of the API vulnerabilities your teams already contend with. There is no AI security without API security.
.png)
With the launch of AI Security, we are introducing AI Discovery in General Availability (GA). AI security starts where API security starts: discovery. You can't assess or mitigate risk from AI components you don't know exist. Harness already continuously monitors your environment for new API endpoints the moment they're deployed. Recognizing LLMs, MCP servers, AI agents, and third-party GenAI services like OpenAI and Anthropic is a natural extension of that. AI Discovery automatically inventories your entire AI attack surface in real time, including calls to external GenAI services that could expose sensitive data, and surfaces runtime risks, such as unauthenticated APIs calling LLMs, weak encryption, or regulated data flowing to external models.
Beyond discovering and inventorying your AI application components, we are also introducing AI Testing and AI Firewall in Beta, extending AI Security across the full discover-test-protect lifecycle.
.png)
AI Testing actively probes your LLMs, agents, and AI-powered APIs for vulnerabilities unique to AI-native applications, including prompt injection, jailbreaks, model manipulation, data leakage, and more. These aren't vulnerabilities that a traditional DAST tool is designed to find. AI Testing was purpose-built for AI threats, continuously validating that your models and the APIs that expose them behave safely under adversarial conditions. It integrates directly into your existing CI/CD pipelines, so AI-specific security testing becomes part of every release — not a one-time audit.
.png)
AI Firewall actively protects your AI applications from AI-specific threats, such as the OWASP Top 10 for LLM Applications. It inspects and filters LLM inputs and outputs in real time, blocking prompt injection attempts, preventing sensitive data exfiltration, and enforcing behavioral guardrails on your models and agents before an attack can succeed. Unlike traditional WAF rules that require manual tuning for every new threat pattern, AI Firewall understands AI-native attack vectors natively, adapting to the evolving tactics attackers use against generative AI.
Harness AI Security with AI Discovery is now available in GA, while AI Testing and AI Firewall are available in Beta.
"As AI-assisted development becomes standard practice, the security implications of AI-generated code are becoming a material blind spot for enterprises. IDC research indicates developers accept nearly 40% of AI-generated code without revision, which can allow insecure patterns to propagate as organizations increase code output faster than they expand validation and governance, widening the gap between development velocity and application risk."
— Katie Norton, Research Manager, DevSecOps, IDC
AI Security addresses the risks inside your AI-native applications. Secure AI Coding addresses a different problem: the vulnerabilities your AI tools are introducing into your codebase.
Developers are generating more code than ever, and shipping it faster than ever. AI coding assistants now contribute to the majority of new code at many organizations — and nearly half (48%) of security and engineering leaders are concerned about the vulnerabilities that come with it. AI-generated code arrives in larger commits, at higher frequency, and often with less review than human-written code would receive.
SAST tools catch vulnerabilities at the PR stage — but by then, AI-generated code has already been written, reviewed, and often partially shipped. Harness SAST's new Secure AI Coding capability moves the security check earlier to the moment of generation, integrating directly with AI coding tools like Cursor, Windsurf, and Claude Code to scan code as it appears in the IDE. Developers never leave their workflow. They see a vulnerability warning inline, alongside a prompt to send the flagged code back to the agent for remediation — all without switching tools or even needing to trigger a manual scan.
"Security shouldn't be an afterthought when using AI dev tools. Our collaboration with Harness kicks off vulnerability detection directly in the developer workflow, so all generated code is screened from the start." — Jeff Wang, CEO, Windsurf

What sets Secure AI Coding apart from simpler linting tools is what happens beneath the surface. Rather than pattern-matching the AI-generated code in isolation, it leverages Harness's Code Property Graph (CPG) to trace how data flows through the entire application - before, through, and after the AI-generated code in question. That means Secure AI Coding can surface complex vulnerabilities like injection flaws and insecure data handling that only become visible in the context of the broader codebase. The result is security that understands your application - not just the last thing an AI assistant wrote.
When we deployed AI across our own platform, our AI ecosystem grew faster than our visibility into it. We needed a way to track every API call, identify sensitive data exposure, and monitor calls to external vendors — including OpenAI, Vertex AI, and Anthropic — without slowing down our engineering teams.
Deploying AI Security turned that black box into a transparent, manageable environment. Some milestones from our last 90 days:
The shift wasn't just operational — it was cultural. We moved from reactive monitoring to proactive defense. As our team put it: "Securing AI is foundational for us. Because our own product runs on AI, it must be resilient and secure. We use our own AI Security tools to ensure that every innovation we ship is backed by the highest security standards."
AI is moving fast. Your attack surface is expanding in two directions at once - inside the applications you're building, and inside the code your teams are generating to build them.
Harness AI Security and Secure AI Coding are available now. Whether you're trying to get visibility into the AI running in your environment, test it for vulnerabilities before attackers do, or stop insecure AI-generated code from reaching production, Harness’ platform is ready.
Talk to your account team about AI Security. Get a live walkthrough of AI Discovery, AI Testing, and AI Firewall, and see how your AI attack surface maps against your existing API security posture.
Already a Harness CI customer? Start a free trial of Harness SAST - including Secure AI Coding. Connect it to your AI coding assistant, and see what's shipping in your AI-generated code today.

Over the last few years, something fundamental has changed in software development.
If the early 2020s were about adopting AI coding assistants, the next phase is about what happens after those tools accelerate development. Teams are producing code faster than ever. But what I’m hearing from engineering leaders is a different question:
What’s going to break next?
That question is exactly what led us to commission our latest research, State of DevOps Modernization 2026. The results reveal a pattern that many practitioners already sense intuitively: faster code generation is exposing weaknesses across the rest of the software delivery lifecycle.
In other words, AI is multiplying development velocity, but it’s also revealing the limits of the systems we built to ship that code safely.
One of the most striking findings in the research is something we’ve started calling the AI Velocity Paradox - a term we coined in our 2025 State of Software Engineering Report.
Teams using AI coding tools most heavily are shipping code significantly faster. In fact, 45% of developers who use AI coding tools multiple times per day deploy to production daily or faster, compared to 32% of daily users and just 15% of weekly users.
At first glance, that sounds like a huge success story. Faster iteration cycles are exactly what modern software teams want.
But the data tells a more complicated story.
Among those same heavy AI users:
What this tells me is simple: AI is speeding up the front of the delivery pipeline, but the rest of the system isn’t scaling with it. It’s like we are running trains faster than the tracks they are built for. Friction builds, the ride is bumpy, and it seems we could be on the edge of disaster.

The result is friction downstream, more incidents, more manual work, and more operational stress on engineering teams.
To understand why this is happening, you have to step back and look at how most DevOps systems actually evolved.
Over the past 15 years, delivery pipelines have grown incrementally. Teams added tools to solve specific problems: CI servers, artifact repositories, security scanners, deployment automation, and feature management. Each step made sense at the time.
But the overall system was rarely designed as a coherent whole.
In many organizations today, quality gates, verification steps, and incident recovery still rely heavily on human coordination and manual work. In fact, 77% say teams often have to wait on other teams for routine delivery tasks.
That model worked when release cycles were slower.
It doesn’t work as well when AI dramatically increases the number of code changes moving through the system.
Think of it this way: If AI doubles the number of changes engineers can produce, your pipelines must either:
Otherwise, the system begins to crack under pressure. The burden often falls directly on developers to help deploy services safely, certify compliance checks, and keep rollouts continuously progressing. When failures happen, they have to jump in and remediate at whatever hour.
These manual tasks, naturally, inhibit innovation and cause developer burnout. That’s exactly what the research shows.
Across respondents, developers report spending roughly 36% of their time on repetitive manual tasks like chasing approvals, rerunning failed jobs, or copy-pasting configuration.
As delivery speed increases, the operational load increases. That burden often falls directly on developers.
The good news is that this problem isn’t mysterious. It’s a systems problem. And systems problems can be solved.
From our experience working with engineering organizations, we've identified a few principles that consistently help teams scale AI-driven development safely.
When every team builds pipelines differently, scaling delivery becomes difficult.
Standardized templates (or “golden paths”) make it easier to deploy services safely and consistently. They also dramatically reduce the cognitive load for developers.
Speed only works when feedback is fast.
Automating security, compliance, and quality checks earlier in the lifecycle ensures problems are caught before they reach production. That keeps pipelines moving without sacrificing safety.
Feature flags, automated rollbacks, and progressive rollouts allow teams to decouple deployment from release. That flexibility reduces the blast radius of new changes and makes experimentation safer.
It also allows teams to move faster without increasing production risk.
Automation alone doesn’t solve the problem. What matters is creating a feedback loop: deploy → observe → measure → iterate.
When teams can measure the real-world impact of changes, they can learn faster and improve continuously.
AI is already changing how software gets written. The next challenge is changing how software gets delivered.
Coding assistants have increased development teams' capacity to innovate. But to capture the full benefit, the delivery systems behind them must evolve as well.
The organizations that succeed in this new environment will be the ones that treat software delivery as a coherent system, not just a collection of tools.
Because the real goal isn’t just writing code faster. It’s learning faster, delivering safer, and turning engineering velocity into better outcomes for the business.
And that requires modernizing the entire pipeline, not just the part where code is written.

Today, Harness is announcing the General Availability of Artifact Registry, a milestone that marks more than a new product release. It represents a deliberate shift in how artifact management should work in secure software delivery.
For years, teams have accepted a strange reality: you build in one system, deploy in another, and manage artifacts somewhere else entirely. CI/CD pipelines run in one place, artifacts live in a third-party registry, and security scans happen downstream. When developers need to publish, pull, or debug an artifact, they leave their pipelines, log into another tool, and return to finish their work.
It works, but it’s fragmented, expensive, and increasingly difficult to govern and secure.
At Harness, we believe artifact management belongs inside the platform where software is built and delivered. That belief led to Harness Artifact Registry.
Artifact Registry started as a small, high-ownership bet inside Harness and a dedicated team with a clear thesis: artifact management shouldn’t be a separate system developers have to leave their pipelines to use. We treated it like a seed startup inside the company, moving fast with direct customer feedback and a single-threaded leader driving the vision.The message from enterprise teams was consistent: they didn’t want to stitch together separate tools for artifact storage, open source dependency security, and vulnerability scanning.
So we built it that way.
In just over a year, Artifact Registry moved from concept to core product. What started with a single design partner expanded to double digit enterprise customers pre-GA – the kind of pull-through adoption that signals we've identified a critical gap in the DevOps toolchain.
Today, Artifact Registry supports a broad range of container formats, package ecosystems, and AI artifacts, including Docker, Helm (OCI), Python, npm, Go, NuGet, Dart, Conda, and more, with additional support on the way. Enterprise teams are standardizing on it across CI pipelines, reducing registry sprawl, and eliminating the friction of managing diverse artifacts outside their delivery workflows.
One early enterprise customer, Drax Group, consolidated multiple container and package types into Harness Artifact Registry and achieved 100 percent adoption across teams after standardizing on the platform.
As their Head of Software Engineering put it:
"Harness is helping us achieve a single source of truth for all artifact types containerized and non-containerized alike making sure every piece of software is verified before it reaches production." - Jasper van Rijn
In modern DevSecOps environments, artifacts sit at the center of delivery. Builds generate them, deployments promote them, rollbacks depend on them, and governance decisions attach to them. Yet registries have traditionally operated as external storage systems, disconnected from CI/CD orchestration and policy enforcement.
That separation no longer holds up against today’s threat landscape.
Software supply chain attacks are more frequent and more sophisticated. The SolarWinds breach showed how malicious code embedded in trusted update binaries can infiltrate thousands of organizations. More recently, the Shai-Hulud 2.0 campaign compromised hundreds of npm packages and spread automatically across tens of thousands of downstream repositories.
These incidents reveal an important business reality: risk often enters early in the software lifecycle, embedded in third-party components and artifacts long before a product reaches customers.When artifact storage, open source governance, and security scanning are managed in separate systems, oversight becomes fragmented. Controls are applied after the fact, visibility is incomplete, and teams operate in silos. The result is slower response times, higher operational costs, and increased exposure.
We saw an opportunity to simplify and strengthen this model.

By embedding artifact management directly into the Harness platform, the registry becomes a built-in control point within the delivery lifecycle. RBAC, audit logging, replication, quotas, scanning, and policy enforcement operate inside the same platform where pipelines run. Instead of stitching together siloed systems, teams manage artifacts alongside builds, deployments, and security workflows. The outcome is streamlined operations, clearer accountability, and proactive risk management applied at the earliest possible stage rather than after issues surface.
Security is one of the clearest examples of why registry-native governance matters.
Artifact Registry delivers this through Dependency Firewall, a registry-level enforcement control applied at dependency ingest. Rather than relying on downstream CI scans after a package has already entered a build, Dependency Firewall evaluates dependency requests in real time as artifacts enter the registry. Policies can automatically block components with known CVEs, license violations, excessive severity thresholds, or untrusted upstream sources before they are cached or consumed by pipelines.

Artifact quarantine extends this model by automatically isolating artifacts that fail vulnerability or compliance checks. If an artifact does not meet defined policy requirements, it cannot be downloaded, promoted, or deployed until the issue is addressed. All quarantine and release actions are governed by role-based access controls and fully auditable, ensuring transparency and accountability. Built-in scanning powered by Aqua Trivy, combined with integrations across more than 40 security tools in Harness, feeds results directly into policy evaluation. This allows organizations to automate release or quarantine decisions in real time, reducing manual intervention while strengthening control at the artifact boundary.

The result is a registry that functions as an active supply chain control point, enforcing governance at the artifact boundary and reducing risk before it propagates downstream.
General Availability signals that Artifact Registry is now a core pillar of the Harness platform. Over the past year, we’ve hardened performance, expanded artifact format support, scaled multi-region replication, and refined enterprise-grade controls. Customers are running high-throughput CI pipelines against it in production environments, and internal Harness teams rely on it daily.
We’re continuing to invest in:
Modern software delivery demands clear control over how software is built, secured, and distributed. As supply chain threats increase and delivery velocity accelerates, organizations need earlier visibility and enforcement without introducing new friction or operational complexity.
We invite you to sign up for a demo and see firsthand how Harness Artifact Registry delivers high-performance artifact distribution with built-in security and governance at scale.


Definition: Parallel execution in CI is the practice of running independent build, test, or deployment tasks concurrently to reduce feedback time, improve resource utilization, and control infrastructure costs.
Developers often spend almost half their time waiting for builds that could be faster. Simply adding more resources is not enough. Real improvements come from planned parallelism, using concurrency together with test intelligence, caching, and strong governance.
With this approach, teams can get builds done 4x faster and cut infrastructure costs by up to 80%, all while staying reliable. Harness CI helps achieve these results with AI-powered optimization and strong governance. See how modern parallel execution can speed up your development.
When your 200+ developers have to wait 40 minutes for build feedback, productivity drops, and your cloud costs go up because of idle compute time. How does running things in parallel make the CI/CD pipeline faster and help developers get more done? Teams get rid of bottlenecks that waste both developer time and infrastructure money by running separate tasks at the same time instead of making them wait in line.
Traditional CI pipelines make tasks wait one after another, wasting resources while jobs are idle. With concurrent processing, you can find independent tasks, such as testing different modules or deploying to separate environments, and run them at the same time on available machines.
Quick feedback helps developers stay focused instead of switching tasks while waiting for slow builds. If PR validation takes hours, developers move on to other work and lose track of their changes, which can lead to costly rework.
CloudBees research shows that 75% of DevOps professionals lose over 25% of their productivity due to slow testing cycles. Simultaneous test execution addresses this by distributing test suites across multiple machines, thereby substantially reducing total execution time.
Raw concurrency alone doesn't maximize gains; pairing it with smart optimization multiplies benefits while controlling costs. Test Intelligence cuts test cycles by up to 80% by running only tests related to code changes, reducing the work that needs to be parallelized.
Cache Intelligence stops unnecessary downloads of dependencies and pulls of Docker layers across parallel jobs. When used with the fastest CI platform, this leads to even more improvements: fewer tests to run at the same time, faster execution of individual jobs, and lower infrastructure costs because waste is no longer needed.
Legacy Jenkins environments consuming 20% of the platform team's capacity need a methodical approach to avoid turning parallel execution into operational complexity. The best practices for implementing parallel execution in complex legacy CI systems start with understanding your current dependencies and stabilizing your foundation before scaling out.
By building a strong foundation first, you lower the risk of parallel execution making problems worse and get clear speed improvements. Once dependencies are mapped and tests are stable, teams can focus on governance and cost controls to keep parallelism going as they grow.
Allocating the right amount of resources demonstrates that parallel execution can reduce cloud costs without compromising security. On-demand build environments with autoscaling only add new machines when they are needed and take them away when they are done, so there is no overprovisioning.
Pairing this with intelligent caching and AI-powered test selection can slash test cycles by up to 80%, while recent research shows parallel execution strategies lower overall operational costs by 40-50% when properly implemented. Company Burst SMS achieved a 76% infrastructure cost reduction by moving to optimized, no-share infrastructure that ensures consistent performance without noisy neighbors.
In addition to optimizing infrastructure, good parallelism needs rules to keep developers productive and stop uncontrolled scaling. Policy as Code frameworks make it easier for teams to set up RBAC controls and manage secrets automatically in CI pipelines with policies that can be tested and versioned.
These automated guardrails prevent unauthorized parallel job sprawl while ensuring secure artifact tracking for all builds. The key is measuring what matters: track four key metrics, queue time, concurrency utilization, cache hit rates, and cost per build, to tune your parallelism strategy continuously.
To summarize:
Speed → parallel stages + test selection
Cost → autoscaling + caching
Control → policy-as-code + RBAC
Parallel execution can turn CI pipelines from slow points into fast accelerators when combined with smart caching, selective testing, and good governance. Teams can get builds done four times faster and cut infrastructure costs by up to 76% by using concurrent stages and AI-powered optimizations. The secret is to balance speed and control, using templates, policy rules, and analytics to scale parallelism safely across teams.
Moving from theory to practice requires the right platform foundation. Harness CI streamlines parallel execution through automated migration tools, stage-level parallelism, and built-in troubleshooting that removes operational friction.
Ready to accelerate your CI pipelines while cutting infrastructure costs? Explore Harness Continuous Integration to see how AI-powered parallel execution delivers measurable results for your development teams.
Platform engineering teams take care of CI infrastructure for hundreds of developers who work on many different product teams. This makes it harder and more important to run things in parallel than in normal DevOps setups. When you run a lot of workflows at the same time, problems like making sure tests are reliable, keeping costs down, and following security rules get even worse.
Use Test Intelligence to only run tests that are important, which can cut down on exposure to unreliable suites by up to 80%. Instead of blanket retries, set up targeted retries and auto-quarantine for flaky tests that are found. Separate temp directories and resource limits for sandbox test processes so that tests don't get in each other's way.
Configure predictive scaling with usage buffers and cooldown windows to avoid cost spikes. Set policy rules that enforce maximum concurrent jobs per team or repository. Combine smart caching and selective test execution to reduce the need for high concurrency while maintaining fast feedback.
Enable SLSA L3 compliance with automated software bill of materials generation across parallel build stages. Run each parallel job in isolated build environments to avoid cross-contamination. Cache dependencies at the layer level while maintaining secure verification of cached artifacts.
Roll out templates and RBAC to standardize parallel patterns while allowing team customization. Monitor concurrency usage and cost per build through centralized dashboards. Create policy rules that automatically enforce resource limits and security scanning requirements across all parallel workflows without blocking developers.
Start with high-value pipelines that have clear dependency boundaries and stable test suites. Apply migration utilities to automate up to 80% of pipeline conversion tasks. Map existing job dependencies before parallelizing to avoid hidden bottlenecks that cancel out performance gains from concurrent execution.


We've all been there. You push a PR, grab coffee, check Slack, maybe start a side conversation — and your build is still running. Multiply that across a team of 50 engineers, and you're looking at hours of lost focus every single day.
Slow CI/CD builds don't just waste time. They generate a steady stream of "CI is slow" tickets that eat into your platform team's roadmap. Intelligent caching is one of the fastest ways to break that cycle.
This checklist walks platform teams through three high-impact levers: intelligent caching, test intelligence, and parallelization. These cut build latency, lower costs, and keep feedback loops tight. And if you'd rather get these patterns out of the box instead of stitching them together yourself, take a look at how Harness CI brings Cache Intelligence, Test Intelligence™, and parallel pipelines together in a single platform.
We're focusing on three things that consistently deliver the biggest bang for your effort:
Think of this as a scorecard. Capture your current build metrics first, then work through each area to figure out where intelligent caching, smarter testing, and better parallelization will give you the most improvement.
Before you touch anything, measure three things:
Developer wait time. What are your p50 and p95 build durations for PR and main branch pipelines? This is the number your developers feel every day.
Cost. How much compute, storage, and bandwidth are you burning on CI/CD and artifact delivery? Most teams are surprised when they actually add it up.
Reliability. How often are flaky tests, registry timeouts, or failed pulls derailing builds? These "small" issues compound fast.
As you roll out intelligent caching, test intelligence, and parallelization, these numbers should all move in the right direction together. Faster feedback, lower spend, fewer flake-related fires.
Here's the thing: most teams will tell you they "use caching." But very few treat intelligent caching as a deliberate, governed part of their CI/CD architecture. There's a big difference between flipping on a cache toggle and actually thinking through a caching strategy.
Intelligent caching for CI/CD comes down to clear decisions:
Instead of one generic cache, intelligent caching becomes a set of policies and metrics that your platform team owns and governs.
Start with a quick self-audit. Be honest; that's where the value is:
If most of your answers are "no" or "not sure," intelligent caching is your single biggest opportunity for improvement.
In a mature setup, intelligent caching typically includes:
Docker layer caching. Base images and common layers are served from local cache nodes. Only true cache misses travel across regions or clouds. (For context, Harness CI offers managed Docker Layer Caching that works across any build infrastructure, including Harness Cloud, with automatic eviction of stale layers.)
Dependency caching as a policy. Shared caches for language dependencies, keyed by lockfiles or checksums. Clear eviction and refresh rules so you're not pulling stale or vulnerable packages. Harness calls this Cache Intelligence. It automatically detects and caches dependencies without requiring manual configuration for each repo.
Build artifact caching. Reuse of intermediate build outputs, especially valuable for monorepos and shared components. Cache warmup for your most frequent pipelines. Harness's Build Intelligence feature handles this for tools like Gradle and Bazel by storing and reusing build outputs that haven't changed.
Policy-driven behavior. TTLs scoped by artifact type and environment. Cache bypass on dedicated security branches or hotfix pipelines.
Full observability. Cache hit/miss metrics broken down by repo and pipeline. Latency and bandwidth savings visible to the platform team. Harness CI surfaces intelligence tiles in the stage summary showing exactly how much time Cache Intelligence, Test Intelligence, and Docker Layer Caching saved on each build.
This is intelligent caching as a governed layer in front of your registries, package managers, and artifact stores; not just a hidden toggle buried in your CI tool's settings.
Here's how this typically plays out for a PR:
The impact is often visible within a day. Those minutes of "pulling…" that clutter your build logs? They just vanish from the hot path.
Score yourself here:
If you have fewer than three of these checked, start here. Intelligent caching will have an outsized impact on your build times and bandwidth costs.
Once caching is doing its job, the next bottleneck is almost always testing. Over time, test suites swell until they dominate your CI budget. Teams add tests but rarely prune them, and before you know it, every PR triggers a full regression run.
Test intelligence focuses on running only the tests that actually matter for a given change, with full runs reserved for where they truly count.
You probably need test intelligence if:
In that world, even perfect intelligent caching can't overcome the fundamental problem: you're doing way more work than necessary.
Test intelligence typically works by:
Then you decide when to run targeted subsets (PRs) versus full suites (main branch, nightly, pre-release).
Harness's Test Intelligence™ uses machine learning to figure out which tests are actually affected by a code change and can accelerate test cycles by up to 80%. It also supports test parallelism, automatically splitting tests based on timing data so they run concurrently instead of in sequence.
With intelligent caching already in place, these selected tests start and finish faster because they spend less time waiting on dependency and artifact downloads. The two work as a multiplier.
If most of these aren't in place, test intelligence should be your next move after your initial intelligent caching rollout.
Caching and selective tests still underperform if your pipeline runs as one long serial chain. At that point, idle capacity is your real enemy.
Parallelization makes sure jobs run side by side so your builds actually use the runners and hardware you're already paying for.
Watch for these patterns:
Parallelization is how you break big problems into smaller, faster pieces without losing coverage.
Mature CI/CD setups typically break pipelines into many jobs and stages (build, unit tests, integration tests, UI tests, security scans, packaging, deployment), each running independently where possible.
They use fan-out / fan-in patterns: fan-out to share big test suites into many small, independent jobs, and fan-in to aggregate results into a single decision point.
The key is aligning parallel jobs with intelligent caching. Each shard reuses cached dependencies, Docker layers, and artifacts. Cache keys are structured so shards benefit from each other's work. This is where intelligent caching becomes a true multiplier. Every cache hit benefits many jobs running at once.
Harness CI supports this natively. You can define multi-stage pipelines with parallel steps, and combined with Cache Intelligence and Test Intelligence's automatic test splitting, your builds naturally take advantage of all available capacity.
If intelligent caching is already in place, parallelization is often the fastest path to another noticeable drop in build times.
Here's the full picture. Count how many you can honestly check off.
Intelligent Caching
Test Intelligence
Parallelization
How to read your score:
0–7 checks: There are big wins on the table. Start with intelligent caching. It's typically the highest-leverage first move.
8–12 checks: Solid foundation. Focus on tuning test intelligence and parallelization for the next round of gains.
13+ checks: You're in great shape. Keep refining policies, observability, and edge cases.
If you're investing in a modern CI platform like Harness CI, intelligent caching, test intelligence, and parallelization aren't separate projects you tackle one at a time. They're connected patterns that reinforce each other. Faster builds, lower costs, and a lot less developer toil.
Pick one or two gaps from this checklist, bring them to your next team planning session, and start turning intelligent caching into a visible, strategic win for your platform.
Want to see these patterns in action instead of building them yourself? Harness CI brings Cache Intelligence, Test Intelligence™, Build Intelligence, and Docker Layer Caching together with parallel pipelines and Harness Cloud infrastructure, so platform teams can focus on golden paths instead of plumbing.
Intelligent caching in CI/CD goes beyond basic "store and hope for hits." It combines caching with policies, observability, and automation; controlling what gets cached, where it's stored, how long it lives, and when it gets refreshed. For Docker images, dependencies, and build artifacts, this means pipelines that are both fast and safe.
Basic caching saves data temporarily and crosses its fingers. Intelligent caching looks at usage patterns, environments, and business rules to decide which artifacts deserve cache space, how TTLs should be tuned, when to bypass the cache entirely, and how to track the impact on build times and costs. It's a governed capability, not a checkbox.
Intelligent caching shortens build and test stages, reduces cloud egress and registry load, and takes a big chunk out of daily developer wait time. For platform and DevOps teams, it's a lever you can adjust with policy and metrics — not one-off tweaks buried in pipeline YAML.
Nope. Redis is great for application-level caching, but CI/CD intelligent caching typically relies on reverse proxies, artifact caching layers, and CI-native mechanisms (like Harness's Cache Intelligence) that sit in front of registries, package managers, and object stores.
Track p50 and p95 build times, cache hit rates, origin requests, bandwidth/egress costs, and registry load before and after enabling intelligent caching. The combination of faster builds and lower infrastructure costs tells a clear, defensible ROI story.


At SREday NYC 2026, the ShipTalk podcast welcomed Zachary Gruenberg, Solution Engineer and Machine Identity SME at Palo Alto Networks, for a conversation about one of the fastest growing challenges in modern infrastructure: machine identity management.
Throughout the conference, much of the discussion centered on AI agents automating operational tasks—from incident response to infrastructure management. But every automated agent interacting with systems still requires credentials and access permissions.
In the episode, ShipTalk host Dewan Ahmed, Principal Developer Advocate at Harness, spoke with Zachary about how the rapid rise of AI-driven automation is creating an explosion of machine identities—and why managing them is quickly becoming a major security concern for SRE and platform teams.
In the past, identity management primarily focused on human users logging into systems.
Today, the landscape looks very different.
Modern infrastructure environments include a growing number of non-human identities such as:
Each of these components requires credentials in order to interact with infrastructure, APIs, and other services.
As organizations deploy more automation and AI-driven workflows, the number of machine identities can quickly outnumber human users by several orders of magnitude.
For SRE teams, this creates a new challenge: tracking which systems have access to what resources—and ensuring those permissions remain secure.
One of the most common problems Zachary sees is that teams prioritize functionality when deploying new automation systems.
When engineers introduce AI agents or automated workflows, identity management is often treated as an afterthought.
That approach can lead to:
To address this, Zachary encourages organizations to treat machine identity as a core component of their security architecture, rather than a secondary concern.
This often includes practices such as:
When these controls are built into the platform early, security can scale alongside automation instead of becoming a bottleneck.
Despite the growing awareness of identity security, Zachary frequently encounters one recurring issue.
Many teams simply lose track of the machine identities they have created.
Over time, environments accumulate service accounts, API keys, tokens, and automation credentials that remain active long after the systems that created them are gone.
This “identity sprawl” can create significant risk, particularly in environments where automated systems are interacting with critical infrastructure.
The challenge becomes even greater as AI agents begin performing more complex operational tasks.
Ensuring that these agents have the right level of access—and no more—requires visibility into every identity operating within the system.
As organizations adopt AI-driven automation across operations, the importance of identity security will only increase.
Each new automation tool or AI workflow adds another layer of machine identities interacting with infrastructure.
For SRE and platform teams, this means reliability engineering and security practices are becoming increasingly interconnected.
Strong machine identity management ensures that automation systems can operate safely while protecting the infrastructure they interact with.
Zachary Gruenberg’s message is a timely reminder that the growth of AI agents and automation does not eliminate the need for strong security foundations.
If anything, it makes them even more critical.
As organizations move toward more autonomous systems, understanding who—or what—has access to critical infrastructure will remain one of the most important challenges for reliability and security teams alike.
Subscribe to the ShipTalk Podcast
Enjoy conversations like this with engineers, platform builders, and reliability leaders from across the industry.
Follow ShipTalk on your favorite podcast platform and stay tuned for more stories from the people building the systems that power modern technology. 🎙️🚀


Definition: CI pipeline optimization is the practice of reducing build and test time and the cost per build by running only what matters, reusing unchanged components, and enforcing standardized governance.
Platform teams are wasting thousands of hours every year because their pipelines aren't working right. Developers wait 45 minutes for builds. Jenkins consumes 20% of your team's capacity on maintenance. Infrastructure costs keep climbing, and CI transforms from helpful automation into the thing everyone complains about at standups.
Your team isn't the problem, though. Traditional CI methods just don't work on a larger scale. Giving slow pipelines more computing power is like buying a faster car to get through traffic: you're still stuck in the same traffic jam, but you have to pay more.
AI-powered pipeline optimization changes the game. Instead of running everything all the time, smart systems look at code changes, past patterns, and dependencies to figure out what really matters. Harness CI brings these optimization methods together into one platform. Find out more about how to speed up your pipelines.
AI-based optimization is all about getting rid of waste, not adding capacity. One way to solve the problem is to clean out your garage, and the other way is to rent a storage unit.
Recent studies show that AI methods like reinforcement learning are the best way to improve CI/CD, with testing accounting for 41.2% of all optimization gains. This is how modern platforms handle it:
Test Intelligence looks at code dependencies and past patterns to only run the tests that were affected by your changes. Changed just one service? You don't have to take the whole test suite like you're studying for finals if you only have one test.
According to research, this method cuts the time it takes to run tests by 40% and the time it takes to build everything by 33%. Instead of waiting for thousands of tests to finish before they can merge a two-line fix, developers get feedback right away.
To keep costs down, you need to make changes to the building, not just buy cheaper machines. Ephemeral build environments run each job in separate, dedicated containers that automatically grow and shrink as needed. It's like Uber for build capacity: you only pay for what you use when you use it.
This gets rid of the "noisy neighbor" effect, where one team's resource-heavy build slows down everyone else. Teams say that infrastructure costs have gone down by as much as 76% when they use smart caching of dependencies and Docker layers along with Jenkins clusters that are over-provisioned and mostly idle.
Instead of being the referee between teams, platform leaders use automated policies to see and control what's going on. Analytics dashboards show build performance metrics, failure patterns, and how resources are used across teams without needing custom tools that always turn into someone's side project.
Policy templates and RBAC controls make sure that security practices are always the same. SLSA L3 compliance makes sure that the build provenance can't be changed. This lets developers do things on their own within limits. Developers get the freedom they want, platform teams get the control they need, and nobody's happy hour is ruined by emergency pipeline fixes.
To optimize a multi-cloud environment, you need to find a balance between letting developers work on their own and keeping control of operations. You want teams to work quickly, but you don't want your infrastructure to become a lawless place. These practices help platform teams keep their performance steady without making things more complicated.
Give teams the freedom to work on their own without letting the pipeline get too big or the security get too weak. Use Open Policy Agent rules to make sure that things like container scanning are done, but let developers change how they work. It's like building with LEGOs: the pieces fit together in certain ways, but teams can still make whatever they need.
Get rid of noisy neighbors and the risk of leaks between clouds and regions. Each build execution takes place in a clean, isolated environment. This stops configuration drift and makes sure that performance is always the same, no matter which cloud runs the job.
Set clear limits and alerts for queue time, cache hit rate, flaky test rate, and cost per build, and treat them as business-critical metrics. These become your optimization compass, showing you where things are slowing down before they affect how much work developers can get done. You can't fix something if you don't measure it, and you definitely can't explain why your budget went over without data.
All cloud providers should use dependency fingerprinting and reuse of Docker layers. A cache hit rate of more than 80% means that the optimization is working well. A sudden drop in the rate means that there are configuration problems or changes in dependencies that need to be fixed. When caching works, builds go fast. You'll know right away when it breaks.
Put scanning and compliance checks right into the templates for the pipeline. This shift-left method finds vulnerabilities early and keeps the same level of security whether builds run on AWS, Azure, or Google Cloud. Instead of a separate gate where developers wait for approvals, security happens automatically.
Keep an eye on this along with other traditional performance metrics. Sudden spikes often show that resources are being used inefficiently or that test suites are running out of control and using up compute power without adding value. You want more than just "CI stuff" when your CFO asks why the AWS bill doubled.
The best methods focus on getting rid of extra work by making smart choices and reusing things. In real business settings, these methods can cut the time it takes to do things by as much as 8 hours to less than 1 hour. Deploying before lunch is different from deploying before you leave for the day.
This is how to put these optimization ideas into action:
Test Intelligence looks at code changes and only runs the unit tests that are needed, cutting test cycles by up to 80%. Combine this with flaky test quarantine to separate tests that don't work and make your feedback signals more stable. No more running the whole suite again because one flaky test failed three times this week.
Cache Intelligence takes care of dependency caching on its own, and Docker layer caching can cut build times by 70 to 90%. Keep an eye on cache hit rates and set size limits to keep cache bloat from slowing down performance. A well-tuned cache is like a toolbox that is well-organized: everything is where you need it.
Build Intelligence stores compiled artifacts and test results in caches, which speeds up builds by 30% to 40% by avoiding unnecessary rebuilds. First, do quick checks to avoid having to do expensive work on code that hasn't changed. Why do you have to recompile everything when only one service changed?
Put Docker instructions in order from least to most frequently changing, and copy dependency manifests before source code. This simple change makes it possible to reuse layers across builds. The order in which you load the dishwasher makes a big difference.
Use BuildKit Cache Mounts with Package Managers Cache package download directories (npm, pip, Maven) across builds to significantly reduce infrastructure costs by having to rebuild less often.
To cut down on the total time it takes to run a pipeline, run independent steps at the same time. Test sharding and parallel execution can cut down on feedback cycles by a lot. Don't make things that don't depend on each other wait in line.
Keep an eye on queue time, cache hit rates, flaky test percentages, and cost per build as top metrics. Use the built-in analytics to make sure that improvements last and aren't just short-term gains. Things that are measured get better.
Even when they use the right strategies, teams run into problems when they try to optimize pipelines. What is good news? We can see these problems coming, which means we can also see the solutions.
Legacy systems often have builds that are tightly linked, which makes it hard to improve them bit by bit. No one wants to be the one who breaks the build because everything depends on everything else.
How to fix:
Developers have to run pipelines again or ignore failures completely when tests are not reliable. When "just run it again" is common advice, you've lost the signal in the noise.
How to fix it:
Costs for infrastructure become hard to predict as teams and pipelines grow. This month's $40,000 surprise is last month's $10,000 bill.
How to fix it:
If done wrong, security scanning can slow down pipelines a lot. No one wants to have to choose between safety and speed.
How to fix it:
The next generation of CI/CD will focus on predictive optimization and self-healing. Systems will stop problems from happening instead of reacting to them.
It becomes harder to find the cause of failures as pipelines become more complicated. AI will find the most likely causes, point out patterns that keep happening, and suggest practical solutions before you finish your first cup of coffee.
Before problems happen, systems will learn from past patterns to allocate resources. It's like traffic apps that tell you to take a different route before you get stuck in traffic.
Pipelines will find problems and automatically roll back changes or start remediation workflows without any human help. The engineer who is on call stays asleep, and the problem fixes itself.
Governance will be shown as policy: who can do what, where workloads can run, and what needs to be approved. All of this can happen without slowing down developers or making platform teams look over every change.
AI-powered acceleration is the first step in optimizing a pipeline by getting rid of unnecessary work. Test Intelligence, Cache Intelligence, and Build Intelligence speed up feedback cycles by only running what matters and reusing outputs that don't change. These aren't just ideas; they're tools that get real results.
Standardized templates with policy enforcement make governance easier without limiting the freedom of developers. In just two quarters, 92% of commercial cloud pipelines adopted Microsoft's governed templates. This shows that this method can grow quickly, even in very large companies.
Book a demo to see how Harness Continuous Integration delivers builds that are four times faster and cuts infrastructure costs by up to 76%.
Pipeline optimization saves money by smartly allocating resources and getting rid of unnecessary compute work. Selective test execution and AI-powered caching cut compute time by 30% to 80%. Ephemeral build machines get rid of wasted resources and automatically adjust compute resources to the right size. You stop paying for space you don't need.
Use golden templates with automatic policy enforcement to make security requirements the same for everyone while still letting developers be flexible. Automated checks and approval workflows help platform teams set rules for how things should be done. Within those guardrails, developers still have control over how things are done. They keep you safe like highway guardrails do, but they don't tell you exactly where to go.
Legacy migrations are hard because they require complicated configurations and training for the whole team. Most teams finish transitions in 6 to 12 weeks. Migration tools take care of routine tasks, but custom integrations need to be done by hand. During the learning curve phase, you should expect your productivity to go down at first. Plan for it, tell people about it, and the dip will be shorter.
Test Intelligence cuts test cycles by up to 80% by only running tests that are affected by code changes. Add build output caching and Docker layer caching to get even better results. Parallel execution and incremental builds get rid of extra work at all stages of CI. Begin with the method that deals with your biggest problem.
SLSA L3 compliance works by automatically generating provenance and artifact attestation, which doesn't slow down builds. Instead of making separate approval gates, security scanning is built right into build templates. Isolated build environments and tamper-proof artifact generation keep things compliant while keeping speed. You don't have to pick between speed and safety.
Yes. Faster feedback, less manual work, and regular quality checks are good for teams of all sizes. You don't need a team of platform engineers to optimize modern platforms. You don't need a group of 50 people to speed up your builds.
Most teams see real progress in a matter of weeks. Quick wins like smart caching and test selection make the feedback cycle better right away. Ephemeral environments and other more thorough optimizations take longer but keep costs down over time. Start small, see what works, and then grow it.


--
Key Takeaways:
The Harness MCP server is an MCP-compatible interface that lets AI agents discover, query, and act on Harness resources across CI/CD, GitOps, Feature Flags, Cloud Cost Management, Security Testing, Resilience Testing, Internal Developer Portal, and more.
--
The first wave of MCP servers followed a natural pattern: take every API endpoint, wrap it in a tool definition, and expose it to the LLM. It was fast to build, easy to reason about, and it was exactly how we built the first Harness MCP server. That server taught us a lot: solid Go codebase, well-crafted tools, broad platform coverage across 30 toolsets. It also taught us where the one-tool-per-endpoint model hits a wall.
For platforms the size of Harness, spanning the entire SDLC, the pattern doesn't scale. When you expose one tool per API endpoint, you're asking the LLM to be a routing layer, forcing it to do something a switch statement does better. Every tool definition consumes context that could be spent on reasoning. At ~175 tools, that's ~26% of the LLM's context window before the developer even types a prompt.
So we iterated. The Harness MCP v2 redesign does the same work with 11 tools at ~1.6% context consumption. The answer isn't fewer features, it's a different architecture: a registry-based dispatch model where the LLM reasons about what to do, and the server handles how to do it.
When an MCP client connects to a server, it loads every tool definition into the LLM's context window. Every name, description, parameter schema, and annotation. For the first Harness server at ~130+ active tools, here's what that costs:

That's the core insight: the first server uses ~26% of context on tool definitions before any work begins. The v2 uses ~1.6%.
This isn't a theoretical concern. Research on LLM behavior in large context windows, including Liu et al.'s "Lost in the Middle" findings, shows that models struggle to use information placed deep within long contexts. As Ryan Spletzer recently wrote, dead context doesn't sit inertly: "It dilutes the signal. The model's attention is spread across everything in the window, so the more irrelevant context you pack in, the less weight the relevant context carries."
Anthropic's own engineering team has documented this trade-off: direct tool calls consume context for each definition and result, and agents scale better when the tool surface area is deliberately constrained.
The problem compounds in real-world developer environments. If you're running Cursor or Claude Code with a Playwright MCP, a GitHub MCP, and the Harness MCP, those tool definitions stack. EclipseSource's analysis shows that a standard set of MCP servers can eat 20% of the context window before you even type a prompt. The recommendation: stay below 40% total context utilization. Any MCP server with 100+ tools, ours included, would consume more than half that budget on its own.
The context window tax isn't unique to Harness: it's an industry-wide problem. Here's how the v2 server compares to popular MCP servers in the wild:

Lunar.dev research: "5 MCP servers, 30 tools each → 150 total tools injected. Average tool description: 200–500 tokens. Total overhead: 30,000–60,000 tokens. Just in tool metadata." MCP server v2 at ~3,150 tokens would represent just 5–10% of a typical multi-server setup's overhead.
Real-world Claude Code user: A developer on Reddit r/ClaudeCode with Playwright, Context7, Azure, Postgres, Zen, and Firecrawl MCPs reported 83.3K tokens (41.6% of 200K) consumed by MCP tools immediately after /clear. That's before a single prompt.
Anthropic's code execution findings: Anthropic's engineering team reported that a workflow consuming 150,000 tokens was reduced to ~2,000 tokens (a 98.7% reduction) by switching from direct tool calls to code-based tool invocation. The principle is clear: fewer, smarter tools beat more, narrower ones.
MCPAgentBench: An academic benchmark found that "nearly all evaluated models exhibit a decline of over 10 points in task efficiency when tool selection complexity increases." Models overwhelmed with tools prioritize task resolution over execution efficiency. They get the job done, but waste tokens doing it.
Cursor enforces an 80-tool cap, OpenAI limits to 128 tools, and Claude supports up to ~120. The v2 server's 11 tools leave massive headroom to run Harness alongside other MCP servers without hitting these limits.
Consider a concrete example: a developer running Cursor with Playwright (21 tools), GitHub MCP (~40 tools), and the old Harness MCP (~175 tools) would hit ~236 tools, well past Cursor's 80-tool cap. With v2 Harness (11 tools), the same stack is 72 tools, comfortably under the limit.
With Claude Code, the same old stack would burn ~76,400 tokens (~38%) on tool definitions alone. With v2, it drops to ~27,550 tokens (~14%), freeing ~48,850 tokens for actual reasoning and conversation.
The MCP ecosystem is in the middle of a reckoning. Scalekit ran 75 benchmark runs comparing CLI and MCP for identical GitHub tasks on Claude Sonnet 4, and CLI won on every efficiency metric: 10–32x cheaper, 100% reliable vs MCP’s 72%. For a simple “what language is this repo?” query, CLI used 1,365 tokens. MCP used 44,026 — almost entirely from schema injection of 43 tool definitions the agent never touched.
The Playwright team shipped the same verdict in hardware. Their new CLI tool saves browser state to disk instead of flooding context. In BetterStack’s benchmarks, CLI used ~150 tokens per interaction vs MCP’s ~7,400+ of accumulated page state. CircleCI found CLI completed browser tasks with 33% better token efficiency and a 77 vs 60 task completion score.
The CLI camp’s argument is real: schema bloat kills performance. But their diagnosis points at the wrong layer. The problem isn’t MCP. It’s naive MCP server design.
CLI wins when the agent already knows the tool. gh, kubectl, terraform: these have extensive training data. The agent composes commands from memory, pays zero schema overhead, and gets terse, predictable output. Scalekit found that adding an 800-token “skills document” to CLI reduced tool calls and latency by a third.
CLI also wins on composition. Piping grep into jq into xargs chains operations in a single tool call. An MCP agent doing the same work makes N round-trips through the LLM, each one burning context.
But CLI’s advantages dissolve the moment you cross three boundaries:
CLI works when the agent knows the command. For a platform like Harness, with 122+ resource types across CI/CD, GitOps, FinOps, security, chaos, and IDP, the agent can’t know the API surface from training data alone. MCP’s harness_describe tool lets the agent discover capabilities at runtime. CLI would require the agent to guess curl commands against undocumented APIs.
As Scalekit themselves concluded: “The question isn’t CLI or MCP. It’s who is your agent acting for?” CLI auth gives the agent ambient credentials: your token. For multi-tenant, multi-user environments (which is where Harness operates), MCP provides per-user OAuth, explicit tool boundaries, and structured audit trails.
CLI agents can run arbitrary shell commands. An MCP server constrains the agent to declared tools with typed inputs. The v2 server’s elicitation-based confirmation flows, fail-closed deletes, and read-only mode are protocol-level safety guarantees that CLI can’t replicate.
The CLI vs MCP debate is really about schema bloat and naive tool design. The v2 Harness MCP server eliminates the arguments against MCP without losing the arguments for it:
Schema bloat? 11 tools at ~3,150 tokens. That’s less than a single CLI help output for a complex tool. Cursor’s 80-tool cap? We use 11. The 44,026-token GitHub MCP problem? We’re 14x leaner.
Round-trip overhead? The registry-based dispatch means the agent makes one tool call to harness_diagnose and gets back a complete execution analysis — pipeline structure, stage/step breakdown, timing, logs, and root cause. A CLI agent would need to chain 4–5 API calls to assemble the same picture.
Discovery? harness_describe is a zero-API-call local schema lookup. The agent discovers 125+ resource types without a single network request. CLI would require a man page the agent has never seen.
Composition? Skills + prompt templates encode multi-step workflows (build-deploy-app, debug-pipeline-failure) as server-side orchestration. The agent reasons about what to do; the server handles how to chain it. Same efficiency as a CLI pipe, with protocol-level safety.
The real lesson from the benchmarks: MCP servers with 43+ tools and no architecture for context efficiency will lose to CLI on cost metrics. But a well-designed MCP server with 11 tools, a registry, and a skills layer outperforms both naive MCP and naive CLI — and provides authorization, safety, and discoverability that CLI architecturally cannot.
We stopped designing for API parity and started designing for agent usability.
The v2 server is built around a registry-based dispatch model. Instead of one tool per endpoint, we expose 11 intentionally generic verbs. The intelligence lives in the registry: a declarative data structure that maps resource types to API operations.

When an agent calls harness_list(resource_type="pipeline"), the server looks up pipeline in the registry, resolves the API path, injects scope parameters (account, org, project), makes the HTTP call, extracts the relevant response data, and appends a deep link to the Harness UI. The agent never needs to know the underlying API structure.
Each registry entry is a declarative ResourceDefinition:
{
resourceType: "pipeline",
displayName: "Pipeline",
toolset: "pipelines",
scope: "project",
identifierFields: ["pipeline_id"],
operations: {
list: {
method: "GET",
path: "/pipeline/api/pipelines/list",
queryParams: { search_term, page, size },
responseExtractor: (raw) => raw.content
},
get: {
method: "GET",
path: "/pipeline/api/pipelines/{pipeline_id}",
responseExtractor: (raw) => raw.data
}
}
}
Adding support for a new Harness module requires adding one declarative object to the registry. No new tool definitions. No changes to MCP tool schemas. The LLM's tool vocabulary stays constant as the platform grows.
Today, the registry covers 125+ resource types across 30 toolsets, spanning the full Harness platform:
The architecture wasn't designed in a vacuum. We built it specifically for the environments developers actually use.
Cursor and Windsurf connect via stdio transport — the server runs as a local process alongside the IDE. With 11 tools instead of 130+, the Cursor agent has a minimal, clear menu. It doesn't waste reasoning cycles on tool selection or get confused by 40 CCM-specific tools when the developer is debugging a pipeline failure.
For teams that only use specific Harness modules, HARNESS_TOOLSETS lets you filter at startup:
{
"mcpServers": {
"harness": {
"command": "npx",
"args": ["-y", "harness-mcp-v2@latest"],
"env": {
"HARNESS_API_KEY": "pat.xxx.yyy.zzz",
"HARNESS_TOOLSETS": "pipelines,services,connectors"
}
}
}
}
The agent only sees resource types from the enabled toolsets. The rest don't exist as far as the LLM is concerned.
Claude Code excels at multi-step workflows. We leaned into that with 26 prompt templates across four categories:
Each prompt template encodes a multi-step workflow the agent can execute. debug-pipeline-failure doesn't just fetch an execution — it calls harness_diagnose, follows chained failures, and produces a root cause analysis with actionable fixes.
The v2 server also supports multi-project workflows without hardcoded environment variables. An agent can dynamically discover the account structure, then scope subsequent calls with org_id and project_id parameters. No configuration changes needed.
Every tool accepts an optional url parameter. Paste a Harness UI URL, a pipeline page, an execution log, a dashboard, and the server automatically extracts the account, org, project, and resource identifiers. The agent gets context without the developer having to specify it manually.
Reducing tool count solves the context efficiency problem. But developers don't just need fewer tools — they need tools that know how to chain together into real workflows. That's where Harness Skills come in.
The v2 server ships with a companion skills layer (github.com/thisrohangupta/harness-skills) that turns raw MCP tool access into guided, multi-step workflows. Skills are IDE-native agent instructions that teach the AI how to use the MCP server effectively — without the developer having to explain Harness concepts or orchestration patterns.
Skills operate at three levels:
Every IDE gets a base instruction file, loaded automatically when the agent starts:
These files teach the agent: what the 11 tools do, how Harness scoping works (account → org → project), dependency ordering (always verify referenced resources exist before creating dependents), and how to extract context from Harness UI URLs.
The 26 MCP prompt templates registered directly in the server. Any MCP client can invoke them. They encode multi-step workflows with phase gates, e.g., build-deploy-app structures a 4-phase workflow (clone → scan → CI pipeline → deploy) with explicit "do not proceed until this step is done" checkpoints.
Specialized SKILL.md files that function as slash commands in the IDE. Each skill includes YAML frontmatter (trigger phrases, metadata), phased instructions, worked examples, performance notes, and troubleshooting steps.
Without skills, a developer says "deploy my Node.js app" and the agent has to figure out the right Harness concepts, the correct ordering, and the proper API calls from scratch. With skills, the flow is:
harness_list / harness_create / harness_execute callsThe skills layer delivers three measurable improvements:
Without skills, the agent typically needs 3–5 exploratory tool calls to understand Harness's resource model before starting real work. Skills encode this knowledge upfront — the agent knows to check for existing connectors before creating a pipeline, to verify environments exist before deploying, and to use harness_describe for schema discovery instead of trial-and-error.
Harness resources have strict dependency chains (connector → secret → service → environment → infrastructure → pipeline → trigger). Skills encode the 7-step "Deploy New Service" and 8-step "New Project Onboarding" workflows as ordered sequences. The agent doesn't discover dependencies through failures, it follows the prescribed order.
Each failed API call and retry burns tokens. Skills eliminate the most common failure modes (wrong scope, missing dependencies, incorrect parameter formats) by teaching the agent the patterns before execution. The combination of 11 tools (minimal context overhead) plus skills (minimal wasted calls) means more of the context window is available for the developer's actual task.
The first Harness MCP server (harness/mcp-server) pioneered the IDE-native pattern with a review-mcp-tool command that works across Cursor, Claude Code, and Windsurf via symlinked definitions:
One canonical definition in .harness/commands/, symlinked to all three. Update once, propagate everywhere.
The v2 skills layer extends this pattern from developer-tool commands to full DevOps workflows, the same "define once, deploy to every IDE" architecture, applied to pipeline creation, deployment debugging, cost analysis, and security review.
MCP servers that can create, update, and delete resources need safety guardrails. We built them in from the start.
Human-in-the-loop confirmation: All write operations use MCP elicitation to request explicit user confirmation before executing. The agent presents what it intends to do; the developer approves or rejects.
Fail-closed destructive operations: harness_delete is blocked entirely if the MCP client doesn't support elicitation. No silent deletions.
Read-only mode: Set HARNESS_READ_ONLY=true for shared environments, demos, or when you want agents to observe but not act.
Secrets safety: The secret resource type exposes metadata (name, type, org, project) but never the secret value itself.
Rate limiting and retries: Configurable rate limits (default: 10 req/s), automatic retries with backoff for transient failures, and bounded pagination to prevent runaway list operations.
The v2 server supports two transports:
For team deployments, the HTTP transport is compatible with MCP gateways like Portkey, LiteLLM, and Envoy-based proxies, enabling shared control planes with centralized auth, observability, and policy enforcement.
# Local (Cursor, Claude Code)
npx harness-mcp-v2@latest
# Remote (team deployment)
npx harness-mcp-v2@latest http --port 3000
# Docker
docker run -e HARNESS_API_KEY=pat.xxx.yyy.zzz harness-mcp-v2
The shift from 130+ tools to 11 isn't about simplification for its own sake. It's about recognizing that the best MCP servers are capability-oriented agent interfaces, not API mirrors.
Building the first Harness MCP server taught us the same lesson the broader ecosystem is learning: when you expose one tool per API endpoint, you're asking the LLM to be a routing layer. You're consuming context on definitions that could be used for reasoning. And you're fighting against the LLM's actual strengths, reasoning, planning, and multi-step problem solving, by forcing it to do something a switch statement does better. That first server made the cost concrete. The v2 is our answer.
The registry pattern inverts this. The tool vocabulary is stable: 11 verbs today, 11 verbs when Harness ships 50 more resource types. The registry is extensible. The skills layer is composable. The LLM reasons about what to do, and the server handles how to do it. That's not just an efficiency win — it's the correct division of labor between an LLM and a server.
This is the pattern we think more MCP servers should adopt, especially platforms with broad API surfaces. The MCP specification itself is built on the idea that servers expose capabilities, not endpoints. We took that literally.
The efficiency gains from the v2 architecture translate directly into concrete, time-saving use cases for developers operating within their IDEs. The combination of a minimal tool surface (11 tools), deep resource knowledge (125+ resource types), and pre-encoded workflows (Harness Skills) allows the agent to handle complex DevOps tasks with minimal guidance.
See it in action:
Some other use cases:
Debug a Failed CI Pipeline: Get root cause and logs for a pipeline run.
Onboard New Service: Create a Service, Environment, Infrastructure, and initial Connector.
Review Cloud Cost Anomaly: Investigate a sudden spike in cloud spend.
Check Compliance Status: Verify a service's SBOM compliance against OPA policies.
Deploy App to Prod: Execute a canary deployment pipeline.
npx harness-mcp-v2@latest
Configure with your Harness PAT (account ID is auto-extracted):
HARNESS_API_KEY=pat.<accountId>.<tokenId>.<secret>
Full source: github.com/thisrohangupta/harness-mcp-v2
Official Harness MCP Server: github.com/harness/mcp-server
---
The Harness MCP server is an MCP-compatible server that lets AI agents interact with Harness resources using a small set of generic tools.
Each exposed tool adds metadata to the model context. A smaller tool surface leaves more room for reasoning and task execution.
Instead of exposing one tool per API endpoint, it uses 11 generic tools plus a registry that maps resource types to the correct API operations.
The post mentions Cursor, Claude Code, Claude Desktop, Windsurf, Gemini CLI, and other MCP-compatible clients.
The design includes write confirmations, fail-closed delete behavior, read-only mode, and controls for retries, rate limiting, and deployment transport.
.jpg)
.jpg)
CI/CD tools are software platforms that automate code integration, testing, release preparation, and deployment. They connect source control, build systems, test frameworks, and runtime environments into a repeatable delivery pipeline.
CI/CD tools sit at the center of how modern teams ship software. Instead of pushing risky, manual releases once a month, you automate builds, tests, and deployments so every change follows the same, reliable path to production. Done right, CI/CD turns release day from an “all‑hands fire drill” into just another commit.
In this guide, we will walk through what ci cd tools are, the key features that actually matter, and how to choose the right platform for your stack.
Along the way, we will show how platforms like Harness Continuous Integration and Harness Continuous Delivery & GitOps bring AI, governance, and deep insights together so you can ship faster without losing control.
CI/CD tools are the backbone of modern software delivery. They automate the process of building, testing, and deploying code, so changes can move from commit to production with minimal friction.
At a minimum, effective CI/CD tools:
To go deeper on pipelines themselves, see our guide on the basics of CI/CD pipelines.
The importance of CI/CD tools in today's software development ecosystem is hard to ignore. They address several challenges teams face every day:
Martin Fowler defined Continuous Integration (CI) as “a software development practice where each member of a team merges their changes into a codebase together with their colleagues' changes at least daily.” Each integration triggers automated builds and tests, allowing teams to detect and address integration issues early. This approach helps maintain a consistently stable codebase and reduces the time and effort required for integration at later stages of development.
Modern CI/CD tools extend this by making those builds faster and more insightful, surfacing exactly which tests or components were impacted by a given change.
The "CD" in CI/CD can stand for either Continuous Delivery or Continuous Deployment. While closely related, these concepts have distinct implications for the software release process.
Continuous Delivery is an extension of continuous integration. It automates the process of preparing code changes for release to production. In continuous delivery, every change that passes automated tests is kept in a production-ready state and can be deployed at any time, often with a manual approval step before release. Additional tests and security scans are run in these test environments. This allows for manual approval and additional testing before the final push to production.
Teams often rely on CI/CD tools with strong approval workflows and policy controls here, so releases stay safe without turning into ticket‑driven bottlenecks.
Continuous Deployment takes automation a step further. In this model, every change that passes the automated tests is automatically deployed to production without manual intervention.
This approach requires a high degree of confidence in the testing process and can significantly reduce the time between writing code and seeing it live in production.
In practice, only teams with mature testing, monitoring, and rollback capabilities should aim for full continuous deployment.
Not all CI/CD tools solve the same problems. When you compare options, focus on a few core dimensions:
While CI/CD and DevOps are often mentioned in the same breath, they are not synonymous. CI/CD refers to specific practices and tools within the software development lifecycle, while DevOps is a broader cultural and operational philosophy.
DevOps aims to break down barriers between development and operations teams, fostering collaboration and shared responsibility. CI/CD practices are a key component of DevOps, but DevOps encompasses a wider range of principles and practices aimed at improving overall software delivery and operational performance.
Think of CI/CD tools as the automation layer that makes DevOps ways of working real in day‑to‑day delivery.
CI/CD security is a critical consideration in modern software development. It involves implementing security measures throughout the CI/CD pipeline to protect against vulnerabilities and ensure the integrity of the software delivery process. This includes:
By integrating security into the CI/CD pipeline, organizations can shift security left, addressing potential issues earlier in the development process and reducing the risk of security breaches in production environments. For more information, check out DevSecOps in the Harness Academy.
If you are building or modernizing pipelines today, plan security into your CI/CD tools selection from day one.
Advanced platforms also bring AI into this space. Harness, for example, offers AI‑assisted deployment verification that automatically analyzes metrics and logs during deployments to catch anomalies and trigger safe rollbacks.
The CI/CD tooling landscape is diverse, offering solutions for various needs and preferences. Some common CI/CD tools include:
Each of these CI/CD tools has strengths. The right choice depends on your existing ecosystem, team skills, compliance needs, and appetite for maintaining tooling.
A practical evaluation process for CI/CD tools looks something like this:
If you are comparing cloud‑hosted vs self‑managed approaches, our article on cloud-based CI/CD options outlines trade‑offs across control, cost, and operational overhead.
Harness stands out in the CI/CD tooling landscape as a comprehensive Software Delivery Platform that addresses the complexities of modern software development. Here's how Harness can elevate your CI/CD processes:
In practice, that looks like:
By adopting Harness as your CI/CD tools platform, you can streamline software delivery, improve code quality, and accelerate time to market while still meeting strict security and governance requirements.
CI/CD tools are software systems that automate how code is built, tested, and deployed. They connect your source control, test suites, and runtime environments into a repeatable pipeline so every change follows the same path to production.
Many CI/CD tools focus just on automation for builds and deployments. DevOps platforms go further with governance, security, cost controls, and developer self‑service. Harness combines both, so you do not need a separate stack of ad‑hoc scripts and point tools.
Yes. Even very small teams benefit from automated builds and tests. Manual steps are fragile and do not scale. Starting with CI/CD tools early keeps quality high and avoids painful rewrites of your delivery process later.
They provide consistent places to run security scans, enforce policies, and control who can deploy what. When combined with DevSecOps practices and capabilities like AI‑assisted verification, CI/CD tools help catch vulnerabilities before they hit customers.
Look for fast feedback, strong integration with your Git provider, clear governance stories, and evidence that the tool can handle your scale. AI‑driven insights and good observability into pipelines are now table stakes for serious teams.
Traditional tools often require heavy scripting and manual integration. Harness focuses on intelligent automation, policy‑driven governance, and a unified platform that covers CI, CD, and insights in one place, so platform teams can standardize delivery without slowing developers down.
With Harness Continuous Delivery & GitOps, you can create reusable templates for rolling deployment pipelines, link them to observability tools, and use AI to help with verification. Harness checks metrics and logs at every step of a rollout and can automatically pause or roll back if there are any problems. This makes rolling deployment a low-effort, repeatable process.
In GitOps, manifests stored in Git describe how rolling deployments should work, and tools like Argo CD make sure that the desired state is reflected in Kubernetes clusters. Platforms like Harness GitOps add enterprise-level visibility, governance, and promotion workflows to Argo CD. This makes it easier to run rolling deployments on a large scale across many services and clusters.
.png)
.png)
Modern software delivery has dramatically accelerated. AI-assisted development, automated CI/CD pipelines, and cloud-native architectures have made it possible for teams to deploy software dozens of times per day.
But speed alone does not guarantee reliability.
At Conf42 Site Reliability Engineering (SRE) 2026, Uma Mukkara, Head of Resilience Testing at Harness and co-creator of LitmusChaos, delivered a clear message: outages are inevitable. In modern distributed systems, assuming your design will always work is not just optimistic—it’s risky.
In fact, as Uma put it, failure in distributed systems is a mathematical certainty.
That’s why resilience testing must become a core, continuous practice in the Software Development Life Cycle (SDLC).
Even the most reliable cloud providers experience outages.
Uma illustrated this with examples that highlight how unpredictable failures can be:
These incidents demonstrate an important reality: the types of failures constantly evolve.
A system validated during design may not be resilient against tomorrow’s failure scenarios. Architecture may stay the same, but the failure patterns surrounding it continuously change.
This is why resilience cannot rely on assumptions.
Hope is not a strategy—verification is.
For a deeper look at this broader approach to resilience, see how chaos engineering, load testing, and disaster recovery testing work together.
Resilience is often misunderstood as simply keeping systems online.
But uptime alone does not make a system resilient.
Uma defines resilience more precisely:
Resilience is the grace with which systems handle failure and return to an active state.
In practice, a resilient system must handle three categories of disruption:
Pod crashes, node failures, infrastructure disruptions, or network faults.
Traffic spikes or sudden demand that pushes systems to their limits.
Regional outages, multi-AZ failures, or infrastructure loss that require recovery mechanisms.
If teams test only one of these dimensions, they leave significant risks undiscovered.
True resilience requires verifying how systems behave across all three scenarios.
One of the biggest challenges Uma highlighted is how organizations treat resilience.
Many teams still see it as a “day-two problem”—something SREs will handle after systems are deployed.
Others assume that once resilience has been validated during system design, the problem is solved.
In reality, resilience must be continuously verified.
As systems evolve with each release, so do their failure modes. The most effective strategy is to:
This approach shifts resilience testing into the outer loop of the SDLC, alongside functional and performance testing.
Instead of waiting for production incidents, teams proactively identify weaknesses before customers experience them.
Uma introduced an important concept: resilience debt.
Resilience debt is similar to technical debt. When teams postpone resilience validation, they leave hidden risks unresolved in the system.
Over time, that debt accumulates.
And when failure eventually occurs—which it inevitably will—the business impact grows proportionally to the resilience debt that was ignored.
The only way to reduce this risk is to steadily increase resilience testing coverage over time.
As testing matures across multiple quarters, organizations gain better feedback about system behavior, uncover more risks earlier, and continuously reduce the likelihood of severe outages.
Another key takeaway from Uma’s session is that resilience testing should not happen in silos.
Many organizations treat chaos testing, load testing, and disaster recovery validation as separate initiatives owned by different teams.
But the most meaningful risks often appear when these scenarios intersect.
For example:
That’s why resilience testing must be approached as a holistic practice combining:
You can explore the fundamentals of resilience testing in the Harness documentation.
Resilience testing also requires collaboration across multiple roles.
Developers, QA engineers, SREs, and platform teams all contribute to validating system reliability.
Uma pointed out that many organizations already share infrastructure for testing but run different experiments independently. By coordinating these efforts, teams can:
Resilience becomes significantly stronger when personas, environments, and test assets are shared rather than siloed.
As systems become more complex, another challenge emerges: knowing what to test and when.
Large organizations may have hundreds of potential experiments, making it difficult to prioritize testing effectively.
Uma described how agentic AI systems can help address this challenge.
By analyzing internal knowledge sources such as:
AI systems can recommend:
These recommendations allow teams to run the right tests at the right moment, improving resilience coverage without overwhelming engineering teams.
To support this holistic approach, Harness has expanded its original Chaos Engineering capabilities into a broader platform: Harness Resilience Testing.
The platform integrates multiple testing disciplines in a single environment, enabling teams to:
By combining these capabilities, teams gain a single pane of glass for identifying resilience risks across the SDLC.
This unified view allows organizations to track trends in system reliability and proactively address weaknesses before they turn into production incidents.
Uma closed the session with a clear conclusion.Resilience testing is not optional.
Outages will happen. Infrastructure will fail. Traffic patterns will change. Dependencies will break.
What matters is whether organizations have continuously validated how their systems behave when those failures occur.
The more resilience testing coverage teams build over time, the more feedback they receive—and the lower the potential business impact becomes.
In modern software delivery, resilience is no longer just a reliability practice.
It is a core discipline of the enterprise SDLC.
Ready to start validating your system’s resilience?
Explore Harness Resilience Testing and start validating reliability across your SDLC.
.png)
.png)
E2E Testing Has a New Bottleneck, and It's Not the Code
End-to-end (E2E) testing has always been the hardest part of a QA strategy. You're simulating real users, navigating real flows, validating real outcomes across browsers, environments, and data states that never hold still.
Traditional test automation tackled this with scripts: rigid, deterministic sequences tied to element selectors and hard-coded values. They worked until the UI changed. Or the data changed. Or a new team member touched the wrong locator. The result: flaky, expensive test maintenance cycles that teams quietly stopped trusting.
AI-driven testing/AI Test Automation promised to fix this. And it has, but only for teams who figured out the new bottleneck. It's not the model. It's not the tooling. It's the prompt engineering.
In AI test automation, you don't write scripts anymore; you write instructions. And the quality of those instructions determines everything that follows.
In general AI usage, a prompt is the input you give to get an output. In intelligent test automation, it's much more specific: a prompt is a natural language instruction that tells the AI testing engine what to do, what to verify, and how to handle what it finds.
A complete, well-formed test prompt for E2E automation includes five ingredients:
Goal
What business outcome is being tested? (e.g., 'User completes checkout with a promo code applied')
Context
Where does the test start? What preconditions exist? What user state or data should be assumed?
Specifics
Exact values, field names, amounts, account types, formats, and no ambiguity about inputs or expected data.
Assertion
What does success look like? A confirmation message? A balance update? A redirect to a specific URL?
Boundaries
What should the AI NOT do? What's out of scope for this particular test step?
Miss any one of these, and you've handed the AI a half-built blueprint. It will fill in the gaps, just not necessarily the way you intended.
Here's the fundamental truth of AI-driven testing: non-deterministic prompts produce non-deterministic tests. And non-deterministic tests are worse than no tests at all; they create false confidence and burn engineering time chasing phantom failures.

The good news: prompt quality is entirely within your control. Unlike flaky network conditions or unpredictable UI re-renders, a badly written prompt is just a rewrite away from being a reliable one. This is the foundation of self-healing tests. Better prompts dramatically increase the likelihood that the tests can self-heal. Let's break down where prompts go wrong and right.
✅ EFFECTIVE PROMPT
"Navigate to the checkout page, apply promo code SAVE20, and verify the order total shows $80.00 after the discount is applied from $100.00."
❌ WEAK PROMPT
"Go to checkout and check the discount works."
✅ EFFECTIVE PROMPT
"Click on the row in the Orders table where the Status column shows 'Completed,' and the Order ID matches the ORDER_ID parameter."
❌ WEAK PROMPT
"Click on the completed order."
✅ EFFECTIVE PROMPT
"After the payment confirmation spinner disappears, assert: Is the text 'Payment Successful' visible on screen?"
❌ WEAK PROMPT
"Check that payment worked."
Pattern 1: The Intent + Outcome Pattern
Lead with the business intent, end with the verifiable outcome. This structure forces you to be clear about both what you're doing and how you'll know it worked.
"Complete a standard checkout as a guest user with item SKU-4421, shipping to postcode 90210, and verify the order confirmation page displays an order number."
Why it works: The AI knows the starting intent, the data to use, and exactly what constitutes success. No room for interpretation.
Pattern 2: The Precondition Guard
State what must be true before the test action begins. This prevents cascading failures caused by the AI attempting steps when the application isn't in the right state.
"Given the user is logged in and has at least one saved payment method, navigate to the subscription renewal page and click 'Renew Now'."
Why it works: Guards against false failures. If the precondition isn't met, the test fails meaningfully, not mysteriously.
Pattern 3: Content-Based References (Not Positions)
Never reference UI elements by their position on screen. Reference them by their visible content, label, or semantic role. This is the single biggest driver of self-healing tests and reduces test maintenance dramatically.
✅ EFFECTIVE PROMPT
"Select the product named 'Wireless Mouse' from the search results."
❌ WEAK PROMPT
"Select the second item in the search results."
Why it works: Lists reorder. Pages change. Content-based references survive both.
Pattern 4: Atomic Assertions
One assertion should test one condition. Compound assertions ('check that X is visible AND says Y AND the button is enabled') are harder for the AI to evaluate cleanly and produce confusing failure messages.
"Is the error message 'Invalid credentials' visible below the login form?"
Not: 'Is the error message visible and does it say Invalid credentials and is the login button still enabled?', split these into three separate assertions.
Pattern 5: The Fallback Instruction
For data that may not always exist (discounts, optional fields, conditional UI elements), always specify what the AI should do when that data is absent.
"Extract the promotional banner text into PROMO_TEXT, or set PROMO_TEXT to 'none' if no promotional banner is displayed on the page."
Why it works: Tests that handle absence are far more stable across different data states and environments.
Harness AI Test Automation (AIT) is one of the most complete implementations of prompt-driven E2E testing available today. It reduces the need to manually script Selenium/Playwright flows with an intent-driven model: you describe what a user wants to achieve, and Harness AI figures out how to test it.
The platform is built on an agentic AI testing architecture, an autonomous testing system that blends LLM reasoning with real-time application exploration, DOM analysis, and screenshot-based visual validation. What makes it especially relevant to this discussion is that Harness AIT exposes the quality of your prompts directly: write a vague intent, get an unreliable test. Write a precise one, get a test that runs stably in your CI/CD testing pipeline.

"Rather than scripting every step of 'add item to cart and checkout,' a tester writes: Verify that a user can add an item to the cart and complete checkout successfully. The AI testing tool interprets the intent and executes the full flow, including assertions."
Harness structures AI instructions into four command types for codeless test automation. Each has its own prompting rules; get them right and your tests become dramatically more stable.
AI Assertion - Verify application state at a specific point in execution
Write it like this:
"In the confirmation dialog, is the deposit amount displayed as $100.00?"
Avoid this:
"Is the amount correct?" AI has no memory of what amount was entered.
AI Command - Perform a specific, discrete UI interaction
Write it like this:
"After the loading spinner disappears, click the 'Continue' button in the payment form."
Avoid this:
"Click Continue." Which Continue? What if it's not ready yet?
AI Task - Execute a complete multi-step business workflow
Write it like this:
"Transfer $500 from Savings to Checking, confirm the transaction and verify both balances are updated correctly."
Avoid this:
"Transfer money between accounts.", missing values, accounts, and success criteria.
AI Extract Data - Capture dynamic values for use in subsequent test steps
Write it like this:
"Create parameter ORDER_ID and assign the order number from the confirmation message on this page."
Avoid this:
"Get the order number.", stored where? from which element?
When you submit an intent-driven prompt to Harness AIT, it goes through a five-stage pipeline, and the quality of your prompt shapes every stage:
1. Interpret
The LLM Interface Layer reads your natural language prompt and formulates a structured test intent. Vague prompts produce ambiguous intents.
2. Explore
The AI queries its App Knowledgebase (Application Context) to find relevant pages and flows. Specific context in your prompt narrows this search dramatically.
3. Execute
Each step is translated into an executable action. Content-based references in your prompt produce resilient steps. Positional ones produce fragile ones.
4. Validate
DOM and screenshot-based validation confirms both functional and visual state. Your assertion prompts define exactly what gets checked.
5. Learn
Each run updates the App Knowledgebase. Better prompts produce richer, more accurate knowledge, improving future test case generation and reducing test maintenance.
These are the most common prompt antipatterns seen in AI-driven E2E testing, each one a reliable way to introduce flakiness:
Positional References
Saying 'click the third row' or 'select the first option' creates tests that break every time data changes or UI reorders.
Missing Context
Assertions like 'Is the amount correct?' fail because the AI might not have any memory of previous steps. Restate the expected values in assertions. Every prompt must be self-contained.
Compound Assertions
Checking multiple conditions in one assertion makes failures ambiguous. One assertion, one condition, always.
No Success Criteria
Tasks like 'register a new user' without specifying what success looks like leave the AI guessing when to stop.
Assumed Data Formats
Not specifying 'extract the total as a number without a currency symbol' means you might get '$1,234.56' when you needed '1234.56'.
Ignoring Timing
Not accounting for loading states ('after the spinner disappears') is one of the top causes of intermittent test failures.
End-to-end testing has always required precision. The medium has changed, from XPath selectors and coded steps to natural language testing instructions, but the requirement for precision hasn't. If anything, the stakes are higher because a poorly written prompt now fails invisibly: the AI will attempt something, just not what you intended.
The teams getting the most out of AI test automation are not the ones with the most sophisticated models. They're the ones who've learned to write clear, specific, self-contained instructions through effective prompt engineering. Who knows the difference between 'click the third button' and 'click the Submit button in the payment form.' Who ends every assertion with a question mark and every task with a success criterion.
Platforms like Harness AI Test Automation are built to reward exactly this kind of precision, turning well-crafted prompts into stable, self-healing tests that are CI/CD testing-ready and survive the real world with minimal test maintenance.
"The art of prompt engineering isn't about clever wording. It's about transferring your intent, completely and unambiguously, to an autonomous testing system that will act on every word you write."
Write with that precision, and your intelligent test automation will finally be the safety net it was always meant to be.
Harness AI Test Automation empowers teams to move faster with confidence. Key benefits include:
Harness AI Test Automation turns traditional QA challenges into opportunities for smarter, more reliable automation, enabling organizations to release software faster while maintaining high quality.
If you're ready to eliminate flaky tests, simplify maintenance, and improve test reliability with intent-driven, natural-language testing, try Harness AI Test Automation today or contact our team to see how it can transform your testing experience.


---
Key Takeaways
---
AI can generate code in seconds. It still can’t ship software safely.
That gap isn’t about model quality or prompt engineering. It’s about context, and most software organizations don’t have a system that accurately reflects how pipelines, services, environments, policies, and teams actually relate to each other.
Without that context, AI doesn’t automate delivery. It amplifies risk.
I am responsible for building the Knowledge Graph that powers Harness AI, and I see this every day, working on AI infrastructure and data platforms at Harness, and it’s a recurring theme. AI-first delivery fails not because of intelligence, but because of fragmentation.
Modern engineering organizations already generate more data than any human can reason about:
Each system works. The problem is that none of them agree on what the system actually is.
When something breaks, we don’t query systems. We page people. That’s the clearest signal you’ve hit the context bottleneck. When your organization depends on a few humans to resolve incidents, you don’t have a tooling problem. You have a context problem.
Most teams today operate in AI-assisted DevOps:
That’s helpful, but shallow.
AI-operational DevOps is different. Here, AI doesn’t just assist tasks. It understands how software actually moves from commit to production, including constraints, dependencies, and governance.
The difference is a platform problem. Without a shared context layer, AI remains a collection of point optimizations. With one, it becomes an operator.
Context is not dashboards. It’s not a data lake. And it’s definitely not another CMDB.
In practice, context means entities and relationships.
In DevSecOps environments, the most critical entities are:

Pipelines are often the natural center — not because they’re special, but because they express intent.
A pipeline alone isn’t context.
A pipeline links to:
That's the operational truth.
This is why knowledge graphs matter. They don’t store more data; they preserve meaning.
To truly transform the software development lifecycle, AI needs more than just intelligence, it needs deep, operational context. Harness AI uses a purpose-built Software Delivery Knowledge Graph to make AI fast, efficient, and exceptionally accurate. By bridging the gap between raw data and real-world delivery pipelines, we ensure that your AI operates with complete situational awareness from day one, allowing teams to ship faster without breaking things.
I’ve seen three failure modes repeat across organizations:

A knowledge graph only works when it’s use-case driven, minimal, and fresh.
The fastest way to see value is not breadth, it’s focus. Start with one use case that cannot be solved by a single system.
A strong starting point:
To support that, you need:
That’s often fewer than 10 entities. Everything else is enrichment, not day one requirements.
AI agents don’t need perfect context. They need the current context.
For delivery workflows, near real-time synchronization is often mandatory. When a deployment fails, an engineer doesn’t want last month’s answer; they want why it failed now. This is why the semantic layer matters. AI agents should interact with meaning, not raw tables.

AI agents must be treated as extensions of humans, not superusers.
That means:
At Harness, Policy as Code and native policy agents ensure AI can’t bypass governance — even when it’s acting autonomously.
You don’t measure a knowledge graph by node count. You measure it by outcomes.
Four metrics matter:
If context doesn’t improve decisions, it’s noise.
Imagine a developer says, in natural language: “Deploy this service to QA and production.”
Behind the scenes, an AI agent:
If the pipeline fails, the same graph enables automated remediation:
That’s not automation. That’s operational reasoning.
Traditional dashboards tell you what happened. Knowledge graphs tell you why.
Cost spikes only make sense when linked to:
Rollbacks are only safe when dependency graphs are understood. Rolling back a service without knowing the upstream and downstream impact is how outages cascade.
Do this:
Avoid this:
Context is a product, not a schema.
AI-first software delivery doesn’t fail because models aren’t smart enough. It fails because platforms don’t understand themselves.
Knowledge graphs give AI the one thing it can’t generate on its own: context grounded in reality, thus making them the primary pillar in AI-first software delivery context.
The future of software delivery isn't just automated; it's intelligently orchestrated. Because Harness AI uses a Software Delivery Knowledge Graph to make AI fast, efficient, and accurate, your teams can finally trust AI to handle complex operational workflows without adding risk. We’ve done the heavy lifting of mapping your operational truth so your AI can act with absolute precision.
What’s the difference between observability and a knowledge / context graph?
Observability shows what’s happening. Knowledge/Context graphs explain what it means.
Do knowledge graphs replace existing tools?
No. They connect them.
Who owns the knowledge graph?
Everyone: platform, SRE, security, and application teams.
Is this only for large enterprises?
No. Smaller teams benefit faster because tribal knowledge is thinner.
Can AI work without a knowledge graph?
Yes, but only at the task level, not the system level.


AI is changing both what you build and how you build it - at the same time. Today, Harness is announcing two new products to secure both: AI Security, a new product to discover, test, and protect AI running in your applications, and Secure AI Coding, a new capability of Harness SAST that secures the code your AI tools are writing. Together, they further extend Harness's DevSecOps platform into the age of AI, covering the full lifecycle from the first line of AI-generated code to the models running in production.
In November, Harness published our State of AI-Native Application Security report, a survey of hundreds of security and engineering leaders on how AI-native applications are changing your threat surface. The findings were stark: 61% of new applications are now AI-powered, yet most organizations lack the tools to discover what AI models and agents exist in their environments, test them for vulnerabilities unique to AI, or protect them at runtime. The attack surface has expanded dramatically — but the tools to defend it haven't kept up.
The picture is equally concerning on the development side. Our State of AI in Software Engineering report found that 63% of organizations are already using AI coding assistants - tools like Claude Code, Cursor, and Windsurf - to write code faster. But faster isn't safer. AI-generated code has the same vulnerabilities as human-written code, but now with larger and more frequent commits. AppSec programs that were already stretched thin are now breaking under the volume and velocity.
The result is a blind spot on both sides of the AI equation - what you're building, and what you're building with. Today, Harness is closing that gap.
Most security vendors are stuck in their lane. Shift-left tools catch vulnerabilities in code before they reach production. Runtime protection tools block attacks after applications are deployed. And the two rarely talk to each other.
Harness was built on a different premise: real DevSecOps means connecting every stage of the software delivery lifecycle, and closing the loop between what you find in production and what you fix in code.
That's what the Harness platform does today. Application Security Testing brings SAST and SCA directly into the development workflow, surfacing vulnerabilities where they're faster and cheaper to fix. SCS ensures the integrity of artifacts from build to deploy, while STO provides a unified view of security posture — along with policy and governance — across the entire organization.
As code ships to production, Web Application & API Protection monitors and defends applications and APIs in real time, detecting and blocking attacks as they happen. And critically, findings from runtime don't disappear into a security team's backlog — they flow back to developers to address root causes before the next release.
The result is a closed loop: find it in code, protect it in production, fix it fast. All on a single, unified platform.
Today, we're extending that loop into AI - on both sides. AI is reshaping what you build and how you build it simultaneously. A platform that can only address one side of that equation leaves you exposed on the other. Harness closes both gaps.
In the State of AI-Native Application Security, 66% of respondents said they are flying blind when it comes to securing AI-native apps. 72% call shadow AI a gaping chasm in their security posture. 63% believe AI-native applications are more vulnerable than traditional IT applications. They’re right to be concerned.
Harness AI Security is built on the foundation of our API security platform. Every LLM call, every MCP server, every AI agent communicating with an external service does so via APIs. Your AI attack surface isn't separate from your API attack surface; it's an expansion of it. AI threats introduce new vectors like prompt injection, model manipulation, and data poisoning on top of the API vulnerabilities your teams already contend with. There is no AI security without API security.
.png)
With the launch of AI Security, we are introducing AI Discovery in General Availability (GA). AI security starts where API security starts: discovery. You can't assess or mitigate risk from AI components you don't know exist. Harness already continuously monitors your environment for new API endpoints the moment they're deployed. Recognizing LLMs, MCP servers, AI agents, and third-party GenAI services like OpenAI and Anthropic is a natural extension of that. AI Discovery automatically inventories your entire AI attack surface in real time, including calls to external GenAI services that could expose sensitive data, and surfaces runtime risks, such as unauthenticated APIs calling LLMs, weak encryption, or regulated data flowing to external models.
Beyond discovering and inventorying your AI application components, we are also introducing AI Testing and AI Firewall in Beta, extending AI Security across the full discover-test-protect lifecycle.
.png)
AI Testing actively probes your LLMs, agents, and AI-powered APIs for vulnerabilities unique to AI-native applications, including prompt injection, jailbreaks, model manipulation, data leakage, and more. These aren't vulnerabilities that a traditional DAST tool is designed to find. AI Testing was purpose-built for AI threats, continuously validating that your models and the APIs that expose them behave safely under adversarial conditions. It integrates directly into your existing CI/CD pipelines, so AI-specific security testing becomes part of every release — not a one-time audit.
.png)
AI Firewall actively protects your AI applications from AI-specific threats, such as the OWASP Top 10 for LLM Applications. It inspects and filters LLM inputs and outputs in real time, blocking prompt injection attempts, preventing sensitive data exfiltration, and enforcing behavioral guardrails on your models and agents before an attack can succeed. Unlike traditional WAF rules that require manual tuning for every new threat pattern, AI Firewall understands AI-native attack vectors natively, adapting to the evolving tactics attackers use against generative AI.
Harness AI Security with AI Discovery is now available in GA, while AI Testing and AI Firewall are available in Beta.
"As AI-assisted development becomes standard practice, the security implications of AI-generated code are becoming a material blind spot for enterprises. IDC research indicates developers accept nearly 40% of AI-generated code without revision, which can allow insecure patterns to propagate as organizations increase code output faster than they expand validation and governance, widening the gap between development velocity and application risk."
— Katie Norton, Research Manager, DevSecOps, IDC
AI Security addresses the risks inside your AI-native applications. Secure AI Coding addresses a different problem: the vulnerabilities your AI tools are introducing into your codebase.
Developers are generating more code than ever, and shipping it faster than ever. AI coding assistants now contribute to the majority of new code at many organizations — and nearly half (48%) of security and engineering leaders are concerned about the vulnerabilities that come with it. AI-generated code arrives in larger commits, at higher frequency, and often with less review than human-written code would receive.
SAST tools catch vulnerabilities at the PR stage — but by then, AI-generated code has already been written, reviewed, and often partially shipped. Harness SAST's new Secure AI Coding capability moves the security check earlier to the moment of generation, integrating directly with AI coding tools like Cursor, Windsurf, and Claude Code to scan code as it appears in the IDE. Developers never leave their workflow. They see a vulnerability warning inline, alongside a prompt to send the flagged code back to the agent for remediation — all without switching tools or even needing to trigger a manual scan.
"Security shouldn't be an afterthought when using AI dev tools. Our collaboration with Harness kicks off vulnerability detection directly in the developer workflow, so all generated code is screened from the start." — Jeff Wang, CEO, Windsurf

What sets Secure AI Coding apart from simpler linting tools is what happens beneath the surface. Rather than pattern-matching the AI-generated code in isolation, it leverages Harness's Code Property Graph (CPG) to trace how data flows through the entire application - before, through, and after the AI-generated code in question. That means Secure AI Coding can surface complex vulnerabilities like injection flaws and insecure data handling that only become visible in the context of the broader codebase. The result is security that understands your application - not just the last thing an AI assistant wrote.
When we deployed AI across our own platform, our AI ecosystem grew faster than our visibility into it. We needed a way to track every API call, identify sensitive data exposure, and monitor calls to external vendors — including OpenAI, Vertex AI, and Anthropic — without slowing down our engineering teams.
Deploying AI Security turned that black box into a transparent, manageable environment. Some milestones from our last 90 days:
The shift wasn't just operational — it was cultural. We moved from reactive monitoring to proactive defense. As our team put it: "Securing AI is foundational for us. Because our own product runs on AI, it must be resilient and secure. We use our own AI Security tools to ensure that every innovation we ship is backed by the highest security standards."
AI is moving fast. Your attack surface is expanding in two directions at once - inside the applications you're building, and inside the code your teams are generating to build them.
Harness AI Security and Secure AI Coding are available now. Whether you're trying to get visibility into the AI running in your environment, test it for vulnerabilities before attackers do, or stop insecure AI-generated code from reaching production, Harness’ platform is ready.
Talk to your account team about AI Security. Get a live walkthrough of AI Discovery, AI Testing, and AI Firewall, and see how your AI attack surface maps against your existing API security posture.
Already a Harness CI customer? Start a free trial of Harness SAST - including Secure AI Coding. Connect it to your AI coding assistant, and see what's shipping in your AI-generated code today.


This is part 1 of a five-part series on building production-grade AI engineering systems.
Across this series, we will cover:
Most teams experimenting with AI coding agents focus on prompts.
That is the wrong starting point.
Before you optimize how an agent thinks, you must standardize what it sees.
AI agents do not primarily fail because of reasoning limits. They fail because of environmental ambiguity. They are dropped into repositories designed exclusively for humans and expected to infer structure, conventions, workflows, and constraints from scattered documentation.
If AI agents are contributors, then the repository itself must become agent-native.
The foundational step is introducing a standardized instruction layer that every agent can read.
That layer is AGENTS.md.
The Real Problem: Context Silos
Every coding agent needs instructions. Where those instructions live depends on the tool.
One IDE reads from a hidden rules directory.
Another expects a specific markdown file.
Another uses proprietary configuration.
This fragmentation creates three systemic problems.
1. Tool-dependent prompt locations
Instructions are locked into IDE-specific paths. Change tools and you lose institutional knowledge.
2. Tribal knowledge never gets committed
When a developer discovers the right way to guide an agent through a complex module, that guidance often lives in chat history. It never reaches version control. It never becomes part of the repository’s operational contract.
3. Inconsistent agent behavior
Two engineers working on the same codebase but using different agents receive different outputs because the instruction surfaces are different.
The repository stops being the single source of truth.
For human collaboration, we solved this decades ago with READMEs, contribution guides, and ownership files. For AI collaboration, we are only beginning to standardize.
What AGENTS.md Is
AGENTS.md is a simple, open, tool-agnostic format for providing coding agents with project-specific instructions. It is now part of the broader open agentic ecosystem under the Agentic AI Foundation, with broad industry adoption.
It is not a replacement for README.md. It is a complement.
Design principle:
Humans need quick starts, architecture summaries, and contribution policies.
Agents need deterministic build commands, exact test execution steps, linter requirements, directory boundaries, prohibited patterns, and explicit assumptions.
Separating these concerns provides:
Several major open source repositories have already adopted AGENTS.md. The pattern is spreading because it addresses a real structural gap.
Recent evaluations have also shown that explicit repository-level agent instructions outperform loosely defined “skills” systems in practical coding scenarios. The implication is clear. Context must be explicit, not implied.
A Real Example: OpenAI’s Agents SDK
A practical example of this pattern can be seen in the OpenAI Agents Python SDK repository.
The project contains a root-level AGENTS.md file that defines operational instructions for contributors and AI agents working on the codebase. You can view the full file here: Github.
Instead of leaving workflows implicit, the repository encodes them directly into agent-readable instructions. For example, the file requires contributors to run verification checks before completing changes:
Run `$code-change-verification` before marking work complete.It also explicitly scopes where those rules apply, such as changes to core source code, tests, examples, or documentation within the repository.
Rather than expecting an agent to infer these processes from scattered documentation, the project defines them as explicit instructions inside the repository itself.
This is the core idea behind AGENTS.md.
Operational guidance that would normally live in prompts, chat history, or internal knowledge becomes version-controlled infrastructure.
Designing an Effective Root AGENTS.md
A root AGENTS.md should be concise. Under 300 lines is a good constraint. It should be structured, imperative, and operational.
A practical structure includes four required sections.
This section establishes the mental model.
Include:
Agents are pattern matchers. The clearer the structural map, the fewer incorrect assumptions they make.
This section must be precise.
Include:
Avoid vague language. Replace “run tests” with explicit commands.
Agents execute what they are told. Precision reduces drift.
This section defines conventions.
Rather than bloating AGENTS.md, reference a separate coding standards document for:
The root file should stay focused while linking to deeper guidance.
This is where most teams underinvest.
Document:
Agents tend to repeat statistically common patterns. Your codebase may intentionally diverge from those patterns. This section is where you enforce that divergence.
Think of this as defensive programming for AI collaboration.
Hierarchical AGENTS.md: Scaling Context Correctly
Large repositories require scoped context.
A single root file cannot encode all module-specific constraints without becoming noisy. The solution is hierarchical AGENTS.md files.
Structure example:
root/
AGENTS.md
module-a/
AGENTS.md
module-b/
AGENTS.md
sub-feature/
AGENTS.mdAgents automatically read nested AGENTS.md files when operating inside those directories. Context scales from general to specific.
Root defines global conventions.
Module-level files define local invariants.
Feature-level files encode edge-case constraints.
This reduces irrelevant context and increases precision.
It also mirrors how humans reason about codebases.
Compatibility Across Tools
A standard file location matters.
Some agents natively read AGENTS.md. Others require simple compatibility mechanisms such as symlinks that mirror AGENTS.md into tool-specific filenames.
The key idea is a single source of truth.
Do not maintain multiple divergent instruction files. Normalize on AGENTS.md and bridge outward if needed.
The goal is repository-level portability. Change tools without losing institutional knowledge.
Best Practices for Agent Instructions
To make AGENTS.md effective, follow these constraints.
Write imperatively.
Use direct commands. Avoid narrative descriptions.
Avoid redundancy.
Do not duplicate README content. Reference it.
Keep it operational.
Focus on what the agent must do, not why the project exists.
Update it as the code evolves.
If the build process changes, AGENTS.md must change.
Treat violations as signal.
If agents consistently ignore documented rules, either the instruction is unclear or the file is too long and context is being truncated. Reset sessions and re-anchor.
AGENTS.md is not static documentation. It is part of the execution surface.
Ownership and Governance
If agents are contributors, then their instruction layer requires ownership.
Each module-level AGENTS.md should be maintained by the same engineers responsible for that module. Changes to these files should follow the same review rigor as code changes.
Instruction drift is as dangerous as code drift.
Version-controlled agent guidance becomes part of your engineering contract.
Why Teams Are Adopting AGENTS.md
Repositories across the industry have begun implementing AGENTS.md as a first-class artifact. Large infrastructure projects, developer tools, SDKs, and platform teams are standardizing on this pattern.
The motivation is consistent:
AGENTS.md transforms prompt engineering from a personal habit into a shared, reviewable, versioned discipline.
Vercel published evaluation results showing that repository-level AGENTS.md context outperformed tool-specific skills in agent benchmarks.
Why This Matters Now
AI agents are rapidly becoming embedded in daily development workflows.
Without a standardized instruction layer:
The repository must become the stable contract between humans and machines.
AGENTS.md is the first structural step toward that contract.
It shifts agent collaboration from ad hoc prompting to engineered context.
Foundation Before Optimization
In the next post, we will examine a different failure mode.
Even with a perfectly structured AGENTS.md, long AI sessions degrade. Context accumulates. Signal dilutes. Hallucinations increase. Performance drops as token counts rise.
This phenomenon is often invisible until it causes subtle architectural damage.
Part 2 will focus on defeating context rot and enforcing session discipline using structured planning, checkpoints, and meta-prompting.
Before you scale orchestration.
Before you add subagents.
Before you optimize cost across multiple model providers.
You must first stabilize the environment.
An agent-native repository is the foundation.
Everything else builds on top of it.


AI is proliferating across enterprise environments faster than security teams can govern it. From third-party LLM integrations to agentic frameworks like Model Context Protocol (MCP), most organisations have limited visibility into how many AI systems are running, what data they process, or what risks they introduce.
Three realities are driving this to the top of the security agenda:
Example: Shadow AI in a financial services firm
A quantitative analyst team integrates an LLM into their research workflow. The integration ships as a product feature. Six months later, a compliance review finds the endpoint is externally accessible, processes client PII, and transmits data to a third-party model provider outside the scope of the firm's data processing agreements. The AI system existed, processed regulated data, and created regulatory exposure - entirely outside the security programme's awareness.
Effective AI security is not a single capability - it is a continuous workflow across four phases:
Harness continuously discovers and classifies every AI asset from live traffic and API specifications - no manual registration required:
Shadow AI found by Harness is risk-scored, ownership-flagged, and surfaced for immediate security review. The finding moves directly into the vulnerability lifecycle with a URL, environment classification, and traffic record.
Harness continuously analyzes AI API & MCP traffic to identify sensitive data types flowing through every discovered endpoint:
When sensitive data appears in an AI endpoint for the first time, or is transmitted to an external provider, Harness surfaces a real-time Posture Event - giving privacy and compliance teams the window to act before an exposure becomes a breach notification obligation.
Harness detects AI API & MCP tool vulnerabilities passively from live traffic - no active scanning, no disruption to production AI workloads. Detection covers:
Risk scoring applies AI-specific weighting: an unauthenticated, externally exposed LLM endpoint is simultaneously a prompt injection target, a data extraction vector, and a compute abuse surface. Scores are dynamic, recalculating as traffic patterns and sensitive data classifications change.
Harness Posture Events feed connects AI security signals to the workflows security teams already run:
Custom notifications: privacy teams can alert on sensitive data to 3rd parties; SOC on risk score spikes; governance on new shadow AI assets

AI security posture management is a journey, not a deployment. Here is how organisations evolve:
For organisations where CMDB governs asset lifecycle, Harness’s Service Graph Connector extends AI-SPM into ServiceNow. Key use cases:

Operationalising AI security is not about scanning prompts. It is about continuously discovering AI systems, understanding how they access sensitive data, assessing the risks they introduce, and integrating AI posture into the security operations that already exist.
The organisations that build this capability now will govern what others are still trying to find, detect exposures before they become incidents, and answer regulatory questions with data rather than approximation - continuously, not periodically.


Over the last few years, something fundamental has changed in software development.
If the early 2020s were about adopting AI coding assistants, the next phase is about what happens after those tools accelerate development. Teams are producing code faster than ever. But what I’m hearing from engineering leaders is a different question:
What’s going to break next?
That question is exactly what led us to commission our latest research, State of DevOps Modernization 2026. The results reveal a pattern that many practitioners already sense intuitively: faster code generation is exposing weaknesses across the rest of the software delivery lifecycle.
In other words, AI is multiplying development velocity, but it’s also revealing the limits of the systems we built to ship that code safely.
One of the most striking findings in the research is something we’ve started calling the AI Velocity Paradox - a term we coined in our 2025 State of Software Engineering Report.
Teams using AI coding tools most heavily are shipping code significantly faster. In fact, 45% of developers who use AI coding tools multiple times per day deploy to production daily or faster, compared to 32% of daily users and just 15% of weekly users.
At first glance, that sounds like a huge success story. Faster iteration cycles are exactly what modern software teams want.
But the data tells a more complicated story.
Among those same heavy AI users:
What this tells me is simple: AI is speeding up the front of the delivery pipeline, but the rest of the system isn’t scaling with it. It’s like we are running trains faster than the tracks they are built for. Friction builds, the ride is bumpy, and it seems we could be on the edge of disaster.

The result is friction downstream, more incidents, more manual work, and more operational stress on engineering teams.
To understand why this is happening, you have to step back and look at how most DevOps systems actually evolved.
Over the past 15 years, delivery pipelines have grown incrementally. Teams added tools to solve specific problems: CI servers, artifact repositories, security scanners, deployment automation, and feature management. Each step made sense at the time.
But the overall system was rarely designed as a coherent whole.
In many organizations today, quality gates, verification steps, and incident recovery still rely heavily on human coordination and manual work. In fact, 77% say teams often have to wait on other teams for routine delivery tasks.
That model worked when release cycles were slower.
It doesn’t work as well when AI dramatically increases the number of code changes moving through the system.
Think of it this way: If AI doubles the number of changes engineers can produce, your pipelines must either:
Otherwise, the system begins to crack under pressure. The burden often falls directly on developers to help deploy services safely, certify compliance checks, and keep rollouts continuously progressing. When failures happen, they have to jump in and remediate at whatever hour.
These manual tasks, naturally, inhibit innovation and cause developer burnout. That’s exactly what the research shows.
Across respondents, developers report spending roughly 36% of their time on repetitive manual tasks like chasing approvals, rerunning failed jobs, or copy-pasting configuration.
As delivery speed increases, the operational load increases. That burden often falls directly on developers.
The good news is that this problem isn’t mysterious. It’s a systems problem. And systems problems can be solved.
From our experience working with engineering organizations, we've identified a few principles that consistently help teams scale AI-driven development safely.
When every team builds pipelines differently, scaling delivery becomes difficult.
Standardized templates (or “golden paths”) make it easier to deploy services safely and consistently. They also dramatically reduce the cognitive load for developers.
Speed only works when feedback is fast.
Automating security, compliance, and quality checks earlier in the lifecycle ensures problems are caught before they reach production. That keeps pipelines moving without sacrificing safety.
Feature flags, automated rollbacks, and progressive rollouts allow teams to decouple deployment from release. That flexibility reduces the blast radius of new changes and makes experimentation safer.
It also allows teams to move faster without increasing production risk.
Automation alone doesn’t solve the problem. What matters is creating a feedback loop: deploy → observe → measure → iterate.
When teams can measure the real-world impact of changes, they can learn faster and improve continuously.
AI is already changing how software gets written. The next challenge is changing how software gets delivered.
Coding assistants have increased development teams' capacity to innovate. But to capture the full benefit, the delivery systems behind them must evolve as well.
The organizations that succeed in this new environment will be the ones that treat software delivery as a coherent system, not just a collection of tools.
Because the real goal isn’t just writing code faster. It’s learning faster, delivering safer, and turning engineering velocity into better outcomes for the business.
And that requires modernizing the entire pipeline, not just the part where code is written.


Harness Artifact Registry marks an important milestone as it evolves from universal artifact management into an active control point for the software supply chain. With growing enterprise adoption and new security and governance capabilities, Artifact Registry is helping teams block risky dependencies before they reach the pipeline, reduce supply chain exposure, and scale artifact management without slowing developers down.
In little over a year, Harness Artifact Registry has grown from early discovery to strong enterprise adoption, supporting a wide range of artifact formats, enterprise-scale storage, and high-throughput CI/CD pipelines across both customers and internal teams. What started as a focused initiative inside Harness has evolved into a startup within a startup, quickly becoming a core pillar of the Harness platform.
Today, we’re sharing how Artifact Registry is helping organizations scale software delivery by simplifying artifact management, strengthening supply chain security, and improving developer experience and where we’re headed next.
In customer conversations, one theme came up repeatedly: as organizations scale CI/CD, artifacts multiply fast. Containers, packages, binaries, Helm charts, and more end up spreading across fragmented tools with inconsistent controls. Teams don't want just another registry. They want one trusted system, deeply integrated with CI/CD, that can scale globally and enforce policy by default. That's exactly what the Artifact Registry was built to be. By embedding artifact management directly into the Harness platform, it reduces tooling sprawl while giving platform engineering, DevOps, and AppSec teams centralized visibility and control, without slowing developers down.
Today, Artifact Registry supports a growing ecosystem of artifact types, with Docker, Helm (OCI), Generic, Python, npm, Go, NuGet, Dart, Conda, PHP Composer, and AI artifacts now available, and more on the way. With Artifact Registry, teams can:
The business impact is already clear. Artifact Registry has quickly gained traction with several enterprise customers, driven by strong platform integration, low-friction adoption, and the advantage of having artifact management natively embedded within the CI/CD platform.

One early customer managing artifacts across Docker, Helm, Python, NPM, Go, and more has standardized on Harness Artifact Registry, achieving 100% CI adoption across teams and pipelines.
“Harness Artifact Registry is stable, performant, and easy to trust at scale, delivering faster and more reliable artifact pulls than our previous vendor”
— SRE Lead
By unifying artifact storage with the rest of the software delivery lifecycle, Artifact Registry simplifies operations while helping teams focus on shipping software.
Software supply chain threats have become both more frequent and more sophisticated. High-profile incidents like the SolarWinds breach, where attackers injected malicious code into trusted update binaries affecting thousands of organizations, exposed how deeply a compromised artifact can penetrate enterprise systems. More recently, the Shai-Hulud 2.0 campaign saw self-propagating malware compromise hundreds of npm packages and tens of thousands of downstream repositories, harvesting credentials and spreading automatically through development environments.
As these attacks show, risk doesn’t only exist after a build, it can be embedded long before artifacts reach CI/CD pipelines. That’s why Harness Artifact Registry was designed with governance at its core.
Harness Artifact Registry includes Dependency Firewall, a control point that allows organizations to govern which dependencies are allowed into their environment in the first place. Rather than relying on downstream scans after a package has already been pulled into CI/CD, Dependency Firewall evaluates dependency requests at ingest using policy-based controls.
This allows teams to proactively block risky artifacts before they are ever downloaded. Organizations can prevent the use of dependencies with known CVEs or license violations, blocking risky dependencies before they reach your pipeline, and restrict access to untrusted or unsafe upstream sources by default. The result is earlier risk reduction, fewer security exceptions later in the pipeline, and stronger alignment between AppSec and development teams without slowing delivery.
[Dependency Firewall Explainer Video]
To further strengthen supply chain protection, Artifact Registry provides built-in artifact quarantine, allowing organizations to automatically block artifacts that fail security or compliance checks. Quarantined artifacts cannot be downloaded or deployed until they meet defined policy requirements, helping teams stop risk before it moves downstream. All quarantine actions are policy-driven, fully auditable, and governed by RBAC, ensuring that only authorized users or systems can quarantine or release artifacts.

Rather than forcing teams to replace the tools they already use, Harness Artifact Registry is built to fit into real-world security workflows by unifying scanning and governance at the registry layer. Today, Artifact Registry includes built-in scanning powered by Aqua Trivy for vulnerabilities, license issues, and misconfigurations, and integrates with over 40 security scanners, including tools like Wiz, for container, SCA, and compliance checks. Teams can orchestrate these scans directly in their CI pipelines, with scan results feeding into policy evaluation to automatically determine whether an artifact is released or quarantined.

Artifact Registry also exposes APIs that allow external security and ASPM platforms to trigger quarantine or release actions based on centralized policy decisions. Together, these capabilities enable organizations to enforce consistent, policy-driven controls early, stop risky artifacts before they move downstream, and connect artifact governance to broader enterprise security workflows all without slowing down developers.
As organizations scaled, legacy registries have become bottlenecks disconnected from CI/CD, security, and governance workflows. Harness takes a different approach. Because Artifact Registry is natively integrated into the Harness platform, teams benefit from:
This tight integration has accelerated adoption by removing friction from day-to-day workflows. Teams are standardizing how artifacts are secured, distributed, and governed across the software delivery lifecycle, while keeping developer workflows fast and familiar.
Harness Artifact Registry was built to modernize artifact management for the enterprise, combining high-performance distribution with built-in security, governance, and visibility. We’ve continued to invest in a platform designed to scale with modern delivery pipelines and we’re just getting started.
Looking ahead, we’re expanding Artifact Registry in three key areas:
Support is coming for Alpine, Debian, Swift, RubyGems, Conan, and Terraform packages, enabling teams to standardize more of their software supply chain on a single platform.
We’re investing in artifact lifecycle management, immutability, audit logging, storage quota controls, and deeper integration with Harness Security Solutions.
Upcoming capabilities include semantic artifact discovery, custom dashboards, AI-powered chat, OSS gatekeeper agents, and deeper integration with Harness Internal Developer Portal.
Modern software delivery demands clear control over how software is built, secured, and distributed. As supply chain threats increase and delivery velocity accelerates, organizations need earlier visibility and enforcement without introducing new friction or operational complexity.
We invite you to sign up for a demo and see firsthand how Harness Artifact Registry delivers high-performance artifact distribution with built-in security and governance at scale.


An API failure is any response that doesn’t conform to the system’s expected behavior being invoked by the client. One example is when a client makes a request to an API that is supposed to return a list of users but returns an empty list (i.e., {}). A successful response must have a status code in the 200 series. An unsuccessful response must have either an HTTP error code or a 0 return value.
An API will raise an exception if it can’t process a client request correctly. The following are the common error codes and their meanings:
An API failure can happen because of issues with the endpoints like network connectivity, latency, and load balancing issues. The examples below may give you a good understanding of what causes an API failure.
Some APIs are better left locked down to those who need access and are only available to those using an approved key. However, when you don’t set up the correct permissions for users, you can impede the application’s basic functionality. If you’re using an external API, like Facebook, Twitter, or even Google Analytics, make sure you’re adding the permissions for your users to access the data they need. Also, keep on top of any newly added features that can increase security risks.
If you’re leveraging external APIs requiring extra configuration, get the correct API key so the app has the proper permissions. Also, provide your clients with API keys relevant to their authorization levels. Thus, your users will have the correct permissions and will seamlessly access your application.
We’ve all seen it happen a million times: someone discovers an API that’s exposed to everyone after gaining user consent. Until now, this was usually reasonably benign, but when credentials are leaked, things can get ugly fast, and companies lose brand trust. The biggest problem here is keeping admins from having unsecured access to sensitive data.
Using a secure key management system that includes the “View Keys” permission for the account will help mitigate this risk. For example, you could use AWS Key Management Service (AWS KMS) to help you manage and create your encryption keys. If you can’t protect your keys, then at the very least, include a strong master password that all users can access, and only give out these keys when needed.
Untrusted tokens and session variables can cause problems for how a website functions, causing timing issues with page loads and login calls or even creating a denial of service, which can harm the end-user experience and your brand.
The best way to secure sensitive data is by using token authentication, which will encode user data into the token itself based on time/date stamps. You can then enforce this to ensure that whenever you reissue tokens, they expire after a set amount of time or use them for API requests only. As for session variables, these are usually created based on your authentication keys and should be handled the same way as your privileged keys—with some encryption. And keep the source of your keys out of the hands of anyone who can access them.
If you’re using an API to power a website, you must upload new data in real time or save it to a cache for later use. When you set an expiry time for an API and fail to update, you make it unavailable. When a user or application tries to access it after the expiry, they get a 404 or 500 error.
You should use a middle ground option—a proxy API. This will allow you to cache your data before you make it available and only allow access to the correct bits of the APIs as needed. You should also schedule tasks that run daily to import updated data and bring it into your system.
This one isn’t necessarily a mistake, but it happens from time to time when developers aren’t careful about how they name things or if they’re using an improper URL structure for their API endpoints. When the URL structure is too complex or has invalid characters, you will get errors and failures. Look at some examples of bad URL structure: “http://example.com/api/v1?mode=get” The above structure is bad because the "?" character filters a single parameter, not the type of request. The default request type is GET; thus, a better URL would look like this: “http://example.com/api/v1”
Remove any unsafe characters in your URL, like angle brackets (<>). You use angle brackets as delimiters in your URL. Also, design the API to make it more friendly for users. For example, this URL "https://example.com/users/name" tells users they’re querying the names of users, unlike this URL "https://example.com/usr/nm" It’s also good practice to use a space after the “?” in your API URL because otherwise, people can mistakenly think that the space is part of a query string.
This happens when trying to build multiple ways of accessing multiple applications. You do this by relying on generic endpoints instead of target audiences and specific applications. Creating a lot of different paths for the same data results in non-intuitive routes.
There are several ways to go about this, but for most, you want to use a network proxy system that can handle the different data access methods and bring it all into one spot. This will help minimize potential issues with your APIs routes and help with user confusion and brand damage.
This can happen when organizations are not properly securing their public IP addresses, or there is no solid monitoring process. This exposes your assets by providing easy access to anyone. Exposed IPs make your application vulnerable to DDoS attacks and other forms of abuse or phishing.
Make sure you properly manage your IP addresses and have a solid monitoring system. You must block all IPv6 traffic and enforce strict firewall rules on your network. You should only allow service access through secure transport methods like TLS.
API errors are a plague on the internet. Sometimes they come as very poor performance that can produce long response times and bring down APIs, or they can be network-related and cause unavailable services. They’re often caused by problems such as inconsistent resource access errors, neglect in proper authentication checks, faulty authentication data validation on endpoints, failure to read return codes from an endpoint, etc. Once organizations recognize what causes API failures and how to mitigate them, they seek web application and API protection (WAAP) platforms to address the security gaps. Harness WAAP by Traceable helps you analyze and protect your application from risk and thus prevent failures.
Harness WAAP is the industry’s leading API security platform that identifies APIs, evaluates API risk posture, stops API attacks, and provides deep analytics for threat hunting and forensic research. With visual depictions of API paths at the core of its technology, its platform applies the power of distributed tracing and machine learning models for API security across the entire software development lifecycle. Book a demo today.
Need more info? Contact Sales