The question for enterprise AI in 2026 is no longer just which model. It’s which harness.
An agent harness is the system around the model. It decides what the agent remembers, what context it sees, what tools it can call, what it is allowed to do, and what happens when it is wrong.
The model provides intelligence. The harness provides control.
This is where the real engineering is happening. When Claude Code's source was accidentally exposed earlier this year, reports put it at more than half a million lines. None of that was the model. All of it was the system around the model.
The model gets you started. The harness gets you to production.
In software engineering
Software engineering is one of the first places this plays out. AI coding tools are writing and editing code. Autonomous agents are starting to deploy, operate, and respond to incidents. These are not suggestions anymore. They are changes to running software, made by agents acting on their own.
And one harness is not enough.
Two loops, two harnesses
Software engineering has two halves at the level that matters for agent harness design. Software development, where code gets written. Software delivery, where code becomes running software.
The inner loop is software development. Code gets written, edited, tested, and reviewed. Coding agents work here, close to the developer and bounded by the repository. Whether they live in an IDE, a terminal, a background session, or a web workspace doesn’t change what they do. They help one person write better code faster.
The outer loop is software delivery. Code becomes software that is built, tested, secured, deployed, verified, operated, and sometimes rolled back. That includes CI, security scans, deployments, infrastructure, feature flags, incidents, and approvals.
The two loops are different. The inner loop is about individual productivity. The outer loop is about organizational execution under risk. It crosses teams, touches production, uses secrets, enforces policy, and leaves an audit trail.
An agent delivering software can’t be a coding assistant with API access. It has to run inside a system that enforces the organization’s rules.
What goes wrong without the right harness
The stakes are easier to see by starting with what breaks.
Security. An agent with broad access to deploy, provision, and push config changes is a new attack surface. Prompt injection through a PR description, a poisoned dependency, or a malicious issue comment can turn an autonomous agent into the most privileged insider threat in the company. It acts under its own identity, with its own scoped credentials, doing exactly what it’s authorized to do. The attacker just redirects the authorization. Without an identity model and governed execution, every action the agent can take becomes a potential action path for an attacker.
Compliance. An agent that ships code without the same policy gates, approvals, and audit trails humans use creates a parallel path that regulators and auditors will challenge. A single deployment that skipped EU data residency review can trigger a finding that takes quarters to close. Cyber insurers are starting to scrutinize AI governance, and some are exploring exclusions or tighter terms for poorly governed AI. Within a year or two, “we have autonomous agents deploying code without an evidence trail” will be impossible to defend. Autonomous delivery without verification is autonomous liability.
Confident bad decisions. An agent with partial context looks like it’s working. It deploys during a change freeze. It rolls out a config change that breaks an upstream service. It enables a feature flag during an incident. Each failure is locally reasonable and globally wrong. Without the full knowledge graph, the agent keeps making the wrong call.
AI-specific failure modes. Autonomous agents fail in ways that deterministic automation doesn’t. They hallucinate actions, generating and deploying a Kubernetes manifest that doesn’t match reality. They get stuck in loops, rolling back and redeploying the same change until a human kills the process. They’re confidently wrong, proposing a fix that passes a weak policy gate and breaks production an hour later. No attacker involved. Without verification strong enough to catch them, errors reach production.
All of this has happened with deterministic automation, one mistake at a time. With autonomous agents, errors happen in parallel. A coding agent with bad context can push 10 broken PRs in 10 minutes. A delivery agent without verification can deploy 20 services before anyone notices.
Speed used to be the feature. With autonomous agents, speed is also the damage multiplier.
What a software delivery agent actually needs
A software delivery agent needs four things: memory, context, tools, and verification. The shape and stakes of each element are distinct.
Suppose a team is shipping a new version of a retailer’s checkout service on Thursday. Checkout depends on payments, inventory, fraud, and identity.
Memory: a graph of how your company ships
A Software Delivery Knowledge Graph is a connected map of services, teams, pipelines, deployments, incidents, policies, scorecards, and artifacts. Nodes and edges show how they all relate.
To answer “Is checkout safe to ship Thursday?”, the agent has to know which services checkout depends on, what their scorecards look like, whether any have open critical CVEs, whether there’s a change freeze, and who’s on call Thursday night.
That’s a graph query. If the agent doesn’t have the graph, it’s guessing.
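A minimal sketch of what that query looks like, assuming a toy in-memory graph with typed edges. All names here (the edge relations, the scorecard numbers, the services) are illustrative, not the Harness API:

```python
# Toy Software Delivery Knowledge Graph: nodes are services,
# edges are typed relations like "depends_on".
from collections import defaultdict

edges = defaultdict(list)

def add_edge(src, relation, dst):
    edges[(src, relation)].append(dst)

add_edge("checkout", "depends_on", "payments")
add_edge("checkout", "depends_on", "inventory")
add_edge("checkout", "depends_on", "fraud")
add_edge("checkout", "depends_on", "identity")

# Live facts attached to nodes (values invented for illustration).
scorecards = {"payments": 92, "inventory": 88, "fraud": 61, "identity": 95}
open_critical_cves = {"fraud"}   # services with an open critical CVE
change_freeze = False

def safe_to_ship(service, threshold=80):
    """Walk the dependency edges and check every ship-blocking condition."""
    deps = edges[(service, "depends_on")]
    failing = [d for d in deps if scorecards[d] < threshold]
    vulnerable = [d for d in deps if d in open_critical_cves]
    if change_freeze or failing or vulnerable:
        return False, {"failing": failing, "cves": vulnerable}
    return True, {}

ok, reasons = safe_to_ship("checkout")
# Not safe: fraud fails the scorecard threshold and has an open CVE.
```

The point of the sketch is the shape of the answer: “safe to ship” is not a property of the diff, it is a traversal over the organization’s dependency and policy state.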
Context: the live signal
Memory is the durable map. Context is the live signal. Memory tells the agent how the delivery system is connected. Context tells it what’s happening now.
Back to checkout. The agent sees that a chaos experiment last week showed payments fail when its Redis cache is unavailable. It sees that yesterday’s security scan flagged a critical CVE in a library fraud detection depends on. It sees that the new version changes the same config flag that caused an incident two weeks ago.
None of this is in the pull request. All of it matters.
Context isn’t something you assemble from scratch at runtime. It accumulates in the harness long before the agent is asked to act.
Tools: governed execution
People often assume “tools” means function calls to APIs. For a software delivery agent, it means something different. The agent can deploy to Kubernetes, run a database migration, apply a feature flag, trigger a security scan, run a chaos experiment, open and close an incident. Real actions, inside your network, using your credentials, under your policies, with full audit logging.
At Harness, every action runs through a Delegate: a lightweight worker inside your environment. Your VPC, your Kubernetes cluster, your data center. The agent issues an instruction. The Delegate executes it inside your perimeter and returns the result.
Secrets are decrypted inside the Delegate. Never in the agent’s context window, never in a model provider's memory, never in an audit log.
An agent with arbitrary production access is dangerous. An agent constrained by governed execution is governable.
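That split can be sketched as two roles: one that holds and decrypts secret material inside the perimeter, and one that can only issue instructions and read sanitized results. The class names and the trivial XOR “decryption” are purely illustrative, not the real Delegate protocol:

```python
import base64

class Delegate:
    """Runs inside the enterprise perimeter; holds the secret material."""
    def __init__(self, encrypted_secret: str, key: int):
        self._encrypted = encrypted_secret
        self._key = key

    def execute(self, instruction: str) -> str:
        # Decrypt only here; the plaintext never leaves this method.
        secret = self._decrypt()
        # A real worker would use `secret` to call an internal API here.
        del secret
        # Return only the sanitized result, never the secret.
        return f"{instruction}: ok"

    def _decrypt(self) -> str:
        raw = base64.b64decode(self._encrypted)
        return bytes(b ^ self._key for b in raw).decode()

class Agent:
    """Runs anywhere; can only issue instructions, never read secrets."""
    def __init__(self, delegate: Delegate):
        self._delegate = delegate

    def deploy(self, service: str) -> str:
        return self._delegate.execute(f"deploy {service}")
```

The design choice the sketch illustrates: the agent’s interface surface contains no method that returns secret material, so nothing in the agent’s context window can ever contain it.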
Verification: proving the action was safe
This is the pillar coding and personal productivity agents don’t need at this depth. Software delivery agents do.
Three mechanisms make it concrete:
- Scorecards grade services against rules the organization defines. Test coverage, SLO compliance, library currency, critical CVEs. Every rule measurable. Every score live. Thresholds set by the organization.
- Policy gates block actions until conditions are met. “No deployment without a passing scorecard.” “No EU infrastructure change without a named EU approver.” The gate sits in the pipeline. The agent can’t route around it.
- Evidence is cryptographically signed proof that each action met its policy. When an auditor asks, “prove last Tuesday’s deployment passed security testing,” the system returns a tamper-evident record.
For checkout, the Thursday release is blocked unless the scorecard passes, no critical CVEs are open, no change freeze applies, and an EU compliance approver signs off. If any of those fail, the agent cannot deploy. If they all pass, the deployment runs through a Delegate and an evidence record is written.
The rules of the organization are enforced in the harness. The agent operates inside them.
The foundation is already built
I mentioned that an agent needs memory, context, tools, and verification. The good news: a modern software delivery platform like Harness already has the foundations, because truly automated delivery has always needed those four things.
A note on our name. We called the company Harness in 2017 because the original thesis was a safety harness for code: let developers move fast without breaking things. Pipelines, policies, approvals, rollbacks, evidence. The scaffolding that lets speed and safety coexist.
That thesis hasn’t changed. The mover has. Developers are still moving fast. AI agents are moving fast too, and faster. The harness has to hold both.
Pipelines aren’t agents. Pipelines are the harness that lets agents safely act. They’re the control plane where agent actions are evaluated, constrained, and executed under policy.
The word “pipeline” carries baggage. Many people hear “script runner.” That isn’t what we mean. Harness pipelines are production orchestration engines: loops, matrix runs, parallel stages, conditions, approvals, OPA gates, rollback, retries, and deterministic-plus-agentic step-chaining.
An agent step can run inside a loop. A deterministic step can pass output to an agent, then to a policy gate, an approval, another agent, and a deployment. The agent isn’t replacing the pipeline. The agent is one kind of step the pipeline already knows how to run.
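That chaining can be sketched as a list of steps sharing one contract, where deterministic steps, agent steps, and gates are interchangeable and a gate can halt the whole run. Every name below is invented for illustration:

```python
def build(ctx):
    # Deterministic step: produce an artifact.
    ctx["artifact"] = "checkout:v42"
    return ctx

def agent_review(ctx):
    # Stand-in for an agentic step; a real one would call a model.
    ctx["review"] = f"analyzed {ctx['artifact']}"
    return ctx

def policy_gate(ctx):
    # Gate: refuse to continue unless required outputs exist.
    if "artifact" not in ctx or "review" not in ctx:
        raise RuntimeError("gate blocked: missing required outputs")
    return ctx

def deploy(ctx):
    ctx["deployed"] = True
    return ctx

def run_pipeline(steps, ctx=None):
    """Deterministic and agentic steps chain through a shared context."""
    ctx = ctx or {}
    for step in steps:
        ctx = step(ctx)   # a gate raising here stops the whole run
    return ctx

result = run_pipeline([build, agent_review, policy_gate, deploy])
```

The agent step has no special status: it reads the same context, writes the same context, and cannot skip the gate that follows it.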
Harness pipelines execute hundreds of millions of runs a year across enterprise production systems. That isn’t a theoretical runtime for agents. It’s a runtime already hardened at scale, on real delivery, under real policy, with real rollback. That’s the difference between a script runner and a production harness for autonomous action.
The rest of the foundation maps the same way. The Delegate is how actions reach your infrastructure. The Software Delivery Knowledge Graph is the memory. Our platform modules are the tools. Scorecards, policy gates, and signed evidence are the verification. Harness AI, the intelligence layer on top, uses all four of these elements.
We didn’t set out to build an agent harness. We set out to build a software delivery platform with AI at its core. It turns out those two things are the same.
Why coding agents are a different harness
Coding agents (IDE copilots, background agents, terminal-based assistants, cloud coding sessions) are built for a different job. They know your codebase, your style, your recent commits. That’s a real harness, bounded by the repository and the developer. A software delivery harness has different scope, memory, risks, and accountability.
A coding agent’s memory is the repository. A software delivery agent’s memory is the organization.
The context gap. Ask your coding assistant: “Is it safe to deploy this checkout change to production tonight?” It can’t answer. It doesn’t know the current scorecard, the change freeze status, last week’s chaos test results, or who’s on call. None of that lives inside the developer's workspace. A coding agent can write a change. It can’t know if the change is safe to ship.
The blast radius gap. A coding agent’s bad change usually gets caught before it hurts anything: in review, in CI, in a security scan, on a policy gate. Fifteen minutes wasted, not a production incident. A software delivery agent’s worst day is customer data exposure, a production outage, or a regulatory incident. Same agent paradigm, radically different blast radius.
The safety-net gap. Both kinds of agents are moving toward less human oversight. The difference is what catches them when they’re wrong. A coding agent mistake gets caught downstream: by CI, by security scans, by policy gates, by the delivery harness itself. A delivery agent mistake has nothing downstream. It is the downstream.
The control-plane gap. Could a coding agent call Harness as a backend? Of course. It should. But the caller isn’t the control plane. The software delivery harness decides whether the request is allowed, how it executes, and what evidence is retained.
The preference gap. Developers are going to pick their own coding agents. Most enterprises already run two or three: Cursor on some teams, Claude Code on others, Copilot on others, whatever ships next year on yet other teams. That’s healthy. Software development is distributed by design. Software delivery is the opposite: it’s centralized. One company, one delivery control plane. One set of policies, one audit trail, one source of evidence, one place where credentials are held.
The winning pattern is the two meeting cleanly: whichever coding agent the developer picks, the deployment passes through the same delivery harness.
Why model providers aren’t the delivery harness today
Managed agents. Stateful APIs. Server-side memory. Model providers are extending into harness territory, and for many use cases, that works. For software delivery specifically, the architecture runs into a different set of constraints.
The credentials problem. Every software delivery action requires production credentials: cloud admin roles, Kubernetes service accounts, database passwords, secrets manager keys. The most sensitive assets in the company. Enterprises spend years building the controls around them: vaults, rotation, scoped access, audit trails. A model-provider-hosted agent loop would require those credentials to flow through the model provider’s infrastructure on every action. Few CISOs will approve it. Few auditors will sign off. In regulated industries, it’s often a non-starter.
The inversion. A model can be hosted anywhere. Any provider, any cloud. Execution has to happen inside the enterprise, using credentials that never leave. The model stays outside. The control plane runs inside. Intelligence can live anywhere. The control plane can’t.
The live-state problem. A software delivery agent’s answer to “Is this safe to ship?” depends on a state that changes every minute. The current change freeze. The latest incident. The newest CVE. Who’s on call right now. Whether the deployment window just closed. A model provider can reason about what you put in the prompt. It doesn’t naturally own the current state of your delivery system. A model provider knows the world. The harness has to know your world, right now.
The accountability problem. When a delivery agent does something wrong, the model provider isn’t on the incident bridge. The on-call engineer is. The platform lead is. The CTO is. The company is the one that has to explain the outage to customers, the finding to regulators, the miss to the board. Accountability can’t be outsourced. The harness that constrains the agent can’t be either.
A model provider can be the brain. It can’t be the harness for delivery.
AI for everything after code
More and more code will be written by AI. The bottleneck is shifting from code generation to safe delivery.
Coding agents help developers write code. Software delivery agents help teams safely deliver and operate it. Two harnesses. Two categories. Two sets of winners.
The foundation for software delivery is ready. The agents that need it are arriving now. The category now has a name.
We’ve always called it Harness. The idea just got bigger.
