
---
Key Takeaways:
- The Harness MCP server is an MCP-compatible interface that lets AI agents discover, query, and act on Harness resources across CI/CD, GitOps, Feature Flags, Cloud Cost Management, Security Testing, Resilience Testing, Internal Developer Portal, and more.
- The Harness MCP server v2 reduces tools from 130+ to 11.
- The redesign cuts estimated tool-definition context cost from about 26% to 1.6% of a 200K-token window.
- A registry-based dispatch model supports 125+ resource types without expanding the tool vocabulary.
- The architecture is designed for Cursor, Claude Code, and other MCP-compatible clients.
- Built-in safety controls include confirmation for writes, fail-closed deletes, and read-only mode.
---
The first wave of MCP servers followed a natural pattern: take every API endpoint, wrap it in a tool definition, and expose it to the LLM. It was fast to build, easy to reason about, and it was exactly how we built the first Harness MCP server. That server taught us a lot: solid Go codebase, well-crafted tools, broad platform coverage across 30 toolsets. It also taught us where the one-tool-per-endpoint model hits a wall.
For platforms the size of Harness, spanning the entire SDLC, the pattern doesn't scale. When you expose one tool per API endpoint, you're asking the LLM to be a routing layer, forcing it to do something a switch statement does better. Every tool definition consumes context that could be spent on reasoning. At ~175 tools, that's ~26% of the LLM's context window before the developer even types a prompt.
So we iterated. The Harness MCP v2 redesign does the same work with 11 tools at ~1.6% context consumption. The answer isn't fewer features, it's a different architecture: a registry-based dispatch model where the LLM reasons about what to do, and the server handles how to do it.
What We Learned: Tool Sprawl and Agent Performance
When an MCP client connects to a server, it loads every tool definition into the LLM's context window. Every name, description, parameter schema, and annotation. For the first Harness server at 130+ active tools, here's what that costs:

That's the core insight: the first server uses ~26% of context on tool definitions before any work begins. The v2 uses ~1.6%.
This isn't a theoretical concern. Research on LLM behavior in large context windows, including Liu et al.'s "Lost in the Middle" findings, shows that models struggle to use information placed deep within long contexts. As Ryan Spletzer recently wrote, dead context doesn't sit inertly: "It dilutes the signal. The model's attention is spread across everything in the window, so the more irrelevant context you pack in, the less weight the relevant context carries."
Anthropic's own engineering team has documented this trade-off: direct tool calls consume context for each definition and result, and agents scale better when the tool surface area is deliberately constrained.
The problem compounds in real-world developer environments. If you're running Cursor or Claude Code with a Playwright MCP, a GitHub MCP, and the Harness MCP, those tool definitions stack. EclipseSource's analysis shows that a standard set of MCP servers can eat 20% of the context window before you even type a prompt. The recommendation: stay below 40% total context utilization. Any MCP server with 100+ tools, ours included, would consume more than half that budget on its own.
How We Stack Up: Context Efficiency Across the MCP Ecosystem
The context window tax isn't unique to Harness: it's an industry-wide problem. Here's how the v2 server compares to popular MCP servers in the wild:

Lunar.dev research: "5 MCP servers, 30 tools each → 150 total tools injected. Average tool description: 200–500 tokens. Total overhead: 30,000–60,000 tokens. Just in tool metadata." The Harness MCP server v2 at ~3,150 tokens would represent just 5–10% of a typical multi-server setup's overhead.
Real-world Claude Code user: A developer on Reddit r/ClaudeCode with Playwright, Context7, Azure, Postgres, Zen, and Firecrawl MCPs reported 83.3K tokens (41.6% of 200K) consumed by MCP tools immediately after /clear. That's before a single prompt.
Anthropic's code execution findings: Anthropic's engineering team reported that a workflow consuming 150,000 tokens was reduced to ~2,000 tokens (a 98.7% reduction) by switching from direct tool calls to code-based tool invocation. The principle is clear: fewer, smarter tools beat more, narrower ones.
MCPAgentBench: An academic benchmark found that "nearly all evaluated models exhibit a decline of over 10 points in task efficiency when tool selection complexity increases." Models overwhelmed with tools prioritize task resolution over execution efficiency. They get the job done, but waste tokens doing it.
IDE Tool Limits and Practical Headroom
Cursor enforces an 80-tool cap, OpenAI limits to 128 tools, and Claude supports up to ~120. The v2 server's 11 tools leave massive headroom to run Harness alongside other MCP servers without hitting these limits.
Consider a concrete example: a developer running Cursor with Playwright (21 tools), GitHub MCP (~40 tools), and the old Harness MCP (~175 tools) would hit ~236 tools, well past Cursor's 80-tool cap. With v2 Harness (11 tools), the same stack is 72 tools, comfortably under the limit.
With Claude Code, the same old stack would burn ~76,400 tokens (~38%) on tool definitions alone. With v2, it drops to ~27,550 tokens (~14%), freeing ~48,850 tokens for actual reasoning and conversation.
The CLI vs MCP Debate and Why It’s the Wrong Question
The MCP ecosystem is in the middle of a reckoning. Scalekit ran 75 benchmark runs comparing CLI and MCP for identical GitHub tasks on Claude Sonnet 4, and CLI won on every efficiency metric: 10–32x cheaper, 100% reliable vs MCP’s 72%. For a simple “what language is this repo?” query, CLI used 1,365 tokens. MCP used 44,026 — almost entirely from schema injection of 43 tool definitions the agent never touched.
The Playwright team shipped the same verdict in hardware. Their new CLI tool saves browser state to disk instead of flooding context. In BetterStack’s benchmarks, CLI used ~150 tokens per interaction vs MCP’s ~7,400+ of accumulated page state. CircleCI found CLI completed browser tasks with 33% better token efficiency and a 77 vs 60 task completion score.
The CLI camp’s argument is real: schema bloat kills performance. But their diagnosis points at the wrong layer. The problem isn’t MCP. It’s naive MCP server design.
What CLI Gets Right
CLI wins when the agent already knows the tool. gh, kubectl, terraform: these have extensive training data. The agent composes commands from memory, pays zero schema overhead, and gets terse, predictable output. Scalekit found that adding an 800-token “skills document” to CLI reduced tool calls and latency by a third.
CLI also wins on composition. Piping grep into jq into xargs chains operations in a single tool call. An MCP agent doing the same work makes N round-trips through the LLM, each one burning context.
What CLI Can’t Do
But CLI’s advantages dissolve the moment you cross three boundaries:
Discovery
CLI works when the agent knows the command. For a platform like Harness, with 125+ resource types across CI/CD, GitOps, FinOps, security, chaos, and IDP, the agent can't know the API surface from training data alone. MCP's harness_describe tool lets the agent discover capabilities at runtime. CLI would require the agent to guess curl commands against undocumented APIs.
Authorization
As Scalekit themselves concluded: “The question isn’t CLI or MCP. It’s who is your agent acting for?” CLI auth gives the agent ambient credentials: your token. For multi-tenant, multi-user environments (which is where Harness operates), MCP provides per-user OAuth, explicit tool boundaries, and structured audit trails.
Safety
CLI agents can run arbitrary shell commands. An MCP server constrains the agent to declared tools with typed inputs. The v2 server’s elicitation-based confirmation flows, fail-closed deletes, and read-only mode are protocol-level safety guarantees that CLI can’t replicate.
The v2 Server Is Our Answer to This Debate
The CLI vs MCP debate is really about schema bloat and naive tool design. The v2 Harness MCP server eliminates the arguments against MCP without losing the arguments for it:
Schema bloat? 11 tools at ~3,150 tokens. That’s less than a single CLI help output for a complex tool. Cursor’s 80-tool cap? We use 11. The 44,026-token GitHub MCP problem? We’re 14x leaner.
Round-trip overhead? The registry-based dispatch means the agent makes one tool call to harness_diagnose and gets back a complete execution analysis — pipeline structure, stage/step breakdown, timing, logs, and root cause. A CLI agent would need to chain 4–5 API calls to assemble the same picture.
Discovery? harness_describe is a zero-API-call local schema lookup. The agent discovers 125+ resource types without a single network request. CLI would require a man page the agent has never seen.
Composition? Skills + prompt templates encode multi-step workflows (build-deploy-app, debug-pipeline-failure) as server-side orchestration. The agent reasons about what to do; the server handles how to chain it. Same efficiency as a CLI pipe, with protocol-level safety.
The real lesson from the benchmarks: MCP servers with 43+ tools and no architecture for context efficiency will lose to CLI on cost metrics. But a well-designed MCP server with 11 tools, a registry, and a skills layer outperforms both naive MCP and naive CLI — and provides authorization, safety, and discoverability that CLI architecturally cannot.
The Redesign: 11 Tools, 125+ Resource Types, 1 Registry
We stopped designing for API parity and started designing for agent usability.
The v2 server is built around a registry-based dispatch model. Instead of one tool per endpoint, we expose 11 intentionally generic verbs. The intelligence lives in the registry: a declarative data structure that maps resource types to API operations.
The 11 Tools

When an agent calls harness_list(resource_type="pipeline"), the server looks up pipeline in the registry, resolves the API path, injects scope parameters (account, org, project), makes the HTTP call, extracts the relevant response data, and appends a deep link to the Harness UI. The agent never needs to know the underlying API structure.
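That dispatch flow can be sketched in a few lines of TypeScript. The names below (`resolve`, the registry shape, the scope parameter names) are illustrative, not the actual server internals; the point is that path templating and scope injection are mechanical work the LLM never has to reason about:

```typescript
// Minimal sketch of registry-based dispatch. A tool call like
// harness_list(resource_type="pipeline") resolves to a concrete request.

type Operation = { method: string; path: string };
type ResourceDefinition = {
  resourceType: string;
  scope: "account" | "org" | "project";
  operations: Record<string, Operation>;
};

const registry = new Map<string, ResourceDefinition>([
  ["pipeline", {
    resourceType: "pipeline",
    scope: "project",
    operations: {
      list: { method: "GET", path: "/pipeline/api/pipelines/list" },
      get: { method: "GET", path: "/pipeline/api/pipelines/{pipeline_id}" },
    },
  }],
]);

type Scope = { accountId: string; orgId?: string; projectId?: string };

// Resolve a (resource type, operation) pair to a concrete request:
// fill path placeholders, then inject scope as query parameters.
function resolve(
  resourceType: string,
  op: string,
  scope: Scope,
  ids: Record<string, string> = {},
): { method: string; url: string } {
  const def = registry.get(resourceType);
  if (!def) throw new Error(`unknown resource type: ${resourceType}`);
  const operation = def.operations[op];
  if (!operation) throw new Error(`unsupported operation: ${op}`);

  // Fill {placeholders} from caller-supplied identifiers.
  const path = operation.path.replace(/\{(\w+)\}/g, (_, key: string) => {
    const v = ids[key];
    if (!v) throw new Error(`missing identifier: ${key}`);
    return v;
  });

  // Scope parameters are injected server-side; the agent never supplies them.
  const qs = new URLSearchParams({ accountIdentifier: scope.accountId });
  if (scope.orgId) qs.set("orgIdentifier", scope.orgId);
  if (scope.projectId) qs.set("projectIdentifier", scope.projectId);

  return { method: operation.method, url: `${path}?${qs}` };
}

const req = resolve("pipeline", "list", {
  accountId: "acct123", orgId: "default", projectId: "shop",
});
console.log(req.method, req.url);
```

The LLM's job ends at choosing the verb and the resource type; everything after that is a lookup.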
How the Registry Works
Each registry entry is a declarative ResourceDefinition:
```typescript
{
  resourceType: "pipeline",
  displayName: "Pipeline",
  toolset: "pipelines",
  scope: "project",
  identifierFields: ["pipeline_id"],
  operations: {
    list: {
      method: "GET",
      path: "/pipeline/api/pipelines/list",
      queryParams: { search_term, page, size },
      responseExtractor: (raw) => raw.content
    },
    get: {
      method: "GET",
      path: "/pipeline/api/pipelines/{pipeline_id}",
      responseExtractor: (raw) => raw.data
    }
  }
}
```
Adding support for a new Harness module requires adding one declarative object to the registry. No new tool definitions. No changes to MCP tool schemas. The LLM's tool vocabulary stays constant as the platform grows.
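A sketch of what that extension step might look like, with an illustrative `register` helper and a hypothetical feature-flag entry (the real server's registration API and paths may differ):

```typescript
// Hypothetical registry-extension sketch: one declarative object per
// new resource type; the 11 tool schemas the LLM sees never change.

type Op = { method: string; path: string };
type ResourceDefinition = {
  resourceType: string;
  displayName: string;
  toolset: string;
  scope: "account" | "org" | "project";
  operations: Record<string, Op>;
};

const registry = new Map<string, ResourceDefinition>();

function register(def: ResourceDefinition): void {
  registry.set(def.resourceType, def);
}

// Adding a new module is a data change, not a schema change.
register({
  resourceType: "feature_flag",        // illustrative name
  displayName: "Feature Flag",
  toolset: "feature-flags",
  scope: "project",
  operations: {
    list: { method: "GET", path: "/cf/admin/features" },          // assumed path
    get: { method: "GET", path: "/cf/admin/features/{flag_id}" }, // assumed path
  },
});

console.log(registry.has("feature_flag")); // true
```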
Today, the registry covers 125+ resource types across 30 toolsets, spanning the full Harness platform:
- DevOps: Pipelines, Executions, Services, Environments, Infrastructure, Templates, Connectors, Secrets, Delegates
- Code: Repositories, Branches, Commits, Pull Requests, Code Reviews
- GitOps: Agents, Applications, Clusters, ApplicationSets, Repositories
- Security: Security Issues, Exemptions, SBOMs, Compliance, Supply Chain, OPA Policies
- Cloud Cost: Perspectives, Budgets, Recommendations, Anomalies, Commitments
- Chaos: Experiments, Probes, Templates, Infrastructure, Load Tests
- Feature Flags: Workspaces, Environments, Flags
- IDP: Entities, Scorecards, Workflows, Tech Docs
- SEI: DORA Metrics, Team Analytics, AI Usage, Business Alignment
- Platform: Organizations, Projects, Users, Roles, Permissions, Settings, Audit Trail
Optimized for Cursor, Claude Code, and Real Developer Workflows
The architecture wasn't designed in a vacuum. We built it specifically for the environments developers actually use.
Cursor and Windsurf: Stdio-First, Toolset Filtering
Cursor and Windsurf connect via stdio transport — the server runs as a local process alongside the IDE. With 11 tools instead of 130+, the Cursor agent has a minimal, clear menu. It doesn't waste reasoning cycles on tool selection or get confused by 40 CCM-specific tools when the developer is debugging a pipeline failure.
For teams that only use specific Harness modules, HARNESS_TOOLSETS lets you filter at startup:
```json
{
  "mcpServers": {
    "harness": {
      "command": "npx",
      "args": ["-y", "harness-mcp-v2@latest"],
      "env": {
        "HARNESS_API_KEY": "pat.xxx.yyy.zzz",
        "HARNESS_TOOLSETS": "pipelines,services,connectors"
      }
    }
  }
}
```
The agent only sees resource types from the enabled toolsets. The rest don't exist as far as the LLM is concerned.
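The filtering itself is simple set membership at startup. A minimal sketch, assuming the env var is a comma-separated list and that an unset variable means "everything enabled" (both assumptions, not confirmed server behavior):

```typescript
// Sketch of startup-time toolset filtering: resource definitions outside
// the enabled toolsets are never registered, so the LLM never sees them.

type Def = { resourceType: string; toolset: string };

const allDefs: Def[] = [
  { resourceType: "pipeline", toolset: "pipelines" },
  { resourceType: "service", toolset: "services" },
  { resourceType: "ccm_perspective", toolset: "ccm" },
];

function enabledDefs(defs: Def[], env: string | undefined): Def[] {
  if (!env) return defs; // assumed: no filter means everything is enabled
  const enabled = new Set(env.split(",").map((s) => s.trim()));
  return defs.filter((d) => enabled.has(d.toolset));
}

const visible = enabledDefs(allDefs, "pipelines,services");
console.log(visible.map((d) => d.resourceType)); // ["pipeline", "service"]
```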
Claude Code: Prompt Templates and Multi-Project Discovery
Claude Code excels at multi-step workflows. We leaned into that with 26 prompt templates across four categories:
- DevOps (12): build-deploy-app, debug-pipeline-failure, create-pipeline, onboard-service, dora-metrics-review, and more
- FinOps (5): optimize-costs, cloud-cost-breakdown, commitment-utilization-review, cost-anomaly-investigation, rightsizing-recommendations
- DevSecOps (6): security-review, vulnerability-triage, sbom-compliance-check, supply-chain-audit, security-exemption-review, access-control-audit
- Harness Code (3): code-review, pr-summary, branch-cleanup
Each prompt template encodes a multi-step workflow the agent can execute. debug-pipeline-failure doesn't just fetch an execution — it calls harness_diagnose, follows chained failures, and produces a root cause analysis with actionable fixes.
The v2 server also supports multi-project workflows without hardcoded environment variables. An agent can dynamically discover the account structure, then scope subsequent calls with org_id and project_id parameters. No configuration changes needed.
URL Context Awareness
Every tool accepts an optional url parameter. Paste a Harness UI URL (a pipeline page, an execution log, a dashboard) and the server automatically extracts the account, org, project, and resource identifiers. The agent gets context without the developer having to specify it manually.
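The extraction is pattern matching over the URL path. A minimal sketch, assuming a path shape like `/account/<id>/.../orgs/<id>/projects/<id>/pipelines/<id>` (illustrative; real Harness URL layouts vary by module):

```typescript
// Sketch of URL context extraction: pull scope identifiers out of a
// pasted Harness UI URL so the developer never has to type them.

function extractScope(rawUrl: string): Record<string, string> {
  const path = new URL(rawUrl).pathname;
  const scope: Record<string, string> = {};
  const patterns: Array<[string, RegExp]> = [
    ["accountId", /\/account\/([^/]+)/],
    ["orgId", /\/orgs\/([^/]+)/],
    ["projectId", /\/projects\/([^/]+)/],
    ["pipelineId", /\/pipelines\/([^/]+)/],
  ];
  for (const [key, re] of patterns) {
    const m = path.match(re);
    if (m) scope[key] = m[1];
  }
  return scope;
}

const s = extractScope(
  "https://app.harness.io/ng/account/acct1/cd/orgs/default/projects/shop/pipelines/build_and_deploy/executions"
);
console.log(s.projectId); // "shop"
```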
Harness Skills: From MCP Tools to Guided Workflows
Reducing tool count solves the context efficiency problem. But developers don't just need fewer tools — they need tools that know how to chain together into real workflows. That's where Harness Skills come in.
The v2 server ships with a companion skills layer (github.com/thisrohangupta/harness-skills) that turns raw MCP tool access into guided, multi-step workflows. Skills are IDE-native agent instructions that teach the AI how to use the MCP server effectively — without the developer having to explain Harness concepts or orchestration patterns.
How Skills Work
Skills operate at three levels:
Level 1: Shared Agent Instructions
Every IDE gets a base instruction file, loaded automatically when the agent starts:
- CLAUDE.md for Claude Code (auto-loaded)
- AGENTS.md for OpenAI Codex / generic agents
- .cursor/rules/harness.mdc for Cursor (auto-loaded as project rule)
- .github/copilot-instructions.md for VS Code GitHub Copilot
These files teach the agent: what the 11 tools do, how Harness scoping works (account → org → project), dependency ordering (always verify referenced resources exist before creating dependents), and how to extract context from Harness UI URLs.
Level 2: Prompt Templates (Server-Side)
The 26 MCP prompt templates are registered directly in the server, so any MCP client can invoke them. They encode multi-step workflows with phase gates, e.g., build-deploy-app structures a 4-phase workflow (clone → scan → CI pipeline → deploy) with explicit "do not proceed until this step is done" checkpoints.
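A phase-gated template can be represented as plain data. This is an illustrative sketch of the structure, not the server's actual template format:

```typescript
// Sketch of a phase-gated workflow definition, as a prompt template like
// build-deploy-app might encode it. Each phase carries an explicit gate
// the agent must satisfy before moving on.

type Phase = {
  name: string;
  instruction: string;
  gate: string; // condition that must hold before the next phase runs
};

const buildDeployApp: Phase[] = [
  { name: "clone", instruction: "Clone the repository", gate: "repo cloned" },
  { name: "scan", instruction: "Run the security scan", gate: "no critical findings" },
  { name: "ci", instruction: "Run the CI pipeline", gate: "pipeline succeeded" },
  { name: "deploy", instruction: "Deploy to the target environment", gate: "deployment verified" },
];

// Render the template into phase-gated agent instructions.
function render(phases: Phase[]): string {
  return phases
    .map((p, i) =>
      `Phase ${i + 1} (${p.name}): ${p.instruction}.\n` +
      `Do not proceed until: ${p.gate}.`)
    .join("\n");
}

console.log(render(buildDeployApp));
```

Encoding the gates as data rather than prose is what lets the server enforce ordering instead of hoping the model remembers it.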
Level 3: Individual Skills (Slash Commands)
Specialized SKILL.md files that function as slash commands in the IDE. Each skill includes YAML frontmatter (trigger phrases, metadata), phased instructions, worked examples, performance notes, and troubleshooting steps.
- Pipeline & Execution: /create-pipeline, /run-pipeline, /debug-pipeline, /create-trigger, /create-template, /migrate-pipeline
- Infrastructure: /create-service, /create-environment, /create-infrastructure, /create-connector, /create-secret
- Access Control: /manage-users, /manage-roles
- Specialized: /analyze-costs, /audit-report, /chaos-experiment, /create-policy
The Interaction Pattern
Without skills, a developer says "deploy my Node.js app" and the agent has to figure out the right Harness concepts, the correct ordering, and the proper API calls from scratch. With skills, the flow is:
- IDE auto-loads shared instructions (tool reference, scoping rules, dependency ordering)
- Agent matches intent to a skill via trigger phrases in skill descriptions
- Skill provides ordered, phase-gated execution steps (what to check, what to ask, what to generate)
- MCP server executes the actual harness_list / harness_create / harness_execute calls
Performance Benefits
The skills layer delivers three measurable improvements:
Fewer Round-Trips
Without skills, the agent typically needs 3–5 exploratory tool calls to understand Harness's resource model before starting real work. Skills encode this knowledge upfront — the agent knows to check for existing connectors before creating a pipeline, to verify environments exist before deploying, and to use harness_describe for schema discovery instead of trial-and-error.
Correct Ordering on First Attempt
Harness resources have strict dependency chains (connector → secret → service → environment → infrastructure → pipeline → trigger). Skills encode the 7-step "Deploy New Service" and 8-step "New Project Onboarding" workflows as ordered sequences. The agent doesn't discover dependencies through failures; it follows the prescribed order.
Reduced Token Waste
Each failed API call and retry burns tokens. Skills eliminate the most common failure modes (wrong scope, missing dependencies, incorrect parameter formats) by teaching the agent the patterns before execution. The combination of 11 tools (minimal context overhead) plus skills (minimal wasted calls) means more of the context window is available for the developer's actual task.
IDE-Specific Integration
The first Harness MCP server (harness/mcp-server) pioneered the IDE-native pattern with a review-mcp-tool command that works across Cursor, Claude Code, and Windsurf via symlinked definitions:
- .claude/commands/ → Claude Code slash commands
- .cursor/commands/ → Cursor Agent commands
- .windsurf/workflows/ → Windsurf workflows
One canonical definition in .harness/commands/, symlinked to all three. Update once, propagate everywhere.
The v2 skills layer extends this pattern from developer-tool commands to full DevOps workflows: the same "define once, deploy to every IDE" architecture, applied to pipeline creation, deployment debugging, cost analysis, and security review.
Operational Safety: Designed for Production
MCP servers that can create, update, and delete resources need safety guardrails. We built them in from the start.
Human-in-the-loop confirmation: All write operations use MCP elicitation to request explicit user confirmation before executing. The agent presents what it intends to do; the developer approves or rejects.
Fail-closed destructive operations: harness_delete is blocked entirely if the MCP client doesn't support elicitation. No silent deletions.
Read-only mode: Set HARNESS_READ_ONLY=true for shared environments, demos, or when you want agents to observe but not act.
Secrets safety: The secret resource type exposes metadata (name, type, org, project) but never the secret value itself.
Rate limiting and retries: Configurable rate limits (default: 10 req/s), automatic retries with backoff for transient failures, and bounded pagination to prevent runaway list operations.
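The confirmation and fail-closed behavior can be sketched as a guard around the destructive operation. The function below is illustrative (names and return strings are assumptions, not the server's actual code), but it captures the invariant: no elicitation capability, no delete.

```typescript
// Sketch of the fail-closed delete guard. If the client cannot elicit a
// confirmation from the user, the delete is refused outright; read-only
// mode blocks it before confirmation is even attempted.

type Elicitor = (question: string) => Promise<"accept" | "decline">;

async function guardedDelete(
  resource: string,
  doDelete: () => Promise<void>,
  elicit: Elicitor | undefined, // undefined => client lacks elicitation
  readOnly: boolean,
): Promise<string> {
  if (readOnly) return "blocked: server is in read-only mode";
  if (!elicit) return "blocked: client does not support elicitation";
  const answer = await elicit(`Delete ${resource}? This cannot be undone.`);
  if (answer !== "accept") return "cancelled by user";
  await doDelete();
  return "deleted";
}

// Client without elicitation: the delete never runs.
guardedDelete("pipeline p1", async () => {}, undefined, false)
  .then((r) => console.log(r)); // "blocked: client does not support elicitation"
```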
Deployment: Local to Team-Scale
The v2 server supports two transports:
- Stdio (default): Direct integration with Claude Desktop, Cursor, Windsurf, Gemini CLI. Zero network configuration.
- Streamable HTTP: Session-based remote deployment. Kubernetes manifests included. Sessions reaped after 30 minutes. CORS restricted. Rate limited to 60 req/min per IP.
For team deployments, the HTTP transport is compatible with MCP gateways like Portkey, LiteLLM, and Envoy-based proxies, enabling shared control planes with centralized auth, observability, and policy enforcement.
```shell
# Local (Cursor, Claude Code)
npx harness-mcp-v2@latest

# Remote (team deployment)
npx harness-mcp-v2@latest http --port 3000

# Docker
docker run -e HARNESS_API_KEY=pat.xxx.yyy.zzz harness-mcp-v2
```
Why This Architecture Matters
The shift from 130+ tools to 11 isn't about simplification for its own sake. It's about recognizing that the best MCP servers are capability-oriented agent interfaces, not API mirrors.
Building the first Harness MCP server taught us the same lesson the broader ecosystem is learning: when you expose one tool per API endpoint, you're asking the LLM to be a routing layer. You're consuming context on definitions that could be used for reasoning. And you're fighting against the LLM's actual strengths (reasoning, planning, and multi-step problem solving) by forcing it to do something a switch statement does better. That first server made the cost concrete. The v2 is our answer.
The registry pattern inverts this. The tool vocabulary is stable: 11 verbs today, 11 verbs when Harness ships 50 more resource types. The registry is extensible. The skills layer is composable. The LLM reasons about what to do, and the server handles how to do it. That's not just an efficiency win — it's the correct division of labor between an LLM and a server.
This is the pattern we think more MCP servers should adopt, especially platforms with broad API surfaces. The MCP specification itself is built on the idea that servers expose capabilities, not endpoints. We took that literally.
Real Life Use Cases
The efficiency gains from the v2 architecture translate directly into concrete, time-saving use cases for developers operating within their IDEs. The combination of a minimal tool surface (11 tools), deep resource knowledge (125+ resource types), and pre-encoded workflows (Harness Skills) allows the agent to handle complex DevOps tasks with minimal guidance.
Some example use cases:
- Debug a Failed CI Pipeline: Get root cause and logs for a pipeline run.
- Onboard New Service: Create a Service, Environment, Infrastructure, and initial Connector.
- Review Cloud Cost Anomaly: Investigate a sudden spike in cloud spend.
- Check Compliance Status: Verify a service's SBOM compliance against OPA policies.
- Deploy App to Prod: Execute a canary deployment pipeline.
Get Started
```shell
npx harness-mcp-v2@latest
```
Configure with your Harness PAT (account ID is auto-extracted):
```shell
HARNESS_API_KEY=pat.<accountId>.<tokenId>.<secret>
```
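Because the PAT format embeds the account ID as its second dot-separated segment, the server can derive it without extra configuration. A minimal sketch (the function name is illustrative):

```typescript
// Extract the account ID from a Harness PAT of the form
// pat.<accountId>.<tokenId>.<secret>.

function accountIdFromPat(apiKey: string): string {
  const parts = apiKey.split(".");
  if (parts.length < 4 || parts[0] !== "pat") {
    throw new Error("expected a key of the form pat.<accountId>.<tokenId>.<secret>");
  }
  return parts[1];
}

console.log(accountIdFromPat("pat.abc123.tok456.s3cr3t")); // "abc123"
```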
Full source: github.com/thisrohangupta/harness-mcp-v2
Official Harness MCP Server: github.com/harness/mcp-server
---
FAQs
What is the Harness MCP server?
The Harness MCP server is an MCP-compatible server that lets AI agents interact with Harness resources using a small set of generic tools.
Why does MCP tool count matter?
Each exposed tool adds metadata to the model context. A smaller tool surface leaves more room for reasoning and task execution.
How is the Harness MCP server different from a traditional MCP server?
Instead of exposing one tool per API endpoint, it uses 11 generic tools plus a registry that maps resource types to the correct API operations.
Which AI clients work with the Harness MCP server?
The post mentions Cursor, Claude Code, Claude Desktop, Windsurf, Gemini CLI, and other MCP-compatible clients.
Is the Harness MCP server safe for production use?
The design includes write confirmations, fail-closed delete behavior, read-only mode, and controls for retries, rate limiting, and deployment transport.


