Harness Blog

March 30, 2026

On March 19th, the risks of running open execution pipelines — where what code runs in your CI/CD environment is largely uncontrolled — went from theoretical to catastrophic.

A threat actor known as TeamPCP compromised the GitHub Actions supply chain at a scale we haven't seen before (tracked as CVE-2026-33634, CVSS 9.4). They compromised Trivy, the most widely used vulnerability scanner in the cloud-native ecosystem, and turned it into a credential-harvesting tool that ran inside victims' own pipelines.

Between March 19 and March 24, 2026, organizations running affected tag-based GitHub Actions references were sending their AWS tokens, SSH keys, and Kubernetes secrets directly to the attacker. SANS Institute estimates over 10,000 CI/CD workflows were directly affected. According to multiple security research firms, the downstream exposure extends to tens of thousands of repositories and hundreds of thousands of accounts.

Five ecosystems. Five days. One stolen Personal Access Token.

This is a fundamental failure of the open execution pipeline model — where what runs in your pipeline is determined by external references to public repositories, mutable version tags, and third-party code that executes with full privileges. GitHub Actions is the most prominent implementation. 

The alternative, governed execution pipelines, where what runs is controlled through policy gates, customer-owned infrastructure, scoped credentials, and immutable references, is the model we designed Harness around years ago, precisely because we saw this class of attack coming.

Part I: The Long Road to TeamPCP (2025–2026)

TeamPCP wasn't an anomaly; it was the inevitable conclusion of a twelve-month escalation in CI/CD attack tactics.

1. The tj-actions Proof of Concept (March 2025)

CVE-2025-30066. Attackers compromised a PAT from an upstream dependency (reviewdog/action-setup) and force-pushed malicious code to every single version tag of tj-actions/changed-files. 23,000 repositories were exposed. The attack was later connected to a targeted campaign against Coinbase. CISA issued a formal advisory.

This proved that the industry's reliance on mutable tags (like @v2) was a serious structural vulnerability. According to Wiz, only 3.9% of repositories pin to immutable SHAs. The other 96% are trusting whoever owns the tag today.

2. The Shai-Hulud Worm (Sept–Nov 2025)

The first self-replicating worm in the CI/CD ecosystem. Shai-Hulud 2.0 backdoored 796 npm packages representing over 20 million weekly downloads — including packages from Zapier, PostHog, and Postman. 

It used TruffleHog to harvest 800+ credential types, registered compromised machines as self-hosted GitHub runners named SHA1HULUD for persistent C2 over github.com, and built a distributed token-sharing network where compromised machines could replace each other's expired credentials.

PostHog's candid post-mortem revealed that attackers stole their GitHub bot's PAT via a pull_request_target workflow exploit, then used it to steal npm publishing tokens from CI runner secrets. Their admission that this kind of attack "simply wasn't something we'd prepared for" reflects the industry-wide gap between application security and CI/CD security maturity. CISA issued another formal advisory.

3. The Trivy Compromise (March 19, 2026)

TeamPCP went after the security tools themselves.

They exploited a misconfigured GitHub Actions workflow to steal a PAT from Aqua Security's aqua-bot service account. Aqua detected the breach and initiated credential rotation — but reporting suggests the rotation did not fully cut off attacker access. TeamPCP appears to have retained or regained access to Trivy's release infrastructure, enabling the March 19 attack weeks after initial detection.

On March 19, they force-pushed a malicious "Cloud Stealer" to 76 of 77 version tags in trivy-action and all 7 tags in setup-trivy. Simultaneously, they published an infected Trivy binary (v0.69.4) to GitHub Releases and Docker Hub. Every pipeline referencing those tags by name started executing the attacker's code on its next run. No visible change to the release page. No notification. No diff to review.

Part II: Inside the "Cloud Stealer" Tradecraft

TeamPCP's payload was purpose-built for CI/CD runner environments:

Memory Scraping. It read /proc/*/mem to extract decrypted secrets held in RAM. GitHub's log-masking can't hide what's in process memory.

Cloud Metadata Harvesting. It queried the AWS Instance Metadata Service (IMDS) at 169.254.169.254, pivoting from "build job" to full IAM role access in the cloud.

Filesystem Sweep. It searched over 50 specific paths — .env files, .aws/credentials, .kube/config, SSH keys, GPG keys, Docker configs, database connection strings, and cryptocurrency wallet keys.

Encrypted Exfiltration. All data was bundled into tpcp.tar.gz, encrypted with AES-256 and RSA-4096, and sent to typosquatted domains like scan.aquasecurtiy[.]org (note the "tiy"). These domains returned clean verdicts from threat intelligence feeds during the attack. As a fallback, the stealer created public GitHub repos named tpcp-docs under the victim's own account.
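Reputation feeds cleared those typosquat domains, but comparing egress destinations against a list of expected vendor domains would have flagged them. Here's a minimal sketch of that idea; the domain list and similarity threshold are illustrative, not from any real product:

```python
import difflib

# Illustrative list of legitimate vendor domains a CI/CD egress monitor
# might trust; a real deployment would load this from configuration.
KNOWN_DOMAINS = ["aquasecurity.org", "checkmarx.com", "litellm.ai"]

def looks_like_typosquat(host: str, threshold: float = 0.85) -> bool:
    """Flag hosts suspiciously similar to -- but not exactly -- a known
    vendor domain. Reputation feeds missed these during the attack;
    a simple similarity check against your own list does not."""
    # Strip subdomains so scan.aquasecurtiy.org compares as aquasecurtiy.org
    base = ".".join(host.split(".")[-2:])
    for known in KNOWN_DOMAINS:
        if base == known:
            return False  # exact match: legitimate vendor domain
        if difflib.SequenceMatcher(None, base, known).ratio() >= threshold:
            return True   # near match: likely typosquat
    return False

print(looks_like_typosquat("scan.aquasecurtiy.org"))  # transposed "it" -> True
print(looks_like_typosquat("scan.aquasecurity.org"))  # real domain -> False
```

This is detection after the fact; as we'll argue later, blocking unknown egress outright is the stronger control.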

The malicious payload executed before the legitimate Trivy scan. Pipelines appeared to work normally. CrowdStrike noted: "To an operator reviewing workflow logs, the step appears to have completed successfully."

The Five-Day Cascade

Date | Target | Impact
March 19 | Trivy | 10,000+ workflows affected; CVE-2026-33634 (CVSS 9.4).
March 20 | npm | CanisterWorm deployed; 50+ packages backdoored.
March 22 | Aqua Security | Internal GitHub org hijacked; 44 repos exposed in a 2-minute burst.
March 23 | Checkmarx | KICS AST and KICS Actions poisoned; VS Code extensions trojanized.
March 24 | LiteLLM | PyPI packages poisoned; malware ran every time Python started via .pth hooks.

Sysdig observed that the vendor-specific typosquat domains were a deliberate deception — an analyst reviewing CI/CD logs would see traffic to what appears to be the vendor's own domain. 

It took Aqua five days to fully evict the attacker, during which TeamPCP pushed additional malicious Docker images (v0.69.5 and v0.69.6).

Part III: Why Open Execution Pipelines Break at Scale

Why did this work so well? Because GitHub Actions is the leading example of an open execution pipeline — where what code runs in your pipeline is determined by external references that anyone can modify.

This trust problem isn't new. Jenkins had a similar issue with plugins: third-party code ran with full process privileges. But Jenkins ran inside your firewall, so exfiltrating data required getting past your network perimeter. 

GitHub Actions took the same open execution approach but moved execution to cloud-hosted runners with broad internet egress, making exfiltration trivially easy. TeamPCP's Cloud Stealer just needed to make an HTTPS POST to an external domain, which runners are designed to do freely. 

Here are a few reasons why open execution pipelines break at scale:

Mutable Trust. When you use @v2, you are trusting a pointer, not a piece of code. Tags can be silently redirected by anyone with write access. TeamPCP rewrote 76 tags in a single operation. 96% of the ecosystem is exposed.

Flat Privileges. Third-party Actions run with the same permissions as your code. No sandbox. No permission isolation. This is why TeamPCP targeted security scanners — tools that by design have elevated access to your pipeline infrastructure. The attacker doesn't need to break in. The workflow invites them in.

Secret Sprawl. Secrets are typically injected into the runner's environment or process memory during job execution, where they remain accessible for the job's duration. TeamPCP's /proc/*/mem scraper didn't need any special privilege. It just needed to be running on the same machine.
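To see why "no special privilege" matters, consider that any step in the job can enumerate the runner's environment and pattern-match credential shapes. The credential value below is fabricated, and the patterns are a tiny, illustrative subset of what harvesting tools match:

```python
import re

# A few common credential shapes; illustrative, nowhere near exhaustive.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"^AKIA[0-9A-Z]{16}$"),
    "github_pat":     re.compile(r"^ghp_[A-Za-z0-9]{36}$"),
}

def sweep_environment(env: dict) -> list:
    """Any code running in the job -- a build step, a test hook, a
    compromised scanner -- can do exactly this. No privilege required."""
    hits = []
    for name, value in env.items():
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.match(value):
                hits.append((name, label))
    return hits

# Simulated runner environment with a bulk-injected, fabricated credential.
fake_env = {"PATH": "/usr/bin", "AWS_ACCESS_KEY_ID": "AKIA" + "A" * 16}
print(sweep_environment(fake_env))  # -> [('AWS_ACCESS_KEY_ID', 'aws_access_key')]
```

The memory scraper worked the same way, just reading /proc instead of the environment.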

Unbounded Credential Cascades. There is no architectural boundary that stops a credential stolen in one context from unlocking another. TeamPCP proved this definitively: Trivy → Checkmarx → LiteLLM → AI API keys across thousands of enterprises. One PAT, five ecosystems.

Part IV: Governed Execution Pipelines — Three Structural Walls

Harness CI/CD pipelines are built as governed execution pipelines — where what runs is controlled through customer-owned infrastructure, policy gates, scoped credentials, immutable references, and explicit trust boundaries. At its core is the Delegate — a lightweight worker process that runs inside your infrastructure (your VPC, your Kubernetes cluster), executes tasks locally, and communicates with the Harness control plane via outbound-only connections.

When we designed this architecture, we assumed the execution plane would become the primary target in the enterprise. If TeamPCP tried to attack a Harness-powered environment, they would hit three architectural walls.

Wall 1: The Airlock (Outbound-Only, Egress-Filtered Execution)

The Architecture. 

The Delegate lives inside your VPC or cluster. It communicates with our SaaS control plane via outbound-only HTTPS/WSS. No inbound ports are opened.

The Defense. 

You control the firewall. Allowlist app.harness.io and the specific endpoints your pipelines need, deny everything else. TeamPCP's exfiltration to typosquat domains would fail at the network layer — not because of a detection rule, but because the path doesn't exist. Both typosquat domains returned clean verdicts from threat intel feeds. Egress filtering by allowlist is more reliable than detection by reputation.
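What Wall 1 enforces reduces to a few lines of default-deny logic. In this sketch, app.harness.io comes from the discussion above; the registry entry is an illustrative example of a pipeline-specific endpoint you might allow:

```python
from urllib.parse import urlparse

# Example allowlist an egress proxy or firewall policy might enforce.
EGRESS_ALLOWLIST = {"app.harness.io", "registry.npmjs.org"}

def egress_allowed(url: str) -> bool:
    """Default-deny: a destination is reachable only if its hostname is
    explicitly allowlisted. Reputation is never consulted, so a 'clean'
    typosquat domain is blocked exactly like a known-bad one."""
    host = urlparse(url).hostname or ""
    return host in EGRESS_ALLOWLIST

print(egress_allowed("https://app.harness.io/gateway"))        # True
print(egress_allowed("https://scan.aquasecurtiy.org/upload"))  # False
```

In practice this lives in your firewall or egress proxy rather than application code, but the decision logic is the same.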

Wall 2: The Vault (Secret Isolation at the Source)

The Architecture.

Rather than bulk-injecting secrets as flat environment variables at job start, Harness can resolve secrets at runtime through your secret manager — HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault — via the Delegate, inside your network. Harness SaaS stores encrypted references and metadata, not plaintext secret values.

The Defense.

TeamPCP's Cloud Stealer worked because in an open execution pipeline, secrets are typically injected into the runner's process memory where they remain accessible for the job's duration. In a governed execution pipeline, this exposure is structurally reduced: secrets can be resolved from your controlled vault at the point they're needed, rather than broadcast as environment variables to every step in the pipeline.

An important caveat: Vault-based resolution alone doesn't eliminate runtime exfiltration. Once a secret is resolved and passed to a step that legitimately needs it — say, an npm token during npm publish — that secret exists in the step's runtime. If malicious code is executing in that same context (for example, a tampered package.json that exfiltrates credentials during npm run test), the secret is exposed regardless of where it came from. This is why the three walls work as a system: Wall 2 reduces the surface of secret exposure, Wall 1 blocks the exfiltration path, and (as we'll see) Wall 3 limits the blast radius to the scoped environment. No single wall is sufficient on its own.

To further strengthen how pipelines use secrets, leverage ephemeral credentials — AWS STS temporary tokens, Vault dynamic secrets, or GCP short-lived service account tokens — that auto-expire after a defined window, often minutes. Even if TeamPCP’s memory scraper extracted an ephemeral credential, it likely would have expired before the attacker could pivot to the next target.

Wall 3: The Dead End (Environment-Scoped Isolation)

The Architecture. 

Harness supports environment-scoped delegates as a core architecture pattern. Your "Dev" scanner delegate runs in a different cluster, with different network boundaries and different credentials, than your "Prod" deployment delegate.

The Defense. 

The credential cascade that defined TeamPCP hits a dead end. Stolen Dev credentials cannot reach Production publishing gates or AI API keys, because those credentials live in a different vault, resolved by a different delegate, in a different network segment. If the Trivy compromise only yielded credentials scoped to a dev environment, the attack stops at phase one.

Beyond the walls, governed execution pipelines provide additional structural controls:

  • No default marketplace dependency: In GitHub Actions, the primary building block is a reference to an external Action in a public repository. In Harness, the primary building blocks are native pipeline steps that don't reference external Git repos. Harness does support running GitHub Actions as steps for teams that need compatibility, but external Actions are an optional path — not the default architecture.
  • Reduced tooling and attack surface: Customers can use minimal delegate images with a significantly reduced binary footprint and least-privilege Kubernetes roles to restrict available tooling. TeamPCP's kubectl get secrets --all-namespaces would require tooling and permissions that a properly hardened delegate environment wouldn't provide.

The Comparison

Dimension | Open Execution (e.g., GitHub Actions) | Governed Execution (Harness)
Trust Source | External repos, public authors, and mutable tags. | Internal policy, customer-owned infrastructure, and governed configs.
Secret Delivery | Bulk-injected as environment variables at job start. | Resolved from your Vault/KMS by the Delegate at execution time.
Network Model | Bidirectional with broad egress from cloud runners. | Outbound-only with strict egress allowlisting.
Environment Isolation | Optional and typically manually configured. | Separate Delegates per environment supported as a core architecture pattern.
Runner Persistence | Self-hosted runners may persist between jobs. | Ephemeral execution patterns and minimal images reduce persistence risk.
Governance | SHA pinning is manual; 96% of the ecosystem remains unpinned. | Native steps aren't pulled from external Git repos by default.
Credential Rotation | Manual, often incomplete. | Customer-managed vault integration with delegate-scoped access narrows blast radius.
Credential Lifetime | Typically long-lived static secrets. | Supports ephemeral credentials (AWS STS, Vault dynamic secrets, GCP temporary tokens) that auto-expire after job completion.

What TeamPCP Actually Exploited — Mapped to Harness Defenses

Attack Vector | TeamPCP / Shai-Hulud Method | Governed Pipeline Defense (Harness)
Tag Poisoning | Force-pushed malicious code to 76 of 77 version tags in trivy-action, affecting all pipelines using @v2. | No Default Marketplace Dependency: Native steps do not reference external Git repos by default, eliminating reliance on mutable third-party tags.
Secret Harvesting | Scraped /proc/*/mem and queried AWS IMDS (169.254.169.254) to extract decrypted secrets from runner memory. | Vault-Based Resolution: Secrets are resolved at execution time via the Delegate from your Vault/KMS rather than bulk-injected as environment variables.
Lateral Movement | Used stolen Trivy PATs to poison Checkmarx, then LiteLLM, allowing one credential to unlock five ecosystems. | Delegate Scoping: Environment-scoped delegates ensure Dev credentials cannot reach Production publishing gates across network boundaries.
Persistence | Installed malicious .pth hooks in Python and registered compromised machines as persistent SHA1HULUD runners. | Ephemeral Execution: Minimal images with reduced binary footprints and auto-scaling delegates significantly reduce persistence opportunities.
Network Deception | Sent data to typosquatted domains (e.g., aquasecurtiy[.]org) that passed standard reputation checks. | Egress Allowlisting: Outbound-only traffic restricted to your specific VPC endpoints; unknown domains are blocked at the network layer.
Worm Propagation | Shai-Hulud self-replicated via stolen npm tokens; CanisterWorm spread via stolen SSH keys and K8s APIs. | Secret Isolation: Publishing tokens are not exposed as env vars, and minimal delegate images resist worm installation and persistence.
Rotation Gap | Retained access during an incomplete credential rotation window. | Vault Integration + Ephemeral Credentials: Delegate-scoped access narrows blast radius, and ephemeral credentials (AWS STS, Vault dynamic secrets) auto-expire after job completion, limiting a stolen credential's lifetime to minutes regardless of rotation hygiene.

Part V: The Nuance — Governed Doesn't Mean Automatically Safe

Architecture is a foundation, not a guarantee. Governed execution pipelines are materially safer against this class of attack, but you can still create avoidable risk by running unvetted containers on delegates, skipping egress filtering, using the same delegate across dev and prod, granting overly broad cloud access, exposing excessive secrets to jobs that don't need them, or using long-lived static credentials when ephemeral alternatives exist.

I am not claiming that Harness is safe and GitHub Actions is unsafe. That would be too simplistic. 

What I am claiming is that governed execution pipelines — where what runs is controlled through policy gates, customer-owned infrastructure, scoped credentials, and immutable references — are a materially safer foundation than open execution pipelines. We designed Harness as our implementation of a governed execution pipeline. But architecture is a starting point — you still have to operate it well.

Part VI: The Strategic Bottom Line — From Open to Governed

As we enter the era of Agentic AI — where AI is generating pipelines, suggesting dependencies, and submitting pull requests at machine speed — we can no longer rely on human review to catch a malicious tag in an AI-generated PR.

But there's a more fundamental shift: AI agents will become the primary actors inside CI/CD pipelines. Not just generating code — autonomously executing tasks, selecting dependencies, making deployment decisions, remediating incidents.

Now imagine an AI agent in an open execution pipeline — downloaded from a public marketplace, referenced by a mutable tag, executing with full privileges, making dynamic runtime decisions you didn't define. It has access to your secrets, your cloud credentials, and your deployment infrastructure. Unlike a static script, an agent makes decisions at runtime — fetching resources, calling APIs, modifying files.

If TeamPCP showed us what happens when a static scanner is compromised, imagine what happens when an autonomous AI agent is compromised — or simply makes a decision you didn't anticipate.

This is why governed execution pipelines aren't just a security improvement — they're an architectural prerequisite for the AI era. In a governed pipeline, even an AI agent operates within structural boundaries: it runs on infrastructure you control, accesses only scoped secrets, has restricted egress, and its actions are audited. The agent may be autonomous, but the pipeline constrains what it can reach.

The questions every engineering leader should be asking:

  1. Is my pipeline open or governed? Do I control what code executes, or is it determined by external references I don't audit?
  2. Where does execution happen? In infrastructure I control, or in an environment assembled from public dependencies?
  3. Who controls the network boundary? My security team, or the maintainer of a third-party Action?
  4. Are secrets sitting in runner memory or safely in my Vault?
  5. What stops a credential cascade from crossing environment boundaries?
  6. When AI agents start running autonomously in my pipelines, what structural boundaries constrain them?

What You Should Do Right Now

If you use Trivy, Checkmarx, or LiteLLM

  • Assume compromise if you ran any of these tools between March 19 and 25. Rotate all credentials accessible to affected CI/CD runners. Check your GitHub org for repos named tpcp-docs — their presence indicates successful exfiltration. 
  • Block scan.aquasecurtiy[.]org, checkmarx[.]zone, and models.litellm[.]cloud at the network level.
  • Update to safe versions: check each vendor's advisory for patched versions, then update both the scanner binaries and the Actions that invoke them.

If you use GitHub Actions

  • Pin every Action to an immutable commit SHA. Today. 
  • Add provenance verification: To close the gap left by SHA pinning alone, verify the Action’s source and publisher, restrict which external Actions are allowed, and prefer artifacts with verifiable provenance or attestations.
  • Audit workflows for pull_request_target triggers. 
  • Enforce Least Privilege on GitHub Tokens: Audit every Personal Access Token and GitHub App permission. If it’s not scoped to the specific repository and the specific task (e.g., "contents: read"), it is a liability.
  • Monitor egress for unexpected destinations: Domain reputation alone is insufficient.
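The first few checks above can be partially automated. Here's a rough audit sketch that treats any non-40-hex-character ref as mutable and flags pull_request_target triggers; it assumes workflows are plain text, and a production tool should parse the YAML properly:

```python
import re

# Matches `uses: owner/repo@ref` lines in a workflow file. A 40-hex-char
# ref is an immutable commit SHA; anything else (v2, main, a tag) is mutable.
USES_RE = re.compile(r"uses:\s*([\w./-]+)@([\w./-]+)")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")

def audit_workflow(text: str) -> dict:
    """Return unpinned Action references and whether the risky
    pull_request_target trigger appears anywhere in the workflow."""
    findings = {"unpinned": [], "pull_request_target": False}
    for action, ref in USES_RE.findall(text):
        if not SHA_RE.match(ref):
            findings["unpinned"].append(f"{action}@{ref}")
    if "pull_request_target" in text:
        findings["pull_request_target"] = True
    return findings

# Hypothetical workflow: one tag-pinned Action, one SHA-pinned Action.
sample = """
on: [pull_request_target]
jobs:
  scan:
    steps:
      - uses: aquasecurity/trivy-action@0.28.0
      - uses: actions/checkout@8f4b7f84864484a7bf31766abe9204da3cbe65b3
"""
print(audit_workflow(sample))
```

Run it over .github/workflows/*.yml and treat every unpinned hit as a work item, not a warning.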

For the longer term

  • Evaluate whether your CI/CD pipelines are open or governed. If production credentials flow through your pipelines, you need a governed execution pipeline where you control the infrastructure, the network boundary, the secret resolution, and the audit trail.
  • Establish policies: Implement platform-wide automated governance to enforce SHA pinning and least-privilege token usage programmatically, through policy engines like Open Policy Agent (OPA).

The Responsibility We Share

I'm writing this as the CEO of a company that competes with GitHub in the CI/CD space. I want to be transparent about that.

But I'm also writing this as someone who has spent two decades building infrastructure software and who saw this threat model coming. When we designed Harness, the open execution pipeline model had already evolved from Jenkins plugins to GitHub Actions — each generation making it easier for third-party code to run with full privileges and, by moving execution further from the customer's network perimeter, making exfiltration easier. We deliberately chose to build governed execution pipelines instead.

The TeamPCP campaign didn't teach us anything new about the risk. What it did was make the difference between open and governed execution impossible for the rest of the industry to ignore.

Open source security tools are invaluable. The developers and companies who build them — including Aqua Security and Checkmarx — are doing essential work. The problem isn't the tools. The problem is running them inside open execution pipelines where third-party code has full privileges, secrets sit in memory, and exfiltration faces no structural barrier.

If you want to explore how the delegate architecture works in practice, we're here to show you. But more importantly, regardless of what platform you choose, please take these structural questions seriously. The next TeamPCP is already studying the credential graph.

March 11, 2026

Over the last few years, something fundamental has changed in software development.

If the early 2020s were about adopting AI coding assistants, the next phase is about what happens after those tools accelerate development. Teams are producing code faster than ever. But what I’m hearing from engineering leaders is a different question:

What’s going to break next?

That question is exactly what led us to commission our latest research, State of DevOps Modernization 2026. The results reveal a pattern that many practitioners already sense intuitively: faster code generation is exposing weaknesses across the rest of the software delivery lifecycle.

In other words, AI is multiplying development velocity, but it’s also revealing the limits of the systems we built to ship that code safely.

The Emerging “Velocity Paradox”

One of the most striking findings in the research is something we’ve started calling the AI Velocity Paradox, a term we coined in our 2025 State of Software Engineering Report.

Teams using AI coding tools most heavily are shipping code significantly faster. In fact, 45% of developers who use AI coding tools multiple times per day deploy to production daily or faster, compared to 32% of daily users and just 15% of weekly users.

At first glance, that sounds like a huge success story. Faster iteration cycles are exactly what modern software teams want.

But the data tells a more complicated story.

Among those same heavy AI users:

  • 69% report frequent deployment problems when AI-generated code is involved
  • Incident recovery times average 7.6 hours, longer than for teams using AI less frequently
  • 47% say manual downstream work (QA, validation, remediation) has become more problematic

What this tells me is simple: AI is speeding up the front of the delivery pipeline, but the rest of the system isn’t scaling with it. It’s like running trains faster than the tracks were built to handle: friction builds, the ride gets bumpy, and it feels like we’re on the edge of disaster.

The result is friction downstream, more incidents, more manual work, and more operational stress on engineering teams.

Why the Delivery System Is Straining

To understand why this is happening, you have to step back and look at how most DevOps systems actually evolved.

Over the past 15 years, delivery pipelines have grown incrementally. Teams added tools to solve specific problems: CI servers, artifact repositories, security scanners, deployment automation, and feature management. Each step made sense at the time.

But the overall system was rarely designed as a coherent whole.

In many organizations today, quality gates, verification steps, and incident recovery still rely heavily on human coordination and manual work. In fact, 77% say teams often have to wait on other teams for routine delivery tasks.

That model worked when release cycles were slower.

It doesn’t work as well when AI dramatically increases the number of code changes moving through the system.

Think of it this way: If AI doubles the number of changes engineers can produce, your pipelines must either:

  • cut the risk of each change in half, or
  • detect and resolve failures much faster.

Otherwise, the system begins to crack under pressure. The burden often falls directly on developers to help deploy services safely, certify compliance checks, and keep rollouts continuously progressing. When failures happen, they have to jump in and remediate at any hour. 
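The arithmetic behind that pressure is simple. If incident load scales with change volume, per-change failure rate, and recovery time, then doubling throughput demands halving one of the other factors just to stay even. All numbers below are hypothetical:

```python
def expected_incident_hours(changes_per_week: int,
                            failure_rate: float,
                            hours_to_recover: float) -> float:
    """Illustrative model: weekly incident load as the product of change
    volume, per-change failure probability, and mean recovery time."""
    return changes_per_week * failure_rate * hours_to_recover

baseline    = expected_incident_hours(100, 0.05, 4)    # 20.0 hours/week
ai_doubled  = expected_incident_hours(200, 0.05, 4)    # 40.0 hours/week: system strains
compensated = expected_incident_hours(200, 0.025, 4)   # halve per-change risk -> 20.0 again
print(baseline, ai_doubled, compensated)
```

The same math works on the recovery-time term, which is why faster detection and rollback buy back as much capacity as safer changes do.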

These manual tasks, naturally, inhibit innovation and cause developer burnout. That’s exactly what the research shows.

Across respondents, developers report spending roughly 36% of their time on repetitive manual tasks like chasing approvals, rerunning failed jobs, or copy-pasting configuration.

As delivery speed increases, so does that operational load.

What Organizations Should Do Next

The good news is that this problem isn’t mysterious. It’s a systems problem. And systems problems can be solved.

From our experience working with engineering organizations, we've identified a few principles that consistently help teams scale AI-driven development safely.

1. Standardize delivery foundations

When every team builds pipelines differently, scaling delivery becomes difficult.

Standardized templates (or “golden paths”) make it easier to deploy services safely and consistently. They also dramatically reduce the cognitive load for developers.

2. Automate quality and security checks earlier

Speed only works when feedback is fast.

Automating security, compliance, and quality checks earlier in the lifecycle ensures problems are caught before they reach production. That keeps pipelines moving without sacrificing safety.

3. Build guardrails into the release process

Feature flags, automated rollbacks, and progressive rollouts allow teams to decouple deployment from release. That flexibility reduces the blast radius of new changes and makes experimentation safer.

It also allows teams to move faster without increasing production risk.
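Decoupling deployment from release can be as simple as a deterministic bucketing function. This sketch (the flag and user names are hypothetical) shows how code ships dark at 0% and ramps up without a redeploy:

```python
import hashlib

def flag_enabled(flag: str, user_id: str, rollout_pct: int) -> bool:
    """Deterministic percentage rollout: hash (flag, user) into a stable
    bucket 0-99. The same user always lands in the same bucket, so ramping
    the percentage only ever adds users, never flickers them in and out."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_pct

# Deployed but released to no one; ramp to 100% when metrics look healthy.
print(flag_enabled("new-checkout", "user-42", 0))    # False
print(flag_enabled("new-checkout", "user-42", 100))  # True
```

Real feature-management systems add targeting rules, audit trails, and kill switches on top, but the core mechanic is this bucketing.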

4. Remember measurement, not just automation

Automation alone doesn’t solve the problem. What matters is creating a feedback loop: deploy → observe → measure → iterate.

When teams can measure the real-world impact of changes, they can learn faster and improve continuously.

The Next Phase of AI in Software Delivery

AI is already changing how software gets written. The next challenge is changing how software gets delivered.

Coding assistants have increased development teams' capacity to innovate. But to capture the full benefit, the delivery systems behind them must evolve as well.

The organizations that succeed in this new environment will be the ones that treat software delivery as a coherent system, not just a collection of tools.

Because the real goal isn’t just writing code faster. It’s learning faster, delivering safer, and turning engineering velocity into better outcomes for the business.

And that requires modernizing the entire pipeline, not just the part where code is written.
