November 5, 2025

Running Chaos Engineering on GKE Autopilot Just Got Easier


Harness Chaos Engineering now runs natively on GKE Autopilot. A simple allowlist configuration enables you to test resilience on Google's managed Kubernetes without sacrificing security or requiring workarounds.

Google's GKE Autopilot provides fully managed Kubernetes without the operational overhead of node management, security patches, or capacity planning. However, running chaos engineering experiments on Autopilot has been challenging due to its security restrictions.

We've solved that problem.

Why This Matters

Chaos engineering helps you identify issues before they impact your users. The approach involves intentionally introducing controlled failures to understand how your system responds. Think of it as a fire drill for your infrastructure.

GKE Autopilot hardens clusters by restricting many permissions. That is excellent for security, but it made running chaos experiments difficult: you couldn't simply deploy Harness Chaos Engineering and begin testing.

That changes today.

What Changed

We collaborated with Google to add Harness Chaos Engineering to GKE Autopilot's official allowlist. This integration enables Harness to run chaos experiments while operating entirely within Autopilot's security boundaries.

No workarounds required. Just chaos engineering that works as expected.

How to Set It Up

1. Apply the Allowlist

First, you need to tell GKE Autopilot that Harness chaos workloads are okay to run. Copy and run this command against your cluster:

kubectl apply -f - <<'EOF'
apiVersion: auto.gke.io/v1
kind: AllowlistSynchronizer
metadata:
  name: harness-chaos-allowlist-synchronizer
spec:
  allowlistPaths:
  - Harness/allowlists/chaos/v1.62/*
  - Harness/allowlists/service-discovery/v0.42/*
EOF
  

Then wait for it to be ready:

kubectl wait --for=condition=Ready allowlistsynchronizer/harness-chaos-allowlist-synchronizer --timeout=60s
  

That's it for the cluster configuration.
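If the wait times out, or you just want to double-check before moving on, a quick inspection can help. The snippet below is a minimal sketch, not an official Harness or Google procedure: `check_allowlist` is a hypothetical helper name, and the second command assumes the synced allowlists appear in the cluster as `workloadallowlist` resources.

```shell
# Hypothetical helper: inspect the synchronizer created in step 1.
# Pass a different name as the first argument if you renamed it.
check_allowlist() {
  sync_name="${1:-harness-chaos-allowlist-synchronizer}"

  # Print the synchronizer, including its status conditions.
  kubectl get allowlistsynchronizer "$sync_name" -o yaml || return 1

  # List the allowlists installed in the cluster.
  # Assumption: they are exposed as WorkloadAllowlist resources.
  kubectl get workloadallowlist
}
```

Running `check_allowlist` with no arguments uses the synchronizer name from the command above. If the synchronizer never reports Ready, check the two paths under `allowlistPaths` for typos first.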

2. Enable Autopilot Mode in Harness

Next, configure Harness to work with GKE Autopilot. You have several options:

If you're setting up chaos for the first time, just use the 1-click chaos setup and toggle on "Use static name for configmap and secret" during setup.

If you already have infrastructure configured, go to Chaos Engineering > Environments, find your infrastructure, and enable that same toggle.

You can also set this up when creating a new discovery agent, or update an existing one in Project Settings > Discovery.

What You Can Test

The integration supports most of the chaos experiments you'd expect:

Resource stress: Pod CPU Hog, Pod Memory Hog, Pod IO Stress, Disk Fill. These experiments help you understand how your pods behave under resource constraints.

Network chaos: Pod Network Latency, Pod Network Loss, Pod Network Corruption, Pod Network Duplication, Pod Network Partition, Pod Network Rate Limit. Production networks experience imperfections, and your application needs to handle them gracefully.

DNS problems: Pod DNS Error to disrupt resolution, Pod DNS Spoof to redirect traffic.

HTTP faults: Pod HTTP Latency, Pod HTTP Modify Body, Pod HTTP Modify Header, Pod HTTP Reset Peer, Pod HTTP Status Code. These experiments test how your APIs respond to unexpected behavior.

API-level chaos: Pod API Block, Pod API Latency, Pod API Modify Body, Pod API Modify Header, Pod API Status Code. Good for testing service mesh and gateway behavior.

File system chaos: Pod IO Attribute Override, Pod IO Error, Pod IO Latency, Pod IO Mistake. These experiments reveal how your application handles storage issues.

Container lifecycle: Container Kill and Pod Delete to test recovery. Pod Autoscaler to see if scaling works under pressure.

JVM chaos if you're running Java: Pod JVM CPU Stress, Pod JVM Method Exception, Pod JVM Method Latency, Pod JVM Modify Return, Pod JVM Trigger GC.

Database chaos for Java apps: Pod JVM SQL Exception, Pod JVM SQL Latency, Pod JVM Mongo Exception, Pod JVM Mongo Latency, Pod JVM Solace Exception, Pod JVM Solace Latency.

Cache problems: Redis Cache Expire, Redis Cache Limit, Redis Cache Penetration.

Time manipulation: Time Chaos to introduce controlled time offsets.

What This Means for You

If you're running GKE Autopilot and want to implement chaos engineering with Harness, you can now do both without compromise. There's no need to choose between Google's managed experience and resilience testing.

For teams new to chaos engineering, Autopilot provides an ideal starting point. The managed environment reduces infrastructure complexity, allowing you to focus on understanding application behavior under stress.

Getting Started

Start with a simple CPU stress test. Select a non-critical pod and run a low-intensity Pod CPU Hog experiment in Harness. Observe the results: Does your application degrade gracefully? Do your alerts trigger as expected? Does it recover when the experiment completes?
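While the experiment runs, it can help to watch the target pod from a terminal alongside the Harness UI. Here is a rough sketch that samples CPU and memory usage and the container restart count at a fixed interval. The function name `watch_pod_during_chaos` and the namespace and pod names in the usage note are placeholders, not anything Harness provides.

```shell
# Hypothetical helper: sample a pod's resource usage and restart count.
# Arguments: namespace, pod name, number of samples (default 5),
# seconds between samples (default 10).
watch_pod_during_chaos() {
  ns="$1"
  pod="$2"
  samples="${3:-5}"
  interval="${4:-10}"

  i=0
  while [ "$i" -lt "$samples" ]; do
    # Current CPU and memory usage, as reported by the cluster's metrics API.
    kubectl top pod "$pod" -n "$ns" --no-headers

    # Restart count for each container in the pod.
    kubectl get pod "$pod" -n "$ns" \
      -o jsonpath='{.status.containerStatuses[*].restartCount}{"\n"}'

    i=$((i + 1))
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, `watch_pod_during_chaos demo my-app-pod 6 10` takes six samples over roughly a minute. A rising restart count during a low-intensity CPU hog suggests the pod is not degrading gracefully.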

Start small, understand your system's behavior, then explore more complex scenarios.

You can configure Service Discovery to visualize your services in Application Maps, add probes to validate resilience during experiments, and progressively explore more sophisticated fault injection scenarios.

Check out the documentation for the complete setup guide and all supported experiments.

The goal of chaos engineering isn't to break things. It's to understand what breaks before it impacts your users.

Ashutosh Bhadauriya

Senior Developer Relations Engineer

Matt Schillerstrom

Matt Schillerstrom is a Product Marketing Manager at Harness, specializing in Feature Management, Chaos Engineering, Database DevOps, and AI-native DevOps. With over two decades of experience in DevOps and reliability practices, Matt helps DevOps engineering and SRE teams adopt modern delivery workflows built on governance, automation, and resilience. His work bridges technical depth and business impact to drive software reliability at scale.
