Automating Chaos Engineering with Terraform

Automating chaos engineering with Terraform eliminates manual setup across environments by enabling you to version control your entire chaos infrastructure, from service discovery to security governance policies. The Harness Terraform provider supports end-to-end automation including Kubernetes infrastructure setup, custom image registries, Git-based ChaosHub management, and granular security controls that ensure safe experiment execution in production. Start small with one environment, then scale gradually by adding governance rules and custom experiments as you learn what works for your systems.

Infrastructure as Code (IaC) has revolutionized how we manage and provision infrastructure. But what about chaos engineering? Can you automate the setup of your chaos experiments the same way you provision your infrastructure?

The answer is yes. In this guide, I'll walk you through how to integrate Harness Chaos Engineering into your infrastructure using Terraform, making it easier to maintain resilient systems at scale.

Why Automate Chaos Engineering?

Before diving into the technical details, let's talk about why this matters.

Managing chaos engineering manually across multiple environments is time-consuming and error-prone. You need to set up chaos infrastructure, configure service discovery, manage security policies, and keep all of it consistent across dev, staging, and production.

With Terraform, you can:

Version control your entire chaos engineering setup
Replicate configurations across environments reliably
Integrate chaos engineering into your existing IaC workflows
Collaborate with your team using familiar tools

What You Can Automate

The Harness Terraform provider lets you automate several key aspects of chaos engineering:

Infrastructure Setup - Enable chaos engineering on your existing Kubernetes clusters or provision new ones with chaos capabilities built in.

Service Discovery - Automatically detect services that can be targeted for chaos experiments, eliminating manual configuration.

Image Registries - Configure custom image registries for your chaos experiment workloads, giving you control over where container images are pulled from.

Security Governance - Define and enforce policies that control when and how chaos experiments can run, particularly important for production environments.

ChaosHub Management - Manage repositories of reusable chaos experiments, probes, and actions at the organization or project level.

Getting Started

Before you begin, make sure you have:

Terraform installed and configured
The Harness Terraform provider set up (see the official documentation)
A Kubernetes infrastructure where you want to enable chaos engineering

Currently, the Harness Terraform provider for chaos engineering supports Kubernetes infrastructures.
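
As a starting point, the provider setup mentioned above might look like the following minimal sketch. The variable names harness_account_id and harness_platform_api_key are placeholders chosen for this example, and the endpoint shown is the default SaaS gateway; check the provider documentation for the authentication options that fit your account.

```hcl
terraform {
  required_providers {
    harness = {
      source = "harness/harness"
    }
  }
}

# Hypothetical input variables; the names are illustrative only.
variable "harness_account_id" {
  description = "Harness account identifier"
  type        = string
}

variable "harness_platform_api_key" {
  description = "Harness API key or service account token"
  type        = string
  sensitive   = true
}

provider "harness" {
  endpoint         = "https://app.harness.io/gateway"
  account_id       = var.harness_account_id
  platform_api_key = var.harness_platform_api_key
}
```

Marking the API key as sensitive keeps it out of plan output. It still ends up in state, so the state backend should be access-controlled.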

Building Your Configuration

Let's walk through the key resources you'll need.

Setting Up Common Configuration

Start by defining common variables that will be used across all your resources:

locals {
  org_id = var.org_identifier != null ? var.org_identifier : harness_platform_organization.this[0].id
  
  project_id = var.project_identifier != null ? var.project_identifier : (
    var.org_identifier != null ? "${var.org_identifier}_${replace(lower(var.project_name), " ", "_")}" : 
    "${harness_platform_organization.this[0].id}_${replace(lower(var.project_name), " ", "_")}"
  )
  
  common_tags = merge(
    var.tags,
    {
      "module" = "harness-chaos-engineering"
    }
  )
  
  tags_set = [for k, v in local.common_tags : "${k}=${v}"]
}

This approach keeps your configuration DRY and makes it easy to reference organization and project identifiers throughout your setup.
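
The locals above read several input variables that the post does not show. A minimal sketch of how they could be declared follows; the descriptions and defaults are assumptions, not part of the original module. The null defaults are what make the `!= null` checks fall through to resource creation.

```hcl
# Assumed declarations for the variables referenced in the locals block.
variable "org_identifier" {
  description = "Identifier of an existing organization. Leave null to create one."
  type        = string
  default     = null
}

variable "org_name" {
  description = "Display name used when a new organization is created."
  type        = string
  default     = "Chaos Engineering"
}

variable "project_identifier" {
  description = "Identifier of an existing project. Leave null to create one."
  type        = string
  default     = null
}

variable "project_name" {
  description = "Display name used when a new project is created."
  type        = string
  default     = "Chaos Engineering"
}

variable "tags" {
  description = "Extra tags merged into local.common_tags."
  type        = map(string)
  default     = {}
}
```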

Creating Organization and Project

If you don't have an existing organization or project, Terraform can create them:

resource "harness_platform_organization" "this" {
  count       = var.org_identifier == null ? 1 : 0
  identifier  = replace(lower(var.org_name), " ", "_")
  name        = var.org_name
  description = "Organization for Chaos Engineering"
  tags        = local.tags_set
}

resource "harness_platform_project" "this" {
  depends_on = [harness_platform_organization.this]
  
  count       = var.project_identifier == null ? 1 : 0
  org_id      = local.org_id
  identifier  = local.project_id
  name        = var.project_name
  color       = var.project_color
  description = "Project for Chaos Engineering"
  tags        = local.tags_set
}

Setting Up Kubernetes Connector

Connect your Kubernetes cluster to Harness:

resource "harness_platform_connector_kubernetes" "this" {
  depends_on = [harness_platform_project.this]
  
  identifier = var.k8s_connector_name
  name       = var.k8s_connector_name
  org_id     = local.org_id
  project_id = local.project_id

  inherit_from_delegate {
    delegate_selectors = var.delegate_selectors
  }

  tags = local.tags_set
}

Creating Environment and Infrastructure

Set up your environment and infrastructure definition:

resource "harness_platform_environment" "this" {
  depends_on = [
    harness_platform_project.this,
    harness_platform_connector_kubernetes.this
  ]
  
  identifier = var.environment_identifier
  name       = var.environment_name
  org_id     = local.org_id
  project_id = local.project_id
  type       = "PreProduction"
  
  tags = local.tags_set
}

resource "harness_platform_infrastructure" "this" {
  depends_on = [
    harness_platform_environment.this,
    harness_platform_connector_kubernetes.this
  ]
  
  identifier      = var.infrastructure_identifier
  name            = var.infrastructure_name
  org_id          = local.org_id
  project_id      = local.project_id
  env_id          = harness_platform_environment.this.id
  deployment_type = var.deployment_type
  type            = "KubernetesDirect"

  yaml = <<-EOT
  infrastructureDefinition:
    name: ${var.infrastructure_name}
    identifier: ${var.infrastructure_identifier}
    orgIdentifier: ${local.org_id}
    projectIdentifier: ${local.project_id}
    environmentRef: ${harness_platform_environment.this.id}
    type: KubernetesDirect
    deploymentType: ${var.deployment_type}
    allowSimultaneousDeployments: false
    spec:
      connectorRef: ${var.k8s_connector_name}
      namespace: ${var.namespace}
      releaseName: release-${var.infrastructure_identifier}
  EOT

  tags = local.tags_set
}

Enabling Chaos Infrastructure

Now enable chaos engineering capabilities on your infrastructure:

resource "harness_chaos_infrastructure_v2" "this" {
  depends_on = [harness_platform_infrastructure.this]
  
  org_id         = local.org_id
  project_id     = local.project_id
  environment_id = harness_platform_environment.this.id
  infra_id       = harness_platform_infrastructure.this.id
  name           = var.chaos_infra_name
  description    = var.chaos_infra_description
  
  namespace  = var.chaos_infra_namespace
  infra_type = var.chaos_infra_type
  
  ai_enabled           = var.chaos_ai_enabled
  insecure_skip_verify = var.chaos_insecure_skip_verify
  
  service_account = var.service_account_name
  tags            = local.tags_set
}

Automating Service Discovery

Service discovery eliminates the need to manually register services for chaos experiments:

resource "harness_service_discovery_agent" "this" {
  depends_on = [harness_chaos_infrastructure_v2.this]
  
  name                   = var.service_discovery_agent_name
  org_identifier         = local.org_id
  project_identifier     = local.project_id
  environment_identifier = harness_platform_environment.this.id
  infra_identifier       = harness_platform_infrastructure.this.id
  installation_type      = var.sd_installation_type

  config {
    kubernetes {
      namespace = var.sd_namespace
    }
  }
}

Once deployed, the agent will automatically detect services running in your cluster, making them available for chaos experiments.

Configuring Custom Image Registries

For organizations that use private registries or have specific image sourcing requirements, you can configure custom image registries at both organization and project levels:

resource "harness_chaos_image_registry" "org_level" {
  depends_on = [harness_platform_organization.this]
  
  count = var.setup_custom_registry ? 1 : 0
  
  org_id = local.org_id
  
  registry_server  = var.registry_server
  registry_account = var.registry_account
  
  is_default          = var.is_default_registry
  is_override_allowed = var.is_override_allowed
  is_private          = var.is_private_registry
  secret_name         = var.registry_secret_name != "" ? var.registry_secret_name : null
  
  use_custom_images = var.use_custom_images
  dynamic "custom_images" {
    for_each = var.use_custom_images ? [1] : []
    content {
      log_watcher = var.log_watcher_image != "" ? var.log_watcher_image : null
      ddcr        = var.ddcr_image != "" ? var.ddcr_image : null
      ddcr_lib    = var.ddcr_lib_image != "" ? var.ddcr_lib_image : null
      ddcr_fault  = var.ddcr_fault_image != "" ? var.ddcr_fault_image : null
    }
  }
}

resource "harness_chaos_image_registry" "project_level" {
  depends_on = [harness_chaos_image_registry.org_level]
  
  count = var.setup_custom_registry ? 1 : 0
  
  org_id     = local.org_id
  project_id = local.project_id
  
  registry_server  = var.registry_server
  registry_account = var.registry_account
  
  is_default          = var.is_default_registry
  is_override_allowed = var.is_override_allowed
  is_private          = var.is_private_registry
  secret_name         = var.registry_secret_name != "" ? var.registry_secret_name : null
  
  use_custom_images = var.use_custom_images
  dynamic "custom_images" {
    for_each = var.use_custom_images ? [1] : []
    content {
      log_watcher = var.log_watcher_image != "" ? var.log_watcher_image : null
      ddcr        = var.ddcr_image != "" ? var.ddcr_image : null
      ddcr_lib    = var.ddcr_lib_image != "" ? var.ddcr_lib_image : null
      ddcr_fault  = var.ddcr_fault_image != "" ? var.ddcr_fault_image : null
    }
  }
}
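
Both registry resources rely on the same idiom: an optional string variable defaults to "" and the expression `var.x != "" ? var.x : null` turns the empty string into null so the provider treats the attribute as unset. A sketch of the matching variable declarations, with defaults that are assumptions for illustration:

```hcl
# Assumed declarations; only the variable names come from the resources above.
variable "setup_custom_registry" {
  description = "Create the org-level and project-level registry resources."
  type        = bool
  default     = false
}

variable "registry_secret_name" {
  description = "Image pull secret name. Empty string means no secret."
  type        = string
  default     = ""
}

variable "use_custom_images" {
  description = "Enable the custom_images block."
  type        = bool
  default     = false
}

variable "log_watcher_image" {
  description = "Override for the log watcher image. Empty string keeps the default."
  type        = string
  default     = ""
}
```

An alternative is to default these variables to null directly and drop the conditional, at the cost of callers having to pass null rather than "" when they want to skip a value.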

Setting Up Git Connector for ChaosHub

To manage your chaos experiments in Git repositories, first create a Git connector:

resource "harness_platform_connector_git" "chaos_hub" {
  depends_on = [
    harness_platform_organization.this,
    harness_platform_project.this
  ]
  
  count = var.create_git_connector ? 1 : 0
  
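  # Harness identifiers accept letters, digits, underscores, and $, so spaces
  # and hyphens are both normalized to underscores.
  identifier      = replace(lower(var.git_connector_name), "/[ -]/", "_")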
  name            = var.git_connector_name
  description     = "Git connector for Chaos Hub"
  org_id          = local.org_id
  project_id      = local.project_id
  url             = var.git_connector_url
  connection_type = "Account"
  
  dynamic "credentials" {
    for_each = var.git_connector_ssh_key != "" ? [1] : []
    content {
      ssh {
        ssh_key_ref = var.git_connector_ssh_key
      }
    }
  }
  
  dynamic "credentials" {
    for_each = var.git_connector_ssh_key == "" ? [1] : []
    content {
      http {
        username     = var.git_connector_username != "" ? var.git_connector_username : null
        password_ref = var.git_connector_password != "" ? var.git_connector_password : null
        
        dynamic "github_app" {
          for_each = var.github_app_id != "" ? [1] : []
          content {
            application_id  = var.github_app_id
            installation_id = var.github_installation_id
            private_key_ref = var.github_private_key_ref
          }
        }
      }
    }
  }
  
  validation_repo = var.git_connector_validation_repo
  
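  # The connector expects tags as a set of "key=value" strings, not a map.
  # This assumes var.chaos_hub_tags is already a list of such strings, which
  # matches how it is passed to harness_chaos_hub.
  tags = concat(
    var.chaos_hub_tags,
    [
      "managed_by=terraform",
      "purpose=chaos-hub-git-connector",
    ]
  )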
}

This connector supports multiple authentication methods including SSH keys, HTTP credentials, and GitHub Apps, making it flexible for different Git hosting providers.

Managing ChaosHubs

ChaosHubs let you create libraries of reusable chaos experiments:

resource "harness_chaos_hub" "this" {
  depends_on = [harness_platform_connector_git.chaos_hub]
  
  count = var.create_chaos_hub ? 1 : 0
  
  org_id      = local.org_id
  project_id  = local.project_id
  name        = var.chaos_hub_name
  description = var.chaos_hub_description
  
  connector_id    = var.create_git_connector ? one(harness_platform_connector_git.chaos_hub[*].id) : var.chaos_hub_connector_id
  repo_branch     = var.chaos_hub_repo_branch
  repo_name       = var.chaos_hub_repo_name
  is_default      = var.chaos_hub_is_default
  connector_scope = var.chaos_hub_connector_scope
  
  tags = var.chaos_hub_tags
  
  lifecycle {
    ignore_changes = [tags]
  }
}

The connector_id expression uses the Git connector created above when create_git_connector is true and falls back to var.chaos_hub_connector_id otherwise, so the same module works whether you create a connector or bring an existing one.
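
The `one(...[*].id)` pattern is what makes this safe when the connector has `count = 0`. The sketch below uses plain lists as stand-ins for the splat expression to show the two cases; the local names and the sample identifier are made up for illustration.

```hcl
locals {
  # Stand-ins for harness_platform_connector_git.chaos_hub[*].id
  ids_when_created = ["chaos_experiments_git"] # connector has count = 1
  ids_when_skipped = []                        # connector has count = 0

  created_id = one(local.ids_when_created) # the single element
  skipped_id = one(local.ids_when_skipped) # null, instead of an index error
}
```

Indexing with `[0]` would fail on the empty list, which is why `one()` is the usual choice for resources that may or may not exist.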

Implementing Security Governance

This is where things get interesting. Chaos Guard lets you define rules that control chaos experiment execution.

First, create conditions that define what you want to control:

resource "harness_chaos_security_governance_condition" "this" {
  depends_on = [
    harness_platform_environment.this,
    harness_platform_infrastructure.this,
    harness_chaos_infrastructure_v2.this,
  ]
  
  name        = var.security_governance_condition_name
  description = "Condition to block destructive experiments"
  org_id      = local.org_id
  project_id  = local.project_id
  infra_type  = var.security_governance_condition_infra_type
  
  fault_spec {
    operator = var.security_governance_condition_operator
    
    dynamic "faults" {
      for_each = var.security_governance_condition_faults
      content {
        fault_type = faults.value.fault_type
        name       = faults.value.name
      }
    }
  }
  
  dynamic "k8s_spec" {
    for_each = var.security_governance_condition_infra_type == "KubernetesV2" ? [1] : []
    content {
      infra_spec {
        operator  = var.security_governance_condition_infra_operator
        infra_ids = ["${harness_platform_environment.this.id}/${harness_chaos_infrastructure_v2.this.id}"]
      }
      
      dynamic "application_spec" {
        for_each = var.security_governance_condition_application_spec != null ? [1] : []
        content {
          operator = var.security_governance_condition_application_spec.operator
          
          dynamic "workloads" {
            for_each = var.security_governance_condition_application_spec.workloads
            content {
              namespace = workloads.value.namespace
              kind      = workloads.value.kind
            }
          }
        }
      }
      
      dynamic "chaos_service_account_spec" {
        for_each = var.security_governance_condition_service_account_spec != null ? [1] : []
        content {
          operator         = var.security_governance_condition_service_account_spec.operator
          service_accounts = var.security_governance_condition_service_account_spec.service_accounts
        }
      }
    }
  }
  
  dynamic "machine_spec" {
    for_each = contains(["Windows", "Linux"], var.security_governance_condition_infra_type) ? [1] : []
    content {
      infra_spec {
        operator  = var.security_governance_condition_infra_operator
        infra_ids = var.security_governance_condition_infra_ids
      }
    }
  }
  
  lifecycle {
    ignore_changes = [name]
  }
  
  tags = [
    for k, v in merge(
      local.common_tags,
      {
        "platform" = lower(var.security_governance_condition_infra_type)
      }
    ) : "${k}=${v}"
  ]
}

This configuration supports Kubernetes, Windows, and Linux infrastructure. Each type gets its own block: k8s_spec is emitted only when the infra type is KubernetesV2, and machine_spec only for Windows or Linux.
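
The condition resource reads several structured variables. One way to declare them is sketched below; the object shapes are inferred from how the resource uses them, and the default and validation message are assumptions.

```hcl
variable "security_governance_condition_infra_type" {
  type    = string
  default = "KubernetesV2"

  # Only the three values the resource branches on are accepted.
  validation {
    condition = contains(
      ["KubernetesV2", "Windows", "Linux"],
      var.security_governance_condition_infra_type
    )
    error_message = "Infra type must be KubernetesV2, Windows, or Linux."
  }
}

variable "security_governance_condition_faults" {
  type = list(object({
    fault_type = string
    name       = string
  }))
  default = []
}

# Null disables the application_spec dynamic block entirely.
variable "security_governance_condition_application_spec" {
  type = object({
    operator = string
    workloads = list(object({
      namespace = string
      kind      = string
    }))
  })
  default = null
}
```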

Then, create rules that apply these conditions with specific actions:

resource "harness_chaos_security_governance_rule" "this" {
  depends_on = [harness_chaos_security_governance_condition.this]
  
  name          = var.security_governance_rule_name
  description   = var.security_governance_rule_description
  org_id        = local.org_id
  project_id    = local.project_id
  is_enabled    = var.security_governance_rule_is_enabled
  
  condition_ids  = [harness_chaos_security_governance_condition.this.id]
  user_group_ids = var.security_governance_rule_user_group_ids
  
  dynamic "time_windows" {
    for_each = var.security_governance_rule_time_windows
    content {
      time_zone  = time_windows.value.time_zone
      start_time = time_windows.value.start_time
      duration   = time_windows.value.duration
      
      dynamic "recurrence" {
        for_each = time_windows.value.recurrence != null ? [time_windows.value.recurrence] : []
        content {
          type  = recurrence.value.type
          until = recurrence.value.until
        }
      }
    }
  }
  
  lifecycle {
    ignore_changes = [name]
  }
  
  tags = [
    for k, v in merge(
      local.common_tags,
      {
        "platform" = lower(var.security_governance_condition_infra_type)
      }
    ) : "${k}=${v}"
  ]
}

This setup ensures that certain types of chaos experiments require approval or are blocked entirely in production environments, giving you confidence to enable chaos engineering without fear of accidental damage. You can also configure time windows for when experiments are allowed to run.
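
The time_windows block iterates over a list of objects with an optional nested recurrence. A possible declaration is sketched below. The attribute types (numeric timestamps, a string duration) are assumptions made for this example, so confirm them against the provider schema before relying on them. `optional()` requires Terraform 1.3 or later.

```hcl
variable "security_governance_rule_time_windows" {
  description = "Windows during which the rule applies."
  type = list(object({
    time_zone  = string
    start_time = number # assumed numeric timestamp
    duration   = string # assumed string such as "30m"
    # Omitted recurrence becomes null, which skips the dynamic block.
    recurrence = optional(object({
      type  = string
      until = number
    }))
  }))
  default = []
}
```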

What Happens After Deployment

Once you've applied your Terraform configuration:

Your service discovery agent starts detecting applications in your configured environments automatically
Your security governance rules are active, controlling how chaos experiments can be executed
Your custom ChaosHubs are synchronized and available for use
Custom image registries are configured if you're using private registries

At this point, you can use the Harness UI to create and configure specific chaos experiments, then execute them against your discovered services. The infrastructure and governance layer is handled by Terraform, while the experiment design remains flexible and can be adjusted through the UI.
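
To make the created resources easy to find in the UI or to pass to other modules, the module could expose a few outputs. The output names below are hypothetical; the resource references match the configuration in this post.

```hcl
output "chaos_infrastructure_id" {
  description = "ID of the chaos-enabled infrastructure."
  value       = harness_chaos_infrastructure_v2.this.id
}

output "chaos_hub_id" {
  description = "ID of the ChaosHub, or null when create_chaos_hub is false."
  value       = one(harness_chaos_hub.this[*].id)
}

output "security_governance_rule_id" {
  description = "ID of the Chaos Guard rule."
  value       = harness_chaos_security_governance_rule.this.id
}
```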

Putting It All Together

Here's a practical example of what a complete module structure might look like:

module "chaos_engineering" {
  source = "./modules/chaos-engineering"

  # Organization and Project
  org_identifier     = "my_org"
  project_identifier = "production"
  
  # Infrastructure (argument names match the variables used by the resources above)
  environment_identifier    = "prod_k8s"
  infrastructure_identifier = "k8s_cluster_01"
  namespace                 = "default"
  
  # Chaos Infrastructure
  chaos_infra_name      = "prod-chaos-infra"
  chaos_infra_namespace = "harness-chaos"
  chaos_ai_enabled      = true
  
  # Service Discovery
  service_discovery_agent_name = "prod-service-discovery"
  sd_namespace                 = "harness-delegate-ng"
  
  # Custom Registry (optional)
  setup_custom_registry = true
  registry_server       = "my-registry.io"
  registry_account      = "chaos-experiments"
  is_private_registry   = true
  
  # Git Connector for ChaosHub
  create_git_connector   = true
  git_connector_name     = "chaos-experiments-git"
  git_connector_url      = "https://github.com/myorg/chaos-experiments"
  git_connector_username = "myuser"
  git_connector_password = "account.github_token"
  
  # ChaosHub
  create_chaos_hub      = true
  chaos_hub_name        = "production-experiments"
  chaos_hub_repo_branch = "main"
  chaos_hub_repo_name   = "chaos-experiments"
  
  # Security Governance
  security_governance_condition_name = "block-destructive-faults"
  security_governance_condition_faults = [
    {
      fault_type = "pod-delete"
      name       = "pod-delete"
    }
  ]
  
  security_governance_rule_name           = "production-safety-rule"
  security_governance_rule_user_group_ids = ["platform-team"]
  security_governance_rule_is_enabled     = true
  
  # Tags
  tags = {
    environment = "production"
    managed_by  = "terraform"
    team        = "platform"
  }
}

Best Practices

As you build out your chaos engineering automation, keep these practices in mind:

Start with non-production environments - Test your Terraform configurations and governance rules in development or staging before rolling out to production.

Use separate state files - Maintain separate Terraform state files for different environments to prevent accidental cross-environment changes.

Version your chaos experiments - Store experiment definitions in Git repositories and reference them through ChaosHubs for better collaboration and change tracking.

Leverage conditional resource creation - Use the count meta-argument with a boolean variable to create optional resources, such as custom registries or Git connectors, only when you need them.

Implement proper authentication - Use Harness secrets management for storing sensitive credentials like registry passwords and Git authentication tokens.

Review governance rules regularly - As your understanding of system resilience grows, update your governance conditions and rules to reflect new insights.

Use time windows strategically - Configure governance rules with time windows to allow experiments only during business hours or maintenance windows.

Tag everything - Proper tagging helps with cost tracking, resource management, and understanding relationships between resources.

Combine with CI/CD - Integrate your chaos engineering Terraform configurations into your CI/CD pipelines for fully automated infrastructure deployment.
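
As one illustration of the separate-state-files practice above, each environment can point at its own state key. The sketch assumes an S3 backend; the bucket name, key, and region are placeholders, and any remote backend that supports distinct paths works the same way.

```hcl
# environments/staging/backend.tf (hypothetical layout)
terraform {
  backend "s3" {
    bucket = "example-terraform-state"
    key    = "chaos-engineering/staging/terraform.tfstate"
    region = "us-east-1"
  }
}
```

A production directory would carry the same block with a different key, so a plan or apply in one environment cannot touch the other's state.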

Moving Forward

Automating chaos engineering with Terraform removes friction from adopting resilience testing practices. You can now treat your chaos engineering setup like any other infrastructure component, with version control, code review, and automated deployment.

The key is starting small. Pick one environment, set up the basic infrastructure and service discovery, then gradually add governance rules and custom experiments as you learn what works for your systems.

For more details on specific resources and configuration options, check out the Harness Terraform Provider documentation.

What aspects of chaos engineering do you think would benefit most from automation in your organization?

Important Links:

New to Harness Chaos Engineering? Sign up here

Looking for the Chaos Engineering documentation? Go here: Chaos Engineering

Ashutosh Bhadauriya

Senior Developer Relations Engineer
