June 2, 2025

Understanding APM Probes: How to Monitor Your Apps During Chaos Experiments


APM Probes transform chaos engineering from guesswork into objective measurement by connecting to existing monitoring systems like Prometheus, AppDynamics, and Splunk to track application performance during failure experiments. These probes provide clear SUCCESS or FAILED results based on real performance data, enabling teams to definitively determine whether their applications can handle unexpected problems. By leveraging your existing observability infrastructure, APM Probes make chaos engineering accessible without requiring additional instrumentation.

APM Probes help you monitor your application's performance during chaos experiments. They query the monitoring systems you already use for metrics such as response time, error rate, and resource usage.

The key to effective chaos experiments is measurement. You need to know how your system normally behaves, predict how it should respond to failure, and then measure what happens. APM Probes give you the data to do exactly that: they capture your baseline performance, monitor your application during chaos experiments, and show whether your system is resilient or needs improvement.

When running chaos experiments in Kubernetes environments, these probes give you objective measurements to determine if your application remains stable under stress. Instead of guessing whether your system is resilient, APM Probes provide clear pass/fail results based on actual performance data from your existing monitoring systems.

What is an APM Probe?

APM stands for Application Performance Monitoring, and a probe is a tool that checks how your application is performing. You can think of it as a health check for your apps.

In simple terms, an APM Probe helps you:

  • Query a specific value from your monitoring system (like CPU usage or response time)
  • Compare that value against what you expect
  • Determine if your application is behaving properly
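The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Harness's implementation: `run_apm_probe`, the comparator table, and the stubbed query function are hypothetical stand-ins for a real call to your monitoring system.

```python
import operator
from typing import Callable

# Hypothetical comparator table mapping criteria strings to comparison functions.
COMPARATORS: dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def run_apm_probe(query: Callable[[], float], criteria: str, expected: float) -> str:
    """Query one metric value, compare it to the expected value, return a verdict."""
    observed = query()                                   # 1. query a value from the monitoring system
    passed = COMPARATORS[criteria](observed, expected)   # 2. compare it against what you expect
    return "SUCCESS" if passed else "FAILED"             # 3. decide whether the app behaves properly


if __name__ == "__main__":
    # Stubbed query standing in for a real APM call: average response time in ms.
    print(run_apm_probe(lambda: 142.0, "<=", 200.0))
```

Passing the query in as a function keeps the comparison logic independent of whichever monitoring backend supplies the number.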

These probes work specifically with Kubernetes environments that use the Harness Delegate. They connect to monitoring systems like Prometheus, AppDynamics, or Splunk Observability to get the data they need.

How APM Probes Enable Effective Chaos Engineering

APM Probes support the core chaos engineering methodology:

Define Steady State Behavior: Before any chaos experiment, APM Probes help you establish what "normal" looks like for your application. This includes average response times, error rates, throughput, CPU usage, and memory consumption during typical operations. Understanding your steady state is crucial because you can't measure the impact of chaos without knowing your baseline.

Form Your Hypothesis: Based on your steady state measurements, you hypothesize that your system will continue to behave normally even when subjected to real-world failures. For example: "Even when we terminate 50% of our database connections, our API response time will remain under 200ms and error rates will stay below 1% because we have connection pooling and circuit breakers configured."

Introduce Real-World Variables: During the chaos experiment, you introduce variables that reflect actual production failures, such as network partitions, resource exhaustion, and service failures. APM Probes continuously monitor the metrics you established in your steady state, providing real-time feedback about whether your system meets expectations.

Measure and Learn: After the experiment, you analyze the probe results to see if your hypothesis was correct. Did response times stay under 200ms? Did error rates remain acceptable? This measurement-driven approach helps you identify gaps in your system's resilience and validates that improvements work.
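To make the example hypothesis concrete, here is a minimal Python sketch that encodes "response time under 200ms and error rate below 1%" as thresholds and checks a set of measured values against them. The metric names and the measured numbers are illustrative assumptions, not output from a real experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str   # hypothetical metric name
    limit: float  # the measured value must stay strictly below this limit


# "API response time will remain under 200ms and error rates will stay below 1%"
HYPOTHESIS = [
    Threshold("api_response_time_ms", 200.0),
    Threshold("error_rate_percent", 1.0),
]


def evaluate_hypothesis(measured: dict[str, float],
                        hypothesis: list[Threshold]) -> dict[str, bool]:
    """Return, per metric, whether the measured value stayed below its limit."""
    return {t.metric: measured[t.metric] < t.limit for t in hypothesis}


if __name__ == "__main__":
    # Illustrative values, as probes might report them while the fault is active.
    during_chaos = {"api_response_time_ms": 185.0, "error_rate_percent": 1.4}
    results = evaluate_hypothesis(during_chaos, HYPOTHESIS)
    print(results)                # {'api_response_time_ms': True, 'error_rate_percent': False}
    print(all(results.values()))  # False: the hypothesis does not hold
```

Reporting the result per metric, rather than only as one overall pass or fail, shows which part of the hypothesis broke and where to look next.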

When Would You Use an APM Probe?

APM Probes are particularly useful when existing APM systems are already monitoring your application. Instead of setting up new monitoring for chaos experiments, you can leverage your existing observability stack.

Here are some common scenarios where APM Probes add value:

  • Testing database failover: Monitor query response times and error rates during database chaos experiments
  • Network partition experiments: Track service-to-service communication latency and success rates
  • Resource exhaustion tests: Watch CPU, memory, and disk usage as you stress system resources
  • Load balancer chaos: Measure request distribution and response times when nodes go down

You can also combine APM Probes with other probe types (such as Command or HTTP probes) to obtain infrastructure- and application-level insights from the same experiment.

Setting Up APM Probes in Harness

Let's look at how to set up each type of probe:

AppDynamics Probe

AppDynamics is an APM solution that helps you detect and diagnose complex application performance problems in real time. It provides end-to-end visibility across your entire application ecosystem.

Setting up the AppDynamics probe in Harness is pretty straightforward:

First, make sure you have:

  • An active AppDynamics account
  • Access to the AppDynamics API
  • Your authentication credentials

Create the probe

  1. Go to the Resilience Probes section in the Chaos Engineering module, click "New Probe", and choose "APM Probe".
  2. Name your probe and select "AppDynamics" as the APM Type.
  3. Select an existing AppDynamics connector, or create a new one if you don't have one yet.
  4. Configure your AppDynamics controller credentials.
  5. Select the delegate and verify the connection.
  6. With the connector created and selected, click "Configure Details".
  7. Set up the metric path you want to monitor.

Metric full path

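The metric path identifies the specific metric you want to measure. It follows the AppDynamics metric browser hierarchy, with each level separated by a pipe (|). For example, if you want to check CPU usage for a single node, your path might look like: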

Application Infrastructure Performance|Root|Individual Nodes|boutique/adservice-54d59c5594-gggb9|Hardware Resources|CPU|%Busy

Or for memory:

Application Infrastructure Performance|Root|Individual Nodes|boutique/adservice-54d59c5594-gggb9|Hardware Resources|Memory|Used (MB)
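Because a metric path is just the hierarchy levels joined with pipes, you can assemble it programmatically, for example when you run the same experiment against several pods. The helper below is hypothetical; its fixed prefix is copied from the two example paths, and only the node name and the trailing metric levels vary.

```python
def appd_metric_path(node: str, *metric: str) -> str:
    """Build a pipe-delimited AppDynamics metric path for one node (hypothetical helper).

    `node` is the node name as it appears in the metric browser; `metric` holds the
    remaining levels, e.g. ("Hardware Resources", "CPU", "%Busy").
    """
    prefix = ["Application Infrastructure Performance", "Root", "Individual Nodes", node]
    return "|".join(prefix + list(metric))


if __name__ == "__main__":
    node = "boutique/adservice-54d59c5594-gggb9"
    print(appd_metric_path(node, "Hardware Resources", "CPU", "%Busy"))
    print(appd_metric_path(node, "Hardware Resources", "Memory", "Used (MB)"))
```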

Lookback window (in minutes)

The lookback window is the time range, starting a specified number of minutes ago and ending at the current moment, over which metric data is aggregated. For example, a lookback window of 5 aggregates the last five minutes of data.
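In AppDynamics' own Controller REST API, this kind of window corresponds to the `time-range-type=BEFORE_NOW` and `duration-in-mins` parameters of the metric-data endpoint, with `rollup` controlling whether the window is collapsed into a single value. The Python sketch below builds such a request URL from a metric path and a lookback window. The controller URL and application name are placeholders, and whether Harness issues this exact request is an assumption.

```python
from urllib.parse import quote, urlencode


def appd_metric_data_url(controller: str, application: str, metric_path: str,
                         lookback_minutes: int) -> str:
    """Build a Controller REST metric-data URL covering the last `lookback_minutes` minutes."""
    if lookback_minutes <= 0:
        raise ValueError("lookback window must be a positive number of minutes")
    params = {
        "metric-path": metric_path,
        "time-range-type": "BEFORE_NOW",       # the window ends at the current moment...
        "duration-in-mins": lookback_minutes,  # ...and starts this many minutes earlier
        "rollup": "true",                      # aggregate the whole window into one value
        "output": "JSON",
    }
    base = controller.rstrip("/") + "/controller/rest/applications/" + quote(application, safe="")
    # quote_via=quote encodes spaces as %20 and pipes as %7C in the metric path.
    return base + "/metric-data?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    path = ("Application Infrastructure Performance|Root|Individual Nodes|"
            "boutique/adservice-54d59c5594-gggb9|Hardware Resources|CPU|%Busy")
    # Placeholder controller and application name.
    print(appd_metric_data_url("https://example.saas.appdynamics.com", "OnlineBoutique", path, 5))
```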

  • Provide the comparison criteria under AppDynamics Data Comparison.

  • Provide the run properties and click "Create Probe".
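Run properties control how often and how persistently the probe is evaluated. The sketch below assumes two such properties, a number of attempts and an interval between them, and treats the probe as passing if any attempt passes. These names and this retry behavior are assumptions for illustration, so check the Harness probe documentation for the exact fields and semantics.

```python
import time
from typing import Callable


def run_with_attempts(check: Callable[[], bool], attempts: int, interval_s: float,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Re-run a probe check up to `attempts` times, waiting `interval_s` seconds between tries.

    Assumed semantics: the first passing attempt yields SUCCESS; if every attempt
    fails, the verdict is FAILED.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        if check():
            return "SUCCESS"
        if attempt < attempts:
            sleep(interval_s)  # wait before retrying a failed check
    return "FAILED"


if __name__ == "__main__":
    # Simulated readings: the metric recovers on the third attempt.
    readings = iter([False, False, True])
    print(run_with_attempts(lambda: next(readings), attempts=3, interval_s=2.0,
                            sleep=lambda s: None))
```

Injecting `sleep` keeps the sketch testable without real waiting.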


Splunk Observability Probe

Splunk Observability (formerly SignalFx) is a monitoring platform that provides real-time visibility into cloud infrastructure and applications. It collects metrics, traces, and logs to help you monitor application performance.

For Splunk Observability:

  1. Prerequisites:

    • A Splunk Observability account
    • API access
    • An API token
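  2. The setup steps are similar to the AppDynamics process, but you'll use Splunk-specific queries here, such as:
     sf_metric:cpu.utilization AND host.name:gke-default-pool-667be17c-t588.c.test.internal
     This query selects the cpu.utilization metric for a single host (here, a GKE node), identified by its host.name dimension.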
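If you probe several hosts, generating the query string can be less error-prone than editing it by hand. This hypothetical helper only reproduces the `sf_metric:<name> AND <dimension>:<value>` shape of the example query; it does no escaping, so values containing spaces or reserved characters would need extra handling.

```python
def signalfx_query(metric: str, dimensions: dict[str, str]) -> str:
    """Build a query selecting one metric, filtered by dimension values (hypothetical helper)."""
    terms = [f"sf_metric:{metric}"]
    # Each dimension filter is ANDed onto the metric selector, in insertion order.
    terms += [f"{key}:{value}" for key, value in dimensions.items()]
    return " AND ".join(terms)


if __name__ == "__main__":
    print(signalfx_query("cpu.utilization",
                         {"host.name": "gke-default-pool-667be17c-t588.c.test.internal"}))
```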

Prometheus Probe

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It collects and stores metrics as time series data, making it well-suited for monitoring containerized environments like Kubernetes.

  1. You'll need:
    • A running Prometheus server
    • Access to the Prometheus API
    • Your application configured to expose metrics
  2. When setting up, you'll specify:
    • TLS configuration if needed
    • Your PromQL query, like:

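avg_over_time(probe_duration_seconds{job="prometheus-blackbox-exporter",instance="frontend.boutique.svc.cluster.local:80"}[60s:1s])*1000

This query averages the blackbox exporter's probe duration for the frontend service over the last 60 seconds, sampled at 1-second resolution, and multiplies the result by 1000 to convert seconds to milliseconds.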
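To see what a PromQL probe query returns before wiring it into an experiment, you can run it yourself against Prometheus' HTTP API (`GET /api/v1/query`). The sketch below uses only the Python standard library. The Prometheus address is a placeholder, and it assumes the query returns an instant vector whose first series is the one you want to check.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# The frontend latency query from the example, in milliseconds.
QUERY = (
    'avg_over_time(probe_duration_seconds{job="prometheus-blackbox-exporter",'
    'instance="frontend.boutique.svc.cluster.local:80"}[60s:1s])*1000'
)


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a GET URL for Prometheus' instant-query endpoint, /api/v1/query."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def first_sample_value(response: dict) -> float:
    """Return the value of the first series in an instant-vector query response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    data = response["data"]
    if data.get("resultType") != "vector":
        raise ValueError(f"expected an instant vector, got {data.get('resultType')}")
    if not data["result"]:
        raise ValueError("query returned no series")
    _timestamp, value = data["result"][0]["value"]  # [unix_time, "value as a string"]
    return float(value)


def query_prometheus(base_url: str, promql: str, timeout_s: float = 10.0) -> float:
    """Run an instant query over HTTP and return the first sample value."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout_s) as resp:
        return first_sample_value(json.load(resp))


# Example against a placeholder address:
# latency_ms = query_prometheus("http://prometheus.example:9090", QUERY)
```

Separating response parsing from the network call makes the parsing easy to check against a saved response.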

Why This Matters

APM Probes turn subjective questions like "is my app resilient?" into objective measurements. When your chaos experiment runs, these probes report either SUCCESS or FAILED, contributing to an overall resilience score.
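The exact resilience score formula is defined by Harness. As a rough mental model only, the sketch below assumes the score is a weighted average of each fault's probe success percentage, with a weight assigned per fault. Treat `FaultResult`, the weights, and the formula as assumptions, and check the Harness documentation before relying on exact numbers.

```python
from dataclasses import dataclass


@dataclass
class FaultResult:
    weight: int                # assumed importance of the fault (hypothetical)
    probe_verdicts: list[str]  # "SUCCESS" or "FAILED" for each probe attached to the fault


def resilience_score(results: list[FaultResult]) -> float:
    """Weighted average of per-fault probe success percentages (assumed model, 0 to 100)."""
    total_weight = sum(r.weight for r in results)
    if total_weight == 0:
        return 0.0
    weighted = 0.0
    for r in results:
        if r.probe_verdicts:  # a fault with no probes contributes 0% but still carries weight
            passed = sum(v == "SUCCESS" for v in r.probe_verdicts)
            weighted += r.weight * 100.0 * passed / len(r.probe_verdicts)
    return weighted / total_weight


if __name__ == "__main__":
    results = [
        FaultResult(weight=10, probe_verdicts=["SUCCESS", "SUCCESS"]),
        FaultResult(weight=5, probe_verdicts=["SUCCESS", "FAILED"]),
    ]
    print(round(resilience_score(results), 2))
```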

This means you can confidently determine whether your application can handle unexpected problems, backed by real data from your monitoring systems.

Get Started with Your Chaos Experiment

APM Probes are a powerful feature in the Harness Chaos Engineering module that helps you objectively measure your application's resilience. By integrating with your existing monitoring systems, they provide valuable insights during chaos experiments without requiring any additional instrumentation.

Even if you're new to chaos engineering, these probes provide a straightforward way to measure your applications' performance under stress. They're a valuable addition to your toolkit for building more reliable systems.

Ready to learn more? Sign up for a demo or a free account today.

Happy resilience testing 💥
