APM Probes transform chaos engineering from guesswork into objective measurement. They connect to the monitoring systems you already run, such as Prometheus, AppDynamics, and Splunk Observability, to track application performance during failure experiments. Each probe returns a clear SUCCESS or FAILED result based on real performance data, so teams can determine whether their applications actually withstand unexpected failures. Because APM Probes reuse your existing observability infrastructure, they make chaos engineering accessible without additional instrumentation.
APM Probes monitor your application's performance during chaos experiments by pulling key metrics from the monitoring systems that already observe it.
The key to effective chaos experiments is measurement. You need to know how your system normally behaves, predict how it should respond to failure, and then measure what actually happens. APM Probes give you the data to do exactly that: they capture your baseline performance, monitor your application during the experiment, and show whether your system is resilient or needs improvement.
When running chaos experiments in Kubernetes environments, these probes give you objective measurements to determine if your application remains stable under stress. Instead of guessing whether your system is resilient, APM Probes provide clear pass/fail results based on actual performance data from your existing monitoring systems.
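As a rough illustration of how a pass/fail verdict can be derived from a single metric, the sketch below compares an observed value against an expected one using a comparison criterion. The `evaluate_probe` helper and the set of criteria strings are hypothetical; they illustrate the idea and are not the Harness probe schema.

```python
import operator

# Hypothetical mapping from criteria strings to comparison functions.
# This set of operators is an illustrative assumption, not the Harness schema.
_CRITERIA = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def evaluate_probe(observed: float, criteria: str, expected: float) -> str:
    """Return "SUCCESS" if `observed <criteria> expected` holds, else "FAILED"."""
    try:
        compare = _CRITERIA[criteria]
    except KeyError:
        raise ValueError(f"unsupported criteria: {criteria!r}") from None
    return "SUCCESS" if compare(observed, expected) else "FAILED"


# Example: a latency of 180 ms checked against a "< 200 ms" expectation.
print(evaluate_probe(180.0, "<", 200.0))  # SUCCESS
print(evaluate_probe(250.0, "<", 200.0))  # FAILED
```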
APM stands for Application Performance Monitoring, and a probe is a tool that checks how your application is performing. You can think of it as a health check for your apps.
In simple terms, an APM Probe helps you:
These probes work specifically with Kubernetes environments that use the Harness Delegate. They connect to monitoring systems like Prometheus, AppDynamics, or Splunk Observability to get the data they need.
APM Probes support the core chaos engineering methodology:
Define Steady State Behavior: Before any chaos experiment, APM Probes help you establish what "normal" looks like for your application. This includes average response times, error rates, throughput, CPU usage, and memory consumption during typical operations. Understanding your steady state is crucial because you can't measure the impact of chaos without knowing your baseline.
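To make "establishing a baseline" concrete, here is a minimal sketch that summarizes steady-state samples into a mean latency, a 95th-percentile latency, and an error rate. The sample values and the `Baseline`/`compute_baseline` names are hypothetical; in practice the samples would come from your monitoring system.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    mean_ms: float      # average response time
    p95_ms: float       # 95th-percentile response time
    error_rate: float   # failed requests / total requests


def compute_baseline(latencies_ms: list[float], errors: int, total_requests: int) -> Baseline:
    """Summarize steady-state samples collected before any fault is injected."""
    if not latencies_ms or total_requests <= 0:
        raise ValueError("need at least one latency sample and one request")
    if len(latencies_ms) > 1:
        # n=20 yields 19 cut points; index 18 is the 95th percentile.
        p95 = statistics.quantiles(latencies_ms, n=20, method="inclusive")[18]
    else:
        p95 = latencies_ms[0]
    return Baseline(
        mean_ms=statistics.fmean(latencies_ms),
        p95_ms=p95,
        error_rate=errors / total_requests,
    )


baseline = compute_baseline([100.0, 110.0, 120.0, 130.0, 140.0], errors=2, total_requests=200)
print(baseline)  # Baseline(mean_ms=120.0, p95_ms=138.0, error_rate=0.01)
```

The `inclusive` quantile method keeps the percentile within the observed range, which is usually what you want when describing "normal" behavior from a finite sample.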
Form Your Hypothesis: Based on your steady state measurements, you hypothesize that your system will continue to behave normally even when subjected to real-world failures. For example: "Even when we terminate 50% of our database connections, our API response time will remain under 200ms and error rates will stay below 1% because we have connection pooling and circuit breakers configured."
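One way to keep a hypothesis testable is to write it down as explicit thresholds. The sketch below encodes the example above (response time under 200 ms, errors below 1%) and reports a verdict per check. Treating "response time" as p95 latency is an assumption, and the `Hypothesis`/`check_hypothesis` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Steady-state thresholds the system should keep during chaos (illustrative)."""
    max_p95_latency_ms: float
    max_error_rate: float


def check_hypothesis(h: Hypothesis, p95_latency_ms: float, error_rate: float) -> dict[str, bool]:
    """Return a per-check verdict so a failure shows which expectation broke."""
    return {
        "latency": p95_latency_ms < h.max_p95_latency_ms,
        "error_rate": error_rate < h.max_error_rate,
    }


# The example hypothesis: response time under 200 ms, error rate below 1%.
h = Hypothesis(max_p95_latency_ms=200.0, max_error_rate=0.01)
verdict = check_hypothesis(h, p95_latency_ms=185.0, error_rate=0.004)
print(verdict)                # {'latency': True, 'error_rate': True}
print(all(verdict.values()))  # True
```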
Introduce Real-World Variables: During the chaos experiment, you introduce variables that reflect actual production failures, such as network partitions, resource exhaustion, and service failures. APM Probes continuously monitor the metrics you established in your steady state, providing real-time feedback about whether your system meets expectations.
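Continuous monitoring during a fault boils down to polling a metric at a fixed interval and checking each reading. The sketch below assumes a hypothetical `fetch_metric` callable standing in for a real query against Prometheus, AppDynamics, or Splunk Observability; the injectable `sleep` parameter lets the loop run in tests without actually waiting.

```python
import time
from typing import Callable


def monitor(
    fetch_metric: Callable[[], float],
    threshold: float,
    duration_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> list[tuple[float, bool]]:
    """Poll `fetch_metric` every `interval_s` seconds for roughly `duration_s` seconds.

    Returns (value, below_threshold) pairs, one per poll.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    samples = []
    polls = max(1, int(duration_s // interval_s))
    for i in range(polls):
        value = fetch_metric()
        samples.append((value, value < threshold))
        if i < polls - 1:
            sleep(interval_s)  # wait before the next reading
    return samples


# Simulated readings: latency climbs past 200 ms on the third poll.
readings = iter([120.0, 180.0, 260.0])
result = monitor(lambda: next(readings), threshold=200.0, duration_s=30, interval_s=10,
                 sleep=lambda s: None)
print(result)  # [(120.0, True), (180.0, True), (260.0, False)]
```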
Measure and Learn: After the experiment, you analyze the probe results to see if your hypothesis was correct. Did response times stay under 200ms? Did error rates remain acceptable? This measurement-driven approach helps you identify gaps in your system's resilience and validates that improvements work.
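Beyond a yes/no answer, comparing chaos-time metrics against the baseline shows how much each metric degraded. This is a minimal sketch with hypothetical metric names and values; `compare_to_baseline` reports the percent change for every metric measured in both phases.

```python
def percent_change(baseline: float, observed: float) -> float:
    """Relative change from baseline to observed, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline * 100.0


def compare_to_baseline(baseline: dict[str, float], during_chaos: dict[str, float]) -> dict[str, float]:
    """Percent change, rounded to one decimal, for metrics present in both measurements."""
    return {
        name: round(percent_change(baseline[name], during_chaos[name]), 1)
        for name in baseline
        if name in during_chaos
    }


baseline = {"p95_latency_ms": 150.0, "error_rate": 0.002}
during_chaos = {"p95_latency_ms": 195.0, "error_rate": 0.008}
print(compare_to_baseline(baseline, during_chaos))  # {'p95_latency_ms': 30.0, 'error_rate': 300.0}
```

Here latency stayed under a 200 ms hypothesis, yet the error rate quadrupled, which is the kind of gap worth investigating even when a probe passes.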
APM Probes are particularly useful when an APM system already monitors your application. Instead of setting up new monitoring for chaos experiments, you can leverage your existing observability stack.
Here are some common scenarios where APM Probes add value:
You can also combine APM Probes with other probe types (such as Command or HTTP probes) to obtain infrastructure- and application-level insights from the same experiment.
Let's look at how to set up each type of probe:
AppDynamics is an APM solution that helps you detect and diagnose complex application performance problems in real time. It provides end-to-end visibility across your entire application ecosystem.
Setting up the AppDynamics probe in Harness is pretty straightforward:
Metric full path
The metric path identifies the exact metric you want to measure, as it appears in the AppDynamics Metric Browser. For example, if you want to check CPU usage, your path might look like:
Or for memory:
LookBack Window (In Minutes):
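The lookback window is the time range, from a specified number of minutes ago up to the current moment, over which the probe aggregates metric data. For example, a lookback window of 5 means the probe considers data from the last 5 minutes.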
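The sketch below shows how a metric path and a lookback window fit together. `aggregate_lookback` averages the samples that fall within the last N minutes, and `appd_metric_data_url` builds a request URL for the AppDynamics Controller metric-data REST endpoint using `time-range-type=BEFORE_NOW` and `duration-in-mins`. The controller URL, application name, and metric path in the example are hypothetical, and this is not necessarily how the Harness probe queries AppDynamics internally.

```python
from datetime import datetime, timedelta, timezone
from statistics import fmean
from urllib.parse import quote, urlencode


def aggregate_lookback(samples: list[tuple[datetime, float]], lookback_minutes: int,
                       now: datetime) -> float:
    """Average the samples whose timestamp falls in [now - lookback, now]."""
    start = now - timedelta(minutes=lookback_minutes)
    window = [value for ts, value in samples if start <= ts <= now]
    if not window:
        raise ValueError("no samples in the lookback window")
    return fmean(window)


def appd_metric_data_url(controller: str, application: str, metric_path: str,
                         lookback_minutes: int) -> str:
    """Build an AppDynamics metric-data REST URL covering the last N minutes (sketch)."""
    params = {
        "metric-path": metric_path,
        "time-range-type": "BEFORE_NOW",       # window ends at the current moment
        "duration-in-mins": lookback_minutes,  # how far back the window starts
        "output": "JSON",
    }
    # quote_via=quote encodes spaces as %20 and "|" as %7C.
    return (
        f"{controller.rstrip('/')}/controller/rest/applications/"
        f"{quote(application, safe='')}/metric-data?{urlencode(params, quote_via=quote)}"
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
samples = [(now - timedelta(minutes=m), v) for m, v in [(10, 90.0), (4, 40.0), (2, 60.0), (0, 50.0)]]
print(aggregate_lookback(samples, lookback_minutes=5, now=now))  # 50.0 (the 10-minute-old sample is excluded)
print(appd_metric_data_url("https://example.saas.appdynamics.com", "my-app",
                           "Overall Application Performance|Average Response Time (ms)", 5))
```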
Splunk Observability (formerly SignalFx) is a monitoring platform that provides real-time visibility into cloud infrastructure and applications. It collects metrics, traces, and logs to help you monitor application performance.
For Splunk Observability:
Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It collects and stores metrics as time series data, making it well-suited for monitoring containerized environments like Kubernetes.
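For context on what a Prometheus-backed check involves, the sketch below builds an instant-query URL for the Prometheus HTTP API (`GET /api/v1/query`) and extracts the first value from an instant-vector response. The base URL, the PromQL expression, and the `http_request_duration_seconds_bucket` metric name are hypothetical; the response shape follows the Prometheus HTTP API format.

```python
import json
from urllib.parse import urlencode


def prometheus_query_url(base_url: str, promql: str) -> str:
    """URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def first_sample_value(response_body: str) -> float:
    """Extract the value of the first series from an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data.get("resultType") != "vector":
        raise ValueError(f"expected a vector result, got {data.get('resultType')!r}")
    if not data["result"]:
        raise ValueError("query returned no series")
    # Each element looks like {"metric": {...}, "value": [<unix ts>, "<value as string>"]}.
    return float(data["result"][0]["value"][1])


# Hypothetical p95 latency query over a 5-minute rate window.
query = 'histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))'
print(prometheus_query_url("http://prometheus:9090", query))

body = '{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1700000000,"0.183"]}]}}'
print(first_sample_value(body) < 0.2)  # True: p95 of 183 ms is under 200 ms
```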
APM Probes turn subjective questions like "is my app resilient?" into objective measurements. When your chaos experiment runs, these probes report either SUCCESS or FAILED, contributing to an overall resilience score.
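To illustrate how individual SUCCESS/FAILED results can roll up into a single number, here is a simplified weighted-percentage sketch. The weights and the `resilience_score` function are hypothetical, and this is not necessarily the formula Harness uses to compute its resilience score.

```python
def resilience_score(results: list[tuple[str, int]]) -> float:
    """Weighted percentage of probes that reported SUCCESS.

    `results` holds (status, weight) pairs. Simplified illustration only.
    """
    total_weight = sum(weight for _, weight in results)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    passed = sum(weight for status, weight in results if status == "SUCCESS")
    return 100.0 * passed / total_weight


print(resilience_score([("SUCCESS", 10), ("FAILED", 5), ("SUCCESS", 5)]))  # 75.0
```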
This means you can confidently determine whether your application can handle unexpected problems, backed by real data from your monitoring systems.
APM Probes are a powerful feature in the Harness Chaos Engineering module that helps you objectively measure your application's resilience. By integrating with your existing monitoring systems, they provide valuable insights during chaos experiments without requiring any additional instrumentation.
Even if you're new to chaos engineering, these probes provide a straightforward way to measure your applications' performance under stress. They're a valuable addition to your toolkit for building more reliable systems.
Ready to learn more? Sign up for a demo or a free account today.
Happy resilience testing 💥