Chaos engineering helps organizations minimize the financial and reputational impact of unplanned downtime. It also lets developers focus on software delivery rather than fire-fighting production incidents. Chaos experiments go beyond traditional unit, integration, and system tests and more closely represent the random failures of a real-world production environment. This realism provides insight into how systems behave, equipping teams to understand weaknesses in their applications and infrastructure and to proactively build resilience that helps prevent costly downtime. This blog looks closely at the key capabilities of Harness Chaos Engineering (CE) to see how it helps teams solve these challenges.
Harness CE provides:
Let’s dive deeper into the capabilities that teams can leverage to increase reliability.
Achieve Continuous Resilience™ through the native platform integration between Harness CE and Continuous Delivery (CD). Powered by the CNCF project LitmusChaos, this integration makes it easier for developers and SREs to test the reliability and resilience of applications in software delivery pipelines, improving overall reliability and minimizing the risk of unplanned downtime.
Implement chaos engineering using our SaaS, self-hosted, on-premises, or air-gapped deployments to align with your business and security requirements. Harness supports injecting experiments into multiple platforms and environments. The Enterprise ChaosHub is a catalog of advanced experiments with coverage across VMware, AWS, GCP, Azure, and serverless, plus a full range of Kubernetes chaos experiments. Users can manage, edit, schedule, and run experiments within the UI for improved collaboration. Harness provides the largest and most diverse set of chaos experiments available today, with many more added monthly.
Chaos orchestration enables users to build a CE practice quickly by letting the Harness solution fill the gaps in the organization's knowledge, processes, and tools. Utilize Harness CE to train new and existing employees to level everyone up on software reliability.
Roll out chaos engineering to the entire enterprise from a Git repository instead of waiting years to adopt the CE practice team by team. Start your entire enterprise on the chaos engineering practice to scale software reliability to every application. Leverage GitOps and CI/CD integrations to automate the complexity and meet developers where they are by providing declarative YAML files for chaos experiments that improve the developer experience.
The GitOps feature enables you to configure a single source of truth for your chaos experiments and execute them directly from Git, allowing a vast scope of automation in CI/CD pipelines.
A team can manage reliability through the resilience score: define, measure, and tune each experiment, track resiliency over time, and automate the evaluation of experiment results.
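To make the idea of a resilience score concrete, here is a minimal sketch of one plausible way to compute it: a weighted average of how many probes passed for each fault. The formula, the `FaultResult` type, the fault names, and the weights are all assumptions made for illustration; the text above does not state how Harness CE calculates the score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultResult:
    name: str                 # hypothetical fault name
    weight: int               # relative importance chosen by the team
    probe_success_pct: float  # share of probes that passed, 0-100


def resilience_score(results: list[FaultResult]) -> float:
    """Weighted average of probe success percentages, on a 0-100 scale."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("at least one fault with a positive weight is required")
    weighted = sum(r.weight * r.probe_success_pct for r in results)
    return weighted / total_weight


# Illustrative data only: a critical fault that passed every probe and a
# less critical one that passed half of them.
results = [
    FaultResult("pod-delete", weight=10, probe_success_pct=100.0),
    FaultResult("network-latency", weight=5, probe_success_pct=50.0),
]
score = resilience_score(results)
print(f"resilience score: {score:.1f}")
```

Tracking a number like this per run is what makes it possible to compare resilience across releases instead of judging each experiment by eye.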
Rather than have developers manually watch monitoring dashboards and keep “eyes on glass” with multiple browser tabs open, Harness CE provides probes that automate the measurement of an experiment. Probes are editable checks you can define for any chaos experiment to measure its success and failure conditions. Examples of probes include simple queries against application health checks and steady-state system metrics.
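The sketch below shows the general shape of a probe as a retried pass/fail check. It is not the Harness CE probe API: `run_probe`, `ProbeResult`, and the parameters `attempts` and `interval_s` are hypothetical names, and the health check is simulated with canned responses so the example runs on its own.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProbeResult:
    passed: bool
    attempts_used: int


def run_probe(
    check: Callable[[], bool],
    attempts: int = 3,
    interval_s: float = 1.0,
) -> ProbeResult:
    """Call `check` until it returns True or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return ProbeResult(True, attempt)
        except Exception:
            pass  # a raised exception counts as a failed attempt
        if attempt < attempts:
            time.sleep(interval_s)
    return ProbeResult(False, attempts)


# Simulated health check: unhealthy twice, then healthy again.
responses = iter([False, False, True])
result = run_probe(lambda: next(responses), attempts=3, interval_s=0.0)
print(result)
```

In practice, `check` might wrap an HTTP request to an application health endpoint or a query against a steady-state metric, which is what replaces the manual dashboard watching described above.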
A GameDay is a series of experiments that serves a purpose, such as:
The Harness Chaos Engineering platform’s GameDay feature lets you assemble experiments to run together with a team. You can make a GameDay repeatable by defining it as a template. The feature enables a user to start, stop, and re-run experiments within one UI, allowing a team to test in small increments of failure. The team can also take notes, record observations, and create a checklist of tasks to complete, which can be added to a ticketing system.
Harness provides declarative chaos experiments, so you can define their configuration in a code repository, version it, and edit it through automation. This declarative approach empowers developers to build and automate reliability in their code.
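As a rough illustration of what "declarative" means here, the sketch below treats an experiment as plain data that can live in a repository and be loaded and validated by automation. Everything in it is an assumption: the field names (`name`, `faults`, `kind`, `target`, `duration_s`) are hypothetical and do not reflect the Harness CE schema, and JSON is used only because Python's standard library can parse it, whereas Harness experiments are described above as YAML files.

```python
import json
from dataclasses import dataclass

# Hypothetical experiment definition, as it might be stored in a repository.
SPEC_TEXT = """
{
  "name": "checkout-pod-delete",
  "faults": [
    {"kind": "pod-delete", "target": "checkout", "duration_s": 30}
  ]
}
"""


@dataclass(frozen=True)
class Fault:
    kind: str
    target: str
    duration_s: int


@dataclass(frozen=True)
class Experiment:
    name: str
    faults: tuple[Fault, ...]


def load_experiment(text: str) -> Experiment:
    """Parse a declarative definition and reject one with no faults."""
    raw = json.loads(text)
    faults = tuple(Fault(**f) for f in raw["faults"])
    if not faults:
        raise ValueError("an experiment needs at least one fault")
    return Experiment(name=raw["name"], faults=faults)


experiment = load_experiment(SPEC_TEXT)
print(experiment)
```

Because the definition is just text, changing a duration or a target becomes a reviewable commit rather than a click in a UI.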
Harness chaos engineering enables you to run faults in parallel (CPU fault + Memory fault) to mimic real-world events. In addition to this approach, you can run chaos experiments in parallel to model complex IT outages that often stem from multiple failure modes.
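The idea of running a CPU fault and a memory fault at the same time can be sketched with Python's standard `concurrent.futures` module. The two fault functions are placeholders that only sleep, and `run_in_parallel` is a hypothetical helper; nothing here injects real stress or calls Harness CE.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_fault(duration_s: float) -> str:
    """Placeholder for a CPU stress fault; it only sleeps."""
    time.sleep(duration_s)
    return "cpu-fault finished"


def memory_fault(duration_s: float) -> str:
    """Placeholder for a memory stress fault; it only sleeps."""
    time.sleep(duration_s)
    return "memory-fault finished"


def run_in_parallel(faults, duration_s: float) -> list[str]:
    """Start every fault at once and wait for all of them to finish."""
    with ThreadPoolExecutor(max_workers=max(1, len(faults))) as pool:
        futures = [pool.submit(fault, duration_s) for fault in faults]
        return [future.result() for future in futures]


start = time.monotonic()
outcomes = run_in_parallel([cpu_fault, memory_fault], duration_s=0.3)
elapsed = time.monotonic() - start
print(outcomes, f"elapsed: {elapsed:.2f}s")
```

Both faults overlap in time, so the total run takes roughly as long as the longest fault rather than the sum of the two, which is what lets a single experiment mimic an outage with several simultaneous failure modes.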
Run various experiments on different targets to simulate cascading failure across more extensive sets of services. This ability enables you to cause a network disruption on one cloud provider’s availability zone and simultaneously run a resource exhaustion experiment, simulating traffic moving over to the redundant system.
Lastly, you can abort an in-flight experiment whose impact goes beyond what the test was expected to cause. Abort conditions can be triggered manually or automatically, using probes defined on the health metrics of the system under test, and recovery scripts can be automated.
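An automatic abort condition can be pictured as a guard that watches a health metric while the fault is active. In this minimal sketch, `run_with_abort`, the error-rate samples, and the 10% threshold are all hypothetical; a real setup would read live metrics and call an actual recovery script.

```python
from typing import Callable, Iterable


def run_with_abort(
    health_samples: Iterable[float],
    abort_threshold: float,
    recover: Callable[[], None],
) -> str:
    """Watch health samples during a fault; abort and recover on a breach."""
    for sample in health_samples:
        if sample > abort_threshold:
            recover()  # automated recovery step
            return "aborted"
    return "completed"


recovery_log: list[str] = []

# Illustrative error rates observed while the fault is running.
status = run_with_abort(
    health_samples=[0.01, 0.02, 0.30],
    abort_threshold=0.10,
    recover=lambda: recovery_log.append("rollback executed"),
)
print(status, recovery_log)
```

The third sample breaches the threshold, so the experiment stops early and the recovery step runs exactly once; a run whose samples all stay below the threshold finishes normally.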
Harness CE can send chaos metrics to popular observability and application performance monitoring (APM) solutions, so developers can integrate it with their existing reliability ecosystem. This reduces developer toil because Harness CE plugs into the systems they already use. Our list includes Prometheus, Grafana, Dynatrace, Keptn, and more. Besides observability and monitoring integrations, you can integrate with load-testing tools or run your own test with a custom script.
Different roles require different views of dashboards and reports. Executives might want a high-level risk assessment on a single dashboard. An engineering manager might want to see the reliability status of all services. Either way, Harness CE has all the experiment data, analytics, and reporting capabilities needed to be the centralized source for reliability.
Harness has built a reputation in the CI/CD industry for having detailed audit trails and fine-grained RBAC. These audit trails make it quick and easy for engineering teams to pass audits, often turning what would be days of effort into just a few hours. Our fine-grained RBAC model means that you can implement a permissions system that meets your organization's needs - no matter how complex.
Harness recognizes that enterprises need to move fast and scale quickly to meet the demands of their business, so we’re equipped to offer enterprise support to ensure your chaos engineering practice can begin as quickly and safely as possible. Harness CE was built by the same team of experts that created the CNCF open-source project, LitmusChaos. This team is ready to support SaaS, on-premises, self-hosted, or air-gapped installations and provide onboarding assistance, feature enhancements, chaos best practices, and custom tooling integration for CI/CD and observability platforms.
Getting started with chaos engineering has never been so simple. If you are ready to see how your organization can adopt this practice and improve reliability, request a demo and sign up for the SaaS trial today!
Enjoyed reading this blog post or have questions or feedback?
Share your thoughts by creating a new topic in the Harness community forum.