Over the last 18 months, we have learned a great deal about how customers use the Chaos Engineering product, and about the major challenges they face in moving smoothly from no chaos to automated chaos as the practice is adopted within their organisations. As a result, we developed many new capabilities, which are now being unveiled at the ChaosCarnival 2024 conference. This article describes the motivation for, and the functionality of, the new features being unveiled today.
Automated onboarding of chaos experimentation on Kubernetes clusters
In this completely automated onboarding process, the user selects a target Kubernetes cluster on which to start the chaos or resilience journey, and the Harness Delegate does the rest. The Delegate carries out the preparatory work required on the target Kubernetes application: discovering the running services and the relationships among them, creating application boundaries among the services, and creating possible chaos experiments. After the experiments are created, a set of safe experiments is selected and run to produce the initial resilience insights for the target Kubernetes application.
The hassle of deploying the chaos agent on the target cluster and creating the experiments manually is avoided entirely. This feature allows DevOps teams to start chaos engineering practices in lower environments more liberally and efficiently.
Governance around chaos orchestration is needed so that chaos experiments run in a controlled way, avoiding both the unwanted loss of setups in lower environments and unexpected critical incidents in production caused by self-inflicted chaos.
Controlling the blast radius of chaos experiments is crucial to keeping services running smoothly while resilience is being verified. Finding a weakness during chaos experimentation should not come at the cost of disrupting the service at an unexpected time or for an unexpectedly long duration. The administrator overseeing chaos experimentation should make sure that it happens outside critical business hours, is carried out by the right people, and involves only safe experiments. The administrator should be able to set up rules in the product to enforce this desired behavior.
This feature provides the capability to govern how team members orchestrate chaos. Administrators can set up rules that govern who can run which chaos experiments, on which targets, and during which time windows.
The ChaosGuard feature has individual rules and conditions that can be configured at the project level by those who have the required RBAC permissions. A rule contains one or more conditions. The rules control who can run a chaos experiment, and the conditions control the target clusters or namespaces and the types of faults. By combining rules and conditions, the administrator can effectively control who can run a given experiment, the time window of a chaos experiment, the targeted resources, and the types of chaos faults.
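To make the relationship between rules and conditions concrete, here is a minimal Python sketch of how such a check could be modelled. It is an illustration only: the `Condition` and `Rule` classes, their field names, and the allow-list evaluation are assumptions made for this example and do not reflect the actual ChaosGuard schema or its evaluation semantics.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Condition:
    """Hypothetical condition: the targets and fault types it covers."""
    clusters: frozenset
    namespaces: frozenset
    faults: frozenset

    def matches(self, cluster: str, namespace: str, fault: str) -> bool:
        return (
            cluster in self.clusters
            and namespace in self.namespaces
            and fault in self.faults
        )


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: who may run chaos, when, and under which conditions."""
    allowed_users: frozenset
    window_start: time   # inclusive, same-day window assumed for simplicity
    window_end: time     # exclusive
    conditions: tuple    # one or more Condition objects

    def allows(self, user: str, when: datetime,
               cluster: str, namespace: str, fault: str) -> bool:
        in_window = self.window_start <= when.time() < self.window_end
        target_ok = any(
            c.matches(cluster, namespace, fault) for c in self.conditions
        )
        return user in self.allowed_users and in_window and target_ok


# Example: only "alice" may run pod-delete on dev-cluster/checkout,
# and only between 18:00 and 23:00.
evening_rule = Rule(
    allowed_users=frozenset({"alice"}),
    window_start=time(18, 0),
    window_end=time(23, 0),
    conditions=(
        Condition(
            clusters=frozenset({"dev-cluster"}),
            namespaces=frozenset({"checkout"}),
            faults=frozenset({"pod-delete"}),
        ),
    ),
)
```

In this sketch a run is permitted only when the user, the time window, and at least one condition all match, which mirrors the "who, what, where, when" combination described above.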
Resilience Probe Dashboards
A typical chaos experiment consists of a single chaos fault, during which multiple steady states are observed for any deviation. This new feature lets users do exactly the opposite: they can see the effects of different chaos faults on a single steady state of a service.
With this feature, you can see the history of chaos faults that were run while observing a specific steady state, such as the status code of a service URL. This helps identify potential areas of weakness for a given part of the service.
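The inversion described here, from "one fault, many steady states" to "one steady state, many faults", can be sketched as a simple regrouping of results. The snippet below is a hypothetical illustration: the `ProbeResult` record, its fields, and the sample data are assumptions made for this example, not the product's data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical record of one probe check during one fault run."""
    experiment: str
    fault: str
    probe: str
    passed: bool


def history_by_probe(results):
    """Group (fault, passed) pairs under the probe that observed them."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r.probe].append((r.fault, r.passed))
    return dict(grouped)


def failing_faults(results, probe):
    """Return the sorted fault names under which the given probe failed."""
    return sorted(
        {r.fault for r in results if r.probe == probe and not r.passed}
    )


# Made-up sample data for illustration.
sample_results = [
    ProbeResult("exp-1", "pod-delete", "checkout-url-status", True),
    ProbeResult("exp-2", "network-latency", "checkout-url-status", False),
    ProbeResult("exp-3", "cpu-hog", "checkout-url-status", False),
    ProbeResult("exp-3", "cpu-hog", "db-connection", True),
]
```

Calling `failing_faults(sample_results, "checkout-url-status")` on this sample lists the faults that broke that one steady state, which is the kind of per-probe view the dashboard is meant to give.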
In-product Chaos Sandbox
Harness Chaos Engineering now comes with a FREE sandbox environment for everyone. By default it provides about 20 hours of operational run time, which can be managed one hour at a time. The sandbox comes fully packaged with a sample application and pre-built chaos infrastructure. Using the sandbox, users interactively learn how to run and observe chaos experiments while directly observing the sample application.
Users can run their first chaos experiment within 5 minutes of signing up.
In-product Chaos Engineering certification
With the sandbox, it is easy to learn how chaos experiments are structured, how to run them, and how to make sense of the resulting resilience scores. It also helps you earn the lab examination credits that are mandatory for achieving the “Harness Certified Expert in Chaos Engineering - Developer” certification. You can start the certification process here.
Chaos ROI calculator
The need for Chaos Engineering practices is well understood in environments that have invested in modern DevOps tools and practices. However, every organisation needs data to justify the investment in this new practice, especially when it takes additional developers to design, write, automate and maintain the chaos experiments in lower environments, and SREs to do the same in production environments. Investing in chaos engineering can positively surprise you when you have frequent incidents and frequent developer/SRE war rooms.
Harness has unveiled a simple yet complete ROI calculator for estimating the costs and potential benefits of investing in Harness Chaos Engineering. Check out the Chaos ROI calculator.
Sign up today and run your sample chaos experiment in less than 5 minutes. It is FREE too!