May 16, 2024

Building robust and resilient Harness pipelines with Failure Handling support

Key takeaway

Harness pipelines provide robust failure handling mechanisms to ensure resilience and stability in software deployment processes. By configuring failure strategies at different levels—stage, step, and step group—teams can manage deployment failures, infrastructure issues, and human errors effectively. Features like failure rollback and post-deployment rollback enable quick recovery from errors, while marking stages or pipelines as failed allows for immediate intervention. These strategies support continuous improvement and help maintain system reliability by reverting to the last known good state during failures.

Overview

Handling failures effectively is crucial for maintaining the stability and reliability of your systems and for fostering a culture of continuous improvement. The basic principle for handling any failure in a CI/CD pipeline is to revert to the last known good state, so that users are not affected by a broken release or by downtime.

In this blog, we will look at the different failure scenarios and at how to handle failures effectively in Harness pipelines.

Failure Scenarios

Failures in a pipeline can occur at various stages and for different reasons. Let’s discuss a few of them and how to handle them gracefully.

  1. Deployment Failures:
    1. Failed Deployment Scripts: Errors in scripts or tools used for deployment can cause the deployment to fail.
    2. Environment Configuration Issues: Mismatched or incorrect environment configurations can lead to deployment failures.
  2. Infrastructure Issues:
    1. Resource Limitations: Insufficient resources (CPU, memory, disk space) on servers or build agents can cause failures.
    2. Network Issues: Network connectivity problems can disrupt the pipeline, especially in stages that involve communication with remote servers or services.
  3. Human Errors:
    1. Incorrect Configuration Changes: Manual changes to configuration files or environment settings can introduce errors.
    2. Mistakes in Code Merge: Errors during the merging of code branches can lead to conflicts and failures.
  4. Unresolved Expressions and Environment Variables: Unresolved variables and expressions throw exceptions and can cause pipelines to fail.
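
The last scenario can often be caught before a step runs. The following is a minimal Python sketch, not part of Harness, that scans a string which is expected to be fully rendered for leftover placeholders. It assumes Harness-style `<+...>` expressions and shell-style `${VAR}` references; the function name `find_unresolved` and the sample command are hypothetical.

```python
import re

# Assumed placeholder shapes: Harness-style expressions such as <+pipeline.name>
# and shell-style environment references such as ${HOME}.
_PLACEHOLDER = re.compile(r"<\+[^<>]+>|\$\{[A-Za-z_][A-Za-z0-9_]*\}")


def find_unresolved(rendered: str) -> list[str]:
    """Return every placeholder still present in a rendered string."""
    return _PLACEHOLDER.findall(rendered)


if __name__ == "__main__":
    # Hypothetical command that should contain no placeholders at this point.
    command = "kubectl apply -n <+env.variables.namespace> -f ${MANIFEST_DIR}/app.yaml"
    leftovers = find_unresolved(command)
    if leftovers:
        print("Unresolved placeholders:", leftovers)
```

A check like this could run as an early validation step, so that a missing variable fails fast instead of surfacing halfway through a deployment.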

Ways to gracefully handle failures with Harness

Failures are a part of your DevOps journey. This statement may seem contradictory, since DevOps is often seen as a way to avoid failures. But each failure serves as a crucial learning opportunity that drives continuous improvement and resilience in software development and operations.

Harness pipelines let you handle failures with failure strategies at the stage, step, and step group level. DevOps experts can configure failure strategies at each of these levels and handle failures gracefully.

Learn more about different failure strategies supported by Harness.

You can configure failure strategies for the different errors that can arise during the execution of a pipeline.

Learn more about different error types that can be selected in failure strategy.
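
To make the idea of levels and error types concrete, here is a small Python model of how a matching strategy might be selected. It is a sketch for illustration only and does not use any Harness API. The precedence order (step, then step group, then stage) is an assumption of this sketch, and the error type and action names (`AllErrors`, `Timeout`, `Retry`, `StageRollback`) are illustrative labels rather than an authoritative list.

```python
from dataclasses import dataclass
from typing import Optional

ALL_ERRORS = "AllErrors"  # illustrative catch-all error type


@dataclass(frozen=True)
class FailureStrategy:
    errors: frozenset[str]  # error types this strategy handles
    action: str             # e.g. "Retry", "Ignore", "StageRollback"

    def handles(self, error: str) -> bool:
        return error in self.errors or ALL_ERRORS in self.errors


def resolve_action(error, step=(), step_group=(), stage=()) -> Optional[str]:
    """Pick the action for an error, checking the narrowest level first.

    Assumption of this sketch: step beats step group, which beats stage.
    """
    for level in (step, step_group, stage):
        for strategy in level:
            if strategy.handles(error):
                return strategy.action
    return None  # no configured strategy matched


# Hypothetical configuration: retry timeouts on one step, roll back otherwise.
stage_level = [FailureStrategy(frozenset({ALL_ERRORS}), "StageRollback")]
step_level = [FailureStrategy(frozenset({"Timeout"}), "Retry")]

print(resolve_action("Timeout", step=step_level, stage=stage_level))       # Retry
print(resolve_action("Connectivity", step=step_level, stage=stage_level))  # StageRollback
```

The point of the model is that a narrow strategy handles the errors it names, while a broad stage-level strategy acts as the fallback for everything else.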

Let's dive deeper into the failure handling mechanisms in Harness, which will make it easier for you to configure appropriate failure handling for your pipeline.

Failure handling mechanisms include failure rollback and post-deployment rollback.

Failure rollback

Suppose you have a deployment pipeline with a stage that deploys to a certain environment, and you want to roll back that stage if an approval is rejected. In this case, you can use the Rollback Pipeline feature in Harness.

Learn more about Rollback Pipeline in Harness.

Post Deployment Rollback

Let’s suppose you deployed a major update to your platform, which includes several new features and improvements aimed at enhancing user experience. The deployment was successful, and the new features went live without any immediate issues. However, a few hours after the deployment, customer support begins receiving a high volume of complaints; for example, users are not able to log in to your platform or complete purchases. To address this quickly and limit the impact on users, you decide to roll back to the previous stable version of the deployed service. You have two options: re-deploying the previous version or performing an immediate rollback. Re-deploying takes time, and time is exactly what you are short of in this scenario. This is where the Post-Deployment Rollback feature in Harness comes in handy. It helps you quickly roll back to the previous stable version of your service, so you can spend the rest of your time solving the issues that appeared in your new release.

Learn more about using Post Deployment Rollback in Harness.
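
The idea of rolling back to the previous stable version can be sketched in a few lines of Python. This is a conceptual model only: Harness tracks deployment history itself, and the `Deployment` class, the `healthy` flag, and the version numbers below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    version: str
    healthy: bool  # whether this release was judged stable in production


def last_known_good(history: list[Deployment]) -> Optional[Deployment]:
    """Return the most recent healthy deployment before the current one."""
    # history is ordered oldest -> newest; the last entry is the live release.
    for deployment in reversed(history[:-1]):
        if deployment.healthy:
            return deployment
    return None


history = [
    Deployment("1.4.0", healthy=True),
    Deployment("1.5.0", healthy=True),
    Deployment("2.0.0", healthy=False),  # the release users are complaining about
]
target = last_known_good(history)
print(target.version if target else "no stable version to roll back to")  # 1.5.0
```

Because the rollback target is already known and was already deployed once, switching back to it avoids the time a fresh re-deployment would take.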


How can users stop pipeline executions and roll back using the failure handling mechanisms?

Mark stage as failed

Imagine you are deploying an application and have reached the "Wait For Steady State" step in your pipeline. During this step, your monitoring system detects an error in your cluster, indicating that something is wrong with the deployment. Instead of waiting for the step to complete and potentially fail, you want to immediately trigger a rollback or abort the deployment process to prevent further issues. This is where the Mark Stage as Failed functionality in Harness comes into the picture.

When users mark a stage as failed, the configured rollback mechanisms are triggered automatically through the failure strategies discussed above.

Learn more about marking a stage as failed in Harness.

Marking Pipeline as Failed

Let's take a scenario where a critical bug is found after you have started the deployment pipeline, and you now want to stop it. Aborting the pipeline is not the suggested approach, because it does not clean up your resources and does not let each stage in the pipeline apply its failure strategy. To solve this, Harness recommends using "Mark Pipeline as Failed", which marks the pipeline as failed and also applies each stage's failure strategy.

Learn more about Marking pipeline as Failed in Harness.

Marking a Pipeline as Failed vs Aborting a Pipeline

When we perform a rollback, we aim to clean up resources. Marking a pipeline as failed applies the configured failure strategies and initiates resource cleanup by executing a rollback deployment. Aborting a deployment pipeline, by contrast, is not preferred because it doesn't clean up resources and doesn't apply the failure strategies defined at either the stage or step level. Aborting remains a reasonable choice when the pipeline only runs a simple script or task that doesn't create any resources.
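
The difference can be illustrated with a toy Python model. This is a sketch under stated assumptions and not how Harness is implemented: the `Stage` class, its `rollback` method, the resource names, and the choice to roll back the newest stage first are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    resources: list[str] = field(default_factory=list)  # resources created so far

    def rollback(self) -> None:
        """Stand-in for a stage's failure strategy: release what it created."""
        self.resources.clear()


def abort(stages: list[Stage]) -> list[str]:
    """Stop immediately; no failure strategy runs, so resources are left behind."""
    return [r for s in stages for r in s.resources]


def mark_as_failed(stages: list[Stage]) -> list[str]:
    """Run each stage's failure strategy (newest stage first), then report leftovers."""
    for stage in reversed(stages):
        stage.rollback()
    return [r for s in stages for r in s.resources]


def sample_run() -> list[Stage]:
    return [Stage("build", ["temp-volume"]), Stage("deploy", ["canary-pods"])]


print(abort(sample_run()))           # ['temp-volume', 'canary-pods']
print(mark_as_failed(sample_run()))  # []
```

In this model, aborting leaves both resources behind, while marking the run as failed gives every stage the chance to clean up after itself.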

Conclusion

By understanding the different failure scenarios and using Harness features such as marking stages or pipelines as failed and post-deployment rollback, you can ensure stability and resilience in your software deployment processes.

Robust failure handling mechanisms, combined with interrupt support for stopping pipeline executions, help Harness pipelines recover cleanly when something goes wrong.

Learn more about failure handling in Harness.
