April 25, 2023

How to Perform a Root Cause Analysis


Root cause analysis (RCA) is a systematic approach used to identify, understand, and resolve the underlying issues that lead to a problem or failure in a process. By determining the root cause, organizations can prevent the same issue from recurring and continuously improve performance. 

This blog discusses the various steps involved in performing a root cause analysis, including identifying the underlying cause, assessing root causes, utilizing the "5 Whys" method and Ishikawa diagram, prioritizing and implementing solutions, and evaluating success.

Identifying the Underlying Cause

Identifying the underlying cause is a critical step in performing a root cause analysis and creates the foundation for the entire analysis. Teams should approach this stage with an open mind, consider all possible causes, and avoid jumping to conclusions until all evidence has been gathered. During this step, teams should focus on collecting as much information as possible.

Identification steps should include:

  • Reviewing any documentation related to the incident, such as reports, logs, and other records
  • Interviewing team members who were involved in or have knowledge of the incident
  • Examining any physical evidence related to the incident, such as hardware damage

Once teams have gathered all relevant information, they can then analyze the data to determine the most likely root cause.

Assessing Root Causes

The next step is to assess the root causes that contributed to the problem. This can be a complex and challenging process, because several factors often contribute to a single problem or failure. A structured approach, such as a decision-making framework or a quantitative technique, helps teams avoid overlooking critical factors. Two popular methodologies are the 5 Whys method and the Ishikawa diagram.

The 5 Whys Method

The 5 Whys method is a simple yet powerful process for identifying root causes. The technique involves asking "Why?" five times (or as many times as needed) to drill down to the core issue. Each answer becomes the subject of the next question, which encourages critical thinking and moves the team from surface symptoms toward the underlying cause.

However, simply asking "why" isn't effective on its own. Teams must ask the right questions and maintain focus on the problem at hand. Bringing in individuals with diverse perspectives can help the process stay objective and thoroughly analyze the issue.
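To make the questioning easier to review afterward, a team might record each answer as it goes. The following is a minimal sketch, not a prescribed tool: the `WhyChain` class and the incident details are hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WhyChain:
    """Records a problem statement and the successive answers to "Why?"."""

    problem: str
    answers: list[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> None:
        # Each recorded answer becomes the subject of the next "Why?".
        self.answers.append(answer)

    def root_cause(self) -> str:
        # The last answer is the current best candidate, not a proven cause.
        return self.answers[-1] if self.answers else self.problem

    def render(self) -> str:
        lines = [f"Problem: {self.problem}"]
        for depth, answer in enumerate(self.answers, start=1):
            lines.append(f"{'  ' * depth}Why #{depth}: {answer}")
        return "\n".join(lines)


# Hypothetical incident, invented for illustration only.
chain = WhyChain("Checkout API returned errors for 20 minutes")
chain.ask_why("The service ran out of database connections")
chain.ask_why("A new release opened a connection per request")
chain.ask_why("Connection pooling was disabled in the release config")
chain.ask_why("The config template had no default for the pool setting")
chain.ask_why("Config changes are not covered by review or automated checks")

print(chain.render())
print("Candidate root cause:", chain.root_cause())
```

Note that the final answer points at a gap in the process rather than at a person, which is the kind of outcome the method is meant to reach.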

Blameless Post-Mortems

While the 5 Whys technique is effective for identifying the root cause, it can also create a culture of blame and lead to defensiveness. Creating a psychologically safe environment is crucial for open and honest discussions, and blameless post-mortems are an excellent way to create this. Blameless post-mortems are systematic analyses of incidents aimed at understanding their causes and implementing preventative measures. 

The primary principle is to focus on learning rather than assigning blame to individuals. In addition, the benefits include: 

  • Focuses on systemic issues: By going beyond individual mistakes, the 5 Whys technique helps uncover systemic problems that contribute to incidents.
  • Encourages collaboration: The blameless approach fosters open discussions, allowing diverse perspectives and expertise to contribute to the analysis.
  • Promotes a learning culture: Blameless post-mortems emphasize learning from incidents, enabling teams to make improvements and prevent future occurrences.
  • Prevents recurrence: Identifying root causes and implementing preventive measures reduces the likelihood of repeating the same mistakes.

To conduct a post-mortem:

  • Assemble a diverse team of stakeholders involved in the incident, including technical and non-technical team members.
  • Establish a timeline of events leading up to the incident, ensuring all relevant information is gathered.
  • Apply the 5 Whys technique to each contributing factor or event identified in the timeline.
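The timeline step above can be sketched in code. This example assumes events have been collected from several sources (the `Event` type, source names, and event data are all hypothetical) and simply merges them into chronological order so the team can walk through them together.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    source: str       # where the record came from, e.g. a log or an interview
    description: str


def build_timeline(events: list[Event]) -> list[Event]:
    # Sort by timestamp so records from different sources interleave correctly.
    return sorted(events, key=lambda e: e.timestamp)


# Hypothetical records, invented for illustration only.
events = [
    Event(datetime(2023, 4, 25, 14, 12), "alerting", "Error-rate alert fired"),
    Event(datetime(2023, 4, 25, 14, 2), "deploy log", "Release rolled out"),
    Event(datetime(2023, 4, 25, 14, 30), "interview", "On-call rolled back the release"),
]

timeline = build_timeline(events)
for e in timeline:
    print(f"{e.timestamp:%H:%M} [{e.source}] {e.description}")
```

Each entry in the sorted timeline is then a candidate starting point for a round of "Why?" questions.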

Benefits of Blameless Post-Mortems with the 5 Whys

Blameless post-mortems, combined with the 5 Whys technique, provide a powerful framework for analyzing incidents in a way that focuses on learning and improvement. By encouraging a blame-free environment, teams can identify root causes, implement preventive measures, and foster a culture of continuous learning and growth.

Ishikawa Diagram

The Ishikawa diagram, also known as a fishbone diagram or cause-and-effect diagram, is another useful tool for root cause analysis. The diagram resembles a fishbone, with the effect or problem at the "head" and potential causes extending outwards as "bones." This method enables teams to visually examine the cause-and-effect relationships between contributing factors, enhancing understanding and promoting collaborative problem-solving.

To create an Ishikawa diagram, brainstorm potential causes, categorize them into related groups, and then arrange them around the central problem.
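The grouping step can be sketched as a small script that turns brainstormed causes into a text outline of the fishbone. The category names and causes below are hypothetical examples; teams typically choose categories that fit their own domain.

```python
from collections import defaultdict


def build_fishbone(problem: str, causes: list[tuple[str, str]]) -> str:
    """Group (category, cause) pairs and render them as an indented outline."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for category, cause in causes:
        grouped[category].append(cause)

    # The problem is the "head"; each category is a "bone" with its causes.
    lines = [f"Problem: {problem}"]
    for category in sorted(grouped):
        lines.append(f"  {category}")
        for cause in grouped[category]:
            lines.append(f"    - {cause}")
    return "\n".join(lines)


# Hypothetical brainstorm output, invented for illustration only.
causes = [
    ("Process", "No review step for config changes"),
    ("Tooling", "CI does not validate config templates"),
    ("People", "On-call engineer unfamiliar with the service"),
    ("Process", "Release checklist omits load testing"),
]

outline = build_fishbone("Checkout API outage", causes)
print(outline)
```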

Prioritizing Solutions

Once root causes have been identified and assessed, the next step is to prioritize possible solutions to implement. Given that resources are often limited, it is essential to prioritize solutions based on factors such as cost-effectiveness, ease of implementation, and potential impact.

  • Cost-effectiveness is a crucial factor to consider when prioritizing solutions. Organizations need to ensure that they are getting the most bang for their buck when implementing solutions. Cost-effective solutions can help organizations save money while still addressing the root causes of the problem. For example, implementing energy-efficient measures can lead to significant cost savings on utility bills over time.
  • Ease of implementation is another important consideration when prioritizing solutions. Solutions that are easy to implement can be put into action quickly, reducing the time it takes to address the root causes of the problem.
  • Potential impact is also a critical factor to consider when prioritizing solutions. Solutions that have a high potential impact can lead to significant improvements in the organization's operations.

Teams should evaluate each proposed solution against these criteria and create a priority list. Organizations can use techniques such as cost-benefit analysis or decision matrix analysis to support the prioritization process. These techniques can help teams objectively evaluate each solution and determine which ones are the most cost-effective, easiest to implement, and have the highest potential impact.
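A decision matrix can be as simple as a weighted score per solution. The sketch below assumes each solution is rated from 1 to 5 on the three criteria discussed above; the weights, solution names, and ratings are hypothetical and would need to be agreed on by the team.

```python
# Assumed weights; they sum to 1.0 and reflect one possible set of priorities.
WEIGHTS = {
    "cost_effectiveness": 0.3,
    "ease_of_implementation": 0.2,
    "potential_impact": 0.5,
}


def weighted_score(scores: dict[str, float]) -> float:
    # Multiply each criterion's rating by its weight and add the results.
    return sum(WEIGHTS[criterion] * scores[criterion] for criterion in WEIGHTS)


def prioritize(solutions: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    ranked = [(name, round(weighted_score(s), 2)) for name, s in solutions.items()]
    # Highest score first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical solutions with 1-5 ratings, invented for illustration only.
solutions = {
    "Add config validation to CI": {
        "cost_effectiveness": 4, "ease_of_implementation": 4, "potential_impact": 5,
    },
    "Rewrite the connection layer": {
        "cost_effectiveness": 2, "ease_of_implementation": 1, "potential_impact": 4,
    },
    "Add a pool-exhaustion alert": {
        "cost_effectiveness": 5, "ease_of_implementation": 5, "potential_impact": 3,
    },
}

priority_list = prioritize(solutions)
for name, score in priority_list:
    print(f"{score:.2f}  {name}")
```

Changing the weights changes the ranking, so it is worth recording why a given set of weights was chosen alongside the result.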

Identifying and prioritizing solutions helps allocate resources strategically, ensuring maximum effectiveness in addressing the root causes. By focusing on the most critical solutions first, organizations can make the most of their limited resources and achieve the greatest impact.

Implementing Solutions

After prioritizing, the focus turns to implementing solutions. Effective implementation requires a well-defined action plan detailing the specific steps, responsibilities, and timelines for each solution. Organizations should also establish monitoring and reporting mechanisms to track progress and address any issues or obstacles that arise during execution. It's also important to stay adaptable. The original plan may need to change in response to unexpected events or emerging insights, so organizations should review progress regularly and be prepared to adjust as needed.

Team Buy-In and Communication

Ensuring that all team members are on board with the plan is essential. This can be achieved through regular team meetings, progress reports, and status updates. Communicating the plan clearly and effectively to all stakeholders helps every team member understand their role and responsibilities.

Throughout the implementation process, frequent communication and collaboration are crucial to ensure alignment among team members and stakeholders. By keeping everyone informed and involved, organizations can avoid misunderstandings and ensure that everyone is working toward the same goals.

Finally, be sure to celebrate successes along the way. Recognizing and rewarding progress can help motivate team members and keep them engaged in the implementation process. This can be achieved through public recognition, team-building activities, or other forms of positive reinforcement.

Resource Allocation

Another critical factor in successfully implementing the solutions identified through root cause analysis is the allocation of resources. Organizations need to ensure that they have the necessary resources, including personnel, technology, and funding, to execute the plan effectively. This may require reallocating resources from other areas or securing additional funding from external sources.

Evaluating Success

Evaluating the success of implemented solutions is the final step in performing a root cause analysis. This evaluation should involve measuring the effectiveness of the solutions in addressing the root causes and preventing recurrence. Criteria for success may include reduced incidence of the problem, minimized associated costs, or improved process performance.

One way to evaluate the success of implemented solutions is to collect and analyze data. This data can include metrics such as the number of incidents before and after implementation, the cost savings achieved, and the level of customer satisfaction. Analyzing this data can provide insights into the effectiveness of the solutions and identify any areas for improvement.
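A before-and-after comparison can be computed with a few lines of code. The sketch below uses made-up figures and hypothetical metric names; in practice the numbers would come from the organization's own incident records.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    if before == 0:
        raise ValueError("before must be non-zero to compute a relative change")
    return (after - before) / before * 100


# Hypothetical metrics, invented for illustration only.
before = {"incidents_per_month": 12, "mean_minutes_to_restore": 90}
after = {"incidents_per_month": 3, "mean_minutes_to_restore": 45}

changes = {metric: percent_change(before[metric], after[metric]) for metric in before}
for metric, change in changes.items():
    print(f"{metric}: {change:+.1f}%")
```

A negative value here means the metric went down, which is the desired direction for both of these examples. A single comparison does not prove the solution caused the change, so it helps to track the same metrics over several periods.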

Another way to evaluate success is to gather feedback from stakeholders. This feedback can come from employees, customers, suppliers, and other relevant parties. By soliciting feedback, organizations can gain a better understanding of how well the solutions are working and whether any areas need improvement.

Continuous Improvement and Learning

Evaluating success is not a one-time event. Periodic assessments will help identify any areas requiring adjustments or further intervention. This ongoing evaluation process ensures that the solutions implemented are still effective and that any new issues are addressed quickly. Continuous monitoring and evaluation will help support continuous improvement and learning.

Get Started with Harness

Performing a root cause analysis is a critical endeavor for organizations seeking to identify, understand, and resolve the underlying issues that lead to problems or failures in their processes. By following the steps outlined in this article, organizations can conduct a thorough root cause analysis, improve their processes, and build the resilience needed to prevent the same issues from arising again. This benefits not only the organization but also its customers and stakeholders.

At Harness, our end-to-end software delivery platform provides a simple, safe, and secure way for engineering and DevOps teams to release applications into production. The Harness Service Reliability Management (SRM) module helps SREs and IT operations cut through the noise of monitoring data overload to identify the components or changes that led to an incident. This dramatically reduces the time and resources needed to restore service and perform root cause analysis.

Interested in learning more? Contact us for a free demo
