4 questions to ask for an effective technical postmortem

A technical postmortem is a retrospective of a failure. It’s a preventative step that can help you quickly identify and address issues with your assets, systems, or other technology platforms so they don’t happen again. Postmortems are commonly used in maintenance, but they also have applications in software development and design.

What is a technical postmortem?

A technical postmortem is a retrospective analysis of events that resulted in a technical failure.

The purpose of a technical postmortem is to:

  • Find out what went wrong and why
  • Identify trouble areas
  • Determine what can be done to prevent future failures
  • Create best practices for your business
  • Inform process improvements, mitigate future risks, and promote iterative best practices

4 questions to ask during a technical postmortem

This postmortem outline is not meant to be comprehensive but to serve as a starting point for your technical postmortem. These questions generate discussion about what went well, what the team struggled with during the failure, and what the team would do differently moving forward.

Here’s what you and your team should be asking during a technical postmortem:

1. What happened?

You can’t analyze what you don’t understand, so establishing a clear understanding of what went wrong is crucial.

2. Why did it happen?

Identify the major events that led to the failure and try to isolate its root causes. Determine whether the events are the underlying causes of the failure or whether they initiated a process that led to the technical failure. Underlying causes can include design defects, process defects, or poor maintenance practices.

Don’t look strictly at the technical causes of the failure; also examine the underlying management and team environment. Sometimes team members ignore warning signs of impending failure because of the organizational culture, time crunches, or budget pressure.

3. How did we respond and recover?

How your team responds to failure can determine how quickly you identify the root cause and fix it. A major technical failure can have a direct impact on shareholder value, revenues, market share, and brand equity, so a quick recovery is paramount.

A useful technical postmortem requires a reasonable level of honesty, insight, and cooperation from the organization. The outcome of the postmortem should be to recognize what worked and fix the processes that didn’t. Remember, the idea is to learn from your successes and failures, not just to document them.

4. How can we prevent similar unexpected issues from occurring again?

Unexpected technical issues will arise in mission-critical or complex hardware systems. The key to prevention is technical planning that keeps a problem from affecting the entire system. Each of the failures uncovered in question two represents a risk going forward, so schedule regular inspections or system checks in your maintenance management software.

When a risk is detected, certain actions should be triggered immediately to prevent similar failures. Planning must also consider the business process and management responses the team initiates when a failure occurs. A complete postmortem addresses both technical and management issues.

Don’t turn your postmortem into a blame game. Instead, management has to develop a reputation for listening openly to input and not punishing people for being honest. A well-run postmortem can help a maintenance team create a culture of continuous improvement.
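One way to keep the four questions honest is to record the answers in a structured form, so that every root cause found under question two is paired with a preventive action under question four. The sketch below is a minimal illustration in Python; the `Postmortem` class, its field names, and the example values are hypothetical and not part of any CMMS or incident tool.

```python
from dataclasses import dataclass, field


@dataclass
class Postmortem:
    """Hypothetical record holding the answers to the four postmortem questions."""
    what_happened: str            # Question 1: what happened?
    root_causes: list[str]        # Question 2: why did it happen?
    response_and_recovery: str    # Question 3: how did we respond and recover?
    # Question 4: maps each root cause to the preventive action planned for it
    preventive_actions: dict[str, str] = field(default_factory=dict)

    def unaddressed_causes(self) -> list[str]:
        """Return the root causes that do not yet have a preventive action."""
        return [c for c in self.root_causes if not self.preventive_actions.get(c)]


# Hypothetical example values, loosely based on the causes named above.
pm = Postmortem(
    what_happened="Pump failure halted the production line",
    root_causes=["Poor maintenance practices", "Warning signs ignored under time pressure"],
    response_and_recovery="Line stopped, pump replaced, production restarted",
    preventive_actions={
        "Poor maintenance practices": "Schedule regular inspections in the maintenance software",
    },
)
print(pm.unaddressed_causes())  # ['Warning signs ignored under time pressure']
```

A check like `unaddressed_causes()` makes it visible when the postmortem has documented a risk but not yet assigned anything to prevent it.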

The benefits of conducting a technical postmortem

A technical postmortem has a series of benefits, including a detailed analysis of why an asset failed. It can also help you avoid future problems by identifying issues that exist before any kind of launch.

A technical postmortem can also benefit you by:

  • Identifying potential problems with an asset
  • Improving the way your team approaches new projects
  • Learning from mistakes so they don’t happen again
  • Gaining insights into how other teams have handled similar situations

Some next steps after your technical postmortem is completed

After a technical postmortem is conducted and the project is concluded, the team holds a postmortem meeting. This meeting is intended to review the project from start to finish and determine what can be optimized and improved for the next project. Generally, the project manager and team attend these meetings, but they are open to anyone who was part of the project.

Tips and tricks to keep in mind during and after your technical postmortem

  • A postmortem can help you become more effective by learning from mistakes and focusing on what worked best, but it’s up to you to structure the meeting to get the most out of it. One way to do this is to set a clear agenda: begin with a recap of the project objectives, review the results and whether the project met those objectives, and finally analyze the successes and failures and why they occurred.
  • You can ensure that your technical postmortem is successful by carefully preparing in advance, analyzing the failure systematically, producing actionable findings, and actively sharing the results.
  • Don’t let your team’s momentum fade. Schedule the postmortem soon after the failure or the end of the project; a technical postmortem should occur within one to two weeks of the technical failure.
  • Store your postmortems in the asset record in your computerized maintenance management system (CMMS) so they can be easily found later and help prevent similar failures.

A technical postmortem is an important tool for maintaining and improving your systems

A technical postmortem is a tool that allows you to learn from mistakes, identify the root cause of a problem, and improve your systems. It may sound like an abstract concept, but it’s actually quite simple: you document what went wrong and use that information to prevent the same issue from happening again.

Source: https://www.fiixsoftware.com/blog/4-questions-effective-technical-post-mortem/

What is a Fishbone Diagram?

A fishbone diagram, also known as an Ishikawa diagram or a cause-and-effect diagram, was developed by Kaoru Ishikawa and popularized in the 1960s. It is a visual representation of the causes of a problem or failure. The diagram is structured like a fish skeleton: the problem or event is represented by the head of the fish, and its causes branch off the bones.

Fishbone diagrams are used in maintenance to identify the root cause of a problem. They can also be used to identify patterns and trends, which can help prevent similar problems from occurring in the future. In this article, we will define what a fishbone diagram is and share a use-case example of how a fishbone diagram can be used.

What is a fishbone diagram?

A fishbone diagram is a visual tool that shows all the possible reasons a problem or event may have occurred, along with their sources. It can be useful when the maintenance team is coming up short while troubleshooting an issue. Every possible cause is categorized by its source, and the causes are then narrowed down again and again until you can isolate the root cause of the problem or outcome.


How do fishbone diagrams work?

A fishbone diagram helps maintenance teams trace the steps that could have led up to a problem, like a piece of equipment breaking down. Take an aircraft, for example. Let’s say the ground crew engineer discovers that a compressor is malfunctioning. There are many possible causes of the malfunction, but by using a fishbone diagram, the crew can break the problem down into main categories. In this instance, you could sort the possible causes into the following categories:

  1. Personnel: List anyone who may have performed maintenance or repairs on the aircraft
  2. Machinery: Define and outline the technology
  3. Materials: List the raw parts used to construct the aircraft
  4. Measurements: Detail the inspections performed and the steps taken
  5. Environment: Detail the climate, geographical, and other factors relating to the environment
  6. Methods: List the processes

In categories 1 and 3 (personnel and materials), you could break the problem down into more detail. You know that some compressor parts were just replaced, and some new staff were working on the plane recently. You can now expand on the primary categories and see if you can identify the factor that caused the overall effect. For example:

Materials

  • New parts may have been installed improperly
  • A part is malfunctioning or was not inspected properly

Personnel

  • A technician installed the compressor incorrectly
  • Some tools may have been left inside the compressor housing
  • Something was jamming the compressor’s rotation, and the mechanic missed it
  • The pilot pushed the compressor too far and may have damaged it during the flight

Environment

  • Bird or drone strike

Measurements

  • The turbine was inspected and compressor wear was noted
  • The parts and labor inventory for the aircraft lists every part and staff member active around the aircraft in a 48-hour span
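The expanded categories can be held in a simple mapping from each category (a "bone") to its candidate causes, which is enough to print a text outline of the diagram. This is a minimal sketch in Python; the `render_fishbone` function and its plain-text layout are illustrative assumptions, not a standard fishbone format. Machinery and Methods are left empty because the example does not expand them.

```python
def render_fishbone(problem: str, causes: dict[str, list[str]]) -> str:
    """Render a fishbone diagram as an indented text outline."""
    lines = [f"Problem (head): {problem}"]
    for category, items in causes.items():
        lines.append(f"  {category}")  # one bone per category
        if not items:
            lines.append("    (no causes listed yet)")
        for item in items:
            lines.append(f"    - {item}")  # a candidate cause on that bone
    return "\n".join(lines)


# Candidate causes from the compressor example above.
compressor_causes = {
    "Personnel": [
        "A technician installed the compressor incorrectly",
        "Some tools may have been left inside the compressor housing",
        "Something was jamming the compressor's rotation, and the mechanic missed it",
        "The pilot pushed the compressor too far and may have damaged it",
    ],
    "Machinery": [],
    "Materials": [
        "New parts may have been installed improperly",
        "A part is malfunctioning or was not inspected properly",
    ],
    "Measurements": [
        "The turbine was inspected and compressor wear was noted",
        "Parts and labor inventory lists everyone active in a 48-hour span",
    ],
    "Environment": ["Bird or drone strike"],
    "Methods": [],
}

print(render_fishbone("Compressor malfunction", compressor_causes))
```

Keeping empty categories in the output is deliberate: it shows which bones of the diagram have not been investigated yet.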

The causes that branch off the first set of categories bring you closer to discovering the root cause of the problem. You have identified the main possibilities, and now you can expand on each one by choosing the most probable cause. This is what that might look like in our example:

The mechanic installed a part incorrectly, which caused a malfunction that damaged the turbine during a flight. This is the most probable primary cause of the failure.

Now that this hypothesis has been formed, inspections can focus on the suspected part and its installation, which means less time searching for the problem and less overall downtime for the aircraft. Better still, if this type of problem is documented, the team can plan preventive and predictive maintenance so that similar malfunctions are avoided in the future.

Other tools in your arsenal along with fishbone diagrams

Other troubleshooting methods, such as root cause analysis (RCA) and the 5 whys methodology, also increase the chances of isolating the root cause of an issue. A fishbone diagram is a handy tool for troubleshooting any mechanical, electrical, or operational issue. As the example above demonstrates, it lets you isolate potential problems and sort them into subcategories, making troubleshooting smoother and more efficient.

In the aircraft example, knowing that certain mechanical failures could recur, you could keep spare parts on-site or introduce more regular inspections to prevent further failures and minimize downtime. A fishbone diagram supports a simple but logical process of elimination that leads to faster problem resolution, helping your business reduce downtime and increase productivity.
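The process of elimination can be sketched as filtering the diagram’s candidate causes against the ones that inspections have already ruled out. This is a minimal sketch in Python; the `remaining_causes` helper and the set of ruled-out causes are hypothetical, chosen only to illustrate the idea.

```python
def remaining_causes(causes: dict[str, list[str]], ruled_out: set[str]) -> dict[str, list[str]]:
    """Return the causes not yet ruled out, dropping categories with nothing left."""
    remaining = {}
    for category, items in causes.items():
        kept = [c for c in items if c not in ruled_out]
        if kept:
            remaining[category] = kept
    return remaining


causes = {
    "Materials": [
        "New parts may have been installed improperly",
        "A part is malfunctioning or was not inspected properly",
    ],
    "Personnel": [
        "A technician installed the compressor incorrectly",
        "Tools may have been left inside the compressor housing",
    ],
    "Environment": ["Bird or drone strike"],
}

# Hypothetical inspection results: these causes were checked and eliminated.
ruled_out = {
    "Bird or drone strike",
    "Tools may have been left inside the compressor housing",
}

for category, items in remaining_causes(causes, ruled_out).items():
    print(category, items)
```

Each inspection shrinks the set of candidates, so the remaining entries point the team toward where to look next.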

Source: https://www.fiixsoftware.com/blog/fishbone-diagrams/