A fishbone diagram, also known as an Ishikawa diagram or a cause-and-effect diagram, was developed by Kaoru Ishikawa in the 1960s. It is a visual representation of the causes of a problem or failure. The diagram is structured as a fish skeleton, with the problem or event being represented by the head of the fish, and the causes of the problem branching off the bones of the fish.
Fishbone diagrams are used in maintenance to identify the root cause of a problem. They can also be used to identify patterns and trends, which can help prevent similar problems from occurring in the future. In this article, we will define what a fishbone diagram is and share a use-case example of how a fishbone diagram can be used.
What is a fishbone diagram?
A fishbone diagram is a visual tool that shows all the possible reasons a problem or event may have occurred, as well as their sources. It can be useful if the maintenance team is coming up short when troubleshooting an issue. Every possible cause is categorized by its source. Causes are then narrowed down again and again until you can isolate the root cause of a problem or outcome.
How do fishbone diagrams work?
A fishbone diagram helps maintenance teams trace the steps that could have led up to a problem, like a piece of equipment breaking down. Take an aircraft, for example. Let’s say the ground crew engineer discovers that a compressor is malfunctioning. There are many possible causes of the malfunction, but by using a fishbone diagram, the crew can break the problem down into main categories. In this instance, you could isolate the issue in the following steps:
Personnel: List out anyone who may have been performing maintenance or repairs on the aircraft
Machinery: Define and outline the technology
Materials: List the raw parts used to construct the aircraft
Measurements: Detail the inspection and steps taken
Environment: Detail the climate, geographical, and other factors relating to the environment
Methods: List the processes
In the first two categories, you can break the problem down in more detail. You know that some compressor parts were just replaced, and some new staff were working on the plane recently. You can now expand on the primary categories and see if you can identify the factor that caused the overall effect. For example:
A part is malfunctioning or was not inspected properly
A technician installed the compressor incorrectly
Some tools may be left inside the compressor housing
There was something jamming the rotation of the compressors that the mechanic missed
The pilot pushed the compressor too far and may have damaged it during the flight
Bird or drone strike
The turbine was inspected and compressor wear was noted
The inventory for the aircraft parts and labor lists all of the pieces and staff who were active around the aircraft in a 48-hour span
The information that you have linked off of the first stem of ideas brings you closer to discovering the root cause of the problem. You have identified the main possibilities, and now you can narrow them down by choosing the most probable cause. This is what that might look like in our example:
The mechanic installed a part incorrectly which caused a malfunction. This caused the turbine to become damaged during a flight. This is the primary cause (also known as the main cause) of the failure.
Now that this hypothesis has been created, inspections can focus on certain traits, which means less time searching for a problem and less overall downtime for the aircraft. Even better, if this sort of problem is documented, there can be preventive and predictive maintenance making sure similar malfunctions are avoided in the future.
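The aircraft example above can also be captured as a small data structure, which makes it easier to document and revisit later. The following is a minimal sketch in Python, not a standard fishbone format. The category names come from the example, but the placement of each cause, the `rank_causes` and `render` helpers, and the likelihood scores are all assumptions made for illustration. The scores are placeholder guesses a team might assign, not real data.

```python
# Hypothetical fishbone for the compressor example. The effect is the "head"
# of the fish and each category is a "bone" holding (cause, score) pairs.
# Scores are illustrative guesses, not measured values.
fishbone = {
    "effect": "Compressor malfunction",
    "categories": {
        "Personnel": [
            ("Technician installed the compressor incorrectly", 0.5),
            ("Tools left inside the compressor housing", 0.2),
        ],
        "Machinery": [
            ("Part is malfunctioning or was not inspected properly", 0.4),
        ],
        "Environment": [
            ("Bird or drone strike", 0.1),
        ],
        "Methods": [
            ("Pilot pushed the compressor too far during flight", 0.15),
        ],
    },
}


def rank_causes(diagram):
    """Flatten the diagram into (category, cause, score), highest score first."""
    rows = [
        (category, cause, score)
        for category, causes in diagram["categories"].items()
        for cause, score in causes
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def render(diagram):
    """Return a plain-text outline of the diagram."""
    lines = [f"Effect: {diagram['effect']}"]
    for category, causes in diagram["categories"].items():
        lines.append(f"  {category}")
        for cause, score in causes:
            lines.append(f"    - {cause} (estimated likelihood {score:.2f})")
    return "\n".join(lines)


ranked = rank_causes(fishbone)
print(render(fishbone))
print("Inspect first:", ranked[0][1])
```

Ranking the causes this way simply mirrors the step of choosing the most probable cause to inspect first. The scores themselves would still come from the team's judgment.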
Other tools in your arsenal along with fishbone diagrams
There are also other troubleshooting methods, like root cause analysis (RCA) and the 5 whys methodology, which help increase the chances of isolating the root cause of an issue. A fishbone diagram is a handy tool for troubleshooting any mechanical, electrical, or operational issue. As demonstrated in the example above, it allows you to isolate and categorize the potential problems into subcategories, making troubleshooting smooth and efficient.
In the case of the aircraft example, knowing that certain mechanical failures could recur, you could store the part on-site or introduce more regular inspections to prevent further failures and minimize downtime. A fishbone diagram allows a simple but logical process of elimination, which leads to faster problem resolution, so your business reduces downtime and increases productivity.
Maintenance troubleshooting can be both an art and a science. A common problem is that, while art can be beautiful, it isn’t known for its efficiency. When taken to the next level, maintenance troubleshooting can ditch the trial-and-error moniker and become a purely scientific endeavor. This helps maintenance technicians find the right problems and solutions more quickly. When troubleshooting is done correctly, your whole maintenance operation can overcome backlog, lost production, and compliance issues much more efficiently.
In this troubleshooting guide, we’ll take a look at what maintenance troubleshooting actually is, why it matters to maintenance professionals, and how your team can fine-tune its approach.
What is maintenance troubleshooting?
Systems break down. That’s just a fact of life. Whether it’s a conveyor belt or an industrial drill, we’ve all run across a piece of equipment that is unresponsive, faulty, or acting abnormally for seemingly no reason at all. It can be downright frustrating.
Maintenance troubleshooting is the process of identifying what is wrong with these faulty components and systems when the problem is not immediately obvious. Maintenance troubleshooting usually follows a systematic, four-step approach: identify the problem, plan a response, test the solution, and resolve the problem. Steps one to three are often repeated multiple times before a resolution is reached.
Identify the problem
Plan a response
Test the solution
Repeat until problem is resolved
Think about it this way: When a conveyor belt breaks down, you may try a few different methods to fix it. First, you identify which part of the conveyor belt isn’t working. Once you’ve identified the problem area, you plan a response and test it, such as realigning or lubricating a part. If this fails to fix the problem, you might replace the part, which makes the conveyor belt work again. This is troubleshooting.
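The identify, plan, test, and repeat cycle described above can be sketched as a simple loop. This is only an illustration in Python: the `troubleshoot` function, the `simulated_test` stand-in, and the list of fixes are hypothetical names invented for the conveyor belt scenario, and a real test would involve running the equipment rather than comparing strings.

```python
def troubleshoot(candidate_fixes, apply_and_test):
    """Try candidate fixes in order until one passes its test.

    candidate_fixes: planned responses, ordered from most to least likely.
    apply_and_test: callable that applies a fix and returns True if the
        problem is resolved.
    Returns (successful_fix_or_None, list_of_fixes_tried).
    """
    tried = []
    for fix in candidate_fixes:
        tried.append(fix)          # record every attempt for the asset history
        if apply_and_test(fix):    # test the solution
            return fix, tried      # problem resolved
    return None, tried             # nothing worked; the problem needs a new plan


# Hypothetical conveyor belt scenario in which only replacing the part works.
def simulated_test(fix):
    return fix == "replace part"


plan = ["realign part", "lubricate part", "replace part"]
resolved_by, attempts = troubleshoot(plan, simulated_test)
print("Resolved by:", resolved_by)
print("Attempts:", attempts)
```

Keeping the list of attempts is a deliberate choice here, since failed fixes are useful information the next time the same asset breaks down.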
How is maintenance troubleshooting usually done?
Stop us if you’ve heard this story before. An asset breaks down and no one knows why. You talk to the operator, read some manuals, and check your notes about the asset. You try a couple of things to get the machine up and working again with no luck. Before you can try a third or fourth possible solution, you get called away to another emergency, with the asset still out of commission.
This is often how the process happens when performing maintenance troubleshooting, especially when a facility relies on paper records or Excel spreadsheets. The process is based on collecting as much information as possible from as many sources as possible to identify the most likely cause of the unexpected breakdown. You can never go wrong when you gather information, but it’s the way that information is gathered that can turn troubleshooting from a necessity to a nightmare.
Why does maintenance troubleshooting matter?
Unexpected equipment failure is the entire reason maintenance troubleshooting exists. If assets never broke down without any clear signs of imminent failure, there would be no need to troubleshoot the problem. But we know that’s just not the case.
Machinery failure doesn’t always follow a predictable pattern. Yes, maintenance teams can use preventive maintenance and condition-based maintenance to reduce the likelihood of unplanned downtime. However, you can never eliminate it entirely. What you can do is put processes in place to reduce failure as much as possible and fix it as soon as possible when it does occur. This is where strong maintenance troubleshooting techniques come in handy.
Because troubleshooting will always be part of the maintenance equation, humans will also always have a role. Maintenance technology does not erase the need for a human touch in troubleshooting; it simply makes the process much more efficient. When troubleshooting isn’t refined, it could lead to time wasted tracking down information, a substantial loss of production, an unsafe working environment, and more frequent failures. In short, knowing some maintenance troubleshooting techniques could be the difference between an overwhelming backlog and a stable maintenance program.
Maintenance troubleshooting tips
The following are just a few ways your operation can improve its troubleshooting techniques to conquer chaos and take control of its maintenance.
1. Quantify asset performance and understand how to use the results
It probably goes without saying, but the more deeply you know an asset, the better equipped you’ll be to diagnose a problem. Years of working with a certain asset can help you recognize when it’s not working quite right. But exceptional troubleshooting isn’t just about knowing the normal sounds, speeds, or odours of a particular machine. Instead, it’s about knowing how to analyze asset performance at a deeper level, which is where advanced reporting factors in.
When operators and technicians rely solely on their own past experience with a piece of equipment, it leaves them with huge gaps in knowledge that hurt the maintenance troubleshooting process. For example, it leaves too much room for recency bias to affect decision-making, which means that technicians are most likely to try the last thing that fixed a particular problem without considering other options or delving further into the root cause. Also, if maintenance troubleshooting relies on the proprietary knowledge of a few technicians, it means repairs will have to wait until those particular maintenance personnel are available.
Maintenance staff should have the know-how to conduct an in-depth analysis of an asset’s performance. For example, technicians should understand how to run reports and interpret KPIs for critical equipment, such as mean time between failures (MTBF) and overall equipment effectiveness (OEE). If using condition-based maintenance, the maintenance team should also know the P-F curve for each asset and what different sensor readings mean. When technicians are equipped with a deeper understanding of an asset, it will be easier for them to pinpoint where a problem occurred and how to fix it, both in the short and long term.
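To make the two KPIs mentioned above concrete, here is a short sketch of how they are commonly calculated. Mean time between failures is usually taken as total operating time divided by the number of failures, and overall equipment effectiveness as availability multiplied by performance multiplied by quality. The function names and the sample figures below are assumptions for illustration only. Your reporting tool may define its inputs differently.

```python
def mtbf(operating_hours, failure_count):
    """Mean time between failures: operating time divided by failure count."""
    if failure_count == 0:
        return None  # undefined when no failures were recorded in the period
    return operating_hours / failure_count


def oee(availability, performance, quality):
    """Overall equipment effectiveness as the product of three ratios (0 to 1)."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


# Hypothetical figures: 1,200 operating hours with 4 failures in the period.
print("MTBF (hours):", mtbf(1200, 4))  # 300.0
print("OEE:", round(oee(0.90, 0.95, 0.98), 3))
```

A falling MTBF or OEE for one asset is a prompt to look closer, not a diagnosis by itself, which is why these numbers work best alongside the asset's history.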
2. Create in-depth asset histories
Information is the fuel that powers exceptional maintenance troubleshooting. Knowing how a particular asset has worked and failed for hundreds of others is a good place to start a repair. That’s why manuals are a useful tool when implementing maintenance troubleshooting techniques. However, each asset, facility, and operation is different, which means asset failure doesn’t always follow the script. Detailed notes on an asset’s history can open up a dead end and lead you to a solution much more quickly.
A detailed asset history can give you an edge in maintenance troubleshooting in a variety of ways. It offers a simple method for cross-referencing symptoms of the current issue with elements of past problems. For example, a technician can see if a certain type of material was being handled by a machine or if there were any early warning signs identified for a previous failure. The more a present situation aligns with a past scenario, the more likely it is to need the same fix. Solutions can be prioritized this way, leading to fewer misses, less downtime, fewer unnecessary spare parts being used, and more.
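One way to picture this cross-referencing is to score how closely the current symptoms overlap with each past failure and try the fixes from the closest matches first. The sketch below uses Jaccard similarity (shared symptoms divided by all distinct symptoms) as the overlap measure. That choice, the record layout, and every entry in `history` are assumptions made up for illustration, not a feature of any particular maintenance system.

```python
def similarity(current, past):
    """Jaccard similarity between two sets of symptoms (0.0 to 1.0)."""
    current, past = set(current), set(past)
    if not current and not past:
        return 0.0
    return len(current & past) / len(current | past)


# Hypothetical asset history records.
history = [
    {"date": "2023-03-14", "symptoms": {"smoke", "grinding noise"},
     "fix": "replace bearing"},
    {"date": "2023-08-02", "symptoms": {"belt slipping"},
     "fix": "realign belt"},
    {"date": "2024-01-20",
     "symptoms": {"grinding noise", "overheating", "vibration"},
     "fix": "lubricate gearbox"},
]


def rank_past_fixes(current_symptoms, records):
    """Return (score, fix) pairs for overlapping records, best match first."""
    scored = [
        (similarity(current_symptoms, record["symptoms"]), record["fix"])
        for record in records
    ]
    scored = [item for item in scored if item[0] > 0]  # drop unrelated failures
    return sorted(scored, key=lambda item: item[0], reverse=True)


current = {"grinding noise", "overheating"}
ranked_fixes = rank_past_fixes(current, history)
for score, fix in ranked_fixes:
    print(f"{score:.2f}  {fix}")
```

The ranking only suggests an order in which to try solutions. A technician still needs to confirm that the past situation really matches the present one.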
When creating detailed asset histories to help with maintenance troubleshooting (as well as preventive maintenance), it’s important to include as much information as possible. Make sure to record the time and dates of any notable actions taken on an asset or piece of equipment. This can include breakdowns, PMs, inspections, part replacement, production schedules, and abnormal behavior, such as smoke or unusual sounds. Next, document the steps taken during maintenance, including PMs or repairs. Lastly, highlight the successful solution and what was needed to accomplish it, such as necessary parts, labor and safety equipment. Make sure to add any relevant metrics and reports to the asset history as well.
3. Use root cause analysis and failure codes
Effective maintenance troubleshooting starts with eliminating ambiguity and short-term solutions. Finding the root of an issue quickly, solving it effectively and ensuring it stays solved is a winning formula. Root cause analysis and failure codes are a couple of tools that will help you achieve this goal.
Root cause analysis is a maintenance troubleshooting technique that allows you to pinpoint the reason behind a failure. The method consists of asking “why” until you get to the heart of the problem. For example:
Why did the equipment fail? Because a bearing wore out.
Why did the bearing wear out? Because a coupling was misaligned.
Why was the coupling misaligned? Because it was not serviced recently.
Why was the coupling not serviced? Because maintenance was not scheduled.
Why was maintenance not scheduled? Because we weren’t sure how often it should be scheduled.
This process has two benefits when conducting maintenance troubleshooting. First, it allows you to identify the immediate cause of failure and fix it quickly. Second, it leads you to the core of the issue and a long-term solution. In the example above, it’s clear a better preventive maintenance program is required to improve asset management and reduce unplanned downtime.
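The chain of "why" questions above can be written down as a simple mapping from each problem to its cause and then followed until no further cause is recorded. The following Python sketch is one hypothetical way to do that. The `why_chain` dictionary restates the bearing example, and `find_root_cause` is an illustrative helper, not part of any RCA standard.

```python
# Each key is a problem and each value is the answer to "why did that happen?"
why_chain = {
    "Equipment failed": "A bearing wore out",
    "A bearing wore out": "A coupling was misaligned",
    "A coupling was misaligned": "The coupling was not serviced recently",
    "The coupling was not serviced recently": "Maintenance was not scheduled",
    "Maintenance was not scheduled": "The required service interval was unknown",
}


def find_root_cause(problem, causes, max_depth=10):
    """Follow the chain of causes and return the full path, problem first."""
    path = [problem]
    while path[-1] in causes and len(path) <= max_depth:
        next_cause = causes[path[-1]]
        if next_cause in path:  # guard against circular reasoning
            break
        path.append(next_cause)
    return path


path = find_root_cause("Equipment failed", why_chain)
print("Immediate cause:", path[1])
print("Root cause:", path[-1])
```

The second item in the path is the immediate cause to fix right away, and the last item is the root cause that points to the long-term solution.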
Failure codes provide a consistent method to describe why an asset failed. Failure codes are built on three actions: listing all possible problems, all possible causes, and all possible solutions. This process records key aspects of a failure according to predefined categories, like misalignment or corrosion.
Failure codes are useful in maintenance troubleshooting because technicians can immediately see common failure codes, determine the best solution, and implement it quickly. Failure codes can also be used to uncover a common problem among a group of assets and determine a long-term solution.
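As a rough illustration of how failure codes can surface a common problem across a group of assets, the sketch below counts codes across a handful of work orders. The work orders, asset names, and code labels are all invented for this example, and the helper functions are hypothetical. Only `collections.Counter` is a real part of the Python standard library.

```python
from collections import Counter

# Hypothetical closed work orders, each tagged with a predefined failure code.
work_orders = [
    {"asset": "Pump A", "failure_code": "MISALIGNMENT"},
    {"asset": "Pump B", "failure_code": "CORROSION"},
    {"asset": "Pump A", "failure_code": "MISALIGNMENT"},
    {"asset": "Pump C", "failure_code": "MISALIGNMENT"},
    {"asset": "Pump B", "failure_code": "BEARING_WEAR"},
]


def most_common_codes(orders, top_n=3):
    """Return the most frequent failure codes as (code, count) pairs."""
    return Counter(order["failure_code"] for order in orders).most_common(top_n)


def assets_sharing_code(orders, code):
    """Return the sorted names of assets that have logged the given code."""
    return sorted({o["asset"] for o in orders if o["failure_code"] == code})


top_code, count = most_common_codes(work_orders)[0]
print(top_code, count)
print(assets_sharing_code(work_orders, top_code))
```

If one code keeps showing up on several similar assets, that is a hint that the long-term fix belongs in the maintenance program rather than in one more repair.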
4. Build detailed task lists
Exceptional maintenance troubleshooting requires solid planning and foresight. Clear processes provide a blueprint for technicians so they can quickly identify problems and implement more effective solutions. Creating detailed task lists is one way to bolster your planning and avoid headaches down the road. This could also be incorporated into routine maintenance.
A task list outlines a series of tasks that need to be completed to finish a larger job. Task lists ensure crucial steps aren’t missed when performing inspections, audits or PMs. For example, the larger job may be conducting a routine inspection of your facility’s defibrillators. This job is broken down into a list of smaller tasks, such as “Verify battery installation,” and “Inspect exterior components for cracks.”
Detailed task lists are extremely important when conducting maintenance troubleshooting. They act as a guide when testing possible solutions so technicians can either fix the issue or disqualify a diagnosis as quickly as possible. The more explicit the task list, the more thorough the job and the less likely a technician is to make a mistake. Comprehensive task lists can also offer valuable data when failure occurs. They provide insight into the type of work recently done on an asset so you can determine whether any corrective actions were missed and if this was the source of the problem.
There are a few best practices for building detailed task lists. First, include all individual actions that make up a task. For example, instead of instructing someone to “Inspect the cooling fan,” include the steps that comprise that inspection, such as “Check for any visible cracks,” and “Inspect for loose parts.” Next, organize all steps in the order they should be done. Lastly, include any additional information that may be helpful in completing the tasks, including necessary supplies, resources (e.g., manuals), and PPE.
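These practices map naturally onto a small data structure: a job, an ordered list of individual steps, and the supporting supplies. The Python sketch below is one possible shape, assuming hypothetical `Step` and `TaskList` classes. It reuses the cooling fan example, and the supplies listed are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    done: bool = False


@dataclass
class TaskList:
    job: str
    steps: list = field(default_factory=list)     # kept in execution order
    supplies: list = field(default_factory=list)  # parts, manuals, PPE

    def next_step(self):
        """Return the first step that is not yet done, or None."""
        for step in self.steps:
            if not step.done:
                return step
        return None

    def complete_next(self):
        """Mark the next outstanding step as done and return it."""
        step = self.next_step()
        if step is not None:
            step.done = True
        return step

    def is_complete(self):
        return all(step.done for step in self.steps)


inspection = TaskList(
    job="Inspect the cooling fan",
    steps=[Step("Check for any visible cracks"), Step("Inspect for loose parts")],
    supplies=["manual", "PPE"],
)
inspection.complete_next()
print("Next:", inspection.next_step().description)
print("Complete:", inspection.is_complete())
```

Recording which steps were finished also gives you the data mentioned earlier: when a failure occurs, you can see whether any step was skipped during recent work.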
5. Make additional information accessible
We’ve said it before and we’ll say it again: great maintenance troubleshooting techniques are often the result of great information. However, if that information is difficult to access, you will lose any advantage it provides. That is why it is crucial for your operation to not only create a large resource center, but to also make it highly accessible. This will elevate your maintenance troubleshooting abilities and get your assets back online faster when unplanned downtime occurs.
Let’s start with the elements of a great information hub. We’ve talked about the importance of reports, asset histories, failure codes and task lists when performing a troubleshooting method. Some other key resources include diagrams, standard operating procedures (SOPs), training videos, and manuals. These should all be included and organized by asset. If a technician hits a dead end in a troubleshooting procedure, these tools can offer a solution that may have been missed in the initial analysis.
Now that you’ve gathered all your documents together, it’s time to make them easily accessible to the whole maintenance team. If resources are trapped in a file cabinet, on a spreadsheet, or in a single person’s mind, they don’t do a lot of good for the technician. They can be lost, misplaced and hard to find—not to mention the inefficiency involved with needing to walk from an asset to the office just to grab a manual. One way to get around this obstacle is to create a digital knowledge hub with maintenance software. By making all your resources available through a mobile device, technicians can access any tool they need to troubleshoot a problem. Instead of sifting through paper files to find an asset history or diagram, they can access that same information anywhere, anytime.
Using CMMS software for maintenance troubleshooting
If it sounds like a lot of work to gather, organize, analyze and circulate all the information needed to be successful at maintenance troubleshooting, you’re not wrong. Without the proper tools, this process can be a heavy lift for overwhelmed maintenance teams. Maintenance software is one tool that can help ease the load every step of the way. A digital platform, such as a CMMS, takes care of crunching the numbers, organizing data and making it available wherever and whenever, so you can focus on using that information to make great decisions and troubleshoot more effectively.
For example, when building a detailed asset history, it’s important to document every encounter with a piece of equipment. This is a lot of work for a technician rushing from one job to another, and it is difficult to keep track of after the fact. An investment in maintenance software will help you navigate these roadblocks. It does this by allowing technicians to use a predetermined set of questions to make and retrieve notes in real time with a few clicks.
The same goes for failure codes. The key to using them effectively is proper organization and accessibility. Without those two key ingredients, failure codes become more of a hindrance than a help. One way to accomplish this is to use maintenance software. A digital platform can organize failure codes better than any filing cabinet or Excel spreadsheet and make it easy for technicians to quickly sort them and identify the relevant ones from the site of the breakdown.
The bottom line
Troubleshooting will always exist in maintenance. You will never be 100 percent sure 100 percent of the time when diagnosing the cause of failure. What you can do is take steps to utilize maintenance troubleshooting techniques to ensure equipment is repaired quickly and effectively. By combining a good understanding of maintenance metrics with detailed asset histories, failure codes, task lists, and other asset resources, and making all this information accessible, you can move your troubleshooting beyond trial and error to a more systematic approach.