
MTBF: How have you been using the Mean Time Between Failures metric?



  • Community Leader

With the increasing competitiveness of a globalized market, it is important to focus on efficient maintenance control practices, inventory management and various other production-related matters.

When it comes to maintenance control practices, several different metrics can help us understand our weaknesses and make decisions that lead to better results in the future.

For this week's topic, we are going to discuss a widely known metric: the Mean Time Between Failures, also called MTBF.

Mean Time Between Failures (MTBF) refers to the average amount of time that a device or product works before it fails. This measure includes only the operating time between failures and excludes repair time, on the assumption that the item is repaired and returned to service. MTBF values are often used to estimate the probability that a single unit will fail within a certain period of time.

How do we calculate it?

MTBF = Number of operation hours ÷ Number of failures

For instance:

MTBF = 1,000 operational hours ÷ 17 failures

MTBF = 1,000 ÷ 17

MTBF = 58.8 hours

 

Once you have the MTBF of your equipment, you can estimate the reliability of the asset over a given period of time, which indicates how likely it is that an M&R professional will achieve their goals. Another important KPI that can be calculated from the MTBF (together with other metrics) is the availability of a piece of equipment or a system.
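
As a rough illustration of these two uses, the sketch below takes the MTBF from the example and derives a reliability figure and an availability figure. It assumes a constant failure rate (exponential model), so that R(t) = exp(-t / MTBF), and the common steady-state formula availability = MTBF / (MTBF + MTTR). The 24-hour period and the 4-hour MTTR are made-up values used only for illustration.

```python
import math


def mtbf(operating_hours: float, failures: int) -> float:
    """Mean Time Between Failures: operating hours divided by failure count."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute MTBF")
    return operating_hours / failures


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of running t_hours without failure.

    Assumption: constant failure rate (exponential distribution).
    """
    return math.exp(-t_hours / mtbf_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from MTBF and mean time to repair (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


m = mtbf(1000, 17)  # figures from the example above
print(f"MTBF: {m:.1f} h")
print(f"R(24 h): {reliability(24, m):.2f}")              # 24 h is a hypothetical period
print(f"Availability: {availability(m, 4):.3f}")         # 4 h MTTR is hypothetical
```

Under these assumptions the asset has roughly two chances in three of running a full day without a failure. If the failures are age-related or show infant mortality, the exponential model no longer applies and a different distribution is needed.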

 

Now, how about telling us how frequently you use it in your company and how it helps you reach your goals and improve results?

 

If you do not use it, tell us why and what keeps you away from this metric!

 

Regards,

Raul Martins

 

 

 

 


I've done a lot of reliability work as have a number of my colleagues. MTBF is one of the parameters needed to do proper analysis (e.g.: Weibull) so it's a valuable piece of information. Of course knowing whether the failure is age / usage related, random or infant mortality is also very important in making decisions about failure management approaches.

Unless we run our own "studies" to capture data that we can rely on, most of us will rely on data captured in the CMMS/EAM. All too often that data is not fit for purpose, at least not without a considerable amount of effort to scrub it clean. Aside from incomplete / missing records of failure events, we must determine whether or not the WO that comprises any given CMMS/EAM "record" was done to correct a failure or for some other reason. We are interested only in those events where a failure actually occurred, or clearly would have occurred in a very short time had intervention not been carried out. 

If we simply count up the number of times an asset was worked on, we will probably include PMs performed, proactive preventive change-outs (which are usually done well before failure), repairs in response to false alarms, etc. In speaking with colleagues both in the field and in academia, the quality of data available for reliability purposes is usually very low.

Taking that just a bit further, we get into that situation because we don't set our CMMS/EAM systems up to capture data that is useful to reliability. Many of those systems aren't even capable of doing it very well. Programmers are not engineers and most engineers are not reliability engineers (or even reliability conscious). What are your chances of finding a CMMS/EAM that is actually designed with reliability in mind?
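
A minimal sketch of the scrubbing step described above: only work orders that corrected an actual or imminent failure are counted. The record layout, the `wo_type` values and the `failure_confirmed` flag are hypothetical; a real CMMS/EAM export will use its own codes and will usually need manual review as well.

```python
from dataclasses import dataclass


@dataclass
class WorkOrder:
    asset_id: str
    wo_type: str             # hypothetical codes: "corrective", "pm", "false_alarm", ...
    failure_confirmed: bool  # hypothetical flag set during data review


# Only these work order types can represent a real failure (assumption).
FAILURE_TYPES = {"corrective"}


def count_failures(work_orders, asset_id):
    """Count work orders that corrected an actual or imminent failure."""
    return sum(
        1
        for wo in work_orders
        if wo.asset_id == asset_id
        and wo.wo_type in FAILURE_TYPES
        and wo.failure_confirmed
    )


def mtbf_hours(operating_hours, work_orders, asset_id):
    """MTBF from scrubbed data; None when no failure has been recorded."""
    failures = count_failures(work_orders, asset_id)
    return operating_hours / failures if failures else None


# Hypothetical history for one asset over 1,000 operating hours.
history = [
    WorkOrder("P-101", "corrective", True),
    WorkOrder("P-101", "corrective", True),
    WorkOrder("P-101", "corrective", False),  # nothing was actually wrong
    WorkOrder("P-101", "pm", False),
    WorkOrder("P-101", "pm", False),
    WorkOrder("P-101", "false_alarm", False),
]

naive = 1000 / len(history)                    # counts every work order
scrubbed = mtbf_hours(1000, history, "P-101")  # counts confirmed failures only
print(f"naive: {naive:.1f} h, scrubbed: {scrubbed:.1f} h")
```

Counting every work order makes the asset look three times worse than the confirmed failures suggest in this made-up history, which is the kind of distortion the unscrubbed data introduces.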

Fully agree with UptimeJim.

For MTBF it is really crucial to have quality data available. And as some other maintenance KPIs are normally used for monitoring maintenance performance, the CMMS/EAM very often cannot provide all the necessary data. For example, in many cases interfaces with Operations Management IT systems are needed for precise inputs on asset uptime/downtime, etc.

On the other hand, even a top-notch IT system does not help much if the discipline of the maintenance team is not at the required level, because the data are not duly entered in the system. I've personally seen lots of issues with the latter.

It may be useful to first establish a handful of other maintenance KPIs before utilizing MTBF. This helps improve the understanding of why KPIs matter and why there's a need for accurate and timely data. EN 15341:2019 Maintenance - Maintenance Key Performance Indicators may provide some guidance, even though it contains a huge number of KPIs and it is absolutely necessary to decide which ones to use and for what purpose.

In some maintenance organizations it can prove helpful to introduce a reliability engineer, who helps others understand the importance of reliability and all the necessary activities associated with it and who, step by step, works towards meeting all the prerequisites for MTBF to be calculated correctly and, more importantly, acted upon.

  • Community Leader

Hi all,

I guess the dream of every Reliability Engineer (myself included) is having a reliable source of data. Unfortunately, the lack of reliable maintenance records is a very common issue, and dealing with it has become part of the daily routine for the vast majority of Reliability Engineers (although it should not be).

Several times I have found myself collecting data on my own instead of using the information available in the CMMS.

We discussed how we deal with this situation in another topic a few weeks ago. The link is below:

https://bit.ly/2R1Kjdp

 

I remember an occasion when I used the MTBF for a piece of equipment that was a bad actor at a fertilizer plant. Basically, the plant was suffering big revenue losses and had been working reactively. It had a reactor that was a real bad actor for the plant's results, and something had to be done immediately. First, I collected the data available in the CMMS to determine the MTBF and create a preventive plan of replacing a few components every X weeks. Although I knew that was not the best way to create a proper PM program, it worked, and a couple of months later some results could be seen. With better results and a bit more time, I could later do a better study, using Life Data Analysis (LDA) to understand the behaviour of the failure modes and then improve the PM program.

Have you guys done something similar using the MTBF to improve your results?

 

Regards,
Raul Martins

 


 

Hi Raul,

It's a good idea!

ADempiere is open source, so any student, programmer or other user can change it for their own needs, but it is up to the community to decide what does or does not go into the official version.

I have reprogrammed ADempiere for my own needs, and there is a way to add MTBF as well, but it requires defining complex fixed (enterprise) assets.
1) Example: the asset is the induction motor of elevator number 1234xy.

The operational hours of the elevator (driven by a three-phase induction motor, 3 x 380 V, 3-25 kW) come from a time counter with values in [sec].
This information can be joined using SQL with the replacement data collected from sales orders (but not all products/items appear in sales order lines).

Reliable information is available only for bearings (one bearing per sub-asset, the electric motor),
replacement coils of the induction motor and switch breakers, i.e. only one product/part/item per asset.

Sample: 3.7 [kW], 22 [m] (7 levels/doors, 24 flats, 100 persons), speed 1.2 [m/s], 20 [sec] up/down (+/- load)
approx. 4 [hours]/day, 1500 [hours]/year (100 persons x 2.4 times per day = 240 / 60 [min] = 4 [h]/[day])

average replacement of stator coils: every 3 years
average replacement of bearings: 2 times per year
average replacement of switch breaker (3 phases x 380 V, 16 A, Delta/Y): 1 per year

(* for this sample I am not counting servicemen's work hours, operations or workflows)

MTBF = sum of operational hours / failures = 1500 / 3 (2 bearings and one switch breaker) = 500 [hours]
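
The SQL join mentioned above could look roughly like the sketch below, which uses an in-memory SQLite database. The table and column names (`runtime_counter`, `order_lines`, `part`, `qty`) are invented for illustration and are not ADempiere's actual schema; the figures are the ones from the elevator sample (1500 hours stored as seconds, 2 bearings and 1 switch breaker per year).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables, not the real ADempiere schema.
    CREATE TABLE runtime_counter (asset_id TEXT, seconds INTEGER);
    CREATE TABLE order_lines (asset_id TEXT, part TEXT, qty INTEGER);
    INSERT INTO runtime_counter VALUES ('1234xy', 5400000);  -- 1500 h in seconds
    INSERT INTO order_lines VALUES ('1234xy', 'bearing', 2);
    INSERT INTO order_lines VALUES ('1234xy', 'switch breaker', 1);
    INSERT INTO order_lines VALUES ('1234xy', 'upper door support', 4);
    """
)


def mtbf_hours(conn, asset_id, tracked_parts):
    """Join counter seconds with replacement lines for the tracked parts only."""
    placeholders = ",".join("?" for _ in tracked_parts)
    query = f"""
        SELECT r.seconds / 3600.0, SUM(o.qty)
        FROM runtime_counter r
        JOIN order_lines o ON o.asset_id = r.asset_id
        WHERE r.asset_id = ? AND o.part IN ({placeholders})
        GROUP BY r.asset_id, r.seconds
    """
    row = conn.execute(query, (asset_id, *tracked_parts)).fetchone()
    if row is None or not row[1]:
        return None  # no matching asset or no replacements recorded
    hours, failures = row
    return hours / failures


# Only parts that occur once per asset are counted, as in the sample above.
print(mtbf_hours(conn, "1234xy", ["bearing", "switch breaker"]))
```

Parts such as the upper door supports are left out of the filter on purpose, because the order lines do not say which floor they came from.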


(** An elevator as an asset has a complex BOM, and sales orders are not 100% suitable for this calculation.
Each floor has doors, a safety device, door-opening devices/commands/controllers, etc., and there is no information on whether a part was
taken/replaced from the 1st floor, the 2nd floor, the 3rd floor, and so on, so the calculated value is not realistic.
Reliable information is available only for bearings (one bearing per sub-asset, the electric motor),
replacement coils of the induction motor and switch breakers, i.e. only one product/part/item per asset.)

MTBF = sum of operational hours / failures = 1500 / 30 (2 x diodes, lights, door commands x 4, 2 bearings, upper door supports x 4, cabin elements, 5 sheaves, 3 one switch breaker ... and more + elevator parts) = 50 [hours] -- but this calculation is not realistic!

Downtime for elevators as assets needs to be kept to a minimum, so the best way is to change important parts at night every 24 months, or at another interval based on historical statistics or the manufacturer's recommended working period.

2) Example:
For hydro stations that run at constant speed for the whole working period (no frequency-regulated speed), MTBF can be calculated from electricity meter readings: the difference between the new and old [kWh] values divided by the nominal power of the induction motor gives the total operational hours, and the failures come from the time and frequency of bearing replacements, switch breaker replacements or copper coil replacements in the rotor/stator, for example. For hydro stations with non-constant speed and intermittent operation, there is a counter in [sec].
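
A small sketch of the meter-based estimate for the constant-speed case. It assumes the motor draws its nominal power whenever it runs, which is only an approximation; the meter readings, the 5 kW rating and the replacement count are made-up values.

```python
def operating_hours_from_energy(old_kwh, new_kwh, nominal_power_kw):
    """Estimate running hours from two meter readings.

    Assumption: the motor draws its nominal power whenever it runs.
    """
    if nominal_power_kw <= 0:
        raise ValueError("nominal power must be positive")
    if new_kwh < old_kwh:
        raise ValueError("new reading is lower than the old reading")
    return (new_kwh - old_kwh) / nominal_power_kw


def mtbf_hours(operating_hours, replacements):
    """MTBF with each recorded part replacement counted as one failure."""
    return operating_hours / replacements if replacements else None


# Hypothetical readings for one pump motor.
hours = operating_hours_from_energy(old_kwh=12_000, new_kwh=62_000, nominal_power_kw=5.0)
print(f"operating hours: {hours:.0f}")
print(f"MTBF: {mtbf_hours(hours, replacements=4):.0f} h")
```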

In this manual way the data can be picked from the database. At the moment, open-source ADempiere does not have any automatic solution for calculating MTBF. The way to get one is: step 1, open an issue at https://github.com/adempiere/adempiere/issues, explain what you need and how it works step by step, and show how it works on your local copy of ADempiere. If the community sees the benefit, it will be included in the next ADempiere releases.

regards,

Gordan

Edited by Gordan
Quote

MTBF = sum of operational hours / failures = 1500 / 30 (2 x diodes, lights, door commands x 4, 2 bearings, upper door supports x 4, cabin elements, 5 sheaves, 3 one switch breaker ... and more + elevator parts) = 50 [hours] -- but this calculation is not realistic!

The information from sales orders (WO) gives the number of products (items/parts), and it is important to see where the confusion comes from. Take the part "upper door support": the sales order lines (WO lines) show, for example, 20 pcs replaced over 3 years. That information is confusing, because across the 7 levels/doors, 2 pcs were from level 1, 4 from level 3, 5 from level 4, etc., and the sales order line DOES NOT HAVE this information about products (items/parts) PER LEVEL, only the TOTAL (from all levels/doors) and the part number (EAN code ...). The upper door support is universal for any level/elevator, so it appears redundantly (DUPLICATE). My clients reject this data for reliability purposes; it is relevant only for purchase orders and quantity (QNT) forecasts, and it has stochastic cycles. The bearing and switch breaker data, however, are good information.

Edited by Gordan
On 1/15/2020 at 7:22 AM, Raul Martins said:

How do we calculate it?

MTBF = Number of operation hours ÷ Number of failures

For instance:

MTBF = 1,000 operational hours ÷ 17 failures

MTBF = 1,000 ÷ 17

MTBF = 58.8 hours

Raul,

Regarding MTBF and per your simple example above, do you feel that MTBF can be, at some level, very macro?

First off, I would definitely use MTBF as a KPI tool in the RCM toolbox! What has me pondering is the following.

Let's say each day/week MTBF KPI information is posted either on a TV monitor and/or in a report for all department managers to review. This said, say we use your MTBF example, over the course of 1000 operational hours, the 17 failures are noted and the 58.8 hrs reported.

Following this, the Production Manager, after reviewing the information, comments that he does not believe the 58.8 hours between failures. He points out that there had been a few occasions where the failures occurred hours and/or a day apart. Of course, to this person a 58.8 hr lapse between failures does not make sense, and in reality his logic is supported. NOTE: This is the human factor involved.

In conclusion,

1. Should MTBF be reviewed weekly/monthly?

2. Should MTBF be associated with a more detailed KPI subtype report that displays the failure count, date / hr of the failures, details of the failure (failure mode and code) etc?

3. Would it be fair to say the Devil is in the detail when it comes to MTBF?

Sincerely,

Jim

  • Community Leader

Hi @Jim Vantyghem,

Thanks for the reply. Here are my opinions about your post:

5 hours ago, Jim Vantyghem said:

Regarding MTBF and per your simple example above, do you feel that MTBF can be, at some level, very macro?

MTBF is just a metric like any other. Nothing can be concluded about a piece of equipment, a maintenance area or a business by analysing a single KPI, which is exactly why any maintenance area uses several KPIs to control its performance. Such a metric only indicates whether or not you are heading in the right direction.

 

5 hours ago, Jim Vantyghem said:

Let's say each day/week MTBF KPI information is posted either on a TV monitor and/or in a report for all department managers to review. This said, say we use your MTBF example, over the course of 1000 operational hours, the 17 failures are noted and the 58.8 hrs reported.

Following this, the Production Manager, after reviewing the information, comments that he does not believe the 58.8 hours between failures. He points out that there had been a few occasions where the failures occurred hours and/or a day apart. Of course, to this person a 58.8 hr lapse between failures does not make sense, and in reality his logic is supported. NOTE: This is the human factor involved.

This situation is quite common, and I would say that the vast majority of M&R professionals have been through it. Those who haven't probably will in the future. However, it is important to say that an M&R professional has to be ready for this sort of issue. A production manager, operator or process engineer may not know how to use the metric, how it is measured or even what it is. In this situation, the Reliability Engineer has to show them, based on technical books and reliable data, that the metric is important not only for improving the results of the maintenance and production teams, but may also play an important role in the sustainability of the company.

 

5 hours ago, Jim Vantyghem said:

1. Should MTBF be reviewed weekly/monthly?

2. Should MTBF be associated with a more detailed KPI subtype report that displays the failure count, date / hr of the failures, details of the failure (failure mode and code) etc?

3. Would it be fair to say the Devil is in the detail when it comes to MTBF?

 

1. In a normal situation, I do not see any reason to review MTBF weekly or monthly. For me, Mean Time Between Failures is a really helpful tool for improving our results, but not something to put on a managerial report. I would review KPIs such as availability, production results, budget and maintenance cost per unit more often. I would use MTBF on my bad actors or in studies of improvement opportunities.

2. As I said, I would not focus on the MTBF as a KPI.

3. I would not say that the Devil is in the details. However, as John Wooden used to say: "It's the little details that are vital. Little things make big things happen".

 

Regards,

Raul Martins


Greetings,

The situation of a production manager challenging data validity because his experience doesn't match the number he sees on the monitor is quite common. Firstly, the use of MTBF as a form of performance measure is probably not wise. It's useful (with other parameters) in reliability work but rather meaningless on its own, particularly as an indicator of production performance. Secondly, the concept of "mean" is not well understood. The mean is but one parameter used in a continuous distribution function to describe failure experience. Using it alone is akin to describing a person as having a given height and nothing else. KPIs need to be carefully selected, with thought given to what they can and cannot tell you and how that information is meaningful.
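
To make the point about the mean concrete, here is a small sketch with two invented failure histories. Both have 17 failures in 1,000 operating hours, so both report the same MTBF of about 58.8 hours, yet they describe very different experiences on the plant floor. The gap values are hypothetical and chosen only so that each list sums to 1,000 hours.

```python
import statistics

# Hours between consecutive failures; both lists sum to 1,000 h over 17 failures.
steady = [1000 / 17] * 17             # failures evenly spaced
bursty = [10.0] * 12 + [176.0] * 5    # clusters of failures, then long quiet runs

for name, gaps in (("steady", steady), ("bursty", bursty)):
    print(
        f"{name}: mean={statistics.mean(gaps):.1f} h, "
        f"median={statistics.median(gaps):.1f} h, "
        f"stdev={statistics.stdev(gaps):.1f} h, "
        f"shortest gap={min(gaps):.1f} h"
    )
```

A production manager living through the second history would see most failures arrive 10 hours apart and would reasonably doubt a posted figure of 58.8 hours, even though the calculation is correct.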

Jim


It might be good to extend the discussion to a set of KPIs that different practitioners have had the best experience with. There are lots of available sources suggesting a large number of KPIs, and at least for newcomers it can be difficult to select the ones that bring the most value and to stage the others later on.

It is clear that KPIs change with the maturity of maintenance processes and organization, as well as with the level of CMMS/EAM support, yet in my opinion, the topic would be helpful for many. 

My own belief is that it is better to have a lower number of KPIs to begin with. It is more important to ensure they are meaningful, make good use of them and take action, rather than acquiring too many KPIs which dissolve at the end of the day, with nothing happening based on them.

Andrej


Andrej - I agree with the fewer is better premise and that the set of KPIs used will mature with the organization.

I believe it is key to really think through what information those KPIs can provide and how it might be interpreted or misinterpreted. The example of misusing MTBF (above) is a case in point. I've seen "downtime" used in a way that drove massive investments in spares that were simply not needed. I've seen availability (in its various forms) used to mislead general management into thinking things were just fine, when in fact, they were not. We can choose from among many KPIs that we can measure, but it's the consequences of using them that we want to understand. They can and sometimes do drive behaviors - we want to make sure that they drive the right behaviors.

Unlike an error signal in a control system that is used to adjust an actuator - the human "actuator" also has the ability to make choices. The results are not always so easily predictable. We need to consider not only what we are measuring (error signal) and what we want to encourage (control adjustment), but also what the organization (human actuator) is likely to interpret as intent (e.g.: can this get me in trouble?) and therefore produce a response that may or may not match the desired outcome. More information can produce better outcomes to a degree, but it can also lead to confusion. 

Each KPI represents an observation on something that is or is not happening. What we want to address are the underlying causes if those trends are unfavorable. Single data points rarely convey the whole story. Too many data points don't inform, they distract. What is important in or to one organization may not be important in another. What helps one may harm another. KPIs are sort of like clothing: a tailored suit will fit better, at least until the wearer changes shape. One size does not fit all and certainly won't fit all over time. Thoughtful, experienced tailors who can tune into the organization's culture are needed.


 

Raul, Jim & Andrej,

 

Gentlemen, thank you very much for the feedback! The reason I had posed the question about the underlying data support of the lagging MTBF KPI was to stir up a deeper understanding of its intended use from professionals like yourselves.

I have a principle I follow based off of the term or principle of “Confirmation Bias” which, in short, pertains to a person (or organization) only looking at the truth / facts of a situation that are in alignment with one’s own values and beliefs while ignoring opposing truth / facts that may support a strong challenge. This explanation likens itself to pointing a finger. The pointed finger represents one truth while the opposing 3 fingers can be described as oppositional truth.

It was stated to me at one time that a good attorney always anticipates questions that may be asked especially those that may not support their case. In the case of other department managers possibly pointing out data discrepancies (as per my example) understanding the data captured and associated protocols used to obtain the data is paramount.

NOTE: On the emotional side, I am sure all of you have experienced the 2 steps forward in progress only to be followed by the 5 steps back scenario. All it takes is one person’s unfounded remark or opinion to cause a ripple effect leading to unfavorable progress, trust, and/or support.

So, regarding MTBF (and MTTR), this KPI seems to be spoken of and written about numerous times in articles, books, and throughout the internet as to its importance. MTBF, for me, is a lagging KPI scorecard that provides feedback only, and I would not use it as a means to make any hasty decisions. Leading KPIs in support of MTBF (and in general) are the interesting focal points for me.

I agree with your statements of the fewer the better as a start. I also agree that each company may have different reporting needs for desired outcomes. This said, I have a question for you.

What common KPIs (leading and lagging) would you propose as a good start for a plant to utilize if none exist? As stated above, I understand that there are variations from company to company, but in general terms I am asking for a generic list.

In closing, I am very grateful to be part of this forum and thankful for your shared data and experiences. All of you definitely create a positive impact for the world of maintenance / engineering reliability.

 

Sincerely,

 

Jim

  • Community Leader

Hi all,

This has been a great discussion so far!

I agree with what has been said. Fewer is better. Too many KPIs not only might be too time-consuming to measure, but also might lead to confusion.

Regarding MTBF itself, although it can be defined as a KPI, I prefer simply calling it a metric. This is because, as I mentioned before, I would not track the MTBF of a piece of equipment or a system periodically. I like to use MTBF only for specific studies: before the study and after, and maybe once more in the middle of the analysis just to check whether I am on the right track. It is like a Life Data Analysis: we probably would not do such an analysis every single week or month to track the behaviour of our assets.

@Jim Vantyghem, what if we create a specific topic to discuss KPIs? I am concerned that, if we talk about too many different KPIs now, this might confuse those who are aiming specifically at MTBF.

 

Regards,

Raul Martins


Hi Raul,

Opening a separate topic on KPIs makes a lot of sense.

It would also be useful to take one of the existing international standards (e.g. the latest edition of EN 15341 Maintenance - Maintenance KPIs, or any other one you may prefer) as a reference.

I fully agree with UptimeJim that "experienced tailors who can tune into the organization's culture are needed", yet many organizations simply do not have those available - for different reasons.

Hence, I believe the discussion proposed by Jim, to put together a set of potential KPIs to start with, would be useful. I've seen quite a few organizations not utilizing any maintenance KPIs besides budget compliance. Unfortunately.

UptimeJim is absolutely right by emphasizing one size does not fit all; yet some colleagues who are sweating their path out of firefighting may still find the discussion on basic KPIs helpful. The adaptation to their specific needs which will inevitably change with the level of development of the maintenance processes (and associated IT systems) will certainly be a must. 

Best regards,

Andrej

 
