Failures don't matter; consequences do

Maintenance is about preventing failure, isn't it? Well, RCM says that it isn’t.

That probably sounds surprising, so let's take an example.

A pump is used about once per day to evacuate waste water from a tank. If it fails, there is enough capacity in the tank to allow water to be stored for another two days. Even if the pump cannot be repaired quickly, two maintainers can arrange a temporary system to empty the tank and keep the process running in about three hours.

The same type of pump is used to pump water in a cooling system. If it breaks down, the process that it supports shuts down within two minutes, and it takes at least two hours to repair or replace the pump. The cost of downtime is between $2000 and $3000 per hour.

If I ask you whether we should put more maintenance effort into the first pump or the second, you don’t have to think for long: the second pump is more important. The effects of the first pump’s failure are so slight that we might even decide to leave it alone and fix it when it fails.

Although these are identical pumps that suffer the same failure, there is a fundamental difference between them: what happens when they fail. In the first case, the effects are probably limited to the cost of repairing the pump. In the second case, production stops within minutes, and downtime costs quickly outweigh the cost of repair. We care far more about failures of the second pump than those of the first. In other words, it is the effects or consequences of failure that matter, not the failure itself.
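
To put rough numbers on the comparison, here is a minimal Python sketch using only the figures given above. The class and field names are hypothetical, chosen for illustration. It assumes the first pump causes no lost production (the tank buffer and temporary system keep the process running) and that the second pump's two-hour repair is a minimum, so the result is a lower bound. The repair cost itself is left out because it is the same for both pumps.

```python
from dataclasses import dataclass


@dataclass
class FailureScenario:
    """Consequences of one pump failure (illustrative names, not an RCM standard)."""
    name: str
    downtime_hours: float          # hours of lost production per failure
    downtime_cost_per_hour: tuple  # (low, high) estimate in dollars
    workaround_labor_hours: float  # labor-hours to set up a temporary arrangement

    def downtime_cost_range(self):
        """Return the (low, high) downtime cost in dollars for one failure."""
        low, high = self.downtime_cost_per_hour
        return (self.downtime_hours * low, self.downtime_hours * high)


# First pump: no lost production assumed; two maintainers for about three hours.
waste_pump = FailureScenario("waste water pump", 0.0, (0, 0), 2 * 3)

# Second pump: at least two hours of downtime at $2000 to $3000 per hour.
cooling_pump = FailureScenario("cooling pump", 2.0, (2000, 3000), 0.0)

for pump in (waste_pump, cooling_pump):
    low, high = pump.downtime_cost_range()
    print(f"{pump.name}: downtime cost ${low:.0f} to ${high:.0f}, "
          f"workaround labor {pump.workaround_labor_hours:.0f} h")
```

Under these assumptions, each failure of the cooling pump costs at least $4000 to $6000 in downtime, while the waste water pump costs about six labor-hours and no lost production.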

This approach to maintenance tasks is central to RCM: the purpose of maintenance is not to prevent failures, but to manage the consequences of failure.