Why major accidents are not due to random failures

Jamie Elliott discusses how understanding systemic risks can help us achieve the next level of major hazard safety.

Published: 2 November, 2016

On 1 July 2002, two planes collided in mid-air over the German town of Überlingen. Whilst the technology may be different, this accident from the aviation industry can teach us important lessons in oil and gas.

Mid-air collision is a well-recognised hazard in aviation. To guard against it, two independent barriers were implemented. We would therefore expect an accident to be unlikely, but unfortunately history shows otherwise.

The first barrier was Air Traffic Control (ATC), which monitors the positions of aircraft and, if two are on a collision course, issues an advisory to one pilot to climb and to the other to descend. The second barrier was a Traffic Collision Avoidance System (TCAS) on board both aircraft, which could also detect an impending mid-air collision and warn the pilots to take avoiding action.

Unfortunately, ATC and TCAS gave conflicting advisories. The first pilot was told by ATC to descend but by TCAS to climb, and vice versa for the other pilot. The first pilot followed ATC and the other followed TCAS, so both descended and the two aircraft collided. Everybody on board both aircraft was killed.

This accident challenges many of our common assumptions.

Firstly, it was a no-failure accident. The ATC, TCAS, pilots and planes all performed within acceptable limits. So, it is surprising to find that we can have an accident without any person or component failing.
Secondly, adding a second, independent layer of protection actually made things less safe, not safer as we would expect. Adding TCAS increased the complexity of the overall system, and an unanticipated interaction between different parts of that system caused the accident.
Thirdly, this was not a random failure that we could have modelled with quantitative techniques. A quantitative risk analysis can only take account of factors that we know about. But the solution to conflicts between ATC and TCAS is not to assign a probability or frequency for it happening, but to design it out (as the aviation industry has now done).
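
The interaction described above can be made concrete with a small sketch. It is an illustration only, not a model of the real ATC or TCAS logic: the names `Advisory`, `manoeuvre`, `separated` and `outcome` are hypothetical, and the advisories are simply those given in the account above. It enumerates which source each pilot follows and shows that every component behaves as specified in all four cases, yet two of the four combinations end in a collision.

```python
from enum import Enum
from itertools import product


class Advisory(Enum):
    CLIMB = "climb"
    DESCEND = "descend"


def manoeuvre(atc, tcas, follows):
    """Manoeuvre flown by a pilot who obeys the source named in `follows`."""
    return atc if follows == "atc" else tcas


def separated(first, second):
    """The aircraft stay apart only if they move in opposite vertical directions."""
    return first != second


# Advisories as described in the article, as (first pilot, second pilot).
# Each barrier is self-consistent: taken alone, it separates the aircraft.
ATC = (Advisory.DESCEND, Advisory.CLIMB)
TCAS = (Advisory.CLIMB, Advisory.DESCEND)


def outcome(first_follows, second_follows):
    """True if the aircraft are separated, False if they collide."""
    first = manoeuvre(ATC[0], TCAS[0], first_follows)
    second = manoeuvre(ATC[1], TCAS[1], second_follows)
    return separated(first, second)


# Enumerate every combination of "which barrier does each pilot trust?"
for first_follows, second_follows in product(("atc", "tcas"), repeat=2):
    result = "separated" if outcome(first_follows, second_follows) else "COLLISION"
    print(f"first follows {first_follows}, second follows {second_follows}: {result}")
```

In this toy model the hazard disappears whenever both pilots apply the same priority rule, whichever rule that is. No probability is assigned to the conflict; the conflicting combinations are simply removed, which is the sense in which the conflict is designed out rather than quantified.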

This accident and others have led many people to change the way they think about safety in high-hazard, complex socio-technical systems like those found in oil and gas. Rather than analysing the potential for random failures of individual people or components, they have started looking at the interactions within the system.

One technique for doing this is STAMP (Systems-Theoretic Accident Model and Processes), developed at MIT.

DNV’s free webinar took place on Wednesday 9 November. It demonstrated how STAMP can be used in the oil and gas industry to bring new insights and help us achieve the next level of safety.

A recording of the webinar is available for download here.
