The Rail Safety and Standards Board Podcast

When Software Goes Wrong—A Quick Recovery

March 03, 2021 RSSB Season 1 Episode 15
Show Notes

In this sixth podcast about software failures in safety-critical systems, Dr Emma Taylor talks about an incident that happened in 2014 during normal working of the National Air Traffic Services (NATS) system. We look at what went wrong, and how good recording and documentation at each stage in the V-model allowed the air traffic control system for southern England to be quickly reinstated after a complete shutdown, without any harm to the thousands of passengers in the air.

02:05 The incident and its impact on passengers; and what the railway can learn from it.

04:20 What's coming for the railway as it introduces more and more digital parts.

04:43 The system definition step in the V-model, and assumptions made about the core software.

07:15 Why the latent software fault wasn't found; the failure, and safety hazard categorisation.

09:20 How good documentation and work logs narrowed the search for the faulty line of code.

10:51 Specifying the ability of a complex software-based system to log changes and faults.

11:39 The recommendations from the NATS report that will help find the 'needle in the haystack'.

14:04 The need to manage software quality in the supply chain.

15:12 Don't ask suppliers to deliver beyond their capabilities.

16:44 Retaining development information, auditing the evidence, verifying processes, and formal error management systems.

Resources mentioned in this episode:

NATS System Failure 12 December 2014 – Final Report, Independent Enquiry https://www.nats.aero/wp-content/uploads/2015/05/Independent-Enquiry-Final-Report-2.0.pdf 

Loss of safety critical signalling data on the Cambrian Coast line, 20 October 2017: https://www.gov.uk/raib-reports/report-17-2019-loss-of-safety-critical-signalling-data-on-the-cambrian-coast-line 

The digital bits of a system podcast https://www.orr.gov.uk/guidance-compliance/rail/health-safety/strategy/rm3 

The V-model on Geeks for Geeks.org: https://www.geeksforgeeks.org/software-engineering-sdlc-v-model/ 

The V-model for humans on Wikipedia: https://en.wikipedia.org/wiki/V-Model_(software_development) 

Other related resources:

LHSBR Infrastructure Asset Integrity section: https://www.rssb.co.uk/safety-and-health/leading-health-and-safety-on-britains-railway/infrastructure-asset-integrity 

LHSBR Rolling Stock Asset Integrity section: https://www.rssb.co.uk/safety-and-health/leading-health-and-safety-on-britains-railway/rolling-stock-asset-integrity