Systems failure

Systems failure is a significant business risk

Our growing reliance on technology means it is increasingly common for organisations to suffer systems failures that significantly disrupt their operations. Recent examples of how companies have been affected by systems failure include:

  • An airline faced a computer outage caused by a fire affecting its data centre at the group’s headquarters. Around 2,300 flights were cancelled and hundreds of thousands of passengers were delayed.
  • A bank experienced a systems failure in which 600,000 customer payments and direct debits went missing. The failure was caused by the bank’s IT infrastructure struggling to deal with traffic volumes.
  • A large metals producer saw its systems fail after a ransomware attack, forcing it to revert to manual operations for some processes. The estimated cost of the incident to the company was $50 million.

Such incidents have the potential to cause severe financial, operational and reputational problems for the organisation, including high costs associated with managing the fallout.

Consumer reactions and regulatory responses to a system failure can result in lost customer revenues and substantial reputational damage. News of the failure can spread extremely quickly on social media, and at the same time regulators’ expectations are increasing. As a result, effective and rapid situational response and crisis management need to be a strategic priority.

Extraordinary challenges

Rapidly and effectively responding to a significant systems failure can be challenging. Some of the common obstacles include:

  1. Lack of assigned responsibility and accountability regarding who owns different elements of the crisis response.
  2. Inability to execute an integrated, cross-functional response.
  3. Lack of experienced, knowledgeable and available experts who can immediately assist with the issue.
  4. Inability to rapidly ramp up resource in call centre teams to respond to the surge in both customer and supplier inquiries.
  5. Difficulty obtaining the data and information required to organise the response, for example to answer requests for information from the regulator.
  6. Inability to accurately track and manage the costs or scope of the failure, due to system and data challenges.

These challenges highlight the complexity of managing the fallout of a significant systems failure and of rapidly mobilising an effective remediation operation.

Four pillars of effective systems failure response

Taking definitive action within the first 48 hours is critical. In our experience, many companies are unprepared and lose time during this vital period. They focus on organising a response team and obtaining information to enable decisions to be made, whilst the story continues to progress unchecked. As a result, the issue continues to grow. Below, we identify the four pillars of successful systems failure response.

Resilience planning and process design

The remediation team should review and benchmark the current IT service management processes, architectural management approach, service development method and technologies that contribute to the IT resilience capability.

The team should also perform end-to-end IT service risk mapping, conducting a series of deep dives to develop risk maps that illustrate where and why technology resilience risks are concentrated in the IT estate.

Information management

Companies should be able to:

  • assess the scope of a failure
  • track and document remediation activity
  • respond quickly and accurately to requests for data from the regulator or other external stakeholders
  • track all remediation costs and KPIs.

This requires robust information and project management platforms.

Understanding the technical estate

A rapid understanding of the end-to-end technical estate is essential. In particular, the response team needs to develop a view of how the technologies in place operate and interact, so that interdependencies are identified and further systems risks are mitigated during an incident.
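As an illustration only, the idea of mapping interdependencies can be sketched as a small dependency graph that is walked to find everything affected when one component fails. The component names and the `impacted_by` helper below are hypothetical, chosen for the example; a real estate map would be built from an organisation's own inventory and would be far larger.

```python
from collections import deque

# Hypothetical estate: each component maps to the components it depends on.
DEPENDS_ON = {
    "online-banking": ["payments-api", "auth-service"],
    "payments-api": ["middleware-bus", "core-db"],
    "auth-service": ["core-db"],
    "middleware-bus": ["dc-lan"],
    "core-db": ["dc-lan"],
    "dc-lan": [],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the edges so we can walk from a component to its dependents.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk outwards from the failed component.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


if __name__ == "__main__":
    for component in ("core-db", "dc-lan"):
        print(component, "->", impacted_by(component, DEPENDS_ON))
```

A component whose failure reaches most of the estate, such as the network layer in this toy example, is a candidate single point of failure and a natural place to focus a resilience review.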

Stakeholder communications

Companies need to communicate effectively with regulatory agencies and other key stakeholders, including employees, customers, suppliers, insurance carriers, investors, and the board. The response team must determine what should be communicated, how and when.

How PwC can help

Rapid systems review

We can quickly understand the end-to-end technology landscape relevant to the incident; for example:

  • Applications impacted and dependencies e.g. middleware.
  • Data centres and infrastructure hosting environments.
  • Third party cloud providers.
  • Network dependencies including DC LAN, WAN links.

When to get in touch

Systems failure can be extremely complex and highly disruptive. PwC can rapidly scale up your response to limit the financial, regulatory and reputational impact. To find out more, please get in touch with our dedicated team.

How PwC has supported clients in crisis

International aerospace company

Nature of failure: IT outage that significantly disrupted the company’s operations. 

How we helped: PwC was engaged to undertake a rapid post-incident technical and operational review to help isolate the root cause of the incident and provide a set of recommendations to help prevent similar outages.

We provided a chronological incident report mapping the end-to-end technology landscape and assessing the current delivery model, including the functions provided by third party providers.

A major government organisation

Nature of failure: Significant IT outage that caused systems to be inoperable for ten days.

How we helped: PwC was engaged to investigate the incident. We found that there were multiple single points of failure in the hosting technology, which the client understood to be resilient. We also found that key service and supplier management processes were not sufficient to manage system resilience. We then helped the client through the remediation programme, from both a technology and process perspective.

Contact us

Umang Paw

Chief Technology Officer, PwC United Kingdom

James Cooke

Director, PwC United Kingdom

Tel: +44 (0)7718 864896

Steven Bewick

Forensic Services Leader, PwC United Kingdom

Tel: +44 (0)7725 706095
