Series 6 Episode 10: Operational resilience in a dynamic world - responding to evolving threats and the impact of AI

In this episode, host Tessa Norman is joined by Duncan Scott and Rory Spedding-Jones from PwC’s Technology, Data and Resilience practice to delve into the topic of operational resilience.

Our expert guests reflect on financial services firms’ ongoing operational resilience journeys, how these are likely to evolve beyond the March 2025 UK implementation deadline, and developing regulatory expectations. Our guests also explore the evolving resilience landscape in response to changing threats and market expectations, as well as the transformative potential for AI and other technologies to enhance firms’ resilience capabilities, drive efficiencies and add value both in the short and long term.

Transcript

Tessa Norman: Hi everyone, and welcome to the latest episode of Risk and Regulation Rundown, the podcast where we share our views and insights on financial services, risk and regulatory hot topics. In today's episode, we're returning to a subject that we've talked about before, and which I know is of keen interest to many of our listeners, and that's of operational resilience. We're joined by Duncan Scott, who's a Director in PwC's Technology, Data and Resilience Practice, and a former guest of the podcast, and Rory Spedding-Jones who's a Manager in the same practice. We last talked about this topic on the podcast back in June 2024, and at the time we were talking about how firms were preparing for the UK Operational Resilience Framework deadline of 31st March 2025, as well as their preparations for the EU Digital Operational Resilience Act, known as DORA. So, speaking where we are today, we are recording in mid-March, just a couple of weeks away from that end of March deadline. We thought now was a really prescient time to return to the subject and talk about where firms are now in their operational resilience journey, and reflecting on what lies beyond that deadline and how the world of resilience is changing, particularly in light of the transformative effects of technology. Duncan, it's brilliant to have you back on the podcast, welcome, and let's start our discussion by thinking about that deadline at the end of this month. Given that we're now very close to that deadline, what does that mean for firms? Does that mean all the hard work is done and they can start to think about other issues?

Duncan Scott: Thanks Tessa, it's great to be back and talking about my favourite subject with you again. The short answer is that there's still work to do. Firms have been working on this for quite some time, and the deadline itself talks about the ability for firms to operate within impact tolerance for severe but plausible situations, scenarios, so that's what they've been working towards. What is true is that there are scenarios out there that it's very difficult to operate within those tolerances for, so there will need to be further work, particularly the areas of cyber, if we think about ransomware, those are areas that are particularly challenging to firms and require significant efforts to overcome. It's also important, I find, to reflect on the fact that this is a journey, not a destination, and I feel like I'm talking as Paul Williams back in 2019, a former colleague of mine and the architect of the policy. There are still things to do going forward, and that's not just to appease regulators and hit the policy requirements, that's also because the world is evolving as well. Threats are changing, the nature of technology is changing and expectations in markets are changing. A simple one to highlight is the changes to T+1 for the settlement of certain products in the future, that's going to change expectations about how quickly things need to be done, in investment banking in particular, so that may well have repercussions for resilience. It's a dynamic environment and one that you can't say is solved today or even tomorrow.

Tessa: Absolutely, I think that's why it's such an interesting topic and a really good one for us to return to, because as you say, market expectations, market dynamics and of course regulatory expectations are continuing to evolve. I think that T+1 settlement reform is a really interesting one for a lot of our clients at the moment, and it's very topical that you mention it because that's actually going to be the topic of our next podcast episode, so listen out for that for anyone who's interested to find out more. If we talk a little bit more about then regulatory expectations, I'm sure the regulators will have been engaging with firms closely in the run up to that deadline, but as we look a bit beyond the deadline, what should firms be expecting in terms of regulatory expectations and broader scrutiny?

Duncan: What I would say is that my views on this are my own, so I'll just make that very clear, but I've had conversations with regulators, I've spoken to firms about how they're being supervised, and you can then broadly extrapolate out what might happen in the future. To date, regulators have been relatively clear about what they expect of firms in self-assessments and the information that firms are going to be providing them with. Those documents will be the heart of the way in which regulators initially will be looking at firms, and they've been doing it for some time, as this will probably be about the third iteration of self-assessments. What I would expect regulators will be doing with that is considering what interventions might be needed from a firm perspective, either to nudge them in the right direction or take some stronger action, but I suspect stronger action is further down the line as firms evolve. I also think they will be looking at that as a collective, and thinking about what these self-assessments are telling them about the industry, and what larger actions they might look to talk to the industry about or communicate on. I think that's going to be the mode in which regulators are likely to operate, and in time they obviously have broader supervisory powers and tools that they can use, where they see that there are issues to be dealt with.

I think resilience will start moving into that environment that's more typical of other functions within firms, where they could be subject to deeper reviews from the regulator, straying further through into skilled persons, but that's not a threat to put out there, that's just the nature of supervision and the way regulators work. I also think it's not just regulators who are interested nowadays, and we've seen recently the Treasury Select Committee put out their questions to the big banks across the UK on the outages that they've had over the last couple of years, in the wake of some other large outages. I mention that on the basis that the usual communication is between regulators and firms, and that's largely behind closed doors and dealt with like that, but this elevates it into the political and the public sphere, so it creates a new audience for resilience, and perhaps increased scrutiny. I think that's an important dynamic to think about when we're looking at what the regulators might do, but also what politicians might do as well.

Tessa: Absolutely. I think you've articulated really well there the ongoing and evolving nature of the challenge for firms, both in terms of how things aren't standing still in the market and how they're continuing to evolve in terms of wider scrutiny and expectations. Given that environment, what are the key areas that firms really need to be focusing on in order to continue to meet those expectations as we look beyond the deadline? Rory, it would be great to bring you into the conversation here, and let's kick off with your thoughts on that question?

Rory Spedding-Jones: Obviously, there are a number of different areas that we could cover here. I think the first one that I want to call out explicitly is around testing, and the work that I've been doing with my clients recently has been around how to take an integrated or holistic approach to that testing, and that can be from the minutiae and the detailed focus on some of the component testing, right through to board-level crisis exercising as well, and taking learnings from all of those pieces. Other things to consider might be business continuity plan validation, ITDR testing or third-party exercising as well. That integrated approach comes with a whole host of benefits, from taking the learnings and leveraging some of the insight that you've taken from some of the more detailed testing. Using that within your severe but plausible scenarios gives you a much greater level of insight, and you're working from much better data than you might otherwise be where you're having to make assumptions, or the stakeholders that you have engaged in the test might just have to make assumptions.

If you can work from that data, you can be much more confident in your understanding of, 'Am I going to breach my ITols in this scenario?' Or any vulnerabilities that you identify, because you've got a broad mix of stakeholders in the room, where you can start to unpick what those vulnerabilities might be in the session and understand where they originate from. Just picking up on that piece around stakeholders, the integrated approach comes with a benefit of helping to shift the culture a little bit as well, where you're getting that engagement from potentially siloed risk- or resilience-related teams. Getting them all in the same room and working towards the same outcome can really help with that engagement and that culture piece, and demonstrate the value to the business that are engaged in that testing as well, as it all moves towards the right direction of really helping to drive resilience and make the organisation more resilient. The final piece I'd just touch on there again, related to testing, is a lot of that insight and a lot of that value can really be expedited by taking advantage of some of the technology platforms that are on offer, and that we work with our clients and see in the market. You've got, it's going to sound super-nerdy, but some really cool platforms coming out around facilitation and simulation, and you've got those softer skills that come through those platforms, where you make it feel much more real. You could get bombarded with tweets, or social media messages, or phone calls and emails, and ramping up that pressure of what it feels like to respond to an actual incident, or you've got the collective resilience tooling space where you can capture all the different test types and leverage them all centrally. So, testing is what I'm spending most of my time with clients on recently, and how we enhance it and really drive some value from it is what my clients are focusing on.

Tessa: Brilliant, and Duncan, are there any other additional areas beyond testing that you'd highlight?

Duncan: Yes, obviously I wholeheartedly agree with what Rory is saying there. What I'm spending a lot of time with my clients on is looking at how they move from a project mindset, which sometimes gets linked in with that deadline, into what the future looks like from business-as-usual. Some have already shifted to that, and perhaps have been in that for some time, but there can still be a project feel, and that needs to be demystified from the boards and the exec, because this is going to continue. With that, there's a focus on the operating model for a resilience function. How it works with other parts of a bank, as Rory has spoken about, that's key for culture, and the exact activities it's going to do. While it's been creating methodology and implementing things, that's going to change going forward, to a bit more around refreshing and updating, and making sure things remain relevant, and the testing. The other point I've been spending time with takes resilience in a slightly different direction. It's where there's an intersection between resilience and operational risk. For a long time we've been thinking about post disruption, an event, what do we do to mitigate it and how do we do that? It's also possible to provide that resilience lens on the preventative control side of things on operational risk. To what extent, specifically for important business services, have we got controls that are well-designed and operating effectively, to prevent these disruptions from occurring?

It's shifting the gaze from looking past the disruption to pre-disruption, which is a very mature space anyway, but placing that lens of resilience over it starts to build a slightly different picture, focus efforts and focus allocation of resources. I think that's really important. The last point I'd touch on briefly is the point Rory has already mentioned, which is culture. There's a point that firms are trying to think about in terms of resilience by design. It's a term that's been thrown out there by many people. I'm sure many people have different views on what that actually means. For me, it's going beyond just hard-coding resilience questions and focus into new initiatives, and going into the strategy of the organisation and what it's trying to do going forward, and then all the touch-points that flow from that. So not too blinkered in terms of large IT changes, transformation projects, but actually if this organisation is looking to go into a new geography and influence the market there with the products they're providing, or to increase their presence in a particular product, that can change the important business services and the focus, and that needs to be picked up. For me, resilience by design should also lead to a great culture around resilience, I think those work together.

Tessa: Really interesting, thank you both. Rory, you mentioned some of the new and exciting technology platforms that firms are starting to use. It would be great to dive into that a bit more. I think, given the pace of technological change at the moment and the focus among a lot of our clients around using AI and other technologies to drive operational efficiencies, what scope is there for firms to do things differently in the space, and hopefully more efficiently as well?

Rory: Yes, I mean huge scope, and I think you can't talk about technology change and efficiencies without talking about AI, and what AI looks like today, and also what it could look like in the future. If I focus on today, the platforms that most people are used to interacting with are ChatGPT or the other equivalent chatbot tools that are excellent for summarising, creating and analysing data and information through the written word. Where I'm starting to see clients use this, and where we're helping them with that, is through their self-assessments. So, speaking to firms, these self-assessments are unintentionally becoming a bit of a behemoth, with teams spending a lot of time producing them and a lot of content in there, and so if you can have an AI layer effectively run through and do a lot of the leg-work for you it can save a lot of time, and teams can actually spend their time doing more valuable things, making the firm more resilient. A couple of examples, and again, these are generally helped by having some resilience tooling or a platform in place that's centralised, where you've got access to all that data. The first is around vulnerabilities. If you have a vulnerabilities log and action trackers associated with that, you can feed that into an AI platform, and get summaries, and format that how you'd like to, to then lift and shift into a self-assessment. The same could be said for any changes that your services may have been through.

If there are refreshers or updates to your tolerances, if there are changes to the scopes or introductions of new services, you can use AI to help tell the story of that and what that means, and the same can be said for testing as well, so getting that output from the testing and formatting it in the right way to include in your self-assessment. Now, I'm absolutely not saying it's going to be perfect and AI's just going to do that work for you, there's still going to be a role for humans, at least for the time being, to review that and make sure it all works and is not hallucinating anything, but it's certainly a leg up and can be a huge time saving in terms of how they're producing those self-assessments. If I was just to pick up on the testing point as well, rather than just focusing on the output, AI can be used a lot for the input to testing as well. So, the crafting of different scenarios, justifications or narratives for why certain tests are valuable, as well as during the facilitation some creative injects that can help to stress or challenge that scenario, or increase some of the sophistication there. Outsourcing some thinking, shall we say, can make those types of sessions really engaging and really help to drive some efficiencies and make life slightly easier for the teams that we work with.

Tessa: Brilliant, I think it's really helpful to hear some of those practical use cases that firms are implementing in the here and now. Of course, as you've articulated, technology can really help firms improve their approaches, but as firms implement technology change, that can itself pose operational resilience risks that need to be managed, and Duncan, as you mentioned earlier, the TSC has been focused on some of those issues recently. How should firms be thinking about resilience in terms of technology change?

Rory: Yes, really interesting, and I think it goes back to what Duncan said around the preventative or the forward-looking approach. The TSC response that you mentioned, and tying it back to the AI we talked about, we've actually developed a tool internally where we fed the initial Treasury Select Committee letter, and all of the responses from the banks, into its knowledge base, so that all of our teams could then ask some questions and get some insights on it without having to read through all of the letters. Using that we did some analysis over some of the responses, and the big piece that came out was that technology change, or business change, was by far and away the main driver for the incidents that had been caused, and as Duncan said, shifting to that proactive mindset, there are some clever things that you can do when we start to look at what that means. For example, let's say you've got a change window coming up this weekend, and you've got a few different applications or technology assets that are going through a change. If you have tracked or tagged those to a specific service, you might want to flag that, 'Oh, we've got three applications all going through the same change window that all support the delivery of one service.' That might be a bit of a risk that you want to flag, and you might want to put a control in place around that. You might want to then notify your IBS owner or delegates that this is coming up, maybe have a think about what your service recovery plans look like, or what workarounds you might put in place. Again, if we tie it back to what Duncan was talking about a little bit about the culture, proactivity, engagement, if you can nip that in the bud before something has gone wrong it's a lot less stressful for everyone involved, and you're proactively being resilient as opposed to, as you say, having to respond or put workarounds in place for if something did go wrong.
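
The change-window check Rory describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tool mentioned in the episode: the `PlannedChange` record, the `flag_concentrated_changes` function, the threshold of three and all of the sample application and service names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedChange:
    application: str  # application or technology asset being changed (hypothetical names)
    service: str      # important business service the asset is tagged to
    window: str       # change window identifier, here a date string


def flag_concentrated_changes(changes, threshold=3):
    """Return {(window, service): [applications]} for every service that has
    at least `threshold` distinct applications changing in the same window."""
    grouped = defaultdict(set)
    for change in changes:
        grouped[(change.window, change.service)].add(change.application)
    return {
        key: sorted(apps)
        for key, apps in grouped.items()
        if len(apps) >= threshold
    }


# Illustrative data only: three assets supporting one service share a window.
changes = [
    PlannedChange("payments-gateway", "Retail payments", "2025-03-22"),
    PlannedChange("fraud-screening", "Retail payments", "2025-03-22"),
    PlannedChange("ledger-core", "Retail payments", "2025-03-22"),
    PlannedChange("crm-portal", "Customer onboarding", "2025-03-22"),
]

flags = flag_concentrated_changes(changes)
for (window, service), apps in flags.items():
    print(f"Window {window}: {len(apps)} changes affect '{service}': {', '.join(apps)}")
```

A flag like this would only be a prompt for the follow-up Rory mentions, such as notifying the service owner or reviewing recovery plans and workarounds. The threshold is an arbitrary choice for the sketch.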

Tessa: Absolutely, really helpful to hear that proactivity point brought to life, thank you for that. You've talked to us about some of the more near-term use cases for AI, and it would be great as well to get your reflections on how you see use of AI and technology evolving looking a little bit further into the future. If we think about AI's potential to really transform business processes, looking a few years or perhaps even five years ahead, what does that mean for how firms approach resilience and respond to more complex threats in a world which could probably look very different to our current world today?

Rory: That feels like a very difficult question to answer. I think if anyone told you they know what the world is going to look like in five, ten, fifteen years, I think they're lying to you. I really don't know what a post-AI or post-AGI world is going to look like. So, AGI, artificial general intelligence, typical definition being an AI that can do anything that we can do, which leads to the point where we as human beings might be a little bit out of the loop. The one thing I think we can hold true with that, and that we're starting to see develop, is that AI is becoming more agentic, so it's going from producing outputs, mainly written, to taking actions and doing work for us. If I was to walk through a simple scenario with that, you might have something like a payment request or a refund request that an AI is looking at, and there's one particular threat there called indirect prompt injection, so effectively a bad actor is giving instructions to the AI hidden in the data that it's looking at. So you could embed within that payment request an instruction to the AI to maybe increase the payment amount, or change the bank account where that payment gets sent, and at the moment it's a real Achilles heel of these AI systems, because we don't really know, one, how to detect it, or two, how to prevent the AI then acting upon those instructions.

You get into this world where you've introduced some quite difficult and new risks and challenges to be resilient against and protect against, and the controls you put in place around that, do you have an AI performing a control check to make sure that those payment amounts are matching what was expected? Do you still have humans in the loop that are doing that control check, and does having a human in the loop reduce some of the efficiency gains? That becomes an interesting question to start to consider. So you've got AI that is starting to do more and more, and starting to talk to one another as well, where my AI talks to Duncan's AI rather than us talking to each other. When that's the case, there's less reason to use human language, to use English to communicate, and so AIs can start to be more efficient and talk how they need to, and then we're out of the loop and we don't really know how they're saying it, why they're saying it or what they're saying. Applying controls around that becomes really tricky, and if stuff goes wrong, because we don't necessarily understand how they're operating, building resilience into that process becomes a real challenge as well. I don't know if you will have an AI that is clever enough to detect that something is broken and be able to fix itself. I don't know if you will have a secondary AI that's deployed to fix the original one, or whether you have humans operating manual workarounds in an old-school business continuity world of get your paper out and do it on paper. Does a human operating a process how we currently work become the backup, or do you have an AI doing a different process that's the workaround or the backup instead? And so I really don't know what that world is going to look like, but it's certainly going to become more complex and potentially introduce some threats or danger into the marketplace, which Duncan I think you wanted to pick up on what that evolution looks like?
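
Returning to the payment request scenario Rory describes, one way to picture the control check is a simple comparison between what an AI agent proposes to pay and what the firm's own records say should be paid. The sketch below is illustrative only and rests on assumptions: the `PaymentInstruction` record, the `control_check` function, the account identifiers and the escalation rule are all hypothetical, and a check like this does not detect prompt injection itself, it only catches some of its effects.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PaymentInstruction:
    reference: str
    amount: Decimal
    account: str  # destination account (made-up identifiers below)


def control_check(expected, proposed):
    """Compare the agent's proposed payment with the system-of-record values.
    Returns a list of discrepancies; an empty list means the check passed."""
    issues = []
    if proposed.reference != expected.reference:
        issues.append("reference mismatch")
    if proposed.amount != expected.amount:
        issues.append(f"amount changed from {expected.amount} to {proposed.amount}")
    if proposed.account != expected.account:
        issues.append("destination account changed")
    return issues


# Hypothetical case: hidden instructions in the request have led the agent
# to inflate the amount and redirect the payment.
expected = PaymentInstruction("REF-001", Decimal("120.00"), "GB00-EXAMPLE-0001")
tampered = PaymentInstruction("REF-001", Decimal("1200.00"), "GB00-EXAMPLE-9999")

issues = control_check(expected, tampered)
decision = "escalate to human review" if issues else "release payment"
print(decision, issues)
```

Keeping the comparison deterministic and outside the AI's control is a design choice for this sketch. Whether a human, a second AI or a rule like this performs the check is exactly the open question raised above.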

Duncan: I think it's really interesting what you're saying there, because there are so many options out there and so many different permutations of how things could be done, that it's not necessarily easy to understand what that's going to look like as a whole. What I find from what you're saying there, Rory, is I can step back and look back over financial services over the last ten, twenty years, and we've moved from models where there's latency in the system, we all know when you paid a cheque in to a bank and you waited five days for it to clear, and then you'd get your money. We're now talking about systems where, to take that through to what you're saying, Rory, AI is speaking to AI in a different language. These systems suffer from getting more tightly coupled and interconnected, but they also become quite fragile in that process. So your points around governance, and understanding how these work, is really on point, because at the moment that we devolve responsibility for those things, which a regulator is never going to allow us to do anyway, there are going to be big problems. Actually, going back to the purpose of resilience in its original sense is about acknowledging quite how far financial services has come in the way it provides those services, but it had something to catch up in terms of how to manage, govern and deal with the disruptions that will inevitably happen. This is just a further evolution of that I think, Rory, and it brings a really interesting picture about how quickly things can be done in the future, and how little humans could be involved, which can be quite scary in itself. I think with all of these things, they come with time and with education and understanding, but we just need to be on top of it.

Tessa: Absolutely fascinating, thank you both so much for that, lots for firms to be reflecting on. As you say, Rory, we don't know what the future is going to look like, but firms certainly need to start getting a handle on some of these risks, and I think your point on governance, Duncan, is so interesting. If I think about some of the principles that firms should be applying to AI around explainability, accountability, in a world where two different AIs are talking to each other in a language we can't understand, how can you have that effective governance around it? Really interesting, and I think there are some quite big challenges there, not just for firms but for the regulators as well in terms of how the regulators approach a world that looks so different. Duncan, let's bring things back to that regulatory landscape. How do you see regulation for operational resilience evolving, both in the UK and globally?

Duncan: Thanks Tessa. I think operational resilience regulation isn't restricted to the UK alone, it's worth saying that for a start. Various other regulators have taken the UK standards and applied them themselves, almost directly, and others have come up with different versions of it, perhaps coming from a business continuity background for example. That's more typical of the likes of the MAS in Singapore, and the HKMA in Hong Kong. There are plenty of other regulators out there interested in this subject, it's gained quite a lot of traction. The challenge there is then how to manage some of that, because different regulators will be picking up on different issues they want to be solving. What we're finding, and I suspect Rory will have found it in some of his engagements as well, is that where firms can create a common focus or a core around what they want to deliver on resilience, that provides a good foundation. The PRA and FCA regulation does help with that. It's very outcome-focused, and provides room for manoeuvre in how it's delivered. Some other regulation, and you mentioned the Digital Operational Resilience Act, is a little bit more constrained. It's a bit more prescriptive, so what needs to then happen is to understand the links between the two, in fact the overlaps are relatively focused for those two, and how you achieve the core but you create variation for the other things that that won't achieve. That would be true of applying it to Asian regulators as well. So having that core is really important, holding true to that, and varying where necessary.

Tessa: Thank you, and I think that's a challenge for firms in lots of different areas, managing that divergence and as you say, the balance between prescription and more outcomes-focused rules. I really wanted to get your thoughts as well on third-party risk management in particular. It feels like there's a growing focus on that, both in the UK and globally. How do you see that link between operational resilience and third-party risk management, and how should firms be approaching that?

Duncan: I think that's a great question, because third-party regulation and management of risk is a big point going forward for the benefit of resilience, as well as just controlling your organisation. I think it's best summed up by the fact that the PRA and FCA in the UK launched their resilience policy, discussion, etc, at the same time as one on third party, and they did that on purpose. They want those two subjects to be inter-related and thought about as being connected, and that's only embedding further as we move forward. One of the key pillars of the Digital Operational Resilience Act is third party risk management, alongside cyber and technology risk, so that's calling out its role in resilience. There's the more recent consultation out there around incident reporting that also includes third party, framed in resilience but, importantly, about third parties, so another piece to deal with. Then, you've got the critical third parties regime, and how they're seeking to address how large third parties provide services into financial services.

There's a weight of expectation there on firms, from different angles, emphasising the role of third parties. If I were to step back and look at resilience as a whole, the evolution of it has been to learn about what's important and apply thought to that, but technology has often been the focus. And, as Rory says, it's one of the key fundamentals for why things go wrong. We're moving further forward into looking at third parties and their role, and quite how influential some of those are, and problems that need to be addressed around that. So for me, while resilience has had a time-frame associated with it with this policy, and will still continue and still has focus, a fundamental to that is third party. I know in a future podcast you're going to have one of my colleagues in talking about that more specifically, that should be much more insightful than me on that subject, but emphasising quite how important it is.

Rory: We're seeing that in our clients as well, where they're starting to engage their third parties, particularly around testing as well. It's almost a maths sum that doesn't quite have a good solution yet. Due to the nature of the market, we've got more and more firms that want to test in partnership with counter-parties, or sometimes even competitors, to understand what their reliance upon each other looks like, and we're seeing that I think in particular with some of the cloud service providers. There's a limited pool of cloud providers out there that service the market, but there's a much larger number of financial services organisations that take advantage of those services provided, and these FS firms want to start engaging with the cloud providers or with these third parties to test together, to get better insight and to get better answers as to what that dependency upon them looks like. We might end up in a world where there's only limited time and capacity at the third parties to support that testing, and so how they prioritise who they test with, do they prioritise the bigger market players who pay them more money, versus some of the smaller firms maybe getting left behind? I think that's something we just need to be conscious of, as we start to do more integrated testing with third parties, how that's balanced across the marketplace.

Tessa: Brilliant. Well, I think that's a really natural point for us to conclude our conversation on, because as you mentioned, Duncan, this is a topic that we're going to be returning to in a bit more depth in a future episode, so please look out for that, listeners, if you're interested in finding out more on third party risk management. This has been a really fascinating conversation. It's been great to hear both of your reflections on how operational resilience is really evolving, given the dynamic nature of the market and the evolving nature of threats, and particularly to delve into some of the impacts of the evolution of AI, so thank you both very much for that. Thank you as well to our listeners for joining us, and I hope you've enjoyed this conversation. As always, please subscribe to future episodes, and please rate and review this series as it helps other listeners to find us. If you'd like to hear more from us on risk and regulation, please look out for our regular publications on our website, which we'll link to in the show notes, and I look forward to speaking with you again next month.