WeCyberYou! Unlocked Podcast
The WeCyberYou! Unlocked Podcast breaks down cyber security, online safety and digital risks into clear, practical conversations anyone can understand.
Each episode is designed for a specific audience, ensuring the advice is relevant, accessible and grounded in real-world scenarios - not technical jargon.
Cyber Security Frameworks Demystified Part 8 - ISO/IEC 27031
In this episode, we break down what the ISO/IEC 27031 is, how it helps organisations prepare for cyber incidents and major disruptions and why ensuring ICT readiness is critical to keeping businesses running when everything else fails.
Duration: 0:19:30
Visit https://www.wecyberyou.com for more cyber security education, resources and awareness content like this.
Thank you for listening.
WeCyberYou! Team
Like and follow us to be notified when a new episode is released on this channel.
SPEAKER_01When you picture corporate cybersecurity, you probably imagine a giant digital fortress.
SPEAKER_00Right, like massive impenetrable walls.
SPEAKER_01Yeah, exactly. Heavy cryptographic gates and automated guards just keeping all the malicious actors out.
SPEAKER_00Yeah.
SPEAKER_01But uh what happens when those walls are breached?
SPEAKER_00Or worse.
SPEAKER_01Right. Or worse, what happens when the massive power grid running that entire fortress just completely fails? You know, the alarms go silent and the gates are stuck wide open.
SPEAKER_00That is the nightmare scenario.
SPEAKER_01It really is. And welcome to this deep dive on the WeCyberYou! Unlocked podcast. Before we get into today's topic, please take a quick second to follow the channel and remember to visit wecyberyou.com for more content exactly like this.
SPEAKER_00Highly recommend checking the site out.
SPEAKER_01Definitely. So today, for you the listener, we are looking at the official documentation for ISO 27031. Our mission on this deep dive is to understand the actual physical mechanics of how organizations keep their digital lights on when absolute disaster strikes.
SPEAKER_00Yeah, and it really is the ultimate playbook for the worst case scenario. I mean, we spend so much time in the tech industry discussing how to prevent an incident, right?
SPEAKER_01Oh, all the time.
SPEAKER_00But ISO 27031 is entirely focused on the mechanics of surviving one. It forces an organization to accept that the unthinkable is going to happen. And when it does, how do you mathematically and structurally ensure the business keeps functioning?
SPEAKER_01Okay. Let's unpack this because before we can look at how to fix a disaster, we really need to understand the overarching framework of technology readiness.
SPEAKER_00Right, the big picture.
SPEAKER_01Yeah. So if we look at other frameworks, like say ISO 27001, you can think of that as the anti-lock brakes on a car.
SPEAKER_00That's a good way to put it.
SPEAKER_01Right. It's doing everything in its power to stop the crash from happening. But ISO 27031 is the crumple zone and the airbags. It accepts that the crash is currently happening.
SPEAKER_00Yeah, you're already hitting the wall.
SPEAKER_01Exactly. And its entire mechanism is designed to absorb the kinetic impact so the passengers, which is the business, actually survive the wreck. It specifically zooms in on technology readiness.
SPEAKER_00What's fascinating here is how it acts as a mechanical bridge. It complements broader standards like ISO 27001 for information security and ISO 22301, which handles overall business continuity. But ISO 27031 carves out its own critical niche by focusing purely on the technology side of that equation. In the documentation, the core concept driving all of this is IRBC.
SPEAKER_01IRBC. Wait, which stands for what again?
SPEAKER_00Right. So that is ICT readiness for business continuity.
SPEAKER_01ICT readiness for business continuity, which sounds like really heavy corporate jargon, but the underlying concepts are actually quite elegant.
SPEAKER_00They are. I mean, IRBC is really the architectural heart of ISO 27031. It basically means engineering your IT systems to possess three specific qualities. They must be available, they must be resilient, and they must be recoverable.
SPEAKER_01Hold on, let me push back on that for a second. Availability and resilience. Those sound like the exact same thing to me. Like if a system is available, isn't it inherently resilient?
SPEAKER_00Not necessarily. And the mechanical difference between those two is actually where a lot of infrastructure design completely fails. Yeah. So availability is about redundancy. It means having a secondary load balancer or a backup power supply so that if component A fails, component B takes over.
SPEAKER_01The system just stays up.
SPEAKER_00Right. Resilience, on the other hand, is about how a system behaves under extreme stress or partial failure.
SPEAKER_01Oh, okay.
SPEAKER_00Like if a massive denial of service attack hits your network. A resilient system doesn't just crash, it degrades gracefully.
SPEAKER_01Degrades gracefully.
SPEAKER_00Yeah. So maybe the high resolution images stop loading on the application, but the core text-based transaction database keeps functioning. It bends instead of snapping.
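The graceful degradation described above can be sketched roughly as load shedding: keep the critical features on and drop optional ones when capacity shrinks. This is only an illustration; the `Feature` class, the feature names, and the capacity and cost numbers are all hypothetical and do not come from the standard.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    critical: bool  # True if the business cannot operate without it
    cost: int       # hypothetical load units the feature consumes


def features_to_serve(features, capacity):
    """Return the names of features to keep on, shedding optional ones first."""
    # Critical features always stay on in this sketch.
    served = [f for f in features if f.critical]
    remaining = capacity - sum(f.cost for f in served)
    # Optional features are added cheapest first while capacity remains.
    optional = sorted((f for f in features if not f.critical), key=lambda f: f.cost)
    for f in optional:
        if f.cost <= remaining:
            served.append(f)
            remaining -= f.cost
    return [f.name for f in served]


FEATURES = [
    Feature("transactions", critical=True, cost=40),
    Feature("hi_res_images", critical=False, cost=50),
    Feature("recommendations", critical=False, cost=20),
]

normal = features_to_serve(FEATURES, capacity=120)
under_attack = features_to_serve(FEATURES, capacity=60)
print(normal)        # all three features
print(under_attack)  # transactions and recommendations; images are shed
```

With reduced capacity the high-resolution images drop out while the transaction path keeps running, which is the "bends instead of snapping" behavior.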
SPEAKER_01Well, that makes total sense. So availability is having a spare tire.
SPEAKER_00Right.
SPEAKER_01Resilience is having run-flat tires that let you keep driving at a slower speed. And recoverability is how fast you can get the car to a mechanic and replace the axle entirely if you hit a massive pothole.
SPEAKER_00That is a highly accurate way to look at it. The ultimate goal here is to engineer an environment that prevents disruptions where possible, responds effectively when the environment is compromised, and recovers the underlying data quickly.
SPEAKER_01Just maintaining critical operations during the chaos.
SPEAKER_00Exactly.
SPEAKER_01So we have our crumple zone, but how do we actually design it? Because wait, so not all systems are created equal, right? Definitely not. Like if my company gets hit by a massive power outage, surely the IT team isn't trying to save the employee cafeteria menu server at the exact same time they're trying to save the primary customer transaction database.
SPEAKER_00No, they absolutely shouldn't be. You cannot save everything all at once. And honestly, if you try, you will just end up saving nothing.
SPEAKER_01It's like emergency room triage.
SPEAKER_00Exactly like that.
SPEAKER_01If you walk into an ER, some patients need a heartbeat instantly. They get rushed straight to the back.
SPEAKER_00Right.
SPEAKER_01Other people with a sprained ankle can sit in the waiting room for hours. But here's my question. In a hospital, a doctor makes that call. In a massive corporation with a complex distributed architecture, every department head thinks their specific application is the patient bleeding out.
SPEAKER_00Oh yeah. Everyone thinks their project is tier one.
SPEAKER_01Right. So if every department claims their system is tier one, who breaks the tie?
SPEAKER_00And that is exactly why the standard introduces two core mechanics: alignment with business continuity and risk management. An IT department cannot operate in a vacuum. That makes sense. Nor can they be the ones making the final triage decisions.
SPEAKER_01So how does that actually work in practice? Do they just put the sales director and the IT architect in a room until they agree?
SPEAKER_00No, it is much more empirical than that. A proper business impact analysis, or BIA, relies on financial and operational modeling. The business units have to quantify the exact cost of an outage.
SPEAKER_01Like literally assigning a dollar value to the downtime.
SPEAKER_00Exactly. They calculate that if the CRM goes down, they lose $50,000 an hour in sales.
SPEAKER_01Wow.
SPEAKER_00But if the internal HR portal goes down, it costs them maybe a few hundred dollars in lost productivity. The numbers dictate the triage order.
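The "numbers dictate the triage order" idea can be sketched as a simple sort by hourly outage cost. The CRM figure is the one quoted in the conversation; the HR portal figure is an assumed stand-in for "a few hundred dollars", and the function names are hypothetical.

```python
# Hourly outage cost per system, in dollars.
# 50_000 is from the discussion; 300 is an assumed value for "a few hundred".
HOURLY_OUTAGE_COST = {
    "hr_portal": 300,
    "crm": 50_000,
}


def triage_order(hourly_cost):
    """Return system names sorted from most to least costly per hour of downtime."""
    return sorted(hourly_cost, key=hourly_cost.get, reverse=True)


def outage_cost(hourly_cost, system, hours):
    """Estimated loss if one system is down for the given number of hours."""
    return hourly_cost[system] * hours


order = triage_order(HOURLY_OUTAGE_COST)
crm_two_hours = outage_cost(HOURLY_OUTAGE_COST, "crm", 2)
print(order)          # ['crm', 'hr_portal']
print(crm_two_hours)  # 100000
```

A real business impact analysis would weigh more than one number per system, but even this toy version shows how a quantified cost settles the "who is tier one" argument.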
SPEAKER_01Oh, so it completely removes the emotion from the equation.
SPEAKER_00Yes. The technology serves the business, not the other way around.
SPEAKER_01Okay. The documentation highlights two crucial metrics that come out of this business impact analysis. And I want to make sure you, the listener, really grasp the mechanics of these because they are basically the foundation of any recovery architecture. They are RTO and RPO.
SPEAKER_00Yeah, let's start with RTO, the recovery time objective. This dictates how fast specific systems need to be back online before the business suffers unacceptable, irreversible damage.
SPEAKER_01Okay.
SPEAKER_00So for a critical financial transaction system, your RTO might be five minutes. For that internal HR portal, your RTO might be a week.
SPEAKER_01Okay, so the BIA gives us our RTO. It's our math. We know the bank app needs to be up in five minutes. Now the second metric is RPO or recovery point objective.
SPEAKER_00Right.
SPEAKER_01If RTO is about time moving forward to get back online, RPO is about looking backward at your data. It defines how much historical data the business can afford to lose.
SPEAKER_00Exactly. It dictates your backup point.
SPEAKER_01So if my RPO is 24 hours, I am backing up my data once a day. If the system crashes, I lose yesterday's work, and that is deemed acceptable by the business.
SPEAKER_00Correct. But if you are a major financial institution, an RPO of 24 hours is catastrophic. You cannot tell millions of customers that their transactions from Tuesday simply vanished into thin air.
SPEAKER_01Oh yeah, that would be a nightmare.
SPEAKER_00A total disaster. For those Tier 1 systems, the RPO must be near zero, meaning data is backed up continuously in real time.
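As a rough sketch of how RTO and RPO turn into checks: the worst-case data loss of a periodic backup is one full backup interval, so the interval has to fit inside the RPO, and the estimated restore time has to fit inside the RTO. The function names and the sample durations below are illustrative assumptions.

```python
from datetime import timedelta


def meets_rpo(backup_interval, rpo):
    """Worst case, a failure hits just before the next backup,
    so up to one full interval of data is lost."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time, rto):
    """The system must be back online within the recovery time objective."""
    return estimated_restore_time <= rto


daily = timedelta(hours=24)

# A daily backup satisfies a 24-hour RPO...
print(meets_rpo(daily, rpo=timedelta(hours=24)))   # True
# ...but not a near-zero RPO (one second is an assumed stand-in).
print(meets_rpo(daily, rpo=timedelta(seconds=1)))  # False
# A 20-minute restore misses a five-minute RTO.
print(meets_rto(timedelta(minutes=20), rto=timedelta(minutes=5)))  # False
```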
SPEAKER_01Wait, an RPO of zero, backing up data continuously in real time sounds insanely expensive and resource heavy.
SPEAKER_00It is incredibly expensive.
SPEAKER_01How does a company even pull that off without slowing their entire network to an absolute crawl? Every time I save a file, it has to instantly copy to another server.
SPEAKER_00Well, which is why the BIA is so vital, right? You only apply an RPO of zero to the most critical databases. Mechanically, to achieve near zero RPO without tanking network performance, organizations don't use standard backups.
SPEAKER_01What do they use then?
SPEAKER_00They use synchronous replication over dedicated high-speed fiber channels. When a customer makes a deposit, the primary server writes the data. And before it even confirms the transaction to the user, it waits for a confirmation from a secondary server miles away that the data was also written there.
SPEAKER_01So they are literally writing the data in two places simultaneously.
SPEAKER_00Exactly. Or they use asynchronous replication with continuous delta syncing.
SPEAKER_01Delta syncing.
SPEAKER_00Yeah. Where only the tiny block level changes in the data are streamed to the backup site milliseconds after they happen. It requires massive bandwidth and incredibly sophisticated storage arrays.
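The difference between the two replication modes can be sketched in memory. In the synchronous path the primary confirms only after the replica acknowledges; in the asynchronous path it confirms at once and ships the change later. The `Primary` and `Replica` classes are hypothetical and ignore networking, failures, and block-level deltas entirely.

```python
class Replica:
    def __init__(self):
        self.log = []

    def write(self, record):
        self.log.append(record)
        return True  # acknowledgement sent back to the primary


class Primary:
    def __init__(self, replica):
        self.log = []
        self.replica = replica
        self.pending = []  # changes not yet shipped (asynchronous mode only)

    def write_sync(self, record):
        self.log.append(record)
        # Wait for the replica's acknowledgement before confirming to the user.
        return self.replica.write(record)

    def write_async(self, record):
        self.log.append(record)
        self.pending.append(record)  # shipped shortly afterwards
        return True  # confirmed immediately

    def flush(self):
        """Ship all pending changes to the replica, oldest first."""
        while self.pending:
            self.replica.write(self.pending.pop(0))


replica = Replica()
primary = Primary(replica)

primary.write_sync("deposit-1")
in_sync_after_sync_write = replica.log == primary.log

primary.write_async("deposit-2")
lagging_before_flush = "deposit-2" not in replica.log  # the exposure window
primary.flush()
in_sync_after_flush = replica.log == primary.log
print(in_sync_after_sync_write, lagging_before_flush, in_sync_after_flush)
```

The window between `write_async` and `flush` is where data could be lost if the primary died, which is why the asynchronous approach gives a near-zero rather than a zero RPO.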
SPEAKER_01Here's where it gets really interesting though. We've established the risk and we know our objectives with RTO and RPO. But math doesn't reboot a server.
SPEAKER_00No, it certainly doesn't.
SPEAKER_01What actually happens when the nightmare becomes reality and the data center is literally underwater? How do we move to the action phase?
SPEAKER_00Right, the response.
SPEAKER_01Yeah, the standard outlines the key components of an incident response. And one thing that really caught my attention was the communication plans. I find it fascinating that a highly technical IT standard explicitly demands communication protocols.
SPEAKER_00Well, because a disaster is rarely just a technical failure. It is almost always a human coordination failure as well. Oh, for sure. Think about it. If a massive outage hits, the IT engineers are frantically digging through code to restore routing tables, the executives are demanding answers for the press.
SPEAKER_01And the customer service team is giving clients completely inaccurate information.
SPEAKER_00Exactly. The disaster multiplies exponentially due to chaos.
SPEAKER_01But practically speaking, if the network is down, the company email server is down.
SPEAKER_00That is exactly why ISO 27031 requires out-of-band communication plans. You cannot rely on the infrastructure you are trying to fix to communicate about fixing it.
SPEAKER_01Oh, that's a great point.
SPEAKER_00Mature organizations will have entirely separate cloud-hosted communication platforms, like a standalone Slack workspace or dedicated emergency cellular devices that do not touch their primary network.
SPEAKER_01So it's totally isolated.
SPEAKER_00Totally isolated. The communication plan dictates exactly who talks to whom on what isolated platform and at what intervals.
SPEAKER_01So it's like, "The core switch is down, we are failing over to the secondary site, notify the client success team, we will be degraded for two hours." It basically manages the panic.
SPEAKER_00This raises an important question about the actual restoration, though. Communication is just the wrapper. Inside that wrapper, the standard requires an ICT continuity strategy and ICT recovery plans.
SPEAKER_01What's the mechanical difference between a strategy and a plan in this context?
SPEAKER_00So the strategy is the overarching architectural approach. Are we using redundant physical systems in a hot site, or are we relying on cloud failover? Okay. The recovery plans are the tactical, step-by-step technical procedures. It is the literal runbook an engineer executes.
SPEAKER_01When you say cloud failover as a strategy, what is actually happening mechanically? We hear that term all the time. Is there a literal heartbeat signal between the primary data center and the cloud backup?
SPEAKER_00In many architectures, yes. Quite literally. You will have automated monitoring systems constantly sending a health check or a heartbeat to the primary application. If that primary application fails to respond to three consecutive heartbeats, the automated failover mechanism kicks in.
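The "three consecutive heartbeats" rule can be sketched as a small state machine. The `FailoverMonitor` class is hypothetical, and the choice to stay on standby once failed over (leaving failback as a separate decision) is an assumption of this sketch, not something stated in the discussion.

```python
class FailoverMonitor:
    def __init__(self, threshold=3):
        self.threshold = threshold  # consecutive misses that trigger failover
        self.missed = 0
        self.active = "primary"

    def record_heartbeat(self, responded):
        """Record one health check result and return the currently active site."""
        if self.active != "primary":
            # Assumption: failback is a separate, deliberate decision.
            return self.active
        if responded:
            self.missed = 0  # any success resets the consecutive count
        else:
            self.missed += 1
            if self.missed >= self.threshold:
                self.active = "standby"
        return self.active


monitor = FailoverMonitor(threshold=3)
checks = [True, False, False, True, False, False, False]
history = [monitor.record_heartbeat(ok) for ok in checks]
print(history)
```

Two misses followed by a success do not trigger failover; only the final run of three consecutive misses flips the active site to standby.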
SPEAKER_01And how does the user traffic know to go to the cloud instead of the dead server?
SPEAKER_00That usually happens at the DNS level. The system automatically updates the domain name system records, essentially changing the internet's address book on the fly. Oh wow. It drops the time to live or TTL of the routing record to maybe 30 seconds. So all incoming internet traffic is instantly rerouted from the dead physical data center to the standby environment in AWS or Azure.
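A toy model of why the TTL matters: resolvers cache a DNS answer until its TTL expires, so after the record changes, a client can keep receiving the old address for up to one TTL. The `CachingResolver` class, the hostname, and the addresses (taken from the reserved documentation ranges) are all hypothetical, and real DNS behavior is more involved than this.

```python
class CachingResolver:
    def __init__(self, authoritative):
        self.authoritative = authoritative  # name -> (address, ttl_seconds)
        self.cache = {}                     # name -> (address, expires_at)

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached and now < cached[1]:
            return cached[0]  # still within TTL: serve the cached answer
        address, ttl = self.authoritative[name]
        self.cache[name] = (address, now + ttl)
        return address


PRIMARY_IP = "203.0.113.10"   # hypothetical data center address
STANDBY_IP = "198.51.100.20"  # hypothetical cloud standby address

authoritative = {"app.example.com": (PRIMARY_IP, 30)}
resolver = CachingResolver(authoritative)

before = resolver.resolve("app.example.com", now=0)
authoritative["app.example.com"] = (STANDBY_IP, 30)  # failover at t=5
during = resolver.resolve("app.example.com", now=10)
after = resolver.resolve("app.example.com", now=31)
print(before, during, after)
```

In this model the client still sees the primary address at t=10 and only reaches the standby once the cached entry expires, so a 30-second TTL bounds the rerouting delay rather than eliminating it.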
SPEAKER_01That's incredibly fast.
SPEAKER_00Yeah. When designed perfectly, the end user might just experience a slight page load delay.
SPEAKER_01That is incredible. But you know, the sources also give us some very practical examples of real-world threats ISO 27031 prepares you for. And not all of them are simple hardware failures.
SPEAKER_00No, definitely not.
SPEAKER_01We are talking about ransomware attacks shutting down entire networks. And this is where I kind of get hung up. If you get hit by advanced ransomware, how do you even recover? If everything is connected and syncing continuously to hit those RPO targets we talked about, wouldn't the ransomware just sync to the backup and corrupt that too?
SPEAKER_00That is the exact nightmare scenario. And it happens frequently. Wow. This is why the ICT continuity strategy must account for the specific nature of the threat. To combat ransomware, you cannot just rely on standard continuous replication. You need immutable backups.
SPEAKER_01Immutable meaning they cannot be changed.
SPEAKER_00Correct. The storage architecture is designed so that once a backup snapshot is written, it is cryptographically locked. For a set period of time, it cannot be modified or deleted by any user or administrator, even one with top-level network credentials.
SPEAKER_01So even if the ransomware encrypts the primary network and the administrative servers, it just hits a brick wall when it tries to infect the immutable storage array.
SPEAKER_00Exactly.
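The retention lock on an immutable backup can be sketched as a write-once store that refuses overwrites and refuses deletion until the retention period has passed, regardless of who asks. The `ImmutableBackupStore` class is hypothetical; in practice this guarantee is enforced by the storage platform itself, not by application code like this.

```python
class ImmutableBackupStore:
    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self._snapshots = {}  # snapshot_id -> (data, locked_until)

    def write(self, snapshot_id, data, now):
        if snapshot_id in self._snapshots:
            raise PermissionError("snapshot exists; overwrites are refused")
        self._snapshots[snapshot_id] = (bytes(data), now + self.retention)

    def read(self, snapshot_id):
        return self._snapshots[snapshot_id][0]

    def delete(self, snapshot_id, now, is_admin=False):
        _, locked_until = self._snapshots[snapshot_id]
        if now < locked_until:
            # is_admin is deliberately ignored while the lock is active.
            raise PermissionError("retention lock is still active")
        del self._snapshots[snapshot_id]


store = ImmutableBackupStore(retention_seconds=3600)
store.write("nightly-001", b"customer ledger", now=0)

try:
    store.delete("nightly-001", now=60, is_admin=True)  # stolen admin credentials
    blocked = False
except PermissionError:
    blocked = True

print(blocked)                    # True: the delete was refused
print(store.read("nightly-001"))  # b'customer ledger'
```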
SPEAKER_01So the recovery plan runbook for ransomware wouldn't just be restore the backup. It'd be step one, isolate the network. Step two, verify the immutability of the backup repository. Step three, forensically scrub the hardware. And step four, initiate the restoration.
SPEAKER_00Exactly. It turns an existential company-ending crisis into a highly stressful but entirely manageable engineering workflow.
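The four-step ransomware runbook can be sketched as an ordered list of steps that halts at the first failure, since restoring before the earlier steps succeed could reintroduce the infection. The step functions here are hypothetical stand-ins that simply return True or False.

```python
def run_runbook(steps):
    """Run steps in order; stop at the first failure.

    Returns (completed_step_names, failed_step_name_or_None).
    """
    completed = []
    for name, action in steps:
        if not action():
            return completed, name  # later steps assume earlier ones succeeded
        completed.append(name)
    return completed, None


# Stand-in actions: a real runbook would call actual tooling here.
RANSOMWARE_RUNBOOK = [
    ("isolate the network", lambda: True),
    ("verify backup immutability", lambda: False),  # simulated failed check
    ("forensically scrub the hardware", lambda: True),
    ("initiate the restoration", lambda: True),
]

completed, failed_at = run_runbook(RANSOMWARE_RUNBOOK)
print(completed)  # ['isolate the network']
print(failed_at)  # verify backup immutability
```

Because the immutability check fails in this example, the restoration step never runs, which is the point of ordering the steps rather than jumping straight to "restore the backup".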
SPEAKER_01So what does this all mean for day-to-day operations? Because a brilliant architectural strategy and an immutable backup on a piece of paper are completely useless if the engineer just freezes when the alarms actually go off.
SPEAKER_00If we connect this to the bigger picture of the standard, that is why the final major component of ISO 27031 is testing and continuous improvement. You cannot assume your plan works. You have to actively prove it works through rigorous testing.
SPEAKER_01What does that actually look like? Are IT directors running around like chaos agents, literally pulling fiber optic cables out of server racks to see what happens?
SPEAKER_00Honestly, in the most advanced tech companies, yes.
SPEAKER_01Wait, really?
SPEAKER_00Yeah, that is a practice known as chaos engineering. Companies like Netflix famously developed software like Chaos Monkey that intentionally terminates production instances randomly during the workday.
SPEAKER_01Just to see if the failover works.
SPEAKER_00Exactly. To ensure the automated failover mechanisms we discussed actually work under real-world conditions. But for most standard enterprises, testing starts with tabletop exercises.
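The idea behind this kind of testing can be sketched as a single experiment round: terminate one instance at random and check whether enough instances remain to serve traffic. This is a toy simulation, not how Chaos Monkey is implemented; the function name, instance names, and `min_healthy` threshold are assumptions.

```python
import random


def chaos_round(instances, rng, min_healthy=1):
    """Terminate one random instance and report whether the service survives.

    Returns (victim, survivors, service_still_up).
    """
    victim = rng.choice(sorted(instances))  # sorted for a reproducible choice
    survivors = instances - {victim}
    return victim, survivors, len(survivors) >= min_healthy


rng = random.Random(0)  # seeded so the experiment can be replayed

fleet = {"web-1", "web-2", "web-3"}
victim, survivors, still_up = chaos_round(fleet, rng)
print(victim, sorted(survivors), still_up)

# A single-instance deployment has no redundancy to absorb the loss.
_, _, lone_survives = chaos_round({"db-1"}, rng)
print(lone_survives)  # False
```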
SPEAKER_01Walk me through a tabletop exercise. How do you simulate a digital disaster in a conference room?
SPEAKER_00The security leadership will draft a highly specific worst-case scenario. Let's say a novel ransomware strain has just compromised our Active Directory environment.
SPEAKER_01Okay.
SPEAKER_00Active Directory is the system that controls all user permissions and passwords. So they say the attackers have changed all administrative passwords, you cannot log into your laptops, you cannot access the digital runbooks, the manufacturing floor has halted. Go.
SPEAKER_01Wow. That immediately exposes the human flaws. If the recovery plan is saved as a PDF on a server you can no longer log into, your plan is useless.
SPEAKER_00Exactly. The tabletop exercise forces the team to realize they need physical printed copies of the runbooks stored in a secure offline safe. It forces them to realize they need an out-of-band communication method because their corporate email uses Active Directory to authenticate. Right. The entire goal of the simulation is to break the plan, identify the gaps, and continuously improve the architecture based on those lessons learned. Technology changes, cloud environments shift, and threat actors evolve daily. And an untested plan is an obsolete plan.
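The gap a tabletop exercise exposes can be sketched as a dependency check: a recovery resource is unusable if it, or anything it depends on, is part of the failure. The dependency map below is a hypothetical example modeled on the Active Directory scenario in the conversation.

```python
# Hypothetical map: each resource -> the systems it directly depends on.
DEPENDS_ON = {
    "digital_runbooks": {"file_server"},
    "file_server": {"active_directory"},
    "corporate_email": {"active_directory"},
    "printed_runbooks": set(),
    "emergency_phones": set(),
}


def unavailable(resource, failed, depends_on):
    """True if the resource or any transitive dependency has failed."""
    seen = set()
    stack = [resource]
    while stack:
        current = stack.pop()
        if current in failed:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(depends_on.get(current, ()))
    return False


failed = {"active_directory"}
recovery_resources = [
    "digital_runbooks",
    "corporate_email",
    "printed_runbooks",
    "emergency_phones",
]

gaps = [r for r in recovery_resources if unavailable(r, failed, DEPENDS_ON)]
usable = [r for r in recovery_resources if r not in gaps]
print(gaps)    # ['digital_runbooks', 'corporate_email']
print(usable)  # ['printed_runbooks', 'emergency_phones']
```

Only the resources with no path back to the failed system survive, which matches the lesson about printed runbooks and out-of-band communication.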
SPEAKER_01So taking a step back and looking at everything we've covered, from crumple zones to immutable backups to tabletop simulations, why should the listener care?
SPEAKER_00It's a valid question.
SPEAKER_01Yeah, why does going through the rigorous, expensive process of aligning with ISO 27031 actually matter to the bottom line of a business?
SPEAKER_00Well, the documentation makes it clear that this isn't just an IT checklist. It is a core business survival tool. First, it drastically reduces downtime and financial hemorrhage. When transaction systems are offline, the company is burning money by the second.
SPEAKER_01Every single minute of an RTO has a dollar amount attached to it.
SPEAKER_00Absolutely. Second, it fundamentally changes your posture against cyber attacks. If you have a tested, immutable backup architecture and a zero trust recovery plan, ransomware loses its leverage.
SPEAKER_01Because you can just ignore the ransom.
SPEAKER_00Right. You don't have to pay it because you can confidently rebuild the environment yourself. Furthermore, it supports strict regulatory and compliance requirements that governments are increasingly mandating for critical infrastructure and financial sectors.
SPEAKER_01And I think the point that really hits home is that it protects reputation.
SPEAKER_00Oh, massively.
SPEAKER_01If an airline scheduling system goes down for an hour, it makes the evening news. If a bank app is offline for two days, people are moving their direct deposits to a competitor. Trust takes decades to build and a single poorly managed IT incident to destroy. Yep. ISO 27031 proves that disaster recovery is no longer an IT problem. It is a core business operation.
SPEAKER_00Trust is the ultimate asset you're protecting when you implement these frameworks.
SPEAKER_01So to summarize, the core takeaway from the source text. ISO 27031 is the standard that helps organizations engineer their IT systems to handle extreme disruptions and keep the business running no matter the circumstances. It moves an organization from hoping they survive a crash to mathematically engineering the crumple zone so they know they will.
SPEAKER_00It does. But uh, I want to leave you with one final slightly provocative thought to ponder based on one of the practical examples the standard addresses. Cloud service outages.
SPEAKER_01Oh, this is the cloud paradox.
SPEAKER_00Exactly. We discussed cloud failover. The idea that if your local systems fail, the massive, highly redundant infrastructure of a major provider like AWS or Microsoft Azure will seamlessly take over. Right. And many modern organizations build their entire disaster recovery strategy around this exact assumption. But what happens when the disruption isn't local? Oh man. If a massive cascading infrastructure event takes out the major global cloud providers themselves, which we have seen happen in limited capacities due to routing errors or massive localized weather events, where is the ultimate fail-safe?
SPEAKER_01Wow. If your backup plan relies entirely on someone else's computers, what is your strategy when their computers go down?
SPEAKER_00It really forces us to question the extreme limits of external redundancy. At a certain point, the cloud is just another physical data center vulnerable to the laws of physics.
SPEAKER_01That is a chilling thought. And definitely something to keep you up at night if you are a chief technology officer. Just when you think your digital fortress is secure and your crumple zones are perfect, you realize you might not control the actual ground the fortress is built on.
SPEAKER_00A very unsettling but necessary realization.
SPEAKER_01Well, on that slightly terrifying but incredibly important architectural note, we are going to wrap up this deep dive on the WeCyberYou! Unlocked podcast. Thank you so much for joining us. Please make sure you follow the channel so you never miss a deep dive. And head over to wecyberyou.com right now to keep exploring the complex frameworks that keep our digital world turning. Until next time, keep your backups immutable and your plans ruthlessly tested.