Darnley's Cyber Café

The Deepfake Threat: Voice Cloning & Video Fraud Targeting Businesses

Darnley's Cyber Café Season 6 Episode 45


Your CEO sounds exactly right on that Zoom call... but is it actually them?

In this episode of Darnley's Cyber Café, cybersecurity veteran Darnley breaks down the rapidly escalating threat of deepfake voice cloning and AI-generated video fraud targeting businesses. 

From the $25 million Arup incident to the 2025 Singapore case where attackers faked an entire executive video conference, this episode unpacks how these attacks work, who's being targeted, why finance teams are in the crosshairs, and what procedural defences actually hold up when your eyes and ears can't be trusted.

 If your organization moves money based on voice or video confirmation, this episode is worth the listen.


Subscribe now to Darnley's Cyber Café and stay informed on the latest developments in the ever-evolving digital landscape.

[ INTRO MUSIC — FADE IN, HOLD 5 SECONDS, FADE UNDER ]

 

Welcome back to Darnley's Cyber Café — I'm your host, Darnley. Grab your cup of coffee, settle in, and let's get into it.

 

Today's episode is about something that honestly used to sound like a thriller plot. But in 2025 and into 2026, it has become a very real, very expensive operational threat for businesses of every size.

 

Here's where we're starting. March 2025. A finance director at a multinational firm in Singapore joins a Zoom call with what appears to be her CFO and several other senior executives. They look right. They sound right. The CFO speaks. The other executives nod and respond. It feels completely normal — except for the urgency of the ask. A wire transfer, $499,000, for what they describe as a confidential acquisition.

 

She authorizes it.

 

None of those executives were real. Every face on that call was a deepfake. Every voice was AI-generated. The entire meeting was fabricated using publicly available recordings and photos of the actual leadership team.

 

That is the threat landscape we're operating in. And today I'm going to walk you through how we got here, how these attacks actually work, what the real incidents tell us, and — more importantly — what you can do about it…

 

[PAUSE]

 

SEGMENT 1 — How We Got Here

 

Let's go back to 2019. A UK energy company receives a phone call from what sounds exactly like their CEO — accent, cadence, the whole thing. He's asking for an urgent wire transfer of €220,000 to a supplier. The employee complies. The CEO never made that call. That was the first major documented business case of voice clone fraud in the wild. It was a proof of concept. A prototype.

 

What's happened since then is not incremental improvement. It's a qualitative leap.

 

There are three shifts that have driven this explosion. First, video realism. Modern video generation models now produce content with temporal consistency — coherent motion, stable identities, synchronized expressions. The artifacts that used to give deepfakes away — unnatural eye movement, flickering around the jawline, warping during fast motion, a weird or disappearing hairline — those have largely disappeared, particularly in short-form content.

 

Second, and this is the one I find most alarming from a practitioner standpoint — voice synthesis has crossed what researchers are calling the 'indistinguishable threshold.' We're not talking about minutes of audio to build a profile anymore. Three to ten seconds of clean audio is sufficient to generate a voice clone with natural intonation, rhythm, pauses, even breathing patterns. Three seconds. That's a conference intro clip. That's a Zoom recording. That's a LinkedIn video post. Maybe this isn’t even Darnley speaking. Something to ponder, eh?

 

Third — the technical barrier is essentially gone. We now have what the industry is calling Deepfake-as-a-Service. Criminal platforms offering ready-to-use voice cloning, video generation, and persona simulation tools. No technical expertise required. Just a script, a recording, and a target.

 

The numbers reflect this. Deepfake volumes grew from roughly half a million documented cases online in 2023 to an estimated eight million by 2025 — a sixteen-fold increase in just two years. And according to researchers at DeepStrike, a deepfake attempt was occurring against a business every five minutes in 2024.

 

[PAUSE]

 

SEGMENT 2 — How These Attacks Actually Work

 

Let me walk you through the mechanics, because understanding the attack chain is the first step toward defending against it.

 

For voice cloning, the attacker needs audio. That's the raw material. And they don't need much. Three to thirty seconds of clean audio — free of background noise, free of multiple speakers — is enough to train a modern model. Higher quality clones capturing subtle vocal characteristics might need a bit more, but the floor is shockingly low.

 

Where does that audio come from? Everywhere your executives are public. Earnings calls. Conference presentations. Podcast appearances. YouTube interviews. LinkedIn video posts. Press releases with recorded statements. If your leadership team has a public voice presence — and most do — there is usable training material out there. Attackers know this. They're harvesting it.

 

For video deepfakes, the model needs to learn facial movements from different angles and under different lighting conditions. Conference recordings are ideal. So is any video content where the executive's face is clearly visible for an extended period. The AI learns to replicate the face, the expressions, the lip movements — and then it can be overlaid on any script the attacker wants to run.

 

Now here's what makes the Singapore case particularly instructive. The attackers didn't just clone one person. They cloned an entire leadership team and ran an interactive, multi-person video conference. That's not a pre-recorded deepfake video. That's a real-time, multi-participant synthetic meeting. The complexity of that operation tells us these threat actors had resources, planning, and time. They weren't amateur hour.

 

And here's the part that keeps me up at night as someone who's spent years in this field. These attacks don't exploit technical vulnerabilities. They exploit organizational psychology. Urgency. Authority. The reluctance to question someone who appears to be your CFO. The fear of slowing down a sensitive business transaction. Companies spend significant resources hardening their systems, their endpoints, their email gateways — and attackers simply walk around all of it by impersonating the human beings inside.

 

Finance teams are the primary target, and deliberately so. They have direct authority to move money. They handle urgent transactions regularly. And urgency is normalized in finance — the deal is closing today, the transfer needs to clear by end of day. Attackers don't create urgency; they exploit urgency cultures that already exist.

 

[PAUSE]

 

SEGMENT 3 — Real Incidents & What We Can Learn

 

Let's talk about the cases that matter and what the patterns tell us.

 

You already know the 2019 UK energy firm case. €220,000. Voice only. No video. The technology at that point couldn't sustain a real-time interactive video call, so the attacker used a phone. But it worked. The voice passed every mental credibility check the employee ran.

 

In 2024, engineering firm Arup became one of the highest-profile cases on record. A finance employee in Hong Kong was invited to a video call. The call appeared to include the CFO and other executives. The request was for a $25 million wire transfer — across multiple transactions. The employee participated in what appeared to be a normal executive briefing. Every participant except them was AI-generated. Twenty-five million dollars.

 

What made that case historic wasn't just the dollar amount. It was the proof that multi-participant deepfake calls were now operationally viable. Before Arup, most security practitioners assumed deepfakes were limited to one-on-one scenarios. That assumption died in Hong Kong.

 

Then March 2025, Singapore again. $499,000. And here's the detail that I think is most important for your security posture. The attackers, knowing that deepfake awareness had grown since Arup, proactively suggested a video call to verify the request. They weaponized the verification step. The victim thought: 'I was going to ask for a video call, and they're the ones offering it — this must be legitimate.' That apparent willingness to verify created a false sense of security.

 

The CEO of global advertising giant WPP was targeted as well — back in 2024, someone cloned his voice and used it on a fake Microsoft Teams call. That attempt didn't succeed financially, but it demonstrates that even the most prominent executives, inside security-conscious organizations, are being targeted.

 

The common thread across every one of these cases: urgency, authority, and an artificial verification channel that the victim trusts. The playbook is consistent. And that consistency is actually useful — because it means we can train against it.

 

One more number I want to leave in this segment. Humans correctly identify high-quality deepfake videos only about 24.5 percent of the time. In another study, participants spotted audio deepfakes about 73 percent of the time — which still means roughly one in four clones sailed straight through. We are not equipped, cognitively, to detect this threat with our eyes and ears alone. Policy and process have to do what perception cannot.
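To make that 24.5 percent figure concrete, here's a back-of-envelope calculation I'll leave in the show notes. The independence assumption is mine, purely for illustration; real reviewers watching the same call are anything but independent, which makes the true picture worse, not better.

# Back-of-envelope: layering human reviewers barely rescues perception.
# Assumes independent judgments, an illustrative assumption, not a study result.
p_catch = 0.245                     # one reviewer catches a high-quality video fake
p_both_miss = (1 - p_catch) ** 2    # two reviewers both miss it
print(f"two reviewers catch it: {1 - p_both_miss:.0%}")   # about 43%, still under a coin flip

Two sets of eyes and you're still below fifty-fifty. That's the arithmetic behind "process beats perception."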

 

[PAUSE]

 

SEGMENT 4 — What Your Business Can Actually Do

 

Alright. Let's talk defenses, because this is where I want to be practical. I have sat across from organizations that have been through exactly these kinds of incidents, and I can tell you — the ones that recover fastest, and sometimes the ones that prevent it in the first place, have process on their side.

 

First: out-of-band verification is non-negotiable for financial requests. Pre-agreed code words or phrases that are not public knowledge. Callback procedures to known numbers — not numbers provided in the suspicious communication. Even something as simple as a team-agreed safe word for urgent wire requests. This sounds old-fashioned. It works. For those of you who build internal tooling, there's a little sketch of this control in the show notes, shown below.
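Here's a minimal sketch of what that looks like in code. Everything in it, the directory, the request shape, the function names, is a hypothetical illustration of the pattern, not a real payments API.

from dataclasses import dataclass

# Numbers on file, maintained separately from any incoming request.
# An attacker can spoof a request; they cannot edit this table.
KNOWN_NUMBERS = {"cfo@example.com": "+1-555-0100"}   # placeholder entry

@dataclass
class WireRequest:
    requester: str        # the identity the request claims
    amount_usd: float
    supplied_number: str  # callback number included in the request: never trusted

def callback_number_for(req: WireRequest) -> str | None:
    """Always call back on the number WE have on file. A number supplied
    inside the request itself is attacker-controlled and is ignored."""
    return KNOWN_NUMBERS.get(req.requester)

def approve(req: WireRequest, code_word_confirmed: bool) -> bool:
    """Approve only after a human dialed the on-file number and the
    requester confirmed the pre-agreed, non-public code word."""
    if callback_number_for(req) is None:
        return False      # no on-file number means no approval path
    return code_word_confirmed

The point of the sketch is the shape of the control: the trusted contact detail lives outside the request, and the human callback, not the incoming call, is what flips the approval bit.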

 

Second: multi-step approval for wire transfers, full stop. Regardless of how convincing the voice sounds, regardless of how legitimate the video looks. The Singapore attackers adapted specifically because they knew people had heard about deepfakes and might push back. Your controls need to assume that even verification steps can be faked. The answer is layering — multiple people, multiple channels, documented authorization.
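Here's that layering as a toy policy check, in the same hypothetical style as the sketch above; the thresholds and channel names are assumptions of mine, not any standard.

from dataclasses import dataclass, field

@dataclass
class Approval:
    approver: str   # who signed off
    channel: str    # e.g. "in_person", "callback", "signed_ticket"

@dataclass
class WireTransfer:
    amount_usd: float
    approvals: list[Approval] = field(default_factory=list)

def may_release(wire: WireTransfer,
                min_approvers: int = 2,
                min_channels: int = 2) -> bool:
    """One convincing call, voice or video, can satisfy at most one
    approver on one channel, so a faked meeting alone never clears this gate."""
    approvers = {a.approver for a in wire.approvals}
    channels = {a.channel for a in wire.approvals}
    return len(approvers) >= min_approvers and len(channels) >= min_channels

Notice the deepfake never has to be detected for this to work; it simply can't satisfy the second person on the second channel.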

 

Third: train your people on deepfake scenarios specifically. Generic phishing training isn't going to help a finance director who gets a Teams call from someone who looks and sounds exactly like the CFO. Run tabletop exercises that include this scenario. Walk through what the red flags are — unexpected urgency, unusual payment destinations, pressure to keep transactions confidential, proactive offers of verification from the requester.

 

Fourth: think carefully about your executives' public audio and video footprint. I'm not suggesting your leadership go dark — that's not realistic, and it has its own costs. But be intentional. Know what training material exists and factor that into your threat model.

 

Fifth: AI-detection tools. They exist, they're improving, and they're worth evaluating. But do not make them your primary control. Detection tools are currently losing the arms race to generation tools. They are a useful additional signal, not a safety net you can trust completely.
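If you do wire a detector into your workflow, the safe pattern is to let it raise risk but never lower it. A rough sketch, with the weights and signal names invented purely for illustration:

def risk_score(detector_fake_prob: float,
               unusual_destination: bool,
               urgency_flagged: bool,
               out_of_band_verified: bool) -> float:
    """Combine signals into a 0..1 risk score. The detector can only add
    risk; a clean detector read never substitutes for verification."""
    score = 0.4 * detector_fake_prob             # advisory signal only
    score += 0.3 if unusual_destination else 0.0
    score += 0.3 if urgency_flagged else 0.0
    if not out_of_band_verified:
        score = max(score, 0.9)                  # the procedural control dominates
    return min(score, 1.0)

# A "clean" detector read still blocks an unverified urgent request.
assert risk_score(0.05, False, True, out_of_band_verified=False) >= 0.9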

 

Sixth — and this is the longer arc — watch the content provenance space. The Coalition for Content Provenance and Authenticity, C2PA, is developing cryptographic signing standards for media. When this matures and is adopted at platform scale, it will shift the burden of proof onto synthetic content. We're not fully there yet. But it's coming, and it matters.
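Conceptually, provenance checking reduces to verifying a signature over a manifest that is cryptographically bound to the media bytes. Here's a toy sketch of that idea using Ed25519 from Python's 'cryptography' package. To be clear: this is the underlying cryptographic pattern, not the actual C2PA data model or API.

import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def media_matches_manifest(media_bytes: bytes, manifest: dict) -> bool:
    """The manifest pins the exact bytes it vouches for."""
    return hashlib.sha256(media_bytes).hexdigest() == manifest["sha256"]

def provenance_ok(media_bytes: bytes, manifest: dict,
                  signature: bytes, signer_key: Ed25519PublicKey) -> bool:
    """Unsigned or tampered media isn't proven fake; it is simply
    unproven, and that is where the burden of proof shifts."""
    if not media_matches_manifest(media_bytes, manifest):
        return False
    try:
        signer_key.verify(signature, manifest["sha256"].encode())
        return True
    except InvalidSignature:
        return False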

 

Here's the frame I'd leave with every executive and every security team: CEO fraud via deepfake is now targeting at least 400 companies per day. The question isn't whether your executives will be impersonated. It's whether your organization has the procedures in place that make that impersonation fail before anyone moves money.

 

[PAUSE]

 

[ OUTRO MUSIC — FADE IN SOFTLY ]

 

OUTRO

 

Let's bring this home.

 

Takeaway one: voice synthesis has crossed the indistinguishable threshold. A few seconds of audio is all an attacker needs. The audio your executives produce publicly is training data.

 

Takeaway two: multi-participant deepfake video calls are operational — not theoretical. Arup proved it. Singapore confirmed it. Your finance team needs to know this is a real scenario they could face.

 

Takeaway three: attackers are adapting to your awareness. Proactively offering verification is now part of the playbook. Your controls have to go deeper than a video call confirmation.

 

Takeaway four: process beats perception. Your people cannot reliably detect high-quality deepfakes with their eyes and ears. Policy, multi-step approval, and out-of-band verification are your actual defenses.

 

The threat is real, it's growing, and it's sophisticated. But so are the people defending against it — and that includes you.

 

Thank you for stopping by the Café. If you enjoyed this episode, stop by next week, or catch up on some of our older episodes to stay informed.

 

As always — stay sharp, stay skeptical, and remember: knowledge is power. I'll catch you next time at the café.

 

[ OUTRO MUSIC — SWELL AND FADE ]