DX Today | No-Hype Podcast & News About AI & DX
The DX Today Podcast: Real Insights About AI and Digital Transformation
Tired of AI hype and transformation snake oil? This isn't another sales pitch disguised as expertise. Join a 30+ year tech veteran and Chief AI Officer who's built $1.2 billion in real solutions—and has the battle scars to prove it.
No vendor agenda. No sponsored content. Just unfiltered insights about what actually works in AI and digital transformation, what spectacularly fails, and why most "expert" advice misses the mark.
If you're looking for honest perspectives from someone who's been in the trenches since before "digital transformation" was a buzzword, you've found your show. Real problems, real solutions, real talk.
For executives, practitioners, and anyone who wants the truth about technology without the sales pitch.
Google Catches the First AI-Built Zero Day: Inside the GTIG Report on a Two-Factor Authentication Bypass, the LLM Fingerprints, and the Mass Exploitation Event That Almost Was - May 12, 2026
SPEAKER_01Welcome to the DX Today Podcast, your daily deep dive into the AI ecosystem. I'm Chris, and joining me as always is Laura.
SPEAKER_00Hey Chris, and hello to everybody listening on this Tuesday. Today we are talking about a story that I think people are going to look back on as a real before and after line in the sand.
SPEAKER_01That is a pretty bold claim to open with, so set the table for us. What exactly happened and why does it qualify as the moment everything pivoted in your view?
SPEAKER_00So on May 11th, 2026, Google Threat Intelligence Group, which the security crowd just calls GTIG, dropped a report that for the first time publicly documents an AI-built zero day actually being deployed in the wild.
SPEAKER_01Okay, let me slow you down for the people who do not live and breathe security jargon. A zero day basically means a flaw that the vendor has zero days to fix because it is already known to attackers, right?
SPEAKER_00Exactly right. And that framing matters because zero days are the crown jewels of offensive security. They are rare, they are expensive, and historically you needed top-tier human researchers and weeks of grinding to find and weaponize a really good one.
SPEAKER_01So, what makes this one different in a way that actually matters technically beyond the marketing buzz of being able to slap the words AI generated onto a press release headline?
SPEAKER_00GTIG specifically caught a cybercrime crew that was using an AI model to build a two-factor authentication bypass for a very popular open source web-based system administration tool, which they planned to use in a mass exploitation event.
SPEAKER_01Mass exploitation event is one of those phrases that sounds clinical, but in practice it means thousands of organizations would have woken up to find their admin panels compromised on the same morning. Is that the right mental picture?
SPEAKER_00That is exactly the mental picture. And Google has not yet publicly named the tool because they wanted to give the vendor time to patch it and disrupt the campaign before it could actually be launched against customers in production.
SPEAKER_01That is a smart disclosure dance. But it also means we, the listeners, do not get to know whether this is a tool we are personally running. How worried should the average sysadmin really be right now?
SPEAKER_00I would not panic, but I would absolutely take this as a forcing function to audit your perimeter and your admin tooling, because the broader signal here is that the cost of producing a working zero day just collapsed.
SPEAKER_01Let me push back on that for a second, though. Google says they have high confidence an AI model was involved, but they did not catch the attacker red-handed prompting a chatbot, so how do they actually know?
SPEAKER_00The fingerprints are honestly delightful if you are a code reviewer. The Python script that did the exploit was riddled with educational docstrings, included a hallucinated Common Vulnerability Scoring System rating, and used what GTIG called a textbook Pythonic format.
SPEAKER_01I love the detail that the model essentially gave itself a fake CVSS score, because that is something a human exploit developer would never bother to do. That is pure model behavior leaking through.
SPEAKER_00That is the giveaway. Plus, there were detailed help menus, a clean ANSI color class for terminal output, and the kind of overly thorough comments and structure you only see when a large language model is trying to be helpful.
SPEAKER_01So basically, the exploit looked like a really polite, well-documented homework assignment, which is exactly the opposite of how a battle-hardened human exploit developer would actually write something for criminal operational use.
SPEAKER_00Precisely, and that is the tell. Real underground exploit code is usually minimal, ugly, and stripped of anything that could be used as a fingerprint. AI-authored code has the opposite problem, it leaves a paper trail.
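For listeners who want to picture what those tells look like on the page, here is a minimal, harmless sketch of that assistant-style scaffolding. Every name in it is invented, it performs no exploitation, and it is not the code from the report; it only mimics the fingerprints described above: the tidy docstrings, the hallucinated CVSS line, the ANSI color class, and the polite help menu.

```python
"""Illustrative only: a harmless scaffold that mimics the stylistic tells of
LLM-authored tooling described in the GTIG report. All names are invented and
nothing here touches a network or exploits anything."""
import argparse

# An LLM will happily invent a severity rating nobody asked for:
# CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H (Base Score: 9.8)  <-- hallucinated


class Colors:
    """Tidy ANSI color constants for terminal output, classic assistant style."""
    GREEN = "\033[92m"
    RED = "\033[91m"
    RESET = "\033[0m"


def check_target(url: str) -> None:
    """'Check' a target by printing a friendly banner; real underground tooling
    is rarely this chatty or this thoroughly documented."""
    print(f"{Colors.GREEN}[+] Target set to {url}{Colors.RESET}")


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Educational demo of LLM code style; it does nothing harmful."
    )
    parser.add_argument("--url", required=True, help="Target URL to 'check'.")
    args = parser.parse_args()
    check_target(args.url)


if __name__ == "__main__":
    main()
```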
SPEAKER_01Now here's the part that fascinates me. Google was very careful to say there is no evidence that Gemini, their own model, was actually the one used. Why do you think they made that distinction so loudly?
SPEAKER_00Because they have to. If Gemini had been weaponized, this would be a totally different and much more embarrassing story for Google, and they would be answering questions about safety filters and abuse monitoring all week long.
SPEAKER_01Got it. So the responsible reading is that this was likely some other model, possibly an open source one that an attacker could run locally with no abuse monitoring at all, which is honestly even more frightening in some ways.
SPEAKER_00That is the dark interpretation, and a lot of security researchers are leaning toward it. If you can spin up a capable coding model on a single GPU and aim it at vulnerability research, your guardrails are whatever you choose them to be.
SPEAKER_01Let's talk about the actual flaw though, because I think this is where the story gets even more interesting. What kind of bug did the model actually find inside this open source administration tool?
SPEAKER_00GTIG described it as a high-level semantic logic flaw that came from a hard-coded trust assumption inside the code base, which is exactly the class of bug that large language models are surprisingly excellent at noticing.
SPEAKER_01Walk me through that, because I think most people assume AI is good at memorization and pattern matching, not at the abstract reasoning you need to spot a trust assumption buried inside thousands of lines of code.
SPEAKER_00This is the subtle part. Logic bugs are not about a typo or a missing bounds check. They are about a developer assuming something is always true, when in a corner case it is not, and a language model can reason about intent surprisingly well.
SPEAKER_01So you are saying the model's essentially reading the code the way a senior reviewer would, asking what the developer must have been thinking, and then poking at whether that assumption holds up under adversarial conditions.
SPEAKER_00That is the perfect description, and it is also what makes this dangerous. Traditional fuzzers throw random inputs at programs hoping to crash them, but a model reading for intent can find a whole class of bugs that fuzzers miss entirely.
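To make the "hard-coded trust assumption" idea concrete, here is a hypothetical Python sketch of that class of logic flaw. It is not the vulnerability in the unnamed admin tool, just an illustration of a bug that never crashes, and therefore never shows up under a fuzzer, yet falls apart the moment someone asks whether the assumption actually holds.

```python
# Hypothetical illustration of a hard-coded trust assumption, NOT the actual
# flaw in the unnamed admin tool. The developer assumes that requests arriving
# from 127.0.0.1 can only be the internal monitoring agent, so the two-factor
# step is skipped for them. Behind a reverse proxy, every request can appear
# to come from localhost, and the 2FA requirement silently evaporates.

def is_authenticated(session: dict, remote_addr: str, otp_valid: bool) -> bool:
    # Trust assumption: "only our own health checks ever call from localhost."
    if remote_addr == "127.0.0.1":
        return session.get("user") is not None

    # Intended path: a logged-in session plus a valid one-time code.
    return session.get("user") is not None and otp_valid

# Nothing here ever crashes, so a fuzzer has nothing to report. A reviewer (or
# a model) reading for intent instead asks "when is remote_addr attacker-
# influenced?" and notices that the proxy deployment breaks the assumption.
```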
SPEAKER_01Which means defenders cannot just buy a slightly better fuzzer and call it a day. The attackers now have a tool that operates at a different layer of abstraction, and our existing scanners may simply not see what the model sees.
SPEAKER_00Exactly. And the GTIG report makes the point that this is not theoretical anymore. They list an entire taxonomy of AI augmented threat activity that they are already tracking across multiple state and criminal actors.
SPEAKER_01Let's get into that taxonomy because I think the zero-day exploit gets the headline. But the supporting cast in this report is honestly the part that should keep security leaders up at night for the rest of the quarter.
SPEAKER_00Okay, so the headline malware family is PromptSpy, an Android backdoor that does something genuinely creepy. It abuses the Gemini API at runtime to analyze what is on the victim's screen in real time.
SPEAKER_01That is a wild design choice. Instead of hard coding the malware's behavior in advance, the attackers are essentially letting an AI watch the phone screen and decide what to do next based on what it sees.
SPEAKER_00Right, and PromptSpy goes further than that. It captures biometric replay data so it can defeat lock screens, and it has an app protection detector module that finds the uninstall button on the screen and overlays an invisible block on top of it.
SPEAKER_01So when the worried user finally figures out something is wrong and tries to delete the app, their tap just falls into the void because the malware is literally squatting on the pixel coordinates where the uninstall button lives.
SPEAKER_00Yes, and the kicker is the malware is built to rotate its own infrastructure dynamically through its command and control channel, including the Gemini API keys without ever needing to redeploy a new version of the payload.
SPEAKER_01That is operationally a different kind of resilience than we usually see in mobile malware. They essentially built it on the assumption that defenders will burn pieces of it, and the architecture just shrugs and rotates around the damage.
SPEAKER_00And PromptSpy is just one of several. Earlier reports from GTIG already documented PromptFlux and PromptSeal, two families that query large language models during execution to rewrite their own code or generate fresh commands on the fly.
SPEAKER_01That means the old Defender playbook of taking a malware sample, reverse engineering it, and writing a signature is becoming much less useful because the next version of the malware might literally be different code generated minutes later.
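A trivial way to see why byte-level signatures age badly against regenerated code: two snippets that do exactly the same thing but were written out differently hash to completely different values, so a signature built from the first variant says nothing about the second. A quick sketch:

```python
# Why byte-level signatures struggle with regenerated code: these two snippets
# behave identically, yet their hashes share nothing, so a signature written
# for variant_a never matches variant_b.
import hashlib

variant_a = b"def ping(host):\n    return 'ping ' + host\n"
variant_b = b"def ping(target_host):\n    # same behavior, different text\n    return f'ping {target_host}'\n"

print(hashlib.sha256(variant_a).hexdigest())
print(hashlib.sha256(variant_b).hexdigest())
```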
SPEAKER_00Exactly. The report also names specific threat actors, including suspected China-linked groups under the tracking codes UNC2814, UNC5673, and UNC6201, all of them experimenting with AI in different ways.
SPEAKER_01And it is not just China. Google also flagged the North Korean group APT45 running thousands of repetitive prompts to analyze different CVEs and validate proof-of-concept exploits at a scale no human team could maintain.
SPEAKER_00Right, plus Russia-linked groups deploying AI-enabled malware that Google calls Canfail and Longstream, both of which use language-model-generated decoy code to hide their real malicious functionality inside what looks like ordinary application logic.
SPEAKER_01So we have got nation-state actors on three continents, plus financially motivated cybercrime crews, all converging on the same realization at roughly the same time. That is not a coincidence. That is a phase transition in the threat landscape.
SPEAKER_00That is exactly how John Hultquist, who runs analysis at GTIG, framed it in the report. He called it the moment the cybersecurity community has been warning about for years, the moment attackers actually arm themselves with AI at scale.
SPEAKER_01There's also a really interesting bit in the report about a kind of gray market for AI access. Tell me about the shadow API relays, because I think this is the part most listeners will not have heard about.
SPEAKER_00So this is fascinating and a little dystopian. Researchers found a whole ecosystem of relay services, mostly hosted outside mainland China, that let developers inside China illegally access Anthropic Claude and Gemini without the official regional restrictions.
SPEAKER_01I assume those services advertise themselves as faithful proxies to the real models. But you mentioned there's a twist where the relays may not actually deliver the model you think you are paying premium dollars for.
SPEAKER_00Yes, and this is where the CISPA Helmholtz Center for Information Security ran a study in March of 2026. They identified 17 shadow APIs and benchmarked them against the real Gemini.
SPEAKER_01Give me the numbers, because I think when people hear model substitution, they imagine the difference between vanilla and French vanilla. The real performance gap apparently is more like the difference between an expert and an enthusiastic amateur.
SPEAKER_00On the medical question-answering benchmark called MedQA, the official Gemini 2.5 Flash scored about 83.8% accuracy, while the shadow APIs dropped all the way down to roughly 37%.
SPEAKER_01That is nearly a 47-point gap, which means anybody building a real product on top of those shadow APIs is shipping something that is dangerously worse than what they think they are shipping to their customers.
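For anyone who wants to run that kind of comparison against their own provider, the method is simple enough to sketch. The endpoint URLs, request shape, and questions.json file below are hypothetical placeholders rather than any real API; the point is only the procedure: send identical multiple-choice questions to both endpoints and compare accuracy.

```python
"""Sketch of a model-substitution check. The endpoint URLs, request shape, and
questions.json file are hypothetical placeholders, not a real provider API;
what matters is the method: ask both endpoints the same questions and compare
accuracy."""
import json
import requests  # assumes the third-party 'requests' package is installed

OFFICIAL_URL = "https://official-gateway.example/v1/answer"  # placeholder
RELAY_URL = "https://shadow-relay.example/v1/answer"         # placeholder


def ask(endpoint: str, question: str, choices: list[str]) -> str:
    """Send one question and return the single answer letter the service picks."""
    resp = requests.post(endpoint, json={"question": question, "choices": choices}, timeout=30)
    resp.raise_for_status()
    return resp.json()["answer"].strip().upper()[:1]


def accuracy(endpoint: str, items: list[dict]) -> float:
    """Fraction of MedQA-style items answered with the correct letter."""
    correct = sum(1 for it in items if ask(endpoint, it["question"], it["choices"]) == it["answer"])
    return correct / len(items)


if __name__ == "__main__":
    with open("questions.json") as fh:  # items: {"question", "choices", "answer"}
        items = json.load(fh)
    print("official:", accuracy(OFFICIAL_URL, items))
    print("relay:   ", accuracy(RELAY_URL, items))
```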
SPEAKER_00And the operators of those relays see every single prompt and every single response, which means there is a gold mine of data flowing through them that can be used for fine-tuning copycat models or for outright corporate espionage.
SPEAKER_01So the same infrastructure that lets attackers anonymize their access to capable models also conveniently exfiltrates a stream of prompts from companies and individuals who probably have no idea what they have actually agreed to use.
SPEAKER_00Exactly. And that is before you get to threat actors like UNC6201, who Google says use automated scripts to register and immediately cancel premium accounts at scale, essentially abusing free trials to subsidize their malicious operations.
SPEAKER_01Okay, so let's land the plane. If you are a defender, a CISO, or honestly just a curious technologist listening to this, what is the practical takeaway you should walk away with from today's conversation?
SPEAKER_00The first one is that the discovery to weaponization to exploitation timeline has already compressed. Ryan Dewhurst at watchTowr put it bluntly: defenders do not get to opt out, and the timelines are not collapsing in the future. They collapsed.
SPEAKER_01That is the line that stuck with me too, because there's a tendency in our industry to talk about AI risk as a coming wave, when the honest read of this report is that the wave has already broken on the shore.
SPEAKER_00The second takeaway is that defenders need to assume an attacker can produce a credible zero-day prototype against any sufficiently complex open source dependency on a budget that fits in a single weekend of compute time.
SPEAKER_01Which means software supply chain hygiene, dependency review, and proactive code audits have to shift left in a much more aggressive way than most engineering organizations have been willing to fund up to now.
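One small, concrete way to start on that dependency review, offered as a sketch rather than as a recommendation of any specific pipeline: check exact Python version pins against the public OSV.dev vulnerability database. The script below assumes a requirements.txt of strict name==version pins; a production pipeline would use a dedicated scanner, but the query follows OSV's documented API shape.

```python
"""One way to start a dependency review: check exact Python version pins
against the public OSV.dev vulnerability database. A minimal sketch that
assumes a requirements.txt of strict name==version pins."""
import requests  # assumes the third-party 'requests' package is installed

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def known_vulns(name: str, version: str) -> list[str]:
    """Return OSV advisory IDs that affect this exact PyPI package version."""
    payload = {"package": {"name": name, "ecosystem": "PyPI"}, "version": version}
    resp = requests.post(OSV_QUERY_URL, json=payload, timeout=30)
    resp.raise_for_status()
    return [v["id"] for v in resp.json().get("vulns", [])]


if __name__ == "__main__":
    with open("requirements.txt") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "==" not in line:
                continue  # skip comments and anything not pinned exactly
            name, version = line.split("==", 1)
            hits = known_vulns(name, version)
            if hits:
                print(f"{name}=={version}: {', '.join(hits)}")
```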
SPEAKER_00And the third takeaway is that this is not the moment for defenders to give up. It is the moment to fight back with the same tools, because the only thing that catches AI-augmented attacks is AI-augmented defense.
SPEAKER_01Which is exactly what GTIG is doing in this report. They are showing their work, they are naming the techniques, and they are signaling to the rest of the industry that the response has to be coordinated.
SPEAKER_00That is a great place to land. If you take away one thing today, let it be this: assume the attacker on the other side of your firewall now has a tireless, sleepless intern with a PhD in code review.
SPEAKER_01That is all for today's episode of the DX Today podcast. Thanks for listening, and we'll see you next time.