Behind the Binary by Google Cloud Security

EP22 Jailbreaking, Prompt Injection, and the "Agentic" Flaw in MCP with Kevin Harris

Josh Stroschein | Season 3, Episode 2

"Skilled adversaries have a 100% success rate against all of the defenses that we know about."

In this episode, Kevin Harris defends that claim. We move past the standard "AI Safety" talking points to distinguish between the two attack vectors confusing the industry: Prompt Injection (an application-layer failure) and Jailbreaking ("gaslighting" the model via context shifting).

Kevin argues that we haven't actually invented AI yet; we've just built a mirror that reflects our own intelligence (and psychosis) back at us. We also dissect the new Model Context Protocol (MCP) and why giving "discretion" to agents that cannot think risks repeating the security mistakes of Web 2.0.

THE SESSION:

  • The "Pirate" Jailbreak: Why telling a model to be a pirate isn't just a party trick—it's a method of shifting the context window to bypass refusal patterns.
  • The 100% Failure Rate: Why current defenses are only speed bumps for skilled adversaries, and why you are attacking the application, not the model.
  • "There Is No AI": Kevin’s theory on why LLMs are just "predictive text made 3 orders of magnitude better" and the danger of "AI-induced psychosis".
  • The Agentic Threat (MCP): A deep dive into the Model Context Protocol. Why client-side authorization is the new "Browser Security" battleground, and why we are handing "table saws" to users who don't know how to use them.
  • The Fix: Why "Attention Functions" are the key to understanding (and securing) the future of these models.

JOIN THE COMMUNITY:

  • Research Hub: Threat research, training events and news:
    https://cloud.google.com/security/flare
  • The FLARE Insider: Get community updates and announcements. To subscribe, email flare-external@google.com

FOLLOW THE SHOW: