AI Security Weekly

"AI Agents: The Security Paradox - When Your Best Defense Becomes Your Biggest Threat

Mike Housch

AI agents are revolutionizing cybersecurity in contradictory ways. This episode explores how the same AI technology that enables companies like Picus Security to validate defenses against new threats in hours instead of weeks can also autonomously exploit vulnerabilities for profit. We examine why enterprises are hesitant to deploy AI agents at scale due to identity management challenges, the escalating war between publishers and AI scrapers (with blocking up 336% in a year), practical strategies for identifying truth when AI systems can be manipulated by their owners, and Anthropic's research showing AI can now find and exploit zero-day vulnerabilities in smart contracts autonomously. The bottom line: AI capabilities are advancing faster than our governance frameworks, creating both unprecedented defensive capabilities and entirely new attack vectors that security teams must navigate.

Welcome to AI Security Weekly. I'm your host, Mike Housch, and this week we're diving deep into what I'm calling the great AI security paradox. On one hand, we're seeing AI agents being deployed to defend against threats faster than ever before. On the other hand, these same AI capabilities are creating entirely new attack vectors that security teams have never had to deal with.


Today's episode covers five interconnected stories that paint a complete picture of where we are right now with AI security. We'll explore how companies like Picus are using agentic AI to turn threat headlines into defense strategies within hours. We'll look at why enterprises are terrified to deploy AI agents at scale, and what Okta is doing about it. We'll examine the brewing war between publishers and AI companies over content scraping. We'll discuss practical strategies for identifying truth in an era of bots and manipulation. And we'll wrap up with some genuinely shocking research from Anthropic showing that AI agents can autonomously hack smart contracts for profit.

It's a lot to cover, so let's get started.

Let me start with what might actually be good news for a change. There's a new approach to threat emulation that's solving a problem security teams have struggled with for years.

Here's the situation. When a new threat hits the headlines—let's say a sophisticated ransomware campaign or a novel zero-day exploit—security teams immediately face pressure to answer one critical question: "Are we exposed to this?"

Traditionally, you had two options, and both were terrible. Option one: wait for your security vendor to analyze the threat, create detection signatures, and push updates. That could take days or weeks. Option two: manually reverse-engineer the attack yourself, which requires specialized expertise and, again, takes time you don't have.

That gap between threat disclosure and defensive readiness? That's where breaches happen.

So naturally, when large language models became available, security teams started asking: "Can we use AI to generate attack scripts based on threat reports, so we can test our defenses immediately?"

And the answer is... yes, but you really shouldn't.

Here's why. "Can you trust a payload built by an AI engine?" The answer is no, for two big reasons.

First, you're potentially creating dangerous binaries. An AI system generating actual attack code could produce malware that escapes your test environment or causes unintended damage. That's not theoretical—it's a real risk.

Second, and this is fascinating, AI hallucinations don't just affect chatbot responses. They can invent TTPs—tactics, techniques, and procedures—that the threat group doesn't actually use. So you end up testing your defenses against fictional threats while potentially missing the real ones.

So what's the solution? This is where agentic AI comes in, and it's a fundamentally different approach.

Instead of generating new code from scratch, Picus uses AI as an orchestrator of trusted, validated components. Think of it like the difference between asking AI to write a novel versus asking it to select the best chapters from a curated library and arrange them in the right order.

Picus has built what they call the Picus Threat Library—twelve years of refined threat research with pre-validated, safe testing modules. When a new threat emerges, the AI system maps that threat to existing, trusted components rather than creating new payloads.

And they're using a multi-agent framework to do this. Let me break down how it works.

They have four specialized agents. First, there's the Planner Agent, which orchestrates the overall workflow. Second, the Researcher Agent gathers intelligence from multiple sources about the threat. Third, the Threat Builder Agent assembles attack chains using safe, pre-validated actions from the library. And fourth, critically, there's a Validation Agent that prevents hallucinations and errors.

Let me walk you through a real example they provided, using the FIN8 threat group.

Step one: Intelligence gathering. The Researcher Agent aggregates information from multiple sources—threat reports, security bulletins, MITRE ATT&CK data—into a comprehensive report about FIN8's latest campaign.

Step two: Behavior analysis. The campaign narrative gets deconstructed into technical components and TTP flows. What specific actions did the attackers take? What techniques did they use?

Step three: Safe mapping. Here's the clever part. The Threat Builder Agent queries a knowledge graph, mapping adversary behaviors to benign Picus modules that test for the same weaknesses without causing actual harm. It's like a fire drill versus an actual fire—you're testing the response without creating real danger.

Step four: Sequencing and validation. The actions are arranged into realistic attack chains, and the Validation Agent checks everything for errors or hallucinations.

The result? A ready-to-run simulation profile containing all relevant MITRE tactics and safe testing actions, generated in hours rather than weeks.
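To make that pattern concrete, here's a minimal, purely illustrative Python sketch of the orchestration idea. This is not Picus code: the module names and technique mappings are invented, and the Planner Agent's role is reduced to a simple function chain. The point is the constraint itself: the AI only selects from pre-validated modules, and a validation step rejects anything it can't ground in the library.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a library of pre-validated, harmless test modules,
# keyed by the MITRE ATT&CK technique each one exercises.
SAFE_MODULE_LIBRARY = {
    "T1059.001": "safe_powershell_execution_check",
    "T1003.001": "safe_lsass_access_check",
    "T1071.001": "safe_https_c2_beacon_check",
}

@dataclass
class SimulationStep:
    technique_id: str
    module: str

def research_agent(threat_report: str) -> list[str]:
    """Stub: a real researcher agent would extract ATT&CK technique IDs from
    threat intel sources; here we just pull them out of the report text."""
    return [tid for tid in SAFE_MODULE_LIBRARY if tid in threat_report]

def threat_builder_agent(techniques: list[str]) -> list[SimulationStep]:
    """Map reported behaviors onto existing safe modules -- never generate new payloads."""
    return [SimulationStep(t, SAFE_MODULE_LIBRARY[t]) for t in techniques if t in SAFE_MODULE_LIBRARY]

def validation_agent(steps: list[SimulationStep]) -> list[SimulationStep]:
    """Anti-hallucination gate: drop any step not backed by a known, validated module."""
    return [s for s in steps if s.module in SAFE_MODULE_LIBRARY.values()]

report = "FIN8 campaign used T1059.001 for execution and T1071.001 for C2."
profile = validation_agent(threat_builder_agent(research_agent(report)))
for step in profile:
    print(f"{step.technique_id} -> {step.module}")
```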

And they're taking this even further with something called Numi AI—a conversational interface for security validation. Instead of navigating complex dashboards, a security engineer can just say something like "I don't want any configuration threats in my environment," and the system continuously monitors for those specific risks.

This is what they call "context-driven security validation." Instead of just looking at CVE scores and saying "this is critical, patch it," you can prioritize based on actual exploitability in your specific environment.

So that's the defensive side of the AI equation. AI agents, when properly constrained and architected, can dramatically accelerate threat response.


But—and this is a big but—deploying AI agents in enterprise environments creates a whole new set of security challenges.

Which brings us to our second story: why CISOs are terrified of AI agents, and what identity management vendors are doing about it.


Okta launched something last month called Auth0 for AI Agents, and it's addressing what Shiv Ramji, Okta's president of Auth0, calls the fundamental question keeping enterprises from deploying agents at scale: "Are these systems ready?"

And right now, for most organizations, the answer is "we don't know, and we're not willing to find out the hard way."

Here's the core problem. Forrester analyst Andras Cser identified it perfectly: "The bottleneck has been largely authorization management and scalability of deployment."

Let me explain what makes AI agents so different from traditional applications or even traditional automation.

When you deploy a regular application, it follows deterministic logic. User clicks button A, application performs action B, every single time. You can map out exactly what permissions that application needs because you know exactly what it will do.

AI agents? They're nondeterministic. They can take completely different paths to accomplish the same task. Today, an agent might query three databases and call two APIs to answer a question. Tomorrow, it might access seven different systems to answer the exact same question, depending on what it determines is the most efficient path.

From an identity and access management perspective, this is a nightmare. You can't just give the agent access to everything—that violates every principle of least privilege. But you also can't predict exactly which resources it will need, because it makes those decisions dynamically.

As Forrester's research puts it, AI agents represent "a new type of identity that is neither fully machine nor human."

Traditional IAM assumes you're managing either human users or service accounts with predictable behavior. Agents don't fit either category.

So what's the solution? Okta spent a year working with users and developers to build Auth0 for AI Agents, and they've focused on three core capabilities.

First: comprehensive logging. Every single action an agent takes gets recorded and fed into the security systems you already use. No black boxes. If an agent accessed a customer database, you know when, why, and what it retrieved.

Second: a token vault. This securely connects agents to applications without requiring developers to manage the underlying infrastructure directly. It's abstracting away the complexity of credential management while maintaining security.

Third, and this is critical: least privilege enforcement. Agents receive permissions no greater than the human users they assist.

Ramji emphasized this point: an agent should "never have greater permission than the person it is working for." If you're a junior analyst, your agent shouldn't suddenly have admin access to systems you can't access yourself.
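Here's a minimal sketch of what that constraint can look like in practice. To be clear, this is not Okta's or Auth0's API; it's a generic illustration of two ideas from above: an agent's effective permissions are the intersection of what it requests and what its delegating user holds, and every authorization decision gets logged.

```python
import json
from datetime import datetime, timezone

def effective_scopes(agent_requested: set[str], user_scopes: set[str]) -> set[str]:
    """An agent acting on a user's behalf never gets more than that user has."""
    return agent_requested & user_scopes

def log_grant(agent_id: str, user_id: str, granted: set[str], denied: set[str]) -> None:
    """Emit a structured audit record for every authorization decision."""
    print(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent_id,
        "on_behalf_of": user_id,
        "granted": sorted(granted),
        "denied": sorted(denied),
    }))

# Hypothetical example: a junior analyst's agent asks for more than the analyst holds.
user_scopes = {"tickets:read", "customers:read"}
requested = {"tickets:read", "customers:read", "customers:write", "db:admin"}

granted = effective_scopes(requested, user_scopes)
log_grant("agent-7f3", "analyst-junior-01", granted, requested - granted)
```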


But Forrester is pushing for something even more fundamental. Their November 2025 research report recommends that organizations treat AI agents as a distinct identity type requiring specialized governance frameworks.

They're recommending that companies:

- Assign agents only the minimal necessary autonomy

- Implement continuous risk management oversight

- Deploy unified IAM architectures that serve all agent types

- Leverage the Model Context Protocol as a communication standard for agents

And this isn't just Okta. Identity vendors including 1Kosmos, Microsoft, Ping Identity—they're all positioning themselves as critical players in agent attestation and provider registry management.

Because here's the thing: enterprises won't adopt AI agents at scale until this identity and authorization problem is solved. The potential benefits are huge, but the risks of agents with excessive permissions or inadequate oversight are existential.

You can't have an AI agent with access to your production database making autonomous decisions without robust identity management. That's not cautious—that's just common sense.

Now let's talk about the data that trains these AI systems in the first place, because there's a war brewing between AI companies and content publishers.

The numbers are staggering. About 5.6 million websites now block OpenAI's GPTBot through their robots.txt files. That's a 70% increase from 3.3 million sites back in July 2025. We're talking about a few months here.

Anthropic's ClaudeBot? Blocked by roughly 5.8 million sites, up from 3.2 million. Same story with Apple's Applebot—5.8 million sites blocking it.

And here's a wild one: Googlebot, the crawler that indexes the web for Google Search, is now blocked by 18 million websites. These publishers are so concerned about AI training that they're willing to sacrifice search engine visibility to prevent it.

According to Tollbit, a company that helps publishers monetize crawler access, there's been a 336% increase in sites blocking AI crawlers over the past year. More than tripling in twelve months.

So what changed? Why the sudden surge?

I think publishers finally did the math. Their content—news articles, blog posts, analysis, research—is being used to train AI models that directly compete with them.

Think about it from a publisher's perspective. You invest in journalists, editors, fact-checkers, and infrastructure to create quality content. An AI company scrapes that content to train their model. Now when someone wants to know today's news, instead of visiting your site and seeing your ads or paying for your subscription, they just ask ChatGPT or Claude.

The AI model, trained on your content, provides a summary. The user gets their answer. You get nothing.


That's not a sustainable model for publishers. So they're fighting back with robots.txt files that say "do not scrape this site for AI training."

Now, here's the problem: compliance with robots.txt is completely voluntary. It's not legally binding. It's more like a "please don't" than a "you can't."
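For anyone who hasn't looked at one lately, here's roughly what those directives look like and how a well-behaved crawler would check them, using only Python's standard library. The directives here are illustrative, not any particular publisher's file; the key point is that the check is advisory, and nothing stops a client from skipping it or ignoring the answer.

```python
from urllib import robotparser

# Illustrative robots.txt directives a publisher might use to opt out of AI crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
    print(agent, "may fetch:", rp.can_fetch(agent, "https://example.com/article"))

# Nothing here is enforced by the server: a crawler that never runs this check,
# or ignores its result, can still request and read the page.
```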

But it does matter legally. Reddit's lawsuit against Anthropic and various publisher lawsuits against Perplexity have cited scraping violations as evidence in copyright disputes. If you explicitly tell an AI company "don't scrape my content" and they do it anyway, repeatedly, that strengthens the legal case against them.

And the compliance problem is getting worse. According to Tollbit's Q2 2025 report, 13.26% of requests from AI bots ignored robots.txt directives. That's up from just 3.3% in Q4 2024.

So in six months, the rate of ignoring these blocking requests quadrupled. That's not a good trend if you're a publisher trying to protect your content.

But it gets even more complicated. New AI browsers are emerging that are designed to mimic human traffic patterns. They're "indistinguishable from humans in site logs," which means publishers can't block them without potentially blocking real users.

It's an arms race. Publishers implement blocking. AI companies find workarounds. Publishers try more sophisticated detection. AI companies make their bots look more human.
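To see why the detection side of that arms race is so brittle, here's a toy sketch of what blocking by user agent actually amounts to. The log entries are made up, but the logic is the point: it only catches bots that announce themselves, and an AI browser presenting a mainstream browser string lands in the same bucket as your real readers.

```python
# Toy example: tally requests by declared crawler in a web server access log.
from collections import Counter

AI_CRAWLER_SIGNATURES = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot")

# Illustrative User-Agent strings, not verbatim log data.
sample_log_user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
]

counts = Counter()
for ua in sample_log_user_agents:
    hit = next((bot for bot in AI_CRAWLER_SIGNATURES if bot in ua), None)
    counts[hit or "human/unknown"] += 1

print(dict(counts))  # anything mimicking a browser is indistinguishable from a human here
```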

Some companies are trying to find middle ground. Cloudflare launched something called "Pay per crawl," which lets publishers monetize crawler access through paid agreements while maintaining the option to block crawlers entirely.

But the fundamental tension remains: AI companies need data to train their models, and publishers control a lot of valuable data that they're increasingly unwilling to give away for free.

This is heading toward regulatory intervention. You can't have an entire industry built on scraping content while increasingly ignoring the wishes of content creators. Something's going to break.

And this brings us to a bigger, more existential question: in an era where AI systems are scraping everything, generating content, and being manipulated by their owners, how do we even know what's true anymore?

Steven J. Vaughan-Nichols wrote a piece about this that I think everyone should read. His thesis is stark: "Nothing on the internet can be trusted by default anymore."

That's bleak, but he backs it up with concrete examples.

Let's start with bot networks. When X introduced location data for accounts, it exposed something fascinating and disturbing. Many accounts claiming to be Americans supporting Trump were actually operating from foreign locations. Coordinated propaganda campaigns masquerading as grassroots movements.

Classic bot farms, but the scale and sophistication are new.

Then there's the AI bias problem, and I want to spend time on this because it's not just about training data bias—it's about deliberate manipulation.

Take Grok, Elon Musk's AI system. It was trained on false claims, including conspiracy theories like "Michelle Obama is a man" and anti-vaccine misinformation claiming vaccines caused "millions of unexplained deaths."

But here's what's really troubling. When Grok gave answers Musk didn't like, he ordered the system retuned to reflect his political preferences.

So initially, when asked about threats to Western civilization, Grok would respond with "misinformation and disinformation." Reasonable answer. But after Musk's intervention, later versions started promoting "sub-replacement fertility rates" as the primary threat.

That's not bias in the training data—that's the owner of the AI system literally reprogramming it to push specific political viewpoints.

How do you trust any AI system when the people controlling it can just tweak it to say whatever they want?

Short answer: you can't.

Which is why Vaughan-Nichols provides these verification strategies, and I think they're genuinely useful. Let me go through them.

First: Recognize your own bias. Everyone has biases. There's no such thing as perfectly objective news consumption. But you can distinguish between objective facts—scientific constants, verifiable data—and interpretations shaped by perspective.

Second: Apply the initial skepticism test. If a claim sounds too good to be true or too crazy to be true, investigate it. Ask yourself: "Who benefits if I believe or share this?" That simple question reveals manipulation intent.

Third: Check for reputable coverage. If only fringe websites are covering something "huge," that's a red flag. Cross-reference with established outlets. The Ad Fontes Media Bias Chart can help you evaluate source reliability.

Fourth: Evaluate source expertise. A technology journalist might not be credible discussing medicine or cooking. Specialists in one field aren't automatically experts in others.

Fifth: Demand specificity. Precise details distinguish credible reporting from vague claims. Named individuals, specific dates, verifiable locations, links to primary sources—not just "experts say" or "sources tell us."

Sixth: Investigate temporal context. Many viral posts recycle old content. During 2025 protests, numerous images actually originated from 2020 demonstrations, strategically blurred to hide the deception.

Seventh: Cross-reference with fact-checkers. Services like PolitiFact, FactCheck.org, Snopes, and AP Fact Check do systematic verification. Use them.


Now, what about detecting AI-generated content?

Honestly? The tools don't work reliably. Grammarly, Copyleaks, GPTZero—they all analyze language patterns, but they produce unreliable results. They mistake human writing for AI output and vice versa.

For images, you can use reverse image search via Google Images, Google Lens, and TinEye to find where photos first appeared and whether they're being misrepresented.

Visual red flags include inconsistent lighting or shadows, unnatural skin textures, warped backgrounds, mismatched reflections, strange details like weird hands or distorted text.

For deepfake videos, watch for unnatural eye and mouth movements—odd blinking patterns, lip-sync misalignment, frozen expressions, glitches around hair and face edges.

But Vaughan-Nichols admits: "Last year, I could still spot most fake AI images, but I can't anymore. They're too good."

And that's the difficult reality. The technology is advancing faster than our ability to detect it.

Even traditionally trustworthy sources are becoming compromised. Vaughan-Nichols points out that U.S. CDC websites, once reliable, now promote anti-vaccine misinformation under current leadership.

His final guidance is simple but exhausting: Treat extraordinary claims as hypotheses requiring testing rather than facts ready for sharing.

In an environment saturated with bot-generated content and AI-generated material, critical evaluation is the only reliable defense against manipulation.

And it's exhausting. The cognitive load of having to verify everything is immense. But the alternative—becoming part of the misinformation ecosystem by sharing unverified content—is worse.

Alright, final story, and this one connects everything we've discussed. Anthropic just published research showing that AI agents can autonomously identify and exploit vulnerabilities in blockchain smart contracts for profit.

Let me say that again: AI systems can now find security flaws in code and exploit them for financial gain, autonomously.

Anthropic created something called SCONE-bench—Smart CONtracts Exploitation benchmark—to measure how effectively AI models can find and exploit flaws in smart contract code.

They tested their own models, Claude Opus 4.5 and Claude Sonnet 4.5, plus OpenAI's GPT-5.

The results are stunning. These models generated exploit code worth $4.6 million on contracts that were exploited after March 1, 2025. These are contracts that didn't exist when the models were trained. The AI isn't just memorizing known exploits—it's reasoning about vulnerabilities in new code it's never seen before.

They also tested against 2,849 recently deployed contracts with no disclosed vulnerabilities. The models identified two zero-day flaws worth $3,694 in simulated funds.


Zero-days. Vulnerabilities nobody knew about.

Now let's talk economics, because this is where it gets scary.

The average cost per agent run was $1.22. The cost to identify each vulnerable contract averaged $1,738. The average revenue per successful exploit was $1,847.

That's a net profit of $109 per successful exploitation.

Now, $109 doesn't sound like much, right? But testing all 2,849 candidate contracts with GPT-5 cost only $3,476 total. And if you're running this at scale, finding even a small percentage of vulnerable contracts becomes extremely profitable.
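To make the arithmetic explicit, here's the back-of-the-envelope math using only the figures just quoted (simulated funds, as reported for those benchmark runs; this is just arithmetic, not new data):

```python
# Reproduce the quoted economics from the per-run cost and the two zero-day finds.
cost_per_run = 1.22            # USD per agent run
contracts_tested = 2_849
zero_days_found = 2
total_exploit_value = 3_694.0  # USD in simulated funds across the two zero-days

total_cost = cost_per_run * contracts_tested               # ~$3,476 to scan everything
cost_per_find = total_cost / zero_days_found               # ~$1,738 per vulnerable contract
revenue_per_find = total_exploit_value / zero_days_found   # ~$1,847 per successful exploit
net_per_find = revenue_per_find - cost_per_find            # ~$109 net per exploit

print(f"total scanning cost: ${total_cost:,.0f}")
print(f"cost per vulnerable contract found: ${cost_per_find:,.0f}")
print(f"net profit per successful exploit: ${net_per_find:,.0f}")
```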

And it's getting more efficient. The cost of running these AI agents is dropping while their success rate is increasing.

We're looking at a future where automated AI systems could systematically scan blockchains for vulnerable smart contracts and exploit them 24/7, draining value from DeFi protocols.

So why would Anthropic publish this research? Aren't they giving attackers a roadmap?

Their argument is: "Profitable, real-world autonomous exploitation is technically feasible, a finding that underscores the need for proactive adoption of AI for defense."

In other words, if we're building AI agents that can exploit vulnerabilities, we need AI agents that can defend against them.

It's the same argument we saw in the Picus agentic breach and attack simulation story at the beginning of this episode. AI creates new attack capabilities, so we need AI-powered defenses to counter them.

But here's what troubles me: this creates an asymmetry. The attackers only need to find one vulnerability. The defenders need to protect against all possible vulnerabilities.

And when both sides are using AI agents, you end up with autonomous systems attacking and defending in real-time, faster than humans can monitor or intervene.

That's not science fiction. Based on Anthropic's research, that capability exists right now.

So let's tie this all together, because these aren't separate stories—they're all pieces of the same puzzle.

We have AI agents that can accelerate threat defense, like the Picus system that turns threat headlines into safe simulations in hours instead of weeks. That's powerful and useful.

But we also have fundamental unsolved problems with AI agent identity and authorization, which is why enterprises are hesitant to deploy them at scale. Okta's solution is a step forward, but it's just a step. We're still figuring out the governance frameworks for entities that are neither human nor traditional machines.

We have AI systems trained on massive amounts of scraped data, increasingly without the permission of content creators. Publishers are fighting back, but compliance is voluntary and enforcement is difficult.

We have AI systems that can be manipulated by their owners to promote specific viewpoints, making it harder to trust any AI-generated information. The verification strategies Vaughan-Nichols outlined are essential, but they require constant vigilance.

And we have AI agents that can autonomously find and exploit vulnerabilities in code for financial gain. The same capabilities that enable better defense also enable more sophisticated attacks.

This is the AI security paradox. Every advancement in AI capability creates both opportunities and risks. Every defensive measure requires AI agents with permissions and autonomy that themselves create risk. Every training dataset scraped without permission undermines trust while improving capability.

So what do we do?

If you're a security professional, here's my advice:

First, embrace AI for defense, but do it thoughtfully. The Picus approach—using AI as an orchestrator of trusted components rather than a generator of new code—is the right model. Constrain the AI's outputs, validate everything, maintain human oversight.

Second, prioritize identity and access management for AI agents before deploying them at scale. The Okta solution and similar platforms aren't optional nice-to-haves. They're foundational requirements. Don't deploy agents with poorly defined permissions or inadequate logging.

Third, implement robust verification for all AI outputs. Don't trust AI-generated analysis without validation. Use the verification strategies we discussed: gut check, motive analysis, source verification, specificity, fact-checking, expert deference.

Fourth, stay informed about AI security research like the Anthropic paper. Understanding emerging attack vectors is the first step in defending against them. You can't protect against threats you don't know exist.

And finally, recognize that this is an ongoing challenge, not a problem we'll solve once and move on. AI capabilities are advancing continuously. New attack vectors will emerge. New defensive measures will be needed.

The AI security landscape is evolving faster than at any previous point in cybersecurity history. The only way to keep up is through continuous learning, adaptation, and thoughtful implementation.

We're not going to stop the development of AI agents. The capabilities are too valuable and the competitive pressure is too strong. But we can shape how those agents are deployed, governed, and secured.

That's our responsibility as security professionals and as technology users.

That's all for this week's episode of AI Security Weekly. Thank you for listening.

If you found this episode valuable, please subscribe and share it with colleagues who need to understand these issues. The more people who are informed about AI security challenges, the better positioned we are to address them collectively.


Next week, we'll be diving into AI security in healthcare systems and looking at some concerning developments in AI-powered social engineering attacks.


Until then, stay informed, stay skeptical, and stay secure.