Search as a Channel
Explores how discoverability powers modern marketing from SEO and paid media to social discovery and product growth.
Server Logs Are the New Search Data
Your analytics aren't showing you what AI is actually doing on your site, and that blind spot has real business consequences. This episode breaks down the growing gap between AI crawler ingestion and actual citation or retrieval, why server-side log analysis is now an executive-level concern, and what agencies need to start measuring before clients start asking. If GPTBot is hitting your clients' sites thousands of times a day but driving zero attributable traffic, what are you actually giving away and what are you getting back? A reframe of how search visibility works in an AI-mediated world.
Imagine looking at your website analytics dashboard. You know, you're probably staring at Google Analytics right now, seeing those clean, precise little charts.
SPEAKER_02Right, the ones we all rely on.
SPEAKER_01Exactly. You've got a steady stream of visitors, page views ticking up, bounce rates fluctuating, and you look at that screen and think, you know, you have a perfectly illuminated map of exactly who is visiting your site.
SPEAKER_02And what they're consuming.
SPEAKER_01Yeah. But what if that dashboard is actually like a broken window? What if it's completely blind to a massive invisible ecosystem of bots that are just devouring your content right at this very second?
SPEAKER_02Reshaping your brand's digital footprint without leaving a single trace.
SPEAKER_01Welcome to the deep dive. I'm your host, and today we're looking at an incredible stack of sources to uncover exactly what is happening in that dark space.
SPEAKER_02And I'm thrilled to be here to break this down with you because we all rely on those metrics, right? They're categorized, colorful, and well, comforting.
SPEAKER_01Very comforting. But our sources today, we've got a revealing internal discussion from Google's Gary Illyes and Martin Splitt, alongside two really eye-opening growth intelligence briefs from Kevin Indig.
SPEAKER_02Oh, and that fascinating technical experiment by Metehan Yesilyurt.
SPEAKER_01Yes. So our mission today is to unpack the hidden reality of how search engines and AI models actually crawl the web, and why the traditional metrics for online visibility you rely on every single day might be completely leading you astray.
SPEAKER_02It's a massive shift. The reality operating just beneath the surface of the modern web is profoundly different now.
SPEAKER_01Okay, let's unpack this. To understand this new AI reality, I feel like we have to start by shattering the biggest myth we all have about the internet. Which is the idea that when you publish a new page, one single diligent entity called Googlebot comes to look at it.
SPEAKER_02Ah, yeah. Letting go of the idea of a single Googlebot is honestly the mandatory first step to understanding modern search infrastructure.
SPEAKER_01Because it's a total illusion, right?
SPEAKER_02Completely. I mean the term itself is a historical misnomer. Google openly admits it's just a relic from the early 2000s.
SPEAKER_01Back when they basically just had one thing.
SPEAKER_02Exactly. Back then, Google essentially had one core product: search. So they had one primary crawler. The singular name totally made sense.
SPEAKER_01But fast forward to today.
SPEAKER_02Right. Today you have AdWords, image search, Google News, and just like countless internal microservices that all require fresh web data to function.
SPEAKER_01See, in my head, and I think for anyone listening who grew up in the early SEO era, Googlebot is like a single, highly efficient librarian.
SPEAKER_02A librarian. I like that.
SPEAKER_01Yeah. Like you put a new book on the shelf, the librarian walks over, inspects the table of contents, and puts a neat little card in the master catalog. But reading through Gary Illyes' explanation of how their systems actually operate, it sounds less like a single librarian and more like a giant trench coat hiding hundreds of different entities all stacked on top of each other.
SPEAKER_02That is a much more accurate visual.
SPEAKER_01So if Googlebot isn't the crawler, what actually is it?
SPEAKER_02Well, what's fascinating here is that Google operates a massive centralized internal crawling infrastructure. Gary Illyes compared it to a software-as-a-service platform that only exists inside Google's walls.
SPEAKER_01Like an internal SaaS product.
SPEAKER_02Exactly. He gave it a hypothetical internal name, so let's just call it Jack. Jack is the actual physical infrastructure doing the heavy lifting. Googlebot is just one of many clients that calls Jack's API endpoints to request data from the open internet.
SPEAKER_01So if I'm uh an engineer on a random internal Google team building some new feature, I don't write my own web scraper to go out and get the data I need.
SPEAKER_02No, not at all. You just ping Jack.
SPEAKER_01I just say, hey Jack, I need the HTML from these 10,000 URLs.
SPEAKER_02Yep. You ping the API and pass along a very specific set of parameters. You tell the infrastructure what user agent you want to broadcast.
SPEAKER_01Which is basically the name tag you wear when knocking on a website's door.
SPEAKER_02Exactly, the name tag. And you tell it how long you're willing to wait for the data to return and what specific robots.txt rules you intend to obey.
SPEAKER_01And then Jack just handles it.
SPEAKER_02Right. The infrastructure takes that request, manages the bandwidth, ensures it doesn't overwhelm the target server, and fetches the bytes. It centralizes all of it.
SPEAKER_01But this is where the scale of the operation gets kind of murky for me. Because Illyes mentions there are potentially dozens or even hundreds of these internal crawlers pinging the infrastructure for various Google products.
SPEAKER_02Oh, easily.
SPEAKER_01And the vast majority of them are entirely undocumented. Wow. Which is wild. Like, why wouldn't a company built on organizing the world's information just list all of their own bots so site owners know exactly who is visiting?
SPEAKER_02It really comes down to a mix of sheer volume and developer practicality. Illyes explained that trying to document hundreds of tiny, highly specific crawlers on a single HTML page.
SPEAKER_00Like their official crawlers page on developers.google.com.
SPEAKER_02Right. That page, it's practically infeasible to list everything there. He called the space on that documentation page valuable real estate.
SPEAKER_00Valuable real estate. It's a web page.
SPEAKER_02I know, but from their perspective, if an internal crawler is tiny, highly specialized, and doesn't pull a significant volume of data, documenting it just creates noise. So they draw a threshold: they only publicly document the major crawlers, or the special ones that hit a certain scale of bandwidth.
SPEAKER_01Meaning there are literally phantom Google bots roaming around our servers right now that we do not have names for.
SPEAKER_02Yep. And this brings up another crucial technical distinction Illyes highlights: the difference between crawlers and fetchers.
SPEAKER_01Okay, break that down for me.
SPEAKER_02Understanding this separation is key to understanding how your server resources are being used. Crawlers perform automated, continuous work in massive batches.
SPEAKER_01Like a systematic sweep.
SPEAKER_02Exactly. Grabbing URLs whenever the infrastructure has available compute power. Fetchers, on the other hand, operate on a completely different logic. A fetcher grabs a single URL and is strictly controlled by a user or a real-time process. There is an actual human or a specific microservice on the other end waiting for the response of that exact singular fetch before they can proceed.
SPEAKER_01Oh, got it. So a crawler is like an automated street sweeper running continuously all night covering the whole grid.
SPEAKER_02That's a great analogy.
SPEAKER_01And a fetcher is someone using tweezers to pick up one specific item off the pavement because they need it for a project right this second.
SPEAKER_02A very helpful way to visualize the mechanical difference. And Google monitors this entire ecosystem internally.
SPEAKER_01So what happens if one of those undocumented fetchers starts going crazy?
SPEAKER_02Well, if one of those tiny crawlers suddenly starts pulling too much data and crosses their internal threshold, it triggers an alarm.
SPEAKER_01Okay, so they are watching.
SPEAKER_02Yeah. Illyes or his team will track down the responsible engineers, audit what the tool is doing, ensure it isn't malfunctioning, and then make a judgment call on whether it now requires public documentation.
SPEAKER_01Okay, but if Google is already deploying hundreds of these phantom entities just to keep up with traditional search features, it sets a wild precedent.
SPEAKER_02It really does.
SPEAKER_01I mean it forces you to wonder what happens when a completely different architecture like a massive, large language model needs to ingest the web.
SPEAKER_02Oh, that changes everything.
SPEAKER_01How does an insatiable force like generative AI change the mechanics of crawling?
SPEAKER_02Well, we transition from a complex, somewhat regulated environment into total volatility. The scale of data consumption changes by orders of magnitude.
SPEAKER_01Here's where it gets really interesting. Let's look at Metehan Yesilyurt's experiment, which Kevin Indig highlights in his brief.
SPEAKER_02Such a brilliant test.
SPEAKER_01It really is. So Metehan wanted to see exactly how these AI bots behave in the wild without any of the usual SEO noise. He built a 60,000-page website called StateGlobe.com. He generated the entire architecture and all the content using an AI model, specifically GPT-4.1 nano.
SPEAKER_02And the cost was unbelievable.
SPEAKER_01Under $10. The total cost to build this massive site was under $10. And the content was purely statistics, very data-heavy, structured information.
SPEAKER_02Choosing statistical data was a brilliant variable for this experiment.
SPEAKER_01Why is that?
SPEAKER_02Because high density, structured, factual data is exactly the type of foundational information that large language models require to refine their internal weights and improve their reasoning capabilities.
SPEAKER_01It's like superfood for them.
SPEAKER_02Exactly. It is prime nutritional material for an AI.
SPEAKER_01But here is the variable that matters most for you listening right now. This site was a complete ghost.
SPEAKER_02A total ghost town.
SPEAKER_01Zero backlinks, zero social media shares. Metehan intentionally did not submit an XML sitemap to Google Search Console.
SPEAKER_02Nothing. An island completely isolated in the middle of the digital ocean.
SPEAKER_01So he hits publish, sets up his tracking, and waits. Within the first 12 hours, our old friend Googlebot, the crawler the entire marketing industry obsesses over, made a grand total of 11 requests.
SPEAKER_02Eleven.
SPEAKER_01Just 11 hits on a 60,000 page site.
SPEAKER_02Which is completely standard behavior for a legacy search engine encountering a brand new domain with zero established authority.
SPEAKER_01Because it has no incoming links to signal importance.
SPEAKER_02Right. Traditional search engines are historically cautious. They want to keep their index clean and avoid wasting resources on potential spam.
SPEAKER_01But in that exact same 12-hour window, OpenAI's GPTBot found the site. And it didn't make 11 requests.
SPEAKER_02No, it did not.
SPEAKER_01It made over 29,000 requests.
SPEAKER_02Unbelievable.
SPEAKER_01Going by those numbers, that is a difference of over 2,600 times in appetite. It was hitting this completely unknown, unlinked site at a rate of roughly one request per second.
SPEAKER_02It's a staggering disparity. And it perfectly illustrates the sheer aggression of AI ingestion protocols compared to traditional search indexing.
SPEAKER_01Wait, but if I look at my standard Google Analytics 4 dashboard, GA4 claims to have built-in bot filtering technology, right?
SPEAKER_02It does claim that, yes.
SPEAKER_01Are you saying that filtering is useless against something like GPTBot? Because if you were Metehan looking at a GA4 dashboard during those 12 hours, you would see almost zero traffic.
SPEAKER_02It's not that the filtering is failing exactly. It's that the tracking mechanism itself is fundamentally incompatible with how these bots operate.
SPEAKER_00Okay, why?
SPEAKER_02Traditional analytics tools rely on client-side tracking. They inject small snippets of JavaScript code that must run inside a human user's web browser like Chrome or Safari.
SPEAKER_01Okay, so when I click a link.
SPEAKER_02When a human clicks a link, the browser renders the page, executes that JavaScript, and sends a beacon back to the analytics server saying, hey, a user is here.
SPEAKER_01It tracks the environment, the screen size, the session duration, the scroll depth, all of that.
SPEAKER_02Exactly. But bots, particularly ingestion bots like GPTBot, do not operate within a standard web browser.
SPEAKER_01They don't care about the visuals.
SPEAKER_02Right. They do not render the visual elements or execute the JavaScript payload. They act more like an automated assembly line.
SPEAKER_01Which just scrapes.
SPEAKER_02They arrive, strip the raw HTML code down to its component text and data, and immediately leave to process the next URL.
SPEAKER_01So the script never even fires.
SPEAKER_02Because the JavaScript never executes, the analytics tool is completely blind to the event. The hit is never registered on the client side to begin with, so there is nothing for GA4 to even filter out.
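A minimal sketch of why this happens: an ingestion bot parses raw HTML straight to text and simply discards script tags, so a JavaScript analytics beacon never executes. This is an illustrative stand-in using Python's standard library, not any vendor's actual pipeline.

```python
# Illustrative only: strip a page to text the way an ingestion bot might.
# The analytics <script> is parsed as markup and discarded, never executed,
# so no beacon ever reaches the analytics server.
from html.parser import HTMLParser

PAGE = """
<html><head>
<script>/* GA4 beacon: gtag('event', 'page_view') -- never executed */</script>
</head><body><h1>State population: 39,431,263</h1>
<p>Median income: $84,097</p></body></html>
"""

class TextStripper(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.chunks, self._skip = [], False
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

stripper = TextStripper()
stripper.feed(PAGE)
print(stripper.chunks)  # text and data only -- the analytics script never ran
```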
SPEAKER_01Wow. They leave absolutely no footprint in your marketing dashboards. But wait, if the dashboards were totally blank, how did Metehan even know his site was being stripped for parts at a rate of one page per second?
SPEAKER_02Because he bypassed the client-side illusion entirely and audited his server-side logs. The server log is the unvarnished truth of every single request made to the hosting machine. And he took it a step further to ensure data integrity: he didn't just look at the names in the user agent strings, because anyone can write a script and name their bot GPTBot to spoof the system.
SPEAKER_01Oh, true.
SPEAKER_02He verified the raw IP addresses of those 29,000 requests, cross-referencing them against the official IP subnets and autonomous system numbers that OpenAI publishes. The traffic was confirmed as coming from OpenAI's official infrastructure consuming his site.
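That verification step can be sketched in a few lines: ignore the user agent string and test the source IP against the operator's published subnets. The CIDR ranges below are placeholders for illustration, not OpenAI's actual list, which you would load from their published documentation.

```python
# Sketch of the verification step: don't trust the user-agent string,
# check the source IP against the crawler operator's published CIDR ranges.
# The ranges below are illustrative placeholders only.
import ipaddress

CLAIMED_GPTBOT_RANGES = [  # hypothetical example subnets
    ipaddress.ip_network("20.171.206.0/24"),
    ipaddress.ip_network("52.230.152.0/24"),
]

def is_verified_gptbot(ip_str: str) -> bool:
    """True only if the IP falls inside a listed crawler subnet."""
    ip = ipaddress.ip_address(ip_str)
    return any(ip in net for net in CLAIMED_GPTBOT_RANGES)

# A spoofer can send 'User-Agent: GPTBot' from anywhere; the IP gives it away.
print(is_verified_gptbot("20.171.206.44"))  # inside a listed range -> True
print(is_verified_gptbot("203.0.113.9"))    # outside every range -> False
```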
SPEAKER_01This completely upends the old SEO model, doesn't it? I mean the barrier to getting an AI to crawl your site is effectively zero.
SPEAKER_02It's practically nonexistent.
SPEAKER_01You don't need a high domain rating.
SPEAKER_02Yeah.
SPEAKER_01You don't need a PR campaign. They will find you and they will consume your data at breathtaking speed.
SPEAKER_02Yes, they will.
SPEAKER_01Which tells me server-side log analysis isn't just some dusty IT department chore anymore.
SPEAKER_02Absolutely not. The sources make this abundantly clear. Understanding server-side log data has been elevated from a technical maintenance task to a core executive intelligence function.
SPEAKER_01You have to know what's happening.
SPEAKER_02If your reporting stack cannot see the entities that are actively mapping and shaping the future of information discovery, you are flying blind. You have no actual concept of your brand's digital visibility.
SPEAKER_01Okay, let's play devil's advocate for a second. Let's say I am a brand manager. I hear my site got crawled 29,000 times by OpenAI in 12 hours.
SPEAKER_02Sounds great on paper.
SPEAKER_01Right. My first instinct is to take the team out for drinks. That sounds like a massive win. They clearly value the data I'm producing. Doesn't this massive appetite naturally translate into my brand showing up when a user types a prompt into ChatGPT?
SPEAKER_02If we connect this to the bigger picture, the answer is a definitive no.
SPEAKER_01No.
SPEAKER_02No. And this represents the single most dangerous trap for content creators and marketers operating today.
SPEAKER_01I think of it like this, and tell me if this analogy works. Imagine you write a brilliant, groundbreaking textbook on economics. Okay. Someone comes along, takes your book, reads it cover to cover, memorizes every single data point, and uses your proprietary research to go get their PhD.
SPEAKER_02Which is the ingestion phase.
SPEAKER_01Exactly, ingestion. But when they finally publish their own highly acclaimed dissertation, they never actually quote you.
SPEAKER_02Oh, that's painful.
SPEAKER_01They never mention your name or link back to your original work. They simply pass your foundational knowledge off as their own inherent understanding of the world.
SPEAKER_02That is a failure of citation.
SPEAKER_01Right.
SPEAKER_02And that is a highly accurate translation of the mechanics at play here. What you just described is the ingestion gap.
SPEAKER_01The ingestion gap.
SPEAKER_02A massive spike in crawl volume from an agent like GPTBot is not evidence of AI visibility or brand reach. It simply means OpenAI is harvesting your raw materials to refine the internal weights and parameters of their overarching model.
SPEAKER_01And Metehan's data proves this, right?
SPEAKER_02The data from his experiment illustrates this gap with stark numbers.
SPEAKER_01Let's look at the actual breakdown of those bots. Over a slightly longer tracking period, the GPTBot user agent, which is the specific crawler used strictly for underlying model training, hit the site 78,000 times.
SPEAKER_02Massive volume.
SPEAKER_01But the ChatGPT-User agent, which is the specialized bot that goes out to retrieve live information to cite in a real-time chat window for a human user, only crawled the site 642 times.
SPEAKER_02For chief marketing officers and brand strategists, this dynamic is a looming crisis.
SPEAKER_01Because you're getting nothing in return.
SPEAKER_02Exactly. Brands are eagerly giving away their most high-value proprietary data, like a highly structured statistical database, which is prime real estate for an AI, and capturing zero measurable business return. None. You are just providing the raw fuel for someone else's machine entirely for free.
SPEAKER_01You're subsidizing the intelligence of the AI ecosystem.
SPEAKER_02While your analytics dashboard shows zero traffic, and the AI is absorbing your insights without ever needing to route the end user back to your domain.
SPEAKER_01So you're basically invisible.
SPEAKER_02In the era of AI-driven search, absence from the synthesized answer is the new equivalent of ranking on page two of Google. If your data is only being ingested for training and never actively retrieved for a live query, you effectively do not exist to the consumer.
SPEAKER_01Man, so what does this all mean for you listening right now? We've dismantled the myth of the single Google bot. We've uncovered this invisible, highly aggressive ecosystem of AI bots tearing through server logs undetected. Right. And we've established that being eaten by an AI is wildly different from being recommended by one.
SPEAKER_02Crucial difference.
SPEAKER_01So how do you actually reorganize your strategy, both defense and offense, based on this intelligence?
SPEAKER_02Well, the sources outline a very deliberate playbook for adapting to this architecture. It requires a fundamental shift in where you source your truth.
SPEAKER_01What's the first step?
SPEAKER_02The first step is to audit your logs, not your analytics. You must decouple your understanding of bot interest from client-side tools like GA4.
SPEAKER_01So you need the IT team involved.
SPEAKER_02Yes. You need your engineering team to set up server-side log analysis, whether that's through an ELK stack, Splunk, or another log management tool so you can see the raw requests.
SPEAKER_00You need that visibility.
SPEAKER_02You have to separate the silent training crawlers from the live retrieval agents.
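As a rough sketch of that separation, this tallies raw server-log lines by AI user agent and splits silent training crawlers from live retrieval agents. The sample log lines and the agent-to-role mapping are illustrative assumptions; real log formats vary and the full set of vendor agents is longer.

```python
# Sketch: classify server-log requests by AI user agent, separating
# training crawlers from live retrieval agents. Sample data is invented;
# extend AGENT_ROLES with other vendors' documented agent tokens.
from collections import Counter

AGENT_ROLES = {
    "GPTBot": "training",         # model-training ingestion
    "ChatGPT-User": "retrieval",  # live retrieval for a user's query
    "Googlebot": "search",        # classic search indexing
}

SAMPLE_LOG = [
    '203.0.113.5 - - [10/May/2025:01:02:03] "GET /stats/tx HTTP/1.1" 200 "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [10/May/2025:01:02:04] "GET /stats/ca HTTP/1.1" 200 "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [10/May/2025:01:05:00] "GET /stats/ny HTTP/1.1" 200 "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
]

def tally_roles(lines):
    """Count requests per role based on user-agent substrings."""
    counts = Counter()
    for line in lines:
        for token, role in AGENT_ROLES.items():
            if token in line:
                counts[role] += 1
                break
    return counts

print(tally_roles(SAMPLE_LOG))
```

If training hits vastly outnumber retrieval hits, you are looking at the ingestion gap the episode describes.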
SPEAKER_01Because if you don't know exactly who is knocking on the server door, you cannot make strategic decisions about what to hand them. Which leads directly to the second step from the briefs: drawing the line.
SPEAKER_02You must architect a defense. This is where your robots.txt file evolves from a basic technical checklist into a critical strategic weapon.
SPEAKER_01Okay, how so?
SPEAKER_02You have the power to explicitly block GPTBot. You can dictate to the ecosystem: no, you may not systematically swallow my proprietary research to train your foundational model for free.
SPEAKER_01But you don't block everything, right?
SPEAKER_02No, crucially, within that exact same file, you can explicitly allow the ChatGPT-User agent.
SPEAKER_01Translating that back to our earlier analogy, you are effectively stating: you cannot read my entire textbook to get your PhD. But if a human asks you a highly specific question about my field of expertise, you are fully authorized to pull my book off the shelf, open it up, and quote me directly to the user.
SPEAKER_02Perfect translation. You protect the intellectual property from mass ingestion while remaining fully eligible for live conversational citations.
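Sketched as a robots.txt, that split looks like the fragment below. GPTBot (training ingestion) and ChatGPT-User (live retrieval) are OpenAI's documented agent tokens as described in the sources; check each vendor's current documentation before deploying, since tokens and semantics change.

```txt
# robots.txt -- block model-training ingestion, allow live retrieval.
# Agent tokens per OpenAI's public documentation; verify before use.

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

# Everyone else, including classic search crawlers:
User-agent: *
Allow: /
```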
SPEAKER_01But that requires knowing the difference between the bots.
SPEAKER_02Exactly. Implementing this requires a highly nuanced understanding of which specific user agents perform which specific functions across different AI companies.
SPEAKER_01And the third step in the playbook is about redefining our metrics for success. Right.
SPEAKER_02We must move beyond the vanity metrics of the previous decade. A massive spike in AI bot requests hitting your server is not a key performance indicator.
SPEAKER_01It doesn't pay the bills.
SPEAKER_02It is not proof of influenced revenue. You have to reframe your entire team's measurement framework.
SPEAKER_01So what should we be asking in strategy meetings?
SPEAKER_02The questions should revolve around are our digital assets inherently machine readable? Is our entity clarity strong? Are we actually driving citation share within AI answers rather than just raw traffic?
SPEAKER_01Entity clarity. It's a term that gets thrown around a lot right now. What does building entity clarity actually look like in practice for a brand listening to this?
SPEAKER_02It can't just be stuffing keywords on a page anymore. It is the exact opposite of keyword stuffing. Entity clarity is about structuring your data so that a machine can definitively map your brand to a specific concept without guessing.
SPEAKER_01Give me an example of how you do that.
SPEAKER_02In practice, this means rigorous use of schema.org markup to tag your content. It means using crystal clear, semantic HTML architecture.
SPEAKER_01Making it foolproof for the machine.
SPEAKER_02It means ensuring that your brand is mentioned in authoritative, high-context backlink environments, not just spammy directories. You are building a mathematical map of relationships that proves to the AI definitively that your brand is the authoritative source on a given topic.
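As one concrete example of that kind of markup, here is a sketch of a schema.org Organization block in JSON-LD. The organization, URLs, and description are entirely hypothetical placeholders.

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Stats Co",
  "url": "https://www.example.com/",
  "description": "Publisher of structured state-level statistics.",
  "sameAs": [
    "https://en.wikipedia.org/wiki/Example",
    "https://www.linkedin.com/company/example"
  ]
}
```

Embedded in a page inside a `script type="application/ld+json"` tag, a block like this gives a machine an unambiguous entity to map the brand to, instead of forcing it to guess from surrounding prose.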
SPEAKER_01So we are transitioning completely away from an era where the entire objective was simply making yourself easy to crawl.
SPEAKER_02Yes.
SPEAKER_01We used to just put out the digital welcome mat and hope the single Googlebot showed up eventually.
SPEAKER_02And now that posture is obsolete.
SPEAKER_01This new architecture demands a completely different posture. It's about making yourself easy to trust, making your data structurally easy to retrieve when it actually matters, and making your brand impossible to ignore when the AI formulates its final answer for the user.
SPEAKER_02Search is no longer functioning merely as a directory or a traffic source.
SPEAKER_01No, it's not.
SPEAKER_02It has become an active intermediary layer between your brand and the customer. The AI model is the ultimate gatekeeper. Wow. If you only optimize your site for mass ingestion, you resign yourself to being the invisible fuel. If you optimize for structural retrieval and authoritative citation, you retain your position as the destination.
SPEAKER_01Let's bring all of these threads together. We started with the realization that your standard analytics dashboard is a broken window, fundamentally blinding you to the reality of machine traffic. A tough pill to swallow. We discovered that Google's crawling infrastructure is not a single bot, but a massive, complex internal SaaS product fielding requests from hundreds of undocumented entities.
SPEAKER_02Right, old Jack.
SPEAKER_01We examined Metehan Yesilyurt's experiment, proving that AI ingestion bots are aggressively tearing through the web at scales hundreds of times larger than traditional search engines.
SPEAKER_02Entirely undetected by JavaScript tracking.
SPEAKER_01And most importantly, we unpacked the ingestion gap. The dangerous reality that an AI can consume all your proprietary data to train its models without ever actually citing your brand to a human user. It really forces a complete paradigm shift in how we measure, manage, and protect digital visibility.
SPEAKER_02It does. But you know, there is one final unmentioned implication in all of this data that we really need to consider.
SPEAKER_01What's that?
SPEAKER_02Well, we've established it right now, in this exact moment, these AI models are in an absolute feeding frenzy.
SPEAKER_01Right, grabbing everything they can.
SPEAKER_02They're aggressively scraping every piece of high-quality data they can find to train their foundational parameters. But if we follow this trajectory to its logical conclusion, what happens when the models feel they know enough?
SPEAKER_01Wait, you mean if they stop needing new data?
SPEAKER_02If the aggressive training phase eventually yields diminishing returns and these massive ingestion crawlers are significantly dialed back or retired entirely, the architecture of the web changes fundamentally.
SPEAKER_00Oh wow.
SPEAKER_02We face the chilling prospect of the internet becoming a closed loop.
SPEAKER_00Like lockdown.
SPEAKER_02A digital environment where no new sites, no emerging voices, and no innovative brands can ever break in and gain algorithmic authority simply because the gatekeeping AI has decided its training is complete and has permanently stopped looking for anything new.
SPEAKER_01Just frozen.
SPEAKER_02We could be looking at a web frozen forever in the amber of an AI's final training data cutoff.
SPEAKER_01That is definitely something to think about the next time you look at those clean little charts on your dashboard. Remember, you might just be looking through a broken window. Go check your server logs. Thank you for joining us on this deep dive. We will catch you next time.