Coffee with Developers

ChatGPT vs Google and SEO in the Age of AI Search

WeAreDevelopers

On this episode we met with SEO expert Eric Enge to discuss his recent trending article comparing ChatGPT to Google's search results, and what the future of web search looks like now that LLMs are here to stay.

We hope you enjoy this episode, and you can - of course - find us on socials to let us know what you think.


Welcome to another Coffee with Developers. I'm here with Eric Enge, who wrote an interesting article in which he compared ChatGPT and Google results, and that's what we're going to cover. But also the general idea of how the search market is changing, how the indexing market is changing. But first of all, what makes you an expert in this space? How have you worked in this kind of environment in the past?
So from the SEO side of things, I've been involved in SEO for 25 years, co-wrote the book called The Art of SEO, and founded one of the more well-known agencies of its time, an agency called Stone Temple Consulting, which I sold in 2018. We'd grown it to 70 people and worked with many of the largest brands. And so I have a technical background. I'm not currently actively coding, by the way, but I have taken classes from Geoffrey Hinton and Andrew Ng, both of whom are very, very well known in AI circles. And during those classes I actually learned how to write some of the simpler forms of AI algorithms.
So I actually have an insight into how neural networks are structured, how they work, and I've created a few very simple versions of that. What's interesting about that is when you understand the basic math being used at the root of large language models, it gives you a different perception of what might be possible and what's silly to consider. It's an interesting thing how machine learning and deep data analysis have been around for such a long time, and now everything gets clobbered with this AI hype title. Everything gets put into the same box: it's just magic, just trust it. But your article, which we're going to link here as well, is basically a deep analysis.
It's a really long read, but it's great to actually see somebody going into these details, where you compared the free ChatGPT tier, not the paid one, with Google results. So what were your main findings where you say one is better than the other? What would be the elevator pitch for each of them for you? So yeah, in terms of the main findings, I found that there are certain classes of queries where Google is not under any immediate threat. And I would say things like commercial queries, local queries, navigational queries; those are things where Google is leveraging other kinds of databases that you don't find by crawling the web.
And so I shared a bunch of results. For example, I shared a result of asking for directions to Whole Foods from where I was sitting here in Southborough, Massachusetts. I know the lake quite well, actually. And, you know, Google provided a fabulous result that knew where I was, and it was point to point, really well done. Now Google has an advantage there, and this is an example of one of those database things I'm talking about. I use Google Maps on this device.
It knows where I am, and correspondingly, Bing doesn't have that level of data. So the ChatGPT thing was quite interesting. It assumed I was in the neighboring town because it couldn't locate me quite so precisely. And then it gave a set of directions that were nonsensical, saying that I should follow Route 20 east, which will mean nothing to any of you unless you're from Massachusetts, until you get to Route 90 and take the exit there to get on Route 90.
The problem is Route 20 east doesn't run into Route 90. So this is an example of an error, but a very detailed answer to one part of it. Let me give a higher level answer to the rest of your question, Chris. On the ChatGPT side of things, there are definitely classes of queries where it does a lot better. Certainly anything involving any kind of content analysis; for those kinds of queries, it's the only game in town right now.
Well, at least compared to Google. Gemini as a standalone tool tries to do it, but not as well as ChatGPT Search does. It's a flawed sort of content analysis and you need to treat it like a brainstorming partner, but it does it way better than Google does. It also does better on disambiguation queries.
That's another area I found of interest where it did a better job. So disambiguation, you mean like when you go to Wikipedia and you get a term that could be 12 different things, with a disambiguation page there. So ChatGPT would pick the right disambiguation based on what you asked it before, or how do you mean it does a better job there? And by the way, the answer to that question is of course rooted to some degree, inevitably, in some level of bias, by which I mean I had a type of answer that I felt was better. And let me give an example.
I could ask both ChatGPT Search and Google: what is a Jaguar? And the entire Google page will be about the animal, whereas the ChatGPT page will note that there's an animal, there's a car, and a few other things. For example, there's an operating system, there's a guitar, and then there's an American football team called the Jaguars. And a human might mean any of those things. I believe that the Google result is operating more on the assumption that the great majority of people mean the animal.
So it just answers with the animal. I had a preference, and hence I scored ChatGPT Search higher for explaining what all the various options are and giving people ways to dig into the details. I'm actually surprised that Google does the main thing, the animal, because I would have thought that somebody would have paid for the brand to get it as the top result, because that's one of the things that drives me nuts the most. I mean, I've always used search, I worked on a few search products, but it drives me nuts that when I look for something right now, I get not the right content but content that somebody paid for. For example, I have a shirt of a certain name and a certain brand, and when I look for that one and go to Bing, for example, it gives me 20 different brands that are similar to the one that I was looking for.
So that's something I would expect an LLM to do. But I see search engines doing the same thing right now. I mean, a few years ago I wrote a blog post saying that the web starts at page four, because the first three pages are paid content and SEO-optimized content rather than your real result. Did you find with the more generic searches that you did that Google gives you what you're looking for, a focused thing?
Or do you also see SEO-optimized and sold content and ads being displayed, watering down the results of search engines as well? Well, you do get that. I mean yes, there's a lot of people who are really good at SEO-optimized content, and you can still today have some success getting higher rankings than other signals might suggest. Google tries to combat this with the use of multiple types of signals, as we learned from their API leak of May of last year.
So they use user engagement data, including Chrome, to see how well users respond to content. If the content is bad, that sends a pretty clear signal that users aren't liking it, and they can do something about it. The reality is that it's probably a pretty crude signal, because as we saw with some of the reactions to the early content from ChatGPT, content can look pretty good but not actually be pretty good. And it sometimes takes a subject matter expert to understand the things that are flawed in a piece of content. I'd like to just share a quick example of exactly this kind of case.
I did a query in earlier studies I did around this. The query was: what was the significance of the sinking of the Bismarck in World War II? One of Germany's two prized battleships during World War II. And the various generative AI platforms came back and said it was the largest battleship ever built.
And they would say that it ended the Battle of the Atlantic in World War II. Having been a World War II geek when I was a teenager, I know the truth of those two things, which is that the Japanese built multiple battleships that were larger, and it didn't end the Battle of the Atlantic in World War II, it shifted it to a degree. But you wouldn't know that unless you already knew the history. So you have to have this willingness to dive in and find out what the real truth is behind something.
And I think that was really one of the big things that has struck me all along, as I read various people publishing content saying that ChatGPT content or somebody else's content is going to take over search, because they do a quick surface read and a texty type of response is pretty cool, and it's a nice and easy process. But the problems with it are many. You have errors, you have omissions, you also have other sorts of situations.
Like I would get responses from ChatGPT in my study here which seemed decent, but the problem was that I knew I was going to need to reference multiple other sites to get all the information I wanted. ChatGPT would serve up kind of a random sample of a couple of authoritative sites that it built its response on. And what I really needed was the list of the six or seven various resources, because I was going to have to go through all of them to really get what I wanted. So there's a number of these kinds of scenarios that come up when you pull back the covers and try to really understand what's going on. Well, Bing Chat does that.
It basically lists the pages where it got the information from and what the source was, whether it's Wikipedia or somewhere else on the web, but that still doesn't mean it's actually the right resource. I mean, you could have published any random thing on the Internet, but at least it's good to know where it came from. I think one of the bigger problems is also the way ChatGPT talks to you. It always sounds like a happy Californian in 1980. They sound very, very warm and they sound very knowledgeable.
It's like mansplaining as a service, to a degree, because there's sometimes no data behind it, but it still makes everything sound very simple to me. It feels like a Reader's Digest of the web, basically: here are the few points that are interesting. But when it comes to proper research, it actually means you still have to go to the resources that you talked about. Then again, in an agentic web, when we say we want agents that surf the web for us and click through websites for us and submit forms and these kinds of things, it's sold to us as: we do the research for you and then the chat system gives you the result. But it seems like we're in a world where the machine always has to give you something back, and that's the thing that drives me nuts.
When I go to Google and a Google search has no result, I get an empty page. It's not often happening anymore, but sometimes it just says: I don't know. But an LLM will never say "I don't know". It just starts hallucinating and gives you random stuff back.
And is this a marketing thing, or do you think there will be a change in the future where there will be more honesty in these systems, where they say: this is really not information that I know, please give me more information to research? Yeah, that's a great question, Chris. I can only speculate. One hopes that they will do a better and more complete job of admitting their shortcomings.
The challenge, though, is a marketing one, because a lot of the times, the issues and the gaps and things that are missing from what you should be receiving as a response, it takes someone who really knows what they're doing to understand that they're there. And users might run off and do things with the information they've gotten. And depending on their application, they might get a poor grade, they might get bad medical advice. Even with the warnings that are given, any number of different things could happen. I think that's a real issue and it's an outcome of the underlying algorithms, which in rough terms is trying to fit an algebraic equation to describe all the world's information.
And I think that's a simplistic explanation, but it's roughly on the money here, with some patches put on top of it to fix certain kinds of errors. And it's not possible to write such an algebraic equation. I'll stand by this: you can give me quantum computers, and I still don't think you can write an algebraic equation that describes all the world's information. I think you can do a lot of interesting things with it, but it can only go so far. And people will try other things like retrieval-augmented generation, which is this technology for providing a constraining database of known information to the LLM to limit how it responds and reduce errors.
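The retrieval-augmented generation idea Eric describes can be sketched in a few lines. This is a minimal illustration, not any product's actual implementation: the documents, the naive word-overlap retriever, and the prompt wording are all made up for the example.

```python
# Minimal sketch of the retrieval half of retrieval-augmented generation (RAG):
# look up passages from a trusted store and paste them into the prompt so the
# model is constrained to known facts rather than answering from weights alone.
import re

def tokenize(text):
    """Lowercase word set; a real retriever would use embeddings instead."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by how many words they share with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, documents):
    """Build a prompt that limits the LLM to the retrieved passages."""
    facts = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return ("Answer using ONLY the facts below. If they are insufficient, "
            f"say you don't know.\nFacts:\n{facts}\nQuestion: {query}")

docs = [
    "The Bismarck was sunk in May 1941 in the North Atlantic.",
    "The Yamato and Musashi were larger battleships than the Bismarck.",
    "Salmon recipes often call for roasting at 200 degrees Celsius.",
]
prompt = build_prompt("Which battleships were larger than the Bismarck?", docs)
```

The point of the pattern is exactly what Eric says next: the constraining database, not the model, becomes the source of truth.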
Ultimately, if you build a database of all the world's information so the LLM can't make any errors, well, then you already have a database of all the world's information. What do I need the LLM for, except to summarize it? So I guess translation is the thing, in context. I mean, when it comes to prompting, when it comes to talking to these systems, the way you prompt them and the way you tell them to give you information produces a completely different result. I mean, I found that roles are a really important thing that you tell them.
Like: you are a Grade 5 teacher, explain these things to me as you would to a child, and then you get better information than when you just say: give me the information. By focusing more in your requests to the machine, you probably get better results as well. But then again, I find it so time consuming. Oftentimes I'm like: why would I want to chat to a system for 10 minutes about the best recipe for pancakes when I could just read through a website?
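The role trick Chris describes maps onto the system message in chat-style LLM APIs. The sketch below only assembles the message payload; the actual client call and model name vary by provider, so none of that is shown, and the persona text is made up.

```python
# Sketch of role prompting: chat-style LLM APIs take a list of messages, and
# the system message is the usual place to assign the model a persona.
# Only the payload is built here; sending it is provider-specific.
def build_messages(persona, question):
    return [
        # The system message shapes every following turn of the conversation.
        {"role": "system",
         "content": f"You are {persona}. Keep answers at a level "
                    "your audience understands."},
        {"role": "user", "content": question},
    ]

messages = build_messages(
    "a grade 5 teacher explaining things to a child",
    "What is the best recipe for pancakes?",
)
```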
But then the web has changed as well. We don't find recipe websites for pancakes anymore. We find 10-minute YouTube videos of pancake recipes where people ask us to like and subscribe and then get a VPN provider and do all kinds of things. So the spamminess of the web search results is probably what drives people to LLMs, because they get the information immediately. That doesn't mean to me that the quality is good.
Right? Yeah. So I did share an example in the study of asking for a salmon recipe, I believe, where actually the Google result was fine. I mean, there were websites with salmon recipes quite near the top, and it was fairly easy to parse. But I agree there are certainly plenty of situations where, as you say, it's harder to get at the content you want in search.
But having said a number of things that suggest I'm negative on ChatGPT, I want to turn it around a little bit, because I have my concerns, but that's different than being negative. And I want to draw that out a little bit more, which is: I believe that gen AI tools like ChatGPT and Claude are good at certain things, and Gemini is good at certain things, and Copilot is obviously basically ChatGPT, but they're really good when you treat them as a brainstorming partner for your work, not as the absolute golden source of true information or the perfect information set. Let me just give an example. Imagine that I have been tasked with creating a piece of content about some topic. We can go back to my example of the sinking of the Bismarck in World War II, and I ask my gen AI platform for an outline of that article. And maybe I'll be a little clever and I'll ask it for three outlines, and then I'll review the outlines, go through them and drop out the things that I don't think belong, and maybe add a thing or two that I think they missed.
I can do that much faster using the gen AI tool than I would if I started with a blank sheet of paper. And it gets tricky as you go further and further into it, like if you ask it to write the article. So far my experience, and I've been through this with a number of different companies, is that it's almost not helpful, because there's enough rework that needs to be done to get it to where each particular company I worked with felt it was suitable for their use. There are other things you can ask. You can ask for key trends on the topic.
You can ask: give me a list of facts and statistics about the topic, including sources; give me a set of frequently asked questions. And you can approach it multiple different ways. And then my job as a content creator is greatly simplified, because everything has been assembled for me, like rapid fire. So maybe it only saves me 20% of the time and the cost of creating the article.
What company out there wouldn't die for a 20% gain? I mean, that's real, that's very important, and it's valuable. Everybody who's at any level of content creation should be doing something along these lines, in my opinion, because there's so much to be gotten from it. And of course there's a lot of other, simpler queries where the gen AI tools provide good enough information for the situation at hand. So it's sort of finding those places where essentially the potential problems don't bother you.
An interesting one, though, is the emergence of AI slop: content generated by AI that is there just to overwhelm a system, overwhelm a search engine. There were great examples where somebody had a website and asked an AI system to SEO-optimize it for him, and it managed to get him high in the Google results and the Bing results, but none of the content was left; basically, there were just random things in there. And that's something where I think, as a content creator, it becomes really, really hard to compete with this automatically generated stuff from thousands of bots. So do you think there is a way for content creators to still stand out and make sure that our stuff is being found by people, rather than just the things that the machine wants to hear, created by another machine? Through brand?
If you have a really good brand, that becomes an easy metric that search engines can use, that Google can use, to have more trust in the content created there. Which doesn't mean that really good brands won't abuse that somehow. It is a concern. I agree that there are definitely examples of where this is working and people are succeeding in doing this. And Google of course is doing everything they can to root it out.
And I have a strong belief that's one of the reasons why the cycle of Google updates is so much faster than it was. At one point, I think it was after March or May of last year, we didn't see an update for like five months or something, and then it was bang, bang, month after month, because they're trying to clean this crap out. Well, I guess the release of Gemini was also something they had cooking in the background for months, and now they finally released it. And it's interesting, because when you do the comparisons of the different engines, Gemini is doing really well. Compare that to OpenAI, where the cost is far higher than what they can sustain at the moment; even the $200 a month is actually not breaking even for them.
So it's an interesting bit to see how much money will go into that one. What I found interesting is two things. The indexing of Google was very, very fast. My blog has been around since 2006; when I wrote a blog post, 10 minutes later I found it in a Google search.
Now LLMs have a model whose weights need to be updated, that needs to be cleaned up, that needs to be re-released. There is no immediacy in that. I always feel like when I use ChatGPT or something, it's like hitting "I'm Feeling Lucky" in Google, because I get the first result, which was indexed years ago, and the information is normally one or two years out of date in the free versions, if not in the paid ones.
So do you think that's something that will make sure search is still an interesting bet compared to just having a chat as your main information source? That the immediacy, basically the delay until the model gets recreated, is something that will make sure search will not go away? Yeah, well, in Google parlance we call that freshness: the need for fresh content. There's a sporting event going on, a major political figure is speaking.
People are looking for election results, sports scores, weather information. These are all things where immediacy is a big issue. And then there's events, earthquakes, volcanoes, tornadoes, et cetera.
There's a lot of queries where people want a very real-time response. It will take the LLMs a while to get that set up. Yeah, I'm not sure what path you have to go down, and I think it leads, Chris, to one of the other big challenges, which is that the LLMs currently, or the gen AI platforms, don't really have a valid way to monetize their platform. And the place where they're making the most inroads is on informational queries, where for the most part people don't care to advertise. Even when they try to introduce advertising, it won't necessarily be the best solution for the right context.
Effectively that's a big challenge that they're going to have to figure out how to solve, because as they get into more and more complex models and everything else, they're driving the expense up. And still, 20 bucks or 200 bucks a month or whatever it is, just like you said earlier, comes nowhere near covering it. And Microsoft's going to want their $13 billion back. They're going to want to make a profit on that. Yeah, it's an interesting one.
Like with the monetization, as I said before, Google search results and Bing search results are full of advertisements nowadays rather than results from the web. And one interesting thing that I looked at was a Vercel article about different AI crawlers and how they actually index the web, and it feels like 2001 in the search world, because a lot of those actually don't understand JavaScript. They don't follow JavaScript links or anything like that. They need content not even as HTML; they actually prefer content as markdown, so there's a lot of tools that turn websites into markdown for ingesting into models. So it feels like there's a need to publish in static formats again rather than in JavaScript.
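The website-to-markdown conversion those tools perform can be approximated with the standard library alone. This toy version handles only headings, paragraphs, and links; real converters cover far more (tables, images, nested lists), and the sample HTML is made up.

```python
# Toy illustration of HTML-to-markdown conversion for crawler-friendly output,
# using only Python's stdlib html.parser. Handles h1-h3, p, and a tags.
from html.parser import HTMLParser

class MarkdownExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.blocks = []    # finished markdown blocks
        self.current = []   # text fragments of the block being built
        self.prefix = ""    # markdown heading prefix for the current block
        self.href = None    # set while inside an <a> tag

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "a":
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self.href = None
        elif tag in ("h1", "h2", "h3", "p") and self.current:
            self.blocks.append(self.prefix + " ".join(self.current))
            self.current = []
            self.prefix = ""

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text:
            return
        if self.href:
            self.current.append(f"[{text}]({self.href})")
        else:
            self.current.append(text)

def html_to_markdown(html):
    parser = MarkdownExtractor()
    parser.feed(html)
    return "\n\n".join(parser.blocks)

sample = '<h1>Pancakes</h1><p>See the <a href="/recipe">full recipe</a> here.</p>'
markdown = html_to_markdown(sample)
```

The output is plain text a non-JavaScript crawler can ingest directly, which is the whole appeal of shipping static formats.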
And I like that because I think JavaScript is still flaky in terms of interaction when I'm on a bad connection. But it feels like we're back in pleasing the crawlers in 2001 with those AI crawlers. Except we have less insight. At least Google had their webmaster portal where you can ask questions and get information. But we have no idea how these crawlers work and we just hope that they index our stuff.
Is there any way we can remediate that? Yeah, I don't know of a solution for that, Chris, which doesn't necessarily mean someone hasn't come up with one. But people aren't going to want to give up their JavaScript; their whole UX is built on top of it on so many different sites. It's like LLMs are going to be forced to adapt whether they want to or not, and that will drive the processing challenge even higher.
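On the visibility question, one partial signal does exist: the major AI crawlers announce themselves with distinctive user-agent strings (GPTBot, ClaudeBot, PerplexityBot, and others), so a scan of your server access logs can at least confirm they fetched your pages; whether the content made it into a model remains opaque. A minimal sketch, with fabricated log lines:

```python
# Sketch: count hits from known AI crawler user agents in an access log.
# Seeing GPTBot or ClaudeBot here only proves the page was fetched, not that
# it ended up inside a model. The sample log lines below are fabricated.
from collections import Counter

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Bytespider"]

def count_ai_crawler_hits(log_lines):
    hits = Counter()
    for line in log_lines:
        # The user agent appears verbatim in combined-log-format lines.
        for bot in AI_CRAWLERS:
            if bot in line:
                hits[bot] += 1
    return hits

sample_log = [
    '1.2.3.4 - - [10/Jan/2025] "GET /post HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '5.6.7.8 - - [10/Jan/2025] "GET /post HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '9.9.9.9 - - [10/Jan/2025] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (ordinary browser)"',
]
hits = count_ai_crawler_hits(sample_log)
```

The same user-agent names can also be used in robots.txt to allow or block these crawlers.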
The other thing was: I could see in my logs when Google came to my site, but I cannot see if my stuff has been indexed or has become part of a model. How can I even evaluate my content being used by an LLM? How do I get this information? It's not something I've dug into. So it's a great question.
I'd be really interested in understanding the answer to that. But like you said, it's 1998 all over again. So, people can look up your article and read about it. And as I said, I like that you pointed out that it's not all negative about LLMs; it seems like when you do research, it's a great partner to throw balls back and forth with, and it can do the grunt work of research for you, but you still have to validate it yourself and then write the content yourself. Right?
That's right. And what I tried to do with the article, Chris, is this: I did 62 queries that I evaluated, and I spent about an hour on each query fact checking in detail. So I was very happy when there were queries where I already knew all the answers, like the one I shared earlier, because those were a little bit of a coffee break for me. It was like: okay, good. But it was a lot of time.
I put an enormous amount of time into it. What I tried to do was give examples of scenarios and explain why something might lean one way or the other. And I've mentioned a couple of them so far through the course of our chat here today. But there are maybe eight or ten different scenarios where I say: for this kind of query, here's how you need to think about it, and this is why I scored Google higher versus ChatGPT, or vice versa.
I thought it was the right way to approach it, because I knew I couldn't score the platforms in their entirety with 62 queries. Right. But what I could do is illustrate a bunch of ways to think about it, and that might help people understand, as they start thinking about how they want to use gen AI tools, or ChatGPT Search, or Google, how they should approach it and why they might decide to go one way or the other. One thing I found is that the people I talked to, people from Nature, for example, and from Reuters and other news agencies, a lot of them compare different models against each other, see them side by side, and also see how much it costs them to use each of them to help with their research faster. And it's interesting with Bing giving you web results and the chat at the same time.
So it seems like they want to help you with doing that research in both worlds. But it's still an interesting one: a lot of the people who are experts, who are basically researchers, use local models instead of just using the big LLM; local models that have been trained on a certain subset of content rather than giving you the whole world in a nutshell. Do you think that's something that will come in the future? That we go away from these all-knowing LLMs to more specialist models? I absolutely think that's coming.
I actually know of companies that are going down that exact track of being more focused on specific topic areas. A smaller problem means a smaller model, which means a better chance of a higher level of accuracy, and it's more amenable to tuning with a RAG database to limit errors even further. So there's a lot of goodness in that. One way I have expressed it to people, Chris, just to help the audience understand the complexity of even one topic: I usually say the geography of the United States, but since you're from Germany, I'll say the geography of Europe.
Let's just take that topic. If you asked an LLM and you just kept running queries at it, my working guess is there are probably trillions of errors it can still make about that contained topic. Trillions.
And so now you try to think about that across everything. Wait, wait a minute: that's a really limited, simple topic area. That helps express, again, the complexity of the broad, general-purpose, do-everything model. But if I were to build a model focused just on the geography of Europe, I could use a RAG database and implement a lot of constraints around it to dramatically reduce the level of errors. I don't think you can eliminate them, but I think you can bring them way down, certainly the most common ones, and I think it's a promising direction for sure.
And of course the other thing is the wasted effort, wasted time, wasted electricity. It was interesting: when you think about search engines back then, when there was no click on a result, they ranked the thing lower, because obviously people didn't get what they wanted. But I found that LLMs seem to be the other way around.
Saying that if people have no more questions, then the answer was right, and they rank it higher. I hope that's not the case, but it might be. Yeah, it's a great question. I think the signal is a little more difficult to parse here, because, and I think this is your point, you have this texty response and you feel like you got your answer, you leave; you have a texty response and you feel like it didn't help you at all, you leave. So how do I tell?
Right? How do I measure that back? Well, I mean, there was research last year that basically said that 67% of Google search result pages had no click. People got very disenfranchised with web search, and that's probably why the LLMs gained so much. Do you think it's an age thing?
That people who grew up with the web and saw search engines come up still trust them more than people who just take out their phone and start talking to it? It could be. So yes, I think that might drive the rate at which people will experiment with LLM-based solutions. The long-term driver of success, I hope, will be the ultimate utility of what you got. So for example, if you have a paper that you have to do for a college course and you include information in it, and maybe it's all accurate but it's just missing some material pieces you probably should have included, or it has any kind of problems, and you get dinged on it; or you're in a commercial setting and you rely on something and then it doesn't meet the need.
These things will happen to people who aren't critical enough in their review of the response they get. People will eventually get forced, by what are largely, call them, commercial or real-world situations, to use a more discerning eye in reviewing things, and that may or may not cause them to go back to another approach where they get the result from a search experience. To your original question: young people are probably going to be much faster to try ChatGPT or something else, but we'll all know five years from now how many people stay. Excellent. Well, thank you so much for your time.
There are hopefully going to be some interesting things for people to find in your article and the discussion about it as well. So I think overall, we have to deal with the world that we're in right now. We have to accept that LLMs and chat systems are, for us as content creators, a new avenue to cater to. It just feels kind of unfair, because it's really hard to find out how to get an LLM to rank your stuff above something else. In SEO, we had dirty tricks, we had good tricks, but I always found that great content just prevails over time, whereas with an LLM you can't actually rely on that.
Excellent. Well, thank you very much. And that was Coffee with Developers with Eric Enge. Take a look at the article, take it apart, run it through an LLM, or ask an LLM to give you the findings in it rather than reading it yourself. And hopefully there'll be something in there for you as well.
Thank you so much, Eric. All right, thank you. Thanks for having me.
