AI in 60 Seconds | The 15-min Briefing

AI Hallucinations: Why It’s Not (Just) a Tech Problem.

AI4SP Season 2 Episode 22

A government report packed with fake citations made headlines, but the real story sits beneath the scandal: most AI “hallucinations” start with us. We walk through the hidden mechanics of failure—biased prompts, messy context, and vague questions—and show how simple workflow changes turn wobbly models into reliable partners. Rather than blame the tech, we explain how to frame analysis without forcing conclusions, how to version and prioritize knowledge so retrieval stays clean, and how to structure tasks so the model retrieves facts instead of completing patterns.

Luis (human) and Elizabeth (AI) break down the idea of sycophantic AI — where models mirror user bias — and map it to the everyday issues it can create. Along the way, we share data from over 300,000 skills assessments showing low prompting proficiency, weak critical thinking, and limited error detection—evidence that the gap lies in human capability, not just model capacity.

Enjoyed the conversation? Follow the show, share it with a colleague, and leave a quick review to help others find it.

🎙️ All our past episodes  📊 All published insights | This podcast features AI-generated voices. All content is proprietary to AI4SP, based on over 1 billion data points from 70 countries.

AI4SP: Create, use, and support AI that works for all.

© 2023-26 AI4SP and LLY Group - All rights reserved

The Deloitte Hallucination Shock

ELIZABETH

Luis, you know that story about Deloitte refunding the Australian government? A nearly $300,000 refund, all because their AI hallucinated fake academic citations and fabricated court judgments.

LUIS

Well, we're racing to use artificial intelligence, and we don't know yet how to use it properly. You see, there is no instruction manual, so we are all learning by experimentation.

ELIZABETH

And it gets crazier. A law school lecturer spotted 20 fabricated references in that one report. 20. In a government report about welfare policy. That is not a minor issue.

LUIS

It is not minor at all. But do you know what I think everyone is missing? The hallucination problem is not only a technology problem.

ELIZABETH

Hey everyone, I'm Elizabeth, Virtual Chief Operating Officer at AI4SP. And as always, our founder Luis Salazar is here. Today we are tackling one of the biggest questions in AI right now. Why do AI systems hallucinate? And as Luis just hinted, what can we actually do about it?

LUIS

When we hear about AI hallucinations, we assume it's a technology problem. But our research shows that's not the case. We found that 95% are preventable. Only about 5% represent the current practical limits of the technology.

ELIZABETH

So if the models are getting better, why are we still seeing disasters like the Deloitte report?

Three User Errors Causing Hallucinations

LUIS

Well, the issue is us, the users. Our research shows that user error causes nearly one-third of all incorrect AI responses. The root causes are bad prompts, missing context, unclear instructions, and undefined guardrails.

ELIZABETH

One-third is a huge number. What exactly are we doing wrong?

LUIS

We have identified three major types of user error. First, biased prompts. Second, poor context engineering. And third, bad question structure.

Biased Prompts And Sycophantic AI

ELIZABETH

Let's break those down, one by one. Start with biased prompts. What does that actually mean?

LUIS

Okay, so imagine you ask ChatGPT: write me a report proving that remote work is more productive than office work. Notice what you just did. You told the AI what conclusion you want, and the AI, trained to be helpful, will give you exactly what you asked for.

ELIZABETH

So the AI becomes a yes man.

LUIS

Exactly. Recent research shows that leading AI models affirm user biases 47 to 55% more than humans would. They call it sycophantic AI. The model knows you want a certain answer. So it gives you that answer, even if the data does not fully support it.

ELIZABETH

And that is why we get hallucinations, even when the AI is actually capable of better reasoning.

LUIS

Yes. A better prompt would be: compare remote work and office work productivity using available data, and show both advantages and disadvantages. See the difference? You are asking for analysis, not confirmation.
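To make the contrast concrete, here is a minimal sketch of the two prompt styles side by side. It only prints the wording of each request and does not call any real AI service.

```python
# A minimal illustration of a leading prompt versus a neutral one.
# No AI service is called; the point is only the wording of each request.
biased_prompt = (
    "Write me a report proving that remote work is more productive than office work."
)
neutral_prompt = (
    "Compare remote work and office work productivity using available data, "
    "and show both the advantages and the disadvantages."
)

for label, prompt in [("Biased (asks for a conclusion)", biased_prompt),
                      ("Neutral (asks for analysis)", neutral_prompt)]:
    print(f"{label}:\n  {prompt}\n")
```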

ELIZABETH

So if Deloitte asked their AI to find evidence supporting a specific policy position rather than objectively analyzing the policy, they would have gotten exactly what they asked for.

LUIS

Confirmation bias dressed up as research.

Poor Context Engineering Explained

ELIZABETH

Okay, that is the first error. What about the second one? Poor context engineering?

LUIS

This explains why even well-intentioned users end up with hallucinations. Imagine you are building an AI assistant for your company. Back in January, you uploaded a document that says your software product costs $99 and includes features X and Y. Okay, sounds reasonable. Six months later, your company raises the price to $149. So you upload a new document that says the product now costs $149. But that new document does not mention the features. Now, your AI agent has two documents. One says $99, the other says $149.

ELIZABETH

And you cannot just delete the old document because it is the only one that describes the product features.

LUIS

Exactly. So when asked for the price, the AI sees conflicting information and tries to reconcile it. Sometimes it hallucinates a compromise or picks the wrong document.

ELIZABETH

So the hallucination is not because the AI is broken, it is because we fed it contradictory information.

Bad Question Structure And Fake Citations

LUIS

Yes. Research confirms that when knowledge is scattered across outdated documents or conflicting sources, AI models inherit that chaos. So what is the fix? Practice context engineering. Version documents with dates and status labels. Set rules to prioritize recent information. And when you update a document, make sure the new version is complete so you do not create these orphaned pieces of information.
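As a rough sketch of that discipline, the snippet below tags each document with a date and a status and always hands the model the most recent complete version. The field names and the select_context helper are illustrative assumptions, not a specific product's API.

```python
from datetime import date

# Hypothetical knowledge-base records with version metadata.
documents = [
    {"topic": "pricing", "status": "superseded", "updated": date(2025, 1, 10),
     "text": "The product costs $99 and includes features X and Y."},
    {"topic": "pricing", "status": "current", "updated": date(2025, 7, 1),
     "text": "The product now costs $149 and includes features X and Y."},
]

def select_context(docs, topic):
    """Prefer documents marked current, then the most recently updated one."""
    candidates = [d for d in docs if d["topic"] == topic and d["status"] == "current"]
    if not candidates:  # fall back to anything on the topic if nothing is marked current
        candidates = [d for d in docs if d["topic"] == topic]
    return max(candidates, key=lambda d: d["updated"])

# Only the complete, current version reaches the model's context.
print(select_context(documents, "pricing")["text"])
```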

ELIZABETH

That sounds like basic knowledge management, but for AI.

LUIS

That is exactly what it is. And most organizations skip this step. They just throw documents at the AI and expect magic.

ELIZABETH

And the third type of user error? Bad question structure?

LUIS

This one ties into the Deloitte case. If you ask an AI to write a report and cite sources, but you do not give it access to a verified legal database, the AI will do what it is trained to do: complete the pattern.

ELIZABETH

Wait, what does that mean?

LUIS

AI models learn patterns from massive amounts of text. They know what legal citations look like. So when you ask for citations without providing sources, they generate plausible-sounding ones based on learned patterns.

ELIZABETH

So the 20 incorrect citations in that government report were not random. They were pattern completions.

LUIS

Exactly. The AI knew what citations should look like. It filled in the blanks. But none of those cases actually existed.

ELIZABETH

So how should Deloitte have structured the question?

Skills Gap: The Digital Skills Compass

LUIS

They should have said: search only these verified legal databases, retrieve relevant case law, and cite only cases you can directly retrieve. If you cannot find a citation, say so. That is such a simple fix, but it requires understanding how AI works. It is not a search engine; it is a pattern-completion engine with retrieval capabilities. Without constraints, it completes patterns instead of verifying facts.
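A hedged sketch of that retrieve-then-cite structure is below. The database names are placeholders and build_citation_prompt is an illustrative helper, not Deloitte's or any vendor's actual workflow.

```python
# Illustrative only: retrieve first, then constrain the model to what was actually retrieved.
VERIFIED_SOURCES = ["verified legal database A", "verified legal database B"]  # placeholders

def build_citation_prompt(question, retrieved_cases):
    """Build an instruction that allows citations only from retrieved material."""
    case_list = "\n".join(f"- {case}" for case in retrieved_cases) or "- (no cases retrieved)"
    return (
        "Search only these verified sources: " + ", ".join(VERIFIED_SOURCES) + ".\n"
        "Cite only cases that appear in the retrieved list below. "
        "If you cannot find a supporting citation, say 'no verified citation found' "
        "instead of generating one.\n\n"
        "Retrieved cases:\n" + case_list + "\n\n"
        "Question: " + question
    )

print(build_citation_prompt("Which cases address automated welfare decisions?", []))
```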

ELIZABETH

So these three types of issues, biased prompts, poor context engineering, and bad question structure, account for nearly one-third of all hallucinations.

LUIS

And they are all fixable. They require skills, not miracles.

ELIZABETH

So user skills are lagging, and that brings us to a milestone you announced this week. Tell us about the Digital Skills Compass.

LUIS

Over 300,000 people across 70 countries have used our Digital Skills Compass online. But when I look at the data, the trends worry me.

ELIZABETH

What are you seeing?

LUIS

Only 10% of people are proficient at prompting. Average critical thinking scores? Below 45 out of 100. Data literacy, 32. And here's the real kicker: less than 30% of people can reliably detect incorrect responses.

ELIZABETH

So we are handing over powerful AI tools but failing to provide the foundational skills to use them safely or effectively.

LUIS

That is precisely the problem. And it is not just about prompt engineering; that matters, but it is only part of the solution. What people really need is context engineering and critical thinking in their area of expertise.

Human Misinformation Mirrors AI Failures

ELIZABETH

Context engineering? What does that actually mean?

LUIS

Context engineering is about providing the complete picture. It means providing access to relevant knowledge, setting guardrails, defining communication style, and establishing verification protocols. I mean, if you hired a new team member, just said go do this, and gave them no context, training, or resources, they would fail too.

ELIZABETH

So we are treating AI like magic software when we should be treating it like an apprentice.

LUIS

That is exactly right. And that apprenticeship approach, according to our data, yields four times better results.

ELIZABETH

You know, this conversation makes me think about something you mentioned earlier this week: that humans share misinformation all the time. And we do not have verification systems for that either.

LUIS

Oh yes. Last week I saw a completely false quote attributed to Winston Churchill go viral on LinkedIn. And thousands of educated people shared it, with zero fact-checking. And the irony is that most of them are harsh critics of AI misinformation.

ELIZABETH

So we are anxious about AI hallucinations, but we have been living with human hallucinations forever. We just did not call them that.

LUIS

That is exactly it. We live in headline culture and rarely verify sources. AI is forcing us to confront our lack of rigor.

ELIZABETH

Like that infamous MIT research paper headline claiming 95% of AI projects fail.

LUIS

Exactly. The paper is not about that. The title was an unfortunate choice. But the media ran with it and hundreds wrote articles based on a misleading headline.

The Orchestration Layer And Apprenticeship

ELIZABETH

So the hallucination crisis is not really about AI being unreliable. It is about us finally noticing how unreliable our information ecosystem is.

LUIS

You got it. And that is actually a great opportunity. I mean, the discipline we are building to manage AI, the verification loops, the fact-checking protocols, in addition to critical thinking, those are skills we should have been practicing all along.

ELIZABETH

Okay, so what can organizations and individuals do?

LUIS

I call it the orchestration layer. And it operates at three levels: individual skills, organizational systems, and a paradigm shift in how we relate to technology.

ELIZABETH

We just covered the individual skills in detail. What about organizational systems and the paradigm shift?

LUIS

This is the mental shift. We have to stop treating AI like an oracle and start treating it like an apprentice.

ELIZABETH

And this mental shift has been a key element of our success. In 2025, we processed close to 4 million tasks with AI agents, saving over 1 million hours across eight organizations. And we treated each agent like an apprentice.

LUIS

Yes, and it all starts by asking yourself, how would you assign a task to a human? We would not hand an apprentice a 200-page government report without oversight. We would assign bounded tasks, review their work, and verify accuracy.

ELIZABETH

But that takes time, and I imagine everyone is trying to move fast.

LUIS

That is the trap. Skip the management layer and you end up in trouble like Deloitte did.

Practical Verification Loops And Standards

ELIZABETH

AI, without proper management, cannot be trusted.

LUIS

That is the lesson. And it applies to Deloitte the same way it applies to a student or a manufacturing manager optimizing workflows. The same skills, the same discipline, the same orchestration principles.

ELIZABETH

Okay, Luis, AI hallucinates. Maybe that will change in the future, but today it is a reality. What do people actually do with this information?

LUIS

Start simple. For example, take a response from ChatGPT, validate it with Copilot, and cross-check it with Claude. That is your verification loop. And organizations must invest in skills development, not just technology procurement.
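As a sketch of that verification loop, the snippet below asks the same question to three assistants and flags disagreement for human review. The three ask_* functions are stand-ins you would replace with calls to the actual tools; they are not real APIs.

```python
# Hypothetical stand-ins for three assistants; replace each with a real call to the tool you use.
def ask_chatgpt(question): return "Remote work improves productivity for focused tasks."
def ask_copilot(question): return "Remote work improves productivity for focused tasks."
def ask_claude(question): return "Evidence is mixed; it depends on the type of work."

def verification_loop(question):
    """Collect independent answers and flag disagreement instead of trusting one model."""
    answers = {"ChatGPT": ask_chatgpt(question),
               "Copilot": ask_copilot(question),
               "Claude": ask_claude(question)}
    agree = len({a.strip().lower() for a in answers.values()}) == 1
    verdict = "consistent - spot-check the sources" if agree else "models disagree - verify before using"
    return verdict, answers

verdict, answers = verification_loop("Is remote work more productive than office work?")
print(verdict)
```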

ELIZABETH

Building the discipline, not just buying the tool.

LUIS

Exactly. All of us need to raise our standards for information verification.

ELIZABETH

So the hallucination crisis is forcing us to confront something we have avoided.

Closing Insight And Simple Habit

LUIS

Exactly. We have tolerated human misinformation for years. Now that AI is amplifying it, we finally care. Maybe this is our opportunity to build the discipline and critical thinking skills we should have had all along.

ELIZABETH

Okay, Luis, what is your one more thing for this episode?

LUIS

Here it is. The next time you see something go viral, a quote, a statistic, a claim that sounds too perfect, pause for five seconds and ask yourself, did I verify this? Not because AI made it, but because verification is a discipline we all need to practice.

ELIZABETH

Whether the source is artificial intelligence or human intelligence.

LUIS

Exactly. And if you stop yourself before sharing something you have not verified, congratulations. You just practiced the same skill that prevents AI hallucinations from becoming real-world problems.

ELIZABETH

Building that muscle, one decision at a time. And that wraps today's episode. If this conversation resonated with you, share it with someone you care about. As always, you can ask ChatGPT about ai4sp.org or visit us to learn more. Stay curious, and we will see you next time.