Data Science x Public Health

This AI Sounds Like an Expert… But It Might Be Lying

BJANALYTICS

AI can now write like an experienced epidemiologist.
Clear. Structured. Confident.
But what happens when it’s wrong?

In this episode, we break down how large language models (LLMs) are being used in public health — from surveillance summaries to clinical decision support — and why their biggest strength is also their biggest risk.

You’ll learn:
What LLMs actually are (and what they’re not)
Where they’re already used in public health
Why hallucinations happen — and why they’re dangerous
The guardrails (RAG, validation, governance) that make them usable
The future of public health will use AI.
The real question is whether it can be trusted.

👉 Enjoyed the episode? Follow the show to get new episodes automatically.

If you found the content helpful, consider leaving a rating or review—it helps support the podcast.

For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com

Youtube: https://www.youtube.com/@BJANALYTICS

Instagram: https://www.instagram.com/bjanalyticsconsulting/

Twitter/X: https://x.com/BJANALYTICS

Threads: https://www.threads.com/@bjanalyticsconsulting

SPEAKER_01

Imagine you're a public health analyst. You're, you know, totally swamped, so you drop this mountain of surveillance data into an AI, just asking for a summary of the latest trends.

SPEAKER_00

Right. And seconds later, it hands you a perfectly structured report.

SPEAKER_01

Yeah, it reads like a veteran epidemiologist wrote it.

SPEAKER_00

I mean the tone is authoritative, the numbers are highly specific, and the formatting is just, well, flawless.

SPEAKER_01

But then you check paragraph three. There is a statistic sitting there that simply does not exist. And it's backed by a pristine, perfectly formatted citation to a paper that was never actually published.

SPEAKER_00

Yeah, it's a complete hallucination.

SPEAKER_01

And that is our mission for today's deep dive into the source material. We are exploring the LLM paradox in public health. Okay, let's unpack this. We have this technology that produces incredibly competent looking output, but it absolutely cannot be trusted without verification.

SPEAKER_00

Yeah, to really understand why this happens, we have to look at the mechanics of the tech itself.

SPEAKER_01

Right, because is an LLM really less like a digital researcher and more like, uh, a massively scaled-up version of your smartphone's autocomplete?

SPEAKER_00

That is honestly the perfect analogy. These models rely on what's called a transformer architecture. They aren't actually going into a database and pulling out facts.

SPEAKER_01

So they aren't searching for the truth.

SPEAKER_00

No, not at all. They are calculating the statistical probability of what the next token, basically a chunk of a word, should be, based on billions of parameters. They're essentially just pattern-matching engines.
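
[Editor's note: the "autocomplete" mechanics described here can be sketched in a few lines. This is a toy illustration, not a real transformer; the context, candidate tokens, and probabilities are all invented for the example.]

```python
# Toy sketch of next-token prediction: the model only ranks continuations
# by likelihood. It has no notion of whether the chosen token is true.
# All probabilities below are invented for illustration.
next_token_probs = {
    ("the", "maternal", "mortality", "rate", "is"): {
        "22.3": 0.21,     # the correct figure in this hypothetical
        "28.3": 0.19,     # nearly as "plausible" to the model
        "unknown": 0.02,  # "I don't know" is rarely the likeliest token
    },
}

def predict_next(context):
    """Return the most probable next token for a context, like autocomplete."""
    probs = next_token_probs[context]
    return max(probs, key=probs.get)

print(predict_next(("the", "maternal", "mortality", "rate", "is")))  # "22.3"
```

Note how close the second-ranked token is: a slightly different training distribution would make the fabricated "28.3" the top prediction, and the output would look exactly as confident.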

SPEAKER_01

Okay. So when that mathematical pattern aligns with reality, the output is brilliant.

SPEAKER_00

Exactly. But when there's a gap in the data, the model doesn't just say, you know, I don't know. It just predicts the next most likely word anyway, creating something fabricated but structurally flawless.

SPEAKER_01

Because it doesn't actually know what a fact is.

SPEAKER_00

Right. It only knows what a fact is supposed to look like. And because that output sounds so convincing, it's already being integrated into high-stakes public health workflows.

SPEAKER_01

We actually see that in the Boston University research, where they are using AI to summarize weekly surveillance reports.

SPEAKER_00

And these models are translating dense CDC jargon and even powering public-facing chatbots for the World Health Organization.

SPEAKER_01

Here's where it gets really interesting, though. If the WHO is deploying this, there must be heavy safeguards, right? I mean, the hallucination rate can't be that catastrophic in specialized settings.

SPEAKER_00

Well, you would hope so, but studies show between 15 and 30 percent of factual health claims generated by standard models are entirely fabricated.

SPEAKER_01

Wait, 30%? That is massive.

SPEAKER_00

It is. And the real danger isn't the absurd, glaring errors. Like if an AI claims the maternal mortality rate is 5,000 per 100,000, any analyst flags that immediately.

SPEAKER_01

Sure. It's obviously wrong.

SPEAKER_00

But the danger is when it invents a rate of, say, 28.3 instead of the actual 22.3.

SPEAKER_01

Oh wow. Because a subtle error like that just gets rubber stamped and suddenly regional health resources are silently being misdirected.

SPEAKER_00

Exactly. Which brings us to a fascinating study in Nature Medicine that really highlights the core of the LLM paradox.

SPEAKER_01

Is this the one about diagnostic accuracy?

SPEAKER_00

Yes. So in isolation, the AI they tested achieved 95% diagnostic accuracy. But when human clinicians used that exact same AI as an assistant, their performance actually dropped.

SPEAKER_01

Wait, really? If the tool is 95% accurate, shouldn't the human using it at least match that baseline?

SPEAKER_00

It comes down to a psychological phenomenon called automation bias. The clinicians saw this hyper-confident, well-structured reasoning from the AI and basically just turned their brains off.

SPEAKER_01

They just accepted the subtle mistakes rather than trusting their own expertise.

SPEAKER_00

Exactly. The broken link wasn't the AI, it was the human AI interaction.

SPEAKER_01

So what does this all mean? Is the solution simply banning the tech in healthcare?

SPEAKER_00

No, we really can't ban it because the efficiency gains are just too massive. The solution is implementing strict, non-negotiable guardrails.

SPEAKER_01

Like data privacy. Because I know consumer ChatGPT use can trigger huge penalties.

SPEAKER_00

Oh, absolutely. Pasting patient data into an open model can trigger HIPAA fines of up to $50,000 per incident. You need isolated, enterprise-level environments.

SPEAKER_01

But an isolated environment doesn't magically stop the AI from hallucinating a fake citation, does it?

SPEAKER_00

No, it doesn't. To fix the hallucination problem, you have to intercept that autocomplete mechanism. And the current gold standard for this is called RAG, or retrieval augmented generation.

SPEAKER_01

RAG. Okay. How does RAG physically change what the model does?

SPEAKER_00

Instead of letting the AI generate answers based on the vast, messy internet data it was trained on, RAG forces a middle step. When you ask a question, the system first searches a curated, verified knowledge base.

SPEAKER_01

Like a private server of CDC guidelines.

SPEAKER_00

Exactly. It retrieves those specific documents, feeds them to the LLM, and instructs the model to only summarize what is in those documents.

SPEAKER_01

Ah, so you constrain its vocabulary to actual evidence. You shrink its universe of information so it literally cannot mathematically wander off into fiction.
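
[Editor's note: the retrieve-then-constrain loop described above can be sketched in miniature. This is a hypothetical illustration: the document store, the keyword-overlap scoring, and the prompt wording are all stand-ins; a production RAG system would use vector embeddings, a real document index, and an actual LLM call.]

```python
# Minimal RAG sketch: retrieve from a curated, verified corpus first,
# then constrain the model's prompt to only that retrieved evidence.
verified_docs = {
    "cdc_measles_2024": "CDC guideline: report suspected measles cases within 24 hours.",
    "cdc_flu_2024": "CDC guideline: weekly influenza surveillance summaries are due Fridays.",
}

def retrieve(question, docs, k=1):
    """Rank documents by naive keyword overlap with the question (a stand-in
    for real embedding-based similarity search)."""
    def score(text):
        return len(set(question.lower().split()) & set(text.lower().split()))
    return sorted(docs.values(), key=score, reverse=True)[:k]

def build_prompt(question, docs):
    """Constrain the model: answer only from the retrieved evidence."""
    context = "\n".join(retrieve(question, docs))
    return (
        "Answer using ONLY the sources below. If the answer is not present, "
        f"say that it is not in the sources.\n\nSources:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_prompt("When must suspected measles cases be reported?", verified_docs))
```

The key design point is the middle step: the model never answers from its open-ended training distribution; it is handed a small, verified universe of text and told to stay inside it, which is exactly the "shrinking" described above.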

SPEAKER_00

Yes. And looking ahead, we are seeing models like Med-PaLM that are explicitly fine-tuned on medical literature to further reduce errors.

SPEAKER_01

But until that technology matures, the consensus across our sources is clear. We should treat LLMs as research accelerators. Use them to draft protocols, find literature gaps, or, you know, generate hypotheses.

SPEAKER_00

But never use them to make final decisions. The human has to stay actively in the loop.

SPEAKER_01

Right. So the major takeaway from our deep dive today is that LLMs amplify whatever we bring to them, both our confidence and our carelessness. The technology is just a tool. The discipline we apply to it is the actual variable.

SPEAKER_00

Which leaves us with a pretty critical question moving forward.

SPEAKER_01

Yeah, provocative thought for you, the listener, to mull over. If an AI is fundamentally just predicting the most statistically likely words based purely on historical training data, are we risking hard coding all of our past medical biases and blind spots directly into the future of public health?

SPEAKER_00

Definitely something to think about the next time you see a perfectly formatted report that sounds just a little too confident.