Data Science x Public Health

This AI Sounds Like an Expert… But It Might Be Lying

BJANALYTICS

AI can now write like an experienced epidemiologist.
Clear. Structured. Confident.
But what happens when it’s wrong?

In this episode, we break down how large language models (LLMs) are being used in public health — from surveillance summaries to clinical decision support — and why their biggest strength is also their biggest risk.

You’ll learn:
What LLMs actually are (and what they’re not)
Where they’re already used in public health
Why hallucinations happen — and why they’re dangerous
The guardrails (RAG, validation, governance) that make them usable
The future of public health will use AI.
The real question is whether it can be trusted.

👉 Enjoyed the episode? Follow the show to get new episodes automatically.

If you found the content helpful, consider leaving a rating or review—it helps support the podcast.

For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com

Youtube: https://www.youtube.com/@BJANALYTICS

Instagram: https://www.instagram.com/bjanalyticsconsulting/

Twitter/X: https://x.com/BJANALYTICS

Threads: https://www.threads.com/@bjanalyticsconsulting

SPEAKER_01

Imagine you're a public health analyst. You're, you know, totally swamped, so you drop this mountain of surveillance data into an AI, just asking for a summary of the latest trends.

SPEAKER_00

Right. And seconds later, it hands you a perfectly structured report.

SPEAKER_01

Yeah, it reads like a veteran epidemiologist wrote it.

SPEAKER_00

I mean the tone is authoritative, the numbers are highly specific, and the formatting is just, well, flawless.

SPEAKER_01

But then you check paragraph three. There is a statistic sitting there that simply does not exist. And it's backed by a pristine, perfectly formatted citation to a paper that was never actually published.

SPEAKER_00

Yeah, it's a complete hallucination.

SPEAKER_01

And that is our mission for today's deep dive into the source material. We are exploring the LLM paradox in public health. Okay, let's unpack this. We have this technology that produces incredibly competent looking output, but it absolutely cannot be trusted without verification.

SPEAKER_00

Yeah, to really understand why this happens, we have to look at the mechanics of the tech itself.

SPEAKER_01

Right, because is an LLM really less like a digital researcher and more like, uh, a massively scaled-up version of your smartphone's autocomplete?

SPEAKER_00

That is honestly the perfect analogy. These models rely on what's called a transformer architecture. They aren't actually going into a database and pulling out facts.

SPEAKER_01

So they aren't searching for the truth.

SPEAKER_00

No, not at all. They are calculating the statistical probability of what the next token, basically a chunk of a word, should be, based on billions of parameters. They're essentially just pattern-matching engines.
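
[Editor's note: the "autocomplete" mechanics described here can be sketched in a few lines. This is a toy illustration, not a real transformer; the context, candidate tokens, and probabilities are all invented for the example.]

```python
# Toy sketch of next-token prediction: the model only ranks continuations
# by likelihood. It has no notion of whether the chosen token is true.
# All probabilities below are invented for illustration.
next_token_probs = {
    ("the", "maternal", "mortality", "rate", "is"): {
        "22.3": 0.21,     # the correct figure in this hypothetical
        "28.3": 0.19,     # nearly as "plausible" to the model
        "unknown": 0.02,  # "I don't know" is rarely the likeliest token
    },
}

def predict_next(context):
    """Return the most probable next token for a context, like autocomplete."""
    probs = next_token_probs[context]
    return max(probs, key=probs.get)

print(predict_next(("the", "maternal", "mortality", "rate", "is")))  # "22.3"
```

Note how close the second-ranked token is: a slightly different training distribution would make the fabricated "28.3" the top prediction, and the output would look exactly as confident.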

SPEAKER_01

Okay. So when that mathematical pattern aligns with reality, the output is brilliant.

SPEAKER_00

Exactly. But when there's a gap in the data, the model doesn't just say, you know, I don't know. It just predicts the next most likely word anyway, creating something fabricated but structurally flawless.

SPEAKER_01

Because it doesn't actually know what a fact is.

SPEAKER_00

Right. It only knows what a fact is supposed to look like. And because that output sounds so convincing, it's already being integrated into high-stakes public health workflows.

SPEAKER_01

We actually see that in the Boston University research, where they are using AI to summarize weekly surveillance reports.

SPEAKER_00

And these models are translating dense CDC jargon and even powering public-facing chatbots for the World Health Organization.

SPEAKER_01

Here's where it gets really interesting, though. If the WHO is deploying this, there must be heavy safeguards, right? I mean, the hallucination rate can't be that catastrophic in specialized settings.

SPEAKER_00

Well, you would hope so, but studies show between 15 and 30 percent of factual health claims generated by standard models are entirely fabricated.

SPEAKER_01

Wait, 30%? That is massive.

SPEAKER_00

It is. And the real danger isn't the absurd, glaring errors. Like if an AI claims the maternal mortality rate is 5,000 per 100,000, any analyst flags that immediately.

SPEAKER_01

Sure. It's obviously wrong.

SPEAKER_00

But the danger is when it invents a rate of, say, 28.3 instead of the actual 22.3.

SPEAKER_01

Oh wow. Because a subtle error like that just gets rubber stamped and suddenly regional health resources are silently being misdirected.

SPEAKER_00

Exactly. Which brings us to a fascinating study in Nature Medicine that really highlights the core of the LLM paradox.

SPEAKER_01

Is this the one about diagnostic accuracy?

SPEAKER_00

Yes. So in isolation, the AI they tested achieved 95% diagnostic accuracy. But when human clinicians used that exact same AI as an assistant, their performance actually dropped.

SPEAKER_01

Wait, really? If the tool is 95% accurate, shouldn't the human using it at least match that baseline?

SPEAKER_00

It comes down to a psychological phenomenon called automation bias. The clinicians saw this hyper-confident, well-structured reasoning from the AI and basically just turned their brains off.

SPEAKER_01

They just accepted the subtle mistakes rather than trusting their own expertise.

SPEAKER_00

Exactly. The broken link wasn't the AI, it was the human AI interaction.

SPEAKER_01

So what does this all mean? Is the solution simply banning the tech in healthcare?

SPEAKER_00

No, we really can't ban it because the efficiency gains are just too massive. The solution is implementing strict, non-negotiable guardrails.

SPEAKER_01

Like data privacy. Because I know consumer ChatGPT use can trigger huge penalties.

SPEAKER_00

Oh, absolutely. Pasting patient data into an open model can trigger HIPAA fines of up to $50,000 per incident. You need isolated, enterprise-level environments.

SPEAKER_01

But an isolated environment doesn't magically stop the AI from hallucinating a fake citation, does it?

SPEAKER_00

No, it doesn't. To fix the hallucination problem, you have to intercept that autocomplete mechanism. And the current gold standard for this is called RAG, or retrieval augmented generation.

SPEAKER_01

RAG. Okay. How does RAG physically change what the model does?

SPEAKER_00

Instead of letting the AI generate answers based on the vast, messy internet data it was trained on, RAG forces a middle step. When you ask a question, the system first searches a curated, verified knowledge base.

SPEAKER_01

Like a private server of CDC guidelines.

SPEAKER_00

Exactly. It retrieves those specific documents, feeds them to the LLM, and instructs the model to only summarize what is in those documents.

SPEAKER_01

Ah, so you constrain its vocabulary to actual evidence. You shrink its universe of information so it literally cannot mathematically wander off into fiction.
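
[Editor's note: the retrieve-then-constrain loop described above can be sketched in miniature. This is a hypothetical illustration: the document store, the keyword-overlap scoring, and the prompt wording are all stand-ins; a production RAG system would use vector embeddings, a real document index, and an actual LLM call.]

```python
# Minimal RAG sketch: retrieve from a curated, verified corpus first,
# then constrain the model's prompt to only that retrieved evidence.
verified_docs = {
    "cdc_measles_2024": "CDC guideline: report suspected measles cases within 24 hours.",
    "cdc_flu_2024": "CDC guideline: weekly influenza surveillance summaries are due Fridays.",
}

def retrieve(question, docs, k=1):
    """Rank documents by naive keyword overlap with the question (a stand-in
    for real embedding-based similarity search)."""
    def score(text):
        return len(set(question.lower().split()) & set(text.lower().split()))
    return sorted(docs.values(), key=score, reverse=True)[:k]

def build_prompt(question, docs):
    """Constrain the model: answer only from the retrieved evidence."""
    context = "\n".join(retrieve(question, docs))
    return (
        "Answer using ONLY the sources below. If the answer is not present, "
        f"say that it is not in the sources.\n\nSources:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_prompt("When must suspected measles cases be reported?", verified_docs))
```

The key design point is the middle step: the model never answers from its open-ended training distribution; it is handed a small, verified universe of text and told to stay inside it, which is exactly the "shrinking" described above.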

SPEAKER_00

Yes. And looking ahead, we are seeing models like Med-PaLM that are explicitly fine-tuned on medical literature to further reduce errors.

SPEAKER_01

But until that technology matures, the consensus across our sources is clear. We should treat LLMs as research accelerators. Use them to draft protocols, find literature gaps, or, you know, generate hypotheses.

SPEAKER_00

But never use them to make final decisions. The human has to stay actively in the loop.

SPEAKER_01

Right. So the major takeaway from our deep dive today is that LLMs amplify whatever we bring to them, both our confidence and our carelessness. The technology is just a tool. The discipline we apply to it is the actual variable.

SPEAKER_00

Which leaves us with a pretty critical question moving forward.

SPEAKER_01

Yeah, provocative thought for you, the listener, to mull over. If an AI is fundamentally just predicting the most statistically likely words based purely on historical training data, are we risking hard coding all of our past medical biases and blind spots directly into the future of public health?

SPEAKER_00

Definitely something to think about the next time you see a perfectly formatted report that sounds just a little too confident.