Data Science x Public Health
This podcast introduces the concepts of data science and public health, then explores the intersection of the two fields in greater detail.
This AI Sounds Like an Expert… But It Might Be Lying
AI can now write like an experienced epidemiologist.
Clear. Structured. Confident.
But what happens when it’s wrong?
In this episode, we break down how large language models (LLMs) are being used in public health — from surveillance summaries to clinical decision support — and why their biggest strength is also their biggest risk.
You’ll learn:
What LLMs actually are (and what they’re not)
Where they’re already used in public health
Why hallucinations happen — and why they’re dangerous
The guardrails (RAG, validation, governance) that make them usable
The future of public health will use AI.
The real question is whether it can be trusted.
👉 Enjoyed the episode? Follow the show to get new episodes automatically.
If you found the content helpful, consider leaving a rating or review—it helps support the podcast.
For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com
Youtube: https://www.youtube.com/@BJANALYTICS
Instagram: https://www.instagram.com/bjanalyticsconsulting/
Twitter/X: https://x.com/BJANALYTICS
SPEAKER_01: Imagine you're a public health analyst. You're, you know, totally swamped, so you drop this mountain of surveillance data into an AI, just asking for a summary of the latest trends.
SPEAKER_00: Right. And seconds later, it hands you a perfectly structured report.
SPEAKER_01: Yeah, it reads like a veteran epidemiologist wrote it.
SPEAKER_00: I mean, the tone is authoritative, the numbers are highly specific, and the formatting is just, well, flawless.
SPEAKER_01: But then you check paragraph three. There is a statistic sitting there that simply does not exist. And it's backed by a pristine, perfectly formatted citation to a paper that was never actually published.
SPEAKER_00: Yeah, it's a complete hallucination.
SPEAKER_01: And that is our mission for today's deep dive into the source material. We are exploring the LLM paradox in public health. Okay, let's unpack this. We have this technology that produces incredibly competent-looking output, but it absolutely cannot be trusted without verification.
SPEAKER_00: Yeah, to really understand why this happens, we have to look at the mechanics of the tech itself.
SPEAKER_01: Right, because an LLM is really less like a digital researcher and more like a, uh, massively scaled-up version of your smartphone's autocomplete?
SPEAKER_00: That is honestly the perfect analogy. These models rely on what's called a transformer architecture. They aren't actually going into a database and pulling out facts.
SPEAKER_01: So they aren't searching for the truth.
SPEAKER_00: No, not at all. They are calculating the statistical probability of what the next token, or like a chunk of a word, should be, based on billions of parameters. They're essentially just pattern-matching engines.
SPEAKER_01: Okay. So when that mathematical pattern aligns with reality, the output is brilliant.
SPEAKER_00: Exactly. But when there's a gap in the data, the model doesn't just say, you know, "I don't know." It just predicts the next most likely word anyway, creating something fabricated but structurally flawless.
SPEAKER_01: Because it doesn't actually know what a fact is.
SPEAKER_00: Right. It only knows what a fact is supposed to look like. And because that output sounds so convincing, it's already being integrated into high-stakes public health workflows.
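[Editor's note: the "autocomplete at scale" mechanism described above can be sketched as a toy next-token predictor. This is an illustrative simplification, not a real transformer — bigram counts stand in for billions of learned parameters — and the corpus is invented for the demo. The point it shows is real: the model stores which token tends to follow which, not whether any statement is true.]

```python
from collections import Counter, defaultdict

# Toy "language model": bigram counts stand in for a transformer's
# parameters. It stores only which token tends to follow which --
# there is no database of facts anywhere in it.
corpus = (
    "the outbreak was reported in the region . "
    "the outbreak was contained in the region . "
    "the outbreak was reported in march . "
    "the vaccine was reported effective ."
).split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the statistically most likely next token."""
    return follows[token].most_common(1)[0][0]

def generate(start, n=5):
    """Repeatedly predict the next token -- autocomplete at scale zero."""
    out = [start]
    for _ in range(n):
        out.append(predict_next(out[-1]))
    return " ".join(out)

print(generate("the"))  # "the outbreak was reported in the"
```

The model happily produces "the outbreak was reported ..." whether or not any outbreak occurred, because that is the most probable pattern in its training data. Scale this mechanism up and you get output that is fluent and confident but only accidentally factual.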
SPEAKER_01: We actually see that in the Boston University research, where they are using AI to summarize weekly surveillance reports.
SPEAKER_00: And these models are translating dense CDC jargon and even powering public-facing chatbots for the World Health Organization.
SPEAKER_01: Here's where it gets really interesting, though. If the WHO is deploying this, there must be heavy safeguards, right? I mean, the hallucination rate can't be that catastrophic in specialized settings.
SPEAKER_00: Well, you would hope so, but studies show between 15 and 30 percent of factual health claims generated by standard models are entirely fabricated.
SPEAKER_01: Wait, 30 percent? That is massive.
SPEAKER_00: It is. And the real danger isn't the absurd, glaring errors. Like, if an AI claims the maternal mortality rate is 5,000 per 100,000, any analyst flags that immediately.
SPEAKER_01: Sure. It's obviously wrong.
SPEAKER_00: But the danger is when it invents a rate of, say, 28.3 instead of the actual 22.3.
SPEAKER_01: Oh wow. Because a subtle error like that just gets rubber-stamped, and suddenly regional health resources are silently being misdirected.
SPEAKER_00: Exactly. Which brings us to a fascinating study in Nature Medicine that really highlights the core of the LLM paradox.
SPEAKER_01: Is this the one about diagnostic accuracy?
SPEAKER_00: Yes. So in isolation, the AI they tested achieved a 95% diagnostic accuracy. But when human clinicians used that exact same AI as an assistant, their performance actually dropped.
SPEAKER_01: Wait, really? If the tool is 95% accurate, shouldn't the human using it at least match that baseline?
SPEAKER_00: It comes down to a psychological phenomenon called automation bias. The clinicians saw this hyper-confident, well-structured reasoning from the AI and basically just turned their brains off.
SPEAKER_01: They just accepted the subtle mistakes rather than trusting their own expertise.
SPEAKER_00: Exactly. The broken link wasn't the AI, it was the human-AI interaction.
SPEAKER_01: So what does this all mean? Is the solution simply banning the tech in healthcare?
SPEAKER_00: No, we really can't ban it, because the efficiency gains are just too massive. The solution is implementing strict, non-negotiable guardrails.
SPEAKER_01: Like data privacy. Because I know consumer ChatGPT use can trigger huge penalties.
SPEAKER_00: Oh, absolutely. Pasting patient data into an open model can trigger HIPAA fines up to $50,000 per incident. You need isolated, enterprise-level environments.
SPEAKER_01: But an isolated environment doesn't magically stop the AI from hallucinating a fake citation, does it?
SPEAKER_00: No, it doesn't. To fix the hallucination problem, you have to intercept that autocomplete mechanism. And the current gold standard for this is called RAG, or retrieval-augmented generation.
SPEAKER_01: RAG. Okay. How does RAG physically change what the model does?
SPEAKER_00: Instead of letting the AI generate answers based on the vast, messy internet data it was trained on, RAG forces a middle step. When you ask a question, the system first searches a curated, verified knowledge base.
SPEAKER_01: Like a private server of CDC guidelines.
SPEAKER_00: Exactly. It retrieves those specific documents, feeds them to the LLM, and instructs the model to only summarize what is in those documents.
SPEAKER_01: Ah, so you constrain its vocabulary to actual evidence. You shrink its universe of information so it literally cannot mathematically wander off into fiction.
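[Editor's note: the retrieve-then-constrain step described above can be sketched in a few lines. This is a deliberately minimal illustration, not a production RAG system: the knowledge base is invented stand-in text, retrieval is naive keyword overlap rather than the embedding search real systems use, and no actual LLM is called — the sketch only shows where retrieval and the "summarize only this evidence" instruction sit in the pipeline.]

```python
# Minimal RAG sketch: search a curated knowledge base first, then
# build a prompt that restricts the model to the retrieved evidence.
KNOWLEDGE_BASE = [
    # Stand-ins for verified documents, e.g. internal guideline excerpts.
    "Measles vaccination coverage of 95 percent is needed for herd immunity.",
    "Influenza surveillance reports are published weekly during flu season.",
    "Hand hygiene reduces transmission of many respiratory pathogens.",
]

def retrieve(query, k=2):
    """Rank documents by naive keyword overlap with the query.
    Real systems use embedding similarity; overlap keeps the sketch simple."""
    q = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query):
    """Feed retrieved evidence to the LLM and restrict it to that evidence."""
    evidence = "\n".join(f"- {doc}" for doc in retrieve(query))
    return (
        "Answer using ONLY the evidence below. "
        "If the evidence is insufficient, say 'I don't know.'\n"
        f"Evidence:\n{evidence}\n"
        f"Question: {query}"
    )

print(build_prompt("What vaccination coverage is needed for measles herd immunity?"))
```

The key design point is that the model's universe is shrunk before generation ever starts: it is handed a small set of verified passages and explicitly told to refuse when they don't contain the answer, rather than being free to autocomplete from its training data.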
SPEAKER_00: Yes. And looking ahead, we are seeing models like Med-PaLM that are explicitly fine-tuned on medical literature to further reduce errors.
SPEAKER_01: But until that technology matures, the consensus across our sources is clear. We should treat LLMs as research accelerators. Use them to draft protocols, find literature gaps, or, you know, generate hypotheses.
SPEAKER_00: But never use them to make final decisions. The human has to stay actively in the loop.
SPEAKER_01: Right. So the major takeaway from our deep dive today is that LLMs amplify whatever we bring to them, both our confidence and our carelessness. The technology is just a tool. The discipline we apply to it is the actual variable.
SPEAKER_00: Which leaves us with a pretty critical question moving forward.
SPEAKER_01: Yeah, a provocative thought for you, the listener, to mull over. If an AI is fundamentally just predicting the most statistically likely words based purely on historical training data, are we risking hard-coding all of our past medical biases and blind spots directly into the future of public health?
SPEAKER_00: Definitely something to think about the next time you see a perfectly formatted report that sounds just a little too confident.