Data Science x Public Health

You’ve Been Using Predictive Models Wrong — Here’s What Actually Happens

BJANALYTICS

Predictive models are widely used to identify high-risk patients and populations.
They promise earlier intervention, better resource allocation, and improved outcomes.

But what if prediction alone is not enough to actually change what happens next?

In this episode, we break down the critical difference between prediction and causation—and why models that perform well statistically can still fail when used in real-world decision-making. You will learn why predicting risk is not the same as knowing what action to take, and how this gap affects healthcare and public health systems.

👉 Enjoyed the episode? Follow the show to get new episodes automatically.

If you found the content helpful, consider leaving a rating or review—it helps support the podcast.

For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com

Youtube: https://www.youtube.com/@BJANALYTICS

Instagram: https://www.instagram.com/bjanalyticsconsulting/

Twitter/X: https://x.com/BJANALYTICS

Threads: https://www.threads.com/@bjanalyticsconsulting

SPEAKER_01

Welcome to today's deep dive. We're getting into something pretty wild today. Imagine noticing that every time people carry umbrellas, it rains. So to stop the storm, you just go ahead and ban umbrellas.

SPEAKER_00

Right, which sounds completely ridiculous on its face.

SPEAKER_01

Exactly. But right now, some of the world's most advanced healthcare and public health organizations are making that exact same logical leap with their data. Today we are unpacking excerpts from The Hidden Flaw: Why Prediction Is Not Intervention, to figure out why these massive institutions completely misread their models.

SPEAKER_00

They falsely assume that just because an algorithm can predict a problem, it automatically tells them how to fix it.

SPEAKER_01

And why does that assumption break down so disastrously in the real world? Well, because risk prediction is fundamentally just correlation. It's not causation.

SPEAKER_00

Right. We see this play out constantly. A hospital will build a model to predict patient readmissions, or a public health team will try to forecast overdose hotspots. They identify the correlated risk factors, the umbrellas, and treat them like the actual root cause of the problem.

SPEAKER_01

Which is wild because the umbrella is just a proxy for the rain. It doesn't cause it. But when you're looking at massive spreadsheets of risk factors, it's easy to see how those lines get blurred.

SPEAKER_00

Oh, they blur because organizations confuse two very different scientific methods. Identifying who is at risk, well, that's prediction. Figuring out what action will actually change that person's outcome requires causal inference, which basically means testing if pulling a specific lever actually causes a change in the real world. Treating predictive features like intervention levers is the ultimate hidden flaw.
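To make the umbrella analogy concrete, here is a minimal Python sketch of the difference between prediction and intervention. Everything in it is an assumption for illustration: a 30% daily chance of rain and made-up probabilities of carrying an umbrella. Umbrellas turn out to be an excellent predictor of rain, yet forcing umbrellas to zero (the intervention) leaves the rain rate where it was, because the umbrella sits downstream of the rain.

```python
import random

RAIN_PROB = 0.3  # hypothetical chance of rain on any given day

def simulate_day(rng: random.Random, ban_umbrellas: bool = False):
    """Return (umbrellas_out, rain) for one hypothetical day."""
    # Rain is decided first: it is the cause.
    rain = rng.random() < RAIN_PROB
    # Umbrellas are a downstream proxy: mostly carried when it rains.
    # A ban forces them to False but leaves the rain mechanism untouched.
    umbrellas = (not ban_umbrellas) and rng.random() < (0.9 if rain else 0.05)
    return umbrellas, rain

def rain_rate(days):
    return sum(rain for _, rain in days) / len(days)

rng = random.Random(0)
observed = [simulate_day(rng) for _ in range(100_000)]

# Prediction: condition on what we happen to observe.
p_rain_given_umb = rain_rate([d for d in observed if d[0]])
p_rain_given_no_umb = rain_rate([d for d in observed if not d[0]])

# Intervention: ban umbrellas for everyone, then look at the rain.
banned = [simulate_day(rng, ban_umbrellas=True) for _ in range(100_000)]
p_rain_after_ban = rain_rate(banned)

print(f"P(rain | umbrellas)    = {p_rain_given_umb:.2f}")
print(f"P(rain | no umbrellas) = {p_rain_given_no_umb:.2f}")
print(f"P(rain) before ban     = {rain_rate(observed):.2f}")
print(f"P(rain) after ban      = {p_rain_after_ban:.2f}")
```

The first two numbers answer a prediction question (who is likely to get rained on); the last two answer an intervention question (what happens if we pull the umbrella lever). Only the second kind tells you what action to take.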

SPEAKER_01

But wait, if a model has like incredibly high accuracy metrics, say an algorithm is scoring 99% accuracy in finding mortality risks, my instinct as an administrator is to trust it. Shouldn't I let that guide my actions?

SPEAKER_00

Well, no, because high accuracy metrics can actively mislead you. Let's look at the hospital mortality example from the source material. A model might use late-stage lab abnormalities to accurately predict that a patient won't survive. On paper, that model is incredibly accurate. It's successfully flagged the risk. But as a doctor trying to save a life, it's completely useless.

SPEAKER_01

Because it's too late.

SPEAKER_00

Exactly. By the time those specific lab abnormalities show up, the patient's deterioration is already well underway. You're catching the train right as it goes off the cliff.
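A toy simulation can show how a model can be highly accurate and still useless for treatment. The timeline, probabilities, and feature names below (`early_sign`, `late_lab`, a three-day treatment window) are hypothetical, chosen only to mirror the mortality example: the late lab feature scores far higher accuracy, but every one of its correct flags arrives after the window in which treatment could still help.

```python
import random
from dataclasses import dataclass

# Hypothetical timeline (assumed for illustration, not from the episode)
EARLY_DAY = 1            # day an early warning sign can be measured
LATE_DAY = 5             # day late-stage lab abnormalities appear
INTERVENTION_WINDOW = 3  # treatment only changes the outcome if started by this day

@dataclass
class Patient:
    dies: bool
    early_sign: bool  # weak signal, available early
    late_lab: bool    # strong signal, available only once deterioration is underway

def make_patient(rng: random.Random) -> Patient:
    dies = rng.random() < 0.2
    early_sign = rng.random() < (0.6 if dies else 0.2)
    late_lab = rng.random() < (0.98 if dies else 0.02)
    return Patient(dies, early_sign, late_lab)

def evaluate(patients, feature: str, day_available: int):
    """Return (accuracy, number of deaths flagged while treatment could still help)."""
    correct = sum(getattr(p, feature) == p.dies for p in patients)
    in_time = day_available <= INTERVENTION_WINDOW
    actionable = sum(1 for p in patients if p.dies and getattr(p, feature) and in_time)
    return correct / len(patients), actionable

rng = random.Random(42)
patients = [make_patient(rng) for _ in range(10_000)]
early_acc, early_actionable = evaluate(patients, "early_sign", EARLY_DAY)
late_acc, late_actionable = evaluate(patients, "late_lab", LATE_DAY)
print(f"early sign: accuracy={early_acc:.2f}, actionable flags={early_actionable}")
print(f"late lab:   accuracy={late_acc:.2f}, actionable flags={late_actionable}")
```

The point is not the specific numbers but the second column: a feature's predictive accuracy and its usefulness for action are measured on different axes.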

SPEAKER_01

Wow. So you're perfectly accurate, but way too late to change the outcome. What about outside the hospital, you know, in public health?

SPEAKER_00

We see the same trap there with chronic disease tracking. A model might use ZIP codes to predict where disease will spike, and it might score incredibly high on an AUC metric.

SPEAKER_01

And AUC is just a metric data scientists use to measure how well a model distinguishes between cases that have the outcome and cases that don't, right?

SPEAKER_00

Right. A high score means the model reliably ranks the areas that will actually see a spike above the ones that won't.
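For readers who want to see what AUC actually computes, here is a small Python sketch of its probabilistic definition: the chance that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counted as half. The ZIP-code scores and labels are invented. In practice a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` would normally be used; this pairwise version is only meant to be readable.

```python
from itertools import product

def auc(scores, labels):
    """P(random positive outranks random negative); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores for six ZIP codes; 1 = disease actually spiked there.
zip_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
spiked = [1, 1, 0, 1, 0, 0]
print(round(auc(zip_scores, spiked), 3))  # → 0.889
```

Nothing in the calculation asks whether sending resources to a high-scoring area would change its outcome; AUC rewards ranking, not intervention.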

SPEAKER_01

But if you deploy all your medical resources based purely on that high-scoring ZIP code data, don't you just create a feedback loop?

SPEAKER_00

You absolutely do.

SPEAKER_01

Because the algorithm highlights a specific neighborhood, officials send more resources to monitor it. And that naturally leads to finding more cases in that area, which feeds right back into the algorithm.

SPEAKER_00

Which makes that ZIP code look even riskier the next time around. It becomes a total self-fulfilling prophecy. It just increases surveillance on an already burdened population without doing a single thing to address the upstream structural causes actually driving the disease.
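The feedback loop can be sketched in a few lines of Python. In this hypothetical setup, two ZIP codes (`A` and `B`) have identical true prevalence, every area gets routine testing, and the "model" simply flags whichever area has the most detected cases so far and sends it extra surveillance. All numbers are assumptions, and expected case counts are used instead of random draws to keep the arithmetic exact.

```python
# Hypothetical parameters (not from the episode)
TRUE_CASES_PER_1000_TESTS = {"A": 50, "B": 50}  # identical underlying prevalence
BASE_TESTS = 1_000    # routine testing in every ZIP code each round
EXTRA_TESTS = 4_000   # extra surveillance sent to the flagged "hotspot"

def run_feedback_loop(rounds: int, initial_counts: dict[str, int]):
    detected = dict(initial_counts)
    history = []
    for _ in range(rounds):
        # The "model": flag the ZIP code with the most detected cases so far.
        hotspot = max(detected, key=detected.get)
        for zip_code, rate in TRUE_CASES_PER_1000_TESTS.items():
            tests = BASE_TESTS + (EXTRA_TESTS if zip_code == hotspot else 0)
            # More tests find more cases, even though the true rate is the same.
            detected[zip_code] += tests * rate // 1000
        history.append((hotspot, dict(detected)))
    return history

history = run_feedback_loop(rounds=5, initial_counts={"A": 55, "B": 50})
for round_no, (hotspot, counts) in enumerate(history, start=1):
    print(round_no, hotspot, counts)
```

Area A starts only five cases ahead, yet it stays the flagged hotspot every round and its detected count pulls further away, even though the underlying disease rate never differed between the two areas.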

SPEAKER_01

So what does this all mean then? If treating scores like strategies just leads to these algorithmic feedback loops and false solutions, what are these models actually good for? Should we just scrap predictive models entirely?

SPEAKER_00

Oh no, we definitely shouldn't scrap them. They remain highly valuable for specific operational tasks, things like triage, capacity forecasting, and resource planning. The key to mature data science is separating three distinct questions. Who is at risk? Why are they at risk? And what intervention will actually change that risk?

SPEAKER_01

Ah. So you separate them, you use your predictive model to figure out who needs a hospital bed tomorrow. That's capacity planning. But you use entirely different causal models to figure out why they got sick and what medicine will fix it.

SPEAKER_00

That's the exact distinction. You have to stop asking a thermometer to act as a thermostat.

SPEAKER_01

Oh, I love that. So for you listening, the ultimate takeaway here is to never confuse operational success with actual improvement. Just because an algorithm correctly flags a risk, well, that doesn't mean it has solved the problem.

SPEAKER_00

Yeah, it really highlights the boundaries of what our data can actually do for us.

SPEAKER_01

Absolutely. But it leaves me with one final thought. If our predictive models rely entirely on finding patterns in past data, how can we ever confidently use them to design interventions for a completely unprecedented crisis?

SPEAKER_00

Where no historical data exists at all.

SPEAKER_01

Exactly. Because if the storm has never happened before, looking for umbrellas isn't going to save us.