Data Science x Public Health
This podcast discusses the concepts of data science and public health, and then delves into their intersection, exploring the connection between the two fields in greater detail.
You’ve Been Using Predictive Models Wrong — Here’s What Actually Happens
Predictive models are widely used to identify high-risk patients and populations.
They promise earlier intervention, better resource allocation, and improved outcomes.
But what if prediction alone is not enough to actually change what happens next?
In this episode, we break down the critical difference between prediction and causation—and why models that perform well statistically can still fail when used in real-world decision-making. You will learn why predicting risk is not the same as knowing what action to take, and how this gap affects healthcare and public health systems.
👉 Enjoyed the episode? Follow the show to get new episodes automatically.
If you found the content helpful, consider leaving a rating or review—it helps support the podcast.
For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com
YouTube: https://www.youtube.com/@BJANALYTICS
Instagram: https://www.instagram.com/bjanalyticsconsulting/
Twitter/X: https://x.com/BJANALYTICS
SPEAKER_01: Welcome to today's deep dive. We're getting into something pretty wild today. Imagine noticing that every time people carry umbrellas, it rains. So to stop the storm, you just go ahead and ban umbrellas.
SPEAKER_00: Right, which sounds completely ridiculous on its face.
SPEAKER_01: Exactly. But right now, some of the world's most advanced healthcare and public health organizations are making that exact same logical leap with their data. Today we are unpacking excerpts from "The Hidden Flaw: Why Prediction Is Not Intervention" to figure out why these massive institutions completely misread their models.
SPEAKER_00: They falsely assume that just because an algorithm can predict a problem, it automatically tells them how to fix it.
SPEAKER_01: And why does that assumption break down so disastrously in the real world? Well, because risk prediction is fundamentally just correlation. It's not causation.
SPEAKER_00: Right. We see this play out constantly. A hospital will build a model to predict patient readmissions, or a public health team will try to forecast overdose hotspots. They identify the correlated risk factors, the umbrellas, and treat them like the actual root cause of the problem.
SPEAKER_01: Which is wild because the umbrella is just a proxy for the rain. It doesn't cause it. But when you're looking at massive spreadsheets of risk factors, it's easy to see how those lines get blurred.
SPEAKER_00: Oh, they blur because organizations confuse two very different scientific methods. Identifying who is at risk, well, that's prediction. Figuring out what action will actually change that person's outcome requires causal inference, which basically means testing if pulling a specific lever actually causes a change in the real world. Treating predictive features like intervention levers is the ultimate hidden flaw.
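To make the umbrella point concrete, here is a minimal Python sketch that is not taken from the episode. It assumes a toy world in which rain causes umbrella use and never the reverse. The 30% rain rate, the 90%/10% umbrella probabilities, and the function name `simulate` are all invented for illustration.

```python
import random


def simulate(n_days=10_000, ban_umbrellas=False, seed=0):
    """Toy world (assumed): rain causes umbrellas, umbrellas cause nothing."""
    rng = random.Random(seed)
    rain_days = umbrella_days = rain_and_umbrella = 0
    for _ in range(n_days):
        rain = rng.random() < 0.30  # assumed base rate of rain
        # People mostly carry umbrellas when it rains (assumed probabilities).
        wants_umbrella = rng.random() < (0.90 if rain else 0.10)
        # The "intervention": a ban removes umbrellas but cannot touch the weather.
        umbrella = wants_umbrella and not ban_umbrellas
        rain_days += rain
        umbrella_days += umbrella
        rain_and_umbrella += rain and umbrella
    return {
        "rain_rate": rain_days / n_days,
        "rain_rate_given_umbrella": (
            rain_and_umbrella / umbrella_days if umbrella_days else None
        ),
    }


observed = simulate()
banned = simulate(ban_umbrellas=True)

print("P(rain):", observed["rain_rate"])
print("P(rain | umbrella):", observed["rain_rate_given_umbrella"])
print("P(rain) after banning umbrellas:", banned["rain_rate"])
```

In this sketch, seeing an umbrella sharply raises the chance that it is raining, so the umbrella is a strong predictor. Banning umbrellas leaves the rain rate exactly where it was, because the feature was never a lever.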
SPEAKER_01: But wait, if a model has like incredibly high accuracy metrics, say an algorithm is scoring 99% accuracy in finding mortality risks, my instinct as an administrator is to trust it. Shouldn't I let that guide my actions?
SPEAKER_00: Well, no, because high accuracy metrics can actively mislead you. Let's look at the hospital mortality example from the source material. A model might use late-stage lab abnormalities to accurately predict that a patient won't survive. On paper, that model is incredibly accurate. It has successfully flagged the risk. But for a doctor trying to save a life, it's completely useless.
SPEAKER_01: Because it's too late.
SPEAKER_00: Exactly. By the time those specific lab abnormalities show up, the patient's deterioration is already well underway. You're catching the train right as it goes off the cliff.
SPEAKER_01: Wow. So you're perfectly accurate, but way too late to change the outcome. What about outside the hospital, you know, in public health?
SPEAKER_00: We see the same trap there with chronic disease tracking. A model might use ZIP codes to predict where disease will spike, and it might score incredibly high on an AUC metric.
SPEAKER_01: And AUC, the area under the ROC curve, is just a metric data scientists use to measure how well a model distinguishes between positive and negative cases, right?
SPEAKER_00: A high score means the model reliably ranks the truly high-risk areas above the low-risk ones.
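For readers who want to see what AUC actually measures, the following is a small sketch using its pairwise definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The scores and labels are made up for illustration, and the helper name `auc` is hypothetical rather than a library function.

```python
def auc(scores, labels):
    """AUC via its pairwise definition: how often a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Illustrative (invented) risk scores for six areas; label 1 = disease spike occurred.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]

print("AUC:", auc(scores, labels))
# Rescaling every score preserves the ranking, so the AUC does not change.
print("AUC after dividing scores by 10:", auc([s / 10 for s in scores], labels))
```

Because only the ordering of the scores matters, a high AUC says the ranking is good. It says nothing about which action would lower anyone's risk.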
SPEAKER_01: But if you deploy all your medical resources based purely on that high-scoring ZIP code data, don't you just create a feedback loop?
SPEAKER_00: You absolutely do.
SPEAKER_01: Because the algorithm highlights a specific neighborhood, officials send more resources to monitor it. And that naturally leads to finding more cases in that area, which feeds right back into the algorithm.
SPEAKER_00: Making that ZIP code look even riskier the next time around. It becomes a total self-fulfilling prophecy. It just increases surveillance on an already burdened population without doing a single thing to address the upstream structural causes actually driving the disease.
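The feedback loop described here can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a model from the episode: it assumes two hypothetical ZIP codes with identical true prevalence, a small historical imbalance in testing, expected case counts instead of random draws, and a rule that moves 10% of the other area's tests to whichever area the data flags.

```python
# Both areas have the SAME true prevalence by construction (assumed values).
TRUE_PREVALENCE = {"ZIP_A": 0.05, "ZIP_B": 0.05}


def run_feedback_loop(rounds=5, total_tests=1000, shift=0.10):
    """Return the detected cases per round when testing follows past detections."""
    # Assumed starting point: ZIP_A happens to receive slightly more testing.
    tests = {"ZIP_A": 0.55 * total_tests, "ZIP_B": 0.45 * total_tests}
    history = []
    for _ in range(rounds):
        # Expected detections scale with how hard you look, not only with disease.
        detected = {z: TRUE_PREVALENCE[z] * tests[z] for z in tests}
        history.append(detected)
        # The "model" flags the area with the most detected cases...
        flagged = max(detected, key=detected.get)
        other = "ZIP_B" if flagged == "ZIP_A" else "ZIP_A"
        # ...and officials move a share of the other area's tests there.
        moved = shift * tests[other]
        tests[flagged] += moved
        tests[other] -= moved
    return history


history = run_feedback_loop()
for round_number, detected in enumerate(history):
    print(round_number, {z: round(c, 2) for z, c in detected.items()})
```

Under these assumptions, ZIP_A's detected cases climb every round and ZIP_B's fall, even though the underlying disease burden is identical and never changes. The data ends up reflecting where people looked.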
SPEAKER_01: So what does this all mean then? If treating scores like strategies just leads to these algorithmic feedback loops and false solutions, what are these models actually good for? Should we just scrap predictive models entirely?
SPEAKER_00: Oh no, we definitely shouldn't scrap them. They remain highly valuable for specific operational tasks, things like triage, capacity forecasting, and resource planning. The key to mature data science is separating three distinct questions. Who is at risk? Why are they at risk? And what intervention will actually change that risk?
SPEAKER_01: Ah. So you separate them, you use your predictive model to figure out who needs a hospital bed tomorrow. That's capacity planning. But you use entirely different causal models to figure out why they got sick and what medicine will fix it.
SPEAKER_00: That's the exact distinction. You have to stop asking a thermometer to act as a thermostat.
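One way to see why "what intervention will change that risk?" needs a different method is the simulation sketched below. It is an assumption-laden toy, not something from the source: sicker patients are more likely to be treated in the observational data, the treatment truly lowers the chance of a bad outcome by 10 percentage points, and every probability and the function name `risk_difference` are invented for illustration.

```python
import random

TRUE_EFFECT = -0.10  # assumed: treatment lowers bad-outcome risk by 10 points


def risk_difference(randomized, n=50_000, seed=1):
    """Bad-outcome rate among treated minus rate among untreated patients."""
    rng = random.Random(seed)
    bad = {True: 0, False: 0}
    count = {True: 0, False: 0}
    for _ in range(n):
        severe = rng.random() < 0.5
        if randomized:
            p_treat = 0.5                      # coin flip, unrelated to severity
        else:
            p_treat = 0.8 if severe else 0.2   # sicker patients get treated more
        treated = rng.random() < p_treat
        # Severity raises risk; treatment lowers it (assumed probabilities).
        p_bad = 0.15 + 0.40 * severe + TRUE_EFFECT * treated
        outcome = rng.random() < p_bad
        count[treated] += 1
        bad[treated] += outcome
    return bad[True] / count[True] - bad[False] / count[False]


naive = risk_difference(randomized=False)
trial = risk_difference(randomized=True)

print("Observational comparison:", round(naive, 3))
print("Randomized comparison:  ", round(trial, 3))
```

In the observational comparison the treatment looks harmful, because being treated is a marker of being sicker. Randomizing who gets treated removes that link and recovers an estimate close to the assumed true effect. Randomization is only one of several causal inference approaches, and it is used here because it is the simplest to simulate.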
SPEAKER_01: Oh, I love that. So for you listening, the ultimate takeaway here is to never confuse operational success with actual improvement. Just because an algorithm correctly flags a risk, well, that doesn't mean it has solved the problem.
SPEAKER_00: Yeah, it really highlights the boundaries of what our data can actually do for us.
SPEAKER_01: Absolutely. But it leaves me with one final thought. If our predictive models rely entirely on finding patterns in past data, how can we ever confidently use them to design interventions for a completely unprecedented crisis?
SPEAKER_00: Where no historical data exists at all.
SPEAKER_01: Exactly. Because if the storm has never happened before, looking for umbrellas isn't going to save us.