Data Science x Public Health
This podcast discusses the concepts of data science and public health, and then delves into their intersection, exploring the connection between the two fields in greater detail.
Everyone Uses Sensitivity Analyses… But They Fail When the Assumption Space Is Too Small
Sensitivity analyses are often presented as proof that a result is robust and trustworthy. They are supposed to show that findings hold up even when assumptions are changed. But what if the analysis only tested a tiny corner of the uncertainty that actually matters?
In this episode, we break down why sensitivity analyses often fail, how local robustness can create false reassurance, and why truly strong evidence has to challenge deeper sources of fragility.
For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com
YouTube: https://www.youtube.com/@BJANALYTICS
Instagram: https://www.instagram.com/bjanalyticsconsulting/
Twitter/X: https://x.com/BJANALYTICS
Welcome to today's deep dive. We are looking at excerpts from a really fascinating paper called "The Fragility of Robustness: Why Sensitivity Analyses Fail."
SPEAKER_01It's a great one.
SPEAKER_00Yeah, and our mission today is to uncover why the mathematical tools designed to make scientific research trustworthy might actually be, you know, lulling us into a false sense of security.
SPEAKER_01Right, which is a scary thought.
SPEAKER_00Okay, let's unpack this. Imagine you build a new bridge and you need to stress test it. But instead of driving like a fleet of heavy semi-trucks over it, you just send over a dozen bicycles. The bridge holds up fine, so you declare it perfectly safe for all traffic.
SPEAKER_01Which is incredibly reassuring, but completely misses the point of a stress test. In the scientific community, we use a tool called a sensitivity analysis. It's meant to be a form of scientific self-criticism. Researchers vary their assumptions and tweak their models to see if their final result still holds up. And if the result doesn't collapse the moment one modeling choice changes, the finding is considered robust.
SPEAKER_00Wait, I'm stuck on this idea already. If I tweak a variable in my model and the result still holds up, haven't I proven my math works? I mean, why isn't that enough to declare it robust?
SPEAKER_01Well, you've proven the math works for that specific variable, but you're falling into the trap of what this paper calls local robustness. Analysts often only test nearby assumptions. They might swap out like one minor data point or change an exclusion rule, but they stay completely inside their original analytic worldview. What they ignore are the deeper structural threats.
SPEAKER_00Meaning what exactly?
SPEAKER_01Things like unmeasured confounding, where a hidden third variable is actually causing the effect you're seeing, or massive selection bias, where the people you surveyed don't represent the real world at all. Tweaking a tiny variable doesn't test for those massive structural flaws.
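To make the distinction between local robustness and a structural threat concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration and does not come from the paper: the variable names, the probabilities, and the helper functions `simulate`, `diff_in_means`, `local_checks`, and `stratified_by_hidden` are all hypothetical. The exposure has no true effect at all, yet a hidden variable drives both exposure and outcome.

```python
import random
from statistics import mean


def simulate(n=20000, seed=0):
    """Generate hypothetical records in which exposure has NO true effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hidden = rng.random() < 0.5      # unmeasured confounder (assumed, never seen by the analyst)
        age = rng.gauss(50, 10)          # measured covariate, unrelated to `hidden`
        exposed = rng.random() < (0.7 if hidden else 0.3)
        outcome = 2.0 * hidden + rng.gauss(0, 1)  # depends on `hidden` only, not on exposure
        rows.append({"hidden": hidden, "age": age,
                     "exposed": exposed, "outcome": outcome})
    return rows


def diff_in_means(rows):
    """Naive estimate: mean outcome of the exposed minus mean outcome of the unexposed."""
    exposed = [r["outcome"] for r in rows if r["exposed"]]
    unexposed = [r["outcome"] for r in rows if not r["exposed"]]
    return mean(exposed) - mean(unexposed)


def local_checks(rows):
    """'Polite' sensitivity analyses: each one only changes an exclusion rule."""
    return {
        "all records": diff_in_means(rows),
        "exclude age > 65": diff_in_means([r for r in rows if r["age"] <= 65]),
        "exclude age < 35": diff_in_means([r for r in rows if r["age"] >= 35]),
    }


def stratified_by_hidden(rows):
    """What the analyst would see if the hidden variable had been measured."""
    return {
        level: diff_in_means([r for r in rows if r["hidden"] == level])
        for level in (False, True)
    }


if __name__ == "__main__":
    data = simulate()
    for name, estimate in local_checks(data).items():
        print(f"{name:>18}: {estimate:+.2f}")
    for level, estimate in stratified_by_hidden(data).items():
        print(f"hidden={level!s:>5}: {estimate:+.2f}")
```

Under these assumed settings, the three exclusion-rule variants should agree on an apparent effect of roughly 0.8, which looks "robust," while the estimates within each level of the hidden variable should sit near zero. None of the local checks can reveal that, because none of them touches the hidden variable.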
SPEAKER_00So it's like vigorously checking to make sure your car's mirrors are perfectly aligned, but completely ignoring that the engine block is missing. Mirrors are perfect, but the car still won't drive. So what does this all mean? Are researchers intentionally creating this illusion, or is it just an academic blind spot where they can't see the forest for the trees?
SPEAKER_01It's almost always a genuine academic blind spot. I mean, they are doing the math correctly, but they're testing the wrong things. The problem is that when reviewers or policymakers read the phrase "robust to sensitivity analysis," they relax their skepticism. It unintentionally functions as a, well, a confidence amplifier.
SPEAKER_00Because it sounds authoritative.
SPEAKER_01Yes, it really does. And this is especially critical in public health research.
SPEAKER_00Right, because the stakes are so high.
SPEAKER_01Exactly. And observational data is inherently messy. Narrow, polite tests can make extremely fragile evidence sound policy-ready long before it deserves that status.
SPEAKER_00Here's where it gets really interesting. It's like a security guard verifying the front door is locked, but ignoring that the walls are made of paper.
SPEAKER_01That's a great way to put it.
SPEAKER_00And you, listening to this, you need to care about this. This policy-ready data directly shapes the health guidelines, the medical, and the laws you live under every single day.
SPEAKER_01It affects everyone.
SPEAKER_00If the math giving the green light to those policies is fundamentally flawed, it impacts all of us. So how do we fix it?
SPEAKER_01Well, that brings us to the core lesson of the paper. These analyses fail because they are far too polite. Better practice requires bringing out the heavy machinery to see where the model breaks. The author suggests tools like negative controls and missing data stress tests.
SPEAKER_00Let's break those down. I hear "negative control" and I think of, like, a high school science fair. How does that work in statistics?
SPEAKER_01Think of a negative control as testing your model on a scenario where you already know the answer must be zero. If you are studying a new drug's effect on heart disease, you might run your statistical model to see if the drug impacts, well, ingrown toenails. If your model suddenly finds a massive effect where there shouldn't be one, you instantly know your underlying math is biased.
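The negative-control idea can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's procedure: the cohort is simulated, the hidden `frail` variable, all probabilities, the `tolerance` threshold, and the function names are hypothetical. To keep the sketch simple, the drug is also given no true effect on heart disease. In practice one would judge the negative-control estimate with a confidence interval, not a fixed cutoff.

```python
import random


def simulate_cohort(n=20000, seed=1, confounded=True):
    """Hypothetical cohort in which the drug truly affects NEITHER outcome."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        frail = rng.random() < 0.4  # hidden health status, not recorded
        if confounded:
            # Healthier patients are more likely to receive the drug.
            drug = rng.random() < (0.2 if frail else 0.6)
        else:
            # Comparison case: assignment ignores health status.
            drug = rng.random() < 0.5
        cohort.append({
            "drug": drug,
            "heart_disease": rng.random() < (0.30 if frail else 0.10),
            "ingrown_toenail": rng.random() < (0.20 if frail else 0.05),
        })
    return cohort


def risk_difference(cohort, outcome):
    """Risk of `outcome` among the treated minus risk among the untreated."""
    treated = [p[outcome] for p in cohort if p["drug"]]
    untreated = [p[outcome] for p in cohort if not p["drug"]]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


def negative_control_flag(cohort, outcome, tolerance=0.02):
    """True if the estimator finds an 'effect' on an outcome that should show none."""
    return abs(risk_difference(cohort, outcome)) > tolerance


if __name__ == "__main__":
    for label, flag in (("confounded", True), ("randomized", False)):
        cohort = simulate_cohort(confounded=flag)
        print(label,
              "| heart disease RD:", round(risk_difference(cohort, "heart_disease"), 3),
              "| toenail RD:", round(risk_difference(cohort, "ingrown_toenail"), 3),
              "| bias flagged:", negative_control_flag(cohort, "ingrown_toenail"))
```

The same estimator is applied to both outcomes on purpose. If it reports that the drug "prevents" ingrown toenails, the apparent benefit for heart disease deserves the same suspicion, because whatever biased one estimate is likely biasing the other.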
SPEAKER_00Okay, that makes perfect sense. You force the model to look at something totally unrelated to expose the underlying flaw. What about the missing data stress test?
SPEAKER_01So instead of assuming the data you couldn't collect would have supported your conclusion, you actively assume the worst. You plug in adversarial numbers for the missing data to see if your conclusion completely falls apart. The goal isn't to perform a polite check to prove you are right. The goal is to expose exactly where your result breaks.
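One way to read "assume the worst" is an extreme-case bound followed by a tipping-point search. The sketch below is an assumption about how such a stress test might look, not the paper's method. The trial counts are invented, and the names `risk_difference`, `extreme_case_bounds`, and `tipping_point` are hypothetical.

```python
from fractions import Fraction

# Invented two-arm trial: events among patients with observed outcomes,
# plus patients whose outcome is missing.
TREATED = {"events": 40, "observed": 400, "missing": 100}
CONTROL = {"events": 80, "observed": 400, "missing": 100}


def risk_difference(t_events, t_n, c_events, c_n):
    """Treated risk minus control risk, as an exact fraction."""
    return Fraction(t_events, t_n) - Fraction(c_events, c_n)


def extreme_case_bounds(treated, control):
    """Risk difference under the most and least favourable fills of the missing data."""
    t_n = treated["observed"] + treated["missing"]
    c_n = control["observed"] + control["missing"]
    # Best case for the treatment: no missing treated patient had the event,
    # every missing control patient did.
    best = risk_difference(treated["events"], t_n,
                           control["events"] + control["missing"], c_n)
    # Adversarial case: every missing treated patient had the event,
    # no missing control patient did.
    worst = risk_difference(treated["events"] + treated["missing"], t_n,
                            control["events"], c_n)
    return best, worst


def tipping_point(treated, control):
    """Smallest number of events among missing treated patients that erases
    the apparent benefit, with missing controls filled at the observed control rate."""
    t_n = treated["observed"] + treated["missing"]
    c_n = control["observed"] + control["missing"]
    c_fill = round(Fraction(control["events"], control["observed"]) * control["missing"])
    for k in range(treated["missing"] + 1):
        rd = risk_difference(treated["events"] + k, t_n,
                             control["events"] + c_fill, c_n)
        if rd >= 0:
            return k
    return None  # the benefit survives every possible fill


if __name__ == "__main__":
    complete_case = risk_difference(TREATED["events"], TREATED["observed"],
                                    CONTROL["events"], CONTROL["observed"])
    best, worst = extreme_case_bounds(TREATED, CONTROL)
    print("complete-case RD:", float(complete_case))   # -0.1
    print("bounds:", float(best), "to", float(worst))  # -0.28 to 0.12
    print("tipping point:", tipping_point(TREATED, CONTROL))  # 60
```

With these invented counts, the complete-case analysis shows a 10-point risk reduction, but the adversarial fill flips the sign. The tipping point says the benefit disappears if 60 of the 100 missing treated patients had the event. Whether a 60% event rate among the missing is plausible, when the observed rate was 10%, is a judgment the reader can now make explicitly.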
SPEAKER_00So instead of gently nudging the model to prove it's sturdy, good biostatistics should actively try to destroy it.
SPEAKER_01You have to find the breaking point to know what the model can actually withstand.
SPEAKER_00So the next time you see the word robustness thrown around to defend a new study or a sweeping policy, don't just take it at face value. Ask yourself, were the right sensitivities challenged? Which leaves us with a final thought to chew on. If so much scientific policy survives simply because it was only ever tested politely, how many of your own deeply held beliefs only survive because you've never subjected them to an adversarial stress test?