Data Science x Public Health
This podcast introduces the concepts of data science and public health, then explores their intersection in greater detail.
This Is Why Regression Adjustment Doesn’t Work (And Nobody Talks About It)
Regression adjustment is one of the most common tools in biostatistics and health research. It is often treated as proof that a study has properly controlled for differences and moved closer to the truth. But what if regression adjustment is creating more confidence than validity?
In this episode, we break down why regression models often fail to remove bias, how adding more covariates can sometimes make an analysis worse, and why statistical control is much weaker than most people think. If you use or interpret regression results, this is a concept you cannot afford to ignore.
👉 Enjoyed the episode? Follow the show to get new episodes automatically.
If you found the content helpful, consider leaving a rating or review—it helps support the podcast.
For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com
YouTube: https://www.youtube.com/@BJANALYTICS
Instagram: https://www.instagram.com/bjanalyticsconsulting/
Twitter/X: https://x.com/BJANALYTICS
Okay, so welcome to today's deep dive. We're uh unpacking some excerpts from a paper called The Illusion of Control: Rethinking Regression Adjustment.
SPEAKER_00And the mission here is, well, it's pretty simple, but incredibly crucial. We really need to figure out why the phrase "we controlled for" carries so much weight in health research.
SPEAKER_01Yeah, you see that phrase literally everywhere.
SPEAKER_00You really do. And it might actually be giving you, the listener, a completely false sense of security.
SPEAKER_01Exactly. I mean, we constantly read studies claiming they controlled for variables like, I don't know, age or pre-existing conditions, and we just instantly assume the findings are bulletproof.
SPEAKER_00Well, because regression adjustment is taught everywhere, right? So naturally it's just the default analytical tool everyone reaches for.
SPEAKER_01But and this is where it gets interesting that default creates this really dangerous illusion of control.
SPEAKER_00Yeah, exactly. People see a multivariable model in a research paper and they just kind of assume the hard part is over.
SPEAKER_01Like it's a magic wand or something.
SPEAKER_00Yes. They treat this uh conditional statistical tool as if it automatically scrubs the data clean of all the real-world messiness.
SPEAKER_01Which I'll admit I have absolutely fallen for that. You just assume statistical control is an absolute fix.
SPEAKER_00But think about it. If researchers take a highly complex, you know, chaotic human relationship and force it into a really rigid mathematical equation.
SPEAKER_01They aren't actually removing the mess, are they?
SPEAKER_00Not at all. They're literally just rearranging it. Statistical control only adjusts for the variables exactly as they are represented in the data set.
SPEAKER_01So if the data itself is bad, the control is bad.
SPEAKER_00Precisely. Imagine a study trying to quote unquote control for diet, just to see how a new heart drug works. But the only data point they have for diet is a survey asking, do you eat vegetables? A simple yes or no.
SPEAKER_01Oh wow, that is a terrible proxy for someone's actual diet.
SPEAKER_00Right, it is. The regression output will still calculate an adjustment based on that yes or no, and then the final paper will boldly claim we controlled for diet.
SPEAKER_01So the estimate looks super rigorous on paper.
SPEAKER_00Yeah, but it remains deeply misleading because the measurement was so weak. I mean, the math worked perfectly, but the logic just completely failed.
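This failure mode is easy to reproduce numerically. The simulation below is not from the episode; every number and variable name is made up for illustration. A drug with zero true effect still looks harmful after "controlling for diet" through a yes/no vegetable question, while adjusting for the actual (here, fully observed) diet variable recovers the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True confounder: continuous diet quality (simulated, hypothetical)
diet = rng.normal(size=n)
# Healthier eaters are likelier to receive the drug (confounding by diet)
drug = (rng.normal(size=n) + diet > 0).astype(float)
# The drug has ZERO true effect; diet alone drives heart risk
heart_risk = 1.0 * diet + rng.normal(size=n)

def ols_coef(X, y):
    """Least-squares coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
eats_veg = (diet > 0).astype(float)  # crude yes/no survey proxy for diet

b_proxy = ols_coef(np.column_stack([ones, drug, eats_veg]), heart_risk)[1]
b_true = ols_coef(np.column_stack([ones, drug, diet]), heart_risk)[1]

print(f"drug effect, adjusting for yes/no proxy: {b_proxy:.3f}")  # clearly nonzero
print(f"drug effect, adjusting for actual diet:  {b_true:.3f}")   # near zero
```

Both models can truthfully say "we controlled for diet"; only the second one controlled for the thing that actually mattered.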
SPEAKER_01Well, my immediate instinct, at least as a non-statistician, is to just say, well, if one variable is a weak proxy, just throw 50 more variables into the model. Because more data points should automatically give you a safer, more accurate model.
SPEAKER_00Actually, no. Adding variables without mapping out the cause and effect first, uh, it introduces entirely new biases.
SPEAKER_01Wait, really? How does adding data make it worse?
SPEAKER_00Well, for example, by just throwing in more variables, you could accidentally adjust for what is called a mediator.
SPEAKER_01A mediator, meaning like a variable that sits directly between the cause and the effect.
SPEAKER_00Yes. So say you want to know if smoking causes heart attacks. If you build a model and decide to control for high blood pressure, you've just adjusted for a mediator.
SPEAKER_01Because smoking causes high blood pressure.
SPEAKER_00Exactly. Smoking causes the high blood pressure, which in turn causes the heart attack. By mathematically neutralizing blood pressure in your model, you erase a huge part of the very effect you're trying to measure.
SPEAKER_01Oh, I see. So the study would conclude smoking isn't that bad for your heart, completely missing the actual mechanism.
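The mediator trap can be sketched the same way. In this made-up simulation (not data from the source), smoking's total effect on risk is 1.0 by construction, with most of it flowing through blood pressure; adjusting for the mediator makes the bulk of the effect vanish.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical causal chain: smoking -> blood pressure -> heart-attack risk,
# plus a small direct effect of smoking
smoking = rng.binomial(1, 0.3, n).astype(float)
blood_pressure = 0.8 * smoking + rng.normal(size=n)
heart_attack_risk = 1.0 * blood_pressure + 0.2 * smoking + rng.normal(size=n)
# Total causal effect of smoking = 0.8 * 1.0 + 0.2 = 1.0

def ols_coef(X, y):
    """Least-squares coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
total = ols_coef(np.column_stack([ones, smoking]), heart_attack_risk)[1]
direct = ols_coef(np.column_stack([ones, smoking, blood_pressure]), heart_attack_risk)[1]

print(f"total effect (the right target here): {total:.2f}")    # about 1.0
print(f"after adjusting for the mediator:     {direct:.2f}")   # about 0.2
```

The adjusted model isn't wrong arithmetic; it's correctly answering a different, narrower question (the direct effect not passing through blood pressure) than the one the study set out to ask.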
SPEAKER_00And the model can break in other ways, too. You could also accidentally adjust for a collider. Let's say you're looking at a hospital data set to see if having a respiratory disease makes you more likely to have a broken leg.
SPEAKER_01I mean, in the general population, those two things have absolutely nothing to do with each other.
SPEAKER_00Right. But remember, you're only looking at people admitted to the hospital.
SPEAKER_01Oh, and people usually only go to the hospital if they have a severe disease or a severe injury.
SPEAKER_00Yes. Hospital admission is the collider here. It's an outcome caused by both the disease and the broken leg.
SPEAKER_01So by only analyzing hospitalized patients.
SPEAKER_00You are effectively controlling for that collider. Suddenly, the math will show a massive, entirely fake correlation between respiratory illnesses and broken bones. The source beautifully calls this trap confidence theater. The model gets wider and more elaborate, making the researchers highly confident, but it's really just mathematically generating its own bias.
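Collider bias is just as easy to manufacture in a simulation (again, hypothetical numbers, not from the source): respiratory disease and broken legs are independent in the simulated population, but restricting the analysis to hospital admissions, which either condition can cause, creates a strong spurious association. In this classic setup, known as Berkson's bias, the induced correlation comes out negative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Independent in the general population (simulated, hypothetical rates)
respiratory = rng.random(n) < 0.10
broken_leg = rng.random(n) < 0.10

# Hospital admission is the collider: either condition makes admission likely
admit_prob = np.where(respiratory | broken_leg, 0.80, 0.05)
admitted = rng.random(n) < admit_prob

def corr(a, b):
    """Pearson correlation between two indicator arrays."""
    return np.corrcoef(a, b)[0, 1]

pop_corr = corr(respiratory, broken_leg)
hosp_corr = corr(respiratory[admitted], broken_leg[admitted])

print(f"whole population:       {pop_corr:+.3f}")   # roughly zero
print(f"hospital patients only: {hosp_corr:+.3f}")  # strongly negative
```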
SPEAKER_01That is genuinely terrifying when you realize these adjusted estimates carry so much weight in epidemiology.
SPEAKER_00They really do. Unadjusted comparisons get ignored while these confidence theater models basically dictate public policy.
SPEAKER_01And the actual medical treatments you and I receive. So how do researchers even fix this? Is the answer just waiting for better machine learning software that can, I don't know, automatically flag a collider?
SPEAKER_00No, a computer can't fix this because it's a reasoning problem, not a calculation problem. Better practice has to start before the model is ever built. Researchers need to define a clear estimand: a highly specific, defensible causal question they want to answer. And they also need to run sensitivity analyses.
SPEAKER_01So they have to basically stress test the model to see how fragile the math actually is.
SPEAKER_00Yes, it forces humility. A sensitivity analysis asks: if there's an unknown variable we completely missed, how big would its impact have to be to overturn our results?
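That exact question has a standard formalization not named in the episode: the E-value of VanderWeele and Ding (2017). Given an observed risk ratio, it reports the minimum strength of association an unmeasured confounder would need with both the exposure and the outcome to fully explain the estimate away. A minimal sketch:

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (VanderWeele & Ding, 2017):
    RR + sqrt(RR * (RR - 1)), after flipping protective estimates."""
    if rr < 1:
        rr = 1 / rr  # protective estimates are inverted first
    return rr + math.sqrt(rr * (rr - 1))

# An observed risk ratio of 2.0 could be fully explained away only by an
# unmeasured confounder tied to both exposure and outcome at RR >= ~3.41
print(round(e_value(2.0), 2))  # 3.41
```

A large E-value means only an implausibly strong hidden confounder could overturn the result; an E-value barely above the estimate itself is exactly the fragility the hosts are warning about.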
SPEAKER_01So regression should just be one component of scientific reasoning, not a total substitute for it.
SPEAKER_00Exactly.
SPEAKER_01It fundamentally changes how you read the news. Regression adjustment isn't a guarantee that bias is handled at all. It's really simply a modeling choice.
SPEAKER_00A choice that can easily go wrong.
SPEAKER_01And I really want you to consider how this extends into your own daily life. The source exposes how confidence theater impacts health policy, but it goes way beyond that. Think about it. If elite researchers can fall for this illusion of control, well, how much blind faith are we putting into the massively complex, quote unquote, adjusted algorithms that are currently determining your credit scores?
SPEAKER_00Or screening your job applications.
SPEAKER_01Or calculating your insurance premiums. It's definitely something to think about.