Data Science x Public Health

This Is Why Regression Adjustment Doesn’t Work (And Nobody Talks About It)

BJANALYTICS


Regression adjustment is one of the most common tools in biostatistics and health research. It is often treated as proof that a study has properly controlled for differences and moved closer to the truth. But what if regression adjustment is creating more confidence than validity? 

In this episode, we break down why regression models often fail to remove bias, how adding more covariates can sometimes make analysis worse, and why statistical control is much weaker than most people think. If you use or interpret regression results, this is a concept you cannot afford to ignore.

👉 Enjoyed the episode? Follow the show to get new episodes automatically.

If you found the content helpful, consider leaving a rating or review—it helps support the podcast.

For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com

YouTube: https://www.youtube.com/@BJANALYTICS

Instagram: https://www.instagram.com/bjanalyticsconsulting/

Twitter/X: https://x.com/BJANALYTICS

Threads: https://www.threads.com/@bjanalyticsconsulting

SPEAKER_01

Okay, so welcome to today's deep dive. We're uh unpacking some excerpts from a paper called The Illusion of Control: Rethinking Regression Adjustment.

SPEAKER_00

And the mission here is, well, it's pretty simple, but incredibly crucial. We really need to figure out why the phrase "we controlled for" can be so misleading in health research.

SPEAKER_01

Yeah, you see that phrase literally everywhere.

SPEAKER_00

You really do. And it might actually be giving you, the listener, a completely false sense of security.

SPEAKER_01

Exactly. I mean, we constantly read studies claiming they controlled for variables like, I don't know, age or pre-existing conditions, and we just instantly assume the findings are bulletproof.

SPEAKER_00

Well, because regression adjustment is taught everywhere, right? So naturally it's just the default analytical tool everyone reaches for.

SPEAKER_01

But, and this is where it gets interesting, that default creates this really dangerous illusion of control.

SPEAKER_00

Yeah, exactly. People see a multivariable model in a research paper and they just kind of assume the hard part is over.

SPEAKER_01

Like it's a magic wand or something.

SPEAKER_00

Yes. They treat this uh conditional statistical tool as if it automatically scrubs the data clean of all the real-world messiness.

SPEAKER_01

Which I'll admit I have absolutely fallen for that. You just assume statistical control is an absolute fix.

SPEAKER_00

But think about it. If researchers take a highly complex, you know, chaotic human relationship and force it into a really rigid mathematical equation.

SPEAKER_01

They aren't actually removing the mess, are they?

SPEAKER_00

Not at all. They're literally just rearranging it. Statistical control only adjusts for the variables exactly as they are represented in the data set.

SPEAKER_01

So if the data itself is bad, the control is bad.

SPEAKER_00

Precisely. Imagine a study trying to quote unquote control for diet, just to see how a new heart drug works. But the only data point they have for diet is a survey asking, do you eat vegetables? A simple yes or no.

SPEAKER_01

Oh wow, that is a terrible proxy for someone's actual diet.

SPEAKER_00

Right, it is. The regression output will still calculate an adjustment based on that yes or no, and then the final paper will boldly claim we controlled for diet.

SPEAKER_01

So the estimate looks super rigorous on paper.

SPEAKER_00

Yeah, but it remains deeply misleading because the measurement was so weak. I mean, the math worked perfectly, but the logic just completely failed.
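[Editor's note: the weak-proxy failure described above is easy to demonstrate with a small simulation. This is a hypothetical sketch, not from the source: true diet quality drives both drug-taking and the outcome, but the data set only records the crude yes/no vegetable answer. Adjusting for that proxy still leaves most of the confounding in place, even though the drug truly does nothing.]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all coefficients made up):
diet = rng.normal(size=n)                  # true diet quality (never recorded)
veg = (diet > 0).astype(float)             # crude yes/no proxy actually in the data
# Healthier eaters are more likely to take the new drug (confounding).
drug = (rng.normal(size=n) + 0.8 * diet > 0).astype(float)
# The drug has NO effect; the outcome is driven entirely by diet.
outcome = 2.0 * diet + rng.normal(size=n)

def ols_coef(y, cols):
    """Least-squares coefficients with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols_coef(outcome, [drug])[1]           # no adjustment at all
adjusted = ols_coef(outcome, [drug, veg])[1]   # "we controlled for diet"
oracle = ols_coef(outcome, [drug, diet])[1]    # adjust for the real confounder

print(f"unadjusted drug effect:      {naive:.2f}")     # large and spurious
print(f"adjusted for yes/no proxy:   {adjusted:.2f}")  # smaller, still badly biased
print(f"adjusted for true diet:      {oracle:.2f}")    # near zero, the truth
```

The math "works" in all three rows; only the third recovers the true null effect, because only there does the adjustment variable actually capture the confounder.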

SPEAKER_01

Well, my immediate instinct, at least as a non-statistician, is to just say, well, if one variable is a weak proxy, just throw 50 more variables into the model. Because more data points should automatically give you a safer, more accurate model.

SPEAKER_00

Actually, no. Adding variables without mapping out the cause and effect first, uh, it introduces entirely new biases.

SPEAKER_01

Wait, really? How does adding data make it worse?

SPEAKER_00

Well, for example, by just throwing in more variables, you could accidentally adjust for what is called a mediator.

SPEAKER_01

A mediator, meaning like a variable that sits directly between the cause and the effect.

SPEAKER_00

Yes. So say you want to know if smoking causes heart attacks. If you build a model and decide to control for high blood pressure, you've just adjusted for a mediator.

SPEAKER_01

Because smoking causes high blood pressure.

SPEAKER_00

Exactly. Smoking causes the high blood pressure, which in turn causes the heart attack. By mathematically neutralizing blood pressure in your model, you erase a huge part of the very effect you're trying to measure.

SPEAKER_01

Oh, I see. So the study would conclude smoking isn't that bad for your heart, completely missing the actual mechanism.
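[Editor's note: a toy simulation, with made-up coefficients, makes the mediator trap concrete. Here smoking's entire effect on a heart-attack risk score runs through blood pressure, so "controlling for" blood pressure drives the estimated smoking effect to zero.]

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical causal chain: smoking -> blood pressure -> heart-attack risk.
smoking = rng.binomial(1, 0.3, n).astype(float)
bp = 1.5 * smoking + rng.normal(size=n)     # mediator: smoking raises BP
risk = 2.0 * bp + rng.normal(size=n)        # outcome: BP raises risk

def ols_coef(y, cols):
    """Least-squares coefficients with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

total = ols_coef(risk, [smoking])[1]              # ~3.0: the real total effect
mediator_adj = ols_coef(risk, [smoking, bp])[1]   # ~0: effect erased

print(f"total effect of smoking:               {total:.2f}")
print(f"after 'controlling for' blood pressure: {mediator_adj:.2f}")
```

The adjusted model isn't wrong arithmetically; it answers a different question ("what does smoking do *holding blood pressure fixed*?"), which here is not the question anyone asked.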

SPEAKER_00

And the model can break in other ways, too. You could also accidentally adjust for a collider. Let's say you're looking at a hospital data set to see if having a respiratory disease makes you more likely to have a broken leg.

SPEAKER_01

I mean, in the general population, those two things have absolutely nothing to do with each other.

SPEAKER_00

Right. But remember, you're only looking at people admitted to the hospital.

SPEAKER_01

Oh, and people usually only go to the hospital if they have a severe disease or a severe injury.

SPEAKER_00

Yes. Hospital admission is the collider here. It's an outcome caused by both the disease and the broken leg.

SPEAKER_01

So by only analyzing hospitalized patients.

SPEAKER_00

You are effectively controlling for that collider. Suddenly, the math will show a massive, entirely fake correlation between respiratory illnesses and broken bones. The source beautifully calls this trap confidence theater. The model gets wider and more elaborate, making the researchers highly confident, but it's really just mathematically generating its own bias.
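[Editor's note: the collider example can also be simulated, again with invented numbers. In the full population, respiratory disease and broken legs are independent; hospital admission is caused by both. Restricting the analysis to admitted patients manufactures a strong, entirely spurious (here, inverse) association.]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Hypothetical population: the two conditions are independent.
resp = rng.binomial(1, 0.05, n).astype(bool)     # respiratory disease
broken = rng.binomial(1, 0.05, n).astype(bool)   # broken leg
# Collider: you are admitted if you have either condition
# (plus a small chance of admission for unrelated reasons).
admitted = resp | broken | (rng.random(n) < 0.02)

def corr(a, b):
    """Pearson correlation between two boolean arrays."""
    return np.corrcoef(a.astype(float), b.astype(float))[0, 1]

pop = corr(resp, broken)
cond = corr(resp[admitted], broken[admitted])

print(f"population correlation: {pop:+.3f}")   # ~0, as it should be
print(f"among admitted only:    {cond:+.3f}")  # strongly negative, pure artifact
```

No variable was even added to a model here; simply selecting on the collider (studying only hospitalized patients) is enough to generate the fake association.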

SPEAKER_01

That is genuinely terrifying when you realize these adjusted estimates carry so much weight in epidemiology.

SPEAKER_00

They really do. Unadjusted comparisons get ignored while these confidence theater models basically dictate public policy.

SPEAKER_01

And the actual medical treatments you and I receive. So how do researchers even fix this? Is the answer just waiting for better machine learning software that can, I don't know, automatically flag a collider?

SPEAKER_00

No, a computer can't fix this because it's a reasoning problem, not a calculation problem. Better practice has to start before the model is ever built. Researchers need to define a clear estimand, a highly specific, defensible causal question they want to answer, and they also need to run sensitivity analyses.

SPEAKER_01

So they have to basically stress test the model to see how fragile the math actually is.

SPEAKER_00

Yes, it forces humility. A sensitivity analysis asks: if there's an unknown variable we completely missed, how big would its impact have to be to overturn our results?
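[Editor's note: one widely used version of that question is the E-value from sensitivity analysis (VanderWeele and Ding's formula for a risk ratio); the sketch below is the editor's, not from the source. It computes the minimum strength of association an unmeasured confounder would need, with both exposure and outcome, to fully explain away an observed risk ratio.]

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio: RR + sqrt(RR * (RR - 1)).
    For protective effects (RR < 1), the inverse ratio is used."""
    rr = max(rr, 1.0 / rr)   # work with whichever direction exceeds 1
    return rr + math.sqrt(rr * (rr - 1.0))

print(f"{e_value(1.5):.2f}")   # → 2.37: a modest RR is fragile
print(f"{e_value(4.0):.2f}")   # → 7.46: a large RR needs a huge hidden confounder
```

A modest observed risk ratio of 1.5 could be wiped out by an unmeasured confounder associated with exposure and outcome at about 2.4-fold each, which is exactly the kind of humility the hosts are describing.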

SPEAKER_01

So regression should just be one component of scientific reasoning, not a total substitute for it.

SPEAKER_00

Exactly.

SPEAKER_01

It fundamentally changes how you read the news. Regression adjustment isn't a guarantee that bias is handled at all. It's really simply a modeling choice.

SPEAKER_00

A choice that can easily go wrong.

SPEAKER_01

And I really want you to consider how this extends into your own daily life. The source exposes how confidence theater impacts health policy, but it goes way beyond that. Think about it. If elite researchers can fall for this illusion of control, well, how much blind faith are we putting into the massively complex, quote unquote, adjusted algorithms that are currently determining your credit scores?

SPEAKER_00

Or screening your job applications.

SPEAKER_01

Or calculating your insurance premiums. It's definitely something to think about.