Data Science x Public Health

In Theory, Confounding Adjustment Works. In Reality… It Doesn’t

BJANALYTICS


Confounding adjustment is one of the most common phrases in epidemiology and observational research. It is often treated as proof that a study has handled bias and moved closer to a causal answer. But what if adjustment is creating more confidence than the data actually deserve? 

In this episode, we break down why confounding adjustment often fails, how poorly measured or incorrectly chosen variables leave bias behind, and why “adjusted” is not the same thing as “causal.” If you read or produce observational research, this is an essential concept to understand.

👉 Enjoyed the episode? Follow the show to get new episodes automatically.

If you found the content helpful, consider leaving a rating or review—it helps support the podcast.

For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com

Youtube: https://www.youtube.com/@BJANALYTICS

Instagram: https://www.instagram.com/bjanalyticsconsulting/

Twitter/X: https://x.com/BJANALYTICS

Threads: https://www.threads.com/@bjanalyticsconsulting

SPEAKER_00

You know that uh that panic cleaning you do right before guests come over? You grab the loose mail, the shoes, random charging cords, and you just sort of shove them all into a hallway closet, you shut the door, and boom, the room looks perfectly clean.

SPEAKER_01

Right, but the mess is still there. I mean, you haven't actually cleaned anything, you've just hidden it behind a closed door.

SPEAKER_00

Exactly. And you know, in the world of research, there's a specific phrase that acts exactly like that closet door, which is uh "we adjusted for confounders." So rigorous. But in our deep dive today into the article, The Illusion of Causal Adjustment, our mission is to unpack why statistical adjustment doesn't automatically mean causal proof. So to kick us off, why do researchers even feel the need to, like, clean up the data in the first place?

SPEAKER_01

Well, it really comes down to how we study human beings. In an observational study, we can't just, you know, assign one group of people to smoke and another to live in pristine conditions. We just observe them in the real world. If we're comparing smokers and non-smokers, those two groups already differ in a million ways besides just the smoking: income, diet, stress levels. So researchers use statistical tools to basically mathematically balance those groups out so they can compare apples to apples.
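[The balancing idea described here can be sketched as a tiny simulation. All numbers and variable names are hypothetical, chosen just to illustrate the point; NumPy is assumed. Age confounds the smoking–outcome relationship, and regression adjustment recovers the true effect because the confounder is correctly measured and included.]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# age is a true confounder: it drives both smoking and the health outcome
age = rng.normal(50, 10, n)
smoking = 0.05 * age + rng.normal(0, 1, n)
# the true effect of smoking on the outcome is 2.0; age adds its own effect
outcome = 2.0 * smoking + 0.3 * age + rng.normal(0, 1, n)

def slopes(y, *xs):
    """OLS slopes of y on the given predictors (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

# Unadjusted: smoking's coefficient absorbs part of age's effect (biased above 2.0)
print(slopes(outcome, smoking))
# Adjusted for the measured confounder: roughly recovers the true 2.0
print(slopes(outcome, smoking, age))
```

[This is the happy path the speakers describe: adjustment works here only because the confounder is the right variable and is measured without error, which is exactly the assumption the rest of the episode stress-tests.]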

SPEAKER_00

Okay, that makes perfect sense. If the groups are different, you balance them out. You throw age, income, diet into your statistical model to adjust for them. But I mean, isn't more data always better? Why wouldn't I just throw every single variable I have into the math?

SPEAKER_01

Uh that is a massive trap, actually. Throwing everything into a model assumes adjustment is just this mechanical checklist. But blindly adding variables can actually create statistical illusions. Take something called a mediator.

SPEAKER_00

Wait, what exactly is a mediator?

SPEAKER_01

So a mediator is a stepping stone on the path between a cause and an effect. Imagine you want to know if exercising causes weight loss, but in your math model, you decide to adjust for calories burned.

SPEAKER_00

Oh, I see where this is going.

SPEAKER_01

Calories burned is the mediator. It's the actual mechanism of how exercise causes weight loss. If you mathematically remove that stepping stone, your data will falsely show that exercise has absolutely no effect on weight.

SPEAKER_00

Wow, because I essentially factored out the very mechanism I was trying to study. I basically hid the actual cause in the closet.
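[The mediator trap just described can be sketched in a few lines of simulation. The numbers are hypothetical and NumPy is assumed: exercise causes weight loss only through calories burned, so conditioning on calories burned makes the true effect vanish.]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# hours of exercise per week
exercise = rng.normal(5, 2, n)
# calories burned is the mediator: it is caused by exercise
calories = 300 * exercise + rng.normal(0, 100, n)
# weight loss operates entirely through calories burned
weight_loss = 0.01 * calories + rng.normal(0, 1, n)

def slopes(y, *xs):
    """OLS slopes of y on the given predictors (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

# Unadjusted: exercise shows its true total effect (about 3 units per extra hour)
print(slopes(weight_loss, exercise))
# "Adjusted" for the mediator: exercise's coefficient collapses toward zero
print(slopes(weight_loss, exercise, calories))
```

[Nothing about the math is wrong in the second regression; it simply answers a different question, the direct effect holding calories fixed, which by construction is zero here.]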

SPEAKER_01

You did. And it gets even stranger with something called a collider. This is when two completely unrelated things both cause a third thing. Like let's say being a talented actor and being incredibly lucky both help you get a Hollywood role.

SPEAKER_00

Okay, sure.

SPEAKER_01

In the general population, talent and luck have nothing to do with each other. But if you only look at successful Hollywood actors, meaning you adjust for the collider of getting the role, suddenly it looks like talented actors are less lucky.

SPEAKER_00

Wait, really? Why?

SPEAKER_01

Because to make it without much talent, you had to be incredibly lucky. Inside that selected group, low talent implies high luck, so the two start to look negatively related.

SPEAKER_00

Oh wow. So you've induced a false negative correlation that didn't even exist. So by trying to clean the data, I actually fabricated a bias out of thin air just by including the wrong variable. That completely flips my whole, you know, more data is better assumption. But let's say I map out my logic perfectly. I avoid colliders, I avoid mediators. The mathematical model should be bulletproof, then, right?
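[The Hollywood example can be checked with a quick simulation; the variables and threshold are hypothetical and NumPy is assumed. Talent and luck are generated independently, yet restricting to people who got the role manufactures a negative correlation.]

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# talent and luck are drawn independently: zero true correlation
talent = rng.normal(0, 1, n)
luck = rng.normal(0, 1, n)

# getting the role is a collider: caused by both talent and luck
got_role = (talent + luck) > 2.0

# In the full population, talent and luck are uncorrelated (near 0)
print(np.corrcoef(talent, luck)[0, 1])
# Among those who got the role, a strong negative correlation appears from nowhere
print(np.corrcoef(talent[got_role], luck[got_role])[0, 1])
```

[Selecting (or statistically adjusting) on the collider is what opens this spurious path; nothing in the data-generating process links talent and luck.]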

SPEAKER_01

Not if the data itself is garbage. I mean, perfect math cannot fix bad data. Sometimes researchers use really weak proxies.

SPEAKER_00

Like what? What's a weak proxy?

SPEAKER_01

Like trying to measure a highly complex concept, uh, say a person's frailty or their access to healthcare, using just a single crude metric like a zip code, or they accidentally use post-exposure variables.

SPEAKER_00

Which means data collected after the event happened.

SPEAKER_01

Right, which completely scrambles the timeline. You can't use an effect to explain the cause.

SPEAKER_00

It's like trying to navigate a city using a map drawn by someone who only ever looked out of an airplane window. The mathematical GPS might be highly sophisticated, but if the map is fundamentally flawed, you're still driving into a lake.

SPEAKER_01

That is a great way to put it. And this is why researchers are really pushing for something called target trial thinking. Before you even touch a statistical model, you sit down and design the hypothetical perfect experiment on paper.

SPEAKER_00

Oh, so you map out the exact sequence of events, what causes what, and what variables you actually need to measure. You're building the blueprint before you pour the concrete. So if we don't do this, if we just rely on messy data and blind adjustments, what is the real world fallout for us?

SPEAKER_01

The fallout is flawed public health policies. Public health relies heavily on observational evidence. And if weak evidence gets a free pass just because it carries the stamp of adjusted data, we end up making rules based on statistical illusions.

SPEAKER_00

Which damages scientific trust entirely.

SPEAKER_01

Good epidemiology requires stress testing your model to see how badly unmeasured variables could mess up your results. It demands total transparency about what we still just don't know.

SPEAKER_00

So when you and I are reading an article and the headline proudly claims they adjusted for all confounders, we shouldn't just nod and accept it as gospel.

SPEAKER_01

No, we should approach it with curiosity, not blind trust. Adjustment is a critical tool, but it's not a magic eraser.

SPEAKER_00

Adjusted does not mean solved. It just means they shoved some things into the closet and hoped the door would hold. So next time you read a headline claiming a study adjusted for all lifestyle factors, ask yourself this: what invisible variables, like a person's daily micro-stresses or their deeply ingrained habits, were completely ignored by that neat mathematical equation? Keep questioning.