Data Science x Public Health
This podcast introduces the concepts of data science and public health, then explores the intersection between the two fields in greater detail.
In Theory, Confounding Adjustment Works. In Reality… It Doesn’t
Confounding adjustment is one of the most common phrases in epidemiology and observational research. It is often treated as proof that a study has handled bias and moved closer to a causal answer. But what if adjustment is creating more confidence than the data actually deserve?
In this episode, we break down why confounding adjustment often fails, how poorly measured or incorrectly chosen variables leave bias behind, and why “adjusted” is not the same thing as “causal.” If you read or produce observational research, this is an essential concept to understand.
👉 Enjoyed the episode? Follow the show to get new episodes automatically.
If you found the content helpful, consider leaving a rating or review—it helps support the podcast.
For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com
YouTube: https://www.youtube.com/@BJANALYTICS
Instagram: https://www.instagram.com/bjanalyticsconsulting/
Twitter/X: https://x.com/BJANALYTICS
SPEAKER_00: You know that, uh, that panic cleaning you do right before guests come over? You grab the loose mail, the shoes, random charging cords, and you just sort of shove them all into a hallway closet, you shut the door, and boom, the room looks perfectly clean.
SPEAKER_01: Right, but the mess is still there. I mean, you haven't actually cleaned anything, you've just hidden it behind a closed door.
SPEAKER_00: Exactly. And you know, in the world of research, there's a specific phrase that acts exactly like that closet door, which is, uh, "we adjusted for confounders." Sounds so rigorous. But in our deep dive today into the article, The Illusion of Causal Adjustment, our mission is to unpack why statistical adjustment doesn't automatically mean causal proof. So to kick us off, why do researchers even feel the need to, like, clean up the data in the first place?
SPEAKER_01: Well, it really comes down to how we study human beings. In an observational study, we can't just, you know, assign one group of people to smoke and another to live in pristine conditions. We just observe them in the real world. If we're comparing smokers and non-smokers, those two groups already differ in a million ways besides just the smoking. Income, diet, stress levels. So researchers use statistical tools to basically mathematically balance those groups out so they can compare apples to apples.
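That balancing step can be sketched in a quick simulation. All numbers below are made up for illustration: a hypothetical age confounder drives both who smokes and the outcome, and the true causal effect of smoking is set to zero by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder: age influences both the exposure and the outcome.
age = rng.normal(50.0, 10.0, n)
smoker = (0.05 * age + rng.normal(0.0, 1.0, n)) > 2.5

# True causal effect of smoking on the outcome is ZERO by construction;
# the outcome depends only on age plus noise.
outcome = 0.1 * age + rng.normal(0.0, 1.0, n)

# Naive comparison of group means is confounded by age.
naive = outcome[smoker].mean() - outcome[~smoker].mean()

# Adjusted estimate: regress the outcome on smoking AND the confounder.
X = np.column_stack([np.ones(n), smoker, age])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted = beta[1]  # coefficient on smoking

print(f"naive difference:  {naive:.3f}")     # clearly nonzero (biased)
print(f"adjusted estimate: {adjusted:.3f}")  # near the true effect, zero
```

The naive difference comes out clearly positive even though smoking does nothing in this toy world; including the confounder in the regression pulls the estimate back toward zero. That is adjustment working exactly as intended, in the one case where it can: a correctly identified, well-measured confounder.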
SPEAKER_00: Okay, that makes perfect sense. If the groups are different, you balance them out. You throw age, income, diet into your statistical model to adjust for them. But I mean, isn't more data always better? Why wouldn't I just throw every single variable I have into the math?
SPEAKER_01: Uh, that is a massive trap, actually. Throwing everything into a model assumes adjustment is just this mechanical checklist. But blindly adding variables can actually create statistical illusions. Take something called a mediator.
SPEAKER_00: Wait, what exactly is a mediator?
SPEAKER_01: So a mediator is a stepping stone on the path between a cause and an effect. Imagine you want to know if exercising causes weight loss, but in your math model, you decide to adjust for calories burned.
SPEAKER_00: Oh, I see where this is going.
SPEAKER_01: Calories burned is the mediator. It's the actual mechanism of how exercise causes weight loss. If you mathematically remove that stepping stone, your data will falsely show that exercise has absolutely no effect on weight.
SPEAKER_00: Wow, because I essentially factored out the very mechanism I was trying to study. I basically hid the actual cause in the closet.
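The exercise example can be simulated the same way. The numbers are toy values, and `calories_burned` is constructed so that it carries the entire effect of exercise on weight loss:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Toy causal chain: exercise -> calories_burned -> weight_loss.
exercise = rng.binomial(1, 0.5, n).astype(float)
calories_burned = 300.0 * exercise + rng.normal(0.0, 50.0, n)
weight_loss = 0.01 * calories_burned + rng.normal(0.0, 1.0, n)

def exercise_coef(X, y):
    """Least-squares fit; return the coefficient on exercise (column 1)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Correct total effect: weight loss on exercise alone (0.01 * 300 = 3).
total = exercise_coef(np.column_stack([np.ones(n), exercise]), weight_loss)

# "Adjusting" for the mediator blocks the causal path and erases the effect.
blocked = exercise_coef(
    np.column_stack([np.ones(n), exercise, calories_burned]), weight_loss
)

print(f"total effect of exercise:         {total:.2f}")
print(f"after adjusting for the mediator: {blocked:.2f}")
```

Same data, same math, one extra column in the model, and the real effect of exercise vanishes from the output.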
SPEAKER_01: You did. And it gets even stranger with something called a collider. This is when two completely unrelated things both cause a third thing. Like let's say being a talented actor and being incredibly lucky both help you get a Hollywood role.
SPEAKER_00: Okay, sure.
SPEAKER_01: In the general population, talent and luck have nothing to do with each other. But if you only look at successful Hollywood actors, meaning you adjust for the collider of getting the role, suddenly it looks like talented actors are less lucky.
SPEAKER_00: Wait, really? Why?
SPEAKER_01: Because among the people who made it, anyone without much talent had to be incredibly lucky to get there.
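A minimal sketch of that collider scenario, with talent and luck as hypothetical traits that are independent by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# In the whole population, talent and luck are generated independently.
talent = rng.normal(0.0, 1.0, n)
luck = rng.normal(0.0, 1.0, n)

# Collider: landing the role depends on the sum of both.
got_role = (talent + luck) > 2.0

corr_everyone = np.corrcoef(talent, luck)[0, 1]
corr_selected = np.corrcoef(talent[got_role], luck[got_role])[0, 1]

print(f"correlation in the population:     {corr_everyone:+.3f}")  # near zero
print(f"among those who got the role only: {corr_selected:+.3f}")  # negative
```

No negative relationship was put into the data anywhere; conditioning on the collider manufactures it.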
SPEAKER_00: Oh wow. So you've induced a false negative correlation that didn't even exist. So by trying to clean the data, I actually fabricated a bias out of thin air just by including the wrong variable. That completely flips my whole, you know, more data is better assumption. But let's say I map out my logic perfectly. I avoid colliders, I avoid mediators. The mathematical model should be bulletproof, then, right?
SPEAKER_01: Not if the data itself is garbage. I mean, perfect math cannot fix bad data. Sometimes researchers use really weak proxies.
SPEAKER_00: Like what? What's a weak proxy?
SPEAKER_01: Like trying to measure a highly complex concept, uh, say a person's frailty or their access to healthcare, using just a single crude metric like a zip code, or they accidentally use post-exposure variables.
SPEAKER_00: Which means data collected after the event happened.
SPEAKER_01: Right, which completely scrambles the timeline. You can't use an effect to explain the cause.
SPEAKER_00It's like trying to navigate a city using a map drawn by someone who only ever looked out of an airplane window. The mathematical GPS might be highly sophisticated, but if the map is fundamentally flawed, you're still driving into a lake.
SPEAKER_01: That is a great way to put it. And this is why researchers are really pushing for something called target trial thinking. Before you even touch a statistical model, you sit down and design the hypothetical perfect experiment on paper.
SPEAKER_00: Oh, so you map out the exact sequence of events, what causes what, and what variables you actually need to measure. You're building the blueprint before you pour the concrete. So if we don't do this, if we just rely on messy data and blind adjustments, what is the real world fallout for us?
SPEAKER_01: The fallout is flawed public health policies. Public health relies heavily on observational evidence. And if weak evidence gets a free pass just because it carries the stamp of adjusted data, we end up making rules based on statistical illusions.
SPEAKER_00: Which damages scientific trust entirely.
SPEAKER_01: Good epidemiology requires stress testing your model to see how badly unmeasured variables could mess up your results. It demands total transparency about what we still just don't know.
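One widely used version of that stress test is the E-value of VanderWeele and Ding, a sensitivity analysis for unmeasured confounding. It takes only a couple of lines; the 1.8 risk ratio below is a made-up example, not from any real study:

```python
import math

def e_value(rr):
    """E-value (VanderWeele & Ding, 2017): the minimum strength of
    association, on the risk-ratio scale, that an unmeasured confounder
    would need with BOTH the exposure and the outcome to fully explain
    away an observed risk ratio rr >= 1."""
    return rr + math.sqrt(rr * (rr - 1))

# Example: a study reports an adjusted risk ratio of 1.8.
# An unmeasured confounder would need risk-ratio associations of about
# 3.0 with both exposure and outcome to reduce 1.8 to a true null.
print(f"E-value for RR = 1.8: {e_value(1.8):.1f}")
```

A large E-value means only an implausibly strong hidden confounder could wipe out the finding; a small one means the "adjusted" result is fragile.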
SPEAKER_00: So when you and I are reading an article and the headline proudly claims they adjusted for all confounders, we shouldn't just nod and accept it as gospel.
SPEAKER_01: No, we should approach it with curiosity, not blind trust. Adjustment is a critical tool, but it's not a magic eraser.
SPEAKER_00: Adjusted does not mean solved. It just means they shoved some things into the closet and hoped the door would hold. So next time you read a headline claiming a study adjusted for all lifestyle factors, ask yourself this: what invisible variables, like a person's daily micro stresses or their deeply ingrained habits, were completely ignored by that neat mathematical equation? Keep questioning.