Data Science x Public Health
This podcast discusses the concepts of data science and public health, and then delves into their intersection, exploring the connection between the two fields in greater detail.
Healthcare Is Drowning in Data… So Who’s Making Sense of It?
Healthcare is producing more data than ever before — electronic health records, wearables, genomic data, insurance claims, and real-time patient monitoring.
But data alone doesn’t solve problems.
In this episode, we break down health data science — the field where biostatistics, machine learning, and modern healthcare systems come together. You’ll learn what health data science actually is, how it differs from traditional biostatistics, and why it’s one of the fastest-growing careers in public health and data science.
👉 Enjoyed the episode? Follow the show to get new episodes automatically.
If you found the content helpful, consider leaving a rating or review—it helps support the podcast.
For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com
Youtube: https://www.youtube.com/@BJANALYTICS
Instagram: https://www.instagram.com/bjanalyticsconsulting/
Twitter/X: https://x.com/BJANALYTICS
SPEAKER_00Right now, your smartwatch is streaming your heart rate. You know, tomorrow you might visit a clinic and an insurance claim gets filed.
SPEAKER_01Oh, absolutely. We are generating petabytes of health data every single year.
SPEAKER_00Having mountains of data isn't the same as actually having insight. We're exploring excerpts from the architecture of modern health data science.
SPEAKER_01It's a really fascinating read.
SPEAKER_00It is. And our mission today is to figure out how this massive explosion of real-world information is forcing traditional biostatistics to evolve into, well, a completely new discipline.
SPEAKER_01And I mean the leap from traditional biostatistics to modern health data science is just huge. For decades, biostatistics relied on heavily controlled environments.
SPEAKER_00Like uh like a clinical trial.
SPEAKER_01Exactly. Think of a clinical trial where you have a specific group of patients given a specific dose and they're monitored at exact intervals. The data is pristine by design.
SPEAKER_00Right. It's very neat and tidy.
SPEAKER_01But modern health data is the complete opposite. It's wildly massive and inherently very, very messy.
SPEAKER_00Which kind of makes me wonder why we can't just, you know, feed this new mountain of data into the old statistical calculators. But looking at the sources, there's a fundamental mismatch in purpose.
SPEAKER_01Yeah, that's a great point.
SPEAKER_00Like, a clinical trial is designed purely to answer a scientific question. But an insurance billing code? That wasn't designed for science. It was designed to get a doctor paid.
SPEAKER_01Right. And that is the defining tension of this entire field. Health data science isn't just, you know, biostatistics running on a faster computer.
SPEAKER_00This is not just a rebranding.
SPEAKER_01No, not at all. It's a distinct discipline sitting right at the intersection of biostatistics, computer science, and intense medical domain knowledge. You're trying to extract scientific truth from systems that were built for administrative tracking or patient charting, not research.
SPEAKER_00It kind of reminds me of baking. Like, traditional biostatistics is baking in a pristine, controlled test kitchen. But modern health data science is like trying to cook a massive feast during the dinner rush in this chaotic real-world restaurant.
SPEAKER_01I love that. That's exactly what it feels like because the constraints are intense, the regulatory environment is strict, and the ethical stakes are just incredibly high.
SPEAKER_00Okay, so if I'm looking at a patient's electronic health record, I'm seeing hard structured numbers, right? Like a blood pressure reading, sitting right next to a doctor's hastily typed, completely unstructured notes. And then you mix in genomic sequences and, like, administrative billing codes from three years ago. It's a lot. How do you even put those in the same spreadsheet? I mean, it's beyond comparing apples and oranges. It's like trying to compare an apple to a tax return.
SPEAKER_01Yeah, and you really can't compare them until you fundamentally translate them. A lot of people think data science is all about building, you know, flashy predictive algorithms.
SPEAKER_00Right, the AI stuff.
SPEAKER_01Exactly. But in health data science, the absolute hardest, most time-consuming part of the job is what's called data harmonization. You have to force all these disparate data streams to speak the exact same language.
SPEAKER_00Kind of like converting a dozen different foreign currencies into a single gold standard, just so you can actually calculate how much global wealth you have.
SPEAKER_01That is a great way to visualize it. If hospital A codes a heart attack one way and hospital B uses completely different shorthand, an algorithm will just see two different diseases. So the industry uses specific frameworks. There's one called OMOP, for instance, which acts like a universal dictionary. It maps those totally different inputs into one standardized concept.
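To make the "universal dictionary" idea concrete, here is a minimal Python sketch of mapping site-specific codes to one shared concept. Everything in it is an assumption for illustration: the hospital names, the local codes, the `harmonize` function, and the concept IDs are hypothetical placeholders, not real OMOP vocabulary entries or the actual OMOP tooling.

```python
from typing import Optional

# Hypothetical lookup table: (source system, local code) -> shared concept ID.
# The IDs are placeholders for illustration, not real OMOP concept IDs.
CONCEPT_MAP = {
    ("hospital_a", "MI"): 1001,
    ("hospital_b", "HRT-ATK"): 1001,
    ("hospital_a", "HTN"): 1002,
}


def harmonize(source: str, local_code: str) -> Optional[int]:
    """Return the shared concept ID for a site-specific code, or None if unmapped."""
    # Normalize whitespace and case so "hrt-atk" and "HRT-ATK" are treated alike.
    return CONCEPT_MAP.get((source, local_code.strip().upper()))


records = [
    {"source": "hospital_a", "code": "MI"},
    {"source": "hospital_b", "code": "hrt-atk"},
    {"source": "hospital_b", "code": "XYZ"},  # no mapping exists for this code
]
harmonized = [harmonize(r["source"], r["code"]) for r in records]
```

In this sketch the two hospitals' different heart attack codes resolve to the same concept ID, while the unmapped code comes back as `None` so it can be flagged for manual review rather than silently dropped.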
SPEAKER_00Wait, so how does that actually work for the really messy stuff? I mean, a billing code is one thing, but how do you harmonize a doctor's rambling paragraph about how a patient is feeling today?
SPEAKER_01Well, that's where the computer science element really shines. Teams use natural language processing to scan those unstructured clinical notes.
SPEAKER_00Ah, NLP.
SPEAKER_01Exactly. It extracts key medical terms and automatically maps them to that universal dictionary. This is a very meticulous pipeline.
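As a rough illustration of extracting terms from a free-text note and mapping them to standard concepts, the Python sketch below uses plain dictionary matching with a regular expression. This is a deliberately simplified stand-in, not how production clinical NLP works: the phrase list, the placeholder concept IDs, and the `extract_concepts` function are all hypothetical, and the sketch does not handle negation ("no history of heart attack"), abbreviations, or misspellings.

```python
import re
from typing import List

# Hypothetical synonym dictionary: surface phrase -> placeholder concept ID.
TERM_TO_CONCEPT = {
    "heart attack": 1001,
    "myocardial infarction": 1001,
    "high blood pressure": 1002,
    "hypertension": 1002,
}

# Build one case-insensitive pattern; longer phrases are listed first so that
# multi-word terms are preferred over any shorter overlapping ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(TERM_TO_CONCEPT, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_concepts(note: str) -> List[int]:
    """Return distinct concept IDs found in a note, in order of first mention."""
    seen: List[int] = []
    for match in _PATTERN.finditer(note):
        concept = TERM_TO_CONCEPT[match.group(1).lower()]
        if concept not in seen:
            seen.append(concept)
    return seen


note = "Pt reports feeling tired. History of Heart Attack in 2019; hypertension managed with meds."
concepts = extract_concepts(note)
```

Here two differently worded mentions in the note resolve to two concept IDs. A real pipeline would typically add context handling, which is part of why the clinical review step discussed later still matters.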
SPEAKER_00So you can't just jump straight into the modeling.
SPEAKER_01No, not at all. You start by navigating strict data access requirements, like data use agreements and institutional review board approval, just to even touch the information. Then you use heavy programming tools (Python, R, SQL, massive cloud platforms) to clean and harmonize it. And only after all of that can you actually build your predictive model.
SPEAKER_00Which brings up something interesting. The sources note a massive AI-focused hiring boom right now, specifically looking toward 2026 for machine learning engineers and clinical informaticists. But with all this automated natural language processing and advanced tooling, are the machines just taking over the pipeline? Like, are we just feeding this chaos into a black box and letting AI sort it out?
SPEAKER_01Not at all. And the sources are very clear on this. The computational power is scaling up, but the human element is non-negotiable. This is where that deep clinical domain knowledge comes into play. An algorithm might crunch the standardized data and spot a correlation between a specific drug and a drop in blood pressure.
SPEAKER_00Okay, that sounds useful.
SPEAKER_01It is, but a human clinician has to look at that output and determine if it actually makes biological sense, or if it's just, you know, a bizarre statistical fluke caused by a glitch in the hospital's billing. You absolutely need human medical expertise for validation and safe deployment.
SPEAKER_00So health data science basically takes the rigorous foundation of traditional biostatistics and radically scales it up. It translates the chaotic real-world data of our modern healthcare system into a standardized language. But with wearables turning every heartbeat, every breath, and every single step into a data point to be harmonized and modeled, at what point does hyperanalyzing our health metrics begin to change our actual human experience of what it feels like to be healthy?