Data Science x Public Health
This podcast discusses the concepts of data science and public health, and then delves into their intersection, exploring the connection between the two fields in greater detail.
This Is Why Outbreak Curves Don’t Work (And Nobody Talks About It)
Outbreak curves are one of the most recognizable tools in epidemiology. They appear to show whether an epidemic is rising, peaking, or falling in real time. But what if the curve is reflecting reporting behavior as much as disease transmission?
In this episode, we break down why outbreak curves often mislead, how reporting delays and revisions distort the shape people think they are seeing, and why epidemiologic interpretation has to go deeper than the visual alone.
For business and sponsorship inquiries, email us at:
📧 contact@bjanalytics.com
Youtube: https://www.youtube.com/@BJANALYTICS
Instagram: https://www.instagram.com/bjanalyticsconsulting/
Twitter/X: https://x.com/BJANALYTICS
SPEAKER_01: Okay, let's unpack this. Imagine looking through what seems like a perfectly clear window, only to realize you're actually staring into a warped funhouse mirror.
SPEAKER_00: Oh, that's a great way to put it.
SPEAKER_01: Right. And that is basically what happens every time you look at an outbreak curve on the news. Today we're taking a deep dive into excerpts from the Mirage of the Outbreak Curve to figure out why epidemiology's most iconic visual is, well, often an illusion.
SPEAKER_00: Yeah. And what's fascinating here is that decision makers, journalists, and really all of us watching at home, we just fall for it. Because, you know, that curve feels so intuitive. Cases go up, they peak, they come down.
SPEAKER_01: Exactly. It looks like a direct, real-time picture of a disease spreading.
SPEAKER_00: It does. But that clean line actually hides a really messy, disjointed administrative pipeline.
SPEAKER_01: Wait, I have to challenge this right out of the gate. I mean, we live in a highly digitized world. If I get a lab result today, shouldn't that instantly trigger a point on a graph? How are we still getting data backlogs on, like, a random Tuesday?
SPEAKER_00: I know. It sounds like it should be instant. But let's actually trace what happens when someone gets sick. So you develop symptoms, then maybe a couple of days later, you manage to get a test.
SPEAKER_01: Right. If you can even get an appointment.
SPEAKER_00: Then the lab has to process it, which obviously takes time. And then that lab system has to talk to a public health database.
SPEAKER_01: Which I'm guessing uses completely different software.
SPEAKER_00: Oh, almost always. So if a curve is built using the day that data finally hits the system, the so-called report date, it's really just visualizing administrative timing. It's not the actual speed of the virus at all.
SPEAKER_01: So a spike today might just be, like, a server finally syncing, not an actual sudden outbreak.
SPEAKER_00: Yeah, pretty much.
SPEAKER_01: But what if they use the date the person actually started feeling sick? Doesn't that onset date fix the timeline?
SPEAKER_00: It helps, sure. But onset-date curves have this massive blind spot at the most critical place, the far right edge of the graph. That edge represents the last few days, which is the exact time frame everyone is obsessively watching to see if a public health intervention is working or failing.
SPEAKER_01: Because those recent cases are still stuck somewhere in that administrative pipeline we just talked about.
SPEAKER_00: You got it. So that right edge is just inherently unstable. The new cases haven't even arrived yet.
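As a rough illustration of the report-date and onset-date problems described above, here is a minimal Python sketch. The line list, the dates, and the helper `curves_as_of` are all hypothetical and invented for this example; they are not taken from the episode or from any real surveillance system. It bins the same cases two ways and shows what an analyst would see on a given day.

```python
from collections import Counter
from datetime import date

# Hypothetical line list: one (onset_date, report_date) pair per case.
# Five cases with onset on Jun 1-3 all reach the system on Jun 5 (a batch upload).
line_list = [
    (date(2024, 6, 1), date(2024, 6, 5)),
    (date(2024, 6, 1), date(2024, 6, 5)),
    (date(2024, 6, 2), date(2024, 6, 5)),
    (date(2024, 6, 2), date(2024, 6, 5)),
    (date(2024, 6, 3), date(2024, 6, 5)),
    (date(2024, 6, 3), date(2024, 6, 6)),
    (date(2024, 6, 4), date(2024, 6, 6)),
    (date(2024, 6, 4), date(2024, 6, 8)),  # still in the pipeline on Jun 6
    (date(2024, 6, 5), date(2024, 6, 9)),  # still in the pipeline on Jun 6
]

def curves_as_of(records, as_of):
    """Count the cases visible on `as_of`, binned by onset date and by report date."""
    visible = [(onset, report) for onset, report in records if report <= as_of]
    by_onset = Counter(onset for onset, _ in visible)
    by_report = Counter(report for _, report in visible)
    return by_onset, by_report

# What the curves look like on Jun 6, and again once late reports have arrived.
onset_jun6, report_jun6 = curves_as_of(line_list, date(2024, 6, 6))
onset_jun9, report_jun9 = curves_as_of(line_list, date(2024, 6, 9))

for day in sorted({onset for onset, _ in line_list}):
    print(day, "as of Jun 6:", onset_jun6[day], "| as of Jun 9:", onset_jun9[day])
```

In this toy data the report-date curve shows a spike of five cases on Jun 5 that is only a batch upload, while the onset-date curve seen on Jun 6 undercounts Jun 4 and shows nothing for Jun 5 until the late reports arrive.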
SPEAKER_01: That creates this bizarre time travel effect, right? Where the past data is literally changing behind our backs as late reports trickle in.
SPEAKER_00: It really does. And it's not even just late reports adding to the pile. You have to think about deduplication too.
SPEAKER_01: Deduplication? What do you mean?
SPEAKER_00: Well, say you get a rapid test at a pharmacy, and then you get a PCR test at a hospital the next day to confirm. The system initially counts two sick people.
SPEAKER_01: Oh, I see.
SPEAKER_00: Yeah. And days later, a human administrator realizes it's the same person and deletes one record. So suddenly a past spike on the graph just shrinks.
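The deduplication step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the reliable `person_id` field, the `deduplicate` helper, and the 30-day episode window are all hypothetical choices made for this example, and real systems often have to match people on much messier identifiers.

```python
from datetime import date

# Hypothetical records: (person_id, test_type, test_date).
records = [
    ("p1", "rapid", date(2024, 6, 3)),
    ("p1", "pcr", date(2024, 6, 4)),   # confirmatory test for the same person
    ("p2", "pcr", date(2024, 6, 3)),
    ("p1", "pcr", date(2024, 9, 1)),   # months later: counted as a new episode
]

def deduplicate(records, window_days=30):
    """Keep one record per person per episode.

    A record is dropped as a duplicate when it falls within `window_days`
    of the most recent record already kept for the same person.
    The 30-day default is an assumption made for this sketch.
    """
    kept = []
    last_kept = {}
    for person, test_type, day in sorted(records, key=lambda r: (r[0], r[2])):
        previous = last_kept.get(person)
        if previous is not None and (day - previous).days <= window_days:
            continue  # same person, same episode: skip the duplicate
        kept.append((person, test_type, day))
        last_kept[person] = day
    return kept

cases = deduplicate(records)
print(len(records), "raw records ->", len(cases), "cases after deduplication")
```

Before cleanup, the two June records for `p1` would count as two cases. After cleanup they collapse to one, which is how a past spike on the graph can shrink days later.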
SPEAKER_01: Which completely warps how leaders respond. So a sudden spike might not be a scary new superspreader event, but literally just an administrator releasing a backlog of old data on a Tuesday.
SPEAKER_00: Exactly. Or conversely, if data drops off because fewer administrators are processing records over the weekend, a mayor might look at the curve and think the crisis is magically over.
SPEAKER_01: Oh wow, they're reacting to paperwork, not the pathogen.
SPEAKER_00: Yes. And this illusion creates a dangerous false certainty. We end up treating an evolving estimate as if it's this fixed, undeniable truth.
SPEAKER_01: So if the visuals are inherently distorted by these systemic blind spots, we can't just keep staring at the funhouse mirror. We have to fundamentally change how we interact with them.
SPEAKER_00: Absolutely. We need better interpretation.
SPEAKER_01: And the sources mention things like delay adjustments and nowcasting to fix this. How does that actually work in practice?
SPEAKER_00: Well, nowcasting is essentially like weather forecasting, but for the present moment. Instead of just logging the delayed cases they currently have, epidemiologists look at historical patterns of how late data usually is.
SPEAKER_01: Oh, so they use that math to estimate the true number of infections happening today.
SPEAKER_00: Exactly. They compensate for the paperwork that hasn't arrived yet. They also use revision-aware dashboards.
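To make the delay-adjustment idea concrete, here is a deliberately simple Python sketch of one common approach: estimate from past cases what fraction is usually reported within a given number of days, then scale up the recent counts accordingly. The delay values, the counts, and the function names are hypothetical, and this is not presented as the method used by the sources discussed in the episode. Real nowcasting models also quantify uncertainty and handle effects such as weekday patterns.

```python
def completeness_by_delay(historical_delays, max_delay):
    """Fraction of past cases reported within d days of onset, for d = 0..max_delay."""
    total = len(historical_delays)
    return [
        sum(1 for delay in historical_delays if delay <= d) / total
        for d in range(max_delay + 1)
    ]

def nowcast(observed_counts, completeness):
    """Scale up recent counts by how complete they are expected to be.

    observed_counts[i] is the count for the day i days before today
    (index 0 is today). Days older than the longest known delay are
    treated as fully reported.
    """
    estimates = []
    for days_ago, count in enumerate(observed_counts):
        fraction = completeness[min(days_ago, len(completeness) - 1)]
        estimates.append(count / fraction if fraction > 0 else None)
    return estimates

# Hypothetical onset-to-report delays (in days) from past, fully reported cases.
historical_delays = [0, 1, 1, 2, 2, 2, 3, 3, 4, 4]
completeness = completeness_by_delay(historical_delays, max_delay=4)

# Hypothetical counts seen so far: today, yesterday, ..., four days ago.
observed = [2, 6, 12, 16, 20]
estimated = nowcast(observed, completeness)

for days_ago, (seen, est) in enumerate(zip(observed, estimated)):
    print(f"{days_ago} days ago: observed {seen}, estimated {est:.0f}")
```

In this toy example the raw counts appear to collapse toward today, but once each day is divided by its expected completeness the estimate is roughly flat at about 20 cases per day. The apparent decline was only the reporting delay.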
SPEAKER_01: What do those look like?
SPEAKER_00: They physically show the uncertainty on the right edge of the graph. So maybe they fade the line out or add a wide shaded area, just so you know that specific data is still settling. A curve isn't a fixed truth. You know, it's an evolving estimate.
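One way a dashboard might decide which points to fade is to flag every day whose data is not yet expected to be mostly complete. The Python sketch below assumes a hypothetical table of expected completeness by days since onset and a hypothetical 90% cutoff. The `label_days` helper and its field names are invented for this example and do not come from any particular dashboard tool.

```python
def label_days(counts, completeness, threshold=0.9):
    """Flag each day as provisional when its expected completeness is below `threshold`.

    counts[i] is the count for the day i days before today (index 0 is today).
    completeness[d] is the assumed fraction of cases reported within d days;
    days beyond the end of the table use its last value.
    """
    labeled = []
    for days_ago, count in enumerate(counts):
        fraction = completeness[min(days_ago, len(completeness) - 1)]
        labeled.append({
            "days_ago": days_ago,
            "count": count,
            "expected_complete": fraction,
            "provisional": fraction < threshold,
        })
    return labeled

# Hypothetical completeness table and counts, for illustration only.
completeness = [0.1, 0.3, 0.6, 0.8, 1.0]
counts = [2, 6, 12, 16, 20, 19]

labeled = label_days(counts, completeness)
for row in labeled:
    style = "faded" if row["provisional"] else "solid"
    print(f'{row["days_ago"]} days ago: {row["count"]} cases ({style})')
```

A plotting layer could then draw the provisional days with a lighter line or a shaded band, so the viewer can see which part of the curve is still settling.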
SPEAKER_01: So what does this all mean? It sounds like you have to interpret the surveillance system itself, not just passively read the shape on the screen.
SPEAKER_00: Yeah, a curve is never raw reality. It is a processed image.
SPEAKER_01: Which leaves you with a pretty wild thought to chew on. If these incredibly trusted epidemiological visuals are so easily warped by the systems that create them, how much of our historical memory of past health crises is actually just a picture of administrative bottlenecks rather than the true spread of a disease? Next time you look at the data, just remember the distortion itself is a huge part of the story.