Informonster Podcast

Episode 52: Charlie Harp Discusses Why Good Enough Isn’t Good Enough

Clinical Architecture Episode 52

In this episode, Charlie Harp discusses why healthcare’s long-standing definition of “good enough” data no longer works in a world increasingly driven by interoperability, analytics, and AI.  Drawing from a recent presentation at the AMIA Amplify Conference 2026, Charlie breaks down the growing gap between the way healthcare data was originally designed to function and the way organizations are now trying to use it. 

From interoperability and population health to clinical decision support and generative AI, Charlie explores how healthcare organizations are asking data to do far more than it was ever designed to do. He also breaks down the six dimensions of data quality and why issues like integrity, relevance, liquidity, usability, granularity, and trust have become critical to the future of intelligent healthcare systems.

The conversation also dives into the risks of “uncalibrated uncertainty,” the hidden dangers of low-quality data powering AI systems, and why healthcare leaders need to rethink how they evaluate, govern, and invest in data quality before the consequences become much harder to ignore. 

Contact Clinical Architecture

• Tweet us at @ClinicalArch 
• Follow us on LinkedIn and Facebook
• Email us at informonster@clinicalarchitecture.com

Thanks for listening! 

Charlie Harp (00:09):

I'm Charlie Harp and this is the Informonster Podcast. Today on the podcast, I want to talk about something that's been a recurring theme in my career and honestly a recurring source of frustration and inspiration for the better part of 30 years. I'm going to talk about data quality and why right now in this particular moment in healthcare, data quality is maybe the most consequential conversation that we're not having loudly enough. I just got back from AMIA's Amplify Conference in Denver where I did a presentation where I talked about data quality in this watershed moment that we're in today. And what I'm going to try to do in this podcast echoes that presentation. I'm going to explain why this moment is different and why the conversation we've been having about data quality for decades is no longer adequate and why if we don't change our thinking, we're going to find ourselves in some genuinely dangerous territory.

(01:10):

So let's get into it. There's a conversation that I've had more times than I can count in my career. I'm in a room, could be a health system, could be a vendor, could be a conference, and I start talking about the need to improve data quality. And almost without fail, somebody in that room says some version of the following. The data quality is good enough.

(01:35):

I have to tell you, every single time I hear that or something like it, something in me kind of short circuits because as somebody who spent my career living at the intersection of health informatics, data science, analytics, and systems design, I know that we are barely scratching the surface of what's possible with our data. And a huge reason for that, arguably the reason for that is that the data is genuinely not fit for the purposes we are now trying to use it for. But here's where it gets interesting because the person who says the data is fine isn't necessarily wrong. They're just answering a different question than the one that I'm asking. And that distinction, that gap between their definition of good enough and mine is really the crux of the whole issue. So let me try to unpack it because I think once you see it, you can't unsee it.

(02:32):

When I talk about data quality in healthcare, I'm thinking about things like interoperability. I'm thinking about clinical decision support. I'm thinking about analytics, population health, quality measurement. I'm thinking about the scenarios where I need to take data from across a population or even across a single patient's longitudinal record and use it in a broad, normative way to reveal patterns, to identify risk, to trigger the right intervention at the right moment, or to leverage some form of automated intelligence, whether that's deterministic, rule-based automation or generative AI. That's the use case I have in mind when I say the data isn't good enough. But the person across the table from me, they're thinking about something else entirely. They're thinking about billing, they're thinking about scheduling. They're thinking about the operational workflow that gets a patient in the door, produces a clean claim and gets the health system paid.

(03:30):

And for that purpose, yeah, the data's probably fine. It does the job. The billing system doesn't need to know the nuanced clinical trajectory of your hypertension management. It needs a diagnosis code and a procedure code and a date of service. And if it gets those things, they're good to go. The data's fine. The same thing from the provider's perspective. A physician or other clinical provider puts data into the system because they have to. Some of it's discrete, some of it's coded, structured, some of it's free text, their narrative notes, their clinical reasoning, their observations. And when they come back to see that patient next time, they can refresh themselves. They can kind of re-cache the patient in their mind by reading their own notes or the notes of another provider. The structured data they entered, it's almost secondary. It's there because the system required it, but the real clinical picture, at least from most providers' vantage points, lives in the narrative.

(04:29):

And for their purposes, that's good enough too. So I want to be clear, I'm not saying those people are wrong. I'm saying they're answering a different question. The problem is that different question is the old question and we now have a new one. To understand where we are, you need to understand how we got here. And for that, I want to take the wayback machine back a few decades. The historical pattern of data in healthcare, and I mean going back to when we started automating clinical workflows on computers, was fundamentally episodic. A patient comes in, you collect some data, you use that data to produce a bill and then in a lot of cases, you're done with it. The data has served its transactional purpose and it fades into the background. Now that pattern didn't emerge out of negligence or malice. It emerged because that's how we've always approached automation as human beings.

(05:21):

We take a manual process and we automate it. We don't usually step back and say, wait, what's the best way for a computer-driven system to handle the data? What structure, what data model, what collection methodology would maximize the long-term utility of this information? No. We look at what we are already doing on paper and we say, let's do that, but faster and electronically. That's it. That's what happened. Then along comes the era of meaningful use. Regulations come out that push the industry towards discrete, coded, structured data, things like CPOE. The idea behind the push is genuinely visionary. If we can get clinicians to enter data in a structured coded way, we can start to use data programmatically. We can trigger clinical alerts. We can do population health analytics. We can measure adherence to best practices through quality measures. We can build the kind of decision support infrastructure that helps clinicians make better decisions for their patients.

(06:18):

And that vision was sound. The problem was execution and the habits we have as humans. Because even as we pushed for structured data, we were still fundamentally operating off the same mental model: collect data, use it for a transaction and move on. The providers who were now being asked to choose from a dropdown list instead of just writing a note, many of them experienced that as an obstacle, not a tool. It slowed them down. It asked them to distill complex clinical realities into a code that may or may not capture what they actually meant. And because they're human beings under enormous pressure in an incredibly demanding environment, they made choices, fast choices, defensible choices that may not have produced the most clinically precise data, but that got them through the encounter. The result is what I call uncalibrated uncertainty. We built systems that produce the appearance of structured, reliable, coded clinical data, but underneath that surface, the data is riddled with assumptions, shortcuts, defaults, and approximations that nobody ever bothered to document or account for.

(07:29):

That would be impossible. The structure is there. The precision often isn't. And for a long time, for the purposes we were using the data for, that was also fine because the systems consuming that data were also operating at a relatively low level of clinical sophistication, but things have changed and that's where we are now. Before I get into what's different now, I want to give you a framework for how I think about this because data quality as a phrase gets thrown around a lot in healthcare and everybody nods at it and almost nobody defines it the same way. And I think that's part of the problem. If we can't agree on what quality means, we can't have a productive conversation about whether we have it or not. So let me give you six dimensions, six ways of looking at the data that I think taken together give you a pretty honest picture of whether your data is actually fit for the purposes you're trying to use it for.

(08:28):

The first is integrity. And by integrity, I mean something pretty fundamental. Does the data reasonably and accurately represent what is actually happening with the patient? Not perfectly. I'm not asking for perfection. Perfection's unattainable, but reasonably. Is the problem list a fair reflection of this patient's current conditions? Is the medication list something close to what they're actually taking? Does the clinical picture in the record bear a meaningful resemblance to the clinical reality of that human being? Integrity is the baseline. It's the foundation of everything else that sits on it. And in a lot of healthcare systems today, it is shakier than most people want to admit because the data was never really designed to be a clinical truth. It was designed to support a series of transactions and those are two very different things. The second dimension is relevance and this one is underappreciated in my opinion.

(09:24):

Relevance is about whether the data is actually pertinent to the current state of the patient for a given use case because here's the thing. You can have data that was perfectly accurate when it was entered and is now completely misleading because the patient's situation has changed and nobody updated that record: a diagnosis that was resolved two years ago but is still sitting on the active problem list, a medication that was discontinued but never removed, an allergy that was documented based upon an unconfirmed report from 15 years ago. That data isn't wrong exactly. It's just no longer relevant. And when you feed irrelevant data into a clinical decision support system or a generative AI tool, it doesn't know the difference. It reasons over what it's given. Relevance is about keeping the data current and contextually appropriate, not just accurate at a point in time, but meaningful right now for this patient for this purpose.

(10:21):

The third dimension is liquidity and liquidity is pretty straightforward in concept, even if it is enormously difficult in practice. Liquidity means the data is available to people who need it across the full continuum of care for the patient, not just within one system or one department or one encounter, but wherever the patient goes and whoever is responsible for their care at any given moment. The primary care physician and the specialists and the emergency department and the skilled nursing facility and the home health agency, all of them need access to a coherent picture of that patient. And right now in most healthcare ecosystems, that picture is fragmented across systems that don't talk to each other particularly well. The data exists. It's just not liquid. It doesn't flow and when it doesn't flow, clinicians make decisions in the dark and patients fall through the cracks. We've worked really hard to put the plumbing in place, but the data today is still not flowing the way it should.

(11:26):

The fourth dimension is usability and this is one that ties directly to interoperability. Usability means that whoever receives the data can actually use it to do what they need to do. It's not enough for the data to arrive. It has to arrive in a form that the receiving system or the receiving clinician can interpret correctly and act upon. And this is where a lot of interoperability efforts quietly fail. The data gets transmitted, the interface works. The message is received, but the code system on the sending side doesn't map cleanly to the code system on the receiving side or the terminology is local or the structure is technically valid but clinically uninterpretable. Usability is the last mile of interoperability and without it, you haven't really achieved interoperability at all. You've just moved the problem from one place to another and I would argue that interoperability without usability is just another form of data blocking.

(12:27):

The fifth dimension, granularity. And this is one that we touched on earlier, but it's worth being precise about. Granularity is about whether the data is at the right level of specificity for the intended use. And I want to be careful here because this isn't just about having more detail, it's about having the right detail for what you're trying to do. A billing system might be perfectly served by knowing a patient has diabetes. A population health algorithm trying to identify patients at risk for end-stage renal disease needs to know the type, the duration, the current control, the trajectory, the complications. Same condition, very different granularity requirements. The mismatch between the granularity of the data we collect and the granularity required by the use cases we're now asking the data to support, that mismatch is one of the most under-recognized gaps in healthcare informatics today. And the sixth dimension, the last one I'm going to talk about is trust.

(13:27):

Trust is really another word for provenance, but I like trust better because it captures what's actually at stake. Trust is about knowing where the data came from and who's touched it. Was this diagnosis entered by the treating physician who examined the patient or was it carried forward from a prior encounter by someone who never questioned it? Was this lab value transmitted directly from an instrument or was it transcribed manually somewhere in the chain? Was this medication list reconciled at the last visit or has it been quietly accumulating entries for three years with nobody taking responsibility for its accuracy? Trust is the audit trail. It's the lineage and as we hand more and more clinical reasoning over to automated systems and especially generative AI, the provenance of the data those systems are operating on becomes a patient safety question. A system that doesn't know where the data came from cannot tell you how much to trust its conclusions and in healthcare that matters enormously.

(14:31):

And provenance is one of those things where in the ecosystems we built today, we have just a bare indication of provenance and we must go a lot deeper and we also have to make sure we have a way to uniquely identify the sources of data and the systems that the data came from so when we encounter bad data, we know who to tell so they can correct the bad data that they're introducing into the ecosystem. So integrity, relevance, liquidity, usability, granularity, and trust. These six dimensions. I would argue that most of the data quality failures we see in healthcare today can be traced back to a breakdown in one or more of these. When a clinical alert fires and a clinician dismisses it because they know it's based on stale data, that's a relevance problem, maybe an integrity problem. When a patient shows up in the emergency department and the care team has no visibility of their medication history, that's a liquidity problem, maybe a usability problem.

(15:34):

When a generative AI tool synthesizes a patient summary that sounds authoritative but is based on low-resolution, unverified data, that's a granularity problem and a trust problem. The framework isn't just academic, it's diagnostic. If you know which of these six dimensions is broken, you know where to focus. And those of you that know me know that I've worked on the PIQI Alliance and the PIQI Framework to help measure and identify some of these qualitative issues that track back to these broader dimensions that I'm talking about here today. So what's different now? Here's the thing. For most of the last few decades, we could get away with the data being imperfect, sometimes wildly imperfect, because the systems consuming that data were relatively forgiving. A billing system doesn't ask much of clinical data. Even early decision support systems, your drug interaction checkers, your duplicate therapy alerts, they were working off of fairly simple rules and the downside of imprecise data was mostly noise.

(16:41):

Too many alerts, too many false positives. Annoying, yes. Dangerous sometimes, but relatively contained, and human providers typically shielded us from the worst of it. Well, that era is over and here's why. We are starting to lean into the data in fundamentally different ways. We're starting to ask the data to do things it was never designed to do and the reason we're doing that is because we have to, because the demands on healthcare have outpaced the capacity of human beings to manage them without technological help. Think about what's happening in healthcare right now. We have a population that is increasingly complex, more chronic conditions, more comorbidities, more medications, more data generated per patient per year than at any point in history. We have a workforce that is stretched to its absolute limit. We have clinicians who are seeing more patients than they probably should, spending enormous amounts of time on documentation rather than care, burning out at rates that should concern all of us.

(17:47):

And into that environment, we are deploying or trying to deploy increasingly sophisticated clinical tools: population health platforms, predictive analytics that flag patients for outreach before they end up in an emergency department, quality measurement systems tied to reimbursement and accreditation, and now most significantly generative artificial intelligence that can synthesize patient information, draft clinical notes, suggest diagnoses, and increasingly begin to take on an operational role within the care workflow. All of these systems share something in common. They're only as good as the data they're built on and that's where the reckoning comes. Let me spend a minute on generative AI specifically because I think it represents a genuinely different kind of challenge, a genuinely higher bar for data quality than anything we've dealt with before.

(18:46):

The clinical decision support tools of the last 20 years, your drug interaction checkers, your lab alert systems, your evidence-based order sets, all of which I've been acutely involved with, those were deterministic. They operated on rules. If drug A and drug B are both in the medication list, fire an alert. The rules are visible, auditable, predictable. When they're wrong, you can usually trace back to why. Generative AI doesn't work like that. Generative AI is probabilistic. It operates by synthesizing patterns across enormous amounts of training data and generating outputs that are statistically likely to be appropriate in context. And that has some remarkable capabilities. It can take a messy, unstructured, incomplete set of clinical information and generate a coherent, seemingly authoritative summary or recommendation. But here's the risk and I want to be really clear about this because I think it's underappreciated. When you hand a generative AI system low quality data, data that is inaccurate, inconsistent, incomplete, or low granularity, the system doesn't flag it as low quality.

(19:57):

It doesn't say, "Hey, I'm not sure about this." It generates a confident-sounding output based on whatever you gave it. It hallucinates when it has to. It fills in the gaps probabilistically and in healthcare, when you fill in the gaps wrong, people get hurt. And then there's the human behavior piece because we know what happens when people start using a tool that seems to work. We trust it. We give ourselves over to it. We go on a kind of cognitive autopilot. I'm not talking about any specific clinicians here or people. This is just what humans do. You find something that seems to be doing a good job and you stop watching it quite as closely as you should.

(20:40):

In most contexts, that's fine. If AI helps you write a memo and it embellishes something here or there, the downside's minimal. But in healthcare, when AI is synthesizing a patient's clinical picture or suggesting a treatment direction or summarizing a patient's history before a procedure, the stakes are categorically different. A mistake there is not an inconvenience. It could be a catastrophe. Generative AI is a force multiplier. Give it good data, you get a powerful clinical tool, give it bad data, you get a confident, fluent, dangerous one. Let me get a little more concrete because I think sometimes when we talk about data quality in the abstract, it doesn't really land. So let me walk through a few specific places where the gaps become more visible. Let's take the medication list for a patient. We know it was prescribed. We know it was dispensed. Those are concrete transactions, but outside of an inpatient setting where medications are administered and documented in real time, we fundamentally don't know if the patient's taking their medication.

(21:47):

They might not be filling the prescription. They might be filling it, but not taking it. They might be taking it at a different dose or frequency than prescribed. We assume adherence. That assumption gets baked into clinical reasoning and into the data as if it were a fact. And when you feed that assumption into an AI system trying to understand why a patient's condition isn't responding, you've started with a flawed picture in the first place. Let's talk about what happened with clinical decision support. When we first turned on clinical decision support, providers turned it off or they overrode it constantly because the alerts were blunt and they were firing too often and they were second guessing decisions a provider had already considered. Part of the reason they were so blunt was that the data wasn't precise enough to calibrate them. The allergy list had things on it that weren't real allergies.

(22:36):

The medication list had drugs the patient wasn't taking and the problem list had diagnoses that were years out of date. The system was doing its job. It just wasn't doing it well because the data wasn't accurate and the support wasn't granular enough to be meaningful to the provider so that when it fired, 85% of the time they were grateful. It was more of a nuisance than it was of value. When it comes to population health, let's say I want to identify patients at high risk for a cardiovascular event in the next year. To do that, I need their conditions documented accurately, consistently, and currently. I need their labs. I need their medications. I need to understand their adherence. I need a longitudinal picture that reflects reality, not a transactional snapshot that reflects the last billing encounter. Population health is an example of something where at scale it requires data that in all honesty, we don't actually have today.

(23:35):

So where does that leave us? What does doing better actually look like? The first thing we need to do is we need to change the conversation about good enough. We need to stop evaluating data quality against the operational transaction and start evaluating against the secondary uses that are increasingly driving the value of our systems. That means getting the people who define good enough, the operational leaders, the revenue cycle folks, the IT governance committees in the same room with the people who are trying to use this data for clinical intelligence and having an honest conversation about the gap. We need to invest in measuring the quality of our data and understanding it. We need to go back through the data we have and start asking hard questions. What's real? What's assumed? What's a placeholder nobody ever updated? What's coded to a level of specificity that's actually clinically meaningful?

(24:28):

This is not glamorous work, but it's foundational work. It's important work and you cannot build reliable clinical intelligence on a foundation of uncalibrated uncertainty. We've got to build data liquidity into our infrastructure and use it. The information that we need exists in most cases, it really does, but it's scattered across systems, siloed in departments, trapped behind interfaces that were never designed for rich clinical data exchange. FHIR and interoperability mandates have opened doors here, but walking through those doors and building the kind of longitudinal patient-centered data infrastructure we need, that's still a significant lift and most organizations haven't fully done it or even considered what it means to do it. And the last thing I'll throw out there is making data quality part of AI governance and our AI data pipelines. Every AI system that touches clinical decision making needs to be evaluated not just on its algorithmic performance, but on the quality of the data that it's operating on.

(25:34):

That means both what it's trained on and what it's prompted with. We need to ask what happens when this system gets bad data? How does it fail? Does it fail loudly or quietly? Does it tell the user it's uncertain or does it generate a confident output regardless? These are data quality questions and they need to be part of the AI governance conversation. All right. I'm losing my voice, so let me bring this home. For 30 years, "the data is good enough" has been a defensible position because for 30 years, the things we were asking the data to do were relatively forgiving of its imperfections or we as humans covered for it. The transactional use cases were tolerant. The early analytics were rough but survivable. The decision support was blunt enough that the noise in the data just added a little more noise to the output. Well, that era is over.

(26:36):

We are now asking the data to do something it was never designed to do. We're asking it to power intelligent systems that synthesize complex clinical pictures, identify patterns across populations, support real-time decision making, and increasingly take on autonomous or semi-autonomous functions within the care workflow and the bar for those use cases is not the same bar we've been working on. The good news is I think this is solvable. It's hard. It's not glamorous. It requires investment and discipline and organizational will, but it is solvable. The data's there, the infrastructure is maturing, the standards are improving. What we need now is the recognition at every level of the organization that the old definition of good enough is no longer good enough. And the time to have that conversation is now, not when an AI system makes a high stakes clinical error and everyone asks how it happened.

(27:36):

Now, while we still have the runway to do this right. All right. I'm done harping on that, pun intended. That's the episode. I hope this gave you some things to chew on. As always, I'd love to hear what you think. Drop me a note, reach out on LinkedIn. Let me know if this resonates with you where you're sitting or if I'm totally wrong. And if you know somebody in your organization who needs to hear this conversation, share it with them. I'm Charlie Harp and this has been another episode of the Informonster Podcast. Until next time.