The Clinical Realist

Why Most Health Systems Will Fail at AI — And It's Not the Technology

Dr. Sarah Matt Season 1 Episode 7


Health systems are signing AI contracts they cannot govern. That is the real problem. Not the technology.

In this episode, Dr. Sarah Matt, physician executive and healthcare strategy advisor, breaks down the three governance layers where AI deployment fails in health systems — and why organizations that conflate technology procurement with organizational readiness are setting themselves up for expensive, very public failures.

The strategic layer: who actually owns the AI portfolio, and what happens when that answer is "nobody in particular." The operational layer: why AI pilots that succeed in a controlled environment die on contact with real clinical workflows. The clinical layer: who is accountable when an AI recommendation contributes to a patient harm event — and whether your organization has that answer in writing before deployment, or is planning to figure it out afterward.

Dr. Matt draws on direct experience working with health systems navigating AI procurement, implementation failures, and the organizational redesign required to make these tools actually function. The argument she makes is not anti-AI. It is pro-governance. Health systems that build the organizational infrastructure before the contract signature will move faster, perform better, and retain clinical trust longer than those that are still building it after something goes wrong.

This episode is built for CMOs, CMIOs, CNOs, physician executives, and health system leaders who are being asked to put their professional credibility behind AI deployment and want a precise framework for what has to be in place before they do.

What you will take away from this episode:

  • The three-layer governance framework for clinical AI deployment
  • Why operational AI governance is a workflow problem, not a technology problem
  • What named clinical accountability actually requires, and why committees are not a substitute
  • The single question that reveals whether your organization is governance-ready before you sign

If you are making AI investment decisions in a health system this year, this episode is the pre-work.



Resources & Links:

📖 Get the Book: "The Borderless Healthcare Revolution" is available now on Amazon and major retailers.

💼 Work with Dr. Matt:
Looking for a keynote speaker or strategic advisor?
Visit: drsarahmatt.com

🔗 Connect on Social:
LinkedIn: https://www.linkedin.com/in/sarahmattmd/
YouTube: www.youtube.com/@DrSarahMatt

📧 Subscribe to The Briefing: drsarahmatt.com/newsletter-signup

Disclaimer:
The views expressed on this podcast are those of Dr. Sarah Matt and her guests. They do not necessarily reflect the official policy or position of any affiliated institutions. This content is for informational and educational purposes only and does not constitute medical advice or a professional consulting relationship.

SPEAKER_00

Every health tech vendor exhibiting at HIMSS right now, this year, this week, is selling AI. Not some of them. All of them. The EHR platform is AI-powered. The rev cycle tool is AI-driven automation. The clinical documentation software uses AI to reduce physician burden. The patient engagement platform has an AI copilot. The predictive analytics suite surfaces AI-generated insights everywhere. Every single one of them. And most of it is not going to change a single clinical outcome in your health system. That's right. I just saved you thousands, maybe millions.

But I want to be precise about that statement, because I'm not anti-AI at all. I've seen AI work in clinical settings, and I know what the evidence looks like when it's real. What I am talking about is the gap between what these tools are packaged to do and what they actually do in clinical operating environments. That gap is enormous, and health systems are spending tens of millions of dollars falling into it right now.

So today I'm going to cover the five vendor claims that fail in clinical practice, where AI actually does work, the adoption barriers that no one builds into the budget, and the four-question framework I use before recommending any AI contract.

First, let me define the gap clearly, because it has a very specific shape. Healthcare AI claims operate at one of three levels. Level one: the algorithm performs well in a controlled study. Level two: the tool deploys successfully in a clinical environment. Level three: the tool changes clinical outcomes at scale. Notice that level three is what we want here. Almost every vendor, and if you're at HIMSS right now, go out on the showroom floor and test this, is selling at level one. Most tools make it to maybe level two. Very few reach level three, and level three is the only one that actually matters: clinical outcomes.
Now, the reason for the gap between level two and level three is not the algorithm. The algorithm is usually doing exactly what it was trained to do. The reason is the clinical operating environment: workflows, staffing structure, documentation burden, cognitive load on clinicians, and, most importantly, the organizational culture that determines whether an AI recommendation ever reaches a patient. You can deploy a technically excellent algorithm into a clinical environment and achieve absolutely nothing, because the environment is not configured to act on what the algorithm produces. The vendors know this. Most of them are not hiding it; they are just not leading with it in their sales presentations.

So here are the five AI claims I hear most often in health system AI conversations, and why each of them requires real scrutiny before you sign.

Claim one: "Our AI reduces physician documentation time by 40%." How many times have you heard that? Sometimes it's true, in controlled pilots with motivated users, on specific documentation types, on specific EHR integrations. The claim fails when it hits the production environment: a different EHR configuration, variable physician adoption (we know that's real), documentation types the pilot didn't cover. So that 40% becomes something like 12% for the physicians who use the tool and 0% for the physicians who work around it. The blended number lands somewhere in between, and the vendor calls it a success.

Claim two: "Our predictive model identifies high-risk patients before deterioration." I might be one of the people responsible for putting this one out in the market. The AUROC on the validation study is impressive. What the slide does not show is the alert burden that model actually generates at your patient volume.
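To make "alert burden at your patient volume" concrete, here is a back-of-the-envelope sketch. Every number in it is hypothetical, chosen only to illustrate the arithmetic: a model can have strong sensitivity and specificity on paper and still bury a nursing unit in alerts when the predicted event is rare.

```python
# Rough alert-burden estimate for a deterioration-prediction model.
# All inputs below are hypothetical illustrations, not figures from any
# real vendor study or health system.

def alert_burden(sensitivity, specificity, prevalence, daily_patients):
    """Return (expected alerts per day, positive predictive value)."""
    true_alert_rate = sensitivity * prevalence                # true positives
    false_alert_rate = (1 - specificity) * (1 - prevalence)   # false positives
    alerts_per_day = daily_patients * (true_alert_rate + false_alert_rate)
    ppv = true_alert_rate / (true_alert_rate + false_alert_rate)
    return alerts_per_day, ppv

# A model that looks strong on a slide: 85% sensitive, 90% specific,
# applied to a sepsis-like event affecting 2% of 500 monitored patients/day.
alerts, ppv = alert_burden(sensitivity=0.85, specificity=0.90,
                           prevalence=0.02, daily_patients=500)
print(f"{alerts:.0f} alerts/day, PPV = {ppv:.0%}")
```

At those assumed numbers the model fires roughly 57 alerts a day, and the positive predictive value is only about 15%, meaning more than five of every six alerts are false. That is the arithmetic the validation slide's AUROC does not show, and it is exactly what drives the alert fatigue described next.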
I've watched health systems deploy sepsis prediction algorithms that fired alerts at a rate where nurses stopped reading them within three months. The sensitivity was real. The workflow integration was not. Alert fatigue is not a technology problem; it's a clinical design problem that technology vendors don't own.

Claim three, and I'm so worked up about this one that I'm hitting my microphone: "Implementation takes 90 days." When is the last time you implemented anything in 90 days? This is the timeline to technical go-live. It is not the timeline to meaningful adoption. Those are two completely different numbers, and the second one is typically 12 to 18 months longer than the vendor projects. The 90-day claim is accurate and incomplete in the same breath.

Claim four, and again, guilty as charged: "Our tool is EHR-agnostic." This means it integrates via HL7 or FHIR. It does not mean the integration is seamless, fully bidirectional, or that it surfaces outputs in the actual clinical workflow where decisions get made. EHR-agnostic is a statement about technical standards, not about clinical usability.

Claim five: "Our model was validated at [insert prestigious health system]." Validation at a major academic medical center with a dedicated implementation team, research infrastructure, and motivated physician champions does not predict performance at a community hospital with three IT staff and a clinical team already at 110% capacity. The validation is real, of course. The generalizability is the question your vendor is not answering.

So let me be equally specific about where AI in healthcare does work, because the evidence is real and the use cases matter. AI is working right now in radiology: image recognition at scale for specific findings, specific modalities, specific diagnostic tasks.
The FDA-cleared tools in chest X-ray interpretation, diabetic retinopathy screening, and mammography triage have genuine clinical evidence behind them. These are narrow, well-defined tasks where the algorithm performs well, the output maps directly to a clinical decision, and the workflow integration is contained. This is where the clinical evidence for AI is very strong.

AI is also working in administrative and revenue cycle functions, in amazing ways. Prior authorization automation, coding accuracy, claims processing, scheduling optimization: these are areas where the workflow is relatively standardized, the feedback loops are fast, and the cost of an AI error is recoverable. Health systems that have achieved demonstrable ROI from AI investments are disproportionately realizing it here, not at the bedside.

AI works in population health at scale, too. When the task is identifying patterns across large patient populations for care management outreach, AI generally outperforms human review. The value is in the scale, not in the individual patient encounter.

And AI works when the clinical workflow is designed around it. This is the underappreciated condition. The tools that succeed are not the ones with the best algorithms; they're the ones deployed into workflows that were redesigned to act on AI output. The algorithm is only one component. The workflow is the delivery mechanism. Without the second, the first is irrelevant.

Now, the adoption barriers in healthcare AI don't appear in vendor decks. I wonder why. They appear in month four, when the clinical team has been live for 90 days and adoption is stuck at 22%, max. The first barrier is a trust problem. Clinicians do not trust a black box. An algorithm that produces a risk score without explaining its reasoning will be overridden by experienced clinicians who can't reconcile it with their clinical judgment. The tools that achieve adoption are the ones that show their work.
Here's the patient data the model used. Here's why this patient was flagged. Here's what the evidence says about this risk factor. Explainability is not a nice-to-have; it's an adoption prerequisite for clinicians.

The second barrier is the workflow position problem. Where an AI recommendation appears in the clinical workflow determines whether it gets acted on. A deterioration alert that fires on a nursing station dashboard a nurse checks twice a shift is going to produce very different outcomes than the same alert integrated into the EHR workflow the nurse is in all day long. Workflow position is a clinical design decision, and most AI tools don't arrive with it solved.

The third barrier is the champion dependency problem. Every AI deployment that succeeds has a physician champion, a nurse champion, a clinician with credibility who advocates for the tool, troubleshoots adoption issues with peers, and personally makes the case for using it. When that person leaves or gets promoted, adoption often collapses. Success that depends on a single champion is fragile. Health systems that scale AI deployments build champion infrastructure, not just champions.

Then there's the data quality problem nobody admits to. The model performs to the quality of the data it ingests. If your EHR data has documentation inconsistencies, missing fields, or coding variations by department, the model is going to perform inconsistently too, and we all know those data issues exist everywhere. Most health systems discover their data quality problems for the first time when an AI deployment surfaces them. This is not the vendor's fault. It's also not a surprise if you audit your data before you sign the contract. Data governance comes first, then AI governance.

So before any healthcare AI contract, I run four questions, and they could replace a hundred slides of vendor materials. Question one: what specific clinical or operational problem does this solve?
And how will we measure whether it is solved? Not "improve outcomes," not "enhance decision support": the specific problem. Is it 30-day readmissions in this patient population? Prior authorization denial rates for this service line? Documentation time for this note type? If you can't answer with that level of specificity, you are not ready to evaluate a solution. Figure out what problem you're trying to solve first.

Question two: who owns the workflow redesign? The vendor does not own your workflow. Someone on your team needs to own the redesign that makes the tool usable. Who is that going to be? What authority do they have to change workflow at the unit level? If that answer is unclear, the tool is not going to reach adoption, regardless of how good the algorithm is.

Question three: what does the validation evidence actually show, and does it apply to our patient population and operating environment? Read the study. Not the executive summary, the study. Where was it validated? At what patient volume? What was the alert burden at that volume? What was the physician champion situation? How similar is that environment to yours? If the gap is large, the validation is directionally interesting and operationally limited.

Question four: what are the contract terms if the tool does not produce the agreed clinical outcomes? This question changes the conversation immediately. Vendors who are confident in their clinical evidence welcome outcome-based contract terms, and we have to ask for them. Vendors who are selling on algorithm quality without clinical outcomes evidence become evasive. The answer to this question is the clearest signal available about what the vendor actually believes about their product's real-world performance. This is them going at risk with you.

So the health systems that are succeeding with AI right now are not the ones with the most AI.
They're the ones with the most disciplined approach: deploying a small number of tools into workflows designed to actually use them.

They start with a 90-day problem definition phase, before any vendor conversation; you can make it longer or shorter depending on your organization. A cross-functional team defines the problem, quantifies the current state, maps the clinical workflow, and identifies what 10% better is worth in real dollars or clinical outcomes. That number becomes the investment ceiling.

They run outcome-linked pilots with hard decision dates, 90 to 120 days, with specific success criteria defined before go-live. If the criteria are met, the deployment plan activates. If they aren't, you cancel. No extensions, no "let's give it another quarter." That discipline separates organizations that scale AI from organizations that accumulate pilots.

They build adoption infrastructure alongside the technology: a named adoption owner, a dedicated budget, an explicit timeline, and accountability metrics for clinical use. Not IT adoption metrics, clinical behavior change metrics.

And they audit their data before they sign. They know what the model will ingest, they know what the quality issues are, and they build data remediation into the project scope before the vendor starts talking about go-live dates.

So AI in healthcare is not a question of whether the tools are real. The evidence base in specific domains is compelling. The question is whether your organization is designed to actually use them. If you've watched an AI deployment go wrong in your health system, or go right, I'd love to hear about it. What made the difference? Drop it in the comments. New episode every Tuesday. Subscribe if this was useful, and I'll see you next week. This is Dr. Sarah Matt.