Informonster Podcast

Episode 30: Data Quality in Healthcare: Inside the Patient Information Quality Improvement (PIQI) Framework

April 17, 2024 Clinical Architecture Episode 30

When looking for a practical way to assess the quality of patient information from external sources, Charlie Harp found that existing frameworks were either too general or too specific to certain databases or schemas. So, he created the Patient Information Quality Improvement (PIQI) framework. In the second episode of the "Data Quality in Healthcare" series, Charlie gives a high-level overview of the PIQI framework and shares some use cases for this structure. Tune in as he shares the ultimate goal of the framework, which is to enhance the quality of patient information to provide better outcomes and care across the healthcare industry.

Contact Clinical Architecture

• Tweet us at @ClinicalArch
• Follow us on LinkedIn and Facebook
• Email us at informonster@clinicalarchitecture.com

Thanks for listening!

Charlie Harp:

Hi, I am Charlie Harp and this is the Informonster Podcast. On this episode of the Informonster Podcast, I am going to continue my series on the Patient Information Quality Improvement framework. In the first episode, I laid out the issues we have with data quality in healthcare and why data quality is important. In this episode, I'm going to provide a summarized overview of the approach, the use case, and the concepts behind the Patient Information Quality Improvement framework, which I'm going to call the PIQI framework, PIQI, because it's just easier for me to say. So when this first started, what I was looking for was a way to just generally look at the quality of information that you receive from somebody else that is meant to represent a patient and, primarily, their clinical situation. And so I started looking at all these different frameworks and papers and standards around data quality, hoping that I would find something that would just slot right in, and I could create something that would put it into practice and I'd be done.

(01:21):

The problem is, what I found when I looked across the spectrum of research and tooling and frameworks was that things tended to fall into one of two camps. One camp was things that were very academically oriented; they were very general in what they talked about. They didn't have a lot of practical examples of what you mean by this and what you are applying it to. And so those things were either kind of super general around data quality or academic theses on what quality means, and is this quality and is that quality? I needed something that I could practically apply. I needed a guidebook, essentially. So then I looked at other things that were very purpose-built. For example, there are a lot of qualitative frameworks designed around OMOP, and those are fine, but a lot of those things are very prescriptive to OMOP or very prescriptive to a relational database, and they tend to lean more on schema quality or data quality from a database perspective.

(02:32):

And that wasn't exactly what I was looking for. So I decided to step back and look at the use case itself that I was trying to solve. And in summary, here's the use case that I wanted to solve for. I wanted a data quality framework that I could apply to a collection of data points that I was receiving from someone else that was intended to give me a summary overview of the patient's clinical and demographic makeup. Because what I was trying to do with this use case is this: I have a clinical data repository and I have a patient or patients in that clinical data repository. I want to enhance and enrich and complete that picture by taking in data from other places. And I want to make sure, before I bring that data in, that it's good. I want to make sure that it's well formatted.

(03:32):

I want to make sure that the things that are numbers are numbers, that it's using terminologies that are compatible with what I'm trying to do. And generally, I want to make sure there's nothing weird in the data that is going to corrupt the data that I've been collecting. Because even with data provenance, when you start pulling data in and adding it into a clinical scenario or into a repository, it's hard to unmake the sausage, if you will. It's hard to unweave everything when you've mushed everything together. And so the way to keep yourself from having to deal with that scenario is to stop it from happening, stop the bad stuff from getting in in the first place, in which case you are preserving the quality of the data that you have, as good as it is. So that was the first thing. That was the general use case.

(04:28):

I want to take data from elsewhere and I want to bring it into my data repository. The second part of the use case is I really don't want this to be dependent on schema. So I don't care whether you're giving me OMOP or FHIR or CCDA or HL7 or CDISC. I want to be able to take the data and I want to be able to assess its quality without worrying about the schema that it came from or the schema that it's going into. Now, I know that when you do that, it creates a least common denominator effect, and that means that there are only certain things that you can validate, but the truth is when you're going canonically from one schema to another, that's going to happen regardless of what you do. But there are a handful of things that we can agree on that are ubiquitous in these different domains of data that we have for patients.

(05:26):

And within those ubiquitous attributes, we can agree that these things have to be of certain quality. The second use case feature is I need it to be schema agnostic or let's say inbound and outbound schema independent. And that means that you're working with a minimal data model that is just there to satisfy the purpose of qualitative analysis. So that's thing number two. The third thing is I wanted it to be flexible enough so that I can create things that evaluate, let's call them assessments, and I can assemble those assessments against the data with different criteria depending upon what I'm trying to do. And so I call that an evaluation criteria. I want to be able to say, here are the things that I can measure. Because if you look at patient data, for example, take a lab result, there's only so many things you can measure in a lab result.

(06:26):

You can validate that the test code is valid and that it's interoperable. You can validate that the unit of measure is UCUM compliant. You can validate that the result, where appropriate, is an integer or a number. There are things that you can evaluate when you're bringing data in, and those things are always going to be there. The question is, do you care about those things? Do you care about whether the lab is a LOINC code or not? Do you care about whether the unit is UCUM compliant or not? And depending upon the evaluation criteria, it might be a showstopper. It might not be a showstopper, but that doesn't mean I want to have to reinvent the wheel. If I have two use cases, and I'll throw out something very specific, USCDI version one versus USCDI version three, then to be USCDI version one compliant, there are certain things that I must say are valid.

(07:25):

There are certain things I must say are true. To be USCDI version three compliant, there are more things that have to be valid. There are things that have to be coded in a certain way. In the Venn diagram of assessments, there are a number of overlapping elements between USCDI version one and version three. I shouldn't have to reinvent the wheel for USCDI version three. If I've already built assessments that do the things that USCDI version one needs, I should just be able to leverage those assessments because they're still valid in USCDI version three, assuming they're still valid in USCDI version three. And if I'm doing USCDI version three and I want to be more rigorous than that and call that Charlie's quality level, then I should be able to use those assessments and add in my new assessments that maybe are looking at plausibility or other things that aren't contemplated in USCDI version three.

(08:21):

So the next thing is I wanted to have an architecture or design where the assessments on the attributes and the elements in the data payload are independent entities from the evaluation criteria, which orchestrates those assessments and weighs their relevance. And the last thing is I wanted it to be simple enough so that if people build these assessments, the assessments can be shared across entities. In fact, maybe even standardized. Because we spend a lot of time in healthcare recreating the wheel in every individual silo, and there are certain things where, if we can agree on a standard of quality and we can agree on a collection of assessments that we use to measure that quality, then as an industry we can put a gauge, a quality gauge, on data that somebody is sending and determine whether or not the data they're sharing is good quality.

(09:28):

And the final part of the use case: saying your quality is bad, and you failed this and you failed that, is a step in the right direction. But the other thing that I thought was really important was this idea of a taxonomy of qualitative issues. This is the idea that, for example, when I say that something is not interoperable or it's incompatible, when we agreed that the lab was going to be LOINC and the drug was going to be RxNorm and the conditions or the diagnoses were going to be SNOMED codes, those things are all the same type of problem. So there are different domains and there are different attributes or fields in an element, but at the end of the day, they are interoperability problems. They are conformance issues, and if you measure your quality and 90% of your qualitative issues are conformance issues, that's something that informs the person sending the data that you're not conformant.

(10:28):

That's your problem. If you're sending me data and the things that are supposed to be numbers are not numbers, well then those are accuracy problems. Those are validity problems, and you're doing something wrong. And so the idea of having a taxonomy is it doesn't just measure the qualitative issues at an attribute level or an element level, but it also kind of gives you an idea of these are the type of issues that you're creating. You're creating issues around the accuracy of the data or the availability of the data or the conformance of the data because a lot of data that we use in healthcare is around code systems or you're having issues with the plausibility of the data. You can't have a birth date in the future. This thing doesn't happen after this thing. It only happens before this thing. So these are all issues that when you are looking at the data and you're trying to determine how usable it is and you're trying to convey that back to the source of the data, understanding the dimension of a qualitative issue and the category that it falls under, I think is enlightening.
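
As a rough illustration of the taxonomy idea described above, here is a minimal sketch in Python. The four dimension names follow the ones mentioned in this episode (accuracy, availability, conformance, plausibility); the categories nested under each dimension are illustrative assumptions, not the official PIQI taxonomy.

```python
from enum import Enum


class Dimension(Enum):
    """Top-level quality dimensions named in this episode."""
    ACCURACY = "accuracy"          # e.g., something that should be a number is not a number
    AVAILABILITY = "availability"  # e.g., a required attribute or element is missing
    CONFORMANCE = "conformance"    # e.g., a code is not from the agreed code system (LOINC, RxNorm, SNOMED)
    PLAUSIBILITY = "plausibility"  # e.g., a birth date in the future, events in an impossible order


# Illustrative (not official) categories under each dimension, used to roll
# individual assessment failures up into the taxonomy for reporting.
EXAMPLE_CATEGORIES = {
    Dimension.ACCURACY: ["wrong_data_type", "malformed_value"],
    Dimension.AVAILABILITY: ["missing_attribute", "missing_element"],
    Dimension.CONFORMANCE: ["unrecognized_code_system", "non_interoperable_code"],
    Dimension.PLAUSIBILITY: ["temporal_impossibility", "value_out_of_range"],
}
```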

(11:40):

And that also adds a dimension to our ability to look at data that we're getting in and not just say that you have an issue in the lab domain, but that the issue is around conformance and, other than that, everything looks good. So you don't have accuracy issues, which can be message formatting issues; they can be a number of things. Conformance issues are about mapping and interoperability. Plausibility issues could be that this patient is not the patient that you think it is. There are a number of reasons for that. Our goal at Clinical Architecture with the PIQI framework is to establish the framework, build out the components, and then actually push data through it. If you know me, one of my sayings is the proof of the pudding is in the tasting. So if we build something and you start to try to use it with real data, it doesn't take very long for you to determine whether you're pointed in the right direction.

(12:41):

And so where we are right now with the PIQI framework is that we've kind of established, let's call it, its inaugural design, and we're going to be pushing data through it to determine whether or not it's viable. What I thought I would start with today is to give a high-level overview of the components of the PIQI framework for consideration, and in future episodes, what I'll do is get more into the weeds of each one of these components and what they do. I've already told you the use case, so here it is: the PIQI framework is structured into four distinct functional components. Each of these components has an important role in helping to deal with, very specifically, patient information quality issues in healthcare. The first component is the taxonomy of dimensions. Now, this component is designed to establish a well-organized taxonomy that categorizes the various dimensions to be measured in healthcare data quality assessment.

(13:48):

These dimensions encompass a wide range of attributes such as accuracy, availability, conformance, and plausibility, so that we can focus on those aspects of data quality when we're performing an assessment or an evaluation. The hope is that by categorizing these dimensions, we'll enable a better understanding of the nature of the failure that's creating the quality issue in the first place, and I'll go more into each one of those dimensions and the categories within each dimension in the next Informonster Podcast. The next component is the information model. I considered a number of information models, but what I thought made the most sense is that when you look at something like USCDI version three, there is a kind of implied model in US Core that basically says these are the things in each domain that we care about, and the simple information model, the PIQI information model, is basically those things.

(14:52):

It covers the things that we care to measure, and you could extend the model to evaluate other things that you care to measure, as long as you're willing to articulate the nature of those things so that you can align them with dimensions that can measure them and apply simple assessment modules to them. But the idea is this simple information model is going to serve as the core data structure that we would do a data quality assessment against, and it includes data elements related to patient demographics, medical history, medication history, treatment records, and other pertinent healthcare information. And the idea is, if the model is a simple, least-common-denominator, minimal data set for those critical things that we need to evaluate, then creating standard transformations from standard messaging formats like FHIR, OMOP, CCDA, and HL7 should not be rocket science. I should be able to pluck those things out, put them into the simple model, run that entire patient model through an evaluation, and get a report card for an evaluation criteria.
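
To make the "pluck those things out and put them into a simple model" idea concrete, here is a minimal sketch of what such a least-common-denominator patient model and one transformation might look like. The class names, fields, and the simplified FHIR Observation paths are assumptions for illustration, not the actual PIQI information model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodedValue:
    """A code plus the system it came from (e.g., LOINC, RxNorm, SNOMED CT)."""
    code: str
    system: Optional[str] = None
    display: Optional[str] = None


@dataclass
class LabResult:
    test: CodedValue                      # many criteria will expect this to be a LOINC code
    value: Optional[str] = None           # kept as text; an assessment decides whether it parses as a number
    unit: Optional[str] = None            # an assessment decides whether this is a valid UCUM unit
    effective_date: Optional[str] = None


@dataclass
class PatientPayload:
    """Minimal, schema-agnostic container that an evaluation would run against."""
    birth_date: Optional[str] = None
    gender: Optional[str] = None
    lab_results: List[LabResult] = field(default_factory=list)
    # medications, conditions, and other domains would follow the same pattern


def lab_from_fhir_observation(obs: dict) -> LabResult:
    """Pluck the fields we care about out of a (simplified) FHIR Observation resource."""
    coding = (obs.get("code", {}).get("coding") or [{}])[0]
    quantity = obs.get("valueQuantity", {})
    return LabResult(
        test=CodedValue(code=coding.get("code", ""), system=coding.get("system")),
        value=str(quantity["value"]) if "value" in quantity else None,
        unit=quantity.get("unit"),
        effective_date=obs.get("effectiveDateTime"),
    )
```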

(16:00):

So that's the idea behind that. The next thing is this concept that I talked about, which is a simple assessment module. A simple assessment module is a unit of assessment that can be applied to certain types of things, whether it's a coded entity attribute, a simple attribute, an element in a particular domain, or the entire patient payload that you're looking at. It takes one of those things as an input and it returns a pass or a fail. That's it. The idea is you have the input object you're evaluating. You may have a parameter, for example, if you're evaluating conformity, you could say, here's the code system identifier that I'm expecting, and you would either pass or fail that, and the net result of the assessment module is: did it pass or did it fail? Because ultimately what we're looking for is a simple numerator-denominator evaluation.

(16:51):

We can use that to assemble some kind of arbitrary scoring system if we want, but at the end of the day, what I'm saying is, did this pass the assessment or did it fail the assessment? The nice thing about that is when you're creating a simple assessment module, the contract for that interface is known. I have a certain type of thing that I can give to the assessment. I have a list of parameters that I can request that you pass in that are relevant, which means I can create an assessment that can do more than one thing, for example validate different code systems. And I know that my result is a pass or fail, true or false. That also means when I'm consuming a simple assessment module, I don't have to have a lot of external training or knowledge, and if I build an assessment, then theoretically I can share that assessment with you.
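
Here is a hedged sketch of what that pass/fail contract could look like in code. The interface, the class names, and the code system identifier passed as a parameter are illustrative assumptions, not the framework's actual interface.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class SimpleAssessmentModule(ABC):
    """Takes one thing (an attribute, an element, or the whole payload) and returns pass/fail."""

    @abstractmethod
    def assess(self, subject, **parameters) -> bool:
        ...


class CodeSystemConformance(SimpleAssessmentModule):
    """Passes if a coded value comes from the expected code system (a conformance check)."""

    def assess(self, subject, expected_system: str = "") -> bool:
        return bool(getattr(subject, "system", None)) and subject.system == expected_system


class NumericValue(SimpleAssessmentModule):
    """Passes if a value parses as a number (an accuracy check)."""

    def assess(self, subject, **parameters) -> bool:
        try:
            float(subject)
            return True
        except (TypeError, ValueError):
            return False


# Usage: the same conformance module can be reused with different parameters.
loinc_check = CodeSystemConformance()
is_loinc = loinc_check.assess(
    SimpleNamespace(code="4548-4", system="http://loinc.org"),  # stands in for a coded test attribute
    expected_system="http://loinc.org",
)
```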

(17:40):

And I know there are some architectural and technological barriers there, but I feel like we could find a way through that. That's what a simple assessment module is. And the next part of that is the evaluation criteria. The evaluation criteria is basically a list of simple assessments sequenced against the patient information model, along with whether or not each pass or fail gates that thing or its parent and, if you want to apply a score, what the arbitrary score impact of that pass or fail is. Now, I'm not going to get into this in detail because that's going to be in a future podcast, but the idea here is that the evaluation criteria, let's say USCDI version three, says: when you get a patient payload, here are the assessments you're going to do and here's the order you're going to do them in, and if one fails, here's what that means, either in terms of score or in terms of gating the data.

(18:39):

The nice thing about an evaluation criteria is it does two things. It generates a report card so that, when you accumulate a bunch of these messages over time, you can say, when I look at a thousand patients, here are the areas where I saw issues, here are the areas where it was good, and here's the nature of that impact. The bottom line is it does that, but it also says that you had seven lab results and five of them were okay and two of them were bad. You can decide when you're evaluating that whether you're going to accept the five that were good, or whether the fact that two failed gates the entire patient from coming into your data warehouse, or gates all lab data from coming into your data warehouse. So there's the gating potential. There's also the scoring potential, because in reality, what I would assume would happen in the future, if we have a way to assess the quality of somebody's data stream, is that we would turn the hose on and start receiving data.
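
As an illustrative sketch rather than the actual PIQI evaluation criteria format, here is one way the sequencing, gating, and numerator/denominator scoring described here could fit together. The entry structure, dimension labels, and the two example criteria (the LOINC system URI and the numeric check) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CriterionEntry:
    name: str
    assessment: Callable[[dict], bool]  # takes one subject, returns True (pass) or False (fail)
    dimension: str                      # taxonomy dimension a failure is counted under
    gates_element: bool = False         # if True, a failure excludes the whole element


def evaluate_labs(lab_results, criteria):
    """Run every criterion against every lab result and build a simple numerator/denominator report card."""
    passed = failed = 0
    failures_by_dimension = {}
    accepted = []
    for lab in lab_results:
        element_ok = True
        for entry in criteria:
            if entry.assessment(lab):
                passed += 1
            else:
                failed += 1
                failures_by_dimension[entry.dimension] = failures_by_dimension.get(entry.dimension, 0) + 1
                if entry.gates_element:
                    element_ok = False
        if element_ok:
            accepted.append(lab)
    total = passed + failed
    return {
        "score": passed / total if total else 1.0,   # simple numerator over denominator
        "failures_by_dimension": failures_by_dimension,
        "accepted_elements": accepted,
    }


# Illustrative criteria and data; in practice the criteria would be defined by something like USCDI.
criteria = [
    CriterionEntry("test code is LOINC", lambda lab: lab.get("system") == "http://loinc.org",
                   dimension="conformance", gates_element=True),
    CriterionEntry("result is numeric", lambda lab: str(lab.get("value", "")).replace(".", "", 1).isdigit(),
                   dimension="accuracy"),
]

report = evaluate_labs([{"system": "http://loinc.org", "value": "6.8"},
                        {"system": "local", "value": "positive"}], criteria)
```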

(19:38):

We would divert it from the data repository and we'd run it through this evaluation process. The evaluation process would tell us the quality of the data so we can decide if we're willing to accept the data into our data warehouse. And so you can imagine that there's a gauge that says, oh, we turned on the stream from X, Y, Z, and the data is scoring at 20% quality. You can go back to the source, whether you're buying the data, whether you're just accepting the data, or whether they want to be a member of a community that is sharing data, and you can say, yeah, you have to be at 85% quality. Ideally, they're at a hundred percent quality, depending upon the nature of your evaluation criteria, but you can say, well, you have to be at 80% quality before we're going to start using your data, and I'm not going to pay top dollar for data that's got 20% quality.

(20:29):

The way I think it'll work in the future is you won't necessarily be gating individual records or attributes. Maybe you will, I mean, at least with this model, you can do that, but ultimately it's going to allow you to determine if somebody is sharing good data before you even start to consider ingesting that data into your clinical data repository or into your environment. Those are the components of the PIQI framework, and the benefit of this approach and the PIQI framework in general is, number one, it's standardized. With this approach, the model is standardized, the taxonomy is standardized, and the simple assessments will get to a point where there is a standard arsenal of simple assessments, and they're not rocket science. They're things like: is this an integer? Is it a decimal? Is it a valid UCUM unit? These things are known quantities, and so for all intents and purposes, they're standard, and when we talk about them in the industry, we'll be able to talk about them like they're a standard.

(21:30):

It won't be a, well, how's your quality? I don't know. It's got stuff. We'll be able to say that my quality generally is good. I'm having some trouble with UCUM compliance, and we're trying to figure out what that's about. It's probably something that's coming from our lab system. We're going to nail it down. The next thing is this model is hyper-focused on patient data, and it's not written in stone. My expectation is as we get more sophisticated and we share things, healthcare evolves by incremental improvements. I talked about this last time. Disruption is difficult. We are disruption-resistant, but if we can take this minimal set of data and measure its quality, then we can start to do interesting things. We can start to look at additional data elements. We can start to do plausibility checking, which is really looking at the clinical integrity of the data to make sure that we don't let something slip in that's going to knock a patient out of a cohort or do something based upon bad data.

(22:30):

The fact that we're focused on prioritizing patient data and the things that are important to us, and expanding that over time, makes it much more relevant than ubiquitous, generic data quality. And the last big benefit from my perspective is this idea that this is a shareable thing, so I'm not interested in making a proprietary way of measuring quality. I don't know that that's acceptable at this point. I think it's about having something we can all agree on or all understand, because sometimes perfection is the enemy of good. But if we can all understand how we're measuring quality, and if I make a plausibility checker for lab results that says hemoglobin A1c can't be 10,000, for example, and I can put that out there, then we can stop spinning our wheels in all of our separate silos, and we can have a shareable, portable, relevant, standard way of looking at data in healthcare, specifically patient data in healthcare, and determining its quality.
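
A small sketch of the kind of shareable plausibility check mentioned here, using the hemoglobin A1c example. The acceptable range used below is an illustrative assumption, not a clinical reference range.

```python
def plausible_hba1c_percent(value) -> bool:
    """Plausibility check: a hemoglobin A1c reported in percent can't be 10,000.

    The 0-25 percent range here is an illustrative assumption, not a clinical reference range.
    """
    try:
        numeric = float(value)
    except (TypeError, ValueError):
        return False  # not even a number; that would surface as an accuracy issue instead
    return 0.0 < numeric < 25.0


assert plausible_hba1c_percent("6.8") is True
assert plausible_hba1c_percent("10000") is False
```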

(23:42):

If we can measure it, then you get awareness of where we're falling down. If you fix that, a rising tide in patient data quality raises all boats, and the boat that we're ultimately trying to save is providing better care and better outcomes for our population, which includes our loved ones and families. So I think that's where I'm going to call it for today. I appreciate you guys putting up with my ramblings. Stay tuned for future episodes where we'll dive a little bit into the weeds on each one of these things. And if you have any questions or comments, please feel free to reach out at informonster@clinicalarchitecture.com. I look forward to getting your feedback and hopefully collaborating with you as we start to roll this out, deploy it against data, and learn by its practical application. I also look forward to sharing that information with you. Alright, well I'm Charlie Harp and this has been another episode of the Informonster Podcast. Thanks a lot.