Pharma Council - FutureHealth

The Potential for Bias in Agentic AI Systems in Health Care and Pharma

Advertising Research Foundation

With the growth of AI comes concerns about fairness and bias in the development, training, and deployment of agentic AI systems in health care and pharma.  This episode features an interview with an expert on algorithmic fairness and bias who conducts research on responsible AI – Dr. Kalinda Ukanwa, Assistant Professor of Marketing at University of California, Irvine. Dr. Ukanwa explains how bias can arise in AI tools and systems for health care, the conditions under which biases could be exacerbated in the use of these tools, and strategies for identifying and overcoming bias in agentic AI systems for health care and pharma. 

SPEAKER_01

This is Future Health, a podcast on trends in the patient journey, what to expect in the next three to five years. The podcast is produced by the Advertising Research Foundation's Pharma Council, whose mission is to identify marketing and research challenges in the pharma industry and develop strategies to deal with them. I'm Jay Matlin, the director of the ARF Council program. AI, Artificial Intelligence, continues to make headlines and not just in the business pages. The last two episodes of Future Health were centered on understanding how agentic AI works and its implications for the healthcare and pharma industries. For this episode, we focus on fairness and bias when it comes to implementing AI, particularly agentic AI in pharma and healthcare. Our guest is Dr. Kalinda Ukanwa. She's a professor at the University of California Irvine, and she conducts research on responsible AI, algorithmic fairness and bias, and the implications of algorithmic bias for firms and consumers. Welcome, Kalinda. Thank you. So to start off, how do you define bias in your work and what does it mean in the context of AI technologies?

SPEAKER_02

Bias, in general, means some kind of deviation from a neutral, fair, or normative standard that systematically favors certain outcomes, persons, or groups over others. In the context of AI, the more precise definition is that an AI tool or algorithm systematically produces errors that favor some groups or individuals over others. People usually think about it in terms of sociodemographic groups; it could be based on race, ethnicity, gender, social status, or age.

SPEAKER_01

So why is that important?

SPEAKER_02

It's hugely important because, ultimately, bias compromises the safety of the AI tools that we use. It becomes all the more important that we can use these tools and trust that they'll produce outputs, and help us, in ways that are reliable, fair, trustworthy, and safe. This is really part of the field I do research in, called responsible AI. To help people understand why this is so important, I like to use the analogy of another huge invention of the past: the automobile, which really transformed society. When the automobile was invented, it allowed us to go farther, faster, and do things we were never able to do before, and it revolutionized the way we did things, which was wonderful. But at the time, people didn't really think about some of the safety implications: seat belts, roadways, stoplights, stop signs, guardrails. Those things were put into place later, as it became clear with the use of the automobile that there were issues that needed to be addressed to make it safer and more trustworthy to use, while still extracting all the benefits of the auto. Analogously, we need to do the same with AI. Here is this revolutionary invention; it's already allowing us to do things we weren't able to do before, and we've only seen the beginning of its potential. But we also need to immediately think about how we make it safer, so that we're not going off the guardrails.

SPEAKER_01

But how does this happen? How do biases arise in AI tools and systems?

SPEAKER_02

The field of algorithmic bias is about the pursuit of detecting and rooting out cases where algorithms produce biased outcomes. The related field of algorithmic fairness is focused on ways to fix that, ways to mitigate it. They're two distinct but closely linked fields of inquiry. In terms of how bias arises, there are different ways. I wrote a paper not too long ago that proposed a framework for thinking about it, called the 3D dependable AI framework. The three Ds represent design, develop, and deploy, which encompass the whole AI life cycle. If an individual or a firm is producing an AI tool, it's going to go through these three phases: the design phase, where you're designing the algorithm or the AI tool; the develop phase, where you're developing it so that it's ready for production; and the deployment phase, where it's out in the world being used. Bias can arise in all three of those stages. Many people think of it as, oh, it's about the data; if you fix the data, you fix the bias. I would say that's a good starting point, but it's not the only point that needs to be addressed.

So, for example, how does bias arise in design, when you're simply putting together the model, or shopping, so to speak, for a model you're going to use? There are choices made during that phase: the selection of the model or tool, the selection of the type of data you're going to use, and which particular features in the data you will keep versus discard. All of these are active choices that can ultimately influence whether bias arises. There was a paper in the journal Science that examined a healthcare algorithm, an AI tool designed to predict whether the patient it was examining warranted additional interventional health care. The researchers found that this particular AI tool was biased between white and black patients: it was routinely underdiagnosing black patients' need for additional health care. Digging deeper into why, it came down to the design of the tool. In that particular case, the algorithm's target was based on how much the patient had spent on health care in the past. As a result of that design, it would assume that those who were not spending as much were maybe not as sick. That is a specific design element of the tool. But in fact, prior spending on healthcare was not really predictive of how sick the person was and whether they needed more intervention. The disparity between black and white patients arose because black patients simply did not spend as much, which could have been due to budget limitations. There's also a cultural dimension: black community members tend not to use healthcare resources as often as their white counterparts because of issues around trust, access, equal treatment, et cetera. All of these things fed into the disparity captured in past spending. What the researchers found is that if they changed that design element, the target, or what you can think of as the dependent variable, to something else, namely how many actual symptoms each patient was showing, the bias almost completely disappeared.
I'm using that as an example of how bias can arise in the design stage. It can also arise, of course, from the data, in the development phase of the process. If there is little or no representation of the people you plan to use the algorithm with, then we run into issues with the algorithm not being able to predict well for people who are not commonly represented; it's going to assume that cases look like whatever the majority group in the data looks like. And it can happen in the deployment phase of the process, where you've built the tool and now you set it out into the world for people to use. Say you have a tool that was designed for one particular group: an all-boys school hired a company to develop an admissions AI geared toward male students. Maybe the algorithm that's developed is fair and appropriate for that all-boys school, but now you take that same algorithm and apply it to an all-girls school or a co-ed school. You're using an algorithm in a context it was not designed for, and because it has not been modified, it can produce biases, because it was trained on something else.
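
To make the design point concrete, here is a minimal, hypothetical Python sketch of the kind of effect the Science study describes. All of the data, the effect sizes, and the group-correlated proxy feature are invented for illustration; this mimics the published finding in stylized form, not the actual algorithm that was studied.

```python
# Hypothetical sketch: same features, two different target variables.
# Everything here is invented; it only illustrates the design choice.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                 # 1 = group that spends less
severity = rng.gamma(2.0, 2.0, n)             # true illness burden (unobserved)
symptoms = severity + rng.normal(0, 0.5, n)   # observed symptom count
proxy = group + rng.normal(0, 0.3, n)         # feature correlated with group
# Equal severity, but systematically lower spending for one group
# (access, trust, budget constraints).
spending = severity * np.where(group == 1, 0.6, 1.0) + rng.normal(0, 0.5, n)

features = np.column_stack([symptoms, proxy])
truly_sick = severity > np.quantile(severity, 0.9)

for name, target in [("target = past spending", spending),
                     ("target = symptom count", symptoms)]:
    scores = LinearRegression().fit(features, target).predict(features)
    flagged = scores > np.quantile(scores, 0.9)   # top decile gets extra care
    rate0 = flagged[truly_sick & (group == 0)].mean()
    rate1 = flagged[truly_sick & (group == 1)].mean()
    print(f"{name}: flag rate among the sickest 10% "
          f"-> group 0: {rate0:.2f}, group 1: {rate1:.2f}")
```

The only thing that changes between the two runs is the target variable. With spending as the target, the model learns to score the lower-spending group down; with symptoms as the target, the gap largely closes.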

SPEAKER_01

Within that framework you just laid out, design, development, deployment, you gave an example a minute ago of how bias can affect the design of an AI system. How can biases arise in AI tools and systems for healthcare in development and deployment?

SPEAKER_02

Say it's some kind of tool that's going to look at medical images for diagnosis, and say the training of this tool was done only on men. Men have, on average, a different physiology from women. If the tool is trained on what a lung looks like in men, it's going to assume that a healthy lung looks like the lungs of men. Then, when it encounters images of lungs from women, any deviation from that makes it more likely to make an error in diagnosis, simply because it's not familiar with what the lungs of another group look like. In the deployment phase, something similar can happen with this same simple example. If the tool is intentionally developed to diagnose lung issues in men only, which is a fair scope, then that needs to be disclosed; that needs to be stated as a limitation to everybody who uses it. But if it's deployed out in the world and the users are not aware of that, and they say, oh, this is a scanner AI to diagnose lung disease, let's use it on everybody, now it's being used on women, it's being used on children, and it's more likely to produce systematic errors for women and children because it wasn't designed for that context. Once again, bias can arise.

SPEAKER_01

The example that you just gave had to do with lung conditions. And of course, healthcare data can come in various forms. It can come in text, but there are also other forms like medical images, genomics, wearables data. How can bias manifest when AI processes these non-textual types of healthcare data?

SPEAKER_02

There's already been plenty of research showing that, for visual data like images and video, current recognition technologies can show biases across demographic groups. The same thing applies to the other types of data you mentioned, like genomics and wearables. Just about all of AI is centered on one principle: it is trained on the average case in the data it's fed. Whatever you feed and train the AI on becomes its average case, and anything that looks different from that becomes something the tool thinks requires different treatment.

SPEAKER_01

Can there be biases in healthcare tools that arise because of the compositions of people who are using those tools?

SPEAKER_02

Absolutely. That can definitely arise, and it produces what are called data deserts. A data desert is basically an area where there's simply a lack of data for the algorithm to train on. After all, the thing about artificial intelligence that makes it AI is that it's continually learning, continually being fed, kind of like humans. So if it's only pulling in information from a specific group and rarely, if ever, gets information from other groups, it's going to produce more errors, because it simply doesn't have as much information to learn from.
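
As a rough illustration of the mechanism, here is a small, hypothetical Python sketch. The two "groups" and their geometry are invented; the point is only that a model fitted mostly to one group's data tends to make more errors on the group sitting in the desert.

```python
# Hypothetical "data desert": one group is barely present in training,
# so the model's error rate for that group tends to be higher.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_group(n, shift):
    # Each group's healthy/unhealthy boundary sits in a different region
    # of feature space (an invented stand-in for physiological differences).
    x = rng.normal(shift, 1.0, (n, 2))
    y = (x[:, 0] + 0.5 * x[:, 1] > 1.5 * shift).astype(int)
    return x, y

x_a, y_a = make_group(9800, shift=0.0)   # well-represented group A
x_b, y_b = make_group(200, shift=3.0)    # data desert: scarce group B

model = LogisticRegression(max_iter=1000).fit(
    np.vstack([x_a, x_b]), np.concatenate([y_a, y_b]))

x_a_test, y_a_test = make_group(2000, 0.0)
x_b_test, y_b_test = make_group(2000, 3.0)
print(f"error rate, group A: {1 - model.score(x_a_test, y_a_test):.2f}")  # low
print(f"error rate, group B: {1 - model.score(x_b_test, y_b_test):.2f}")  # much higher
```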

SPEAKER_01

Turning now a little bit more to the third D, deployment. You mentioned feedback loops a minute ago. Under what conditions could feedback loops in the use of AI in healthcare and pharma improve or exacerbate bias?

SPEAKER_02

Feedback loops could actually improve the product. It would take a little bit more intervention, but there's potential there. There are also opportunities in terms of synthetic data, which is something the audience may not be aware of: you can generate data that is not actual or real, but that looks like, or mimics, the missing population. So in deployment, those are a couple of ways you can actually use aspects of the deployment to improve the system.

SPEAKER_01

So deployment may actually be an opportunity for these systems to correct biases.

SPEAKER_02

A hundred percent. Digital twins and other forms of synthetic data have both promise and peril around them, depending on how they're used. Some of the perils associated with synthetic data come back to the question of what it is modeling; what is the data it's modeling? If there was bias in the originating data, then its twin, its synthetic version, is also more likely to contain that bias. But if it's being used to augment your current data, filling in the holes by supplementing it with information from lesser-seen populations or lesser-seen cases, then you're creating greater representation. You have your original data plus the holes being filled by the synthetic data, and now you have an opportunity to reduce those biases.
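
Here is an equally hypothetical sketch of the augmentation idea, reusing the invented two-group setup from above. A naive per-class Gaussian fit stands in for a real synthetic-data generator, and the caveat from the conversation applies: if the original records are biased, the synthetic copies inherit that bias.

```python
# Hypothetical sketch: filling a data desert with synthetic look-alikes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

def make_group(n, shift):
    x = rng.normal(shift, 1.0, (n, 2))
    y = (x[:, 0] + 0.5 * x[:, 1] > 1.5 * shift).astype(int)
    return x, y

x_a, y_a = make_group(9800, 0.0)   # well-represented group
x_b, y_b = make_group(40, 3.0)     # severely under-represented group

# Fit a simple Gaussian to the scarce group's records, per outcome class,
# and sample synthetic records to fill the hole in the data.
synth_x, synth_y = [], []
for label in (0, 1):
    real = x_b[y_b == label]
    new = rng.multivariate_normal(real.mean(axis=0),
                                  np.cov(real, rowvar=False), size=2500)
    synth_x.append(new)
    synth_y.append(np.full(2500, label))

x_b_test, y_b_test = make_group(2000, 3.0)
for name, x, y in [
    ("original ", np.vstack([x_a, x_b]), np.concatenate([y_a, y_b])),
    ("augmented", np.vstack([x_a, x_b, *synth_x]),
     np.concatenate([y_a, y_b, *synth_y])),
]:
    model = RandomForestClassifier(random_state=0).fit(x, y)
    # The scarce group's error typically drops once its region is covered.
    print(f"{name}: group-B error = {1 - model.score(x_b_test, y_b_test):.2f}")
```

Nothing here guarantees the synthetic records are faithful: if the 40 original records were themselves skewed, the 5,000 synthetic ones would faithfully reproduce the skew.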

SPEAKER_01

I want to move on now to agentic AI specifically. Agentic AI was the subject of the two previous Future Health podcasts with Nate Nichols. Just to start off, can you briefly describe or define agentic AI?

SPEAKER_02

Agentic AI is really still AI. It's AI that is able to accomplish multiple tasks on its own. It's autonomous: it doesn't need somebody to say, do A, B, C, D. It knows those tasks on its own and comes back with a solution to what is generally a more complex problem.

SPEAKER_01

In the healthcare context now, do you see any potential risks of bias in healthcare providers' use of agentic AI systems in some of the applications that Nate Nichols talked about, such as assessing patient state and diagnoses, prescribing medications, or recommending ways to manage treatments over time?

SPEAKER_02

There's potential, and not only within the contexts you talked about, but across agentic AI systems, because at the end of the day the sources of bias are going to be in the design of the tool, the development of the AI tool, or the deployment of it, and that certainly applies to agentic AI. And I would add one more thing: in my opinion, agentic AI has the potential for compounding bias, because it's doing multiple steps. It's not just doing one thing.
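
The compounding point can be shown with back-of-the-envelope arithmetic. The per-step accuracies below are invented, and the calculation assumes independent steps that must all succeed, but it shows how a small per-step gap can grow across a multi-step agentic pipeline.

```python
# Invented per-step accuracies for two groups across a multi-step
# agentic pipeline; assumes independent steps that must all succeed.
steps = ["assess state", "diagnose", "select treatment",
         "set dosage", "schedule follow-up"]
acc_a, acc_b = 0.97, 0.93   # assumed per-step accuracy for groups A and B

end_a = acc_a ** len(steps)   # ~0.86
end_b = acc_b ** len(steps)   # ~0.70
print(f"per-step gap:   {acc_a - acc_b:.2f}")
print(f"end-to-end gap: {end_a - end_b:.2f}  (A: {end_a:.2f}, B: {end_b:.2f})")
```

Under these assumptions a 4-point per-step gap becomes a roughly 16-point gap by the end of a five-step pipeline.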

SPEAKER_01

Looking now then at agentic AI use by pharma companies, do you see any potential risks of bias there in some of the ways that Nathan Nichols described, such as drug discovery and development, monitoring compliance with medical regimens or mining patient data?

SPEAKER_02

There are similar issues that point back to the theme of design, development, and deployment, and this certainly applies to pharma. Say you're mining patient data, for example, trying to collect data that will lead to the design of a drug. If the data itself is not representative of the populations you intend the drug for, and it's already limited, then it's going to lead the pharma company down a path where its output is automatically biased. I know that's a challenge pharma quite often has: collecting enough samples from underrepresented populations, smaller populations, but populations that are still large enough to be the subject of the pharmaceutical that's ultimately produced.

SPEAKER_01

That's interesting, Kalinda, because you talked at the beginning about concerns about algorithmic bias being focused on demographic groups. But in the case of healthcare, there is also the risk of algorithmic bias that's rooted in health status or health conditions. If, for instance, the data set that's available covers people with a certain manifestation of a condition, or certain stages of a disease, and you're using that to try to develop drugs for a later stage of the disease, there may be a bias problem. It's not just demographic; it could also have to do with health status and conditions.

SPEAKER_02

What I should make clear is that algorithmic bias and fairness are quite often associated specifically with sociodemographic groups, but they don't necessarily have to be limited to that. Like you said, it could be something that is not about the traditional demographic groups of age, gender, race, et cetera, but about stage of disease, which may or may not be correlated with those things. If it produces outcomes that have no relationship to the objective, it can produce a form of bias.

SPEAKER_01

Let me ask you about another potential concern about the risks of bias in pharma companies' use of agentic AI, which is decisions about where to invest. Is there a potential risk of bias from agentic AI there?

SPEAKER_02

Many firms see the importance of this. They think of it as an ethical consideration they must address, and maybe, a step further, a legal consideration, because they know they don't want to be sued over bias that arises from their algorithms. But in the research I'm doing, I make the case, and show conditions, where it is also an economic consideration. Bottom line: profits. For example, in one of my papers I show that fairness perceptions can arise from algorithmic bias even if it's not intended by the firm, and even if users are not conscious that there's algorithmic bias coming out of the algorithm they're using, and that those perceptions can influence consumers' decisions down the line about what they use. It would certainly be prudent for firms to invest in bias mitigation approaches, because they don't want to see their profits disappear, their demand disappear; they want their algorithmic tools to be more accurate. That means investing in mitigation techniques for the algorithm, not just building it and setting it out into the world: investing during the design phase, the development phase, and the deployment phase, and investing in continuous monitoring, because sometimes conditions will change. You designed it thinking the world is like this, and then the world changes tomorrow, and suddenly what was relevant in your algorithm and worked well today doesn't work tomorrow. There needs to be continual monitoring to make those adjustments as well.
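
What continuous monitoring might look like in practice can be sketched in a few lines. This is a minimal, hypothetical example: the disparity metric, the tolerance, and the simulated "world change" are all illustrative choices, not standards.

```python
# Hypothetical continuous-monitoring loop: recompute a group disparity
# metric on each batch of production decisions and alert on drift.
import numpy as np

TOLERANCE = 0.10  # illustrative cap on the gap in positive-decision rates

def disparity(decisions, group):
    """Absolute gap in positive-outcome rates between groups 0 and 1."""
    return abs(decisions[group == 0].mean() - decisions[group == 1].mean())

def monitor_batch(week, decisions, group):
    gap = disparity(decisions, group)
    status = "ALERT: investigate and adjust" if gap > TOLERANCE else "ok"
    print(f"week {week}: disparity = {gap:.2f} ({status})")

# Simulated weekly production batches; conditions drift in week 3,
# the "world changes tomorrow" case described above.
rng = np.random.default_rng(3)
for week, p_group1 in enumerate([0.48, 0.47, 0.30], start=1):
    group = rng.integers(0, 2, 1000)
    p = np.where(group == 0, 0.50, p_group1)   # approval probability by group
    decisions = (rng.random(1000) < p).astype(int)
    monitor_batch(week, decisions, group)
```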

SPEAKER_01

What are some of the strategies and research methodologies you use to uncover or detect bias in the first place?

SPEAKER_02

In the world of algorithmic fairness and algorithmic bias, there are certainly standards out in the world that I also use in my research to detect bias. They fall into two categories of fairness standards, one called group-based fairness and one called individual-based fairness, and associated with those are measures and metrics. The difference between the two is this. Individual-based standards of fairness assume that similar individuals should be treated similarly by the algorithm. As an example, say I as a woman and you as a man, Jay, are applying for a loan from a bank. If we have the same credit score, the same payment history, the same income, the same other indicators of financial stability, then the outcome for you and me should be the same. If it ends up being different, and the only thing different between you and me is our gender, that would be an example of bias or unfairness. That can be measured with individual fairness standards: is everything the same except for some demographic group or designation that has nothing to do with the outcome, which here is whether to give each of us a loan? Group-based standards, in contrast, are the more common standard used in algorithmic fairness, and there are metrics for them. We look at whether two groups, usually demographic groups, in this case gender, are getting similar outcomes. It could be conditional on qualification: of those who are qualified for loans, are we getting similar outcomes across groups? So that's the way you distinguish the two, and that's also the way you can detect bias: if there are differences, whether individual-based or group-based, that's an indication of bias.
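
Both families of checks can be expressed compactly. Below is a hypothetical Python sketch on invented loan data; the scoring rule `approve` is a deliberately flawed stand-in, not any real lender's model.

```python
# Hypothetical group-based and individual-based fairness checks.
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
gender = rng.integers(0, 2, n)        # 0 = man, 1 = woman
credit = rng.normal(680, 50, n)       # credit score
income = rng.normal(60, 15, n)        # income, in thousands

def approve(credit, income, gender):
    # Invented model with a built-in flaw: it quietly penalizes gender == 1.
    return (0.01 * credit + 0.05 * income - 0.8 * gender) > 10.0

approved = approve(credit, income, gender)
qualified = (credit > 660) & (income > 50)

# Group-based check: among qualified applicants, are approval rates similar?
rate_men = approved[qualified & (gender == 0)].mean()
rate_women = approved[qualified & (gender == 1)].mean()
print(f"qualified approval rate: men {rate_men:.2f}, women {rate_women:.2f}")

# Individual-based check: two applicants identical in every financial
# respect, differing only in gender, should get the same outcome.
print("identical applicants treated identically:",
      bool(approve(700, 65, 0)) == bool(approve(700, 65, 1)))
```

In this invented setup the group check shows a gap and the individual check returns False, which is exactly the kind of signal that would trigger a closer look at the model.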

SPEAKER_01

So given what you said, should the bias risks that we've been talking about preclude healthcare providers and pharma companies from using agentic AI systems at all? And if not, then what should healthcare providers that are considering agentic AI systems be doing to mitigate those risks?

SPEAKER_02

I feel very strongly that AI should be used. To use the automobile analogy again: after it was invented, the answer wasn't, let's not use the auto because I might get hurt. The better approach is to take a proactive set of actions to minimize some of the issues we're talking about. And I think that starts, first and foremost, with awareness. Right now, based on my assessment of the field and of the population in general, most people are not really aware that bias and fairness issues can arise from AI. So there needs to be increased awareness and literacy around that, and for healthcare providers, that would mean increasing awareness and literacy among the employees who are going to engage with AI and the employees who are going to develop AI. Beyond that, what firms and healthcare providers can do is devote interest and investment to setting up a routinized process as part of building and using these systems. We have similar frameworks within other industries. Think of TQM, total quality management, or Six Sigma back in the day: frameworks designed to monitor and raise the quality of outputs and processes. You can apply some of the same knowledge and wisdom to AI systems.

SPEAKER_01

I would now like to bring in Anna Bradfield, Senior Vice President Brand Strategy at the ad agency VML Health, and co-chair of the ARF Pharma Council, who has a question.

SPEAKER_00

If you were to leave us with a parting thought, what would you encourage healthcare professionals and pharmaceutical companies who are developing AI tools to do, in thinking about the 3Ds or anything else, so that these tools are really developed with some of these principles in mind?

SPEAKER_02

There are a number of things that healthcare professionals and pharmaceutical decision makers can do to address these issues. I hope I have convinced the audience that these are certainly ethical issues, but it goes way beyond ethics, because it affects different stakeholders: it affects firms, it affects citizens as consumers, and it affects the employees working for these firms, in terms of what they can do to improve these outcomes. I'm going to return to the analogy I made about the automobile. People were clear that the automobile would really change and transform society in multiple ways. How do we use the power of that invention while making it safer and more trustworthy for everybody? Things were brought in: roadways, signs, stops, seat belts. We have the same potential with respect to AI. We are at a point like the beginning of the automobile, where we understand its potential. There is greater potential out there in AI that we haven't even begun to imagine, but now is the time to put in those roadways, those stop signs, those safeguards, those seat belts, so that we can move forward boldly, trusting in the safety of this new technology.

So, what can firms do? What can pharma do? What can professionals do? First, become literate in AI. Take the time to learn about something more than ChatGPT, gen AI, or the buzzwords that are out there. Become literate in what responsible AI is: the discipline that involves privacy, trust, fairness, and the use of AI without creating harm. What are the key indicators in responsible AI that need to be addressed and watched for? As part of that literacy, consider learning about the framework I talked about, 3D dependable AI, which runs through the AI life cycle of design, development, and deployment. Where can fairness issues arise in each of those steps, and in your position, what role can you play to help mitigate them? And the final thing they should be doing as part of this is monitoring, because we know that conditions change, and therefore the data the algorithms learn from changes, and the conditions in which the algorithms are used change. So there should be an actual evolution and adjustment of the responsible AI part of the strategy around the use and deployment of AI. The takeaway I would like to leave is that it's important to develop a whole system and architecture in parallel with the system and architecture for developing AI, one built around responsible AI protocols. Having a team in place, or at least people who are responsible for monitoring these things within the firm, is one major step toward accomplishing the outlook we're talking about.

SPEAKER_01

So, Kalinda, do you think there should be a review board, like an IRB, or a government one like the FDA?

SPEAKER_02

There's a delicate balance between having something within the institution that looks out for these issues and, at the same time, doing the best we can not to add additional bureaucracy and layers of red tape. So I will cautiously say there is room for that, but I think it depends on the size of the firm. Healthcare providers in general are pretty large, unless you're an individual clinician with your own office, so I'm going to speak to larger firms. I think there is room to have, at minimum, a committee that keeps an eye on these types of things on behalf of the healthcare provider. I also believe, and this is connected to a paper of mine in which we examine the impact of having a third-party institution that monitors these things for other companies and then puts out a report. Think of how Consumer Reports does that with products for consumers. It could be a government body, it could be a nonprofit: a third party that can monitor these things and provide information to firms about when the algorithms they're using may be starting to create responsible AI or fairness concerns. We show conditions where that is not only beneficial to patients, or consumers, but also conditions where it's actually quite beneficial to the firm, not an overburden of bureaucracy, and worth the investment.

SPEAKER_01

So looking ahead, what new skills or training do you think will be necessary to effectively navigate this human automation partnership and maintain a focus on critical thinking?

SPEAKER_02

Yeah. Certainly for future healthcare providers, clinicians, and managers in general, the number one thing is to be AI literate. If you're going to be using the tools, you don't have to become an expert; you don't have to get a degree in computer science. But you should make sure you're aware of what AI can and cannot do, particularly the tools you're using, and you should become aware of the potential for responsible AI issues, including bias and fairness issues. That involves understanding model limitations. That involves having a general understanding of the most common bias metrics; how do you know if there is bias? There are some standard metrics out there for that. It involves using counterfactual reasoning skills. The simplest example: here's an output saying who should get additional medical intervention and who should not. Now, what if I flip the sociodemographic field on patient Jay Matlin to something else? Does the model give a different answer about whether Jay should have the intervention or not? If it does, then there's something wrong that needs to be investigated and corrected. So integrate counterfactual thinking on a routine basis. It also involves knowledge about some of the liability associated with these things, because there is liability; not only is it an ethical issue, there's a liability issue, and regulation and law are rapidly being developed around these things, so it's important to become aware of them. And the final thing is that it is helpful, and produces more success, to work in an interdisciplinary way rather than in a silo. If you're a clinician interacting with the medical equivalent of ChatGPT for what you're doing, it's important to also engage with the nurses you support, the physician assistants, and those who take in patient information for insurance and billing purposes, so that everybody is on the same page, in an interdisciplinary way, about what the impact could be.
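
The counterfactual flip test described here is easy to routinize. A hypothetical sketch, where `recommend_intervention` is an invented stand-in for whatever model is actually deployed:

```python
# Hypothetical counterfactual flip test: change only a demographic field
# in a patient record and check whether the recommendation changes.
def recommend_intervention(record: dict) -> bool:
    # Invented, deliberately flawed rule: it leans on a demographic field,
    # which is exactly what a flip test should catch.
    risk = record["symptom_count"] * 2 + record["age"] / 20
    if record["sex"] == "F":
        risk -= 1.5
    return risk >= 7

def flip_changes_output(record: dict, field: str, alt_value) -> bool:
    """True if changing only one demographic field changes the output."""
    flipped = {**record, field: alt_value}
    return recommend_intervention(record) != recommend_intervention(flipped)

patient = {"symptom_count": 3, "age": 40, "sex": "M"}
if flip_changes_output(patient, "sex", "F"):
    print("counterfactual flip changed the recommendation -> investigate")
```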

SPEAKER_01

So, Dr. Ukanwa, is there anything else that you would like to talk about with respect to the potential for bias in agentic AI systems for health that we have not yet covered?

SPEAKER_02

I think the main thing around agentic AI, and this happens to be research I'm working on, though of course it's not complete, so I don't have proof yet: I do believe, based on early evidence, that agentic AI has wonderful promise to take things to the next level with respect to AI. I think it'll be the next big boom in AI. However, it has the potential to multiply the effects of fairness and bias, because the common theme with agentic AI is that it's doing multiple tasks on its own. There needs to be attention on all of AI, but I think there needs to be even more attention with respect to agentic AI. That being said, I don't want anyone to take away, oh, then we shouldn't use it at all. I think AI has tremendous promise, and continues to have tremendous promise, for us as a society.

SPEAKER_01

Thank you very much, Kalinda. This has been fascinating. Our guest today has been Dr. Kalinda Ukanwa, professor at the University of California Irvine, and researcher on responsible AI, algorithmic fairness and bias, and the implications of these for firms and consumers. This has been Future Health, a podcast on trends in the patient journey and what to expect in the next three to five years. If you would like to learn more about the ARF Council program, please visit our website, thearf.org/communities, and click on Councils. You can also follow the Advertising Research Foundation on Facebook or on X at @the_ARF. Please join us again for more Future Health podcasts from the ARF Pharma Council. I'm Jay Matlin.