The Signal Room | AI in Healthcare & Ethical AI

AI Explainability, Algorithmic Bias, and Human-in-the-Loop Design in Healthcare | Keshavan Seshadri

Chris Hutchins | AI Strategy & Healthcare AI Expert | Season 1, Episode 9




AI explainability, algorithmic bias, and human-in-the-loop design are the three pillars that determine whether healthcare AI systems can be trusted. Keshavan Seshadri, Senior Machine Learning Engineer at Prudential Financial, builds machine learning platforms, generative AI solutions, and agentic frameworks. His core argument is that context is everything, and current models cannot safely operate in healthcare without four distinct types of context built into their design.


In this episode of The Signal Room, recorded live at Planet Hollywood in Las Vegas during the Put Data First Conference, host Christopher Hutchins, Founder and CEO of Hutchins Data Strategy Consultants, sits down with Keshavan to explore what authentic intelligence really means and how it differs from artificial intelligence. Keshavan identifies four types of context that healthcare AI systems require: patient context including health records, demographics, and disease history; task context recognizing that diagnosis, surgery, and administrative workflows each demand different approaches; operational context knowing which physicians and resources are available in real time; and institutional context covering hospital policies, governmental regulations, and compliance requirements.


The conversation covers why general purpose LLMs are not yet ready to function as specialized clinical tools, how confidence scores and risk metrics can create meaningful human-in-the-loop checkpoints, and why the asymmetric cost of diagnostic errors demands careful calibration. Keshavan explains that telling a patient they have cancer when they do not leads to additional tests, but telling a patient they do not have cancer when they do is far worse. He describes how explainability must be built into the design process from the start, not added after delivery, and how reinforcement learning from human feedback creates feedback loops that help models correct for bias over time. His three priorities for AI designers: understand what data the model was trained on, bring in as much thorough context as possible across all four dimensions, and build guardrails that are proportional to the risk of each task.


About The Signal Room: The Signal Room is a podcast and communications platform exploring leadership, ethics, and innovation in healthcare and artificial intelligence. Hosted by Christopher Hutchins, Founder and CEO of Hutchins Data Strategy Consultants. Leadership, ethics, and innovation, amplified.


Website: https://www.hutchinsdatastrategy.com 

LinkedIn: https://www.linkedin.com/in/chutchins-healthcare/ 

YouTube: https://www.youtube.com/@ChrisHutchinsAi

Book Chris to speak: https://www.chrisjhutchins.com

Keshavan Seshadri:

I just think authentic intelligence means being able to deliver solutions more so closely to the level a human would, more so in alignment with an AGI perspective. What really brings about enabling current LLM-based models would be to bring in the appropriate context. Context is important, context is everything, right? What diseases they had in the past, the demographics, and so on. That's sort of the first data point you could give the LLM as context. You need to factor in explainability and transparency while building that model, right?

Christopher Hutchins:

Like everyone, I want to welcome you to the Signal Room podcast. For our listeners, we are live at Planet Hollywood in Las Vegas, Nevada for the Put Data First Conference. We are here talking about all things AI, ethical use of AI, machine learning. It's the buzzword of the day kind of conference, I couldn't think of a better place to be. I just want to welcome Keshavan Seshadri. Keshavan is the Senior Machine Learning Engineer at Prudential Financial. For people who don't really understand what that means, maybe tell us a little bit about what you do in your role, and then we'll kind of get into some topics that are top of mind for you.

Keshavan Seshadri:

Like you said, I work on pretty much everything that's exciting. AI being the buzzword, I work right in the center of it. I work on building machine learning platforms, generative AI-based solutions, agentic frameworks, and anything that's new. Yeah.

Christopher Hutchins:

You're not hiding your excitement. I like that. It is exciting work. I think people are a little bit fearful of it, but I think conversations like we're having today are important because it starts to help people understand the only thing to be fearful of is not learning, honestly. And so the topic of what we decided to talk about is authentic intelligence and designing AI that understands context. This is a little bit of a nuance that I don't typically hear from a machine learning expert about. But if you don't mind, could you talk about authentic intelligence and what does that really mean to you? And how is it different from artificial intelligence?

Keshavan Seshadri:

I just think authentic intelligence means being able to deliver solutions more so closely to the level a human would, more so in alignment with an AGI perspective. And what really brings about enabling current LLM-based models would be to bring in the appropriate context, which is where we're at. And this involves both short-term memory context and then there's long-term memory context as well. And I think in this field, in terms of healthcare, context is one of the most important things because there are different types of context one could bring about. And when you do design any sort of AI solution, be it an agentic AI or a more autonomous AI system, context is important, context is everything.

Christopher Hutchins:

Yeah, totally. You mentioned the healthcare field. In clinical and administrative workflows, where does context break down the most, do you think? And how can AI be designed to recognize its own limits?

Keshavan Seshadri:

Right. So in terms of healthcare, I see four different types of contexts that one could bring about. The first being the patient context, right? Their health records, what diseases they had in the past, the age, the demographics, and so on. That's sort of the first data point you could give the LLM as context. And then there is another interesting context, which is about if you were doing a task, be it diagnosis, or maybe you're working on something else in healthcare, maybe assisting with surgery. Each of the tasks itself has a different context and a different set of ways to go about performing that task. And then there's another context, which would be in terms of if I were to diagnose something today, do I have the doctors near me or doctors in the hospital available right now to actually help with the diagnosis or finalizing the surgery or any task? So you need a human in the loop as well, right? And you also need to tell the LLM or the AI model which doctors are available right now and which ones could I bring in to help me solve the problem. And then there's another context, which is in terms of compliance and regulations and institutional context. Each hospital has their own set of policies, and then there's governmental regulations and compliance requirements as well. So there's four different types of context one could bring in within healthcare.
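The four context types Keshavan walks through — patient, task, operational, and institutional — could be sketched as a simple data structure that a healthcare AI request carries. This is a minimal illustration, not his actual system; all class, field, and function names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PatientContext:
    age: int
    demographics: dict        # e.g. {"sex": "F", "region": "US"}
    disease_history: list     # prior diagnoses

@dataclass
class ClinicalContext:
    """Bundles the four context types: patient, task, operational, institutional."""
    patient: PatientContext
    task: str                     # task context: "diagnosis", "surgery-assist", ...
    available_physicians: list    # operational context: who can review right now
    policies: list                # institutional context: hospital and regulatory rules

def build_prompt_context(ctx: ClinicalContext) -> str:
    """Flatten the structured context into text an LLM could consume."""
    return (
        f"Task: {ctx.task}\n"
        f"Patient: age {ctx.patient.age}, history {ctx.patient.disease_history}\n"
        f"Reviewers available: {ctx.available_physicians}\n"
        f"Policies in force: {ctx.policies}"
    )
```

The point of the structure is that none of the four fields is optional: a request missing, say, the operational context has no way to name a human to defer to.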

Christopher Hutchins:

You touched on the compliance area. There's a lot of apprehension and fear out there, particularly in the medical profession, just because a clinician understands and has studied context probably better than most people. It's just that the training that they've had for years, they know inherently where to go to find specific types of resources. Based on all their training, it could be their studies or research that they've done. But that may not be an obvious thing for a machine or an algorithm to even deal with. Talk about how you see this capability coming alongside and assisting a clinician and how can we use it in ways that we're enabling judgment. Do you design first by having these conversations and doing some research along with clinicians? How do you see things evolving in a way that can actually be meaningful where adoption becomes just a natural part of the process as opposed to a sales pitch that you have to give your physicians to try to use it and trust it?

Keshavan Seshadri:

Obviously, it starts with working with the physicians and collecting the data, all those doctor-patient conversations, whatever is allowed. Collecting all that data and figuring out a way to train the model. Sometimes the general-purpose LLMs might not really be up to the task, say for diagnosis. They need more context. And context can be given to the LLM during pre-training or after training. Before or while I train the model, I give it the necessary context so that it understands what a disease is, how you diagnose it, and what sort of symptoms to look at. This is something that can be given as data before even training the model. So one way to think about it is creating new models, new AI models, to help with different tasks. Another way to think about it is fine-tuning, or context engineering in this case: bringing the necessary context to a more general-purpose advanced reasoning LLM, like the GPT models, and giving it that context while it's making the decision or trying to understand the problem.

Christopher Hutchins:

You touched on something I think is really worth digging into. Physicians get the basic core foundational training and then they specialize, right? Your cardiothoracic surgeon is not likely stepping over to do the endocrinologist's job and vice versa. When you're talking about some of this nuance and the way you're developing these models, what do you think it's gonna take to develop models that are really built in such a way that it supports these sub-verticals underneath the clinical profession of a physician? Do you think it's gonna be something where for every specialty or some specialties, we're gonna have specialized large language models that would support those workflows?

Keshavan Seshadri:

Yeah, so let's take a step back and see how humans do it right now. Humans, or the doctors, take a more general course for the first few years, then they do some sort of residency training and figure out how to work within the hospital. Then they choose a specialty and repeat the process. The training takes about 10 to 15 years. Good doctors even keep studying until their late 40s.

Christopher Hutchins:

So they're not interested in just assuming your AI is gonna be ready to go then, huh?

Keshavan Seshadri:

Yeah, so AI is not gonna be the specialized doctor that you're looking for, but it could certainly help with more general-purpose tasks, at least with the current models that we have. It could be more like a student at a university, a university grad maybe, but not yet a doctor. And we do need specialized models, but even then there's a chance of error. So I wouldn't really call AI a doctor at the moment, or at least in the next few years. It takes a bit more, and we need to build mechanisms for evals, mechanisms for guardrails. It takes more time.

Christopher Hutchins:

Yeah, I think supporting human judgment is a really good way to think about it and probably the safest place we can stay focused on in the near term. Who knows what things are gonna evolve. I'm hearing discussion about things like super intelligence and super artificial intelligence. I thought what we're doing right now is pretty remarkable. I'm not sure if I'm ready for these advanced things people are thinking about, but if they're already thinking about it and talking about it, I'm gonna guess it won't be long before we start learning a lot. If you're thinking about the balance of efficiency and automation, how do you think we account for the built-in checkpoints where we're looking for human judgment? Because when you're talking about it needs to be designed with context, it also kind of needs to know when a human being needs to get involved or provide guidance or a judgment call. How do you think about that? And how do you build that into your models so that there's deliberate stopping points? Because I've noticed in generative AI in particular, unless you're asking it to prompt and ask you questions, it assumes you're telling it what it needs to know.

Keshavan Seshadri:

Yeah. Right now the models work based on probability. A model just predicts the next token, not even a whole word, a subword. All these LLMs assign a score to which token should come next. That's essentially what they're doing.
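The scoring Keshavan describes can be illustrated with a toy next-token step: the model produces raw scores (logits) over a vocabulary, a softmax turns them into probabilities, and the highest-probability token is a candidate continuation. The three-word vocabulary and the logit values below are invented for illustration.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and logits for continuing "The patient has a ..."
vocab = ["fever", "cough", "car"]
logits = [2.1, 1.9, -3.0]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # pick the highest-probability token
```

Real models do this over vocabularies of tens of thousands of subword tokens, and usually sample from the distribution rather than always taking the maximum.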

Christopher Hutchins:

So it's an advanced version of what Google did a decade ago, finishing our sentences for us.

Keshavan Seshadri:

Yeah, it's pretty much doing something like that. It's not yet at a stage where it can completely understand what it's doing, at least with the models that we have right now. Relating back to your question, when you design these AI models, you need to build an eval set. You need to have a mechanism or guardrails in place for when to defer to a human. That's dependent on the score, a confidence metric that we can put in place. Maybe if the confidence score is greater than 0.9, then you're pretty good to go. But it also depends on the risk. Saying a patient has cancer when they don't really have it, that's okay in the sense that it just leads to an extra round of tests, and that might become expensive for the patient. Versus if you say a person doesn't have cancer when they actually do, that's even worse: missing a diagnosis.
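The asymmetry Keshavan describes — a false positive costs extra tests, a false negative misses a cancer — suggests confidence thresholds that vary with the direction of the call. Here is a minimal sketch of that deferral logic; the threshold values and label names are illustrative, not from any real system.

```python
# Risk-adjusted confidence bars: the model must be far more confident to
# rule cancer *out* (false negative is catastrophic) than to flag it
# (false positive just triggers follow-up tests).
RISK_THRESHOLDS = {
    "flag_positive": 0.90,   # "may have cancer" -> extra tests if wrong
    "rule_out": 0.99,        # "does not have cancer" -> missed diagnosis if wrong
}

def route(prediction: str, confidence: float) -> str:
    """Allow an automatic result only above the risk-adjusted bar;
    otherwise defer to a human reviewer."""
    threshold = RISK_THRESHOLDS[prediction]
    return "auto" if confidence >= threshold else "defer_to_human"
```

With these numbers, a 0.93-confidence positive flag passes through, while a 0.93-confidence all-clear is still routed to a physician.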

Christopher Hutchins:

Yeah, and hopefully we're smart enough to know that we cannot let our AI models of any kind automatically make a decision and drive outreach and communication.

Keshavan Seshadri:

Of course, we have a human in the loop. It's just like the first trial. Even when you give those decisions or the AI judgment, you also need to include a confidence score and a risk metric so that when the human looks at it, the human understands the AI probably tried to do a good job but couldn't, based on the current patient's reports or the model's underlying knowledge. So pretty much giving the human a metric to look upon so they could evaluate the answers from the AI model. That's what we need. This comes back to precision versus recall versus different other metrics that you can potentially include.
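The precision-versus-recall trade-off mentioned here maps directly onto the cancer example: precision asks how many flagged cases were real, recall asks how many real cases were caught. A short sketch of the standard definitions:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """tp = true positives, fp = false positives, fn = false negatives."""
    precision = tp / (tp + fp)   # of the cases we flagged, how many were real
    recall = tp / (tp + fn)      # of the real cases, how many did we catch
    return precision, recall
```

In a screening setting where a missed cancer is the worst outcome, recall is the metric to protect, even at some cost in precision.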

Christopher Hutchins:

Yeah, I mean it's interesting you mentioned recall. It's one of the things I got excited about early on. One of my good friends is here. He's a physician by training, and he pointed out very wisely to me when I was excited about having the ability to perfectly recall things, that sometimes the things that we have learned experientially aren't found in documentation, and therefore AI might not even know about it. So the fact that it can remember something that I documented a decade ago may not actually be relevant when I think it is, if I'm thinking about it from an algorithm standpoint and what AI is actually going to refer to because it just happens to be there in the model. It kind of leads me into another topic around bias, but not the kind of bias people typically think about. It's like, what does AI not know about that's causing it to perceive and answer prompts in a certain way?

Keshavan Seshadri:

Yeah, so AI is trained on data. The large language models, more commonly, are trained on the entire internet, the books, pretty much anything you can crawl upon. So it's like if you put in good stuff in terms of training the model, good data, then you get good results. If you put garbage in, then there's garbage out. It's essentially that.

Christopher Hutchins:

When you're dealing with the bias piece of it, and then when I think about how that can impact things like explainability, there's a fair amount of legislation that's already come to fruition. I know there's a lot passed in California recently around requiring transparency for healthcare providers who are actually using AI. But the second piece of that was not just the transparency, but it has to be explainable, which is different in healthcare because we really do need to make sure that we're providing enough understanding so a patient can make an informed decision around whether they want to have their data being used that way. As you're thinking about the things that you're designing, building, and people that you're working with, how do you all think about really building in the ability to address this right from the outset? It seems like after you've delivered the first product, it might be a little late to have thought about how do you make it not only transparent but explainable.

Keshavan Seshadri:

I mean, it all sounds fancy, but it's actually really quite simple to understand. In terms of explainability, you need to build it in alongside the design. When you design an AI system, not just an LLM or a model, but an entire AI system, an autonomous AI agent or something of that sort, you need to factor in explainability and transparency while building that model. You need to have proper logging, proper ways to use the LLM as a judge, and have the LLM explain why it took a decision, what tools it called, what agents it called, and so on. This is what goes on behind the explainability, and showing that explainability to the user is what brings in the transparency as well. And it also addresses bias, because if there was bias in the decision making, and the AI is able to explain that, then when another agent or a human looks at what the AI explained and figures out there's some sort of bias in the decision, they can flag it. And then you don't show it to the user if there's bias in the decision making. And there's another way to think about bias itself. You could have feedback loops where, when a user or a human or another AI flags a particular response, the feedback is sent back to the model and it sort of fine-tunes itself. This is called reinforcement learning from human feedback, or RLHF. That's pretty much what most of these AI systems are now trying to build.
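The flag-and-withhold loop Keshavan describes could be sketched as below: a reviewer (human or another model) checks each response, flagged responses are withheld from the user, and the flags accumulate as preference data that a later RLHF-style fine-tuning pass could learn from. This is a simplified stand-in, not an RLHF implementation; the function and key names are hypothetical.

```python
def review_response(response: dict, reviewer_flags_bias) -> dict:
    """Route an AI response through a bias check before it reaches the user.

    reviewer_flags_bias is any callable (human UI, LLM-as-judge, ...) that
    returns True when it detects bias. Flagged responses are withheld and
    logged so a later fine-tuning pass can use them as negative examples.
    """
    feedback_log = []
    if reviewer_flags_bias(response):
        feedback_log.append({"response": response, "label": "biased"})
        return {"shown_to_user": False, "feedback": feedback_log}
    return {"shown_to_user": True, "feedback": feedback_log}
```

The design choice worth noting is that the flag does two jobs at once: it blocks the immediate response, and it produces the training signal that reduces the same bias next time.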

Christopher Hutchins:

I want to ask you to maybe tell on yourself a little bit. Or maybe not, maybe it's telling on someone else. But can you think of a moment where an algorithm produced an insight that was technically correct but contextually it was wrong? And where I'm going with that is, is there something that gave you an insight about human and machine collaboration and why it's so critical? Because you could get the right answer, but delivering it in the wrong context could be really bad. I'll give you a real example that people would probably remember. It was a number of years ago, there was some capability that was driving some patient engagement activities. The father got alerted because his daughter had purchased a pregnancy test at a pharmacy. Right information, contact was not so good. It would have been better handled a little bit more sensitively. So this young woman didn't get direct communication, it went to her parents. No opportunity for any kind of healthy exchange in a way that would have made the girl feel safe having a conversation. It's nuanced and it's definitely about the human relationship part of it. But that's what I'm thinking about. Are there things you can think of where you've seen that? Okay, right answer, but wrong context.

Keshavan Seshadri:

I don't know if you've used ChatGPT recently, but what it does is act more personal. It talks to you, tries to understand who you are, and has a short-term sort of memory stored about you. And every answer the model gives, it ties back to that, to who you are. So maybe I ask a scientific question, like what happens if you consume something during pregnancy. The AI might answer it based on textual facts, but actually saying that to the person asking might not be the best approach. It might be insensitive, or we don't really know. The answer could potentially cause negative feelings or negative impact, which we don't want as the people who design AI. We want people to have a positive impact on the world. That's pretty much why we want to design AI. And when we design these systems, we want to bring in that human feel or human touch as well. When we cannot rely on AI to bring in that human touch, we need a human to actually give the answers to the person.

Christopher Hutchins:

Yeah, that's such an important thing to remember. The realization I had was talking to an emergency room physician a couple years ago, and one of the things he brought up was even the way that we do analytics for quality reporting, whether it's for CMS or any regulatory agency, things are based on averages. One of the things we have to be keenly aware of is you're not going to find an individual that looks like the average that comes from this mountain of data that people are looking at to create these metrics. In a really risky example that I hadn't really considered, if you're an emergency room doctor and you see me without knowing anything about my history, and you see that I've got high blood pressure, or my blood sugar seems outside of a normal range to you based on averaging, if you don't know what is normal for me, you might overreact and prescribe something that actually can cause an unintended adverse reaction. I think what we're talking about is a really important area that we want to make sure people are mindful of. If you've got a podium and we're talking to however many people are going to be listening to the podcast, what would you tell people who are designing AI and designing these models, large language models, that they should be thinking about? Are there two or three things that you would tell them they should really be doubling down on now while they're designing that'll help prevent them from getting into situations where contextual awareness is just not adequate?

Keshavan Seshadri:

Yeah. The first thing is when you use a model, you first need to understand where and how it was trained. What data was it trained upon? Maybe you use a model that was trained on a data set specific to a location, like the US, but then the doctors are in Europe. It might not really work because the demographics change, the diagnosis changes, a lot of different things. So when you design a system, you first need to understand what model to use and what data has it been trained upon. When you decide to use that model, you also need to ensure that you use it for the right purpose it was intended to do. If you're using a general purpose LLM, you should really understand that it was just designed for general purpose and not really something specific to designing a medical AI system. So data is important. And the context you bring in should be as thorough as possible. It should bring in all possible factors. Like I said, the four different factors about the human, about the institution, about what sort of problem we're solving, and if there's a human that we could defer or delegate the responsibility to, and what humans are available. So it depends on as much context as possible. And again, as much good data as possible.

Christopher Hutchins:

So you're not recommending people design without humans in the loop for everything?

Keshavan Seshadri:

Not right now. I mean, for some of the tasks, it depends on the risk each task has. Diagnosis has a certain level of risk versus something like surgery, or making the decision on the spot whether to make an insertion five millimeters or three millimeters. That is something that really needs a human in the loop. So it depends on the risk. And when they design these AI systems, the models that you use need to also incorporate explainability and transparency. And the third thing is you need good eval systems, good testing mechanisms, and guardrails in place. Guardrails are pretty much the most important thing when it comes to extremely regulated, compliance-heavy fields like healthcare.
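"Guardrails proportional to the risk of each task" could be made concrete as a tier table: low-risk administrative tasks may run with logging only, while clinical tasks require human review, and anything unlisted fails safe. The tier names and settings below are illustrative only.

```python
# Guardrail tiers proportional to task risk (names and settings are invented
# for illustration; a real policy would come from clinicians and compliance).
GUARDRAILS = {
    "admin_summary":  {"human_review": False, "logging": True},
    "diagnosis":      {"human_review": True,  "logging": True},
    "surgery_assist": {"human_review": True,  "logging": True, "hard_stop": True},
}

def requires_human(task: str) -> bool:
    """Unknown tasks fail safe: anything not explicitly tiered gets a human."""
    return GUARDRAILS.get(task, {"human_review": True})["human_review"]
```

The fail-safe default matters most: a task nobody thought to classify should be treated as high risk, not low.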

Christopher Hutchins:

So as we're wrapping up, if folks wanted to get in touch with you, how can they find you?

Keshavan Seshadri:

They could find me on LinkedIn. Yeah, LinkedIn is where I'm most active. And I would respond via emails as well.

Christopher Hutchins:

Well, I'll make sure that we put everything in the show notes. So if you're listening and you want to reach out to Keshavan, we'll make that really easy for you. Keshavan, it's been such a pleasure to have you on the Signal Room and great to meet you here. I'm excited to continue having some great conversations the next couple days while we're here in Las Vegas.

Keshavan Seshadri:

Of course. Thank you very, very much. Thank you so much, Chris. Thank you.

Christopher Hutchins:

That's it for this episode of the Signal Room. If today's conversation sparks something in you, an idea, a challenge, or a perspective worth amplifying, I'd love to hear from you. Message me on LinkedIn or visit SignalRoomPodcast.com to explore being a guest on an upcoming episode. Until next time, stay tuned, stay curious, and stay human.