The Signal Room | AI Governance, AI Strategy & Ethical AI

AI Explainability, Algorithmic Bias & Human-in-the-Loop Design | Keshavan Seshadri

Hosted by Chris Hutchins | AI Strategy & Healthcare AI Expert | Season 1, Episode 9


Keshavan Seshadri covers explainability, bias detection, and human-in-the-loop leadership strategies essential for trustworthy healthcare AI.

Keshavan, a Senior ML Engineer at Prudential Financial, makes the case that intelligence without transparency is not intelligence at all. The conversation covers explainability requirements for AI systems, how bias creeps into models and what it takes to detect it, and why human-in-the-loop design is not a limitation but a feature that strengthens AI deployments.

Explainability in AI is not a nice-to-have; it is a prerequisite for any deployment where decisions affect people's lives. Bias in machine learning models is a systemic issue that requires ongoing monitoring, not a one-time audit. Organizations that treat bias detection as a launch criterion rather than a continuous practice will continue to experience failures.

Topics covered: AI explainability requirements, algorithmic bias detection and monitoring, human-in-the-loop design principles, model transparency for clinical trust, and why authentic intelligence demands that humans remain accountable for AI-driven decisions.


SPEAKER_03

I just think authentic intelligence means being able to deliver solutions closer to the level of a human, more in alignment with an AGI perspective. What really enables current LLM-based models would be bringing in the appropriate context. Context is important, context is everything, right? What diseases they had in the past, the age, the demographics, and so on. That's the first data point you could give the LLM as context. You need to factor in explainability and transparency while building that model.

SPEAKER_01

Listeners, I want to welcome you to the Signal Room podcast. We are live at Hollywood in Las Vegas, Nevada for the Data First Conference. We are here talking about all things AI: ethical use of AI, machine learning. It's the buzzword-of-the-day kind of conference, I think. I just want to welcome you, Keshavan Seshadri. Keshavan is a Senior Machine Learning Engineer at Prudential Financial. For people who don't really understand what that means, maybe tell us a little bit about what you do in your role, and then we'll get into some topics that are top of mind for you.

SPEAKER_03

Like you said, I work on pretty much everything that's exciting. AI being the buzzword, I work right in the center of it. I work on building machine learning platforms, generative AI-based solutions, agentic frameworks, and anything that's new. Yeah.

SPEAKER_01

You're not hiding your excitement. I like that. It is exciting work. I think people are a little bit fearful of it, but I think conversations like the one we're having today are important, because they start to help people understand that the only thing to be fearful of is not learning, honestly. The topic we decided to talk about is authentic intelligence and designing AI that understands context. That's a nuance I don't typically hear from a machine learning expert. If you don't mind, could you talk about authentic intelligence? What does that really mean to you, and how is it different from artificial intelligence?

SPEAKER_03

I just think authentic intelligence means being able to deliver solutions closer to the level of a human, more in alignment with an AGI perspective. And what really enables current LLM-based models would be bringing in the appropriate context, which is where we're at. This involves both a short-term memory context and a long-term memory context as well. And I think in this field, in terms of healthcare, context is one of the most important things, because there are different types of context one could bring in. When you design any sort of AI solution, be it an agentic AI or a more autonomous AI system, context is important. Context is everything.

SPEAKER_01

Yeah, totally. You mentioned the healthcare field. In clinical and administrative workflows, where does context break down the most, do you think? And how can AI be designed to recognize its own limits?

SPEAKER_03

Right. So in terms of healthcare, I see four different types of context one could bring in. The first is the patient context: their health records, what diseases they had in the past, the age, the demographics, and so on. That's the first data point you could give the LLM as context. Then there's another interesting context around the task itself. If you're doing a task, be it diagnosis or something else in healthcare, like assisting with surgery, each task has a different context and a different set of ways to go about performing it. Then there's a third context: if I were to diagnose something today, do I have doctors near me, or doctors in the hospital available right now, to actually help with the diagnosis or finalize the surgery, whatever the task is? You need a human in the loop as well, and you also need to tell the LLM or the AI model which doctors are available right now and which ones it could bring in to help solve the problem. And then there's a fourth context in terms of compliance, regulations, and institutional context. Each hospital has its own set of policies, and then there are governmental regulations and compliance requirements as well. So there are four different types of context one could bring in within healthcare.
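The four context types Keshavan describes could be bundled into a single structured payload before it reaches the model. This is only an illustrative sketch; the `ClinicalContext` class, its fields, and the example values are hypothetical, not anything from the episode or a real system.

```python
from dataclasses import dataclass

@dataclass
class ClinicalContext:
    """Hypothetical bundle of the four context types described above."""
    patient: str                 # health records, history, demographics
    task: str                    # e.g. diagnosis vs. surgical assistance
    available_staff: list[str]   # clinicians the system could defer to
    compliance: str              # hospital policy and regulatory constraints

    def to_prompt(self) -> str:
        # Serialize every context type into the text handed to the LLM.
        staff = ", ".join(self.available_staff) or "none on call"
        return (
            f"Patient context: {self.patient}\n"
            f"Task context: {self.task}\n"
            f"Available clinicians: {staff}\n"
            f"Compliance context: {self.compliance}"
        )

ctx = ClinicalContext(
    patient="58-year-old, type 2 diabetes, prior MI",
    task="triage chest-pain complaint",
    available_staff=["Dr. Rao (cardiology)"],
    compliance="HIPAA; hospital policy requires physician sign-off",
)
print(ctx.to_prompt())
```

The point of the structure is that a missing context type (say, an empty staff list) becomes visible and checkable, rather than silently absent from a free-form prompt.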

SPEAKER_01

You touched on the compliance area. There's a lot of apprehension, and frankly fear, out there, particularly in the medical profession, just because a clinician understands context probably better than most people. It's the training they've had for years. They know inherently where to go to find specific types of resources, based on all their training, their studies, or research they've done. But that may not be an obvious thing for a machine or an algorithm to deal with. Talk about how you see this capability coming alongside and assisting a clinician, and how we can use it in ways that enable judgment. From your perspective, are you designing first by having these conversations and doing research alongside clinicians? How do you see things evolving in a way that can actually be meaningful, where adoption becomes a natural part of the process, as opposed to a sales pitch you have to give your physicians to convince them to use it and trust it?

SPEAKER_03

I mean, obviously, working with the physicians, collecting the data, collecting all those doctor-patient conversations, whatever is allowed, and figuring out a way to train the model. Sometimes the general-purpose LLMs might not really be up to the task, say, for diagnosis. It needs more context. And context could be given to the LLM pre-training or post-training. Before I train the model, or while I train it, I give it the necessary context so it understands what a disease is, how you diagnose the disease, what sort of symptoms to look at, and so on. This is something that could be given as data before even working on the model. So maybe we can think of this as creating new LLMs, new AI models, to help with different tasks. Another way to think about it is fine-tuning, or context engineering in this case: bringing in the necessary context to a more general-purpose, advanced reasoning LLM, like the GPT models, and giving it that context while it's making the decision or trying to understand the problem.

SPEAKER_01

You touched on something I think is really worth digging into a little bit. Physicians get the basic, core, foundational training and then they specialize, right? Your cardiothoracic surgeon is not likely stepping over to do the endocrinologist's job, and vice versa. When you're talking about some of this nuance and the way you're developing these models, what do you think it's going to take to develop models that really support these, I don't want to call them structures, but these sub-verticals underneath the clinical profession of a physician? How do you think about that? And do you think it's going to be something where, for every specialty, or some specialties, we're going to have specialized large language models that support those workflows?

SPEAKER_03

Yeah, so let's take a step back and see how humans do it right now. Humans, or the doctors, take a more general course for the first few years, then they do some sort of residency training and figure out how to work within the hospital. Then they choose a specialty and repeat the process. The training takes about 10 to 15 years, that's what I heard. Good doctors even keep on studying until their late 40s.

SPEAKER_01

So they're not interested in just assuming your AI is going to be ready to go by then, huh?

SPEAKER_03

Yeah, so AI is not going to be the specialized doctor that you're looking for, but it could certainly help with more general-purpose work, at least with the current models that we have. The general-purpose models could be more like a student in the university, a university grad or something, not yet a doctor. We do need specialized models, but even then there's a chance of error. So I wouldn't really call AI a doctor at the moment, or at least in the next few years. It takes a bit more, and we need to build mechanisms for evals, mechanisms for guardrails. It takes more time. Yeah.

SPEAKER_01

Yeah, I think supporting human judgment is a really good way to think about it, and probably the safest place we can stay focused on in the near term. Who knows how things are going to evolve. I've been hearing discussion about things like superintelligence and artificial superintelligence. I thought what we're doing right now is pretty remarkable; I'm not sure I'm ready for these advanced things people are thinking about, but if they're already thinking and talking about it, I'm going to guess it won't be long before we start learning a lot. It's definitely a time to be paying attention, learning what we can, and figuring out how we can use it to our advantage. If you're thinking about the balance of efficiency and automation, how do you think we account for built-in checkpoints where we're looking for human judgment? Because when you're talking about designing with context, the system also needs to know when a human being needs to get involved or provide guidance or a judgment call. How do you think about that, and how do you build that into the models you're working on so that there are deliberate stopping points? Because I've noticed in generative AI in particular, unless you're asking it to prompt and ask you questions, it assumes you're telling it what it needs to know.

SPEAKER_03

Yeah. I mean, right now the models work based on probability. An LLM just predicts the next token, not even the next word but the next subword. It assigns a score to which word should come next. It's essentially doing that.
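The "assigns a score, picks what comes next" idea can be sketched in a few lines: raw model scores (logits) are turned into a probability distribution with a softmax, and the highest-probability subword wins. The tiny vocabulary and the scores here are made up purely for illustration.

```python
import math

def softmax(logits):
    """Turn raw model scores into a probability distribution over candidates."""
    # Subtract the max for numerical stability before exponentiating.
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of candidate next subwords and made-up scores.
candidates = ["head", "##ache", "fever", "cough"]
logits = [1.2, 3.5, 0.4, 0.9]

probs = softmax(logits)
next_token = candidates[probs.index(max(probs))]
print(next_token)  # the highest-scoring subword is selected
```

Real decoders sample from this distribution (with temperature, top-k, and so on) rather than always taking the maximum, but the scoring step is the same.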

SPEAKER_01

So, an advanced version of what Google did a decade ago, finishing our sentences for us.

SPEAKER_03

Yeah, pretty much something like that. It's not yet at a stage where it can completely understand what it's doing, at least with the models we have right now. So, relating back to your question: when you design these AI models, you need to build an eval set. You need to have a mechanism, or guardrails, in place for when to defer to a human. That's dependent on the score, a confidence metric we can put in place. Maybe if the confidence score is greater than 0.9, you're pretty good to go. But it also depends on the risk. Saying a patient has cancer when they don't really have it, that's okay to an extent. It just leads to extra tests; you have to order more tests, and that might become expensive for the patient. Versus, if you say a person doesn't have cancer when they actually do have cancer, that's even worse. That's missing a diagnosis.
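The deferral rule Keshavan describes, a confidence threshold that also accounts for asymmetric risk, could look like the sketch below. The function names, the 0.99 threshold for negative calls, and the labels are all illustrative assumptions; only the 0.9 baseline comes from his example.

```python
def route_prediction(label, confidence, *, threshold=0.9):
    """Auto-accept only high-confidence calls; otherwise defer to a clinician."""
    if confidence >= threshold:
        return ("auto", label)
    return ("human_review", label)

def route_cancer_screen(label, confidence):
    # Missing a cancer (false negative) is worse than a false alarm, so a
    # 'negative' call must clear a stricter bar before skipping human review.
    threshold = 0.99 if label == "negative" else 0.90
    return route_prediction(label, confidence, threshold=threshold)

print(route_cancer_screen("negative", 0.95))  # → ('human_review', 'negative')
print(route_cancer_screen("positive", 0.95))  # → ('auto', 'positive')
```

Note how the same 0.95 confidence is routed differently depending on which error the label could hide: the costlier mistake gets the stricter threshold.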

SPEAKER_01

Yeah, and hopefully we're smart enough to know that we cannot let our AI models of any kind automatically make a decision and drive outreach and communication.

SPEAKER_03

Of course, we have a human in the loop. It's just like the first trial. Even when the AI gives those decisions, that AI judgment, you also need to include a confidence score or a risk metric, so that when the human looks at it, the human understands: the AI probably tried to do a good job but couldn't, based on the patient's current reports or the model's underlying knowledge. Pretty much, you're giving the human a metric they can use to evaluate the answers from the AI model. That's what we need. And this comes back to precision versus recall, versus other metrics you could potentially include.
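Precision and recall, which the conversation turns on here, are easy to compute from a labeled eval set. This is a minimal sketch with made-up labels; the toy data is chosen so the two metrics diverge, showing why reporting only one of them can mislead.

```python
def precision_recall(y_true, y_pred, positive="cancer"):
    """Precision: of the positive calls, how many were right.
    Recall: of the actual positives, how many were caught."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical eval set: the model makes no false alarms but misses one cancer.
y_true = ["cancer", "cancer", "healthy", "healthy", "cancer"]
y_pred = ["cancer", "healthy", "healthy", "healthy", "cancer"]
p, r = precision_recall(y_true, y_pred)
print(p, round(r, 2))  # → 1.0 0.67
```

Here precision is perfect while recall shows the missed diagnosis, which is exactly the asymmetric failure Keshavan called "even worse" a moment earlier.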

SPEAKER_01

Yeah, it's interesting you mentioned recall. It's one of the things I got excited about early on. One of my good friends is here, actually, he's working on some things behind me. He's a physician by training, and he pointed out very wisely to me, when I was excited about having the ability to perfectly recall things, that sometimes the things we have learned experientially aren't found in documentation, and therefore AI might not even know about them. So the fact that it can remember something I documented a decade ago may not actually be relevant when I think it is, if I'm just thinking about it from an algorithm standpoint and what the AI is actually going to refer to, because it just happens to be there in the model. It kind of leads me into another topic: bias, but not the kind of bias people typically think about. What does AI not know about that's causing it to perceive and answer prompts in a certain way?

SPEAKER_03

Yeah, so AI is trained on data. The large language models, most commonly, are trained on the entire internet: the entire internet, books, pretty much anything you can crawl. They train on a lot of different data. So if you put good stuff in, good data for training the model, then you get good results. If you put garbage in, then there's garbage out. It's essentially that.

SPEAKER_01

So when you're dealing with the bias piece of it, I think about how that can impact things like explainability. There's a fair amount of legislation that's already come to fruition. I know a lot was passed in California recently around requiring transparency for healthcare providers who are actually using it. But the second piece of that was not just transparency; it has to be explainable, which is different in healthcare, because we really do need to make sure we're providing enough understanding so a patient can make an informed decision about whether they want their data being used that way. As you're thinking about the things you're designing and building, and the people you're working with, how do you all think about building in the ability to address this right from the outset? It seems like after you've delivered the first product, it might be a little late to have thought about how you make it not only transparent but explainable. Because you've been very articulate and easy for me to understand so far, but I would be willing to bet that if you started telling me the real details of the work you're doing, you would lose me completely.

SPEAKER_03

Not so much. It all sounds fancy, but it's actually quite simple to understand. It's not that difficult. In terms of explainability, at least, you need to work it in along with the design. When you design an AI system, not just an LLM or a model, but an entire AI system, an autonomous AI agent or something of that sort, you need to factor in explainability and transparency while building it. You need to have proper logging, proper ways to use an LLM as a judge, to have the LLM go and try to explain why it took a decision, what tools it called, what agents it called, and so on. That's what goes on behind explainability, and showing that explanation to the user is what brings in the transparency as well. It also addresses bias, because if there was bias in the decision making and the AI is able to explain it, then when another agent or a human looks at what the AI explained and figures out there's some sort of bias in the decision, they could potentially flag it, and you don't show it to the user if there's bias in the decision making. And there's another way to think about bias itself: you could have feedback loops, so that when a user, a human, or another AI flags a particular response, the feedback is sent back to the model and it fine-tunes itself. This is called reinforcement learning from human feedback, and that's pretty much what most of these AI systems are now trying to build. Yeah.
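The logging-plus-flagging loop described above can be sketched as a decision record that carries its own rationale, tool calls, and confidence, with a flag a human reviewer can set. The field names and the example values are hypothetical, and the flag here only marks an entry; actually feeding it back into training (the RLHF step Keshavan mentions) is a separate pipeline.

```python
import time

audit_log = []

def record_decision(decision, rationale, tools_called, confidence):
    """Log every AI decision with its rationale so a reviewer can audit it later."""
    entry = {
        "ts": time.time(),
        "decision": decision,
        "rationale": rationale,
        "tools_called": tools_called,
        "confidence": confidence,
        "flagged": False,
    }
    audit_log.append(entry)
    return entry

def flag_for_bias(entry, reviewer_note):
    """A human (or a second model acting as judge) flags a logged decision."""
    entry["flagged"] = True
    entry["reviewer_note"] = reviewer_note

e = record_decision(
    decision="order follow-up scan",
    rationale="elevated marker relative to cohort baseline",
    tools_called=["lab_lookup", "guideline_search"],
    confidence=0.82,
)
flag_for_bias(e, "baseline cohort underrepresents this patient's demographic")
print(len([x for x in audit_log if x["flagged"]]))  # flagged entries awaiting review
```

Because each entry records which tools and agents were called, a reviewer can answer "why did it decide this?" without re-running the model, which is the transparency property being described.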

SPEAKER_01

I want to ask you to maybe tell on yourself a little bit. Or maybe not, maybe it's telling on someone else, I don't know. Can you think of a moment where an algorithm produced an insight that was technically correct, but contextually it was wrong? Where I'm going with that is: is there something you can point to that gave you an insight about human and machine collaboration and why it's so critical? Because you could get the right answer, but delivering it in the wrong context could be a really bad thing. I'll give you a real example that people would probably remember. A number of years ago, there was some capability driving patient-engagement kinds of activities. A father got alerted because his daughter had purchased a pregnancy test at a pharmacy. Right information; the contact was not so good. It would have been better if it was handled a little more sensitively. This young woman didn't get direct communication; it went to her parents. No opportunity for any kind of healthy exchange in a way that would have made her feel safe having a conversation. It's kind of nuanced, and it's definitely about the human-relationship part of it. But that's what I'm thinking about. Are there things you can think of where you've seen that: okay, right answer, but... yeah.

SPEAKER_03

I don't know if you've used ChatGPT nowadays. What it does is it acts more personal. It talks to you, tries to understand who you are, and has a sort of short-term memory stored about you. And every answer the model gives, it ties back to who you are. So maybe I ask a scientific question: what happens if you consume so-and-so during pregnancy? It might be that the AI answers it based on textual facts, but actually telling that to the person asking the question might not be the best approach. It might be insensitive, or we don't really know; the answer could potentially cause negative feelings or negative impact, which we don't want as the people who design AI. We want AI to have a positive impact on the world, and that's pretty much why we design it. When we design these systems, we want to bring in that human feel, that human touch, as well. When we cannot rely on AI to bring in that human touch, we need a human to actually give the answers to the person.

SPEAKER_01

Yeah, that's such an important thing to remember. The realization I had: I was talking to an emergency room physician a couple of years ago, and one of the things he brought up was that even the way we do analytics for quality reporting, whether it's for CMS or any regulatory agency, things are based on averages. One of the things we have to be keenly aware of is that you're not going to find an individual who looks like the average that comes out of this mountain of data people are using to create these metrics. In a really risky example I hadn't considered: if you're an emergency room doctor and you see me without knowing anything about my history, and you see that I've got high blood pressure, or my blood sugar seems outside of a normal range to you based on averaging, if you don't know what is normal for me, you might overreact and prescribe something that can actually cause an unintended adverse reaction. And I think what we're talking about is a really, really important area that we want to make sure people are mindful of. You know, if you've got a podium, and we're talking to however many people are going to be listening to the podcast, what would you tell people who are designing AI, designing these large language models, that they should be thinking about? Are there two or three things you would tell them to really double down on now, while they're designing, that will help prevent them from getting into situations where the contextual awareness just isn't adequate anymore?

SPEAKER_03

Yeah. The first thing is, when you use a model, you first need to understand where and how it was trained. What data was it trained upon? Maybe you use a model that was trained on a dataset specific to a location, like the US, but the doctors are in Europe, and it might not really work, because the demographics change, the diagnoses change, a lot of different things change. So when you design a system, you first need to understand what model to use and what data it has been trained upon. When you decide to use that model, you also need to ensure you use it for the purpose it was intended for. If you're using a general-purpose LLM, you should really understand that it was designed for general purposes, not specifically for building a medical AI system. So data is important. And the context you bring in should be as thorough as possible; it should bring in all possible factors. Like I said, the four different contexts: about the patient, about the institution, about what sort of problem we're solving, and whether there's a human we could defer or delegate the responsibility to, and which humans are available. So it depends on as much context as possible, and again, as much good data as possible.

SPEAKER_01

So you're not recommending people design without humans in the loop for everything?

SPEAKER_03

Not right now. I mean, it depends on the risk each task has. Diagnosis has a certain level of risk versus something like surgery, or making the decision on the spot whether to make an incision five millimeters or three millimeters. That is something that really, really needs a human in the loop. So it depends on the risk. And then, when you design these AI systems, the models you use need to also incorporate explainability and transparency. And the third thing is, you need a good eval system, good testing mechanisms, and guardrails in place. Guardrails are pretty much the most important thing when it comes to extremely regulated, compliance-heavy fields like healthcare.
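The risk-dependent human-in-the-loop rule could be expressed as a small policy table. The task names, tiers, and thresholds below are entirely hypothetical, a sketch of the shape such a guardrail might take, not a real triage policy.

```python
# Hypothetical risk tiers: higher-risk tasks always require a human in the loop.
RISK_TIERS = {
    "summarize_notes": "low",
    "diagnosis_suggestion": "medium",
    "surgical_parameter": "high",
}

def requires_human(task: str, confidence: float) -> bool:
    """Decide whether a task's output must go to a human before acting on it."""
    tier = RISK_TIERS.get(task, "high")  # unknown tasks default to the strictest tier
    if tier == "high":
        return True                      # e.g. incision sizing: never fully automated
    if tier == "medium":
        return confidence < 0.9          # defer unless the model is very confident
    return confidence < 0.5              # low-risk tasks tolerate lower confidence

print(requires_human("surgical_parameter", 0.99))  # → True: always human-reviewed
print(requires_human("summarize_notes", 0.8))      # → False: low risk, decent confidence
```

Defaulting unlisted tasks to the strictest tier is the guardrail mindset: the policy fails safe when the system encounters something its designers didn't enumerate.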

SPEAKER_01

Right. So, as we're wrapping up, if folks wanted to get in touch with you, how can they find you?

SPEAKER_03

They could find me on LinkedIn. Yeah, LinkedIn is where I'm most active. And I would respond via email as well; it's my first name, last name, at Gmail.

SPEAKER_01

Well, I'll make sure we put everything in the show notes. So if you're listening and you want to reach out to Keshavan, we'll make that really easy for you. Keshavan, it's been such a pleasure to have you on the Signal Room, and great to meet you here. I'm excited to continue having some great conversations over the next couple of days while we're here in Las Vegas.

SPEAKER_03

Of course. Thank you very, very much. Yeah, thank you, thank you so much, Chris. Thank you.

unknown

Bye-bye.

SPEAKER_00

That's it for this episode of the Signal Room. If today's conversation sparked something in you, an idea, a challenge, or a perspective worth amplifying, I'd love to hear from you. Message me on LinkedIn or visit SignalRoomPodcast.com to explore being a guest on an upcoming episode. Until next time, stay tuned, stay curious, and stay human.