
Dirty White Coat
Mel Herbert, MD, and the creators of EM:RAP, UCMAX, CorePendium, and the collaborators on "The Pitt" and many of the most influential medical education series present a new free podcast: “Dirty White Coat.” Join us twice a month as we dive into all things medicine—from AI to venture capital, long COVID to ketamine, RFK Jr. to Ozempic, and so much more. Created by doctors for clinicians of all levels and anyone interested in medicine, this show delivers expert insights, engaging discussions, and the humor we all desperately need more of!
The Dangers and Delights of AI Search and "The Pitt"
We're exploring the integration of AI search within CorePendium and discussing the delicate balance between powerful search capabilities and maintaining medical accuracy. Our team dives deep into the technical challenges and ethical considerations of implementing AI in a trusted medical reference platform.
• AI search struggles with staying confined to just CorePendium's content despite explicit instructions
• RAG (Retrieval Augmented Generation) allows AI to search internal databases while maintaining natural language understanding
• The tension between sensitivity and specificity mirrors clinical decision-making in emergency medicine
• Vector space embeddings help AI understand semantic relationships between medical terms beyond simple keyword matching
• Citations and references are crucial for verifying AI-generated information against human-authored content
• Traditional search still has value, especially in offline modes where large language models aren't available
• Expert human judgment remains essential as AI can make dangerous mistakes despite sounding authoritative
• Editorial teams benefit from AI by automating formatting tasks while focusing human expertise on clinical content
• The system will launch in beta with user feedback mechanisms to continuously improve accuracy
• AI is most valuable as a tool for experts rather than a replacement for medical education and training
AI doesn't replace the learning that medical professionals go through. It's exceptionally helpful in the hands of experts, but always scrutinize what it tells you.
Hey, people with a coat, Mel Herbert here. We've been out for a while, but we're getting back to it. Before we get started, a couple of words about The Pitt. You've been watching The Pitt? The Pitt's pretty amazing. It's gotten such great reviews and it's so intense, and, no spoilers, but boy, it gets pretty intense. And it's real. Right, it's real.
Speaker 2:A lot of stress, a lot of PTSD. And I've said on a lot of programs, for those of you that are clinicians watching this, nurses, docs, and you've got that PTSD feeling: please go get help. Please go and get help. There's a lot of stuff that works. We've talked about it on Dirty White Coat here. We've talked about things like rapid alternating movements and eye movements and ketamine and psilocybin, and there's a lot of stuff, because the job is not normal. And just watch The Pitt, if you can, to remind yourself of the ridiculous nature of the work. It's just crazy. I'm super excited. You know, I was a consultant in the first season; now I'm actually in the writers' room for season two, hanging out with Joe Sachs and Noah Wyle and Scott Gemmill and all the rest. It's an amazing experience. It is amazing, and the plan, of course, is to make season two even better.
Speaker 2:I'm not sure how you can do that, because this has been quite remarkable. But we're going through a lot of cases and story arcs, and the idea, continuously, from everybody there, is: how do we tell the real story of the docs and nurses that work in the emergency department, of the patients, of the stressors, and what it looks like in 2025, in 2026?
Speaker 2:Not what it looked like 30 years ago, but what does it look like now? What does it actually mean? So, if you're wondering if the people that make this show actually care: they care so deeply about the work that they're doing, to represent the work that you do, the real work, the work that's actually in the real world on real patients, and they're getting to tell this dramatization of that. And it's been incredibly well received, because, you know why, the work that you do really does matter. And now let's have a discussion about search and AI in CorePendium. This has much bigger implications, and we're going to be talking more about where AI and agentic AI are going in emergency medicine and medicine in general, as we continue this discussion on the coat that is quite filthy. Stuart, last night on The Pitt, yeah, there was a shout-out to EM:RAP.
Speaker 3:EM:RAP, yeah, it's pretty awesome. Do you have a clip? Do you have a clip of it that you can post?
Speaker 2:Yeah, I'm going to. I haven't ripped it yet. I'm going to rip it and put it in there.
Speaker 3:Rip it and put it in there.
Speaker 2:I'm going to eventually watch it. Dude, last night I was in tears, I was fucking PTSD-ing. Yeah, I don't like that. It was pretty bad, and I knew it, and I've seen it, and I made all the cases with it. I'm still losing my crap. Here's what we're doing. So we've got CorePendium, and you've got these LLMs, and there's lots of different ones. And what we're trying to do is say: hey, LLM, large language model, use CorePendium and only CorePendium to do a search for us, and then give us the citations. And for some reason, it doesn't want to do that. So I've got Matthew and Stuart here with me, who are really working the problem. But one of the issues is that it desperately wants to go use other shit. Is that what you've been finding? It wants to go outside of CorePendium. Even though you say use CorePendium, it's like: no, I want to look over here.
Speaker 3:Which one do you want first? Do you want, like, the layperson's explanation or do you want Matthew's explanation?
Speaker 2:Just tell us which one you want.
Speaker 3:I mean, from my point of view, this is my understanding, okay? It has to do with the fact that it's drawing on all of what it knows from the internet. That's the nature of what the AI is, that's the nature of what an LLM is. And so if you ask it to limit itself completely, just to what's in CorePendium, it would be illiterate; it would be unable to do it. And so what you have to do is you have to allow it to be informed by all of the things that it learned from the internet, especially things like synonyms, like heart means cardiac, et cetera. Right, and then, when you introduce that intelligence, you can get these fantastic answers that are generated from it. The only problem is that sometimes, I don't know what percent, we're finding maybe 5% of the time, it's giving you just garbage, because it's just desperate to give an answer.
Speaker 2:All right, so now Matthew, who is our tech guy, he's a techie: give us the technical explanation. And what's the name of that? I've forgotten the name of it, when you ask an LLM just to use your internal data, or whatever it is. It's RUG or something, I can't remember.
Speaker 1:RAG, yeah, RAG. So, RAG: Retrieval-Augmented Generation. RAG is a way of taking the internal information we have, let's say all of CorePendium's information, right, the structured data we have in chapters and sections, and providing that to the LLM in a system where we're telling it: we want you to reference primarily this CorePendium data, pull from that data set whenever we have a question that we're asking you. And so one of the challenges there is how we provide that data to the LLM, to work in concert with the LLM in this RAG model. We need to chunk that data, and there's a whole system for how this works. So we first take all the CorePendium data and we chunk it into sections that can then be turned into what we call embeddings: numerical representations that we can then search to find similarities between different concepts. So we've got to take all that data from CorePendium, chunk it into chapters or sections, and turn it into these numerical representations that live inside of this vector space. And vector space is kind of like a three-dimensional space where you can see how close terms like cardiac and heart are together, versus just someone typing in a keyword that says heart, right? Heart's only going to find heart in the keyword search. But in a semantic vector search, you type in cardiac and it will return heart, because they are close together in that numerical space.
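To make that concrete, here is a minimal sketch of the chunk-and-embed step Matthew describes. The open-source sentence-transformers library and the model name below are stand-ins; the episode doesn't name the actual CorePendium pipeline or model.

```python
# Sketch of chunking text into embeddings, assuming the open-source
# sentence-transformers library; the real CorePendium pipeline is not
# specified in the episode.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Chunk chapter text into sections (here, naively, by blank line).
chapters = {"chest-pain": "The heart pumps blood.\n\nCardiac output falls in shock."}
chunks = [
    {"chapter": name, "text": para}
    for name, body in chapters.items()
    for para in body.split("\n\n")
]

# Each chunk becomes a numerical representation living in vector space.
chunk_vecs = model.encode([c["text"] for c in chunks], normalize_embeddings=True)

# Semantic closeness: 'cardiac' and 'heart' sit near each other, so a
# vector search for one can return the other; a keyword search cannot.
heart, cardiac = model.encode(["heart", "cardiac"], normalize_embeddings=True)
print(f"cosine(heart, cardiac) = {float(np.dot(heart, cardiac)):.2f}")
```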
Speaker 1:Part of the challenge, where the LLM will revert to more of the information that it has, is when we can't provide enough data from CorePendium that matches the incoming search term. The incoming search term, or question, also gets turned into an embedding, this numerical format. And we try to pair the incoming search query's numerical format with what we have in this CorePendium embeddings vector database, and we try to match those two. And then we provide the matched information from CorePendium along with the prompt to the LLM and say: use this data in your response. So sometimes the LLM, I think, wants to provide such a robust answer that maybe it doesn't have enough data being provided, so it will reference its own data, more so perhaps than data we do or don't have in CorePendium or are able to provide it.
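Continuing the sketch above, this is roughly what the matching step looks like: the query is embedded the same way, paired against the stored chunk vectors, and the best matches are packed into the prompt. The top-k of 3 and the prompt wording are illustrative assumptions, not the production system.

```python
import numpy as np

def retrieve(query_vec, chunk_vecs, chunks, k=3):
    # Cosine similarity (vectors assumed unit-normalized): pair the
    # query's numerical form against the CorePendium embeddings.
    scores = chunk_vecs @ query_vec
    best = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in best]

def build_prompt(question, matched):
    # Provide the matched CorePendium text to the LLM and tell it to
    # answer from that data rather than its own training.
    context = "\n\n".join(c["text"] for c in matched)
    return (
        "Answer using ONLY the reference material below. If the answer "
        "is not there, say you cannot find it.\n\n"
        f"Reference material:\n{context}\n\nQuestion: {question}"
    )
```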
Speaker 2:So obviously this is an issue that everybody is going to have. It's not unique to us. Are there some of these large language models that are better than others? I can't remember who we were talking to about it. Oh, it was actually a group that was working with us externally, saying that they change models every week, like their underwear, to try and find the best one. Are you doing the same? What's the industry doing?
Speaker 3:Do you mean in terms of changing our underwear every week?
Speaker 1:Do you change your underwear every week?
Speaker 3:I'll let Matt answer that.
Speaker 1:Yeah, we're not currently. We found a specific model that seems to be working quite well for, you know, putting the textual information between the pieces of data that we're providing from CorePendium together, and then providing that in a summary to the user. But it's definitely something we could look at in the future, and should look at, as these models continue to improve and costs continue to come down.
Speaker 3:So, Mel, you know this whole issue practically. It's something that emergency physicians are very familiar with. It's basically sensitivity and specificity, right? You know, we're struggling.
Speaker 3:On the one hand, we want a model that is really, really accurate to what is in CorePendium. We don't want it going off script at all. But if you go strictly with that model, you're going to get tons of frustrating answers to queries: you know, I can't find this, I can't find that. It's just not going to go that extra step that we expect, to help you get your answer. And then if you go to the other extreme and you dial it to let it run a little wild in terms of using its outside information, then you run the risk of what I think some people might call hallucination, but which really is just an effort on the part of the computer to fill in missing gaps.
Speaker 3:It's not, you know, a psychiatric condition on the part of the computer; it's just the way these things work. And so there is no perfect setting, right? You're always going to get a little bit frustrated, because either it's going to say stuff isn't there, on the one hand, or you run the risk of having to be a little more skeptical of the answer, to make sure you double-check that everything is there. And so we're really struggling with where to land on that scale.
Speaker 2:So that's a really good analogy. You know, for every patient you see, well, it could be a PE. So I could just dial up the sensitivity and scan everybody, and then there'll be lots of downstream effects of that. Or I can say I only want to scan those people who are clearly dying, where it's clearly a PE.
Speaker 3:Yeah, absolutely, that's exactly it. And you know, you can imagine, Mel, that in our community there are users that are going to want it to be stricter; they're not going to want AI to play any role in making anything up on its own. So residency directors, people that are starting off learning, we're really, really concerned about that. On the other hand, very experienced practitioners, they're asking a lot of questions, and when they get something that seems a little bit sus, it's okay.
Speaker 3:Because they're seasoned practitioners, and they know: no, no, no, that's totally misrepresented, that's off. And the other thing that I've noticed is that when you do get an answer that's off, it's almost always just a matter of rephrasing the question, or making it a little more specific, to fix it, and that's something that we all need to know. And so at some point, you know, this has been really, we've been consternating and consternating together. We meet about this every week: should we launch? And at some point we're just going to have to say, look, we're going to have to release this thing, and at the same time, we have to educate ourselves on how to use it properly, because otherwise we'll never end up going anywhere. We have to become AI literate, and the whole world is learning this as we go.
Speaker 2:It is magical when it works. It takes a little while, it's got to think and it's got to look, but the answers are often quite magical. But, Matthew, what Stuart's really been trying to work on is like: hey, now embed the reference so that there can be a second check. It says, you know, for rabies you should use immunoglobulin at this dose. Now I want to click on that reference, and I want to find where Sean Nordt and his team of humans actually wrote that down. Is that difficult? It seems difficult, because I'm looking at you guys testing, and sometimes it's again choosing the wrong thing, even though it comes up with the right answer.
Speaker 1:Yeah, and there's actually a case that we came across recently that I think is a really good representation of some of this. Stuart, I'm referring to the Durkin test, right? So that term, Durkin test, we found doesn't actually exist in the corpus of information, but it was coming back. When you typed in Durkin, what is the Durkin test, it would come back and give you an AI search summary result saying, hey, this is from CorePendium.
Speaker 1:So what's happening in this vector space that we're talking about is that the term Durkin test, when we create this numerical embedding, right, is actually very close to the term carpal tunnel compression test, which is what this is. So what will happen is it will find a match in our data, because it understands the distance between Durkin test and carpal tunnel compression test is very small. So that's what it returns: it would give you a whole summary around the carpal tunnel compression test when you asked about Durkin. So these are some of the challenges we've been having, where we're saying, hey, this data, word for word or even character for character, isn't represented, but when we go through the whole search process, the search system would say, yes, it is.
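The Durkin case in miniature: a keyword search returns nothing for a term that never appears verbatim in the corpus, while a vector search still surfaces the nearest concept. A toy sketch, reusing the embedding model from the earlier example; the scores it prints are illustrative, not the production system's.

```python
corpus = ["carpal tunnel compression test", "straight leg raise"]
query = "Durkin test"

# Keyword search: the literal string never appears, so no hit.
keyword_hits = [t for t in corpus if query.lower() in t.lower()]  # -> []

# Vector search: embed query and corpus, take the nearest neighbor.
# 'Durkin test' lands close to 'carpal tunnel compression test' in
# vector space, so that chunk gets returned and summarized instead.
vecs = model.encode(corpus + [query], normalize_embeddings=True)
scores = vecs[:-1] @ vecs[-1]
print("nearest:", corpus[int(scores.argmax())], f"(score {scores.max():.2f})")
```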
Speaker 2:That's a great example of sort of the magic of this, and we're going to have to get used to this tension, because I didn't know what the Durkin test is. If you had asked me to search through CorePendium, I never would have found carpal tunnel compression. So it is genius in some ways, that it would even find that.
Speaker 1:Right, exactly. And of course, throughout all of our testing, the question is: are there cases where something's being returned that's beyond the threshold where we'd say it shouldn't be? It should determine A for A, right, and it shouldn't return B for A. And what's that threshold, how do we want to set it, and how does that look over time? And we're making sure we get users working with this and giving feedback, so we can set those thresholds appropriately.
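The threshold Matthew mentions can be pictured as a single cutoff on the similarity score, which is exactly the sensitivity/specificity dial from earlier in the conversation. The 0.60 below is invented for illustration; per the episode, the real setting is still being tuned from user feedback.

```python
SIMILARITY_CUTOFF = 0.60  # illustrative value only; tuned from user feedback

def retrieve_with_cutoff(query_vec, chunk_vecs, chunks, k=3):
    # Raise the cutoff and the system says "can't find it" more often
    # (more specific); lower it and near-misses like returning B for A
    # slip through (more sensitive).
    scores = chunk_vecs @ query_vec
    best = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in best if scores[i] >= SIMILARITY_CUTOFF]
```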
Speaker 2:So tell us about the references, now, that we're trying to pick out as that sort of secondary check that you can do. How is that going? I think you're on version 857 of this thing right now.
Speaker 1:It's very important that we've got citations that people can look at in this summary that's returned from our AI search, and then be able to click there and go and read exactly about that information. What we do is, as a part of each of these embeddings, we break down the chapter information into these chunks, right? We store the embedding, the numerical value, as well as the chapter information, as well as the metadata, which is this reference information that allows people to be taken back to the chapter section where we say this information came from. So that's what we're providing at this point. And for any specific search, we're pulling in up to 15 of the most relevant chapters of information, giving those citations, and continually checking. We have humans in the loop. Actually, we're doing this every single day, making sure that we're being pointed back to relevant sections of information that are listed for any search in the citation.
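A sketch of the record Matthew describes: each chunk stored with its embedding plus chapter and section metadata, so every AI summary can point back to the human-authored text. Field names are assumptions for illustration, not the actual schema.

```python
from dataclasses import dataclass

@dataclass
class ChunkRecord:
    embedding: list[float]  # numerical representation used for matching
    chapter_id: str         # CorePendium chapter the text came from
    section: str            # section heading, for the citation link
    text: str               # the chunk itself, passed to the LLM as context

def citations(matched: list[ChunkRecord]) -> list[dict]:
    # Metadata the UI needs to take the reader back to the exact
    # chapter section the summary was built from.
    return [{"chapter": m.chapter_id, "section": m.section} for m in matched]
```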
Speaker 3:Well, I mean, yeah, exactly. I was just going to say that what we're experiencing in general, just to take a step back, is that information is cheap. Anyone can type anything into Google. They can, you know, pretty much, I think at this point, even ask for a summary of studies and what the studies' conclusions were. You can do all that kind of stuff. The coin of the realm is really what our experts have to say, and that's what's becoming so much more important. And so the other thing, Mel, that we're really trying to get to is, when we craft statements, like when we all get together and say, hey, what are we going to say about controlling the heart rate in aortic dissection? How are you going to say this in a way that's most helpful to our practitioners and doesn't corner them? You know, we really craft a statement that's helpful.
Speaker 3:The last thing in the world we want is for an AI engine to rephrase it or to mess with it or to make its own statement. And so we're also really focusing on the ability to have sacrosanct text boxes that the AI can identify as: hey, look, the CorePendium team has already sat down and discussed how to deal with the situation where a patient has both heart failure and thyroid disease, and this is the way that they've decided they're going to present it. There are going to be all kinds of variations of this all across the internet, and lots of engines can put this stuff together, but people want to know what their editorial team, the EM practitioners that are their trusted source, has to say about it. And so that's really such a struggle, because what AI is so good at is making words and making statements, and we don't want it to make the most sensitive statements. We want those to be ours.
Speaker 2:Just to jive on that: you can go to ChatGPT right now and ask it medical questions, and it will give you some pretty good answers. But it has no subtlety, and there is no human necessarily behind it. And in most of medicine, we are unsure what the right answer is. And so what you really want is not a computer that says, here is the answer, but an expert with experience to say: I don't know the answer, but this is what we do right now, given all the evidence. So we thought that AI would sort of be the end of CorePendium, but it's actually just the beginning of it, because the humans, you realize, are now more important than ever in a world where people can just put this stuff in. So are you using this now as a feedback loop? I did these searches, or your team did these searches, and it didn't quite come up with the right answer because we didn't quite put it in the text the right way, and you're rewriting chapters because of that?
Speaker 3:Yes, that's the answer.
Speaker 3:The short answer is yes, but it's a manual process and it involves a lot of people, and so we can't sustain that.
Speaker 3:And as the usership goes up and more and more people are writing in with their comments and suggestions and possible corrections, we're going to have to have some sort of an AI-integrated approach to this, where it's sort of feeding us information, like: hey, five users have said they've had trouble finding this piece of information, or take issue with this piece of information, and feeding it up to us as editors and saying: hey, listen, you've got to address this, it has to be changed.
Speaker 3:And so what's really exciting, Mel, and we've talked so much about the user end of this, we haven't even touched on the editorial aspects of it, is that, as editors, we spend so much of our time, even as an editor-in-chief, on grammar, commas, formatting, just lots of stuff like that. AI can make all that easier for us. And so we're hoping that the editors' time will be spent much more on just coming up with these special statements that I'm telling you about, weighing in on controversies, helping users resolve their issues and conundra that they come into, and much less on just worrying about the commas and the text and the formatting and all that stuff. And so that's what I'm hoping is going to happen after we get through an adjustment period here.
Speaker 1:I'd also add to that: the loop for editors to make critical changes that we then need to publish, to go back through, you know, the embedding process and then show up in these searches, is basically near-immediate after we publish. So that's really the great part about this: we can receive feedback and we can respond to feedback, especially in these early stages. That really helps us refine how all of this is delivering value to users.
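A hypothetical sketch of that publish-to-search loop: when an editor publishes a correction, the chapter is re-chunked and re-embedded so the fix is searchable almost immediately. All names here are made up, and `model` comes from the earlier sketch; the episode doesn't describe the actual mechanism.

```python
def on_publish(chapter_id: str, new_body: str, index: dict) -> None:
    # Re-chunk the edited chapter and replace its stale embeddings in
    # the vector index, so the correction shows up in the next search.
    paragraphs = [p for p in new_body.split("\n\n") if p.strip()]
    vectors = model.encode(paragraphs, normalize_embeddings=True)
    index[chapter_id] = list(zip(paragraphs, vectors))
```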
Speaker 2:Okay, so what if I say I don't trust it, or I'm a new user and I'm just not sure if I'll be able to pick up when it's making a mistake? Can I go back to the old search, and what is that old search based on?
Speaker 3:What's it based on? Matthew, tell us, what's it based on? I'll tell you, I can't figure out what it's based on with some of the results that we got out of that thing, but maybe you can tell us.
Speaker 1:Yeah, so the old search also has this semantic capability, right? So you're still going to get results that are returned, and when I say results, I mean the old search will give you direct links to chapters and sections and other types of content. You know, we have images and media that we think are related to the search term that you put in. So, instead of giving you a summary of a result that maybe is directionally actionable, it will point you to those areas in CorePendium that would then give you a broader scope of the information you might be looking for.
Speaker 2:Does all search at this point involve some form of, in air quotes, AI? This has always been a mystery to me, how Google can possibly do what it does, and so for me it's artificial intelligence. Are we just living in a world where it's all about where you define the line? Or is this other search just stupider, like: I told it to know the difference between heart and cardiac, and that's actually the same thing? Or is even basic search now using some form of LLM out there in the universe?
Speaker 1:Yeah, so artificial intelligence is this very large umbrella, which includes machine learning and then things like LLMs, and there are many, many branches. You know, artificial intelligence, really, you could date it back to sometime in the 60s or 70s, if you want to define it as such. So with searches, you can still have today what is known as a keyword search, just literally searching for direct keyword matches. But most things, including what you're going to find with Google, are using these semantic capabilities, where it's determining what your intention is behind your search, and then it's going through and pulling back these similarities within the semantic space, right, to provide you with the most relevant results for what it believes you're looking for.
Speaker 2:We've used this example a few times, but I think it's a really important one. I think it was Mike Weinstock who was using a different search, and he asked: what's the best muscle relaxant in pregnancy? And it came back with rocuronium, which is technically true. It is an incredibly good muscle relaxant in pregnancy, but it's also a paralytic and will kill you. So I think what we're all learning is: we still have to go to med school or nursing school or wherever it is, and we still have to learn this information. It is a great tool, but it can make profound mistakes. Have you had any funny errors recently that are similar to the rocuronium case, or are you getting it so good that it doesn't make that kind of mistake anymore?
Speaker 3:You think that's funny, Mel? I mean, it's so scary to me. I mean, you know, someone giving rocuronium to a pregnant patient? Not in the context of intubation? I don't think so. And so, for the time being, I mean, we're going to release this in beta, for sure. There's no question about that. And we want people to have the ability to turn it off, and I completely get that. And the risk is real. Like you just mentioned, the risk is real.
Speaker 3:I want people to think about it, for the time being, as just an advanced search, just a way to get you to the material in CorePendium. I'm really emphasizing that, and everything after that is beta to me. And I do believe that when you look at the trajectory of how fast we've progressed on this in just a few months, I'm pretty sure we're going to get to the point where it's 99-point-something percent reliable in terms of the AI answers, with the cross-checking and with the verifability. Is that a word? Did I just make up a word? The verifiability. We're going to get there, and that's when we would take off the label. But just for the time being, I think everyone should think about this as a really, really good search to get to the content in CorePendium. Not a thing to answer your clinical questions just yet.
Speaker 2:All right. So I've got a question for Matthew, because people almost certainly have no idea of how complicated this is, because we have web-based search, we have iOS search, Android search, and we have offline search. So, Matthew, how do they all differ? Can you give us a quick summary, particularly of the offline mode of CorePendium? These large language models do not live on that phone; they live out there in the universe. So if I'm offline, what kind of search can I expect to get?
Speaker 1:Essentially, different types of searches have to be well-tuned to the hardware they're going to be running on. So there are different search algorithms that we use for offline mode, which can't provide you this type of AI summary that we're getting from these LLMs, because they require massive amounts of compute infrastructure to do that kind of, what we call, inference, right? So you will get a kind of basic experience out of the search in offline mode. And then, as you move up, there's the semantic search we have when you're online, and of course, then, this AI search summary we're talking about rolling out here, which pairs our CorePendium data with that LLM to give you that summary response.
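That tiering can be pictured as simple dispatch logic: offline falls back to an on-device keyword index because LLM inference can't run on the phone, while online adds semantic search and the RAG summary. The function names below are hypothetical, not CorePendium's actual code.

```python
def local_keyword_search(query: str) -> list[str]:
    ...  # on-device keyword index; no LLM available offline

def semantic_search(query: str) -> list[str]:
    ...  # server-side vector search over the embeddings

def summarize_with_llm(query: str, matches: list[str]) -> str:
    ...  # RAG call: matched chunks + prompt, returns a cited summary

def search(query: str, online: bool):
    # Pick the richest experience the device and connection can support.
    if not online:
        return local_keyword_search(query)
    return summarize_with_llm(query, semantic_search(query))
```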
Speaker 2:When you release this, and I won't ask you when, because we keep moving the goalposts here, will people be able to give feedback internally? What we can do is, I can do a search and say, tell me about measles, and it comes back with chicken pox, and I can say: hey, fools, this is wrong. Are we going to release that ability for people, or is that going to be so much information your heads are going to explode?
Speaker 3:Yeah, Matt, we're going to let everyone give feedback, of course, except for you. We're going to block it. We're going to block your channel. We just, you know, we've got to draw the line somewhere.
Speaker 2:Yeah, I say rude things when it gets things wrong.
Speaker 1:I think you gave that. You absolutely uh want to get user feedback and uh create user feedback, and it's so important to to um really delivering and helping to continue to shape this product in a way that people are going to get maximum value from it it just just what, what, what our what matthew has to put up with?
Speaker 3:What Matthew has to put up with, just so everyone knows, is, like, you know, Mel and I on a bender: in the middle of the night, you wake up and you're like, okay, I've got to ask the AI a bunch of test questions, right? And then you just go on and on and on, and you're giving feedback, and then in the morning, you know, Miranda and Matthew have this inbox full of, like, a hundred responses from Mel and myself, basically with a million different contradictory pieces of feedback saying, no, I checked it this way, I checked it this way, I got that, I got that. And I'm like, oh my God, I'm so sorry.
Speaker 2:I am impressed with how quickly it's improved. Even a month ago it was like, oh boy, this is a real problem. And now it's like, oh, this is really good, we've got to release this soon. It's getting that good. But it's still not perfect, and, as we said, it'll never be perfect. But I'm itching to get people to start playing with it. I think that's coming soon-ish. Any final statements, any words of caution for people, whenever we release this, or whenever you're using any search like this? When you're using these RAG things, now you understand that if you want it to be magic like an LLM, it has to use LLM content. So any other words of caution for the world?
Speaker 3:I mean, I would say that, you know, this is the most incredible genius tool that we've ever had. But like every human genius, the AI has its faults, namely: it lies, it cheats, and it steals. It does all the things that it learned from us, and so that's always going to be there at the back of my mind.
Speaker 1:Yeah, you made a really great point, Mel, about how this doesn't replace the learning that we go through to get to this point, right? And so I've found that AI is just exceptionally helpful in the hands of experts. Right, where it's giving you back, let's say, for example, this AI search summary, and it's like, wow, I know immediately that's the right answer. Or it might trigger something in your mind that you didn't quite recall; you look at it and you say, oh, that's absolutely the right answer, and that's a genius answer that I might not have necessarily come up with myself. But I would say, yeah, it's very important to scrutinize. Always scrutinize.
Speaker 3:I love the way you said that, Matthew. And what I was thinking was: there's a reason why you can't take a high schooler, put a medical textbook in front of them, and expect them to execute on that material. In the same way, you need to be an expert to use this type of a system. That's an assumption that we make. It's just not for lay use.
Speaker 2:So I want to thank you, gentlemen. Now get back to work, because we want to get this out to the people, and we're really very excited about it. But again, it all comes back to the humans, and Stuart has, I think, 700 humans that are working on this little project. And if you haven't seen it recently, it continues to get better. So it's just one aspect, but it's sort of the most exciting thing right now. There's so much in there. Now we've got a much better way to find it.