Mystery AI Hype Theater 3000
A Bad Case of Hype-itis, 2026.02.02
Move over Dr. Google, Dr. ChatGPT is here, and it's even worse as a medical intervention! Alex and Emily scrub in to slice up some harmful new nonsense in the world of "AI" for medicine. What's the cure for an expensive and inaccessible health care system? One thing's for sure — it's not AI hype.
References:
- "Stop Worrying and Let AI Help Save Your Life" (Robert Wachter, New York Times guest essay, January 19, 2026)
- "Introducing ChatGPT Health" (OpenAI, January 7, 2026)
Also referenced:
- "No, I don't want an AI scribe to write my pulmonologist’s note"
- "The Danger of Intimate Algorithms"
- MAIHT3k Episode 62: The Robo-Therapist Will See You Now (with Maggie Harrison Dupré)
Fresh AI Hell:
- Waymo files vague NHTSA report on crash that killed KitKat (See also: NHTSA Standing General Order on Crash Reporting)
- "Medical Schools Use AI Patients to Help With Clinical Training"
- "What If Your Coffee Mug Knew Your Next Move? AI Researchers Made It Happen"
- "Monkeys are on the loose in St. Louis and AI is complicating efforts to capture them"
- Using LLMs to "infer race, ethnicity"
- Tech CEOs hate ridicule as praxis!
Check out future streams on Twitch. Meanwhile, send us any AI Hell you see.
Our merch store is now live on the DAIR website!
Find our book, The AI Con, here.
Subscribe to our newsletter via Buttondown.
Follow us!
Emily
- Bluesky: emilymbender.bsky.social
- Mastodon: dair-community.social/@EmilyMBender
Alex
- Bluesky: alexhanna.bsky.social
- Mastodon: dair-community.social/@alex
- Twitter: @alexhanna
Music by Toby Menon.
Artwork by Naomi Pleasure-Park.
Production by Ozzy Llinas Goodman.
Alex Hanna: Welcome everyone, to Mystery AI Hype Theater 3000, where we seek catharsis in this age of AI hype. We find the worst of it and pop it with the sharpest needles we can find.
Emily M. Bender: Along the way, we learn to always read the footnotes, and each time we think we've reached peak AI hype, the summit of Bullshit Mountain, we discover there's worse to come. I'm Emily M. Bender, a professor of linguistics at the University of Washington.
Alex Hanna: And I'm Alex Hanna, director of research for the Distributed AI Research Institute. This is episode 71, which we're recording on February 2nd of 2026.
Emily M. Bender: Before we start the show, merch, merch! I'm so excited. We wanted to let you know that our online shop is officially open. You can find it at store.dair-institute.org, or just click DAIR store from the homepage of the DAIR website.
Alex Hanna: We've got shirts, mugs, and tote bags with our logo and some favorite recurring catchphrases. Of course, there's a "ridicule as praxis" one and a "Fresh AI Hell" one, plus a few others. All with Naomi Pleasure-Park's wonderful graphic design.
Emily M. Bender: Again, you can find it all at store.dair-institute.org. And we'll be adding that link to the show notes going forward. So please, next time you are watching a live stream, you can toast us in your new Fresh AI Hell mug, for example. I'll be doing that as soon as mine arrives. Now, onto the details about this episode.
Alex Hanna: So, it's a new year, and we're already seeing some horrifying new forms of AI hype in the world of medicine.
Emily M. Bender: Companies like OpenAI are encouraging users to ask their bullshit generators medical questions, and even when you see a human doctor, it's increasingly likely that they're being pushed to use LLMs in their work.
Alex Hanna: We've talked about chatbots and mental health not too long ago, but it's been a while since we've done a deep dive on AI hype and healthcare generally. So let's get into it!
Emily M. Bender: All right, here we go. For the first artifact, it's a bad one. New York Times. This is an opinion piece, a guest essay from January 19th, 2026. And the headline, which we cannot blame the author for, 'cause we know that comes in separately, so we can blame the New York Times for this one, is "Stop Worrying And Let AI Save Your Life." Or as we like to say, when it's spelled that way with the dots, Aiii!
Alex Hanna: Well no, it says, "Let AI Help Save Your Life."
Emily M. Bender: Oh, "help" save your life. Sorry, I missed an important word.
Alex Hanna: So it's not, the AIs, or the Aiii!s, are not doing the CPR.
Emily M. Bender: No. So one of the things that's making me saddest about this is the byline. So it's by someone named Dr. Robert Wachter, who is the chair of the Department of Medicine at UCSF. And so, you know, people who really should know better have fallen for this. And this isn't terribly long, so I think we're gonna actually have to do it by paragraph. I'll start us off. "We physicians have a long tradition of the curbside consult, when we bump into specialists or more seasoned colleagues in the hospital cafeteria and ask for their advice on a vexing clinical case. Over my 35 years of practice, I used to track down other doctors for a couple of curbsides during morning rounds each day. These days, I'm getting far more curbsides, but they are not with colleagues. They are with AI. Sometimes I consult with ChatGPT. Other times I turn to OpenEvidence, a specialized tool for physicians. I find AI's input is virtually always useful. These tools provide immediate and comprehensive answers to complex questions far more effectively than a traditional textbook or a Google search. And they're available 24/7."
Alex Hanna: Oof.
Emily M. Bender: So, thoughts, Alex?
Alex Hanna: Yeah, this is, I mean, it's pretty bad. I mean, it's also pretty sad if you're not doing curbsides with other doctors in your clinic. Just to chat, and say, what are you doing, I'm seeing more of this lately, you know- the kind of relational aspect of being in the workplace and that kind of collaboration. And instead turning to both of these tools, ChatGPT and then OpenEvidence. I don't know if we've mentioned OpenEvidence on the show, but I saw some statistic that a survey said something like 45% of physicians were consulting OpenEvidence. And it was really flooring how high that percentage was. And so this seems like a really weird kind of approach to medical practice. And it's very, very worrying that this is something someone would do, you know, in a professional setting- to start there. Maybe someone who is concerned about their different medical conditions and has no place to turn. But someone at UCSF? I've been to UCSF! That's very alarming.
Emily M. Bender: That's extremely alarming. Absolutely. And the fact that it's like, yeah, I'll just try any one of these. And apparently, no evidence-based practice of checking, how well does this work? And also how does it impact the work of physicians when they're using it? But just, oh, it's shiny and it's virtually always useful. It tells me something that looks good. All right, so we have to have the caveat here that comes next. "To be clear, Aiii! isn't perfect. For my curbside consults, the answers are not as nuanced as the ones I'd hear from my favorite hematologist or nephrologist. On rare occasions, they're just plain wrong, which is why I review them carefully before acting."
Alex Hanna: Yeah, I mean, this is the critical caveat paragraph. I mean, this is common in the New York Times, it's common in any kind of piece where, you know, AI or LLMs or whatever are not perfect, but I use them anyways. And then sometimes they're just wrong, which is, I mean, at least he admits it. And then "review them carefully," and I'm like, I mean, you should be doing that with all information. But at least from someone like your neighborhood hematologist, you would actually know it's coming from them, and there's some accountability if they give you some just wild advice, you know.
Emily M. Bender: Yeah. Yeah, absolutely. And accountability is key to what's coming in the next paragraph here. But also, you know, the whole thing about automation bias, and if you review the output, are you also reviewing the things that you didn't get to because it didn't come out as output? It's just not good practice. So, having admitted imperfections, he continues, "Some people argue that Aiii!'s imperfections mean that we shouldn't use the technology in high stakes fields like medicine, or that it should be tightly regulated before we do." Sorry, excuse me, I want all medical devices to be tightly regulated.
Alex Hanna: Yes. Right.
Emily M. Bender: "But the biggest mistake now would be to overly restrict Aiii! tools that could improve care by setting an impossibly high bar, one far higher than one we set for ourselves as doctors. Aiii! doesn't have to be perfect to be better. It just has to be better." So you were talking about accountability, right, Alex?
Alex Hanna: Yeah, yeah.
Emily M. Bender: The point isn't that the answers are unreliable, it's that there's no accountability for the answers.
Alex Hanna: Yeah. I mean, this is, also the main thrust of the article, which is basically like, well, we can't let the perfect be the enemy of the good, effectively. I mean, this is the kind of, the argument of the entire piece. And I mean, the language is kind of funny because he's like, well, "one far higher than one we set for ourselves as doctors." I'm like, okay, but this is an apples to oranges comparison. Like doctors, you know, have their own ethics and their own ethos that they accord to as professional practice. And part of that is, of course, do no harm. And part of that is to really try to do the best for your patient. There's no quote unquote "impulse" to that, and there's no ethic of that in a chatbot to do such a thing. And despite what OpenAI says, which we'll get to in a moment, you know, that's not what the impulse is. And I mean, that said, yes doctors have, and medicine as a practice has so many issues, and there's much to take to task of doctors, but at least there's someone you can point to and say, this is the person who did this misdiagnosis, or might have been focusing on the wrong things. Or whatnot. And putting that all into a single interface, really misses the point of doctoring and misses the point, I think, of consults. And I mean, also, who's considered to be an expert here, and what kind of tools do you use to engage in expertise?
Emily M. Bender: Yeah. All right. I got something from the chat that I'm going to brave this username to lift up. So the username is Zubenelgenubi17: "'I ask questions about things I don't know, but please be assured I'm 100% able to tell if the answer is correct.'" Yeah, exactly.
Alex Hanna: Right, right. Mm-hmm, yeah. So, "Many people- patients, clinicians and policy makers- are dissatisfied with the current state of healthcare. American medicine delivers miracles every day, but the system itself is a mess, chaos wrapped in mind-boggling paperwork and absurdly high prices. It's in desperate need of transformation." Sure, agreed. But chatbots are not gonna do that. But then the argument is: "AI can support this transformation, but only if we stop disproportionately focusing on rare bad outcomes, as we do with new technologies. While research demonstrates that driverless cars are now safer than those with human drivers-" Not true.
Emily M. Bender: Yeah. False.
Alex Hanna: Also false. "A serious accident involving a robotaxi is deemed highly newsworthy and often cited as a reason to take driverless cars off the road, whereas one accident involving a human driver may hardly leave a media ripple." So this is, of course, a false equivalence. You know, there is research that's basically saying you actually can't make that comparison just by miles driven in driverless cars. And the kind of reporting on driverless cars- and I shouldn't call them driverless cars, I call 'em automated vehicles- is also still coming in. In certain places, they're doing fantastically badly; for other companies, there's just not enough data. So, I mean, your premise is failing here.
Emily M. Bender: Exactly. And you know, in some ways it's the same thing. Like, yes, there's a real problem, right? Our roads could be safer, they could be more pedestrian friendly, they could be more bike friendly. We could certainly lower the number of deaths of people in, you know, accidents involving single occupancy vehicles, let's say. But that doesn't mean that robotaxis are the solution. And similarly, yes, American healthcare is in desperate need of transformation, but that doesn't mean that chatbots have anything to do with that, right? And it's like, he just keeps digging himself deeper here. So the next bit, "Aiii! based mental health assistants are being subjected to similar scrutiny." They better the hell be subjected to scrutiny. Sorry. "A handful of tragic cases involving harmful mental health chatbot responses have made national headlines, spurring several states to enact restrictions on these tools." Good. Right? This is a highly regulated field. It should be regulated. But you know, this dude doesn't seem to think so. "These cases are troubling and demand scrutiny and guardrails, but it's worth remembering that millions of people are now able to receive counseling via bots when a human therapist is impossible to find or impossibly costly." That's not counseling! Right?
Alex Hanna: Yeah. It's, yeah, there's so much that's frustrating in here. And I mean, especially in the mental health scenario, I mean, you can refer to our episode with Maggie Harrison Dupré, in which we talked about AI psychosis and the kinds of ways in which people are really affected by these particular- I mean, these kinds of outputs are really driving people to do awful things. I mean, yeah, we should focus on those high-profile cases, because those are the tip of the iceberg there as well. And it's just, ah, I mean, it's super frustrating. There's some good things in the chat I wanna raise up. Blend3rman says, "I feel like accountability to a large extent drives excellence. If you aren't accountable for your results, why would you strive to improve them?" Which is a great point. And sjaylett says, "I submit that focusing on common bad outcomes would be a sensible place to start with modern medicine-" like at all- "but it would not remotely involve Aiii!" You know? And I mean, there's so many different outcomes that we can talk about rife in medicine. I mean, let's start with the Black maternal mortality rate in America, which is absolute dog shit, you know? And that's itself just a terrible outcome, and one that persists across class. If you haven't watched it- I forget which documentary with Serena Williams- she talks about her birthing process and how she almost died. I mean, start there, with providers that are not using these automated tools, and yeah, treat these cases as the serious things they are.
Emily M. Bender: Yeah, absolutely. All right, I've got a lot to say about this next paragraph, too. "At UCSF Medical Center, where I work, many of our physicians now use Aiii! scribes that, with the patient's permission, quote, 'listen' to doctor-patient conversations and automatically create summaries of the appointment. Aiii! can also quickly review and summarize patients' medical records, a huge boon when one in five patients has a record longer than Moby Dick. In both cases, the Aiii! isn't flawless, but it can outperform our previous system, which had physicians working as glorified data entry clerks." So there's so much in here, but the first thing I wanna share is that I was talking to a physician at another university medical center who's involved with the interpreter services at that center. It's a place that serves a population with many different linguistic backgrounds, so they frequently need interpreters. And he is very upset about the push to use the so-called AI scribe system, not least because of issues of consent. So in this paragraph it says "with patient's permission," but the consent practices actually don't make clear where the recording of the conversation is going, how long it's gonna be stored, whether or not it's being used for training. So that was one point, and another point that was completely heartbreaking to me was he described the interpreters hearing the physicians start basically talking to the computer. So they switch register, and start dictating stuff they want transcribed in the middle of the doctor-patient visit. And the interpreter is like, I don't think I can interpret that. Are you talking to the patient? Or what's going on here? And it really raised the issue of, you put this technology in, and there's this sort of imagination of how it's gonna function- it's just in the background, it's just ambient listening. But in fact, it changes the way people behave quite a bit.
Alex Hanna: Yeah, that's a really good point. And the kind of behavioral change with automation, I think, is a huge point. And I think it's certainly one that gets underappreciated. I mean, there's reporting by Garance Burke and someone else from AP from a while ago that talks about Whisper and the kind of quote unquote "hallucinations," rather than "making shit up," in particular patient transcripts. I know there's work by Roxana Daneshjou, who we've talked about, on the racist tropes that get echoed by LLMs. And I would say also, I've been talking to a lot of nurses, and this is a case where I think there might be a bit of a physician-nurse divide, but I don't talk to physicians as much as I talk to nurses. But in nursing practice, a lot of things with ambient scribes and automated charting, many nurses are very much against this, including nurses from National Nurses United. One, you know, because of the privacy issue: it's not just about patient consent, it's also whether the workers want to use them too. Because it then becomes a mechanism for workplace surveillance. So that's one of them. Also, charting is not just quote unquote "glorified data entry." I mean, charting is part of nursing practice. It's to see patterns, it's to understand whether there are some abrupt changes- it's a way of quantifying that kind of skilled nursing care.
Emily M. Bender: Yeah, it's absolutely part of the care, when nurses do it, when physicians do it. And there's a beautiful essay by Aliaa Barakat in Stat News, with the headline, "No, I don't want an AI scribe to write my pulmonologist's note." The physician written note is invaluable. And when this guy says glorified data entry clerks, first of all, way to shit on other people's work, right? And so yes, if you have too much paperwork to do, if the medical records are too long to actually get the information that you want out, then probably the system needs transformation, for sure. But maybe that transformation could be something better than Aiii!, right?
Alex Hanna: Yeah, absolutely. Absolutely. So this next thing says, "As AI becomes more commonplace in healthcare, we need to develop strategies to determine how much to trust it. As we measure error rates and harms from AI, we need frameworks to make apples to apples comparisons between what human doctors do on their own today and what AI-enabled healthcare does tomorrow. In these early days, we should favor a quote 'walk before you run' strategy, starting with using AI to handle administrative paperwork tasks before focusing all our energies on higher-stakes tasks like diagnosis and treatment." Ugh, I'm gonna read this next one too. "But as we consider the full range of areas in which AI can make a positive impact and design strategies to mitigate its flaws, delaying the full adoption of medical AI until some mythical state of perfection is achieved will be unreasonable and counterproductive."
Emily M. Bender: So, didn't he start this whole thing by saying he was using it for these curbside consults, which is diagnosis and treatment information? Right?
Alex Hanna: Yeah. He's already, it's informing his practice already on those elements, right?
Emily M. Bender: And it's also, I'm still stunned by the fact that this fellow, who thinks he's talking about medicine, is using the one that's supposedly medicine-specific and ChatGPT interchangeably. That's alarming, right?
Alex Hanna: Yeah. It's quite alarming. Yeah. I would like to also back up and just talk about the error rates and harms from AI, because, I mean, first off, we just don't have robust evaluations, even if you're not considering the quote unquote "human in the loop" element or the collaborative element of it. Even the sort of evaluation of chatbots, especially in the health domain, is moving around so much. I got to read a great draft paper from some folks, I won't name them because I think it's under review right now, but it was a comparison of different benchmarks. And they were talking about benchmarks used in industry, and basically it echoed a result that Bernie Koch, Remi Denton, Jacob Foster, and I found: that there's a pretty high concentration of usage in a very small set of benchmarks. And in the health domain, this was even narrower. And these were typically benchmarks that were developed by the companies themselves, by the vendor. And so, you know, there's no institutional robustness there. And then the next part of it is just like, "the apples to apples comparison between what human doctors do and what AI-enabled healthcare does." Like, how do you compare that? I'm trying to understand what that evaluation even looks like in practice. Because in my view, if you're comparing actually going and talking to other people in the workplace, doing these curbside consults, with doing consults with ChatGPT, there are externalities in the curbside consults that are nowhere comparable to what you'd get from talking to ChatGPT or OpenEvidence.
Emily M. Bender: Yeah, absolutely. And then there's one further layer, which is, let's say we even had good evaluation methodology. Let's say that we solved this problem that you're raising. The products that they're talking about are proprietary closed products. So you may evaluate one and decide it's good enough for whatever use case, and then it changes underneath you. This is just, yeah. I have to say, there's something that's been bugging me about curbside consult all along. Why curb? Is the other doctor about to drive home? Is that what's happening?
Alex Hanna: Maybe it's like, I'm trying to get outta here.
Emily M. Bender: Yeah. All right. So he's got this positive use case scenario: "Imagine a world in which a young woman with vision problems and numbness visits her doctor. An Aiii! scribe captures, synthesizes, and documents the patient-physician conversation, a diagnostic Aiii! suggests a diagnosis of multiple sclerosis, and a treatment Aiii! recommends a therapy based on her symptoms, test results, and the latest research findings. The doctor would be able to spend more time focusing on confirming the diagnosis and treatment plan, comforting the patient, answering her questions, and coordinating her care. Based on my experience with these tools, I can tell you that this world is within reach." So the doctor is then, in this scenario, "confirming the diagnosis," so, you know, not just believing what came out of the AI, I guess. But then mostly just being the interface between the person and the AI system. Is that what he's imagining?
Alex Hanna: Yeah, I suppose the kind of grunt work here is doing the diagnostics and trying to figure this out, and then the treatment is suggesting a course of action. And I mean, this is an ideal scenario, basically, of doing a kind of matching. And I think there's a lot of different parts of this which unsettle me. I mean, thinking about how the diagnosis and treatment itself is not something that is interactive with the patient, and, you know, there's a real authoritativeness to it as well. And I worry about this a little bit too, because one of our fellows at DAIR- and, yeah, sjaylett says, "Did the doctor even listen to the patient?" And I think this is spot on, because Crystal at DAIR is a former data worker, and she talks about being a professional patient, and a lot of it is about responding to her body and that kind of embodied knowledge. And I could see how this goes quite wrong with certain types of people: people with disabilities, people with certain types of ailments that have been a bit poo-pooed, people with long COVID. I could see a real kind of inequality arising from that area. So I worry about that quite a lot. Not that that won't happen in traditional medical practice. But I think that automation bias really adds another layer to it, to really cement what the doctor's doing, and to really not listen to the patient.
Emily M. Bender: So it goes from the doctor saying, "I know your body better than you do," to "This machine knows your body better than you do." Yeah. And this whole paragraph just reads to me like the doctor saying, "I really wish that I had omniscience about what was wrong with my patients-" an understandable wish. "Here's a machine that I can imagine is doing that. And then I could work with complete confidence because I've got this magical machine that's telling me exactly what's wrong and what our best, latest research tells us is the right direction to go." And I think, as always, you can see the real need in here that leads people to turn to these systems. And it's just like, you really wish that someone who's the chair of the Department of Medicine at UCSF would have more skepticism.
Alex Hanna: Yeah, exactly.
Emily M. Bender: All right. Should we wrap this one up?
Alex Hanna: Yeah. Let me read the last two, and then we'll finish this and we'll get onto the next thing. "I'm not arguing that we shouldn't aspire to perfection or that AI in healthcare should receive a free pass from regulators. AI designed to act autonomously, without clinical supervision, should be closely vetted for accuracy." I mean, I don't think that should exist at all, no matter how high the accuracy is. "The same goes for AI that may be integrated into machines like CT scanners, insulin pumps, and surgical robots, areas in which a mistake can be catastrophic, and a physician's ability to validate the results is limited. We need to ensure patients are fully informed and can consent to AI developers' intended use of their personal information." So this is the first place where information and data security is mentioned. "For patient-facing AI tools in high-stakes settings such as diagnosis and psychotherapy, we also need sensible regulations to ensure accuracy and effectiveness. But as the saying goes, 'Don't compare me to the Almighty; compare me to the alternative.' In healthcare, the alternative is a system that fails too many patients, costs too much and frustrates everyone it touches. AI won't fix all of that, but it's already fixing some of it, and that's worth celebrating." Oof. Yes.
Emily M. Bender: Yikes. Yeah.
Alex Hanna: Yeah. Very bad.
Emily M. Bender: Yeah. So why- I mean, this is back to that same "Oh, well, it's better than nothing," right? It's like, well, why was the alternative nothing? Why can't we actually get our act together and kick the for-profit health insurers, and also the private equity, out of healthcare, and then redirect those resources? Create a workplace where the providers have more time to spend with patients, because they're not asked to do so much. Create a space where patients feel heard and actually can advocate for themselves effectively. There's lots that could be done. The problems are real. But, you know, "Yes, let's just skim off more data for the AI companies" is not gonna help.
Alex Hanna: Yeah. And that's really- you are absolutely right. What are the conditions that have led American healthcare to be so bad? Then there's also other things, like the high-stakes scenarios that are being considered. CT scanners, insulin pumps, and surgical robots- this is where different types of pattern recognition and synthetic media generation get mixed up. With CT scanners, you might have some automated detection of, or bounding box drawing around, particular types of things. Insulin pumps- I don't know enough about insulin pumps or surgical robots, and I'm assuming an insulin pump probably has some kind of a threshold in which, you know, like insulin-
Emily M. Bender: It's not asking ChatGPT for how much insulin to dispense.
Alex Hanna: Exactly. It's saying-
Emily M. Bender: I mean, it better not be, right?
Alex Hanna: It's probably, you know, blood sugar go, glucose go down, insulin go in. I'm assuming that's a pretty easy- but again, I don't know these machines very well. I'm assuming they're a bit more complicated than that. I actually wanna point folks to a great essay from Laura Forlano. And then surgical robots- but those aren't large language models. But yeah, I mean, the whole thing really stinks, and everybody in the chat is like, "This last paragraph, really awful stuff."
Emily M. Bender: Yeah. And snake_eyes_lv nails it. "I'm too busy to do my job, so this should do my job for me, and I'll just write bad opinions for the New York Times."
Alex Hanna: Yeah. Yes. The opinions are very bad, and I'm dropping, I'm gonna drop this in the chat. This is a piece by Laura Forlano, which is called "The Danger of Intimate Algorithms." And Laura Forlano is a professor of design. And she is talking about being woken up night after night by alerts from her insulin pump, which is just doing completely ridiculous things. It's supposed to be doing this monitoring, but it's constantly alerting her about her insulin, you know, and in doing so, is really messing up her life. So imagine slapping an LLM on that, and imagine just how wrong that will go, too.
Emily M. Bender: Yeah. Yikes. All right, well, meanwhile, OpenAI is all over it. This is from January 7th, 2026. This is marketing copy on their website. So we're not gonna see other ads, it's just one big ad. And the headline is "Introducing ChatGPT Health," subhead, "A dedicated experience in ChatGPT designed for health and wellness." And then there's a button, "Join the wait list."
Alex Hanna: Gosh, yes.
Emily M. Bender: Okay. So: "We're introducing ChatGPT Health, a dedicated experience that securely brings your health information and ChatGPT's intelligence together to help you feel more informed, prepared, and confident navigating your health." We're not gonna get through this whole thing if I keep stopping this much, but first of all, ChatGPT's intelligence is bullshit, right? Doesn't have intelligence. But also, their goal is not better health outcomes, it's "to help you feel more informed, prepared and confident," which is like, okay, but that's not actually what's needed, right? So, okay. "Health is already one of the most common ways people use ChatGPT, with hundreds of millions of people asking health and wellness questions each week." Yikes. "ChatGPT Health builds on the strong privacy, security, and data controls across ChatGPT with additional layered protections designed specifically for health, including purpose-built encryption and isolation to keep health conversations protected and compartmentalized. You can securely connect medical records and wellness apps to ground conversations in your own health information, so responses are more relevant and useful to you. Designed in close collaboration with physicians, ChatGPT Health helps people take a more active role in understanding and managing their health and wellness, while supporting and not replacing care from clinicians."
Alex Hanna: Ugh, there's so much here. I mean, there's, you know, "Health is already one of the most common ways people use ChatGPT," and I'm like, that sounds bad. And sjaylett captures my thoughts very well here, which is, "Every day millions of people ask ChatGPT for support for their health. Instead of asking ourselves whether that was a good idea, we jumped on the opportunity to draw investor graphs that go up and to the right." I'm like, yes. So it's effectively, how can we monetize this in different ways? The other part is about privacy, security, and data controls. I mean, that's one thing that's bullshit. 'Cause if OpenAI gets subpoenaed, which they do all the time, you know, there's no way that conversation is gonna be exempt from that. They say nothing in this piece about HIPAA. That is not mentioned once. And then, it doesn't matter what encryption is involved- like, I don't want you to have that data to begin with.
Emily M. Bender: Yeah, exactly. And they say "keeping health conversations protected and compartmentalized," which means all of the other conversations aren't.
Alex Hanna: I think they talk about it more later, 'cause basically it's like, you can have a view on your data that is only health-based and is not touching other things. Yeah. And then the last bit, about collaboration with physicians- I wanna get into that specifically when they mention a specific number.
Emily M. Bender: Yes. I'm- should we just go down to that, or do we wanna keep going a little bit more first?
Alex Hanna: Well, let's- here. I do want to jump into this. "ChatGPT Health builds on this-" which is the 230 million people globally asking health and wellness related questions every week, which is alarming. And so then they say, "ChatGPT Health builds on this, so responses are informed by your health information and context. You can now securely connect medical records and wellness apps such as Apple Health, Function, and MyFitnessPal." And let me tell you, when I read this, I forgot that I still had a MyFitnessPal account, and I deleted it immediately. So, "ChatGPT can help you understand recent test results, prepare for appointments with your doctor, get advice on how to approach your diet and workout routine, or understand the tradeoffs of different insurance options based on your healthcare patterns."
Emily M. Bender: That's an interesting set of things. So, "understand recent test results." No thank you, right? And isn't there- oh, I forget if it was Gemini or something else, but something was basically taking "positive," as in the value is positive, and turning it into "positive" as good or vice versa on something. It's like, this was never going to be the kind of thing where you would want a friendly, approachable sounding, authoritative sounding source getting in between you and the information. But then, "prepare for your appointments with your doctor," so people are gonna come in really convinced about something 'cause ChatGPT told them so. And then "get advice on how to approach your diet and workout routine," so, you can go ahead and get that advice directly if it's just diet and workout. We're being careful to say you're not gonna get medical advice directly. And then "understand the tradeoffs of different insurance options based on your healthcare patterns." Again, I wouldn't trust it. And also, that's sort of an interesting thing to add to this list. It feels like some set of interns just sat down and brainstormed a bit.
Alex Hanna: Yeah, I mean, I think these are also things that certain insurers do have, and I think that's one of the things they're trying to preempt. Certain insurers are recommending some of this. I know there's a tool from the co-employer that DAIR uses. And I've tried it just to try it- it's not a chatbot, but it's basically like, if you have these kinds of costs. And so I think they're trying to effectively preempt that in a certain kind of way. Which is still a really weird thing. And I mean, you know, insurance is very complicated. And again, this is a point where American healthcare is a nightmare, but trying to navigate insurance shouldn't have a chatbot on it. But, you know, gosh dang it. That's not a great thing. And I think the thing I worry about the most on this is the "prepare for appointments with your doctor." And I'd be very curious too, on- yeah, and I'm seeing magidin has been responding to Blend3rman, who says, "I'm sure doctors already have to deal with self ChatGPT diagnosed patients." And magidin says, "Yeah, they've been dealing with WebMD and Dr. Google for ages now." And that kind of idea where it's like, what kind of new- I don't wanna say psychosis- what new kinds of paranoias are going to be induced if you have people that are feeding information into this, and what is the additional layer that's going to be added onto this? So I think that's probably, maybe, a more pernicious aspect of it. And I know patients that have had particular sorts of health needs really tend to, I think, get better advice when they're in community with each other, trying to navigate all this. And then putting a tech tool on it, I think, is really just gonna lead people down some really weird places.
Emily M. Bender: Totally. And I think that there's a difference between "I looked at my symptoms and I'm worried I might have whatever," and "I gave ChatGPT all my health data and ChatGPT tells me that I have," right? And then, this point about community is really, really important, because yeah, if you are Googling things and finding links on the web, first of all, you might land on a forum, which could be useful, but also you, I think, are more likely to turn to someone to help understand the documents that you find, and at least be talking to another person, where the chat interface, I think, cuts that off quite a bit. And, you know, no one's- someone might say, "Yes, I'll look at that article you found in the NIH database," or, "I'll look at that WebMD page with you," but "I'm gonna look at your ChatGPT transcript with you"? I don't think so. You know?
Alex Hanna: I also think about the tendency, I know that others have called it sycophancy, but, you know, effectively thinking about a kind of way in which someone might interact with a chatbot on health. If they're like, "Hey, ChatGPT, I've got X, Y, and Z, you know, what might it be?" And then it's like, "Well, it might be, you know, lupus or MS or blah, blah, blah." And they're like, "Oh, do you think I have lupus or something?" It's like, "Sure, you have lupus!" Like, I'm imagining the kind of confirmation it does for, you know- if we're talking in the first part of this pod on confirmation bias on the part of doctors, imagine the confirmation bias that's happening on the part of patients.
Emily M. Bender: Yeah, absolutely. Especially with systems that are designed to be confirming like that. So I just wanna take the first little bit of this paragraph, and then I think we should drop down to the part where they worked with doctors, 'cause it's appalling. But, "Health-" which is the name of their product- "is designed to support, not replace, medical care. It is not intended for diagnosis or treatment. Instead, it helps you navigate everyday questions and understand patterns over time," blah, blah, blah. So this is them basically saying, we do not need to be regulated like a medical provider, or a medical instrument, medical technology. This is outside of that. And I was at an event with Deb Raji last week, and she made the really important point that these companies are very good at positioning their products just outside what's actually being regulated. And we need regulators to get on that, stat, because this is not okay. But here's the blah, blah, blah about their privacy, but I wanna get to this thing, "Built with physicians." Do you wanna do the honors?
Alex Hanna: Yeah, yeah. No, thank you, I appreciate it. I was also looking at this, like I was searching just to ensure that HIPAA is not mentioned. I'm pretty sure it's not though.
Emily M. Bender: Right, HIPAA's not mentioned because they're not medical providers, so they're not subject to it, right? And people are voluntarily giving their data, so, fine. It's bad.
Alex Hanna: Yeah. So this part: "Built with physicians. ChatGPT Health was developed in close collaboration with physicians around the world to provide clear and useful health information. Over two years, we've worked with more than 260 physicians who have practiced in 60 countries in dozens of specialties to understand what makes an answer to a health question harmful or potentially- helpful or potentially harmful."
Emily M. Bender: Harmful or harmful, why not?
Alex Hanna: Yeah. Harmful or harmful, which, you know, pick your poison. They're both poison. "This group has now provided feedback on model outputs over 600,000 times, across 30 areas of focus. This collaboration has shaped not just what Health can do, but how it responds, how urgently to encourage follow ups with the clinician, how to communicate clearly without oversimplifying, and how to provide safety in moments that matter." So this is just like, this is incredible. Do you wanna say something about it?
Emily M. Bender: Yeah. I mean, so they present this as, we've designed this in collaboration with physicians, when in fact what they've done is they've put out some human intelligence tasks, right? 600,000 times. That's annotation work. They did not actually talk to the physicians. They did not say, "What would it be like for a patient to be having this conversation, or for you to be having a conversation with a patient after they've interacted with this?" They said, "Here's this scenario, how do you annotate? Here's this scenario, how do you annotate?" And so on.
Alex Hanna: Yeah, exactly. And I'm actually looking at the details, too, because it looks like this was part of the benchmark they developed, called HealthBench. Which is these simulated conversations, I'm assuming, and I'm assuming that they're- I'm actually wanting to dig into this a little bit more, 'cause I hadn't read this before. 'Cause the numbers are pretty much identical. So it's like, 260 to- what does it say? I have to go to the bottom of this. But it's effectively the same stat. So it's like they're saying the same-
Emily M. Bender: This is HealthBench.
Alex Hanna: Yeah, they're saying the same number of countries. I'm looking at the original paper, 'cause I'm very curious on, like, what the interface was. So 5,000 realistic conversations- tell me about the data! This is my gremlin place. Oh, tell me where you're burying this, in section three. Here we go. So, tasks and health plans were created by a group of 262 physicians over 11 months. This is in their hosted paper. And then, they talk about this group- all right, tell me how you recruited them. "We vetted them through a multi-step process. They expressed interest-" how? How, motherfuckers, tell me! It's just like, I'm sure- was this on a platform? Submitted an interest form? I'm really wanting to dig into it, 'cause I'm actually quite curious. But they're being very mum on the thing. So I'm assuming they basically had some kind of, effectively, a crowdworking type task. And one of the things that is very upsetting about this is that they're saying that they're building this with physicians as if physicians are real collaborators, as if they're actually designing this with a way of actually intervening in mind. Not as in, hey, make some money to supplement your income. And, you know, especially in countries where physicians may not be getting paid as much as the high-profile physicians do in the US. And to me, it's a type of deprofessionalization, with that veneer of expertise that the word physician gives off. And that to me, I think, is very, very pernicious in what they're doing here, and very dishonest.
Emily M. Bender: Yeah. And it's such a contrast to the other doctor complaining about- or, the doctor, the other author, complaining about charting as being "glorified data entry." And here OpenAI is like, "Well, we're gonna take these physicians and basically commodify their expertise and claim it." And it's kind of hilarious to me that they think the 600,000 times is a good stat to put into this marketing copy here. 'Cause it so gives away the game.
Alex Hanna: Yeah, yeah. I mean, that's the sort of thing, and on the face of it, if you're not familiar with the crowdworking elements of it, you might be like, "Oh, wow! They must have a huge team that's actually doing this." But in fact it's what's described in a paper from Mona Sloane and a few other folks, "Participation is not a design fix for machine learning." So that's Mona Sloane, Manny Moss, Olaitan Awomolo, and Laura Forlano, who I mentioned again- Laura's getting a lot of love this episode. And they have these three models of participation. And the first one is participation as work, and crowdwork is given as the example here. So, this is participation as work, and it is not participation as justice, which is at the other end of the spectrum. So it is a kind of de-skilling of the expertise, effectively saying, well, we have these interactions, therefore it's validated.
Emily M. Bender: Yeah. So I'm gonna read the last sentence here, and then take us into our transition to Fresh AI Hell.
Alex Hanna: Yes, let's do it.
Emily M. Bender: OpenAI closes with, "The result-" of their evaluation driven approach, supposedly- "is support that people can trust, always designed to support, not replace your healthcare providers." So trust it, but not too much, which is absurd. All right, Alex, the only idea I have for the Fresh AI Hell transition is bleak. Are you ready for it?
Alex Hanna: Bleak?
Emily M. Bender: Yes.
Alex Hanna: Oh, I thought that was like- I thought it was gonna be like, "Bleak- and, go!"
Emily M. Bender: No, no, no.
Alex Hanna: But go ahead. Yeah.
Emily M. Bender: So, you are a doctor, someone who actually had a whole career treating patients face to face. And now the only work that you can find is behind this gig-ified interface where you are not entirely sure if you're training a system or actually answering real questions from real patients. Go.
Alex Hanna: That's very bleak, 'cause that's like, what people are actually doing.
Emily M. Bender: I know.
Alex Hanna: I don't even know how to replicate that. And I guess, I mean, I guess I would say, like, I'd wake up and I'd make my coffee, and I'd go, all right, time to answer questions on healthbench.ai. And then I put my doctor's jacket on just to- I don't wanna do this! This is very upsetting. This is, I think, what people are actually doing now. And I think this is really happening in so many different scenarios, where- we mentioned this last episode, but the AI trainer as a job, these listings are real now, they're not even imagination. It's very upsetting stuff.
Emily M. Bender: Yeah. I'm sorry. Next time I'll make you a demon again. That'll be more fun.
Alex Hanna: It's okay. You can use my weird New York, faux New York accent.
Emily M. Bender: Yeah, all right. Anyway, we are now in Fresh AI Hell, and you get this first one. This is definitely an Alex one.
Alex Hanna: Yeah, this is the ongoing saga of KitKat, who is a cat deemed the King of 16th Street in San Francisco, who was killed by a Waymo. This is a skeet from John Berry, who is aniccia, A-N-I-C-C-I-A, .bsky.social, who is a really interesting figure- like, I don't know who this person is, but, like, only tweets about Waymo stuff. But he's saying, so, "Waymo's NHTSA crash report-" and they have to log these reports, it's federally mandated- "narrative for the death of KitKat doesn't state whether the robot detected the cat or the woman who tried to save the cat, i.e. appears to omit slash obscure critical details and probably doesn't comply with reporting requirements. Also states impact as five miles per hour, which may be inaccurate." And then there's smaller text here, which describes the incident. And this is their report, if you wanna see this- it's in the larger data file, which I can drop in the chat. And yes, hashtag justice for KitKat. But his follow-up is interesting, 'cause he says, "It reads like mostly written by an observer or a viewer of the Randa's Market video." And there's actually video of the scene, which was only released after Waymo's initial report, in which Waymo basically said that the cat sprinted in front of the tire, which did not happen. And then he continues, "never properly slash clearly stating where their robot detected it or why it acted, e.g., quote, 'The cat moved farther under the Waymo AV-'" and this is what they say in the report- "may be true, but Waymo has already publicly stated that their robot cannot detect quote 'under AV.'" And it goes on and on. So, I mean, this is kind of the ongoing thing about Waymo safety: detecting things under the AV rather than in front of it. And also, there is a federal mandate- and I don't know if NHTSA lives under the federal DOT- but the way in which they're just grazing, as Deb said, you know, what the actual regulation allows.
Emily M. Bender: Yeah. Oof. Well, thanks for that update, John Berry. Here's one that's on topic for today. This is in something called pymnts.com, I dunno what that is. Doesn't sound very reliable. "Medical schools use AI patients to help with clinical training," January 22nd, 2026. "Medical schools and teaching hospitals in the UK and US are increasingly using artificial intelligence generated patients to train future doctors. The move trains students in communication, diagnosis, and clinical reasoning, shifting medical education away from episodic, resource intensive simulations and toward continuous, software driven practice." And this just sounds like such a terrible idea. Like, I would hope that medical training involves real case studies, maybe role play, if you wanted to like, practice talking with a patient. Plus also, you know, actually being with actual patients supervised by the attending. But I can just picture that someone's like, "Oh, cool, we can make the fake patients ethnically diverse!" Right?
Alex Hanna: Yeah, I mean, the fake patient thing is very bizarre as it is. Like you have, there's like the medical actor element of it, and Dylan Mulvin in his book Proxies has a really interesting piece on this. But yeah, using the LLMs on it, I'm just like, okay, do you wanna take a stereotype of fake patients and then make it worse?
Emily M. Bender: Let's automate the racism. Weee!
Alex Hanna: Yeah, yeah. It's just, it's pretty bad.
Emily M. Bender: All right. You want this one? Just the headline's hilarious.
Alex Hanna: Yeah. So this is, "Robot coffee cups, self-driving trivets? AI researchers made it happen." And this is at CNET. "Scientists found a way to animate everyday objects and predict your next move, so your stapler is always nearby when you need it." And this is by Jon Reed, published December 29th, 2025. What is this about? Can you scroll down? Oh, so there's little, like, motorized wheels on this. And there's a pencil tray, a stapler, and a coffee mug mounted on platforms so they can move at the command of an AI system, which is very funny. "Picture this: you're making cookies for a holiday get together and things have gotten hectic in the kitchen. You've opened the oven door, donned the oven mitts, and grabbed the hot metal tray of warm snickerdoodles. You turn around to place them on the counter, and- whoops, you forgot to prepare something for the tray to rest on!" Happens all the time. Now this is my editorializing. "As you weigh your options, you notice that some trivets have started to move out of their storage space on the counter. They're rolling on their own right into place." Very Beauty and the Beast coded. "It seems like magic-" oh, they say- "like something out of Beauty and the Beast, but it's one possible vision of your future kitchen, according to researchers at Carnegie Mellon University." You know, not, like, LLMs per se- actually kind of funny.
Emily M. Bender: But totally luxury surveillance.
Alex Hanna: Well, yeah. If they have cameras.
Emily M. Bender: Well, because that's to make it work. It's like, we're gonna video what you're doing, and Zubenelgenubi17 says, "Controversial opinion, but I actually want objects to be in the location I last left them."
Alex Hanna: Yeah, that's true. This is like an AuDHD, like an autistic ADHD nightmare, now that I think of it. Like, it put something-
Emily M. Bender: And, you have to store all these things on flat surfaces so they can roll to you, also.
Alex Hanna: That's true, yeah. Unless you have, well, unless you get like, wall climbing things.
Emily M. Bender: Next!
Alex Hanna: It's, that's, yeah. Anyways, let's move on. Yeah, your turn.
Emily M. Bender: Okay. So this is January 12th, 2026 in the AP, by Heather Hollingsworth. "Monkeys are on the loose in St. Louis and AI is complicating efforts to capture them." And basically, there were reports of vervet monkeys on the loose in St. Louis. And then people started having fun and making up fake pictures of monkeys in various places in St. Louis, apparently. And then it made it much harder to figure out what's going on. And I did a little bit of searching this morning to see what's up. And apparently they've given up on trying to find these monkeys.
Alex Hanna: Oh no. Geez.
Emily M. Bender: If they were even real in the first place. Probably they were- I don't think this started as a hoax. At least I haven't seen anyone saying that. But it's one of these cases where something happens, we're trying to deal with it, and the pollution of the information ecosystem's just making it worse.
Alex Hanna: Yeah. Lord. Terrible. All right, next, this is from- I saw this, this was quite annoying. And I kind of hate this, 'cause I actually like John Holbein, he's a political scientist and I think he's a political methodologist. So this is a skeet, and he's talking about a paper. And the paper, let's go into the paper first.
Emily M. Bender: Okay.
Alex Hanna: So this is the title of the paper. And I don't know if this is a preprint, and, you know, this slightly goes against our kind of internal rule of not really going after grad students, but it was pretty rough. So this is by someone named Nicole- not Nicole, Noah Dasanaike. And the title is- sorry, I butchered that name- "Large language models naively recover ethnicity from individual records." There's a method that was developed by, I believe, the RAND Corporation, that does estimation of ethnicity from names- which already is pretty questionable- and it does this from geocoding. So they integrate, I believe, American Community Survey data and a few other types of metrics. And then this author, Noah, assesses LLM classification of names. And then there's a comparison, in this case of Gemini and GLM, which I'm assuming is like an open source model, or an open weights model, and then the Bayesian method that was developed by RAND. And it's doing it in two locations, Florida and North Carolina. And then it says that it's basically doing better, in sort of the assessment here. And I'm assuming they have ground truth data. And so this is-
Emily M. Bender: I wonder what the gold standard is for this, but yeah.
Alex Hanna: I'm assuming it's probably from- I'm making a big assumption- it's probably from American Community Survey or Census microdata, where they actually have ground truth and people have done self-identification. Because in the Census you can effectively do self-identification. And then- so this is pretty bad, and then John is kind of, like, celebrating it. And he says, "The paper shows that large language models can substantially improve on existing methods. LLMs infer race, ethnicity, and related ascriptive identities from full names with accuracy that matches or exceeds current best practices." And, you know, like, this is annoying, and also using estimations is never a good idea.
Emily M. Bender: Why is that a best practice?
Alex Hanna: It's a, yeah. Best practice is, like, if you have to use something, you use BISG. If you're interested in a paper on it, I wrote a paper on this in 2020 which focuses on racial classification in machine learning fairness, and it is called "Towards a critical race methodology in algorithmic fairness." And it's a paper I wrote with Remi Denton and Andy Smart and Jamila Smith-Loud. So, that's why it's kind of a bad idea in general.
Emily M. Bender: And I have a paper coming out in Language- I'm involved in a paper coming out in Language. Alicia Beckford Wassink, Kirby Conrod, a few other people, and I are writing about how to talk about and actually conceptualize race and ethnicity in linguistic research. And it is very far away from, yes, you can just ascribe that identity.
Alex Hanna: Yeah, I mean, ascription, I think, is used more often than one would like. And then, you know- morally bad!
Emily M. Bender: Yeah. I'm gonna leave these last two and just use this one as our chaser, because this is so hilarious. So Ketan Joshi says- this is a post on Bluesky from January 13th, 2026- "Criticism, opposition, and activist pushback against the excessive and unjustified over deployment of genAI matters. If you want proof, here's two ultra rich and unstoppably successful tech CEOs begging us to stop." And then one of them is Jensen Huang, in a piece from January 12th, with the headline, "Jensen Huang is begging you to stop being so negative about AI. Quote, 'It's extremely hurtful, frankly, and I think we've done a lot of damage,' he said."
Alex Hanna: Wow, damn, you're making the stock value of my shovel factory go down.
Emily M. Bender: And then also, similar time I guess, January 5th. Somehow we've got Microsoft's Nadella against the OpenAI logo, and the headline is, "Microsoft's Nadella wants us to stop thinking of AI as quote 'slop,'" by Julie Bort on January 5th, 2026. So this is: "In his classic intellectual style, Nadella wrote on his personal blog that he wants us to stop thinking of AI as slop and start thinking of it as bicycles for the mind."
Alex Hanna: Bicycles for the mind. That's incredible. It reminds me of Andrea Long Chu's old Bluesky or Twitter username, which is Wife of the Mind. But bicycles of the mind, I like that too.
Emily M. Bender: Yeah. So anyway, I guess the ridicule as praxis is landing and we should keep doing it, right?
Alex Hanna: Keep making, yeah. Keep making fun of it. That's it for this week. Our theme song is by Toby Menon. Graphic design by Naomi Pleasure-Park. Production by Ozzy Llinas Goodman. And thanks as always to the Distributed AI Research Institute. If you like this show, you can support us in so many ways. Order The AI Con at thecon.ai or wherever you get your books, or request it at your local library.
Emily M. Bender: But wait, there's more. Rate and review us on your podcast app, subscribe to the Mystery AI Hype Theater 3000 newsletter on Buttondown for more anti hype analysis, or donate to DAIR at dair-institute.org. You can find our merch store there, too. Merch, merch, merch!
Alex Hanna: Merch, merch, merch! Buy the merch.
Emily M. Bender: Buy the merch. That is dair-institute.org. You can find video versions of our podcast episodes on Peertube, and you can watch and comment on the show while it's happening live on our Twitch stream. That's twitch.tv/dair_institute. Again, that's dair_institute. I'm Emily M. Bender.
Alex Hanna: And I'm Alex Hanna. Stay out of AI hell, y'all.