Thinking About Ob/Gyn

A fresh and evidence-based perspective of all things related to obstetrics and gynecology. Follow us on Instagram @thinkingaboutobgyn or visit thinkingaboutobgyn.com for show notes and more.

Episode 11.11 When Evidence Misleads

May 27, 2026 • Antonia Roberts and Howard Herrell • Season 11 • Episode 11

We sit down with Joshua Oommen to get nerdy about clinical reasoning, FDA standards, and why “good evidence” is harder to define than most of us admit. We challenge the reflex to trust p-values and meta-analyses, then test our instincts against real OBGYN examples where the literature has whiplashed practice.
• why the podcast is called Thinking About OBGYN and how clinical reasoning shapes our work
• the NEJM proposal to make one pivotal trial the FDA default and what “confirmatory evidence” might mean
• medical reversal, surrogate endpoints, and how trust erodes when practice changes late
• why Bayesian thinking fits how clinicians interpret tests, trials, and prior beliefs
• how meta-analyses fail through small study effects, publication bias, p-hacking, and heterogeneity
• the amnioinfusion comeback as a case study in applicability and overconfident conclusions
Be sure to check out thinkingaboutobgyn.com for more information and be sure to follow us on Instagram.

0:00 Welcome And Today’s Big Question

3:48 Why “Thinking About OBGYN” Exists

11:54 The NEJM Push For One Trial

16:38 Medical Reversal And Trust Problems

24:43 AI Proteins And CRISPR Pressure Tests

32:33 Bayes Thinking Beyond P Values

36:43 Why Meta-Analyses Often Mislead

41:08 Bias And Heterogeneity Red Flags

46:24 Amnioinfusion And A Meta-Analysis Comeback

1:02:29 Final Warnings And How To Learn

Welcome And Today’s Big Question

SPEAKER_02 0:01

Welcome to Thinking About OBGYN. Today's episode features Howard Herrell and Joshua Oommen discussing clinical reasoning and meta-analysis.

SPEAKER_01 0:15

Howard?

SPEAKER_00 0:16

Joshua?

SPEAKER_01 0:17

What are we thinking about on today's episode?

SPEAKER_00 0:20

Well, we're going to do a little cognitive work. So for our listeners, let me introduce my good friend Joshua Oommen, who I can now call Dr. Oommen because he just graduated medical school.

SPEAKER_01 0:30

It's a surreal feeling to graduate. There are parts of med school that I really liked. There are parts of medical school that I'm glad I'm done with. So I'm happy to move on to the next season of training.

SPEAKER_00 0:42

And now you're headed off for an internal medicine residency.

SPEAKER_01 0:45

Yeah, that was a tumultuous path. I considered many specialties before: psychiatry, neurology, urology. A little bit of OBGYN, I'm not going to lie, Dr. Herrell. But I settled in on internal medicine because I just like the bread and butter.

SPEAKER_00 0:58

Wow. Well, let me tell our listeners a couple of things about Joshua and how and why we met, and that'll explain what we're going to talk about a little bit today. So last month at the ACOG Annual Clinical and Scientific Meeting, three or four listeners came up to me and asked the same question, which was, why is the podcast called Thinking About OBGYN? And I guess that seems like a curious title. It doesn't to me, but I wrote a book called Clinical Reasoning, and it's meant to be a primer for many of the cognitive skills that we use as physicians. And it ranges from how to take a history and physical to how to interpret lab tests and how to make a diagnosis and make differentials and how to interpret the literature. It's really a doctoring book, and it's used as a textbook for that purpose in some med schools. But the working title when I was writing it was Thinking About Medicine. And around that time, Antonia and I started this podcast. And so we called it Thinking About OBGYN. And I was enthusiastic about applying many of the cognitive lessons of that textbook to the subject matter of obstetrics and gynecology. But of course, as it turns out, listeners are more interested in practical things, the review of new literature, and the four tips series and the things we do without evidence and all those sorts of things that we spend most of our time on. So the material in the book is still present, though. People who've read the book and listened to the podcast will see it sprinkled in, but maybe not as explicitly. It's more subtle. But we do these things when we dissect the literature or talk about the effectiveness of certain new interventions or things like that. So our listeners will recognize that content. 
Meanwhile, Joshua was going to med school, and at some point he had purchased and read the book, and he hosts his own wonderful podcast called A Good Omen, a play on his name, and it really gets into more of these cognitive and philosophical and educational areas. And he invites guests on there, and he invited me about three years ago, and I've been on a second time since then, and we've become friends over that time. And well, he and I enjoy these nerdier philosophical, I guess, cognitive discussions. And so we're going to do that a little bit today. So it'll be a little bit different than our listeners are used to, but we're going to try to make it practical, at least on a few OBGYN topics, for OBGYN listeners.

SPEAKER_01 3:05

Well, listeners may be surprised to learn that I regularly tune in to the podcast, despite being someone who's going into IM. And the reason why is because if you truly want to understand how to practice medicine rationally, what we call in our modern era evidence-based medicine, it requires that you put your principles to the test in the trenches of various nooks and crannies of medicine. If a principle of EBM leads you astray in OBGYN, it may be a glimpse of a pitfall in internal medicine or some other surgical subspecialty. We'll work hard to make everything we discuss as relevant as possible to OBGYN, but I think it's important that we recognize that what we discuss applies to all of medicine.

SPEAKER_00 3:47

Yeah.

Why “Thinking About OBGYN” Exists

SPEAKER_00 3:48

Okay, so back in February of this year in the New England Journal of Medicine, there was, not an editorial, but a guest-invited piece written by Vinay Prasad and Marty Makary, who might be, as we speak, on his way out. Headlines today are that Trump's going to fire him, but so it goes in the Trump administration. And Prasad's been in and out a couple of times. But they published an article in the New England Journal of Medicine called One Pivotal Trial, the New Default Option for FDA Approval, ending the two trial dogma. And of course, our listeners probably know we've discussed that two trials, typically placebo controlled, are usually required to obtain FDA approval. We talked a lot about that around the controversy of Makena, or 17-hydroxyprogesterone, for recurrent preterm labor, which was approved with only one trial, which is something the FDA is empowered to do, but they awaited a second confirmatory trial, and of course eventually it came along, along with other data that showed it wasn't effective, and it was withdrawn from the market after nearly a decade. Well, I knew when I read that that it would have to be Joshua and I who would end up discussing this on the podcast. And both of us, I think, probably have our own personal thoughts and connections to Vinay Prasad. We both certainly have been aware of him since before we knew each other. And so I knew we had to talk about this. And we're going to discuss a few related points. And hopefully you guys will find this interesting. And for our longtime listeners, it'll provide more clarity about what we've always tried to do on this podcast. So we can start a little bit with that: you have your own Prasad thoughts, I have my own Prasad thoughts, but he's certainly been influential. He's quoted in my book. If folks don't know, he's a professor of epidemiology and biostats at UCSF, University of California, San Francisco, and I believe practices at San Francisco General. 
Went to University of Chicago, MPH at Johns Hopkins, and he's been this sort of meta-researcher, somebody who studies the process of research itself, focusing on quality of clinical trials, the accuracy of endpoints and things like that, conflicts of interest, financial conflicts of interest. But then during the Trump administration, he was appointed to a post in the FDA. And I wondered about that when that happened because a lot of the sort of MAHA influences in the FDA are, well, I was surprised. I don't know what your thoughts were about that.

SPEAKER_01 6:00

I think this is a unique combination of personal relationships with people that ended up being close with the administration. And I think Dr. Prasad saw an opportunity to take his health policy work and apply it in government action. I think the central question that animates Dr. Prasad's work is what I've sort of written below in our notes: how do we ensure medical decisions are made by good and reliable evidence in a time of widespread excitement and innovation? And I think that's a question that very much animates his work precisely because he's in hematology and oncology, a field that prides itself on doing a lot of research and having a lot of hope when they encounter diseases that seem impossible to treat. And so I think he sees this political opportunity and he joins in as someone who's not necessarily part of the red tribe per se. He goes in and he faces challenges and controversies. And he's told his audience and his personal friends many times that he can't keep his mouth shut and he likes to be iconoclastic. That can lead him astray.

SPEAKER_00 7:04

Yeah. Well, sounds familiar. But yeah, so for me, he introduced, or really opened my eyes to, the idea of medical reversal. And so he did a seminal paper on this many years ago; it was published in the New England Journal of Medicine. It was a landmark study that looked at over 3,000 articles, and they found that about 40% of then established medical practices that were there because of evidence, because of published evidence, were reversed when they were subjected to more rigorous trials. So by his definition, a medical reversal occurs when a new superior clinical trial, typically a large, well-designed, randomized controlled trial, contradicts whatever the current standard clinical practice is. And it reveals that a widely accepted treatment or maybe a diagnostic tool or a procedure is either no better than placebo or existing care, or in many cases, even harmful, or at least a waste of money. And so he wrote a book about this subsequent to that. And I think that's behind a lot of the things I talk about, like reliance on surrogate endpoints or surrogate markers. For example, if you give a treatment, does it lower your blood pressure, or does it prevent cardiac mortality and morbidity? Does it shrink a tumor, or does it prevent death from cancer? The first in each pair is a surrogate rather than a hard endpoint, where you live longer, you feel better, that sort of thing. And so reversals happen because we rely too heavily on surrogate endpoints. And he also shows really well this knowledge to action gap that I've been talking about a lot this season, where physicians continue to perform the status quo, then these things become reversed, and it's years after the evidence is published before they change their practice. And then also, and you can comment on this too, there's the idea of mechanistic reasoning versus empirical evidence. So I think he critiques a tendency to adopt practices because they make sense, which I certainly do, rather than because they've been proven to work in empiric trials.

SPEAKER_01 8:59

Yeah, and to add to what you're saying, medical treatments naturally change across time. That is evident when you just look at medical history. But Dr. Cifu and Dr. Prasad, who were the co-authors on that paper, argued that the manner by which we change the treatment really matters. Like you're pointing out, Dr. Herrell, reversals are where we said there is a treatment that we thought worked, and we told people it worked. We did it as if it worked, but we found out it didn't, and maybe it even harmed. That has much more negative consequences for public health trust or for changing physician practice because of this evidence to action gap. And that's much worse than something that's merely replaced, where oh, that's pretty good, but we just found something that's better. Oh, we used to give Coumadin for DVTs, but now we can give low molecular weight heparin. That's not that bad. That's much better, and it's not going to cause the kind of medical mistrust that we now have to deal with as practitioners.

SPEAKER_00 10:03

Well, and so he highlighted tons of examples that we all talk about now. Vertebroplasties were done for back fractures and were found to be no more effective than sham surgery. Or stenting for stable angina, where we used to do stents in non-emergent situations, and then we had the COURAGE and the ORBITA trials that showed that this was no better than medical management for symptom relief and things like that. And some of the proposed solutions to end reversal were strict RCT requirements, requiring rigorous randomized trials before a practice becomes a standard of care, but also medical education reform, which is something you and I have talked a lot about, and obviously as a medical student you've been interested in and will continue to be as a resident, teaching students to be more skeptical of low quality evidence and mechanistic common sense hypothesizing, which is something that I talk about a lot on this podcast, and then regulatory skepticism. And so I'll end with that last point before we talk about this article, because he also encouraged the FDA to demand evidence of actual outcomes, improved survival, better quality of life, things like that, rather than just approving drugs because of an improvement in a surrogate marker or something like that.

SPEAKER_01 11:13

And something to even pave the way to how interesting Dr. Prasad's position now is in this paper that we're going to discuss is that he went as far as to say that we should cut certain basic science education in medical school because he's so concerned that we will socialize medical students, acculturate them into believing biologically plausible mechanisms such that pharmaceutical companies can get away with having poorly developed and designed randomized controlled trials, and they can just say, guys, you learned this in your second year of medical school. Doesn't it make sense? But when you look at the history of medicine, you see that there's lots of things that make sense but don't work.

The NEJM Push For One Trial

SPEAKER_00 11:54

Yeah. And then ironically, though, a lot of the, let's call it the MAHA movement, is based upon these simple first-order mechanistic ideas that if I give you this thing that affects this outcome, this lab, think functional medicine, if I give you this, then that's going to make this other thing better. It's all predicated upon the things that, well, you and I learned from Prasad are wrong. So that's why some of us were taken aback when he seemed to partner with people like Marty Makary, who, if anything, has been the face of MAHA, and they wrote this paper together. And so again, traditionally the FDA has required two trials as a default. And then there was some regulatory relief at one point that allowed them in certain circumstances to go with one trial. And that's partly what gave us Makena. And so our listeners are very familiar with Makena coming about from one trial in the early 2000s that was used by the company that marketed 17-hydroxyprogesterone. And the FDA said we'll approve it because in that circumstance, this is a life-saving drug, this is very important. We don't want to delay for many years for a second more definitive trial or confirmatory trial. But you guys work on that. And then, of course, the company dragged out a second trial for about a decade. And meanwhile, in the interim, a lot of secondary evidence, retrospective cohort type evidence, population evidence, indicated that it didn't work. And then, of course, the confirmatory trial designed by the company, which we can talk about all the bias implicit in that, it came out and found that it didn't work. And it was still difficult for the FDA panel to remove it. It took two years and two votes to get it off the market. And there are still doctors out there today who are compounding it and using it because they just can't believe it doesn't work, and that's that knowledge to action gap. 
So their article argues that the FDA should move away from this two-trial dogma, which when I first read that, I'm like, wait a minute, this is antithetical to everything that you've discussed, Vinay. And then start to think about potentially approving drugs or therapies if there's one adequate and well-controlled study, and make that the new baseline, along with other confirmatory evidence. And that could include mechanistic science, that could include data from related indications or drugs, animal models, real-world evidence that's available, information from other drugs in the same class and things like that. And there's more there, but I do want to say we're entering into this new age, and he's, as you pointed out, a hematologist-oncologist. The 2024 Nobel Prize in Chemistry was awarded to this Google-funded AI project where they were able to determine and predict protein structures. This was stuff people spent years in grad school to do one protein on, and they just did, I can't remember the number, hundreds of thousands or whatever it was, in one fell swoop, and released these to the public. They're not patented or anything like that. And now we can predict protein structures and create new proteins. Literally, we can create proteins in a week. And the implications of this for medicine are just going to be the story of your early career, where we're going to be predicting protein structures and creating new proteins rapidly and in a very precise way for cancers, for immunotherapy, for things like that. So a whole different, we could do a whole hour on this or get somebody from Google to do it. But AlphaFold 3 and the ESMFold software, all this stuff that's gone into this, and again, a Nobel Prize won very rapidly. Within two years, you win the Nobel Prize. That was just shocking because it is an incredibly important thing I think a lot of people don't know about. 
And so that's going to lead to an explosion in, if not one-off, very small number of people drugs, where we're going to have to trust the concept that we are using generative AI to create a novel protein that has a certain geometric feature that will bind to a virus or bind to a certain sequence of specific DNA binders and things like that, and be able to use structure-based drug discovery and delivery. And so, how do we test those things? How do we approve those things?

SPEAKER_01 16:08

It's a complicated problem that regulators are facing where we have unprecedented tools with uncertain benefits and uncertain harm. The potential upshot is incredible. And so then the concern or the natural tendency is to say, let's change the rules because we're playing a different game now. The question is, are we going to throw the baby out with the bathwater, reduce regulatory standards, and then end up with a whole host of new problems that lead us astray?

Medical Reversal And Trust Problems

SPEAKER_00 16:38

Yeah. The other emerging technology that we're seeing is just CRISPR-based gene therapies. And again, we're already treating a series of very rare diseases with CRISPR technology. That's going to explode in the next 10 years as well, and the number of things that we treat. And those types of therapies are not going to be amenable to the traditional way we've regulated drugs. So I think that gets us back to this article. And there's a few points that they make in there. And one of them I took a little bit of issue with, and I'll just read a bit to you. So, from a statistical perspective, if one tests inert substances, those unlikely to improve or harm health, two trials reduce the chance of a type one error favoring the product, finding a difference when one does not exist, from 250 in 10,000 to 6 in 10,000. So they're minimizing that difference, but I will say that assumed a one-tail direction. And so I didn't like that. When I first read that, I'm like, no, you have to assume a two-tail direction here. So by doing that, they made the numbers a little bit better. So just for the listeners, when you're using a p-value of 0.05, you're really talking about the two tails of the curve, and you're assuming when you do a two-tailed analysis that the therapy could go both ways. It's rare that you would assume it could only go one way. And they used a one-tailed analysis to get those numbers because they sound better than a two-tailed one. So I read that and I was a little put off by, I don't know, the lack of clarity or the laziness of that statement, but it was clearly designed to make their argument sound better. So they argue for a better way of looking at this evidence, and they say that a few other factors that contribute to the overall picture of credibility, aside from just traditional trials, include the magnitude of effect. Yes, love that. Absolutely love that. We talk a lot about magnitude of effect. 
So often we're sold drugs by drug companies with very minimal magnitude of effect, and you have to appreciate that may not be robust data, but a large magnitude of effect, think insulin for treatment of type 1 diabetes. No one's going to argue that's a type 1 error. But other effects are very marginal. Okay, the use of a contemporary control group versus a historical one. We talk about that on the podcast a lot, where you do an intervention and then you compare it to some old historical group, and that's not necessarily a fair comparison. The nature of the control group, are they already on the best available therapy? In other words, not necessarily placebo controlled. We also talk a lot on the podcast about how placebo controlled is not really the standard. That just shows us that it's effective and safe. But is it the therapy we should be using? And we shouldn't be rushing things to market that are no better or perhaps worse than what's already available to our patients. The pre-specification of a hypothesis, the choice of a primary endpoint, and I think he's getting at avoiding surrogate markers and things there, the concordance with biologic correlates, including evidence of alteration of an in vivo target, alignment of intermediate endpoints, so do the points along the way align with the targeted endpoint, statistical power, blinding, concealment, independent review, whether a post-protocol therapy is on par with the U.S. standard of care, the use of concomitant therapy, inclusion criteria, exclusion criteria, randomization, run-in periods, how missing data are handled, and many additional factors. And we could talk all day about a lot of those things, but one of the things that I wanted to point out was the last sentence. 
Increasingly, many of these factors are captured in a Bayesian as opposed to frequentist interpretation of trials, and the FDA has guidance in this space, and all the major journals do too, and you see lots more Bayesian analytical models being used. That's their argument about basically shifting policy by looking for other types of confirmatory evidence. But the devil's in the details. What is confirmatory evidence?
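The type I error figures quoted from the article in this turn can be checked with a few lines of arithmetic. The following is a rough sketch, not the article's own calculation: it assumes a conventional two-sided alpha of 0.05, a truly inert product, and two fully independent trials, and the function name is made up for illustration.

```python
def false_positive_rate(alpha_per_trial: float, n_trials: int) -> float:
    """Chance that every one of n independent trials of an inert product
    crosses the significance threshold (assumes independent trials)."""
    return alpha_per_trial ** n_trials


TWO_SIDED_ALPHA = 0.05                    # assumed conventional threshold
FAVORABLE_TAIL = TWO_SIDED_ALPHA / 2      # only results that favor the product

for label, alpha in (("favorable tail only", FAVORABLE_TAIL),
                     ("either direction", TWO_SIDED_ALPHA)):
    one_trial = false_positive_rate(alpha, 1) * 10_000
    two_trials = false_positive_rate(alpha, 2) * 10_000
    print(f"{label}: one trial {one_trial:.0f} in 10,000, "
          f"two trials {two_trials:.2f} in 10,000")
```

Under these assumptions, counting only the favorable tail gives 250 in 10,000 for one trial and about 6 in 10,000 for two, which matches the quoted figures. Counting a significant result in either direction gives 500 in 10,000 and 25 in 10,000.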

SPEAKER_01 20:30

Yeah, the devil's in the details, that's exactly right. Because, sure, we can make this shift. And like Prasad and Makary are arguing, they want this to serve as a psychological anchor. They want people to view this as the new standard, and they want to reward people that are trying to be innovative, that are trying to be early adopters. And they're hoping that the effect of that is going to be reduced drug prices, better drugs to market, better fit between what special interest groups and patients want from their drugs and the kinds of drugs they're going to receive for the outcomes they care about. It's a very pro-market-oriented change, and I understand how that fits in the sociopolitical context that they find themselves in. Dr. Herrell, I think there is a clear philosophic commitment that differs between you and Dr. Prasad here that I think is exemplified beyond all the noise of the sociopolitical context. And it's this: if you asked Dr. Prasad, from 1990 to 2026, do you think that we have increased our knowledge in medicine? If so, by what amount? I think if you asked him that question, he would say maybe we were around 15 to 20 percent, but now we're like 70 or 80 percent. I would go a little bit less than him. I would say maybe 10 to 15 percent then, but maybe 30 to 40 percent now. But Dr. Herrell, you would say we were below 10 percent. And we're still below 10 percent. And you still accept these methodologies but disagree on that point.

SPEAKER_00 22:07

Yeah. No, yeah, I agree with that. And I actually heard him make that same statement on an NPR podcast, and I was disappointed in it. But in certain domains, I've said this before, I would take the COVID vaccine, the COVID-19 vaccine, the mRNA vaccine, without a clinical trial. I didn't need a clinical trial because I think that our understanding mechanistically of that is so good that I would be willing to do that without a large clinical trial. Whereas I would not take insert blank drug for blood pressure without a clinical trial. So I get that we know a lot more about certain domains, but we have to always be mindful of what we don't know. Okay, well, yeah, go ahead.

SPEAKER_01 22:46

I guess one question I ask myself, and I want to hear your thoughts on it if we have time, is what sorts of evidence or things you see in a domain make you less of a skeptical empiricist, where you say, you know what, maybe the evidentiary standards are too high. I'm seeing good promise here. I'm gonna lower it a little bit.

SPEAKER_00 23:03

I think it has to do with system complexity to me. And so in domains where I feel like we're in a very complex system where the effects have to be measured empirically, then I am gonna continue to require two trials. And that's what I mean by the COVID-19 vaccine. The mRNA vaccine is actually a very complicated system. It's not complex. It's a simple system. I understand biomolecular. I think we have a full model of how that works, and so we need less evidence for it. So the details of that come down to how we understand different types of evidence. And you and I have also talked about different types of evidence from different areas. And in OBGYN, that's very important because we don't have many RCTs. In internal medicine, you're gonna have a ton of them, but we have ethical limitations about doing clinical trials. And so we have lots of population level data and retrospective cohorts and things like that where we try to ascertain data, exposure databases, this epidemiologic data that's very difficult to interpret. And as you get off into that softer data, all this becomes harder. And so the pitfalls of their commentary, so I loved it, and it also made me hesitant, maybe because Makary's name was on it. But there is an increased risk of false positives. There's some subjectivity in what we're calling confirmatory evidence. We have to be rigorous with that. Relying on pharmaceutical companies' post-market data makes me leery. We have a potential that our standards are going to creep. And I'm not sure that's gonna save money, which is one of the really compelling arguments in the article, that these trials cost $30 to $150 million. But if we keep going down the wrong roads, it's gonna cost us more in the long run. But I do like the fact that they endorsed a Bayesian analysis.

AI Proteins And CRISPR Pressure Tests

SPEAKER_00 24:43

And just for the listeners quickly, because I've probably not really talked about this on the podcast ever, that is essentially what my book is about. And it's essentially applying Bayes' theorem, as opposed to frequentist methods, to all of the cognitive work of medicine. So I'll explain just real quickly what that means and how we use it. But basically, Bayes' theorem says that the probability of some event A, given that B is true, is equal to the probability of seeing that event B, if A is true, times the probability of A divided by the probability of B. All right, let me simplify that for you. We'll put this on the Instagram. I like to sometimes change A and B to H and E for hypothesis and evidence. So the probability that your hypothesis is true given some piece of evidence. Now what do we mean there? Well, it could be that some drug makes a difference in a clinical outcome. It could be that your child, your fetus, has Down syndrome given that you have a positive cell-free DNA test. It could be anything, any hypothesis given some new piece of information. The probability that I have HIV given that I have a positive HIV test. Any hypothesis given any piece of evidence, H and E, is equal to the probability of seeing that type of evidence if the hypothesis is true. That's actually a P value in a clinical study. So if people want to dig into this deeper, they'll realize that, times the pretest probability of the hypothesis divided by the probability of seeing this evidence under all circumstances, including false positives. So let me rephrase this. The probability that I have HIV, given that I have in my hand a positive HIV test, is equal to the probability of seeing a positive HIV test if I had HIV times the probability that I had HIV before we did the test divided by the probability of seeing a positive HIV test for any reason, including a false positive. So this is how we actually interpret tests, and it's how we interpret literature. 
And in the book, for those that have read it, there are calculators to help you do this for clinical studies as well. Because when you get a p-value, all you're really understanding is part of that. But the sort of naive interpretation of the p-value is that if something has a positive p-value, then it's meaningful. It's true even. And they call that the naive interpretation because that's not true. So I do welcome the revolution in using Bayes' theorem to basically decide whether or not therapies are beneficial. And that in itself already turns on its head how a lot of medical research is done.
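The spoken formula, P(H | E) = P(E | H) × P(H) / P(E), can be written out for a positive diagnostic test. This is a minimal sketch, not one of the calculators from the book: the sensitivity, specificity, and pretest probabilities below are hypothetical round numbers chosen for illustration and do not describe any real HIV assay.

```python
def posterior_probability(prior: float, sensitivity: float, specificity: float) -> float:
    """P(H | positive test) from Bayes' theorem.

    prior        P(H): pretest probability of the hypothesis
    sensitivity  P(E | H): positive test when the hypothesis is true
    specificity  P(not E | not H), so 1 - specificity is the false positive rate
    """
    p_e_given_h = sensitivity
    p_e_given_not_h = 1.0 - specificity
    # P(E): a positive test for any reason, true or false positive
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / p_e


# Hypothetical test characteristics, used only to show the effect of the prior
for pretest in (0.001, 0.01, 0.10):
    post = posterior_probability(pretest, sensitivity=0.99, specificity=0.99)
    print(f"pretest {pretest:.1%} -> post-test {post:.1%}")
```

With these made-up numbers, the same positive result leaves a person with a 1% pretest probability at only about 50% after the test, which is the point about never reading a result apart from the prior.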

SPEAKER_01 27:25

And for listeners that don't want to muddy their hands in the philosophical disputes between Bayesians and frequentists, which are as long-lasting as probably the Protestant and Catholic religious wars in the Western world, I think the key takeaway for those listeners is to say, guys, you have evidence. And evidence is all that which can change how much confidence you place in certain beliefs you have about the world. Beliefs meaning things that you think are true about the world. Before looking at the evidence, you have a certain level of belief in these views. And then you look at the evidence, the evidence interacts with those beliefs. It moves them up, it moves them down, or it changes the views that you have. What Bayesian analysis allows you to do is apply numbers to that process you're engaging in. So, like Dr. Herrell is saying, if you encounter a study, whether it be a diagnostic test in the lab or whether it be a study as you're trying to critically appraise a topic, it always needs to be interpreted in the context of how you think the world works. You cannot just interpret it in a vacuum. And it is only once you interpret it in that way that you can hope you are following a more rational way of approaching evidence and coming to hold beliefs about the world.

SPEAKER_00 28:42

Yeah, so in that regard, I appreciate it because what they're really saying, and what they summarize that paragraph with, is that we need to use Bayesian approaches. And that allows us to also interpret all sorts of evidence. We can weigh not just data from randomized controlled trials, which has been a frequent criticism of evidence-based medicine, but data from follow-up studies and cohort studies and biologic studies and studies of how other similar drugs work and lessons learned from homologous models and things like that, because we're assigning a confidence to that. And that process that Joshua just described is called Bayesian updating, where we move along. I think typically we have done this with meta-analyses in medicine. And students and residents have learned that meta-analyses are the highest level of evidence. But let's look at Makena, for example, which again, we've talked a lot about. So a trial came about in the early 2000s that showed that for women with previous preterm births, 17-hydroxyprogesterone administered weekly for 20 weeks or so would reduce the rate of recurrent preterm birth. When that trial came out, it disagreed with a few smaller prior studies, and novel, interesting findings are what get published. When that trial came out, many of us pointed out that the control arm had too high a rate of recurrent preterm birth in it, higher than we would expect to see. So even though it had a significant p-value, from a Bayesian perspective we would have taken that data and said, well, the pretest probability that 17-hydroxyprogesterone was effective was pretty low to begin with. We've studied it before and it didn't work. I have questions about the study, things like that. I can use all of those other sources of information, and I can say that my post-test probability, or confidence that it actually is an effective drug, is still fairly low.
Now, when it got FDA approved based upon that study, the FDA didn't use a Bayesian approach. They used the naive p-value. So that drug might not have been approved in the FDA that Prasad envisions. But it was approved, and then it took another 10 years or so to get the second trial. Well, I want to talk about meta-analysis because, again, we're taught, and I'm sure you were taught in medical school, that a meta-analysis is the highest level of evidence.
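
The reasoning above, that a significant p-value on top of a low pretest probability still leaves modest confidence, can be sketched numerically. This is a minimal sketch of a common textbook formulation, not the FDA's method or the book's calculator. The prior, power, and alpha values are assumptions picked for illustration, and the second update assumes the two trials are independent.

```python
def post_study_probability(prior, power, alpha):
    """Probability an effect is real, given one statistically significant trial.

    prior: pretest probability that the treatment works
    power: chance a trial detects a real effect
    alpha: chance a trial is "significant" when there is no effect
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Hypothetical skeptical prior: earlier small studies were mostly negative.
prior = 0.10
after_first_trial = post_study_probability(prior, power=0.80, alpha=0.05)

# Bayesian updating: the first posterior becomes the prior for a second,
# independent confirmatory trial that is also significant.
after_second_trial = post_study_probability(after_first_trial, power=0.80, alpha=0.05)

print(f"After one significant trial:  {after_first_trial:.0%}")
print(f"After two significant trials: {after_second_trial:.0%}")
```

Under these assumed inputs, one significant trial moves a 10% prior to about 64%, and only a second significant trial brings it above 95%. Concerns about the trial itself, such as an implausible control arm, would argue for even less movement than this simple calculation gives.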

SPEAKER_01 31:08

Yeah, it's taught in a confusing way, unfortunately, in medical school, where they give you an evidence hierarchy and they don't even tell you what the y-axis of the evidence hierarchy is. Better evidence for what? And what it usually is stating is that it's better evidence for making general population-level claims, typically causal claims, right? And that's separate from the kinds of particular causal claims that you will endorse when you're saying, hey, we need to give this medication to this patient in labor. I can draw on general evidence, but I'm contextualizing for this case. And so it leads to these misconceptions that are completely misguided. But really, meta-analysis is classically taught as this incredibly high-quality, synthesized source of general knowledge that can give you population-level evidence about the average patient, the average context.

SPEAKER_00 31:59

And if you look at that famous sort of pyramid with no y-axis label of levels of evidence, you'll see at the first level that high-quality randomized controlled trials and meta-analysis are listed as first-level evidence. We really could divide that into one A and one B. Because one of the things that we have to determine about meta-analysis is their quality. And I want to at least introduce some of those concepts because that part I don't cover that well in the book. But we are seeing an explosion of meta-analysis being published, partly because

Bayes Thinking Beyond P Values

SPEAKER_00 32:33

it's easy. So Joshua and I could put together a meta-analysis in a couple of weekends and get it published. So it's much easier than doing real research, and millions are being published on all kinds of subjects, often by people who really don't understand these things. So you're seeing them. A lot of us now, when we do journal clubs, are starting with meta-analyses. And so there are words and terms and things in there that I at least want to let people know a little bit about today, so they know what they don't know and can read these and understand them. Now, essentially, when you have two clinical trials, you are dealing with that false positive rate. If there was a 5% rate of a false discovery, and now you do the second one, then you have 5% of 5%, which they address in that editorial. And the promise of a meta-analysis is that you put together a series of high-quality studies. And by the time you've got several trials all basically having the same point estimate for an effect, what we call a homogeneous meta-analysis with similar methods and similar findings and similar outcomes, you can say with a very high degree of confidence that something works or doesn't work. You can also do that with Bayesian updating. The problem with meta-analyses is that they're often assimilating data that is very heterogeneous, and we'll talk about what that means. What I would say, and what a lot of folks agree on today, is that one high-quality randomized controlled trial is better evidence than a meta-analysis of heterogeneous studies. And when a meta-analysis of homogeneous studies exists, that's incredibly high-quality data. And there's a ton of those in internal medicine. There are almost none in obstetrics, because again, we lack the high-quality trials to begin with. And so that's one takeaway point I want folks to know.
And we don't know if we'll get to all of them today, but we've got a meta-analysis about late antenatal steroids, one about intrapartum amnioinfusion for meconium aspiration syndrome, of course a meta-analysis of magnesium for prevention of cerebral palsy, and a meta-analysis of fertility outcomes for oil-based versus water-based hysterosalpingography contrast media, because all of these are areas that we've whiplashed back and forth on. And all of these meta-analyses, if you want to get into them, include very low-quality studies and one or two higher-quality studies, and then they often provide a point estimate that doesn't make sense in reference to that high-quality study. And so what I would hope to convince folks of is that one high-quality trial is better than meta-analyzing several small, low-quality trials.

SPEAKER_01 35:16

Another reason why this is so important, as you pointed out, Dr. Harrell, is that most people's interfacing with the clinical literature now is through AI search tools like OpenEvidence and Doximity. And many times in their algorithms, they will privilege meta-analyses and systematic reviews. But it has been so humbling to look at these trials and to see the kinds of decisions they make and how that actually can lead you astray. You cannot just take their word for the conclusions and say the average meta-analysis has to be better than the average single-center randomized controlled trial. The other point I want to make is that what we're really trying to solve with meta-analyses and systematic reviews is how to synthesize disparate sources of data into a good general claim about a question that we care about. And this is a similar sort of process to what you do when you're applying a diagnosis to a patient. I, as a fourth-year medical student, will take lots of different pieces of a patient's story, physical exam, and labs, and come up with a diagnostic label. Dr. Harrell, with his clinical experience, is gonna take those and read them appropriately, recognize when there's concordance and discordance, and actually apply the label differently, or apply it sooner in the diagnostic process. That same process of good synthesis versus bad synthesis also applies to meta-analysis.

SPEAKER_00 36:41

Well, I'm just gonna run through a list

Why Meta-Analyses Often Mislead

SPEAKER_00 36:43

of problems. I just want people to know what they don't know and think about some of these things when they're reading meta-analyses, and about some of the classic problems. I will start with the classic meta-analysis that we talk about in journal club, which is from the internal medicine world from 1991: efficacy of intravenous magnesium in acute MI in reducing arrhythmias and mortality, by Horner. This came out and it reviewed, I believe, nine small or eight small studies that had been done on this topic at the time. And if you look at the odds ratio line graph of that, only one study found a statistically significant effect. All the other eight, they were not clinically significant. And they concluded from that paper that the patients who received IV mag had a 54% reduction in cardiac mortality. So eight negative studies, one positive study, and the point estimate they concluded from a meta-analysis was that you're cutting cardiac mortality in half. And this became a standard of care that within a couple of years was completely blown up by one high-quality RCT. So this is a famous example of a bad meta-analysis. And what I want to say is you guys are reading meta-analyses that are just as bad all the time. And you've got to be sensitive to that. And also, I would say that if you did a subsequent meta-analysis of these nine studies with that large landmark trial that showed why we stopped doing it, one of Prasad's examples of medical reversal, frankly, it would be unfair to continue to try to show a point estimate of benefit, because these small trials are the problem. And so that gets into some of these problems. So, one, small study effects. This is one of the best-known problems, and it's exhibited by that one.
Small studies often report larger treatment effects than larger studies due to poor methodology, publication bias, selective reporting, random variation, exaggerated effect size from lower statistical power, or even just p-hacking. So we see this all the time with small studies that fail replication. And one thing I would say is that you don't take three small studies, then take the confirmatory replicating trial that disabused them, and average them together with meta-analysis. Those studies are just wrong. And it's unfair to use that data in most cases. Now, the next one: I mentioned publication bias. So one of the things that happens with small, sort of preliminary studies is Joshua and I get together on the weekend, we've got a big database of patient data, and we're just hypothesis generating and looking for trends. And we come up with a hypothesis that magnesium prevents cardiac mortality, and we look at our patients and there's no difference, so we don't publish it. So only the positive studies get published a lot of times, especially with smaller trials. And that's what publication bias is, or the file drawer problem, where negative studies are just far less likely to be published. So if you have one large, rigorous trial that finds no effect, and 10 tiny studies that show positive findings, they get published, but 10 tiny negative studies that would have balanced those out in a meta-analysis are not published. And so the meta-analysis will then falsely conclude that the treatment works. We've done this with antidepressants. We've done this with Tamiflu, which frankly is a great example of meta-analysis supporting a bigger effect size than is likely valid. Okay, next is garbage in, garbage out. So a meta-analysis can't magically fix poor primary studies. Pooling multiple biased studies together just produces a very precise estimate of the wrong answer.
And statistically, we can measure the precision and the estimate, and you see all these words and whatever, but if it's a bad study, it's a bad study. We saw this with vitamin E and mortality, where early observational and small interventional studies suggested benefit from vitamin E, but then larger trials showed, if anything, harm. But meta-analysis early on mixed these heterogeneous low-quality studies. And we thought this was a thing for years. And that's what you see a lot in the Maha world or in the functional medicine space, where you have garbage studies, 50 people, no randomization, poor controls, p-hacking, like that sort of thing. And you just can't meta-analyze that stuff together. So let's talk for a second in more detail about the problem of clinical and statistical heterogeneity.
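
The file drawer problem described above can be made concrete with a small pooling sketch. It uses standard inverse-variance fixed-effect pooling on log odds ratios. The eight small studies are entirely hypothetical, constructed so that the true average effect is zero and only the four that happened to look beneficial were published.

```python
import math


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of (estimate, standard_error) pairs."""
    weights = [1.0 / se**2 for _, se in studies]
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical small studies: (log odds ratio, standard error).
published = [(-0.9, 0.5), (-1.0, 0.5), (-1.1, 0.5), (-0.8, 0.5)]
file_drawer = [(0.9, 0.5), (1.0, 0.5), (1.1, 0.5), (0.8, 0.5)]  # never published

pooled_pub, se_pub = pool_fixed_effect(published)
pooled_all, se_all = pool_fixed_effect(published + file_drawer)

for label, est, se in [("Published only", pooled_pub, se_pub),
                       ("All studies", pooled_all, se_all)]:
    low, high = est - 1.96 * se, est + 1.96 * se
    print(f"{label}: OR {math.exp(est):.2f} "
          f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

Pooling only the published studies gives a tight interval around a large apparent benefit, while pooling everything gives an odds ratio of about 1. The statistics are computed correctly in both cases; the difference is entirely in which studies made it into the analysis.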

Bias And Heterogeneity Red Flags

SPEAKER_00 41:08

So heterogeneity occurs when studies that are being pooled together are just too different to be summarized by a single number. This can manifest in two ways: clinical heterogeneity, where you combine trials with vastly different demographics or surgical techniques or drug dosages or things like that, and you get an average effect that's actually not true for either group, or statistical heterogeneity, where there's variation in results across studies that exceeds what would be expected by chance alone. So if heterogeneity is high, and you'll see this as I-squared in your publication when you read the methods, then presenting a single pooled effect size can be very misleading. In such cases, it may be more scientifically honest to present a systematic review without a meta-analysis, where you can say, hey, this study is great and these other studies aren't, due to these reasons. And we'll come back and talk about some of those, especially with amnioinfusion. So one of the things I'm going to say about the amnioinfusion meta-analysis is that a lot of the data that supports amnioinfusion came from countries where electronic fetal monitoring is not done routinely. And so we don't know that we're looking at the right thing. We also have the possibility of different outcomes being measured. In one case we may look at neonatal outcomes, and maybe one study looks at meconium aspiration syndrome, another one looks at morbidity, another one looks at need for mechanical ventilation, things like that. They're different outcomes, and people will pool them together, and that's unfair too. Okay, let's get through my laundry list. But heterogeneity is very important. One of the chief problems is that people combine heterogeneous studies, and a meta-analysis of heterogeneous studies is trumped by one good clinical trial. I think that's my point for that.
There's this idea of ecological masking, where combining studies can reverse apparent effects if the subgroup structures differ. So, for example, observational studies suggested that hormone replacement therapy could reduce cardiovascular risk, but then when the Women's Health Initiative study was done, it showed increased risk in some groups. And so the discrepancy is at least partly due to confounders in the different populations. Now we have a more nuanced understanding about age and risk factors that wasn't present before. Another idea is mixing randomized and observational studies. We made this mistake with hydroxychloroquine early in the COVID-19 pandemic, where we had these tiny observational studies or uncontrolled studies or preprints and low-quality trials. You can't combine that with a larger trial like the RECOVERY trial that showed that hydroxychloroquine was of no benefit. You see outcome switching and selective reporting. So a trial might measure 20 endpoints, but only one is statistically significant. And then that one's published prominently as if it's an effect, and it may not even be that meaningful. Or you may see a meta-analysis where, again, different studies analyzed different outcomes. That would be an example again of heterogeneity of outcomes. There's a lot of p-hacking and researcher degrees of freedom, where conclusions are drawn from small subset analyses or other post hoc analyses and things like that. This is very common in nutrition. And then there's dominance by low-quality studies under a random effects model as opposed to a fixed effect model. And that's another whole hour to talk about. But a huge high-quality study can be diluted by dozens of tiny studies with high estimated heterogeneity and random effects weighting. So look for that, and we can talk more about that. Maybe we'll do a second episode about some of these specific things. Low numbers of studies are a big mistake.
You should almost never have something less than three or four. It makes the funnel plots unreliable, huge risk of publication bias and heterogeneity. And just like publication bias, there's also this citation bias or prestige bias, where reviewers tend to read articles that are in more prestigious journals that publish dramatic findings and avoid ones they can't find in smaller journals, things like that. The time lag bias, positive studies get published faster than negative ones in more prestigious journals. So it's another type again of file drawer bias. Duplicate populations. We have lots of big studies that have been analyzed by different groups of data sets. And then you'll see sometimes those are unintentionally or intentionally included in the same meta-analysis. So then the sample size becomes artificially larger, and it may imply benefit where there isn't any or at least more benefit. Inappropriate endpoint pooling using surrogate markers and laboratory markers and subjective outcomes instead of the hard clinical outcomes. And all that leads to meta-analyses being used to manufacture certainty, which is great if you want to sell a drug or get published in a big journal or things like that, especially when you tell your med students that a meta-analysis is the highest level of evidence.
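
Two of the ideas in this list, the I-squared statistic and the way random-effects weighting can dilute a large trial, can be sketched together. The code below uses the standard Cochran's Q, I-squared, and DerSimonian-Laird formulas. The five studies are hypothetical assumptions: one large null trial and four small trials that all report a big benefit, with effects on the log odds ratio scale.

```python
def pool(estimates, std_errors):
    """Fixed-effect vs DerSimonian-Laird random-effects pooling (sketch)."""
    w_fixed = [1.0 / se**2 for se in std_errors]
    sum_w = sum(w_fixed)
    fixed = sum(w * y for w, y in zip(w_fixed, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(w_fixed, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird estimate of between-study variance (tau squared).
    c = sum_w - sum(w**2 for w in w_fixed) / sum_w
    tau2 = max(0.0, (q - df) / c)

    w_random = [1.0 / (se**2 + tau2) for se in std_errors]
    random_effect = sum(w * y for w, y in zip(w_random, estimates)) / sum(w_random)
    return {
        "fixed": fixed,
        "random": random_effect,
        "i_squared": i_squared,
        "big_trial_weight_fixed": w_fixed[0] / sum_w,
        "big_trial_weight_random": w_random[0] / sum(w_random),
    }


# Hypothetical data: first entry is the large null trial, the rest are small.
estimates = [0.0, -1.0, -1.0, -1.0, -1.0]
std_errors = [0.1, 0.5, 0.5, 0.5, 0.5]
result = pool(estimates, std_errors)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these made-up inputs, I-squared is about 71%, the large trial carries roughly 86% of the weight under the fixed-effect model but only about 30% under random effects, and the pooled estimate shifts from near zero toward the small studies' result. That is the dilution pattern described above.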

SPEAKER_01 46:04

There are so many different ways, when you have a very sophisticated tool for producing general knowledge. There are many sophisticated tricks you can hide up your sleeve to get the result that you want. And I love the way you framed it as manufactured certainty.

SPEAKER_00 46:20

Okay, well, let's talk about at least one of these in a little bit more detail.

Amnioinfusion And A Meta-Analysis Comeback

SPEAKER_00 46:24

So in our space, for years, we did amnioinfusion for patients with meconium-stained amniotic fluid. And this was based upon small studies that suffer from many of the problems that we're discussing. And then there was a landmark trial that was done, an international trial that showed no benefit. And after that, we stopped doing it. And one of the things that could have been done after that trial was published was that somebody could have done a meta-analysis and combined that trial with other smaller trials that had been published and argued that maybe there was a benefit, using the small, poor-quality trials to drag the big one down. Now that trial was in 2005, and 2005-2006 was when we stopped doing amnioinfusion. But then recently, in 2023, a group in Florida published a meta-analysis in one of our premier journals, and people are bringing it back. Not based on new data, but based upon a meta-analysis of trials that go way back. There are some newer trials, but this meta-analysis would be wonderful for your journal club. It's got all the forest plots and it's got funnel plots and it's got all the things. But to give an example of what I mean, if you just look at these trials, the premier trial was the Fraser trial in 2005. This was an international trial that involved nearly a thousand patients in each arm. They had a team of three blinded neonatologists that determined which infants met the criteria for meconium aspiration syndrome. They had a standard protocol. They all had continuous electronic fetal monitoring, and they did this for thick meconium. And they found not only no benefit, but they found potential harm. And so we stopped doing it. But of the trials that already existed before 2005 and showed some potential benefit, some were good papers in their own right, but they had 40 or 50 patients in each arm. Many of them were done outside of the United States.
And in the last few years, many of these studies have been published in India in particular, and some of those studies have shown a very large effect size. And so what this meta-analysis has done is taken some of these studies that were published in India, in particular the Chowdhari study in 2010, and a Muhammad study, actually from Zimbabwe, and a few others, and they've tried to meta-analyze them with this wonderful international trial that changed our standard of care and argue essentially that it was wrong. So that Zimbabwe trial, they didn't have continuous fetal monitoring. The Chowdhari trial, they didn't have continuous fetal monitoring. And the Fonseca trial in India didn't have continuous fetal monitoring. And if you look at the data, those are the trials that, just from a numbers perspective, tend to shift this the most. They're the ones that get some statistical weight in order to argue that this is different. Now, I want to read the conclusion from this meta-analysis because I find the arrogance of it striking. So in their conclusion, they review their numbers and then they say their data suggest that if prophylactic amnioinfusion had been implemented during the intrapartum course in the United States, oh, since we stopped doing it in 2005, approximately 200,000 cases of meconium aspiration syndrome of the estimated 300,000 total national cases since 2007 could have been prevented. Given that the mortality rate of MAS is as high as 12%, widespread implementation of amnioinfusion since 2007 could have prevented deaths of approximately 24,000 newborns in the United States. So they're arguing, based upon this meta-analysis, that 1,000 kids a year, essentially, we think there's about 1,500. They're arguing a two-thirds reduction in the deaths from meconium aspiration syndrome in the United States, not in India, not in Zimbabwe, in the United States.
And I would encourage people to go through that trial, maybe as a journal club activity, and really think about this. But this is the greatest example I know of in our literature currently where one large, well-designed randomized trial trumps all of these smaller trials and all the heterogeneity. We could spend three hours talking about the heterogeneity of these studies. Now, for the listener, the reason why amnioinfusion probably had a mortality benefit in Zimbabwe is probably due to cord compression and other effects that may not be directly related to amnioinfusion. But the trials that were well designed, that were blinded, that were at the lowest risk of bias, and that are most applicable to patients in the United States, and there's another trial, the fung trial too, that did the same thing, showed no benefit. And so the hubris of the meta-analyzer to claim that a thousand babies a year are dying in the US, it's just mind-boggling to me. And now what I've seen is people are bringing this back into clinical practice because they think that we've learned something we didn't already know just by clever statistical manipulations.
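
The arithmetic behind the conclusion as read out above can be checked in a few lines. The case counts and the 12% mortality figure are the ones quoted in the conversation. The 16-year window (2007 to the 2023 publication) is an assumption made here to get a per-year figure.

```python
# Figures as quoted in the conversation.
claimed_preventable_cases = 200_000
estimated_total_cases = 300_000
mas_mortality_rate = 0.12

# Assumption: the claim covers 2007 through the 2023 publication year.
years_covered = 2023 - 2007

relative_reduction = claimed_preventable_cases / estimated_total_cases
claimed_deaths_prevented = claimed_preventable_cases * mas_mortality_rate
claimed_deaths_per_year = claimed_deaths_prevented / years_covered

print(f"Implied relative reduction in MAS: {relative_reduction:.0%}")
print(f"Claimed deaths prevented: {claimed_deaths_prevented:,.0f}")
print(f"Per year over {years_covered} years: {claimed_deaths_per_year:,.0f}")
```

Under that assumed window, the claim works out to a two-thirds reduction in cases and about 1,500 deaths prevented per year, which matches the rough figure mentioned above.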

SPEAKER_01 51:19

One thing that I think is very useful when you're interpreting trials like this is a set of questions to ask yourself. Any trial, if it's an interventional trial, is trying to answer, does something work somewhere? But your job as a doctor is to ask yourself, will that thing, if it works somewhere, work here? And when you look at this meta-analysis that Dr. Harrell just talked about, you will see, like you said, Dr. Harrell, a hubris in saying that, hey, these things work in these contexts, and there's this one trial that shows that it didn't, but look at all these other trials that show it does. They're just making that inference straight to, it will work here, and it will work very well in our context in the United States. And once you frame it that way, you recognize how much burden of proof they're going to have to meet to make that strong of a claim. And I think that is sometimes lost because we look at the shiny label of, oh, this is a meta-analysis.

SPEAKER_00 52:23

This must be good. Right. Yes. All of the other claims get pumped into that. And so even if the authors wanted to argue that it works in India or in Zimbabwe or in the context of no fetal monitoring, that would be a different meta-analysis. So this is a meta-analysis of incredibly heterogeneous trials. Okay, so the other thing I think about with a meta-analysis like this is the idea that meta-analyses are using methods, but essentially they're averaging together. So the foundation of that thought is that you and I do a similar trial at two different institutions on an intervention. We do a couple of randomized controlled trials and we see some trends towards benefit, but neither of us found benefit. But then we can look at that and think about post hoc power analysis and realize that we underenrolled to find benefit. And if we have relatively similar inclusion and exclusion criteria, and we have the same dose of the medication and basically the same population, but we need to get to a larger N in order to see some benefit, we could pool our data and use the methods of meta-analysis to smooth that out and make the comparisons fair and find something that neither of us could find in our own trials. That's the promise of meta-analysis. But here you're taking a study of 35 people in India and 50 people here and 20 people here and 30 people here, all with widely different inclusion and exclusion criteria, methodologies, doses, all these sorts of things. So you can't just add those up to get a bigger N. And all of those small studies are subject to all of these sorts of biases we talk about. Now, when we talk about bias, the main biases that we mean really are things like publication bias and problems that preliminary small data have.
We're not necessarily talking about implicit bias, but you can count that in there, because if you're doing a small study at your institution and it has academic value to you to get published, then you might sit around and try to tease out of the data something that's statistically significant. And that's a form of bias too. We call that p-hacking, but it's a form of bias as well. So my point is, though, this is a philosophical thing. Do you always take all of the studies and assume that their findings count equally, or do you recognize that there is a false positive rate of discovery, which is even higher when you allow for p-hacking and things like that, and just disallow them? And the author of a meta-analysis could do that. There is a methodology called Jadad scoring that you'll see when you read meta-analyses. It's also called the Oxford Quality Scoring System, and you get points for different areas. You get points related to randomization and blinding and accountability and withdrawals. And so typically scores above five are considered rigorous and below five are not. There's a lot more to that, but you'll see that. You'll see studies measured by the Jadad, and you'll sometimes see people say, we only focused on fives and up, or maybe we included other ones. You'll also see people try to account for heterogeneity in the same way, and those are more honest meta-analyzers. What they're trying to say is, we're not going to include all of these low-quality studies. The authors of this amnioinfusion study admitted that they had incredibly heterogeneous data. But I also want to point out that the Jadad criteria don't take much. So you get a point if it was randomized, and you get a point if they describe the method of randomization. All right, that's two points.
You get a point if it was double blinded, and you get a point if the blinding method was considered appropriate, like the placebo dose and the medicine dose were the same size and in the same syringe. That's two points. And you get a point if the study talks about who withdrew from the study or was lost to follow-up. So my point is it's not hard to get a five. And the Jadad in itself does not describe whether a study is high quality.
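
The five points described above are easy to tally in code, which also shows what the score does not look at. This is a simplified sketch based only on the description in the conversation. The published scale also includes deductions for inappropriate randomization or blinding methods, which are omitted here, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrialReport:
    """Hypothetical summary of what a trial report states."""
    patients_per_arm: int  # recorded for context; the score never uses it
    randomized: bool
    randomization_method_described: bool
    double_blinded: bool
    blinding_method_appropriate: bool
    withdrawals_described: bool


def jadad_points(report: TrialReport) -> int:
    """Simplified 0-5 tally following the five items described above."""
    return sum([
        report.randomized,
        report.randomization_method_described,
        report.double_blinded,
        report.blinding_method_appropriate,
        report.withdrawals_described,
    ])


# A tiny, well-reported trial and a large trial that could not be blinded.
tiny_trial = TrialReport(20, True, True, True, True, True)
large_open_label_trial = TrialReport(1000, True, True, False, False, True)

print("Tiny trial:", jadad_points(tiny_trial))
print("Large open-label trial:", jadad_points(large_open_label_trial))
```

A 20-patient-per-arm trial can reach the maximum score while a 1,000-patient-per-arm trial of an intervention that cannot be blinded scores lower, because sample size, applicability, and outcome choice never enter the tally.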

SPEAKER_01 56:27

And it just shows that the appropriate use of a meta-analysis is going to be, hopefully, when you are already aware of what direction the data is going and you just want to quantify the magnitude, when there is general homogeneity among the trials that you're trying to synthesize, and when there are clear, quantifiable outcomes. But for all sorts of reasons, in the fields of pediatrics, psychiatry, certain portions of internal medicine, and OBGYN, we have heterogeneous data. The directionality is the very thing that we are disputing, and there's a whole range of study quality. And so the devil is truly in the details when we make meta-analyses. And you can't just trust what an AI tool tells you about the findings of a meta-analysis, because it can just be homogenizing these truly heterogeneous findings.

SPEAKER_00 57:18

So one thing I always do is use a meta-analysis as a starting point to find relevant studies. But I'm taken aback when I read a meta-analysis, for example, of magnesium for prevention of cerebral palsy, which we've talked about at length on this podcast. You look at it, and every trial in it is a negative trial. Every trial is a negative trial. But some data from a subset of one trial is then used to change all of this other data and find a point estimate, a very significant point estimate of reduction in cerebral palsy. And it reminds me of that Horner meta-analysis in 1991 about magnesium for MI, with a 50-something percent reduction in heart attacks when there was only one positive and eight negative trials. They had different inclusion criteria for patients and different gestational ages. They had different endpoints. And if we step back and realize that actually what we should do is a systematic review of this heterogeneous data, then we can include other things, like long-term follow-up of these children. The fact that the neurodevelopmental scores for the children in both groups, even in the trial used to influence the point estimate, were no different. If we did a systematic review, we could say, well, we saw this, but actually the long-term follow-up of these kids is the same. The short-term assessment by the behavioral folks showed no difference in the children, things like that. And we can use population-level data. There's data about magnesium for CP on a population level, I believe from Ohio, that's shown that implementation of a program didn't change the rates of cerebral palsy. We can use all these other things in a systematic review. But when you act like randomized controlled trials are the only thing that exists, and then you torture them to death to do a systematic review or meta-analysis like this, you very often get very questionable details.
I would say that magnesium for cerebral palsy, if it does what it's claimed to do with the magnitude of effect that we're teaching residents, then someone should very easily be able to do a trial to show that.

SPEAKER_01 59:34

That's, I think, a really valuable point that I learned from you, Dr. Harrell, which is that if the finding is really there, it's robust. It will show up with multiple methodologies. Don't be afraid to test it.

SPEAKER_00 59:45

Yeah. Well, we had a lot more to say. And I promised oil-based versus water-based contrast. This is just another great example of picking the outcome that matters. Are you interested in live birth? Are you interested in pregnancies? What are you actually interested in? And there is a seminal trial that was published in the New England Journal of Medicine in 2017. Basically, it showed that women who had oil-based contrast medium had a higher live birth rate than those in the water-based group. But the reason why I thought about that is that prior to that trial, there was a bunch of, again, small data. And the point estimates were usually not statistically significant, but they trended in some way. And that's where people see an opportunity for meta-analysis. And you could conduct a meta-analysis that showed that water-based was better. And then we get the definitive big trial, well designed, and it shows that oil-based has at least a higher live birth rate associated with it. And it's another example of where, if you had meta-analyzed all the trials prior to that, you would have had the wrong conclusion. And there's tons of these, which gets back to Vinay Prasad, who basically found that the false discovery rate was 40%, at least in how things are implemented in clinical trials. Not 5%, 40%. It's not just a question of the alpha of 5%. It's much more than that because of industry influences, which tend to double the observed effect rates, because of p-hacking, because of reliance on surrogate markers, et cetera. So I guess my summary is I liked the article in the New England Journal of Medicine by Prasad and Makary, but the devil is so much in the details. And it's incumbent upon all of us to get much more rigorous in how we analyze evidence and what we consider to be evidence. And to your point about OpenEvidence and other AI tools, those tools are already taught that the meta-analysis is the gold standard.
I've done this experiment with OpenAI. And I asked, for example, does magnesium prevent cerebral palsy? And it gave me the results of the meta-analysis that I have in my hand right now. And then I said, Is there any evidence that it doesn't? And it gave back a wonderful answer. It went through and said, Well, actually, all of the individual studies said it didn't help. And it was great, but I had to ask the right question.
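
The remark above that the false discovery rate is "not just a question of the alpha of 5%" can be illustrated with a short calculation. This is only a minimal sketch, not the method behind the 40% figure cited in the conversation: it uses the standard textbook relationship between alpha, statistical power, and the prior probability that a tested hypothesis is true. The function name and the example inputs are hypothetical, and the sketch deliberately ignores the other factors mentioned (industry influence, p-hacking, surrogate markers), which would push the rate higher.

```python
def false_discovery_rate(prior: float, power: float, alpha: float = 0.05) -> float:
    """Share of statistically significant results that are false positives.

    prior: assumed fraction of tested hypotheses that are actually true
    power: probability a true effect reaches significance
    alpha: probability a null effect reaches significance
    """
    false_positives = alpha * (1.0 - prior)  # null hypotheses that "succeed"
    true_positives = power * prior           # real effects that are detected
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Hypothetical scenarios, chosen only to show how the rate moves.
    for prior, power in [(0.5, 0.8), (0.1, 0.8), (0.1, 0.5)]:
        fdr = false_discovery_rate(prior, power)
        print(f"prior={prior:.2f} power={power:.2f} -> FDR={fdr:.1%}")
```

Even with alpha held at 5%, the share of "positive" findings that are false rises sharply as the prior probability of a true effect or the power of the trials falls, which is the Bayesian point made earlier in the episode.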

SPEAKER_01 1:02:08

There was this conversation once with someone who was teaching us critical appraisal in my fourth year of medical school. Someone asked, Will OpenEvidence and other AI tools make critical appraisal irrelevant? He said, with so much joy in his voice, No, it will become even more important, because now it is even easier to get fooled.

SPEAKER_00 1:02:26

Yeah. Okay, well, I think we're at our

Final Warnings And How To Learn

SPEAKER_00 1:02:29

hour. There's so much more we could talk about. If folks are interested in this, I've written a book about a lot of these things, but actually not on meta-analysis. So that'll be a chapter in the second edition. But be very leery and skeptical about meta-analyses and how you use them. Go back and look at the original articles and ask yourself if what they're saying makes sense. You can use the techniques of meta-analysis to prove almost anything from almost any set of articles by manipulating some of these tools and by just choosing which articles you even include and don't include in your meta-analysis. We saw that from the Trump administration over Tylenol, where they explicitly chose to not include the sibling data studies on Tylenol and autism, which are our highest level evidence for that subject matter in terms of controlling for unknown variables, because had they included them, they would have found no effect of Tylenol on autism, and not even a correlation, let alone a causal effect. So they excluded them, and someone from a big name school published a paper, and the impetus for that press conference we all remember from about a year ago was a bad meta-analysis. Okay, well, you might have to come back on. If people like this content, we'll do more of it, or even just do a journal club type segment, because really it takes a longer time to get into the weeds on some of these articles, whether meta-analysis or other. And we've avoided doing that for not wanting to bore people on the podcast. So we'll see if folks have an appetite for it. Good luck in residency, and you and I will be talking more. And for the listeners, we'll be back in a couple of weeks.

SPEAKER_02 1:04:07

Thanks for listening. Be sure to check out thinkingaboutobgyn.com for more information and be sure to follow us on Instagram. We'll be back in two weeks.