Friend or Foe: AI in the Realm of Epidemiology

EPITalk: Behind the Paper

This stimulating podcast series from the Annals of Epidemiology takes you behind the scenes of groundbreaking articles recently published in the journal. Join Editor-in-Chief, Patrick Sullivan, and journal authors for thought-provoking conversations on the latest findings and developments in epidemiologic and methodologic research.

April 13, 2026 • Annals of Epidemiology • Episode 21

Emaan Rashidi, a doctoral student at Johns Hopkins Bloomberg School of Public Health, joins us on EPITalk to discuss her article review, covering AI's role in future epidemiologic research and its broader implications. “Is artificial intelligence a friend or foe to epidemiology?” is published in the March 2026 issue (Vol. 115) of Annals of Epidemiology.

Read the full article here: https://www.sciencedirect.com/science/article/abs/pii/S1047279726000116

Episode Credits:

Executive Producer: Sabrina Debas (Episodes 1-18) and Sofina Tran (19-)
Technical Producer: Paula Burrows
Annals of Epidemiology is published by Elsevier

Patrick Sullivan 0:10

Hello, you're listening to EPITalk: Behind the Paper, a podcast from the Annals of Epidemiology. I'm Patrick Sullivan, editor-in-chief of the journal, and in this series, we'll take you behind the scenes of some of the latest epidemiologic research featured in our journal. Today we're here with PhD candidate Emaan Rashidi to discuss her article, Is Artificial Intelligence a Friend or Foe to Epidemiology? You can read the full article online in the March 2026 issue of the journal at www.annalsofepidemiology.org. So I'll introduce our guest. Emaan Rashidi is a doctoral student in the epidemiology program at the Johns Hopkins Bloomberg School of Public Health. Her research focuses on the development and validation of agentic artificial intelligence systems for pharmacoepidemiology. Welcome.

Emaan Rashidi 1:06

Thank you for having me. I'm honored to be here.

Patrick Sullivan 1:09

So it feels like everyone is interested in AI. And I was really excited about your article because your article takes this broad field of AI that everybody's thinking about and talking about and trying to figure out what to do with, and really puts it in terms that epidemiologists will relate to. So I know that some people who listen to this will go find the article and read it, but I think this is a great chance for us to talk about sort of how you got here and some of the main content. So I'm always interested in how people come up with an idea for a paper. So I wonder for you what inspired, you know, this particular research issue and why you thought this issue was important. You know, you're someone who's earlier in your career and you're looking forward to what things are going to be like. So I think AI is especially relevant. So what inspired you to pull this paper together?

Emaan Rashidi 1:59

I had just started my PhD and I knew I was interested in both AI and pharmacoepidemiology, but didn't know what resources, infrastructure, or mentorship existed at that intersection. So my advisors suggested I informally meet with faculty across the department to ask about their experiences with AI. And what I found was a spectrum of perspectives. Some folks were enthusiastic but cautious, others were more skeptical and apprehensive, concerned about black box models and the erosion of core epidemiologic principles. And some folks actually exhibited all those sentiments in the span of a 30-minute meeting. So at times I couldn't tell whether the epidemiologists sitting across from me viewed AI as a friend or foe of the discipline. And that tension became the intellectual seed of the paper. My co-authors and I realized that the real question isn't whether AI is good or bad. It's whether and how we can use these new technologies in epidemiology.

Patrick Sullivan 3:03

Yeah, that's such a great story and such a systematic way, you know, as an epidemiologist to assess this. And I'll just encourage people, we'll talk some about this, but part of what you put together out of this is thinking about the epidemiologic process and phases. So, what is it about measurement? What is it about inference? What is it about study population and how AI and machine learning can be used in all those spaces? So I know people will want to go to the paper and check out more on that, but just the genesis of this and what you were hearing from faculty, it really does say it's kind of a double-edged sword. You know, used correctly, it can make us more efficient. Used incorrectly, it might be a challenge. But so let's get into that detail a little bit about how you structure the paper. So, can you talk some about the steps that you would suggest about how to use AI or machine learning to actually be a productive tool? And we'll start out by, you know, thinking about maybe study populations as a first area, like as we're thinking about a question and what study populations, what data might be available and how we might approach them.

Emaan Rashidi 4:09

I think the most important step is actually the first one. Start with the research question. And to successfully answer any epidemiologic question, one must identify the target, source, and study populations, because this process undergirds external validity and it helps us understand how we can apply the inferences we make from our study. And epidemiology has always been question-driven. Is the question descriptive, predictive, or causal? But if you begin with the AI tool instead of the question, you risk misalignment between your objective and study execution. And the populations that you are studying will be directly affected by the question that you are posing and trying to apply the AI to.

Patrick Sullivan 4:53

I think it's such a good point that coming up with good questions and clear questions and precise questions really is so critical to doing a study that's important and that's going to improve health. And so maybe that is still something where we read the literature, consult with colleagues, go back and forth on that question to define it. And then AI might take us, for example, into a next phase of either defining or dealing with the study population. And, you know, we can go through some of the things that are in your article, but I think your point is really well taken that the question needs to come from research, thought, maybe chatting with colleagues. And once that question is defined, then your article walks through some ways to use these tools in different phases. So your article really goes through nicely each stage. And the first is study population. So what are some of the considerations for defining the study population and then using AI or machine learning to deal with data from that study population?

Emaan Rashidi 5:53

So once the researchers define the target, source, and study populations of interest, you're choosing what data sources can accurately represent and capture these patients or people. And in this era of big data, there are large and heterogeneous data sets. And often we can mistake the sheer size of these data sources for representativeness. So there's almost an illusion of representativeness because we have all this large data. And what we really highlight in this paper is that without careful sampling and attention to who is included or excluded in these models, you can induce bias. So there are performance measures such as loss functions that can amplify concerns of misrepresentation if they're interpreted or used incorrectly. So, for example, there's the mean squared error or log loss, which take the global average across individuals or observations, meaning that groups with more data contribute more to the loss. So, what this can mean is that AI/ML models can optimize their performance to majority groups. That in turn can be at the expense of minority or underrepresented groups, which compromises generalizability. So the epidemiologist here needs to understand the semantics of the data and the AI/ML functions and how the results and the data source can influence your conclusions.
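
To make the loss-function point above concrete, here is a minimal Python sketch. It is not taken from the article: the group labels, outcome values, and the 90/10 split are hypothetical, and the "model" is deliberately the simplest one possible, a single constant prediction chosen to minimize the global mean squared error (which is the overall mean of the outcomes).

```python
from collections import defaultdict


def mean_squared_error(outcomes, prediction):
    """Average squared error of one constant prediction over the outcomes."""
    return sum((y - prediction) ** 2 for y in outcomes) / len(outcomes)


# Hypothetical data: (group label, observed outcome).
# The majority group supplies 90% of the observations.
records = [("majority", 1.0)] * 90 + [("minority", 5.0)] * 10

outcomes = [y for _, y in records]

# The constant that minimizes the global MSE is the overall mean,
# so it is pulled toward the group with more observations.
global_mean = sum(outcomes) / len(outcomes)
overall_mse = mean_squared_error(outcomes, global_mean)

# Recompute the same loss separately within each group.
by_group = defaultdict(list)
for group, y in records:
    by_group[group].append(y)

group_mse = {
    group: mean_squared_error(ys, global_mean) for group, ys in by_group.items()
}

print(f"prediction minimizing global MSE: {global_mean:.2f}")
print(f"overall MSE: {overall_mse:.2f}")
for group, value in group_mse.items():
    print(f"{group} MSE: {value:.2f}")
```

In this toy setup the fitted value sits close to the majority outcome, so the majority group's error is about 0.16 while the minority group's is about 12.96, and the overall figure of about 1.44 hides that gap. Reporting the loss by subgroup, as in the last loop, is one simple way to surface it.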

Patrick Sullivan 7:25

Thanks. So your paper next moves on to measurement. And you make the point that like there's this massive amount of data and expanding amount of data. And so one of the questions is, do we really have the resources to take advantage of all this, you know, expanding amount of data? And you sort of say that if it's used appropriately, AI and machine learning can enhance measurement in several different ways, you know, data quality, data collection. So can you talk a little bit about how these technologies actually can help enhance measurement, which is such a critical component of epidemiologic studies?

Emaan Rashidi 8:02

Yeah. So measurement in epi research has been historically constrained by limited data availability, underpowered sample sizes, and challenges in integrating these diverse data sets. So different AI/ML tools can help us overcome some of these challenges. One example is natural language processing in electronic health record data. We often use this in pharmacoepidemiology to enhance exposure and outcome ascertainment. And this can improve the way we refine our constructs in our study designs and reduce sources of misclassification and measurement error. However, that's just one piece of the measurement. When we harness all of this data, we have to understand what the data actually mean, who they represent, and what the sources of measurement error are. So there have to be specific methods. We mentioned in the paper you could use like the kappa statistic or coefficient of variation to evaluate reliability, but you should also be validating the tools and the data used.
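
As an illustration of the two reliability measures named above, the following Python sketch computes Cohen's kappa for two sets of categorical labels and the coefficient of variation for repeated numeric measurements. It is an assumption-laden example rather than the paper's procedure: the function names, the "NLP tool versus chart review" framing, and all of the data values are hypothetical.

```python
import statistics
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b)
    )
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical outcome flags (1 = outcome present) for the same ten records,
# once from an NLP tool and once from manual chart review.
nlp_flags = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
chart_flags = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
kappa = cohens_kappa(nlp_flags, chart_flags)

# Hypothetical repeated measurements of one quantity.
repeats = [98.0, 100.0, 102.0]
cv = coefficient_of_variation(repeats)

print(f"kappa: {kappa:.2f}")
print(f"coefficient of variation: {cv:.2%}")
```

Here the two label sets agree on 8 of 10 records, but half of that agreement would be expected by chance, so kappa comes out near 0.6 rather than 0.8. Neither number says whether the tool is valid against a gold standard, which is the separate validation step mentioned above.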

Patrick Sullivan 9:11

Right. So just because you're using these tools, that doesn't excuse the epidemiologist from really understanding what the underlying data are.

Emaan Rashidi 9:20

Exactly.

Patrick Sullivan 9:21

All right, moving on, like through the same structure as in your paper to inference, you talk about the fact that we have a set of methods to generalize data, you know, that might be descriptive or predictive or asking questions about causality, and you go into some examples. So could you just talk a little bit more, like maybe in different study designs, or you know, give an example of how these tools can be useful for inference?

Emaan Rashidi 9:47

Yeah. So correct inference is rooted in good study design. So that includes sound analytic methods and data integrity while thoughtfully interpreting your results. And you do that with a well-defined research question. One way we describe using AI/ML, and more so as a caution, is when sifting through different types of study designs. Like in cross-sectional studies, there's a way that people are using AI/ML coding platforms, for example, to do analyses. The data structure for cross-sectional studies differs from that of causal investigations or of predictive studies. So when you're using, for example, some forms of AI/ML to do these analyses, it's crucial to understand the design's limitations and the interpretation of your data.

Patrick Sullivan 10:42

So after the analyses are done and you have results, the next step is to make interpretations of them. So epidemiologists working with these tools, what are some of the pitfalls at this stage of interpretation of data?

Emaan Rashidi 10:56

So after you've done the analyses using these tools, ideally you've already identified the sources of measurement error and potential biases that are in your study design. And you should take all of these different considerations into account when framing the inference from the study you have just executed, understanding all of the strengths and limitations of the approach within an epidemiologic conceptual framework.

Patrick Sullivan 11:26

Yeah, so it doesn't really relieve us of the responsibility for that thought work. And you make this point that having a map for inference, so a conceptual framework, thinking about DAGs, thinking about like what the relationships are, is still part of the epidemiologist's thought work. Even when these tools are doing some of the work in the analysis, that thought work is still the responsibility of the epidemiologist. And I feel like you really have this point woven throughout. And it is, for me, the key point, which is like where can these tools help make us more efficient? But the thought process and the critical interpretation has to rest with the epidemiologist who's conceiving of the study and doing the study and then writing the paper. So there's that division of labor, and you really lay these things out as tools. So I guess, you know, if we sort of think about this holistically, what should departments, what should schools be thinking about? And you go through some of the different roles for us in academia around like how these tools can be integrated responsibly.

Emaan Rashidi 12:34

So, as we've already discussed, our field has the conceptual tools needed to evaluate AI, but what we lack in some settings is a shared language, training infrastructure, and interdisciplinary integration. For example, many epidemiologists are not trained in Python or machine learning workflows, and many AI specialists are not trained in epi inference. There's a communication gap. So in this paper, we argue that AI/ML should be integrated directly into the core epi curricula first, not as electives, but within courses on measurement, study design, and inference. We also state that departments should devote increased resources for trainees and faculty to access big data and AI tools. They need to build standard guidance on privacy, ethics, and data governance to ensure proper use. And lastly, leaders should be promoting interdisciplinary collaboration with software engineers and data scientists. And all of these together will help us adapt. Without adapting, we risk widening this methodological literacy gap. And we cede that interpretive authority to the adjacent disciplines.

Patrick Sullivan 13:54

I think it's a great point. The only thing I'll add to that is I think you're really talking about how these tools get folded in and become part of the structure of how we teach epidemiology students, how we train epidemiology colleagues. And the one thing I'll add from my perspective is that in addition to the new folks who are coming through the educational pipelines, you've also got faculty, you know, who may have been doing this for 10 or 20 or 30 years. And so we'll be looking to younger colleagues, earlier career colleagues for opportunities because I think it's also important for people who are later in their careers to get a firm understanding of what these tools can do. And so you're absolutely right that we have to invest in the pipeline. And then I think we also have to invest in what resources we make available to people who are already in the field. But I'll just move us towards a wrap-up by saying that again, like when this paper came in, it was exciting because it really recognizes this question of where are AI and machine learning going to live in our epi world? And are they gonna be our friends or are they gonna be our foes? And I think we really make the case that they have the potential to be our friends, but have to be implemented in the right ways and in thoughtful ways. And again, I think what I take away from your paper is that you really can't replace the thinking, the epidemiologic approach and the thinking, with any of these tools, but these tools can make us more efficient, maybe with respect to the areas that you lay out, you know, particularly around populations, measurement, inference. And then this last piece about how we integrate these tools into our teaching programs and work on that pipeline. So, all around, it really is a paper that has a lot of vision. 
I encourage people to read the whole paper because there's a lot of what we've talked about that's laid out with some other examples and just puts even more content to the areas that we've discussed. So we've had a good conversation here about what was in your article. Now we're going to move to a part of the podcast that we say is behind the paper. And I think this is getting outside of the more academic way of thinking about this and just asking you as a colleague, as an epidemiologist, maybe what advice you have for doctoral students now and our whole current generation of doctoral students are coming into this work in the age of AI. So, what advice do you have for doctoral student colleagues who are thinking about their dissertation work in this age of AI?

Emaan Rashidi 16:24

So I wouldn't position myself as the model for how a PhD student should navigate AI. Everyone's path is different and I'm still learning myself. But one piece of advice I'd offer is don't follow the fads, follow the methods. AI evolves very quickly. The model names change, the headlines change, but the underlying methodological principles move much more slowly. So instead of anchoring your dissertation on like a specific tool or trend, take the time to understand the computational and inferential foundations behind it. What is the model optimizing? What assumptions does it make? Where can epi biases enter? What problem is it truly solving? So if you understand these underlying methods, you can adapt as the technology evolves. But if you anchor yourself to a fad, your dissertation risks aging out before you graduate.

Patrick Sullivan 17:18

That's such good advice. And I want to just wrap up by putting this in a little context in terms of how this work and the thinking about AI fits into your larger research agenda and aims and what you will ultimately hope to accomplish with your work in the big sense.

Emaan Rashidi 17:36

My larger research agenda is to ensure that epidemiologists are not peripheral observers in the AI revolution, but are active architects of it. Our discipline is uniquely equipped not only to keep pace with the innovation, but to help shape and guide it. In my own work, I focus on agentic AI and pharmacoepidemiology. So I'm currently developing and validating a research program called TAGENTIC, which is an agentic system that is designed to develop drug safety and effectiveness protocols for real-world evidence generation. And I hope to continue putting out methods and evaluation frameworks for safe use of these innovative tools in epidemiologic practice.

Patrick Sullivan 18:24

That's great. And I really think that folks who hear this conversation will be as excited as I am just to hear how this thing that we think about as AI, this big concept, you know, you've really tethered down to how it relates to the field of epidemiology and talked some about how you're using it in your own work. And it kind of demystifies a little bit, for folks who don't live in this world, what the possibilities are. So, Emaan, your manuscript had kind of a provocative title, which is Is Artificial Intelligence a Friend or Foe to Epidemiology? So after all your, you know, your thought on this and writing the paper, which is it? Is AI a friend or a foe?

Emaan Rashidi 19:05

I honestly think it can be a friend if we let it in. You know, as with any friendship, you have to let them in. You have to try to understand them and adapt to the different communication strategies for making a new friend. And if you don't want to make a friend, or alternatively, if you want to make a foe, you do the opposite. And in an age where AI is deeply impacting the way populations are affected and public health in general, epidemiologists should remain central in guiding responsible AI for public health. So maybe we should think of it as a friend.

Patrick Sullivan 19:47

We should think of it as a friend and be inquisitive, right? I think that's good advice about maybe IRL friendships and with AI, which is that we need to be humble and we need to be inquisitive and open to learning. And I think your manuscript really is a great place for people to start to understand some of the buckets or the bins of issues that come along with this. And AI is going to be with us, whether we think it's a friend or a foe. So I agree, let's make it a friend. That brings us to the end of this episode. Thank you so much for joining us today. It was a pleasure, Emaan, to have you on the show and to have this conversation.

Emaan Rashidi 20:22

Thank you.

Patrick Sullivan 20:23

I'm your host, Patrick Sullivan. Thanks for tuning in to this episode and see you next time on EPITalk, brought to you by Annals of Epidemiology, the official journal of the American College of Epidemiology. For a transcript of this podcast or to read the article featured on this episode and more from the journal, you can visit us online at www.annalsofepidemiology.org.

Patrick S. Sullivan, DVM, PhD

Host