Drug Safety Matters

#40 How to use artificial intelligence in pharmacovigilance, part 1 – Niklas Norén

Uppsala Monitoring Centre

Far from a future add-on, artificial intelligence is already embedded in the cycle of drug safety, from case processing to signal detection. Versatile generative AI models have raised the bar of possibilities, but they have also increased the stakes. How do we use them without losing trust and where do we set the limits?

In this two-part episode, Niklas Norén, head of Research at Uppsala Monitoring Centre, unpacks how artificial intelligence can add value to pharmacovigilance and where it should – or shouldn’t – go next.

Tune in to find out:

  • Why pharmacovigilance needs specific AI guidelines
  • How a risk-based approach to AI regulation works
  • Where in the PV cycle is human oversight most needed

Want to know more?

In May 2025, the CIOMS Working Group XIV drafted guidelines for the use of AI in pharmacovigilance. The draft report received more than a thousand comments during public consultation and is now being finalised.

Earlier this year, the World Health Organization issued guidance on large multi-modal models – a type of generative AI – when used in healthcare.

Niklas has spoken extensively on the potential and risks of AI in pharmacovigilance, including in this presentation at the University of Verona and in this Uppsala Reports article. His favourite definition of AI remains the one proposed by Jeffrey Aronson in Drug Safety.

For more on maintaining trust in AI, revisit this interview with GSK’s Michael Glaser from the Drug Safety Matters archive.

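The AI methods developed by UMC and cited in the interview include vigiRank (predictive model for signal detection), vigiMatch (duplicate detection), vigiGrade (information completeness of case reports), and Koda (coding of medicinal product information).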



Federica Santoro:

Whether we realise it or not, artificial intelligence has already transformed our lives. Generative AI technologies rose to popularity just a few years ago, but they have already revolutionised the way we write, search for information, or interact online. In fact, it is hard to think of an industry that won't be transformed by AI in the next several years. And that includes pharmacovigilance. My name is Federica Santoro, and this is Drug Safety Matters, a podcast by Uppsala Monitoring Centre, where we explore current issues in pharmacovigilance and patient safety. Joining me on this special two-part episode is Niklas Norén, statistician, data scientist, and head of Research at Uppsala Monitoring Centre. In this first part, we discuss how artificial intelligence can fit in the cycle of drug safety, why it's important to regulate AI use in pharmacovigilance, and what it means to adopt a risk-based approach. I hope you enjoy listening. Welcome to the Drug Safety Matters studio, Niklas. It's such a pleasure to have you here today.

Niklas Norén:

Thank you so much.

Federica Santoro:

Today we're diving into nothing less than artificial intelligence, or AI, and its use in pharmacovigilance, obviously. Such a complex and relevant topic. There will be lots to cover. Why don't we start with the basics? I'd like to cover some definitions to begin with. When reading up for this interview, I was surprised to learn that AI is nothing new to pharmacovigilance, really. But we didn't call it by that name until recently in publications. And I read somewhere that even disproportionality analysis, which is one of the go-to methods – if not the go-to method for PV analysis – could be considered artificial intelligence. Is that so?

Niklas Norén:

So it seems counterintuitive, right? I mean, most people, when we hear artificial intelligence, we think of something different, something much more versatile, and something with agency, perhaps. And this is the reason why in the past we didn't use the term, because it was... it wasn't really reflecting the work we were doing at that point when we were solving quite narrow applications with quite simple methods. But I think when you look at the definitions that are actually in use, they do include much more than the most recent class of deep neural networks or generative AI methods. They include simple machine learning methods. They also include hard-coded systems that have no ability to learn from data, for example. And so, if you look at the first chess engine to beat the world champion, Deep Blue, there was no machine learning even in it. Definitely no deep neural networks. It was basically hard-coded human expertise and very efficient search strategies in a computer that was able to do it. And that clearly is a form of artificial intelligence. And so I think, when we talk about AI, the relevant question maybe isn't, is it AI – yes or no? It's, what kind of artificial intelligence do we have? Is it one where it's a narrow application or it can do many different things? Is it one where there's a clear-cut answer that it's trying to get to, or is it ambiguity in the task that it's trying to solve? Does it have an adaptiveness or is it fixed, etc.? So I think those are much more relevant questions. And then, so if we come back to your question: disproportionality analysis in itself, the statistical method, I would say is not AI. 
But if you use it in a triage to direct your signal assessors to specific case series, I think it represents a simple and quite basic form of artificial intelligence, but artificial intelligence nonetheless, in the sense of a computer or another machine trying to perform a task that would normally require human cerebral activity, like Aronson has defined it in Drug Safety.
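
[Illustration, not part of the conversation] The distinction Niklas draws, between a disproportionality statistic and a triage built on top of it, can be sketched in a few lines of Python. This is only a minimal sketch: it uses the proportional reporting ratio (PRR) because it is simple to compute, not because it is the measure UMC uses, and the counts, the threshold of 2.0 and the minimum of 3 reports are assumptions chosen for the example.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR for one drug-event pair from a 2x2 table of report counts.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with the event, without the drug
    d: reports with neither
    Assumes (a + b), (c + d) and c are all non-zero.
    """
    rate_with_drug = a / (a + b)
    rate_without_drug = c / (c + d)
    return rate_with_drug / rate_without_drug


def triage(pairs, threshold=2.0, min_reports=3):
    """Return the pairs to route to a human assessor, highest PRR first.

    The statistic alone decides nothing; the triage rule is what
    directs human attention to specific case series.
    """
    flagged = []
    for name, (a, b, c, d) in pairs.items():
        prr = proportional_reporting_ratio(a, b, c, d)
        if a >= min_reports and prr >= threshold:
            flagged.append((name, round(prr, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical counts, invented for illustration only.
pairs = {
    "drug A / rash": (20, 980, 200, 98800),
    "drug A / headache": (30, 970, 3000, 96000),
    "drug B / nausea": (2, 98, 100, 99800),
}
print(triage(pairs))
```

With these invented counts only the first pair is passed on for review: the second is reported about as often with the drug as without it, and the third has too few reports to meet the assumed minimum.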

Federica Santoro:

And it will be interesting to see if the definition evolves as the technology itself evolves, right? We might get to the point where we have to review how we define artificial intelligence per se, as it includes perhaps more and more aspects and methods.

Niklas Norén:

Or we move beyond that and we say, the real question is not the binary yes or no, but it's more, what type of artificial intelligence do we have in front of us? Is it a narrow one, is it a broader one with more use cases in mind? Is it one that's solving a clear-cut task where humans know the right answer? Or are we doing something quite ambiguous where it's hard to say, even for human specialists, what the right answer is? So in pharmacovigilance clearly, something like signal detection is of the latter nature. But even some things we think of as maybe much more basic, like knowing, does this report relate to a pregnant person? That is not always easy to tell because we don't have all the information. Or do these reports relate to duplicates or not? And I think this is often the case we're in. So I think maybe we have to think about AI in that sense. It's more, what flavour of AI? I mean, is it adaptive or is it fixed? And so forth.

Federica Santoro:

That's a helpful way to frame it: what are we using it for? But when we talk about artificial intelligence nowadays, and especially with the public, as you mentioned, we often refer to these newer methods, right, that have been in the news now for years and have taken the world a little bit by storm, I must say: generative AI tools like ChatGPT. What is so special about this technology?

Niklas Norén:

Well, one obvious special aspect is the capabilities of them. I mean, they're massive and they've been trained on massive amounts of data. So the capabilities they have, I think, are unprecedented. I mean, especially in the way that they can process text and generate text and so forth. But also, this difference compared to before when we were typically trying to solve a specific task with a specific artificial intelligence solution, and that's what it was trained and fine-tuned to do. Now, these methods, because they interact by processing and generating new text, they can also do things for which they've not been explicitly trained. So, this versatility, I mean, we refer to it as zero-shot learning, meaning we don't even give it any training data or any examples of what it should do. It could still do things we ask it to do. This I think is a very important capability, but it also, of course, opens up for the possibility that we ask it to do things without really knowing how well it will be able to do those tasks. So I think that, plus this... just how it feels to interact with it when it's not just sort of producing an output like A, B, C, or D, or "I think this is a dog" or "I think this is a wolf", but it's actually producing new text, and we can have what almost feels like a human conversation, is also, of course, something that's different compared to what we've seen previously.

Federica Santoro:

And that versatility raises a whole set of issues that we will dive into one by one in a little bit. But first, I want to explain why you specifically are in the studio today to talk about artificial intelligence in PV. And that's because you've been part... apart from the fact that you are a data scientist and statistician and have been working and thinking about this for a long time, but you're also part of a working group within CIOMS, the Council for International Organizations of Medical Sciences. And that working group has been tasked with drafting a set of guidelines to regulate – or advise, perhaps – the use of AI in PV. So, why do we need guidelines? Or rather, what are the risks, perhaps, of not having any guidelines at all?

Niklas Norén:

So, I think generally, we need guidance because we want to get this right. I think there are great opportunities with artificial intelligence for us as a broader society, and specifically, of course, in our field. And if we want to benefit from them, we need to go about this in a mindful way and know to ask the right questions. This is both to be able to get the full value out of these opportunities, but also not to make mistakes and get things wrong and maybe cause harm, maybe lose trust. And it's not easy. I mean, this is the challenge. I mean, there are so many ways you can get things wrong, and sometimes it will not be obvious that you got it wrong until much later. Connected with the ability of zero-shot learning and the ability of generative AI to do many different things, I mean, this has lowered the barrier for entry. I mean, it's much easier now to develop or test or just deploy, in a way, artificial intelligence than it would have been in the past, where, to do it, you would have to sort of define your question, create maybe a reference set, fine-tune an advanced algorithm. There was a big barrier to doing it. Now the barriers have dropped for the development in a way. But I think we should actually... we need as much, if not more, validation or evaluation to know that we're actually getting things right. And I think there's a risk that the relative cost of that validation has gone up a lot with the development cost going down. So for that reason, I do think we need guidance in a general sense. Why do we need specific guidance in the context of pharmacovigilance? Clearly, there is more general guidance being developed. But I think it's good to have some more concrete and precise guidance. We have some specific challenges or aspects of our field. I mean, it's a regulated environment, it has potential impact on patient safety and public health, so we need to bear that in mind.
I also think we often have this setting of ambiguity where it's really hard to know what the right answer is, often, and much of what we're looking for is rare. So there's a low prevalence of signals or duplicates like we talked about, or just specific adverse events that we may be interested in, let's say.

Federica Santoro:

Yeah, no, you're right that the technology is so... is so accessible, and you don't have to be a trained data scientist to start playing around with ChatGPT or similar tools. But of course, as you say, this poses problems as well. So it is good to have those guidelines available. And the guidelines now advocate for a risk-based approach to using AI in PV. What do you mean by that?

Niklas Norén:

So it means that the measures we take should be proportional to the risk involved. We shouldn't have a one-size-fits-all: as soon as you use an AI, you should always do these things in this way. I think it depends on the risk, basically. So you need to think about the probability of, sort of, some harm happening and the impact of that harm, and then adjust your measures accordingly. That is the basic notion of it.
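
[Illustration, not part of the conversation] One common way to make "proportional to the risk" concrete is to score each possible error as probability times impact and let the score decide how much control to apply. The sketch below is hypothetical throughout: the two error types, their probabilities, the impact scale, the threshold and the two levels of control are all invented for the example and do not come from the CIOMS draft.

```python
def expected_harm(probability, impact):
    """Risk score for one error type: how likely it is times how bad it is."""
    return probability * impact


def oversight_for(score, review_threshold=1.0):
    """Map a risk score to a (hypothetical) level of control."""
    if score >= review_threshold:
        return "human review of each decision"
    return "periodic sampling"


# Invented numbers for two error types of an automated deduplication step.
# Impact is on an arbitrary scale where 1 means "minor nuisance".
errors = {
    "duplicate pair missed": {"probability": 0.05, "impact": 1},
    "unique report removed as a duplicate": {"probability": 0.001, "impact": 5000},
}

for name, e in errors.items():
    score = expected_harm(e["probability"], e["impact"])
    print(f"{name}: score={score:.2f} -> {oversight_for(score)}")
```

The point of the sketch is only that a rare error can still dominate the risk if its impact is large, so the same AI application may call for different measures for different kinds of mistake.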

Federica Santoro:

And could you give us some examples then of low- and high-risk situations so we get a better understanding of this approach?

Niklas Norén:

And that I think depends on, it's not just about the AI, but the context in which the AI is used. So generally, I would say if you're using the AI to support human decision-making, but you have a human ultimately taking the decision or on the whole deciding, it's maybe lower risk, generally speaking, because then a human will look at that and make their judgment call, they will still be responsible. If you've automated something and there is no human in the decision, then generally speaking, you have a higher risk, of course. But then you have to think about, okay, what would an error mean here? And errors of different types, even for the same application, have different implications. So let's say you're doing duplicate detection and removal. So missing a pair of duplicates, not highlighting them, the cost of that may be a nuisance that, okay, now we're looking at a case series, and actually there are more... there are fewer reports than it looks like, and we spend time, we waste some time on that. If we accidentally take out a very important report, of course, that could lead to delay or failure to detect the signal. So the probability of that could be low, but the impact of that could be quite high. So you also have to think about that you don't always have a symmetrical cost of errors. Maybe one type of error is more problematic than the other one. But then to add to that, I think you also have to think about the whole human–AI team and what does it mean. You can have an AI method, a simple AI method, say a rule-based method to detect duplicates that has very high recall. It highlights almost every true duplicate, but it also highlights massive numbers of non-duplicates. And then if a human is reviewing all of that, they may well grow numb to the real duplicates and miss them in their assessments. And we've seen this in the past. So, like, the AI component was actually quite sensitive. The team was not because the human just got...

Federica Santoro:

...tired...

Niklas Norén:

...got tired or bored and failed to highlight. So you have to really look at it in that context. And I think when we look at the risk, you should also think about the baseline scenario. I mean, we could be very risk-averse and say we can't do this if there's any risk of getting it wrong, but we need to look at it like, what is the error rate? What is the risk of having a human doing all of this? Can they even do it? What kind of error rate do they have? And not just in a sort of a clean and nice experiment, but in a real-world setting where they get hungry, they get bored, they get distracted. That needs to be kept in mind.
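
[Illustration, not part of the conversation] The arithmetic behind the human–AI team effect can be sketched as follows. When true duplicates are rare, even a small false-positive rate buries them among false alarms (low precision), and the duplicates the team actually catches are those the AI flags and the reviewer then confirms. All numbers below are invented, and the link between low precision and a less attentive reviewer is an assumption made for the example, not a measured effect.

```python
def precision(recall, false_positive_rate, prevalence):
    """Share of flagged pairs that are true duplicates (Bayes' rule)."""
    true_positives = recall * prevalence
    false_positives = false_positive_rate * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


def team_recall(ai_recall, human_recall_on_flagged):
    """Share of true duplicates caught by the team as a whole.

    A duplicate is caught only if the AI flags it AND the human
    reviewer confirms it, so the two rates multiply.
    """
    return ai_recall * human_recall_on_flagged


prevalence = 0.01  # assumed: 1 in 100 candidate pairs is a true duplicate

# Method 1: very sensitive rule-based screen that over-flags.
# Assumed: a reviewer swamped by false alarms confirms only 60% of true duplicates.
p1 = precision(recall=0.98, false_positive_rate=0.05, prevalence=prevalence)
t1 = team_recall(ai_recall=0.98, human_recall_on_flagged=0.60)

# Method 2: less sensitive but far more specific.
# Assumed: a reviewer seeing mostly true duplicates confirms 95% of them.
p2 = precision(recall=0.90, false_positive_rate=0.001, prevalence=prevalence)
t2 = team_recall(ai_recall=0.90, human_recall_on_flagged=0.95)

print(f"method 1: precision={p1:.3f}, team recall={t1:.3f}")
print(f"method 2: precision={p2:.3f}, team recall={t2:.3f}")
```

Under these assumptions the more sensitive AI component ends up in the less sensitive team, which is the pattern described above.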

Federica Santoro:

Yeah, absolutely. So, what I'm hearing is you really need to assess this on a case-by-case basis and take the task at hand and really decide critically, okay, what risk is there if the AI does this or does that, and go from there.

Niklas Norén:

Yeah.

Federica Santoro:

So, since we're on the topic of human oversight, I read that human oversight in AI activities can take the shape of human in the loop or human on the loop. I didn't quite understand the difference. So, can you clarify that for me and our listeners?

Niklas Norén:

So, a human in the loop would be performing the task together with the AI. So, every single task there's a human there. It could be that there's a suggestion from an AI, and then the human decides whether to accept that suggestion or not. It could also be, I think, more that the human is responsible for the task, but they're getting some recommendation or can get some inspiration from the AI. But it's really, they're doing it together, and the human is involved every time the task is performed. A human on the loop means that there's... it's maybe a sort of a fully automated process, but there's a human monitoring and, you know, having oversight of the process and able to step in when relevant. I think this is a little bit harder to see exactly when that would happen in pharmacovigilance. You can definitely have... I mean, you also have the third level, which is human in command, and that just means that humans have the ultimate responsibility to design and decide when it's going to be used. So I don't know the precise definition. If, say, you have something like UMC's Koda algorithm for coding medicinal product information, there will be an opportunity, in difficult cases, to delegate it to a human. Is that human in the loop or is that human on the loop? But then also if we deploy methods and we say, you're going to use our vigiMatch method to do deduplication, and regularly we're going to review its performance and decide whether we need to retrain it. Maybe we have some measures in place to even proactively highlight a model drift or a data drift where performance is not the same as we would expect. I think that is human in command, but maybe that is human on the loop. So, like, you hear... I mean, I think these are a little bit fluid as concepts, but the general idea is that a human in the loop is helping to perform the task, a human on the loop... A human on the loop in a self-driving car would be there's a human there.
The car is driving, but the human is overseeing it. And if they feel the need, they will step in. It's just not as obvious to me what that would be... you know, when would we have a human overseeing an AI's activity in pharmacovigilance?

Federica Santoro:

Okay. So these definitions can be somewhat fluid, but I think the key point here is that human oversight is very much needed. And on that topic, I'd like us to take a critical look at the entire pharmacovigilance cycle, sort of from collecting data to analyzing it, digging for signals, and eventually communicating new safety information. And I'd like us to pinpoint where AI is used today, where it could be used in the future, and perhaps which tasks do you not advise AI for at all, ever? I mean, of course, this is speculative at this point, but let's try to get that overview. If we start from the status quo, so which tasks do we use AI for today?

Niklas Norén:

So in pharmacovigilance, and now in this broader sense, considering also the simple form of AI, so anything from basic predictive models to even rule-based algorithms to identify certain types of cases, etc. I think a lot of focus, if you go back 15-20 years, was more on the signal management and maybe signal detection in particular. So clearly, there's been a lot of work there. The disproportionality methods and basic triages we talked about. There have been developments of basic predictive models. So we have our vigiRank method that looks at other aspects of a case series than just the... you know, whether there are more reports than we would expect. We also look at the geographic spread, the content, and quality of individual reports, etc. And other groups have done similar things. There's also been work to identify cases or reports that are particularly important to look at and maybe should be prioritised in a review. But I would say that the general interest is now shifting very much to the case processing, and this is because this is so expensive and resource-intensive, and maybe also boring for the humans, that there's a great opportunity here to free up resources to maybe spend them on more value-adding tasks. So I think as a community, and of course, primary recipients of the reports here are driving this, you know, pharmaceutical industry and the regulators as well. But this I think is an important area, too. One area that we've been working on at UMC for a long time has been duplicate detection. And that's an area where we deployed a more sophisticated, now I'd call it artificial intelligence, method. We didn't at the time, but in the broader sense it is, for performing that task. And this is something which clearly we cannot do manually, even if we wanted to. I mean, no human can overview the 42 million reports in VigiBase and get a sense of where do we have potential duplication. 
So I think there's a variety of areas where we are already using it today. But I would say for the most part, the methods are quite straightforward. I mean, where we have seen some more sophisticated methods has been in natural language processing. So, using modern methods (could be deep neural networks and other methods) to process either narratives on the case reports to extract some useful information from there, but also, say, the regulatory documents describing already known adverse events like the summary of product characteristics documents and trying to know, is this adverse event actually listed already? So we shouldn't spend as much time on it.

Federica Santoro:

So, on a related topic, speaking about tasks where we already use artificial intelligence today, one of our listeners, Mohammed, asks, "how do you use AI to monitor adverse effects or to track signals?"

Niklas Norén:

Yeah, so this is a good question. And we mentioned the disproportionality analysis and some of the predictive models that have been used. Other work in this area, I think, relates to the ability to go beyond just the single drug and adverse event pairs, as we often base our analysis on. This could be looking for other types of risks, like related to drug–drug interactions. It could also be related to other risk groups. Maybe men are at a specific risk for a certain adverse event exposed to a specific medicine, maybe children have a different risk, etc. But also to look beyond the single adverse event terms in MedDRA that we use to code these adverse events and see, can we identify and bring together reports that actually describe the same clinical entity, even if the presentation in the patient was slightly different, maybe the coding of the term was different. So we've worked with cluster analysis methods to do that. Other groups have looked at network analysis or, you know, various ways to do that. We also have another stream of research here at UMC looking at semantic representations of drugs and adverse events to be able to maybe support the assessors in determining which MedDRA terms to even include in a search looking for relevant cases.

Federica Santoro:

Very interesting. So there's lots of areas where AI is already used today, and obviously, there's lots where we are still in the exploratory phase, perhaps. Now, I'd like to get you to talk a little bit more speculatively or philosophically, perhaps – we'll see where we land. Which tasks do we not entrust to AI today, but we could perhaps in future?

Niklas Norén:

Yes. So, one type of question that we've stayed clear of is anything related to a detailed assessment of the relationship between a medicine and an adverse event, because this requires so much clinical knowledge and expertise and the ability to adjust the question and the considerations to that specific topic. So, for example, we developed a measure of information completeness on case reports a long time ago. We call it vigiGrade. This was reasonably straightforward. It was meant to just be the first dimension of a broader quality measure. And the next one we were interested in was to do a sort of a relevance measure that would look at the strength of... the strength of a case report as it relates to the possible causal association between a drug and an adverse event. But this proved to be almost insurmountable because even something... just one component of that consideration, the time to onset, it's so dependent on what we're looking at. So, what is a suggested time to onset if we think of this as being an adverse effect? Well, it depends on the medicine, it depends on the adverse event, and it may even depend on the relation. So, of course, to try to do that with old-school artificial intelligence or even, like, basic machine learning is very, very difficult. Now, with the reasoning capabilities of the generative AI models, this may no longer be completely out of reach. Maybe we can get these models to help us do this and actually then solve many tasks that were more difficult before. And maybe we can integrate this in both signal detection and signal validation. This I think would be a huge step forward. But it's not something we've done yet. It certainly requires a lot of research and it's... it's unknown. But I do have hope that we may be able to be more ambitious going forward.

Federica Santoro:

You see it as an option basically going forward. And who knows how many possibilities are out there that we haven't even thought about yet. So that'll be interesting to see, how the field moves on. Are there tasks that you feel now, in 2025, we will never be able to trust AI with?

Niklas Norén:

So, I've been around long enough to be careful about saying never. I do think something like a benefit–risk evaluation, that's very personal. I mean, I don't know that that's something you want to outsource to anyone else, and certainly not a computer. So I think that would be less, sort of, appropriate for use, for sure. I also think, coming back to this question of ambiguity, I do think there are some tasks where if there's very much ambiguity, like say in the context of signal detection, where even the human world experts may not always agree, do we have enough evidence to say that there is a causal association between this medicine and this adverse effect, let alone when would we have agreed that, you know, at what point of time was that clear? So if we don't have the agreement on what the truth is, if we have that level of ambiguity, then I think it's very hard to see how we can develop AI to support us because we won't know if it's getting it right. How can we validate that it's doing something sensible? So I do think any area where we have a lot of uncertainty and where we sort of struggle to define what's the truth, then I think it's going to be quite difficult.

Federica Santoro:

A human endeavour in the end, still. All right. So we've gone into the nitty-gritty of the PV process and looked at specific parts of the cycle. Now, if we take a step back, will AI change the PV process as we know it? So, is there a chance that as we get better, for example, at accessing and analysing other types of data – and I know that is already being done, like real-world data to support spontaneous reporting data – could the PV cycle as we know it become obsolete? Could spontaneous reporting become obsolete? What is in store as the technology evolves?

Niklas Norén:

So I certainly hope that there will be transformation, that it won't remain the same. I mean, I think that would be a missed opportunity if we just... we run things like we've done and we just tweak something. So I think it's appropriate to think about the bigger picture and, you know, should we do some things completely differently? I think sometimes we get very enthusiastic and we think we can just let AI loose, especially with these more capable Gen AI methods. That, I think, is probably not the most effective way to go about this. And if you take the human analogy and we think about, how do we get humans to perform certain tasks, if we want them to be done with some consistency and some efficiency and reproducibility? Well, we have processes to support that too, right? So I do think that there's great opportunity, for example, to use generative AI maybe to extract information more effectively from real-world data sources like electronic health records. I do think when you do that, you know, research today shows, at least at this point, you don't tend to get very good results if you just generically ask a Gen AI method to just extract information here. You want to give it quite precise instructions, and the better you can break it down and the better you can prompt it, the better results you will get. So I think we're still going to have to define and we're going to have to be clear what are we looking for, what are we asking for, what's relevant to the decision making. Can we use Gen AI to support that thinking process? I'm sure we can, but I do think there's going to be a careful consideration of how to best do it. And I don't think, like, wholesale dismissing everything that's done today. I think we just have to carefully think through which parts can be done in a different way with still getting the same value as we're getting today or even better value. But to point to those specific areas, I think is... 
is quite difficult, and we're going to have to explore and experiment and learn and then adjust course as we... as we proceed.

Federica Santoro:

That was all for part one, but we'll be back soon with the second part of the episode where we discuss the challenges of using AI in pharmacovigilance. If you'd like to know more about this topic, check out the episode's show notes for useful links. And if you'd like more pharmacovigilance stories, visit our new site Uppsala Reports at uppsalareports.org to stay up to date with news, research, and trends in the field. If you enjoy this podcast, subscribe to it in your favourite player so you won't miss an episode and spread the word so other listeners can find us. Uppsala Monitoring Centre is on Facebook, LinkedIn, X, and Bluesky, and we'd love to hear from you. Send us comments or suggestions for the show, or send in questions for our guests next time we open up for that. And visit our website to learn how we promote safer use of medicines and vaccines for everyone everywhere. For Drug Safety Matters, I'm Federica Santoro. I'd like to thank our listener Mohammed for submitting questions, my colleague Fredrik Brounéus for production support, and of course you for tuning in. Till next time!