Drug Safety Matters

#41 How to use artificial intelligence in pharmacovigilance, part 2 – Niklas Norén

Uppsala Monitoring Centre

Far from a future add-on, artificial intelligence is already embedded in the drug safety cycle, from case processing to signal detection. Versatile generative AI models have expanded what is possible but also raised the stakes. How do we use them without losing trust, and where do we set the limits?

In this two-part episode, Niklas Norén, head of Research at Uppsala Monitoring Centre, unpacks how artificial intelligence can add value to pharmacovigilance and where it should – or shouldn’t – go next.

Tune in to find out:

  • How to keep up with rapid developments in AI technology
  • Why model and performance transparency both matter
  • How to protect sensitive patient data when using AI

Want to know more?

Listen to the first part of the interview here.

In May 2025, the CIOMS Working Group XIV drafted guidelines for the use of AI in pharmacovigilance. The draft report received more than a thousand comments during public consultation and is now being finalised.

Earlier this year, the World Health Organization issued guidance on large multi-modal models – a type of generative AI – when used in healthcare.

Niklas has spoken extensively on the potential and risks of AI in pharmacovigilance, including in this presentation at the University of Verona and in this Uppsala Reports article.

Other recent UMC publications cited in the interview or relevant to the topic include:

For more on the ‘black box’ issue and maintaining trust in AI, revisit this interview with GSK’s Michael Glaser from the Drug Safety Matters archive.


Join the conversation on social media
Follow us on Facebook, LinkedIn, X, or Bluesky and share your thoughts about the show with the hashtag #DrugSafetyMatters.

Got a story to share?
We’re always looking for new content and interesting people to interview. If you have a great idea for a show, get in touch!

About UMC
Read more about Uppsala Monitoring Centre and how we promote safer use of medicines and vaccines for everyone everywhere.

Federica Santoro:

Whether we realise it or not, artificial intelligence has already transformed our lives. Generative AI technologies rose to popularity just a few years ago, but they have already revolutionised the way we write, search for information, or interact online. In fact, it is hard to think of an industry that won’t be transformed by AI in the next several years. And that includes pharmacovigilance. My name is Federica Santoro, and this is Drug Safety Matters, a podcast by Uppsala Monitoring Centre, where we explore current issues in pharmacovigilance and patient safety. Joining me on this special two-part episode is Niklas Norén, statistician, data scientist, and head of Research at Uppsala Monitoring Centre. If you haven’t listened to the first part of the interview, I suggest you start there so you can get the most out of the discussion. In this second part, you’ll hear about the challenges of using artificial intelligence in pharmacovigilance – from problems of transparency and data bias to how to keep up with a technology that moves so fast. I hope you enjoy listening.

Federica Santoro:

Well, I’d like to move on now to the challenges of the use of AI in PV, and there’s lots to say here. So, if we start with the transparency issue: one of the most fascinating, I think, but also very problematic aspects of modern AI technology is that it is so complex that it can be difficult, if not impossible, for humans – even people who are in the field, right? – to understand exactly how these models work. This is often referred to as the ‘black box’ issue, which means input goes in, output comes out, but I don’t really know what’s happening inside the machine and how that transformation occurs. We covered artificial intelligence and pharmacovigilance in a recent episode of the podcast, where Michael Glaser from GSK suggested we actually stop trying to look into the black box altogether. His suggestion is that we focus on assessing the inputs and the outputs as best we can. But I couldn’t help noticing that one of the guiding principles in the CIOMS guidelines is transparency. So, is this suggestion by Michael at odds with our desire for transparency? If I can’t understand it, how can I be honest about how AI works?

Niklas Norén:

So I agree that we have to think about transparency on at least two levels: transparency related to the models, and transparency related to the performance. And I think of them as quite independent, but maybe if you have less of one, you may have to compensate with more of the other. And explainability is not the only form of transparency regarding the models. It’s important, it can be valuable, but it’s clear we don’t always require it. All of us now use methods without understanding how they work. We cannot understand why a generative AI model came up with a particular response – there’s no way any of us can know that, right? And we still accept it. But there are other forms of transparency related to a model like that that can still be relevant. We can understand: what kind of model is this? Whose model is it, where was it trained, what kind of data has it been used on? If I’m using it for a specific task, I need to know how I’m meant to be interacting with it. And those considerations are important. But then you have the performance transparency, which is the other side of the coin. And that, I think, is what’s really important, especially when you don’t have full model transparency. So here you ask questions like: what kind of errors do we get? How often do we get errors if we do this? And not only that, but of course, in which specific data has this been tested? How big was the data? How diverse is the data? If I have a specific use case in mind, I have to think about how well aligned that use case is with the data it has been tested on. If it’s not very well aligned, maybe I have to make sure I run another test in my own data, ideally, or in something that’s much more closely related. So we need to look at that and see how often errors happen, but also qualitatively see examples of the errors that are made – and both types of errors. What kind of things that I want to be looking at are missed? What things are highlighted and presented to me that I don’t want to look at? But I also want to see when things go right – what kind of successes do we see? Because maybe it’s only solving the very simple problems, and that I can get from a performance evaluation. So I do think that performance evaluation is the centrepiece. That’s what we always need to do. Because even if you have model transparency – maybe you have a very simple method and you can understand exactly how it works – you still don’t know how well it works. It may look quite sensible, but then you run it on a test set where you have positive controls, the items you want to identify, and it actually doesn’t find them. You need to have that there, too.
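
To make the two error types concrete – items of interest that are missed, and items that are wrongly highlighted – here is a minimal Python sketch on invented labels (not any real pharmacovigilance data) that reports both error counts alongside recall and precision, rather than a single summary number.

```python
# Minimal sketch: break performance down by error type rather than one number.
# The labels below are invented for illustration; 1 = item of interest, 0 = not.

y_true = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]   # what a human reviewer decided
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]   # what the model highlighted

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed items
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # wrongly highlighted

print(f"Missed items of interest (false negatives): {fn}")
print(f"Irrelevant items highlighted (false positives): {fp}")
print(f"Recall:    {tp / (tp + fn):.2f}")   # share of relevant items found
print(f"Precision: {tp / (tp + fp):.2f}")   # share of highlighted items that are relevant
```

Reading the actual misclassified examples, not just the counts, is the qualitative step described above.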

Federica Santoro:

So an extremely critical evaluation of the data that goes in, the data that goes out.

Niklas Norén:

Yeah. And I think this skill matters – maybe not doing your own performance evaluations, but asking the right questions about them – because it’s very easy for somebody to charm you: present you with results and say, I’ve run this and got this great accuracy, let’s say. Well, accuracy just says what proportion of classifications are correct. But if I’m working on something where almost everything is a negative, like in duplicate detection, I can do something dumb, like say nothing is a duplicate, and I’m going to get massively good accuracy, right? That says nothing. So that’s an obvious one. But there are also other, more tricky things with these low-prevalence targets that we are often focusing on. Often what you have to do in your performance evaluation is change your test set a little bit to make sure you have enough of what you’re interested in in there, but then it’s no longer like reality. And that means that the performance results you see on that test set may not transfer – or will not transfer – to the real world, and you can easily be fooled by that as well. So I think ultimately it’s always good to run a pilot study in your own use case and see what kind of performance you can really expect. But I think this is where we’ll see a lot of disappointment in the future, because people will be promised something and they will not get it.
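
As a rough illustration of the two pitfalls described here – the trivially high accuracy of flagging nothing on a rare target, and precision figures from an enriched test set that do not carry over to real-world prevalence – here is a minimal Python sketch. All numbers (prevalence, sensitivity, specificity) are invented for illustration, not measured values from any real system.

```python
# Minimal sketch of the two pitfalls, on invented numbers.

# 1) On a rare target (say 1 duplicate per 1,000 reports), labelling
#    everything "not a duplicate" already gives 99.9% accuracy.
n_reports, n_duplicates = 100_000, 100
accuracy_of_doing_nothing = (n_reports - n_duplicates) / n_reports
print(f"Accuracy of flagging nothing: {accuracy_of_doing_nothing:.3f}")  # 0.999

# 2) Precision depends on prevalence, so a figure measured on an
#    enriched test set will not transfer to the real data stream.
sensitivity, specificity = 0.90, 0.99

def precision_at(prevalence: float) -> float:
    """Expected precision of the same classifier at a given prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)

print(f"Precision on a 50/50 enriched test set:  {precision_at(0.5):.2f}")    # ~0.99
print(f"Precision at a realistic 0.1% prevalence: {precision_at(0.001):.2f}")  # ~0.08
```

On these assumed numbers, the same classifier that looks near-perfect on the balanced test set would produce mostly false alarms in routine screening – which is why a pilot in your own data is the safer check.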

Federica Santoro:

Get it, right. On to the next challenge. Not only are AI models difficult to understand, but they also change extremely fast. We’ve all noticed it in the last few years, right? Even just trying to follow the developments of generative AI in the news has been hard, and it can be really hard for people who aren’t trained in data or computer science to keep up with the technological issues and advancements. And on that topic, another one of our listeners, Harika, sent in a question. She asks: with AI models evolving so quickly, how can regulators ensure ongoing transparency and oversight in pharmacovigilance? And she cites the ‘black box’ issue again. Are there plans within the CIOMS guidelines to help address these challenges? And do you have any practical advice for professionals like her who want to support the safe and ethical use of AI in pharmacovigilance?

Niklas Norén:

Yes. So I think this comes back to the risk-based approach. We have to think about what our use case is and what kind of risk we are able to accept. And I think this concern is specifically related to the use of generative AI with somebody else’s model – maybe a public model that is changing, where you don’t even know exactly when it’s being retrained. In some use cases that may be acceptable. Maybe you say, that’s fine, this is a component of my system that has this kind of risk, and I’ve looked at it and assessed it. Maybe you have to put some measures in place to make sure that your performance is not changing as the model is updated, or you have to look for ways to freeze the model and say, I actually don’t want a model that is updated outside of my control – I want to run it in a way where I can control when it is updated or not, if you have a very mission-critical system. I also think we should not go overboard – it depends on how you use it. We should remember that for many of these tasks, we rely on humans today. And I guess it’s then sensible to look at what our onboarding process is: do we repeat the whole quality control when we have new staff coming on board, or do we have other ways to ensure that we’re getting the same quality? Because in some respects, the way we use some of these large language models is not too far off from how we may have used a human in the past, where we do accept some variability. So clearly, you want to have the right quality control in place, and the right risk-based approach to your deployment of the AI. But the answer, I think, will vary. And then I will say, we now focus a lot on these generative AI methods, but a lot of the AI methods that are going to be in use will be of different flavours. They will not change as...

Federica Santoro:

...as quickly...

Niklas Norén:

...as quickly or as opaquely. We’ll have more control. But I also think we can learn more. We have certainly deployed AI methods in the past that were less complex, but maybe we should have been more mature in how we thought about how they changed over time. Maybe the model is updated – but even if the model isn’t updated, data can change, so you can have data drift, and maybe performance isn’t the same. So I think this goes back to the question at the beginning of why I think it’s good to apply some or all of these principles, not just to the most recent class of methods, but more broadly to our use of computational methods to support our work.
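
Data drift of the kind mentioned here can be monitored with simple distribution checks. The sketch below uses synthetic data and a two-sample Kolmogorov–Smirnov test, chosen only as one common way to flag a shift in an input feature between the data a model was validated on and current incoming data; it is not a method described in the interview.

```python
# Minimal drift-check sketch on synthetic data: compare the distribution of a
# numeric input feature (e.g. patient age on incoming reports) against the
# distribution seen when the model's performance was last validated.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
ages_at_validation = rng.normal(loc=55, scale=15, size=5_000)   # reference window
ages_incoming      = rng.normal(loc=48, scale=18, size=5_000)   # current window

statistic, p_value = ks_2samp(ages_at_validation, ages_incoming)
if p_value < 0.01:
    print(f"Distribution shift detected (KS={statistic:.3f}); "
          "re-check model performance on recent data.")
else:
    print("No strong evidence of drift in this feature.")
```

A flag like this would be a cue to re-run the performance evaluation rather than assume earlier validation results still hold.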

Federica Santoro:

And a related question, given how fast and dynamic the AI field is, how will you ensure that the CIOMS guidelines themselves are kept up to date? How can a document like that be relevant if the technology evolves so quickly?

Niklas Norén:

So the simple answer is that the document will not be updated. Once the working group completes its mission and publishes the report, it will be published in that form – and that form will not change. This is at least how I understand the CIOMS processes. Now, we’ve been quite mindful of future-proofing as far as we can, trying not to say things we think may not hold in the future. It also means that we steer clear of very detailed specifications for exactly how to do things, and we try not to recommend specific technologies, etc. So I hope the principles will stand the test of time, but of course, many things will change. I think, luckily, in a way, the release of the generative AI methods – starting with ChatGPT – happened during the course of the project. Of course, that was a disruption, but it was very good that it happened during the project and not after, because it changed a lot of what we thought, even related to explainability. If you’d asked me before the working group started, do you need explainability? I would have said, well, you don’t need it for all tasks, but for some tasks you do want explainability. I would have said that for something like de-identification, no, you don’t need it. We already had methods based on deep neural networks for that at the time, and I would have said, I don’t need to understand how it gets to the right decision, I just need to understand its performance. But I would have said that for something like signal detection, you do need explainability, because you have a very ambiguous task and a human who needs to interpret that output. I’m still leaning towards that, but I think I have a more nuanced sense now: maybe if you can show that you can use a generative AI method here and do it in a good way – you can demonstrate its performance, you can communicate this in a good way to the end users, and you can say, well, I don’t understand exactly how it does it, but it tends to get things right, and we can see historically that the case series it has been pointing to have had a high proportion of what became signals once the humans completed their assessments – then I think maybe we don’t need it.

Federica Santoro:

Another issue that is on everyone’s mind in the pharmacovigilance field is data privacy. Now, healthcare data is, by its very nature, full of highly sensitive information, right? There are patients’ names, ages, genders, medical conditions. And so one of the most widespread concerns when it comes to AI technology is that it could easily infringe patients’ privacy. Now, first of all, I’m curious to understand how those breaches may happen, and then what we can do to protect such sensitive data.

Niklas Norén:

So I think the most basic consideration here is that if you want to develop AI methods using personal data, then in that process you open up the personal data to additional individuals – the data scientists and machine-learning people who will be working with it. So you’re exposing some data, and you need to follow the proper governance and make sure you do this in the right way. But that is, of course, an additional exposure, and we need to do a cost–benefit analysis and ask: is this worthwhile? But then I think there are some specific risks when it comes to data privacy connected to the generative AI methods in particular, and they’re related to prompts, first of all. If you’re including any personal data in a prompt, you have to think about the communication channel. Can you be eavesdropped on? Can somebody intercept the transmission and actually get hold of the data? So you have to think about the communication channels, and then, if it’s somebody else’s model and they’re processing the data for you, you have to look at the agreements to make sure they’re not using that data for their own purposes, because you are transmitting the personal data. And I think the biggest risk there is that people are naive. And this is not just about personal data – it can also be business-critical data. You don’t realise that when you send something in a prompt, you’re sharing that information. So that, I think, is a new risk we need to be mindful of. Theoretically, as well, we have to think about the risk that generative AI models trained on personal data could somehow reproduce that personal data. Could somebody prompt a generative AI model trained on personal data in a way that gets personal data back? I don’t know if this is going to be a problem in practice, but in theory I think it is, now that you have models that don’t just generate predefined categories but actually generate text. So, in theory at least, I think that is a new risk as well.
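
As a toy illustration of the caution about personal data in prompts, the sketch below masks a few obvious identifier patterns before a narrative is included in a prompt to an external model. The patterns and the example text are invented for illustration; real de-identification of case narratives is far harder and would require a properly validated approach.

```python
import re

# Toy redaction sketch: mask a few obvious identifier patterns before a
# narrative is included in a prompt to an external model. The patterns below
# are illustrative only and nowhere near sufficient for real de-identification.
REDACTIONS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "[DATE]"),                  # ISO-style dates
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),           # email addresses
    (re.compile(r"\b(?:\+?\d[\s-]?){7,15}\b"), "[PHONE]"),             # phone-like numbers
    (re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.?\s+[A-Z][a-z]+\b"), "[NAME]"),  # titled names
]

def redact(text: str) -> str:
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

narrative = "Dr. Smith reported that the patient (born 1956-03-14, phone 070-123 45 67) developed a rash."
prompt = f"Summarise the following adverse event narrative:\n{redact(narrative)}"
print(prompt)
```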

Federica Santoro:

Lots to think about. And we’ll get into this a little bit later, but it just strikes me that we all have to get a lot more data-savvy, perhaps, than we are at the moment. Back to that in a little bit. But first, I had a question about bias, which is sort of intrinsic to pharmacovigilance data. Correct me if I’m wrong, but that’s because, I guess, there’s lots of information on adverse drug reactions that never reaches pharmacovigilance professionals because underreporting is so widespread. Then there’s the issue of how certain groups of people will report more than others, so certain subpopulations will be more represented than others in our data. And also that, in general, reporting trends will vary a lot from country to country for a bunch of different reasons. So if there are already these big gaps in data which can severely bias the conclusions we draw, is there a risk AI will make it worse? Could it make it better?

Niklas Norén:

I hope it will make it better. I mean, AI is a tool, and I guess it’s a question of how we use that tool. Of course, if somebody uses AI to create fake reports, it’s going to make things worse. But I hope we’ll use AI to improve things, and there are great opportunities there. I think the idea of co-piloting... So, now if I’m reporting an adverse event, I’m maybe filling in a form, a fixed form. Maybe I could be co-piloted by an AI prompting me for the right information. And depending on what I enter, it could maybe quality-check and say, that sounds like a strange dose. Is that correct? That’s a weird dose unit. Oh, you mentioned this kind of adverse event. In this context, it’s good to know if you had an infection. So I would hope that we could get a lot more high-quality reports from the people who choose to report. And then can we also make sure that we have a more representative sample somehow? I think there are opportunities there as well, especially maybe in terms of embedding it in other information systems and extracting information from there. So I think there are plenty of opportunities, and I hope as a community we’ll generally move for the better, improving things overall.

Federica Santoro:

Absolutely. On to a somewhat philosophical question. When faced with new technology, especially something as revolutionary as artificial intelligence, humans tend to react in one of two extreme ways. Obviously, I'm taking the extremes and not the middle of the spectrum to be a little more provocative, but let's look at the two ends of the spectrum. So, some people can decide to reject it altogether, even demonise the new technology. I don't know what this is, I don't want to learn about it, it's not for me. Others can decide to hype it instead. It's the solution to all their problems. What would you advise fellow pharmacovigilance colleagues who perhaps have taken one or the other stance about the use of AI in PV?

Niklas Norén:

Well, I guess the first question is to understand a little bit the motivation for the stance. So if somebody says, I don’t believe in this, I don’t want to do it, it’s of course good to know why they don’t want to engage with it. If the reason is that they don’t believe it can work, that it cannot bring value, then I would always start with: then it’s up to us to demonstrate that we can get value from it. And for me, a good use case – a success story where you show something that’s hopefully meaningful to that person – that, I think, would be convincing, I would hope. Now, if their reluctance is on a different level – maybe they say, I’m reluctant because I think this is wrong, I don’t think we should have computers do this – then it’s a different argument, right? But if the reason you’re reluctant is that you don’t think it will bring value, then demonstrating how it could bring value is one way forward. And similarly, if you have somebody who’s very enthusiastic and thinks this will solve everything, then I think you just have to show some ways that it can go wrong. And I think this is a responsibility we have as a scientific community, also to demonstrate some of these pitfalls. We do something similar with disproportionality analysis, which we know is being terribly overused in a naive way and for the wrong reasons – we’re now going to try to publish a paper calling out some of the pitfalls and demonstrating by example where you can go wrong. I think we need something similar for AI. I can tell you now that you need to look at the performance results, and that you need to be wary that somebody can trick you with a precision or accuracy figure. But I think we need to demonstrate that by showing what it can look like when it’s presented to you, what that actually means, and why you’d be remiss to just trust that number, let’s say.

Federica Santoro:

Yeah, and obviously one of the issues we have all experienced, if we’ve used generative AI tools, is hallucination: how these models will never tell you that they don’t have an answer, but will basically make up an answer for you, just to provide a response to your query. And lots of examples of that have been published in many different places and many different fields. So I think that’s probably one of the issues we are most aware of, right, as a community. But of course, there are all these other issues we’ve spoken about – transparency and explainability and biases and so on – that we have to be aware of.

Niklas Norén:

And another problem, I think, with generative AI is that superficially it can look really good. This has been my experience. Sometimes I ask it something and think, oh, this looks great. And then I start to actually parse the text and see what it is actually saying. It’s like, well, this doesn’t really make sense. But I was charmed by the...

Federica Santoro:

...tone?

Niklas Norén:

...superficial way it presented it. And I think this is a risk also with explainability. You could ask an LLM, well, explain why this is right – and maybe it’s the wrong sort of output to begin with, but it gives such a convincing explanation for why it’s right that we could get tricked as humans. So I think this is something to be aware of in the future as well: how do we avoid being misled by the more advanced methods?

Federica Santoro:

Yeah. Isn’t it the same with human beings, right? You can listen to someone give a really confident talk and be completely sold, even though if you look at the content of their speech in more detail, you realise, well, actually, no, I don’t agree with this. So I think a problem with AI models is also that they present information in such a confident and often empathic tone, right, that it can convince you. So perhaps we should also look at it from that psychological point of view and see where we can fall into that trap of over-believing what’s being presented to us. But whether you fear it or hype it, one thing is clear: as more automation and artificial intelligence find their way into pharmacovigilance, our traditional professional roles will transform. There’s no doubt about this. I don’t know if you feel the same about your position, but I certainly feel like that about mine. There are already dozens of artificial intelligence tools out there to create podcasts automatically. Just the other day, one of my colleagues told me about another pharmacovigilance podcast that is entirely AI-generated. So, we’re laughing about this, but really the serious question here is: what skills must we learn to remain competitive on the job market?

Niklas Norén:

So, in addition to the need that I mentioned before – this ability to, on some level, critically appraise artificial intelligence solutions and appreciate whether they will bring value or not – I think we generally have a responsibility and a need to engage with the technology and really see where it can bring value to us. As an example, for me, writing scientific papers, I clearly need to see to what extent I can benefit from some of the capabilities of generative AI in that context. And we’ve had some interesting experiences with this in the past year, some where we’ve been, I think, underwhelmed by the capabilities. One example would be asking a generative AI to shorten a text or write it in a different way. Oftentimes it will lose some specific nuance, and I think it’s because we have a very precise way of writing – it matters whether you say adverse event or adverse effect or adverse drug reaction. So often something tends to be lost in translation. I also have this experience I mentioned before, where at first glance it may look quite good, and then, no, actually, it’s not correct. So it didn’t work so well for that yet. And that’s not to say that it couldn’t work – maybe I just don’t know how to prompt it in a good way, or the models haven’t matured there yet. What really baffled me was when we asked it to do something which I thought was much more difficult, and I didn’t think it would do a good job at it. We actually fed it a whole manuscript. It was our new vigiMatch paper, which is out as a preprint already. This was at an earlier stage of development, but it was quite mature and we had senior scientists working on it, and we fed it the whole manuscript and just asked it to look for inconsistencies or weak points. And it came back and said, well, when you describe the features of the new vigiMatch, you do a good job of defining them, but you don’t actually motivate why you selected these or whether you chose not to include some other ones. And it was spot on – it was something I would have expected from good scientists reading it, or maybe from peer review. We got that at that stage and were able to address it before finalising the paper. That is not something I’d expected, but I could see a huge value in an extra pair of eyes on that level.

Federica Santoro:

That’s very interesting, and it really goes to show once again how we just need to play around with it and figure out what the best use cases will be and where it will add value to our workflow. There will be a lot of testing involved, but it’s probably going to be valuable in the long run. One final question, and thank you so much for your time – this has been such an interesting discussion. I’ll wrap up with a question about the guidelines and what’s happening next. The CIOMS guidelines have been out for a month for public consultation, and that period ended just a few days ago. We’re recording this on the 11th of June. So what’s next? When will the guidelines be published, first of all? And what do you and the working group hope will change once they’re out there in the world?

Niklas Norén:

So the next step is that the input gathered so far will be collected, and we have a meeting with the working group at the end of the month where we’ll process that input. I haven’t yet seen how many comments have come in, but based on that we’ll have a better sense of how much work is ahead of us, how much feedback we’ve received, and how much effort is required to address those comments. And that will determine the length of the next phase. I’m hoping we’ll be able to do it in the autumn – with the caveat that I haven’t seen how much feedback has been received – and then we will finalise the report and make it available. My hope for the future is that this will be of value to the community, that we can use it. It’s clearly not something you just take and apply. Each organisation, ours included, will have to think about how to apply it, how to implement it. But I hope it can provide help and guidance for us going forward, so we know how to do this in a good and consistent way, hopefully avoid some of the possible missteps, and get the best possible value. It’s certainly been a great learning experience for me to be part of this working group – so many knowledgeable and wise people coming together to think about this and really push our own thinking on the topic. And, I will say, we’ve continued to do that with the podcast today. I had to reflect again and think about things again. So, really, thank you. It’s my first time here, and it’s been a pleasure to participate.

Federica Santoro:

It's been wonderful to have this conversation. So, thank you so much for taking the time to come to the studio.

Niklas Norén:

Thank you.

Federica Santoro:

That’s all for now, but we’ll be back soon with more conversations on medicines safety. If you’d like to know more about artificial intelligence and pharmacovigilance, check out the episode show notes for useful links. And if you’d like more pharmacovigilance stories, visit our news site Uppsala Reports at uppsalareports.org to stay up to date with news, research, and trends in the field. If you enjoy this podcast, subscribe to it in your favourite player so you won’t miss an episode, and spread the word so other listeners can find us. Uppsala Monitoring Centre is on Facebook, LinkedIn, X, and Bluesky, and we’d love to hear from you. Send us comments or suggestions for the show, or send in questions for our guests next time we open up for that. And visit our website to learn how we promote safer use of medicines and vaccines for everyone everywhere. For Drug Safety Matters, I’m Federica Santoro. I’d like to thank our listener Harika for submitting questions, my colleague Fredrik Brounéus for production support, and of course you for tuning in. Till next time!