mnemonic security podcast

Auditing AI

mnemonic


Duration: 34:05

How do you audit machine learning models, and where do you start on your AI governance journey? 

In this episode, Robby is joined by Gaute Brynildsen, Chief Audit Executive at Gjensidige, one of the leading Nordic insurance groups. Gjensidige has built a mature and tested approach to AI governance, and Gaute shares what they’ve learned along the way.

Gaute explains how they went about auditing their in-house machine learning model trained solely on their own data, before expanding into broader governance across security, policies, roles, training, and risk. 

He also covers where he recommends starting when building AI governance, highlighting the risks of shadow AI and how to monitor it, the importance of cloud competence and the value of an AI risk officer role.

They also discuss the level of automation among organisations in the Nordics, and explore agentic AI and whether it's overhyped or the next real shift.


Speaker 2

From our headquarters in Oslo, Norway, and on behalf of our host, Robby Peralta, welcome to the mnemonic security podcast.

Robby Peralta

Everyone's rushing to deploy it. Nobody's rushing to audit it. You've maybe done some tabletops around ransomware or data leakage. But who's actually looking at your AI? That's the job of internal audit. And today's guest has been doing it for real at Norway's largest insurer. He reports directly to the board. Not the CEO, just the board. Which means he can walk into any room in the company, ask uncomfortable questions, and nobody can tell him to stop. His name is Gaute, and he might be the only person I've spoken to who makes auditing sound like the most interesting job in the building. Gaute Brynildsen, welcome to the podcast.

Gaute Brynildsen

Thank you. Glad to be here.

Robby Peralta

Today I want to pick your brain about auditing AI.

Gaute Brynildsen

Yes.

Robby Peralta

What world of audit do you live in?

Gaute Brynildsen

What I don't like about the word audit is that when most people think about audit, they think about financial audit. Financial audit is more about checking controls specific to the accounts, and it looks backwards. In internal audit, we look at all risk dimensions: basically, how can the company improve? It's one of the most underestimated and interesting professions you can have, where you have every opportunity to change the company, because we have insight into everything that happens, from the board of directors down to the nitty-gritty details: how you run claims, how you do investments, how you handle the bank accounts, how you handle IT, or AI for that matter. We are a function that reports directly to the board of directors. The board of directors hires two people in the company: the CEO and me. The CEO can't direct me. He can have opinions about my work, and I could choose not to care, but that wouldn't be constructive. The best thing about internal audit is that you get to talk to anyone, you get access to all the information, and you learn something new every day. It's almost never the same audit. It's as much about governance, culture and understanding the business as it is about understanding the models and how AI works. I just wanted to say that about internal audit, because most people don't know what internal audit is. We're there to help improve the company.

Robby Peralta

Since you started working in audit, how has the job changed? I imagine you used to sit in a lot of meetings, interviewing people. Maybe it's still like that today; I know you still have to talk to people. But has there been a transformation, or is it still the same?

Gaute Brynildsen

Internal audit is a communication field. It's basically about how you actually create change, and that's about having good communication and a shared conceptual understanding of things with people. It's still a human job. But of course, we use a lot of AI to help us do audit. We made a strategic choice early on, when ChatGPT became popular: what were we going to focus on? Would we use AI to improve ourselves, or would we teach ourselves how to audit AI? We chose the more difficult option, which was how to audit AI, and we went straight into a machine learning model.

Robby Peralta

When you say machine learning model, that's not something from OpenAI or Anthropic. You are auditing your own machine learning model. Can you clarify that a little bit?

Gaute Brynildsen

So basically, it's not a pre-trained model. It's a machine learning algorithm that we train only on our own data, so it doesn't take in any bias from other data that we don't know about. We know what the model itself does. But of course it creates a lot of vectors, and it's immensely complex because we put a lot of data into it. When we did that audit, one of the key things we looked at was what kind of people were involved in deciding to use AI for this kind of task. What competence did the people who chose the model have, and was the model fit for purpose for what we wanted to solve? One thing that can still go wrong is that they might not apply critical thinking to the risks involved in using this kind of data in that model. So we looked at what data they put in. We work with a lot of personal data; we have immense amounts of data from our claims, for example. So we were asking: What kind of data did you put in there? Did you analyze the data first? Did you look for anything that deviated from the normal set that you didn't want to train it on? Did you clean the data? Are all these columns that you put in relevant for what the model is going to do? Does any of it contribute to bias or discrimination that we don't want? Are we allowed to use this data for this purpose? Those are the things we discussed, and they were among the key issues we brought into these discussions. Some government agencies work specifically on discrimination or sustainability, and they made a guide on what to look for when you're assessing bias and discrimination. That guide was really good, so we also used it to structure these discussions.
What is really complex when you build vectors on data is this: let's say we're not allowed to discriminate on gender. Even if you don't put gender into the model, our data would still make it possible to separate out gender. So it's about finding the hidden patterns that the model picks up on and uses to produce output. Not because the model wants to discriminate; it doesn't have its own opinions. But because of the data, it can actually end up discriminating on gender. In Norway, for example, insurers are not allowed to discriminate on gender for car insurance, which is a huge benefit for us men, because we're really bad drivers compared to women, although we like to think differently. The data shows that men have much more expensive claims than women, if we look at the big numbers. Not me, of course. So we have to take away the color of the cars and some other dimensions of the cars to reduce the chance of this kind of discrimination. This audit taught us a lot. Delving deep into complexity like this made us much better equipped for the next audit, which is probably where other audit functions would start. They would probably start with the easier things: How do we comply with the AI Act? How do we create policies, governing documents, and roles and responsibilities around this? That is also important. But auditing this model built up competence that was quite unique and made us more relevant in the next audit we did, the AI governance audit. There, we looked more at the downside risks: Do we have the right security regulations in place? Do we have the right policies in place?
We also looked at the risk matrix, how we would govern it, the inventory system, the roles and responsibilities, the awareness training that should be in place, and all the communication around it. We could have more relevant discussions because we had this experience from before. And of course, we had been using ChatGPT and similar tools, so we were familiar with those as well.
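Gaute's point that a model can recover gender even after the gender column is removed can be screened for before training. The sketch below is a minimal, assumption-laden illustration, not the method Gjensidige used: it assumes tabular records as Python dicts with categorical features, plus a protected attribute kept aside purely for the audit. For each feature it measures how much better than baseline you can guess the protected attribute from that feature alone. The function names, the 0.1 threshold and the example columns are all hypothetical.

```python
from collections import Counter, defaultdict


def proxy_strength(records, feature, protected):
    """How much better than baseline we can guess `protected` from `feature` alone.

    For each value of `feature`, guess the most common protected class among
    records with that value; compare that accuracy with always guessing the
    overall most common class. 0.0 means no signal; larger means a stronger proxy.
    """
    overall = Counter(r[protected] for r in records)
    baseline = overall.most_common(1)[0][1] / len(records)

    by_value = defaultdict(Counter)
    for r in records:
        by_value[r[feature]][r[protected]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(records) - baseline


def flag_proxies(records, features, protected, threshold=0.1):
    """Return features whose proxy strength exceeds `threshold`, strongest first."""
    scores = {f: proxy_strength(records, f, protected) for f in features}
    return sorted((f for f in features if scores[f] > threshold),
                  key=lambda f: scores[f], reverse=True)


# Hypothetical audit extract: gender is kept aside only to test for proxies.
records = [
    {"car_color": "red", "region": "north", "gender": "M"},
    {"car_color": "red", "region": "south", "gender": "M"},
    {"car_color": "blue", "region": "north", "gender": "F"},
    {"car_color": "blue", "region": "south", "gender": "F"},
]
print(flag_proxies(records, ["car_color", "region"], "gender"))  # ['car_color']
```

This is a crude single-feature screen. High-cardinality columns (close to one value per record) will look like strong proxies simply by memorising rows, and real proxies often only emerge from combinations of features, which is why a check like this complements, rather than replaces, the kind of discussion Gaute describes.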

Robby Peralta

Your context here is insurance: setting prices and doing insurance things. You're probably one of the few companies in Norway that has actually built its own models, and that's been the core of your business, I guess. So this is nothing new for you in that sense. Is that correctly understood?

Gaute Brynildsen

That's correct. Where we came from is something called tariffs: a mathematical model where we calculate the risk based on the data we have from earlier claims and other official data. The tariff is a risk model: you put data in, and it tells you what the insurance premium will be. It's just statistics. It doesn't have the same problems as the machine learning models we're using now, because those are so complex. The models we used before, we could fully explain.
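To make the contrast concrete, here is a toy multiplicative tariff of the kind commonly used in non-life pricing: a base premium multiplied by one relativity per rating factor. The factor names, levels and numbers are invented for illustration and are not Gjensidige's tariff. The point is that every factor's effect on the premium can be read off directly, which is what makes such a model fully explainable.

```python
from math import prod

# Hypothetical tariff: base premium and one relativity per rating-factor level.
BASE_PREMIUM = 4000.0
RELATIVITIES = {
    "driver_age": {"18-24": 1.6, "25-64": 1.0, "65+": 1.2},
    "annual_km": {"<10k": 0.9, "10-20k": 1.0, ">20k": 1.25},
    "bonus_class": {"low": 1.3, "mid": 1.0, "high": 0.6},
}


def tariff_premium(policy):
    """Return the premium and the relativity applied for each rating factor."""
    applied = {factor: RELATIVITIES[factor][policy[factor]] for factor in RELATIVITIES}
    return BASE_PREMIUM * prod(applied.values()), applied


premium, applied = tariff_premium(
    {"driver_age": "18-24", "annual_km": ">20k", "bonus_class": "high"}
)
print(round(premium, 2), applied)
```

Because the premium is just a product of visible numbers, a case handler can explain any quote factor by factor. Note that gender is deliberately absent as a rating factor, in line with the Norwegian rule Gaute mentions.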

Robby Peralta

So it's deterministic versus non-deterministic that you're talking about now. The models you've been building and auditing: do those hallucinate? Because that's different from one of these frontier models, correct?

Gaute Brynildsen

Yeah. With the machine learning model we built, the chance of hallucinating is lower because we know what data it was trained on. It doesn't contain any other data, and it only produces numerical output for pricing. And this is something a lot of people miss when they choose AI solutions for different tasks: how do we make them explainable? How do you manage to build a model that you can explain? A key part of our machine learning audit was that we helped make the model more explainable, bringing it closer to what we call the glass box model.

Robby Peralta

Can you explain that? How did you do that?

Gaute Brynildsen

Yeah. In the first phase, what they already had in place was that the model gave output to the customer service department, or whoever was going to talk to the customer, listing a set number of factors that contributed most to the decision. I won't say how many. The dangerous thing about implementing AI in anything that affects the customer is this: if the customer calls back, or something happens with a decision that nobody can explain, you can't have someone sitting there saying, "Oh, it's just AI, I don't know what that machine is doing." That's going to ruin your company.

Robby Peralta

It's going to be on VG.

Gaute Brynildsen

Yeah, yeah. It's not where you want to end up. So we built this structure where it says: these are the factors that contributed most to that decision. There could be 1,000 factors, but it shows the most prominent ones, and quite a few of them, actually. I think that's important so that customers can get an explanation that is understandable. That's one part of it.
Now, with the glass box model, we have a more open machine learning model that has been developed over time. We're going to go back and audit it to understand how much more explainable it is now than it was before. It has gone through multiple versions to improve it. But I think we were key in getting those discussions going and starting that ball rolling. We were recognized by the Financial Supervisory Authority as the only one in insurance at that time that had audited anything AI, which we were quite proud of.
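The practice of handing customer service the handful of factors that contributed most to a decision can be sketched for the simplest possible case: a linear scoring model, where each feature's contribution is its weight times its deviation from a reference (baseline) customer. This is an assumption for illustration only; the conversation doesn't say how their model computes contributions, and for complex models teams typically use attribution methods such as SHAP instead. The weights, features and baseline below are hypothetical.

```python
def top_factors(weights, features, baseline, k=3):
    """Rank features by |weight * (value - baseline)| for a linear score.

    Positive contributions push the score (e.g. the price) up relative to the
    baseline customer; negative ones push it down.
    """
    contributions = {
        name: weights[name] * (features[name] - baseline[name]) for name in weights
    }
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Hypothetical model weights and a hypothetical customer.
weights = {"claims_last_5y": 2.0, "years_insured": -1.5, "vehicle_age": 0.5}
baseline = {"claims_last_5y": 1, "years_insured": 1, "vehicle_age": 1}
customer = {"claims_last_5y": 3, "years_insured": 5, "vehicle_age": 1}

for name, contribution in top_factors(weights, customer, baseline, k=2):
    print(f"{name}: {contribution:+.1f}")
```

Showing the sign alongside the size lets a customer advisor say not just which factors mattered, but whether each one raised or lowered the outcome.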

Robby Peralta

Awesome work. Have you tried to audit one of the foundation models?

Gaute Brynildsen

I would say yes and no. What you can't do: let's say you take a Llama model or any other model you find, GPT4All or Chat with RTX or whatever. You can't really know what it's been trained on, because it's been trained on a lot of data. They may tell you what it's been trained on, but it's still going to be a guess. What you can do, of course, when you train a model on your own data and start using it, is build another model or another system that tests it and asks it questions. You can also limit which prompts can be sent to it, so you won't get things like "Can I make a bomb?" or "What is this person's salary?" You can use templates for the input prompts, and you can also constrain the output to what the model is supposed to do. Let's take an example. If we have a Llama model and I'm training it for the canteen on how to make Indian food, then I can say that the only output allowed is things that contain food ingredients. Nothing else. It only answers within this framework. Then you can start to limit hallucination, or anything else a bad model can do. You can also test it with a very diverse set of questions, asked in a structured way without AI. If you have a good enough set of test questions, you will be able to find problems. For example, if you want to test for discrimination, you can use a fit-for-purpose data set to test whether it becomes racist, makes decisions based on gender, or does other things we don't want. It's easy to create that data, send it, and see whether you get different answers. Try diverse sets.
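The canteen example describes guardrails on both sides of a model: a template and a block list on the way in, and a check on the way out that the answer is actually about food. A minimal sketch, assuming simple keyword rules; the block list, template text, ingredient list and function names are hypothetical, and no model call is included.

```python
import re

# Hypothetical guardrails for a canteen assistant that should only talk about Indian food.
BLOCKED_INPUT = [re.compile(p, re.IGNORECASE) for p in (r"\bbomb\b", r"\bsalary\b")]
PROMPT_TEMPLATE = (
    "You are the canteen's cooking assistant. Only answer questions about "
    "preparing Indian food. If the question is about anything else, refuse.\n"
    "Question: {question}"
)
INGREDIENTS = {"cumin", "turmeric", "onion", "garlic", "ginger", "lentils", "rice", "chili"}


def build_prompt(question):
    """Input side: reject blocked topics, otherwise wrap the question in the template."""
    if any(pattern.search(question) for pattern in BLOCKED_INPUT):
        return None
    return PROMPT_TEMPLATE.format(question=question)


def output_allowed(answer):
    """Output side: only pass answers that mention at least one known ingredient."""
    words = set(re.findall(r"[a-z]+", answer.lower()))
    return bool(words & INGREDIENTS)
```

Keyword rules like these are easy to bypass and are shown only to make the input/output idea concrete; production guardrails usually rely on trained classifiers or a dedicated moderation step.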
The more advanced approach is to build an AI challenger model that challenges the model, works with it, and checks whether it deviates or gives bad output.
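The structured discrimination test described above (send the same input with only one protected attribute changed, then check whether the answers differ) can be sketched as a small counterfactual harness. This assumes the model under test is a Python callable that returns a number, such as a price. The harness, tolerance and toy models are hypothetical. In the challenger-model approach, the test cases would be generated by another model instead of written by hand.

```python
def counterfactual_test(model, cases, attribute, values, tolerance=0.0):
    """Flag cases where changing only `attribute` changes the model's output.

    `model` is any callable that takes a dict of inputs and returns a number.
    """
    failures = []
    for case in cases:
        outputs = {value: model({**case, attribute: value}) for value in values}
        if max(outputs.values()) - min(outputs.values()) > tolerance:
            failures.append((case, outputs))
    return failures


# Two toy pricing models standing in for the system under test.
def fair_model(inputs):
    return 100 + inputs["annual_km"] / 1000


def biased_model(inputs):
    return fair_model(inputs) + (20 if inputs["gender"] == "M" else 0)


cases = [{"annual_km": 8000}, {"annual_km": 25000}]
for model in (fair_model, biased_model):
    print(model.__name__, len(counterfactual_test(model, cases, "gender", ["M", "F"])))
```

A test like this only catches direct use of the attribute. A model that has learned a proxy, such as car color, will pass it, which is why the input data needs checking as well.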

Robby Peralta

You're at the very mature end of the scale. I feel like the vast majority of companies are just getting started with this, and they're wondering where to start on their governance journey. What would you tell them?

Gaute Brynildsen

There are two things you really need to evaluate. One is the employee side of it: shadow AI. You don't want your data sent out to a public model. Probably nothing will happen, but it will still be a compliance breach to send data where you're not supposed to send it, especially from a privacy perspective. You should evaluate some generative models as fast as possible, give them to your employees and make rules. Then you need to create those boundaries and build awareness: What can we use the different models for? Is it okay to use an external model or not? And then you need to monitor it. So what we did was a shadow AI audit. And to keep it simple, because we're mostly talking to security people, right?

Robby Peralta

I'd be very surprised if anybody else was listening to this. But they're welcome. I'm listening to your podcast.

Gaute Brynildsen

What we did was basically ask for the proxy server logs. We got a lot of data out of them, and then we looked for traces of whether people were using NODNL and the most common external models. We didn't look for everything, just the big ones. This was two years ago now, I think. I don't remember exactly; it's a long time ago in my world. Time flies when you're having fun. We looked at how much data was coming in, which is not that interesting, although it showed us usage. But we also looked at how much data was going out. One text prompt is very little data. When people send documents, you see the difference. Then we used it for awareness. We could see which kinds of employees and which departments were using it. We in internal audit are allowed to do these things because internal audit can combine data sources that no one else can, and our department has strict confidentiality requirements. We used it to create awareness with management: we know it's being used, and in those departments you need to talk about what we can use for what purpose. We didn't go after individuals in this audit. So that is one aspect: enable your employees, otherwise they will use something else. The other is the company perspective: evaluate whether you can use AI and whether it would be useful for you. To ask those questions, you need to understand what AI is and what it isn't. All the people who are used to saying no, in legal, compliance, risk and some internal audit functions, who are afraid because they don't understand it: build competence there as well. And one thing I think is also important when you're building this competence is that you also need to build cloud competence.
Because if you don't know what an enterprise boundary is, and the normal cloud structures, what is within a subscription, what is within a personal boundary, and how these cloud structures work, then you're also going to fail to judge what is dangerous and what isn't. If you build that competence, you can start making the right choices. Where will we get the most effect from AI? If you just start randomly, working with AI here and there, there's only a limited set of people who can actually build those AI solutions. So how do you use them to create the most effect for the company? That's a strategic decision. What we did in our company was build a unit that attracts the best competence and does the strategic enabler work: deciding which processes get priority for these resources first, where it's going to have an effect. But when you're mature enough, you also need to enable the small tasks, for example when people build Copilot agents with Copilot Studio, which is a system you can just buy if you have Azure. There are other solutions as well. Those agents can solve small or big business problems. So it's about enabling people to work on this too, given that they have the competence to ask themselves the right questions: Can I use this data for that task? Is it fit for purpose? Is it legal? That's an important one. Can we trust it to make that decision? The way we did that was to build an AI governance system in ServiceNow. When you start asking yourself these questions, it guides you through the typical questions you want to ask when you are handling personal data, and the simple questions you should ask when you are working on processes close to the customer. Those are the things you need to build.
So make governance relevant for your company, for your data, and for the most dangerous things you could do with AI, where you say: this is not legal, or you can't put AI in there unless you talk to these people and we make a group management decision on it, for example. You have to get all these concepts in place to make the right decisions.
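The shadow AI audit Gaute describes (pull proxy logs, look for the big AI services, compare bytes going out, report by department rather than by person) can be sketched as follows. This is a minimal sketch under loud assumptions: the log is a CSV export with host, department and bytes_out columns already joined to a department (real proxy log formats differ), the domain list is an example rather than a complete inventory, and the 100 kB "looks like a document upload" threshold is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Example watchlist of public AI services; not exhaustive.
AI_DOMAINS = ("chatgpt.com", "chat.openai.com", "claude.ai", "gemini.google.com")
UPLOAD_BYTES = 100_000  # hypothetical cut-off: above this, a request looks like a document upload


def is_ai_host(host):
    """True if the host is a watched domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in AI_DOMAINS)


def summarize_by_department(log_csv):
    """Aggregate AI traffic per department (not per person) from a CSV proxy export."""
    summary = defaultdict(lambda: {"requests": 0, "bytes_out": 0, "large_uploads": 0})
    for row in csv.DictReader(io.StringIO(log_csv)):
        if not is_ai_host(row["host"]):
            continue
        stats = summary[row["department"]]
        sent = int(row["bytes_out"])
        stats["requests"] += 1
        stats["bytes_out"] += sent
        if sent >= UPLOAD_BYTES:
            stats["large_uploads"] += 1
    return dict(summary)


sample = """host,department,bytes_out
chatgpt.com,Claims,800
claude.ai,Claims,250000
www.example.com,HR,999999
gemini.google.com,HR,500
"""
print(summarize_by_department(sample))
```

Aggregating per department mirrors the audit's choice not to go after individuals: the output supports an awareness conversation with management rather than an investigation of a person.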

Robby Peralta

If you don't mind me asking, and if you're allowed to share: which teams won those early resources?

Gaute Brynildsen

What we've seen is that they're using it in HR. And now you're thinking, oh, that's the dangerous part. We're not using it in recruiting. That's the big no-no: if you use it to automatically hire people, you're going to fail. That's prohibited. But we use it for the personnel handbook, and it's really efficient. We use it for the customer advisors: they get a tool where they can ask, "This customer is asking this. What are the rules for selling these kinds of products?" We have a lot of rules for all the insurance products we sell, like which questions you should ask, so it's much more efficient for the case handlers to just ask the model. And now we are putting it closer to the core processes. I think we were forward-leaning in some cases, and I'm kind of the impatient auditor, so I want it to go faster. We're only now starting to put it directly into our core insurance processing, especially claims, where it will have an effect. If you put it right into the production systems where most of your people work, you can improve quality. We see that quality improves compared to manual handling, and speed improves too. Then you can use your competence and your best people on the most complex cases. So this is where we are getting traction now. I go to an insurtech conference every year, and in the Nordics we have a high degree of automation compared to the rest of Europe. When I went in earlier years, I thought, "Oh, we are so far ahead. They're still doing manual pricing, they still have branches all over and manual handling for a lot of things." Then I came back a few years ago and saw the big European players putting AI straight into their core processes. These are major players getting real effects from it.
And not just cost-saving or efficiency effects, but quality effects. They can handle so much more data, so they put it into data-heavy solutions. In some claims, you have to deliver a lot of documentation. Just evaluating or classifying those documents is something a machine can do much faster and with much higher quality.

Robby Peralta

So far, the use cases have been generative AI. Have you moved to the more agentic side? You know, they talk about agentic AI, where things actually go and do things by themselves. Have you gotten there yet? And if you haven't, why not?

Gaute Brynildsen

I think we have enough use cases now with machine learning, with AI solutions in an advisory role that enable our employees directly in their professional work, and with automation. On agentic: I think we are building agents, but we will look at that this year to see what they're doing, because I don't know everything the company is doing. We have a lot of analytics people, because insurance is all about analytics, and these people are also adept at building solutions. So we may find some agents that talk to each other, doing workflow and task handling between agents, but we haven't seen that yet. It's on this year's audit plan to look into it. And I see Azure tools coming that can help us. We can look for MCP traffic. MCP, the Model Context Protocol, is a communication protocol used by AI agents. They have a security agent that can automatically detect MCP traffic in your subscription. So these are the things we will use to automate more and find where these things enter our environment.
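The Azure tooling mentioned for detecting MCP traffic isn't shown here. As a vendor-neutral heuristic, and assuming you have access to decrypted HTTP request bodies (for example from a TLS-inspecting proxy), you can flag bodies that look like MCP messages. MCP, the Model Context Protocol, connects AI applications and agents to tools and data sources; its messages are JSON-RPC 2.0, with method names such as initialize, tools/list and tools/call. The function name and prefix list are assumptions of this sketch, and MCP running over local stdio never crosses the network, so it won't show up this way.

```python
import json

# Method names defined by the Model Context Protocol (JSON-RPC 2.0 based).
MCP_EXACT_METHODS = {"initialize"}
MCP_METHOD_PREFIXES = ("tools/", "resources/", "prompts/", "notifications/")


def looks_like_mcp(body):
    """Heuristic: does this request body contain an MCP JSON-RPC message?"""
    try:
        parsed = json.loads(body)
    except (TypeError, ValueError):
        return False
    messages = parsed if isinstance(parsed, list) else [parsed]  # JSON-RPC batches
    for message in messages:
        if not isinstance(message, dict) or message.get("jsonrpc") != "2.0":
            continue
        method = message.get("method")
        if isinstance(method, str) and (
            method in MCP_EXACT_METHODS or method.startswith(MCP_METHOD_PREFIXES)
        ):
            return True
    return False
```

This only inspects requests, since responses carry no method name, and other JSON-RPC applications could reuse these method names. Treat a hit as a lead for the audit, not as proof.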

Robby Peralta

Do you think that whole agentic AI thing is just overhyped, then?

Gaute Brynildsen

No. Not at all. One thing we can do is build narrow AI solutions that handle specific tasks and are just good at that, which also reduces the risk compared to big generic models. If you compare it to microservice architectures, I think it's an opportunity for us to have more control: we can build more structured, narrow solutions with more explainability, functioning as agents that talk to each other based on the tasks they're going to do. So I think that's more of the future in how we should design things, from both an architectural and a risk perspective.
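One way to read the microservices comparison is as an architecture where each agent is registered for exactly one task type and an explicit set of data categories, and an orchestrator refuses anything outside those bounds and logs every call for later audit. The sketch below illustrates that idea under stated assumptions. The class names, data categories and the rule-based "classifier" are hypothetical stand-ins, not a description of Gjensidige's systems or of any agent framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class NarrowAgent:
    """One agent, one task type, an explicit list of data it may receive."""
    name: str
    task_type: str
    allowed_data: frozenset
    handle: Callable[[dict], str]


class Orchestrator:
    """Routes each task to the single agent registered for it and logs the call."""

    def __init__(self, agents):
        self._by_task = {agent.task_type: agent for agent in agents}
        self.audit_log = []

    def dispatch(self, task_type, payload, data_categories):
        agent = self._by_task.get(task_type)
        if agent is None:
            raise LookupError(f"no agent registered for task {task_type!r}")
        not_allowed = set(data_categories) - agent.allowed_data
        if not_allowed:
            raise PermissionError(f"{agent.name} may not receive {sorted(not_allowed)}")
        self.audit_log.append((agent.name, task_type, sorted(data_categories)))
        return agent.handle(payload)


def classify_claim_document(payload):
    # Stand-in for a narrow model: a trivial rule instead of a real classifier.
    return "invoice" if "amount" in payload else "other"


orchestrator = Orchestrator([
    NarrowAgent("doc-classifier", "classify_document",
                frozenset({"claim_documents"}), classify_claim_document),
])
print(orchestrator.dispatch("classify_document", {"amount": 1200}, ["claim_documents"]))
```

Keeping each agent's scope declared up front gives the audit function something concrete to check: which agents exist, what data each may see, and a log of what each actually did.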

Robby Peralta

So, a mature AI governance program. Do you have any tips on where to start and how to operationalize it in practice, based on your two years of doing it?

Gaute Brynildsen

Don't build AI governance on the side. Integrate it with what you've always been doing. Most of us have an SDLC process, a procurement process, a risk management process and a governance structure for how we handle privacy. Integrate it into those structures.

Robby Peralta

Hmm. It makes sense that you own this process in your organization, because the audit function is holy: the board hired you and the CEO, which is crazy. It makes sense, though. What about companies that don't have a really strong compliance or audit function? You're in insurance, so they'll take away your license or whatever if you don't adhere to these things, but most companies don't have that.

Gaute Brynildsen

No, and they don't need that either, but it's going to be helpful on the journey if you have the right people in those functions. We hired a person to be the AI risk officer. He's in the first line, not the second or third line, and he helps guide those decisions. If you build that competence in one or more people who can help ask the right questions when people start working with this, I think that's one of the key enablers you need to have in place. He sits in one of our most sensitive areas, where we use the most data, but he also works across divisions and departments. He can go anywhere because he has a company-wide role. So if you don't have a compliance function or anything like it, hire someone to lead on AI risk, security, and competence in what AI is and what it isn't. That person can help you build the training program and the awareness program, and help you work out how to ask the right questions in your governing documents. Because what I hate about governing documents is that if you don't make them pragmatic and relevant to your business's context, they're just generic, theoretical stuff, and they won't work. You have to work out how to ask the right AI questions for the business you're in. So get someone to work on it, because this is going to take time if you're really going to enable AI in your company.

Robby Peralta

Last question, and it could be a fun one. You still do cyber insurance, correct?

Gaute Brynildsen

We do cyber insurance, yeah.

Robby Peralta

I've read that certain providers are excluding AI systems from their coverage. What's your take on that?

Gaute Brynildsen

I think that's going to mature over time. What they're really afraid of now is that companies are putting large language models into customer-facing services without knowing what they do. For this to mature, insurers will also need to set stricter requirements for what kind of governance you need to have in place. So I don't know. I like to say that cyber insurance makes the world better if you try to buy it. If you try to buy it, you will understand more about what you're expected to do as a company, because there are a lot of requirements in it. If you start thinking about this, things will automatically start to improve, as long as you actually do something about it, because you have to do a lot before you can buy cyber insurance. The same will apply to AI insurance when it matures. The best way to prepare is the things we talked about earlier: getting the right governance in place, asking the right questions, maturing, and working in a structured way. That will help you when you want to buy insurance.

Robby Peralta

It's kind of crazy, right, that there haven't been any big, newsworthy attacks coming from AI systems lately? I thought there'd be more by now.

Gaute Brynildsen

Yeah, there was a recent attack just before the weekend, where they got all the prompts. It's a small, narrow AI solution built for anime and manga, where you create comics or figures. And who knows what people have used it for. If you combine the prompts with the person, you can basically do extortion. But you also have horrible incidents. You have the Dutch case. Whenever I talk about it, I get tears in my eyes. They put AI solutions directly on some of the most vulnerable people in society, people who depend on welfare benefits or payments. They automated decisions to look for fraud and stopped welfare payments. People lost the right to be with their children; people lost their homes. The people who worked in the welfare department could not answer why the payments were stopped. Everything went wrong. It's a terrible case, and it's about a terrible lack of the right competence. And think about what security really means. Security is not only about attacks; it's about creating systems that have integrity and availability as well. I think integrity is key to AI: that you can actually do the right thing with the right data. This was a kind of employee error, where someone put AI on the wrong problem. A lot of things are happening where people put AI on the wrong tasks. You had the Trump administration doing it with veterans, also trying to find fraud, and that model was not built for finding that. It's just horrible. Imagine how many companies are rolling things out because they're stressed that if they're not in the AI game, they're not relevant. And I'll leave one end note, since I'm talking a lot now. If you look at all the SaaS solutions you're buying, all the solutions you have, the cloud solutions, AI is coming in from the side.
You need to know when AI is suddenly doing something with your data, because in the latest update an AI feature suddenly appeared. Right now, if you work in tech and Linux software, you're not relevant if you don't have something AI. That's huge pressure, so there are going to be a lot of mistakes.

Robby Peralta

Well, Gaute, thank you so much for your time today, for sharing your expertise, and for making audit sexy again. I never knew you had so much fun at work.

Gaute Brynildsen

It's a lot of fun. Thank you so much for inviting me. It's a learning experience just to communicate this to others. The honor and pleasure is all mine.

Robby Peralta

Thank you so much, Gaute. Take care.

Gaute Brynildsen

See you. Take care.

Robby Peralta

Well, that's all for today, folks. Thank you for tuning in to the mnemonic security podcast. If you have any concepts or ideas that you'd like us to discuss on future episodes, please feel free to hit me up on LinkedIn or send us a mail at podcast@mnemonic.no. Thank you for listening, and we'll see you next time.