Meru Data's Podcast

Simplify for Success - Conversation with Matt McClellan

Priya Keshav


In this episode of Simplifying for Success, host Priya Keshav, CEO of Meru Data, is joined by Matt McClellan, Managing Director at Ankura, to dive deep into the evolving importance of data minimization—a cornerstone of both privacy compliance and AI governance. Together, they explore how organizations can balance the increasing demand for data with regulatory expectations and ethical responsibilities.

From the foundations of retention schedules and business use cases to real-world implementation challenges and the cultural shift required within companies, the conversation offers practical insights, strategies, and lessons learned. Whether you're working in privacy, compliance, or AI, this episode offers guidance on how to align your data practices with both business value and regulatory rigor.

This episode is a must-listen for privacy professionals, data leaders, and anyone involved in the use of AI in business. Approved by the IAPP (International Association of Privacy Professionals) for Continuing Privacy Education (CPE) credits.

 Tune in to learn how to simplify data minimization for AI in your business—safely, ethically, and effectively. 

00:00 – Priya Keshav 

Hello everyone. Welcome to our podcast around simplifying for success. We've invited a few colleagues in the privacy and AI governance space to share strategies and approaches for simplification. I'm your host, Priya Keshav. I'm the CEO and founder of Meru Data. At Meru Data, we work with companies to operationalize their privacy and AI governance programs. Today, we'll be talking to Matt McClellan. He has a lot of experience and tips to share with us today around data minimization. Hi, Matt. Thank you for joining us today. (Matt: Hi, Priya.) Would you like to tell us a little bit about you and your organization?

 

00:44 - Matt McClellan

Sure. So, my name is Matt McClellan. I'm the practice lead and managing director for the information governance practice here at Ankura. We're a multinational consulting firm that works across all different industries and disciplines. And, you know, for myself, I have spent the last 20-ish years in the data and information space. A lot of that time I spent in-house, running programs at places like Blue Cross Blue Shield of North Carolina and MetLife, being two of the larger places that I spent time. But over the last three years, I've been at Ankura helping build and lead our IG practice to help organizations build out the foundational components of an IG program and then operationalize that for data minimization at scale.

 

01:45  - Priya Keshav

So, as I mentioned, we're here to talk about data minimization. The concept of data minimization has been around for a long time. It is one of the key principles of privacy, but with AI and now the need for vast amounts of data for training and to feed AI algorithms, this topic is once again at the forefront, and for privacy reasons too. And we'll probably talk a little bit about that as well. But data minimization at its core has two parts to it. One is collecting as little as needed or necessary. And the other part of it is to retain personal information only as long as necessary to accomplish the purpose for which it was collected, and then disposing of it after that. But why is data minimization an important topic now, in your mind?

 

 

02:47 - Matt McClellan

I think it's critical for any organization that wants to pursue the hot topics of AI or analytics, or just being a good steward of information from a privacy perspective. You really can't adequately do your job from a privacy perspective, or have good, clean analytics and useful artificial intelligence, if you don't have good, clean data. And so being able to minimize the data that is either outside of retention or has limited value allows your organization to be in a position to utilize the right data, at the right time, in the right way, to build out their AI, their analytics, or to fulfill their privacy concerns.

 

03:45 - Priya Keshav 

Let’s dive a little bit deeper on the privacy side, right? So, the data minimization rules were obviously first introduced with California and have pretty much been part of every state privacy law. And it's of course an important principle in GDPR as well. But California went further at that point in saying that, you know, data should not be retained for purposes beyond what is reasonable, right? And the idea was to sort of look at the reasonable expectations of the customer. And now, of course, Maryland took it further to talk about what is strictly necessary, which limits the collection and use of personal data to what is necessary to provide and maintain the requested product or service, which is a lot trickier in terms of how long you keep it. And there's been a lot of debate in the privacy community in terms of whether, you know, that is too much, or whether "reasonable" is too little, because it sort of prompts companies to say it's reasonable to keep the data for many purposes that are maybe not quite closely aligned with the expectations of the customer. So, it's been a debate, and it's one of the hottest topics in terms of where everybody lands. And obviously you can see that, you know, most of the other states have sort of tried to introduce a Maryland-type restriction in their privacy law. And it's widely debated, and it's sort of not really gone there yet. But, you know, it's a question that has prompted us to think about data minimization: what is right in terms of expectations, and how do you build best practices for data minimization on the privacy side?
So that's just from the legal and legislative perspective, but obviously when it comes to implementation, you still see people struggling to make data minimization effective within companies. What are your experiences when it comes to data minimization and implementation from a privacy perspective?

 

06:20 - Matt McClellan

One of the hardest things to do is to be able to connect privacy with your information governance perspective, the retention perspective. And for data minimization, the way to know what's reasonable, and to know as best you can what the customer's expectation is, is to provide a business purpose as to why you need to keep or manage this information, whatever it is. Even past privacy, you need to be able to say, I'm keeping this type of information, which may have no privacy elements to it, for a certain period of time because of this business reason, and extend that through the rest of the information in your organization. And we typically do that through a connection of the data inventory to a retention schedule, and tying a business purpose to those. If you can define the reason you're keeping it, you can have a better understanding of how that information is used day to day by your organization and be able to say to a regulator or an opposing counsel, whoever: this is why we do this. This is the way that we manage our business. We're doing this for the right reasons in the right way. And we've done the due diligence to explain why we have this information for this period of time. And so the business purpose requires a bit of a heavy lift, to talk to business stakeholders and to get their perspective and to refine that in a way that is consistent across the organization. But it is a necessary effort that puts you in a better place to comply with a host of different requirements, whether that's privacy, retention, e-discovery, any of those kinds of data-centric efforts.
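[Editor's sketch, not part of the conversation] The connection Matt describes, a data inventory tied to a retention schedule with a business purpose attached, could be modeled roughly as below. This is a minimal illustration only: the class names, fields, categories, and retention periods are all hypothetical and are not taken from any Ankura or Meru Data tooling.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RetentionRule:
    """One row of a hypothetical retention schedule."""
    category: str
    retention_years: int
    business_purpose: str
    citation: Optional[str] = None  # None when retention is driven by business need only


@dataclass(frozen=True)
class DataAsset:
    """One entry of a hypothetical data inventory."""
    name: str
    category: str
    contains_personal_info: bool


def explain_retention(asset: DataAsset, schedule: Dict[str, RetentionRule]) -> str:
    """Return a plain statement of why, and for how long, an asset is kept."""
    rule = schedule.get(asset.category)
    if rule is None:
        # Unmapped assets are exactly the gap a business-purpose review should close.
        return f"{asset.name}: no retention rule mapped yet"
    basis = rule.citation or "business need"
    return (
        f"{asset.name}: keep {rule.retention_years} years "
        f"for {rule.business_purpose} (basis: {basis})"
    )


# Illustrative, made-up schedule and inventory entries.
schedule = {
    "customer_support": RetentionRule(
        "customer_support", 3, "resolving repeat support issues"
    ),
}
tickets = DataAsset("support_tickets", "customer_support", contains_personal_info=True)
clickstream = DataAsset("web_clickstream", "web_analytics", contains_personal_info=True)
```

Calling `explain_retention(tickets, schedule)` yields the kind of one-line justification that could be shown to a regulator or opposing counsel, while `clickstream` surfaces as an asset that still needs a documented purpose.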

 

08:20 - Priya Keshav 

Maybe we can spend a little bit of time a little later on this whole retention schedule, and building the retention schedule the old-fashioned way, you know, for retention purposes, where the goal was to not delete the data too early, so it wouldn't be something that would be subject to spoliation issues, right? Versus now, you're trying to keep it for as limited an amount of time as possible. And the idea of trying to look at what is a record, and making sure the records are retained, versus the personal information, and the conflicts between the two. But before we go there, on the bigger topic of just data minimization, should we jump a little bit into the AI side? You mentioned that you need to talk to businesses and do the heavy lifting to talk about the business purpose. So as you have these conversations, do they have this underlying thought that data minimization is in conflict with business objectives? What are your thoughts around that? Do you see a lot of pushback from the businesses generally around data minimization or the concept of deletion? And if so, what are some things that help overcome some of those issues?

 

09:51 - Matt McClellan

Well, I think the pushback is a matter of the maturity of the organization. So, with an organization that's never had to manage their information in any real way and has kept everything forever, there is typically an initial pushback. They think they need to keep it. It's the right thing to do. It's the thing that makes them feel the most comfortable. There's a bit of an ownership component that people feel, especially around the information that they personally maintain, whether that's something as simple as email, all the way up to the stuff that they manage for the organization on a larger basis. But what we find is that when we're able to communicate to the stakeholders within the business why we're doing this, that this is not an effort to undermine their autonomy, it's not an effort to remove necessary information from them, but it's an effort to control the sprawl of information and to protect customers, to protect the organization, and to honestly make their lives somewhat easier, then we come to a better understanding. And so there is almost a therapeutic nature to communicating to the different stakeholders, in terms of their maturity and where they are, that this is a good thing and that we're here to help. And we understand the need to keep certain amounts of data, certain types of data, for a period of time to do the job and do it well. But we can't do that at the risk of being non-compliant with privacy laws or any other kind of regulation. And usually there are a couple of times we have to have those conversations with folks that are in the less mature build of their programs. But those that have been doing it for a while understand the need, though they may not have the discipline or the support or the technologies in place to make those goals and understanding a reality. And so I really think it's a maturity component where you'll see the pushback.

 

12:10 - Priya Keshav 

Yeah, I agree. But it also sort of goes with, like you said, having those conversations. I mean, one of the biggest ways to build that maturity within the organization is training and communication. So having that communication component matters, because I often think that people want to develop these retention schedules in a vacuum, right? Which never works out, because yeah, you can build a retention schedule, but if you haven't had conversations with the business and built that comfort level around that retention schedule, then it's not going to go anywhere in terms of implementation, because you have not considered, you know, the feedback that comes from the business. If that makes sense.

 

 

13:01 - Matt McClellan

I couldn't agree more. It is actually very simple to build or to pull a retention schedule off the shelf. There are a lot of tools out there that have most of the laws and citations in the world included in their product. But to get the real understanding and to get the buy-in across the organization requires engagement and communication with the broader business. And so you bring up a really good point, that communications and training for the employees within an organization are critical, to let them know that this is something that they can help with, and that they have a responsibility and a role in it. And that it's not just something that is being thrown at them without their input.

 

13:53 - Priya Keshav 

The other big aspect of retention, again, you know, I think you brought up that you can pull the citations from databases. But at least if you're a privacy professional looking at retention, most of what you're trying to deal with, the data that you're going to be collecting as personal information, the data that is going to be part of analytics, the data that is part of AI, very likely does not fall under any of the citations. I mean, maybe on the employee side of things, but on the customer side of things, depending on the industry again, there might not be that many citations and regulations on how long you keep that data. And so if you just look to the citations, you may miss, you know, large swaths of data because you haven't really figured out how to assign retention to it, and a large portion of that may depend on what the business use cases are, and looking at the business use case. So again, another reason to talk to the businesses.

 

15:07 - Matt McClellan

Yeah. I don't like to call them record types anymore. If you're pulling only information types that are identified by some law from someplace, you are missing the things that are core to a business a lot of times, in that they've created their own sets of information, their own information types that they need to do whatever function it is that they're providing within the organization. So there's a lot of stuff that has retention that is driven by business need, not so much by a citation. And your retention schedule should include that, as well as being able to help those folks that are in privacy understand where those PI elements live within the retention schedule, so that there's a clear mapping. Doing a retention schedule in the traditional way, where it's only citations, it's only these information types that are distinguished by state or federal laws, leaves out a core component of the business, and then also does not help your privacy team in understanding where the risk actually lies within the organization from a PI perspective.

 

16:24 - Priya Keshav 

And one of the things that most AI and data governance teams are looking at is the accuracy and efficiency of their AI models. And so data retention, having good hygiene and the discipline of throwing away or getting rid of things that you don't need, can actually help improve the accuracy of the AI models as well, right? So it's a win-win, even from a business perspective.

 

17:00 - Matt McClellan

Right. In my view, you're going to have better outcomes with whatever you're doing from an analytics perspective and an AI perspective if you have good, clean data. You're also going to be more compliant from a privacy and traditional information governance perspective, and your e-discovery risks are going to be lower. It just makes sense on a lot of different levels to manage your information appropriately and to not keep everything forever. I think one of the things the privacy laws have really done is created an opportunity. Traditionally it was, this is the minimum amount of time to keep whatever piece of information. Now there is a push for a maximum. And so having those opposing forces actually requires organizations to get a good handle on their information landscape and make some decisions, where in the past it was much easier and required less effort to just keep everything forever. Now, especially if you have a lot of exposure to these privacy laws, you're not able to do that. And so I actually think it's a good thing. And we're seeing organizations really take hold and find some real benefit in doing these large-scale deletions. And they're finding some real value in the outcomes.
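[Editor's sketch, not part of the conversation] The "opposing forces" Matt mentions, a traditional minimum retention period and a privacy-driven maximum, amount to a window check on each record. The function below is one hypothetical way to express that; the status labels, parameter names, and the idea of measuring both limits in days from creation are assumptions made for illustration.

```python
from datetime import date, timedelta


def disposition_status(
    created: date,
    today: date,
    min_keep_days: int,
    max_keep_days: int,
    on_legal_hold: bool = False,
) -> str:
    """Classify a record against a minimum and a maximum retention period."""
    if min_keep_days > max_keep_days:
        # The two requirements conflict; this needs a human decision, not code.
        raise ValueError("minimum retention exceeds maximum retention")
    if on_legal_hold:
        return "hold"  # a hold suspends deletion regardless of age
    age_days = (today - created).days
    if age_days < min_keep_days:
        return "must_retain"  # deleting now would violate the minimum
    if age_days <= max_keep_days:
        return "may_delete"  # inside the window: keep only if still needed
    return "overdue_for_deletion"  # past the maximum


# Illustrative usage with made-up periods: keep at least 1 year, at most 3.
created_on = date(2020, 1, 1)
status_early = disposition_status(created_on, created_on + timedelta(days=100), 365, 1095)
status_late = disposition_status(created_on, created_on + timedelta(days=2000), 365, 1095)
```

Here `status_early` evaluates to `"must_retain"` and `status_late` to `"overdue_for_deletion"`; the interesting governance decisions are the ones in the `"may_delete"` window in between.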

 

18:34 - Priya Keshav 

Let's talk about implementation. We've helped organizations and, like you said, people are starting to think about how to update the retention schedule, how to automate and make the retention schedule more, you know, implementable, and look at what technologies make sense, whether it's 100% automated or semi-automated or manual. There's a whole spectrum of things when it comes to implementation. What are some of the approaches that have worked, based on your experience implementing this in the past?

 

19:16 - Matt McClellan

So you're right, it is a spectrum. There's going to be the full spectrum from fully manual to, in some cases, fully automated, though that takes some time to get to. But the approaches that we've seen be successful start with having clear direction from leadership. A lot of times this does not happen without support from folks at the top, whether that's a general counsel or a CIO, CTO, whoever. Chief data officer is another one. That top-down support is very important, because the folks that are in the day-to-day need to know that they have support to do this without fear of repercussions. What we have done is work with organizations to find some of that, and I know it's cliche at this point, but find some of that ROT, the stuff that is outside of retention, outside of litigation holds, outside of any real value to the organization, and start working our way down through that. You can get those, again cliche, quick wins through that. And most large organizations have it. You just have to be able to work through and identify it. There's another approach around scanning, which can help identify some of those potential opportunities. So having that top-down support, and having the goal to get some of the easy stuff out that doesn't require a lot of decisions, those are the ways that we like to go about it. Because there's always gonna be a core set of data that's gonna require a lot of, I'll call it care and feeding, or at least review. And the smaller we can make that set of data, the better off we're gonna be. And some stuff is really simple to understand. We just have to get the approvals to do it. Take legacy information, where there's been a migration and that legacy data is still sitting out there. It's outside of retention. It's outside of any value to the organization, because it's legacy. That's a really good example of just get that stuff out. There's no need or decision really required other than to do it, to get that stuff out.
And we worked with an organization that was able to take four and a half petabytes of data out in one year with that approach. Now, they're a large healthcare organization that had been around for a long time, and so they had opportunities available, but it requires just taking the step forward to get that information out.
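[Editor's sketch, not part of the conversation] The "quick win" criteria Matt lists (outside retention, not under a litigation hold, left behind by a migrated legacy system) can be read as a simple filter over a data inventory. The record fields, dataset names, and sizes below are invented for illustration and do not describe the healthcare organization mentioned above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Dataset:
    """A hypothetical inventory entry for a stored dataset."""
    name: str
    size_tb: float
    retention_expires: date
    on_litigation_hold: bool
    source_system_retired: bool  # True when the data was left behind by a migration


def quick_win_candidates(datasets: Iterable[Dataset], today: date) -> List[Dataset]:
    """Select datasets that need approval to delete but little further analysis."""
    return [
        d
        for d in datasets
        if d.retention_expires < today
        and not d.on_litigation_hold
        and d.source_system_retired
    ]


def total_size_tb(datasets: Iterable[Dataset]) -> float:
    """Sum the storage that deleting the given datasets would free."""
    return sum(d.size_tb for d in datasets)


# Made-up inventory used only to show the filter in action.
inventory = [
    Dataset("old_claims_archive", 120.0, date(2018, 12, 31), False, True),
    Dataset("legacy_crm_export", 30.5, date(2019, 6, 30), False, True),
    Dataset("legacy_hr_files", 12.0, date(2017, 1, 1), True, True),     # on hold
    Dataset("active_billing_db", 80.0, date(2030, 1, 1), False, False),  # still in use
]
candidates = quick_win_candidates(inventory, today=date(2024, 1, 1))
```

In this made-up inventory, two of the four datasets qualify, and `total_size_tb(candidates)` gives the volume to put in front of the approvers. Everything the filter rejects falls into the smaller set that needs the "care and feeding" review.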

 

22:13 - Priya Keshav 

But before you go there or do anything, you know, one needs to establish foundational policies and procedures, because otherwise you couldn't even start. Whether it's fully automated or manual, whether it's piecemeal, where we're trying to go in and take ROT out, or you have a hybrid approach where maybe some parts of it are automated and some parts of it are on a more manual basis. Either way, before you can execute on your retention schedule, you need to have policies and procedures, right? What do you think are some of the fundamental policies and procedures that one needs to implement before they can get started?

 

22:54 - Matt McClellan

Yeah, so we like to think of it from the top down: you have policy, standards, and procedures. And so, you have to have your initial IG policy or records policy, depending on how you want to frame it. That covers, or at least describes, the entirety of the information that you're trying to manage. And then you build out standards around the use and operation of information, and the deletion of information. Again, foundationally, these retention schedules have to be built. And the more jurisdictions you have, the more effort that requires. You want to be able to map the retention schedule to your information assets and be able to rank those in terms of highest value, highest risk. Put in procedures for how you're going to minimize data. What are the steps, and how are you going to build up the audit trail? What's the governance framework for that? What types of approvals does that require? Is it going to be owned by a single group? What steering committees do you set up? All that sort of foundational and procedural information needs to be built and approved, and then you start building out your targets, doing it in kind of sprints, if you will, to do those data minimizations, those data deletions. You're not gonna be able to do this with a flip of the switch across the whole organization. The foundational components of policy, procedure, standards, schedules, and frameworks prepare you for doing those deletions over the course of time. Ideally, you would roll it in time into a business-as-usual, a BAU, component for your organization. So, it becomes just part of what people do day in, day out, along that spectrum from manual to automated.

 

24:54 - Priya Keshav 

What are some of the challenges that you have often seen come up, and, you know, any tips to overcome them?

 

25:04 - Matt McClellan

A lot of times, you know, we've already talked about the issues in terms of pushback from folks that are maybe uncomfortable with the deletions, but there are challenges around edge cases. So, what I mean by that is people will come up with scenarios that may happen so infrequently that it's not really worth the effort to figure out how to manage them. It's much better to manage to the main, if you will, to the main set of information that is used and utilized day in, day out. The edge cases will be a roadblock every time. And so if you get stuck in an edge case, you want to be able to almost put that to the side and let people understand that we'll deal with it when it comes up, but until it is a real issue, we're not going to build for the edge case. Another issue that we see people run into is setting unattainable expectations, meaning that they want to have everything on a consistent retention and deletion process within a couple of months. And in most organizations, the amount of data is too much, the number of systems is too high, and the number of resources available doesn't allow for that. So being realistic in terms of what can be done and communicating that appropriately to leadership is key to make sure that you are seen as successful and following a plan. And those are some of the ones that people don't necessarily think about. Everyone knows that technology is hard. You have got to find the right tool to scan, and the cost of tooling can be somewhat prohibitive. But communicating effectively, being realistic about your capabilities, setting the expectations with leadership, and avoiding being blocked by edge cases are some of the things that people don't typically think about as roadblocks, and they will cause you a huge amount of effort and slow you down considerably in building and operationalizing your program.

 

27:29 - Priya Keshav 

So pretty much all of the biggest challenges that you addressed are not technical, right? But most people kind of focus on this magic button that would just get you started. I think that's a great place to ask: what are your two important closing thoughts around retention, as we talk about implementing data minimization within companies?

 

27:59 - Matt McClellan

Yeah, so I think top of mind is make sure that you engage your stakeholders so that they are part of the solution and not left out of those decisions. And then making sure that you have leadership engagement and leadership support would be my second. And then lastly, be realistic and somewhat forgiving of yourself in terms of the effort and the progress you're gonna make. It is not gonna be a straight line. There's gonna be ups and downs, and that's okay. Be realistic about that, give yourself some grace, and remember that if it was easy, everybody would do it. And so that's why we do what we do: to make this easier for people, while recognizing that it still requires effort.


28:57 - Priya Keshav  

Absolutely. No, I think there can't be a better way to end this episode, right? Which is: data minimization is hard. I mean, to tell you the truth, it's especially hard in this day and age, where we go around saying that you need data for everything you do, and AI is the most important, you know, part of every company's strategy, and AI needs data to thrive and survive. And so as you think about data as your most important asset, it's hard to think about how to get rid of what you don't need, because sometimes that requires very clear thoughts on what you need, which is not often easy, right? So the other big thing is communicating to stakeholders and the importance of that. You know, it's two sides of the coin. You can easily get rid of what you don't need if you can understand what you need. And part of that is going to improve and inform some of your data strategies as well. So I think that's a great place to end, and thank you so much for joining us today. It was a great conversation. (Matt: Thanks, Priya. I think it was a good conversation.) It definitely brought up some of the more important topics. Thanks for bringing them up, and thanks for joining us, and maybe we could do some other topics too. So nice to talk to you.

 

30:28 - Matt McClellan

You know, I'm even trying to get back out there on the speaking circuit, like conferences and stuff. So if there's an opportunity to present together somewhere that you think I'd be useful with you, I'd be all for it. (Priya: Sounds good.)