Meru Data's Podcast
There is tremendous value to simplification. To quote Steve Jobs, “Simple can be harder than complex: You have to work hard to get your thinking clean to make it simple. But it is worth it in the end because once you get there, you can move mountains.” In this series, we explore how people and companies achieve simplification.
Federal Trade Commission Enforcement and Settlement: Real-World Impacts on Company Data Retention Programs
The Federal Trade Commission’s recent settlements and enforcement actions are redefining how organizations handle data consent and retention. In this episode, Priya Keshav, Founder & CEO of Meru Data LLC, and James A. Sherer, Partner at Baker & Hostetler LLP, examine the real‑world implications of these developments for corporate data retention programs.
The conversation highlights what companies need to do to align their retention practices with evolving regulatory expectations while still maintaining operational efficiency. It also outlines practical steps for building sustainable, compliant, and future‑ready data retention frameworks.
Topics covered in this session include:
- Key legal requirements for data retention and data minimization
- Designing effective, cross‑platform data‑retention strategies
- Operational considerations for achieving long‑term, sustainable compliance
Setting The Stage: Data Minimization
Speaker 1: Good afternoon, everyone. Welcome to our webinar on Federal Trade Commission Settlement and Enforcement: Real-World Impacts on Company Data Retention Programs. My name is Priya Keshav, and I'm one of the founders and CEO of Meru Data. Data minimization is an important principle in privacy. While data minimization is simple, easy to understand, and somewhat obvious, it's quite difficult to implement in practice for various reasons. Just a few years ago, businesses were collecting more information than they needed in the hope that it would be useful in the future. Phrases like "data is the new oil" were taken quite literally. Organizations started keeping data for as long as possible because storage costs kept decreasing. Many felt that actively deleting unwanted data was probably not worth the effort. The assumption was that stored data would be useful sometime soon. This resulted in large quantities of unwanted data accumulating in most organizations. As regulators focus on data retention practices, we thought it would be timely to discuss building data retention programs in-house. We have with us today James Sherer. He's a partner in the New York office of Baker & Hostetler LLP and co-leads the emerging technologies program in its Digital Assets and Data Management Group. James, do you want to introduce yourself?
Speaker: Well, I mean, you covered most of it there. I'm a partner with Baker & Hostetler LLP in New York. And yes, I work with our emerging tech team in the Digital Assets and Data Management Group. My co-lead on the emerging tech team is Katherine Lowry, who handles internal AI use and deployment. My focus is on client inquiries and requests relating to technology.
Speaker 1: So James, should we start by defining what retention means and what a retention schedule is, and then maybe talk about how we've traditionally approached retention schedules and retention programs versus how privacy laws look at it?
Records Schedules: Floor Without A Ceiling
Speaker: Sure. Terminology matters, I think, and that's one of the first things I discuss with organizations. Sometimes it's, hey, we have a data governance policy, we have a data retention schedule, we have a records and information management policy, we have a record retention schedule, going through these different labels. Traditionally, a record retention schedule has gone through line-by-line categories: here's a record, here's how long we keep the record, and maybe sometimes the justification or requirement for it. And oftentimes we've seen that as a floor. We keep this record at least this long. There's a regulatory expectation or requirement that we still have it if a regulator asks for it, or a contracting partner, or whoever it might be. But the traditional approach doesn't normally set a ceiling. It doesn't say we keep it at least this long and then immediately get rid of it. It's more that we keep it this long, and if we're like many organizations that struggle with getting rid of information, or information has accumulated or accreted over the course of time and we don't really know what that information is, then more likely than not it's just going to sit there, or it's going to be up on a file share. And we've seen this as an issue that has really grown in volume over time, and I will assert part of that is because traditional records practice dealt with paper. You can see paper, you were forced to interact with it, it's a fire hazard. Which is not to say that accumulation of data now is not a risk of a different sort, but it's unlikely to catch on fire, data centers aside, I suppose.
So people aren't tripping over boxes anymore in the same way, although there is a very healthy subset of paper accumulation out there that's subject to its own issues, and sometimes we help clients remediate that. So you've got this idea of records and then minimum retention. But it extends out in two different ways. One is you've got a lot of data now that doesn't necessarily qualify as a record, nor would you want it to. Again, when records were paper, as they were for, well, I mean, we had cuneiform and then things written in marble and stone. But once we moved to paper, records were paper, and you were intentionally marking things down. Maybe there were notes, maybe even memos to file could become records. Everything was intentional. You were doing something, you were typing something up, you were printing, it was a big deal. So it was a record and it made sense to think of it that way. But when it became cheaper and then essentially free to start sending emails, we began collecting a lot of other information that we wouldn't necessarily qualify as a record. So that began, and you might not have everything oriented on a record retention schedule. And likewise, it became so much easier just to collect and maintain. There are also some ideas within the InfoGov practice that IT budgets began to increase, and people were hesitant to push back because the more information the organization was growing, the bigger the IT budgets, and the more they had to look after. There is this offset too within legal practice. At different points in time, there were questions about why it costs so much to maintain information, when organizations were finally turning around and saying, hey, we collected all this stuff, and now we've got a legal hold in place, and you want us to review everything, and that's just unduly burdensome.
And we had to explain to courts in the first instance why there's a tremendous difference between the cost of an external hard drive for a terabyte or 20 terabytes and the cost of associated cloud storage of the same amount. It's not apples to apples, and the access and availability of that information isn't the same. But we had this idea in our heads, I think much more pronounced during maybe the last 15 years, that's slowly beginning to fade as people do less active NAS storage at home and everything moves to the cloud environment. So, again: records, intentional; data management and governance, the broader picture of how we maintain that information and when we should start thinking about getting rid of it. I think it's become a much more complicated picture. And we haven't even gotten into all the different places where data can be stored.
Email, Non‑Records, And Cloud Sprawl
Speaker 1: Yeah, and I think part of it was that Enron drove this need to make sure that we were not unnecessarily deleting what we needed to keep. And then as e-discovery became expensive, there was definitely a drive to reduce the costs of litigation. And the consensus was that keeping data is not necessarily relevant. So at least those with significant litigation exposure started thinking about defensibly disposing of data as long as they kept what they needed to keep for retention. But now we are in a different era where privacy laws kind of define the practice. And it was always about records and non-records, right? It was about company information that was being kept. Whereas privacy laws started defining personal information, with CCPA in the US and of course GDPR. Data minimization has always been an important principle in privacy laws, but in general, the idea of holding personal information only for as long as reasonably necessary and proportionate to the purpose for which it was collected is kind of a new concept. And of course, it doesn't talk about records, it talks about personal information. So in light of the new requirements, and we'll also talk about security breaches, how does the definition of a retention schedule, which is more a documented policy for keeping different types of data, mostly records, evolve to accommodate privacy?
Speaker: Well, I think I'd still want to be intentional about the terms that we use, at least when we counsel clients. You can have a record retention schedule, but appreciate that that's what it is: records defined, minimum capture or maintenance of that information. That's guidance, and that's how we want to apply it, even as a filter for practice or for operations. And we can take a step back within traditional info gov practice there too, and say that's not to say that we need every copy of the record. There's this idea of a golden record: this is our source of truth, this is the record we're talking about. It gets complicated then. Who maintains it? Are we using a third party to run our payroll? Are they the source of truth for our records? And what does that mean for our record retention? Now, I have seen some organizations begin to migrate toward an approach that's more data governance, one that treats different kinds of information according to category. And then you have to make that distinction. So let's just stick with email. That's one we get questions about a lot. How long should I keep an email? Well, that's like asking about that box in the garage, or boxes in the garage: how long should I keep them? Kind of depends on what's inside. And when's the last time you opened it up? Have you moved that box with you three moves and never opened it up? Maybe it's time to let it go. Does it have the old SCSI cables for your computer, or does it have childhood memories? I don't know. Or some combination thereof. So in fact, we can look at that and say, in the absence of a specific record, right, or our golden record, which might be an email, and that's the tricky part.
Or general requirements, like the exceptions associated with legal or regulatory holds or investigations or whatever it might be; broker-dealer and investment advisor requirements are a subset. Taking away those, if we can do addition by subtraction, then we can say as long as none of these other things are implicated, maybe we're going to treat email as business necessity, or whatever; its utility for business is going to be three years, or a year, or aggressively, 30 days. There are different challenges with some of those as you start to winnow it down. But I think that's the practical way to look at it. And going back to your point about Enron, that did give us the Supreme Court's Arthur Andersen case. It's one of the very few that we have directly on point for record retention, and for this idea that, absent these other obligations, organizations do have the affirmative opportunity to pick and choose how long they keep things for business purposes. But we want to articulate that there is a business purpose rather than, oh, we just keep everything forever. Because we know that's going to create challenges. It doesn't necessarily create challenges in litigation in the first instance, because you do have a bunch of stuff. If you have it when you're asked for it, then you don't have to prove the negative. However, if you have a data security incident and it turns out you have it, and there are questions as to why, or God forbid you didn't tell people you had it, or that you collected it, or your notices say something different, then it opens up a wealth of additional challenges.
So I think we're starting to get there, where we're setting requirements on these technologies and saying, hey, we'll keep it for X period of time. Then it becomes the complication of whether people feel empowered to actively delete things or remediate environments, or even to allow the environments to kind of cleanse themselves: the auto-delete, the hey, at the end of the calendar year, things will roll off, or they'll go into a vault for a little bit and then they'll go away unless people actively select to maintain things.
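The filter sequence described in this answer (holds first, then scheduled records, then a default business period with an option to actively keep something) can be written down as a small decision function. This is a minimal sketch under stated assumptions, not any organization's actual policy: the `Email` fields, the `disposition` function, the three outcome labels, and the one-year default are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical default period. The conversation mentions three years,
# one year, or 30 days as options; one year is an arbitrary pick here.
DEFAULT_EMAIL_RETENTION_DAYS = 365


@dataclass
class Email:
    received: date
    on_legal_hold: bool = False            # legal/regulatory hold or investigation
    is_record: bool = False                # falls under the record retention schedule
    record_min_days: Optional[int] = None  # minimum period (the "floor"), if known
    user_kept: bool = False                # someone actively chose to keep it


def disposition(msg: Email, today: date) -> str:
    """Return 'hold', 'retain', or 'eligible_for_deletion' for one message."""
    # 1. Holds override everything else.
    if msg.on_legal_hold:
        return "hold"
    # 2. Scheduled records are kept at least until their minimum period ends.
    #    An unknown minimum is treated conservatively as "retain".
    if msg.is_record:
        if msg.record_min_days is None:
            return "retain"
        if today < msg.received + timedelta(days=msg.record_min_days):
            return "retain"
    # 3. Anything a person actively selected to keep stays.
    if msg.user_kept:
        return "retain"
    # 4. Otherwise apply the default business-utility period.
    if today >= msg.received + timedelta(days=DEFAULT_EMAIL_RETENTION_DAYS):
        return "eligible_for_deletion"
    return "retain"
```

The ordering is the design choice: exceptions are checked before the default, so the default period only ever applies to what is left after "addition by subtraction."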
Litigation Pressures Vs Privacy Risks
Speaker 1: So, sorry to ponder a little bit more on this, because I think there is a little bit of convergence and a little bit of divergence between the two worlds as I see it. When you look at the world of personal information, you used email as an example, but when you look at analytics or a data lake or things like that, some of it maybe overlaps with what is a record and falls within the record retention schedule, but much of it might not be a business record at all, just data that we have defined as necessary for a business purpose. So you're looking at what the business reason is to keep it. And it probably all falls under the "other" category in a records retention schedule. So you're looking at it from the perspective of how long you need what you have collected. Does that make sense? And so it becomes a supplement to the records retention schedule rather than part of it.
Building Data Governance Beyond Records
Speaker: Well, and that's where I think we're going to see whether organizations move toward a data retention schedule that has a subset in there for records, but then also gives some guidance for other repositories. And I'm seeing that more frequently. Certainly not with every organization. And again, I've got self-selection bias here, depending on who's reaching out to me to talk about benchmarking and an approach. But I do get those questions frequently. How long do I keep email? What are other organizations doing about keeping email? And then you walk through the filters. Do you have records in there? Then obviously this doesn't apply. Do you have legal holds applied? Then our guidance wouldn't apply. Given all of those things, then X approach, and here's where it goes. And if you've got a broader period in mind, we can look to see if that's supported. Because that's part of it too, I think, because now you've got different pressures coming in. In the context of litigation, it cuts both ways, especially if you've got asymmetric litigation, where, let's say, you've got plaintiff's counsel, a named plaintiff, and a proposed class, and it's unlikely they have a lot of information. Well, they want to make sure you have all the information, because the more you've got, the more expensive it is to produce and the more likely it is that that hypothetical smoking-gun document is out there. However, if you've gotten rid of the information, then you can make the play about discovery, right? If you don't have the facts, fight the law. If you don't have the law, fight discovery. Claim proof of the negative: you would have had it, but you got rid of it, and now where is it? You've got to prove that this thing that we imagine and assert never existed. There's a subset within regulatory enforcement that has the same thing, saying, okay, we want to know what happened.
And we want to know everything about it. So the more information you have, the better, so that we can do our investigation. But they're not investigating privacy issues. They're looking at other things that have ostensibly gone wrong. Flip over to privacy enforcement and it's a very different world. We're seeing that from GDPR, and I think beginning to see it on the US side too, which is: something went wrong, and it might have been related to the use of information that you had and shouldn't have. Maybe you shouldn't have collected it, maybe you shouldn't have maintained it for this long. Certainly you shouldn't have maintained it in the way that you maintained it, such that something happened. Now the pressure is, well, other regulators might have wanted you to have all that so they can see what happened. But because you had all of that, you got in trouble. So now not only did the bad thing happen, but the secondary pull from that is: did it happen because you did this thing? Right? You didn't violate and send this information out, but you allowed it to happen because the systems had some compromises and you had the data in there in the first place. And you didn't need that extra copy and weren't even using it. So I think there's this wave, and I'm going to assert that the one thing we're starting to confront now is the use of automated processing and AI technologies. I'm starting to actively work with clients, and I'm going to be working with ARMA next week, going through these things and benchmarking. What does it mean for a record retention schedule or a data governance schedule to accommodate everything associated with AI platforms?
The models themselves, the data going into them, the data that's used to modify, to double-check, to do all of these different things, and thinking of those data sets as being part of the tool as opposed to standing alone, and whether record keeping or data keeping requirements associated with those tools are now going to involve having sufficient information to address questions later on if something goes wrong, but maybe also anonymizing or de-identifying or otherwise limiting the potential security issues associated with those data sets. That's if they've become records in support of this other initiative, as opposed to, to your earlier point, just being a data lake filled with a bunch of information that you kept dumping in because you thought there might be a future potential good use for it.
Speaker 1: No, yeah, I agree with you. And just looking at the FTC settlements in the last year or so, you can see that. If you look at Illuminate, you talked about security failures: they had security failures that led to a large amount of student data being breached. And as part of the order, they said you have to follow a publicly available data retention schedule and establish a time frame for deleting the information that is collected. If you look at the recent GM order, again, it talks about following the retention schedule and publishing the retention schedule so people know when data will get deleted. Marriott and Starwood, again: implement an information security program to help safeguard consumers' personal information, retain personal information only as long as reasonably necessary, establish a link on their website for US customers to request that personal information be deleted, and provide some kind of disclosure on when it will get deleted. We can keep going, but the telehealth firm Cerebral: again, they bring up the idea of having a data retention schedule and deleting most consumer data that is not being used for treatment or payment, unless consumers consent to its retention, and not keeping it for longer than necessary, right?
So it's becoming more and more common. And I was just talking about the FTC enforcements, but if you look at the states too, as the number of privacy enforcements has increased, one thing has been common. And of course we see only the tip of the iceberg in terms of the orders and the actual settlements, but behind the scenes, most of the questions, especially when it comes to deletion and personal information, have also centered around: what kind of deletion practices do you have? How long do you keep it? Do you disclose to consumers that you keep it for certain purposes and for how long, and do you delete it, right? So this is becoming more and more relevant from a privacy standpoint, which is completely at odds with, like you said, the need for us to keep it for litigation and the need for us to keep it to fulfill other obligations. Much of the personal information that gets collected probably would not be a record, but there is some overlap with the information that is being stored. Obviously the most secure thing to do is to get rid of what you don't need, because if you don't have it, then it's not going to get breached. But it's also important to think about the company's legal obligations and the need for the business to keep some portions of the data for various purposes. And the scope of retention and deletion should consider some of these regulatory inquiries as well. So going back to data minimization: why is data minimization or retention difficult to implement?
I mean, people have a records program, they talk about the minimum data to retain, but most organizations struggle with establishing a retention program. So why do you think there are challenges in implementing retention programs?
Auto‑Deletion And Cultural Barriers
FTC Settlements Reshaping Retention
Speaker: Well, because it's difficult to categorize information. It's an extra step, and people are busy enough with their day-to-day. It is a compliance exercise. And unless you can demonstrate how data categorization and management will help people in their day-to-day, I think it's a difficult ask. It's like saying, hey, this cafeteria would be a lot less messy if everybody just pitched in and helped clean up as they go. I mean, that's absolutely true. You probably wouldn't even need janitors if everyone just kept after themselves and pitched in. But we accept that people are doing other things and the data and systems are there in support of their daily work. And unfortunately, for that particular question, sometimes there's just an economy of scale where you step in and say, you know what, it's easier to allow things to accrete over time, and then we can handle it responsibly in one fell swoop. And autoclassification still has not matured. So to the extent that we're talking about affixing metadata that has operational significance so we can apply schedules against it, it's still, to this point, science fiction, although I'll say the barriers to some of this are actually more breachable (I shouldn't use that term in a discussion of all of this) than other things. I do want to clarify one point, by the way. I'm not speaking on behalf of clients, and I'm not going to name any companies on my side. There's plenty of enforcement out there at both the federal and state level, and that's certainly instructive for current and potential clients to see where regulators are looking and what their sensibilities are about how all these practices work.
I think at a practical level, when I'm working with organizations, they want to get rid of information, but it's so difficult to take on that affirmative responsibility internally. It's easy to collect information. I always work with bad metaphors that I haven't fully thought out, so I'll do one here. I'm going to go back to my garage example and the box of cables. And this is the long-running joke. You might have that box of AV and computer cables that you don't know if you're going to need again, but boy, you don't want to lose them if you do need one at some point, because then how are you going to find it? You're going to go to eBay, right? I don't think there are any more Radio Shacks. It's difficult to replicate that connection or to stand that technology back up or to pull that picture off the camera or whatever it is. And then the joke was, I saw someone post, hey guys, I finally bit the bullet and got rid of that box of cables. And wouldn't you know it? A week later, I needed something from that box. So never get rid of it. It's easier in some ways to allow this to collect dust in case you need it. And legal professionals are among the worst of the worst. If you're working in this profession, you might want precedent from 10 years ago. Some of the laws don't change. I'll tell you, a lot of the record retention requirements don't change over time, and just enough of them do that you should think seriously about evergreening practices. But given that, we're academics in this space in large part, and sometimes those organizations are the toughest, right? Research scientists never want to get rid of anything. And then we've seen this evolution into big data where there's a claim, to your earlier point, that this is the new oil.
Well, guess what you don't do with oil? Throw it away before you use it. So why not keep it? Maybe some initiative is going to come in and demonstrate the value of this. Now, IBM and others have done studies that show ROT goes up, the risk goes up, and the viability and usefulness of the data goes way down after, I don't know, what is it, 48 months or something? And if we're behavioral economists, people change. A lot of your consumers don't stay where they are. But I think that's the challenge. It's too easy to keep. The systems have made it easier to keep. And we don't know if there's value, and then organizations collect enough of it that you don't even know why things are out there. You don't necessarily know what's inside. We've come up with some tips and tricks, I think, through e-discovery, because you're kind of forced to deal with it at that point, and you're going to have somebody on the other side evaluate whether you did it correctly and challenge everything, even if it's just for the point of challenging it. So, in those instances, we've seen we can de-NIST, right? We can go through and get rid of system files, and we can check hash values, going back to the golden record example, so we don't need to have every copy. But then the question is, well, before we get rid of this information, is it important for me to know that X person had access to Y file as part of something that might get challenged? So some of those issues are going to percolate in there as well. And I and the team, and similar people at different firms, I think have this whole lattice of evaluative criteria that we would apply and say, okay, have we thought about this, have we thought about this, have we thought about this? Great, we've gone through this process.
Now we're comfortable that instead of opening every box in the garage, you can just load them onto that truck and it can go away.
unknownRight?
SpeakerWe've we've checked none of the cables in that box. You don't have any of the stuff they go to. You're not gonna need it, and it's very unlikely you're gonna buy one of these antiques that would need that cable. You can let it go. But and and if someone else comes back and says later, why don't you have that cable? You can point to the work we did right now. And that can be your proof of the negative. There's no reasonable expectation that we'd need this stuff, therefore, we can let it go. Um but outside of that, organizations just find it so much easier to keep it, just keep it and maintain it, unless and until you have a crisis and you don't want it to go to waste.
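The two e-discovery techniques mentioned in this answer, removing known system files (de-NISTing) and using hash values so that only one copy of a file needs to be kept, can be illustrated with a short sketch. It assumes you already have a set of hashes for known system files (in practice such lists are typically drawn from a published reference set; the `known_system_hashes` argument here is just a placeholder), and the function names `sha256_of` and `triage` are hypothetical. It deliberately does not address the follow-up question raised above about who had access to which copy.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Set


def sha256_of(path: Path) -> str:
    """Hash a file's contents in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def triage(paths: Iterable[Path], known_system_hashes: Set[str]) -> Dict[str, List[Path]]:
    """Sort files into system files, first-seen unique files, and duplicates."""
    result: Dict[str, List[Path]] = {"system": [], "unique": [], "duplicate": []}
    seen: Set[str] = set()
    for path in paths:
        file_hash = sha256_of(path)
        if file_hash in known_system_hashes:
            result["system"].append(path)      # candidate for de-NIST removal
        elif file_hash in seen:
            result["duplicate"].append(path)   # same content as an earlier file
        else:
            seen.add(file_hash)
            result["unique"].append(path)      # first copy encountered
    return result
```

Note that "unique" here only means the first copy encountered in the order given; deciding which copy is the golden record is a policy question the code does not answer.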
Speaker 1So um, you know, on that topic, right, we have traditionally talked a lot about this big bucket approach, um, you know, which is uh R versus granular approach. And sometimes, you know, at least there was a time when, you know, there was this understanding that the more granular you got, the more difficult it is to solve this problem. So it makes more sense to take a more bigger, bigger bucket sort of approach where you define things at a at a department level or you know, broader categories, knowing that that's not nuanced. Um, but sometimes when you take a big bucket approach, it's easy to define, but it's harder to press the button to actually delete the data because you don't have the details. And sometimes that's where, um, like you said, being able to prove that these cables are not useful because you don't have the gadgets that use them in the first place, right? Uh, because it looks like a bunch of wires and I may need it. Like, so that's and it's harder to prove the negative, right? Like the uh so it's very hard to do that. So um have in your uh experience dealing with clients, what have you seen as approaches that sort of help in practically sort of solving this big bucket versus granular approach? I have some thoughts too, but I'd like to hear yours and then maybe make some suggestions that I've seen.
SpeakerUm if they've gotten to the point where they're involving me, then they're they've already committed some resources, which I think is a pretty good sign. And we've got we've got that momentum. Um I I think the the debt snowball approach one uh can work. And and that's I don't know if you're familiar with this as uh like a Dave Ramsey personal finance, and I know that's like laden with politics and everything, but just assume assume here you've got a lot of different debts. A lot of different debts of varying amounts with varying uh rates of interest. And you've got, let's say you've got a big one with $50,000 at 5% interest, and you've got a smaller one at $2,500 with 3% interest, and you get $2,000, or let's say you get $3,000. You get $3,000 for where do you apply that between the two deaths?
Speaker 1Uh probably the big one, but I yeah.
Speaker: Why would you apply it to the big one?
Speaker 1: Well, either way I pay. I think maybe the right approach is to look at the cost of servicing those debts and take on the highest risk first: which one poses the highest risk for me? That would be my calculation, which is maybe the interest, but maybe also other factors that go beyond the interest, like which one is tax deductible, for example. I'm just throwing out variables that may not be in your calculus, but that might matter as you look at it.
Defensible Deletion And Risk Tradeoffs
Speaker: This is InfoGov. This is what gets tricky. But I'll say, in that particular instance, everything else being equal, you absolutely should put the $3,000 toward the $50,000 loan, because that's going to save you money, even though you're still paying on the other, smaller loan. However, we are still human. We still have psychology. It's the behavioral economist view: if you want to know what people are going to do, look at what they do. And the snowball idea here is that if I get rid of that $2,500 and then pay $500 toward the other one, I feel more accomplished than if I knock $50,000 down to $47,000. It's just psychology. It is not rational. We're not rational people. So I try to take some combination of doing the rational thing for the organization while ticking the boxes for those ways in which we still remain human, because the client needs to see progress. Those are their internal initiatives. There are some really interesting things stacked up within InfoGov, to where even some of the technologies and providers incentivize you not to do anything over a given quarter or even a given year. It might actually take three years to demonstrate ROI if you have to invest money into getting rid of things rather than just standing pat. That's known within the industry, and it's a challenge, because nobody wants to hurt their bottom line in a given year if it's bonus criteria, rather than kicking it down the road. The hurt is a little bit different. There have been some projects, and I'm not exaggerating, that would have been solved with kerosene and a match. Have an emergency, have a warehouse fire. That's happened. One client lost X period of email, and it didn't end up mattering. They worked through it, and maybe there are specific instances that become a little bit more difficult.
But I think a lot of the emergencies you wonder about don't happen. Now, that said, within the broker-dealer and investment adviser space, you did have regulators going through and, in the view of one columnist, treating that like an ATM, saying it's impossible to do this right, especially because we had COVID. Everybody went home, people continued doing business, they used all the platforms they could to make it to tomorrow and to the next day, and now we get to look back and judge their behaviors based on requiring people to be in the office with their technologies now. So I think there's a little bit of rose-colored glasses that can lead to some challenges. But in all candor, I personally, and I think this is true for a lot of information governance practitioners, am trying to look at this practically. Say you collected all this up, and you probably don't know a lot of this information. How can I help you build an approach that's going to stand up to scrutiny later on? Because what I want to do is put you in a position where you have to prove the negative, but maybe not too hard. I want you to be in a position where later on someone will come back and say, hey, what happened to all of this? Because if you're in a position where someone asks what happened to all of this, it means you did something and you mitigated some of that risk and cost. Or maybe you transformed the data. I think this is the brave new world of governance, where we're saying there are different constituencies within the organization that might want this or that information that we have. We may be running out of good training data on the outside world as AI slop takes over the internet. So we've got all this information internally. Are we sitting on oil? Are we sitting on gold? And how do we handle that responsibly?
And is there a step in here for anonymization as part of data governance that will obviate, I would argue, the privacy laws, which don't hold true anymore if it's anonymized data? Now, there's some question of data provenance and ownership, which again raises some interesting, brand new questions for a lot of this. But again, it's all part of the data governance discussion. And I think we're figuring it out as we go.
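The debt example earlier in this answer can be worked through with simple arithmetic. The sketch below is an illustration added for readers, not something presented in the conversation. It uses the figures as spoken ($50,000 at 5%, $2,500 at 3%, a $3,000 payment) and assumes simple one-year interest with no fees or minimum payments; the variable and function names are hypothetical.

```python
# Figures come from the conversation; simple one-year interest is an assumption.
BALANCES = {"large_loan": 50_000, "small_loan": 2_500}   # dollars
RATES_PCT = {"large_loan": 5, "small_loan": 3}           # annual interest, percent
PAYMENT = 3_000                                          # one-time payment


def one_year_interest(balances):
    """Interest owed over the next year at simple annual rates."""
    return sum(balances[name] * RATES_PCT[name] / 100 for name in balances)


def open_debts(balances):
    """Number of debts that still carry a balance."""
    return sum(1 for amount in balances.values() if amount > 0)


# Option A (the "rational" choice): put everything toward the higher-rate loan.
rational = dict(BALANCES)
rational["large_loan"] -= PAYMENT

# Option B (debt snowball): clear the small loan, send the remainder to the large one.
snowball = dict(BALANCES)
remainder = PAYMENT - snowball["small_loan"]
snowball["small_loan"] = 0
snowball["large_loan"] -= remainder

for label, plan in (("rational", rational), ("snowball", snowball)):
    print(label, one_year_interest(plan), open_debts(plan))
```

Under these assumptions the snowball plan costs about $50 more in first-year interest but leaves one debt open instead of two, which is the visible progress the speaker describes trading against the purely rational choice.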
Big Bucket Vs Granular Approaches
Speaker 1: No, I agree. And some things that I have seen: obviously we've talked a lot about the defensible deletion approach, right? I've already built this mountain of data and I need to get rid of it, and taking a risk-based approach to it makes sense. But if you approach it from a privacy standpoint and try to implement as you go, the same logic applies too. Obviously Google and Amazon and others have established that data is powerful. The more data you have, the more you're able to use it for business, which is important. But at the same time, the privacy regulators have pushed enough that you start to see momentum toward building technologies, as you look at newer ways to use data for AI and other purposes, where purpose and access are built into the use of data and into how the data moves within the organization. There were a lot of conversations, for example, about data that was just there, where nobody had any controls over who had access to it or how it was being used. And you start to see larger technology companies reining in some of that. You see technical papers coming out where they're exploring the idea of access restrictions, tying access to purposes, and being able to pull data back if need be. Anonymization is another area where there are a lot of technology advancements in terms of how to make data truly anonymous so that they can continue to use it. But there are other, smaller things too. Like you said, paying off smaller debts makes you feel better; that could mean looking at vendor contracts and establishing retention timelines as you define new use requirements with various service providers.
So there are a lot of things that help in chipping away on a regular basis, in terms of paying attention to retention overall, especially of personal information and sensitive personal information, as regulators focus on this. And I think you see a lot of that happening, which maybe doesn't get rid of the mountains, but hopefully at least reduces the accumulation of data in the future. As you said, that would be important from a data governance perspective, because if you don't have data governance, it's going to be much harder to use the data for AI. And as companies rely more on data for training, they need to be able to do that. If you don't have consent, and you have to delete that data in the future, it becomes much harder. It's much easier to know upfront that you have clean data that is not going to be subject to enforcement before you start training and making that investment from a company standpoint. So there is a business advantage to cleaning up data as well, right? I don't know what your thoughts are, but yeah.
Speaker: Well, no, I mean, absolutely. And we see some of those use cases, although again, are you using the right data to make decisions? I'm still stuck, I guess, on bad metaphors slash examples to make it easier to comprehend. So what about information with intrinsic value that we might want to have on a near permanent basis, but which can then complicate things? The example is: I'm a construction company and I build this complex. Think of the floor plans and the plot diagrams and all of those things. I need to know where the underground storage tank is, right? And I can find that out various ways. I can go and check the source of truth to see where we put it, or I can go dig it up, or someone else can find it accidentally. We're limited in that; we can do expensive soundings or whatever it is, but if we've got the right records, we cut to the chase and we can work around that, or around what's inside the walls or whatever it might be. However, I don't want to have multiple copies or iterations of that and not know which one is right. Hey, where did we settle on this? I've now got two or three different areas. This is a client not to be named, but I remember a client saying, hey, look at this policy and tell us what we need to change here, especially because it's for our internal audit purposes. And I collected up examples, and I had version three, I had version 3.3, I had version four, and it turned out the operative version was 3.3. I don't know how it got forked at some point, and you almost have to be like a forensic analyst to look back at some of this and say, oh no, this is the operative one, this is what was implemented, this drove the decision over here. You can see how tricky that gets.
I think a lot of the initial governance isn't necessarily dealing with those edge cases, which, you know, bad facts lead to bad law. Suddenly I've got the story. It's more that you're trying to deprecate this platform, or you're trying to move to a different environment. What can we do, basically? That's with the understanding and appreciation that the risk of getting challenged later on is probably pretty low. But the risk that your roof is going to light on fire is probably pretty low too, and you still have insurance. We think about that. We're amortizing it across. I think that's why we look at these governance projects and say, okay, well, what is the insurance here? What is the bare minimum if we're not required by law to do this? And maybe that's part of the privacy sensibilities here, which is that we know no one's actively addressing this but for a push. We've seen what's happened already, and unless you actively make money from all of your data, you're just less incentivized to actively manage it, because it's another cost. So how do we get people to pay attention to it outside of when they have to? Can we give credit? Can we give credit for other things you've done in the interim to start to handle these things? Are there commercial pressures? And I deal with the same thing when I'm setting up information governance and AI programs now, where people are like, what do I have to do? And it's like, well, here's a subset, or we've got a legal requirement for what it means to do an assessment or an internal audit or an evaluation of things. But also, this is what we're seeing as a commercial reality right now, if you want to be positioned to get the contract, to get the work. I feel like I saw this on one of these shorts, God bless LinkedIn doing the videos now, but it was someone saying, Hey, we're ready to sign the contract.
Just want to confirm you guys are SOC 2 compliant, right? And maybe this becomes part of the narrative, which is that other organizations don't want to take on the risk associated with these data collections, or with you compiling and maintaining their information, and they start looking more intentionally at what you are doing, what your representations are, what you have been audited against, and what kind of certifications or accreditations you maintain.
Anonymization, AI, And Vendor Terms
Speaker 1: No, yeah, I agree. And like you said, one of the common use cases that you get, and I'm coming back to the data lake example, is this: I have the raw version, I have the cleansed version, and I now have a whole bunch of dashboards and reports built on enriched data that I have maintained. So now I have maybe five, six, seven copies of the data in the same environment. And then if you look at the usage, maybe the granular information is used for six months, but a lot of aggregated information is necessary for 10 years. Having that insight helps you ask whether it's worth the risk of keeping the granular data beyond the six months just to have the aggregates for 10 years, or whether you could actually remove some of the identifying data, which makes the data more usable for longer periods, right? It reduces the scrutiny and it improves security. So as you start looking at business use cases, I think compliance is not the only use case. Once you have removed the identifying data, you don't have to go through some of the compliance scrutiny as you're looking at using the data for AI training. So it saves a lot of time and effort. Before, we used to talk about defensible deletion just from the perspective that it reduces cost, but now you're able to use the data more freely, which makes for a better business case, I guess, from a retention standpoint. It's hard to prove, but it's something that you can maybe take forward as we look at this granular versus big bucket question. Maybe, as you said, you find these nice-to-have use cases that demonstrate to the company that there is value in looking at retention programs beyond just compliance.
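The tiered idea described above (granular records for about six months, aggregated figures for about ten years) can be sketched in a few lines. This is an illustration added for readers under stated assumptions, not a method described in the conversation: the record fields, the 183-day and 3,650-day windows, and the function name `apply_retention` are all hypothetical, and dropping identifiers plus monthly aggregation is only a stand-in for whatever anonymization standard actually applies.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumed windows: roughly six months of granular detail, ten years of aggregates.
GRANULAR_WINDOW = timedelta(days=183)
AGGREGATE_WINDOW = timedelta(days=3650)


def apply_retention(records, today):
    """Split records into retained granular rows and identifier-free monthly totals.

    Records older than the aggregate window are dropped entirely.
    """
    granular = []
    monthly_totals = defaultdict(float)
    for rec in records:
        age = today - rec["event_date"]
        if age <= GRANULAR_WINDOW:
            granular.append(rec)  # still within the granular window
        elif age <= AGGREGATE_WINDOW:
            # Keep only a monthly total; the user identifier is not carried over.
            month = rec["event_date"].strftime("%Y-%m")
            monthly_totals[month] += rec["amount"]
    return granular, dict(monthly_totals)


# Hypothetical sample data for illustration only.
records = [
    {"user_id": "u1", "event_date": date(2025, 5, 20), "amount": 40.0},
    {"user_id": "u2", "event_date": date(2024, 3, 2), "amount": 10.0},
    {"user_id": "u3", "event_date": date(2024, 3, 15), "amount": 5.0},
    {"user_id": "u4", "event_date": date(2010, 1, 1), "amount": 99.0},
]

granular, aggregates = apply_retention(records, today=date(2025, 6, 1))
print(granular)
print(aggregates)
```

In this sample, only the most recent record stays granular, the two March 2024 records collapse into a single monthly total with no user identifier, and the 2010 record falls outside both windows. Whether aggregates like these count as anonymized under a given privacy law is a separate legal question that the sketch does not answer.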
Speaker: Well, and I wonder, I haven't seen this yet, but I think an affirmative defense for some of this would be a program where we don't necessarily delete this information, but we functionally do by anonymizing it. So we've changed the character and nature of it to except it, arguably, from privacy law. Certainly we have that argument within data security incident response. If the information is encrypted in whatever way, then it really hasn't been compromised. Yes, you have a copy of something, but you can't get into it. You have a copy of something, but there's just no identifiable information. And we've seen discussions about that. Data governance extends out because of all the sharing of information within the commercial picture and within the AI context related to data use. And again, on that governance, we worked with one client where part of the representations were that the information being acquired through the service was going to be anonymized. And even though they were doing advanced analytics across it and doing some automated decision making, they asserted they were anonymizing. And, true story, the client there opted to bring in a mathematician to confirm and verify the approach. It sounded too good to be true, but at the end of the day, the mathematician spoke with the engineers and mapped out what the process was, and this is something that affected a pretty large number of people. Given that, we were able to go back to the contracting picture and refine what the representations were. We challenged it and got better language, rather than attorneys kind of making assumptions. I was working with an associate earlier today on an AI policy, and the associate was saying, well, I removed this and this, I'd like to do addition by subtraction. And I said, well, why? Why are you doing that?
Are you doing that because you're not aware of them doing it? Are you doing it because you know that they don't do it? Or are you removing it because you think that they're never going to do it? Those are three very different scenarios, and we have to think about that with a lot of the governance issues too. So it's a complicated picture. But I think that part of why we as IG professionals get engaged, especially on the outside, is that someone will help take the responsibility. I was going to say blame, and I don't quite mean that, but responsibility. The best of these engagements that I've worked on have been some combination of outside counsel to do benchmarking and help guide the legal process, along with implementation on the service provider side, working with a client. We're paid to show up and get this stuff done. I swear it's some combination of the debt snowball, where you're getting some things accomplished and you can show that progress, along with a personal trainer. You show up, you encourage people to show up, people make commitments to doing that, and you do it for some period of time. Hopefully it's like working with a psychiatrist or psychologist where you sign up, except there is an end date where you say, okay, now we're done. We fixed this issue. It's not going to be a lifetime obligation. We've moved through it. Although with some of the other things, it's clear that these relationships are going to be ongoing, because we're not collecting less information, we're not developing less information, and we're not maintaining less as time goes on. And we're seeing downward pressure on storage costs, which only serves to increase how much we're maintaining.
Speaker 1: As AI becomes the norm, if we think that we're collecting and maintaining a lot of data now, that's probably going to explode exponentially with agentic AI, just because of the way we're able to connect and process and generate data. Look at the type of data that cars generate today compared with the amount of data generated by autonomous vehicles. You can just see how the amount of data has increased exponentially. And that's going to happen with pretty much everything. So we're at an inflection point, in that more data is necessary because it provides insights for AI agents to learn and correct, and for us to understand how well things are performing. But at the same time, that means there's just more information about us, and more detailed information about us, which can also increase the risk. So it's about finding the right balance between the usage of data and minimizing data when it's not needed. It's been important before, but it's going to be more and more important that we get this right. So, do you have any other closing thoughts before we end the webinar?
Proving Value: Progress, ROI, And Benchmarks
Speaker: For anybody watching who's not already in contact, or who watches later on, please feel free to reach out and make connections with us on LinkedIn to stay in that loop. Benchmarking is extraordinarily important within information governance. If we do a future discussion, it'd be great to have people come on and ask questions or find out what other organizations are doing. A lot of what I do is try to take logical approaches and say, here's what others are doing, here's what's defensible. A lot of the routes that organizations are taking may not be objectively evident, because so much of it is behind the proverbial firewall. I think that's part of what some of the orders we've seen are trying to encourage, which is more information to data subjects about what organizations are doing. But even there, it's not the full arc of a data governance or even a record retention program. We're seeing some pressures, and it'll be interesting to see if that's something that ends up with its own point of enforcement, which will then be easier to pitch to clients and say, you need to do this, not just because it's best practice, but also because X organization was challenged here. But overall, yes, stay in contact. It's a community, especially on the data governance side. Reach out on LinkedIn, make those connections, and good luck.
Speaker 1: Thank you so much for joining us, James. It's always a pleasure to talk to you. So thank you again.
Speaker: All right, thanks so much for having me.
Unknown: Bye.