Data Minimization in 2026: What CIOs and CISOs Need to Know About Unstructured Data, AI, and Cyber Resilience

The Amplitude of Tech

Welcome to The Amplitude of Tech podcast, produced by Amplix, a leading technology advisory firm, where we bring the voices of technology thought leaders, subject matter experts, and enterprise IT decision makers to you to talk about today’s transformative technology and how it can create opportunities for increased success.

April 15, 2026 • Amplix

Unstructured data now makes up 85–90% of everything enterprises create, and most of it is sitting unmanaged on your network, quietly inflating costs, widening your attack surface, and undermining your AI initiatives. In this episode, Shawn sits down with Michael Smith, Director of OEM & Technology Alliances at Congruity360, to make the case for data minimization as the defining IT discipline of 2026. They cover why more data is not better for AI (the right data is), how orphaned and redundant data becomes a weapon for bad actors, what a crawl-walk-run approach to data hygiene actually looks like, and how to bring your CISO, CFO, and board along for the ride.

What You'll Learn:

Why unstructured data has become your enterprise's #3 most valuable asset, and its biggest hidden liability
What ROT data (Redundant, Obsolete, Trivial) is costing you in storage, backup, and cyber insurance premiums
How orphaned and unmanaged data expands your attack surface and becomes an entry point for bad actors
Why feeding AI models more data creates worse outcomes, and how to identify the "right" data instead
A crawl-walk-run approach to data classification and minimization that doesn't require boiling the ocean
How to navigate GDPR, CCPA, HIPAA, and NYDFS without getting paralyzed by compliance complexity
How to frame the data minimization business case for your CISO, CFO, and board

SPEAKER_00 0:12

Hey everyone, welcome to the Amplitude of Tech Podcast. I'm Shawn Corner, Chief Marketing Officer of Amplix. Today I spoke to Michael Smith of Congruity360. I didn't think he could do it, but he made the topic of data and data readiness sexy. If you don't believe me, listen to the podcast and see what you think. I enjoyed this one. I think you will too. All right, Michael Smith, thank you for joining the podcast today. Pleasure's mine. It's a tough name to pronounce. I've been practicing it all morning. I think I got it right. Is that right?

SPEAKER_01 0:47

Yeah, mom and dad pulled out all the stops.

SPEAKER_00 0:49

My middle name is Joe, so perfect. All right. So let's start off with, you know, not a commercial, but if you want to give us the quick 30,000-foot overview of what Congruity360 does.

SPEAKER_01 1:02

Yeah, Congruity360 focuses on unstructured data management. What we're looking at is helping our customers with their number one unstructured data management problems: the issues that come from having large amounts of data without the ability to manage it properly, and the problems that generates from a security perspective, from a data management perspective, from a cost perspective, and from moving it into AI models. That last one happens to be the number one problem organizations face when they're trying to launch AI projects: just having the right data. So we've really focused on a couple of key aspects, putting the right processes in place to match the size of the data and the problems you're having with it. It's not one size fits all, where you run everything through the same process. We have multiple steps, so we match up the right process to manage the data more efficiently.

SPEAKER_00 2:11

It's interesting that we're in a world now where a company like yours needs to exist. If you went back 10 or 20 years, businesses had data, right? But unstructured data was not inherently a problem. It was maybe a mess, and maybe it cost a little extra in infrastructure to host all that data. But here we are, living in a data economy. You could argue that data is most enterprises' most valuable asset. Would you agree?

SPEAKER_01 2:37

You know, it's funny you should say that. I looked at some historical research analysis from 15 years ago. You're right: when they asked CEOs what the most important assets within the organization were, data didn't even make the list. Now you'd see that, right behind human resources and finances, data is number three on the list, or number two, depending on which industry you're in. So it really has moved up in importance. I think people are starting to understand and take note. And certainly as we move into a more AI-centric competitive environment, folks are looking for advantages, looking to take advantage of their data sources and start treating them as assets. So it's fortunate for us and our industry that people are doing that. You're also starting to see it in some of the frameworks out there. NIST is one of the frameworks we follow a lot, and the NIST frameworks for AI and for cybersecurity, for example, have moved data management to the forefront. So it's all good for us, I think.

SPEAKER_00 3:56

What isn't data? I think intuitively everybody knows what data is, but what isn't data? In today's environment we're collecting all this data: all these different inputs, engagement points, and interaction points with the business and its digital footprint. There are interactions happening through the contact center, and contact centers now are not just people on the phone, but chatbots and SMS messages. There are so many different touch points, so many internal processes and workflows, so many human beings, and now AI agents too. So what isn't data? It feels like everything is data, right?

unknown 4:39

Well, yeah.

SPEAKER_01 4:40

I think everything is, from a business perspective. Even the conversation you and I are having is data; we're transmitting data back and forth. When you think about it, human interaction is all about transmitting data to one another. It could be visual, it could be oral, it could be written; all of that is data. But from a data management perspective, there are two main types of data: unstructured data and structured data. Your structured data is your databases and the systems cranking out data that is organized into standard, database-style structures. Unstructured data is everything else. Originally, businesses all focused on the importance of their structured data. Why? Because it was nested heavily in their applications, whether ERP systems or financial systems; all of that ran on structured data. But now, communication and how we interact as human beings have shifted toward the unstructured side. So a lot of information is captured as unstructured data, and that growth has been unbelievable. Unstructured data now accounts for about 85% of the data that enterprises create on a daily basis. So when 85% to 90% of your data is unstructured, where it's stored, how it's stored, and who accesses it all become really important to how a business operates.

SPEAKER_00 6:21

I want to take a slight tangent for a second. Are you familiar with the work of Antonio Damasio? I can't say that I am familiar with Antonio. So he's a neuroscientist who has done groundbreaking work on emotions. His most popular book is called Descartes' Error. Descartes said, "I think, therefore I am," and what Damasio is saying is basically, "I feel, therefore I am." The thesis is that emotions are part of the decision-making process. So even when you think you're making rational, analytical, logical decisions, emotions are neurologically a big part of that process, and you don't know it. It's not emotion in the sense that you're cognizant of a feeling you're having; it's part of what feels like a logical process. But what I love about what you're saying is that you're putting the human at the center of the data conversation, which I think is so important. So many businesses, especially in a B2B environment, don't really think in human terms. What that makes me think about is the data that comes from something like sentiment analysis in a contact center. What are we actually measuring there, and what can we do with that data? I think what you're saying is that there's gold in a lot of this unstructured data, and almost all of the data we have is actually unstructured. So that's a really long, roundabout, probably snobby way of getting to the question: what can we do with all this unstructured data? Is it usable?

SPEAKER_01 8:03

Of course it's usable, but it's just like mining for minerals: how much of that deposit is concentrated in one area, and how much is spread out all over the place? And speaking of human nature, I always say that human beings have part squirrel DNA, because they have a tendency to take data, store it wherever they are, and keep it around in multiple copies. So data sprawl becomes an incredibly important issue when you start talking about managing data efficiently. You need the ability to track data wherever it goes, to catalog it, to understand what your data sources are, and then to winnow it down. It's always about culling the data sets down to what's relevant to the project you're working on and what you're trying to accomplish as a business. What are your business goals? Whether it's putting together AI models or just making your business more efficient, it's about understanding the goals, matching the data sources and data elements to them, and making sure you've got an efficient setup to find that data quickly and easily while still keeping it available to everybody.

SPEAKER_00 9:19

So since we have part squirrel DNA, to carry your analogy a little further: squirrels take their acorns and stash them in all these different places, right? Maybe there's logic to where they put them, maybe there's not. But what we do know is that they don't always go back and get 100% of what they squirrel away. So how do enterprises manage individual people, and maybe broken processes, that don't put data in a usable, findable, or easy place?

SPEAKER_01 9:54

Well, I think you've nailed probably the number one issue out there with data sprawl and uncontrolled data growth, and that's ROT, which stands for redundant, obsolete, and trivial data. The other thing is turnover. Think about a standard enterprise: you have people who work there for a number of years and then leave the company. I'd say 90% of the time, their data becomes unmanaged. It's just sitting there on file shares, in SharePoint, in Box; it's sitting all over the place. Why? Because it's not anybody's problem. That person's no longer here; out of sight, out of mind. Over the years, you end up with huge data sets that are completely unmanaged. And in there you could have some really important information from a business operational standpoint, and also from a governance and risk perspective. A lot of those files include PII, and in some instances IP. So when that data is spread out and unmanaged, it becomes orphaned data, and it doesn't do the business any good. You need the ability to organize that data and assess it quickly, because the sheer volume of data is staggering. And since the problem hasn't been addressed in years, sometimes never in an organization, that culling process can be painful. The best practice to start is putting together a good tool set that attacks the problem efficiently and handles large amounts of data: take the low-hanging fruit, the easy wins you can identify quickly, and move on.
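The orphaned-data problem described above can be approximated from file metadata alone. The sketch below is a minimal illustration, not Congruity360's tooling: it assumes a hypothetical `FileRecord` holding a path, an owner account, and a last-modified time, plus a set of currently active accounts exported from a directory service. It flags files whose owner has left and files untouched for roughly two years (an assumed threshold).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical metadata record for one file on a share."""
    path: str
    owner: str              # owning account name taken from file metadata
    last_modified: datetime


def flag_files(records, active_users, now, stale_after_days=730):
    """Return (orphaned, stale) lists of paths.

    orphaned: the owning account is no longer in the active-user set.
    stale: not modified within stale_after_days (730 days, about two years,
    is an illustrative default, not a recommendation).
    """
    cutoff = now - timedelta(days=stale_after_days)
    orphaned = [r.path for r in records if r.owner not in active_users]
    stale = [r.path for r in records if r.last_modified < cutoff]
    return orphaned, stale


# Example with made-up records.
now = datetime(2026, 4, 15)
records = [
    FileRecord("/share/q1_forecast.xlsx", "alice", datetime(2026, 1, 10)),
    FileRecord("/share/old_plan.docx", "bob", datetime(2019, 6, 1)),
]
orphaned, stale = flag_files(records, active_users={"alice"}, now=now)
print(orphaned)  # ['/share/old_plan.docx']
print(stale)     # ['/share/old_plan.docx']
```

A file that is both orphaned and stale is a natural low-hanging-fruit candidate for review; neither flag on its own means the file is safe to delete.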

SPEAKER_00 11:54

So what is the operational impact on the business of having all this data sprawl? We're kind of alluding to the opportunity cost, right? There's an opportunity cost to having this data and not being able to access it in any systematic way. But there have got to be real costs in terms of storage as a starting point, and even costs in terms of risk. So talk to me about that.

SPEAKER_01 12:21

Yeah. We typically go in and start by doing data assessments for companies, looking at their data from multiple viewpoints. The first one is infrastructure optimization. What goes into that? You certainly have the underlying storage cost and all the costs associated with storage, but you also have backup costs, network costs, and power and cooling: everything that goes into maintaining a data center, or just maintaining a server. The other piece of the puzzle is that typically everybody leaves data on their tier-one storage and keeps backing it up. I've worked in enterprise backup for a while, and when you start analyzing it, most of that data is two, three, four, seven, ten years old. Why are you backing up data that doesn't change? The main reason you back up data is version control: being able to look back at the data and recover it. If you have data that's over two years old and hasn't changed, you're only going to have one version of it; that's just the way backups work. So why are you backing that data up? You're better off moving it to a lower-cost archive tier with built-in high availability. You'll have better protection of the data, it's off your tier-one storage, and you can call it back if you need it. So no one can complain that you're taking their data away; you're just moving it into a more intelligently managed environment. That's one side of the coin. From a risk perspective, as I mentioned before, let's talk about orphaned data. One of the things that happens during cyberattacks is that criminals look for unmanaged data sources. Why?
They can go in, adopt the SIDs (security identifiers) on that orphaned data, and mask themselves, then go find other sources of data within the organization. Sort of like a mole: once they get inside, they use that as an identification as they move through the system. So identifying orphaned data is certainly key from a security perspective. Another thing we point out is that if you're leaving this data on your tier-one storage, it's on your main network, and it usually has open read, write, and access permissions. So it's available, it's sitting there, and it's vulnerable. Oftentimes this data includes PII and all kinds of information. We find data sets with a list of 20,000 Social Security numbers in one file, right? None of that belongs in your production environment. It belongs in a protected environment: pull it off to the side, where there's no access point to it, until you can manage it more efficiently. So just by taking a basic best-practices approach to data hygiene and moving that data off your production environment, you shrink your attack surface. And when you talk to cyber analysts and to the folks who insure companies against cyberattacks, attack surface is a big factor in what your insurance discount rates are going to be. So reducing that attack surface, putting the data into an air-gapped environment where access is completely limited, is going to reduce your cyberattack profile dramatically.
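The 20,000-Social-Security-numbers example suggests a simple first-pass check. The sketch below assumes files have already been extracted to text and uses a regular expression for the dashed `AAA-GG-SSSS` format, discarding numbers the SSA never issues (area 000, 666, or 900 to 999; group 00; serial 0000). The `threshold` for calling a file high-risk is a hypothetical parameter. Pattern matching like this misses unformatted numbers and produces false positives, so treat it as triage rather than classification.

```python
import re

# Dashed SSN format only; undashed nine-digit numbers are not detected.
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def looks_like_ssn(area, group, serial):
    """Reject number ranges that are never issued as SSNs."""
    a = int(area)
    return a != 0 and a != 666 and a < 900 and group != "00" and serial != "0000"


def count_ssns(text):
    """Count plausible SSN-formatted strings in extracted text."""
    return sum(1 for m in SSN_PATTERN.finditer(text) if looks_like_ssn(*m.groups()))


def is_high_risk(text, threshold=10):
    """Hypothetical rule: flag a file once it holds `threshold` or more SSNs."""
    return count_ssns(text) >= threshold


sample = "emp 123-45-6789, emp 234-56-7890, ref 000-12-3456"
print(count_ssns(sample))    # 2 (the 000 area is rejected)
print(is_high_risk(sample))  # False
```

Files that cross the threshold are the ones worth moving off the production network first.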

SPEAKER_00 16:11

Isn't this so interesting? I have not heard anyone talk about data being part of the attack surface. One of the interesting things in my career, from a cybersecurity perspective, has been seeing how the surface area of risk has grown exponentially. We saw it so much during the pandemic, when people left the office. All of a sudden the periphery of the enterprise network exploded, because you had all these individuals going home, and that opened up a whole other factor of risk. You were taking one office that had hundreds or thousands of employees and multiplying that risk by that number. Now we're talking about data, and data is a much bigger set than employees, right? So that surface area and that risk factor must be growing by orders of magnitude.

SPEAKER_01 17:13

Absolutely. You see the research out there on how fast unstructured data is growing; I've seen recent estimates of data tripling over a four-year period. With that kind of growth, you can imagine. And think about all the new applications, the sharing applications. As new applications pop up, each has its own storage area, so each one becomes a silo of information. They all have different protection levels and different access controls, so gaps show up in who can access that information. So it's about identifying early on where critical information is and taking steps to secure it. And if you have multiple copies of the same data, you're just asking for trouble, right? Reduce the number of copies, get it down to the minimum, and you're going to dramatically change your cyber vulnerability profile.
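Reducing the number of copies starts with finding exact duplicates. A common approach, sketched here, is to hash file contents and group paths that share a digest. The in-memory `contents` mapping is an assumption standing in for a real crawl; `file_digest` shows how a file on disk could be hashed in chunks instead. This only catches byte-identical copies, not edited versions of the same document.

```python
import hashlib
from collections import defaultdict


def find_duplicates(contents):
    """Group paths whose bytes are identical.

    contents: dict mapping path -> file bytes.
    Returns a list of sorted path groups with more than one member.
    """
    groups = defaultdict(list)
    for path, data in contents.items():
        groups[hashlib.sha256(data).hexdigest()].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file on disk, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


contents = {
    "/a/report.pdf": b"Q3 results",
    "/b/report_copy.pdf": b"Q3 results",
    "/c/notes.txt": b"misc",
}
print(find_duplicates(contents))  # [['/a/report.pdf', '/b/report_copy.pdf']]
```

Each group is a set of redundant copies; keeping one governed copy and removing the rest is the "get it down to the minimum" step.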

SPEAKER_00 18:26

And not only that, but to your point about the Social Security numbers, you're also increasing the impact a breach could have by having all this data sitting around, accessible to bad actors.

SPEAKER_01 18:40

Absolutely. Early on in the whole concept of cyber resiliency and cyberattack defense, it was all focused on the firewall, right? You're building a wall around the organization and putting up your protections. But you're forgetting the number one thing: it isn't just about getting through your defenses, it's about getting access to the data, because that's where the value is. If you look at what a cyberattack represents, they're going to hold your data hostage; that's their goal. So once you understand that, instead of spending all your money, time, energy, and effort building a great wall around your data, there are more efficient ways: shrink the size of the wall, and really spend the extra dollars on security for that key data area, grouping it off. No system is perfect, but it's just logical to take what's considered the most important data and put it into the highest security levels you have.

SPEAKER_00 20:00

Yeah, and you could build the biggest wall out there, but there's still an open back door: the human beings who work for your company and who are part of your supply chain. That's always going to be your biggest risk factor, because humans make mistakes and humans fall for things. So bad actors are able to get in through human beings. While you do need to focus on your defenses, I think you also have to focus on mitigating an incident. I'd assume you'll agree with me that the more data you have, the more complex your recovery is going to be in the case of, say, ransomware, when you're trying to restore your data to get back in business: the more data you have, the more failure points and the more challenges.

unknown 20:51

Right.

SPEAKER_01 20:52

And let's face it, whether it's backup or restoration, you've got cycles. By defining what's critical information, you can triage what data needs to be restored first: what's the most relevant, most important, highest-value data to the organization. That way you can set up a restoration process so that, if you do come under attack, what gets restored first is the most mission-critical data. With unstructured data, that's more difficult, as I said before. With ERP systems, it's pretty well defined; you know what it is, and that's on the structured data side. Unstructured has been relatively more difficult, but once again it just requires identifying and declaring what's important by classifying that data, tagging it, and making sure you have the systems in place to find it and bring it back as quickly as possible.
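The restore triage described here, bringing back the most mission-critical data first, can be expressed as a sort over classified data sets. This is a minimal sketch under assumed inputs: the tag names and their order in `PRIORITY` are hypothetical, untagged or unknown data goes last, and ordering smaller data sets first within a tier is one possible design choice rather than a rule from the conversation.

```python
from dataclasses import dataclass

# Hypothetical classification tags; a lower number means restore sooner.
PRIORITY = {"mission_critical": 0, "business_important": 1, "unclassified": 2}


@dataclass
class Dataset:
    name: str
    tag: str
    size_gb: float


def restore_order(datasets):
    """Order by classification priority, unknown tags last; within a tier,
    smaller data sets first so more critical data is available sooner."""
    return sorted(
        datasets,
        key=lambda d: (PRIORITY.get(d.tag, len(PRIORITY)), d.size_gb),
    )


plan = restore_order([
    Dataset("marketing-archive", "unclassified", 800),
    Dataset("contracts", "mission_critical", 40),
    Dataset("engineering-share", "business_important", 300),
    Dataset("finance-close", "mission_critical", 15),
])
print([d.name for d in plan])
# ['finance-close', 'contracts', 'engineering-share', 'marketing-archive']
```

The ordering is only as good as the classification behind it, which is why tagging comes before any restore plan.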

SPEAKER_00 21:54

So with everything we're talking about, we're dancing around what I think is the solution, which is data minimization. So what does it mean to minimize your data in this context?

SPEAKER_01 22:03

Yeah, I think you're going to start seeing that. I've heard through the grapevine that data minimization and its benefits are going to be one of the key areas research analysts talk about here in 2026. Having been on the data side of the field for 10-plus years, it's refreshing to hear, because when you've got crazy data growth and most of it is unmanaged, it leads to really bad issues. Minimizing data just requires a different mindset. It requires doing that assessment, understanding the value of the data, and getting rid of the chaff. That's just good data hygiene.

SPEAKER_00 22:51

So we had a little pre-podcast planning session, and you introduced this idea to me. The tension that's been living in my mind ever since is this: you used the term ROT before, and the T in ROT is trivial data. What I'm struggling with a little is that, like we said, 15 years ago we didn't know how important data was going to be. Five years ago, we didn't know AI was going to be as capable as it is today. Use cases are growing and changing every day. We don't know what tomorrow is going to bring, let alone three years from now. So the question is: how do you know data is trivial, versus you just don't know what its future use case will be? You could look at your data and say this data has a clear business use, this has a legal use, this has a compliance requirement, but the rest of it, whose value isn't clear today, may be a future opportunity we're losing by minimizing the data.

SPEAKER_01 23:59

Yeah, you're absolutely right. There's the science portion of it and there's the art of it, right? You mentioned before that the human aspects of things become more important, especially now that we're trying to use AI to mimic what the human mind is doing. But there are certain things you can do to start the process. You're never going to be 100% accurate or perfect, but there are steps you can take to help you manage it. What do I mean? Just by looking at metadata, you're able to make some key decisions about data. When we talk about trivial data, we go through with our customers and ask: what applications are key to your business right now, and what applications do you think will be key in the future? There are certain ones you can agree on. Number one, there are applications that aren't supported anymore. The data is sitting there and you can't even use it, because you don't even have the application anymore. So you can go through and identify data tied to applications that are no longer supported or not relevant to the business, using different pieces of the puzzle. One data scientist I talked to said he starts every one of his projects by looking at the DNA of his data, and that's the metadata. I found that to be a very compelling analogy, because that's what we preach: looking at the metadata for cataloging, understanding, and provenance of the data. You can make certain determinations from that and start the process. The rest requires contextual analysis: actually going in, looking at the data, and identifying what the key things are.
But as you say, that means sitting down and having a conversation with the organization to find out, and agree on, what matters. That's not always the easiest thing. But as I said, you can certainly start the process, and you take big bites out of the data just by doing that cataloging and identification through metadata analysis. So that would be my suggestion as a first step.
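A metadata-only first pass at the "unsupported application" question might map file extensions to applications and total the bytes in each bucket. In this sketch the extension mapping and the `RETIRED_APPS` set are hypothetical examples; each organization would agree on its own list, and anything unmapped is routed to contextual review rather than assumed trivial.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical mapping and retirement list; agree on these per organization.
EXTENSION_TO_APP = {
    ".wpd": "WordPerfect",
    ".123": "Lotus 1-2-3",
    ".docx": "Word",
    ".xlsx": "Excel",
}
RETIRED_APPS = {"WordPerfect", "Lotus 1-2-3"}


def triage(entries):
    """Bucket files using metadata only.

    entries: iterable of (path, size_bytes).
    Returns total bytes per bucket: 'retired_app', 'active_app', 'needs_review'.
    """
    totals = Counter()
    for path, size in entries:
        app = EXTENSION_TO_APP.get(PurePath(path).suffix.lower())
        if app is None:
            bucket = "needs_review"      # unknown type: needs contextual analysis
        elif app in RETIRED_APPS:
            bucket = "retired_app"       # candidate for archive or disposal review
        else:
            bucket = "active_app"
        totals[bucket] += size
    return dict(totals)


print(triage([("/s/memo.WPD", 50_000), ("/s/budget.xlsx", 20_000), ("/s/blob.bin", 5_000)]))
# {'retired_app': 50000, 'active_app': 20000, 'needs_review': 5000}
```

The byte totals give an early sense of how big a "bite" each agreed-upon rule takes before anyone opens a file.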

SPEAKER_00 26:27

I think we touched on this briefly, but if we could spend a little more time on it: the process of minimizing your data has a positive cost implication, right? So can you talk about cost as a forcing function for minimizing your data, or as a benefit and outcome of minimizing it?

SPEAKER_01 26:44

Yeah, it's funny. Oftentimes we'll get pulled into projects by certain stakeholders within the organization. Let's say there's a merger or acquisition coming in. They're looking at the new company and trying to determine what they're going to bring over and what they're going to leave behind. When we go through that discussion with the customer to identify those pieces of the data, the cost savings from doing the cleanup before the data comes in often pay for the entire project. So what we typically do is go in, do a quick analysis, make some suggestions, and based on those suggestions, model out the potential cost savings from an infrastructure perspective, and also from a cybersecurity perspective, if you shrink the attack surface. So there's a good methodology for looking at a project and leveraging it not only for the project itself, but for what it can do financially: cost savings and economies of scale. And what you usually do in those situations is look for the next project. You typically find that when you start a project like that, other folks in the organization say, "Well, can't you do that for my division over here? Can't you do that for my department?" So once we get into an organization and start cleaning up the data, it spreads by word of mouth, and people start taking a look at it.
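The cost modeling mentioned above can start as simple arithmetic. The sketch below estimates annual savings from moving cold data off tier-one storage and out of the backup rotation into an archive tier, as discussed earlier in the episode. Every price is a placeholder input, not a vendor quote, and the model ignores migration effort, network, and power and cooling costs.

```python
def annual_tiering_savings(tb_moved, tier1_per_tb_month, backup_per_tb_month,
                           archive_per_tb_month):
    """Annual savings from moving tb_moved terabytes of cold data from
    tier-one storage plus backup into an archive tier.

    All per-TB monthly prices are hypothetical inputs supplied by the caller.
    """
    current = tb_moved * (tier1_per_tb_month + backup_per_tb_month) * 12
    after = tb_moved * archive_per_tb_month * 12
    return current - after


# Placeholder prices per TB per month, for illustration only.
print(annual_tiering_savings(100, tier1_per_tb_month=20,
                             backup_per_tb_month=10, archive_per_tb_month=4))
# 31200
```

Swapping in an organization's real rates, and the volume an assessment actually identifies, turns this into the kind of savings estimate that can justify the project.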

SPEAKER_00 28:35

I also wanted to ask about outcomes with AI. Intuitively, probably because most of the public has been introduced to AI through large language models, the implication is that more data is better, right? You train the model and it ends up with a better understanding of how humans talk and interact, what they mean, and what their intentions are. But famously, right now we're seeing a pilot-to-scale gap. A lot of people are talking about the challenge of taking a small pilot and actually having it produce an ROI, or even just the intended result, whatever that may have been. And I think we're finding that for the discrete functions AI is being leveraged for, more data is not necessarily better, and to get better outcomes, you may need less data.

SPEAKER_01 29:27

I would argue that more data is not what you want. The right data is what you want. Number one, you want a representative data set, because when you think about it, with more data I start talking about ROT, I start talking about duplicate copies. What does that do to your model? It's going to give you bad results, because guess what? Something is going to be overweighted within the model, because you've got duplicates or you've got data that's outside the range of what you're looking for. So the number one thing is having the right data. Then there's the other issue with large language models and the like: what information am I putting in there? Is there PII? Is there sensitive information? If you put it in a large language model, you've just opened Pandora's box, right? It's now been shared. It's in the model somewhere. You've got to be careful. And as I said, the number one problem: we did a survey, through an organization that surveyed the CIOs and CISOs of 600 enterprises worldwide, and the number one issue they had with their AI implementations was just getting data, the right data, into the model. Then there's the cost involved. There's a reason Nvidia is the most valuable company in the world: people are spending a lot of money on processing data. And if you're wasting it by putting large amounts of bad data in there, shame on you. You've got to know what it is. I think a lot of the data scientists working on projects pretty much get it. It's the folks who are just beginning to dabble in it who are going to stub their toes a lot. So I highly recommend spending a few more bucks on the front end cleansing that data and getting the right data in there. It's going to serve you so much better on the outcomes you get on the back end.
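To make the duplicate-copy point concrete, here is a minimal Python sketch of one cleansing step: removing exact duplicates from a corpus before it is used for training or retrieval, so no document is counted twice. It assumes documents are plain strings and only catches copies that match after lowercasing and collapsing whitespace; near-duplicate detection is a separate, harder problem. The function names are illustrative, not from any product discussed here.

```python
import hashlib
from typing import List, Set


def _fingerprint(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies hash the same
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe_documents(docs: List[str]) -> List[str]:
    """Drop exact duplicates (after normalization), keeping the first copy in original order."""
    seen: Set[str] = set()
    unique: List[str] = []
    for doc in docs:
        fp = _fingerprint(doc)
        if fp in seen:
            continue  # a second copy would count this content twice in training
        seen.add(fp)
        unique.append(doc)
    return unique


if __name__ == "__main__":
    corpus = [
        "Q3 revenue report, final",
        "Q3  revenue report, FINAL",  # same content, different spacing and case
        "Customer onboarding guide",
    ]
    print(dedupe_documents(corpus))  # ['Q3 revenue report, final', 'Customer onboarding guide']
```

Hashing a normalized form keeps memory proportional to the number of unique documents rather than their size, which matters when the corpus is a file share rather than a list in memory.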

SPEAKER_00 31:46

Is this also a hedge against future operating costs for AI initiatives? You mentioned Nvidia. These chips aren't going to last forever, right? They're going to be deprecated in three years or so, or better chips will come out. That's a significant cost that data centers, or enterprises doing this on their own, are going to incur to get to the next chip and the next level of performance, and it's going to be necessary because, as you said before, data growth is exponential. It's spinning out of control. So isn't it fair to say that the less data you're processing now, and the better your discipline about which data you process in the future, the less your costs are going to increase over time?

SPEAKER_01 32:42

Absolutely. Cost, and time to value. If your models run shorter, you're going to see results much faster. So as I said, what you'll spend on cleaning up is minimal compared to what you'll spend on the back end. It makes a lot of sense to spend a little time, get some data cleansing going, and get a best-practices approach to making sure your data is organized. There are other advantages when you get into the finer points: looking at the data and tagging it with classification and context. As you feed it into the model, you're also helping the model by giving it that context. There are all these advantages that come from proper prep, Data 101, just getting the data right.
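A minimal sketch of the tagging idea: each document carries a classification label and a source, that metadata is prepended as a short header when the text is handed to a model, and documents outside an allowed set of classifications are left out. The `Document` type, the label names, and the bracketed header format are all hypothetical choices for illustration, not a standard prompt format.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Document:
    text: str
    classification: str  # hypothetical labels, e.g. "public", "internal", "confidential"
    source: str          # where the document came from (path, system, URL)
    tags: List[str] = field(default_factory=list)


def to_model_input(doc: Document) -> str:
    """Prefix the text with a small metadata header the model can read as context."""
    header = f"[classification: {doc.classification}] [source: {doc.source}]"
    if doc.tags:
        header += f" [tags: {', '.join(sorted(doc.tags))}]"
    return f"{header}\n{doc.text}"


def build_context(docs: List[Document], allowed: Set[str]) -> str:
    """Keep only documents whose classification is allowed, each with its metadata header."""
    return "\n\n".join(to_model_input(d) for d in docs if d.classification in allowed)
```

For example, a public knowledge-base article tagged `["policy", "billing"]` comes out as `[classification: public] [source: kb/refunds.md] [tags: billing, policy]` followed by its text, while a document labeled `confidential` is simply excluded when `allowed={"public", "internal"}`.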

SPEAKER_00 33:33

Now, to drift away from the things the enterprise necessarily has agency over and talk about market conditions, or risks inherent to the market: what are the risks of public model data sharing when it comes to AI?

SPEAKER_01 33:52

Once again, once it's publicly shared, it's available to everybody. If you put personal or sensitive information into a public large language model, that's out there. You have no control over it. It's gone. It's as if you went out to the world and said, here's this data, go for it. So you've got to be careful. And I can tell you, for many years, and I started off on Wall Street, the large players were deathly afraid of going into any AI models, because they knew the amount of sensitive data they had that they just couldn't allow out into a large language model, and that stopped them in their tracks. There are ways of managing that. As I said, it means cleaning up that data: either taking sensitive data out of the pool or making changes to it before you implement. It's well worth it. I guarantee you'll spend a lot less money, have fewer problems, and, certainly from a litigation and liability perspective, dramatically reduce your exposure.
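One way to make changes to data before it reaches a public model is a redaction pass over outgoing text. This Python sketch assumes a tiny set of regex-detectable patterns (U.S. SSNs, email addresses, 16-digit card-style numbers) and replaces them with placeholders; it does not catch names, addresses, or free-text sensitive content, which real PII discovery tools have to handle. `redact_pii` and the placeholder labels are hypothetical.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only: U.S. SSNs, email addresses, and 16-digit card-style numbers
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def redact_pii(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each match with a [LABEL] placeholder; return the new text and per-label counts."""
    counts: Dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


if __name__ == "__main__":
    prompt = "Summarize: Jane (SSN 123-45-6789, jane.doe@example.com) disputed a charge."
    safe_prompt, found = redact_pii(prompt)
    print(safe_prompt)  # Summarize: Jane (SSN [SSN], [EMAIL]) disputed a charge.
    print(found)        # {'SSN': 1, 'EMAIL': 1, 'CARD': 0}
```

Returning the counts as well as the text lets a caller block or log a request when anything was found, instead of silently sending a partially scrubbed prompt. Note that the name "Jane" still goes out, which is exactly the kind of gap pattern matching alone leaves.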

SPEAKER_00 35:09

How can minimizing data reduce the potential for individual employees, or anyone else who accesses that data, to create problems and leak data through these public models?

SPEAKER_01 35:22

Well, there are multiple ways of doing that. One customer I'm thinking of is a good example. For all of the personal data belonging to each of their employees, the employee gets the final say on how that data is handled and managed. We have automated policies that go through it, take a look, and make a suggestion, but the employee signs off on it. They can say yes or no. Then we also cleanse the data afterwards. Truly, it's a matter of making sure everybody becomes a data steward, in that sense. You're trying to build a culture within the organization and put checks and balances and that type of structure in place, and that really goes a long way. If everybody in the organization takes responsibility for their own data, it works its way into the ethos of the organization, and people attack the problems, or deal with them, differently.
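The owner sign-off workflow described above could be modeled roughly like this: a policy produces suggestions, only the data's owner can approve or reject each one, and cleansing runs only on approved items. This is a hypothetical sketch; the class names, fields, and action strings are illustrative and are not taken from any product mentioned in the conversation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    path: str
    owner: str
    action: str   # hypothetical actions such as "delete" or "quarantine"
    reason: str   # why the policy flagged this item
    decision: Decision = Decision.PENDING


def record_decision(s: Suggestion, approver: str, approve: bool) -> None:
    """Only the data's owner may approve or reject a policy suggestion about it."""
    if approver != s.owner:
        raise PermissionError(f"{approver} does not own {s.path}")
    s.decision = Decision.APPROVED if approve else Decision.REJECTED


def ready_to_cleanse(suggestions: List[Suggestion]) -> List[Suggestion]:
    """Cleansing acts only on suggestions the owner explicitly approved; pending ones wait."""
    return [s for s in suggestions if s.decision is Decision.APPROVED]
```

Keeping pending and rejected items out of the cleansing step by default is the design choice that makes the employee's sign-off meaningful rather than advisory.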

SPEAKER_00 36:28

Do you have any tips for how to shape the culture that way, and make good stewardship of the business's data and the clients' data part of the culture? It's often hard to get everyone in the boat rowing in the same direction. I've found with other types of projects and implementations that a lot of it is understanding the why and what the impacts are, and relating it the way you would in sales and marketing. The first thing they teach you there is that your language needs to be oriented around the question the customer or client is asking themselves, which is: what's in it for me? Why do I care? How do you navigate that?

SPEAKER_01 37:07

Well, once again, I think it helps if you can demonstrate it from a financial perspective. Say we go in and start doing data cleansing, we're able to save the company this amount of money, and all of a sudden we get approval for the part of our budget that affects our lives. Repurposing those savings from that cost center to other projects within the organization is a great way, and a great motivator, to build an ethos of: hey, you're going to see some benefits from this. Giving people the ability to be their own personal data stewards also engages them, because you start saying, well, do you want your personal information out there for cyber criminals? You should get rid of this. When you bring it back to the individuals themselves, and how it can affect their lives and not just the organization's, it becomes a much easier conversation, and I think you get more buy-in.

SPEAKER_00 38:16

Yeah. You mentioned NIST before as a framework. I've had conversations with cybersecurity experts, and a lot of them caution that leveraging these frameworks is helpful, it's a tool, but not if you treat it as a check-the-box exercise. So in the context of data management, how would you encourage technology leaders listening to this to use those frameworks as a tool, and where do they need to go above and beyond?

SPEAKER_01 38:48

Yeah, you're absolutely right. We're all into checklists because they make the job easier, right? I've checked that box, I'm good, out of sight, out of mind. I think, once again, it's about tying those initiatives to goals, and part of those goals is that some of the cost savings go back into other projects within the organization that people can benefit from. It's building that ethos where you're reinforcing that this is good for the company, but it's also good for you as a stakeholder within the organization. And it's building it by department, breaking it down into manageable areas, so you can say: if our department is able to show this kind of hygiene and deliver this kind of savings, those savings get shared back toward the projects the department wants to work on.

SPEAKER_00 39:46

There are regulations out there that dictate how businesses can use data and how the people whose data you're accessing are able to control it, right? Specifically, businesses in the US doing business in Europe have to comply with GDPR. Almost everybody is going to have to comply with CCPA. And there are other federal and state-level regulations: HIPAA, NYDFS, all of this. So how does a business manage this whole patchwork of regulations and requirements?

SPEAKER_01 40:28

It's funny. I've looked at this and talked to records managers going back through the years, and it really hasn't changed that much. The regulations are all built around managing things in a thoughtful way. So, once again, if you can show that you've taken steps within the organization to mitigate the risk, you're going to be way ahead of the game, especially from a litigation perspective. If you can show that you followed an internal rule to help manage that data, it's going to reduce your litigation exposure. But you're still going to face fines if, say, personal information is released during a cyber attack and you didn't have any controls in place. So I think it's a matter of adhering to a couple of key things. Number one is classification: setting up classifications and identifying the information that falls into each one. This is personal information, so it's got to be managed as personal information. And they can be big buckets. You're way better off starting with big buckets, putting them in place, and enforcing them than coming up with a 25-page taxonomy where, oh, this falls under this regulation, and this regulation, and this regulation. Manage it at that level first and ask: what is the longest pole in the tent, so to speak? What is the worst-case scenario? Once you identify and manage that data, you've taken your risk profile and made it way more manageable. So I always tell organizations: you've got to start somewhere. Don't try to boil the ocean. Break it down into chunks, and identify the key areas that are going to keep you up at night.
Address those first and then work on the rest afterwards, because it is complex. When you get into all the rules and regulations, if you want to sit down and look at each one in and of itself, it's like going through an ISO certification. It's pretty detailed. But don't stop the process because it's too daunting. Pick your battles, but start organizing your data into the bigger buckets. And there are ways of managing it through the process. Say you have retention schedules and you're not sure whether something is past its retention period: move it, doing what we call a soft delete. Move it into a protected environment where nobody has access to it. You can take the time and energy to sort it out later, but in the meantime it's out of that attack surface and people don't have access to it.
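The big-bucket and soft-delete ideas can be combined in a short Python sketch: files are sorted into a few coarse buckets, each bucket has a retention period, and anything past retention is moved into a locked-down quarantine directory rather than deleted. The bucket names, keyword rules, and retention periods are assumptions for illustration (real classification would inspect content, and real periods come from your retention schedule), and file modification time stands in for a record's true age.

```python
import shutil
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import List, Optional

# Hypothetical big buckets and retention periods; real values come from your records schedule
RETENTION = {
    "personal": timedelta(days=365 * 7),
    "financial": timedelta(days=365 * 7),
    "general": timedelta(days=365 * 3),
}


def bucket_for(path: Path) -> str:
    """Assign a file to a coarse bucket by name keywords (a stand-in for content-based classification)."""
    name = path.name.lower()
    if any(k in name for k in ("ssn", "payroll", "employee", "patient")):
        return "personal"
    if any(k in name for k in ("invoice", "ledger", "tax")):
        return "financial"
    return "general"


def soft_delete_expired(root: Path, quarantine: Path, now: Optional[datetime] = None) -> List[Path]:
    """Move files older than their bucket's retention period out of `root` into `quarantine`.

    Nothing is deleted; `quarantine` should live outside `root` so it is not rescanned.
    """
    now = now or datetime.now(timezone.utc)
    quarantine.mkdir(parents=True, exist_ok=True)
    quarantine.chmod(0o700)  # on POSIX, restrict the quarantine to its owning account
    moved: List[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if now - modified > RETENTION[bucket_for(path)]:
            target = quarantine / path.relative_to(root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Moving rather than deleting is what makes this a soft delete: the files leave the shared attack surface immediately, while the decision to destroy them can wait for review.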

SPEAKER_00 43:34

Yeah, you take a crawl-walk-run approach, right? Don't just sit on the sidelines because you can't do it perfectly. Paralysis by analysis is not going to get you safe harbor if you do make a mistake. But you can at least show that there was intention and that there were actions. And this is maybe a turn in the conversation to the next topic, but what you don't want is for there to be some kind of breach, and for the customers who are affected to find out that you were sitting on your hands and didn't even make an attempt to protect their data. That's a much different conversation than: you took the right steps, or at least the best steps you could, and something still got through.

SPEAKER_01 44:19

Right. Take litigation risk, for example. Let's talk about getting rid of data. Say you go into a hearing and they're saying you got rid of data incorrectly. If you have a policy, you followed that policy, you can present it, and the policy in and of itself is pretty good, you're in a much better position, even if some data got caught up in it, than if you had no policy and the same thing happened. I guarantee you can talk to any litigation lawyer who handles privacy cases: you're going to have a much easier time getting through to the other side, with much less damage to the organization.
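Being able to show that you had a policy and followed it is easier with a record of each disposal. This is a minimal sketch of an append-only JSON Lines log in which every entry names the policy and version that authorized the action and carries a hash of the previous entry, so later edits to earlier entries are detectable. The field names and policy IDs are hypothetical, and the chain only proves integrity if the latest hash is also stored somewhere the log's editors cannot change.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

GENESIS = "0" * 64  # "previous hash" value for the first entry


def _digest(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()


def _last_hash(log_path: Path) -> str:
    if not log_path.exists():
        return GENESIS
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return json.loads(lines[-1])["hash"] if lines else GENESIS


def log_disposition(log_path: Path, item: str, policy_id: str, policy_version: str,
                    action: str, actor: str, when: Optional[datetime] = None) -> dict:
    """Append a JSON line recording which policy version authorized an action on an item."""
    entry = {
        "item": item,
        "policy": policy_id,
        "policy_version": policy_version,
        "action": action,
        "actor": actor,
        "at": (when or datetime.now(timezone.utc)).isoformat(),
        "prev": _last_hash(log_path),
    }
    entry["hash"] = _digest(entry)  # covers every field above, including the previous hash
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def verify_log(log_path: Path) -> bool:
    """Return False if an entry was edited, reordered, or removed from the middle of the log."""
    prev = GENESIS
    for line in log_path.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        claimed = entry.pop("hash")
        if entry.get("prev") != prev or _digest(entry) != claimed:
            return False
        prev = claimed
    return True
```

Recording the policy version, not just its name, is the detail that lets you show the rule in force at the time of deletion rather than the one in force at the time of the hearing.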

SPEAKER_00 45:04

Sure. The turn I wanted to make is to trust and board-level conversations about data. I'm a marketing guy, and my background in particular is branding, so I look at a lot of things through the lens of a brand, and brand equity is built on trust. Customers, your market, need to be able to trust that the business is going to deliver on the promises it makes: that the product or service is what you say it is, and the outcomes are going to be what you say they'll be. They also need to trust that when something goes wrong and they call in, they're going to be supported, and that if the business makes a mistake, it's going to make it right as best it possibly can. And I think what's unforgivable from a brand perspective, and really an existential risk in today's world, is betraying that trust by not respecting and protecting their data, because everybody knows there's a target, air quotes, on the back of that data. So one of the first things a business really needs to look at is: how are we going to deliver the best possible defense for the data we've collected from our customers?

SPEAKER_01 46:31

Yeah. You're a steward of their data, so act like one. I completely agree with you. You've been given the keys to the kingdom, and many times, if this information gets into the wrong hands, you're really hurting people. So it makes sense to protect it. But you can't be so paralyzed by the scale of the responsibility that you stop carrying out the key functions of your business. You can do what's reasonable, and what's reasonable, once again, is taking care of the low-hanging fruit first. Start a process. When you need more time to get through something, take actions that can stop 95 or 99% of the issues and problems, and then work through the rest. But if you do nothing and sit on your hands, you're not doing yourself any favors.

SPEAKER_00 47:34

So through that lens, how do you approach this conversation with the rest of the C-suite and the board of directors? How do you justify it? Because this is not sexy stuff, right? If you're going to spend money, time, and resources on something, I feel like a lot of times the board and the rest of the C-suite want it to be impactful and visible. This is impactful, but it may not be visible. It's not revenue generating; it's risk mitigation, right?

SPEAKER_01 48:06

Shawn, it's funny you say that, because I can walk into a C-suite, talk to a CISO, and say: hey, I just reduced your cyber attack surface by 70%, and oh, by the way, I saved you a bunch of money, and oh, by the way, I reduced your cyber insurance premiums. Watch the response. That persona is going to have a much different response than, say, a finance person, who will certainly appreciate the cost savings but hears it differently. So we put together assessment reports, try to think of the different personas sitting in the room, and highlight what the data is telling us that might be relevant to each of them. There are different ways of looking at what you're doing, but the folks who are really getting ahead and really being competitive are the ones who understand that, A, their data is important, and B, proper management of that data as an asset is good for everybody within the organization. And it starts at the top. If you can get somebody at the CEO level involved, and we've made believers out of some of them by walking them through our proposals, you're in good shape. So start a project and show some momentum. What's key, once again, is doing that quick analysis and picking some low-hanging fruit to show progress on the problem early on. Once you do that, the projects typically build momentum and you get buy-in from the C-suite, because guess what? You're delivering on what you said, quickly.

SPEAKER_00 49:59

Once again, trust. Yeah, trust. Risk mitigation, I think, always resonates. But we've come a long way from two or three years ago, when the conversation was the board saying "we need AI" and the CIO saying "I don't know what that means." Now the board is still saying "we need AI," but the CIO knows what that means and probably knows where they want to go with it, at least in some respects. They know where to start if they haven't already started, which I think most have. So the maturing has happened on the enterprise side, and maybe not on the board side. I've read some data points, and I wish I could remember exactly where; I want to say it was something from McKinsey, but basically they were making the case that boards are still, in general, AI illiterate. So it needs to be an education process. I'm starting to think of the "we need AI" conversation like this: AI is the destination, and there are two different ways you can get there. The first is taking the surface streets that exist today. That's the tactical approach: I'm going to pilot this, I'm going to buy this, we're going to start that. Or, and this takes a bit more time, money, and resources, you can start an infrastructure project where you build the superhighway to AI. What has to go into that superhighway is this type of project: data readiness, governance, education. It's all these unsexy foundational things that don't necessarily have an outcome in themselves and don't seem accretive to the business. But they enable you to get into the fast lane with other projects in the future.

SPEAKER_01 51:46

Yeah, I think it's like anything else we've gone through. When something's new and you don't have a lot of knowledge around it, what do you typically do? If you're a larger enterprise, you go out to a consulting firm and have them take you through the project, because supposedly they've got experience doing these projects for other companies your size and understand what's required. The companies that have been through that and want to start taking it on in-house obviously have some history. They've brought the right people in. They've identified a chief data scientist, someone working underneath the CIO who has been brought in strictly to handle this, and that person is going to build up the right team around the infrastructure, the tooling, and the data management tools. Or, as I said, you bring in an outside consulting group to provide that expertise and knowledge. So we're early on as far as the maturity of the marketplace in getting projects done efficiently and off the ground. There's a lot of stubbing of toes right now. You're hearing anecdotally, and I'm also seeing it in some of the surveys out there, that these projects aren't necessarily succeeding early on, and so people are having second thoughts. The bottom line is that the cat's out of the bag. It's inevitable. We're all moving toward AI becoming a requirement for an organization to be successful and compete in today's marketplace. So you're going to have your ups and downs, but I go back to some really core, basic ideas.
I'll beat this drum until the cows come home: bad data in, bad results out. You're going to have to clean up the data, you're going to need some tools to do it, and you're going to need people who understand the data and what you're trying to do. It doesn't matter if you're using large language models or moving from generative to agentic AI. Especially with agentic AI, where you're essentially teaching the endpoints to be autonomous, you've got to train them right. If you don't train them with the right data up front, you're asking for problems. I think that proves the point that understanding your data is the first step in any of these projects.

SPEAKER_00 54:25

Yeah. And I think it's worth talking about what, let's call it, a data-forward organization looks like. If you're leading an IT department, what are the right roles to hire for to start and manage this process? I know there are companies like yours that can help with pieces of this, but there certainly must be some level of internal ownership. So what does that look like, do you think?

SPEAKER_01 54:55

Yeah. Our company specializes in providing the tools to the folks who do this work. We work with some large consulting practices as well as smaller boutique shops, and it varies, once again, with the complexity of the situation and what the project is. There are very small, easy-to-find projects with big payoffs that folks can implement, and there are others where they want to boil the ocean and it becomes way more complex. But what I'm seeing, and the results we're seeing early on, is that the projects doing well are focused on, again, I keep using the term, low-hanging fruit: processes the organization runs that are highly repeatable and highly understandable, where you can say, we don't need to put people on this, we can put an AI in place, because once you get it right it's completely scalable. And as I've said, I think data scientists are really the key people you have to focus on. If you're staffing an organization and you want to be doing AI, get yourself a really good chief data scientist. They're the ones who will drive the project and help you put the right people in place to get it done, because they're really thinking the problem through on how to process that data.

SPEAKER_00 56:29

In the cybersecurity space, the story for years now has been that there's more demand for people with that skill set than there are people who have it. So those people command a premium in salary and compensation, if you can even get them. That's why MSSPs and security partners are so valuable to an organization. So my question is: with data scientists, are we there now, or are we heading in that direction?

SPEAKER_01 57:08

I completely think that's the case. Yes, absolutely. There are so many organizations trying to get into it, and not just the large ones: mid-sized companies certainly see an opportunity to leverage AI to bring value. So I would highly recommend that they look at the consulting firms around them and bring in a virtual data scientist to handle the project.

SPEAKER_00 57:43

Sure. It seems like investing in the people you already have may be a good strategy as well. What I mean is that the promise, or the threat, depending on your perspective, of AI is that it's going to make certain jobs redundant, and I think that extends to the IT department as well. Most experts I talk to don't necessarily advocate replacing people, but augmenting them with AI. Is it a good strategy to look at the people in your department now and invest in their future, and your future, by giving them education and training in data science?

SPEAKER_01 58:24

Well, I would argue that whatever your job is within an organization, if you're a manager, you're trying to understand the value of the people within it. Investing in that talent is always key to a company's long-term success, right? You build an ethos of: we're building a monster here that's going to be tough to compete against. So yes, I think that's true. And if having an AI advantage is key to your organization and what you're looking to do in the future, absolutely. You want to keep a great bullpen that you can pull people in from to fill in. So yes, building a bench, building a bullpen, absolutely. And I would argue it's the same whether it's data science and AI or design. It's not really different. You want to keep that bench moving.

SPEAKER_00 59:29

Any predictions for the next 12, 18, 36 months when it comes to data and AI that you want to go on record with and be held accountable for?

SPEAKER_01 59:40

I think we're going through boom-bust cycles, and there's inevitably going to be a slight pullback. I think it's going to be slight, because, once again, the advantages are just too huge for slight downturns in the marketplace to overcome them. Nothing goes straight up; as an old Wall Streeter, I can tell you that. So it's a little bit cyclical. But three years from now, you ain't seen nothing yet as far as what AI is going to be doing. You've got smart guys out there. I think Mark Cuban just said that the 40-hour work week is out the door, that we're going to be working less and working smarter. I'm a little hesitant to get that bold. I just think you're going to see continued progress in the marketplace as far as AI adoption. I think people are going to get smarter about what they're using it on. Instead of trying to boil the ocean, they're going to start off with those low-hanging-fruit projects, automating a specific, repeatable process within a department, and working on those. Then, as they grow and get more sophisticated within the organization, they can branch out. But I think AI is here to stay. If you're not thinking about it, you really should, and as you say, start planning for its implementation.

SPEAKER_00 1:01:26

You know, I said earlier that we live in a data economy, and we do, but we also live in an attention economy. Guys like Cuban traffic in attention, right? When they make those bold predictions, they almost never come true. It's all about getting the tweets and the clicks. Ever since computers were invented, we've been saying the promise of computers, and then the internet, was that we would get time back. Instead, we work more; it's just been an increase in productivity. So I don't think the 40-hour work week is going away, unless it goes the other direction. But I do think we'll get more productivity squeezed out of the people who are putting those 40 hours in.

SPEAKER_01 1:02:11

Oh yeah, let's face it: the goal of a company is to make returns for its investors and stakeholders. So if a company does cut back on hours, it's going to be to lower its costs. I agree with you. I was just making the comment because I consider Mark to be a pretty shrewd and smart businessman as far as understanding what's going on in the marketplace, and he believes AI is here to stay and is going to make a huge difference, whether it's the 40-hour work week or whatever. I'm pointing to it more as an indicator that business belief in AI as a critical component of our go-forward path has matured.

SPEAKER_00 1:03:07

I certainly agree with you and Mark about that. Michael, is there someplace people can find you if they want to connect with you?

SPEAKER_01 1:03:14

Yeah, I'm at Congruity360. My email address is msmith@congruity360.com. Feel free to drop me a line. I'm happy to talk with you and introduce you to some of the things we've been working on.

SPEAKER_00 1:03:29

Excellent. Michael Smith, thank you for your time and expertise. Appreciate it.

SPEAKER_01 1:03:34

Shawn, have a good day.