The Dashboard Effect

How to Prepare Your Data for AI Advances

Brick Thompson, Jon Thompson, Caleb Ochs Episode 112

For months, Brick and Caleb have been recommending that companies start preparing their data infrastructure to leverage the current and upcoming AI technologies. In this episode, however, they break down how to prepare. They emphasize the need for data consolidation (and recommend OneLake on Microsoft Fabric), a robust semantic layer for report development, data literacy development within the organization, and consistent metric definitions.

Blue Margin increases enterprise value for PE-backed, mid-market companies by serving as their fractional data team. We advise on, build, and manage data platforms. Our strategy, proven with over 300 companies to-date, expands multiples through data transformation, as presented in our book, The Dashboard Effect.

Brick Thompson:

Welcome to The Dashboard Effect podcast. I'm Brick Thompson.

Caleb Ochs:

I'm Caleb Ochs.

Brick Thompson:

Caleb, we want to talk today about the state of play of generative AI in analytics and some thoughts we have around why people may struggle with it in 2024. To start with, we do a review once a quarter to see what the new tools are, what they look like, and how they seem to be working. Frankly, we haven't seen a lot of advancement over the last, I don't know, 90 to 120 days. We're still seeing tools that are sort of wrappers on LLMs, tools that create SQL statements to go out and get some data to try to answer the question you're asking. And it still feels to me like this is something for more of a bleeding-edge adopter, rather than your general business person. Not quite ready for primetime. But the bigger problem is something that you pointed out as we were discussing this, which is...

Caleb Ochs:

Yeah, well, obviously, you've got to have your data in a good spot. That's step number one. You know, from my perspective, it's interesting because with ChatGPT, for example, there have been so many advances in that thing since the initial hype. Like, there are so many cool things. It's amazing. I was using it today, and I'm saying, "Do this, I need this image," and it created an image. You don't have to go to DALL-E, a different website. Now it's all in ChatGPT. So it generated the image. And I was like, "Okay, now I need a fake name for this company, and could you generate a theme file for Power BI for this company as well?" And it did everything for me. So it was awesome. Right. So that thing is advancing. And I think that's indicative that the rest of AI is continuing to move forward. We're probably not going to see another big ChatGPT revelation again, but these subtle improvements are the stepping stones to get us to the place, really the euphoric place that we've dreamt of, where you can just ask, you know, "What happened yesterday in my business?" Obviously, it's gonna have to be a little bit more specific than that now - maybe not later. And it will just give you the answer. Right.

Brick Thompson:

Yeah. So I mean, you can make the systems do that, and we've even done some stuff internally. Your data has to be really dialed-in for that to work well, and maybe we'll talk in another podcast about all the things that need to be there. But we think it's coming. Based on the review this quarter, I think Copilot for Power BI is still likely going to be the front-runner for really usable generative AI analytics for business people. The stuff that I'm seeing now in the preview looks a lot like what they previewed back at the Build conference in May, which was so mind-blowing. It's not quite as slick, but it's getting there. But even for that, you've got to have your data in order. So there's going to be a lot of work to do to get there, I think, for most companies. We reviewed a couple of surveys recently, from AWS and Salesforce, and even there people are self-reporting that they're not really organized data-wise to be able to take good advantage of this. And so, as we talked about months ago, if you're not getting yourself in order now, then when this stuff does hit, when Power BI Copilot is supposed to go into preview for everybody at the end of the first quarter, you're not going to be able to take advantage of it.

Caleb Ochs:

Yeah. If you owned a company right now, or let's say you're the head of data at a mid-market company or something, where would you want your company to be? Like, what would you say are the characteristics of a company that let you know, okay, we're in a good spot for when this happens?

Brick Thompson:

So, at a really high level, I'd want to actually have consolidated my data into a single platform, you know, a data lake, OneLake in Fabric for me. So that consolidation is the first step. I would just want it all there, and I want it coming in every day or every hour, depending on the data source, that type of thing. And then to actually build a semantic layer, sort of a generalized semantic layer that relates those various data sources, the ones that can be related, at least in a very general way, at the very least on date timestamps, that type of thing. But then there's so much more. You've got to get in and make sure that, at least in your semantic layer, you've got column names that are human-understandable and likely to be things that people are going to refer to. You've got to make sure that your measures and KPI DAX formulas (or whatever you're using) are correct. I was talking to one of our engineers this morning over coffee, and he told me about an argument at one of our clients about what revenue is - how do you define it? It was a hard argument. I mean, there were a lot of opinions on how they define revenue. So it seems like, "Well, I don't need to do that, there's a column with revenue in it." No, it's not that easy. So there are those types of things to do to get your data sort of certified and ready. If you don't, it's going to be the same old problem, where the CFO asks the system for an answer to a question and gets one. The Head of Sales asks the same question and gets something different, and they don't agree that they're getting a good answer. And so adoption is not going to be good, people aren't going to use it, and you're not going to be able to take advantage of the leverage, the multiplier effect of having this.

Caleb Ochs:

Yeah, yeah. So have your data in one spot, have some semantic models built that kind of relate some things together. I think those are good. The only thing I'd add (and your example about revenue kind of touched on it) is that you have some data literacy in your company. Like, people are starting to understand it, and you have somebody (or a team of people) who knows your data really, really well. I think once you have those pieces in place, and you've got some reporting, and you're starting to get some consensus on definitions of revenue, for example, then by the time these things start coming out, you're gonna be really well positioned to put things in place like metadata and that type of thing, to where these tools are going to be able to use it a lot faster than someone who is either starting from scratch or just kind of has a kludge of things going on. And honestly, between those two, if you already had some analytics and you're kind of working away at it, but you didn't really know what was going on back there, versus starting from zero, I'd rather start from zero.

Brick Thompson:

Just so it's clean?

Caleb Ochs:

Yep, so you can start clean and kind of get those things in place. Because uncoupling an already existing system to retrofit it is going to be a way bigger task than building something from scratch.

Brick Thompson:

So if you were at a company that had to deal with this, and let's say you had an old Kimball-style data warehouse, you'd say, "Fine, keep using that for analytics, but let's go ahead and pull the data into a modern data lakehouse structure (or something like that). Let's start fresh, let's make sure to get that stuff right."

Caleb Ochs:

Exactly.

Brick Thompson:

Rather than going in and saying, "Alright, where's this screwed up? And let's try to fix it."

Caleb Ochs:

Yeah, "How do we make this work?" I'd much rather be going that route: "How do we build this from scratch so that it fits?"

Brick Thompson:

I agree with that. The other thing you said that caught my attention is to make sure you've got analytics and reporting going now, so that you're getting consensus and cleaning data, and you don't have to start doing that when Copilot for Power BI hits. People who are on Office 365 and are using those types of tools are going to want to use it. And if they haven't done that, they're going to be disappointed. That's gonna be a pain.

Caleb Ochs:

Yeah, right. That's what's going to be the delay, right? It's doing those things that aren't necessarily getting your data into a lake, for example. It's like, no, now you have to do the, I guess for lack of a better term, the more soft-skill type stuff with your data around what revenue is, you know, getting everybody to understand what that is and defining it. And that's just one example. There are all kinds of those edge cases that you're going to have to start defining. And if you're not doing that now, or you haven't done that, that's a process, not an overnight thing.

Brick Thompson:

Do you think if someone's got, let's say, an old, mature Kimball-style data warehouse sitting on SQL Server somewhere, it's not worth it for them to try to make that work? You think they're still better off going fresh?

Caleb Ochs:

I mean, it depends. I don't know what it would really depend on... I guess how clean that is, and if you're willing to maybe push that somewhere. Like, you could still use your warehouse, if it's working well and you have no complaints with it, to push the output of whatever you're sending to your reports. We call those reporting views. So you could push the results of those reporting views to a lake environment where it's going to be easier for something like an AI to look at it and give you some examples back. The way that we're thinking about it, it really doesn't matter where the data is stored, whether it's a lakehouse or a warehouse. What we're thinking about doing (we're actually working on it and kind of thinking through how this would work) is compiling metadata about the metrics. So like, what is revenue? You know, having those definitions, and then defining where those answers live. Then feeding that to our AI agent, so that when a question comes in, it can look at the information that we fed it and know where to go get the data. And we're thinking that it would just be a Power BI model, and the agent is just running a DAX query, getting the answer, and bringing it back. Now, we're in the very early stages of this, and there's a lot more to it than the very high level that I just explained. But tying that back to your question, if you have an existing warehouse, it wouldn't matter whether the data is coming from a warehouse or a lake at that point. What does matter is having that good, clear documentation, really metadata, so that the AI can understand what it's doing and what it's looking for, and give you good answers.

Brick Thompson:

And the data, the metadata has to be set up in such a way that the AI can use it easily. Some kind of consistent setup that is going to make sense to it. You're not going to be writing free text, probably, to describe these things.

Caleb Ochs:

Well, yeah. I mean, maybe to some extent. What I'm envisioning is like revenue, and then you've got, okay, revenue is this, like, plain speak explanation of it. And the goal of that is so that when somebody asks a question that might be revenue, it can kind of figure out okay, yeah, that's probably revenue.

Brick Thompson:

Yeah, there's more context there.

Caleb Ochs:

Yeah. And then it would also know (this is where it gets a little bit more complicated) where to go. So let's say that you want to know what revenue was. It says, "Okay, this is a revenue question." And then you have to say, "Okay, where do I get this from?" Because you might have 10 different models, right? And you want to send it to the model with revenue.

Brick Thompson:

So the metadata would tell it that.

Caleb Ochs:

Yeah, so you would go look that up and say, "Okay, this model is where you need to go get revenue," and then it would give you back whatever DAX formula based on the rest of the context of the question to deliver your result. So yeah, there's a lot more of that, too, talking about vectors and stuff. We won't get into all that.
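
[Editor's illustration] The lookup Caleb describes can be sketched roughly as follows in Python. This is a minimal sketch under stated assumptions, not Blue Margin's implementation: the registry entries, the plain-language definitions, the model names ("Finance Model", "HR Model") and the measure names are all hypothetical placeholders, and simple keyword matching stands in for the vector-based matching mentioned in passing. The generated string uses the standard DAX `EVALUATE ROW(...)` query form; actually running it against a Power BI model is left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """Metadata for one business metric (all values below are hypothetical)."""
    name: str          # canonical metric name
    description: str   # plain-language definition the business has agreed on
    synonyms: tuple    # words a person might use when asking about the metric
    model: str         # semantic model that holds the certified measure
    measure: str       # measure name inside that model


# Hypothetical registry: one agreed definition and one home per metric.
REGISTRY = [
    MetricDefinition(
        name="revenue",
        description="Placeholder definition: invoiced sales net of returns.",
        synonyms=("revenue", "sales", "turnover"),
        model="Finance Model",
        measure="Total Revenue",
    ),
    MetricDefinition(
        name="headcount",
        description="Placeholder definition: active employees at period end.",
        synonyms=("headcount", "employees", "staff"),
        model="HR Model",
        measure="Headcount",
    ),
]


def find_metric(question):
    """Return the first metric whose synonyms appear in the question, else None."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    for metric in REGISTRY:
        if words & set(metric.synonyms):
            return metric
    return None


def route(question):
    """Map a question to (model name, DAX query text), or None if no metric matches."""
    metric = find_metric(question)
    if metric is None:
        return None
    # Build a single-row DAX query that evaluates the certified measure.
    dax = f'EVALUATE ROW("{metric.name}", [{metric.measure}])'
    return metric.model, dax


if __name__ == "__main__":
    print(route("What was revenue last month?"))
```

Because every question about revenue resolves to the same registry entry, the CFO and the Head of Sales would be sent to the same model and the same measure, which is the consistency point made earlier in the conversation.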

Brick Thompson:

Yeah, no, I know you guys are doing some cool stuff back there. All right. Well, I don't know, to wrap this up. It's still coming. It's getting better and better. It's not quite there yet. And the message that we've been sort of preaching for a while is get your data ready. I think you're gonna be sorry, if you don't.

Caleb Ochs:

Yeah. And one last thing that made me think of is that, you know, we've said that a lot in our podcast, but we haven't really provided much detail on exactly what that means. So I think that through this work that I was just describing, kind of what this metadata looks like and stuff, we can. We kind of touched on it now. Like, what the definitions of your key metrics are, that's really important. No matter what you're doing, whether it's AI or not, you need to have a consistent metric. But a lot of times in companies, it's just kind of implicit, and maybe it was defined a long time ago, and everyone just kind of accepts it. There's no real definition behind it. Start doing those things. And then I'm fully planning, through the AI work that I just described, that we're going to come up with something like, "Here are the things that we need in order for our model to work." And maybe we can put something like that out when it's ready, so people can start actually making some progress towards getting themselves ready.

Brick Thompson:

Yeah. Great. I like that. All right. We'll come back to that. Okay. Talk to you soon.

Caleb Ochs:

All right. See ya.