How to Structure Your Data for Power BI Copilot Artwork

The Dashboard Effect

NEW EPISODES EVERY OTHER THURSDAY!!!

The Dashboard Effect Podcast: Simplifying Data for Smarter Business Decisions

Welcome to The Dashboard Effect, the go-to podcast for mid-market businesses and private equity-backed companies looking to harness the power of data.

Hosted by Brick Thompson from Blue Margin, we demystify data analytics and business intelligence, offering practical insights and actionable strategies that drive accountability, performance, and growth.

From breaking down complex data concepts to sharing real-world success stories, we cover topics like Power BI, data lakes, dashboards, automation, and the latest trends shaping the future of business intelligence. Whether you're a CEO, operator, or BI professional, we’ll help you unlock the potential of your data for smarter, faster decisions.

Tune in to The Dashboard Effect and discover how the right data, at the right time, can transform your business.

Subscribe now and take the first step toward becoming a data-driven organization!

Learn More: BlueMargin.com

All Episodes

The Dashboard Effect

How to Structure Your Data for Power BI Copilot

January 24, 2024 • Brick Thompson, Jon Thompson, Caleb Ochs • Episode 114

Brick and Caleb call out key aspects of Microsoft's Power BI Copilot documentation related to structuring datasets for Large Language Models (LLMs) and explain how to get these best practices in place. Data structure and organization are critical for getting useful outputs from advanced analytics tools like Power BI Copilot, and useful outputs will determine, in large part, whether or not these tools will be widely adopted.

Here is the Microsoft documentation referenced: Update your data model to work well with Copilot for Power BI - Power BI | Microsoft Learn

Click here to watch this episode on our YouTube channel.

Blue Margin increases enterprise value for PE-backed, mid-market companies by serving as their fractional data team. We advise on, build, and manage data platforms. Our strategy, proven with over 300 companies to-date, expands multiples through data transformation, as presented in our book, The Dashboard Effect.

Subscribe here to get more episodes of The Dashboard Effect podcast on your favorite podcast app.

Visit Blue Margin's library of additional BI resources.

Brick Thompson: 0:04

Welcome to The Dashboard Effect Podcast. I'm Brick Thompson.

Caleb Ochs: 0:07

I'm Caleb Ochs.

Brick Thompson: 0:08

Caleb, we thought today we would talk about something that we saw in Microsoft's documentation for their Power BI Copilot, specifically a section in the documentation that talks about how you should get your data structured, to be ready to really take advantage of LLMs and generative AI analytics, specifically, and Copilot. I think this applies everywhere. And I thought we would take a few minutes and go through sort of their list and break down what these different things are form.

Caleb Ochs: 0:40

Yeah, sounds good. Let's do it.

Brick Thompson: 0:42

All right. So let's start at the top. What's the first thing you need to be thinking about?

Caleb Ochs: 0:47

When it says "table linking". So this is gonna be relationships between your tables. Making sure that first of all, you just have good model setup, really.

Brick Thompson: 0:57

And by the way, in the Microsoft world, we're talking about a semantic model on top of the lake.

Caleb Ochs: 1:03

Yeah, yeah. So basically, your model in Power BI. What's so interesting about table linking and some of the other topics that we're gonna get into is that this really all comes back to Kimball-style modeling. I think it's pretty impressive how well that methodology has held. Maybe that's because tools have kind of been built around that methodology that's kind of held up. But even now, when we're talking about Copilot, and this AI, the super futuristic thing, you know, it still wants to use...

Brick Thompson: 1:40

You still need that.

Caleb Ochs: 1:41

Yeah. Relationships.

Brick Thompson: 1:42

Dimensional modeling. All right. Well, the next thing on their list was then standardized calculation logic. We talked about this a couple of episodes ago, specifically with the work you're doing internally with your team on building some LLM analytic stuff. And this is really that you have defined - clearly - the calculation logic for the different types of metrics. So they give an

example: 2:08

Total Sales calculated is the sum of Sales Amount from the Sales Table, and some kind of metadata and then matching DAX, that is true to that definition.

Caleb Ochs: 2:21

Yeah, right. I think what's so important about that is, you don't you don't have Total Sales in one place. And then you have Total Sales and another place and they have different definitions, right? That's just kind of a general best practice data governance thing that you want to make sure you're on top of before getting into stuff like this. So I mean, hopefully, this would start uncovering some of those, if you start going through these and try to make some changes, but that'll be really important.

Brick Thompson: 2:46

Yeah. Another one that seems obvious, but naming conventions. I mean, we're really careful about our naming conventions here for column names, and you know, in views and so on. But their example, would ask you to do something different from what we might do even now. Like AVG rating for average rating, they would say, put"average_customer_rating", just to make it much easier for the LLM to match up what the business users are asking for. I have a feeling in this example, the LLM would figure out what AVG rating is. But it's a good point. So some of the shortcuts that we're used to taking for the last, however many years, we're going to have to get used to thinking about, "How do we make this easy to do a translation from what a business users asking for to what the LLM is going to understand."

Caleb Ochs: 3:39

Right. And I think that what it's also kind of interesting about that is, when you're thinking about doing that, either changing, let's say you do have "AVG Rating" in a customer analytics report. You can either way, this is what's going to happen well, for Power BI copilot, I don't know how configurable it's going to be. But you could provide, you know, the LLM with "AVG rating means average_customer_rating", right?

Brick Thompson: 4:08

Basically synonyms.

Caleb Ochs: 4:09

Yeah, basically synonyms. And that might be a lighter lift, depending on your situation. So not all of these are, are going to be the end-all solution for getting this thing ready. But, you know, they're still good. You're still gonna have to be smart about making this happen.

Brick Thompson: 4:25

Yeah, it's interesting. One of the first things that the Power BI Copilot preview is doing is helping you create synonyms for the old q&a, functionality, which

Caleb Ochs: 4:36

That q&a sucks.

Brick Thompson: 4:38

It hasn't been great. I have a feeling it's about to get a lot better. All right. Now, another thing they talk about is having clear delineation of "fact tables" versus "dimension tables". I think there's, there's maybe two meanings to clear delineation. One is make sure you do your dimensional modeling well, so that you've been smart about what goes into a fact table and what goes into a dimension table, but also make it easy for the model for the LLM to know what a table is. Maybe something as simple as you put an F_ in front of your fact tables and a D_ in front of your dimension tables.

Caleb Ochs: 5:16

Right. Right. Exactly. And that's just best practice Dimensional Modeling right there. So, you know, if you've been doing that, I think but you know, for us, we are forced to be pretty strict about some of these things, right? Because we're passing work between developers, we are hiring new people. We have to have these things pretty well regimented. So it's kind of baked into our DNA. It's not uncommon for someone that's kind of learning Power BI or learning BI, just in general from the ground up to have a table called transactions that is a dimension. And you just you have no naming convention around because they never learned it.

Brick Thompson: 5:58

Right.

Caleb Ochs: 5:58

So yeah, that's not gonna be uncommon.

Brick Thompson: 6:00

Yeah, yeah. Another one on here is"hierarchies". So what are they talking about here?

Caleb Ochs: 6:06

So this, it's kind of interesting. I don't really know why you would want to have these in there. I can understand maybe if you're trying to generate like a report, and maybe that's what they're getting at -- for drill down stuff, so it knows that this is a logical way to drill down into the data. But the example they gave, I thought, was not great. I mean, they're saying Year, Quarter, Month, to Day, I think if you asked Chat GPT, it would know what to do.

Brick Thompson: 6:40

Yeah, but it'd be more like you have eastern region that has sub region.

Caleb Ochs: 6:47

Yeah, yeah. So you might want to do that. That would be a good idea, if that's that stuff you're thinking about.

Brick Thompson: 6:53

I mean, that's something we run into with clients all the time is their hierarchies, and one of the challenges is when clients are adopting a new ERP system or have acquired a new company, and now they're changing their hierarchies. You're gonna have to have those right for the LLM do a good job.

Caleb Ochs: 7:07

Right. So that means like, explicit hierarchies, right, that are defined inside of Power BI. So there's the implicit ones, when you just know that the North Western region includes Seattle. But in Power BI, you want to physically put those fields into a hierarchy so that Power BI Copilot will be able to understand.

Brick Thompson: 7:32

Yeah, good. There's another one here, which is to make sure you have"correct and consistent data types". And I think what they're talking about is be careful that you don't have something that looks like it might calculate, but doesn't, because you got a number in one place, and you've got something else that's a text column that has numbers in it. And maybe it's not going to work. Is that what they're getting it there?

Caleb Ochs: 7:57

I think so. You definitely want to have that. I guess what Power BI Copilot might be doing is going and writing some measures. Honestly, I would really hope that it looks at the type of column because if you do have, let's say you've got numbers in a column, but you have one text field. That column type cannot be number. Yeah, it's gonna have to be text. So yeah, it

Brick Thompson: 8:19

Right. would probably have to figure that out. So that's probably just more general data cleanup best practices, stuff that you should be doing anyway. Speaking of data cleanup, you have to have consistent data in columns that require it. So let's say you have a column that has a Order Status, maybe it's open, closed, on hold or something. You got to make sure that data is clean. But again, this is all the things you have to do to do good reporting anyway. Like you said, it's the old Kimball stuff is still really valid.

Caleb Ochs: 8:58

So yeah, I mean something like "on hold" you if you have on "on-hold", and then you've got "on hold", that's going to look bad in your reports anyway. So you want to clean that up.

Brick Thompson: 9:13

But you also can't aggregate on that?

Caleb Ochs: 9:15

Yeah, yeah, exactly. Matter of fact, the LLM if(well, depending on how Copilot is rolled out), I mean, obviously, maybe this doesn't happen. But you know, maybe in the future, it'd be able to figure that out. Like these are the same and probably stick them together.

Brick Thompson: 9:32

I think there are going to be things just get easier and easier over time. To start with, you're gonna we're gonna need to give it every advantage we can. I mean, I think there's that risk of if you don't have your data set up well, and people are not getting good results. They're not going to adopt it. Yeah. I mean, I've been pushing ChatGPT use in our office for, you know, 10-11 months, and it's been slow uptake. But all of a sudden, with ChatGPT-4, and Plus, and the MyGPT stuff, and the being able to populate a prompt sort of behind the scenes so it knows who you are and what you care about to start with. All of a sudden, adoption has gone way up. And I think it's going to be a similar thing here. Like if your data is good, adoption will be good from the start.

Caleb Ochs: 10:17

Oh, for sure, definitely.

Brick Thompson: 10:19

What other key things?

Caleb Ochs: 10:20

Let's see. So it's got "key performance indicators". So it wants to have your KPIs defined. And again, hopefully, you've already got that if you're building reports, it would be a good idea to put some of those in there, if you don't. Here's one that I was not thinking about, but I think is

really interesting: 10:37

"refresh schedules". So one thing that we put in all of our Power BI reports is a "last refresh date."

Brick Thompson: 10:44

yeah.

Caleb Ochs: 10:44

But I can understand why that would be so important. So that when you get like an answer back or Copilot builds you a report, it can tell you like how fresh this data is. I think that's I think that's actually probably going to be an overlooked piece, but I can see it being very important.

Brick Thompson: 11:01

Yeah. Interesting. And I think of how we put that freshness date on our reports. You might need to think about how to push that freshness date deeper into your data structure somehow or something? I don't know.

Caleb Ochs: 11:16

Yeah, the way we do is we put like a table. That's called "Last Refresh". So you might have to put some metadata around like exactly what that table is and what to use it for.

Brick Thompson: 11:28

And how it gets updated, because you're going to have different data sources are going to have different freshness dates and times on them. Right. Interesting. Okay. All right. Anything else?

Caleb Ochs: 11:40

Well, the last two are "security". Obviously you want the Copilot to be able to figure out who should see what. I think is what they're implying here. That'll be actually interesting to see how they do that. You would think that the person asking the question would just have their security already applied, but why would the Copilot need to know that. That'd be kind of interesting.

Brick Thompson: 12:04

Well, yeah, for for Power BI Copilot. For other LLM 's you might have to think about this, though. differently.

Caleb Ochs: 12:12

Oh, for sure, definitely. Yeah. And then"metadata", which we've talked about a lot. I think that's going to be the most important piece is making sure that you're explaining to the LLM, whether it's Copilot or something else, what it's looking at, it's like giving it the knowledge it needs to do a good job.

Brick Thompson: 12:33

And not just like an ERD - Entity Relationship Diagram. Yeah. Interesting. It's fun. Exciting.

Caleb Ochs: 12:42

Yeah. That's actually really interesting. I wonder if there's gonna be companies that pop up that will gather metadata about your stuff in the right way, so you can use these tools with.

Brick Thompson: 12:55

Sure. I mean, I'm sure we'll be doing that.

Caleb Ochs: 12:57

Yeah. I mean, that's one of our things that we want to do for our clients. Right. Let's get that stuff wrangled.

Brick Thompson: 13:04

Exactly. Okay. Good discussion. Thanks.

Caleb Ochs: 13:07

Yeah. Thank you.