The Layman's Guide to Data Terminology: Part One Artwork

The Dashboard Effect

NEW EPISODES EVERY OTHER THURSDAY!!!

The Dashboard Effect Podcast: Simplifying Data for Smarter Business Decisions

Welcome to The Dashboard Effect, the go-to podcast for mid-market businesses and private equity-backed companies looking to harness the power of data.

Hosted by Brick Thompson from Blue Margin, we demystify data analytics and business intelligence, offering practical insights and actionable strategies that drive accountability, performance, and growth.

From breaking down complex data concepts to sharing real-world success stories, we cover topics like Power BI, data lakes, dashboards, automation, and the latest trends shaping the future of business intelligence. Whether you're a CEO, operator, or BI professional, we’ll help you unlock the potential of your data for smarter, faster decisions.

Tune in to The Dashboard Effect and discover how the right data, at the right time, can transform your business.

Subscribe now and take the first step toward becoming a data-driven organization!

Learn More: BlueMargin.com

All Episodes

The Dashboard Effect

The Layman's Guide to Data Terminology: Part One

February 29, 2024 • Brick Thompson, Jon Thompson, Caleb Ochs • Episode 119

This episode is the first installment in a series of episodes that will serve as a primer on the key terms and technologies surrounding data management. Brick and Caleb begin with the fundamental differences between transactional and analytical databases, explaining concepts like OLAP, OLTP, normalization, and denormalization in an accessible way. They also clarify the essence of SQL, how it interacts with databases, and introduce the concept of Serverless SQL. Whether you're new to data or looking to deepen your understanding, this episode is a good place to start getting a handle on common data terminology.

Click here to watch this episode on our YouTube channel.

Blue Margin increases enterprise value for PE-backed, mid-market companies by serving as their fractional data team. We advise on, build, and manage data platforms. Our strategy, proven with over 300 companies to-date, expands multiples through data transformation, as presented in our book, The Dashboard Effect.

Subscribe here to get more episodes of The Dashboard Effect podcast on your favorite podcast app.

Visit Blue Margin's library of additional BI resources.

Brick Thompson: 0:03

Welcome to The Dashboard Effect podcast. I'm Brick Thompson. Caleb, we wanted to take some time just to go

Caleb Ochs: 0:06

And I'm Caleb Ochs. through some common terms from the data world that you and I throw out all the time. And there are probably some of our listeners who may not be familiar with some of them, especially if they're new to data, and they're coming to us to learn about it. So, for this episode, and maybe a couple after, we'll just go through some of these. Yeah, hindsight, probably should have done this about 56 episodes ago, but better late than never.

Brick Thompson: 0:35

That's all right, we keep referring back to it. All right. So some of this will be basic for some of our listeners, but I think it'll be helpful. So why don't we start with at the level of the database. When we talk about a transactional database, what are we talking about?

Caleb Ochs: 0:55

So that's a database that you transact against, right? So when I say "transact", that means like, you're inputting data, you're making something happen against the database. So a really common example is if you go to the grocery store, as you're scanning items across the scanner, that's a transaction.(you could say the transaction is actually buying the groceries) But let's say you scan a product, it transacts against the database somewhere to say this product was scanned, right? That's a transaction. So it's something that people use in the world to keep track of information.

Brick Thompson: 1:29

Yeah, okay. You can almost think of it like the back end of business software that's tracking data. Like a CRM, so tracking sales data, or prospect data, or as you said, a grocery store transaction. That's perfect. Okay, so why wouldn't every database be a transactional database? What are the kinds do you have?

Caleb Ochs: 1:52

Yeah, so first of all, there's a ton of transactional databases out there. But there is a lot. And it used to be, that's all there was, right? But as people started to want to consume that data and look at it, and like, aggregate it and visualize it,(basically everything we do), you needed a different type of database to be able to support that well. So that's where we get into what we would call an analytical database, or what might be referred to as an OLAP. We don't really use that term anymore. It's kind of old school.

Brick Thompson: 2:25

It's pretty old. It's kind of wonky. It is kind of cool. Yeah, right.

Caleb Ochs: 2:29

If you want to impress your friends, you'd say OLAP. The other one we would call an OLTP, right.

Brick Thompson: 2:33

Yeah. So Online Transaction Processing database. OLAP -online analytics processing database?

Caleb Ochs: 2:39

Yep. Right. Right.

Brick Thompson: 2:40

Okay. And so what's the difference between an OLAP and OLTP?

Caleb Ochs: 2:44

Oh, yeah, good question. So the OLAP is going to take some of those... How can I explain this simply... So when you when you go to the grocery store, when you go to scan, like an apple, the way that that database - the transactional database is set up - is it's likely going to have a table for"items", it's going to have a table for "colors", it's going to have a table for whatever else describes that item. Because the transaction, if you need to transact against things, you need to touch as little data as you possibly can. So you want to really fragment your database as much as you can. But then when we try to report on it, we want to pull it all back together,

Brick Thompson: 3:27

So that it performs.

Caleb Ochs: 3:29

Yeah, right, exactly what we're doing aggregations and stuff over it, not single transactions against it. So we're looking at bunches of transactions at once. So we need to do some transformations to get fewer tables is the simplest way to think about it. If you have a lot of tables in your OLTP you want not so many tables in your OLAP. That your analytical tools don't have to go to so many tables to pull the information together.

Brick Thompson: 3:56

Yeah, okay, exactly. So we didn't have these terms, but we might as well define them. So there's normalization of a database, which is fragmenting it. And there's a term people might hear of "third normal form", where you've fragmented to an optimal amount. So for example, if you have an address database, you might have a state separate table for states that you refer to from your address with a foreign key, which we'll talk about, that can give you the state name, and maybe the region it's in, the definition that you'd use when you're using an address those types of things. Denormalization is going the other direction.

Caleb Ochs: 4:39

Exactly right. You would want to take that state and put it on the address table. Yeah. And when you denormalize it.

Brick Thompson: 4:46

And then that it'll perform really quickly for your analytics work. It may not be the most efficient from a space standpoint, but it serves the purpose. Okay, great. So while we're talking about that, what is SQL?

Caleb Ochs: 5:01

Yeah, good. Good. As far as first of all, let's clear this up. If you're working with a developer and they call it S-Q-L, you've got a bad developer. It's SQL. They're new, they're still learning. So it stands for Structured Query Language. Essentially, it's a way for you to interact with the database - like pulling information back from a database. So you're writing queries (which we'll get to in a second), but you write some code, you run the code against the database, and it will go to the database tables that you've put into your SQL query, and it will pull data back. So it's the code. That's actually not completely accurate. When someone says they use SQL, it's commonly referred to as "the code". Yeah, there are other uses for that term.

Brick Thompson: 5:55

I mean, it is code. It's a very specialized kind of code. So how would you...?

Caleb Ochs: 6:01

No.

Brick Thompson: 6:01

No?

Caleb Ochs: 6:01

I don't know... you could say like "SQL Server", sometimes people just say, "Is this on SQL?" And that means it's a SQL Server database.

Brick Thompson: 6:08

Oh, that's true.

Caleb Ochs: 6:09

So it's used in a lot of different ways.

Brick Thompson: 6:11

T-SQL, which is Microsoft SQL Server version of SQL.

Caleb Ochs: 6:14

Yeah, if you say T-SQL, that's a good point. Because if you say T-SQL that's the code that's "transact structured queried language". So that is actually a code language.

Brick Thompson: 6:27

And other people use SQL like, "MySQL." Ooo, weird name there.

Caleb Ochs: 6:34

MySQL. Yeah.

Brick Thompson: 6:36

And Snowflake has their own SQL. There's a bunch of them.

Caleb Ochs: 6:42

Right, exactly. So anyway, I guess bottom line is there, it's typically referring to a language, but it can refer to other things. Okay.

Brick Thompson: 6:49

So what's a"query" then?

Caleb Ochs: 6:52

Query. So, you know, if we were in the UK, query would be like a question. You know, I like to think about this way. So you're really kind of asking a question of your database, like, yeah, give me these, this information, and it gives you information back. So you can think of it as a question that you're asking a database.

Brick Thompson: 7:11

And you write the question in T-SQL or SQL?

Caleb Ochs: 7:14

Right

Brick Thompson: 7:15

Yeah. Okay, great. All right. So we've used the term a bunch in here the last few weeks, because we're doing some work on Azure Serverless SQL. What is serverless SQL?

Caleb Ochs: 7:28

Serverless SQL. So when we talk about like, a standard database, SQL Server database, let's just say that. There is a physical database in a physical engine, or it's software, but it sits on your hardware and it the data is stored inside of a database. It's like a proprietary tool. And when you write a query against that database, it uses the engine queries the proprietary tables and pulls back information, right? When we're talking about serverless, it's like, so maybe here's a better way to describe it. In that scenario, you've got your storage engine, and your compute engine, which actually runs the code against your data together, it's packaged into one. So it's a SQL Server, right? When you talk about serverless, you're essentially decoupling those. So now you've got your storage, which a lot of times is a data lake. And then you've got your compute, which is your query engine that actually goes and queries data from your data lake in this example. That would be what we refer to and we call something serverless SQL is just that compute layer, so it does not have storage attached to it.

Brick Thompson: 8:48

So you can write your SQL queries the way you normally have, and get a result it'll pull data out of, say, a data lake. You're basically using SQL without using the back end of a SQL Server. The storage part. Is that right?

Caleb Ochs: 9:04

Yeah, yeah.

Brick Thompson: 9:07

Okay, I think we're out of time. So I think we'll wrap this for now, and we'll come back and do another episode of this if people find it useful. That was great. I felt like I was just like, you're on a game show or something.

Caleb Ochs: 9:22

I'm feeling like all my blood is rushing in my brain. That's a lot of deep thinking.

Brick Thompson: 9:28

That was good. All right. Talk to you next time.