The Dashboard Effect

Azure Synapse Analytics - What the Heck is It?

September 21, 2022 Brick Thompson, Jon Thompson, Caleb Ochs Episode 39
The Dashboard Effect
Azure Synapse Analytics - What the Heck is It?
Show Notes Transcript

Brick and Caleb discuss the Microsoft Azure Synapse Analytics service. They cover the parts of it most likely to be used for doing good BI, and touch on why it's a good choice over patching together several unrelated tools.

Blue Margin helps private equity owned and mid-market companies organize their data into dashboards to execute on strategy and create a culture of accountability. We call it The Dashboard Effect, the title of our book and podcast

 Visit Blue Margin's library of additional BI resources here.

For a free, downloadable copy of our book, The Dashboard Effect, click here, or buy a hardcopy or Kindle version on Amazon.

#Microsoft #Synapse #Azure #datalake #ETL #businessintelligence #reporting #BI #dashboard #datawarehouse

Brick Thompson:

Welcome to The Dashboard Effect Podcast. I'm Brick Thompson,

Caleb Ochs:

and I'm Caleb Ochs.

Brick Thompson:

How's it going, Caleb?

Caleb Ochs:

I'm good. How are you?

Brick Thompson:

I'm good, too. Thank you. So today we're going to be talking about Azure synapse. This is, I guess, you could refer to it as branding, but it's sort of how Microsoft has packaged all of their BI and machine learning and AI into a set of tools under the name under the flag synapse.

Caleb Ochs:

Yep. Thankfully, they call it Azure synapse analytics. But, you know, for sake of simplicity, we'll call it synapse.

Brick Thompson:

Alright, appreciate that. Okay, so there's a lot of pieces here. And I was thinking today, we could just spend a few minutes talking about them so that people could be familiar. What's in there? Where do you think we should start?

Caleb Ochs:

Well, maybe just a high level overview about what Azure synapse analytics is for synapse. So you're right, you touched on it briefly, it's a place where all of Microsoft's Azure, bi... all is probably not the right word, but most of it, a lot of the key pieces, they tried to package up and put into a single place. So you have a unified experience of doing your data ingestion, preparing data, doing data warehousing, data integration, big data, AI, ml, all that stuff can happen through one pane of glass at one workspace. Really, it seems to be becoming pretty popular. We, started using it a long time ago, before well, they were, well it technically wasn't synapse then. But some of the some of the components we use, and then Microsoft package it all up, you can even put Power BI reports directly in there. So you can link to a workspace. It's pretty sweet. But lots of different things you can do with it. Microsoft's goal was to like I said, to package that up.

Brick Thompson:

Yeah, just make it easier for people to understand what goes with what and use all those tools. Okay, great. That's, that's a great description. And when you say we were using it before, we were using it before it was called synapse, right? Is that what you meant?

Caleb Ochs:

Yeah. I mean, you kind of get into some technicality there. But a lot of the components aren't necessarily new as part of synapse or just put into a centralized location. So we're using the components before synapse.

Brick Thompson:

Gotcha. Okay, good. All right. What do you think is a good place to start and talking about the different main pieces in synapse? And I know, we could go really deep and detailed here, but we'll try to cover sort of the the main ones.

Caleb Ochs:

Yeah. So maybe we just start with some of the standard things that that people might recognize in terms of a BI system like a data warehouse and ETL? And reports.

Brick Thompson:

Okay, good.

Caleb Ochs:

All right. So inside of synapse, this is where it gets a little bit interesting and why I think this is a good podcast to do. Because when you're talking synapse, there's a bunch of different ways you can store your data and actually query it and explore it and even report on it. When you talk about a data warehouse, that technology, if you're going to use a synapse, technology is going to be a dedicated SQL pool. So it's actually a database technology. It's massive, parallel processing. So it has 60 databases behind it in the back end and the technical terms. And what you do when you stand that thing up is you get to tell synapse, where and how to store the data. So it's really useful, it gives you all the control you could ever need. And allows you to process as you can imagine, huge amounts of data really quickly, where dedicated Superbowl kind of falls down. And something to consider when you're making these decisions is it's not great at quick read and writes. So if you're trying to read one row, or write one row, for example, or just very small amounts of rows, it's not great at that. It's better at big chunks of data. So it's really good at data warehousing, because typically, you're processing larger chunks of data. But that's what you would use a dedicated sequel pool.

Brick Thompson:

Okay, so yeah, so when you're referring to writing or reading one row, it's usually typically in a transactional system on an analytic system. And so this is definitely built for purpose for analytics. If you were going to build database behind an application that you were building something like that, then you might use something else. Right? Exactly. Okay, great. And when you said, you choose your location, are you talking conceptually, within the structure of the sequel pool? Are you talking about geographically?

Caleb Ochs:

No, that's a good point. So you can do both - within the structure is what I meant. So you can, you can tell based on how you set up your tables, how the data is gonna be partitioned across those 60 worker databases.

Brick Thompson:

Gotcha. Okay, great. Okay, so that's obviously a huge foundational piece. What else?

Caleb Ochs:

So how you get data into a data warehouse or the data lake inside of synapse is another big piece that is synapse pipelines. It's essentially Azure Data Factory. Some minor nuances. So it's not technically exactly the same, but it's very similar. So it's a way for you to connect to your data sources, I think they have 9095 connectors to two sources that they've pre built. So you could do some low slash no code stuff with these synapse pipelines. And then that's going to give you a way to move data, do some sort of transformations on it, you can call like Spark pools or scripts, or run stored procedures inside of your dedicated sequel pool, essentially orchestrate your data movement and processing with synapse pipelines.

Brick Thompson:

Okay. So if they don't have a connector, though, you're you're doing all those things that you were just mentioning to connect the no code? Have we used that? How are we feeling about that? If we gotten some good results? There are low code, I guess.

Caleb Ochs:

Yeah, we've we've used some of their connectors before. I mean, it works pretty well. This is another interesting point about those pipelines is that when they say no code, everything in these pipelines is built on a JSON file. So you could go code these and matter of fact, back few years ago, before Azure data factory version two V one, you did have to write some of this JSON code. That actually was a huge pain. But they've kind of made that into a more user friendly UI. So you don't actually have to go to go write the code, you can if you want. But it does give you that graphical user interface. So you can pull something on, you can change your variables, it gives you a lot of flexibility, without having to write that raw JSON. So we have a guest we have used and we do, we do use it.

Brick Thompson:

Okay, great. As you were explaining that you were talking about moving data into the data warehouse, the sequel pool or a data lake, do you want to cover data lake in synapse real quick?

Caleb Ochs:

Yeah. Data Lake is a fancy term for basically, file storage.

Brick Thompson:

We've covered this in another podcast, yeah,

Caleb Ochs:

you can drop whatever you want in there. You know, it's just a place to store raw data, the whole reason why data lakes caught on was, it's a place to unify your, all your various sources in one location for people like analysts, data scientists, data engineers to be able to do things with that data. So you just dump raw data there. And then you should process it further down the line in whatever you're using. So inside synapse, you can connect to data lake.

Brick Thompson:

Yep. Okay, great. All right, what's the next major piece we should talk about?

Caleb Ochs:

So let's say we talked about the data warehouse, dedicated SQL pool, data lake sales pipelines. This is very closely related to the data lake. But synapse allows you to query your data lake. So think let's just think about file storage. Let's say you just have a bunch of CSVs on your local computer. How do you see what's in there? Without having to just go open them? It's not, that's not actually an easy answer now. So people go open them with Excel, right? Right. So what synapse does is it gives you this thing they call serverless SQL, which will allow you to write a SQL statement against your files. So you could say, select these columns from this CSV, and it'll pull back the contents.

Brick Thompson:

And you could tell it, you can point that at a group of CSVs, I'm assuming that have similar column structures.

Caleb Ochs:

You know, I would imagine, I'm actually not totally sure on that. But you know, it's a way it's a way for you to query your raw data in that lake. So you can think about what that means, right? So you can write those queries, but then you can also use those queries and other analytical processes. So you could either use it to select data and do some machine learning on it. You could use it to pull data out and stick it into Power BI. You can do those types of things with that functionality. Serverless SQL. It's fantastic.

Brick Thompson:

Yeah, that's cool. Okay, what's next?

Caleb Ochs:

All right. Let's talk about Spark a little bit. This isn't something that we use very often but for larger data workloads you can spin up Spark clusters inside of synapse and run Spark notebooks. So more data processing and data engineering stuff, but it allows you to process large amounts of data. Spark is an in memory engine that allows you to, to do some heavy data processing quickly. So super scalable, allows you to run those things makes it makes it nice and easy. You have all your notebooks and inside of the synapse workspace, and the other great thing that, as I'm thinking about, I'm just gonna say the, you can connect GitHub, to your GitHub repos to synapse so that as you're inside of the synapse environment, and making changes, you can store that in version control.

Brick Thompson:

That's cool. How about Analysis Services? So you know, tabular queues, that type of thing.

Caleb Ochs:

Now, I think I'm actually not totally sure if Azure Analysis Services is technically part of synapse, I know that Azure SQL database, for example, is not. Like it's not technically part of that ecosystem. I don't know. So I'm not sure about Analysis Services. I think the thing that's very close to it is Power BI datasets.

Brick Thompson:

So basically replacing Analysis Services in the Power BI world.

Caleb Ochs:

At least that's my opinion. I think that's where it's gonna go. So. So yeah, I mean, you can have your data model, you know, everything that Analysis Services does, has nice data models, you build relationships between tables. You can write DAX queries and measures, that type of stuff. So that can all be done through datasets.

Brick Thompson:

Okay. Cool. All right. What else? Well, Power BI, we just mentioned.

Caleb Ochs:

Yeah, yeah, Power BI. I mean, the nice part, I think, really, the key takeaway is that all of this stuff is in one spot. So even your Power BI reports, right. So as you're building your ecosystem and exploring data and trying to build out reports that are actually going to be meaningful and impactful, you can put some of that development into your synapse workspace, so that you don't have to keep jumping over from place to place doing different tasks. It's all right there makes it a little bit easier for you.

Brick Thompson:

Yeah. Okay. So it's clear to me that you and Matt, (and) our engineering team really likes synapse. What do you think is a big advantage of pulling this stuff together in this way is?

Caleb Ochs:

I think there's a bunch of them. But I think, for me, if I was looking at standing up a BI system, you have to make a decision on what technologies you're going to use. Synapse, having everything that you really should need, to do BI in one spot, and you don't have to have five different pieces of technology, manage five different invoices and pricing and vendors and help train people on five different things, make sure they work well together, and all that. And so it just gets rid of all of that. So it's all just one thing. And you know, your life so much easier.

Brick Thompson:

Yeah, that makes perfect sense. Okay, anything else we should touch on?

Caleb Ochs:

I don't think so. I think, I'll say this is the last point. We, for our clients, one of our big concerns is that when we leave, we want to allow them to be able to pick up whatever we stand up for them. So when we're making decisions on what types of technology to stand up, we want to make sure that it's not going to be a huge pain to hire for. So synapse is one of those decisions, right? Are there tools that might do pieces of what synapse does better? Sure. But are they going to be able to do everything better than synapse? Definitely not. So hiring somebody that knows Microsoft tools, knows SQL, knows Data Factory and synapse pipelines? That's going to be an easier higher than finding someone that knows five different pieces of technology. And the right five pieces, right?

Brick Thompson:

Yeah, someone who knows AWS Redshift and Dell Boomi. And you know, who knows what else?

Caleb Ochs:

Exactly. You're starting to look for unicorns at that point?

Brick Thompson:

Yeah. All right. Good discussion. Thank you, Caleb.