
DataTopics: All Things Data, AI & Tech
Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics is your go-to spot for relaxed discussions around tech, news, data, and society.
Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!
#89 SQLBits Unfiltered: DBT in Fabric, MLOps in Action & Copilot in Question
In this episode, we're joined by Sam Debruyn and Dorian Van den Heede who reflect on their talks at SQL Bits 2025 and dive into the technical content they presented. Sam walks through how DBT integrates with Microsoft Fabric, explaining how it improves lakehouse and warehouse workflows by adding modularity, testing, and documentation to SQL development. He also touches on Fusion’s SQL optimization features and how it compares to tools like SQLMesh.
Dorian shares his MLOps demo, which simulates beating football bookmakers using historical data, showing how to build a full pipeline with Azure ML, from feature engineering to model deployment. They discuss the role of Python modeling in DBT, orchestration with Azure ML, and the practical challenges of implementing MLOps in real-world scenarios.
Toward the end, they explore how AI tools like Copilot are changing the way engineers learn and debug code, raising questions about explainability, skill development, and the future of junior roles in tech.
It's a rich conversation covering DBT, MLOps, Python, Azure ML, and the evolving role of AI in engineering.
Hello and welcome to Data Topics, your casual corner of the web. Today we discuss all about SQL Bits. Uh, my name is Murilo, I'll be hosting you today. Behind the screen, as always, is Alex. Hey, Alex! Hello. Okay, and I'm happy to be joined by two special guests. We have a returning guest, Sam, hello, hello, and we have a first-timer. Is this the first time on a podcast ever?
Speaker 2:No.
Speaker 1:I have done a podcast before.
Speaker 2:Which podcast? The Iron Orton from the TET? Ah, really? Ah, that's true, yeah, it's from Contest. And now I need to look up what the name was for the other one. No, it doesn't really matter.
Speaker 1:It's okay, it's okay, it's okay, but uh, she's a podcast veteran already. Dorian, can you get a applause for them? Actually, there we go. It's been a while. She doesn't use this for the serious one, so it's like she doesn't know.
Speaker 3:Um, maybe, Sam, can you introduce yourself for people that don't know you yet? Yeah, uh, I've been working at dataroots now for about five and a half years or something. Uh, today I mostly work as a data and cloud architect for our customers, designing and implementing data platforms. Um, there I'm very hands-on, uh, yeah, and I mostly focus on the Microsoft stack, which is what you'll notice also during this podcast today. Cool.
Speaker 3:You also have a special relationship with Microsoft? Yeah, I'm a Microsoft MVP, so Most Valuable Professional. Uh, it means that I share lots of things in the community, like these kinds of things, but also blog posts. I talk a lot at conferences and meetups, um, yeah, so that's a bit it. I do the same thing for dbt actually. Uh, for dbt I also received the community award, um, and I lead the Belgian dbt meetup group. Cool. There's no dbt MVP, I guess? Not yet.
Speaker 1:Not yet, but if there was one, I know who got my vote. We'll see, we'll see.
Speaker 2:I actually got the community award. You got a community award?
Speaker 1:Ah, you got the community. Yeah, there you go, like last year, no, not already two, three years ago oh really, wow, time flies.
Speaker 3:Yeah, they are looking maybe at some kind of program.
Speaker 1:So okay, cool, wow.
Speaker 3:Yeah, I feel like, even if there's no official title, I think within dbt a lot of people know you. Yeah, yeah, since, uh, I founded the meetup group in Belgium, and, uh, the next meetup is going to be number 12. Each time a good amount of people join the meetups, um, and I do lots of things in the open source community there as well. Again, also for the, uh, relationship with Microsoft there: I developed the dbt adapters for all of Microsoft's databases, together with other people.
Speaker 3:Um yeah, so lots of people know me, and if they don't know me they are. Maybe I'm using my code.
Speaker 1:So yeah, and whenever you walk in a room, everyone sees you because you're pretty tall. Yeah, that's also make an entrance cool. And uh, dorian, for people that don't know you yet, would you?
Speaker 2:Yeah, so I'm Dorian. I'm a machine learning engineer at dataroots, for seven and a half years now. Um, one of the OGs. Yeah, one of the OGs. Things have changed, but I'm still here. I'm glad to be here, and it's the first time in the, in the dataroots, um, podcast, so that's fun. Yeah, I'm an AI tech lead, um, staffed at a client right now within the event business, and aside of that, well, not really aside of that, I'm also a dbt practitioner at the client, and we use dbt mainly for our machine learning platform. So it's kind of a different take than the typical data architect type of way.
Speaker 2:So or data engineer or analytics engineer.
Speaker 1:And maybe so if someone is listening to this and they never heard of dbt and they think is the the therapy? What is dbt? What's?
Speaker 2:The lowercase therapy? I don't know that one. What is the lowercase dbt? dbt stands for data build tool, and it's a transformation workflow for data. In a one-liner, it's a SQL templating engine that runs your SQL in order. Anything you want to add to that, Sam?
Speaker 1:Yeah, SQL on steroids, basically. Yeah, bringing software engineering best practices into the world of data. Yeah, yeah, indeed, it's really cool. And also, like, we're saying SQL, but, well, I don't want to jump the gun, but it's not just SQL, right? You can also have some Python transformations there.
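To make the "SQL templating engine" idea above a bit more concrete, here is a minimal sketch using plain Jinja2. This is an illustration only, not dbt's internals; the ref() resolver below is a stand-in for dbt's project graph, and the model and schema names are made up.

# Illustration only: a dbt model is SQL with Jinja placeholders; dbt resolves
# {{ ref(...) }} to real relation names and sends the rendered SQL to the warehouse.
from jinja2 import Template

model_sql = "select customer_id, sum(amount) as total from {{ ref('stg_orders') }} group by customer_id"

def ref(name: str) -> str:
    # dbt would look this up in its project; here we just prefix a schema
    return f"analytics.{name}"

print(Template(model_sql).render(ref=ref))
# -> select customer_id, sum(amount) as total from analytics.stg_orders group by customer_id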
Speaker 2:No, Yep, I would want to say that it's quite new, but already it's also three years old.
Speaker 1:Yeah, data standards Very, very cool. And also, did you have something with DVC? Because I remember you had the plush toy from DVC. I have the toy, you have the toy, but you don't have a special.
Speaker 2:No, I just won a DVC competition. It was a simple quiz and I got a nice. No, don't say simple quiz.
Speaker 1:It was like a super hard quiz. It was a super hard quiz, man. It was like so many steps. It's like only the best of the best got to the end. And you were. You were the one, only the best of the best that received a toy. So that's true, that's it then, game over. Yeah, um, very cool. So I thought about you both because you both actually conference speakers. Yeah, right, uh, actually you spoke in multiple conferences already, both of you and maybe actually sam, like you.
Speaker 3:Wanna just give some few highlights like of all the conferences or the conferences, yeah, and then we can. Uh, I mostly speak at microsoft's focus conferences or data focus conferences, um, lots of talks about fabric, dbt, with fabric, about dbt itself, um, I've been doing that for the last few years, uh, but I've always been into the community speaking thing with meetups and conferences I think for almost a decade now, so it's something I could do. I travel a lot people, seem, all over the world, mostly Europe, but but all the places on yeah not like you were, like Japan is like still bucket list song.
Speaker 1:Yeah, it was one time. It was pretty cool, it was nice, and you as well, not Dorian?
Speaker 2:Before music generation was indistinguishable from human music generation, I did some talks on that and also created AI music. But yeah, you can listen on Spotify, it's recognizable. Yeah, it doesn't pass the Turing test, like The Velvet Sundown. Yeah, and aside of that, I have also done dbt and Python and machine learning talks at SQL Bits, at PyData and at the dbt meetup as well. I was going to mention that, yeah. And most notably at the best of the best, meetup number three.
Speaker 3:They were number three, wow okay, one of the first few oh, wow.
Speaker 1:Well, it feels like it wasn't that long ago, but when you say three and everyone, 12, like it was already a while ago, yeah because you only do four or five per year. So wow, very cool, so ask your bits um sequel bits. Oh sorry, sequel bits. I was having this discussion with a cloud bits let's go bits, sequel, bits, sequel bits. Yeah, what is it? Well, how would you say it?
Speaker 3:I think most people there like the organizers at sequel bits, sequel beats but. I also had people heard. I also heard people saying ask girl bits yeah, yeah, but I think sequel bits are.
Speaker 1:What do you think, alex, what sounds nicer SQL? Yeah, sounds like a, like a sequel, you know, like a sequence of things like a you remind me of a squirrel as well squirrel. Okay, yeah, squirrel bits. So that's the last talk. Is that the last conference you guys, you guys presented? No, it was in well 2025, I think he was in uh june june, just june, yeah end of june, end of june.
Speaker 1:Um, yes, so this is already the I'm putting on the screen here for people that are curious about the next, next year's already. They're already announcing it. What is sql bits? What is the conference about?
Speaker 3:Well, it's a conference, like the name implies, for people who work in data. They are known to always focus on the Microsoft stack, so it's people working with data on the Microsoft stack, but it's not only data analytics. You also have lots of people there who are like DBA kind of profiles, working with SQL Server. They don't even have to work in the cloud, it can also just be SQL Server on-prem. Um, also NoSQL kinds of databases, like Cosmos DB or something. But the one thing everyone has in common there is they work with data in some kind of way and everyone understands SQL.
Speaker 1:But is it Microsoft exclusive or is it like most of the talks? Are Microsoft or is there like a?
Speaker 3:formal? I don't think so. They even have Microsoft as an official sponsor, but also Microsoft speakers there, and the conference is known to always have big announcements as well in the Microsoft data world. Like, Microsoft typically has their own conferences, Build and Ignite, in May and October, and now you have FabCon for Microsoft Fabric. But SQL Bits also had some announcements this year as well. Well, I don't remember exactly which ones, because they weren't really, uh, applying to what...
Speaker 3:Yeah, to what I do and so on. Uh, but yeah, I wouldn't expect talks there on Google Cloud or AWS or something, although AWS was a sponsor, because they also run SQL Server on AWS. Okay, interesting. There was a bit of a Databricks following as well, but much less, really.
Speaker 1:Mostly Microsoft. And I heard that there was a story, and I just wanted to... we didn't talk much before we started recording, but when you got there, Dorian, you didn't know what to expect. Is there something to it, or...?
Speaker 2:There is something to it, because beforehand I saw the talks, and my talk was entirely different from all the other talks. I was accepted here, yeah, but the SQL bit I also had in my talk, so that's why I got accepted.
Speaker 1:Okay, okay, very cool, very cool. So, and maybe before we start talking about your talks individually, anything that really stood out. So you mentioned there was an announcement from microsoft, but uh, yeah, like nothing that really touches, like any, any, any vibes, anything. How was the conference?
Speaker 2:It was a laid-back atmosphere, yeah, and also a bit of a fun atmosphere, so they really wanted to keep things light. Yeah, um, it was showcased by the party, uh, Friday night. Ah, there's a big party? Yes, and the party was organized in kind of a gaming hall, so you could go karting there, you had go-karts. Yeah, like, you know, usually when you go to a game hall and you see a game that you want to play, yeah, you have to wait for 30 minutes.
Speaker 1:Oh wow. Except the karting, that had a long wait. Yeah, I can imagine that was a popular item. Yeah, oh, very cool. And is this the first time you both went to this conference, or no?
Speaker 3:yes okay I heard about it because I think one of the biggest fans in europe uh around data, but never went before okay nice.
Speaker 1:Yeah, I've went to a few conferences as well and like sometimes you, you get the different vibes like some of them are more formal, more even salesy, like even comes a bit on the on the talks. Uh, some of them are like super, like developer, like long hair flip-flops, you know, like guys are like you know, so you get like it's, it's, it's interesting to see the yeah, like here on on friday night, for example.
Speaker 2:The theme, there's a theme every year. Yeah, this year's theme was neon. Yeah, that's what they were sharing there, so some people were entirely dressed in fluorescent colors at the party. Oh, really cool. I had something yellow, but it wasn't that fluorescent. It was coincidental, you know, it was the shiniest thing I could find in my wardrobe.
Speaker 1:Okay, yeah, very cool, very cool. So now on to the talks. So maybe starting with you, sam. What did you talk about?
Speaker 3:So it was literally about how dbt makes working with your Fabric lakehouses and warehouses better. Um, and this is really combining my two passions into one thing. Basically, I have like two versions, variants of this talk that I often give. I give one version to more dbt-minded people who don't know Fabric, about how Fabric is a great data platform, but here everyone at that conference already heard about Fabric, and it's a Microsoft-focused conference, so there it's more about what is dbt and what can it do for you.
Speaker 1:I see and I see it's a beginner, so it's more like introductory introductory like what, why, why would you care about this? And yeah, okay, very cool. Did you attend his talk, dorian? Yes, I did any feedback for him, did you?
Speaker 2:let's do it now live it was great like people were sitting next to me and they were like oh wow, did you hype him up like whoa?
Speaker 1:this is the MVP for Microsoft, man.
Speaker 2:I did, I did did you get his picture already probably at the end he had to sign, yeah, yeah.
Speaker 1:So everyone was like oh, so nice on a lot of inappropriate places.
Speaker 2:Oh wow, okay, okay mostly male audiences.
Speaker 1:Yeah, as usually it is with this kind of conference. That's right, um, okay, cool. So, and then, like so, double clicking a bit like you, you just kind of do an introduction to dbt. Do you also show a bit like a bit of code, like, um, how to get started with dbt? How would it look like? Open the id, show some things, or yeah, I, I think for dbt.
Speaker 3:Uh, so for my job I also do lots of workshops at customers, and these kinds of introductory workshops teach them how to work with dbt. And what really drives it home for me each time is if you show them how you can use it, what it can do, how simple it is to accomplish something, and then get all the extra benefits that dbt gives. Like, you just create some SQL code, a select statement, and then from that statement you can generate documentation easily, add tests and so on. Um, because, yeah, as a data professional you get lots and lots of pitches of data tools that can help you in some kind of way, but the only way to actually believe it is seeing how easy it actually is.
Speaker 1:So yeah there's always a combination of slides and a bit of hands-on that's really nice, yeah, I think, for I mean years ago as well, the first time the dbt clicked for me, I was at a client and we had a data analyst and he was kind of doing the same thing that dbt's would take care for you, but by hand, you know.
Speaker 1:Like he said, ah, we need this table, and then he would go, he would execute this query, then execute this query, then execute this query, and then kind of say, okay, now we have this right. But sometimes it gets really complex. Sometimes you have some logic, some like conditional stuff, and then I was like, ah, dbt really solves exactly this with plus, plus, like you said, like you have the documentation, you have all these things. And maybe for the people also that don't know, dbt, um, there's dbt labs and there's also dbt core yeah, and now fusion as well, and now studio and clouds can you, can you uh, explain a bit for people that don't know anything about, like get confused with all these terms like what should people know on?
Speaker 3:For this intro, yeah: so dbt by itself, uh, when people refer to it, is mostly the open source tool named dbt Core. Uh, this is something available on GitHub, and you install it as a Python package, and from then on you have a CLI command, so you can call dbt something-something, and the most common command you run is dbt run. It takes your SQL code and runs it basically on your data warehouse, so it doesn't do any transformations itself or something. It's basically a thing that reads code and sends it to the data warehouse. But then how it does all of that, this is where the magic comes in.
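For readers who want to see what "dbt run" looks like in practice, here is a hedged sketch. It assumes dbt-core 1.5 or later, which exposes a programmatic dbtRunner alongside the CLI; the model name is hypothetical.

# Hedged sketch: dbt Core can be invoked from Python as well as from the shell.
# The shell equivalents would be `dbt run --select my_model` and `dbt test --select my_model`.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()
res: dbtRunnerResult = dbt.invoke(["run", "--select", "my_model"])  # "my_model" is a hypothetical model name
if res.success:
    dbt.invoke(["test", "--select", "my_model"])  # run the tests defined for that model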
Speaker 3:And, uh, dbt Labs, previously known as Fishtown Analytics, is a consultancy company that originally built dbt because they saw a common need for this kind of tool at their clients, and they saw that this tool by itself became popular; in like 2020 was the first time it really picked up, and from there on they started growing dbt Core as a product. At some point dbt Labs stopped with their consultancy business for the most part, they still do it, but I don't think it's that much really, uh, and they built the dbt Cloud product as, uh, the source of income. So this has extra features, extra bells and whistles, things that you might need if you're working with dbt, things that most teams actually need when they work with dbt, to avoid that you have to build it yourself each time. Um, and the Fusion thing is very new.
Speaker 3:No, yeah, since uh still the name is like since may yeah yeah, and what is this fusion thing?
Speaker 1:I know it's in rust because that much I read I was like, nah, I know everything I needed to know. So it's fast.
Speaker 3:Yeah, lightning fast yeah, yeah, they really in the presentation. Uh, they used ferraris to uh showcase how fast it is.
Speaker 2:Wow, yeah, it was all about ferraris oh wow, there were actual ferraris on the podium no, it was a live uh screencast.
Speaker 1:Uh, but everywhere in the slides they were using ferrari emojis and so on as well, and images was like was it they actually use the ferrari brand or was like red cars red sports cars, oh yeah. Red sports cars, oh yeah red sports cars.
Speaker 3:Yeah, and they called it ferraris ah, they did so, okay, I was just wondering if they could you know, because it's like I am now ferrari might come back and exactly.
Speaker 1:Well, yeah, exactly how the money that they make, well, I think, yeah, I don't know, with the whole open ai stuff, training all the data, people, okay, I don't know if it's's well.
Speaker 2:What I mainly understood about Fusion is that, aside of being faster because it's written in Rust and not in Python anymore, it also allows you to compile, um, your SQL code and understand it, while before it was just, yeah, templated. So if something was wrong, you first needed to compile and then check whether you have syntax errors or not. Yeah, yeah, but now you would really get it, uh, while developing. So it would also enhance the developer experience.
Speaker 1:So you have like the hints right in your IDE, saying like this comes from this and this comes from that. I think I also saw, no, I did read more than just the Rust part, but I also saw that they had, um, even column lineage as well. So it's not only like this column comes from this table, but this column depends on these three columns of this table, which depend on these columns in that table. So it's really fine-grained as well. Yeah, they basically...
Speaker 3:Uh, you have this competitor to dbt, SQLMesh, which already was able to interpret your SQL code, and then, around the second half of last year, a company became more and more known, called SDF, which developed basically the same kind of tool, for me, but people are going to be angry if I describe it like this. But yeah, let's go with it.
Speaker 3:Um, the same kind of thing, but then in Rust and really 100% compatible with dbt, like a drop-in replacement: you just download the SDF binary, type sdf something, and it would do the same thing as dbt, but then better, because it understood your SQL. It can also say, instead of executing all of this and this and this, you only need to execute these small pieces of code, with more filtering applied to it. In the end, I think they said they could reduce your data warehouse costs by at least 10% and in some cases even 70% or something, because it just knows better what exactly it has to execute. It couldn't do that without understanding the SQL.
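This is not SDF's or Fusion's internals, but here is a small illustration of what "understanding the SQL", rather than just templating it, makes possible, using the open-source sqlglot parser. The table and column names are made up.

# Static SQL analysis: knowing which tables and columns a model touches,
# without running anything, is what enables pruning, lineage and cost savings.
import sqlglot
from sqlglot import exp

sql = "select o.customer_id, sum(o.amount) as total from analytics.stg_orders o group by o.customer_id"
tree = sqlglot.parse_one(sql)

tables = [t.sql() for t in tree.find_all(exp.Table)]
columns = sorted({c.sql() for c in tree.find_all(exp.Column)})
print(tables)   # roughly: ['analytics.stg_orders AS o']
print(columns)  # roughly: ['o.amount', 'o.customer_id']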
Speaker 1:And now the dbt fusion fusion. It's not open source, that's proprietary.
Speaker 3:It is, it is. It is open source, yeah well, partially. Yeah, okay, partially so like because russ compiles do a binary. They have some pieces which are closed source, but most of it is open source okay, okay.
Speaker 1:What are the things that are open source and what's closed source, do you know? Like, what are the features that you're missing if you just use the open source dbt Fusion?
Speaker 3:It's not fully known yet, since they only open sourced a very small part of it, and the plan is to open source the remaining parts by dbt Coalesce this October. So by then we will really see what is really open source. But what is definitely going to stay closed source: there is some kind of license check in it that's going to see if you bought a license for Fusion or not, because most features are free, but at some point they might introduce things which are paid add-ons. And then, yeah, I see, okay, very cool, very interesting.
Speaker 2:And then in your talk, dorian um well, one feature that dp fusion is still missing is the python modeling yes, so we haven't talked about that yet.
Speaker 2:I will not shift that fast to dbt Fusion because of it. Um, like, yeah, this Python modeling, it's something that started, as I said, three years ago, from version 1.3: next to SQL models, SQL transformations, dbt Core also allowed Python transformations. And then, depending on the infrastructure that you're working on, it could either be Snowflake, or it could be BigQuery, or it could be Databricks, it would basically, under the hood, run a PySpark job.
Speaker 1:Yeah, and that allows you to combine transformations between both SQL and Python, and it really made it more and more expressive. So the way I understand it, and correct me if I'm wrong, is like you have dbt, which is like a template engine for SQL. Normally that's how it started, right? So you'd always run it on top of the engine, which could be, um, like you said, Snowflake, BigQuery, etc., etc. Could also be like DuckDB or something, anything, Postgres. And then now...
Speaker 1:So every query would, well, that's what they call a model. But now you can also express these things not in SQL logic but in Python, and for your dbt project it just looks like it's another model there. Yeah, but under the hood it actually will run it in Python, because you cannot run Python on DuckDB, you cannot run Python on a SQL database, but then you would kind of spin up the resources that you need, and it depends a bit on where you are. I think Snowflake has Snowpark, which is their flavor of PySpark, I guess.
Speaker 2:I think on BigQuery, yeah, it's also a PySpark job, but it runs on Dataproc. Dataproc, which is like a serverless... Uh, it can be serverless, but it can also be a managed cluster.
Speaker 1:Okay, yeah, so it's not like every every engine that supports this as well. I think it depends.
Speaker 2:There needs to be a counterpart. Yeah, it really depends on the infra. Okay. For example, with DuckDB I think it's also possible, because DuckDB runs locally, but then you don't have the added benefit of being scalable. Yeah, well, you do have that when you run it on clusters, especially if they're serverless. Yes, well, it's nice to see the bill afterwards as well. Yeah, yeah, indeed.
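For reference, this is roughly what a dbt Python model file looks like. The model(dbt, session) signature is dbt's documented convention; what kind of dataframe you get back depends on the adapter (a PySpark DataFrame on Databricks or BigQuery+Dataproc, a Snowpark DataFrame on Snowflake). The upstream model name and columns are hypothetical.

# Roughly a dbt Python model, e.g. models/daily_features.py, assuming a Spark-backed adapter.
import pyspark.sql.functions as F

def model(dbt, session):
    dbt.config(materialized="table")
    df = dbt.ref("stg_matches")                      # an upstream SQL or Python model (hypothetical name)
    return (
        df.groupBy("team_id")
          .agg(F.avg("goals_scored").alias("avg_goals"),
               F.count("*").alias("matches_played"))
    )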
Speaker 1:Um, and how does this relate to your talk? So, as I put here on the screen your best bet effortless MLOps with dbt.
Speaker 2:Yeah, real tongue twister.
Speaker 1:Yeah. So what was your? It was a talk.
Speaker 2:No, it wasn't just a talk, it was a talk and also with a demo. So basically, what we've also implemented at the client is a machine learning platform that supports most of the MLOps principles, and most of those MLOps principles can be implemented with pure SQL and Python, basically. It's all you need if you understand the theoretical ideas behind it. And so in this talk I show a demo of how to use MLOps, where I do exactly that. I build a simulation where we're trying to beat the bookmakers in the past, so in 2016, and we do it day by day. So, like you would usually run machine learning batch pipelines at a company, day by day we make some bets, automated, and we evaluate, and yeah, okay, and it contains everything.
Speaker 2:It contains a feature store. It contains a prediction store, a model registry.
Speaker 1:What, what engine did you use for this? Was it on GCP, on Snowflake? This one was on Databricks. Databricks. And the feature store, is it Databricks-specific or, like... No, you could do exactly the same thing everywhere. So basically, and again, maybe MLOps, for people that have heard of MLOps but it's not super clear, how would this translate to MLOps?
Speaker 2:Yeah, so MLOps is the set of principles that one would expect a machine learning engineer to implement when they want to lift their machine learning model to production, to be sure that it's, um, yeah, basically future-proof and that it can be easily maintained and evaluated and monitored continuously, and also there's a whole set of software principles that go with it as well. And, yeah, so in this talk I show that with the main languages for data scientists, which are Python and SQL, you can basically implement most of those ideas. In dbt, in dbt, and I'm forgetting now the YAML that you have to write.
Speaker 1:YAML engineering? No, YAML, it's okay, use AI for that. The thing you just said, it can be AI generated, exactly.
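As a toy sketch of the day-by-day simulation described above (not the actual demo code), the core idea is a walk-forward loop: train only on data strictly before the current day, predict that day, and store the predictions. Column names and the threshold are made up.

import pandas as pd
from sklearn.linear_model import LogisticRegression

def walk_forward(features: pd.DataFrame) -> pd.DataFrame:
    """features: one row per match, with a 'match_date' column, feature columns and a 'home_win' label."""
    predictions = []
    feature_cols = [c for c in features.columns if c not in ("match_date", "home_win")]
    for day in sorted(features["match_date"].unique()):
        train = features[features["match_date"] < day]
        today = features[features["match_date"] == day]
        if len(train) < 100:          # wait for some history before "betting"
            continue
        model = LogisticRegression(max_iter=1000).fit(train[feature_cols], train["home_win"])
        today = today.assign(p_home_win=model.predict_proba(today[feature_cols])[:, 1])
        predictions.append(today)     # in the talk's setup this would land in a prediction store
    return pd.concat(predictions, ignore_index=True)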
Speaker 1:Uh, maybe, is there something that you still miss in dbt to do MLOps, let's say? You say it implements most of the MLOps things. Are there still rough edges? Are there still things where you're like, ah, the only thing that I would like to have, that I don't, that I think is essential or very, very important? Is there something like that that we cannot do?
Speaker 2:that's a great question and it does always require a bit of creativity to come up with the solution, but so far I always found a solution, so test me Okay. No, I'll trust you. Yeah, of course, like, what's lacking is the streaming, but so this is only for batch pipelines, it's not for online prediction.
Speaker 1:Okay, and then would you say that Databricks is a mature environment for DBT, python, modeling, mlops.
Speaker 2:I'd say so, but I just did a demo here and yeah, and also in the demo, like a lot of things that I don't do are security and so on.
Speaker 1:Yeah, that's true, that's true, um, maybe so, both of your demos actually. So you both did a demo. Is this something that people could actually look at the code or try themselves or something, or no?
Speaker 3:Not really, uh, yeah. Um, I have a repository on my GitHub, um, that's basically, um, yeah, the full Jaffle Shop in dbt, so the typical example people use, but then implemented with just the minor differences you need for Microsoft Fabric, and a script to just get started. It connects to my Azure account to download some CSV data, and then you can basically follow all dbt principles from that repository. So it's all there, and people can just clone the repo and kind of follow it step by step.
Speaker 1:Maybe you can even, I don't know, are they going to publish the talks as well later? Okay, so maybe people can even follow the tutorial, or the code, and listen to the talk and all these things. Yeah, and we'll add this, uh, to the show notes.
Speaker 2:Yeah, the repo there. Um, and yours, Dorian, as well? Yes, also, I did an earlier version of this exact presentation already at PyData, which is on YouTube, so you can check that one out, and the code should be reproducible.
Speaker 1:I like that. Yeah, it's a good framing.
Speaker 2:It should be. Yeah, there's now a free version of Databricks. So you can test it there and use the smallest cluster types and smallest compute types and just play a bit with it.
Speaker 1:Yeah, okay, very cool, okay, very cool, very cool. Now maybe on to the rest of the uh. So you attended, you presented, uh, I'm sure, both of what was well received. I know maybe for yours, dorian, did you? Did you attend his talk, sam?
Speaker 3:No, I couldn't make it on Saturday, I was only there on Thursday and Friday. But I did watch it. The attendees could watch the recordings themselves already. Ah, you could already? So I watched Dorian's recording on the train back to
Speaker 1:Belgium and you had a comment on the video like, wow, he's so hot the slides are really the slides are really beautiful.
Speaker 2:I thought, also your Ghibli team as well, but uh apparently, if you do in the open ai and you also generate an image it's just everything, everything is ghibli, yeah yeah, I feel like it died down a bit, but that was a big hype on the, I mean when they started it.
Speaker 3:But yeah, yeah, so yeah, I really liked it a lot. But I think the thing is that, for the conference audience, if you see the amount of people who are new to dbt, because that's always the first question I ask when people attend my talk, like, did you use dbt and so on, here this really builds upon: you already know dbt, and this is like bringing it to the next level. It's a sort of feature I'm working on for the Fabric adapter, to bring Python model support to it, and, uh, um, the thing Dorian presented is for me super interesting. I also had lots of questions for him in the past already about how he did it and implemented it at clients. But maybe for the conference audience it was a bit much, building on top of each other, not applicable to everyone. Yeah, because I saw your talk was labeled as intermediate as well, was it?
Speaker 1:uh, because I mean you have to know a bit about the bt to be able to talk about python models and also mlops, exactly. Also, if it's more data, did you feel like a? Uh, how do you say like the duckling?
Speaker 2:You know, the odd duckling there? Like, yeah, actually, uh, on Thursday I was at the Databricks training, and the presenter, the person in front, she asked, like, yeah, who's a data scientist here?
Speaker 1:And there were just two people raising their hands, the other one being part of the organization. Throwing stuff at them, yeah. Everybody was looking like, oh, that's a data scientist. Yeah, yeah, they literally tagged them, you know. It's like no one wants to play go-kart with them. Yeah, so then you were introducing yourself...
Speaker 2:I was like, I'm an MLOps engineer. Yeah, but yeah, so, um, my talk, there were not so many people in the room, but the ones that were in the room did ask a lot of questions afterwards. Sometimes that's nice as well, because I think you get more interaction. Yeah, and I think you had some big rooms as well.
Speaker 1:Yeah, uh, very cool. So you also attended partially the conference, at least. Um, what are other things, maybe starting with you, sam, what is some, what are? What are your?
Speaker 3:Maybe your favorite, or one of your favorite talks there? And maybe I'll pick one... I actually liked quite a lot of them, but the one that I liked a lot was, uh, the talk given by Andy Cutler. He's an MVP from the UK, with about similar interests as I have, and he talked about the whole thing that's going on with the battle of the table formats, the lake formats that you have today, with Delta Lake, Iceberg, Hudi. Which is the latest one?
Speaker 3:yeah, or if you can't cross table, like the apache x table thing, the one that merges them together, maybe that's okay. A bit late to the party. Yeah, yeah, yeah, well I don't know, but it's.
Speaker 3:Is it seen as a separate format? Because that's the one that brings them together. So he also talked about it, and, um, for me, I always knew about Delta Lake, since I've always been tied to Databricks and the Microsoft stack, and there it's basically the default one, while Iceberg, I think, is more popular with the people using AWS and Snowflake and so on. And Fabric, for example, supports Delta Lake and Iceberg to read, but to write only Delta Lake. Um, but it's interesting. For me, I always understood it like they're the same thing but implemented differently, that was what I had in my head, but he did show some things which are different in each one of them, and then he brought it together in the end with the XTable thing, where you can, by using that tool, read any of those three and also write to any of those three formats. Um, maybe, uh, you mentioned some things are different, it's not just implemented differently, can you maybe highlight one thing, just to make it a bit more concrete? For that I would say, uh, watch Andy's talk
Speaker 3:on YouTube. It gets really, uh, very technical at the end. Uh, also, it's a difficult topic to present, I wouldn't personally be able to give this, it's advanced level even. But he did really well to start from people who probably don't know about these formats and maybe only know CSV files or Parquet files somewhere. Yeah, he began by explaining, like, what is this concept of these lake formats, um, and um, why you need them, why they bring value, and then going into each one of them, explaining how they also evolved, because I think one of them came from Netflix and the other one also came from another big company, uh, and explaining what the timeline was like, what happened in this evolution.
Speaker 1:yeah, I know that. Uh, so I think for, maybe, maybe, for a lot of people, maybe it's a bit, it's a bit curious to know like there's a talk about tables. You know, it's like the people get excited about tables.
Speaker 2:Yeah, so with these names, like it's not clear that it's about tables.
Speaker 1:Yeah, indeed. But it's like if I, for example, tell my wife I was like, oh yeah, I went to this talk. It was really cool.
Speaker 3:They talk about the different table formats. You're gonna be like, you know like, but even the names like the battle of the lake house someone not in data, think a lake house. What is it a house? Yeah, lake. And then you have this delta lake iceberg.
Speaker 1:Yeah, I think. Uh, I understand, I mean I can understand a bit, because I mean I'm not, I'm not one of you.
Speaker 1:Let's say but I feel like I touched a bit, like I know for a client. A lot of the stuff was in Delta Lake and they wanted to migrate everything to Iceberg because they had external tables on Snowflake and they saw that external tables performance was almost the same as the native Snowflake tables. So there was like some excitement there. But the thing is also that's also the feedback they had is like every few years there's a new format that is like better and like if you say I'm gonna migrate everything every time a new format comes, then you're always migrating stuff.
Speaker 1:Yeah, you know, so it's a bit uh, yeah, it's a bit a bit challenging, right? You also need to say, okay, let's just yeah, this is not the best, but let's just live with this well, and this talk was um given um around the same time as duck lake was released.
Speaker 3:I think he didn't include it in his talk, obviously, because there wasn't enough time anymore. I think he did mention it, um, but I think, like, the whole circle is going around again: we came from one giant database or data warehouse, then to decentralized things on data lakes, and maybe... But before, like, for the DuckLake...
Speaker 1:You said, yeah, what is DuckLake? Because I think it's from the DuckDB people.
Speaker 3:No, yeah, so one of the problems you have with these lake formats is there is no central place to know which tables you have. Like, every table you have is stored in a folder, and then that folder contains a set of Parquet files, and then, depending on if you use Delta Lake, Iceberg or Hudi, it will have JSON files next to it, and the schema of those JSON files is actually what makes them different and how they can be used. But the problem is that you need some kind of catalog, which on Databricks is Unity Catalog, on Fabric it's a built-in hidden thing, Snowflake has Polaris, I think, and there are lots of open source Iceberg catalogs. And this catalog tells the thing that's reading or working with the data: look, you can find the table that we call with this schema and this table name in this location.
Speaker 3:On storage, on cloud storage, which can be anywhere, S3 or ADLS or something. And the problem is, like, these catalogs are often difficult to configure and to set up. The people who build DuckDB thought, okay, let's make a database and store this information in this database, and then we'll have a uniform way to talk to all of your data, located wherever it is. But, like, if we came from one big data warehouse and then we went to storage formats on data lakes, now we're back to a database, because the database is necessary to let us know where the data is stored. Like, maybe the next evolution goes back to the database.
Speaker 1:It's a pendulum right, you just kind of keep swinging.
Speaker 2:And I think like one thing that these formats add as well is because basically in a Parquet file if you want to do any updates, you need to read the entire Parquet file and then to rewrite, you need to rewrite the entire Parquet file.
Speaker 1:It can be slow, and I think a lot of these formats also try to address that input/output problem, in different gradations. Yeah, and I think, I mean, we're saying this, but I think the impact is really big as well, right, like, when you have the amount of data we actually have today. So I think also, when I have these discussions, it's a good reminder of how almost everything is really a rabbit hole. Things that you kind of take for granted, you know, but if you really look at it and you start dissecting it, there's so much to it.
Speaker 3:Yeah, you really have to use one of these table formats. Like, if you haven't heard about them and you're working with a data lake, this is your cue, you have to investigate Delta Lake or Iceberg.
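To make the "folder of Parquet files plus a transaction log" idea above concrete, here is a small, hedged sketch using the open-source deltalake (delta-rs) Python package on a local path. It is illustrative only; the path and column names are made up, and the exact update API may differ between package versions.

import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Writing a Delta table produces a folder of Parquet files plus a _delta_log/ of JSON entries.
write_deltalake("/tmp/events_delta", pd.DataFrame({"id": [1, 2], "status": ["new", "new"]}))

dt = DeltaTable("/tmp/events_delta")
print(dt.files())       # the underlying Parquet files sitting in the folder
print(dt.version())     # versions come from the JSON transaction log

# An update rewrites only the affected files and appends a new log entry,
# instead of you rewriting whole Parquet files by hand.
dt.update(predicate="id = 2", updates={"status": "'processed'"})
print(dt.to_pandas())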
Speaker 1:Skip away, nobody's using it, you heard it here first. Yes. Um, so maybe we can move on to Dorian's first, uh, pick. And I'm not sure I should say first pick, because I don't think this was a talk, actually. No, it was a workshop.
Speaker 2:That was indeed a day of training. Yes, I actually did my demo in data bricks, but that was my only experience with data bricks until that uh moment. So I thought like, okay, let's, let's get to know it a bit better. And that's why I attended this, because it was interesting, and it's actually there where I learned what Iceberg and Delta Lake were, because before that it was also just words. Just words, right, Just like yeah, Fancy sounding words which are hard to explain.
Speaker 1:Yeah, yeah, that's the newest one, so I think that's the best one. Let's just go with that. Yeah, so the title of the workshop was Data Engineering with Databricks. How long was this workshop? From nine to five Like a beautiful workday yeah exactly, and you go home to your kids afterwards With free coffee and free lunch. Well, it's kind of like coming to the office.
Speaker 2:Not a free lunch.
Speaker 1:On Fridays maybe. So Data Engineering with Databricks. So, aside from Delta Lake and Iceberg, what else did you talk about this?
Speaker 2:Like it started with the UI, really started from the basics and then there were like some From like zero to hero, kind of thing.
Speaker 2:Yeah, and the thing that struck me the most is how Databricks is really organized around notebooks, which I found a bit funny at the same time. But yeah, notebooks are great for explaining things. So we went from the start, and then also a bit of PySpark, but in the end, since it was a SQL audience, everything was done in SQL. Yeah, on notebooks, still on notebooks. Oh wow. Yeah, what I learned and what I liked a lot was actually the Unity Catalog, which allows you to really open any file type very simply. Yeah, that's one thing that I liked about it, also that I finally learned what these, uh, secret words actually meant. Yeah, you always felt like you were left out of the party. Yeah, it was like...
Speaker 2:It was like getting excited about... It was like, oh, they talk about Unity, and then Delta, like, I didn't know, it was like, yeah, word, word, Unity, yeah, togetherness. Yes, and we also did some transformations, some table merges, updates and so on, and also learned a little bit about what happens under the hood. What I found interesting is that, under the hood, what happens automatically by Databricks is they optimize the way that you store your data. There's actually an OPTIMIZE SQL statement in Databricks.
Speaker 1:Oh wow.
Speaker 2:But they actually do it under the hood, when they see that the files don't have the same size, when the data is skewed, stuff like that. So it was really interesting, and it was hands-on as well, with exercises.
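For context, the OPTIMIZE statement mentioned here can also be run explicitly. A hedged sketch, assuming a Databricks environment with a Delta table; the table and column names are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()   # on Databricks this returns the existing session
spark.sql("OPTIMIZE events ZORDER BY (event_date)")        # compact small files, cluster on a column
spark.sql("DESCRIBE HISTORY events").show(truncate=False)  # inspect what happened under the hood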
Speaker 1:That's really nice, and maybe Databricks runs on top of AWS or Azure, right? It's not like a separate cloud, I think right.
Speaker 3:Or GCP as well.
Speaker 1:Or GCP as well, but they use the actual hardware from another cloud provider. Is that?
Speaker 2:right? Not sure. I'm not sure which cloud provider they used, because but you need one right.
Speaker 1:It's not like Databricks have their own servers. Maybe, after having done this, like when would you go for databricks versus something that the actual clouds already offer?
Speaker 2:that's a great question. I think what's nice and about databricks is that it's just for an engineer for data engineer or machine learning engineer just one place to work I don't need to move out of. So that's the nice part. And I think where databricks is heading right now is that they also kind of want to open it up to more data analysts and more business type of roles and they try to open it up to to that and so a bit like the other platforms as well.
Speaker 1:I see, is the the breadth of services. Is it as wide as as like Azure or AWS, or is it more?
Speaker 3:specialized? I think it's coming there. So they started from Spark, and then they first built a whole notebook experience and Spark jobs. Then they added Delta Lake, which is made by Databricks, they are the creators of it. And then they had Databricks Workflows and jobs to schedule things, so basically you can shut down your Airflow and take their thing instead. There's also some kind of dashboarding tool in Databricks, if I'm not mistaken, multiple even, and I think they're slowly building up their stack so that you only have one place to go to.
Speaker 3:Uh, because what they usually were lacking is that ingesting data into your cloud was not possible yet, but I think last year they acquired a company and are now slowly integrating that technology into Databricks itself, so that's also possible. And then I think they need to do some polishing work to make it easier for less tech-savvy people to start working with Databricks. I always had the impression that Databricks has a name of focusing really on the engineers who want perfection and the best tool that can do everything they want, um, and for that it's probably still the best tool today as well, if you focus on really, uh, good engineering work. Yeah. Um, and maybe one last question before we move on to the next pick: uh, notebooks for Databricks.
Speaker 1:Is this uh yay or nay?
Speaker 2:I think for deployment practices it's a bit strange, because it's harder to test a notebook than to test code. So with that, for me it's a bit of a nay, but there's also a yay side of it, of course. I also love proofing concepts first in notebooks. It's intuitive, you can explain things along the way in a clean way.
Speaker 1:For maybe workshops as well, Workshops perfect.
Speaker 2:But yeah, there was also this workflow thing that you could do, and I think they announced at their summit that they were going to make it also drag and drop.
Speaker 2:Ah, okay, yeah, and that's also then based a bit around notebooks and so on. So that's a bit yeah.
Speaker 3:I see. And what about you, sam? Yeah, it's a common theme. Like all the examples you see on Microsoft, fabric and Databricks are always notebook focused.
Speaker 1:Even on AWS, I think.
Speaker 3:A lot of examples are notebooks, notebooks. I think my next pick is really about that, about that people learn notebooks, but then they don't know anymore that notebooks are basically something that came from making exploration easier, but actually you're not supposed to put this into production. Yeah, it's really to explore your data, but you probably have to write a Python package to run your Spark jobs. But it's good for some use cases, like Databricks also has these AI notebooks where you can talk natural language and then it does the magic behind the scenes to run your queries.
Speaker 2:For that it's great. But, like, for your data pipelines, don't do it. Yeah, I think, uh, luckily there's dbt on Databricks.
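A hedged sketch of the "package, not notebook, for production" point made above: a Spark job written as an importable, unit-testable module and submitted as a job rather than run as a notebook. All names are hypothetical.

# e.g. my_pipeline/job.py, run with spark-submit or as a Databricks job task
from pyspark.sql import DataFrame, SparkSession
import pyspark.sql.functions as F

def transform(orders: DataFrame) -> DataFrame:
    # pure function over DataFrames, easy to unit test with a tiny local SparkSession
    return orders.groupBy("customer_id").agg(F.sum("amount").alias("total_spent"))

def main() -> None:
    spark = SparkSession.builder.getOrCreate()
    orders = spark.read.table("raw.orders")          # hypothetical source table
    transform(orders).write.mode("overwrite").saveAsTable("analytics.customer_totals")

if __name__ == "__main__":
    main()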
Speaker 2:Now, I didn't mention it in your workshop?
Speaker 3:I was wondering because?
Speaker 2:all the people.
Speaker 1:She knows dbt quite well, the trainer you had. Yeah. Um, yeah, there's even like the notebook engineers and all these things. But I think, even for Databricks, if you actually inspect the notebook artifacts, it's not like a normal Jupyter notebook. It's like there's a Python counterpart and something else. It's kind of nice. Because I remember one time people said, like, okay, I'm working on Databricks and now I want to know a bit how to version these things a bit better.
Speaker 1:And I know you worked a bit on this, and when I started looking at it, it's like, it shows as a notebook in the UI and everything, but if you really look at the artifacts, it's very different from what Jupyter has. And I think it's maybe people trying to mutate the notebook a bit, so it's a bit less of this exploration-focused tool. But then in the end you kind of have something a bit weird, like very few people actually use it like this, and some people use it for other things, because there were plugins, some tools to work with Jupyter notebooks, which are basically JSON, but you couldn't apply those to the Databricks notebooks. This was some years ago as well, and maybe I'm misremembering, but I think today you can, on most platforms, export and import from and to ipynb files.
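For reference, this is (to my knowledge) roughly what the Databricks "source format" artifact looks like: a plain .py file where special comments carry the notebook structure, instead of the JSON used by .ipynb. Cell contents here are hypothetical.

# Databricks notebook source
# The first line above marks the file as an exported Databricks notebook;
# cells are separated by the COMMAND markers and markdown cells use MAGIC comments.

# COMMAND ----------

# MAGIC %md
# MAGIC ### Example markdown cell

# COMMAND ----------

df = spark.read.table("samples.nyctaxi.trips")   # hypothetical cell content; `spark` exists on Databricks
display(df.limit(10))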
Speaker 1:Indeed. But I'm also thinking, like, oh yeah, maybe the ipynb, right, but I'm wondering if, like, the cell outputs and all these things, is there a conversion? Is there something that gets lost in translation? For me it was always a bit like... because you see that a lot, you know. Nowadays you have Marimo, which is like a different type of notebook, where you have reactive cells.
Speaker 2:What does that mean? To have a reactive cell?
Speaker 1:So, like, if you have, like, X is equal to two, and then in the following cell you have, like, Y is equal to X plus two: in Jupyter notebooks you just execute, and then you execute, and whatever the state is there, like, there's nothing. But now, if I change the X value in the previous cell, the bottom cell will automatically execute, and it doesn't need to be in order, right, like if you execute first the last cell and then the first cell. There's this dependency, it will keep track of that, yeah.
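Roughly what a Marimo notebook is on disk, as a sketch of the reactivity just described: a plain Python file where each cell is a function and the dependencies are the function arguments, so changing x re-runs the cell that computes y.

import marimo

app = marimo.App()

@app.cell
def _():
    x = 2
    return (x,)

@app.cell
def _(x):
    y = x + 2     # re-executed automatically whenever x changes
    print(y)
    return (y,)

if __name__ == "__main__":
    app.run()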
Speaker 2:So it will keep telling you like, yeah, it's not enter and runnable?
Speaker 1:Yeah, exactly, but every time. So you can turn it off, but by default it's on. So every time you change something, anything that depends on it will automatically execute, so there is no hidden state or something. That's one way to try to solve it. But, like, Marimo is also another tool. Again, notebooks have an ecosystem, so if you want to open it up in VS Code and you want to use your AI coding assistants, you could, and Marimo wasn't working very well there. So there's this whole... it becomes more complicated, right, because Jupyter notebooks have been very popular for a long time. But yes, now maybe for your second pick, and I think you already hinted a bit towards it: A Junior Data Engineer's Story of Success and Struggle. Wow, yeah, so I really liked the talk. It's an intermediate talk.
Speaker 3:Huh, interestingly yeah, I get why.
Speaker 3:Um, but also not too many people in the audience, I think maybe maximum 10, 15 people or something, um, which does not for me mean anything about the quality of the talk, and I really liked it a lot because it reminded me of things I faced myself.
Speaker 3:I also, uh, joined dataroots, like I said, five and a half years ago, and came from a background in software engineering and cloud engineering and so on, and, um, all the data engineering was new to me, but I already knew what the best practices in software engineering were. And here, Amy graduated five years ago, studied AI at a university, I don't remember which one, and then did a job interview for a data engineer's role. She thought she wouldn't get it, and then in the end they made an offer. So she joined, and in her talk she talks about all the things she learned along the way about data engineering, because it's such a broad field, like, you cannot ask five data engineers what data engineering is and get the same answer, it will always be something very different for everyone. Um, but she did, uh, go over the things like we also discussed.
Speaker 3:Like, you should not just know about notebooks, but also know that there is something called Spark jobs and Python packages, and that you can write unit tests for these things, and you should actually test your code. She was a software engineer before? No, she just came out of school, but it's impressive how fast she learned everything, uh, to a very high level of understanding already, and, um, her talk really explains the roadmap to becoming a data engineer today and everything she learned along those, uh, five years. And, uh, lots of talks at these kinds of conferences show that people actually don't know about Spark jobs and Python packages and do everything in notebooks, because that's the only thing that exists, they think. Yeah, I feel like there's a lot of, uh, when you see demos, it's always notebooks as well, so I feel like people get really bombarded with notebooks.
Speaker 1:I think it's also easy for you to say like, yeah, I can, it's working, let's move on to the next thing yeah, it's all like the, the practices we know from software.
Speaker 3:They were forgotten quite often. Uh, she even went into things like behavior-driven testing, behavior-driven development, which is something I also learned at school and then never applied again. Um, but in when I had learned it, I also understood the value of it and I hoped that at some point I could use it in my career. But, um, these are things people never make time for and actually we we should do that a bit more. Uh, so it was a great overview of all the things you might have missed, uh, along the way. And uh, like, basically, like I told you at the end, like people would come to me today I want to become a data engineer. What do I learn?
Speaker 1:I'm just going to send them this link. Very cool. And, uh, you would say this is more intermediate because it touches on a lot of different software engineering concepts, and, like, you can maybe follow it as someone that is just starting off, but to really appreciate how this resonates, you have to have a bit of experience? I think so, because it's only one hour, or a bit less even, and there are so many topics she touched upon that you really have to take your time to absorb them and look them up if you don't know them already.
Speaker 3:For me, I already knew, I think, about everything that was mentioned in the talk, but I saw with other people in the audience, um, that lots of things were new. Like, for example, one thing she mentions is dbt, but I give an entire talk about dbt to help people understand it, yeah, and here there is only one or two minutes. Um, same with data quality testing, like Soda, Great Expectations, these kinds of things.
Speaker 1:So because that much I think it's intermediate, because you have to be able to follow yeah, I see, I see, I see, I see sounds like a very uh, and it's a personal anecdote, let's say it's her journey it's her journey from graduating to uh.
Speaker 3:I think she works in the Netherlands at Info Support, as a senior data engineer now, and it's her story from those five years. Really cool, really cool.
Speaker 1:Very curious as well. Um, you didn't watch this talk, right, dorian?
Speaker 2:no, no I think it was, was it on a friday?
Speaker 3:I don't remember.
Speaker 2:I think it was the Friday. Just, Friday I took a day off and I just watched them. Nice. Um, and maybe for your second pick now, Dorian: Automating Engineering with AI. Yeah. But the funny thing was, so it was mostly all SQL Server and a lot of SQL-based talks, but here and there there were sprinkles of LLMs and AI and so on. Yeah, and, like, we're trying to stay away, but you can't, it's always around the corner. To look relevant as an organizer, you need to put it in a bit. Yeah, and so this was a bit of, like, um, a motivational doomsday talk, as I would call it.
Speaker 1:Can we pause a bit on "motivational doomsday"? How does that sound?
Speaker 2:Yeah, well, so it's actually also funny to see kind of the difference. So my talk was at nine in the morning, right after the party. So, okay, like 20 people, everyone's hungover, yeah. But then, okay, maybe it's because, um, it's Saturday morning, 9:00 AM, but then the next talk at 10 AM, uh, from, I think his name was Simon Whiteley, this guy, this talk, yeah, like, okay, the room was packed. Uh, he's very well known within...
Speaker 3:Uh, yeah, I think I saw him before as well yes, uh, I think the most popular youtube channel about data engineering oh yeah yeah, because I mean I'm not super into I mean I know microsoft.
Speaker 1:I don't follow as closely, but I think I I've seen him before as well.
Speaker 3:Yeah, he's also database mvp and microsoft mvp oh, yeah, okay, yeah, cool.
Speaker 2:I didn't know that, but he does seem like it. You met a legend. It really seemed like he was used to giving this type of talk and so on and being an influencer, and he started really controversially, with big white letters on a black background: data engineering is dead. Oh, really? That got everyone's attention.
Speaker 1:Yeah, great, the Matthew effect. So what was the talk about, automating engineering with AI?
Speaker 2:Yeah, exactly, like how should you use it? And that was, I think, the most interesting takeaway, which also made a lot of sense. What I remember from the talk, which I found the most interesting, was: okay, it's clear, everybody should at least use some AI in your work to enhance your efficiency, to improve your workflow and to automate the boring parts like testing and data quality tests and documentation, to some degree. And he also said, like, okay, um, now just use it. Everybody's going to use it, and if you don't use it, you will lose your relevance. So apply it as fast as possible, as much as possible, but with a conscious mind.
Speaker 2:And he was also very interested in what was going to happen in five to ten years. Why? Because right now, all juniors, junior data engineers, and it's actually a nice follow-up on your talk, yeah, a lot of new junior engineers will use a lot of AI in their work, but maybe to stay up to speed, to remain, like, fast coders, they will just apply it. To be productive, exactly. They will just apply what the machine tells them and they will run it, and they might not take the time to understand what they're running. And that's okay for now, for quick demos and things that need to be put in production immediately, but the backlash will only come in five years or something, when everybody's fixing those bugs.
Speaker 2:I see. Like, okay, what did the machine write? Yeah, and there we kind of have a gap, um, like usually between five and ten years you become a senior engineer. But the juniors from today, if they apply maybe too much AI coding, will they become seniors? Will they understand what they wrote? Will they understand all the...
Speaker 1:concepts? I see, so it's like it's going to hinder their actual learning. Yeah, like they won't have the expertise because they didn't grind through it, like they didn't understand why they were doing this.
Speaker 2:Um what? For what are the reasons?
Speaker 1:Yeah, I think it's a bit of schooling, yeah. Maybe the best or worst scenario is, like, maybe in five years the machines are so good that you don't care. You know, maybe it's just like, in five years, it's just like, hey, Claude, fix it. You know, I don't care, like, shut up, just do it. Yeah, that's a possibility, indeed.
Speaker 2:Yeah, how do you write new code then? How do you write new ideas? That's true. How?
Speaker 1:do you create new frameworks? This is like a snake that eats its own tail. You know, just ask Claude to do this, and then it trains on that, and just, like, just, uh... No, it's interesting. But maybe a question as well. So the things you mentioned, like testing and documentation, uh, actually the name of the talk is Automated Engineering with AI, so not Data Engineering. But I was going to ask: is it different from software engineering in general? Like, is there anything specifically about data engineering that may change with the rise of AI, or is there a new skill or something else that data engineers will need to know about AI? And when we say AI here, we're talking about Gen AI.
Speaker 2:I think it's mainly about writing DDL or schema generation and so on.
Speaker 1:So this is the top of the world.
Speaker 2:Those were examples that he had given in his talk of where, like, okay, don't do this yourself anymore, yeah. Or also maybe, like, grinding performance tuning and so on. A lot of times this now happens really at the infra level. For example, I gave the OPTIMIZE example in Databricks: you don't need to care about it anymore. Those types of data engineering intricacies, they disappear more and more since the cloud architectures are already taking care of it for you. Yeah, and do you believe with his...
Speaker 1:uh, well, not believe, but do you agree with his concern, let's say with the five to ten year horizon?
Speaker 3:But I think it also helps people to learn, no? For me, I use Copilot Pro all the time, so it's suggesting things in my code all day long. And, um, I think yesterday I was coding something and I wanted to profile my code in Python, which I never did before, and I just created a function, like, profile this function, and it auto-completed the whole function for me. But it also triggered me to look up, like, okay, what are these, uh, functions it's calling within the Python cProfile library, and what are they doing, why exactly is it calling them? And I can also, like, highlight pieces of code and say, explain this to me, and so on. For me, it's way faster to pick up new things as well that way, but maybe it's also because I've already been into this for quite a while. I don't know how junior developers, lead engineers, would approach this.
Speaker 3:Maybe they would say, well, looks good, let's roll with it. But I would think there are still people like me who would have this interest of understanding exactly what was generated. It's like, do you copy-paste from Stack Overflow or do you actually read the answer?
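(For anyone who wants to try what Sam describes here, a minimal sketch of profiling a function with Python's standard-library cProfile module could look like the following. The slow_sum workload and the top-10 cutoff are placeholder assumptions for illustration, not what Copilot actually generated for him.)

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    # Placeholder workload; swap in the function you actually want to profile.
    return sum(i * i for i in range(n))


def profile_function(func, *args, **kwargs):
    """Run func under cProfile and print the slowest calls by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
    stats.print_stats(10)  # show the top 10 entries
    print(stream.getvalue())
    return result


if __name__ == "__main__":
    profile_function(slow_sum, 1_000_000)
```

Asking the assistant to explain what runcall, sort_stats and print_stats actually do is exactly the look-it-up reflex Sam is talking about.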
Speaker 2:you have a great reflex, huh. But I think I'm not sure everybody has the same reflex and I think also stakeholders in general they might, due to the rise of ai, they might expect, like you have ai now you're like yeah, it should be super easy to implement this right Like why is it taking so long?
Speaker 1:Why is it?
Speaker 2:taking you so long.
Speaker 1:Yeah, yeah, and that kind of puts a pressure on you, and if you don't have much time to try to understand what you're doing, if you don't get the time, you skip it. I think, well, I think one of the things is the attitude of the person, even in schools and stuff, right? Like, do you actually want to learn or do you just want to get it done, right? So I do think that's one thing. And I hope, I mean, I also think it's a super good learning tool, right? Like, even if it says something wrong, just to kind of be thinking critically, talk about it. One thing that I started doing more with AI is just, like, I have a plan, I have an idea: just criticize my plan, just ask me questions, so I can be like, oh yeah, I didn't think of that. Because sometimes it's very easy to start and be like, okay, I have it figured out, but as soon as you take the first step, you're like, oh, but actually there's this, actually there's that. So, super helpful, I definitely see the value. But I think it can be damaging when people just try to get it done and they don't care about what it is, like.
Speaker 1:And I also think the other thing is, if you're a junior, maybe the amount of things you need to learn, like when you look at the end solution, the amount of things you'd actually need to look up, is too much for people to actually do it. And I think for you, because you have more experience, it's like, it's just this one thing, so yeah, I'll look into it. But if someone, let's think of the talk that you mentioned, with all the breadth of things that data engineers need to do, if some AI codes something that touches a bit on five or six different things, for it to really click, for you to understand and appreciate why we're using this and why we're not using that, I think that may be too much sometimes as well. So I'm not saying I disagree with you.
Speaker 1:I do think there is a very good opportunity for learning, but I also think that the people that will benefit the most are, one, the ones with the best attitude, and two, and I think it's like with any learning, yeah, like to have a step that is a bit taller than what you're used to, so you're a bit uncomfortable, but not so tall that you cannot climb, right? Like you always need to find that sweet spot, you know. Like even when I was learning Rust and I wanted to play with projects, you know, it's like, I want to do something a bit more challenging, but if it's something super complicated, I'm like, there's no way I'm gonna do it, right? So I think there's a bit of that, so it'll be interesting to see. And the other thing I would also say is that now, I think you're talking about AI-assisted coding, but if you go one step further, to vibe coding... let's say, yeah, the word hadn't been mentioned yet.
Speaker 1:That's it. Uh, I think if you take one step further there, I also think there's a bit of a skill in how, like, how do you ask things, or how do you, you know, instruct things, or what are the things where you actually need to slow down. Or even the thing of, okay, if you have a greenfield project and you just want to start something, maybe you just want to ask, like, just criticize my plan a bit, let's come up with the actual plan and not just get started with doing. Because if you just get started, it will go super fast, but if you don't have a good plan or if you don't think things through, it's very easy to steer off one way or another.
Speaker 3:But isn't vibe coding an intermediate phase that we're going through? I really like GitHub's project, it got lots of criticism as well, but in the end it's, I think, a good idea: you create an issue, you assign it to GitHub Copilot and it starts creating a plan to implement it, and in the end, after a few minutes, sometimes it can be a lot of minutes, it has a pull request for you that implements that feature. And, like, now it got on the first page of Hacker News because Microsoft was doing this and, um, they used it to write the next version of the .NET SDK and the framework and so on. And you really saw the engineers struggle, commenting on the issue: no, you got it wrong, Copilot, this and this and this, you haven't listened. It's like, do it again, or do it better, and so on. No, you got it wrong again.
Speaker 3:And people were upset. Like, is this the next version of .NET that we'll have to use to build our software? Is this AI-generated code? Like, can I rely on this? But they're only doing this because you have to dogfood it and improve it. But I do have faith that at some point this will become good enough.
Speaker 1:Like, today you can vibe code in some kind of way, uh, and get quality results, um, but I think we're only at the first phase of this. And for sure, I do think, and that's what I mean, like, again, things will evolve as well, right? But for example, the issues, right: is the issue very well structured on what you need to do, or is it just saying, like, give me this? Because that's also very different, right? Is it a big feature or is it a small feature? Like, in the beginning I was also just saying, okay, I need to create an app that does this. First, like, first set up the database and this, okay, now do that. Just kind of breaking it down a bit more already helps, you know.
Speaker 1:So I think there are a lot of things and, again, the tooling is going to get better. There's Claude Code now, so it's on your terminal. There's also a Gemini CLI. I also think there's maybe a bit of a... I was actually talking to Bart about it, and he said that he likes Claude Code, but it's not because it's better than the IDE for any reason, it's just because it helps you focus on what matters.
Speaker 3:Yeah.
Speaker 1:You know, like you have like your human context, you know if you have too many files, too many things, like no, just focus on this, just like one thing at a time, you know. So I think it's like I do think there is a bit of a skill on how much detail you want to add. Also to slow down. I think sometimes it's so easy to move fast with these things that you just give like two sentences, boom something, boom something and then like you stop and like whoa, there's so much crap here, like where is this going?
Speaker 2:right, it's a dopamine shot. I like I just write one sentence and I get like an entire app. Yay, exactly you know.
Speaker 1:So I think, I think there's a bit, I also think there's a bit of a skill there as well, um. At the same time, like, again going back to what we're discussing and the junior engineers, they may have, like, a skills gap in the core, let's say, in what it's actually doing. I also think that we need to learn how to leverage these tools better, and some of the stuff is going to be good, some of the stuff is not going to be good, um. But I think, and I think we all agree, that if you're not using this, you're gonna fall behind, I mean far behind, in many different ways, right? You're gonna be less productive, um, I think learning as well, right, like you can learn way more things now. Um, so, yeah, also, yeah, I was even mentioning before, like, I started playing with Claude Code, um, and sometimes, yeah, sometimes I'm multitasking, right.
Speaker 1:So I said, Claude, do this, and then it just does something and I go check this, check that and go back. No, no, you're wrong. What I mean is this, this and this and this. Okay, try, okay, go back. I was like, okay, but now I'm looking, I look at the directory, there's too many things, this is too complex, organize this. And it does a good job, right. But then I was thinking, okay, that's because I have multiple tasks, right? If I'm just doing this, maybe I can open two terminals and I can open two branches on the same git and just say, okay, you do this, you do that. And then, like, maybe I can open four, you know. And I even heard a podcast where the guy was saying that's just his job now, he just runs three or four of them. Something like that can't be good for your brain. Not really, like, it's already with the current generation and TikTok scrolling that they don't have any attention span, but here you're switching between tasks, like, every second.
Speaker 1:Yeah, but it's really like, maybe, and I was even looking, like, okay, git has, like, worktree things, so you can actually open two branches on the same repo, like that, and then try to merge it afterwards. And then my brain was, like, spinning, you know. Um, I think you're trying to be too productive.
Speaker 3:People scroll on Reddit while they wait for Claude.
Speaker 1:Usually you don't have to wait, but I just think, like, sometimes if you just have one, like, you have the Claude terminal and it's just a terminal, right, there's no files, you just say something and you're just like, okay, okay, let me check, no, okay, you know. And sometimes that in-between is enough for me to get bored, yeah, and, like, maybe I'll forget. And I think maybe if I have a bit more interaction, then it would be the sweet spot for me.
Speaker 2:But doesn't Claude Code ask your permission before it runs commands? Isn't that the perfect...
Speaker 1:I already, I already said just just do it. Just do it, I mean not for everything, but like for a lot of stuff.
Speaker 1:That's the interactivity part, though, yeah, true. But sometimes it's like, can I add a dependency? Can I run uv add? I'm like, man, just add whatever you want. Like, if you're gonna delete something, let's talk about it, right, but for a lot of stuff, you can just reset in git, you can always reverse it. Yeah, but it doesn't git commit.
Speaker 1:So I actually put it in the instructions: try to commit as often as possible. I try to add pre-commit hooks as well. So it's like, not too much, but, like, I don't even know what pre-commit hooks it added, because I put Ruff in, but I said, add some linting rules but don't be too specific. You know, just add something, and it did something, you know. But I was also wondering how much of that is like coaching a, not even that junior, like, a developer, you know. You just go, ah, do this, and then I'll review your code after a while, and you do that, you know, and then, like, let's see how it's going.
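(As a rough illustration of the kind of setup described here: a plain git pre-commit hook that runs Ruff over the staged Python files could look like the sketch below, saved as an executable .git/hooks/pre-commit. This is only an assumed shape; the episode doesn't show the hooks that were actually generated, and many projects would use the pre-commit framework's Ruff hook instead.)

```python
#!/usr/bin/env python3
# Hypothetical .git/hooks/pre-commit: lint staged Python files with Ruff
# and abort the commit if any violations are found.
import subprocess
import sys

# Names of staged files (added, copied or modified).
staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
    capture_output=True, text=True, check=True,
).stdout.split()

py_files = [path for path in staged if path.endswith(".py")]

if py_files:
    # Ruff exits non-zero when it reports violations, which blocks the commit.
    sys.exit(subprocess.run(["ruff", "check", *py_files]).returncode)
```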
Speaker 1:Like, there was also another article that Bart shared as well. They were drawing the parallel between being a manager of people and being a manager of agents, you know, which a lot of the time is a bit the same. You have to be more specific, you have to add more context, you have to explain better what you're trying to do to really get the right output, right? And that really stuck with me, you know. So now, instead of leading a team of three junior developers, you're leading agents. And I'm saying kind of junior, because the knowledge that it has is actually quite advanced already.
Speaker 1:Like you said, you can actually learn from the AI thing.
Speaker 1:So it's not really that junior, it's just like, it's like someone super knowledgeable, but it's usually bad at, like, the bigger picture, yeah. And I think sometimes, too, it's bad at asking questions, right? It's like someone that is super shy, that doesn't want to ask any questions, but has a lot of knowledge, yeah, right. And then, like, okay, do this, okay, try that, okay, do this, you know. Or like, let's stop and let's think together: is this what we want to do, where do we want to go next? You know, so I thought it was gonna be not exciting, and then, yeah, you need to do it because otherwise you're falling behind, but it's like you're not doing something you enjoy, right. But I also hear a lot of people saying that they get, like, addicted to this, like they're doing this and then they have to force themselves to stop, to go eat, to go take a shower, to go to bed, right. Which never happened to me with this yet, but has happened with me programming, because I was, like, trying to solve a problem.
Speaker 2:I get really sucked in, um, but I'm trying to to play more with it and that's the so you want to get to that space where you don't sleep anymore.
Speaker 1:That's kind of or like that.
Speaker 3:But it's just like I enjoy this so much that, like I just get addicted to it, you know and I also heard from uh vitaly, and I think maybe also in one of your podcasts, that it's nice to be able to have someone that you can discuss your code with. Yeah, like, typically you work by yourself and you're the one working on this feature and you just have this project you're working on, or something, or this task. You have nobody to discuss it, or you would have to start explaining to a colleague when you're stuck, but yet they also have their own tasks to compete, so they don't have all this time for you to to listen. Um, so it's nice to be able to talk with someone else even if it's virtual agents about what you're doing, and I think that's maybe the addictive part, that's, you don't feel alone anymore, or something no, that's not like you don't play a sad song.
Speaker 1:Do we have any sad tunes? Just uh, I don't know.
Speaker 3:Thank you, I don't feel alone anymore. But, like, you typically have the rubber duck, which is popular of course, right behind you as well.
Speaker 1:So, because it's known that if you explain your problem to the duck, even, or to another person, you will progress in your understanding of your task and you might get unblocked or something. Yeah, I think this is like rubber ducking, but on steroids, yeah, because you actually have something that says something coherent back to you, yeah. And I think sometimes what happened to me as well was, like, even if the AI was off, like, yeah, that's definitely not it, but it pointed to a part of the code that I didn't think about. So, you know, just because it doesn't give you the right answer doesn't mean that it doesn't give you anything useful, yeah, right. So that as well, like, yeah, indeed. And also, the other thing is, these models have been getting much better in the past, like, year.
Speaker 2:Like the difference is crazy but there is kind of like a threshold that how good they can get.
Speaker 1:For sure, yeah, but I think now what we see is that the tooling around it is getting much better, right. So I was even showing, again before we started recording, that if you work with Claude Code, they actually have, like, a markdown file, basically kind of a system prompt, CLAUDE.md, yes, exactly. And, um, if you say, do this, this and this, they actually, I think, update that file or another one with to-dos, so they kind of simulate memory like this. So it's like they have a to-do with a whole bunch of checkboxes and then they kind of check them off, so it keeps track of all the tasks. I know that with Claude Code they spin up different shells within your terminal, so it's like they kind of have a multi-agent kind of thing, but it's still sequential, right, it still needs to do A, B and C. So the tooling around it is getting much better as well, right.
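(To make the checkbox idea concrete: the pattern being described is essentially a markdown file full of "- [ ]" items that gets rewritten as tasks complete. A toy sketch of that bookkeeping, purely an assumed illustration and not Claude Code's actual implementation, could be:)

```python
from pathlib import Path

TODO_FILE = Path("TODO.md")  # hypothetical task list the agent keeps updating


def add_task(description: str) -> None:
    # Append an unchecked markdown checkbox item.
    with TODO_FILE.open("a", encoding="utf-8") as f:
        f.write(f"- [ ] {description}\n")


def complete_task(description: str) -> None:
    # Rewrite the file, flipping the matching item from "[ ]" to "[x]".
    lines = TODO_FILE.read_text(encoding="utf-8").splitlines()
    updated = [
        line.replace("- [ ]", "- [x]", 1) if description in line else line
        for line in lines
    ]
    TODO_FILE.write_text("\n".join(updated) + "\n", encoding="utf-8")


if __name__ == "__main__":
    add_task("set up the database")
    add_task("add the API endpoints")
    complete_task("set up the database")
    print(TODO_FILE.read_text(encoding="utf-8"))
```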
Speaker 3:Right, though there's an argument that it would take away your, um, possibility to go on coffee breaks, or your multitasking. It's still too slow to have a rapid interaction with it.
Speaker 1:Indeed, and that's why I feel like maybe if you just scale it, like, just get four of those, now maybe, maybe that'll be enough. And what about your token usage? Because, no, I haven't gotten there yet. Like, I just had this idea, I was like, oh yeah, maybe I can do this, you know. Because I think, like, Claude Code can still be quite expensive.
Speaker 1:I have heard stories about people, like, I think you need to have the, like... So, yeah, it can explode if you enable, like, the usage-based pricing. But if you just have the subscription, then it just runs out and it says, um, come back in a few hours. But I think you can, like, and I haven't, again, I never reached the limit yet because, like I said, I was still multitasking, I wasn't running two agents or anything, but I would expect that you can spend at least, like, two, three hours there. Actually, if you have two of those agents, you can have concurrency issues and you can have race conditions. Uh, what do you mean? Like, for example, if you say to one agent, yeah, fix this code, and you ask the second agent something on the same app, they both write code, yeah. But that's what I was wondering, like, is it racing the other one?
Speaker 1:That's why I was looking, I was like, okay, let's, uh, let's scope it into branches. That's why I like the git worktree thing, like a work directory, like you can have two branches on the same machine. And then it's like, really? Yeah, you can actually, the way, when I looked it up, git worktree, you can specify, like, a new directory path and then you create, like, a branch, or a version of a branch, in that directory, and then you can have two things open on two different branches. Okay, but that means that still, those two tasks that you divide between the agents, they cannot have any intersection.
Speaker 1:And in what they have to do. Ideally, ideally not, right? But, like I was thinking, it's the same thing as working with developers. If you have two people working on two features, conflicts, exactly, you can have a merge conflict.
Speaker 2:But those are two features. Features are by definition different, but sometimes there's an overlap right.
Speaker 1:Sometimes you touch the same files, and that's the thing. Like, my idea is to also ask Claude, like, hey, we want to do this, can you give me steps? How much of this can we parallelize? And then just try to spin it up and see if we can work like this. So it's like, it sounds fun, you know, indeed. Then let's see how it goes. Um, cool. Uh, anyway, I know we talked a lot about AI, and I think every podcast we end up there in the end.
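(For reference, the git worktree setup discussed over the last few exchanges, two branches of the same repository checked out into separate directories so each agent edits its own copy, could be scripted roughly as below. The branch names and paths are invented for the example; the speakers describe the idea, not this script.)

```python
import subprocess
from pathlib import Path

REPO = Path(".")  # run this from inside the main checkout


def add_worktree(path: str, branch: str) -> None:
    # "git worktree add -b <branch> <path>" creates a new branch and checks it
    # out in a separate working directory, so a second agent or terminal can
    # work on it without touching the main checkout.
    subprocess.run(["git", "worktree", "add", "-b", branch, path], cwd=REPO, check=True)


if __name__ == "__main__":
    # Hypothetical split: one agent works on the API, another on the docs.
    add_worktree("../agent-api", "feature/api")
    add_worktree("../agent-docs", "feature/docs")
    # List the worktrees to confirm; merge the branches back later as usual.
    subprocess.run(["git", "worktree", "list"], cwd=REPO, check=True)
```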
Speaker 1:I can imagine. It is what it is. Do you have any other comments, any other things you would like to share? Maybe I have a question: SQL Bits 2026, would you recommend it to someone that is considering going? It depends on the profile, I can imagine, so I won't ask it like that. Like, what kind of profile is the best fit for SQL Bits, um?
Speaker 2:sql bits from data engineers to dbas.
Speaker 3:Okay, let's say that spectrum, okay. Yeah, I would agree that it's quite a large spectrum, in the end everyone working with data, um. Data science is indeed a bit underrepresented, but maybe they want to improve that. In the end, maybe after they saw your talk, everyone's like, whoa, this is the place to be.
Speaker 2:Yeah, the next year yeah, let's hope those 20 people they really spread the word yeah yeah, super spreaders.
Speaker 3:But it's a really nice event because it's really focused on, um, like you saw in those few talks, being grounded in reality. It's focusing on talks that tell actual experience. People talk about things they did. It's not sales pitches or marketing talks or something. It's really like, this is what I did, this is how it helped me, or this is how I got lost with this and this is my conclusion. And you learn a lot. And they also even have these talks which are non-technical.
Speaker 3:I really liked one talk from billion di on how to ask questions which is something we all have to do in our job quite often yeah and he didn't mention anything technical, but just still very interesting. So I like the conference because of the talks that they select are typically something you will learn from and you're not there to get overwhelmed with lots of marketing pitches.
Speaker 1:Very cool. Sounds like a nice vibe. Sounds like it's polished, let's say, because there's still some sponsor stuff, but not, like, salesy or too corporate or something like this. It sounds like it's a nice thing, yeah, very cool.
Speaker 2:Yeah, also like um, I will actually change my word. I recommend it also to data scientists to change their perspective a little bit and so that we don't live in silos anymore that is true.
Speaker 1:That's also the point. Yeah, all of us together, like, yeah, like unity, let's be a Unity Catalog. You're JSON, you're CSV, you're Parquet.
Speaker 2:We can all live together let's just be friends.
Speaker 1:That's it indeed, alrighty, dorian Sam. Thank you very much. I had a great time. Yeah, me too, thank you.
Speaker 3:You have taste.
Speaker 2:In a way that's meaningful to software people.
Speaker 3:Hello, I'm Bill Gates. I would recommend TypeScript.