DataTopics: All Things Data, AI & Tech
Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. DataTopics is your go-to spot for relaxed discussions around tech, news, data, and society.
Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!
#89 SQLBits Unfiltered: dbt in Fabric, MLOps in Action & Copilot in Question
In this episode, we're joined by Sam Debruyn and Dorian Van den Heede, who reflect on their talks at SQLBits 2025 and dive into the technical content they presented. Sam walks through how dbt integrates with Microsoft Fabric, explaining how it improves lakehouse and warehouse workflows by adding modularity, testing, and documentation to SQL development. He also touches on dbt Fusion’s SQL optimization features and how it compares to tools like SQLMesh.
Dorian shares his MLOps demo, which simulates beating football bookmakers using historical data, showing how to build a full pipeline with Azure ML, from feature engineering to model deployment. They discuss the role of Python modeling in dbt, orchestration with Azure ML, and the practical challenges of implementing MLOps in real-world scenarios.
Toward the end, they explore how AI tools like Copilot are changing the way engineers learn and debug code, raising questions about explainability, skill development, and the future of junior roles in tech.
It’s a rich conversation covering dbt, MLOps, Python, Azure ML, and the evolving role of AI in engineering.
Introduction to Data Topics
Speaker 1Hello and welcome to DataTopics, your casual corner of the web. Today we discuss all things SQLBits. My name is Murilo and I'll be hosting you today. Behind the screen, as always, is Alex. Hey, Alex! And I'm happy to be joined by two special guests. We have a returning guest, Sam. Hello! And we have a first-timer. Is this your first time on a podcast ever?
Speaker 2No.
Speaker 1I have done a podcast before.
Speaker 2Which podcast? The Iron Orton from the TET? Ah, really? Ah, that's true. Yeah, it's from Contest, and now I need to look up what the name was for the other one. No, it doesn't really matter.
Speaker 1It's okay, it's okay. So he's a podcast veteran already, Dorian. Can we get an applause for him? There we go. It's been a while. She doesn't use this for the serious ones, so she doesn't know.
Speaker 3Maybe, Sam, can you introduce yourself for people that don't know you yet? Yeah. I've been working at Dataroots now for about five and a half years. Today I mostly work as a data and cloud architect for our customers, designing and implementing data platforms in a very hands-on way, and I mostly focus on the Microsoft stack, which you'll also notice during this podcast today. Cool.
Speaker 3You also have a special relationship with Microsoft? Yeah, I'm a Microsoft MVP, Most Valuable Professional, which means that I share lots of things with the community: things like this, but also blog posts, and I speak a lot at conferences and meetups. I do the same thing for dbt, actually: for dbt, I also received the community award, and I lead the Belgian dbt meetup group. Cool. There's no dbt MVP, I guess? Not yet.
Speaker 1Not yet, but if there were one, I know who'd get my vote. We'll see, we'll see.
Speaker 2He actually got the community award.
Speaker 1You got a community award? Ah, you got the community award, there you go. Like, last year? No, already two or three years ago. Oh really? Wow, time flies.
Speaker 3Yeah, they are maybe looking at some kind of program.
Speaker 1So okay, cool, wow.
Speaker 3Yeah, I feel like, even if there's no official title, a lot of people within dbt know you. Yeah, since I founded the meetup group in Belgium. I think the next meetup is going to be number 12, and each time a good number of people join. I also do lots of things in the open source community there. Again, tied to the relationship with Microsoft, I developed the dbt adapters for all of Microsoft's databases, together with other people.
Speaker 3So yeah, lots of people know me, and if they don't know me, maybe they're using my code.
Speaker 1And whenever you walk into a room, everyone sees you because you're pretty tall. Yeah, that's also how you make an entrance. Cool. And Dorian, for people that don't know you yet, would you introduce yourself?
Understanding DBT Fundamentals
Speaker 2Yeah, so I'm Dorian. I've been a machine learning engineer at Dataroots for seven and a half years now. One of the OGs, yeah. Things have changed, but I'm still here. I'm glad to be here, and it's my first time on the Dataroots podcast, so that's fun. I'm an AI tech lead, currently staffed at a client in the event business, and as part of that I'm also a dbt practitioner there: we use dbt mainly for our machine learning platform. So it's a different take than the typical data architect, data engineer or analytics engineer way of using it.
Speaker 1And maybe, if someone listening has never heard of dbt and thinks it's the therapy: what is dbt, the lowercase one?
Speaker 2I don't know the therapy one. dbt stands for data build tool, and it's a transformation workflow for data. In one line: it's a SQL templating engine that runs your SQL in the right order. Anything you want to add to that, Sam?
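To make "a SQL templating engine that runs your SQL in order" concrete, here is a toy sketch in Python. It is not how dbt is implemented; it only mimics two ideas: a model's SQL refers to other models through a Jinja-style {{ ref('...') }} call, and those references form a dependency graph that decides the run order. The model names, the regex-based rendering and the "analytics" schema are all hypothetical; real dbt renders full Jinja and resolves refs to the database and schema configured in your project.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical project: model name -> SQL using dbt-style {{ ref('...') }} calls.
MODELS = {
    "stg_orders": "select id, customer_id, amount from raw_orders",
    "stg_customers": "select id, name from raw_customers",
    "customer_revenue": (
        "select c.name, sum(o.amount) as revenue "
        "from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id "
        "group by c.name"
    ),
}

# Matches {{ ref('model_name') }} with optional whitespace; captures the name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def dependencies(sql: str) -> set[str]:
    """Model names referenced through ref() in one model's SQL."""
    return set(REF_PATTERN.findall(sql))


def compile_sql(sql: str, schema: str = "analytics") -> str:
    """Replace every ref('x') with a concrete relation name such as analytics.x."""
    return REF_PATTERN.sub(lambda match: f"{schema}.{match.group(1)}", sql)


def build_plan(models: dict[str, str]) -> list[tuple[str, str]]:
    """Return (name, compiled SQL) pairs, each model after everything it references."""
    graph = {name: dependencies(sql) for name, sql in models.items()}
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    order = TopologicalSorter(graph).static_order()
    return [(name, compile_sql(models[name])) for name in order]


if __name__ == "__main__":
    for name, sql in build_plan(MODELS):
        print(f"-- {name}\n{sql}\n")
```

The two staging models come out first and customer_revenue last, with its refs replaced by concrete table names; that ordering is what saves you from running queries one by one by hand.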
Speaker 1Yeah, SQL on steroids, basically: bringing software engineering best practices into the world of data. Yeah, indeed, it's really cool. And also, we keep saying SQL, but, well, I don't want to jump the gun, it's not just SQL, right? You can also have some Python transformations there.
Speaker 2Yep. I'd want to say that it's quite new, but it's already three years old.
Speaker 1Yeah, by data standards. Very, very cool. And also, did you have something with DVC? Because I remember you had the plush toy from DVC. I have the toy, you have the toy, but you don't have a special...
Speaker 2No, I just won a DVC competition. It was a simple quiz and I got a nice. No, don't say simple quiz.
Speaker 1It was a super hard quiz, man. It had so many steps. Only the best of the best got to the end, and you were the one, the best of the best, who received a toy. So that's it then, game over. Very cool. So I thought about you both because you're both actually conference speakers, right? You've both spoken at multiple conferences already. And maybe, Sam, do you want to give a few highlights of the conferences?
Speaker 3Yeah. I mostly speak at Microsoft-focused or data-focused conferences: lots of talks about Fabric, about dbt with Fabric, and about dbt itself. I've been doing that for the last few years, but I've been into community speaking at meetups and conferences for almost a decade now, so it's something I keep doing. I travel a lot, all over the world: mostly Europe, but other places too. Not like you, you were in Japan, which is still on my bucket list.
Speaker 1Yeah, it was one time. It was pretty cool, it was nice. And you as well, no, Dorian?
Speaker 2Before music generation was indistinguishable from human music, I did some talks on that and also created AI music. You can listen to it on Spotify; it's recognizable, it doesn't pass the Turing test like The Velvet Sundown. Aside from that, I've also done dbt, Python and machine learning talks at SQLBits, at PyData, and at the dbt meetup as well. I was going to answer that, yeah. And most notably at the best of the best: number three.
Speaker 3He was at number three. Wow, okay, one of the first few. Oh, wow.
Speaker 1Well, it feels like it wasn't that long ago, but when you say three and now you're almost at 12, it was already a while ago, because you only do four or five per year. So, wow, very cool. So, S-Q-L Bits. Sorry, "Sequel Bits". I was having this discussion: S-Q-L Bits or Sequel Bits? How would you say it?
Speaker 3I think most people there, like the organizers at SQLBits, say "Sequel Bits". But I also heard people saying "S-Q-L Bits". Yeah, but I think "Sequel Bits" is more common.
Speaker 1What do you think, Alex, what sounds nicer? "Sequel" sounds like a sequel, you know, like a sequence of things. It reminds me of a squirrel as well. Squirrel Bits! Okay. So, is that the last conference you guys presented at? It was in 2025, in June, I think? Yeah, end of June.
Speaker 1Yes, and I'm putting it on the screen here for people that are curious: they're already announcing next year's edition. So what is SQLBits? What is the conference about?
Speaker 3Well, it's a conference, as the name implies, for people who work with data. It's known for always focusing on the Microsoft stack, so it's people working with data on the Microsoft stack, but it's not only data analytics. You also have lots of DBA-type profiles working with SQL Server. They don't even have to work in the cloud; it can just be SQL Server on-prem, or NoSQL databases like Cosmos DB. But the one thing everyone there has in common is that they work with data in some way, and everyone understands SQL.
Speaker 1But is it Microsoft-exclusive, or is it just that most of the talks are about Microsoft? Is there like a formal rule?
Speaker 3I don't think so. They even have Microsoft as an official sponsor, and also Microsoft speakers there, and the conference is known to always have big announcements in the Microsoft data world as well. Microsoft typically has its own conferences, Build and Ignite, in May and October, and now you have FabCon for Microsoft Fabric. But SQLBits also had some announcements this year. I don't remember exactly which ones, because they didn't really apply to what I do.
Speaker 3But I wouldn't expect talks there on Google Cloud or AWS, although AWS was a sponsor, because they also run SQL Server on AWS. Okay, interesting. There was a bit of a Databricks following as well, but much less, really.
Speaker 1Mostly Microsoft. And I heard there was a story. We didn't talk much about it before we started recording, but when you got there, Dorian, you didn't know what to expect? Is there something to it?
Speaker 2There is something to it, because beforehand I saw the talks, and my talk was entirely different from all the other talks. I was like, how was I accepted here? But I also had the SQL bit in my talk, so that's why I got accepted.
Speaker 1Okay, very cool. And maybe, before we start talking about your talks individually: was there anything that really stood out? You mentioned there were announcements from Microsoft, but nothing that really touched your work. Any vibes? How was the conference?
Speaker 2It was a laid-back atmosphere, and also a fun one: they really wanted to keep things light. That was showcased by the party on Friday night. Ah, there was a big party? Yes, and it was organized in a kind of gaming hall, so you could go karting there. Karting? Yeah. You know, usually when you go to a gaming hall and you see a game that you want to play, you have to wait for 30 minutes.
Speaker 1Yeah, you have to wait for 30 minutes. Oh wow. Except for the karting, that was a long wait. Yeah, I can imagine that was a popular item. Oh, very cool. And was this the first time you both went to this conference?
Speaker 3Yes. I'd heard about it, because I think it's one of the biggest conferences in Europe around data, but I'd never been before. Okay, nice.
Speaker 1Yeah, I've been to a few conferences as well, and sometimes you get different vibes. Some of them are more formal, even salesy, and that comes through a bit in the talks. Some of them are super developer-oriented: long hair, flip-flops, you know. So it's interesting to see. Yeah, like here on Friday night, for example.
Speaker 2The theme: there's a theme every year, and this year's theme was neon. That's what they were showing there, so some people were entirely dressed in fluorescent colors. Oh, really cool. At the party I had something yellow, but it wasn't that fluorescent. Was it a coincidence? No, it was the shiniest thing I could find in my wardrobe.
Speaker 1Okay, very cool. So now on to the talks. Maybe starting with you, Sam: what did you talk about?
Speaker 3So it was literally about how dbt makes working with your Fabric lakehouses and warehouses better. This really combines my two passions into one thing. I have two variants of this talk that I often give. One version is for more dbt-minded people who don't know Fabric, about why Fabric is a great data platform. But everyone at that conference had already heard about Fabric, since it's a Microsoft-focused conference, so there it's more about what dbt is and what it can do for you.
Speaker 1I see, so it's a beginner talk, more introductory: what is it, and why would you care about it? Okay, very cool. Did you attend his talk, Dorian? Yes, I did. Any feedback for him?
Speaker 2Let's do it now, live: it was great. The people sitting next to me were like, oh wow.
Speaker 1Did you hype him up? Like, whoa, this is the Microsoft MVP, man.
Speaker 2I did, I did. Did you get his autograph already? Probably; at the end he had to sign, yeah.
Speaker 1So everyone was like oh, so nice on a lot of inappropriate places.
Speaker 2Oh wow, okay, okay mostly male audiences.
Speaker 1Yeah, as is usually the case with this kind of conference. Okay, cool. So, double-clicking a bit: you do an introduction to dbt. Do you also show a bit of code, like how to get started with dbt and what it looks like? Open the IDE, show some things?
Speaker 3Yeah, I think for dbt... For my job I also do lots of workshops at customers, these kinds of introductory workshops to teach them how to work with dbt. What really drives it home for me each time is showing them how you can use it, what it can do, and how simple it is to accomplish something, and then you get all the extra benefits that dbt gives. You just write some SQL, a SELECT statement, and from that statement you can easily generate documentation, add tests, and so on. As a data professional, you hear lots and lots of pitches for data tools that can help you in some way, but the only way to actually believe it is to see how easy it actually is.
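dbt's built-in generic tests such as unique and not_null work on a simple principle: each test compiles to a query that selects the rows violating the expectation, and the test passes when that query returns zero rows. The sketch below imitates that idea with Python's built-in sqlite3 module; the table, columns and helper names are hypothetical, and real dbt generates these queries from the tests you declare in YAML.

```python
import sqlite3

# Table and column names are trusted identifiers in this sketch (no user input).


def not_null_sql(table: str, column: str) -> str:
    """Select the failing rows: those where `column` is NULL."""
    return f"select {column} from {table} where {column} is null"


def unique_sql(table: str, column: str) -> str:
    """Select the failing rows: non-NULL values of `column` that occur more than once."""
    return (
        f"select {column}, count(*) as n_records from {table} "
        f"where {column} is not null "
        f"group by {column} having count(*) > 1"
    )


def run_tests(conn: sqlite3.Connection, tests: dict[str, str]) -> dict[str, int]:
    """Run each test query; the value is its number of failing rows (0 means pass)."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in tests.items()}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table customers (id integer, name text)")
    conn.executemany(
        "insert into customers values (?, ?)",
        [(1, "Ann"), (2, "Bo"), (2, "Bo again"), (None, "Nobody")],
    )
    results = run_tests(conn, {
        "unique_customers_id": unique_sql("customers", "id"),
        "not_null_customers_id": not_null_sql("customers", "id"),
        "not_null_customers_name": not_null_sql("customers", "name"),
    })
    for name, failures in results.items():
        status = "PASS" if failures == 0 else f"FAIL ({failures} rows)"
        print(f"{name}: {status}")
```

With the sample rows, the unique and not_null tests on id each report one failing row, while the not_null test on name passes.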
Speaker 1So there's always a combination of slides and a bit of hands-on. That's really nice. I think, years ago, the first time dbt clicked for me, I was at a client where we had a data analyst who was doing the same things dbt would take care of for you, but by hand, you know.
Speaker 1He'd say, ah, we need this table, and then he would execute this query, then this query, then this query, and then say, okay, now we have it, right? But sometimes it gets really complex: you have some logic, some conditional stuff. And I was like, ah, dbt solves exactly this, plus, like you said, you get the documentation and all these other things. And maybe for the people that don't know: there's dbt Labs, there's dbt Core, and now Fusion as well, and now Studio and Cloud. Can you explain a bit, for people who get confused by all these terms, what they should know for this intro?
Speaker 3Yeah. dbt by itself, when people refer to it, is mostly the open source tool named dbt Core. It's available on GitHub; you install it as a Python package, and from then on you have a CLI command, so you can call "dbt something". The most common command you run is "dbt run". It takes your SQL code and runs it on your data warehouse, so dbt itself doesn't do any transformations. It's basically a thing that reads code and sends it to the data warehouse. But how it does all of that is where the magic comes in.
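The point that dbt "reads code and sends it to the data warehouse" can be sketched like this: after compilation, each model is just a SELECT that gets wrapped in DDL for its materialization (for example a view or a table) and executed by the database. Python only orchestrates; all the data processing happens inside the engine. This minimal sketch assumes an in-memory SQLite database as a stand-in warehouse and hypothetical, already-compiled models; real dbt materializations are adapter-specific and handle many more cases (incremental models, transactions, renames).

```python
import sqlite3

# Hypothetical compiled models, already in dependency order:
# (model name, materialization, compiled SELECT).
COMPILED = [
    ("stg_orders", "view", "select id, customer_id, amount from raw_orders"),
    ("customer_revenue", "table",
     "select customer_id, sum(amount) as revenue from stg_orders group by customer_id"),
]


def materialize(name: str, kind: str, select_sql: str) -> list[str]:
    """Wrap a SELECT in simplified DDL for the requested materialization."""
    if kind == "view":
        return [f"drop view if exists {name}", f"create view {name} as {select_sql}"]
    if kind == "table":
        return [f"drop table if exists {name}", f"create table {name} as {select_sql}"]
    raise ValueError(f"unsupported materialization: {kind}")


def run(conn: sqlite3.Connection, compiled: list[tuple[str, str, str]]) -> None:
    """Send each model's DDL to the database; the engine does the actual work."""
    for name, kind, select_sql in compiled:
        for statement in materialize(name, kind, select_sql):
            conn.execute(statement)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table raw_orders (id integer, customer_id integer, amount real)")
    conn.executemany("insert into raw_orders values (?, ?, ?)",
                     [(1, 10, 5.0), (2, 10, 7.0), (3, 20, 1.0)])
    run(conn, COMPILED)
    print(conn.execute("select * from customer_revenue order by customer_id").fetchall())
```

Dropping and recreating each object keeps reruns idempotent, which is a simplification of what a real table materialization does.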
Speaker 3And dbt Labs, previously known as Fishtown Analytics, started as a consultancy company that originally built dbt because they saw a common need for this kind of tool at their clients. The tool became popular by itself; around 2020 was the first time it really picked up, and from there on they started growing dbt Core as a product. At some point they mostly stopped their consultancy business (they still do it, but I don't think it's that much) and built the dbt Cloud product as their source of income. It has extra features, extra bells and whistles: things that most teams actually need when they work with dbt, so that they don't have to build them themselves each time. And the Fusion thing is very new.
Speaker 3Yeah, since May, I think. And what is this Fusion thing?
Speaker 1I know it's in Rust, because that much I read, and I was like, okay, I know everything I need to know. So it's fast.
Speaker 3Yeah, lightning fast. In the presentation they really used Ferraris to showcase how fast it is.
Speaker 2Yeah, it was all about Ferraris. Oh wow, were there actual Ferraris on the podium? No, it was a live screencast.
Speaker 1But everywhere in the slides they were using Ferrari emojis and images as well. Did they actually use the Ferrari brand, or was it just red sports cars? Oh yeah, red sports cars.
Speaker 3Yeah, and they called them Ferraris. Ah, they did. Okay, I was just wondering if they could, you know, because Ferrari might come back at them. Exactly.
Speaker 1Well, yeah, exactly, with the money that they make. Well, I don't know, with the whole OpenAI stuff, training on everyone's data... I don't know.
Speaker 2What I mainly understood about Fusion is that, aside from being faster because it's written in Rust and not in Python anymore, it also compiles your SQL code and understands it, while before it was just templated. So to find out whether you had syntax errors, you first needed to compile and then check. Now you get that feedback while developing, so it also enhances the developer experience.
Speaker 1So you get the hints right in your IDE, saying this comes from this and that comes from that. And I also saw (okay, I did read more than just "Rust") that they have column-level lineage as well. So it's not only "this feature comes from this table", but "this feature depends on these three columns of this table, which depend on these columns in that table". It's really fine-grained. Yeah, they basically...
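Column-level lineage, as described here, is a graph of columns rather than tables. Fusion derives it by actually parsing your SQL; the sketch below skips the parsing entirely and assumes the lineage edges have already been extracted into a hypothetical LINEAGE mapping, then walks that graph to find every upstream column a given output column depends on.

```python
Column = tuple[str, str]  # (model, column)

# Hypothetical lineage edges, as a SQL-aware tool might extract them:
# each column maps to the columns it is computed from.
LINEAGE: dict[Column, list[Column]] = {
    ("customer_revenue", "revenue"): [("stg_orders", "amount")],
    ("customer_revenue", "name"): [("stg_customers", "name")],
    ("stg_orders", "amount"): [("raw_orders", "amount_cents")],
    ("stg_customers", "name"): [
        ("raw_customers", "first_name"),
        ("raw_customers", "last_name"),
    ],
}


def upstream_columns(column: Column,
                     lineage: dict[Column, list[Column]] = LINEAGE) -> set[Column]:
    """Every column that `column` depends on, directly or transitively."""
    seen: set[Column] = set()
    stack = list(lineage.get(column, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(lineage.get(current, []))
    return seen


def source_columns(column: Column,
                   lineage: dict[Column, list[Column]] = LINEAGE) -> set[Column]:
    """Only the leaves: upstream columns with no lineage of their own (raw sources)."""
    return {c for c in upstream_columns(column, lineage) if c not in lineage}


if __name__ == "__main__":
    print(sorted(source_columns(("customer_revenue", "name"))))
```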
SQL Bits Conference Overview
Speaker 3You have this competitor to dbt, SQLMesh, which was already able to interpret your SQL code. Then, around the second half of last year, a company called SDF became more and more known, which developed basically the same kind of tool. People are going to be angry if I describe it like this, but let's go with it.
Speaker 3The same kind of thing, but written in Rust and 100% compatible with dbt, a drop-in replacement: you would just download the SDF binary, type "sdf something", and it would do the same thing as dbt, but better. Because it understands your SQL, it can say: instead of executing all of this and this and this, you only need to execute these smaller pieces of code, with more filtering applied. In the end, I think they said they could reduce your data warehouse costs by at least 10%, and in some cases even 70% or something, because it knows better what exactly it has to execute. It couldn't do that without understanding the SQL.
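SDF's and Fusion's savings come from understanding the SQL itself, which is far beyond a short example. A much simpler idea in the same spirit, and one dbt Core already supports through state-based selection (for example --select state:modified+), is to rebuild only the models that changed plus everything downstream of them. The DAG and model names below are hypothetical.

```python
from collections import deque

# Hypothetical DAG: model -> models that select from it (its children).
CHILDREN: dict[str, list[str]] = {
    "stg_orders": ["orders_daily", "customer_revenue"],
    "stg_customers": ["customer_revenue"],
    "orders_daily": [],
    "customer_revenue": ["exec_dashboard"],
    "exec_dashboard": [],
}


def models_to_rebuild(changed: set[str],
                      children: dict[str, list[str]] = CHILDREN) -> set[str]:
    """Changed models plus everything downstream of them (breadth-first walk)."""
    selected = set(changed)
    queue = deque(changed)
    while queue:
        model = queue.popleft()
        for child in children.get(model, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


if __name__ == "__main__":
    todo = models_to_rebuild({"stg_customers"})
    skipped = set(CHILDREN) - todo
    print(f"rebuild {sorted(todo)}, skip {sorted(skipped)}")
```

Changing stg_customers here rebuilds three models and skips stg_orders and orders_daily, which is where the warehouse savings come from in this simplified picture.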
Speaker 1And now dbt Fusion: it's not open source, it's proprietary?
Speaker 3It is open source. Well, partially. Because Rust compiles to a binary, they can have some pieces which are closed source, but most of it is open source. Okay.
Speaker 1What are the things that are closed source, do you know? Like, what features are you missing if you just use the open source dbt Fusion?
Speaker 3It's not fully known yet, since they've only open sourced a very small part of it, and the plan is to open source the remaining parts by Coalesce in October. So by then we'll really see what is open source. What is definitely going to stay closed source is some kind of license check that sees whether you bought a license for Fusion or not: most features are free, but at some point they might introduce paid add-ons. I see. Okay, very cool, very interesting.
Speaker 2And then in your talk, Dorian... Well, one feature that dbt Fusion is still missing is Python modeling. Yes, we haven't talked about that yet.
Speaker 2So I will not shift that fast to dbt Fusion because of it. This Python modeling started, as I said, three years ago: starting from version 1.3, next to SQL models, dbt Core also allowed Python transformations. Depending on the infrastructure you're working on, which could be Snowflake, BigQuery or Databricks, it would basically run a PySpark job under the hood.
Speaker 1And that allows you to combine SQL and Python transformations, so it really made dbt more expressive. So the way I understand it, and correct me if I'm wrong: dbt is a templating engine for SQL, that's how it started, right? So you always run it on top of an engine, which could be, like you said, Snowflake, BigQuery, etc. It could also be DuckDB or Postgres.
Speaker 1Every query is what they call a model. But now you can also express that logic not in SQL but in Python, and for your dbt project it just looks like another model. Under the hood, though, it actually runs in Python, because you cannot run Python on DuckDB, you cannot run Python on a SQL database, so you spin up the resources you need, and that depends a bit on where you are. I think Snowflake has Snowpark, which is their flavor of PySpark, I guess.
Speaker 2I think on BigQuery it would... yeah, it's also a PySpark job, but on Dataproc, which can be serverless, but can also be a managed cluster.
Speaker 1Okay, so not every engine supports this. I think it depends.
Speaker 2There needs to be a counterpart, yeah; it really depends on the infra. For example, with DuckDB I think it's also possible, because DuckDB runs locally, but then you don't have the added benefit of being scalable. Yeah, you do have that when you run it on clusters, especially if they're serverless. Yes, well, it's nice to see the bill afterwards as well. Yeah, indeed.
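A dbt Python model is a file that defines a function model(dbt, session) and returns a DataFrame; inside it, dbt.ref() reads an upstream model and dbt.config() sets options such as the materialization. The sketch keeps that entry point but, so it runs without a cluster, feeds it a hypothetical FakeDbt stub whose ref() returns a plain list of dicts. On Databricks, dbt.ref() would return a PySpark DataFrame and you would write the same feature with DataFrame methods (for example withColumn) instead of a list comprehension. The stg_matches model and its columns are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


class FakeDbt:
    """Hypothetical stand-in for the `dbt` object that dbt passes to model().

    It only mimics config() and ref() so the model below can run locally.
    """

    def __init__(self, tables: dict[str, list[Row]]):
        self.tables = tables
        self.model_config: dict[str, Any] = {}

    def config(self, **kwargs: Any) -> None:
        self.model_config.update(kwargs)

    def ref(self, name: str) -> list[Row]:
        return self.tables[name]


def model(dbt, session):
    """A dbt Python model: the entry point dbt expects, with a toy body."""
    dbt.config(materialized="table")
    matches = dbt.ref("stg_matches")  # hypothetical upstream model
    # Feature: goal difference from the home team's point of view.
    # On Databricks this would be a DataFrame operation such as withColumn.
    return [
        {**row, "goal_diff": row["home_goals"] - row["away_goals"]}
        for row in matches
    ]


if __name__ == "__main__":
    fake = FakeDbt({"stg_matches": [{"match_id": 1, "home_goals": 2, "away_goals": 0}]})
    print(model(fake, session=None), fake.model_config)
```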
Speaker 1And how does this relate to your talk? As I put here on the screen: "Your Best Bet: Effortless MLOps with dbt".
Speaker 2Yeah, real tongue twister.
Speaker 1Yeah. So what was it? Was it a talk?
Speaker 2It was a talk, and also a demo. Basically, what we've implemented at the client is a machine learning platform that supports most of the MLOps principles, and most of those principles can be implemented with pure SQL and Python. That's basically all you need, if you understand the theoretical ideas behind them. So in this talk I show a demo where I do exactly that. I build a simulation where we try to beat the bookmakers in the past, in 2016, day by day, like you would usually run machine learning batch pipelines at a company. Day by day, we place some automated bets and we evaluate them. And it contains everything.
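The day-by-day simulation Dorian describes is essentially a walk-forward backtest: for each day, only matches played before that day may be used to estimate probabilities, and a bet is placed only when the estimated probability times the bookmaker's decimal odds exceeds 1 (a positive expected value). The sketch below shows that loop with a deliberately naive, hypothetical model (a smoothed home-win rate per team) and made-up fields; it is not the model or data from the talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Match:
    day: date
    home: str
    away: str
    home_odds: float  # bookmaker's decimal odds for a home win
    home_won: bool


def home_win_probability(history: list[Match], team: str,
                         prior: float = 0.45, weight: float = 5.0) -> float:
    """Toy model: the team's past home-win rate, shrunk toward a prior."""
    games = [m for m in history if m.home == team]
    wins = sum(m.home_won for m in games)
    return (wins + prior * weight) / (len(games) + weight)


def backtest(matches: list[Match], stake: float = 1.0) -> tuple[int, float]:
    """Walk forward day by day; return (number of bets, total profit)."""
    bets, profit = 0, 0.0
    for day in sorted({m.day for m in matches}):
        history = [m for m in matches if m.day < day]  # strictly the past
        todays = [m for m in matches if m.day == day]
        for match in todays:
            p = home_win_probability(history, match.home)
            if p * match.home_odds > 1.0:  # positive expected value
                bets += 1
                profit += (stake * (match.home_odds - 1.0)) if match.home_won else -stake
    return bets, profit
```

Keeping the history filter strictly before the current day is the whole point: it guarantees the simulation never uses a result it could not have known when the bet was placed.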
Speaker 2It contains a feature store, a prediction store, a model registry...
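One core job of a feature store in a setup like this is point-in-time correctness: when you build training data or predict for a given day, each feature must take the value that was known on that day, not a later one, or the backtest leaks future information. A minimal sketch of such an as-of lookup, assuming a hypothetical in-memory FEATURES store of (date known, value) pairs sorted by date, looks like this:

```python
from bisect import bisect_right
from datetime import date
from typing import Optional

# Hypothetical feature store: entity -> [(date the value became known, value)], sorted.
FEATURES: dict[str, list[tuple[date, float]]] = {
    "team_a": [
        (date(2016, 8, 1), 0.40),
        (date(2016, 8, 15), 0.55),
        (date(2016, 9, 1), 0.60),
    ],
}


def feature_as_of(entity: str, as_of: date,
                  store: dict[str, list[tuple[date, float]]] = FEATURES) -> Optional[float]:
    """Latest value known on or before `as_of`; None if nothing was known yet."""
    history = store.get(entity, [])
    known_dates = [known for known, _ in history]
    idx = bisect_right(known_dates, as_of)  # number of entries dated <= as_of
    return history[idx - 1][1] if idx > 0 else None
```

In a dbt project the same idea is usually expressed as an as-of join in SQL; the binary search here just makes the "latest value not after this date" rule explicit.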
Speaker 1What engine did you use for this? Was it GCP, Snowflake? This one was on Databricks. Databricks. And the feature store, is it Databricks-specific? No, you could do exactly the same thing everywhere. So, again, maybe for people who have heard of MLOps but for whom it's not super clear: how would you explain MLOps?
Speaker 2Yeah, so MLOps is the set of principles one would expect a machine learning engineer to implement when lifting their machine learning model to production, to be sure that it's future-proof and that it can be easily maintained, evaluated and monitored continuously. There's also a whole set of software engineering principles that go with it. So in this talk I show that with the main languages of data scientists, Python and SQL, you can implement most of those ideas. In dbt? In dbt. And I'm forgetting the YAML that you have to write.
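Continuous evaluation and monitoring can start very simply once predictions and outcomes live side by side (for instance in a prediction store). The sketch below computes the Brier score, the mean squared error between predicted probabilities and 0/1 outcomes, over a trailing window and flags windows above a threshold. The 0.25 default is an assumption, chosen because it is the score of always predicting 50%; a real platform would pick thresholds per use case.

```python
from statistics import fmean


def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equally long, non-empty inputs")
    return fmean([(p - o) ** 2 for p, o in zip(probs, outcomes)])


def monitor(probs: list[float], outcomes: list[int],
            window: int = 3, threshold: float = 0.25) -> list[tuple[int, float, bool]]:
    """Brier score over each trailing window, flagged when it exceeds the threshold."""
    report = []
    for end in range(window, len(probs) + 1):
        score = brier_score(probs[end - window:end], outcomes[end - window:end])
        report.append((end, score, score > threshold))
    return report
```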
Speaker 1YAML engineering. No, YAML is okay, use AI for that. Like you just said, it can be AI-generated. Exactly.
Speaker 1Maybe: is there something you still miss in dbt to do MLOps? You say it implements most of the MLOps things. Are there still rough edges? Things where you're like, ah, the one thing I'd like to have that I don't, that I think is essential or very important? Is there something like that we cannot do?
Speaker 2That's a great question. It always requires a bit of creativity to come up with a solution, but so far I've always found one, so test me. Okay. No, I'll trust you. Of course, what's lacking is streaming: this is only for batch pipelines, not for online prediction.
Speaker 1Okay. And would you say that Databricks is a mature environment for dbt Python modeling and MLOps?
Speaker 2I'd say so, but I just did a demo here, and in the demo there are a lot of things I don't do, like security and so on.
Speaker 1Yeah, that's true. About your demos, since you both did one: is the code something people could actually look at or try themselves?
Speaker 3Yeah, I have a repository on my GitHub that's basically the full Jaffle Shop in dbt, the typical example people use, but implemented with the minor differences you need for Microsoft Fabric, plus a script to get started. It connects to my Azure account to download some CSV data, and then you can basically follow all the dbt principles from that repository. So it's all there, and people can just clone the repo and follow it step by step.
Speaker 1Are they going to publish the talks later as well? Okay, so maybe people can even follow the code and listen to the talk at the same time. And we'll add this to the show notes.
Speaker 2Yeah, the repo. And yours, Dorian, as well? Yes. I also did an earlier version of this exact presentation at PyData, which is on YouTube, so you can check that one out, and the code should be reproducible.
Speaker 1I like that. Yeah, it's a good framing.
Speaker 2It should be. There's now a free version of Databricks, so you can test it there, use the smallest cluster and compute types, and just play a bit with it.
Speaker 1Okay, very cool. Now maybe on to the rest. You attended, you presented, and I'm sure both talks were well received. Maybe for yours, Dorian... Did you attend his talk, Sam?
Speaker 3No, I couldn't make it on Saturday, I was only there on Thursday and Friday. But I did watch it: attendees could already watch the recordings themselves. Ah, you could already? So I watched Dorian's recording on the train back to Belgium.
Speaker 1And you left a comment on the video like, wow, he's so hot? The slides are really beautiful.
Speaker 2I thought so too, also your Ghibli theme. But apparently, if you go to OpenAI and generate an image, everything is Ghibli. Yeah, I feel like it died down a bit, but it was a big hype when they started it.
Speaker 3But yeah, I really liked it a lot. I think the thing is the conference audience: a lot of people there are new to dbt (that's always the first question I ask at my talks, whether people have used dbt), while this talk really builds on it. You already need to know dbt, and this brings it to the next level. It's also related to a feature I'm working on for the Fabric adapter, bringing Python model support to it, so what Dorian presented is super interesting to me, and I've asked Dorian lots of questions in the past about how he did it and implemented it at clients. But maybe for the conference audience it built on too many things to be applicable to everyone. Yeah, because I saw your talk was labeled as intermediate as well, was it?
Sam's Talk: DBT with Microsoft Fabric
Speaker 2Yes, because you have to know a bit about dbt to be able to talk about Python models and also MLOps.
Speaker 1Exactly. Also, since it's more of a data engineering crowd, did you feel like the, how do you say, the odd duckling there?
Speaker 2Yeah, actually, on Thursday I was at the Databricks training, and the presenter asked: "Who's a data scientist here?" And there were just two people raising their hands, the other one being part of the organization.
Speaker 1Throwing stuff at them. Everybody was looking like, "Oh, that's a data scientist." They literally tagged them. It's like no one wants to play go-kart with them. So then you were introducing yourself.
Speaker 2I was like, "I'm an MLOps engineer." But yeah, for my talk there were not so many people in the room, but the ones who were there did ask a lot of questions afterwards. Sometimes that's nice as well, because you get more interaction. Yeah, I think there were some big rooms as well.
Speaker 1Yeah, very cool. So you also attended the conference, at least partially. Maybe starting with you, Sam: what are your favorite talks there, or one of your favorites?
Speaker 3I'll pick one. I actually liked quite a lot of them, but the one I liked a lot was the talk given by Andy Cutler. He's an MVP from the UK with similar interests to mine, and he talked about the whole battle of the table formats, the lake formats that you have today: Delta Lake, Iceberg, Hudi. Which is the latest one?
Speaker 3Yeah, or XTable, the Apache XTable thing, the one that merges them together. Maybe that one's a bit late to the party. Well, I don't know.
Speaker 3Is it seen as a separate format? Because that's the one that brings them together. He also talked about it. For me, I always knew about Delta Lake, since I've always been close to Databricks and the Microsoft stack, where it's basically the default one, while Iceberg, I think, is more popular with people using AWS and Snowflake and so on. Fabric, for example, supports Delta Lake and Iceberg for reading, but only Delta Lake for writing. It was interesting, because I always understood them as the same thing, just implemented differently. That was what I had in my head. But he did show some things that are different in each one of them, and then brought it together at the end with the XTable thing, where, by using that tool, you can read any of those three and also write to any of those three formats.
Speaker 1You mentioned some things are different, so it's not just implemented differently. Can you maybe highlight one thing, just to make it a bit more concrete?
Speaker 3For that, I would say, watch Andy's talk on YouTube. It gets really very technical at the end. It's a difficult topic; I personally wouldn't be able to give this talk, and it was even labeled advanced. But he really started from people who probably don't know about these formats and maybe only know CSV files or Parquet files somewhere, explaining what the concept of these lake formats is, why you need them and why they bring value. Then he went into each one of them, explaining how they evolved, because I think one of them came from Netflix and another one also came from another big company, and explaining the timeline: when what happened in terms of evolution.
Speaker 1Yeah. I think for a lot of people it's a bit curious that there's a talk about tables. People get excited about tables?
Speaker 2Yeah, and with these names it's not even clear that it's about tables.
Speaker 1Yeah, indeed. It's like if I, for example, tell my wife: "Oh yeah, I went to this talk, it was really cool, they talked about the different table formats." She's gonna be like...
Speaker 3But even the names, like "the battle of the lakehouse." Someone not in data thinks: lakehouse, what is it, a house by a lake? And then you have this Delta Lake, Iceberg...
Speaker 1Yeah, I can understand a bit, because I'm not really one of you, let's say, but I feel like I touched on it a bit. For a client, a lot of the stuff was in Delta Lake and they wanted to migrate everything to Iceberg, because they had external tables on Snowflake and they saw that the external tables' performance was almost the same as that of native Snowflake tables. So there was some excitement there. But the feedback they also had is that every few years there's a new format that is better, and if you say "I'm gonna migrate everything every time a new format comes out," then you're always migrating stuff.
Speaker 1So it's a bit challenging, right? You also need to say: okay, this is not the best, but let's just live with it.
Speaker 3Well, and this talk was given around the same time as DuckLake was released. He didn't include it in his talk, obviously, because there wasn't enough time anymore, though I think he did mention it. But I think the whole circle is going around again: we came from one giant database or data warehouse, then decentralized things onto data lakes, and maybe now, with DuckLake...
Speaker 1You said DuckLake. What is DuckLake? Because I think it's from the DuckDB people.
Speaker 3Yeah. So one of the problems you have with these lake formats is that there is no central place to know which tables you have. Every table is stored in a folder, that folder contains a set of Parquet files, and then, depending on whether you use Delta Lake, Iceberg or Hudi, it will have metadata files (JSON files, for example) next to it. The schema of those metadata files is actually what makes the formats different and determines how they can be used. But the problem is that you need some kind of catalog: on Databricks it's Unity Catalog, on Fabric it's a built-in hidden thing, Snowflake has Polaris, I think, and there are lots of open-source Iceberg catalogs. This catalog tells whatever is reading or working with the data: look, you can find the table with this schema and this table name in this location.
Speaker 3On storage, on cloud storage, which can be anywhere: S3 or ADLS or something. And the problem is that these catalogs are often difficult to configure and set up. The people who built DuckDB thought: okay, let's make a database, store this information in that database, and then we have a uniform way to talk to all of your data, wherever it is located. But we came from one big data warehouse, then went to storage formats on data lakes, and now we're back to a database, because a database is needed to let us know where the data is stored. Maybe the next evolution goes back to the data warehouse.
Speaker 1It's a pendulum, right? It just keeps swinging.
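To make the catalog idea a bit more concrete, here is a minimal Python sketch under our own assumptions: a toy `tables` schema (not DuckLake's actual catalog layout) kept in an in-memory SQLite database, answering the one question a query engine asks a catalog: where do the files for this table live? The bucket path and the helper names `create_catalog`, `register_table` and `locate` are hypothetical.

```python
import sqlite3


def create_catalog() -> sqlite3.Connection:
    """Create an in-memory toy catalog: one row per table, pointing at its storage folder."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE tables (
            schema_name TEXT NOT NULL,
            table_name  TEXT NOT NULL,
            location    TEXT NOT NULL,  -- folder holding the table's data files
            format      TEXT NOT NULL,  -- e.g. 'delta' or 'iceberg'
            PRIMARY KEY (schema_name, table_name)
        )
        """
    )
    return conn


def register_table(conn, schema_name: str, table_name: str, location: str, fmt: str) -> None:
    """Record where a table's files live and which lake format they use."""
    conn.execute(
        "INSERT INTO tables VALUES (?, ?, ?, ?)",
        (schema_name, table_name, location, fmt),
    )
    conn.commit()


def locate(conn, schema_name: str, table_name: str) -> tuple:
    """Look up (location, format) for a table, as a query engine would before reading files."""
    row = conn.execute(
        "SELECT location, format FROM tables WHERE schema_name = ? AND table_name = ?",
        (schema_name, table_name),
    ).fetchone()
    if row is None:
        raise KeyError(f"{schema_name}.{table_name} is not registered")
    return row


if __name__ == "__main__":
    catalog = create_catalog()
    register_table(catalog, "sales", "orders", "s3://example-bucket/sales/orders/", "delta")
    print(locate(catalog, "sales", "orders"))
```

The design choice mirrors the point made in the conversation: the data itself stays as files on object storage, and only the small "which table lives where" lookup goes into a database.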
Speaker 2And I think one thing these formats add as well: in a plain Parquet file, if you want to do any update, you need to read the entire Parquet file and then rewrite the entire Parquet file. That can be slow, and I think a lot of these formats also try to address that input/output problem, with different gradations.
Speaker 1Yeah, and we're saying this, but I think the impact is really big as well, with the amount of data we actually have today. I think these discussions are also a good reminder of how almost everything is really a rabbit hole. Things that you kind of take for granted: if you really look at them and start dissecting them, there's so much to it.
Speaker 3Yeah, you really have to use one of these table formats. If you haven't heard about them and you're working with a data lake, this is your cue: investigate Delta Lake or Iceberg.
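A toy illustration of why these formats help with updates, assuming a drastically simplified design of our own (real Delta Lake and Iceberg tables add schemas, statistics, checkpoints and concurrency control): data files are written once and never modified, each commit appends a log entry listing the files it adds and removes, and readers replay the log to find the live files. An update then rewrites only the files that contain affected rows, not the whole table. Rows are plain dicts so the sketch needs no Parquet library; `ToyTable` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    files: dict = field(default_factory=dict)  # file name -> rows, written once
    log: list = field(default_factory=list)    # ordered commits: {"add": [...], "remove": [...]}

    def _write_file(self, rows) -> str:
        name = f"part-{len(self.files):05d}"
        self.files[name] = tuple(dict(r) for r in rows)  # never modified after this
        return name

    def append(self, rows) -> None:
        self.log.append({"add": [self._write_file(rows)], "remove": []})

    def update(self, predicate, changes: dict) -> list:
        """Rewrite only the live files containing matching rows; return the files replaced."""
        added, removed = [], []
        for name in self.live_files():
            rows = self.files[name]
            if any(predicate(r) for r in rows):
                new_rows = [{**r, **changes} if predicate(r) else r for r in rows]
                added.append(self._write_file(new_rows))
                removed.append(name)
        self.log.append({"add": added, "remove": removed})
        return removed

    def live_files(self) -> list:
        """Replay the log: the current table is whatever was added and not later removed."""
        live = []
        for commit in self.log:
            live = [f for f in live if f not in commit["remove"]] + commit["add"]
        return live

    def read(self) -> list:
        return [dict(r) for name in self.live_files() for r in self.files[name]]


if __name__ == "__main__":
    t = ToyTable()
    t.append([{"id": 1, "status": "new"}, {"id": 2, "status": "new"}])
    t.append([{"id": 3, "status": "new"}])
    print(t.update(lambda r: r["id"] == 3, {"status": "shipped"}))  # only the second file is rewritten
    print(t.read())
```

Because replaced files are only removed logically, older versions of the table remain readable from the earlier log entries, which is roughly the idea behind time travel in these formats.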
Speaker 1Skip Hudi, nobody's using it. You heard it here first. So maybe we can move on to Dorian's first pick. And I say "pick" rather than "talk" because I think this one was actually a workshop.
Speaker 2That was indeed a day of training. I actually did my demo in Databricks, but that was my only experience with Databricks until that moment. So I thought, okay, let's get to know it a bit better, and that's why I attended this. It was interesting, and it's actually there that I learned what Iceberg and Delta Lake were, because before that they were also just words. Just words, right? Fancy-sounding words which are hard to explain.
Speaker 1Yeah, yeah, that's the newest one, so I think that's the best one. Let's just go with that. So the title of the workshop was Data Engineering with Databricks. How long was this workshop? From nine to five? Like a beautiful workday, and you go home to your kids afterwards. With free coffee and free lunch. Well, it's kind of like coming to the office.
Speaker 2Not a free lunch.
Speaker 1On Fridays, maybe. So, Data Engineering with Databricks: aside from Delta Lake and Iceberg, what else did you learn about there?
Speaker 2It started with the UI, really from the basics, and then went from zero to hero, kind of thing.
Speaker 2Yeah, and the thing that struck me the most is how Databricks is really organized around notebooks, which I found a bit funny. But yeah, notebooks are great for explaining things. So we went from the start and also did a bit of PySpark, but since it was a SQL audience, everything was done in SQL in the end. Yeah, still in notebooks. Oh wow. What I learned and liked a lot was actually Unity Catalog, which allows you to open any file type very simply. That's one thing I liked about it, and also that I finally learned what these secret words actually meant. Yeah, you always felt left out of the party. Yeah, it was like...
Dorian's Talk: MLOps with DBT
Speaker 2It was like everyone getting excited: oh shit, they talk about Unity, and then Delta, and I didn't know. It was like, yeah, word, word. Unity, yeah, togetherness. We also did some transformations, some table merges, updates and so on, and learned a little bit about what happens under the hood. What I found interesting is that under the hood, Databricks automatically optimizes the way you store your data. There's actually an OPTIMIZE SQL statement in Databricks.
Speaker 1Oh wow.
Speaker 2But they actually do it under the hood when they see that the files don't have the same size, when the data is skewed, stuff like that. So it was really interesting, and it was hands-on as well, with exercises.
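As a rough sketch of what file compaction does (this is not Databricks' actual OPTIMIZE implementation, only an illustration of the idea under simplified assumptions): pick the files smaller than a target size and group them with greedy first-fit-decreasing bin packing, so each group can be rewritten as one larger file. File names, sizes and the `plan_compaction` helper are made up; sizes can be in any consistent unit, such as MB.

```python
def plan_compaction(file_sizes: dict[str, int], target_size: int) -> list[list[str]]:
    """Group small files into bins of at most target_size (first-fit decreasing).

    Files already at or above the target are left alone. Bins holding a single file
    are dropped, since rewriting one file on its own would not reduce the file count.
    """
    small = sorted(
        ((name, size) for name, size in file_sizes.items() if size < target_size),
        key=lambda item: item[1],
        reverse=True,
    )
    bins: list[tuple[int, list[str]]] = []  # (total size so far, file names)
    for name, size in small:
        for i, (total, names) in enumerate(bins):
            if total + size <= target_size:
                bins[i] = (total + size, names + [name])
                break
        else:  # no existing bin has room: open a new one
            bins.append((size, [name]))
    return [names for _, names in bins if len(names) > 1]


if __name__ == "__main__":
    sizes_mb = {"a": 60, "b": 50, "c": 40, "d": 30, "e": 200, "f": 10}
    print(plan_compaction(sizes_mb, target_size=100))
```

With these example sizes, the 200 MB file is skipped and the five small files are planned as two rewrites, turning six files into three.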
Speaker 1That's really nice. And Databricks runs on top of AWS or Azure, right? It's not a separate cloud, I think.
Speaker 3Or GCP as well.
Speaker 1Or GCP as well. But they use the actual hardware from another cloud provider, is that right?
Speaker 2Not sure. I'm not sure which cloud provider they used, but you need one, right?
Speaker 1It's not like Databricks has its own servers. Maybe, after having done this: when would you go for Databricks versus something that the actual clouds already offer?
Speaker 2That's a great question. I think what's nice about Databricks is that, for a data engineer or machine learning engineer, it's just one place to work; I don't need to move out of it. So that's the nice part. And I think where Databricks is heading right now is that they also want to open it up to more data analysts and more business types of roles, a bit like the other platforms as well.
Speaker 1I see. Is the breadth of services as wide as Azure or AWS, or is it more specialized?
Speaker 3I think it's coming up there. They started from Spark, then first built a whole notebook experience and Spark jobs. Then they added Delta Lake, which is made by Databricks; they are the creators of it. Then they had Databricks Workflows and jobs to schedule things, so basically you can shut down your Airflow and take their thing instead. There's also some kind of dashboarding tool in Databricks, if I'm not mistaken, multiple even, and I think they're slowly building up their stack so that you only have one place to go to.
Speaker 3What they were usually lacking was ingesting data into your cloud; that was not possible yet. But I think last year they acquired a company and are now slowly integrating that technology into Databricks itself, so that's also possible now. And I think they still need some polishing work to make it easier for less tech-savvy people to start working with Databricks. I always had the impression that they focus really on the engineers who want perfection, who want the best tool that can do everything they want, and for that it's probably still the best tool today, if you focus on really good engineering work.
Speaker 1And maybe one last question before we move on to the next pick: notebooks for Databricks. Is this a yay or a nay?
Speaker 2For deployment practices it's a bit strange, because it's harder to test a notebook than to test regular code. So for me it's a bit of a nay, but there's also a yay side to it, of course. I also love doing proofs of concept in notebooks. It's intuitive, and you can explain things along the way in a clean way.
Speaker 1For maybe workshops as well, Workshops perfect.
Speaker 2But yeah, there was also this workflow thing that you could do, and I think they announced at their summit that they were going to make it drag-and-drop as well.
Speaker 1Ah, okay.
Speaker 2Yeah, and that's also based a bit around notebooks and so on. So that's a bit, yeah.
Speaker 1I see. And what about you, Sam?
Speaker 3Yeah, it's a common theme. All the examples you see on Microsoft Fabric and Databricks are always notebook-focused.
Speaker 1Even on AWS, I think.
Speaker 3A lot of examples are notebooks. I think my next pick is really about that: people learn notebooks, but then they don't know anymore that notebooks are basically meant to make exploration easier, and that you're actually not supposed to put them into production. They're really for exploring your data; you should probably write a Python package to run your Spark jobs. But notebooks are good for some use cases. Databricks also has these AI notebooks where you can talk in natural language and it does the magic behind the scenes to run your queries.
Speaker 3For that it's great, but for your data pipelines, don't do it. Luckily, there's dbt on Databricks now. Didn't they mention it in your workshop?
Speaker 2No.
Speaker 3I was wondering, because of all people, the trainer you had knows dbt quite well.
Speaker 1Yeah. There are even, like, notebook engineers and all these things. But I think, even for Databricks, if you actually inspect the notebook artifacts, it's not a normal Jupyter notebook. There's a Python counterpart and something else. It's kind of nice, because I remember one time people said: okay, I'm working on Databricks, and now I want to know how to version these things a bit better.
Speaker 1And I know you worked a bit on this. When I started looking at it, it shows as a notebook in the UI and everything, but if you really look at the artifacts, it's very different. I think people tried to mutate the notebook a bit so it's less of an exploration-focused tool, but in the end you have something a bit weird: very few people actually use it like this, and some people use it for other things. There were plugins and tools to work with Jupyter notebooks, which are basically JSON, but you could not apply them to Databricks notebooks. This was some years ago as well, and I may be misremembering.
Speaker 3I think today you can, on most platforms, export and import from and to .ipynb files.
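To make the difference concrete: a Jupyter `.ipynb` file is JSON, while Databricks can export a Python notebook as a plain `.py` source file. The sketch below assumes that source format (a `# Databricks notebook source` header line and cells separated by `# COMMAND ----------` lines; details may vary between versions) and converts it into a minimal nbformat 4 JSON structure. It keeps `# MAGIC` lines as plain comments and produces no outputs, which is exactly the kind of thing that can get lost in translation. The function name is hypothetical.

```python
import json

HEADER = "# Databricks notebook source"
SEPARATOR = "# COMMAND ----------"


def databricks_source_to_ipynb(source: str) -> dict:
    """Convert a Databricks-style .py notebook source into a minimal nbformat-4 dict (code cells only)."""
    lines = source.splitlines()
    if lines and lines[0].strip() == HEADER:
        lines = lines[1:]
    cells, current = [], []
    for line in lines + [SEPARATOR]:  # trailing sentinel flushes the last cell
        if line.strip() == SEPARATOR:
            body = "\n".join(current).strip("\n")
            if body:
                cells.append({
                    "cell_type": "code",
                    "execution_count": None,
                    "metadata": {},
                    "outputs": [],  # outputs are not part of the source export
                    "source": body.splitlines(keepends=True),
                })
            current = []
        else:
            current.append(line)
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


example = """# Databricks notebook source
x = 2

# COMMAND ----------

y = x + 2
print(y)
"""

print(json.dumps(databricks_source_to_ipynb(example), indent=1))
```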
Speaker 1Indeed. But I'm also wondering about the .ipynb: what about the cell outputs and all these things? Is there a conversion, or does something get lost in translation? You see that a lot. Nowadays you also have Marimo, which is a different type of notebook where you have reactive cells.
Speaker 2What does that mean? To have a reactive cell?
Speaker 1So if you have x = 2, and in the following cell you have y = x + 2: in Jupyter notebooks you just execute one, then the other, and whatever state is there stays there. But now, if I change the x value in the previous cell, the bottom cell will automatically execute. And it doesn't need to be in order: you could execute the last cell first and then the first cell, but there's this dependency, and it will keep track of that.
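A toy model of the reactive behavior described here (this is not how Marimo is implemented; it only illustrates the principle, and the `Cell` and `ReactiveNotebook` names are hypothetical): each cell declares which variables it defines and which it reads, and re-running a cell also re-runs every cell that depends on it, directly or indirectly, in dependency order rather than page order. It assumes the dependency graph has no cycles.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Cell:
    name: str
    defines: set[str]            # variables this cell assigns
    uses: set[str]               # variables this cell reads
    run: Callable[[dict], dict]  # takes the namespace, returns the variables it defines


class ReactiveNotebook:
    def __init__(self, cells: list[Cell]):
        self.cells = {c.name: c for c in cells}
        self.namespace: dict = {}

    def _depends_on(self, a: str, b: str) -> bool:
        """True if cell `a` reads a variable that cell `b` defines."""
        return a != b and bool(self.cells[a].uses & self.cells[b].defines)

    def _affected_in_order(self, start: str) -> list[str]:
        # 1. Collect `start` plus every cell that (transitively) reads from it.
        affected, frontier = {start}, [start]
        while frontier:
            src = frontier.pop()
            for name in self.cells:
                if name not in affected and self._depends_on(name, src):
                    affected.add(name)
                    frontier.append(name)
        # 2. Depth-first topological sort: each cell runs after the affected cells it reads from.
        order: list[str] = []

        def visit(name: str) -> None:
            if name in order:
                return
            for dep in affected:
                if self._depends_on(name, dep):
                    visit(dep)
            order.append(name)

        for name in sorted(affected):
            visit(name)
        return order

    def run_cell(self, name: str) -> list[str]:
        """Re-run a cell and everything downstream of it; return the execution order."""
        order = self._affected_in_order(name)
        for cell_name in order:
            self.namespace.update(self.cells[cell_name].run(self.namespace))
        return order


if __name__ == "__main__":
    # Cells listed out of page order on purpose: execution follows the dependencies.
    nb = ReactiveNotebook([
        Cell("z", {"z"}, {"y"}, lambda ns: {"z": ns["y"] * 10}),
        Cell("y", {"y"}, {"x"}, lambda ns: {"y": ns["x"] + 2}),
        Cell("x", {"x"}, set(), lambda ns: {"x": 2}),
    ])
    nb.run_cell("x")
    print(nb.namespace)                      # {'x': 2, 'y': 4, 'z': 40}
    nb.cells["x"].run = lambda ns: {"x": 5}  # "edit" the first cell
    nb.run_cell("x")
    print(nb.namespace)                      # {'x': 5, 'y': 7, 'z': 70}
```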
Speaker 2So it will keep telling you, like, it's not in order, but it's runnable?
Speaker 1Yeah, exactly, but every time. You can turn it off, but by default it's on: every time you change something, anything that depends on it will automatically execute, so there's no hidden state. That's one way to try to solve it. But Marimo is another tool again, and notebooks have an ecosystem: if you want to open it in VS Code and use your AI coding assistant, Marimo wasn't working very well. So it becomes more complicated, because Jupyter notebooks have been very popular for a long time. But yes, now maybe for your second pick, and I think you already hinted a bit towards it: A Junior Data Engineer's Story of Success and Struggle.
Speaker 3Wow, yeah. So I really liked the talk.
Speaker 1It's an intermediate talk, huh? Interesting.
Speaker 3Yeah, I get why.
Speaker 3But also, not too many people were in the audience, I think maybe 10 or 15 people at most, which for me doesn't say anything about the quality of the talk. I really liked it a lot because it reminded me of things I faced myself.
Speaker 3I also joined dataroots, like I said, five and a half years ago, coming from a background in software engineering and cloud engineering and so on. All the data engineering was new to me, but I already knew the best practices in software engineering. Amy, on the other hand, graduated five years ago, studied AI at a university (I don't remember which one) and then did a job interview for a data engineer role. She thought she wouldn't get it, but in the end they made her an offer. So she joined, and in her talk she goes over all the things she learned along the way about data engineering, because it's such a broad field: you cannot ask five data engineers what data engineering is and get the same answer. It will always be something different for everyone. But she did go over things like the ones we also discussed.
Speaker 3Like, you should not just know about notebooks, but also know that there is something called Spark jobs and Python packages, that you can write unit tests for these things, and that you should actually test your code.
Speaker 1Was she a software engineer before?
Speaker 3No, she just came out of school, but it's impressive how fast she learned everything, to a very high level of understanding already. Her talk really explains the roadmap to becoming a data engineer today and everything she learned during those five years. Lots of talks at these kinds of conferences show that people actually don't know about Spark jobs and Python packages, and do everything in notebooks because they think that's the only thing that exists.
Speaker 1Yeah, I feel like when you see demos, it's always notebooks as well, so people get really bombarded with notebooks. I think it's also easy to say: yeah, it's working, let's move on to the next thing.
Speaker 3Yeah, all the practices we know from software engineering were forgotten quite often. She even went into things like behavior-driven testing, behavior-driven development, which is something I also learned at school and then never applied again. But when I learned it, I also understood its value, and I hoped that at some point I could use it in my career. These are things people never make time for, and we actually should do that a bit more. So it was a great overview of all the things you might have missed along the way. Basically, like I told you at the end: if people come to me today saying "I want to become a data engineer, what do I learn?"
Speaker 3I'm just going to send them this link.
Speaker 1Very cool. And you would say this is more intermediate because it touches on a lot of different software engineering concepts? Someone who is just starting off can maybe follow, but to really appreciate how it resonates, you need a bit of experience?
Speaker 3I think it's because it's only one hour, or a bit less even, and there are so many topics she touched upon that you really have to take your time to absorb them and look them up if you don't know them already. I already knew about everything mentioned in the talk, I think, but I saw with other people in the audience that lots of things were new. For example, one thing she mentions is dbt, but I give an entire talk about dbt to help people understand it, and there it's only one or two minutes. Same with data quality testing, like Great Expectations, these kinds of things. So because of that, I think it's intermediate: you have to be able to follow.
Speaker 1I see. And it's a personal anecdote, let's say, her journey.
Speaker 3It's her journey from graduating to, I think she works in the Netherlands at Info Support, senior data engineer now, and her story from those five years.
Speaker 1Really cool, really cool.
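As a small, hypothetical example of the practices mentioned here (transformation code in a regular Python module instead of a notebook, a unit test, and a behavior-driven Given/When/Then structure), here is a made-up deduplication function with a test written using the standard library's `unittest`. The function and its business rule are invented for illustration; BDD frameworks such as behave express the same structure in plain-language feature files.

```python
import unittest


def deduplicate_latest(records: list[dict]) -> list[dict]:
    """Keep only the most recent record per customer_id (by ISO-date 'updated_at'), sorted by customer_id."""
    latest: dict = {}
    for rec in records:
        current = latest.get(rec["customer_id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest[rec["customer_id"]] = rec
    return [latest[key] for key in sorted(latest)]


class DeduplicateLatestTest(unittest.TestCase):
    def test_keeps_most_recent_record_per_customer(self):
        # Given two versions of customer 1 and one version of customer 2
        records = [
            {"customer_id": 1, "email": "old@example.com", "updated_at": "2024-01-01"},
            {"customer_id": 2, "email": "b@example.com", "updated_at": "2024-02-01"},
            {"customer_id": 1, "email": "new@example.com", "updated_at": "2024-03-01"},
        ]
        # When we deduplicate them
        result = deduplicate_latest(records)
        # Then only the newest version of customer 1 remains, next to customer 2
        self.assertEqual([r["email"] for r in result], ["new@example.com", "b@example.com"])


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeduplicateLatestTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```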
Speaker 1Very curious as well. You didn't watch this talk, right, Dorian?
Speaker 2No, no. I think it was... was it on a Friday?
Speaker 3I don't remember.
Data Lake Formats Battle
Speaker 2I think it was on the Friday. That Friday I took a day off.
Speaker 1Nice. And maybe for your second pick now, Dorian: Automated Engineering with AI.
Speaker 2Yeah. The funny thing was, it was mostly all SQL Server and a lot of SQL-based talks, but here and there there were sprinkles of LLMs and AI and so on.
Speaker 1Yeah, like they're trying to stay away from it, but you can't. It's always around the corner; to stay relevant as an organizer, you need to put it in a bit.
Speaker 2Yeah. And so this was a bit of a motivational doomsday talk, as I would call it.
Speaker 1Can we pause a bit? Motivational doomsday? How does that sound?
Speaker 2Yeah. Well, it's actually also funny to see the difference. My talk was at nine in the morning, right after the party. So okay, like 20 people, everyone hungover. Maybe it's because it's Saturday morning, 9:00 AM. But then the next talk at 10 AM, from, I think his name was Simon Whiteley, this guy: the room was packed.
Speaker 3He's very well known within the community. I think he has the most popular YouTube channel about data engineering.
Speaker 1Oh yeah, I think I've seen him before as well. I mean, I know Microsoft, but I'm not super into it, so I don't follow as closely.
Speaker 3Yeah, he's also a Databricks MVP and a Microsoft MVP.
Speaker 1Oh, okay, cool.
Speaker 2I didn't know that, but he does seem like...
Speaker 1You met a legend.
Speaker 2He really seemed used to giving this type of talk and to being an influencer. He started really controversially, with big white letters on a black background: "Data engineering is dead."
Speaker 1Oh really? That got everyone's attention. Great effect. So what was the talk about, engineering with AI?
Speaker 2Yeah, exactly: how should you use it? What I remember from the talk, and found the most interesting, which also made a lot of sense, was: okay, it's clear, everybody should at least use some AI in their work to enhance efficiency, improve workflows and automate the boring parts, like testing, data quality tests and documentation, to some degree. He also said: just use it. Everybody's going to use it, and if you don't, you will lose your relevance. So apply it as fast as possible, as much as possible, but with a conscious mind.
Speaker 2And he was also very interested in what is going to happen in five to ten years. Why? Because right now, all juniors, junior data engineers (and it's actually a nice follow-up on your talk), will use a lot of AI in their work, maybe to stay up to speed and remain fast coders. To be productive, exactly. They will just apply what the machine tells them and run it, and they might not take the time to understand what they're running. That's okay for now, for quick demos and things that need to be put in production immediately, but the backlash will only come in five years or so, when everybody's fixing those bugs.
Speaker 2I see it like: okay, what did the machine write? And there we might have a whole gap. Usually, within five to ten years you become a senior engineer. But if today's juniors apply maybe too much AI coding, will they become seniors? Would they understand all the concepts?
Speaker 1I see, so it's going to hinder their actual learning. They won't have the expertise because they didn't grind through it; they didn't understand why they were doing this.
Speaker 2Or what the reasons are.
Speaker 1Yeah, a bit of schooling. Maybe the best-worst scenario is that in five years the machines are so good that you don't care. In five years it's just: "Hey, Claude, fix it. I don't care, shut up, just do it."
Speaker 2That's a possibility, indeed. But then how do you write new code? How do you come up with new ideas? How do you create new frameworks?
Speaker 1That's true. This is like a snake that eats its own tail: you just ask Claude to do this, and then it trains on that... No, it's interesting. But maybe a question as well. The things you mentioned, like testing and documentation: the name of the talk is actually Automated Engineering with AI, so not Data Engineering. But I was going to ask, is it different from software engineering in general? Is there anything specific about data engineering that may change with the rise of AI, or is there a new skill that data engineers will need to know about AI? And when we say AI here, we're talking about GenAI.
Speaker 2I think it's mainly about writing DDL or schema generation and so on.
Speaker 1So this is the type of work?
Speaker 2Those were examples he gave in his talk of things you shouldn't do yourself anymore. Or also the grind of performance tuning and so on: a lot of the time this now really happens at the infra level. For example, I gave the OPTIMIZE example in Databricks: you don't need to care about it anymore. Those types of data engineering intricacies disappear more and more, since the cloud architectures already take care of them for you.
Speaker 1Yeah. And do you agree with his concern, let's say, with the five-to-ten-year horizon?
Speaker 3I think it also helps people learn. For me, I use Copilot Pro all the time, so it's suggesting things in my code all day long. Yesterday I was coding something and wanted to profile my code in Python, which I never did before. I just created a function like "profile this function", and it autocompleted the whole function for me. But it also triggered me to look up: okay, what are these functions it's calling within Python's cProfile library, what are they doing, and why exactly is it calling them? I can also highlight pieces of code and say "explain this to me", and so on. For me, it's way faster to pick up new things that way, but maybe that's also because I've been into this for quite a while. I don't know how junior developers or engineers would approach this.
Speaker 3Maybe they would say, "Well, looks good, let's roll with it." But I would think there are still people like me who would have this interest in understanding exactly what was generated. It's like: do you copy-paste from Stack Overflow, or do you actually read the answer?
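For readers who haven't profiled Python code before, here is a minimal sketch of the kind of helper Sam describes, using the standard library's `cProfile` and `pstats` modules. It is not the code Copilot generated for him; the `profile` helper and the function being profiled are made-up stand-ins.

```python
import cProfile
import io
import pstats


def profile(func, *args, top: int = 5, **kwargs):
    """Run func under cProfile; return (result, report listing the `top` entries by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, buffer.getvalue()


def slow_sum_of_squares(n: int) -> int:
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile(slow_sum_of_squares, 100_000)
    print(value)
    print(report)
```

Sorting by cumulative time shows which calls, including everything they call in turn, account for most of the runtime, which is usually the first question when looking for a bottleneck.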
Speaker 2You have a great reflex, huh. But I'm not sure everybody has the same reflex. And I think stakeholders in general, due to the rise of AI, might expect: you have AI now, so it should be super easy to implement this, right? Why is it taking so long?
Speaker 1Why is it taking you so long?
Speaker 2Yeah, and that kind of puts pressure on you. If you don't have much time to try to understand what you're doing, if you don't get the time, you skip it.
Speaker 1Well, I think one of the things is the attitude of the person, even in schools and stuff, right? Do you actually want to learn, or do you just want to get it done? So I do think that's one thing. I also think it's a super good learning tool, even if you say something wrong, just to make you think critically and talk about it. One thing I started doing more with AI is: I have a plan, I have an idea, so criticize my plan, ask me questions, so I can be like, "Oh yeah, I didn't think of that." Because sometimes it's very easy to start and think, "Okay, I have it figured out," but as soon as you take the first step, you're like, "Oh, but actually there's this, actually there's that." So it's super helpful, and I definitely see the value. But I think it can be damaging when people just try to get it done and don't care what it is.
Speaker 1And I also think the other thing is, if you're a junior, the number of things you need to look up when you look at the end solution may be too much for people to actually do it. For you, because you have more experience, it's like, "It's just this, so yeah, I'll look into it." But let's think of the talk that you mentioned, all the breadth of things that engineers need to do. If some AI codes something that touches a bit on five or six different things, for you to really click and understand and appreciate why we're using this and why we're not using that, I think that may be too much sometimes as well. So I'm not saying I disagree with you.
Speaker 1I do think there is a very good opportunity for learning, but I also think the people who will benefit the most are, one, those with the best attitude, and two, I think it's like with any learning: you need a step that is a bit taller than what you're used to, so you're a bit uncomfortable, but not so tall that you cannot climb it, right? You always need to find that sweet spot. Even when I was learning Rust and I wanted to play with projects, it's like, I want to do something a bit more challenging, but if it's something super complicated, I'm like, there's no way I'm gonna do it, right? So I think there's a bit of that, and it'll be interesting to see. And the other thing I would also say is that I think you're talking about AI-assisted coding, but if you go one step further, to vibe coding... there, I said it, guys, the word has been mentioned.
Speaker 1That's it. I think if you take it one step further there, there's also a bit of a skill in how you ask things, or how you instruct things, or knowing where you actually need to slow down. Or even, okay, if you have a greenfield project and you just want to start something, maybe you first want to ask, "Just criticize my plan a bit, let's come up with an actual plan," and not just get started with doing. Because if you just get started, it will go super fast, but if you don't have a good plan or you don't think things through, it's very easy to veer off one way or another.
Speaker 3But isn't vibe coding an intermediate phase that we're going through? I really like GitHub's project, which got lots of criticism as well, but in the end I think it's a good idea: you create an issue, you assign it to GitHub Copilot, and it starts creating a plan to implement it. In the end, after a few minutes (sometimes it can be a lot of minutes), it has a pull request for you that implements that feature. It recently got on the front page of Hacker News because Microsoft was doing this, and they use it to write the next version of the .NET SDK, the framework, and so on. And you really saw the engineers struggle, commenting on the issue: "No, you got it wrong, Copilot, this and this and this," and it's like, "Do it again," or "Do it better," and so on. "No, you got it wrong again."
Databricks Workshop Experience
Speaker 3And people were upset, like: "Is this the next version of .NET that we'll have to use to build our software? Is this AI-generated code? Can I rely on this?" But they're only doing this because they have to dogfood it and improve it. And I do have faith that at some point this will become good enough.
Speaker 1Like, today you can vibe code in some kind of way and get good-quality results, but I think we're only at the first phase of this. And for sure, that's what I mean: things will evolve as well, right? But for example, the issues: is the issue very well structured on what you need to do, or is it just saying "give me this"? Because that's also very different, right? Is it a big feature or a small feature? In the beginning, I was just saying, "Okay, I need to create an app that does this. First, set up the database. Okay, now do that," just breaking it down a bit more. That already helps, you know.
Speaker 1So I think there are a lot of things, and again, the tooling is going to get better. There's Claude Code now, so it's in your terminal. There's also the Gemini CLI. I was actually talking to Bart about it, and he said he likes Claude Code, but not because it's better than the IDE for any particular reason; it's just that it helps you focus on what matters.
Speaker 3Yeah.
Speaker 1You know, you have your human context. If you have too many files, too many things... no, just focus on this, one thing at a time. So I do think there is a bit of a skill in how much detail you want to add, and also in slowing down. Sometimes it's so easy to move fast with these things that you just give two sentences, boom, something, boom, something, and then you stop and you're like, "Whoa, there's so much crap here. Where is this going?"
Speaker 2Right, it's a dopamine shot. I just write one sentence and I get an entire app. Yay! Exactly, you know.
Speaker 1So I think there's a bit of a skill there as well. At the same time, going back to what we were discussing about junior engineers, they may have a skills gap in the core, let's say, in what it's actually doing. I also think we need to learn how to leverage these tools better; some of the stuff is going to be good, some of the stuff is not. But I think we all agree that if you're not using this, you're going to fall behind, far behind, in many different ways, right? You're going to be less productive, and it applies to learning as well; you can learn way more things now. I was even mentioning before that I started playing with Claude Code, and sometimes I'm multitasking, right?
Speaker 1So I say, "Claude, do this," and it does something, and I go check this, check that, and come back. "No, no, you're wrong. What I mean is this, this and this. Okay, try again." Then I look at the directory: "There are too many things, this is too complex, organize this," and it does a good job, right? But then I was thinking, okay, that works because I'm multitasking. If I'm just doing this, maybe I can open two terminals on two branches of the same Git repo and just say, "Okay, you do this, you do that." And then maybe I can open four, you know. I even heard a podcast where the guy was saying that's just his job now, and I just [unclear] for your brain, something like that. Not really. It's already the case with the current generation and TikTok scrolling that they don't have any attention span, but then you're switching between tasks like every second.
Speaker 1Yeah, but it's really like, maybe... And I was even looking: okay, Git has this worktree thing, so you can actually have two branches checked out at the same time, and then try to merge them afterwards. And then my brain was spinning, you know. I think you're trying to be too productive.
Speaker 3People scroll on Reddit while they wait for the code.
Speaker 1Usually you don't have to work. But I just think that sometimes, if you just have the one Claude terminal, and it's just a terminal, right? There are no files; you just say something and you're like, "Okay, okay, let me check. No, okay," and sometimes that in-between is enough for me to get bored, and maybe I'll forget. I think if I had a bit more interaction, that would be the sweet spot for me.
Speaker 2But doesn't Claude Code ask your permission before it runs commands? Isn't that the perfect...?
Speaker 1I already said, "Just do it, just do it." I mean, not for everything, but for a lot of stuff.
Speaker 1That's the interactivity part, though. Yeah, that's the issue. But sometimes it's like, "Can I add a dependency? Can I run uv add?" And I'm like, man, just add whatever you want. If you're going to delete something, let's talk about it, right? But for a lot of stuff, just go; it's in Git, you can always revert it. Yeah, but it doesn't git commit.
Speaker 1So I actually told it to try to commit as often as possible. I try to add pre-commit hooks as well, not too many. But I don't even know which pre-commit hooks it has, because I put Ruff in but said, "Add some linting rules, but don't be too specific. Just add something," and it did something, you know. But I was also wondering how much of that is like coaching a developer, not even a really junior one: you just go, "Ah, do this," and then I'll review your code after a while, and you do that, and then let's see how it's going.
Speaker 1There was also another article that Bart shared that drew a parallel between being a manager of people and a manager of agents, which a lot of the time is a bit the same: you have to be more specific, you have to add more context, you have to explain better what you're trying to do to really get the right output, right? And that really stuck with me. So now, instead of leading a team of three junior developers... and I'm saying kind of junior, because the knowledge that it has is actually quite advanced already.
Speaker 1Like you said, you can actually learn from the AI thing.
Speaker 1So it's not really that junior; it's like someone super knowledgeable, but usually bad at the bigger picture. Yeah, yeah. And I think sometimes, too, it's bad at asking questions, right? It's like someone who is super shy, who doesn't want to ask any questions, but has a lot of knowledge. Yeah, right. And then it's like, "Okay, do this. Okay, try that. Okay, do this," you know. Or, "Let's stop and think together: is this what we want to do? Where do we want to go next?" So I thought it wouldn't be exciting. And yeah, you need to do it, because otherwise you're falling behind, but you're not doing something you enjoy, right? But I also hear a lot of people saying that they get addicted to this; they have to force themselves to stop to go eat, to go take a shower, to go to bed, right? That hasn't happened to me yet with AI, but it has happened to me while programming, when I was trying to solve a problem.
Speaker 2I get really sucked in. But I'm trying to play more with it, and that's the... So you want to get to that space where you don't sleep anymore?
Speaker 1That's kind of it, or something like that.
Speaker 3But it's just like, I enjoy this so much that I just get addicted to it, you know. And I also heard from Vitaly, and I think maybe also in one of your podcasts, that it's nice to have someone you can discuss your code with. Typically you work by yourself: you're the one working on this feature, or this project, or this task, and you have nobody to discuss it with. Or you would have to start explaining it to a colleague when you're stuck, but they also have their own tasks to complete, so they don't have all that time to listen to you. So it's nice to be able to talk with someone else, even if it's a virtual agent, about what you're doing, and I think that's maybe the addictive part: you don't feel alone anymore, or something. No, that's not... don't play a sad song.
Speaker 1Do we have any sad tunes? Just uh, I don't know.
Speaker 3Thank you. I, I don't feel alone anymore. But typically you have the rubber duck, which is popular, of course (there's one right behind you as well).
Speaker 1Because it's known that if you explain your problem to the duck, or even to another person, you will progress in your understanding of your task and you might get unblocked or something. Yeah, I think this is like a rubber duck, but on steroids. Yeah, because you actually have something that says something coherent back to you. Yeah. And I think what sometimes happened to me as well was, even if the AI was off, like, "Yeah, that's definitely not it," it pointed to a part of the code that I didn't think about. So just because it doesn't give you the right answer doesn't mean it doesn't give you anything useful. Yeah, right. Indeed. And the other thing is, these models have gotten much better in the past year.
Speaker 2The difference is crazy, but there is kind of a threshold to how good they can get.
Speaker 1For sure, yeah. But I think what we see now is that the tooling around it is getting much better, right? I was even showing this before we started recording: if you work with Claude Code, they actually have a markdown file that is basically a kind of system prompt, CLAUDE.md. Yes, exactly. And if you say "do this, this and this," I think they update that file or another one with to-dos, so they kind of simulate memory like this. They have a to-do list with a whole bunch of checkboxes, and then they check them off, so it keeps track of all the tasks. I know that Claude Code spins up different shells within your terminal, so they kind of have a multi-agent thing, but it's still sequential, right? It still needs to do A, B and C. So the tooling around it is getting much better as well.
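The to-do file with checkboxes that gets ticked off as a kind of memory can be pictured with a toy sketch. This is not how Claude Code implements it (the conversation doesn't say, and the speaker hedges on the details); the markdown task-list format and the helpers `read_tasks`, `next_task` and `check_off` are hypothetical, chosen only to show how a plain text file can carry state from one sequential step to the next.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helpers, not Claude Code's implementation: a plain markdown
# task list ("- [ ] ..." / "- [x] ...") used as external memory for an agent
# that works through steps A, B, C in order.
TASK_PATTERN = re.compile(r"^(\s*[-*] \[)([ xX])(\] )(.*)$")


def read_tasks(text: str) -> List[Tuple[str, bool]]:
    """Return (description, done) pairs for every checkbox line in the text."""
    tasks = []
    for line in text.splitlines():
        match = TASK_PATTERN.match(line)
        if match:
            tasks.append((match.group(4), match.group(2).lower() == "x"))
    return tasks


def next_task(text: str) -> Optional[str]:
    """Return the first unchecked task, or None when everything is done."""
    for description, done in read_tasks(text):
        if not done:
            return description
    return None


def check_off(text: str, description: str) -> str:
    """Tick the first unchecked task whose description matches exactly."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        match = TASK_PATTERN.match(line)
        if match and match.group(2) == " " and match.group(4) == description:
            lines[index] = f"{match.group(1)}x{match.group(3)}{match.group(4)}"
            break
    return "\n".join(lines)


if __name__ == "__main__":
    todo = "# TODO\n- [ ] set up the database\n- [ ] build the API\n- [ ] add tests"
    todo = check_off(todo, "set up the database")
    print(next_task(todo))  # build the API
    print(todo)
```

Because the state lives in an ordinary file rather than in the model's context, a later step (or a fresh session) can re-read it and pick up where the previous one stopped; that is the "simulated memory" idea in miniature.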
Speaker 3Right, there's an argument that it takes away your chance to go on coffee breaks or do multiple tasks. It's still too slow to have a rapid interaction with it.
Speaker 1Indeed, and that's why I feel like maybe if you just scale it, just get four of those going, maybe that'll be enough. And what about your token usage? No, I haven't gotten to that yet. I just had this idea, like, "Oh yeah, maybe I can do this," because I think Claude Code can still be quite expensive.
Speaker 1I have heard stories about people. I think you need to have the... So, yeah, the cost can explode if you enable usage-based pricing. But if you just have the subscription, then it just runs out and says, come back in a few hours. Again, I've never reached the limit yet because, like I said, I was still multitasking; I wasn't running two agents or anything. But I would expect you can spend at least two, three hours there. Actually, if you have two of those agents, you can have concurrency issues and race conditions. What do you mean? For example, if you tell one agent, "Yeah, fix this code," while the other agent is writing code in the same app. Yeah, but that's what I was thinking: is it racing the other one?
Speaker 1That's why I was looking into it: okay, let's scope them into branches. That's why I like Git worktree: you can have two branches checked out on the same machine. When I looked it up, it's git worktree, then you specify a new directory path, and you create a branch, or a version of a branch, in that directory. Then you can have two things open on two different branches. Okay, but that means the two tasks you divide among the agents cannot have any intersection.
Speaker 1In what they have to do, ideally. Ideally not, right? But I was thinking it's the same thing as working with developers. If you have two people working on two features: conflicts, exactly, you can have merge conflicts.
Speaker 2But those are two features. Features are by definition different, but sometimes there's an overlap right.
Speaker 1Sometimes you touch the... and that's the thing. My idea is to also ask Claude, "Hey, we want to do this. Can you give me steps? How much of this can we parallelize?" And then just try to spin it up and see if we can work like this. So it's a... Sounds fun, you know. Indeed. Then let's see how it goes. Cool. Any... I know we talked a lot about AI, and I think every podcast we end up there at the end.
Speaker 1I can imagine. It is what it is. Do you have any other comments, any other things you would like to share? Maybe I have a question: SQLBits 2026, would you recommend it to someone who is considering going? Depends on the profile. I don't think I would for you. What kind of profiles is SQLBits best for?
Speaker 2SQLBits: from data engineers to DBAs.
Speaker 3Okay, let's say that spectrum. Okay, yeah, I would agree that it's quite a large spectrum; in the end, everyone working with data. Data science is indeed a bit underrepresented, but maybe they want to improve that. Maybe after they saw your talk, everyone's like, "Whoa, this is the place to be."
Speaker 2Yeah, next year. Yeah, let's hope those 20 people really spread the word. Yeah, yeah. Super spreaders.
Speaker 3But it's a really nice event, because it's really focused on... like, you saw the talk, a few talks grounded in reality. It focuses on talks that share actual experience: people talk about things they did. It's not sales pitches or marketing talks. It's really like: this is what I did, this is how it helped me, or this is how I got lost with this, and this is my conclusion. And you learn a lot. They even have talks that are non-technical.
Speaker 3I really liked one talk from billion di on how to ask questions, which is something we all have to do in our jobs quite often. Yeah. And he didn't mention anything technical, but it was still very interesting. So I like the conference because the talks they select are typically something you will learn from, and you're not there to get overwhelmed with lots of marketing pitches.
Speaker 1Very cool. Sounds like a nice vibe. Sounds like it's polished, let's say, because there's still some sponsor stuff, but it's not too salesy or too corporate or something like that. It sounds like a nice thing. Yeah, very cool.
Junior Data Engineer's Journey
Speaker 2Yeah, also, I will actually change my answer: I recommend it to data scientists too, to change their perspective a little bit, so that we don't live in silos anymore. That is true.
Speaker 1That's also the point. Yeah, all together, like unity. Let's be a Unity Catalog. You're JSON, you're CSV, you're Parquet.
Speaker 2We can all live together let's just be friends.
Speaker 1That's it, indeed. Alrighty, Dorian, Sam, thank you very much. I had a great time. Yeah, me too, thank you.
Speaker 3You have taste.
Speaker 2In a way that's meaningful to software people.
Speaker 3Hello, I'm Bill Gates. I would recommend TypeScript.