The Test Set by Posit
A Posit podcast for data science junkies, anomaly hunters, and those who play outside the confidence interval. Hosted by Michael Chow, with co-hosts Wes McKinney & Hadley Wickham.
Widgets Are Lego Bricks (and Other Things People Are Sleeping On) — with Vincent Warmerdam
Vincent Warmerdam has been the first full-time hire at a startup, a spacey punster who accidentally got himself a job, a bartender at an Amsterdam comedy theater, and a Dutch bike tour guide — and he'll tell you all of it was career development. Now doing DevRel at Marimo, Vincent makes the case for reactive notebooks, Lego-brick widgets, and why "number go up" is not a data science strategy. Also: chickens die. The model doesn't know. This matters more than you think.
What's inside
- How a spacey pun accidentally launched Vincent's career
- Why Marimo's constraints make it better for LLMs, not just humans
- The gorilla hiding in your dataset — and why the model missed it
- Vibe coding vs. notebooks: three cells at a time as a discipline
- Widgets as Lego bricks: reusable, composable, criminally underused
- Cognitive debt, confirmation bias, and sycophantic data science
- Why natural intelligence is still, actually, a pretty good idea
Welcome to The Test Set. Here we talk with some of the brightest thinkers and tinkerers in statistical analysis, scientific computing, and machine learning, digging into what makes them tick, plus the insights, experiments, and OMG moments that shape the field. In this episode, we interview Vincent Warmerdam, first full-time hire at Marimo, founder of calmcode.io, which has some really nice tech tutorials, and someone who seems to be a man of many talents.
Vincent claims pursuing side activities, rather than doubling down on tech, is the killer feature of his career, and I honestly can't agree more. He started out bartending at a comedy theater, later practiced jokes at his other job as a tour guide, and finally landed a job at Explosion, the company behind spaCy, by delivering a well-timed joke in a bar at a conference.
And this brings us to the meat of the conversation: Marimo notebooks and data analysis. At a time when AI can generate entire notebooks in seconds, Vincent argues for the opposite approach. Have AI generate a notebook with a few cells filled out, and then just stare at a chart for a couple of minutes. Ultimately, get involved and be selfish in wanting to understand the data yourself.
So with that, Vincent Warmerdam. Vincent, thanks for coming on The Test Set. Yeah. Where do we begin?
I mean, I feel like you have such a broad set of things, from your open source contributions to calmcoder. Maybe you could just catch us up, give us a broad swath of the range of things you've been up to. Sure. I mean, one thing, it's calmcode.io.
It's not calmcoder. But no worries. No worries. It's a bit of a detail. Now, I get the question a lot, because it is true that I've done, like, a bunch of things.
But the way that I see it is, I'm kind of open in the sense that I like to expose myself to things that just seem interesting, and then you end up doing stuff. And as long as you document it properly and share it with people the right way, then you'll be known for things. So one thing that just happened with, let's say, the open source thing: my most downloaded package right now, I think, is scikit-lego. It's been downloaded almost two million times now.
But it was just me playing around with scikit-learn and sort of thinking, you know, there's a bunch of stuff missing in scikit-learn. Whenever I go to a different client, there's this one component that I always end up reusing, because it didn't have Pandas support back in the day. So I had a couple of components that did the Pandas thing, and a couple of components that did the GroupBy thing.
And there were a bunch of these tricks that I had in my pocket, but I kind of felt that contributing those upstream to scikit-learn was way harder than coming up with a little package that just shared the idea. But then it turned out people started running it in production for realsies, and, like, scikit-lego was kind of a cool name. So stuck in people's memory. Then I did a conference talk about it.
It became more popular. We got contributors, etcetera. Like, that was kind of the thing that happened by accident, but I was curious. I have tons of open source projects that went nowhere, just to be clear.
But this is one that just happens to stick around. And, you know, then you become known as the guy who did a good talk on scikit-lego, and, like, do more things, do more things. But just like with the keyboards, just like with the YouTube, just like with the career path, it is really just: I show up, I do things, I blog about it, make sure it's logged, and people remember me. That's the trick.
It's nothing planned. It just happens if you expose yourself and if you try to be a bit memorable about it. It's so interesting. I mean, researching your work for this, this is the most rabbit holes I think I've ever gone down, just intrigued with kinda, like, everything.
Yeah. It's interesting to hear about, like, scikit-lego. And I know you were also at Probabl and Explosion, which are really, like, open source driven companies. That might need an explainer too.
So what happened there: I was at this consultancy that shall not be named, because I'm still recovering. But I was attending, via that consultancy, this conference in Berlin. It was called spaCy IRL. I think it was like twenty eighteen, maybe twenty nineteen.
I kinda forget. But I went there because I thought, you know, I wanna learn more about these text embeddings. Something's happening there and I don't understand it. Kind of feels like I should know more about it.
That's how I met Matt and Ines, the co-founders of Explosion, co-creators of spaCy. And then I was just at a bar, and, you know, the bar was packed. We were all sort of standing side by side, shoulder to shoulder, and we went to a new bar. And I went up to Ines, and I said, thank god.
This bar is way more spacey. It's a really, really silly joke. Like, really, really silly. But they remembered me because of that.
And then, you know, the day after, we did some presentations, and then Matt and Ines looked at me and said, oh, I think that guy could YouTube well. Because they were looking for a YouTuber to maybe explain spaCy, and they thought, like, this Dutch guy seems funny. And I told them, like, look. Sounds cool.
I've never done it before. You're gonna have to coach me a little bit, but sure. And then those videos became pretty popular, and that's kind of how I got my first job at Rasa, which was this chatbot startup from way before we did LLMs, basically. And I turned out to be pretty good at the YouTube thing and the explainy thing, and made some packages and open source things there that kinda took off.
And then at some point, Matt and Ines said, hey, we have funding now. Maybe it could be cool to join us. So I joined them and did cool things there, and then they ran out of funding.
So then I was looking for something else to do. And it turned out that scikit-lego was one of the most popular scikit-learn plugins by then. And then the scikit-learn people who started this company called Probabl said, hey, Vincent, could you join us here maybe?
And then there, I started doing a podcast for promotional reasons, and it turned out that there was this project called Marimo, which was pretty interesting. I interviewed the founder, because we had a podcast back at Probabl that I was hosting, and it turned out that the founder was from the same town that I lived in. I lived in California when I was twelve, and he's also from Cupertino. He lived like two blocks away from me.
So we hit it off, and then he came to the conclusion of, oh, we also kinda need a DevRel at some point when we have funding. That's again how I kinda switched. It's just more coincidental than you might think, but it is conscious coincidence, in the sense of, oh, that sounds interesting, I'm consciously gonna move in that direction.
But that's how this all happens. Yeah. It's so interesting to hear it all snowball. And sort of, like, in the beginning, you described it: you were punning about spaCy, but secretly, unknown to you, you were actually interviewing a little bit. I don't know if they were aware of it, but yes. I don't know if either of the parties knew that they were interviewing. But one bit of feedback that I have gotten quite a few times is that it is useful if you can leave a memorable impression, because then people remember you. There's a lot of people on the planet, and a lot of people are easy to forget.
But if for some weird reason you're kind of funny, then people remember you. I was just saying to Vincent before the call that, you know, we met one time at a conference in Budapest where I was giving a talk, and I met this Dutch guy who's really into open source. And I think you were still working for a software consultancy around that time. And yeah.
So you left a good impression. So when you popped up later and I saw you went to do DevRel at Marimo, I was like, oh yeah, Vincent, you know, he's a memorable guy. Yeah. I think I gave my somewhat famous Pokemon lightning talk at that event.
Anyway, it doesn't matter. But yeah, there is something to it. And one other thing that I do think, and this is like a hindsight explainer: you look at your life and at some point you come to a conclusion. One thing that in hindsight turned out to be, like, the best thing ever that I did back in college, and this is a really, really random thing, but I was a bartender at a comedy theater in Amsterdam, like one of the oldest ones, a really, really fancy one.
And all during college, during college season, outside of the summer holiday, I was just serving beer to the most famous Dutch comedians, basically. And in the summer, I would be a Dutch tour guide. So, like, people come to Amsterdam, and you're a tour guide on the bike. And I was kinda thinking, gee, it's actually kinda perfect that I can practice the jokes that I kind of picked up from the comedians.
Because then during summer, I just have a new group of tourists every day that I can practice on, basically. And that was also the way for me to make it fun to do that gig. So I ended up doing that for, like, four years, basically. And then you fast forward, and you're the guy who's really comfortable doing conference talks, and you do kinda go, okay.
That's actually a skill you learned back in your college days when you had that job on the side, basically. And it's one of these little things. I also was really into rowing; I did semi-professional rowing for a good year.
Like, little things like that do stack up if you just have a wide variety of skills and things you've done. And that's something I do think some people get wrong; there's a mistake in thinking, I'm just gonna double down on tech, that's gonna be the path forward. I mean, it could work, but you're probably gonna be more interesting if you do more side activities. At least for me, I've noticed that, yeah, in hindsight, that is actually kind of the killer feature of my career at this point.
I think the tour groups too is interesting to hear, where you just have the repeated chance to practice. It's a lot like the format that, I guess, comics use where they're hitting open mics. You're just taking advantage of these groups. Well, I mean, another aspect of the tourism thing is, like, Amsterdam has a pretty okay, rich history, I suppose. It's also interesting for that; you get to hear all the stories.
But the main thing you practice with that, by the way, is not so much jokes; it's storytelling. That's the key thing. Because you gotta tell the stories such that the tourists don't get bored, effectively, because then you don't get tips. Yeah.
Jeez. It was actually high stakes. You had something riding on it: sweet tips.
It is nice to know that you're the guy who gets, like, the most tips from the entire group. Like, that's a fun game to play. When I asked how you wound up in your current role, you said you were backpacking through Latin America? Oh, right.
Yeah. The story starts a bit earlier. Okay. Yeah. Yeah. I did say that.
So let's go back even further. You gotta imagine, my degree is in econometrics and operations research, so very much applied math, but very little coding. Like, they didn't really teach me any coding whatsoever. They gave you a Java course and, like, a course in OxMetrics and some other language no one writes in.
So I had to teach myself Python and R and JavaScript and all of those things. But I wasn't sure if that was gonna be my career, because you're normally supposed to become, like, an actuary or something like that, good money and all that, but, oh well, I don't know. Like, data science might become a thing. It was around the time that was kinda new.
But I had to figure out if coding was gonna be a thing that I was gonna do, if that was gonna be a thing that I would enjoy doing. And then I figured, you know, I also wanna go backpacking. But if I take some work as an independent contractor with me as I go to Buenos Aires and through Latin America, up to Peru and all those nice places, and at some point it hits me that I feel more like programming than going clubbing every evening, that'll be a signal, such that I can say, okay, Vincent, now you gotta commit to this career path a bit.
Because there's something about this that really just tickles you. And I just noticed, as much as, you know, going out in Buenos Aires and all those lovely places is great, as amazing as the cocktails in Lima, Peru really are, at some point, I just felt like coding a little bit more than going out every evening. So that's where it started, that realization. Because then you kinda know, like, okay.
I don't know where I'm gonna end up, but I do at least have an experience that really confirms a direction, at least. Right? So that's where, in my mind at least, the career really started. I love that you just stacked them against each other, like, head to head.
Who's gonna win? I think, like, while I was backpacking, it just became very clear. I don't know if I was that conscious about it upfront. But at least in hindsight, this is the story I tell myself, I guess.
Yeah. It's so interesting, the whole path into tech. And now you're at Marimo. Do you mind telling us a little bit about Marimo, how you got involved there, and what you're doing there?
So the way this all started, you gotta imagine I was back at Probabl. That's the company where a good chunk of the scikit-learn core team is, and I was doing DevRel stuff. And one of the things you can then do is say, let's look at all these adjacent technologies and sort of try them out, like, in a livestream. People find that interesting.
So, you know, Marimo was this new notebook thing. I was just kinda curious. And I tried it out on the livestream, then afterwards, I also did the podcast, got the founder on. And then I just played this game of, like, okay.
Do I find myself going back to Jupyter, or do I find myself going to Marimo? Because, you know, it's kind of up in the air. It's a new technology. It's kinda scary to invest in it.
Right? And then I just kinda noticed that Marimo started to correct me. Because a thing that Marimo does that Jupyter doesn't is Marimo basically says you can only define a variable once. There can be only one cell where you define the variable; there's none of that stuff in Jupyter where you can say a is equal to one in one cell and then a is equal to two in another cell.
And depending on the order in which you run the cells, the state in memory changes. Marimo doesn't allow for any of that. So in Marimo, it's all very much reactive. You can define a variable somewhere and refer to it from other cells, but the moment you define a variable twice, oh, we're gonna throw errors in your face, because probably something is going wrong.
And I just noticed, during a livestream, something totally went wrong in Jupyter that Marimo would have instantly fixed if I had just used Marimo. And that was the point where I kinda felt like, okay, it's got that one feature where it's automatically correcting me. Oh, and because it's reactive, you change a variable in one cell, and all the other cells that depend on that variable automatically update.
That is amazing if you wanna do stuff with, like, widgets and really get the browser into Python. That's also, like, a a thing you can really do. Like, oh, but hang on. I am rethinking the way I wanna work now.
There's something here. I should chase this. That's the order of things, basically. And I can give a more formal definition of what Marimo actually does, but that was the experience that got me towards the company.
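The two rules being described, one defining cell per variable and automatic re-runs of downstream cells, can be illustrated with a toy dependency graph in plain Python. This is a sketch of the idea only, not Marimo's actual implementation (which, among other things, also forbids cycles):

```python
class ReactiveNotebook:
    """Toy model of Marimo's two rules: a variable may be defined in
    only one cell, and changing a value re-runs every cell that reads it.
    (Illustrative only; the real engine also rejects cyclic dependencies.)"""

    def __init__(self):
        self.values = {}
        self.cells = []  # (name_defined, names_read, fn)

    def add_cell(self, defines, reads, fn):
        if any(d == defines for d, _, _ in self.cells):
            # Marimo throws instead of letting run order decide which wins.
            raise ValueError(f"{defines!r} is already defined in another cell")
        self.cells.append((defines, reads, fn))
        self._recompute(defines, reads, fn)

    def _recompute(self, name, reads, fn):
        self.values[name] = fn(*(self.values[r] for r in reads))
        for d, r, f in self.cells:
            if name in r:  # cascade to every downstream cell
                self._recompute(d, r, f)

    def set_value(self, name, value):
        self.values[name] = value
        for d, r, f in self.cells:
            if name in r:
                self._recompute(d, r, f)

nb = ReactiveNotebook()
nb.add_cell("a", [], lambda: 1)
nb.add_cell("b", ["a"], lambda a: a + 1)  # nb.values["b"] becomes 2
nb.set_value("a", 10)                     # "b" re-runs automatically
```

Redefining `a` in a second `add_cell` call raises immediately, which is the "throw errors in your face" behavior Vincent mentions.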
Yeah. Yeah. It's really interesting to hear that Marimo introduces, like, a constraint. Like, it doesn't want you to redefine a variable. But I love when something constrains your behavior and it ends up really kind of reshaping things, providing a lot of nice options as a result, basically.
It's kind of like abstraction in general. That's the thing I just really dig. Like, at some point, you just understand linear algebra or calculus or something like that, and then, oh, you don't think about numbers anymore, you think about integrals, and it's a different thing; you can approach the topic differently. Yeah.
I mean, after the Jupyter Notebook got really popular, I feel like there were a lot of people thinking about notebooks in terms of spreadsheets; everyone's familiar with spreadsheets and the reactive model of spreadsheets. And so they were trying to introduce those ideas into the notebook medium in a way that made sense. And I think, certainly in open source, Marimo is the first. Like, there are some closed-source notebooks that implement similar things, but you gotta pay for a product.
But to have that thing available in open source is really exciting. And what I always tell people about Marimo is that it's the fastest-growing Python project, I think, outside of UV. Maybe there are some AI projects that have become faster growing in the meantime. But if you look at the rapid star growth, Marimo reached, like, a hockey-stick growth pattern. I think UV's got like eighty thousand stars; I don't know how many Marimo has right now, but it was pretty breathtaking, rapid growth for a new project.
Yeah. I think we're around twenty k or something. But if we just look at downloads, like, that's the number that I typically look at. Yeah. It's like a more real metric.
We grow by five percent a week. That's pretty intense compounding. Yeah. It does get intense, yes.
Wish my bank account grew at five percent a week. That would be both amazing and scary. But no. Like, I'm also on the growth side of things.
I do a little bit of engineering now and again, not in the engineering team, definitely more on the growth team. But, yeah, it's not every day you're in a project that grows that fast. I mean, scikit-learn also had that phase at some point where it just sort of grew exponentially, but they're a bit more stable now.
But to be in there when the growth is actually happening is definitely a fun roller coaster. I mean, I think what's interesting is that Marimo appeared and kind of co-developed with modern AI and LLMs. And so I'm interested to hear a little more of your perspective on the role of AI and LLMs in notebooks. And certainly, it's clearly drawn interest from the AI world.
Like, Marimo got acquired by CoreWeave, which is, like, an AI infrastructure company. And so congratulations to the team on that. So it's clear that there's something going on with the AI world, but I haven't really thought too much about the user interface layer and how that interacts with how people build AI models and do AI in twenty twenty six. So, I think the earliest AI feature that we had, and I could be wrong because these were features that were kind of already there when I joined, but we had GitHub Copilot autocomplete, because that used to be free; there used to be an API for it. So that was a thing that was added in the front end quite early.
We also had, I think right around the time when I joined, this thing in the sidebar where you could bring your own LLM API key, and then we could do this thing where we contextualize everything that's in the notebook right now, have that be part of the prompt, and then we have this thing on the side with a copy button that lets you copy code into cells. And we had those features quite early. But then a few kind of lucky breaks just happened. One thing that happened was that Dylan, one of the engineers on our team, recognized that, hey.
Maybe it's not so much the LLM, once Claude Code happened. Maybe it's the harness that matters most. So we ended up making a linter just for Marimo, and it was really not meant for humans. It was meant for the LLM.
So the thing that we mentioned earlier: in Marimo, you have a cell, and you cannot have the same variable declared in two cells. That is one of the things that this linter will pick up, and it will do so in a way such that the text that comes out makes it very clear to the LLM how to fix it. So that's a thing that makes Marimo so much more useful when you're dealing with LLMs, because it makes it very easy for the LLM to just correct it.
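As a rough sketch of the idea (this is not Marimo's actual linter), a duplicate-definition check over cell sources can be written with the standard `ast` module, phrasing each finding so an LLM knows exactly what to change:

```python
import ast
from collections import defaultdict

def lint_duplicate_defs(cells):
    """Flag variables assigned in more than one cell, with a message
    that tells an LLM (or a human) exactly how to repair the notebook."""
    assigned_in = defaultdict(list)
    for i, src in enumerate(cells):
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, ast.Assign):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        assigned_in[target.id].append(i)
    return [
        f"Variable {name!r} is defined in cells {idxs}; "
        "keep one definition and rename the others."
        for name, idxs in assigned_in.items()
        if len(idxs) > 1
    ]

# Cell 0 and cell 2 both define `a`, so one message comes back.
messages = lint_duplicate_defs(["a = 1", "b = a + 1", "a = 2"])
```

The point of the explicit "keep one definition and rename the others" phrasing is that the model gets an actionable instruction rather than a bare error code.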
The other lucky break that we kind of got: there's this pretty popular IDE called Zed. I don't know if you've heard about it, but they invested a lot in this ACP; I think I'd call it a protocol. It's basically this way such that if you're an agent like Claude Code or whatnot, you have a kind of protocol that other apps can plug into.
So Zed kinda started with that. Just to round it out, ACP is the Agent Client Protocol, for anyone wondering. Yes. Yes.
So if you're, let's say, a text editor, you don't have to do any nasty terminal scraping, reading the text that comes out and buffering it back in. It's an actual protocol that Claude can understand, and you can interface with it. And that's something we could basically just piggyback off of, and that thing works very well. So one thing that you can do with an ACP like that, and that's a trick that we like to use in Marimo: let's say that you have a Polars pipeline or a Pandas pipeline, something where you start with a data frame and then you do, like, a step. Like, we have logs from a server.
We add sessions, and then we filter out the bots. And the thing that comes out of that, you wanna write some more Polars or SQL for. It would be amazing if the LLM could just get the schema from that data frame instead of having to try to figure out what the columns are after six of these pipeline steps, so to say. So what you can do in Marimo is you can tell the LLM @ and then pass in a variable, and then we can add context for that variable that's in memory right now.
And we can have different context if it's an integer, and different context if it's a data frame or a SQL table. And that also helps the LLM immensely when it comes to writing proper pipelines. So there were all these sort of lucky breaks that we were able to discover quite early. But, also, to be quite honest, if I were Jupyter right now: Jupyter is so used everywhere that they're in a spot where they just can't make sudden changes to the project.
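The per-type context idea can be sketched like this. The function name, messages, and `FakeFrame` stand-in below are made up for illustration; they're not Marimo's API, though the `.schema` attribute mirrors how a Polars DataFrame exposes its columns:

```python
def describe_for_llm(name, obj):
    """Build prompt context for an @-mentioned variable: a dataframe-like
    object contributes its schema, a plain scalar just its type and value."""
    schema = getattr(obj, "schema", None)
    if schema is not None:  # dataframe-like: hand the LLM the columns
        cols = ", ".join(f"{col}: {dtype}" for col, dtype in dict(schema).items())
        return f"{name} is a DataFrame with columns [{cols}]"
    return f"{name} is a {type(obj).__name__} with value {obj!r}"

class FakeFrame:  # stand-in for the result of a real pipeline step
    schema = {"session_id": "str", "timestamp": "datetime", "is_bot": "bool"}

context = describe_for_llm("logs", FakeFrame())
```

With this, the model sees the columns that exist *after* the sessionizing and bot-filtering steps, rather than guessing them from the raw input.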
Like, that would be kind of dangerous, especially if you consider that Jupyter doesn't just do Python, by the way. It also wants to do R and Julia and lots of other languages. It's pretty much language agnostic. And on the Marimo side, it's a whole bunch easier, because we just say, no.
We only do Python. Right? So there's reasons why we're also able to iterate quickly. But, yeah, we just had a couple of, like, ecosystem lucky breaks, things that were just working out very well for us.
Also, like UV, for example. UV added support for that PEP, seven twenty-three, the one where you can add the dependencies at the top of the Python file. Like, for UV run. Yeah.
Yeah. It's a miracle drug. It's a miracle drug. But here's the thing: Marimo notebooks are Python files, so it works for us immediately, for free.
That's amazing. So we can package our notebooks dependency-free, basically, without the requirements.txt file. And we can do that because we made the choice to have the notebook just be a Python file; for Jupyter, it's a different story, because reasons. Yeah, a lot of it was a couple of good lucky breaks where we were able to just integrate really quickly, but also we are a pretty small team.
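The inline metadata being described here is PEP 723: a comment block at the top of a `.py` file listing its dependencies. The dependencies below are illustrative:

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "marimo",
#     "polars",
# ]
# ///

# To the plain interpreter the block above is only comments, so the file
# runs anywhere; a tool like `uv run notebook.py` reads the block and
# resolves the listed dependencies before executing.
GREETING = "this notebook file carries its own dependency list"
```

Because a Marimo notebook already is a plain Python file, this mechanism applies to it without any conversion step.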
The engineers are super solid. I think we make one release a week now, but it wasn't that long ago that we made three. We're trying to be a bit more stable. But I definitely tinkered a bit with the agent client protocol stuff in Marimo, and I'm super excited. I mean, it seems like it's early days and it's just rolling out, but I'm super excited to see it integrated, and I feel like it's for real in OpenCode now as well.
So OpenCode, that's this other CLI, and they support all the open source models as well, so that was something we could also just plug in quite easily. I will say, one thing that took me a little bit by surprise, but it's something we're heavily investing in now, is just skills in general. So the idea there is: even if you don't wanna use any of the AI stuff inside of Marimo, what can we do to make Claude Code or any of these command line tools as good as possible at just generating Marimo notebooks on the fly? And we're almost at the point where we can vibe code an anywidget just for one analysis.
Like, we're not quite there, but I do have a couple of these examples where they just worked immediately like that. I mean, I've seen a ton of people who aren't familiar with Marimo, never used it before, and with the dashboard mode, they're building little apps. People were already using Jupyter Notebooks and Jupyter widgets to build little dashboard-like things, but here you develop the app, and then you can look at it in dashboard mode and you don't see the code, you just see the widgets, basically.
So it's pretty cool. Speaking of ACP, I just merged, just shipped, I guess, ACP support in my code review system RoboRev. So, clearly, it's gotten interest in the broader open source community. Like, whenever people encounter a new project, they're like, oh, can this project do ACP?
And so it's definitely very interesting. So on the topic of widgets and stuff that you can do in Marimo, I do wanna have one plug for my colleague, Trevor. There's a standard called AnyWidget. I don't know if you've heard about it or seen it.
It's getting more ubiquitous. It's the sort of thing I'm trying to do. There is now a standard such that if you build a widget according to this one spec, it'll work in Jupyter, it'll work in Colab, VS Code, or Marimo, all the places, basically. And people are sitting on this thing, like, sleeping on it; people should definitely invest way more in it.
There's this library that I've built where I just make all these widgets for my analyses and I share them with people. People should really do more of it. But the thing that I really love about those widgets in particular is that you can kind of start imagining that we can now create these magic LEGO bricks, as it were, such that you can actually attach browser functionality to Python, which is something that used to be super hard. I can now honestly make notebooks that let me use a gamepad to run some Python scripts.
And for annotation reasons, for files and stuff, there are good reasons why that makes a lot of sense. But also webcams, and all sorts of other APIs that the browser has; you can just attach that to Python now, and you've got access to JavaScript and all the good things. So one thing that I do try to remind people of is, it's not so much, yeah, we've got Marimo and that's great; it's that the entire ecosystem is way more modern than it was ten years ago. And that means that you should take a step back and just reflect on, hey, what's new?
Because you can rethink things definitely at this point. It's not just LLMs; the entire ecosystem's been made more modern, effectively. Yeah. It's a good point, too. If people are sleeping on AnyWidget, maybe worth noting that people like Plotly are using it, and I understand the adoption's pretty high across the ecosystem.
Altair also uses it now, I think. I've made this thing called drawdata, where you get a two d canvas and you can just draw a dataset that you can then use; like, you draw, and you get a Polars or Pandas data frame out of it. So the sky really is the limit here. Like, the most extreme thing that I've ever made, and this is a really, really nerdy thing, but for the longest time in my life, I always kind of understood differential equations, but nothing really clicked. And then I saw this one YouTube video about this differential equation called Lanchester's law, which is about when two, like, Age of Empires armies just smash into each other.
Apparently, there's a differential equation such that, if Blue Army is slightly bigger than Red Army, you can calculate how many Blue Army soldiers survive. And I thought, oh, that's so cool. I wonder if I could vibe code an anywidget that just simulates all those battles with JavaScript, because you can do sort of collision detection and stuff, and you totally can.
So I built this collision detection battle simulator in JavaScript with Claude that generates data for the Python notebook to check if the differential equation actually holds, and it does. And again, part of it is the LLM. Yeah. Sure.
But, like, part of it is definitely the widget. And you just couldn't imagine this two years ago. Right? Yeah.
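For the curious, the check Vincent describes can be sketched without the widget at all: Euler-integrate Lanchester's square law and compare the survivors against the closed-form prediction. The army sizes and equal combat-effectiveness coefficients below are illustrative assumptions, not numbers from the episode:

```python
import math

def lanchester_survivors(a0, b0, alpha=1.0, beta=1.0, dt=1e-4):
    """Euler-integrate Lanchester's square law:
    dA/dt = -beta * B,  dB/dt = -alpha * A."""
    a, b = float(a0), float(b0)
    while a > 0 and b > 0:
        # simultaneous update, so each step uses the old values of both
        a, b = a - beta * b * dt, b - alpha * a * dt
    return max(a, 0.0), max(b, 0.0)

# With equal effectiveness, the larger army should win with
# sqrt(A0**2 - B0**2) soldiers left: sqrt(100**2 - 80**2) = 60.
blue_left, red_left = lanchester_survivors(100, 80)
predicted = math.sqrt(100**2 - 80**2)
```

The simulated survivor count lands within a fraction of a soldier of the analytic value, which is the same kind of agreement the JavaScript battle simulator was built to demonstrate.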
I almost wonder, and I don't know if this is zooming out or in, if it's worth just talking about the widget a little bit. Because I think the term widget is so interesting, and a lot of people have some familiarity with working with widgets. But I wonder if it's worth talking a bit about, like, what is a widget to you? And what does it do so well for your workflow?
So I kind of imagined this question might pop up, and I think I've got the perfect analogy. I've got a kid now. Right? And I want a creative toy for my kid, so I could do two things.
One thing I could do is buy a 3D printer, and then you can 3D print all sorts of interesting things. Or I could buy the kid Legos. One of the crucial differences between the two is that if it turns out the toy I 3D print isn't exactly what my kid likes, the only thing to do is start from scratch and 3D print a new thing from the top, the whole thing again, basically. And if you then think about Legos, kind of the magic thing is that you can always just click it in and click it out. It's something that's really reusable in a lot of different ways.
And if you have a very useful widget, then there's more than one way to use it. So, okay, what would count as a widget then? Well, it could be kind of like a map, a Google Maps kind of thing inside a Python notebook where you can do GeoJSON stuff maybe, or a drawing-data thing that has lots of educational use cases. But the nice thing about a widget is that it's reusable in different ways and that it can click with Python and all the Python data ecosystem tools.
That's kind of the way that I think about it. And if I think about vibe coding in particular, vibe coding to me is a lot more like the 3D printer, where you're gonna 3D print the thing from scratch for one specific purpose. But once you've got something done, it's not necessarily the case that you can retrofit it to immediately attach into a Python notebook as well. That's not the way that you would think about it.
So widgets are like Legos. That's really the way that I like to think about it. And I guess just to finish out the last piece: when do you see someone wanting to be in a notebook? Like, you can have a widget in a notebook.
When do you put yourself in the notebook with widgets versus, say, vibe coding a React app? Do you have a sense for when they'd wanna be in a notebook with a widget? To me, it's when the human's involved a little bit more. So if people just have to use the app and the innards don't really matter, you know, the React app thing sounds a bit more plausible.
But when dealing with data, I mean, usually the whole problem with data science is that you wanna understand the story that's in the dataset. And if you understand it, then you can make better decisions. That's kind of the plan that we have. But in order for you to understand the data, the human has to be involved somehow.
Like, you have to be able to read a chart, or inspect things, or whatnot. So the whole point of these widgets is to make that part easier. Again, it's a Lego brick that you can kinda click in such that, hopefully, an analysis becomes more interactive, so the human can actually play around and swap things out. That's the game plan here.
And I wouldn't do a data analysis with a React app, I think. I would still prefer to do it in a notebook because it's more like a scratch pad. You don't know where you're going; you have to be very flexible. That's why we like notebooks.
One of the big questions on my mind these days, and I'm obviously interested in what you think about it, is this: in a world where people essentially stop writing code and they're just prompting, what happens to the current stack of tools, the widgets, the notebook? I mean, I think it's good that Marimo was built in this era already, so I assume you've all been thinking hard about this. But what do you think the average user workflow looks like in two or three years, where essentially nobody's writing any Python code anymore, but we still need to create these business artifacts that can be published? Whether it's an interactive dashboard that's a bunch of Python code and analysis plus some widgets that get published and displayed, and maybe the artifact is a Marimo notebook.
I don't know. But it's interesting that a lot of these tools were built for users who are spending their day looking at code, thinking about code, writing code every day, and we're essentially speeding toward a world where, you know, maybe people glance at the code, but they're not really looking very hard. If anything, they're gonna be asking agents to check the code for them rather than looking at it themselves, which is increasingly what I'm doing. Like, I'm not writing code, but I'm asking agents to read the code for me and look for bugs and things like that.
Yeah. So part of it is the psychology of it, I feel like, a little bit. Right? Like, oh, we don't know.
It feels kind of scary. And the whole thing with Claude, at least to me, is sometimes there's a wave of, oh my god, it can do so much. But then there's a phase where I see it fail horribly, and then I kind of feel grounded again. So that psychology is happening in the back of my mind; I just wanna be honest about it. I don't exactly know what's gonna happen, but I think about my days in consulting: what's the thing that really went wrong most of the time?
It's that people just didn't know what was in the data, and they were making all sorts of bad decisions because of it. And I don't know if the LLM's gonna do everything, and whether that means more or fewer bad decisions. But something in my mind is basically saying: the magic of the notebook is that it's also a nice way to debug. There is something about, let's make a chart that should be a summary.
And then just staring at the chart for five minutes can immediately give you this: oh, hang on. That should not happen. Why is that line going down in December? It should be going up.
So it's a debugging thing. And in that sense, you could argue, well, the code is not necessarily the most important thing. The most important thing is the understanding. The notebook and the code are a tool to get to the data understanding bit, so to say.
Right? So I don't know to what extent you need to properly understand all of the code, but you do need to be able to look at a chart and say, that's bad, we have to dive in deeper. That, I think, is a skill that I don't see going away anytime soon.
The best example I have of this is in R: my favorite dataset, the chick weight dataset. It's just this other dataset that's kinda well known. This is like, yeah. It's like it has two columns.
It's built into R. Is that... No. No. No. No. So one column is the diet, another is the time, another is the chicken, and another is the weight of the chicken.
Yeah, that's right. That's right. So over time, you see all these chickens gaining weight. That's the whole idea.
And you can train machine learning models on it. You can predict, given this diet and this time, how fat or not fat the chicken is. You can make all sorts of models there. You could do something very fancy with PyMC, something Bayesian.
It works magic on that dataset. But there's one thing wrong if you're gonna do modeling that way, and that is the fact that some chickens actually die prematurely. And the only way the model can be made aware of that is if you, the human, are aware of it. Because if you're gonna calculate some sort of regression average at time stamp ten, how is the model gonna be aware of the fact that some chickens died at time stamp five for that specific diet?
Well, we could have an LLM automate this, but this is my main example of how things will go wrong if you don't understand what's in the dataset, and variants of this story happen all over the place. So for me, the notebook is basically this environment where I can do my best to prevent stuff like that from going wrong. And, you know, it's one of Hadley's quotes that I remember: the cool thing about visualization is it doesn't scale, but it can still surprise you. It's that surprising nature.
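To make that failure mode concrete, here's a minimal pandas sketch with made-up numbers (not the real ChickWeight data), showing how a per-timestep average silently drops a chicken that died:

```python
import pandas as pd

# Synthetic chick-weight-style data (hypothetical numbers):
# chick 3 dies after time 5, so it simply vanishes from later rows.
df = pd.DataFrame({
    "chick":  [1, 1, 1, 2, 2, 2, 3, 3],
    "time":   [0, 5, 10, 0, 5, 10, 0, 5],
    "weight": [40, 60, 90, 42, 65, 95, 35, 38],
})

# Naive per-timestep average: at time 10 the struggling chick is gone,
# so the mean quietly jumps, and a model fit on this never sees the death.
print(df.groupby("time")["weight"].mean())

# Counting chicks per timestep is the five-minute stare that reveals it.
print(df.groupby("time")["chick"].nunique())
```

The second groupby prints 3, 3, 2: the survivorship bias is one line of code away, but only if a human thinks to ask.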
It's basically this debugging tool, but for numeric stuff. That's the way that I experience it. And even if the code is gonna be maybe less valuable, the numeric debugging aspect of it definitely won't become less important. So, one thing I do a lot now is I try to just reproduce academic articles with vibe coding.
Like, here's an article. Here's an LLM. I wanna understand this article as quickly as possible. And then the notebook, again, becomes this artifact where I can look at charts and I can try to understand if I understand the technique of the paper, so to say.
Right? Like, that's, in my mind, the essence of what the notebook is all about. It's not so much about the code per se. It's more this artifact that helps me think or reason, I guess, is the better word.
And if I'm understanding your paper example, it's like you're trying to reproduce it in a place where you can tinker and double-check and tweak. I wanna check if I understand the paper: can I reproduce the experiments from scratch, and then fiddle around and see where the limits of the paper's claims are? That kind of stuff. Yeah.
Yep. Yeah. It's an interesting one. I don't know. Wes, what do you think? Yeah. I'm definitely pretty AI-pilled these days.
I feel this sense especially for the data science ecosystem. I feel like data science is one of the domains where the most judgment and nuance is present. It isn't like building a to-do list app; doing data science is not the same as building a CRUD app. The choice of models and the decisions involved: there's a certain art and taste involved with choosing techniques and building a scientific process that tries to eliminate your predispositions or biases about what results the analysis should have.
And what I've been hearing from people who are doing vibe-coded science is that essentially the models are kind of tuning in to what you're trying to prove or demonstrate. And then they're subtly building data science that tries to fit itself to the assertion that you're trying to make, which is essentially amplifying your personal bias. I feel like this is exactly what we don't want. We don't want the LLMs doing sycophantic science.
Like, oh, what are you trying to prove here? Let me come up with an analysis that confirms your... you know, it's like a more intense version of confirmation bias, which is a classic problem in statistics and science in general. So, I don't know. In this new world where people don't wanna write code anymore: like, I haven't written code in months. And yet, I'm building tons of software.
But I'm doing software engineering, which is a different sort of thing. Like, I don't trust LLMs to add numbers, but I trust them to write Go code or Python code, to write functions that add numbers. And so it's a tricky thing. You could argue, though, and I'm assuming in your case: I mean, you've written pandas and a bunch of things, so you can tell yourself, okay, I've got taste.
Right? Like, I need this to happen in a certain way, otherwise it's BS. You can make claims like that quite comfortably, I think. Right?
As far as unit testing goes, there's like guardrails you can come up with. Right? Yeah. Right.
Right. Right. Yeah. But if people are mainly interacting with data science without actually looking very closely at the code, they're just looking at the charts and the tables being output.
And so the question is, can you effectively do data science by just looking at the results? You do need to scrutinize the methodology, so maybe you can have agents explore and essentially create a visual explanation of what the methodology is. Like, what is the model architecture? What is the data preprocessing pipeline?
What types of data transformations are being done? How are outliers in the data handled, data trimming and normalization, how is that being done? And so perhaps the solution will be to create a set of rules and best practices, essentially a type of spec for data science. Because right now, specs are all the rage for building software with LLMs.
You work with agents to develop a spec, then you do development, and periodically you do spec conformance checks. You ask adversarial agents, not the one writing the code: another agent wrote this code, here's the spec for what the software is supposed to do, could you evaluate the work that's been done against the spec and let me know if it's what I asked for, what we intended. And so maybe with data science, it's developing a set of prompts and processes where you're enabling LLM-assisted data science, but you have essentially a series of checks and balances to try to mitigate the effects of confirmation bias, and the model sycophantically changing the methodology so that it produces your desired outcome, because it's just trying to make you happy, basically.
It's like: prove the thing that the user asked for. I mean, in my mind, there are a couple of very human habits that I tend to rely on, though it might also depend on the kind of work I do and the kind of notebooks I write. It's the chick weight example again. Right? If I've learned anything from the chick weight dataset, it's that if I make a chart, I need to stop for five minutes and look at the damn thing, because otherwise you're not gonna notice the line that stops in the middle where a chicken might have died.
That's kind of the lesson. And that also means that if you're vibe coding: okay, one chart at a time, and I need to stare at the chart for a couple of minutes before I move on. If I'm doing serious work, I'm not gonna generate entire notebooks from scratch, because then you're not gonna be part of the story of what's happening there. And I don't know.
Some of this does feel a little bit more like a human psychological attitude thing than something LLMs can correct in each other. Because, again, it's all about me trying to understand the story in the dataset. And if I remove myself from that equation, I'm gonna be in a world of hurt. I need to be involved somehow.
And maybe the solution then, at least the most plausible one to me, is: let's only generate three cells at a time, not any faster. And when you reach a milestone, make sure you understand everything that's been happening, and then you move on. And one really cute feature in Marimo is that you can have a columnar layout. So you can have column one, and when you reach a conclusion, move on to column two and organize your thoughts that way.
But to me, that's more the way to go about it, at least in this phase. Maybe the future is gonna be different, but for CRUD apps, again, it's a bit different, because you know that you need a model, a view, and a controller and some CSS and stuff, so you can kinda go: okay, boilerplate, let's generate the whole thing. But for a notebook, I think the approach of just taking it a few cells at a time might be the best remedy. I think that resonates so hard.
And just to recap some of the things you said: I think it's really relevant that you started with, like, Marimo brings a constraint to users, that you can't redefine a variable, and that constraint opens up a lot of really nice behaviors. In a way, it's psychological. I mean, it is a programming constraint, but it gets people to think in a different way, and it unlocks a lot of stuff. I think to your point about this psychological question: what am I doing in the loop, and how am I approaching the problem?
Like, I really feel that. When I'm vibe coding, when I'm heavily generating stuff, I do think there's this softening in my brain where I really am not attentive to a lot of things. And I could see the notebook, at least as a theory, like, being in a notebook three cells at a time as a type of constraint on yourself to try to put your attention on the right things. That feels like a really interesting approach, and one I'm pretty curious about as I use more AI and notice where I put attention and where I don't.
I mean, I've reached a point with my agentic engineering, essentially, where I'm not reading any of the code. And my reaction to that was: well, if I don't read the code, then I should have agents read the code for me and analyze the code. So I built a system called Roborev to be my agentic pair programmer, to make sure that all the code being generated out of the LLMs is being reviewed by an agent that didn't generate it. And maybe even multiple perspectives.
So sometimes I'll have both Gemini and Codex review code that Claude generated, and they'll find different bugs, and that's been definitely helping. But I feel like at a certain point, we probably need essentially the equivalent of that for analysis. It's not a code review per se, but more of a methodological review of whether the data science being done makes sense. Does the data visualization make sense? Does the modeling approach make sense?
So essentially: how can we avoid confirmation bias, and the code-generating agent sycophantically trying to generate an analysis that is in tune with what the data scientist is trying to demonstrate with their analysis. And if I can recap for Vincent's sake, I almost think there are two very different takes on this: Wes is very AI focused, AI writes the code, AI reviews it. Vincent, you mentioned, for some situations like exploring data, a sort of notebook, human-in-the-loop approach.
I could definitely be on board with the whole, let's have the LLMs write the code if that speeds things up. I think that could be a perfectly defensible attitude. It's just that I'm kind of selfish. I wanna understand the dataset myself, and if I'm being removed from that equation, then I'm failing.
Like, even with LLMs, I still wanna be able to learn; that's kind of the attitude. And if it means that I don't have to do syntax, I think that could still be fine. But then I wanna understand, like, how do you do... I don't know. I'm doing reinforcement learning stuff with LLMs now.
Like, what are good techniques to get your own LLM that does a very specific thing, one that's better at structured output, for very small LLMs, let's say. Okay, what are the techniques that you could apply in order to get there?
Well, I mean, the LLM could maybe read a paper and implement something, but then I want to understand the failure scenarios. Because, you know, I do wanna learn. And I don't know how useful the LLM is unless I'm able to learn along with it at this point. That's kind of my attitude.
Yeah. We've been seeing a bunch of articles and blog posts come out about cognitive debt. Like, if the LLM does everything for you, then you don't have any mental model of the code base or the problem. And I think for some problems, that's fine.
But for other problems, you get to this point where it's catastrophically failed, and you've never looked at this code base before. And now you've gotta build your way out from zero. To some extent. Right?
To what extent is this really a new thing, though? Because I remember back in the day, what a lot of these data science people would do is they would just hit shift-enter in a Jupyter notebook and run the whole thing. They would call fit and predict a few times, then they would say: look, number go up. Right?
So to some extent, it's also not a really new thing that some people are, quote unquote, a little bit lazy about taking a step back, and just say: oh, it's great, everything's automated. I like to think, though, that one of the reasons my career has progressed is because at some point I took the effort to go just a little bit deeper, such that I knew more of what was happening and could apply more tricks in practice. And sure, some of that was syntax, but some of that was also just: okay.
What am I actually doing? Am I solving the right problem? That kind of a mental exercise. And again, even with LLMs around, that mental exercise is still on me, I think.
There still needs to be someone who understands the dataset. One thing you said earlier: I think you mentioned that one underrated soft skill in data science is staring at a chart for five minutes. For five minutes, no less. Yes.
Five minutes. Yeah. Maybe that's a skill that, as AI rolls out, will be really hard to cultivate, and really important, especially important, to cultivate. Have you seen Bluffbench, Vincent?
That's something that Sarah and Simon put together, where it turns out that if you ask an LLM to read a plot, it often just reads the axis labels, effectively, and then tells you a story about what it believes based on the correlation of those two variables. So I think there's also a lot of this visual intelligence that, you know, our LLMs have yet to... Have you heard of the gorilla dataset? Oh, it's the best. This is the best.
Okay. So there's, I think it was in Italy, a statistics class. They split the group in half, and good students are equally represented in both halves.
One group gets a dataset and is told: here's a couple of hypotheses we want you to check. The dataset has body mass index, number of steps, and, I believe, male or female. And they wanna check things like: okay, do men take more steps?
Like, a couple of hypotheses you gotta check. The other group basically just got the dataset and was told: do something with it. What's the story in this dataset? If you were to plot the actual dataset, body mass index on the x axis and steps on the y axis, with blue and red colors for male and female, you'd get a picture of a gorilla waving back at you.
Now, one of these two groups was more likely to discover this than the other one. Can you guess which one? The one that was preoccupied with the hypothesis checks didn't bother making the plot. So, as a cute exercise, this was like a year and a half ago:
I get this ChatGPT analysis bot thing, so I figured, hey, let's give the dataset to this bot and see what it makes of it. It completely failed. I made a YouTube video of it.
It was just super interesting. They're really bad at charts. But again, I think humans are also pretty bad at charts if they don't take five minutes. If you just glance at it and call it a day.
There's a lot of stuff you can catch at a glance, that's cool, but it's usually like: hey, why does that line go down there? That should never happen, and things like that. Wow. I can't even get people to read blog posts these days, you know?
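Circling back to the gorilla study for a moment, its lesson fits in a few lines. A sketch with hypothetical stand-in data (the real dataset draws a gorilla; this just contrasts the two workflows):

```python
import pandas as pd

# Hypothetical stand-in for the study's columns (made-up values).
df = pd.DataFrame({
    "bmi":   [18, 22, 26, 30, 20, 28],
    "steps": [4000, 9000, 6000, 3000, 11000, 5000],
    "sex":   ["f", "m", "f", "m", "f", "m"],
})

# Hypothesis-first workflow: answers the question, hides the shape.
print(df.groupby("sex")["steps"].mean())

# Plot-first workflow: one scatter would have shown the gorilla, e.g.
# df.plot.scatter(x="bmi", y="steps", c=df["sex"].map({"f": "red", "m": "blue"}))
```

The groupby tells you whether mean step counts differ by sex and nothing else; only the scatter can surprise you.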
Yeah, I got that. Every time I send people, like, hey, I enjoyed this blog post, read this blog post, they're like: read? I mean, I gotta say, you've been writing a bunch.
I've been reading your stuff more recently, I gotta say, Wes. So you're definitely doing the good work. But I've also noticed there are these explainer blog posts that just get crickets these days, because why read a blog post when you can ask an LLM just in time? There's a little bit of that happening.
But I gotta take the moment, and I'm sorry for the listeners: you gotta imagine Hadley came in late. So now that he's in here, I get to ask him a question again. Hadley, do you know the chick weight dataset in R? Okay.
Okay. Then I'm gonna check if you actually understand what's wrong with it. Because it was my favorite dataset that I used whenever I did an R course, or a pandas course even. Can you tell me what is wrong with the chick weight dataset?
Have you used it in a class or in a training or... That brings back vague memories. It's like something is switched, like the experimental condition was switched, or some of the chick labels... No. Some of the chickens died prematurely. That's the one.
And on a specific diet, they die prematurely. So if you're gonna predict the weight at time step, like, twenty, some chickens would have died by that time step, but the model has no way of knowing unless you as a human were to model it that way. And that's kind of my main go-to for: you have to understand the dataset, otherwise stuff will go wrong in production, even when number go up. I didn't know about the dataset, at least.
I'm assuming Michael and Wes did not. So I tried to guess what it was, and I failed miserably, so I feel like... The one thing I do think is super interesting is: if you have an interesting question and you need an answer to it. Back in the day, if I had two afternoons with a person who knew a bit of Excel, I could teach them a bit of R, and they would be very productive. That was one of the really cool things about the tidyverse stack and, you know, all that stuff. For Python, it was not necessarily two days.
I needed a bit longer. But you can definitely get it down to, like, one day with an LLM. And something about that is super duper exciting. But I am hoping that people who are gonna walk that path are gonna use some sort of a harness or guardrail, make lots of plots, and do that kind of behavior, to make sure that they don't do it wrong.
Yeah. I'm so curious if y'all have advice. I actually just talked to a person who's planning curriculum for high school students who are gonna learn Python and how to deploy web apps and things like that. And I think, originally, they were gonna do a lot of teaching, building up what students need to know.
And now they're trying to figure out to what degree they should pull AI in really early and get you, say, deploying something first. They're asking: how should we sequence the instruction? Do you have thoughts on that? Like, half a year ago, I basically gave myself permission, like, okay.
Let's see if you can actually vibe code two apps from scratch. You're not gonna look at the code whatsoever. See what happens. And then, eventually, I did start looking at the code, but two things happened.
First thing that happened was I was using Flask, and there was no CSRF protection anywhere. It does kind of feel like you should at least manually do the exercise of what could go wrong if you don't add that. Like, there's something about book smarts, where you read it from a book, and there's something about street smarts, like, oh, I've actually pwned someone's server because there was no CSRF protection on the thing. Right?
And I don't know, something about that sniff test. It's kind of like doing a physics experiment: something with the Bunsen burner and the professor making something explode makes it stick way more in your mind. Again, something about the story.
And just reading it from a text that goes by so fast, like an LLM can barf paragraphs, right? I fear it's just not gonna be in your noggin as much. So one thing I think could be cool is that you could say: okay, kids.
We're all gonna build a web app now. We're gonna totally vibe code it. Then we're gonna see if we can break it. And, you know, that might be a classroom activity, but you're also confronted with the unhappy path, maybe, something like that.
But the other thing that went wrong, by the way, with the LLM doing the vibe coding thingy: I told it to make a makefile and to test stuff, and there was a production database. But every time I would do unit testing, it would flush the production database, because it forgot to set this one environment variable to only use a local SQLite thing as opposed to the thing in production, so my flashcards kept getting removed. And again, I do think if you hit your head, and that's part of the curriculum, that could definitely be useful. Right?
But I don't know. There's a difference between reading about an environment variable and going through the pain of learning how things can go wrong. How do you get people to be good at correcting the LLM if you've never done a correction yourself before? That's the tricky bit, I think.
Because that's the skill we're gonna care about. Writing the code is becoming less valuable, but being able to check the code, or being able to improve a code base by glancing at it, that's the more impressive skill these days. But how do you train that without the aforementioned part? Yeah.
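The environment-variable mishap Vincent describes suggests a simple guardrail. Here's a hypothetical sketch (the variable name `APP_DATABASE_URL` and the URL-scheme check are assumptions for illustration, not his actual setup):

```python
import os

# Hypothetical guardrail: tests should never see the production database.
# Default to a local SQLite file unless the variable is explicitly set.
DATABASE_URL = os.environ.get("APP_DATABASE_URL", "sqlite:///local.db")

def is_production(url: str) -> bool:
    # Treat anything that is not file-local SQLite as production.
    return not url.startswith("sqlite")

def assert_safe_for_tests(url: str) -> None:
    # Call this in the test setup, before any table gets flushed.
    if is_production(url):
        raise RuntimeError(f"refusing to run tests against {url}")

assert_safe_for_tests("sqlite:///local.db")  # fine
# assert_safe_for_tests("postgresql://prod/app")  # would raise
```

A one-function check like this is exactly the kind of correction you only think to add after the flashcards are gone.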
I mean, I think right now we're seeing this rapid adoption of coding agents, and everyone's using Claude Code and Gemini and OpenCode and all the things. But I've also noticed that a lot of people are essentially putting their minds to sleep a little bit, in that they feel like the agent knows what it's doing and is looking out for doing things the right way. And yet, the more you use these things, the more you observe over and over that they have egregious blind spots: they forget to do things, they will blow away a production database by accident.
And so I feel like at some point in the future, maybe there'll be some evolved version of coding agents where there's an agentic supervisor in the loop applying software engineering best practices and security guidelines and all those things. But right now, it's a little bit of a do-it-yourself affair. Either you create a process where you essentially apply this supervision yourself, and the best case scenario is that you're the supervisor and you actually know what questions to ask, how to correct the LLM and tell it: oh, you forgot, you're putting the production database in the same place as the application deployment.
Like, you're gonna destroy that with your next rsync. Don't do that. I actually had that happen to me, by the way. Fortunately, it was data that didn't really matter, but I was like, did you just destroy my production database with that rsync?
Like, yeah, you did. You didn't think twice about doing it, either. Did you just rm -rf that directory? It had important stuff in it.
And so there's a lack of guardrails right now, and of supervision. And that's, I think, probably why we're seeing a disproportionate benefit from coding agents going to really experienced people: they're able to leverage their pre-existing skills, bring their judgment, and recognize when the agents are going off the rails. And if you don't have that experience, or the ability to recognize — oh no, press Escape, you're doing the wrong thing, you've gone off the rails — then, you know, a lot of people are just gonna be in a world of pain.
So all this is reminding me: for a web app, you could maybe come up with a really constrained web framework where there's only one way to deploy the thing, and that's the foolproof, LLM-proof way — because it's a CRUD app, you can really constrain it. For data science, though, the whole point is that you are able to be creative, I think. Like mixing and matching tools in weird ways — the weirdest trick that works, right, in data science. The weirdest trick.
Back when I was at Probabl, I figured out a way to get logistic regression to beat XGBoost. The way that I did that — you gotta imagine there's a dataset with cars. Year that it was built, color, brand, that kind of dataset, and then price. I wanna predict the price.
Of a secondhand car. So, okay. One thing you could do is say, let's do one-hot encoding. So car brands: lots of zeros and a one somewhere. Colors: lots of zeros and a one somewhere. One-hot encoded.
And then you could try to compute a distance between two different encodings of these cars. But the problem is, how are you gonna compare two colors? That distance is always going to be one, basically, because it's a different color. But okay.
So how are you gonna do a similarity lookup? Okay. What you could do is have a regression model. You take your encoding, and then you train towards the price variable.
Oh, and that means you have a coefficient for every single variable — including, let's say, the one that was for a specific car brand. Oh, actually, I think that can be an embedding technique now. You can take your linear model, take those coefficients, and multiply them by the original array. Oh, and if you then wanna do a k-nearest-neighbors kind of thing, you now have a system that gets a distribution of prices based on similar cars as well.
And similarity now is not just how similar the properties are, but also how they relate to the price. And it turns out, if you build a system that way, you actually get better predictions than XGBoost, and you also get an uncertainty bound around that as well. Okay. I like this freedom.
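Vincent's coefficient-as-embedding trick can be sketched roughly like this. The car data, column choices, and neighbor count below are all invented for illustration — he describes the idea, not this exact code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import OneHotEncoder

# Hypothetical secondhand-car data: brand, color, year -> price.
rng = np.random.default_rng(0)
n = 200
brands = rng.choice(["audi", "bmw", "fiat"], size=n)
colors = rng.choice(["red", "blue", "black"], size=n)
years = rng.integers(2000, 2020, size=n)
base = {"audi": 30.0, "bmw": 35.0, "fiat": 12.0}
price = np.array([base[b] for b in brands]) + (years - 2000) * 0.8 + rng.normal(0, 1, n)

# One-hot encode the categoricals; keep year as a plain numeric column.
enc = OneHotEncoder()
X = np.c_[enc.fit_transform(np.c_[brands, colors]).toarray(), years]

# Fit a linear model toward price, then reuse its coefficients as
# per-feature weights: features that matter for price now dominate
# the distance metric, and irrelevant ones (here, color) fade out.
lm = LinearRegression().fit(X, price)
X_weighted = X * lm.coef_

# k-nearest neighbors in the weighted space: for any query car you get
# a *distribution* of prices of price-similar cars, not a point estimate.
nn = NearestNeighbors(n_neighbors=10).fit(X_weighted)
_, idx = nn.kneighbors(X_weighted[:1])
neighbor_prices = price[idx[0]]
print(f"mean={neighbor_prices.mean():.1f}, spread={neighbor_prices.std():.1f}")
```

The neighbor prices give you both a prediction (their mean) and the uncertainty bound he mentions (their spread).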
Like, this is something that no textbook will tell you. But if you start thinking about the problem a bit more and more, you can come up with these super-duper creative solutions. And that's also kind of the feedback I do wanna give to people, because people have always been asking me, like, Vincent, what book do you read to come up with these techniques? And I think the whole point is that I don't.
Like, I just take a step back. I take pen and paper. I have a coffee, and I think about the problem far away from social media and distractions, and, I give myself permission to let my mind free. And it's a little bit easier now that I've got a kid, actually, because he really gets me away from the laptop.
He really wants to go to the playground. But that's how you do it. The problem, though, is if you wanna then have LLMs help you — and LLMs are super powerful — it is very tempting to be intellectually lazy, and that's when you should resist. And maybe — this was a tweet from Andrej Karpathy, I think a couple years ago.
Maybe if learning feels a bit painful, that's a good thing, because it means you're doing something uncomfortable, something a little bit unknown. And if you look back at your week and you kinda go, this entire week was easy peasy — maybe you should start getting worried. That's maybe a way to think about it: maybe you should invest in a new hobby and learn a new skill that way. Maybe I'll do calculus again, just for the fun of it. Also for the puzzling.
Part of the point of learning is mental exercise. I did pure math as an undergrad, so I spent a lot of time doing, you know, analysis and topology and abstract algebra and things like that. And certainly, I don't apply any of those skills now. Like, was it useful to learn Galois theory? I did, but I've forgotten all of it.
But I do think that mental stretching — racking your brain really hard to think about difficult theoretical problems, being able to reason about these abstract structures in your mind — it does something. I don't know how I would have turned out if I hadn't done theoretical math, but I would like to believe it had some practical benefit in my current existence, where I'm basically doing software engineering and not doing any pen-and-paper math.
So I recognize some of that. I did operations research, so definitely a different field of math. But the thing with operations research is you gotta prove that your mathematical allocation is optimal. If you can't prove it's optimal, it's wrong, basically.
That's a thing the professor really drilled into me hard. And something that comes with that is a critical attitude. Like, I'm not gonna believe anything anyone says. I want to see the full proof.
And maybe it's not so much the theory, but more the attitude that comes with the science, I like to think. But it also makes you good at debugging. I'm hypothesizing a bit, but I can definitely imagine, Wes, that that's a thing that's stuck with you. This kind of reminds me of a paper — or at least the title of a paper.
It's called Better to Be Frustrated than Bored — about affective states students might have, and which ones are better for learning. And I do think what you described — you know, writing it out isn't necessarily boredom, but frustration, responding to something and wanting to prove something or lock it down — that does strike me as kind of a key. I have a kid now, so I can tell you the story. My theory right now is that kids learn to speak because they're frustrated, because they have an idea in their mind they can't tell their parents.
And there's this god-awful phase, with a lot of screaming in the household, where the kid is making absolutely sure that something is highly amiss. But eventually the kid learns: oh, I gotta learn the language, and that's how I can properly communicate with my parents. There's a little frustration thing there that I do think is key. Hadley, bringing it back to you. Yeah.
So I know you've brought up things — I've seen, I think in the Posit Slack, you shared an article that mentions, like, you wouldn't bring a forklift to the gym. And I'm curious about your thoughts on this topic of people putting in effort and learning. How do you see it? Yeah. I mean, definitely I think the one thing that's really interesting is that many of us now have the opportunity to automate parts of our job that we loved doing.
And you can automate that, and you can find joy in different parts of your job, but you also don't necessarily have to. Like, you can keep doing those bits. Maybe not all the time, but tactically. You don't have to just optimize for velocity all the time.
But then the other thing I think about is, like, why does anyone learn to play a musical instrument? You can find someone on YouTube or Spotify or Apple Music or whatever doing a way better job of playing any music than you. There's also that, you know, some of these skills we gain joy from regardless of whether we're the best in the world or not. And I think, like, some of that — how do we, I don't know, teach this stuff?
It's kinda coming back to: when you do math in high school, most of that you're never gonna use. But you're training your brain in this kinda cool problem-solving way that you can get a lot of enjoyment from, independent of the strict utility. I mean — and again, if I can come back to a notebook — one thing that I really like about notebooks is that a notebook is a nice place where you can put your thoughts as well. There's something about a notebook that's able to do that way more than a code base can, as an artifact, so to say.
It definitely does feel a little bit more like a thing you would write down in your own little diary, as opposed to, like, a theoretical book in the closet somewhere. It can still be very personal, and you can also just research something that you're interested in, as opposed to something for your work. I will say, with LLMs, though — this is a habit I have: even if I generate notebooks kind of from scratch and try to go super big, you can still vibe code defensively, if that makes sense, in a notebook. I have a video on this on the Marimo channel, if people are interested.
But one thing I really love to do now, if I make charts — the use case here is I wanted to find out if buying a bread machine was gonna save the family money. It's a really silly use case, but you can imagine there's this one line going up that says, how much money do I spend going to the bakery? And there's this other line going up, a little bit flatter, that says, okay, I only pay for the ingredients, plus the upfront cost of the bread maker.
Those two lines intersect somewhere. Except I'm a Bayesian, so it's not a breakeven point. It's a breakeven distribution, because both lines wiggle around a bit. There's a bit of uncertainty.
We gotta take that into account as well. Right? So I was interested in the breakeven-point distribution. But one thing I explicitly told Claude to do — and it got it wrong a few times, and I kept hammering on it — those two charts need to have the same x-axis, and they have to be on top of each other, because otherwise I cannot glance at both distributions at the same time.
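The breakeven-distribution idea can be sketched with a quick Monte Carlo. All the numbers here — loaf prices, spreads, machine cost — are invented for illustration, not Vincent's actual figures:

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 10_000

# Assumed costs: bakery loaf ~4.00, home-baked ingredients ~1.20,
# machine 120 up front, three loaves a week. Both per-loaf costs are
# uncertain, so we sample them instead of fixing them.
loaves_per_week = 3
bakery_price = rng.normal(4.00, 0.40, size=n_sims)
ingredient_cost = rng.normal(1.20, 0.20, size=n_sims)
machine_cost = 120.0

# Each simulated world has its own weekly savings, hence its own week
# where cumulative savings cover the machine. The result is not one
# breakeven point but a whole breakeven distribution.
weekly_savings = loaves_per_week * (bakery_price - ingredient_cost)
breakeven_week = machine_cost / weekly_savings

lo, med, hi = np.percentile(breakeven_week, [5, 50, 95])
print(f"breakeven after ~{med:.0f} weeks (90% interval: {lo:.0f}-{hi:.0f})")
```

Plotting `breakeven_week` as a histogram under the two cost lines — same x-axis, stacked — is exactly the defensive-charting layout he insists on.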
And when Claude was done, I noticed it was way off, because the distribution at the bottom did not fit the chart on top. You could just draw straight lines and see it was really wrong. But the only reason I was able to find that bug was because I was coding defensively, quote unquote.
Those two charts have to be under each other, and then there have to be all sorts of sliders so I can wiggle things around to check my intuition. But I am very conscious of the fact that I have to code defensively in a notebook these days, especially if I go for more than two cells that Claude can just go ahead and rip through. And that's also, again, still a bit of a mental exercise. You gotta think about that upfront. Even if you're not doing the syntax: how am I gonna model this?
What do I care about? Figuring that out — yeah, don't skip that. I think the visualization angle is super interesting, though, because that's a completely different lens to look at your effort through.
And whenever you look at things through a different lens, you're more likely to find problems. I've been sort of wondering about that in the era of reviewing LLM-generated code. Like, sure, you probably don't wanna read every single line of code the LLM has written. That's not super valuable.
But it feels like there's room for tools where you can see, okay, the code changed this way — there's now a new big blob of code here. So you can be like, well, actually, structurally, that doesn't feel right to me.
Like, I expected the blob to be over here, or to be spread out this way. So it feels like maybe there's scope for cool new visualization tools to help us understand what LLMs are doing to our code, in the same way that looking at pictures of data is so useful — if we can figure out new abstractions for visualizing code, we can better understand the big-picture implication of a change. Another mental exercise that I do recommend people do — and this is an old-school thing people never really did.
You know how you win a Kaggle competition? By checking the training dataset to see if there are any mislabeled items in there. It's a little thing that tends to go wrong. There are a lot of Kaggle competitions with bad labels.
And, again, how do you find out? Well, just look at the dataset for a bit. It's one of these steps that people sometimes miss. And visualizations can make that a whole lot simpler.
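One cheap version of that mislabel check is to score every training row by how much an out-of-fold model disagrees with its given label. The synthetic data, flip pattern, and cutoff below are all made up to show the mechanic — it is not the podcast's own recipe:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Toy stand-in for a noisy Kaggle training set: flip every 50th label.
X, y = make_classification(n_samples=500, n_informative=5,
                           class_sep=2.0, flip_y=0.0, random_state=0)
y_noisy = y.copy()
flipped = np.arange(0, 500, 50)
y_noisy[flipped] = 1 - y_noisy[flipped]

# Out-of-fold predicted probabilities: each row is scored by a model
# that never saw it during training.
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y_noisy,
                          cv=5, method="predict_proba")

# Confidence the model assigns to the *given* label; the rows where
# this is lowest are the ones worth eyeballing by hand.
confidence_in_label = proba[np.arange(len(y_noisy)), y_noisy]
suspects = np.argsort(confidence_in_label)[:20]
print(sorted(suspects.tolist()))
```

On this toy data most of the deliberately flipped rows land in the suspect list; on real data you would plot or inspect those rows rather than auto-relabel them.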
Maybe you do something with an embedding, and then the sentiment curves — you know, there's all sorts of fun stuff you could do. But, again, there are all these exercises where, if you go just a little bit slower, if you give yourself permission to do this one little check — oh boy, does that really help you understand stuff. And, again, human in the loop. Even if you're a business — there's this dream now that you can have the one-man billion-dollar startup with LLMs and stuff.
Right? But something is telling me, if you're the one man, and the LLM does everything, and you have no idea what the LLM is doing — like, really? A one-billion-dollar company? Really?
I don't know. There's still something about — there's gotta be some human around who still understands what's happening. I think that's still kind of fine. The joke I always make is: natural intelligence is also still a pretty good idea.
Yeah. Wes, I feel like there was a reaction from you. Do you think a person will vibe code a billion-dollar company? If they don't get too distracted by shiny objects along the way — that seems to be the thing that's happening.
Like, I mean, I've been observing things like Gas Town and these things from the Steve Yegge-verse. And honestly, my reaction — and this is a little bit of a hot take — is: are any of these people building anything useful? Show me something that I can use that's useful. I mean, I see that there's a lot of work being done, a lot of tokens being burned, but I'm not seeing the useful output.
I wrote a blog post called The Mythical Agent Month that's been circulating — it came out a couple weeks ago. The idea is this: The Mythical Man-Month is a book that came out in nineteen seventy-five, so over fifty years ago. And obviously, engineering and computers and the Internet — a lot has happened in the last fifty years. But Fred Brooks made a bunch of claims, and essentially the core idea is that when you add people to a late project, it will become later, and that the larger a team is, the more communication and coordination overhead you have.
And so now, with agents doing the development, you more or less have an agentic version of the mythical man-month: you're the chief surgeon, so you have to have a conceptual model of how the system works and what the agents are supposed to do, but you also have to review and supervise their work, make sure they're doing the right thing and aren't creating giant messes for each other to clean up. And what I'm seeing is that people are creating these half-million-line, million-line software projects, but they're just gigantic hairballs. At the end of the day, it could have been fifty thousand lines of code and a great deal simpler — more resource efficient, a lot more useful, and without lots of unnecessary features, because there's nobody to say no.
You can add a hundred additional features — every day you wake up, and over your morning coffee you're like, oh, can you add these five new features I thought of while I was in the shower? And the agent's never going to say no. And the agents won't refactor the code and make it cleaner unless you tell them to. So essentially, your judgment and your taste as a designer of these software systems have become the single most important skill in your tool belt.
You have to be extremely judicious about what you build and what you don't build, and proactively hold the agents accountable to build in accordance with your vision — and if they create messes, spot the messes and clean them up. Essentially, it's created a new type of industrial process for building software. And there are a lot of people who are just kind of YOLO-ing it with their agent sessions: they're asking a thousand prompts in a row, requesting one feature after another, spotting bugs and asking the agents to fix them, but with little regard for the types of code bases they're generating.
And I think people who are engaging in that mode are in for a world of pain at some point in the future, when they hit the inevitable scalability wall — when the performance of their agents stops being good, and every time they touch any file they're like, oh my god, my context, the pain, it burns. So yeah, it's an interesting problem. Maybe there'll be magical pills from Anthropic and OpenAI to make this pain go away, to bring agentic supervisors — auto software engineering, you know, auto data science.
So that you can just ask for what you want and let the software process be conducted by, you know, an agent swarm or whatever they're calling it these days. But, yeah, it's not quite there yet. It's also — I mean, if I think about scikit-lego, right, that's a thing that I made. And I guess, sure.
Let's say scikit-lego could have been vibe coded immediately. We didn't need Vincent to write that stuff on his laptop. We could have done it way more efficiently. Okay.
Great. But the reason scikit-lego is popular is that I went on stage at a couple of events and started talking about it, started spreading the message. And maybe, taking a step back: the code is not all of it. It's also, can you do the marketing around it?
It's also about: can I demonstrate to other people that I've got taste? Can I convince them that even if I'm vibe coding, you are gonna get value out of it if you use my thing, as opposed to some other dude who is also vibe coding but maybe just wants to yell at it? People sometimes forget there's a marketing aspect to it, and that's also a constraint: how fast can you convince people that your thing is good? Maybe that's more the constraint — and also why I figured going into DevRel is actually a more useful skill than, oh, can we code it?
Yeah, I completely agree. Like, if I put out something — even if I built it with agents — or if Hadley put something out built with agents, people are probably more likely to take it seriously, because they'll say, oh, well, these are people who built things that weren't slop in the past. And, you know, I've put a couple things on GitHub that I would describe as AI slop.
Like, it's my slop, and it works, you know. But most of the projects I'm building are deliberately not slop. And I don't wanna be known as a purveyor of AI slop on the internet. That's not very, you know, brand aligned.
I will say, on the topic of, okay, is there something more awesome now with these LLMs — there's this feeling people have, like, where's the better app that we didn't have before? In my mind, and we've mentioned it before in this episode, there is this one exception that stands above everything, like a giant pillar, and that's widgets. You can now actually comfortably say: for this one analysis, I need this one widget.
You download the Marimo skills that we've got, and you can just generate the widget on the fly. There's this one math challenge on Veritasium — I'll send you a link. The whole math problem becomes a whole lot simpler if you can just have the D3 graph chart appear.
That was done in five minutes with Claude, and it's part of the notebook, and it becomes so much easier to understand this one math problem. I like Tectonic and Suguru — those are Sudoku-ish kinds of puzzles. Oh, I have a widget now that lets me generate them on the fly, and then have all sorts of — okay.
The fact that that's normal is just such a creative freedom. Right? That was just not imaginable a few years ago, but people are sleeping on this. For God's sakes, I still don't get why I'm the only guy so excited about these widgets, but there's so much intellectual freedom to gain here.
I love that it all comes back to widgets, you know? Like, we started at widgets, and — yeah. Honestly, we've covered so much incredible stuff, and I feel like I've got so much to think about. I feel existential about my role in the loop in coding, but now I also feel like we've learned so much about notebooks and widgets.
Yeah. I really appreciate your approach of wanting to look at charts, to think deeply about the problems, and to actually be sure that you're learning and processing problems as you go. And for listeners, I definitely think the idea of Marimo, and trying, like, three cells at a time even if you're working with Claude — this exercise of being in a notebook, building things gradually — seems super powerful. And I don't doubt that widgets also have a huge role in that, as an output that you can manipulate or interact with.
It makes my thoughts a bit more interactive. There are some things where a static chart is just fine, but there are also these moments where that interaction is able to surprise me in a way that a static chart on its own isn't able to. I forget — there was this thing in R, and this is super long ago, Hadley. There was ggplot2, and then there was this thing that you were building on top of — Yeah, wasn't that, like, the interactive canvas thing built on top of dplyr, like, ten years ago-ish? Yeah.
It's like ggvis. ggvis was kind of another super-interactive thing from a different era. That was actually on my mind a couple of weeks ago. We now have the ability to actually select points in a Matplotlib chart. I saw that.
Yeah. Brushing. Yeah. And that's another one of these examples where — oh, that just opens up so many things that you can play with now, just having a few widgets like that.
Yeah. That's wild. Yeah. Vincent, thanks so much for coming on The Test Set.
I'm also gonna flag: you mentioned the gorilla dataset. So for people listening, I think I found your article, OpenAI versus the Gorilla Dataset, so I definitely recommend people check it out. And for the people listening: check out anywidget, and know that on the Marimo side there's a Marimo skills repository. You download that, and then Claude Code can generate these things on the fly for your Marimo notebook.
Give that thing a spin. Have Trevor Manz on the show — he's the guy that made it. He's a colleague of mine.
I'm a big Trevor fan, so I feel like it's inevitable. Same. I think I'm on record: besides Trevor, I'm the guy who made the most widgets, and I might have beaten Trevor at this point. That's a sweet spot to hold, I feel like, you know.
It is. Oh, it really is. If people are interested in widgets, by the way, there's a library called wigglystuff that has about forty-eight widgets at this point, I think, that people can directly use. Yeah. Nice.
Awesome. Well, thanks — yeah, thank you so much for coming on. I know it's late in the Netherlands, so I appreciate you burning the midnight oil for us. My pleasure. It's been fun to hang out with you folks. Yeah. We'll see you on the Internet.
Thanks for coming on. Appreciate it. Thanks for having me. The Test Set is a production of Posit PBC, an open source and enterprise data science software company. This episode was produced in collaboration with creative studio Agi.
For more episodes, visit thetestset.co or find us on your favorite podcast platform.