The Test Set by Posit
A Posit podcast for data science junkies, anomaly hunters, and those who play outside the confidence interval. Hosted by Michael Chow, with co-hosts Wes McKinney & Hadley Wickham.
The Code Doesn't Lie — with Mike Bostock
Mike Bostock made D3 when the browser was still a joke. He built bl.ocks when people needed somewhere to share their work. Now he's building Observable — reactive notebooks with an AI that actually looks at what it made. In this episode: the three-GIF bar chart that launched 25 years of viz, why open source needs both intrinsic and extrinsic motivation, and why an agent that can't see its own output is likely to be confidently wrong.
What's Inside
- The 1998 visualization library that could only make bar charts
- Why D3 hit #3 on GitHub, and what killed the gallery
- What spreadsheets got right that notebooks ignored for years
- "The agent can lie with text, but not with code"
- Why Observable scrapped canvases and went back to notebooks
- The penguin dataset that exposes AI
- Strength training, tennis mind games, and a resurrected Stanford game
Welcome to The Test Set.
Here we talk with some of the brightest thinkers and
tinkerers in statistical analysis, scientific computing,
and machine learning.
Digging into what makes them tick, plus the insights,
experiments, and OMG moments that shape the field.
In this episode, we talk with Mike Bostock,
creator of the visualization library D3,
one of the top three starred GitHub repos through much of the 2010s.
He was graphics editor at the New York Times,
and he founded Observable, whose reactive notebooks handle the
whole viz process from end to end.
We talk about his journey into visualization,
which was heavily influenced by his early time on the Google
search quality team,
where feature decisions often came down to a single number
that was heavily debated.
And how notebooks are sort of an attempt to crack open all
the computation that goes into visualization.
And of course, we end talking about AI agents and the future of notebooks.
I'm so excited to bring this interview to folks.
And so with that, Mike Bostock.
Alright. Mike, welcome to The Test Set.
We're so excited to have you on.
So you're Mike Bostock,
creator of D3,
and you were the graphics editor at The New York Times,
and now a founder at Observable,
where you build powerful tools for visualizing data through
code, UI, and AI.
Yeah. Thank you so much for coming on.
It's my pleasure. It's great to be here.
Yeah.
And I'm joined by cohosts, Hadley Wickham,
who's chief scientist at Posit, and Isabel,
who's just an incredible software engineer at Posit,
and graciously agreed to come.
Yeah.
As people can see, especially the people listening via
podcast, we're in beautiful San Francisco.
If you can't see the video,
we're actually on the Golden Gate Bridge,
and we're surrounded by birds.
It's a beautiful day, so we're so happy to have you.
Mike, I feel like you've done so much.
And as I talk to colleagues about D3,
I feel like there's this incredible history of D3.
So I'd love to talk a bit about how you got there and
built D3.
And then I know there's a lot you've talked about on open
source and AI and your work with Observable.
I was curious if you could just catch us up a bit on
some of the history of D3 and your work.
Sure.
I do like to build tools,
and I have been doing it for a while.
And I think, you know,
a lot of my tools, or ideas for tools, I guess,
come out of frustrations using existing tools,
or in some ways a desire to understand kind of how those
existing tools are built or why they were built the way they were built.
I mean, for D3 specifically,
I was doing a lot of work in browsers, you know,
in SVG using JavaScript, using the DOM API.
And in particular, you know, the DOM API is very verbose,
and if you're doing stuff with SVG,
there's this, like, namespace URL that you have to remember to
create an SVG element or to set, you know,
the xlink:href attribute and stuff like that.
And so it was just difficult to remember this specific URL.
I mean, I remember it now because I've repeated it so many times.
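For readers who have not run into this, the verbosity described here looks roughly like the sketch below. The two namespace URLs are the standard SVG and XLink ones. The helper name createSvgImage and the DocLike and ElementLike types are made up for illustration, so that the sketch also compiles outside a browser; in a browser you would pass in the global document.

```typescript
// The two namespace URLs that raw DOM work with SVG depends on.
const SVG_NS = "http://www.w3.org/2000/svg";
const XLINK_NS = "http://www.w3.org/1999/xlink";

// Minimal structural types (assumptions for this sketch) covering only the
// DOM methods used below.
interface ElementLike {
  setAttribute(name: string, value: string): void;
  setAttributeNS(namespace: string | null, name: string, value: string): void;
}
interface DocLike {
  createElementNS(namespace: string, qualifiedName: string): ElementLike;
}

// Hypothetical helper: create an SVG <image> element that points at `href`.
function createSvgImage(doc: DocLike, href: string): ElementLike {
  // SVG elements must be created in the SVG namespace, not with createElement.
  const image = doc.createElementNS(SVG_NS, "image");
  // The xlink:href attribute lives in a second namespace of its own.
  image.setAttributeNS(XLINK_NS, "xlink:href", href);
  image.setAttribute("width", "100");
  image.setAttribute("height", "100");
  return image;
}
```

Every element and every linked attribute needs one of those URLs spelled exactly right, which is the tedium a wrapper library can hide.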
And just to zoom way out, with D3, you're making, like,
beautiful visualizations in SVG.
Is that...
Yeah.
So the goal with D3 I mean,
and there are other libraries that I worked on that kinda predate that.
But I think D3 specifically was focused on
kind of interaction and animation and transitions and performance.
And I think in many ways, like,
a lot of its success came out of being in the right place at the
right time kind of thing.
So it really builds on web standards.
So these standards exist already or existed already,
SVG and Canvas, and of course the DOM API as I mentioned.
But in many ways they were very tedious to use.
And so the goal for D3 was to build on those technologies,
like let you leverage all of those capabilities,
but make it much easier for you to get started,
make it much more, you know, performant,
or I guess, like, maintain those performance capabilities,
but make it easier for you to use it.
There's also an element to D3 that's about kind of all
of the visualization techniques,
and just kind of packaging those up in a reusable way.
So like the treemap, the squarified treemap algorithm, for example.
Like that's fairly tedious to write yourself,
like to read the paper and to implement it yourself.
And there were some existing implementations of that that predate D3.
But what I wanted to do with D3 is to try to think about, like,
what is the kind of purest encapsulation of that algorithm
in a way that's independent of how you display it.
So, like, the layout algorithms in D3 are all,
like, data space.
They're just like data in, data out.
They don't dictate how you display it,
like whether you're using SVG or Canvas or WebGL or anything
else or even like React, whatever you wanna do.
So I try to, like, decompose it into these
composable pieces,
and I think that helps contribute to its longevity.
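As a rough illustration of "data in, data out," here is a minimal sketch in TypeScript. It is not D3's API and it is not the squarified treemap algorithm; sliceLayout is a deliberately simple stand-in, and all of the names are invented for this example. The point is only that the layout returns plain rectangles and knows nothing about how they get drawn.

```typescript
interface Rect { x0: number; y0: number; x1: number; y1: number; }

// A deliberately simple "slice" layout (NOT the squarified algorithm):
// split a width x height rectangle into side-by-side strips whose widths
// are proportional to the input values. Plain data in, plain data out.
function sliceLayout(values: number[], width: number, height: number): Rect[] {
  const total = values.reduce((sum, v) => sum + v, 0);
  let x = 0;
  return values.map((v) => {
    const w = total > 0 ? (v / total) * width : 0;
    const rect: Rect = { x0: x, y0: 0, x1: x + w, y1: height };
    x += w;
    return rect;
  });
}

// One possible renderer: SVG markup as a string.
function toSvg(rects: Rect[]): string {
  return rects
    .map((r) => `<rect x="${r.x0}" y="${r.y0}" width="${r.x1 - r.x0}" height="${r.y1 - r.y0}"/>`)
    .join("");
}

// A second renderer for the same rectangles: canvas-style draw commands.
function toCanvasCalls(rects: Rect[]): string[] {
  return rects.map((r) => `fillRect(${r.x0}, ${r.y0}, ${r.x1 - r.x0}, ${r.y1 - r.y0})`);
}
```

Calling sliceLayout([1, 1, 2], 400, 100) yields strips of width 100, 100, and 200, and either renderer (or any other) can consume that same output.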
I mean, maybe you had the same experience as me, like,
reading these viz papers, there would be a cool visualization.
Yeah.
And they provide software,
but the software does that visualization,
and like nothing else, and then you're like, well,
I'd like to combine it with this other visualization.
Yep.
And there's just, like, no connection.
And they're really fun to work on as well,
like the circle packing algorithm,
I think has been the most fun one that I worked on.
And they have these like fun diagrams that show how they
work, how they kind of build out these layouts progressively.
And you can kind of work on them and you get these really
satisfying animations as it's iterating over the layout.
And yeah, like, packaging them up so that it's easy for people to reuse,
so that it's not tied to, like, any one implementation.
An artifact that's in a paper has all of these kind of
somewhat arbitrary choices of what language it's in,
or what other kind of parameters or UI that's around it.
And so it is fun as a kind of a software engineering puzzle to
think about, like, what is the most reusable version of this implementation.
And in a way, like, that is kind of the art of open source.
It's like, how do you take a complex problem and kind of pare it
down to its essence so that people can then use your
solution or use your tool in as many cases as possible.
You talked about this a little bit in, like,
your lessons from, like, ten years of open source too,
but there's also that, like, this is, like, a hard problem,
and I've solved it, and it feels very satisfying,
and then sometimes you're like, well,
like, was this actually a problem that needed to be solved?
Or, like, does this actually, like, move the needle?
Like, do you like, how do you like, some of the, like,
there's definitely the aspect of, like,
this is a fun challenge and I've, like, nailed it, and,
like, oh, I've done something useful, and those are, like,
sometimes totally different.
Yeah.
I mean, I think you wanna have both your intrinsic motivation and
your extrinsic motivation.
So I think it's definitely fun to work on stuff that you just enjoy.
Like, if there's an interesting puzzle for you, like,
implementing this algorithm, like, then great.
Like, just do that.
And I think a lot of that's kind of the beauty of open
source is that you can work on a lot of these things that
you're just intrinsically motivated to work on.
And then if other people find it useful,
now you have an extrinsic motivation to, like,
continue to develop it, right,
because you've got this community,
you've got this user base,
they're excited about it as well,
and they're giving you ideas of what it can do next or ways
that you can make it easier to use.
So that's good, you know, validation.
I'm really curious, in terms of the open source,
can you sort of, like, set the stage?
Because I know you did, like, Protovis, I think, before D3.
Yeah.
Like, what was it like, basically?
How did you get into viz?
What was it like sort of starting with Protovis and open source?
What was that experience like?
Sure. Well, getting into viz happened much earlier.
So I think the first time I really got interested in
visualization in a professional capacity,
I was working at Google and I was in the search quality
evaluation team.
And the evals team, their job is to kind of take all of these
experiments that the search quality team is doing,
like these potential changes to ranking,
or like new signals that can be incorporated,
and try to empirically assess, you know,
what the impact will be,
like is this improving quality or not.
And at the time, you know,
the main outcome of these evals was a number, you know,
a number between one and five, and the idea is like,
if your experiment scored like 4.1 or higher,
you know, you got to launch, and if it was lower than that,
then you had to go back and kinda make some improvements.
And the problem was, you know,
as is always the case,
like there's human opinions and personalities involved,
and every time you're working on some change to ranking or whatever,
like, there's certain things that you're trying to do,
and maybe other things that you're not thinking about so much.
And I found that a lot of the discussions around whether we
should launch something or not could devolve into, like,
debating whether the eval metric was good, like,
or whether it didn't kind of sufficiently capture the unique
nature of this experiment.
This one number.
Yeah. Yeah.
You're just arguing over a number, and it's like, well, this, you know,
the argument was more about the evals being flawed than the
kind of the nature of the experiment.
So what I was interested in is like,
how can we surface all of this information that we're
collecting about these experiments?
So some of it was, you know,
kind of A/B, whatever, you can do experiments where
you actually launch it and you see what the effect is on real
human users,
but we would also do stuff with human raters where we have sort
of test sets or query test sets,
and we would kind of ask them like is this improving the
quality, like is this a better result for this query,
is this a better result set for this query.
And so we had all of this information,
and by surfacing that information partly through
visualization, I think we helped the engineers better
understand what the impact of their changes was,
and then they could spend more time actually debating that,
right, and highlight maybe some of the unexpected
consequences of their experiments.
And this is, like, 2003-ish?
Yeah.
So I think 2005, 2006,
something like that.
In this case, are you using, like, JavaScript?
It was Java, actually.
Okay.
Although I did start getting into some web-based stuff.
I think one of the first things I did was using XSLT,
which is really gonna date me, but there was this cool, like,
XML transformation style sheet kind of thing,
and you could yeah,
I guess we would generate like XML outputs from the data and
then like transform them into a web page,
and so it was another way of doing it.
But yeah, I think some of the first visualizations that I did were
actually in Java.
And yeah, I didn't get into kind of the JavaScript... well,
actually, no, I should correct myself.
In 1999, I was, like,
I did a summer internship at Netscape,
and I wrote for their developer website.
So actually, not even 1999,
I think it's actually, like, 1998 or
something like that.
I wrote my first JavaScript visualization library,
like pre-Canvas, pre-SVG, pre-everything.
It could only, like, render a table element,
and it had to use, like,
an invisible one-by-one... it actually had three one-by-one
GIFs.
There was like a a transparent one, a blue one, and a red one,
and you just did like horrible things using just the HTML
table element, and it could produce, I think,
only a bar chart.
Are you saying it's, like, a grid?
It just made a grid and now it's filling it in with
red and blue using, like...
Yeah.
It was very difficult to do graphics in the browser
at that time.
Yeah.
But you could kind of approximate
exact pixel positioning by using the table element.
Yeah.
And so I did terrible things in order to produce...
That's incredible.
...very rudimentary bar charts.
But yeah, I think that, you know,
I've always had an interest in helping people understand
data through visualization,
and the technology was not really there at the time, but,
you know, twenty years later or something,
it's really quite impressive.
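To give a feel for the table-and-GIF trick, here is a loose sketch in TypeScript. It is a guess at the general technique, not a reconstruction of the original library, which is not shown anywhere in this conversation. The file names clear.gif, blue.gif, and red.gif, the function name tableBarChart, and the alternating colors are all invented for illustration.

```typescript
// Hypothetical file names for the three one-by-one GIFs.
const PIXEL = { transparent: "clear.gif", blue: "blue.gif", red: "red.gif" } as const;

// Build an HTML table with one bar per cell. Each bar is a transparent
// spacer stacked on a colored pixel; both one-by-one images are stretched
// with width/height attributes so the bar's top edge lands on the right row.
function tableBarChart(values: number[], chartHeight = 100, barWidth = 20): string {
  const max = values.reduce((m, v) => Math.max(m, v), 0);
  const cells = values.map((v, i) => {
    const bar = max > 0 ? Math.round((Math.max(v, 0) / max) * chartHeight) : 0;
    const gap = chartHeight - bar;
    const color = i % 2 === 0 ? PIXEL.blue : PIXEL.red; // alternate colors per bar
    return (
      `<td valign="bottom">` +
      `<img src="${PIXEL.transparent}" width="${barWidth}" height="${gap}"><br>` +
      `<img src="${color}" width="${barWidth}" height="${bar}">` +
      `</td>`
    );
  });
  return `<table cellspacing="2" cellpadding="0" border="0"><tr>${cells.join("")}</tr></table>`;
}
```

The output is just markup: no canvas, no SVG, only a table whose cells happen to contain stretched single-pixel images.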
Yeah. It's so interesting.
I mean, you even described, like, SVG,
even a little while later, being tricky to use too, but...
Yeah.
...a lot less tricky than a table with, like,
three one-by-one images in it.
For sure.
And I think, you know,
it's not in a sense like a flaw in the design.
I think, like, it's really helpful to have these
kind of formal and very precise APIs
that you can then build whatever you want on top of it.
So, you know, D3 is obviously a very opinionated
way, and a specialized way of constructing the DOM.
And so it's not in a sense like better than the DOM API,
it's just it's much more specialized,
focused on how do you transform the DOM to conform to a
particular dataset.
How do you describe transformations to the DOM?
And so I think, you know, my interest, I guess,
is more in... I guess it's design, right?
It's like how do you create software interfaces that are
more accessible to humans,
that they can understand how to use them effectively and use
them efficiently.
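One way to picture "transform the DOM to conform to a particular dataset" is the sketch below, in TypeScript. D3's documentation uses the terms enter, update, and exit for this kind of join, but the code here is not D3's implementation or API; joinByKey and its types are invented for illustration, and the "nodes" are whatever objects stand in for existing elements.

```typescript
interface JoinResult<D, N> {
  enter: D[];                            // data with no matching node yet
  update: Array<{ node: N; datum: D }>;  // existing nodes paired with new data
  exit: N[];                             // nodes whose data has gone away
}

// Hypothetical helper: compare the nodes currently on screen (indexed by key)
// with the new data, and report what must be added, updated, or removed.
function joinByKey<D, N>(
  nodes: Map<string, N>,
  data: D[],
  key: (d: D) => string,
): JoinResult<D, N> {
  const result: JoinResult<D, N> = { enter: [], update: [], exit: [] };
  const seen = new Set<string>();
  for (const datum of data) {
    const k = key(datum);
    seen.add(k);
    if (nodes.has(k)) result.update.push({ node: nodes.get(k) as N, datum });
    else result.enter.push(datum);
  }
  nodes.forEach((node, k) => {
    if (!seen.has(k)) result.exit.push(node);
  });
  return result;
}
```

A caller would then create elements for enter, modify the ones in update, and remove the ones in exit, which is the "describe the transformation" part rather than the "rebuild everything" part.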
Yeah.
And going from, you know,
tables and then SVGs and all of this,
do you feel like there's something that you wish you had
learned way at the beginning that would have made this
entire experience better, or is it just the nature of changing software?
I mean, I'm sure I've learned lots of things along the way.
I think maybe one of the biggest lessons from D3 is
kind of the value of the ecosystem,
like the web standards, like I mentioned.
You know, this ability to interoperate and be
compatible with and use the tool in conjunction with all of
these other tools.
So when we wrote the D3 paper,
I think in some ways it was controversial because it was
coming after the Protovis paper,
and in a way it was rejecting a bunch of the ideas from
Protovis.
And I think one of the things that was heretical about it
was this idea that, you know,
you don't need a specialized representation for information visualization.
And instead it's better to pick kind of a bog standard,
you know, the DOM API, SVG,
something that's not specialized to visualization,
and just use that even though it's not kind of designed with
visualization in mind.
But you get so many advantages from interoperating
with browser technology, like being able to
use your style sheets,
or being able to use the browser's dev tools,
or being able to kind of integrate with React or other
like DOM APIs.
And so I think, you know, for the research community,
those kind of practical benefits of the technology
weren't as obvious,
and so it was somewhat heretical to come in there and say, like,
let's not try to build a specialized representation,
but just take this thing that exists already,
and focus on kind of the more practical benefits of, like,
how it gets put into practice.
Because people never use these tools in isolation.
They're always using them with all sorts of other things.
And it's, like, hard to remember now,
but this was also, like,
when JavaScript was kind of like a joke language.
It's still a joke language,
some people still think.
It was, like, slower.
You're saying compared to R.
Yeah.
But just, like, people didn't consider it a
real programming language, and, like,
it had all these performance issues.
But, like, you clearly sort of saw, like, hey,
the browser's gonna be really important for visualization.
Like, let's...
That's right.
Yeah.
...bet on that.
And obviously, like,
JavaScript has made tremendous strides in terms of its
performance and its capabilities,
and it's kind of nothing like it was, you know,
twenty years ago.
But I think from my side,
like it was always inevitable that that would happen because,
you know, it really is the interface that all human
beings are, like, using to interact with their
computers, with their displays of data.
Like, it's really hard to compete with the convenience of
delivery through the web, like, through browsers.
Yeah.
Like, I think an interesting... like,
I was just kinda thinking, like,
imagine a world where you were like, I love Java applets.
I'm gonna, like, invest in that.
But in some ways, that's a little bit like Processing,
which was kind of like around a similar time, like,
not for viz, like, more of, like, art,
like a system for programming art that was really all in on,
like, Java applets, I think.
And, you know, I think that's
still thriving today, but we've had to, like,
make a lot of, like,
technological leaps to kind of stay... or, like, D3.
I think there is something inherently beautiful about kind
of HTML and the DOM.
This idea that it is not kind of an opaque representation of
a graphical user interface,
but in fact is something that you can inspect and then even manipulate.
So there's all sorts of benefits
to the user from that.
I mean, like ad blockers is probably the most obvious one.
But like screen readers or other accessibility improvements,
and, like, it is just dramatically different than an
application that, for example,
just draws to like a pixel buffer or whatever,
that just has an array of RGB values effectively.
And I think that is key to its success.
Do you think it helps that, like,
you can go in and, like, inspect... like, you can go into a D3...
like, inspect it and be like, oh, okay,
now I understand, like, what's going on.
And, you know,
that's how I got into programming in the first place,
like, growing up and, like, seeing web pages.
I mean, you can't really do it these days because there's, like,
complicated megabytes.
It's minified, yeah.
But in the old days, you know,
you could go to a website and you could just, like,
click on the menu and view source, and they'd be like,
how did you do this?
You know, and it would be right there, kinda obvious.
You could inspect it, you could learn from it.
And so there are some challenges now with the
complexity of modern development, but even so,
like yeah, the ability to kind of inspect the SVG and kinda
see how it's structured,
like definitely gives you some some clues.
And then with D3 in particular,
I've enjoyed sometimes,
like, you bind the data to the elements.
So using the dev tools,
you can actually see the data structures that people are
using in their charts,
and that's kind of fun to understand how they work also.
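For anyone who wants to try this, D3 keeps each bound datum on the element itself under a property named __data__, so in the browser's dev tools console, with an element selected, typing $0.__data__ shows its datum. The sketch below, in TypeScript, mirrors that idea with plain objects; the Bound type and the boundData helper are invented for illustration and are not part of D3.

```typescript
// Shape of an element that may carry a bound datum under `__data__`
// (the property name D3 uses; the type itself is an assumption for this sketch).
interface Bound<D> {
  __data__?: D;
}

// Hypothetical helper: collect the data behind a chart from a list of its
// elements, skipping any element that has nothing bound to it.
function boundData<D>(elements: ArrayLike<Bound<D>>): D[] {
  const data: D[] = [];
  for (let i = 0; i < elements.length; i++) {
    const datum = elements[i].__data__;
    if (datum !== undefined) data.push(datum);
  }
  return data;
}
```

In a real page you might pass in the result of a querySelectorAll call (with a type cast, since the DOM typings do not know about __data__) to recover the dataset behind someone's chart.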
Yeah.
One thing I'm curious to hit on too is the
popularity of D3.
As I looked back, someone mentioned, I think,
D3 was like one of the top starred GitHub repos?
Yeah.
Is that...
It was number three at one point.
Okay.
After like Bootstrap and React,
I think was maybe the other one.
Yeah. But yeah, it was very popular.
And I think at one point I've heard that just a tremendous
amount of traffic was going to the D3 wiki.
So I think one of the things
that helped D3 grow as a community was the fact that all
the documentation was just a
publicly editable wiki on GitHub.
Oh, wow.
Yeah.
So anybody could go in there and, like,
people contributed translations to all different languages,
as well as kind of like fixing typos and stuff like that.
But I think the thing that had the most kind of momentum
around it was the gallery.
So it started with just some thumbnails of examples that I
had made, often ones that I imported from Protovis,
but there was this kind of rite of passage in the D3
community where people would learn D3,
and maybe the first or second or whatever visualization that
they made, they would take a screenshot of it,
and they would add it to the gallery.
And so very quickly,
there were like a thousand examples that various people
had contributed, and you just went there,
and it was this overwhelming display of kinda all the
different cool things that people were building.
A lot of them were like animated GIFs as well,
so you would see things moving around.
Unfortunately, then, once it became popular, you know,
spammers started putting links in there,
like malware and stuff, and it just became unmanageable,
so I had to take that down.
Yeah.
Was that a precursor? Because I know you had bl.ocks, is that right?
Yeah.
Could you explain a little bit about what
bl.ocks...
Okay.
So I was making all these examples to help people learn D3.
And very often, like, when people would ask questions,
so we had the Google Groups mailing list,
which was the primary way that people would ask questions.
I think now it's more on GitHub discussions and that sort of thing.
But people would ask a question, I'd be like, okay,
I'm gonna make an example for you that shows you how to do this.
And it just kind of became difficult to manage all of
those examples.
And so I started to build some machinery around making it
easier for me to manage those examples,
because I didn't want to kind of keep uploading them to a website.
Even just naming, like,
the folder name started to become difficult, difficult to
come up with a good name for these things when you have like
a thousand examples.
So I started using GitHub Gist,
which is just like a very lightweight git repo that has
just a randomly generated giant hex string as its name.
And then you could put an index.html file in there,
but what I needed was just a way to, like, actually render
that HTML page.
And so bl.ocks, which is bl.ocks.org,
is just a viewer for GitHub Gists.
And so you could then have a URL, you give it a Gist ID,
and people could go look at that example,
and it would show you the source code below.
And so that kind of became the primary way that I started
sharing examples.
And of course, since it was on GitHub Gists,
anybody else could start sharing their examples as well.
You remember it was like bl.ocks,
and then it was like also bos...
Yeah.
So bost.ocks.
Yeah.
So I bought the ocks.org domain name just because it was
nice and short...
Yeah.
...and then I can put whatever...
Yeah.
...other domains on it or host on it.
Suffix. Yeah. It's such an interesting one.
I know you talked about this in your ten years of open source. Like,
I'm curious about your thoughts on, like,
how, yeah, how do you think, like,
how do you approach documentation?
Well, that feedback loop is, I think, the most exciting,
most valuable, most powerful thing about open source,
where you're putting something out there,
and then you get to see how people use it.
And they're asking you, like, how do I do this,
or how do I do that?
Or they're complaining about something, or whatever,
or they're giving you ideas, and that, you know,
being in that feedback loop,
getting all of those new ideas is what really helps you
advance the tool, like make the tool better,
because you understand the problems that people are running into,
and the ideas that they have of other stuff that you can add,
and that other people can then, you know,
benefit from once you've incorporated it into the tool.
And so documentation is part of that,
but I think examples have been another huge part of that.
And in many ways I think people view the D3 examples as kind
of more of what D3 is than D3 itself, if that makes sense.
I think there was something that I read once that talked
about all the different chart types that were supported in
D3, and I was like, no, you've got it all wrong,
like there aren't chart types in D3.
But what they're thinking of is the examples that you provide, right?
And they can just be copied and adapted with new data
that's being put into it or whatever.
And so I think the examples, as I've written before, like,
they serve multiple purposes.
And sometimes it's kind of giving you example code,
like helping you get started with something,
showing you how to use something.
Sometimes it's like inspirational, where you're
just showing what's possible,
kind of the breadth of possibilities in a particular tool,
and kind of getting people excited about what they could
be doing with it.
Yeah. Yeah. You gave a talk about examples.
Yeah.
I found that, like, that was, like,
really influential for me and, like, guided, I think,
a lot of the R package docs.
Yeah.
I agree.
Like, people don't really wanna read the docs.
Right.
They just wanna scroll,
find something that looks like what they want and, like, modify it.
Yeah.
I mean, especially if you're making a visualization tool, you know,
you have to show people the visualizations to get them
excited about its capabilities.
So that's always been front and center in what I've done.
I think the interesting question now with AI is
whether that demand for examples will still be there in a sense,
or like whether people will ask the agents essentially to
construct the example that they want at the time that they want it.
So one of the things that people struggle with with the examples,
like if there's an example that's exactly what you want, then great.
But, you know, the whole idea of D3 is that it can be really
expressive and do all sorts of things.
So you typically don't want just that example and use it
off the shelf, like maybe you like this example,
and then there's another technique from another example
and another and so on,
and you kinda wanna stitch them together to build the thing
that's unique to you.
It can be hard to do that,
particularly if you're new to D3 or if you're new to
programming, but models and agents are really good at kind of
fitting those two things together.
So it's good in empowering people,
but it also I think is a little bit worrisome from the open
source community perspective,
where there's not as much of an incentive for users to come
read the documentation, to come look at the examples,
or to share their own examples, or to ask for help,
because they can just kinda ask an agent for those things.
Do you feel like your strategy for documentation has changed
with the age of AI?
I have... no, because I'm stubborn maybe, or slow.
I mean, I know that there are people that are doing, like,
llms.txt or whatever, and, like,
trying to make it even more easily consumable by AI.
But I think, you know,
my attitude is I don't really wanna write documentation
specifically for agents or for existing models.
I mean, for one reason, like, they're gonna change all the time.
And so if you overfit to whatever the current models
are, when the next model comes out,
like it may not work as well.
And so in that sense,
I would want to build something more durable,
and so if it works for the models, then great.
But I also want it to be understandable by humans, and
I'm trying to teach humans.
And it can be a really good forcing function to figure out,
or at least another kind of feedback mechanism to figure
out, like, how to explain something to somebody.
Like how do I articulate when you should use a certain
argument or option in Plot or not?
You know, you can very quickly evaluate that with the agent.
Whereas, like, actually doing that with humans is much more expensive or slower,
because you have to put it out there and see what they do,
or do user research and stuff like that.
Yeah, makes sense.
I'm curious what your setup is, like, when you're working.
You mentioned, like,
working with agents and their use of examples.
What does your setup look like when you're developing?
Sure.
I mean, a lot of it is on Observable for...
Yeah, that makes sense.
...these days.
So, you know, when I'm publishing D3 examples,
you know, I'm doing it as an Observable notebook, and,
you know, we're sharing it on the D3,
whatever, page on Observable.
For like application development,
like when we're building Observable, you know,
I use Zed as my primary editor,
and it does have like an agent built in there.
But I think, you know,
maybe I feel embarrassed, like, admitting this
in public, like, I don't, you know,
I don't use Claude Code to do a lot of like vibe coding stuff right now.
I think I do use
agents often kind of as a substitute for searching the
web or looking up documentation.
I mean, there's so many tools that we're using right now,
it's impossible for you to kind of have the API reference
memorized for every tool that you're using.
And so in the past, you know, I would search the web for,
you know, how do I do this?
Maybe it's Stack Overflow,
maybe it's the official documentation,
that sort of thing.
But with agents, you know,
they can often surface example code that helps you understand
how to use an API.
And so it's not kind of writing the whole thing for you,
but it is filling in gaps in your knowledge,
and then you can test it and see if it's doing what you want to do.
Yeah.
I found it so useful for, like, for dbplyr,
which translates dplyr syntax into SQL, like, oh,
I wanna know what this syntax looks like across ten different
database back ends.
And, like, previously, I would have been doing a bunch of,
like, googling, like,
trying to wade my way through the official docs.
And, like, you get the answer in the end, but, like,
Claude just does it all in parallel,
and then it just makes me a nice little table.
And...
Yeah.
Like, even if it's only, like, ninety percent correct, like,
it still saved me so much time.
And it can often surface APIs that you didn't even know
existed because you wouldn't know to go look for it.
Yeah. I'm curious.
You mentioned working in the Observable notebook.
This might be a good chance for us to explain what
Observable notebooks are a little bit.
I'm even curious about the move from, like,
working on D3 to founding Observable,
what that looked like.
Yeah.
So, you know, my interest in building tools in general is,
like, how do you take these valuable skills, expertise,
practices, and make them more broadly accessible?
And a big part of that for me has been programming,
like most of the tools that I build are programming tools,
programming libraries.
I love the power of code, the generality of code,
kind of these compositional primitives that you can deploy
in all sorts of interesting and creative ways.
It's not like there are six chart types with whatever ten
options each or something like that.
You can really like have a lot of flexibility in what you create.
And not just visualization too, I mean again,
like visualization is not just kind of a thing that you do in
isolation, it's that you're like, what data are you using,
and how are you kind of modeling or transforming that
data into kind of an interesting representation that
you can then visualize,
and then how are you sharing that visualization or
distributing it.
There's all sorts of other tasks that go alongside visualization.
So I love the power of code, but of course, like,
it is difficult to use.
Like, let's be honest, like,
there's a lot of things that you need to learn.
And so I've always been interested in how can I make
code more accessible, and then how can I make
visualization more accessible?
So, you know, we were talking about bl.ocks.org earlier as
a way for people, or as a way for me, to share examples,
and then other people would share examples as well.
But, you know, one of the issues is that it's still very much a local
development environment, right?
You typically would git clone a GitHub gist,
you'd have a web server running locally,
you would use your code editor, you know,
those are some barriers to entry in order to just get started.
And so what I wanted was something that was web based
where you wouldn't have to install anything,
you would just go to the web page.
Not only would you see the example, but you could edit it,
you know, you could change the code,
you could replace the data,
you could do kind of any number of things because it's an
entire development environment that runs in your browser.
And furthermore, like a social collaborative development environment where
you could share what you've built,
you could import code from other examples and stuff like that.
So, you know, and that comes from the open source mindset where, you know,
working out in the public lets people kind of inspire each
other and share techniques, and in general,
kind of advances the state of the art much faster than
anything that doesn't have that same level of collaboration.
So Observable started as this way of like,
how can I make kind of programming more accessible,
like moving it to the web so you don't need any development environment?
But it was also, in a way,
like rethinking some of the aspects of programming.
So,
yes, it's a computational notebook, you know,
like Jupyter, you have cells, you can run cells,
and they can display things, and they can compute values.
But I think one of the key differences,
one of the key innovations of the Observable notebook was reactivity.
This idea like in a spreadsheet where you don't have to
manually run the cells, that instead it understands the
topological relationships.
So if you declare a variable in one cell,
and then you reference it in another cell,
when you redefine that variable,
any referencing cell runs automatically.
And it's just like kind of a bookkeeping thing in a sense,
where like you as the author of your program don't have to
remember to keep rerunning things,
or worry about being in this kinda inconsistent state where
you've changed the code,
you've forgotten to rerun certain downstream cells,
and so like it's not matching up what you expect.
So like spreadsheet programming, I think,
has made that form of programming much more accessible,
but you only have these tiny little cells that can produce numbers,
and in a sense I just wanted bigger cells that could produce
graphical outputs as well.
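The reactivity described here can be sketched in a few lines. This is a toy runtime, not Observable's implementation: the `Notebook` class, its `define` method, and the made-up `rate` of 1.5 are all hypothetical names and values chosen for illustration. Each cell is a function whose parameter names are the cells it references, and redefining a cell reruns only that cell and the cells downstream of it, in topological order.

```python
import inspect
from graphlib import TopologicalSorter


class Notebook:
    """Toy reactive runtime (hypothetical design, not Observable's)."""

    def __init__(self):
        self.cells = {}   # cell name -> function
        self.values = {}  # cell name -> last computed value
        self.runs = []    # log of cell names, in the order they ran

    def define(self, name, fn):
        """Define or redefine a cell, then rerun it and everything downstream."""
        self.cells[name] = fn
        self._run_from(name)

    def _deps(self, name):
        # A cell's parameter names are the cells it references.
        return list(inspect.signature(self.cells[name]).parameters)

    def _run_from(self, changed):
        # Collect the changed cell plus every cell that transitively references it.
        dirty = {changed}
        grew = True
        while grew:
            grew = False
            for name in self.cells:
                if name not in dirty and dirty & set(self._deps(name)):
                    dirty.add(name)
                    grew = True
        # Topological order guarantees a cell's inputs are fresh before it runs.
        graph = {n: [d for d in self._deps(n) if d in dirty] for n in dirty}
        for name in TopologicalSorter(graph).static_order():
            deps = self._deps(name)
            if all(d in self.values for d in deps):
                self.values[name] = self.cells[name](*[self.values[d] for d in deps])
                self.runs.append(name)


nb = Notebook()
nb.define("amount", lambda: 100)
nb.define("rate", lambda: 1.5)  # placeholder multiplier, not real inflation data
nb.define("today", lambda amount, rate: amount * rate)

nb.runs.clear()
nb.define("amount", lambda: 200)  # "rate" is untouched, so it does not rerun
print(nb.values["today"], nb.runs)
```

Nobody has to remember to rerun `today` after `amount` changes, which is the bookkeeping the spreadsheet comparison is about.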
Oh, interesting.
And the other thing that comes along with this reactivity
model is basically anything can become user controlled,
anything can become interactive.
And so you can take, you know, a number,
like let's say you're building an inflation calculator or
something, and you're, like,
given a certain amount in a
certain year, and you want to say, like,
what is that value in today's dollars?
Rather than having to edit the code to change what the number
is, like just replace that with a slider.
And because it's reactive,
you don't have to change anything downstream,
like any constant variable can be turned into something that's
interactive or turned into something that's animated and stuff like that.
Because my experience from D3 was that a lot of that
kind of interactivity, like the asynchronous nature of loading
data, and kind of handling all of these different states,
like that was the hardest part in a way about making
interactive visualizations.
And so if you could handle that at the language layer or
with the runtime Yeah.
Then it's kind of a simpler way of thinking about it.
Are you saying you're saying a lot of the problem was even
just getting data in Yeah.
And maybe doing some stuff with it before Yeah.
It hit the plot.
I mean, so just as a small example,
like on the bl.ocks examples,
it was, like, pre-promises,
so you would use these callback functions.
So you'd do, like, d3.json, or d3.csv,
or something, and then there'd be a callback function.
And there was always this question of, like,
what goes in the callback function versus like what goes
outside the callback function?
What do you do if you wanna load multiple data sets and
join them together?
Like, do you nest them together, which is slow?
Do you use another... I wrote another library that's called d3-queue,
which is, like, for running... it's totally obsolete now with promises.
You just say Promise.all.
But there are all these kind of mechanical concerns about
asynchronous state or handling interaction that I
think, you know,
is just another barrier for people to produce a good visualization.
So I wanted to think about that problem and make it easier
for people to express these reactive or
interactive programs.
I'm really curious.
There's the new idea of, like, the Observable whiteboards,
and how does that fit in?
Like, is this a next generation notebook?
Is this replacing dashboards?
Where do these whiteboards, which are like almost sticky
notes as a Jupyter Notebooky thing Yeah.
Where does this fit in?
So I think, you know,
we have Observable canvases,
which has kind of been an alternative to notebooks.
And the idea is, like, you know,
what happens if instead of essentially like a linear
layout in a document, you have an infinite canvas,
like a 2D layout?
I think it's been an interesting experiment for us
to think about like how we can build more
UI versus building more code.
Like, there there are like,
there's the substrate difference, I guess,
between a canvas and a notebook, but there's also,
like, the different components and stuff like that.
I think after, you know,
we've spent about a year on canvases,
I think we are basically heading back to notebooks at
this point, having learned from
some of the innovation that we were able to get through canvases.
I think canvases was also an opportunity for us to revisit
how the AI works, and having AI being more integrated
into kind of your data exploration and your visualization.
And so we have some exciting stuff that we've been working
on that pulls that back into notebooks as well.
Oh, wow.
But the challenge has been, like, having
kind of fractured or disparate tools.
And so one of our lessons is like,
we want to build on like notebooks as the foundation,
so that you can easily move between kind of
different ways of working without kinda having disparate
applications that don't work well with each other.
It sounds like you learned a lot doing Canvas.
Like, it sounds like through Canvas you learned a lot about what
notebooks can give people, maybe?
Is that Yeah.
I mean, I think we're always learning in terms of what's working and
what's not working,
and trying to kind of push that envelope in terms of how do I
make this more accessible.
Yeah.
So I think with the, you know, the reactivity,
with the web based development,
I think absolutely we've made progress in terms of
making kind of interactive visualization more accessible
or making web development, let's call it,
more accessible as well.
But I think, you know,
there still clearly is a barrier to entry there.
And one of the questions is, like,
can you solve that through UI, or can you solve it through AI?
And I think a year ago, I mean,
these things are changing so quickly,
but I think over the last few years,
I think we've been doing a lot of work on the UI.
And I think we've made progress,
but it is very challenging to do.
Because like no matter what the UI looks like,
it feels like you're sacrificing a lot of the
expressiveness of it, and it's very expensive to build UI.
And so even if we build a great UI for visualization,
like what happens if you don't know where your data is, right?
Or like you want to cross reference data from different sources.
You know, often there are public data sets that you want to pull in
to correlate with whatever proprietary data that you're using.
And that's a whole other, like, technical challenge if you
wanna do, like, self-serve analytics or whatever,
you wanna kind of bring in a broader audience to be able to use these tools.
You can't just do the visualization part of it,
like it's all the other parts that you have to support.
And so I think we've made progress in the UI,
but it's just it's been very expensive to kind of build out
enough to really make a difference in terms of the
accessibility of it.
And I think now with the advancement in AI,
it feels like there's a whole new opportunity to kind of go
back to a code based approach.
And so in a way,
like we're very well situated in that these computational
notebooks and the reactivity in particular are, you know,
a more human-accessible form of programming.
But it turns out they work exceptionally well for the
agents as well, because you can take really complex programming
problems and break them down into smaller steps,
kind of run the code incrementally,
see what happens, inspect the results,
and kind of iterate and learn from that.
And ultimately, the goal is that that code,
even if it is produced by an agent,
should be understandable, interpretable by a human.
Like, we don't want to create these black box solutions where we
don't understand what it's doing,
and therefore can't trust the output of it.
So I think it's nice to have this more accessible
medium, even if it has the AI in there,
to kind of let people write code without Yeah.
You know, spending years becoming a professional programmer.
Yeah. It's so interesting, the notebook.
I feel like I've really come around to notebooks
in the past year as a tool, versus,
like, before I was doing a lot of text based,
like, R Markdown, QMD.
It's like a text document that renders.
But I do feel like the use of agents in notebooks is really
I've been thinking about it a lot recently,
and I do feel like Observable with a lot of these experiments
and also Observable was way ahead of the curve on this,
like, graph, this reactive execution of cells and things.
I'm curious about your thoughts.
I know notebooks have had an interesting history with,
I think, people like Joel Grus,
who famously came out against the notebook,
and Jeremy Howard, who Well, I think specifically about, like,
kind of the lack of reactivity in Jupyter and this
problem of, like, inconsistent state and stuff like Yeah.
Yeah. Okay. So that's, like, the kind of the key.
I'm curious your yeah.
Your thoughts on, like, the evolution of notebooks and
what's what do notebooks do that's,
like, so Yeah.
Useful.
Well, we've always been an interesting outlier, I think,
not just from the reactivity side,
but also from the JavaScript side.
We're, like, we're really focused on kind of the front
end, the interface, the kind of live computation, you know,
interactive documents, interactive applications,
interactive visualization.
So I don't know if I can speak to, like,
whatever the lessons have been from Jupyter because we've been
this kind of anomaly in a sense.
But I think, you know, again, we have, I think,
made progress in terms of making programming more accessible.
And I do think notebooks are, like,
a different thing than building an application,
you know, like they're not really a replacement for
building a whole web app or building a desktop application.
But at the same time, like,
there needs to be something that kind of sits between,
let's say, a full software application and, like, a Google
Doc or a Word Doc or something that's completely static.
And, you know, it's great to have a live dataset,
or a live visualization,
or even like an interactive model that can be embedded
directly within that document.
So that you can kind of tinker
with the parameters and kind of see what happens.
And it's still, as I said, I think prior,
my hope is that the coding agents are really going to help
make this more accessible to the broader audience.
But for the people who, over the last ten years or however long,
have had that skill,
have been able to do programming,
I think notebooks are a really exciting medium for explaining
things and kind of making things feel more hands on.
You know, like Bret Victor has talked about,
like being an active reader,
so you're not just kind of reading whatever the words
are or whatever the numbers are that are being fed into
some equations, but you can actually interact with them,
like poke and prod and see what happens if you change this.
And like, I want that not just for, let's say,
some parameters that somebody might expose in a
simulation or in a model, but like what if I edit the code?
Like going back to that kind of view source mentality,
where I can see how things are built,
and like what happens if I comment out this line,
or if I change this to that, you know,
and be able to tinker with it.
There's something fun about these documents that
other people have created,
but that are then more interpretable and editable even by you.
Yes. Yeah.
It feels like something about, like,
some there's something around, like,
owning the means of production.
Like, I found it really interesting, like,
like, the R community never
really used notebooks.
Like, I think partly because we always had this interactive REPL, like,
that's where you do your kind of learning and iteration,
and then you record what you found in an R Markdown or Quarto
doc, which you then, like, render as a whole, like,
guaranteeing, like,
reproducibility because you're running the whole thing from
scratch each time.
Yeah.
The thing I'm finding really interesting is with like,
it does feel like AI agents change the game a bit because
now you want not just the code,
you don't wanna just give the agent the code,
but also the output of the code,
and that seems really Yes.
Really compelling.
So that's actually the centerpiece of the new AI that
we're working on,
which we haven't launched kinda to the web yet,
but we did soft launch it yesterday to Observable
Desktop, which is our, like, desktop app.
But yeah, the key thing is that it does runtime inspection.
So when you ask the agent to do something and it, you know,
generates a bunch of code or whatever,
it doesn't just assume that that code did exactly what it
expected, like it can actually inspect any of the declared top
level variables, as well as anything that was displayed,
and it really changes the behavior of the agent.
So like one of the examples that I like to do,
because it has all this innate knowledge, right,
like it knows the penguins dataset.
But the funny thing is, in the penguins dataset, sometimes it's
called bill_length_mm, and sometimes it's culmen_length_mm,
so it's, like, a more technical name for the ridge of the bill.
And the one in Observable uses culmen length.
So I like to
deliberately mislead the agent,
and I ask it to make a chart of penguin bill_length_mm,
and so it's like, sure, here you go.
And prior to the inspection,
it would like make an empty chart for you because that
doesn't exist in the dataset.
But it would be like, perfect, you know, here's your chart,
and the Adélies are like this or whatever,
because it can't see that it screwed up.
But now that it can inspect it, it notices that like, hey,
there's something missing here from this chart,
and it then goes and like inspects the dataset and says,
oh wait a second,
like this has different columns than I expected,
and then it goes ahead and does the correct chart.
And there's so many other examples like this where, you know,
it's working with some dataset and it's just slightly
different than it expected,
and the fact that it can actually verify that and then
correct for it makes it much more robust.
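The column-name mismatch in the penguins story is the kind of thing a small check can catch before a chart is drawn. The sketch below is an assumption about how such a check might look, not Observable's agent code: `check_columns` is a hypothetical helper, and the two rows are made-up stand-ins where only the column names matter.

```python
from difflib import get_close_matches


def check_columns(rows, requested):
    """Return (ok, report); ok is False when a requested column is absent."""
    available = sorted({key for row in rows for key in row})
    report = {}
    for name in requested:
        if name not in available:
            # Suggest the closest existing column instead of charting nothing.
            report[name] = get_close_matches(name, available, n=1)
    return (not report, report)


# Made-up stand-in rows; a real dataset would be loaded from a file.
penguins = [
    {"species": "Adelie", "culmen_length_mm": 39.1, "body_mass_g": 3750},
    {"species": "Gentoo", "culmen_length_mm": 46.1, "body_mass_g": 4500},
]

ok, report = check_columns(penguins, ["species", "bill_length_mm"])
print(ok, report)
```

The point is the same as in the conversation: the check looks at the data that is actually loaded rather than trusting what the dataset is assumed to contain.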
Have you seen our Bluffbench?
Simon Couch and Sara Altman did this for us where we kinda
noticed, like, the LLMs are, like,
pretty lazy at reading the plots,
and often they'll just effectively, like,
read the axis labels and then, based on those, like,
if you do a plot of fuel economy
and then ask it, like, what's going on,
it just tells you the expected relationship between engine
size and fuel economy without actually looking at the plot.
It can be very dangerous because it has that kind of
innate understanding.
And also, I think they tend to be too optimistic in a sense, where, like,
either it is you wrote the code and it assumes you know what you're doing,
or it wrote the code and it assumes it knows what it's doing.
And so, yeah, you do have to kind of nudge it to be a little bit
more discerning in looking at that output.
Cause we found that even when it could see the output,
like if we didn't tell it that it really needed to review it
and make sure that it did what it expected,
it wouldn't do well.
And then similarly,
so one of the big challenges with runtime inspection,
particularly working with data, is you have large datasets, right?
So you have, you know,
hundreds of megabytes or even more of datasets you
might have loaded into memory.
Obviously you can't fit that entire thing into the context,
and even for an SVG,
like if you have lots and lots of circles or lines or
whatever, like that can be many kilobytes as well.
And so we've written some kind of interesting code to
inspect those arbitrary values that hopefully does a good job
of kind of giving it a broad overview,
like deep enough that it can kinda see some examples,
but not see everything.
But you also have to tell it, like,
whether or not it's truncated,
because, you know, if it thinks that that might be truncated,
it'll just assume everything's fine.
It's like, oh, everything's fine,
I just can't see anything, you know?
But it's like you have to tell it what it's looking at and
tell it to be more discerning.
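One way to picture the inspection problem is a preview function that caps how much of a value it shows and says so explicitly. This is a minimal sketch under assumed limits, not the code Observable wrote: `summarize`, `max_items`, `max_depth`, and the 40-character string cap are all hypothetical choices.

```python
def summarize(value, max_items=3, max_depth=2):
    """Return a small preview of value with explicit truncation flags."""
    if isinstance(value, dict):
        if max_depth == 0:
            # Too deep: report the size only, and flag that contents are hidden.
            return {"type": "dict", "size": len(value), "truncated": len(value) > 0}
        keys = list(value)[:max_items]
        return {
            "type": "dict",
            "size": len(value),
            "truncated": len(value) > max_items,
            "sample": {k: summarize(value[k], max_items, max_depth - 1) for k in keys},
        }
    if isinstance(value, (list, tuple)):
        if max_depth == 0:
            return {"type": "list", "size": len(value), "truncated": len(value) > 0}
        return {
            "type": "list",
            "size": len(value),
            "truncated": len(value) > max_items,
            "sample": [summarize(v, max_items, max_depth - 1) for v in value[:max_items]],
        }
    if isinstance(value, str) and len(value) > 40:
        return {"type": "str", "size": len(value), "truncated": True, "sample": value[:40]}
    return value  # small scalars pass through unchanged


# Made-up rows standing in for a large in-memory dataset.
rows = [{"id": i, "mass": 3000 + i} for i in range(1000)]
preview = summarize(rows)
print(preview)
```

The `size` and `truncated` fields are there so that whoever reads the preview, human or agent, knows it is looking at three rows out of a thousand rather than at the whole dataset.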
And have you all had to dig a lot into, like, evals?
Whether with a capital e or a lowercase e, like,
if you change the software, the prompts,
how does the agent's behavior Yeah.
Very much.
And in a way,
that's like a flashback to what I was talking about earlier
working on search quality.
The Google eval.
Yeah.
No. For sure.
That we have, like, these eval suites that we run.
I think, you know, they are helpful,
but we need a lot more in a sense, like, you know,
and we need, as I said,
we just like soft launched it yesterday,
so like more people using it,
I think we'll hopefully get feedback from real users
who share their work with us and kinda get feedback that way.
But I think you're right,
like, you need some way of knowing when you make a
change to your harness or to the prompts or whatever,
like what is the actual effect of that.
And if you don't have the evals,
then you're basically going blind.
Like whatever you're testing manually,
maybe that works great.
But all these other things that people will encounter once you
make that change,
you're not aware of those changes that you're making.
It is very, like, that's the most, I think,
frustrating thing about working on agents, is just nobody really
understands how they work or what you should be doing, or, like,
you change one thing and who knows what the effect
is gonna be. So if you don't have evals, you're really operating blind.
Yeah.
I do appreciate though that the Palmer Penguins dataset is doing its part.
Like, you know,
it's doing its part in making sure the agents
do their job well.
Yeah.
It's really hard to write good evals as well,
we've discovered.
So, you know, you kinda have these different I mean,
first of all, even if you set the temperature to zero,
like, it still does change.
I mean, you know,
like your system prompt includes the current date in
it, so that's like one way that things can be different.
But who knows, like,
what the other underlying instabilities are in there.
But the challenges with writing the evals are basically like
how specific are your assertions and how specific are your prompts.
Because when you have failures, you know,
your choices are like, do I make my assertions more general?
Like basically, do I enumerate more and more of the various things that I've
seen that are acceptable, or do I somehow, like,
try to make them more general?
Or do I kind of change the prompt and make the prompt much more specific?
So like one of the examples that we run into is that the
agent just, like, happily keeps going, you know,
like it just keeps showing you more and more things.
If you ask it kind of too open ended a question,
it can easily run for, you know,
a few minutes or whatever,
generating different views and stuff like that.
And so if you wanna have an eval that runs efficiently and
has like a thirty second time out or something like that,
you do have to be a little bit more specific to tell the agent
to kind of wrap it up and Oh, interesting.
Yeah, and be concise in its response.
But of course, the danger is, sorry,
the danger is that if you are too specific in your prompts,
then they're no longer kind of representative of what users will do.
And so again, having that external feedback,
like having real users using it,
is essential to know whether your evals are really
representative of actual usage or not.
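The trade-off between specific and general assertions can be made concrete with a toy eval harness. Everything here is a hypothetical stand-in: `run_eval`, `stub_agent`, and the canned reply are invented for illustration and say nothing about how Observable's eval suites are built. Note that this sketch only measures elapsed time after the fact; it does not interrupt a slow agent.

```python
import time


def run_eval(agent, prompt, accept, budget_s=30.0):
    """Run one eval case; agent is any callable from prompt text to reply text."""
    start = time.monotonic()
    reply = agent(prompt)
    elapsed = time.monotonic() - start
    return {
        "passed": bool(accept(reply)) and elapsed <= budget_s,
        "timed_out": elapsed > budget_s,
        "reply": reply,
    }


def stub_agent(prompt):
    # Hypothetical stand-in for a real agent call; returns canned code as text.
    return 'Plot.dot(penguins, {x: "culmen_length_mm", y: "body_mass_g"}).plot()'


# Option 1: enumerate exact replies seen so far (very specific, brittle).
exact = {'Plot.dot(penguins, {x: "bill_length_mm", y: "body_mass_g"}).plot()'}


# Option 2: assert a general property that any acceptable reply should have.
def uses_real_column(reply):
    return "culmen_length_mm" in reply and "Plot." in reply


# Asking for concision in the prompt is the "more specific prompt" lever.
prompt = "Chart penguin bill length against body mass. Be concise."
strict = run_eval(stub_agent, prompt, lambda r: r in exact)
general = run_eval(stub_agent, prompt, uses_real_column)
print(strict["passed"], general["passed"])
```

The strict case fails on a reply that is arguably fine, which is the pressure toward either enumerating more acceptable outputs or loosening the assertion.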
Yeah. That makes sense.
And I know you also oh, did you have a you sure? Yeah. Alright.
Last chance, you know.
I know this might also tie into I saw you wrote a blog
post on playing safely with fire Yeah.
Last year. Yeah.
I'd be curious to hear Which is about AI, by the way.
Yeah. Sorry. Not about fire.
You wrote this beautiful blog post about arson,
and it really spoke to me.
I'm curious.
You've mentioned so much about evals and this use of
agents in notebooks.
Yeah. I'd be curious.
Maybe you could recap a little bit about the blog post Sure.
And whether any of your thoughts have changed or stayed the same,
because I know so much has happened in the last Yeah.
Well, the main thing is really about interpretability,
verifiability, and trust.
You know, I think certainly the models,
the coding agents are somewhat miraculous or magic in
what they can do, but are also clearly capable of
misleading you if not lying to you, you know,
they can be sycophantic.
They can, you know, assume things are
working when they're not, or just make stuff up,
and all that sort of stuff.
So I think while they are very exciting as a
technology, I think it's easy to be misled by them as well,
and certainly there's a lot of hype around it as well.
And so I think our take is coming back to that, you know,
how do we make a more human medium for computation,
for programming?
And so if we're incorporating AI into that,
like it has to be about how does a human understand what
the agent is doing, and how do we make it more verifiable.
And in a way, like a lot of what we're doing now, I think,
bringing the agents into notebooks and kind of having
this code-first way of working,
is trying to, in a sense,
shift some of the agent thinking into code.
Because that is a more formal specification,
a more verifiable specification.
So I have this, like, kind of saying that the agent can
lie to you with text,
but it can't lie to you with code, in the sense of, you know,
if you ask it something, you're like,
what are my top customers or something like that,
and it just gives you a markdown bullet list.
You know, who knows if that's right?
Like, you need to cross reference it with an actual query in order
to know whether that's correct.
If it gives you code,
you have the problem of whether or not the code is relevant,
but assuming that the code runs and is relevant to the question,
like you can generally trust that that query will behave
deterministically, kinda give you the results that you expect.
And so in that sense,
like we're trying to get the agent to shift more of its
thinking to code.
Hopefully, that code is still interpretable,
but that way it's more verifiable and trustworthy and reproducible.
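The "top customers" example can be made concrete. A bullet list has to be taken on faith, while a query can be read and rerun. The sketch below uses Python's built-in `sqlite3` with an in-memory table; the `orders` table, its columns, and the four rows are made-up assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    # Made-up rows; real data would come from your own database.
    [("acme", 120.0), ("globex", 75.5), ("acme", 30.0), ("initech", 200.0)],
)

# The answer is the query itself: anyone can read it, rerun it, and check it.
TOP_CUSTOMERS = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC, customer ASC
    LIMIT 2
"""
top = conn.execute(TOP_CUSTOMERS).fetchall()
print(top)
```

Whether the query is the right one for the question still needs a human eye, but running it twice against the same data gives the same rows, which a free-text answer cannot promise.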
I've heard the idea of the, you know, agents enable, like,
code backed data science nowadays where it really is
your source of truth and being able to use them more
effectively than just some sort of markdown list, like you said,
and being able to really verify that this is correct.
This is what we want.
I think it's really powerful in a world where it feels maybe
uncertain, you know, where is code going,
but having that as a source of Yeah.
Absolutely.
It's it's been interesting, you know,
because there was this whole, like, no code,
low code movement before.
And I think we were on the periphery of that,
working on kind of UI cells and notebooks and stuff like that.
And I think in a way it's liberating to come back to code
as like this sure foundation, like this reliable thing,
this powerful expressive thing,
and now we're much more specifically focused on, okay,
we're not hiding the code, or taking away the code,
or abstracting the code,
but how can we help people learn how to do that code?
And I think there is a difference between writing code
and reading code, like in terms of the challenge,
the barrier to entry there.
And so I think, you know, even if you can't write code,
if we can teach you to read code and review code, you know,
that broadens the audience of who can do this.
I think it does feel, like,
tremendously, like, cool for us that have been trying to, like,
persuade people about the magic of code for, like, so long,
and now it's, like, easier than ever for anyone to, like, Yeah.
Access it, which is, like, really, like,
genuinely cool and enabling and democratizing.
Yeah, absolutely.
I think that's the most exciting aspect of it.
Because it's not, I think,
my hope is that people don't just purely rely on the agent
to do all of the work.
Like, if it helps them get over that initial hurdle,
and then they're like, oh,
if I learn a little bit more about how this is working,
then I can get more hands on here.
And it really, you know,
helps you kind of scale up your expertise.
I do think the other thing that I just find
really cool is now you can interact with an agent in,
like, any human language, and, like,
it speaks back to you in your language,
and I think that's also, like, really cool.
Like, we never would have been able to, like,
translate all of the documentation,
all of the UI into all of these different languages.
But now, you know, the translation's not perfect,
but just, again, like, tremendously enabling for people.
It's interesting, like, as a tool builder, does this change
how we're building tools?
Does it make it feel like we want to have them be more readable?
Like, maybe more scrutiny of the APIs that we're writing as
people are just looking at this maybe from a slightly more
outside point of view?
Yeah.
I think that's a really interesting question about how
we design the agents and how we kind of guide them to do
certain types of code or certain libraries.
So most top of mind for us is kind of when you're
asking the agent to do a visualization, you know,
is it going to use D3 or is it going to use Plot?
In general, like, we very much try to bias it towards Plot because Plot is
a higher level abstraction, like grammar of graphics.
It is, you know,
there are certain things that you can't really do in Plot,
at least not yet,
like there's no tree maps in Plot for example.
And so there are certain things that you do need to use D3
for, but in general we try to bias it towards Plot.
Because if it is able to use Plot,
it's much more likely to produce a good output.
Correct, like, not buggy, but also just a better designed output.
Like, they look better,
they have better tool tips that are built in,
they kind of make better choices by default,
and all that sort of thing.
So I think it is important still,
the design of these abstractions,
the designs of those libraries.
It's not like it can't use D3 to do all of
the things that you can do in Plot,
like, it's just much harder to get it to produce
as good of an output with D3 as it is with Plot.
There's also a really good feedback loop in place,
where like you ask it to do something,
and like it makes a mistake, and then you're like,
my choice is I can either change the prompt to teach the
AI to do something differently,
teach the agent to do something differently,
or I can go back to Plot and, like, add a new feature to Plot,
or add like some better defaults or better warnings or
something like that.
And it's fun to have that instantaneous feedback where
you just you change the feature,
you change the library,
and you can immediately test it against the agent with all
these different evals.
And it kinda helps you, yeah,
it's another feedback loop to stress test the design of your interfaces.
Are the agents pretty good at sticking with Plot,
or where does the path of desire go for agents?
Do they want Plot or D3?
I mean, I don't know how much of it is the prompt that we've given it.
It definitely sticks to Plot pretty reliably,
and I think a big part of that is just the whole, like,
Observable community and all the public notebooks that have
kind of fed into these models.
Like, it really does have a pretty good understanding of both how
D3 works and how Plot works.
But of course, we've given it instruction recommending the use of Plot
for most of its visualization work.
And then for the more advanced stuff,
you know, the tree maps as I mentioned,
or some of the more, like, animations, it'll use D3 for that.
Because the other obvious advantage of Plot is,
like, it's easier for humans to Yeah.
Read that rather than trying to Absolutely.
Fully understand what the D3 code is.
And that's part of our goal.
Like, I want it to produce something that is interpretable
and kinda teaches you how to use Plot, because
you're gonna wanna do that refinement as well.
Right?
So if you are looking at the output of a chart,
and you can see what the specification is,
that might give you some hints as to how to guide the agent
when you wanna refine it.
Well, I think everyone too now has had this experience where
you've got, like, the AI has produced
something that's, like, ninety five percent what you want, and you, like,
ask for one very simple thing to get it to a hundred percent,
and it, like, makes it worse.
And you just get, like, stuck in that, like,
trying to prompt it to do the right thing, but often, like,
when the code's right there, you're like, okay.
I will just change that. Like, that's easy.
Like, it's explicit.
I don't get stuck in this, like,
really frustrating loop where Yeah.
Agent just seems so dumb.
So we have some cool stuff that we haven't announced yet,
but it's reminding me of it.
And basically, what we're working on is kind of like a REPL complement to a
notebook, and the idea is that the REPL,
which we actually call chat,
is basically like an agent-first way of developing,
where all of your messages are going to the agent,
and the agent is like replying with live code that runs in the chat.
And it's great for experimentation and kind of ideation.
You don't have to worry about it kind of destroying things
because it can only add to the conversation,
and it clearly puts the emphasis on, like,
just ask it questions.
Like, it's much less intimidating than
opening up a notebook where like the first thing you see is
a blinking cursor in a code cell, and you're like,
what even goes in here?
Like it's just ask a question and the agent will give you an answer.
But that doesn't really scale up beyond the ninety
five percent case.
Like it's really not good for refinement,
where you get something that's pretty close,
but then you could go through like twenty different charts
that all get added to the conversation,
and it becomes a mess, it becomes slow.
And so the way that we've designed it is these chats are
actually just a different interface for notebooks.
And so you can transition into a notebook,
and then you can switch into this refinement mode.
And that's what lets you kinda edit in place and hopefully
get to that one hundred percent case.
Oh, yeah.
I was thinking about this this morning because I was using
Gemini to make a logo, and it, like, put a white background,
and I'm like, please make the background transparent.
And so in the background, it just drew, like,
the white and gray squares, like, when you often see it.
That's right. And I was like, no. Like, make it transparent.
Like, it just does not.
And that's just, like,
so intensely frustrating when you're like,
you've done this amazing job.
I've given you this really vague, ill defined thing.
You've killed it,
and now I'm asking you for this small concrete change, and you, like,
just cannot do it.
It's, like, it's fascinating. Yeah. Yeah.
I love when it tries to bamboozle you through, like,
smoke and mirrors.
Yeah. You know?
Yeah.
I figured, one thing I'm really curious about is, we
talked a lot about the tools and your approaches.
I'm really curious about just how you unwind.
Like, how do you unwind after a long day of cooking up
visualizations?
So
I started strength training,
like, spring of last year,
and I think that has been super fun,
and I've really enjoyed that.
I think, you know,
I've always considered myself somewhat of a nerd, I guess,
and, like, pooh-poohed, like,
people that go to the gym and work out, but, like,
it's so fun.
I love it. You have like a trainer? Yeah.
I go to a trainer twice a week,
and it's really good for motivating.
I mean, it's hard.
I was doing a lot of running for a while also.
Running is less enjoyable for me.
I feel like it's just suffering,
but you feel really good afterwards.
Whereas the trainer, like, the strength training,
I enjoy it in the moment as well.
But it definitely helps to have a trainer, like, in terms of,
you know, you've got this preexisting commitment.
Like, you can't just decide not to show up.
I mean, you could, I guess,
but like it's this pressure to show up.
And also, like they're really good at kind of giving you,
like making it interesting, like doing new things.
I think if you're just doing strength training by yourself,
it's easy to have, like,
I have three or four exercises that I do,
and I do those every time, and then it just gets boring.
And even though, like, I've been doing it for a year now,
like, twice a week, like,
I feel like every workout is different.
Like, I'm doing exercises I haven't done before,
and that makes it exciting,
and you feel like you're getting kind of better coverage
in terms of the muscles that you're growing.
Yeah. Yeah. It's so cool.
Do the trainers also cover, like,
diet?
Like, how are your macros, I guess?
I don't worry about that too much. I mean,
there's only so much that you can do, right? I mean, yeah.
But in a way, like,
I got into the strength training because I had a lot of back
problems, actually.
And I was doing a lot of biking, I was commuting by bike,
and I don't know if that was helping, but just, you know,
sitting at a desk,
like sitting at your computer for however many hours a day,
it's just not good for your body.
And I think what's been most successful for me with
the strength training, I mean, knock on wood, is that, like,
I have not really had any back problems for months. Oh, wow.
And, like, that is gold to me.
Like, I don't really care about all the other stuff as well.
Yeah. But I really do recommend it.
I think it's fantastic.
There are some really cool videos of, like,
people starting strength training, like,
in their seventies and eighties, and, like,
it's amazingly empowering. I'm gonna say, do it.
Yeah. My parents have just started strength training. Yeah.
It's, like, so empowering, and, like, you get, like,
you're never gonna be like a, you know,
competing in strength competitions,
but it still makes you so much, like Yeah.
Yeah.
You know, the robustness, like the mobility that you feel,
like just walking around,
like bending over and picking stuff up,
it really is life changing.
Yeah.
Have you and your parents ever hit the gym?
No.
They just started in the last couple weeks. Yeah.
They're not at your level yet, is what you're saying.
No.
Yeah.
I wanna see my dad doing the deadlift,
but I think they're doing well.
Yes. I've played tennis with them.
I mean, play a lot of Okay.
But, yeah, that's cool.
Do they, like, school you, or
This is getting a little personal.
Yeah. Let's get into it.
You know, it's hard to play tennis with your parents because it's
such a mind game.
You know, I don't play that much right now,
but I played a lot in high school, and I always felt,
like, thrown off, like,
when I would try to play with my parents.
Like, my dad can play very aggressively, you know,
but is not as reliable,
whereas my mom is like just she never misses a shot, you know?
Like, and so she's not hitting it that hard, but, like,
at some point, you're gonna make a mistake,
and she's gonna win.
She's in your mind. Yeah.
And so it's like, the mind game gets me,
and I think that's I always struggled with that aspect of tennis.
Yeah.
Yeah. That's very funny.
I feel like it's a good relationship too for your
parent to be like, I'm in your mind, and I'm gonna school you.
Yeah. You don't know when. You know?
No. Love it.
You know, like, there's just that pressure of, like,
you know, wanting to meet your parents' expectations.
Yeah. And, like, now you're in direct competition with them.
Yeah. Right. You're, like, locking horns.
Know what to do exactly. Yeah.
Isabel, can you just explain to Mike the brief adventure you and
Claude Code went on?
Yeah.
So doing research for this,
I was going through old GitHub repositories,
and I did find a game called Poly Be Gone. Oh, yeah.
That has been rehydrated by Claude,
if you would like to play it after.
I have it on my laptop. It is working.
Oh, that's great.
I also can't get past the first level. Yeah.
Okay. Awesome. Yes. Let's play this afterwards.
Because it stopped working in some, like, Mac OS update,
and I've been meaning to, like, get it working again,
but I didn't spend the time to figure it out.
But, yeah.
So that was from a computer graphics class at Stanford,
and they had a video game competition as the final
project for that.
And I've always been into kinda like the physics
simulation, like, Verlet integration stuff,
like there's the many-body Barnes-Hut,
is that right?
Anyway, what is it called?
Like, the quadtree.
It's for simulating kind of mutual effects of gravity when
you have either particles or stars or whatever.
I'm just at the level of seeing what seemed to be a
battery rolling around. Yeah.
Well, or like a hoverboard.
It predates hoverboards, probably. Shows you how old it is.
But I think what's fun about those is you have these very
simple rules, like you have mass, and you have velocity,
and you have friction, and stuff like that.
And if you just define what these simple rules are,
you can get all sorts of interesting kind of dynamic
behavior that results out of it.
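A minimal sketch of that idea, not the game's actual code: one body with mass and velocity, pushed by a force and slowed by friction, advanced one small time step at a time. The names `Body`, `step`, and `simulate` are made up for illustration, friction is assumed to be drag proportional to velocity, and the update uses semi-implicit Euler rather than the Verlet integration mentioned earlier.

```python
from dataclasses import dataclass


@dataclass
class Body:
    mass: float
    position: float
    velocity: float


def step(body: Body, force: float, friction: float, dt: float) -> Body:
    """Advance the body by one time step (semi-implicit Euler)."""
    # Assumption: friction is a drag force proportional to velocity.
    net_force = force - friction * body.velocity
    acceleration = net_force / body.mass
    # Update velocity first, then move using the new velocity.
    velocity = body.velocity + acceleration * dt
    position = body.position + velocity * dt
    return Body(body.mass, position, velocity)


def simulate(body: Body, force: float, friction: float, dt: float, steps: int) -> Body:
    """Apply the same rules repeatedly and return the final state."""
    for _ in range(steps):
        body = step(body, force, friction, dt)
    return body
```

With no applied force the body coasts and slows down; with a constant force it settles toward a steady speed of force divided by friction. Neither behavior is written into the code, both fall out of the rules.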
And so that game, you know, you control this robot,
and you basically can like add force to the wheels independently.
I guess it drives like a tank,
so you can go forwards and backwards and you can kinda
have it spin around.
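Tank-style steering can be sketched with two wheel inputs. This is a simplified, hypothetical version that treats the inputs as wheel speeds rather than forces, so it leaves out the momentum the game has; `tank_step` and `track_width` are illustrative names, not taken from the game.

```python
import math


def tank_step(x, y, heading, left, right, track_width, dt):
    """Move a two-wheeled robot for one time step.

    left and right are wheel speeds; heading is in radians.
    """
    # Equal wheel speeds drive straight; the average is the forward speed.
    forward = (left + right) / 2.0
    # Unequal wheel speeds turn the robot; opposite speeds spin it in place.
    turn_rate = (right - left) / track_width
    heading = heading + turn_rate * dt
    x = x + forward * math.cos(heading) * dt
    y = y + forward * math.sin(heading) * dt
    return x, y, heading
```

Driving both wheels at the same speed moves the robot along its heading, and driving them in opposite directions changes the heading without moving the robot.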
But it has momentum,
and so it can like launch itself off of ramps.
It has like some rolling cylinders that you can drive
across and like some flippy boards that, like, throw it into
the air, and it was just a lot.
Not get that.
He did not get to the flippy boards. They did get a great movie
soundtrack on it as well.
Yeah.
Honestly, I was like, I'm in it for the soundtrack.
Totally.
And there is definitely a feeling of mass about this little poly.
It's very light. It really goes flying if you send it.
And you can be, like, pretty creative.
I mean, it's not a long game. Let's be clear.
There's only, like, eight rooms in it or something like that.
But there are a couple different ways of solving the
rooms that come out of, like, unexpected behavior.
Like, there are jumps that you can make that I didn't anticipate,
but that come out of, like, whatever the physics.
I'll have to put up a pull request after this.
There's a fun visualization feature in that as well,
where if you, like, replay a room, like, so you can often,
like, die, you know, fifty,
sixty times or whatever trying to get across the room.
But there's some button that you can hit,
and it will show you all of the paths that you've taken through
the room, so you can see the ways that you've died.
We'll have to share out Poly Be Gone,
you know, remaster.
Restore.
Yeah. Twenty twenty six edition.
Yeah. Buy it on Steam.
Yeah. For real.
For real. Yeah.
Mike, thanks so much for coming on to The Test Set.
I think, honestly,
it's incredible to hear about D3 and just all your work
from, like, Google eval, single number,
people in a room debating, to, like,
what's the future of data analysis and visualization
in these reactive notebooks?
Yeah.
I'm so curious for what y'all will launch,
and really appreciate you coming on.
Thank you.
It's been a lot of fun.
I've enjoyed it. Thanks.
The Test Set is a production of Posit PBC,
an open source and enterprise tooling data science software company.
This episode was produced in collaboration with creative
studio, AGI.
For more episodes, visit the test set dot co or find us on
your favorite podcast platform.