Infinite Curiosity Pod with Prateek Joshi

The Outer Loop of AI-Powered Coding | Merrill Lutsky, CEO of Graphite

Prateek Joshi

Merrill Lutsky is the cofounder and CEO of Graphite, an AI-powered code reviewer that's used by tens of thousands of users. They are backed by amazing investors including Andreessen Horowitz.

Merrill's favorite book: Never Split the Difference (Author: Chris Voss)

(00:01) Introduction
(00:06) Teaching AI to Understand Code
(02:40) AI-Assisted Code Generation and Code Review
(06:20) Current Landscape of AI-Assisted Code Review
(09:04) Motivation Behind Launching Graphite
(16:52) Landing the First Paying Users and Early Learnings
(21:42) Growth Experiments: Wins and Misses
(26:27) Current Scale of Graphite
(29:12) Tech Stack Behind Graphite
(33:12) Future of AI-Assisted Coding and Graphite’s Role
(35:37) Rapid Fire Round

--------
Where to find Merrill Lutsky: 

LinkedIn: https://www.linkedin.com/in/merrill-lutsky/

--------
Where to find Prateek Joshi: 

Newsletter: https://prateekjoshi.substack.com 
Website: https://prateekj.com 
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19 
X: https://x.com/prateekvjoshi 

Prateek Joshi (00:01.42)
Merrill, thank you so much for joining me today.

Merrill Lutsky (00:04.422)
Thanks for having me, Prateek.

Prateek Joshi (00:06.444)
Let's start with the basics. AI-assisted coding has been taking off in a big way and it's making a lot of news. So can you start by explaining what it takes to teach an AI model to understand code and help us with it?

Merrill Lutsky (00:26.578)
Yeah, that's a great question. At a super high level, you have to have a massive corpus of data in order to teach a model to reason about code. That was one of the biggest improvements in the leap from the previous generation of recurrent neural networks to the large language models we have now: it's both the complexity of the model, with far more layers, and the sheer scale of data required to give it that understanding. That data set has now been assembled by many of the companies building the base models, and being able to train these highly complex models on that scale of data was the big leap forward in building an understanding of code.

More recently, there have been a couple of further advancements. Historically, one of the biggest challenges, with any text but with code in particular, was that you need to ingest enough context when writing code to understand what code has been written before and what this function or file is trying to do, and you also need enough output capacity to generate something that actually accomplishes what you're asking for. Both of those were major limitations of the previous generation of models, and even with LLMs we've seen that dramatically change, going from only 32, 64, or 128K tokens to models coming out with million-token input windows and significantly improved output windows as well. Those, I'd say, are the main improvements that have enabled these huge leaps forward in code understanding and generation over the past couple of years.

Prateek Joshi (02:40.04)
When you look at AI-assisted coding, one common use case is you type in English what you want and it generates net new code. Another big use case is: here's a bunch of code I stitched together, and before it goes to production I need someone to review it, and a lot of people are expecting AI to help them with that. So, AI-assisted code review: what happens in there, and how is AI helping?

Merrill Lutsky (03:10.578)
Yeah. One of the things that we think about a lot at Graphite is that so much of the innovation in LLMs, as applied to code in particular, has gone toward what we call the inner loop of development: that process you described of opening up your IDE, tab-completing what you're doing, or, with some of the compose-style features, giving a more general prompt and asking these tools to go write a larger piece of code.

But what not that many companies are thinking about yet is the outer loop. Everyone who's been a professional developer knows that just writing the code is only the beginning of the process. You have to make sure it's tested and reviewed and deployed, and then make sure it's actually having the product impact that you want downstream. There's so much else that happens after you've created that code change locally, and that's what we call the outer loop: all that collaboration, review, testing, and deployment. That's where we see a huge opportunity at Graphite: we can take these same models that are so good at generating code and apply them to understanding and improving code that's already been written. That's really where the AI-assisted code review piece comes in.

We don't think a human is going to be taken out of the loop anytime soon. It's really important, both for understanding the code base and for accountability, that there's somebody ultimately responsible, so you're still going to have a human in the loop for the foreseeable future. What's important, though, is that much like what these tools have done for writing code, you can make code review a lot more focused, and even a lot more enjoyable and efficient, for human reviewers. A lot of folks have talked about how tools like Cursor are great at both helping more junior developers do more and helping more experienced developers focus their time on high-level architecture instead of having to understand a particular API definition. With code review, what we're seeing is that

Merrill Lutsky (05:31.778)
AI is really, really good at finding some of the bugs, nits, or code-base style inconsistencies that would otherwise take a first review pass to catch. That might be hours, or maybe even a day or more, of back and forth: re-requesting the review, getting it approved. We can cut that out entirely by using AI to scan the PR within a few seconds of the update. That's really about cutting down the number of review cycles and focusing human time on the high-level questions: is this architected the way we think it should be? Is it doing what we want from a product standpoint? That's the promise of AI that we're realizing in terms of improving code review.

Prateek Joshi (06:20.726)
If you look at the current landscape of AI-assisted code review, how would you describe it in terms of what it can do well and where the gaps are? And let's assume this is before Graphite, outside of Graphite. What can the world do well today?

Merrill Lutsky (06:35.634)
Yeah.

Merrill Lutsky (06:39.046)
Yeah. What we've seen so far is that there are two classes of tools. There are the bug and nit bots: a lot of companies are building GitHub bots that go through and scan for bugs or nits. There are tools like Ellipsis, Greptile, and CodeRabbit; a ton of companies are building tools like that.

And then on the other side, what's actually been around longer and isn't necessarily based on LLMs, but has the same kind of interaction model, are security scanning tools. So you have tools like SonarQube, Snyk did this quite well, GitHub Advanced Security. A lot of tools already have this interaction model of something that autonomously scans every change, flags potential issues, and in some cases suggests fixes.

What we've seen so far with both of those is that they work pretty well for more limited, smaller bugs and smaller changes. A couple of things are missing, though. One is that higher-level, senior-engineer level of input. A lot of the tools right now can find small issues, but they aren't able to give you architecture or performance improvement suggestions, the kinds of things that one of the more senior engineers on your team might be reviewing for and might just have more context on. The other piece is that, at the moment, it's very much limited in the depth of the interaction: it leaves you a comment, maybe a suggestion, and that's about it.

One of the ideas we're really excited about with Graphite is making PRs be in motion by default. Can you have agents that are guiding pull requests, guiding these code changes, through the process by default, doing things like self-healing when CI fails and resolving merge and rebase conflicts, basically keeping code flowing constantly

Merrill Lutsky (08:57.503)
instead of having to have a human engineer at each checkpoint advancing it along the process.
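
To make that idea a bit more concrete, here is a rough TypeScript sketch of what an outer-loop agent keeping a PR "in motion" could look like. The status values, action names, and interfaces are hypothetical illustrations of the concept, not Graphite's actual implementation.

```typescript
// Conceptual sketch of "PRs in motion": a loop that nudges a PR forward
// without a human at every checkpoint. Statuses and actions are illustrative only.
type PrStatus = "ci_failed" | "merge_conflict" | "awaiting_review" | "approved";

interface OuterLoopActions {
  attemptCiFix(prId: number): Promise<void>;     // e.g. ask an agent to patch the failing test
  rebaseOntoTrunk(prId: number): Promise<void>;  // resolve rebase/merge conflicts
  requestReview(prId: number): Promise<void>;    // make sure a reviewer is assigned
  enqueueForMerge(prId: number): Promise<void>;  // hand off to a merge queue
}

async function keepInMotion(
  prId: number,
  getStatus: (id: number) => Promise<PrStatus>,
  act: OuterLoopActions
): Promise<void> {
  switch (await getStatus(prId)) {
    case "ci_failed":
      await act.attemptCiFix(prId);      // self-heal when CI fails
      break;
    case "merge_conflict":
      await act.rebaseOntoTrunk(prId);   // keep the branch mergeable
      break;
    case "awaiting_review":
      await act.requestReview(prId);
      break;
    case "approved":
      await act.enqueueForMerge(prId);
      break;
  }
}
```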

Prateek Joshi (09:04.906)
And if you look at the point at which you launched Graphite, what was the initial motivation behind launching it? And also, how did you decide what features should go into that very first version of Graphite?

Merrill Lutsky (09:23.63)
Of Graphite, the platform, or of Reviewer? Yeah.

Prateek Joshi (09:28.142)
Let's do both. Let's start with both. And this is more of a product question. Just understanding what should go into, or rather, how do you decide what goes into the V1 of any product?

Merrill Lutsky (09:40.92)
Yeah, that is a great question, and one that's always hard to grapple with. I'll talk about both Graphite, the code review platform, and then Graphite Reviewer, the AI-powered review companion that's part of it. Graphite in the beginning actually came from an internal tool that we built. We were working on a different idea in dev tools at the time, building an iOS app release and rollback platform. We had some customers for that, but it wasn't getting a ton of traction. My co-founder Tomas was at Meta for a long time, and a couple of our early hires were his old teammates. Pretty much universally, the first thing they'd say when they joined the company, before it was even called Graphite, and we'd set them up on GitHub was: how do I do this? How do I do code review without Phabricator, Meta's internal tool? One of them famously said, wow, this makes me feel like a caveman compared to the tool chain I used to have. That happened a few times, and the pain for them was so acute that they actually built what became the first version of the Graphite command line interface in an internal hackathon a couple of months in.

We basically started using what became Graphite for a few months before we even thought about making it available more broadly. It just so happened that it came up in conversation with some other ex-Meta folks that the people on the team who built it knew, and the response was: wait, can you give us this? We have to have this.

Can you build this for us? We miss this tool too. There's this universal recognition, and it's true to this day, among pretty much anyone you ask who's spent time at Meta. If you ask them, do you miss Phabricator, do you miss the stacked diffs or stacked PRs workflow, pretty much every engineer who's been there will tell you: yes, that's the thing I had there that I miss the most

Merrill Lutsky (11:56.37)
now that they're working with a more publicly available tool chain. So that was basically it. From there we heard a lot of excitement about it. The other product we were working on wasn't getting a ton of traction at the time; we had a few paying customers, but nothing super exciting. So we ran an experiment where we said: we're going to work on this full time for a month, and if we can iterate on it and get to 20 engineers at companies that we know using it every single workday by the end of the month, we'll pivot the company and focus entirely on code review. We ended that month with 50 engineers at well-known companies, just through word of mouth. At that point we'd put them all in a Slack channel, and we maxed out the number of Slack Connect connections you could have in one channel. We weren't charging at that point. We were just asking everyone who was participating to spend half an hour or an hour with us every week and give us feedback. It was like The Mom Test famously says: get people to pay with either their money, their time, or their social capital. In this case, giving us that time in lieu of paying us anything was the validation. If they keep coming back every week, we know we're onto something.

Fast forward to now, and we're really excited by the scale of the multi-thousand-person organizations that Graphite serves. Throughout that, the constant user feedback and the generosity of Graphite users in telling us what's great, and also what we're messing up, has really helped us iterate and build a great product in that time. I'll come back then to Reviewer in particular.

That one, similarly, came from an internal idea. Greg, my other co-founder and CTO, had been playing around with applying LLMs to review for a while; this was a year and a half ago at this point. Greg would do some of these experiments on the weekends and add things to the code base, and for a while some of the team honestly found it kind of annoying: what's this bot doing commenting on my PRs?

Merrill Lutsky (14:20.606)
We tried it, but the models at that point weren't quite good enough. It was really only with some of the big model advancements last spring and summer that this became more of a reality for us, that the capabilities were there to actually do AI-powered code review. For us, I think it started with looking at comps from bigger companies. Critique at Google, their internal code review tool, has actually had an AI-powered commenting and suggestions feature for many years. They were even using older-generation models before LLMs, and now obviously that's gotten a lot better. So there was good prior art of big companies implementing this internally.

We were also obviously chatting with our users a lot about this. We had some inbound requests to build the code review tool with AI, and we involved our larger customers early in the process and got a ton of great feedback from them in the early days. The last thing I'll mention on scoping: it's always painful, I think, is the short answer. You always want to add more. We had this great vision from the beginning: it's going to suggest changes, it'll fix the CI failures, it'll index the code base fully with RAG, you'll be able to write custom rules as English prompts. Ultimately you have to cut it down to: what is the minimum amount of value you can provide in a well-designed experience, and, critically, have a feedback mechanism so that you can improve and iterate.

So for us, having the upvote and downvote in Reviewer, a way for users to give feedback on every suggestion it makes, was critical, both in testing different models and prompts and validating the usefulness of Reviewer, and also in being able to constantly iterate and improve our prompting and the critics that we're using to generate the comments. That's how I'd say we thought about

Merrill Lutsky (16:44.547)
the scope of launching both Reviewer and Graphite, and also the scope of Reviewer's initial v0.
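
As an illustration of that kind of feedback loop, here is a minimal TypeScript sketch of tracking per-suggestion reactions and comparing prompt variants. The types and function names are hypothetical, not Graphite's actual code.

```typescript
// Hypothetical shape of per-suggestion feedback used to compare prompts and models.
type Reaction = "upvote" | "downvote";

interface SuggestionFeedback {
  suggestionId: string;
  promptVersion: string;   // which prompt/critic variant produced the comment
  model: string;           // which base model was used
  reaction: Reaction;
}

// Aggregate upvote rate per prompt version so prompt changes can be compared.
function upvoteRateByPrompt(feedback: SuggestionFeedback[]): Map<string, number> {
  const totals = new Map<string, { up: number; all: number }>();
  for (const f of feedback) {
    const t = totals.get(f.promptVersion) ?? { up: 0, all: 0 };
    t.all += 1;
    if (f.reaction === "upvote") t.up += 1;
    totals.set(f.promptVersion, t);
  }
  const rates = new Map<string, number>();
  for (const [version, t] of totals) {
    rates.set(version, t.up / t.all);
  }
  return rates;
}
```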

Prateek Joshi (16:52.302)
Amazing. There are so many good nuggets in here. Okay, so you decided what goes into the product and you launched it. In the early days, it's always hard. Can you talk about how you landed your first paying users? And what did you learn from them? Obviously they were in your Slack channel, you were talking to them. And also, maybe part C here is

Merrill Lutsky (17:17.158)
Yep.

Prateek Joshi (17:22.03)
what feedback did you end up incorporating in the product itself?

Merrill Lutsky (17:28.466)
Yeah, it was a long process for us. We were in closed beta for about a year and a half, actually. We did that initial 50-user test, so we had the initial alpha of about 50 to 100 users in the fall of 2021, and it was right before Thanksgiving 2021 that we launched our waitlist more broadly. We launched on Twitter and Hacker News, and we had thousands of developers sign up in the first 24 hours or so that we were on the front page of HN. We had a referral mechanism built in: if you got five or more folks from your team to sign up, we'd move you to the front of the waitlist. It was a nice mechanism for sorting out the teams that were most serious about this. Then we were in closed beta, working closely with them, for the next year and a half or so.

A lot of what we realized is, one, code review is a massively complex process; there are so many different parts of it. And we were building on top of GitHub, which cuts both ways. We have a great partnership with GitHub, and the interoperability and partial adoption pathway were fantastic for us, but it's also really hard to build a syncing engine against GitHub's PRs. We sync millions and millions of changes every day and have infrastructure challenges at the scale of a company a hundred times our size, just because of the nature of what we're doing.

The early learnings were, one, you kind of need to build the entire platform end to end. If you wanted to do stacked PRs, there were existing open source CLI tools out there for stacking branches locally, but you really need the full Facebook- and Google-style tool chain to make this workflow work. A lot of the difficult lesson we learned in closed beta was that

Merrill Lutsky (19:36.71)
you do need to build all of it. You need the local experience of creating and updating branches and pull requests. You need the review platform that shows you the relationships between all the stacked PRs, helps you prioritize what you need to take action on at any given moment, and lets you actually go through and review them. For larger teams, you need a merge queue that's efficiently making sure everything is tested, avoiding merge conflicts, and keeping your trunk branch green. You really do need that whole tool chain for this workflow to function at scale. That was a big lesson: just the scope of how much we needed to build to get this to a really incredible experience end to end.

The other big piece was figuring out what to charge for. We knew from the beginning that we didn't want to charge individual developers; I don't think that's ever a great business model in dev tools. So we knew we needed to create value for teams and organizations. We spent probably at least the first half of that closed beta period just completing the individual user experience of how I create, review, and merge pull requests on Graphite.

Then, once we had that foundation, we had to go and build the organization and team value on top of it. That was features like the merge queue, engineering insights, and automations, which are like an if-this-then-that for PRs that teams use for things like assigning reviewers, adding context, or defining an order in which folks have to review PRs.

Things like that are more beneficial to orgs and enterprises. But first you build that foundation of the individual use case, which is going to be free, and then on top of that you build the team and organization value that we ended up actually being able to charge for.
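
For a rough sense of what an "if this, then that" automation for PRs might look like, here is a minimal TypeScript sketch. The rule shape, triggers, and actions shown are hypothetical illustrations rather than Graphite's actual configuration format.

```typescript
// Hypothetical PR automation rule: when a condition matches, run actions.
interface PullRequestEvent {
  author: string;
  changedPaths: string[];
  labels: string[];
}

interface AutomationRule {
  name: string;
  when: (pr: PullRequestEvent) => boolean;                                // "if this"
  then: Array<{ action: "assignReviewer" | "addLabel"; value: string }>;  // "then that"
}

// Example: any PR touching payment code gets the payments team as reviewer plus a label.
const rules: AutomationRule[] = [
  {
    name: "route-payments-changes",
    when: (pr) => pr.changedPaths.some((p) => p.startsWith("src/payments/")),
    then: [
      { action: "assignReviewer", value: "payments-team" },
      { action: "addLabel", value: "needs-payments-review" },
    ],
  },
];

// A tiny evaluator that applies matching rules to an incoming PR event.
function evaluate(pr: PullRequestEvent): void {
  for (const rule of rules) {
    if (rule.when(pr)) {
      for (const step of rule.then) {
        console.log(`[${rule.name}] ${step.action}: ${step.value}`);
      }
    }
  }
}
```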

Prateek Joshi (21:42.678)
Now, that's great. And going from day one, meaning when you launched the product, to today, you're obviously bigger now, so you must have run a whole bunch of growth experiments. Some of them worked, some of them didn't. If you had to pick one that really tilted the curve for you, and another that you thought would work amazingly well but completely did not perform the way you expected, what are those?

One example for each, if you will.

Merrill Lutsky (22:14.822)
Yeah, the growth side is always a big question for us. One experiment that worked really well for us was the viral waitlist component in the beginning. That was amazing both in terms of vastly increasing the number of signups to the initial waitlist and as a filter. Any time you introduce a little bit of friction into the process, it can be a great sorting function for how serious a team is about using your product. Many of those first waitlist signups have actually now become some of our larger enterprise customers. I think Snowflake was on there, and I'm trying to think of some of the other examples.

There are a few companies like that, of that scale, that I'm not sure I can name, that signed up from the very beginning and have been Graphite users throughout. So that one I'd highly recommend: do something on that order. As for one that didn't work as well, a recent one I'll point to is actually the initial launch of Reviewer. We gave all existing Graphite customers a month free of using Reviewer in their repos; you just had to have someone in the org go and opt in and turn it on. Interestingly, what we noticed, and this was back in the late summer before we even launched Reviewer publicly, is that a lot of those initial users churned just because the quality wasn't there.

We were still iterating on our prompts. Then Claude released a big update to 3.5 Sonnet that dramatically improved the performance of Reviewer. But in the beginning, a lot of those comments were too noisy, and a lot of them weren't accurate. One of the big realizations we've had, and this applies throughout the product, is about developer trust.

Merrill Lutsky (24:33.948)
Developer trust is just so hard to build and so easy to lose. This is one of the biggest problems with AI products in general, and with many of the AI code review tools out there: it's so easy for something to just be written off as, this is noise, I'm going to ignore this on every pull request. And by nature, I think AI is pretty yappy without a lot of prompting and editing, and developers don't like that.

Prateek Joshi (24:57.635)
Yeah.

Merrill Lutsky (25:02.578)
If you have something adding 12 nonsense comments, or even things that are just frustrating to a developer... for a while we actually had to add a critic to our model to get Claude to stop suggesting adding a comment on top of every single function. There are things like that it will do just based on all the data it's trained on, things the data might suggest are good to do, but that a lot of developers are going to be annoyed by. So it took us a while to really refine. When we launched the product, and in the iterations we've made since then, we've really focused on high signal and low noise, on building something that's trusted, that you think of as another senior developer on your team rather than a yappy spam bot adding 12 comments that you just automatically minimize whenever you open the PR.

The big lesson there is that the growth side, giving a free month, promoting something, doesn't work if the substance isn't there. Fundamentally, having a product that's going to get the kind of user love you need is the foundation for any sort of growth experiment or growth marketing.

Prateek Joshi (26:27.096)
Now, coming to where Graphite is today, and you can pick a metric of your choice, how do you describe the size of Graphite today? It could be number of customers, teammates, number of people you've served. So where is Graphite today?

Merrill Lutsky (26:44.742)
Yeah, so Graphite today: we have tens of thousands of paid users active on the platform, creating, reviewing, and merging PRs, across hundreds of customers. And it's fun to see the variety of customers as well. We have everything from small teams that are just going through YC to research labs. One of our favorite examples is a team, Proxima Fusion, that's building next-gen fusion reactors, and all the modeling software they're developing, they're developing with Graphite. One of the awesome things about dev tools is that you get to see all these teams building incredible technologies beyond anything you'd ever think of doing with your tool.

What's so exciting for us as a team is that we get to help improve their workflows. If we can give them back even 5, 10, or 20 percent of the time they would have spent bogged down in code review, think of what they can create with that time and how much good they can do. So it's teams like that all the way up to companies with 5,000-plus engineers running their processes on Graphite; that's the scale of enterprises that we support.

And that's perhaps the cool part about this. Graphite, the stacked PRs workflow, and a lot of the tooling we build is great from day one of doing code review as a company, but it really has the most benefit for massive enterprises: multi-thousand-person teams and monorepos where you have thousands of engineers trying to merge changes concurrently and you need to manage the merge conflicts and the problems that come with that. That's the reason these technologies and workflows were developed at places like Google and Meta, and that's where the value of Graphite compounds the most.

Prateek Joshi (28:56.088)
I mean, tens of thousands of users, that's a lot of users. And to serve them reliably... obviously on day one, when you have four users, you can just stitch a bunch of stuff together, but when you're serving this many users, there has to be a lot of reliability. So can you walk us through the tech stack behind Graphite as it stands today to serve all these users?

Merrill Lutsky (29:12.039)
Yes.

Merrill Lutsky (29:19.1)
Yeah. I'll say reliability at our scale is definitely something we take very seriously, and our users feel it acutely when Graphite isn't available. One funny thing we've noticed is that whenever there's a GitHub outage, because we're built on top of GitHub, our users usually know before GitHub even puts up a status page or anything. They'll just say, Graphite's not working. Sometimes it's an us problem and sometimes it's a GitHub problem, but hearing from users within a few seconds of any sort of outage is a testament to the responsibility we have building something that's so core to engineering workflows.

We say that, basically, next to the IDE, and this is true looking at the data, Graphite is the tool that engineers spend the most of their time in every single day. That's super exciting and rewarding, building something that has that position in the developer workflow, but it's also a huge responsibility, and we need to make sure we're providing a really high-availability, high-quality experience.

As for our tech stack: our entire team basically uses TypeScript across the product, even down to the CLI, where we use yargs. That way we maintain consistency, and every engineer here can work on every single part of the product. As part of onboarding, we actually have every engineer write a pull request in each different part of the product to make sure they're familiar with it.

Everything's hosted on AWS; we use ECS and Docker. Our backend is mostly Node, Postgres, and TypeORM. On the front end, for the Graphite web app, we use a lot of React and MobX. Reviewer itself is mostly powered by Claude 3.5 Sonnet. We have a great partnership with Anthropic; actually, my old co-founder from my

Merrill Lutsky (31:34.332)
past startup from a long time ago is now a key engineer at Anthropic. It's been fun to get to work with Eric again and to ask him for features, and they've been really great partners to us. There's a great case study they published on Graphite Reviewer pretty recently. So we use 3.5 Sonnet for most of the reasoning, we do embeddings with OpenAI, and we store everything in pgvector. That's the majority of the stack.

Again, a lot of the difficult part about Graphite, and where a lot of our team's effort goes, is just maintaining that syncing engine with GitHub, where we're ingesting millions and millions of updates to PRs across all the organizations we serve. We have to do that in near real time so that everything we show on the Graphite interface matches everything you're seeing on GitHub, even something as simple as an emoji reaction to a comment.

Prateek Joshi (32:32.046)
Right.

Merrill Lutsky (32:32.582)
It took us so long to get that down to less than one to two seconds of latency between doing something on one side and having it show up on the other. There's so much in terms of optimistic UI and all these little things you have to think about to make sure you're providing a great product experience and keeping this totally interoperable between the platforms.
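
As a simplified illustration of that kind of syncing engine, here is a minimal TypeScript sketch of a GitHub webhook receiver that upserts pull request updates in near real time. This is a conceptual sketch, not Graphite's actual architecture; the endpoint, datastore call, and queueing are placeholders, and a production handler would also verify GitHub's webhook signature.

```typescript
// Minimal sketch of a GitHub webhook receiver that keeps a local copy of PRs
// in sync in near real time. Route and storage names are illustrative only.
import express from "express";

const app = express();
app.use(express.json());

// Placeholder for whatever datastore the sync engine writes to.
async function upsertPullRequest(repo: string, prNumber: number, fields: unknown): Promise<void> {
  console.log(`syncing ${repo}#${prNumber}`, fields);
}

app.post("/webhooks/github", async (req, res) => {
  // Note: production handlers verify the X-Hub-Signature-256 header; omitted for brevity.
  const event = req.header("X-GitHub-Event"); // e.g. "pull_request", "issue_comment"
  const body = req.body;

  if (event === "pull_request" && body.pull_request) {
    // Upsert the PR so the UI reflects GitHub within seconds of the change.
    await upsertPullRequest(body.repository.full_name, body.pull_request.number, {
      title: body.pull_request.title,
      state: body.pull_request.state,
      action: body.action, // "opened", "synchronize", "closed", ...
    });
  }

  // Acknowledge quickly; heavier work would normally be queued, not done inline.
  res.sendStatus(202);
});

app.listen(3000);
```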

Prateek Joshi (32:56.398)
That's amazing. All right, I have one final question before we go to the rapid fire round. Where do you see the future of AI-assisted coding heading in the next two years? And what role will Graphite play in that?

Merrill Lutsky (33:12.454)
Yeah, I think this is perhaps the most exciting part of where we are in the development lifecycle. If you look at how rapidly agents are improving, and how quickly the base models' context windows, output windows, and reasoning capabilities are growing, I think it's a pretty commonly held belief that all these agentic development tools are just going to get better and better, and that more and more of the world's code is going to be written primarily by agents rather than human developers. If you think about what happens to the developer workflow as we know it as these agents get better and better, it probably doesn't look like someone sitting in an IDE, character by character, line by line, anymore.

It probably looks more like how I interact with one of the engineers on the team: we have a high-level task they're going to go implement, they do that, they send me a pull request, and I go through and give feedback on it. I might open it up in an ephemeral environment and play around with it. I give feedback in the form of comments on the PR, we go back and forth, they iterate on it, and eventually I approve it and it ships.

Which is code review, basically. So that's really the exciting part. One of our core beliefs here is that the inner loop will, in some way, be absorbed by the outer loop. As agents take on more of the world's development, the way we interact with them will look more like how we interact with human engineering teammates: it'll be more conversational, and it'll happen on the pull request, on the artifacts they're producing. In that world, and we talked about how Graphite is sort of second to the IDE at the moment in terms of the time developers spend in it every day, there's a world in which code review, and Graphite in particular, becomes the place engineers spend the most time every single day, the place where they both kick off and then interact with agents, and

Merrill Lutsky (35:31.962)
a place where humans and agents come together to create amazing software.

Prateek Joshi (35:37.614)
Alright, with that we are at the rapid fire round. I'll ask a series of questions and I'd love to hear your answers in 15 seconds or less. You ready? Alright, question number one. What's your favorite book?

Merrill Lutsky (35:37.874)
Yeah.

Merrill Lutsky (35:46.513)
Let's do it.

Merrill Lutsky (35:51.218)
I love Never Split the Difference. It's made a huge difference for me personally in how I approach negotiations and hard conversations.

Prateek Joshi (36:00.374)
Yeah, no, it's a great book. I highly recommend it. Right, next question. What has been an important but overlooked technology trend in the last 12 months?

Merrill Lutsky (36:10.406)
In AI in particular, there's this great blog post called The Bitter Lesson, which talks about how people keep trying to extend each iteration of the models and figure out the prompting tricks and everything, but none of it really matters. Every six months, when the models jump, it completely changes everything, and everyone has to reset and figure out how to work with it. So over time, I think it's just the base models; the base models catch up and overrun everything.

Prateek Joshi (36:38.058)
What company do you admire the most and why?

Merrill Lutsky (36:42.458)
I really admire Shopify. I had the pleasure of speaking with Toby a few months ago, and I think they do an amazing job of keeping such a high percentage of core engineering at scale, and of avoiding the kind of bloat, in both the product and the organization, that happens as you grow a massive company to the scale they have.

Prateek Joshi (37:04.526)
What's the one thing about AI assisted coding that most people don't get?

Merrill Lutsky (37:10.962)
The inner loop isn't really the blocker. So much of what holds up changes at enterprise scale is the outer loop of code review, testing, and deployment. Much of the benefit of these inner loop tools has yet to be realized because of how much of a bottleneck processes like code review are.

Prateek Joshi (37:33.806)
That's a great point, actually. What separates great products from the merely good ones?

Merrill Lutsky (37:41.958)
I think, especially in AI, it's thoughtfulness of experience. Does it do what it implies it can do? Does it get out of my way? Does the mode of interaction make sense? A chat interface isn't the answer for every task or every job to be done. And lastly, how easily does the magic break? Does it set expectations it can't deliver on? Because as soon as you hit that wall, all of a sudden the magic is gone.

Prateek Joshi (38:15.852)
What have you changed your mind on recently?

Merrill Lutsky (38:19.378)
I think a good one was sleep tracking. I was previously a big skeptic of it. I got the Oura Ring for Christmas and it's made a huge difference, not necessarily in helping me get more sleep, but in setting expectations for how well rested I am and what I should expect of myself on any given day.

Prateek Joshi (38:30.272)
Yeah.

Prateek Joshi (38:45.038)
Now that one's been a huge thing for me as well. I can't stop talking about it. Anybody I meet, I tell them how much of a difference just tracking and understanding your sleep has made. Suddenly, for no other reason, my energy is better, you're more energetic, you're more present. If you just get your sleep right, a lot of good things can happen. So yes, I'm definitely on board with that one. All right, next question. What's your wildest

Merrill Lutsky (39:04.689)
Yeah.

Prateek Joshi (39:14.638)
prediction for the next 12 months?

Merrill Lutsky (39:18.194)
Maybe it's not quite the next 12 months, but I think it's that local development goes away. The IDE, everything that's happening locally on your computer, ends up being far less relevant in the development process. It's sort of the logical extension of the Andrej Karpathy vibe coding that you tweeted about the other day. I think pretty much everything becomes vibe coding, and the interaction becomes more code review than authoring of code.

Prateek Joshi (39:49.92)
Right, right. All right, final question. What's your number one advice to founders starting out today?

Merrill Lutsky (39:58.321)
The MVP as previously thought about is dead. Invest in product experience and invest in design, even in your earliest product. The bar is so high now, even for enterprise applications, and at the same time the barrier to entry is lower than ever with all these great AI tools. So go the extra mile, get the details of your early product right, and build something that isn't going to suffer just because you didn't put in the effort to create an amazing product experience. Let the core value you're delivering shine through.

Prateek Joshi (40:33.806)
Amazing. Merrill, this has been a brilliant discussion, and we loved your insights on building and shipping a functioning product that's used by tens of thousands of paying users. So thank you so much for coming on the show.

Merrill Lutsky (40:48.006)
Fantastic. Thanks for having me, Prateek. Really enjoyed the conversation.