Infinite Machine Learning: Artificial Intelligence | Startups | Technology

Generative AI for Coding

April 02, 2024 Prateek Joshi

Varun Mohan is the co-founder and CEO of Codeium, an AI code generation tool used by hundreds of thousands of developers. They recently announced their $65M Series B led by Kleiner Perkins with participation from Greenoaks and General Catalyst. He was previously at Nuro. He has bachelor's and master's degrees from MIT.

Varun's favorite book: The Idea Factory (Author: Jon Gertner)

(00:00) Introduction and State of Play
(03:03) What Generative AI Can Do Well
(06:10) Introduction to Codeium
(08:53) Handling Different Programming Languages
(11:26) Model Architectures and Optimization
(13:27) Interpreting and Trusting AI Decisions
(18:33) Security and Privacy Considerations
(20:07) Impact on Software Quality and Developers
(21:50) Potential Obsolescence of Programming Languages
(23:39) Handling Edge Cases
(26:07) The Biggest Impact of Generative AI for Coding
(28:27) Technological Breakthroughs in Generative AI
(29:30) Rapid Fire Round

--------
Where to find Prateek Joshi:

Newsletter: https://prateekjoshi.substack.com 
Website: https://prateekj.com 
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19 
Twitter: https://twitter.com/prateekvjoshi 

Prateek Joshi (00:01.227)
Varun, thank you so much for joining me today.

Varun Mohan (00:04.47)
Hey Prateek, how's it going?

Prateek Joshi (00:06.839)
Let's get right into it. Generative AI for coding, right? It's hard, it's important, everyone's talking about it. What's the state of play in terms of what it can do well today?

Varun Mohan (00:17.57)
Yeah.

Varun Mohan (00:21.106)
Yeah. So I think, you know, over the last year or two, we've had a lot of truly big breakthroughs in generative AI, and I'm happy to talk a little bit about what we've been up to. But also, at the same time, there's been an incredible amount of demoware

that has come out in the space. I would say over the last year and a half, we've seen a lot of demos, and I think this has both helped and hurt us as a company. The demos make it seem like the space can do basically anything, which is exciting, because we have a product, and that means people naturally will be more excited about our product. But at the same time, it adds a lot of misinformation about what's actually feasible, and just basic things about the product space. Like right now, the products that are providing the most value are products like autocomplete, which is...

basically supercharging developers as they type code, being able to use context from everywhere in the code base. And then chat that is capable of munging data throughout the code base and providing actionable insights about how you should actually change your code, right? There are many other things that are sort of coming out, but these are the two proven applications of these systems right now.

But last year we ended up getting a whole host of these agents, like, hey, you know, AutoGPT is something that will take your entire code base and generate a bunch of stuff, look on the internet and do a bunch of stuff like that. And realistically, nothing really came of that, despite the fact that the technology was GPT-4 and all these announcements happened. I think it will happen at some point, but in that case, this idea of autonomously being able to write an entire PR end to end

is something these models are not yet capable of in enough domains today, right? These models still hallucinate very, very easily. They still make basic bugs. And the approach that companies have taken to get around this is to just accept the bugs: if you can execute the code, keep executing the code until the bugs disappear, right? So have it generate both the code and the tests, and see if that's actually possible.

Varun Mohan (02:16.778)
And in some domains that works, but in really complicated code bases, that just doesn't work. So, long story short, a lot has been possible with autocomplete: tools like Codeium generate over 50% of all code that people write. But at the same time, for some of the stuff like automatically generating PRs, we are a little bit away from doing that at enough scale, I would say.

Prateek Joshi (02:33.999)
Right. That's an excellent point. And you used the word demoware, which means it's easy to build a nice, flashy, cool demo, but when it comes to doing real work, like in big enterprise settings and production settings, it just falls short. So when you look at the work of building software, all of it, what can generative AI do well and reliably, meaning you would say, okay, for these tasks,

we can definitely trust generative AI? And on the flip side, what are the things where these demos have shown, oh my God, it works, but in practice they fall way short? So what are the two buckets here?

Varun Mohan (03:16.932)
Yeah. So, okay, here's the thing. I love this, it's a fantastic question. When we think about AI products, I think about them in three categories, okay? Or, there are three dimensions that they need to hit. The first thing is quality. If the quality is not high enough, these products just are not gonna be used. And I'll give you an example here: even for autocomplete, it was only with the advent of LLMs

that these products had high enough acceptance rates that users were even willing to use them, right? Because if the acceptance rate is too low, if it's like 10%, even if it does give you useful suggestions, it's too noisy, and it erodes user trust, right? The second key piece is latency.

Okay, if the quality is not great but the latency is very low, then at least it's interactive. You can keep regenerating it over and over. And autocomplete hits a sweet spot where the quality is not perfect, you're not accepting it all the time, but the latency is so snappy that the user is willing to basically deal with it, right? And then the third key piece that I like naming is correctability. If what this thing does is generate a lot,

then even if the latency is low and the quality is high, if it's wrong even 10, 20% of the time and the user needs to go in, understand what the AI generated, and do all of these other things, the user will expect pristine quality. They will need to assume that it's completely autonomous. And you know, there's an analog here, because before this I worked in self-driving, where L5 autonomy, perfect autonomy, is very hard, because no edge cases are acceptable

in that fully autonomous case. So coming back to what you said, this completely explains why the products that we find the most useful in the code space are those products where people accept the fact that these systems hallucinate from time to time. And companies like Codeium have been able to mitigate this with context awareness, fine-tuning the model on your private data, and stuff like that. But these systems still hallucinate, right? And the user still needs to review it, ultimately. And I think the reality on the flip side, for the products that don't work:

Varun Mohan (05:15.23)
actually, GitHub Copilot came out with something a year ago for PR summaries. And what we realized is that the PR summary is useful. It is useful to have a PR summary; it's the final step before a PR gets accepted into a company. But the PR summary, on the other hand, would summarize the code incorrectly. Sometimes it would miss a file. Sometimes it would incorrectly summarize a file with respect to the entire PR, which defeats the whole point of it. The whole point is to aid the reviewer. So,

the technology was not good enough yet to generate a useful enough PR summary. And at the same time, on the other hand, if I can name one thing they added in there, it was a haiku: something that just summarizes the PR with a poem. That was very nice to read, but it's not useful to a developer. So there are these two Venn diagram buckets, useful to a developer and technically feasible, and the sweet spot of these products is somewhere in between. They need to hit this quality, latency, and correctability trade-off for the product to actually have real adoption in the enterprise.

Prateek Joshi (06:10.251)
That's a fantastic way of looking at this sector and understanding what's doable. Meaning, sometimes you can build cool stuff, but to actually build a real product, there are a lot of practical constraints. As you said, yeah, you'll generate a bunch of amazing code, but is that actually usable? Can the developer correct it if needed? Can they accept these suggestions? So I think that's fantastic. Now, maybe it's a good point to stop and talk about

you, the founder and CEO. For the listeners who don't know, can you quickly explain what Codeium does?

Varun Mohan (06:46.09)
Yeah, so maybe I can start with a little bit of backstory on how we got here, because we didn't even start by building Codeium. We started with a company that built GPU virtualization technology, so technology to run GPU workloads at scale, and GPU compiler technology, so technology to run these deep learning models even faster. We were managing upwards of 10,000 GPUs, and what happened was, in the middle of 2022, transformers ended up becoming really popular.

And we realized that this would usher in a brand new set of applications. One of the big applications that we were really excited about was code. We were early adopters of products like Copilot, but we thought that this was just the beginning of what the future would look like. So what we decided to do was take our technology, vertically integrate it, and build out this product called Codeium. But we ended up doing it with a little bit of a twist: we said we'd give the product away entirely for free, which meant we needed to optimize the infrastructure so much that we needed to train our own models and run them at massive scale.

And just some stats: at the beginning of 2023, we had less than 1,000 users. At the end of 2023, we had over 300,000 users. We process between 50 and 100 billion tokens of code every day, which is between 5 and 10 billion lines of code every day. Codeium is one of the top five largest generative AI apps in terms of text in the world. And on top of that, when developers use Codeium, over 45% of all software that they write is generated with Codeium.

And Codeium offers a suite of functionality. It offers ultra-low-latency autocomplete that generates the most software of any tool in the market. It offers chat that is completely context-aware. And it also lets you make larger-scale changes in your code base with operations like Command, all of it agnostic of whatever IDE your company ends up using.

Prateek Joshi (08:24.991)
Right. I actually saw when Codeium launched on Hacker News. You posted it, and it's a tough crowd, and you were answering questions and standing up to the crowd. And I mean, a new product launch is always tough; they'll compare you to a trillion-dollar company. But I think it was a great launch point, and I'm glad you've come such a long way since. Now, let's dive a level deeper here. There are so many different

programming languages and paradigms. How do you handle that during training? And do you need specialized training for each language, each paradigm, each different section of this huge programming universe?

Varun Mohan (09:09.674)
Yeah, so I think those are two good points you brought up. The first one: actually, one of the beautiful things about these LLMs is that they're generic learners. I'll give you an example of what that means. Rust and C++ are fairly similar languages in some ways; Rust might have better memory semantics than C++ does. But what you actually notice is that the quality and capability of these models at writing Rust improves as you train them on more C++, because they're actually able to learn from the semantics of C++ to write better Rust.

Just some other detail there. So no, a lot of work doesn't need to be put into the most common languages; we don't need to do anything very bespoke there. But we do take time to make our system work for some of the more bespoke languages, given that we work with enterprises: languages like Verilog and RTL, or, if you think about it, languages like COBOL. These are languages that we care about for enterprises.

If I can say one thing that we've had to do outside of the modeling side, it's making sure that it works across all platforms, right? If you write Windows applications, you will use Visual Studio. If you write Java applications, a lot of companies use JetBrains or Eclipse. And actually, we're the only product on the market that has an Eclipse extension right now.

And that's fundamentally because we realized that developers are tied to the platforms that they write in, and our goal here is to supercharge development regardless of where people are. That was a core tenet of the company. And we came out with the JetBrains product very soon after coming out with the VS Code product.

Prateek Joshi (10:36.239)
That's actually a really good point, meeting developers where they are, where they live. Because as a developer, once you choose a tool, you'll stick with it for a long time, and you're not gonna switch to a brand new programming tool just because another tool is available. So that's very interesting. Now, let's talk about the model architectures that are more conducive to building a product like this. You mentioned earlier that

Varun Mohan (11:02.104)
Yep.

Prateek Joshi (11:03.495)
LLMs are generic enough that you don't need to do a lot of custom work. Now, can you explain the underlying AI models that are powering this? And also, what parts can be optimized so that when you compare it to generating like normal text versus generating code, what can be sped up, optimized, or made better in general?

Varun Mohan (11:26.878)
Yeah, so I mean, the high-level answer for what people are running: they're running autoregressive transformers. So what does that mean? It basically means, you know, you pass in a prompt and this prompt gets processed, and then afterwards it generates one token every step, running through the entire model each time. This is actually fairly inefficient, in the sense that you need to read all the weights of the model to generate one token. You can process the prompt all in one shot, but you need to read all the weights every time

you generate a token. This means these models become much more efficient in terms of compute utilization if you batch, because you can process many, many prompts at the same time, and when you generate one token, you can actually do it across many prompts and get better compute utilization. These are the kinds of tricks that we've done:

how do we actually make sure that we can batch a bunch of these prompts and a bunch of these generations all at the same time, while not blowing up the amount of memory usage on these GPUs? Because these GPUs don't have a lot of memory on them. So I'm getting into the weeds here.
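The batching arithmetic Varun describes can be sketched with a toy cost model. The numbers and the one-weight-read-per-decode-step simplification are illustrative only, not Codeium's actual figures:

```python
import math

def weight_bytes_read(num_prompts: int, tokens_per_prompt: int,
                      batch_size: int, model_bytes: float) -> float:
    """Total bytes of model weights read during generation.

    Simplification: every decode step reads the full model weights once,
    and a batched step advances every sequence in the batch by one token.
    """
    num_batches = math.ceil(num_prompts / batch_size)
    return num_batches * tokens_per_prompt * model_bytes

# 64 prompts, 100 generated tokens each, on a hypothetical 16 GB model.
serial = weight_bytes_read(64, 100, batch_size=1, model_bytes=16e9)
batched = weight_bytes_read(64, 100, batch_size=64, model_bytes=16e9)
print(serial / batched)  # 64.0: batching amortizes each weight read over 64 prompts
```

The memory trade-off he mentions is the catch: each extra sequence in the batch carries its own activation and KV-cache state, so batch size is capped by GPU memory.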

But these are the kinds of architectural decisions you can make on the transformer to make it more conducive to the application. One other thing that really matters is the objective by which you train these models. And I'll tell you, that's a big change we had to make: when you're writing code, it's not like chat. There's code underneath you, there's code above you, there's code in other files. And we've made it so that the model is good at this objective called fill-in-the-middle, which is filling in code. Most of the time when people write code, they're refactoring code,

right? They're not writing net-new code. They're taking an existing application and changing it. So for those kinds of applications, we needed to change the pre-training objective, the objective by which we train the model, to actually make it good at that.
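As a sketch of what a fill-in-the-middle training example looks like: the sentinel token names below follow the common published FIM formulation and are illustrative, not necessarily Codeium's actual vocabulary.

```python
import random

# Illustrative sentinel tokens marking the three document segments.
PRE, SUF, MID = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def make_fim_example(code: str, rng: random.Random) -> str:
    """Cut a document into prefix/middle/suffix and rearrange it so the
    model is trained to emit the middle given the code above AND below."""
    i, j = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    # At inference time the editor supplies prefix + suffix (the code
    # around the cursor) and the model completes everything after MID.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

example = make_fim_example("def add(a, b):\n    return a + b\n", random.Random(0))
print(example)
```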

Varun Mohan (13:05.75)
Similarly, we even needed to make the model better at applying context from other files. It isn't the case that you just pass documents into a data set and the model will automatically be good at applying context from other documents; that's something we've needed to change our models to be capable of. But these are tweaks that we've had to make on the training side to make the product and the application better.

Prateek Joshi (13:27.567)
I think you bring up a great point. When you look at different coding tasks, sometimes it's just refactoring, sometimes you're adding net-new code, sometimes you're just extending the functionality by a little bit, and the dependencies are all over the place. So what's the path for pre-training? How often do you have to do it? Or rather, do you have to keep all these tasks in mind when you do pre-training? Or

do you just say, here's a whole bunch of code with a whole bunch of tasks, just take care of it? How much do you go into each coding task, if you will?

Varun Mohan (14:08.377)
Yep.

So I think this is one of those things where having users is important, because we can see what the most common things are that people actually care about. And like everything in the world, there's this 80-20 distribution, right? Twenty percent of the tasks account for 80% of the usage. So from that, we're able to very quickly learn, here are the objectives that we care about the most. And yeah, we do want newer code all the time; we want to be up to date. But some of this can be solved with retrieval, right? If you're within a bespoke code base, you

probably want to use retrieval rather than train the model every hour, right? So there are different trade-offs that we try to think about, but yeah.
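A minimal sketch of the retrieval alternative he mentions. Real systems use learned embeddings over the code base; plain token overlap here just keeps the example dependency-free, and the snippets are made up:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding: a bag of lowercase word tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, snippets: list[str], k: int = 2) -> list[str]:
    """Return the k snippets most similar to the query; these get stuffed
    into the model's prompt as context instead of being trained in."""
    q = embed(query)
    return sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]

snippets = [
    "def connect_db(url): open a pooled database connection",
    "def parse_invoice(xml): extract line items from an invoice",
]
print(retrieve("how do I connect to the db", snippets, k=1))
```

Because the index is rebuilt or updated as files change, the model sees up-to-date code without any retraining, which is the trade-off against training every hour.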

Prateek Joshi (14:50.315)
Yeah. Now, in terms of interpreting the model: if it's just autocomplete, meaning you're giving me a line of code that I'll use or not, I can just accept it or reject it, and I don't really need to understand what model you're using. But as the amount of code generated increases, meaning you're generating a block of code or maybe an entire file (you don't do it, but in general this field is doing it), how do developers

understand or trust the decisions made by an AI tool? Are there tools to debug AI logic? Or, as you said earlier, is that not even the right path here? You should meet developers where they are, where they don't need to understand all the complicated architectures of the underlying AI models.

Varun Mohan (15:38.494)
Yeah, I think it's probably beyond the scope of most developers to understand all the details. One thing that should be clear, though, is that the stuff should be reviewed if it's critical; you probably don't just automatically stamp it and say it's fine. And one of the things that we commonly say is: don't get rid of your CI/CD pipelines and your security pipelines. Actually validate that the security of these systems is high, and continue to use the same deterministic systems, because these models will make the same kinds of mistakes humans do. So,

yeah, you probably don't need to understand the underlying details of how this works, but you shouldn't just take it for granted that these systems are perfect.

Prateek Joshi (16:15.619)
All right. And as you go to bigger and bigger companies, they might say, hey, there's a starting point, there's a big model that does the work, but I want it to be specific to my context, meaning: we've been writing our code for a couple of decades now. So how do you take an AI assistant and make it specific to the context of a large customer? And also,

does that even matter? Can the big AI model just say, hey, it doesn't matter, there's so much code that we used to train this that your one company's small code base doesn't really matter? So how do you address this part of it?

Varun Mohan (16:54.558)
Yeah, so I guess there are a couple of things that we actually do. Dell has public case studies using the product, right?

So very large companies currently use Codeium. I think we address it in two ways. First of all, we enable the largest companies to self-host the product if they don't want their code leaving the company. And this is possible just because we actually train our own models and have a stack that we run ourselves, right? Our company isn't wrapping a bunch of other technologies (which is not bad, but being full-stack is what enables us to do that). And then the second thing is we offer a level of personalization for the company that enables them to get the best experience possible for their code base. And that could be through the form of

retrieval; it could also be because, on the same piece of hardware, we enable companies to continuously fine-tune the model too, by the way.

Right? So for us, we built infrastructure that is actually able to do this and tune the model to the private data. And this is mostly useful if they write bespoke languages, right? Let's say they use a language, or do something, that is completely unknown from what's happening in open source. If you write React like every other company, don't expect fine-tuning to provide a lot of value. But in those bespoke cases, we're able to get that additional level of personalization to make sure that they're not just getting a cookie-cutter product, but a product that is tuned to them.

Prateek Joshi (18:08.207)
Right. And if you look at the security and privacy aspects of it: as you said, no matter what code is generated, you should still have your CI/CD pipelines, and you do all your tests to make sure it meets your standards. But before that, when you release an AI coding assistant product, how do you assess the security and privacy standards of the code that's being generated?

Varun Mohan (18:33.366)
Yeah, so I guess there's security and privacy in two ways, right? One is: is it leaving the company, how is it getting stored, what is the compliance? So we have a cloud product with SOC 2 compliance, zero data retention, and stuff like that; that's one. Or we have a self-hosted product where you don't send anything anywhere. The code is always resident where you are, which is what some companies' postures are in terms of code, and which is totally reasonable given that they think of code as some of their most important IP inside the company. When it comes to security, meaning the actual security of the generated code, it

comes back to what we discussed previously, which is that ultimately these systems are not gonna be perfect. The kinds of things that we try to do: in our pre-training dataset, we try to remove code that has known vulnerabilities, right? So if it has Log4j issues, or other very, very clear security issues, we don't put it into the data.

But ultimately, there are probably real security bugs even in the Linux kernel, right? Which is maybe one of the most reviewed pieces of code there is. People find security issues there all the time. So if that's possible, these systems will have security issues too.
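The pre-training filter Varun describes might look something like this in spirit. The pattern list here is a toy stand-in; a real pipeline would match against vulnerability databases and advisories rather than hard-coded strings:

```python
# Hypothetical deny-list; real pipelines would use CVE/advisory feeds.
VULNERABLE_PATTERNS = [
    "log4j-core-2.14",        # Log4Shell-era dependency (illustrative)
    "pickle.loads(request",   # deserializing untrusted input (illustrative)
]

def keep_for_training(source_file: str) -> bool:
    """Drop any file that matches a known-vulnerable pattern."""
    lowered = source_file.lower()
    return not any(pattern in lowered for pattern in VULNERABLE_PATTERNS)

corpus = [
    'implementation("org.apache.logging.log4j:log4j-core-2.14.1")',
    "def hash_password(p): return sha256(p.encode()).hexdigest()",
]
print([keep_for_training(f) for f in corpus])  # [False, True]
```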

Prateek Joshi (19:37.463)
Right. And also, as we see this subsector maturing, more people will write code and code generation will become easier. Do you think there's a risk of an increase in subpar software? Just because as it gets easier, more people will do it, and because of that the average might go down, because by definition that's what happens here. So how do you think that

plays out?

Varun Mohan (20:08.702)
Yeah, my suspicion is that, on this dimension,

more people will be able to write more software. But here's what I think is fundamentally gonna happen: the capabilities to review software will also get way better. And we might have some answers to this very soon, where you can have AI-assisted code review in a way where, once again, you're validating the code, but you're able to validate it much more easily. And one of the things that's actually kind of interesting is, if you look at some of the senior engineers, these quote-unquote 10x engineers, by the time the company has matured a lot and they

have been at the company for a while, they're spending most of their time answering questions and reviewing code. And I suspect that what these tools are actually going to do is accelerate the capability for new hires and more junior people to

onboard onto a code base and get their questions answered way more quickly, which is going to free up more time for the experienced developers that fundamentally want to write differentiated software. They want to write applications that make the business better. So I actually think it will accelerate both of these aspects. I don't think it's just going to be that we get a lot of garbage and bloated code; I actually think the way around that is to make it much easier to review code.

Prateek Joshi (21:18.531)
Right, that's a great point, actually. I think AI-assisted code generation is leading this movement, and then, yeah, AI-assisted code review will also kind of catch up. That's a great point. Now, do you believe that generative AI for coding will render certain programming languages or practices obsolete?

If so, what are the corners that are at risk of just becoming extinct?

Varun Mohan (21:50.206)
Yeah, so it's a good question. One reason I think that could be possible is that these kinds of tools make the switching costs from one language to another a lot lower, right? Because the ability to recreate the app is much easier. All that being said, stuff like migrating from COBOL to Java is probably just very hard. And actually, there's a book on this, on just COBOL. The reason why this is so hard is that

some of these business applications that have been written in COBOL are so complex, and have so much logic, that just migrating from COBOL to Java is so error-prone. I think the IRS tried to move their COBOL code base to Java, spent over a billion dollars, and just couldn't do it in the end, right? So maybe these kinds of tasks...

will already be kind of hard, and they will get easier. But one thing I don't believe is that there are gonna be fewer developers and this is gonna take jobs away from developers. I firmly believe that, just because of what happened when we moved from people writing assembly to people writing Python: there were more developers, because more people can write Python, and more people that have ideas are actually able to execute on them. Right?

And people that already know and understand computers deeply are more leveraged. And I think we are clearly in a world that is software-starved. You go to a big company that spends 10 or 20% of its revenue on R&D and software, and you ask them: look, would you rather fire a bunch of your developers, or do more with your existing developers? All of the CIOs will say the latter. They will not say the former. So.

Prateek Joshi (23:16.259)
Right, right. Earlier you mentioned edge cases, or less common scenarios. How does an AI coding assistant handle these edge cases? Obviously it's not life or death like in an autonomous car, but it is pretty important when you're shipping production software. So how do you handle edge cases here?

Varun Mohan (23:39.538)
Yeah. And what do you mean by edge cases? Like just bugs in software?

Prateek Joshi (23:44.659)
Edge cases in the sense that you encounter a customer who's doing something that is just not in the defined area of what a coding assistant will do. So will it just say, hey, I just don't know how to autocomplete this, you figure it out? Or how do you handle things that you haven't seen before?

Varun Mohan (24:02.779)
Oh, I see. I got what you're saying. So two things.

So these models hallucinate for a couple of reasons. They hallucinate partially because, if you look at the internet and the web, the number of times people say "I know" for things that they don't know is way higher than the number of times people say "I have no idea" given a question, right? People are way more confident when they write things than when they're speaking in person, and all this other stuff. So these models, generically speaking, if they don't know the answer, they'll make something up. Now, there are ways in which you can tune these products. We've done this for autocomplete, where if the model thinks that it's a low-probability sequence, you just don't show

it, right? Or you show it and tell people that it is. And you could do other things, where you have a preference or reward model that rewards the model, and this is for stuff like chat, for saying "I don't know" when it clearly doesn't know.
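The autocomplete suppression trick can be sketched like this; the threshold and log-probabilities are illustrative, and in a real system they would come from the model's decoder:

```python
import math

def should_show(token_logprobs: list[float], threshold: float = 0.5) -> bool:
    """Show a completion only if its length-normalized probability (the
    geometric mean of the per-token probabilities) clears a threshold."""
    if not token_logprobs:
        return False
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob) >= threshold

# A confident suggestion vs. a low-probability one the model "made up".
confident = [math.log(p) for p in (0.9, 0.8, 0.95)]
shaky = [math.log(p) for p in (0.9, 0.1, 0.2)]
print(should_show(confident), should_show(shaky))  # True False
```

Normalizing by length keeps long completions from being penalized just for having more tokens; the threshold is then tuned against observed acceptance rates.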

Right. And you can actually make it do these kinds of things. We've invested time in making sure that, for instance, when you chat with your code base, sometimes it'll just make something up: "I saw this in your code base." If the context doesn't support the justification of the assertion it made, just say, "I don't know. I don't really know," and default to that. Now, the problem is you don't want to get into the state of some of these other systems. I think Anthropic had released Claude 2 at that point.

Prateek Joshi (25:07.915)
All right. Yeah.

Varun Mohan (25:17.546)
I've yet to see if Claude 3 is a big improvement on this dimension, where it'll just reject responding to you. It'll just be like, sorry, I cannot respond, because I believe that this is a threat. And there's a fine line here: you're not completely paralyzing your model into not doing anything, while at the same time having it be confident when it should be, and say "I don't know" when it doesn't know, right?

Prateek Joshi (25:41.099)
Right. If you look at the impact of generative AI on coding over the next five years, there are so many dimensions in which it's doing a good job: it's bringing in more people who can now write code, it's increasing the speed of iteration, the amount of code that gets shipped, the number of times you can test. So along all these dimensions, where do you think the biggest impact

will be? And this is your personal view: where do you think it's gonna do a phenomenal job in the next five years? For coding, generative AI for coding. Is it about bringing more people in, speed of iteration, amount of code generated? What's gonna be the biggest impact?

Varun Mohan (26:13.086)
Is this in coding, or is this outside of coding?

Varun Mohan (26:22.878)
Yeah. I've said a lot of things about agents. I believe that some form of agents that are controllable is gonna be materially important in the next five years.

I try to do this thing, because I was in autonomous vehicles, where, look, the technology is always getting better. It's always getting better. And I'll draw a couple of analogies there. When I was in AV, every year we thought the next year we would have AVs. And clearly we didn't, right? I was at the company for four years, and in reality, large-scale AV had still not been deployed. But the trade-off is that, in the meantime, the technology curve got way better. Consumer-grade GPUs went from 10 teraflops in 2018

to 660 by the end of 2022. So the amount of compute per GPU went up by almost two orders of magnitude, right? From that perspective alone, the capability of the new models will increase. That means the frontier of what's feasible, and what applications are able to be built, will be higher. That means our capability to get closer and closer to autonomously generating a PR is gonna be very high. Are we gonna be autonomously generating all the PRs that exist? I find that a little bit hard to believe, just because even at the largest companies,

the tools necessary to execute code, the containers necessary, the bespoke environment, the tools that it needs to get access to: a lot of things need to come together for that to be the case. But I think in certain specific ODDs, or operational design domains, of writing software, we will be getting closer and closer to autonomy. I don't think we're there yet now, but I see that as a possibility.

Prateek Joshi (28:00.467)
Right. One last question before we go to the rapid fire round, and it's about the technological breakthroughs in generative AI. If you look at this field, of all the things that are happening, specifically to enable coding tasks, what technological breakthrough are you most excited by in terms of what it can unlock for you and Codeium?

Varun Mohan (28:27.934)
Yeah, so I think there are two key things that are gonna be huge. Model scaling and more reasoning are gonna be good: they make these models hallucinate less and able to do larger-scale tasks more reliably, right? Because once again, if the quality is not high enough, it doesn't matter that you have something that's kind of good. It's just not gonna be good enough to work. The second thing is being able to munch and consume as much context as possible. That's gonna be the other key piece. Like...

What are the techniques necessary to actually understand incredibly large data sets? Those are going to be the two key aspects, because ultimately, when you look at a software engineer, the codebases they operate on top of, some of them are a billion tokens, right? That's far, way larger than what we're seeing right now. So I think both of these in conjunction will get us way further along on just the basic technology curve.

Prateek Joshi (29:19.375)
All right, that's amazing. All right, the rapid fire round. I'll ask a series of questions and would love to hear your answers in 15 seconds or less. You ready? All right, question number one. What's your favorite book?

Varun Mohan (29:30.731)
Yep.

Varun Mohan (29:35.518)
Idea Factory, it's this book about Bell Labs and how they came up with ideas during World War II and stuff like that.

Prateek Joshi (29:42.479)
Amazing. Next question. What has been an important but overlooked AI trend in the last 12 months?

Varun Mohan (29:50.282)
Um, probably still the impact of scaling and more data and high-quality data. I think there's a lot of other things that people are talking about, but that's not talked about enough.

Prateek Joshi (30:01.267)
What's the one thing about AI coding assistants that most people don't get?

Varun Mohan (30:07.831)
You can't build a product that does multi-step agent behavior if the first step doesn't work reliably already.

Prateek Joshi (30:16.263)
What separates great AI products from the merely good ones?

Varun Mohan (30:22.408)
They hit the sweet spot of three factors: latency, quality, and correctability.

Prateek Joshi (30:27.711)
What have you changed your mind on recently?

Varun Mohan (30:37.958)
I guess the speed at which GPUs are gonna improve. I think they will actually improve faster. Looking at things like Groq, I think there's a lot of competition in this space, and that's gonna be awesome for the entire category.

Prateek Joshi (30:48.191)
Yeah, I agree with that one. What's your wildest AI prediction for the next 12 months?

Varun Mohan (30:59.258)
I think the way in which video and media consumption happens could change markedly. And it's pretty impressive to see what things like Sora are capable of doing.

Prateek Joshi (31:08.727)
All right, final question. What's your number one advice to founders who are starting out today?

Varun Mohan (31:15.51)
Focus on building a product that people actually want today rather than a demo. It's easy to get hype on a demo, but ultimately you gotta build this company, and a company doesn't work if it's really just a demo.

Prateek Joshi (31:26.783)
Yeah, that's amazing. Varun, this has been a brilliant discussion. Love the sharp opinions. It's always nice when somebody has thought things through, has an opinion, and presents it. So thank you so much for coming onto the show and sharing your insights.

Varun Mohan (31:43.478)
Thanks a lot, Prateek.
