
Infinite Curiosity Pod with Prateek Joshi
The best place to find out how AI builders build. The host Prateek Joshi interviews world-class AI founders and VCs on this podcast. You can visit prateekj.com to learn more about the host.
Is LLM the New Operating System? | Anant Bhardwaj, CEO of Instabase
Anant Bhardwaj is the founder and CEO of Instabase, an AI-native unstructured data platform. They've raised $322M in funding to date from NEA, Andreessen Horowitz, Greylock, and Index Ventures. He did his master's at Stanford and his PhD at MIT.
Anant's favorite book: The Singularity Is Near (Author: Ray Kurzweil)
(00:07) Defining Unstructured Data
(01:18) The Growth of Unstructured Data and Its Challenges
(02:05) Evolution of Tools for Analyzing Unstructured Data
(04:25) How Large Language Models (LLMs) Changed Data Processing
(05:27) Do We Still Need ETL in the LLM Era?
(06:05) Structured Queries vs. Direct Unstructured Querying
(08:22) Applying LLMs in Enterprise Settings
(09:34) Ensuring Accuracy in AI-Driven Data Analysis
(11:29) SQL vs. AI-Driven Queries in Business Use Cases
(13:48) Retrieval-Augmented Generation (RAG) for Enterprise AI
(15:02) The Founding of Instabase and Its Early Vision
(19:03) Building the MVP of Instabase
(22:52) First 10 Customers: Lessons from Early Sales
(26:01) Scaling Customer Acquisition: Experiments and Failures
(30:35) When to Hire a Sales Team: Key Lessons
(33:52) AI Adoption at Instabase for Internal Productivity
(37:48) The Technology Stack Behind Instabase
(42:36) Transition from OS-Based Architecture to LLM-Based System
(43:45) Rapid Fire Questions
--------
Where to find Anant Bhardwaj:
LinkedIn: https://www.linkedin.com/in/anantpb/
--------
Where to find Prateek Joshi:
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19
X: https://x.com/prateekvjoshi
Prateek Joshi (00:01.361)
Anant, thank you so much for joining me today.
Anant Bhardwaj (00:04.206)
Thank you so much. I'm glad to be here. It's great.
Prateek Joshi (00:07.567)
Let's start with the basics. How do you define unstructured data, and where do we encounter it in the real world?
Anant Bhardwaj (00:17.122)
Yeah, so I think before defining unstructured data, let me tell you, it's much simpler to define structured data, and then everything else is unstructured. Structured data is the kind where you really understand the different kinds of information it presents. So for example, if you have an employee, they must have a name, an address, a date of birth. So you create a nice table or, you know, an Excel sheet with columns: here are the attributes, and you put in the values.
And there are really, really good languages defined to understand them, to query them. You must have heard of this thing called Structured Query Language, which is a standard for storing that information and then being able to do analysis and queries on it. That's how the whole world works. That's how most applications work. Anything that cannot be put into that format is unstructured.
So that's a much simpler definition. It's not a direct answer to your question, but hopefully you get it.
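To make the structured side concrete, here is a minimal sketch of the table-plus-SQL setup Anant describes, using Python's built-in sqlite3; the table and its contents are invented for the example.

```python
import sqlite3

# A "nice table with columns": each attribute gets a column, each record a row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, address TEXT, date_of_birth TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada Lovelace", "12 Example St", "1815-12-10"),
     ("Alan Turing", "34 Sample Ave", "1912-06-23")],
)

# Structured Query Language: the standard way to ask questions of that table.
for row in conn.execute(
    "SELECT name, date_of_birth FROM employees WHERE date_of_birth < '1900-01-01'"
):
    print(row)  # -> ('Ada Lovelace', '1815-12-10')
```

Anything that cannot be laid out as rows and columns like this (emails, contracts, raw log files, scanned documents) falls into the unstructured bucket discussed next.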
Prateek Joshi (01:18.161)
No, that's great. Actually, by defining it this way, we've shown how much larger unstructured data is in the real world. Because not everything, actually only a very tiny fraction of data, can be put in a nice structured format, and everything else is unstructured. So unstructured data has been around for a long time, and dealing with it has been very manual. Meaning if you want to look at receipts and spreadsheets and documents and log files,
Anant Bhardwaj (01:32.216)
Yeah, exactly.
Prateek Joshi (01:48.261)
It's been very manual because we didn't have the tools to analyze it and automate it. So what has changed now? What tools do we have today that are enabling us to do something with this unstructured data?
Anant Bhardwaj (02:05.1)
Yeah, let's first figure out how we understood unstructured data earlier, and then what has changed. So earlier, let's say the original form of the data is unstructured. And I'll tell you what I mean by original form. Sometimes the original form of the data is structured itself. For example, if you go and buy five bananas from a store, they store it nicely:
some order number, with banana, this price, this quantity, this is the order. So you have a nice structured format and you can basically understand it. But then there is data whose original form is unstructured, like log files and documents and just random information, text and other things. In the past, if you really had to understand it, there was this process people used called ETL: Extract, Transform and Load,
and that would allow you to convert it into some kind of structured format. And people had big data techniques like MapReduce and Databricks and others, but basically they would take all this unstructured data and then bring some structure to it. And once you have the structure, then you go and do your queries, your question answering and so on. So that is what was done until now. You can't query unstructured data directly; you have to first bring structure to it, and the way you do that is by ETL,
and then you run a structured query. And this is still being done for the most part. However, certain things have changed. For the first time, you can literally ask a query directly on unstructured information. And large language models, what they do is you let them read all of the internet, and somehow they are able to compress all of those things, and you ask a question in unstructured format and they can give you an answer. This was not possible.
Like, earlier, if you had to understand the internet, the way you did that is Google would run a crawler that crawls all of the web pages. Then they would run an indexer that defines which term, you know, appears in which page at what position, called TF-IDF indexing, then ranking. And then they would run a structured query and give you the relevant documents matching your query. And that's how you were able to get answers or information relevant
Anant Bhardwaj (04:25.186)
to your query. But for the first time, with large language models, you can actually create a large neural network which can see all of the words and tokens, and somehow it is able to compress them into something that we still don't fully understand. But if you ask a question, it is able to give you the answer. So we have figured out some mechanism, still not fully understood, of how the neural network has encoded and decoded that information.
There is of course the transformer architecture and all of that, which you have heard about, but we are yet to have a detailed, explainable understanding of this. So now this is being extended to how you can apply this to data that is not already in these models, by doing some kind of vector indexing, figuring out the match, and then using the LLM to present an answer. So the key change that is happening is our ability to query unstructured data directly without needing to bring structure to it,
and that opens up just so many new possibilities.
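As a concrete illustration of the pre-LLM path Anant outlines (extract structure first, then query), here is a toy ETL step over raw log lines, written in Python; the log format and field names are invented for the example.

```python
import re

# Raw, "original form" data: unstructured log lines.
raw_logs = [
    "2024-03-01 12:00:01 INFO  user=alice action=login",
    "2024-03-01 12:00:07 ERROR user=bob   action=upload msg='disk full'",
    "2024-03-01 12:01:15 INFO  user=alice action=logout",
]

# Extract + Transform: impose a row/column structure with a regex.
pattern = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) (?P<level>\w+)\s+user=(?P<user>\w+)\s+action=(?P<action>\w+)"
)
rows = [m.groupdict() for line in raw_logs if (m := pattern.search(line))]

# Load: these rows would go into a table like the employees example above,
# after which SQL ("how many ERROR events per user?") takes over.
print(rows[1]["level"], rows[1]["user"])  # -> ERROR bob

# The shift Anant describes: with an LLM you could instead hand over the raw
# lines plus a natural-language question ("which user hit an error, and why?")
# and skip the regex and the schema entirely.
```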
Prateek Joshi (05:27.431)
You made a very good point. If you look at it as a broad two-step process, step one is you use a tool to bring some structure to the data, and step two is you query it. Now, ETL is step one. With the LLMs that we have today, and with many companies, entire companies, built on just providing ETL tools, are LLMs going to
skip that step? Meaning, do we still need to do ETL in the modern world today, where we can just take unstructured data, put it into an LLM, and then start querying?
Anant Bhardwaj (06:05.944)
That's a great question, and it depends on what kind of questions. So for example, earlier, literally some of the
very important questions were not even possible by doing ETL and bringing structure. Like, for example: can you quickly give me a summary of this? What structure do you put on that? You really need to understand unstructured information. Or things like: can you understand every piece of information given here and present me with a draft that allows me to create a presentation? So earlier, ETL just allowed you to put data into a nice,
you know, tabular, relational format, which lets you say: give me all the rows that match these column specifications, filter on this column, find all the rows, do some kind of join. Those kinds of things were very, very valuable. I mean, it's not that they're not valuable. That's where the whole ETL process was very, very helpful, because the best query language that we knew was SQL, and everybody
wanted to convert all the data into some kind of SQL format so that you could run all those analytics tools, because all the tools also only supported SQL. That was the reason behind it. Now, I think different kinds of questions are possible, which earlier were not possible. So that's number one. And second, the LLM understands things at a much deeper layer that cannot be
really transformed into that row-column structure that we were limited to earlier, because that was very, very limiting. Just think of it: you have to map every piece of information in the world into some row-column format. That's a very, very hard thing to do. You lose a lot of information. So I think being able to ask questions directly on the raw, unstructured, original data
Prateek Joshi (07:59.475)
Yeah.
Anant Bhardwaj (08:10.414)
is one of the most important toolkits that will open up so many new opportunities.
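For contrast, the kind of query Anant says ETL could never serve, asking for a summary of raw text, becomes just a natural-language instruction once an LLM sits in the middle. A minimal sketch, assuming the OpenAI Python client and an illustrative model name and file path; any chat-completion API would do.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

raw_document = open("quarterly_vendor_contract.txt").read()  # hypothetical unstructured text

# No schema, no rows and columns: the "query" is plain English over the raw data.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You summarize documents faithfully and concisely."},
        {"role": "user", "content": f"Summarize this document in three bullet points:\n\n{raw_document}"},
    ],
)
print(response.choices[0].message.content)
```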
Prateek Joshi (08:22.207)
That's fantastic. And some of the very popular LLM tools we have, ChatGPT and the underlying models like GPT-4o or Claude: they took all of the internet and, as you said, compressed it into a model. And then you can ask questions and it will provide answers. Now, if you take this to an enterprise setting, meaning you go to a big company, their needs are a little more
structured. They need more accuracy; you can't just make stuff up. So if you go to a company and they have all this data, like log files and receipts, and they use tools like this, they need specific, real, accurate answers. Meaning: how many receipts did we process last week? You cannot guess it, it can't be probabilistic, because they need this information to do stuff. So one, how do you teach an AI model to understand unstructured data? And two,
how do you meet the needs of a big company where hallucinations are not tolerated? If ChatGPT makes something up for me, I'll be fine; I'm an individual. So how does it work in a big company setting?
Anant Bhardwaj (09:34.478)
This is actually a very, very good question, and there is just so much nuance to it. Let me break it into multiple parts. What are these models? I think sometimes we misunderstand. Models are basically some intelligence
that allows you to take a natural language specification. Instead of taking a specific programming language or very specific instructions, they can take a much more ambiguous human instruction, interpret it, and do things. That's what large language models are. And of course, they also compress information based on what they are trained on. Now, in enterprise settings, there are two things that are different. Number one,
it might involve data that these models are not trained on. Basically, data that is very specific to the enterprise, because some of that data could be their intellectual property, or their moat, basically, or whatever they want to call it. Many of these data sets are not available on the internet, so the model might never have seen them. That's number one. And second,
you want answers to be accurate. For example: how many people would qualify for this loan, or should we terminate this contract, things like that. You don't want a subjective answer. So it's very important to understand how we apply these models to those kinds of problems. And it would be a bad idea to
apply these models to the unstructured data naively. So, for example, we know that if the data is already in a nice structured format, SQL is great. SQL works. It produces accurate, reproducible, explainable results.
Anant Bhardwaj (11:29.9)
And that basically means you don't want the LLM to go and answer that question directly. You want the LLM to figure out how to convert it into a SQL query and run the SQL query. Because then you can pretty easily debug it, pretty easily explain it, pretty easily know that it will not produce anything bad. So, you know, there is still a valid use case where you want to take unstructured data, do some kind of ETL, bring some structure to it, and then not have the LLM
directly answer those questions. Just think of the human brain. Even though we have good understanding, we don't answer all questions just from our knowledge. We go and run some tool, right? And based on the tool's output, we produce an answer. So the intelligence that large language models provide can also be used to execute and run tools, rather than just answering directly from the knowledge within the LLM. So that's the first part. The second part is:
now let's say it is not possible to bring structure to that data, because, as we said, that can be very limiting if you have to convert everything into row-column formats and so on. So let's say your data is lying in a bunch of documents and images and audio, PowerPoint and Word and Excel and so on. Now there are techniques that allow you, without
bringing structure, to convert this into some kind of encoding and put that into some kind of vector index. Just think of those as rows of floating point numbers that let you convert any text, or even pixels, into numbers. There are specific databases designed to store them, and good techniques that allow you to figure out, based on the query,
what the relevant matching results are. And then you can get an answer from them. So now you are not answering the question directly from the data learned during training, but rather searching over the data that is within the enterprise. The technique is called retrieval-augmented generation, and there are different variants of it that people use. But now a lot of people would use LLMs more as an
Anant Bhardwaj (13:48.39)
intelligence layer rather than a data layer. It's not that they need to know all the answers. What they should be really, really good at is figuring out how to interpret the question, then figuring out which chunks of all the stuff we have stored might be relevant to the answer, reading them, and if they're good, producing the answer, or else keep going. Like now you have these agents where,
based on this, you figure out the next step to do, and then the next step after that. So a lot of techniques are being developed where you're not using these language models to encode information within themselves, but more as an intelligence layer, where they can go look at your data, find the relevant information, and give the right kind of answers. The advantage is that the answer you are giving is backed by data that lives in the enterprise. So now you're not just giving an answer; you can say: the reason I gave you this answer is because
I found these segments of information, which came from this Word document that you had, or this PDF, or this PowerPoint presentation. So the answer is still grounded in the enterprise data.
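Here is a stripped-down sketch of the retrieval-augmented pattern Anant describes: chunk the enterprise documents, index them as vectors, retrieve the closest chunks for a question, and hand only those chunks, with their sources, to the LLM. Plain Python, with a deliberately crude bag-of-words embedding standing in for a real embedding model; the documents are invented, and none of this is Instabase's implementation.

```python
import math
from collections import Counter

# Enterprise "documents" (in practice: chunks of PDFs, Word files, decks).
chunks = {
    "benefits_2024.pdf#p3": "The Gold plan premium is $410 per month with a $500 deductible.",
    "benefits_2024.pdf#p7": "The Silver plan premium is $280 per month with a $2,000 deductible.",
    "handbook.docx#s2":     "Employees may enroll in a plan within 30 days of their start date.",
}

def embed(text: str) -> Counter:
    """Toy embedding: bag of lowercase words. Real systems use a trained embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

index = {source: embed(text) for source, text in chunks.items()}

question = "What is the monthly premium for the Gold plan?"
q_vec = embed(question)

# Retrieve the top-matching chunks instead of asking the model to "know" the answer.
top = sorted(index, key=lambda s: cosine(q_vec, index[s]), reverse=True)[:2]

# Ground the generation: the LLM only sees the retrieved text, and the answer
# can cite exactly which document it came from.
prompt = "Answer using only the sources below, and cite them.\n\n"
prompt += "\n".join(f"[{s}] {chunks[s]}" for s in top)
prompt += f"\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to the LLM
```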
Prateek Joshi (15:02.643)
Amazing. I want to take you back to the launch of Instabase. At the time, right before launch, what was the landscape like, and what was the insight behind starting the company?
Anant Bhardwaj (15:22.092)
Yeah, so actually we had a company kickoff just last week and we were talking about that. There is a blog that is still available. The original website is down, but the Web Archive has it archived, so it's still available there. I wrote a blog called The Changing Landscape of Data Systems. I'll send you the link so you can read the whole thing. That was written in 2015, and it makes three basic arguments. The first one is
Prateek Joshi (15:34.259)
Yeah.
Anant Bhardwaj (15:51.488)
the true value of data will not be realized until it can be used to answer the questions that people care about and make the decisions that people care about. Those are the two major things you want to do. So that's number one. Second: different people
want to use data in very different ways to answer questions or to get things done. For example, a product manager would look at the same data very differently than a software engineer would. So it's very, very important that the systems we build cater to different kinds of users.
So, for example, non-technical users like product managers or analysts should be able to find answers and explore things without needing to understand technical details like information models and data architecture and so on. But at the same time, sophisticated programmers should be able to do increasingly sophisticated tasks using all the complex programming languages and tools they have access to. So that was the basic hypothesis. Now, the argument made at that time was about
what that kind of system would look like. And I don't know how much of the history of Instabase you have seen. We said there is one piece of software that we know looks like that, which is an operating system. An operating system hides all the complexity, like when you get macOS or Windows or Android or iOS, and it gives you different kinds of apps.
And different kinds of apps are used by different kinds of personas. Let's take macOS, that's a better example: non-technical people use very different kinds of apps than technical users. Technical users might use something like VS Code or a terminal, but, you know,
Anant Bhardwaj (17:44.622)
non-technical people would use things like Word and Excel and PowerPoint and other kinds of things. So the OS caters to all these different types of users by being a single piece of software that exposes the right kind of interface to each. So Instabase started with this hypothesis of building that kind of operating system. It came from the MIT DataHub project, and we made a lot of progress, some in the right direction, some in the wrong direction. And of course, later, LLMs came and became the core part of things, and so on.
Prateek Joshi (18:14.451)
No, that's amazing. And I still remember, years ago, you and Sarah did an interview. I think that was the first big one, and it was a big move. So yes, I've been following Instabase on and off for a while, and it's been very inspiring to see you build the company. And obviously, now it's much bigger. But no, it's been great. Now, going back, obviously,
Anant Bhardwaj (18:19.598)
Mm-hmm.
Prateek Joshi (18:41.063)
you've developed the product through multiple iterations, and you have big customers now. What did the MVP of Instabase look like? How did you decide what should go into the MVP? And over the years, obviously, if there's been a big redo or relaunch, how did you decide what goes into that MVP, in the post-LLM era, if you will?
Anant Bhardwaj (19:03.406)
Yeah, yeah, so many different iterations. We started with this idea of an operating system. So the first layer was, a bit like in an operating system, that you were able to mount different data sources, because we knew that we did not know where the data lives. So the OS said: hey, just create your workspace and mount different data sources, which could be file systems, databases, applications and so on. So that was the first layer, but we did not know
who would want to use it and what they would do with it. So we created this app called Refiner. Refiner is just an Excel-like interface where you define the structure that you want out of the data. You give a couple of examples and expect the system to automatically figure it out. And the technique that we used at that time was called program synthesis. The reason was that ML was too
rudimentary at that point in time and required a lot of labeling and training and all that, which people didn't want to do. So program synthesis was a technique that worked on some kinds of problems and did not produce great results on other types. But that was the initial interface. So we would go and show a demo to everybody. Some demos would be on log files, some demos would be on some random text, and so on.
And we were doing a demo to a payroll and benefits company, and they said: this log file stuff, that's not our problem. Our problem is that we get these PDFs from benefits and insurance providers, and we want to figure out which one has the best premium. Every employee has to select which plan they want to pick, and nobody wants to read a hundred pages of details and all that stuff.
So when they go to our website, they should be able to compare all the plans immediately. And we were like, we don't do PDFs, but we are an operating system. Maybe let's add an app for processing PDFs that takes the text, passes the text back into Refiner, and extracts all the fields. With that company, of course, something happened: the CEO left, and we never did a deal. But that demo looked very good, so we would show that demo to everybody.
Anant Bhardwaj (21:19.234)
Then there was another company that did lending. So we would go and say: look, you can also process these PDFs. They're like, yeah, this is great, but the PDFs that we get are from when people apply for a loan; they submit camera pictures of their pay stubs and bank statements and ID cards. Those are images, not PDFs where you can just get the text. So now you have to take the images, convert them into text, and do the same thing. And that's how we got pulled into this gnarly problem of
document understanding early on. So we added OCR and then extraction, but then they said: we don't even know what kind of documents people submit. Somebody can submit a brokerage statement, somebody can submit a pay stub, somebody can submit something else. So you have to figure out what kind of document it is first before you can do anything, so we added a classification layer. That's how we got into this problem of unstructured data. And that's where that particular company chose to
do a proof of value, a proof of concept with us. That company also had their CEO fired, which is kind of not good, so we never did a deal with them either. But that gave us our first real use case. And then, of course, we did several large banks; we have four of the top five US banks as customers, and we got that pretty early, when we were literally a four- or five-person company. So that's what led to figuring out the first use case, which became one of the primary use cases
and led to building the sales team and so on. So, accidental.
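A toy version of the pipeline that grew out of those early lending demos, classify the incoming image, OCR it to text, then extract fields, might look like the sketch below. It assumes pytesseract and Pillow for OCR; the document types, keywords, regexes, and file name are invented, and real systems (Instabase included) use learned models for each stage rather than rules like these.

```python
import re
import pytesseract               # OCR wrapper; needs the Tesseract binary installed
from PIL import Image

def ocr(image_path: str) -> str:
    """Step 1: turn the camera picture or scan into raw text."""
    return pytesseract.image_to_string(Image.open(image_path))

def classify(text: str) -> str:
    """Step 2: figure out what kind of document this is (toy keyword rules)."""
    lowered = text.lower()
    if "gross pay" in lowered or "net pay" in lowered:
        return "pay_stub"
    if "account balance" in lowered or "statement period" in lowered:
        return "bank_statement"
    return "unknown"

def extract(doc_type: str, text: str) -> dict:
    """Step 3: pull out the fields the lending workflow needs (toy regexes)."""
    if doc_type == "pay_stub":
        match = re.search(r"net pay[:\s]+\$?([\d,]+\.\d{2})", text, re.IGNORECASE)
        return {"net_pay": match.group(1) if match else None}
    return {}

if __name__ == "__main__":
    text = ocr("uploaded_paystub.jpg")          # hypothetical uploaded file
    doc_type = classify(text)
    print(doc_type, extract(doc_type, text))
```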
Prateek Joshi (22:52.037)
Yeah, that's amazing. It always works like that: you solve something burning and the floodgates open. Okay. So in the early days, going from zero to ten customers, and obviously your early customers were very, very big, what can you share about what it takes to get those first ten customers? What did you have to do? I mean, clearly in this case they asked you for something and you had to go build it, because that's what they were willing to pay for.
But in general, for those first ten customers, what do you have to do to close them?
Anant Bhardwaj (23:27.096)
So rather than giving you the specifics of what we did, let me give you the general framework we used, the mindset, which might be a more relevant answer here. And that is
knowing that it's very, very hard to really predict where product-market fit is. And so in the beginning, going in with an open mind, so that you can engage with your potential users and customers in a way that allows you to determine what the real problems are.
Because sometimes we can get too stuck, right? We will not do this because it does not fit our strategy, or we will not go in this direction. And it could be that some people are very, very smart; I didn't know all of those things in the beginning, of course I wasn't. So for me, what worked was just being open,
engaging with the world with a very open mind. I mean, basically, we did not say we will not do OCR, or we don't understand images, or we don't do PDFs, or we won't work with banks, or whatever those kinds of things. We just engaged, tried to go and add those things, and saw whether it solved a real problem. And once we saw that
there were more than one, and more than three, and more than five, and more than ten with similar problems, that's when we decided maybe there is a large enough market, and let's go figure out how to scale this. But early on, we literally had no idea what would really work. So we were just searching for, you know, a real problem.
Prateek Joshi (25:13.939)
I think keeping an open mind is such a great point. Many times, as you said, young founders get stuck with: oh, I won't do that because it's not in my very specific product direction. And I mean, it is a point to consider, but you have to engage, as you said, with the world as it is before you can reshape it. Hopefully when you're bigger, you'll find a way to reshape it, but in the early days, you have to find a way to engage. That's a great point.
Now, after the first ten customers, you saw there's something working. Clearly, there's something working. What experiments, and by experiments I mean product experiments, sales and marketing experiments, did you run over the years to speed up customer acquisition?
Anant Bhardwaj (26:01.198)
Yeah, and I'll give you the failures we had too. So basically, once we got the first six, seven customers, we actually grew very, very quickly, from zero to 5 million in like eight to nine months. That gave us a false illusion that we could go and scale. So we literally hired a really massive sales team and a lot of people, and then we realized it didn't work. And the reason why was that
Prateek Joshi (26:12.721)
Mm-hmm.
Anant Bhardwaj (26:30.06)
this whole experimental mentality that allows you to find product-market fit is not very good for scaling a go-to-market motion, because what salespeople want is a very clearly defined playbook which they can just run, exactly the same thing every time. And if there is no clarity, they can go in different directions. For salespeople, right, the reason why
Coke can literally predict its revenue is because it doesn't matter who you put on the ground; they can sell the Coke at exactly the same price, because Coke has created all the mechanisms, all the marketing, all the stuff, so that it doesn't matter who the person is, they will sell roughly the same amount. And I think for a successful GTM at scale, what you really want is a pretty well-defined motion that just works,
and people enabled to run that motion. If there is too much ambiguity, like when we hired a sales team, it doesn't work well. We hired a sales team a little too early. When it was founder-led sales, it was easy; I could basically, in real time, figure out how to approach a problem differently and how we could add this or that tool. And because operating systems are so flexible, we had this backdoor called user-defined functions:
for any problem that you could not solve out of the box, you could add Python code, and if you can write code, you can solve any problem, and that's how we would win those customers. But when you hire a salesperson, they can't do that, right? So I think we learned some hard lessons, which is that for a go-to-market motion to work, you need complete clarity. And then we had to literally tell them:
don't go and sell this unstructured data platform. What you have to go and sell is: does anybody have this lending problem where they need to process unstructured data? Something specific enough that they can go to the right kind of persona and sell the right kind of value, and so on. So we built a go-to-market motion that way early on, which gave us early growth. But at the same time, the technology was changing. Program synthesis
Anant Bhardwaj (28:48.43)
was perhaps producing the best results, but then the Transformer came around the end of 2017, and the BERT model came around 2018. But they were not very good at understanding anything that you could not fit into a sequence of tokens, like documents where you have layout and those kinds of things. So we basically worked on our own layout-aware language models, called Tilt, Lambert, and so on. And that was a turning point. And at that time, the issue was:
we wrote this language model called InstaLM, that was in 2020, but in order to run it, you needed GPUs. And it was unheard of to go and tell customers that if you want to process loans, you need GPUs. Like, what? People did not know at that time. But we took that bet very early on, and even though we had early trouble getting all the IT and the technology ready, it allowed us to build the product in the right way. And of course, as soon as large language models came
Prateek Joshi (29:28.962)
All right, all right.
Anant Bhardwaj (29:45.45)
in 2022, we launched AI Hub almost immediately. So we've done three major product changes, and not changes in the sense of feature additions: complete rewrites from scratch, complete technology changes.
Prateek Joshi (30:00.359)
Right. That's such a phenomenal point. And I see this often too: hiring a sales team prematurely not only hurts, but undoing it is another big, big headache. So you mentioned that until there is complete clarity, a sales team might not be very helpful. Now the question is, what does clarity look like? Because for different companies, it appears in different ways. So if you were advising a
founder in the earlier parts of the journey, what does clarity look like, objectively?
Anant Bhardwaj (30:35.598)
So I can give you a mechanism to test whether you are ready to hire a salesperson or not, which is: don't hire a sales leader first. Hire one or two smart, basically non-technical people, who are not founders and who are not engineers, and see whether they can be successful without a lot of your involvement. If you can make one or two of those people successful,
Prateek Joshi (30:43.314)
Mm-hmm.
Anant Bhardwaj (31:01.006)
then you know what the process is to make them successful. Because the problem otherwise is, if you go and hire, let's say, a sales leader, then they go and hire a head of America and a head of Europe, and then, within America, East and West. And now you have basically a 30-person team, but you have not figured out how to make one person successful. And now, 30 people in, you have spent $10 million and you're still learning how to make one person successful. So I think founders are very good
at selling, but that doesn't scale. Founder-led sales gives you only one answer, and that is whether the problem is real. That's it. Is this the right product? Can you build a GTM motion around it? Those questions need to be answered before you build a massive sales team. And in order to address this, I think the
Prateek Joshi (31:41.575)
Right. Yeah.
Anant Bhardwaj (31:58.028)
first question is: can somebody other than you sell it? And in general, you need to hire someone who looks more like a salesperson than an engineer, so basically someone non-technical. You can of course help them with a little bit of technical support, but see whether they can do it. So basically, just reduce the cost of your experiment. If you make two people successful, then at least you know the process you followed to make those two people successful,
and you can hire similar profiles and then build a team. So I think the common mistake, and I have made this mistake a lot, is that founders think they can hire a sales leader and the sales leader will build the sales team. That can often be hit or miss. So the right way to do this is to hire a couple of people in the field and make them successful first, because that tells you that
somebody other than me can be successful, and then you go and hire a leader. I would not hire a sales leader as the first hire for GTM.
Prateek Joshi (33:01.149)
Yeah.
Prateek Joshi (33:06.577)
Right, right. And it's funny, it's extremely tempting to just hire the sales leader, because as a technical founder you think: I'll just hire that magical VP of Sales and my headaches will go away, because they'll solve all of them. But it has the opposite effect. It's very interesting. All right. So I want to talk about the technology stack at Instabase. As much as you can disclose, what technology stack
Anant Bhardwaj (33:21.198)
Yeah.
Prateek Joshi (33:36.751)
have you used to build Instabase? And part B is: inside Instabase, the company, where do you use AI internally, for any work, like sales, marketing, product, whatever it is, to make your own life easier?
Anant Bhardwaj (33:52.056)
So let me answer the second question first and then the first question. The second question, using AI internally: this is one of our key OKRs. At the company level, we define three key goals. For example, this year on the GTM side it has been: landing new logos at high velocity, consumption by the customer, and the third one is building productive capacity. Productive capacity basically means,
when you hire a salesperson, have they ramped, so that you can assign them a quota and they will be successful. So those are the three indicators of GTM success. On the product side: launching search, which we are planning to take to GA; second, building an enterprise-ready, reliable product, because we sell to enterprises; and
the third one is security, compliance, and those kinds of things. The third overall objective, from the company perspective, is literally using AI to increase productivity by 50 to 75%.
And that basically means we are now changing how we do things. So, as part of that objective, the first of the three key results we are measuring is AI being part of our SDLC process, the software development lifecycle. So from code reviews to testing to vulnerability management, the first pass is done by AI. So for example, when a code review happens, when you write code,
the code is not reviewed by a human first; the code is reviewed by AI first, because the AI has seen a lot more code than a human reviewer will ever look at, right? So the point is, it saves so much code review time, because by the time a human sees it, your code is almost perfect; the human reviewer is just there for verification. So that's number one. Second,
Anant Bhardwaj (35:41.642)
all of the static analysis, the security checks, all of those are done by AI; it's much easier to find those things. The third one is vulnerability management. Now we are implementing what we call continuous zero vulnerability: whenever a security vulnerability gets detected on the internet, go and figure out whether we are using any of those packages, automatically update the packages, and rebuild the whole thing. That's basically being done by AI.
And then all of these standards: when you sell to enterprises, you have to comply with accessibility standards and all that kind of stuff. So we just have AI look at all the code and make sure we make the right changes so that we comply with all those standards. So we use AI heavily in our software development lifecycle. Now we're also implementing continuous testing, all being done by AI; that is still work in progress. So that's the SDLC. Second, legal:
we just did a fundraise last year, and I think every question that the investors asked, we literally had AI answer, and even the RFPs, when they ask you all these questions and model risk management questionnaires. We basically have an internal data repository; AI answered those questions and humans just reviewed them. Even for the company kickoff that we did,
all of the employee questions about events, anything, were all answered by AI. We didn't have any humans answering them. All the internal HR questions, all being answered by AI. So we are doing a lot, but we are looking at more ways to do it. I will have better, hopefully more elaborate answers by the end of the coming year. But I think we have already made a good amount of progress, and we hope to make even more. So that's why I made this an OKR for the whole company.
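As one concrete interpretation of the "AI reviews the code first" step, a CI job could diff the branch and ask a model for a first-pass review before any human is assigned. The sketch below is a guess at the shape of such a step, assuming an OpenAI-style client, illustrative branch name, model, and prompt; it is not Instabase's actual tooling.

```python
import subprocess
from openai import OpenAI

def first_pass_review(base_branch: str = "origin/main") -> str:
    """AI-first code review: the model sees the diff before a human reviewer does."""
    diff = subprocess.run(
        ["git", "diff", f"{base_branch}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

    client = OpenAI()  # assumes OPENAI_API_KEY is set
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are a strict code reviewer. Flag bugs, security issues, "
                        "and style problems; say 'LGTM' only if the diff is clean."},
            {"role": "user", "content": f"Review this diff:\n\n{diff}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # In CI, this output would be posted as the first review comment on the pull request.
    print(first_pass_review())
```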
Prateek Joshi (37:05.171)
Bye.
Anant Bhardwaj (37:28.492)
And then, of course, we are launching our own search product. So the third key result as part of this objective is that 80-plus percent of Instabase employees are weekly active users of our own search product. If we know that we ourselves are getting value from it, then we can be more confident that our customers are also getting value from it.
Prateek Joshi (37:47.293)
Right, right. That's amazing. Yeah.
Anant Bhardwaj (37:48.992)
Now, on the technical architecture. Some really good things happened when we were starting the company. In 2015, there was this new technology that had come out called Docker, you might have heard of it, and that made containerization pretty popular. So we basically started from the very, very beginning with microservices. Kubernetes was still very early at that time;
Amazon had something called ECS, the Elastic Container Service. But we took the bet that Kubernetes would be successful and implemented on Kubernetes even though it was in beta at that time. So I think we started with the right architecture from the very, very beginning, which was great. Given that we had started with an operating system, we actually separated the data layer and the execution layer, a nice logical and physical separation, from day one. That was very, very helpful.
And the OS architecture actually helped us in a significant way. Because what is an operating system? An operating system by itself doesn't give you much. What it allows you to do is build applications, by giving you all the common services. So that meant that if something didn't work, you didn't really have to worry too much: you build a new app, the old app can continue to run, you just stop using the old app, you build all the new things on the new app, and then slowly migrate from old to new.
Because your platform is the same, the OS didn't change; the only thing that changed is your application. You can move from Apple Calendar to Google Calendar, it's not that hard. You can move photos from Apple Photos to Google Photos. You just build a new app that allows you to do the migration. So we added a lot of these new apps: initially OCR and image recognition, all those kinds of things, and later all of the transformer-based training, the language model training. And
it worked pretty well. It served us well for about seven years, because the operating system allowed us to build more and more things. Then we realized that this flexibility came at a cost, and that cost was complexity. Because now, for every customer, basically we have 45-plus apps, and you have five customers using
Anant Bhardwaj (40:09.996)
these seven apps, and another seven customers using those nine, and so on. So now you have a small team maintaining so many different apps, and then backward compatibility issues: how do you move forward when you have to support the old version and the new version? So, unfortunately, that was the cost. So in 2020, when LLMs came, we were like: what is this language model, really? And if you actually take a step back:
if you go to the pre-LLM era, in order to tell computers what to do, you had to speak in a language called code. That's why you needed this sophisticated skill called software development, people who can write code. Then, in order to allow non-technical people to make the computer do something, people would build interfaces, UIs, where you can pinch and drag and drop and click.
So you had a very limited interface, and depending upon what you dragged or dropped or clicked, it would be converted into some code that runs. So there was this middle layer of code, and interactions were limited to what the user interface could provide, right? That was the limitation. For the first time now, you actually have the ability to tell the computer something in human language, and the LLM can make the computer do whatever. So it can kind of do the job of the operating system,
because it can take any human specification and then, under the hood, make the computer do whatever. So we were like: let's kill the whole operating system idea. Painful, very controversial decision. And so we built AI Hub from scratch, where literally the LLM is the operating system itself. And then we created a hub which has all these apps. But now what you do is you just literally say, hey, I want to do this, and the LLM will figure out how to break that down, but all that explicit composition is gone. So there is no concept of apps and all that.
Anant Bhardwaj (42:02.69)
You literally get: here is your data, and three product surfaces that we give you. One is automate, which basically means: what do you want the computer to do automatically. And then two other surfaces, analyze and search. Because we believe that with data, you want to do only three things: you want to ask questions, which is analyze; you want to find information, that is search; and you want to make the computer do something automatically, that's automate. So we tried to simplify everything into just three product surfaces,
with, under the hood, the LLM being the operating system. So that's the new tech stack. The other one was the old tech stack.
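To illustrate the three-surfaces idea, here is a toy router, not Instabase's real architecture: an LLM (OpenAI-style client; the model name, prompt, and handler functions are all invented) interprets a natural-language request and dispatches it to automate, analyze, or search.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def automate(request: str) -> str:  # make the computer do something
    return f"[automate] would run a workflow for: {request}"

def analyze(request: str) -> str:   # ask a question of the data
    return f"[analyze] would answer the question: {request}"

def search(request: str) -> str:    # find information
    return f"[search] would retrieve documents for: {request}"

SURFACES = {"automate": automate, "analyze": analyze, "search": search}

def route(request: str) -> str:
    """The LLM plays the 'operating system' role: it interprets the ambiguous
    human instruction and decides which surface should handle it."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Classify the user's request as exactly one word: "
                        "automate, analyze, or search."},
            {"role": "user", "content": request},
        ],
    ).choices[0].message.content.strip().lower()
    handler = SURFACES.get(reply, search)  # fall back to search on anything unexpected
    return handler(request)

print(route("How many invoices did we process last week?"))
```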
Prateek Joshi (42:36.547)
No, that's amazing. I love the explanation; it's phenomenal. I think you have an incredible way of explaining things, which is remarkable. All right. We could go on forever, but we're at the rapid fire round. I'll ask a series of questions, and would love to hear your answers in 15 seconds or less. You ready? All right. Question number one: what's your favorite book?
Anant Bhardwaj (42:53.752)
Good. Yep.
Anant Bhardwaj (42:58.945)
The Singularity Is Near.
Prateek Joshi (43:02.051)
What has been an important but overlooked technology trend in the last 12 months?
Anant Bhardwaj (43:07.992)
Quantum.
Prateek Joshi (43:09.779)
What company do you admire the most and why?
Anant Bhardwaj (43:15.096)
Google, it fundamentally changed the productivity of the whole world.
Prateek Joshi (43:19.193)
What's the one thing about unstructured data that most people don't get?
Anant Bhardwaj (43:24.814)
You don't need structure to understand it.
Prateek Joshi (43:28.179)
What separates great products from the merely good ones?
Anant Bhardwaj (43:35.33)
Great products lead people to do the right things. Good products require documentation and a bunch of other things for people to do the right things.
Prateek Joshi (43:45.484)
Amazing. What have you changed your mind on recently?
Anant Bhardwaj (43:51.502)
The nature of reality, and whether we live in parallel universes.
Prateek Joshi (43:58.493)
What's your wildest prediction for the next 12 months?
Anant Bhardwaj (44:03.092)
AI will continue to surprise and impress us more than people estimate.
Prateek Joshi (44:09.299)
All right, final question. What's your number one advice to founders who are starting out today?
Anant Bhardwaj (44:14.658)
Don't follow any advice.
Prateek Joshi (44:16.306)
Oh, love that, love that final bit. It's funny. Anant, this has been a brilliant discussion. Obviously, I've been following Instabase. I know you're a remarkable founder, and I'm really glad we got a chance to do this. So thank you so much for coming onto the show.