Converting Cameras into Autonomous AI Agents | Rish Gupta, CEO of Spot AI

Infinite Curiosity Pod with Prateek Joshi

The best place to find out how AI builders build. The host Prateek Joshi interviews world-class AI founders and VCs on this podcast. You can visit prateekj.com to learn more about the host.

April 29, 2025 • Prateek Joshi

Rish Gupta is the cofounder and CEO of Spot AI, a video AI platform for the physical world. They've raised $93M from amazing investors such as Scale, Bessemer, and Qualcomm Ventures.

Rish's favorite book: Atlas Shrugged (Author: Ayn Rand)

(00:01) Introduction
(00:32) Video-AI basics: ingesting camera feeds across diverse networks
(02:42) Edge-vs-cloud trade-offs for compute, storage, and bandwidth
(05:40) Mapping the sector: hardware waves to cloud cameras to pure-software layer
(07:43) Founding insight: why Spot AI attacked the video layer now
(11:35) Bare-bones MVP: two-page dashboard that unified camera access
(15:34) First-10-customer lessons & pruning the ideal customer profile (ICP)
(18:54) Go-to-market experiments: ICP variants, pain points, and channels
(23:00) Early-team blueprint: engineering-heavy, founders run sales
(24:03) Hardware stance: free IP cameras to simplify one-vendor buying
(26:01) Biggest tech hurdle: supporting thousands of camera brands & configs
(27:00) Sales challenge: outbound fatigue forces novel GTM motions
(28:55) Future vision: each camera becomes an autonomous AI agent with a "job"
(30:25) Key AI unlock: massive context windows enabling flow-state reasoning
(32:14) Rapid-fire round

--------
Where to find Rish Gupta:

LinkedIn: https://www.linkedin.com/in/profilerish/

--------
Where to find Prateek Joshi:

Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19
X: https://x.com/prateekvjoshi

Prateek Joshi (00:01.43)
Rish, thank you so much for joining me today.

Gupta (00:04.546)
Prateek. Thank you for having me.

Prateek Joshi (00:07.338)
Let's start with the basics of video AI for the physical world. People are building all these foundation models to work with video and understand video, but when it comes to the physical world, it's a whole new world of challenges. So if you're building a product like this, what are all the things it should be able to do well?

Gupta (00:32.012)
Yeah, it starts with the video source itself. The very first thing you need to understand is that when you step out into the physical world, these cameras are behind different networks, different configurations, different brands. You don't control the network topology or the hardware infrastructure.

And they're not neat little databases that you can scrape off the internet, the way LLMs can, and that are easy to read. Somebody needs to provide you access. So the first thing is to build that whole layer: the networking layer, the video ingestion layer, the video understanding layer, which is able to talk to all the different brands of cameras, ingest those videos, and convert them to a unified format to be fed to an AI pipeline.

That's the first layer. The second is that you have to build an AI pipeline which can work both on the edge and in the cloud, because most businesses or most places will have tens, if not hundreds or thousands, of cameras in one building. San Francisco Airport, for example, might have five to six thousand cameras. Stanford campus has a similar number, five to six thousand. You can't stream that many 4K cameras to the cloud, so you have to do a lot of the compute on the edge. Then you have to figure out what the hybrid architecture is: how to do enough on the edge and supplement it with the cloud, how much data you're sending to the cloud, and how you're bringing it back with low latency.

And the third is, you have to make all of this very easily accessible to users. You're dealing with lots of video. If you have hundreds of cameras, you're dealing with hundreds of different video streams running 24/7. You have an insane amount of video. Just to give you an idea, even though we are much, much smaller than Google or YouTube, even at our scale we're ingesting twice the amount of net new video that YouTube does daily.

Gupta (02:25.836)
That's because our cameras are streaming 24/7. So when we have lots and lots of cameras, we quickly end up with a lot of data that we have to go through.
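
A rough sketch of the arithmetic behind "you can't stream so many 4K cameras to the cloud". This is not from the conversation: the 15 Mbps per-camera bitrate is an assumed, illustrative figure, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate for streaming every camera to the cloud.
# ASSUMPTION: 15 Mbps per 4K stream is illustrative, not a figure from the episode.

def total_uplink_gbps(num_cameras: int, mbps_per_camera: float) -> float:
    """Aggregate sustained bandwidth in Gbps if every camera streams continuously."""
    return num_cameras * mbps_per_camera / 1000.0


def daily_volume_tb(num_cameras: int, mbps_per_camera: float) -> float:
    """Video produced per day in decimal terabytes at a constant bitrate."""
    seconds_per_day = 24 * 60 * 60
    bits = num_cameras * mbps_per_camera * 1_000_000 * seconds_per_day
    return bits / 8 / 1e12


if __name__ == "__main__":
    cams = 6000  # upper end of the "five to six thousand" cameras mentioned above
    print(f"{total_uplink_gbps(cams, 15.0):.0f} Gbps sustained uplink")
    print(f"{daily_volume_tb(cams, 15.0):.0f} TB of video per day")
```

Under that assumed bitrate, a 6,000-camera site would need about 90 Gbps of sustained uplink and would produce roughly 972 TB a day, which is why most of the work has to stay on the edge.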

Prateek Joshi (02:42.632)
Amazing. I mean, the scale is mind-boggling. When you think about all the cameras, and we'll dive into it in a minute, so much raw data is being captured, and the network bandwidth doesn't support sending everything to the cloud. So how do you decide what stays there, what gets processed, and what gets sent to the cloud? How does that serve the need if, say, you had to go back seven months and look at the raw feed? If you don't have the raw data, that's a problem, because many of the applications might be centered on security, for example. So how do you decide cloud versus edge here?

Gupta (03:22.604)
Yeah, so first, storing that amount of video data in the cloud. There's a storage aspect and there's a compute aspect. Storage is so much more expensive in the cloud compared to what you can do on the edge that it makes almost no sense to do it unless you're required to by regulation or you have some really business-driven reason to store in the cloud. A, it's bandwidth-intensive, because at some point you have to transfer all the video from the edge to the cloud, and B, the storage cost of AWS or GCP or Azure or whatever service you might use is just prohibitive. So storage is much preferred on the edge, which does bring a constraint.

A customer has to decide how much storage they want on the edge. Do they want 30 days, 60 days, 90 days, 120 days, 180 days, 365? And then they have to live with that reality. If they chose 90 days, they can't come back after seven months and say, can I get access to a certain video from back then?

Then comes the compute layer, and then there's a system-of-record layer. For compute, you do as much as possible on the edge. Luckily, the edge computing stack is getting richer and richer, and models are getting more and more efficient, so we can deploy more and more on the edge. Once you have some inference from the edge, you supplement it with the cloud, with much larger models, when you actually detect something interesting.

So you're not passing everything to the cloud. You're using the edge as a smart layer that decides what to keep and what to transfer, and where something needs to be transferred, it transfers it. And then everything which is a system of record goes to the cloud. The system of record is all the metadata and all the analytics.

Gupta (05:10.57)
It also includes any clip that you saved. Let's say something happened in the last 60 days which was incredibly important, an accident or something else, which needs to be stored. If you click Save, a copy gets saved in the cloud, which means you can come back after seven months and ask for that specific clip and you will have it. So the system of record stays in the cloud. Compute, you stagger between edge and cloud, depending on what you can do where. And storage, you move toward the edge.
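
The split described above (raw storage on the edge, system of record in the cloud, compute staggered between the two) can be sketched roughly as follows. This is an illustrative sketch, not Spot AI's implementation; every name here (`Clip`, `PLACEMENT`, `is_retrievable`, `escalate_to_cloud`) is hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Where each kind of data lives, following the description in the conversation.
PLACEMENT = {
    "raw_video": "edge",    # bulk footage stays on the edge box
    "metadata": "cloud",    # system of record
    "analytics": "cloud",   # system of record
    "saved_clip": "cloud",  # a copy is kept once a user clicks Save
}


@dataclass
class Clip:
    recorded_on: date
    saved_by_user: bool = False  # True once someone clicks Save


def is_retrievable(clip: Clip, today: date, edge_retention_days: int) -> bool:
    """A clip can be fetched if it was saved to the cloud, or if it is
    still inside the edge retention window the customer chose."""
    if clip.saved_by_user:
        return True
    return today - clip.recorded_on <= timedelta(days=edge_retention_days)


def escalate_to_cloud(edge_flagged_interesting: bool) -> bool:
    """Compute is staggered: only footage the edge model flags as
    interesting is sent on to the larger cloud models."""
    return edge_flagged_interesting


if __name__ == "__main__":
    today = date(2025, 4, 29)
    seven_months_ago = today - timedelta(days=210)
    print(is_retrievable(Clip(seven_months_ago), today, 90))        # unsaved, past retention
    print(is_retrievable(Clip(seven_months_ago, True), today, 90))  # saved copy in the cloud
```

With a 90-day retention choice, the unsaved seven-month-old clip is gone, while the saved one is still available from the cloud copy.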

Prateek Joshi (05:40.138)
Now, when you look at the entire sector, there's so much happening. If you were to explain this sector to somebody who doesn't know, how do you break it down? What's the current landscape, including hardware, software, integrators, consultants? How do you break the sector down and how do you understand where the gaps are?

Gupta (06:05.23)
It's a really interesting question. A very simple way to think about the sector, at least the way we think about it, is this.

The first wave of people who built cameras was very similar to any other hardware company. They were building a bunch of hardware, the same as with phones. That's what we call the IP camera revolution. Around 2000, the internet protocol becomes a big thing and you have IP cameras coming out. These are network-connected cameras.

Then the second wave came, which was cloud-connected cameras, where the cameras were connected to the cloud and came with a good dashboard and good software. For the first time, camera vendors were able to charge for software. And now there's a third wave emerging, which is a purely software layer: companies that do not sell cameras and say, hey, we will provide the software layer that runs on top of all this hardware infrastructure.

Think of it as like Android, or Microsoft Office for PCs. You're building the operating system that can go on any camera and run across different brands of cameras. And with the AI agent layer, you're able to deploy AI and AI agents on this video layer and make it much easier for end customers to consume. So in the first wave, end customers were buying cameras; then they were buying cameras plus some software from the camera vendors. And now you have this AI layer getting layered on top of cameras, which is what the third wave of companies like us is building.

Prateek Joshi (07:43.64)
Let's look at the product itself. You launched Spot AI and built it. Going back to the day right before, or the trend leading up to it: cameras have been around for a while, and the concept of observing video and extracting intel has existed for a while. So how did you spot the need, or perhaps the angle of attack, to launch a new company here?

Gupta (08:11.65)
Yeah, so there are two things here. One is why launch this company in the first place, and then there's figuring out how to launch it. Why launch the company? There were two trends that we were seeing when we were in grad school. One was that the edge computing stack was getting more and more efficient every year. You could get twice the capacity at the same cost almost every year. The Nvidia roadmap was pretty solid, and they were just delivering on it year after year. So you could project it out 10 years and say, okay, the edge computing stack is going to be 1000x more powerful and it's going to be very cost-effective. The second was AI models, which every year were getting, again, much more efficient and much more powerful. If you were in any AI lab or taking AI classes at any point after 2015, you would see those rapid improvements happening year over year. Then you project that out and you say, okay, in 10 years, even though you couldn't have called it GPT, something really smart is going to exist, because that's the rate of improvement in the underlying technology.

So then it's like, okay, if you have lots of compute at the edge and models which can understand anything, you want to throw the richest data source at it, because that's going to be the most value-creating thing. Everything which is less data-intensive will get consumed first, and then comes the richest data source. That became video, and then the question is, how do you go back to cameras?

Then it gets to the point of, how do you launch this? You have this insight that you could do a lot more with video data in the future than you can today, and you can build a company entirely around that premise. Then comes the second-order question of, how do you actually go about launching this company? And then you truly understand that every camera vendor in the past who's dealt with video has basically been a camera vendor. They've sold primarily hardware. The majority, 70 to 90% of the revenue, comes from selling physical hardware.
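
The "1000x in 10 years" projection mentioned above follows directly from doubling once a year. A minimal sketch of that compounding, assuming a constant yearly factor (the function name is hypothetical):

```python
def projected_multiplier(years: int, yearly_factor: float = 2.0) -> float:
    """Capacity at the same cost after `years`, relative to today,
    assuming it grows by a constant `yearly_factor` each year."""
    return yearly_factor ** years


if __name__ == "__main__":
    # Doubling every year for a decade: 2 ** 10
    print(projected_multiplier(10))
```

Ten doublings give 2^10 = 1024, which is where the round "1000x" figure comes from.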

Gupta (10:15.374)
You've seen this shift before. I was giving mobile phones as an example. If you go back to mobile phones in 2007 or 2008, before the iPhone came out, Motorola and Nokia were selling about $70 to $80 billion worth of phones, which is pure hardware. Not a lot of software revenue there. And if you look at today, yes, there's still a lot of hardware revenue from the iPhones and Samsungs of the world, but just the App Store revenue, not even counting the Android Play Store, eclipses the $70 to $80 billion worth of phones that Motorola and Nokia were selling. So the analogy, that the software layer on top of video is going to get way more interesting and much larger than the tens of billions of dollars' worth of hardware being sold, felt like something that was bound to change and would come through on some defined timeline.

Prateek Joshi (11:05.496)
And I think that's a great insight. In terms of the sheer volume of data being generated, video eclipses almost everything. It's crazy how much data exists, and with foundation models becoming more and more powerful, the more data you throw at them, the better the insights and intel you can extract. So that's wonderful. Okay, now the first version of the product. You could have built so many things; you could build for years. But you had to ship the product.

Gupta (11:23.372)
Yeah.

Prateek Joshi (11:35.542)
What did the first version, MVP, look like and how did you decide what features would go into it?

Gupta (11:43.638)
Yeah, I mean, by definition, the MVP should have the fewest features possible. You are looking for something that gets the task done. It's not about having three half-baked features. It's about whether there is one feature which, end to end, gets done a task that the customer cares about. And what is that one task?

When we launched, more than any AI task, the first thing customers wanted was this. They were saying, AI is all cool and great, but I don't even know how to access my videos from different brands of cameras. So the very first version we launched had probably just two pages on the dashboard. One page allowed you to add a user, so that you could add more people in the organization to the dashboard. And the second page had the list of all your added cameras, so you could see all your cameras on a single page on our website. We realized that that value proposition is enough to get in the door. They get excited about the AI stuff, but they're buying just for this one thing, and they're willing to pay for this one thing, and everything else is a cherry on top at the beginning. So that's all we did for the first couple of months. Then we slowly got more feature requests and more and more things that they wanted to do on top of it, and we kept building it out from there.

Prateek Joshi (13:01.842)
It's funny: what you have in your head, what you want to build, is complicated enough to build, but what the customers want and what they're ready to pay for is very interesting. They're ready to pay you if you can just show the feeds from all the cameras and let them add users. That's enough. It's very interesting. Now, what are the key points that someone should understand when working with cameras

Gupta (13:17.08)
Yeah.

Gupta (13:21.196)
That was enough for the first few customers.

Prateek Joshi (13:31.128)
in the wild? You mentioned the many brands; that's one point. But if there's a founder, an early-stage builder out there who's working with cameras, what should they know?

Gupta (13:42.35)
I mean, one is just the number of camera brands and the number of configurations out there: the codecs, the frame rates, the pixel counts, the settings on the camera. The permutations and combinations run into the millions, if not tens of millions or more. So you're dealing with a very wide range of sensors, even though they look the same or feel the same.

And you have to build a unified system which takes in all of those sensors. That is the first challenge anybody building in this space is going to face. Second, you're dealing with the richest data source. We spoke about it earlier. We were looking into it, and one minute of video is equivalent to putting two million words into ChatGPT. That's the kind of computational load you're taking on, which is huge. Just a minute of video is worth two million words.

So you're dealing with this really expensive data source, and you have to figure out how to manage it, because it's expensive at every step. If you try to store it, it's expensive. If you try to transfer it over the network, it's expensive. If you try to throw compute at it, if you try to throw tokens at it, you'll very quickly need way more capital than probably exists for startups.

So yeah, you have to be really, really smart about how you deal with this really dense data source.
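
To make the two scale points above concrete, here is a small sketch. The two-million-words-per-minute figure is the one quoted in the conversation; the per-dimension option counts in the example are made up purely for illustration, and the function names are hypothetical.

```python
from math import prod
from typing import Mapping

WORDS_PER_VIDEO_MINUTE = 2_000_000  # figure quoted in the conversation


def word_equivalent_per_day(num_cameras: int) -> int:
    """Word-equivalent load if every minute from every camera were fed to a model."""
    minutes_per_day = 24 * 60
    return num_cameras * minutes_per_day * WORDS_PER_VIDEO_MINUTE


def config_combinations(options_per_dimension: Mapping[str, int]) -> int:
    """Number of distinct camera setups when each dimension varies independently."""
    return prod(options_per_dimension.values())


if __name__ == "__main__":
    print(f"{word_equivalent_per_day(100):,} word-equivalents per day for 100 cameras")

    # ASSUMPTION: illustrative counts only, not measured values.
    hypothetical = {
        "brand": 3000,
        "codec": 4,
        "frame_rate": 6,
        "resolution": 10,
        "bitrate_mode": 5,
    }
    print(f"{config_combinations(hypothetical):,} possible configurations")
```

Even 100 cameras come to 288 billion word-equivalents a day, and a handful of independent settings multiplies out to millions of configurations, which is why both blind tokenization and per-configuration special-casing are off the table.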

Prateek Joshi (15:14.936)
Earlier you talked about how that bare bones MVP got you the first few customers. So let's say now you just closed your first 10 customers. Let's go back to that day. What are the key learnings? At that point, what did you learn from the first 10 that you wanted to implement for the next 100?

Gupta (15:34.19)
Yeah. And I have a different point of view than some other founders on this. A lot of founders look at the first 10 customers as a way to learn what is scalable, so they can go and build a scalable go-to-market engine to get to 100. I actually think that as a founder, you should just go and sign the first ten customers as fast as you can. People spend too much time thinking, is this the right customer or not? Right now you're just getting anybody to try your product. You actually have a very limited understanding, as we were talking about, of what exactly your product needs to be, in what dimensions, and who your customer needs to be.

So to me, you don't wait; you get to the first 10 customers. Then you pause, look at the 10 customers, and say, which are the five customers I need to fire, which do not fit my ICP? They will not scale my business. But you need to get to those 10 really fast and then look at them and ask: should I go small? Should I go to large enterprises? Should I go after this problem or that problem? Should I go after this pain point for this persona? You look at all ten, look at the commonalities, and think through the effort it took to land each one and how big those TAMs are. And then you think about which ICP you really want to focus on.

So the first 10 customers are a way to prune your ICP, or get closer to understanding your ICP. Once you have that, the 10-to-50 or the 10-to-100 journey is more about trying a bunch of go-to-market hacks to find more of your ICP. And hopefully you find some repeatable kernel in that journey so that you can then go to 100 and beyond.

Prateek Joshi (17:00.088)
Among the first 10, how do you choose the five? What metrics or heuristics should I use to pick the five?

Gupta (17:07.662)
I mean, you're looking for a few things. One is a pain point that is uniquely solvable by you. That has to be one of the most important things: for this set of ICPs, you're able to provide something which is genuinely unique. So if you came across another customer with a similar profile, they would value your product or fall in love with your product, because that uniqueness stands out across this ICP. Some other ICPs might have stumbled upon your product, and your product might be just good enough that they kind of liked it and bought it. But who are the people who really, really, really... So that's number one: whose pain points are you solving the most?

Number two is how big of a market you think they represent. Is your ICP just a set of 10 hobbyists and that's it, and you're tapped out? Or is it, oh no, I can repeat this, there are thousands or tens of thousands of them, and they represent this amount of budget. There needs to be a large enough market. Your TAM doesn't necessarily need to be in the billions with this ICP, but it should be a sizeable serviceable addressable market. As long as it's in the hundreds of millions, I think it's good enough to go after.

And third, you're looking at the channels through which you're able to identify and reach them. Are they repeatable? If some ICP scores well on all three fronts, you have a really good ICP.

Prateek Joshi (18:34.552)
And if you were to advise or guide an early-stage founder who just got their first 10 and now has to run a bunch of experiments to figure out how to get to 100, how would you advise them to structure those experiments? Or what should the experiments even look like?

Gupta (18:54.7)
Yeah, I think the...

Gupta (19:02.712)
The way we see it, you can do a lot of experiments on messaging, you can do a lot of experiments on channels, and you can do a lot of experiments on pain points and product features. And you can do some experiments on versions of your ICP. Even if you have an ICP, there are versions of it, senior, junior, larger, smaller companies, that you can iterate on.

And I think I'd do it in the reverse order. You want to experiment the most on different versions of your ICP: larger, smaller, more senior, more junior, newer-age, more legacy companies. Try all kinds of experiments to see where exactly you fit in. You know that this kind of person resonates with you, but get more and more specific. Then you want to get into really understanding which product features or pain points or benefits you can talk about that make them quickly read something or hear something and say, aha, this is great. This makes sense to me. This solves a problem for me. You want to figure that out through iteration, even if you have to build things and test things.

Then there is the whole channel piece, which is: does LinkedIn work better for me? Does Reddit work better, or cold calling, or emails, or billboards, or whatever? And the last bit is more of the branding and marketing messaging, which goes beyond product features, where you're trying to call yourself X for Y. I think that experimentation matters the least in the beginning, because you're just not well known enough for people to be choosing or deciding based on that. So yeah.

Prateek Joshi (20:48.044)
When you're experimenting with the ICP, there's a bit of tension. On one side, you want to find the repeatable kernel and just go after it. On the other side, you want to experiment enough to see what the kernel is. So how do you balance this trade-off between exploration and just going after the thing?

Gupta (21:09.76)
Yeah, so when I say experiment with the ICP, there's a part you don't want to be experimenting with. Hopefully the first 10 customers have told you that this product is bought through developers, or by finance people, or through engineering leaders, or whatever your case might be. So you've got some semblance of who the person is. It's an engineering leader, it's a finance person, it's a developer, it's a marketing person. Great. Okay, I have that understanding.

Now, I don't know if our product is going to resonate with a marketing person in a 5,000-person company, a 50,000-person company, or a 50-person company. So to me, that's experimenting with the size of the customer. Then, does this marketing person have to be an IC? A mid-level manager? The head of the department? Experiment with those things and see who resonates more deeply with your messaging, who creates the budgets, and so on. So it's a lot more about finding variations of your ICP. It might also be across industries: a marketing person in food and beverage is different from a marketing person in automotive. That's the kind of experimentation which is helpful. So you take something, either a sector or a persona, and you stick to it. And then you experiment a bunch within it from 10 to 100.

Basically, you're fine-tuning your targeting approach.

Prateek Joshi (22:38.2)
Right. And when you think about team construction, there are many different ways to build a team. So in this case, how do you think about the first 10 hires? And how is the company structured now in terms of sales, marketing, product, and engineering? How do you think about that?

Gupta (23:00.866)
Yeah, I mean, the first ten hires normally depend on the kind of company you're building, right? Most companies in Silicon Valley tend to be tech-heavy. So you're going to skew very heavily toward engineering and product design, and sales is going to be done by the founders or maybe one person on the team. That ends up being the norm more often than not.

As the company scales, obviously you start adding all the functions. So you add marketing, finance, people, sales, sales ops, yeah.

Prateek Joshi (23:34.528)
Right. Going back to the comment you made earlier about how the software layer can open up so much revenue: it can be very big, but you also have a hardware offering, the IP camera. So when did you add this, and how do you think that product line is going to play out? Eventually, would you expect it to unify all the fragmented brands of cameras?

Gupta (24:03.574)
Yeah, so when we speak about the hardware layer with respect to Spot AI and video, we speak about cameras as the traditional hardware that's been sold in this market. And then there's something that we sell, which is an intelligence box that sits on the network and can talk to all these cameras, which is what makes us camera-agnostic, hardware-agnostic. So we're not in the business of selling cameras. We're purely in the business of doing all the software stuff, for which we need an AI box. So that's a different story.

As for the cameras themselves, if you go to our website, you will see that we provide some IP cameras for free. We can give IP cameras away for free because we realized that you can buy these cameras really cheaply, at the same quality as the competitors', if you're just not marking them up, not branding them, not doing any marketing around them.

And the reason we started offering them is not that we cared whether it's our cameras or somebody else's that you connect to our solution. We had large customers who would say, oh, I also need to buy cameras, because as I'm going to use your software for my existing 100 cameras, I also want to increase my camera deployment to 150 cameras. And we'd say, okay, I can connect you to another vendor you can buy 50 cameras from, because we don't care who you buy those from. And they'd say, no, no, we want to do business with one vendor.

So we very quickly understood that sometimes we would lose customers because they wanted to buy everything related to that purchase through one party, and not having any offering would put us out of the race. We wouldn't check a box. So to be able to check that box, to disrupt the market, and to stay true to our software leanings, we don't charge anything for the cameras. Cameras are free for our customers.

Prateek Joshi (25:52.514)
What has been the biggest technical challenge you've had to overcome to build the company?

Gupta (26:01.462)
The biggest technical challenge shifts, and the next one always looks bigger. There are probably, I don't know, a few thousand camera brands in the world, and we had to figure out how to work with all of them across different customer sites.

It took us about two years to get it right and to be able to support all of that. So that felt like the longest stretch of trying to solve a problem, every time thinking you've kind of got it and then realizing, no, you're two steps behind.

Prateek Joshi (26:44.248)
All right. And when it comes to the business side, basically getting customers. So what has been the challenge on the customer acquisition side that you had to overcome?

Gupta (27:00.398)
I think the challenge has been this. The dominant strategy for most of the late 2010s and the early 2020s has been a very outbound strategy, dominated by lots of GTM tools which allow you to enrich leads and get to know more about your customers, and then send a bunch of automated emails, plus autodialers, which make cold calling easier. The outbound engine just kept getting fine-tuned: the email messages more personalized, the calls more and more personalized. That's been the dominant motion.

Over the last couple of years, we've seen a shift where email, outbound, and cold calling are no longer the dominant motion. I'm sure you get hundreds of emails, thousands, and you barely respond to any, if at all. And that's happening across the corporate chain. With the traditional outbound method, if you went and spoke to somebody five years senior to you who's built a great company and asked, how did you grow this? How did you grow your channels? How did you grow your sales? They would talk about outbound and emails, how they automated it, how they used all these tools, and how they had internal data teams fine-tune all these tools. And suddenly that thing wasn't working. I think a lot of companies have had to go and figure out their own unique GTM mechanics in the last couple of years, and they can't rely on a playbook.

So that was the shift for us. In the first couple of years those mechanics were working, and suddenly they're not working nearly as well. You can't scale something which isn't working. So you have to go and really define and find new ways to reach your customers.

Prateek Joshi (28:44.536)
Right, now peeking into the future of video AI for the physical world, like how do you envision the future for the sector?

Gupta (28:55.628)
Yeah, the future of the sector, to us, is that each of these cameras can have a job description. They will be fully autonomous AI agents. Compare that with a very simple sensor which just looks at machine telemetry or temperature data or environmental data. There's only a very simple job description that you can create for it. If it's an environmental sensor, the job is to control the HVAC and make sure that people are within a comfortable temperature range. So it can look at a weather app automatically, it can look at how many people are in the building and what the temperature is, and it can regulate those things.

Video is extremely powerful in the sense that it's basically a set of eyes watching everything that's happening 24/7. Right now it is still very much treated as a system of record. As you said, video is recorded, seven months from now somebody wants it, what do I do? The mindset is shifting to this: if you had a supervisor who was super smart, watching everything in the operations 24/7, who could actually control machine telemetry or a bunch of peripherals, what would you write this person's job description to be? And how would you use them as a teammate or a cooperator, in the safety or operations team or across multiple teams?

That's going to be the biggest difference. The way these cameras are engineered and interacted with is going to change very much, from a system of record to a job-description mode.
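
One way to picture the "job description" idea above is as a small data structure that maps what a camera observes to what it should do. This is a purely illustrative sketch: the class, the event names, and the actions are all hypothetical and do not describe how Spot AI builds its agents.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobDescription:
    """A camera agent's role plus the duties it carries out when it sees something."""
    role: str
    # Maps an observed event name to the action the agent should take.
    duties: Dict[str, str] = field(default_factory=dict)

    def respond(self, observed_events: List[str]) -> List[str]:
        """Return the actions for the events that fall under this job; ignore the rest."""
        return [self.duties[e] for e in observed_events if e in self.duties]


if __name__ == "__main__":
    # Hypothetical safety-supervisor job for a warehouse camera.
    safety_supervisor = JobDescription(
        role="safety supervisor",
        duties={
            "forklift_near_pedestrian": "sound the floor alarm",
            "blocked_fire_exit": "notify the operations team",
        },
    )
    print(safety_supervisor.respond(["delivery_arrived", "blocked_fire_exit"]))
```

The contrast with a system of record is that the same footage now triggers an action at the moment it is seen, instead of waiting for someone to search for it months later.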

Prateek Joshi (30:25.72)
One final question before we go to the rapid fire round. When it comes to tracking AI advancements, so many things are happening. What advancements are the most exciting to you as it pertains to building the company?

Gupta (30:29.377)
Yeah.

Gupta (30:40.802)
What excites me the most? Yeah, to me, the most exciting thing is the context windows. We all talk about AGI, which is the sense of almost human-level intelligence. And I think most people think of AGI as an intelligence test: if you and I were taking an intelligence test against a machine, the machine would outsmart us.

Whereas what's more important is that all of human creativity and human endeavor happens because you and I can get into a flow state. Like, we can have this half-hour-long conversation. We can get into writing something or thinking about something and think about it for four hours, five hours, six hours, or work with a team on a larger collaborative project lasting days and weeks and months. We can think across time.

And to me, that's the human flow state. And to have human-like intelligence, you have to be able to get into a flow state. So the larger context windows, which are allowing AI to actually think for much longer, connect the dots over much longer spans, and be in a different kind of flow state, are to me the most fascinating thing.

If you want to have a fully autonomous AI teammate tomorrow, an AI agent which can just automatically do an employee's job or work across multiple days and calendars and so on, that's what's blocking it. And to me, the context window is probably one of the largest improvements that we're going to see, and it's going to have a huge impact.

Prateek Joshi (32:08.248)
Yeah, I think that's really exciting. And with each new foundation model, like Llama 4, they're increasing the context window so much. And I agree, I think that's going to be a huge unlock as we build more autonomous systems. All right, with that, we're at the rapid fire round. I'll ask a series of questions and would love to hear your answers in 15 seconds or less. Ready? All right, question number one. What's your favorite book?

Gupta (32:14.477)
Yeah.

Gupta (32:30.734)
Perfect, all right.

Gupta (32:35.95)
It's a book called Atlas Shrugged by Ayn Rand.

Prateek Joshi (32:39.83)
Yeah. Which historical figure do you admire the most and why?

Gupta (32:45.198)
Oh, so I really do like Howard Hughes, from the early 1900s. I think his impact on cinema, aviation, defense, a bunch of things that he touched, was just amazing.

Prateek Joshi (32:58.262)
Yeah, no, that's amazing. What has been an important but overlooked AI trend in the last 12 months?

Gupta (33:06.574)
I mean, we kind of discussed this: the context window. To me, it's not overlooked, but it's just underappreciated in all the discussion that you see in the public space. The only other thing I would say, and this is probably biased by what we do, is that there's so much hype around text and images, but the fact that LLMs are getting really good at understanding long-form videos, which they could not do well even a year ago, is incredibly amazing, because it's basically giving AI a set of eyes to work across the world.

Prateek Joshi (33:47.188)
What's the one thing about video AI agents that most people don't get?

Gupta (33:53.208)
I think the simplest thing is the usability. Like, humans are not going to watch videos. The entire software stack that we built around videos is based around humans watching videos. And when humans are not going to watch videos, the entire usability stack, the UI stack that you're going to build for how humans interact with this data source when they don't have to watch a single minute of it, is completely different. And that's very exciting to build.

is a very large.

Prateek Joshi (34:24.714)
What separates great AI products from the merely good ones?

Gupta (34:29.998)
I think UI drives real ROI in AI. I genuinely think it's a new paradigm for talking to machines, and usability is going to be an incredibly differentiating factor.

Prateek Joshi (34:44.0)
What have you changed your mind on recently?

Gupta (34:47.68)
What have I

It's a good question.

One of the things on a personal level that I've changed my mind on: I've always been somebody who's constantly trying to maximize my day. If I have an extra hour, go read a book, go learn something, go meet a friend and talk about something, or go learn or play a sport. Giving myself the permission to actually recharge for a couple of hours, to say I'm actually going to do nothing and not be productive, has been a big shift. The other thing that's shifted, as a leader from a company-building perspective, is really focusing so much more with your team on the why than the what.

Because when you're younger as a company, you're still obsessed with who's our customer, what are we building, what features, what product. And then all the what has to be outsourced, and you really live in the why realm. And that's a hard shift to do.

Prateek Joshi (35:47.832)
That's actually a good one. Very few people can do that. I think that's a great thing to focus on. All right, next question. What's your wildest AI prediction for the next 12 months?

Gupta (36:01.786)
Wildest AI prediction for the next 12 months. It's a good one. You know, we talked about people who inspire us, and one of the human beings who is universally one of the most inspiring is probably Leonardo da Vinci, just because he was a polymath across science and mathematics and painting and a bunch of other things.

You know, human evolution, or technology evolution, has been such in the last 400 years that we've become more and more specialized. So we become good writers, or we become good product people, or we become really good engineers. And I think over the next 12 months, with these tools, the base level of a product manager, a marketer, or a content writer gets so high that a lot more of us are going to start becoming more polymathic. So to me, that is probably one of the most fascinating things that can come out of it: human creativity is not just unleashed in one specific direction. Most of us can now make movies without knowing how to make movies, or create an artistic image without going through 10,000 hours of painting. And I think that's going to create a bunch of people who are not as specialized and who can really unleash their creativity on multiple subjects.

Prateek Joshi (37:18.988)
That's a remarkable observation and a great way to position that observation too. So I really like that. All right, final question. What's your number one advice to founders who are starting out today?

Gupta (37:31.566)
I mean, it's the simplest advice, which is just go build things. There's like, just build it. There's, yeah.

Prateek Joshi (37:39.96)
Perfect. Rish, this has been a great discussion. Loved your insights on how to build and shape a product. I mean, you're doing the hard thing; actually working with cameras in the physical world is difficult. So thank you so much for coming onto the show and sharing your insights.

Gupta (37:55.15)
Yeah, thanks for having me and thanks for having Spot. Super excited. Thank you.