Infinite Curiosity Pod with Prateek Joshi

Building a Visual AI Platform | Brian Moore, CEO of Voxel51

Prateek Joshi

Brian Moore is CEO of Voxel51, a data infra platform for visual AI. They most recently raised a $30M Series B led by Bessemer.  

Brian's favorite book: Trillion Dollar Coach (Authors: Eric Schmidt, Jonathan Rosenberg, and Alan Eagle)

(00:01) Introduction and setup
(00:22) Defining visual AI — beyond traditional computer vision
(02:14) Why visual data is so hard to manage
(04:17) Common “gotchas” in image and video datasets
(06:43) Is it a data problem or a model problem?
(09:41) The importance of edge cases and scenario analysis
(10:46) Coverage and handling rare events in datasets
(13:35) Using synthetic data and foundation models to fill data gaps
(14:25) The origin story of Voxel51 and the birth of FiftyOne
(17:56) Open source strategy and community growth
(19:31) Handling massive visual datasets — storage best practices
(22:03) Cost vs. quality tradeoffs in video storage
(23:54) Cleaning and indexing messy datasets
(25:49) Measuring real progress — beyond simple metrics
(27:40) Compute bottlenecks and faster iteration loops
(30:05) The economics of data infrastructure
(31:53) Labeling inefficiencies and smarter annotation workflows
(33:56) Hidden costs of data wrangling and wasted engineering time
(35:10) Positioning Voxel51 and lessons for founders
(37:53) The future of visual AI and missing industry standards
(40:36) Rapid Fire Round

--------
Where to find Brian Moore: 

LinkedIn: https://www.linkedin.com/in/brimoor/

--------
Where to find Prateek Joshi: 

Research Column: https://www.infrastartups.com
Newsletter: https://prateekjoshi.substack.com 
Website: https://prateekj.com 
LinkedIn: https://www.linkedin.com/in/prateek-joshi-infinite
X: https://x.com/prateekvjoshi 

Prateek Joshi (00:01.802)
Brian, thank you so much for joining me today.

Brian Moore (00:05.368)
Great to be here.

Prateek Joshi (00:07.548)
Let's start by defining visual AI data. It has been around for a long time, and people have different understandings of what it covers. So if you had to define it, how would you define it?

Brian Moore (00:22.742)
Yeah, so when we talk about visual AI at Voxel51, we're really talking about the multimodal task of ingesting visual and perception data, whether it be image, video, sensor data like radar, lidar, associated modalities, speech, audio, even natural language, and feeding that into a decision making system, an AI system that needs to produce actions or decisions about the real world.

So historically, where one's mind might go is what is called computer vision. Computer vision, as you alluded to, has been around for a long time. It refers traditionally to the lower-level tasks that would make up a reasoning system, like understanding the classification of an image or localizing an object in an image, these very low-level tasks, which of course are still important. And fundamentally, the AI systems that we're using today still understand and can perform these tasks.

But increasingly, what's needed to build systems that can operate in the real world is to go more from raw data inputs, multimodal inputs, all the way to decisions. And so we personally feel that visual AI is a more apt term there to kind of appreciate the origins of computer vision, but also give a nod to the distinct ways in which the technology is advancing now.

Prateek Joshi (01:46.336)
Amazing, I love that explanation. I think it lays a nice foundation: the basics, and the continuity of where we're going. If you look at visual AI data today, there are so many products, so many offerings out there. What's most broken about visual AI data today? Or rather, what's most broken about how we handle and understand visual AI data?

Brian Moore (02:14.892)
Yeah, well, one of the main challenges, and I would also say opportunities, with visual AI is that the data quantity is vast. I like to quote a statistic that something like 95% of all of the bits that go through routers on the internet are actually visual in nature. Makes sense, right? I mean, a 4K video stream running at 60 frames per second is generating an immense amount of content. And that, of course, brings challenges in how you store that data,

the costs associated with processing it, the scale of models on the technical side that are needed to effectively understand and reason about that data, and then, maybe from more of a builder standpoint, the infrastructure required to manage this amount of data, visualize the data, and make decisions about which slices of your ocean of visual data you should actually feed to a model for training. These are very difficult challenges.

And many of the tools that were built in the world of what I call structured machine learning, tabular data, time series data, they really weren't built from the ground up to solve the challenges of visual data. And so as a result, practitioners in computer vision or visual AI often fall back on writing scripts and building their own data management systems and things that just don't scale because they don't have the benefit of the kind of mature tooling that may exist for other modalities to build on top of.

Prateek Joshi (03:40.48)
That's great, and let's go a level deeper. So 95%, that's a staggering number, and I think many people don't realize just how skewed this is, because it's an immense amount of data. So when you look at an image dataset or a video dataset, and somebody's building a simple system, like a reasoning system, they want to look at it, understand it, and make a decision. What are the most common gotchas that you see in an average dataset? The things that people may not notice until it breaks in the real world?

Brian Moore (04:17.867)
Yeah, well, maybe one of the first challenges: when people think about visual data, you might think of images, but the reality is that almost all visual data is captured today as video. I mean, even the photos that you take on an iPhone, let's say, are live photos. Behind the scenes they're capturing multiple frames, and the data really has this temporal component. And so when you're trying to build a model,

Up until very recently, most models that were processing visual data were really just built for images. And so you have this key challenge of how do I go from this large amount of video data with all of these interesting patterns and correlations between frames? And how do I go down to an image data set I can feed to my transformer model or whatever to train it? And to be honest with you, the state of the art there was quite primitive.

You think, I don't know how much data I can process. Maybe I should sample one frame every 30 seconds or one frame every 60 seconds or something like this, right? Which can lead to a lot of lost opportunity to capture some of the nuances. If you were to shift just a couple of milliseconds in a video, you might go from a very clear, crisp image to one that's very blurry because of motion artifacts and things like this. And you may think, I want the perfect data.

Not really, because your system needs to work in the real world and be able to deal with all of the nuances, the bug on the windscreen, or the low light condition that results in very blurry data. So you do want this coverage of all of the different types of scenarios that could exist in the real world. So there's lots of data sampling issues there that are very interesting. The vision, of course, is that you can just take a raw stream of data and feed it directly to a model.

and it can reason about it all as a space-time unit. We're finally getting there with some of the more recent advancements, and it's all just due to sheer data volume and model size. As we get to the next order of magnitude in parameters that it's feasible to train models on, we'll start to see even more visual AI use cases unlocked, because we're getting to the scale where there are enough parameters in a model to learn interesting things about video.

Brian Moore (06:30.958)
So for us as a company in the visual AI space, we're very excited about the new use cases that will be coming online over the next few months due to the scale of processing that now exists.
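
For the technical folks, here is a minimal sketch of the naive fixed-interval frame sampling Brian describes, using Python and OpenCV. The interval and the optional sharpness filter (the kind of "give me perfect data" filter Brian cautions can hurt real-world coverage) are illustrative assumptions, not values from the conversation.

```python
# Naive fixed-interval sampling of frames from a video, with an optional blur filter.
import cv2

def sample_frames(video_path: str, every_n_seconds: float = 30.0, min_sharpness: float = 0.0):
    """Yield (timestamp_seconds, frame) pairs sampled at a fixed interval.

    Setting min_sharpness > 0 skips motion-blurred frames, which boosts image
    quality but can remove exactly the hard real-world cases a model needs.
    """
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(round(fps * every_n_seconds)))

    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
            if sharpness >= min_sharpness:
                yield frame_idx / fps, frame
        frame_idx += 1
    cap.release()
```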

Prateek Joshi (06:43.882)
And when you build a model to look at this data and understand it, let's take a simple use case: there's a live video stream coming in and you have to detect whether or not there's an intruder at the door. A simple system: you have your data and you have your model. Now you look at the performance of the system over time and you see a problem. That's step one.

How can you tell if it's a data issue or a model issue? Let's start from there: at a high level, you see a problem, how do you look at it?

Brian Moore (07:20.172)
Yeah, well, first of all, one of our key findings, from the academic work that my co-founders and I have done as well as from our experience in industry, is that in theory it's all about models and algorithms, but in practice it's always about data and data quality. That's where 90% of the time goes. Any data scientist in the field will tell you that, whether it's headaches around how do I even just deal with the data and build my pipelines, down to what you're asking, which is: hey, when I trained the initial version of a model

and I'm only at 90% performance and I need to get to 99%, where do I invest my time? Is this a data issue? Is it label mistakes or errors in the data? Is it a representation issue? Maybe I haven't defined a data set that's representative enough of the types of scenarios that that model is going to have to deal with in practice. And maybe I have a false sense of my model's performance because I'm looking at the wrong scenes or biased scenes.

For us, we call our approach scenario analysis, where it's very important for subject matter experts to sit down and define all of the different niche, especially edge-case, scenarios in which they expect their system to perform. And when you do your analysis, you can't just look at it in aggregate and say, what's the overall performance of my system, is it 99%? But rather, what's the performance in each of the key scenarios? And

when I'm evaluating model A versus model B, am I getting better performance across the board, in all of these less common edge cases, or am I only getting better in the common cases? Because the edge cases are what matter in practice. Whenever you see a visual AI system deployed into the world, it always works perfectly in the kind of normal, easy case. But where it fails, and what leads to high-profile rollbacks of visual AI systems, is when the system encounters something that it didn't expect.

I love Andrej Karpathy, formerly at Tesla. He had a presentation about the state of the art in self-driving. And he would point out a basic task like detecting a stop sign. You might think that's a pretty easy use case. How hard can it be? Let me just take some photos of stop signs, I'll train the model, and it'll be perfect. And then when you think about it a bit more, you realize, well, wait a minute. There are stop signs that exist in

Brian Moore (09:41.775)
the US, maybe they look different in other countries. And then of course, stop signs can be occluded by trees, the lighting conditions can change their color, so you can't just rely on it being red. Maybe you have a truck driving down the highway that's carrying a bunch of stop signs in its bed. Your system has to be able to understand all of those nuances. And if you can't distinguish a stop sign in clear view at an intersection from a stop sign that's in the back of a truck, you don't have a reasoning system.

And so those edge cases and nuances, they really matter in practice. And so we try to think about evaluating models through that lens of what scenarios is it supposed to perform in.
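
To make the scenario analysis concrete, here is a minimal sketch of breaking evaluation results down by scenario tag rather than reporting one aggregate number. The tags and records are hypothetical examples, not data from the episode.

```python
# Per-scenario accuracy instead of a single aggregate score.
from collections import defaultdict

def per_scenario_accuracy(records):
    """records: iterable of dicts like {"scenario": str, "correct": bool}."""
    totals, hits = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["scenario"]] += 1
        hits[r["scenario"]] += int(r["correct"])
    return {s: hits[s] / totals[s] for s in totals}

# Hypothetical results: great on the common case, terrible on the edge cases
results = [
    {"scenario": "clear_intersection", "correct": True},
    {"scenario": "clear_intersection", "correct": True},
    {"scenario": "occluded_by_tree", "correct": False},
    {"scenario": "sign_on_truck_bed", "correct": False},
]
print(per_scenario_accuracy(results))
# {'clear_intersection': 1.0, 'occluded_by_tree': 0.0, 'sign_on_truck_bed': 0.0}
```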

Prateek Joshi (10:25.792)
If you look at the edge cases, it's a long tail of edge cases in the real world. So how do you think about coverage? Especially in visual data, the edge cases can be insane. So how do you think about coverage and making sure that a system can work with very, very high reliability?

Brian Moore (10:46.05)
Yeah, super basic thing first. We find that so many people forget to actually look at their data. It's so easy and tempting to build pipelines that automate things and to think about evaluating a model's performance as a spreadsheet problem, like I mentioned before. If the performance is better numerically, then it must be better, right? But if you actually sit down and look at the performance, evaluating the performance of a model on real data,

that can really lead to key insights on whether the performance of the system is truly acceptable in different scenarios. So thing one is just to look at the data. So suppose you're committed to looking at it. The next thing, like we mentioned, is you have to really sit down and describe all the different types of scenarios that the system needs to perform in. And then inevitably, you're going to find a bunch of rare situations. And then the question is, how do you get data to train your model to perform in those rare instances?

So there, there's this active learning or data mining challenge, where you know how to describe what it is that your system is maybe weak at and needs to do better on, and you need to go find examples of that. So one technique is definitely to leverage, let's say, foundation models to do similarity search in your data lake. So very concretely, set up a world where all of your raw data gets indexed so you can perform a query.

There are multimodal models, like CLIP, for the technical folks in the audience, that allow you to do a search. So you can describe in natural language the key elements of the scene that you need to find examples of, and then you can go mine that data from your data lake. So when you think about updating your training data, you're not just throwing in a bunch more data at random. If you do that, you'll just get more of the typical case, and your model will get no better at these outliers.

But if you target your search and augment your dataset in the areas where it's specifically underrepresented or weak, then you'll get the biggest improvement in model performance. If you don't have examples, synthetic data is another great technique. This is more of an emerging one, but especially in physical AI, the capability to generate photorealistic reconstructions of scenes through neural reconstruction models and things like this is getting to the point

Brian Moore (13:03.619)
where, let's say you're trying to train a self-driving car and you want to understand how it performs when there's a delivery truck on the side of the road, you need to cover not just, you know, UPS trucks, but also FedEx trucks. Simulation models are to the point where you can actually come in and say, hey, replace this UPS truck with a FedEx truck, or change the weather conditions and so on in the scene. And this, of course, is a great way to increase coverage of the rare scenarios that you need to invest in.
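
As a sketch of the text-based mining Brian describes, here is what it might look like with the open source FiftyOne package and a CLIP model from its model zoo; the dataset name and the query are hypothetical, and this assumes the FiftyOne brain similarity API.

```python
# Index a data lake with CLIP embeddings, then mine a rare scenario by text query.
import fiftyone as fo
import fiftyone.brain as fob

dataset = fo.load_dataset("my-driving-data-lake")  # hypothetical existing dataset

# Build a similarity index backed by CLIP so samples can be queried by text
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="clip_sim",
)

# Describe the underrepresented scenario in natural language and review the hits
view = dataset.sort_by_similarity(
    "a delivery truck parked on the side of the road",
    k=100,
    brain_key="clip_sim",
)
session = fo.launch_app(view)
```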

Prateek Joshi (13:35.552)
That's a great explanation. And I think that helps us understand how to tackle the problem in a practical way, because on paper it can hide a lot of problems when you're inside papers and inside your lab. I think "just look at the data" is always a good piece of advice. And then obviously, once you spot the issue, go collect data to solve that puzzle. So that's great. Moving into...

the company. You're the CEO of Voxel51, and you also have an open source component called FiftyOne. So just a quick high-level tour: what does Voxel51 do, what does the open source package do, and where's the line between the open source features and who should upgrade to the paid offering?

Brian Moore (14:25.975)
Yeah, definitely. So first of all, our mission is to improve AI performance through better data. Because as I mentioned before, in practice it's all about data; in theory, it's all about models. So we exist to help solve those key data challenges and get visual AI systems into production. Maybe to take a step back for a second, the genesis of Voxel51 came from the University of Michigan. I did my PhD there in machine learning

a few years back and met my co-founder Jason, who was a computer vision faculty member at the university. We actually started collaborating on research projects and then had an opportunity to work on a grant from NIST, the federal government agency, to work on deploying computer vision to public safety use cases.

So, integrating computer vision onto camera networks to help reduce first responder times in cities that have large camera networks, to keep people safer. Anyway, it was a great test bed to develop early versions of our technology. You know, we were located in Ann Arbor, Michigan, just down the road from Detroit, and so we had the chance to collaborate with some automakers to develop early versions of L1, L2 autonomy features, you know, 10-plus years ago.

So it was through those experiences in industry, solving real-world problems, that we really connected with this need for investing in data quality. And because we didn't find any sort of data infrastructure off the shelf that we could build on top of, we had to build it internally. And the more conversations we had with other companies, large and small, in the space, the more we realized that everybody seemed to be building their own data stack and data tooling for visual AI internally.

And we said, aha, there's something there. And what we chose to do actually as our official business model was open source it and give it away for free and see what happens, which was actually the perfect thing for us to do. Because I think as a founder, a technical founder, there's a lot of temptation to just go into the lab and build a bunch of tech and nerd out on all of the nuances and theoretical aspects of what the world's best data infrastructure could look like.

Brian Moore (16:44.907)
But all of our most important learnings came from just open sourcing, giving it away, and then having conversations with users. It was through that open source process that we met many of our early adopters of the product, learned more about their use cases, what things they were liking about the tool, what things they wished they could do, and to your point about our commercial strategy and the difference between our open source project, which still exists. We still invest in it. We still love it.

We're very bullish on being the world's most easily accessible data infrastructure platform: download our open source tool and use it for free. But we heard from enterprises that they liked using the tool as individuals with data stored locally, but they wanted to collaborate with their teammates on data stored not locally but in the cloud. And they wanted to scale up some of the workflows they were doing, like labeling data or evaluating models, and connect to their data lake.

And to do that, they wanted our software to connect to their cloud. And so those became sort of natural features of our enterprise version of the tool, which is appropriate for customers that want to collaborate on large scale data as a team.

Prateek Joshi (17:56.832)
And the open source tool has gained momentum. So tell us quickly, what's the footprint? How many downloads, how many people using it? Just paint a picture of that.

Brian Moore (18:09.368)
Yeah, absolutely. So it's been around for a few years now: 10,000 stars on GitHub, millions of downloads, representation across verticals. So as a business, and also with our open source product, we support verticals and use cases like autonomous driving, certainly, defense, agriculture, retail, advertising, robotics, manufacturing. So we're really touching all of the very exciting and diverse use cases of visual AI

in production. We definitely invest a lot in interacting with our community. Like I mentioned, it was the way that we got some of those key insights into what enterprises really needed from their data infrastructure. And we continue to invest in that today. We've got quite a large community team for a company of our size that puts on literally hundreds of events per year, whether it's organizing opportunities for folks to come present their research or what their company is doing with visual AI,

or running hackathons or virtual webinars about best practices for using our software. We really try to invest in building a community around our open source project, because it's our stated intention for it to remain free and open source and as powerful as we can possibly make it for individuals in perpetuity.

Prateek Joshi (19:31.616)
Going into addressing the size issue here: the amount of data is vast, and any company dealing with video data just has to deal with insane scale. So as a starting point, how do you advise teams, people who have to store big vision datasets, so that it's cheap, fast to access, easy to organize, easy to model, and hopefully a bunch of other things? So, starting point: storage.

What's your advice to teams dealing with massive data sets?

Brian Moore (20:05.412)
Yeah, so maybe unlike other types of AI, in visual AI the raw media, the images, the video, the sensor data, is orders of magnitude larger than all of the metadata that you might have associated with it. By metadata, I mean basic stuff like timestamps and properties, and other more AI-specific things like the locations of objects and so forth.

Virtually everybody will store the raw media in cloud storage, cold storage, buckets, things like that. The metadata is very unstructured, and that's exactly the problem that we're helping solve for our customers. You ingest all of the metadata into FiftyOne, and it becomes indexable and thereby searchable. So when you want to visualize and query your dataset by all of that interesting metadata, including

more semantic things, like I mentioned before. If you're trying to find rare edge cases in your dataset, you need to index it with embedding models so you can do a search by similarity: find me similar images, or find the images or video samples that match this natural language query. That's important metadata as well. So the best practice for sure is to have all of that metadata ingested into

something like FiftyOne that makes it all searchable, visualizable, and queryable, so that you can truly find the right corners of the dataset that you need to train on or improve and so forth.
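
As a rough sketch of that pattern with the open source FiftyOne package: keep the heavy media in storage, and ingest the metadata so it can be filtered and queried. The field names, paths, and values here are hypothetical.

```python
# Ingest metadata about media files, then query by it.
import fiftyone as fo

dataset = fo.Dataset("warehouse-cameras")

sample = fo.Sample(filepath="/data/videos/cam01/clip_0001.mp4")
sample["location"] = "dock_door_3"              # hypothetical metadata fields
sample["captured_at"] = "2024-05-01T06:30:00Z"
sample.tags = ["low_light"]
dataset.add_sample(sample)

# Once ingested, the metadata becomes queryable
low_light = dataset.match_tags("low_light")
print(low_light.count())
```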

Prateek Joshi (21:36.712)
And in an ideal world, I would store every single frame at the highest fidelity and it would be fine. But let's say you're talking to someone who's like, okay, these videos are getting out of hand, I've got to do something about the raw data. What are the measures, and what do you tell them to do, so that without losing useful detail they can keep the costs down and still keep almost all of the detail?

Brian Moore (22:03.918)
Yeah, so one interesting thing about visual AI is because of the immense volume, like if you're capturing raw 4K video at the edge on some physical device in the real world, it's oftentimes completely infeasible to even get all that data back to any kind of central hub, right? I mean, even just bandwidth costs, your cellular bill or whatever would be too much. And so visual AI systems have to be designed to operate at the edge.

So much of the inference and the raw data processing happens only at the edge. And so a best practice for developers of visual AI systems is to log events at the edge. Let's take autonomous driving: if there's a hard-braking event or some sort of anomaly, tag it and then send that event data back. And there you want to send the full-resolution data, because this is a very unique or interesting or important scene that you're likely going to want to incorporate into your training data.

So log events and store them at high resolution in your data lake for training. But beyond that, we're seeing less of an issue from our customers in terms of being willing or able to store the raw data. They can store vast amounts of raw data. The real question is, which data should you train on? The real limiting resource is watts, the power that you're willing and able to dedicate towards training your model.

And that's where it becomes very important to select the right data for training. If you feed your model more examples of typical situations, like I argued before, it's not going to move the needle on those rare scenarios that truly dictate whether a system is ready for production or needs to stay in development. So that data selection problem is where the most important investment is.
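
A minimal sketch of the edge-side event logging Brian describes: watch for an anomaly (here, a hard-braking event), keep the surrounding full-resolution frames, and queue them to be sent back for training. The threshold and buffer size are illustrative assumptions.

```python
# Keep a rolling buffer of full-resolution frames and ship it when an event fires.
from collections import deque

FRAME_BUFFER = deque(maxlen=300)   # ~10 seconds of 30 fps full-resolution frames
HARD_BRAKE_MPS2 = -6.0             # hypothetical deceleration threshold (m/s^2)

def on_frame(frame, longitudinal_accel_mps2, upload_queue):
    """Called once per captured frame on the edge device."""
    FRAME_BUFFER.append(frame)
    if longitudinal_accel_mps2 <= HARD_BRAKE_MPS2:
        # Tag the event and send the full-resolution clip back to the data lake
        upload_queue.put({
            "event": "hard_braking",
            "frames": list(FRAME_BUFFER),
        })
```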

Prateek Joshi (23:54.312)
And if I give you a huge, messy dataset and the goal is for you to probe and understand it, how would you do it? What tools do you have at your disposal to poke at it? And also, what are maybe the top three lowest-hanging fruit that usually move the needle on the messiness of the dataset?

Brian Moore (24:17.166)
Yeah. So first of all, it's such an important problem, so important that the typical enterprise that we work with has entire teams of dozens or hundreds of data engineers that are responsible for building high-quality pipelines to extract all of the right metadata that they care about. So depending on their use case, they're building pipelines where every time some raw data comes in, they're tagging it according to a certain set of, let's say, foundation models that they've trained internally or pulled off the shelf,

so they can augment that raw data with all of the metadata that they can find. Again, let's take an example, say in agriculture. If you're interested in understanding weeds or different varieties of crops or optimizing for weather patterns, you want to tag all of that raw data with all of your best models so it has as much metadata available as possible, so that when it goes into your data lake, you're going to be able to filter and query by all of that.

And the answer to what are the right models to run to tag the data, it's really use case dependent. It depends on the problem that that person's working on. In some cases, using off-the-shelf models might be sufficient. In other cases, you really need to fine-tune and develop your own models to do that tagging. As an infrastructure provider, it's our goal to make it easy for you to upload or choose the right models that you want to use to index your data. But that's definitely the key thing: index the data,

store as much metadata as you can about it because that's going to be absolutely key to being able to search and utilize it downstream.
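
As a sketch of what that tagging step might look like with the open source FiftyOne package and an off-the-shelf detector from its model zoo; the dataset name, model choice, and label field are hypothetical.

```python
# Enrich raw data with model predictions so the data lake can be filtered by them.
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = fo.load_dataset("field-imagery")  # hypothetical existing dataset
model = foz.load_zoo_model("faster-rcnn-resnet50-fpn-coco-torch")

# Store the predictions as metadata on every sample
dataset.apply_model(model, label_field="auto_labels")

# Now you can filter and query by what the model found
trucks = dataset.filter_labels("auto_labels", F("label") == "truck")
print(trucks.count())
```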

Prateek Joshi (25:49.566)
And when you have to measure the rate of progress or improvement, how do you keep yourself honest in the sense that it's easy to get better at the easier use cases and discard the hard ones? So how do you keep yourself honest on metrics that matter and not just getting better on the easy use cases?

Brian Moore (26:11.408)
Totally, yeah. That just gets back to our core principle: look at your data. Never, ever evaluate the performance of a system by just looking at its performance in a table or a spreadsheet. Always build a gold standard dataset where you've all agreed, these are the specific scenarios, especially for visual data. Because as humans, our best sense is our perception. Many times, computer vision problems suffer from this perception from humans that, oh, that's easy. Because I, as a human, with

vision as my best sense, can exactly identify a tiny car that's like one pixel in the distance of an image, because I'm able to understand all this context and so on. That stuff matters. If you get in the habit of defining gold standard datasets and then reviewing, visually in addition to numerically, the performance of your model in different scenes, you'll get a much better understanding of its intuitive performance. And if you're deciding between model A versus model B, there are lots of

second-order conclusions that you can draw about the strengths and weaknesses of these different models. Even at the same level of performance, you might find one is far better at edge cases and one is far better at common cases, and that can be an important go/no-go decision for whether you should ship that model, or ensemble them, or something else. But all of the key insights, in our experience, come from looking at the data and looking at the model output as part of the development loop.

Prateek Joshi (27:40.384)
Let's talk about compute and iteration speed. So people often complain that, hey, training takes days or weeks and it's a long, long process. So if you had to help them, guide them, find the speed-ups, where is the fertile ground? Is it data selection, scheduling, getting more compute? Where is the bottleneck here?

Brian Moore (28:07.95)
Yeah, we're certainly hearing that compute is a key bottleneck. Even when we work directly with some of the large hyperscalers, even their internal teams lack access to enough compute to run their experiments. So if that's the case inside the companies that are developing and providing these clouds, then that problem can only be exacerbated for

companies at the next level of the stack. So yeah, to your point, you have to be very decisive about the data selection strategies you're using to build the next training dataset for your model. Of course, people use techniques like not training models from scratch: they'll take a model off the shelf and then fine-tune it for their specific use case. That can often be a big head start, though it's not suitable in all cases. Certainly for some of the more niche use cases of computer vision,

like defect detection or other use cases where you have a camera in a fixed location with pretty constrained environments, we're finding that the fastest path for some of our customers is not to train some big, huge, power-hungry model, but to train a smaller expert model that's intended for one specific task. That can often be a much faster and more cost-effective path.

Because after all, for visual AI, you normally have to quantize or otherwise distill the models into something smaller so it can actually run at the edge. It doesn't do you a lot of good to have some 7 billion parameter model that has fantastic reasoning performance if you don't have the power or device footprint capability to even run something like that at the edge. And so there's a lot of work that goes into distilling models. But in some cases, it's better to just

train a smaller model from scratch from the start and kind of bypass a lot of that expensive distillation.

Prateek Joshi (30:05.364)
Let's talk about the economics of data infra. Companies spend tens of millions, actually hundreds of millions, on just data infrastructure. It covers a vast amount of work, but they spend a lot of money. Now, if you had to look at this landscape and identify the waste, where do companies waste the most money when it comes to data infra?

Brian Moore (30:32.42)
Yeah, so probably the biggest one I would point to is how you get your data labeled or annotated. So of course, when you're training a supervised model, you need to provide examples of the intended output to train the system, whether you're doing it from a reinforcement context or from scratch as a traditional supervised system. So a big cost center in an ML project is how you're getting your data annotated.

Historically, you might gather a bunch of raw data and send it off to an outsourced firm or something like this for humans to draw boxes or tag data, caption data, whatever the use case may be. A real quote from one of our customers represents a typical outcome: they sent off a big seven-figure data collection campaign, the data came back nicely annotated, and then they only used 1% of it,

because the data that came back was just more of the typical scenarios that they already had perfect coverage and performance on. So especially given how expensive the data labeling campaigns are, it becomes very important to choose the right data, not just more data to label, because the limiting factor is often not how much data your model can process, but how much you can afford to actually get labeled so that your model can train on it. So that's key.

Prateek Joshi (31:53.404)
Yeah, this is great. So going in that direction, data labeling, what's the second most wasteful place here?

Brian Moore (32:04.206)
Yeah, well, just to double down on that data labeling piece. We believe that the reality I just described, whereby data labeling is just a process of picking some data and outsourcing it for some humans to draw boxes on, that's kind of the past. The future is that annotation is much more of a software problem. As an example, there are very powerful foundation models available

that might know quite a lot about your use case. And so a much faster path to bootstrapping a high-quality training dataset might be to first send all of that raw data to these models to do some auto-tagging, and then have humans come in and do much more, you know, higher-fidelity QA work, thereby getting to the same end goal of a high-quality, diverse training dataset at much less cost and in much less time.

So we definitely question the status quo that the right thing to do is just to outsource data labeling. Instead, think of it as a software problem and integrate it directly into your data platform. Not only for cost, but also because, you know, we're seeing a lot of interesting changes happen in the market where annotation companies or data engine companies are getting scooped up by hyperscalers or otherwise changing hands, which calls into question, as an enterprise, should you be trusting your data

with an outsourced labeling provider? Or is it just one acquisition away from all of your data, your tasks, your training recipes falling into the hands of a competitor? So if annotation is really a software problem anyways, why not bring it in-house and use a platform that allows you to do your data labeling in a secure, internal way in your cloud? That's what we're providing.
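
A minimal sketch of that model-assisted flow, assuming FiftyOne: pre-label with a model, then route only the low-confidence samples to human QA. The confidence cutoff and names are illustrative.

```python
# Pre-label with a model, then tag low-confidence samples for human review.
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = fo.load_dataset("to-be-labeled")  # hypothetical raw dataset
model = foz.load_zoo_model("faster-rcnn-resnet50-fpn-coco-torch")

dataset.apply_model(model, label_field="pre_labels")

# Any sample containing a prediction below the cutoff goes to a human
needs_review = dataset.match(
    F("pre_labels.detections").filter(F("confidence") < 0.5).length() > 0
)
needs_review.tag_samples("needs_human_qa")
print(f"{needs_review.count()} of {dataset.count()} samples routed to QA")
```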

Prateek Joshi (33:50.496)
So outside of labeling, where else do companies waste money?

Brian Moore (33:56.173)
Oh boy, I mean, I guess there's loss all over the place. In our context, specifically focused on data, we find in surveys of our customers that before our product, their data engineers will spend over half of their time on what we call wrangling data. This means, you know, building their own scripts and their own tools to store data and manage data, and notebooks to run experiments, and so on.

So your developers are, of course, one of your most expensive resources in putting together an AI product. And so investing in tooling that makes your team more effective can be a huge time savings and cost savings. Our mantra is always: you are the subject matter experts in your use case. So you need to put your team in a position where they can use their skills as experts in agronomy or autonomy or what have you,

and offload all of the infrastructure work to a company with a proven track record in the space that has solved that problem and is doing it in a best-of-breed way across your industry. Innovate in your subject matter expertise, not in infrastructure.

Prateek Joshi (35:10.144)
Let's move to you as a founder and a company builder. So as a starting point, you've been doing this for a while. How do you position Voxel51 in the market? And maybe part B: if you had to advise a younger, earlier-stage founder on positioning something in the mind of the customer, what would you tell them?

Brian Moore (35:35.473)
Yeah. Well, first of all, how we position Voxel51: we want to be the development platform for building visual AI. So we're focused on development and on visual or multimodal AI specifically, rather than falling victim to trying to offer a sort of generic, you know, all-things-AI type solution. Especially when you focus on infrastructure, and especially data, we find that it's much better to invest in being flexible and extensible as opposed to building some sort of

walled garden flashy product that has like very opinionated workflows, especially in a space like AI and visual AI, where the best practices aren't well understood, the pace of innovation is extremely high. We found that it's the winning strategy to prioritize extensibility and being able to, for example, deploy into a customer's infrastructure because data is so private and important to them, or being able to quickly adapt to integrating with the latest model when it comes off the shelf.

or that internally built workflow that enterprises have. So flexibility, extensibility, that's definitely key to our positioning. But even taking a huge step back from that, as I mentioned earlier, as a technical founder, don't bury your head in the sand and build a bunch of tech. All of our key insights came after we decided to open source, build in the open, give it away for free. That really optimized for having discussions with customers. And then after enough, you know,

conversations were under our belt, it kind of became obvious to us what the market wanted in terms of a product, and they answered for us what our paywall should be. For example, in our case, we started with just image and video data, and then over time we're becoming more and more multimodal, which means adding support for new modalities: 3D, LIDAR, captions, audio, time series data.

And because we invested in building a community through our open source tool, we were able to listen to our customers and have them tell us, through their votes, their feature requests, their usage data, which are the right emerging verticals or use cases that we should go into next, rather than just taking a guess and hoping that we're correct.

Prateek Joshi (37:53.884)
I have one final question before we go to the rapid fire round, and this is about the future. So part A: what AI advancements are the most exciting to you as they pertain to Voxel51? And part B: if you could set one industry standard for vision datasets today, what would it be?

Brian Moore (38:17.616)
Wow, great question. So, you know, what am I most excited about in AI? Even from the very beginning, I felt like I had to somehow apologize that computer vision has been around forever and people kind of see it as a solved problem. And yet, if you think about it, like I argued, almost all of the data that's captured and exists in the world is visual in nature. And if you think about it, we don't have fully

deployed physical AI systems that are automating, you know, human activities, saving lives, saving us time. So I'm very excited that we finally got to the scale of models and datasets where some of our algorithmic advances are ready to start crunching on visual data. So I'm very bullish on the next ChatGPT moment being in visual and physical AI, which I think is, you know, a long time

overdue and represents the biggest opportunity in terms of just raw use cases and data volumes in the space. Data standards: there's definitely no one standard, especially for visual data. I like to think that we're doing our part in providing a popular product, both open source and commercial, to standardize the storage of data. So I'll throw our hat in the ring: check out FiftyOne.

But I do think that flexibility and extensibility are key. There's this funny thing in ML research where a lot of the research in the community has catalyzed around benchmark datasets that maybe an academic group releases, whether it be the ImageNet dataset or the COCO dataset or something like this, and whatever format the researchers chose to distribute the data in kind of becomes a de facto format.

Like, you need to be able to share data in COCO format; otherwise, many systems won't be able to accept it. So it happens kind of by accident. But I think that we could benefit from sitting down and standardizing on formats that are more broadly interchangeable and optimized not just for sharing data, but for processing it, querying it, indexing it, and so forth. So I'm excited to see how that evolves.

Prateek Joshi (40:36.512)
Perfect. With that we're at the rapid fire round. I'll ask a series of questions and would love to hear your answers in 15 seconds or less. You ready? Question number one. What's your favorite book?

Brian Moore (40:44.912)
All right.

Brian Moore (40:50.736)
So it always changes, so I'll give you my favorite book of the last year: Trillion Dollar Coach. This is a book about Bill Campbell, who was this fascinating CEO leadership coach. He's a football coach by background who somehow wound up advising Eric Schmidt, Larry and Sergey at Google, Steve Jobs at Apple. So it's fascinating to hear how a football coach

steered the world's largest tech companies behind the scenes.

Prateek Joshi (41:25.792)
Which historical figure do you admire the most and why?

Brian Moore (41:29.934)
Hmm. So I guess it's slightly bittersweet that I can say historical figure, but I'm a huge fan of Berkshire Hathaway. So, Charlie Munger, fascinating guy. As my guilty pleasure, I love listening to Berkshire Hathaway annual meetings. If you're familiar, Charlie and Warren Buffett are hilarious. They're insightful, and I love the long-term perspective. But one thing that stood out to me about Charlie is that he

Prateek Joshi (41:51.828)
Yeah.

Brian Moore (41:58.531)
always tried to associate himself with really intelligent and well-rounded people. He would hold these dinners and invite all of these luminaries in different spaces and just pick their brains and learn from them. I definitely feel like you are the average of the people you associate with, so that's awesome.

Prateek Joshi (42:17.472)
I'm a huge fan too, and I have a collection of all the newsletters throughout the decades. So it's fascinating. I've watched the videos. Yeah, I'm a huge fan. All right, next question. What's the one thing about visual AI data that most people don't get?

Brian Moore (42:24.216)
Yeah

Brian Moore (42:34.884)
Hmm. Well, I guess I spilled the beans already, but how prominent it is. The vast majority of bits that go through routers on the internet are visual in nature. So just think about the huge opportunity, even if it's all about tokens and watts as the new economy of the AI revolution. If there's that much visual data, then there's clearly a huge market opportunity once we're able to effectively reason on it.

Prateek Joshi (42:39.872)
Hmm.

Prateek Joshi (42:58.58)
What separates great AI products from the merely good ones?

Brian Moore (43:03.034)
Hmm. Another thing I mentioned before a little bit, especially in a fast moving space, let's assume you're building something in the infrastructure layer for AI, you need to prioritize flexibility and extensibility. Nobody wants a product that's purpose-built for a model that came out two years ago, because two years is forever. The state of the art in models and algorithms and data formats and so forth has definitely changed since then.

And if you're providing a tool that's flexible enough for you to plug in the latest and greatest model, technique, and so forth, now you've built something that is going to be relevant in the future.

Prateek Joshi (43:42.602)
What have you changed your mind on recently?

Brian Moore (43:46.723)
So I used to be a night owl. I feel like in graduate school, the universe just wants you to stay up late and all of your deep thinking happens after midnight. For me now, I'm a new dad, and 6am to 8am is the sweet spot. After, you know, the morning feed, I can finally get some work done and stay heads down. So I have to be a morning person now. I'm still adjusting.

Prateek Joshi (44:14.356)
That's amazing. I went through a very similar journey, and now I feel like morning is just way better. I think many people never get to experience it, but when you do, it unlocks a whole new level. So, next question. What's your wildest AI prediction for the next 12 months?

Brian Moore (44:34.18)
Hmm. I think I've got to double down on my claim that annotation is a software problem. If you're familiar with the data annotation or data labeling space, there have been some very high-profile companies and tons of investment into data labeling, for good reason. It's very important, but don't sleep on the fact that it's no longer a human problem. It's a software problem. And the smart money is on investing in tools that you can use to auto-label data.

Prateek Joshi (45:02.922)
Final question, what's your number one advice to founders who are starting out today?

Brian Moore (45:08.752)
So my head of community, our chief community officer on the team, is a really awesome guy with decades of experience in open source and building software products in the data layer. He always says that people, enterprises especially, don't buy software from companies. They buy it from the people at the companies.

A lot of our strongest relationships with our customers came even when there was some big problem that happened right after deployment. But because we stepped up to the plate and we supported them and helped them understand that we're here to build alongside you and make this work, they became our biggest champions. They became our strongest customers. They definitely were less so buying the product from the startup because, you know,

Are you really buying a product from a 50-person startup because you believe that it's technically superior in every possible way to something that's been built by thousands of engineers? No, you're betting on someone being responsive to you, being nimble, able to adapt to the specific things that you need. So that mindset has definitely helped us.

Prateek Joshi (46:20.8)
Amazing. Brian, this has been a great discussion. As an engineer, I've spent the longest time in computer vision, image processing, visual AI. So this has been a wonderful way to kind of go down memory lane. So thank you so much for coming on to the show and sharing your insights.

Brian Moore (46:38.243)
Amazing. Enjoyed it. Thanks, Prateek.

Prateek Joshi (46:41.248)
All