What the futr
What the futr is a biweekly podcast that explores the intersection of AI, sales, and humanity. Hosted by Sandesh Patel and Chris Brandt, each episode features AI startup founders and tech leaders sharing real stories, their value proposition, and visions for the future—structured like a smart first-call sales meeting. It’s all about making AI make sense for businesses—and helping people stay informed, not left behind.
Inside the GPU Arms Race: Suresh Vasudevan, CEO Clockwork Systems – Ep 06
Welcome to Episode 6 of What the futr.
In this episode of What the futr, Sandesh Patel and Chris Brandt sit down for a one-on-one conversation with Suresh Vasudevan, CEO of Clockwork Systems and one of the most respected leaders in enterprise infrastructure.
Join the conversation as they discuss the hidden bottlenecks of AI infrastructure, the ongoing shift from speed to resilience, security, observability, and automation, Suresh's journey from India to the CEO seat at Clockwork Systems, and insightful advice on building leadership that actually works.
This episode is a rare insider breakdown of the biggest shift happening in computing today, delivered by someone who has built and led billion-dollar infrastructure companies.
If you think GPUs are the end, stay tuned to learn what’s actually coming.
Subscribe for regular episodes.
👉 Explore the platform: https://futrconnect.io
👉 Business inquiries: sandesh@futrconnect.io
00:00:00:07 - 00:00:28:16
Sandesh
Thank you for tuning into another What the futr podcast. I just got done recording a podcast with my good buddy Chris Brandt and our special guest, Suresh Vasudevan. You might remember Suresh from his days at NetApp. He was a rock star there and then went on to be CEO of Nimble, which IPO'd and was later sold to HPE. He then became CEO of Sysdig, and ultimately he's now at Clockwork, where he is also CEO.
00:00:28:18 - 00:00:51:14
Sandesh
If you know Suresh, you know him to be incredibly brilliant and talented and very well respected as a leader. But he's also a really good dude as well. We talked about things like leadership and culture, and then of course we led into what Clockwork does, which really solves a lot of problems in AI infrastructure. I learned a lot.
00:00:51:14 - 00:01:06:11
Sandesh
I didn't realize some of the complexities that went into building out these environments. But it was really, really cool. I hope you guys enjoy the show, and thanks for tuning in.
00:01:06:13 - 00:01:12:15
Sandesh
We got to change the game.
00:01:17:01 - 00:01:42:15
Suresh
Earlier this week, we had probably three customer meetings that were all really, really good, with some key customers. And this weekend, I went on something called an SF Hills ride, which I've not done in ages. It's a five-hour bike ride through all the hills of San Francisco.
00:01:42:16 - 00:01:48:18
Suresh
And that was enormously fun. I was not sure if I was fit enough to do that, but ended up being a lot of fun.
00:01:48:20 - 00:01:49:18
Sandesh
So that's.
00:01:49:20 - 00:01:51:20
Suresh
That's my last few days.
00:01:51:22 - 00:01:57:14
Sandesh
Oh, that's amazing. I also saw the webinar you did on Tuesday.
00:01:57:16 - 00:02:07:18
Suresh
That's right. The panel discussion, or the webinar, sorry. Was that yesterday, I think? Or... I'm confused. No, day before yesterday. You're right.
00:02:07:20 - 00:02:08:20
Sandesh
So. No, that was.
00:02:08:20 - 00:02:11:08
Suresh
Really good fun. Enjoyed it thoroughly.
00:02:11:10 - 00:02:34:11
Sandesh
Yeah, that was a great conversation. I talked to Joe, obviously, so I've already gotten the Clockwork pitch, and then I was able to listen to it, and a lot of things really started clicking for me. And for me, I've been selling infrastructure my entire career, right? So this is cool. We needed that.
00:02:34:12 - 00:02:36:08
Sandesh
That's the coolest thing I've seen in a while.
00:02:36:12 - 00:02:58:12
Suresh
I completely agree. And I think, you know, over the last 15 years we've become so used to the cloud abstracting away all infrastructure, almost not having to think about networking, not having to think about storage, not having to think about compute. It all just comes as a service, so you're really operating above that layer for the most part.
00:02:58:14 - 00:03:19:23
Suresh
And then with the world of AI and AI infrastructure, data centers are back with a bang. It's hard, for sure. Actually, one thing I will tell you: when we talk to so many of these neoclouds, or even people trying to put up on-prem infrastructure, the skill sets have disappeared. People that really understand these domains don't exist anymore.
00:03:19:23 - 00:03:26:02
Suresh
I mean, of course they do, but not in anywhere near the numbers needed to support the kind of infrastructure buildout we are seeing.
00:03:26:03 - 00:03:28:08
Sandesh
That was one of my big takeaways.
00:03:28:10 - 00:03:31:06
Chris
I have to go back to building data centers.
00:03:31:06 - 00:03:32:03
Suresh
I know, Chris.
00:03:32:08 - 00:03:40:16
Sandesh
Yeah, Chris used to be a CTO building data centers. Tell him about the one that you built.
00:03:40:19 - 00:03:46:08
Chris
I built an EMP shielded data center 250ft beneath a mountain.
00:03:46:10 - 00:03:51:05
Suresh
Oh, my gosh. I'm curious what was in there and why it needed to be EMP-shielded, but that sounds cool.
00:03:51:05 - 00:03:52:13
Chris
It was some government work.
00:03:52:17 - 00:03:54:14
Suresh
Yes. I imagine,
00:03:54:16 - 00:04:24:16
Chris
It was an insurance company, actually, which was interesting. We were like, well, after an EMP there's not going to be much for you to do anyway. They're like, yeah, at least we'll have the arrays with all the claims data. Okay, you do you. But it was wild, man, because this facility was absolutely massive. The facility itself was a full acre underground.
00:04:24:18 - 00:04:49:11
Chris
It had its own zip code. It had its own fire department. It had a lake in it that was the chilled water plant, catching the rain off the backside of the mountain, so there was never any risk of flooding. Right, right. And the mining engineer who managed the whole thing would get into this boat with big hip waders and stuff like that, with a flashlight and a can of spray paint.
00:04:49:11 - 00:04:52:01
Chris
And he'd be like, If I'm not back tomorrow, come look for me.
00:04:52:03 - 00:04:57:03
Sandesh
Oh my gosh. Wait, where's the lake? Why is there a lake?
00:04:57:04 - 00:05:00:20
Suresh
Yeah, your skills will be in high demand at the moment.
00:05:00:22 - 00:05:09:22
Chris
Right. It's funny because a lot of the guys who built the, you know, the black site in Utah for the NSA, same crew.
00:05:10:00 - 00:05:11:05
Suresh
Yeah, exactly. You can imagine.
00:05:11:05 - 00:05:20:14
Chris
EMP shielding, and man, they made me think twice about a career in security. I don't want to live with that kind of paranoia.
00:05:20:14 - 00:05:21:11
Sandesh
I know, I know.
00:05:21:11 - 00:05:24:04
Suresh
Exactly, exactly what I'm saying.
00:05:24:06 - 00:05:51:23
Sandesh
Yeah. It's funny how it all kind of comes back around, right? Like you said, with the infrastructure folks going to the cloud, those skill sets are not there now. But there's also this uniqueness, the uniqueness of high-performance computing, these GPU-powered environments. In my mind, and I think for most people, they just think it's hardware, it's servers, it's CPUs, you know, chips, and it just works.
00:05:51:23 - 00:06:03:07
Sandesh
And there's storage, and networking has been around for years. What's the problem, right? And so what's cool for me is that there are new problems to solve, you know?
00:06:03:07 - 00:06:22:02
Suresh
I mean, oh my gosh. First, the energy density of the data centers that you're putting these into is totally different. So where do you get the power from? Then, the racks are basically dissipating a lot more heat, and they have a lot more power draw, so you have to change the racks all over.
00:06:22:02 - 00:06:54:10
Suresh
The cooling has to be completely redone. The cabling per server is roughly 8 to 10x the connection density, because where a traditional server might have a couple of NICs, these modern servers have ten NICs in an eight-GPU server, and far more in a larger GPU server. And these are roughly operating at somewhere between 4 to 8x the bandwidth of a port on a traditional server.
00:06:54:10 - 00:07:16:11
Suresh
Right? So you start doing the math on every front. That's part of the reason why failures are so rampant in this environment: almost everything is an order of magnitude faster, harder, denser, and that means things are just operating at the edges all the time.
00:07:16:13 - 00:07:30:05
Chris
Yeah. Well, I mean, I remember building, you know, we ran home-run fiber, and these big single-mode bundles would be like this big around, and we'd make the cabling come down from the ceiling into the racks.
00:07:30:05 - 00:07:31:04
Suresh
So I don't know exactly.
00:07:31:07 - 00:07:35:18
Chris
Exactly. Never, ever remove a cable, right?
00:07:35:18 - 00:07:39:00
Suresh
Oh that's right. No, that's exactly right. That's exactly right.
00:07:39:02 - 00:07:46:10
Chris
Well, even when you label, you still just can't trust it. Somebody pulling them through can disconnect things and break things.
00:07:46:16 - 00:08:16:00
Suresh
Exactly. We have one large customer where we typically price on a per-GPU basis and a per-agent or per-node basis, and they basically priced it on the kilowatts of the racks in which we are deployed, if you will. And so it's almost like the unit of scale has become how many kilowatts or megawatts are able to be deployed in a data center context.
00:08:16:00 - 00:08:21:10
Suresh
And so the constraints are quite significant on a lot of fronts.
00:08:21:12 - 00:08:30:02
Chris
Yeah. And the cooling is the hardest part, for sure. And the power, it's scary. I mean, some of the power they're putting into these...
00:08:30:04 - 00:08:50:03
Suresh
Water cooling systems and crazy cooling structures, exactly. And then you start peeling the onion on that. You know, I still remember the days when petabytes were massive, and now you're seeing more and more of these large-scale data centers talking exabytes as if it's just the norm. In a heartbeat.
00:08:50:03 - 00:09:22:01
Suresh
We've gone from terabytes to petabytes to exabytes. And so the amount of data, the network bandwidth, and the latency that you need: at every angle of the infrastructure design you're pushing the boundaries. That's what's making it fun. Because, oh gosh, I don't know if you recall, before virtualization, servers were operating at 30, 40% utilization because you couldn't virtualize the application; it was a single application per server.
00:09:22:03 - 00:09:45:17
Suresh
That was the mindset. And then virtualization came around and you could cram 6 or 7 VMs onto a single physical machine, and you started increasing the utilization. We're back to low utilization, now for very different reasons, but we're talking about utilization in the 30, 40% zone. We're talking about availability that in the cloud is measured, for compute, in four nines or five nines.
00:09:45:19 - 00:10:07:07
Suresh
We're now talking about 90 to 93% availability for the compute instance, because most of the rest of the time there are failures that have taken the system down. So on some dimensions we are pushing the boundaries so hard; on other dimensions we're re-solving problems that have been solved before, things like resilience and fault tolerance. And so that's funny.
00:10:07:08 - 00:10:26:14
Chris
That is so true. I mean, at some point we used to talk about tier three and tier four data centers, right? And to have a tier four data center, you have to have staff on hand all the time that are HVAC engineers, electrical engineers; you have to have those skills present and available all the time.
00:10:26:14 - 00:10:42:05
Chris
You know? Right. In addition to all that, 2N pathing and stuff like that. And we saw the change: no, it doesn't make any sense to do that; it makes more sense to just make your infrastructure more fault tolerant. That's right, the rise of the SRE world, where...
00:10:42:08 - 00:10:54:15
Chris
That's right. Let's just design to what the capabilities of this platform are, and build to accept that. And that's where I think we've kind of landed, so those failures are sort of built into the system.
00:10:54:20 - 00:11:24:16
Suresh
Exactly. And that will happen here too. We're just in such early innings of rethinking infrastructure for AI that the first focus has been speed, speed, speed. Now you're starting to say, well, security matters, resilience matters, observability matters, automation matters. Because in the end, those things actually don't take away from speed. They just make it possible for you to operate in a much more predictable manner.
00:11:24:16 - 00:11:49:06
Suresh
And so I think you're starting to see that. Part of it, unfortunately, is that the technologies needed to achieve those very same things, whether it's fault tolerance or deep observability: well, the discipline is the same, but the technologies are very different. I mean, the distributed communication protocols in training are not built naturally with the ability to do a retry after a timeout automatically.
00:11:49:06 - 00:12:07:22
Suresh
Don't let a failure in the network propagate up to the application layer. Those are well-understood principles of protocol design, but that's not necessarily how distributed training works today. Right. And so there are many things that I think will evolve, for sure.
00:12:08:00 - 00:12:20:15
Chris
You know, you should talk to... I talked to Vijoy Pandey. He's the VP and general manager of Cisco's Outshift.
00:12:20:17 - 00:12:21:12
Suresh
Right.
00:12:21:14 - 00:12:26:20
Chris
And I don't know if you're familiar with it; within Cisco, it's sort of their internal incubator.
00:12:26:21 - 00:12:27:23
Suresh
I'm not actually.
00:12:28:01 - 00:12:49:07
Chris
The two projects that he's working on are sort of longer-term horizon projects. One is quantum networking, which is really an exciting thing. They've got a chip now that, at room temperature, will entangle 200 million photon pairs per second. Wow. Okay. And you can send it over normal fiber optic and then, via quantum teleportation... here's the problem.
00:12:49:07 - 00:13:06:02
Chris
With quantum computers, you can get to a thousand qubits, but you need like 100,000 or a million, and it's hard to do that. So they're gambling on: well, let's put a hundred thousand-qubit computers together with a quantum network.
00:13:06:03 - 00:13:06:17
Suresh
And I see.
00:13:06:22 - 00:13:08:06
Chris
To make it easier.
00:13:08:06 - 00:13:08:19
Suresh
Interesting.
00:13:08:19 - 00:13:33:07
Chris
But the other problem they're focused on, on a shorter horizon, is what they call a network of agents. They're building all the new protocols for inter-agent communication in an AI world. So they're doing all that work to tie it all together, kind of the protocol-level stuff.
00:13:33:07 - 00:13:40:23
Chris
They're kind of rewriting the whole stack, right, to integrate AI and inter-agent communication. So you should reach out to them.
00:13:41:01 - 00:13:46:10
Suresh
Oh, I'd love to. Yeah. It seems like both are sort of bleeding-edge problems.
00:13:46:13 - 00:13:51:00
Chris
Did you know they're putting together a working group around this? No? Okay.
00:13:51:01 - 00:13:55:14
Suresh
That would be very helpful, very helpful, especially given the focus on communication.
00:13:55:16 - 00:13:58:20
Chris
Yeah. I mean, I'd love to put you in touch with them.
00:13:58:21 - 00:14:04:10
Sandesh
I think we're going to change the name of this podcast to Nerd Talk.
00:14:04:11 - 00:14:08:23
Suresh
I love it, I love it, I love it.
00:14:09:01 - 00:14:11:17
Sandesh
I knew you two would. I knew you two would.
00:14:11:19 - 00:14:13:17
Chris
But yeah. You got to love a nerdy CEO.
00:14:13:17 - 00:14:14:15
Suresh
Yeah. No.
00:14:14:17 - 00:14:16:13
Sandesh
No. Yeah. That's too often.
00:14:16:14 - 00:14:27:01
Suresh
And that's the fun of being at a startup. Otherwise, you might as well join a large company. At least, that's what I thrive on.
00:14:27:03 - 00:14:28:09
Chris
Yeah. No, no.
00:14:28:11 - 00:14:46:15
Sandesh
So I'm just going to get started here; we're already recording, and the reason why is because of what just happened. I know we're going to get some great clips, and our team will edit it and give you everything. People will be looking up qubits and whatever else you guys were talking about.
00:14:46:15 - 00:14:49:11
Chris
See, although now you've just created a new company name.
00:14:49:12 - 00:14:51:00
Suresh
Exactly. Right. There you go.
00:14:51:00 - 00:14:54:13
Sandesh
Exactly. I'm the sales and marketing guy.
00:14:54:15 - 00:14:56:17
Suresh
It'll be a new technology based on Kubernetes.
00:14:56:17 - 00:14:58:03
Sandesh
It's like you, but it's. I'm sure he.
00:14:58:04 - 00:15:00:22
Chris
It's a quantum AI company.
00:15:01:00 - 00:15:03:00
Suresh
Yeah, exactly.
00:15:03:02 - 00:15:22:22
Sandesh
People would be like, I wonder what that is. Yeah. So, I want to ease into this, Suresh; it's going to be very, very casual. In the beginning I like to give you some softballs. People want to know what you think. And you know, this podcast, we're calling it What the futr.
00:15:22:22 - 00:15:41:17
Sandesh
What we're trying to do is figure out what the future is, right? So, just a couple of questions around the AI bubble; I want to ask you about that. And then a little bit about what Chris was just talking about, too: you're a technical CEO, but, you know, I'll kiss your ass here in a second and brag about how awesome you are.
00:15:41:17 - 00:16:00:10
Sandesh
But I really respect you as a leader, mostly because what people say about you to me is off the charts. So, there's a lot of speculation these days on whether we're in an AI bubble. What are your thoughts?
00:16:00:12 - 00:16:25:14
Suresh
You know, I'll say two or three things. One, I think we have to separate out what's happening in the stock market from the technology side. That's the first thing I'll say, because on the technology side, I think it is one of the most incredible developments of the last thousand years in terms of transformative power.
00:16:25:16 - 00:16:44:17
Suresh
I do think people that say the AI revolution is like the industrial revolution are not at all overstating the case. I think what we are on to is the beginning of something that will change everything about sort of how we live, work and do day to day stuff. Some good, some bad. But it will be transformative for sure.
00:16:44:17 - 00:17:10:15
Suresh
That's the first thing. On the stock side, you know, almost all great big changes have been accompanied by some excess, followed by a reversion to the mean, and then a couple of hype cycles, if not more. And so it's possible that we're going to have a stock market crash for AI valuations in particular.
00:17:10:17 - 00:17:29:13
Suresh
That said, here's one thing that struck me. If you look at the amount of spending, let's take Nvidia for a moment. Look at Nvidia's revenue and ask how much of that revenue is coming from companies that are using other people's money, investors' dollars, to buy Nvidia product, which would be the classic sign of a dangerous bubble.
00:17:29:15 - 00:17:51:22
Suresh
I would say the vast majority, probably 75, 80% of their revenues, are coming from really large cloud companies that are able to fund their annual CapEx out of their annual cash generation. So that almost puts a floor on how much frothy revenue exists here, revenue that's not coming from people with real operating cash flow.
00:17:52:03 - 00:18:07:23
Suresh
So it's possible that the stock market valuations will come down; in fact, they look super crazy high. That said, I also think there's a floor here, because I think there are real business models underpinning the purchases of AI infrastructure.
00:18:08:01 - 00:18:24:03
Chris
Yeah, I was gonna say, I think it's really interesting to see what happened with Amazon recently, where, you know, they had this massive layoff and they sort of pinned it on AI. But when you dig into it more, it looks like they actually laid off all those people so they had the money to buy more.
00:18:24:05 - 00:18:25:02
Chris
Yeah.
00:18:25:04 - 00:18:53:08
Suresh
Exactly. I mean, yes. But as I said, not all good. I do think we'll go through a period of turmoil when it comes to employment and reskilling, and there'll be some jobs displaced. That is one of the penalties of what will happen over the next few years. And so on the one hand, you're seeing companies make completely unprecedented profits compared to any prior year.
00:18:53:09 - 00:19:03:15
Suresh
And those very same companies are also saying we have too many people on our payroll and we need to shrink it. So that is, unfortunately, both happening at the same time.
00:19:03:17 - 00:19:10:00
Chris
I think it's very funny. Like we talk about people being replaced by AI, but I mean, these people are literally replaced by a GPU.
00:19:10:02 - 00:19:12:09
Sandesh
Which almost seems exactly right.
00:19:12:09 - 00:19:13:09
Suresh
It's actually. Yes.
00:19:13:10 - 00:19:14:08
Sandesh
That's right. Something.
00:19:14:12 - 00:19:16:19
Suresh
That's right, that's right, that's right.
00:19:16:19 - 00:19:43:13
Sandesh
That's right, that's right. Well, on the subject of people: you are seen as a great people leader, and I remember those days. I know when I first met you, you were just promoted, promoted, promoted; you became CEO of Nimble, CEO of Sysdig. You've had an amazing career. But what I really appreciate is that every time I've met you and been with you, it's always been such a pleasure.
00:19:43:13 - 00:20:09:08
Sandesh
Easy conversation. And it feels like you care, you know? And I think that was a lot of the NetApp kind of culture, right? Sure, for sure. People don't care what you know until they know that you care. So I'm curious: how would you describe your leadership style, balancing people with the rigor and tenacity this market demands?
00:20:09:08 - 00:20:13:07
Chris
Well, and I think you've got the passion too, which is a really big part of it as well, you know.
00:20:13:08 - 00:20:34:09
Suresh
Yeah, I appreciate that, Chris. I'll start by saying, honestly, I feel like NetApp was basically a ten-year school I went to. I learned so much at NetApp, on so many dimensions of how to lead companies or organizations or teams, and I feel like I learned a lot from mentors like Dan and Tom at NetApp.
00:20:34:09 - 00:21:01:20
Suresh
And so I credit it a lot for how, over the years, I've been leading organizations and teams. And I think there are a few simple principles. The starting point for me is almost to believe that the team can accomplish a lot more than any talented individual. And so, to harness the power of the entire team, you have to essentially set great context.
00:21:01:20 - 00:21:24:12
Suresh
So, real transparency and communication. To do so, I typically do 50 to 60 one-on-ones every month. In my last company, Sysdig, where we had an employee base of 600, that was 50 to 60 one-on-ones a month. So try and talk to as many people as possible, establish context, and have a very free sharing of information so that there's great transparency in the entire culture.
00:21:24:12 - 00:21:46:08
Suresh
That's the first thing I would say is extremely important. Ultimately, the goal is a deep belief that a group of individuals who, individually, may not be as smart as one particularly talented individual will always do more if they can work with really good shared context. That's the first thing. The second thing, I think, is a clear sense of purpose.
00:21:46:08 - 00:22:16:08
Suresh
I feel like organizations need to know they're working towards something. And frankly, in tech companies, that sense of where you're heading has to come from a technically grounded perspective: what is the vision you're working towards? When we succeed, what will the future look like? So let's paint a picture of the future. And in a tech company, that picture has to stem from an understanding of where the technology will go, how we can intercept it, and then getting people excited around it.
00:22:16:13 - 00:22:43:22
Suresh
And so that's the second thing I would say is key: imbuing the entire organization with here's what our destination looks like, and why that's a good place to go. And within that I would say one other thing: culture really matters, right? What I mean by that is, the vision is about where you go, and sharing of information is how you operate.
00:22:44:04 - 00:23:04:08
Suresh
But also, what is valued in the company. At Sysdig, sorry, at Nimble, we had a principle which was basically no jerks. Right? It doesn't matter how smart you are; if you're not fun to work with, then you're probably not going to make for a great team member. And so the culture matters, and it's things around how people are rewarded.
00:23:04:10 - 00:23:24:03
Suresh
What is lauded in the company, what is frowned upon in the company, and essentially what behaviors you encourage within your team and the larger organization, and whether that creates a collaborative environment. So that's probably the third thing I would say. And I absolutely saw that very much at NetApp.
00:23:24:03 - 00:23:26:19
Suresh
And that's something that stuck with me.
00:23:26:21 - 00:23:49:19
Chris
You know, you mentioned the vision part, and I totally agree with you on all these things. I think what I see leaders struggle the most with is communicating an effective vision, right, and sticking to it. Because in a lot of organizations there's a lack of vision, and if you have the vision, everybody marches toward it and you don't have to tell people to do stuff; they just kind of know where they're going.
00:23:49:20 - 00:23:58:12
Chris
That's right. That's the best way to do it. So what are the tricks to really communicating that? I guess one of them is to have a good vision to start with, but yeah.
00:23:58:14 - 00:24:34:14
Suresh
Yes. There are a couple of things I would say. The clearer the vision is in your head, the more you should be able to translate it to multiple layers of the organization. The same vision, when you're talking to the finance team, or to a business decision maker on the customer side, is communicated very differently from how you'd communicate it sitting in a room full of engineers and architects.
00:24:34:14 - 00:24:55:00
Suresh
And I believe the ability to translate up and down certainly comes from having good communication skills, but it also comes from clarity around the vision itself. So the first thing I would say is: when you have a vision, find ways of communicating it at various levels, even if it takes explicit work to do so.
00:24:55:00 - 00:25:06:14
Suresh
From a one-pager all the way down to a deeply technical document that describes the architecture of where you want to go, all of that has to exist in the organization. Yeah. The second thing.
00:25:06:14 - 00:25:08:15
Sandesh
I would say is I'm sorry. Go ahead. Sorry. Go ahead.
00:25:08:15 - 00:25:09:14
Suresh
No, no no, please. Go on.
00:25:09:18 - 00:25:11:12
Sandesh
No no no no no second. Second thing please.
00:25:11:16 - 00:25:33:18
Suresh
Yeah. The second thing I would say is, you know, so much of our energy is spent on how we communicate the vision externally, whether it's in our first-call decks, our website messaging, etc. I think more of the battle is really in synthesizing it internally for employees and making sure they embrace it. So find forums, right?
00:25:33:18 - 00:25:53:23
Suresh
Whether it's written forums, all-hands meetings, team meetings, etc. And actually, this is something I did not do really well, and I continue to struggle with it: you feel like if you've communicated it five times, you've done enough, or if you've done it ten times, you've done enough. And I find, gosh, there's no amount of repetition that's enough.
00:25:53:23 - 00:26:10:14
Suresh
Find different flavors of saying it; don't say it the same way each time. But you have to keep coming back to it. Every new product: let's go back to our vision and why this new product fits. Every new customer that we win: let's go back and talk about why this customer became a customer, because they bought into our vision, and here's how it fits in.
00:26:10:20 - 00:26:20:22
Suresh
So I feel like that's something you have to do: consistently articulate what your vision is, in a way where you don't feel like you're done just because you've done it a few times.
00:26:21:00 - 00:26:36:03
Sandesh
Yeah, so spot on. And the third point that you made, about culture: for Chris and me, that hits right in our hearts. Well, we were spoiled at NetApp, right? I
00:26:36:03 - 00:27:01:09
Suresh
could not agree more. You know, I joined NetApp from another company, McKinsey, the consulting firm. And the honest truth is, there have been so many scandals around McKinsey lately that sometimes it's hard to remember, but when I joined McKinsey, I think there was a book comparing it to the Jesuit priests or something, because they were so evangelical about what their mission was.
00:27:01:09 - 00:27:18:14
Suresh
I was very proud of the organization, and I still am, notwithstanding the noise around McKinsey. But part of what I was going to say is, it was another place where the entire product was people and the expertise of the firm. Right? There was no technology they were building. And so it's a different style of culture.
00:27:18:14 - 00:27:45:05
Suresh
The culture was beautiful in its participative nature, etc. McKinsey's culture was all about fostering talent at the expense of everything else, right? How do you take the brightest minds and motivate them, and keep making sure the organization maintains a high bar on talent? But what is common to both of these is, ultimately, how do you take whatever it is that you believe in and make that percolate through the entire organization, whatever your culture is?
00:27:45:07 - 00:28:02:23
Suresh
How do you reinforce that and make sure every person is living it? That's what NetApp did so well: it was a strong culture, and every employee, whether it was the 100th employee or the thousandth employee, seemed to embody that culture. I think they did that really well.
00:28:03:01 - 00:28:24:01
Sandesh
And the other thing that helps culture is when you have a very successful sales organization. For the sales teams, architects, and anyone supporting the sales team, if they feel they're winning, if they're coming in to work every day, they believe in the vision, they're excited about what they're doing, they feel like people have their back, they have good leadership.
00:28:24:03 - 00:28:41:10
Sandesh
You know, guiding them, those kinds of qualities are why people stay at a company. I firmly believe a lot of people don't stay at these companies because of money. They stay because of other factors. And I know for me, that's definitely the case, the CEO for sure.
00:28:41:15 - 00:28:57:10
Suresh
Practices like Tom calling not just salespeople who closed deals, but engineers that helped close deals and support people that helped support the customers that ultimately did expansions. Those allow everyone to focus on success; it rewards everybody, right? And so I agree.
00:28:57:10 - 00:29:12:16
Sandesh
Yeah, I've got to tell you this quick story, and then we'll get started here, on the Tom Mendoza topic. So a really good family friend of ours, my wife's best friend's daughter, they live in our neighborhood here, and she got into Notre Dame.
00:29:12:16 - 00:29:16:03
Chris
You know, before you go there, you should explain who Tom Mendoza is.
00:29:16:05 - 00:29:17:01
Sandesh
Oh, I think.
00:29:17:02 - 00:29:18:18
Chris
Everybody who's listening is going to know who.
00:29:18:18 - 00:29:35:21
Sandesh
Tom Mendoza. I think our audience definitely does. But Tom Mendoza, when I started in '98 at NetApp, he was the head of sales and marketing, and he then became president and vice chairman. And now he's on a bunch of boards, and he is just, he's.
00:29:35:21 - 00:29:37:01
Chris
All in the school of.
00:29:37:01 - 00:29:44:12
Sandesh
Business, the Mendoza School of Business. And the story, when he tells you about how that all happened, it's just so awesome.
00:29:44:14 - 00:29:51:13
Suresh
Yeah, and Tom and Dan were great partners in shaping that entire company's evolution.
00:29:51:15 - 00:30:12:09
Sandesh
Yeah, totally, for sure. So, our family friend's daughter is going to Notre Dame. It's her freshman year. She's a little, you know, just like anybody would be, a little nervous. And I just reached out to Tom on LinkedIn, and I said, hey, you know, she's going to Notre Dame as a freshman. She's super excited.
00:30:12:09 - 00:30:33:15
Sandesh
She's, you know, a wonderful person, wonderful family. Her dad went to the Mendoza School of Business too. Would you mind just sending her a note? He's like, send me both of their phone numbers. And he created a voice, sorry, a video for each of them and sent it to them. Gosh.
00:30:33:17 - 00:30:38:10
Suresh
He's just amazing that way. He makes time for these things. That's what is amazing.
00:30:38:11 - 00:31:06:06
Sandesh
So important, so important. Especially these days, in the world of work from home and, you know, social distancing and the distractions of these phones and devices. I'm very, very big on the humanity side. That's why, you know, it's tech, sales, and humanity, because we can't forget we're all human. I agree, yeah.
00:31:06:08 - 00:31:23:04
Sandesh
Yeah. Well, let's dive in a little bit. So for people that might not know you, can you tell us a little bit of your history, you know, from your younger years onward? What brought you here, and how did you get to where you are now?
00:31:23:06 - 00:31:45:03
Suresh
Yeah, absolutely. So I was born and raised in India, did my engineering undergrad and business school in India. And then I joined McKinsey, the consulting firm, in the early 90s. Towards the late 90s, I transferred to the US, to Chicago in particular, with McKinsey, with the intention of going back after about a year, a short-term stint.
00:31:45:05 - 00:32:07:21
Suresh
But that's when a friend of mine had joined NetApp. And when I visited him, it became clear to me that the excitement of the Valley in the late 90s, this was '98, was just completely impossible to ignore and not get drawn to. And so I left McKinsey to join NetApp, and I spent ten years at NetApp.
00:32:07:21 - 00:32:34:04
Suresh
That was really, in many ways, the beginning of my tech career and tech education. I joined as an individual contributor, managing one of our alliances, with Dell at the time, moved gradually into product management, and then onto the executive team to run engineering and product management. And so that was one of my most fun ten-year stints, where I learned an enormous amount.
00:32:34:06 - 00:33:06:06
Suresh
The last two years were interesting in that, while I was running R&D and product management, I had just taken on a year and a half of running a business unit, a company we'd acquired. That was my first virtual CEO experience: there was a security company called Decru that NetApp had acquired. We decided to structure it as an independent business because it partnered with EMC and other storage companies, and to give it a chance to survive, we needed to think of it as an independent shop with its own sales and marketing.
00:33:06:08 - 00:33:25:17
Suresh
That never panned out. When EMC acquired RSA, the idea of an independent security subsidiary that would partner with EMC did not make sense anymore, and so we merged it back. But the year and a half of running a full business sort of spoiled me, in the sense that I did not want to go back into a functional role after that.
00:33:25:19 - 00:33:44:23
Suresh
I loved the idea of running an entire organization and the business, not just the product. And so that's when I left, and since then I've just been doing startups, Chris and Sandesh. My first startup was a company called Omneon. Also an interesting experience: when I joined, they had an S-1 on file.
00:33:45:01 - 00:34:05:08
Suresh
Dumbest timing. The dumbest time to have left NetApp to join, because it was 2008, towards the end. When I signed the offer, the revenue was something like 37 million. Eight weeks later, when I joined, it was about 23 million. And so the bottom had really fallen out of the market.
00:34:05:08 - 00:34:29:16
Suresh
We had to pull the S-1, we had to restructure the business, but ultimately we were acquired and it had a successful exit. And then began what I think of as probably my most fun CEO startup experience, maybe organizational experience. I joined Nimble Storage. We had about 25 people or thereabouts when I joined; we had just launched the product.
00:34:29:16 - 00:34:47:09
Suresh
I'd been on the board for a year prior to joining as CEO, and so it was really pre-revenue. Over the course of the next six or seven years, the company became a public company and then went on to be acquired by HPE. Just about a month ago, I had the entire Nimble team at my home for dinner.
00:34:47:09 - 00:35:11:10
Suresh
So it's more than the business milestones. It was another place where the culture felt exactly like the NetApp culture. Almost everyone I meet from there talks fondly about their time at Nimble, for sure. I then took a year off and wanted to do something completely different from hardware and infrastructure, which is ironic considering where I came back to.
00:35:11:10 - 00:35:36:12
Suresh
But I went to a company called Sysdig, partly enamored by the idea that Kubernetes was going to become the basis on which cloud applications were built. Sysdig was really an observability and security company for containerized microservices. Again, I joined pretty early; I think we were about 20 people when I joined, just under 5 million in ARR. I left about six years in.
00:35:36:15 - 00:36:16:04
Suresh
Sysdig is now about 600 people, successful, very strong in container security, an open-source pioneer in cloud-native security. But for a couple of reasons, mostly to do with AI, I decided it was time to leave. So, the last part of my journey, arriving at Clockwork over the last year or so: when I was at Sysdig, we had already started to embrace AI, to try to bring agentic security to bear within the Sysdig platform.
00:36:16:04 - 00:36:44:16
Suresh
And I was seeing firsthand how transformative AI can be when used well. I also realized that as an existing company, with existing customers and a pre-existing architecture that you've built on, you can do a lot to embed AI, and Sysdig is doing that really successfully. But two things struck me. One, over the next decade and a half, every product is going to be designed ground-up with AI in mind.
00:36:44:18 - 00:37:13:11
Suresh
Rather than embedding AI after the fact. And the second thing that fascinated me was, as much as you can think about companies that use AI, I was fascinated by the underlying infrastructure and the underlying models, the foundational layer itself, that was feeding these companies that were using AI. And so I had decided I was actually going to retire after Sysdig, or do something a little less intense after so many years of startups.
00:37:13:11 - 00:37:31:01
Suresh
But the draw of AI was so strong that I wanted to go back to doing something in the AI space, and ideally a startup; that is always my preference. And so that's how I came to Clockwork. I can talk a little bit about it; I knew the founders of Clockwork. But that's what brought me to Clockwork, in many ways.
00:37:31:16 - 00:37:41:18
Chris
Yeah. The lure of AI is incredible, because it's such a unique period of time. I've never seen anything like this in all my years in IT.
00:37:41:20 - 00:38:05:21
Suresh
I cannot agree more, Chris. I joined the Clockwork board last year, and then about six or seven months ago is when I came on full time as CEO. And it's hard to exactly characterize how so much change can happen in just six or seven months, even within Clockwork.
00:38:05:21 - 00:38:20:20
Suresh
I'm just seeing the pace. I've never seen anything move as quickly as what we are seeing, whether that's at the infrastructure layer, whether that's in the model capabilities, whether that's in terms of how companies are embedding AI. It's just wicked fast.
00:38:20:22 - 00:38:30:09
Sandesh
Yeah, it's going to be an interesting hype cycle, right? For sure. We've seen these hype cycles before. There's just something very unique about this one.
00:38:30:09 - 00:38:44:12
Chris
I don't think you can look at AI as one thing on that hype cycle, because I think there are things that are in the trough of disillusionment, the peak of inflated expectations, and the plateau of productivity all at the same time. There are so many different things going on.
00:38:44:12 - 00:39:05:11
Sandesh
Yeah, sure. No, I get it. At the same time, I think the people that aren't in the AI world are in panic mode. And for people like us that are in it, yeah, I understand why they have some panic, but we'll be okay. We've been here before. This is just part of innovation.
00:39:05:13 - 00:39:20:05
Sandesh
Yeah, everything's going to be AI. You know, back in the day, if you didn't have a cloud story, you weren't getting funding. If you weren't going SaaS, it was going to be a tough road, right? And now, if you don't have AI, you're not going to get funding.
00:39:20:05 - 00:39:39:17
Sandesh
And, you know, ultimately I do think the bubble is going to burst, and I think a lot of startups will fail. But I think that's a good thing, from the perspective that we're going to learn so much in the next few years. Well, we've already learned so much, but this is not done, right?
00:39:39:17 - 00:39:45:16
Sandesh
Like, we're going to get into some of the problems that are now out there that we need to solve.
00:39:45:18 - 00:39:46:07
Suresh
Indeed.
00:39:46:09 - 00:39:57:08
Sandesh
And it's absolutely amazing. Some of the problems, for sure, when we talk about them in AI, are almost elementary to me in some ways, but they're so incredibly impactful, you know.
00:39:57:10 - 00:40:12:21
Suresh
For sure. I think we are early in optimizing everything, from the infrastructure up to the agents that are delivering services to end customers. Everything is still very, very early in its evolution.
00:40:12:23 - 00:40:23:05
Sandesh
So, why Clockwork? What brought you there? And maybe you can give us a little bit of that history. I mean, I know it, but, you know, the history of how Clockwork actually became a company.
00:40:23:06 - 00:40:42:17
Suresh
Absolutely. So I got drawn to Clockwork partly, or in large part, because of the problem they were solving and the technology they brought to bear. I happen to know the founder, Balaji, who's a professor at Stanford. I've known him socially for the last couple of decades, off and on.
00:40:42:19 - 00:41:02:00
Suresh
And then, as I understood it a little more, the thesis that the co-founder, Yilong, worked on with Balaji at Stanford is in many ways the foundation of Clockwork. And as I understood more of the company's technical foundation, I got completely enamored.
00:41:02:03 - 00:41:26:19
Suresh
So let me talk a little bit about the evolution of the company itself. Fundamentally, the company started when Yilong created this mechanism to synchronize clocks at large scale, within a data center, across data centers, across thousands of machines, using purely software, where the clocks are synchronized to within tens of nanoseconds of each other.
00:41:26:21 - 00:41:50:00
Suresh
And so you get accurate clocks, in terms of drift between machines. The company was started in 2017, and for the first five years, frankly, it was operating as a Stanford group, if you will, more as an offshoot of Stanford than as a full commercial company.
00:41:50:05 - 00:42:13:20
Suresh
It was a small team of four or five people, and really the use cases were vertical-market use cases. So we have customers that are Fortune 100 financial companies that want accurate timestamps on their records, and crypto trading companies that are using this to make sure they can place a trade in the most efficient manner from a timing perspective.
00:42:14:01 - 00:42:43:15
Suresh
So there was a group of customers that were really vertical-market customers, and that's where the company was focused for the first five years. In '22 and '23, the first big leap happened: applying the fact that when I can synchronize clocks accurately, I can accurately measure one-way delay, the time it takes for a packet to go from machine A to machine B, or virtual machine A to virtual machine B, or container A to container B.
00:42:43:15 - 00:43:02:19
Suresh
I can measure that delay accurately, and I can measure it one way, because the time from A to B can be different from B to A. Most people basically estimate latency by taking round-trip time and dividing by two. We were able to get accurate one-way latency, and then we built an entire network telemetry portfolio around that.
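The one-way-delay point can be sketched in a few lines. This is an illustration only, not Clockwork's implementation: it assumes two machines whose clocks already share a common time base, and shows how the usual RTT/2 estimate hides path asymmetry.

```python
# Illustrative sketch: with synchronized clocks, receive timestamp minus
# send timestamp gives the true one-way delay. Without them, people fall
# back to round-trip time divided by two, which assumes a symmetric path.

def one_way_delay(send_ts_ns: int, recv_ts_ns: int) -> int:
    """True one-way delay when both clocks share a common time base."""
    return recv_ts_ns - send_ts_ns

def rtt_half_estimate(fwd_delay_ns: int, rev_delay_ns: int) -> float:
    """The common approximation: round-trip time divided by two."""
    return (fwd_delay_ns + rev_delay_ns) / 2

# Asymmetric path: 80 microseconds forward (congested), 20 back.
fwd_ns, rev_ns = 80_000, 20_000
send_ts = 1_000_000
print(one_way_delay(send_ts, send_ts + fwd_ns))  # 80000: the real delay
print(rtt_half_estimate(fwd_ns, rev_ns))         # 50000.0: hides the congestion
```

The RTT/2 figure splits the difference between the two directions, so a congested forward path and an idle reverse path look identical to a uniformly mediocre link.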
00:43:02:19 - 00:43:31:00
Suresh
That's the first time the company leapt from being focused on vertical-market timestamping applications to how do we optimize networking across a large number of distributed machines, if you will. The step beyond just measurement came when we were able to embed control logic, a software control plane. This is all still focused on TCP networks connecting containers and virtual machines.
00:43:31:00 - 00:43:55:00
Suresh
What we were able to say is, now that we know the delay between machines using software, instead of relying on the network itself, can we do things like congestion management? So if there's a congested set of links, let's slow down the pace of traffic so we can manage congestion. Can we do quality of service? For example, if application A is more important than application B, let's prioritize traffic for application A over application B.
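The pacing and quality-of-service idea described here can be sketched as a crude weighted rate allocator. The function and its weight scheme are hypothetical, not Clockwork's actual control plane; real fair-sharing schemes also redistribute leftover capacity, which this sketch deliberately omits.

```python
# Toy sketch of pacing plus QoS: when the offered load exceeds link
# capacity, scale senders down in proportion to a priority weight,
# capped at each flow's own demand. Not a real traffic-control API.

def pace_flows(offered_bps, link_capacity_bps, weights=None):
    """Return a paced send rate per flow so the link is not oversubscribed."""
    if weights is None:
        weights = [1.0] * len(offered_bps)
    if sum(offered_bps) <= link_capacity_bps:
        return list(offered_bps)  # no congestion: send as offered
    # Congested: split capacity by weight; note leftover capacity from
    # flows below their share is not redistributed in this simple sketch.
    wsum = sum(weights)
    return [min(o, link_capacity_bps * w / wsum)
            for o, w in zip(offered_bps, weights)]

# Two flows each offering 8 Gbps on a 10 Gbps link, priority 3:1.
rates = pace_flows([8e9, 8e9], 10e9, weights=[3, 1])
print(rates)  # high-priority flow keeps 7.5 Gbps, the other is paced to 2.5
```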
00:43:55:02 - 00:44:18:07
Suresh
So those two things, right, taking our clock foundation, we built what we call dynamic traffic control, which allows you to manage congestion and quality of service on networks, all applied to VM and container clusters. 2024 is when the big breakthrough happened; that's when I started engaging. Up until then, we were still on the CPU side.
00:44:18:08 - 00:44:43:20
Suresh
And what we realized was that GPU clusters used for AI training are the most demanding distributed applications that have ever existed. They have lots of unique properties. One GPU that's slow by a few seconds will make every other GPU in a distributed training job wait for that one machine. And so there's a whole bunch of properties where timing is extremely important.
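The straggler effect Suresh describes falls out of one line of arithmetic: a synchronous collective step finishes only when its slowest participant does. A minimal sketch, with made-up numbers:

```python
# A synchronous collective (e.g. an all-reduce) completes only when the
# slowest rank arrives, so one straggler stalls every GPU in the job.

def step_time(per_gpu_seconds):
    """Wall-clock time of one synchronous training step."""
    return max(per_gpu_seconds)

def wasted_gpu_seconds(per_gpu_seconds):
    """GPU-seconds spent idle, waiting on the straggler."""
    slowest = max(per_gpu_seconds)
    return sum(slowest - t for t in per_gpu_seconds)

# 999 GPUs finish their step in 1 s; one takes 5 s. Everyone waits 5 s.
times = [1.0] * 999 + [5.0]
print(step_time(times))           # 5.0
print(wasted_gpu_seconds(times))  # 3996.0 GPU-seconds idle in a single step
```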
00:44:43:22 - 00:45:05:20
Suresh
All computation happens across a large number of machines. So if we can take everything that we've done with respect to measuring the condition of the network and optimizing the flow of traffic on that network, and thereby improve what happens with GPU clusters, that is a big deal. So that was really the opportunity that came to Clockwork in '24, and we started working on that problem.
00:45:06:02 - 00:45:18:14
Suresh
That's also when I got completely excited. I'll talk a little bit more about what we do, but that's the long explanation of how I came to Clockwork and how Clockwork itself has evolved.
00:45:18:16 - 00:45:33:09
Chris
Yeah. And what you're trying to solve, though, is kind of a non-trivial problem within the GPU cluster world. I mean, can you speak to how much utilization, or how little utilization, we get out of these things?
00:45:33:09 - 00:46:01:14
Suresh
No, absolutely. Let me describe the problem in terms of its business impact first. Typically, really well-managed large GPU clusters will get up to 50% utilization; often utilization runs in the 30 to 50% zone. Utilization being measured as: if I have a thousand GPUs in my cluster capable of delivering X flops, what I'm really realizing is somewhere between 0.3X and 0.5X.
00:46:01:14 - 00:46:24:15
Suresh
Right? And that's one thing; there are many things buried in that. But the other really egregious problem is that if I look at a thousand-GPU cluster, for every 24 hours that I operate that cluster, I lose between two and four hours of availability, where something disrupts my training job and I have to stop the job and restart it from a previous checkpoint.
00:46:24:15 - 00:46:50:07
Suresh
So I'm losing two to four hours, which of course feeds into your cluster utilization. But it also feeds into the fact that your jobs are taking longer, and that some number of people on your observability team have to quickly find out exactly what happened and correct it. So there's an emotional aspect when failures happen that's as daunting as the lost utilization, right?
00:46:50:07 - 00:47:13:05
Suresh
No operations team wants to feel like things are always failing and they don't have complete control. So that's really the nature of the problem: low utilization, very high failure rates, every incident taking a long time to detect and remediate, and therefore your training jobs take much longer to complete, as much as 2 to 2.5 times longer than a theoretical best, if you will.
00:47:13:08 - 00:47:16:08
Suresh
And so that, in a nutshell, is the problem.
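The ranges cited above (30 to 50% utilization, two to four hours lost per day) turn into a quick back-of-the-envelope model. This is just arithmetic on the numbers from the conversation, not measured data:

```python
# Back-of-the-envelope model of the two problems cited: low utilization
# and lost availability, using the ranges from the conversation.

def realized_compute(n_gpus, utilization):
    """Effective 'GPUs' worth' of compute at a given cluster utilization."""
    return n_gpus * utilization

def availability(hours_lost_per_day):
    """Fraction of the day the cluster is actually training."""
    return (24 - hours_lost_per_day) / 24

# A 1,000-GPU cluster at the low end of the cited ranges.
print(realized_compute(1000, 0.3))   # only ~300 GPUs' worth actually realized
print(round(availability(4), 3))     # 0.833: roughly a sixth of the day lost
```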
00:47:16:10 - 00:47:18:11
Chris
Yeah. Oh, sorry.
00:47:18:17 - 00:47:19:15
Suresh
Okay. No, no, please go on.
00:47:19:15 - 00:47:32:10
Chris
Because I was going to say, I mean, there's a reason why Nvidia is a, you know, $5 trillion company. These things are not cheap. So utilization has a huge economic impact, right?
00:47:32:11 - 00:48:05:21
Suresh
For sure. You take a 100,000-GPU cluster; you're probably spending $5 billion on that cluster in terms of the data center. So there's the capital, and wasting 2 to 3 billion of that is just, that's crazy, right? It's crazy in terms of how much opportunity exists to improve. But equally significantly, that same data center will consume, let's say, 125 to 250 megawatts of power.
00:48:05:23 - 00:48:35:08
Suresh
And that means you're basically throwing away something like 50 to 75 megawatts of power. That's 25,000 homes that could be lit up all year long, right? And that's per data center; I'm not talking about the total power draw, just the waste. So there are many dimensions to this problem: we are not yet really good at efficiently operating the underlying infrastructure and extracting maximum utilization.
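The power figures line up arithmetically. As a sanity check on the numbers cited (not a claim about any specific data center), and assuming an average household draw of about 3 kW, which is the implied figure:

```python
# Sanity-check arithmetic on the cited power waste. The 3 kW-per-home
# average is an assumption implied by the 75 MW / 25,000 homes figures.

def wasted_power_mw(total_mw, utilization):
    """Power burned on compute that never turns into useful work."""
    return total_mw - total_mw * utilization

def homes_equivalent(wasted_mw, kw_per_home=3.0):
    """Rough number of homes the wasted power could supply."""
    return wasted_mw * 1000 / kw_per_home

# A 250 MW data center running at 70% effective utilization.
print(wasted_power_mw(250, 0.7))  # 75.0 MW thrown away
print(homes_equivalent(75))       # 25000.0 homes, matching the figure cited
```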
00:48:35:13 - 00:49:04:18
Suresh
In a nutshell, that is the problem we fundamentally attack. And if you think about where Clockwork intersects that problem, Chris: AI workloads achieve everything they do through a distributed communication process. A single GPU cannot do what you need, so you're throwing thousands of GPUs at the job. And so really, the bottleneck to whatever your workload is trying to accomplish is all about communication efficiency.
00:49:05:00 - 00:49:22:21
Suresh
And that's why we like saying that communication is the new Moore's law in AI infrastructure: everything you're trying to achieve from an AI workload depends on the efficiency, effectiveness, and reliability of how GPUs talk to each other on a GPU cluster.
00:49:22:23 - 00:49:33:03
Chris
Yeah, yeah. Well, I mean, I've heard your company described as sort of the Waze for GPU clusters, I think.
00:49:33:04 - 00:49:55:12
Suresh
Right. It's helpful to start with what the three building-block technologies are, and we've touched on a couple of those. There are three core building blocks.
00:49:55:12 - 00:50:13:18
Suresh
So let me walk through each one of them. The first one, which we've touched on already, is our ability to synchronize clocks to within tens of nanoseconds. As a result, we are able to look at every single message. So take a PyTorch job and break it down into its distributed communication.
00:50:13:18 - 00:50:34:15
Suresh
There are messages being sent by the collective communication library. A message breaks up into chunks; those chunks are broken into units that are sent over RDMA connections. So you break it all the way down from the application to what's traveling on the wire, and you're timing everything that's traveling on the wire, because of our ability to synchronize clocks very accurately.
00:50:34:16 - 00:50:58:01
Suresh
So the first thing we have is the ability to look at all communication, understand what's going normally and what's being delayed, and map that to what it means for your PyTorch training jobs. That's the first building block: clock sync leading to really deep network telemetry, correlated to the application. The second building block is what we term dynamic traffic control.
00:50:58:07 - 00:51:20:01
Suresh
It's actually very simple. What we are able to do is pace traffic, so slow it down when it's likely to be congested; slice traffic, so we can basically say, I want to slice it into five pieces and guarantee one versus the other; and then reroute traffic.
00:51:20:03 - 00:51:46:09
Suresh
Right. And so that's really what this control plane does. We plug it into TCP using instrumentation like eBPF; we plug it into RDMA networks using RDMA APIs; we plug it into collective libraries like NCCL in the Nvidia ecosystem and RCCL in the AMD ecosystem. So this ability to pace, slice, and route traffic works with different communication protocols and libraries, even MPI.
00:51:46:10 - 00:52:24:08
Suresh
That's the second foundational capability. And the third one is what we call distributed state tracking. Sorry, I hope this is making sense, but because these jobs are essentially managed as a set of processes executed on a large number of GPUs, imagine that one GPU fails. If you're not making sure that everybody is at a consistent point, then essentially you have to restart from scratch, because when you fail, the job is hung and all the GPUs are at a different place, if you will, in the collective process.
00:52:24:10 - 00:52:44:02
Suresh
What we are able to do extremely well is understand what the distributed state looks like, so that we can recover gracefully when failures happen, when links flap. So the third aspect is understanding the distributed state of a job executing on a large number of GPUs, and that gives us resilience and fault tolerance when failures occur.
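The consistency idea behind distributed state tracking can be shown with a toy example. This is an illustration of the general principle only, not Clockwork's mechanism: if you know the latest step each rank has locally completed, the furthest point every rank has provably passed is the minimum across ranks, which gives you a consistent place to resume from instead of restarting from scratch.

```python
# Toy illustration: with per-rank progress tracked, recovery can resume
# from the latest step ALL ranks are known to have finished, rather than
# throwing the whole job away when one GPU dies mid-step.

def consistent_resume_step(completed_step_by_rank: dict) -> int:
    """Latest training step that every rank is known to have finished."""
    return min(completed_step_by_rank.values())

# Rank 2 died during step 1042; the other ranks had raced ahead.
progress = {0: 1043, 1: 1043, 2: 1041, 3: 1042}
print(consistent_resume_step(progress))  # 1041: a consistent cut to recover from
```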
00:52:44:07 - 00:53:06:15
Suresh
So if I step back: at the core, it's three breakthrough technologies, right? One is clock synchronization, leading to deep telemetry. The second is dynamic traffic control, which allows us to pace, slice, and route traffic flows. And the third is distributed state tracking, which gives us fault tolerance, if you will. And so that's what we've combined.
00:53:06:17 - 00:53:34:12
Suresh
Now, if you think about what our customers see when they're operating these clusters: we have multiple large neoclouds that are now deploying us, among the world's most successful neoclouds. We have one of the world's leading, well, LinkedIn was on our webinar yesterday. We have one of the world's leading telecom companies deploying us, leading sovereign labs, and one of the top five hyperscalers.
00:53:34:12 - 00:53:57:02
Suresh
Right. So good deployments that give us a broad sense of what we're seeing, and essentially it's three things. One is observability. We often first get deployed because we give extremely good observability, particularly with respect to networking, but broad cluster visibility too; one of our neoclouds uses us to audit their clusters to make sure they're configured correctly before handing them off to their customers.
00:53:57:02 - 00:54:27:22
Suresh
So, really good observability of the cluster. The second, and arguably the most common reason we get drawn in, the biggest deal driver, is resilience: our ability to keep a training job continuing without disruption when you have links flapping. Today we're working on technologies that will allow us to survive not just a network link flap but any kind of GPU failure, without transmitting that failure up to the application, and to continue nondisruptively.
00:54:27:22 - 00:54:44:03
Suresh
So fault tolerance is the second thing we are deployed for. And the third is performance optimization: using load balancing and congestion control, how do we optimize the performance of the network and therefore the overall cluster? So that, in a nutshell, is what we do.
00:54:44:04 - 00:54:49:02
Chris
Yeah. And I've got to imagine the TCO case on that, it's so easy.
00:54:49:04 - 00:55:11:19
Suresh
It is. I mean, honestly, just take the link flapping. This is our analysis, and it's based on really good data, both published data from the likes of Alibaba and Meta, who have documented failures in gory detail, and what we've seen in our own customer base.
00:55:11:21 - 00:55:36:04
Suresh
For every thousand GPUs, you're likely to witness something like 150 to 300 failures a year, failures in the sense of job restarts. And if you do the math on those restarts, you're wasting somewhere around 200,000 to 300,000 GPU hours a year from link flaps alone.
00:55:36:04 - 00:55:55:14
Suresh
So you can do the math on just that. Forget delayed time to production and all of that. Just on the lost hours, at 2 to 2.5 dollars per GPU hour, even in a thousand-GPU cluster you're getting close to a million, and that doesn't count the human cost of just finding problems.
00:55:55:14 - 00:56:05:17
Suresh
It doesn't count the delayed time to market of your products, and so on and so forth. So I think the ROI has been easy enough to establish for some of the value, Chris.
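The ROI math Suresh sketches checks out directly from the numbers given (200,000 to 300,000 GPU hours lost per year per thousand GPUs, at $2 to $2.50 per GPU hour):

```python
# Direct cost of GPU hours lost to failures and restarts, using the
# ranges cited in the conversation.

def annual_waste_usd(gpu_hours_lost, dollars_per_gpu_hour):
    """Dollar value of GPU hours lost per year."""
    return gpu_hours_lost * dollars_per_gpu_hour

# Upper end of the cited ranges: 300,000 GPU hours at $2.50/hour.
print(annual_waste_usd(300_000, 2.50))  # 750000.0: approaching $1M per
                                        # 1,000 GPUs, before the human cost
                                        # and delayed time to market
```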
00:56:05:19 - 00:56:18:06
Chris
For sure. Hey, just for a second. The light behind you from the open window is, like, shifting, and it's blowing you out a little.
00:56:18:11 - 00:56:21:04
Sandesh
Oh, yeah. I can't see your face.
00:56:21:05 - 00:56:26:17
Chris
Yeah, yeah, it's starting to change the exposure. Yeah, that's probably a better bet. Okay, perfect.
00:56:26:17 - 00:56:28:10
Suresh
Perfect.
00:56:28:12 - 00:56:32:23
Chris
Yeah. No, I was just getting a little blown out there.
00:56:33:00 - 00:56:39:07
Sandesh
Yeah, no, thanks for catching that, Chris. Just so you know, he.
00:56:39:07 - 00:56:42:08
Chris
Started out all right, but I think while we were talking, the sun moved.
00:56:42:09 - 00:56:52:09
Sandesh
Yeah, yeah, makes sense. Chris is the pro at this. I don't know if you know, but he's on multiple podcasts. So this is not new to him, as far as I know.
00:56:52:09 - 00:56:56:17
Suresh
His setup was impressive last time I saw it. So, yeah.
00:56:56:19 - 00:57:00:01
Chris
What do you do? I do four podcasts now, so. Oh, man. You're crazy.
00:57:00:01 - 00:57:29:09
Sandesh
Yeah, I told you, if you want nerd talk, oh God, that's Chris's jam. One of the key takeaways for me, and for anyone that is watching, is that from an infrastructure perspective there's a rejuvenation happening. However, I'm not seeing the average Fortune 500 company building out these very large-scale environments. The problem you are solving, just by its nature, you need thousands of GPUs.
00:57:29:11 - 00:57:57:08
Sandesh
That's right. You're spending thousands, you know, billions of dollars. It certainly makes your value proposition very simple. But my biggest takeaway was, I didn't realize how many failures there actually are, not just in the GPU itself, but also in the network. And I just kind of felt like, why doesn't this just work better?
00:57:57:10 - 00:57:58:00
Sandesh
You know.
00:57:58:05 - 00:58:23:00
Suresh
I think it will take a few years. I think the ideas around how to make the infrastructure perform more resiliently, to have more observability built in, more resilience built in, security built in, will all come. I think we are at a phase in the evolution of AI infrastructure and the deployment of GPU clusters where speed trounces everything.
00:58:23:00 - 00:58:45:02
Suresh
And so everybody is going for larger data centers, faster, and there's not enough time to step back and design. And the other phenomenon here that's interesting, I want to come back to the question about enterprises and how many can actually deploy these versus how they'll consume AI. Will they consume it through the infrastructure layer, or do they basically consume it through some cloud layer?
00:58:45:04 - 00:59:18:11
Suresh
But what I was going to say is, many of the people that are solving these problems are solving them in a bespoke manner for their own internal infrastructure. The OpenAIs of the world and the Anthropics of the world are likely solving these problems through software techniques that are specific to their needs. What you've not yet seen is the emergence of independent software companies that are starting to say, all of these are a software value-add on top of the underlying networking hardware, on top of the underlying GPU and server hardware, that needs to be built out. That hasn't existed.
00:59:18:11 - 00:59:43:16
Suresh
I mean, there isn't a Datadog for observability in the GPU world. So there are many, many software ISVs that are, I think, yet to emerge over the next few years, and that's certainly our vision. You asked a great question, right? This is something I think about all the time. Given the complexity and the cost of this infrastructure, I think there will always be sovereign AI.
00:59:43:18 - 01:00:17:19
Suresh
That will be deployed as bespoke data centers mature, and there will be some really large enterprises that have the scale to justify building out their own data centers. For the most part, I think the emergence of the neo cloud space is because you want someone else to take these problems away from you. And on the one hand, going all the way to an AWS or a Google, as they're becoming more and more AI-centric, it's almost like, I don't need to consume the 150 or 250 cloud services.
01:00:17:19 - 01:00:43:13
Suresh
I just want a set of services tuned for AI. And that's where the neo clouds came in. They are really focused, and therefore lower cost and more purpose-built for AI. Gradually you're seeing those offerings emerge from the big hyperscalers as well. So I think most enterprises over time will consume that from either hyperscale public clouds or neo clouds that are becoming larger and larger and almost challenging the hyperscalers.
01:00:43:13 - 01:01:08:08
Suresh
And so there will be some really large enterprises, of course, that will do their own. But for us, part of what I'm excited about, when I see the success we're having with the hyperscaler I mentioned and the neo clouds, is that ultimately we are serving the tenants of these companies. Some of our products are actually more useful to the tenants than they are to the operators themselves.
01:01:08:08 - 01:01:33:07
Suresh
Right. So for example, we do network monitoring, as I mentioned, and there's a lot of telemetry at that layer. But we also expose infrastructure dependencies all the way through a PyTorch lens. So if you're an ML team that wants to know, why is my iteration running slow and what can I do to change that, we give you insights on how to correlate what you can change at the infrastructure layer all the way through to your training jobs.
01:01:33:07 - 01:01:55:03
Suresh
Similarly, if you think about having a PyTorch job continue undisrupted when failures happen, what we are looking into are hooks within NCCL and hooks within PyTorch itself that are relevant to the tenant of these clouds, whether you're running in a Google or an Azure, or you're running in a Nebius, or in your own data centers.
01:01:55:05 - 01:02:10:02
Chris
Yeah. Well, you know, there's an interesting conversation that's been happening around the idea of, with all these foundational models, have we sort of reached peak training, and now it's all going to be inferencing?
01:02:10:02 - 01:02:11:21
Suresh
Yeah. And then you squeeze.
01:02:11:22 - 01:02:26:08
Chris
Then you start looking at, like, well, but we've still got all this visual training data that's going to be coming in. It seems like training is not going anywhere anytime soon, in my book. But what's your perspective on that?
01:02:26:10 - 01:02:51:01
Suresh
Yeah. So for sure, training is nowhere near the point where you can say there'll be no need to train and everything will work on a foundation model. I do think it's hard for me to imagine more than a handful of foundation models that will be successful over time. So if you think about truly massive foundation models, you maybe have less than ten globally.
01:02:51:01 - 01:03:16:21
Suresh
I mean, there'll be some that countries will want to have simply for national security reasons and so on. But in terms of broadly applicable foundation models, let's take a simple example. Actually, you called out image models as a great example. There will be a few dozen companies really specializing in image-centric models. Or take automotive.
01:03:16:23 - 01:03:45:14
Suresh
Tesla is at least publicly known to have deployed somewhere between 60,000 and 100,000 GPUs just for training automotive models. And it's not likely that there's going to be a single model that Volkswagen and Mercedes and everybody else will use. So in automotive, there will be a couple of dozen autonomous driving models that will exist. Similarly, in drug research and pharma, you'll continue to have, not necessarily ground-up models, but pre-training and fine-tuning.
01:03:45:14 - 01:04:07:17
Suresh
And there's a whole bunch of AI verticals like that. So when I think about it that way, I'm convinced that there will be several hundred large companies doing training. I don't believe, Chris, that there will be tens of thousands of enterprises doing training. They may do some amount of fine-tuning, but even that, I believe, is probably not going to number in the tens of thousands.
01:04:07:19 - 01:04:34:08
Suresh
Inference, of course, frankly, every application in the world will be an inference application, for sure. So I think that is truly extremely broad. The interesting thing with training is that if there are a thousand companies doing training, the infrastructure spend there is still enormous. What I also find interesting is that inference itself is becoming multi-GPU and highly distributed.
01:04:34:10 - 01:05:15:17
Suresh
Yeah. There are trends driving that. As you have larger and larger context lengths, you keep what's called the KV cache, which is really the tensor representation of all the context, if you will, that needs to be in memory to produce the output when someone issues a query. The amount of context whose tensor values you're storing up front, to invoke rapidly when a new query comes in, is growing into the terabytes of memory, and that's forcing a disaggregation of inference into multi-GPU inference, if you will.
01:05:15:19 - 01:05:16:23
Suresh
Right. And so that is.
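The scale Suresh describes can be checked with a rough KV cache estimate. The formula is standard (two tensors, K and V, cached per layer per token); the model dimensions below are hypothetical, chosen only to illustrate how long contexts reach tens of gigabytes per session:

```python
# Back-of-the-envelope KV cache size for a model serving long contexts.
# The model dimensions are hypothetical, purely to illustrate the scale.

def kv_cache_bytes(layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
    # 2x for the K and V tensors cached at every layer (fp16 = 2 bytes/elem)
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem

# A hypothetical 70B-class model: 80 layers, 8 KV heads of dimension 128
per_token = kv_cache_bytes(80, 8, 128, 1)
per_128k_ctx = kv_cache_bytes(80, 8, 128, 128_000)
print(f"{per_token} bytes/token, {per_128k_ctx / 1e9:.1f} GB for a 128k context")
```

Thousands of concurrent sessions at tens of gigabytes each do reach into terabytes, which is why the cache gets spread across many GPUs and external storage.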
01:05:16:23 - 01:05:22:07
Chris
Bringing a perfect application for this product, Clockwork, exactly.
01:05:22:07 - 01:05:38:03
Suresh
No, no. Exactly. We are finding that suddenly all the things we talked about, highly stateful, highly latency-sensitive, multi-GPU training environments, are becoming more and more true of inference as well. And that's what I believe will happen even in the world of inference.
01:05:38:05 - 01:05:44:16
Chris
Yeah, I think right now the biggest challenge for AI is context. Yes, exactly. That is exactly it.
01:05:44:20 - 01:06:12:17
Suresh
And there are some really powerful technologies evolving in the inference world to solve that. The first is, how do I store hundreds of gigabytes to terabytes of inference context in an external storage system, and yet bring that into the memory of the GPU that's actually serving the inference request really fast, using special protocols to move data fast for inference serving?
01:06:12:19 - 01:06:29:09
Suresh
How do I take an incoming query and route it to the specific GPU that has the context most readily available and is idle at the same time? Right. And so it's slowly becoming a communication optimization problem, for sure.
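The routing question above can be sketched as a simple placement policy: prefer an idle GPU that already holds the session's context, and only then fall back to any idle GPU. This is a toy illustration of the idea, not Clockwork's implementation; all names are hypothetical:

```python
# Minimal sketch of cache-aware inference routing: prefer an idle GPU that
# already holds the session's KV cache, else fall back to any idle GPU.
from dataclasses import dataclass, field

@dataclass
class Gpu:
    name: str
    busy: bool = False
    cached_sessions: set = field(default_factory=set)

def route(session_id, gpus):
    # 1st choice: idle GPU with the context already in memory (no cache reload)
    for g in gpus:
        if not g.busy and session_id in g.cached_sessions:
            return g
    # 2nd choice: any idle GPU (pays the cost of fetching the KV cache)
    for g in gpus:
        if not g.busy:
            return g
    return None  # all busy: caller queues the request

gpus = [Gpu("gpu0", busy=True, cached_sessions={"s1"}),
        Gpu("gpu1", cached_sessions={"s1"}),
        Gpu("gpu2")]
print(route("s1", gpus).name)  # gpu1: idle and already holds s1's context
```

A real scheduler would also weigh queue depth and cache-transfer cost, but the core trade-off (cache locality versus idleness) is the one described above.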
01:06:29:11 - 01:06:59:08
Sandesh
Yeah. What's interesting is, when new technologies come out, it's rare that they start immediately in these large, highly performant, complex environments. Right. And that's what's interesting to me too: with the customers that you have, if you can solve their problems, the others will be much easier.
01:06:59:08 - 01:07:04:12
Sandesh
You're going after the hardest customers out there, and you're trying to sell them.
01:07:04:14 - 01:07:33:02
Suresh
No, that's so true. It's both the boon and the bane of the problem we've solved and where that problem is most urgent, even in our non-GPU, microservices customer base, where we are optimizing both the detection of network bottlenecks in large-scale microservices and addressing the problem.
01:07:33:02 - 01:08:08:06
Suresh
In our public announcement of our platform, our coming-out party if you will, we talked about Uber as one of the customers. And what's publicly known about Uber is they have thousands of microservices running on nearly 200,000 machines across three clouds. And so at that scale, getting latency down is what matters most in these extremely large-scale, user-facing applications, where a delay or an outage on an application means revenue is on the line right now: rides are lost or food is not delivered.
01:08:08:06 - 01:08:29:15
Suresh
So there are real implications for service levels. On the GPU side, in a similar vein, where our ability to deliver resilience matters is when you're running thousands to tens of thousands of GPUs, and when you lose time, it's basically thousands of GPUs lying idle. So you're going to do whatever it takes to fix that. The good news is we are battle-tested in some really large environments.
01:08:29:15 - 01:08:44:09
Suresh
And the revenue per customer is abnormally high compared to all my prior experiences. The bad news is failures are very, very visible. I mean, your product had better do what it's saying, because otherwise the black eye is dangerous.
01:08:44:11 - 01:08:45:06
Chris
Yeah, yeah.
01:08:45:06 - 01:08:47:10
Suresh
For a startup in particular, no doubt. Yeah.
01:08:47:10 - 01:08:53:13
Sandesh
Yeah, yeah. I could keep going for hours, but we'll try to land this plane.
01:08:53:15 - 01:08:56:14
Chris
We have already just blown through our time, so.
01:08:56:14 - 01:08:58:06
Sandesh
I apologize, and thanks.
01:08:58:08 - 01:09:18:00
Chris
And I'll tell you, the first problem you solved is a really interesting one too. I feel like we almost need a podcast just to talk about that technology, to get down to that level of granularity. It's really interesting. I'd be very curious.
01:09:18:00 - 01:09:40:23
Suresh
This is the honest truth: even though I knew the founder for 15 years, my interaction with Clockwork started when a headhunter called me and described this company that had solved this clock sync problem. My first reaction was, look, I have been in tech for a long time. This is a very old problem in computer science.
01:09:41:05 - 01:10:02:08
Suresh
How do you synchronize clocks? And frankly, if clocks could be synchronized to that level of accuracy, databases would not quiesce in order to do writes and housekeeping, and storage systems would not quiesce in order to take a snapshot. They would just rely on timestamps to say, I can create a snapshot, and so on. So I'm like, I'm not sure I buy what you're saying.
01:10:02:13 - 01:10:20:04
Suresh
And in fact, he then sent me the paper where this was presented. And that's when I realized, oh my gosh, these are people I know well, let me just go look. I didn't know that that's what they were working on. And so I agree with you. It's an extremely interesting problem, and it's all in software.
01:10:20:04 - 01:10:27:07
Chris
That's the thing that really gets me. Exactly. Because I was trying to envision how you would do that. That's right.
01:10:27:07 - 01:10:39:14
Suresh
It's not PTP, of course, right? Because this problem has been solved through hardware. PTP solves it to the same levels of accuracy, but with very specific hardware on a smaller number of machines. Exactly, exactly.
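The basic primitive behind software clock sync is easy to state. A classic NTP-style probe estimates offset and delay from four timestamps; Clockwork's published work (the Huygens paper Suresh refers to) goes far beyond this, filtering probe data and exploiting correlations across many machine pairs, but this sketch shows the starting point:

```python
# Classic NTP-style clock offset estimate from one request/response probe.
# t1: client send, t2: server receive, t3: server send, t4: client receive.
# Software sync systems build on this primitive with heavy statistical
# filtering to reach far better accuracy than a single probe allows.

def offset_and_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2   # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)          # time spent on the wire, round trip
    return offset, delay

# Example: server clock runs 5 units ahead, symmetric 2-unit one-way delay
t1 = 100        # client sends at client-clock 100
t2 = 107        # server receives: true time 102, plus 5 units of clock skew
t3 = 108        # server replies at server-clock 108
t4 = 105        # client receives at client-clock 105
print(offset_and_delay(t1, t2, t3, t4))  # (5.0, 4)
```

The catch, and the reason this is a hard problem, is that real network delays are asymmetric and noisy, which is exactly the error the more sophisticated techniques work to cancel.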
01:10:39:14 - 01:10:40:19
Chris
Now, super interesting.
01:10:40:19 - 01:10:52:01
Sandesh
So cool. Well, landing the plane. Before I ask you this question, I just want to know, is it okay to talk about TorchPass?
01:10:52:01 - 01:10:57:08
Suresh
Yeah, I alluded to it by saying we're now surviving the GPU failures. That's what we're working on, actually.
01:10:57:10 - 01:11:11:15
Sandesh
Thank you for that. I just wanted to make sure that was okay; I didn't know where you were at in announcing it. So Suresh, what is in store for Clockwork now, both from a technology vision and strategy perspective, as well as the company?
01:11:11:17 - 01:11:37:05
Suresh
Yeah. I'll start with the technology. I think the vision we have of a software-driven fabric has so many places where we can add value, and a couple of initiatives in particular. The first one extends our current product capability to survive network link flaps and keep training jobs running: we're working on a project internally that we call TorchPass.
01:11:37:05 - 01:12:11:19
Suresh
Basically, what it's aimed at is the ability to survive any kind of GPU failure and continue to operate training without letting that failure disrupt PyTorch jobs, if you will. So non-disruptive, fault-tolerant training in the face of any kind of GPU failure, not just network link flaps, is one of those projects. The second one is really being able to allow RDMA, storage, and TCP flows to coexist on a single Ethernet fabric.
01:12:11:21 - 01:12:43:11
Suresh
Specifically, RDMA requires a very separate set of configurations on the underlying Ethernet. Even if you use Ethernet, RoCE v2 is how you deploy RDMA on Ethernet fabrics. And then TCP, of course, is extremely well known. Traditionally, the two behave extremely differently: TCP anticipates losses and uses losses to control congestion, while RDMA requires lossless networks. So typically, to deploy them, you've had to physically segregate the network in very complex ways.
01:12:43:13 - 01:13:04:00
Suresh
Our software, I won't say trivially, I should say in a very simple manner, allows you to run RDMA and TCP on a common underlying Ethernet fabric. We're very excited about this. It's early stages; we have one of the largest hyperscalers trialing this in their environment, but it can lead to massive cost reduction and simplicity.
01:13:04:00 - 01:13:27:16
Suresh
So those are a couple of examples of the larger vision of ultimately allowing software to really optimize distributed applications on commodity networks. These are examples of what we're doing next that go in that direction, and I'm excited about them. As a company, to be honest, what's top of mind for us is just scaling the business, both in terms of customers, but really scaling the employee base as well.
01:13:27:16 - 01:13:45:12
Suresh
It's a young company; we've grown by almost 40% in the last few months. And so scaling the engineering team, scaling our ability to deliver to really large customers, is top of mind. So that's something else I'm excited about.
01:13:45:14 - 01:13:47:15
Chris
It's a fun challenge at this stage.
01:13:47:17 - 01:14:06:18
Suresh
It's indeed very challenging, because finding good talent, especially given a strong desire for us to be in one location, to have everybody literally work from a single office and so on, is something we want to hold on to as long as we can. And so that makes it even more challenging.
01:14:06:20 - 01:14:13:19
Sandesh
Right? Yeah. Well, you seem to like challenges, so I have a feeling you'll figure this one out.
01:14:14:01 - 01:14:18:13
Chris
I'm not sure you're going to make it to retirement. I think it's not, you know.
01:14:18:18 - 01:14:22:05
Suresh
As long as life is fun, why bother retiring, exactly?
01:14:22:07 - 01:14:23:09
Sandesh
Is this...
01:14:23:09 - 01:14:24:08
Chris
...retirement for you?
01:14:24:08 - 01:14:25:18
Suresh
Indeed. Indeed.
01:14:25:18 - 01:14:43:05
Sandesh
Indeed, indeed. You know, from a sales guy perspective, it's not like there are thousands of potential clients for you. You're really focused on the big, big guys. But also, this is a very technical sale. It seems like you.
01:14:43:05 - 01:14:45:01
Suresh
Really need to.
01:14:45:03 - 01:15:01:07
Sandesh
Understand these components and how they're talking to each other, and all these resiliency issues and visibility issues. There's so much that goes into this. It's interesting. I mean, it's like you can't just have a...
01:15:01:07 - 01:15:30:02
Suresh
No, I know, you're right. But let me say one thing, Sandesh. I'll come to the first point second, but let me start with the second point: it's a very technical sale. I'll say yes and no, because on the yes side, when it comes to how you solve the problem and you explain how your technology works, you definitely need a strong technologist on the other side who can first buy into how you're approaching the solution space and then prove it to themselves in a proof of concept.
01:15:30:02 - 01:15:52:13
Suresh
That is, nobody deploys a solution without testing it in their own environment. But in the early phase, can I get to that second technical conversation and a proof of concept? What we are finding is that when we articulate the problems and assert, these are what you're probably seeing in your environment, the person on the other side quickly says, exactly right.
01:15:52:13 - 01:16:10:23
Suresh
And so, yeah, the problems are very easy to articulate; they are something customers are experiencing all day long. So as long as our sales teams go to the right person and say, this is probably what you're experiencing, and if so, we want to have a second conversation that's technical, that's proving to be not that hard.
01:16:10:23 - 01:16:20:07
Suresh
And so at least so far, I think Joe and the sales team are generally having a good time engaging customers and getting to the next conversation.
01:16:20:09 - 01:16:23:13
Chris
Yeah, it's a well-recognized problem.
01:16:23:15 - 01:16:25:10
Sandesh
That's what really makes the sale.
01:16:25:10 - 01:16:26:12
Suresh
That's right, that's right.
01:16:26:12 - 01:16:34:08
Sandesh
You know, that's right. I guess you're right. Yeah. You need a technical person to explain it, but the business value is really clear.
01:16:34:08 - 01:16:50:23
Suresh
Exactly, exactly. And you know, Joe, our CRO, is very big on saying, look, I need to be able to translate this into what it means to the customer and their business. And he's saying, this is not hard to do in this case. And so that brings us to the question of how large the customer base is.
01:16:51:01 - 01:17:29:16
Suresh
Something else I'm excited about: today we focus on two very targeted customer groups. One is neo clouds; there are about 180 of them, I believe, and over time it'll probably become a smaller number. The second is extremely large enterprises with more than 256 GPUs that are doing training. Now, in the not-too-distant future, we are bringing something that allows you to address people doing training in clouds with a small number of GPUs, in, let's say, an Azure or a Google, and bring some of the value even to those environments.
01:17:29:16 - 01:17:52:01
Suresh
And so that suddenly expands the group to a much larger audience whose problems we can solve. Part of it is also being very careful to say, let's front-load where sales and marketing efficiency can be highest: large deals with technically sophisticated buyers are where we can make the fastest progress. And so we'll stay focused on the first audience.
01:17:52:01 - 01:18:11:02
Suresh
But gradually we are seeing the opportunity open up to even cloud-hosted training customers. And, as I mentioned, inference is feeling very much like a distributed application, and that's the next step that will open it up further. So we see a pathway; today, for sure, the focus is on those 500 to 1,000 customers.
01:18:11:04 - 01:18:14:17
Chris
But what a great place to start, because that's exactly the right target.
01:18:14:17 - 01:18:21:10
Suresh
Yeah exactly. I worry more about branching out sooner than we should than about not branching out.
01:18:22:10 - 01:18:33:12
Chris
Yeah. Well, I've got to imagine, as you evolve over time, there are some foundational technologies that you guys have developed that can be used in a lot of different areas.
01:18:33:12 - 01:18:36:17
Suresh
Exactly. I could not agree more, Chris.
01:18:36:19 - 01:18:39:16
Sandesh
So we're going to have to stay in touch soon. Absolutely.
01:18:39:20 - 01:18:41:05
Suresh
Looking forward to that, Sandesh, Chris.
01:18:41:08 - 01:19:01:21
Sandesh
It's so exciting where you're at in such a short period of time. Just through this conversation and learning about Clockwork, I learned so much, as I did through the conversation I had with Joe and you, and just reading about you guys. I joined that webinar as well. So hopefully you'll be a frequent offender on here, and I look forward to it.
01:19:01:23 - 01:19:03:19
Suresh
Yeah, we're all living in fun times again.
01:19:03:19 - 01:19:08:05
Sandesh
Awesome. Well, thank you so much for joining. Thank you. Take care.