The Macro AI Podcast

NVIDIA Vera

The AI Guides - Gary Sloper & Scott Bryan Season 2 Episode 83


In this episode of the Macro AI Podcast, Gary and Scott break down NVIDIA Vera and why it matters far beyond another chip announcement. 

Vera is NVIDIA’s new data center CPU, but the bigger story is NVIDIA’s push to define the full AI factory architecture — CPU, GPU, memory, networking, interconnect, security, rack design, and software working together as one system. 

Gary and Scott explain why the AI conversation is moving beyond GPUs alone. As AI shifts from simple chatbots to agents that retrieve data, call tools, use APIs, check permissions, and complete real business workflows, the infrastructure around the GPU becomes increasingly important. 

The episode covers how Vera works with NVIDIA’s Rubin GPUs, NVLink, ConnectX networking, BlueField DPUs, and OEM systems from companies like Dell and Supermicro to support high-volume agentic AI workloads. The hosts also discuss why this matters for hyperscalers, neoclouds, colocation providers, mid-large enterprises, and even smaller AI-native companies where inference cost, latency, and model performance directly affect product margins. 

The key takeaway: Vera is partly a cost optimization story. Not because CPUs replace GPUs, but because better architecture keeps expensive GPUs focused on high-value computation instead of wasting time on coordination, data movement, or system overhead. 

For CIOs and AI product leaders, Vera raises a critical question: where should each AI workload run? Some AI belongs on the PC, some in SaaS, some in public cloud, some in neoclouds, and some in private or colocated AI factories. 

Enterprise AI is becoming a distributed system — and the winners will be the companies that understand which workloads belong where. 

Send a Text to the AI Guides on the show!


About your AI Guides

Gary Sloper

https://www.linkedin.com/in/gsloper/


Scott Bryan

https://www.linkedin.com/in/scottjbryan/

 

Macro AI Website

https://www.macroaipodcast.com/

Macro AI LinkedIn Page:  

https://www.linkedin.com/company/macro-ai-podcast/


Gary's Free AI Readiness Assessment:

https://macronetservices.com/events/the-comprehensive-guide-to-ai-readiness


Scott's Content & Blog

https://www.macronomics.ai/blog






So join us, let's explore, learn, and lead together. Welcome back to the Macro AI Podcast. Today we're going to talk a little bit about NVIDIA's Vera platform and what that is. For those of you who are not familiar with Vera, it's NVIDIA's new data center CPU. But really the bigger story is that NVIDIA is trying to control more of the so-called AI factory. So think of CPU, GPU, memory, networking,

01:26
interconnect, security, rack design, and software. For the last few years, everyone has been focused on GPUs, you know, the H100s, Blackwell, Rubin. And the GPU has really been the star of the AI infrastructure story. But as AI moves from chatbots to agents, as we've discussed in the past, the system around the GPU becomes much more important. We did an episode on AI PCs, and thank you for a lot of the feedback we received

01:55
and some additional questions that we've answered. That episode was about intelligence moving closer to the endpoint. Vera is on the other side of the story. It's about the back-end infrastructure required when AI becomes industrial. Yeah, Gary, that's kind of what I was thinking. That's really good framing there. It's much more than just a chip story. This is more of an infrastructure story. Right. And I think

02:24
the business question is what happens when AI starts becoming part of real workflows, which it's already doing inside mid-to-large businesses everywhere. So the agents don't just answer questions. They go out and retrieve data. They call tools, use APIs, check permissions, and complete tasks, all while maintaining context. And that creates a much more demanding infrastructure problem. And that's what we're talking about here. Yeah, exactly.

02:52
For those of you listening, in plain English, Vera is NVIDIA's Arm-based CPU for AI data centers. It's designed to work tightly with NVIDIA's next-gen Rubin GPUs as part of the Vera Rubin platform. So the important point is that NVIDIA is not thinking about this in a traditional server sense, where you plug in a GPU and call it AI infrastructure. They're thinking more at rack scale. In the Vera Rubin NVL72 design,

03:22
NVIDIA combines Vera CPUs, Rubin GPUs, NVLink, ConnectX networking, BlueField DPUs, memory, cooling, and software into one integrated AI system, which is really interesting. The goal is to make the rack behave less like a collection of servers and more like one comprehensive AI computer, which is a big change for many of us who started in the data center, when we used to just have two- or four-post dumb metal racks.

03:52
Yeah, exactly. And now we're talking about AI performance, and it's not just about raw compute anymore. Once you get into large-scale inference, you know, long context windows, retrieval, tool calling, and workflows, the bottleneck can become data movement. How fast the system can move data between CPU, GPU, memory, network, and storage is all a factor. Yeah.

04:20
Yeah, that's exactly the point. The GPU still does the heavy AI math, and Vera does not replace the GPU. But the CPU helps coordinate the workload. It schedules task execution, supports the data pipeline, interacts with storage and networking, and helps orchestrate the broader AI system. With Vera, NVIDIA is trying to make the CPU and GPU work together more efficiently through things like high-speed

04:50
connectivity and coherent CPU-to-GPU links. Then NVLink connects the GPUs. ConnectX handles high-performance networking, if you think of it that way. BlueField DPUs help with infrastructure services, security, isolation, and data movement. So the real story is not Vera by itself. It's Vera plus Rubin plus NVLink plus networking plus DPUs plus software. And that is what NVIDIA means by

05:19
their AI factory. There are a lot of components here to unpack, virtually but also physically. Yeah. And I think that phrase you just said, the AI factory, is a pretty useful one for business leaders. When you think about it, an AI factory takes inputs and produces outputs. It takes data, context, models, and workflows, and then puts out intelligence or completed work. So the metric

05:48
starts to change. Back when you think about chatbots, people were talking about cost per token, but now we're talking about big systems with lots of agents. And I think the better question is cost per completed task. You've got these bigger, more complex systems that you can buy, so now you're thinking about cost per completed task. Did it complete the workflow correctly? Did the agent resolve the support ticket?

06:14
Did it review the contract, or whatever the case may be? So you're looking at the business case from a slightly different perspective. Yeah, it's a good point about whether the AI system completed the task. That's actually a really good way to put it. And this is why the infrastructure matters. An AI agent may need multiple model calls, multiple retrieval steps, tool calls, security checks, and intermediate outputs to complete one task.
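
The cost-per-completed-task idea can be made concrete with a little arithmetic. The following is a minimal sketch, not a pricing model: the AgentTaskProfile class, its fields, and every number in the example are hypothetical assumptions chosen only to show how the cost of an attempt and the success rate combine.

```python
from dataclasses import dataclass


@dataclass
class AgentTaskProfile:
    """Hypothetical resource profile for one attempt at an agent workflow."""
    model_calls: int            # model invocations per attempt
    tokens_per_call: int        # average tokens (input + output) per call
    cost_per_1k_tokens: float   # assumed blended token price, in dollars
    overhead_cost: float        # assumed cost of retrieval, tool calls, and checks per attempt
    success_rate: float         # fraction of attempts that complete the task correctly


def cost_per_attempt(p: AgentTaskProfile) -> float:
    """Token spend plus non-model overhead for a single attempt."""
    token_cost = p.model_calls * p.tokens_per_call * p.cost_per_1k_tokens / 1000
    return token_cost + p.overhead_cost


def cost_per_completed_task(p: AgentTaskProfile) -> float:
    """Failed attempts still cost money, so divide by the success rate."""
    if not 0 < p.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt(p) / p.success_rate


if __name__ == "__main__":
    # Illustrative numbers only: 8 calls of 2,000 tokens each, 80% success.
    ticket = AgentTaskProfile(
        model_calls=8,
        tokens_per_call=2000,
        cost_per_1k_tokens=0.002,
        overhead_cost=0.008,
        success_rate=0.8,
    )
    print(f"cost per attempt:        ${cost_per_attempt(ticket):.3f}")
    print(f"cost per completed task: ${cost_per_completed_task(ticket):.3f}")
```

In this sketch, lowering overhead_cost or raising success_rate reduces the cost per completed task even if the token price stays the same, which is the sense in which system architecture, and not only the GPU, shows up in the business metric.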

06:42
So if the system architecture performs efficiently, it can lower the cost per completed workflow, which is the goal you were just talking about. And that is the Vera story for business leaders. It's not that NVIDIA built a CPU. It's that NVIDIA is building the infrastructure for high-volume agentic work. Yeah. I think it's important to note that the ecosystem of vendors is really forming around it. So Dell announced

07:11
PowerEdge systems using NVIDIA Vera Rubin NVL72. You also have Supermicro out there. They announced a package with Vera Rubin NVL72 plus the HGX Rubin NVL8 and the Vera CPU systems that we're talking about. NVIDIA has also talked about other relationships with major partners like Cisco, Dell, HPE, Lenovo, and Supermicro. So I think this announcement is moving right into the

07:41
enterprise infrastructure market. Yeah, good point. And that's important because it changes the buyer conversation. This is not only for hyperscalers. Yes, hyperscalers and neoclouds will buy at massive scale, but mid-to-large enterprises may also buy OEM systems and put them into co-location facilities, whether a third party's or their own, or even into private AI environments.

08:10
And I would go one step further. Some smaller companies may be better candidates than larger companies if AI is core to their product, right? This may be a better fit for some of these smaller institutions than for a large conglomerate. The dividing line is not the company size. The dividing line is the AI intensity of each business. Yeah. When you think about AI intensity, I think that's a perfect point. So a large enterprise

08:40
using AI mostly for, say, internal productivity might not need dedicated Vera Rubin-class infrastructure. But a much smaller AI-native SaaS company or a cybersecurity platform, and you and I both work with voice AI providers, legal AI companies, a whole bunch of different possibilities out there, might absolutely look at dedicated infrastructure if inference cost,

09:06
latency, and model performance are tied directly to the margin of the product they're putting out in the marketplace. Right. And if your AI is your product, it's the core gemstone of the company. Infrastructure is not just IT. It's the cost of goods sold, right? So go back to your accounting days. It's the product performance and customer experience. It may be part of your competitive moat, but it is core to your organization. It's not just

09:35
an operational expense to run the business. Yeah. And you could maybe just get what you need through a SaaS model from a vendor. Correct. So a smaller AI-heavy company may decide to buy or lease OEM AI systems, place them in a high-density co-location facility, and connect them to cloud environments. Multi-cloud connectivity designs are where I've been spending a lot of my time with customers recently, where they want to connect to

10:05
public cloud providers but also have a data center environment. So it becomes a hybrid, multi-cloud design. Yeah, and that network piece gets pretty complex. It does, it does. And some of you may be new to the data center space: because you were told to get out of the data center for so many years, you didn't really focus on it. Now it's coming back. Right. And these OEM AI systems that we're talking about offer

10:34
companies that are listening right now a middle ground between public cloud APIs and building their own data center completely. Right. Exactly. Yeah. And what we're talking about here is obviously pretty serious infrastructure. I mean, these are dense, expensive, power-hungry, oftentimes liquid-cooled systems, and you need the right co-location facility with the right power, cooling, networking, and security. And then you also need your own AI engineering team,

11:02
to whatever extent, that can keep the hardware actually utilized. So it's not for every company, just the ones that really need it and have the expertise. Yeah, exactly. And the question isn't, can we buy it? The question is, can we use it enough, operate it well enough, and turn it into business value for our organization? If the answer is yes, dedicated AI infrastructure can make sense. If the answer is no, cloud or managed platforms may be better for your company.
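
The "can we use it enough" question is largely a utilization calculation. Here is a rough sketch under stated assumptions: the function names, the 730-hour month, and all dollar figures are hypothetical placeholders rather than real prices for any vendor's hardware or cloud service, and the monthly cost is assumed to include the lease, co-location, power, and staffing.

```python
def dedicated_cost_per_useful_gpu_hour(
    monthly_cost: float,
    gpus: int,
    utilization: float,
    hours_per_month: float = 730.0,
) -> float:
    """Effective cost of each GPU-hour that actually does useful work.

    Idle hardware still costs money, so low utilization raises this number.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return monthly_cost / (gpus * hours_per_month * utilization)


def breakeven_utilization(
    monthly_cost: float,
    gpus: int,
    cloud_price_per_gpu_hour: float,
    hours_per_month: float = 730.0,
) -> float:
    """Utilization at which dedicated hardware matches the assumed cloud price.

    A result above 1.0 means dedicated hardware cannot break even at that price.
    """
    return monthly_cost / (gpus * hours_per_month * cloud_price_per_gpu_hour)


if __name__ == "__main__":
    # Illustrative numbers only: an 8-GPU system at $14,600 per month all-in,
    # compared against an assumed $5.00 per GPU-hour cloud rate.
    point = breakeven_utilization(14_600, gpus=8, cloud_price_per_gpu_hour=5.0)
    print(f"break-even utilization: {point:.0%}")
    for u in (0.25, 0.50, 0.90):
        rate = dedicated_cost_per_useful_gpu_hour(14_600, gpus=8, utilization=u)
        print(f"utilization {u:.0%}: ${rate:.2f} per useful GPU-hour")
```

With these placeholder inputs, a team that keeps the hardware busy well above the break-even point comes out ahead on dedicated infrastructure, while one that cannot is better served by cloud or managed platforms. The calculation ignores factors such as latency requirements, data residency, and the cost of hiring the engineering team in the first place.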

11:31
And Vera is partly a cost story, not because lower-cost CPUs replace GPUs, but because good architecture keeps expensive GPUs focused on high-value computation instead of wasting time on coordination, data movement, or even system overhead. So those are the things you need to weigh for your specific use case. Yeah. We talked about that cost story with CPUs versus GPUs a while back, I think

12:00
late last year. But for CIOs and AI product leaders, Vera is really a forcing function. It forces them to ask, where should our AI workloads run? Some AI belongs on a PC, like we talked about. Some belongs in SaaS, and you can get it through your SaaS vendor. Some belongs in public cloud, maybe in neoclouds. And then, like we've been talking about,

12:28
some belongs in private or co-located AI factories. And I think there's more of a push toward looking at that. A lot of cost models are being done right now. So I think the future is hybrid. Yeah, I totally agree. And that's a big takeaway. Vera matters because it shows that NVIDIA is moving beyond the GPU-only conversation that everybody's been having for the last 12 to 36 months. They are building the AI factory as a full system:

12:58
CPU, GPU, memory, interconnect, networking, DPU, software, and the rack design we just talked about. And as AI becomes more agentic, that system architecture becomes a competitive advantage in the marketplace. Yeah, exactly. So if you back up, I think the simple thesis is that Vera's not important just because NVIDIA now has a CPU.

13:25
Vera is important because NVIDIA is fine-tuning and building the back-end infrastructure for industrial-scale AI and all the potential use cases in there. And they're doing it as cost-effectively as possible because the environment is getting more competitive. There are more chip offerings out there. There's a lot going on, a lot of ecosystems. So they realize that they also need to be cost-competitive and have the full ecosystem. Completely agree. And for the business leaders listening to the show right now,

13:55
we always try to do some takeaways. I think one takeaway is this: enterprise AI is becoming a distributed system. And we can certainly go off on a lot of different tangents about that and where data centers are placed. But the winners will understand which workloads belong where, and they will design their AI architecture according to where each workload needs to reside. Yeah, I think that's it, Gary. I think you're right.

14:23
I hope this short episode was informative for you, and thanks to everyone for listening to the Macro AI Podcast. As always, please share with your network. Any questions, please reach out to Scott and me. And until next time, keep leading in the AI era.