AI Proving Ground Podcast: Exploring Artificial Intelligence & Enterprise AI with World Wide Technology

Agentic AI at Scale: How to Build Without Losing Control

World Wide Technology: Artificial Intelligence Experts Season 1 Episode 75


Agentic AI is getting real fast. The challenge isn’t building it anymore. It’s running it without losing control.

Recorded live at NVIDIA GTC, this conversation gets into what actually changes when AI moves into production. Not in theory, but in practice.

As organizations start designing for real usage, new pressure shows up. Systems get more complex. Costs become harder to predict. And security can’t be something you fix later.

Cisco’s Kevin Wollenweber, NVIDIA’s Kevin Deierling and WWT’s Neil Anderson talk through what they’re seeing right now and how teams are adjusting in real time.

If you’re thinking about how to scale AI without losing control of your environment, your costs or your risk, this is worth your time.

More about this week's guests:

Kevin Wollenweber is Cisco's Senior Vice President and General Manager of Data Center and Internet Infrastructure. In this role, he leads product strategy to enhance Cisco's infrastructure solutions for the data center, high-performance routing, and mobile networks. His leadership is pivotal in driving growth and developing cutting-edge solutions to meet the dynamic needs of businesses worldwide.

Kevin Deierling is the senior vice president of Networking at NVIDIA, joining from Mellanox Technologies where he was senior vice president of Marketing. He has been a founder or senior executive at five startups that have achieved positive outcomes. Combining both technical and business expertise, Deierling has variously served as the chief officer of technology, architecture, and marketing at these companies, where he led the development of strategy and products across a broad range of disciplines including networking, security, cloud, big data, virtualization, storage, smart energy, and DNA sequencing.

Neil Anderson has over 30 years of experience in AI, Software Development, Wireless, Cyber, and Networking technologies. At WWT Neil is VP and CTO in our Global Solutions and Architectures team, with responsibility for over $16B in WWT's solutions portfolio. Neil advises hundreds of Fortune 1000 companies on their global architecture and technology strategy.

The AI Proving Ground Podcast leverages the deep AI technical and business expertise from within World Wide Technology's one-of-a-kind AI Proving Ground, which provides unrivaled access to the world's leading AI technologies. This unique lab environment accelerates your ability to learn about, test, train and implement AI solutions. 

Learn more about WWT's AI Proving Ground.

The AI Proving Ground is a composable lab environment that features the latest high-performance infrastructure and reference architectures from the world's leading AI companies, such as NVIDIA, Cisco, Dell, F5, AMD, Intel and others.

Developed within our Advanced Technology Center (ATC), this one-of-a-kind lab environment empowers IT teams to evaluate and test AI infrastructure, software and solutions for efficacy, scalability and flexibility — all under one roof. The AI Proving Ground provides visibility into data flows across the entire development pipeline, enabling more informed decision-making while safeguarding production environments. 

AI Strategy Is Not A Finish Line

SPEAKER_03

If you're treating the latest model release or infrastructure as the finish line for your AI strategy, I've got bad news for you: you're already behind, or may have lost the race already. Because what matters most now is whether you can govern agents, secure token flow, and scale inference before experimentation turns into operational sprawl. From World Wide Technology, this is the AI Proving Ground Podcast. This episode, which features three leading experts from NVIDIA, Cisco, and WWT, was recorded live on the show floor at NVIDIA GTC, where one message was impossible to miss: enterprise AI has moved from model fascination to production reality. We'll be joined by Cisco SVP of Data Center and Internet Infrastructure Kevin Wollenweber, NVIDIA SVP of Networking Kevin Deierling, and WWT CTO of Cloud Infrastructure and AI Solutions Neil Anderson. Together they'll talk through what leaders need to do now to build secure AI factories, prepare for agentic AI, manage token economics, and design for hybrid inference at enterprise scale. Because the hard part is no longer getting AI to work; it's making it secure, measurable, and worth the spend when thousands of agents, users, and workloads all hit the system at once. So let's jump in. Gentlemen, welcome to the AI Proving Ground Podcast here at NVIDIA GTC: Kevin Wollenweber, Neil Anderson, Kevin Deierling. I'll do a little dancing here with both Kevins. Appreciate you, Neil, sitting in the middle and helping me out there. How are you guys doing today? Doing great. It's been a great show. Yeah.

SPEAKER_00

Outstanding. Lots of robots, lots of cool AI everywhere.

From Prototypes To Scaled Deployment

SPEAKER_03

It's a cool show. It's always exciting. So you three kind of come as a package deal. We've met at Cisco Live, we've met at NVIDIA GTC, we've met virtually. It kind of feels like you guys are just interlocked arms walking around the conference floor here. But, you know, Kevin, maybe we start with you. Thinking back to when we talked last time, out at GTC 2025: what's changed over the last 12 months, and, more so, what does that mean for enterprise leaders trying to drive AI?

SPEAKER_01

Yeah, I think it's been awesome. Because if I look at that, even over the last three years, we went from, you know, some ideas and some big thinking that we wanted to start working together and building things, to actually announcing some stuff we're building, to now actually seeing real live deployments. And what I love is the pace of innovation, you know, when we see things like OpenClaw and we see some of the agentic workflows happening. Oh, you've got that out. We all need to get some of those. I didn't realize you'd have that out so quick. But with all that happening in the world, the pace of innovation and that rapid spike of people actually doing things with these agentic workflows has been amazing to watch. Yeah, Neil, build on that.

SPEAKER_02

Yeah, I would say we're really moving into a lot of production deployments. It's been theoretical a little bit for some companies; it's been prototyping: what can we do? But I'm seeing scaled deployments. That's one big difference, I think. We're talking about how do you scale this a lot more than architecture, and a lot less of, you know, how would I build this prototype? It's a lot more like, all right, I gotta turn this loose on 10,000 people. What does that look like?

SPEAKER_03

Yeah, I mean, Kevin, things are out in the open now. Jensen's had the keynote, we've had a chance to walk around and see what's going on. But, you know, did anything surprise you from what you've seen here so far, or maybe what you heard from Jensen?

SPEAKER_00

Yeah, I think just the scale and the speed, and it's just happening everywhere all at once. You know, we talked about the fourth scaling law Jensen introduced, which is agents: agentic AI and the amount of tokens it's generating. I also think he explained really well that not all tokens are the same. There are free tokens in a free tier, where you're gonna maximize your AI factory for pure token production. And then at the other end, there are massive amounts of tokens being consumed as fast as possible by a user, and that user can be an agent. And I think that understanding of the tokenomics is extraordinary. You can figure out where you're gonna operate, and maybe have different parts of your factory operating at different points. And then, of course, security.

SPEAKER_01

Well, that's what's amazing to me: this non-linear effect that we see. It takes something transformative, like OpenClaw and agents. Before, we were trying to figure out, okay, is everybody gonna build a factory? Why do we need to generate our own tokens? And now it's, well, we need to be able to generate our own tokens efficiently, because I'm gonna have agents popping up all over the place that need to consume those. So it's amazing watching how fast that stuff goes.

SPEAKER_02

I thought it was also fascinating. Jensen talked about how it's gonna become part of the benefits of being employed at a company. Like, what's my token budget? Right? My salary, my benefits, my token budget.

Private AI Factories Go Hybrid

SPEAKER_03

Yeah, I mean, Neil, so the the AI factory is certainly becoming more of an operating model and not just a hardware stack. How does that change the math for enterprise leaders just trying to make sense of it all?

SPEAKER_02

Well, you know, I think it's playing out exactly the way that we all kind of thought it would, right? We're seeing more and more demand for what I would call private AI factories. Of course you're gonna prototype in cloud, and you're gonna scale some things in the cloud, but a lot of times with our clients, this is the most sensitive intellectual property and data that they have. And so they want to make sure that it's contained in an environment they can secure and are comfortable with. They want to make sure the cost model is right. And I think that's another realization I'm seeing with clients: they're doing the math. Like, what is this gonna cost to scale out in somebody else's infrastructure versus, hey, maybe I can operate this more efficiently myself?

SPEAKER_00

Yeah, and I think I heard Jensen say today, when we were talking about the open model builders that you can host on a secure platform: that's Nemotron, that's Llama, that's the Mistrals, all kinds of models. And Jensen said, it's not an OR. It's not that you're gonna use this frontier model, where you're paying big bucks for tokens in a frontier model that somebody else has trained, or you're gonna do it on-prem. He called it the tyranny of the OR. It's not a tyranny of the OR, it's an AND. You're gonna do both. When you need it to be secure, for whatever reason, governance or privacy or HIPAA, you want to do it with your own data on-prem, and you want to do that for economic reasons too. And then an agentic workflow is gonna be calling lots of models. There are gonna be different models for different needs, different horses for different courses. So you're actually gonna see all of these models. It's a hybrid world. You're gonna have on-prem, you'll build that out where you need to, and we'll do this all automatically. The agents will do this for us, choose which model to use when, and you kind of give it a budget and let it go.

SPEAKER_01

But it's only logical, right? I mean, AI is not one thing, it's not one application, and so it makes sense that you're gonna have multiple ways of implementing it, multiple places to run it, and multiple levels of security and policy you need to apply. It makes sense, but it just took some time to get there.

SPEAKER_00

Yeah. And Jensen characterized these agents as workers. They're your workers, they're gonna help you. And a lot of the stuff they're gonna do is mundane work that you don't want to do. And so that's really important. Just like society is organized, with different specializations across society, we're gonna see the same with agents. There'll be a proliferation of great agents that are gonna help us. And I think a lot of the ideation that we end up doing as people is gonna become really fun, and we'll have agents that can do some of the hard work that we don't want to do. Yeah, we're seeing that in live use cases, Kevin.

SPEAKER_02

Like, there are certain functions where it's just kind of overwhelming for a human to even think about doing them. There's just so much data they have to sort through to do something. There's just no other way to scale it than through an AI agent.

SPEAKER_01

Well, I think, you know, we've been talking about the coding agents for a long time, and we're heavily leveraging those as we build stuff internally. But we almost did ourselves a disservice treating coding as the only thing we were gonna do. Now that you see these agents popping up doing all these other tasks, the idea of having full teams, with a couple of your physical employees and then your agentic employees, and being able to give them tasks and let them perform things, we just need to expand beyond that coding use case. And once we saw people spinning up agents like they are now, it exploded.

SPEAKER_00

Yeah, and I think what's really interesting about this is that for the last 20 years, we've talked about the revenge of the nerds in Silicon Valley, with, you know, computer scientists and engineers really leading a lot of the things. I think we're gonna see a revenge of the lit people: the liberal arts majors who can think clearly and articulate, and now they don't need to code it themselves.

SPEAKER_01

I'll tell you what hit me: the first day, we did a little kickoff in the morning before Jensen's keynote. And when I was talking to my head of sales about the five agents that he has and what he does with them, it really sets in that this goes way beyond just engineering.

SPEAKER_00

Exactly. So it's gonna accelerate. Everybody's paying attention to coding, but I think what we'll see this year, in 2026, is that multi-disciplines, all kinds of disciplines, will start to get accelerated with AI. And people who can articulate clearly and have great ideas, they've been hamstrung because they needed 100 engineers to go execute on some of those ideas. Jensen says let a thousand flowers bloom. I think we're gonna see such great productivity come out over the next decade. Yeah, I agree with you, Kevin.

SPEAKER_02

Like, we're seeing that in action. You know, if you think about some of our clients: I saw a client the other day, and they were telling me that they've counted, they think, they're not really sure, 23,000 agents that have been developed by citizen developers. Yeah. Because what AI-assisted coding has done is really lowered that barrier. Anybody can write code now, really. Of course, you're still gonna need software architecture and that kind of thing. Software coders are not going away. But just this idea of being able to build a low-code agent, it opens it up to so many different things.

Securing Agents With Identity Policy

SPEAKER_00

Yeah. And I can't tell you how many times over the course of the last couple of days I've heard Cisco and WWT. Here's an agent, here's an operating model, here's some infrastructure, and we need to pull that together securely. Yeah, and that's the security word, because if you think about an agent, it's access to data, it's authority, and it's acting. So: access, authority, and act. You actually can't give it wide-open access to an enterprise. Hey, you can do anything with any data and you can send anything out? That doesn't work. So you actually need a secure AI factory, and this is where you guys come into play. I can't tell you how many different verticals said, oh, we're gonna go to WWT with Cisco and make that happen. And so we're happy to provide the low-level accelerators. Jensen talked about that horizontal stack, but we have to make it secure. We have to make a solution.

SPEAKER_03

Great partnership. Yeah, I mean, Kevin, I'm glad you brought up security. Kevin, I was gonna ask you: if we're looking at this wave of agents coming at us, how does that change the security perimeter?

SPEAKER_01

Yeah, well, that's what I love about it. You know, we created this idea around secure AI factories, and we've been talking about what it means to have one, but we needed that example of why you need a secure AI factory. And now we have to think about not just securing the equipment itself, because it's now a critical piece of infrastructure for all of our enterprises that are deploying it, but also, kind of, protecting the models from the people and protecting the people from the models. And so we have to look at identification of agents. We have to be able to apply policy to agents. Most people don't realize these agents are running with your credentials, so you're basically giving them access to do the things that you would normally do. It's your responsibility. And so our ability to change the way we do identity, to identify agents, apply policies to them, detect when they're acting maliciously and be able to block them, those are all things that have to be implemented, and that's why this secure AI factory concept is so critical.

SPEAKER_03

And Neil, do you feel that the organizations you speak with on the regular grasp that concept, that these are your credentials? Or is this a learning curve we're gonna have to go through?

SPEAKER_02

I think it's really starting to hit home. We talk about it in terms of: look, you've got to think of an agent as an employee. Yeah. And just like you would badge them, give them coaching and managerial oversight, and train them, what is the agent's identity? You're gonna challenge that with multi-factor. All those lessons that we've learned with humans apply to agents. And the sooner our clients get their heads wrapped around that, the better. It leads to the conversation about, oh, wait a second, how do I authenticate an agent? How do I know it's this Kevin, not that Kevin? It matters, right? So, yeah, it's definitely leading into that conversation a lot more. Not every client is getting it yet, but when we have that conversation with them, it kind of brings it home: oh, now I understand. I can't just have agents autonomously talking to other agents without some way to understand what data they have access to. Are they really who they say they are? It's perfect for a man-in-the-middle attack, if you think about it.
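The "badge the agent like an employee" idea Neil describes can be sketched in a few lines: issue each agent a signed, expiring credential that lists the scopes it may use, and verify that credential before every action, just as you would re-check a human's badge and permissions. This is an illustrative sketch only; the `issue_badge` and `verify_badge` helpers, the HMAC scheme, and the scope names are hypothetical stand-ins for a real identity provider, not any Cisco or WWT product.

```python
import hashlib
import hmac
import json
import time

SECRET = b"issuer-signing-key"  # hypothetical; a real system would use an IdP/PKI

def issue_badge(agent_id: str, scopes: list[str], ttl: int = 3600) -> str:
    """'Badge' the agent like an employee: identity + allowed scopes + expiry."""
    claims = {"agent": agent_id, "scopes": scopes, "exp": time.time() + ttl}
    body = json.dumps(claims, sort_keys=True)
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def verify_badge(badge: str, required_scope: str) -> bool:
    """Check the badge before every action: valid signature, not expired, in scope."""
    body, _, sig = badge.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # forged or tampered badge: possible man-in-the-middle
    claims = json.loads(body)
    return time.time() < claims["exp"] and required_scope in claims["scopes"]

badge = issue_badge("agent-kevin-1", scopes=["read:crm"])
print(verify_badge(badge, "read:crm"))       # True: right agent, right scope
print(verify_badge(badge, "write:crm"))      # False: action outside its scope
print(verify_badge(badge + "x", "read:crm")) # False: tampered signature
```

The point of the sketch is the shape, not the crypto: every agent-to-agent or agent-to-data hop gets an identity check and a scope check, which is the "this Kevin, not that Kevin" problem in code.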

SPEAKER_01

And constant monitoring. It's not like you can let them in and then everything's good. You have to continue to monitor and understand: are they continuing to act as expected, or do we need to change policy?

SPEAKER_00

Just like you would one of your employees. Exactly. Yeah, that's perfect. And I think, you know, if you look at that automation that you're talking about with agents, they can actually operate a lot faster than humans. Yeah, so the attacks are coming faster. So this is why we're so excited about Hyperfabric. We have a bunch of accelerators with our BlueField DPUs, and you guys are supporting that. The Hyperfabric announcement came out, and we're super excited to see that, because now we can automate that sort of threat detection. And previously you would have thought Cisco is all about networking and protection using access controls. It's gone much higher: it's identity, it's authentication, it's all of the things that you do for IT. Well, and it has to get fused into the network too. It can't just be a thing that sits on the side or on the perimeter; it's got to be part of that entire ecosystem, which is the beautiful thing about Hyperfabric. It's not just edge protection that's important, it's everywhere. It's everywhere, and runtime. So it's fantastic. Exactly.

SPEAKER_02

Yeah, and if you think about the architecture, I think the security architecture has fundamentally changed. You can't think about perimeter firewalling; that's not going to work for this, because there is no perimeter, first of all. Yeah. So this idea of a mesh firewall like Hypershield that's embedded into the process of agent-to-agent communication and agent-to-data access, to me, that's the only way to architecturally solve the problem.

SPEAKER_03

Kevin, I mean, security baked into Cisco's Secure AI Factory with NVIDIA, that's not new. We were talking about that at this time last year, if not before. And now the claws are out on the other Kevin's head. A little bit. The claws are out. I mean, we've seen some announcements here this week. How else is the Secure AI Factory evolving?

SPEAKER_01

Well, yeah, and actually Neil was talking about some of the underlying technology components. One of the things we announced here was the ability to take that hybrid mesh firewall and bring it all the way down to the BlueField DPUs that are running in that secure AI factory. So, yeah, we can have firewalls sitting around that perimeter, and we've been doing that since the beginning. We can have AI Defense protecting the models. But now we can actually bring that identification and policy management layer all the way down to the infrastructure itself, run it on those BlueFields that are sitting in the AI factory, and run it directly in the infrastructure stack.

SPEAKER_00

Yeah, it's probably hard to take me seriously with this on, but OpenClaw is a huge thing. You know, we announced NeMo Claw this week. We had the guys that wrote OpenClaw in; Peter was at NVIDIA. It's amazing how people are coding now and developing things. You're telling an agent what to do, you're telling another agent to red-team that agent, and a third agent to patch the hacks that have been found. And underlying all of this, there's planning, reasoning, and acting. Planning creates a to-do list, just like humans: we have a to-do list and we track, oh, this is what I'm doing now, I've completed this task, and you follow that to-do list. Then we have memory. It turns out thinking needs more memory, and it's memory writ large; it's across the storage stack. And so the integration with storage partners becomes really critical, because you have an agent, it reads 600,000 documents, and it summarizes that into something that it stores and passes on to the next agent. And having all of that protected, who's allowed to access what, is really important when you have distillation of knowledge and thinking. So I think the combination you have with all of your storage partners that comes together at WWT, that's fantastic.

SPEAKER_01

I mean, Jensen was on CNBC, and that's what he said. He said this was like that OpenAI moment, where a technology ramped faster than anything we've ever seen in the open source world, and it just opens up the door for us to do so many things.

Storage Becomes A Memory Tier

SPEAKER_03

Now I can take my claws off. And for the listening audience: yeah, Kevin Deierling had on an OpenClaw, well, I don't know if it was an official OpenClaw, headband, but it was good. It was good. We've started to bring up memory and storage here. I mean, has that always been a vital part of the equation, or are we just now realizing it?

SPEAKER_02

Well, I mean, data storage at large has always been a fundamental ingredient. If you think about what an AI factory is, it's taking power and data and producing tokens, right, at the end of the day. And now we're talking about doing it in a secure way as well. So data has always been there. But I think people have always thought about data as, you know, storage: high-performance storage that can feed the GPUs the data they need when they need it. But as these models are getting more sophisticated, like Kevin pointed out, when you're talking about agentic, you're talking about many agents accessing many models simultaneously. That takes memory, right? And so that has become a fundamental, a little bit of a constraining factor. And I think it's creating demand on the industry, which is fantastic for all of us. Yes, there will be pain in the short term because of that. But if you think about our careers, this is creating an amazing investment in IT.

SPEAKER_01

Well, even beyond the token generation, think about the impact that agents are gonna have on the network itself. You know, we sleep; agents don't. So you're gonna have agents that are running 24/7, and that means more traffic on the network, more access to storage, more access to resources. And so we're gonna have to think about not just building the token factories and being able to generate these tokens securely, but what are these agents gonna do when, you know, Kevin's got his 10 agents running for him while he's sleeping? It's 10 times the Kevin, all operating on the network at the same time. And the network has to be able to handle it.

SPEAKER_00

One Kevin's bad enough. And you know, it's not just memory that's in high demand, it's tokens. Yeah, there's a huge shortage of tokens. And so one of the things that we're doing to address the memory and the token demand is, rather than recompute things constantly: you know, over at the NVIDIA booth we're showing, we call him Ricci the Tax Man. It's a little robot. He ingested all of the IRS code, and you can ask him questions. And in one case, every time somebody new comes in, it re-ingests all of the IRS code. It runs through something called prefill, and it creates something called a KV cache. It's a bunch of AI data, a new class of data. It's not the written rules; it's something the AI can use to immediately answer questions. So you start asking him questions about tax deductions or whether you should be able to write off something. You might see me at the booth later. There you go. Yeah, and we thought it was timely. You know, you can ask it how do I cheat on my taxes, and I think we pre-programmed one to say you have to talk to a lawyer. But that's what guardrails are all about. I mean, that's security at work. That's what guardrails are about. But the point there is that by pre-computing it and reusing it, when we've run out of memory, we store it: it goes into storage. So you start to think about storage as a new tier of data center memory. And there's much, much more storage than there is DRAM on the planet, 50 times more, and in terms of the actual price per bit and the power per bit, it's far more efficient. And by reusing data rather than recomputing it, we can take those GPUs and use them to generate different tokens. So you get efficiency, and you get faster time to response and better power efficiency. All of it is good. But that's what's crazy, right?
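The prefill-reuse idea Kevin describes, compute the expensive ingestion once and serve later requests from a cache instead of recomputing, can be sketched at a high level. This is a toy illustration of the caching pattern, not NVIDIA's Dynamo or an actual GPU KV cache; `expensive_prefill` and the cache layout are hypothetical stand-ins for the prefill pass.

```python
import hashlib
import time

def expensive_prefill(corpus: str) -> dict:
    """Hypothetical stand-in for the costly prefill pass that turns raw
    documents into a reusable, KV-cache-like artifact."""
    time.sleep(0.01)  # simulate the expensive compute
    return {"corpus_chars": len(corpus), "first_tokens": corpus.split()[:5]}

_kv_cache: dict[str, dict] = {}  # content-hash -> cached prefill result

def prefill_with_reuse(corpus: str) -> tuple[dict, bool]:
    """Return (prefill_result, was_cache_hit).

    The first request pays the full prefill cost; identical later
    requests are served from the cache, freeing compute for new tokens."""
    key = hashlib.sha256(corpus.encode()).hexdigest()
    if key in _kv_cache:
        return _kv_cache[key], True   # reuse: no recompute
    result = expensive_prefill(corpus)
    _kv_cache[key] = result
    return result, False

docs = "IRS code section 1 through section 9999"
_, hit1 = prefill_with_reuse(docs)  # first visitor: computes prefill
_, hit2 = prefill_with_reuse(docs)  # next visitor: served from cache
print(hit1, hit2)  # False True
```

In the real system the cached artifact is far too large for DRAM, which is exactly why spilling it to a fast storage tier, and treating storage as a new tier of data-center memory, becomes attractive.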

SPEAKER_01

We've been training for so long. For the last three years, this has been about training networks and building for the big model builders. Now we're entering into this world of inference, which means using the technologies in different ways. And so storage was really important in training as well, but it's going to become even more important as we enter this world of inference.

SPEAKER_00

Yeah, what's interesting about storage for training is that people think training is incredibly hard, and it is. It takes a huge amount of compute and an enormous amount of data, but it's actually a very predictable process. You start with a hundred petabytes of data, all of the world's data, and you train on a chunk of it, and then all the GPUs share what they learned. There's a huge amount of networking, and then you grab the next chunk of data, but you know ahead of time exactly how you're gonna operate. Inferencing is the wild west. There are hundreds of agents, they're all running asynchronously, they're all getting something new, and you don't know what you're gonna need next. You don't know what the next piece of data is. So it actually puts more stress on the network and the storage. People were completely convinced that inferencing was easy. Inferencing is way harder from a compute and algorithmic perspective, and in the pressure that it puts on the network, and we're long-time networking people.

SPEAKER_01

I'm really excited to see kind of what this is gonna do to what we build. Yeah.

unknown

Yeah.

SPEAKER_03

Well, I mean, Neil, if inference is going to be kind of up and down and unpredictable, what other constraints or challenges does that place, not only on storage, but on the network or any other piece of the factory?

SPEAKER_02

Well, I think of a couple things. Kevin pointed it out: predictability. Creating models is kind of predictable. You know what your workload is; you can almost schedule it. It's gonna start here and it's gonna stop there. But you're exactly right, inference is a whole lot less predictable. We're only scratching the surface of what people are gonna do, which is sort of my second point: how are you gonna limit that? How are you going to put any sort of governor on all these citizen developers doing the most creative stuff? It is kind of the wild west. I think the next frontier is gonna be, all right, how do we almost timeshare, to make sure that no one person is consuming more than their fair share of tokens?

SPEAKER_01

That unpredictability Kevin was talking about, too. You're going to have these spikes; you're going to have events that drive a mass of agents doing things at the same time, things that are going to do to the network what we've never seen before. And so I think we're also going to start preparing for a completely different way the network's used, and how we're going to mitigate it.

Measuring And Policing Token Demand

SPEAKER_00

Yeah. And part of this is having a community to solve these problems, because there are all different kinds of problems. You know, we talked about Dynamo; we just released our 1.0. Jensen announced it last year. He called it the operating system of the AI factory. We're going to have to make that the operating system of the secure AI factory now. We have to secure it. But it's really about taking all of that. We introduced a new storage platform, STX and CMX, which contains all of this context. And context is not just the low-level KV cache that we put on CMX. It's all the context for the agents. When you're passing off from one agent to another agent, it has things like to-do lists and files and memory. Some of it's ephemeral, some of it needs to be durable. All of that governance, protecting it, really makes things more complex.

SPEAKER_03

Just a moment ago, Neil, we were pretty explicitly tying AI factories to token economics, to tokenomics. Do you think that has to change the math as far as leaders looking at ROI? How does it change the lens of ROI?

SPEAKER_02

Well, I think, you know, and Jensen mentioned this the other day, there's an amazing sort of lowering of the cost of tokens. But at the same time, that's creating the opposite effect you would imagine, which is massive, massive usage, because it's more affordable. Jevons paradox. Remember, we talked about that before. Yeah. So that's happening, and I think that's just gonna continue. And I don't think it's going to fundamentally change the math. The math is kind of the math. If you have this much compute and this much power, you're gonna be able to create this many tokens per second. Those are fundamental building blocks of the architecture. Where it's gonna get difficult, though, is, okay, I can generate this many tokens, but what does that demand curve look like? When are they being used, and by whom? And how do I police, a little bit, who maybe becomes a top talker and is using far more of their share? These are not infinite things. Like you mentioned, it's a scarce resource; that's the way to look at it. So that's what I'm thinking about as the next thing: how we're gonna police it a little bit and get, you know, I can't remember the Cisco term for it, fair sharing. That's what we're talking about.

SPEAKER_01

First, it's gonna come down to observability and visibility. You can't police what you can't measure, right? So first we're gonna have to measure it and understand it, and then we're gonna have to come up with the rest.

SPEAKER_00

Well, you guys have some pretty good tools for measuring and observing things. And I also think you'll see some of the tokenomics where you're saying, hey, I've got some tokens I can run on-prem with these open models that are really, really cost-efficient, and I have some other tokens where I'm gonna go out, and when can I use which, and what's that balance? I think you're gonna start to see that. We saw it in Jensen's keynote, where he said, hey, if I'm an AI factory operator providing tokens, 25% are gonna be free tokens for the free service, 25% are gonna be the mid tier, then the high tier, and then the ultra-valuable tokens. Just think about the types of workloads.

SPEAKER_01

Like, if you're asking simple questions, you don't need the most intelligent thing in the world. That's right. If you're doing brain surgery or something like that, you probably want something pretty smart doing it. And so that kind of tiering of intelligence, or of task, is absolutely gonna have to happen.
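
That tiering, matching task criticality to the cheapest model that is smart enough, can be sketched as a simple router. The tier names, prices, and criticality labels below are invented for illustration; they are not a real API or the pricing mentioned in the keynote:

```python
# Illustrative tier table: cheapest-adequate model per task criticality.
TIERS = {
    "free":  {"model": "small-open-model", "usd_per_1k_tokens": 0.0},
    "mid":   {"model": "midsize-model",    "usd_per_1k_tokens": 0.002},
    "high":  {"model": "frontier-model",   "usd_per_1k_tokens": 0.02},
    "ultra": {"model": "reasoning-model",  "usd_per_1k_tokens": 0.10},
}

def route(task_criticality: str) -> str:
    """Pick the cheapest tier that meets the task's required capability:
    simple questions go to the free tier, 'brain surgery' goes to ultra."""
    order = {"low": "free", "normal": "mid", "high": "high", "critical": "ultra"}
    return order.get(task_criticality, "mid")   # default to the mid tier

def cost(tier: str, tokens: int) -> float:
    """Dollar cost of serving `tokens` tokens from the given tier."""
    return TIERS[tier]["usd_per_1k_tokens"] * tokens / 1000
```

The point of the sketch is the shape of the policy, not the numbers: routing by task keeps the scarce high-intelligence tokens for the work that actually needs them.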

SPEAKER_00

Yeah, and Jensen was talking about it from the supply side, from the AI factory provider. There's also the consumption side. If you look at an enterprise: hey, how do we do that? Let's constrain 25% to come from this on-prem tier with these models, we'll have another tier over there, and you can dial those up and down over time. Somebody who's overconsuming is gonna find their token rate start to slow down, because they're gonna be shifted to a slower tier. And then you've got to go to your boss and say, hey, can I have some more tokens?
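
The consumption-side behavior described here, slowing overconsumers down rather than cutting them off, amounts to a quota tracker that downgrades a user's tier as they burn through their budget. This is a minimal sketch of that idea; the 80% threshold and tier names are assumptions for illustration, not anything the guests specified:

```python
from collections import defaultdict

class TokenBudget:
    """Track per-user token spend against a quota; overconsumers are
    shifted to a slower tier instead of being hard-blocked."""
    def __init__(self, monthly_quota: int):
        self.quota = monthly_quota
        self.used: defaultdict[str, int] = defaultdict(int)

    def record(self, user: str, tokens: int) -> None:
        self.used[user] += tokens

    def tier(self, user: str) -> str:
        frac = self.used[user] / self.quota
        if frac < 0.8:
            return "fast"           # within budget: full-speed tier
        if frac < 1.0:
            return "slow"           # nearing quota: throttled tier
        return "ask-your-boss"      # over quota: needs a bigger allocation
```

Serving infrastructure would consult `tier()` on each request to decide which pool handles it, which gives the gradual "your tokens get slower" experience rather than a hard stop.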

SPEAKER_03

Yeah, so I guess employees, or prospective employees, are gonna be saying, I want to work at a place that's gonna give me a boatload of tokens. Yeah.

SPEAKER_00

No, we see that all the time at NVIDIA. I mean, we spend, I don't want to say how many billions, but it's definitely billions. A fair amount. A fair amount. We'll bleep that part out. Yeah. No, it's really important for our R&D, for our researchers, the people that are building Nemotron and all of these models. They have to have access to massive amounts of compute in order to do their life's work. And so we see that happening. And the good thing is, Jensen understands it and he writes the big checks. He writes the big checks. So yeah.

SPEAKER_02

Yeah, I was thinking about this yesterday, or, well, Monday, when Jensen made that comment. This comes in waves, if you think about it. Do you remember when it was, well, what kind of laptop am I gonna get if I take this job? That was a big deal. Yeah, right. People didn't want to be stuck with some stodgy old laptop. They wanted state of the art. And then it was, well, what's your remote work policy? Because I don't know if I want to come into the office every day. Now imagine that conversation next year: well, what kind of AI tools do I have access to? What's my token budget? It's a fascinating evolution, I think, that's gonna happen.

SPEAKER_01

And they can be, you know, in the top tier of efficiency when they have more. Companies are gonna have to do that.

SPEAKER_02

Yeah, because otherwise it's gonna be tough to attract and retain people.

SPEAKER_00

Yeah, I think there's gonna be employees that really lean in and take advantage, and they will be able to absolutely shine. And they don't have to be a math or physics or engineering major. It's somebody who thinks clearly, has product ideas, and can articulate them clearly. And then there'll be other people that don't, and that's not good. You know, I'm not gonna be fly fishing. I'm gonna be working on AI, even though I'd rather be fly fishing.

Practical Steps After GTC

SPEAKER_03

We're nearing the end of the episode here, getting short on time, and the three of you have been gracious with your time. I know it's an incredibly busy week for everybody here. Neil, if this conference is seen as the vision-setter for what's to come, what are the immediate practical marching orders we need to take care of when we return to the office, either this week or next?

SPEAKER_02

Yeah, I was struck by a couple things. First of all, in our AI Proving Ground lab, we're bringing these things to life. We're experimenting with customers, trying to reduce the complexity and get people really into production. That's what it's all about. But I was thinking about this concept of the, I want to get this right, the completely vertically integrated company, first in the world, that also has open horizontal partnerships and integration. It took me like 30 seconds to get my head wrapped around it. But then when you look at what NVIDIA is doing, really partnering across all these different domains, whether it's drug research or autonomous vehicles or whatever it is, there really isn't anything like that out there. So my brain went to: we have to get really good at helping people consume those different libraries that NVIDIA puts on top of this infrastructure, so they can bring to life whatever domain they're working in with the best, most capable software for them. So that's my action item: wow, we've got to get really smart about the libraries. We already are smart about many of them, but I mean all of them. There's hundreds of them.

SPEAKER_01

I love those light-bulb moments, because, yeah, there are tons of those libraries. It's almost to the point where it's too much information for a lot of people to consume. And when I see people do demos of certain technologies, you see that light-bulb moment. You know, we collectively launched some stuff we call AI Grid, where we're bringing in telcos and they're bringing GPU services to the edge, layering them on top of the existing customers that they have. And they showed a demo of just a simple interview like this, and at the edge they were transcoding it into a bunch of different languages. And that's not something you want to bring back to a centralized data center: take all the video, use the network to push it back and forth, use tokens to process it in a bunch of places, and then push it back through the network. Throw it all the way out at the edge and do it there. And then you get that light-bulb moment: of course you would do it at the edge. That's the only place it makes sense. And so I think the more of these examples people can see and then adapt to what they want to do... I think you're talking about our digital human demo.

SPEAKER_02

We had it running right over here, Kevin. Part of the use case was: how do you bring a concierge experience that's multilingual, text-to-speech, speech-to-text, where people can just interact with it, and it's also smart because there's an LLM under the hood? So that's exactly what you're talking about. And you're exactly right, we think of it the same way. The chances of all that needing to come back to the data center... you can't home-run all that traffic. You're gonna need edge compute to bring that to real life, because our customers are not gonna have one of those, they're gonna have thousands of those, right?

SPEAKER_00

Yeah, so I'll say for enterprise: they should get started now. Claws up, claws up. You know, the fourth inflection point is agentic AI. It is so easy to use these things. We had a Claude hackfest in the park here. People were running on the little DGX Sparks, so, Sparks in the park. And you can run on the DGX workstation, you can run on RTX Pro 6000, you can run on the big iron, so all of those are possible. It is crazy to watch people "program," in air quotes. These are people that don't know how to program. They're literally talking into their phone saying, hey, can you bring up this robot here? Can you go read the manual? There have been things I've been wanting to do, but I just don't want to learn how to use a CNC machine. I'm just gonna say to my agent, go read the manual, I want to build this thing, and it's gonna connect to the device over Wi-Fi or something and do all that for you. Exactly. So the productivity gains we're gonna see over the next decade... Get started now. There is ROI. Jensen says let a thousand flowers bloom. Every company in the world should be embracing AI, and specifically agentic AI. We've got great people that can help make it secure and safe.

Get Moving Now And Closing

SPEAKER_03

It's an exciting future for sure. Just real quick, Kevin, I'm sure we'll be catching up at Cisco Live in a few short months. What are some checkpoints we should be thinking about that would indicate we're on the right track in that direction?

SPEAKER_01

Yeah, I'm looking forward to doing this again. These are always really fun for me, so I can't wait till we do it at Cisco Live. Bring the claws, too. Bring the claws, or maybe we'll have something new that we can wear. There you go. I don't know. For me, it goes back to Kevin's point: let's just get started. If you're running into problems, you've got great partners like WWT. You've got resources, you've got something falling down from the ceiling here, but you've got resources and other things you can leverage from NVIDIA, from Cisco, from great partners like WWT. So let us help you get started. Let's use some of these examples. And with the pace of innovation and the speed everything's moving, we have to help people move. If you get too far behind, I think people are gonna find it hard to catch up. So just get moving now is the best advice. Buckle up.

SPEAKER_03

Let's go. Buckle up. Well, Kevin, Neil, Kevin, as always, you guys are fantastic. You make it easy to do this type of interview. Awesome context, awesome insight. We'll see you in a few months at Cisco Live. And thank you again for the time. I know, like you said, it's meeting after meeting here, so I know time is scarce.

SPEAKER_02

Excellent. Thank you. Thank you, Brian. Appreciate it.

SPEAKER_03

Okay, thanks to the Kevins and Neil for joining. It's clear from our conversation that in this market, speed matters, but scalable execution matters more. The lesson is simple: in enterprise AI, experiments are fine, but the ones who operationalize responsibly at scale will be the real winners. This episode of the AI Proving Ground Podcast was co-produced by Naz Baker, Kara Kuhn, Sarah Chiadini, and Addison Inglert. Our audio and video engineer is John Knoblock. My name is Brian Felt. Thanks for listening. See you next time.
