AI Proving Ground Podcast: Exploring Artificial Intelligence & Enterprise AI with World Wide Technology

AI Factories: Built or Broken? With Cisco's Nicholas Sagnes

World Wide Technology: Artificial Intelligence Experts Season 1 Episode 52

Cisco's Nicholas Sagnes and World Wide Technology's Bob Watson outline why the next generation of AI infrastructure must be engineered as a secure, observable, full-stack system — and why enterprises relying on pilots risk falling behind.

The AI Proving Ground Podcast leverages the deep AI technical and business expertise from within World Wide Technology's one-of-a-kind AI Proving Ground, which provides unrivaled access to the world's leading AI technologies. This unique lab environment accelerates your ability to learn about, test, train and implement AI solutions.

Learn more about WWT's AI Proving Ground.

The AI Proving Ground is a composable lab environment that features the latest high-performance infrastructure and reference architectures from the world's leading AI companies, such as NVIDIA, Cisco, Dell, F5, AMD, Intel and others.

Developed within our Advanced Technology Center (ATC), this one-of-a-kind lab environment empowers IT teams to evaluate and test AI infrastructure, software and solutions for efficacy, scalability and flexibility — all under one roof. The AI Proving Ground provides visibility into data flows across the entire development pipeline, enabling more informed decision-making while safeguarding production environments.

SPEAKER_02:

From World Wide Technology, this is the AI Proving Ground Podcast. Over the last year, one phrase has surged across boardrooms, engineering stand-ups, and budget meetings: the AI factory. It's everywhere, and like most ideas that spread quickly in tech, it's already at risk of being misunderstood. For some, an AI factory is just another way of saying infrastructure for AI. For others, it's a marketing term, another label in a crowded landscape of architectures, reference designs, and blueprints. But underneath the hype is something very real. Organizations are running into the same wall: they can experiment with AI, they can pilot AI, but scaling it, actually operationalizing it and turning it into something that touches customers, employees, and real workflows, is where everything tends to break down. So on today's show, we're talking with two people who spend their days helping organizations break through that wall. We'll have Bob Watson, a principal solutions architect here at WWT who has helped design and deploy AI infrastructure for some of the world's largest organizations, and Nicholas Sagnes, Cisco's product marketing leader for AI infrastructure and the Cisco Secure AI Factory, who brings a front-row view into how enterprises are wrestling with the practical realities of security, networking, and day-two operations as AI moves from pilot to production. Together, they'll help us unpack what an AI factory really is, why security can't be bolted on, how networking quietly becomes the silent bottleneck, and why readiness may be the single biggest differentiator for enterprises heading into 2026. So stay with us. This is the AI Proving Ground Podcast, everything AI, all in one place. Let's jump in. Okay, Bob, Nicholas, thank you so much for joining me here today on the AI Proving Ground Podcast. How are you? I'm very well. Thanks for having me. Excellent. Bob, how are you? Doing great. Happy to be here. Awesome. We're here to talk about the AI factory, and Nicholas, I'm going to start with you. It feels like everyone these days has their own flavor of AI factory. It's becoming a buzzword; some might say it's at risk of losing its meaning. From where you sit, from Cisco's vantage point, what is an AI factory, and maybe what is it not?

SPEAKER_00:

All right, so we're diving right in. That's good. You'll hear "AI factory" everywhere now because of NVIDIA. NVIDIA really understands the core problem with AI in general, and they came up with that framework, a blueprint of the architecture you need to deploy any type of AI infrastructure to support AI workloads, get to the value of AI applications, and expand the benefits of AI. That's why you need an AI factory. At Cisco, listening to our customers, basically the wide enterprise base, but also to ourselves deploying our own AI, we really saw the issues around security and the network, and the challenge of making it a viable system in an existing enterprise, taking every single element of that stack holistically, as a system. So we built a Secure AI Factory with NVIDIA. That word "secure" is key, because what we saw is that a lot of enterprises could not get from pilot to production, or at least not as quickly as they should, wasting a lot of time and a lot of capex. So building in the fabric, the data plane, the data ecosystem, the compute, and security at every layer of the stack from the ground up is key. That's the approach we took: engineer that full-stack infrastructure following NVIDIA reference designs and reference architectures, but designing it ourselves. We have a process called Cisco CVDs, Cisco Validated Designs, where we re-engineer the entire AI stack, hardware, networking, storage, partnering with ecosystem partners, and then embed the software stack on top. NVIDIA AI Enterprise, of course, is the main one there, to be able to develop, train, optimize, and accelerate workloads. You also have a host of orchestration tools like Run:ai, which is great for slicing GPUs and allocating them dynamically, to expand and optimize utilization. And then you have the container layer, which is very important in order to efficiently direct workloads and control them without impacting performance, and, again, to optimize those GPU resources.
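Nicholas mentions Run:ai slicing GPUs and allocating them dynamically through the container layer. To make that concrete, here is a minimal sketch of how a Kubernetes pod requests GPU resources for an AI workload. The nvidia.com/gpu resource is the standard NVIDIA device-plugin name; the fractional-GPU annotation is a hypothetical stand-in for Run:ai-style fractioning, not a verified Run:ai key.

```python
# A minimal sketch of a Kubernetes pod manifest for a GPU workload.
# Requires PyYAML (pip install pyyaml).
import yaml

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "inference-worker",
        # Hypothetical annotation: real Run:ai fractioning is configured
        # through its own scheduler and may use different keys entirely.
        "annotations": {"example.com/gpu-fraction": "0.5"},
    },
    "spec": {
        "containers": [{
            "name": "llm-serving",
            "image": "registry.example.com/llm-serving:latest",  # placeholder image
            "resources": {
                # Whole-GPU request via the standard NVIDIA device plugin.
                "limits": {"nvidia.com/gpu": 1},
            },
        }],
    },
}

print(yaml.safe_dump(pod, sort_keys=False))
```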

SPEAKER_02:

So yeah. Bob, that was a comprehensive covering of what an AI factory is from the Cisco point of view. Before we unpack a lot of that, why is this term gaining so much traction right now? Is it just that the time is right and the technology is right, or is it capturing some of the momentum we've had around generative AI over the last four, five, six years?

SPEAKER_01:

So that's a good question. I think in a lot of respects, what NVIDIA did when they coined the term AI factory was create an analogy that's relatable to the masses. Everybody understands the concept of a factory: raw materials come in, and finished goods come out. That's the basic premise of a factory. In the case of an AI factory, your raw material is data, the information contained within large language models and the information contained within your enterprise organization or your research university. There's information a lot of these models are trained on that's in the public domain, and then there's an equally large amount of information in the private domain that these models haven't been trained on. So you take both data sets, analyze and process that information through a factory-like workflow, and on the other side you have use cases: agentic workflows where we're giving these systems agency to work on our behalf based on the data collected, the questions queried, or the intent of what the business wants to achieve. So if we think about an AI factory, the raw materials coming in are data, both private and public, and the output typically comes in the form of tokens. Good tokens give you great answers, and tokens that aren't so good lead to things like hallucinations or delays. Garbage in, garbage out. Exactly. So when NVIDIA says AI factory, it gives a very relatable context for understanding what's going on in a very complicated environment. And the complexity of that environment, like Nicholas said, is that we have so many different layers in the architecture: the trinity of compute, network, and storage; an orchestration and workload layer; and the applications that sit on top, which contain things like large language models as well. Then you have security, because we want to make sure the information is protected, not exfiltrated, corrupted, or abused in any fashion. And in our opinion, as a partner to both Cisco and NVIDIA, Cisco brings a unique perspective to this factory when it comes to things like security. Cisco has a very strong security presence in the market, as well as observability and day-two operations. Their acquisition of Splunk a few years back gives them insight and visibility into the full stack of everything going on within a factory, whether that's visibility for data scientists, C-levels, infrastructure managers, or security personnel looking at insights and incidents. And of course, we can't leave out networking. Cisco pretty much paved the way to ubiquitous access to data everywhere through the strength of its network.

SPEAKER_02:

I'm glad you brought up security, because that's where I was going to go, Nicholas. We certainly see what's going on in the market right now, with AI factory starting to balloon into a buzzword, but Cisco is one of, if not the only one, putting "secure" next to that name, AI Factory. Tell me why that's important, and why that is such a strong footing for Cisco moving forward.

SPEAKER_00:

Yeah, and I think it's a very important point. When enterprises get into AI and decide to buy GPUs, most of the time they're going to be successful in some capacity, in some islands, but they'll hit a lot of roadblocks and bottlenecks very quickly: on the networking side, on governance, security, observability, not being able to share the system across the organization, or having AI models that conflict directly with the security policies that are in place, heterogeneous systems distributed worldwide. These are all problems enterprises face day to day, very familiar to IT and mission-critical leads. And now that AI is coming into this infrastructure, the question becomes: how can I, as an IT data center manager or network manager, implement that GPU computing capacity, make it available to basically all my working groups, and make sure they can use it securely, so it's not going to crash my network or introduce security vulnerabilities? That's why we took the approach of full-stack security. If I start from the top, at the application layer, we developed AI Defense, which is model validation and model control software. When you leverage open source models, you can apply your policies across the board, everywhere. You can see how the model behaves, what kind of data it's fetching, where it's fetching it from, and what data it's sharing, and you can put guardrails in place automatically. That's what we use ourselves: for every Cisco employee today who goes into our own gen AI tool, everything they upload is scanned through AI Defense, and everything is watermarked in the database. So there you already have some components of a system, and that's just the top layer. It works better with a system that integrates security at the container layer. So we acquired Isovalent, the company that created the Cilium open source project and has been a major contributor to eBPF, which basically gives you the ability to see inside container traffic inline, in the network.

SPEAKER_01:

I won't go too deep technically into how that works, but Cisco's approach to securing the AI factory comes in many different layers, as Nicholas was saying. There's the network layer, where Cisco has a very deep security history with things like next-generation firewalls and understanding segmentation strategies and zero-trust architectures within a customer environment, and the ability to extend that into the container workload environment with Isovalent. Our partnership with Red Hat, with OpenShift as the Kubernetes container workload management system, coupled with the security presence of Cisco Isovalent, lets you look at the container networking side and ask: how have we been securing, isolating, and creating guardrails and controls around our traffic flows in a traditional sense, and how do I apply that to the world of AI and AI workloads? As OpenShift spins up these containers, Isovalent provides the network interface within the container networking space that enforces security policies, segmentation, things of that nature. So all the way through the stack, Cisco is thinking deeply about everything from containers and workloads to protecting the AI applications themselves with something like AI Defense. Whether they're on-prem models or cloud models, AI Defense works ubiquitously; it will work with ChatGPT as well as private models like Llama 3, and so forth. We've done early testing and implementation of the on-prem version of AI Defense, and we found that a lot of these open source models have a lot of vulnerabilities right out of the gate that bad actors can take advantage of. So having a first line of defense before your user base, or your agentic applications, gets access to the data in your environment is critical, because these models don't necessarily secure themselves or prevent bad actors from finding and exposing information that really shouldn't be exposed. Guardrails, data loss prevention, inspecting these models to make sure the inherent vulnerabilities are addressed, whether they're commercial API-access models like ChatGPT or private models. It's a very thoughtful way to look across the whole stack, from infrastructure to workload to applications, and have a security strategy that addresses all the different fields. And as a Cisco partner, we feel that's a little bit unique among all of NVIDIA's partners doing AI factory solutions. Cisco has a very strong security business unit, and they look at it holistically. Some of NVIDIA's partners have native storage and native compute but maybe not native networking. Cisco doesn't necessarily have native storage, but they have most of the other pillars covered, from compute to networking to security and observability. So in some sense customers are concerned about vendor lock-in, but the corollary is that the more you have aligned under a blue-chip OEM, the less friction and the more you can accelerate your speed to production and adoption.
So these are the trade-offs that customers are having conversations with Worldwide about.
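To illustrate the container-layer segmentation Bob and Nicholas describe, here is a minimal sketch of a Cilium-style network policy that pins an inference pod's egress to a single approved service. The CRD shape follows the public CiliumNetworkPolicy format as best understood here; the labels and port are illustrative assumptions, not values from any Cisco validated design.

```python
# A minimal sketch of a CiliumNetworkPolicy manifest built in Python.
# Requires PyYAML (pip install pyyaml).
import yaml

policy = {
    "apiVersion": "cilium.io/v2",
    "kind": "CiliumNetworkPolicy",
    "metadata": {"name": "restrict-inference-egress"},
    "spec": {
        # Applies to pods labeled as inference workloads (assumed label).
        "endpointSelector": {"matchLabels": {"app": "llm-inference"}},
        "egress": [{
            # Allow traffic only toward the vector-DB pods (assumed label/port).
            "toEndpoints": [{"matchLabels": {"app": "vector-db"}}],
            "toPorts": [{"ports": [{"port": "8080", "protocol": "TCP"}]}],
        }],
    },
}

# In practice this would be applied with kubectl or a Kubernetes client;
# here we just render it for inspection.
print(yaml.safe_dump(policy, sort_keys=False))
```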

SPEAKER_00:

Yeah. And on the security side, beyond that container orchestration and application layer, we also went deep on the hardware side. We're the only OEM today to integrate NVIDIA chips into our switches. Between our Silicon One program and the Spectrum partnership with NVIDIA, we develop smart switches, and that is part of the hybrid mesh firewall reference architecture, which distributes security across the stack and across the network. We've also had recent innovation in switches that embed DPUs, so you get layer-four security solved out of the box and eliminate that extra bottleneck of the firewall: you can control your AI workloads and secure them inline without impacting performance. And that's really huge. But again, if you look at that security as an island, it's good, but it's not going to fundamentally change the problems you're going to have. You need a full, integrated system, and the way we solve that, speaking of lifecycle, is Intersight. Intersight is really key. When we met with customers recently, a few of them in Austin, they were all sharing their experience with Cisco Intersight and how lifecycle is critical: the ability to control compute, check utilization, and see how the compute is leveraging, or clogging, the network with our Nexus Dashboard. All that data you get from your infrastructure, being able to compile it and observe it at every level, creates a system you can act on, one that can remove all these AI hurdles. And Splunk, you mentioned Splunk earlier: Splunk is kind of the end game. You can observe your full infrastructure at every layer, bring in all kinds of data and correlate it, even power and cooling. How much is my model going to impact my power and cooling infrastructure? With Splunk, now you can see it.
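As a rough illustration of how infrastructure telemetry from tools like Intersight or Nexus Dashboard could land in Splunk for this kind of correlation, here is a minimal sketch using Splunk's HTTP Event Collector (HEC). The endpoint path and "Splunk <token>" authorization header are standard HEC conventions; the host, token, and payload fields are placeholders.

```python
# A minimal sketch of forwarding one infrastructure metric to Splunk HEC.
import json
import urllib.request

SPLUNK_HEC_URL = "https://splunk.example.com:8088/services/collector/event"
SPLUNK_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

event = {
    "sourcetype": "ai_factory:infra",   # illustrative sourcetype
    "event": {
        "device": "leaf-101",           # illustrative switch name
        "metric": "port_utilization_pct",
        "value": 87.5,                  # illustrative sample
    },
}

req = urllib.request.Request(
    SPLUNK_HEC_URL,
    data=json.dumps(event).encode("utf-8"),
    headers={
        "Authorization": f"Splunk {SPLUNK_TOKEN}",
        "Content-Type": "application/json",
    },
)

# Sends a single event; real pipelines batch and retry.
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```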

SPEAKER_02:

Well, we've focused a lot so far on security, and security is certainly a differentiator here for Cisco, but there are a lot of other things to get into: networking, observability. We haven't even talked about AI PODs yet, or dug too deeply into the reference architectures. Nicholas, maybe let's start with networking. Can you tell me where we're seeing AI networking act as a bottleneck, and how Cisco's Secure AI Factory is helping relieve some of that congestion?

SPEAKER_00:

Yeah, congestion is definitely one of the recurring challenges on the network: the ability to have a network that can handle all these heavy packet flows, all that east-west traffic, back-end fabrics that are well dimensioned to support the GPU traffic, and front-end fabrics that embed all that security. With that in mind, we basically have two approaches. First, if you have the competence, the network expertise on staff, you'll mostly want to go with an on-prem management structure. That's what Nexus Dashboard is for, using Nexus 9000 switches. You can go deep and have complete control over what transacts through the network, a good understanding of the AI workflows, and the ability to control congestion and redirect traffic dynamically. But if you don't have that expertise, and some enterprises really don't, because they had a cloud-first strategy or they use managed service providers, we came up with a concept called Hyperfabric, and Hyperfabric AI. It's a turnkey infrastructure solution that is cloud managed: you deploy switches in your data center that receive their management and control directly from a cloud instance, and you can apply services to it. That's where the partnership with World Wide Technology is great, because we can have remote hands in that infrastructure and get much the same benefits without having to run it all on-prem. So those are the two philosophies for optimizing the AI fabric. And you mentioned AI PODs a little bit. AI PODs answer the question: great, we have all these elements (we haven't talked about data yet, but we have all these elements), so how do I deploy them? How do I remove the guesswork? I don't want to have to design my cluster and optimize it for maximum performance myself. For that, you go toward AI PODs; that's really the way to deploy it. We took all these reference architectures and reference designs, tested and validated them, and deployed them ourselves in our own data centers at Cisco, so we know they work out of the box. You just choose your density, choose your use case and application, and you'll have your pre-optimized fabric with on-prem or cloud-managed options. You'll have your compute: you choose your compute density, what kind of GPU for what kind of application. The full breadth of solutions is there as well.

SPEAKER_03:

Yeah.
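A rough sketch of the kind of congestion check Nicholas describes on the fabric side: sample an interface's byte counters twice, turn the delta into a utilization percentage, and flag hot ports. The counter-fetch function is a hypothetical stand-in for whatever telemetry source is actually in use (SNMP ifHCInOctets, gNMI, or a dashboard API), not a real library call.

```python
# A minimal sketch of port-utilization-based congestion flagging.
import time

LINK_SPEED_BPS = 400e9  # a 400 Gb/s leaf-to-GPU link (illustrative)

def get_octets(port: str) -> int:
    """Hypothetical counter fetch; replace with your telemetry source."""
    raise NotImplementedError

def utilization_pct(port: str, interval_s: float = 5.0) -> float:
    # Two samples of the byte counter, converted to bits per second.
    before = get_octets(port)
    time.sleep(interval_s)
    after = get_octets(port)
    bits = (after - before) * 8
    return bits / (LINK_SPEED_BPS * interval_s) * 100.0

def flag_congestion(ports, threshold_pct: float = 85.0):
    """Return ports running hotter than the threshold."""
    return [p for p in ports if utilization_pct(p) > threshold_pct]
```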

SPEAKER_00:

And then there's the data, which gives you the choice of what kind of data platform to adopt for your AI application, because data is another bottleneck. You cannot be successful in AI if your data is not accelerated, distributed, and available everywhere. Whether you have edge applications or you're training a model, you need those data sets ready and fed to the GPUs. That's why we ramped up a very strong partnership with VAST Data, which has been tremendous in advancing our roadmap as well. We qualified systems together on the Cisco UCS C225 running the VAST Data platform, and then went a step beyond, embedding VAST InsightEngine and the VAST AI OS into our compute. So you basically have a turnkey, RAG-ready platform out of the box. That's really the concept, and it's helpful for everything from heavy training to RAG and inferencing in the data center. But there's also the edge, and the edge is really the next frontier for the Secure AI Factory, coming out of our announcement last week of Cisco Unified Edge. Now you can run any type of edge workload and accelerate it locally, with virtually no latency on your AI application at the edge, and you'll have that data fabric with Nutanix, which will be tremendous in accelerating those workloads and making sure your investment is safe and sound, always with that security. Security is always there in the background, by default. So that's the philosophy: see it as a system. Don't design pockets of infrastructure; design systems across the board.
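To ground the idea of a "turnkey RAG-ready platform," here is a minimal, self-contained sketch of the retrieval step in a RAG pipeline: embed the query, score it against stored document embeddings, and hand the top matches to the model as context. The embed function is a placeholder for whatever embedding service the data platform actually exposes.

```python
# A minimal sketch of RAG retrieval with cosine similarity.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; swap in a real embedding model or service."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)   # unit-normalize for cosine similarity

# Illustrative document chunks and their precomputed embeddings.
docs = ["Q3 supply-chain report", "GPU cluster runbook", "HR handbook"]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2):
    q = embed(query)
    scores = doc_vecs @ q                 # dot product == cosine (unit vectors)
    top = np.argsort(scores)[::-1][:k]    # indices of the k best matches
    return [docs[i] for i in top]

# These chunks would be prepended to the LLM prompt as grounding context.
print(retrieve("How do I drain a GPU node?"))
```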

SPEAKER_02:

Bob, let's pick your path here. We've touched networking, we've touched AI PODs, we've touched data, and then we touched on edge, which I didn't even know we were going to get into as the next frontier. I'm going to give you a choose-your-own-adventure: where do you want to go from here, and I'll ask you a question.

SPEAKER_01:

So, having an understanding of the full stack and all the different components of an AI factory we've talked about, maybe we can dig in a little bit here. I think there's a lot of confusion in the market around reference architectures and validated designs, and around why customers are attracted to a prescriptive approach that de-risks a lot of the very complex infrastructure decisions that need to be made. Now more than ever, having NVIDIA set North Star blueprints with its reference architectures, and having NVIDIA partners like Cisco create validated designs, helps customers who have historically never had to build an AI factory, who have never had to scope and understand the enormity of network load these systems create. We touched a little on east-west networks. Traditionally, in an enterprise, higher-ed data center, or colo environment, you've never had purpose-built, high-speed east-west networks doing collective communications like this. In collaboration with Cisco, we're building out 400 and 800 gig networks; these backplanes are multi-terabit in nature. There are new architecture patterns, like rail-optimized and rail-only designs, that allow a very large cluster of GPUs to communicate with each other. These are all new frontiers for a lot of our customers, and they don't know what they don't know. These validated designs and reference architectures de-risk that. Come to Cisco, come to Worldwide: you've done a lot of these, show us your best practices, show us how we get our tokens to production from a design and ideation standpoint.
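Some back-of-the-envelope arithmetic shows why these east-west fabrics get so large so fast. In a rail-optimized design, each GPU typically gets its own NIC, or rail; the figures below are illustrative, not taken from any specific validated design.

```python
# Rough east-west bandwidth math for a rail-optimized GPU cluster.
GPUS_PER_NODE = 8          # typical HGX-class node (illustrative)
RAIL_SPEED_GBPS = 400      # one 400G NIC per GPU; 800G would double this
NODES = 32                 # a modest training cluster (illustrative)

per_node_gbps = GPUS_PER_NODE * RAIL_SPEED_GBPS   # 3,200 Gb/s per node
cluster_gbps = per_node_gbps * NODES              # 102,400 Gb/s aggregate

print(f"Per node: {per_node_gbps / 1000:.1f} Tb/s, "
      f"cluster east-west: {cluster_gbps / 1000:.1f} Tb/s")
```

Even this small cluster lands in the hundred-terabit range of aggregate east-west capacity, which is why Bob calls these backplanes multi-terabit in nature.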

SPEAKER_02:

Just to play a little devil's advocate, Bob, and I'll stick with you here. It 100% feels like people want that prescriptive approach, the reference architecture, so they can move faster and get it over the finish line. Absolutely. But things are changing so quickly. Do you think there's some fear out there that if I commit to this reference architecture right now, something might change and pop up in the next week, month, whatever it might be? I'm assuming that's where we start to say integration and an ecosystem come in. How do you combat that?

SPEAKER_01:

No, you're absolutely right. The pace of change in this field has accelerated, with new innovations, new capabilities, and new demands on your systems like never before. Last year our customers were buying Hopper; this year they're cutting POs for Blackwell chips; in the next two years we're going to be looking at Rubin chips from NVIDIA. So you make a very good point, and we've struggled with this, and Cisco has, along with our customers. A couple of things allow us not to lose our minds when we take these approaches, starting with understanding the difference between a reference architecture and a validated design. A reference architecture is just that: a reference for how to compose these different systems, and what balance of network, compute, and storage makes the most sense for your use cases to create an output. And a reference architecture is only good if it allows you to scale and move as your workloads evolve and become more demanding. You can adopt a reference architecture, stay within that same architecture, and start changing the speeds and feeds: your generation of GPUs, your port speeds; you can evolve from 400 gig networking to 800 gig networking. Just because you're moving the performance envelope of specific components doesn't mean you have to reinvent your architecture and reinvent your design. These references allow you to scale across generations, essentially future-proofing your roadmap until the time comes when you've fully gotten the value out of that very large investment and you're starting to do lifecycle planning for the next generation. That's when you reach the decision point: do we need to modify our architectures based on need-finding and new learnings?

SPEAKER_02:

Yeah. No, that makes sense. Well, Nicholas, Bob mentions we've never had this amount of traffic, or this speed of traffic, on east-west networks, and we're talking about the edge. I'm assuming this is where observability really comes into play as key. What types of risks or challenges arise, maybe things you wouldn't necessarily think of, when you have that observability across everything?

SPEAKER_00:

Yeah, I think different groups may have different requirements in terms of what they look for. The investment side and the line-of-business side will most likely want to see ROI: are my GPUs running correctly, are my applications running? Then you have the IT and networking side, where everything has to stay up: how can I make sure my network isn't going to be taken down by the vast amount of traffic, and that latency stays low, or near zero, at the edge or in any type of RAG inference application? So an observability platform across the board, and that's where Splunk really helps, gets all these data points from the machines themselves. From a GPU server, the ability to understand my GPU utilization, how many cores are being used, my fan speed, all that infrastructure data. And then on the networking side, port utilization, bandwidth, all those congestion metrics we talked about; that's going to be critical to correlate in the platform. Taking those bits of data together, correlating them, and creating your own dashboards with your own metrics is really the key. And you don't start from scratch: we came up with a set of dashboards that are relevant to specific use cases and applications. Whether you're managing distributed compute at the edge, with hundreds of thousands of locations that you want to bring into the same pane of glass, or you have multiple teams working from a data center cluster and you want to distribute GPU capacity to those teams, we have ways to do composability right in the machines. For example, with our UCS X-Series platform, you can basically design your server on the fly from Intersight. And once you have those virtual servers in containers, you can leverage the Run:ai software layer to extend this even further. On top of that, there are also new partnerships like Rafay, for example, for GPU-as-a-service. So there's a lot of flexibility in this reference design. As you said, the reference design is the North Star, everything predetermined and pre-designed, deployed with an AI POD, but you have the freedom and flexibility to bring your own tools. If you're using a specific MLOps platform or data platform, bring your own tools; they'll fit into a resilient, secured system across the board to make your workflow successful.
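The per-GPU data points Nicholas lists (utilization, core usage, fan speed) can be scraped on NVIDIA systems with nvidia-smi's documented query mode. A minimal sketch, assuming nvidia-smi is on the path and using its standard CSV output:

```python
# A minimal sketch of collecting GPU telemetry via nvidia-smi.
import subprocess

# All of these are documented nvidia-smi query fields; availability
# of fan.speed varies by GPU model.
FIELDS = "index,utilization.gpu,temperature.gpu,fan.speed"

def gpu_telemetry():
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in out.strip().splitlines():
        idx, util, temp, fan = [c.strip() for c in line.split(",")]
        rows.append({"gpu": idx, "util_pct": util,
                     "temp_c": temp, "fan_pct": fan})
    # Ready to forward to an observability pipeline (e.g., Splunk HEC).
    return rows
```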

SPEAKER_01:

Yeah. Nicholas, one of the things we've noticed around observability is that historically we've tied management and monitoring to specific domains. Meaning: if you build a network fabric, in Cisco's case with Nexus Dashboard or a Hyperfabric cloud interface; or you build a compute complex and manage it with Cisco Intersight; or you're looking at security products through Cisco Security Cloud Control. Having domain-specific element managers is phenomenal. Monitoring takes a bit more nuance. What I mean is that all of these element managers can collect and receive information about the health of their specific domain. But when we look at the larger problem around an AI application, and the different personas involved in AI strategies, we have users consuming the applications, ML ops and data science teams building applications, platform teams creating workflow orchestration and workload management solutions, and the infrastructure domains. If you have a problem in the system, how do you know where to go? This is where monitoring moves a little farther away from traditional domain management. If you have an aggregation platform like Splunk, you can correlate and get to the right answers faster. Is it a problem in the application? In security? In any part of your infrastructure? If you have aggregated data coming in and you're applying AI systems, which Cisco develops, there are AI agents within the Splunk environment, working with AI Canvas, that allow cross-context understanding and speed to resolution. And it gets into predictive analytics, where you see early signals on the wire and can take the correct preventative or maintenance measures before it becomes user-affecting.
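A toy sketch of the cross-domain correlation Bob describes: events from separate element managers are merged onto one timeline so an application-level symptom can be lined up with infrastructure signals in the same window. The events below are invented for illustration; a platform like Splunk does this at scale with real ingestion and far richer analytics.

```python
# A minimal sketch of time-window correlation across domains.
from datetime import datetime, timedelta

events = [
    {"t": datetime(2025, 11, 18, 10, 0, 2), "domain": "network",
     "msg": "congestion marks spiking on leaf-101"},
    {"t": datetime(2025, 11, 18, 10, 0, 4), "domain": "compute",
     "msg": "GPU node gpu-07 collective-comms timeout"},
    {"t": datetime(2025, 11, 18, 11, 30, 0), "domain": "security",
     "msg": "policy update pushed"},
]

def correlate(symptom_time: datetime, window_s: int = 30):
    """Return events from any domain within +/- window_s of a symptom."""
    w = timedelta(seconds=window_s)
    return [e for e in events if abs(e["t"] - symptom_time) <= w]

# The compute symptom lines up with a network signal; security does not.
for e in correlate(datetime(2025, 11, 18, 10, 0, 4)):
    print(e["domain"], "-", e["msg"])
```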

SPEAKER_02:

So do you think organizations out there in the real world are sleeping on observability as one of the key ingredients to future AI success? Or does everybody generally say, yeah, that's exactly what I want, I'm just struggling to actually get there?

SPEAKER_01:

It's been an age-old demand and problem statement in technology for many, many years. And the reason I say that is we've had observability. Splunk's been around for a long time, and so are other platforms like Grafana, Datadog, Logstash, things of that nature. The problem in the past has been the volume of the data at hand and the velocity of the data. It's like finding a needle in a haystack.

SPEAKER_03:

Yeah.

SPEAKER_01:

And so historically, these monitoring solutions were boiling the ocean. They were effective and they were required, but over time they came to be seen as more of a compliance product. Today, with AI disrupting every field, finding that needle in a haystack is no longer on the onus of a human; it's on AI. And that changes the game.

SPEAKER_00:

I'm glad you're mentioning this.

SPEAKER_01:

Big unlock.

SPEAKER_00:

I'm glad you're mentioning this, because you can deploy Splunk on an AI POD specifically dedicated to learning your environment. With the right credentials, that Splunk pod, basically Splunk running on an AI POD for your infrastructure, will take in all that data and get you the insights. You'll have specific agentic workflows dedicated to your observability. And that's possible today.
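The "AI finds the needle" idea Bob and Nicholas are circling has a very simple statistical core: score each new sample against the recent baseline and surface only the outliers, rather than asking a human to read every log line. A minimal sketch with illustrative data; production platforms use far richer models than a z-score:

```python
# A minimal sketch of needle-in-a-haystack anomaly flagging.
import statistics

def is_anomaly(history, sample, z_threshold: float = 3.0) -> bool:
    """Flag a sample more than z_threshold deviations from the baseline."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history) or 1e-9  # avoid division by zero
    z = (sample - mean) / stdev
    return abs(z) >= z_threshold

# Illustrative baseline of inference latencies in milliseconds.
latency_ms = [12, 13, 11, 12, 14, 13, 12, 11, 13, 12]

print(is_anomaly(latency_ms, 13))   # False: within the normal baseline
print(is_anomaly(latency_ms, 95))   # True: the needle in the haystack
```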

SPEAKER_02:

Yeah. We're coming up on the bottom of the episode, and you two have been super insightful; I very much appreciate that. We're offering up a bit of a roadmap here on how to best position an organization to move forward with AI. But we still hear about pilots stalling and projects not making it into production. Bob, where do you see that gap right now? What are some early signals that would show you an organization is going to succeed, or that a company is going to struggle?

SPEAKER_01:

In a lot of our engagements at Worldwide, we see slowness of movement not because there isn't an appetite to transform the environment and make it AI-ready or AI-enabled. It's more about what it takes to roll up our sleeves and understand our data, our environment, our current processes, our technology silos and dependencies, and to figure out how much of this we're reworking and how much of our data is ready. It's all the complexities of doing business, and then reorienting your business around a whole new landscape of capabilities. That's a very challenging endeavor for most customers to work through. And then when we think about moving from pilot to production, the jump from an MVP, a science experiment, to providing business value at scale is very significant. In our experience, a lot of customers aren't going to invest tens or hundreds of millions of dollars into a science experiment or an MVP. Sure. They want to start small and prove out the business case and use cases; getting a six-rack environment just to prove out a use case is not a natural motion for a lot of these customers. So what we often find is they start with their science experiments in the cloud. Then, when they have to look at what it means to provide value at scale through their AI use cases, they start asking: do I rent and continue my cloud path, or do I own? I have a data center, I have racks, I have infrastructure teams that understand compute, storage, and network, and I have platform teams that understand container management. So what are the synergies of, and the trade-offs between, continuing to rent versus owning? Some of those decision matrices involve where your data is, how sensitive it is, how reliable and critical access to that information is, and how much redundancy is built in. When you're renting, do you control your own fate around all of these decisions, or are you actually outsourcing some of that to your providers? Whereas if I own my factory, I have full control over its fate, because it's my asset and I can take it in the direction the business needs.

SPEAKER_02:

A lot of what you're getting at speaks to AI readiness in general. Nicholas, I was reading, I think it was the Cisco AI Readiness report, released a few weeks or maybe a couple of months ago, and it framed AI readiness as one of the ultimate differentiators for organizations. Readiness feels like a bit of a subjective term. From your vantage point, what do you think AI readiness means as we wrap up 2025 and head into 2026?

SPEAKER_00:

Yeah. I think it comes down to this: at the end of the day, we're going to have to invest. If you look at legacy enterprise systems that are siloed and, as you said, compartmentalized, sometimes that's the nature of the business, if you're under compliance, or in government or government contracting, and so on. Some complexities are inherent to the business. But there's a basic understanding that we need to upgrade the fabric. We need a fabric that is AI-ready and can handle these workloads. We need the right GPU platforms, optimized for specific workloads. There are multiple choices now, and the choices are becoming simpler: on one side, the RTX PRO server approach for the general workhorse type of GPU, and on the other, the more performant HGX, DGX, and NVL systems for really high-performance workloads at scale. Not every enterprise can deploy those. But they want the logic and best practices of those types of clusters applied to their enterprise, and that's where the Secure AI Factory concept is really important: you take those best practices, and at the click of a button, or a few buttons, you get there quicker. You gain your readiness by adopting that concept, and the Secure AI Factory gives you that roadmap, that blueprint so to speak, to get there faster. Then there's understanding your data. As you said, where is your data located? Do you have friction today in moving data around your enterprise, or egress and ingress from the cloud? What does that look like? Where does the data reside? Solving that issue means adopting solutions like VAST, which at the end of the day is the NVIDIA AI Data Platform reference design: distributing data everywhere, bringing GPUs to that data so it can be processed and reduced at scale and brought to the AI applications faster. And that's not really heavy lifting; it's a mindset change.

SPEAKER_01:

Sure. We take a slightly different stance at Worldwide around AI readiness. One of the things we advocate is meeting the customer where they are on their AI journey. All of our customers have a very large appetite; they're ready for AI disruption within their organization and their business. But the path to get there is wide, varied, and complicated. So having reference architectures and excellent partnerships with companies like Cisco and NVIDIA matters, and the same goes for storage: Cisco works very well with VAST, but they've also had a long history with NetApp and Pure. Maybe the customer is a NetApp shop; meeting them where they are might mean recognizing that the switching cost of moving to a brand-new storage environment is higher friction, and that it's lower friction to use the incumbent and figure out what's on the truck from, say, Pure or NetApp that is AI-ready today and would fully complement a Cisco factory. As a partner, we understand everything it takes to create a token in production: our partnership with NVIDIA, understanding NVAIE licensing, NIMs, and blueprints and everything that affords, Run:ai; our partnerships with structured cabling companies like Corning and Panduit, because we're talking about thousands of high-density fiber bundles within your data center environment; racks, power, power readiness, power measurement; our ability to complement the factory with Red Hat expertise from IBM, because there's a container component to it. And our ability to accelerate production operations using massive, global staging facilities that can integrate all the different OEMs, the structured cabling, the power, the racks, so that when the customer sees it on the dock, it's not 200 boxes from Cisco. It's a fully integrated, complete rack that has Panduit, APC, Vertiv, VAST, Pure, NetApp, Dell. And from a storage standpoint, customers will have their own opinions about what storage they'd like to select as well. So keeping the options open, keeping the customer satisfied, and figuring out their lowest-friction path to success is something we're always cognizant about.

SPEAKER_02:

Yeah, no, absolutely. Well, we are up against time here. To the two of you, thank you so much for taking the time. I know at the time of this recording you're both busy at Supercomputing 25, so enjoy your time at the conference, and hopefully we'll get some good insights out of there as well. Thanks again. Perfect. Thanks so much. Okay: if AI is going to matter beyond demos and pilots, it needs a home that looks and behaves like a factory. Data in, governed and secured at every step, and trustworthy results out at scale. From Nicholas, we heard how Cisco and NVIDIA are turning that idea into a secure, full-stack system where networking, security, and observability are designed in from day one, not patched in later. And from Bob, we heard why enterprises can't do this alone. They need prescriptive blueprints, validated designs, and partners who can meet them where they are: on-prem, in the cloud, or somewhere in between. The lesson is simple. AI readiness isn't a feature of a model; it's an attribute of your entire environment: your data, your networks, your security posture, and your ability to observe and adapt. This episode of the AI Proving Ground Podcast was co-produced by Nas Baker, Kara Kuhn, and Diane Devery. Our audio and video engineer is John Knoblock. My name is Brian Felt. Thanks for listening, and we'll see you next time.
