What's Up with Tech?

Building The Future: AMD's Software Strategy for the AI Revolution

Evan Kirstel

Interested in being a guest? Email us at admin@evankirstel.com

The AI revolution demands a new approach to computing infrastructure, and AMD is positioning itself at the forefront of this transformation through an ambitious software strategy that empowers developers and enterprises alike.

In this illuminating conversation, Anush Elangovan—who joined AMD through the acquisition of his company Nod.ai—shares how AMD is fundamentally rethinking its approach to software development. Rather than treating software as merely an enabler for hardware, AMD now views its software layer as a product with a 10-year lifecycle that transcends hardware generations. This philosophical shift is enabling developers to build with confidence on AMD platforms like ROCm, knowing their investments will be supported long-term.

Perhaps most striking is AMD's embrace of open source development. The company has moved to a model where both internal and external developers work from the same source code, allowing customers to modify core components just as AMD engineers can. This democratization has already yielded impressive results, with community contributors enhancing Windows support for ROCm faster than AMD might have accomplished internally. For enterprise customers, AMD is launching a Developer Cloud that offers free credits to try AMD GPUs with a seamless path from development to production.

Energy efficiency—increasingly critical as AI compute demands grow—represents another AMD advantage. Their innovative chiplet design for Instinct platforms provides granular control over power utilization, with features like CPX mode that can partition compute units and disable interconnects when not needed. Looking ahead, AMD anticipates inference workloads will dominate AI compute demands, and is designing both hardware and software to excel in this environment.

Explore these innovations and more at AMD's Advancing AI event on June 12th, featuring new hardware announcements, ROCm 7, and the Developer Cloud launch. Connect with AMD's AI team on X @AIatAMD to join the conversation about the future of artificial intelligence.

Support the show

More at https://linktr.ee/EvanKirstel

Speaker 1:

Hey everybody. I am so excited here to be chatting today with AMD, with a true innovator and industry insider from AMD. We're talking all about software strategy around AI and beyond. Anush, how are you?

Speaker 2:

Good, good. Thanks for having me, Evan.

Speaker 1:

Well, thanks for being here. I've admired your work and your team's mission for some time. I'm really excited to chat. Before that, maybe introduce yourself. Also, for those who may not be familiar with the latest and greatest at AMD, how do you describe the company these days?

Speaker 2:

Well, I'm Anush Elangovan. I was the founder and CEO of Nod.ai, which AMD acquired, and so I came to AMD about a year and a half ago. It was right at the juncture of an immense push and investment in software, and that's what brought us to AMD. Everything from the core methodologies we use to develop software to how we deliver software and how we ship software has gone through a reboot, if you will, at AMD. So what we are trying to do, and I think we're getting there, is build and ship software like a software company, but backed by the 55 years of experience that AMD has had in building world-class hardware.

Speaker 1:

That's a fascinating space. At the moment, the AI software landscape is rapidly evolving. Every day there's blockbuster news. How do you see AMD positioned on that landscape, and how do you see your competitive differentiation these days when it comes to software?

Speaker 2:

Yeah, so you know the kind of hockey-stick curve of capabilities that are being unlocked by AI, we're just at the beginning of it, right? We're getting into agentic frameworks, we're getting into AI that can help us change the way we live. And to unlock that value and to unlock the capabilities of AI, you do need a robust compute and AI infrastructure platform, and AMD has been traditionally very good at building that hardware. If you look at how AMD builds now, we've moved our hardware to a yearly cadence, so every year there's a new generation of hardware that's released. We had the MI300, the 325, and pretty soon we're going to be talking about the next generation platform in the next week. That cadence is continuing to relentlessly move forward, but we also need customers to be able to trust AMD's software layer.

Speaker 2:

On top of that, we look at the software layer as a product in itself that far outlives a generational hardware piece. We want you to build on AMD's ROCm, on AMD's software, and we want to support whatever you build for the next 10 years, right? So for software longevity, we need to plan for the next decade of you developing on AMD software. And so we're putting in the core foundational elements to make sure that we are robustly building everything that's required for unlocking AI at scale.

Speaker 1:

Fantastic. I've been a fan of open source for decades, and your open source framework, ROCm, has been quite a blockbuster. What role has it played in your strategy, and how do you contribute to these open source ecosystems?

Speaker 2:

Yeah. So open is a very, very important part of AMD's strategy, right? Open ecosystems, open community and open development methodologies. We wholeheartedly support open source, and we want the best software, the best ecosystem components, to succeed so that the best capabilities and innovation happen on AMD hardware. Until now, though, we were thinking of it as like, hey, here's open source, you can do something with it.

Speaker 2:

But what we have actively started to do now is move all our development, both internal to AMD and external to AMD, to the same source tree, which means if you were to buy an AMD device, AMD hardware, you can make the changes that you need in the core ROCm software just as much as an AMD engineer can.

Speaker 2:

And what this means is you're on the forefront of innovation with open source, and we build and ship nightly now, so it gives you the ability to fix anything that you think needs to be fixed, or to contribute, or to explore something. For example, we had the Strix Halo form factor that was out, and we were helping the community build ROCm on Windows, and suddenly two external contributors came by and they did PyTorch and they did all of AOTriton, and now we have world-class Windows support for ROCm, which is amazing to watch. Because if we had done it ourselves, it would have taken us a while to get to that point, but the fact that it's open and somebody else really wanted to push forward the innovation there, they were able to contribute very meaningfully.
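
To make that concrete for a developer: on ROCm builds of PyTorch, the familiar CUDA-style device API is backed by HIP, so a quick sanity check of an AMD GPU setup might look like the minimal sketch below. This assumes a ROCm build of PyTorch is installed; exact version strings will vary.

```python
import torch

# On ROCm builds of PyTorch, the usual "cuda" device API is backed by HIP,
# so existing CUDA-style code runs unmodified on AMD GPUs.
print(torch.__version__)          # ROCm wheels carry a "+rocm" suffix in the version
print(torch.version.hip)          # HIP version string on ROCm builds; None on CUDA builds
print(torch.cuda.is_available())  # True when an AMD GPU is visible to the runtime

if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda")
    print((x @ x).device)         # the matmul above ran on the AMD GPU
```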

Speaker 1:

I bet. And you have such a unique bird's-eye perspective on hardware and software performance. How do you think about seamless performance optimization between your hardware and software stack, particularly for the AI workloads you're seeing?

Speaker 2:

Yeah, so it is a very fascinating place to be, because we watch AI at the application level move really fast, right? But hardware tends to move slower. Even though we're doing it every year, it takes a year for hardware to show up and get new capabilities. Previously that used to take five years, right? When old-school silicon was designed, it would take a five-year cycle, but now we're getting down to a one-year cycle. Still, that's hardware, it's slow, while software is moving really fast. So the software piece is always a combination of: how do you use the hardware you have, while co-designing how you want the next generation hardware to be and behave so that current generation workloads will work well? But then you're also trying to predict what the workloads will look like, because two years ago everyone was like, oh, pre-training, we need the biggest cluster to do this. And now everyone's like, test-time compute and inference is the big thing, right? And AMD has invested in inference pretty significantly.

Speaker 2:

So it's good for us that we called that, because the scaling laws of pre-training are kind of saturating a little bit, and now it's all about test-time compute and inference. So we need to predict, we need to maximize the hardware that we have today, and we need to design the hardware for the future. And to design the hardware for the future, we need to predict to some extent where the software and the applications are going to be, so that we can design next year's hardware.

Speaker 1:

A great point. And speaking of predicting the future, there's so much excitement around building Gen AI applications. I mean, everyone is diving in on the developer side. What does that mean for your roadmap, and how are your software teams adapting to this shift and demand for Gen AI applications?

Speaker 2:

Yeah, so I think Gen AI applications, and just broader AI applications, are just scraping the surface in terms of what is possible. We are investing heavily to make sure that, day zero, any hardware that AMD ships is fully software optimized, and what that means is you will be able to unbox and have day-zero support for PyTorch, JAX, you know, all the common frameworks. But then you have the agentic frameworks and others on top, right, MCP and others, where you want to be able to hook into the higher-level reasoning and frameworks that unlock the AI layer of the applications. And so we want to get the hardware out of the way, we want to get the base enablement software out of the way, and then we want to innovate in the places where, you know, the new models are coming up and the new Gen AI applications are emerging that we want to enable, right?

Speaker 1:

Fantastic. Let's talk about community. You have a tremendous community built around AMD's AI software, so much grassroots interest across the world. How important are developer tools and that community, and nurturing it and investing in it moving forward?

Speaker 2:

That is a very good question, and, you know, I always remember Steve Ballmer getting on stage and shouting developers, developers, developers. He had a sweaty shirt on, and in fact, for the ROCm meetup a couple months ago in San Francisco, I did the same thing,

Speaker 2:

just to kind of stress the point that we are very, very interested in enabling the developer ecosystem. And it's not just about, you know, enabling it by giving you tools to do what you want to do.

Speaker 2:

We want to co-innovate with you. We want to get the developer community working hand in hand with us, and so anything that we can do, let us know, and we're going to make that happen. For example, for the first time, we're going to be shipping our new hardware, which is going to be launched next week, to the open source community at the same time as the launch, right? The day that the product is launched, they're going to have access to the same system, and the software is going to be ready. So it's not like, hey, here's hardware, and six months later we're going to do something else. We're taking it very seriously in terms of enabling developers. We're going to be shipping out these systems to the major open-source ecosystem players so that, day one, they can unbox and play with it just as much as an AMD engineer can.

Speaker 1:

That's fantastic. I come from a background in the enterprise and data center, where things have traditionally moved a little more slowly and cautiously. That's changed. I mean, the cloud providers, the enterprises, are absolutely clamoring for more AI, more AI tools. How do you make that accessible for those folks with your software platforms?

Speaker 2:

Yeah, so you know, for the retail users that want to get access to AMD, next week we're going to be launching our Developer Cloud, where anyone who signs up is going to get free credits for using AMD GPUs. But then, let's say you try it out and you like it, you can put in your credit card and continue using it. And the friction of moving from development to production will also be reduced, so that if whatever you're trying works, you can just move to production right there. Right, so you have the ability to try before you buy, if you will.

Speaker 2:

But also, AMD has an immense footprint across all verticals. I mean across all verticals: embedded, where we have a very good presence; client, on laptops, on discrete GPUs, on workstations; and then obviously on Instinct. So what we are increasingly doing is giving you a unified interface, so that you can develop on one of these and deploy on another. For example, I have my Windows laptop with ROCm on it, and I can actually build ROCm, the entire ROCm, on my laptop. This is my primary ROCm development machine, and I can deploy on my Instinct system as soon as it's ready. So the pervasiveness of AI and the pervasiveness of AMD's AI software are coming together in a way that I think will unlock customers to get the best experience across the board.
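
To put a shape on that develop-on-a-laptop, deploy-on-Instinct flow, here is one common device-agnostic pattern in PyTorch. This is a generic sketch, not an AMD-prescribed recipe, and it assumes a ROCm (or CUDA) build of PyTorch on the machines involved.

```python
import torch
import torch.nn as nn

# Use whatever accelerator the local runtime exposes (an AMD GPU under ROCm
# appears through the "cuda" device API), falling back to CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"

# The same script can then run on a laptop during development
# and on an Instinct system in production, unchanged.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
batch = torch.randn(32, 784, device=device)
print(model(batch).device)
```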

Speaker 1:

Fantastic. Let's talk about growth. AI models are growing exponentially in number and size. We haven't seen this since maybe the beginning of the internet, or perhaps the early days of the rise of mobile. So, tremendous challenges. How do you see those challenges around scale, software scalability, energy efficiency, other issues? What's your thinking at the moment?

Speaker 2:

Yeah, so AMD has a unique advantage in terms of power and efficiency. We've always been very power-efficient, but it's also just physics: AMD has a chiplet design for the AMD Instinct platforms. We have eight chiplets with the HBM memory that's associated with them, so we have immense ability to control where the power is utilized and how we want to turn off parts of the chip. So, for example, we have a mode called CPX mode, where we can partition each one of these eight compute units into one of its own and then turn off the interconnect. That saves quite a bit of power, because now we know that you don't cross-talk between the chiplets. So that's just one example.

Speaker 2:

But energy efficiency is built into the foundation of AMD's hardware and software, and that unlocks things at a higher level, because now you have the ability to efficiently deploy on the systems, but you are also able, in aggregate, to serve at scale. I've personally seen data center deployments where we show immense energy efficiency at data center scale just by switching to AMD CPUs and GPUs. It's a fascinating axis. It's not just about performance; it's performance for what?
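
For a sense of how a partitioned mode like the CPX mode described above surfaces to software: each compute partition is exposed to the runtime as its own GPU, so from a framework's point of view it simply shows up as an additional device. Below is a minimal enumeration sketch in PyTorch; device counts and names are illustrative, and the partition mode itself is configured at the driver level, which isn't shown here.

```python
import torch

# Each compute partition appears as a separate device; in a CPX-style
# partitioned mode, an eight-chiplet package would enumerate as eight GPUs.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"device {i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")
```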

Speaker 1:

Fascinating. And you also have to take a long-term view of AI, advancements in software and beyond. Looking three, four, five years ahead, what trends do you think will be most transformative? What are the big opportunities there, and how are you preparing for those?

Speaker 2:

Yeah. So there are things that I know will happen, that are already happening: inference, and the follow-on to inference, is taking on a bigger portion of the compute. The thinking models and the reasoning models are doing their reasoning after the fact, after the training was done. So training, I think, will still be a big portion, but the inference part of it is just going to grow to a point where, you know, it just depends on, like, okay, think for an hour and tell me what to do, right? That's just more and more inference, and you're trying to make decisions on that. So I think the growth of inference is going to drastically continue to increase. And the investment that we have to make is in the ability to weave these together, where you have the core pre-trained LLM that's generating tokens, right, that's a token machine. Then, once you get those tokens, how do you make it smarter and go through more inference loops that allow you to make more of it, to make it meaningful for whatever task you have? That's one.

Speaker 2:

Then, obviously, there are the agentic frameworks, some of the demonstrations that you've seen, probably even integrated in Chrome, or like Operator from OpenAI. Those are all connecting dots for people, where you're like, hey, I don't have to keep checking Kayak every 30 minutes to find out if I can get a deal to fly somewhere, right? An agent should be looking at that for you, because you just want to deploy an agent. I think the potential of that is pretty significant, but how we utilize it and how we bring it to fruition is important, because you want to go do something else while this agent goes to kayak.com as one of the websites it has to visit, and then it aggregates all this and makes intelligent decisions for you.
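
To put a shape on the test-time-compute idea, here is a deliberately simple sketch: spend more inference passes per query and keep the best candidate. The `generate` and `score` callables are hypothetical stand-ins for a real model API, not anything from ROCm or a specific library.

```python
from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate answers and return the highest-scoring one.

    Raising n spends more inference-time compute per query, which is
    the trade-off described above: more "thinking" per answer.
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))
```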

Speaker 1:

Fantastic. I can't wait to see all this unfold. I'm super excited for your Advancing AI event coming up June 12th, to get an inside peek at the future of AI, and you and the other AMD thought leaders, including Dr. Su herself, will be there. You must be getting ready. What can people expect from the event? You'll have thousands of developers there, online and in person. What are you most excited about?

Speaker 2:

I'm excited about, obviously, new hardware capabilities that we're going to be disclosing and launching.

Speaker 2:

I'm super excited about ROCm 7. We're going to be talking about ROCm 7. Like I mentioned, this is the first time that, when we launch a product, the product is going to be available for the system integrators and the ecosystem to start working with us, on ROCm 7. I'm super excited about the Dev Cloud, because one of the things that people didn't have was easy access, like, hey, here's an AMD GPU and you can do whatever you want to do. So we'll be announcing that on stage, and I'll be demonstrating some of that.

Speaker 2:

And then there are a lot of customers, customer journeys. Just as much as I shout developers, developers, developers, on the other side I shout customers, customers, customers.

Speaker 2:

And so we're going to be bringing up a whole slew of customers that are deploying AMD at scale, talking to them and kind of letting them showcase what they have experienced with AMD, because we hope that this will foster the overall developer ecosystem and the other customers that we are going after. But overall, I just think it's a good culmination of a good amount of hardware innovation, software innovation and software investment. I think we just made the announcement on acquiring Brium. We are going to invest heavily for the long term in software, and talking about that, making people aware of the seriousness and the tenacity with which we are investing in software, is going to be super exciting.

Speaker 1:

Super exciting, a blockbuster event. What more could you ask for? I've enjoyed your tweets, as well as those from your colleagues on X at @AIatAMD, and I assume people can reach out to you there for more info.

Speaker 2:

That's right. We're there on X or LinkedIn, you know. Just reach out to us, and we're looking forward to seeing everyone at the Advancing AI event.

Speaker 1:

All right, we'll see you there. Sign up, everyone, Thursday, June 12th. And congratulations, Anush. It's an extraordinary journey you're on. Take care.

Speaker 2:

Thank you.

Speaker 1:

Thanks everyone for listening and watching.