The Macro AI Podcast
Welcome to "The Macro AI Podcast" - we are your guides through the transformative world of artificial intelligence.
In each episode - we'll explore how AI is reshaping the business landscape, from startups to Fortune 500 companies. Whether you're a seasoned executive, an entrepreneur, or just curious about how AI can supercharge your business, you'll discover actionable insights, hear from industry pioneers and service providers, and learn practical strategies to stay ahead of the curve.
The AI Compute War: Why Anthropic Is Paying xAI for Colossus
In this episode of the Macro AI Podcast, we break down one of the most important AI infrastructure stories in the market: Anthropic’s major compute agreement with Elon Musk’s xAI and SpaceX infrastructure.
At first glance, the deal seems surprising. Anthropic, the company behind Claude, is backed by Amazon and Google and competes directly with xAI’s Grok. So why would Anthropic pay for access to Colossus, one of the largest AI compute clusters ever built?
The answer points to a major shift in the AI market. AI is no longer just a model race. It is becoming a compute race, a power race, and an infrastructure race.
Gary and Scott explain what Colossus is, why xAI’s rapid buildout matters, and why Anthropic needs massive production capacity to support Claude’s growth across enterprise users, developers, API workloads, coding tools, and agentic workflows. They also explain the difference between training and inference, and why inference is becoming the day-to-day economic engine of frontier AI.
The episode also gives CIOs a practical view into the market cost of AI compute. High-end NVIDIA H100-class GPU capacity can vary widely depending on provider, commitment level, scale, networking, storage, support, and availability. We compare typical enterprise GPU pricing to Anthropic’s reported $1.25 billion-per-month agreement and explain why the deal should be viewed less as a simple GPU rental and more as an industrial-scale capacity reservation.
The key takeaway for CIOs: AI strategy now requires infrastructure strategy. Enterprises need to understand where inference runs, what providers are involved, how data is handled, what happens during demand spikes, and whether their AI vendors have enough compute capacity to support business-critical workloads.
This episode is essential listening for business and technology leaders trying to understand the next phase of enterprise AI, where model performance, compute availability, power, cooling, network design, vendor dependency, and cost governance all come together.
About your AI Guides
Gary Sloper
https://www.linkedin.com/in/gsloper/
Scott Bryan
https://www.linkedin.com/in/scottjbryan/
Macro AI Website:
https://www.macroaipodcast.com/
Macro AI LinkedIn Page:
https://www.linkedin.com/company/macro-ai-podcast/
Gary's Free AI Readiness Assessment:
https://macronetservices.com/events/the-comprehensive-guide-to-ai-readiness
Scott's Content & Blog
https://www.macronomics.ai/blog
00:57
Welcome back to the Macro AI Podcast. I'm here with my co-host, Scott Bryan. Today we're digging into an AI infrastructure story. You've probably seen a lot about Anthropic, the company behind Claude. It has entered into a major compute agreement with Elon Musk's xAI and SpaceX infrastructure, which is rather intriguing.
01:26
At first glance, this sounds strange. Anthropic is backed by Amazon and Google. It competes directly with xAI's Grok, and Elon Musk has publicly criticized Anthropic in the past. So you kind of scratch your head a little bit and ask what's going on. And yet Anthropic is now paying big bucks for access to one of the largest AI compute clusters ever built. So Scott, what do you think about all this? Yeah. Well, it's definitely been in the news quite a bit
01:56
recently, and I think the announcement was quite a surprise to a lot of people who have been paying attention to the industry. So the obvious question is, why would Anthropic do this? And I think it's pretty simple. I think the answer tells us a lot about where AI is going and about the massive demand for compute, which is why they jumped into it. Agreed. And this is not just a story about things that you would expect, like GPUs. It's about compute becoming one of the most strategic resources in the world.
02:25
You know, people could argue it could be more strategic than oil at some point. And with compute, the story is also about power, cooling, data centers, networking, and inference capacity. And the reality is that AI is now constrained as much by physical infrastructure as it is by software. And I think we've seen that with a lot of build-outs and lack of build-outs. I think, Scott, when you saw this deal, what really jumped out at you? Yeah,
02:55
I think the headline was interesting, but I think the infrastructure story underneath it is the real story that people are starting to really talk about. The simple version is that Anthropic is buying compute from xAI, or SpaceX. So score another win for Elon Musk and his ability to navigate business strategy, because people thought he had just spent a lot of money on Colossus and xAI was a distant third.
03:23
But I think the deeper version is that one of the leading AI companies in the world is growing so fast because of its capabilities. Everybody's using it for coding. It got a lot of attention for that. But even the existing hyperscaler relationships that they're leveraging are just not enough to satisfy their compute demands. So they had to reach out to a company that's positioned itself as a competitor, but is now just selling them massive amounts of compute. Gary, you mentioned Anthropic's
03:53
relationships with Amazon and Google for compute. Those are two of the most capable infrastructure companies on the planet, and yet Anthropic still needed more capacity badly enough that they went and entered into a deal for infrastructure that's tied to a direct competitor. And that just screams that compute scarcity is a real problem. Right. I think it also tells us that the AI market is moving from the experimentation phase
04:23
to the industrial infrastructure phase. And the winners won't simply be the companies with the best models. They'll be the companies that can secure enough compute, power, cooling, you know, those things you just mentioned, and operational reliability to serve customers at scale. And I think that was a threat to Anthropic, since they just couldn't secure enough compute. Yeah, that's exactly the point. I mean, for the last couple of years, enterprise AI conversations have focused on
04:52
model selection, and I know you've seen this, Scott, is GPT better, is Claude better, is Gemini, or what about Llama or Grok? Which model reasons better? Which one writes better code? Which one has a better context window? And all those questions still matter, but this deal reminds us that the model is only part of the story. If the model is great, but customers are rate limited, response times are inconsistent, or the provider can't
05:21
scale capacity fast enough, then the business value gets constrained. So today we'll cover three things: what Colossus is, why Anthropic would want to access it, and what CIOs can learn from the deal about where things are headed. Yeah. Sounds good. So let's just jump right in with Colossus. Some people might be familiar with it, but Colossus is xAI's massive supercomputer cluster down in Memphis.
05:51
It's been referred to by xAI as the Gigafactory of Compute, which is a useful phrase because it frames the cluster less like a traditional data center and more like an industrial-scale production facility for AI. The way xAI framed it, they said Colossus was built in 122 days and then doubled to 200,000 H100 GPUs in another 92 days.
06:20
And it's one giant, single interconnected cluster. And that matters because frontier AI workloads are not just about having a bunch of GPUs in the building. They're about getting those GPUs to work together in a highly efficient manner, with enough network bandwidth, storage throughput, and power stability, which is a big concern, to keep them highly utilized. And that's different from your traditional enterprise IT and the
06:49
in a typical data center that has multiple enterprise users in it. Yeah, and I think to your point, when a CIO hears 200,000 GPUs, it's easy to think, okay, this is just a huge number of servers, but that does not really capture it. A normal enterprise data center supports business applications, VMs, databases, storage, hopefully security platforms, maybe some private cloud.
07:15
The architecture is designed around uptime, application availability, and really general-purpose workloads. An AI cluster like Colossus is different. The entire environment is optimized around feeding massive AI workloads. So if you think of it this way, it's about GPU density, power delivery, cooling, east-west network traffic, which you and I have been doing for, gosh, probably 30 years,
07:43
and memory bandwidth, storage throughput, and utilization. So the goal is simple but incredibly hard: keep the GPUs busy. And I'm seeing this. I just deployed one customer that needed connections, not into this, but into an AI framework, where they needed 400 gigs of network capability immediately for one region only. And that was only going to stack on top of each other for other regions. So it's here. This isn't something that, you know,
08:13
we're just making up. Organizations are using this now. Yeah. I think, just to back up for some of the listeners: east-west network traffic is traffic that moves laterally inside a data center or an AI cluster, like GPU to GPU, server to server, or server to storage. And that's different from north-south traffic, which moves
08:38
in and out of the data center between users, applications, and clouds, like your customer that you just mentioned that needed 400 gigs. And that's where this becomes relevant to CIOs, because in traditional IT, compute planning often means server capacity, cloud consumption, storage, VMs. But with these large frontier AI models, you have to think about the full stack a little differently. Can the power system support massive density and massive cooling?
09:09
Can the network actually move data between the GPUs without becoming a bottleneck when you have huge spikes in demand? Then, can things like storage feed an industrial-scale AI system quickly enough to have it perform? And then you've got things like the scheduler. Does the workload scheduler use the hardware efficiently? So I think an important question that comes out now is, can the operator bring all of that online fast enough to matter?
09:38
And on that point, xAI has proven that AI infrastructure can be deployed more like a manufacturing operation than your typical data center, and Musk has proven this. Everybody's forecasting multi-year deployments. He did it in a very short period of time. Yeah. And if we were to pause on that, if you think about a traditional large-scale data center project, that can take years. I mean, we were talking about east-west.
10:08
I mean, that was back when getting up to a gig within a data center was huge news years ago. Among the things that you have to consider as part of this, you have to have site selection. You need power interconnection, which requires permitting, and backup generators, which come with fuel capacity and EPA requirements, and the construction of all that.
10:34
And then obviously the systems and the testing and commissioning of this environment. With Colossus, the speed at which they were able to achieve this was really remarkable. Yeah, they got it done. I was just kind of thinking about Grok. I recommend Grok for a lot of things. Grok is the xAI model. And a good example would be real-time market or public sentiment intelligence, where X (Twitter) is part of the signal.
11:03
And now you can also add in that he's getting information from Starlink and from autonomous driving. All those things are moving into Grok. It's going to become more pertinent down the road. But for business buyers, I think Grok is still pretty clearly more of a challenger than a default enterprise platform. It's still right there, right behind Claude and ChatGPT. So for Musk, owning this massive compute infrastructure,
11:33
and then having the option, if it's underutilized, to pivot to renting it to Anthropic, I think was kind of his option play in his business strategy, which he's now using. Well, it's interesting because we've seen this in a different form. Years ago, when AWS launched, they launched because of their seasonal needs for compute. And they realized, hey, if we could build this model and then rent out that capacity to others,
12:03
this would be interesting. That's kind of how cloud started, right? So if we dig into the deal here, xAI announced that SpaceX signed an agreement with Anthropic to provide access to Colossus. So that's really interesting. And specifically, SpaceX is listed as the counterparty, not just xAI. Exactly. So it gives Anthropic access to the Colossus AI data centers
12:32
and their capacity at those locations associated with the broader xAI and SpaceX infrastructure build-out. xAI described Colossus 1 as having more than 220,000 NVIDIA GPUs, including H100, H200, and GB200 accelerators, which is unbelievable. xAI also noted that Anthropic planned to use the capacity to improve Claude Pro and Claude Max.
13:03
If you were reading the news, Reuters reported a much larger financial picture. Anthropic agreed to pay SpaceX $1.25 billion per month through, I believe, the spring of 2029. And that was for compute services using Colossus 1 and Colossus 2. And at the time, they also reported that either party can terminate the agreement with 90 days' notice,
13:32
with reduced fees during the initial ramp-up period. So it's pretty real. Yeah, it's a huge deal. It's hugely important to both parties. And it's definitely not your typical enterprise cloud services contract that you and I typically deal with for our clients. And it's not a company that's just buying a few reserved instances. This is industrial capacity reservation. Yeah, I mean, definitely a much bigger deal than the typical enterprise agreement, to your point.
14:01
Yeah, I think so. And what Anthropic is getting is access to a really scarce resource. Right now, compute is the scarce resource, and they're getting it at massive scale. And they're getting more than just GPUs. They're buying an immediate block of infrastructure that can be brought into the cloud system basically immediately. Yeah. But I think the central question here is why would Anthropic enter into this agreement? Yeah.
14:30
On that point, I think it's just because demand is rocketing. Just looking at their numbers, Anthropic predicted sales for the June quarter to exceed $11 billion, which is more than double the prior quarter. And growth like that in this space is just a brutal operational challenge. Anthropic has lots of enterprise customers.
15:00
You know, ChatGPT has enterprise customers, but a ton of consumers. Anthropic has developers, Claude Pro, Claude Max, like you mentioned, coding tools, huge API demand from enterprises. And now they've got agentic workflows that are starting to just consume capacity. That's different from traditional SaaS models, where incremental users mostly consume application server and database resources. These
15:29
AI users in the Anthropic world consume really expensive inference capacity every time they interact with the model. So it's not just training; now we're into the inference stage. Every prompt and every response has a cost. Every long context window and every coding agent that's running for a long session, sometimes overnight, has a huge cost.
15:58
And then every enterprise workflow that's reading documents, reasoning across them, and then generating output and calling tools, all of those things are expensive, and Anthropic is starting to see the costs rack up and the compute demand just explode. So the more successful Claude becomes, the more compute Anthropic needs. And so that's the genesis of this deal. Yeah. I think that's a critical point. I just have this vision of, you know,
16:27
coal feeders in a ship trying to get the engines to go even faster, just shoveling and shoveling, right? And I think a lot of people still think about artificial intelligence costs in terms of training the model. And training is incredibly expensive. Nobody's gonna argue with that. But once the model becomes a product that millions of people are using, inference becomes the day-to-day economic engine. So training is like-
16:55
Yeah, needs lots of coal, right? So training is how you create the model. Inference is how the business operates, right? So you want that coal in the engine so you can muster up more than a couple knots, so you can keep going. So if Claude is being used for software development, customer support, enterprise automation, whatever it might be, and hundreds of other business-critical workloads that are being put into this, then Anthropic needs reliable
17:23
production capacity all of the time at global scale. Yeah. High performance, highly reliable. They need it. And like you said, inference is the key and it's no longer just a side issue. It is the business. It's these customers running their business. So this deal that we're talking about suggests that the economics of frontier AI are now shifting from research infrastructure
17:51
and training to production infrastructure. So the old question from just last year, you know, who could train the best model, is now shifting in 2026 to operating. And the new question is, who can serve the best model at global scale with low latency and reliable capacity? And now we've got the key question as they're going public: at acceptable margins and enterprise-grade performance. And that's an operating problem. That's, you know,
18:21
next level. Yeah. And so Anthropic is paying a premium to xAI, not just for compute, but to protect the growth. Yep. Right. So the worst thing Anthropic can do is frustrate customers with things like usage limits, slowdowns, or even just general bad performance, because they're more likely to pull up stakes. Yeah, that could be death for them. And I think any rate limits that might be imposed,
18:50
if they're hitting compute capacity or having other issues, if rate limits are suddenly imposed on the enterprise users, those aren't just technical controls. Those are market signals about constraints that they have in their business. So with this deal, this $1.25 billion a month deal to buy compute from xAI and SpaceX, they're basically buying the ability to continue scaling. Right. Right. And so if we were to go
19:18
dive into this a little bit deeper, technically, maybe you want to explain the difference between training and inference, because it's really central to understanding this deal and why this is so important, I think especially for the next six to 12 months. Yeah. And I think we've covered it in other episodes, but just briefly, training is the process of building or improving the model. It takes enormous GPU clusters,
19:47
synchronized computation, huge data sets, high speed networking and coordination across GPUs. And during training, the model is learning patterns. The workload is highly parallel and very sensitive to communication between all those GPUs. So if the network is too slow or the GPUs are waiting on each other, utilization drops and then the economics really get ugly. Now,
20:12
switching gears: inference, which is what enterprises are using day to day, is different. It's what happens when a user asks Claude a question and Claude generates an answer. At small scale, it sounds simple, but at Anthropic's scale, it's a massive production and engineering problem. I mean, you have to serve all those millions of requests with all different use cases. So the inference layer has to route, schedule, batch, prioritize,
20:42
and constantly optimize. And that helps to really explain why a mixed GPU environment can still be valuable. A cluster with H100s, H200s, and GB200s may be complicated for certain uniform training workloads, because training often benefits from things like consistent hardware and predictable synchronization. But for inference, different classes of GPUs can serve different types of workloads.
21:10
Some workloads may be latency sensitive. Some may be memory heavy. Some are batchable. And some may just be better suited to newer hardware that we're still trying to understand. So these are the things that I think also have to be taken into consideration. Yeah. Yeah. So I think just breaking that down, I think the takeaway is that AI infrastructure isn't just one monolithic thing. You've got all the factors that have different infrastructure profiles. So when a
21:39
when a CIO is planning enterprise AI, they don't just simply ask, do we have access to a model? They should really look to understand what type of AI workload are we running? What infrastructure does that workload require to optimize it? Because your typical internal chatbot is much different from a real-time customer service agent. A coding assistant is different from a batch document processing workflow, for example.
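Editor's sketch: the workload classes mentioned above (latency sensitive, memory heavy, batchable) can be illustrated with a tiny routing rule. This is only a sketch under stated assumptions, not how Anthropic or xAI actually schedule inference. The pool names, the `Workload` fields, the memory figures, and the 80 GB threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    latency_sensitive: bool
    batchable: bool
    memory_gb: int  # illustrative memory footprint, not a measured value


def choose_pool(w: Workload, high_memory_gb: int = 80) -> str:
    """Return a hypothetical capacity pool for a workload.

    Order of checks is an assumption: memory fit first, then latency,
    then batchability, otherwise a general-purpose pool.
    """
    if w.memory_gb > high_memory_gb:
        return "high-memory"
    if w.latency_sensitive:
        return "low-latency"
    if w.batchable:
        return "batch"
    return "general"


# Example workloads taken from the conversation; the numbers are made up.
EXAMPLES = [
    Workload("internal chatbot", False, False, 20),
    Workload("real-time customer service agent", True, False, 20),
    Workload("coding assistant with long context", True, False, 120),
    Workload("batch document processing", False, True, 40),
]

for w in EXAMPLES:
    print(f"{w.name}: {choose_pool(w)}")
```

The point of the sketch is only that different workload profiles can land on different hardware, which is why a mixed GPU fleet can still be useful for inference.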
22:09
The workload matters, and I'm sure that Anthropic will be working closely with xAI on the details of the infrastructure that they're obviously paying a monthly premium for. Yeah, that's a good point. So let's talk about the hyperscaler angle. Anthropic has major relationships with the players that you would expect, Amazon and Google. And those companies have massive cloud infrastructure. I don't think that's news to anyone.
22:38
Yeah, they are among the best and biggest in the world at building and operating data centers. But every major AI company is also trying to secure capacity at the same time. So there's that competitive nature there. Yeah. So yeah, huge demand for compute, and all those competitors, Amazon, Google, they're building quickly. They're buying massive amounts of GPUs. They're even designing their own custom chips, like we've talked about in a few episodes, and they're expanding all of their global regions.
23:07
And of course, working really hard to secure power, like we've talked about also. So, you know, Google needs compute for Gemini, Meta needs compute, Microsoft, Amazon, they all need it. And then of course, there are thousands of enterprises on a smaller scale, and everyone's pulling on the same supply chain to try to get that compute. Yeah. I think it's important to note that even if a company has great cloud partners, it may still need a multi-source compute strategy.
23:36
So for years, and I've been living this for a while, cloud strategy has often been framed as single cloud versus multi-cloud. AI may force a more nuanced discussion. You might have your data in one cloud, your AI model accessed through a SaaS provider, inference capacity through another provider, your vector database somewhere else, and private models running in a colocation environment, for example. That makes it complicated.
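Editor's sketch: one way to picture a multi-source compute strategy is a client that tries providers in order and falls back when one is out of capacity. This is a minimal illustration, not any vendor's SDK. `CapacityError`, `call_with_fallback`, and the two stub providers are hypothetical names invented for this example.

```python
from typing import Callable, List, Sequence, Tuple


class CapacityError(Exception):
    """Raised by a provider stub when it is rate limited or out of capacity."""


Provider = Tuple[str, Callable[[str], str]]


def call_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, response)."""
    errors: List[str] = []
    for name, send in providers:
        try:
            return name, send(prompt)
        except CapacityError as exc:
            # Record the failure and move on to the next source of capacity.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers exhausted: " + "; ".join(errors))


# Stub providers standing in for real inference endpoints (assumptions).
def primary(prompt: str) -> str:
    raise CapacityError("rate limited")


def secondary(prompt: str) -> str:
    return f"echo: {prompt}"


used, answer = call_with_fallback(
    "hello", [("primary", primary), ("secondary", secondary)]
)
print(used, answer)
```

In practice the governance questions raised in the conversation still apply to every provider in the list: where inference runs, how data is handled, and what happens when all of them are strained.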
24:05
Yeah, that makes the architecture more distributed and somewhat diluted. It makes governance harder and it makes vendor dependency that much more important. So network design strategy is critical, along with the things that you talked about before around latency and just overall governance. And so I think CIOs will need to understand not just which model they're using, but where inference runs, how data is handled,
24:33
what third-party infrastructure is involved, and what happens if capacity gets strained. Do you just get shut off? Do you have some sort of bursting mechanism in the future? These are the things that you just need to understand so you don't have a gotcha in your AI infrastructure for your business and ultimately your clients. Yeah, exactly. So like in the older SaaS world, sourcing was focused on vendor due diligence, application security,
25:02
compliance, DR, uptime. But now with AI, CIOs have to add the infrastructure questions. And I think they're starting to get that, but it's obviously something they're going to have to pay attention to. Where does the inference run? What third-party infrastructure providers are involved? Gary, you do a ton of work on that with multi-cloud connectivity. What regions are used? How are workloads isolated? And then you've got to think about what happens when demand spikes,
25:31
and rate limits and fallback options. So it's just getting more complex, for sure. Yeah, absolutely. And I think if you think about all of that, compute really can start being looked at as a strategic currency. So if you were to put that in context, because CIOs have started asking this question of me: what does AI compute actually cost in the market?
25:59
So if an enterprise wants access to high-end NVIDIA GPUs, the pricing varies a lot depending on the provider, the GPU generation, whether it's on demand or reserved, whether it's a single GPU or a full cluster, and then questions like whether the customer needs things that are maybe a little bit more enterprise level: enterprise support, networking, storage, security, and guaranteed capacity so they don't run out.
26:29
So as a rough market snapshot, let's look at it this way. H100-class GPU capacity today can range from roughly $2 to $4 per GPU hour from some specialized GPU cloud providers, you know, keeping their names out of it, while hyperscaler pricing can often land closer to $6 to $11 per GPU hour, depending upon the region, obviously the instance type, and the commitment model. But
26:58
some broad 2026 pricing trackers show H100 pricing ranging from around $1.25 an hour at the very low end to nearly $15 per hour at the high end. But those are extremes. They don't always reflect what a large enterprise can reliably buy and negotiate at scale with guaranteed availability. But that just kind of gives you a picture.
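Editor's sketch: to make the hourly ranges above more concrete, the snippet below converts them into a monthly bill. It assumes, purely for illustration, a single 8-GPU node reserved around the clock for a 30-day month; the node size, the 720-hour month, and the tier labels are assumptions, while the hourly rates are the ones quoted in the conversation.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: 30-day month, running 24/7


def monthly_cost(rate_per_gpu_hour: float, gpus: int,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Monthly cost in USD for `gpus` GPUs billed at an hourly per-GPU rate."""
    return rate_per_gpu_hour * gpus * hours


# (low, high) USD per GPU hour, as quoted by the hosts.
QUOTED_RANGES = {
    "specialized GPU cloud": (2.00, 4.00),
    "hyperscaler": (6.00, 11.00),
    "tracker extremes": (1.25, 15.00),
}

GPUS = 8  # assumption: one 8-GPU node

for tier, (low, high) in QUOTED_RANGES.items():
    low_cost = monthly_cost(low, GPUS)
    high_cost = monthly_cost(high, GPUS)
    print(f"{tier}: ${low_cost:,.0f} to ${high_cost:,.0f} per month")
```

Even at this small scale the spread between the cheapest and most expensive quotes is more than tenfold, which is why the listed hourly rate alone says little about what a production deployment will cost.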
27:24
Yeah, I think there's a huge difference between the cheapest listed GPU price on the internet and what an actual CIO might want to depend on for production AI. So think about it: a startup might be able to rent a few GPUs from a low-cost provider. But an enterprise running critical AI workloads cares about much more than just the hourly rate. They care about availability, SLAs, security, performance, all of that,
27:54
and whether capacity will still be there when demand spikes in that environment. So I've got a few pricing examples here too, Gary. I show AWS P5 H100 capacity at about $55 an hour for an 8-GPU p5.48xlarge instance, or roughly $6.88 per H100 GPU hour. And then I've got some other market trackers. I have
28:21
Google Cloud A3 H100 on-demand pricing a little bit higher, at $10 to $11 per GPU hour. And then you've got some really specialized GPU providers, and without going through too many names, you've got Lambda, RunPod, CoreWeave, and others offering lower H100 pricing in the $2 to $6 per GPU hour range, depending on the specific hardware you select and then the contract terms
28:51
under them in the license agreement. Yeah. So I guess then the Anthropic comparison becomes really interesting. Yeah. I mean, $1.25 billion per month, which is astronomical. And if we use xAI's description of Colossus 1 having more than 220,000 NVIDIA GPUs, then a simple, I guess, back-of-the-envelope calculation would put that at roughly
29:22
$7.90 per GPU per hour. It's not a perfect calculation because the deal may include more than just raw GPU hours. I get that. It probably includes networking, storage, operations support, and so on. But it gives CIOs a useful benchmark. Yeah. So I think the conclusion there isn't really simply that Anthropic is overpaying. I think the better conclusion would be that Anthropic appears to be,
29:51
based on the size of the deal, paying a premium, but they're paying a premium for dedicated capacity at massive scale. And I think that's the part you need to focus on. They needed it, they went and they secured it. So at a small scale, GPU pricing is really kind of a commodity comparison. It happens all the time, like network bandwidth. But at massive scale, you have to consider it a move to handle their serious capacity reservation problem.
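Editor's sketch: the back-of-the-envelope figure discussed above can be reproduced in a few lines. The inputs are the numbers quoted in the conversation ($1.25 billion per month, more than 220,000 GPUs, about $55 per hour for an 8-GPU AWS instance). The 30-day month is an assumption, and as the hosts note, the deal reportedly also covers Colossus 2 and likely bundles networking, storage, and operations, so the result is a rough benchmark rather than a true GPU-hour price.

```python
MONTHLY_PAYMENT_USD = 1.25e9   # reported monthly payment
GPU_COUNT = 220_000            # "more than 220,000", so treated as a floor
HOURS_PER_MONTH = 30 * 24      # assumption: 30-day month, running 24/7


def implied_rate(monthly_usd: float, gpus: int,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Implied USD per GPU hour if the whole payment bought only GPU hours."""
    return monthly_usd / (gpus * hours)


deal_rate = implied_rate(MONTHLY_PAYMENT_USD, GPU_COUNT)

# Comparison point from the conversation: ~$55/hour for an 8-GPU instance.
aws_rate = 55.0 / 8

print(f"Implied deal rate: ${deal_rate:.2f} per GPU hour")
print(f"AWS 8-GPU instance: ${aws_rate:.2f} per GPU hour")
print(f"Ratio (deal / AWS): {deal_rate / aws_rate:.2f}")
```

Because the GPU count is a lower bound, using a larger fleet size would push the implied rate down, which is one more reason to read the figure as an order-of-magnitude benchmark.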
30:21
So really they're not buying it per hour, obviously. They're buying a large block of immediately usable AI infrastructure so that Claude, their business, can keep scaling as they're rolling into an IPO. Oh, and by the way, xAI can, per the terms, terminate the agreement with 90 days' notice. Well, both parties can, but xAI could terminate with 90 days' notice. Like I said, I haven't read the details, but if xAI rolls out a,
30:50
you know, a really kind of killer update to Grok that suddenly starts drawing in massive user gravity, which is how things can work in the software world, they could reclaim their compute from Anthropic. Yeah. And I mean, this is where it gets interesting. We saw this in the early days of data center capacity, when people were just putting client-server environments in there. People were reserving space for future growth, but then others were
31:20
So, you know, it's an interesting statement that you're making about how they each have that ability to terminate. I don't know what that does, but it certainly makes it very competitive, and it also does prevent some vendor lock-in on either side. So that could also help other users. Well, that about covers it. So let's leave it there. Thanks, everyone, for listening. We'll see you on the
31:48
next episode. We appreciate your support. Please continue to send in questions. Feel free to connect on LinkedIn with both Scott and myself. Until next time, we'll see you soon.