The Tech Strategy Podcast

Understanding AI Infrastructure Part 1: Apps + AI, SaaS + AI and AI Data Centers (265)

Jeffrey Towson Season 1 Episode 265

This week’s podcast is the first in a series about AI infrastructure, covering both capabilities and costs.

You can listen to this podcast here, which has the slides and graphics mentioned. Also available at iTunes and Google Podcasts.

Here is the link to TechMoat Consulting.

Here is the link to our Tech Tours.

Here is the article on the Agentic AI Operating Basics, which I wrote up last week.

Here are the Huawei whitepapers related to AI Data Centers:

AI-Ready Data Infrastructure Reference Architecture White Paper

https://e.huawei.com/en/forms/2024/solutions/data-storage/ai-ready-data-infrastructure-white-paper

AI Data Center Facility Reference Design

https://digitalpower.huawei.com/upload-pro/index/index/Huawei-AI-Data-Center-Reference-Design.pdf

Here are the slides from the mentioned Huawei whitepaper.

-------

I am a consultant and keynote speaker on how to accelerate growth by improving customer experiences (CX) and digital moats.

I am a partner at TechMoat Consulting, a consulting firm specialized in how to increase growth through improved customer experiences (CX), personalization and other types of customer value. Get in touch here.

I am also author of the Moats and Marathons book series, a framework for building and measuring competitive advantages in digital businesses.

This content (articles, podcasts, website info) is not investment, legal or tax advice. The information and opinions from me and any guests may be incorrect. The numbers and information may be wrong. The views expressed may no longer be relevant or accurate. This is not investment advice. Investing is risky. Do your own research.

Support the show

00:06
Welcome, welcome everybody. My name is Jeff Towson and this is the Tech Strategy Podcast from TechMoat Consulting. And the topic for today: Understanding AI Infrastructure, Part 1. This is going to be probably a six-to-seven-part series on how to think about AI infrastructure as a businessperson. What matters? Because you've got to understand the technical capabilities at some level, but you've also got to start to understand the cost structure,

00:35
and what you can build and use and not. So, we're going to go through this step by step. Part one is basically going to talk about AI data centers, which are crazy right now. And then we'll sort of begin at the beginning, which is apps plus AI and SaaS plus AI. So, software as a service plus AI. So basically, incorporating AI into the two most popular software-based services: apps and SaaS.

01:02
So that'll be the topic for today.  I'm kind of excited about this. I've been working on this for a long time, so I'm going to sort of pull it together over the next couple weeks, and hopefully that will be useful. That's the idea. For housekeeping stuff, we have the tour at the end of November, which is Shenzhen.  Really, it's kind of a conference and a tour, because we're not traveling to multiple cities. We're going to Shenzhen, we're staying in one place, we're in a...

01:29
you know, the Four Seasons conference room. We're going to have a lot of speakers come in. There's going to be some site visits, talking to some companies, things like that. So, it's kind of, let's call it a tour slash conference, really about, you know, digital China, the leaders there, but also macro. So, looking at Shenzhen, the Greater Bay Area, sort of top down plus bottom up, mostly bottom up, micro more than macro. Anyways, that's going to be a lot of fun. If you're curious about that, send me a note or go over to techmoatconsulting.com. We have the details there.

02:00
Standard disclaimer: nothing in this podcast or my writing or website is investment advice. The numbers and information from me and any guests may be incorrect. The views and opinions expressed may no longer be relevant or accurate. Overall, investing is risky. This is not investment, legal or tax advice. Do your own research. And with that, let's get into the topic. Now over really the last year, I've been going into this in pretty good depth in my own study, trying to understand the technology better than I did,

02:28
and what that means in terms of deploying it into products, services, workflows, productivity, operating models. And then what does it cost? I mean, the short version of that is the economics are different than software. The variable costs are a big problem. Well, a big factor. This is not, hey, we've got a nice clean CPU-based tech architecture with a database and we're getting 60 to 70% gross margins because it's all software. No, no, no.

02:59
We've got a lot of human components here. There's a lot of upkeep. There's a service component to all of this. And so, you're looking at gross margins of 30, 40, 50%, probably not 60 to 70%. Well, that makes a big difference in terms of what you can offer as a product or service. So, we'll go into the cost structure, which is still evolving. It's hard to pin down; it changes every month. But we'll try and get smarter about that. So that will be the topic.
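To see why that margin difference matters, here's a trivial gross margin comparison. The dollar figures are made-up placeholders, just to show the shape of it:

```python
def gross_margin(revenue, cogs):
    # Gross margin = (revenue - cost of goods sold) / revenue
    return (revenue - cogs) / revenue

# Hypothetical $100 of revenue in each case.
software = gross_margin(100, 30)    # classic software/SaaS: ~70% margin
ai_service = gross_margin(100, 55)  # GPUs, inference, upkeep sit in COGS: ~45%

print(f"software: {software:.0%}, AI service: {ai_service:.0%}")
```

Anyways, you know, I've been reading a ton, and I've been visiting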

03:29
the giants of Asia: Baidu AI Cloud, Tencent AI Cloud, Alibaba AI Cloud, Huawei AI Cloud. I mean, pretty much everybody on the ground that I could talk to, I've been talking to and digging into it. So, I'll start to bring in some of that more and more once I've laid out a framework for how to think about it. I think then it's easier to put in the examples once you have sort of a coherent framework. So that's the goal for today. Well, today's the first part of that. Okay.

03:58
So, I thought we'd start with the AI data center frenzy, because this is one of the craziest buildouts maybe we've ever seen. The amount of money that is being spent to build these AI data centers across the world, but mostly China and the US right now, to a lesser degree Europe, and then other places, it's unbelievable.  I don't think we've ever seen anything like this.

04:25
The craziest is probably OpenAI: we're going to spend $500 billion on these things, a trillion. But Sam Altman makes up these huge numbers, so who knows? Microsoft, on a yearly basis: we're going to spend $50 billion building out AI data centers. It used to be that the biggest capex and R&D spenders in the world were probably Huawei and

04:49
a couple of others. They were at $20, $22 billion in a year. You know, TSMC's up there with their new fabs, well, $20 billion. Now we're talking $50, $75, $80 billion. It's crazy. Okay, so I sort of thought about, let's talk: yeah, is it a bubble? Oh, for sure. Nobody spends this much money rationally. You can either call this strategic positioning or you can just call it

05:15
the animal spirits of irrational behavior, because we've all got to participate because we're afraid of missing out. It's probably a bit of both. Okay. So, what are they doing? And I'll give you some examples because it's fun. It's a massive build-out of compute power for AI workloads as opposed to traditional computing workloads. It's a different system. We're looking at different types of

05:43
workloads: generating content is very different than pulling from a database. Okay, so how do you figure out how to build against this massive wave of non-traditional workloads coming down the pipeline? Well, I mean, it's playing out in a lot of places. A lot of companies are building their own sort of in-house, on-premises capabilities.

06:07
Larger companies are maybe building their own models in-house, but I think the easiest place to look, at least for today, is the big AI cloud companies and the big AI giants like OpenAI and xAI. They're building faster, so it's a good place to start, because they're kind of the ones pushing the frontier and everyone else is sort of following. Now, when we look at this, first of all, AI data center is the wrong term.

06:36
When you say data center, everyone knows what that means. Okay, we've got some CPUs, we've got a database tied in. We kind of know what that is. Data centers have been a lot more like storage, and then some compute on top of that. Okay, when we look at what xAI, Elon Musk, is building in Memphis, it's not a data center. It is one massive computer

07:03
that functions like one big machine. So, these things are more like supercomputers. And the fact that he called it Colossus, yeah, that kind of works. So, if we're viewing this as a supercomputer, or you could call it a factory or a data center, we're really talking about four components. You've got the compute. That's the key to everything. Everyone's talking about how many Nvidia

07:31
GPUs are in this one versus that one. Okay, the compute is obviously the biggest thing. It's the biggest engine. It's the biggest spend. But second to that, you've got the energy requirements. And these things soak up so much energy that, you know, the tech giants are becoming energy giants. They're building nuclear power plants, or at least they're buying them. You know, they're building massive gas turbines.

07:55
They are soaking up so much energy that, yeah, we can consider these the new energy companies. And then third, we've got the cooling, because, you know, the GPUs with all that electricity throw off a huge amount of heat. So, you've got to have tremendous cooling capability, which is pretty much water-based cooling. Air is not going to get it done. And then you've got the networking part of this, where, you know, there are semiconductors that are just doing the networking. There's unbelievable amounts of optical cable.

08:24
Because you've got to connect all these GPUs not only within a cluster, but then all the clusters, all the racks, have to connect with each other. And it can be in one location like Colossus. Or it can be between multiple sites, which is what OpenAI is building, where you're building four to five of these super centers. Well, they've all got to be connected, pretty much by Ethernet. So, the networking side is huge as well. So, you've got kind of four things to think about within all of this. And all of these have cost structures.
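Just to make those four buckets concrete, here's a trivial back-of-envelope sketch. The shares are made-up placeholders (only the four-way split comes from the discussion above), so treat it as a way to organize the cost structure, not as real figures:

```python
# Illustrative capex split for an AI data center build.
# Shares are hypothetical placeholders, not vendor or industry data.
components = {
    "compute (GPUs, servers)":            0.60,  # the biggest spend
    "energy (turbines, grid, batteries)": 0.15,
    "cooling (liquid cooling plant)":     0.10,
    "networking (optics, switches)":      0.15,
}

budget_billion_usd = 10.0  # a hypothetical $10B build
for name, share in components.items():
    print(f"{name}: ${share * budget_billion_usd:.1f}B")
```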

08:53
Now, if you build one of these in a medium-sized company and you don't just rely on cloud, you're probably going to have those same four components. Okay, so that's an easy way to think about it. Then the last bit, I suppose, is the ecosystem. If you're going to do stuff on AI cloud, you're going to contract with Azure or whatever. Okay.

09:18
For every dollar or two you spend there, you're going to spend two, three, four dollars on ecosystem partners to get this stuff to work in your company. So, these are the software vendors that help you deploy; they do your maintenance, they do your upkeep. So yeah, you've got the compute, and AI data center is a good way to think about it, but then around that you have the ecosystem. That to me is the world we're looking at, and there's a cost for all of that stuff.

09:44
Now let's talk about the craziness a little bit. If you're not following this in the news, you really should, because it's really fun. Look, everyone's building out these centers, mostly American companies, but obviously European too, though the major hyperscalers tend to be in the US. There's a lot of them, but there are three different models I keep in mind. You've got xAI, that's Elon Musk; that's powering Grok.

10:13
It's powering Tesla. Colossus 1 was the first one, built out in Memphis. This is the one that kind of got everyone's attention. This was launched in September of 2024. It had 230,000 NVIDIA GPUs. Within that, 150,000 H100s, 50,000 H200s, and then 30,000 others. Okay. That's kind of an interesting approach for a couple of reasons. Number one,

10:43
he's going with one core technology, which is Nvidia. The other players mostly aren't doing this. They're using lots of types of chips from lots of different companies. So, they're sort of heterogeneous in their computing architecture. Elon is going with Nvidia. It's a unified computing approach, which is going to make things a lot less complicated. It's going to make things a lot more efficient, but, you know, it's sort of a little bit like

11:11
when the iPhone was launched. Apple went with an integrated business model: did everything itself in-house, kept it simple, did end-to-end. Android: lots and lots of partners, things like that. This is kind of the same. Okay, so the compute power was crazy. No one had ever deployed 230,000 GPUs, let alone the highest-end GPUs from the best company on the planet, Nvidia. You'll see the Chinese players...

11:39
they don't have access to those chips in China, so they're doing a different approach with the core compute, which I'll talk about. Power-wise: 250 to 300 megawatts of power for Colossus 1. That's important because everyone now is talking about basically gigawatts. So that's interesting. But for the first one, they basically used a mix of the electrical grid,

12:07
the local substations. They built their own natural gas turbines. They bought a gas plant that was there. They flew in new turbines or something. And then they've also got Tesla Megapacks there as well, because the goal here is absolute stability and reliability. If you're spending all this money training a model and your power drops, that's a real problem. Also, the GPU workloads for AI

12:34
are much more volatile than we saw for CPU-based computing. They can spike dramatically. So, you've got to have really reliable energy. Reliability's key; the load jumps all over the place. And then you've got the connectivity piece. How are we going to network all this stuff together into one supercomputer? Well, he used Nvidia again, Spectrum-X Ethernet, for high-bandwidth networking. And that was kind of his thing. And then water

13:03
for cooling, which, I'm not going to talk about the water because, you know, it's a bit off topic. Okay, Colossus 2, which is being built right now, is supposed to launch in 2026. It's right down the road. I think there's a secondary site maybe, but mainly it's right down the road. 550,000, maybe up to a million, Nvidia GPUs. So, two to four times as many.
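A quick back-of-envelope on those power numbers; this is my own arithmetic from the figures mentioned, so treat it as a sanity check rather than a spec:

```python
# Colossus 1: ~230,000 GPUs drawing ~250-300 MW all-in.
gpus_c1 = 230_000
power_c1_mw = 275  # midpoint of the 250-300 MW range

# Implied all-in draw per GPU, including cooling, networking,
# and facility overhead: roughly 1.2 kW.
kw_per_gpu = power_c1_mw * 1_000 / gpus_c1

# Colossus 2: 550k up to 1M GPUs (and Blackwell draws more per chip,
# so this understates it). Even at the same ~1.2 kW all-in:
for gpus in (550_000, 1_000_000):
    print(f"{gpus:,} GPUs -> ~{gpus * kw_per_gpu / 1e6:.1f} GW")
# ~0.7 GW and ~1.2 GW, consistent with the ~1 gigawatt figure below.
```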

13:28
So that's going to get you more power, more compute. But at the same time, they're also upgrading the chips to Blackwell. So, the GB200s, GB300s. Okay, those things are dramatically more powerful than the Hoppers that were used in Colossus 1. So, one, he's adding more volume. Two, he's using the next-generation chips, which have a lot more power. So, it's a massive step up in terms of computing power. They're also upgrading Colossus 1 from Hopper to some Blackwell chips at the same time. Okay.

13:59
Colossus 2: 1 gigawatt of power, so four times as much. 30 natural gas turbines, 200-plus Tesla Megapacks for buffering, maybe an electricity grid tie-in. Diverse energy sources for redundancy and for handling the peaks, which is a big deal. Okay, that's really cool. What's interesting about that to me is this:

14:29
It's a very Elon Musk thing to do. It's a different approach than virtually everybody else. He is betting on one technology: we're going all in with Nvidia because those are the best. We're not doing lots of different computing platforms and chip types and all of that. No, we're going with one. We're buying the best of the best. There's risk in that. And also, we're not doing multiple sites. We're doing one site.

14:55
Well, that's risk. If the one site in Memphis goes down, everything goes down. There are no secondary sites in other states. You know, geography is a big deal. You're betting on one location. Okay, so he's focused on one site, not multiple sites, and he's doing one computing architecture, not multiple, which you'd have to stitch together. All that means more risk, but it also means he can move faster, which is very Elon Musk. He seems to be very tolerant of risk versus anybody else.

15:24
Interesting. He's basically building one tightly interconnected, unified computer. And because of that, he's moving at a speed that, really, I don't think anyone's matching. Okay. Another interesting counterexample: OpenAI. They're building Stargate. Now Stargate: four to five locations across the U.S., all within the U.S. Texas, Wisconsin. I don't think they've announced all of them.

15:55
I think there are a couple of sites they haven't quite said. They're mostly in the Midwest. Lots of partners: Oracle, SoftBank, which I think is the biggest owner. I think SoftBank's Masayoshi-san is technically chairman of this whole thing. Okay, Microsoft is obviously in there. So, they're doing multiple partners. Spending, in theory, more. This is where that $500 billion spend number over four years gets floated around, with

16:24
$100 billion already spent. I don't know. Don't hold me to those numbers. The argument is this is going to be 10 gigawatts, not the one gigawatt of Colossus 2, but 10 in total capacity. But this is over several years. Who knows? Heterogeneous computing. So, it's technology agnostic. There's Nvidia; they're in there for sure. There's AMD. There's Intel.

16:52
Lots of specialized chips, custom Oracle, custom Microsoft. So again, more complicated, harder to move fast. It's more difficult in terms of operational complexity, less efficient, but it's going to give you more options. You can develop in different directions. You're not betting on one tech. So anyways: four to five sites, heterogeneous computing, bigger numbers, bigger dollars, bigger energy, all of it. The third model to think about,

17:21
for the US, and there are others, is Microsoft. Because Microsoft, okay, they're using their AI data centers for Azure, obviously, and Copilot. They have a partnership with OpenAI. But they're not building something new from scratch just for their AI products. No, they're transitioning the 400-plus data centers that they've already built across the world

17:50
to handle AI workloads. Now, they already have data centers everywhere in the world, so they're kind of upgrading and transitioning those to this additional service. They cover 70 regions globally. That makes them different from the other two, who are, you know, doing services everywhere but building their data centers in the US. I think that's for licensing reasons; I'm not totally sure why they're doing that. So, I think about them because they're a global push from day one.

18:20
And they're transitioning existing structures, not building de novo from scratch like Elon and Sam Altman. So, big focus on Europe, Asia, and they're building an international, global AI network. In theory, this thing is going to operate as one network that crisscrosses the planet. And they're doing heterogeneous computing as well. They're tech agnostic to a large degree. Anyway, those are the three models I keep in mind.

18:48
You can also look at Google. Obviously they're huge. Amazon's doing a ton. Meta is obviously doing a ton. So is Oracle. I mean, there are seven to ten major players building out like crazy, and then there are all the medium players. So that's kind of how I think about the AI data center frenzy coming out of the US. Now, in contrast, we have similar behavior coming out of China. Also going international

19:16
in some cases, with others staying purely domestic. Now for this, there are also three companies. I follow all of them, but I think there are three to keep in mind. First of all, that's basically Alibaba Cloud, Huawei Cloud, and then you could say Tencent Cloud or Baidu Cloud. This is where we start to think: this is going to be a different playbook, because for one,

19:43
They can't get Nvidia chips in China anymore. The market share in China for Nvidia is now officially zero, as far as I can tell. According to the CEO, it's zero. So, everyone there is building in a way that they are basically self-reliant. They don't want their technology supply chain dependent on the US, because it got weaponized against them, starting with the semiconductors.

20:10
Now we're seeing the flip side to that, where China is doing the same thing with rare earth elements back the other direction. You know, people are like, you can't do that. You can't cut off European carmakers from all the rare earth elements, without which they can't really build EVs. And it's kind of like, well, you know who taught them that? They didn't think that up on their own. The US kind of did it to them first. So, it's a free-for-all as far as I can tell. Anyways.

20:40
But how to think about it? They're definitely doing lower spending. The US approach is: let's spend tremendous amounts of money and go for scale. How do we get more computing power? We're going to use more chips. We're going to use the best chips, and we're going to deploy more of them. We're going to go big, and we're going to use deep pockets to do it.

21:07
This is not that; they're doing lower spending overall. And they're not trying to just brute-force it in terms of let's buy all the chips and stitch them together. They're having to be a bit cleverer, looking at how the chips connect to each other. And they're clustering. They're coming up with flexible, peer-to-peer clustering approaches where it's more dynamic.

21:35
And there are a couple of reasons for that. One, it's smart. And you can already see this in something like DeepSeek. You know, everyone was trying to create better LLMs by throwing more compute at them and spending more money on training. And then DeepSeek comes out of left field and kind of says: we're just going to be smart about this. We're going to spend less money, but we're going to be really smart about how we do the math. And we're going to get similar, or close to similar, performance at a fraction of the price. You know, cost innovation.

22:05
It's kind of the same thing. They're not competing on the pure scale of powerful chips deployed. Now, that also tends to work out because they can't count on getting the most powerful chips, which are coming out of the US. So, you kind of have to do that anyways. The other thing they're doing is putting multiple lower-powered, less powerful chips together and getting similar performance, or hopefully similar performance,

22:35
to the higher-end chips. The CloudMatrix 384 out of Huawei is kind of the leader in this: we're going to stitch together lots of lower-power chips and really work the networking and clustering and get similar performance. And it looks like it's working. I'll talk a lot about the CloudMatrix 384. That's actually really important. Now, the downside to that scenario is you're using a lot more chips, and each chip has a certain power

23:05
requirement, which means they're going to use a lot more electricity. Okay, we've got three to four times as many chips to get the same performance; the electricity bill just went up. Now, fortunately, China is very, very good at deploying electricity. They can build power generation. It also means you have higher water requirements for the cooling. They're doing that as well. So, you're going to see that.
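Rough arithmetic on that trade-off; the wattages here are made-up placeholders, just to show the shape of it:

```python
# Hypothetical comparison: one top-end chip vs. a cluster of weaker ones
# doing the same work. Wattages are illustrative, not real chip specs.
top_end   = {"chips": 1, "watts_each": 1000}  # one high-end GPU
clustered = {"chips": 4, "watts_each": 450}   # four weaker chips, same job

power_top     = top_end["chips"] * top_end["watts_each"]      # 1,000 W
power_cluster = clustered["chips"] * clustered["watts_each"]  # 1,800 W

print(f"same job, {power_cluster / power_top:.1f}x the electricity")
```

And then the last thing that makes them different is that some of them are staying in China,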

23:35
like Baidu. Others are going international very aggressively. Alibaba Cloud is probably the leader, and they are focused on the Global South. So, they're going into Malaysia, they're going into Thailand, they're going into Latin America. They're also trying to get into Europe, for sure. But you're definitely seeing this split: when you look at where the Western giants are going international, it kind of looks like the G7.

24:02
And when you look at the Chinese players, it looks a lot like the Global South. Not 100%, but it does kind of line up that way. So those are kind of the differences. Now, here are a couple of companies, and I'm going to talk more about the Chinese companies than the US ones in the next parts of this, because I know these companies better. So, Alibaba Cloud: almost for sure the most aggressive in terms of international expansion. They want to build a global network.

24:32
And obviously they're going to focus on China, because that's the home base. That's the powerhouse. If you're big in China in this, you're big by definition. But they're also going international. They're opening AI data centers in Brazil, France, the Netherlands, Mexico. They are moving really pretty fast. Now, internationally, their data centers, as far as I can tell, are using Nvidia chips. So internationally, their computing

25:02
platform looks similar to the West. Domestically, obviously, they're not doing that. They're building using their own chips in general, and they have SMIC for fabrication. So that's an interesting distinction. Compare that to Huawei Cloud. Okay, they're going international as well, especially in a couple of locations. Southeast Asia, I would put number one on that list. Definitely Latin America.

25:32
We'll see how far they get in Europe. Now, they can't use Nvidia chips anywhere, as far as I can tell. So, they have their own Ascend chips and their own architecture, because obviously they were added to the entity list first. So, they're using that everywhere. So, we're seeing China's tech stack going international as well. And what makes Huawei interesting is that, unlike these other companies, which are data center companies,

26:01
Huawei can build the entire digital infrastructure of telecommunications from end to end. They can put devices in your hand: cell phones, smart cars, all of it. They can do the networking, the 5G, all of it. They can do the microwave transmitters, and they can do the data centers, and they can obviously do the software, although they tend to be a platform that others build on.

26:29
So, in their data centers, you're going to see a lot more openness to other types of software running on them. That's a real difference from, say, some of the Western ones. So: full end-to-end digital infrastructure going international, and they cannot use the US products. That's really interesting. The last one you can think about is maybe Baidu or Tencent Cloud. I'm not going to go into those. Neither of them is using NVIDIA domestically, obviously, because those chips are kind of gone now.

26:59
They've got their own chips. They're very good at this stuff. Baidu is staying domestic. Tencent is going international, but not as aggressively as Alibaba. So, there are three Western versions of this AI data center frenzy story, and then there are three China versions you can think about. I think it's super interesting, but it changes every couple of months. Okay, that's kind of point one I wanted to talk about. Let's get back to the business side of this. So, if we look at this from the business perspective,

27:28
I wrote sort of an AI playbook about a year ago, just starting out. When I say AI, I'm talking about generative AI. Predictive AI, historical AI, has technically been around forever; I put that into sort of traditional compute, and then generative AI into AI compute. So, traditional workloads versus AI workloads. I'm talking about generative AI, I'm talking about AI agents, not basic predictive modeling. Now,

27:55
where do we start with all this? In that playbook, and I'll put the slide in the notes and I'll put the link in, the argument is: you're spending a lot of money, time, effort, resources, training, all of this. You want to stay as close to the customer as possible, because that's where this all translates into value. It makes the customer experience better. It improves your product. And I always try to stay as close to that as possible. Now, keep in mind,

28:24
when new technologies come along, that's usually what happens. Most of the technology's value goes directly into customer value, the experience, the product or service, or it makes things cheaper on the operational side, and that savings is then passed on to the user. Most of the value goes there. It's actually pretty rare for that value to translate into more profits. Warren Buffett famously talked about this for the last 40 years.

28:51
You know, technology is awesome, but keep in mind, most of that value ends up going to the customer, not to the company and not to the shareholders. It's just the new cost of doing business to offer this stuff. Okay, so when I think about the generative AI playbook I wrote, and I'll put the slides in there, look, I argued step one was: you basically put this into the products and services. That means putting it into your existing products and services

29:20
and trying to launch new products, ideally ones that are going to get you some 10x result that everyone loves, if you can. That's where it goes. The other part of that is: let's put this into improving our operations. Because it turns out generative AI is really, really good at making people more productive, especially knowledge workers, white-collar workers. It doesn't do that much for people working on farms, but if you're sitting in an office typing on a computer, yeah, this is going to make you more productive.

29:50
So, you know, new staffing, new skills, new tools you can use, things like that. So, put it into your products, put it into your ops, and go for a productivity gain. That's kind of step one of the generative AI playbook. And there's another level to that, which I wrote about last week: you know, we have the digital operating basics; well, there's also sort of a generative AI, agentic operating basics.

30:20
There's a bigger playbook than just deploying tools and getting more productive. We need to start to redesign workflows. And I wrote that up, yeah, I wrote it up. I'll put a link to that article. I thought that was mostly McKinsey stuff. So, I thought that was a very good way to think about, you know, sort of the AI operating basics as opposed to the digital operating basics. And they're obviously overlapping like 70%. The other thing to think there is, you know, there's sort of four levels to that operating.

30:50
There's traditional rules-based software, deterministic processing. There is predictive AI, historical AI, analytical AI. Then you move from that to generative AI, and then you move to agentic AI, which is different. And it basically requires a different tech stack. And I'll get to my "so what" shortly. Anyways, I was at the Tencent Cloud meeting in Shenzhen a couple of weeks ago,

31:16
you know, walking around looking, because Tencent has products and services everywhere. There are so many, it's hard to keep them straight. Everywhere I was walking, it was like everything had the words "plus AI" tacked onto it. So, it was apps plus AI, SaaS plus AI, you know, documents, Tencent Documents plus AI, gaming plus AI, video

31:42
plus AI. I mean, you can see them literally putting this into every one of their products. Not all of them, but it seemed like all of them. So that's kind of, you know, the point of this podcast: look, it starts with apps plus AI and SaaS plus AI. That's the beginning of the business question here, and then you go into the operating side. But you can see this everywhere. Like, I get updates every day about new announcements in this space, and every day it's like,

32:10
okay, Microsoft Edge is now incorporating AI into the browser; it has a Copilot mode announced this week. Chrome, Google's Chrome, the browser, now has Gemini incorporated into it. OpenAI has their new browser they just launched this week, called Atlas. Instagram has AI, Instagram AI. Google Earth now has AI. So, you can see, they're plugging it into absolutely everything. So that's all sort of step one.

32:39
Focus on building products, incorporating it. Then you can think about operating performance. And if you really want to move quickly, then you think about an operating-basics approach, AI operating basics, and you're plugging it into everything. So, new products, new workflows, new tech tools, things like that. Okay, last point for today, which is, I guess, kind of the big one. At the core of all this is something we don't, or at least I don't, talk about very much, which is:

33:09
all of this requires a fundamentally different approach to compute. When we talk about software, we always talk about apps and we talk about databases, but there's actually a third pillar, which is compute. That's how you build an app. You have the app, you have the database it draws on, and then underneath it you have the compute infrastructure, which is a CPU, actually several, maybe a GPU in there as well.

33:35
You've got the storage, you've got the networking on the chip, you've got networking around the chip. We don't really talk about that third pillar very much, because it doesn't change that much. In fact, the whole system is a little bit static. We might upgrade the app every week, every month, but we don't really upgrade the compute underneath. For now, the whole system's sort of static.

34:02
And we mostly talk about the apps and the database; we don't really talk about the compute they sit on very often. Well, when you move from apps to apps plus AI, you get a different architecture. And in particular, you get a different type of compute. And it turns out it changes all the time; the compute architecture is always changing. You basically go from three pillars to four. You have the apps, like always.

34:31
You have the database, although the databases are different. Then you have the model, the foundation model. And then you have the compute. So, we're kind of going from three pillars to four pillars. And I'll put some slides from Huawei in the notes, basically saying what I just said: we've got a different compute architecture, and those are the four pillars. Now, there was a great publication by Huawei on AI data centers, and I'll

35:00
give you the link to it. It is worth reading two or three times, because it lays out the tech beneath these AI data centers, and it's useful from a business perspective; it's not just getting into the tech for its own sake. I read this thing three times. I'm going to read some of it. I'll put some slides from it in the show notes, but I'll read you a couple. They basically argue there were four computing eras. There was 1940 to 1990, which was the computer era.

35:30
And the "data center" for that was a big equipment room, a bunch of servers sitting in a room. 1990 to 2010 they call the internet era. That's when we start getting full-fledged data centers. Everything's running up there: there's Facebook, there's other stuff. 2010 to 2020, we get the big data era, where suddenly there's dramatically more data, and we start to get cloud data centers. Lots of data moving back and forth.

35:59
Now, basically, we're starting the intelligent era, where we're going to build on AI data centers, which are different things. The traditional data center was basically CPU-centric. Unless you were doing gaming or video generation, it was pretty much CPU-centric. The CPU sits at the center, you've got some DRAM associated, you've got some storage attached,

36:29
and it basically supports the application and the databases. And they use what they call the von Neumann architecture, which is: you build these things bigger, you sort of allocate space on these machines for different usage, but it's kind of general-purpose computing. The computing density was relatively low. The cooling and energy requirements were much less. So, it was mainly air cooling, not water-based cooling.

36:58
Okay, once you move to intelligent computing, you sort of get the XPU-centric data center. XPU: the X is a placeholder. So, GPU, NPU, TPU, CPU as well. So, we get the XPU-centric system, with lots of AI training and AI inference being supported. The architecture is not von Neumann, sort of fixed, where you allocate: you can have this VM, you can have this container.

37:28
Instead, it's a peer-to-peer architecture where everything is talking to everything else and the allocation is dynamic. And instead of getting one type of computing power, you get sort of diversified computing power. There's high computing density. You're putting in these racks full of CPUs and GPUs. The energy being thrown off is crazy.

37:53
Liquid cooling is an absolute must. Air can't get it done. So, you have this different approach. I'll read you some of the other differences. This is directly from that paper. I found this really helpful. So, what's the difference between traditional and AI data centers? Here's the slide from them. The traditional data center primarily supports enterprise applications and data storage. It has

38:20
mostly routine information processing tasks, like web services, database management, file storage. When you switch to an AI data center, you're mostly supporting AI model training and inference. You're basically doing as much as you can to efficiently provide dramatically increased computing resources for processing very large data sets.

38:50
How to provide those resources efficiently is a big question. Computing power, as mentioned: traditional data centers are CPU-centric, doing general-purpose computing; AIDCs are XPU-centric. You're doing lots of matrix operations, lots of AI model training. If you like the math side of this, you know, CPU-based computing is,

39:17
you know, it's a lot of addition and subtraction and multiplication. When they do the flops, you do one mathematical equation after the next; it's mostly that kind of math. Once you go to the AI side of things, it's all matrix multiplication. We take one matrix, which is a grid of numbers, maybe a three-by-four matrix: three rows, four numbers in each row. You take that matrix of numbers and you multiply it by another matrix of numbers.
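To make that concrete, here's a minimal sketch; this is my own illustration of the operation, not something from the Huawei paper:

```python
import numpy as np

# A 3x4 matrix: three rows, four numbers in each row.
A = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0],
              [9.0, 10.0, 11.0, 12.0]])

# A 4x2 matrix to multiply it by (inner dimensions must match: 4 and 4).
B = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.0, 2.0],
              [1.5, 0.1]])

# Each cell of the result is a row-times-column multiply-accumulate.
# AI workloads are essentially stacks of these, at sizes in the
# thousands rather than 3x4, which is what GPUs parallelize.
C = A @ B
print(C)  # the 3x2 result
```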

39:45
That's how you get into vector databases and stuff like that. So, the math is different, which is interesting. The analogy I've been using, which is not awesome, and maybe someone has a better one; I obviously do the business side. If you're trained in this, you're probably grinding your teeth because I'm not getting it exactly right. I think I've got it directionally correct, but I'm probably making some mistakes here and there. The way I think about it is: traditional CPU processing is...

40:13
It's sort of very deterministic. It's serial-based. You do one long, usually more complicated set of equations with a start and an end. So, I always think about it like a librarian in a massive library. We're walking up to the librarian and we give her five tasks: I want you to go find these five books and bring them back to me.

40:39
And the librarian goes off, and it's a very specific linear process: goes out, finds book one, book two, book three, book four, because they actually do kind of a lot of activities, and then they bring them back to you, and that's the end. That's like updating a database or updating an app, something like that. Linear, sequential, deterministic, with a right or wrong answer, very rules-based. When you do generative AI,

41:07
it's not like that; it's more probabilistic. We're creating something that maybe doesn't exist yet. So in that case, it's like taking 20 editors and telling them: I want you all to run out into the library, get a little passage, and draw on what you know. Then I want you all to sit down at the table and, based on what you all know, write me a five-page summary of something. Now,

41:35
you know, these editors would be sort of experts. Well, that's your foundation model. It's like an expert. And that foundation model, this expert, has been trained on a library of books that they already kind of know. So, we've trained these experts on books in the library. Now, they may have to go and search for a little extra information, but they have a lot of it in their pre-training. They all sit down at a table and they write something that's not been written before. It's new.

42:03
So instead of a sequential, linear process with a sort of deterministic outcome, it's parallel processing. They all work together at the same time, and they will all take a stab at writing something, and then they'll rewrite it and rewrite it. So, it's kind of recursive, it's parallel processing. The tasks they're doing are actually much simpler than what the librarian did, but there are dramatically more of them happening.
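Here's a toy sketch of that librarian-versus-editors contrast; it has nothing to do with real GPU scheduling, it just shows serial versus parallel execution of tasks:

```python
import concurrent.futures
import time

def fetch_book(title):
    # Stand-in for one long, deterministic lookup (the librarian).
    time.sleep(0.1)
    return f"contents of {title}"

def editor_task(editor_id):
    # Stand-in for one small, simple step (one of the 20 editors).
    time.sleep(0.1)
    return f"editor {editor_id}'s passage"

# Librarian: five tasks, one after another (serial).
start = time.time()
books = [fetch_book(f"book {i}") for i in range(5)]
print(f"serial:   {time.time() - start:.2f}s")  # ~0.5s

# Editors: twenty tasks, all at once (parallel).
start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    passages = list(pool.map(editor_task, range(20)))
print(f"parallel: {time.time() - start:.2f}s")  # ~0.1s
```

Twenty simpler tasks finish faster than five sequential ones because they all run at the same time; that's the GPU idea in miniature.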

42:30
That's kind of how I think about it. That may not be 100% right, but I think it captures a decent amount of how it works. Okay, other things to mention, last bit here. Technical architecture: the traditional data center, the von Neumann architecture, is basically the CPU acting like the commander, assigning tasks to other components. I mean, it can do parallel processing, obviously, but it's mostly sequential.

42:59
The AIDC is a fully interconnected, peer-to-peer architecture. Everything's talking to everything else. So, you get direct communication between processors, with memory, with the network adapters. The latency is much less, because you're not dependent on the CPU, the librarian, to tell everyone to do everything. So, it's sort of more distributed, parallel computing. Cooling. Cooling is actually kind of fun. I've looked at some of these.

43:28
I was hunting around the data centers at Huawei in Shanghai a couple of weeks ago, and they literally have examples of the big racks set up and how they lay the cooling out within the racks. It's actually really interesting. You know, in a traditional data center, the power density of a rack is three to eight kilowatts per cabinet, something like that, and then it starts to go up to 10 to 15. Okay, when you get to an AI data center, your rack

43:58
can have 50 kilowatts per cabinet, 75, I think it goes up to 100, even 200. I mean, the heat coming off these things is dramatically more. So, you've got to start to use liquid cooling, which is interesting. They run these liquid tubes and lay them directly on the circuit boards, often to a cooling plate, with the chips sitting on the cooling plate.

44:23
But the later versions I think they've been playing with are where you basically put the entire rack in liquid: forget the tubes, we're just going to design chips that sit in the fluid. That's how we'll do it. I've seen some of those, which are kind of interesting. I don't think they're really deployed too much. Anyways, that stuff's all super fun. And that's most of what I wanted to cover for today. Just sort of three ideas, which is, you know:

44:49
these data centers are huge. It's kind of the foundational thing. It's good to understand what's happening. Put on your business hat and think: okay, how does this matter for a company? Okay: products, services, workflows, productivity, operations. That's kind of step one of the generative AI playbook. But then really the core of all of this is that we're talking about a different type of compute, and therefore a different technical architecture being built. And you've kind of got to understand how that works,

45:17
because that plays out in the cost structure of everything. The compute is going to be a major part of the cost of doing this in any company. So, you kind of want to understand how all that stuff works. I'll give you a couple of last numbers, then I'll finish. If you look at traditional CPU general-purpose computing, a standard server, mostly sequential processing, pretty low parallelization: your single-core CPU is

45:46
going to do, let's say, three gigahertz. You're going to do a task in a couple of seconds, maybe up to a minute, depending on how fast you can get the data. Okay, that's, let's say, 0.1 teraflops. That's your basic processing speed for something like that. When you move up to AI inference, you're talking two, three, maybe up to 10 teraflops. So, it's an order of magnitude higher in terms of just computing in general.

46:16
The memory's got to be bigger. The specialized GPUs are mostly operating in parallel.
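The gap, using those same rough numbers from above (order-of-magnitude figures I'm using for illustration, not chip specs):

```python
cpu_tflops = 0.1            # rough figure for general-purpose CPU work
inference_tflops = (2, 10)  # the range mentioned for AI inference

for t in inference_tflops:
    print(f"{t / cpu_tflops:.0f}x the CPU baseline")
# 20x to 100x. And modern data-center GPUs advertise hundreds of
# teraflops at low precision, so the real hardware gap is larger still.
```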

46:25
So anyways, it's a different model. That's kind of where I'm going to finish for today. When you move to the next step, which will be part two, you start talking about the peer-to-peer architecture and how you're going to cluster. To do this specific generative AI task, we're going to assign these 10 chips to work together on this job. So, you start to allocate resources dynamically, because everything connects with everything else.
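A toy sketch of that dynamic allocation idea; this is just the concept, not any vendor's actual scheduler:

```python
# Peer-to-peer pool: any free chip can be grouped with any other
# for a job, then released back into the shared pool.
free_chips = {f"chip-{i}" for i in range(32)}

def allocate(job, n):
    # Grab any n free chips and group them for this job.
    group = {free_chips.pop() for _ in range(n)}
    print(f"{job}: assigned {len(group)} chips, {len(free_chips)} left free")
    return group

def release(group):
    # Job done: chips go straight back into the pool.
    free_chips.update(group)

training = allocate("training run", 10)
serving = allocate("inference service", 4)
release(training)  # no fixed VM or container boundaries to tear down
```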

46:53
And that's different than traditional computing, where you just set up a VM or a container and a lot of it may sit dormant, stuff like that. Anyways, we're going to talk a lot about clustering. Okay, that is it for today. I hope that's helpful. I think this is all super interesting. If you want to read more about this, I'll send you what I've been reading, which is mostly about the structure of data centers. Grok, xAI, is doing this stuff too. So is OpenAI, so is Meta,

47:22
but they're not really selling it as a service, so they're not publishing what you can buy from them. But when you look at the cloud companies, whether it's Azure or whatever, they're actually telling you what services you can buy. So, they give you the architecture. It's really helpful. I don't really know what xAI is doing behind the scenes other than what's reported, but I can see the offerings from these other companies. Anyways, that is it for me for today. I hope that is helpful. This is part one of what I'm thinking is going to be six.

47:52
As for me, I'm having a great couple of weeks. This has really been spectacular, actually. I'm heading out to China in a couple of days, going around Beijing, Shanghai, Hangzhou. It's going to be a lot of fun. I'm really looking forward to it. I also went camping in Thailand, which I hadn't done before. I wasn't really sure if I was going to like it, because it can get pretty hot during the daytime, and at night, you know,

48:22
mosquitoes are not that big a deal, but snakes concern me. So, I sealed up that tent; I checked it so many times. So, I didn't know if I was going to like it. I really enjoyed it. I had such a good time. And I was up near Khao Yai National Park, near Pak Chong, that area. Wonderful area, just north of Bangkok. Absolutely fantastic. And the part I really enjoyed the most was,

48:49
once the sun goes down, it cools down, so you can sit out at the campsite. Everybody was out at the campsite, barbecuing at tables. Everyone was just having a good old time. I've seen Thai people do camping; I'd kind of known that, I'd seen that before, I'd just never really partaken. Man, was that fun. I'm going to do it again, like, a lot. So, it turns out I really enjoy camping in Southeast Asia, minus the worries about snakes.

49:19
And I don't think I'm being crazy on that. I think that's actually a real thing to be concerned about. Yeah, but if I'm wrong about that, people from Thailand, let me know. I think I'm pretty on target with that worry. I don't think it's irrational. Anyways, that's it for me. It's been great. But yeah, I'm packing up and I'm going to be on the road. Most of November I'll be on the road. And I really do enjoy that. So anyways, that is it for me. I hope everyone is doing well, and I'll talk to you next week.

49:48
Bye bye.