The Macro AI Podcast
Welcome to "The Macro AI Podcast" - we are your guides through the transformative world of artificial intelligence.
In each episode - we'll explore how AI is reshaping the business landscape, from startups to Fortune 500 companies. Whether you're a seasoned executive, an entrepreneur, or just curious about how AI can supercharge your business, you'll discover actionable insights, hear from industry pioneers, service providers, and learn practical strategies to stay ahead of the curve.
What Are AI PCs?
Are AI PCs just another hardware refresh cycle — or are they the next major shift in enterprise AI architecture?
In this episode of the Macro AI Podcast, Gary and Scott take a deep executive-level dive into AI PCs and what they really mean for CIOs, CTOs, and business leaders.
They break down:
• What an AI PC actually is (CPU, GPU, and NPU explained)
• What models truly run on AI PCs — including small, optimized LLMs like Llama, Phi, Mistral, and Gemma
• Why most enterprise AI tasks do not require frontier-scale models like ChatGPT or Claude
• The difference between frontier reasoning models and edge inference models
• How hybrid AI architecture balances cloud and endpoint intelligence
• Why token cost is now a critical part of AI ROI analysis
• How to model AI token OpEx vs AI PC CapEx over a 3–4 year lifecycle
• Security and governance implications of distributed AI
• How much IT talent is actually required to deploy and manage AI PCs
• Whether AI PCs are foundational — or just hype
A key insight from this discussion:
AI token economics are becoming part of endpoint strategy.
As AI usage scales across enterprises, token consumption can compound quickly. AI PCs introduce a new lever in AI cost governance by shifting routine inference to the edge — reducing cloud dependency while maintaining access to frontier models for complex reasoning.
This episode reframes AI PCs not as a device trend, but as a strategic architecture decision.
If you are designing AI infrastructure, evaluating AI spend, or planning your next endpoint refresh cycle, this is a must-listen conversation.
Send a Text to the AI Guides on the show!
About your AI Guides
Gary Sloper
https://www.linkedin.com/in/gsloper/
Scott Bryan
https://www.linkedin.com/in/scottjbryan/
Macro AI Website:
https://www.macroaipodcast.com/
Macro AI LinkedIn Page:
https://www.linkedin.com/company/macro-ai-podcast/
Gary's Free AI Readiness Assessment:
https://macronetservices.com/events/the-comprehensive-guide-to-ai-readiness
Scott's Content & Blog:
https://www.macronomics.ai/blog
I'm Gary Sloper here with Scott Bryan as always. Today we're tackling a topic that's starting to show up in the CIO strategy sessions we're a part of, and that's hardware refresh discussions around what the industry is calling AI PCs. You've seen the announcements from companies like Lenovo, HP, and Dell.
01:27
new chips with NPUs, built-in AI acceleration, and claims that this is the next evolution of the enterprise endpoint. But here's the real question. Is this just marketing layered on top of a normal device refresh cycle, or is it a structural shift in enterprise AI architecture? We've received a few questions on this topic from listeners wanting to know where this is going. Yeah, exactly, Gary. And AI PCs are not...
01:57
They're not replacing the cloud, where everybody knows AI sits today, but they are changing the architecture of where intelligence lives in some cases. So for the first time in a long time, we're seeing compute move in a different direction: not just centralized in hyperscale data centers, but distributed back out to the endpoint. And when intelligence moves, economics, governance, and strategy have to move with it.
02:26
And that's why this topic is important. So let's start with a clean definition. An AI PC is a traditional laptop or desktop, but it includes a CPU, a GPU, and now something new: a dedicated NPU, or neural processing unit. So there is a new component in this hardware refresh cycle that some people might opt into.
02:56
The NPU, the neural processing unit, is optimized for AI inference: running models efficiently at the endpoint with low power consumption. Right, that's a good point. And this does not mean training GPT-class models locally. It means executing certain AI tasks directly on the device, so instead of sending them to the cloud, the work is being done locally on your laptop.
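For the technically curious, here is a minimal sketch of what "targeting the NPU" can look like from software, assuming ONNX Runtime plus a vendor package that exposes an NPU execution provider. The provider names and the summarizer.onnx file are illustrative placeholders, not vendor-specific guidance.

import onnxruntime as ort

# List the execution providers installed on this machine.
available = ort.get_available_providers()
print("Execution providers on this machine:", available)

# Prefer an NPU-backed provider when one is installed, otherwise fall back to CPU.
preferred = [p for p in ("QNNExecutionProvider", "DmlExecutionProvider") if p in available]
providers = preferred + ["CPUExecutionProvider"]

# "summarizer.onnx" is a stand-in for whatever exported model you actually deploy.
session = ort.InferenceSession("summarizer.onnx", providers=providers)

The point of the sketch is simply that the application asks the runtime for whatever accelerator the endpoint exposes and degrades gracefully to CPU if none is present.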
03:26
For years, enterprise AI meant sending prompts to hyperscale data centers and receiving a response back. AI PCs rebalance that model as some of the inference moves locally. So instead of everything routing out to the cloud, some of it now sits right in front of you.
03:51
Some of that traffic continues to be sent to the cloud, but this really becomes a hybrid AI architecture at your fingertips. Yeah. So instead of everything being cloud-dependent, intelligence becomes distributed, and that distribution is what changes the whole conversation. Right. And I know what everybody's asking: this is really cool, but what models actually run on an AI PC? So let's address the question everybody's probably going to send us anyway.
04:21
When people hear AI PC, they think ChatGPT or Claude, and they wonder, are those models running on the laptop? Maybe, Scott, you can go a little further. We talked about this before the show, but we should probably debunk the myth and also explain both the complexity and the simplicity of this.
04:46
Yeah, exactly. So just to answer that question, is ChatGPT or something like it running on my PC? The answer is no, not the full frontier versions. Those models obviously require massive GPU clusters, enormous memory footprints, and power. What runs locally on AI PCs are smaller, highly optimized models. For example, I think we've talked about a few of these in past episodes, like
05:14
Llama 3 8B from Meta. That one can be quantized and run very efficiently on modern NPUs. Then there's the Phi family from Microsoft, including Phi-3 Mini and Small. Those models were designed specifically for strong reasoning on highly constrained edge hardware. And there are other models out there, like the 7-billion-parameter class from Mistral, that are widely used for local inference.
05:44
And then there's a whole set of small open models, like some from the Gemma family from Google, that are optimized for super-efficient edge deployment. Yeah, I can just imagine: if everything ran from your laptop, it would melt in your lap with the amount of power and heat. We'll run another episode on small models. We probably should talk about that at some point, Scott, so let's write that down.
06:12
Because these models typically range from three to eight billion parameters. They're quantized for efficiency, so they are tuned for fast inference, not massive multi-domain reasoning. The key insight here is that most enterprise AI does not require frontier-level reasoning. And that's really important for a lot of organizations that,
06:41
quite honestly, haven't even built out an AI strategy. So don't think you're going to be building a frontier-level reasoning platform on your laptop. Now, if we think about model size versus capability, maybe we talk a little bit about this as well, because it's important. The reason is that when executives hear smaller model, they assume less capable, right, Scott? Right, yeah, definitely.
07:11
And it depends on the task. Model size correlates with breadth and reasoning depth, so larger models handle highly complex, multi-step, cross-domain logic better than a small model. But capability is really task-dependent. If you're summarizing a 15-page contract, extracting invoice data,
07:39
rewriting internal emails, or categorizing CRM notes, you don't need frontier-level reasoning. What you really need is efficient linguistic processing, and smaller models do that extremely well, especially when they're paired with retrieval systems that feed them relevant context, like retrieval-augmented generation, or RAG, over your own internal business data. Yeah, I was just thinking RAG as you were saying that.
08:09
Think of it this way: if you're an executive, a frontier model is like a global consulting firm, broad and powerful. A small edge model is like a specialist, a subject matter expert: focused, fast, and efficient at what it does. Most enterprise AI work fits that specialist, focused format, and that's why AI PCs make sense for most of these organizations today. Yeah, perfect.
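As a concrete illustration of that specialist pattern, here is a minimal sketch of a routine task, contract summarization, running entirely on the endpoint. It assumes the llama-cpp-python package and a locally downloaded quantized GGUF model; the model and document file names are hypothetical.

from llama_cpp import Llama

# Load a small quantized model; nothing in this flow leaves the device.
llm = Llama(model_path="./phi-3-mini-4k-instruct-q4.gguf", n_ctx=4096, verbose=False)

# A local document that never needs to travel to a cloud API.
contract_text = open("vendor_agreement.txt").read()

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Summarize contracts for a business audience."},
        {"role": "user", "content": f"Summarize the key terms:\n\n{contract_text}"},
    ],
    max_tokens=300,
)
print(response["choices"][0]["message"]["content"])

The same pattern covers the other routine tasks mentioned here, such as drafting emails or categorizing CRM notes, by changing only the prompt.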
08:39
So I think the right question isn't, can my laptop run GPT-4? It's, how much of my organization's AI usage really needs GPT-4-level reasoning? And the answer is usually a lot less than people think, especially across a large enterprise. Well, yeah. And to that point, I think this is a good segue to talk about cloud versus edge economics.
09:05
And there's the token ROI conversation that's hitting a lot of conference rooms right now. This is where executives really should lean forward. They ask, is this about performance, innovation, or cost? And cost is becoming a major driver, specifically token cost. Most organizations using cloud-based, hosted large language models are billed per token.
09:35
And I think a lot of companies, and a lot of users, don't even understand that at the base level: tokens are the input and output within the LLM. At a small scale it feels insignificant, I'm just asking the environment a few things, but at an enterprise scale, if you've got hundreds or thousands of employees, that has a compounding effect on your costs and on what you're using the environment for. Yeah.
10:03
Yeah, definitely. Token costs certainly compound, like you said, and we can walk through an example. Imagine you have, say, 2,500 employees, but only a thousand of them use AI daily. They're using it for drafting emails, summarizing documents, analyzing spreadsheets. Every one of those interactions consumes tokens going out to the large language model: multiple prompts per day, long documents uploaded,
10:33
meeting transcripts, and then your RAG systems might actually be expanding those context windows. Suddenly you're talking about millions or billions of tokens per month, and that becomes real OpEx. And here's the insight: a large portion of that token usage is really just routine cognitive work, summarizing, extracting, drafting, categorizing. Those are tasks that a smaller local model
11:03
can handle without burning those tokens. A really good point. Given that description, which was much more eloquent than mine about how this can impact your token cost, AI PCs become part of the ROI equation. If, say, 50 to 60% of routine inference moves locally, now you don't have API calls, you don't have
11:30
per-token billing, and you don't have incremental cloud charges. Now you compare ongoing token OpEx versus incremental CapEx for AI-enabled endpoints, and you model it over a three-to-four-year device lifecycle. That should tell you where you should make your investment. Yep, exactly. So if you were to work on that ROI model for AI PCs, the first step is
11:58
to quantify the AI usage patterns across your organization, which you can now do. The second step would be to estimate annual token spend growth based on the trajectory you're seeing, and see what percentage of the workloads are actually lightweight. And the last step would be to take a look at the costs, what you've been bid for AI PCs, amortize the AI PC costs over the lifecycle, and then compare.
12:28
What is the ROI? Should we be using AI PCs as endpoints? And do we have the internal technical capabilities to run these lightweight models? Yeah. And this is workload optimization, right? Heavy reasoning stays in the cloud, it doesn't need to move, and routine inference becomes distributed. So token cost is now part of your endpoint strategy, like I mentioned earlier.
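A rough sketch of that workload split: routine inference handled locally, heavier reasoning escalated to a per-token cloud model. The task categories and stub functions below are placeholders for whatever local runtime and cloud API an organization actually uses, not a recommended policy.

ROUTINE_TASKS = {"summarize", "extract", "draft", "categorize"}

def run_local(prompt: str) -> str:
    # Placeholder for a call into the on-device small model (see the earlier sketch).
    return f"[local NPU model output for: {prompt[:40]}...]"

def run_cloud(prompt: str) -> str:
    # Placeholder for a per-token-billed frontier model API call.
    return f"[cloud frontier model output for: {prompt[:40]}...]"

def route(task_type: str, prompt: str) -> str:
    """Send routine cognitive work to the edge; escalate complex reasoning."""
    if task_type in ROUTINE_TASKS:
        return run_local(prompt)   # no API call, no per-token billing
    return run_cloud(prompt)       # multi-step, cross-domain reasoning

if __name__ == "__main__":
    print(route("summarize", "Summarize this 15-page vendor contract..."))
    print(route("analyze", "Build a cross-market risk model for our portfolio..."))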
12:54
Ten years ago, we optimized bandwidth, right? You and I have been doing that for a long time. Five years ago, we started optimizing cloud compute, and that's when FinOps became a regular norm in the finance organization. Yep, FinOps. And now we need to also optimize AI token flow, and it's better to do it now and put in the right architecture. Otherwise you could be out over your skis pretty quickly. Yep, exactly. And I think that's the shift in it.
13:24
And it reframes the entire AI PC conversation. It's not just about new laptops; it's about overall cost governance in an AI deployment, tokens and all your other associated costs. Exactly. And the organizations that analyze this early, instead of reacting to ballooning token bills, will have a structural advantage. Yep, certainly.
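For listeners who want to try the ROI exercise described above, here is a back-of-the-envelope sketch. Every input is a hypothetical placeholder, not a benchmark; substitute your own usage data, token pricing, and device quotes.

# Hypothetical inputs, loosely following the 2,500-employee example above.
daily_ai_users        = 1_000       # employees using AI daily
prompts_per_user_day  = 20
tokens_per_prompt     = 3_000       # input + output, RAG context included
price_per_1k_tokens   = 0.02        # USD, blended; varies widely by model and provider
workdays_per_year     = 240
local_share           = 0.55        # portion of routine inference moved to the device
device_premium        = 400         # incremental cost of an AI PC vs. a standard refresh, USD
lifecycle_years       = 4

annual_tokens = daily_ai_users * prompts_per_user_day * tokens_per_prompt * workdays_per_year
annual_token_opex = annual_tokens / 1_000 * price_per_1k_tokens
opex_avoided_per_year = annual_token_opex * local_share
incremental_capex = daily_ai_users * device_premium

print(f"Annual tokens consumed: {annual_tokens:,.0f}")
print(f"Annual token OpEx: ${annual_token_opex:,.0f}")
print(f"OpEx avoided by local inference: ${opex_avoided_per_year:,.0f}/yr")
print(f"Incremental AI PC CapEx: ${incremental_capex:,.0f} "
      f"(~${incremental_capex / lifecycle_years:,.0f}/yr over {lifecycle_years} years)")

The comparison the hosts describe is the last two figures: avoided token OpEx per year against the amortized device premium over the refresh lifecycle.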
13:50
And then if we think about security and governance, which we talk about often, in some ways local inference can probably reduce your risk. Think about scenarios such as sensitive documents: they could be summarized locally, never leaving the device, never going over the wire out to the cloud. That could reduce some of your exposure, since the data never leaves your environment.
14:14
But then governance becomes distributed. Now inference is happening across thousands of endpoints, no different from the endpoint management CISOs have to deal with today inside an organization; now you just have to add AI as part of that. So the challenge shifts. Yeah, yeah, exactly. The challenge shifts from monitoring API calls across the WAN to enforcing policy across devices. So that's an architectural
14:43
problem or challenge, depending on how you look at it, but it's not a model problem, right? It's not the model's fault how this is shaping up; it's really an architectural challenge for you. We could probably revisit that in a future episode, because I think that'll be interesting, Scott, and maybe even get a security expert or a hardware expert on who's focused in that area. Yeah, agreed. I think there are a ton of topics to cover around security and governance.
15:12
But I think another natural question from business leaders would be, if we deploy AI PCs, how much IT talent does it take? That depends on which deployment model you choose, and this is an area, a market really, that's developing pretty quickly. One option: there are OEM-managed or MSP-managed AI PCs now, and there are a lot of
15:41
competitors starting to move into that space. They have integrated models, they push out automatic updates, so it's really not a heavy AI engineering lift. Then there's the next tier, which would be a controlled enterprise deployment. That's where you actually have an IT team with skills in this area. They select the lightweight models, they integrate the RAG system, the retrieval-augmented generation system, with your corporate documents
16:09
to use your company-specific data. The team is relatively competent, with a stronger level of AI governance to manage those endpoints. It definitely requires AI-literate architecture skills, but not an extensive AI team. And then there's the next tier up, which a lot of very large enterprises are looking at. That third tier requires the capabilities to handle model fine-tuning,
16:39
distributed machine learning operations, and internal model registries. Not a lot of organizations need that today, but it's a quickly growing space and market, and a lot more talent studying this is coming out of college right now and being hired into the workforce. So the real talent shift is toward architectural thinking, not specifically model training. Yeah.
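To make the RAG piece of that controlled deployment concrete, here is a deliberately simplified sketch. Real deployments would use an embedding model and a vector store; the toy keyword-overlap retriever and invented documents below only illustrate the shape of the pipeline that feeds the local model.

# Invented stand-ins for internal corporate documents.
corporate_docs = {
    "travel_policy.txt": "Employees book travel through the approved portal...",
    "q3_sales_notes.txt": "Q3 pipeline grew 12%, driven by the mid-market segment...",
    "security_standard.txt": "All endpoints must run disk encryption and MFA...",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        corporate_docs.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def build_prompt(question: str) -> str:
    """Assemble retrieved context plus the question into a single prompt."""
    context = "\n\n".join(retrieve(question))
    return f"Answer using only this internal context:\n{context}\n\nQuestion: {question}"

# The assembled prompt would then go to the on-device model (see the earlier
# llama-cpp sketch) rather than to a cloud API.
print(build_prompt("How did the mid-market segment perform in Q3?"))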
17:09
I think the question is, does this change your endpoint strategy and your architecture? You'll start to see some companies considering accelerating refresh cycles here. This isn't a move that you would launch universally. Typically, in years past, you would categorize the users, deploy strategically where it made sense, measure productivity and cost improvements from the refresh, and
17:37
scale intentionally from there as it made sense for the business. But now you could potentially see a lot of that changing because of things like being able to run your inference locally. It can be a game changer. Your governance may be more distributed, but you don't have as much exposure going out to the cloud. So I think this does end up changing your endpoint strategy.
18:03
Yeah. And the CFOs might get involved when they see those giant token bills and put some pressure on the CIOs and IT teams to really consider it. Well, and to that point, that's where you go back to FinOps, which we mentioned earlier. That's where a lot of CFOs have struggled, because they're so used to hardware CapEx and depreciating those assets. They might be open to this. They might say, hey, this is great, I can have more of a
18:33
legacy model of IT that I'm used to, where I have something I can depreciate, versus you just giving me these token bills that are up and down, up and down. You're saying that's going to change and be a little more controllable, more fixed? I'm on board, I'll go get you the funding for it. Yep. And then they'll be able to assume a pretty predictable trajectory of growth on everything that does actually go out to the cloud. Right.
19:02
Anyway, I think AI PCs will eventually become a foundational deployment across the enterprise. Intelligence is definitely going to move; some of it is going to move out to the edge. The endpoint will be an AI node in a distributed architecture. The cloud, like we said, is still going to remain essential, and there will still be token costs, but the balance between cloud and edge is going to start to permanently shift. Very true. However, I would warn IT execs,
19:32
or IT nerds like myself, against overbuying too early. This shouldn't be a case of seeing the new product launch at a showcase this year and deciding you have to buy it. Make sure you have a use case. Assuming local AI replaces cloud AI, or underestimating governance complexity, are mistakes you must avoid. AI PCs are essentially part of the stack
19:59
and most likely will not be your entire stack. Right, I agree with that. I think the first movers are really AI-intensive enterprises, industries that are highly regulated, and companies that have a really large AI API spend. Those would be pretty obvious: they've got APIs into Claude and GPT and they're spending huge amounts on tokens.
20:27
But everybody else will be piloting them, measuring their ROI, and starting to plan deliberately for some, and then more, AI at the edge. Good point. I think that's a good place for us to close out the episode. Yep. AI PCs represent the next stage in enterprise AI evolution, not because they replace the cloud, but because they rebalance intelligence between the cloud and the edge.
20:57
So the winners won't be the ones who just buy a lot of hardware. They'll be the ones who design a hybrid AI architecture, no different than you do for cloud today, and who make it intentional, balancing edge, cloud, governance, and token economics. I think that will be paramount and will lead to success. So thank you for listening to the Macro AI Podcast this week. We appreciate all the listeners.
21:25
Thank you so much for sharing it within your network; that's been very successful. Until next time, we'll see you soon.