Zane Benton Podcast
Hi. I'm Zane. Much of my time is spent researching and staying up-to-date with the latest technology. I play guitar in my free time and hangout with my cows, chickens, donkey, and 3 dogs. Life can get interesting out here on the Texas Blackland Prairie, so I hope you'll join me for the latest!
Zane Benton Podcast
M5 MacBook Pro w/Qwen3-Coder-Next Review
Use Left/Right to seek, Home/End to jump to start or end. Hold shift to jump forward or backward.
Apple recently launched a new MacBook Pro. Local AI models are very RAM intensive, so I ordered a maxed out MBP to run the fast inferences. This episode reviews the Qwen3-Coder-Next-UD Q8 and the comparison between my old M2 MBP with 96 GB RAM and the new MBP a M5 with 128 GB RAM. I am in complete awe at the speed of the new MBP and can't wait to put it more to the test!
The MacBook Pro was on my list because I use the same computer. I've used the same computer for years. And um I've never been one to buy fully maxed out computers. But um but with AI and the way everything's going and and understanding that these um these models require a ton of energy, RAN specifically, to process, I thought to myself, man, it would be a good time to get a computer that can crunch all that data. And so I did. So when Apple announced the uh pre-orders earlier this month, I think they started at like whatever time it was, I can't remember, but I think it was 8.15 my time, central time. And uh so I was on just like refreshing, you know, at 8.13 two minutes before I was refreshing, refreshing, refreshing. And when it uh when it finally came up, I was able to order one. And I just do that because I'm used, like I used to buy iPhones every year. So when um I didn't buy the first generation iPhone, I bought the second generation iPhone, was my first one. So that was the iPhone 3, 3G, whatever it was. Um, and so back during that time when Apple was releasing iPhones, um, it used to be in September, and then like in 2011 or 12, I can't remember when it was, there was like a supply chain glitch, and they couldn't release in September. So uh, or actually they used to release it in June, I'm sorry, at the WWDC, and then there's a supply thing chain thing, so now they're released in September, which follows Apple's uh production schedule. Now they just fit it in to where they can always have something launching throughout the year, whether it be different computers, iPhones, different versions of the iPhones, iPads, and so forth. So I just kind of got used to going on and refreshing to make sure that my order got placed because um, you know, I don't know how it is anymore. I don't buy iPhones like I used to, but um, it used to be that if you didn't get your order in within like five minutes, 10 minutes, then you'd be weeks out on the shipping date. And uh and the servers would always crash. So um, so I got this computer ordered and got it in a couple days ago. Apple was actually late. They were supposed to deliver it on Friday, and um, and it didn't get here till Monday. No big deal. But I was surprised that Apple did not even send a notification. I thought that, you know, they usually send the text, and I enabled the text, um, and they usually send emails, but I never got a notification that the uh computer was running late. And but nonetheless, got it, got it a few days later, and uh, but actually sat here for a day or two because I was like, what what how do I set this thing up? What do I how you know what's my use case for it? And uh really it's just development, um and specifically with AI, which is why I got this, you know, maxed out. So it has um the M5 Pro Max or whatever the whatever the chips are, the the max chip in it, and um 128 gigabytes of RAM and an eight terabyte hard drive. So this morning, actually yesterday, I downloaded um I went to Hugging Face and I downloaded the um Quen coder which which Quinn model is. I need to check. It's the next the next three just give me a second. Quinn three coder next and it's the uh eight quant version. So it took up it only took up about I think 86.3 gigabytes or so of hard drive space. But uh, you know, you need at least 96 gigs or so of RAM to run it. So I that was my first thing. So I got that downloaded last night, and I and this morning I got on here and I got my Claude code to what I did is I went into Claude Clode uh Claude Code and I started a um session with it, and I said, Hey Claude, I have um a new computer. It's you know, and I gave it the specs, and I said, I want to run um, or I said I I just downloaded this model, and I you know, actually what happened was is I went and I downloaded a model that my computer wouldn't support. I downloaded a um, it was a GLM 5 um version that was the full version that you know 128 gigabytes of RAM can't uh run. I mean it was you need like like a 512 gigabyte, you know, Mac Studio or something to run that, um, which I do not have. So so Claude said, hey, you got this massive thing downloaded, but you can't use it. So what you need to do is you need to download this Quinn coder and you know, gave me gave me the link to it. So went to Hugging Face and I got it downloaded. And once I got it on my machine, I told Claude, I said, hey Claude, here's the the directory that my AI model is living in, the Quinn model. And I want you to set up a Claude Code like program for the Quinn model, but I do not want to use Quinn in Claude Code. Does that make sense? Because Claude Code uses the anthropic API, you know, the Claude. So you tap into their their system, so you and I like the way that works. Claude Code is an excellent program, and I love it, and I don't want to change it. And I want to keep it just the way it is, the way the company anthropic has designed it, um, because they're a heck of a lot smarter than me. So I want to leave that exactly the way it is. But what I told Claude was I want you to set up a separate program that to use this Quinn um next on, and I want it to function just like Claude code, and it said, no problem. And so um it gave me like a few different options. Like it was like, would you like to do it this way, boom, boom, boom, or would you like to do it this way? And so we settled on an option, and so now what I do, excuse me, now what I do is all I have to do is so like when I go into uh terminal and and I want to run Claude, you run Claude, that's all you have to run. Um, so if you and then if you want to run open claw, all you have to do is run open claw tu i in your terminal and it will get it kicked off now. So what Claude set up is I just go to terminal and I type in Quinn Enter. And when I type in Quinn Enter, it brings up boom the Quinn terminal program with Quinn running in it that Claude designed. And so it is, I was just what I so I just started messing with this this morning and I was absolutely blown away. And the first thing I uh asked, I said, Are you awake? And it says, Yes, hey, you know, whatever. And I said, Okay, write a um a Pong game in Python and save it to my desktop. And so immediately I press enter it and I was amazed, it just starts so fast, you know, it's like because I use the local model on my computer. I have a 2023 MacBook Pro and 96 gigs of RAM, it's an N2 Pro Max chip or whatever it is, and uh it's a good computer, and I run a version of the Quinn on it, but it's slow, and so I I you know I have it there and I've used it just a little bit, but I don't use it by practice. I use um Claude Code and Anthropic just because it's so much faster. But with this computer, when I press enter for the first time, uh asking it to make the the Pong game, it just starts just spitting out the code. I was like, whole and it's local, so it's it's your local machine running at that speed, and I was just like, oh my goodness. But there was one problem, I instructed it to save the Pong game to my desktop as a Python uh script, but it didn't do that, and so I followed up. I said, uh I said the script looks great, the program looks great, but I want you to save it to the desktop. And it's it told me it said, I'm uh it said, I'm sorry, I can't do that as an AI model. I'm only supposed to do this, this, this, this. I don't have access to all these other things. And so I went back into Claude Code and I told my Claude Code um instance that built the Quinn program basically as a clone of itself, but with Quinn, um, I went back and told Claude Code, I said, hey, like my my Quinn's working great. You did a great job setting up that program, but it's not saving anything to the desktop like I instructed it to. And it's saying that it doesn't have access to these certain files and so forth. And so Claude knew exactly what to do. I didn't even have to tell it, hey, go fix this or go look at this. It just said, no problem, let me fix that. And it goes through it and uh says, okay, we're done. And you can go back to Quinn and uh start a new terminal session with Quinn and try again. And so I went back to Quinn and I told it, uh once again, build a Pong game, but uh save it to the desktop. And about I pressed enter and it didn't even do anything. So I thought it was maybe frozen or something for a second, and in about five to eight seconds, poop a Python file just populated to my desktop. So I ran it in terminal, the program, and actually I went to Visual Studio Code first to run it through the debug thing, and it just popped up right away. And then I ran it in terminal and saw it, and there was your Pong game. And so this was a validation of sorts for me because um the I can assure you that the local AI models are going to become huge, huge, and not many people are running them right now because you have to have the the memory and the space to do it, but they're they're making the models smarter with higher compute density, um, and managing the weights better and the way that they're training them. And so we're going to be getting, you know, it's just like everything else with technology, right? Like it just gets small, like it gets better and better and better, but the devices get smaller and smaller and smaller because they're able to pack more onto it. And I really think that at some point, you know, I saw Sam Altman, actually, OpenAI just a couple days ago, was it yesterday or the day before, said that they were um going to kind of restructure their business in a in a way that that goes after uh bit that goes after enterprises. So open AI would now focus on uh enterprise businesses, the B2B cells uh or customer, and then also uh programmers, so people using it to code. And that was a it was a real interesting shift. I didn't go and uh read and I just saw the headlines talking about it. But what that signals to me is that you know, open AI is at the forefront, you know, of all this AI technology. And they're they're at least one of them, right? And so we have to take what they do very seriously, and if and if they're restructuring away from the the general consumer just going to chat GPT, because that's really where OpenAI got started, right? Like that's how they blew up was through ChatGPT, which was a web-based chatbot. And now they're realizing where this agentic programming is going, and they're realizing enterprises can pay for this stuff, and we can overcharge them because there are these big businesses, and you know, even though they could have some dude vibe vibe code them a program, they're gonna spend a ton of money because they feel safer with open AI or whatever the case may be, and that's great and fine. Uh, but but things are shifting, and things are not only shifting in the tech world, they're also shifting geopolitically. And um, you know, Trump or NVIDIA, you know, Trump in in January, Trump signed off on an executive order that uh made it possible for NVIDIA to export um, I believe it was the H200 or it may be a few different um of their products to China, which then leaves less product for the United States. So, what does that do with just simply supply and demand? Who knows? I don't know, but I think that that these AI companies that are making billions of dollars on this stuff right now, every every day, that they're realizing that there's a ton of money to be made, and they have the best stranglehold on the market that anybody could have. Because we as individuals definitely don't have the market cornered in any way, but these big AI companies have kind of you know, they have first mover advantage, but the problem with that for them is open source, and so there's no way to put an open source genie back into the bottle. And if you understand, like, you know, like uh like Bitcoin blockchain, when you hear the politicians talking about, oh, we're going to ban it and we're gonna make it to where people can't own Bitcoin or have Bitcoin, it's like good luck. You literally, that's impossible, right? Like you could you could pass a law to say that if anybody possesses it, it may be illegal, but you can't stop people from possessing it, if that makes sense. And so there's because it's all open source, and so with the local AI models, they're open source as well. So when we have local AI models that are nearly as good as the frontier company AI models that are making billions of dollars on this stuff, you can bet that those companies are trying to protect their business interests and not let these open models rise up. And I saw that NVIDIA was working with, was it Olama? I can't remember. Uh, this just came out a few days ago, talking about how they were going to, or maybe it was open AI, I can't remember, but they're going to uh NVIDIA was going to work with somebody to produce an open source model, and um, which is great, you know, it's exciting, but I think if we look at it, they're probably trying to position themselves in a way that tamps down the competition by providing saying, hey, we came up with our own model and it's an NVIDIA model, and it's good enough for you guys, where everybody just becomes complacent and says, Okay, we'll use that model, but you know that's not going to happen because there's there's so many models out there available right now for people to use. Now, the problem is that we just don't have the space or memory to run most of those, and that they're not quite up to speed with the best AI models out there, which are closed source and you have to pay for. But soon they will be. And so this MacBook Pro that I bought that I just got in. I am doing this as kind of in preparation of that. And I don't know if there's any uh truth to it, but I was I've seen a couple people um opining that they believe that it's it's going to be harder and harder to get higher performing machines from companies like Apple and NVIDIA because the higher performing machines are the gateway to the best local AMI i models. Right? And so if you're just running just a you know, just a basic MacBook Pro, you know, maybe it even has I don't need 64 gigs of RAM, uh, which is a mid-grade system, uh, you're still not going to be able to run uh, I mean, you could you could run a decent, decent AI model, but you're not gonna be designing iOS apps with that by any means. So it just doesn't have the capability, it doesn't have the context memory. And so when you can run bigger models, it of course gives you more capability. So if you need that capability, uh the new MacBook Pro with 96 gigabytes of RAM or 128 gigabytes of RAM, that's the maxed out version, that will definitely give you the capability you need to run these top tier local AI models. So thank you so much for joining me today on the Zambin show, and we'll be back next time to talk more tech. See ya.