Rate Limited
Discussion about the latest news in the world of AI-assisted coding.
Opus 4.7 Feels Weird? Claude Design is Amazing & Cursor? | Ep 14
This episode explores the latest AI model updates, including Opus 4.7, Kimi 2.6, and the evolving landscape of AI in design, coding, and enterprise applications. The hosts discuss model performance, industry trends, and strategic moves like SpaceX's partnership with Cursor, providing insights into the future of AI development.
Links:
Ray: https://www.youtube.com/@RayFernando1337
Eric: https://www.youtube.com/@pvncher
Adam: https://www.youtube.com/@GosuCoder
Chapters
00:00 Introduction to the Podcast and Recent Developments
02:49 Exploring Opus 4.7: Impressions and Comparisons
05:58 The Art of Prompting: Strategies for Effective Use
08:57 Design Innovations: Claude Design and Its Impact
12:03 The Future of Design: AI's Role and Implications
15:00 Kimi 2.6: Performance and Comparisons with Other Models
22:17 Exploring Composer's Efficiency
25:19 Kimi 2.6 Performance Insights
30:44 The Impact of Tokenization on AI Models
33:15 Cursor and SpaceX: A Strategic Partnership
40:20 The Future of AI Coding Companies
46:24 The End of RooCode: Reflections on Community
46:32 The Evolution of AI Coding Practices
50:15 Navigating AI Engineering Workflows
56:24 Orchestrating AI Agents for Efficiency
01:02:00 The Future of AI in Development
Ladies and gentlemen, you are now tuned in to the Rate Limited Podcast with your host, Eric Provencher of Repo Prompt. We also have Adam Larson, aka Gosu Coder on YouTube, and myself, Ray Fernando. We are aiming to be the five-star podcast, so make sure you rate us highly on Spotify, Apple Podcasts, and your favorite podcast app. Ladies and gentlemen, welcome to the show. We have a bunch of stuff packed in, from Opus 4.7 to Kimi 2.6. Things are shipping nonstop, and we're really excited to be back with you this week. Eric, go ahead and take it away.
SPEAKER_04: Yeah, thanks for that, Ray. It's been a bit of a gap since our last episode, and obviously the space doesn't stop. It seems like it's been the month of Anthropic shipping. And oh boy, Opus 4.7 has been, I think, probably the most divisive model they've released so far, coming hot on the heels of the Mythos model. Adam, I know you had some work featured and got to give Opus an early shot. I want to hear your impressions. You've probably spent the most time with the model out of all of us, and you've been talking to people who've been integrating it directly. What are your thoughts? How are the developers you're working with feeling about it, and where do you think we're going as an industry with this? Is the upgrade worth the version number, or is it something else? I want to hear what you're thinking.
SPEAKER_00: Yeah, maybe taking a step back here for a second. There have only been a few times where I've actually been blown away by a model upgrade, and Opus 4 to 4.5 was such a big jump for me. 4.5 to 4.6 felt like nothing, and 4.6 to 4.7 is definitely all over the place, because there are times when it feels good and other times when it feels like it's degraded. So I've started asking around among other people I work with, and pretty much all of us have a hard time deciding whether we like it better or worse than 4.6 or 4.5, and that's kind of a bad sign. I've attempted some very large new features with it, and there are things it has failed on that I feel like 4.5 would have succeeded on. I've actually got some tests running on my other computer over there, with 4.5 and 4.7 side by side, because I'm curious. Maybe I've overemphasized how good 4.5 was, but in general I've probably been more neutral on 4.7 than I have been on the other ones. I don't feel like it's anything more than what I saw going to 4.6, and in some places, like I said, I think it's actually degraded. I was chatting with one engineer today, and we were both just blown away with 4.5, and we were talking about how 4.7 has actually had us start considering the GPT models. He was a big Opus person, and he's pretty much fully shifted over to GPT models for execution, which is something he would never have thought about doing. So I think GPT is gaining some ground, at least in my group of folks.
SPEAKER_04: Yeah. And Ray, I know you're a huge Claude buff, so I'm curious. Have you been pushing it and seeing what it can do for you?
SPEAKER_03: Yeah, here's the funniest thing, my number one takeaway: you have to gaslight the model back. It's interesting, because sometimes it can be really good at following, and over-prescribing, whatever you write down. But I like to watch the thinking traces, and I've been watching them a lot inside of Droid. I run max thinking because I really want to see what this model is outputting. And it's like, "I don't want to do this, but I should be doing this because the user keeps telling me I should do it. This is basically done, but I need to do an additional pass because of XYZ." And I was like, hmm. What if I take away all of my instructions and just tell it to do its own self-discovery? It comes to the same conclusions I end up at, but now I have to be the manager who leads the horse to water.
SPEAKER_04: Oh man. Oh man.
SPEAKER_03: Here's some water. And I'm looking at the horse, it's dead thirsty, and it doesn't want to drink. And it's like, let's just go over here, let's get away from the desert and go to a different area. Okay. That's the way I have to treat this model. And it's really good. So the biggest discovery I've made is that I actually have to prompt less, give less context, give less of everything, because the model is super hungry to go. It could be the Droid system prompt; I've mostly been using it inside of Droid and Cursor right now. Maybe those harnesses prefer, or have additional instructions, to go grab more information within the context and build on it. That was an interesting insight, though, and it holds on the web too: when I just dump a bunch of information on it, it tends not to think as much. When I give it less and say, hey, go find your things, it will spend a lot of time on it. Separately from all of this, I had an issue on the web where I was looking into some health stuff, and it just didn't want to look at any of the information. I gave it screenshots, I gave it a bunch of information from the doctor, and it missed very critical details that my doctor was actually calling out.
SPEAKER_00: I wanted to ask you about that, Ray, because I've had that same problem, where sometimes it almost refuses to do what I'm asking it to do.
SPEAKER_02: Yeah, yeah.
SPEAKER_00: So it seems like you've figured out a workaround for that, but there are other times it's flawless, so I can't quite put my finger on it. Sometimes it just ignores something you tell it, or it outright refuses to do what you want it to do. Right. So am I right in assuming that's what you're seeing, Ray?
SPEAKER_03: Exactly, exactly. I have no clue why; it's just so random.
SPEAKER_04: So I've done some forensic analysis on some of my prompts with it, where I have Opus look through the prompts and ask it where it gets stuck on certain things. I have this orchestration workflow, and I'll talk a bit more about that later. One of the things in it is that it requires a quick scan to get an informed prompt for getting the whole process started. I was talking to Opus about it because it was skipping that; it was skipping the whole workflow and just doing everything. And I asked, why are you skipping this? It turns out it wanted to make sure the prompt it started with was super well informed and super detailed, and it didn't want to miss anything. So you can see all these little scuffs and paper cuts where the model goes deep down a chasm to try to meet a criterion, overindulges, and over-explains its rationale. That's why, to your point, you really have to guide it carefully. I find that if you give it a clear task and don't really tell it how to do the task, it does really well. But if you say "do the task this way," that's when it doesn't like it. I think that's the weirdest thing about this model. And actually, one of my favorite things about the GPT models, to your coworker's point, Adam, is that they listen to instructions. You tell it to do it like this, and it'll be like, got you, let's do it. When you have clear workflows, that's a huge difference. But I want to talk a little bit more about design as well.
You know, they just released the Claude Design tool, and I've been using it for front-end work and revitalization, and this thing has impeccable taste. This is a really, really tasteful model, and it's weird, it's like a wild, wild horse. I was talking to someone on X about it: it is harder to steer. I was listening to the Acquired podcast, the Ferrari episode, and they were talking about the F40 from, I think, the late '80s. That's a car you don't get into if you don't know how to drive it. If you take the wheel and you don't know what you're doing, you're going to crash. I have the same feeling about Opus 4.7, whereas the older 4.5 and 4.6 were like a Honda Civic: you just say let's go, and it goes. It's not going to get you there in record time or with record quality. But with 4.7, when you can drive it, you're going to fly. If you can't drive it, it's off the cliff for you. So I'm curious, back to the design part a little bit: has it been a big step up for you? Has it changed anything about your workflows? I'm curious, Adam, if you have thoughts there.
SPEAKER_00: Yeah, I would say yes. I also really like the whole Claude Design ecosystem they put out. In general, Opus 4.7 is really good at design, and honestly the Claude models have always been top-notch in that regard. The only counter I would have is that I've been leaning more on other tools for image generation, which to me is a big part of design. I've been using a lot of Gemini lately for that, and I know the ChatGPT image model that just came out is also really good, but I haven't tested that one as much. Right now I actually have a Claude Design project open over here: I gave it a very simple prompt, uploaded a logo for a business I'm starting with my kids, and it's amazing. It gives you everything you need, and then I can take that, feed it into Opus, and get a perfectly designed e-commerce storefront that we can run. It's really good. I have no complaints about it in that regard.
SPEAKER_04: Alright. Yeah, and what about you, Ray? Have you been pushing the design limits on this thing?
SPEAKER_03: Yeah, I did a full-on live stream. I took an existing design template from, what is this place called, Aura.build. They have a lot of really good templates. I just grabbed a couple of screenshots and put the HTML in, because I really like the font and the interactive design, and it's a lot faster than describing it. The thing that really blew my mind: I just told it to go all in. When it asked me questions, I answered them in detail. It asked whether I wanted to build just one screen, like a blog landing page with the blog page, and I said, build it all, build the about me, build it all. It gave me all these details, and it felt like I'd hired a $100K designer to sit down with me, just off that screenshot and the questions it asked back and forth. I've sat with designers at big Fortune 500 companies, and recently with the Droid folks too, and they ask similar questions about your feel, your fonts, and those kinds of things, so I thought that was really insightful. How Claude takes that work and gives it to you as a tool was really cool. But the most amazing part was telling Claude to give me a prompt to hand off to other agents that would encapsulate the entire design conversation. It gave me an extremely succinct, roughly 500-line document with every little detail in it. Sending that off to Google Gemini 3.0 Flash with Google Stitch, it was able to recreate everything just from that small set of instructions. Wow. Which I found really, really incredible. And I don't know if you all realize this, but you can hit your limits super fast, within like five prompts. I'm on the 20x Max plan, and I'd already used 20%.
I was like, oh my god.
SPEAKER_00: I hit my limits daily; I'm always hitting my limit. But let me ask you guys something. How do you think this impacts designers as a whole? There are a lot of companies affected, too; Figma took a big hit in the stock market.
SPEAKER_03: I think this is the Cursor moment for design. Designers can put in a lot of their work, templates and things, then hand it off to the engineers and say, this is kind of what I'm thinking, and start making factories of their ideas. You know?
SPEAKER_04: I don't know. The thing about design, and maybe Claude Design will eventually get there if they keep investing in it, is that the reason people use Figma is the control and the collaborative nature of it. There's a lot there, and the ecosystem around it is really rich. You can use Figma without AI, even though Figma does have AI, but you can't use Claude Design without AI. If you hit your usage limits or your rate limits or whatever, you're not going to be able to keep working, and that's just not something people can depend on. I think it's fun. Oddly enough, I think Claude Design is great for non-designers. Some designers will probably enjoy using it, but I think designers want a power tool. I'm a little skeptical that a blanket AI tool like this is going to replace that whole way of working. Maybe we need fewer designers, just like we may need fewer software engineers with more AI, but I don't think it replaces the need for them. Someone who isn't design-oriented won't be able to get the kind of results you want: the consistency and the context needed to make a truly excellent design that meets all the criteria. Everything about design is about making trade-offs and delivering them in the right way. If you just have someone trying to make a splash page or a landing page, sure, but when you're building a product with a lot of considerations, you don't want to hand that off fully. I don't know. What are your thoughts there, Adam? Oh, or go ahead, Ray.
SPEAKER_03: You definitely had something to say. I want to propose something to you. There were teams that would literally spend years making UI kits, right? Yeah, yes. So how close are we on that programmatic part? I've worked with amazingly talented designers and amazingly talented engineers, and some of the best ones had a blend of both skills, along with a variety of other skills. When everyone sits together in the room, the iteration is kind of crazy. They're able to translate the conversation between design and engineering, and they say, okay, since we do this pattern a lot, let's just make a tool for it, and that becomes something exposed in the API layer. That's basically what I'm thinking would be exposed to an agent, right? But they were thinking about this from a design-iteration phase, not from a "we have to think about the organization" phase. They were at ground level, being super creative and playing around with things, like drawing something on your wristwatch, calling these graphics engines, and understanding how those pipelines are developed. The constraints of a small watch, a small display, and a small battery really made diamonds out of these rough ideas. So is this the layer that's starting to develop right now, where eventually this gets handed off to agents, and as a human you bring in other people with different backgrounds and start making your own frameworks? I don't know, in this case.
SPEAKER_00: If we think about the evolution of design, back in the early days it was Photoshop, where you'd stitch together images, and there were quite a few other things before Figma came around and unlocked it. Where we are today, there are different levels of experience among designers. There are people I would call graphic designers: they can make a pretty picture. That is not the full extent of a designer. Of course. So to your point, Ray, maybe this is the Cursor moment, but for the pretty design. There's still how all the systems come together. How does a user interact with it? What actually needs to be on certain pages? What should the micro-interactions be? Designers have to think through all of that, and I don't see AI doing it, though it should. The way I think about it, Cursor is a tool that helps me build code better. Maybe this becomes a tool for designers to do design-token generation: what padding should we have, what colors should we have, and being able to iterate on that really quickly. But I still don't see it totally replacing designers, especially when you've got a complex application with very clear wayfinding and things that need to be spot on, where you need to A/B test whether this button should be here and how it should be presented. I don't see AI being able to cover what designers do there. But I could get on board with it being the Cursor moment in the sense that it should make designers faster.
SPEAKER_04: Well, I want to point out something you mentioned there, which is the conflation of UI and UX that a lot of people make. To your point, I think the UX part of it is not solved, but a lot of the UI part may be getting there. There's another point to lean on: when designers make these UI frameworks, a lot of what they're doing is setting up a language to convey what the app should consist of. What are the building blocks of this app? You have something that's consistent across different parts of the app, and you want to set up reusable pieces that engineers can build on and build with. I think AI makes producing the sub-pieces a lot faster and composing them a lot easier. But when you're designing what the primitives should be, you can iterate with AI, but you have to have an opinion; you have to have some way of conveying these trade-offs to get where you want to be. Hopefully that makes sense. There's still a gap somewhere, and human judgment is still needed no matter what AI workflow you're using, and I think this is no exception. So I'm always a little skeptical of "we're going to replace all X with Y," you know.
SPEAKER_00: Same. Just to be careful. I do think, though, that for startups just getting going, it's different. There have been times when I founded companies and spent thirty to fifty thousand dollars trying to get a brand kit together so we could actually build with it. I don't need to spend that money anymore. So there's that level of it that I really do appreciate.
SPEAKER_04: But if you're already thinking in terms of putting together your brand kit and what your pieces are, you're doing that work yourself. Exactly. It's not that the AI is doing it; the AI is assisting you in making it. Yeah, and that's empowering, of course, because that wasn't something you could do before. Exactly. I feel the same way running my business. I don't have the funds to pay a designer to do that work, but you're still taking on that role. It's not the AI doing it.
SPEAKER_00: Yep. Totally agree.
SPEAKER_03: I'm curious whether, if you throw that brand kit or that logo into the GPT image model, it can generate a brand kit for you as well. I'm just really curious. Yeah.
SPEAKER_04: I mean, it could possibly do something, but you'd have to iterate with it. The thing for me with design and image models is that iterating with them is always super weird. You have so little control over which details it's going to change in the next one. You can guide it, but the longer you go, the more lost it gets, and it's always a mess. So I actually prefer something like Claude Design for that work. But yeah, that's me.
SPEAKER_00: Yeah, you literally have to download the image and start a new context, because the first iteration is always the best, and it's downhill from there. When I'm working through something, I'll have like 15 or 20 chats, literally restarting the chat over and over again. Because you're right: it'll slowly degrade with any edits you do, and then randomly it'll produce a totally different image. Like, I didn't ask for that at all. What in the world?
SPEAKER_02: Yeah, yeah.
SPEAKER_04: All right, let's move on to Kimi 2.5, sorry, 2.6. New model, just released. On the benchmarks, this model is doing incredibly well, passing Opus 4.6 and, in some cases, GPT-5.4, which is crazy. We wouldn't have seen a Chinese model like this come out and do so well. I did enjoy Kimi 2.5 in some cases, not for coding, but running OpenClaw was an interesting use for it. Ray, I know you spent some time with it. What are your initial thoughts? How are you using it, and how do you feel it compares?
SPEAKER_03: You know what's really funny? I took that 500-line, really detailed design instruction document I exported from Claude Design and tried it in a couple of different places: Kimi on the web, Kimi with the agent swarm, and Droid with extra-high thinking. The goal was literally just to take those instructions and generate the blog page with those details. Surprisingly, on the web it only got about 60 to 70 percent of the job done. A lot of details were missing, and there was a lot of nuance around light mode and dark mode that the other models captured really well. In a really funny comparison, though, I said, let me just put Composer 2 from Cursor on it with the same instructions. Four thousand lines of code later, in about five minutes, it all came out, everything working. Light mode, dark mode, all the details, and what was surprising is that even mobile mode worked, which didn't work for any of the Kimi models on the web, even with the agent swarm, which I thought was going to spend more time on it. I thought, man, this is weird, because Google Gemini 3.0 Flash with Google Stitch completely nailed it too. So there was something there, because Composer 2 only has a 200K-token context window, too. I was like, huh, really fast.
SPEAKER_04: Yeah, and when you were using Composer 2, did you monitor how it was splitting up work? Was it using a lot of subagents to do the work, or was it just doing scouting with subagents? I don't know if you looked at what it was actually doing. It seemed to generate code pretty fast.
SPEAKER_03: It didn't seem to spawn off a lot of subagents.
SPEAKER_04: It just kind of went sequentially, you know. Yeah, so I have a note on that, actually. I don't have data or anything, but what I think we're seeing a lot with Western labs, especially Anthropic and OpenAI, and now Cursor with this model, is that they're getting quite good at post-training and the RL loops, where you have a goal and you sign off on something, and building a website end to end is deeply covered in those post-training systems. But when you pull up a model like Kimi 2.6, which is an instruct model, it has some of that RL done, especially for tool calling, but not to the same degree as what Cursor put into Composer: making sure these common tasks are so well covered and so well integrated that the model just won't stop until all of it is done. So I think that's a big part of it, and I'm curious to see how Kimi pushes on and learns on the post-training side. So far it really does seem like OpenAI has the best post-training, from everything we've been hearing about coverage, and that's why their models are getting scarily good at a lot of agentic work. But Adam, have you had a chance to play with Kimi 2.6, and what are your thoughts on the post-training around it?
SPEAKER_00: Yeah, I have. The problem I've always had with Kimi models is how slow they are; they have a big hill to climb in my mind. Their models are big behemoths that typically get about 20 tokens per second (TPS) on average when you're actually running them, so it's really hard for me to want to use one as my daily driver. I also tend to agree with you. I have some benchmarks I run based on tool accuracy, and I'd say Kimi 2.6 doesn't do that great on those runs. I think a lot of it, to your point, is that the other models are so well trained to call tools accurately, particularly in large agentic loops with many tool calls. Kimi 2.6 is not bad; in retrospect, if we'd had this model a year ago, it would have been incredible. But then there's the speed combined with the cost: while cheap, it's not the cheapest model. On the positive side, if you go onto OpenRouter right now, it's the number one model for coding as of today, and the numbers don't lie, so people are liking it. It's just a little too slow for me.
SPEAKER_04: I wonder, do we have data on where those tokens are going? Which app is driving it?
SPEAKER_00: You know what, I wonder if I can find that. That would be amazing to figure out.
SPEAKER_04: Yeah. Sometimes people run promos with free models, and those top the charts too, so it's hard to say. Then again, if it's on OpenRouter, it's probably not free, so I'm not sure that's the case here. But it's worth looking at. I wanted to make another point. Oh, sorry, what was it, Adam?
SPEAKER_00: Yeah, so number one is Hermes Agent, which I've never heard of.
SPEAKER_04: You've never heard of Hermes? Okay, that's an OpenClaw competitor. It's not for coding, so it makes sense that it's burning tokens.
SPEAKER_00: Number two is OpenClaw, number three is Kilo Code. Yeah. And number four is Claude Code, so that's interesting. And number five is Pi agent.
SPEAKER_04: Okay, yeah, fair enough. It's interesting to see people burning it like that. On the tool-calling point, I wanted to come back to it, because you said it was doing a bit worse on your benchmarks there. Have you tested 4.7 on that benchmark, Adam? Have you seen...
SPEAKER_00: Not yet, but I need to do that.
SPEAKER_04: Yeah, because I run them with my own tools over MCP, so it's not the standard tools they ship in the Claude Code harness. And I find Opus 4.7 regularly makes incorrect tool calls. Not often, but once every session or every other session it'll make at least one failed tool call. It recovers and keeps going, but it calls a tool that just doesn't exist, or isn't right, or is formatted wrong. I don't know if this is a prompt issue in Claude Desktop, but I was using it in Claude Desktop the other day, just in chat mode, doing some tax-related stuff, and I noticed it kept incorrectly invoking the ask-user tool. It would end the chat and say, here, answer this question, and it was just raw XML. Either it didn't actually call the tool or something broke in the parser. I don't know if it's the model's fault or the harness's fault, but there's something sloppy about the way it calls tools compared with older models; it's a little unwieldy. So I just want to call that out: it's not solved for everyone. Tool calling is still an evolving science, I think.
SPEAKER_00One of the other things I have a theory on, which could be incorrect, is that 4.7 uses more context than 4.5 did. I'd love to test this definitively, but I'll do very simple tasks and my context window seems to fill up incredibly quickly.
SPEAKER_04Well, there are two things I want to touch on before you keep going. First, they did mention that they changed the tokenizer for 4.7, so text is ingested less efficiently than with the older Claude models. Part of that is that they wanted it to be more granular, so it's more sensitive to different tokens, which is better for accuracy, but it also means the same text eats more tokens in your context window. It's around 30% more, and in some cases even worse, especially with whitespace: people ran some tests, and apparently each space is an individual token. So that's kind of nuts.
SPEAKER_00Sorry, keep going. You just validated a theory I had but didn't know the answer to. That's something I noticed over the last week: I feel like I'm eating up my context window, hitting my limits, and compacting more often.
SPEAKER_04It thinks way more, too. They mentioned that the default is extra high, which is a new level they added, and you can watch it reason; it just goes on and on. That's definitely a lot more than the older ones. I also wanted to mention that they said with 4.7, high reasoning is the bare minimum you should run in an agent, and anything lower isn't worth it. That's funny, because Copilot capped the maximum reasoning at medium, so if you use it there, even on the top Copilot plan, you can't use the model as it's intended. That's a bit unfortunate. They recommend extra high, and they say not to use max, because max will overthink and use twice as many tokens. Very strange. It feels like they're still figuring this out. In some ways it reminds me of o3, which OpenAI released last year. I loved that model, but it had weird quirks like this. So, worth bringing up. Any other thoughts there, Adam? Yeah, I'd say most of my testing has been done in extra high, because I heard the same thing, that they recommend it.
SPEAKER_00I do know there are folks I work with who swear by medium, which is also confusing. Part of it is that we're limited in how much budget we have each month, so we have to be a bit strategic about how we use AI day to day. And extra high is expensive; it will eat up your usage very quickly.
SPEAKER_04And with that tokenizer, you're getting roughly a 30% cost increase right out of the box for the same usage.
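The tokenizer point is plain arithmetic. The sketch below takes the hosts' rough 1.3x figure as an assumption (real inflation depends on the text, and whitespace-heavy code may be worse) and uses a hypothetical 200,000-token window. The same text costs about 30% more tokens, and the same window holds about 23% less text.

```python
def inflated_tokens(baseline_tokens: int, inflation: float = 1.3) -> int:
    """Tokens the same text costs under a tokenizer that emits `inflation` times as many."""
    return round(baseline_tokens * inflation)


def effective_window(window_tokens: int, inflation: float = 1.3) -> int:
    """How much text, measured in old-tokenizer tokens, still fits in the same window."""
    return int(window_tokens / inflation)


print(inflated_tokens(100_000))   # 130000: same text, ~30% more tokens billed
print(effective_window(200_000))  # 153846: text that used to take ~154k tokens now fills 200k
```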
SPEAKER_00Do you think you're going back to 4.6 for most work, Adam? Are you switching to Codex, or are you going to stick with 4.7? This coming week I'm actually going to try the new GPT models and force myself to stay on them for a while, and really give them a good effort.
SPEAKER_03All right, see how it goes.
SPEAKER_04As of this recording, the latest model is GPT-5.4. We release episodes a couple of days after recording, and things change very quickly in between, so this episode may be slightly outdated by then. I'm curious to see. And what about you, Ray? How are you feeling about 4.7? Are you going to stick with it or go back? I still want to play with it more.
SPEAKER_03I'm literally just going max. I want to see the thinking tokens and what it's thinking about. By comparison, I saw Kimi 2.6 produce these artifacts where it just keeps repeating lines in its thinking. I don't know if that's just me, though. I want to dig in. I don't mind paying more for those tokens, because I get more insight into where the model is trying to steer me, and I get a back-and-forth with language that I can't get when the thinking trace is short. But if I need to ship some software, I definitely bring it down a lot. Right now I'm just playing around with max and going all the way in.
SPEAKER_04I'd really recommend lowering it a bit, just because of the recommendations. Part of it is that reasoning tokens use up the context window too, so the more reasoning tokens you burn, the less window you have for pulling in context and actually solving your problem. It's worth not overdoing the thinking and finding the right sweet spot. It is interesting, though: I've been benching these models for a while, and all the previous Claude models tended to do better with more thinking on my bench. Opus 4.7 doesn't. In fact, I've noticed that on default settings in Claude Code, which is extra high, it does a lot worse than older Opus models on my bench. It's the first one that really regresses. Opus was always a top-five model on my bench; 4.7, I think I saw it down around number 25 or 50. It was really low on the list, so it's been outranked by a lot of models on the things I'm measuring. So it's not all wins. All right, let's move on to the next item: there's been some big news about Cursor and SpaceX joining forces. It's interesting, because Cursor has been stuck between a rock and a hard place, trying to make a business out of AI coding. They were kind of first to market with their product and did really well gaining early ground, but they're squeezed by token costs, and as people use more, it hits their bottom line. So they released Composer, and I think it's been a great hit; to your point, Ray, it's been doing really well, but it's not a state-of-the-art model, and people want state-of-the-art models. So how do they get there? 
Well, part of it is budget, and this deal with SpaceX gives them access to the compute cluster behind Grok, which is insanely huge. Maybe by the end of this year we'll see Cursor take a big swing and dominate the coding space. Ray, you're a big Cursor user; how do you feel about the news? I want to hear what you think.
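Eric's earlier point that reasoning tokens crowd out working context can be sketched as a budget. This assumes the current turn's reasoning sits in the same window as the prompt and retrieved files (whether earlier turns' reasoning is kept varies by provider and harness). The per-effort spends below are hypothetical numbers chosen only so that max is twice extra high, matching the recommendation relayed in the episode; they are not measured values.

```python
# Hypothetical per-effort reasoning spend, for illustration only; real usage varies by task.
EFFORT_REASONING = {"high": 20_000, "xhigh": 40_000, "max": 80_000}


def remaining_context(window: int, prompt_tokens: int, reasoning_tokens: int, reserved_output: int) -> int:
    """Tokens left for files, search results, and so on after the other costs are counted."""
    return max(window - prompt_tokens - reasoning_tokens - reserved_output, 0)


for effort, spend in EFFORT_REASONING.items():
    print(effort, remaining_context(200_000, 15_000, spend, 8_000))
# high 157000
# xhigh 137000
# max 97000
```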
SPEAKER_03I also think about it from the low level on up. At the conference, listening to Jensen talk and speaking with engineers from NVIDIA and other folks on the floor at GTC, I learned there's a long tail of optimizations people still need to do on the latest Blackwell architecture, especially for inference, and every single optimization translates into huge margins. So from that one thing alone, I think their margins are going to get even crazier. A lot of people say they're spending too much, but more engineers sitting down to solve a problem can lead to hundreds of millions of dollars of profit. The return you get on these engineers is kind of insane. The other part, I think, is the data. If all these engineers are going to be doing math and science problems, those types of problems make coding smarter; with the Mythos models, what everyone's saying is that making coding models better gets you more intelligent models overall. So you'll have this crunching of data from different inputs and workflows beyond coding, and I think that's going to be a really important input, plus all these engineers, the feedback loop, and the talent. I think it's a good move for them.
SPEAKER_04I think this is a win for Cursor and for xAI, I guess.
SPEAKER_03Yeah, it's like: it's money, I get to lease a car, and if I like it I'm just going to buy it. What's going to be weird is that it's a bubble within a bubble. I was talking to some friends here in Silicon Valley, totally unrelated, and I asked them how this bubble compares. They said it's not as crazy as the dot-com bubble, because the dot-com bubble was much more widespread, whereas this one is very focused on specific groups of engineers. It's not like biotech is going crazy or everything else is going crazy; it's the engineers coming in with really unique specialty skills, people from hardware engineering all the way up the stack. Those are the people who are really valuable, because they have a larger impact on the company's bottom line.
SPEAKER_04I just want to dive a little into this bubble tangent, because it's really interesting. One thing that's crazy is that there's no spare compute; everything is in use, with no margin left. A lot of the money being spent is earmarked for future spend, and we'll see whether we grow into that, but so far the growth from all these coding tools has been so significant that there's no more compute. It would be crazy if they trained everyone to use AI for coding, got everyone hooked on these models until nobody can work without them, and then the bubble bursts. What, people are going to go back to coding by hand? I don't see that. People are hooked; the drugs are here. It's crazy. I want to go back to the Cursor side, though, because I think there's something interesting here. I want your thoughts too, Adam, but first I want to raise this point. A lot of Cursor's advantage, especially coming into last year, was that they were the first mover: they had a lot of the early data and an advantage there. I'm wondering if that's still the case. Cursor is still very much used, but last I saw, Codex was at four million active users, and Claude Code is even higher than that. So those tools are getting lots of training data too. 
Is that flywheel still exclusive to Cursor? Are they still as valuable? They have cross-model usage, their harness is really good, and they're very good at using that data to improve, but I'm curious whether the advantage is still what it was. I also wonder what Cursor is bringing to the table for SpaceX. Maybe it's a replacement for their AI coding team, which could make sense because Cursor has good model talent. What are you thinking, Adam? You're still using Cursor at work; what do you think about this deal, and where does Cursor stand in the market right now?
SPEAKER_00Yeah, I kind of felt like this was going to happen. Not this particular deal, but if you go back a year and a half and look at all the AI agents popping up, this expansion (we're going to talk about another one shortly), it was clear there would eventually be a shrinking of competitors in the market. Unfortunately, we're going to see more of these get acquired, most likely absorbed into existing teams. This particular deal is interesting because Cursor's CEO said they're limited by compute, and compute is one of the reasons they're doing it. But it also gives SpaceX an opportunity to acquire Cursor. There's definitely a lot of convergence happening: SpaceX buying xAI, and now this partnership giving Cursor access to their compute, with the option to actually acquire Cursor, which I realize is probably a pretty good deal considering the traction and brand power Cursor has. All of this comes down to tight margins. A lot of these companies have probably been burning money left and right. I love that Cursor is pushing on their own models. I worry about the other players in the market; I think we'll see some of them fold in, get purchased, or go out of business, and this is just one of the first shoes to drop. I do think SpaceX will probably end up acquiring Cursor. 
I think it just makes too much sense for them as a business. They're a two-trillion-dollar company, or predicted at 1.75 trillion, and xAI needs that foundation. I could see xAI's Grok models becoming the premier models, with Composer folded in, so it ends up being xAI-powered Cursor models that get built out. But I think we're going to see more of this. Yeah, go for it.
SPEAKER_03Is it constrained compute? Do you think the existing incumbents are going to hit the same bottlenecks Cursor eventually did? Oh yeah, 100%.
SPEAKER_00Well, put your hat on and imagine you're one of these AI companies that depends entirely on whatever API availability you can get from OpenAI or Anthropic. Your margins are razor thin because there's so much competition that you have a hard time even making the cost work: I could just go to Cline or one of the other open-source agents and pay the API cost directly. So ideally you're buying in bulk from these providers at a discount and selling at the market price, which leaves razor-thin margins. Then there are marketing dollars, headcount dollars, and the operational expense of developing the software you need. The models go up and down, issues happen, and you end up not getting enough compute; these are scaling issues. How do you change that? You do what Cursor is doing: you have your own models, so you can control the cost. But now you're limited by infrastructure. Where are you going to train these models? How do you afford it? To your point, Eric, it's all already taken; there's not much out there that a company like Droid could acquire to easily train a Composer-like model. So there's going to be a lot of tension in the AI coding space. There will only be a few winners. Google, OpenAI, and Anthropic will make it for sure. Cursor, I think, is going to get absorbed, and as a product it will probably end up being something xAI-branded in the future.
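Adam's squeeze can be shown with a worked example using made-up prices. If customers can always pay list price L directly (for example, through an open-source agent with their own API key), a reseller can't charge more than L. With a bulk discount d, the token cost is L(1 - d), so the gross margin is (L - L(1 - d)) / L = d: the discount is the entire margin, before marketing, headcount, and other operating costs come out of it. The $10 price and 20% discount below are hypothetical.

```python
def gross_margin(list_price: float, bulk_discount: float, sell_price: float) -> float:
    """Fraction of revenue left after paying the model provider for the tokens."""
    cost = list_price * (1 - bulk_discount)
    return (sell_price - cost) / sell_price


# Hypothetical: $10 list price per million tokens, 20% bulk discount, resold at list.
print(round(gross_margin(10.0, 0.20, 10.0), 4))  # 0.2
```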
SPEAKER_04The other ones, I don't know where they're going. I'm very curious, but I think over this year we'll see a lot more going out of business or getting acquired and absorbed. For Cursor, what's interesting to me is what happens if they end up in the hands of the xAI/SpaceX team. We've seen the precedent where Anthropic pulls the plug on access to Claude, and I'm pretty sure they've already cut off xAI from using Claude. So is Cursor valuable without Claude? I guess that's the bet they're making by the end of the year. I also think a lot of Anthropic's margin comes from Cursor, because Cursor pays for tokens and Anthropic keeps that margin. So fully disconnecting them is a big deal. It's a tricky decision: they'd free up compute but lose a lot of revenue. That consolidation has consequences, especially for a player like Anthropic, which is very gatekeepery about access. So I'm curious, Adam: would you still use Cursor without Claude models in it?
SPEAKER_00Probably not. Personally, I don't think I would. What about you, Ray?
SPEAKER_03Same. I'd probably just flip to Droid, but it costs more; it's 2x more inside Droid, so why would I? I do want to raise an elephant in the room that no one's talking about. Some CEOs have said this publicly, and I think Notion is hinting at it too: the conversation is about charging more for certain tokens. Right now tokens are all kind of pennies. But if the output is for a lawyer, it could cost a thousand times more than other things. If your economic output is this code, they'll charge you that much more, even though the inference cost is probably the same. They're experimenting with these business ideas. If that's the case, I feel like NVIDIA is creating a floor for them, saying, "Here's a marketplace; since we're providing the hardware, use our hardware." Then the more you optimize and have your engineers work on giving agents access to these different capabilities, the more you get rewarded, because you've done the hard engineering work to produce tokens more efficiently without eating up all the GPUs, and you also get the margins from that. So in that world, wouldn't there be more incentive, and more players coming into the mix instead of fewer? We're just assuming today's tokenomics will be the same in five years or so.
SPEAKER_04Yeah. I don't know if we are, but that's the precedent that's been set, and it's hard to change market dynamics from here. The fact is that tokens are expensive right now; people are spending a lot of money on them. If we shift to a model like this, it won't necessarily be to save people money. I could see it being more of a squeeze in markets other than coding, like you're saying: lawyers, potentially medical professionals, and others who use these models differently, spending fewer tokens but getting much higher value out of them. I guess we'll see what the market forces bring. I think this is going to be a competitive space, and if anyone tries to gouge too much, we'll see. It depends on how big the model advantages are, but it's going to be crazy. There are a lot of plates spinning right now, and it's hard to see where they'll land. I want to switch gears a little. Before we move on to our last topics, I want to make a small acknowledgement. Adam, you were a huge proponent of Roo Code for a long time, and they just announced they're no longer supporting the project. What are you thinking about this? Have you spoken to the team at all? How are you feeling?
SPEAKER_00Yeah, I talked with them a bit on X about it. I'm actually really sad, because looking back, it was such a tight-knit group of people, a tight-knit community, at the forefront of AI-assisted engineering. A huge percentage of people didn't even know about AI-assisted coding, or even the idea of a VS Code extension or a terminal tool that could generate code for you. If people were AI coding at the time, they were copying and pasting into a ChatGPT window and copying back out. That was the level. I had the opportunity to contribute to the Roo Code repo, and that was really fun. I learned a lot about how agents are built, and that served me well going into the company I ended up founding: understanding how to build core agent loops, manage context, and all the tooling around that. I learned more by doing it, and built really good relationships with quite a few people on the team. I understand the decision, because at some point they have to clamp down and make money. But it is sad. I think Kilo Code and Cline are going to try to take over the Roo Code user base. Well, are they? My understanding is that Kilo has gone all in on OpenCode now.
SPEAKER_04They've been working on that, so I don't know what they're pulling from the Roo side.
SPEAKER_00Yeah, they're still doing a marketing push right now to pull in Roo Code customers. You can see it on Reddit in some of the posts they've made about it. But you're right, they have built on OpenCode now. Their message was basically, "Roo Code, we'll take it from here." I think that's how they worded it.
SPEAKER_04You mean Kilo, or Cline? I think Cline was the one talking about taking over.
SPEAKER_00Cline did that too, but Kilo Code said the same thing, in basically those exact words. In my opinion, though, the better transition is going to be to Cline, given what that team is trying to do. I still worry about the long-term health of these open-source coding agents. How is Cline going to sustain a business long term, and will they have the revenue to keep supporting it?
SPEAKER_04The space has changed so much since they got started. I started Repo Prompt around the same time Cline got started, back when it was still called Claude Dev. I remember watching their progress and traction; they got a lot of traction from being open source and free, which I didn't have the benefit of. That early period was very different: there were a lot of gains to be had from clever tricks to make models more reliable, especially with file editing. I remember talking to the Roo people early on about diff-based file editing. We take it for granted now, but back in 2024 the standard practice for editing files with AI was to have the model rewrite the whole file, because models weren't practical at making small edits. I had done a lot of work to make that more reliable. It wasn't perfect, but I think I was doing a pretty good job. I talked about it a lot with the Roo people and gave them some pointers on where to look, but I didn't want to reveal everything I was doing, because they were putting it all on GitHub. It's interesting to see people trying different things, and I think those easy wins have now been washed away by model improvements and post-training. But there's still a lot to do. Harnesses are still very early, and I want to talk about that more later. To switch to that side of things: before the episode, Adam, you were telling me about your team, how they're using these models, and how there isn't a clear "this is the right way" to go about things. 
I'd like you to tell me more about that indecisiveness around working with models and harnesses and what the right way to build is, because I think it's really interesting and kind of represents where we are right now.
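Eric's contrast between whole-file rewrites and diff-based editing can be illustrated with a minimal search/replace edit. The format here (an exact search string plus its replacement) is an illustrative assumption, not Repo Prompt's or Roo Code's actual edit format. The main design choice is to reject missing or ambiguous matches, so the model is asked to retry instead of silently corrupting the file.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply one edit that must match exactly one location in the file."""
    count = source.count(search)
    if count == 0:
        raise ValueError("search block not found; ask the model to re-read the file")
    if count > 1:
        raise ValueError(f"search block matches {count} places; ask for more surrounding context")
    return source.replace(search, replace, 1)


src = "def greet():\n    return 'hi'\n"
print(apply_search_replace(src, "return 'hi'", "return 'hello'"))
```

Only the changed span travels through the model's output, which is why this approach uses far fewer tokens than regenerating the whole file.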
SPEAKER_00Yeah, I actually love this. Thinking about it holistically, across the group of engineers I work with, everyone I know has now acknowledged that AI engineering is here to stay, which is great. We weren't there even six months ago, but we're there now. Everyone knows we need to be faster and output more with AI, so now everybody is trying to figure out the best way to do that. We've got different degrees of effort. At one end: how do we build very prescriptive workflows for known tasks and drive AI to do one thing really well over and over, putting the right context in and giving it the directions it needs? At the other end: what's the least context we can put in, the minimum number of tools we need, and how do we set this up so the AI can go get the context it needs instead of preloading it? What we've found is that there's really no right answer. What works for one person, for their workflow, their task, or the way they like to work, is very different from what works for the next person. So where I am right now is that it's very individual: go with what works for you and how you like to work. I think companies have to optimize around individual adjustment of people's workflows rather than a prescriptive "everyone on the team needs to work this way now."
SPEAKER_01Yeah.
SPEAKER_00So that's my general sentiment. To wrap up what you were asking before: we don't have a right way. There's no "you're doing it wrong and I'm doing it right." You might have a ton of MCPs and a ton of rule files, but I can code and get a comparable outcome in about the same amount of time. There's no clear winner among the paths we're going down right now.
SPEAKER_04Well, I think there are differences. If you're sensitive to context and to how you prompt these models, especially given that 4.7 is a little unwieldy to steer, there's definitely work to be done in figuring out how to get the best work out of them. Before I share too much of my own thinking, I want to hear from you, Ray, on this topic of right and wrong ways of holding the models and the tools, because I think there's a lot there. Philosophically, should there be a right way? Maybe not. But in practice, what are you seeing?
SPEAKER_03Well, right now I've been operating solo and thinking about my own workflows. I brought somebody on to do some operations work and showed them what I was doing with my operations: doing things manually, then trying to put AI into them, then spinning up an OpenClaw and running things through that. Then I started to notice that they were originally doing it just to follow my pattern and flow, and it wasn't really working for them. So we're burning all these tokens. I have all these great documents that no one really reads; they use AI to summarize them back, and it's like, okay, what was the point of all this? So I'm in this weird phase where I'm trying to minimize everything, write more succinctly in bullet points, and really get my own ideas down on paper. That's where I'm at individually. I was telling a friend yesterday that with my own writing, I spend more time trying to steer the model than I would if I just sat down, wrote my thoughts into a first draft, and then wrote a second draft. Or I'd learn a lot more by spending three hours reading a copywriting book and actually learning copywriting, versus delegating that task or hiring someone to do it. I feel like those three hours are a better return. Maybe it's because I've spent so much time playing with these models that I know all their quirks; I can tell when an ugly phrase comes from a particular model. I spend so much time reading their output that I can smell it, and it doesn't smell good. I think other people probably feel that way too. But I'm not saying I'm going to throw everything into an agentic universe right now. 
I'm still trying to figure out what that looks like. As a solo operator, I'm trying to figure out what should stay human in how I run my businesses, and what I can just automate: that's an agentic task, that's a cron job, and that's just an agent loop, so let me spin off an agent to run that workflow. It's been weird, because I'm trying to balance business with code.
SPEAKER_04Yeah, exactly. To your point, you're trying to figure out how to fit this into the whole business, the whole operation. I think that's even more uncharted territory; so much of it isn't figured out. The models weren't trained for all of this; we're retrofitting them from coding into being a secretary or doing random other things. There's a lot to figure out. But in code specifically, is there a key thing you're finding, Ray, where you're like, this is what's most reliable for me? Or is it all tied into what you're describing?
SPEAKER_03I love simpler agent loops that are kind of cron-based: setting up monitors that run and bring data back, then a function that does a simple, very reliable diff, and then the agent acting on it. That's it. I think it's the most beautiful thing in the world. I was like, this is so cool. Those are the best things.
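Ray's pattern can be sketched with two hypothetical callables: `fetch`, which stands in for a monitor returning a snapshot of key/value state, and `act`, which stands in for the agent call. The diff is ordinary deterministic code, and the agent is only woken when something actually changed. This sketch assumes a scheduler such as cron calls `tick` periodically and persists the returned snapshot between runs.

```python
from __future__ import annotations

from typing import Callable

Snapshot = dict[str, str]
Changes = dict[str, list[str]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Changes:
    """Deterministic diff: keys added, removed, or changed between two polls."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def tick(previous: Snapshot, fetch: Callable[[], Snapshot], act: Callable[[Changes], None]) -> Snapshot:
    """One scheduled run: poll, diff, and call the agent only if something changed."""
    current = fetch()
    changes = diff_snapshots(previous, current)
    if any(changes.values()):
        act(changes)
    return current
```

Keeping the diff outside the model is what makes the loop reliable: the agent only decides what to do about a change, never whether one happened.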
SPEAKER_00: If you think about the ways people work, traditionally we have a task and we're sitting there writing code for it. Then we got to the point where agents can work on tasks, so we can start doing multiple tasks at once. The next evolution of that, in my mind, is separating this out so the work is done before the engineer even looks at it. That's where my thesis has gone: we need to get to the point where product managers and designers can initiate agentic loops to actually go complete something, and then engineers can pick that up and take it to completion. The unfortunate thing is that while this probably means better velocity, as a coder there's less of that joy of actually writing the code, which is a big problem to overcome. But I think that's the way it's going. As somebody who writes code daily, I typically have three or four agents running, some on bigger tasks and some on smaller ones. I usually have my Claude desktop app open to chat with, or I'll use Gemini to talk things through. It's all about parallelizing work, and each of those tasks requires a very different tool set. Not all of them suit going through plan mode in Cursor, using the subagent approach, and executing. Some of them might literally be telling Claude in Claude Code to go do a particular thing because it's a small enough task. So it all depends, but I do see the future being bigger orchestration loops, where things start from a task board. 
It gets picked up by AI, the loop runs, it evaluates itself, and it makes sure everything's good. Then it stages it up somewhere a designer can click on a deployed version of the application and say, yeah, that looks great, I like the way that is, or I don't, and they can iterate on it. That's how we get some of the speed improvements I think we need.
SPEAKER_04: Well, it's funny you mention that, because what you described is a bit like how I've shifted my own work. Granted, I'm also solo, but in the work I've been doing with Repo Prompt, I recently shipped an orchestration mode, and using it has completely changed how I code with models. To the point about Opus 4.7 being hard to steer, I think a lot of that is mitigated if I'm not the one prompting it; instead, I'm asking GPT to prompt it. The way I have it set up, there's a skill-like workflow that directs the models to use my specialized subagent tools. What's crazy is that I can have a Codex main agent and a Claude subagent running in Claude Code. I can have a Cursor explore agent if I want to use Composer, or a Gemini design agent if I want a critique. It supports all of them via ACP and other integrations. So there are different model presets for design, exploration, engineering, and so on. I just talk to the main model and say, hey, here's what I need to do. It takes care of running an agent to understand and map the codebase, has a dedicated model do the analysis, and then decomposes the work. I'm just talking to this agent, saying, hey, let's queue up this work. It acts as my task tracker and delegates work as needed, very efficiently. I can interrupt it and add requirements on demand, and it steers the models and checks on them as they're working. I've had reports from people where it spins up Claudes, and sometimes the Claudes spin out and start doing things that are no good. The master agent will stop it, kill it, and say, nope, start a new one and redirect it, all on its own. 
And it's validating its work as it goes. That's the whole flow.
SPEAKER_03: Yeah, I want to circle back to that, because it's something I'm going to be digging into a bit more. What's your orchestrator, and what other models do you prefer?
SPEAKER_04: Right now I've got the $200 Codex plan and the $200 Claude plan. Because I've got all these tokens to burn, I'm kind of going ham. At the same time, the setup with this workflow is very efficient. Because of how it delegates work and how well curated each context window is, it's actually nowhere near hitting any of the limits, which is crazy. To your point about what I'm using: I really like Opus 4.6 or 4.7 for exploration. I'd recommend Sonnet on high for people being reasonable, but if you've got tokens to burn, Opus is great. The main agent I talk to is GPT 5.4, and I also delegate to GPT 5.4 as the primary backend worker. I have a dedicated design agent on the latest Opus, Opus 4.7, which is in charge of design reviews and UI work. Then there's a cheaper engineer model, which I set to Codex 5.3 on medium, so if there's a simple task, Codex just does it. This way you're load balancing between the plans and between what each task needs, and the workflow already specifies how to divide the work, so I don't have to do any of that micromanaging. I just tell the model, let's get this work done, it deals with it, and it's generally really solid. The quality has gone way up. I'm able to tackle really large refactors that just work, and each of the pieces does a lot of validation. It's insane how much more productive I've been since I shipped this. Kind of nuts. I think we're going to see more and more of this kind of delegation and orchestration work. 
From what I've done with it, I've noticed that the hard part right now is that the tools for managing subagents are really lacking, so this kind of micromanaging by the main model isn't really doable. A lot of them just have fire-and-forget subagents. But you need subagents that are persistent, that can keep doing work, and that the models can keep prompting. I think there's a lot there.
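Eric contrasts fire-and-forget subagents with persistent ones that the main model can keep prompting. He also mentioned earlier that the orchestrator kills a subagent that spins out and starts a fresh one. Both ideas can be sketched together. This is a hypothetical, in-memory illustration:

- `Subagent`, `Orchestrator`, `looks_off_track`, and the responder functions are invented for the example.
- A responder stands in for a real model session.
- Real tools would manage actual processes or API sessions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Message = Tuple[str, str]                   # (speaker, text)
Responder = Callable[[List[Message]], str]  # stand-in for a model session


@dataclass
class Subagent:
    """Persistent worker: its history survives across follow-up prompts."""
    name: str
    respond: Responder
    history: List[Message] = field(default_factory=list)
    alive: bool = True

    def send(self, text: str) -> str:
        if not self.alive:
            raise RuntimeError(f"{self.name} has been stopped")
        self.history.append(("orchestrator", text))
        reply = self.respond(self.history)
        self.history.append((self.name, reply))
        return reply


@dataclass
class Orchestrator:
    make_responder: Callable[[], Responder]
    looks_off_track: Callable[[str], bool]  # hypothetical check on a reply
    spawned: int = 0

    def spawn(self) -> Subagent:
        self.spawned += 1
        return Subagent(f"worker-{self.spawned}", self.make_responder())

    def steer(self, agent: Subagent, text: str, brief: str) -> Tuple[Subagent, str]:
        """Prompt an existing worker; if it spins out, stop it and restart with a fresh brief."""
        reply = agent.send(text)
        if self.looks_off_track(reply):
            agent.alive = False
            agent = self.spawn()
            reply = agent.send(brief)
        return agent, reply
```

The history lives on the subagent, so follow-up prompts build on earlier turns instead of starting cold. A fire-and-forget design would lose that property.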
SPEAKER_03: I think you're pulling on a cool thread. I've only seen this in specific modes, like Droid with missions, or other apps that try to pull them together. But for the most part, people are just staying in their own silos. Yes, yes. There are advantages to these different models, and I love that you're actually pulling them in natively, and it sort of sorts itself out. I'm super curious now. I can't wait.
SPEAKER_04: And I've set it up to default all the models to my preferences, too. So you just pull it up, connect your CLIs, and it's all configured; you just prompt it and it works. There's a lot of complexity in what I was describing, but all of that's hidden away when you use it, which I think was good work there.
SPEAKER_00: What's really cool about that is it's very similar to a theory I had, I don't know, 12 months ago honestly, that I tried to build with Roo Code, which is very much that same orchestrator level. My concept behind it, and it is somewhat coming true, was that AI is going to keep getting more expensive, but you don't need the best model to do a simple task. I don't need the model with all the world's intelligence to change the color of a button. That was my premise.
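Adam's premise is that frontier-model intelligence shouldn't be spent on changing a button color. Eric's role presets make the same point. Both reduce to a routing table that sends each task to the cheapest model that can handle it.

The sketch below is hypothetical. The role names, the keyword rules in `classify`, and the model labels are assumptions loosely mirroring the setup described in this episode. The labels are placeholders, not real API model identifiers or Repo Prompt's configuration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Preset:
    model: str   # placeholder label, not a real API model ID
    effort: str  # hypothetical reasoning-effort setting


# Role presets loosely mirroring the episode: Opus for exploration and design,
# a strong main worker for engineering, and a cheaper model for simple edits.
PRESETS: Dict[str, Preset] = {
    "explore": Preset("opus-4.7", "high"),
    "design": Preset("opus-4.7", "high"),
    "engineer": Preset("gpt-5.4", "high"),
    "simple": Preset("codex-5.3", "medium"),
}

# Toy keyword rules, checked in order; a real orchestrator would let the main model decide.
RULES: List[Tuple[str, Set[str]]] = [
    ("design", {"ui", "layout", "css", "design"}),
    ("explore", {"explore", "map", "locate"}),
    ("simple", {"rename", "typo", "color", "bump"}),
]


def classify(task: str) -> str:
    words = set(re.findall(r"[a-z0-9]+", task.lower()))
    for role, keywords in RULES:
        if words & keywords:
            return role
    return "engineer"  # default: the strong general-purpose worker


def route(task: str) -> Preset:
    return PRESETS[classify(task)]
```

For example, `route("Change the button color")` lands on the cheap preset, while an unmatched task such as a large refactor falls through to the default engineer. Rules are checked in order, so a task mentioning both UI and a typo goes to design. Whether that tie-break is right is exactly the kind of call the main model could make instead.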
SPEAKER_02: Yeah.
SPEAKER_00: So the challenge I ran into, which it sounds like you've overcome a bit, is the context passing that has to happen between all the different subagents. It's getting each subagent to understand where we are and the series of work that needs to happen.
SPEAKER_04: Well, I think there are two ways to overcome that. One is having a dedicated plan file that the main agent writes to and passes around; that's a super helpful anchor. The second is the models themselves. I don't think models could orchestrate like this six months ago, but with GPT 5.4 they've gotten to a point where they can keep track of the goals and all your steering. They keep the other models on track with the work, follow the workflow, loop over it, and just keep going. And they compact well, so even after a compaction, 5.4 can keep running this loop. Generally, though, it doesn't compact, because it stays very lean and delegates the work to other agents. I tried running this with 4.7, and to be honest it wasn't doing a great job at spinning up threads, keeping track of them, and waiting. So I wasn't super impressed with 4.7, and that's why I like GPT a lot for this work. But I think a lot of the problems are around the tools, their efficiency, and the managing of agent threads. I don't think enough work has been done there, and I think I put in the work to make it work. So yeah, that's kind of where I'm at.
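Eric's first fix for context passing is a dedicated plan file that the main agent writes and hands to every subagent. It can be sketched as a small JSON file on disk. This is an assumption-laden illustration, not Repo Prompt's format. The file name, the schema (a goal plus a list of steps with a `done` flag), and the prompt template are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def write_plan(path: Path, goal: str, steps: List[str]) -> None:
    """Main agent writes the shared plan once, up front."""
    plan = {"goal": goal, "steps": [{"text": s, "done": False} for s in steps]}
    path.write_text(json.dumps(plan, indent=2))


def mark_done(path: Path, index: int) -> None:
    """Record progress in the file so every later subagent sees it."""
    plan = json.loads(path.read_text())
    plan["steps"][index]["done"] = True
    path.write_text(json.dumps(plan, indent=2))


def brief_for_next_step(path: Path) -> Optional[str]:
    """Build a subagent prompt anchored on the plan; None once every step is done."""
    plan = json.loads(path.read_text())
    steps = plan["steps"]
    pending = [i for i, s in enumerate(steps) if not s["done"]]
    if not pending:
        return None
    i = pending[0]
    done = [s["text"] for s in steps if s["done"]]
    return (
        f"Overall goal: {plan['goal']}\n"
        f"Already done: {', '.join(done) or 'nothing yet'}\n"
        f"Your task (step {i + 1} of {len(steps)}): {steps[i]['text']}"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plan_file = Path(tmp) / "plan.json"  # hypothetical file name
        write_plan(plan_file, "Add CSV export", ["map the code", "implement", "test"])
        mark_done(plan_file, 0)
        print(brief_for_next_step(plan_file))
```

Every subagent gets the overall goal and the progress so far from the same file. That anchors fresh context windows without the main agent re-explaining everything each time.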
SPEAKER_00: Sounds like I need to go load that up. Yeah, I need to get a Mac.
SPEAKER_04: Yeah, absolutely. And Ray, you've got one, so you've got to let me know what you think, because I want to hear how it works for you.
SPEAKER_03: Yeah, I've got a Mac mini. Next I'm just going to be texting it plans over Telegram, and it's going to do its thing.
SPEAKER_04: Well, the thing I've gone crazy with, too, is that the tool to start the orchestrator is available over CLI and MCP. So you can actually have your claw just talk to it, start the work, and have the orchestrator run, and you just check in on it. It's going to be the Mac mini running Repo Prompt, with your claws checking in on the orchestrators and telling you what's going on. It's nuts, honestly. All right, cool. I think this is probably a good place to stop; we've gone a little over time today. Thanks for giving me a little space to talk about that. I'm really excited about it. It's been doing incredible work for me, and my community has been thriving with it, so it's been great. Lots coming down the pipe. I don't think we've seen the last of these new models, and it's probably going to be another busy couple of weeks, I'm sure. So we'll see what happens. Thanks for tuning in, everyone. Any closing thoughts for today, Adam?
SPEAKER_00: Nope. Just leave comments below; I love reading through them and hearing what you guys are trying. Also, if you haven't already, subscribe and follow on the regular podcast platforms like Apple and Spotify; those help a lot.
SPEAKER_04: Yeah, definitely. How about you, Ray? What have you got to say?
SPEAKER_03: Yeah, just thanking everyone for following along on this journey. If you're listening to this, you're probably still early. I know it kind of feels like you're falling behind, but please don't feel that way; we all feel that way as well. This is unexplored territory, and it's part of the reason we started having these discussions. It's really nice to sit down, catch up every couple of weeks, and see what's been happening in this space. We encourage everyone else to do the same, and to be thoughtful in the comments as well. It's more about encouraging others to grow and lifting each other up, because there's a new paradigm here with AI. We're all trying to leverage it in different ways, and people are making discoveries and seeing cool new stuff. We're just going to keep seeing more, and it's an exciting time to be around. So I'm really happy you were able to join us. Share it with a friend; that's super helpful. Five-star reviews all over the place are going to be really helpful too, and we're trying to notch it up to the number one podcast in the world.
SPEAKER_04: At least in the AI category. Yeah, in the AI category. All right. Well, thanks everyone for tuning in, and we'll catch you next time.
SPEAKER_03: Later, everybody.