Rate Limited

Claude Code Leak! Rate Limits keep changing, and Building Agentic Systems | Ep 13

Adam/Eric/Ray


This episode covers recent developments in AI, including source code leaks, rate limit changes, and the future of agentic AI systems. Experts share insights on managing AI projects, building reliable agents, and navigating the evolving AI landscape.

Links:
Ray: https://www.youtube.com/@RayFernando1337
Eric: https://www.youtube.com/@pvncher
Adam: https://www.youtube.com/@GosuCoder

Chapters
00:00 Introduction to Rate Limits and Coding Challenges
01:36 The Claude Code Source Leak Incident
07:05 Rate Limits and User Experience
11:59 Choosing the Right AI Tools for Coding
18:02 Building Agentic UIs and Architecture
22:34 Exploring Claude Dispatch and Its Impact
31:07 Designing Agents Beyond Coding
40:22 Building Effective Agents for Business

SPEAKER_01

You are now tuned in to the Rate Limited Podcast with your host, Ray Fernando, Gosu Coder, aka Adam Larson, and Eric Provencher, the founder of Repo Prompt. We are here because there's a lot of rate limiting going on. If you've been using Claude Code and all these different tools, there's so much crazy news and gossip. And if you stay to the very end, we have some really cool stuff about how to do agentic coding, like agent-to-agent frameworks, and some really great deep-dive discussion from Adam Larson here. So without further ado, Eric, go ahead and kick off the show.

SPEAKER_02

Yeah, thanks for that, Ray. The first news item this week that really deserves to be touched on is that the Claude Code team apparently uploaded their source code online by accident. And it didn't take long before the entirety of Twitter grabbed it and distributed it all over GitHub, so it's all over the place. Now I think the lawyers are coming for all those forks and whatnot. It's really interesting that a mistake like that could happen. From what it seems, there was just a process error where a developer accidentally included a source map, and the team has been very gracious about not throwing that dev under the bus. I think we can all relate to mistakes happening in prod — there are processes in place for a reason, and we want to make sure these things happen correctly in practice. Ray, you worked at Apple for a long time, and you were telling me earlier that these things can happen even at large companies like Apple, even without AI involved. So why don't you tell us a little bit about what that's like, making big releases?

SPEAKER_01

Yeah, I think the most important lesson I learned over all those years — and I don't speak directly for the company, this is just my own experience as a software engineer — is that if I'm running a company now with employees, the release process is extremely important, and there are certain things you want to check. One of the first bugs I found actually prevented a whole bunch of crazy stuff from happening; I assume it would have cost the company a lot of money. And the part I appreciated from all my managers was the attitude of: just don't make the same mistake again. That really got me, as a really junior engineer, to pay attention to all these details. Even if I delegate to an AI system: what rules are in place? Is there a human involved? Are there really specific critical steps? If something is important enough to the business that a mistake would be costly, more eyes and more attention are going to be on it until we have a system that can take over. So the big takeaway I'd offer is: if you have some business-critical thing that's going to impact your business a lot, make sure you have some process around it — other folks to check it, whatever's involved — even if it takes time away, because otherwise all your hard work can just go away overnight.

SPEAKER_02

Yeah, it was a huge deal to see this source leak. Anthropic's been very secretive with a lot of their stuff — not so open like some of the competitors, since most of the other harnesses are open in this way. So it's interesting to see it happen, and to see what's going to come from it. I do wonder what's going to happen now. But I think the takeaway, seeing from people who have been scrutinizing the source, is that there's no magic in that harness. People are trying to find the special sauce, but at the end of the day the model is just used to a certain shape of work — these response structures, message formats, and such — and there's a nice little set of tools around the UX of it. I think there's some stuff to be learned there, though I'd be careful about overly scrutinizing the source: it's not public, so be careful what you read. But yeah, I don't think there's magic in there. We're all just exploring and figuring it out as we go, and I think the labs are doing the same, so it's interesting to see. So yeah — any thoughts on it, Adam, before we move on?

SPEAKER_03

Yeah, honestly, it was bound to happen at some point in my mind. I worked in the video game industry for six or seven years, and pretty much every video game we released was cracked within days, with source code out there. It's just one of those things: humans are going to make mistakes, and people are going to figure out ways to decompile things. And to your point, Eric, I don't think there's anything that special about any of the harnesses, really. A lot of it comes down to how the model interacts with the harness, how the models are trained, and how they behave. I would say, to Ray's point, that as a company matures there are better checks and gates that products need to go through before release. This just shows me Anthropic is moving fast and breaking things — they're probably still operating quasi in startup mode. But they're a multi-billion-dollar company now, so they're probably going to have to start putting in some actual formal release processes and make sure whatever boxes need checking get checked. A lot of people aren't going to like that, but it's just the nature of things: if you need to protect your IP, you have to do it. And I'm sure there are a lot of people at Anthropic upset right now that their code is out there being looked at. But like I said, bound to happen. It's been the nature of the internet my entire life — the entire time the internet's been around, stuff like that has happened.

SPEAKER_02

You know, I did read something interesting about this as well: I mean, it's JavaScript, right? TypeScript. So even the binary that's released is reverse-engineerable — it's minified JS, and you can recover a lot from that. So it's not like this stuff wasn't available if you knew where to look. The difference, from what I did see, is that there are a lot of internal-only parts of the code that were previously stripped from the build, but because the source was fully released, all that stuff is available to look at. That's why some people were able to find secret model code names for stuff that's upcoming, and interesting prompts and things to dig around in. There are lots of threads and people digging around on X if you want to read about it, and I'm sure on YouTube as well.

SPEAKER_03

Okay, one theory: the Tamagotchi game thing that they released in there — it had to be an April Fools' thing, right?

SPEAKER_02

I don't know. It sounds like it was in there and planned for a while. They put a lot of work into it. So it has to be an April Fools' thing? Yeah, it's gotta be, man. I think they're just cognizant that a lot of people are staring at the little thinking spinner, and they're like, well, what are we gonna have people do while they're waiting? They start scrolling, and we want to keep engagement, so let's bring them into the app and have them feed a creature. Maybe. Yeah — just Subway Surfers, bro.

SPEAKER_01

We just gotta get those Subway Surfers.

SPEAKER_02

Yeah, yeah. Just Italian brain rot all over the place, right into your code. All right. The next topic is, I think, a little more relevant to the folks on the pod here: Anthropic actually really tightened up some rate limits in the last couple of weeks. A lot of people were seeing their usage limits blow up. The official word was that they actually made a lot of efficiency improvements, and that only 7% of users would notice the tighter limits. To be honest, 7% is actually quite high in my opinion, because all the power users are in that 7% — pretty much everyone on X who's talking about it is in that 7%. So it's bound to feel like a larger number than it seems on paper. If you think about it, a lot of the people outside that sphere of power users are probably using it quite casually, so of course they're not going to hit those limits. But it seems like there's even something else going on on top of this, where limits are being hit faster — potentially prompt caching failing in some cases — and that's quite an interesting thing to observe. So I'm curious: have either of you been hitting the tighter rate limits? It seems like they're being squeezed because of all the new users coming in, and there's only so much GPU capacity to go around. Ray, have you been hitting them a lot lately?

SPEAKER_01

I haven't really been using Opus in the actual Claude Code harness. A lot of my usage has really been external agentic stuff — Claude SDK testing, that type of thing — so I haven't really seen it there. I have this agentic framework that's kind of running as my own little thing, and that's basically what I use a lot of my sub for, just testing around that. So my limits look okay. I use Claude Cowork and Claude Dispatch, and they seem to be very generous there too. I have the $200 Max plan, so that's pretty good. And I just did taxes, did a whole bunch of stuff, all kinds of requests, didn't matter what time of day — I still seemed to have a lot of limit left.

SPEAKER_02

Well, it seems like the tighter limits hit from, I think, 5 a.m. to 11 a.m. Pacific time. So if you're doing your work outside those hours, I think they're pretty okay. And I know they have a temporary promo where outside of peak hours you get double limits. So it's kind of a whiplash: if you're using it in peak hours you're getting basically half limits, and afterwards you're getting double, or normal. So it's kind of hard to gauge your usage and see what's fair. I noticed when I moved up to the hundred-dollar plan — I'm mainly a Codex user, but I started turning to Claude for certain tasks in the last couple of weeks, so I wanted to wrap my head around it a bit. I was on the $20 Pro plan, and the first day those limits kicked in, I was just using Opus a little, and within 25 minutes the limits were gone with two chats running. You could just watch the bar fill up, so that was really terrible. And then especially on the $100 plan — I mean, I had it on just medium at some point, and I've been pushing it up to high, because it's a lot smarter on high. It's just tough to use it a lot. And I think the OpenAI limits are actually coming back down now as well — I think April 1st or 2nd is when the 2x promo ends, so we'll see what they do about that. They just reset rate limits again yesterday, so it's very hard to tell what you get with Codex, as they just keep resetting them; it's very hard to hit the weekly limits. What I have noticed with Codex, though, is that you're a lot more likely to hit the weekly limit than the five-hour limit, whereas with Claude, I almost never hit the weekly limit but almost always hit the five-hour limit. So that's a little unfortunate. What about you, Adam? Have you been spending any time in Claude Code lately?

SPEAKER_03

Yeah, that and Cursor are my main two. I will say — and I hate to even say this — I'm kind of thankful they put these limits in place, because there was a time with so many outages that it was basically unusable. No one wants these limits; I don't want these limits, and I still hit them. But I also want stability, so that when I open up Claude Code I'm not going, oh crap, go check the Claude status page — yep, it's down again.

SPEAKER_02

Yeah, yeah, yeah.

SPEAKER_03

So that was my life for a while. And then it was like, okay, back to Cursor for a bit. I am hopeful they can stabilize, and if they can, then maybe they can loosen the reins once they figure things out. So I'm trying to keep that positive twist on things, but honestly, I hit those limits all the time — especially because I'm in the Eastern time zone, and that window feels like prime coding time.

SPEAKER_02

Yeah. Morning Eastern time is the worst time to be coding — it goes through to 2 p.m. Eastern, so that's really tough if you're trying to get work done in your normal workday hours. They really want people to be night coders, I guess. That's just how it's gonna go now.

SPEAKER_01

So how would you split it if you had a couple of plans? Would you get two Pro plans from one specific vendor, or one and one — you know, 400 bucks split up a bit? I'd do one and one.

SPEAKER_03

To me, Claude is down so often — at least in my personal opinion, especially during the times I'm coding — that it was literally daily: I'd have something happen, I'd go check the status page, and oh, they've got something going on now. Or Opus is down but Sonnet's okay, whatever's happening. So I'd have to hedge and have two different providers, personally, just because I want the reliability to actually have something work. Which is why Cursor and Claude Code are my two main ones right now.

SPEAKER_02

Cursor and Claude Code — because you gotta have Opus, like you're not turning to the GPT models? I turn to GPT models when I have to.

SPEAKER_03

Uh I'm still definitely a bigger fan of the Opus models.

SPEAKER_02

Can you talk a little bit about what that split looks like for you, and why you feel like Opus — or the Claude models in general — is doing better for you?

SPEAKER_03

You know how when you work with an engineer on your team — you don't know them, but they absolutely crush a project they're working on, and now you've built up trust that they're actually going to do a good job? What I've learned with Opus is that if you do the planning up front, you can just let it rip, and 90% of the time it's going to come back with something very solid. Now, I will say, there are times when, if you don't do the proper up-front planning, it will infer things it shouldn't — all the things that would happen with a normal engineer. For me, Opus has just built that trust: when I know I want something to work, I know how to talk to it, I know how to plan with it. If I need something to happen and be done, Opus is what I turn to. Now, on the GPT side, I actually like using GPT to review the code that Opus writes, just as that secondary perspective. I was using Gemini for a while, but I've merged over to the GPT side a bit more. And I don't know — maybe I need to hard-limit myself to only using GPT models for a week, just to see if I can build that trust with them. But right now it's Opus.

SPEAKER_02

Yeah, I'm curious though, because you're using it mainly for these long plans in Cursor, right? Yes.

SPEAKER_03

And Claude Code. Yep.

SPEAKER_02

Okay, so in my experience with Claude Code, the compaction is just not that great. And if you're using the 1-million-token context window on Opus, I find that over 200k tokens it's really hard to trust the model. Same with GPT — I don't trust large-context GPT either. But what I do find is that if I'm executing a large plan, and I'm very clear up front, and I provide a key way for the model to recheck where it needs to get the information it needs, then the GPT models are better at executing over multiple compactions to solve the full scope of your task. Whereas with Claude, you can let it run for a while, but if the task goes beyond the scope of maybe 300k tokens and you let it go without compacting, it's going to start falling apart. And if you let it compact, it'll probably forget some things. I know Cursor has a better compaction system than what you get in Claude Code, so I think that's something to consider as well — the scope of your plan. In general, you don't want to be doing huge, huge one-shot plans; I don't think that's best practice at the moment. But I'm curious how that colors your experience — and I'm curious about yours as well, Ray, afterwards.

SPEAKER_03

I'd agree with you — there is a limit to how much you can push, and I'm actually still of the mindset that 200k and under is pristine, which is why I don't use a lot of MCP. I do not like to pollute my context any more than I have to, so I'm very much on that side. I also agree that compaction is better in Cursor: it feels very natural, and it doesn't compact so much — you'll actually watch it, and it'll only reduce the context by maybe 30 or 40 percent, so it's not a really aggressive compaction, which I think is good. And I also agree that trying to generate a giant PR in one go is useless. The plans I typically work on, I try to keep isolated to two or three files, or one new feature that's going in — I'm not trying to do an entire application from beginning to end. But I still come back to this: I've done some amazing things with Opus, and it has just built that trust with me. The way I'd build that trust with another model is to limit myself to just using it and figure out how it works — pretend Opus doesn't exist for a week, only use GPT models, and see what I can pull off. Maybe I'll do that this coming week and see what happens.

SPEAKER_02

It's worth trying. I mean, from my experience, GPT-5.4 is great. I think the only thing is you have to be a little more specific with how you talk to it, because it'll take things more literally. But I find that generally, if you want code that works, GPT is better. The one downside, though, is that if you let it run too autonomously and figure things out on its own, it tends to produce a lot of complexity and writes a lot of useless tests that you have to pull back and clean up after. It loves stuff like that. That's the one downside there.

SPEAKER_01

It loves to literally guard everything. I'm like, bro, just chill for a second.

SPEAKER_02

Yeah. Well, both models are allergic to breaking things, so they both add all these complex backwards-compatibility fallbacks — they're always doing that. It's a lot. You've gotta really monitor what these models are doing, or your codebase becomes the model's codebase very quickly. So yeah.

SPEAKER_01

Yeah, one thing I wanted to add on the Cursor side — because you can reference past conversations — is how they handle compaction: they basically jettison your messages, all of the agent messages, into JSONL files. So from a compaction standpoint, they keep some kind of running context going, and then go back in at certain times and read back in the relevant pieces they need. It's kind of crazy how much it's continued to evolve, and the speed is really interesting. It's like wearing different gloves, you know? I used to play baseball — pitcher's glove, catcher's glove — there are different tools you use depending on your use cases. I've actually signed back up for ChatGPT Pro, specifically because I'm doing this massive architecture thing where I want all my agents talking to each other through the Apple Watch — even though I don't wear it anymore; I've actually gotta get it back out. The Apple Watch has been a great forcing function for building UIs, actually. It's a perfect thing for agentic UI thinking, because it really is just basic elements: a tap, a touch, and a microphone. You can do a lot with just those simple gestures. And one thing is, if you raise your wrist, you only have two to three seconds to do something before your arm gets tired. If you start to think about UI building in that space, you start to eliminate a lot of things in your UI, and you start to focus on the use case you actually want.

And so that type of thinking has been really cool, because now I'm building a thing that's going to track all my agents — like OpenClaw and all the things it's spawned out. It's going to track Codex, what things I've kicked off, Claude, and so forth. I've kind of architected it in a way where I'm going to have this massive plugin system, so each of them works independently. But I want to get a good reviewer on it, so I was going to spin up Repo Prompt and the ChatGPT Pro thing. So I'm gonna hit you up, Eric, and we're gonna try to figure this out — that's what I want to use the Pro plan for. From what I understand, right now it's just a couple of giant markdown files — two or three — that really describe the spec I'm trying to build and the little details for each of the components. So how would I approach this, Eric? Because the file is growing bigger as I add more things. Do I start to separate my files, or what do you think I should do?
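The jettison-and-reload pattern Ray describes — writing agent messages out to JSONL files and selectively reading relevant pieces back in after compaction — can be sketched roughly like this. This is a minimal illustration, not Cursor's actual implementation: the file name and the naive keyword-based recall are hypothetical stand-ins for whatever smarter retrieval a real harness uses.

```python
import json
from pathlib import Path

LOG = Path("agent_messages.jsonl")  # hypothetical message store

def log_message(role: str, content: str) -> None:
    """Append one message as a single JSONL line (the 'jettison' step)."""
    with LOG.open("a") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")

def recall(query: str, limit: int = 5) -> list[dict]:
    """Selectively reload past messages relevant to the current task.

    A real harness would use summaries or embeddings; plain substring
    matching just shows the shape of the read-back step."""
    if not LOG.exists():
        return []
    hits = []
    with LOG.open() as f:
        for line in f:
            msg = json.loads(line)
            if query.lower() in msg["content"].lower():
                hits.append(msg)
    return hits[-limit:]  # keep only the most recent matches

# After compaction, only a summary stays in context; details live on disk.
log_message("assistant", "Refactored the billing module into billing/core.py")
log_message("assistant", "Added unit tests for the auth flow")
```

When the agent later needs the billing details again, `recall("billing")` pulls just that message back in, instead of carrying the whole transcript in the context window.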

SPEAKER_02

I mean, I can start building.

SPEAKER_01

All the code I have is, like, a basic HTML prototype that clicks through and shows what I want. But like, yeah.

SPEAKER_02

I mean, my advice is: don't over-spec up front. Try to start building, but lay down good foundations so that it's scalable, and constantly gut-check — read the code, or ask the agents: how can we make this more scalable? Here's what I want to add eventually — how can we lay down some groundwork so that's possible? I think setting up projects is actually a huge challenge with agents, because they kind of just slop it out, and you have to be really disciplined, because the bones of your project are what determine the architecture going forward and how the project sprawls out and evolves. So my advice is: plan small early on, get the core functionality in, and once you have it working, start adding the pieces you need one by one, and clean up the architecture as you go — not just rip it off and go. Once you have those clean bones, you can start running a little further and further. But up front you want to take short sprints, so you can make sure your code is growing healthily.

SPEAKER_00

Yeah, that's uh great advice.

SPEAKER_03

Yeah, I'd add something pretty similar: get end-to-end as quickly as possible — I think that's what you mean by bones, Eric. Too many people I see don't do that, and then you end up in a mess. End-to-end as quick as possible, get the framework in place, and then go.

SPEAKER_02

Yeah, because then you might have described some features without really thinking through how they interact, or you might get things that sound like what you wanted but aren't actually correct. That's right. And I'd also invest a lot in finding ways to API-ify your codebase, so that models can call into it and test it headlessly without having to run through the whole UI. That's really powerful stuff now: making a CLI for your app, finding ways to poke around the functionality and create good test plans. Obviously, general unit-testing frameworks are great, but often having the whole machinery running is really helpful, and being able to call into the things you need is very helpful for testing. So closing the loop is very important — but so is getting your hands dirty, playing with it, and seeing how it feels. The models can't do that well right now if there's a UI involved; you need to be able to see how it feels, what looks correct, and what's missing. So the sooner you can get to that, the better. Yeah. Sweet. I was just looking at this Claude Dispatch thing — it just came out, and I know the two of you have been playing with it a lot. So, Adam, why don't you tell us what it is and how it's been affecting your day-to-day?

SPEAKER_03

Yeah, I actually had a friend message me and say: hey, I've been using Claude Dispatch, you gotta check it out. I had heard about it, but there's so much AI news, you can't really keep up with it all. He was telling me what he was doing with it, and I was like, man, that actually sounds pretty cool. So, first, let me start with what it is. Within Claude Cowork — in the Claude desktop app — you can go into Cowork, and there's a Dispatch menu item on the left side. If you click on that, you can basically create an asynchronous task, and that task will run to completion, as long as you have the permissions set up right. For example, I've hooked up Google Docs and a couple of other connectors. So I can go through and say: hey, I want you to do this research, link where you found it, build a spreadsheet, and put it in Google Drive for me. Then I come back in the morning, and it has the research done, everything in a spreadsheet, all the formulas and things set up for me. It's incredibly nice, because it just runs — it runs in the background, I go to bed, come back in the morning, and it's done. I've used it to create projects too: I made a few task-management-oriented apps for this business my kids are starting up, and it's been super nice. And Ray brought this up, but I also used it for my taxes. I had a bunch of individual receipts and things from credit cards, and I had it bring them together: okay, this receipt matches this, here's the categorization. It's incredible, actually. I really like it. Eric, you mentioned this — it is buggy at times. There have been times I've come back and, for whatever reason, it failed to create the Google Doc, and I had to prompt it a bit to get around that. But when it works, man, it is so nice.

SPEAKER_01

And how do you notice on the side that it like kicks off this little thing? It seems like its own separate context window to achieve the task that you're doing. Okay.

SPEAKER_03

It's got, like, almost a project view, where you can actually see all the context and the things it's building up. You can even have files within its context. It's very, very nice.

SPEAKER_02

Yeah. So I'm curious: is it mainly the UI that you find appealing for this work, or is the experience of using it significantly different versus Claude Code directly?

SPEAKER_03

For me, I gotta say, it's mostly the UI. In my day job, I always think about how there are agents you interact with to get a job done, and then there are autonomous agents. What this feels like to me is a purpose-built system for autonomous agents: you give it a single job and it'll run it to completion without constantly asking for feedback along the way. Sometimes you want the autonomous nature of AI, and I think they've done a good job wrapping that up here.

SPEAKER_01

Yeah. For me, I would typically set up a Claude project and keep a chat going. But the problem was I'd constantly have to create handoffs between all my different chats, even though I had a project folder. When I'd create a new chat, I'd say: can you reference the other chat? Recently — in the last month — they started to support that with the introduction of memory, so within one project you can reference all of your chats. But with Cowork, that's just your project: the whole thing is one big-ass chat. And then you see it launching off these different — basically it's a sub-agent, in essence, to achieve your task, and it has all the context for it. So you just keep ripping. You can still create a new thread, and then you have basically a different project that you're working on. That type of workflow is what I actually appreciated about OpenClaw: you could just have this thing that runs forever, and you don't have to think about compactions or anything — it does whatever it does. But to me, this is less debugging. I know it's going to pick things up; I didn't spend any time saying, okay, now go pick up this chat or do this. It was just: cool, now let's build a spreadsheet. Okay, now if my accountant were looking at this, what type of things would she want to see? And — don't clip this, anyone — but basically, finance bros are cooked. It does an incredible job. There's a skill for writing Excel files that I forgot Claude had. It's so good. It's crazy.

SPEAKER_03

To add on to that, Ray, one thing I was trying to do: for me, the hardest thing is meal planning. I know this sounds ridiculous, right? Meal planning, getting the grocery list together. So I was like, okay, plan a meal; here are the things I've done in the past. So it did that and built a spreadsheet on it. And I said, go to Walmart.com, add everything to the cart. And it added a bunch, until Walmart realized it was a bot operating it and blocked it. So it's like, dang it. We need websites that are built for agents to interact with. We do. We need a Walmart CLI.

SPEAKER_02

You know, it's crazy. The grocery list thing is really nice. There are a lot of little tedious things you end up doing day-to-day, and if you take the time to let these agents automate them, you can actually get a lot of time saved and quality of life gained back. It's very tedious to pick a recipe, pull out the ingredients, make sure you get the right amounts and get everything on that list. That's a lot of wasted time, and it's nice to have that kind of automation for it, especially if you don't have to think about it and your cart's auto-filled. I mean, that's a dream for sure. That's very cool. So that's great, you guys are diving into it. Are you finding you're using it a lot from your phone, with the tools around that, or are you mostly using it on the computer?

SPEAKER_03

Probably 50-50 for me. Um, yeah, really. I'm curious if you're the same.

SPEAKER_01

It's mostly been on the laptop. Even though I have it on the Mac Mini, all my financial stuff, my Mac Mini is literally just for my little claw guy. I was like, no, no, no crazy stuff going on in there. So I kind of have some isolation, but it's making me think: whenever the M5 stuff comes out with these Apple announcements, if they release an M5 Mac Mini, that's gonna be my beast mode machine, running all this headless stuff and all these cool things I want, to really bump it up a whole notch. Because I like that system a lot.

SPEAKER_02

All right. Well, that's cool. I do worry a little bit about overdoing it on the agent side with your phone. It's a little bit of a vampire, you know: if you're already spending too much time on your computer, especially with the coding agents, and then you start delegating and managing agents constantly from your phone, it's just quite the suck of energy and attention. It's a bit of a concern that I see going forward. Curious if you all feel the same about that. Yeah, I definitely do. Just to add on to that. Yeah, I mean, as a family man, you've got all that concern going on. Definitely. Yeah, yeah.

SPEAKER_01

I've talked to a lot of people who feel that way too. Do you think it's a phase? I'm just curious, you know?

SPEAKER_02

No, I think the more automation this stuff brings, the more people are gonna want to do on their phones. You see the draw for it. I could see the next generation being just computerless: they're connecting to cloud agents, doing all their programming from their phones, never looking up at a bigger screen, just frying their eyes. It's terrible. I know it's not gonna be phone terminals; it's gonna be messaging apps, you know, the form factor solved it on the phone. Let's go. Oh man, yeah. Oh jeez.

SPEAKER_03

We're all gonna need a day where we just don't use a phone at all. I think every person's gonna have to get to that point: a no-phone day. Yeah, dangerous, yeah.

SPEAKER_02

All right, I wanted to switch gears a little bit. We were talking before about what it's like to design agents that aren't for coding. And I want to dive into that with you, Adam, because I know that's what you've been working on. So why don't you tell us a little about the kind of work you're doing with these agents, and what kind of stuff you're thinking about when designing agents for that work.

SPEAKER_03

Yeah. If you don't know, my company that was acquired recently was focused on building agents in the marketing space. So I've very fortunately had the opportunity to learn, both the hard way, from customer feedback that was not great, and from actually getting good customer feedback, what works and what doesn't work from an agent perspective. But let's extrapolate that out, because if you think about the marketing hype around agents today, the promise is that they're going to basically automate everything we do. Let's apply that to any vertical: banking is one I'm familiar with, marketing is one I'm familiar with, accounting is one I'm familiar with. Then you start thinking about the future of point solutions going away. The idea would be that instead of having to buy 10 different applications, you buy one that can basically run your entire business. Each of us has a business that we run. You can imagine an agent that does everything you need, from the marketing to the accounting, the payroll, the taxes, the operations, the management of tasks. Think about the context switching that has to happen in that world. I don't want to get into too many specifics, but I'm just seeding the problem here. Coding agents are very focused. You're working with a very small set of tools, and then what people do is plug in MCPs to expose more, and people optionally bring those in.
But if you're Eric, running your business, and you need to figure out your marketing plan right after talking about your accounting and finances, right after you ran payroll, and all of this is being run agentically, whether autonomously, interactively, or some combination, the problem really becomes how you manage the context of things and what tools and subagents are actually available at any given time. So the architecture I've been building out is that in these types of systems, we're going to need a much more thoughtful process around building what I've been calling a context engine. Now, this isn't a new thing, but it becomes more important as the scope of what your agent needs to do scales. Because we all know that if you add a hundred tools to an agent, it's gonna be bad. So you can't have a hundred tools registered at any given time. What you need is intent: a way, based on what the user is starting a conversation with, to figure out their intent and map it to the skills, tools, and context that need to be assembled to start that conversation. Then the problem becomes: okay, you've got this, ideally, really great framework to do whatever job. So, Ray, maybe you're doing your taxes. The context engine built all that, but you get to the end and you're like, okay, now that I've done my taxes, I actually want to go do something completely unrelated. How do you switch over to that new batch of context? How do you rebuild it? So I think there's tooling that has to be built to manage these cross-intent activities while carrying along enough context to actually make that transition.
Because some of the things you were doing in that initial conversation may carry into the second one, but the tools, the skills, everything needed to do that second job are completely different. And what I've described is very interactive, right? You're interacting with the chat, talking back and forth. But there's also the idea of proactive agents. We were talking a little bit about that with Dispatch, and before the show with cron jobs you can trigger to go have things run. There's a difference between interactive agents and autonomous agents. So you also need to handle autonomous agents that do something for you: they do all the context engineering themselves and pull you in, and you can jump in later and interact with what that autonomous agent did. So that's the model I've been building out in my head. I'll pause there, because I think I threw a lot at you guys.

SPEAKER_02

Yeah, yeah. So I'm curious. You're splitting off this work in terms of managing the context for spinning off these other agents. How do you decide which models are going to do what? And how are you evaluating the success rate? Because a lot of the benchmarks we rely on when looking at these models are very coding-oriented. There are some that are computer-use or tool-calling oriented, but generally that's a little vague, and it's hard to see what good success metrics are. You're talking about intent inference, and all that stuff is super important, especially for more non-technical users trying to manage their business with these things. Being able to decipher what the user actually wants is a huge part of it too. So how are you thinking about it in those terms, and how are you evaluating the success metrics there?

SPEAKER_03

Yeah, so you're dead on. And some of the stuff I've talked about is subjective as to what is good in some of these situations. The key thing is to start by measuring something. What we try to do is have some set of golden prompts that we're consistently testing against, ideally automated in a way that checks we get the outcome we're anticipating at some threshold. Maybe your threshold is 70%, maybe it's 80%. But the main thing is you need to understand, when you make a change, how much it impacts performance on some set of data. Now, that golden set of prompts should change over time as you get user feedback. Because what you thought people were gonna do with your system, and what you theorized, may not be what actually happens. Which goes back to that thing we were talking about earlier, Ray: getting end-to-end is so vital, especially getting it to where a user can actually use it, even if you're slightly embarrassed by it. Because you start gaining that information where you're like, I thought people were gonna ask these three things, and that is not at all what they're doing; we actually need to tune the system to be more geared towards these other things. So it really is having that starting set of prompts, even if they change over time, just so you can measure, so you can make changes without risking things, evaluating the outcome, and then iterating on what that golden set of prompts is.
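The golden-prompt idea can be sketched as a tiny regression harness. Here `run_agent` and the per-case checks are made-up stand-ins; real checks would likely be richer, or graded by another model:

```python
# Sketch of a "golden prompt" regression harness: run a fixed set of
# prompts through the agent and require a pass rate above a threshold.
# The prompts and check functions here are illustrative stand-ins.

GOLDEN_PROMPTS = [
    {"prompt": "Summarize last month's spend",
     "check": lambda out: "total" in out.lower()},
    {"prompt": "Draft a welcome email",
     "check": lambda out: "subject" in out.lower()},
]

def evaluate(run_agent, threshold: float = 0.8) -> tuple[float, bool]:
    """Return (pass_rate, passed_threshold) for the golden set."""
    passed = sum(
        1 for case in GOLDEN_PROMPTS if case["check"](run_agent(case["prompt"]))
    )
    rate = passed / len(GOLDEN_PROMPTS)
    return rate, rate >= threshold
```

Run this in CI on every prompt or tool change; when real usage shows new question patterns, add them to `GOLDEN_PROMPTS` so the set tracks what users actually ask.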

SPEAKER_02

Yeah, it's interesting. I've actually been working in my own workflow in Repo Prompt on a system for exporting a prompt for another model, like when you're trying to pull in GPT Pro. So there's this layering of a model calling another model to gather context and then exporting the result. And I was getting feedback from users who were like, hey, the end result, when I give it to GPT Pro, is just giving me a prompt. And I could see that when users were asking the system, they were saying, hey, make me a prompt to do this. That's what they were asking for. And then the model was like, okay, great, I will make the task be to make a prompt. But the system was already tasked with making a prompt, so it was just redundant, and people were just not asking the right thing. So I had to go in and prompt the system to actually extract the intent from the question, to then funnel it the right way to get to the actual outcome, which is to make a plan or execute a code review. And that's something I wouldn't have thought to think about, because I'm just using it and it's working fine for me; I ask for a plan because I know what to ask of it. But it's interesting how people adding these extra layers of redundant prompting are causing issues. So I'm curious if that's something you're running into as well when you're thinking about these things.

SPEAKER_03

All the time, 100% of the time. What's amazing about this space, honestly, is that it's so unsolved, and I love it. A lot of times people get paralyzed by, well, what if it does a bad thing? What if it does something wrong? What if it does something you didn't expect? Well, yeah, it's gonna do that. That's AI; it's probabilistic software. We have to change the way we approach these things: build a theory on what we think people are gonna do, and we shouldn't just make it up; we should base it on real data if we have it. But then we need to iterate, and be okay with iterating, exactly like you said. We have that stuff happen constantly at Raylead, where I'm like, well, this works great, but then that's not at all how people were talking to it; they were putting things in there that were very different. And depending on the AI model, it will interpret it in different ways. You have that layer of it to deal with too.

SPEAKER_02

Yeah, and if you have models talking to other models, then that layer of chaining and prompting leads to areas where you might actually have compounding misunderstandings that lead you really astray. So it's a really challenging thing when you're orchestrating models like this.

SPEAKER_01

I'm curious about the architecture and the problem-solving piece of it. I feel like people who are running businesses right now are really curious about agentic coding. People are thinking, let me just go to OpenClaw, because people are talking about that, and all the YouTube videos are showing how they've transformed their businesses overnight. There's a lot of hype around that, and then there's the new Hermes model and so forth. And Eric, and Adam, you're just going right into the code. If I go to ElevenLabs, they have ElevenLabs Agents, pretty nice GUI interfaces that get me started thinking about a basic agent, right? They already have the stuff there and it's just plug and play. What should people prioritize first? Not only playing with the agents, because you just need to learn how they work, but, Adam, it sounds like you've done a lot of thinking about the frameworks for things. If you were to distill it down, maybe for your children, and say, hey, we want to run this cool e-commerce store, what part of it can I use agents for? I'm gonna try a lot of things; how would you guide them?

SPEAKER_03

Yeah, let's say you're in a business and your CEO says, we need to build an agent. There are a couple of different things you should start thinking about immediately. Literally everybody in the world can build an agent. Small agents are incredibly easy to build. No offense to anybody who believes otherwise, but they really are very easy to build. The AI is really good at doing the basic stuff.

SPEAKER_02

Well, can I just say one thing on that? Just so you know, for a basic agent, if you want to get that working, you just download one of the SDKs, or you can use the AI SDK, which wraps all of them, but the agents are basically built into the model now. All it is is you have a query and you give it a list of tools, and the model will just call the tools. You don't have to write a for loop yourself; there's nothing there. The model will call tools and then it will answer, and that's your agent. So that's the basics of it. It's nothing more than that.
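A minimal sketch of the kind of agent Eric is describing, with a stand-in `model` callable. The interface imitates common chat-completion APIs but is not any real SDK, and the harness still needs a small dispatch loop to feed tool results back to the model:

```python
# Minimal agent sketch: a query, a list of tools, and a model that
# either calls a tool or answers. `model` is a stand-in for any
# chat-completion client; its shape here is assumed, not a real SDK.

def get_weather(city: str) -> str:
    """Toy tool for the sketch."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def run_agent(model, query: str) -> str:
    messages = [{"role": "user", "content": query}]
    while True:
        reply = model(messages, tools=list(TOOLS))
        if reply["type"] == "tool_call":
            # Execute the requested tool and feed the result back.
            result = TOOLS[reply["name"]](**reply["args"])
            messages.append({"role": "tool", "content": result})
        else:
            return reply["content"]
```

The loop is the entire harness: the model decides when to call tools and when to stop, which is why wiring up a basic agent is so little code.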

SPEAKER_03

And you're spot on. Literally, there is no moat there. It is very simple. So, if you're a business, what sets you apart? What is something you can do that OpenAI, that Anthropic, that Google or Gemini cannot do? Ideally, that's going to be geared around context and action. What is the context that your business has that OpenAI won't have? If you're a business with a good data foundation, you need to start theorizing around what you can do in your agent that actually gives better outcomes to your customer than they could get just going to chatgpt.com. Otherwise, people are just going to go to chatgpt.com. And then the action side is: what is it that your company does, for whatever the job to be done is, that is hard to do elsewhere? And how do you make that agentically accessible? For example, I can go to chatgpt.com today and design an email. But what if I wanted to send it to a thousand people? That is an action ChatGPT can't do. Those are the kinds of things I would really anchor to. But do not forget the context, because a lot of people are jumping into building agents with no thought process around the data that needs to go in, how you should manage and inject that context into the prompts, and also giving the agent the ability to ask questions about your data so it can make good reasoning decisions based on the job to be done.

SPEAKER_02

Yeah, I think another point that's really important is how you design the tools for retrieving that data in such a way that it doesn't overwhelm the context window when it returns. A lot of people's first mistake is saying, let me just have the model query the database, and it returns 10 million results. That's just an overwhelming amount. The model's gonna try to get broad results, but how do you distill down what it needs from that call, and optimize the call so it gets to what it needs faster? A lot of this is iteration, but it's also formatting output and being clean about restricting what comes back. I know Cursor's approach is that they just dump it to a file and let the model kind of read from it. But in my experience, a lot of the time the model will not necessarily go and open that file. Maybe it will in some cases, but then it adds more tool calls to retrieve the data. So finding good ways to massage the data and return it in a way that is clean and efficient goes a long way, and it's one of the things I do with my MCP tools.
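Eric's point about restricting what a retrieval tool returns might look like this in miniature; the tool and its row format are illustrative:

```python
# Sketch of a context-friendly retrieval tool: cap what it returns so a
# broad query can't flood the context window, and tell the model what
# was left out so it can narrow its next query. Names are illustrative.

def query_tool(rows: list[dict], limit: int = 20) -> str:
    """Format at most `limit` rows, plus an omission notice."""
    shown = rows[:limit]
    lines = [", ".join(f"{k}={v}" for k, v in row.items()) for row in shown]
    if len(rows) > limit:
        lines.append(
            f"... {len(rows) - limit} more rows omitted; refine your query."
        )
    return "\n".join(lines)
```

The omission notice matters as much as the cap: it turns "the results were cut off" into a signal the model can act on, instead of silently hiding data.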

SPEAKER_03

I actually love that, Eric.

SPEAKER_02

Yeah.

SPEAKER_03

Yeah, I do, I love that too. The one thing I would add on this is a theory; I haven't actually implemented this part yet. Everything else I've talked about I've definitely worked through. But I honestly believe in codegen: we need to start giving agents the ability to operate via code. For example, in data analysis, if someone asks something like, hey, I want to understand the propensity of blah blah blah, there's no reason we shouldn't give the agent the ability to actually go do some of that work for us via code. That's my theory currently; again, I haven't executed on it. But I think figuring out the sandbox architecture, and making sure we can run things safely, is going to be the next pivotal thing to figure out in agent development.
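A bare-bones version of the sandboxing Adam is theorizing about might start with running generated code in an isolated subprocess with a timeout. This is only the shape of the idea; a real sandbox would also need filesystem, network, and resource isolation:

```python
# Sketch of running model-generated analysis code with minimal guardrails:
# a separate interpreter process, isolated mode, and a hard timeout.
# This is NOT a real sandbox; it only shows the shape of the idea.

import subprocess
import sys
import tempfile
import textwrap

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Write the code to a temp file and run it in a fresh process."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(textwrap.dedent(code))
        path = f.name
    result = subprocess.run(
        [sys.executable, "-I", path],  # -I: isolated mode, ignores env/site
        capture_output=True,
        text=True,
        timeout=timeout,  # kill runaway generated code
    )
    return result.stdout.strip()
```

Production systems typically go much further, using containers or microVMs, but the core contract is the same: generated code runs somewhere it cannot hurt anything, and its output comes back as plain text the agent can reason over.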

SPEAKER_02

So actually, on that note, what's interesting is I was working the other day with an agent to take a markdown file and convert it into a PDF, and I noticed the agent would go off and just make a throwaway script in a temp folder: I'm just going to make this temp script to solve your problem. But I think the temp-script part is actually the wrong thing to do. Maybe I'll need to iterate on that script, or maybe I want to keep it around for next time. So pushing the model to write reusable scripts is very important. I think that's what skill files are good for: you can guide the model to write the script, store it for later, and keep pushing it to write reusable code. Of course the models need to be able to write code to analyze data, that's super important, but if everything's ephemeral, they'll forget how they did it and have to relearn it every time, and that's just inefficient. So yeah, finding good ways to make things reusable is very important. Anything else on this work of dealing with multiple agents? I know a lot of people are very obsessed with orchestration these days; they want to token-max, and I'm curious what your thinking is on that. Yeah, I saw you unmute, so I'm gonna give you a chance to jump in here before I close out. Well, I think we'll save it for the next episode, because I gotta head out soon, and I'm gonna have an interview today with some of the folks who were token maxing. They're from Korea, and these guys are just crazy, next level. Even Jeff Huntley, the inventor of the Ralph loop, is saying these guys are the token slot masters.
For him to say that about them, given they use Ralph in their looping, is basically giving them the crown: wow, these guys are thinking about things a little bit differently. We can talk more about the architecture in the next episode, so stay tuned, folks. Love it. Yeah, just a quick thought on that personally: I think once you have more than one layer of indirection, you're kind of overdoing it, in my opinion. When you have one agent managing some agents, that's one layer of agents; that's doable, and you can still audit what's happening. But a second layer, to me, that's too much, and then you're getting way off the rails, and I don't think we're there yet in terms of what you're able to do. It's also because you're not able to have oversight over this, and I think we're not quite at the point where slot maxing is worthwhile. I think you're just burning tokens for the sake of it. Don't listen to Jensen, who says that if your $500k-a-year engineer is not spending half a million, sorry, a quarter million, a year in tokens, they're falling behind. That's just bullshit. Someone actually did the math on that, and I don't think it's reasonable for anyone to even come close to spending the amount of tokens you'd need to burn a quarter million dollars. It's just such an insane amount.

SPEAKER_01

I mean, maybe the next generation will, but these guys do, what, one billion tokens per day? Two billion? Yeah, that's too much just for one guy.

SPEAKER_02

Yeah and what are they shipping with that?

SPEAKER_01

They've got the oh-my Codex stuff, the OMAI guys, yeah, yeah, they're crazy. I mean, listen, good for them, I'm happy for them. I'll have more details. I would like some of those billions of dollars if they don't mind; I'll take the $250k, you can keep the tokens. Oh jeez. All right, well, I think that's a good place to wrap up for today. So, stuff to think about. If you all have questions for Adam or myself or Ray on what we're thinking about in terms of multi-agent orchestration, or want to dive deeper on building agents outside of coding, which I think is an important thing to start thinking about, please do drop a comment and let us know. I want to hear from you. Any closing thoughts for today, Adam? Nope, I appreciate you all, and if you guys have better ideas on how to do any of this stuff, drop them in the comments below, please. Yeah, we're always learning. Yeah, make sure to leave those five-star reviews. We are aiming to be the number one AI podcast. If you watched our first episode, Claude Sonnet had claimed to be the number one AI coder, just because they announced that themselves, and of course you can give yourself medals. So we're asking you for help as well: five stars all across the board, whether it's Apple Podcasts or Spotify. Make sure you go ahead and do that; it actually helps the rankings, and we're moving up on our little rankings here. I appreciate all the reviews that have been coming in. Amazing, thank you so much, guys. Hope to hear from you, and hope you have a great rest of your week, everyone. Cheers. Later.