MarsBased podcast - Life on Mars

How standardizing AI boosted our dev efficiency up to 50% | Building MarsBased

• MarsBased • Season 2 • Episode 117

Welcome back to another episode of Building MarsBased, the series where we share the completely transparent reality of how we have built our development agency since 2014.

In this episode, we pull back the curtain on our 18-month journey of adopting Artificial Intelligence within our web and mobile development workflows. The rapid evolution of large language models has completely transformed the engineering landscape. However, finding a process that actually worked for a professional agency took a lot of trial, error, and strategic pivoting. 

We structured our transition into three clear parts, starting with a Divergence Phase in early 2025. During this time, we went full throttle and let our 30-person team test every tool on the market, from Cursor and Replit to ChatGPT and Raycast AI. While this freedom was an incredible learning experience, it also led to widespread analysis paralysis and left us paying for costly, unused annual subscriptions for tools that the market deprecated just weeks later.

Everything changed when we entered our Convergence Phase in early 2026. We made the executive decision to standardize our entire tech stack around Claude Code and introduced a unified RPI (Research, Plan, Implement) methodology across all teams. By putting everyone on the exact same setup, we were finally able to benchmark quality and track results. This standardization triggered an immediate shift from minor marginal gains to a staggering 20% to 50% increase in developer productivity across our client projects.

Today, we are operating in a continuous Refinement Phase. Through our daily internal knowledge-sharing sessions called Martian Tapas, our team constantly tests the boundaries of our setup. We are optimizing token consumption with specialized plugins like Caveman and RTK, while simultaneously experimenting with local AI models to ensure our agency maintains long-term regional independence from US-centric, VC-subsidized platforms.

Watch the full episode to see exactly how we structured this rollout and how you can apply these professional frameworks to your own development team. If you have any specific questions about our infrastructure, drop a comment below and we might have our CTO, Xavi, break it down in a future video.

If you appreciate honest, data-driven insights into the business of software development, please Like, Subscribe, and Share this episode!

Support the show

🎬 You can watch the video of this episode on the Life on Mars podcast website: https://podcast.marsbased.com/

Welcome To Building MarsBased

SPEAKER_00

Hello, everybody, and welcome to another episode of Life on Mars and another episode of Building MarsBased, the series where we try to explain in all open transparency how we have built this company, this product development agency that we started in 2014. In previous episodes, we talked about NDAs, we talked about sales, we talked about choosing the right technology stack, how we started in the very first days of the company, and how we came up with the idea for the name, the logo, and some lore of the company. So if you are interested in this, just go and check out the first episodes.

Why We Fully Embraced AI

SPEAKER_00

But in this particular episode, I'm going to be talking about our transition to adopting AI, fully embracing AI in our development processes. Because the latest developments, the latest releases of models have transformed not only the way we work, they have transformed the entire landscape of web development, web and device and mobile development, or development as a whole. So we are not an exception. One year ago we decided to investigate fully, invest fully in AI development. And in this episode, I'm going to be explaining the three stages in which we have divided this transition period that started more or less around January 2025. So if we go back to January 2025, a little bit earlier than that, we had to define the master plan for MarsBased for 2025. Every end of the year, the three co-founders of MarsBased get together and we decide the strategy plan for the next year, the marketing plan, the technology plan, how many people we will hire, what new lines of business we will launch that year, and whether we will adapt to a certain trend or not. And in 2025, we decided that, you know, this thing about AI seems like it's going somewhere after the initial hype of 2023, the early days of ChatGPT and Midjourney, if you remember those early days of generative AI, when it was mostly convenient for doing some fun stuff, but it couldn't really help you that much and didn't shape that much how we were working. Those tools were not as transformational as the things that we have seen lately. We had decided to be late adopters of those technologies or those methodologies or those trends. But in 2025 we said, look, this thing is actually happening. We were seeing that tools like Lovable, Replit, Cursor, Windsurf were actually becoming pretty good at helping us do our job. Maybe they were not fully capable of developing an application in one shot as they are now. But back then they were showing some serious promise. 
And we increasingly started using them more and more. Some people in the company were super early adopters, some people were pretty skeptical, some were sitting in the middle, they were fence sitters. And we decided, you know what, it's time we built a strategy around

Phase One Divergence And Experimentation

SPEAKER_00

this. So we decided that this strategy was going to be threefold, and the first phase of this strategy was the part that we call the divergence. This is where we actually opened up and decided, in January 2025, to go full throttle and fire on all cylinders with AI and adopt all of the technologies, try all of them, benchmark them, test them in side projects, in company projects, in pilot projects. And we were testing many technologies at once because back then there was not a clear winner. Somehow Cursor was taking the lead, but some other models were already catching up, and none of them was particularly spearheading this in the way that Claude Code is doing nowadays in 2026. But back then, all of them were showing promise, and we were using AI more to consult information and documentation, maybe to check APIs, maybe to chat with the code, but it wasn't actually producing 100% of the code. It was producing parts of the code, like, hey, I need to implement this functionality, or what's the best way to build this pattern in Ruby, or what's the most efficient way to build something around a particular protocol in IoT, for instance. It was more of an assistant, right? And so at the beginning of the year, we decided to tell everyone in the company to go build your own stack, try your own things. We gave everyone in the team a license of Raycast AI, if I remember correctly. We invested heavily in Raycast AI and ChatGPT. We were already paying for ChatGPT for everyone in the team because it was good and useful for a great number of things. You know, as a generalist tool, it's good for producing reports, researching, creating images, some stuff that was good, not particularly centered around the development part. But neither was Raycast, just that Raycast felt more like a productivity tool that was pretty much embedded in our file system. 
So therefore it was useful if you wanted to search something in your code base or talk to your documentation or even create your own plugins, right? And so you could create your plugin for translation, your plugin for testing an API or building JSON files or stuff like that. It was not a full IDE, but it was pretty useful for a lot of things. The thing about this phase of divergence is that it was very useful because we were talking on a weekly basis with the entire team of MarsBased, and we were sharing the product developments of each one of these tools. And some people were like, yeah, use Cursor for this particular reason. Look, if you also use these plugins, it's gonna do XYZ and stuff like that. Some other people were on VS Code, and it was useful because we could compare and we could see many things, the pros and cons of everything. But at the end of the day, we saw that that was helpful, but not super helpful. Why? Because for one, we couldn't compare two different people using the same technology, because everyone was using a different configuration. Some people were just doing their research with ChatGPT. Maybe they had a plugin or two on Raycast and they were building on Cursor. The next person was using Gemini and was not using Raycast for anything and was using VS Code. Another person was using only vi for development, and maybe occasionally they would search something on Perplexity, right? And so it was very difficult to benchmark and to put everyone on the same level and compare the outputs in quality and in quantity of different teams or different people in the same team. So all in all, it was difficult to compare pears with apples. 
And so, while it was useful to see what was out there and to maybe discard certain technologies or methodologies or frameworks or tools because they were either too expensive or too incomplete or they were not secure enough, or we didn't like the outputs that they produced, we decided that that was a good experiment, but we needed to move to the next phase. And that

Two Costly Lessons On Tools

SPEAKER_00

drastically happened in November, December 2025, so at the end of the year. So we spent one year testing, as I mentioned, tools, frameworks, libraries, even different models, and learning about Opus, Sonnet, Haiku, Codex, and all of this, or even testing Claude towards the end of the year, towards the second half of the year actually. And we learned many things. One of the things that we learned, and maybe not a very expensive one, but it was a significantly expensive fuck-up, is that we paid annual subscriptions for a lot of these tools, which we ended up not using. As a matter of fact, I still have an annual ChatGPT subscription that we paid for last year that I'm not using anymore because I transitioned to Claude in January this year, but it's still active there, and until it runs out, I can use it. So we paid for Raycast and we paid for ChatGPT, about 30 licenses of each for one year. So that was a moderately expensive fuck-up that I hope other people don't run into, mostly because things change so rapidly that it doesn't really make sense to pay for an annual subscription, especially in a landscape that evolves so rapidly that a tool that works well might be obsolete next week. And maybe you don't use it anymore after a couple of weeks because the models have drastically changed, or the tendency in the market is to move towards this other thing and whatnot, right? So while we learn how to do things, we also learn how not to do things, and one of the ways of not doing things is paying upfront for these subscriptions: we saved some bucks on paper, but we ended up losing all of these months that we have paid for up front in which we will not effectively be using these tools. So unless you really, really need the money, in which case you wouldn't be paying for an annual subscription anyway... 
Don't pay for annual subscriptions for AI tools unless you're 100% convinced that this is the way to go, and I'm gonna be sticking to it like my religion or my football club, and I'm gonna be with them forever. In which case, you know, it might make sense, but it might not be the sanest decision in the world. Because as I said, and we have seen it constantly, the new models keep evolving and they keep changing and outdating and deprecating the other ones on an almost weekly basis. Maybe not weekly, but on a monthly basis, that happens. And so we are tied to always trying the next thing. And that's another one of the expensive learnings that we made. Not as expensive as the previous one, but it's also somewhat expensive because it produces an opportunity cost. The moment in which you are more interested in trying the next thing, the newest model, and you're always trying a new model and you haven't finished testing this model, and there's another model coming out, you're actually not producing meaningful work. You're not creating, you're not extracting any value out of this analysis. And you end up stuck in analysis paralysis. And that happened with a few people in the company. They were like, yeah, but now I'm gonna be trying this new model, with this optimization thing that I've built for it, and I read about this other methodology, I'm gonna be trying it out. At the end of the day, you're like, yeah, okay, you've been spending so much time in experimentation that the real output of it is very little or none at all. So if I were to go back in time, I would change these two things. One of them is not paying for annual subscriptions. The second one is to commit to a certain fixed amount of experimentation time every week and not go over it. 
So let's say you want to experiment four hours per week on the new models and new tools and new libraries and new methodologies; that would be great, but no more, because it's a rabbit hole. You can get stuck in experimentation and just go down that rabbit hole here and there, and you could spend the entire 40 hours in the week and not come out with anything tangible. Because the next week there is going to be so much new material to try that you end up not working on your actual project and you're not delivering any kind of value to the company or to your own projects. So, as I was saying, at the end of phase one, this phase one that lasted for about a year, in which we made all of these learnings, more or less around December 2025, in the last two months of 2025, that's when Claude came in and completely revolutionized the

Claude Code Changes Everything

SPEAKER_00

market. And with the newest models of Claude Code back in the day, so more or less December, November 2025, things changed drastically. Whereas before, AI-augmented development or AI-assisted development was sort of a sophisticated autocomplete, in which you were writing your code, and then the IDE was suggesting, oh, you're actually trying to write this, and so you hit tab and you autocomplete, yes, no, accept, reject. That was great. That was more of an assistant. With these models, it became an augmentation of yourself. It actually replaced you. And so in the same meeting in which 12 months prior we had decided to fully embrace AI, the meeting we had in December 2025, we decided certain things for 2026, but we had to redo it again in January 2026 after we tried all of these models, and we saw that the hype around the latest developments and releases of Claude Code, particularly, was living up to the expectations. And that completely changed the landscape. That completely changed the company. In 2025, phase one, we saw marginal gains by using AI, just because maybe we didn't have a standardized way of using it, maybe not everybody was fully trained to use AI, maybe we were not actually comprehending how the context windows work, the differences between the models, large language models, small language models, this kind of thing, the non-predictability of the outputs of the LLMs and stuff like that, or just the sheer reality of working with non-deterministic models. That was difficult. That was a learning curve of how to use AI in an effective manner. But in 2026, what we saw is that, hey, instead of these marginal gains of 3%, 5%, 7%, in certain projects we're actually talking about being 20, 25, 30, 40, even 50% more effective, more productive. Why did this happen? Well, for one, there are some things that are replicable, so you can take them from this video and apply them to your company. Some things are not replicable. 
For instance, what is not replicable is this year of 2025 learning, fucking up, testing, tinkering with all these tools. That's something that put us in a completely different position just because we tried everything, we learned everything, we shared everything with the hive mind in the company, the 30 people that we are at MarsBased, all trying new things. And so all of this collective big brain that we have built around developing with AI, it's very difficult for somebody else, just by watching this video, to be in the same position that we are. However, there are other things that are much more applicable. The second one maybe is more applicable, which is, hey, if we diverged in the first phase of the adventure into AI-augmented development, the second phase was convergence.

Phase Two Convergence On One Stack

SPEAKER_00

We saw that Claude Code was the way to go, we saw that certain models were the way to go, and we even converged on certain libraries and plugins, right? And so in January 2026, we decided to just tell everyone in the company: hey, last year we experimented, last year we tinkered, last year we were sort of a lab of AI and development. This year we have already decided this is going to be our stack. We're gonna be using Claude Code, we're gonna be using these models for particular things, these other models for these other things. We strongly advise using these three or four plugins; we're now testing plugins like Caveman, RTK, and stuff like that, which reduce or optimize the usage of tokens. And the methodology at the time was research, plan, implement. That was a very good methodology. You know, there were like two worlds, one is spec-driven development, the other one is RPI. We decided to go with RPI for no particular reason whatsoever. Maybe Xavi, our CTO, will be able to explain it better in another episode. But we tested both, we decided that was the way to go. And we decided to stick with it because the moment we stuck with it and we standardized this across the company, we could compare the output of different people in different projects and see whether that methodology and this stack actually match the way we work across projects, right? It was very difficult to compare before that because maybe we tried something that worked well in a project, but in the next project it didn't work. Whereas here, we can see that if everyone's using the same way of working and they are in this project half of the week and in this other project half of the week, and both of them work well, this brings us closer to accepting or to finding out that this is the right methodology. 
If we find a methodology like the one that we have right now, which is heavily based on research, plan, implement, we are actually in a way better spot, because if you can work in five different projects, or you have 20-odd developers or engineers like we have, and they're working across projects, and this methodology seems to be working in most of the projects, if not all of them, and we haven't seen any big problem, this is the way to go for now. Which is another of the learnings that we have made so far: hey, what works today might not work tomorrow. But I've got to say that at the time of recording this, which is June 15, we have been using more or less the same methodology. We're still using Claude Code. Of course, the models have changed significantly, because right now we've got Opus 4.8; back then it was Opus 4.5, if I remember correctly, with Sonnet, and it was Haiku for quicker things and smaller tasks. And we recently saw Fable for about 24 hours, and then it got revoked outside of the US. So this has changed, but it hasn't really changed the way we work. The way we organize is still the same. RPI, Claude Code, these three or four ways of working. We have actually integrated everything with our ecosystem of Google Workspace, Linear, Slack, and the other tools that we've got. And this has proved to be very, very consistent for the last six months. We expect that in the next six months, nothing's gonna change considerably unless maybe Fable gets released to the wide public outside of the US. Maybe there is another way of doing it, maybe we are cut off from using all US models or something like that, which we cannot predict. But what we have seen in this second phase, the phase of convergence, which we have now been running for six months, is that we are happier, we're more effective, we're more productive, and we're learning more. 
And there's another byproduct of this. If everybody is using the same thing, and I find out something today, like a new slash command that maybe the other people don't know about, I can share it on our Slack, and everybody else will be gaining this knowledge. And this contributes to the GDP of the company, right? Because whereas before it would be like, yeah, I found this vi plugin and no one else was using vi in the company. Like, ah, good, but what good is it to me? Whereas now, with any finding that we have for Claude or for RPI or the way we use our context windows and stuff like that, it's very likely that if I share some findings, some learnings, on our Slack and our Linear, everybody in the company will be able to profit from that knowledge. And that enhances and improves the experience for everybody in our team. So that brings me to the third phase. So remember, phase one, divergence: try everything, experiment, tinker. Second, choose: find what works, and then converge on it and just pick your stack and try to learn as much as possible, teach everybody, make sure everybody uses the same setup, the same environment, same tools, and stuff like that. Phase

Phase Three Refinement And Cost Reality

SPEAKER_00

three, which is an ongoing phase, is the refinement, right? It's sort of a 2.5, if you will, because it's still convergence. But what differs from phase two is that instead of just learning and just optimizing the way we work, we are questioning everything we have learned so far. So with every new model, we're asking: does this model produce better output? We're not adopting it blindly just because it's the new model and we have to, or because it's more expensive so it must be giving us more in return, making us more productive, more creative, with fewer iterations, fewer bugs and stuff like that, or faster. No, we ask: does it really help our costs? Is Claude Code still the way to go? Is RPI, research, plan, implement, this methodology, still good? We are refining because right now we have seen that, out of a sheer natural, organic process, people are starting to diverge a little bit within the same stack, in the way people use annotations or the way people divide the phases of research, plan, implement. I'm gonna be talking in more detail about this, but maybe some people use Markdown files, while some other people are starting to work with HTML files because they find it's much more productive or easier, more visual, to have the annotation files in HTML so that it's more visual, not constrained to text only, which is what happens with Markdown. Because in terms of computing power, in terms of cost, in terms of time, it's irrelevant. It's actually the same, there's no difference. So some people have decided to start that, and then we can actually compare. So we have a mechanism inside the company, which is called the Martian Tapas, in which every Thursday somebody talks about something, some cool project they built, some new methodology, something they tested, something they were interested in. 
And we have been finding that in the last six months pretty much every single instance of the Martian Tapas, every Thursday, has been around AI and findings, right? And so we have seen that some people like to have a manual changelog or a tree of decisions in Markdown inside the project. Some people don't like that, some people delete them after certain commits or certain pushes, or whenever they close the PR, they just decide to get rid of all of this because at the very end of the day it's clutter. And if you already have the information and the comments, why should you store them in Markdown files? Because then it creates duplicates and might not actually add much to the context and stuff like that. So these small differences might not seem very important right now. Well, that's the difference. They are important; they don't seem important because right now somebody else is paying the cost. The entire AI scene is being subsidized by venture capital. And we are seeing, like we saw at the beginning of the cloud transition, that the costs aren't really the real costs that we are going to be paying a few months from now, maybe a few years from now. So it's very cheap to experiment. Companies and VCs are still paying for a significant percentage of the real cost of this so that they create market adoption. And in doing so, more people will use it. And in theory, that brings the cost down. But the reality is that we are playing now, and we are creating business models, and we are using it at a capacity that we will not be able to sustain; these business models will not be sustainable once this subsidizing of the usage of AI ends and the compute power cost goes significantly up. 
And so that's why I think it's very important that we keep trying all of these things at the same time that we are actually advocating for and testing maybe local models and smaller models, so that we can maybe become independent from, you know, all of these US or Chinese models. We want to have something more centric to our region and completely independent in case a shutoff happens or something like that. If there's a time to invest in that, it's now. So I think more or less I have explained everything I wanted to say here. As a quick summary: phase one, divergence, try many things, everybody uses their own thing, everybody learns on their own in their own way. We fucked up a couple of things. First, we fucked up by paying a significant amount of money for upfront licenses of products we ended up not using. And the second one is we couldn't really standardize, we couldn't really learn, or we couldn't really teach each other or compare the performance, the outputs, because everybody was using a different setup. And so it was difficult to compare between projects and to see what really worked. Second phase: after one year of experimentation and walking in the desert, we converged on a single stack, as I mentioned, so Claude Code, RPI. And for these six months, between January and June 2026, we have been working in a standardized way in the company so that we can compare different teams with different technologies. Is this methodology, or is this way that we work, significantly better with Python over TypeScript? Not really. We haven't seen these kinds of differences. We haven't seen many differences in using Ruby versus TypeScript either. It's marginal so far, at least for the kind of work that we do. 
Maybe if we were to work on something much more computationally heavy, or at the sheer scale of big companies like LinkedIn or Amazon, something like that, maybe we would find different learnings and more significant differences between technologies. What we have found is that the way we seem to be working better right now is using RPI, that we have a good methodology for implementing it, and certain patterns that help us to better manage the context windows and the documentation, the traceability of the things that we are developing, the decisions that we have been making, and the integrations with other tools that we play with, like Linear, Slack, or GitHub. And phase three is the phase of refinement, the phase that we are in right now, in which we are questioning everything again. So it's not a pure divergence, it's a little bit of divergence inside convergence, because what we're seeing is that by adopting this stack, naturally and organically, people evolve their way of working, and they take different little paths from the standard way. And if they don't stray too far from the general direction of the company, that might end up being very positive as well. Because somebody will be saying, as I mentioned, and one of the examples that just came to mind was, hey, I've been trying RTK, for instance, or Caveman, and while it looks like a joke plugin, it's actually very useful because it just saves on token consumption, and therefore on the cost of AI. Like, what do you actually get out of AI being a little bit more verbose? You don't actually need it. 
On the contrary, I'd rather not have it talking that much, or just save on certain instructions that maybe shouldn't be done with a large language model, and maybe could be somewhat proxied into another algorithm that just says, oh, this is a git command, this is a rebase, or this is creating a new branch and stuff like that. Don't use an LLM for that. Use some standard practice and some standard coding, and you will end up saving on tokens as well. So these three phases are the three phases we have experienced in more or less the last 18 months in

Questions From Listeners And Closing

SPEAKER_00

the company. I hope you found this interesting. If you have questions for the team, because you might be having questions about, oh, how do you actually implement RPI, or what are the differences in how people apply it? As I mentioned, people in the company are using RPI, research, plan, implement. But some people have got different approaches to when it is actually acceptable to skip the research part or the plan part for small tasks, for enhancements, for quick bugs and stuff like that. Maybe you don't have to be going through every phase. Some people do it religiously because they want to build the habit and it's actually good. Maybe it consumes more tokens, but at the same time, you're helping to train the AI a little bit more, and you're also becoming a much more organized and coherent programmer at the same time. So maybe it was not worth the effort of going through every phase in a really small task, but you build a habit and therefore you will not skip it in a future feature development, right? So, with that being said, if you've got questions for the team, if you've got questions for our CTO, Xavi, if you want to have more visibility of the cool things we're doing with AI, stay tuned. Subscribe to the podcast if you like it, share it, like it, post it on social media, and send recommendations for speakers, because as you have seen, we have got the Road to CTO podcast, and we've got regular interviews as well. So this is the Building MarsBased series in which we speak about how we are building MarsBased. This is our 12th year, going into our 13th year, building a fully bootstrapped, very opinionated lifestyle business from Barcelona. And I hope this can serve as an inspiration for all the people that want to create a better world for developers out there. So, thank you very much. I'll see you in the next video.