Attention is All You Need

A Podcast with a Botchagalupe spin on all things Generative AI.

S1 E3 - Ray Myers - Devin AI and Navigating the Hype and Reality of AI in Software Development

April 23, 2024 • John Willis

In this episode of the Profound Podcast, I sit down with Ray Myers, a tech lead in chaos engineering at Indeed.com with a rich background in software engineering across multiple prominent companies. We delve into the fascinating intersection of artificial intelligence and software development, exploring how AI can both augment and challenge the conventional workflows in programming as well as his thoughts on Devin.

Ray discusses his journey into AI, spurred by the developments in generative AI technologies like GPT-4. His insights draw from a traditional AI education but are deeply influenced by the rapid advancements in the field, particularly in areas like natural language processing and code generation. Ray shares his perspective on how AI is reshaping software development, pointing out both the potential benefits and pitfalls.

Ray provides a nuanced view on the recent buzz around Devin AI, addressing its marketed capabilities versus its actual functionality. He discusses the initial excitement in the tech community about Devin's potential to automate coding tasks and the subsequent realization of its limitations. Ray emphasizes the importance of understanding what AI tools genuinely offer versus the hype often propagated by marketing efforts, stressing that these tools should be seen as augmentations to a developer's capabilities, not replacements.

Ray's LinkedIn can be found below:
https://www.linkedin.com/in/cadrlife/

John Willis: [00:00:00] Hey, this is, I think, going to be my 3rd podcast in my new AI. How do you like that? I didn't think that I'd be saying that. My new AI podcast series. I've got a gentleman now who is really just, you know, you, you kind of watch what's going on. You listen, you hear people, and then all of a sudden, somebody, you know, strikes a chord with you, the way they think, the way they talk, the subject matter they talk about, and this gentleman has definitely gotten on my radar, and I think I'm surely becoming a huge fan of his. Sort of, you know, in my world, you're a youngin, but like, I think we've got, like, quite a, your experience, you know, and your thought process about our industry is, you know, it just sort of bleeds.

So, Ray, you want to introduce yourself?

Guest: Thank you so much, John. Yeah, I'm, I'm Ray Myers. I'm currently tech lead of chaos engineering at Indeed.com. And you know, in the past I've, I've done a variety of roles at Carfax, [00:01:00] at Morningstar, and at a subsidiary of UnitedHealth Group. So I've, I've had kind of a variety of backend and later platform, DevOps, and SRE roles across these various industries.

And yeah, a couple of years ago I decided maybe I had something to say. And I guess the reason we're talking now, and why you've created this spinoff of your, you know, profound knowledge podcast, is that in software, if we want to make sense of the future of it, we've got to figure out how AI factors in as well.

And I felt like there were some, as you, I think have, there were some concerns that weren't even being considered. The, the craft itself is already so poorly understood. So I want to see if I can use, you know, my experience to, to help AI have the best impact it, it can, and avoid maybe the negative ones.

John Willis: Yeah, no, I think there, you know, there's a good, [00:02:00] bad, ugly story here, right? I can, and we can sort of dissect that, right? You know, like, I think everybody, you know, I mean the, I, you know, so your, your, you know, sort of your analysis and some of the criticisms, which are, like, spot on, don't sort of like, it doesn't sort of conflict with the idea that you're a fan of this stuff.

And I think that like, we're all feeling that like we I, you know, I, I'm sort of not even joking, but I think, you know, I've coded more in the last year than I did in the last 10 years, you know, and I started my career, first 10 years of my career was coding, right? And, you know, but I get into like being a thought leader or whatever, that's a terrible phrase, but, you know, I'll go a whole long one where I'm just doing presentations and writing, but, but I, you know, I'm, I'm having so much fun coding again that like, you know, all the things I hated about coding.

You know, the things I love about coding is that feedback, you know, you're creating things, you know, you're, you're sort of putting something out there and bits and bytes and it's coming back and [00:03:00] it's turning real. The thing that I hated about it is like, okay, this language requires a colon, a bracket and a semicolon.

The other language would require a bracket, colon, you know, like that kind of stuff, but like, a lot of that stuff's gone now with, you know, Copilot and ChatGPT. So how did you get into, you know, what was your sort of. It's kind of like now I'm realizing that my, for people who didn't have classic, and I'm assuming you didn't have a classic AI background you know, kind of like I'd ask everybody on my Deming thing, like, what was, how did you get into Deming, in this case, like, how did you get into AI and what sort of sparked you?

Guest: Yeah. Yeah. Well, if it had been, had I got into Deming, that would've been a short conversation, because it would've been because of you.

John Willis: That's funny. There you go.

Guest: Because of your, your podcast and, and your book, and Beyond the Phoenix Project.

John Willis: There you go. Yeah.

Guest: You know I. I got into AI, I mean, I guess you could say I had a traditional AI background for all the good it did me.

Because I went to school, [00:04:00] I graduated you know, around 06, and we actually did have a two course sequence in AI that I, that I, I took an artificial intelligence course that focused a lot on game AI and graph searching, which is, you know, good, good stuff.

John Willis: Right.

Guest: I I did some competitions actually.

I did pretty well in them. And then the second course in that sequence was by Daniel Torrance, who was briefly my thesis advisor, and it was in evolutionary algorithms. So that was, you know, that was a field of AI as well, but in our department, you know, we were thinking about evolutionary algorithms a whole lot.

They're really cool. And neural networks were in a bit of a slump. So we were aware of them. But like, you know, we wouldn't have guessed, I wouldn't have guessed that 10 years ago when we talked about AI, we'd be saying, okay, it's neural networks and some change. That's really what's catching on. So I [00:05:00] was then a latecomer, despite being an early entrant into, into AI, to generative AI.

I, you know, really started focusing on it pretty much the moment GPT-4 came out. And then the lucidity that, that it was coming out with, the degree to which it generated plausible code as well as plausible prose, was something, the words I used at the time, this can no longer be ignored.

And I set out to discover, like, what precisely can it do, and not do, in a practical context.

John Willis: And you you know, so you, you know, like if I go back to your resume and I watched some of your videos and you definitely have a lot of content out there, which is awesome. So you're, you know, you're, you're a professional coder.

I mean, you think about coding, you think about architecture, you watch your videos, even the sort of people that you point to, sort of your, you know, sort of heroes. It's, you know, like Farley and, you know, different people, you know, yeah, Margaret Hamilton, you know, right? And Fred Brooks, who, you know, I didn't, [00:06:00] like, it was kind of, like, I knew who Fred Brooks was, but it wasn't until I watched a video and went back and searched that I realized he's the guy who created the interrupt system for Microsoft.

I mean, not Microsoft, for the IBM operating system. But then I knew, like, I guess I just forgot he was the author of The Mythical Man-Month, right? So, but yeah, so my point is, long winded, is that you are, you know, you're a coder, you, you know, you have a professional coding career, and now complementing that with SRE is a pretty awesome background.

But I guess to people who are still on the fence about, like, oh, you know, that, you know, Coding stuff. It kind of, I'm not, I'm worried about it. And like, what do you, what do you say to them to like, so like, Hey, you know, I, I've been doing this for a long time too. Let me tell you why you should be like, like what you said is we can no longer ignore it.

Guest: Yeah. Well, I mean, a particular thing. I may be having more in common with the people you mentioned than other people, actually. And in my video, people can check out on specifically the [00:07:00] point of whether they are just going to wipe programmers off the map. It would be on my channel, Craft vs. Cruft, and it's just called, Will AI democratize programming, which I put out a year ago.

My position has not changed based on anything I've seen in the next year. And the thing is, we have always been trying to democratize programming for the entire history of programming. It is not a threat. To programming or programmers. It is, in fact, our aspiration. So really, I think I'm not saying they can't do this out of fear that they will.

It's that I am maybe even disappointed that they won't. But nonetheless, it won't completely displace the field. But I do think it's too powerful to ignore in terms of its contributions. Nonetheless, there's just a lot to do, though. We have a lot of work to do, you know when infrastructure as code came out with much, much help from yourself that was, you know, kind of revolutionary and what it added, but [00:08:00] I still have a pager you know, we, the, the field still exists.

And compilers didn't eliminate programming and OO didn't and any number of other things that made a great impact. So I think we just have to see this as, as another moment in that history. And there's no evidence that it's going to be completely discontinuous, discontinuous with, with the history as being suggested.

John Willis: Yeah, it's, it's an augmentation, you know, like I'm actually, you know, the reason I wanted reasons, a couple of reasons I'm, you know, my day job has sort of converted to be like DevOps and generative AI, whereas the technical that my sort of night job is I'm writing another book about the history of generative AI.

So I'm like in the same style as my Deming profound, very storytelling oriented. And, and I think the, you know, the, The like, if you sort of look at the history of things that like at the time, like the handheld calculator was like, I remember that I was in high school, you know, like [00:09:00] it was like, you know you know, like the whole like goodness, like you can, you know, I was literally I just.

I was like probably the first generation of, you know, advanced math in high school that didn't have to use a slide ruler, you know, like the calculator came out just in time. But, but like, imagine, like, I know what that felt like. And I, you know, I even talk about, like, imagine what the printing press was like, you know, so these are augmentation tools, but, but, but what you so, I mean, like, The people that are afraid of these tools, I mean, like they're afraid of them in some ways, like it's going to replace the job.

And like, and I think we can talk about that, but, but like, why should they be like learning these tools and using them? It's very much like what you found, like what, what, what did the people who sort of like afraid of it in that, like, I think it's not going to work. It's going to jet, you know, the thing I hear all the time, like, well, what if it generates bad code?

Well, you know, like, yeah, well, like what if people generate bad code?

Guest: Yeah. Well, we don't, yeah, we don't end the thought process [00:10:00] there, you know, like, good point. In a number of ways we will, yeah, we will figure out how to vet it. And I think going back to the first part of your question, you know, in terms of getting into learning about it, I really don't like that people are being pushed to learn out of fear.

I don't think that's the right reason to learn something. I think there's a lot to be fascinated and passionate about. I don't fear the robots coming to take my job. I fear people making my job worse. By, you know, in some cases badly adopting new technology and in a lot of other ways. I don't fear machines laying off my friends, I fear people laying off my friends because of leadership failures that didn't prepare in most cases for economic factors and so forth.

So, this is all just us doing a good job and [00:11:00] looking out for each other. That's, that's what this is all about. You know, the, the idea of, you know, personifying the machine as the one doing it, you know, is it, I mean, it's a, it's an undertone, but that's really sci-fi stuff. We, this is all what we do and how deliberately we think about improving our processes. Going into the second part, you know, how can it really help us?

I mean, we have with LLMs things that they do really reliably well. So when you say, oh, it hallucinates, it does this unreliably, it doesn't add numbers together well, this and that, that's all totally right. So focus on its strengths for a start. It is really good at turning information from one form into another, in particular, like, taking unstructured information like text and turning it into structured information that can now interface with all the other stuff we have to deal with structured data. So this is some of the really big contributions [00:12:00] that, like, the flashy stuff kind of distracts you from, but just thinking about how we can use all these really robust tools we have.

And then if we could just add this little bit that turns some unstructured information to structured, it expands what we can build. And I think in a similar way to how the introduction of, like, relational databases might have, or caching.
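[Editor's sketch: the unstructured-to-structured pattern Ray describes can be illustrated roughly as below. This is a minimal sketch under stated assumptions, not anything shown in the episode. The `Incident` record, the prompt wording, and the `complete` callable are all hypothetical; `complete` stands in for whatever LLM client you use, and `fake_model` returns a canned reply so the example runs offline. The point is that the model's text reply gets parsed and validated before it touches the rest of the system.]

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Incident:
    # Hypothetical record type; the fields are chosen only for illustration.
    service: str
    severity: str
    summary: str


PROMPT = (
    "Extract the fields service, severity and summary from the text below. "
    "Reply with a single JSON object and nothing else.\n\n"
)


def extract_incident(text: str, complete: Callable[[str], str]) -> Incident:
    """Turn free text into a typed record.

    `complete` is an assumed stand-in for an LLM client: it takes a prompt
    string and returns the model's reply as a string.
    """
    reply = complete(PROMPT + text)
    data = json.loads(reply)  # raises ValueError if the reply is not valid JSON
    # Building the dataclass raises TypeError on missing or unexpected keys,
    # so malformed model output fails loudly instead of flowing downstream.
    return Incident(**data)


def fake_model(prompt: str) -> str:
    # Canned reply so the sketch runs without network access.
    return '{"service": "checkout", "severity": "high", "summary": "Timeouts after deploy"}'


incident = extract_incident("Checkout is timing out since the 2pm deploy!", fake_model)
print(incident)
```

[Once the text is an `Incident`, it can be stored, routed, or queried with ordinary tooling, which is the "interface with all the other stuff" part of the argument.]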

John Willis: That's a good point. Yeah, no, I mean, like for me, I find it, you know, I've never made the assumption that it was going to build like some Sistine Chapel application for me, right?

Like, you know, like the, the idea of like saying, Hey, build me a, you know, a system that does this, that, that, like, I, like, We've been around long enough to know that's like a recipe for disaster, but I love this sort of natural language process of it. Like, I mean, the other day I was, I was managing this hackathon and this team built this like really clever sort of like it was like, a copilot terminal copilot, right? Like, you know, not like a very sort of miniature version. We'll talk [00:13:00] about Devin in a second, but it wasn't designed. There was this this team that wanted to basically keep track of all the commands that they're running and then have sort of a prefab set in a vector of all the commands and then, you know, be able to ask questions like, hey, how would I do this?

What is the command to do that? And then graph it and all that. And, and so I was, I was a judge for the code and they were using FastAPI, right? And I was like, and I hadn't used FastAPI before. So it was weird about, like, how you do an import. And then there was a directory for routers.

And like, I was like, okay, this is going to be a tutorial that I don't want to take. And I just put the code in. And I said, Hey, when I do the import of routers from this library what is the, you know, what does it do with the router directory? And then it said, don't tell me he runs this script, that script.

And then I was like, Oh, got it. You know, like 3 minutes, you know, less than 5 minutes. Like, that would have been [00:14:00] like, I would, you know, I would have had to search Joe's tutorial, which would have went on and on about the, the beauty of OpenAPI and, you know, like, skip, skip, skip, skip. But yeah, I mean, like, the, the dialectic of the, like, hey, you know, you know, ChatGPT or Copilot, can you tell me, here's a JSON, how do I get the, you know, the 4th layer field out? And then it comes back with sort of something I know is wrong, but then I'm like, okay, yeah, I kind of meant this, right? So I, yeah, I, I think to your point is, you know, it's just an extension.

It's like the calculator for coding, right? Like, if you will. So I think that's pretty cool. Well, one of the reasons I, I sort of like got really interested in your work was, you know, like this, this sort of Devin.ai thing came out a month ago. Right. And, and it was like, Oh, anybody was talking about it.

All my friends were pinging me and saying, you got to look at this. It's the most incredible. And, and its surface value was [00:15:00] like pretty incredible. Like, okay, and, and I, I'd seen a couple of prototypes of people who were, you know, Matt Wallace had created some interesting stuff about going out and, and running Selenium and like from the terminal with, with ChatGPT.

Right? So, but I thought, well, maybe these guys have, you know, figured out the cure for cancer here, you know, and yeah. And then, you know, I got really then immediately concerned with technical debt and we can talk about that. But then I, I saw, you know, your video and you really, you know, went into like, Hey, wait a minute, you know, we've been doing this for a while.

By the way, you know, and I thought, so then it was ironic that like, like within a couple of days, there are already like, like two or three knockoffs in a month now that the list is probably too long to put on a page. But, but it wasn't so much that it was that. I think, first off, they, they definitely hit a, they, they, they definitely hit a [00:16:00] chord, right?

Like, you know, like, you know, I'm thinking, like, I'm sort of playing with an article like, you don't pull on Superman's cape, you don't spit in the wind, you don't mess around with software engineers, you know, but, but yeah, the response was just insane. And then your response and other people's response. And then, again, our sort of tribe, if you want to call it, pretty smart people.

And so like immediately kind of figured out like what was really going on. Now I didn't, you did, and a couple of other people, but before we get into the stuff that, like, that I think was overinflated, maybe overhyped about Devin AI. Like there are some good things that they introduced and you do a good job of pointing out like how they sort of put things in the limelight, the IDE structure, I'll let you sort of take over on that.

Guest: Yeah. And there were things, and we often call them coding agents, before Devin. I think Devin is the first to put out there, in [00:17:00] addition to being the first, or one of them, to publish a SWE-bench score, even if they didn't run the entire benchmark. They were the first to even plausibly get on the board with a respectable score on.

John Willis: And I'm going to have you play with SWE-bench here in a minute.

Guest: Yeah, in a second, we'll get to that. Which is a given, you know, I started the, the mender.ai website to specifically talk about language models', you know, ability to address legacy code, as opposed to just writing these new toy examples, which we kind of know is a much different task than the actual thing that holds us back as an industry.

So I was, and still am, really intrigued by the actual meat of the Devin announcement as I saw it, which is, you know, this, this could take care of a lot of the, a lot of the little monotonous things. It could allow us to integrate it in a principled way in an overall software engineering process in which it was pushing things for the better. [00:18:00]

And that would take some, some careful thought, but I think there are paths to do that. And then there's the part of the announcement, and the video is Dissecting Devin, if you want to check it out, but the part of the announcement that I criticized was their characterization of it as an AI software engineer.

And the various other things they said that implied in that direction, in, I think, very deceptive ways. But, but I think I understand, I empathize with the Catch-22. They are trying to make a big splash with what I believe is potentially a very impactful product direction because they are targeting areas of neglect with where this is actually useful.

And nobody wants to fund those. Nobody wants to fund your, your tech debt robot.

You know, nobody wants to fund your, your automation that'll help people with these minor aches and pains that are actually what, what holds back the industry. And so I guess maybe they felt [00:19:00] like they had to embellish.

John Willis: Yeah, you know, don't start making me feel bad for him.

I'm still trying to win him over. You know, I'm trying to, I'm catching. No, I think it was, you know, I think if you look at the investors that, you know, that was sort of a get rich quick idea that, I mean, I mean, there's a whole nother discussion about like, there is no moat or, you know, like, whatever you, unless you're building a new you know, foundation or frontier model, or you've got some incredible way, you know, To sort of glue things together where there's a secret sauce, you know, almost, you know, I've been watching these hackathons and what people can do in a day is mind blowing.

So the idea that, you know, that, that, you know, six competitive like coders or seven competitive coders, we're going to like, you know, You know, create something that millions of people think about and struggle with and build and, and automate in a, you know, like you couldn't have picked the worst crap to put out a first product against where, you know, and we saw that like, you know, the open source [00:20:00] Devin's and all that.

But, but I think to, to summarize the good things, right. I think, you know, like you said, they exposed an investment and an opportunity for people to try to build upon this sort of, the, the nuanced things, the things that, like, don't add the value. Like you, you mentioned infrastructure as code, right? Like in the early days of infrastructure as code I used to try to tell people, like, 80 percent of your time is in the muck and 20 percent is business value. What if you could use a tool like Chef or Puppet to basically flip that, you know, what if you could be doing 80 percent business value and instead of worrying about how to configure Apache, you know, for the 100th time because, you know, and, and all the plugins, like, what if, like, that's just a button, right? And so I, I mean, I think that, that, what, what Devin has sort of introduced in a way, I think their sort of proprietary or get rich scheme around it is probably going to fail miserably.

And, and then you mentioned the [00:21:00] IDE too, like you really like the idea that there's sort of this encapsulated idea of not just a, a copilot, but like it's all, it's a web browser, it's the, you know, it's the IDE, it's the terminal, it's all sort of built in. I thought that was a sort of cool observation of something creative that they did.

Guest: Yeah, I think for what I'm calling autonomous dev tools,

John Willis: right?

Guest: Like, this is the baseline now, like, we, we want to at least capture this workflow. And you can see the community of people building similar things, OpenDevin, Devika and others. They're taking that lead, like the existing coding agent efforts are repurposed into, let's, let's at least get something that's like that demo, because this is pretty intuitive to come into once this starts integrating with our overall process.

I have doubts that that's what it is necessarily going to look like. I think autonomous dev tools will take a lot of different interesting forms. But for right now, I think this is kind of our, this is our baseline. And again, you know, Cognition's contribution in that [00:22:00] is I think gonna end up being pretty important.

And just, I think if we can point at a North Star of, okay, you've got all these AI coding tools, can you form them into an autonomous dev tool that it can at least take 10 steps on its own before falling over? You've at least proven a range of capabilities that can now be applied to other ways. It might not actually look like that how you're using it.

John Willis: Yeah, you know, I had never really, until this conversation, really thought of, like, the sort of Devin movement, if you will, or autonomous dev tools, or I think you have a good way you describe it somewhere in my notes here where you say, you sort of, yeah, coding agents versus autonomous coders or, right.

Like a good name for this space. Like these, but like, it, it is sort of like a next generation of infrastructure as code, right? What infrastructure as code did for us. And now we've got this new thing that adds a little more to infrastructure as code. And I think that's really cool. Could you quickly [00:23:00] talk about SWE, we call it SWE-bench, but like it's SWE-bench, and what's that, that's all about.

So people who don't know what it is.

Guest: Yeah. So SWE standing for software engineering, right? SWE-bench. So software engineering benchmark was a benchmark put out by the Princeton NLP lab, which is natural language processing, in September of last year. And so it's been cooking, you know, for six, seven months.

But the thing that I loved about that benchmark when it came out was, was two things. One of them is the benchmark cases consist of solving actual issues in open source GitHub repositories. So they took actual common, like, Python libraries and things. And they said, okay, at some point from this git hash and the repo, someone needed to do this, to fix this bug, to update this dependency, whatever.

And let's make [00:24:00] a benchmark case out of that. And they did this with like, you know, over a thousand cases. And that benchmark was so different from all these toy problems that are like solving FizzBuzz or something, you know, that are at least little, you know, 30 line programs in isolation, that the old guard of LLM coding evaluations, such as, you know, the, the, the original popular one, HumanEval, is one you hear thrown around, you know, that was, that was good for the time, but LLM coding ability got to where that, that benchmark was effectively beaten and we still didn't have anything that you could use in a real situation.

John Willis: Right.

Guest: And so SWE-bench comes along with this approach that proves it can solve a real problem in a real repo, passing the tests that the person who submitted that PR needed to pass. And so that's great. And so that's the first thing I like about it. The second thing I like about it is that as soon as you plug the existing LLMs directly into this benchmark, they all fell flat on their face.

It demonstrated, like, it demonstrated what we had observed, which [00:25:00] is that, yes, we had solved coding toy problems, and we had not remotely solved doing anything in a realistic scenario without a human expert programmer beside it. So that turned the focus to, well, the state of the art models aren't enough, but we can wrap them in increasingly sophisticated agents, algorithms around the, the models or tools that the models can call. And that has led to this Devin and, and its ilk, you know, being able to progress in a, in a measurable way.
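[Editor's sketch: the evaluation idea Ray describes (a real repo at a specific commit, an issue to fix, and the tests the original fix had to pass) can be illustrated roughly as below. This is a simplified sketch, not the benchmark's actual harness or schema. The field names, the `Runner` type, and `fake_runner` are hypothetical; a real harness would check out the commit, apply the candidate patch, and run the project's own test suite.]

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BenchmarkCase:
    repo: str             # repository the issue came from (illustrative value)
    base_commit: str      # commit the issue was reported against
    issue_text: str       # natural-language description of the problem
    must_pass: List[str]  # tests the accepted fix had to make pass


# Assumed runner signature: (case, candidate patch) -> {test name: passed?}
Runner = Callable[[BenchmarkCase, str], Dict[str, bool]]


def is_resolved(case: BenchmarkCase, patch: str, run_tests: Runner) -> bool:
    results = run_tests(case, patch)
    # A test that is missing from the results counts as a failure.
    return all(results.get(name, False) for name in case.must_pass)


def resolve_rate(cases: List[BenchmarkCase], patches: List[str], run_tests: Runner) -> float:
    if not cases:
        return 0.0
    resolved = sum(is_resolved(c, p, run_tests) for c, p in zip(cases, patches))
    return resolved / len(cases)


def fake_runner(case: BenchmarkCase, patch: str) -> Dict[str, bool]:
    # Stand-in so the sketch runs offline: pretend any patch containing
    # the word "fix" makes every required test pass.
    ok = "fix" in patch
    return {name: ok for name in case.must_pass}


cases = [
    BenchmarkCase("example/lib", "abc123", "Crash on empty input", ["test_empty"]),
    BenchmarkCase("example/lib", "def456", "Wrong rounding", ["test_round", "test_format"]),
]
patches = ["fix: guard against empty input", "no-op change"]
print(resolve_rate(cases, patches, fake_runner))
```

[The score is only as meaningful as the set of cases it was run on, which is why running a subset of the benchmark comes up later in the conversation.]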

John Willis: And they have, we'll circle back to the, the, and the SWE-agent is an implementation of sort of a Devin-like solution.

Guest: Yeah, SWE-agent was the first open source agent that was competitive with Devin, and it was released by the Princeton NLP lab as well. So they had actually been working on that for six months. But [00:26:00] you know, as soon as it came out, it was like, okay, this is the Devin competitor now, sort of in retrospect, that's how these things are all framed.

John Willis: And so I think that the interesting thing is, so like, I think, you know, liar liar pants on fire part of this, this podcast, and, and I think the thing that really shows, and I'll have links to a bunch of your own podcasts so people can sort of go a little deeper in what you're talking about, but your passion is, the way I interpret your passion about this is not to be petty. The two points I think that you make, and you can really expand on either of these, but one is, they've come out with a competitive stance.

Hey, we can do this percentage of this. We can do this percentage of this. We can do that. And, you know, and, and, and, you know, from some of the research I did, but you pointed me in that direction that they were competition coders. And so like. You know, you were a competition coder, so like, hey, okay, game on.

You know, right? [00:27:00]

Guest: Yeah. There's a lot of friendly competition. When I talked about, you know, can we beat Devin? Devin has been beaten. And this, you know, I, yeah, I made clear.

John Willis: And it will go back and forth. And they're going to have to earn their keep, right, in this very competitive space. But the other one is, I think this, you and, and the gentleman who does Internet of Bugs, is the passion of, like, our tribe. Or the way, you know, let's, let's be, you know, if we're gonna say these things, I love what that, the, the Carl guy said from Internet of Bugs.

He said that, yeah, they should have just told the truth and taken the win, you know, like, and, and, but they, like, whether it was the marketing folk or the pressure or whatever, but like, you went through quite a good list of like, sort of, well, you know, like, be careful, like, all that, like the hype I fell into, and then every other person who, you know, sort of like, I got so many emails, like, you need to see this, John, you need to see this, John, it's unbelievable, everything has changed.

And can you walk us through some of the [00:28:00] things that, you know, that you found when you sort of, you did a little deeper discovery of the claims of Devin?

Guest: Yeah, so point by point we could just go on the original tweet that they put out, you know, because that, and their blog entry says a little bit more, but it wouldn't take long to go point by point, and the, you know, my article goes into more detail.

But, you know, first of all, we'd like to introduce Devin, the first AI software engineer. So that's not, that's not a good you know, that's not a good place to start. And I, I've gone into detail but, you know, I think suffice to say, I'm choosing to take them seriously as people who have some potential to impact the future of software development.

So, that means I am taking their, what they say, you know, with their 20 million investment, you [00:29:00] know, fairly seriously. Perhaps more seriously than their investors do. I don't know. I don't know what their opinion is on if you take 20 million and then lie to the public in your first announcement. But I think the term software engineer is important.

And if you're going to use it in this way, you should at least be prepared to define it. And of course they haven't done that. And from, from what they're saying, it doesn't look like they think that it's a term that has any meaning worth considering.

John Willis: So, so already, you know, therein lies the poking of the bear. But yeah, so, so,

Guest: You know, you, you have activated me, but this is marketing, you know, so I understand. You know, that's just, it's how they're tagging their product. So I did extensively take issue with calling it that and playing into some of people's wrong impressions, but what do they concretely say? Okay, it's the new state of the art on the SWE-bench coding benchmark.

That's not [00:30:00] technically true. They didn't run the benchmark; in their blog entry they make clear they only ran 25 percent of it. Now, it is very expensive, so I understand why they didn't, but another reason they didn't is because their app is very time consuming and very slow to run. That does impact how practically valuable it is.

So, you know, they didn't technically place on SWE-bench. They're still not listed on Princeton's site, but they did credibly achieve that level of performance. So I call this mostly true. Then they say it has successfully passed practical engineering interviews from leading AI companies. That is a completely deceptive thing to say.

They were not interviewing a machine for a job. That's not a thing people do. That's not how you evaluate if you're going to buy a machine.

John Willis: I love your quote, like chefs don't evaluate toasters, you know?

Guest: Yeah. Yeah. And furthermore, it's not even an impressive capability. Over a [00:31:00] year ago, there was the AGI paper by Microsoft Research.

And what they showed is that GPT-4 is able to score highly, with clever prompting, on, you know, any number of professional exams. And it just turns out that because you can pass a bar exam as a computer program doesn't really mean the same as when a human can pass a bar exam. So they're basically saying, you know, based on this already established behavior of GPT-4 with no agent around it whatsoever.

You know, Devin's able to pass engineering interviews, you know, it's, it's a completely meaningless claim, but it is implying, again, that it's a real software engineer, which is not,

John Willis: well, it implies that, you know, I think the other point you've made, which is it implies that, that what I worry about is the, the leaders in the management.

You know, I think this is a longer discussion about, like, people, like I'm going down to present at this open networking conference next month. And the reason I've [00:32:00] been invited to speak is all of the board members believe they're going to reduce developer head count. And, God forbid, they're going to reduce infrastructure, you know, and they're believing the hype, and Devin sort of plays into this in the worst way, because this is my tribe, you know. Like, I don't even need, you know, DevOps people anymore.

Like I'm hearing this DevOps is dead and we've been hearing this since the day one, but now they're coming on very strong, like, you know, heavy, like Delta one on one heavy, you know, Yeah. So, but, but your point that like, like when they make these claims about passing tests or, or, you know, competing for job interviews, they're not really working in the job, right?

Like, yeah.

Guest: Yeah. And so then we get to the claim about actually doing a job, right? So we've established the engineering interview thing. I mean, that's just a repackaged thing, you know. They had tested GPT-4's ability to do things like that before [00:33:00] GPT-4 was even released. And yet it's over a year later and GPT-4 is not at those jobs.

So, you know, that's just a total red herring, again, to promote this sort of false narrative. And then the part saying it completed real jobs on Upwork, based on the demo they provided as evidence of that, is just also not true. It didn't complete a paid job. It did, you know, a subtask within it. It helped their expert programmer who was operating it potentially complete that job, but they didn't even ask it for the same thing that the job asked for.

And that's what Internet of Bugs...

John Willis: Yeah. He does a great job of, like, it's sort of an AWS exercise and, like, they don't even specify what cloud or, you know, like it's just, yeah,

Guest: and it's not that it didn't do anything useful in the course of these demos. That's not really the point. Yeah, it's a matter of how honest are you being,

John Willis: you know,

Guest: you're throwing the name of our profession around.

And how honest are you being with [00:34:00] how you present, you know, what's happened. And, you know, are you really in the position, like you mentioned thought leadership. Well, thought is in their name: Cognition. Yeah. They're presenting themselves as in a position where they could have some thought leadership on the future of the industry, and this is how they're using information.

John Willis: You know, I think the other one, and you explained it, and then the Internet of Bugs guy... Does that Carl guy have a last name, or is that just going to be a mystery to everybody? Which guy? The Carl who does the Internet of Bugs.

Guest: Oh, yeah, I I am trying to figure out

John Willis: I'd

Guest: love to talk to him.

John Willis: Yeah, I'm gonna talk to

Guest: you internet of bugs reach out to probably

John Willis: Totally. Yeah. Yeah, super bright. Super. I think he's purposely not, you know, like making this a hanger. But I love, like, you talk about it and, you know, the one thing that sort of got everybody really excited.

At least everybody I knew, which is the idea that it was doing, you found the bug, it debugged it and it fixed it. Right. [00:35:00] But then like, I'm like, Whoa, all right. And even then I was like, but you're putting like print statements in, but okay, like I'll give you like a half a credit, like that was pretty interesting.

But to hear that, like. It was already in the readme and it like, it didn't even see the readme and it actually created the bug that it actually fixed.

Guest: Yeah, it was cool that it introduced a print statement to, yeah, trace a bug. But it was a bug that it introduced while doing something unnecessary. I mean, amazing find.

I wouldn't have picked up on that, you know, found by Internet of Bugs.

John Willis: Yeah, his demo of that is like spectacular of like how, you know, like, you know, like, this is not the way anybody would even like, even do like a tail statement and you know, and like, just all the things that he shows in that are like, it's, we'll have that link in there as well.

So I think what's interesting is Devin AI kicked the hornet's nest, and, you know, you mentioned [00:36:00] SWE-agent, you also mentioned AutoCodeRover. In fact, you had asked on that podcast, like, hey, I'm going to go, and I love that you're doing this, right? Like, that you're going to go in and deep dive on a lot of this stuff, right?

And because you're passionate about our industry. And then you ask like, Hey, what, you know, what, you know, your listeners like, what do you want me to do next? Or first? But what, what is the sort of the, the, the reasonable list of things that we should be paying attention to? Yeah. Well

Guest: I will give you the, the pointer to the list first, which Okay.

You go, yeah, yeah. You know, my best attempt, which is no pilot dev. Which is a spinoff I created in the last couple of weeks of Mender. ai, specifically to talk about autonomous dev tools. Because I think now that Devin has sort of focused the conversation you know, with an imperfect start to the conversation, as I mentioned, but I, again, I think there's important contributions they've made in terms of laying out the product direction.

I do think in some form, in the right responsible form, [00:37:00] these are going to be really impactful, and I want to help make sure that impact is mostly positive. So nopilot.dev is where I discuss this and the current players, which, you know, I'll be continuing to update, because this is going to rapidly change right now.

Obviously Devin, if it were released, which it has not been in any capacity, would be a player. SWE-agent, we mentioned, was the first one on the SWE-bench benchmark.

John Willis: I love that you have a benchmark, right? It's pretty cool. Yeah, yeah.

Guest: It was the first other one competitive, and that's MIT licensed. OpenDevin is one of the early ones emerging that has a GUI similar to Devin.

Devika is another. So those are kind of a couple of the GUI leaders, and the benchmark leader right now is AutoCodeRover by National University of Singapore, a team out there, and they were the [00:38:00] first to claim, you know, beating Devin on the benchmark. SWE-agent was about at parity.

So AutoCodeRover and SWE-agent are the two, like, academic ones that are just sort of command line only right now. I think SWE-agent's building a GUI. And they have vastly different architectures. So the fact that, you know, within a month of Devin's release, two different agents by two different universities around the world from each other with two different architectures had both been competitive with this previously untouched score.

That's very interesting. And I want to see what the difference is between those architectures.

John Willis: Yeah, I think that, you know, like a friend of mine is running sort of AI for one of the big consulting companies. And I had a conversation with him and, like, it was kind of scary, almost like an army of bots.

And so it isn't like, you know, I mean, the [00:39:00] beauty of this technology is it's not really that hard. The things that were impossible for 97 percent of the planet, I'm just making those numbers up, but, you know, five or six years ago are now, like, it's insane how this is commoditized.

And, you know, the fact that you can sort of build these sort of, you know, mixture of experts or different agents or army of bots that can just, so it's, you know, it makes sense that like people. Didn't need Devin to say, Oh, you know, I think I can automate a lot of the junk that I do all day long. Right. So it makes sense that a lot of those things like popped out, like immediately it isn't like, you know, they saw Devin and like, okay, we sat down and write this stuff, right?

Like, this stuff was in the works by a lot of people, a lot of clever people. So yeah, no, I think that's, you know, pretty cool stuff. Yeah. I guess the other thing, oh, one question I'll ask you is, [00:40:00] you know, I know that the Cloud.com folk, they created Rancher, and now they've got a company called Acorn.

They've done a pivot and they've got this thing called GPTScript. Have you had a chance to look at that at all?

Guest: No, no, I can't comment on that one yet. It's on my list.

John Willis: All right. Yeah, take a look at it. It'd be interesting. I don't think they're, they're not even in this sort of like we're going to be Devin or anything.

They just do some tight integration with sort of the, you know, the system. Like, they can sort of integrate a prompt language with, like, a find command for code and then put the code into the... You know, it's just a nice integration with the sort of OS, and so I've been having a little bit of fun sort of just seeing where that's gonna go.

So I guess the last subject then is I'm really scared to death. Of you know, being the sort of the DevOps person and, you know [00:41:00] thinking a lot about that SRE is a great example of something that came out of scaling technical debt, right? Like, like, if you go all the way back to Google, you know, like it starts because they're the expansion and growth that they have.

Like, has to be separated. Like Mark Burgess, my good friend, wrote the original foreword to the original SRE book. Right.

Guest: And it was a fantastic book.

John Willis: Yeah, it is a fantastic book. It is the ground zero for that, this whole discussion. Right. And, and you know, one, in a conversation I had with him one time, he said, you know, one of the things that like Google did so amazing was they made a non deterministic world look deterministic to developers, right?

You know, which became like Borg, which we see as a cool platform, but through that platform, I think, evolved why you needed scale reliability. Right. So SRE was, in a lot of ways, you know, sort of my [00:42:00] theory, but it's not anything profound, excuse the pun, that it's what scale introduces. You have to think differently about how you manage your... it's a part of your job right now, right?

Indeed. Right. You have to think differently about how you manage things at scale. And I worry that we're going to fall into this mistake that we constantly make, where we're going to think that, you know, the Devins and the sales pitches and the copilots and the chatbots that are just going to get introduced to everybody, like, oh, no, no, don't worry.

Like, SRE is dead, DevOps is dead. You don't need it anymore. AI is going to take care of all this. And what we're going to see in these large institutions is a minefield of vector databases, you know, massive variants of orchestrations of LangChains and LlamaIndexes and different versions. And, you know, like, I thought Ruby gems almost [00:43:00] ended my life at one point.

And that's kid's play compared to the academic Python library dependency structure. Right. And then you get into the models and the embeddings and, you know, I have this fear that in a year from now, the CIO who thinks that they created the new chief AI officer, and then when the infrastructure and DevOps people who will clean that up come to him, like, hey, you know, we need more people now.

Like, what do you mean? It's AI. Well, I got four different vector databases. I got, you know, and so I worry that we're not thinking clearly, like, you know, we've seen this movie before. You know, we've used SRE, we've used DevOps principles, we've used platforms. I think a fair amount of people are thinking about platforms, but if we're not putting this all together, I think we're going to be in, like, I mean, it could be an existential crisis for somebody, and maybe I'm getting too [00:44:00] melodramatic, but I mean, what are your thoughts about this as somebody who has played on both the development side and now you do reliability engineering?

Guest: So yeah, I think I have similar worries to you, right? It's not that I'm afraid that AI is going to take my work away.

I'm afraid it's about to give me too much work.

John Willis: Yeah. Yeah.

Guest: I'm the one that has to come and clean up the messes, you know.

John Willis: And what they're going to say is, you don't need any more head count. There's AI.

Guest: Right, right. Well, you know, here's the thing. If AI makes your people so much more productive, then you should be able to have more head count.

If you were actually capable of taking advantage of productivity when you had it, that should mean you're so much more profitable, right? With however many developers you have, if you can actually scale productivity and what you get out of it in terms of business value. So I think, you know, yeah, people need to think very carefully about what it is they're even saying. You know, it's a leadership thing.

If you think you have a productivity gain and you can't figure out how to make more money with it, [00:45:00] the only thing you can figure out how to do is cut head count, that's, you know, not an AI problem. That's a human problem. That's just incompetent leadership. But as far as, yeah, the history of technical neglect.

Yeah, I don't see that changing. And I see a lot of almost panic-driven adoption of things, or at least urge to. You know, I don't know how well people are reining in their urges, but yeah, this could create an avalanche of technical debt that we're not prepared to take advantage of. But they will be new systems, so they will probably not yet be essential.

So a lot of these features will just get launched and cut, maybe. But yeah, I think it's just going to create a lot of havoc if we're not very careful. Before I was in reliability, before I went into SRE proper, I was doing a lot of legacy code work as a [00:46:00] backend developer. And so, sort of same story: you're just going to add to the legacy pile, you know, the more new code you're able to write.

So you better be good at managing your legacy code. That's really in the long game. The future of software is simply the future of software maintenance. So what effect are we having on the future of software maintenance?

John Willis: Yeah, yeah, yeah, no, and you know, and I think going back to the sort of the technical debt thing is that we you know, like, I think you had a great point in one of your videos about, like, you know, Whatever you finished, you thought was greenfield at the end of last year is now legacy, right?

Like, so legacy is just this cascade of all these things that we've produced, and they're the things that make the money, right? Yeah, no, you know, I also saw that you sort of mentioned The Phoenix Project and all, like any good DevOps person should, you know, but you seem to have [00:47:00], you know, you've been interested in Goldratt and theory of constraints.

And as I was thinking about all this discussion we had, it goes back to systems thinking, right? Like, if we're short sighted and we think we can sort of replace this for this, are we looking at the overall flow or, what Goldratt would say, you know, choosing local optimization over global optimization, right?

So.

Guest: Yeah. Actually, on mender.ai, when I go over the sort of knowledge sources it's based on, the one that, you know, might look like an odd one out in there is theory of constraints, which is, yeah, the methodology, the flavor of systems thinking, I like to think of it as, that Goldratt introduced. And I'm a little more read up on Goldratt.

I'm still trying to wrap my mind around a lot of the Deming, I'm catching up, but he was very good at thinking clearly and showing [00:48:00] people how they hadn't thought through things clearly. What really is our goal? What really are the factors that are preventing us from meeting it? And how can we, you know, kind of resolve these false conflicts that our local incentives, you know, cause.

John Willis: No, in fact, the whole reason I'm into Deming is primarily because I met Gene when he was halfway done with The Phoenix Project, it wasn't even called The Phoenix Project, and before he would give me an early copy, he told me, you know, like, I need you to read The Goal. So I read The Goal, and then I read, like, almost every book that Goldratt had, because I was consumed by it. And then I think there was a couple of things that prompted me towards Deming, but the thing that really got me sparked to write a book about Deming was Eliyahu Goldratt's Beyond the Goal, where he talks about the way a physicist thinks.

And then he says, by the way, I was, and so was Dr. [00:49:00] Deming, a physicist. And I was like, there's something there, like there's something about the way these guys think differently than Drucker and all the other sort of classic management guys.

Guest: Yeah. And that really also is a way that I think programmers can appreciate some of their teaching is because they are, they, they really do emphasize kind of brass tacks logic, even as it applies to things that are more human and more squishy, that like, it allows us to relate a little bit more.

I wish more programmers would get into these two thinkers, but I think the, because yeah, of their maybe more hard science aspect of their background, it does make it more relatable once you get into it. But yeah, in terms of just the, the most shallow entry level theory of constraints of looking for the bottleneck, what's the bottleneck in software development?

You know, it is, one of the big ones, I would say maybe the central one, the ability to own the software we've already written. And so when I saw people focusing solely on the fact that GPT-4 can [00:50:00] write new code in isolation, I was like, well, this is just failing to even recognize the nature of the work.

John Willis: Yeah, yeah, yeah, yeah. And then, you know, then we could sort of spend, like, a whole other hour on bias, right? Like, there's a lot of great books about bias and sort of image recognition, but there's clearly bias in code, right? There's a lot more Java and C in GitHub than there is assembly language, right?

Right? So, like, you know, depending how far ago, and, like, a lot of the banking Java code, right? It's not going to be in GitHub, right? So there's a whole interesting thing about the bias of code, not really understanding what actually runs Fortune 500, Fortune 1000 companies today, right?

And so to your point, as it's generating stuff, and this will all get resolved, but as it's generating new code, that bias is going to affect the interconnectivity, the legacy, all the things that it couldn't really understand. So, yeah.

Guest: So I do have confidence that sooner [00:51:00] or later, if you give people a new hammer, and their first instinct is to start hitting themselves in the hand with the hammer, that they will eventually need to stop.

John Willis: Move their hand.

Guest: Yeah, yeah, they'll eventually move their hand or something, but, you know, I don't know how long that'll take. I'm trying to kind of skate where the puck is going to be and get ahead of that a little bit. You know, but the positive thing I see is, you know, 'cause I talked to coaches.

I talked to people who have been trying to be change agents for a long time. And they are always saying, man, management just wasn't willing to change. People weren't ready to change. Well, in every area of knowledge work, people are more ready to change now, because of AI, than they have ever been. I think we can bring something positive from that.

John Willis: And I think just to end, you know, and I'll put links to all your stuff, but, you know, I'll have you sort of tell people where the best place to find you is. But I think [00:52:00] the thing that attracted me to you is your genuine passion. It just bleeds. And, you know, I think most people who would spend an hour listening to me are most likely to find you as interesting as I have, so keep up the great work, my friend.

And where do people find you?

Guest: Yeah. Thank you so much, John. I mean, you know, you're a DevOps pioneer and, you know, this was an honor. So my YouTube channel is Craft vs Cruft, and mender.ai is my site, generally on refactoring using AI, applying AI to legacy code. I have both, you know, concrete things and philosophy about that there.

And if you're interested specifically in following this autonomous dev tool situation, that's where I'll be focusing with nopilot.dev.

John Willis: And so what's the no pilot, just a play on copilot or just... [00:53:00]

Guest: So there was, it's renamed now and they've developed it into some pretty interesting tool. It was called GPT Pilot, which was the response to copilot where it's more autonomous.

So that was kind of a pre-Devin Devin as well. I think Pythagora is what the company calls it now. But so no pilot is the more autonomous... I'm not calling it a pilot. I'm not comparing.

John Willis: I get it. I get it. I mean, I hate to pull this out in the final minutes, but, like, NoSQL didn't really mean no SQL, right?

Like, yeah. Yeah. Good. Good. This is great. My true. Yeah. That's awesome. Yeah. Yeah. So I thought I was wondering if they became that's cool. All right. Well, good. But yeah, thank you.

Guest: Thank you very much.