Code with Jason

255 - Ghost Engineers with Yegor Denisov-Blanch and Simon Obstbaum

Jason Swett

In this episode I talk with Yegor Denisov-Blanch and Simon Obstbaum about their Stanford research on developer productivity. They share findings about "ghost engineers" (9.5% of developers who do minimal work), discuss challenges in measuring engineering output versus productivity, and explain their data-driven approach to software engineering assessment. The conversation explores how different developers contribute varying value, how life circumstances impact work motivation, and their methodology examining source code and Git metadata. The researchers highlight the importance of quantifying engineering contributions and have collected data from over 50,000 engineers in their ongoing study.

Speaker 1:

Hey, it's Jason, host of the Code with Jason podcast. You're a developer. You like to listen to podcasts. You're listening to one right now. Maybe you like to read blogs and subscribe to email newsletters and stuff like that to keep in touch.

Speaker 1:

Email newsletters are a really nice way to keep on top of what's going on in the programming world, except they're actually not. I don't know about you, but the last thing that I want to do after a long day of staring at the screen is sit there and stare at the screen some more. That's why I started a different kind of newsletter. It's a snail mail programming newsletter. That's right. I send an actual envelope in the mail containing a paper newsletter that you can hold in your hands. You can read it on your living room couch, at your kitchen table, in your bed or in someone else's bed, and when they say what are you doing in my bed, you can say I'm reading Jason's newsletter. What does it look like? You might wonder what you might find in this snail mail programming newsletter. You can read about all kinds of programming topics, like object-oriented programming, testing, devops, ai. Most of it's pretty technology agnostic. You can also read about other non-programming topics like philosophy, evolutionary theory, business, marketing, economics, psychology, music, cooking, history, geology, language, culture, robotics and farming.

Speaker 1:

The name of the newsletter is Nonsense Monthly. Here's what some of my readers are saying about it. Helmut Kobler, from Los Angeles, says thanks much for sending the newsletter. I got it about a week ago and read it on my sofa. It was a totally different experience than reading it on my computer or iPad. It felt more relaxed, more meaningful, something special and out of the ordinary. I'm sure that's what you were going for, so just wanted to let you know that you succeeded, looking forward to more. Drew Bragg, from Philadelphia, says Nonsense Monthly is the only newsletter I deliberately set aside time to read. I read a lot of great newsletters, but there's just something about receiving a piece of mail, physically opening it and sitting down to read it on paper.

Speaker 1:

That is just so awesome. Feels like a lost luxury. Chris Sonnier from Dickinson, Texas, says just finished reading my first Nonsense Monthly snail mail newsletter and truly enjoyed it. Something about holding a physical piece of paper that just feels good. Thank you for this. Can't wait for the next one. Dear listener, if you would like to get letters in the mail from yours truly every month, you can go sign up at NonsenseMonthly.com. That's NonsenseMonthly.com. I'll say it one more time: nonsensemonthly.com. And now, without further ado, here is today's episode. Hey, today I'm here with Simon Obstbaum and Yegor Denisov-Blanch. Welcome.

Speaker 2:

Thanks so much for having us, Jason.

Speaker 1:

Thanks for being here. Good to be here. So, Yegor, you put out a tweet recently, maybe a series of tweets, on developer productivity.

Speaker 2:

And.

Speaker 1:

I don't know how I came across it I think maybe it kind of went viral or whatever and it caught my eye because it said something along the lines of 9.5% of developers literally just don't do anything. They don't do any work, but they collect a paycheck anyway. And I was very unsurprised. I've long had kind of a hypothesis that the great minority of developers create the great majority of value and they just kind of subsidize the salaries of everybody else. And so, because I had given that matter so much thought already, I saw that and I was like wow, these guys are studying this exact thing that I've thought about so much. So I'm curious how did you go down this path of going into this topic?

Speaker 2:

So in a past life I was looking after a number of digital transformation projects for a large company. We had several thousand software engineers, and it was always very paradoxical how, despite software engineers being so data-driven, we never had a data-driven way to make decisions around our software teams. And so, sitting in meetings, sometimes even at board level, it never felt right that the decisions we were making were not grounded in data, in good data, and that instead we were using a combination of intuition, a sprinkle of politics, and probably some data that we knew wasn't super accurate, but it was the best we had.

Speaker 1:

And when you say decisions, what kind of decisions do you mean?

Speaker 2:

Well, anything from integrating companies that were acquired, to making decisions around outsourcing partners and vendors, to upgrades to the IT infrastructure and what that meant for the engineers we had and how that transition would look.

Speaker 1:

Okay, got it. Okay, and you were maybe kind of frustrated that these decisions were being made, but not in a data-driven sort of way?

Speaker 2:

I was almost convinced, and people around me were also convinced, that we weren't maximizing decision quality. There was still room left to do better, and the problem is that we, the same as many other companies, were in the situation of, hey, this is what we have, this is the best that there is. There are different ways to slice it, but ultimately there's still a lot of room for improvement. So then that takes me to Stanford, where I've been since 2022, and that's when we launched the research.

Speaker 1:

Okay, interesting. Yeah, the decisions and data-driven and that kind of stuff is interesting to me, because making decisions can be interpreted or construed in a number of different ways. There are different kinds of decisions. There's the kind of decision where you have a fork in the road and you can take road A or road B, and those are kind of your two options. And there are decisions like, well, what do I do with the rest of my life? where it's very open-ended. And there's also the data-driven aspect. You know, I have kind of a beef with this idea of data-driven decisions. Not that data is bad; obviously it's smart to base your decisions on data.

Speaker 1:

But what's missing from the picture for a lot of people, a lot of the time, and I'm sure you guys would agree, is that the data is just source material. It, in itself, is not anything more than information to be used, and data isn't even necessarily truth. You see these facts, but you have to put the facts through a critical thinking process to figure out, okay, I have this spreadsheet or whatever. What does this mean? What can I infer from this? To build useful knowledge about the world in order to make decisions. And the reason I mention that is because I think a lot of people take it way too simplistically and they're like, oh well, I have this data and I made a decision, so that was a data-driven decision and so it was good. But I just want to emphasize that there's more to the picture than that, of course.

Speaker 2:

Yeah. I completely agree, and I think we're big proponents of always using data and context and combining different sources of data and understanding kind of the pros, the cons and the biases that each source of data introduces. So I mean fully aligned with you.

Speaker 1:

Yeah, yeah. I can certainly empathize with, or I guess sympathize is the word, seeing this process of decisions being made in a way that's not very data-driven, or, more broadly, in a way that's just not very smart. It's painful to watch, and it's painful to be part of and get dragged along for the ride. But how does that translate specifically to an interest in researching developer productivity? Because that's not where I personally would have gone with that. This must have really grabbed you. I'm curious about that.

Speaker 2:

Yeah, I mean, I taught myself coding almost 20 years ago, and in the mid-2000s I had an e-commerce business which I implemented myself, and, anecdotally, that was the most money I've ever made throughout my career. So this was back in the day when implementing an e-commerce business wasn't just a template that you can plug in in a few clicks with a tutorial. I was using textbooks and sketchy websites to learn this stuff, so it was quite fun.

Speaker 2:

I've always enjoyed software engineering, and it's not that I came to Stanford with the mission to solve this. It's that while at Stanford, I was introduced to people, one of them being Simon, who were also wondering about this space, also passionate about this space, and we thought it would be a very interesting direction to start exploring. And then, as we explored further, we saw that, hey, there's really something here. We think we can propose something that is a different way of looking at this, and which hopefully spurs conversations and, ultimately, innovation and progress in this field in a way that maybe someone else up until now hasn't thought about.

Speaker 1:

Yeah, and this might be a dumb question, but why did you choose to share this information? Obviously, if you spend all this time researching, you're going to want to share what you found. But were you outraged and wanted to share and say, hey look, this is crazy? Or did you want to share it in hopes that somebody would use that information to act on it somehow? I'm curious about that.

Speaker 2:

Yeah, so once we started identifying these kinds of ghost engineers, they would pop up and we would just see it as a normal thing, because I've worked in organizations and been part of teams where I suspected, almost knew, that some people were really not doing much. And so, as the research went on, we had the data on this. Of course, the interesting part about this research is that the more research participants we have, the more valuable the research becomes for everyone. So, early on, companies were very interested in participating in the research, and the value we could provide in terms of benchmarking and maturity of the research would grow the more research participants we attracted. As for the reason we decided to post it: I saw a thread, originally on X, started by the venture capitalist Deedy at Menlo Ventures, and I left a comment just saying, hey, I research this stuff and here's some of what I see.

Speaker 2:

People really started engaging with it. Something that I thought was normal, where I thought, well, look, this is just how businesses are and I just don't think too much about it, turned out to be something there: some kind of taboo topic, something that people knew was happening, but nobody really had concrete data and evidence on, or had even attempted to research. So upon seeing that, I said, wait a minute, I've literally run this analysis, and we can put something together and publish it, because people will be interested in it. And that's kind of what drove, specifically, the publishing of the statistics.

Speaker 1:

Yeah, yeah. And I'm curious, when you have a suspicion that somebody is not really doing much, what does that look like specifically? One sign is just a lack of things coming from them. They're not submitting much code or anything like that. Maybe in meetings their updates sound like BS or something like that. But what does that look like?

Speaker 2:

I think there are many flavors of this. I can touch on a few, and maybe Simon can chime in as well, because he probably has some experience with this too, in real working environments. This can mean anything from people actively trying to mask themselves, and that's usually when it becomes, I think, malicious.

Speaker 2:

I've talked to a number of these people who don't do much. They've reached out, I've received lots of love mail and lots of hate mail, and I've engaged with both. The people who say, I'm a ghost and this is why, it's usually because it starts off with them being frustrated with their job, and they start decreasing their performance. There's something blocking them, something they don't like, maybe an unclear feedback loop between their work and the results or the rewards or the recognition. There's something there, some kind of politics. They decrease their performance, and at some point they realize, nobody's really asking me, hey, why has your performance decreased? So then they just keep doing it.

Speaker 1:

Interesting, Simon. Did you have anything to add to that?

Speaker 3:

Yeah, look. So I think there are a number of dimensions to the problem. There are people who actually do nothing, so there is just no activity, right? We've probably all seen something along those lines throughout our careers. You have people that are in meetings and they don't say a thing. If you're online: no camera, no mic, no comments, nothing. Then there are people who contribute something. You think it's not a lot, but you don't really have the time to always dig into it and question it. It's almost like they're trying to gaslight you. It's like they're trying to convince you that their contributions are super awesome and important, and then you may choose not to fight the battle at that point in time.

Speaker 3:

So I think, specifically in my career, I also ran pretty big teams, and as a CTO, sometimes I was the last person to find out when a team was struggling. You know, if it's wrapped in a number of organizational layers, it can take time until, let's say, reality transpires to you. So I thought it would be good to have something that would give you a level of insight into what's going on. And I had a personal interest in understanding what drives people to deliver high performance in general, so I thought it would be a good thing to have. Throughout my career, I was missing something like that.

Speaker 1:

And I imagine you guys have encountered what you might call involuntary ghosts, like people who want to work but their bosses just aren't really giving them anything. Have you encountered that?

Speaker 2:

So the thing is, when we identify a ghost, we don't go and talk to them, right? So then it's a bit hard to get to the truth of why that person is disengaged or a silent quitter. Based on the people I've talked to, and I've had chats with about 10 people through Signal, basically conversations, and then a bit more than a few dozen others have reached out by email and I've exchanged messages with them, some of them maybe do voice that. But mostly the themes I see in talking to these people are just frustration, disengagement and unhappiness, feeling, I don't know, stuck. That's kind of how it all starts, and then it ends up evolving into different things. But that's a theme I sometimes encounter when talking to them.

Speaker 1:

Interesting, interesting. And does your research include, aside from the ghost engineer idea, the natural variation in productivity? The best people are more productive than the worst people. How much does your research include that kind of stuff?

Speaker 3:

So look, I think it's normal, and we all have that, that there are just periods when we're more productive and periods when we're less productive, and I think that's completely fair. And if we looked at just one data point, then we wouldn't know: are you kind of in a low, or are you in a high? So the thing is, we run long-term analysis and a long-term study. So if you don't do anything over, let's say, half a year or a year, then how much time is enough time to get out of a low? I think that's the question here. So that's one. People may have a low, and I guess they just need some support to get out of it. But I'm just speculating here. So that would be totally fair, and we hope that with our research you could kind of help those people.

Speaker 3:

There are people, obviously, and we have encountered some of those, that work two jobs, and then they realize, hey, I can do that, I can get away with it, and then it becomes kind of like, oh, okay, so I'm not getting caught, so I'll just continue to do that. And that's, in my view, a toxic behavior that shouldn't be accepted. Right, because in the end, you as an employer don't know if somebody is working two jobs. It's like an affair, right? One person is committed and the other one isn't.

Speaker 1:

Right.

Speaker 2:

Yeah, it's not the deal that the other party agreed to.

Speaker 1:

Yeah, interesting. So I work as a consultant, and I started part-time in early 2023 and then full-time in late 2023. So, as of this recording, I've been full-time a little more than a year. Early on, I would consult with CTOs of very small software companies, where the CTO is the only programmer, and then I started working with bigger teams, maybe 10 or 50 or 60 people. And in all of this, and I'm sure this is not lost on you guys, there's a huge psychology component.

Speaker 1:

It's almost like you could say it's not software engineering with psychology mixed in, it's psychology with software engineering mixed in, and it's just so fascinating to learn how different people think and how different people think of their job and what it means to do a good job and why we have a job and what you owe to your employer and what they owe to you and all that kind of stuff.

Speaker 1:

Me personally, I'm driven by ego and conceit and megalomania and stuff like that. I'm driven by showing that I'm the best, like, hey, watch this, I'm going to do this way better than you guys. So that's me, but that's not the typical case. For some people, their job is like a distant second focus in their life, and they just want to get out of work as fast as possible so they can go do their main thing, whatever they do outside of work. And I'm not making any value judgment on that, but it definitely results in different behaviors while you're at work than somebody who's thinking about programming and stuff like that, people who are just intellectually engaged 24-7, always reading books and stuff like that, versus somebody who clocks out of work and immediately is watching TV or doing some hobby or something like that.

Speaker 3:

I mean, I hear you, and it's true what you say, but in my view this is also very black and white right now.

Speaker 3:

So it could be that I'm you in one phase of my life, and I'm the other person in another phase of my life, and I think that's totally fair, and I think the company should support that. I guess my point is just that I may have these highs and lows; it doesn't have to be different people. And we see that actually in the data set, that people have periods when they're very engaged and very hungry, and then they're less so. It could be, you know, maybe they have a family now, and I think that's totally fair. I think also there are people who are professionals and are very focused when they work, but then after work they just want to be left alone. I think that's also fair, but that's not really in the context of the research. That's more about the work environment per se.

Speaker 1:

Yeah, it's interesting. I just wanted to...

Speaker 3:

Yeah, you know, it just triggered something in me: you don't have to be different persons to be one or the other. You can be both. Yeah.

Speaker 1:

Yeah, I really hadn't thought about that. And as I think about it a little bit, you know, I'm 40 now. Back when I was like 24, I remember that I would work during the day and then I would go home and be like, all right, what's happening tonight, where's the party? It was all socialization. That was all I wanted to think about. I was single at the time, living in an apartment in downtown Austin, Texas. There was tons of stuff to do. Now I'm 40 and I have two kids and a wife.

Speaker 1:

We live on a farm in the middle of nowhere and it's a very different life, and farms are expensive, so I'm trying to earn as much income as I can, because, you know, we've got a barn to pay for and stuff like that. So that's a very good observation. People go through different seasons of life and, depending on their life situation at the time, they might have a different appetite for work. Okay, and I'm curious also about any reaction to this research. I think I've already seen some reaction. I'm not like a super-duper finger-on-the-pulse social media kind of guy, so I'm kind of oblivious to what might have happened, but I'm curious what the response was.

Speaker 2:

There's a bit of everything. This whole topic and discussion started with, you know, Deedy, the VC at Menlo, and he received commentary from other prominent people in the tech space saying, hey, I know this happens, it's a fact because I've seen it, and the degree to which it happens may be bigger or smaller, right, but it's a fact. And so there's been a lot of movement and support around that.

Speaker 2:

Of course, there are always people who dislike it or who find holes, and I think that's totally normal. And the reason we put this out there and published the paper is so that people would look at it, give us their feedback, and then we can use this feedback to improve the next version of the model or the future papers that we're planning to release, because what we're doing is not just an isolated research exercise. We have the intention to release more papers, the next of which is going to be out in a few weeks. And, yeah, I think feedback is great, and we welcome all thoughts and opinions, and we try to read as many of them as we can to incorporate that feedback into what we do.

Speaker 1:

And are you ready to talk about what those next areas of research are going to be or not yet?

Speaker 2:

Sure. I mean, I don't want to promise things that may not materialize, but the paper we're writing up right now looks into using LLMs to evaluate the output of software engineers, in a sense. So it builds on our first paper, and it compares and looks at some pros, some cons, and some ways to potentially enhance what we are doing with LLMs. That's one direction. The other direction is definitely to continue going down this, let's call it, ghost engineer topic, in the sense of publishing a more rigorous methodology of what we did, what we didn't do, and what are some of the biases that could be present, and just being super transparent around the good things but also around the things that maybe didn't work out so well.
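
To make the idea concrete for listeners, here is a minimal sketch of what LLM-based evaluation of a single commit could look like. This is not the Stanford team's methodology; the model name, the rating rubric, and the rate_commit helper are illustrative assumptions, and it only scores one diff in isolation.

```python
# Minimal sketch (not the paper's method): ask an LLM to rate one commit diff.
# Model name, rubric, and JSON keys are illustrative assumptions.
import json
import subprocess
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def rate_commit(repo_path: str, sha: str) -> dict:
    # Pull the diff for one commit straight from Git.
    diff = subprocess.run(
        ["git", "-C", repo_path, "show", "--unified=3", sha],
        capture_output=True, text=True, check=True,
    ).stdout[:20_000]  # truncate very large diffs to stay within context limits

    prompt = (
        "You are reviewing a single Git commit. Rate it from 1-10 on "
        "implementation complexity, maintainability, and apparent effort. "
        "Return JSON with keys: complexity, maintainability, effort, rationale.\n\n"
        + diff
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# Example: rate_commit("/path/to/repo", "abc1234")
```

In practice one would aggregate ratings like these over many commits and calibrate them against human judgment, which is closer to the kind of comparison the researchers describe exploring.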

Speaker 1:

Yeah, okay. And for measuring, if I remember what you said a second ago correctly, measuring developer productivity with the aid of LLMs: it seems like if you endeavor to measure productivity, then a prerequisite is having some idea of what productivity means. So I'm curious, what does developer productivity mean to you guys?

Speaker 3:

Well, I think we don't know. I think that's why we measure output and not productivity. We conduct research in the domain of software engineering productivity, and I think that's the biggest problem, right? There is no definition of the term.

Speaker 3:

And that's the problem. Personally, what inspired me to work in that space is that I saw people still working with lines of code, counting commits, doing very basic things, or, you know, using story points to measure their team's productivity. So we thought that there should be something better. And then we saw that there is not a lot of research done to establish that. We also don't know what's the best way to measure it. We're trying one way, and we're in the phase where we're getting feedback, and some of it is positive, some of it not that positive, and we're thinking about how we can improve it. But this is exactly what we wanted.

Speaker 1:

Okay. So when you say, you know, we don't know what developer productivity is, so that's why we want to measure output, I'll ask then what does output mean to you?

Speaker 3:

Right. So essentially it's a score. I'm talking now in the context of our paper, and I don't want to get very technical with it, because I think your listeners who want to dive deeper can look at the paper. But on a very high level, we saw that people count commits, they use story points, they use very basic metrics to measure the output of their software engineers. And then, for example, when you use story points, you cannot really compare that across your team, because if you go through a series of adjustments, or people estimate differently, it may not mean the same thing across teams. So we thought it's actually crazy that people try to measure output without looking at the source code. And that's what inspired us to build a model that looks at the source code and all of the metadata that you have in Git: when it was committed, the changes, the actual content of the commit. So we parse the code in there, we try to put it in context of the repository, we look at a number of data points in the code, and we try to build a score that makes the output score more meaningful.

Speaker 3:

We used an expert panel to calibrate the algorithm because, for me, as an engineering manager, if I had a team that was struggling, I would always send a team of trusted people in there to look at the situation. They would do the review, they would look at the architecture, they would look at the practices. So I thought it could be valuable to have a quicker way of getting the same feedback that a team of trusted engineers would give me, and that's how we came up with the initial study setup of the paper that we published.
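
To give a flavor of the kind of Git metadata being discussed, here is a minimal sketch, assuming GitPython, that collects per-author commit features from a repository. It covers only the raw-data step, not the model from the paper, which parses the code itself in the context of the repository and is calibrated against an expert panel; the function name and feature choices are illustrative.

```python
# Minimal sketch, assuming GitPython: collect per-author commit metadata.
# This is only raw-feature extraction, not the paper's calibrated output score.
from collections import defaultdict
import git  # pip install GitPython


def commit_features(repo_path: str, branch: str = "main") -> dict:
    repo = git.Repo(repo_path)
    per_author = defaultdict(lambda: {"commits": 0, "insertions": 0,
                                      "deletions": 0, "files_touched": 0})
    for commit in repo.iter_commits(branch):
        author = commit.author.email
        stats = commit.stats.total  # {'insertions', 'deletions', 'lines', 'files'}
        per_author[author]["commits"] += 1
        per_author[author]["insertions"] += stats["insertions"]
        per_author[author]["deletions"] += stats["deletions"]
        per_author[author]["files_touched"] += stats["files"]
    return dict(per_author)

# Example:
# for author, features in commit_features("/path/to/repo").items():
#     print(author, features)
```

Raw counts like these are exactly the naive metrics discussed later in the conversation; the point of the research is to layer source-code analysis and expert calibration on top of them rather than stop here.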

Speaker 1:

Interesting. Okay. So I don't really understand, but let me try to. Let's go to one extreme, maybe, and then we'll come back. Obviously, it would be ideal if we could just directly attribute a developer's contribution to a piece of business value and say, oh, this day this engineer created $5,000 a month worth of business value, and so we can measure that against the value that this other person created. Obviously, it's not knowable, because there's so much in between the developer's contribution and the realization of the business value.

Speaker 3:

That's true. But then I want to add, if we're talking about that particular example, that there is always a separation of functions in a company, right? You may have a business person say, hey, build me that, and you may be building it in a very productive way, in a very efficient way, but then there is no business value to it. So do you want to punish the engineer because he built what the business requested? I think you're already mixing two things, and that's why, in our setup, we propose that looking at the outcome is very important. I think what you described is a business outcome that the engineer facilitated in a way, and that way you could infer productivity. But that's a number of things that have to come together and work out beautifully so that you get to that $5,000 outcome that you gave in your example.

Speaker 1:

Yeah. And sorry, if you haven't happened to catch any episodes of the show in the past, the show is all about tangents, and, if you don't mind, I'm going to drag us on a tangent now.

Speaker 1:

I think it's a really interesting area. You know, there's that separation layer where maybe an engineer built something really efficiently, but for the person who asked for that feature, it was the wrong thing to build, and so should we punish the engineer? I think there's an argument to be made that an engineer has an obligation, and you're not always in a position to do this, but an obligation to push back on building things you don't think should be built. For example, the other day, I shouldn't say the other day, I already said it, but I'll say it in a little bit more cloaked way and be less candid than I was going to be.

Speaker 1:

But there was a feature in one of my client's organizations that was going to be built and somebody showed it to me and they asked me how I would go about estimating how long it would take.

Speaker 1:

And I asked some questions about it, and I discovered that for this project, which maybe would have been, ballpark guess, a $30,000 or $50,000 project, something like that, before we took the measure of doing that $50,000 project, we could probably do something in one or two hours that would take no coding and would give us the same or better business outcome. We could try that first, not even instead, just try this really cheap thing first, and then, if that didn't do what we needed, we could consider the $50,000 project. And, of course, provided that I'm right, which I don't know if I was or not, an engineer who can do that is so much more valuable than an engineer of equal technical capability who either doesn't have the ability or the willingness to say, no, no, no, let's not do this thing at all. I don't know where that falls into the question of output and productivity and all that stuff, but I couldn't help but raise that philosophical question.

Speaker 3:

I think you're hitting on a good point, and I think the intuition that you want to focus on outcomes is a good one. Ultimately, as a business, you need different teams and different people that have to work together to create great outcomes for the business, and I think that's a good thing to focus on. But what we see across our study participants is that they're very conscientious about the outcomes they want to achieve. They have a good prioritization framework to pick the next thing to build. But then we also see that they have a very high output, and I think that's why we want to focus so much on the output right now, because when you talk to people in Silicon Valley, at least in my experience, they will focus on the outcomes anyway. So that's top of mind for people.

Speaker 3:

But what we're saying is, hey, why not look at the output as well? I think it's something that you should not neglect. And we see across our study participants that companies that have a very high output, but also have a good culture in place that allows them to deliver against outcomes, seem to be the most successful ones, both from a business standpoint and from an engineering standpoint. You can see these high-performing, very healthy engineering organizations where all of it comes together. So I think that's what we ultimately want to demonstrate: you can't just pick one and be good at that, and if you want to be among the best, then it's harder. You've got to focus on a number of things.

Speaker 1:

Yeah, yeah. It seems like that part can probably be pretty safely left out of the picture.

Speaker 1:

You know, the gap between output and the ultimate value. Because, paraphrasing something that I think you were saying, people who tend to be good at some things tend to be good at most things. Organizations tend to do a good job at everything, or a mediocre or poor job at everything. Rarely do you have an organization that does an excellent job at one thing and a terrible job at another thing. As I say that out loud, I'm not sure that what I just said is actually true. Maybe it's sometimes true and sometimes not, I don't know.

Speaker 3:

I think it's true. I think almost anything you can imagine is something that we see in the participants. It's a pretty diverse group of companies. I think, you know, measuring software engineering productivity is a hot-potato topic, and I think that's part of the reason why Yegor's tweet went viral.

Speaker 1:

Yeah, yeah, I know. Yeah, go ahead.

Speaker 3:

Sorry, I just think it's important that you also try to quantify it. It shouldn't be completely left in the dark. We can argue about what's the best way, but for me, not measuring is not an option.

Speaker 1:

Not measuring is not an option.

Speaker 2:

Hmm.

Speaker 1:

Why do you say that?

Speaker 3:

Because I felt like that. A lot of engineers that I talk to just don't like to be measured. They say, hey, I'm not a salesperson, it's different. Exactly all of the arguments that we heard about why you cannot look at output: because output doesn't matter, you've got to look at outcomes. So I'm not sure I'm delivering the point well, but I think you can deliver a great outcome while doing nothing. That's what I'm saying. So you want to focus on outcomes, but you also want to make sure that people work hard, because that kind of gives you the pace of things. And I would agree that it's not only the code that matters. Engineers have to do many things, but our hypothesis is that everything you do flows in one way or another through the code.

Speaker 1:

Yeah, that's definitely true. There's a distinction between whether it's possible to do something in principle, whether it's possible to do something in practice, or whether it's just not possible even in principle. And it seems clear to me that measuring developer productivity is possible in principle. If you were an omniscient being who could just know everything, then you could know who's productive and who's not. That seems clear to me. So I think the question is not whether it's in principle possible to measure it, it's what it would take in reality, and can we do that? Does that seem right?

Speaker 3:

I think what you say is true. There are ways now, with the advent of LLMs and machine learning, to know a lot more. So it may not be true in all cases. Again, I think we talked about it in the beginning, that you should take data with a grain of salt. So I don't think we're able to deliver perfect data with the analysis, far from it, but I think it's better data relative to what we have been using in the past.

Speaker 1:

Am I understanding right that we're going off of the code? So, okay, we can think of this whole sequence or life cycle of something, where it goes from conception to implementation and release and all that stuff, and there's all this stuff out in the world. There are just too many factors mixed together. It's not realistic to factor all those things in, like we talked about. We can't go all the way to the business value, because that's just mixed in with all this other stuff. So, if I consider this stack of factors or whatever, are we going to the code level and kind of stopping there?

Speaker 1:

Like, if we imagine that instead of using LLMs there's just some super smart guy, and we can say, hey, Jim, go look at all this stuff, we're having Jim look at the code base. We're not having him look at the UI of the product or something like that. Right? Okay, okay. Yeah, that's interesting. I'm trying to imagine, and, by the way, I'm not trying to take a position on this, I'm just trying to understand: if I were to look only at a code base and I could see it evolve over time and see people's contributions and stuff like that, how would I measure people's output and make meaning of that? That part is pretty unclear to me. Obviously, there are these, what you might call, naive metrics, like lines of code and stuff like that.

Speaker 1:

And I think it's well agreed that that's not a meaningful metric to go off of, but it's not clear to me what is.

Speaker 3:

Okay, so we agree on lines of code; just making that clear. I think a lot of it can be inferred, and I think of it like this. Just think of quality, right? It's tough to measure. You don't know what's the right amount of quality and what's the right amount of refactoring, so a lot of people struggle with that.

Speaker 3:

But then, when you look at your team and you know the code base and see how it evolves, you can put that in context with teams that have a different level of, I don't know, unit test coverage, if we want to use that as an example. Then how do they fare over, let's say, two or three years? Does their output stay the same? Is it better, is it lower? That's the kind of inference we're trying to make here. We also don't know what's the appropriate amount of quality; I think it really depends on your context. But we see how it works out for certain teams, and we can compare that. As an engineering manager, you have it in your intuition that you should focus on a clean architecture, you should probably have a healthy level of test coverage and automation in place, and if you don't, you'll be struggling with defects and unplanned rework, which will slow you down over time. And it turns out we can see that with our measurement. So now we have something that we kind of always knew but weren't really able to quantify. We have a model to do that, and that also allows you to build a business case for, let's say, the non-technical people you're working with, and that can help you, as an engineering manager, make the case for why you should now invest in refactoring or cleaning up, for example.

Speaker 3:

And there are many of these seemingly small decisions. A lot of people say, hey, we have to deliver things fast, so we don't have the time to clean up now. So we see how people try to move fast, they take shortcuts, they have a big release, and then you have a dip after that. It could be that people are tired and need to recover, but there's also a significant amount of cleanup in their code base that needs to happen, and then you could be doing something that adds business value and delivers good business outcomes, but instead you're cleaning up because you took a shortcut. So you can see these kinds of decisions and behavioral patterns that you may have as a team in the code, and that was our hypothesis when we started the research.
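
As an illustration of the release-then-dip pattern Simon describes, here is a minimal sketch, assuming GitPython and pandas, that buckets a repository's churn by week so a post-release drop would show up as a trough in the series. The weekly lines-changed metric is an illustrative stand-in, not the study's output score.

```python
# Minimal sketch, assuming GitPython and pandas: weekly churn as a time series.
# A dip after a big release would appear as a trough in the resulting curve.
import datetime
import git        # pip install GitPython
import pandas as pd


def weekly_churn(repo_path: str, branch: str = "main") -> pd.Series:
    repo = git.Repo(repo_path)
    rows = []
    for commit in repo.iter_commits(branch):
        when = datetime.datetime.fromtimestamp(commit.committed_date)
        rows.append({"when": when, "lines": commit.stats.total["lines"]})
    frame = pd.DataFrame(rows).set_index("when").sort_index()
    return frame["lines"].resample("W").sum()  # total lines changed per week

# Example: weekly_churn("/path/to/repo").plot() would show post-release dips
# as troughs in the weekly churn curve.
```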

Speaker 1:

And you have to have a large sample size in order to be able to draw meaningful conclusions and tell what's what, because obviously projects have a lot of factors involved in them. It's very easy to say this happened on project A, and then later such and such happened, and because the first thing happened, that's why the other thing happened later. But it could just be a coincidence. So I imagine you probably have to have a large sample size.

Speaker 3:

I think that would also be the strongest counterargument against what we're doing. It's very hard to isolate things, because there are always a lot of things going on in an engineering organization. But, yes, you can address that with a larger sample size, and I think that's also why it took us so long. We started this over two years ago, and it took us practically two years to release the first paper, because a lot of it was really convincing companies: hey, there's research we want to do, here's why we want to do it, and please give us access to all of your data. Oh, wow.

Speaker 3:

So it took a long time to build critical mass, and in the beginning there was really nothing we could offer in return; it was just, hey, we have this idea, can you support us please? So it took us a long time to convince enough companies to join the study and also to stay in the study, because we obviously want to see how people act once they get the data, and whether that changes anything. We really see this as a long-term study over many years. So we're at 50,000 and change in terms of engineers that are in the study.

Speaker 3:

Oh wow. We continue to onboard study participants. So we want to continue to do this, and we'll keep publishing.

Speaker 1:

Okay. And as we head toward a conclusion here, is there anything that you would just love to shout from the mountaintops? Anything you learned in the course of doing this research that you wish everybody knew?

Speaker 2:

It's probably a number of things; I'm not sure we can isolate it down to one. Perhaps the biggest one is: don't be afraid to have these conversations. Yes, it can perhaps be a bit intimidating to shine a light, one that could also be distorting things slightly, into the closet, because you don't know what's going to come out and you don't know what's going to happen.

Speaker 2:

But ultimately, I think that having conversations around this topic, around, hey, how do we measure output, how do we measure engineering productivity, what does productivity mean, how can we improve developer experience, how does that impact developer productivity, all of these topics are super important, especially now with the rise of LLMs, AI, ML, all of this stuff which is going to transform how software engineers work. And so, to some degree, if you're unable to measure output and to measure a bunch of other things, how can you really quantify and understand?

Speaker 2:

Number one, the impact that these new technologies are having on your software team.

Speaker 2:

Number two, decisions around using these technologies, not just around model selection or fine-tuning, but also ways of working, ways of incorporating them into your workflow.

Speaker 2:

A lot of it can be derived through, yeah, you can ask and you can kind of see what feels right, but some of it also needs to come from data, right? And so that's perhaps the thing I would like to encourage people to do: just think about this critically and be open to it. The concept of measuring productivity doesn't mean you're going to lose your job. It doesn't mean you're going to, you know, whatever. It just means that, look, there are ways we can do things better. And, at the end of the day, if software engineering were less political and less based off of intuition and all these things that nobody likes and that some people are better at playing than others, and more transparent and meritocratic and fair, then I think this would be an improvement for everyone, from ICs to team leads and managers to senior leadership, to even people who work adjacent to ICs and engineers.

Speaker 1:

Well, before we go, is there anywhere you want to point people where they can find your research? Or if people want to get involved and help you guys, just anything at all you want to share where we can send listeners?

Speaker 2:

Yeah, I encourage them to visit our research portal, which is softwareengineeringproductivity.stanford.edu. As well as this, we also publish regular insights on X. Simon isn't super active there, but I'm trying to be a bit more active. And then LinkedIn and other social media channels are also a good way to reach out to us. We're always happy to chat and talk about this stuff, and we enjoy pushing this forward and making progress to make software engineering a bit better for everyone involved.

Speaker 1:

Awesome. Well, Simon, Yegor, thanks so much for coming on the show.

Speaker 2:

Thank you.