Pragmatic Data Scientists
Make Data Useful
Why is Breaking Things Good In Product Development? | VJ1
By definition, a SEV is a good problem to have. When I started my career, I had a lot of ideas about how software engineering should be done, how we should really think about the pristine architecture that is infinitely scalable. I now believe that every successful piece of software out there in the market, everything that has become extremely successful, way beyond what you expect, is hacky. Yeah. You know why?
YZ:Hi, Vijaye. Hi. Yesterday we had a stand up, and you talked about a SEV that happened recently. It was a big SEV, and you talked about your philosophy of handling SEVs. That was quite shocking for me, because it's the first time I've heard a leader take this kind of attitude towards SEVs. What happened there?
VJ:Well, you're talking about the stand up where I talked about our SEVs, and I mentioned that SEVs are a natural part of building software, of building production software that is used by customers at scale. As an engineer who has, you know, been in the industry for like 25 years, I know very well that it's impossible to write software without bugs. And so the idea is not to prevent SEVs. The idea is to learn from SEVs and prevent the same SEV from happening again. So it's about mitigating the issue and then putting guardrails in place that would prevent that particular issue from happening again.
YZ:At the beginning, it sounds natural and correct. But when I think about how other leaders I've worked with, or even ordinary engineers, view SEVs: as much as people try not to blame anyone, SEVs always come with a very negative connotation. Yeah. So you are actually at the other extreme: these are not only not bad or blameless, they are also natural.
VJ:Yeah, I've built software projects, both big and small. Some of them have been very successful. Some of them have not been successful. I would rather be in a position where a SEV happens because your product is successful and is used by people. Because if a problem happens when nobody is using your product, it's not a SEV. And so, by definition, a SEV is a good problem to have. It means people care about your product, and you, as the software engineer building the product, care so much that you don't want to let your customers down. That's why you call it a SEV; that's why you treat it as so important. And so, yeah, I take a very different position about this. I don't think we can avoid SEVs. I don't think we can avoid bugs. And that should not be the motivation.
YZ:Oh, that is actually interesting, because from a data scientist's perspective, when we look at SEVs, we know something is wrong. But when I think about the software I use, a lot of the time there are very bad bugs that never get reported or solved. Yeah.
VJ:I mean, the fact that we acted quickly is important; the fact that everyone came together to mitigate the SEV is important; the fact that we addressed the problem by worrying about the mitigations first, without worrying about blame, without worrying about what caused it; those are all important. So I think a SEV as a process has a very finite set of steps that you have to take when you enter one of those. Then, after the SEV is mitigated, you can really worry about what the real fix is, how you avoid the same bug or regression happening again, and how you put some controls in place so that, you know, the system overall gets better than before. So a SEV is a good way to harden your product and get to a better place.
YZ:Yeah. But still, from a software development point of view, I watched your interview with Josh, and he named the topic "code is cheap". You mentioned that at the beginning of your career you aspired to write bug-free, scalable code with good architecture and good data modeling. But I think to this day, engineers still view SEVs as a bad thing, because we feel like SEVs happen because we are not competent.
VJ:Yeah, no, no, actually, it's not true, because when I started my career, I had a lot of ideas about how software engineering should be done, how software should be built, how we should really think about the pristine architecture that is infinitely scalable. Yeah. But when you think about it in practice, you're always trading off against product success and business success. Some of those have a time window, and you need to be able to balance and trade off what to prioritize. We as engineers, we love, you know, pristine architecture. We love recursion. We love to get into this thing and do so many optimizations, whether it's algorithms or data structures. Some of those are premature optimizations; some of those you don't have to do until you give the product to the customers. So you kind of want to optimize for getting the product to the market as quickly as possible, and then if you do end up finding problems, you can always go back and fix them. But if you miss the window, there's no opportunity for you. Actually, I'll take another stance here, which is: I now believe that every successful piece of software out there in the market, everything that has become extremely successful, way beyond what you expect, is hacky. Yeah. You know why? Because no matter what, you can only build software that can scale maybe an order of magnitude, maybe two orders of magnitude. But if a product is a runaway success, we're talking about like four, five, six orders of magnitude bigger. And so if you had spent all the time building for that, then you wasted your time. And in general, when a product is becoming very, very successful, the team is constantly going and fixing things that are breaking at the seams. Yeah. You're fixing things, you're patching things over time. You accrue a lot of what I call tech debt.
And it's okay to accrue tech debt, and there will come a point in time where you have so much unsustainable tech debt that you will go back and clean it up, and as you're cleaning it up, you're keeping the product that's already successful.
YZ:And now I think I have this mental model: we think the distribution of successful products is a normal distribution, but it's log-normal, and the successful ones are the extreme tails. And those are the ones that we actually see and use.
VJ:Yes, you don't talk about the products that were built with very, very good architecture but never really saw the scale.
YZ:Yeah. Yeah, so I guess the tradeoff is: put it in front of customers and let customers tell you if this product is good or not. Yes. If it's good, then solve those problems. Absolutely. Wow. Cool. This is a very deep philosophy. Thanks. Thank you. All right. Bye.
Eric:The philosophy is different from how we have to operate with infrastructure and existing products, and I think that's what it comes down to. So moving really fast and having SEVs in product development? Yes, for sure. But, for example, if feature gates were to go down at this point, we actually can't, in my opinion, just hand-wave it and say, "Hey, we're just moving fast," because it has become a core infra component for all the companies that use us. So I think there are different ways of slicing the same problem. From a large, overarching viewpoint, it definitely makes sense. But once you start to slice it between product-facing stuff and infrastructure stuff, and how our product is used by all our customers, it becomes more nuanced.
Marcos:I think it still applies to infra, though. For infra, when we started, we weren't going to build something that works for billions of events. The first time we build it, it's not going to work for billions of events.
Eric:But that's a new product once again, right?
Marcos:Sure. But in this case, with feature gates, now we care about it because people use it. Yes. At the beginning, when we were building it, it wasn't a big deal if it didn't work.
Eric:But if we cause SEVs now for feature gates. Sure.
Marcos:But now it's a successful product. So now we need to care about it.
Eric:So, we agree. It's just that that perception is very true for new and upcoming products, right? But once you are part of the critical infrastructure for everyone else, you can't treat it the same way.