Code with Jason

247 - Steven R. Baker, Creator of RSpec

Jason Swett

In this podcast episode, Steven R. Baker dives into test doubles like mocks and stubs, discussing their essential role in robust code development and challenging traditional testing practices. The conversation covers the nuances of Test-Driven Development (TDD), including writing failing tests first for better code clarity and test coverage, and explores RSpec's influence on TDD. Additionally, Steven examines Ruby's adaptability and the integration of AI in programming, providing listeners with actionable strategies for more maintainable codebases and a balanced view on AI's evolving role in software development.

Jason Swett:

Life hasn't been the same since the pandemic. Instead of working at an office around people all day, like we used to, most of us now work remotely from home, in isolation and solitude. We sit and we stare at the cold blue light of our computer screens, toiling away at our meaningless work, hour after hour, day after day, month after month, year after year. Sometimes you wonder how you made it this far without blowing your fucking brains out. If only there were something happening out there, something in the real world that you could be a part of, something fun, something exciting, something borderline illegal, something that gives you a sense of belonging and companionship, something that helps you get back that zest for life that you have forgotten how to feel because you haven't felt it in so long. Well, ladies and gentlemen, hold on to your fucking asses. What I'm about to share with you is going to change your life. So listen up.

Jason Swett:

I, Jason Swett, host of the Code with Jason podcast, am putting on a very special event. What makes this event special? Perhaps the most special thing about this event is its small size. It's a tiny conference, strictly limited to 100 attendees, including speakers. This means you'll have a chance to meet pretty much all the other attendees at the conference, including the speakers. The other special thing about this conference is that it's held in Las Vegas. This year it's going to be at the MGM Grand, and you'll be right in the middle of everything on the Las Vegas strip.

Jason Swett:

You got bars, restaurants, guys dressed up like Michael Jackson. What other conference can you go to, dear listener, where you can waltz into a fancy restaurant wearing shorts and a t-shirt, order a quadruple cheeseburger and a strawberry daiquiri at 7:30 AM, and light up a cigarette right at your table? Well, good luck, because there isn't one. Now, as if all that isn't enough, the last thing I want to share with you is the speakers. And remember, dear listener, at this conference you won't just see the speakers up on the stage, you'll be in the same room with them, breathing the same air.

Jason Swett:

Here's who's coming: Irina Nazarova, Freedom Dumlao, Prathmasiva, Fito von Zastrow, Alan Ridlehoover, and me. There you have it, dear listener. To get tickets to Sin City Ruby 2025, which takes place April 10th and 11th at the MGM Grand in Las Vegas, go to sincityruby.com. Now on to the episode. Hey, today I'm here with Steven Baker. Steven, welcome.

Steven R. Baker:

Hey, thanks for having me.

Jason Swett:

You and I had a nice conversation some months ago and we talked about RSpec, which I believe you are the creator of. Yes, that is correct. Yeah, so I'm very happy to have you here, and we crossed paths again because there was some discussion on Twitter about mocks and stubs and such. I suppose it was triggered by me sharing the table of contents for the new book I'm working on, and you said something that I thought was really interesting. I think you said something like: please don't make it focused on dependencies and getting dependencies out of the way, and stuff like that. I forget the wording, but I'm really curious to hear more about that.

Steven R. Baker:

Yeah, so basically, when most people talk about mock objects, first of all they're usually talking about stubs, and there are several different types of fakes all under the umbrella of test doubles. I can go into more detail on that shortly.

Steven R. Baker:

One of the bugs up my arse about this kind of stuff is that there's a lot of misunderstanding around testing and the various methods of testing, and one of the misunderstandings, with regard to mock objects in particular and stubs, is that people tend to only use them to abstract away a third-party dependency, like a database or an API service, an authentication service or something like that, something that has to talk over the network or has a dependency. And I believe the thing I said was: managing those dependencies is actually a side effect you get, a benefit you get, but it's not the point of mocks. Speaking of mock objects specifically, not stubs and not fakes and not other kinds of test doubles: mock objects in particular are about verifying the interface by which two objects speak to each other, the messages they send to each other.

Jason Swett:

Yeah, yeah, so let's get a little deeper into that. Sometimes I get a little bit mixed up on terminology and stuff like that, because in this area of mocks and stubs there are a couple of different use cases. Let's see if I can come up with an example off the top of my head. Say you have a method that you want to assert gets called, and so you might mock that method, if I'm using the right word, and then you can say: assert that this method was called. Did I just string together a coherent thought?

Steven R. Baker:

Yeah, yeah. So it probably helps to talk about the different kinds of test doubles. The definitive guide on this is Martin Fowler's article called Mocks Aren't Stubs. Under the umbrella of test doubles, there are mocks, there are stubs, there are fakes, and there are spies. I'll talk about mocks last, because those are, I think, the most interesting.

Steven R. Baker:

So a stub is just an object that, when you send a message to it, replies with a canned response, the same response every time. There's no logic in it. It's an object that you put in place of another object, and it just has a canned response, so that you can be sure that you know what you're getting back. The common use case for stubs is a random number generator. Let's say you have code that uses a random number, and obviously it's really hard to test a random number. So you might replace the random number generator object with a stub that always returns the same number. That would be a good use of a stub, so that you know what you're getting back. A fake is an object that doesn't actually do the work but can be used in place of another object. It has the same interface, obviously, and might have a little bit of meat in it, a little bit of behavior, but generally not. Fakes and stubs are pretty much the same.
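A minimal sketch of the stub Steven describes, in Minitest. The names (DieRoller, StubRng) are hypothetical, invented for illustration:

```ruby
require "minitest/autorun"

# Hypothetical class under test: rolls a die using an injected
# random number generator.
class DieRoller
  def initialize(rng)
    @rng = rng
  end

  def roll
    @rng.rand(1..6)
  end
end

# The stub: same interface as Ruby's Random, no logic, one canned answer.
class StubRng
  def rand(_range)
    4
  end
end

class DieRollerTest < Minitest::Test
  def test_roll_returns_the_stubbed_number
    assert_equal 4, DieRoller.new(StubRng.new).roll
  end
end
```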

Steven R. Baker:

Spies are really interesting. A spy is an object that has all of the actual behavior but records how it was called. So let's say you have a database interface. You might put a spy around your database interface object so that you still interact with the database, but then you can assert on the messages that you sent to it. A mock, by contrast, is an object that just knows how it was called. It doesn't do anything; it just knows how it was called. A spy does the thing and knows how it was called. So if you're trying to divorce yourself from a third-party dependency, you wouldn't use a spy, because the spy would still call the third-party dependency.

Jason Swett:

But a mock would not. Yeah, okay. So this is super helpful for me personally, and I'm sure for a lot of people listening, because all that stuff is stuff I've used; I just didn't know the right terminology. It's kind of like I play guitar, and I'm familiar with certain intervals and their characteristics, but I don't know what a given interval is called, and then someone says: oh, that's a minor seventh. I never knew that! But whatever.

Steven R. Baker:

I was thinking exactly the same thing, because I'm also learning how to play guitar now, and I was asking a friend of mine: can you help me? I'm trying to play this song, but I can't figure out what this chord is. And he said: oh, that's the minor sixth. And I said: what the fuck is a sixth? I don't know how to play a sixth, I know how to play a G. And so he taught me about the circle of fifths.

Jason Swett:

This was very early on. And that was a thing I learned recently too, yeah. And so with this stuff, by the way, for whatever it's worth, I was planning to address all of that in my Mocks and Stubs chapter, although there is definitely some research between where I am now and being capable of documenting it all with the correct terminology. But the snapshot you saw was only what I had so far, which is the third-party dependency stuff.

Jason Swett:

And I started with that because that's one of the more common questions I get and one of the more frequent mistakes I see. I find people wanting to, well, they'll stub the response and then assert that the stubbed response is what they expect.

Steven R. Baker:

Yeah, I see that all the time.

Jason Swett:

Yeah, it's like, bro, you just stubbed it.

Steven R. Baker:

Of course it's going to be what you expect. Yeah, this test cannot fail. I learned a thing many, many years ago, I can't remember who from, probably Brett Shewart. It's a common saying, but I'm pretty sure Brett is where I learned it from many years ago.

Steven R. Baker:

Never trust a test you haven't seen fail. And I'm making some notes here, because I have a few points that I think are important to cover. So, when you're doing the third-party dependency extraction: I think in the table of contents you shared, you talked about Stripe and Google authentication or something like that. Stripe, yeah.

Steven R. Baker:

And those are excellent places to use test doubles, exactly for the reasons that we've already covered and for the reasons that I'm sure you cover in detail in the book. But those are probably not mocks; those are probably stubs, right? So, the way that I describe mock objects to people: in Martin Fowler's article, he talks about mockists and statists. Okay, and I am a hardcore mockist, because I look at the world in terms of Smalltalk, which is a dynamic live environment.

Steven R. Baker:

Ruby is heavily inspired by Smalltalk. It's a dynamic live environment, and objects communicate with each other by sending messages. When you send the message bar to object foo, which in Ruby would look like foo.bar, you're sending the message bar to foo. You are not calling the method. The object is calling the method in response to having received the message. Now, a lot of people ask me, or complain to me when I explain this, that that sounds like a whole bunch of bullshit: you're actually calling the method. You're not. In Python, when you do foo.bar(), you are calling the method; that's what you are doing. But in Ruby and in Smalltalk, you are sending the bar message to the foo object. The object then decides what it will do with that message. That's why we get to have things like method_missing, and this is made obvious by the send method on Object in Ruby, because you can send an arbitrary message to an object. So you are not calling the method, you are sending the message.
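A small sketch of the distinction: the explicit send, and an object deciding for itself what to do with a message it has no method for. Foo and its methods are invented for illustration:

```ruby
class Foo
  def bar
    "bar's normal behavior"
  end

  # Runs when Foo receives a message it has no method for; the object
  # decides what to do with the message.
  def method_missing(name, *args)
    "foo received the message #{name}"
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

foo = Foo.new
foo.bar         # => "bar's normal behavior"
foo.send(:bar)  # the same thing, written as an explicit message send
foo.baz         # => "foo received the message baz"
```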

Steven R. Baker:

So when I'm describing mock objects, I usually tell people to picture a graph where the nodes, the circles, are the objects, and the messages they send to each other are represented by the edges. So picture a graph with some circles and some lines between the circles. One circle is the object foo, and on the line between it and this other object over here, there will be a label that says bar. That's the message. When you write assert, you are making an assertion about the state, so the node.

Steven R. Baker:

When you use a mock, you are making an assertion about the edge, the message. So if you have two objects, A and B, and A sends a message to B, the mock is testing that A sent the correct message, formed correctly, to B. It's not actually doing the work; it's just testing that point. It's kind of hard to do this without being able to draw it out. I probably should write this up and draw a diagram. But basically, a mock object is for making an assertion about the interactions between objects. So mocks are for interaction testing, and assert is for state testing, and both are used. This leans up against a couple of, I believe I said on Twitter, I have all kinds of unpopular opinions about testing. One of my unpopular opinions about testing is that a unit test is only a unit test if it has one concrete instantiation.
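A minimal sketch of the edge assertion Steven just described, using Minitest::Mock: the test verifies the message A sends to B, not any resulting state. Checkout and Notifier are hypothetical names:

```ruby
require "minitest/autorun"
require "minitest/mock"

# A: the object under test. It should send :notify to its collaborator.
class Checkout
  def initialize(notifier)
    @notifier = notifier
  end

  def complete(order_id)
    @notifier.notify(order_id)
  end
end

class CheckoutTest < Minitest::Test
  def test_complete_sends_notify_to_the_notifier
    notifier = Minitest::Mock.new
    # Expect the :notify message with this exact argument.
    notifier.expect(:notify, true, [42])

    Checkout.new(notifier).complete(42)

    # Fails if :notify was never sent, or sent with the wrong arguments.
    notifier.verify
  end
end
```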

Jason Swett:

What does that mean?

Steven R. Baker:

So if you have a unit test, let's say you have one unit test for the Foobar class and it's called FoobarTest. If you're doing this in hardcore Steven Baker style, the only object you're allowed to have an actual instantiation of is Foobar. Every other object needs to be a mock or a stub. Got it? And so then the test and the unit that's under test can live together with nothing else in the system. Now, do I always write code like that? Fuck, no. But it is the ideal.
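A sketch of that ideal, with hypothetical names (Invoice, StubTaxPolicy): the unit under test is the only class concretely instantiated; its collaborator is a hand-written stand-in, so the test and the unit can live together with nothing else in the system:

```ruby
require "minitest/autorun"

# The unit under test: the only "real" class in this test.
class Invoice
  def initialize(tax_policy)
    @tax_policy = tax_policy
  end

  def total(subtotal)
    subtotal + @tax_policy.tax_for(subtotal)
  end
end

# Stand-in for the real TaxPolicy, which never appears in this test.
class StubTaxPolicy
  def tax_for(_subtotal)
    10
  end
end

class InvoiceTest < Minitest::Test
  def test_total_adds_tax
    assert_equal 110, Invoice.new(StubTaxPolicy.new).total(100)
  end
end
```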

Steven R. Baker:

Another ideal that drives the way I design my software is one assertion per test. The reason why one assertion per test is important is that when an assertion fails, the program stops executing at that point and you get your backtrace from there. If you have three assertions in a test and the first one fails, you know the first one fails, but you don't know whether the other two would have failed or not. And so I try to design my tests such that each test only contains one assertion.

Jason Swett:

I have a similar but slightly different view, and a different motivation for it, or an additional motivation, which you might also share. I think a test tends to be easiest to understand when it only has one topic. I see tests all the time, and I'm sure you do too, where there are like five assertions and they're all kind of different stuff. The setup might be the same, but the assertions are very different concerns. There are five different reasons why that test could fail, and that means that if the test fails, it's not abundantly clear why it failed, whereas if there were five separate tests, it would be abundantly clear why each one failed.

Steven R. Baker:

Absolutely, because you've made five different checks, and if only one of them fails, you are able to narrow down the reason for the failure very, very accurately, and so one assertion per test is one of those rules that I try to follow, yeah.

Jason Swett:

My slight difference from your view perhaps is that for me it's not one assertion per test, just one topic per test, and maybe that would be two very closely related assertions, for example.

Steven R. Baker:

Absolutely. So one of the examples that I use for this kind of thing: let's say I have some kind of custom special list and I add an item to that list. I might make two assertions there: that the list size has increased by one, and that the most recently added item is the one that I just added, because those are very tightly related. And I would put them in that order. I'd put assert list.size == previous_size + 1 first, because the second one can't be true, or is very unlikely to be true, if the first one is not true. So I don't always follow the one-assertion-per-test rule, but it's an ideal that I definitely try to follow, and when I break that rule, I'm absolutely making a very specific and deliberate decision to break it.
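A sketch of those two tightly related assertions, in the order Steven gives them. SpecialList is a hypothetical class invented for illustration:

```ruby
require "minitest/autorun"

# Hypothetical custom list class.
class SpecialList
  def initialize
    @items = []
  end

  def add(item)
    @items << item
  end

  def size
    @items.size
  end

  def last
    @items.last
  end
end

class SpecialListTest < Minitest::Test
  def test_add_appends_the_item
    list = SpecialList.new
    previous_size = list.size

    list.add("apple")

    # Size first: if this fails, the second assertion is unlikely
    # to mean anything.
    assert_equal previous_size + 1, list.size
    assert_equal "apple", list.last
  end
end
```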

Jason Swett:

Yes.

Steven R. Baker:

So yeah, we're totally on the same page.

Jason Swett:

Yeah, great. So let's see, going back to mocks and stubs and such. This is something that a lot of people don't understand, and I think largely because it is very rarely explained clearly, and so my hope is that in this episode we can give people a breath of fresh air with a straightforward explanation of what these things are. And I feel like you've already done a lot toward that end with your description of mocks and stubs and spies and fakes. Perhaps one difficulty that I've personally had: when I use a stub in WebMock, for example, I'll call stub_request, and it's very clear that I'm stubbing because I'm calling stub_request. But then with RSpec, for example, I might call expect(such and such).to receive(such and such).and_call_original. I didn't realize until this podcast episode, but, and tell me if I'm wrong, I think what I'm doing there is using a spy.

Steven R. Baker:

You're basically making a spy from first principles, yes. Because you're saying: make sure that I did the right thing, and actually do it. So you're manually constructing the idea of a spy. Now, I don't remember, and this is funny because I think it's the only part of RSpec that still has code that I wrote in it, I don't remember if rspec-mocks has the concept of spies in it. I cannot remember.
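For what it's worth, rspec-mocks does ship explicit spy helpers these days (spy and have_received). A minimal sketch of the spy-from-first-principles pattern Jason describes, with Mailer as a hypothetical collaborator:

```ruby
# Hypothetical collaborator.
class Mailer
  def deliver
    "delivered"
  end
end

RSpec.describe Mailer do
  it "verifies the deliver message was sent and still runs the real method" do
    mailer = Mailer.new

    # The expectation records the message; and_call_original still
    # executes the real deliver, which is spy-like behavior.
    expect(mailer).to receive(:deliver).and_call_original

    expect(mailer.deliver).to eq("delivered")
  end
end
```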

Jason Swett:

If it explicitly does, I haven't seen it. But then, I don't use...

Steven R. Baker:

I don't use RSpec, so I'm not... well, I use it when I'm forced to, like when I'm on a team that uses it, but I don't use it.

Jason Swett:

How ironic.

Steven R. Baker:

And it wasn't built for that. It wasn't built for using; it was built for teaching. So what you describe is exactly a spy; you're definitely embodying the spirit of a spy. But a spy is like a wrapper object that counts the messages it has received and calls the original. And if you don't have a framework, and I always recommend that when people are learning this stuff, they build one. I think that when you're learning how to do TDD, you should probably build your own testing framework. I build a testing framework in every language that doesn't have one, because I'm aggressive about TDD. So if you want to implement spies, I would just use a wrapper class for that.

Steven R. Baker:

So yeah, what you're describing is the idea of a spy, absolutely. Yeah.
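A minimal sketch of the wrapper-class spy Steven suggests: it forwards every message to the real object, so all the actual behavior happens, and it records how it was called:

```ruby
class Spy
  attr_reader :calls

  def initialize(target)
    @target = target
    @calls = []
  end

  def method_missing(name, *args, &block)
    @calls << [name, args]              # record the message
    @target.send(name, *args, &block)   # still do the real work
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private)
  end
end

list = Spy.new([])
list.push(1)  # the real Array#push still runs
list.calls    # => [[:push, [1]]]
list.size     # => 1
```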

Jason Swett:

OK, ok. Oh boy, where do I go from here? There are so many things I want to ask you about. I'm curious about this: although you said you don't use RSpec, are there any parts of RSpec that you think shouldn't exist?

Steven R. Baker:

Not really. I think that RSpec is good and people like it, and I'm very proud of that fact. I talked about this, and if people want to, they can go watch my keynote from Euruko last year, where I talked about where RSpec came from and why I created it. But the short version is: I created it to teach people how to do test-driven development without ever using the word test, and the reason is the word test itself.

Steven R. Baker:

The word test describes a post-condition. I started teaching TDD back in like 2001, 2002, and in every one of my classes, I was usually brought in by a company who was paying me to be there to teach this thing that they thought people should learn. So I didn't always have a class full of people who wanted to learn it, and there was always one dickhead in the group who said: you can't test something that doesn't exist, therefore your entire premise is flawed, I am so smart. And I noticed that people were getting hung up on the test-centric nomenclature. So my goal was to teach test-driven development without ever using the word test. I said: we're going to describe the expected effects of the software that we're about to write, before we write it. We're calling our shots. If you've ever played pool, calling your shot is something that really skilled pool players can do, right?

Steven R. Baker:

Double-entry accounting is very similar to test-driven development, and stuff like that. But I wanted to be able to show people we're not testing. Test-driven development is kind of not a great name for it. What we're doing is describing the behavior that the correct, completed software will have, and that's where RSpec came from. So, in terms of things that RSpec has that I don't like: there are a few things that I just would have done without. Let is a perfect example. If you look at the source code for let in RSpec, it's literally just a wrapper: it just creates a method that is called the thing that you named it. And we already know how to create a method.

Steven R. Baker:

So if you do let(:foo) { 5 }, every time you use foo, you get back 5. Well, you can also just do def foo; 5; end, and that is essentially the same thing that's happening under the covers. So it's that little bit of indirection that's just syntactic sugar. It doesn't really help anything.
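A sketch of the rough equivalence, with one caveat that Jason raises next: let is memoized per example, so it's closer to the ||= version than to the plain def:

```ruby
RSpec.describe "let versus def" do
  let(:foo) { 5 }

  # Nearly the same thing, written by hand:
  def bar
    5
  end

  # What let actually generates, approximately (memoized per example):
  def baz
    @baz ||= 5
  end

  it "returns 5 either way" do
    expect([foo, bar, baz]).to eq([5, 5, 5])
  end
end
```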

Jason Swett:

There's nothing special happening there. Well, the part that rankles me a little bit is the fact that it's not just a method, but a memoized method, and so it can lead to confusion.

Jason Swett:

And, you know, there are the two versions of let, regular let and let bang, and the difference is that let bang is invoked immediately. I use let bang 100% of the time, because to me the regular let is all cost and no benefit. It only poses a risk of introducing a subtle bug in the test. I mean, is there really such a meaningful performance benefit in lazy-loading the value that it's worth the risk of making your test harder to follow? I don't think so.
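A sketch of the difference, assuming a Rails-style User model (hypothetical here): let is lazy, so the block only runs if an example references the name, while let! runs before each example, like a before hook:

```ruby
RSpec.describe "let versus let!" do
  # Lazy: the block only runs if an example references lazy_user.
  let(:lazy_user) { User.create!(name: "Lazy") }

  # Eager: the block runs before every example.
  let!(:eager_user) { User.create!(name: "Eager") }

  it "has only created eager_user" do
    # lazy_user was never referenced, so it was never created. A test
    # that implicitly depended on it existing would now pass or fail
    # for subtle, hard-to-follow reasons.
    expect(User.count).to eq(1)
  end
end
```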

Steven R. Baker:

I would agree with that, and if I were still in charge of RSpec, I would just get rid of let: I would replace let with let bang, or I would remove let altogether. But it does fall nicely into the DSL that you get for describing your software, and that's the nice thing that RSpec gives you. I get emails very often, several times a month, from people saying: hey, thanks for writing RSpec, I never could get testing until I saw RSpec. And I appreciate that. The reason why I don't use it, and in fact I've never actually talked about this in a public way, this is a perfect place to do it, because I keep thinking, oh, I should really write this article on my blog, and I just never do.

Steven R. Baker:

The reason I don't think people should use RSpec is because it's cognitive overhead. It's a DSL, so you have to learn a special syntax. If you really understand Ruby very well, then that syntax is fairly easy to get, and the syntax is nice. It reads very well; that was the point. It reads like a description of what the software is going to do in plain English, and that's very nice. That's the beauty of DSLs. But there's a lot of discussion around RSpec best practices and stuff like that, and actually there's a website called betterspecs.org which gets passed around.

Jason Swett:

What's that? I said: you mean worse specs.

Steven R. Baker:

Yes, there's a website that everyone passes around called betterspecs.org, and I disagree with nearly every point they make on that website.

Steven R. Baker:

I think it's terrible advice in most cases. And so there's a whole bunch of people who sort of specialize in making RSpec suites better and designing best practices for RSpec and stuff. Whether they're right or wrong doesn't matter. The reason why I don't like that overhead is: your code is classes and methods, and if you use Minitest, your tests are classes and methods too. Technically your RSpec suite is as well, but it's abstracted away so far that it's not obvious. You learn good practices about how to clean up your code and make it better, and you can apply everything you've learned about making your code cleaner to making your test suite cleaner. All of the same rules apply when you're using Minitest. The rules for making your code cleaner do not cleanly apply to making your RSpec suites cleaner, and so now you have to learn an additional skill. You have to learn this funky DSL. And people are always amused when they see me write RSpec.

Steven R. Baker:

They love pointing out: oh, did you know that RSpec can do this? And I say: no, I fucking didn't. I'm just not aware of all of the little syntactic things, and I put most of them there originally. I sometimes forget where the dot goes. Oh, the one that always messes me up: is it to_not, or not_to, or is it both? I can never remember.

Jason Swett:

I think it's actually both these days, but I thought it was not_to... I never remember these things either.

Steven R. Baker:

Every time I need to do it, I remember it, and if I haven't written one like that in a couple of days, it's escaped my brain and I have to find it again later. But that's why I don't use RSpec: because Minitest is... well, any xUnit framework, and I have the second reason coming up. An xUnit framework is just classes and methods. So the things that you know about making your classes and methods in your production code cleaner apply cleanly to your test suite. That's beautiful. We already have inheritance, we already have dynamic dispatch, we already have all those things, and so we can clean up our test code the same way we would clean up our production code. It makes a lot of sense. There's no context switch, no cognitive overhead of switching between the test and the implementation. And the second reason why I suggest not using RSpec is that if you learn the xUnit pattern of testing, it cleanly applies, it's so simple, it cleanly applies to every language.

Steven R. Baker:

So, JUnit tests... by the way, are you talking about this? Well, the book that you're holding up is xUnit Test Patterns. The pattern: you have a test class that subclasses TestCase, all of your methods start with test, and you do your checks with assert. By the way, are you publishing video as well, or is this audio only? No, just audio. Okay, that's why I wanted to read it out loud.

Steven R. Baker:

The title of the book, yeah. So that style is called xUnit, because the first one was SUnit in Smalltalk, and then we got JUnit in Java, and so every other language had initial-Unit. It's called the xUnit pattern because if you learned how to do TDD in Smalltalk, you can now apply the xUnit pattern to doing TDD in Ruby. And some languages don't have classes. A couple of years ago I decided to learn Racket, and Racket is a Lisp, a Scheme; it's a functional programming language. It sort of has classes, but mostly it just has functions, and so you do have to figure out: okay, what is the equivalent of a test case? What is the equivalent of assert? Okay, now I know, because the xUnit pattern just describes: there's a test suite, which contains many test cases, which contain many tests, which contain assertions. It's just a shrinking thing, and as soon as you understand that pattern, you can apply it anywhere. Now, RSpec, because it's a wacky DSL, cannot look the same in every language, and RSpec has now been cloned in every language. So when I tell people, I mean, I do say that I created RSpec, but that's not the important thing that I did.
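The xUnit shape he's describing, in Ruby's Minitest: a class subclassing the TestCase equivalent, test methods, assertions. The same suite/case/test/assertion nesting maps onto SUnit, JUnit, and the rest:

```ruby
require "minitest/autorun"

class ArithmeticTest < Minitest::Test # the test case
  def test_addition                   # a test: method name starts with test
    assert_equal 4, 2 + 2             # an assertion
  end

  def test_subtraction
    assert_equal 0, 2 - 2
  end
end
```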

Steven R. Baker:

The thing that I did was: I wanted to change the way that people talk about testing. The RSpec-style test is what I call the describe-it pattern. Jest is a describe-it pattern testing library, but because JavaScript isn't as expressive as Ruby, you have to put in the parentheses and the arrows and all of this extra syntactic sugar to get blocks to work, and so it just doesn't feel right. So even if you do like RSpec, I don't think you should use that style in any other language, except probably Smalltalk, maybe Lisp or Lisp-likes. You need a really expressive language to be able to do the describe-it pattern without adding mountains of extra syntax. So I tend to stick to xUnit pattern stuff, because that's available everywhere. RSpec clones are available in lots of places, but I have to learn a lot more about the language. Like, if I'm new to JavaScript and I want to write tests in Jest using the describe-it pattern...

Steven R. Baker:

...I now have to immediately learn how to make a block in JavaScript, and how they work, and what they're for. I know how blocks work, but not everybody does. If I'm learning JavaScript, the block syntax is not going to be the thing I want to learn on the first day, but I am going to write tests on the first day. So one of the things I do is use the xUnit pattern to learn new languages.

Steven R. Baker:

When I wanted to learn Racket, and I'd worked with Lisps and Scheme before, I did some days of Advent of Code last year, and the first thing I looked up in the Racket documentation was: how do I make an assertion, how do I group my tests together, how do I run my tests? And then I didn't learn anything else. I just wrote a failing test, and then I learned just enough of the language to make that test pass. So I TDD language learning as well, and that works because I already know the pattern.

Jason Swett:

Okay, let me ask you a fucking question that is so seldom explained anywhere. Okay, it's explained everywhere, but only in a superficial way that I don't think really gets the point across most of the time. Why do we write a failing test before, and I know we talked about this already, but why do we write a failing test before we write the code? And then part two of the question: why do we write only enough code to make the test pass? And part 2.5 of the question: do you even look at it that way, of writing enough code to make the test pass, or do you look at it a different way? So, first part of the question: why do we write the failing test first?

Steven R. Baker:

Okay. So the reason I write the failing test first is because it gives me a clear direction of where I'm going. What I'm doing by writing that failing test is describing what the behavior of the working software will be when it's complete.

Jason Swett:

Man, I realize that my question wasn't even precise enough. My first question was actually multiple questions. There's writing the test first, and you could construe that as writing down the specification, and that doesn't necessarily have to have a lot to do with running the test.

Jason Swett:

Because the act of writing down that specification before you write any code is its own act with its own reasons, separate from the act of running that test, which has different reasons, or overlapping reasons, or whatever.

Steven R. Baker:

Yeah. So one of the reasons why I practice TDD so religiously is that I am often more verbose than I need to be, and anyone who listens to me talk knows this. I don't necessarily know when I've made my point. With the failing test, when it passes, I know I've made my point, I know I'm done. This software does everything that I said it would.

Steven R. Baker:

Now I can take the next step. And if I don't write the test first, I'm very likely to write more code than I needed. So when I TDD, I end up with less code. The other reason why I write it first is the guarantee: if I strictly follow the TDD practice of write a failing test, make it pass, refactor, then I don't even need code coverage tools, because my code coverage is always a minimum of 100%, because every line of production code I wrote was only written in order to satisfy a failing test.

Steven R. Baker:

There was no extra code, no other code written there, right?

Jason Swett:

By the way, I have a feeling you share a characteristic with me, where you hold positions in which you are in the vast minority, and nevertheless you're completely correct. Like the test coverage thing. We're in such a minority by saying, at least these are my words, I think test coverage is bullshit and useless.

Steven R. Baker:

Yes.

Jason Swett:

But we're totally in the minority with that.

Steven R. Baker:

So a coverage tool is important for bringing a legacy code base under test, and I define legacy code as any software for which there is not a test. I believe that comes from Michael Feathers. The coverage tool is really important there, to let you know when you hit a hundred percent: okay, now I can start doing some refactoring. That's the other reason why I do TDD. I do TDD because I get less code, it winds up being cleaner, and I immediately get 100% coverage. And when I have that high level of coverage and that high confidence that my software does exactly what I said it would, I can refactor mercilessly. Yes, mercilessly. And on top of that, there's less code written.

Steven R. Baker:

I already know that my code that was written with TDD is decoupled, by the definition of it, because it's already used by another thing. So let's say you have a class that's only ever instantiated once in your production code. It has one caller, one user. If you have one method that you only use in one place, you might end up with some tight coupling there. When you practice TDD, that method is now being called from at least two places: once in production code, by something that depends on it, and once by the test. So your code is now a decoupled unit by definition.

Jason Swett:

Yeah, yes.

Steven R. Baker:

And that's an additional benefit.

Jason Swett:

Okay. I dragged us off the trail, but I want to bring us back to the failing test aspect. Some of the things we talked about are points I definitely wanted to make. There's an additional point I wanted to make too, which is that one of the reasons for having a failing test first is to test the validity of your test.

Steven R. Baker:

Say that again.

Jason Swett:

One of the reasons for starting with a failing test is so that you can test the validity of your test.

Steven R. Baker:

Yes, absolutely. This is that rule that I talked about Don't trust a test that you've never seen fail.

Jason Swett:

Right.

Steven R. Baker:

I've seen a lot of people who don't write the tests first. They will go and write some code and then say: oh, I really should put a test around this. And they go and write a test, and I look at the test in a pull request review or whatever (which I also hate, the pull request workflow; trunk-based development for life). I'm looking at a pull request, and I look at the test, and I go: this test can never possibly fail.

Jason Swett:

Right. False positive.

Steven R. Baker:

Right, and with TDD you can avoid that. One of the things that I like to do, even with myself: there's this practice called evil pair that we do at code retreats, and I don't know where it came from. I learned of it by way of Paul Hammond at ThoughtWorks. He doesn't even remember meeting me, but we worked together very briefly; we paired on something almost 20 years ago at ThoughtWorks, and this infuriated me when he did it. You're pairing, two people at one computer. We only paired for like an hour.

Steven R. Baker:

He would write a failing test, I would make it pass, and then I would hand the keyboard back to him, and he would go and delete all of the code I had just written that was not necessary for that test to still pass. It really taught me a lot about moving in smaller incremental steps and being more deliberate about describing my software.

Jason Swett:

Yeah, okay, this is something I want to go deeper into, because it's really important. But let me just wrap up the other thing first. The failing test is so you can test the validity of your test, and not only should you see the test fail, but you should verify that it fails in the exact manner you expect. Because sometimes I see people write a failing test and it just gives some random error or something. It's like: hang on, not all failures are equally valid. That failure we just saw doesn't make any sense; it tells us that something is weird.

Jason Swett:

We need to see the failure. Not just any failure, but the failure we expect.

Steven R. Baker:

Exactly. One of the big bugs up my arse about this: in RSpec, you can say expect { block }.to raise_error. You can tell it: I expect an exception to be thrown here, without saying which exception you expect to be thrown. What the fuck is the point of that test?
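A sketch of the fix: name the exception, and optionally the message, so the test can only pass for the failure you meant. Checkout and EmptyCartError are hypothetical names:

```ruby
class EmptyCartError < StandardError; end

# Hypothetical class under test.
class Checkout
  def initialize(items)
    @items = items
  end

  def complete
    raise EmptyCartError, "cart is empty" if @items.empty?
  end
end

RSpec.describe Checkout do
  it "raises the specific error we mean" do
    # Too loose: `.to raise_error` with no class passes for any error,
    # including a NoMethodError from a typo.
    # Better: name the class (and the message) we actually expect.
    expect { Checkout.new([]).complete }
      .to raise_error(EmptyCartError, /cart is empty/)
  end
end
```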

Steven R. Baker:

The assertion needs to clearly communicate why it failed. And it needs to fail for one reason and one reason only. There shouldn't be multiple things that could happen here that could cause this test to fail. And of course, I'm speaking in ideals here; not everyone works in an ideal situation. I'm currently working on a code base that was started in 2007, and it was migrated from Minitest to RSpec, and that migration was never completed.

Jason Swett:

Oh no.

Steven R. Baker:

So we actually have two test suites. This is an old code base, and nobody who worked on it before ever TDD'd, I don't think, and it's very difficult to TDD on this code base. And I'm not criticizing people for not doing it or whatever. If you don't want to TDD, fine, that's fine. I know it's better for me; I think it's better for you as well. But if you don't want to TDD, well, look, it's your funeral. I'm not going to argue with people about it. But when you're not able to do these things, it really slows me down and really hurts, and I have a lot less confidence that the software I'm writing is actually doing exactly what I said it would.

Jason Swett:

Right. Okay, I don't have a smart response to that, but I want to bring us back to... okay, yeah, this is what I wanted to go back to: writing just enough code to make the test pass. That's ubiquitous in the TDD definition, but I don't exactly agree with it. I think you should write just enough code to make the current failure message go away.

Steven R. Baker:

Ah, yes. So we say just enough code to make the test pass. But what you're looking for is not necessarily pass. You're looking for make the test pass or change the failure reason. So it's totally okay for a test to fail three times on three subsequent runs after changes. But I really want the error message to be different on every run.

Jason Swett:

Right, and why?

Steven R. Baker:

Because that means that I've made movement.

Jason Swett:

You've made forward movement. Okay, maybe not necessarily forward. But why do you and I prefer to write just enough code to make the current failure message go away, rather than going all the way to writing enough code to make the test pass?

Steven R. Baker:

It's a smaller step, and smaller steps are easier to undo, and smaller steps result in smaller overall code.

Jason Swett:

Yeah, I'd agree with all that. We've talked about this idea of what I call speculative coding, where you write more code than what's necessary to satisfy the requirement. If you're going to try to write enough code to make the test pass, sometimes that's kind of a great distance, and it gives you more room to code speculatively, whereas if you only code enough to make the failure message go away, that's less rope to hang yourself with. And what's an example of when you might... oh, here's an example: I might write a test that calls a method on an object, but no such object exists, no such class exists.

Jason Swett:

So I get an error saying there's no such class, and I'll say: okay, I'll define the class, but do nothing more. Then I'll run the test and I'll say: okay, well, now there's no such method on this class, so I'll write the method. At each step, I know I'm not going to make the test pass. But if I define the class and then define the method and the initializer and do all this other stuff in one go, I don't trust myself not to make a mistake and accidentally put something in there that's not actually necessary.
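A sketch of those steps, letting each failure message drive the next tiny move. Greeter is a hypothetical name; the code shows the final state after three runs:

```ruby
require "minitest/autorun"

# Written first:
class GreeterTest < Minitest::Test
  def test_greets_by_name
    assert_equal "Hello, Ada", Greeter.new.greet("Ada")
  end
end

# Run 1 fails with: NameError: uninitialized constant Greeter
#   -> define the class, nothing more:   class Greeter; end
# Run 2 fails with: NoMethodError: undefined method `greet'
#   -> define the method, nothing more:  def greet(name); end
# Run 3 fails with: Expected "Hello, Ada", got nil
#   -> now, and only now, write the body:
class Greeter
  def greet(name)
    "Hello, #{name}"
  end
end
```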

Steven R. Baker:

Yep, and it's the same for me: I don't trust myself not to either. I call this getting a case of the might-as-wells. So: oh well, since I'm creating this class...

Steven R. Baker:

...I might as well write this. Okay, well, actually I probably also need this initializer too. Oh, and by the way, even though I'm not using it to make this test pass, I'm eventually going to need this interactor that gets passed in the constructor, so I might as well just put it there and store it for later. Right, and so then I wind up with a whole bunch of stuff.

Jason Swett:

That's not tested yet, and that I didn't strictly need to get that test to pass. Yeah, so I know to avoid the might-as-wells. Okay, and so obviously there are costs and risks to having code that's not covered by tests. Are there any other reasons why you think it's bad to get a case of the might-as-wells and code speculatively, adding code that you're not sure is needed?

Steven R. Baker:

Oh, absolutely. I'm not that smart. Other people have bigger brains and more intelligence than I do, but I don't have enough intelligence to be able to fit the entire world and the entire future into what I'm doing. And so if I build more than I need right now, chances are I'm going to have built the wrong thing.

Steven R. Baker:

In the agile community, we call this the last responsible moment. You delay every decision until the last responsible moment, and the reason we do that is because the last responsible moment is the point at which you can no longer put off making the decision. And the reason why you delay making those decisions is because the longer you can kick that can down the road, the more information and the more knowledge you have, and so the better-informed decision you can make. So if I get a case of the might-as-wells and I go, oh yeah, this thing is going to need to talk to that thing, so I might as well just put in a constructor parameter, even though this test doesn't need it, well, I might find out three days later, after a refactoring, that oh no, I don't need this interactor, I need that other interactor instead. And now I have to go back and change all of that code that's not tested, because it was added just because I might as well.

Jason Swett:

Yeah, that's really interesting. I hadn't thought about that angle of it. I've definitely seen that: you add something because you might as well, and then it turns out to be not just unneeded, but maybe it kind of addresses a need yet gets it wrong. It's a bad prediction. The reason I was thinking of is that all code has a cost, and if you add code that turns out not to have been needed, then you're incurring a cost for zero benefit.

Steven R. Baker:

You're incurring maintenance cost. And if you're not working in Ruby, if you're working in Swift or, even worse, Scala, you're incurring compilation-time cost. Swift is actually getting a lot better at this, but the Scala compiler is ridiculously slow. When I worked at Neo4j, we had one line of Scala that took 17 minutes to compile on modern hardware. It was in the Cypher query language code, and I don't remember what it was doing, but build times all increased by 17 minutes, and we narrowed it down to this one line of code, and it turned out that we didn't even need that one line of code. Somebody had put it in speculatively.

Steven R. Baker:

So that is a very real example of a real cost. Now, when I worked at Neo4j, I was the only person in the entire company who ever ran the entire test suite on my local machine. I actually find that on a lot of teams, most people don't run their entire test suite. They let CI do it (and, by the way, that's not CI, but we call it CI). They let CI run the entire suite, and they only run the tests that they're looking at right now. I don't do that; I run the entire suite. And if running the entire suite takes too long, that's pain. It's important to feel that pain so that you're incentivized to fix it. And if you're solving the pain of a long test suite so that you can continually rerun your entire suite over and over again locally instead of waiting for CI, guess what? CI is going to be really fucking fast too.

Jason Swett:

That reminds me of another heretical minority opinion that is correct. No offense to everybody this applies to, which is basically everyone, but I think the way that people structure their test suites in Rails is idiotic and insane.

Steven R. Baker:

I would agree with that. Rails resists testability in a lot of ways.

Jason Swett:

Hmm, tell me more.

Steven R. Baker:

So the best example of this is: I want a unit test for my controller.

Steven R. Baker:

I can't do PostsController.new and pass in everything that I need so that I can instantiate one PostsController and then poke at it with my tests to write unit tests. I actually have to interact with it through the whole routing system. Every functional test should go through the router, and you should have functional tests that do, but my unit tests shouldn't have to, and that's why you need all these extra helpers in Rails. Now, Rails is so much more productive than anything else; Rails is great and I love it, and so I still use it despite these annoyances. But it would be really nice if I could unit test my controllers, because I would really like to.
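To make the wish concrete: a purely hypothetical sketch of the kind of controller unit test he's describing. This is not an API Rails supports; controller and request specs go through the routing stack instead:

```ruby
# Hypothetical, imaginary API: inject collaborators, call the action
# directly, assert on a plain response object.
class PostsControllerTest < Minitest::Test
  def test_index_renders_the_posts
    controller = PostsController.new(posts: FakePostsRepo.new)
    response = controller.index

    assert_equal 200, response.status
  end
end
```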

Jason Swett:

Yeah, yeah. That's one example. The aspect that I like less is... okay, so test suites, and I guess application code too, are like two-dimensional, in the sense that with some two-dimensional data sets you can represent them or retrieve them on either dimension. But with a file system, you have to pick one of the dimensions to be primary and the other dimension to be secondary.

Jason Swett:

And in RSpec test suites, the only strategy I've ever seen is: they pick the test type to be the primary dimension and the domain concept to be secondary, and I think that makes no sense. Because when are you ever going to... and I've talked about this a lot of times, but say you're working in the billing area of your application, and you're wondering: does this piece of behavior...

Jason Swett:

...have a test, and if so, where is it? You can do a grep search, obviously, but that only gets you so far. Now you're going to have to dig, one by one, through spec/system/billing, spec/models/billing, whatever.

Jason Swett:

Wouldn't it be a lot better if you could go to spec/billing? In this ideal world that doesn't exist, your Rails app is neatly organized into namespaces, so it's a nice tree, and the test suite mirrors that tree, so you can find the test. And the other thing that's really nice about that is how you can run the tests.

Jason Swett:

You can run the single test for the local area that you worked in. If that passes, you can go up a folder and run all the tests in that general area, and again go up a folder until you get to the root and run the entire thing. The way it is now, you can run your individual test, and then from there you just have to run your whole test suite, because it's not organized in a way that lets you do that.
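A sketch of the domain-first layout Jason is describing. This is a hypothetical structure, not a convention RSpec or Rails ships with:

```
spec/
  billing/
    invoice_spec.rb
    payment_flow_spec.rb
  shipping/
    rate_calculator_spec.rb

# Run narrow, then widen:
#   rspec spec/billing/invoice_spec.rb   # the local area
#   rspec spec/billing                   # one folder up
#   rspec spec                           # the whole suite
```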

Steven R. Baker:

Yeah. So there is one thing, and I'm not sure if I like it or dislike it. I like to tell myself that I wouldn't have designed it this way, but I'm not sure what I would have done differently. There's a one-to-one mapping in Rails between your model and the test for it, and in RubyMine you can hit Shift-Cmd-T or Shift-Ctrl-T, and that will take you between the test case and the class that's under test. That works really well for models, pretty well for controllers, and a little bit less well as you get further up the stack, because then you're coming from the perspective of the user. I think the way that Rails does this is because the tooling in Ruby is fucking garbage. So this is not about testing or mocks, but let me go on a slight tangent about dynamic versus static languages. I fucking hate static languages. They don't give me anything, and the type supremacists annoy the shit out of me more than anyone else.

Jason Swett:

Nice.

Steven R. Baker:

I wrote an article on my blog called Static Typing Means Some Tests Are Unnecessary, and, spoiler alert, basically the only test that static typing prevents you from writing, as I pointed out in the article, is assert foo.class == Foo, and if you wrote that test, I would fire you anyway. That is the only test the type system prevents you from writing. You don't actually get to write fewer tests. If you are actually practicing TDD in a static language versus a dynamic language, you will not wind up with fewer tests in the static language. That's a fucking myth, it's laziness, and it's an excuse, because programmers are always looking for excuses to not write the tests, and that pisses me off.

Jason Swett:

Well, there's all kinds of straw-manning around testing. And there are so many things under this category, but, like, a lot of people say they don't like testing, and it's like: no, no, it's not that you don't like testing, you don't like stupid shitty testing. But stupid shitty testing is like 90% of the testing that you see, so no wonder people come to that conclusion.

Steven R. Baker:

So like no wonder people come to that conclusion right and I, I think a lot of it is arrogance, like Like, um, these people just, you know, they just think that they're that, that they're too smart to need a test. And I'm, I don't have that level of arrogance, I don't think that, um, you know.

Steven R. Baker:

I'm not a super-intelligent programming wizard. I'm a very mediocre programmer, and I focus on having exceptionally strict habits to make sure that I'm always doing a consistent job. In an MBA course, one of the first things they will teach you is that the number one metric for quality is consistency. One of the examples that I heard in an MBA course I took one time was McDonald's. A McDonald's hamburger, a Big Mac, is very high quality. Now, there are a lot of reasons why you might consider it low quality: it's not good for you, it's got too much salt, whatever. But I can go into any McDonald's anywhere in the world, and the Big Mac that I get is going to be exactly the fucking same.

Jason Swett:

Right, it's high quality in the sense that it meets its specifications.

Steven R. Baker:

Exactly, and it's repeatable, and repeatability is a great metric for quality, right? Because that gives you consistency, and repeatability gives you predictability, and predictability is really key. So I want to come back to dynamic versus static for a minute, because there is a point I want to make here about Ruby tooling sucking ass. One of the reasons why the type supremacists are taking hold right now is because of the way that you develop software in a static language: you write some code, you compile it, and if it compiles, then you run it. There's the edit step and the compile step and the run step, and those are very distinct things.

Steven R. Baker:

In a proper dynamic language like Smalltalk, you are living in a live environment. When you create a class and you create a method in Smalltalk and you save, it also gets compiled into Smalltalk bytecode, but it's there, it's in the live system. That system is running, and if you're writing a web application in Smalltalk, which I've had the pleasure of doing a few times many years ago, you literally are working in a live image, and you can see the requests coming in and the request objects being collected and so on. It's a live system. And that solves all the problems that the type supremacists say we have in dynamic languages.

Steven R. Baker:

Type supremacists will say to you: oh well, how do I know if the object foo is of this type? And I say: well, why do you care? You just want to send messages to it, so why do you care what type it is? Okay, then they'll say: how do I know that this thing has this method available? And the answer is: ask it. So in my Rails apps, or in my Ruby software, I don't do: if this thing that you passed in was an array, do this, and if this thing that you passed in was a string, do this instead. If I have a method that operates on both strings and arrays, I don't do that check. I ask: which part of the array API am I using, which subset of the string API am I using, and which question can I ask of this object to know which of the two APIs I need to use?
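A minimal sketch of the "ask it" approach: branch on what the object can do, not on what it is. The method and its behavior are invented for illustration:

```ruby
def normalize(value)
  # Don't ask value.is_a?(Array) or value.is_a?(String); ask which
  # API the object actually supports.
  if value.respond_to?(:join)
    value.join(", ")   # array-ish things
  else
    value.to_s.strip   # string-ish things
  end
end

normalize(%w[a b c])  # => "a, b, c"
normalize(" hello ")  # => "hello"
```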

Jason Swett:

Yeah, it's not 'what are you?', it's 'what can you do?'

Steven R. Baker:

Exactly, yes. I love that. And so here's the problem with modern dynamic software, Ruby on Rails being a perfect example of this, and then I'll tell you about my hopes and dreams for how I wish we could develop Rails software. I've explained the static programming thing: edit, compile, run are three distinct steps. But we develop in dynamic languages like Ruby when they're dead. We do the work on them when they're dead, and then we save the files and bring them back to life again by running them, and so you lose a lot of the introspection. How do I know what type foo is? Ask it. How do I know what it can do? Ask it. Okay, but I can't ask it in my IDE. That's right: your IDE is stupid. Even RubyMine, which is the best example of modern tooling for Ruby, is still static-language tooling for a dynamic language, and it's very difficult to do that, because they're applying static-language stuff. That's why we're starting to get bullshit like the fucking type hints, RBS, and whatever that other stupid shit from Stripe or Shopify is called, where they want type annotations in Ruby and stuff. I don't want that, and anybody who wants that should not be writing in a dynamic language.

Steven R. Baker:

Dynamic languages are lovely. So what do I do when I'm writing code? I will often run my Rails server, spin up a Rails console, write a test in the Rails console, make the test pass in the Rails console, and then copy and paste that code into a file and save it. I wish the development environment for Rails was a live environment, like in Smalltalk: start the server, write a test, and I want a live GUI. I want that test running against the live server. I want all of that code running. I want to be at runtime. In a proper dynamic system (Smalltalk, Self, and there are a lot of other examples: Emacs, a Lisp machine) you're always at runtime. And developing in a dynamic language is best done when you're at runtime, so you want to be at runtime as much as possible.
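Roughly, the console-first workflow described here might look like this; a sketch, with the Order model and its attributes invented for illustration:

```ruby
# In a live Rails console (bin/rails console), with an invented Order model:

# 1. Write a tiny test right in the live session...
order = Order.new(subtotal: 100, tax_rate: 0.05)
order.total == 105.0 # NoMethodError: #total doesn't exist yet

# 2. ...define the method against the running system...
class Order
  def total
    subtotal * (1 + tax_rate)
  end
end

# 3. ...and re-run the check against the same live object.
order.total == 105.0 #=> true

# 4. Only then copy the passing code into app/models/order.rb and save.
```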

Steven R. Baker:

One of the things that I think something like RubyMine should do, instead of parsing Ruby code to figure out what the fuck it does to help you move around, is execute that code. And people say: oh well, there are file descriptors and databases and whatever. Okay, guess what? You know how you can get around that? RubyMine should run all of your code to figure out what it does, because then it can run it and ask it what it does. Right, but you don't want it touching the file system, you don't want it making changes. Okay, guess what? All of that stuff that can make external changes, that can make network requests or file requests or whatever: it all lives in the IO namespace in Ruby. So make a fake IO namespace, and now run all of your code, and do as much of your development as you can at runtime, because there's nothing more valuable than running working software. And in a proper dynamic environment, your software is always running; you are always at runtime. And so that's the problem.
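A deliberately crude sketch of that idea: swap an inert stand-in for one side-effecting constant before executing the code under inspection. FakeIO is invented here, and real tooling would need to cover the whole IO surface, but the mechanism is plain Ruby:

```ruby
# An inert stand-in that swallows any message instead of touching the world.
class FakeIO
  def self.method_missing(*_args)
    nil
  end

  def self.respond_to_missing?(*_args)
    true
  end
end

real_file = File
Object.send(:remove_const, :File)
Object.const_set(:File, FakeIO)
begin
  # ...execute the code being analyzed; File.write and friends now do nothing...
ensure
  Object.send(:remove_const, :File)
  Object.const_set(:File, real_file)
end
```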

Steven R. Baker:

The disconnect here is that we are applying static-language development practices to dynamic languages, because most of these people learned Java in school or whatever, and that's what they know. And then they come to Ruby and they apply the way that they write Java to Ruby. Ruby is a completely different beast. You shouldn't write your Ruby like you write your Java. You should write your Ruby more like you write your Smalltalk. Now, of course, these live environments don't exist for Ruby, but it is within the realm of possibility that they could exist, because Smalltalk exists.

Steven R. Baker:

I also had the pleasure of working on MagLev years ago. We shipped a product to production on MagLev, which was a Ruby implementation on the GemStone Smalltalk VM. GemStone, by the way, is like the best database ever. All these problems that we think we have about availability and scalability and extensibility and all that shit: it was all solved by GemStone in the eighties. It's fucking amazing technology, and banks rely on it. There are many stories of banks trying to move away from Smalltalk to Java or .NET so that they can more easily hire programmers, because Smalltalkers tend to be old and have gray hair and are retiring soon. I've heard many stories, usually at banks, where they have a team of six or eight Smalltalkers building a mission-critical banking application, literally handling billions of dollars of financial transactions at a high rate of speed, the backend systems at banks, and they want to replace it with Java so it's easier to hire people. They end up replacing that six- or eight-person team of Smalltalkers with like 300 Java programmers, and in most of the cases that I've heard of, they just abandon it and teach Smalltalk to newbies. And that is because when you have a dynamic live system, all of the introspection tools are really great.

Steven R. Baker:

When you're doing static analysis of the code, you're trying to figure out: where is this class defined, where is this method defined, what kind of parameters does it take? Well, one of the things you can do in Smalltalk, which you can do in Ruby as well if you're at runtime, is run a method and then do an inspection on it and say: okay, which messages did this method send to which other objects? Oh wow, now you have a complete list of all of the messages, all of the subsets of the APIs that you've used. And guess what? Let's say a string and an array both have a count method on them, right? Well, suppose I send the count message to foo, or size, or length, whatever.
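Ruby can approximate that inspection today, at runtime, with the standard library's TracePoint, which can record which messages a piece of code actually sends while it runs:

```ruby
# Record every message sent (Ruby- and C-implemented) while a block runs.
sent = []
trace = TracePoint.new(:call, :c_call) do |tp|
  sent << "#{tp.defined_class}##{tp.method_id}"
end

trace.enable { [1, 2, 3].map { |n| n.to_s }.join(", ") }

puts sent.uniq
# Includes Array#map, Integer#to_s and Array#join: only what actually ran.
```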

Steven R. Baker:

Well, the type supremacist will say: well, how do you know if foo is an array or a string? Well, guess what, it's one of those two, because we have a live environment. I can say: show me all of the objects in the entire object space that respond to the message length, or count, or whatever it is, and it'll say: these 12 classes respond to this. And you can say: cool, I don't need a type annotation, because Smalltalk knows what's not possible. And Pharo will do this for you too; I'm not sure if it's by default or not.

Steven R. Baker:

If you write a method and you have an object, and you send two messages to the same object, a message that's on array but not string and a message that's on string but not array, the Smalltalk environment can tell you: you just sent these two different messages to the same object, and that's not possible, because there is no object in the system that knows how to respond to both of those messages. Are you sure you want to save this code and run it? Oh wow. That's the kind of stuff we could do in Ruby if we were developing in a live environment.
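And the "show me everything that responds to this message" query is already expressible in Ruby at runtime:

```ruby
# Every class currently in the object space that responds to #length.
responders = ObjectSpace.each_object(Class).select do |klass|
  klass.method_defined?(:length)
end

puts responders.map(&:name).compact.sort
# e.g. Array, Hash, MatchData, String, Symbol, ...
```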

Steven R. Baker:

Yeah. So there's a whole bunch of stuff there. To get the best out of a dynamic language, you have to be at runtime. Now, it's hard to be at runtime, because the tooling just isn't there. We don't have a live environment.

Steven R. Baker:

IRB and Pry and stuff are really great. Those are great live environments, but it's hard for me to say: hey, that class that I just wrote, write it to this file on the file system, persist it. That code is in memory when you write it in IRB, but it's gone when you quit IRB, right? So we need a way for IRB to be able to say: oh, you just created a controller, I'm going to save that class under this file name. And Rails has the conventions down pat, so we could actually do this. It's actually not super difficult to write this stuff, but Smalltalkers are the ones who know how to do it, and there aren't that many Smalltalkers writing Ruby, because if you can get paid to write Smalltalk, you never fucking write anything else. Those kinds of practices applied to Ruby would make all of the type supremacist arguments disappear.
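A hypothetical sketch of that missing piece: take source just written in a console session and persist it where Rails conventions say it belongs. The persist_to_rails! helper is invented for illustration:

```ruby
require "fileutils"

# Hypothetical helper: save console-written source to its conventional path.
def persist_to_rails!(source, class_name)
  dir  = class_name.end_with?("Controller") ? "app/controllers" : "app/models"
  # "WidgetsController" => "widgets_controller.rb" (a poor man's underscore)
  file = class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase + ".rb"
  path = File.join(dir, file)
  FileUtils.mkdir_p(dir)
  File.write(path, source)
  path
end

persist_to_rails!(<<~RUBY, "WidgetsController")
  class WidgetsController < ApplicationController
    def index
      render json: Widget.all
    end
  end
RUBY
#=> "app/controllers/widgets_controller.rb"
```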

Jason Swett:

Interesting.

Steven R. Baker:

And the last point on that, to bring it back around to testing: by doing TDD and constantly rerunning that test suite, I'm not at runtime all the time, but I'm at runtime very often. Every time you run your test suite, you are at runtime, which is what you want for dynamic stuff, right? This is why I don't often use a debugger. I very often just puts and run the test.

Jason Swett:

Right, especially because, you know, you have your tests.

Steven R. Baker:

Because I'm at runtime, I can ask this object: what can you do? What do you have for me? Just run the test.

Jason Swett:

At least here's what I think, and maybe this is not right, but to me a debugger is a tool to aid in mystery-solving. And when you have good test coverage, you just don't tend to have a lot of mysteries. And when you do have mysteries, they tend to be very small mysteries that aren't very hard to research.

Steven R. Baker:

Yeah, absolutely. And I'm shitting on debuggers not because they're not useful. They are very useful when you need one; there's no tool like it. But also, in Smalltalk, you might actually do all of your development in the debugger.

Steven R. Baker:

Some people who write Smalltalk do all of their development in the debugger. They'll write out some code and say: okay, I want to instantiate this class, so I'm going to send the new message to it. Then you save, and before it compiles the bytecode it goes: hey, there's actually no class called that. Do you want to save it, or do you want to proceed? Or you run the code and it goes: I couldn't find this. And the debugger pops up. Well, guess what? You can write code right there in the debugger and hit proceed.

Jason Swett:

Oh, interesting.

Steven R. Baker:

So if we could get to dynamic land in Ruby, if we could get the dynamic tooling... and I'm not going to write it. I'm in the twilight years of my fucking career. I've been doing this for almost 30 years, and I'm definitely not going to continue doing it for much longer, so I'm not going to write these tools, but I do wish we had them. If I was going to spend time on this stuff, I would just write Smalltalk, where we already have these tools.

Jason Swett:

I want to ask you one more question, because we probably have time for about one more question.

Steven R. Baker:

Okay, but after that I want to come back around to mock objects. I have one more thing I want to talk about.

Jason Swett:

Oh, okay. Yeah, my question to you is: do you think test-driven development yields software with fewer bugs than software written without tests? And if so, why?

Steven R. Baker:

I would say unequivocally yes, for some definition of fewer bugs. First of all, if you're practicing TDD, doing small iterative steps, there are fewer places for bugs to creep in, and when they do creep in, your test will fail and you immediately know that the bug is there. You will still sometimes wind up with bugs from things you just didn't think about, and that just gives you an opportunity to write a new failing test that exposes the bug. I would say that you absolutely wind up with fewer bugs in code where you've done TDD, mostly, like I said, because of the small incremental steps.
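In RSpec terms, that rhythm might look like the following sketch; Stats is an invented example:

```ruby
# spec/stats_spec.rb -- Stats is invented for illustration.
RSpec.describe "Stats.average" do
  it "averages a list of prices" do
    expect(Stats.average([2, 4, 6])).to eq(4)
  end

  # The bug nobody thought about: an empty list divides by zero.
  # Step one is a new failing example that exposes it.
  it "returns zero for an empty list" do
    expect(Stats.average([])).to eq(0)
  end
end

# The smallest change that makes both examples pass:
module Stats
  def self.average(prices)
    return 0 if prices.empty?
    prices.sum.fdiv(prices.size)
  end
end
```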

Jason Swett:

So there's another reason, which I suspect you're well aware of but maybe just haven't said yet. I was thinking about this earlier today and trying to think of all the reasons. With TDD, first of all, you're less likely to introduce a bug in the first place, and then if you introduce a regression, your tests are likely to catch it. So that's two reasons.

Jason Swett:

Yep. But then a third reason is that if you're periodically refactoring, and like you, I like to refactor mercilessly, because why wouldn't I, you can keep your codebase a lot tidier and easier to understand than you could without any tests. And just like a messy, dirty kitchen provides a hospitable environment for pests to thrive in, a messy, confusing codebase provides a hospitable environment for bugs to thrive in. A clean, tidy codebase does the opposite. So it's not just about the individual tests; the general ecosystem is less hospitable to bugs.

Jason Swett:

I think that's one that maybe people think about a lot less.

Steven R. Baker:

Yeah, absolutely. I also think of the tests as documentation. The best example of this: there's a library for testing SwiftUI stuff, and I can't remember what it's called, ViewComponent or something like that. I can get you the link to put in the show notes if you have those; I'll ask Jon Reid for the name of it. We used to use it a lot when we wanted to figure out how to write an expectation for a piece of SwiftUI code.

Steven R. Baker:

I thought: oh, I wonder if this thing supports lists. And so I just went into the test directory of this library, and there it was: ListTest.swift. I opened it up, and it was exactly the right amount of tests. Brian Marick calls this example-driven development, and in RSpec we call them examples. Each test is an example of how you might do this thing in code, and so the tests are documentation as well. If you want to know how to use this class, look at the tests. It'll be obvious.
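A Ruby analogue would be a spec file that reads as a usage recipe; ShoppingCart is invented for illustration:

```ruby
# spec/shopping_cart_spec.rb -- each example doubles as documentation.
RSpec.describe "ShoppingCart" do
  it "starts out empty" do
    expect(ShoppingCart.new).to be_empty
  end

  it "totals the prices of the items added to it" do
    cart = ShoppingCart.new
    cart.add("apple", price: 2)
    cart.add("bread", price: 3)
    expect(cart.total).to eq(5)
  end
end

# A minimal implementation the examples describe:
class ShoppingCart
  def initialize
    @items = {}
  end

  def empty?
    @items.empty?
  end

  def add(name, price:)
    @items[name] = price
  end

  def total
    @items.values.sum
  end
end
```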

Steven R. Baker:

So I think tests as documentation are a key thing as well.

Jason Swett:

Yeah, definitely. And one of my consulting clients said something that stuck with me: that he thinks of the tests as being the main thing and the application code as something secondary. And then, actually much later, somebody told me of something that Bob Martin said, which was: if somebody was either going to come and delete all my application code or delete all my tests, and I had to pick one, I would let them delete the application code and keep the tests, rather than the other way around, because if I have the tests, then I can reproduce all the application code.

Jason Swett:

But if I have all the application code it's going to be a lot more expensive to recreate all the tests.

Steven R. Baker:

Right. From my perspective, you have to have all of those tests in order to be able to safely move the product forward, so I would absolutely do that. In fact, I'm pretty sure, and the people who are smarter than me can tell me I'm wrong, that if you have a test suite that covers every case, it would be pretty reasonable to write a thing that could generate the application code from the failing test suite.

Jason Swett:

That is a fascinating idea.

Steven R. Baker:

I've been able to replicate it on really small, tight problems; I've never been able to figure out how we would do it at the scale of a large application. But AI code generation is all the rage right now. I don't want AI generating my fucking tests, but you know what? I could see a point in the future where I write the test and let AI write the code that makes that test pass. I probably wouldn't, because I'm old and curmudgeonly, but I could see that being reasonable.

Jason Swett:

Well, I've already done kind of a lot of work that way. I actually tried to write a Vim plugin where I would have the OpenAI API generate the test, and then I'd run a key command and OpenAI would write the code to make the test failure go away, and I could just repeat that cycle. But I found that the technology just isn't there yet. The tests it wrote were actually okay, as long as there's somebody who knows what they're doing steering the ship. I could tell ChatGPT what test to write and it would do the grunt work for me, so that's great. But then the code that it would write would be too speculative.

Steven R. Baker:

It always would add more than necessary, because all these AIs are trained on speculative, untested code, so that's all they know how to generate. Jeff Langr has a new book, either out already or coming out, that has a chapter on doing TDD with AI, and I believe he's also teaching workshops on that. People should look up Jeff Langr and get in touch with him if you want to learn about this stuff.

Jason Swett:

Yeah, interesting. I use it for what I call intellectual grunt work, where, say, I don't know the Capybara syntax to get my hands on a certain DOM element or something like that. I'll use it for that, but I certainly won't use it to, say, write me a test for my User class, because it's going to come up with stupid tests and I'm just going to have to redo its work anyway.
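For example, the kind of Capybara incantation in question; a sketch with invented selectors, as it might appear inside a Capybara feature spec:

```ruby
# Asserting against a specific DOM element inside a scoped container.
within("#order-summary") do
  expect(page).to have_css("td.total", text: "$105.00")
end

# Or grabbing a node directly to interact with it.
row = find("tr[data-item-id='42']")
row.find("button.remove").click
```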

Steven R. Baker:

Right, yeah, and that's why I just don't bother with these things. I've dabbled with them, and they just don't really work for me. And also, I know what most code looks like, and I know that what I'm doing is different from the way most people write code, and I want AIs that are trained on people like me, not on everyone else.

Jason Swett:

Right. And there are things that could work in principle but don't work yet, and there are things that would never work even in principle. And sadly, I think a lot of programmers are using AI for things that could never work even in principle, like GitHub Copilot. No offense, because I know so many people use it, but I think it's just an idiotic way to write code, because it's completely incompatible with TDD. You could use Copilot to help you write your tests.

Jason Swett:

That might give you a little boost, not a particularly meaningful boost, I don't think, but it at least might not hurt. But to have it write your application code? It just shows me that all these people are writing code without tests, if they think Copilot is saving them effort.

Steven R. Baker:

Absolutely. One of the big tells for me is when somebody says to me: well, I use Copilot because it writes the code that I was going to write anyway. As soon as somebody says that, I run fucking screaming from that person, because if you were going to write code the way that Copilot generates it, I absolutely do not want to see that code. I think these tools are still in their infancy, and they're kind of opaque in how they're trained and stuff. I think we could get to a place where these things actually work. Like I said, Jeff Langr talks about this in his new book, and I believe Kent fucking Beck is doing some work like this now at whatever his regular day job is. So I think the tooling will get there, but the way that the LLMs and code-generation tools and AI systems we have now work, I think we need a revamping of how they actually work before they become completely useful.

Steven R. Baker:

Now, if we get to a place where I can describe a piece of software in perfect detail, the kind of detail that I would write out in my test suite, and use that documentation as a test, right? Write out your requirements in plain English. Sounds a lot like RSpec. Write out your requirements in plain English and, if you are specific enough, an AI might be able to figure out pretty easily:

Steven R. Baker:

This is the only possible code, or this is the smallest amount of code, that can make all of these tests pass. Then we get to a place where you can start doing really great AI generation, and then my job isn't writing software anymore; my job becomes writing effective descriptions of working software. So I think that's a place where we might be able to get to.

Jason Swett:

Arguably you're still writing software, just at a much higher level of abstraction.

Steven R. Baker:

Absolutely, yep.

Jason Swett:

But yeah, writing code with Copilot without having tests first is something that could never be a good idea in principle, no matter how good the technology is.

Jason Swett:

And I definitely agree with you that if you can describe what you want in the right way... because, you know, the idea of having AI write an entire program, the way people fear that AI is going to take our jobs and stuff like that: no computer can tell you what you want.

Steven R. Baker:

You know you have to tell it what you want.

Jason Swett:

Yeah, exactly, and so that idea doesn't make sense. But if you can tell it what you want, then it could write a test, run that test, see it fail, write just enough code, and then you can tell it what you want next, and so on. I actually imagine a voice-based system where you tell it what you want, because why should I have to write the test syntax? Why should I have to run the command to run the test? There are all these 'why should I's.

Jason Swett:

I could just treat the computer like a pair programmer: I verbally say what I want, and it does all the grunt work. Another funny thing about that is that I think that workflow would work for a very small percentage of programmers, because right now I think most of the programmers who are using AI are abusing it and maybe even having worse productivity than they would without AI. And then there's a small percentage of programmers, smarter than everybody else, who are actually using AI effectively. And this crazy voice-based system, not to sound arrogant, but I think it would work for you and me, but it might not work for most people.

Steven R. Baker:

Yeah, I could agree with that. I think we're still quite a ways away from that. One of the things that happens, and I won't name which one, but there's a community Rails Slack that I'm occasionally part of, I think that's where we met, actually. I stopped answering questions and helping people in there, because we would very often get somebody who pastes in a piece of code and says: this isn't working, can somebody please help me? And when somebody says please help, I jump on. I like helping people; I got a lot of help when I was coming up. And I look at the code and I'm like: this could never work. How are you using this? How are you doing this? And it takes five or ten minutes before they admit to me: oh, I actually didn't write this code.

Jason Swett:

'AI wrote this code for me.' Like, well, that's why it doesn't work! And why are you trying to ship code that you don't understand well enough? And again, that's something that's not even a good idea in principle, no matter how good the AI gets. Because even if it can write a perfect program, in what scenario, you know... Steven, even though I would trust you to write an excellent program, I wouldn't just take code you wrote, save it to my file system, and send it off to production. I wouldn't do that with anybody.

Steven R. Baker:

Absolutely. And so I don't know why, if you wouldn't accept that from another human that you can meet and trust and have a relationship with, you would accept it from an AI. Because when you trust these AI systems, you're trusting the company that developed them, and you're trusting that the training data was reasonable and that the people who implemented it did a good job, and stuff like that. So there's a whole bunch of stuff there. I'm not one of those sort of AI doomers who say AI is not going to change anything or it's not going to provide any benefits. I think that code generation tools will eventually provide some benefits, but I don't see it today, unless you adopt different ways of working that match those tools.

Steven R. Baker:

And Jeff is currently thinking about this. Like I said, I think Kent is doing it as well. There are people who are doing this stuff, but I'm definitely not looking to some LeetCode star to tell me how to use AI to generate code. That's not what I'm looking for. I'm looking for professionals that I trust to explore and discover how to use these tools better, and then to show the rest of us.

Jason Swett:

What was I going to say? Oh yeah, I do think AI code generation, as imperfect as it is, is very useful. It's just that you have to understand what the viable use cases are and aren't. For example, I needed to do some TDD in C# so that I could teach some people who are using C#, even though I haven't used it in like 20 years. So I had ChatGPT help me get a project set up and figure out the workflow and stuff like that. And I'm like: okay, I have this RSpec test; how might I do that in C#? That kind of thing works great. But that's much different from saying: I've been programming for six months, help me do my job.

Steven R. Baker:

Right. Like I said, I think there is some utility here. But I don't use any of that stuff myself.

Steven R. Baker:

For me, the interesting thing about generative AI is making dick jokes about my friends. A friend of mine is a famous folk musician from Eastern Canada, and he really hates Arby's, so I use image generation AI tools to draw pictures of him eating at Arby's and really enjoying it. That's not useful; nobody's ever going to pay me to do that. But it is the only joy that I get out of these AI tools.

Steven R. Baker:

Actually, I also tried to make a chatbot with an AI tool. I've done a bunch of exploration and work with AI recently, especially the open-source stuff like Ollama and things, and I'm very interested in local models where I can train on my own data and ask questions about the data that I've provided. But I tried to get a chatbot out of one of these things. I was trying to get a chatbot where I could ask this AI that pretends to be my friend, the musician, questions about his opinions, which are well documented all over the internet.

Steven R. Baker:

This is a different musician friend, because he's very online. His opinions and preferences are very well documented online; he's got hundreds of hours of YouTube videos, and he's been a guitar player for like 40 years or something like that. I wanted to make a chatbot that I could basically ask questions about his opinions about stuff, and with a locally trained small data set of just what he knows, I might be able to get some useful stuff out of that. But I was actually faster to just write a Ruby program to answer the common questions myself.

Jason Swett:

Interesting.

Steven R. Baker:

Than I was to train an AI and actually make it work.

Jason Swett:

Yeah, I wrote a generative AI myself. Let me briefly read you one of its works. This one is entitled 'Food': 'They are those species. These animals are each plant. They can buy the states to the rats. It cannot undertake a production. He is that order party. I must bide me. Were the management exact. Some abundance is united. He is the portion. The animals are people.'

Steven R. Baker:

That sounds like some beat poetry. Or, I knew somebody years ago who was an author of what I called housewife porn, which I'm sure is a completely fucking offensive way to talk about it, but it was like the vampire romance novels and stuff in that spirit. Porn for housewives, that is, not porn about housewives.

Steven R. Baker:

No, no, no, I would never speak poorly of the latter. Anyway, she's an absolutely terrible writer, to the point where it was incoherent sometimes, and that's what that reminds me of. I think generative AI can probably write werewolf porn pretty easily, or Twilight fan fiction and stuff. I think ChatGPT can do a really good job of that.

Jason Swett:

Oh, yeah, yeah. And what's funny is that with a lot of human-generated content I now read, I'm like: this is about ChatGPT quality. I think your existence in the economy is now not needed.

Steven R. Baker:

So I'm the kind of leftist who believes people shouldn't have to do a thing they're not very good at and don't really enjoy doing in order to survive. And if AI is going to replace the terrible writers, cool. But we do need a way to deal with that. I think Bill Gates, who is not a leftist, said we need to tax the robots if they're going to be replacing people's jobs.

Steven R. Baker:

If you're going to go into a factory and automate 60 people's jobs away and lay them off because you're replacing them with robots, maybe we should tax those robots so that we can provide income for the people you've just displaced.

Jason Swett:

Man, I've spent so much time thinking about this and reading about that and adjacent topics, and that could be a whole additional two-hour podcast episode.

Steven R. Baker:

Absolutely.

Jason Swett:

Sadly, I have four minutes before I have another meeting, and we're not even going to get back to that mocks and stubs thing you wanted to. But before we go, Steven: anywhere people should go online to find out more, or anything you want to share?

Steven R. Baker:

Sure. You can find my website, which is not updated very often, at stevenrbaker.com. I'm srbaker on Twitter and most other places on the internet. I'm always happy to hear from people if you want to talk about this or other related things. The thing that I didn't get to, which people can ask me about later, is design by wishful thinking.

Steven R. Baker:

It's a design activity where I use mock objects to build better software. I think better software; we'll see. It's an idea that I learned from Corey Haines, because I don't have any original ideas. Everything's a remix; I just parrot the good bits. So yeah, if you're interested in having more discussions about this kind of stuff, you can find me online at those places.

Jason Swett:

Awesome. Well, Steven, thanks so much for coming on the show.

Steven R. Baker:

Thank you.