UCL for Code in Research
The companion podcast for courses on programming from the Advanced Research Computing Centre of University College London, UK.
8/9 Research Software Engineering with Python (COMP233) - Design and Patterns
In this episode I talk to Jeremiah Miller - a software engineer - and Max Albert - a research software engineer in the research software group at the University of Southampton - about refactoring and design patterns.
- https://en.wikipedia.org/wiki/Design_Patterns
- https://en.wikipedia.org/wiki/Spaghetti_code
- https://www.distributed-systems.net/index.php/books/ds4/ book by A Tanenbaum
- https://refactoring.com The Refactoring book by Martin Fowler
- https://www.patternlanguage.com the original book by Christopher Alexander on design patterns in architecture - for towns, cities etc. This book inspired software engineers to define a set of design patterns on how to structure code
- https://refactoring.guru/design-patterns Max recommended Sandi Metz and her tips and recommendations on coding
- https://sandimetz.com/99bottles the idea of making things as identical as possible to sniff out design breaks or changes
- https://refactoring.guru/design-patterns another website on design patterns and refactoring
Some books:
- Design Patterns - Elements of Reusable Object-Oriented Software Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, Addison-Wesley, 1995, ISBN 0-201-63361-2
- Enterprise Integration Patterns Gregor Hohpe, Bobby Woolf, Addison-Wesley, 2004, ISBN 0-321-20068-3
This podcast is brought to you by the Advanced Research Computing Centre of the University College London, UK.
Producer and Host: Peter Schmidt
In this episode I want to introduce the concept of refactoring and design patterns - both of which can help to improve existing software. My guests helping me with that are two engineers, Jeremiah Miller from Pennsylvania in the US and Max Albert from Southampton University in the UK. You’ll hear from them a bit later.
So - where to begin? Maybe we should start with when code has no design. None whatsoever. When code looks like a heap of cooked spaghetti dropped on a plate. A mess you can’t make head or tail of.
I have seen and worked with code like this in my career. Maybe you have seen that, too. Source code that is thousands of lines long and the only person knowing what’s going on is the one who wrote it. Unfortunately, he or she left the organisation a while ago. …Leaving you with this…
[transition]
Spaghetti code, code smell, whatever metaphor you use for this - it’s not nice to work with software like that. And there comes a moment when you think: bah, why not put it in the bin and start all over again.
Unfortunately, software that cannot be reused is not uncommon in research - and in commerce, too, it has to be said. But reinventing the wheel all over again feels like a waste. After all, the time you have to spend on rewriting code from scratch is time you could use on your research. Of course, cleaning it up and making it spick and span again will take some time, too. Finding the right balance between repair and rewrite isn’t easy.
Software engineers have been grappling with this for years. And in the 1990s two concepts came up to help with cleaning up code. And, perhaps more importantly, to avoid it getting messy in the first place. The first of these is called refactoring. And the second is the application of software design patterns.
Let’s say that I have a piece of legacy code and I have decided to fix it. The first question is: what is it that I should fix, and how do I recognise problems with the code? In short: how to sniff out the sections of the code that smell.
There are a number of code smells and I can’t cover all of them. But let’s start with some that may be a bit more obvious.
One of these is the size of the code. That is the number of lines in a function or class or script.
In the past, a project I worked on used a piece of code that was literally over ten thousand lines long. It contained a never-ending list of computations and data manipulations. Variables were often single letters, like x or y. This was smelly code indeed.
Why is that smelly code? The first reason is: I am not able to make head or tail out of it. What actually happens in that code and what do all these variables mean. The only way to find out is to debug it and step through it one by one - probably several times. The second reason: it’s hard or sometimes impossible to test. I can test what the huge code produces as an output. But if something goes wrong, in which part of the code is the error?
Breaking code up into smaller chunks is often a relatively inexpensive way of improving it. It may not change the flow of the application, but at least it may make it a bit easier to understand.
And that holds for variables as well. In a small function, say a function with one or two lines of code, you may get away with using single letters as variable names. In particular, if the name of the function makes it clear what’s happening inside, like for instance a function that’s called ‘add’ or ‘multiply’.
But in larger code samples, having meaningful variable names helps you understand the code much better.
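As a small illustration of both points in Python, the language of this course: the first function below hides three steps behind single-letter names, while the refactored version gives each step its own named function. The temperature example and every function name are made up for this sketch; the point is only that the behaviour stays the same while each piece becomes readable and testable on its own.

```python
# Before: one function, single-letter names, three hidden steps.
def f(x):
    y = [(v - 32) * 5 / 9 for v in x]
    z = [v for v in y if v > -50]
    return sum(z) / len(z)


# After: same behaviour, but each step has a name and can be tested alone.
def fahrenheit_to_celsius(temperatures_f):
    return [(t - 32) * 5 / 9 for t in temperatures_f]


def drop_sensor_errors(temperatures_c, lowest_plausible=-50):
    # Hypothetical rule: readings at or below -50 C are treated as sensor errors.
    return [t for t in temperatures_c if t > lowest_plausible]


def mean(values):
    return sum(values) / len(values)


def mean_temperature_celsius(temperatures_f):
    celsius = fahrenheit_to_celsius(temperatures_f)
    plausible = drop_sensor_errors(celsius)
    return mean(plausible)


readings = [32.0, 50.0, 68.0, -999.0]
print(mean_temperature_celsius(readings))
```

If the result is ever wrong, you can now check the conversion, the filtering and the averaging separately instead of stepping through one long block.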
Another sure sign of code smell is duplicated code. This is something both Max and Jeremiah comment on in a few moments. If you find the same code copied over and over again - it’s time to redesign the code. Avoiding repetitions is a basic principle in coding. It’s called DRY - Don’t Repeat Yourself. Not at all costs - as Max will say a bit later, though.
There are other examples of code smells or spaghetti code and when you search on the internet you’ll get an impressive list of definitions and examples.
But let’s get back to the question of what you can do about smelly code and how to fix it.
In 1999, Martin Fowler, software engineer and book author, published a book that would influence a generation of engineers. It is called simply: Refactoring - Improving the Design of Existing Code.
Refactoring is an odd phrase. But what it means is, to quote Martin Fowler, “the process of changing a software system in such a way that it does not alter the external behaviour of the code yet improves its internal structure”. Martin Fowler’s book gives practical tips and ways to systematically change and improve existing code. Not surprisingly, he starts by listing some coding practices that should be improved - our smelly code again - which makes for a pretty comprehensive list of common faults in software design. And for that reason alone the book is worth reading.
Martin continues by going through some practical use cases of how refactoring is done on projects. And even though the book is more than 20 years old, it is as relevant today as it was when it came out.
One of the big questions in refactoring is: when shall I do it? There is no easy answer to that question, and the book offers different perspectives on it. Martin Fowler, for instance, recommends that you make small code improvements often, which makes refactoring pretty much part of your regular development work. Ken Schwaber offers some advice in the same book for refactoring jobs that are bigger.
Because making small and incremental improvements may not work in all cases. Sometimes, there is simply no way to refactor the code without breaking it first. And putting it all back together again may take some time. In these cases, integrating it with your daily job isn’t going to work. Meaning, you need to put some time aside - and probably some budget, too. Still, the effort for doing that may be preferable to rewriting the whole shebang from scratch again.
No matter what kind of route you take in refactoring, the important message is: software changes, and improving code is an ongoing and continuous effort.
But what can help you with that is to design and structure the code in a way that makes this job easier.
And this brings me to the other aspect of code improvements: design patterns.
[transition]
There is a great story about how software design patterns were created. But I’ll let Max tell that in a minute. Generally speaking, design patterns are approaches to solving a set of well-defined software problems. The approach usually takes the form of how a piece of software is structured.
That may sound a little airy-fairy. So let’s use an example. Let’s say that your application is using a database connection. You want to make sure that you use the same database connection throughout your code. So, you don’t want to create a new instance of the database connection each time you use it. In particular, when you access the database from different parts of the code. What you want is a single instance that gets used throughout.
And there is a name for this type of design pattern: it’s called - the singleton.
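One common way to write a singleton in Python is sketched below. The class name `Database` is hypothetical, and an in-memory SQLite database from the standard library stands in for whatever real database you would use. Overriding `__new__` is only one of several ways to get this behaviour.

```python
import sqlite3


class Database:
    """Hypothetical wrapper of which only one instance ever exists."""

    _instance = None

    def __new__(cls):
        # Create the instance (and its connection) on first use only.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.connection = sqlite3.connect(":memory:")
        return cls._instance


first = Database()
second = Database()
print(first is second)
print(first.connection is second.connection)
```

Both names refer to the same object, so every part of the code that asks for a `Database` ends up sharing one connection.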
Another example is the so-called ‘Factory pattern’. I used that in a mobile application we developed at some stage. The problem at the time was that the network API changed to a new one. And we wanted to get the code ready so that we could switch over when the new API was released and a bit more mature. In short, we wanted to use both APIs in the code - but be able to switch between them in a single place. And that is what the factory pattern helps you do. It defines an abstract class that defines the access, in this case the networking API. But the instances are created in subclasses.
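To make the idea of switching in a single place concrete, here is a minimal Python sketch. It uses a plain factory function, which is a lightweight relative of the full Gang of Four factory method rather than the exact structure from the book. All names (`NetworkApi`, `OldApi`, `NewApi`, `create_network_api`) are invented for illustration, and the two API classes just return strings instead of talking to a network.

```python
from abc import ABC, abstractmethod


class NetworkApi(ABC):
    """The interface the rest of the application relies on."""

    @abstractmethod
    def fetch(self, resource: str) -> str:
        ...


class OldApi(NetworkApi):
    def fetch(self, resource: str) -> str:
        return f"old API response for {resource}"


class NewApi(NetworkApi):
    def fetch(self, resource: str) -> str:
        return f"new API response for {resource}"


def create_network_api(use_new_api: bool = False) -> NetworkApi:
    """The only place in the code that decides which API is in use."""
    return NewApi() if use_new_api else OldApi()


api = create_network_api(use_new_api=True)
print(api.fetch("profile"))
```

The calling code only ever sees a `NetworkApi`, so moving the whole application to the new API means changing one argument in one place.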
There is a long list of design patterns, and some of them might sound familiar, like the Builder pattern or the Command pattern.
But it must be said that learning, understanding and appreciating design patterns is not easy. For one, design patterns don’t give you a solution out of the box. They’re usually not detailed recipes given in code you can copy and paste into your software. Sometimes they don’t contain any code at all, or if they do, they use pseudocode or a programming language you may not use. Which means you need to think about how to implement a particular design pattern in your code. Luckily, you are not alone - so if you are faced with this question, have a look around, for instance on Stack Overflow or other sites.
But generally speaking: what design patterns give you is an approach of how you can solve a particular engineering problem.
You can see traces of design patterns all over the place in open source software and most application programming interfaces or APIs.
The classical book that lists a number of basic patterns, like the factory and singleton patterns I just mentioned, is the so-called ‘Gang of Four’ book. Its title is: Design Patterns - Elements of Reusable Object-Oriented Software.
It’s called Gang of Four because of its four authors: Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides. It was published in 1994, a few years before Martin Fowler’s Refactoring. But like Refactoring, it is considered to be a classic.
Ok, that’s all very well, but how do you decide to use a design pattern? And which one?
For this, I am going to hand over to Jeremiah and Max in a minute. But let me just say that you start with getting a really good understanding of what problem it is you are trying to solve. And also, to appreciate that there may be different design patterns you could use. One size doesn’t fit all - as they say.
Which brings me to Jeremiah Miller, who I have been talking to about how he uses design patterns in his work. Jeremiah works for a private company in the US, as he explains in the following conversation:
[Interview with Jeremiah]
JEREMIAH My name’s Jeremiah. I’m a software engineer living in Pennsylvania. Before I got into software engineering, I was a stage and lighting technician. And then I kind of just ended up in it and did some network operations stuff. And then for about the last six years, I’ve been a software developer engineer, and in 2021 I earned my Master’s degree from the University of Denver in software design and programming. And so I’ve just been doing that for the last, like, half decade now, a little more than that. I like to talk about design and code patterns to people.
PETER Where exactly do you work at the moment, if I may ask?
JEREMIAH So yeah, I actually currently work for Pizza Hut Digital, so they have a digital technology section and I do a lot of stuff with order processing and payment.
PETER Excellent. So we’re talking about design patterns today and code structures. How would you define what a design pattern actually is?
JEREMIAH So the way I think about them, they’re like a loosely standardized set of solutions to the problems we all commonly have. You’re going to, like, run into the same problems. If you’re a woodworker, you’re going to be like: how do I make a joint? If you’re like a bookbinder, you’re like: how do I bind the book? And there’s like a couple of different ways to do those things. Software design patterns to me are like: someone else smarter than me and before me bashed their head against the wall, and these are the fruits of their labours. That’s how I think of design patterns. It is something I reach for because someone’s probably solved the problem.
PETER Are there any design patterns that you use frequently?
JEREMIAH Yes. So a lot of the, like, really not exciting ones that get used a lot. So for example, I use Builder and Factory patterns all the time, making connections to external resources like databases or just HTTP connections even. You create a builder, and because of the HTTP client - these are the things we need and these are the fields we need inside of it. And so we build those out and then connect to the HTTP endpoint or a database. In one particular instance, I’m using Singleton for a database because it’s just: I have to keep my connections down. And so, those are the three most common. And then I use a bunch of others I don’t know the names of off the top of my head. I just know that I’m using a bunch around messaging patterns, because at my job we are communicating with a lot of different services and we use message queueing. At this point, the patterns have become sort of designed into the architecture itself. And I don’t have to think about implementing them in code. They’re split into microservices, so…
PETER Let’s start with the Singleton pattern. So what is it?
JEREMIAH I use a Singleton when I need to, like, really manage how something is accessing a resource. So in my case I’m accessing a PostgreSQL instance. The way that I’ve implemented it in my source code is - this is the connection to the database. And then rather than instantiating new connections, like when I have to make a query to the database, I have one connection that I pass around. I start this one connection, and then I pass that into my classes.
If a class needs to have that connection for whatever reason like if I have a query or if you’re using an object relational model or like you have a class that accesses the database - I would just pass that connection in there so that we’re not connecting multiple times to the database. If you don’t need to connect to the database a lot you reduce your resource usage.
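As a rough sketch of what Jeremiah describes, one connection is created once and handed to every class that needs it. SQLite from the Python standard library stands in for PostgreSQL here, and the `OrderStore` and `OrderReport` classes and the `orders` table are made up for illustration.

```python
import sqlite3


class OrderStore:
    """Writes orders using a connection it is given, not one it creates."""

    def __init__(self, connection):
        self.connection = connection

    def add(self, item):
        self.connection.execute("INSERT INTO orders (item) VALUES (?)", (item,))


class OrderReport:
    """Reads orders through the same shared connection."""

    def __init__(self, connection):
        self.connection = connection

    def count(self):
        cursor = self.connection.execute("SELECT COUNT(*) FROM orders")
        return cursor.fetchone()[0]


# Start the one connection, then pass it into the classes.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (item TEXT)")

store = OrderStore(connection)
report = OrderReport(connection)
store.add("margherita")
store.add("pepperoni")
print(report.count())
```

Neither class knows how to connect to the database, which keeps the number of open connections down and also makes it easy to hand in a test database.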
PETER Yeah, exactly. It’s really there for if you want to reuse the resource multiple times from different classes and different pathways, but you don’t want to create the object over and over again. You basically pass the object around. Let’s talk about the Builder pattern. What is that? Could you describe that?
JEREMIAH So the way that I implement the Builder pattern is: I need an object; it’s going to have a set of properties; but those properties might be dynamic in terms of what we actually need there. In the case of an HTTP or a database connection - well, I would say database, I talked about databases. So in the case of a database connection, right: in a world where I had to connect to multiple databases, I would, like, builder: new database connection. We would have a database object that we predefined. And then in the Builder you would pass in, like, the URL; you would get your parameters from your environment variables and you pass all that stuff in, and then also the database object that you’re connecting to. So you’d have those three variables: credentials as a string, the URL and the database we actually want to connect to. And it passes that into the builder and the builder object is - like - hey, here’s a database object. It’s different from your other database objects, because now instead of connecting to database one, you’re getting database two. And that’s better than just… - I’ve seen it in code where you can just be manually creating those connections in, like, a different file or something. And you do that instead.
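A minimal Python sketch of a builder along these lines might look as follows. It only builds a configuration object from the three values Jeremiah mentions (URL, credentials, database name) and does not open a real connection. The class names, the environment variable names `DB_URL` and `DB_CREDENTIALS`, and the fallback values are all assumptions for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseConfig:
    url: str
    credentials: str
    database: str


class DatabaseConfigBuilder:
    """Collects settings step by step, then builds one immutable object."""

    def __init__(self):
        self._url = None
        self._credentials = None
        self._database = None

    def with_url(self, url):
        self._url = url
        return self  # returning self allows the calls to be chained

    def with_credentials(self, credentials):
        self._credentials = credentials
        return self

    def with_database(self, database):
        self._database = database
        return self

    def build(self):
        missing = [name for name in ("url", "credentials", "database")
                   if getattr(self, f"_{name}") is None]
        if missing:
            raise ValueError(f"missing settings: {', '.join(missing)}")
        return DatabaseConfig(self._url, self._credentials, self._database)


orders_db = (
    DatabaseConfigBuilder()
    .with_url(os.environ.get("DB_URL", "localhost:5432"))
    .with_credentials(os.environ.get("DB_CREDENTIALS", "user:secret"))
    .with_database("orders")
    .build()
)
print(orders_db.database)
```

Getting a second database object is then a matter of calling the same builder with a different database name, rather than hand-writing another connection somewhere else in the code.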
PETER One question that I have: sometimes code doesn’t have any design patterns, or not by design anyway. So at what stage do you decide to use a design pattern? Or is there anything that you look for in code where you say: well, hang on a minute, can’t we use this?
JEREMIAH So I think there are about three times when I definitely start looking at design patterns. So when I’m designing a new service, I crack open, like, the design pattern books that I have. And I just browse through the list of design patterns at the front. I just make sure that I’m, like, thinking about everything while I’m planning out my new code. The second time is more of a work context, where I’m designing services or I’m designing a service that’s going to interact with other services. We use a lot of message queueing. That’s in the middle of a service and then those services have to go somewhere. And so we have to think about, like, retry logic, and what pattern will we use to do the retry logic? Any time you have multiple services interacting together is when I would, like, reach for a design pattern.
And then the other big area is refactoring.
PETER Ah okay, yeah, when you have legacy code and you want to add…
JEREMIAH You have legacy code and you want to work it out. Or even, like, if it’s not legacy code: you wrote it and then you’re like - oh, this is the first draft of it. Especially like - my girlfriend is a researcher, she is a research scientist, and she has to do that. So I know research code is a little bit different, because sometimes you just have to run your experiments and might not use it again, right? If the experiments don’t go well, you might have to, like, design from the ground up.
PETER So you think! Ha ha
JEREMIAH Ha ha. So refactoring code, that’s another big area.
Your second question there was what do I look for. If I’m, like, copying and pasting a block of code from one location to another, that’s my first flag. Because if I’m at that point, it’s like: oh, this needs to maybe be refactored into a proper class - or at least something done to abstract it anyway. Because if I’m going to reuse it here, I might reuse it a third time. So that’s like sort of my first sign. And then when I start to get to the optimization, making like resources, which is another place I might reach for design patterns to see if I can make things more efficient.
PETER Well, these are three pretty good reasons, actually. The last question that I have for you: you already mentioned that you have some books about design patterns. Are there any particular ones that you would recommend?
JEREMIAH There are about four books that I reach for, that I like a lot and that I think are more generic. I find that in the programming world things change a little fast, so that a book on a language is not as valuable as a book on a pattern. My library, I try to keep stocked with mostly, like, higher-level design things. One of my favorite books - I found it in grad school - is Distributed Systems by Maarten van Steen and Andrew S. Tanenbaum. And it has more than just patterns. It just has everything when dealing with, like, distributed systems: troubleshooting, logging… I found it really invaluable. I pick it up a lot to, like, just kind of see if they have an answer. The Gang of Four design pattern book. I think that’s a pretty common, like…
PETER That’s the Bible, so to speak.
JEREMIAH Yes, that’s the bible. It’s like those that just make their way into other books that I use more frequently. So the two books I use more frequently, though, are Patterns of Enterprise Application Architecture by Martin Fowler. And then this one is actually even more, like - I’ve done a lot of integrations, I used to work for, like, health care companies - Enterprise Integration Patterns, which has a very similar name to the last book I just said…
[transition]
Don’t be put off by the title Enterprise patterns. Yes, they are used in large organisations and for large and complex software solutions. But they can also apply to smaller-scale research projects. For instance, a project I was involved with was using the so-called messaging pattern, which you find in the book Jeremiah mentioned: Enterprise Integration Patterns.
And as I just said, design patterns are not only used in the private sector for large organisations. Meet Max Albert, who has been using them in his work as a research software engineer. Here’s Max.
[Interview with Max]
MAX My name is Max Albert. I am currently a research software engineer in the research software group here in Southampton at the University of Southampton. I did my Ph.D. here in Southampton as well, in computational physics, and then I did a little bit of a stint as a data analyst and software engineer at a startup in the data science realm. Then I had my daughter and came back to academia, and it’s a great place to be just in that intermediate space between research and software engineering. It’s just fascinating.
PETER Let’s move on to design patterns. At what stage do you consider using design patterns in your code?
MAX Oh, it’s a hard question to answer because I don’t know if the listeners have a background in it. But it might be useful to point out that they actually originated in architecture - real-world architecture, not software architecture. There’s this book by an architect called Christopher Alexander, who in 1977 wrote A Pattern Language, which described basically a set of common problems - no, solutions - to common problems that you face when you have to design a building, a city or something like that. In order to work well together, there are certain ways in which things need to be interconnected and allow access to each other, and if you get it wrong, it just doesn’t work and people won’t use it. And so that’s where it started. And they came up with these 200-something patterns for common issues that you need to solve. But they don’t give you a specific concrete way of, like, build this, build that. It’s a much higher level. And then when people started porting that to software architecture, it’s basically the same thing. You have a common problem that occurs a lot of the time, but it’s a very, very high-level, abstract description of the problem. So in order to understand what a pattern really solves, it’s useful - almost, I would say, required - to see a few instances of it in practice. And so it’s hard to learn design patterns, or to get the experience, without the iterative tedium of running into a problem again and again and then realizing something isn’t quite working here, and what is it that I can change? I feel you kind of have to learn the hard way, but it’s very useful to just get an overview of what patterns are out there and then come back after a while and see: yeah, okay, now I recognize this and I could have used it in maybe this application that I wrote or this side project or whatever.
For me myself, one thing that I’ve learned over time - which took me quite a while, actually - is to listen to the emotional experience when I’m writing code. When I start having this gut feeling of, like, this is either awkward or I can see that this will cause me trouble down the line, because maybe I’m duplicating code in six different places with just very few changes, that kind of thing, I’m like: okay, let’s take a step back, how could I do this differently? And often what I personally will do is actually try to rip out a small enough part of my application and rewrite and re-prototype it a few times until I’m like: okay, now I understand what the problem is that I’m actually trying to solve and what kind of solution I could apply. And so design patterns come in in the sense that I have an idea of what patterns are out there. And once I’ve actually got a good grip on the problem, I can better tell whether a design pattern might help in that instance. That was a very roundabout way of answering your question.
PETER No, I think it was a very good way of answering the question. First of all, thanks very much for building the link between actual architecture and software architecture, and the fact that it’s actually on a very high level. Because, yes, you can learn design patterns, but the thing to remember - and that’s what I quite like about your answer - is that there’s not one design pattern for one problem. There may actually be different design patterns for a given situation, and you need to evaluate which one is the best, and you may have to change it anyway at some stage.
MAX For sure, and vice versa. Even if you are sure that there is one particular design pattern that applies to your situation and provides a good solution: patterns are intentionally such a high-level concept that they don’t give you a concrete way of how you need to implement it. So any code examples that are given are only illustrations, and you’re by no means bound to implement it the same way - you probably shouldn’t even, because the situation is different. They are useful in that once you see four or five or six different implementations or applications of a pattern, then you realize: okay, this is what it’s really about. But you will still, in any given situation, need to implement it in your own way that makes sense to you. And so that is something that I think requires a bit of experience in writing software in general, and it is what gives design patterns a somewhat higher entry point compared to other concepts in software engineering.
PETER The other thing that I quite like is, what you called gut instinct, what other people might call code smell. When the code starts to stink. And you run into problems - I had a conversation with someone else who said - Well, if I start copying and pasting the same piece of code all over the place because I reuse it, then probably that’s the moment when I need to think about changing the design. Why am I copying this?
MAX But interestingly, actually, on that note: when novice programmers learn about this concept of DRY - so Don’t Repeat Yourself - it’s something that they, including myself, tend to apply just everywhere. But there’s a saying as well: duplication is much cheaper than the wrong abstraction. So it’s useful to live with that little bit of discomfort of having something copied two or three times until you really understand what it is that you’re repeating, and then you can abstract it out. Otherwise you’re possibly locking yourself into something that really makes it harder to extend down the line.
PETER Hmm. Are there any systematic ways in which you can approach this? Sounds a little bit like black art what we’re saying, doesn’t it: where you feel you have an instinct; Oh, this doesn’t feel quite right; yes, I copied it a couple of times, but maybe that’s still the best way of doing it rather than rewriting the entire architecture. But is there a kind of systematic way in which you can say, okay, I need to go through this?
MAX Yes and no. One thing that I would say is it comes with experience for sure. I mean, you have to hit your head against the wall a bunch of times until you realize what kind of wall it actually is.
PETER Not literally, of course.
MAX Erm, no, of course. No. I mean, the more you run into certain problems, the easier it becomes to recognize them. I mean, the human brain is great at pattern recognition. So that’s the less satisfying part of the answer. Actually, some people do these code katas: little exercises that are almost repetitive, and you can do the same one over and over again just to get a feel for - oh, this time I want to implement it like this; this time I would do it like that. That’s a useful thing just to hone that skill. But in terms of the original question about whether there is a strategic approach to it, the one thing that comes to mind regarding the duplication, for example, is to take the pieces of code where you see repetition and make them as identical as possible. Only when you’ve reached a point where - okay, I cannot make anything more similar - do you ask: what is the thing that actually varies between those? And that is the bit that you then need to abstract out. That actually comes from a programmer called Sandi Metz. She wrote a couple of books and she has a bunch of really, really great online talks as well.
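The approach Max describes can be sketched in three steps in Python. The reporting functions and the sample data are invented for this illustration and do not come from Sandi Metz’s book. Step 1 shows two near-duplicates written in different styles, step 2 makes them as identical as possible without abstracting anything, and step 3 pulls out the one thing that still varies.

```python
# Step 1: two near-duplicates that happen to be written differently.
def report_mass(samples):
    total = 0
    for s in samples:
        total += s["mass"]
    return "mass: " + str(total / len(samples))


def report_length(data):
    values = [d["length"] for d in data]
    return "length: %s" % (sum(values) / len(values))


# Step 2: make them as identical as possible, still without abstracting.
def report_mass_v2(samples):
    values = [s["mass"] for s in samples]
    return "mass: " + str(sum(values) / len(values))


def report_length_v2(samples):
    values = [s["length"] for s in samples]
    return "length: " + str(sum(values) / len(values))


# Step 3: only the key still differs, so the key becomes the parameter.
def report_mean(samples, key):
    values = [s[key] for s in samples]
    return key + ": " + str(sum(values) / len(values))


samples = [{"mass": 1.0, "length": 2.0}, {"mass": 3.0, "length": 4.0}]
print(report_mean(samples, "mass"))
print(report_mean(samples, "length"))
```

Had the two step 2 versions still differed in more than the key, that would have been a hint that they are not really the same thing and that the duplication might be the cheaper option for now.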
[transition]
I hope this episode gives you a bit of an overview of what refactoring and design patterns are. At least I hope that when you look at code and think: hmmm, I don’t like the way this is done - then trust your instincts and think about how you can do it better.
This subject is a huge area and I am sorry if I can only skim the surface here. If you have a chance, take a look at the books I mentioned.