Runtime Arguments

6: Code Performance - Where does the money go?

Jim McQuillan & Wolf Episode 6

Wolf talks about making your programs better. There are lots of ways to make them better. It all starts with figuring out what matters and measuring it. Measuring it all the time. Measuring it more. This episode is about following that path.

Show notes:

Take-aways from the episode:

  • Understand what you are optimizing for (speed, memory, storage, developer time, etc.)
  • Measurement is job one, because that’s the only way to know where the money is actually going.  You should be measuring.  A lot.  More than that.  It should be part of CI/CD.  You should run it before pushing.  Everyone should be doing it.  Measurement might be even more important than testing (and don’t get me wrong, testing is very important).  When I worked on Mozilla, our build servers did timing.  If your commit slowed something down, that was considered “bustage”, and required immediate fixing.
  • Use the profiler for two things:
    • To see if the whole thing is faster or slower so you know when it’s time to look deeper
    • To dive into the actual execution and locate the bad parts you need to improve.
  • It’s all about the money.
  • Write clear, simple, and correct (you’ll know by testing) code.  Only then should you optimize.  Do I need to repeat the old adage about premature optimization? “Premature optimization is the root of all evil.”  It’s easier to speed up working code than it is to fix fast but broken code.
  • Understand the (real) data you will be operating on.
  • You don’t know just by looking at the source what actually costs you the most money.  Yes, you can see where stupid things happen, but even for those, knowing which actually matter requires measurement.
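A minimal sketch of the “measure first” take-away in Python, using only the standard library (the workload functions here are invented for illustration, not from the episode):

```python
import time

def measure(fn, *args, repeat=5):
    """Run fn several times and report the best wall-clock time.
    Taking the minimum of a few runs reduces noise from the OS."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Invented example workload: two ways to build the same list.
def with_append(n):
    out = []
    for i in range(n):
        out.append(i * i)
    return out

def with_comprehension(n):
    return [i * i for i in range(n)]

t1 = measure(with_append, 100_000)
t2 = measure(with_comprehension, 100_000)
print(f"append: {t1:.4f}s  comprehension: {t2:.4f}s")
```

A check like this can run in CI so a commit that slows things down gets flagged, which is the “bustage” idea from the Mozilla story above.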

Hosts:
Jim McQuillan can be reached at jam@RuntimeArguments.fm
Wolf can be reached at wolf@RuntimeArguments.fm

Follow us on Mastodon: @RuntimeArguments@hachyderm.io

If you have feedback for us, please send it to feedback@RuntimeArguments.fm

Check out our webpage at http://RuntimeArguments.fm


Theme music:

Dawn by nuer self, from the album Digital Sky



Jim:

Welcome back to another episode of Runtime Arguments. We are really excited about today's episode. Wolf is going to be talking about uh uh code performance and where does the money go? We'll be talking about profiling, measurement, and optimization. Uh we are we are so grateful for the listeners out there. Uh they're providing great feedback. We're enjoying doing this. We're just gonna keep on doing it as long as you guys enjoy it. Um this particular episode started uh the idea for it came when I uh was doing the research for the WebAssembly episode. That was several episodes ago. Uh we were trying to use WebAssembly and trying to make it fast and trying to compare it with other languages, and the idea came up that boy, wouldn't uh wouldn't an episode on optimization and measurement uh really be a nice episode? And Wolf was all over that. He jumped right on it and started doing some research. And he's I don't think he had to do a lot of research. He knows this stuff really, really well. I I think this one is probably pretty easy for him. So that's what we're gonna be talking about today. But before we do, uh I have just a very little bit of feedback from our previous episode. Uh that episode was on uh uh Wolf, what was that episode? The last one we did.

Wolf:

File systems.

Jim:

File systems, yeah. How could I forget? Because I did that one. Anyway, we have a little bit of feedback on that. Uh first of all, most of the research uh on that episode I can't take credit for. Uh, most of it was done by my friend Ron. Uh he and I took a long road trip up to Winnipeg uh back in May, and maybe it was early June. Um to pass the time, Ron was researching file systems because he was trying to figure out what's the best file system to use for a server he was setting up. So he was reading man pages out loud to me, and we were discussing, and we learned all kinds of stuff about file systems, and that's really what went into uh the previous episode. So if you haven't listened to it yet, go back and find it. Uh it's it was just a couple of weeks ago. Uh we talked all about file systems and stuff. So that was pretty cool. Uh, one of the things I failed to mention during that episode, we did talk about uh the importance of checksums in file systems, newer file systems like BDRFS, or that's BTRFS. Um, they have something called checksums, and that's kind of important because file systems and hard disks and and any kind of uh uh persistent storage can suffer from something called bit rot, and that is the the data can degrade, uh especially for magnetic uh material. Uh the data can actually degrade, uh uh and so checksums will help you detect that and in some cases correct for that. So I wanted to make sure that uh I got that out there.

Wolf:

So that's kind of, that actually makes me want to say something. Uh I'm not going to give a full explanation, but this is kind of a pointer for people who care. Um there is a special kind of CD, DVD, Blu-ray disc where you're not recording onto uh organic dye, you're actually etching onto minerals. Um the minerals are rated to hold your data for 10,000 years. The plastic that is the disc is only rated for a thousand years. Uh pretty much almost any uh normal uh DVD, CD, Blu-ray player can play these, um, and a large fraction of these ordinary writers can write them. Uh so the tool to write them is fairly cheap. The discs themselves are very expensive, and they come in sizes of, uh, 4.7 gigabytes, just like a DVD-ROM, and 25, 50, and 100 gigabytes. So yeah, they're not terabyte size like some things, but they are gonna last for you at least a thousand years, and all I care about is maybe them lasting 20. They are called M-DISCs. Um, and uh there are places to learn more about them. Um, it's a little hard to identify which discs are the right ones for you. Uh a clue is how much they cost, but you do need to do a little bit of research. So that's where you should start. Okay.

Jim:

That's a good point. And you mentioned a thousand years and ten thousand years. That's for the fancy kinds of material you're talking about. What are just typical DVD uh discs good for? I mean the, you know, the ones that I buy at uh Best Buy aren't gonna last that long. Well, how long, a year, ten years?

Wolf:

If they last 10 years, you are incredibly lucky. Um they ought to last at least four or five years, um, but they're not nearly as good as we used to think they were.

Jim:

Yeah, so if you really have data that you want to store reliably for long term, like a thousand years or ten thousand years, use these uh whatchacallit, M drive? M-DISCs. M-DISCs, yeah. You use those. Okay, thanks. Uh if you have uh any feedback for us on any of these episodes, uh if you just want to suggest topics, uh if you just want to say hi, uh feedback at runtimearguments.fm will uh get to us. So please send us some feedback and I'll repeat that information at the end of the show, and it's gonna be in the show notes uh because we'd love to hear from you. That's how we know people are actually listening. So let's get on to the topic, eh? Wolf, you ready?

Wolf:

First, let me say uh what I'm going to talk about. Um I have four specific points that I'm gonna mention here as the uh the principal focus of this episode, but underneath it all is the money. Um every single thing you write or run costs money somewhere, somehow. And almost certainly you want to spend as little of it as possible, and for that spend, you want to get the most out of it as possible. Um, this is true for yourself, but it's really true for your job job. If you're running something in the cloud or for a boss or millions of people are running it. Anyway, here are uh the here is the intro to what I'm gonna talk about. This whole episode is about making your programs better. And when I say better, I mean a lot of things, but better is a good summary. Um there's lots of things inside your program that individually can be improved. Um, and a thing we're gonna look at is which behaviors inside your program can or should we optimize. Um we're gonna look at why is it important, and we're gonna look at how do we actually do it. Um so that's a list of the things we're gonna talk about. Okay. The actual words I'm gonna say today are about programming, but the underlying ideas um actually apply to a lot of things. Uh almost any kind of process can be optimized. Almost any kind of process costs money. Almost any kind of process um is going to yield benefits if you examine it and figure out where it can be improved. So it might be hard to extract from the specific things I say here, but the underlying uh truth of this episode applies much more broadly than programming. Um the uh I'm I I wrote down words to specifically say about this. Uh a person is creating something that will be useful to others, and money is the limiting factor. It costs money to pay that person, it costs money to use their creation, that money might be because of time, it might be because of specific requirements like location or needed services. 
Um this stuff is fundamental to coding, but even if you're not a a coder, um, it's gonna apply to you. So, what can we optimize? Um couple top-level things. Uh, programmer time and effort. Um, that's probably a thing you care about when you're doing it on your own for your own self. Um, and the questions uh uh about that include things like what environment is it going to run in? Is it gonna be um some GUI thing on a desktop? Is it gonna be inside a phone? Is it gonna be on the command line? Is it gonna be a remote job in the cloud? Is it gonna be inside an embedded device? All of those have different criteria about how you're going to do your job, and that means how hard it's going to be for you. Um what languages are you, and maybe you're in a team, uh, so your team too, what languages are you good at? What languages are really good at solving this specific problem? Um, and there's lots of ways to evaluate that. Uh usually it's about what libraries are available in that language. Uh, but for some languages, it's about the actual features of the language. Does your intended environment force a specific language on you? If you need to hire people, what language uh choices lead to enough available people for you to hire? A lot of these choices are forced.

Jim:

So why does this all matter? Uh, who measures this and how would you track it? I I've never heard of anybody really caring that much about this. Um what I do know is that speed is important, uh, especially for big companies that do lots and lots of processing.

Wolf:

Uh so who measures it and who tracks it? Programmers usually don't. Managers almost always do. And weirdly, it's usually managers that do things like pick the language you're gonna use and the database that's gonna hold your data and do the hiring, decide which people know the right things to work on this product. Exactly the wrong people to do all of this choosing. Um an example, uh, we have a very good friend who's an engineer at Facebook. Uh he's uh maybe more than an engineer, you know, uh a leader of engineers, but as an engineer. And he recently mentioned that he's in charge of a specific library, uh, and millions of servers are using this library. And when his team does something that is one half of one percent slower, people scream. Uh because at that scale, a half a percent of a million servers is 5,000 additional servers they'll need to add just to keep up.

Jim:

That's uh that's a lot of servers. Uh it's hard to think in terms of a million servers, and really he's talking about multimillions of servers. Uh so yeah, half a percent, that's a huge deal for them.

Wolf:

Yeah, big numbers, um, I feel like I ought to be able to visualize them. Uh, but we did that episode on crypto where we were talking about really, really huge numbers, and I sort of gave up and said, well, this is something I'd have to write down. Um but uh back to where I was, which is other things to optimize for. Um and one of the big ones is time to solution. In other words, the speed of the program. This is the thing you're usually thinking about. Um, and this one really is a choice. It isn't forced in the same way that the language and architecture might be, uh though there may be requirements. For instance, if you're writing the OS for a Formula One car, um that OS has to be real time. Um or you might be solving some problem in a spaceship, and for that problem you need an answer in under a minute.

Jim:

This right here, this sounds like the stuff people really want to know about, so uh keep going.

Wolf:

Yeah, this is the big one. This is the one we're gonna look deeper into. Um I just want to mention two other less likely things, at least these days. Um, memory and storage. Uh that's one thing, even though I use two words. Um it might be very unimportant for you doing a personal program on a personal machine that happens to have a lot of resources. Um I have a laptop of my own, um, and it has so much more memory and so much more uh SSD space than anything I had when I was young. Um I can't even believe it. It's astounding to me. Um and finally, uh a thing to optimize, believe it or not, is output. Um, some kinds of output are already what you want. For instance, uh your program is an image converter, fine. If the program is going to be a command line tool, what does it print? If it uh logs things, what's in the logs? Are the logs super big? Are they helpful to the user? Are they helpful to you, the programmer? Are they ephemeral? Do they only live in the console? Or do you make files? Do you manage those files? Do you delete old ones? Um do you rename them? Do you compress them? Um, output is a thing that can be optimized. Um, it can influence the speed of your program. And uh one important characteristic of output, um, especially logs, is when you hand your program to the end user, the person who's gonna run it, um the logs are really almost all the telemetry you will be able to have. They're gonna have a log file and they're gonna send it back to you. They're gonna pick out what problem it was. Is the log file concise enough to find the error? That's a problem for them and you. So that's a thing that can be optimized.

Jim:

So, like Wolf, I I've got a computer that's just incredibly fast. Uh fast processor, lots of memory, lots of disk. Um for most people, I think they're in that boat. They just don't really care uh that that their program is slow. Um I at this point with my hardware, I can't imagine writing a program for the types of things that I do that would even get near the limits. Um but why don't you t tell us more about uh about about this?

Wolf:

Well, on your own personal machine, writing your own personal code, solving your own personal problem, I think you're absolutely right. Probably none of this matters. Especially programs that only you are going to run, that you're gonna run uh at the speed of a human being, not at the speed of how much data you have. Um the thing you are optimizing for when you write a program like this is your own enjoyment. Um there's almost no way on your own personal machine for a program you write to spend enough money for you to care. Um your code is essentially free because you've already paid for the computer. You're not gonna hire anyone. Uh you're gonna use your favorite language, so that's gonna be easy. There's no barriers except your own personal time uh that you have available for this project. If this were for your work work, or running in the cloud, or on an Arduino, or in a phone, or on a Raspberry Pi, or on a supercomputer where you had to timeshare, or calculating blockchain blocks fast enough to beat all the other miners, or writing a finance app that had to be absolutely reliable, or a stock trading app that needed to make the fastest possible trades, then you're gonna optimize for things other than enjoyment. Um I've just listed uh a lot of the possibilities a minute ago. But which of these are gonna be important? Um and that is very situation dependent. Um and everything is a trade-off. Uh I'm gonna list a couple of the trade-offs because it's not just that you optimize for speed. Um it is you uh pick speed versus memory and storage. That's a trade-off you have to make. If you make it faster, almost certainly you're gonna use more memory and storage. Um that's kind of a rule. Um programmer time versus almost anything else. Um how many users are gonna run this program? How many times are they gonna run it? How often are they gonna run it? Um output, I already talked about, seeing enough to debug versus not overwhelming the user.
Um and what resource are you actually spending money on? Sometimes things are free, like when you own the laptop you're running this program on, uh, time might be free. Hardware is free because it's already paid for, it's a sunk cost. Um usually you're optimizing for speed and you're just accepting or ignoring everything else. Um that's the thing people are always talking about. That's the first thing people think of when you say the word uh profiling or optimization to them. Um that's what they want to make better. Um but whichever of these things is most important to you, um, that's gonna be where the money is going. Continuously measuring that particular thing is job one. Making it faster, or making it take less uh bandwidth or memory, that's not job one, that's job two. Measuring it is job one. Um this, uh, profiling, figuring these things out, it's gotta be part of your regular process. Um it should be in your CI/CD. You should have everything set up to profile on your own personal development machine. You should do that um every time you push. If you're gonna push something brand new, it should not slow the program down. Um that's a thing. Uh so let's focus on speed, because that's the main thing. It's almost always the most important. And I'm going to tell you a thing that old programmers say to young programmers all the time and have for many, many decades. Um because you have to be an old programmer before you learn this particular thing is true. And that is this. You're smart, and you're good at the programming language you're using, and you know your algorithms, so you are almost certain that you can tell by looking at the source code for your program where it spends its time. Don't feel bad, everybody thinks that. You can't. This is the thing programmers most commonly want to optimize, and you absolutely can't know what to do without measuring. I know you think you can. I know you do, but really, you are wrong.
If you could easily see where your program is slow, wouldn't it be fast already? Um, sure, you can see the stuff that's obviously stupid. Um often though, the slow things you are looking at don't actually have a gigantic uh impact on the big picture. Sometimes slow, simple, and correct is better than fancy, fast, buggy, and hard to understand. Um you're starting up your application, stuff that happens at startup happens one time. If you have a startup that takes 20 minutes, yeah, there's some stuff in there you're gonna need to fix. Um but if you do something in your startup that takes five seconds and could have taken half a second, is that something you need to optimize? It might be incredibly stupid. Just looking at it could make you sick. Is that something you need to optimize? It almost certainly isn't. Um, you're gonna use tools uh to do this measuring. Um I'm not gonna mention too much about particular OSs uh throughout the rest of this, because it's basically the same job everywhere you go, and there is a basic hierarchy of answers. But um, if you're a Windows programmer, Visual Studio, um the big one, not VS Code, uh, has the tools in it to measure the stuff you want to know. That's where you're gonna start. If you're a Linux user, there's a program called perf. Perf is king. Everybody knows about perf. Perf is actually great. It might be one of the best solutions. Unfortunately, it's Linux only. If you're using a Mac and it's uh recent and you're running macOS on it, um you're gonna want to use uh a GUI application called Xcode Instruments. Um it's good, it's a little complicated, and whatever compiler you're using needs to provide some help for it. It's absolutely easy to use it on Swift or SwiftUI, or I assume Objective-C. Um I've been doing some Rust programming, and on the Mac there's a package, a crate, that you can use that will make your Rust programs easy to examine in Xcode Instruments.
Um anyway, you're gonna use some tool like one of those. More important than the tool is at least two kinds of information that the tool can give you. The most important of those two things is cumulative time in functions. Um it'll usually come back to you as some kind of table, and it will be sorted either by time or maybe by function name. It might be hierarchical, it might show you the call stack, it might be sortable. Um this all depends on the program you use to collect it. It might be as simple as a CSV file. Um, it might be as complicated as a web page that can do all kinds of things. Uh I'll give you an example. There's a tool called FlameGraph. Um, seeing the cumulative time is important. Um because, for instance, you can see, oh, my program spends 50% of the time in such and such a function, and then there will also be a number that says how much of that time belongs actually to that function that appears at the top of the hierarchy, and how much of it belongs to the things you're calling.
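As a concrete sketch of the cumulative-time table Wolf describes, here is what it looks like with Python's standard-library cProfile (the workload functions are invented for illustration):

```python
import cProfile
import pstats
import io

def slow_part():
    # Deliberately wasteful: repeated string concatenation.
    s = ""
    for i in range(2_000):
        s += str(i)
    return s

def fast_part():
    return "".join(str(i) for i in range(2_000))

def main():
    for _ in range(50):
        slow_part()
        fast_part()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time: each row shows a function, how many times
# it was called, and how much time it and its callees consumed.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Reading that table top-down, largest cumulative time first, is exactly the macro-level starting point Wolf recommends.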

Jim:

Um you're talking about these tools, perf, uh, uh Visual Studio, Xcode, uh, Xcode instruments. Um it seems to me, I I've not used perf, um but it seems if I'm thinking about this right, uh it's it's a great tool if you're writing like in a C language or Rust or something that's compiled down to machine code. Um you and I both do a lot of programming, not at that level. I I do a lot of Perl, you do a lot of Python. Do these tools help you there?

Wolf:

It depends on what kind of thing is happening at the place um your program needs help. Um a way to start, just like when you start debugging, um, you start with prints, printfs, and you advance to logs. Um and then later maybe you use a graphical debugger and there's breakpoints and uh all kinds of things like that. Um a way to start is, and I didn't talk about the second thing that these profilers give you, but here's an easy thing for you to get um that is really gonna help you. A thing that almost certainly is going to surprise you is how many times certain functions are called. Uh, and it's easy to uh stick a little bit of debug code inside that function at the beginning and maybe at the end uh to just count. Um sometimes you have functions you expect to get called a hundred, a thousand, however many times, and it only gets called once, and that's surprising to you. Sometimes you have a function that you expect to be called three times. Maybe you're concatenating strings. I'm not even gonna get into just how bad concatenating strings can be, but it can be really bad if you're not doing it the smart way for your language. It's different in different languages. But you think that this function you're using that helps you concatenate strings is only gonna be called three times, and uh when you measure, you see it's called 10,000 times. That's a big deal. That means if that function takes more time than you thought it should, the result, the impact of that uh is magnified.
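The call-counting trick Wolf describes can be sketched as a small Python decorator (all names here are invented examples):

```python
import functools

def counted(fn):
    """Wrap fn so every call is tallied in wrapper.calls."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@counted
def join_pieces(pieces):
    return "-".join(pieces)

# Hypothetical surprise: you expected three calls, but a loop you
# forgot about calls it once per record.
for record in range(10_000):
    join_pieces(["a", "b", "c"])

print(join_pieces.calls)  # → 10000
```

A real profiler gives you the same per-function call counts without touching the code, but the decorator is the "stick a little bit of debug code inside that function" version.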

Jim:

Okay, so again, you and I work at higher-level languages. I do a lot in Perl. Not as much as I used to, but I have written an awful lot of Perl. And Perl's got a profiler called NYTProf. It's fantastic. It gives me the counts of how many times uh functions are run, uh how long it takes to run each iteration, uh cumulative, how much time that function took, you know, might have run 10,000 times. It produces a flame graph so that I can see where the hotspots are, you know, where I could spend my time. I'm guessing there's something like that for Python.

Wolf:

There is. There's a module called cProfile, and there's also a pure Python one that's just called profile. Um, and they do all the things that you just said, and they are super helpful.

Jim:

So you don't have to get down to the level of using, uh, perf or Xcode Instruments to do this kind of stuff. Usually your language, at least a dynamic language uh like what you and I are using, uh, will provide that. You think Java offers something like that?

Wolf:

I gotta imagine Java's got one. Java uh has been, at least um since the late 90s and early 2000s, one of the most used languages. Um that sheer popularity means there must be a solution there.

Jim:

I don't happen to be a Java programmer, so yeah, it just seems like uh you know perf might be the best for Linux, uh, and Visual Studio might be might be the best for Windows, but I I think something at the level of the language you're working with is something worth looking at.

Wolf:

Uh, I think that's a great observation, and yeah, absolutely true. Um, okay. So a problem with uh Runtime Arguments is it's an audio podcast. So I can't show you uh a profiler, any of the profilers I've mentioned, actually doing its thing. Uh and profiling is very specific to the platform, language, kind of application, and so on. So even if you could see, I don't think I could do a demonstration that was meaningful to everyone. Um and that makes me sad. Uh because I want to show you how to do this. Um I'm gonna talk about it.

Jim:

When you play with these tools, it's really, really eye-opening and and kind of uh fascinating the the information that they tell you about your your your code.

Wolf:

Absolutely. Um I will say that these tools can be very fine-grained, and that's not how you want to start. Um you want to start at the top. Where does my program spend the most time? Is it accessing the database overall? Is it running the algorithms that it's using to calculate the answer overall? Um you want to start at a very macro level. And when I say macro, I don't mean like C or Lisp. I mean like microscopes and macroscopes. I guess there isn't a thing called a macroscope, except in the science fiction novel.

Jim:

Anyway, hang on just a second. One more thing. I'm sorry to keep interrupting you. We talked about these languages. You know, there is a language I use a lot. Every day I'm using it, and a lot of times I'm spending time uh making it faster, and that is uh SQL. Um SQL's got a profiler, at least uh in Postgres it does. It's called uh EXPLAIN, and I think it's a standard SQL thing. So I imagine it's there for uh MySQL and uh SQL Server and all the rest. Uh EXPLAIN uh will show you.

Wolf:

And EXPLAIN is so important. Oh my gosh. Because you can see if it used the wrong order, if it looked at every row instead of using an index. Yeah. Huge mistakes.

Jim:

Yeah, well, that's a great measuring tool um that's available. If you can get your query into a file and you can run EXPLAIN on it, boy, if you can read the output, because it's not easy. It's fairly dense, the information it gives you. But you know, I can take queries that take many, many minutes to run and get them down into like the 10 millisecond range just by running EXPLAIN and seeing that, oh, I'm not using an index. I thought I was. Uh I'm doing this sub-query where I should be doing a join instead. Uh running EXPLAIN on that, huge, huge win.
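Jim's workflow can be sketched with SQLite's EXPLAIN QUERY PLAN, which plays the same role as Postgres's EXPLAIN (the table and query are invented, and Postgres's syntax and output format differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany("INSERT INTO orders (customer) VALUES (?)",
                 [("c%d" % i,) for i in range(1_000)])

query = "SELECT id FROM orders WHERE customer = 'c42'"

# Without an index: the plan reports a full table scan.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(before)

# Add an index and ask again: the plan now uses it.
conn.execute("CREATE INDEX idx_customer ON orders(customer)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(after)
```

Seeing "SCAN" turn into a "SEARCH ... USING INDEX" line is the kind of many-minutes-to-milliseconds fix Jim is describing.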

Wolf:

I agree. Uh EXPLAIN does give you all the information. Uh, I do want to point out that um you must understand how databases work for you to make good use of the information EXPLAIN gives you.

Jim:

Sure, sure.

Wolf:

Like you need to see, oh, this one isn't using the index I wanted uh and know why that's bad. Um so um uh a lot like uh one of the things I'm gonna talk about is um certain languages are good at certain operations. Um and when a language is good at that thing, you should do it the language way. Um the language works a certain way, knows how to do things a certain way, um you want to use it that way so it will do the best job possible for you. Um and that means you need to know a little bit about it.

Jim:

Sure, sure.

Wolf:

Okay, let me give you some rules, and I'm gonna give you these rules in order for making your code faster. Um, and it all starts with profiling, uh, so that you can actually see where the time, and when I say time, what I'm really saying is money, uh, is actually going. And start by looking at the most expensive hunk of code. You're gonna solve these problems one hunk of code at a time, um, and you're gonna go in the order of actual cost. That's what you're gonna do. And here are the operations to apply. The first thing you need to know before anything, before worrying about making the code fast, is the code right? Is it simple? Is it correct? Is it testable? It is a way easier job to speed up a simple and correct program than it is to fix bugs in a speedy, overly complicated program. That's number one. Um so far we haven't gotten to the part where we're speeding things up. Uh you need to know that these rules are more important than the actual optimization. Uh you don't start with typing code. Alright. The second thing is for some hunk of code, do you actually need to do it at all? Is this piece of code necessary? Is the job it's doing contributing to the answer? Uh because nothing is faster than not doing it at all. Um maybe if you have a piece of slow code, you could speed it up by 50%. That could be giant. That could be huge, maybe. Um but if you didn't have to do that job at all, you would be taking out of the total time of your program the entire 100% time occupied by that useless piece of code. So, that's the second thing that you're gonna use to optimize code. Um the third thing, we already talked a little bit about seeing obvious stupid stuff. Did you make in this piece of code stupid mistakes? Um, for instance, using the language wrong? Are you doing by hand something the language already knows how to do better? Are you calculating things every time through a loop?
Um, when it's the same answer for that particular thing every time and you could have hoisted the calculation out of the loop. Maybe the compiler catches things like that, maybe it doesn't. Um you should know better. You should probably write that code better. If you're in the most expensive piece of code, as we already agreed you would be, then you're at a place where maybe hoisting it matters. Um there's different ways to concatenate strings. In Python, for instance, if you take five individual strings, maybe some are actual literal strings and some are variables, um, and you use the plus sign and you add them together, um, doing it that way is gonna make the most possible allocations because it's gonna process those plus signs one at a time, and it's going to each time add two strings total together, and when it does that, it's maybe going to have to allocate more space. Certainly in the case of plain plus instead of plus equals, it's gonna make a brand new allocation for the result of the addition of those two strings. You using the plus sign is absolutely the slowest way. You know what you should be doing in Python? You probably should be making an F string that mentions variables and expressions inside of it. That all gets evaluated at one time with one single allocation, it's exactly what you want, and it's the fastest possible thing. That's a way Python is better at making strings than you might be if you just did it on your own. So know things about your language.
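Wolf's point about Python string building can be checked directly with the standard-library timeit module (the strings here are invented examples; the gap grows with the number of pieces being joined):

```python
import timeit

first, last, city = "Ada", "Lovelace", "London"

def with_plus():
    # Each + builds an intermediate string, allocation by allocation.
    return "Name: " + first + " " + last + ", city: " + city

def with_fstring():
    # The f-string evaluates everything and builds the result in one go.
    return f"Name: {first} {last}, city: {city}"

assert with_plus() == with_fstring()

n = 200_000
t_plus = timeit.timeit(with_plus, number=n)
t_fstr = timeit.timeit(with_fstring, number=n)
print(f"plus: {t_plus:.3f}s  f-string: {t_fstr:.3f}s")
```

On tiny strings like these the difference is modest; the point is that the measurement, not the source code, tells you whether it matters in your hot path.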

Jim:

Yeah, that would be like using sprintf in C or Perl or languages like that, right? Where you give it a format string with a bunch of variables after it, and it does the replacement for you and generates the string once.

Wolf:

That's exactly right. Alright. Next down on the list, and this is an important one, and it's probably the first one you thought of, listener, that's who I'm talking to: are you using the right algorithm? And when you pick the algorithm for what you're doing, this is not opening up a computer science textbook and finding the fastest sort. The right algorithm strongly depends on the actual data you are operating on, and the data you test with should be a good simulation of your actual data. Fancy algorithms might be faster, but that might only be true if your data is big enough. The corollary to this is: you are most certainly not Google. You are not Netflix. You are not Facebook. They have different data. They have different problems. They need different solutions. Write your code for the problems you need to solve right now. If you solve a problem you don't have, it's not going to solve the problem you do have. The next stage is: are you using the right compiler or interpreter options, if that's applicable in your situation? Things that will remove asserts, unroll loops, hoist expressions, eliminate dead code, use tail recursion where that's a thing that would help, specialize flows for the specific types that pass through them in more dynamic languages. Your compiler can do things to make your code faster. It's not going to make your code three times as fast, but it might make your code 30% faster. That's a win.
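A rough illustration of "fancy only wins when the data is big enough" as a hypothetical micro-benchmark. The numbers vary by machine, but for five items a plain list scan can hold its own against a "smarter" structure that you have to build first:

```python
import timeit

# Tiny data: five items, so setup cost dominates.
items = ["red", "green", "blue", "cyan", "magenta"]

# Plain linear scan of the list.
t_scan = timeit.timeit(lambda: "cyan" in items, number=50_000)

# "Fancier" approach that pays to build a set before each lookup.
t_build_set = timeit.timeit(lambda: "cyan" in set(items), number=50_000)

print(f"list scan: {t_scan:.4f}s  build set then look up: {t_build_set:.4f}s")
```

If the set were built once and reused across millions of lookups, it would win easily; that is exactly the "depends on your actual data" point.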

Jim:

Alright, you just threw a whole bunch of words out there. You know, I read the man page for the compiler. You look at the man page for GCC, and it talks about the dash capital O optimization setting. You can do -O1, -O2, -O3 to enable different kinds of optimizations. And they talk about things like you just said: loop unrolling, hoisting. What are those things? What does it mean to unroll a loop?

Wolf:

So sometimes what you have is a really tight loop. It's small, it just does a few things inside it. When you do that, it can be that the machinery of the test that takes you back to the beginning of the loop is a bigger fraction of the total work of one iteration than you want to spend. Your loop costs too much because it's looping. Unrolling a loop means taking the thing that loop does, its guts, the middle part, the thing that's not the test, and instead of looping over it, putting that body in ten times, or twenty times, or a hundred times. It depends on exactly what you want. It's not a loop anymore. I mean, it is, it's just a really big loop, because at the bottom, or in places in the middle, it's gonna try to exit or go back to the top. It is still going to be checking for conditions, but it's faster than it was before, and it's longer and flatter. It's not a loop anymore; it's unrolled.
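Python interpreters won't unroll loops for you the way an optimizing C compiler will, but the shape of the transformation can be sketched by hand. The function names here are invented for illustration:

```python
def sum_rolled(values):
    # The ordinary loop: one test-and-branch per element.
    total = 0
    for v in values:
        total += v
    return total

def sum_unrolled_by_4(values):
    # Hand-unrolled sketch: four additions per loop test.
    # (With C and -O2/-O3, the compiler does this for you.)
    total = 0
    i = 0
    n = len(values)
    while i + 4 <= n:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    while i < n:  # leftover elements that don't fill a group of four
        total += values[i]
        i += 1
    return total

assert sum_rolled(list(range(10))) == sum_unrolled_by_4(list(range(10))) == 45
```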

Jim:

So that's what unrolling a loop is. Okay. So for that particular example, as programmers we want to write loops because a loop expresses what we're trying to do in a small amount of code. I mean, you could flatten it out yourself, but that kind of makes a lousy-looking program, right? You want to write something simple. You want to write it as a loop and let the compiler unroll it for you, to make it fast, by using the optimization setting. Right. Okay.

Wolf:

And I'm gonna toss in here, unroll it if it helps.

Jim:

Yeah.

Wolf:

Yeah, yeah. Which you can't do.

Jim:

Sure. Yeah, the compiler will unroll it if it helps. So there's other things: hoisting expressions, eliminating dead code. Eliminating dead code, that's kind of easy to see. But what is hoisting expressions?

Wolf:

Okay, so hoisting expressions particularly applies inside a loop, and there are more complicated versions where something loops over a function, and that function itself can have this done to it. But here's what hoisting means: it means that you are calculating something inside a loop, so you calculate this thing 50 times or 100 times, but the actual thing you're calculating doesn't change. It's the same answer every time through the loop. And sometimes we're not talking about a loop, we're talking about some interesting kind of expression that appears five times in your long calculation. Hoisting means pulling it out of the place where it's calculated multiple times, calculating it just once, and then in all those places, every time through the loop, or in the five places it was used inside your giant calculation, using the one pre-calculated value that you made. The programmer didn't put any extra code above the loop. They just wrote whatever was natural, but the compiler figures out it's the same, pulls it out of the flow of the loop, and now it only happens once instead of a thousand times, and you just use the answer. That's hoisting.
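A sketch of hoisting in Python, with hypothetical function names. Note that CPython generally can't do this hoist for you, because `math.radians` could be rebound at runtime, so here the programmer does it by hand:

```python
import math

def scale_unhoisted(points, angle_deg):
    # The conversion factor is recomputed on every iteration,
    # even though it never changes inside the loop.
    out = []
    for p in points:
        out.append(p * math.radians(angle_deg))
    return out

def scale_hoisted(points, angle_deg):
    # Hoisted: compute the loop-invariant value once, before the loop.
    factor = math.radians(angle_deg)
    return [p * factor for p in points]

assert scale_unhoisted([1.0, 2.0], 180.0) == scale_hoisted([1.0, 2.0], 180.0)
```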

Jim:

Cool. Cool. I don't know, do you dare talk about tail recursion? I don't understand what that means.

Wolf:

Tail recursion. I want to talk specifically about specific types too, but let me hit on tail recursion first, because it's more broadly applicable. When your program is executing, usually what happens is it grabs a hunk of space on the stack. Ordinarily, in a running program, there are two different places for memory: the stack and the heap. You allocate things on the heap when they're gonna live for a long time; they stick around. They might live longer than the function that created them. The stack gets a new block every time you start a function, and every time you end a function, that block gets thrown away. If your functions nest very deeply, your stack will use a lot of blocks, and the stack is typically much more limited than the heap. You can run out of stack, especially if you're doing recursion. Recursion is when a function calls itself. There are lots of obviously recursive problems, like factorial and stuff like that. It turns out it's typically easy for a programmer to turn something that's recursive into something that's just a loop, but let me explain how tail recursion interacts with this. When you write a function that is recursive, it would make a brand new block on the stack every time it calls itself, and that could go very deep and consume all your stack. But if it turns out that the place where it calls itself is the very last thing, or almost the very last thing, in the function, then instead of waiting for all of those recursive calls, maybe a million, who knows, to come back before it throws away the stack block for this particular execution of the function, it can throw away that block before it calls into the code.
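CPython does not perform tail-call elimination, so this is only a sketch of the shape being described, together with the loop rewrite mentioned above. The function names are invented for illustration:

```python
def fact_tail(n, acc=1):
    # Tail-recursive shape: the recursive call is the very last thing.
    # A language with tail-call elimination can reuse one stack frame
    # here; CPython cannot, and will grow the stack on each call.
    if n <= 1:
        return acc
    return fact_tail(n - 1, acc * n)

def fact_loop(n):
    # The same computation as a loop: constant stack depth.
    acc = 1
    for k in range(2, n + 1):
        acc *= k
    return acc

assert fact_tail(10) == fact_loop(10) == 3628800
# fact_tail(100_000) would hit CPython's recursion limit;
# fact_loop(100_000) runs fine.
```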

Jim:

That's why it's doing that magic for you.

Wolf:

Exactly. And what that means is, instead of allocating a million blocks on the stack, it can do it with about one, or however many, depending on how your code happens to be structured. So tail recursion: super helpful in certain kinds of programs. Now let me talk about specializing for given types. Where is this applicable, and what happens? There are dynamic languages. Perl is one, Python is one, there are others. It's not that variables don't have types in these languages; they do. When you have a variable, there might be an integer in it, and the runtime knows it's an integer. It does integer things to it, and when you add two integers together, there's code it executes to do that. But if your language is dynamic, then when you enter the function that is going to do the addition of these variables, it doesn't know yet that that variable has an integer in it, even if every single time you ever call this function it's an integer. There's a special kind of compiler-slash-interpreter called a specializing compiler or a specializing interpreter, and what it does happens as you are running your program. It doesn't do this the first time you call the function; it does it after watching the function get called a thousand times. It notices that over those thousand times, or some number of times, it was almost always an integer. And then, instead of testing the type and dispatching through the generic machinery every time, it can go directly to adding two integers together. If it sees that's the thing you do the most, it can make your program really good at doing that thing. That's called a specializing compiler, or interpreter. Sometimes JITs, just-in-time compilers, do that. The latest version of the Python interpreter, I think 3.13, can do that. It's absolutely in 3.14.
I don't think it's fully ready yet, but it's a big win, because Python is one of these dynamic languages. Python does have typing. I think Python typing is a really great addition to the language, but it's all about static analysis. You run a tool on your Python program. You don't run your program and have stuff checked at runtime; that doesn't happen. What happens is you run mypy on it, or Pyre, or Pyrefly, one of these special type checkers that looks at the type annotations, does a bunch of math, and proves your program is keeping the promises you made with those types. Anyway, let me keep going, if that's okay. Do you have more questions? No? Okay. So, this is the next piece of the solution to making your code go faster. This is the next lower-down step. Can this particular hunk of code be done faster using a specialized library? For instance, in Python there are lists. If you want a matrix or something, you have to use lists of lists. That's not as good as you could want; Python isn't as good at manipulating numbers in lists of lists, or combining them, or whatever. It turns out there are libraries that, depending on your specific requirements, can do these things much faster. So if you want to deal with big arrays of numbers, maybe you want to use NumPy. NumPy is a Python library, you call it from Python, but inside, NumPy is written in C. Pandas, I think, might be written in C. Polars, I know, is written in Rust. They're fast, they're super fast, but they present a Python interface to you, so you don't even know you're running Rust. But that particular problem you're solving with an array is now solved 50 times faster. So sometimes the right answer is not doing it yourself, but using a library. And finally, the absolute last possible choice, the thing you want to avoid at all costs, but sometimes you have to do it.
And that is: can you rewrite this particular section in a lower-level language? There are so many costs associated with this. For instance, Python is a language that optimizes for programmer time. With Python, you can write much faster than you can in many other languages. But maybe you're at the point where you've tried everything else and you need to write some little piece of it in a lower-level language. Let's pick Rust for this example. Rust, for most people anyway, is harder to write than Python. It's gonna take longer, it's gonna take more effort. Rust is pretty portable, but if this was a thing you did in assembly, which, God help you, portability is a problem. Is your team ready to deal with this language? Can anybody else on the whole team write it except you? Or understand it, or find a bug in it, or fix it? So using a lower-level language is your absolute last option. But sometimes you have to take the last option.
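Going back to the specialized-library step: a minimal sketch of swapping a list-of-lists for NumPy. This assumes `numpy` is installed, and the actual speedup depends entirely on the workload:

```python
import numpy as np

# Pure Python: a "matrix" as a list of lists, scaled element by element.
rows, cols = 100, 100
matrix = [[float(r * cols + c) for c in range(cols)] for r in range(rows)]
scaled_lists = [[x * 2.0 for x in row] for row in matrix]

# NumPy: the same operation as one vectorized expression backed by C loops.
arr = np.array(matrix)
scaled_arr = arr * 2.0

# Same numbers, same Python-facing interface, much less interpreter overhead.
assert float(scaled_arr[3, 7]) == scaled_lists[3][7]
```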

Jim:

Well, you do what you gotta do, right? To make it fast, if that's what you're shooting for. So, wow, that's a lot of stuff you covered. That's neat. Can you boil it down and give us the key takeaways from this?

Wolf:

I absolutely can. I am so sad that I can't show you visual examples of each one of these things, because I think that would make such an impact. But I think I can boil it down. The first thing is you gotta understand what you're optimizing for. Is it speed, is it memory, is it storage, is it you, the developer? What is it? And almost certainly the real truth behind that is: what is costing you money, or whatever you're using as a system of exchange. The second thing, once you know what you're optimizing for, is measurement. Measurement is absolutely job one, because that's the only way to know where the money is actually going. You should be measuring a lot. More than that. It should be part of CI/CD. You should run it before pushing. Everyone should be doing it. Measurement might be even more important than testing, and don't get me wrong, testing is very important. When I worked on Mozilla, our build servers did timing. If your commit slowed something down, that commit was considered a bustage, that's the word we used, and people weren't allowed to commit after that. The tree, as we called it, turned red, and the entire job of anybody who could help was to fix that bustage. Slowing the program down was bad. Real bad. Don't do it. So: measurement, job one. Use the profiler for two important jobs: first, to see if the whole thing is faster or slower, so you know when it's time to look deeper, and second, to dive into the actual execution and locate the parts you need to improve. It's all about the money; that's a point I need to make all on its own. And this is your job as a programmer: write clear, simple, and correct code, and you'll know it's correct by testing. Write code that says what it means. Write code that, when you read it, tells you what it does. Only then can you actually optimize it. There's an old adage, I think it might be from Don Knuth.
It might be from somebody else, but the adage is: premature optimization is the root of all evil. Everything you do to optimize something is almost certainly going to make it more complicated, harder to understand, harder to fix, easier to have bugs in, and that is exactly the opposite of your job. So don't do it prematurely. You need to understand the real, actual, true data you will be operating on. Your tests should have data that looks like that. Your algorithms should know how to work on that data. When you do big-O notation, usually the real cost is a constant C plus the big-O term. C is the constant for setting up that particular algorithm. Complicated algorithms have real big C's; easy algorithms have small C's. If you have five items and you want to sort them, you don't want an algorithm with a big C, because the big C is going to take over the entire time of sorting. A bubble sort might be fine for five items. I'm not gonna tell you what's right for you, but easy data can be solved with easy answers. And finally: you don't know just by looking at the source what actually costs you the most money. Yes, you can see where stupid things happen. But even for those, knowing which ones actually matter, knowing which stupid thing actually has an impact, which stupid thing costs you money, knowing that requires measurement. So write clear code, measure it, and then worry about everything else. Those are the takeaways.

Jim:

Wow, that's a lot of great stuff. You know, we've been doing this podcast for a couple of months now, and this is maybe my favorite. This was so much fun to talk about, and I learned stuff, and I hope our listeners learn from this, and I hope they enjoyed it as much as we did. Don't forget, we have show notes; your podcast player should show them. If not, we have a website. Point your browser at runtimearguments.fm and you'll see the full episode list, and the show notes, and all that fun stuff. Send us feedback. You can contact us at feedback@runtimearguments.fm. We'd love to hear the feedback, because we want to do the best we can, and your feedback lets us do that. So thank you very much. Please tune in again next time. And Wolf, thanks.

Wolf:

My pleasure. This is so important to me. I really hope something I said makes a difference for somebody, and they spend less money.

Jim:

I'm sure it will. Thank you.

Podcasts we love

Check out these other fine podcasts recommended by us, not an algorithm.


CoRecursive: Coding Stories

Adam Gordon Bell - Software Developer

Two's Complement

Ben Rady and Matt Godbolt

Accidental Tech Podcast

Marco Arment, Casey Liss, John Siracusa

Python Bytes

Michael Kennedy and Brian Okken

Talk Python To Me

Michael Kennedy