Runtime Arguments
Conversations about technology between two friends who disagree on plenty, and agree on plenty more.
30: Available compute: way more than you need, right up until you need it!
You almost never have exactly the right amount of compute for the job. Either cores are sitting idle while your code runs on one, or you've got more problem than machine. This episode is about the two fundamental tools for closing that gap — and why picking the wrong one makes things slower, not faster.
Topics covered:
- Concurrency vs. parallelism — the core distinction: concurrency is a scheduling problem (you're waiting a lot); parallelism is a compute problem (you need more processing). They are not interchangeable.
- IO-bound vs. CPU-bound — how to identify which problem you actually have before writing a line of concurrent code. (And a third case: memory-bound, where the fix is data layout, not more cores.)
- Threads aren't always what you think — system-level vs. user-level threads; why JavaScript's `async`/`await` is single-threaded concurrency and not parallelism; why goroutines can be either.
- Colored functions / async infection — why `async` spreads through a codebase the way `const` does in C++, and why Go sidesteps it entirely with `go func()`.
- Go channels and Rust ownership — why these two language designs are the cleanest modern answers to shared-state problems.
- Python's GIL — what it is, why it hurt, and why free-threaded builds in 3.14+ can run without it (with caveats for single-threaded performance).
- Amdahl's Law — the mathematical ceiling on how much any parallelization effort can help, and why it's specific to your problem.
- Hidden parallelism — CPU pipelines and branch prediction run in parallel below your abstraction layer, and you can't see them without special tools.
- Communication is the real enemy — GPU bus bandwidth, cluster fabric, Apple Silicon shared memory vs. NVIDIA CUDA: the cost of moving data often swallows the benefit of more cores.
- Fork and copy-on-write — how Unix `fork` got fast, and why Python's reference counting undermines it.
- The actor model — how Erlang (and now Swift) solve the ownership problem by letting the object own the data, not the caller.
- Heisenbugs — the bugs that live in parallel code and only appear when you least want them.
Examples:
- Pixar render farms — 130,000 frames × 24 hours each, solved by embarrassingly parallel independent frames
- `make -j` — the classic CPU-bound parallelism win; why it only helps when you have real cores, not threads
- Trolltech's distributed C++ build system — compile-farm tied to a specific commit, object files cached and shared
- JavaScript worker threads and Web Workers — breaking out of the four-query Node.js limit
Link to Wolf's dap-mux presentation at mug.org: https://www.youtube.com/live/iyAk8-oE6cM?t=1725
Hosts:
Jim McQuillan can be reached at jam@RuntimeArguments.fm
Wolf can be reached at wolf@RuntimeArguments.fm
Follow us on Mastodon: @RuntimeArguments@hachyderm.io
If you have feedback for us, please send it to feedback@RuntimeArguments.fm
Check out our webpage at http://RuntimeArguments.fm
Theme music:
Dawn by nuer self, from the album Digital Sky
Jim McQuillan:Alright, welcome to another episode of Runtime Arguments, the podcast where we talk about all kinds of cool, techy… programming… things. I'm Jim McQuillan, and, uh, my partner here is Wolf. Say hello, Wolf.
Wolf:Hey everybody.
Jim McQuillan:So, uh, yeah, this is our, uh, episode 30, our 31st episode. And today, uh, we're going to talk about what to do when you have more problem than you have computer. We're gonna get into lots of cool things, uh, multi-processing, threading, uh, I don't know what all. Wolf's got all kinds of, all kinds of tricks up his sleeve, and we're gonna learn some of those today, so I'm kinda looking forward to that. So, how was your week? Wolf?
Wolf:Oh my god, so annoying! But, you know, in a nice way. I live in a rental and have for a very long time. And I have the greatest landlord of all time. He is a really terrific guy. And, uh, they went out of their way to replace the flooring over the major part of the house, and that meant furniture going into a pod in the front yard and the family sort of splitting up into various locations. So my wife and me in a hotel with one of the three dogs, one of the boys living with grandma with two of the dogs, and one of the boys here in the house, not to leave his room. So…
Jim McQuillan:He stuck it out, huh?
Wolf:They said… he did, he did. So they, they said this would be 2 days, and I said, okay, it'll probably be 5, I'll reserve a hotel room for 4. And we came back today and they were still doing a little bit of work. So I think I was right.
Jim McQuillan:And today was the fourth day? Is that right?
Wolf:It was like I spent Sunday night and this is Thursday as we speak, so Wednesday night was our last night there.
Jim McQuillan:Yeah. Okay. So it's good to be back home, isn't it?
Wolf:It is, and it looks good, it's gorgeous. I mean, of course, Jennifer is not as happy as me, because she sees things and notices things. So she can't be as easily satisfied as I am.
Jim McQuillan:And… And us… us guys, we just… I don't know what it is, we just don't pay attention, do we? We just, we walk in, we're happy.
Wolf:You know, in this way, ignorance, ignorance kind of is bliss.
Jim McQuillan:Yeah. Just be happy, right?
Wolf:How about you? How was your week?
Jim McQuillan:Uh, busy this week. I just got back from vacation. I spent, uh, 9 days, uh, gone. Uh, my buddy Ron and I, we drove up to see Scotty in Winnipeg. Which, uh, you know, we live in the Detroit area, so that's, uh, round trip, that's 2,500 miles in the car, and, uh, you know what? I enjoy every minute of it. I, I, I love the, I love the adventure. I love the drive. And we, we, uh, we had a good time. So, but I'm back.
Wolf:You're lucky it's summertime, because in the winter in Canada, you can often get pulled over by a snowplow. I don't know if you've noticed the color of the lights on the various vehicles, but they're yellow on…
Jim McQuillan:Or run over. Yeah. Yeah, you get run over.
Wolf:police vehicles and they're red and blue or something like that on snow plows.
Jim McQuillan:Yeah, I don't know. I don't know. But it was a good time, but of course, uh, being gone for 9 days, it put me behind in my normal work, so this week was catch-up time. And, uh, so that's what I did. Right up until, like, 2 minutes before we turned on the microphones, I was busy working. Uh, but we're here now and we're gonna talk about some cool stuff. Uh, let's see, do we have feedback? Um, last episode, we talked about Wi-Fi, and, uh, you know, we've got lots of friends on the podcast, and one of them is my friend Marlon, our friend. Uh, he works at Meta, he's a brilliant engineer there. He listens, and, uh, you know, we talked about different levels of Wi-Fi, different, you know, 802.11 something. And I mentioned that the whole thing that drove me to that conversation was the fact that in my house, the signal doesn't reach everywhere I would like it to reach. And we talked about mesh networks and various ways I can fix this, and one of the things I was thinking about doing is just getting a better access point that maybe does Wi-Fi 7, which is maybe a stronger signal. Marlon says, just move my access point. Move it to a more central location in the house and I'll be fine, because placement is probably the most important thing when it comes to Wi-Fi signal throughout the building. And, um, uh, yeah.
Wolf:Can I actually use a phrase? Is it location, location, location?
Jim McQuillan:Yes. That's exactly what it is, like it always is, right? The answer to everything. And the location of my access point is not a good location. Unfortunately, the way my house is laid out, moving it means drilling through walls and drilling through floors. Um, it presents a challenge, so I'm gonna… I don't know, maybe over the next month or so, I'm gonna try to figure this thing out. I suspect I can get much better coverage without spending any money, and that appeals to me. Uh, but thanks, Marlon, for that advice. It's kind of obvious, um… you know, the solutions I look for are usually on Amazon, uh, not in just common sense. But, uh, so yeah, that's the feedback we got from the last episode. Um, and of course, we love feedback, we encourage it. Send us feedback; you can send it to feedback@RuntimeArguments.fm. Uh, of course, we'll repeat that again at the end, and we have show notes, and we have a website, and we have all that contact information there, so you can, um… tell us what you'd like to hear. Tell us what we said wrong. Sometimes tell us what we said right. Uh, hopefully today we've got some ideas that will cause people to tell us what they think. So, hopefully. Anyway.
Wolf:You, you know. Over the course of the last week, something really good did happen.
Jim McQuillan:Uh, what's that?
Wolf:And that is, we took my little open source idea…
Jim McQuillan:Right.
Wolf:for a DAP multiplexer, and, um, and we did a big demo of it, um, that is now recorded and available on YouTube.
Jim McQuillan:Yeah, you gotta go to mug.org; that's where you'll see the announcement. If you go to YouTube and just search for mug.org, you'll see Wolf's, um, talk on dap-mux, which was really, really cool for, um, controlling a debugger and seeing variables and all the kinds of things you want to do when you're debugging. Uh, so check that out. Um, you know, when we do these episodes, we put together a little… sometimes it's an outline; it's never a script. But, uh, we sort of jot down the ideas of what we're going to talk about. So we did that with this episode: created a great outline, discussing all the things we're gonna discuss. And, uh, a couple of days ago, Wolf and I had a conversation about it, and it was kind of funny, because we got into a heated, uh, discussion about one of the things. We'll get more into it later. But, you know, I had a realization, and this is true about a lot of things between Wolf and myself: we come at things from a whole different direction. Wolf is a genius software developer and he thinks… I don't know what he thinks, I haven't figured it out, but what's going on in his brain is different from what's going on in mine. I'll tell you what it was. We're going to cover it more later, but it's the idea of multiprocessing and clustering. In Wolf's mind, they're exactly the same thing. And I understand from his point of view, yeah, they are, the more I think about it. But from my point of view, I think in terms of what it's going to take to implement this thing at the hardware level. Because I'm kind of a full solution provider, and I have to think about plugging things together. And that's where I think.
Wolf:Plus. And you sign the… you sign the checks. When… when you need 5 machines to do it, you have to sign a check that says 5 machines!
Jim McQuillan:Yes. You're right. Right. Right, right. So I'm thinking about it from a whole different standpoint, and to me, uh, multiprocessing and clustering are entirely different. And to Wolf, they're the same thing. It's just how we look at things. Um, and when I realized that, it's like, oh, that's what was going on the other day when we had that conversation. And really, like I said, that explains lots of our conversations. We just come at it from different points of view, and maybe that's what makes this podcast interesting, assuming it is. Um, but anyway…
Wolf:I mean, that's half the title.
Jim McQuillan:Yeah, right. Yeah. Yeah, I guess it's the purpose of the podcast, right? We have different ways of looking at things and sometimes it goes smoothly and sometimes not so smoothly, but this is going to be a great episode. I think I'm ready to move on. Are you?
Wolf:Uh, I think I'm ready.
Jim McQuillan:You ready to go? All right, let's get into it. Wolf, go ahead.
Wolf:Um, so… There's lots of ways to talk about this topic. We could be super… pedantic, and I don't think that's valuable. What I think is valuable is getting to the essence of the problem. Um, and the problem is, when you're solving some workload, some problem, usually you're at one of the extremes. Usually, you've got more computer than you have code. Um, for instance, uh, my Mac has… I don't know. What? How many cores does my Mac have? Does it say up here if I pull it down?
Jim McQuillan:Oh, 20, I, I, a bunch.
Wolf:Yeah, a bunch of cores.
Jim McQuillan:I can't keep track of them anymore.
Wolf:Like, maybe 18 or something, and 40 GPU cores, something like that, and, uh… a regular, like, NVIDIA GPU card has, you know, in the tens of thousands of cores, or something. They're all simple, but it's got so many. So… almost all the time, you have one core doing your job, and some huge number of cores not doing your job. If you're lucky, they're doing some other job that is also important to you, but a lot of times, all that hardware is just a waste. Or you're in the opposite situation: you have a problem that is so big that you didn't bring enough computer to solve it. And it just seems like you're so rarely in the middle. So the question is, how do you balance this? Essentially, this whole episode is about getting two hunks of code, both of them aimed at the same goal, to run at the same time. At least, from your point of view, it looks like they're running at the same time. Now, there's constraints. There's gonna be some reason that they can or can't actually run at the same time. There's something that you don't have enough of. That's the whole reason why this is a problem that needs addressing. Maybe you don't have enough, um… uh, CPU. Your problem is that, you know… remember the old days, when we just had one CPU in our computer? And if you wanted to solve a problem that…
Jim McQuillan:Oh yeah.
Wolf:…you know, would easily benefit from 10 CPUs or something, instead of running it on 10 CPUs, what you did is you ran it, but it took 10 times as long, or worse. Umm. These days, maybe you have 10 CPUs, and maybe you can use them all at the same time. Sometimes, the thing that you're constrained on is more about time than it is about processing power. Uh, for instance, there are things across a boundary that just take time to get. Uh, one of those things is, um… IO, data. If you have stuff on a disk, I don't care how fast your disk is. I mean, I guess I technically do, but nobody's made a disk fast enough to make this not be a concern. A disk is so far away and so slow compared to even the first-level cache in your computer that you are waiting for disk stuff to happen. So… here's two different ways to solve the waiting problem on a computer. One way is you add more processing power. You really do run multiple hunks of code at the exact same time, but you do it on multiple processing units. Um, maybe those processing units are cores. Maybe those processing units are, um… I don't know if you remember, um, what did they call that special thing where, uh, you only had one core, but it had two processing threads?
Jim McQuillan:Oh, hyperthreading? That still exists. Oh yeah.
Wolf:Hyperthreading, or something, yeah. Uh, maybe you're running it on a GPU, maybe you're running it in a cluster. Uh, it's Kubernetes, or it's a Bay… who knows? That's one answer: multiple pieces of code at the exact same time. That comes with problems of its own, and I'm going to get to those. Or maybe your problem is that you're spending a lot of time waiting. So instead of being about running multiple pieces of code at the exact same time, your problem is about scheduling. Let me pick the right piece of code to run right at this minute. We have two different words. One of those words is concurrency. And the other word is parallelism. These are two different approaches for solving those problems. Concurrency is when you're going to use scheduling to solve it. You pick which piece of code is going to execute right now. Threads are a way of doing concurrency. They are an implementation for a concurrent solution. Multiple CPUs are an implementation for parallelism. Each of those two approaches comes with problems. There are words in the languages that you use that imply you are using one or the other. Some languages don't tell you what answer their language feature is going to leverage. And maybe they don't even choose what answer, um, that feature is going to leverage. Maybe it depends on how many CPU cores you have, whether it's going to be threads, or cores, or whole separate processing units.
Jim McQuillan:But. But…
Wolf:Yes?
Jim McQuillan:So, what I'm hearing is, if I have a program that spawns threads. It's not going to take advantage of multiple cores.
Wolf:It depends on the language and the interface that it gives you, what the underlying implementation is. The typical answer is that a thread is lightweight and it runs in the CPU you are in. It is a scheduling problem. It is concurrency. But really, really good languages that can use both have the option of saying, here's a thread I can run on a different processing unit. For instance, um, Go has goroutines.
Jim McQuillan:Okay.
Wolf:If necessary, a goroutine could be a thread on your local CPU, on the CPU you're using, or Go can make a goroutine run on a separate CPU. Why is this an option? Because the communication between a goroutine and your main hunk of executing code is a channel, not shared memory. And that means that it can send the data to the right place. If it's on the same CPU, maybe that's a real short trip. If it's on a different CPU, maybe it's a longer trip, but it doesn't have to worry about contention, which is one of the problems of both of these mechanisms: when two different hunks of code want to access the same piece of memory, coherency and access become a problem.
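As a rough illustration of the channel idea described here, the following is a minimal Python sketch in which a `queue.Queue` stands in for a Go channel. It is an analogy only, not how Go schedules goroutines, and the names `worker`, `run_pipeline`, and the `None` sentinel are assumptions made up for the example. What it tries to show is that the two threads never touch a shared variable; they only pass messages.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Square every number received until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:  # hypothetical "no more work" marker
            outbox.put(None)
            return
        outbox.put(item * item)


def run_pipeline(numbers: list[int]) -> list[int]:
    """Send numbers to one worker thread and collect its replies in order."""
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()

    for n in numbers:
        inbox.put(n)
    inbox.put(None)  # tell the worker to finish

    results = []
    while (reply := outbox.get()) is not None:
        results.append(reply)
    thread.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))  # prints [1, 4, 9]
```

Because all traffic goes through the two queues, the same structure would still make sense if the worker lived in another process or on another machine; only the transport behind the queue would have to change.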
Jim McQuillan:Okay. In my mind, multiple threads always meant, uh, if there's more than one CPU, it'll just take advantage of it, but I guess I'm learning something today. Um, uh, which is great. Some days, some days I learn things.
Wolf:Well, here's an example. Um, JavaScript.
Jim McQuillan:Umm. Yeah. Yeah.
Wolf:JavaScript has async and await, as do several other languages.
Jim McQuillan:Sure.
Wolf:But unlike some other languages, JavaScript is explicitly designed to be one foreground thread.
Jim McQuillan:Right.
Wolf:That, like, that's the whole idea. When you say async and await in JavaScript, you are not getting multiple CPUs. That's not happening. What you are getting is…
Jim McQuillan:No, no, I don't think of that as threads at all.
Wolf:Uh, and yet, some people would describe it that way. But that's how it works. And there are other languages with async and await that actually do use threads. Um, and there's… a thing to know is there's multiple layers of threads. Like, threads are a thing.
Jim McQuillan:Okay. Sure.
Wolf:Uh, there's threads at the level of… let's talk about Unix and Linux for a second. Uh, we already know that they are divided into levels. There's the system level, and then above that, there's the user level. System-level threads are heavier than user-level threads. System-level scheduling is more resource-intensive, um, and, uh, more strongly opinionated than user-level scheduling. Um, so it turns out there are system-level threads, and there are user-level threads, and your language might be designed to take advantage of one of those, but not both. Umm. Also interesting when you're talking about threads and async and await is this notion of… there's a word people use. I'm not sure I agree with this particular terminology, but the idea of colored threads.
Jim McQuillan:Hmm.
Wolf:I don't know… you do a lot of stuff in Perl, so in Perl, you wouldn't have seen this, but in JavaScript, you would have. And there are other languages where you would see it, and certainly with respect to other language features. For instance, in C++, and I think in Java (I don't do a lot of Java), um… there is the idea of const. When you mark a specific parameter const, or you declare a certain member function to be const, you can't call it with non-const data in certain contexts, in the way that you do it. When you declare a parameter to be const in C++, you can pass in anything, and the promise is that it won't alter that underlying parameter. But if you declare the member function to be const, you are saying this cannot be called except on a const object. If you say this can only be called on a const object, then suddenly const starts to propagate. It can't just be that this is const. The thing that calls this has to be const. And that has to be const. And that's why they call it colored. The color spreads from the thing that you want. This happens in a thing people used to use in Microsoft stuff called COM, the, um, Component Object Model, uh, other things. But the problem is that, uh…
Jim McQuillan:Umm. Umm.
Wolf:Some people call it infection. One function infects the caller, and that infects the caller. Well, this happens with async and await. Um, in a language like Python…
Jim McQuillan:Right.
Wolf:If you have a synchronous, a normal function, you can't just await some other function, because it needs you to be asynchronous to do that. So you need to have an asynchronous event loop to get that asynchronous function to execute. You need to be able to await it, and you can only do that from an asynchronous function. But in Go, for instance, um… it's just a goroutine. Any function that you want, you can call in this magic way. You know, the function is named foo. Instead of saying foo, open paren, some parameters, close paren, you say go. And it does its thing. No color, no infection, no spread. And now it communicates with a channel. That's pretty, pretty cool. That, like, you say that it's cool, it's this thing, you say that a lot. I don't say it very often. And I'm not a Go programmer. I can write stuff in Go if I have to, but it's not the first tool I reach for. But this is a reason to reach for Go. This is a really good thing that Go does. Um, Rust. Rust doesn't actually provide the mechanism to do the threading, but it does provide the API. So, it has a way to communicate between, uh, things that is not unlike Go's channels. But you supply the underlying library that decides, is this a thread? Is it a… process? Is it a… what are you going to get? Um, which I think puts it even one step, uh, beyond Go. I think that's a fascinating answer. So, um…
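The Python half of the coloring point can be sketched in a few lines. This is only an illustration: `fetch_value`, `gather_values`, and `main` are hypothetical names, and `asyncio.sleep` stands in for real I/O. The thing to notice is that anything that wants to `await` must itself be declared `async`, and the only way out for a plain function is to start an event loop.

```python
import asyncio


async def fetch_value(delay: float, value: int) -> int:
    """Stand-in for an I/O call: wait without blocking the event loop."""
    await asyncio.sleep(delay)
    return value


async def gather_values() -> list[int]:
    # Awaiting fetch_value forces this caller to be async as well;
    # that is the "color" spreading one level up the call chain.
    return await asyncio.gather(fetch_value(0.02, 1), fetch_value(0.01, 2))


def main() -> list[int]:
    # A plain (synchronous) function cannot use `await`.
    # It has to hand the coroutine to an event loop instead.
    return asyncio.run(gather_values())


if __name__ == "__main__":
    print(main())  # prints [1, 2]
```

Both waits overlap on a single thread, so this is concurrency in the scheduling sense discussed earlier, not parallelism.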
Jim McQuillan:So, so with, with Rust, is it, is it moved more into like the standard library than in the language itself? The threading and that kind of thing?
Wolf:Um… You know, the main… so, I'm not a Rust expert, although I want to be. Uh, my understanding is that one of the most important Rust libraries in this, uh, neighborhood of solutions is called Tokio, and it is about threading, and it is not in the standard library.
Jim McQuillan:Okay, but it's… it's a library.
Wolf:But it is so commonly used that it might as well be in the standard library. Tokio is as important to Rust as a thing that we used to use all the time in Python called Twisted. Twisted is less important now, but Python and everything that came after learned a lot from Twisted, and I think there's a lot of code in Rust that is greatly informed…
Jim McQuillan:Umm.
Wolf:…by Tokio. Um, so. Two important concepts. Concurrency and parallelism. Concurrency is when you are waiting a lot, and the answer is scheduling. Parallelism is when you have more problem than you have computer, and the answer is more processing. And now let me give you one other really interesting word. Before, when I talked about optimization and measurement, and by the way, measurement is a huge part of this world. However you are going to solve problems using these tools, you need to measure: what is the constrained resource? What do I get when I do such and such? Well, before, when talking about speeding up pieces of code, I talked about the idea that if you are trying to optimize some piece of your code that takes 20% total of the execution time to solve some particular problem, then even if you make that part of your processing go away ENTIRELY, the most you can get out of that is a 20% speedup. It turns out that there is a similar effect in, uh, this whole idea of concurrency and parallelism, and that is, if the fraction of your program that can be parallelized… let's go the opposite direction for a second. Let's say that there is 20% of your program that can't be parallelized. Then the best you can do is an 80% speedup. It has a name. And it turns out it's the same thing. It's the same name in concurrency and parallelism that it is in straight optimization. And that is Amdahl's law. And yes, it's that Amdahl. The one, uh, who is the computer expert and, uh…
Jim McQuillan:Yeah, Gene Amdahl.
Wolf:Yeah. Amdahl's law says that, you know, once you measure and you see what part can be sped up, that's the limit right there. Whether it's parallelism or concurrency or optimizing something away entirely.
Jim McQuillan:Right? Yeah.
Wolf:measure, you'll know what the limit is, and that limit, uh, is the expression of Amdahl's Law for your particular problem. It's not a law in the same way that, um. Uh, what's the one about, uh, number of transistors in a…
Jim McQuillan:Oh, Moore's Law.
Wolf:In a… Moore's Law. It's not, like, something independent of the problem in the way Moore's Law is. Um, it is super dependent on your problem and on your specific solution to the problem.
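For reference, the usual statement of Amdahl's Law is that if a fraction p of the run time can be spread over n processors, the overall speedup is 1 / ((1 - p) + p / n). The sketch below is a plain Python rendering of that formula; the function name `amdahl_speedup` and its parameter names are made up for illustration. With 20% of the program serial (p = 0.8), the run time can shrink by at most 80%, which works out to a ceiling of 5x however many cores are added.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Overall speedup when `parallel_fraction` of the work is split
    evenly across `processors` and the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # 80% parallelizable: the speedup approaches, but never reaches, 5x.
    for n in (1, 2, 4, 16, 1024):
        print(n, round(amdahl_speedup(0.8, n), 2))
```

The numbers only mean something once the parallel fraction has been measured for a particular program, which is the point made above about the law being specific to your problem.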
Jim McQuillan:Yah.
Wolf:Um, but it is its own thing. So… now, you know, I do kind of want to talk about a couple of interesting problems around the answers, um, and levels at which the answers occur. Like, there's parallelism that is beyond your ability to detect. Uh, for instance, inside a CPU, I can name at least two things that are, um… uh, important. One is the pipeline.
Jim McQuillan:Umm.
Wolf:Uh, the pipeline inside a CPU means that multiple instructions are happening at the same time, at various stages of their own execution. They don't exactly happen entirely at the same instant, but they are overlapping in a way that makes you be executing more code than you would think you could. And there's branch prediction. Branch prediction isn't exactly getting two pieces of code to execute at the same time, unless it is, um… the kind of branch prediction where it does a bit of one side, but mostly has chosen the other. Umm. These are both a level of parallelism that you can't see, uh, unless you have special tools and you're working with a special version of the CPU. Um… you have a much better look inside concurrency, because concurrency is a scheduling problem, so you can actually look BEFORE it actually starts executing code. That's useful. So that's one thing. And a second thing is, once your problem is going to be solved by some, uh, form of parallelism, now you have another problem, and that problem is… communication. It's getting data from where you are to the place where you're going to solve a whole different part of it.
Jim McQuillan:Right.
Wolf:Um, if it's on a different CPU, in a different machine, across a cluster. Well, now you've got to send data across, you know, what TCP or who knows what, the fabric. If you're two machines in the same data center, you're not necessarily connected by TCP. You might be connected by an interconnect fabric or something.
Jim McQuillan:Yeah, was it Myrinet, or there's, there's various…
Wolf:…which… There's tons. So there's that. Or maybe you're on a GPU. So now what matters is getting the data across the bus, which could be a…
Jim McQuillan:…ways.
Wolf:…PCIe bus or whatever it is.
Jim McQuillan:PCI, yeah, yeah.
Wolf:It's not nearly fast enough. None of it is fast enough. The great thing about parallelism is you've got X times as much processing power, and the horrible thing about it is that you have to send the data there. Um, a nice compromise is on, uh, Apple Silicon. It uses shared memory, so there's not a communication problem. But on the other hand, an NVIDIA GPU has X thousands of cores, um, and they're simple and can do simple math, where by simple math, I do mean that it can do reasonable amounts of linear algebra, the kind of stuff you'd want to do if it was video. But on a piece of Apple Silicon, maybe you get 40 cores. 40 is not 10,000. They are significantly more interesting cores, individually. Um… of course, you only get one kind on Apple Silicon at this moment, whereas on an NVIDIA card, there's three different levels of GPU cores, all the way from the smallest, fastest leaf cores up to the highest-level ones, where you only have 10 or 100, I don't even know how many of them, but that make big decisions and part out the work and whatnot. But zero communication overhead on the Apple Silicon, and gigantic communication overhead on the NVIDIA system, or whatever particular thing it is, because there's AMD and there's NVIDIA and there's this Apple Silicon. NVIDIA uses CUDA, and AMD uses ROCm, and Apple uses Metal, and all of them can be used to solve non-graphics problems. But once again, you need to look at your problem. You need to identify what is the band… what is the scarcity? What is the cost? What is the thing that you need to, uh… optimize for? Um… and then you need to pick the appropriate kind of parallelism and hardware on which you would like to solve it, IF you are allowed to choose. Just like when we talked about languages, we talked about the choices into which you could be forced. Well… if you're doing a video problem, and you want to run on Windows machines, because it's a game, and it's a AAA game, and you want people to buy it.
Well, you already know what the answer is. The answer is, you're gonna solve it on an NVIDIA card, and your problem is communication, and you're gonna do video work, and that's all there is to it. If your problem is you want to search a strand of genetic material, and it is a million bases long, and you want to find something that's 60 bases long in that, now you have a different problem, and you have different things to optimize for. So… almost never do you have the right amount of hardware to solve the problem that you want. So you need to understand your hardware, and you need to understand your problem, and you need to understand if there's other ways to solve your problem. I wanted to mention, um… some of the interesting ways that we have done this in the past. Threads were easy. But, um… in the past, on Linux and Unix systems in particular, we have used fork. Fork is awesome in some ways, and kinda terrible in others. At the beginning, fork used to do a thing where it would copy the world. It would copy your entire process, including all file descriptors and any resources that you might use, and, um… then the only change would be that you are the child or you are the parent, and you could do some particular thing. And sometimes you didn't even actually want that copy. What you wanted was to be something else entirely. So you did a fork with that giant copy, and then an exec, and suddenly you were something else, making that whole copy kind of a waste. Um, but that's a thing, that's what we used to do. Since that time, we've made fork, in a lot of cases, significantly faster, because we use a copy-on-write version of fork. So fork acts exactly like it does a complete copy of your entire process, but it doesn't. It just starts running. And it doesn't actually copy anything until you write. And most of the thing that you just copied is executable code, and code is in read-only segments that can't be modified.
Therefore, your code doesn't get copied, but your data, which includes the file descriptors, does. That's the part that might get modified if you're gonna do something, and that's the part that eventually will get copied. Everything is faster. Unless you're Python. If you're Python, and you do reference counting, ugh, then instantly just looking at a variable changes its reference count, and then back again, so copy-on-write almost never helps you. And also, Python had this horrible thing called the GIL, the global interpreter lock. The global interpreter lock is a way of solving this problem of multiple stuff happening at the same time, by stopping two different pieces of code from trying to have a coherence problem with the same piece of data. Now, in more recent versions of Python, I think starting at 3.14, you can run without the GIL, and I think in 3.15, running without the GIL will become the default. For single-threaded stuff, in 3.14, it wasn't necessarily faster. In fact, I might go so far as to say it was often slower. But if you actually were multi-threaded, it was a win. But soon, it will be the case that life is just better with that. So. How do you know what you want? You have to decide. Are you IO-bound? If you're waiting for the disk, if you are waiting for the network, you know, if you're waiting. Waiting is the key. Then threads, concurrency, is probably your answer. If you're CPU-bound, if you are trying to calculate hashes or encrypt… I mean, in a way, you're data-bound for those, but if you're CPU-bound, if it's pure calculation, probably parallelism is the answer. There is a third choice. It doesn't crop up a lot, but it's if you're memory-bound. If you're memory-bound, it's a lot like Jim's location problem with his base station. The answer to a problem of being memory-bound isn't about picking threads or picking multiple CPUs. It is about
probably layout of your data structures in memory. Can you get that right? And the real hard part is when you want to share data. Is it something that you need to send between two things using communication? Is there coherency that you have to manage? Is there locking? Different languages use different things. There is one model that I didn't talk about, and that's the actor model. Most of the languages that we use don't leverage the actor model. This is kind of an Erlang thing. But it turns out Swift does use actors. An actor is when it's the object itself that owns the data and nobody else owns it. So you can only get to it by executing code in the object. That kind of solves the access problem, as long as you don't provide a bunch of escape hatches. Anyway… I think we can make today be kind of a short episode if I give you the takeaways from this. Because really, it's just those three main things, I think. Is it a scheduling problem? Or is it a work problem, a CPU problem?
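The fork-then-exec pattern Wolf describes earlier in this turn can be sketched in Python, which exposes the POSIX calls directly. This is a minimal illustration, assuming a Unix-like system; `fork_and_compare` and `fork_then_exec` are hypothetical helper names, not anything from the episode. The first function shows that a write made by the child lands only in the child's copy of the data, and the second shows a child replacing itself with a different program.

```python
import os
import sys


def fork_and_compare() -> tuple[str, str]:
    """Fork, let the child modify its data, and report what each side sees."""
    data = ["original"]
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write touches only the child's copy of the memory.
        os.close(read_end)
        data[0] = "changed in child"
        os.write(write_end, data[0].encode())
        os._exit(0)
    # Parent: read the child's view through the pipe, then compare with ours.
    os.close(write_end)
    child_view = os.read(read_end, 1024).decode()
    os.close(read_end)
    os.waitpid(pid, 0)
    return child_view, data[0]


def fork_then_exec(argv: list[str]) -> int:
    """Fork, replace the child with another program, and return its exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # on success this call never returns
        finally:
            os._exit(127)  # reached only if the exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(fork_and_compare())
    print(fork_then_exec([sys.executable, "-c", "raise SystemExit(7)"]))
```

The parent still sees `"original"` after the child has written, which is the behavior fork promises whether the kernel copied everything up front or deferred the copy until the write.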
Jim McQuillan:Sure.
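To make the "waiting is the key" point concrete, here is a small sketch of the IO-bound case. It assumes `time.sleep` is a fair stand-in for a disk or network wait, and the function names are made up for illustration. Five simulated waits run one after another, then the same five overlap on threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(delay_seconds: float) -> float:
    """Stand-in for a disk or network wait; the thread is idle while it sleeps."""
    time.sleep(delay_seconds)
    return delay_seconds


def run_sequentially(delays: list[float]) -> float:
    """Do each wait one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for delay in delays:
        fake_io_call(delay)
    return time.perf_counter() - start


def run_with_threads(delays: list[float]) -> float:
    """Overlap the waits on a thread pool and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        list(pool.map(fake_io_call, delays))
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2] * 5
    print(f"sequential: {run_sequentially(delays):.2f}s")
    print(f"threads:    {run_with_threads(delays):.2f}s")
```

The threaded run finishes in roughly the time of the longest single wait, because no thread was doing any computing. Swap the sleep for a pure calculation and the advantage would largely disappear on one core, which is the CPU-bound case.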
Wolf:So the takeaways are that either you have more computer than you have code, a bunch of cores going to waste, and if that's the case, the answer's concurrency. Or you have more work to do than you have computer to do it, in which case the answer is parallelism. Get more CPUs. Get more cores. Get more machines. Get more. Just like every single episode we have, I'm going to harp on it: you need to diagnose before you write the code. Start measuring in advance. Figure out where the problem is. And the actual enemy isn't how much work you have to do. It's not what you're waiting for. It's sharing state. It's communication and coherency. That is your enemy, no matter which mechanism you end up choosing. That's the thing you will fight. That is the thing you will get wrong the first three times. That is the thing the languages help you with. That's the place they're going to help you the most. Go helps you because it uses channels. Rust helps you because it enforces strict ownership. Only one owner gets a mutable reference to something, or everybody has a read-only reference. Amdahl's Law is the thing that essentially, I guess the word here is, humbles you. You're gonna know in advance: you can't do better than this. If you are 5'2", it is unlikely you are going to be dunking. If only 20% of your program is parallelizable, then the fastest you will get to go is that 80% at normal speed, and then some speedup on that 20%. Python used to have lots of limits, and is having some of them removed. And you should know when not to parallelize. If you're already fast enough, maybe it's okay you have too much computer. Maybe. Or whatever. But it's key to know. We use this magic word. We talk about Heisenbugs: bugs that only show up when they are least wanted, and are incredibly hard to reproduce under a debugger. Well, let me tell you where they live. They live in parallelized code. So it could well be the case that
you have a problem where parallelization is an option, but it's not a guaranteed win. If the win isn't guaranteed, go spend your money on something where the win is guaranteed. That's what I think the takeaways are. What do you think, Jim?
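Wolf's 20% example can be worked through with the usual statement of Amdahl's Law: if a fraction p of the run time can be parallelized across n workers, the speedup is 1 / ((1 - p) + p / n). The sketch below is just that formula; the function names are illustrative.

```python
import math


def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup when `parallel_fraction` of the run time is spread over `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Best possible speedup, no matter how many workers are added."""
    serial_fraction = 1.0 - parallel_fraction
    return math.inf if serial_fraction == 0 else 1.0 / serial_fraction


if __name__ == "__main__":
    for workers in (2, 4, 16, 1024):
        print(f"{workers:>5} workers: {amdahl_speedup(0.2, workers):.3f}x")
    print(f"ceiling:       {amdahl_limit(0.2):.3f}x")
```

With only 20% parallelizable, the ceiling is 1.25x: the serial 80% still runs at normal speed however many cores are thrown at the rest.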
Jim McQuillan:So, well, I've got an example of parallelization that makes sense. I just read this earlier today, and I don't remember if it was on Hacker News or someplace, talking about how Pixar renders their movies. You know, when you're doing an animated movie, the animations have gotten so good. If you look back at, the one I can think of is Finding Nemo, where they made the water look like water, and now, like, grass looks like grass. And I think even back when they did, you know, hair looks like hair, back when they did the movie Titanic.
Wolf:And hair, hair looks like hair.
Jim McQuillan:Rendering that water, some new technology had to be created to do that. Anyway, the thing I read today said that a single frame of those movies takes 24 hours to render. That's just one frame, and what are movies? 30 frames per second? I think they said the average movie is about 130,000 frames. Now, if you had 130,000 frames, and it took 24 hours per frame, there's not enough time for any of us, you know, we would never be able to watch the movies because they would take that long to create. So they went to massive parallelization. Pixar developed this whole thing, creating render farms, where they are, indeed, processing multiple frames, like, thousands or tens of thousands of frames simultaneously. So at the end of 24 hours, they could have tens of thousands of frames done, instead of just one. So that's a great example of running things in parallel, where your problem is bigger than your compute.
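The arithmetic behind Jim's example is worth spelling out. The sketch below takes his figures as quoted (130,000 frames, 24 hours per frame), which are not verified here, and assumes a hypothetical farm of 10,000 machines that each render one frame at a time with no coordination cost.

```python
import math

FRAMES = 130_000        # frames per movie, as quoted in the conversation
HOURS_PER_FRAME = 24    # render time per frame, as quoted in the conversation


def wall_clock_hours(frames: int, hours_per_frame: int, machines: int) -> int:
    """Elapsed hours if each machine renders one whole frame at a time."""
    batches = math.ceil(frames / machines)
    return batches * hours_per_frame


if __name__ == "__main__":
    one_machine = wall_clock_hours(FRAMES, HOURS_PER_FRAME, 1)
    farm = wall_clock_hours(FRAMES, HOURS_PER_FRAME, 10_000)
    print(f"1 machine:       {one_machine:,} hours (~{one_machine / 24 / 365:.0f} years)")
    print(f"10,000 machines: {farm:,} hours ({farm // 24} days)")
```

On one machine that is 3,120,000 hours, about 356 years. On the assumed 10,000-machine farm it is 13 batches of 24 hours, or 13 days. The frames are independent, so almost all of the work is parallelizable and the gain scales nearly linearly with the number of machines.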
Wolf:Yeah, and some things to point out about that situation: the data that needs to be shared, first of all, is actually independent per frame, because the big data is all the pixels at the end. It's the output, not the input. The input is 3D models, and the 3D models are actually different per frame, because things have moved. So it's not like there's a coherency problem. That's a big, big deal. Second, I wanted to say, I can think of at least two movies back in the beginning time of computer animation and CGI. Tron was an early one, and so was The Last Starfighter. I thought those were both interesting. And I used to work at Trolltech, a company… it's now known as The Qt Company.
Jim McQuillan:Yep.
Wolf:And, you know, Qt is a gigantic C++ framework, especially back in the day. It was way bigger than the computers, but you needed the libraries to be pre-built to do your thing, but we were constantly… I wasn't, I was the evangelist, so I don't want to confuse anybody that I was actually writing code there that was part of the library. I was writing code, but it was, like, demo code and whatnot. Anyway, they had a system, which other people started to emulate, where, when you wanted to build a specific version of the code, you would start, and every machine in the office would help. You'd compile two files, that machine would compile five files, the other machine would compile three, and they were tied to the specific commit. It wasn't Git back then, I think maybe we were using Subversion, I can't remember. But it was tied to the specific commit, so later, when people wanted those object files, they could get the right one, and they'd already been compiled. That's a form of parallelism. And there, they get the data from the centralized repo. And there's no duplication, because you're compiling one file, and the other guy is compiling a different file. There's lots of problems where the answer is parallelism, or concurrency, or…
Jim McQuillan:Sure.
Wolf:Or who knows what.
Jim McQuillan:Sure, sure. Uh, make -j. I do that all the time when I build Postgres or whatever. And there's no communication between them, because they're all compiling different files. Every thread, uh… or every process is compiling a different file, so they don't need to talk to each other. But Make is a great tool for that. make -j16 will run up to 16 jobs at once to build your program much faster, assuming you have the CPU cores to do that.
Wolf:Right. You definitely want cores for that, as opposed to threads, because that's a problem that's CPU-bound, um, not IO-bound. It's not like you're waiting for that file. And if you had threads, and it was one CPU switching between compiling the different files.
Jim McQuillan:Yeah.
Wolf:it wouldn't be faster. In fact, it would be very slightly slower. So, -j is really good when you have multiple cores.
Jim McQuillan:Sure. And you mentioned, very briefly, JavaScript. I've been doing a lot of programming in JavaScript lately. And, you know, JavaScript is single-threaded, and it's all built around an event loop. I'm doing it in Node, so I'm making database calls and other things. And when I go to execute a query, I use async and await, not always, but when I feed a query to Postgres, my program is free to do other things. Right? That database query's running in the background. JavaScript itself is single-threaded.
Wolf:It's running in your background.
Jim McQuillan:But I can… What do you mean?
Wolf:I mean, the query has gone off, and now it is the primary importance to Postgres. It's running in Postgres' foreground. You're just waiting for an answer to come back. And the fact that you're waiting.
Jim McQuillan:Yes. Right, right. Right, and when I get an answer… Yeah. And if I don't do the async await, it'll run a callback for me. It'll basically run… my program will process the result of that query when it.
Wolf:Yeah.
Jim McQuillan:finishes doing whatever it's doing. Like, my program can be sitting there doing lots of things, and when the query finishes, my program will get that and use that. And I've recently.
Wolf:Which you can do either with threads, or with promises and futures. Um, both of those give you the same power, which is, you're waiting.
Jim McQuillan:Yeah.
Wolf:And why wait? Like, while you're waiting, let's go do something useful and interesting.
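The "why wait?" idea looks much the same in any event-loop runtime. Jim is describing Node, but as a sketch in Python's `asyncio` (a swapped-in analogue, not his code), with `asyncio.sleep` standing in for the database round trip and `fake_query` a made-up name:

```python
import asyncio


async def fake_query(name: str, seconds: float) -> str:
    """Stand-in for a database call; awaiting hands control back to the event loop."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def main() -> list[str]:
    log: list[str] = []
    # Start the query without waiting for it to finish.
    pending = asyncio.create_task(fake_query("report", 0.1))
    log.append("started query")
    # The single thread is free to do something else in the meantime.
    log.append("did other work while waiting")
    # Only now do we block on the answer coming back.
    log.append(await pending)
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Everything here runs on one thread. The overlap comes from not sitting idle during the wait, which is concurrency rather than parallelism.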
Jim McQuillan:Right, right. But an issue I ran into: I can only submit up to, I think it's four, queries to Postgres from that single thread. All the other queries that I might queue up will just sit there in a queue, they won't actually run. But four of them will run, the way Postgres works with Node.js. So four of those queries will just be off running on their own. I'm working on a problem where I have a lot more queries to run, so I've resorted to worker threads, which exist in Node, and they also exist in browsers. It's called something else in browsers. It's, uh… I forget, worker something or other.
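The behavior Jim describes, a few queries in flight and the rest parked in a queue, can be modeled with a counting semaphore. The episode does not say where his limit of four comes from, so this sketch only reproduces the observable effect, using Python's `asyncio` rather than Node; `run_limited` is a hypothetical name.

```python
import asyncio


async def run_limited(total_queries: int, limit: int, seconds: float = 0.01) -> int:
    """Run simulated queries with at most `limit` in flight; return the peak seen."""
    gate = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def one_query() -> None:
        nonlocal in_flight, peak
        async with gate:  # queries beyond the limit wait here, in a queue
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(seconds)  # stand-in for the database round trip
            in_flight -= 1

    await asyncio.gather(*(one_query() for _ in range(total_queries)))
    return peak


if __name__ == "__main__":
    print("peak in flight:", asyncio.run(run_limited(total_queries=10, limit=4)))
```

Raising the limit, or adding more workers that each have their own limit, is what lets more queries run at once, provided the database server can keep up.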
Wolf:There is a thing called Web Workers.
Jim McQuillan:Umm. Web workers. It's a web worker in the browser, it's a worker thread in Node.js, and I can spawn as many of those as I need. And, you know. This way, I could submit lots of queries to the database, and of course, I need a powerful enough database server to handle that. But it's… so far, it's working really, really well. So you can do multi-threaded stuff in JavaScript in Node. And I'm kind of having fun with that right now. But anyway, um… Boy, you covered a lot of stuff. Some real interesting stuff. I'm anxious to get this episode out there for people to listen to and offer some feedback. So again, feedback. Send your feedback to us, please. We love it. Feedback at runtimearguments.fm. You can go to our website at runtimearguments.fm. Uh, I forget, do we need the WWW on it? Or we don't?
Wolf:You absolutely do not want the WWW.
Jim McQuillan:We don't want www.runtimearguments.fm, that's not what you want.
Wolf:And you don't want the S either.
Jim McQuillan:It's not HTTPS, because of where the website is hosted. But we will have show notes on the end of this episode, so you can check out all the links there, and a transcript, and um…
Wolf:And a transcript.
Jim McQuillan:Wow, it's been a great episode, Wolf. I had fun with this one. Thank you and thanks to everybody for listening. And Wolf, you got anything else to say or are we done?
Wolf:Uh, I think… we're almost done. I think we should put a link to the YouTube video about DapMux in our show notes.
Jim McQuillan:Yes, we'll do that.
Wolf:And it's possible to make a link that starts right at the beginning of the DATMUX part. Which is about 24 minutes in.
Jim McQuillan:You measured. Yeah.
Wolf:Well, it says, you know, it's right in the corner. I didn't have to, you know, go through it or anything.
Jim McQuillan:Right. Right. Right! Uh, I'll have to see how we can do that, but yeah, we'll do that. We'll do that. Uh, so thank you everybody. And until next time, uh, goodbye.
Wolf:Bye bye