Runtime Arguments

12: GPUs - Can I, Should I, and How?

Jim McQuillan & Wolf Episode 12

It's Wolf's turn this episode, and this one required research!

GPUs obviously do tons of work. You see it every time you play a graphics-intensive game. You know how crypto-miners are using them. You've heard about AI companies using them for model building. You've got this hardware in your machine! Can you use it? Should you use it? Where do you even start?

GPUs can help if your problems, data, systems, languages, and architecture align. GPU-based solutions won’t help everyone … but when they do help, oh boy do they really help.

Takeaways
Platform recommendations:

  • NVIDIA: Richest ecosystem, start here if you have choice
  • AMD: Improving rapidly, good for PyTorch workflows
  • Apple Silicon: Excellent for unified memory workloads

Language recommendations:

  • Python for quickest wins
  • Rust/C++ for maximum control
  • JavaScript for web applications

Links

Dave Farley explains what's wrong with Vibe coding https://youtu.be/1A6uPztchXk?si=mzEg4mpbTIjaihnP

How do graphics cards work https://youtu.be/h9Z4oGN89MU?si=JRrumRPfYU6a0A02 

Hosts:
Jim McQuillan can be reached at jam@RuntimeArguments.fm
Wolf can be reached at wolf@RuntimeArguments.fm

Follow us on Mastodon: @RuntimeArguments@hachyderm.io

If you have feedback for us, please send it to feedback@RuntimeArguments.fm

Check out our webpage at http://RuntimeArguments.fm

Theme music:
Dawn by nuer self, from the album Digital Sky

Jim:

We got a great episode today about uh that GPU you have on your desktop or in your laptop. Uh we'll try to get to the bottom of whether you can actually utilize it. Um so so that's coming up in a minute. Um in the meantime, Wolf, how was your week?

Wolf:

Uh well, if you happen to listen to the last episode, you know Oscar is home from uh surgery. They they took some string out of his tummy. And he's doing great. Oscar the dog. I should have said that. I'm sorry. Oscar the dog, who is, you know, technically not my favorite, but he's my favorite. Sure. So my week has been pretty good. Uh, other than we had out-of-town visitors at work, and then as soon as they got back home uh to the country that they are normally working from, they sent in messages calling out sick because they had COVID. And I had close contact and that made me feel bad. But today I got a I got a COVID vaccine, so I'm feeling okay.

Jim:

Uh so feel bad, not in the COVID sense, but in the worry sense.

Wolf:

Exactly.

Jim:

Sure, sure. Okay how's your week? Uh you know, very, very busy working really hard for uh for one of my customers trying to trying to do some uh payment plan stuff. Uh I think I mentioned before we're using square payments and we've integrated into our system. And so I've been working really hard on that, but it's really fun stuff and it makes the customer happy. So that's always good.

Wolf:

I love making people happy.

Jim:

Yeah, that that's that that's that's what I live for, right? Making people happy.

Wolf:

So You know, uh I don't like the physical aspect of the thing I'm about to say, uh, but I do like the emotional aspect. I don't want to be a UPS driver, but when you see a UPS driver and and they're ringing your bell or however they are communicating with you, you are happy to see them every single time. Yeah, most of the time. It's it's the FedEx. That's a good job.

Jim:

It's the FedEx delivery guys with the uh with the letter that's uh that's urgent that that sometimes is scary. But yeah.

Wolf:

Oh thank God I've never gotten one of those. Yeah.

Jim:

Well, when you're in business, you get all kinds of stuff, right? So uh uh feedback. You know, we love feedback. We got a lot of feedback on our uh IPv6 episode. That was the one we did a couple of weeks ago. And that uh I I I said it during the episode, that was one of our favorite episodes to do, and uh we got some awesome feedback, and we really appreciate that. Um you can send us feedback uh by sending an email to feedback at runtimearguments.fm. Please send us all the feedback you can. We we that's how we know there are people out there listening. Um Wolf, you got some feedback from uh from one of your coworkers, didn't you?

Wolf:

I did. Uh I work with a guy named Robbie. Um and uh I've worked with him for a while, but he just recently actually joined my team. So now I work directly with him, and that's uh cool because he's uh a great guy. Um and Robbie listens very carefully to the show and sends us uh what he claims he always intends to be short pieces of feedback, but which always end up being long, insightful, and very uh influential and productive analyses of episodes. He has uh said some uh very valuable things to us, and uh it is greatly appreciated. I I don't know that we have a ranking structure, but I'm making one up right now, and we are elevating Robbie to the pantheon of friends of the show. Uh thank you so much for listening, Robbie, and for all your help and feedback. I I I hope that many more people join you.

Jim:

Yeah, yeah, thank you. It's encouraging to get feedback like that. We appreciate it. Uh we've got some more things I would like to follow, more like follow-up. Uh several weeks ago we did an episode on AI and using AI to code. And a couple of things have come up, and I I I think they're interesting, so I'm gonna share them. Uh there's a guy named Dave Farley uh who's got a YouTube channel, and he did an episode on what's wrong with vibe coding. Uh vibe coding is where you basically let the AI do the coding for you. Um he's got a great episode uh on his YouTube channel about that, and I've included a link to that in the show notes. Uh if you get a chance, it's 17 minutes long, and it's uh the best 17 minutes you'll you'll spend on YouTube, I think. It was pretty good. Uh another thing is I just saw yesterday uh JetBrains. Wolf, you use JetBrains editors, don't you?

Wolf:

I do. Uh I I used to be a gigantic PyCharm fan. Uh PyCharm. I have a license to PyCharm. And at a previous job uh we used um whatever their main one is, IDEA, uh whatever it is. Uh I I always forget because I'm mostly in PyCharm. But uh as you well know, I have found that the error reporting and language understanding in the Helix editor when connected with an LSP uh really fits me better than JetBrains does. So I love JetBrains, but I am evolving.

Jim:

All right. Well, for those JetBrains users out there, whether it's PyCharm or any of their other products, uh they are gonna turn on AI by default, which means all of the code that you enter in your JetBrains editor is going to be used to help train their AI model. Um I think that's a controversial move. You can turn it off. So I'm I'm I'm putting this warning out there telling you if you want to turn it off, if you feel icky uh when you hear things like this, turn off uh that AI thing so that they're not taking your code and using it to train their AI model. Uh apparently it's individuals uh that are using a non-commercial license; for corporate licenses, they aren't turning the AI on by default. Also, Wolf?

Wolf:

Uh if you are an individual and you suck, please turn it off. Don't don't let your bad code get into the model. Thank you.

Jim:

Oh, yeah, don't let your bad code get in there. And if you've got good code, then you need to decide if you want to allow it in. Uh and then uh another friend, uh Michael Lucas, he's a great author. He writes science fiction and he writes technical books. Um he's working on a book now that's going to come out very soon, Networking for System Administrators, second edition. And his technical books are the best out there. Uh anyway, he's including uh in the front pages of his book uh the following: this book is licensed for your personal use. Machines are not persons. Use for AI training is expressly forbidden. I think that's pretty good. I wonder if the AI people are gonna actually heed that advice and and not use it. But it also makes me think: if all the really good authors out there are forbidding the use of their work in AI, doesn't that mean that AI is gonna be using all the bad stuff for its models? What's that gonna do? If all the best stuff is not part of the AI models, it makes me worry. I'm gonna talk to Michael about that and see what he's thinking about that thought. Because uh while it's not his responsibility to make sure AI models are are the best quality, um I worry that all the great authors are gonna do this, and that's gonna leave us in a weird position, I think. What do you think, Wolf?

Wolf:

Um, I think I don't know a lot about writing. I do know a lot about code, and I do know that um, I'm very sorry to say, most code sucks, and uh AIs are trained on most code. Yeah. So the answers that you get out of AI are, as I have previously said, the most popular answers that it has seen that fit that particular set of uh prompts. And that code is very probably not as good as what a good programmer would have written themselves. So um there are problems.

Jim:

I I I agree, but if you take the best stuff off the top and don't allow it into the AI models, uh uh the average is gonna be even worse. Uh and that's that's that's worrying. Um I I I have not yet started using AI. Uh I'm dragging my feet. I know Wolf, you're using it uh for some things. I I I'm not ready. I don't know.

Wolf:

Anyway, I like uh I like when AI uh doesn't write brand new code for me, but looks at my code that I have already written and tells me you forgot this, or um, gosh, this would be a good doc string here, or things like that. Helps me find type errors.

Jim:

But I think that's an interesting, I think that's an interesting, useful way to use AI. Uh the people that are doing the vibe coding and and uh and just writing complete applications without understanding how they're working inside, those are the ones I worry about. Uh anyway, uh I I think we spent enough time talking about this stuff. Um why don't we get on to the episode? It's uh it's really exciting today.

Wolf:

Uh it's interesting. It's a very interesting question to me. This is a thing that came up at lunch, uh, as our topics often do, uh, and it was a thing neither Jim nor I had knowledge about, and that's why we decided to dive in. Uh I want to talk about GPUs. Uh these days, your computer, whether it's a laptop or a desktop or whatever, almost certainly has a GPU, especially if you're a gamer. If you're a gamer and you've got a card, your card is some honking expensive, powerful, air-cooled thing that has a fan, weighs four pounds, needs a power supply of its own.

Jim:

Maybe liquid cooled.

Wolf:

Maybe liquid cooled. Uh way back in the old days, uh we used to have CPUs that were integer only, and if you for some reason needed more than just integers, you might get a thing called a floating point coprocessor that would go on the same motherboard as your integer CPU. But when you were writing a program, you couldn't count on somebody else having a CPU with a floating point unit. Uh those days have changed. Now, whatever CPU you have, it has floating point in it. Um, so you always know you're gonna be able to do floating point math. It's a lot the same with GPUs. GPUs do a fundamentally different kind of computing than uh your regular CPU does, and they're different. There's several different uh people who make uh GPUs. I'm gonna talk about how the things they make are different from each other. That makes it hard for an ordinary uh programmer like uh you or me, Jim, or anybody who's listening, to write a piece of code that's going to run everywhere uh and yet do the fastest possible computation it can. Uh today what we're gonna talk about is um several things. Uh what uh can a GPU actually do for you? Do you have the kind of problem that a GPU can help you solve faster or cheaper or somehow better than you could have solved it without the GPU? Should you use it? And if all of those things are true, um how? How do you do that? Uh I think those are important questions. And I think right up front I've got to talk about uh the the uh let's call it the hardware landscape. Uh there's three major GPU uh styles, manufacturers, categories. Um let's talk about them. First, when an ordinary human says GPU, what they mean is a giant card. And that card has uh some kind of huge uh printed circuit on it. Uh when I'm talking about GPUs in this conversation, I'm not gonna talk about cards. I'm gonna talk about that printed circuit. On that printed circuit, generally, there are forty or four hundred or four thousand or forty thousand GPU cores. Now I know that you already know what a core is because you have a CPU in your computer, and these days uh your CPU probably has eight cores or sixteen cores or you know, some ungodly number, some large number of cores. Well an NVIDIA card has on it um somewhere in the neighborhood of, well, Jim, you said the exact numbers. Is it in the neighborhood of 10,000 cores?

Jim:

It's something like 10,762 or something like that.

Wolf:

Yeah. An NVIDIA actually has on that card three different kinds of cores. Uh by far the most common core they have on this card is called a CUDA core. C-U-D-A. A CUDA core is very much like your CPU, uh, except vastly simpler. Your CPU is a full-on computer. A CUDA core is more like a desktop calculator. It can perform simple operations, but if you were going to write something that did a lot of conditional logic and branching and lookups and did a different thing for every possible uh uh piece of data that you applied it to, probably this isn't a good uh problem to hand to your GPU. The middle category of cores on an NVIDIA uh GPU card is called a ray tracing core. Ray tracing cores are significantly more sophisticated than a CUDA core. Uh they can actually do almost the entire job of ray tracing, albeit uh one pixel at a time, but over and over again, and there are a lot of them. Not nearly as many as there are CUDA cores, fewer by an order of magnitude, maybe even a couple. Um and finally, there's the very most advanced uh kind of core on an NVIDIA GPU card, and that is called a tensor core. Tensor cores are the most sophisticated of all, and they can do some remarkable math, uh, and they are often used for uh artificial intelligence and uh uh neural networks and uh modeling and training, and uh there's a very, very small number of those on an NVIDIA GPU card. I can't remember the exact scale, but it's not like a hundred, it's more like twenty or ten, something like that. So that's NVIDIA. Uh it can do lots of work all at the same time because it has so many cores. Uh what's up, Jim?

Jim:

Uh uh uh CUDA, by the way, the the the uh acronym C U D A stands for Compute Unified Device Architecture.

Wolf:

Which feels almost meaningless to me.

Jim:

Yeah.

Wolf:

Uh an interesting thing that nobody ever talks about is these CUDA cores. Um it's not like you have 10,000 of them and you address one of them. It's not like that. Instead, they divide them up into groups of, I think, eight. Uh might be sixteen, but I think it's eight. And each one of those groups is called a warp. W-A-R-P, a warp. Um, and of course, probably the first thing that comes to your mind is, you know, warp factor five and a Starship engine. Um, but that's not the kind of warp it is. Uh one of the original, we're gonna use the word, computers, the Jacquard Loom, used punch cards to automatically control weaving, weaving of fabrics. So taking many, many threads and weaving them together into a piece of fabric or carpet or whatever particular thing it is, controlled by punch cards. And the word warp comes from the act of weaving. That's what a warp is in the NVIDIA uh GPU card. And a warp entirely comprises some small number of CUDA cores. Okay, so that's NVIDIA. Second on the list is AMD. Uh mostly you uh interface with AMD cards using ROCm or OpenCL. Um they're not anywhere near the top of the list of who would I use? Uh that's NVIDIA. That's pretty much NVIDIA. Uh they they are improving, but they've got a long way to catch up. And at the bottom of the list, I'm sorry to say, because they have some things I really like, is Apple Silicon. Uh when you are looking at one of these Apple Silicon uh chips, uh it has some number of CPU cores, both um efficiency cores and performance cores. It also has two other kinds of cores, GPU cores and uh uh neural processing units. Now, Apple has provided uh uh programmatic interfaces to get to the GPU cores. Uh and unlike NVIDIA, a GPU core on Apple Silicon, one of them is just like all the others. They don't have multiple kinds. Uh the things they have are they've got the CPU we just talked about, and they've got the neural processing units we just talked about. If you want something that's not a just plain standard GPU core, you use one of those. So the GPU cores you can get to. Um it's much, much harder to get to the neural processing units. Uh so we're not gonna let those come into our conversation after this sentence. Uh mostly we're gonna talk about the GPU cores. Now, the things that are bad about the Apple Silicon implementation: I have the best laptop you can buy from Apple. It's the M4 Max, and I have every single GPU core you can get. On an NVIDIA, you get 10,000 or however many Jim just said. On my Mac, with the very best chip, on the very best laptop, I have 40 GPU cores. 40. That really changes the math. Um NVIDIA can solve problems that have a lot more data points in them simultaneously than uh the M4 can. However, Apple Silicon has an amazing thing uh that tips the scale in the other direction, um, and that is this. The biggest part of the job when you're using the NVIDIA is that the NVIDIA cores are on a card across the bus, and all of your data and everything you can reach from your program is on the CPU side of the bus. And you have to spend time and cycles getting the problem, the data, across the bus to the GPU. Um that's the bulk of the work that happens when you're using an NVIDIA to solve your problem. But on Apple Silicon, they are all on the same die. They are in the chip, the one chip. Uh and Apple has a thing called unified memory. Um so for instance, my laptop has 128 uh gigabytes of RAM.
And that RAM is as easily accessed by the CPU as by any of the GPU cores, as by the neural processing units. Um there you don't spend any time moving the problem from one place to the other. So this thing that I have just said is the very first dividing point. When you are looking at your problem, how you will know, yes, solving it with my GPU is a good answer. It is a thing that might help you. The more data you have, the more likely it is your GPU can help you solve the problem. Also, the more data you have, the harder it's going to be to get your data into the GPU if you're using uh NVIDIA. So it's all about your problem and knowing your problem, just like it always is, just like in every episode we record.
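
For reference, here is a minimal sketch of the data movement Wolf is describing on a discrete NVIDIA card, assuming the CuPy library is installed (the array sizes are just for illustration). On Apple Silicon's unified memory there is no equivalent copy step.

    import numpy as np
    import cupy as cp   # CuPy: NumPy-compatible arrays that live on an NVIDIA GPU

    host_points = np.random.rand(10_000_000, 3)    # the data starts in CPU RAM
    gpu_points = cp.asarray(host_points)           # explicit copy across the bus into GPU memory
    lengths = cp.linalg.norm(gpu_points, axis=1)   # thousands of cores work on it in parallel
    result = cp.asnumpy(lengths)                   # copy the answer back to CPU RAM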

Jim:

So to compare the NVIDIA with the Apple, you uh you know, 10,000 cores versus 40 cores, the cores are doing different things. They're very, very different in the work that they do. And like you said, the memory architecture is different. So you know, we can't say that uh uh the 40 cores of the Apple are doing the same thing that 40 cores on the NVIDIA would do. It's just a different thing. So you have to structure your problem to take advantage of that, whatever it is.

Wolf:

That's exactly right. Um and I don't want to get ahead of myself, uh, but uh for instance, uh, it happens that I work uh for a company that builds high-resolution maps. Uh, and you've seen these computerized maps from other people, like Google's maps, and you know about Google's trucks that drive around uh mapping things. And Google's trucks have uh a camera on them. You've seen the pictures from the Google truck cameras when you do Street View, but another thing that's on a Google truck, and our truck has them too, is high-resolution LiDAR cameras. Some cell phones have these too. A LiDAR camera is 3D. Um, it fires off and gets back the reflection of everything that's in its field of view, and it knows how far away that thing is. So a LiDAR image is really uh a point cloud. And point cloud is a technical term. If you search for libraries that deal with point clouds, you'll see it. It's a lot of data. And each datum is a point. And the GPUs love this, they eat it up. That is what a video game is. It is dealing with a godzillion number of points and doing stuff to every single one of them. Nvidia rocks at this. Um any GPU is good at it. Um so, just like a video game is a good problem for a GPU, these LiDAR point clouds are a good problem for a GPU to solve. Um, a couple things I want to talk about. Um it is the case that NVIDIA dominates general-purpose GPU computing, and not by a little bit. Nvidia is the clear winner, even to somebody who doesn't know anything about computers at all. They they do know what GPU to get, and they know that it starts with the word NVIDIA, um, especially if they are uh dealing with cryptocurrency or AI or anything like that. Um and I'd like to talk about what you can realistically do on each platform today. And uh we've already talked about the unified memory story on Apple Silicon. Uh that that I think is important to note, but I I actually think we've probably already said everything we need to say uh about that. Um let's talk about why you would do this at all. Um you have code, it's solving some problem. Everybody's got a problem, everybody's writing code to solve their problem, everybody who's listening to this podcast anyway, and everybody is spending too much money. We've talked about that before. Everybody wants their code to go faster. Um so can your GPU make your code go faster? There's a couple different ways to do it. If you have NVIDIA, which you almost certainly do, um there are a couple things you can do that are very specifically about NVIDIA. Uh for instance, uh let me let me also just insert in here. In the old days, uh, we would actually write in assembly language. Um because that was the best possible choice. Uh you had it, it was there, it worked for your computer, um, and you definitely had an assembler, and it wasn't like you were going to write something that could run on somebody else's computer. Later we had C and other languages like that, things changed. We don't use assembly language anymore. Uh you don't want to write actual CUDA instructions. You don't want to write Apple Silicon GPU instructions. You want to write in some higher level language. Um it turns out that Nvidia uh has a tremendous ecosystem. They have a language specifically about talking to the CUDA cores, um, etc. And you can use that. It's higher level than assembly, and it works, and it's used in a lot of places. But even higher level than that is you can work with libraries um that are already in your language.
If you're a Python programmer or a Swift programmer or whatever, a C programmer, you can use Python or Swift or C or whatever and call the library that's doing the number crunching you wanted to do, but underneath is using CUDA. Um some of these things are just drop-in replacements. For instance, I work in Python. In Python, when we want to deal with giant arrays of points or geometries, we use a thing called pandas. Uh it turns out that there's a RAPIDS, R-A-P-I-D-S, library, which includes drop-in replacements that are API um compatible with these super popular libraries. You can use uh cuDF instead of pandas, it's a drop-in. Instead of import pandas as pd, you import cudf as pd. And that's it, you change one import statement, and suddenly all the places that you were dealing with pandas data frames of geometry, now instead of your CPU doing work, with, you know, God help you, eight cores or sixteen cores, now suddenly your Nvidia is doing all the work with 10,000 cores. Um of course the data had to get there. Uh and there's a bunch of libraries in this RAPIDS collection. Uh at least the very popular ones: pandas, NumPy, and scikit-learn. Um there are some cross-platform alternatives. Um PyTorch tensors uh can be a NumPy replacement, and that works uh on AMD if you want. Apache Arrow has GPU support, and I think it supports more than just Nvidia. Polars, which is um a kind of an updated and better version of pandas. Pandas is uh, I believe, written in C. Polars is written in Rust. Pandas is row-oriented, Polars is column-oriented. Um they make different trade-offs. Uh one is not a clear winner, but if you analyze your problem, maybe one is better for you than the other. Polars has GPU backends. Um but in our case, uh where I work, we use pandas, and we are not right at this moment using the GPU, and from this research, I see how we could change one line of code, an import statement, and start using a GPU. Um for our local processing.
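
Here's a minimal sketch of the one-import swap Wolf describes, assuming the RAPIDS cuDF package and an NVIDIA GPU; the file and column names are hypothetical.

    # import pandas as pd       # the old, CPU-only import
    import cudf as pd           # the RAPIDS drop-in replacement; the rest of the code is unchanged

    df = pd.read_csv("points.csv")                             # hypothetical LiDAR-style data
    df["dist"] = (df["x"]**2 + df["y"]**2 + df["z"]**2) ** 0.5  # column math runs on the GPU
    print(df.groupby("tile")["dist"].mean())                   # so does the group-by

Everything downstream keeps calling the same pandas-style API; only the import changed.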

Jim:

That seems like uh a a good thing to do, doesn't it?

Wolf:

It does. I want to do that. Um give it a try. That's my plan. Uh I have other things I want to do too. I want to use a more modern Python. But uh I'll settle for anything that saves money. Um uh so changing that one line of code would be a huge change for us. Um let's talk about languages in general. Uh uh there's a couple different levels. There's excellent support, good support, and maybe-it'll-work-for-you levels of support among languages. If you use Python, C++, or Swift, um there's a lot of help for you with respect to getting code to run on your GPU. In Python, there's CuPy, Numba, RAPIDS, PyTorch, and JAX. In C++ there's CUDA, which we already talked about. For AMD, there's a thing called HIP. Uh on Apple Silicon, there's Metal. Uh if you're using Swift, obviously Metal. For stuff that's okay, you'll get good support. In JavaScript, you can use gpu.js or WebGPU. In Rust, you can use wgpu-rs or Rust CUDA. Uh both of those, unfortunately, are Nvidia only. Uh in Julia, you can use CUDA.jl, AMDGPU.jl, or Metal.jl. So you can tell from the names that those three things are one for each of the three platforms we talked about.
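
To make the Python options a little more concrete, here's a minimal sketch of a custom kernel written with Numba's CUDA support, assuming an NVIDIA card and the numba package; the array and the scaling kernel are made up for illustration.

    import numpy as np
    from numba import cuda

    @cuda.jit
    def scale(arr, factor):
        i = cuda.grid(1)           # this thread's global index
        if i < arr.size:
            arr[i] *= factor       # each GPU thread touches exactly one element

    data = np.ones(1_000_000, dtype=np.float32)
    device_data = cuda.to_device(data)                     # copy across the bus
    threads_per_block = 256
    blocks = (data.size + threads_per_block - 1) // threads_per_block
    scale[blocks, threads_per_block](device_data, 2.0)     # launch one thread per element
    data = device_data.copy_to_host()                      # copy the result back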

Jim:

No, I've I I've never seen Julia code. I have no idea what it what it looks like, how it works. Who's using it? Have you ever seen it?

Wolf:

Uh okay. So to be 100% candid, I have seen Julia on someone else's display. Okay. Um because I use a uh uh tool called um, oh god, I forget what it is, but it's a dynamic environment for executing uh live code. And it was initially named for Python, Julia, and R. Um and I uh brought an intern from a company where I used to work to uh a Python place where she gave a talk on Julia, so I saw the code in her window. It's an interesting window.

Jim:

That you're talking about where you integrate the code into the document and it runs and yeah, and I use it all the time.

Wolf:

I I don't understand why it's not on the tip of my tongue. It started where iPython was the core of it. Um I I don't understand why I can't remember what it's called. Because I use it all the time. Yeah. Anyway.

Jim:

We'll we'll put it in the show notes.

Wolf:

Yeah, we will. I I probably have it laying around somewhere. Um if you use one of these other languages, uh, I'm not going to say I feel sorry for you, because I think pity is a thing you you will become angry at. But um it might work a little or a lot in Go or Java or C#. Java has JCuda and JOCL. C# has Alea GPU, although uh my understanding is that's completely inactive. Um that's it for the languages. When it comes to the platforms, uh Apple Silicon, mostly you're gonna get to what you want through uh Metal shaders. That doesn't sound right to me.

Jim:

You know because shaders I did some reading about that because like what is a shader? To me, that sounds like something in a graphics context, something you would use to to uh uh uh create graphic images. Um but it turns out that the shaders uh can do much more work than that. They're they're much more complex than just doing graphic things, they can do just processing on large chunks of data. It's just weird that they still call them shaders.

Wolf:

That is strange.

unknown:

Yeah.

Wolf:

Um on AMD, uh, their main thing is ROCm, R-O-C-M. The R-O-C part is capitalized. Uh the support for ROCm varies by what your actual main programming language is. Um and cross-platform, WebGPU is maybe a future standard. It's hard to know. Um so let's talk about when does a GPU make sense? Uh let's go back over the sweet spot. Um for sure it's gotta be about the data. GPUs are for when you have more than one piece of data. Um so are you doing array and matrix operations? Because that's what GPUs do. That's their bread and butter. They love it. Um do you have what is technically known as, and Jim loves this phrase, do you have an embarrassingly parallel problem? Um there are a great many embarrassingly parallel problems, and uh what that means is, in case you haven't run across this term before, do you have a giant thing to solve that is incredibly easily broken down into small things to solve, and then the answers to those small solved problems recombined? If you have that, your problem is embarrassingly parallel, and you can use so many extra solutions that other people cannot. Um, a lot of the math and computer science world is about trying to figure out how can I turn my problem that doesn't look like anything into a problem that does look embarrassingly parallel. Do you have a large data set? Are you doing some kind of interesting simulation, uh like a Monte Carlo simulation or something, that involves uh many simultaneous um uh problems being solved to uh uh construct a whole? Um if you have decided that RAPIDS is the right choice for you, and I strongly encourage you to look, because RAPIDS has a lot of sub-libraries in it, uh for instance, are you working with ETL pipelines? Just to remind you, ETL stands for extract, transform, and load. Um feature engineering, time series analysis. I'm gonna put a lot of emphasis on time series analysis. Uh if you're writing a web page, time series analysis has probably never crossed your lips. Um, but you're talking to me. I am. But I used to work at a networking company uh called Arbor Networks, and a thing that we did was we uh deployed code inside of uh cell network switching computers, switching stations, and watched the flow of packets to understand um which packets might be uh uh hostile uh and how to easily get rid of them and how to quickly categorize. Um and a thing that was very important to us was time series databases and time series analysis and graph analytics. And when I say graph analytics, I don't mean um we made $300 this quarter and five hundred dollars last quarter, so the bar to the left is taller. I don't mean that kind of graph. I mean graphs with nodes and edges, like when you try to examine a git commit history.
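
A minimal sketch of an embarrassingly parallel problem of the kind Wolf means: a Monte Carlo estimate of pi, where every sample is independent and can land on its own GPU core. It assumes CuPy on an NVIDIA card; swapping the import for NumPy gives the CPU version of the same code.

    import cupy as cp   # swap for "import numpy as cp" to run the identical code on the CPU

    n = 10_000_000
    x = cp.random.random(n)               # n independent samples
    y = cp.random.random(n)
    inside = (x * x + y * y) <= 1.0       # each comparison is independent of all the others
    pi_estimate = 4.0 * inside.sum() / n  # recombine the small answers into the big one
    print(float(pi_estimate))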

Jim:

Yeah, I think with uh acyclic uh uh redundant graphs, uh what are what are the uh cyclic graphs and that kind of thing, uh data structures.

Wolf:

That is that is exactly what I mean. Uh it turns out a great many problems can be solved with graphs, a lot more than you would think. Um and kind of as an aside, I'm almost afraid to bring this up because it it might make Jim stop me. But uh I've been listening to a book on audio during my commute called uh The Theory That Wouldn't Die. It's not how to use it or do it, it's the history of Bayes' rule. You've probably heard of Bayesian spam filtering, for instance. Uh Bayes' rule, huge, huge uh point of contention for uh well over a century. It it's all about probability and knowledge, and I think it's remarkably fascinating. Um that is another thing that solves way more problems than you would have thought. Maybe we'll talk about that someday. So um let's talk about how you might get started. Uh if you use Nvidia, uh you're on the easy track. Uh you should probably start with RAPIDS if you're already using pandas or NumPy. Um remember, what is your problem? If your problem fits this model, then yeah, go ahead and try. If your problem is something that does not use a lot of data, that is not embarrassingly parallel, that uh is not worth pushing over the bus to get it to the GPU, maybe you should have stopped listening already. But uh if your problem is big, like mine is, um start with RAPIDS. Um if you're doing a bunch of NumPy-like stuff, use CuPy. Um if you uh need a custom kernel or something, maybe Numba is the answer. Uh let's talk about AMD as your platform. Probably you're using uh uh ROCm and PyTorch, those are the two things that go together. Uh OpenCL via PyOpenCL. Uh, but absolutely look at ROCm first because that's gonna be a much smoother and better answer than the OpenCL answer. Um if you're an Apple Silicon user, uh in one way I'm super happy for you because I happen to love Apple Silicon, and in another way I feel sad because you are definitely the bottom tier. Um Apple Silicon users uh should start with PyTorch using the uh MPS backend, uh or TensorFlow Metal, uh or native Metal if you absolutely want to eke out the very best performance you can. And if you're a web developer, and somehow as a web developer, you have a problem uh to which the GPU might be applicable, um gpu.js for browser-based compute, or WebGPU for next generation uh access.
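
For the Apple Silicon path mentioned above, a minimal sketch using PyTorch's MPS (Metal) backend, assuming a recent PyTorch build; it falls back to the CPU if MPS isn't available, and the matrix sizes are arbitrary.

    import torch

    # Use the Metal-backed GPU cores if PyTorch can see them, otherwise the CPU.
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    a = torch.rand(4096, 4096, device=device)   # allocated in unified memory
    b = torch.rand(4096, 4096, device=device)
    c = a @ b                                    # the matrix multiply runs on the GPU cores
    print(c.device, float(c.mean()))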

Jim:

You know, there's another choice if you're a web developer. Remember uh back, I don't know, it was episode three or four we did on WebAssembly. Uh turns out you can do like CUDA in WebAssembly. Uh so you compile it probably in C or something, and and uh combine that with WASM and shoot that down to the browser and it can run. Um I'm a web developer, but I I still don't think I need to do that.

Wolf:

Um I was having a conversation about this um with uh the mathematician at work, uh a guy who I would also uh, I'm getting ready to categorize him as a friend of the show. Uh his name is Dave. Uh we were talking about how people use GPUs. Um these days, uh uh let me tell you why Nvidia is one of the most highly valued companies in the world. It's because of uh cryptocurrency miners. It turns out that you can use uh an NVIDIA card with all those cores to simultaneously calculate hashes of different um arrangements of transactions to form a block. And you can do them all at once and find the best one and submit it, and as we know, it's just a race to submitting the first uh correct block. Uh so there are a gazillion uh cryptocurrency miners who bought NVIDIA cards. Uh it turns out that nowadays you don't actually want a GPU anymore. If you are a single person with a GPU card and it's a nice one, you could do some mining. It's not worth it, but you could do it. But an actual cryptocurrency miner spends some cash and burns a special thing that's called an ASIC. Uh also, um, in the same category, there's the FPGA.

Jim:

Yeah, it's an application-specific integrated circuit. It's just a giant integrated circuit that does logic massively parallel.

Wolf:

Exactly. And you can, for the same amount of money as you might spend on a GPU card, you can uh burn an ASIC that does the same job 500 times faster. So that's 500 times uh better results. So that's what the crypto miners who know what they're doing do.

Jim:

Yeah, in the crypto world, it's whoever whoever gets the answer first uh gets the gets the prize.

Wolf:

Yeah. Uh so I that that seems like a uh what do they call it in s in profit forecasts, a potential weakness.

Jim:

Well, there's return on investment. That that's where you'd get your your ROI if you if you did that.

Wolf:

Well, whatever it is, this is a thing that is going to potentially negatively impact NVIDIA's future. Uh at least at the moment, Nvidia still has all the people training AI models. Um, and that's a big graph problem uh with weighted edges, where when I say weighted edge, they use the word parameter. Um so they're still selling those cards, but how many people are buying those cards? Uh so I don't know. Is Nvidia going to continue to remain super, super profitable? Yeah. I I'm not an Oracle.

Jim:

You know, it's interesting. We we did talk about how the the GPUs are used for uh AI, but it's used for the training of the models, not you know, if you go to Claude and you type in your request, uh as I understand it, that's not going through a GPU. It's the training of the model that that made it all happen to begin with that enables you to go in and type your query.

Wolf:

Exactly. OpenAI, you know, buys fifty billion dollars worth of GPU cards so they can build a model. And then when you type in your prompt, um all you're doing is stepping around inside a graph, traversing uh from node to node on the edges with the greatest weight. Uh exactly like compression or uh autocomplete or prediction or or or whatever. So yeah, uh OpenAI, buying a lot of NVIDIA cards. Uh Wolf and Jim, not buying a lot of NVIDIA cards.

Jim:

I don't I I do not have a discrete video card, a graphics uh GPU card. I've uh the only GPU I have here is on my uh my Apple computers. I've never I'm not a gamer. I've never needed uh uh that kind of hardware for gaming. I mean a game to me is like solitaire, you know, sudoku. I don't I don't need a GPU for that.

Wolf:

I totally hear you. All right, the reality of GPU programming. Um, there are challenges. First of all, um memory transfer overhead. Uh that is the number one problem by far. Um however, if you're like me, there's absolutely a second problem. Um I always encounter this when I'm writing stuff that's gonna run on a thread. I have never found a debugger that helps me debug a thread to the extent I would like. If it crashes on a thread, I figure out a way uh to run it synchronously. Uh it's the same with uh stuff running on your GPU. In the typical GPU case, you have access to the CPU, but the GPU is all the way across the bus. You're not looking directly at it, you're not touching it, debugging it's hard.
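
A minimal sketch of the measurement Wolf is pointing at: timing the copy across the bus separately from the compute, so you can tell whether transfer overhead is eating your speedup. It assumes CuPy on an NVIDIA card; GPU work is asynchronous, so the explicit synchronize calls matter.

    import time
    import numpy as np
    import cupy as cp

    host = np.random.rand(20_000_000)

    t0 = time.perf_counter()
    device = cp.asarray(host)                 # host-to-GPU copy across the bus
    cp.cuda.Stream.null.synchronize()         # wait for the copy to actually finish
    t1 = time.perf_counter()

    total = cp.sqrt(device).sum()             # the actual work
    cp.cuda.Stream.null.synchronize()         # wait for the kernel before reading the clock
    t2 = time.perf_counter()

    print(f"transfer: {t1 - t0:.3f}s   compute: {t2 - t1:.3f}s   result: {float(total):.1f}")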

Jim:

But the but the problems you're solving on a GPU, you don't have conditional logic in a GPU, right? You have basically a bunch of arithmetic instructions. Uh there's there's not branching and all that kind of stuff, as I understand it.

Wolf:

So, um, let me put it this way. There is branching, um, but it's absolutely uh not a very important tool in the toolbox. You don't want to use it. It's possible, but it's not as helpful as you you would wish. But it's hard. No matter what. If you didn't do the right thing, um you've got to figure out how to see what you actually did do. Sure. You know, um yesterday uh I wrote a piece of code uh that uh it happened to be Python, and I was doing a piece of code just to satisfy special type checking machinery that I had enabled, and this was uh something about SQL. So my function took a piece of SQL that you passed in, that's a string, so it said in the parameter list, this is a string, and then it called this function on the string and returned the result. Uh and when I called that one single function inside that was the entire internals of the function I had written, instead of passing the variable sql, I passed the type of the variable, str. So the thing I called was expecting a string, but it got a type. Yep. And you know, I my eyes passed over that bad code twenty times, thirty times, and I never saw it. I I had to get another person to look over my shoulder. They said, What is that? Uh, and we fixed it. And you need uh to consider: by using specific GPU um solutions, are you locking yourself into a specific platform? Have you now made it that you can only use an AMD? You can only use Nvidia? Um and finally, did you actually save money? Would the CPU itself have been faster? Um and like I said in our performance episode, measurement is job one. You need to be measuring all the time. So um that's the challenge. Um what are the economic considerations? It's gonna be friction. It's harder, it's gonna take you longer to write the GPU answer. Are you gonna get that back in compute savings? Um if you intend for your new code to run in the cloud, well, a default, at least today, a default cloud setup, whatever, machine, instance, node, whatever you're gonna call it, doesn't have a GPU. You can get a GPU on basically every service I've looked at, but you're asking for a different thing, and it costs a different amount, usually more. Yeah, quite a bit more. And maintenance. It's a whole new thing. You need knowledge, and you're going to apply that knowledge over and over and over again, because reading and fixing the code is something you do five times, eight times, twenty times more often than you do uh writing it for the very first time. Um it would have been nice to be in a place where I could have showed you some code or done some numbers, um, but we're just audio. We're just an audio podcast. Uh I think I've given you the place to start and told you where to look. So let me reduce it down to some takeaways. Um I've got uh two kinds of recommendations for the takeaways. Um what platform and what language. For the platform, NVIDIA has the richest ecosystem. If you've got a choice, that's where you should start. AMD, it's getting better, it's getting better fast. If you have a PyTorch workflow, AMD might be right for you. Um uh Apple Silicon, the fact that it has unified memory, that's its win. That and its more sophisticated cores. Uh if those two facts are applicable to your problem, uh Apple Silicon might be the right place for you. Um languages. Uh Python is gonna give you the quickest win. Um Rust and C++ are gonna give you maximum control. And JavaScript is gonna be great for uh web applications.
And I want to give some real-time feedback that uh Jim just communicated to me, and that is that thing we were talking about is Jupyter Notebooks.

Jim:

The thing you couldn't remember the name of, somehow it just popped into my head, and I've I've never used it.

Wolf:

I I I use it all the time, it's super awesome. Yeah, Jupyter is really three syllables uh compressed into one word. The J-U part of Jupyter stands for Julia, the PY (or PYT) stands for Python, and the R is for R, the language, the statistics language. So Jupyter's about Julia, Python, and R. But um these days you can get a Jupyter kernel for almost any language you want to work with. I personally have used SQL and Perl and uh several others. Uh Jupyter's pretty awesome. Um anyway, uh that's everything I think I had to say. Jim, um do you have any final questions? Anything else?

Jim:

Well, I um uh you know I'm a I'm a web developer. Uh uh I write back-end business, uh front-end back-end uh business applications. I just don't need a GPU. I find them interesting. I've got a uh I've got a Apple. I I may poke around a little bit with metal just to kind of try to understand it more, but I'm not solving those kinds of problems. But uh it's still interesting stuff.

Wolf:

I um I mentioned the word Bayes earlier in our conversation. Um and Bayes is about calculating probabilities of many possible simultaneous outcomes. Um they've used it for all kinds of things. Uh it occurs to me, I wonder if a GPU would be helpful in solving Bayes kinds of problems. Boy, I'd I'd like to investigate that.

Jim:

Yeah. Yeah. Well, that's your homework.

Wolf:

That is my homework. That's your takeaway. But uh later, uh maybe in an upcoming episode, I will mention the website I have decided to start investigating that is a complete, practical explanation of Bayes for ordinary programmers like us who aren't mathematicians. Um my mathematician at work, Dave, um he has loaned me a book that he promises is going to teach me what I need to know, and I I open it up and look inside and there are lots of math symbols and I get terrified. I don't know if I'm terrified of not understanding it or if I'm terrified of letting him down. Yeah. Maybe both, right?

Jim:

All right, that's it for me. All right. Well, hey, thanks. That was uh that was really interesting, Wolf. Um, thank you so much, uh, listener. We appreciate you guys uh showing up uh and and listening to our podcast. Uh please spread the word. Uh we're trying to get the word out there about us. Uh, you know, every two weeks we're putting out a new episode, and uh, we've got a bunch lined up. I can't wait to tell you about those. Uh if you have feedback, once again, send it to feedback at runtimearguments.fm. Uh you can communicate with us directly. Uh we're going to include our email addresses and our Mastodon uh account information on uh the show notes. So please check that out. So until next time, uh thank you very much. And uh, Wolf, it was great fun, as usual.

Wolf:

Yeah, thanks everybody. Talk to you next time.

Jim:

Bye bye.

Podcasts we love

Check out these other fine podcasts recommended by us, not an algorithm.

CoRecursive: Coding Stories
Adam Gordon Bell - Software Developer

Two's Complement
Ben Rady and Matt Godbolt

Accidental Tech Podcast
Marco Arment, Casey Liss, John Siracusa

Python Bytes
Michael Kennedy and Brian Okken

Talk Python To Me
Michael Kennedy