What's Up with Tech?

CUDA Everywhere: How Spectral Compute is Democratizing GPU Programming

Evan Kirstel

Interested in being a guest? Email us at admin@evankirstel.com

What if you could run CUDA code on AMD hardware without changing a single line of code? This groundbreaking possibility is now reality thanks to Spectral Compute's Scale compiler, which we explore in depth with founder Chris in this eye-opening conversation.

Born from frustration with cross-platform GPU programming limitations, Scale represents seven years of dedicated compiler development that allows developers to simply recompile their existing CUDA applications for AMD hardware. "Keep your code exactly how it is. Just recompile it," explains Chris, highlighting how Scale eliminates the painful process of rewriting code with tools like HIP that never quite deliver perfect compatibility.

The technical achievement is remarkable – bridging fundamental architectural differences that neither NVIDIA nor AMD has an incentive to address itself. While NVIDIA benefits from CUDA's effective monopoly and AMD pushes its HIP alternative, Spectral Compute has created the missing technology that allows developers to treat GPUs more like CPUs, where code works across vendors without compromise.

Scale currently shines with traditional high-performance computing workloads like fluid dynamics, sometimes outperforming AMD's native solutions. The team is actively enhancing AI workload support, particularly around matrix multiplication operations, with full PyTorch compatibility targeted for early next year. They're also expanding hardware support across AMD's lineup, even supporting GPUs that AMD themselves have abandoned.

With a thoughtful business model that offers Scale free for consumer cards and academic use (charging only for data center deployments), Spectral Compute has created an accessible path away from vendor lock-in without sacrificing performance or compatibility. For organizations seeking more hardware options without the pain of code rewrites, Scale represents a compelling alternative that could transform the GPU computing landscape.

Support the show

More at https://linktr.ee/EvanKirstel

Speaker 1:

Hey everybody, fascinating chat today as we dive into how to get AI running cheaper, faster and across more hardware platforms with Spectral Compute. Chris, how are you?

Speaker 2:

I'm good. Slightly melting in my very hot office over here.

Speaker 1:

Well, we'll try not to melt together. But before that, maybe introduce yourself, your really interesting background, and what's the backstory behind Spectral Compute. What got the company started?

Speaker 2:

Well, it all started back in 2017. We were trying to write a library for cross-platform GPU programming because we weren't happy with the experience of writing GPU code at that time, and this was sort of in the days before things like Thrust came along. So we built a library of very nice GPU programming tools. It worked great on NVIDIA. I wanted to just run it on AMD hardware as well, so we tried to set up AMD's tools for doing that, and it was, well, it was a mess. And we got so fed up with that that we decided to just pivot and focus on solving that problem. And then, a few years later, here we are with a CUDA compiler for AMD, approximately.

Speaker 1:

Wow. Well, you certainly hit the timing right. And you promised something called "faster or free" for performance gains. That's quite a promise. How does that work exactly?

Speaker 2:

That's actually for our consulting business, which is separate from the CUDA-on-AMD compiler that we're talking about. We'll have to talk about that as well, of course. It's interesting how that connects to our backstory, because CUDA on AMD is somewhat challenging to get people to believe in. There's this prevailing belief that CUDA is inherently optimized for NVIDIA hardware. So when you tell people you're going to run unmodified CUDA code on AMD hardware, everyone says no, you aren't, it's going to be horrible and slow.

Speaker 2:

So the way we ran the company for the first few years was we did consulting to fund the business and then poured that into developing Scale, the product we're now launching, the CUDA-on-AMD compiler. The "faster or free" line on our not wonderfully developed website refers to a guarantee we gave to our consulting clients, because the consulting business was performance optimization as a service for people like high-frequency traders, AI companies, that kind of thing: just getting in there and using our skills to make that stuff go faster.

Speaker 1:

Got it. And let's talk about Scale. Can you maybe explain what Scale is, what it does, and why it's such a game changer for CUDA developers?

Speaker 2:

Right. So today, if you have a CUDA program and you want to run it on AMD, you have to use HIPIFY and HIP and the ROCm stack. So you have to basically rewrite your code in a different language which is incompatible with CUDA, and then you can try it out on AMD hardware. That's quite a large barrier to entry. To begin with, having to rewrite your code is a big deal. Although AMD would like you to believe that tools like HIPIFY largely automate that, I can tell you, as someone who has used HIPIFY quite a lot, that this doesn't really work. There's a bunch of reasons why, but basically the language is subtly different from the language that NVIDIA's compiler accepts, and the APIs behave differently.

Speaker 2:

So what we do with Scale is we say screw all of that: keep all your code exactly how it is, you just recompile it. We provide the exact same commands that you'd use to build your program for NVIDIA, but they now build an AMD program. So you just install Scale, turn it on, and the commands that previously built for NVIDIA now create an AMD binary for you. So you can very easily try it out. And by lowering that bar, we think it makes it a lot easier to make the jump, right, because now the barrier to entry is just trying it out, rather than having to do all this R&D and then maybe you'll like AMD hardware. That's crazy, right?
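To make the "just recompile it" idea concrete, here is a minimal, self-contained CUDA program of the sort being described. Nothing in it is Scale-specific: the comments mark the renames a HIPIFY-style port would introduce, whereas, per Chris, with Scale the file stays exactly as written and is simply rebuilt with the same nvcc-style command to produce an AMD binary (the exact install and invocation details aren't covered in this conversation).

```cuda
// A typical small CUDA program. A HIP port rewrites the API calls (HIP spellings
// shown in comments), producing a second code base to maintain; the approach
// described here is to leave the source untouched and only recompile it.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale_by_two(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));             // HIP: hipMalloc
    cudaMemset(d_data, 0, n * sizeof(float));           // HIP: hipMemset
    scale_by_two<<<(n + 255) / 256, 256>>>(d_data, n);  // launch syntax survives, but subtler constructs often differ
    cudaDeviceSynchronize();                            // HIP: hipDeviceSynchronize
    cudaFree(d_data);                                   // HIP: hipFree
    printf("done\n");
    return 0;
}
```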

Speaker 1:

That's kind of the idea. Got it. Very simple pitch. And building Scale took, I guess, many years. Seven years. That's quite a journey. You guys were drinking a lot of coffee along the way, I imagine.

Speaker 2:

Yes, and hiding from a lot of pandemics and a few other bits and pieces. We weren't working full-time on Scale throughout that period. Like I said before, we took a bunch of detours doing consulting work to keep the lights on. But yeah, it's a big job. The basic approach we took was to get enough working to be convincing, because no one's going to believe us if we just have hello world working. So our approach has been to get some real-world, big programs running. Like, we can run fluid dynamics software that NVIDIA have published on Scale with no code changes. We can run GROMACS, these really big, real programs, getting them working out of the box with no code changes at all. That's enough to actually convince people, and VCs. So that's kind of the idea.

Speaker 1:

Very cool. And how does Scale compare to other CUDA translation tools that are on the market?

Speaker 2:

Well, the key point is that it's not a translation tool; that's the key difference. We believe that source translation is fundamentally a problem, because when you do source translation, you now have two copies of your source code. You've got your original CUDA and you've got whatever the new thing is, be it Mojo or Triton or HIP or whatever, or SYCL; there's lots of them. And these other tools, these other languages, they might have advantages, but it's a different code base. So now you have two. If you want to stay in NVIDIA's world uncompromisingly, you want to keep the CUDA, but you also want to keep the other thing that you've chosen, because you want AMD. So now you're maintaining two copies of your program, and that's bad for everyone. You see lots of examples of projects that do this.

Speaker 2:

One of the two versions gets more love, right, more attention, more developer time. The two drift apart: you have bugs in one but not the other, one gets faster than the other. It's just a mess. So it's way better to have a unified code base, and you can still, if you want, have a few special bells and whistles that you optimize for one particular vendor.

Speaker 2:

But it's a good analogy to think about what it's like to write code for CPUs. When you write Rust or C or whatever for a CPU, you don't care if it's an Intel or an Arm or whatever CPU. The compiler takes care of it for you, and basically you can do the same with GPUs. It's a lot harder on the compiler side, but basically no one's done that compiler R&D, because GPU vendors don't have an incentive to, but we do. The GPU vendors, why would NVIDIA or AMD spend time developing compiler tech that bridges the gap between hardware differences? That doesn't help the bottom line, right? They want to sell you hardware. So that's what we're doing, in a nutshell: we're developing that missing technology to bridge the gap, in the same way the gap has already been bridged on CPUs.

Speaker 1:

Got it. Makes sense. You also built a GPU-accelerated regex engine. How does that work? How does that fit into the bigger picture?

Speaker 2:

That sort of fell out of some of the earlier stuff and consulting work. Like I mentioned way, way back, before Scale was really a thing, we were trying to build accelerated libraries for GPUs, and the regex engine was one of the things we built. At that point we had a client in the high-frequency trading industry who was interested in doing regular expressions very efficiently, so we developed that for them. Basically, it's on its way to becoming open source.

Speaker 2:

In the next year or so we plan to open source a lot of our earlier projects, which is basically a lot of CUDA code that we wrote in those early years, because we still think some of those old libraries have value even today. Some of the techniques we developed are quite useful. As for how the regex engine actually works, the basic idea is that it pre-compiles an optimized GPU kernel for the input regex at compile time, and it ends up basically emitting a fully branchless code sequence for the regex. It outperforms every other regex engine on the market. But yeah, we're going to be publishing it soon-ish, open source, so people can take it apart and see all the horrendous things we did. I mean, it's just regex compilation in the C++ type system. So it's a bit insane, but it works. It's fast, so who cares?
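To give a flavour of what a branchless regex match can look like on a GPU, here is a hypothetical, heavily simplified sketch; it is not Spectral Compute's engine, which generates specialised code per regex at compile time. This version hard-codes a tiny DFA table for the pattern "ab*c" and has each thread step through its string with table lookups instead of data-dependent branches.

```cuda
// Illustration only: each thread matches one fixed-length string against "ab*c"
// using a precomputed DFA table, so the per-character step is a table lookup
// rather than a branch. A real engine would generate code like this per regex.
#include <cstdio>
#include <cuda_runtime.h>

__constant__ unsigned char d_dfa[4 * 4];        // [state][char class] -> next state

__device__ int char_class(char c) {
    // The classes the DFA distinguishes: 'a', 'b', 'c', anything else.
    return (c == 'a') ? 0 : (c == 'b') ? 1 : (c == 'c') ? 2 : 3;
}

__global__ void match_ab_star_c(const char* strings, int len, int n, int* accepted) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    const char* s = strings + i * len;
    unsigned char state = 0;                               // start state
    for (int k = 0; k < len; ++k)
        state = d_dfa[state * 4 + char_class(s[k])];       // branch-free transition
    accepted[i] = (state == 2);                            // state 2 accepts
}

int main() {
    // DFA for "ab*c": 0 --a--> 1, 1 --b--> 1, 1 --c--> 2 (accept), anything else -> dead state 3.
    unsigned char dfa[16] = {1,3,3,3,  3,1,2,3,  3,3,3,3,  3,3,3,3};
    cudaMemcpyToSymbol(d_dfa, dfa, sizeof(dfa));

    const int len = 4, n = 2;
    const char h[n * len] = {'a','b','b','c',  'a','c','c','c'};   // "abbc" matches, "accc" doesn't
    char* d_strings; int* d_out; int h_out[n];
    cudaMalloc(&d_strings, n * len);
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_strings, h, n * len, cudaMemcpyHostToDevice);
    match_ab_star_c<<<1, 32>>>(d_strings, len, n, d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("abbc -> %d, accc -> %d\n", h_out[0], h_out[1]);
    return 0;
}
```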

Speaker 1:

Interesting. But Scale isn't open source. Why take that route? You know, a lot of the AI ecosystem is going open source; AMD is doing good work there. What's your philosophy here?

Speaker 2:

Well, the philosophy is I'd like to keep paying rent, please. Basically, obviously I'd love to make it open source; ethically I'd love that, and of course it would make life easier because people could submit fixes. But the reality is we have to pay the bills somehow. So the compromise we've got is that we publish it for free for use on consumer cards and for academic use. We already have agreements with several universities and such.

Speaker 2:

If you are a university or research institution that wants to do non-commercial academic or teaching use, then just contact us. If you want to use it on consumer cards, it's free; you can just go and download it off the website. So we only charge people that want to use it on data center cards, like the MI series, basically the cards that cost more than $20,000. That's where we charge a fee. And even then, because AMD cards are so much cheaper to rent or buy than NVIDIA cards, our fee doesn't really make much difference to your costs; you come out very far ahead. But that allows us to fund the project, basically, because ultimately we don't want to be spending our time on consulting when we could spend it on developing Scale, right?

Speaker 1:

Yeah, and it seems a great proposition. Why aren't NVIDIA or AMD talking about it more?

Speaker 2:

Well, we are talking to those companies, of course, but I'm not sure how much I'm allowed to say about our relationships with them. So, AMD, I think they like it a lot. I think NVIDIA likes CUDA and would like you to keep using CUDA. I would imagine if you asked them, they wouldn't be super-duper thrilled about CUDA becoming cross-platform, because NVIDIA gets a huge advantage out of the effective monopoly, right? CUDA is a really good tool. It's really good to use. Developers love it. There's lots of code written in it. So NVIDIA get a huge advantage from the fact it only works on their hardware. I'm sure they would be happier if they retained that indefinitely.

Speaker 2:

And AMD, they've been investing very heavily in HIP, which, for those unfamiliar, is basically like CUDA with a funny hat on. It's AMD's answer to CUDA. It's meant to look like CUDA, but it's just different enough from CUDA to not actually be compatible with it, yet just similar enough that it's familiar to CUDA users. But it has all these subtle problems that make it a pain in the ass to actually work with. So AMD seems to be very much pro-HIP, and they want everyone to rewrite their code in HIP.

Speaker 2:

But if you look at what's happening, there's a reason people don't do this. You know, there's performance issues, there's correctness issues, and ultimately people don't want to compromise, because once you migrate to HIP, NVIDIA is now a second-class citizen. Why can't we have both vendors be first-class citizens? And that's what we did. So right now, our NVIDIA support is NVIDIA's compiler, right? There's no compromise. If you want to build for NVIDIA, you just turn off Scale and away you go. We are working on compiler tech that we think can beat NVIDIA at their own game and provide performance advantages even on NVIDIA, but until we've got there, we're not going to force you to go through our compiler when it's not meaningfully superior. We just want the best for users.

Speaker 1:

Well, it sounds very rational and practical. I love it. So are most of your customers focused on AI workloads, or are they trying to clean up legacy systems a bit, or both?

Speaker 2:

So at the moment, most of our customers are in non-AI high-performance compute industries, like fluid dynamics and that kind of stuff, and the reason for that is basically that the area where Scale has the most holes is AI. That basically all comes down to the matrix-multiply hardware in the chips. The AMD and NVIDIA matrix-multiply hardware, what NVIDIA calls the tensor cores, have similar features, but the way they work is very different, and we're a few months into doing the compiler R&D to basically map one onto the other. Until we do that, those operations perform very badly.
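For context on what that matrix-multiply hardware looks like from the CUDA side, here is a minimal sketch using NVIDIA's public wmma API (the standard CUDA interface to the tensor cores): one warp multiplies a 16x16 half-precision tile, accumulating in float. Mapping operations like this onto AMD's differently organised matrix units is the compiler R&D being described; the snippet is plain CUDA and contains nothing Scale-specific.

```cuda
// One warp performs a single 16x16x16 tensor-core matrix-multiply-accumulate
// via the wmma API (requires a tensor-core-capable GPU, sm_70 or newer).
#include <mma.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
using namespace nvcuda;

__global__ void tile_mma_16x16(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);               // zero the accumulator
    wmma::load_matrix_sync(a_frag, a, 16);           // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // the tensor-core op itself
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

int main() {
    half *a; half *b; float *c;
    cudaMalloc(&a, 16 * 16 * sizeof(half));
    cudaMalloc(&b, 16 * 16 * sizeof(half));
    cudaMalloc(&c, 16 * 16 * sizeof(float));
    tile_mma_16x16<<<1, 32>>>(a, b, c);              // wmma needs a full warp (32 threads)
    cudaDeviceSynchronize();
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```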

Speaker 2:

So although lots of AI projects do work, the performance of a lot of them is currently not that great, until we fix that, which we are close to. The good news is lots of old-school science applications and CFD, and not even just old school, lots of new school too: everything that isn't AI basically isn't using those features, and our performance there is great. We beat AMD's performance in lots of areas; our kernel launches are way faster than theirs, for example. There's lots of benchmarks we win on, so those people get a lot of advantage out of this. So it's those sorts of applications at the moment, mostly in production.

Speaker 1:

Interesting. And do you plan to support more hardware beyond AMD and others in the future? What's next on the horizon?

Speaker 2:

That is the vision. I don't think I'm allowed to tell you what's next on the horizon in terms of new vendors beyond AMD. We are already expanding our support within AMD's product line. There's actually a couple of GPUs we support now that AMD have dropped support for in ROCm. Our strategy is to keep supporting AMD hardware for much longer. A point of friction we've seen in the AMD world is that ROCm keeps dropping support for GPUs that aren't actually that old, while NVIDIA is supporting seven, eight-year-old GPUs. So we're going to be doing that for AMD hardware: keeping old things around for longer and continuing to apply optimizations to the entire product line. As for beyond AMD, we're working on a few things, but I don't think I'm allowed to say just yet. Aren't NDAs irritating? Sorry.

Speaker 1:

Very true. Care to share any customer anecdotes? I saw something on the website about Playground AI. You know, pretty great results.

Speaker 2:

Yeah, those guys, again, were consulting clients. What we did for them is we basically jumped in; they had a sort of PyTorch generative AI application, and we made it less slow, significantly less slow. It was burning lots of money in AWS, so we made it use less compute time, which translates into lots and lots of money saved. So good stuff, but again, not Scale. What I'm hearing is that we should probably revamp our website to be clearer about our current activities.

Speaker 1:

Yeah, that would be great, but it's also great to get an update here. What are you up to in the fall? I can't believe it's almost September. Any hackathons or meetups or events that you're planning?

Speaker 2:

Yeah, so in November we'll be exhibiting at SC25 in the US, so come and find us next to Dell with our large purple stand. We'll be there talking about Scale. We'll also have some cool demos and things lined up. Actually, it's mildly amusing: we are hoping to have a demo on the Steam Deck.

Speaker 1:

Going for that just for novelty value, why not?

Speaker 2:

I mean, the Steam Deck has an AMD GPU in it, and we'll be running CUDA on it.

Speaker 1:

I have one. It's a good little machine, yeah.

Speaker 2:

I'm not sure there's any commercial value in it, but it's cool, and that's what counts for demos. Other than that, we'll be doing a major release of Scale in the next couple of months, because it's been a while since we did one; the last one was in May. That should hopefully include a major enhancement of the matrix-multiply stuff I mentioned, so the AI stuff should start to work a lot better. We are targeting having full PyTorch support by the start of next year, but we'll see; that's an ambitious goal, because we're building the software platform rather than porting projects. We can talk about what projects work, but we're not doing work per project, if that makes sense. We aren't sitting here rewriting PyTorch; we're building the functions of CUDA that it uses, so that its old code just works, unmodified. That takes longer to get going, but once we get there, we're done forever, if that makes sense.

Speaker 1:

Oh, it does. So you're a small team swimming in a sea of giants, tech giants. What are your plans to scale, to grow, or what are your ambitions over the next year or two?

Speaker 2:

Well, we are hiring. Since we did the initial release of Scale a year ago, I think we've quadrupled the team, so we're going all in on Scale. We've mostly stopped consulting. We're focusing on getting Scale complete, getting Scale performant, and trying to really make that go, because there's quite a lot of demand for CUDA on AMD, and a lot of people are, I think, burning a lot of money on things like HIP ports when they don't need to. So we think we have a really valuable thing. As you say, the timing is quite good.

Speaker 1:

Well, there are tons of great developers out there. Where are you guys based? I assume you're pretty virtual these days.

Speaker 2:

Yeah, we're fully remote. I mean, if you're trying to hire people that have compiler and GPU experience, there aren't that many, so you can't really be picky about which country they're in. We've got two of us from the Netherlands, there's a bunch of us from Cambridge, there's a few in Edinburgh, got some Danish people; it's just scattered around, a real mix. Mostly Europe, mostly Western Europe, but not entirely. But yeah, everything's fully remote. We meet up a couple of times a year in person to prove we actually exist and aren't just an LLM.

Speaker 1:

Great. Well, thanks for joining. Really a great team, great mission, and love the approach. Good luck, onwards and upwards. Thank you. Thanks for joining. Bye-bye. Thanks for listening, watching, sharing, everyone. Take care. Thanks, Chris. Toodles. Bye-bye.