I think I started programming around the end of 2022. I started with Python. Then a year after that, I started getting into C. And then in the middle of 2024, I discovered CUDA, and that's when I got really into CUDA, and I did my master's thesis in CUDA. After that, in June of last year, I started an internship at NVIDIA, which lasted until December. And then in February, I started as a full-time software engineer on the nvCOMP library team, maintaining that library. And that's basically my path to where I am now.
Conor: Welcome to ADSP: The Podcast, episode 283, recorded on April 23rd, 2026. My name is Conor, and today with my co-host Bryce, we interview a fellow NVIDIAN, Marco Salgado. This will probably be part one of a three- or four-part conversation. In this first part, we primarily talk about Marco's path to NVIDIA: he started with zero programming experience less than three and a half years ago and ended up with a full-time position at NVIDIA.
Bryce: I flew back Shanghai to Hong Kong, Hong Kong to New York, and that is east to west. Ramona flew back Shanghai to Dubai, spent a day doing stuff in Dubai, and then flew Dubai to New York.
Conor: My question is not why she's not with you. My question is why are you picking her up? Because last time I checked, you don't have a car.
Bryce: I mean, this is just what you do, Conor. When your partner returns from a long trip, you go to the airport and meet them with flowers.
Conor: Whoa, whoa, whoa, whoa, whoa, whoa. What? She's a big thing.
Bryce: I'm gonna take the train. She's got bags, and she doesn't like traveling on her own, so you meet her at the airport. What? She's gonna have to go and figure out where to get a cab on her own?
Conor: It's called Uber. You hit a button on your phone and it shows up. You are quite the gentleman. I mean, how long have you guys been dating? You've been dating for... Yeah.
Bryce: I'm actually, for once, banking on immigration taking a long time to process her, like they usually do. All right, so we don't have time. I've gone down a very deep AI rabbit hole, and I think I've leapfrogged... I sent you stuff on Slack. I think I've leapfrogged you on a couple of things.
Conor: All right, we gotta take a step back. This is classic Bryce, though. He jumps into the middle of a conversation that we haven't even started, and he's like, all right, all right, we got it. So, first of all, we have a guest with us. What we're gonna do is introduce our guest, and then we're going to talk about the couple, maybe more than a couple, of topics that you're gonna hear over the next two, three, or four episodes. We're probably gonna record this in two parts, though, because we've got, I think, only 40, maybe 50 minutes, depending on the status of Ramona's progress through immigration. I don't actually know which topic we're gonna cover first in this episode, but we'll introduce Marco in a second. The topics we're gonna talk about, I believe, are Bryce leapfrogging Conor in his AI abilities, with his autocuda and everything else he's been up to, plus maybe some Bryce updates, because he's been all over the world at, I don't know, different conferences and universities, doing lectures and talks. I don't actually know. He'll tell us. Will that be in the first part, the second part, or the third part? We don't know. And the other major topic we're gonna be talking about is a GPU implementation of an algorithm we talked about on, I don't know what episode it was, folks, maybe Marco does: rotate. I think it was 200, 202.
Marco Salgado: There are a couple of them where you mentioned it, in the episodes with Jarrett Albrook. You also talk about it there. On the slides that I presented, I mentioned one of the episodes. That's why I know that.
Conor: So we'll let Marco introduce himself in a second, but the way these ADSP episodes came together is that I've missed the last two Better Code meetings. Better Code is an internal meeting at NVIDIA that happens, I believe, every Wednesday or every other Wednesday, which Jake Hemstad, I think, is in charge of, although he might have delegated running it to a couple of people on his team. Jake is a manager of the CCCL team, which is the CUDA Core Libraries. CUDA Core Compute Libraries.
Bryce: See, that's the beautiful thing about that acronym. Yeah, it changes to whatever you want it to stand for.
Conor: The C has changed from time to time, but that's what the first C currently stands for.
Bryce: Originally it was CUDA Core Compute Libraries. No, I thought originally it was C. C, yes. No, no, no. Originally I told people it was C, but it was really CUDA Core Compute Libraries.
Conor: All right, all right. Well, it doesn't matter. Like I said, we're short on time, so we can't tell all these stories. But I missed the last two. One was on a find_if implementation, a GPU find_if, and the second one, the most recent one, was on rotate. I had actually planned to go to that one, but a story we don't have time to tell today is that my sump pump pipe exploded: that morning, I came downstairs into the basement to see water shooting out of the floor. I was actually at my desk at 5 p.m. Eastern when that meeting was taking place, but the pipe exploding rattled me and I missed the meeting. Then I was like, oh yeah, Better Code was today. I checked at like 10 p.m., saw that the topic was a rotate implementation, and I was like, oh my goodness, I can't believe I missed this. I also saw the recording was two hours, and I was like, holy smokes. It actually wasn't two hours, it was an hour, and then, I'll throw him under the bus, Andrei Alexandrescu didn't log off of the meeting for an additional 60 minutes, so it's just his little icon.
Marco Salgado: The thing is that he had to leave the meeting because he had to take his kid to soccer practice or something. So he told me, "Yeah, I'll just stay in the meeting so that it keeps recording," and then everyone just logs off. So that's basically the reason.
Conor: Was he the one that started the recording?
Marco Salgado: Yeah, exactly.
Conor: Oh, I see, I see. All right, that's fair enough. You gotta pick up your kids, you gotta pick up your kids. You gotta pick up your wife, you gotta pick up your wife. A lot of people you gotta pick up. Anyway, I saw that at 10 p.m., and I can't remember if I watched it immediately or waited until first thing Thursday morning, and then I thought to myself, I did not understand a lot of the stuff that was talked about in this session. And plus, even if I had understood everything, we gotta bring Marco on. So with that all out of the way, we'll throw it over to you, Marco, because I don't think we've crossed paths before this. Introduce yourself, tell us a little bit about your history and your path to NVIDIA, and then we can decide: do we defer GPU rotate to part two and get Bryce's autocuda and travel updates, or do we do rotate now?
Bryce: We're gonna talk about rotate this week, because by next week, so much stuff will have happened.
Conor: You're, what is it? You're Thanos collecting the infinity rings, and you've got like four out of five right now.
Bryce: And... That is correct. I have four out of five.
Conor: All right. So Marco, over to you: introduce yourself, and then we'll talk about GPU rotate.
Marco Salgado: Yeah, so I'm Marco, a fellow listener of the ADSP podcast. I'm 25 years old. I'm originally from Madrid, Spain, but I studied in Germany. I think I started programming around the end of 2022. I started with Python, then a year after that I started getting into C. Then in the middle of 2024 I discovered CUDA, that's when I got really into CUDA, and I did my master's thesis in CUDA. After that, in June of last year, I started an internship at NVIDIA that lasted until December, and then in February I started as a full-time software engineer on the nvCOMP library team, maintaining that library. That's basically my path to where I am now.
Bryce: You may be the youngest guest on our podcast, or the first post-COVID programmer on our podcast.
Conor: Wait, before we get to nvCOMP, I've got a hundred questions. I don't want to say this is a rags-to-riches story, but you started programming three and a half years ago, and now you're working at NVIDIA? That's absurd. First of all, what were you doing before then? And second of all, tell the listeners... well, I guess most of the listeners are gainfully employed, but for the ones that aren't, or that are university students: how does one start programming, rapidly go through Python, C, and CUDA, and get a job at NVIDIA? That's wild.
Marco Salgado: Yeah, that sounds wild to me as well. So I did my bachelor's in mechanical engineering, and I would say my first passion was aerodynamics, so that's where I started. Then I did my master's in aerospace engineering, but I went to a specific university where you could choose a lot of different subjects, because right after I began my master's, that's when I got into programming, and I really liked it. I didn't know if I wanted to stay in aerodynamics or go into programming, and that master's let you do, I don't know, I did courses on quantum computing, on algorithms, on probabilistic and Monte Carlo methods, weird stuff. You could choose between a lot of things. I also got a part-time job as a Python programmer, and that's where I learned a lot. Early on I realized that I really enjoyed it and basically got obsessed with it, and that's how that went. When I got to C, I switched to another part-time job as a C programmer, and I think that also helped me a lot, because I joined a team developing a library whose source code itself they were going to sell, so they took great care that everything was written very well, in a very clean style, and I learned a lot there. Then, when I got into CUDA, I think I was also very naive, and I basically got destroyed at the beginning by the complexity, but I just kept on going, and then came my master's thesis. I also had the good fortune to find a very good supervisor, a professor at Texas State University, who allowed me to do the master's thesis with him, and I learned a lot from that thesis too. And then with the internship, everything also lined up.
My supervisor is someone who is fairly well known in the GPU data compression space, so him giving good feedback on my work also helped a lot, I think, in getting the internship at NVIDIA. And once I was at NVIDIA, I think it was just a matter of showing that I was good enough, and I guess that's how I got in. Doing an internship seems like a very good way of getting in, because NVIDIA seems to be a company that either only hires interns or only hires senior people. There's a gap in between where there aren't really options for new graduates. So I think an internship is a very good way for someone young to get into NVIDIA.
Conor: I think a lot of companies are like that. They would deny it, but it's just that you're taking a risk. It's a more certain thing if an intern has proven themselves, or a senior person has proven themselves. I do know that new grads get hired, but it's just much more difficult. The company I got hired at had a massive funnel of co-op students and interns.
Bryce: And this may be changing, because when I was in China, I met with one of the most AI-pilled teams at NVIDIA, and the manager of the team was saying, "I love interns. I only hire interns and new college grads. I have a bunch of senior roles that I can't fill, because the senior people are not sufficiently AI-pilled, so I don't want to hire a senior person when I can just keep hiring more junior people." And that does kind of make sense to me. The other good way to get hired at NVIDIA is to be a student at the University of Waterloo, do crazy, cool AI stuff, and then tweet about it. Three or four of these students have come across my desk recently. I don't know what's going on with the current class at the University of Waterloo, but they're all really, really sharp. I mean, that's always been the case; the University of Waterloo is a great school.
Marco Salgado: Did the guy who implemented TurboQuant get a position at NVIDIA, or what?
Bryce: Can't comment. Can't comment on that. Yeah, yeah, and I are chatting.
Conor: So wait, a couple more questions, because I do find this very curious. You were based in Madrid, then you studied in Germany, but you mentioned your thesis advisor for your master's was at Texas State, or A&M? I know there's a bunch of them, so I'm not sure I got that right. Did I get that right? Texas State?
Marco Salgado: It's Texas State University. It's a different one from A&M.
Conor: Okay, yeah, I know, because there's like three of them; there's also UT Austin, and I always get them confused. So it's just Texas State. Were you still in Germany at the time, or did you come to the US? How did that work?
Marco Salgado: No, we did it remotely. When I did my master's thesis, I went back to Madrid, because it was winter, and winter in Germany is pretty miserable. So I took the chance and went back to Madrid, and I've been in Madrid ever since. This was the end of 2024.
Conor: So were you doing your master's remotely from Texas State, or did the German university have some kind of partnership?
Marco Salgado: No, it's not necessarily a partnership, but in my experience, German universities are very flexible about how you do things. I found someone at the German university who allowed me to have an external supervisor, and I did my thesis remotely from Madrid with my supervisor in Texas. We had one Zoom call every week, we emailed, and then the professor in Germany basically just had to sign off on the thesis. "You're telling me that Germans are known for their flexibility?" I know, I know. It doesn't sound right, but in my experience with university, they are pretty flexible. If you want to do your bachelor's in seven years, you can do it in seven years. It's your choice. You can choose how many courses you take every year. During your first year, you can take courses from the third year if you want. You can do basically everything as you wish; it's your own problem.
Bryce: And I'm just gonna say, please, no angry letters from the Germans.
Conor: I guess last question, unless Bryce has more questions about the education, because I find this very fascinating, and I think it's very valuable, or maybe Europeans are already aware of this kind of flexibility. When you were searching for your thesis advisor, did you already have CUDA in mind, and potential jobs in that space? Did you reach out to that thesis advisor specifically because of that, or did you reach out for other reasons and then that path just led to CUDA and NVIDIA?
Marco Salgado: No, so I started with CUDA around the beginning or middle of 2024 in a university project, and after getting into that, I knew that I wanted to do my master's thesis in it. I think doing a master's thesis in CUDA is quite difficult; it's hard to find topics for it and research groups that specialize in it. So I started looking all over Europe to see what options there were, and then I came across some papers from him. I think I had already come across a paper of his in the past, so I contacted him. His name is Martin Burtscher. I sent him an email, and he was kind enough to give me an opportunity, and that's basically how it happened.
Conor: That's amazing. I don't know if this exists in North America, but what a life hack. I'm not sure if you had a mentor who told you this was possible, or if you just stumbled onto it, or maybe it's just a common thing, but that is wildly amazing. You were just focused: you knew where you wanted to go, you didn't find any advisors, so you basically searched the globe, found folks doing research in this area, and got one of them to agree to be your advisor. Note to listeners: probably one of the most common questions we get is how do you get a job at NVIDIA or some other big tech company? And one of the things we always say is, well, what do you want to do? It sounds like you answered that question very quickly, or at least at some point, and then went and found someone to do your master's with. I'm very fascinated by that story, and I imagine a few people listening right now are thinking, huh, I'm in my fourth year of university, what an interesting idea; maybe they're gonna go do the same thing. All right, any follow-up questions from Bryce? Actually, I do have one more question. You did a whole undergraduate degree in mechanical engineering, was that the right engineering? And they didn't have a single programming assignment or programming course or anything like that?
Marco Salgado: They had a programming course in Java. I think it was in second or third year. I remember it was very basic; even in the exam, you had to write code by hand.
Conor: Oh, and that was a mandatory course. So you did take one course, but it was in Java, and that did not make you fall in love with it.
Marco Salgado: Nah, the thing is, I was already in love with aerodynamics, I guess. So it was hard for me to dedicate a lot of time to it, because I was very focused on aerodynamics.
Conor: Right, I see. So it wasn't until later, when you came to programming a second time, that you fell in love with it.
Marco Salgado: Yeah, exactly.
Conor: Interesting. All right, any questions from Bryce, and then we can pivot to GP... GPT. That's how AI-pilled I am, folks. I can't even say GPU correctly.
Bryce: Um, no, no, no. Let's talk about GPU rotate.
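For readers new to rotate: std::rotate(first, middle, last) reorders a range so that middle becomes the new first element, so rotating {1, 2, 3, 4, 5, 6, 7} at position 3 gives {4, 5, 6, 7, 1, 2, 3}. The C++ below is a minimal sequential sketch of two textbook formulations, not Marco's GPU implementation; his two implementations and the criterion he uses to choose between them are not described in this part of the conversation. The first is an out-of-place gather in which each output element depends only on its own index, which is the property that would let a GPU assign one thread per element. The second is the classic in-place three-reversal algorithm. Both function names, rotate_copy_gather and rotate_three_reversals, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: out-of-place rotate written as a gather.
// out[i] depends only on i, so on a GPU each output element could be
// computed by an independent thread; here it is a plain sequential loop.
template <typename T>
std::vector<T> rotate_copy_gather(const std::vector<T>& in, std::size_t mid) {
    const std::size_t n = in.size();
    assert(mid <= n);
    std::vector<T> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // The element that lands at position i starts mid positions later,
        // wrapping around to the front of the input.
        out[i] = in[(i + mid) % n];
    }
    return out;
}

// Hypothetical helper: in-place rotate via three reversals.
// Writing the range as A B (A = [0, mid), B = [mid, n)), reversing A,
// then B, then the whole range turns A B into B A.
template <typename T>
void rotate_three_reversals(std::vector<T>& v, std::size_t mid) {
    assert(mid <= v.size());
    const auto split = v.begin() + static_cast<std::ptrdiff_t>(mid);
    std::reverse(v.begin(), split);
    std::reverse(split, v.end());
    std::reverse(v.begin(), v.end());
}
```

Both functions should agree with std::rotate for any mid between 0 and the range length. The gather version pays for an extra n-element buffer in exchange for complete independence between elements, while the reversal version works in place with at most n swaps but runs as three phases that each depend on the previous one finishing.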
Conor: All right. We'll throw it over to Marco. I thought the presentation was great.
Bryce: There was... I mean, I'll let you explain the two different... I should... I have not seen this presentation, because I was in China, so I have very limited background on this. And neither do the listeners.
Conor: I mean, I guess you'll have to submit some evolved version of this talk once GPU rotate gets into Thrust or CUB, or wherever we're gonna put it in CCCL, and give it at a conference, because we probably can't make that internal talk public. Anyways, you present two different kinds of implementations, chosen based on a criterion. But the thing that, you know, this might end up in part two of this conversation, a week from now, is all the profiling. You were doing a ton of stuff with, I believe, NCU to measure this metric and that metric, and that stuff is just... I use NSIS, not all the time, but pretty frequently.
Bryce: It's NICU, not NCU.
Conor: Is it NICU?
Marco Salgado: I've never heard it said that way in my life. I always say NCU.
Bryce: Yeah, because NICU... NICU. I mean, do you say N-S-Y-S?
Conor: No, because that has a good pronunciation. NSIS sounds good. NICU stands for the neonatal intensive care unit, or something. It's where the babies that need help in the hospital go.
Bryce: Well, you know, the kernels need help. Let me explain, because I've been saying this a lot in our tutorials. NVIDIA has two different profilers for the compute side of things. Nsight Systems gives you an eagle-eyed view of everything that's going on in your program. It's a sampling profiler, so it looks at what's happening at every stage of your program and then builds up, just like in an audio editor, a bunch of different tracks showing what every process and thread is doing at any given point in time. It's great for identifying I/O and memory-movement bottlenecks, concurrency issues, and so on, and for looking at overall utilization, and it has relatively low overhead. Nsight Compute is a kernel-level profiler. It's a tool for profiling a single kernel or operation, or multiple different kernels, and it can collect a bunch of different hardware performance metrics: for example, L2 cache information like L2 cache misses, or how much of every L2 sector that's loaded in is actually utilized, and also things like how many operations are issued per cycle. Now, collecting all these hardware performance metrics is challenging, because you can only collect a few of the underlying counters at a time. So Nsight Compute will run your kernels multiple times; it may even run your whole program multiple times, up to 40 or 50 times. So it has a lot of overhead, but it gives you very detailed insight into the performance of a particular kernel or algorithm in your program.
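To make the idea of "how much of every L2 sector that's loaded in is actually utilized" concrete, here is a simplified model in C++. It assumes the 32-byte sector granularity NVIDIA GPU caches use, a single warp of 32 threads issuing one load, and that each thread reads its own distinct, non-overlapping element. It illustrates the arithmetic only; it is not how Nsight Compute computes its metrics. The helper names warp_sector_stats and strided_warp_addresses are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumption: data moves between caches in 32-byte sectors.
constexpr std::uint64_t kSectorBytes = 32;
constexpr std::uint64_t kFloatBytes = 4;   // size of one float element
constexpr std::uint64_t kWarpSize = 32;    // threads per warp

struct SectorStats {
    std::size_t sectors_touched;  // distinct 32-byte sectors the load touches
    double utilization;           // bytes requested / bytes actually moved
};

// Hypothetical helper: thread t reads elem_bytes bytes starting at addrs[t].
// Assumes threads read distinct, non-overlapping bytes (no broadcast).
SectorStats warp_sector_stats(const std::vector<std::uint64_t>& addrs,
                              std::uint64_t elem_bytes) {
    assert(elem_bytes > 0);
    std::set<std::uint64_t> sectors;
    for (std::uint64_t a : addrs) {
        // Every sector overlapped by [a, a + elem_bytes) must be fetched.
        for (std::uint64_t s = a / kSectorBytes;
             s <= (a + elem_bytes - 1) / kSectorBytes; ++s) {
            sectors.insert(s);
        }
    }
    const double requested = static_cast<double>(addrs.size() * elem_bytes);
    const double moved = static_cast<double>(sectors.size() * kSectorBytes);
    return {sectors.size(), moved > 0 ? requested / moved : 0.0};
}

// Hypothetical helper: byte addresses for a warp reading floats with a
// given stride, measured in elements (stride 1 = fully coalesced).
std::vector<std::uint64_t> strided_warp_addresses(std::uint64_t stride_elems) {
    std::vector<std::uint64_t> addrs;
    for (std::uint64_t t = 0; t < kWarpSize; ++t) {
        addrs.push_back(t * stride_elems * kFloatBytes);
    }
    return addrs;
}
```

Under these assumptions, a fully coalesced warp reading consecutive floats touches 4 sectors at 100% utilization, a stride of 2 elements touches 8 sectors at 50%, and a stride of 8 elements (32 bytes) puts every thread in its own sector, 32 sectors at 12.5%. That kind of drop is what sector-level metrics make visible when a kernel's memory access pattern is wasting bandwidth.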
Conor: So yes, you used a lot of these metrics. To be honest, how often, Bryce, do you use NICU?
Bryce: Pretty often.
Conor: Pretty often? I think I've used it once, and I failed to glean any information from it, because it was too confusing.
Bryce: You should come to one of my classes, Conor. I'll teach you. We have a great...
Conor: I've been at your classes. You never talk about NICU.
Bryce: Oh no, you haven't been to our classes recently, not in the last year.
Conor: That's not true. I was in Norway.
Bryce: You were, yes, and we talked extensively about Nsight Compute and Nsight Systems.
Conor: I think you talked about NSIS briefly. But anyways, I'm not an expert, so we get to chat about the implementations and the things you were measuring, and hopefully this will be a fun conversation, because we obviously love rotate, but also an informative one in terms of how you go from nothing to designing a GPU algorithm and the things you have to consider. Do you start with NSIS, or do you go straight to NICU? It sounds so weird. But anyways, over to you, Marco, for a mini re-presentation of the Better Code GPU rotate talk that you gave a week ago.

Be sure to check these show notes, either in your podcast app or at adspthepodcast.com, for links to anything we mentioned in today's episode, as well as a link to a GitHub discussion where you can leave thoughts, comments, and questions. Thanks for listening. We hope you enjoyed it, and have a great day.
Bryce: Low quality, high quality. That is the tagline of our podcast.
Conor: That's not the tagline. Our tagline is chaos with sprinkles of information.