Cables2Clouds

Network engineers already understand Kubernetes better than they think.

Cables2Clouds Episode 62

The invisible threads connecting Kubernetes and networking infrastructure form the backbone of today's cloud-native world. In this revealing conversation with Marino Wijay from Kong, we unravel the complex relationship between traditional networking concepts and modern container orchestration.

Marino brings a unique perspective as someone who entered the Kubernetes ecosystem through networking, explaining how fundamental networking principles directly translate to Kubernetes operations. "If you don't have a network, there is no Kubernetes," he emphasizes, highlighting how reachability between nodes forms the foundation of cluster communication.

The network evolution within Kubernetes proves fascinating – from the early "black box" approach where connectivity was implicit to the sophisticated Container Network Interfaces (CNIs) like Cilium that offer granular control. Network engineers approaching Kubernetes for the first time might feel overwhelmed, but as we discover, concepts like DHCP with DNS registration, NAT, and load balancing all have direct parallels within the Kubernetes networking model.

Our discussion ventures into the practical challenges organizations face when implementing service mesh technologies. While offering powerful capabilities for secure pod-to-pod communication through mutual TLS, service mesh introduces significant complexity. Marino shares insights on when this investment makes sense for enterprises versus smaller organizations with more controlled environments.

The conversation takes an especially interesting turn when exploring how AI workloads are transforming Kubernetes networking requirements. From GPU-enabled clusters to specialized traffic patterns and the concept of Dynamic Resource Allocation as "QoS for AI," we examine how these resource-intensive applications are pushing the boundaries of what's possible.

Whether you're a network engineer curious about containers or a Kubernetes administrator looking to deepen your networking knowledge, this episode bridges crucial gaps between these interconnected worlds. Subscribe to Cables to Clouds for more insights at the intersection of networking and cloud technologies!

https://www.linkedin.com/in/mwijay/

Purchase Chris and Tim's new book on AWS Cloud Networking: https://www.amazon.com/Certified-Advanced-Networking-Certification-certification/dp/1835080839/

Check out the Fortnightly Cloud Networking News
https://docs.google.com/document/d/1fkBWCGwXDUX9OfZ9_MvSVup8tJJzJeqrauaE6VPT2b0/

Visit our website and subscribe: https://www.cables2clouds.com/
Follow us on BlueSky: https://bsky.app/profile/cables2clouds.com
Follow us on YouTube: https://www.youtube.com/@cables2clouds/
Follow us on TikTok: https://www.tiktok.com/@cables2clouds
Merch Store: https://store.cables2clouds.com/
Join the Discord Study group: https://artofneteng.com/iaatj

Tim:

Hello and welcome to another episode of the Cables to Clouds podcast. As usual, I am Tim. I'll be your host this week, and with me, as always, is the other guy. What's his name again?

Chris:

Chris, I'm still the other guy. Two weeks in a row I'm the other guy.

Tim:

Yeah, the yin to my yang or the yang to my yin, I don't know, we haven't figured that out yet. Anyway, we have a new guest with us, new to the podcast, and we're very excited to have him here. It's Marino Wijay. Go ahead and just introduce yourself, Marino, for somebody who hasn't heard of you yet.

Marino:

Thank you so much, Tim, Chris, I appreciate you inviting me to the show. So my name is Marino Wijay. I am a Canadian, so I live up in Toronto, and I'm a little bit of a techie. I geek out every once in a while with the home lab and stuff like that. But I'm deep into both the networking and the Kubernetes ecosystems, and I found my way into the whole Kubernetes ecosystem through networking, interestingly enough, and because of that I figured it'd be great to just chat about it and see where the landscape is today and where it's going. But a little bit about what I do: I work at a company called Kong and I focus on AI and API transactions, so a lot of higher-level networking. I don't really touch hardware, but I touch a lot of Docker, Kubernetes, and various cloud systems as well, because I'm using the same networking principles that I would, just with APIs. So that's a little bit about me.

Tim:

Awesome, yeah. So, Marino, we wanted to talk about Kubernetes. We reached out to you because we've followed you for a while on LinkedIn and we love what you do, and when you replied and said, yeah, let's do this, this was actually one of the things that you brought up as well: Kubernetes and networking are so tightly coupled because of the distributed nature of Kube. So I think this is going to be a really, really good one. We've talked about Kubernetes a few times on the show, but let's hear your take on Kubernetes to begin with, and let's kind of go have a discussion from there.

Marino:

So, if you think back to maybe 2017, at the time, Kubernetes was just a container orchestration system and all it really did was allow you to run containers at scale, and it was a very clean but also very manual environment.

Marino:

Today, it does pretty much the same thing, but it's also a platform for a variety of kinds of workloads. We're talking not only containers but VMs, networks, even bare metal, and it's just phenomenal to see the growth of that ecosystem, because when you think back to virtualization, that system was vSphere, right, so you might have worked with vSphere. And today Kubernetes has pretty much taken over that role. Now there's just a lot of different implementations. It's a very pluggable architecture. So you're not only plugging in your workloads, you're plugging in network security, you're plugging in observability, you're plugging in various different systems just so that they can interact with Kubernetes. You're even plugging in AI, and AI is also supported by Kubernetes in a lot of different ways. So we're here at this stage where it's been adopted in so many different organizations. In fact, you look at all the cloud providers and they have their own implementations of Kubernetes, which they roll, they scale, they manage for you, and all you do is run your containers, or run your workloads, basically.

Tim:

Before we get too far into the networking side of it, here's what I found interesting. I had to learn Kubernetes for my job, and of course I'm still learning Kubernetes, like everybody else, as it continually changes. I work for Aviatrix, where our focus is on cloud networking and security and what that means.

Tim:

So basically, I had to take a networking background, I'm a CCIE and almost pure networking, but also I used to be a firewall jockey, so a little bit of cybersecurity, very little, and then take all of that and be like, okay, well, now let's figure out Kubernetes, which of course feels like a completely new topic. I did know Docker. I'd used Docker years ago, and so the concept of containerization, at least, was familiar. But yeah, the orchestration platform of course feels very different, the way it's all orchestrated and built. So it's really interesting for someone coming from a networking background to understand what Kubernetes is and how to use Kubernetes, essentially.

Marino:

You're absolutely right. I mean, if you think about systems like OpenStack and vSphere and how they operated, they are very much distributed systems to capitalize on the fact that you have all of these different resources available to you. But you have to find a clean way to carve them out so your developers aren't screaming for resources when they need them. You can just make it very easy for them to consume. Now, what's really interesting about Kubernetes is that it heavily depends on networking. I mean, if you don't have a network, there is no Kubernetes. Yeah, and it's heavily reliant on this idea of reachability. You've got this cluster. This cluster is comprised of a number of nodes, and these nodes all need to communicate with each other. They need to exchange information, they need to identify when something's wrong so that they can tell another node, hey, well, I can't do anything anymore, can you take on the load? And everything should still stay the same and look the same to a consumer, a developer, whoever.

Chris:

And I guess, in the early days of Kubernetes, keep me honest here, but I think the idea was more implicit networking, where it was just kind of implied that everything is able to talk to everything. And then once networking and network security got a seat at the table, they're like, hey guys, that's not really how these things should be operating. We should make this a little bit tighter and at least be a little bit more explicit about what is allowed to talk to what. So, in your experience, how has that evolved Kubernetes over time? Has it become this bigger beast that's harder for people to handle with that type of control, or do you think it's been for the better?

Marino:

So when you think about Kubernetes for a second, to your point, right, it was very much just this black box that would run containers. Networking in Kubernetes was a black box. I wouldn't say static, but you really couldn't do much with it. You couldn't customize it a lot, and the demands of the networking and security teams were like, no, we need to be able to get in there and trace packets, we need to be able to see what's going on, we need to be able to have ports that are locked down and other ports that are open for this communication to occur. And then, oh, by the way, we also need to have a dashboard to see all of this as well. So you start to see a transformation of the networking side of it, where it's growing to accommodate what network engineers want, but at the same time just aligning to what Kubernetes is. So let's think about a workload and how it gets on the network for a second. You've got a server that is connected to a network with Ethernet. It gets its IP, subnet mask, gateway, and now it's on the network. You've got some DNS going on, and now people can hit it without having to know the IP address. That same concept exists inside of Kubernetes when a pod comes online as well. Except there's also this concept of IPAM, IP address management, that comes about, because it's not like someone is going in there statically assigning IPs. There's a system that has to assign these IPs. But then you also have these conditions of, okay, if I assign IP addresses, how do I handle the DNS bit? So then you have this system called CoreDNS, or a DNS system, that effectively looks at and watches the system. New workloads come online? Well, let's go set up an A record and a PTR record so that services can communicate with each other, or pods can communicate with each other.
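
(To make the DNS bit concrete, here is a minimal Python sketch of what service discovery looks like from inside a pod. It assumes the standard CoreDNS naming schema; the service and namespace names are hypothetical.)

    import socket

    def resolve_service(service: str, namespace: str) -> str:
        # Kubernetes Services get a stable DNS name of the form
        # <service>.<namespace>.svc.cluster.local, served by CoreDNS.
        fqdn = f"{service}.{namespace}.svc.cluster.local"
        return socket.gethostbyname(fqdn)  # returns the Service's cluster IP

    # Only resolves when run inside a cluster whose resolv.conf points at CoreDNS.
    print(resolve_service("payments", "shop"))  # hypothetical names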

Marino:

But one thing that the Kubernetes ecosystem really thought about, was very thoughtful of, is this concept of immutability and ephemerality. Right, things can suddenly disappear. Things also simply cannot be changed easily. We cannot go in here and modify a pod just like that. Well, we can, but that's not the right way to do things. That's right, we have to be a little bit more declarative as to how we want desired state to look. So this concept of FQDNs became very important in the Kubernetes ecosystem. DNS became very important, where you now have an abstraction layer where you can start swapping pods in and out, even nodes. You can swap these in and out, and no one on the consumer side should ever see that anything changed. Maybe they might see a slight blip, but nothing really changed there.

Tim:

So this is interesting, because I think a lot of network engineers probably already understand Kubernetes a little bit better than they think they do. For example, the concepts that you were just talking about, Marino, that's just DHCP with DNS registration, right? That part of it should be very familiar to anyone who is a network engineer. And to go back just a step before that, I think that the need for more granular and more declarative networking stacks is what, of course, gave rise to CNIs, right, the Container Network Interfaces, where now we're bringing this network-layer observability, with Cilium and Calico and so on and so forth.

Tim:

The CNIs are just providing something that the original Kubernetes project didn't really envision application developers needing. Right, app developers were never necessarily going to need that level of network observability. They needed it, but they didn't understand it, they didn't know they needed it, you know what I mean? So I think the CNI is basically the method by which we up-level that. I think of it like you're installing an application, essentially, on the cluster, except the application is in the kernel and it's networking. But yeah, anyway.

Marino:

Yeah, I mean, a CNI is just a switch at the end of the day. That's really what it's doing.

Tim:

Replacing the bridge?

Marino:

Yeah, yeah, it's a very multi-layer switch that's capable of so many things. It's pluggable too. But what's interesting is that a pod is a Linux network namespace. And when you think about network namespaces, these are isolation boundaries that very much mimic this concept of a VLAN. Not entirely one-for-one, but the idea of having that network namespace is to provide that isolation boundary for a set of containers that might need to talk to each other but still have some access to the network, some priority on the network, if you will.
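
(As a rough illustration of that primitive, here is a short Python sketch that drives iproute2 to create and inspect a network namespace, the same Linux mechanism a pod's network stack is built on. It needs root and the ip command; the namespace name is arbitrary.)

    import subprocess

    def run(cmd: list[str]) -> None:
        subprocess.run(cmd, check=True)

    run(["ip", "netns", "add", "demo-pod"])                 # the pod's isolated stack
    run(["ip", "netns", "exec", "demo-pod",
         "ip", "link", "set", "lo", "up"])                  # bring up loopback inside
    run(["ip", "netns", "exec", "demo-pod", "ip", "addr"])  # shows only lo, no host NICs
    run(["ip", "netns", "delete", "demo-pod"])              # clean up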

Marino:

Now, where we've gone with Calico and Cilium, Cilium especially. I mean, if you've been watching the news, the company that built Cilium, Isovalent, got acquired by Cisco, probably about two years ago. Big fan of them, big fan of what they do, and big fan of the Cilium CNI, because it spoke to me as a network engineer, because it was able to do things like BGP. Why do we need BGP in Kubernetes? Well, if you have all of these different pod networks, someone outside of your Kubernetes ecosystem needs to be able to get to them. So you're not just going to hand them a service IP and be on your way. You've got to share those networks back into BGP, and it needs to distribute them to remote sites, if you will. And so they really touched on and were very thoughtful of how to build a CNI that was very network-centric, still Kubernetes-centric too, but the folks that engineered this were actually network engineers. Yeah, they sat there behind the scenes and really thought, you know, what would a network engineer do to move these packets around? But then we could get into the nitty-gritty of eBPF and how kernel-based networking works, but I think that would take a whole hour in itself.

Marino:

But when you think about the networking part of Kubernetes, you're starting to see layers and layers and layers of abstraction going on here, because it doesn't stop there. You've got a service mesh on top. If you're not familiar with it, for the folks listening later on: if you think about what a VPN was aiming to do, it was supposed to securely connect branch locations. And when you think about pods for a second, they're in a way like their own little islands doing things. They need to communicate with other islands as well, and they truly can. But if you think about the internet, if everything just talked over the internet, sure, everything would go through very cleanly, but in a very unprotected, plaintext manner.

Marino:

So the idea of service mesh was to not only provide a layer of encryption through something called mutual TLS, but also to add on to that network capability by bringing in this concept of QoS, but not traditional QoS. QoS in the sense of service resiliency: implement things like timeouts and retries, inject faults so we can add some artificial delay where we need to, and then also do things like, hey, I'm going to roll out a new version of a service, something like a blue-green or a v1/v2, let's ramp to that appropriately. And, by the way, we used to do these things as well with ALBs, with application gateways, and we still do. It's just we brought this into Kubernetes because we needed a way to be able to handle it from a service-to-service standpoint.
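
(To show what those resiliency knobs amount to, here is a minimal Python sketch of timeouts, retries, and a weighted v1/v2 traffic split done client-side; a mesh sidecar applies the same kind of logic transparently. The hostnames and weights are made up.)

    import random
    import urllib.request

    BACKENDS = [("http://checkout-v1:8080", 0.9),   # hypothetical blue/green split
                ("http://checkout-v2:8080", 0.1)]

    def pick_backend() -> str:
        # Weighted selection, like a mesh traffic-split rule.
        r, cum = random.random(), 0.0
        for url, weight in BACKENDS:
            cum += weight
            if r < cum:
                return url
        return BACKENDS[-1][0]

    def call_with_retries(path: str, retries: int = 3, timeout: float = 2.0) -> bytes:
        # Per-attempt timeout plus retry on connection failure,
        # the same behavior a sidecar proxy can enforce for you.
        last_err = None
        for _ in range(retries):
            try:
                with urllib.request.urlopen(pick_backend() + path,
                                            timeout=timeout) as resp:
                    return resp.read()
            except OSError as err:    # URLError and timeouts are OSError subclasses
                last_err = err
        raise last_err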

Tim:

When I was trying to figure out what the hell service mesh was, the closest thing I could think of from a network engineering perspective was, here's an application-layer software-defined VPN mesh, basically. Like, here's a sidecar, which is like your little tiny router or firewall or, whatever you would call it, VPN terminator on the side that's allowing the applications to have protected connectivity with each other. But it's all at the app layer from a communications perspective, right? Because you can do all of these policies and stuff at the app layer. Yeah.

Marino:

You get to be a lot more granular because at this point you're communicating at either that TCP layer or HTTP, where you can get super granular, because now you can inject certain kinds of headers into your HTTP request and filter or provide policy against that. You can bake your authentication in there as well. There's just a lot of fancy things you can do with HTTP that you couldn't really do at the TCP layer. Like you're kind of stuck, you're limited, it's very static, you're just working with IPs and ports and host names, but when you're talking about HTTP there's a lot of data you can feed into that entire request flow.

Marino:

So service mesh became very popular but then also became problematic in its own way, because now it's just adding a significant amount of complexity and overhead as well, and so you have to think about your operations teams and what they have to take on as a burden to be able to support your applications running on top of Kubernetes or even other systems too, other systems that might be using a service mesh.

Chris:

Now that we're on the topic of service mesh, in my experience talking to a lot of customers and colleagues and things like that, there are kind of two camps of people running Kubernetes, either in the cloud or wherever. There are the ones that do not want to deal with the complexity of a service mesh, they think it's just too much for what they need to do, and then the ones that are implementing a service mesh, some of them begrudgingly and some of them happily. But I guess, in your experience, what have you seen as being that breaking point where a customer is like, okay, we absolutely need to do this for this purpose, enhancing our applications, enhancing the business, et cetera. What does that typically look like to you?

Marino:

The most common use case you find with companies, or organizations, wanting to use a service mesh really comes down to the security bit. Right, they have this strong requirement to adhere to things like PCI, or they have to protect their workloads and how they communicate, and so mTLS becomes that initial pathway. But then organizations also begin to realize how important the service resiliency becomes, right, that QoS bit, if you will. The problem is, if you think about when we were doing networking at the hardware level, how often would you actually sit there and configure QoS?

Marino:

The better answer would be, let's just throw more bandwidth or a higher-powered switch at it if there's a problem, right? And we're starting to see that same pattern arise, just in a different light, all over again, because no one wants to sit there and baseline their services and understand the latency between their services in that service request flow. And it takes a lot of tuning to set up that resiliency as well. It's not easy to do, because at a moment's notice your requirements might change, and then you have to change up everything all over again. And you also have to pair this with your testing methodology too, which might not be bulletproof, and some things might slip through the cracks.

Marino:

And when you think about scaling right, when you think about how Kubernetes operates, it scales. It scales based off of certain triggers. It scales your workloads because it determines hey, you know, you've gotten an increase of requests coming inbound now and I have to scale up right, but you cannot infinitely scale, so you've got to do other things too. Anyways, like we can go on and on about this, but the reality is, if you're a large enterprise organization, chances are you're probably investigating or currently using a service mesh. You wouldn't be using something open source. You'd probably be using something enterprise.

Marino:

No, enterprise-grade. But the other thing, too, is most organizations that are much smaller walk away from service mesh because it's not warranted. Within their organization, they have a better view of what their workloads look like, as well as their infrastructure, so they have a lot more control. But when you're thinking about enterprise-wide scale, you have different teams needing to interact with each other, different applications doing different things, calling each other, if you will. That's where that service mesh becomes very powerful and also very important and very useful as well.

Tim:

Yeah, I can't think of a single enterprise that I'm aware of that actually has a working application catalog of, here's our apps, here's what's talking to what on what ports, here's the bandwidth requirements, here's the latency requirements. None of that happens, right? It's always like, oh, it's broke. Oh, yeah, I guess we can't tolerate more than 50 milliseconds of latency. Good to know, right? So, yeah, totally agree on that. Another thing that I've noticed, and it broke my brain when I was learning Kubernetes, was the networking behind services, service IPs, how they can be shared between nodes, and how the scheduler and all of that sets up the services, and then, no matter which node you hit, you're going to end up at a pod. It's just weird to me. So I still don't know a great way to explain that to people that haven't sat down and basically read through it to figure it out.

Marino:

Yeah, so a lot of this really comes down to NAT, and saying to that requester or that consumer, well, I'm going to NAT this request and then send it off to where that actual workload exists. And behind the scenes, depending on what CNI you're using and what underlay fabric you're using as well, a lot of that is just overlay and VXLAN networking at the end of the day. I mean, that all sits behind the scenes and we just don't see it. We don't even have to sit there and troubleshoot it anymore. Our networks just take it, they just handle it at this point in time. What it really comes down to, though, for network engineers out there: if you have a very clear understanding of how NAT works, that's how the kube-proxy works. That's exactly what the kube-proxy is doing, NAT, effectively, and it knows where to direct workloads.

Marino:

It uses a little bit of what they call iptables. So if you've ever worked in the Linux space, you've probably messed around with iptables and, more recently, nftables, which is just an enhancement. Now, Cilium uses eBPF to handle this, but at the end of the day, all that's really going on is a bunch of remapping and rerouting.
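
(A toy model in Python of what those rules accomplish: the Service's virtual IP is rewritten, DNAT, to one of the backing pod endpoints. All addresses here are invented for illustration.)

    import random

    # Service VIP:port -> live pod endpoints, as kube-proxy programs them.
    SERVICE_TABLE = {
        ("10.96.0.42", 80): [("10.244.1.7", 8080), ("10.244.2.3", 8080)],
    }

    def dnat(dst_ip: str, dst_port: int) -> tuple[str, int]:
        # Rewrite a Service VIP to a random backing pod, like an iptables DNAT rule.
        endpoints = SERVICE_TABLE.get((dst_ip, dst_port))
        if not endpoints:
            return dst_ip, dst_port   # not a Service VIP, pass through untouched
        return random.choice(endpoints)

    print(dnat("10.96.0.42", 80))     # e.g. ('10.244.1.7', 8080)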

Tim:

Yeah, I seem to remember, and again I'm so late to the game, but I seem to remember that iptables was just a hog, that it was a problem, that one of the reasons CNIs gained popularity was because iptables could just be overrun, essentially, with the requests and mapping tables, constant swapping.

Marino:

It's a significant overhead overall, right, because you have a bunch of nodes in your cluster and they have the same set of iptables rules, and you're just thinking about a system that has to read down a list, find that entry and then route the request, which is not very efficient when you're talking about massive scale. So a lot of CNIs have significantly improved upon that experience overall, but at the end of the day it's still a challenge, and that's why we're seeing a migration over to nftables, because it can process traffic a lot better, it can read and write rules a lot better as well, and it just does really well with connection handling.

Tim:

Sorry, one sec, I accidentally closed the window with our list on it, so hold on, let me pick up. So actually, scale's a great segue, right? I think one thing network engineers that haven't really leaned into Kubernetes haven't really considered is not just the scale itself of Kubernetes, where you can have hundreds or potentially thousands of nodes and pods and all of this, but how, from a networking perspective, that is even handled by Kubernetes. So what does that look like? You know, just to help the network engineers understand what scale looks like from that perspective.

Marino:

Yeah.

Marino:

So let's just say you're scaling workloads, right? You have a container and you need multiple copies of it. In Kubernetes, when you create a Service, it's actually a load balancer that's fronting these pods with DNS, and so by default it'll just round-robin the requests out to each one of these containers, or pods, I should say. But what actually causes the scale is triggers. So within Kubernetes you have something called a Horizontal Pod Autoscaler, which effectively helps with the scaling capabilities based off of certain conditions. Okay, my CPU metric for all of my workloads, my replicas, has hit 80%. Start scaling to a certain number of replicas, and then the reverse, once that demand goes down, scale back down.
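
(The scaling rule here is documented Kubernetes behavior: the HPA aims for desired = ceil(current_replicas * current_metric / target_metric), clamped to configured bounds. A small Python sketch with made-up numbers:)

    import math

    def hpa_desired_replicas(current_replicas: int, current_cpu: float,
                             target_cpu: float, min_r: int = 1, max_r: int = 10) -> int:
        # The HPA formula: ceil(current * observed/target), clamped to min/max.
        desired = math.ceil(current_replicas * current_cpu / target_cpu)
        return max(min_r, min(max_r, desired))

    # e.g. 4 replicas running at 95% CPU against an 80% target -> scale to 5
    print(hpa_desired_replicas(4, 95.0, 80.0))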

Marino:

Because the other thing that you have to consider, too, is your cluster, right, which runs all of these containers. It's not infinite. You also have to consider scaling that up too. And let's just take one of the major cloud providers out there. When you have to scale your cluster, it's not just a simple, hey, I'm going to throw a node in here and then, boom, now I have more compute and CPU and memory capacity. It actually has to bootstrap it, it has to bring it online, it has to do a bunch of validation checks to make sure that the node itself is working and can actually accept workloads, and then, once it's joined to the cluster, you can start scaling your workloads. So it's a twofold operation. But the other part to this, too, is the load balancer, right? So you have an internal load balancer, and then, if you are exposing your services, you'll probably use an external load balancer as well that, again, accepts the connections and then distributes them to all available copies, or any available copy. But what's interesting is that the way to operate these systems is not just by turning on your HPA and being on your way. You actually have to understand how your workloads operate.

Marino:

What might be considered demand could also be considered a denial-of-service attack. Let's just say, around the holidays is when we expect to see traffic spikes and we expect to see increased load. But it's not so difficult for someone to just create a denial-of-service attack, slip it into that same period of time, and no one would really think about it. And then, all of a sudden, if you haven't set your, let's say, your cloud billing alarms properly, and you also haven't set your scaling limits, well, guess what? Now you have a million-dollar cloud bill that you have to take on and worry about.

Marino:

So there are other mechanisms that you can employ to be able to handle that too, things like rate limiting. Rate limiting is important because, as you start to see scale, you'll allow a certain number of requests to make it into your cluster before you hit a capacity. But that rate limit is supposed to prevent something like a denial-of-service attack, because once you start to see an anomaly of requests coming inbound, wherever you've implemented that rate limiter, normally it's not at the load balancer, you do it somewhere like a WAF or maybe an API gateway of sorts, that's where the rate limit kicks in, so you don't run into an infinite scaling situation. So that load balancer is important, it's just distributing load. But you also have to rely on telemetry data. That telemetry data is what's enabling that scale to be possible as well.
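
(Rate limiters like the one described are usually some variant of a token bucket. A minimal Python sketch, with made-up numbers:)

    import time

    class TokenBucket:
        def __init__(self, rate: float, burst: int):
            self.rate, self.capacity = rate, burst      # tokens/sec, bucket size
            self.tokens, self.last = float(burst), time.monotonic()

        def allow(self) -> bool:
            # Refill based on elapsed time, then spend one token per request.
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False    # reject the request; the client should back off

    limiter = TokenBucket(rate=100.0, burst=20)  # 100 req/s steady, bursts of 20
    print(limiter.allow())                       # True while tokens remain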

Tim:

I mean, that's very cloud... sorry, go ahead, go ahead.

Chris:

I was going to say, yeah, the DoS thing, I feel like that just has a lot of parallels to a normal denial of service, right? Like, a lot of carriers offer denial-of-service scrubbing services, right? Because the thing is, if you buy a circuit from someone and they hand you basically a wire, once the traffic's on your end of the wire, there's no way to stop it. It's already there, right? So with Kubernetes, it sounds like you've got to use some of these external things, as well as maybe some of the internal components, to stop it before it causes Kubernetes, essentially, to think that the trigger has been invoked, right? So I think it's exactly the same.

Marino:

It's very much the case, right? I mean, whether you're in the cloud, even outside of Kubernetes as well, straight up operating in the cloud, you're going to run into the same challenges as well, because your resources, you're exposing them to public customers, public consumers, and unless you're implementing some strong authentication and you're not just freely exposing all of your APIs to everyone, you're likely going to run into situations like that. But again, we've learned from this. We've built a lot of practices around these situations and how to design for and build for these situations as well.

Tim:

So, speaking of, because this is also a really good segue, we're talking about scale here, right, and nothing has been shown to stress scale lately more than the deployment of AI workloads and high-performance computing and GPUs and all the stuff associated with doing a lot of data very quickly and at scale. So people are doing this in Kubernetes. I don't know which pieces of the AI stack they're actually doing in Kubernetes, you know, from training on. I'm not sure exactly what it is that they're doing. So what does that look like? Do you have any idea what people are doing with AI and Kubernetes now?

Marino:

Yeah, before, it used to just be running systems and workflows that would have some engines doing training, running training models within your cluster, and you would still need some pretty high-grade hardware and GPUs to be able to handle that. But we're well past the training. We still do that training, we still see it in Kubernetes. There's a project out there called Kubeflow which definitely helps with setting that all up and helping you set up workflows to be able to ingest different training situations. But what's actually happening now is you're actually running models inside of Kubernetes, and not only that, it's not just, I'm going to go to a cloud provider, spin up a Kubernetes cluster and then run my model there, because the reality is they're not giving you the best of the best CPU and memory. It's all shared resources. And what you're actually starting to see is that clusters need to have access to a GPU.

Tim:

That makes sense.

Marino:

They need to be GPU-enabled, and you need to effectively be able to pass that GPU through to your workload. So if you're running a model as a pod, it needs access to the GPU, and it's not just a, let's set up a GPU passthrough capability. It also means that you're passing a lot of networking traffic too, so now your CNIs are being updated to handle that kind of traffic as well. There's something called Dynamic Resource Allocation, DRA, and DRA in Kubernetes is effectively prioritization. It's QoS for AI and LLMs.

Marino:

Yeah, for hardware and model traffic. So you have a few models running on top of Kubernetes clusters. It's expensive, because GPUs are expensive. But you also have to think about that cost control too, because this is a great way to end up with that million-dollar cloud bill that you never even conceived of or planned for.
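
(For the GPU access he describes, here is a hedged sketch using the official kubernetes Python client: a pod requesting one GPU through the NVIDIA device plugin's extended resource name. The pod name, namespace, and image are hypothetical; DRA generalizes this model with richer resource claims.)

    from kubernetes import client, config

    config.load_kube_config()  # use load_incluster_config() when running in-cluster

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="llm-inference"),        # hypothetical name
        spec=client.V1PodSpec(containers=[client.V1Container(
            name="model-server",
            image="example.com/llm-server:latest",                 # hypothetical image
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"},  # lands the pod on a GPU-enabled node
            ),
        )]),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="ai", body=pod)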

Marino:

But you have a team of developers. They need to build, they need to run their models, they need to test out how these models operate. They're doing very unique things and they're consuming a lot of GPU power. It's not so much about compute anymore. It's about access to good-quality GPUs that can process thousands of tokens per second, not just five, but thousands, right? And that's not cheap. Now, the other side to that, too, is we're seeing that only because what happens is you have teams that just go the shadow AI route: go to OpenAI, go to Claude or Anthropic, pull their API keys and just keep adding credits.

Tim:

Just add credits, yeah.

Marino:

That's it, right, consume the APIs. But here's the problem with that. You're starting to enable these developers to just send PII, send IP, out into these models without the right compliance and guardrails in place. And so to control that, there's a few ways. You can use something like an AI proxy or an AI gateway, or you own the models on-prem, where you decide, I'm going to just build my cloud on-prem all over again and then I'm going to send my developers to use local models.

Marino:

Yeah, they can certainly pull models down and train them if they want to customize them to their needs, and still have their applications communicate, do the things that applications can do. This is all agent-to-agent stuff now, of course.

Marino:

At the end of the day, it becomes almost impossible to scale GPUs, because now you have another problem of electricity bills going through the roof. Providers are not going to come after you because you're using GPUs. They're coming after you because their electricity bills, their HVAC bills, are through the roof. So now you have other systems that come into play here. Right, when you think about what Apple's doing, especially with their Apple Silicon chips, they're basically building GPU capability into their devices. Like, I've got this M4 that I'm working off of right now, that's where the stream is going on, or where our recording is going on, and I've got a few models running behind the scenes here, and I don't hear any fans. Do you hear any fans? No, right? But that's the direction they're trying to go.

Marino:

In fact, there was a Twitter post, probably maybe two months ago, where someone had just bought a bunch of Mac minis, M4-based Mac minis, racked them up, used USB-C, or, what is it, Thunderbolt 5, to connect them all up, and they had, I think, more than 100 gigs of bandwidth between each of these nodes. But now you have this super cluster of Macs that's running, you don't have a lot of the same constraints or considerations around HVAC, and it becomes a very powerful style of cloud. And then you find out Apple's got their own container runtime now, right? So now you start to see that they're really trying to advance the developer game, especially when it comes to AI and working with LLMs.

Tim:

Yeah, that's interesting. We've done a couple shows on, basically, what it looks like to build an AI data center, what's new and exciting, and I think anything we've come up with will change, because the technology has to change. What is it, Moore's law? We're getting so far ahead of Moore's law at this point, it's going to take a while to figure out what the next iteration of that is. But yeah, no, that's really good.

Marino:

One thought about that. Do you remember InfiniBand?

Tim:

Yeah, yeah.

Marino:

Yeah, right, RDMA. All of a sudden, they're re-entering the space, becoming popular all over again.

Tim:

It's circular. As the technology hits limits, we explore what we tried before: is it going to work better, can we do it better this time? Definitely agree. Okay, so I think we actually need to wrap up. Before we do, Marino, where can people find you on the web? Where should they follow you and engage with you?

Marino:

So I am active pretty much everywhere. Mostly LinkedIn, a little bit of Twitter, X. I mean, I do more trolling on Twitter and X than I do the professional stuff, right. But if you want to connect with me professionally and see some of the work that I'm doing, or some of the projects that I'm working on at the moment, come hit me up there. You can literally look me up by my name, I'm there as well. I use Discord, but I just kind of lurk. I'm not really big on heavily contributing.

Marino:

But one thing I would recommend folks do: I go to KubeCon a lot, right, I try to go to every event that's out there in North America and Europe. I'm trying to get to some of the Japan and China ones, maybe next year, we'll see. And if you're planning to go out that way, definitely hit me up. Let's connect for a bit, let's chat, and even go check out the sessions. There are a lot of new AI-based sessions, and you can see the intersection of what Kubernetes is doing with AI as well.

Tim:

Awesome, and we'll get all that in the show notes as well, so everybody doesn't have to memorize it. All right. Well, thanks for joining us today, Marino. This has been an awesome discussion. We'll have to have you back and expand on it in a month when everything changes. I'm just kidding. But yeah, no, this has been great. Any final thoughts?

Marino:

Yeah, I think that everyone at this point should start considering how they can involve the usage of AI in their workflows, in the way they build. I mean, all it takes is to go download Ollama and run a model locally, and then, if you want to take that a step further, pair that with Open WebUI, and if you want to take that even further, check out LM Studio, because I think not everyone's got the capability to go out and source an API key and then start building with public models, and at the same time they might have some restrictions as well. But if you've got a very recent Mac or some sort of ARM-based device, you can do some serious damage with some of those models out there, some of the little ones.

Tim:

Yeah, definitely agree with that, for sure. Awesome, all right. Well, this has been the Cables to Clouds podcast. I was going to say enjoy everything we do and subscribe to us on everything, but obviously, if you're listening to us, you probably already did that. So instead of asking you to subscribe, I'd say, enjoy everything we do.

Chris:

We don't get it right every time.

Tim:

No, no, everything. Enjoying everything we do, it's a requirement. But yeah, if you liked it, at least share it with a friend, maybe several friends, and yeah, we'll see you guys next time.

Marino:

Thanks everyone.
