
What's Up with Tech?
Tech Transformation with Evan Kirstel: A podcast exploring the latest trends and innovations in the tech industry and how businesses can leverage them for growth, diving into the world of B2B, discussing strategies and trends, and sharing insights from industry leaders!
With over three decades in telecom and IT, I've mastered the art of transforming social media into a dynamic platform for audience engagement, community building, and establishing thought leadership. My approach isn't about personal brand promotion but about delivering educational and informative content to cultivate a sustainable, long-term business presence. I am the leading content creator in areas like Enterprise AI, UCaaS, CPaaS, CCaaS, Cloud, Telecom, 5G and more!
Transforming AI Development: G-Core Labs' GPU Infrastructure, Scalability, and Global Reach
Interested in being a guest? Email us at admin@evankirstel.com
Discover how G-Core Labs is transforming the landscape of AI development with Michele Tironi, the head of their AI product stream. Join us as Michele unveils the power of G-Core's GPU-based cloud infrastructure and their expansive network of 180 points of presence. You'll learn about the immense computational demands of AI model training and the critical role of low latency and scalability during inference. Michele digs into how G-Core's offerings are ideal for large-scale training and applications requiring rapid response times, and he highlights complementary services like managed Kubernetes and storage solutions that smooth the AI development workflow.
In the second half, we delve into the impressive scalability and global reach of G-Core's GPU network. Michele discusses the flexibility of their pay-as-you-go model, which caters to applications from minimal GPU usage to extensive deployment for fields like generative AI in marketing and gaming. You'll hear about the journey from AI experimentation to widespread adoption, and the excitement around the launch of their Inference at the Edge product. Plus, get a personal glimpse into Michele's summer plans in Italy and his lighthearted take on the rivalry between AI research at Oxford and Cambridge. Tune in for a riveting discussion that promises to illuminate the future of AI infrastructure.
More at https://linktr.ee/EvanKirstel
Hey everybody, fascinating and timely topic today on accelerating your AI development through a GPU-based cloud infrastructure with G-Core. Michele, how are you?
Speaker 2:Very well, thank you. Good to meet everyone. I'm Michele Tironi, dialing in from London.
Speaker 1:Well, good to have you here. London's a bit of a hotspot for AI development these days, so a good place to be. Maybe introduce yourself to kick things off and, for those who aren't familiar, tell us: who is G-Core?
Speaker 2:Sure, so my name is Michele Tironi. I joined G-Core a few months ago to head up our AI product stream. I have a background in high-performance computing and machine learning development, particularly computer vision and other AI topics. And yeah, G-Core: we are a European-based but global AI infrastructure provider. We have a global network of 180 points of presence from our CDN business, and we're now expanding into the AI infrastructure space, so we're offering both GPU clusters for training and also inference at the edge, which are topics that I'm sure we'll cover shortly.
Speaker 1:Yeah, these are all fascinating topics for all of us, industry watchers and insiders. So maybe describe a little bit about the network and how it works for AI training, how it's set up, configured and how you see engaging existing and new clients in the AI space. What does the process entail?
Speaker 2:Yeah, of course. So, as you probably know, when it comes to developing AI models and then deploying them in production, there are two main phases of compute. The first, of course, is the training phase. There are many techniques, but normally these are very compute-intensive tasks where you are feeding models a large amount of data and training those models over a particularly long time, so that phase is very compute-intensive. Then, once you have a trained model, you typically deploy it in some application, and end users of that application will then call the model and do what's known as model inference, where you query a model and get a response. When it comes to the infrastructure for those two parts of the process, there are some different requirements, and at G-Core we try to cater for both of them.
Speaker 2:So, on the training side, as I mentioned, this is a very computationally expensive part. What we're trying to do is provide a large compute cluster so that you can train these large models on as much data as quickly as possible. In this sort of situation, what researchers or data scientists are typically looking for is raw compute power and scale, so people are running large clusters of GPU servers and feeding in as much data as they can to get the performance they need.
Speaker 2:On the other end of the spectrum, once you come to inference, an individual inference call is typically less compute-intensive than a full training run, but of course you have much more scale because, as you can imagine, you have an application with many, many users, each of them calling the model. In these circumstances, what you're looking for is very fast response time, so that as a user you're getting a very quick response to your query. You also care about performance, about being able to service customers around the world, and about scaling: as I mentioned, you have many users and multiple queries, so you really care about how the compute scales up and down with usage.
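To make the contrast concrete, here is a minimal, illustrative Python sketch of the two phases Michele describes: a compute-heavy training loop over many batches versus a single, latency-sensitive inference call. It is not G-Core-specific; it assumes PyTorch as the framework and uses toy placeholder data.

```python
# Illustrative sketch only: contrasts the compute-heavy training phase with a
# single low-latency inference call. Model and data are toy placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training: repeated passes over large amounts of data, typically on a GPU cluster.
for epoch in range(3):                      # real runs train for far longer
    for _ in range(100):                    # stand-in for batches streamed from storage
        x = torch.randn(64, 128)            # toy feature batch
        y = torch.randint(0, 10, (64,))     # toy labels
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()                     # backpropagation dominates the compute cost
        optimizer.step()

# Inference: one cheap forward pass per request, where response time matters most.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 128)).argmax(dim=1)
    print(prediction.item())
```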
Speaker 1:Yeah, fantastic offer you're putting together here. Are there certain customers or use cases that you see as ideal for your GPU cloud? What are you going for exactly?
Speaker 2:Sure, yes.
Speaker 2:So when it comes to the training aspect, as I mentioned, our ideal customers are those looking for a large amount of interconnected compute.
Speaker 2:What I mean by that is that, traditionally, when you go to the cloud to get hold of some compute, you might get a virtual instance with maybe a few CPUs or one or two GPUs. When it comes to training, you're looking for a large cluster, and you want that cluster to be interconnected with very fast networking; that's something that we offer, so we're really focused on customers that need a lot of raw compute power. When it comes to the inference side, the unique proposition of G-Core is that we can offer inference across our global locations with very, very low latency. For us, an ideal customer is someone with a use case where they have a global customer base and are looking for very fast response times wherever the user is. The benefit we offer is that, regardless of where the user is, the actual inference will happen in a local region near them, which reduces the time it takes to reach the compute node and get the response back.
Speaker 1:Interesting, and you have a whole suite of services that you can potentially integrate with your GPU instances. I'm thinking of managed Kubernetes, but what else might be relevant to someone developing AI?
Speaker 2:Exactly that. There's a range of workflows that typically happen when you're going from AI research through to development and on to production, and we have a range of services that can cater for all of those. Typically, if you're focusing purely on training, then you want the most performance, and so we offer these large AI training clusters. But once you get a little further along and you're managing more workflows, maybe both training and inference, then yes, we can offer a managed Kubernetes service, which enables you to scale your compute up and down and manage it more easily. We also offer storage solutions: these models require data to train on, so you typically need a lot of storage to feed the training. And then through to the inference side, we have our new product, Inference at the Edge, which is designed to enable customers to deploy a model anywhere in the world through a single endpoint.
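As a rough sketch of what consuming such a service can look like from an application, the snippet below posts a request to a single inference endpoint. The URL, credential, model name and payload shape are hypothetical placeholders for illustration, not G-Core's actual API.

```python
# Hypothetical example of calling a hosted model through a single endpoint.
# The URL, credential, and response format are placeholders for illustration
# and do not describe G-Core's actual API.
import requests

ENDPOINT = "https://inference.example.com/v1/predict"   # placeholder global endpoint
API_KEY = "YOUR_API_KEY"                                 # placeholder credential

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "image-classifier", "inputs": ["https://example.com/sample.jpg"]},
    timeout=5,
)
response.raise_for_status()
print(response.json())   # the platform would route the call to a nearby point of presence
```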
Speaker 1:That's a great proposition. And data privacy and security are top of everyone's mind these days. How do you ensure privacy and security in this new paradigm, where everyone's worried about risk?
Speaker 2:Absolutely, and we see that coming up as a topic time and time again with our customers. Our approach to this is through our edge inferencing, as we call it, and the idea there is that, regardless of where your users are, the actual inference of the model will always happen in a local region near the user. What that means is that, supposing you have a user in the United States, then when they want to do model inference, the data and the inferencing will stay in the United States. If you have another user who is in, say, Japan, then in that case we will do the inference in Japan, and that means the data will also stay local to Japan. In that way we ensure that, for all users, their data is staying local and being processed locally.
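The data-residency idea Michele outlines can be pictured as simple region-aware routing: serve each request from an endpoint in the user's own region so the data never has to leave it. The sketch below is a toy illustration with an invented region table, not G-Core's routing logic.

```python
# Toy illustration of region-local routing: a request is served by an endpoint
# in the user's own region, so the data stays and is processed in that region.
# The region table is invented for the example.
POINTS_OF_PRESENCE = {
    "us": "https://us.inference.example.com",
    "eu": "https://eu.inference.example.com",
    "jp": "https://jp.inference.example.com",
}

def route_request(user_region: str, default: str = "eu") -> str:
    """Pick the in-region endpoint; fall back to a default region if none exists."""
    return POINTS_OF_PRESENCE.get(user_region, POINTS_OF_PRESENCE[default])

print(route_request("jp"))   # a user in Japan is served, and their data processed, in Japan
```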
Speaker 1:Yeah, nice approach. So when it comes to big tech, the hyperscalers, competitors of yours perhaps, how do you see yourself positioned? These are pretty expensive resources, GPUs, pretty hard to come by. How do you set yourself apart in this market where you've got all the hyperscalers and very big companies, like X here where we're broadcasting, who are beginning to consume these GPUs?
Speaker 2:Absolutely. There's been huge growth in this space in the last couple of years. As you mentioned, you've got the hyperscalers, who have been around, who have huge scale and are now moving into this space, and then you also have some newer companies that have come to the market recently. The way we're positioning ourselves is that, unlike the hyperscalers, we can offer both a public cloud and a private cloud offering. What that means is that if a customer wants a dedicated cluster, we can actually deploy it and architect it just for them, and that again ensures greater privacy.
Speaker 2:Our other benefit, particularly for customers that are either in Europe or otherwise outside the US, is that we keep our data centers and our data processing local, whether in Europe or in other parts of the world, and that is equally something that is quite important to some customers. Likewise, when it comes to the inference side, we've focused a lot on network performance and network speed. As a provider that comes out of the CDN space, that is something that really differentiates us: we have a very, very low-latency network that is one of the best on the market, and that enables us to offer very fast inferencing.
Speaker 1:Nice, describe your network a bit. We didn't really get into this, but where are your points of presence? I think you're in a lot of places that are unusual, at least from the US perspective, places you don't hear about a lot, but really interesting ones. You've built out a global GPU network with over 180 points of presence now?
Speaker 2:They're distributed globally, with a large presence in Eastern Europe, Africa and the Far East, so it's definitely not centered around just Europe and the US; it's much more global. In terms of our deployment of GPUs across that network, we started with a few strategic locations, particularly the US, Europe and the Far East, and we're looking to grow that over time to a large proportion of the points of presence, so we can cover the full global footprint that we already cover from a points-of-presence perspective.
Speaker 1:Very nice, very nice. And in terms of scalability, I assume you can start pretty small for labs or proof of concepts and then scale pretty large, as you do with your variety of services. How does that work in terms of scalability and growth?
Speaker 2:Absolutely so. One of the things we've really focused on in our inference product is to make it very scalable, right down to scaling back to zero. What that means is that if you don't have any users at a given time, you have no GPUs running and you're not paying for anything, so it's a full pay-as-you-go model. With that model, you can start from a very small amount of capacity, where maybe you just have a few users and need a single GPU, or you can scale up to tens or hundreds of GPUs, so customers are getting the compute when they need it, and then, when there's a period of downtime, they release those resources and stop paying for them.
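A toy way to picture the scale-to-zero, pay-as-you-go behaviour described here: the replica count follows demand and drops to zero when there is no traffic, so idle periods cost nothing. The capacity numbers and policy below are invented for illustration and are not G-Core's actual autoscaler.

```python
# Toy model of scale-to-zero autoscaling: GPU replicas track demand and fall to
# zero when there is no traffic, so idle periods are not billed. Thresholds are
# invented for the example.
import math

REQUESTS_PER_GPU = 50    # assumed throughput of a single GPU replica
MAX_REPLICAS = 100       # assumed upper bound for the deployment

def desired_replicas(requests_per_second: float) -> int:
    """Return the replica count for the current load; zero load means zero replicas."""
    if requests_per_second <= 0:
        return 0
    return min(MAX_REPLICAS, math.ceil(requests_per_second / REQUESTS_PER_GPU))

for load in (0, 10, 120, 5000):
    print(f"{load} req/s -> {desired_replicas(load)} GPU replicas")
```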
Speaker 1:Yeah, it seems like the only logical way to build a platform now. These are very expensive devices, probably pretty hard to come by. I don't envy your supply chain people trying to get hold of GPUs at scale at this time, but good luck with that; I'm sure you'll overcome those challenges. Is it too early to talk about any deployments or use cases? Are we more in the trial, science-project and testing phase?
Speaker 2:Yes, so right now the Inference at the Edge product, launched earlier this month (thank you very much), is in an initial beta phase, so we have early customers across a number of use cases who are trying it out and giving early feedback, and some of those we will hopefully scale out in the coming weeks and months. In terms of use cases, we're really getting interest across the board in generative AI, for example image generation for marketing-type use cases, or film generation. We're also getting interest in other applications, for example gaming, where we have a lot of existing customers who come from the network space and are looking at real-time applications for AI within their games, so they really need that low latency and fast response time to offer a very good experience to their users.
Speaker 1:Fantastic. As someone who's been in this field rather a long time, what are you most excited about when it comes to what's next in this AI wave? What's on your mind?
Speaker 2:I think what's been really exciting over the last few years is the way we've had this huge explosion of interest, of course, particularly since ChatGPT.
Speaker 2:I mean, AI and deep learning have been around for years, but I think in the public consciousness it really exploded with ChatGPT, and suddenly there's a whole new set of use cases and applications that have become possible.
Speaker 2:I think what's really exciting is that in the last maybe 18 months there's been a lot of experimentation, a lot of proof of concepts, a lot of pilots, and what we're really seeing now is that businesses, whether small businesses or large enterprises, are starting to see the value of those POCs and pilots and are scaling out adoption at large scale. That's really exciting from my perspective, in terms of just seeing AI being used to solve real problems, and also from a G-Core perspective. It shows that the kinds of problems you have to solve change slightly: you're moving from people doing research and experimentation with maybe a small number of users and use cases, to suddenly having to scale out across the world to thousands or millions of users, and that creates a whole set of interesting challenges and great opportunities for companies such as ours.
Speaker 1:Fantastic. And just on a personal note, any travel or trips planned this summer, or is it heads down getting this product to full production?
Speaker 2:Sure, there's certainly a lot of work to do in the next few months for the whole team; we're really focused on this product. But of course summer is around the corner, although you wouldn't know it yet in the UK. You may have guessed from my name that Michele is originally Italian, so I'll be spending a bit of time in Italy, hopefully to get a bit more sunshine. But otherwise, yeah, we're really focused. We've got some exciting things coming up, and we're looking to take the product to the next level and out of the beta phase to full production.
Speaker 1:Fantastic. Well, we'll all be watching. I see here on your LinkedIn profile that you went to the University of Oxford. I didn't go to Cambridge University, but I lived there for quite a while. Who's doing the best work in AI research these days, would you say: Cambridge, or Oxford, your alma mater?
Speaker 2:It's a great question, and it's probably unfair to pick one of the two; they're both doing fantastic research in different areas. Oxford, where I came from, was home to one of the leading groups in the original computer vision and deep learning work, and a number of its professors went on to Google and Meta and places like that. But with AI moving into so many different areas, pharma, healthcare, biotech, Cambridge is also doing some fantastic work, and Cambridge also has what was, until recently, the largest supercomputer in the UK; there's now a bigger one over in Bristol. But as you know, the real driver behind all of this AI growth is just compute, and that's why we're seeing such growth and all the hype around NVIDIA, of course, and the rise in its stock. It's really the compute that unlocks all these incredible applications, and that's the problem that G-Core is trying to solve, working with our partners at NVIDIA to give access to the infrastructure that enables all these amazing applications.
Speaker 1:Yeah, it's unlocking a whole new opportunity. Congratulations on that and keep in touch. Look forward to seeing all the progress and the value you're creating.
Speaker 2:Fantastic. It's been great to talk to you about it, and I look forward to the growth and to keeping in touch.
Speaker 1:Thanks, so much. Take care everyone and thanks for watching.