Real World Serverless with theburningmonk

#51: Real-world serverless at Amplyfi with Liam Betsworth

April 21, 2021 Yan Cui Season 1 Episode 51

You can find Liam on Twitter here and LinkedIn here.

To learn more about Amplyfi and their services, please go to https://amplyfi.com

And here's the post I mentioned in the chat about the DDoS attack that Fathom experienced.

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

To learn how to build production-ready serverless applications, check out my upcoming workshops.


Opening theme song:
Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday
License: http://creativecommons.org/licenses/by/4.0


Yan Cui: 00:12  

Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real world practitioners and get their stories from the trenches. I'm joined by Liam Betsworth, from Amplyfi. Hey man.


Liam Betsworth: 00:24  

Hey, thanks for inviting me along. My name is Liam Betsworth. And I'm the CTO at Amplyfi. I started working at Amplyfi about four years ago now, which is probably around the time that I was introduced to serverless. And to be honest, nothing's ever been the same since.


Yan Cui: 00:40  

So I guess we kind of crossed paths a while back at ServerlessDays Cardiff. Even though I don't think we actually met in person but I do remember walking past your office. And Matt Lewis from DVLA was telling me about some of the things you guys were doing. Can you maybe start by just telling the audience who is Amplyfi and what you guys do there? I guess, unfortunately, given the space we're in, when you say Amplyfi, people probably think about AWS Amplify but...


Liam Betsworth: 01:08  

Exactly, yeah, there has been some confusion there, but I just want to say that we got here first, and it is a very slightly different spelling. But yeah, Amplyfi, I guess I'd describe us as probably an AI business intelligence company. What we're trying to do is derive insights from unstructured data. So you can imagine that we're ingesting, you know, hundreds of thousands, millions of documents. And then what we're trying to do with that data is basically pick out the kind of trends, you know, to try to understand some early warning signals and some of the disruption, allowing our customers to do some kind of due diligence, you know, any of these kinds of business intelligence tasks that you might want to do with big data. That's what we're facilitating.


Yan Cui: 01:53  

So is it similar to a lot of the online analytics platforms, things like Google Analytics, or something like that?


Liam Betsworth: 02:00  

Not exactly. So whereas Google Analytics and a lot of the other online platforms are very much focused on, let's say, quantitative sort of results, and, you know, measuring metrics of usage and things, we're very much focused on harvesting data from the internet, and not just from the internet, but also from private and kind of premium sources that people pay for. So we will then ingest that content and analyse it through a machine learning pipeline, trying to make sense of it.


Yan Cui: 02:28  

Okay, so I mean, for me, it's kind of difficult to sort of wrap my head around exactly what that means. Because when I think about online analytics and machine learning, a lot of it goes to, like, you know, ecommerce: where people are buying, which products are selling well, and, you know, how your ads are performing, things like that. Do you have some use cases that can help me sort of solidify my understanding of what you guys do?


Liam Betsworth: 02:50  

Yes, absolutely. So probably one of the simpler use cases that I could provide is something that we facilitate. So for example, we may have a customer, and they might be a bank or financial institution, or maybe somebody who's lending money, okay. And so they need to do due diligence on their own customers, so they need to understand exactly who they're lending money to. So the bank or the financial institution may have a particular policy that they don't lend money to people who are involved in certain nefarious activities that they don't believe in. So what they can do with our software is they can search that customer, you know, that company within our platform, and then they can see essentially everything that that customer or company is involved in. So for example, if they did not lend money to particular companies who were involved in perhaps gambling, or the adult industry, or that kind of stuff, you know, that is something that they could determine from our platform. And so compared to standard data sources, or standard databases, where a human had to enter that in, a human had to say that this company is involved in gambling, or this company is involved in the adult industry, our technology was actually able to determine that from open source data. So from reading the web, we were able to make that decision.


Yan Cui: 04:13  

Okay, so I think that starts to make a lot more sense to me now. I guess, straightaway, I'm thinking about the challenges you guys must be facing in terms of ingesting data from all over the place, like you said, that unstructured data, and trying to make sense of it, and also just allocating the right data attributes or records to the right people, you know, people who've got similar names, or the same name but are actually a different person, that sort of thing. How do you guys go about solving that? And also, I guess, the volume of data we're talking about can be quite significant as well.


Liam Betsworth: 04:48  

So yeah, as you said, right, this is a really challenging area, and I guess this is why we thrive in this area, in that there aren't very many players here. It's cost prohibitive to ingest all of the data; I don't think anybody can possibly ingest all of the data. So we're very selective in what we do ingest. And we do try and ingest the highest quality content that we can, and, you know, probably the most trustworthy content, as fake news, you know, is a massive thing right now as well. So, you know, ingesting very high quality, kind of gold standard content is especially important. Then the other areas you were talking about, of course, yeah, how do we actually determine that a sentence has a person, or location, or an organisation within it? That task is actually called named entity recognition. So being able to, you know, tokenize a sentence and say, yes, there are people, locations or organisations in the sentence. And then the second part of that is, well, now that we've determined that there are people or locations or organisations in that sentence, how do we tie that to a real-world entity? And that is the problem then of disambiguation, or linking. Yeah, I mean, these are all very kind of novel problems, they've not been 100% solved. And I guess that's why there's lots of room really to still innovate in this area.
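
Editor's sketch: the two steps Liam names, recognising entity mentions and then linking each mention to a real-world entity, can be illustrated with a toy example. This is not Amplyfi's pipeline, which uses trained machine learning models; the knowledge base, the entity IDs and the word-overlap scoring below are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    entity_id: str            # hypothetical identifier for a real-world entity
    entity_type: str          # PERSON, LOCATION or ORGANISATION
    context_words: frozenset  # words that tend to appear near this entity


# Hypothetical toy knowledge base: surface form -> candidate entities.
KNOWLEDGE_BASE = {
    "Amazon": [
        Candidate("org:amazon", "ORGANISATION", frozenset({"cloud", "company", "aws"})),
        Candidate("loc:amazon-river", "LOCATION", frozenset({"river", "rainforest", "brazil"})),
    ],
    "Cardiff": [
        Candidate("loc:cardiff", "LOCATION", frozenset({"wales", "city"})),
    ],
}


def tokenize(sentence):
    """Split a sentence into alphabetic tokens."""
    return re.findall(r"[A-Za-z]+", sentence)


def recognise(tokens):
    """Step 1 (stand-in for NER): keep tokens that match a known surface form."""
    return [tok for tok in tokens if tok in KNOWLEDGE_BASE]


def link(mention, tokens):
    """Step 2 (disambiguation): pick the candidate sharing most words with the sentence."""
    context = {tok.lower() for tok in tokens}
    return max(KNOWLEDGE_BASE[mention], key=lambda c: len(c.context_words & context))


def annotate(sentence):
    """Return (mention, entity type, entity ID) for each recognised mention."""
    tokens = tokenize(sentence)
    results = []
    for mention in recognise(tokens):
        best = link(mention, tokens)
        results.append((mention, best.entity_type, best.entity_id))
    return results
```

The same surface form, "Amazon", resolves to a different entity depending on the surrounding words, which is the ambiguity Liam describes.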


Yan Cui: 06:08  

Okay, so I'm quite interested in what your architecture looks like for this particular problem. Because it sounds like you need some kind of NLP service that takes the sentence and tokenizes it so that you can figure out which word is the location, all of that. And then run it through some other intelligence service, which sounds like something that's maybe proprietary, where you've got your custom machine learning models that can disambiguate that data. Can you maybe talk us through what your architecture looks like from a really high level?


Liam Betsworth: 06:41  

The architecture itself, it's very varied across the platform, of course, right. There's a huge amount of capability here. When people usually talk about, you know, front end and back end developers, they're perhaps talking about API development and front end development; our back end goes very, very deep with the machine learning. And so throughout that entire stack, we're not just using serverless here, you know, we're using servers. You may remember, actually, I gave a talk in 2018, I think, at ServerlessDays, and, probably quite contentious for ServerlessDays, the name of my talk was "Server or Serverless", right? And the answer was a bit of a cop out: to be honest, I don't think you can use serverless for everything. So we don't use serverless for everything, we do use servers for some tasks. Probably one of the more interesting parts of the architecture that I can talk about is around our machine learning pipeline, you know, how do we actually go from ingesting the data to trying to produce a useful insight out the other side. So yeah, we have experimented with a few different approaches here. One of the first approaches that we really experimented with was batch processing, in that we would get a whole bunch of documents and we'd try and process them through the pipeline at the same time, running on S3, sorry, running on EC2, or running on SageMaker. And, yeah, so basically we'd get a massive backlog of documents, put them through the pipeline, and then they'd come out sometime later, maybe minutes, or maybe hours later. And whilst that was good for a proof of concept, one of the problems that we came up against was that there was no real immediacy there, in that if I want to process one document through the pipeline, I have to wait for this whole pipeline to spin up, to process the documents, and to shut down, for it to become available. You know, there's no real room here for bursting or fluctuation in traffic.
So yeah, we had to start looking at some other alternatives here, you know, how can we go from processing one document to ten documents to a hundred thousand documents to a million documents, and then back down to zero? Of course, you don't want that infrastructure running all the time, and that's the reason why we chose batch processing to begin with. Thinking of serverless, okay, well, how could we solve this with serverless? Serverless is perfect for these kinds of use cases where, you know, you've got this bursty kind of activity, going from zero to a hundred. And so one of the early things that we experimented with in this area, probably around June 2020, was when AWS released EFS for Lambda. And we thought, actually, that this is great, because, okay, just to give you some background, up until then it wasn't possible to deploy anything to Lambda that was more than 250 megabytes. And that's a massive problem with serverless, like, how can you get these machine learning models into Lambda? So to give you an example, you know, a common library that you might use for machine learning purposes, PyTorch, is at least 500 megabytes. And so if you can only put 250 megabytes in a Lambda, then this is a big problem. So yeah, with the announcement of EFS, we realised, okay, great, you know, we can strap some volumes, some elastic file storage, to our Lambda, and actually we can load in our models at runtime. And so, as an initial proof of concept for using serverless and Lambda for machine learning in our pipeline, it was fairly successful, actually, you know, we were able to burst from one document to hundreds of documents to thousands of documents very quickly, and then scale back down. But what we quickly realised was that with EFS, you get burst credits. And we were running out of credits very quickly.
And there wasn't really any provision from AWS to say, you know, this is our particular use case, we need more credits, or we need to scale EFS up and down much quicker. I think AWS came back to us, and they said, probably the best thing that you can do here is that, if you have larger volumes, then you'll get more burst credits. So actually, it sounds really stupid, but what we ended up doing is we ended up filling these volumes with just white noise, really just a whole bunch of noise, you know, tens or hundreds of gigabytes of noise, just so that we would get more burst credits. It wasn't a great solution. But again, it got us a little bit further. And so around that point in time, I would say maybe August, September 2020... we've partnered with AWS, and sometimes we get access to some kind of early beta programmes. And so AWS reached out to us, understanding our use case, and they said, look, we've got this really cool thing in the pipeline, would you guys be interested in trying it? So of course, the thing that they ended up releasing at re:Invent was Lambda with containers. And that specific technology for our pipeline is an absolute game changer. So now, yeah, we've gone from originally batch processing, where we had to wait for ages for it to kind of spin up and spin down, to EFS, where we were running out of burst credits, to now actually Lambda containers, where we could have, you know, ten gigabyte models deployed in a container in Lambda and get all the benefits of Lambda scaling. So I mean, that's definitely one of the core parts of our architecture that we're using which is serverless right now. And as I said, you know, over the last few months, that's been an absolute game changer for us.
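
Editor's sketch: a rough piece of arithmetic shows why padding an EFS volume with noise buys more burst time. It assumes the bursting throughput figures documented around the time of this episode (a baseline of 50 MiB/s per TiB stored, and a 100 MiB/s burst rate for file systems up to 1 TiB); the constants and function names are illustrative, and current EFS documentation should be checked before relying on them.

```python
# Assumed EFS bursting-mode figures (circa 2020-2021); verify against current docs.
BASELINE_MIB_S_PER_TIB = 50.0   # credits accrue at this rate per TiB stored
BURST_MIB_S = 100.0             # burst rate for file systems up to 1 TiB


def baseline_mib_s(stored_gib):
    """Baseline throughput earned by the amount of data stored."""
    return stored_gib / 1024 * BASELINE_MIB_S_PER_TIB


def burst_minutes_per_day(stored_gib):
    """Minutes per day the file system can sustain its burst rate.

    Credits accrue at the baseline rate and are spent at the burst rate,
    so the sustainable fraction of the day is baseline / burst.
    """
    fraction = min(1.0, baseline_mib_s(stored_gib) / BURST_MIB_S)
    return fraction * 24 * 60


if __name__ == "__main__":
    # Compare a volume holding only a small model with padded volumes.
    for size_gib in (2, 100, 500):
        print(size_gib, "GiB:",
              round(baseline_mib_s(size_gib), 2), "MiB/s baseline,",
              round(burst_minutes_per_day(size_gib), 1), "burst minutes per day")
```

Under these assumptions a volume holding only a couple of gigabytes of models earns very little burst time, which is consistent with the credits running out quickly under thousands of concurrent readers.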


Yan Cui: 12:11  

Yes, funny you talk about the workaround you had to do for EFS. I guess it's also worth explaining at this point, for anyone who's listening who's not familiar with the feature that Liam just discussed: you can nowadays use containers as a packaging format for Lambda functions. The storage you get is 10 gig, but it's read only, you can't actually write to the volume, whereas EFS is read and write. But like Liam said, there are throughput limits, based on the throughput units you collect over time and how much you actually use, and how many units you get is based on the size of your volume. I guess you could also use provisioned throughput, but that's a lot more expensive, depending on the amount of throughput you need. And you're back to paying for uptime for the amount of throughput you need, even though you may not need it all the time. So that's not exactly a very serverless way of doing things. But I guess with containers, one thing a lot of people ask me about, or at least a lot of confusion I've heard, is that, oh, you can run containers in the Lambda function, which is not exactly the case. You're not really running a container service, a long-running service per se, but you're using the container as a packaging format for your application, which lets you push in all your 10 gig machine learning models, which for TensorFlow and tools like that can quite easily be a couple of gig in size. One thing I do want to ask about your experience using containers with Lambda is that they've done this really clever thing with the container support for Lambda, using this thing called Sparse File System, so that you don't have this massive penalty in terms of cold start time for using a container as the packaging format for Lambda. However, the Sparse File System tries to prioritise the bits you actually need for your function, so that they are always there when your function cold starts.
But anything else has to be, you know, read remotely; this is still going to be a network file system that you have to fetch from on demand. So if you need to actually run something that is going to use up, say, 10 gig of your file storage because you're loading a machine learning model, that can be quite a significant latency hit whenever you need that. So a few people I've spoken with have said that it has taken a lot longer to load, even compared to EFS. Has that been your experience as well, that you load a container image with two to three gig, and then because you need to run the machine learning model, you actually need to load the whole thing, so it adds a lot of time to your cold start? Is that something that you guys have noticed?


Liam Betsworth: 14:51  

Yeah, that's certainly something that we've noticed. I think, you know, in blog posts that we released recently with AWS, I think we've been witnessing cold start times, maybe 20 to 30 seconds for loading in some of these models. Now to be honest, because this is running in the background as a process. And it's not always directly visible on the front end, it's not such an issue for us, we're willing to wait that 20 or 30 seconds scale up time. But yeah, it's certainly something that we've seen as well. Yeah.


Yan Cui: 15:18  

Right. Okay, that makes a lot of sense. I guess that's kind of by design, that you could have a really large container image, but most of the time you don't actually need most of it. So they kind of prioritise that use case and make sure that if people are using containers to ship their applications, so, you know, you're serving some web traffic, then all those additional files you don't need can just sit in the container image, and that's not going to impact your performance. But in your case, you actually need to load all of them. But luckily for you, this is not user-facing latency, so it doesn't really matter if it takes five milliseconds or a hundred milliseconds, or even, in this case, twenty to thirty seconds. So that's the bit where you're using them with the machine learning model that you've already built, to serve traffic, to answer questions as part of your batch process. What about the data ingestion side of things? I imagine you're probably using something like Kinesis, or something like that, to ingest the data and put it through some kind of pipeline that feeds your machine learning model?


Liam Betsworth: 16:18  

So we do have a number of different pipelines and a number of different ways in which we're ingesting different types of data. It depends entirely on whether it's internal kind of private data that we need access to, or whether we're scraping from the web. Generally, it's a fairly simple approach, actually. The way that we're generally queuing things right now is just using SQS. Or rather, you know, as we're indexing and pulling documents in, we're saving them to S3 and then saving a pointer, I guess you could call it, in SQS and pulling that out.
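
Editor's sketch: the "pointer" approach Liam describes keeps the document itself in S3 and puts only its location on the queue. The message fields (`bucket`, `key`) and function names below are assumptions for illustration; the `Records` and `body` fields follow the shape of the event Lambda receives from an SQS trigger. The actual AWS calls are left out so the example runs without credentials.

```python
import json


def make_pointer_message(bucket, key):
    """Build the SQS message body: a small pointer to the document in S3."""
    return json.dumps({"bucket": bucket, "key": key})


def pointers_from_sqs_event(event):
    """Extract (bucket, key) pairs from an SQS-triggered Lambda event."""
    pointers = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        pointers.append((body["bucket"], body["key"]))
    return pointers


def handler(event, context):
    """Lambda-style entry point: list the documents this batch should process.

    A real handler would fetch each object from S3 and run it through the
    pipeline; here the pointers are simply returned.
    """
    return [f"s3://{bucket}/{key}" for bucket, key in pointers_from_sqs_event(event)]
```

Keeping messages small this way means the queue never has to carry a full document, whatever its size.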


Yan Cui: 16:51  

Right. I see. So in terms of ingestion, how much volume are we talking about here? Because you said you're quite selective about where you get the data from. So I guess that's going to help you reduce the amount of traffic, because SQS, you know, at scale is not cheap. A lot of people use Kinesis and things like that instead of SQS or SNS because of the fact that at scale they're significantly cheaper compared to SQS. Is that something that you can share in terms of, like, are we talking about thousands of records per day, or is it per second?


Liam Betsworth: 17:25  

So the traffic is very bursty. Basically, we're trying to pull in content on demand, as the user sees that there is an area of interest, perhaps, that we don't have enough content for, or perhaps there is more content that we could find in that area. So to give you an example of the amount of data that we're pulling in, at any particular time we may go from zero to having to pull in 10,000 or 100,000 documents within perhaps 10 or 15 minutes. Now, that does scale up much higher than that, you know, we've obviously stress tested this stack, and we've seen how far we can push it with Lambda. And it's really interesting, you know, I guess this is when you really see the horizontal scalability and the power of Lambda. Yeah, we've been running this with, let's say, 70 million documents in one day. I think that's probably the limit we've had so far. And it's just incredible seeing, you know, thousands of Lambdas running concurrently in your dashboard and watching things, you know, move through the pipeline. But that's the kind of quota we're talking about here, somewhere between hundreds of thousands and millions of documents.


Yan Cui: 18:37  

Okay, very cool. Very cool. Yeah, I've heard quite a few people use Lambda's horizontal scalability in these kinds of similar use cases, where essentially, you know, you've got this almost supercomputer on demand, where you can get up to, what, six cores per instance when you use a 10 gig function, and you can have 1000 of them, well, you can scale from zero to 3000 of these things in one go. So for bursty traffic it's actually super good for these kinds of use cases.


Liam Betsworth: 19:07  

Ah, yeah, I was just gonna say, yeah, I mean, so the exact use case that we're using these for as well, right, is machine learning inference. And if you're running inference on one document or a very small number of sentences, it's actually not a very intense process. It's the perfect kind of process that you could just dedicate one CPU to. And then yeah, you know, all of a sudden, you have 5000 CPUs or 10,000 CPUs that you can scale to. So yeah, it's almost perfect for our use case, for the inference. Yeah.


Yan Cui: 19:35  

And what about some of the challenges that you've come across with Lambda? I guess you talked about some of them already, in terms of the EFS support, and now with the container support, it's the cold start time, but I guess in your case it is not really an issue. Are there any other issues that you guys have come across as you start adopting more and more serverless technologies?


Liam Betsworth: 20:00  

Yeah, so we use serverless very heavily in our APIs as well. We have a number of reasons for it really. But again, it nearly always comes down to cost, and just kind of flexible scalability works really well for us. It's also been really useful for us in terms of being able to spin up multiple environments very quickly, or, you know, being able to build branches and test individual branches without the overhead of something more complex. But yeah, other problems we've had. So actually, you know, this was the subject of my talk two years ago, which I'm glad to say that AWS have solved since then. And I know you've spent some time, or done some early work, on this as well, Yan. In terms of the cold starts that used to plague Lambda and API Gateway, especially if you had your Lambda within a VPC as well, which is exactly what we were doing, you know, hindsight now, but yeah, we were experiencing cold starts at the time of maybe 16 seconds or something, you know, sometimes; we were using C# in a VPC. I'm glad to say now that those problems have gone. So that was definitely one of the early challenges that we had. And just to clarify, the reason why we had our Lambdas within a VPC is because of the way that we were interacting with our data store. So we were using ES, sorry, Elasticsearch, and at the time, the way in which we were securing Elasticsearch was within a VPC. So, I mean, the Lambdas had to exist within the VPC to communicate with that without having some really complex kind of proxy. So yeah, that was definitely one of the early challenges that we had. But obviously AWS has now solved that through, you know... How did AWS solve that? They've attached the NAT gateway to... I forget exactly how they solved that.


Yan Cui: 21:46  

So what they did was, they actually replaced the whole networking layer when they introduced Firecracker. So this replaced the underlying, I guess, virtualization technology they were using, and with Firecracker they were able to rewrite the whole networking layer. What they do is, instead of creating the ENIs on demand, whenever a new instance of your function is running and needs one, it is created ahead of time, at deployment time, so that before your function is actually put into active service, they create an ENI. And somehow they managed to create the ENI based on the unique combination of the network configurations. So if you've got 100 functions that all have the same VPC, security groups and all that configuration, then they just use the same ENI. I guess they may have improved the ENI efficiency as well, so that they don't need many ENIs to support the throughput. Instead, one ENI now can support all these different functions or all these concurrent executions. There are still some times when you may see cold starts related to VPC and ENI, if you have functions that are configured with VPC but then, for a very long time, there's just no traffic. So the ENI that's provisioned eventually gets garbage collected. So the next time one of these functions gets invoked and needs an ENI, then it still gets created on demand. But by and large, you shouldn't see them anymore, because they are provisioned ahead of time, which means when you have VPC functions, deployment is going to take a bit longer, but at least that's your time, not all of your customers' time.
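
Editor's sketch: the sharing rule Yan describes can be modelled as a toy calculation, where functions with the same network configuration map onto the same ENI. The assumption here, based on my reading of the Lambda VPC networking documentation rather than anything stated in the episode, is that one shared ENI exists per unique combination of subnet and security group set; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpcConfig:
    subnet_ids: frozenset
    security_group_ids: frozenset


def shared_eni_keys(function_configs):
    """Return the set of unique (subnet, security groups) combinations.

    Under the assumed rule, each unique key corresponds to one shared ENI,
    regardless of how many functions or concurrent executions use it.
    """
    keys = set()
    for config in function_configs.values():
        for subnet in config.subnet_ids:
            keys.add((subnet, config.security_group_ids))
    return keys


if __name__ == "__main__":
    # 100 hypothetical functions that all share one network configuration.
    common = VpcConfig(frozenset({"subnet-a", "subnet-b"}), frozenset({"sg-api"}))
    functions = {f"fn-{i}": common for i in range(100)}
    print(len(shared_eni_keys(functions)), "ENIs for", len(functions), "functions")
```

Before this change each concurrent execution needed its own ENI, created on demand during the cold start, which is where the multi-second delays Liam mentions came from.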


Liam Betsworth: 23:21  

Yeah, sorry. Yeah, it's all coming back to me now, that kind of nightmarish scenario. But yeah, exactly. Creating the ENI was the part that was taking the time. Yeah. So of course, yeah, there's a shared ENI now. But yeah, so other kind of challenges that we've had with Lambda: I find that, actually, the technology moves so fast, and there are so many new developments, and sometimes it's really hard to know whether you've made the right choice or not, you know, because you want to be an early adopter, and you want to try out the latest technology, you know, like the Lambda containers. But sometimes, if it's a very early kind of access technology, you're never quite sure, really, whether you've made the right decision. And I guess only time will tell whether you've made the right decision or not. You may have to go back and perhaps refactor your architecture.


Yan Cui: 24:03  

Especially with Lambda nowadays, they're getting more and more features. And certainly not every feature is applicable or useful for everybody. And you can have something that's really complicated. You've got functions that can use container images, you can use the extensions, you can use Lambda layers, a combination of all the different features, but then, most of the time, you probably don't need them, right? And I think certainly, for most use cases, you know, you can just keep it simple and use... Oh, hello.


Liam Betsworth: 24:37  

Sorry, my dog just jumped on the call.


Yan Cui: 24:39  

That's alright. I've got a cat but she is just sleeping. She isn't even paying any attention to me whatsoever.


Liam Betsworth: 24:47  

Yeah, I was gonna say actually, sort of a final thing that, you know, is always on my mind with serverless is that you're very susceptible to denial-of-wallet kind of attacks. And, you know, that's not to say that anybody is aiming any attacks at us. But let's say, for instance, when you're scaling from zero to 100 very quickly, and you could be running at 100 for a very long time, it's very easy to go from spending no money to spending, you know, tens of thousands of dollars in just no time at all. So it's always something in the back of our minds, you know, making sure that we've got the right kind of alerts and alarms set up to make sure that this doesn't happen to us. Yeah.
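
Editor's sketch: a back-of-the-envelope estimate shows how quickly the "tens of thousands of dollars" Liam mentions can be reached. The prices below are assumed on-demand list prices for x86 Lambda in a typical region and may not match current pricing; the function names and the simple budget check are illustrative, not a description of Amplyfi's alarms.

```python
# Assumed on-demand prices; check the current AWS Lambda pricing page.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20


def compute_cost(concurrency, memory_gb, hours):
    """Duration charge for `concurrency` functions running flat out for `hours`."""
    gb_seconds = concurrency * memory_gb * hours * 3600
    return gb_seconds * PRICE_PER_GB_SECOND


def request_cost(requests):
    """Per-invocation charge for a number of requests."""
    return requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS


def breaches_budget(spend_so_far, daily_budget):
    """The kind of check a billing alarm encodes: has spend passed the budget?"""
    return spend_so_far > daily_budget


if __name__ == "__main__":
    # 1,000 concurrent 10 GB functions kept busy for a full day.
    day = compute_cost(concurrency=1000, memory_gb=10, hours=24)
    print(f"Estimated duration charge for one day: ${day:,.2f}")
    print("Over a $500 daily budget:", breaches_budget(day, 500))
```

Under these assumptions a single day of sustained high-memory concurrency already lands in five figures, which is why alerting on spend matters as much as alerting on errors.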


Yan Cui: 25:22  

I do remember reading that blog post from, I think, Jack Ellis from Fathom. They also build some kind of, I guess, more like a Google Analytics thing, but using entirely serverless technologies. And they had some very interesting case where a competitor, I think, or someone was doing denial-of-service attacks against them. And they were running into exactly the problem you're talking about, and they did lots of work with AWS, with the Shield Advanced team, to put much more stringent processes in place so that they identified those bad actors quickly, and then blocked them on the WAF, rejecting traffic, and all these things, just to protect themselves against future attacks. It was a really interesting blog post; I'm going to put it in the show notes so that other people can read about it as well. But yeah, definitely that's something that you've got to worry about. But I guess it's also the same problem that exists in other applications: denial-of-service attacks are always going to be something you've got to worry about. You know, you may not be paying for the individual Lambda invocations, but you will still be paying for uptime for EC2 instances and other things like that. What about in terms of just adoption? Obviously, as an early adopter, you ran into some of the early pain points using Lambda with VPC and all that. What about in terms of staffing and, I guess, training? Obviously, it being a very different way of doing things, did you find it harder to hire people with the right skill set? And what about, you know, trying to teach the people you already have how to make the best use of Lambda and other services?


Liam Betsworth: 27:00  

Yeah, I mean, as a newly, sorry, as an early stage kind of startup or scale-up, working in an interesting area of machine learning with natural language processing, we haven't really struggled to attract talent who are interested in working, I guess, with novel technologies or new areas, that kind of stuff. So a lot of the kind of people that we're attracting are very keen, very interested to try out new technologies. So certainly there hasn't been any barrier to adoption for us in that. People are more than happy to use the technology, they want to come and work for us to work with this technology. In terms of training, and onboarding, and things, we're very lucky in Amplyfi that we have a really capable architecture and DevOps team. And again, you know, they are certainly advocates for serverless. It really helps having that in-house expertise and kind of knowledge, yeah, pushing for these kinds of ways of working.


Yan Cui: 27:52  

Okay, that's actually quite an interesting thing I want to maybe ask you a little more about that, your team structure. So you said you've got a DevOps, I guess the infrastructure team? How do you, I guess, in this case, separate different responsibilities? Do you have a feature team that only develops the Lambda functions themselves? And then have some other teams that provision other things like API gateway and such like that? Or do you have your development teams own almost everything and be autonomous, but have DevOps teams do some kind of support, putting the tooling around it, monitoring all of that?


Liam Betsworth: 28:28  

So the way that we're set up within the engineering team is that we have leads of individual functions. So we have front end, back end, machine learning, architecture and DevOps teams within the engineering team. And although they exist as individual functions within engineering, we do then have cross functional development teams. So let's say, for instance, architecture and DevOps act as floating resources that can be used amongst the individual kind of platforms or projects that we have going on. And, yeah, to be honest, it's fairly autonomous. Anyone is allowed to do anything, but, I guess, with the oversight of DevOps and architecture then sort of, you know, checking in on pull requests and things, or checking in on the designs of systems before they go to production, just to make sure that they do, you know, use best practices.


Yan Cui: 29:24  

Okay, that makes a lot of sense. What about in terms of the things that you hope AWS will do better? Do you have any sort of AWS wishlist items that you'd like to share? 


Liam Betsworth: 29:34  

Yeah, there's two for me, I think. As a machine learning company, one of the really cool things for us would be, and I think we're probably only just one step away from this now, you know, we're trying to become more and more serverless. But one of the things that you can't do right now is attach a GPU to a Lambda. Now, as I said, you know, a lot of the inference that we do, you can use a CPU to do the inference, but you could do much quicker and more complicated model inference if you had a GPU. So I think for us, that would be a massive one. If we could attach a GPU to a Lambda, just like you can to an EC2 instance, then that would be awesome.


Yan Cui: 30:13  

That'll certainly be a game changer. I was talking to Denis Bauer from the Australian, I guess, one of the research divisions. They're doing a lot of genome analysis on COVID-19. And she was raising this exact same point, that you can have a Lambda function with 10 gig of memory, and that gives you six CPU cores. But that's nothing compared to attaching a GPU, which is going to give you 1000 cores, so you can do much more parallel processing, and you'd see an orders-of-magnitude difference in terms of performance. Yeah, that would be pretty amazing if you could do that. What's your second wishlist item?


Liam Betsworth: 30:53  

So my second item is a very niche one, actually. And probably only people who have experienced this issue will have encountered this. Basically, the way in which you think about event-driven computing and serverless, SQS triggering Lambdas, I think most people see it as: when a message appears on a queue, there's almost like a push event, and then it triggers a Lambda. Whereas actually, from my understanding, it's not that way. When a message reaches the end of the queue and triggers a Lambda, it's actually that, you know, something in the background within AWS's infrastructure is polling the queue continuously to pull a message off. Now, whilst that works, you know, in most scenarios, what you'll end up finding is that if, as in our use case, you are scaling from zero to a million documents very quickly, if you have zero messages in your queue, then the polling mechanism ends up getting throttled. So, you know, if we had a constant throughput of a million messages constantly coming through, it wouldn't be an issue, in that it would be polling at the rate which would allow a million messages to come through. But then if you scale down to zero, the polling very much slows down. So it does take some time to catch up again, and you won't immediately see all those messages being ingested as fast as you'd like. So yes, quite a niche one. But you know, if they could change that SQS polling to more of an event-driven kind of architecture, where actually it's a push rather than a poll, I'd be impressed by that.


Yan Cui: 32:20  

Yeah, that might be a tricky one for them to do, because that kind of depends on SQS and how it works. And SQS is a poll-based service. So like you said, what they've done is that they've got this polling layer that the Lambda team is managing, and if I remember correctly, it starts with five pollers. And then it scales up the number of pollers based on the number of messages in flight. So it actually monitors the number of messages in flight, and as that goes up, it increases the number of pollers by something like 60 per minute. But like you said, to hit the peak throughput you need, it takes some time to get there. So maybe what they could do is give you more control over how many pollers they run for you. I guess the problem for them in that case is that the whole polling layer is free; it's part of the value you get from Lambda. But if you just say I want 100 pollers all the time, but there's no traffic like 90% of the time, then they are wasting resources that are not actually being utilised. So I can understand why they don't give you that control, but maybe there is something they could do: if you want to have a custom number of pollers, then you have to pay extra, something like that, pay some hourly rate, because they have to spend more resources doing polling for you and not getting any messages. But at least that way, you have some more control around cases where, you know, for example, a lot of the ecommerce, food delivery services or live-events-based services know when the spikes are going to come. And when they come, they usually are pretty sharp. So having some kind of auto scaling controls around that layer of message polling would be really useful. Okay, that's a really good one. I've never actually really thought about that.
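
[Editor's sketch] The ramp-up described above can be worked through with a little arithmetic. This is only a rough model, not AWS's actual scaling algorithm: it takes the figures as recalled in the conversation (five initial pollers, roughly 60 more per minute under sustained backlog), and the `ceiling` of 1,000 pollers and the function names are assumptions added for illustration.

```python
import math

# Figures as recalled in the conversation; treat them as approximate.
INITIAL_POLLERS = 5
POLLERS_ADDED_PER_MINUTE = 60
# Assumed upper bound on pollers, added for illustration only.
ASSUMED_CEILING = 1000


def pollers_at(minute, initial=INITIAL_POLLERS,
               step=POLLERS_ADDED_PER_MINUTE, ceiling=ASSUMED_CEILING):
    """Pollers running after `minute` whole minutes of sustained backlog."""
    return min(ceiling, initial + step * minute)


def minutes_to_reach(target, initial=INITIAL_POLLERS,
                     step=POLLERS_ADDED_PER_MINUTE, ceiling=ASSUMED_CEILING):
    """Whole minutes until at least `target` pollers run, or None if unreachable."""
    if target > ceiling:
        return None
    if target <= initial:
        return 0
    return math.ceil((target - initial) / step)


if __name__ == "__main__":
    # Print the modelled ramp for the first few minutes after a burst arrives.
    for minute in range(0, 5):
        print(f"minute {minute}: {pollers_at(minute)} pollers")
    print("minutes to reach 100 pollers:", minutes_to_reach(100))
```

Under these assumed numbers, going from an idle queue to 100 concurrent pollers takes about two minutes, which is the catch-up delay Liam describes when scaling from zero.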
Yeah, I've run into other issues around that whole polling thing because it doesn't take into account reserved concurrency on your functions. So even if you set your function to just one instance at a time, the polling layer would still be running whatever number of pollers it decides to run based on the traffic going into your queue. So actually I had the opposite problem of how to control the exact amount of concurrency I have in my function. And is there anything else that you'd like to sort of mention before we go? I think I've covered all the questions that I had. Anything else that's coming up with Amplyfi? Maybe something that you want to tell the audience about, new projects, new services you are offering?


Liam Betsworth: 35:00  

Yes. So within the next month we are releasing a new product called Deep Insight. And actually it uses the platform that I've just been describing in this talk. So, yeah, keep an eye out for that. It's called Deep Insight.


Yan Cui: 35:13  

Okay, I will make sure I add the link to that. Is there like an announcement blog post somewhere that I can link to?


Liam Betsworth: 35:19  

I can share a link to the website. That's no problem.


Yan Cui: 35:21  

Okay, cool. Sounds good. I will do that. And I'll also include links to your social media profiles as well as to the Amplyfi website. Are you guys hiring by any chance?


Liam Betsworth: 35:32  

We are hiring. Yes, we are hiring machine learning engineers right now. Yeah. So if you are a machine learning engineer, and you're interested in NLP, then yeah, please reach out to us.


Yan Cui: 35:40  

Excellent. I will include a link to your careers page as well so that people can check it out. And again, Liam, thank you very much for taking the time to talk to us today.


Liam Betsworth: 35:49  

Yeah, thank you very much.


Yan Cui: 35:51  

Take care. Okay. Bye, bye.


Liam Betsworth: 35:52  

Yeah. Cheers, Yan.


Yan Cui: 36:41  

So that's it for another episode of Real World Serverless. To access the show notes, please go to realworldserverless.com. If you want to learn how to build production ready Serverless Applications, please check out my upcoming courses at productionreadyserverless.com. And I'll see you guys next time.