#43: Real-World Serverless with Ant Stanley (part 2)

Real World Serverless with theburningmonk

A podcast where we talk about real-world use of Serverless technologies from engineers who work with them day-to-day. We will discuss use cases, why they chose serverless and the pain points and challenges they face. If you want to know what it's REALLY like to work with serverless, this is the show for you.


December 30, 2020 • Yan Cui • Season 1 • Episode 43

You can find Ant on Twitter as @IamStan.

Links to things we discussed in the episode:

re:invent 2020 in review (serverless) with me and Ant Stanley
re:invent 2020 in review (containers) with Ant Stanley and Vlad Ionescu
Ant's new Interrupt publication
Midway framework by Alibaba
How a company racked up $72,000 in GCP bills overnight

Senzo workshops in January:

Production-Ready Serverless by me
Running containers on AWS by Vlad Ionescu
Testing Serverless Applications by Slobodan Stojanovic and Aleksandar Simovic
Node.js for Serverless by Ant Stanley

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

Opening theme song:
Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday
License: http://creativecommons.org/licenses/by/4.0

Yan Cui: 00:11

And since we're talking about all the different vendors, let's go back to what you were talking about, what I was gonna ask you about earlier, in terms of the major differences between Lambda, Azure functions and Google Cloud Functions. And where do you see maybe where Azure is doing better? And maybe where Lambda is doing better for example?

Ant Stanley: 00:34

Yeah, I think so. I've used most of the major cloud providers. I've used Google Cloud Functions, I've used Azure Functions, and obviously Lambda as well, all in paid gigs and in production. So my thoughts on a lot of them really come down to the ecosystem within which they exist. You know, the first ever functions provider was Iron.io functions, and Auth0 had a functions platform called WebTask. All of this existed just before Lambda existed, and they don't really exist anymore, because adoption is often driven by the ecosystem. You know, folks who were using AWS adopted Lambda first. Folks didn't go to AWS because of Lambda, definitely not in 2015. And for me the quality of the different functions platforms is as much about the quality of the overall platform, so the number of event sources. Google, for example, is not super event driven. They don't have anywhere near as many event sources as AWS or Azure, and that's a major failing. They've got Google Cloud Run, which is great, and Google Cloud Functions, which is okay, but Cloud Run is really all about synchronous workloads. So if you want the asynchronous, event-driven stuff, Cloud Run is not a great platform, because it's not an easy integration. You have to use Google Cloud Pub/Sub to essentially make stuff event driven. So your event gets published to Pub/Sub, and Pub/Sub then invokes Cloud Run by sending what is essentially a synchronous call from Pub/Sub to Cloud Run. So Cloud Run is not async natively itself, it requires Pub/Sub to do that. And there's a very limited number of data sources, so that's a problem with that. Cloud Functions itself, Google Cloud Functions, hasn't had the development of the other platforms. Azure Functions and Lambda both have a really good cadence of releases and features. Cloud Functions hasn't quite had that. You know, there's a bunch of features that people expected from day one.
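
The Pub/Sub to Cloud Run flow described above can be sketched as follows. This is a minimal illustration, not code from the episode: it assumes the standard Pub/Sub push envelope, where the HTTP body is JSON and the payload sits base64 encoded in `message.data`. The names `decodePushEnvelope` and `OrderCreated`, and the subscription path, are hypothetical. In a real service this function would sit inside the HTTP handler, and a success status on the response is what acknowledges the message.

```typescript
// Shape of the JSON body that a Pub/Sub push subscription POSTs to an HTTP endpoint.
interface PushEnvelope {
  message: {
    data?: string; // base64-encoded payload
    messageId: string;
    attributes?: Record<string, string>;
  };
  subscription: string;
}

// Hypothetical event type, for illustration only.
interface OrderCreated {
  orderId: string;
}

// Decode the base64 payload and parse it as JSON.
// Throws if the payload is missing or is not valid JSON.
function decodePushEnvelope<T>(envelope: PushEnvelope): T {
  const base64 = envelope.message.data ?? "";
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes)) as T;
}

// Example envelope, built the way Pub/Sub would deliver it to the service.
const envelope: PushEnvelope = {
  message: {
    data: btoa(JSON.stringify({ orderId: "42" })),
    messageId: "1",
  },
  subscription: "projects/example/subscriptions/orders", // hypothetical
};

const event = decodePushEnvelope<OrderCreated>(envelope);
```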
So you can't really do a custom domain name with Cloud Functions, which is a major failing. If you want to use anything publicly, that's an issue, you know, you have to use Cloud Run. So that's Cloud Functions, unless you're using Cloud Functions via Firebase and Firebase functions. And then Azure Functions. The biggest problem I have with Azure Functions is not really an Azure Functions problem, it's an Azure problem, and that's with authentication. There's no global IAM equivalent within Azure Functions. Azure has got Azure Active Directory, which you'd think would be supported by every service. But if you go under the hood, a lot of the serverless services don't support it. So, if you're using Cosmos DB, their really high performance, scalable NoSQL database, which now has auto-scaling and is just great, Cosmos DB doesn't support Azure Active Directory. So you can't create an identity in Azure Active Directory, assign it to a function, and then have that function make a call directly to Cosmos DB. It has to make a call to Key Vault, which is Azure's key management system and does support Active Directory, so you put your Cosmos DB key in Key Vault. You use the Active Directory token that your function has access to, to call Key Vault, and then you can make a call to Cosmos, and you do that once during initialization. But it's a step you just don't have to do on Lambda, for example. And that's the biggest failing I find with Azure: you'd expect everything to be tightly integrated, but, you know, there's a bunch of these services, Cosmos DB is one, Event Grid is another, that don't support Active Directory natively. I do expect that to change, because there's an increasing number of services that use Active Directory. So for example, Azure Blob Storage didn't support it and it now does.
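
The "do that once during initialization" step can be sketched as a small cache around the key lookup. This is a sketch under assumptions, not the Azure SDK: `SecretFetcher` is a hypothetical stand-in for whatever call fetches the Cosmos DB key from Key Vault using the function's Active Directory identity, and the fake key is a placeholder. Caching the promise, rather than the resolved value, means concurrent invocations in the same warm instance share one lookup. One trade-off is that a failed lookup is cached too, so a real implementation would probably reset the cache on rejection.

```typescript
// Hypothetical fetcher: stands in for a Key Vault call made with the function's identity.
type SecretFetcher = () => Promise<string>;

// Wrap a fetcher so the underlying call happens at most once per process.
function cacheOnce(fetchSecret: SecretFetcher): SecretFetcher {
  let cached: Promise<string> | undefined;
  return () => {
    if (cached === undefined) {
      cached = fetchSecret(); // first call: go to Key Vault
    }
    return cached; // later calls: reuse the same promise
  };
}

// Fake fetcher that counts how often "Key Vault" is called.
let keyVaultCalls = 0;
const getCosmosKey = cacheOnce(() => {
  keyVaultCalls += 1;
  return Promise.resolve("fake-cosmos-key"); // placeholder value
});

// Two invocations in the same warm instance share one lookup.
const firstLookup = getCosmosKey();
const secondLookup = getCosmosKey();
```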
But yeah, so that's kind of why Lambda for me is the leading platform, and it's not just about the performance of Lambda, what you can do with it, and its extensibility. If AWS didn't have all the event sources and Lambda wasn't so integrated into AWS, it wouldn't be anywhere near as useful. You know, in terms of the major functions providers, Lambda is the clear number one, Azure is definitely number two. I'm talking about folks running within clouds, you know, I separate out the edge function providers as a separate category. But definitely Lambda one, Azure Functions two. Google Cloud Functions definitely needs a lot more development. It seems to be like the forgotten child at Google, they seem to focus more on Cloud Run. Cloud Run is really good for very specific types of workloads. And then, yeah, I don't know how the other providers are doing. I'm really interested to see how the Chinese platforms are going. Alibaba and Tencent both have huge investments in serverless, but we don't really see them, because the capability of those clouds outside of China is very limited. They have regions outside of China, but those don't have anywhere near the functionality of the regions within China, so it's not really ever been something I've had to use. But definitely I would say, yeah, Lambda number one, Azure Functions number two, and Google Cloud Functions kind of trailing a bit at the back, unfortunately.

Yan Cui: 06:19

Yeah, yeah, I guess Alibaba has got the strength of the Chinese market, or the size of it. At Andy Jassy's keynote the other day, he showed that Alibaba Cloud already has a bigger share of the cloud market compared to Google, because of how big the Chinese market is, I guess.

Ant Stanley: 06:38

Yeah, yeah, the Chinese market is gigantic. I talked to the folks at Tencent. So Tencent helped run ServerlessDays in China, and about six months ago, I think, they announced a $70 billion investment over the coming years into Tencent Cloud. You know, that's more than Azure and Google combined. I think that's like AWS levels of investment, and their focus is just the Chinese market, to try to catch up with Alibaba. It's crazy. And actually it's not just Tencent. I don't know if you've done this, but go to Google Trends and put the word serverless in, and then go to the worldwide view, and you'll see that the most searches for the word serverless come from China, by a long way. Like, you'd think, you know, America is where most of the serverless stuff is happening, and London's another big community. But China is just miles ahead, and there are a lot of serverless-specific frameworks coming out of China as well, which we don't see at all. And Alibaba has an interesting one which, in theory, abstracts away the cloud providers from the runtime perspective and enables you to run a cloud function on any provider. But it's written by Alibaba and it's just sitting there, and no one knows about it. It's really interesting.

Yan Cui: 07:59

That's a very closed market, it seems. So you mentioned the strengths of the Lambda platform being not just the function itself but also all the things around it, and also those specific use cases that Lambda is now starting to support with all the different new features they're adding. But you also talked about how it has now become this really complex thing, especially if you're, say, a front-end developer who's getting more into serverless. You want to escape from the limitations of, say, the framework that you are using, and then you go into the Lambda console and you see all these different configurations that you didn't have to see before. And you are like, “Oh my god, what the hell's going on here?” Is there anything that the Lambda team could have done, or could do in the future, to make things simpler, so that you don't have to see all of these different configurations all the time, and potentially, based on your use case, only the things that are relevant to you are exposed to you?

Ant Stanley: 09:01

Yeah, I think it is possible, you know, where they provide you with a very basic set of defaults that are based on the most likely use case. So for example, in web development in general, things like Webpack and Rollup, and all these bundling and transpiling frameworks, add a lot of complexity to web development build pipelines. And what came out of that is a lot of folks started to develop opinionated frameworks on top of them that had all these opinionated defaults. So actually you don't need to learn Webpack. You don't need to learn Rollup or whatever it is, you know, you use a framework that has these defaults packed in for you. And I'm wondering if it's worthwhile for the AWS team to do something like that. So, you know, ask a question up front: what are you using this for? Data pipeline? Then present a console where it's all optimised for that, so you don't really have to think too much about all those options. You predefine the whole configuration, so not just a code template, but a configuration default for your use case. That could be super useful. And then potentially, you know, maybe even hide a lot of the options as well. Have an advanced tab, where if you are brave and you know what you're doing, you go to that advanced tab, if you understand all of that. I don't know if you saw it, but last week there was an article trending on Hacker News about a startup that was using Google Cloud Run and didn't really understand how it worked. I think they had some sort of recursive workflow or something like that in there, and they ran up a $72,000 bill in a day because of a slight misconfiguration and not fully understanding how the platform worked. So they had a $72,000 bill, but I think Google ended up writing it off.
But they publicly admitted in this blog post that part of the problem is they didn't understand how the platform worked. And I think, yeah, having opinionated defaults will help a lot with that, and also understanding that early users of these platforms don't necessarily understand how they work. So not giving people a loaded gun to shoot themselves in the foot with would probably be a really good way to go. You know, have an advanced tab, with lots of questions before you can unlock it, or something like that might help as well.

Yan Cui: 11:38

Yeah, I like the idea of having different personas and then having different consoles that are dedicated to those personas, so that you don't... Even for me, when I go into a Lambda function now, there's just a bunch of things I just don't care about. They don't really apply to my workload, but they are there, and I have to scroll past them all the time to find what I need. So having more dedicated persona profiles that hide a lot of these options from me would be useful just in terms of navigation as well. And I guess the infinite recursion, that's also hit quite a few people on Lambda in the past, but I guess they never hit quite as big a bill as 72,000, that I've heard. I think the biggest one I heard was a couple of hundred, maybe, because Lambda is just so cheap. And I guess now you've got 10 gig functions, maybe it gets more expensive.

Ant Stanley: 12:31

Yeah, I'll see if I can find the article, but yeah, I think it was some sort of recursion. They built a web scraper essentially, I think, and I think they ended up basically trying to scrape the whole internet, and it went a bit too far, too fast. You know, on one hand, having a highly scalable platform is great, and on the other hand, you can get yourself into trouble with it. So, you know, either you use these things responsibly and carefully and understand how they work, or your provider, you know, provides those guardrails, because they understand, hey, you can't expect everyone who touches this thing to understand it completely, you know.
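
One way to picture the kind of guardrail being discussed is a crawler that carries its own hard limits. The sketch below is illustrative only and is not the startup's code: `crawl`, `CrawlLimits` and the injected `getLinks` function are hypothetical names, and the link lookup is synchronous to keep the example small. The point is that both the depth and the total page budget are bounded, so even an endless link graph cannot fan out without limit.

```typescript
interface CrawlLimits {
  maxDepth: number; // how many links away from the start page we may go
  maxPages: number; // hard budget on total pages fetched
}

// Breadth-first crawl that stops at whichever limit is hit first.
function crawl(
  start: string,
  getLinks: (url: string) => string[],
  limits: CrawlLimits
): string[] {
  const visited = new Set<string>([start]);
  const queue: Array<{ url: string; depth: number }> = [{ url: start, depth: 0 }];
  const fetched: string[] = [];

  while (queue.length > 0 && fetched.length < limits.maxPages) {
    const { url, depth } = queue.shift()!;
    fetched.push(url);
    if (depth >= limits.maxDepth) continue; // do not follow links any deeper
    for (const link of getLinks(url)) {
      if (!visited.has(link)) {
        visited.add(link);
        queue.push({ url: link, depth: depth + 1 });
      }
    }
  }
  return fetched;
}

// A fake, endless link graph: every page links to two new pages.
const endlessLinks = (url: string): string[] => [`${url}/a`, `${url}/b`];

const shallowCrawl = crawl("site", endlessLinks, { maxDepth: 1, maxPages: 100 });
const budgetedCrawl = crawl("site", endlessLinks, { maxDepth: 2, maxPages: 5 });
```

With the depth limit of 1, only the start page and its two direct links are fetched. With the page budget of 5, the crawl stops after five pages even though deeper pages are still queued.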

Yan Cui: 13:06

Do they have something similar to Lambda in terms of the regional concurrency limits, which are there in part as a guardrail, but also things like billing alerts, so that, you know, if you suddenly see a really big jump in your AWS bill, you get an alert before something really bad happens?

Ant Stanley: 13:24

Yeah, so obviously billing alerts are my favourite form of monitoring within AWS, but within Google Cloud they do have billing alerts, and they have billing limits, so they can stop services. The problem, as with any cloud provider, and AWS has a similar issue, is that the billing systems are behind the live systems. So in Google Cloud they are about 24 hours behind, and I think in AWS about 12 hours behind, more or less. And basically, for these folks, the billing alerts kicked in, but it was a bit too late. The other fun thing they did is they were using Firebase as a database. So the $72,000 wasn't just Cloud Run fees, it was overall fees. And obviously, Firebase is very much a different interface, but it's the same billing account behind the two. They had put themselves on the free Firebase tier, and for some reason Firebase auto-upgraded them, because they had billing details. So Firebase's failure mode, you know, if you hit your limits, was not to stop the workloads from happening, it was to keep the workloads going but charge more. And I think a lot of their costs ended up being Firebase as much as Cloud Run. So I think that was kind of the problem: because it had this 24-hour cycle, you know, to actually calculate what the bill is, the billing alerts couldn't kick in in time. And that's all platforms, they don't know your bill live. That's one of the problems, you know, there will always be a delay between you using something and the billing system knowing about it.
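
The effect of that billing delay can be worked through with some rough arithmetic. The sketch below assumes a constant spend rate, which is a simplification, and uses a hypothetical $1,000 alert threshold. The $72,000 per day figure and the approximate 24 hour and 12 hour lags are the ones mentioned in the conversation, and `spendWhenAlertFires` is an illustrative name.

```typescript
// Spend accumulated by the time a lagging billing alert can fire,
// assuming the spend rate stays constant.
function spendWhenAlertFires(
  thresholdUsd: number,
  spendPerHourUsd: number,
  billingLagHours: number
): number {
  // The threshold is crossed in real time, but the billing system only
  // sees it billingLagHours later, while spending continues.
  return thresholdUsd + spendPerHourUsd * billingLagHours;
}

const hourlyRate = 72000 / 24; // 3000 USD per hour
const with24hLag = spendWhenAlertFires(1000, hourlyRate, 24); // 73000
const with12hLag = spendWhenAlertFires(1000, hourlyRate, 12); // 37000
```

Under these assumptions, lowering the alert threshold barely helps, because almost all of the exposure comes from the lag multiplied by the spend rate. That is why hard limits on the workload itself, such as concurrency caps, matter alongside billing alerts.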

Yan Cui: 15:11

Right, so these guys, they racked up 72,000 in one day, or did they not have...

Ant Stanley: 15:16

Yeah.

Yan Cui: 15:16

Okay. Wow, that's pretty wild.

Ant Stanley: 15:19

And the overall bill of Firebase, plus Cloud Run, plus everything.

Yan Cui: 15:23

Right, right. Gotcha. And you also talked about Webpack and Rollup. I've been hearing a lot about esbuild recently; apparently it's much more efficient, much faster. Have you used it?

Ant Stanley: 15:33

No, I haven't. I'm actually going to start using it for new projects. We have been using Rollup, and I wanna swap it out... So I've used Rollup extensively. All the time, every single function I ship goes through it. It doesn't matter the platform, I always bundle using Rollup because you can get fantastic tree shaking. One of the things with esbuild is that it's essentially similar to Rollup and Webpack, in that it's a bundler that can do all the tree shaking, but it's written in Golang. So it's a lot faster. So Rollup and Webpack, for example, don't really do multithreading that well because of JavaScript, whereas esbuild can. And obviously Golang is significantly more performant than JavaScript. So yeah, esbuild looks like a great tool. A very common thing when you are building front-end work, or even using bundlers on your back end, is that you're going to be constantly bundling. You want to be testing your bundled output, not just the unbundled stuff, just in case errors get introduced in bundling. It's unlikely, but you want to test what you're shipping rather than the unbundled code. So if you're going to be doing multiple iterations and potentially running a kind of watch function with constant bundling, Rollup and Webpack are really slow if you've got a lot of functions, a lot of code to bundle. With esbuild it is significantly faster. It makes that feedback loop a lot quicker and improves your deployment performance a hell of a lot, because bundling can get really slow. There's a lot of calculations that have to happen, you know, because it figures out what your whole dependency tree is and then does tree shaking, and only imports the pieces of code that you need. So yeah, esbuild is definitely on my to-do list. This coming month or so I'll be trying to replace Rollup.
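
A per-function bundling setup along these lines might look like the sketch below. It is an assumption-laden illustration rather than anyone's actual build: the directory layout, function names and `node12` target are hypothetical, and `BundleOptions` is a local stand-in for the subset of esbuild build options used here (`entryPoints`, `bundle`, `platform`, `target`, `minify`, `outfile`), so the sketch runs without esbuild installed.

```typescript
// Local stand-in for the subset of esbuild build options used here,
// so the sketch has no dependencies.
interface BundleOptions {
  entryPoints: string[];
  bundle: boolean;
  platform: "node";
  target: string;
  minify: boolean;
  outfile: string;
}

// One bundle per function, so each deployment package contains only
// the code that function actually imports.
function bundleOptionsFor(functionName: string): BundleOptions {
  return {
    entryPoints: [`src/functions/${functionName}.ts`], // hypothetical layout
    bundle: true, // follow imports and produce a single file
    platform: "node",
    target: "node12", // assumed runtime version
    minify: true,
    outfile: `dist/${functionName}/index.js`,
  };
}

const functionNames = ["getOrder", "createOrder"]; // hypothetical functions
const allOptions = functionNames.map(bundleOptionsFor);

// Each entry could then be handed to esbuild's build() function, for example:
//   await Promise.all(allOptions.map((options) => build(options)));
```

Keeping the options in a plain function makes it easy to run the same bundling step in a watch loop and in the deployment pipeline, so tests run against the same output that gets shipped.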

Yan Cui: 15:27

Yeah, I've heard, I think, AWS started to use esbuild recently as well. So it's certainly getting some pretty good traction there.

Ant Stanley: 15:39

Yeah, definitely. I've been using the AWS SDK version three for JavaScript a lot, and they've changed the way all the bundles work. It used to be this monolithic thing that got brought in, and there were lots of code dependencies between different functions, so it became harder to just import a single SDK. Whereas with version three, you know, there is essentially a whole family of SDKs. It's an SDK per service. So you can bring in just the S3 SDK or just the DynamoDB SDK individually, but more importantly, it's written in such a way that it bundles really, really well. So I've got a function that runs GraphQL.js, the S3 SDK, the DynamoDB SDK, and the SNS SDK. So I've got all of those SDKs in there, and Parameter Store. And it bundles to about 700 kilobytes, with all your SDKs in there. So that's a huge saving. You know, the version two SDK is, what, 50 megs if I remember correctly. It comes in at something like that, it's huge. With version three you just bring what you need and it bundles fantastically. You know, so you have far smaller functions.

Yan Cui: 19:04

Okay, great. Yeah, I need to check it out as well. I think you told me that there was still some missing features from the v3 SDK for DynamoDB. Has that been changed now?

Ant Stanley: 19:13

Yeah, they have. So when it was first in preview, there was no DocumentClient. And I don't know if they're going to bring it back. So obviously, most people end up using the DocumentClient, because DynamoDB has got its own JSON schema for its type system, and no one wants to deal with that. So what the DocumentClient did, they called it marshalling and unmarshalling, is it would transform standard JSON into the attribute value schema that DynamoDB uses. They didn't have that early on, but they have released one now. I think they're bringing the DocumentClient back, but what they have released is marshal and unmarshal methods for the DynamoDB client. So when you do a DynamoDB operation, you know, if it's a put or whatever it is, you just call the marshal function to transform it. And then if you need to transform the data that you get back, you call the unmarshal function. I think it became available in September. I think it might have been October, but yeah, that's available now, so it's a lot more usable.
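
To make the marshalling step concrete, here is a simplified, hand-written version of the transformation. It is a sketch for illustration and not the SDK's implementation: `marshalValue` and `unmarshalValue` are hypothetical names, and only strings, numbers, booleans, null, lists and maps are covered. To my knowledge the SDK's own helpers are `marshall` and `unmarshall` in the `@aws-sdk/util-dynamodb` package, and they handle more types, such as sets and binary data.

```typescript
// DynamoDB's attribute value format: every value is tagged with its type.
type AttributeValue =
  | { S: string }
  | { N: string } // numbers travel as strings
  | { BOOL: boolean }
  | { NULL: true }
  | { L: AttributeValue[] }
  | { M: { [key: string]: AttributeValue } };

// Plain JSON-style values.
type Plain = string | number | boolean | null | Plain[] | { [key: string]: Plain };

// Plain value -> tagged attribute value.
function marshalValue(value: Plain): AttributeValue {
  if (value === null) return { NULL: true };
  if (typeof value === "string") return { S: value };
  if (typeof value === "number") return { N: String(value) };
  if (typeof value === "boolean") return { BOOL: value };
  if (Array.isArray(value)) return { L: value.map((v) => marshalValue(v)) };
  const map: { [key: string]: AttributeValue } = {};
  for (const key of Object.keys(value)) map[key] = marshalValue(value[key]);
  return { M: map };
}

// Tagged attribute value -> plain value.
function unmarshalValue(av: AttributeValue): Plain {
  if ("S" in av) return av.S;
  if ("N" in av) return Number(av.N);
  if ("BOOL" in av) return av.BOOL;
  if ("NULL" in av) return null;
  if ("L" in av) return av.L.map((v) => unmarshalValue(v));
  const out: { [key: string]: Plain } = {};
  for (const key of Object.keys(av.M)) out[key] = unmarshalValue(av.M[key]);
  return out;
}

const user: Plain = { id: "u1", age: 42, tags: ["a"] };
const marshalled = marshalValue(user);
// marshalled is:
// { M: { id: { S: "u1" }, age: { N: "42" }, tags: { L: [{ S: "a" }] } } }
const roundTripped = unmarshalValue(marshalled);
```

In this sketch a whole item comes back wrapped in `M`. The inner map is the part that corresponds to the item you would pass to a put operation.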

Yan Cui: 20:30

Okay, cool. Yeah, that would be a big miss if they didn't have that. So I guess the one final thing I want to ask you about, since we are still in the final week of re:Invent, is where do you see this serverless thing going? I mean, containers are getting more serverless-like features and Lambda is getting more container-like features. I mean, nowadays you can run container images inside a Lambda function. So where do you see the evolution taking us next?

Ant Stanley: 21:01

So genuinely I think there's gonna be more and more of a convergence. I remember a conversation I had with Tim Wagner in early 2016, kind of off the cuff, and he was saying at the time that, you know, in his Lambda group there were a lot of users pushing for longer-running Lambda functions, container support, etc. And he said that in the container group, you know, there was a push from folks to have more serverless-type features. And I think we will see more and more of a convergence, and at some point it might be hard to tell the difference. I think at some point the container team are going to release something that looks a lot like functions, and the functions team, the Lambda team, are going to end up releasing something that's more stateful. But I think the whole serverless paradigm of pay per use, you know, scale to zero, that's here to stay, and I just think the use cases and the platforms that support that paradigm are just gonna keep increasing, because that's what people want. You know, they don't want to pay for stuff that they're not using. They want stuff that can scale automatically on its own, you know, whether that's event driven, whether it's stateless or stateful. You know, that's going to stay. And what it will look like, you know, it might have a container interface, it might have a zip interface, that's definitely going to expand, and I wouldn't be surprised if, you know, there's Fargate functions or something that comes out at some point, or stateful Lambda comes out at some point. But I do think serverless has got to the point where it's just a thing, you know, it's just an option everyone uses. It's becoming significantly less controversial, shall we say. People don't get fired for using Lambda, you know, they don't get a lot of questions for using it now. There's a lot more trust in the platforms. So I think uses are just going to keep expanding.

Yan Cui: 22:58

Yeah, I kind of remember the same thing happening with NoSQL. It eventually became something that you just use, rather than something where you keep talking about NoSQL vs SQL, even though that conversation is still happening nowadays. But certainly it is no longer, like you said, a controversial decision. Okay, so I guess that's all the questions I've got. Is there anything else that you want to mention before we go?

Ant Stanley: 23:26

Yeah, I think obviously we had a great conversation about re:Invent on Friday, where you were on the other side of this and I got your thoughts on re:Invent, the products that came out, and what you thought the direction is. I think folks should definitely watch that. That was a great chat. I'm going to have another conversation with Vlad Ionescu around his thoughts on the container releases, and that should be quite interesting. Vlad is a character. He's got strong opinions on things, so it's always entertaining to watch. But yeah, check out Homeschool. It's a forever evolving platform, so that's homeschool.dev. Lots of serverless and container content on there. Yeah, we're gonna be running more and more workshops and more kind of online courses on there, aimed at folks who want to use the latest and greatest platforms and modern functions and data platforms.

Yan Cui: 24:26

Sure, I will make sure that those are included in the show notes, including your session with Vlad. One thing we actually didn't talk about, and you didn't mention, is Interrupt. What's that about?

Ant Stanley: 24:37

Yes. So Interrupt is a new publication we've started at Homeschool. And that's really focusing on content for the modern developer. So there'll be a lot of cloud native and serverless content, but also a focus on things like shifting security left, the whole shift-left paradigm, and observability as well. And it's really not gonna be huge amounts of deeply technical content. It's more focused on the higher-level trends and, you know, views on where we're going. So there's a relatively good post on there at the moment around cloud native and kind of containers and serverless. There's a pretty strong adoption pattern we've seen where folks who are migrating to the cloud are just skipping EC2: they're putting all their existing workloads on things like Fargate, and they build all their new apps on things like Lambda. And so there's this post on why that pattern exists. There are other posts on AppSync. For me, AppSync was one of the best enhancements AWS has ever made, but they just didn't hype it up enough. Like, I feel that the AppSync team has been massively undersold, so there's a good post about the launch of AppSync and the impact from that day. But yeah, you will probably start to see a few posts a week, focusing on kind of modern platforms and modern developer processes and tooling and these kinds of things. Yeah, so if you want to keep up to speed, Interrupt is a good place to go.

Yan Cui: 26:06

Okay, great. And I'll make sure that we include your Twitter handle and LinkedIn profile in the show notes as well. For anyone listening who wants to get in touch with you, Ant, do you have any sort of preference for how people should get in touch, maybe for job offers, consulting gigs, or maybe random questions about your dogs?

Ant Stanley: 26:24

Yeah, yeah, Twitter's always the best place for me. I am horrendous at other platforms. I think I've got like two and a half thousand emails unanswered. So yeah, definitely ping me on Twitter. My DMs are open, so just ping me on Twitter, or you know, just @ me, and that's easily the best way to get hold of me.

Yan Cui: 26:46

Okay, great. Sounds good. In that case, I will make sure those are in the show notes. And thanks again for taking the time to talk to me today.

Ant Stanley: 26:53

Great. It's always good to talk to you.

Yan Cui: 26:56

Take it easy, man. Hopefully see you in person soon.

Ant Stanley: 26:58

Cheers, Yan.

Yan Cui: 26:59

Okay. Bye, bye.

Yan Cui: 27:13

So that's it for another episode of Real-World Serverless. To access the show notes, please go to realworldserverless.com. If you want to learn how to build production ready serverless applications, please check out my upcoming courses at productionreadyserverless.com. And I'll see you guys next time.