Real World Serverless with theburningmonk

#27: Serverless at A Cloud Guru with Dale Salter

September 02, 2020 Yan Cui Season 1 Episode 27

You can find Dale Salter on Twitter as @enepture.

We spoke extensively about GraphQL in this episode. If you want to learn GraphQL and AppSync with a hands-on tutorial, then check out the AppSync Masterclass.

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

This episode is sponsored by ChaosSearch.

Have you heard about ChaosSearch? It’s the fully managed log analytics platform that uses your Amazon S3 storage as the data store! Companies like Armor, HubSpot, Alert Logic and many more are already using ChaosSearch as a critical part of their infrastructure and processing terabytes of log data every day.  Because ChaosSearch uses your Amazon S3 storage, there’s no moving data around, no data retention limits and you can save up to 80% vs other methods of log analysis.  So if you’re sick and tired of your ELK Stack falling over, or having your data retention squeezed by increasing costs, then visit ChaosSearch.io today and join the log analysis revolution!

Opening theme song:
Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday
License: http://creativecommons.org/licenses/by/4.0

Yan Cui: 00:10

Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real world practitioners and get their stories from the trenches. Today, I'm honoured to be joined by Dale Salter from A Cloud Guru. Hey, welcome to the show. 

Dale Salter: 00:28

Thanks Yan. Thanks for having me on. Super excited to be here. 

Yan Cui: 00:32

So you've been with A Cloud Guru pretty much from the start and have gone through the whole journey of growing with the company. Can you tell us about yourself and A Cloud Guru, for those listening who are not familiar with who they are?

Dale Salter: 00:45

Yeah, so my name is Dale Salter and I've been a serverless developer now for the past four years at A Cloud Guru. I joined very early on in the startup phase; I was the second developer. And previously to that I worked at a software consultancy company that focused on .NET.

So ACG, or A Cloud Guru, is a completely serverless cloud training platform with a strong focus on keeping our learning really fun and engaging, and we've taught over 1 million students how to pass their AWS, GCP and Azure exams. And we have a lot of content related to serverless. But not only do we teach serverless, our entire platform and ethos is all around serverless computing, and has been that way since we started. So we try to practice what we preach by really following a lot of the stuff that we teach in our content when we build the platform. And to be specific, part of what the team I work on does is actually building out that serverless cloud training school. So it's been really, really exciting to go from when we were really tiny to where we are now. And just to put it in perspective, this is not a really small serverless app. We are quite large. We have roughly 240 million Lambda calls per month, so that's about 100 Lambda invocations per second; 180 million API Gateway calls a month, which is about 70 per second. And we're moving roughly 90 terabytes of data through CloudFront, which is around 300 megabits a second of video content out to our students. So this is definitely not a small serverless application by any means.
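As a quick back-of-the-envelope check on those figures (the numbers are from the episode; the script itself is just illustrative):

```javascript
// Back-of-the-envelope check of the per-second figures quoted above.
const SECONDS_PER_MONTH = 30 * 24 * 60 * 60; // ~2.59 million

console.log(Math.round(240e6 / SECONDS_PER_MONTH)); // ~93 Lambda invocations/sec ("about 100")
console.log(Math.round(180e6 / SECONDS_PER_MONTH)); // ~69 API Gateway calls/sec ("about 70")
// 90 TB/month of CloudFront video, expressed as a bit rate:
console.log(Math.round((90e12 * 8) / SECONDS_PER_MONTH / 1e6)); // ~278 Mbit/sec ("around 300 megabits")
```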

Yan Cui: 02:31

So, A Cloud Guru was fully serverless from day one. Do you know what the reason was for the guys to choose to go serverless in those early days, when the whole technology space for serverless was still quite immature?

Dale Salter: 02:49

Yeah, so serverless really came out of necessity, essentially. Sam, our founder, had a very small amount of time. He took around three weeks off from the job that he was working at the time to essentially build out this cloud school, and he had only a tiny amount of time to do that. So he took his family down to Tasmania, which is a small state within Australia, and he focused on trying to build a school, or a product, within that small amount of time. And if he were to have used traditional approaches, like the serverful approaches, he wouldn't have been able to do it in the time that he had. At the time, we weren't interested in building an amazing technology architecture or paving the way on something new and exciting. The serverless approach really came out of a necessity to build something very, very quickly in a tiny amount of time. So, A Cloud Guru really has been serverless from day dot. Within A Cloud Guru we never used any ECS or EC2 or any of those serverful technologies. We've really focused on using and leveraging DynamoDB, Lambda, API Gateway, CloudWatch, SNS and so forth. And one of the reasons why we knew we needed to build a serverless platform was that we had a whole bunch of users on another platform that we needed to move across, essentially, or to help encourage to move onto A Cloud Guru.

Dale Salter: 04:31

So, we didn't want to have to worry about things to do with auto scaling or provisioning or managing servers, or outages that happened in the middle of the night. We just essentially wanted to have a product that would just work and scale elastically with the number of users that came on. So, one of the interesting things about A Cloud Guru is that initially we started with a Firebase database. For those that don't know, Firebase is essentially like a big JSON file in the cloud with WebSockets, so you can talk to it back and forth to pull data. And the way that you talk to Firebase is essentially directly from a client, so like a SPA or something like that, straight to the database, and you manage security of that database through JSON policy. What having this allowed us to do was essentially not have to build out a back end at all. So in a traditional application architecture, you'd have a front end, an API layer, and a data persistence layer; we could actually cut out a lot of that middle work and have the data access directly in the client. So, that was a completely serverless database and it was also essentially a backendless system at the time. Now, one of the interesting things here is that that worked for a lot of the various parts of the early A Cloud Guru system, except for one key piece, which was payment processing. We needed to have a secure, authenticated execution environment, so that when a payment came in through something like Stripe, we could process it securely and write into the database that this person actually did purchase this course. And at the time, serverless was really in its infancy and Functions as a Service was as well. So we actually looked at Auth0, and Auth0 had this thing, I think it was called a webtask, which we essentially used for that processing. We also had a lot of videos on CloudFront and S3, and the transcoding pipeline through that as well, which managed all the transcoding of all of the videos in A Cloud Guru. And as we were writing this thing into a webtask on Auth0, AWS announced Lambda. So this is back in 2015. So we had our Firebase back end, we had a couple of Lambda functions and a really big Angular 1.x front end. And this was really awesome: we didn't have to deal with all these API layers or anything like that, and it was something that was completely managed in AWS. So we really had this mantra of buy over build, where we would essentially leverage as many services as we possibly could, with Functions as a Service for compute on our back end, to really put this school together, and we essentially wanted to avoid all of the undifferentiated heavy lifting associated with having to maintain cloud servers and infrastructure and all of that sort of stuff. So the story was a success, and A Cloud Guru did launch with a completely serverless school, and that wouldn't have been possible with a serverful model, because each of those other things, authentication, an API, auto scaling, could have taken potentially three weeks by themselves to implement.
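As an illustration of the kind of secure back-end hook being described here, a minimal hypothetical sketch of a payment-webhook function; the event shape, field names and helper are invented:

```javascript
// Hypothetical sketch of the payment-processing function described above:
// a single serverless function that receives a payment event (e.g. from
// Stripe) and records the purchase in the database. Names are illustrative.
exports.handler = async (event) => {
  const payment = JSON.parse(event.body);

  // A real handler would verify the webhook signature here (e.g. Stripe's
  // signing secret) before trusting the payload.

  await recordPurchase({
    userId: payment.metadata.userId,
    courseId: payment.metadata.courseId,
    amount: payment.amount,
  });

  return { statusCode: 200, body: 'ok' };
};

// Stand-in for the database write (Firebase at the time, per the episode).
async function recordPurchase(purchase) {
  /* write { userId, courseId, amount } to the data store */
}
```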

Yan Cui: 08:30

That's exactly the kind of story that I like to hear. It really breaks my heart when I hear about loads of startups going down the Kubernetes route, and spending six to 12 months just getting some MVP out, when you can do what you guys have done and just get something out and working in a couple of weeks, and then go from there. And since then, I guess, you guys are now a much bigger company; you have acquired Linux Academy. So how has that transition over the last couple of years been? What were maybe some of the pain points? Was there anything that you guys had to do really differently in terms of how you organise yourselves and how you manage your code? And maybe we can talk about how your architecture has evolved over time as well.

Dale Salter: 09:14

Yeah, awesome. So, I want to preface this with: I think the biggest challenge that we had throughout this entire process was latency. Serverless systems tend to be quite a lot slower, potentially, especially if you're not careful, than perhaps their serverful counterparts. So a lot of the architectural decisions we made really kept performance and latency in mind. Based on what I was talking about before, we had a Firebase database, we had an Angular 1.x application, and a couple of serverless functions. And as we started building out the team, I think at the time we had this kind of architecture there were only about three or four of us, and we knew that this was not going to scale up in terms of developers working on it. So this system was completely elastic and scalable in terms of the number of users we could have, but it was not going to scale out with the development team. At this time, we were also starting to think about building out a mobile application. And we knew some of the shortcomings of this initial architecture, this initial MVP, were that our front end was a thick client with all the data access logic and all the services inside an Angular application, and there was no way we were going to be able to reasonably share that with a mobile application. The other shortcoming of Firebase was that it had no transaction support. All of the data access logic was running in the front end, which is not a reliable or consistent compute environment. So if you had network issues or something like that, it could potentially mean that the write you were performing to the database wasn't completed. So at this time, we decided that we'd actually start investing in building out serverless microservices. And initially we started with HTTP and REST. This was a very traditional thing that we had used at other companies that we had worked at, and we were very comfortable with using REST. And that worked really well initially, but considering we were starting to build out an API, and GraphQL at the time was getting a lot of attention, we thought, hey, GraphQL is almost like a superset of the functionality of what REST gives us, but we actually get a lot of the fancy things like types, and you get this idea of an application data graph and all of these other awesome things that you don't get with REST. And also, it works really, really well with mobile applications, because you only have to fetch the data that you need and what the clients specifically are looking for. So as we evolved the API over time, we weren't returning additional properties to the mobile application that it didn't need. So, we actually transitioned from our old thick client, with a database that it was accessing directly, into a microservices model. And the way we did this was we used what's called a strangler pattern. Essentially, what that meant was every time we built new functionality, we asked the question: “Would we implement it in the old way, which is the front end and the Firebase database, or would we write it completely new in a microservice?” So that was the way that we slowly transitioned it over. And at the time, AppSync wasn't a thing yet, this is back in 2016, so we were actually running, and we still do this today, GraphQL-js on the Lambda function itself.
And you know, API Gateway fronts that, but all of the business logic is being run within a Lambda function using the GraphQL-js library. So, at this time, we knew that we had all these different microservices, which was great, but GraphQL actually requires, or encourages, you to have all of your data accessible from a single endpoint. The idea is that all your clients only know one URL that has your whole API on it. But we also wanted microservices which had bounded contexts. So we wanted to have a service that was specific for checkout, and a specific service for courses and the student course-taking experience, and one for web series and transcoding, and stuff like that. The way we accomplished that was through what we lovingly call our back end for front end, or our BFF. And the way this architecture worked was that the BFF had GraphQL-js in it, and the resolvers would call into Lambda functions that each microservice owned. So the interface to a microservice was a specific Lambda function name. And then when you called into the BFF, the BFF would run GraphQL-js and delegate getting the data down to the microservices that owned that data. This model worked fairly well for quite some time, but one of the things that we ran into was that it was essentially really slow. Because at the time, a given query might need to hit four Lambda functions, and sometimes they were serial, so you could potentially run into four cold starts in your worst case. So your p99 was really, really bad with this model, in the case where you do hit those cold starts and you hit four of them at once, which could happen in some scenarios. And we also knew that every time we had to extend the API or build new functionality, there were two services we had to touch: the BFF, and then the downstream microservice which had that functionality. And that was essentially slowing us down, right; the lead time to make a change and deliver functionality to our customers became slower due to this architecture, not faster. So what we ended up doing, keeping in mind performance and being able to deliver really quickly, was we had microservices expose their own GraphQL endpoints, and then all the BFF did was stitch all our microservices together, so we could still serve up all of the application APIs to the front ends. It would actually stitch them all together at runtime and then, for certain parts of the query, delegate those down to the microservices which were responsible for them. So we moved from this model of essentially having one microservice with lots of Lambda functions, which could have cold starts, to each microservice being one Lambda function. And this really improved our performance, because at most you would essentially only pay two cold starts. And it was very unlikely you'd have to pay two, because the BFF Lambda functions were always quite hot. So this is essentially how we addressed both the slowness and also being able to deliver really quickly. And then sometime after that we also introduced async messaging with SNS and SQS, but I'll talk about that later.
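As a rough illustration of the pattern Dale describes, GraphQL-js executing inside a single Lambda function fronted by API Gateway might look like this; the schema and resolver are invented:

```javascript
// Minimal sketch: GraphQL-js running inside one Lambda function, fronted
// by API Gateway. The schema and data are invented for illustration.
const { graphql, buildSchema } = require('graphql');

const schema = buildSchema(`
  type Course {
    id: ID!
    title: String
  }
  type Query {
    course(id: ID!): Course
  }
`);

// Resolvers for the root fields; a real service would hit DynamoDB etc.
const rootValue = {
  course: ({ id }) => ({ id, title: 'Intro to Serverless' }),
};

exports.handler = async (event) => {
  const { query, variables } = JSON.parse(event.body);
  const result = await graphql({
    schema,
    rootValue,
    source: query,
    variableValues: variables,
  });
  return { statusCode: 200, body: JSON.stringify(result) };
};
```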

Yan Cui: 17:08

Thank you, that was a really detailed walkthrough, and I loved a lot of the details that you put in there. I think there are a lot of things for us to drill into here. For example, the schema stitching: at one point you talked about using some of these experimental specs in the GraphQL space. I think the latest one is now called schema federation.

Dale Salter: 17:28

Yes, correct. Yep. So we even went one step further here, in that the BFF doesn't actually have knowledge of the downstream services. We created our own federation service, essentially, and what happens is, each time a request comes in, the BFF would first call out to the federation service, and it would get all of the information for the microservices that the BFF was responsible for, or could delegate to. It would introspect the data that came back, and then know which services it could delegate portions of that query to, through that federation that you're talking about. So we essentially went one step further, in that every time we added a new microservice, we didn't have to change our BFF; we just registered it with that federation service.
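A rough sketch of that registration and delegation flow, with invented names; the registry interface stands in for the federation service:

```javascript
// Rough sketch of the federation flow described above. The registry
// interface, service shapes and function names are all invented.
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

// Each microservice registers itself with the federation service,
// e.g. at deploy time.
async function registerService(registry, service) {
  // service: { name: 'course-service', lambdaName: 'course-graphql', sdl: '...' }
  await registry.put(service.name, service);
}

// On each request, the BFF asks the federation service which
// microservices exist and which parts of the graph they own, then
// delegates sub-queries to the owning Lambda function.
async function buildExecutors(registry) {
  const services = await registry.list();
  return services.map((service) => ({
    name: service.name,
    sdl: service.sdl,
    execute: (query, variables) =>
      invokeLambda(service.lambdaName, { query, variables }),
  }));
}

async function invokeLambda(functionName, payload) {
  const res = await lambda
    .invoke({ FunctionName: functionName, Payload: JSON.stringify(payload) })
    .promise();
  return JSON.parse(res.Payload);
}
```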

Yan Cui: 18:22

Okay, gotcha. And in this case, is AppSync something that you would consider in the future? I guess not for the BFF, but for the microservices, because I guess that means, potentially, a lot of the time you can cut out that one Lambda invocation that you have in the microservice, because AppSync can probably do the job for you and go straight to DynamoDB, or whatever supported data source it has.

Dale Salter: 18:46

Yeah, so the great part about this architecture is that AppSync is a GraphQL-compliant API service, so if it were something that we wanted to use or experiment with, we could very well stick it behind our BFF and it should just transparently work. We could have some microservices still using GraphQL-js and some services using AppSync. At the moment we haven't gone as far as to look into AppSync too much. This may have changed since we last looked into it, but I think some of the issues that I personally understand with AppSync can be that you have these DSL files, and you potentially relinquish some of the control that you'd have if you were running GraphQL-js within Lambda entirely. But what I do think AppSync could be really good for is rapid prototyping of a new microservice or a new back end, to prove a proof of concept, because you don't need to write a lot of that boilerplate; a lot of those integrations directly hook into, like you were saying, DynamoDB, or you can write a Lambda resolver, or they can hook into things like Serverless Aurora and stuff like that. So it's definitely something that we're still going to look into.

Yan Cui: 20:11

Yeah. I've been working with AppSync on quite a few different client projects now, and I have to say, the developer experience is great; it's really, really quick to get something running. I've got some reservations around the Amplify CLI at the moment. I think it makes too many decisions that I don't always agree with, but working with AppSync itself has been a really good experience for me. Another question I've got around your use of GraphQL is: how do you handle some of the common challenges with GraphQL, things like over-fetching and under-fetching?

Dale Salter: 20:44

Yeah. So, over-fetching and under-fetching. I guess what we essentially do here to avoid this idea of over-fetching is implement dynamic resolvers within GraphQL to only fetch the data that you need. In theory you shouldn't need to over-fetch more than what you need within GraphQL, because each entity maps to a resolver, and usually you're going to load the whole entity and then only return a projection of that entity to the client. So you're not returning some big graph of data and then only mapping to a tiny subset of it. You should only be fetching the entities that you actually need to resolve an individual request.
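To make that concrete, a small sketch with an invented schema and in-memory data: each entity maps to a resolver, and GraphQL only runs the resolvers the query asks for and only returns the selected fields:

```javascript
// Sketch: each entity maps to a resolver; related entities are only
// loaded if the query selects them. Schema and data are invented.
const { makeExecutableSchema } = require('@graphql-tools/schema');
const { graphql } = require('graphql');

const typeDefs = `
  type Instructor { id: ID! name: String }
  type Course { id: ID! title: String instructor: Instructor }
  type Query { course(id: ID!): Course }
`;

// In-memory stand-ins for the real data sources.
const courses = { 42: { id: '42', title: 'Intro to Serverless', instructorId: '1' } };
const instructors = { 1: { id: '1', name: 'Dale' } };

const resolvers = {
  Query: {
    // Loads exactly one Course entity for this request.
    course: (_, { id }) => courses[id],
  },
  Course: {
    // Only runs when the query selects the instructor field.
    instructor: (course) => instructors[course.instructorId],
  },
};

const schema = makeExecutableSchema({ typeDefs, resolvers });

// This query never triggers the instructor resolver, and only `title`
// comes back in the response, i.e. a projection of the Course entity.
graphql({ schema, source: '{ course(id: "42") { title } }' })
  .then((res) => console.log(JSON.stringify(res.data)));
```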

Yan Cui: 21:34

Yeah, I guess, if you're careful when you're building your BFF and the data model, but also when you are building the client. But one of the things that I always worry about in the back of my head is: what if an attacker is looking to launch some attack against my system? The fact that they can hit a lot of different resolvers from a single entry point, asking for one piece of data and then for related data by traversing the data graph, means they can increase the magnitude of the attack by hitting me with some over-fetching queries. Is that something that you guys have ever thought about or worried about?

Dale Salter: 22:13

Yeah, so we know that GitHub does a lot of stuff here. The way that they work, at least from what I understand, is that they attach a complexity metric to each of the resolvers within GraphQL, and they have a certain level of nesting that you can actually resolve to. So whilst we haven't found a need to do that within A Cloud Guru just yet, there are definitely ways that we could implement that if we do need those sorts of things. I guess one of the nice parts of a serverless system is that it should in theory be completely elastic, so if we do get attacked, a lot of the system should be able to dynamically scale up and down to deal with that. But at the same time, it would be more desirable to be able to prevent that query from even executing at all. That's definitely something that we need to look into.
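As a sketch of the kind of guard being described, here is a hand-rolled depth check run on the parsed query before execution (libraries such as graphql-depth-limit exist for this; the version below is only to show the idea and ignores fragments):

```javascript
// Rough sketch: reject queries nested deeper than a limit, before
// executing them. A production version would also handle fragments.
const { parse } = require('graphql');

function queryDepth(node, depth = 0) {
  if (!node.selectionSet) return depth;
  return Math.max(
    ...node.selectionSet.selections.map((sel) => queryDepth(sel, depth + 1))
  );
}

function assertDepthWithin(source, maxDepth) {
  const doc = parse(source);
  for (const def of doc.definitions) {
    if (queryDepth(def) > maxDepth) {
      throw new Error(`Query depth exceeds limit of ${maxDepth}`);
    }
  }
}

// assertDepthWithin('{ course { instructor { courses { title } } } }', 3)
// throws (depth 4), while a flat query like '{ course { title } }' passes.
```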

Yan Cui: 23:10

Yeah, that's why, I guess, in the serverless world we are now calling denial of service attacks “denial of wallet” attacks. Because the system can force its way through the attack, but it's going to cost you a lot.

Dale Salter: 23:22

Yeah, funnily enough, within serverless, I one day introduced this bug within our video player that essentially caused each of our students to continually hit a serverless GraphQL back end, and it would do that every second. And we had, you know, a few hundred students watching that video at the same time. So we ended up DDoSing ourselves, but one of the cool parts of it is that it actually made the application faster rather than slower, because of that DDoS. You know, it wasn't sustainable, because we were essentially calling our back end far more than we should have, but it's quite a funny side effect of a serverless system that it gets quicker instead of slower.

Yan Cui: 24:10

That's funny, because of the whole cold start thing, now you are eliminating it by DDoSing yourself. And it's funny you mention that as well; Ant Stanley actually did the same thing. We launched an online workshop, it was in May, and he built a front end, and he had a bug where he was making a request to the back end constantly as well, every four seconds. But yeah, while everything was working fine and the system just scaled automatically, then we looked at the bill and said, oh okay, what's going on there? It wasn't huge, but still, it was something that was noticeable, a big spike.

Dale Salter: 24:46

Yeah, that's right. Hopefully it's something that your CloudWatch metrics should be able to pick up and let you know about.

Yan Cui: 24:54

Okay, so the final question I guess around the whole GraphQL side of things is: are you guys using GraphQL subscriptions at the moment?

Dale Salter: 25:03

So, GraphQL subscriptions are definitely one of the things that we haven't been able to do. Essentially, because when we first built our GraphQL implementation, API Gateway didn't actually support WebSockets, and you need some way to do some kind of pub/sub model to implement GraphQL subscriptions, because GraphQL subscriptions are only a specification; they don't actually talk about the implementation. So this is really where AppSync would have been really, really nice for us to use, because you get subscriptions through AppSync. At the moment we haven't really reinvestigated implementing subscriptions with API Gateway WebSockets, because we haven't had a piece of functionality that would really need them. But it's certainly something that would be really, really nice to be able to do: having people subscribe to GraphQL queries and then having the UI dynamically update once the underlying data store has changed. So it's something that's really, really exciting for us, and when the right opportunity comes up, I'm super keen to get onto it and use it.

Yan Cui: 26:17

Yeah, something like that would be really easy to do with AppSync, but probably quite hard to do with API Gateway, just because the way API Gateway implements its WebSockets is as a very low-level construct. You have to keep track of the connection IDs and all of that yourself, and the mapping of who's subscribed to what content. Certainly from what I've seen, the whole subscription thing has been really easy, and I guess it kind of brings back some of the power you had when you were running on Firebase as well: the client can subscribe to changes in the database, and be notified easily when something has updated.

Dale Salter: 26:56

Yeah, that's correct. And when we were moving away from Firebase, we ended up giving up a lot of that, the subscriptions that we had. And there were a lot of things that Firebase managed for us under the hood that we then had to manage ourselves going into an API sort of model, things that Firebase and the Firebase SDK would deal with for us. So there are things like pipelining and caching and concurrency and all that sort of stuff.

Yan Cui: 27:26

I want to take a moment to give a shout out to this week's sponsor, ChaosSearch. ChaosSearch is the fully managed log analytics platform that uses your Amazon S3 storage as the data store. Companies like Armor, HubSpot, Alert Logic, and many more are already using ChaosSearch as a critical part of their infrastructure and processing terabytes of log data every day. Because ChaosSearch uses your Amazon S3 storage, there is no moving data around, no data retention limits, and you can save up to 80% versus other methods of log analysis. So, if you're sick and tired of your ELK stack falling over, or having your data retention squeezed by increasing costs, then visit ChaosSearch.io today and join the log analysis revolution.

Okay, and let's switch gears a bit and talk about some of the microservices, and how you organise them into repos and CloudFormation stacks, and basically how you manage the whole development and release cycles for those microservices, since now you've got quite a lot of engineers working on them. So I imagine you have to think about how many engineers are touching the same codebase at the same time and all that stuff, as you scale the team.

Dale Salter: 28:41

Yeah. So, in the very early days we essentially didn't have this idea of teams; it was just individual developers all chaotically working on features. And as we've grown, we've expanded from that model of having essentially single-person teams into having teams that are responsible for specific domains. Those are cross-functional teams: each team has a PM, a designer, a lead and then three developers. And those teams are responsible for individual domains. So one team might be responsible for things like our students; another might be for organisations, courses, mobile, and all those things. Now, each of those teams actually works out of the same repository, so at A Cloud Guru we use a mono-repo. And the way that works is we use a service called Buildkite, and every time a change happens in that repo, we run a little detection script to look at what has changed in a folder and then figure out if we need to do a redeployment based on the change within that folder. You can kind of think of that folder as a project, so you've got your mono-repo, and your mono-repo has lots of front-end and back-end projects. And then, based on those changes, it'll only deploy the microservice where the change exists. We use trunk-based development for that. Every time a change happens and gets merged into master, we push it into our testing environment, and then we do push-button deploys into production from there, you know, only after the linters run and the automated tests and so forth.
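A minimal sketch of that kind of detection step, assuming an invented services/ folder layout; the Buildkite specifics and deploy command are placeholders:

```javascript
// Minimal sketch of a mono-repo change-detection step: diff the merge
// against the previous commit and deploy only the projects whose folders
// changed. Folder layout and deploy command are invented for illustration.
const { execSync } = require('child_process');

const changedFiles = execSync('git diff --name-only HEAD~1 HEAD')
  .toString()
  .trim()
  .split('\n');

// Treat each top-level folder under services/ as a deployable project.
const changedProjects = new Set(
  changedFiles
    .filter((file) => file.startsWith('services/'))
    .map((file) => file.split('/')[1])
);

for (const project of changedProjects) {
  console.log(`Triggering deploy for ${project}`);
  // e.g. hand off to the CI pipeline:
  // execSync(`./deploy.sh services/${project}`, { stdio: 'inherit' });
}
```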

Yan Cui: 30:36

So in this case, this detection script, is that part of what Buildkite does for you, or is it something that you guys had to build yourselves? And does it detect changes that are made to shared code? So maybe not changes to the folder where the project is, but changes to some shared code which would impact one of the microservices.

Dale Salter: 31:00

Yeah, Buildkite is very extensible, and we ended up writing a plugin to add mono-repo support to Buildkite. And the answer regarding those shared packages, or shared code, is we don't share anything across microservices, no code across each of the folders. The only way that we manage shared code is through NPM packages, and we have a specific repository that uses Lerna that has that shared code in it, so each of the dependencies is versioned and packaged together. And you know, if we bump a package, that would change a package.json or a yarn.lock or something like that, and then cause a redeployment of that microservice. So we don't have this weird problem where we change code in one place and it has to trigger a fan-out of microservice deployments. Each of the deployments is consistent, versioned, deterministic and so forth.

Yan Cui: 32:04

That's very interesting, because for a lot of the people that I've spoken with who are doing mono-repos, one of the motivations for going mono-repo is to make shared code easier, so that you can have one PR, one commit, that updates the shared code but also all the services that depend on that shared code, without having to have that separate flow whereby you update the shared code, you publish a new version of that library, and then you go back to the services and update each service to the new version. So what's the reason you guys decided to go against that approach of managing shared code? Is it because you want the team to own the update cycle for when their service is going to adopt the new shared code?

Dale Salter: 32:49

Yeah, that's correct. So, one of the things about shared code is that it's really risky if you change it, and it's really hard to understand the impact, in terms of what bugs it may introduce. If you update a piece of shared code and have it propagate across 10 different microservices, and you don't understand each of the microservices that uses that shared code, you potentially don't understand if you've broken something for someone else, right? So having it versioned and packaged means that teams can opt into it when they're ready to adopt a piece of shared code, rather than it being done for them without understanding the impact. So that's predominantly the reason we still use that package style of versioning libraries as NPM modules. But the advantage of still having the mono-repo is that you can very easily see, across the entire application, what changes have been made by each developer. And you don't have to wonder, within your GitHub organisation, what microservices are being deployed to production; you can just look at one place and it has the entire state of the world, all in one repository for your entire organisation, rather than having them scattered across different repositories.

Yan Cui: 34:24

Okay, yep, that makes sense. And what about the communication between different microservices? I think you mentioned SNS and SQS before. Is that basically how you are communicating between different microservices?

Dale Salter: 34:39

Yeah, so for synchronous flows between microservices, we use GraphQL to manage those communications, and the advantage of using GraphQL is that you have a typed interface that you can use to communicate between those services. And there are a lot of really cool things you can do with GraphQL. When the consumer connects to the producer, or the server, and requests a set of data, it helps with being able to deprecate things, because you know which individual fields are being fetched by the consumer. The way that works is the consumer has to identify itself, as in “I am X service and I'm requesting Y data”. And then when you go to change that data on the server, or in this case on a Lambda function that's running GraphQL-js, you know what is being consumed and which microservices are depending on you. So you could in theory build a graph from that. So that's how request-response is done, through GraphQL. Now, we've also started moving towards, and this is something that helps performance as well, SNS and SQS. So microservices can in theory subscribe to data from other microservices; they can cache that data, they can transform it, anything like that, and within a DynamoDB table they can keep that cached or transformed data. And then, anytime the owning service changes the data that has been cached, it can push a notification and say, “Hey, your data is no longer valid”, or, “Here's your new data, store it.” And one of the advantages to this is that making a service-to-service call can be 150, 200 milliseconds, whereas if you're just pulling that data directly within your own service from a DynamoDB table, you've got single-digit millisecond latency. So it definitely improves performance. And the other advantage to using that SNS/SQS model is that you get improved resiliency: if the upstream service goes down for whatever reason, you can still serve requests from the data store that you're keeping locally.
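A rough sketch of that pattern, with invented topic and table names: the owning service publishes a change event, and the consuming service keeps its own DynamoDB copy so reads stay local:

```javascript
// Rough sketch of the async caching pattern described above. The topic
// ARN, table name and event shape are invented for illustration.
const AWS = require('aws-sdk');
const sns = new AWS.SNS();
const dynamo = new AWS.DynamoDB.DocumentClient();

// Owning service: publish an event whenever its data changes.
async function publishCourseUpdated(course) {
  await sns
    .publish({
      TopicArn: process.env.COURSE_EVENTS_TOPIC,
      Message: JSON.stringify({ type: 'COURSE_UPDATED', course }),
    })
    .promise();
}

// Consuming service: a Lambda subscribed via SQS keeps a local copy, so
// reads become single-digit-millisecond DynamoDB lookups instead of
// 150-200ms service-to-service calls.
exports.handler = async (event) => {
  for (const record of event.Records) {
    // SNS messages delivered through SQS arrive wrapped in an envelope.
    const message = JSON.parse(JSON.parse(record.body).Message);
    await dynamo
      .put({ TableName: process.env.COURSE_CACHE_TABLE, Item: message.course })
      .promise();
  }
};
```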

Yan Cui: 37:30

Okay, I've got an interesting question there. With something like AppSync, you can say, for different queries or operations or mutations, you can have different authentication modes for the same GraphQL schema. You can have some operations that are dedicated to inter-service calls, that are expected to come from another microservice, and some of them are coming from the user, so they're authenticated by Cognito, for example. So in this case, you've got microservices that are potentially being accessed through the BFF, and then there are also other query operations that are accessed by other microservices directly. Do you have to make some distinction in terms of the authentication model there? Or do you have to basically replicate what AppSync does?

Dale Salter: 38:10

Yeah, so we have to replicate what AppSync does. The way that the front end talks to the back end is through Auth0 and a client-side JWT, and that hooks into the BFF. Once you're within the BFF environment you're within a secure area, and the way that services talk to each other is using a JWT that comes from Cognito. So the service-to-service calls are authenticated with the JWT that's coming from a Cognito pool. The way that we manage the different types of resolvers, and whether they're being called from the front end or as a back-end-to-back-end call or something like that, is that we use GraphQL directives to basically say who can access this. Can it be a viewer, which is the front end? Is it a server, which basically says it's a server-to-server call? Or is it a role within the system? We have internal roles at A Cloud Guru that might be an editor or an instructor or something like that. So we've implemented our own way of managing that problem, without using AppSync.
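A hedged sketch of what that directive-based model can look like; the directive, role names and wrapper below are invented for illustration, not A Cloud Guru's actual schema:

```javascript
// Invented sketch of directive-based access control in the schema.
const typeDefs = `
  directive @auth(allow: Role!) on FIELD_DEFINITION

  enum Role { VIEWER SERVER EDITOR }

  type Course { id: ID! title: String }

  type Query {
    myCourses: [Course]     @auth(allow: VIEWER)  # front-end callers
    courseInternals: Course @auth(allow: SERVER)  # service-to-service only
    draftCourses: [Course]  @auth(allow: EDITOR)  # internal staff role
  }
`;

// One simple way to enforce it: wrap resolvers so the caller's JWT claims
// (Auth0 for end users, Cognito for service-to-service, per the episode)
// are checked before the real resolver runs.
function withAuth(requiredRole, resolve) {
  return (parent, args, context, info) => {
    if (!context.roles || !context.roles.includes(requiredRole)) {
      throw new Error('Not authorised');
    }
    return resolve(parent, args, context, info);
  };
}
```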

Yan Cui: 39:34

Okay, got it. Yeah, I think AppSync makes that easy as well, especially when you're using Cognito groups, which is something that I've really struggled to replicate with API Gateway, because you kind of end up having to write a custom Lambda authorizer and all of that stuff as well. AppSync just makes that a lot easier.

Dale Salter: 39:55

Yeah, that's right. 

Yan Cui: 39:56

Yeah, and I guess the final thing I want to get from you is: what are your top three AWS wish-list items? Is there anything that you'd really love to see AWS improve in the serverless space?

Dale Salter: 40:10

Absolutely. So the first of the items that I'd like to see the serverless teams implement is within Lambda functions. At the moment, Lambdas don't have this idea of being able to run a background job and respond to API Gateway at the same time. What I mean by that is, say you have a bunch of analytics or telemetry. With those analytics or telemetry, you have two options. One is that you can wait for those calls to come back, when you're firing async jobs, before you respond to API Gateway. Or you can respond to API Gateway immediately, and then those pieces of telemetry will get sent in the next Lambda invocation to that same container. The tricky part there is you can get the performance of being able to respond as fast as you can, in request-response, but you could potentially lose pieces of telemetry if that container doesn't get spun up again. So what would be really nice is for Lambda to be able to respond to the browser, or to whoever's calling you, but also still have those pieces of telemetry get sent off after the callback or the promise in your handler has finished.
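To make that trade-off concrete, here is a hypothetical handler (all names invented): either you await the telemetry call and add its latency to every response, or you buffer it and hope the same container gets invoked again:

```javascript
// Illustration of the trade-off described above (names invented).
const buffered = []; // module scope: survives between invocations of the same container

exports.handler = async (event) => {
  // Option 1: flush anything buffered by a *previous* invocation. If this
  // container is never invoked again, whatever we buffer below is lost.
  if (buffered.length) {
    await sendTelemetry(buffered.splice(0));
  }

  const response = doBusinessLogic(event);

  // Option 2 would be to `await sendTelemetry(...)` right here instead,
  // which guarantees delivery but adds its latency to every response.
  buffered.push({ event: 'request_handled', at: Date.now() });

  return { statusCode: 200, body: JSON.stringify(response) };
};

function doBusinessLogic(event) {
  return { ok: true };
}

async function sendTelemetry(items) {
  /* POST items to the telemetry back end */
}
```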

Yan Cui: 41:40

Yeah, the lack of background processing has been a pain in the butt for all the different vendors. I do a lot of work with Lumigo, and all the other different vendors have the same problem as well: to collect telemetry and traces about your application, it can end up eating into your invocation time, because there's just no background processing. But I think something may be coming in the future to address this, because it's something that's been asked for for a very long time now. Or the fact that you can only have one subscription filter for CloudWatch Logs, which is also a pain point.

And so, Dale, thank you so much for joining me today and for sharing your experience and your journey with A Cloud Guru. It has been really insightful, and I'd love to hear more about what you guys are doing in the future.

Dale Salter: 42:31

Yeah, thank you very much for having me on. And it was a lot of fun.  

Yan Cui: 42:34

So final thing before we go, how can people find you on the internet? And do you have any sort of personal project that you want to tell us about? 

Dale Salter: 42:41

So, if you want to find me on the internet, I'm on Twitter. My handle is enepture, E-N-E-P-T-U-R-E. Or you can find me on LinkedIn as Dale Salter. And as for projects I'm working on: currently I'm just tinkering with Serverless Aurora, just playing around with that, seeing how RDS works with a serverless model. So nothing too interesting there just yet, but I'll be sure to share it around once I've finished tinkering.

Yan Cui: 43:13

Excellent. Looking forward to reading that blog post. 

Dale Salter: 43:19

Thank you.  

Yan Cui: 43:20

Okay, man. Take it easy. And again, thank you so much for joining us today and stay safe.  

Dale Salter: 43:25

Thank you.  

Yan Cui: 43:39

That’s it for another episode of Real World Serverless. To access the show notes and the transcript, please go to realworldserverless.com, and I’ll see you guys next time.