#33: Serverless at Custom Ink with Ken Collins

Real World Serverless with theburningmonk

A podcast where we talk about real-world use of Serverless technologies from engineers who work with them day-to-day. We will discuss use cases, why they chose serverless and the pain points and challenges they face. If you want to know what it's REALLY like to work with serverless, this is the show for you.

October 14, 2020 • Yan Cui • Season 1 • Episode 33

You can find Ken on Twitter as @metaskills.

To learn how to build production-ready Serverless applications, go to productionreadyserverless.com.

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

Opening theme song:
Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday
License: http://creativecommons.org/licenses/by/4.0

Yan Cui: 00:13

Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real world practitioners and get their stories from the trenches. Today I'm joined by an AWS hero, Ken Collins from Custom Ink. Hey, Ken, welcome to the show.

Ken Collins: 00:28

Thank you so much. How are you doing?

Yan Cui: 00:30

Yeah, it's good to finally meet you virtually. So, you've been working at Custom Ink for quite a few years now, and you've seen this whole transition to serverless. Can you maybe paint us a picture of who you are and who is Custom Ink, and your journey towards serverless?

Ken Collins: 00:47

Yeah, absolutely. And, you know, first, thank you so much for having me on the show. I've looked up to a lot of your articles, and I think you really helped me on my personal serverless journey, and I just wanted to make sure I thank you ahead of time for that, because all the work that you do, I believe with Lumigo, I'm not sure if I pronounced that right, has just been immensely helpful. More so your deep dives into all these topics. I just really appreciate you having me on and all the work that you've done for the community as well.

Yan Cui: 01:16

Thank you, really appreciate that.

Ken Collins: 01:18

Yeah, it definitely doesn't go unseen, I do appreciate it. About Custom Ink, let's see, I might be on my eighth year there. And personally I'm on the second year of sort of retooling my personal skill set to learn to use the cloud well. I think if you had asked me a couple of years ago, “hey Ken, do you know the cloud?”, I probably would have told you I've made an S3 bucket, I pretty much know the cloud really well, and I know what EC2 means. And the more I learned, the more I realised I didn't know a lot of things. And that's really where my career has been over the past two years. If you frame that with Custom Ink, Custom Ink is a business that prints t-shirts and has been in business for almost 20 years now, maybe 20 as of this date, and there's a lot of history at Custom Ink with Rails and Ruby. And I think when I joined about eight years ago, we had only just started to understand the breaking points of our monolithic applications. Traditionally, there would be a big front end and a big back end and maybe a legacy Java back end. And, you know, I watched us move from having servers manually installed by us to, maybe about three or four years ago, our big lift and shift into the cloud. And I've slowly been adopting and learning serverless and optimising a lot of our more performance-critical workloads that are under stress as we start pulling them out of these bigger applications.

Yan Cui: 02:50

I see. So, making that transition to serverless, what were some of the early pain points? One of the things that I hear a lot from customers who are going through a similar journey to the one you guys went through a couple of years ago is that, suddenly, as a web developer, you have to learn a lot more. Whereas before you just wrote your own application code and a lot of the infrastructure was handled by other teams, now every developer has to know about all these different AWS services, and everyone has to suddenly be capable of wearing different hats. You have to know about architecture and be, I guess, responsible for those decisions, and you have to know about DevOps and infrastructure as code, and all of a sudden it puts a lot more onus on the developers. Is that something that you guys experienced as well?

Ken Collins: 03:36

Oh yeah, and that's ongoing too. I don't think we've reached our finish line yet, and if there ever was one it's probably at least a few years out. I think that's a part of some of these processes that we talk about, DevOps and cloud adoption and whatnot, that we might not talk about honestly enough, which is that this is a process that you really have to work at, and it's really hard. There are some outcomes that we all wish this had, but it's not turnkey and it takes a while. For us, if I can frame how we've approached it today, one part is understanding where we got serverless right. I think we have a testimonial on the AWS Lambda page about our usage of it. We took this one Rails application that was extracted from another Rails application, and we saw that we needed a rendering path for our clip art service, which you would experience if you were in our design lab. Resizing and adding clip art from our collection would call out to the ImageMagick rasterizer, bring the result back into the JavaScript application and render it for the browser, and we had to move that to a more performant, more scalable system, where we didn't have to keep scaling up these EC2 instances and whatnot. We got an immediate cost saving, as well as a capability expansion, and for two years we've just been duplicating that story over and over again. That then brings on these other conversations: where do you apply that knowledge, what standards do you adopt on, say, serverless framework vs SAM, and how do you go out and evangelise within the company that these are things that you can do?
And then once you're there in the cloud with these frameworks, what capabilities do you have that weren't there before and that make solving other problems easier? It's just an ongoing process. It never stops.

Yan Cui: 05:37

Certainly with AWS, everything is changing, everything is evolving all the time, and there's never an end to your learning. In fact, as you get closer to re:Invent they're going to announce more and more things. It gets pretty hectic every year just keeping up with all the feature announcements and new services that come out around this time of the year. But you mentioned something earlier, that you moved to serverless and immediately got some scaling benefits and some performance benefits. Was there a trigger? Was there a particular problem with an application, whereby you were running into scalability issues or other issues, that made you have a look at serverless at that time?

Ken Collins: 06:17

Yeah, absolutely. I think of the engineer before me who implemented that particular Lambda for our clip art library. Even though that engineer was embedded in our Web Ops team at the time, he had this capability to sort of shoot past a lot of our processes with Lambda, right? Lambda allows you to get this code working out there faster than it normally would take, even if you're in the cloud. I would imagine for anybody in a company of more than 50 or so people, there are going to be processes in place for, maybe, spinning up an EC2 instance or other things like that. And serverless allowed us to get this really good architectural win with very little effort compared to what it would traditionally take to properly architect something, spin it up and test its viability.

Yan Cui: 07:15

So one of the promises of serverless is that you can do things faster, and it sounds like you guys have definitely benefited from that, given that you're talking about being able to test features or ideas in the market quickly. Is that something that you have seen at Custom Ink as a whole, whereby as more and more teams adopt serverless, things are just getting delivered faster and, because of that, you're getting more adoption and momentum within the company?

Ken Collins: 07:43

Yeah, so it happens slowly. Like I said, if I describe where we are on a scale of one to 10, I'd probably put it at a 2, and I'd like to be at 10. But then again, I live and breathe everything Lambda, and for me it's hard not to see every problem being solved by it, or at least to focus on an architecture that always utilises it. I think the one use case that I haven't really seen us exercise a lot, that's very nascent and just ready to be tapped into, is this internal usage by your teams, a sort of entrepreneurship, where you can take either a new business idea or a new application to help an internal team or some process and just spin that up on Lambda. We're only starting to get into that. We've done a few of them and we can see the value. But again, it takes a while to go out to the different teams, train them and let them know that they have these tools that can do this. Otherwise they're just going to keep doing the same thing they've always done before.

Yan Cui: 08:48

And speaking of tools, you also mentioned earlier choosing between serverless framework and SAM. What did you guys decide to go with and what's the reason behind it?

Ken Collins: 08:59

Well, as a principal engineer there, I was sort of assigned to go take a look at the serverless thing, try to figure it out more and standardise it for teams. I took a heavy look at serverless framework, and SAM was just coming out right when I started adopting things. So I do think I had an advantage from my point in time. Back in the day, a couple of years before, when you adopted serverless and Lambda, your solutions would be much different from mine, because you were forced into those solutions, right? There was no SAM then. So when I looked at both of them with a fresh pair of eyes, the thing that struck me about the serverless framework is that I felt like I was being forced to learn a couple of things at the same time, and I was having a really hard time doing that. CloudFormation was new to me, Infrastructure as Code was new to me. All these things were so new, and I felt like with serverless framework I had to learn this framework and then the framework underneath it that was driving it, i.e. CloudFormation. And when I looked at SAM, I saw that SAM and CloudFormation were essentially the same thing, so I was like, well, let me just go ahead and learn the raw language. And I feel like that's paid dividends over the past few years, as SAM has been getting better and better. So I adopted SAM and wrote up all of our processes for it, and as SAM keeps getting better we've been buying into it more and more. I realise there are probably people out there who like serverless framework, and by all means use the tool that's right for you, but I've never looked back from SAM. I think it's a great tool.

Yan Cui: 10:43

Yeah. Personally I've always gone with the serverless framework, because it just provides a lot more flexibility. One of the problems I’ve run into with SAM, on a couple of client projects, is that because SAM has no, I guess, equivalent to the plugin system that serverless framework has, I'm kind of stuck with what the framework does and the decisions it makes. There's no easy way for me to disagree with the decisions that SAM has taken without some drastic changes to the framework itself. A couple of times I've had to take a really hard detour, I guess, and write a custom macro, just so that I could change some settings that SAM sets and doesn't give me the option to change. Whereas with the serverless framework I can always override what the framework decides to do before I deploy. That is something that has stopped me from embracing SAM more. Oftentimes I run into things where I just can't do anything and end up hitting a brick wall, whereas with serverless framework there's always an escape hatch.

Ken Collins: 11:51

What was one of the things that you ran into a wall with, with SAM?

Yan Cui: 11:55

So the last one was when they first announced support for AWS IAM authentication for API Gateway. At that time, when you enabled AWS IAM, it also changed how credentials were passed through. It also changed the credentials attribute on the API Gateway method's integration, which meant the caller's credentials were then used to invoke the Lambda function. So as a caller of your API, I had to have the IAM permission to execute the API Gateway endpoint as well as the Lambda function behind it, which just breaks the abstraction layer. They have since changed it so that you can set a value to override the credentials. But I was on a project that I was going to deliver within the next couple of days, a couple of weeks. I couldn't just wait six months, or however long it would take SAM to make that breaking change. So that was one that I remember very clearly: I had a deadline to meet and had to do something drastic and ship a CloudFormation macro, just to change that one line of configuration in the CloudFormation that SAM generates.
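
To make the macro workaround concrete, here is a minimal Python sketch of what a CloudFormation macro handler of this kind could look like. It is not the macro Yan shipped. It assumes the generated template describes each integration under an `x-amazon-apigateway-integration` key with a `credentials` entry, and the helper name `strip_caller_credentials` is hypothetical.

```python
import copy


def strip_caller_credentials(node):
    """Recursively drop `credentials` from API Gateway integration blocks.

    Assumption: the generated template describes each integration under an
    `x-amazon-apigateway-integration` key (the OpenAPI extension).
    """
    if isinstance(node, dict):
        integration = node.get("x-amazon-apigateway-integration")
        if isinstance(integration, dict):
            integration.pop("credentials", None)
        for value in node.values():
            strip_caller_credentials(value)
    elif isinstance(node, list):
        for item in node:
            strip_caller_credentials(item)
    return node


def handler(event, context):
    """Entry point for a CloudFormation macro.

    CloudFormation passes the template (or snippet) in `fragment` and expects
    the same `requestId` back together with the processed fragment.
    """
    fragment = copy.deepcopy(event["fragment"])  # leave the input untouched
    return {
        "requestId": event["requestId"],
        "status": "success",
        "fragment": strip_caller_credentials(fragment),
    }
```

The whole change is one `pop`, which matches the "one line of configuration" described above; the rest is the request/response envelope a macro has to honour.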

Ken Collins: 13:03

Very interesting. You know, one of the counterpoints I've found is that every time I had a blocker with SAM, I've ended up working around it in the process, right? So maybe after the CloudFormation stack deploys, through my own automation scripts, I might do an API call to stitch things together and stuff like that. One thing I recently learned about the difference between SAM and serverless framework is that within SAM it was always very natural to me, with the Policies section, to be very declarative in how I state the permissions that my Lambda has within its execution role. But I learned that this was a plugin that you had to use with serverless framework, so there's probably a lot of give and take. And I will definitely say that I stay a little higher up on the use cases; I've never really done any custom IAM invocation. Most of the stuff that we've been using at Custom Ink is just super dumb HTTP proxies, where we see a lot of value in basically looking at API Gateway as almost an application server.

Yan Cui: 14:08

Yeah, that IAM per function thing has been on my wish list for serverless framework for a very long time, but there also is an argument for serverless that if anything the framework can't do, there's probably a plugin that does it.

Ken Collins: 14:21

Yeah, check. Yeah, I would say that that's probably like just a, maybe an SDK call right after the stack.

Yan Cui: 14:27

Most of the serverless framework plugins modify the CloudFormation that the framework generates. That's what that particular plugin does as well: it lets the framework generate the CloudFormation and then adds its own resources based on your configuration.

Ken Collins: 14:41

Cool.

Yan Cui: 14:42

I guess, so maybe changing the topic slightly. Something that we were talking about just before we started the show was this idea of putting existing frameworks into Lambda. So, taking an existing, say, Express.js app, or a Python Flask app, or the equivalent, which would be Ruby on Rails, and putting it into a Lambda function. That's something that you guys have been doing quite a bit, right?

Ken Collins: 15:06

Yeah, I would say we have a very healthy mix of what some would call proper use of Lambda and what some would call improper use of Lambda. I definitely have a different perspective from most people, in that I don't come from a computer science background and I traditionally do things in a way where there are no rules, only outcomes or goals, right? So I try not to tell people this is right or this is wrong. I always try to advocate for people to figure out what's right for them. What's right for your company and where you are is probably a lot different from where other people are in their company. So, the thing that I tried to look into here came after we had done a couple of these Lambdas, mostly in Node.js, before Ruby was introduced as an official runtime. We had noticed a pattern, or lack thereof, in our Lambda work. As you approached each Node project, how you did routing, which would itself be a taboo to break for a lot of people in Lambda, was different. The approaches would be different, the structure of the project would be different and so on. And one of our engineers asked me the basic question: can there be any more structure in these Lambdas? I don't think he expected my answer to be, well, let me just put Rails in it, but that was my natural first idea. I mean, we're basically just using these things as commoditized web servers and web proxies. So that spawned the idea: with Rack, which is the HTTP abstraction in Ruby, can you convert the event that comes in to a Rack event? And I quickly found that AWS sort of sanctioned this, if you will. They wrote an open source project that took the exact same approach for Express, and basically converted the HTTP event to a compatible event that you could send to the Express application in Node.
And I was like, well, if they can do it I can do it, and I totally did it, and I love it to death. And, you know, there's this conversation to be had that I'm taking certain bets on Lambda. After watching Simon Wardley when I went to my first Serverlessconf, which is probably the only one for now, since I don't think we've had one since October 2019, where he preached about the commoditization of this sort of architecture, I totally buy into that. If you believe that's where the puck is going, then sure, these can be small little Functions-as-a-Service things, but they can also be this incredible architecture that we can build on top of, one that abstracts below the waterline all these things that we don't really care about and yet want, right? Everything from auto-scaling to a commoditized HTTP server, etc. I've been having a lot of success with running Rails on Lambda, and I just think it's a great idea, and it seems to be gaining popularity. AWS itself is advocating for PHP frameworks, and I think we just recently got a Serverless Hero, I forget his name, who is working on the Symfony framework inside of Lambda, and perhaps others have done it as well, not just with Express but maybe with Python too.
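
The event-to-Rack conversion Ken describes has a close analogue in Python, where WSGI plays the role of Rack. The following is a minimal sketch of that idea, not Ken's Ruby implementation. It assumes the API Gateway REST (v1) proxy event shape and text response bodies, and the names `event_to_environ`, `make_handler` and `hello_app` are illustrative.

```python
import base64
import io
import sys
from urllib.parse import urlencode


def event_to_environ(event):
    """Translate an API Gateway proxy event (v1 shape assumed) to a WSGI environ."""
    body = event.get("body") or ""
    raw = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode()
    environ = {
        "REQUEST_METHOD": event["httpMethod"],
        "SCRIPT_NAME": "",
        "PATH_INFO": event["path"],
        "QUERY_STRING": urlencode(event.get("queryStringParameters") or {}),
        "SERVER_NAME": "lambda",
        "SERVER_PORT": "443",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "https",
        "wsgi.input": io.BytesIO(raw),
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
    }
    for name, value in (event.get("headers") or {}).items():
        key = name.upper().replace("-", "_")
        if key == "CONTENT_TYPE":
            environ[key] = value
        elif key != "CONTENT_LENGTH":  # already computed from the body
            environ["HTTP_" + key] = value
    return environ


def make_handler(app):
    """Wrap any WSGI app as a Lambda handler returning a proxy response."""
    def handler(event, context):
        captured = {}

        def start_response(status, response_headers, exc_info=None):
            captured["status"] = int(status.split()[0])
            captured["headers"] = dict(response_headers)

        chunks = app(event_to_environ(event), start_response)
        return {
            "statusCode": captured["status"],
            "headers": captured["headers"],
            "body": b"".join(chunks).decode("utf-8"),  # text bodies only
        }
    return handler


def hello_app(environ, start_response):
    """Stand-in for a full framework application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("hello " + environ["PATH_INFO"]).encode()]
```

Because the adapter only speaks the framework's standard interface, the application itself does not need to know it is running inside Lambda.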

Yan Cui: 18:30

I definitely think it's valuable to have as a way for people to lift and shift an existing application into Lambda and get some of that benefit in terms of scalability and better security, because you don't have to worry about securing the OS or some of the networking side of things, which is so complicated to get right and very easy to get wrong. But I think there's still an argument against having it as your goal, as where you're going to end up, in terms of getting the best performance or security from Lambda. At some point you probably do want to break up your function, instead of having one function that does routing internally and handles every endpoint. An example of that: I had a customer who was doing that, and it was fine, because they managed to port the API to Lambda very quickly and got something out the door, which is great. But they were mixing in some endpoints that had to do server-side rendering, so some endpoints required React as a dependency, and with Node.js the more dependencies you have, the longer your cold start is going to be. That meant every endpoint, even those that didn't require React, which in fact was most of them, still suffered, because the whole function needed to initialise these dependencies, including this big dependency for React. They also had this issue whereby different endpoints needed access to different resources, which means every single endpoint can potentially access everything that the whole API can, and if you talk to the AWS guys, they will often tell you that at some point you do want to break it up, partly for that security reason.
For this particular customer, there was also the problem that one endpoint had to talk to something inside a VPC, which meant the whole API had to be inside the VPC, which back then meant you had the occasional 10-second response time, because VPC added around 10 seconds to the cold start. Thankfully that is no longer an issue, but I do still think that at some point there is enough benefit to do some of that re-architecting, taking your API endpoints and putting them into single-purpose functions as opposed to doing all your routing in one function. However, CloudFormation doesn't really help you here when you have got a large API, because there are all these limits, around 200 resources and a few other things, so I find there are often practical reasons that kind of force you to put routing in your own functions, because you just run into all kinds of different service limits. But I do agree that it is great to have the option to lift and shift your application into Lambda without having to do a big rewrite, because a lot of the time that's just not feasible.

Ken Collins: 21:24

Yeah, and I want to be super clear on this too. I think a lot of people, when they hear us talk about these things, hear “this is right” or “that is wrong”, right? You really have to take that with context. Not everybody is Netflix, not everybody is Custom Ink, and not everybody is, you know, somebody else. For us it was easy to see, maybe after 10 years, that the one application that served our product data probably shouldn't be serving the S3 content as well and doing the binary stuff. So it's easy to say, hey, let's take that out, let's put it in Lambda, let's re-architect the stuff around it so that we can do these things better. But we were very successful for 10 years with that poor architecture. I wager that a lot of people, once they start getting involved with Lambda, whether through microservice optimization on an existing infrastructure or through creating new things, will find these right breakout points and abstractions, sometimes from the get-go, but sometimes maybe a couple of years later, and that's fine. So, one piece of workshop material I did teaches Rails developers, hey, here's a little microservice that you can do with S3 for image resizing. Typically in a Rails application you might use something called Active Storage, and people would just throw this app up on, say, Heroku, and one controller action would be uploading binary data, much like our product system, and would be serving that binary data back out again through a Rails controller with, you know, send_file or something.
And stuff like that, if you want to do it in a big Rails app and at some point later on abstract it out, fine. But those cases are so common in Rails that I thought a workshop was a really good showcase of how you could break that out into its own microservice and get that architectural win. But make no mistake, I think people should do that when they need to do it, right? You've got to bring data to this and you've got to have business value first. Don't let the technical aspects of it and the correctness of it drive you first. Get your business win, get your stuff together, and the good thing is you're not locked into anything. We're engineers, we can do anything we want later on down the road.

Yan Cui: 23:49

Yeah, absolutely. I totally agree with what you said earlier, that you don't advocate for rules but for outcomes, because sometimes it's just good to take on some technical debt, or just do it the hacky way, if it means you get something out and are able to test it. And who knows, maybe once you test the idea, it turns out not to be an idea worth building on. So why spend weeks or months of work trying to build the right solution when you can test the idea first with maybe a couple of hours' work, doing it in a hacky way?

Ken Collins: 24:22

And, you know, I've definitely talked about this on Twitter before, and people differ on it. One of the things with Custom Ink, since we've been around for a while: I think 2014, if I remember correctly, is when Lambda was first introduced, right? And we had a business unit that was successfully started as a whole other business unit with its own Rails monolith, and in 2014 microservices and smashing these things into oblivion was a very popular topic to talk about. What happened for us is that this one business unit went off, took this monolith, which was only six months old, and went to build it right this time. They said, we're going to smash it up into a bunch of microservices, because microservices were important. And, you know, we lost two years of business traction, because that was not the right outcome, that was not where the engineering fit. Eventually we brought the product back in house and back into our core platform with the monolith, and all that work was wasted. So we've got corporate memory around that, and lessons. I always want to advocate that if you're a brand new company, maybe microservices first is right, or maybe we now have the tools to do that right the first time. That may be good, right? Make that choice, look at it, be smart about those decisions. But I've seen it both ways, and there's never one approach. I definitely lean towards small apps to monolith, and breaking things up when business success lets you know you need to break it up.

Yan Cui: 25:59

Certainly I've seen enough, what's the polite way to say this, technical masturbation. I've seen that happen so often. You know, there are teams that believe a certain way of building software is the right way, and you kind of find yourself going down a really deep rabbit hole and losing the big picture, which is that it's all about adding business value. Serverless is the thing that enables us to do that, but at the same time it's a new paradigm, with new tools, new ways of doing things, new ways to just tweak things and try to solve engineering problems rather than solve the real business problem.

Ken Collins: 26:46

Yeah. And then there are fun ones in AWS. I mean, I'm still learning every day. I think we did a project a couple of weeks ago where I did that classic AWS Lambda thing. We had a huge migration of our product image bucket to a new one, and we needed to switch to that abstraction I was talking about, where we pulled the image assets and serving out of our Rails app into a microservice. We computed it out to, I think, literally 147 days of work for a single compute instance. And we did it in an hour, right? With AWS Lambda we architected the events, got everything set up with our queues, kicked off that bucket migration and moved over to the new platform. And that was amazing, and these are the things that you learn when you start adopting this framework. I think if we can get more people from full stack applications into AWS and into Lambda from that point of view first, without telling them, you know, you're doing it wrong, let them do that lift and shift, get them into this ecosystem and get them learning about other ways that they can do things, not so that they can waste time, but so that when they do have to do that migration, or that next big thing on some sort of proper event-based architecture, the capability is at hand, versus, oh, let's lift and shift first.
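
As a back-of-the-envelope check on those numbers: 147 days is 3,528 hours of sequential work, so finishing in about an hour implies on the order of 3,500 workers running at once, ignoring overhead. The sketch below shows that arithmetic plus a queue-driven worker in Python. The bucket names and the message shape are hypothetical, not Custom Ink's actual setup; `copy_object` is the boto3 S3 client call, injected here so the handler can be exercised without AWS.

```python
import json

SOURCE_BUCKET = "old-product-images"  # hypothetical name
TARGET_BUCKET = "new-product-images"  # hypothetical name


def workers_needed(sequential_days, target_hours):
    """Rough parallelism needed, ignoring per-invocation overhead."""
    total_hours = sequential_days * 24
    return -(-total_hours // target_hours)  # ceiling division


def copy_with_boto3(key):
    # Imported lazily so this module loads even where boto3 is not installed.
    import boto3

    boto3.client("s3").copy_object(
        Bucket=TARGET_BUCKET,
        Key=key,
        CopySource={"Bucket": SOURCE_BUCKET, "Key": key},
    )


def handler(event, context, copy=copy_with_boto3):
    """SQS-triggered worker; each record body is assumed to be {"key": "..."}."""
    copied = []
    for record in event["Records"]:
        key = json.loads(record["body"])["key"]
        copy(key)
        copied.append(key)
    return {"copied": copied}
```

Each invocation handles one small batch of keys, so the total time is governed by how many invocations run concurrently rather than by the size of the bucket.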

Yan Cui: 28:05

Yeah, absolutely. If you're running an Express app in a Lambda function, it's probably more scalable and more secure than if you're running that Express app on an EC2 instance or in a container. And you also have room to grow, like I said, when the business dictates that you need to split things up, maybe into separate APIs or separate microservices, or to start building a more event-based system on top of your existing application. I want to switch gears again and talk a little bit about some of the challenges that you see on a day-to-day basis, because certainly there are a lot of complaints from people who are getting into serverless nowadays. The lack of ability to simulate services locally, testing and observability are quite common issues that people run into. Is there anything that stands out for you guys in particular, in terms of challenges that you face as you're working with serverless technologies on a day-to-day basis?

Ken Collins: 29:02

Yeah, that's a great question. I think you called out a few of them that were sort of top of mind. Let's see if I can go down a couple of them. One was the, the testing was sort of like local services right so like we have one application that was a new application for our processing creative teams, which used DynamoDB and Rails and S3. And for us, you know that Custom Ink we have, we have development account so we have, we use what I think most people agree is a good AWS strategy of, you know, a single AWS account for staging a single AWS account for production. And then we have basically shared, you know development accounts like Alpha, Bravo or Charlie. And what happens is that for the development process, especially at least within Ruby you know your language may vary, but creating an S3 instance or creating a Dynamodb table with a couple of lines of code like Rails would do it for either a DB create, you know, versus Postgres or MySQL is very straightforward right so if like if anybody were to come in to say one of these applications where we basically have native cloud resources set up, you just run a couple scripts and you have, as long as you have Git, Docker, and an organisational AWS development account set up, then you can get started working on this application locally with remote resources and it just, you know, feels like a native application. But I do think there's a huge market for as things get a little bit more complicated right like... Again, I think we're sort of afforded that a lot of our resources that we use are kind of shallow right so like, it might be hard for some people to set up like SQS and SNS and a whole bunch of other things that we've never really crossed that threshold, and the development environment right so that probably comes into this use case of like how you do testing. 
Again, a lot of our decisions are informed by Rails, so for us it's easy to test our Rails apps because we have things like, you know, unit, functional and integration tests. And since we basically use API Gateway as a proxy, or just sort of a commoditized web server, and Rails already has a web server built into it, we don't have to couple our testing to, you know, understanding if API Gateway is going to work with things, right? Like, that's tested at Lambda; we don't have to test that for our framework, so we basically just test our full application at the various components that we like to, whether it would be unit or functional or integration. And a lot of our image architecture is using image diff tools and other things like that so that we can sort of go from high to low and feel fairly confident about things. And then, you know, if it needs to work with API Gateway, right, like I've seen people complain that, you know, oh my god, you know, API Gateway, HTTP API, it's V1 and V2 silently switched, right? Like, you'll find that out in staging as you're moving through the pipelines. So we try not to do too much testing that tests things in the cloud, right? That's where we use our multistage strategy.

Yan Cui: 32:14

Okay, yeah, that's such an interesting thing that I hadn't thought about: the fact that you're running an application HTTP server inside Lambda does give you some benefits in terms of making it a bit easier for you to just reuse existing testing methodologies, and it would still just work once it's running behind API Gateway. I guess as you are doing more and more event-based systems that probably is not going to be true anymore, and certainly I find that with some of the more, I guess, complicated, or complex rather, event-based systems that I've built, one of the biggest difficulties when it comes to testing is being able to verify that your application is sending the right events to places like EventBridge and SNS, and I have to have very elaborate environments set up so that I can essentially listen for events going to those queues or topics, just so that I can verify that, okay, when I'm calling my function, or when I submit an event, the Lambda function that runs is doing the right thing in terms of writing data to DynamoDB but then also publishing the right events so that other systems can listen to them. So it's just doing that end-to-end test on the contract that goes in and the contract that goes out.

Ken Collins: 33:28

That's an interesting story because we've never really done stuff like that. The migration I mentioned earlier, where we did the, you know, the product images, so like we... unit tested that. It was all done in Ruby, say with libvips and stuff, and all of those are tested. Oh, and that one's not Rails at all, it's basically just your good classical Lambda microservice, and we just basically test the events, you know, with SAM by, you know, either loading the Ruby up in memory and just sending the handler the event, and making sure everything goes naturally underneath it. But I think that word you said, a smoke test, right, like once you deploy everything, make sure it's working. And that's kind of interesting, right, because we wouldn't do testing like, you test one thing and then you test another thing, and if there's an API contract, right, like testing that they sent that message, it's something we typically don't test, right? Like if the interface is XYZ, then you better send it XYZ.
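
As a rough illustration of the "load the code in memory and send the handler the event" style of test described here, the following Python sketch calls a handler directly with a hand-built event. The conversation is about Ruby, so this is only an analogy: the `handler` function, the event shape, and the `product_id` and `width` parameters are all hypothetical and not taken from Custom Ink's code.

```python
import json


def handler(event, context):
    """Hypothetical Lambda handler for an image-style microservice.

    It reads a product id and an optional width from the query string.
    """
    params = event.get("queryStringParameters") or {}
    product_id = params.get("product_id")
    if not product_id:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "product_id is required"}),
        }
    width = int(params.get("width", "300"))
    return {
        "statusCode": 200,
        "body": json.dumps({"product_id": product_id, "width": width}),
    }


def test_handler_returns_requested_size():
    # Build the event by hand and pass it straight to the handler in memory.
    event = {"queryStringParameters": {"product_id": "tee-123", "width": "600"}}
    response = handler(event, None)
    assert response["statusCode"] == 200
    assert json.loads(response["body"]) == {"product_id": "tee-123", "width": 600}


def test_handler_rejects_missing_product_id():
    response = handler({"queryStringParameters": None}, None)
    assert response["statusCode"] == 400


if __name__ == "__main__":
    test_handler_returns_requested_size()
    test_handler_rejects_missing_product_id()
    print("ok")
```

A test like this exercises the function's own logic only; whether the deployed function is wired up correctly is left to a smoke test after deployment.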

Yan Cui: 34:31

Yeah, some of those things can get more and more difficult to test, but I guess it also depends on the type of system you're building and how much confidence you have in your application. I think there's a lot of this you can still test at the unit level. So for example, for some of these event-based systems, I can still run unit tests that invoke the function locally, using mocks and stubs to capture whatever requests we're actually trying to send to SNS and to EventBridge. So that's okay for testing what a particular function does, but certainly at some point you want to test the whole thing end to end, as it's deployed to AWS, to check that, you know, IAM permissions and a bunch of other things are set up correctly. One of the things that has caught me out a few times is that while testing the function locally is all fine, I actually forgot to add the SNS publish permission to the function. So when it's deployed, it's not actually publishing the message to SNS, so the end-to-end flow for that particular workflow was broken. And luckily we had end-to-end tests to capture some of that, even though we didn't validate, maybe, absolutely everything, but the fact that we executed them in AWS caught some of these bugs that would have otherwise been missed if we had just relied on testing functions locally, or testing individual functions as opposed to the entire end-to-end flow.
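
The unit-level approach mentioned here, invoking the function locally and using mocks to capture what it tries to send, might look roughly like the sketch below. It is a minimal example under stated assumptions: `make_handler`, the order payload, and the topic ARN are hypothetical, and the `table` and `sns` objects are plain `unittest.mock.Mock` stand-ins for a DynamoDB table resource and an SNS client rather than real AWS clients.

```python
import json
from unittest.mock import Mock


def make_handler(table, sns, topic_arn):
    """Build a hypothetical handler with its AWS clients injected.

    Injecting the clients lets a test pass in mocks instead of real ones.
    """
    def handler(event, context):
        order = json.loads(event["body"])
        # Write the record, then announce it to other systems.
        table.put_item(Item=order)
        sns.publish(
            TopicArn=topic_arn,
            Message=json.dumps(
                {"type": "order_placed", "order_id": order["order_id"]}
            ),
        )
        return {"statusCode": 202}

    return handler


def test_handler_writes_and_publishes():
    table, sns = Mock(), Mock()
    topic_arn = "arn:aws:sns:us-east-1:123456789012:orders"  # made-up ARN
    handler = make_handler(table, sns, topic_arn)

    response = handler({"body": json.dumps({"order_id": "o-1", "total": 25})}, None)

    assert response == {"statusCode": 202}
    table.put_item.assert_called_once_with(Item={"order_id": "o-1", "total": 25})
    # Inspect the captured publish call to check the outgoing contract.
    sns.publish.assert_called_once()
    kwargs = sns.publish.call_args.kwargs
    assert kwargs["TopicArn"] == topic_arn
    assert json.loads(kwargs["Message"]) == {
        "type": "order_placed",
        "order_id": "o-1",
    }


if __name__ == "__main__":
    test_handler_writes_and_publishes()
    print("ok")
```

Because the mocks accept any call, a test like this passes even when the deployed function lacks permission to publish to SNS, which is exactly the kind of gap that only a test run against AWS would catch.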

Ken Collins: 35:49

Yeah, I'd imagine that's pretty hard. I don't think I've ever really approached that problem, right? Like I just kind of have to have trust in, probably, their systems. But then again, that end-to-end test sounds important. I'm not sure exactly how I would approach that. Like, you know, if it's a Plinko game, how do you test, you know, that you drop it in the top and then it comes out the bottom every time? And in this case, it hits all the right pegs at the right time and stuff.

Yan Cui: 36:18

Yep, and I guess the more branching logic you have along the way, the harder it's going to get as well in terms of test cases. But again, it's not something that I do all the time, but certainly some teams that I work with have systems that are highly critical, so they put more importance on trying to catch as many of the possible failure modes during testing as they possibly can. Other teams may lean towards having better observability and monitoring in place so that if something goes wrong, they can deal with it quickly. And it's not a showstopper for them to maybe have, you know, occasional bugs, and maybe those events don't get processed right away and go into a dead letter queue so that they can then reprocess them later. Whereas other teams are building things that are real time and have to be processed within a few seconds, and it's super important, especially for those in the finance world, where there tends to be more weight put towards getting things done right the first time, as much as possible, even if it takes longer. Whereas for other teams, especially smaller startups, you know, it's more of: we need to ship tomorrow. It's okay if it's got some bugs, we will live. But we need to ship tomorrow.

Ken Collins: 37:30

Yeah, I think I came up with a term the other day when thinking about some event terms, called eventual financial consistency, right? Like, going purely event driven, I think, is a threshold we haven't quite crossed in our needs yet. And certainly there's a lot of empathy there on how you do that stuff and do it well, which I have yet to really address.

Yan Cui: 37:56

I don't think being event driven should be a goal; it is, like I said, about the outcome. It's got pros and cons, like every decision we make, but certainly it's a very powerful way to build certain types of systems, and gives you a lot of properties that you would like to have, things like being loosely coupled, and also being able to replay events and things like that. I guess we have covered most of the questions I have. One thing I do want to also ask you about, and this is something I tend to ask everyone I interview, is: what are some of your top AWS wishlist items?

Ken Collins: 38:31

Oh, that's a great question. Let's see. So, I think, wow. Let me think about that, right? I think a lot of them have just been coming true, naturally, right? Like, yeah, I had mentioned that a lot of the work that I had done is sort of this bet on serverless, making a bet on the Lambda ecosystem and commoditization, so if you asked me this maybe a year ago, I would have mentioned things like RDS Proxy. And let's see, maybe going further back there would have been, like, okay, the VPC cold starts. Um, I don't know, right? I think the AWS team, from what I've been working on, has been doing a really good job at sort of anticipating where things are going and getting things done. So I've had very few AWS wishlist items that I think fall in their camp, right, that they could fix. Some of them are in weird services like CloudFront, you know, I think I can remember, I wish they had, like, an allow list versus a deny list on the query parameters. But within AWS Lambda, let me think, a dramatic pause, let's see.

Yan Cui: 39:46

It doesn't have to be Lambda, it could be, like I said, other related services, could be API gateway, CloudFront or anything, or even SAM.

Ken Collins: 39:53

Hmm, yeah, with SAM I think I want them to have HTTP API support soon, please, pretty please, because for some of our older microservices, especially with Node, we use Jest, which is a testing framework, I think written by Facebook, and it's really nice, right? It has this one capability where you can sort of write environments. I think the default environment is, I think it was the Node environment or the DOM environment, but basically I wrote a SAM Jest environment, which will automatically allow some of your tests to spawn an AWS SAM local start API. And, you know, that's because certain Node microservices don't actually have their own web server like we do with, say, Rails. And I would very much like to switch those to HTTP API, which, you know, as your posts have said, is 60% cheaper, 30% faster, or was it 60% faster and 60% cheaper? And I've coupled my Node microservices to SAM starting the API so that I can do my integration tests right there with the SAM framework, and until they support the start local with HTTP API I'm a little bit hamstrung on getting that deployed out, because I don't want to write a lot of code to sort of abstract that away and push it to a different type of HTTP method. So that's my number one.

Yan Cui: 41:10

Okay, cool. Yeah, that's a good one. HTTP API is, I think, 70% cheaper on the per-million-request cost compared to API Gateway REST APIs, but I tend to find the whole 50% faster claim to be overplayed because, you know, it is only talking about the latency overhead that an API Gateway REST API adds, which is typically 5 to 10 milliseconds, so even if you are 50% faster, you're talking about 2.5 to 5 milliseconds of latency savings, which is not nothing but it's probably not where your problem is, if you've got...

Ken Collins: 41:46

It's like those JavaScript micro-benchmarks, right? Like, here, this library does this faster, but put it in a real world application and it doesn't really matter, right? But yeah, most of our API Gateway usage is with north-south image architecture, so most of our time is on the back side.
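
To put rough numbers on the latency point above, here is a small back-of-the-envelope sketch in Python. The 5 to 10 millisecond overhead and the 50% figure come from the conversation; the 200 millisecond total request time is purely an assumption chosen for illustration, since the actual backend time is not stated.

```python
def latency_saving_ms(overhead_ms: float, speedup: float) -> float:
    """Milliseconds saved if the gateway overhead shrinks by `speedup` (0.5 = 50%)."""
    return overhead_ms * speedup


# Figures quoted in the conversation: 5 to 10 ms of overhead, 50% faster.
low_saving = latency_saving_ms(5, 0.5)
high_saving = latency_saving_ms(10, 0.5)

# Assumption for illustration only: the whole request takes 200 ms end to end.
ASSUMED_TOTAL_MS = 200
share = high_saving / ASSUMED_TOTAL_MS

print(f"saving: {low_saving} to {high_saving} ms")
print(f"best case share of a {ASSUMED_TOTAL_MS} ms request: {share:.1%}")
```

Under that assumed total, even the best case shaves off only a few percent of the request, which is the sense in which most of the time sits "on the back side".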

Yan Cui: 42:05

Cool. So, is there anything else you can think of? It’s okay if you don't. I guess that just makes you a very happy AWS customer.

Ken Collins: 42:13

Yeah, it does. I love what they're doing. I support everything they do. Every time they bring something new out to the market, I'm like, that's what I was waiting for, and I'll continue to do my part to advocate for it. I think EFS was another one that I didn't know was coming, and, you know, that has sort of enabled us to... I'm literally today, maybe next week, writing a whole Rails lift and shift for one of our most transactional services that is going to be unlocked by EFS. So they do a pretty good job of anticipating what I need to get my job done.

Yan Cui: 42:45

And now that you are an AWS Serverless Hero, you've got the inside track to give them even more feedback. And also I guess you've got a better connection to the teams, so you can tell them your requirements, what you're gonna be working on next, and what you're gonna need from them.

Ken Collins: 42:59

Absolutely. They do treat us well.

Yan Cui: 43:00

Alright so I think that's everything I've got, Ken. Again, thank you so much for joining us today and sharing your story with Custom Ink.

Ken Collins: 43:09

Yan, I really appreciate it. I really do. Thank you for all the work you've done.

Yan Cui: 43:12

Thank you. Appreciate that. And yeah, take care, and hope to see you in person at some point.

Ken Collins: 43:17

Yeah, I can't wait for the real person stuff to start again. Cheers.

Yan Cui: 43:20

Cheers. Bye bye.

Yan Cui: 43:34

So that's it for another episode of Real World Serverless. To access the show notes, please go to realworldserverless.com. If you want to learn how to build production ready serverless applications, please check out my upcoming courses at productionreadyserverless.com. And I'll see you guys next time.