Real World Serverless with theburningmonk

#15: Serverless at iRobot with Ben Kehoe

June 10, 2020 Yan Cui Season 1 Episode 15

You can find Ben on Twitter as @ben11kehoe.

Following our conversation regarding AWS SSO, Ben has also open-sourced a tool that helps you make AWS SSO work with all the SDKs that don't understand it yet. Check it out here: https://github.com/benkehoe/aws-sso-credential-process

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

Opening theme song:
Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday/
License: http://creativecommons.org/licenses/by/4

spk_0:   0:12
Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real-world practitioners and get their stories from the trenches. Today, I'm really excited to welcome Ben Kehoe from iRobot onto the show. Hi Ben, good to see you again.

spk_1:   0:26
Hi. Happy to be here.

spk_0:   0:29
So iRobot is, I'd almost go on record to say, the poster child for Lambda. You guys have been doing really hardcore stuff on serverless and Lambda for quite a long while now, and at some really high throughput as well. So let's get started by just talking about what you've been doing at iRobot, and how iRobot went down this road of serverless.

spk_1:   0:53
Yeah. So I joined iRobot in 2015, while we were in the process of launching our first connected Roomba, the Roomba 980. While iRobot has been around since 1990, a scaled-out connected robot was not something we had done before. We had done networked robots on a smaller scale, and our Roomba business in 2015 was obviously already big, selling millions of robots a year, but we didn't have a background in cloud computing. And I didn't really have a background in cloud computing either. I was coming from grad school, where I had worked on cloud robotics, sort of: what can robotics do while leveraging the cloud? But as a grad student, you're not necessarily using giant fleets of robots; you're doing more theoretical proof-of-concept stuff. iRobot had chosen, early in the project for the connected Roomba, a full-solution IoT cloud provider. The purpose of that was that it was turnkey, we didn't actually have to build it. It was a platform: it handled authentication, it handled delivery of firmware updates, it had a scripting platform on top, all of these things. But they had really built it for industrial IoT, where there's a lot more data, there are a lot fewer devices, and your control often doesn't need to be real-time. All of those are kind of flipped in consumer IoT. And so even before launch, we were aware that that platform wasn't going to scale to Roomba volumes. One of my first tasks at iRobot was to build some scale testing to knock the system over, to prove to them that they weren't going to scale to the volumes that they'd claimed. Which was fun, I got to set some things on fire. So we knew even before launch that we wanted to shift off of them, and two things were involved in that. You want your IoT connectivity layer, which deals with the authentication and traffic from the robots, and then we wanted to own the application behind that, the one that handled all of the business logic and everything that went into that. We wanted that because by 2015 we had realized that being part of the smart home was an integral part of our strategy, and we wanted to control more of our destiny in the cloud application there. So we ended up selecting AWS IoT Core as our IoT connectivity layer, and we knew we wanted to build an application behind that on AWS. Again, we had people at iRobot who had previously built scalable systems, but not a lot, and not at iRobot. So we looked at the selection of services on AWS. IoT Core is, of course, completely serverless, there are no knobs to tune there. And Lambda was new, API Gateway was new, DynamoDB had been around for a while, S3, SQS. All these pieces were there, and none of them required running servers. And we said: we think we can probably build this without actually having to learn that step of building auto-scaling server-based systems, and then we don't have to go through all of that learning as it scales up very rapidly to the size of the Roomba fleet. That investment has been very successful, and we still don't have any containers or VMs handling transactional traffic from our connected robots and apps. There are certainly some container-based jobs, AWS Batch, things like that, more on the analytics side.
But the OLTP side of our application is still fully, fully serverless.

spk_0:   4:56
So I remember Lambda and API Gateway in those early days, back in 2015. It was a very different offering, in terms of capabilities, compared to what we have today. So as an early adopter, what were some of the biggest challenges and problems that you guys ran into in those early days?

spk_1:   5:14
I think none of us in the serverless community really had any idea what we were doing back in 2015. This was back when the Serverless Framework was brand new and still called JAWS, and when we looked at all the things that we needed to do, we didn't see any deployment framework that fully met our needs. So we decided to roll our own, which is always a last resort. When you're serverless, you want to buy something rather than build something, if you possibly can. But it was the right decision at the time. And so that locked us into tooling built around a specific architecture that we still have. A significant double-digit percentage of our lines of code, when we first shipped our application, was in the tooling that we had built. That's not really true anymore, but it gives an idea of the amount of work that went into building the tooling, and also into sort of predicting and establishing best practices. I think a lot of the decisions we made early on there are not ones we would make again if we were building brand new, and they aren't really what we make when we're building new systems outside of that.

spk_0:   6:37
Yeah, I remember trying to use JAWS back in those days. It just didn't work out of the box the first time I tried. So I ended up having to couple together a bunch of different scripts and CloudFormation to get it to work, and built some form of deployment framework myself as well. Thankfully, the tooling has got so much better now. But we have a different problem now, where we have too many different options and no clear consensus in the community on how to choose which one. So, fast forward a few years: nowadays you guys are running Lambda at pretty massive scale. Are you able to talk about any of those numbers, like the peak throughput or peak concurrency you're running at?

spk_1:   7:26
Yeah. I mean, the last time I checked, which was like a year ago: our traffic is spiky. Our Roombas are scheduled, and so at 10 a.m. Eastern time every day there's a massive in-rush of traffic. For whatever reason, that's the highest volume of scheduled robots during a 24-hour period. It's interesting to see, because that's people on the East Coast scheduling at 10 a.m., but people in Europe scheduling it much later, and that just happens to be the confluence. And our weekends are busier than our weekdays. It's very spiky, but we don't really look at the spikes for volume of traffic. I think if you average it out across an entire week, it's in the hundreds of requests per second across all the Lambdas, that's the kind of thing that we estimate for. And that's because robots are not very chatty when they're outside of a mission. There's not much going on while they're sitting on the dock waiting for something to happen, just reporting that they're still there. During those times the robots are not very chatty, and then there are these big in-rushes, big spikes in volume, when robots are starting scheduled missions.

spk_0:   8:54
That's quite funny, that correlation between everyone who owns an iRobot Roomba. There's a particular time of day when everyone seems to be running their robots in their homes at the same time.

spk_1:   9:11
In the app, when you schedule it, it lets you schedule on the hour or the half hour, whatever. So we're guiding people into those schedules.

spk_0:   9:22
Right, I see. Okay, that makes sense. And I guess it also makes sense for them to run more often on certain days of the week as well. So in that case, when you have a sudden spike in traffic, do you ever run into any of the service limits that Lambda has, in terms of the initial burst capacity limits in your region? Or is that just not a problem, because robots would just retry and you never get throttled?

spk_1:   9:49
Yeah. I mean, there's a lot of traffic, but a lot of it doesn't completely exit AWS IoT. The request volume to our Lambdas is lower than the message volume into AWS IoT, because basically everywhere that messages are coming in to AWS IoT, they're ending up batched into a Lambda function. So, for example, we have an Elasticsearch cluster that indexes the device shadows in AWS IoT, via a Kinesis stream that listens to the very large volume of requests that go into IoT through the shadow. And then, of course, we're pulling large batches of records out of that Kinesis stream. So I think there are few places where the Lambdas really need to scale up very quickly, especially because of Kinesis, which allows things to back up a little bit, but also because of the batching in general. And for AWS IoT, they have a pretty good understanding of our traffic, so we don't see a lot of throttles there. For us, a big event is Christmas morning: all the people who have bought robots from Black Friday until Christmas open them in about a four-hour window on Christmas morning, and there's this giant in-rush of traffic. But we work with our account team and the AWS IoT team to let them know what our expected volumes are, and so that can all be pre-warmed for us. And I think with Lambda, a lot of that is now self-service with provisioned capacity. Provisioned concurrency, right? Not the reserved concurrency. You can self-service that for Lambda if you know what your traffic patterns look like, which is something that we do. But most of that happens within AWS IoT for us.
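
To make the batching pattern concrete, here is a minimal sketch of a Lambda handler consuming shadow updates from a Kinesis event source. The field names and the indexing helper are illustrative assumptions, not iRobot's actual schema:

```python
import base64
import json

def handler(event, context):
    # Kinesis delivers records to Lambda in batches; each payload here is
    # assumed to be a JSON device-shadow update routed into the stream.
    docs = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Key on a stable device identifier so duplicate messages simply
        # overwrite the same document -- an idempotent upsert.
        docs.append({"_id": payload["thing_name"], "doc": payload})
    index_documents(docs)

def index_documents(docs):
    # Stand-in for an Elasticsearch bulk-index call, omitted for brevity.
    print(f"would index {len(docs)} shadow documents")
```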

spk_0:   11:55
It's funny, that moment there, when even an expert like yourself has to double-check: am I talking about reserved concurrency or provisioned concurrency? The naming is just so confusing.

spk_1:   12:05
Well, the naming is weird, right? Reserved is like, I'm reserving concurrency, so isn't that concurrency going to be there? No, only if I've provisioned it will it be there.
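
For readers tripped up by the same naming, a short boto3 sketch of the two calls (the function name and numbers are hypothetical):

```python
import boto3

lam = boto3.client("lambda")

# Reserved concurrency is a cap: it limits how far this function can scale
# (and carves capacity out of the account pool), but it warms nothing up.
lam.put_function_concurrency(
    FunctionName="my-function",          # hypothetical name
    ReservedConcurrentExecutions=100,
)

# Provisioned concurrency pre-initializes execution environments on a
# published version or alias, so capacity is actually there before a spike.
lam.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="live",                    # requires a version or alias
    ProvisionedConcurrentExecutions=50,
)
```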

spk_0:   12:13
Yeah, the naming just gets me so, so frustrated. So, are you running everything in one single region, or do you also have a multi-region setup as well?

spk_1:   12:26
Well, we're mostly in us-east-1, and that's primarily because most of the things that our users are doing are not latency-sensitive. If you're at home on the same WiFi network as your robot, the app talks directly to it, and therefore you don't really see latency there. So we're not as latency-sensitive as other applications might be, and we've chosen a single-region strategy. Same for failover: the current functionality that we have isn't mission-critical to our users' lives yet. The robot will still clean: if you set a schedule, the robot will clean on that schedule even if the cloud is out. Even if your WiFi is out, you can push the clean button. Same with that direct connection that you can make. So today, those failover scenarios and those latency scenarios aren't high enough priority, in terms of our customer needs, to make the complexity worth it. But as our feature set grows, as our connections in the smart home grow, and as more of the value that we see of having a connected robot in the home materializes, I think some of those things may change.

spk_0:   13:57
So you mentioned earlier the whole pre-scaling, pre-warming for IoT Core. I remember doing the same thing with load balancers in the past, when you have a new product launch and the traffic is going to be very spiky. But IoT Core itself also just auto-scales, right? It's just that when you've got a big spike coming in, that's why you need to ask them to pre-warm?

spk_1:   14:22
Well, yes. I mean, Christmas is something we need to talk with them about: we're getting a very large influx of robots in a relatively short amount of time, and we want all of that to go well. So we talk with them about that. But it's not something that we normally have to talk with them about for other kinds of spikes. Prime Day, for example, that just works. They scale up, and they're pretty good about accepting messages, even if the delivery needs some scale-up before that happens.

spk_0:   15:01
Have you ever run into issues around the delivery, in terms of duplicated messages being delivered from IoT to Kinesis? That's a problem I have experienced with a few of these async event sources that push messages to Kinesis: when you pull them out from Lambda, sometimes you just get duplicates. I think with CloudWatch Logs that's quite common, and when I see it I have to write some code around it to de-dupe. Is that something that you guys have to deal with?

spk_1:   15:31
We designed all of the messages that come from the robot to be idempotent. They're all QoS 0. Originally with IoT, you paid for the ack message that was sent back down to the robot at QoS 1, and when you have lots of robots and lots of traffic, that cost can add up. So all the traffic currently from the robots is QoS 0, and duplicate messages can end up being sent. So everything we design around being idempotent. Updates to the shadow that we index, for example: it's just fine for there to be duplicates. Or records that come at the end of a cleaning mission, where a robot uploads a little report that goes through IoT into a DynamoDB table. That ID is stable, so even if that gets updated later, that's fine.
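
As a sketch of the idempotent-write idea described here (the table name and fields are made up for illustration):

```python
import boto3

# Hypothetical table holding end-of-mission reports.
table = boto3.resource("dynamodb").Table("mission-reports")

def store_mission_report(report: dict) -> None:
    # The item is keyed on a stable mission ID, so if a QoS 0 duplicate of
    # the same report arrives, the put rewrites identical data: a no-op.
    table.put_item(
        Item={
            "mission_id": report["mission_id"],  # illustrative schema
            "robot_id": report["robot_id"],
            "report": report,
        }
    )
```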

spk_0:   16:24
Okay, that's pretty cool. One of the things that I get asked a lot is which event source to actually use as a queue. There are a few options. I guess, given the volume of messages you're dealing with, Kinesis must have worked out a lot cheaper than if you had used, say, SQS or SNS, which at that volume are so much more expensive compared to Kinesis. Is that something that

spk_1:   16:51
we haven't had to look at too much yet. We chose Kinesis because the SQS integration didn't exist for a very long time. We built a custom resource to do the integration, but we only used it on lower-volume things; all the very high-volume stuff goes through Kinesis. Part of the value of Kinesis is that it's a buffer, right? If whatever's downstream is broken, your Kinesis stream will just back up. And this happens for us, because that Elasticsearch cluster that I mentioned falls over occasionally. Which for us is fine, because there are no time-critical things that query that cluster, and it fills back up with current shadow data all the time. So once you stand up the cluster again, it gets filled pretty quickly; our requirements around it aren't very strong. When that happens, the Lambda that fills that Elasticsearch just starts failing, and the Kinesis stream just starts backing up. Then once the cluster is back up, the Lambda starts working, the Kinesis stream empties out, and you don't have to do anything. That's not true with SQS today, right? With SQS, you're reading from it, and all of a sudden your downstream goes down, and your DLQ is just going to fill up with all the records as they fail enough times, and you've got to redrive all those messages. Or you've got to figure out: oh, something's failing, I've got to turn off my Lambda, set the concurrency to zero, and wait for it to come back. And to test whether it's coming back, I've got to turn that concurrency back up a little bit. So you've got to do a bunch of things that are a lot of work and undifferentiated heavy lifting. This is one of the reasons that Lambda, or AWS in general, needs to build sort of a circuit-breaking service, right? Your ability to do these things requires a very stateful process. You've got to maintain: oh, there's some downstream health that I'm paying attention to, and I'm marking it. It's not just if one call fails, it's if several calls fail in a given time window. And sure, I could make a CloudWatch alarm for it and do all these things, but then I need to store the state of that somewhere, and then I need to use that state to drive a change in the Lambda's concurrency, and then I need to occasionally check on it to see if things are better, because if I keep the concurrency at zero, I never get any more information about whether the downstream is healthy again. So that's something that a customer can build, but it's very much undifferentiated heavy lifting, and I think it's sorely missing. It's something that is built into container systems, you know, Istio and Envoy have features that help with this, and in the serverless world we're just completely without it.
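
The manual "turn the Lambda off and probe for recovery" dance described here might look roughly like the sketch below. The stateful part, tracking failure rates and deciding when to call each function, is exactly what you would still have to build around it:

```python
import boto3

lam = boto3.client("lambda")

def open_breaker(function_name: str) -> None:
    # Throttle the consumer to zero so the upstream queue or stream buffers
    # instead of hammering a downstream that is already failing.
    lam.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=0,
    )

def half_open_breaker(function_name: str) -> None:
    # Let a trickle of traffic through to test whether the downstream recovered.
    lam.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=1,
    )

def close_breaker(function_name: str) -> None:
    # Remove the cap entirely once the downstream is healthy again.
    lam.delete_function_concurrency(FunctionName=function_name)
```

Deciding when to call each of these still requires state stored somewhere (a DynamoDB item, a CloudWatch alarm), which is the undifferentiated heavy lifting being lamented.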

spk_0:   19:49
Yeah, I couldn't agree more with that. It's definitely sorely missing. I had a conversation with Paul Johnston and Jeremy Daly about this as well; literally, we have about four or five different ways of implementing basically the same thing. It'd be so much better if AWS just provided it. I think that brings us nicely to, well, actually, before we move on to what I want to talk about in terms of some of the missing features in the serverless offerings from AWS right now, one last thing to touch on with Kinesis. One of the missing pieces there is auto-scaling, which I've had to build myself a couple of times. Are you guys doing anything with auto-scaling, or did you just say: no, screw this, we're just going to run at a certain number of shards and be done with it?

spk_1:   20:37
I think, I mean, our traffic is pretty predictable, and so the resharding happens fairly rarely. In fact, I think one Christmas, the literal only operations work that we needed to do was to up-shard one of our Kinesis streams. That only happens every couple of months, but if it had been automated, that Christmas Day operations for that massive influx of traffic, like 20x our normal daily traffic, would have been completely hands-off-keyboard. Now, auto-scaling for Kinesis would be great, but it still implies that I have to manage shards, right? What I really want is just that I can shove data into Kinesis and I still have the same guarantee, which is: for any partition key, the messages are ordered. Today you have this notion that multiple partition keys are on the same shard, and they'll all be ordered relative to each other, but I'm fine giving that up in exchange for not having to manage shards. And that's, I think, the bigger serverless dream.
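
That one-off "up-shard" operation is a single API call today, which is part of why it is frustrating that it isn't automated. A hedged sketch with made-up names and counts:

```python
import boto3

kinesis = boto3.client("kinesis")

# Manually re-shard ahead of a known spike (e.g. Christmas morning).
# UNIFORM_SCALING splits or merges shards evenly to hit the target count.
kinesis.update_shard_count(
    StreamName="robot-telemetry",   # hypothetical stream name
    TargetShardCount=40,            # e.g. doubling from 20 shards
    ScalingType="UNIFORM_SCALING",
)
```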

spk_0:   21:49
Absolutely. If you could have the same pricing model and scaling model that Firehose gives you for Kinesis Data Streams, that would be amazing. And that brings us back to the question I was going to ask you just now: what do you think are some of the most pressing missing features in the serverless offerings on AWS right now? You've touched on the lack of auto-scaling for Kinesis, and also the lack of circuit breakers. Anything else that comes to mind?

spk_1:   22:20
So we talk a lot about Lambda and services like that for serverless, but serverless is really more about your infrastructure graph. The amount of code that I need to write for a given application should trend downwards over time, until it's literally just my business logic. And even then: when I think about the Lambdas that I write today where all I'm doing is gluing together AWS services, and then I look at Systems Manager automation documents, where I can, in a declarative way, say: just make this API call, and then make this other API call, and define the transformations I need to do, all without actually writing any code that has the possibility of going stale. When all I'm doing is AWS API calls and JSON transformations, writing code for that is kind of an overhead, especially operationally; now it's something else I need to watch. So I'm always on the lookout for how I can write less code, how I can do things more declaratively. But I also think that as we write less code and bring in more services to do things, while we're trying to make that easy, we don't have good tools for understanding those things once the application is fully made concrete in the cloud. I think the AWS CDK is doing work on that front, but their take on it is that it's okay for it to be client-side, for the information about what an application is to get sort of lossily flattened out into a set of cloud resources that then get deployed. And I don't think that lossy step is acceptable. I think the concepts the developer is expressing, in terms of what they want and how they want to represent their applications, need to be fully represented cloud-side, so that when they go in and something's wrong, they can fully trace their understanding of the system through, say, the AWS console, rather than needing some client-side tooling that may be specific to a development environment. For example, if something's going wrong and the person who designed it is on vacation, another person shouldn't have to clone their repo just to get started with trying to fix it. That's where I think client-side solutions are not the answer, and I want to see more of that; I think people like Stackery are making progress on that front. On the client side, though, during development, I think we need better tools, better integrated tools from AWS. So we have the SAM CLI, we have the ECS CLI, we have the CloudFormation CLI for doing resource provider development now, and the number of those command lines is only going to explode. They're all doing very similar things, and it's hard to glue them together. We have a task right now where we have a Step Function, and most of it is dealing with Lambdas, so we're using SAM. And we ran into something where this job is going to run longer than 15 minutes, it's collecting data, and it's going to need a bunch of either memory or disk space. So we can't run it in Lambda, and an ECS task is the thing that makes the most sense. It's low-volume, so it's not difficult to handle, but it's just hard: oh, I'm working in SAM now, and I need to add an ECS task. How do I even develop for that?
You know, I can't package up that Dockerfile, get it into ECR, and stitch that ECR image into my template, the way that SAM does for zipping files for Lambda. And that makes us want to resort to using a CodeBuild task, which would work, but isn't the right way to go about it; it's just that it's much easier to define that inside your template and not deal with all those complexities. So having a better, unified sort of AWS dev command line, with plugins for the services that you might be using, that enables you to set these things up, might be a better way forward as people need to stitch more of these different ecosystems together. So I think those are some of my wish-list items.
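
As an aside on the "declarative API calls" point earlier in this answer: a Systems Manager Automation document built from the real aws:executeAwsApi action looks roughly like the sketch below, expressed here as a Python dict. The resource IDs are hypothetical:

```python
# A minimal sketch of the declarative style described above: an SSM Automation
# document that chains AWS API calls with no Lambda code to go stale.
automation_document = {
    "schemaVersion": "0.3",
    "mainSteps": [
        {
            "name": "stopInstance",
            "action": "aws:executeAwsApi",
            "inputs": {
                "Service": "ec2",
                "Api": "StopInstances",
                "InstanceIds": ["i-0123456789abcdef0"],  # hypothetical
            },
        },
        {
            "name": "createSnapshot",
            "action": "aws:executeAwsApi",
            "inputs": {
                "Service": "ec2",
                "Api": "CreateSnapshot",
                "VolumeId": "vol-0123456789abcdef0",  # hypothetical
            },
        },
    ],
}
```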

spk_0:   27:26
OK, that's a pretty comprehensive list.

spk_1:   27:30
Oh, I have many more, but I think that covers some of the big items.

spk_0:   27:33
There's a lot to unpack there. Maybe let's go back to the first one you mentioned, around writing your workflow without writing custom code. I know you guys also use Step Functions quite heavily. I'm a big fan of Step Functions, even though I guess the way you configure it is maybe not as nice as something like Node-RED from IBM, where you have a really great visual UI to design your workflow, and you can also just switch to the YAML or JSON editor view. Can you maybe talk about some of the use cases you have for Step Functions right now, and how you feel about where we're going with Step Functions and what's currently missing there?

spk_1:   28:21
Yeah. I mean, we love Step Functions; we use it for everything we can. Any automation process that isn't just a single Lambda is pretty much a Step Function, and we like that. We do long-running jobs that are iterative, using Step Functions and Lambda. Just recently we had a task where, today, there's an SQS queue and a client-side script that runs over a large list of items and puts those items into the SQS queue for processing; in a recent job, that had to run for like an hour on the developer's laptop. We're going to move that so you just put the input file in S3, and then we'll fire off a Lambda, or fire off a Step Function, based on it, that will keep its place in the file. The file is relatively small; there's just some parsing and API-call work that needs to be done for every line. So we'll do that in the Lambda until it runs out of time, and we'll just loop back around. Now, if it was super easy to set up that ECS task, or CodeBuild task, we could do it that way, but in general Lambda for this ends up being easier to manage as a development experience. So then we just have a little loop that keeps a pointer to how far through the file it's got. We use Step Functions like that in all sorts of ways. The Map state opened up a lot for us. We used to have a lot of Parallel states that were pretty cumbersome to deal with, to the point where we were thinking about building a CloudFormation macro to help stamp out large Parallel states based on a single branch. But then the Map state came around, and we didn't have to do that, so that was great. But yeah, designing them is hard. Manipulating the state object through all the JSONPath expressions can be a pain. And there are still missing integrations. We're big into RoboMaker now, and you can't currently... let me just double-check on this one before I... yeah: SageMaker works with it, but RoboMaker doesn't. We run lots of RoboMaker jobs to run robot simulations, so we can do fewer physical tests with robots. For a given pull request, for example, we can run simulated robots and check how well your code works based on that. But today there's some Lambda glue to help handle that, and we could get rid of it if RoboMaker was integrated with Step Functions. Things like that. But in general, yeah, it's something that we use every day. An engineer on my team used to automate everything as Jenkins scripts, and when he saw Step Functions he was like: oh yes, I can just switch all these things that were in Jenkins to run with Step Functions. And now he's pretty much our foremost Step Functions expert; we have a bunch of people who are pretty expert in Step Functions.
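
A minimal sketch of the loop pattern described here: a Lambda processes as much of the file as it can, returns a cursor plus a finished flag, and a Choice state loops back until done. The names, ARN, and fields are illustrative, not iRobot's actual definition:

```python
import json

# Amazon States Language for the iterate-until-done pattern, as a Python dict.
state_machine = {
    "StartAt": "ProcessChunk",
    "States": {
        "ProcessChunk": {
            "Type": "Task",
            # Hypothetical Lambda that reads the S3 file starting at $.cursor
            # and returns {"cursor": ..., "finished": true|false}.
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-file",
            "Next": "CheckIfDone",
        },
        "CheckIfDone": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.finished", "BooleanEquals": False, "Next": "ProcessChunk"}
            ],
            "Default": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

print(json.dumps(state_machine, indent=2))
```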

spk_0:   31:45
That's really cool. And I've actually done quite a few things similar to what you just described, in terms of using Step Functions to trigger Lambda to process a large S3 file in kind of a recursive loop. Like you say, sometimes that's just easier than spinning up a Fargate task, because you don't have that trigger; you'd need glue code in Lambda to trigger your Fargate task and all of that.

spk_1:   32:10
Yeah, and CodeBuild is kind of an easy out for that, because you can use a buildspec file to define something that pulls in your small snippet of code. But it feels like a workaround.

spk_0:   32:29
So what advice would you give to people who are just looking to start their serverless journey today? Anything that you know would really help them get started quickly and avoid common mistakes?

spk_1:   32:41
Well, I think the most important bit is adopting the serverless mindset, which is something you can do even before you can use the technologies that we refer to as serverless. It's about wanting to own less technology, to deliver customer value better. An example I like to cite: our administrator who handles Jira came to me, because we have a self-hosted Jira, and said, you know, I'm having trouble. We've got these EC2 instances that we're running Jira on, they're FreeBSD, no one knows exactly how they're configured, and I'm afraid they're going to fall over and everything's going to catch fire. So what we looked at was: okay, how do we set this up to have as little operational burden on that admin as possible, so that they can go back to Jira administration, right? Operating the software on it, rather than the infrastructure underneath it. That meant using Amazon Linux 2, and setting the instances up to be immutable, so that if one of them went down, you could just bring another instance up rather than having to configure it. Setting them up with RDS rather than a database inside an EC2 instance. All these things are possible, even though none of the things involved are very serverless themselves; it's applying the same mindset. So even if you're really interested in, oh, Lambda is really useful, but I'm stuck with all this stuff that I have to do: as long as you're thinking about how this helps you deliver value to your customers better, whether your customers are end users, another team in your organization, or just other members of your own team, I think that's the very first step, the best step you can take, before the technologies themselves. I think we see people get bogged down without having adopted that mindset, where they're saying: oh, I have lots of Lambdas, or I have really big Lambdas, because that code is familiar to me, and I'm applying all of this custom logic in here. And that's where we start to get away from the real value of serverless, which is owning less technology.

spk_0:   35:18
Do you see serverless as the dominant paradigm in a few years' time? Or do you think serverless and containers are going to blend together, like Tim Wagner predicted a few years ago?

spk_1:   35:28
I mean, the way that I think about this: I don't know that serverless will be dominant, especially given the rise of Kubernetes in the last couple of years. Kubernetes solves a lot of the current pain points that people have with containers without asking them to rethink the paradigms they're using. It presents them with a familiar architecture that's server-based, while solving a lot of the operational and architectural concerns that people have. So I think that's delayed serverless adoption by a couple of years, because it's enabled people to get to a place where they feel comfortable and feel like things are going better, even though they're missing out on all these substantial benefits from serverless. What I hope to see, if I look at, for example, Fargate and Lambda: what I'd like to see is not Lambda adopting more container-based features. I don't really want the ability to provide a container image to Lambda. I don't want indefinitely running Lambdas; I don't want hour-long Lambdas. Fifteen minutes is enough. If you've got to go over 15 minutes, you should be putting it in a container in an ECS task instead, or you can wrap it in a Step Functions loop, that kind of thing. Because there's never a limit to how long you can go where everyone will agree that that limit is enough. You change it to half an hour, and someone will say: I need 45 minutes. You change it to 45 minutes, and someone says: I need an hour and a half. And eventually you've got indefinitely running Lambdas that can't get cycled out, and the scaling from Lambda gets worse, so it's harder for them to make it cheaper. All these things come up. So instead, what I'd rather see is more of Lambda's features brought down to Fargate. I'd like to see a Cloud Run-like model in Fargate, where it's counting requests and creating new containers based on the number of active requests, as well as having Lambda's event source mappings. Everything that you can use as an event source for Lambda, you should be able to use as an event source for Fargate. Then the difference is relatively minor as to how your compute is running for a given part of your architecture, right? You've got your infrastructure graph, and for a given compute node, the rest of the architecture doesn't care whether that's Lambda or Fargate. But what I do want is for there to be a gap in features between Fargate and Lambda. What I mean by a gap in features is that you can't do everything you want to do in Fargate on Lambda; you need to rethink how you write code to use Lambda. Because if the gradient between Fargate and Lambda is too small, people will move as far in as they're comfortable and stop. What you really want is a little bit of a hump to enter Lambda, that makes you rethink a little bit of what you're doing, in a way that lets you start to realize the benefits you get by really investing in Lambda and service-oriented architecture. Without requiring that little bit of a rethink, I think people won't actually rethink; they'll get as far as they can without rethinking and then just stop. If that makes sense.

spk_0:   39:24
Yep, absolutely. At some point you want people to get off the faster horse and get into a car, right? And I do think that what you mentioned, about Kubernetes and its tooling making containerized environments seem better, is definitely holding people back, I think more than they probably realize. Because Kubernetes is a really complex beast, and people don't realize how much cost goes into running and maintaining Kubernetes across the organization. They see Lambda as being, on the surface, more expensive, but they don't account for the team of ten people looking after their Kubernetes clusters. So with that in mind, are there any other big untapped potentials and use cases that we haven't really seen people explore with serverless, with Lambda?

spk_1:   40:11
Yeah, I think one of the areas that I'm interested in is organizational administration, and I'm interested in this with Systems Manager automation documents as an example. If I want to run a piece of code against every account in my organization, today that's not straightforward. If I want to deploy infrastructure to every account in my organization, I can now do that with CloudFormation StackSets. But invoking that infrastructure, using it to do something, is, I think, the next step. And simplifying the way in which I can do that, especially in a Lambda sense, or in an S3 sense, so that I don't have to push the same code to every region and every account just so it can be accessed by Lambda to get deployed, that kind of thing is something that I think is lacking. I'm also really excited about the shift to AWS Single Sign-On, because I think it moves a lot of the complexity of AWS accounts and role mappings from being inside your identity provider to inside an AWS service. But today you can use it with the CLI v2, and you can't use it with boto3 or any other SDKs, because they don't recognize where the token that you get from your identity provider gets stored. They had to revamp some things to make it work for the CLI, and that exists in botocore v2; it's not brought into the SDKs yet. Which means all the tools that are based around boto or the other SDKs, like the SAM CLI, can't use SSO sign-in. And I'd love to see it in Amplify. Amplify lets you create web apps with a back end very, very simply, and you can add auth, and you can say: I want this to be a Cognito user pool, or whatever you want your auth to be. If they added AWS Single Sign-On to that, all of a sudden you'd have teams within the enterprise that could create their own web apps, authenticated using the organization's single sign-on, fully protected, using best practices for all the things they're building with it. Then they'd be able to create their own applications that much more easily. I think that matters a lot even for back-end teams that want to put a user interface on the front of their API. They could do that with Amplify, if AWS Single Sign-On was an option for the auth.
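
For the StackSets piece of this, deploying a template to every account in an organization is roughly two boto3 calls today. A hedged sketch; the names, OU ID, and template are placeholders:

```python
import boto3

cfn = boto3.client("cloudformation")

# SERVICE_MANAGED permissions let StackSets deploy across an AWS Organization
# without hand-creating cross-account roles in every account.
cfn.create_stack_set(
    StackSetName="org-wide-baseline",           # hypothetical
    TemplateBody=open("baseline.yaml").read(),  # hypothetical template file
    PermissionModel="SERVICE_MANAGED",
    AutoDeployment={"Enabled": True, "RetainStacksOnAccountRemoval": False},
    Capabilities=["CAPABILITY_NAMED_IAM"],
)

# Roll the stack set out to every account under an organizational unit.
cfn.create_stack_instances(
    StackSetName="org-wide-baseline",
    DeploymentTargets={"OrganizationalUnitIds": ["ou-abcd-12345678"]},  # hypothetical OU
    Regions=["us-east-1"],
)
```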

spk_0:   43:07
Yeah, I've seen so many different teams build basically their own version of that kind of bridge, between some single sign-on process and the tooling, integrating with boto, with the SDKs, so that developers don't have to constantly copy and paste credentials into their credentials files and set up new ones every time the tokens expire. Something that comes from AWS and integrates with all the different tooling that already exists for AWS would be really, really helpful. I think they've done a lot of work the last couple of years already around Organizations and all of that, but it's still a constant pain point for a lot of the customers and companies I've worked with in the past as well.

spk_1:   43:47
Yep, for sure.

spk_0:   43:49
So I think we've already covered a lot of the wish list you have. Is there anything else you want to mention around your wish-list items?

spk_1:   43:57
I think we've covered the big picture.

spk_0:   44:00
Yeah, we've covered quite a lot of them already. Okay, that's great. Ben, thank you so much for sharing so many of your insights, and also what the future potentially holds for us as the serverless community. Take it easy, I hope things are not too bad where you are, and stay safe.

spk_1:   44:16
Thanks again for having me.

spk_0:   44:17
Okay, take care, bye-bye. And that's it for another episode of Real World Serverless. To access the show notes and the transcript, please go to realworldserverless.com, and I'll see you guys next time.

How did iRobot get into serverless?
What were some of the challenges you faced in those early days?
What does your traffic look like?
Do you ever run into concurrency limits on Lambda?
Do you run multi-region?
How do you deal with duplicated messages from IoT?
Are you doing anything to auto-scale Kinesis streams?
What are the most pressing features for serverless on AWS?
How are you using Step Functions?
What advice would you give to people who are starting with serverless today?
Is serverless the future?
What are some of the untapped use cases for serverless?