#34: Survive AWS information overload with Ken Robbins

Real World Serverless with theburningmonk

A podcast where we talk about real-world use of Serverless technologies from engineers who work with them day-to-day. We will discuss use cases, why they chose serverless and the pain points and challenges they face. If you want to know what it's REALLY like to work with serverless, this is the show for you.

October 21, 2020 • Yan Cui • Season 1 • Episode 34

You can find Ken on Twitter as @CloudPegboard and on LinkedIn here.

He blogs on Medium as well as on his own website:

To learn how to build production-ready Serverless applications, go to productionreadyserverless.com.

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

Opening theme song:
Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday
License: http://creativecommons.org/licenses/by/4.0

Yan Cui: 00:12

Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real world practitioners and get their stories from the trenches. Today I'm joined by Ken Robbins. Hi, Ken, welcome to the show.

Ken Robbins: 00:25

Hi, thanks for having me. It's a real honour to be here.

Yan Cui: 00:27

So I guess we met virtually at ServerlessDays Virtual a while back, and I was really struck by the talk you gave at the conference. Before we go into that, can you tell the audience about yourself and your journey into the world of serverless?

Ken Robbins: 00:44

Sure, absolutely. Well, let's see: by day I am the founder of a startup called CloudPegboard (I guess I should say day, night and weekend). It's a service that helps AWS developers keep up with the flood of cloud information. We do this by providing personalised Slack or email feeds with just the information that's relevant to you. We also have a portal that makes development much easier and more efficient by organising all the reference information you access into comprehensive per-service data sheets. CloudPegboard is right now a solo entrepreneurship, so it relies 100 percent, strategically, on serverless, which has really enabled me to get a tonne of stuff done on my own. It would be great to have a team like I used to have, but that's why serverless works there. I do a couple of other things. I'm also the volunteer CTO of Miles4Migrants, which, as you mentioned, was the context in which we first met. There we use donated frequent flyer miles to fly people who are fleeing war, persecution or disaster; we reunite them with their families or find other safe havens for them. Again, I'm a volunteer there; the organisation is almost entirely volunteers, with just a few staffers, and I'm the only one who does any sort of development. So again, I strategically rely on serverless to get stuff done; if it had a higher buy-in cost, it probably just wouldn't get done. Also relevant to some of this: before I started CloudPegboard a year and a half ago, I ran engineering at Novartis research, so the drug discovery side of things. I moved us from a fully on-prem organisation to the cloud, starting in 2015, and in the process onboarded over 500 individual engineers and informaticians to our platform. It was a hybrid strategy, and we used a lot of different types of solutions.
But certainly serverless was a strategic focus as well: where we could use it we did, and where it didn't make sense we didn't.

Yan Cui: 02:48

Okay, so there are quite a few different things we can unpack there. I want to start by talking about Miles4Migrants. Seeing as it's a volunteer organisation, I guess cost is very much first and foremost among your concerns: in terms of your time, in terms of resource use, but also in terms of the AWS bill. So what has been your experience of building Miles4Migrants with serverless so far?

Ken Robbins: 03:14

Yeah. So I think the dominant factor is the amount of effort it takes to get from idea to solution. Cost is important, because we really want any donation we get, which is mostly in miles and very little in cash, to go to the heroes that we fly and the people that need it. Like many charities, we're obviously on a shoestring. Fortunately, I should mention that AWS has been very generous in getting us going, and we got some credits, and with serverless I really don't put much of a dent in those. But it's really mostly about the time, right? It's kind of crazy slash dumb to be doing this work while trying to build a startup, but it's important to me. If I had higher costs, like if I had to start deploying servers and maintaining things, it would be enough friction that I probably wouldn't do some of the things we've done. Serverless makes it possible to basically build something and get it deployed quickly. And then, really importantly, there's no maintenance. Frankly, with an organisation like this, people who volunteer (and I'm at risk of this as well) sometimes get tied up in other things: they build something and then they disappear. Then if you've got to figure out how they built the AMI, how to patch it, how to tune various things, that's just a lot. With serverless, basically, I deploy stuff and it runs. It's all very self-documenting. I happen to use the Serverless Framework, so I can just do a "serverless deploy" to make a change, and it requires zero maintenance and zero extra effort. That's really important, because we can't break operations just because there's some variation in who's working and who's volunteering.

Yan Cui: 05:06

Okay, what does the architecture for Miles4Migrants look like on the back end? I can imagine there are going to be a lot of API calls. I mean, how do people even contribute the miles they've collected with different airlines and, I guess, alliances?

Ken Robbins: 05:26

Yeah. So interestingly, CloudPegboard has a pretty expansive (not complicated, but expansive) serverless infrastructure. By contrast, Miles4Migrants is really simple, with very independent sections of capability. Really, the story is using serverless when I need it. The first tier of serverless for us is to use SaaS: in particular, we use salesforce.com as well as Jira for a couple of different aspects. Basically, I'm using AWS and serverless when there are gaps, or when I need to glue things together. For example, we need to do some analytics on our Jira data, so I have a Lambda function that's scheduled off a CloudWatch event, which wakes up, extracts all of our ticket information from Jira, puts it into S3 and then into QuickSight, and then we can do some analytics. So that's one example. Another place is an app for our partners in the field. These are people who work with refugees or asylum seekers, and they're in situations where they find someone who has quite literally just been dropped off at a bus stop and doesn't know what's going on or how to find safe haven. Volunteers from various organisations will find them and engage with them, and if they eventually come to the point where they need to fly them somewhere, they contact us and we'll match them up with a donor who's got some miles. But one of the things we need to do is get a signature from them. So we have a very simple web-based mobile app that the partner can hand to the refugee, or whoever this person is, and ask them to read the agreements, fill in a few pieces of information and sign it with a little finger signature. This is all done in the field. Once they do that signature, we convert it into a PDF that has the agreement with the signature, and that has to get attached to the Jira ticket, right.
That's not a capability I can get straight out of the box with Atlassian. So I basically just built an HTML and JavaScript based app and put together a few Lambda functions in the back end to serve the page and then receive the submitted signature and document. Then there's another Lambda function that interacts with the Jira API. So it's things like that: small, independent pieces that are basically glue. But once I build a piece of glue like that, I really never touch it again; it's that stable. And there are some other places too. For example, with Jira, via various plugins, you can call out to APIs, so I can put together an API to serve up certain types of information. One of these actually isn't fully integrated yet because of an issue on the Jira plugin side. When one of our agents is filling out a form, they need to be able to put in a location, and right now they just type the airport IDs into a free-form text box. Obviously I want that to be a drop-down pick list, because we really don't want errors. Errors are expensive in terms of the human cost if we have to go back and forth, and these things are very time-sensitive. So being able to make an API call to do the autocomplete, so that SFO is San Francisco, and that sort of thing. Hopefully that gives a picture. For Miles4Migrants, there are a lot of pieces where we need a bit of glue; we just put together something as a serverless solution and drop it in place.
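As a rough illustration of the airport autocomplete Ken describes, here is a minimal sketch of a Lambda function behind API Gateway (Lambda proxy integration) that turns a typed prefix such as "sf" or "san" into pick-list suggestions. The function names, the query parameter q and the tiny hard-coded airport table are hypothetical; this is not the Miles4Migrants code, and the Jira plugin side is not shown.

```python
import json

# Hypothetical sample data; a real service would load a full airport dataset.
AIRPORTS = {
    "JFK": "New York John F. Kennedy",
    "LHR": "London Heathrow",
    "SEA": "Seattle-Tacoma",
    "SFO": "San Francisco",
}


def suggest(prefix, limit=5):
    """Return airports whose IATA code or name starts with the typed prefix."""
    p = prefix.strip().lower()
    if not p:
        return []
    matches = [
        {"code": code, "name": name}
        for code, name in sorted(AIRPORTS.items())
        if code.lower().startswith(p) or name.lower().startswith(p)
    ]
    return matches[:limit]


def handler(event, context=None):
    """API Gateway (Lambda proxy integration) handler for GET /airports?q=..."""
    # queryStringParameters is null when the request has no query string.
    params = event.get("queryStringParameters") or {}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"results": suggest(params.get("q", ""))}),
    }
```

A call such as handler({"queryStringParameters": {"q": "san"}}) would return SFO as the only suggestion; a real version would serve a complete airport list rather than four entries.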

Yan Cui: 08:59

You mentioned at the start there that CloudPegboard has a much more expansive architecture. So what does the architecture for CloudPegboard look like?

Ken Robbins: 09:09

Sure. So there are probably three main sections. There's the whole back end: data collection, integration and persistence. Then there's a front-end web app, with APIs that provide data services to the web app. And then, similar to that but in parallel, there's a Slack application: a Slack bot that again reaches back into the integrated data but serves it up as an application. Each of those is also serverless. The back end is maybe the more interesting part, in the sense that the data collection doesn't have to be instantaneous. So it wakes up on a schedule using, again, a CloudWatch Events scheduled event, and it kicks off a Step Function that runs a whole bunch of data collectors in parallel. There are literally dozens of them, and each one is an independent Lambda function that collects data from a different source and puts it into S3. Further down the pipe, an integration step pulls all these S3 files together, and another step in the Step Function computes text differences, so we can alert on what's new and what's changed with AWS. Then there's something that kicks off to send emails to users on a daily or weekly basis, depending on their preferences. All of that is coordinated through a Step Function on the back end.
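The fan-out/fan-in shape of this back end maps naturally onto a Step Functions Parallel state. The sketch below, assuming one Lambda function per collector plus hypothetical integration and notification functions, builds an Amazon States Language definition as a Python dict. The state names and the choice to catch and skip a failing collector are illustrative assumptions, not a description of Ken's actual state machine.

```python
import json


def build_definition(collector_arns, integrate_arn, notify_arn):
    """Return an Amazon States Language definition (as a dict) for a
    fan-out/fan-in pipeline: all collectors in parallel, then integrate, then notify."""
    branches = []
    for i, arn in enumerate(collector_arns):
        task, skipped = f"Collect{i}", f"Collect{i}Skipped"
        branches.append({
            "StartAt": task,
            "States": {
                task: {
                    "Type": "Task",
                    "Resource": arn,
                    # Assumption: one failing source should not fail the whole run,
                    # so any error is caught and routed to a no-op Pass state.
                    "Catch": [{"ErrorEquals": ["States.ALL"], "Next": skipped}],
                    "End": True,
                },
                skipped: {"Type": "Pass", "End": True},
            },
        })
    return {
        "Comment": "Sketch: scheduled data collection pipeline",
        "StartAt": "CollectAll",
        "States": {
            "CollectAll": {"Type": "Parallel", "Branches": branches, "Next": "Integrate"},
            "Integrate": {"Type": "Task", "Resource": integrate_arn, "Next": "Notify"},
            "Notify": {"Type": "Task", "Resource": notify_arn, "End": True},
        },
    }
```

The resulting JSON could be supplied as a state machine definition, with a scheduled CloudWatch Events (EventBridge) rule as the trigger. Catching errors per branch matters here because, without it, a single failing branch fails the entire Parallel state.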

Yan Cui: 10:36

Sure. So with Step Functions, when you kick off those different parallel steps, what are you actually crawling? Are you crawling the AWS blog, I guess, and then collecting the posts by service?

Ken Robbins: 10:53

Yeah, so we pull in from several dozen RSS feeds. Sometimes we're also hitting APIs: I pull data from GitHub, from YouTube, from Twitter. For example, on Twitter there's the #awswishlist hashtag, which is pretty neat, and one of my data sources pulls in AWS wishlist items from Twitter. Then, as I do with all sources, there's basically an integration step: after I pull the data in, I scan it using different techniques (sometimes I'm using Comprehend, sometimes it's heuristic, depending on how structured or unstructured the data is) to try to identify which AWS services the data is talking about. Sometimes it's really obvious. Other times, like with the wishlist items, you have to extract it. Once I know that, I tag the item with the service's immutable ID (I have an immutable ID for every AWS service), so that ultimately most of the data I deliver back to the user is organised by service. That's a pretty fundamental paradigm: organising data by service. I think that's the way people tend to work: I'm working with DynamoDB at the moment, so I care about its limits, its security URL, the Boto docs, the CloudFormation docs. So all this information is pulled together by service. Sometimes the sources are HTML pages, sometimes they're APIs. For some information, like region support, I pull directly from the APIs of SSM, Systems Manager (I can never get that acronym right). So it's really quite a variety. What's really neat is that every time I identify a new data source that I think will be useful, or some user says "what about this?", I just look at all the different ways that data could be found, and then spin up a unique Lambda function that collects the data, puts it in a sort of canonical form and drops it into S3.
Then later downstream, there's the integration capability that picks up that quasi-canonical form, does one more step of cleaning up and organising it, and puts it into what is ultimately an integrated database with all the sources, all keyed off the service they refer to.
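To make the "canonical form in S3" idea concrete, here is a hedged sketch of one RSS collector: it parses an RSS 2.0 feed with the standard library and writes the normalised records to S3. The record fields, the raw/<source>/<date>.json key layout and the function names are assumptions. The S3 client is passed in (for example a boto3 S3 client, whose put_object accepts Bucket, Key and Body) so the logic can be exercised without AWS.

```python
import json
import xml.etree.ElementTree as ET
from datetime import date


def canonicalize_rss(xml_text, source):
    """Convert an RSS 2.0 document into source-independent records (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            "source": source,
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
            # Left empty here; a later integration step tags the AWS service IDs.
            "service_ids": [],
        })
    return records


def store(records, s3_client, bucket, source, run_date: date):
    """Write one collector run to S3 under a per-source, per-day key."""
    key = f"raw/{source}/{run_date.isoformat()}.json"
    s3_client.put_object(Bucket=bucket, Key=key, Body=json.dumps(records).encode("utf-8"))
    return key
```

Keeping each collector's output in one shared shape is what lets a single downstream integration step treat dozens of sources uniformly.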

Yan Cui: 13:09

Okay, so to go back to the bit you talked about with the Twitter AWS wishlist items: how do you feed back into the system to make sure you've tagged the right thing? Because obviously this is just people writing random stuff on the internet. And how do you keep improving the accuracy of the techniques you're using to tag those services?

Ken Robbins: 13:35

Yeah, so interestingly, it generally hasn't been much of a problem. I take a fairly conservative approach: if something doesn't match, I will delay putting it into the database. If I go way back in time to when I first did this, I had to create the first database of what all the AWS services even are, which is still a challenge, because it depends on what someone defines as a service. Especially if you look at things like all the sub-services under IAM or SageMaker: is that a service, a sub-service or just a feature? It's sometimes hard to tell. And, not surprisingly (if you're Corey, you wouldn't be surprised at all), the naming of AWS services is not only sometimes strange, it's highly variable. In the very first version, if I found a new name that wasn't in the database, I would add it to the database, and pretty quickly I realised I had things that were essentially the same being named as different things. So the way it works now is that whenever I extract a name and it can't find a match using all the techniques I have, it kicks it out as an alert and requires me to actually inspect it and decide: “Is this actually one of several things? Is it an alias for an existing service? Is it in fact a new service? Or is it just not really anything?” That doesn't happen that frequently; it's a few times a week that I need to look at one. And it's really interesting: there's essentially a confidence score, and if the system thinks, look, this is a new name, but it looks like something I've seen before, it says this is a proposed alias.
Then all I need to do is put my eyes on it and say yes, that is an alias, or no, it's actually a genuinely new service. So in my database, per service, I maintain a list of aliases: anything I've seen as a possible misspelling, changed spacing, punctuation. Even with fuzzy matching, you find things that still show up as new. Even the prefix varies. As mysterious as it may be why one service is named AWS and another Amazon (people have theories), there is one correct answer, yet several times I have found the AWS documentation presenting a service with both. So if a name shows up with a different prefix but otherwise looks like the same service, it'll come through as a proposed alias, and once I've inspected it I'll put it in the database as a valid alias. From then on, it's known.
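The alias workflow Ken describes (exact match, proposed alias above a confidence threshold, otherwise an alert for manual review) can be sketched with Python's difflib. The normalisation rules, the 0.85 threshold and the alias table are hypothetical choices for illustration, and SequenceMatcher's similarity ratio stands in for whatever scoring CloudPegboard really uses.

```python
import difflib
import re

# Hypothetical alias table: immutable service ID -> names already reviewed and accepted.
ALIASES = {
    "dynamodb": ["Amazon DynamoDB", "DynamoDB"],
    "sagemaker": ["Amazon SageMaker", "SageMaker"],
    "lambda": ["AWS Lambda", "Lambda"],
}


def normalize(name):
    """Lower-case, drop a leading Amazon/AWS prefix, remove spaces and punctuation."""
    n = re.sub(r"^(amazon|aws)\s+", "", name.strip().lower())
    return re.sub(r"[^a-z0-9]", "", n)


def classify(name, threshold=0.85):
    """Return (status, service_id, score) for a service name found in a data source."""
    key = normalize(name)
    best_id, best_score = None, 0.0
    for service_id, aliases in ALIASES.items():
        for alias in aliases:
            score = difflib.SequenceMatcher(None, key, normalize(alias)).ratio()
            if score > best_score:
                best_id, best_score = service_id, score
    if best_score == 1.0:
        return ("match", best_id, best_score)
    if best_score >= threshold:
        # Close but not exact: propose an alias for a human to approve or reject.
        return ("proposed_alias", best_id, best_score)
    # Nothing close: alert for manual review (new service, or not a service at all).
    return ("unknown", None, best_score)
```

With this table, "AWS DynamoDB" and "Amazon Dynamo DB" both normalise to an exact match, while a misspelling such as "Amazon DynamDB" comes back as a proposed alias for dynamodb; once approved, it would simply be appended to that service's alias list.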

Yan Cui: 16:17

It's funny that even the official documentation mixes and matches them. I certainly do: sometimes I say Amazon DynamoDB, sometimes AWS DynamoDB, for example.

Ken Robbins: 16:30

Yeah, exactly. I haven't done a statistical analysis on this, but I'd say on average I have around 12 aliases for every single service. That just comes from the way names are presented in different data sources, and these are mostly Amazon data sources, as opposed to something like Twitter or YouTube (although the YouTube content is usually Amazon-produced as well). So yeah, they come up in many different ways. In fact, when I'm blogging, I always have CloudPegboard open with the search box ready, and every time I type a service name, I flip over to the CloudPegboard tab and type it in so I can see the official spelling and get the link for the product URL, which is one of the things on my datasheet. It's not the worst mistake in the world, but I like to get it right, and it's impossible to remember.

Yan Cui: 17:24

I've had a couple of people tell me that when it's "Amazon something", it means the service was built for amazon.com and used internally before it was exposed publicly as a service on AWS, whereas "AWS something" means the service was built for AWS. Is that the same as what you've heard, or at least one of the rumours?

Ken Robbins: 17:50

Yeah, that's one of the rumours. Another one, and I'll probably get this flipped, is that something that's more of an infrastructure capability (I'm not sure what the right term would be, because any term here is overloaded) would be Amazon, and when it's more of a managed service, layered on top, it's AWS. I once spent about half a day searching for this and reading various blogs, because I just wanted to know the answer. And the answer was that there seemed to be no consensus, at least none that was externally visible. When people had arguments that seemed solid, I could find counterexamples; maybe not a lot, but some. So I basically said, “Look, I'm just going to assume it's sort of a best effort at having it mean something, but I won't try to make it mean something.” It reminds me of Novartis, where we generated IDs for everything, right? Every compound, antibody, gene, everything has a name, and it's important to have them registered. Getting those IDs right is really the basis of the science. It turns out one of the issues we had was people trying to pick IDs that were sort of like vanity plates, so for certain situations we eventually had rules that ensured generated IDs looked random and nothing else. It's good to say either an ID has no meaning, or it does have some semantics associated with it; being halfway in between is kind of tricky. So yeah.

Yan Cui: 19:21

I've been using CloudPegboard for a while now, and I do find it quite useful for filtering out a lot of the noise in the stream of information you get about AWS. But what is your business model for CloudPegboard? How do you hope to make money out of it?

Ken Robbins: 19:39

That's the million-dollar question. This really started off as a passion project. I just thought this should exist, and I got frustrated that it didn't. At re:Invent 2019 I was yelling over the crowds in the loud bars and everything, and I lost my voice talking with a bunch of people, and I came back and said, “Look, I'm just going to do this.” So I'm an idiot entrepreneur who basically said, yeah, I'm building something because I think it's needed, and I'll worry about monetization later, which, a year and a half in, my wife is wondering about. I did charge for it at first: after I got a few hundred users, I started to charge for it, and people were paying, but not at the rate I would need to make money, and I could see that just wasn't going to work. So I decided to keep it 100% free. Now I'm just focused on making it useful and hopefully building a loyal user base that can expand. There are millions of people who could get value from it, because anyone using AWS, no matter what their role, can find it useful. So the model is to get a lot of happy users and eventually, hopefully, move to a sponsorship model. Right now I'm not doing that, because I really want to make sure it's solid and providing great value first, but eventually I think a sponsorship model will probably work best. There's also the enterprise side, where there are some additional features. Again, this was an idea I had because I wanted it when I was at Novartis: basically extending the data model for enterprises. I haven't been able to sell that idea yet, which is a little surprising to me, because I would have bought it when I was on the other side.
For example, at Novartis we had internal governance procedures and a wiki that listed every AWS service with a link to the internal security guidelines for how you need to configure that service. We happened to use Turbot for some of our governance and management, so there was always the question, “Does Turbot support this service yet?” Because if it didn't, we weren't going to let people use it. We even had lists of internal experts and things. So we had a lot of internal metadata related to each service, and I didn't want to manage my own catalogue of AWS services. CloudPegboard has this catalogue of all AWS services, with basically any attribute you can think of that goes along with each one. So the idea for our enterprise version is that you can connect your internal enterprise data and extend the data model. Then, when internal users look up, say, the CloudFormation syntax, the ARN syntax, or links to product documentation on the datasheet, they also see all of your company-specific information, such as security information, and that way people are much more likely to find it and use it. In that model, there'd be an enterprise play. But to be really honest, maybe just because it's just me, and it's hard to do enterprise sales while building product and doing lots of other things, it hasn't stuck yet. I'm hoping that eventually it will, but really, I'm just trying to reach as many people as possible, get them really, really happy, and then eventually find ways to get some sponsorship.
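One way to picture the enterprise extension is as an overlay keyed by the same immutable service ID: the public datasheet stays as-is, and an organisation's internal fields (security guideline links, whether its governance tooling supports the service, internal experts) are attached alongside. This is a minimal sketch under that assumption; the function name and all field names are hypothetical.

```python
from copy import deepcopy


def merge_datasheet(public_sheet, enterprise_data):
    """Attach an organisation's internal metadata to a public per-service datasheet.

    Internal fields live under their own "internal" key so they can never
    overwrite public attributes (an assumption, not CloudPegboard's design).
    """
    sheet = deepcopy(public_sheet)
    sheet["internal"] = deepcopy(enterprise_data.get(sheet["service_id"], {}))
    return sheet
```

Namespacing the internal fields keeps the public catalogue as the single source of truth while each organisation maintains only its own overlay.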

Yan Cui: 22:57

Yeah, starting a company is always hard, which is pretty much the reason why I haven't started mine. But I do wish you good luck. I do think it's a very useful, valuable service to the community; I've found quite a lot of value in it. Hopefully people listening today will sign up and get some value out of it as well, and maybe talk to their bosses, or their bosses' bosses, and get the enterprise ball rolling for you.

Ken Robbins: 23:23

Yeah, I appreciate that. That's great, I'm glad you're finding some value. This really came from my passion: it's one of those things where I want this, no one's building it, so I'm just going to go ahead and build it. I started my first business, an aviation weather service, so long ago that it's embarrassing, and I guess I believe that eventually it'll work out. But mostly, I'm having fun. This is a blast. I'm the classic case of what not to do as an entrepreneur: build something just because you think it's interesting. But eventually it'll work out. I should also mention that, because I'm a builder and not a marketer at all, I am looking for a CMO-like person to be a partner: someone who actually likes that stuff and is much more capable at it than I am.

Yan Cui: 24:21

Okay. And you also mentioned before the show that you were working on something new, another venture to add to the mix. Do you want to tell us about that as well?

Ken Robbins: 24:30

Sure. It's more of an extension than something totally new. I've gone out and interviewed an awful lot of people (and I still want to do more), basically doing customer interviews to learn what people are doing and where the friction points are, all around this idea of gathering and managing information, because it's such a flood of information. And I'm also adding Azure support and such, which makes it even more. One of the things I was looking for was patterns, and one that stuck out was the use of Slack, which I had supported in a small way before and use a lot myself. I was surprised at the extent to which people kept saying they pipe the RSS feeds, or a selected number of them, into their Slack. The problem with that is that everyone is doing it independently, and there are something like 30 different RSS feeds that are worth following. I have this motto that information architecture matters: just getting all the information is not the best thing to do. I'm trying to find ways to give people just the information that's relevant to them, in the way they want it. That's what CloudPegboard does, especially with the emails: whether you get them daily or weekly, you can specify just what information you want. Now I'm bringing that to Slack, with a Slack app where you can get all the feeds I collect pushed into a Slack feed that's personal to just you. The app has its own messages tab, so you get just the information you want. If you want to unwatch a service, you just unwatch the service; it continues to be organised by service. So you can say, “Do you care about region updates?
Do you care about governance changes? Do you want to know about AMI versions for the Deep Learning AMIs, or do you only care about serverless services?” You can configure, dynamically, just what information is useful to you. And even for delivery, you can say, look, only give it to me during these hours of the day, so you don't get that distraction factor. It ties into the same database, with all the same data collection, but I'm reporting it out via this highly customizable, personalised Slack app. It also has some other really neat things that I find important, such as a reading list. If you look at something and say, “Oh yeah, that's really important, but I really don't want to read it now,” you can just add it to a reading list, and there's a whole bunch of features around reading-list management, including a priority reading list and a best-effort reading list. You can basically set timers so that things expire from the best-effort reading list. Looking at personal habits, sometimes you put something there and say, I should really read this, and then it sits there for six months; after a certain amount of time, you should probably just give up on it. So I try to help with that by letting you give items a retention time. Anyway, that's basically the idea: based on what I've heard, this Slack app makes it a lot easier to get the information the way people want it. By the time this airs, it'll be well into production, and hopefully I'll have my first early adopters within a week. I'm pretty excited; I've played with it long enough myself, and it's time to let someone else play.
And it was really fun to build this tool, actually. It's fun to build a Slack app and to get much deeper use of EventBridge than I had in the past.
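The per-user filtering described for the Slack app (watched services, categories of updates, and a delivery window) could be sketched like this. The Preferences fields, the category names and the rule that matching items outside the window are held rather than dropped are assumptions made for illustration, not the app's actual logic.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import List, Set, Tuple


@dataclass
class Preferences:
    watched_services: Set[str] = field(default_factory=set)
    categories: Set[str] = field(default_factory=set)  # e.g. {"region", "governance"}
    start: time = time(9, 0)   # delivery window start (user's local time)
    end: time = time(17, 0)    # delivery window end


def wanted(item: dict, prefs: Preferences) -> bool:
    """True if the item mentions a watched service and is in a chosen category."""
    return bool(set(item["service_ids"]) & prefs.watched_services) and item["category"] in prefs.categories


def in_window(now: time, prefs: Preferences) -> bool:
    """True if 'now' falls inside the delivery window."""
    if prefs.start <= prefs.end:
        return prefs.start <= now < prefs.end
    return now >= prefs.start or now < prefs.end  # window wraps past midnight


def plan_delivery(items: List[dict], prefs: Preferences, now: time) -> Tuple[list, list]:
    """Split matching items into (send_now, hold_until_window_opens)."""
    selected = [i for i in items if wanted(i, prefs)]
    return (selected, []) if in_window(now, prefs) else ([], selected)
```

Holding items instead of discarding them means a user in a quiet period still sees the update once the window reopens.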

Yan Cui: 28:17

Okay, so you mentioned EventBridge there, which is a trigger word for me, because I do think it's a great service. So how are you using EventBridge with Slack in this case? Are you using the built-in Slack integration?

Ken Robbins: 28:29

No, actually, that doesn't quite fit what I'm doing. But the way the Slack API works (well, one piece of it) is that it'll send you an event when certain things happen, well, when most things happen. So Slack events come to me through API Gateway, and for most of these, because Slack is obviously trying to protect the user experience, you've got to respond really fast. So basically I have an event handler, a Lambda function behind API Gateway, that collects the information from Slack. If it has to do something, it basically responds with a 200 and whatever response is needed. Sometimes it'll do a little bit of work; for example, if it has to pop up a modal, it'll actually go and draw that modal. But almost all other events I just dump onto EventBridge, fire and forget, and handle them later. That way I make sure to respond to Slack quickly and efficiently: Slack sets a very short timeout, and I don't want any variability from data dips or anything else. Once an event is on the bus, I have another Lambda function, or several, subscribed to it, and they can do things. For example, I'm not sure how many people are familiar with the Home tab, but for Slack apps, if you click on the app itself, it'll give you a screen with two tabs: the Messages tab, which is the stream of messages from the app, and the Home tab. The Home tab is more of a persistent, static screen, but you can update it. One of my functions is responsible just for managing the Home tabs: it picks up an event off the bus and does whatever it needs to do, and it's sort of independent of time. Obviously I try to be as fast as I can, but it'll go and dip into DynamoDB and do some logic, and then push the result back to Slack through, essentially, a callback URL, and Slack will render that result on the Home tab.
Or I might post a message in the message stream, and so forth. So that's the simple version of it. I also have other activities that listen for state management, especially around new user onboarding, when things happen and you want to kick something off. For example, I want to auto-subscribe someone to CloudPegboard, because I use features of CloudPegboard to amplify the Slack app; sometimes having a full web user interface is handy. So I can hang different things off of there, just listen to the different events, and in some cases have multiple subscribers to the same event. It's a pretty simple use case, but what I've found fantastic is that it forces you to decouple. I think a great pattern, or habit, is to simply decide up front to use EventBridge, because that simple decision makes the rest of the architecture flow in ways where you don't start creating unnatural couplings. I certainly could do this without EventBridge, but then I'd certainly be refactoring in two months or less. So yeah, that's how I've been leveraging it.
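Having several subscribers to the same event comes down to several EventBridge rules whose patterns match it. The sketch below uses two hypothetical rules and a deliberately simplified matcher (exact values and nested objects only; real EventBridge patterns also support prefix, numeric, exists and anything-but filters) to show one `app_home_opened` event fanning out to two independent consumers.

```python
from typing import Any, Dict, List

# Two hypothetical rules on the same bus. Both match app_home_opened,
# so that event fans out to two independent subscribers.
RULES: Dict[str, Dict[str, Any]] = {
    "home-tab-renderer": {"source": ["example.slack"], "detail-type": ["app_home_opened"]},
    "onboarding-tracker": {
        "source": ["example.slack"],
        "detail-type": ["app_home_opened", "team_join"],
    },
}


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified EventBridge matching: exact values, nested via dicts."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(actual, list):
            # Array values match if any element appears in the pattern list.
            if not any(item in expected for item in actual):
                return False
        elif actual not in expected:
            return False
    return True


def subscribers_for(event: Dict[str, Any]) -> List[str]:
    """Names of every rule whose pattern matches the event."""
    return sorted(name for name, pattern in RULES.items() if matches(pattern, event))
```

Adding a new consumer then means adding a rule, without touching the Slack-facing handler, which is the decoupling Ken credits EventBridge with.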

Yan Cui: 31:25

Okay. And I guess one feature I'd probably be quite interested in: for example, I've been waiting for some service to become available in my region. This probably doesn't happen quite as often if you're based in us-east-1. But certainly in some of the European regions, I've had cases where we were waiting for a service to become available, and it would be really great to just get a Slack notification when it does. So maybe that's something that could add quite a lot of value.

Ken Robbins: 31:57

Yeah, absolutely. In fact, CloudPegboard essentially does that today if it's one of the services you're watching. But the specific way you phrased it is something I've been wanting to do and haven't yet done: to say, specifically, watch this service with this attribute. So absolutely, that's the kind of feedback I'm always looking for, because I have all the data and I have the output channel; it's just a pretty simple matter of making some rules and deciding what to do. And this all goes back to my days at Novartis. There were several places where we were waiting on capabilities. We were on an enterprise support contract, so we got spreadsheets from our TAMs and looked at forecasts of when things would come out. But sometimes you just get surprised, so it was really important to find out when something appeared in a region or a new capability came out. My canonical example from way back when: for literally years, I was waiting for API Gateway to have a VPC endpoint. I talked with the highest-level people, I even had a conversation with Mike Clayville about it, but no matter how much gnashing of teeth there was, it just didn't happen for the longest time. It was almost a joke that I kept asking "when's it coming, when's it coming?" That's one of those things where we were totally blocked until we actually got that capability. And I think that's the case, like you say, for certain regions: you see the flow as things come out. It's kind of interesting looking at South Africa and Milan, as new regions; you can see features trickling into them over the last four months or so.
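A "watch this service in this region" rule like the one Yan asks for can be reduced to diffing two availability snapshots and posting the hits to a Slack incoming webhook. This is a hedged sketch, not how CloudPegboard implements it: the snapshot format is an assumption (one possible source is the public SSM parameters under `/aws/service/global-infrastructure/regions/<region>/services`, worth verifying), and the webhook URL is yours to supply.

```python
import json
import urllib.request
from typing import Dict, Iterable, List, Set, Tuple

# Assumed snapshot shape: region -> set of service names available there.
Snapshot = Dict[str, Set[str]]


def newly_available(
    before: Snapshot, after: Snapshot, watched: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return watched (service, region) pairs that appeared between snapshots."""
    hits = []
    for service, region in watched:
        was_there = service in before.get(region, set())
        is_there = service in after.get(region, set())
        if is_there and not was_there:
            hits.append((service, region))
    return sorted(hits)


def slack_payload(hits: List[Tuple[str, str]]) -> Dict[str, str]:
    """Build a minimal incoming-webhook message body."""
    lines = [f"{service} is now available in {region}" for service, region in hits]
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: Dict[str, str]) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```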

Yan Cui: 33:45

Okay, very cool. And since you're pulling data from Twitter about people's AWS wishlist items, what are your top three AWS wishlist items?

Ken Robbins: 33:57

I've got a limit of three, huh?

Yan Cui: 34:01

You can offer more if you want.

Ken Robbins: 34:02

No, I have a lot of fairly minor things that I think would be small but really impactful. But at the top level, I have what I think is a really beautiful architecture that is completely serverless. It's what I call disaster tolerant, so essentially fault tolerant across regions. I use Global Tables for DynamoDB and origin failover for S3, along with S3 replication, so everything is beautiful and just feels good. It's a nice architecture. And again, it's active-active, so if anything really, really bad happens, I don't even have to be involved. There's no DR plan, really, because it's just active-active. Except the Achilles' heel is Cognito. I use Cognito User Pools, and as far as I know, there's still no replication or backup/restore capability that will preserve your passwords. I can back up all the data and restore it. But if I have a failover event, say user pools in us-east-1 go down, I have a choice: I can wait it out, which is probably going to work in most cases. But if you think about it in the abstract, today what I'd actually have to do is declare an event to myself, right? And say, "Okay, I'm now going to activate and point everything to my us-west-2 user pool and force every user to go through a password reset process," because I can't migrate the passwords. So frankly, I'm a little surprised it has lasted this long without it. What I really want is replication, just like we have with Global Tables, which is an amazing feature in DynamoDB; it took a while to arrive, but once we got it, it's extremely powerful. I want the same kind of capability in Cognito. And if I'm asking, then backup and restore as part of it, perhaps through AWS Backup, would tie that whole picture together.
So that's probably my number one. Number two is a little more of a nice-to-have than the previous one, but I think it'd be great to have essentially auto-tuning suggestions for Lambda settings: memory, and maybe even timeout. Conceptually, something like Alex Casalboni's Lambda Power Tuning tool, but there's really no reason Amazon can't just build this right into the product. What I'd like to be able to do is give some bounds of memory and timeout that are acceptable for a function that's actually running in production, and maybe give Lambda a hint about how much sampling it can do: can you perturb 100% or 5% of my invocations to try some of these different options? Then log them in the REPORT statement or an additional statement, which of course I could trigger off of for alerts and other things. And then, in a champion-challenger sort of way, either tell me which should be the winner, or actually just go ahead and make it the winner and converge over time. Once the service has evidence to say, "Look, this is actually meeting the criteria you're looking for," whether that's cost or speed or whatever it may be, and it has optimised on that objective function, it makes that the winner, and maybe challenges it again later. Just like on-demand billing for DynamoDB was really, really powerful, it'd be good to have something like that for Lambda functions. One of the things I do today, when I'm small and starting up a new capability, is to give it a fairly generous timeout and maybe a gig of RAM.
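The champion-challenger selection Ken wishes Lambda did for him can be sketched as a pure function over sampled invocations. Everything here is an assumption for illustration: the samples would come from wherever you record per-invocation duration and memory, the objective is lowest cost subject to a median-latency bound, and the price constant is an x86 on-demand rate to check against current Lambda pricing (billing granularity and the per-request charge are ignored).

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional

# Assumed x86 on-demand rate in USD per GB-second; check current Lambda pricing.
PRICE_PER_GB_SECOND = 0.0000166667


@dataclass
class Sample:
    memory_mb: int
    duration_ms: float


def pick_winner(samples: List[Sample], max_median_ms: float) -> Optional[int]:
    """Return the memory size with the lowest estimated cost per invocation
    among configurations whose median duration meets the latency bound."""
    by_memory: Dict[int, List[float]] = {}
    for s in samples:
        by_memory.setdefault(s.memory_mb, []).append(s.duration_ms)

    best_memory: Optional[int] = None
    best_cost = float("inf")
    for memory_mb, durations in sorted(by_memory.items()):
        med = median(durations)
        if med > max_median_ms:
            continue  # this challenger violates the latency objective
        cost = (memory_mb / 1024) * (med / 1000) * PRICE_PER_GB_SECOND
        if cost < best_cost:
            best_memory, best_cost = memory_mb, cost
    return best_memory
```

Swapping the objective (for example, fastest configuration under a cost ceiling) only changes the comparison inside the loop.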
And that's fine, because I'm only running it a few times. But then eventually it goes to production, you leave it there, and maybe six months later you realise, "Oh, I probably should have gone back, looked at that, and brought it down to 256, which is really all it needs." Going and looking at that certainly isn't the hardest thing in the world; I could run Lambda Power Tuning or do other things. But that's effort. I'm trying to keep things maintenance-free, and there's no reason we couldn't just have the service converge to a more optimal setting. It seems like kind of an obvious thing to build, and I'd love to see it. And since I get my three genie wishes... okay, I guess I can wish for them, which doesn't mean I get them, huh? Something I've wished for is having the popular Python packages just available as layers. AWS has a handful out there that you can just pick and use, and there's another listing of some published layers. But I think it's important to have layers published by AWS, so that they're secure, trusted, and maintained. I know there's a really long tail, but there's also a pretty significant head. For example, I have a layer for requests. I shouldn't have to maintain a layer for requests; it'd be nice to just point to an ARN and use that, and likewise for Boto3 and NumPy. There are just so many things it would be great to have as layers, again, all to reduce friction and make things much more plug-and-play. That's just one more thing I think would be really handy.

Yan Cui: 39:30

Thank you. Those are some of the best AWS wishlist items I've heard. Certainly the one about Cognito is surprising to me as well. I've had customers ask me a few times, "Well, we want to do this multi-region thing. Everything seems fine, apart from Cognito. How do people do it?" And unfortunately, the answer has always been that no one does it, because it's just really difficult and impractical. I do wish they had better support for exporting users; like you said, every time you want to change your user management system, or even just migrate from one pool to another, it's a giant pain in the ass. As for the power tuning, I think that's a very interesting idea. I actually saw someone do something kind of clever where the function itself does some kind of, I guess, metaprogramming to change its own configuration. In this particular case, I think he was changing not the memory but the timeout. So when the function sees that it's taking longer and longer to process things, it adapts by increasing its own timeout. I've also written functions in the past where we changed the Kinesis batch size to adapt to the fact that we just weren't processing things fast enough: our downstream systems were taking too long to respond, so we adapted by changing the batch size of the Kinesis function. So potentially, that's something you can roll yourself for now. But it's certainly way more complexity than it's worth, I think. It would be better if Amazon could just provide it out of the box, as auto-tuning.
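The self-adjusting Kinesis batch size Yan mentions could look something like the sketch below. The halve/grow policy, the 50% headroom threshold, and the idea of reading the event source mapping UUID from configuration are all hypothetical choices; the one real AWS call is boto3's `update_event_source_mapping(UUID=..., BatchSize=...)` on the Lambda client, which the function's role would need `lambda:UpdateEventSourceMapping` permission to make. The client is passed in so the policy can be tested without AWS.

```python
from typing import Any, Dict, Optional

# Kinesis event source mappings accept batch sizes from 1 to 10,000.
MIN_BATCH, MAX_BATCH = 1, 10_000


def next_batch_size(current: int, elapsed_ms: float, budget_ms: float) -> int:
    """Halve the batch when processing overruns the time budget, grow it by
    roughly 25% when there is plenty of headroom, otherwise keep it."""
    if elapsed_ms > budget_ms:
        proposed = current // 2
    elif elapsed_ms < 0.5 * budget_ms:
        proposed = int(current * 1.25) + 1
    else:
        proposed = current
    return max(MIN_BATCH, min(MAX_BATCH, proposed))


def maybe_update_mapping(
    lambda_client: Any,
    mapping_uuid: str,
    current: int,
    elapsed_ms: float,
    budget_ms: float,
) -> Optional[Dict[str, Any]]:
    """Apply the new batch size only when it differs from the current one.

    `lambda_client` would be boto3.client("lambda") in a real function;
    `mapping_uuid` is a hypothetical value supplied via configuration.
    """
    new_size = next_batch_size(current, elapsed_ms, budget_ms)
    if new_size == current:
        return None
    return lambda_client.update_event_source_mapping(UUID=mapping_uuid, BatchSize=new_size)
```

A feedback loop like this can oscillate if the thresholds sit too close together, which is part of the complexity Yan is wary of.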

Ken Robbins: 41:09

Exactly, because everybody could do this, but then it's the classic undifferentiated heavy lifting, and it's really the same function for everyone; it doesn't need to know much about your workload. The metaprogramming example of reacting to the service you're calling is something you'd really want to do yourself, and that's actually a very cool use case; I think that's really neat. In fact, this doesn't have to be built into AWS: you could write a Lambda function that just sits there sucking up CloudWatch Logs events and has access to go and tweak the parameters of the Lambda function. But just because you could doesn't mean you should. And that's kind of why we use AWS: I want them to do everything that's not business logic, basically.
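The "Lambda that sucks up CloudWatch Logs" idea might start like this: decode a CloudWatch Logs subscription delivery (gzipped, base64-encoded JSON with a `logEvents` list), parse the Lambda `REPORT` lines, and derive a memory suggestion. The suggestion rule (peak usage times 1.5, rounded up to 64 MB, floor of 128 MB) is a hypothetical heuristic; because memory also sets the CPU share, lowering it can lengthen duration, so treat the output as a starting point rather than something to apply automatically.

```python
import base64
import gzip
import json
import re
from typing import Any, Dict, List

# Matches the standard Lambda REPORT line (tab- or space-separated fields).
REPORT_RE = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed>\d+) ms\s+"
    r"Memory Size: (?P<memory>\d+) MB\s+"
    r"Max Memory Used: (?P<used>\d+) MB"
)


def decode_subscription_event(event: Dict[str, Any]) -> List[str]:
    """CloudWatch Logs subscriptions deliver gzipped, base64-encoded JSON."""
    raw = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    return [e["message"] for e in json.loads(raw)["logEvents"]]


def parse_reports(messages: List[str]) -> List[Dict[str, float]]:
    """Extract duration and memory figures from REPORT lines; skip other lines."""
    reports = []
    for msg in messages:
        m = REPORT_RE.search(msg)
        if m:
            reports.append({
                "duration_ms": float(m["duration"]),
                "memory_mb": int(m["memory"]),
                "max_used_mb": int(m["used"]),
            })
    return reports


def suggest_memory_mb(reports: List[Dict[str, float]], headroom: float = 1.5, floor_mb: int = 128) -> int:
    """Hypothetical heuristic: peak observed usage times a safety factor,
    rounded up to a multiple of 64 MB, never below floor_mb."""
    peak = max(r["max_used_mb"] for r in reports)
    target = max(floor_mb, int(peak * headroom))
    return -(-target // 64) * 64
```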

Yan Cui: 41:57

Yeah, absolutely. Absolutely. And I'd also say that in most of the architectures I've seen, Lambda cost is probably never significant enough to warrant much effort. Most of the cost optimization has gone into other things like API Gateway, or even CloudWatch Logs, which often cost a lot more than the Lambda invocations themselves.

Ken Robbins: 42:20

Yeah, exactly. If something doesn't cause a lot of pain, it's harder to prioritise. But for me, in some cases I'll be really generous with memory, to the point that it's a 10X factor. Today that still isn't bothering me too much, because I just don't have that scale of users. But over time I'm hoping to grow by a factor of 1,000. If you think about it in 100-millisecond billing units, every second has 10 of those, and it multiplies from there. I've occasionally profiled some of my functions, and I always find it fascinating; there's definitely a sweet spot, and I really do find a lot of value in it. One time we actually got value out of two gigs, which was rare. So yeah, I agree that it's a low cost. But sometimes I'm trying to optimise performance without way overdoing the cost, because it's hard to tell at the beginning, and then you kind of forget; if it's not important, you're never going to go back.
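The arithmetic behind Ken's "10X factor" can be made concrete. Assuming the 2020 billing model he refers to (duration rounded up to the nearest 100 ms), a function with memory $M$ in MB and run time $d$ in ms consumes the GB-seconds below. The figures are illustrative only: 1,280 MB versus 128 MB, an 80 ms run time that does not improve with more memory, one million invocations, and an assumed rate of \$0.0000166667 per GB-second, ignoring the free tier and per-request charges.

$$
\text{GB-s per invocation} = \frac{M}{1024} \times \frac{100\,\lceil d/100 \rceil}{1000}
$$

$$
\begin{aligned}
M = 1280,\ d = 80:&\quad 1.25 \times 0.1 = 0.125\ \text{GB-s}, &\quad 10^6 \times 0.125 &= 125{,}000\ \text{GB-s} \approx \$2.08 \\
M = 128,\ d = 80:&\quad 0.125 \times 0.1 = 0.0125\ \text{GB-s}, &\quad 10^6 \times 0.0125 &= 12{,}500\ \text{GB-s} \approx \$0.21
\end{aligned}
$$

Cost grows linearly with invocation count, so a 1,000-fold increase in traffic multiplies both lines by 1,000; the "sweet spot" Ken mentions is where extra memory shortens $d$ enough to offset the larger $M$.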

Yan Cui: 43:24

So if you want a compromise: I wrote a blog post a while back about how you can use Alex's power tuning tool as part of your CI/CD pipeline, so that with every deployment you essentially check whether you're still running at your sweet spot, rather than having to do it manually from time to time.
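A CI/CD gate along the lines Yan describes might build the execution input for a deployed aws-lambda-power-tuning state machine and compare its recommendation with the configured memory. The field names (`lambdaARN`, `powerValues`, `num`, `payload`, `strategy`, and `power` in the output) follow the tool's README as we understand it and should be verified against the version you deploy. The pipeline would pass the input to Step Functions `start_execution` and read the `output` from `describe_execution` once the status is no longer `RUNNING`; those calls are left out so the sketch runs without AWS.

```python
import json
from typing import Any, Dict, List


def build_power_tuning_input(
    function_arn: str,
    payload: Dict[str, Any],
    power_values: List[int],
    num: int = 20,
    strategy: str = "cost",
) -> str:
    """Execution input for the power tuning state machine (field names assumed
    from its README; verify against the deployed version)."""
    return json.dumps({
        "lambdaARN": function_arn,
        "powerValues": power_values,
        "num": num,
        "payload": payload,
        "strategy": strategy,
    })


def check_sweet_spot(execution_output: str, configured_memory_mb: int, tolerance: float = 0.0) -> bool:
    """True if the recommended memory ("power") is within `tolerance` (a fraction
    of the configured value) of what is deployed; a CI job could warn or fail otherwise."""
    result = json.loads(execution_output)
    recommended = int(result["power"])
    return abs(recommended - configured_memory_mb) <= tolerance * configured_memory_mb
```

Setting a non-zero tolerance keeps the pipeline from failing over small, noisy differences between runs.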

Ken Robbins: 43:43

Yeah, yeah. That's such a great idea. There are places, which is probably poor design, no, not probably, it is poor design, where that doesn't work for me, just because things aren't quite as idempotent as you'd want them to be, and I'm dealing with some variable data. For example, I'm doing a certain amount of processing on a per-user basis, and someone could be watching 300 different services while someone else could be watching four. So the amount of information I process, whether it's for a Slack update or an email update, can be highly variable. That's why I kind of like the idea of running against actual production data, whereas any kind of quasi-static analysis in the CI/CD pipeline doesn't have that. Could I build test cases that reflect my usage patterns? Absolutely. But I haven't.
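One way to build the usage-pattern test cases Ken mentions is to bucket real users by how many services they watch and turn each bucket into a weighted synthetic payload. The bucket edges, the `watched_services` payload field, and the pessimistic choice of each bucket's upper edge are all hypothetical; the `{"payload": ..., "weight": ...}` shape mirrors the weighted-payload option we understand Lambda Power Tuning to offer, which is worth confirming for your version.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

# Hypothetical bucket edges for "number of services a user watches".
BUCKETS: List[Tuple[int, int]] = [(1, 10), (11, 50), (51, 150), (151, 400)]


def weighted_payloads(watch_counts: List[int]) -> List[Dict[str, Any]]:
    """Turn observed per-user watch counts into weighted test payloads.

    Each non-empty bucket becomes one synthetic payload sized at the bucket's
    upper edge (a pessimistic choice), weighted by how many users fall in it.
    Counts outside every bucket are ignored.
    """
    counts: Counter = Counter()
    for n in watch_counts:
        for low, high in BUCKETS:
            if low <= n <= high:
                counts[(low, high)] += 1
                break
    return [
        {"payload": {"watched_services": high}, "weight": counts[(low, high)]}
        for low, high in BUCKETS
        if counts[(low, high)]
    ]
```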

Yan Cui: 44:40

Alright, so this has been fun. Before we go, how can people find you on the internet?

Ken Robbins: 44:46

Well, on Twitter I'm @CloudPegboard, which is really just me, and DMs are open. On LinkedIn I'm Ken Robbins, with two b's; probably just add CloudPegboard to the search term to disambiguate me. I blog on Medium. And, of course, cloudpegboard.com/ is the site.

Yan Cui: 45:07

Excellent. So if you're using AWS, you're struggling with the constant stream of information about what's happening, and you want to focus on just a few things, then check out CloudPegboard. Personally, I've been using it for a while and I do find it quite useful. So yeah, Ken, thank you so much again for taking the time to talk to us today. Stay safe and stay well.

Ken Robbins: 45:28

Oh, fantastic. Same to you. This was a tonne of fun, and I learned something as well, so this is great. I really appreciate it.

Yan Cui: 45:34

Alright, man, take it easy.

Ken Robbins: 45:35

You too, take care. Bye bye.

Yan Cui: 45:36

Bye bye.

Yan Cui: 45:49

So that's it for another episode of Real World Serverless. To access the show notes, please go to realworldserverless.com. If you want to learn how to build production-ready serverless applications, please check out my upcoming courses at productionreadyserverless.com. And I'll see you guys next time.