Real World Serverless with theburningmonk

#48: Serverless at scale at Prosieben TV with Daniele Frasca

February 17, 2021 Yan Cui Season 1 Episode 48

You can find Daniele on Twitter as @dfrasca80 and on LinkedIn.

You can see open positions at Prosieben on this site.

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

To learn how to build production-ready serverless applications, check out my upcoming workshops.


Opening theme song:
Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday
License: http://creativecommons.org/licenses/by/4.0


Yan Cui: 00:12  

Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real world practitioners and get their stories from the trenches. Today, I'm joined by Daniele Frasca. Hey man, how are you doing? 


Daniele Frasca: 00:24

I'm fine. How are you? 


Yan Cui: 00:26   

So we met each other at ServerlessDays Rome, when the power went out halfway through the conference, while Alex Casalboni was presenting, if I remember correctly.


Daniele Frasca: 00:37

Yes, it was very funny.


Yan Cui: 00:39

So you were telling me that you've been working with a German TV company called ProSieben, and you guys are doing some really interesting stuff with Lambda over there. It sounds like you're running at pretty high scale as well. So can you maybe start by telling us about yourself and what you do at ProSieben?


Daniele Frasca: 00:56

Sure. I’m Daniele Frasca. I joined ProSieben around two years ago. I've worked as a developer around Europe: Italy, where I started, then Ireland and the UK, and after the Brexit referendum I landed here in Germany at ProSieben. ProSieben is one of the biggest German TV broadcasters and reaches over 45 million households in Germany, Austria and Switzerland. At the same time, we have around 36 million unique users every month on our applications, marketed by ProSiebenSat.1. My team, Syndication, is part of the digital media distribution arm of ProSieben, and we are responsible for syndicating and distributing ProSieben formats and content to other media companies and social media platforms.


Yan Cui: 01:41

Okay, it sounds like you've got a fairly sizable user base. And I guess, if your content has to be distributed to many different platforms, I imagine there's a lot of data pipeline, maybe media conversion, happening. Can you talk a little bit about how ProSieben is using serverless in that context?


Daniele Frasca: 01:58

Okay, so ProSieben has many applications. My department is divided into two major products: the classic TV applications, like the web and mobile apps, and the digital distribution side, where my team sits. Not everybody is using serverless, but on the TV side, where we see as many as 100 million hits on the API, we are moving bit by bit from the cluster side to a more serverless one. I can't tell you more about that, because it's not my team. As for my team, we are a B2B platform, and our focus is availability and resilience. We also take care of scalability, performance and everything else, but our team is a perfect fit for serverless, because serverless shines when you look at an event-based model, with perfect integration between many services compared to a classic cluster. So, practically, when I joined ProSieben, the syndication team was running a three-tier enterprise application inside a cluster. We had our EC2 machines, an ElastiCache cluster, a Mongo cluster, everything. The problem was that at the time the team was firefighting: the system couldn't scale, and every time we touched something, it broke something else in that distributed monolithic world. So we started introducing serverless, doing workshops to try to convince the team, and mostly the management. And we applied the strangler pattern: incrementally, we replaced components of the legacy system with a serverless model until, in six months' time, we had pretty much rewritten the entire application.


Yan Cui: 04:12

Okay, so you rewrote the previous distributed monolith. When you say distributed monolith, do you mean that you had lots of monoliths sharing the same database, and you re-architected them into what one would think of as a microservices architecture, with distinct bounded contexts and responsibilities, and each microservice with its own database?


Daniele Frasca: 04:39

Yes, yes, we had exactly that situation, and all the microservices were very chatty, so tracking the flow was very, very difficult. Serverless actually helped us a lot, because we designed each function with a single responsibility and we simplified the rules. The flow is much, much easier once you use a combination of Step Functions, Lambdas, and so on. So yes, this is how we applied serverless in our sector.


Yan Cui: 05:21  

Okay, let me come back to Step Functions a bit later, because I'm a big fan of Step Functions and I'd love to hear about your use case there. So in this case, did you also have to migrate the database technology itself? Because I imagine with the distributed monolith and the shared database, you were maybe using some kind of relational database, and everyone was just connecting to it and updating data directly. Now that you've moved to microservices, to this new serverless world, are you still using a relational database? Or are you using something like DynamoDB or other databases?


Daniele Frasca: 05:56

Yeah, so we're using DynamoDB. The way we did it was that we introduced events into the legacy system, and with these events we migrated the monolithic data over a month. We were running a shadow of the production application at the time, so when we launched, we didn't need to migrate anything: we just switched off the old system and switched to the new one. And yes, we moved from Mongo to DynamoDB, and of course the new system has different query patterns for everything, but that didn't affect us.



Yan Cui: 06:45  

Okay, I see. So you used the pattern whereby, instead of rewriting and then doing a big bang release, you start by updating the legacy system to emit events, so that you can populate your new database from those events, filling in the data you need. And then you create a parallel service, a new microservice that does one part of what the monolith did.


Daniele Frasca: 07:09

Exactly.


Yan Cui: 07:09

And then over time you route the requests for that feature from the monolith to the new microservice, and gradually repeat that process.


Daniele Frasca: 07:19

Exactly. We did exactly this. And people think it was very difficult, but if you think about it, it's quite easy, because serverless shines with this event model, and it actually helped us. Every application has its own complexity, and it was not easy, but in six months, as a small team of four, we managed to practically maintain the legacy system and rebuild a new one at the same time. We actually reduced the cost from 40k to 10k, and we improved the scalability of the system up to 10,000 requests per second.
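The event-based migration Yan and Daniele just walked through can be sketched in a few lines. This is a minimal illustration, not ProSieben's code: the event shape, the IDs and the in-memory "bus" and "table" are all hypothetical stand-ins for what would really be EventBridge/SQS and DynamoDB.

```python
# Strangler-fig migration sketch: the legacy monolith is patched to emit a
# change event for every write, and the new service replays those events
# into its own data store until it holds a full copy of the data.
import json

def legacy_emit(bus, entity_id, action, payload):
    """The legacy system publishes an event for every write it performs."""
    bus.append(json.dumps({"id": entity_id, "action": action, "data": payload}))

def replay_into_new_store(bus, table):
    """The new service consumes events and builds its own copy of the data."""
    for raw in bus:
        event = json.loads(raw)
        if event["action"] == "DELETED":
            table.pop(event["id"], None)
        else:  # CREATED / UPDATED are treated as idempotent upserts
            table[event["id"]] = event["data"]

bus, table = [], {}
legacy_emit(bus, "title-1", "CREATED", {"name": "Show A"})
legacy_emit(bus, "title-1", "UPDATED", {"name": "Show A", "lang": "de"})
legacy_emit(bus, "title-2", "CREATED", {"name": "Show B"})
replay_into_new_store(bus, table)
print(table["title-1"])  # the new store converges on the latest state
```

Once the shadow copy is in sync, the old system can simply be switched off, which is exactly the "no big bang migration" property Daniele describes.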


Yan Cui: 08:11

Okay, so that's a pretty good saving. But I guess usually cost is not the primary driver for people doing this kind of migration. Was there some kind of trigger, some kind of problem that made you guys think, this architecture is not good enough for what we want to do, and made you want to go build microservices using serverless technologies?


Daniele Frasca: 08:34

Yeah, so the history behind this: the application had been maintained by many people, and by the time I joined, the old developers were not there anymore. The system couldn't handle many requests. We weren't delivering any value for the business anymore, because we were firefighting all the time; bugs took days or weeks to fix. So we reached the point where what we had was not manageable. And, again, serverless fit 100% of our needs. We are a B2B business, even though with serverless you can also build a front-end application at immense scale. For us, serverless gave us the advantage of starting to work much faster, releasing features, and reducing the cost, as I said. And, most importantly, our management was much happier, because we started delivering features for the partners and onboarding new partners much, much faster than before. So there are many, many factors behind why serverless, and why we are so happy. At the team level, we don't have many hoops to jump through, and the team now focuses on what they do best, what they are paid for: coding.

Yan Cui: 10:13

Okay, that's a very, very similar story to the one I had back in 2016, when I was at a social network startup. We had similar problems to yours in terms of firefighting all the time, the application not being able to scale, and, like you said, taking months to deliver a new feature. And we went through pretty much the same migration process that you did: going from a monolith with a MongoDB database to microservices, each with their own database in DynamoDB. It's also a similar pattern to what you did in terms of using events to pave the path, so that we could migrate data from the legacy system to the new service and then switch certain endpoints over, proxying them to the new microservices. It's really refreshing to hear someone going through a very similar journey to the one I took before. Do you have some sense of how much faster the team is now compared to before? Do you have any metrics around how long it took to deliver features on the monolith versus how long it takes to deliver a similar feature nowadays?


Daniele Frasca: 11:22  

It looks like we went from a month or two to weeks. In a sprint we usually deliver two or three features, if requested. And, most of all, we actually don't have many bugs, because, starting from scratch, we managed to have 100% unit test coverage. We do automation tests, integration tests; we do everything by the book. We also solved a problem with deployment: before, there were many manual steps and it took more than an hour, and now it's just a click, and minutes later the application is up to date. So yeah, I think we are way, way faster.


Yan Cui: 12:17  

That's really good to hear. Again, that's the sort of thing I think is a really good reason why someone should go serverless: your velocity goes up massively, there are fewer bugs, and it's also a lot easier to have a really smooth CI/CD pipeline. Deployment is just so much simpler when there are fewer things you have to worry about.


Daniele Frasca: 12:42  

Yes. I think another advantage is that the team now has full control of the system, in the sense that we actually know how the flow works and how things work. In serverless, many, many people complain that the architecture diagrams are more complicated, because you have a very detailed flow, with many Lambdas and messages. But everybody in the team now knows exactly what each Lambda does and what the flow is, something you couldn't have with a monolith, because the monolith's diagram is one block that does a billion things. If you look at this from the perspective of onboarding new team members, these are advantages that you most likely don't have with the monolith approach.


Yan Cui: 13:41  

Yes, absolutely. Totally agree, 100% agree there. Okay, so on the flip side, what were some of the biggest challenges when it came to transitioning from your monolith application to serverless? I guess there are some technical challenges, but in terms of the cultural and engineering challenges with the team you had, was everyone already familiar with serverless technologies? Or did you have to train everybody up?


Daniele Frasca: 14:08  

Yes. I already had experience; I started five or six years ago, something like that, when I was in London. So, practically, it's a cultural change. Usually people are comfortable with what they're doing, so we had the classic discussion in the team and with management: let's keep going like this, let's try to improve what we have, and here's the roadmap to do so. But even if you are very good, that will never happen; if you didn't manage to improve the system before, you won't do it now. So what I did was run many meetings and workshops about serverless and its benefits: how you can increase agility, lower the cost, which was very important for the management, and so on. And I had to show by example. I remember I wrote a component for myself, on my own time. At the time it was a Step Function doing packaging, using API Gateway as a proxy to communicate with the endpoints inside the cluster. I had to show all of these kinds of things, because it's in the nature of developers not to trust new things. Actually, the main disadvantage I have with serverless at the moment is that we cannot find people. Nobody has serverless experience. When people talk about the cloud, like AWS, they think Docker and a cluster. And this is an issue: we need to scale the team, but people don't even apply, maybe because they are scared of serverless. And every day I find myself discussing with other colleagues that serverless is not so expensive at scale. I did many calculations with other people for API Gateway plus Lambda. API Gateway comes with a default limit of 10,000 requests per second, and you can ask AWS to increase that limit, so you could have 600,000 requests per minute.
That is unbelievable if you think that something like this comes with a ticket that you open with AWS, while if you want to run a cluster at that scale, you need an SRE team and a war room with people on call. So the total cost of ownership is much, much higher. These were all the things I did when I joined, when I tried to transition the team from normal development, let's say from Docker, to serverless.
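Daniele's throughput figures are easy to sanity-check with some back-of-envelope arithmetic. The 10,000 requests per second is the default API Gateway account-level limit he cites; the dollar figure uses an assumed, illustrative price per million requests, not current AWS pricing.

```python
# Convert the default API Gateway rate limit into the per-minute figure
# quoted in the conversation, then into a rough monthly request volume.
DEFAULT_RPS = 10_000                 # default API Gateway account limit (raisable by ticket)
per_minute = DEFAULT_RPS * 60
print(f"{per_minute:,} requests/minute")   # the "600,000 requests per minute" figure

# Rough monthly volume if that rate were actually sustained 24/7:
seconds_per_month = 60 * 60 * 24 * 30
monthly_requests = DEFAULT_RPS * seconds_per_month
print(f"{monthly_requests:,} requests/month")

# Illustrative cost at an assumed $1.00 per million requests (HTTP-API-style
# pricing; check the current AWS price list before relying on this):
assumed_price_per_million = 1.00
print(f"~${monthly_requests / 1e6 * assumed_price_per_million:,.0f}/month for the gateway alone")
```

The point of the exercise is the one Daniele makes: that scale arrives via a limit-increase ticket, with no SRE team attached.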


Yan Cui: 17:17  

Yep, staffing cost is one of those costs that people don't think about when they consider the total cost of ownership of a solution. What you're paying AWS every month is easy to measure, and I guess the old fallacy is that what gets measured gets optimised, forgetting the fact that you've got a team of four engineers just there to look after your Kubernetes cluster, because it's such a big, heavy, complex piece of machinery, and you're paying those guys 10,000 pounds a month or something like that. So your total cost of ownership is actually much, much higher than what you see on your AWS bill. And that's not to mention the fact that it takes longer to develop features, and there are more challenges you have to handle in terms of going multi-AZ, making sure you've got the right redundancy and scaling policies, because even if you're running Kubernetes, you still need to worry about scaling the cluster underneath.


Daniele Frasca: 18:16  

Yes. And most people, especially in media, scale the cluster up before primetime, based just on the hope that a certain number of people will show up. With serverless, you already know that you can do, let's say, 600,000 requests per minute out of the box. You can sleep; you don't need people checking the system all the time to predict traffic. I think the issue is that serverless still feels like a new technology, even though it has existed for five years or so, and the cultural change is very slow.


Yan Cui: 19:03  

Yeah, yeah. Lambda came out in 2014, so it's about the same time as when Docker went to version one; I think that was the end of 2013. I remember when Lambda came out it was fairly limited: you could trigger a Lambda function with S3. But then when API Gateway came out, about six or nine months later, that was the big game changer; I think that opened up a lot more use cases for what you could do with Lambda. And I guess even if you're worried about the cost at scale with something like API Gateway, which can get expensive, there's also the option of looking at ALB, where you're paying for uptime. The pricing model is different, but at high volume ALB can be a lot cheaper compared to API Gateway.


Daniele Frasca: 19:55  

Well, we can discuss that; I have my opinion. If you take the normal microservice, right, it's API, computation, database. So if you just concentrate on the endpoint and the computation, rather than an ALB in front of a cluster, I did the calculations at 10,000 requests per second. Sorry, the calculation was for 100 million requests: 100 million hits on API Gateway will cost around $10 for the gateway alone, and then there's the computation of the Lambda on top, say one second to return some information, which is actually quite slow, and that came to around a thousand. And to have that scale, I don't think you can compare with a cluster. I mean, it will cost you much more, unless you start to put many things inside the cluster and optimise the cost that way.


Yan Cui: 21:08  

No, what I meant was ALB with Lambda. I don't mean that you have to run a cluster behind the ALB; you could just have an ALB instead of API Gateway, but still routing requests to Lambda functions, so that you don't have to manage any clusters yourself.


Daniele Frasca: 21:23  

Yes. And my other point is that you can also put an API Gateway in front. As of, I think, just yesterday, there is also an API Gateway integration with the Application Load Balancer, if you want to keep everything private.


Yan Cui: 21:36  

Okay, well, okay. But when I did some calculations before, ALB was a lot cheaper once you get into the thousands of requests per second of throughput, compared to API Gateway, even compared to API Gateway HTTP APIs, which are already quite a lot cheaper than API Gateway REST APIs. ALB can still be very, very cost effective when you've got, say, a consistent throughput of thousands of requests per second. If you've got spiky traffic, okay, maybe that's different: when you've got a massive spike, I don't know, once a day, but your baseline traffic is still quite low, then that cost equation might not apply. But if you've got a really steady, high throughput, then I think ALB is going to save you a lot of money compared to API Gateway. That is, as long as you don't need anything ALB doesn't support, like custom Lambda authorizers and things like that.
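The trade-off Yan describes comes down to two different pricing models: API Gateway charges per request, while an ALB charges a flat hourly fee plus capacity units (LCUs). The sketch below compares them at a few steady request rates. All the prices and the LCU dimension are illustrative assumptions for the shape of the comparison, not quoted AWS pricing.

```python
# Compare a per-request pricing model (API Gateway) with an hourly+capacity
# model (ALB) under steady load. Prices are placeholder assumptions.
HOURS_PER_MONTH = 730

def api_gateway_cost(rps, price_per_million=1.00):
    """Monthly cost when every request is billed individually."""
    requests = rps * 3600 * HOURS_PER_MONTH
    return requests / 1e6 * price_per_million

def alb_cost(rps, hourly=0.0225, lcu_hourly=0.008, rule_evals_per_lcu=1000):
    """Monthly cost for an hourly fee plus load-based capacity units.
    Assumes rule evaluations are the dominant LCU dimension at steady load."""
    lcus = rps / rule_evals_per_lcu
    return (hourly + lcus * lcu_hourly) * HOURS_PER_MONTH

for rps in (10, 100, 1000):
    print(f"{rps:>5} rps: API GW ~${api_gateway_cost(rps):,.0f}  ALB ~${alb_cost(rps):,.0f}")
# With these assumptions the ALB's cost barely moves with traffic, while the
# per-request model grows linearly -- which is why steady high throughput
# favours the ALB, and spiky low-baseline traffic can favour API Gateway.
```

The crossover point depends entirely on real, current pricing and on which LCU dimension dominates your traffic, so this is only a template for running your own numbers, as Daniele suggests next.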


Daniele Frasca: 22:32  

Yeah. In the end, yes, you can build pretty much everything with serverless. But it's also true that you always need to do your own tests to see if everything fits. Most likely, when you have a very high-scale application, maybe you have both, a hybrid: you have a cluster and you have some Lambdas. It always depends, don't get me wrong; I think it depends on the scale. But the majority of applications out there can be done in a serverless way.


Yan Cui: 23:04  

Yeah, absolutely. And even companies like iRobot, which deal with a fair amount of traffic, and Bustle is another good example: they're serving, I think, maybe ten million or more monthly active users on their user-facing APIs and websites, so they're handling quite a lot of traffic. And they have had to do some slightly unconventional things to optimise latency, but also cost. They try to cut out as much as they can, so they don't use API Gateway for everything, and sometimes they do direct Lambda-to-Lambda invocations, things like that. But yeah, if you've got a specific use case and specific concerns, then by all means go off the beaten path. The beaten path is good enough for maybe 90-95% of you out there. So I want to circle back to Step Functions, because, like I said, I'm a big fan of Step Functions. Can you shed some light on some of your use cases for Step Functions and how it helps you implement those features?


Daniele Frasca: 24:17  

Okay, so I think first I need to explain, at a very high level, what the architecture is. It's syndication as a service: we distribute packages, and a package is made of videos, images and metadata, sent to media companies and social platforms. Our architecture is made of pretty much five major components, with an overall of 70 Lambda functions. The five components work like this. First, we receive a notification from an external team saying, look, this needs to be delivered to partner X. We do a query on AppSync, get our data from there into DynamoDB, and with a DynamoDB stream we check the difference and send a message to an SQS queue for the second component. The second component does some checks and then needs to emit an event, like "created" or "to be updated". These events go through EventBridge, which directly triggers a Step Function, and here we get to the third component, where we use a Step Function as an orchestrator. This is what we call the packaging system. We call a different team for the transcoding, because we do transcoding on demand, and we use the SQS integration: we send an SQS message to this other team, and they do the transcoding. After a few hours, whatever the time is, they respond with a success token or a failure token, and we go ahead and generate the custom images and the metadata. So that's the first Step Function; here we also call some express workflows for different types of configuration. Then we emit another event, again with EventBridge, which triggers another Step Function: the delivery system. Here we have a few steps, and then we have the integration with AWS Batch, where we download the videos that the other team uploaded on our side, along with the metadata and images, and we do the packaging: a zip, a tar, and everything.
Then we transfer this package to different services, like Aspera or S3, whatever the customer wants, to their servers. And in the end we have the last component, which is an API endpoint on API Gateway, where the customer sends us feedback: yes, we received the package and it has been processed correctly, or there is an error. So these are pretty much the five components we're using. As I said, we use Step Functions for application orchestration. We have many steps with conditions, and Step Functions has many features built in that don't need to be coded inside the application, like parallel processing, timeouts and retries. All these little features come out of the box, and that means less code for us, less to maintain, and fewer errors, because whenever you write code, you make mistakes. So this is how and why we are using Step Functions.
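The hand-off to the transcoding team that Daniele describes maps onto the Step Functions "callback" service integration. Below is a minimal Amazon States Language state expressed as a Python dict: the state sends an SQS message carrying a task token and then pauses until the other team calls back with success or failure. The queue URL, account ID and field names are hypothetical.

```python
# Sketch of a Step Functions callback state: sendMessage.waitForTaskToken
# pauses the workflow until SendTaskSuccess/SendTaskFailure is called with
# the token that was passed along in the message.
import json

transcode_step = {
    "TranscodeVideo": {
        "Type": "Task",
        # The ".waitForTaskToken" suffix selects the callback integration pattern.
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": "https://sqs.eu-central-1.amazonaws.com/111122223333/transcoding-requests",
            "MessageBody": {
                "videoId.$": "$.videoId",
                # "$$" addresses the context object; Step Functions injects the token.
                "taskToken.$": "$$.Task.Token"
            }
        },
        "TimeoutSeconds": 6 * 3600,   # allow the transcoding team several hours
        "Next": "GenerateImagesAndMetadata"
    }
}

print(json.dumps(transcode_step, indent=2))
```

The `TimeoutSeconds` matters in this pattern: without it, a lost callback would leave the execution waiting up to the workflow's maximum duration.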


Yan Cui: 27:48  

Okay, that's cool. So you're using SQS with a task token, and then you wait for the task token callback from the thing you've just called. One thing I want to ask about here, though: you're using AWS Batch to download those video files. Why use AWS Batch rather than just having a Lambda function?


Daniele Frasca: 28:09  

Because the download could take well over 15 minutes; the files can be 100 or 200 gig, and they take time. Plus, it's not just downloading: it's also zipping, or doing different types of compression. The delivery process could take an hour, and once the package is downloaded and compressed, you need to transfer it, and the transfer is not under our control: it could take 20 minutes, but it could also take six hours. Because of this, we need a background process, and it would be pretty much impossible to keep track of all that with a Lambda that has a timeout of 15 minutes. So for this we have AWS Batch. Plus, AWS Batch comes in handy because it scales up and down and has the concept of the job queue. If we are sending 10,000 deliveries, we can split them by partner, or split them by destination service, say one that is using Aspera, and Batch scales up and down. And prioritisation is built in, which we actually use, because sometimes one video has a much higher priority than all the others and we need it to start transferring before everything else. Because Batch in the end is a cluster of machines, we configure the compute environment with spot instances. And another feature built into Batch is the retry. What if the destination server goes down, or the connection times out? What do you do? Instead of building code to handle those situations, Batch can be configured with retries, and it retries until success.
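The retry behaviour Daniele relies on is configured on the Batch job itself rather than coded in the transfer logic. The sketch below only builds the kind of request payload a `batch.submit_job(**job)` call in boto3 would take; the job names, queue and environment variable are hypothetical.

```python
# Sketch of submitting a Batch job with a built-in retry policy: on a
# transient failure (destination server down, connection timeout) Batch
# re-runs the container from scratch, up to the configured attempts,
# with no retry code inside the job.
job = {
    "jobName": "deliver-package-partner-x",
    "jobQueue": "delivery-queue-high-priority",   # queue choice drives prioritisation
    "jobDefinition": "package-transfer:3",
    "retryStrategy": {"attempts": 5},             # retry up to 5 times before failing
    "containerOverrides": {
        # Per-job knobs, e.g. which partner endpoint to push to.
        "environment": [{"name": "PARTNER_ID", "value": "partner-x"}]
    },
}

max_attempts = job["retryStrategy"]["attempts"]
print(f"Batch will run this job up to {max_attempts} times before marking it failed")
```

Submitting high-priority deliveries to a separate, higher-priority job queue is what gives the "this video jumps the line" behaviour without any scheduling code.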


Yan Cui: 30:33  

Okay, gotcha. Is there a reason why you don't use Fargate, and instead use Batch? Because of some of the built-in facilities you get, in terms of automatically scaling up and down and using spot instances instead of... but I think Fargate supports spot instances now as well, doesn't it?


Daniele Frasca: 30:52  

Yes, but at the time Step Functions didn't have the integration with Fargate. Now Fargate is supported in AWS Batch, but it doesn't really work yet for our case; there are too many limitations. We couldn't even set up the configuration. We tried, with a support ticket and everything, but it didn't work. The advantage of using AWS Batch with Fargate is probably less configuration in the CloudFormation templates, but in terms of cost they are pretty much similar.


Yan Cui: 31:34  

Okay, gotcha. Thank you for that. Before the show, you also mentioned that you guys use AppConfig. Are you using AppConfig with AWS Batch, or with Lambda?


Daniele Frasca: 31:47  

With both. Originally, we had the configuration file inside an S3 bucket, because of our flow: each Lambda can be customised. Let's say I want to send the images to you, and you want a particular kind of image; we inject the configuration at runtime. So we used to use S3, but we moved to AppConfig for a few reasons. One, we can do JSON schema validation; we have versioning; and we have deployment strategies. At the moment we are using, I think, a 50-50 strategy, and we roll back in case of problems. So we have these built-in features, which are comfortable. I mean, here we are talking about saving cents: with S3, every time you need to do an API call, you're going to spend some tiny fraction of a dollar, while with AppConfig hosted configuration, the access to the file is free, and inside the code it's like pointing to localhost, because it goes through the AppConfig Lambda extension, which ships as a layer. These are little things, but we leverage the versioning and the JSON schema; that's the major reason why we opted for AppConfig.
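The "pointing to localhost" Daniele mentions refers to the AppConfig Lambda extension, which serves cached configuration over a local HTTP endpoint so the function does not call the AppConfig API on every invocation. Port 2772 is the extension's documented default; the application, environment and profile names below are hypothetical.

```python
# Sketch of reading configuration through the AppConfig Lambda extension's
# local endpoint instead of calling the AppConfig API directly.
def appconfig_local_url(application, environment, profile, port=2772):
    """Build the extension's local retrieval URL for a configuration profile."""
    return (f"http://localhost:{port}/applications/{application}"
            f"/environments/{environment}/configurations/{profile}")

url = appconfig_local_url("syndication", "production", "delivery-settings")
print(url)

# Inside a real Lambda (with the extension layer attached) you would then do:
#   import urllib.request, json
#   config = json.load(urllib.request.urlopen(url))
```

Because the extension handles polling and caching, the function sees fresh, already-validated configuration with effectively local-read latency.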


Yan Cui: 33:33  

Okay, I see. Yeah, I think a lot of people use something like SSM Parameter Store instead of AppConfig for these kinds of configurations. But I guess there's no schema validation and no rollout strategy there, which, actually, I think not that many people would need anyway. But schema validation is definitely useful.


Daniele Frasca: 33:53  

Yes, but with Parameter Store you have a limit of four kilobytes, I think, while with AppConfig the hosted configuration is 64KB. I mean, there is always a limitation. Plus, with Parameter Store you have a limitation on the API throughput you can hit, so you need to be aware of the quotas you have if your application is at scale.


Yan Cui: 34:22  

Okay, so if your config is really big, then that probably makes sense. With SSM Parameter Store you could set up advanced parameters, which give you eight kilobytes instead of four, and you can also enable higher throughput limits so that it goes up to 1,000 ops per second as opposed to 40. But it does turn SSM Parameter Store into a paid service, where you pay something like five cents per 10,000 requests; I can't remember exactly, but it's still quite a small amount for something like this. But definitely the schema validation, I think, is useful, especially when you've got a big config file that can't fit into four kilobytes.


Daniele Frasca: 35:06  

Yes. And we are still using Parameter Store for the secrets. Inside our config there is just the key reference; we don't store any secrets inside the config, everything is in Parameter Store. So we use both.


Yan Cui: 35:20  

Okay, gotcha, gotcha, that makes sense. Okay, so I guess we're coming to the end of the conversation, or at least the list of questions that I have. Do you have any AWS wish list items you'd like to share, in case someone from AWS is listening?


Daniele Frasca: 35:37  

I think my wish list is on X-Ray. X-Ray is a fantastic tool, but it doesn't really work for end-to-end tracing, especially in a serverless application. If you want to do real end-to-end tracing today, you need to use Epsagon, Lumigo, Thundra or similar; X-Ray has major limitations with the trace. So that's one. The other one is that every time we talk about serverless, I hope the services actually scale at a serverless level. Sometimes you have a hard limit, and at scale that limit is very low, say 1,000 or 5,000, which is not the best. For example, with AWS Batch, even though it has the job queue built in, you have 20 requests per second on the API. That doesn't make sense: you are forced to build into the Step Function flow the retry and catch for the throttling error on that API, retrying every time. So sometimes the services are there, but they're not serverless-ready, if we understand serverless as the infinite scale they sell it as. So this is my wish list: improving the quality of the services, instead of just releasing new services every day.
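The workaround Daniele is forced into can be sketched as a Retry/Catch block in the state machine definition, again as a Python dict in Amazon States Language shape. The specific error names and the failure-handler state are illustrative; the point is that backing off on throttling lives in the workflow definition, not in Lambda code.

```python
# Sketch of wrapping a Batch submission in a Step Functions Retry so that
# hitting the low API rate limit backs off exponentially instead of
# failing the whole execution.
submit_job_state = {
    "SubmitDelivery": {
        "Type": "Task",
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Retry": [{
            "ErrorEquals": ["Batch.AWSBatchException", "States.Timeout"],
            "IntervalSeconds": 2,
            "BackoffRate": 2.0,      # waits 2s, 4s, 8s, 16s, ...
            "MaxAttempts": 6
        }],
        "Catch": [{
            "ErrorEquals": ["States.ALL"],
            "Next": "NotifyDeliveryFailed"   # illustrative failure-handler state
        }],
        "End": True
    }
}

retry = submit_job_state["SubmitDelivery"]["Retry"][0]
worst_case_wait = sum(retry["IntervalSeconds"] * retry["BackoffRate"] ** i
                      for i in range(retry["MaxAttempts"]))
print(f"worst-case total back-off ~{worst_case_wait:.0f}s")
```

This keeps the 20-requests-per-second ceiling survivable, but, as Daniele says, it is boilerplate the workflow author shouldn't need in a "serverless-scale" service.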


Yan Cui: 37:20  

Yeah, yeah, I hear you. X-Ray sounds great on paper, but then when we actually try to use it, for things like finding specific errors, it is so hard to find that one trace you need. Especially when you're using things like AppSync, where there's no difference in the URL path, you can't really query into the actual query itself. So you just look at lots of traces, but you can't find the one thing you actually need. Plus it's all sampled once you get over a few requests per second. So I find it really hard to actually get useful information out of X-Ray. And like you said, if you're using things like EventBridge, X-Ray doesn't even support it.


Daniele Frasca: 38:07  

Yeah, exactly. There is no point releasing a service if it's not ready. I think AWS is doing a great job, and the teams do their best. But it's better not to release a half-working service, I think, at this point in time.


Yan Cui: 38:28  

I don't know if it is fair to say it's a half-working service; it depends on the use case and which services you use. Some people can still get a lot of use out of it, especially if you're using a lot of containers and you're happy to do that manual instrumentation. But for a lot of serverless applications I've worked on, where EventBridge is often a pretty key component, not being able to trace through EventBridge is a bit of a problem. And it also doesn't support things like DynamoDB and SQS. SNS I think is supported. Is it?


Daniele Frasca: 39:03  

No, no, but for the trace it's the same problem as EventBridge.


Yan Cui: 39:08  

Okay, yeah, there's a whole bunch of things it doesn't support, which is why nowadays I typically just use Lumigo. I guess another thing I find with X-Ray is that it tells you something called something, and how long that call took. That helps you in terms of working out performance issues, but it doesn't show you the request and the response. So for debugging, you still have to have loads of custom logging, just so you can see why one call took longer than another, and figure out if it's something specific about a request for a particular user. So you still end up having to do some custom logging around your code. That's where something like Lumigo and other, more specialised serverless observability platforms do a much better job: they show you not just the fact that there was a call from a Lambda function to DynamoDB, but also what the request and response were. So when you've got problems that are specific to one entity, you can more easily find the differences compared to other successful requests that responded quickly. But, yeah, X-Ray is, I guess, a nice getting-started tracing solution. But I almost never find it to be good enough for a lot of the applications that I've worked on.
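The kind of custom request/response logging described here could be sketched as a small Python decorator that records the arguments, result and duration of each downstream call. The `get_item` function below is a stand-in for a real DynamoDB call, not actual boto3 code; it just shows the wrapping pattern.

```python
import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def log_call(fn):
    """Log request, response and duration of a downstream call as JSON."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        response = fn(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # Structured log line: easy to query later to see why one
        # request was slower than another, or what was actually sent.
        logger.info(json.dumps({
            "call": fn.__name__,
            "request": kwargs,
            "response": response,
            "duration_ms": round(elapsed_ms, 2),
        }, default=str))
        return response
    return wrapper


# Stand-in for a DynamoDB get_item call, purely for illustration.
@log_call
def get_item(table: str, key: dict) -> dict:
    return {"Item": {"id": key["id"], "name": "example"}}


item = get_item(table="users", key={"id": "42"})
```

This is exactly the boilerplate that dedicated serverless observability tools capture automatically by recording requests and responses alongside the trace.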

 

Daniele Frasca: 39:33  

Yeah. Another one is Cognito. Cognito as well has a 1,000 requests per second limit. So in case you want to build a web application or a mobile application, unless your traffic is small, like a very small or medium business, it's fine. But for media, Cognito is far, far away from being ready.


Yan Cui: 40:56  

Is it because of the fact that it is quite spiky, that a lot of people are signing in just before a programme starts?


Daniele Frasca: 41:03  

Yes. 


Yan Cui: 41:03  

Right, right, gotcha.


Daniele Frasca: 41:04  

Because you can go from 200 active users to 50,000, 100,000. So you never know.


Yan Cui: 41:13  

Yeah, that's very similar to the kind of patterns that we saw when I was at DAZN. You've got almost no users, or maybe a couple of hundred, before a football match starts, and then five seconds before the match starts it goes from 200 people to 30,000, 50,000 or whatever. You just get massive spikes in traffic. But I think for a lot of other applications, most e-commerce or just normal websites, your traffic is much more of a bell pattern, so that's probably enough; they don't have those massive spikes in traffic. But yeah, for some of these use cases, like TV and media, you've got to think about those edge cases when everyone comes in at the same time.


Daniele Frasca: 41:58 

Yeah, true. Well, I think AWS will fix it at some point, hopefully.


Yan Cui: 42:04 

Yeah, hopefully, hopefully. I mean, even though there are hard limits, they're often quite open to having a per-customer discussion: okay, if the hard limit doesn't work for you, what would work for you? Those hard limits a lot of the time are just a configuration on their side, so they can still relax hard limits for certain customers, especially around throughput. So if that's you, then have a chat with your account manager to see what's possible, what AWS can customise specifically for your use case.


Daniele Frasca: 42:43  

Yeah, sure, I will.


Yan Cui: 42:45  

And yes, thank you so much, Daniele, for taking the time to talk to me today. There's some really good stuff in this discussion. Is there anything else that you'd like to share with the audience before we go? Maybe if ProSieben is hiring, anything like that?


Daniele Frasca: 43:00  

Yes, ProSieben is hiring across the whole stack, so we have like 30, 40 positions, from serverless to cluster, everything is there. We are happy to pay for relocation to Germany, and from Germany everybody can work remotely. So if anyone wants to apply, you can contact me on Twitter, you can find me @dfrasca80, or find me on LinkedIn; I'm the only Daniele Frasca in Germany, so very easy to find. Or go straight to the ProSieben website, many opportunities available.


Yan Cui: 43:14  

Okay, I'll make sure those links are included in the show notes, so anyone who wants to check out open positions at ProSieben can quickly find them from the show notes. And once again, thank you so much for taking the time to talk to us today. Stay safe, and hopefully see you in person soon.


Daniele Frasca: 43:59  

Thank you so much. Have a nice day. 


Yan Cui: 44:01

Ciao.


Daniele Frasca: 44:02

Ciao, ciao.


Yan Cui: 44:15  

So that's it for another episode of Real World Serverless. To access the show notes, please go to realworldserverless.com. If you want to learn how to build production-ready serverless applications, please check out my upcoming courses at productionreadyserverless.com. And I'll see you guys next time.