Real World Serverless with theburningmonk

A podcast where we talk about real-world use of Serverless technologies from engineers who work with them day-to-day. We will discuss use cases, why they chose serverless and the pain points and challenges they face. If you want to know what it's REALLY like to work with serverless, this is the show for you.

#61: Serverless at Slidebean

April 06, 2022 • Yan Cui • Season 1 • Episode 61

Links from the episode:

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

To learn how to build production-ready serverless applications, check out my upcoming workshops.

Opening theme song:
Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday
License: http://creativecommons.org/licenses/by/4.0

Yan Cui: 00:13

Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real world practitioners and get their stories from the trenches. Today, I'm joined by Jose Enrique and Jorge Bastias from Slidebean. Hey guys, welcome to the show.

Jose Enrique Bolanos and Jorge Bastias: 00:29

Hi Yan, thank you for having us.

Yan Cui: 00:31

So I have to say I've been watching your videos on YouTube for quite some time now, and I especially loved the Startup Forensics series that your CEO has been pushing out. So that's what got me interested when I noticed that Slidebean is using Lumigo and doing stuff with serverless. So I want to get you guys to maybe tell the audience who Slidebean is and what your roles are over there?

Jose Enrique Bolanos: 00:51

Sure. Yeah. My name is Jose Enrique Bolanos. I'm one of the co-founders and the current CTO of Slidebean. Jorge, you want to introduce yourself?

Jorge Bastias: 01:03

Yeah, I'm a developer at Slidebean. We started using serverless maybe five years ago. We started with Elastic Beanstalk, and eventually we moved to serverless.

Jose Enrique Bolanos: 01:17

Yeah, just to give you a bit more background about Slidebean, we started about eight or so years ago, around 2013 or 2014. What we set out to do was to solve the problem of people who don't have design skills when they create presentations. And so we wanted to separate the part of creating a presentation on the content side from the design part of it. So the idea, or what we set out to solve, was to allow people to enter what they wanted to say on their slides, and our software would automatically create the slides for them and arrange and design everything for them. Since then, we've pivoted. We're now more of a startup that helps other startups, pretty much. We help startups get started. A lot of people don't know how to pitch to investors, how to get investment, how to get up and running. When you're starting out, it's really complicated, right? There are a lot of different things that you need to juggle. And so we are now a startup that helps other startups. And just to give you a little bit more background about how we're using serverless: we've always been using AWS for our hosting needs, but our application initially was very simple. It was just a front end application and a simple back end application with a Mongo database. Initially, we were using a simple EC2 machine on AWS, and pretty much we configured everything ourselves. That got us up and running for a little while before we hit issues with scalability and server maintenance, which is a very big hassle. Then we moved to Elastic Beanstalk to solve some of the issues we were having from configuring EC2 machines ourselves. And Elastic Beanstalk was also helpful for a while before we hit another brick wall.
And that was around the time when serverless solutions were starting to pick up and became easier to use and more available to developers. Our background is more in software development, in terms of front end and back end applications. At that time, we were not that savvy on DevOps topics and maintenance. So serverless architectures and AWS Lambda were pretty much a great solution for us because of the problem we were trying to solve. So yeah, it's been around eight years that we've been working at Slidebean, and around five or so that we've been using serverless architectures.

Yan Cui: 03:53

Okay, so I guess nowadays, what does your architecture look like? I guess you still have that front end. Is it hosted statically, maybe out of S3 and CloudFront? And what about the backend? Sounds like you still have some APIs. Are you using API Gateway and Lambda? Anything else?

Jose Enrique Bolanos: 04:11

Yeah, so our application is distributed as a front end application, which we host on Cloudflare. And our back end application has gone through several transformations. But recently, we've been leveraging GraphQL and GraphQL Federation to distribute the backend logic between different microservices. All of those microservices for the backend are now hosted using AWS Lambda, and for the deployment we use something called Up, which is Apex Up. It’s a service that we leverage for that. And yeah, everything is hosted in AWS through Lambda and serverless. We also have some other logic that was very useful to solve with Step Functions. Sometimes people need to export their presentations to other formats, like PowerPoint or PDF. That process happens outside of the front end application, as a background process. So when somebody requests an export, the solution that we have now is based on Step Functions. Pretty much, when somebody wants to download a presentation, a Step Function is triggered, and then a whole background process runs where, for example, we take screenshots of the presentation, then we download and process those images, and we finally build the final presentation file that we send out to the user. Everything happens using Step Functions. So this is the current solution that we have for that, and it's working great so far.
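
The export workflow described above (take screenshots, process the images, build the final file) could be expressed as a state machine roughly like this. This is only a sketch under stated assumptions: the state names, the account ID and the Lambda ARNs are hypothetical, and the episode does not describe Slidebean's actual definition.

```python
import json

# Hypothetical Lambda ARN prefix; the episode does not name the real functions.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"


def build_export_definition() -> dict:
    """Return an Amazon States Language definition for a three-step export."""
    return {
        "Comment": "Sketch of a presentation export workflow",
        "StartAt": "TakeScreenshots",
        "States": {
            "TakeScreenshots": {
                "Type": "Task",
                "Resource": ARN_PREFIX + "take-screenshots",
                "Next": "ProcessImages",
            },
            "ProcessImages": {
                "Type": "Task",
                "Resource": ARN_PREFIX + "process-images",
                "Next": "BuildExportFile",
            },
            "BuildExportFile": {
                "Type": "Task",
                "Resource": ARN_PREFIX + "build-export-file",
                "End": True,
            },
        },
    }


def execution_order(definition: dict) -> list:
    """Follow the Next pointers from StartAt until a state has no Next."""
    order = []
    name = definition["StartAt"]
    while name is not None:
        order.append(name)
        name = definition["States"][name].get("Next")
    return order


if __name__ == "__main__":
    # The JSON printed here is what would be supplied as the state machine definition.
    print(json.dumps(build_export_definition(), indent=2))
```

Keeping the definition as plain data makes it easy to check the shape of the workflow in a unit test before anything is deployed.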

Yan Cui: 05:48

Okay, so that's quite interesting. A couple of things piqued my interest there. For one, you mentioned that you're doing GraphQL Federation, and you're connecting your GraphQL API to several microservices. I imagine those are REST APIs sitting behind API Gateway and Lambda. Have you guys looked into using something like AppSync instead? Because I guess when you first started, maybe five years ago, AppSync wasn't really a thing. But nowadays, you do have the managed GraphQL capabilities you get out of AppSync. Is that something that you guys would ever consider? Or, because there's no Federation support there, do you see yourselves sticking with your current approach?

Jose Enrique Bolanos: 06:30

Yeah, so we've been leveraging GraphQL Federation only recently. Maybe three months back, we started using it and we split our main GraphQL application into several different graphs, leveraging Federation. So we're kind of new to Federation, and we haven't considered any more changes than that. But definitely, we could look into using that in the future for sure.

Yan Cui: 06:55

Okay, so what about in terms of AppSync? Would you guys consider using AppSync instead of running GraphQL in your own code? Because I imagine you're running maybe something like Apollo Server inside a Lambda function behind API Gateway. Is that correct?

Jose Enrique Bolanos: 07:11

Yeah, yeah, exactly. That's what we're using right now. Honestly, we're not… I haven't dived too much into AppSync, or I don't know if you are aware of it. But yeah, we…

Jorge Bastias: 07:23

A little bit. I actually saw a thing yesterday, AppSync studio; there was a webinar about that which I need to look at. But yeah, we're not using it yet. We have looked at it before.

Jose Enrique Bolanos: 07:39

So yeah, pretty much we used to have a single endpoint for our GraphQL. And now we have a gateway and some servers that are connected to it underneath. But we've orchestrated everything ourselves, so to speak, leveraging API Gateway.

Yan Cui: 07:56

Okay, so maybe going back to when you guys made the decision to move from using Elastic Beanstalk to using Lambda. After you made that decision, have you noticed any changes in terms of how fast your team is able to deliver new features, or maybe your cost for running your architecture in AWS? Any sort of, I guess, business value that came from that decision to migrate to serverless?

Jose Enrique Bolanos: 08:24

Yeah, I can speak to maybe the decision to go from, for example, Elastic Beanstalk, where we had to manage scaling and those kinds of topics ourselves, to just moving to Lambdas. The main business advantage for us is that we pretty much don't have to worry about scaling, which was something we don't have that much expertise in. Because those are serverless functions, we can pretty much let AWS handle it. I remember we used to have issues where we were not sure exactly when to scale up or scale down, because the nature of our application doesn't have any moments where we know we'll need it. For example, if it was a time-sensitive application, let's say an application where something happens after a soccer game ends, then we'd know that after that game ends, we need to scale up because there's going to be a traffic spike. That's not the case with our application, so we're not really sure how to set those rules to scale up or down. And with serverless functions, that problem pretty much goes away because the architecture handles it by itself. So I think that's the main advantage: not having to handle servers, since we're a very small team and we didn't have that much expertise on DevOps and things like that. And then also, for the problem that we were trying to solve with exporting the presentations, Step Functions was really useful for being able to trace what's going on with requests. Before, we pretty much had endpoints where one would call the next and things like that, but it was a very manual process that our code did.
And now with Step Functions, it's much easier to handle a very complicated workflow and orchestrate everything. Yeah. So Jorge, maybe you can talk about that.

Jorge Bastias: 10:20

Yeah, well, the thing is that we started out using an EC2 server, actually using PHP, to do the exports. And then we started looking into Lambda functions. And it was really difficult for us at first, because we started using the actual AWS API for Lambda. It was really difficult until we ran into the Serverless Framework. So that's when we started using Lambda functions for that. And then we had to use Step Functions to make them communicate a bit better. And again, that was kind of difficult for us at the beginning. Now they have a new version of Step Functions, the Studio, which makes it a lot easier. We are still using the Serverless Framework for that. And it's a great solution for that particular use case. We tried to use Step Functions for another solution of ours, called monthly services, but Step Functions didn't fit that use case, because we have a lot of ways to enter the system and to exit the system, and Step Functions only has one starting point and one ending point. So those are the kinds of problems that we had, just in general. It’s a complicated environment for us, right? So that's why we're using Step Functions only for that particular use case. In that other solution, monthly services, we're using SQS between Lambdas. And we are still having some issues with that. But we're using Lumigo to find those issues and fix them.

Yan Cui: 12:08

Yeah, I think that's interesting. Because like you said, having multiple entry points to the same workflow can be a bit tricky. With Step Functions, you tend to have that one entry point. But I think with exits you could have different terminal states, so the states where “End” is true. So I would have thought that's something you can do. But obviously, I'm not familiar with your particular use case, so I'm sure you guys have considered that already. But what I do want to ask about your Step Functions is this: one of the things that people have often talked about with Step Functions is just how difficult it is to test your state machine, especially when you've got a really complicated graph. They recently announced support for doing some local mocking with Step Functions Local. Is that something you guys have considered or looked into? And is that a problem you guys have run into, in terms of how do I test my Step Functions state machine?
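
To illustrate the point about one entry point but several terminal states, here is a small sketch of a state machine definition with a single StartAt and three ways to finish. The workflow, state names and ARNs are hypothetical and are not taken from Slidebean's system.

```python
def terminal_states(definition: dict) -> list:
    """Names of the states where an execution can finish."""
    names = []
    for name, state in definition["States"].items():
        # A state is terminal if it sets End: true or is a Succeed/Fail state.
        if state.get("End") is True or state.get("Type") in ("Succeed", "Fail"):
            names.append(name)
    return sorted(names)


# Hypothetical workflow: one entry point, a Choice state, three terminal states.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"
DEFINITION = {
    "StartAt": "ClassifyDocument",
    "States": {
        "ClassifyDocument": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.kind", "StringEquals": "invoice", "Next": "StoreInvoice"},
                {"Variable": "$.kind", "StringEquals": "receipt", "Next": "StoreReceipt"},
            ],
            "Default": "RejectDocument",
        },
        "StoreInvoice": {"Type": "Task", "Resource": ARN_PREFIX + "store-invoice", "End": True},
        "StoreReceipt": {"Type": "Task", "Resource": ARN_PREFIX + "store-receipt", "End": True},
        "RejectDocument": {"Type": "Fail", "Error": "UnsupportedDocument"},
    },
}
```

Multiple entry points are a different matter: an execution always begins at StartAt, so the usual workaround is to branch on the input in a first Choice state, as this sketch does.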

Jorge Bastias: 13:04

Yeah, actually, we've tried a bunch of things. We tried LocalStack, I think that's the solution. And then we decided to use a bunch of Serverless plugins in the Serverless Framework. Some of them kind of work, and some of them don't. For instance, we use Kinesis in one of our solutions, and one of those plugins just stopped working. We had to drop that solution, because we had to change the code just to actually run it locally. In the end, we just stopped using it. And I saw a recent webinar about that in Lumigo. It was a Serverless Framework webinar, and he said the best way to do that is having a test account or a development account where you can make your changes, deploy the solution up to AWS and test it there. Because there's so many, it's not a good solution for us right now. I also looked into SAM, but we haven't had a chance to look into that very much yet. But yeah, it's difficult for us to do local testing currently.

Yan Cui: 14:15

Yeah, I think local simulation is always difficult. And that's why most people will tell you don't bother with it. Like LocalStack: I've had so many problems with LocalStack myself. And I've tried DynamoDB Local and Step Functions Local. They all kind of work to some degree, but it's never quite 100%. And I've seen a lot of teams spend like a week setting things up with LocalStack, and then one day it just breaks and you spend like another week trying to fix it. Whereas it's just much easier to, like you said, create a temporary account, or use the same account but create temporary environments. Because with the Serverless Framework, you can deploy to a new stage called, I don’t know, local test, or something like that. And then you can just run your function locally, but against the real AWS services, things like DynamoDB. But for things like Step Functions it's a bit more tricky, just because the whole orchestration, the whole engine, is running in AWS. So for things like that, I think it's a little bit harder to move away from trying to simulate something locally. What they're doing now with Step Functions Local, I think, is a step forward, but it’s still not as smooth as you would like. So, you talked about how, when you made the transition from Elastic Beanstalk to Lambda, it was quite difficult. I imagine testing was one of the big issues, and it sounds like you're still looking for the right solution for your team. Were there any other challenges that you had to find your way around when you made that transition from running machines to running Lambda functions?

Jorge Bastias: 15:49

There are limits that we ran into that we were not aware of. We didn't know those limits. For instance, I think it's 75 gigabytes of data across all of the environments of Lambda. We're using the Serverless Framework, so, you know, everything gets uploaded on every update. And our payload is very big, or was very big. So we were running out of space in AWS. As a matter of fact, we solved that using one of your utilities to remove unused versions of the Lambdas. So that's one thing. The other thing is, and we're not using it anymore, but we had another solution to import presentations into the system. For that we were using, I think it was OpenOffice, and we had to have that in the environment temporarily. Eventually we solved it using layers. But at first we were just uploading the binaries into the system, and so we were running into problems with the limit of the actual Lambda itself. I believe it's 512 megabytes that you can use. And so we were running out of space. It's difficult to get into those issues. At least we solved it for now. We have other issues right now.

Yan Cui: 17:14

Yeah. So I guess, when you come to Lambda from more traditional development paradigms, one of the first things you run into are all these different limits. You never think to look for them until you run into them. But I think the good thing is that AWS does publish the limits on the documentation pages. So once you hit these limits a few times, you do learn to look at the service you're using, check the limits, and see whether what you're doing fits within them. That's second nature to me nowadays. Whenever I'm looking into doing something, I'm always checking whether what I'm trying to do fits within the limits of that service. And most things have a soft limit for the things that you commonly want to change, but they've still got some hard limits. And the 75 gig limit that you mentioned, that's a soft limit now, so you can raise that. And you can also use the open source tools that I pushed out so that you can delete any old versions of your functions that you no longer use. That's going to help keep you within that 75 gig code storage limit as well. What about the operational side of things? Like you said, Jose, you've got a small team, and you're coming from more traditional front end and back end development backgrounds. I guess learning must be a big challenge for you guys. What do you guys do to help everyone learn the latest and greatest practices when it comes to Lambda? And how do you keep up to date?
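
The idea of deleting old function versions to stay under the code storage limit can be sketched as follows. This is not the implementation of the open source tool mentioned above; it is a minimal illustration that assumes you want to keep the most recent few versions plus anything an alias still points to. The function names and the `keep` parameter are hypothetical.

```python
def versions_to_delete(versions, aliased, keep=3):
    """Pick published versions that look safe to delete.

    versions: version strings as returned by Lambda, e.g. ["$LATEST", "1", "2"].
    aliased:  set of versions that an alias still points to.
    keep:     number of most recent numbered versions to retain.
    """
    numbered = sorted((v for v in versions if v != "$LATEST"), key=int)
    recent = set(numbered[-keep:]) if keep > 0 else set()
    return [v for v in numbered if v not in recent and v not in aliased]


def clean_function(function_name, keep=3):
    """Delete old versions of one function. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the selection logic works without it

    client = boto3.client("lambda")
    # Pagination is omitted for brevity; only the first page of each call is read.
    versions = [
        v["Version"]
        for v in client.list_versions_by_function(FunctionName=function_name)["Versions"]
    ]
    aliased = {
        a["FunctionVersion"]
        for a in client.list_aliases(FunctionName=function_name)["Aliases"]
    }
    for version in versions_to_delete(versions, aliased, keep):
        client.delete_function(FunctionName=function_name, Qualifier=version)
```

Separating the selection logic from the AWS calls means the risky part, deciding what to delete, can be tested without touching a real account.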

Jose Enrique Bolanos: 18:39

Well, I just wanted to quickly mention that the AWS documentation can sometimes be very obtuse, in my opinion. Sometimes we want to learn how to use certain technologies, and just following the official AWS documentation is not enough, I think. It's really tricky. So Jorge has been leveraging all the resources for sure. And sometimes we've had issues, sometimes even in production, where we run into these limits, and then we have to go and understand what's going on, because the documentation is hard to navigate, I would say. But yeah, Jorge, I don't know if you have more about how you went about learning all these technologies.

Jorge Bastias: 19:26

Yeah, one of the things is, I actually come from Windows development, way back when. I didn't even have an AWS account before I joined Slidebean, so I didn't know anything about AWS. And one of the things that we've been talking about over the last few years is that we need to learn the AWS platform better. So I actually did my first AWS certification, the Certified Cloud Practitioner, and we're probably going to be moving on to other certifications. I mean, certifications are kind of useful. AWS has so many services, it's very, very difficult for us. Like I said, we're a small team. The Serverless Framework is a great resource for us, and Apex Up, because we actually run an Express server that hosts our GraphQL services up there. You know, you just need to continue learning. Yeah, it's a continuous process of learning and learning more.

Jose Enrique Bolanos: 20:26

Yeah. And I was just going to quickly talk about maybe our challenges with visibility. Just using AWS and going through those logs with CloudWatch and whatnot, I think it's not the best experience. So that's why we've been looking for other solutions like Lumigo. We've had to learn how to prenow and kind of like publish those logs out to Lumigo so that we can make them useful, because sometimes when we want to debug something in production, if there's a customer that's saying that something is going on and we want to look at those logs, that's something that we had to learn the hard way. And it's a bit difficult to navigate just by using AWS, I would say. So

Jorge Bastias: 21:09

One of the things is that we started a project last year because we needed better observability for our current solution, which is monthly services. And like I said, we ran into Lumigo by mistake. As a matter of fact, I think it was a newsletter that you published, and Jose saw it. And he said, you know, maybe you should look into that. And that's why we're using Lumigo. We're only using Lumigo for that current solution. Last year, like I said, we were looking for better observability, and we started using Datadog for that. And the first thing that we ran into when we started shipping our logs to Datadog is that our first bills were more than our AWS costs, and we had to run and see how to reduce our logs. Now we're only sending our error logs. Right now we're probably going to be cutting Datadog off and moving 100% to Lumigo, because it's a better tool for us. There's an Issues page, and it tells you exactly what's happening. It's a great summary page for us. And that's what we're planning to do this year.

Yan Cui: 22:21

Yeah, you know what, that's a very common story I've heard so many times. There is a cost for using things like Datadog. But then there are also additional costs for the API calls that Datadog has to make to forward all of your logs and all of your metrics. And I still find debugging problems using metrics and logs just very labor intensive. And like you said, the Issues page of Lumigo is just so useful. I just go there, and everything's already been categorised by the error type and function. I just have to go in and have a look. I don't have to make any decisions. I don't have to look for anything. I mean, for me, at least, it's just been really, really helpful. Especially, you know, I'm the one-man back end team for a few of my clients, so I need to be very efficient at finding problems quickly. And I'm glad to hear that you guys are finding something similar as well. And one thing I also find myself doing more and more with Lumigo is that I don't have to write many logs anymore, because Lumigo just captures most things that I need out of the box. Before, like you, I was writing a lot of trace logs. I had to think about sampling my logs so that I don't write all my debug logs in production. Nowadays, I just don't have to worry about that anymore, because I just don't have to write that many logs. I only write logs when there's something that I don't see in Lumigo, like when I'm doing loads of computation in the code or there's some business logic that's complicated. So I write some trace logs there. But other than that, I don't have to do a lot of the common logs, like console.log(JSON.stringify(event)). I see so many people do that, just so that they can capture their invocation events and go back to them later when they need to debug something. Yeah, I'm really happy that you guys are enjoying Lumigo.
I do think it's by far the best tool for applications where you're primarily working with Lambda and serverless components.
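
The two logging habits mentioned above, dumping the invocation event and sampling debug logs in production, might look like this in a Python handler. It is a rough sketch, not a recommended library: the `sample_rate`, `rng` and `sink` parameters and the summing "business logic" are assumptions made so the example is self-contained.

```python
import json
import random


def make_logger(sample_rate, rng=random.random, sink=print):
    """Return (debug, error); debug lines are only emitted for sampled invocations."""
    enabled = rng() < sample_rate  # decided once per invocation

    def debug(message, **fields):
        if enabled:
            sink(json.dumps({"level": "DEBUG", "message": message, **fields}))

    def error(message, **fields):
        # Errors are always logged, regardless of sampling.
        sink(json.dumps({"level": "ERROR", "message": message, **fields}))

    return debug, error


def handler(event, context, sample_rate=0.01, rng=random.random, sink=print):
    debug, error = make_logger(sample_rate, rng, sink)
    # Python analogue of console.log(JSON.stringify(event)), but sampled.
    debug("invocation event", event=event)
    try:
        total = sum(item["amount"] for item in event["items"])  # stand-in logic
    except (KeyError, TypeError) as exc:
        error("bad input", reason=repr(exc))
        raise
    return {"total": total}
```

With a sample rate of 0.01, roughly one invocation in a hundred writes its debug lines, which keeps log volume (and the cost of shipping logs elsewhere) down while still leaving some traces to inspect.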

Jorge Bastias: 24:16

Just one thing: I have to say Datadog’s support is horrible. And Lumigo, their support people are so much better. It's night and day. You know, sometimes with Datadog it took days and days just to get a response. I mean, we had a really long issue with them. And really, it was just so much better for support.

Yan Cui: 24:42

That’s really good to hear. I'll tell the guys. I'm sure they will appreciate that. I guess, what's next for Slidebean? What's the next evolution of your architecture that you foresee?

Jose Enrique Bolanos: 24:52

Yeah, so right now, like I said, we're in that transition. Before, we were just a presentation application for people to create presentations online. And now we are trying to cater to other startups and help them out in any way we can to get up and running. We used to have a bunch of services that were not integrated into our application, that we were handling pretty much manually, with other teams in our company handling them. For example, we would try to connect startups with investors, or we would allow them to download document templates, like, say, a financial model to get them up and running. And we're now in the process of consolidating all of those services into a single platform, so that our main application will no longer be just a presentation tool. It will encompass everything that we offer in a single dashboard for all of our users. One of the first steps in that change was, of course, that the front end will change. But we also wanted to split, like I said, our GraphQL backend into smaller chunks, into subgraphs, so that our teams can work independently. For example, we have startup lessons, where people can see videos that we create about startup topics. We publish those on YouTube, but we want to be able to host them on our main dashboard. And so we want, for example, to have a subgraph just for those startup lessons. Merging all the things that we offer under a single hood comes with challenges, like splitting our back end and also joining everything on the front end. And yeah, I guess that's what we're working on the most right now. So architecturally, it's not going to change that much other than with the backend split.
But we also have this other application called recurring, which Jorge was talking about. Recurring is something like monthly services; that's the codename for the project. We essentially allow startups to track their expenses. We give them a unique address, and people can just forward all of their invoices to that address, and we compile them under a dashboard. Our current solution uses SQS under the hood to process all those files: pretty much, it receives a file, and the output is to have that logically added to our database. So yeah, just improvements in technology. We're leveraging SQS because we can do batches, we can do different things that we can also do with Step Functions. And also, we have multiple points of entry for the workflow. We're probably going to be improving that solution over this year as well. SQS has worked well for us at the moment, but sometimes we have issues. Lumigo is good for spotting the issues that happen in those Lambda functions. But we probably want to evolve that architecture into the next step, and it's not clear right now what that is. For now, SQS is working well, but we might examine a different solution in the future.
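
For readers unfamiliar with the SQS-to-Lambda batching mentioned above, here is a minimal sketch of a handler that processes a batch and reports only the failed messages back to SQS. It assumes the event source mapping has ReportBatchItemFailures enabled; `process_invoice` and the message shape are hypothetical stand-ins, not Slidebean's code.

```python
import json


def process_invoice(body: dict) -> None:
    """Hypothetical stand-in for the real work: validate one forwarded invoice."""
    if "attachment" not in body:
        raise ValueError("message has no attachment")


def handler(event, context):
    """Process an SQS batch; only the failed messages are returned for retry."""
    failures = []
    for record in event["Records"]:
        try:
            process_invoice(json.loads(record["body"]))
        except Exception:
            # Reported messages stay on the queue and are retried later;
            # the rest of the batch is treated as successfully processed.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without the partial batch response, one bad message would cause the whole batch to be retried, which is a common source of the kind of intermittent issues that are hard to spot in logs.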

Yan Cui: 28:17

Okay, well, best of luck. And it sounds like you've got quite a few interesting challenges ahead of you. So I'm going to put the links to recurring as well as Slidebean in the show notes for anyone who wants to check them out. I guess recurring kind of sounds interesting to me, because I run a small, one-man consulting company. I've got lots of expenses that I need to keep track of as well. So I'll definitely check it out myself. I want to thank you guys, again, for taking the time to come on this podcast and share your experience. It's always good to hear how people's journeys started and how they're going when it comes to serverless.

Jose Enrique Bolanos: 28:51

Yeah, thank you so much for having us. And yeah, it's been a great ride and journey. I would just say that if anybody's having the problems that we were having, serverless is definitely a great solution to all those problems. So thank you for having us. And please try out Slidebean and recurring.

Yan Cui: 29:09

Take it easy guys. Ok. Bye bye.

Jose Enrique Bolanos and Jorge Bastias: 29:10

Bye bye. Thank you. Thank you very much.

Yan Cui: 29:26

So that's it for another episode of Real World Serverless. To access the show notes, please go to realworldserverless.com. If you want to learn how to build production ready Serverless Applications, please check out my upcoming courses at productionreadyserverless.com. And I'll see you guys next time.