Real World Serverless with theburningmonk

#40: Serverless at Space Ape Games with Louis McCormack

December 09, 2020 Yan Cui Season 1 Episode 40

You can find Louis on Twitter as @louism517 and on LinkedIn here.

See open positions at Space Ape Games here and

To learn how to build production-ready Serverless applications, go to

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

Opening theme song:
Cheery Monday by Kevin MacLeod

Yan Cui: 00:12  

Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real world practitioners and get their stories from the trenches. Today, I'm joined by a good friend of mine, Louis McCormack. Hey, man.

Louis McCormack: 00:25  

Yeah. Hi Yan, how are you doing?

Yan Cui: 00:27  

So Louis, we've known each other for a while, and I really enjoyed our time working together at Space Ape. For the audience who haven't heard of Space Ape, can you tell us a bit about the company and what you do there?

Louis McCormack: 00:40  

Yeah, sure. So, my name is Louis McCormack, and I am a DevOps engineer at a company called Space Ape Games. We are a midsize mobile games company based in central London. We are a little over eight years old, and I've been there for coming up to seven of those eight years. In that time, we've put four titles live, and we've got another handful of games that we're very excited about that are getting increasingly close to being put live. As for our tech stack, we're a mobile games company, so we have game clients that run on mobile devices, typically people's mobile phones, and they're written in Unity. They talk to some back-end infrastructure, which takes the form of game servers written almost exclusively in Scala, and those run on Fargate on AWS. So, notably for this podcast, we don't use Lambdas to serve game traffic. That is mainly because, as mentioned, our back-end services are written in Scala. Scala, for those that don't know, is like a functional skin on top of Java, and there are some well-known issues with running JVM-based applications in Lambda. It's difficult to get away from that cold start time when a process has to start the JVM and load lots of stuff into memory. That's not to say we don't run any Scala Lambdas. We absolutely do. They just tend not to be user-facing: back-end management-type stuff and data processing jobs. But despite us not running our game services on Lambda, we are still big fans of serverless, particularly from a DevOps standpoint. And that's, I suppose, in large part what I'm here to talk about today.

Yan Cui: 02:32  

Yeah, I think the whole serverless adoption at Space Ape is quite interesting and quite different from a lot of other companies. At a lot of the companies I've spoken to on this podcast, the push for serverless comes from the development team, who want to get away from having to manage servers and patch the OS and all of that, and just write business logic. But like you said, at Space Ape everything is written in Scala, and there's a very rich toolset there; the guys have done some really clever things, which hopefully we can talk about at some point. But it also means that we can't feasibly run Scala for the user-facing stuff in Lambda, because of the cold starts, and also because of the traffic: there's fairly high usage on those APIs. The game that Louis and I worked on has been cancelled, so we don't do the socket-based real-time multiplayer anymore. But still, there's a lot of work that Louis and his team at Space Ape were doing using Lambda, and also some use cases around machine learning, if I remember correctly. From that perspective, Louis, can you talk about some of the things you guys are doing from the DevOps angle, and how serverless and Lambda have helped you there?

Louis McCormack: 03:54  

Yeah, sure. So we use serverless for many, many use cases. And I think, for me, the real enabler in AWS, the real clincher, is something that doesn't get much praise, and that is CloudWatch Events. You can configure a CloudWatch event to be triggered by any single action that occurs throughout your entire AWS estate, which means you can respond to anything that happens in any of your AWS accounts. And the response might take different forms. If the thing that triggered the event was a security concern, then it might send an alert, or it might take some remedial action to prevent an engineer having to get out of bed at three o'clock in the morning, for instance. Or it might be a functional response. For instance, we have a Lambda that gets triggered whenever a CloudWatch Logs log group is created. It comes to life and subscribes that log group to a set of predefined subscriptions. So being able to respond to events is a very powerful thing. However, I think, certainly for us at Space Ape, and certainly for some other operations teams that I've spoken to in the past, there's one particular use case that acts as a springboard, or a gateway drug, into the world of serverless. And that, for us at least, was to use it as a cron replacement. A bit of context about that: back in the pre-cloud days, it was quite common for ops teams to have this kind of mysterious, opaque server. If you're lucky, it's in the corner of the server room, and if you're unlucky, it's underneath somebody's desk, and legend has it that the cleaner came along and accidentally turned it off one day and all hell broke loose. Over time, it became a dumping ground for random background tasks and cron jobs and mysterious daemons. And, you know, the people who knew what those things are have left the company.
And it's at a point where no one really knows what this box does, only that you can never turn it off or your whole business will come crumbling down around you. And then cloud came along, and maybe we moved this box from underneath somebody's desk to an EC2 instance, and probably the same sort of pattern emerged. Now, I guess, you're in no danger, or probably no danger, of the cleaner accidentally turning it off. But the thing about EC2 instances, like death and taxes, is that at some point they will get terminated. So arguably, you're in a worse position. Of course, good ops teams probably didn't do this, and they found ways to manage this. I'm not pretending this is a pathology of ops engineers or anything. But I do reckon that if you talk to any ops engineer with a long enough career, they'll sort of smile funnily, and they'll know what I'm talking about. You might wonder where I'm going with this, but serverless does help us out here, because it came along and it dawned on us that, hang on a minute, you don't need a mysterious box running cron jobs like a glorified scheduler. You can use Lambdas as a glorified scheduler, with scheduled events, to do the same thing. And obviously, you no longer run it underneath somebody's desk, or on an EC2 instance that will go away; you run it on an infrastructure that will just continue to work day after day after day. Of course, you could just do the same thing. Instead of cobbling together a random cron job at three o'clock in the morning, you could cobble together a random Lambda and never tell anyone about it, and then you're probably in a worse position. But I think the thing with Lambdas is that they force you to have a bit more process around things. The old 3am cron job scripts would generally be bash scripts.
Lambdas sort of force you to use a proper programming language, sorry to any bash fans out there. I'm sure you can run bash on Lambda; I've never tried that. But generally, you're forced to use a proper programming language. And this means you'll probably use proper development constructs like a separate code repo, tests and documentation, and the whole thing becomes a little bit more visible and known. The point I'm trying to make is that I think this glorified scheduler use case was a stepping stone, and it opened our eyes to the other possibilities that serverless brings.
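The log-group subscription Lambda Louis mentions might look roughly like the sketch below. It is an illustration only, not Space Ape's code: it assumes the function is triggered by a CloudWatch Events rule matching the CloudTrail `CreateLogGroup` call, and the filter name and destination ARN are hypothetical placeholders. The AWS client is passed in so the logic can be exercised without an AWS account.

```python
"""Sketch: subscribe newly created CloudWatch log groups to a fixed set of destinations."""

# Hypothetical subscription list; the real one is not described in the episode.
SUBSCRIPTIONS = [
    {
        "filterName": "ship-to-central-logging",
        "filterPattern": "",  # an empty pattern forwards every log event
        "destinationArn": "arn:aws:lambda:eu-west-1:123456789012:function:log-shipper",
    },
]


def log_group_from_event(event):
    """Return the new log group's name, or None if this is not a CreateLogGroup event."""
    detail = event.get("detail", {})
    if detail.get("eventName") != "CreateLogGroup":
        return None
    return (detail.get("requestParameters") or {}).get("logGroupName")


def subscribe(logs_client, log_group, subscriptions=SUBSCRIPTIONS):
    """Attach every predefined subscription filter and return the filter names used."""
    for sub in subscriptions:
        logs_client.put_subscription_filter(logGroupName=log_group, **sub)
    return [sub["filterName"] for sub in subscriptions]


def handler(event, context, logs_client=None):
    """Lambda entry point; logs_client is injectable so the sketch can run offline."""
    log_group = log_group_from_event(event)
    if log_group is None:
        return []
    if logs_client is None:
        import boto3  # imported lazily; only needed when running inside AWS

        logs_client = boto3.client("logs")
    return subscribe(logs_client, log_group)
```

The same handler shape would serve the cron-replacement case: a scheduled rule invokes the function instead of an API-call event.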

Yan Cui: 08:27  

Yeah, I remember you guys went from using bash for a lot of things to writing a lot of stuff in Go around that time, when you were moving a lot of the cron jobs from EC2 instances to Lambda. And I guess you guys are also doing some really interesting stuff around security as well. Because, as anyone who's made a popular mobile game, or any kind of game, would tell you, someone's going to try to hack it. Someone is going to try to do something, and being able to react to various events happening in your AWS ecosystem using CloudWatch Events, or I guess nowadays you can probably call it EventBridge, is quite a powerful thing as well. And I think you touched on that earlier, didn't you? You mentioned some of the event-driven stuff you're doing, where whenever something happens in your AWS account, you run some Lambda to do automated remediation?

Louis McCormack: 09:19  

Yes. And you're absolutely right. For some reason, hackers feel that it's open season on games; they sort of try and hack them for fun. So yeah, we have to be very secure. We use CloudWatch Events, and serverless more generally, to help us manage security. As I mentioned earlier, you can have a Lambda which triggers on an event. For instance, we have an alert which is triggered whenever anyone logs in as root to the AWS console of any of our accounts. It sends an alert to a Slack channel, and we are asked to find the person who's logging in as root and ask them why they're doing that. Occasionally, there's good reason for it, of course, but if we don't find the person who's logged in as root, which has never happened, by the way, we could have a problem. But that's almost a trivial example of how we use CloudWatch Events for security monitoring. More generally, we use AWS Config Rules. These are a way to represent policy as code, which is a movement that's gaining traction. And policy isn't just security policy, it could be things like naming standards or tagging policy, but we generally use it to enforce security policy. The way that Config Rules work is that they can be configured to trigger a Lambda whenever certain objects in your estate are changed. For instance, we have a Config rule that is triggered whenever any change is made to any EC2 security group throughout our estate. What happens is this Lambda is triggered, and it parses the change that has just been made and identifies whether that change could lead to a security issue. For instance, if someone had just opened up a port to the internet, then Config Rules would pick that up. The Lambda would decide that that's something that probably shouldn't be done, and it would again notify us, the DevOps team.
And then it's our job to go and find out who made this change, and why they made it. 99% of the time, it's somebody just spinning up a proof of concept of some new technology. Perhaps they're working at home, and they just wanted to open up the admin console of this proof of concept, which is okay as long as it's only a short-term thing, and we just make sure that it is a short-term thing. But without Config Rules to help us there, we would be utterly none the wiser that this thing had happened. And that's just a single example. We have many more Config Rules, and we've found it's an incredibly powerful thing to be able to respond to events in this way.
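The security group check described above could be sketched along these lines. This is an assumption-laden illustration rather than the rule Space Ape runs: the field names (`ipPermissions`, `ipv4Ranges`, `cidrIp`, `ipv6Ranges`, `cidrIpv6`) follow the shape AWS Config is assumed to use for security group configuration items, and the notification step (Slack, in Louis's telling) is left out so that only the evaluation logic remains.

```python
"""Sketch: flag security group rules that are open to the whole internet."""

OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(ip_permissions):
    """Return (protocol, from_port, to_port, cidr) for every world-open ingress rule."""
    findings = []
    for perm in ip_permissions:
        cidrs = [r.get("cidrIp") for r in perm.get("ipv4Ranges", [])]
        cidrs += [r.get("cidrIpv6") for r in perm.get("ipv6Ranges", [])]
        for cidr in cidrs:
            if cidr in OPEN_CIDRS:
                findings.append(
                    (perm.get("ipProtocol"), perm.get("fromPort"), perm.get("toPort"), cidr)
                )
    return findings


def evaluate(configuration_item):
    """Map a configuration item to a compliance verdict plus a human-readable note."""
    if configuration_item.get("resourceType") != "AWS::EC2::SecurityGroup":
        return {"compliance": "NOT_APPLICABLE", "annotation": ""}
    perms = configuration_item.get("configuration", {}).get("ipPermissions", [])
    findings = open_ingress_rules(perms)
    if not findings:
        return {"compliance": "COMPLIANT", "annotation": ""}
    notes = [f"{proto} {lo}-{hi} open to {cidr}" for proto, lo, hi, cidr in findings]
    return {"compliance": "NON_COMPLIANT", "annotation": "; ".join(notes)}
```

Returning a verdict rather than acting on it matches the workflow Louis describes: the rule only detects and notifies, and a human decides whether the change was a legitimate short-term proof of concept.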

Yan Cui: 12:29  

Yeah, that kind of underpins the whole DevSecOps movement that's been, I guess, gaining a lot of momentum over the last 12 months or so. And that's definitely something I'm seeing more and more now, especially in the enterprise space, where people are using Lambda with AWS Config to, like you said, provide governance and enforce consistent naming conventions and security practices for everyone who's deploying stuff into their AWS environment. I think that's a really good direction for the industry to be going in: rather than having gatekeeping, where one team has to double-check everything that everyone is trying to deploy, that team can just say, you know what, we're going to define a bunch of guardrails and rules and let you do whatever you want, so long as what you're doing falls within the boundary of what we define in terms of best practices and the rules that we configure. I think this is definitely a welcome change of pace in that particular space. There was also one other use case that we worked on together when we were at Space Ape, where we were using Lambda to do some load testing against the multiplayer online game, the MOBA, that we were building, right? Are you guys still using Lambda to do load testing?

Louis McCormack: 13:49  

Yeah, it absolutely is. This is probably my favourite thing that we use Lambdas for. And Yan, you'll be pleased to know we're still using the same system we developed when you were with us.

Yan Cui: 14:01  


Louis McCormack: 14:02  

So yeah, load testing is extremely important to us. As we ramp up to a global launch, we need to be sure that our systems can handle the load which we expect them to receive. We may be going from zero or very few players to hundreds of thousands or even millions of concurrently connected players. And we need to have as much confidence as we can that our systems are not going to just fall over. There are some well-publicised examples of games suddenly receiving a lot of load and their servers not being able to cope, and we don't want to be another example. So the way that we do that is we generate artificial load through load testing. We have a fairly elaborate load testing infrastructure that Yan helped us build. It's built using serverless constructs, and the basic premise is that with a single request against an API Gateway endpoint, we are able to spin up an army of Lambdas, each running a load test client, and they'll focus like a tractor beam on a given endpoint and run a load test through. Once it's complete, they report metrics back through an SQS queue, and they eventually percolate through to DynamoDB, where the results are collated and we can look at them in a nice web UI. Load testing really has two phases. The first is driven by Jenkins. When a game server is built, a load test client is built alongside it on every single build; that's how important it is to us. Then Jenkins will actually spin up a version of that game server and expose an endpoint. It will then call into our load test infrastructure and ask for 100 or 200 Lambdas to generate some load, and they'll point the load at that game server. Jenkins itself collates the results, and it uses them just to confirm that performance hasn't regressed, that we haven't made anything worse in that code change.
So we can be sure that our services are at least as performant as they ever were. But the second phase is the more exciting phase, especially for me. As we ramp up to big launches, we run what's called a large-scale load test. For this, we build an entire production-like environment. To be honest, nine times out of ten we just use production, if it's before the game has launched, and then we just wipe the tables afterwards. We then use the same process: we call into our load test infrastructure and ask for hundreds, perhaps even thousands, of Lambdas, and they generate some load. The problem is that we ran into some of the limitations of Lambdas at this point. The first one, the obvious one, is that they're limited to 15 minutes. We have load tests which run for, well, most of them run for at least an hour. Some even run for days. We run a thing called a soak test, which is meant to weed out any problems that might arise from running a system over time. So the 15 minutes is a problem. And also, Lambdas are limited in processing power. This was true right up until yesterday, when they announced at re:Invent that, I think, is it 10 GB and 6-core Lambdas are a thing now?

Yan Cui: 17:34  

Yep, that's right, 10 GB.

Louis McCormack: 17:36  

Yeah, so they're not so limited in processing power anymore. But they were when we designed this system. So to generate the equivalent load of, say, a million concurrently connected players, we needed something a bit beefier, and so we turned to Fargate. We were already using Fargate to run our services, so it seemed a natural fit. And it's the same process: exactly the same Lambdas are launched. But instead of running the load test client through to completion, there's a flag that they see has been set, and they move into Fargate tasks. They don't really move; they just launch a Fargate task and then terminate themselves. But the net result is the same: we end up with hundreds or thousands of very large Fargate tasks generating an incredible amount of load. And it's phenomenal. Actually, we've never run a round of these large-scale load tests and not found a problem, a problem that would have caused us considerable pain when we launched. It's a very, very useful thing for us.
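The Lambda-or-Fargate decision Louis describes can be sketched as a small planning function. Everything here is hypothetical apart from the 15-minute Lambda limit mentioned in the conversation: the function names, the `force_fargate` flag and the idea of a fixed number of simulated clients per worker are assumptions made for illustration, and the code that would actually invoke Lambdas or launch Fargate tasks is omitted.

```python
"""Sketch: pick a runner for a load test and work out how many workers it needs."""

LAMBDA_MAX_SECONDS = 15 * 60  # Lambda's maximum execution time


def choose_runner(duration_seconds, force_fargate=False):
    """Use Lambda for short tests; hand anything longer (or flagged) to Fargate."""
    if force_fargate or duration_seconds > LAMBDA_MAX_SECONDS:
        return "fargate"
    return "lambda"


def plan(total_clients, clients_per_worker, duration_seconds, force_fargate=False):
    """Return the runner and the number of workers needed to simulate total_clients."""
    if total_clients <= 0 or clients_per_worker <= 0:
        raise ValueError("client counts must be positive")
    workers = -(-total_clients // clients_per_worker)  # ceiling division
    return {
        "runner": choose_runner(duration_seconds, force_fargate),
        "workers": workers,
        "duration_seconds": duration_seconds,
    }
```

Under these assumptions, a ten-minute regression run from the build pipeline stays on Lambda, while an hour-long large-scale test or a multi-day soak test is routed to Fargate.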

Yan Cui: 18:42  

Sounds like the system has progressed quite a bit since I was there and gotten a lot more sophisticated, so, I guess, kudos to you guys. For people that are listening who are maybe not familiar with the specific context, the one thing they're going to be thinking about right now is: why do you build your own load test client? Why not just use something like JMeter or one of the whole bunch of off-the-shelf solutions? Why do you guys have to do all of this work yourself?

Louis McCormack: 19:13  

Yeah, that's a good question. If we were running a website or something like that, then yes, we could use JMeter or any of the other load testing tools that are out there. But our game code is very, very complicated. It has to be; there's a lot that goes into making these games, and a single HTTP request and response is just not going to be able to test all of the different code paths. So we need these load test clients to be consciously developed. A lot of thought goes into them to make sure that we cover all the different eventualities. Does that answer your question?

Yan Cui: 20:02  

Yeah, I guess. And I also guess that the games don't really speak a lot of the more standard application protocols. I remember there were a lot of proprietary application-level protocols being used. To get the most efficient bandwidth use, and the most performance, there was a lot of custom stuff being done in there, certainly on the MOBA that we worked on. There was a whole custom reliable UDP protocol, and on top of that a custom application-level protocol, just so that we could squeeze the bytes into a packet as tightly as we could. So we couldn't replicate even just the network-layer and transport-layer implementations in one of the standard load test clients you can find out there. I guess that's still kind of true in the other games you guys have developed, right?

Louis McCormack: 20:53  

Yeah, you're absolutely right. There are a lot of proprietary protocols, multiplayer aside, which is the thing that you're talking about. We use Protobuf as a sort of protocol delivery mechanism. And within that, you know, we've been around for eight years, so we've built up a lot of tech that makes these games possible. There's a lot of cross-compilation between languages and a lot of shared-model-type stuff. So yeah, going back to the original question, a simpler tool like JMeter is just not going to cut it.

Yan Cui: 21:32  

Yeah, I remember some of the crazier things that were done there, things like not just using Protobuf, but also having a tier on top of Protobuf that gives you generics and things like that, which you don't have in raw Protocol Buffers. That was some stuff that Andy did. I remember that was some pretty next-level stuff that he was doing. So again, I guess those are the kinds of things that make it difficult, or maybe impossible, to use a more off-the-shelf solution when it comes to load testing. But it's really interesting to see how you guys are mixing Fargate with Lambda in this case. I guess this is basically a toggle, so that you decide when to run the load tests in Lambda versus when to run them in Fargate?

Louis McCormack: 22:18  

It is, yes. And I guess that's another thing that probably wasn't around when you worked for Space Ape: Fargate. If it was, it was probably prohibitively expensive or not quite usable. But yeah, we're big fans of Fargate now, not just in load testing; as I mentioned, we run all of our services on it. And the real benefit, as we see it, is obvious: you don't have to manage the cluster that your containers are running on. I know that AWS bill it as serverless, but I wondered what your thoughts on that were.

Yan Cui: 23:00  

I don't really care too much about whether or not something is labelled as serverless. I really just care about not having to do stuff. Fargate lets me run containers without having to worry about provisioning the underlying cluster and doing a lot of the management stuff, so for me, it gets a lot of the benefits you get with serverless. But one thing that's probably missing is something similar to what Google has with Cloud Run, where you've got an event trigger, which is something that Fargate doesn't have right now. Hopefully, there's still time in re:Invent this year, so maybe they will find time to squeeze that in, and that would be a big plus for people that are using Fargate. Actually, another one that I'm seeing more and more people use as a serverless container platform, instead of Fargate, is CodeBuild, which also lets you trigger and run some task in a container without having to provision an underlying cluster. The difference is that it is triggered by some kind of event, and it also lets you use a bigger instance than Fargate: I think the maximum instance you can have on Fargate is quite a bit smaller compared to what you can have on CodeBuild. So we now also see people using CodeBuild for this. If you're new to it, it's basically a CI tool that lets you build your code in a container using a container image, but you can also use it to run any container image on some trigger that starts a CodeBuild task. So yeah, that's also kind of interesting.

Louis McCormack: 24:41  

That is interesting. Yeah, I didn't know that people were using CodeBuild for things outside of build pipelines. We do use CodeBuild ourselves in the build pipeline, but we hadn't quite realised that you could use it elsewhere as well. I think the maximum size of a Fargate task is 8 cores and 32 GB, if I'm not mistaken. So yeah, I'd be interested to see what CodeBuild can do.

Yan Cui: 25:11  

Yeah, I've looked. I think CodeBuild can probably have maybe 12 or 16 cores, or maybe even more than that. I can't remember. But when I last looked it up, it definitely had a much bigger maximum instance than you can get with Fargate. And with Fargate, I guess you still have the default limit of quite a small number of tasks you can run per region. So I guess you guys had to raise that to a much higher maximum so that you can run all of your game servers as well as your load testing clients?

Louis McCormack: 25:45  

Yes, we do. Yeah, that's one of the first things we do. I think the limit is 500 per region. Yeah, we make sure that gets raised.

Yan Cui: 25:54  

Okay. Yeah, 500 is much higher than what it was when Fargate launched, or maybe even a year and a half ago. I remember that when I was at DAZN we tried to use Fargate, and the first limit we ran into was something like 20 Fargate tasks per region. It was really low. I guess they must have raised that recently.

Louis McCormack: 26:14  

Yeah, and I might be wrong. It may be 100. I'm pretty sure it's higher than 20 now.

Yan Cui: 26:20  

Yeah. Okay. All right. That's good. I guess another thing we can talk about is this idea you've got that, looking at the serverless landscape, it's looking more and more like a Linux system. So can you talk to us about that and explain your thinking?

Louis McCormack: 26:37  

I have, yes. And, disclaimer, this might be complete nonsense. But it's just something that I've begun to notice over time. I really believe that serverless is the next iteration of cloud. If you think of serverless as cloud v2, then v1 is VMs and containers. The thing with VMs is that they're a lift and shift; that is exactly what they are, a virtual machine that would previously have run in the data centre. We've just taken those VMs and put them into the cloud. Containers push things on from the VM a little bit more, with ECS and Kubernetes, and I'm a big fan of containers. They all push the boundaries a little. But they're still using that same concept: they are taking an operating system, generally a Linux OS, and running it in the cloud. What strikes me about serverless is that instead of running a whole operating system in the cloud, I see it, perhaps as a DevOps engineer with a systems lens on it, as a Linux system exploded into the cloud. And I don't mean an explosion like the Big Bang; I mean exploded as in split out into the cloud. So I've got this theory, and bear with me, but you can draw some analogies from Linux operating system fundamentals to serverless constructs. If you think of the cloud as the CPU, a constantly available source of compute, then it's easy to imagine Lambdas as processes or tasks that are scheduled onto a CPU. System calls: this one is a bit of a stretch, I must admit. System calls, in Linux or Unix terms, are when a process asks to be context-switched from user space to kernel space, generally because it's asking the kernel to do something like disk I/O. System calls could be like calls into the AWS API. So that would make AWS kind of the kernel.
Interrupts are another thing in Unix systems. They trigger something based on an external event, like someone hitting a key on the keyboard, or some sort of I/O being ready. It doesn't take too much imagination to liken those to CloudWatch Events. Then we've got daemons. I think most people know what daemons are, but they're long-running processes. Step Functions kind of replicate that. And then, if you think back to basic Linux systems programming, there's a thing called IPC, which is inter-process communication. It's a number of well-defined ways that processes can communicate with each other: things like semaphores, message queues, shared memory, shared files, and the system bus, D-Bus. Over time, if you look at the features that AWS have bolted on to their serverless offering, it comes more and more to resemble this model. If you keep in mind that I'm imagining that Lambdas are processes, then message queues, a way for one process to put work onto a queue for another to pick up and process, have an obvious analogy. We've had message queues in the serverless world for a while: SQS, SNS, Kinesis arguably. The system bus is another one, D-Bus, which is a way for processes to broadcast messages to others that happen to be listening on the bus. We've got the CloudWatch Events bus. It's even called a bus, so it must be similar. Shared files as well: I think, relatively recently, you can use EFS with Lambdas. Now, I'm really not sure of the wisdom of doing that. But you can do it; you can share files between unrelated processes. And if you're using this model, it may be interesting to look at the things that are missing from the IPC model to try and guess what might come next. Just a fun exercise really; I'm not saying this is what Amazon are going to invent.
But one of the things that's missing is semaphores. In cloud terms, that might be a global locking mechanism, or, I don't know, leader election or something like that. I think that you don't actually need this in a serverless world; if you're using message queues properly, you can achieve the same thing by just having something consume from a queue at a rate of one. But another thing that's missing is shared memory, and this gets me thinking that maybe this wouldn't be such a bad thing. What I'm talking about with shared memory in the Linux world is the same piece of virtual memory mapped into the address space of two or more processes. The closest analogy I can draw in serverless is global variables. When a container starts, we are allowed to initialise some global variables. Maybe we're calling SSM or Secrets Manager, and maybe it increases the cold start somewhat. Then other invocations, which run in that same container, are able to access that shared memory; they're able to access those global variables. Now, I'm not sure of the received wisdom these days on doing that; it always worked fine for me. But I do wonder if there's something there, like maybe some way to share that. I'm not really talking about memory; I'm talking about fast access to storage that might not necessarily be durable. So maybe this could be a way that Lambda functions could access data from some very fast storage, which would result in us not having to pre-load during the init phase of the container. I know we could use ElastiCache Redis, which is probably the closest thing that we have at the moment. But there's a problem there: Redis has a very hard limit on the number of connections, and I think you'd hit that limit pretty fast if you're scaling up thousands of Lambdas. Anyway, Yan, that's my theory. Maybe it's the ramblings of a madman. I'm not sure.
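The global-variable pattern Louis refers to can be illustrated with a minimal sketch. The parameter name `/game/api-key` and the `loader` callable are hypothetical; in a real function the loader would wrap a call to SSM or Secrets Manager. The cache lives only inside one execution environment, so concurrent instances each pay the fetch once, which is exactly the gap the shared-memory idea is pointing at.

```python
"""Sketch: cache configuration in module-level state so warm invocations skip the fetch."""

_cache = {}  # survives between invocations while the execution environment stays warm


def get_config(name, loader):
    """Return a cached value, calling loader(name) only on the first request."""
    if name not in _cache:
        _cache[name] = loader(name)
    return _cache[name]


def make_handler(loader):
    """Build a handler around a loader (hypothetically, an SSM or Secrets Manager fetch)."""

    def handler(event, context):
        api_key = get_config("/game/api-key", loader)  # hypothetical parameter name
        # Real work would use api_key here; the sketch only reports its length.
        return {"key_length": len(api_key)}

    return handler
```

Fetching lazily inside the handler, rather than at import time, moves the cost from the init phase to the first invocation; either way it is paid once per execution environment.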

Yan Cui: 33:33  

No, I think there's definitely some truth to it. And in terms of the shared memory, Tim Wagner actually wrote something, God, when was that, I think definitely more than a year ago, when he was still at AWS, where he found a way to basically allow worker instances of a Lambda function to communicate with each other, almost like inter-process communication. So with that you kind of have a way to do that shared data, point-to-point at least, rather than mapping to a shared memory space. But that will definitely have some interesting use cases, even though I really like the share-nothing philosophy from Erlang. That share-nothing philosophy, I think, is a really powerful thing, and allows you to scale and have independent failures, so you don't have, you know, one process writing bad data and corrupting the memory, and everyone else having problems. So you don't have that kind of failure mode. But certainly, I think that's a really interesting line of thinking to explore, maybe even further. And who knows, maybe like you said, AWS is going to come up with something that gives you that sort of shared memory model for Lambda in the future. When they do, I'll definitely let you know.

Louis McCormack: 34:51  

Yes, please. One thing I guarantee with the things they announce is that they always surprise me. Even Lambda itself was a surprise when it first came out, I couldn't quite get my head around it. But they are obviously cleverer than me.

Yan Cui: 35:05  

Well, they have a lot more data than we do in terms of what people are doing and what sort of problems people have. And that's one of the great things about AWS in terms of the feedback, from customers straight into teams. Speaking about, I guess, company culture, Space Ape is by far the best company I've worked for in terms of company culture. You know, from the sort of cell structure to some of the bottom-up philosophy. I don't want to sort of spoil the story for the listeners, but can you maybe tell us about how the cell structure and the bottom-up culture work at Space Ape?

Louis McCormack: 35:44  

Yeah, I can. Thank you for saying that as well. We think we have a good culture as well, so it's nice to hear it validated. Yeah, we do, we have a wonderful culture at Space Ape, really. I mean, we make games, which lends itself really to quite a playful environment. And it's a very democratic culture, actually. There's a very flat management hierarchy, which I think helps everyone to feel empowered. And movement between disciplines and sort of gaining skills in multiple disciplines is encouraged, which leads to less siloization and fewer barriers. And, notably, anyone in the company can come up with a game idea. Even a DevOps engineer, like me, could come up with a game idea. I mean, I haven't got a game idea, but I could come up with one. And then that idea is sort of allowed to germinate, and it must pass a set of requirements and milestones and get buy-in from the rest of the company. And if it passes those sort of milestones, then a small game team is built around it, and it does start off small. And from that point on, all decision making around that particular game is left entirely to that team, even down to whether they should continue developing the game or not. And there'll be a game lead, and the game lead almost acts as like a mini CEO within that team. Obviously, they can seek advice and guidance from anyone else in the company and are encouraged to do so. But we end up with sort of a cell-like structure, with several small teams developing games almost in isolation, but not really in isolation, because striating these game teams are a number of shared disciplines. I think we still call them guilds, shared guilds: we have a server guild, a client guild and a DevOps guild. And the idea is that ideas, progress and technologies that are used in each of the games are shared through these guilds. 
And so that way, the games aren't developed in isolation, they're just sort of developed on their own. And that's not the only way we share these ideas. They're also shared with the wider company in weekly Show and Tells, another ceremony. So although we have, you know, several small teams developing very distinct products, and they are very, very different games, there's never any mystery. Everything is shared, the successes and the failures. And I mean, there are a lot of failures. Failure is part of it, it's part of the process. And even decisions that are made at higher levels, by the CEO and people like that, all of those decisions and successes and failures are shared too. And it just leads to a very healthy atmosphere, I think.

Yan Cui: 38:34  

Yeah, definitely. I think the transparency from the top down was probably one of the most important things for my time there, just understanding not exactly what to do, but what the company is looking for from you. And, you know, who we are as a company, as a culture. And I think the role that John and Simon play in terms of reiterating that every single quarter, in every single quarterly company meeting, reiterating that culture, those key points, I think that really helps, especially when you've got new people joining, or just to remind people who've been there for seven years, like yourself, you know, this is who we are, and we're not going to deviate from that. I think there were a couple of instances that I remember clearly as well. Was it like a birthday thing or something? There was some company custom to, you know, do cake or something. And I think one person didn't want to do it. And then someone else was complaining about, oh, yeah, you know, what happened to the cakes. And that's when John kind of stepped in and just said, well, you know, we do need to celebrate, you know, the person whose birthday it is, and tried to stop the sort of feeling of entitlement from creeping into the culture. And I think John really understands that once those, I guess, entitlements and bad things start creeping into a culture, it's really hard to stop. So he just nipped it in the bud.

Louis McCormack: 40:00  

Yeah, absolutely. And you've kind of hit on something else there, that we are very willing to adapt. And you know, John, the CEO, is very honest that it's a learning process for all of us, including him. And we have adapted several times, course correction, whatever you want to call it. And I guess we will continue to do so. But that's again, I think, a healthy place to be.

Yan Cui: 40:25  

Yes, definitely. And I think one thing we probably should also talk about on this whole culture topic is how you deal with failures. The fact that you let people decide when to kill the game is great, and the fact that people can, and are willing to, do that means that there is a safety net in place. Because you see a lot of companies, especially large ones, whereby, you know, people drag on their project, they just hoard their bit of the project or space so that they have job safety, they have security knowing that, you know, you can't fire me because I'm the only one who knows this part of the process. And in fact, you see a lot of deliberate over-engineering, just so that you create something so complex that no one else can touch it without you, whereas Space Ape is quite the opposite. So can you elaborate on some of the sort of safety net? What happens when, say, as a game team we decide that this game is not going to be, you know, top 10 grossing, so we don't want to build it, we want to pursue other ideas? What happens to that team, and what happens to you as an individual?

Louis McCormack: 41:37  

Yes, that's a good question. So just as you mentioned that, it sort of reminds me of other places that I've worked, where you see those sort of patterns, and it also reminds me how lucky I am to work at Space Ape, where that type of behaviour just doesn't really exist. I think the first point is the one you made, that it is down to the game teams as to whether they terminate their own games or not, and putting that power, the power of that decision, in the hands of the team leads to less ill feeling, as you can imagine. You know, and it is the whole team as well, it's not just the game lead that suddenly decides, the whole team factors into these decisions. But I think another point is what I mentioned before: although the game teams are developing their own games, and I'm not going to say there's no competition between them, they want to develop the best games, it's a healthy competition, and everyone's aware of the other cool games that are being developed at the company. So they do know what happens if a game gets killed. Either they'll keep that same game team and come up with, sometimes, a completely new idea, sometimes a variation on the same idea. Or the game team will get completely disbanded and they'll go and join one of the other very cool products that we've got going on. So there's not really a problem there.

Yan Cui: 43:04  

I remember John mentioning that this whole culture, this whole identity, was in part inspired by Supercell, who incidentally also ended up acquiring a part of Space Ape. And I remember watching like a documentary or interview with the CEO of Supercell, and he just laughed and said, I'm probably the least powerful CEO in the world. In fact, one of the teams killed his favourite game idea ever. They killed it. But guess what, they ended up making Clash Royale. So I think it worked out pretty well for them.

Louis McCormack: 43:42  

Yes, I think no one would argue with that. Yeah, I think we learned a lot from Supercell, of course we do. Yeah, even before they acquired us, I think. Part-acquired, I should say. We learn a lot from their practices.

Yan Cui: 43:59  

So yeah, thank you for that, Louis. And I guess for listeners who haven't experienced this kind of culture, it is very refreshing. And like I said, it is by far the best company culture I've personally experienced in a workplace. And I do hope some of these ideas will be, I guess, more broadly accepted into other companies. And, yeah, it's been great talking to you again and catching up on old times.

Louis McCormack: 44:26  

Yes, Yan, hopefully we can meet up in person before long.

Yan Cui: 44:31  

And before we go, how can people reach out to you and find you on the internet?

Louis McCormack: 44:36  

I am on Twitter @louism517. And that's about all I'm on actually. I'm not much of a social tech guy.

Yan Cui: 44:46  

Yeah. I remember all the guys at Space Ape were very much get your head down and build games. And I was probably the only one who was constantly spending time on Twitter and blogs.

Louis McCormack: 44:57

Yeah, I should say as well we are always hiring. So, if you like what you've heard at all, we are at, is the place to look. 

Yan Cui: 45:09  

Yep, and is one of the best places to work, like I said. So yeah, go check it out. And with that, I guess, thank you so much Louis. And hope to catch up with you in person soon.

Louis McCormack: 45:19  

Yes, thanks Yan. Yeah thanks for having me again.

Yan Cui: 45:22  

Take it easy, man. Bye bye.

Louis McCormack: 45:23 


Yan Cui: 45:37  

So that's it for another episode of Real World Serverless. To access the show notes, please go to If you want to learn how to build production ready serverless applications, please check out my upcoming courses at And I'll see you guys next time.