Real World Serverless with theburningmonk

#7: Serverless at LifeWorks with Scott Smethurst and Diana Ionita (Part 1)

April 15, 2020 Yan Cui Season 1 Episode 7

This is part 1 of my conversation with Scott Smethurst and Diana Ionita about their work at LifeWorks. We talked about the story of LifeWorks, a wellbeing startup that was acquired for $400M thanks, in part, to the heroics of Scott and Diana. They implemented business-critical features using serverless technologies, delivered them under severe time constraints and created significant business value, which played a part in the acquisition.

We talked about how they made their systems multi-region and geo-partitioned their user data. We discussed why they opted for Apollo Server instead of AppSync and the challenges of transitioning to a serverless way of doing things, including the change in mindset, local debugging and testing.

Check out org-formation, an open-source tool to help you manage your AWS organization using infrastructure-as-code (IAC) and CloudFormation-like syntax.

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

Opening theme song:
Cheery Monday by Kevin MacLeod

spk_0:   0:00
In this episode of Real World Serverless, I interviewed my good friends Scott Smethurst and Diana Ionita about their work at LifeWorks, a wellbeing startup that decided to go serverless. They hired Scott and Diana to build the wellbeing section of the app under severe time constraints, and we talk about how they managed to ship it on time and help the company get acquired for over $400 million. It's a fascinating story, and they did some amazing work there very, very quickly. I'm sure you're going to enjoy this episode. Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real-world practitioners who are building stuff with serverless and get their stories from the trenches. Today I'm joined by two old friends of mine, Diana and Scott, who I worked with at Yubl. Hey, welcome to the show, guys.

spk_2:   1:02
Hey, Yan.

spk_0:   1:04
You know, it's been a while since we worked together at Yubl, and we did some really cool things there. But you guys have been working together at a few startups where you continued to use serverless for pretty much everything you do. So tell us about your experience over the last couple of years and what you guys have been doing.

spk_1:   1:20
Ah, yeah, I mean, it's mainly been at one place. From about mid-2017 until the end of last year, we were working with a client called LifeWorks. Their CTO had actually heard about some of the cool stuff we'd done at Yubl (that's mostly Yan, let's be honest), and he got in touch with me and we had a chat. It quite quickly became apparent that LifeWorks could really benefit from going serverless, because they were pivoting in a new direction and had hit a kind of architectural gridlock, I suppose, with the existing back-end solution. So he ended up bringing me in as tech lead and architect, and pretty quickly after that I got Diana on board and, yeah, we were there for just over two years.

spk_0:   2:09
So Diana, tell us about LifeWorks, because I guess most of us who are not based in the UK may not have even heard of them.

spk_2:   2:16
Sure. LifeWorks is a wellbeing company that acts as a service that other companies can get for their employees. For instance, they have trained counsellors that you can call who will help you through hard times, like financial troubles, or when you're grieving or have other kinds of personal difficulties. They don't disclose the information you provide to your company, so it's a safe space. They also provide an app that you can use to help yourself before you start talking to people, because that's more expensive. They started off as a social network, mostly, so you could see a feed of your colleagues' activities and get perks and discounts at restaurants and cinemas. But while we were there, we built a wellbeing section of the app where you can, like I said, help yourself. For instance, you could track habits that you might want to adopt, like switching fizzy drinks for water or choosing to walk or cycle over taking the bus. It has a section called Challenges, where you can compete against your co-workers to see who takes the most steps in a certain time frame. You could receive tips on how to improve your life by completing assessments, basically a set of questionnaires on wellbeing topics like drinking or sleeping. The app also tries to encourage you to use it, so they've gamified these actions: you receive points whenever you, for instance, take steps or complete the assessments I just mentioned. You can use these points to get discounts at popular shops or even small monetary rewards.

spk_0:   3:39
Okay, so was this something that you guys were building from scratch, as opposed to lifting and shifting or breaking up an existing monolith into serverless?

spk_1:   3:49
Yeah, it was mostly a greenfield set of features. As Diana touched upon, it started off as, yeah, like a social network for employees, with rewards, recognition and all that kind of stuff. At that point, it was just a UK company called WorkAngel. They ended up merging with a 40-year-old US company called LifeWorks, and that's where the Employee Assistance Programme stuff came in. At the point they got in touch with me, they'd made a decision to pivot more towards this wellbeing space. It turns out that gives you access to a much larger market: I think the EAP market is about $1 billion, whereas the wellbeing market is like $65 billion. So these were the features we'd been tasked to build. There were a bunch of them, like Diana just described. One slightly problematic thing was they'd already signed a strategic partnership with the Royal Bank of Canada Insurance, and what they were going to do was upsell the LifeWorks platform with corporate insurance policies. The proviso was, certainly, that we had all of these wellbeing capabilities in place. So there was time pressure to build all of these new features relatively quickly. And I think the CTO realised that, because they had this legacy monolithic PHP back end, and based on the current velocity of stuff that was getting done on that, they stood very little chance of meeting these deadlines. And that's why we decided to go down the serverless route. I mean, it was one of several reasons, but certainly it was about getting more done in less time with fewer people. That was one of the big reasons.

spk_0:   5:36
And how did that decision work out for them?

spk_1:   5:39
It worked out very well. It was a huge success, to be honest. I mean, one year on, we did hit the final deadline, in June of the following year. I decided to take a month off at that point because we'd been working very hard towards that. There was still only a back-end team of four of us at that point. About two weeks into my break, the company got acquired for $400 million. A very big part of that, I think, was the wellbeing offerings. I mean, it obviously wasn't the only factor in the acquisition, but the company that acquired them was also Canadian, like the Royal Bank of Canada, and they were very keen on the wellbeing offerings. And I worked out, you know, in terms of numbers, that to deliver all that we developed about 25 microservices, comprising about 170 Lambda functions. And we did that with an average of 2.5 back-end developers, because for a large chunk of it, it was just me and Diana. And then, for the last two months, sorry, for the last four months, we hired in two more developers to get us over the line. Yeah, a huge amount was achieved with relatively few people.

spk_0:   6:53
So in this case, you've had your whole set of microservices built from the ground up with serverless. Did you guys employ any patterns to help you manage the complexity? And also, you were doing this multi-region as well. I remember you telling me about a few things you guys were doing that made things easier for you.

spk_2:   7:11
Ah, I'm not sure I can say anything about particular patterns that we adopted, but I can say a few things that did help. For instance, we had a few DevOps Lambdas that made our lives much easier. They would detect when new Lambdas were being deployed and automatically, for instance, configure alarms on errors and latency, so that if anything went wrong we'd be notified via Slack. They set log expiration dates to comply with GDPR regulation, and shipped logs to Datadog, because we found that CloudWatch did not suit our needs. For instance, you mentioned going multi-region: we could not search for logs cross-region. That was a pain.
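
A minimal sketch of the kind of defaults such a DevOps Lambda might apply to a newly deployed function. This is not the LifeWorks implementation: the type names, alarm names, thresholds and the "slack-alerts" target are all hypothetical, and the code only builds the settings rather than calling any AWS API. The one detail taken from AWS itself is the `/aws/lambda/<function name>` log group naming.

```typescript
// Hypothetical shapes: none of these names come from the episode or an AWS SDK.
interface AlarmSpec {
  alarmName: string;
  metric: "Errors" | "Duration";
  statistic: "Sum" | "p99";
  threshold: number;
  notify: string; // e.g. a topic that forwards notifications to Slack
}

interface FunctionDefaults {
  logGroupName: string;
  logRetentionDays: number;
  alarms: AlarmSpec[];
}

// Builds the settings an automation function could apply when it sees a new Lambda.
function defaultsFor(
  functionName: string,
  opts: { retentionDays: number; latencyMs: number; notify: string }
): FunctionDefaults {
  return {
    // Lambda writes its logs to a log group with this name by default.
    logGroupName: `/aws/lambda/${functionName}`,
    logRetentionDays: opts.retentionDays,
    alarms: [
      { alarmName: `${functionName}-errors`, metric: "Errors", statistic: "Sum", threshold: 1, notify: opts.notify },
      { alarmName: `${functionName}-latency`, metric: "Duration", statistic: "p99", threshold: opts.latencyMs, notify: opts.notify },
    ],
  };
}

const example = defaultsFor("assessments-api-get", { retentionDays: 30, latencyMs: 1000, notify: "slack-alerts" });
console.log(example.logGroupName); // /aws/lambda/assessments-api-get
```

Keeping the defaults in one pure function means every new Lambda gets the same alarms and log retention without each team having to remember to configure them.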

spk_0:   7:51
So for the stuff you were doing, you guys told me that you were deploying to multiple regions to improve latency and also resilience as well. How were you guys coordinating all of the deployments across multiple regions? And how did you set up multi-region replication and all of that?

spk_1:   8:07
Yeah. The multi-region stuff came in post-acquisition. It was quite interesting, because immediately prior to the acquisition the majority of our customer base was in the UK, and all of our APIs were hosted in a single region, in London. Then, almost overnight, because of the acquisition, we knew the majority of our customers were now going to be in the US and Canada, and there was also going to be a whole bunch of new customers as far away as Australia. So that's why we started investigating the multi-region thing. We did some testing with a product called ThousandEyes (thousandeyes.com) that allows you to simulate mobile clients from various locations and measure that kind of latency, and it has some awesome networking tools built in. We saw that even a simple hello world could, in the worst case, take 1.5 seconds, whereas in the multi-region setup that same user would experience maybe 150 milliseconds. But part of the problem was that, right from the off, we'd been given a requirement to geo-partition users' data. This is because some of the wellbeing data was very sensitive in nature. You could be asking, in a health assessment, questions like, does someone feel suicidal? So there were a lot of companies very keen to keep that in their local territory. So as a worst case, you could have a situation, for example, where the user is in Australia and their data is in Australia, but they were hitting an API in London, which was going back to Australia for the data, then going back again to respond to them.
In terms of how we did the multi-region stuff, we had to migrate, because, yeah, we were in a single-region setup. We went from APIs that were edge-optimised to regional APIs, and there's a whole process there. We got our CI deploying to all four regions in parallel. There were four regions we'd identified that we wanted to go to. And then there's the whole process of moving everything across. We moved the APIs first and the resources second, and we did have to write some custom tooling, actually, to help with that, like moving some of the DynamoDB data and some of the S3 stuff.
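
The worst case Scott describes (user and data in Australia, API in London) can be sketched as a small routing decision. This is an illustration only: London is the one region named in the conversation, the other three region codes are assumptions picked to match the countries mentioned, and `planRequest` is a hypothetical helper.

```typescript
// eu-west-2 is London; the other three regions are assumed for illustration.
type Region = "eu-west-2" | "us-east-1" | "ca-central-1" | "ap-southeast-2";

interface User { id: string; homeRegion: Region; }

// Where a request is served and where its data lives.
interface RoutePlan { apiRegion: Region; dataRegion: Region; crossRegion: boolean; }

function planRequest(user: User, nearestApiRegion: Region): RoutePlan {
  // Geo-partitioning: the data always stays in the user's home region.
  const dataRegion = user.homeRegion;
  return {
    apiRegion: nearestApiRegion,
    dataRegion,
    // A cross-region call means an extra long-distance round trip per request.
    crossRegion: nearestApiRegion !== dataRegion,
  };
}

const australianUser: User = { id: "u-1", homeRegion: "ap-southeast-2" };
// Single-region setup: every request lands in London.
console.log(planRequest(australianUser, "eu-west-2").crossRegion); // true
// Multi-region setup: the same user is served from the region that holds their data.
console.log(planRequest(australianUser, "ap-southeast-2").crossRegion); // false
```

When the user travels, the nearest API region changes but `dataRegion` does not, which is what keeps sensitive data in its local territory.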

spk_0:   10:25
Okay, that sounds pretty interesting, especially how you have to make sure that user data stays where, well, where it is in the country, but then allow the user to access it from other places in the world as they travel to different locations. And I remember you guys also mentioned that you were building with GraphQL, but you were running your own Apollo Server hosted in Lambda as opposed to using AppSync. Can you explain how you guys came to that decision?

spk_1:   10:52
The main reason for that... well, we did look into AppSync. AppSync was a very new product at the time. One of the very big reasons we didn't end up using it is that we had just gone multi-region at that point, and AppSync was only available in one of the four regions we'd just gone into, so that was a big downer. Another thing we weren't keen on was the way you could secure AppSync. Certainly at the time, it was very tied to Cognito, and we weren't using Cognito. I would say another factor was that we'd gone to huge lengths to build a microservices architecture, and we didn't really want to then build a monolithic GraphQL layer. At the time, Apollo Server had a feature called schema stitching. I believe they've now rebranded that; I think it's called Apollo Federation. But the idea is that you can federate a graph across your microservices, and then you have what appears to be a single GraphQL endpoint to the clients, which stitches all of the remote schemas together. So from a client perspective, it feels like just one graph. You know, for example, we mentioned challenges and assessments. The assessments API would own the assessments part of the graph, and the challenges API would own the challenges part of the graph, and they would be stitched together in a single public endpoint. From what I could see, AppSync didn't support this, and you had to effectively create a monolithic graph within AppSync. So, yeah, they were the main reasons we stayed away from it at that time.
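
To make the ownership idea concrete, here is a toy sketch in which each service owns some top-level fields and a gateway combines them into one entry point. It deliberately does not use Apollo's API: `SubSchema`, `Gateway` and `stitch` are hypothetical names, and real schema stitching works on GraphQL schemas rather than plain resolver maps.

```typescript
// Hypothetical sketch: these types are not Apollo's API.
type Resolver = (args: Record<string, string>) => unknown;
interface SubSchema { service: string; fields: Record<string, Resolver>; }
interface Gateway {
  owners: Record<string, string>;
  resolve(field: string, args: Record<string, string>): unknown;
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Combines the slices owned by each service into one entry point.
function stitch(schemas: SubSchema[]): Gateway {
  const owners: Record<string, string> = {};
  const resolvers: Record<string, Resolver> = {};
  schemas.forEach((schema) => {
    Object.keys(schema.fields).forEach((field) => {
      // Each part of the graph must have exactly one owning service.
      if (hasOwn(owners, field)) {
        throw new Error(`"${field}" is claimed by ${owners[field]} and ${schema.service}`);
      }
      const resolver = schema.fields[field];
      if (resolver) {
        owners[field] = schema.service;
        resolvers[field] = resolver;
      }
    });
  });
  return {
    owners,
    resolve(field, args) {
      const resolver = hasOwn(resolvers, field) ? resolvers[field] : undefined;
      if (!resolver) throw new Error(`unknown field "${field}"`);
      return resolver(args);
    },
  };
}

const gateway = stitch([
  { service: "assessments-api", fields: { assessments: () => ["sleep", "drinking"] } },
  { service: "challenges-api", fields: { challenges: () => ["most-steps"] } },
]);
console.log(gateway.owners.challenges); // challenges-api
console.log(JSON.stringify(gateway.resolve("assessments", {})));
```

The client only ever talks to `gateway`, while each service can change its own slice independently, which is the property a single monolithic graph would give up.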

spk_0:   12:33
So as far as I know, that's still the case with AppSync today: there's no support for schema stitching, so you do end up with one big monolithic GraphQL API in AppSync. So, switching gears slightly. Diana, I've got a question specifically for you, because I still remember that day we first met at Yubl. I was trying to show you this whole serverless, Lambda stuff, and all you wanted to do was to have an EC2 instance to run a Node.js Express app. And now, they say you've changed. You've come completely around and you are one of the biggest advocates for serverless I even know. How did that come about? What was so special about serverless for you as an engineer?

spk_2:   13:16
Well, to be fair, that day when we met, that first Node Express app was the very first I had built, ever, so it wasn't a big switch. Serverless is exciting for me, I guess, because I like that it forces you to go down a microservices route, in a way. It makes you think about what's the smallest kind of function I can build, that I can maintain, that does the job I need. It's awesome that it can interact with a whole ecosystem in AWS. I mean, it connects with SNS and SQS, and you can have event triggers from pretty much anything you can think of. That is great. So, yeah, I guess that's what I found exciting about it.

spk_0:   13:56
And so, as you transitioned from building stuff with Express or with, I don't know, I guess Web API or whatever, to building stuff with Lambda as individual functions and stitching things together, what were some of the most challenging aspects of that transition for you guys? When we worked together, what, three or four years ago now, none of you had done anything with AWS, never mind serverless. But now you have both exposed yourselves to so much on AWS and with serverless. What are some of the things that you found most challenging when you were learning serverless, and what are the things that really helped you?

spk_2:   14:32
Um, for me, I guess it was figuring out what you can do, because Lambda in itself is just the compute part. Knowing that there's so much more that you can do with AWS... and I'm clearly still learning about it, because they come up with new stuff all the time. That's the most challenging part for me: keeping up to date and knowing what changes.

spk_1:   14:51
I would say there were a lot of challenging things for me when I started working with you at Yubl. It was being thrown in at the deep end, to be honest, because, yeah, I was new to Node.js, I was new to AWS, I was new to serverless, so it was all very new. I think the big one was that serverless is almost like a whole new development paradigm, in a way. The way that you test, for example, the way you would do TDD or BDD, is very different. I think we quickly realised that acceptance-level testing has much more value when you're doing serverless development than unit testing everything. A lot of that is because you're effectively orchestrating managed services, and if you mock all of that away, it's not really going to test it. You know, you may be missing a permission on DynamoDB that you're simply not going to know about unless you deploy that stuff to the cloud and exercise it from the same entry point that the thing calling it will use. So, a whole different way of testing things. And silly little things, like, if you are running your tests against deployed stuff in the cloud, you need to remember to deploy the latest version of what you're doing before you run those tests. Debugging locally took a little bit of getting my head around, because it was, you know, how the hell can you debug something locally if this is a cloud-based thing? But it turns out you actually can do that quite nicely using the Serverless Framework. I think another nice trick we ended up doing was parameterising our tests, so that you could run them against, for example, our deployed endpoint, but you could equally run those same tests against the local handler, which meant you could step through the code by running the tests as well. But I guess it was just a whole different way of working. That was the biggest shock to me.
And I guess there's the fact that I was also learning Node.js at the same time, and I was so unfamiliar with all of AWS's gazillions of offerings. Yeah, it was a fairly difficult transition. But having said that, I thought it didn't take us too long to become productive. I'd be interested in what your take on that is, Yan, but it felt like we all got up to speed pretty quickly, even though we had all those hurdles.
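
A rough sketch of the parameterised-test trick, under some loud assumptions: the request and response shapes, `handler`, `makeInvoker` and the injected `httpGet` client are all hypothetical, and everything is synchronous for brevity, whereas real Lambda handlers and HTTP calls would be asynchronous.

```typescript
// Minimal request/response shapes (hypothetical, not the AWS event types).
interface ApiRequest { path: string; }
interface ApiResponse { statusCode: number; body: string; }
type Invoker = (req: ApiRequest) => ApiResponse;

// Stand-in for the function under test; in a real project this is the Lambda handler.
function handler(req: ApiRequest): ApiResponse {
  return req.path === "/challenges"
    ? { statusCode: 200, body: JSON.stringify({ challenges: [] }) }
    : { statusCode: 404, body: "not found" };
}

// "local" calls the handler in-process, so a debugger can step through it.
// "remote" goes through an injected HTTP client to the deployed endpoint.
function makeInvoker(
  mode: "local" | "remote",
  remote?: { httpGet: (url: string) => ApiResponse; baseUrl: string }
): Invoker {
  if (mode === "local") return handler;
  const target = remote;
  if (!target) throw new Error("remote mode needs an HTTP client and a base URL");
  return (req) => target.httpGet(target.baseUrl + req.path);
}

// The same acceptance test runs against whichever invoker it is given.
function testListsChallenges(invoke: Invoker): boolean {
  const res = invoke({ path: "/challenges" });
  return res.statusCode === 200 && Array.isArray(JSON.parse(res.body).challenges);
}

console.log(testListsChallenges(makeInvoker("local"))); // true
```

Because the test only depends on `Invoker`, switching between the local handler and the deployed endpoint is a configuration choice rather than a second test suite.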

spk_0:   17:30
So if I recall correctly, I think for every one of you, when you started, the first month was just painful, just constantly asking how do I do this with Node, how do I do this with AWS, what is this AWS service you just name-dropped into the conversation. But I think after about a month, every single one of you was pretty productive. You were able to take on a task and get it done very, very quickly. And you remember we had these attack teams, where we broke up into pairs and then we'd just go and work on a feature and get it shipped within two weeks. I guess it was difficult, as you said, but you were able to get productive very quickly. Again, I think once you get over unlearning some of the things you're used to doing, what you have to do now is a lot simpler. So learning the new things is not that hard, especially with Lambda. Even if you know just maybe a handful of services, with those few you can do most of the things. Of course, it helps if you know more of the ecosystem, but it is not required for you to be productive.

spk_1:   18:28
Yeah, I agree. I think after a month, I would say, yeah, that's about right, we were productive. Obviously you learn more and more as time goes on, and, as Diana touched upon, I think the more you can learn about the various offerings AWS has, the better chance you have of architecting the right solution, because you need to be aware of what's out there to pick the right tool for the job.

spk_0:   18:53
So going back to LifeWorks. You guys had a pretty small team. How did you guys organise your AWS accounts? Did you have one account per developer, so that everyone deployed to their own account?

spk_2:   19:06
Um, no, we actually had one development account initially. We ran into some problems with that, actually. We would deploy our stacks in our own environments, so I would have a dev-diana environment. And because we had quite a few Lambdas, we'd end up having maybe 50 CloudFormation stacks each. And we were a team of five, and the CloudFormation stack limit is 200. A soft limit, granted, but we had to raise that pretty quickly. Eventually we started migrating towards having a shared dev environment, and then I'd deploy a dev-diana service, which would talk to the dev services. So I'd only have to deploy maybe one or two of the things I was working on and get them to interact. We had to do that because some services depended on others. They would publish SNS messages and other services would listen to them. So if I wanted to deploy something that had a dependency on something else, I would have to deploy many services eventually. So, yeah, we went down the road of having a shared dev environment. Currently, we're even considering creating an AWS account for each developer. We think that might help with the limits.
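
The arithmetic behind that limit can be checked quickly. The numbers for developers, stacks per copy and the 200-stack limit are the ones quoted above; the figure of two personal stacks per developer in the shared setup is an assumption based on "maybe one or two of the things I was working on", and the function names are hypothetical.

```typescript
// Stacks needed when every developer deploys a full copy of every service.
function stacksWithPersonalEnvs(developers: number, stacksPerCopy: number): number {
  return developers * stacksPerCopy;
}

// Stacks needed with one shared environment plus a few personal stacks each.
function stacksWithSharedEnv(developers: number, stacksPerCopy: number, personalStacks: number): number {
  return stacksPerCopy + developers * personalStacks;
}

const stackLimit = 200; // the soft limit quoted in the conversation
const personalTotal = stacksWithPersonalEnvs(5, 50); // 250
const sharedTotal = stacksWithSharedEnv(5, 50, 2); // 60
console.log(personalTotal > stackLimit, sharedTotal > stackLimit); // true false
```

With five full personal environments the team is already over the limit, while the shared environment leaves plenty of headroom.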

spk_1:   20:13
Yeah, maybe it's a simpler way to go. I mean, at one point at LifeWorks we actually had a single account, I believe, for the lower environments. We ended up having one per stage, in the serverless space at least: we ended up having dev, integration test and production accounts. Yeah, you can still run into issues when you're all deploying large stacks in the dev account, hitting limits and stuff. It's definitely a problem. I would at the very least recommend that, you know, if you have two-pizza teams, they have their own sets of accounts. I think that can be good for various things as well. You know, from a security perspective, it can decrease the blast radius of an attack. It can decrease the blast radius of a fault, because if something spirals out of control, it can chew up account-wide limits, and if you've got, you know, the whole business running in one account, the impact of that could be much larger. So I think a good way of scaling accounts is having teams own their own accounts. I'm not sure what you've done, Yan, at places like DAZN?

spk_0:   21:23
Yeah. So my general rule of thumb is to have at least one account per team per environment. So you have one set of accounts for, say, the discovery team at DAZN, where we owned a bunch of microservices, and we would have one account for dev, one account for staging, one account for test and then one account for production. And as you said, any blast radius would be limited to just our microservices and to just one specific environment. And also, you don't have that problem of hitting limits all the time. DAZN has something like 200-something developers, many of them working on the back end, and when they had shared accounts, it was just ridiculous in terms of the limits we hit. Things that you never imagined you would hit. The number of IAM roles: 2,000. We raised it to 4,000 and, like, a month later hit that limit again. Things like that, where you constantly have to stop what you're doing for the day just to raise a request to raise the limit so you can move on. And it wasn't even just that: you also have all these throttling limits at runtime, so it's not just resource limits that stop you from developing and deploying things to your environments; they can actually stop your service at runtime as well. So definitely have more granular accounts, and maybe organise them using AWS Organizations to manage permissions and apply organisation-wide guardrails and rules using service control policies (SCPs), grouped around organisational units. That's kind of the way you should be managing your accounts in your organisation nowadays. But again, that's a point more applicable to a big organisation where you have lots of different people and lots of teams. It may not be as relevant or applicable for a small team with just four developers.
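
The "one account per team per environment" rule of thumb can be written down as a small generator. This is only an illustration: the `team-environment` naming scheme and the `AccountSpec` shape are assumptions, and nothing here talks to AWS Organizations.

```typescript
// Hypothetical description of an account; not an AWS Organizations type.
interface AccountSpec { name: string; team: string; environment: string; }

// One account for every combination of team and environment.
function accountsFor(teams: string[], environments: string[]): AccountSpec[] {
  const specs: AccountSpec[] = [];
  teams.forEach((team) => {
    environments.forEach((environment) => {
      specs.push({ name: `${team}-${environment}`, team, environment });
    });
  });
  return specs;
}

const discoveryAccounts = accountsFor(["discovery"], ["dev", "test", "staging", "production"]);
console.log(discoveryAccounts.map((a) => a.name).join(", "));
// discovery-dev, discovery-test, discovery-staging, discovery-production
```

Each generated account bounds both the blast radius and the service limits to one team in one environment, at the cost of more accounts to provision and govern.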

spk_1:   23:08
I agree, but at the same time, if you think there is a chance your business is going to scale to have more teams, you can maybe avoid digging yourself into a hole by just doing that from the off. Personally, I don't think it's a huge overhead to have a few more accounts.

spk_0:   23:27
It really depends. There's a lot of tooling around provisioning a new account and also setting up a landing zone for your new account. None of that tooling is perfect. There's AWS Organizations, and there are a lot of caveats around it too. It's not great at helping you maintain your landing zone as you change it, either. It helps you create a first landing zone as you create a new account. But after that, if you change, say, the log collection pipeline, for example, and you want to roll it out to all of your accounts, good luck. It doesn't do that for you. There's an open-source tool called org-formation, which was developed by a bank I spoke with on this show earlier, called Moneyou. The tool they provide allows you to use infrastructure as code, with a syntax actually very similar to CloudFormation, to manage your entire organisation: you use YAML to specify new accounts, link them to your organisational units, and set up CloudFormation stacks that will be deployed to every region and every account that you add to your organisation. So that tool is probably the best one I've seen so far that helps you manage a more complex AWS environment with a lot of different accounts. To do it well, especially when you're a small team, is not as easy as you'd like.

spk_1:   24:42
Yeah, I think you're right. That sounds like a nice tool, actually. I've not heard of the one you just mentioned.

spk_0:   24:50
Okay, I will send it to you afterwards, and I'll also put it in the show notes as well. So this was part one of my conversation with Scott and Diana. Please come back next week for part two of this conversation. To access the show notes and transcript, please go to realworldserverless.com. I will see you guys next time.

who is LifeWorks?
LifeWorks's journey to serverless and the $400M acquisition that followed
25 microservices, over 170 Lambda functions, 2.5 developers
on the benefits and challenges of going multi-region and geo-partitioning user data
on why Apollo server on Lambda instead of AppSync
on the challenges of changing to the serverless way of doing things
on how many AWS accounts should you have?