Real World Serverless with theburningmonk

#9: Serverless at DVLA with Matt Lewis and Chris Williams

April 29, 2020 Yan Cui Season 1 Episode 9

I spoke with Matt Lewis (Chief Architect at DVLA and AWS Data Hero) and Chris Williams (Principal Cloud Engineer at DVLA) to talk about the state of Serverless adoption at DVLA. We talked about a range of topics, from serverless security, QLDB, cultural and organizational changes when one moves to serverless and why AWS needs to offer more opinionated advice to customers on when to use which service.

You can find Matt on Twitter as @m_lewis and Chris as @chrilliams.

Here is the NCSC report "Cloud Security made easier with Serverless"

The best site for information about DVLA's cloud academy programme and other roles is

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

Opening theme song:
Cheery Monday by Kevin MacLeod

spk_1:   0:00
In this episode of Real World Serverless, I interviewed Matt Lewis and Chris Williams at the DVLA, which is the Driver and Vehicle Licensing Agency in the UK. We spoke about the many ways that DVLA is using serverless technologies like Lambda, API Gateway, Step Functions and QLDB to accelerate feature development and to modernise their tech stack. We also spoke at length about security and the challenges that DVLA has faced as they transition from running everything on premises to running serverless applications in the cloud, in AWS. Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real-world practitioners and get their stories from the trenches. Today I'm joined by Matt Lewis and Chris Williams from the DVLA. Welcome to the show, guys.

spk_0:   1:04
Hi, Yan.

spk_2:   1:04
Hi, Yan.

spk_1:   1:06
So for the audience who are not based in the UK, can you tell us a bit about what DVLA is and what your roles are over there?

spk_0:   1:13
So DVLA stands for the Driver and Vehicle Licensing Agency. We're part of what's called the Department for Transport within central government in the UK, so we're responsible for registering and licensing drivers in Great Britain, and that includes issuing the photocard driving licences, and also registering and licensing vehicles in the UK, which includes their registration certificates. Most people know us for drivers and vehicles, but we do more besides. We basically maintain a core register of about 49 and a half million active driver records and a register of around 43 and a half million active vehicle records. All told, that means we interact with about 92% of the entire UK adult population and collect about £6 billion each year in vehicle excise duty. I'm Chief Architect there, and that predominantly includes tasks like setting technology direction.

spk_2:   2:04
Yeah, and I'm Chris. I'm Principal Engineer, so I own the technical direction of our cloud platform, and that includes our serverless workloads as well as our traditional container workloads.

spk_1:   2:17
So we worked together for a while, and I saw you guys were doing some really cool stuff using serverless technologies. From a high level, can you tell us how you guys are using serverless technologies to support all these enquiries against the driver and vehicle data?

spk_0:   2:33
Yeah, well, you might say it's been a bit of a journey. We started, I guess from a government perspective, relatively early on. That was when we were starting to look at creating an API deployed in the public cloud, for which the volumes were growing exponentially. We were looking at traditional approaches of deploying API platforms and configuring EC2 instances and auto scaling groups, and then API Gateway launched something called usage plans, so we were able to put some rate limiting in place. So our first service using serverless technologies launched in January 2017, as soon as the London region became available, and that was with API Gateway, Cognito and Lambda. Shortly after that, we moved into building some voice skills, so again that was with Alexa, Lambda and also DynamoDB. And we've moved on from there and used it for a raft of different tasks.

spk_2:   3:28
Yes, so in the cloud platform space, I think we probably started where most people started, which is using Lambda for very much housekeeping tasks. So we had the usual things of CloudWatch Events triggering functions to shut down test instances overnight when we didn't need them, backups and things like that. But more recently, including the stuff that Matt's mentioned, we've moved into some quite interesting new spaces. One of them is using Step Functions, where we use Step Functions to orchestrate calling a number of APIs with the core system, and that includes all kinds of backoff and retries. If they haven't quite finished processing the documentation, we try again, and all of that is brilliantly automated and orchestrated for us. And then we've also got quite a few utility services that are now serverless. The main ones are our notification and printing services. With our notification services, we interface with the government's GOV.UK Notify service, and then the AWS services of SNS and SES. And with printing as well, we have a mechanism for sending prints to our output services group here in the DVLA.
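To illustrate the backoff and retries Chris mentions, here is a minimal sketch of a Step Functions Task state with a Retry block, written as a Python dictionary. The Retry field names (ErrorEquals, IntervalSeconds, MaxAttempts, BackoffRate) come from the Amazon States Language; the function ARN, the state names and the DocumentNotReady error are hypothetical placeholders, not DVLA's actual configuration.

```python
# Hypothetical Task state: the ARN, the next state and the error name are
# placeholders for illustration only.
check_document_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:eu-west-2:123456789012:function:check-document",
    "Retry": [
        {
            "ErrorEquals": ["DocumentNotReady"],  # assumed custom error name
            "IntervalSeconds": 2,   # wait before the first retry
            "MaxAttempts": 4,       # maximum number of retries
            "BackoffRate": 2.0,     # multiplier applied to each later wait
        }
    ],
    "Next": "NotifyCustomer",
}


def retry_delays(retrier):
    """Return the wait in seconds before each retry for one Retry entry."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** attempt
        for attempt in range(retrier["MaxAttempts"])
    ]
```

With these assumed values, retry_delays(check_document_state["Retry"][0]) gives waits of 2, 4, 8 and 16 seconds, so the workflow backs off progressively while the core system finishes processing.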

spk_1:   4:40
So that's quite a mix of workloads that you guys are using serverless for. So lots of things to unpack there. One question that springs to mind right away is that, as a government agency, you handle a lot of sensitive user data. Are there any special security requirements for how you manage that data? And did you guys get any pushback from other government or regulatory bodies when you were talking about moving to serverless?

spk_0:   5:06
I mean, I think there was definitely an element of what you'd call FUD when we first started looking at it. Some of that was probably the name itself: "serverless" caused some confusion for certain people. Obviously we have to go through information assurance and policy and data sharing and so on, and anything that was different became a case of us having to try and prove why it could be more secure. As with anything that involves change, there's always a bit of a learning curve there. But I think we're helped in that we're not the only ones that have been on this journey. For example, in the UK we have the National Cyber Security Centre, the NCSC. They carried out a lot of research, and it was really helpful because they've published their findings, and part of that was talking about how serverless can make you more secure in certain areas. So they talk about things like patching and making sure versions are up to date, and they look at the shared responsibility model and the handoff in terms of what the cloud provider will do for you. But they also found benefits in building smaller components that do one thing, and in the fact that with serverless you tend to automate from day one. And the thing we see now, as part of the broader cross-government joining up, is that there are more and more government departments taking advantage of serverless. You can definitely see it's reaching that tipping point of becoming quite a common approach.

spk_1:   6:32
That sounds great. Do you know if those findings are published somewhere? Because I get similar questions from a lot of enterprises about security, and again, like you said, a lot of it is just a misunderstanding about security when it comes to serverless. So many teams are so used to putting all their effort into security around the network boundaries, and with serverless you kind of just don't need it any more. So having something published by the UK government, I think, would be really good.

spk_0:   6:58
Yeah, we can check how widely it's published. Like I was saying, they were comparing building your own solution using infrastructure as a service versus using serverless technologies from the main cloud providers, so it was cloud agnostic from that perspective. And they found a commonality, especially in certain areas, in terms of how it forced you to think about designing solutions: how you built automation in, how you started small, the single responsibility principle. And there was a big focus then around patching. You can still build serverless solutions or services that have vulnerabilities in them, so you still have to take account of best practice, but they see it as a good starting point.

spk_1:   7:39
Yeah, that's definitely what I'm hearing from a lot of companies that are conscious about security and actually fully understand how much work it takes to secure an infrastructure that you have to manage yourself, with all the patching and the networking and everything around it. With serverless, a lot of that is just being done by AWS. And I think you guys are also one of the early adopters when it comes to this brand new service, QLDB. For the listeners who are not familiar with it, this is basically a ledger-based database. Can you tell us about your use case there, and how has your experience been with QLDB so far?

spk_0:   8:14
Yeah, so QLDB is the Quantum Ledger Database, which got announced at re:Invent 2018 in one of the keynotes, I think. The positioning of it is that it's another one of the family of purpose-built databases. And for us, you know, we touched on a register of drivers and a register of vehicles. We also have registers of tachograph cards and trailers and registration numbers, or licence plates, and we operate as a centralised authority for them. The ledger, you know, goes back centuries to things like double-entry bookkeeping. For us, what it gives you out of the box is what they call a journal-first architecture. When you try to commit a transaction, the transaction is committed to the journal, and only committed transactions end up in this journal, and then it generates these materialised views. So it provides a view of the current state of a record and also provides an indexed history. That history then gives you a complete audit trail of every revision, every change that's been made to a document. So it takes what traditionally would have been seen as a write-ahead log or a transaction log within a database, and it makes that accessible, so that's like a first-class citizen that you interact with. And then on top of that they've built a Merkle tree, and that allows you to cryptographically verify the integrity of any version of the document, so you can prove that that document exists in that ledger and that it hasn't been altered in any way since it was first written. So basically it uses a SHA-256 hash and hash chaining to show that no change has been written. I think one of the cool things for me about QLDB is some of the new features. There's something called Streams that's in preview at the moment. That allows you to start looking at what would be seen as a change data capture pattern.
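The hash chaining idea Matt describes can be sketched in a few lines of Python. This is a simplified illustration only, assuming JSON documents and a plain linear chain; it is not QLDB's actual digest algorithm, which works over Ion data and a Merkle tree. The function names and the GENESIS value are hypothetical.

```python
import hashlib
import json

# Arbitrary starting digest for the chain (an assumption for this sketch).
GENESIS = b"\x00" * 32


def entry_digest(previous_digest: bytes, document: dict) -> bytes:
    """SHA-256 over the previous digest plus a canonical form of the document."""
    payload = json.dumps(document, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_digest + payload).digest()


def build_chain(documents):
    """Append each document to the chain, linking it to the entry before it."""
    chain = []
    previous = GENESIS
    for document in documents:
        previous = entry_digest(previous, document)
        chain.append({"document": document, "digest": previous})
    return chain


def verify_chain(chain) -> bool:
    """Recompute every digest; any altered document breaks the chain."""
    previous = GENESIS
    for entry in chain:
        if entry_digest(previous, entry["document"]) != entry["digest"]:
            return False
        previous = entry["digest"]
    return True
```

Because each digest depends on the one before it, changing an early revision invalidates every later digest, which is what makes tampering detectable.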
So you can hook a QLDB stream up to a Kinesis data stream. When a transaction is committed onto the journal, it gets streamed out, and then you can have consumers that pick that up and do some processing with it. So for us, one of our use cases is this: because we maintain these registers, we have far more users that interact with us for enquiries than for actual transactions that update a system of record. So in that use case we're able to put a Lambda function consuming the data from Kinesis, filtering out any data that's considered personal information that we wouldn't want, and populating something like a table in DynamoDB to be able to do single-digit millisecond latency, high-volume enquiries. Then you have that separation: in QLDB the journal, the ledger, is the source of truth, and for things like read models, again, you can choose the right purpose-built database. So if you want to do fuzzy search, we could put it in something like Elasticsearch. If you want to do a high-volume key-value lookup, we could put it in DynamoDB. And all of those are fully managed service components. Like I say, it's a relatively new service, so you still go through some of the learning points as it gets built out. What's good about that is you can start trying to influence it and provide feedback, and you see how the service matures over time. For us it was little things, like just getting our heads around the fact that it uses Ion as a data format, not JSON. That's because Ion has a binary representation and a text representation, and it supports things like timestamps, so you can be more precise, which is needed when you're creating cryptographic hashes. There are libraries that let you work with it easily in Java, but the Node ones are still a work in progress.
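The stream-to-read-model pattern described here might look roughly like the following Lambda-style handler. It is a sketch under several assumptions: the payload is treated as JSON for simplicity (real QLDB stream records are Ion-encoded), the field names in PERSONAL_FIELDS are invented, and put_item is a stand-in callable rather than a real DynamoDB client call.

```python
import base64
import json

# Hypothetical field names; which attributes count as personal is an assumption.
PERSONAL_FIELDS = {"keeper_name", "keeper_address", "date_of_birth"}


def to_read_model(revision: dict) -> dict:
    """Drop personal attributes so only enquiry-safe data reaches the read model."""
    return {key: value for key, value in revision.items() if key not in PERSONAL_FIELDS}


def handler(event, context=None, put_item=print):
    """Process a Kinesis batch and write each filtered revision to the read model.

    put_item is injected so the sketch runs without AWS; in a real function it
    would wrap a write to the enquiry table.
    """
    written = 0
    for record in event["Records"]:
        # Kinesis delivers each record's payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        revision = json.loads(payload)  # assumes JSON, not Ion
        put_item(to_read_model(revision))
        written += 1
    return written
```

Keeping the filter in a separate pure function makes the rule about what leaves the ledger easy to unit test on its own.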

spk_1:   11:48
All right, that sounds really, really cool. And I guess the tooling is hopefully going to improve over time as more and more people start to adopt it and AWS spends more time focusing on improving the SDK itself.

spk_0:   12:00
It launched with CloudFormation support on day one.

spk_1:   12:03
Wow, that makes for a change. And I think you also touched on some service integration that you're doing as well. I remember from our time working together that one of the challenges you had was that you have all this source of truth being stored in really legacy databases, and connectivity between them and your serverless applications was a challenge. Can you talk about some of the problems that you experienced there, and is there any learning you took from that which may be useful to the listeners?

spk_0:   12:34
There's probably a couple of points to touch on there. There's the aspect where you look at how to extract data out of some of the legacy systems, and then stream it and keep it up to date. And one of the challenges we still have at the moment is how we're doing things differently in the serverless approach as opposed to a more server-based or traditional approach. What we found with serverless is that we're trying to go full stack with the teams that sit and work above that data, because we very much look at least privilege and secure by design. Because of that, we've ended up going down things like a multi-account structure. And then there are some of the things we've been looking at where we didn't want traffic to traverse the public internet, so looking at things like private APIs. Chris, you've been a lot more involved in that, haven't you?

spk_2:   13:30
Yes, I've had some interesting times with ETLs. Because of the nature of the data, what we've historically done is have a new ETL process for each use case. So we have a number of different ways you can enquire on a vehicle, for example, and historically we've had a number of processes all doing similar stuff, but as part of the ETL only making some of the data visible in DynamoDB or RDS Postgres. What we've found is that, rather than having all these ETL processes ongoing as multiple processes, we're taking the approach of just having the one moving the data up to the cloud. And then from there we can use services like DynamoDB to do queries on that data quite easily.

spk_1:   14:11
So with serverless being a big change to how you've been doing things, what were some of the challenges you found with bringing a lot of engineers over to this new world? Did you have to fight pushback from engineers who now have to work in a very different way and have to take on more DevOps responsibilities? Was there anything that you found particularly useful in convincing these engineers and getting them skilled up and ready for a different way of working?

spk_2:   14:40
Yes. So serverless is really great from an ownership point of view, and as we move to more of a product model at the DVLA, it's really useful to have teams that own the full stack from development all the way to production. With that have come a few cultural changes here as well. We've had very much a structure of support, run and build, so we tended to hand stuff over to a support organisation. Some of the challenges around that have been a little bit interesting. Matt mentioned earlier about multi-account structures: one of the ways we're trying to give squads and teams their sense of ownership is by having them own their own accounts, and there are challenges around that as well. In our Kubernetes platform, historically, we've been able to solve things like the monitoring, the alerting and the log shipping quite nicely, because it's all deployed onto a shared platform. With serverless, we've had to have a cloud engineering function which has enabled things like log shipping into a central account, so we still have the Elastic Stack that our support teams can use to monitor. The multi-account structure has been very interesting for us as well, because we have a centrally managed API Gateway, and we've quickly become aware of the handoffs required to manage those APIs. When I say a centrally managed API Gateway, this contains all of our user pools in Cognito, so all of our authentication is done through there, and our usage plans, and it's all defined by Swagger documentation. So to actually expose an API from a squad who's responsible for it, there's this very manual process, and it's also very difficult to version that clearly across the two teams. So we've had some really interesting points around that.

spk_0:   16:33
Yeah, I guess my perspective has also been that getting up and running, in terms of the concept of day one, was simple. Building a simple prototype was easy, especially as we were using things like the Serverless Framework, which made it easy to quickly stand up something like API Gateway, Lambda, DynamoDB or whatever collection of services we were looking at. But what we did find is that we are traditionally a Java shop, and I guess if you look at the Datadog and New Relic state of serverless reports that are coming out, the vast majority of programming languages used are Node, Python, those types of languages. There was a desire for that, and I think we settled on Node. Some of that was just about how to build small functions, and at the time there were very real cold start issues using Java. It all seemed simple in terms of getting up and running, but then there was the operational readiness around it. As soon as we'd adopted a new programming language, we had to think: well, what are we going to do for linting? What are we going to do for unit tests? How are we vulnerability scanning them? How are we going to package them up? And then the whole wider piece: things like monitoring and alerting, and Chris touched on things like log shipping. Those are the types of things that we'd solved on the container platform, and then we introduced a whole new programming language, a whole new approach. How does that fit in with the deployment pipeline? So those are all things that we've subsequently had to solve.

spk_1:   17:52
So that's quite a common complaint I've heard: when it comes to serverless, some of the languages are just better suited and better supported than others. I've had quite a few customers who ended up doing a similar migration from .NET or from Java to Node or Python, because again, the cold starts on some of these languages are just inappropriate for a lot of use cases, anything user-facing basically. So I guess that's certainly one area that AWS is working on, and I do hope that they will better support Java and .NET Core going forward, improving the performance and cold starts on those platforms beyond just letting you, say, use provisioned concurrency to get around the fact that you can have cold starts that are a couple of seconds long. Are there any other platform limits or general tooling problems that you have run into as developers working with serverless day to day that make your life difficult?

spk_0:   18:43
I mean, there's a more general one from my perspective, which is just how rapidly cloud providers evolve in terms of bringing out new services and new features. AWS, for example, aren't opinionated. They provide almost a menu that people can pick and choose from. But what you then end up with, if you were to draw one of these Venn diagrams, is a huge overlap in the middle. So we're looking at some of these things now, where we may traditionally have used something like SNS to SQS as a fan-out pattern, and then think: oh, but actually we could use Kinesis here, or there's EventBridge. It seems there are so many ways now to solve something that there's a nervousness about whether this is the right approach or not. And, like I say, it then becomes quite difficult to keep on top of all the changes that are coming out and to work out which ones you should go back and retrofit because they're going to give you some value.

spk_2:   19:34
It's interesting. One of my takeaways from re:Invent as well: one of the shocks was a couple of talks about single-table designs in DynamoDB, which is a big change for us. You know, we're used to relational databases. We've actually had some quite large AWS bills based on DynamoDB, just from taking the approach that we can put another global secondary index here and it'll be fine, because we just want to enquire on a different type. So it's around that sort of stuff as well, which is quite the change for developers.
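For listeners new to the idea, single-table design stores several entity types in one table and relies on composite partition and sort keys instead of adding an index per query. The following is a minimal in-memory sketch of that key scheme, not DynamoDB client code; the SingleTable class, the DRIVER# and VEHICLE# key prefixes and the sample attributes are all hypothetical.

```python
from collections import defaultdict


class SingleTable:
    """Toy stand-in for one table keyed by partition key (PK) and sort key (SK)."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def put(self, pk: str, sk: str, attributes: dict) -> None:
        """Store an item of any entity type under its composite key."""
        self._partitions[pk][sk] = {"PK": pk, "SK": sk, **attributes}

    def query(self, pk: str, sk_prefix: str = "") -> list:
        """Return the items in one partition whose sort key starts with sk_prefix."""
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


table = SingleTable()
# A driver profile and that driver's vehicles share one partition.
table.put("DRIVER#D1", "PROFILE", {"status": "active"})
table.put("DRIVER#D1", "VEHICLE#AB12CDE", {"make": "FORD"})
table.put("DRIVER#D1", "VEHICLE#XY34ZZZ", {"make": "MINI"})
```

Here table.query("DRIVER#D1", "VEHICLE#") returns only the vehicle items, so one access pattern is served by the key design alone rather than by an extra global secondary index.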

spk_1:   20:05
Yeah, the DynamoDB single-table design. I do think it's very powerful, very clever. But sometimes I think it's too clever for my liking, and also too clever for many of the companies that are using DynamoDB. And on AWS not having many opinions around what you should use, that's also a really good observation, because a lot of customers ask me that same question. When do I use SNS versus SQS versus EventBridge versus Kinesis, and so on? And for APIs and other things, there's API Gateway REST APIs, and now there's HTTP APIs, and then there's also AppSync as well. How do I choose between all these different services that seem to do very similar things? So I hope that AWS will offer more reference architectures and talk about why certain decisions are better in some contexts, rather than just giving you this huge selection of different things that you can try to mix and match yourself without really telling you when one is better than another in a given context. Is there anything else that Amazon could do better, besides better documentation and having more reference-architecture-based tutorials rather than just "hey, here's a service"? Is there anything else where you think, OK, as a customer I'd really love Amazon to do X?

spk_2:   21:18
Yes. So for me, I think Amazon have lots of very nice, simple use case examples. But when it comes to what we mentioned before about multi-account structures and how we limit blast radius, the more complicated, what you might call real-life solutions, I don't feel like there's much guidance on that, which is the biggest issue for us. And then things like: we were an early adopter of Cognito, for example, and the CloudFormation support and the tooling around that was incredibly slow to arrive. Using custom resources isn't nice, and then trying to retrofit that back into CloudFormation, we've had lots of fun around that.

spk_1:   21:58
Yeah, Cognito is another very complex piece of machinery that many people get confused by. But I guess with some of the other alternatives, like Auth0, the pricing is just insane. I think a lot of customers go to Cognito because the pricing is more gentle and more acceptable for their workloads. So I think that covers everything that I wanted to cover. Is there anything else that you guys want to tell the listeners? Maybe DVLA is hiring? And how can people find you guys on the internet?

spk_0:   22:28
Yeah, one thing is that we're really keen to reach and grow the community down here in South Wales. Both myself and Chris are on the organising team for ServerlessDays Cardiff. We've just done a second edition of that, and as part of that we've got a Serverless South Wales meetup that's been kicked off. So obviously we'd love people to join and take part in that.

spk_2:   22:49
And the one thing I wanted to mention is we've got a really exciting cloud academy kicking off, which is a two-year programme where we take people with just a simple interest in IT and cloud, put them through a 12-week boot camp, and then they get to join one of the DVLA cloud engineering teams and get some real-life experience. So I'm really looking forward to that programme kicking off.

spk_1:   23:10
Is that programme targeted at students, or is it open to everyone?

spk_2:   23:14
It's open to everyone. It's on Civil Service Jobs, so if anyone's interested, please take a look.

spk_1:   23:20
All right, sounds great. And I guess that's going to be hosted in the DVLA office?

spk_2:   23:24
Yes, here at the DVLA in Swansea.

spk_1:   23:26
You guys did an amazing job with ServerlessDays Cardiff. I had a great time on both occasions, and I'm looking forward to seeing you guys again next year.

spk_0:   23:34
Same to you as well.

spk_1:   23:36
I'd like to thank you guys for joining me today and spending time to share with us your stories at the DVLA and serverless.

spk_0:   23:42
No problem. Thank you.

spk_2:   23:44
Thank you.

spk_1:   23:58
So that's it for another episode of Real World Serverless. Thank you guys very much for joining us for this conversation with Matt Lewis and Chris Williams. To access the show notes and the transcript, please go to realworldserverless.com. I'll see you guys next time.

how DVLA is using serverless
on serverless & security
how DVLA is using QLDB
on integration with legacy databases
on the challenges of transitioning to serverless
easy to get started, but there are still pain points once you get going
AWS needs to offer more opinionated guides on which service one should use
on what AWS should improve on