Mainframe Modernization at WorkSafe BC with Brett Kelly

Walter's World: Mainframe Modernization Podcast

Join Walter Sweat as he dives into the complex world of Mainframe Modernization. After spending most of his career working on mainframe systems, Walter is now on a mission to decommission the very mainframes that he helped build. In Walter's World, no mainframe is too big and no environment is too complex. Mainframes beware, Walter is coming for you!

September 03, 2020 • Astadia

In this episode of the Walter’s World Podcast, Walter will be speaking with Brett Kelly, former Lead Architect and Consultant at WorkSafe BC. Walter and Brett will be discussing Mainframe Modernization at WorkSafe BC!

Walter (00:17):

Good morning and good afternoon, everyone. Welcome to the latest edition of Walter's World. My name is Walter Sweat. I'm the chief technology officer for Astadia, and today I'm delighted to be joined by Brett Kelly. Brett was a lead architect and consultant for a large governmental organization in Canada which, back in 2014, successfully migrated off its mainframe. So I'm delighted to be able to have him talk to us today about those experiences. Brett, welcome. Glad to have you here.

Brett (00:52):

Good morning, Walter.

Walter (00:54):

I've got to ask you a question now. Not everyone knows this, but shortly after that migration, I think you were one of the fortunate few who were able to retire. How's retirement been for you after going through a migration project like that?

Brett (01:08):

Well, it's been terrific. I moved from the big city of Vancouver to a very small town in South-central BC, called Greenwood. And there's lots of hunting and biking, outdoors, and canning and making jam and whatever. It's been terrific. I've not missed it at all actually.

Walter (01:30):

But secretly I know you still think about coding and COBOL. You just can't ever get away from that, right?

Brett (01:37):

Yeah, right.

Walter (01:42):

Can you tell me about the organization that you were working for? You know, it's not an easy thing to think about moving off the mainframe; it's obviously a great, safe environment. What was the process that the organization went through as they looked for alternatives to the mainframe?

Brett (01:58):

Well, the organization had been thinking strategically about this well before they made the actual decision. For example, as early as '96 or '97, they began the process of moving their applications off the mainframe. That was done piecemeal: as each application came up for renewal, and especially through Y2K, there was an opportunity to move them off. So they moved them off to distributed systems; PowerBuilder, Visual Basic, a variety of technologies were used to get the applications off. We were a CICS shop at the time. At some point, when most of the apps had been moved, actually when the final large app was moved off, the architecture group discussed the idea of getting rid of the mainframe. I was tasked with doing some analysis to determine whether it was an appropriate time, and whether technology was at a state where a server-based platform could support the workload that we were hosting on the mainframe. By that time we had reduced the mainframe to being a database server for large, enterprise-level DB2 databases. We had COBOL batch, and we also used COBOL for our stored procedures in the database, so there were quite a number of stored procedures there. And we had a scheduler on there, a large and complex implementation; I think at the time we were using CA Scheduler or some variant of that. So, basically, the job was to decide whether the server platform could support it. The server-based platform had been increasing in capability over time. And we also knew that the costs and risks associated with mainframe skills were increasing, as those skills became more expensive and less available. So there were risks associated with staying on the mainframe. Things like SSD drives were being delivered at that time, and multi-core servers and multi-core processors were being introduced, so the server platform was becoming more and more capable.
And the risks associated with the mainframe were increasing more and more. So, strategically, we decided that yes, the platform could manage our workload. We considered Micro Focus as one of our potential platforms, but we weren't committed to them, and I probably had more faith in other technologies than was justified. I thought that if we had to handcraft things, we could do that; as it turned out, I suspect we probably could not have. We landed on Micro Focus through Astadia, and that turned out to be a really great choice. So basically, yes, the decision was to proceed based on the capabilities of the server platform.

Walter (05:35):

I think it must have helped that you had already moved some applications off, so you knew that this distributed world could be successful for you. It's wonderful that you all were able to ease into that.

Brett (05:48):

Yes. Yes.

Walter (05:51):

Could I ask you: compared to what the mainframe looked like when you first started moving that workload off, what percentage of it would you say was still in use when you finally turned the mainframe off? Was it 50%? 40%?

Brett (06:11):

Oh, good question. CICS itself had been removed, so we had managed to remove all of the online applications; they were moved off to a distributed platform. But really, we still had a very capable mainframe. This wasn't a small mainframe; I forget exactly, but I think we were running about 5,000 MIPS or thereabouts. So this was a big mainframe. The databases we were running were large, and all of those distributed applications were pounding those databases hard. So we were using a lot of the capability of that mainframe; it's just that the number of technologies we were using on the mainframe had been reduced dramatically, so the actual problem to be solved on the mainframe had been reduced in complexity and size.

Walter (07:09):

Okay, perfect. That makes sense. I'm always curious about this. How did you come to find out about Astadia?

Brett (07:17):

Well, actually you guys found us. There's a website that all government and quasi-government agencies in British Columbia are required to use when posting things like requests for quotes and requests for proposals, and it's publicly available worldwide. I think Astadia were combing those, found ours, and submitted a proposal along with several other firms. And so that's how we ended up finding you and you finding us, a marriage made in heaven, so to speak.

Walter (07:51):

We loved that it worked out that way. Thank you so much. As you got to know the folks from Astadia, were there any people who just kind of stood out for you, and if so, why?

Brett (08:04):

Well, there definitely were several. Steve Steuart was your lead guy there, and Steve was a good guy to have in that role. We got along really well, and he was a good guy to manage the account on your behalf. But you had one star, one big star, without whom the whole thing probably would not have gotten off the ground. And that was Raj Barra, your tech guy. Raj is a brilliant, brilliant guy. He came up with a data conversion engine to bring down the many terabytes of DB2 data that we had in so short a time it would make your head spin. It really was a brilliant piece of work. I don't believe there's anything out there, to this day, that can approach the speeds he managed to attain to get that data off the mainframe. We were looking at, I think, 30 hours; that's what we wanted to get the time down to. And the very first time he ran his engine, he hit 30 hours. So the very first iteration of that technology already met our requirements. He did some further tuning and got it down to 10 hours, and we used that version simply because it met our timelines and it had been tested. But I know for a fact that Raj had done further tuning on his own and was confident he could get it to three hours. There's no other technology out there that can even come close to those times. We're talking about terabytes and terabytes of data; between the two databases, I think we were managing about 4,000 tables' worth of data. So this thing was just a screamer, and it was quite amazing what he accomplished for us in that task. There were other guys, like Dwight Cannon, who was your Micro Focus guy there. Dwight was a great guy, a very laconic character, and he was really good, real calm. Between him and Raj, they were two really calm guys.
And in the middle of the storm that such a project is, it's nice to have people like that who can maintain their calm in the face of all of the pressures and craziness that go on. So there were a couple of good guys there, for sure.

Walter (10:42):

Well, thanks for sharing that. We certainly feel blessed to have Raj with us. Dwight unfortunately is not with us anymore, but I've known him for a long time. Great guy. And I agree that going through a migration process is not easy. There are always going to be challenges, and you need people who can weather those storms, who understand that challenges will be there and don't get unnerved by them. We're blessed to have folks like that with us. So, talking about that data migration: people don't often think about this, but it's a reality, as you say. When you go live, you don't just push a button and things are automatically there. You've got to plan ahead of time, and you've got to allocate time to get all of that data moved over. So that planning process had to be a very important part of what y'all put together.

Brett (11:37):

Oh yeah. We had some of our best and most senior project managers and test leads involved. The process of planning, getting through the testing, and making sure that we were in a position to go live was a big deal for sure. And lots of pressure. You don't get through those kinds of projects without a ton of pressure. As I was telling you before we started talking, I believe it was almost like PTSD for a lot of the people who were involved.

Walter (12:14):

So it's a monumental change, but the great news, as with your organization, is that when a lot of hard work goes into it, being able to do it opens you up to new ways of doing business and saving money at the same time. I hope it was well worthwhile for you.

Brett (12:36):

Oh yeah. I mean, there were a lot of naysayers, no doubt about that, including IBM. When we originally did our analysis, IBM was contracted to do that analysis, and their conclusion was, first of all, that we were crazy to contemplate such a thing. But on the other hand, when we asked them how many processors it would take to support the workload we had, given what they knew about mainframe MIPS translating into processors on the Intel platform, their estimate was that it would take 16 cores. So on the one hand, they're saying we're crazy to contemplate it, and on the other hand, they're saying it's only going to take 16 Intel cores to do the work. We did not actually take that at face value, because people's careers were on the line here, and they weren't about to risk their careers on going short a couple of cores. So we actually provisioned two 32-core boxes, and each of those database servers had 750 GB of memory, not MB, GB. These were massively provisioned database boxes, because everybody wanted to be sure; you can always take those processors and redeploy them some other way through a VM, and you can take that memory and move it somewhere else. But at least at the starting point, we wanted to make sure that we were well and truly covered. It turned out in the end that even 16 cores would probably have been more than enough, because we were massively over-provisioned. On day one, we watched the CPU climb in consumption up to about 45%, and I was starting to break out into a bit of a sweat. But in fact it stopped there and started to go down, because with all of that memory and all of the buffers for those databases, as they filled up, the thing got faster and faster and faster.
And finally it settled in between eight and 10%, and that's where it stayed for the rest of its existence, basically; that's how much CPU we were actually consuming. So yes, we gradually de-provisioned those boxes and took out CPUs, et cetera, because we had so over-configured them.

Walter (15:20):

So in terms of performance then, not just CPU usage: was the environment's performance comparable to what you'd seen on the mainframe? Did users see much of a difference?

Brett (15:34):

Every single application, including batch, ran twice as fast as it did before.

Walter (15:39):

That's astounding.

Brett (15:40):

It did blow our minds. Nobody expected that kind of performance, but we measured it ten ways from Sunday, and everything ran twice as fast; response times were half of what they were. In every other respect, the end-user community was probably blissfully unaware of what had gone on, because their applications looked exactly the same today as they did yesterday. The only thing was, everything ran faster. So the one visible thing to the user community was that their applications were better and faster.

Walter (16:18):

That's astounding. You mentioned it before: with the hardware that's out there, solid-state drives and faster processors, the amount of improvement people can see is astounding.

Brett (16:33):

Even our storage system had 48 cores. That thing was massively capable; it had SSD drives, and it had all kinds of capabilities that previous generations of storage systems didn't have. So at every stage along the way, things had improved. For people who are nervous about getting off their mainframe, who think that perhaps they're stepping into a platform where they might have to scramble to get performance: no. The likelihood is that there's an overly confident belief in the mainframe compared to the server-based platform, with the multi-core processors and the storage systems that are available there. I can't imagine there's any application that couldn't be improved by being moved to that platform.

Walter (17:28):

That makes sense. Now, in 2014, was cloud an alternative for you, something that you looked at, or...?

Brett (17:37):

No, no. In 2014, cloud was beginning to be discussed. AWS was probably around at that time, but I don't think it had attracted major corporate installations the way it has subsequently. So no, we hadn't given serious consideration to cloud at that time, but the client has since pursued cloud aggressively. Of course, there are significant issues in transitioning from on-prem to off-prem like that, especially if you are taking part of your application suite and moving it over, and you have to communicate or transact between those two platforms; that's a complicated piece. Going from a mainframe to the cloud is probably easier than going to on-prem and then again from on-prem to off-prem. So I would say that if people are contemplating a mainframe migration, maybe a mainframe migration to cloud would be strategically a better alternative.

Walter (18:56):

It's funny you mention that. In 2014, I had so many clients who would ask me about the cloud, and at that time we were able to move into the cloud, but no one really wanted to be first back then. Today, it seems every conversation starts out with how quickly we can get to the cloud. So times have certainly changed. But here's something that is always of interest to me: what were some of the areas where you had to get used to doing things differently than you did on the mainframe?

Brett (19:32):

Well, we were well positioned because, first of all, most of our developers were already working off the mainframe. We had COBOL developers who were doing COBOL work, and we had DBAs who were doing the database administration work on the mainframe. But as an organization, we were well positioned to move onto a server platform because a lot of our staff were already working on one. And from our client's perspective, they didn't see any difference; once we had moved, the application looked the same as it always did. So I think other organizations whose development skill sets are dedicated to their mainframe might have a bigger jump to make. Because we had chosen to move those applications off earlier, all of our developers were doing PowerBuilder or VBA or C#, and the schools are all graduating people who do that stuff. That was part of the decision: if you're trying to support CICS developers, who's graduating those? Nobody.

Walter (20:55):

Yeah, absolutely. How about testing? What was that like for you? Did you find any tools that were of help to you as you went through the testing process?

Brett (21:05):

Yes, there were a couple of things. One is the testing of the applications, and I'll deal with that as a separate discussion. But in order to determine whether or not the server platform could support the workload we had, we came across a product called iReplay. We had two large databases, one of which supported about 60 different applications, so if we had to, we could subdivide that one into many other databases on many other servers and manage the workload that way. The other was a monolithic database for a single application that could not be subdivided; it was dedicated to a single large case management application. So we had to know whether the platform could actually manage that single workload, because that one could not be subdivided. iReplay was a terrific tool for that. It basically allows you to take a backup of your database, then take a packet capture of the network traffic going into the database. The prerequisite is a network configuration that gives you a single choke point for traffic going to this monolithic database, which we had. We did a packet capture from Friday night to Monday night, so we covered our entire weekend batch process and one full day of production online activity. The files produced, which you need to pull together, are gigs and gigs in size; these are massive, massive files. So first of all, you need to be able to manage these humongous files. Then you run them through the product, and it extracts all of the SQL that's in your network traffic. Then you take that database backup, and you can run it either on the mainframe or on a distributed platform, depending on your requirements. In our case, we wanted to move that database onto the Intel platform.
We were running Windows at the time. The software then runs the SQL against the database, and you can measure the response times and compare them to what you had on your other platform. Then you can compare the databases at the end: did all of the changes and updates get applied? You can also vary the rate at which the file is processed; you can just firehose it in there and see how quickly it can do it. So essentially it determined that yes, we could manage that major monolithic workload. But the problem actually was that it did it so quickly that it created more skepticism than confidence. People suspected that there were smoke and mirrors involved, or that it was artificial in some way. So it wasn't a panacea. I was confident; I knew what it was doing, and I knew that it had successfully run the workload, but there were a lot of other people who were highly skeptical of the results. So when we got to the actual project, we had to redo all the iReplay work again. And again, it produced exactly the same kinds of results: it showed that things were going to be faster. That still didn't calm a lot of people down. There was still a great amount of skepticism, and right up until the day we turned things on, technical people were walking into the technical director's office telling him that he was crazy, that his career was over, and that this thing was going to blow up in his face. And it turned out, of course, that everything ran twice as fast and it was a major success. So, much to the relief of the tech director, of course.

Walter (25:34):

Too good to be true almost, but in this case, yes, it's true.

Brett (25:37):

Yes. There were a number of technical stars in there. Take the Windows platform that we hosted DB2 on: a lot of people said we should be doing it on Unix. Because we were a Windows shop, we chose to put it on a Windows server, and Windows Server was absolutely fabulous. There were no issues whatsoever. It wasn't getting in the way. It didn't need to be recycled. It was as strong a platform as you could hope for. Distributed DB2, DB2 for LUW, was a star. It performed amazingly well. It's a terrific product that of course doesn't get much play with IBM, because they're a mainframe-centric bunch, but in itself it's a terrific product. People shouldn't ever shy away from using it. I have nothing but great things to say about it.

Walter (26:25):

It's full-featured and very, very functional. We've had great success with it.

Brett (26:30):

Oh, absolutely. Yeah. Yeah.

Walter (26:36):

Brett, I think this is important, and it's something that folks would like to hear: what were some of the lessons learned, or just your thoughts about the migration process in general?

Brett (26:44):

Well, I would say this: data is universal. Data types are universal. So contracting out the conversion of data is a given; you can do that. Code is universal; COBOL code is COBOL code. So if you've got people who convert COBOL code, then you're good to go there. But application testing is not universal. Every application has its own individual nuances, so people contemplating a migration should probably count on doing their own testing. You can't just parachute people in to test your applications; I don't think that's a doable proposition, and certainly we didn't find it to be. So I would highly, highly recommend that, in the planning process, you count on doing your own application testing. It's likely you've got a test group in house that knows those apps. The apps haven't changed; you need some kind of parallel test between the old platform and the new, and everything should work out the same. But trying to outsource that, I think, would be misguided. That's certainly one thing we found.

Walter (28:03):

I know at Astadia, when we talk to customers and potential customers, we talk about the value of testing. And we recognize that, even though we have testing teams who are expert at the process of testing, we will never know the applications as well as the customers do. So making that a partnership where we're working together is, I agree, paramount to ensuring a successful migration.

Brett (28:29):

Yeah, yeah, absolutely.

Walter (28:31):

Well, Brett, I think we are almost at the end of our time. I want to thank you so very much for taking the time to join us today. I know this had to be very helpful for people who are listening and considering migrations. I just want to congratulate you on your retirement, a well-earned retirement, and thank you again for taking the time to be with us today.

Brett (28:53):

Well, thank you, Walter. I enjoyed it, and I can say that I think our partnership with Astadia was highly successful. It turned out to be a terrific project. It was successful by every measure, and we really very much enjoyed having you folks on board for that ride.

Walter (29:15):

Thank you so much. And everyone, thank you for joining us today. I hope you'll join us for the next edition of Walter's World, the podcast series. And if you have any questions, please reach out to us at www.astadia.com. Thank you all very much and have a great rest of your day.