The Catalyst by Softchoice

The Inheriting a Mess Episode: When the Building Is Already On Fire, What Do You Do?

Softchoice Season 8 Episode 1

Every IT leader has a “day one” story. The moment they opened the server closet, logged into the admin console, or reviewed the vendor contracts and realized the job they were hired for isn’t the job they actually have.

In this episode of The Catalyst, we follow Chris Schopf, an IT operations team lead who walked into a new job to find twelve-year-old servers, two unfinished infrastructure projects, and a team that had been poached by the outgoing manager. Then we hear from Leon Adato, a 37-year IT veteran who’s made a career of walking into other people’s messes, and Ned Bellavance, a former consultant who warns that not every mess is actually a mess. Some of it is “purposeful chaos” you just don’t understand yet.

What you’ll learn:

  • How to tell the difference between a real disaster and “purposeful chaos” with reasons you don’t understand yet
  • Why Chris convinced his CEO to stop all projects for two weeks — and how it saved the team
  • The three-word business framework that gets IT leaders budget, staff, and permission to fix the mess: revenue, cost, risk
  • Why learning to “speak business” doesn’t make you less technical — it makes you bilingual

Featuring:

Chris Schopf — IT Security Architect 

Leon Adato — Principal Technical Marketing Engineer, Cribl

Ned Bellavance — Founder, Ned in the Cloud | Podcast Host 

From Softchoice, a World Wide Technology company — this is The Catalyst.



The Catalyst by Softchoice is the podcast dedicated to exploring the intersection of humans and technology. 

Katey:

Chris Schopf has an open door policy, not because he read it in a management book, not because the company asked him to. It's because at this particular job, on this particular team, the door doesn't really get to close.

Chris:

Sometimes it was like one teammate was knocking at my door, and open door, by the way looking at me, and I was looking at him and said, "Oh, don't say it. What's now? What, what new message you, you're coming with? Uh, is it burning? Yes. Okay."

Katey:

Chris was the IT operations team lead at a mid-market industrial pump manufacturer in southwest Germany. About 2,500 employees, international offices, a data center with servers older than some of his teammates' careers. He had been on the job a few weeks. The previous IT manager just left and took the entire team with him. From Softchoice, a World Wide Technology company, this is The Catalyst. I'm Katey Teekasingh. This season, we're doing things a bit differently. We're making audio documentaries, real stories from the front lines of IT, exploring the challenges of small teams chasing big dreams. Today's episode, what to do when you walk into a job and discover the building is already on fire and someone else lit the match. We are calling it the Inheriting a Mess episode. Act one. Walking in. Here's something almost every IT leader experiences at some point in their career, often more than once. You take a new job, you walk in on day one, and within a few weeks, sometimes a few hours, you start to realize that whatever was happening here before you arrived was not going well. Maybe the documentation doesn't exist. Maybe nobody can tell you why something is configured the way it is. Maybe the people who built the system are gone, and so are their reasons. Maybe the servers are 12 years old, and the renewal project that was supposed to replace them got abandoned halfway through, and now you have a hybrid setup that nobody planned and nobody wants. It's so common, in fact, that IT people have a name for it. They call it inheriting a mess. We talked to three people about this, three people who have lived it from very different angles.

Chris:

So today, I'm working as IT security architect for a private healthcare or health insurance, also based in southwest of Germany. And before, I was working, um, my whole life in manufacturing for manufacturing companies in IT.

Katey:

You just heard Chris. He's our hero of this story, the guy in the room with the open door. We'll come back to him in a minute. Then there's Leon.

Leon:

I have been working in IT for about 37 years, and for 27 of those years I've been focused on monitoring and observability, which gives me a lot of insights into the ways that things go wrong and blow up, and you have to put them all back together again.

Katey:

That's Leon Adato. He's a principal technical marketing engineer at a company called Cribl, which makes software for IT teams that need to manage telemetry data. Basically, all the signals coming out of your systems telling you what's happening. Leon has spent his career walking into other people's environments and figuring out what's broken. And finally...

Ned:

I'm Ned Bellavance. I'm the founder of Ned in the Cloud, and my main area of focus is delivering education to technologists, whether that's in the form of videos, books, blog posts, podcasts. I've been in the industry for over 20 years now, learned a lot of stuff, and I like to share what I've learned and help people advance in their career.

Katey:

Ned spent a big chunk of his career as a consultant, which is to say somebody who gets paid specifically to walk into other people's messes and clean them up. So when our producers asked these three, "Is inheriting a mess actually a normal part of the IT job?" Here's what Leon said.

Leon:

No. I- if this job was easy, nobody would pay us to do it.

Katey:

Right. If this job was easy, nobody would pay us to do it. Leon goes on.

Leon:

So there's a joke in, in Jewish circles that there may be one synagogue and two Jews going there, and there's at least three opinions there. And I found that the same is true for tech, that there could be one data center and two IT professionals in there, and there's gonna be at least three opinions. There's always more than one way, more than one valid way to do something. And therefore, when you walk into an environment, you are dealing with, you know, a multiplicity, a multitude, a veritable cornucopia of opinions and ideas about how to get anything done.

Katey:

And those opinions and ideas, they calcify into systems, systems that the next person inherits.

Leon:

Technical debt is real. It has been real for the entire span of the 37 years I've been doing this. Whether you're talking about stuff that gets shoved into the bottom of the closet where the network gear is, to really bad choices that everyone knows are bad choices, but it's the only choice you have, to, you know, bigger architectural decisions that seemed right at the beginning, but then you found out they weren't, or whatever, whatever.

Katey:

There's a McKinsey study from 2020 that put numbers behind this. They surveyed 50 CIOs at large companies and asked, "How much of your tech budget is being eaten by technical debt?" The answer? Somewhere between 10 and 20% of the budget that was supposed to go towards building new things instead goes towards paying down old decisions. 20% just to break even. Now, here's the thing. You hear technical debt, and you might assume that means somebody screwed up. Somebody made bad calls. There's someone to blame. Leon pushes back on that.

Leon:

The person walking in sometimes walks into an absolute disaster that is honestly nobody's fault. It's still a disaster, but it's not because someone actively decided to set it on fire. It just was smoldering from the minute it rolled off the production floor.

Katey:

Smoldering from the minute it rolled off the production floor. Hold on to that phrase. We'll come back to it. But before we get into Chris's story, what he walked into and what he did about it, Ned has a warning. Because if you've taken a new IT job and you're starting to look around and feel that creeping sense of, "Oh, no," Ned wants you to slow down for a second.

Ned:

What may initially look like a horrible mess might actually have some sort of planning and process behind it, but when you walk in initially, you don't see that. You just see what looks like chaos. And whether you're an individual contributor or you're a CIO or somewhere in between, you're still walking into a new environment that you don't know anything about. And I think initially, your job is to listen and understand before you suggest any changes, because until you get the context of the environment, you're not gonna know what is an actual mess and what is purposeful chaos.

Katey:

Purposeful chaos. Ned has a story about this. He was consulting at a client. He walked in, looked at how they were managing their virtual desktop environment, and immediately saw something that looked, to him, completely insane.

Ned:

I worked on a lot of different consulting projects where I would walk into an environment and really question why they made certain decisions. Like, I walked into, uh, a Citrix deployment, and it seemed to make no sense the way that they were managing their images. And I was like, "Why do you have 12 different golden images when you could probably just do it with one or two, right?" And what it came down to was some applications had a different rate of change than others, and so having different golden images made sense for those. And then they had very specific different personas that would freak out if they didn't see the exact desktop that they were used to. And so they had to be very deliberate about how those desktops were built and presented for those types of workers. So when I walked in, I saw, you know, 16 different images. That's way too much. You don't have that many desktops. Like, why are you doing it this way? And before I passed judgment, I asked, "What are all these different images for?" And as I dug deeper, it didn't seem like chaos anymore. It seemed more like, oh, you have actual reasons behind why the technology's built this way, and they're not technological reasons. They're people and organization reasons.

Katey:

People and organization reasons. That's the thing the architecture diagram won't tell you. That's the thing you only learn by asking. Ned again.

Ned:

Don't assume everyone's an idiot. They probably had good reasons for the decisions they've made.

Katey:

Okay. So take a deep breath, listen first, don't assume. But sometimes, sometimes when you finally finish listening, you look up and realize, no, this really is a mess. The servers are 12 years old. The renewal project never got finished. The team that built it isn't here anymore, and nobody is coming to save you. That's where Chris was. Act two, the pause. Chris took the team lead job at this manufacturing company in southwest Germany and walked into a situation that by any reasonable measure should not have been allowed to happen. Here's what he found.

Chris:

The old IT manager left and took the whole team with them. So we had about couple of weeks transition before the last old team member left. And yeah, it was really inheriting a mess back then.

Katey:

The previous IT manager poached his entire team on the way out the door, almost all of it. One person stayed in the Middle East office. Everybody else left. A new IT lead, that's Chris, gets hired. New people get hired around him. They walk in, they start looking at what's actually there.

Chris:

A lot of projects weren't finished. They had a local data center and the cloud, of course, Microsoft Cloud, and in the local data center they had two hypervisor renewals back then, um, based on VMware, but none of them was finished, so basically they just added the new servers back then. I think the oldest ones were about 12, 12 years old or something. Uh, built a new cluster, hosted their machines, but didn't move all the old stuff to the new cluster, so they just expanded the mess over the years. Not sure why. Maybe budget, maybe time, whatever. We weren't able to ask them finally.

Katey:

And you can't ask them because they're gone, so you just live with what they left behind. And what's left behind was a hybrid environment that nobody planned for. Two unfinished hypervisor projects sitting next to each other. 12-year-old servers running production workloads. 2,500 employees relying on a team that was, by any reasonable IT staffing math, way too small.

Chris:

We were pretty busy with the day-to-day business, keeping it running somehow with blood, sweat, and tears, I would say.

Katey:

Blood, sweat, and tears. This is the part of the IT job that doesn't make it into the LinkedIn posts, the part where you and your tiny team spend every weekend keeping the lights on, the part where every Monday morning meeting is a roll of the dice, the part where the new guy starts knocking on your door before you've had your coffee with a face that says, "Yes, something else is on fire." Here's what most people would do in Chris's position. They would work harder. They would push the team to put in more hours. They would try to outrun the chaos by sheer force of will. That's not what Chris did.

Chris:

We recognized together that it will not work out. Um, when there's a small team of five people in Germany and three people in Middle East handling the whole 2,500 employees, basically IT-wise, um, for operations, it doesn't work out. So I talked to my boss back then.

Katey:

He went to his boss, and then his boss took him to the CEO. And Chris had to make an argument that mid-market IT leaders all over the world struggle to make every single day, which is the way to fix this is to stop doing things. For a while, anyway.

Chris:

We talked to the CEO. I explained them what exactly technical debt is. I was like, "Okay, look, you have a lot of debt there, and, uh, day to day, your people are paying the interest on that, so there is not much time left, or there is no time left for any projects which actually will generate business value."

Katey:

He explained technical debt to the CEO, not in IT terms, in terms a CEO understands. Interest, principal, the cost of carrying a balance you can't pay down.

Chris:

And I think that term is great because even somebody who learned economics will understand debt, and will understand interest, and will understand what that means if you're not paying your debt, and if you're not able to pay your interest after a while anymore. They all know that, and everyone understands that. I mean, even a small worker with credit card bills at home will understand what happens if you do not pay your debts.

Katey:

And the CEO, to Chris's credit, and to the CEO's credit, got it.

Chris:

We got the okay, we got the budget to do a full project stop. All projects were stopped for a short period of time, couple of weeks, and, um, then we released step-by-step really small pieces of work back into the pipeline.

Katey:

A full project stop. Everything they were supposed to be building, paused for a few weeks while the team rebuilt the foundation. Chris has a way of describing this decision that I absolutely love.

Chris:

Basically, we stopped cutting the tree with a spoon and switched to a sharp ax by really stopping work, and slowing down, and focusing.

Katey:

They stopped cutting the tree with a spoon. They switched to a sharp ax. What Chris did, the stopping, the focusing, has a name in IT. It's something a lot of operators believe in, but very few actually get permission to practice. It's called the theory of constraints. It comes from a book by Eli Goldratt called The Goal, written in the 1980s. The idea is that every system has a bottleneck, and the only useful work is the work that addresses the bottleneck. Everything else is a waste, and Chris is a fan.

Chris:

The context switching between 10 different topics during the day is killing efficiency, and people will not be able to focus and work on what they are supposed to work, what you're paying them to do because they are busy and thinking, "Oh, what's next? Oh, there's another meeting. Oh, they want my opinion." Doesn't work out very well.

Katey:

Now, Chris is not the only person who's done some version of this. Here's Ned.

Ned:

There's a famous law called Conway's Law that essentially says that the systems that you build reflect the organizational structure of the company. And so understanding that organizational structure, who makes decisions and why they make decisions, and sort of how risk-averse a company can be or how quickly they grew, can sort of point you in the direction about, around why those decisions were made.

Katey:

Conway's Law was first written down in 1968 by a programmer named Melvin Conway. It says that any organization that designs a system will produce a system shaped like the organization that built it. Six teams will build a six-piece system. A company that doesn't talk across departments will build software that also doesn't talk across departments, which means, Ned says, the mess you're inheriting probably isn't really a technical problem at all.

Ned:

Part of it's collecting information and understanding the lay of the land, and, like, figuring out what are the three or four things that absolutely need to be fixed right now, cannot wait. Everything else, you wanna do a deeper dive and figure out why it ended up that way, because you may be able to Band-Aid fix it right now, but if you don't fix the underlying organizational issue that caused the system to be that way, it's just gonna crop up again later. Fixing the organization, and the processes, and the working with the people, has a direct impact on how well the systems and technology function.

Katey:

Fix the organization, fix the process, fix the conversation, which when you think about it, is what Chris was actually doing when he sat down with his CEO. He wasn't fixing servers. He was fixing the conversation about servers. He was getting permission to do the work. So how did the project pause turn out?

Chris:

We hit the pause button really short, just to reorganize ourselves, and then we, of course, released work because we cannot say, "Okay, uh, sorry, you get no IT support. You get no IT operations. We will just stop everything for six month or whatever." It was a busy project time, but we were able to, you know, um, reorganize, talk about who is taking what parts.

Katey:

A few weeks of full pause, then small pieces of work released back into the pipeline. Network backbone renewal, virtual desktop environment, SAP changes, but only the ones that mattered, and only when the team had the bandwidth to do them right. How long until things were actually under control?

Chris:

Overall, I think it took about six or eight month to really got to the point where we said, "Okay, hey, let's say the data center's just running. Okay, we, we monitor it, but it's not like you need to poke it with a stick every weekend that it doesn't die."

Katey:

Six to eight months, not overnight. But also, not blood, sweat, and tears anymore.

Chris:

In the end, it saved us a lot of time. Like, uh, some people didn't have to go to the office every weekend because something drastically failed. And then, of course, take off Monday, Tuesday, whatever they need, you know? That didn't happen anymore, uh, or at least not so much. Of course, from time to time was still happening. You cannot avoid it. But it was really a resilient infrastructure we built, and that was a foundation where we were able to build on. And I think that's an very important thing that you can say, "Okay, the foundation fits." We have a foundation to build on because if you build on sand, it will not work.

Katey:

Act three, the translation. Here's something I keep thinking about. Chris's story works because his CEO listened, because his boss took him seriously, because when he said, "We're drowning. Give us the budget and the runway to fix this," the answer was yes, and that's not always how it goes. Leon has a thought about why.

Leon:

I think that there's a difference between being angry and being nasty. Most of the time, people are just trying to do good. They want to make things better, and I think the frustration point is when they are blocked from doing it.

Katey:

The frustration point isn't being handed a mess. IT people, broadly speaking, like fixing things. They got into this work because they like fixing things. Walking into chaos is, for a certain kind of person, almost fun.

Leon:

And where the real frustration comes is not when you're being handed a bad situation, but when you're being blocked from doing anything about it.

Katey:

Being blocked, that's the actual problem. So the question becomes, how do you keep from getting blocked? How do you make sure that when you walk in and find a mess, and figure out what needs to happen, that the people upstairs say yes instead of no? Leon thinks IT has a language problem.

Leon:

I think in IT, we suffer from a fact that we are almost allergic to talking about things in business terms, and I understand why, but it doesn't help us, and it doesn't serve us. If we think of business as a language, you know, learning to speak the lingo or the language of business, and a lot of people immediately react and say, "I never wanna do that. I never wanna talk about, you know, EBITDA. I never wanna talk about ROI."

Katey:

This is the part of the IT career that nobody really teaches you. Nobody tells you, "Hey, on top of mastering networking, and storage, and identity, and security, and cloud, and now AI, you also have to learn how to speak finance." But Leon argues that's the unlock.

Leon:

Learning the language of business does not make you less of an IT practitioner or professional. It doesn't reduce your credibility in that space. It makes you bilingual.

Katey:

Bilingual, not less technical, just also able to translate. And Leon has a framework for what to translate into.

Leon:

No matter what you do, every business only cares about three things: increasing revenue, reducing cost, and removing risk. Those are the three levers that every business responds to. So if you can frame that mess in one or more of those terms-

Katey:

Three levers, revenue, cost, and risk. That's it. That's the whole language.

Leon:

Hey, if we clean this up, we will remove risk at this level. Sometimes removing a mess means increasing revenue. If we fix the website, I know it's technical debt, but if we do that, we actually have the potential to serve 30% more users per hour than we do now.

Katey:

You don't walk into the executive office and complain about VMware. You walk in and say, "We can reduce our expenses, or we can remove this risk, or we can increase the rate at which we serve customers. Pick one or all three."

Leon:

You may find that you are allowed, you are given permission, not only permission, but budget and staff to fix the mess because you've spoken a language that they understand.

Katey:

Now, listen to what Chris did again, knowing what Leon just said.

Chris:

We talked to the CEO. I explained them what exactly technical debt is. I was like, "Okay, look, you have a lot of debt there, and, uh, day to day, your people are paying the interest on that, so there is not much time left, or there is no time left for any projects which actually will generate business value."

Katey:

Chris didn't have Leon's framework in front of him when he walked into that meeting. He's a German operations lead at a manufacturing company. He just did it intuitively. Talk about the mess in business terms. Use a metaphor, credit, debt, interest, that any executive will understand. Connect the work to the thing the business actually cares about, business value. Chris speaks that second language. He didn't know that's what he was doing, but that's why he won. And Ned will tell you, once you've spoken the language and gotten the permission, the work doesn't actually end.

Ned:

Being in IT is being in an agreement to be a lifelong learner. Technology is always changing and evolving, and a big thing I've seen is something may be considered a best practice 10 years ago, and that's when you learned the rule. But you haven't taken time to reevaluate that rule, and now the best practices have changed 'cause the technology has changed.

Katey:

The mess you inherit today is in some ways the consequences of decisions somebody made years ago that were absolutely correct at the time. And the work you're doing now will eventually be inherited by somebody else. So you do it well, you do it carefully, you document it, and you try to leave the next person something better than what you got, or at least not on fire. Three voices, three different takes on the same problem. But when you listen across all three of them, the same advice keeps emerging. Here's what I think the playbook is if you've inherited a mess. One, slow down before you do anything else. Listen, ask questions. Find out what's actually broken and what's purposeful chaos with reasons you don't understand yet. Two, when you're sure the mess is real, make the work visible. Give it a name. Give it a number. Make it impossible to ignore. Three, translate it. Don't ask for permission to fix it. Ask for permission to reduce risk or save money or unlock revenue. That's the language the business speaks, so speak it, and then get to work. We'll let Chris have the last word.

Chris:

Think really about what you need and make work visible. Make work visible so that you can also say, okay, to CEOs or whoever, "Look, we are drowning in, let's say, day-to-day operations because we have to poke some systems with a stick every night or whatever." Don't be scared to ask for a short pause to catch breath with the team and say, "Okay, let's reorganize, restructurize, and try to tackle the whole thing with a new plan."

Katey:

If you're an IT leader listening to this and you're walking into something that feels like Chris's situation, or you're already in the middle of one and trying to figure out how to get the room to listen, you don't have to do this alone. The Catalyst was reported and produced by Tobin Dalrymple and the team at Pilgrim Content. Editing by Ryan Clark with support from Philippe Demas, Joseph Fire, and the marketing team at Softchoice. Special thanks to Chris Schopf, Leon Adato, and Ned Bellavance for sharing their stories and their hard-won wisdom from the front lines of IT. If this episode resonated, share it with another IT leader who's walking into something messy right now. They'll thank you for it. I'm Katey Teekasingh, and thanks for listening.