Customer Support Leaders

276: Mastering Incident Management - Part 4b of 6; with Kat Gaines

Charlotte Ward

Unlock the secrets to mastering incident management with our latest episode featuring the insightful Kat Gaines. Discover how to transform your customer support strategy by crafting effective incident communications and building strong interdepartmental relationships. Learn from Kat as she delves into the complexities of maintaining a unified communication strategy amidst a sea of different tools and remote work challenges. We'll walk you through the importance of clearly defined roles and well-documented processes, ensuring that your team's internal coordination translates into seamless external updates that keep your customers informed and confident.

Join us as we unravel the roles of incident commanders and customer liaisons, and explore how small businesses can evolve from an all-hands approach to more streamlined responses using tools like PagerDuty. We'll highlight the delicate art of balancing technical details with brand perception in customer communications, and how your customer-facing teams can lead the charge in crafting messages that resonate. With practical advice, such as utilizing system maps to bridge the gap between technical jargon and customer understanding, this episode promises to equip you with the knowledge to enhance your customer experience during incident management.

Charlotte Ward:

Hello and welcome to episode 276 of the Customer Support Leaders podcast. I'm Charlotte Ward. Today, we welcome Kat Gaines in part 4b of a six-part series on incident management. I'd like to welcome back to the podcast today Kat Gaines. Kat, lovely to have you back again for... I'm losing count. Is this episode six?

Kat Gaines:

I think we're at five now. Five. We're at five now, yes.

Charlotte Ward:

We've expanded our plan. The plan is ever-expanding, because this is a topic we can deep dive into, isn't it? Which is what we're going to do today, right? It's a communications masterclass, part two. So this is, I think, where we left off. Our last chat was talking about the comms from almost the opposite of this, a strategic point of view, like having guiding principles. It was from a principles point of view, but now we're getting down and dirty. Is that right? Is that fair to say?

Kat Gaines:

Yeah, that sounds about right. So, cool.

Charlotte Ward:

So, um, let's talk about, I guess, some of the guiding principles that we talked about last time. What I took away, and I'm going to put it in my own words, was kind of accessibility: having everyone be on the same page, have the same information, getting to that source of truth, etc. So, I mean, I've seen a lot of people try and wrangle their own versions of that single source of truth, and you end up with lots of different business functions saying, "Well, this is where we work." "No, but this is where we work." And everyone's trying to use their own tools or wrangle their own tools into it. So what's the simplest place, and the simplest way really, to combine all of that once everybody has given their opinions?

Kat Gaines:

Oh yeah, so we were talking a lot last time about just kind of keeping customer-facing teams in the loop, right? And that's the first step to communicating: making sure that that team has the info. And, more importantly, we were talking about the relationships between teams and ensuring that there's a good, tight relationship between the support teams and any team that can cause an incident or otherwise be involved in an incident.

Kat Gaines:

So I would traditionally just say, you know, oh, product and engineering. But in reality, anyone has the capacity to cause an incident, based on what you define as an incident, of course. Whether it's a bad deploy or maybe some kind of security issue, those types of things tend to start spreading throughout the organization in terms of who has the ability to unintentionally be the cause of an incident, or be working on an incident that the customer-facing teams like support need to know about. So that's one of the first things, I think, to your question: those relationships have to be really well established, and those teams all have to be on the same page that, you know what, yes, this team absolutely has to understand who we talk to and what's going on when we are in the middle of an incident itself.

Charlotte Ward:

And that talking, that communication, really live and in the moment is... yeah, I mean, I live in a world, and I know you do too, where everything's Slack because we're all remote. I guess for me the direct parallel is that in an office, everyone gets in a big meeting room in the corner, don't they? But the reality is, here we don't have that option. So what's a good Slack approach? Let's think about where most teams are right now.

Kat Gaines:

Yeah, exactly, you don't have that option if your team is primarily or even partly remote, and you also don't have that option past a certain scale too.

Kat Gaines:

So we might've talked about this on a previous episode, but that's something my team used to do when I was running support at PagerDuty back in the day, when we were very small. We'd get all the support and all the product in one room every Friday and just talk about our feelings, and it was really lovely and harmonious. And, of course, that didn't scale with the size of the company, with added office locations. So, whether you're co-located or remote, you're going to have to eventually work with someone who isn't sitting next to you once your company is larger than, let's say, 60 or 70 people. And so it's about being able to communicate through Slack and having a good process set up. I think we've talked a lot about the full incident response process over the last few episodes, and really the key here is that support needs to know what the process is. It has to be documented somewhere, no matter what it is.

Kat Gaines:

And they need to know where they fit in. Part of that is having really clearly defined roles. So if they are falling into that communications liaison role that we've been talking about, that means defining what that role is. We've given folks a lot of tips in these conversations so far in this podcast, and PagerDuty has some resources around that specific piece of it and how to define those roles that we can leave folks with later. But really, folks need to understand a couple of particular things. They need to know how incidents are communicated, where they get information, and what types of information. So we talked in our last episode about internal updates.

Kat Gaines:

That's where this is going to come in, and it's going to eventually translate to your external updates, which we'll get to in a few minutes. They need to know where to track that progress, and then they need to know where to expect updates so they can update the rest of their team as well. Usually one of us from support is going to be on the call or tracking the incident itself, as kind of the knowledge holder of the moment, and we need to be able to spread that information out to the rest of the team.

Charlotte Ward:

Yeah, um, I think the process is really important, everything you just said. So, super practically, the way we run this at the moment is we have a central incident channel, which is like our master channel, if you like. It's where every single incident channel is linked to, so anyone can go to the incidents channel and get the link to the latest incident channel, because every incident runs in its own separate channel. But the process piece around that is really critical, I think, and I think even in a channel you need the structure of process and roles, because otherwise you just end up in a free-for-all and everyone's trying to do everything.

Charlotte Ward:

I mean, this comes back to the very first conversation we had, where everyone really needs to have a very clear idea of what their part in the incident is. Otherwise you're going to have everyone diving off in different directions, either missing things and leaving gaps or, you know, doubling up on work unnecessarily, and it creates a sense of panic, particularly when you get that potentially high number of people in one room, whether it's a physical room or a Slack channel, right? So, even though Slack is this informal communication tool (and the same can be applied to Teams and, you know, chat software of your choice, dare I say), even though it seems an informal, real-time comms tool, without the process and the roles to give it that backbone, it's hard to corral everything, isn't it?

Kat Gaines:

Yeah, it really is. You really still have to put that intentionality and that thought into it to say, okay, here's what our process is, here are the roles that we have in incident response. So, you know, whether we have our incident commander; our folks who are our scribes, which is of special importance in a Slack channel; our customer liaison; our internal liaisons who are communicating out. You need to know what those roles are and what the responsibilities are, and ensure that everyone has that training on them so that they really know, when something happens, here are my responsibilities, and when I'm in the Slack channel, here's what's going to be expected of the engagement too. So, for example, it's a very common model to have the scribe literally scribe everything that's happening in the call in the Slack channel.

Kat Gaines:

For someone coming in from the outside, directing a question at that scribe is not terribly productive: A, because they're continuing to scribe through the incident, and they have to continue doing their job and not get too interrupted; and B, because they're not in conversation with the people who are actually making decisions.

Kat Gaines:

They're not in conversation with the incident commander. They're not in conversation with the customer liaison if they're talking about a communications decision. So we'll see people make that mistake where they come in and they start, you know, adding a thread on one of the scribe's posts in Slack and asking them questions, saying, well, can we do this, can we do that? And then that's a good cue for someone else on the team, whether it's an incident commander or maybe a deputy incident commander, to say, oh, if you want to be part of the conversation, you do need to join the call, because this is just our running record. And then, as you pointed out too, separating out functions a little bit. So the model you outlined, where some companies will separate different incidents into different channels, is one way to do things. What I have also seen people do is separate different functions of the incident response into a different channel. That's especially handy if it's getting very complex.

Kat Gaines:

And you know there's a lot of conversation that needs to happen around one particular piece of it. And then also, too, taking those internal updates out of the main channel.

Kat Gaines:

And so your running scribe is still in the main channel, but then the key updates that people need to know have their own home. That is an internal incident updates channel, or whatever you want to call it, so that folks don't feel like they have to sift through the scribe quite literally transcribing every single thing being said and done, but can just go find key points in one location.

Charlotte Ward:

Yeah, yeah, I think it's a really good point and something that we are a little fluid about. Depending on how complex the incident is, how much sits inside the main channel and how much doesn't varies. Because we run a lot of our incidents in written form already, unless it really requires us to be on long calls, we don't need a scribe just scribing. We might jump on a call here or there to talk through options or to quickly run through possible outcomes or possible paths, or whatever. But where I think it is invaluable to break parts of the conversation away from that main channel, and away from everything that's going on in terms of incident management, is for the SMEs to go and do their thing, which is exactly what you're saying. I think we almost have a hybrid approach where there'll be a bunch of engineers in there and suddenly they'll realize they're really deep in code, "now we should probably lift this out", and they'll go to another channel and come back and say, "This is what we've established," rather than doing all the solutioning in the main space.

Kat Gaines:

Yeah, yeah, absolutely. That's very difficult to do in one conversation. It can become a little bit muddled really easily, depending on the complexity.

Charlotte Ward:

Yeah, yeah, and we've talked a little bit before as well about the use of tools like documents. For Slack, the Slack canvas is great, because in some ways it kind of breaks out a little bit of one of those roles, because you can do a lot of the comms there. You know, the communication, customer liaison stuff: a lot of that is drafting communications, right? So that can happen in documents or in a canvas or somewhere else as well. So we're already in the incident, aren't we? But actually, again, deep diving into how this happens: how is the incident declared? How is the fact that there is an incident communicated to the business?

Kat Gaines:

Yeah. So I think a little bit of what we were talking about last episode still applies here, in the infrastructure and the setting up, and what you're doing in that sets folks up for where they're getting that information. So ideally you're going to be using a couple of things for finding out that there is an issue of any kind. Hopefully you're using something like PagerDuty in your engineering team, so hopefully there's some connection, obviously, where your monitoring is giving you a heads-up that something is happening, and you also have a path in for people to manually declare incidents. We talked about that in a previous episode.

Kat Gaines:

Just having kind of a button people can push to say, all right, here's an incident. A Slack command is something I see that's really common to give folks a very easy way into that, where you just use a command and it has an automation to trigger everything that's necessary, and then letting folks know that there's something going on. Again, you kind of have to think about the size of your business and the scale of who needs to know what. So for a small business it might be: okay, there's an incident, everybody knows, everyone is alerted, it's all hands on deck, everyone's in it. And again, it's very lovely and harmonious when it's possible.
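As a rough illustration of the "one command triggers everything" idea, combined with the central-index-plus-one-channel-per-incident layout Charlotte described, here is a minimal Python sketch. It does not call Slack or PagerDuty; `IncidentRegistry`, the `#inc-0001` naming scheme, and the `posts` list standing in for sent messages are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Incident:
    number: int
    summary: str
    declared_by: str
    declared_at: datetime
    channel: str


@dataclass
class IncidentRegistry:
    """In-memory stand-in for the automation behind a 'declare incident' command."""

    index_channel: str = "#incidents"  # hypothetical central channel name
    incidents: list = field(default_factory=list)
    posts: list = field(default_factory=list)  # (channel, message) pairs we would send

    def declare(self, summary, declared_by, now=None):
        now = now or datetime.now(timezone.utc)
        number = len(self.incidents) + 1
        channel = f"#inc-{number:04d}"  # every incident gets its own channel
        incident = Incident(number, summary, declared_by, now, channel)
        self.incidents.append(incident)
        # Link the new channel from the central index so anyone can find it.
        self.posts.append(
            (self.index_channel,
             f"Incident {number} declared by {declared_by}: {summary}. Follow along in {channel}")
        )
        # Seed the incident channel with the basics known at declaration time.
        self.posts.append(
            (channel, f"Declared at {now:%Y-%m-%d %H:%M} UTC. Summary: {summary}")
        )
        return incident


registry = IncidentRegistry()
incident = registry.declare("Checkout errors for some customers", "kat")
print(incident.channel)
```

In a real setup the two `posts.append` calls would be replaced by whatever chat or paging integration the team already uses; the point of the sketch is only that declaration, channel creation, and the index link happen in one step.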

Kat Gaines:

When you grow a little bit larger, that does get unsustainable, right? You need folks to continue working on the business initiatives themselves instead of just going all hands on deck, and to get the right people into the incident at the right time. And so that's something that, you know, at PagerDuty, with our product and how we talk about this, we emphasize a lot, because it's incredibly important to make sure that you can concurrently handle an incident and ensure your business is still running: make sure that someone is focusing on the emergency state and that someone else is looking ahead to the long term. And when you scale up a little bit, you have to be able to do that. So it's about identifying who the right people are.

Kat Gaines:

We talked about the stakeholder communications in our last episode and making sure that there is a channel for them to understand that something's going on. So we just talked a few minutes ago about having an incident updates channel that can be open to the whole business so that anyone who's curious can go hang out there and see what's going on. But then if you're targeting stakeholder notifications, if you're, for example, paging people through something like PagerDuty with a stakeholder alert, or if you are sending emails to people, you want to curate that list a little bit more to the folks who are going to be directly involved. You're likely going to have it be, obviously, the teams who are affected (maybe it's their piece of the product that is broken), then some kind of incident commander rotation or list, so that you pull the right person in. And then, with your customer-facing teams, you're going to have at least key contacts on the customer-facing team so that, again, they can communicate that out internally as well.
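The curated list Kat describes (affected teams, whoever is up on the incident commander rotation, and key customer-facing contacts) can be sketched in a few lines of Python. This is an illustration only: `Routing`, `targeted_recipients`, and the contact names are hypothetical, and nothing here reflects how PagerDuty's stakeholder notifications are actually configured.

```python
from dataclasses import dataclass


@dataclass
class Routing:
    component_owners: dict          # component name -> list of team contacts
    commander_rotation: list        # incident commanders, in rotation order
    customer_facing_contacts: list  # key contacts on support / success teams


def targeted_recipients(routing, affected_components, rotation_index=0):
    """Return the curated list of people to notify directly for one incident."""
    recipients = []
    # 1. Teams whose piece of the product is affected.
    for component in affected_components:
        recipients.extend(routing.component_owners.get(component, []))
    # 2. Whoever is currently up on the incident commander rotation.
    if routing.commander_rotation:
        position = rotation_index % len(routing.commander_rotation)
        recipients.append(routing.commander_rotation[position])
    # 3. Key contacts on customer-facing teams, who relay updates onward.
    recipients.extend(routing.customer_facing_contacts)
    # Drop duplicates while keeping the order above.
    return list(dict.fromkeys(recipients))


routing = Routing(
    component_owners={"billing": ["billing-oncall"], "api": ["api-oncall"]},
    commander_rotation=["ic-ana", "ic-ben"],
    customer_facing_contacts=["support-lead"],
)
print(targeted_recipients(routing, ["billing"]))
```

Everyone else can still follow along in the open incident updates channel; only this smaller list gets paged or emailed.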

Charlotte Ward:

Yeah, yeah, that makes sense and you know there is all of that information at declaration and I know part of the answer to this question is that that's what your liaisons are for. But as the incident progresses, how do you manage that flow?

Kat Gaines:

Well, as it progresses, you're right, that's part of what your liaisons are for, and you want to have those folks, in that moment, looking for the key updates and listening to what the incident commander is discussing and what any subject matter experts are discussing. They want to grab those key updates and start drafting communication around them. And for your internal liaison, that's going to be a little bit more fast and loose. They're able to kind of get things out in a less curated manner because it's internal. You don't have to think too hard about what you are communicating externally, how the media is going to view this, or how your customers are going to be concerned. You just want to let folks know what's going on.

Kat Gaines:

But then we get into the external communication and that's really the nitty-gritty of you know, the how do you do this and that's where you have to be a little bit more careful.

Charlotte Ward:

So the very first time, don't you? And actually, one thing I want to just draw on there is that I think there is an interesting, I want to say tension, that might be too strong a word, but there's certainly a dance, let's call it a dance, which I find happens in any incident which has an external impact, which is most of them to one degree or another. The incident commander has their fingers in all the pies and knows exactly what's going on, but isn't necessarily qualified to be the arbiter on what to communicate and when to communicate, particularly to external stakeholders. But also, the liaison has to make sure that they're getting the information they need to make the judgment calls about what to communicate and when. And I think it's interesting because the incident commander is going to have a lot of information at their fingertips at all times, right? And it's also a very busy role.

Charlotte Ward:

So one thing I kind of get the sense of sometimes on really hot incidents is, particularly, the customer liaison kind of circling, waiting for the thing that they can hook into, you know? And I haven't found a better solution for it yet than that you just kind of have to communicate and ask questions, often, as the customer liaison. You know, "Have we reached a point where...?" "What does that mean?" Or, you know, all these really obvious questions. And I'm very happy to play the dumb person in the room as customer liaison sometimes, like, "Does that mean this is fixed, or does that not mean this is fixed?" Those kinds of questions, so that you are making the judgment calls on what to communicate and when. Because your incident commander isn't necessarily qualified either, so it's not going to get pushed to you, is it?

Kat Gaines:

Right, exactly. And the thing we talked about in a previous episode is that you're the expert on customer communication in that moment. You are the authority, and so making those judgment calls and asking those questions has to be something you're comfortable doing. I fully agree with the "I'm going to play the dumb person here. I don't really know what's going on. Explain it to me like I'm five." Just lean on all of those things and make sure that, A, you know what's going on, and B, you have a clear picture of how to explain it to other people. And if you don't, just kind of wash, rinse, and repeat through those steps. And there's one thing you can do to prep for that, because sometimes I find that when we're in an incident, things are really overwhelming. Again, I like to talk about that state of trying to get to incident calm that I was talking about in a previous episode. But we still feel that adrenaline rush and that anxiety, and we tend to kind of forget ourselves when that happens.

Charlotte Ward:

And given the primary aim of most incidents is to resolve, you also don't want to feel like you're slowing everyone down, but actually you do have to be comfortable.

Kat Gaines:

Yeah, you do have to be comfortable with that, right? And so something you can do to prepare yourself for that is to say, okay, if my brain shuts down during this incident, because it's happened to all of us, if my brain just completely shuts down, give yourself some templates to be able to say, for both... and actually, I don't talk about this enough.

Kat Gaines:

I talk about templates in the sense of external communication, and that's part of what I'm getting at here. You do want templates for external communication to say, here's the Mad Libs I need to fill in to be able to get our customers on board with what's going on. Here's my fill-in-the-blank, and make it very easy to say, here are the details I need. But then you also need that internally too, and that's the part I'm saying I don't talk about enough. You need it for yourself, so that if you are getting ready to draft communication, getting ready to approach that communication template, you have that first preparation template to be able to say: what are the key pieces of information I need as customer liaison to understand what's going on here?

Kat Gaines:

Do I know these different pieces of infrastructure and how they work, like we were talking about in the last episode?

Kat Gaines:

Do I know what all these terms mean? Do I understand a little bit about the steps that are proposed and where we're going next? Even if I'm not communicating that externally, do I at least understand that so I can prepare for what my next communication might be? So you have to do that check with yourself, and I strongly, strongly recommend having templates both for that preparation, so that your team member who is acting as liaison in that moment knows everything they need to get and the questions they need to ask, especially if they're a little bit newer to the process, and for the external communications, so that you have a clear view of the information that your customers need on a consistent basis from your company and your product when something is not going to plan. And I can think of some key elements every template, even for those internal questions, even for those internal conversations, should have.

Charlotte Ward:

And to some degree, if you have a reliable source of truth that's easily accessible, this doesn't always have to be a communication, does it? It doesn't have to be real time. It doesn't have to be you asking questions of the incident commander five minutes into every damn incident, like, "So what is this?" Hopefully you're building a platform, whether it's a Slack canvas or a page or anything in between, where some of that can be self-served, because somebody somewhere is writing down the footprint of the incident, the expected path to resolution, certainly the initiation time and, as they understand it, the blast radius in terms of users or cohorts or whatever. So I hope you get to a point where you can self-serve some of that. So, even if you're templating, you're also not just parroting that template to every incident commander in every incident, right?

Kat Gaines:

Yeah, yeah, exactly. It becomes more intuitive and more part of your own language around how you ask questions about these things, rather than just reading out the form and saying, "Here's the information I need."
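One way to picture the self-serve source of truth and the liaison's preparation template working together is the small Python sketch below. The four fields mirror the items Charlotte lists (initiation time, footprint, blast radius, expected path to resolution); `IncidentRecord`, `QUESTIONS`, and `open_questions` are hypothetical names, and the question wording is illustrative.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional


@dataclass
class IncidentRecord:
    """What someone has already written down about the incident."""

    started_at: Optional[datetime] = None
    footprint: Optional[str] = None       # what is broken, as currently understood
    blast_radius: Optional[str] = None    # which users or cohorts are affected
    expected_path: Optional[str] = None   # expected path to resolution


# The liaison's preparation template: one question per piece of information.
QUESTIONS = {
    "started_at": "When did this start?",
    "footprint": "What exactly is broken?",
    "blast_radius": "Which users or cohorts are affected?",
    "expected_path": "What is the expected path to resolution?",
}


def open_questions(record):
    """Return only the questions the shared record does not already answer."""
    return [
        QUESTIONS[f.name]
        for f in fields(record)
        if getattr(record, f.name) is None
    ]


record = IncidentRecord(
    started_at=datetime(2024, 1, 1, 12, 0),
    footprint="Elevated API error rates",
)
for question in open_questions(record):
    print(question)
```

Anything already captured in the record never has to be asked out loud, so the liaison interrupts the incident commander only for the gaps.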

Charlotte Ward:

The next step, you know, I think that's really important, even if you don't communicate that, and you probably won't, actually, until you're certain. But also it just saves you another journey back to the incident commander and back again, because some customer is going to email you very quickly to say, "What's happening next?" And if you have some clue, you can begin to think about what responses to those questions that come in thick and fast look like, right?

Kat Gaines:

Yeah, yeah, and that gets to be a very kind of touchy and sensitive point in the middle of an incident, and so it does really help to gain that familiarity. Okay, for example, if there is an accidental, you know, bad deploy that caused this, it might be that simple. I don't know that incidents are often that straightforward anymore, but I think they can be. And so being able to say, okay, well, I've seen this happen before, this is likely what happens next. You're probably not going to communicate exactly that to the customer but, to your point, you're going to have better language to talk about it: well, all right, here is what we usually see in these situations, or here's what we generally expect to happen. You can potentially say, with some kind of confidence, that we will probably be able to resolve this, again, provided that you've gotten the sign-off on that messaging. Be very careful about promising things, but you can at least let folks know that, like, this probably isn't going to be an incident going on for days or weeks or something like that. Our team is working on it and they are actively working toward a resolution.

Charlotte Ward:

Implying that it's coming, you know, it's on the way. And one thing you can always promise is when the next communication will come, and I think that should definitely be part of the template, right?

Kat Gaines:

Exactly. You are always in control of that and you can always live up to that promise too. And we coach a lot about being able to give folks meaningful updates: even if you don't have something new, you can still give a meaningful update. A meaningful update does not equate to new information, but it can equate to: here's what we know; it hasn't changed a lot since our last update, but we are still working on this, and also here's when our next update is going to be. Even if that's all you can say, that makes someone feel so much better than complete radio silence or an update that just seems canned, like you're phoning it in, right? And I want to clarify that for folks listening: template does not mean full canned language. It means a guidebook to what type of things you want to say to folks.

Charlotte Ward:

Yeah, absolutely, and I think the same with, you know, any response on any ticket.

Kat Gaines:

If you're making overly heavy use of macros, people just disengage, because they're like, "Ah, I saw this response the last time I put in a feature request," right? And so it's like, "Ah, I don't think you're taking that seriously, because you're just sending me the same canned thing."
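The "Mad Libs" idea of a template as a guidebook rather than canned text, with the one promise you can always keep (when the next update will come) built in, might look like the Python sketch below. It uses the standard library's `string.Template`; the wording of `UPDATE_TEMPLATE`, the `draft_update` helper, and the 30-minute default are all assumptions made for illustration, not recommended phrasing.

```python
from datetime import datetime, timedelta, timezone
from string import Template

# The blanks are the guidebook; the liaison still writes each one fresh.
UPDATE_TEMPLATE = Template(
    "We are $status an issue affecting $impact. $detail "
    "Our next update will be at $next_update UTC."
)


def draft_update(status, impact, detail, now, minutes_until_next=30):
    """Fill in the blanks and always commit to a time for the next update."""
    next_update = now + timedelta(minutes=minutes_until_next)
    # substitute() raises KeyError if a blank is left unfilled,
    # so a half-finished draft cannot be produced by accident.
    return UPDATE_TEMPLATE.substitute(
        status=status,
        impact=impact,
        detail=detail,
        next_update=next_update.strftime("%H:%M"),
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(
    draft_update(
        status="still investigating",
        impact="logins for some customers",
        detail="Nothing has changed since our last update, and our team is actively working on it.",
        now=now,
    )
)
```

Even when `detail` has no new information, the draft still tells customers what is known and when they will hear more, which is the "meaningful update" Kat describes. The draft would still go to the incident commander for a fact check before it is published.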

Charlotte Ward:

Exactly, exactly. So we've talked a little bit about internal stakeholders and a bit there about, you know, managing external comms as well as internal comms through templates. Who writes them? Who writes these communications? Is it the customer liaison? Every time?

Kat Gaines:

Yeah, it is. It's usually going to be your customer liaison. At the end of the day, and again, this is something we talked about, not in the last episode, but in a previous episode, they're a person who is the authority in how to talk to customers, so you need to ensure that they are the ones writing those customer-facing updates. And that doesn't mean that no one else has input into them, right? It just means that they're going to write that first draft. They're going to, maybe in the call, once there's a moment to do so, just raise their hand and let the incident commander know: hey, I have an update that I want to publish externally. Can we fact check this and make sure that it is completely factual and that we're not leaving out any key information that folks might want to know at this point? And then, once they get that sign-off, they're going to post it. So it's a collaborative team effort, but your customer-facing team should own customer-facing communication. And this is not to say that we don't have faith in other teams to write customer-facing communication or anything like that. Absolutely not. I completely believe in those teams' ability to do so, but they're busy, they have work to do. You have to play a role as an incident commander, or you have to be a subject matter expert and go track down the issue with your part of the product. You cannot expect your brain to be able to handle both of those very different responsibilities at once, and so it's both about making sure you have enough folks to cover different responsibilities and giving that authority to the people whose day in, day out is talking to customers.

Kat Gaines:

Yeah, so it's about that customer perspective. They need to know what your understanding of the issue is, as the support rep or other person who's filling that liaison role, because you are going to be best equipped to speak both their language and the internal language around that, and to be able to translate that back out.

Charlotte Ward:

Yeah, that translation is a really big part of any support person's role, I think, but I think it's really, really important to reassure all of our engineering friends out there, and any other team that may be the primary responders in terms of resolution to these types of incidents, whether it's a security team or an engineering team or any number of subject matter experts that are going to get this incident over the line. It's not that we don't trust them to write plain and reasonable English. It's just that there are so many nuances to a well-crafted customer communication and, in the same way that I wouldn't expect me to write perfectly crafted code, I wouldn't lay the same responsibility on someone whose primary role it wasn't, because there are so many nuances around language.

Charlotte Ward:

Choosing your words, and commercial responsibilities as well, I think, is something that we don't talk enough about when we're talking about support responses or any customer communications. You know, finding the right way to say something can have a huge effect on the way your brand projects into your customer's business and the faith that that customer has in you as a business. And we all know what it's like to be internally focused. We've all had our moments, you know, whether they last for a couple of hours and you get a bit heads down. You forget about the customer. Even support people do that, right?

Kat Gaines:

We all do it, absolutely, absolutely. When you're knee-deep in troubleshooting logs and just so immersed in that, everything becomes what's happening in those logs, and eventually you have to pull yourself out of it and go, oh yeah, there was a person at the other end of this, right? I need to explain that to them.

Charlotte Ward:

Exactly, and the way I see it is that when you're not at the customer-facing end of the business, when your day-to-day, every day, is in, say, an engineering role, it's just a kind of longer walk, and so those are muscles that you don't practice, that you don't exercise regularly. And the nuances that I'm talking about are really the things that you pick up as a support person or as a customer success person or whatever, that mean that you understand the wider impact on your business from a customer perspective. And so you can think about, do I want to say this or do I want to say that? Because this is probably more factual or it's more technical. But there is a balance to be made here in terms of our brand promise or in terms of how we're perceived longer term rather than in the moment.

Charlotte Ward:

And I think that engineers, again, God love them. But it's easy, as you said, we've all been there, we've dived into a log file or into a piece of code or whatever, and it's easy to forget about the stuff that, after a while, becomes much more second nature to customer-facing teams.

Kat Gaines:

Yeah, it really is very easy to forget about it, and you hit on something that I want to emphasize for folks a little bit, around just choosing your words and choosing them wisely, right? So there are things that you'll want to do in preparation for that. We talked a little bit about them in the last episode and I want to just expand that out to say things like system maps, for example, can be very, very helpful to understand just, like, where are things in our infrastructure?

Kat Gaines:

What does that look like? And then, how am I going to translate that to the customer experience? You have to be able to draw those connections, and when you're communicating externally, when you're drafting that external communication, your terminology has to be basic for them. And when I say basic, I don't mean dumbed down. I mean choose simple over complex wherever you can.

Kat Gaines:

It's about knowing your audience. It's about knowing how they interact with your product or service and so, when in doubt, your default should be describing the experience they'll have when interacting with it. What are the symptoms they're going to see? How is it going to break their experience that this incident is happening? And you might already know these things, but you'll probably also want to just crowdsource from your peers. What reports are we getting, for example? What symptoms do we see customers experiencing that maybe our engineering team isn't aware of? Because, again, we're in the back end of what's happening here. It's not necessarily translating to the user interface, and so, if you're not sure, you can fall back on asking your peers. You might also ask the incident commander or someone acting as a subject matter expert, because they might know, obviously, if they're looking at part of something that's broken, that, yep, this is going to make that one thing in the product look really weird. But you want that to be the leading factor when you're thinking about how you're crafting that external communication. What is the customer experience? Because, at the end of the day, that's what we're all looking for from these communications.

Kat Gaines:

What should I expect? That is the only question I'm usually asking when looking at these. What should I expect, whether that's in terms of how it's going to break my experience, when it's going to be resolved, or what they're going to do about it in the future? That gets a little bit into incident review and post-mortem territory, which we have another episode scheduled for, but expectation management is the key point of it. It's why we communicate, so that we don't have customers just sitting here going, I don't know what's going on and clearly they don't either, so now I don't trust in this business.

Charlotte Ward:

Very true. Or even, you know, customers not making the connection. If you put out an incident report on a status page or something that says, hey, this component is generating this error under these circumstances, and I'm sat here as a customer and that bit of the screen is missing or something, how do I know? Is that what I'm experiencing? Or should I just contact support about this thing anyway? And so, as you said, leading with, you know, customers may notice this and this is because of this, and then everything else you said about the well-crafted incident communication around what we know and what's happening next.

Kat Gaines:

You make a really good point there, that sometimes I think we do this thing where we over-correct a little bit and we put out vague communication in the interest of maybe not wanting to raise alarm where we're not sure yet, or protecting proprietary information, and obviously those things are of concern. You do have to be careful around them. But we sometimes over-correct to too vague, so that we eventually put out a communication that is in an unusable state.

Kat Gaines:

It's not something our customers can derive any value from, and so you really do have to think about, okay, again, what is the customer experience? What symptoms are they seeing? How can I keep terminology and those types of things as clear and defined as possible when I'm crafting this communication? So, for example, if we're just saying, you know, vaguely "infrastructure", but what we really mean is a specific server or set of servers that are having issues, even though that's a relatively small jump, you should still say the latter because it is highly specific compared to just generally "infrastructure". Or, you know, even worse, "our product", those types of things that don't really tell you much.

Kat Gaines:

But if, you know, to go back to the classic, oh, a server is on fire, then at least they know that there's a server on fire. They might not know which one or what part of the product it affects, but they at least know that, okay, there is a clear and specific issue that we're gearing toward here that we can let folks know about, so that they know that we're not just kind of flailing around figuring out what to do. Again, to instill their trust in us: we do have an idea of what's going on and we do understand how it breaks their experience.

Charlotte Ward:

And actually it's a really interesting point about the unpredictability of it sometimes. I think one of the things that you can communicate with confidence is predictable versus unpredictable. You know, this was a bad deploy. It's going to take us this long to roll it back. You should see your experience restored, whatever that is, by x time. Or we will communicate when we have done. You can communicate the known. You can also say there are some unknowns: there's a server on fire somewhere in a data center far, far away and we are waiting on, you know... I know, if I'm a customer reading that, that I'm going to have to wait for a bit more meaningful information. And again, you can keep your promises in terms of, we will give you an update in two hours or whatever that is. But I know that there's an unpredictable nature to this thing that's happening, so maybe I'll just, as a customer, hold off pointing out the thing that I'm not sure is related or not, right?

Kat Gaines:

Yeah, you humanize the business that's having the incident, you humanize the team who is working on the incident when you give them the ability to understand that, and so you don't have to go over the top and say, please be patient with us, please have empathy, we're human beings. But you just have the vulnerability of telling them those unknowns, within a safe boundary to do so.

Kat Gaines:

And that is a cue for them to check in and say, man, they really haven't quite nailed down what's going on yet, but they're trying really hard. Oh yeah, there was a human being at the other end of this and they're struggling right now. Okay, cool, I'll have a little bit of patience. Ideally, that's the narrative in their head. Sometimes we lose our ability to have that level of empathy toward other people and so we might not go through that full process, but you at least give them a cue that can prompt that process for them, and you make everything easier on them and on yourself by doing so.

Charlotte Ward:

Absolutely, absolutely, and that's what the incident process is here for, is to make everything easier on us, right? Yeah, get that incident.

Kat Gaines:

At the end of it all, just get to better and get to calm a little bit quicker.

Charlotte Ward:

Yeah, yeah, yeah, so thank you so much for another awesome episode. I feel like this was super practical. I'm hoping everyone's taken like four pages of notes from this and is going to go and implement that automation, that channel, that template. You know, really, really applicable. Is that the word, applicable? Applicable advice. This is right to go and do today. You can go and take this episode and run with it. What are we talking about next episode, though?

Kat Gaines:

We'll talk about what happens after an incident, when we are back in that state of, well, you know, depending, but calm. We'll talk about what happens after the fact. We'll talk about reviews and we'll talk a little bit about how our support teams play into that process, because you're, spoiler alert, not done when everything's back to a green light on your dashboards and so on.

Charlotte Ward:

And I am at least happy to say that all incidents come to an end.

Kat Gaines:

They do eventually yeah.

Charlotte Ward:

It may not feel like it in the moment, but you will get to the review process at some point, so it's absolutely worth covering. I look forward to that. Thank you so much, Kat. Great to continue our conversation. I'm looking forward to the next one. That's it for today. Go to customersupportleaders.com/276 for the show notes and I'll see you next time.
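
For listeners who want to turn the update structure discussed in this episode into a template, here is one possible sketch in Python. It is not something Kat or Charlotte describe using: the `StatusUpdate` class, its field names, and the `render` function are all hypothetical, chosen only to illustrate the order talked about above (customer-visible symptoms first, then the specific known cause, then the unknowns, then a promised time for the next update).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class StatusUpdate:
    """Hypothetical shape for one customer-facing incident update."""
    symptoms: List[str]                 # what customers will actually see
    cause: Optional[str] = None         # specific known cause, if there is one
    unknowns: List[str] = field(default_factory=list)  # things still being investigated
    next_update_in: timedelta = timedelta(hours=2)      # the promise to keep


def render(update: StatusUpdate, now: datetime) -> str:
    """Build the message text, leading with the customer experience."""
    lines = ["You may notice: " + "; ".join(update.symptoms) + "."]
    if update.cause:
        lines.append("What we know: " + update.cause + ".")
    if update.unknowns:
        lines.append("What we don't know yet: " + "; ".join(update.unknowns) + ".")
    next_at = now + update.next_update_in
    lines.append("Next update by " + next_at.strftime("%H:%M UTC") + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    example = StatusUpdate(
        symptoms=["the reports page fails to load"],
        cause="a bad deploy that we are rolling back",
    )
    print(render(example, datetime(2024, 1, 1, 10, 0)))
```

Keeping the unknowns as a separate, optional field is a deliberate choice in this sketch: it makes it easy to state them when they exist without padding the message when they do not.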