CX Today

Your Contact Center Could Be One Outage Away from Disaster – And You Probably Don't Know It

May 06, 2026 • CXToday.com

0:00 | 55:15

Ty Givens, Rhona Bradshaw, and Mike Wehrs join CX Today's Rhys Fisher to examine why most contact centers are less resilient than their uptime metrics suggest – and why agentic AI is about to make that a lot harder to paper over.

For years, five nines has been the number that kept boardrooms comfortable. It looked clean on a performance dashboard and gave technology leaders a ready answer when executives asked whether the contact center could be trusted.

The problem, according to CX Consultant Rhona Bradshaw, is that it was never telling the whole story.

“It acts more like a smoke screen with regards to what's actually going on,” Bradshaw says in this CX Today Roundtable.

The real damage, she argues, isn't the full outages that trigger incident reports; it's the persistent, low-grade degradation that chips away at customer trust without ever tripping a status page.

Founder and CEO of the CX Collective, Ty Givens, cuts straight to what resilience actually means on the ground.

“There are two questions. Can the customers get help? And can the agents do their job? Those are the two things that actually make an organization resilient.”

Everything else, she argues, is a distraction.

The shared dependency problem makes it worse. Givens describes organizations rolling out new platforms only to discover their redundancy was illusory all along, with multiple vendors sitting on the same underlying infrastructure, going down together the moment anything is tested.

Mike Wehrs, COO of TieTechnology, brings the sharpest focus to where AI fits into all of this:

“You can't fix stuff by throwing advanced solutions on top of sub-adequate infrastructure and sub-adequate data.”

All three guests make the same point from different angles. AI doesn't necessarily introduce resilience problems into the contact center, but it does have the capacity to find the ones that were quietly there all along.

SPEAKER_01 0:08

Hello and welcome to CX Today. I'm Rhys Fisher, Associate Editor, and today I'm delighted to be joined by a number of people. I'm here with Ty Givens, the founder and CEO of the CX Collective, Rhona Bradshaw, a CX consultant, and Mike Wehrs, COO of TieTechnology. Thanks so much for joining me, guys. How are you all doing? Very good.

SPEAKER_02 0:30

Thank you for having us. Yeah, good morning. Happy to be here.

SPEAKER_01 0:35

Excellent. Yeah, I'm really looking forward to the chat today. We're going to be talking about CX infrastructure, which I think is one of those topics that often lives a little bit in the background. We've had this promise of cloud, CCaaS, and AI-powered contact centers working seamlessly together at scale. But I would say that across the industry we're seeing that the reality is often a little more fragile than the dream suggests. So today we're going to be exploring whether most organizations are generally one bad dependency away from a CX crisis, and what it actually takes to build something resilient enough to withstand those stakes. I wanted to start by setting the scene a little bit and speak to you about this, Rhona, because I think resilience means very different things depending on where you sit. You've led digital transformation and complex consumer operations for a number of companies. When you think about CX infrastructure resilience at scale, at the level that you've worked with, what does resilient actually mean in practice? And has that definition shifted to something more nuanced recently?

SPEAKER_00 1:54

Yeah, I think it's a great question. And I think it has, in my opinion, 100% shifted completely. I think for a long time companies used five nines uptime as a metric that kept everybody happy. It looked good on performance indicators, and it was a good indicator of the overall health of the network or the infrastructure that was essentially enabling the CX or the services for customers. I think the challenge that it creates is it acts more like a smokescreen with regards to what's actually going on. And I think what customers complain about, and what causes the greatest pain and lack of tolerance, is intermittent outages occurring on a regular basis rather than the big-scale challenges. You know, if people are intermittently losing their Wi-Fi or are intermittently unable to pay their bills, then you're not seeing the patterns with those kinds of legacy definitions. And I think as we start to think about going forward, especially now with AI involved, and the fact that customers' tolerance levels are so much lower than they have been in the past, it's the patterns that come through, it's the intermittency of what's occurring. And quite frankly, it's looking to understand where they are finding the pain points, specific maybe to the interactions they're having, that don't necessarily always fall under the traditional definition of CX resiliency or network resiliency at the end of the day.
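As a back-of-the-envelope illustration of the point above, the sketch below shows how a five-nines availability figure and a customer-level success rate can tell different stories. The function names and the sample interaction counts are hypothetical and are not taken from the discussion; only the five-nines figure (99.999%) is.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability: float) -> float:
    """Minutes of full outage per year that an availability target permits."""
    return MINUTES_PER_YEAR * (1 - availability)


def interaction_success_rate(attempts: int, failures: int) -> float:
    """Share of customer interactions that completed, regardless of uptime."""
    return (attempts - failures) / attempts


# Five nines allows only a few minutes of total outage per year...
five_nines_budget = allowed_downtime_minutes(0.99999)

# ...but intermittent failures (hypothetical counts) never register as "downtime".
# The platform stays "up" while 3% of bill payments quietly fail.
payment_success = interaction_success_rate(attempts=10_000, failures=300)

print(f"Five-nines downtime budget: {five_nines_budget:.2f} minutes/year")
print(f"Payment success rate: {payment_success:.1%}")
```

Tracking both numbers side by side is one way to surface the intermittent patterns that a pure uptime metric hides.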

SPEAKER_01 3:39

Yeah, that makes a lot of sense. Just as a follow-up, Rhona, what do you think is causing such a large degree of the intermittency that you talk about?

SPEAKER_00 3:49

I think legacy infrastructure is a significant part of what certainly every large-scale enterprise is dealing with. I think for a long time, whether through mergers and acquisitions or integrations over time, with the technology that all businesses are working from, we have made choices as to where we choose to invest our dollars. And I think ultimately, with those legacy piece parts, whether it's from a middleware perspective, whether it's at a BSS layer, or whether it's just from a front-end point of view, the whole infrastructure has to be reviewed under the concept of resiliency. Because actually, in today's day and age, with cloud and APIs and the way in which things get integrated, it's the smallest connector that can create the biggest problem. And I think over the course of years, the financials haven't always stacked up to rebuilding billing systems or changing out content management systems. And everybody's trying to get as much of the ROI as they can. But the reality is that technology is moving so fast, customers' opinions are moving so fast, and how the business wants to use all of these services has changed so much that actually, without a full large-scale review of the architecture by which everything is underpinned, I don't think you can truly determine that you have a resilient system. And I don't think you can ultimately solve the customer's problem because you can't effectively diagnose the issues in the first hand.

SPEAKER_01 5:25

Mike, I wanted to jump to you next. Obviously, you come at things with a different kind of experience. You've worked with telephony integration, with CRM connectivity, with UC infrastructure. How do you think the conversation around resilience has changed in recent years? Are organizations thinking about failure modes differently, or are they still, do you think, largely reactive?

SPEAKER_02 5:48

Yeah, good question. I'm gonna come at this from a different lens, although I agree wholeheartedly with everything that the prior speaker said. The lens that I look at this through is data, right? I'm looking at it and saying, up until the point where you had an API economy, and up until the time where people assumed that because I have something called artificial intelligence, I can just throw it at my data and everything's gonna be wonderful. And it's gonna fix all of the sins of the past, which were: look, this is hidden in a database. Nobody ever looks at this database. I don't necessarily need to even look at the validity of this data. I have problems with it, but it never surfaces because we don't have any systems that touch that piece of data. So it's not only the infrastructure that needs to be looked at and assessed and figured out as part of the plan, but you need to understand that when you start exposing databases to an engine that is inquisitive, it's going to pull all of that old nasty data that is definitely not at a quality level that you would want to expose to the outside world. And not only is it going to use it, it's going to aggregate it in ways to come to conclusions that are completely wrong. And it's not AI's fault. It has to do with the fact that you didn't do any of the prep work that was necessary in order to let AI actually do its job. So what you're doing is you're putting it in a situation where you're saying, I'm not giving you any guidance, just figure it out and present the information as you conclude it. And I think that that's just a maturation of the user base. You have to understand it's like every other technology curve. You have this disillusionment: people think it's going to be magic, and it isn't. So I look at this and say, the thing that needs to be done is a data assessment. You need to look at what data you actually have. 
You need to go through it and understand the quality of it and fitness for purpose. You can't just expose all your databases because there's an easy connector that says connect this database to this and let it loose. You will get bad results. And I'd say that's really where we're at right now: we're constantly brought in with, "Could you just deploy your AI solution on top of it?" And it's like, no, we have to clean up the data or you're gonna have a really bad result. So, to give you a meta on that one, just to sum it up: people have not had to deal with this class of problem before. The data, and let's just call it derivatives of that data, are used for business conclusions. And they need to start worrying about that. It's a change to their process.

SPEAKER_01 8:14

Yeah, thanks, Mike. I think that data angle is very interesting. Sticking with this idea of the failure modes, I want to go back to you, Rhona. Obviously, I think there's a version of this problem that organizations understand and have planned for, and maybe a version that tends to blindside them sometimes. In your experience, which failure modes do CX leaders perhaps underestimate quite often?

SPEAKER_00 8:40

I actually think they underestimate the things that they've planned for, quite frankly. And a little bit to Mike's point as well, as we think about AI, I think for a long time, especially with the digital evolution that occurred a number of years ago, it was easy to put a sticky plaster or an appendage onto something and to nearly not worry about how clean the data was, how it was stored, what the infrastructure was. And actually, I think now it's far more important to really understand the plumbing and what's actually going on below the surface. It is ultimately the iceberg. But I think it's easy for companies and for executives and for teams to respond to the things that they haven't seen coming, because everybody rallies behind it, you're immediately on calls, everybody's trying to resolve the problem, there's a massive galvanization. I think where we underestimate the conversation is in the things that we have planned for, because I think, one, we don't watch and evaluate and determine if what we have planned for is still fit for purpose in terms of how we're going to solve the problem. And two, when we have determined what we've planned for, I don't think we have recognized how resiliency from a customer perspective may have changed, and that the tolerance level that I talked about at the beginning may have also evolved. And so I think policies, processes, preemptive, proactive communications, all of which can be a playbook that's at the ready should something go wrong or should that KPI go below a certain kind of threshold: I think the reality is that you're not constantly asking yourself, how has this changed? Will this be responded to in the same way? Have my new systems been integrated in order to continue to do what I have assumed I'm going to be able to do? I think that's what we underestimate. 
I think we kind of feel as organizations that once we have a playbook on the shelf, that playbook can live and breathe as things evolve. And I think the reality is that so much is changing for businesses these days that we have to continuously go back and review and determine if what we have assumed we're going to respond with, or how we're going to solve the problem, is still fit for purpose. And without that, I think we end up in a world of pain.

SPEAKER_01 11:09

Thanks, Rhona. I think perhaps leading on from this idea of underestimating, I want to come to you next, Ty. You've got experience of consulting for organizations of different shapes and sizes. Is resilience even something that is on the agenda for most organizations currently, or does it perhaps only become a priority after something has already gone wrong?

SPEAKER_03 11:32

You know, I think at the end of the day, what we have to do is define or redefine what resilience really means. So there are two questions. The first one is, can the customers get help? And then the second is, can the agents do their job? Those are the two things that actually make an organization resilient. To piggyback on what Rhona was saying, you're looking at five nines and status pages and defining that as success. But at the end of the day, when you pull the lid back, if we are in a space where people can't perform their functions or customers can't get the help that they need, then we're in a lot of hurt. Mike, you might be able to relate to this, but a company that I know of had two phone systems, one for support and then one for the rest of the company, which happens more often than we care to see. And there was an issue with the main company phone line where it went down. So they forwarded all calls to support. And I think in that moment it felt like a good idea. But at the end of the day, what happened was confusion with the agents and customers who couldn't get help. And the cost of the repeat callers from customers who needed support was pretty significant. I feel that when it comes to resilience, what we're looking at is a bigger issue around ownership. And again, touching back on what Rhona said, we're looking at an intersection between not just support, but support and IT and any vendors that are out there that we're working with. And I don't know if anyone is actually truly accountable for that. I think we all still kind of work in our silos. So unfortunately, I think at the end of the day, what we are seeing is that resilience is an afterthought, even though it should be something that we put on the table up front and start to think through.

SPEAKER_02 13:25

And if I could just hop in on that riff for a second. I'll give you one anecdote that shows that it's not only about the customer resilience, but it's about the business resilience, where you will make horrendously bad decisions if you don't look at the data with a suspicious eye, if you've never looked at the data that way before. And the case in point: we had basically a 20-person call center application. They had switched phone providers, and we started to show them some information that they had never seen before. And what they did is, after the first month, they said, "We have to basically fire the whole sales department because they're not able to close. We've got 800 and something phone calls coming in per day, and we have a close rate of two people, and that's just wrong. They don't know what they're doing." And we said, time out, right? And we dug into the data, and what we found out is 783 of those phone calls were under 30 seconds. It was spam. They did not count valid calls, they were just looking at total calls, because they had never seen how many calls their people were answering a day that were just spam. And it was like, don't get rid of the salespeople, get rid of the marketing people, right? You don't have qualified leads coming in the door. It's the wrong kind of problem that you're solving. So that's a really crystal-clear example where if you're not on top of it and you don't dig into it, you will make absolutely the wrong business decisions. And I think that's the phase that's coming up. Companies are gonna be discovering more and more. If they rely on it blindly and say that's the smart thing, they're gonna make mistakes. As they put new systems in that expose the data in different ways, they have to be aware that all of that is suspect because it's bringing in data that's never been processed before.
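The distinction drawn in this anecdote, total calls versus valid calls, can be sketched in a few lines. This is a minimal illustration, not the tooling described in the conversation: the `Call` record, the function name, and the sample data are all hypothetical, and only the 30-second cutoff comes from the anecdote.

```python
from dataclasses import dataclass


@dataclass
class Call:
    duration_seconds: int
    closed_sale: bool


def close_rates(calls, min_valid_seconds=30):
    """Return (raw_rate, qualified_rate).

    raw_rate divides closed sales by every inbound call;
    qualified_rate divides by calls long enough to be a real conversation.
    """
    closed = sum(1 for c in calls if c.closed_sale)
    valid = [c for c in calls if c.duration_seconds >= min_valid_seconds]
    raw_rate = closed / len(calls) if calls else 0.0
    qualified_rate = closed / len(valid) if valid else 0.0
    return raw_rate, qualified_rate


# Hypothetical day: 8 spam-length calls, 2 real conversations, 1 sale.
sample = [Call(5, False)] * 8 + [Call(240, True), Call(180, False)]
raw, qualified = close_rates(sample)
print(f"raw close rate: {raw:.0%}, qualified close rate: {qualified:.0%}")
```

Judged on the raw figure the sales team looks ineffective; judged on qualified calls the problem moves upstream to lead quality, which is the conclusion reached in the anecdote.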

SPEAKER_03 15:13

I really love that example. Rhona brought up a good point as well, that we set a playbook and forget it. And I feel like in support, there are so many opportunities where we build something and we just kind of put it on the shelf and we keep going. And life is going on, and those things need to be iterated upon. But as support leaders, we have to be in the past, present, future, and we have to work across people, process, and technology. And the question comes up: when is there time to actually sit back and think about what do I need to update now? But the reality is that we have to guard that time. People have to guard that time.

SPEAKER_02 15:48

And if I could, I would also offer one other data point on where data gets screwed up in these systems, and it becomes an ongoing problem because it's a business process that has to change. The example I'll use is, there's an internal extension, and then there's the number that people can call when they go outside. Well, if that extension number is not exposed in the user interface that someone's using, because they were just using an old system, we have found that what they do is they call the company's own 800 number to connect between offices. Could you imagine the cost of going from zero to calling an 800 number, measured in minutes that you're paying for, to call office to office? And we have found a trend where more than three-quarters of this company's 800-number calls are inter-office calls. And it's because no one is looking at it the right way. All right, so that would be one example. And then, when we dove into why they weren't dialing the extension, they said, well, no one sees it in the user interface. So we made it a mandatory field. So someone literally just hits the number pad to put random numbers in that field because it clears the gate. So not only is it broken because they didn't know about it, but it's now broken when we turned it on and it's exposed, because the data is wrong. So you can imagine cleaning this up for a company. There's lots and lots of stuff like that. It's like every rock you pick up, you're like, oh no, not that. And that's a lot of where these errors are coming into it. We're just using so much more of the information, and in ways that were never considered. And it's leading to a lot of what people are calling hallucinations and that kind of stuff. But it's just broken data and broken processes, because people have not had to look at this before. They never had the data available to them.
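The mandatory-field problem described here is that a required field only checks that something was typed, not that it is correct. A minimal sketch of the alternative, checking entries against a directory of known extensions, might look like the following. The function name, the reason strings, and the sample directory are hypothetical and do not describe any particular phone system.

```python
def validate_extension(value: str, known_extensions: set) -> tuple:
    """Return (is_valid, reason) for an extension typed into a form field.

    A required-field check alone would accept any keystrokes; this also
    checks the entry against a directory of extensions that actually exist.
    """
    cleaned = value.strip()
    if not cleaned:
        return False, "missing"
    if not cleaned.isdigit():
        return False, "not numeric"
    if cleaned not in known_extensions:
        return False, "unknown extension"
    return True, "ok"


# Hypothetical directory for illustration only.
directory = {"4021", "4022", "5110"}

print(validate_extension("4021", directory))   # a real extension passes
print(validate_extension("83746", directory))  # random keypad input is rejected
```

Rejecting unknown values at entry time is one option; flagging them for later cleanup is another, and which fits better depends on how much friction the form can tolerate.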

SPEAKER_01 17:38

Yeah, thanks, guys. Some really, really interesting examples there. I think it actually leads nicely to where I want to look next, in terms of the actual impact of not having a solid resilience strategy in place. And Ty, I want to start with you here. When a CX operation goes down or degrades, not necessarily a full outage, what does that actually look like on the ground for agents and supervisors? And is the business usually measuring, do you think, the true cost of that, or maybe just the surface-level cost?

SPEAKER_03 18:16

So it can look like chaos or silence. It really depends on what the issue is. On the silent side, I'd use an example: a payment processor was unable to process payments, but all other systems were functional. That's their key function, but all other systems are functional. So you can imagine initially there's chaos, there's an influx of contacts, and that's how you find out that there's a problem. But then there's a layer of silence that comes once you're able to start deflecting those contacts: "We know there's an issue, we're working on it." Effectively, your status page of today, even though this was many years ago. And so what happens is that in that moment on the floor, there's pretty much calm because there's no real worry at that moment. We know that there's an issue, we know that the issue is beyond our control. We know what to tell customers. Now, the customer perspective is complete frustration and anger. But on the inside, it's literally the calm before the storm. Now, on the other side, it's what happens right when it leads up to that point, which is the chaos of: we're getting reports, it starts off with one, then it's many, then there's this snowball, then there's kind of, what do we do? Effectively, some organizations are going to pull that contingency plan playbook to find that it's out of date, and that the way that we planned to handle this doesn't even work anymore. And so you start to see that sort of chaos that happens. At the end of the day, once you get through the issue and you diagnose it, I do find that some mature companies will do a reflection and will calculate the cost of that issue because they want to get the dollars in place to prevent it in future. I see in smaller companies that they kind of take it as an "oh darn," a lesson learned, but they may not necessarily spend the time and energy and funds on proactively getting around that issue in the future, unfortunately. 
So I think that with the leadership, when we're on the floor, for people who are on the floor and who are actually handling it, as long as the information is flowing and a next step is defined, it's okay. I think what happens is when we don't know what to say to the customer, that's when you start to feel the chaos. But once the system comes back up and everybody's back in business, that's when you really start to feel the craziness of everything that's happened. There is a little point in there too, where we're talking about contingency plans. What are we gonna offer? What are we gonna do to get around it? But I think that everything truly comes into focus right at that moment when we're back online. That's when it gets crazy.

SPEAKER_01 21:02

Yeah, I really like that chaos or silence idea because it makes total sense. And I guess we've covered there maybe some of the reactions. I wanted to come back to you, Mike, to maybe look at what sets things off. Where in the stack do failures most commonly originate in your experience? And are organizations generally aware of those weak points before something breaks?

SPEAKER_02 21:27

I'd say we fall into two categories on how that comes at us. Number one, the fastest way that we know when there's something wrong is when we start getting calls from people's cell phone numbers saying the phone is down, and their phones have stopped ringing. That's the fastest way to get someone to call in and say there's a problem. And then it's diagnosing where the problem is. And that is a complication that is generally beyond the way a lot of people are prepared to deal with it. And I'll give you some concrete data points on this, not just a sweeping comment. Generally, the person who's responsible for the phones is not the IT person. Generally, it's the office manager who gets the "it's not really an IT thing, you deal with it," right? And they give it to somebody who's gonna do the best job possible, but they certainly are not a phone expert, right? So what happens is they end up with a lot of these lightweight IT things, too, because they're the ones who let people in to do work on IT. And more than half of the time we get a phone outage, it's that the IT vendor came in and changed a setting or upgraded firmware on their router, right? And it blocked all traffic, it changed all the priority rules and all of that. And I've put signs up on the walls: "Don't touch. If you make any changes to the routers, call." None of that works, right? So I'd say that, hands down, it's people making internal mistakes because they're not trained to understand that this is a significant part of their IT infrastructure and it's not to be messed with without thinking about it, because it is the lifeblood of the company, right? If the phones stop ringing and you can't make calls out, what happens? It's kind of like the whole internet going down for you, except it's worse. There's a customer who's taking the time to actually call you and you can't talk to them because you did something dumb. 
And that's really it: more than half of the problem falls into that. It's an unqualified person who went in to touch the IT and didn't think that it had anything to do with any of the related systems. And it takes it down. The other class of problem is the reps don't necessarily enter the data the right way. So we see a lot of, "I thought we were supposed to have these data fields captured." And it's like, well, you are, but the way you set your system up operationally didn't allow anybody to have five seconds before you push the next call to them. So everything that they discussed on that phone call is now lost. Now, we can solve that problem for you, but there's an incremental cost to do that, right? It's call recording and transcription, and we can put all of that data in, and that's the best solution. But you'd be surprised how many people will say, no, I'll hire another person to talk on the phone and not capture any data before I'll worry about capturing the data and making it useful to all the downstream systems, and helping out my customer support numbers and my sales numbers. So it really is one of those: because it's been an ignored factor for so long, you know, "it's just phones." We don't even think about that. It's like, "well, I'd like to eliminate them," right? A lot of companies don't even give them to you anymore. It's like, you've got to talk on your own cell phone. So it's been this really ignored but most pervasive network in your IT shop. And I think that's the other half of it: people still treat it as "if I have dial tone, it's good enough," when it's a hell of a lot more than that now. So that would be the other reason why it goes down: people are just not aware of what they've done, and it goes down like that. 
It's usually not a system crash overall, because we've got a lot of redundancy on that. It's usually a customer that goes down; it's not usually the whole system that goes down.

SPEAKER_01 25:21

Yeah, thanks for that, Mike. Some more really, really great examples in there outlining some of those problems. I wanted to stick with that theme and come back to you, Ty. I think maybe a problem that gets overlooked a little bit in this area is this idea of shared dependency, where multiple vendors in a stack are all relying on the same underlying cloud provider. So in that scenario, redundancy can turn out to be something of an illusion, I guess. How often are the organizations you work with genuinely auditing this kind of risk?

SPEAKER_03 25:58

Not very often, in all transparency. The reality is that a lot of times when new tools are rolled out, similar to the example that Mike gave around the office manager handling the system administration for phones, a lot of that happens on the support side as well. So the person who's actually responsible for administering the tools and the accesses and the configuration has no experience doing that, has no background in it, and doesn't understand the technical aspect. And the reality is that good marketing makes people really excited about rolling out new initiatives and new tools and new products. So they get really excited because they're told that this system is going to solve their issue. But nowhere down that list of requirements is the question: what are you relying on? What do you sit on top of? Because that's not something that even resonates, or it's not even something that people think about. You sort of find out that your systems are more connected than you thought when you can't access multiple systems at the same time and you realize, oh, wait a second, there is actually no redundancy here. If one goes down, they all go down, and now I'm in a world of hurt. And I wish that this was something that was more top of mind, but I feel that within any organization, the IT function or department has its own set of responsibilities and things that it needs to be in charge of. If you have a telephony or telecom area or function (I don't know if those are even still in place anymore, the way that things have changed so much), they're responsible for their piece of the pie. Support is responsible for its piece of the pie. And a lot of times, in that support function or entity, the company wants support to be self-managed and self-maintained. So who do they go to when they need help?
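The question raised here, "what do you sit on top of?", lends itself to a very simple audit once each vendor's underlying dependencies have been written down. The sketch below assumes that inventory already exists; the vendor names, dependency labels, and function name are all hypothetical and are not drawn from the discussion.

```python
from collections import defaultdict


def shared_dependencies(vendor_dependencies: dict) -> dict:
    """Return each underlying dependency that two or more vendors rely on.

    Input maps vendor name -> set of underlying services (cloud region,
    carrier, identity provider, ...). Output maps each shared dependency to
    the sorted list of vendors that would fail together if it went down.
    """
    vendors_by_dependency = defaultdict(set)
    for vendor, dependencies in vendor_dependencies.items():
        for dependency in dependencies:
            vendors_by_dependency[dependency].add(vendor)
    return {
        dependency: sorted(vendors)
        for dependency, vendors in vendors_by_dependency.items()
        if len(vendors) > 1
    }


# Hypothetical stack: two "independent" tools share one cloud region.
stack = {
    "telephony": {"cloud-a/region-1", "sms-gateway"},
    "ticketing": {"cloud-a/region-1"},
    "chat": {"cloud-b/region-2"},
}
print(shared_dependencies(stack))
```

The hard part in practice is gathering the inventory from vendors, not the comparison itself, which is why the question belongs on the requirements list before a tool is rolled out.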

SPEAKER_01 27:59

Yeah, thanks, Ty. Again, you mentioned that issue of if one part goes down, they all go down. And it was giving me flashbacks of putting on my Christmas lights and one bulb goes and then they all go out, and you've got to pull it all off the tree and it's carnage. Mike, I wanted to come back to you to round up this topic, I guess. What does graceful degradation actually look like at the integration layer? Is it achievable, do you think, for mid-market operators, or is that a capability that only some of these bigger enterprises can realistically build and implement?

SPEAKER_02 28:39

I think it's exactly the opposite. I think if you design it properly, it's designed in right from the beginning. So small companies, people who are starting now with the world the way it is, are entering a world that already expects that and has to design for things that are going to fall down and deal with it gracefully. If they don't, they're living in a cave, right? If they're not dealing with this now when they're building new, this is the time to build it. After the fact, unless you're in a cash cow mode where you don't want to invest anything in that particular technology anymore, what you need to do is stratify it into layers, right? There is no silver bullet that solves that graceful degradation problem. There are things that you can bolt on, though, to start to communicate awareness that something is going on. So one of the first things that we bolt on is network monitoring. We're looking at just simple stuff. What's the actual, real throughput that's happening at the person's desk? We'll monitor that and get a status of it when we think that there might be something coming up. And over time, are we seeing it trending down? Are we seeing anything at different hours, peaking, things like that? So we'll look at traffic. We'll look at your ISP and see what kind of ping responses they're getting. We'll check your own router and see if somebody did anything stupid and put in another router or turned off prioritized traffic. QoS stuff is always the first thing that gets messed up when someone does a router reset, because they reset all those settings that gave prioritized traffic to voice. And you kind of need that, right? No one wants to talk over broken voice.
So I'd say that those kinds of things are messaged to the user. We'll message to the user that right now your network is the one that's got transient problems in it. So it's not somebody on the call; we're telling you we're having trouble with you. And we're doing it because we're sampling the audio for purposes of recording it along the way. So we can surface that kind of information to the user. Once the user has it, then it's got to go to IT, and they at least can then track it, because you've got to log it. I'm not going to get any more technical than this on this call, but if you instrument your code, it will be logging things like when it's running into stress conditions. It'll be looking at things like how much memory it's using in the pool. And when it's up over 80%, it should be doing something about that. And if it's not, we can then help the company who's using that tool to say, hey, look, these guys are not handling stuff right, or you need to increase the memory on your machines. So it has nothing to do with a phone, but just because of the use pattern, it highlights these things. So it does have this element of you've got to design for it up front, and it has this other element of you've got to message it, and that requires messaging not only to the IT guy, but to the end user to let them know that there's a problem that's starting to crop up, and then they can escalate internally. So it's classic problem solving. Make them aware of it as early as you can and tell them what they can do about it to prevent it from getting worse. That's kind of front and center of what we do in the UI. And then last of all, when we lose connection, I almost want to dare somebody to tell me they were unaware visually when we lost connection, right?
Because we're doing everything short of setting off alarms on that UI, letting you know that you are not going to receive any phone calls right now. And if your business is receiving phone calls, you want to do something right now. So I'd say that's it, but pick a reputable software vendor. Phone vendors are generally not the best software guys in the world. I mean, they wrote software a long, long, long, long time ago. So if you're looking for phone systems, pick someone who knows what they're doing within the infrastructure you're going to try and put it into, and make sure they're doing APIs, make sure they're doing priorities, make sure they're doing QoS. It's kind of all the things that you would look at, but it's not in the phone equipment supplier vendor qualification list, usually, right? They usually leave that stuff out. But now with everything running in the IT infrastructure, all of that stuff is things you still need to consider. So that would be my advice to people: just pick good vendors and make sure that you're asking just a couple of questions about failover kinds of things. And the last data point I'd give you is, going to Ty's point, if you have a single point of failure being your ISP, and every data and voice communication and metric and instrumentation thing that you've got is going out over that one pipe, you are not a good IT manager, right? You have to have some kind of failover for the backhaul. And most of the time, that's the thing that the IT guys cave on, because it's like, I'm not spending X number of hundred a month for a maybe. And it's like, if it goes down once, just once, think about what the loss to your company is going to be. But I would still tell you greater than 90% of our customers refuse the failover connection. And it's not just voice; everything goes down that day.
But that's kind of the biggest thing I'd put out there: it's the cheapest insurance you'll ever get to keep your system up and running. Find those single points of failure, and that's where you address your redundancy first.
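
The early-warning idea Mike describes (watch throughput at the desk for a downward trend, flag memory use above 80%, and tell the user in plain language) could be sketched roughly as follows. This is a minimal illustration, not the vendor's actual implementation: the `Sample` type, the `degradation_warnings` function, and the `drop_ratio` threshold are hypothetical names and values chosen for the example. Only the 80% memory figure comes from the conversation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One monitoring reading (hypothetical structure for this sketch)."""
    throughput_mbps: float   # throughput measured at the agent's desk
    memory_used_pct: float   # share of the application's memory pool in use


def degradation_warnings(samples, memory_limit_pct=80.0, drop_ratio=0.8):
    """Return plain-language warnings for early signs of degradation.

    samples: list of Sample, oldest first.
    memory_limit_pct: the 80% figure mentioned in the discussion.
    drop_ratio: assumed threshold; warn when recent throughput falls
    below this fraction of the earlier baseline.
    """
    warnings = []
    if not samples:
        return warnings

    # Memory check: look only at the most recent reading.
    latest = samples[-1]
    if latest.memory_used_pct > memory_limit_pct:
        warnings.append(
            f"Memory pool at {latest.memory_used_pct:.0f}%: escalate to IT."
        )

    # Trend check: compare the newer half of the window with the older half.
    if len(samples) >= 4:
        half = len(samples) // 2
        baseline = mean(s.throughput_mbps for s in samples[:half])
        recent = mean(s.throughput_mbps for s in samples[half:])
        if baseline > 0 and recent < baseline * drop_ratio:
            warnings.append(
                "Your network throughput is trending down: "
                "call quality may suffer."
            )
    return warnings
```

The warnings are written for the end user rather than for a log file, which matches the point that the message has to reach the person at the desk as well as IT.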

SPEAKER_01 34:14

I think that was a really great deep dive into the impact of poor resilience, and great advice on how to react when things do go wrong. I wanted to move on to discuss a technology that I'm sure you've probably heard of called agentic AI. It is a little bit ubiquitous at the moment. And inevitably it does come up in resilience as well. Mike, I want to come back to you on this to start off, I guess. At the infrastructure and integration level, when an AI agent starts failing or behaving unexpectedly inside a contact center environment, what does that failure mode actually look like? And is the current generation of CX infrastructure even built to detect those kinds of issues, which are, I guess, kind of new issues?

SPEAKER_02 35:10

All right, I'll go out on a limb. I'll say woefully inadequate is the way I'd describe it. And it comes in layers. I think the best metaphor I could give you to make the point on this: if you want your car to go 200 miles an hour and you're focused in on aerodynamics, that's really great. But if you don't have tires, it doesn't matter. Right? So you've got to approach the problem the right way. And if you're going to throw agents at this and give them autonomy on top of it, and the data that they're relying on wasn't good enough for you as a human to interpret, how could you possibly expect that the agents are going to do a good job? That would be the first place. And then the other metaphor on this one is forms. We've all been faced with getting a form to fill out. And I will go on the record as saying I will bet that nobody in the company who sent me that form actually tried to fill it out. Because there's no way you can get through half of the forms, because they just don't do the work. They create them and they don't try them. And that is just an insane thing. Same thing with agents. We have people who have never written software before, don't understand things like dead letter boxes, don't understand things like redundant loops and unintended consequence loops. I mean, all of that kind of stuff that goes wrong with agents. And you're now going to take the time when a user is having a problem and put it in an agent's hands, on known problematic data, with basic instructions that are going to cause it to say, if I don't know what to do, I'm going to hand you off to another agent. So now you're going to frustrate the hell out of the person on top of it. So you fix the first problems first. Going back to the car metaphor, fix the tires. I don't want you going any faster until you can fix the tires, right? So let's focus in on the data quality. Let's fix it.
So, first things first, stop the bleeding, right? Don't let any more bad data into the system. Okay, now it's not going to get any worse than it is today. That's a great message and it's motivational. Next, what do we have to do to fix the data that is most important, the data that is going to cause people to get answers that are dead wrong and cause data loss or otherwise catastrophic problems? Fix those. Now go after the rest, and message to your customer what's going on. And you can get through it. But if you think you're having AI failures and people pissed off at you now, throw agents at it when you don't have AI right. You can't fix stuff by throwing advanced solutions on top of sub-adequate infrastructure and sub-adequate data. It just doesn't work.
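
The handoff loop Mike warns about, where one agent that does not know what to do passes the customer to another agent that passes them back, can be guarded against with a hop limit and a dead-letter store. The sketch below is one possible shape for that guard, under stated assumptions: `route`, `dead_letters`, `MAX_HANDOFFS`, and the handler return format are all hypothetical names invented for this example, not part of any product discussed in the conversation.

```python
from collections import deque

MAX_HANDOFFS = 3  # assumed limit on how many agents one request may visit

# Requests no agent could resolve; a human reviews these later.
dead_letters = deque()


def route(request, agents, first_agent, max_handoffs=MAX_HANDOFFS):
    """Pass a request between agents, refusing to loop forever.

    agents: dict mapping an agent name to a handler(request) that
    returns ("done", answer) or ("handoff", next_agent_name).
    """
    current = first_agent
    visited = []
    while True:
        # Stop on a repeat visit, too many hops, or an unknown agent.
        if (
            current in visited
            or len(visited) >= max_handoffs
            or current not in agents
        ):
            dead_letters.append(
                {"request": request, "path": visited + [current]}
            )
            return "escalated_to_human"
        visited.append(current)
        status, value = agents[current](request)
        if status == "done":
            return value
        current = value  # value holds the next agent's name
```

For example, two agents that only ever hand off to each other would be caught on the third hop, and the request would land in `dead_letters` with the path it took, so a person can see where the instructions broke down.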

SPEAKER_01 38:01

I enjoyed the metaphors. When you started talking about forms, I thought you said storms originally, so I was really excited to see where things went, but some good stuff. Rhona, I wanted to come back to you. I guess you have a different perspective, maybe a senior leadership perspective, on that. How do you build governance around AI that's running inside these mission-critical service operations? What does observability actually mean when the system making decisions is an AI agent rather than a human or a deterministic workflow?

SPEAKER_00 38:39

Well, first of all, I would say to Mike, I 100% agree with you on all that you talked about, without a doubt. And the second thing I would say, to answer the question, is that ultimately AI and the usage of AI shouldn't enable leadership or the business to abdicate accountability for what the AI is doing. In a lot of conversations these days, there is a lot of discussion regarding the agentic side of things and the reality that these agents will be autonomous and therefore will go and do the work of others. And so, what is the role that humans will play in this? In my opinion, the reality is that once we all collectively appreciate that it will be about collaborating with the agentic space, that's where the role of humans will come in. So, to the point around observability, it really comes down to how are we going to audit? How are we going to be continuously checking and determining the output of what's happening? Because, going all the way back to the beginning of our conversation, we're still making sure that the customer is getting what they want and that the business is achieving what it needs. And just like from a finance perspective, where there is an audit team, I think we will have to start building a structure that enables governance to be an everyday occurrence, set up so that humans are continuously reviewing and determining what's happening. Mike said a lot about data today. I think that is ultimately where we have to be looking: what patterns are coming through, how good is the data? Is it behaving the way that we want it to? Are the outcomes that are coming through doing what we expect them to do?
I think that's where the governance starts to come in, with humans and senior leadership and executives all around starting to determine, okay, what is the way in which we collaborate with this so that we are tracking, monitoring, observing, so that we can be sure that the outcomes we're expecting and the quality of those outcomes are ultimately what's coming through at the end of the day.

SPEAKER_01 40:54

Yeah, I really like that phrase, abdication of accountability, Rhona. I think that's really good. I think it's really accurate as well. I think there's sometimes a little bit of an 'oh, it's the AI' kind of mentality, when you can't treat it like that.

SPEAKER_00 41:06

Well, no. And also, when we talk about AI and governance, everybody gets worried about the legal side of it, the regulatory side of it, hallucinations. All of this comes down to us having to make sure that we stay a part of the journey. We cannot and should not assume that whatever we put into it or whatever we take out of it is going to be fit for purpose. To the earlier point, bad data in means bad outcomes out. And so we have to recognize that it's part of our responsibility to put the infrastructure, the foundations, the data, whatever it might be, in place in order to get the right thing out of it.

SPEAKER_02 41:46

And if I can just add one comment to what she said, because it's really germane. I'm 100% agreeing with what she's saying, but it's got one other element to it. To find the errors that are happening requires a set of skills that are not common for executives, who have spent their entire career trained to be able to say, that doesn't look right. It has to be done a different way. And I'll give you an exact example, one that demonstrated that I've got to work on expanding my thought pattern in this area too, because I almost got biased, right? It was a report back from an agent, an AI agent, that said, we found where the user was having a problem, and we came up with a scalable way to fix it. And I was like, well, that's just great. But then for some reason, I looked at the supporting data, and I said, that doesn't look like it's scaling the way that I would have expected. And not only that, the class of problem should not have been able to be solved like this. So I asked it, could you please give me the definition of what you used for scaling when you told me you did it? And it gave me a bullshit answer, like it was uncomfortable answering, right? It gave me a different answer to a different question kind of thing. And eventually, for cross-referencing, I said, show me the piece of source code where you did this in a scalable fashion. And I brought an engineer over. It was absolutely not scalable. It was a fix for exactly that one problem, and it did not scale. If we had deployed that code, it would have blown up the infrastructure, because it would have created multipliers of that particular terrible change. So this is where AI gets the bad rap of lying, right? It did. It absolutely did not answer the question I asked it, and it intentionally tried to cover it up.
Now, I'm not going to talk about the model behind it, but if you don't have executives who still have awareness of their job function and a curiosity to look for stuff like that, you are going to end up with worse problems than you have today. I'm going to go so far as to say to every executive: if you are not at least partly curious about your data and don't have a feel for what it should be doing when things happen, stuff is going to get through your system and you're going to be like, how did that happen? And it's already going to be out in the wild, because these agents are on their own. It used to be that the bureaucracy, the time it took to get things through, allowed people to be more diligent. Now they've thought of it and it's deployed, and your customers have it. And that's dangerous.

SPEAKER_01 44:21

Yeah, that's a good point, Mike. And I think it leads on nicely to what I wanted to ask Ty about. Obviously we all agree that we need these guardrails in place, but I think understanding what exactly they look like and how they work is sometimes the issue. I think one of the common ones is the circuit breaker, a mechanism that detects when agents are misbehaving and pulls them out of the workflow, a little bit like the example you gave there, Mike. Ty, what does it actually take to implement something like that in a real contact center operation? I imagine it's a lot more tricky than it sounds.

SPEAKER_03 44:57

Yeah, I think what Mike was touching on is something that's really important. Especially in my experience at the executive level, there's this feeling or thought or idea that AI is just going to solve all things and can be trusted to the nth degree. And that's not necessarily the case. AI can get you far, but at the same time, there has to be a level of expertise that you can use to vet it, to make sure that it's actually performing and doing what you expect it to do. I often say that working with AI exposes your opportunities in communication. Sometimes it does exactly what you ask, and you're like, that's not what I meant. And so imagine what your employees may be feeling if that's the case. But I think that it also forces you to know what you expect. How many times have we been in a situation where we're asking AI to build something, but we really don't have an expectation behind it, or we don't know what success looks like? When you're building out these circuit breakers or these stop points, you have to have a clear definition of what it means to fail, which means you also have to know what it means to succeed, right? And then you have to be comfortable with what you don't know. Ideally, there would be some sort of real-time monitoring. That would be the goal. Who has time for that? I came from workforce management, and real-time monitoring is a very different piece today. But that's one of the things. There are tools out there that we could leverage. Also, what Rhona touched on, which is human fallback. You need a person to oversee this to make sure that it's doing what you expect. At the end of the day, the time that you're going to save with these automations is going to save you tons of money. So having someone overseeing that is well worth it.
And that person should also be accountable for turning it off and knowing when and how to do that. Sometimes these failure points are silent, but they can actually become really, really impactful. So, an example, and I'll give you one that actually happened: you're automating your responses because your knowledge base is set up properly and you feel really good about it. But a customer responded to an email that was initially a marketing email, the customer responded with a question, and AI attempted to answer everything that was in that message, not just that last question that the customer asked. That's a failure point. That's something that should have been excluded from the use cases. However, we didn't know what we didn't know at that moment, right? So that was a learning, and that was something that we needed to monitor, and we needed to turn it off and we needed to pivot. So we now know what to do in the future. And I think that whenever you're using any of these tools, there has to be a certain level of oversight and, more importantly, an understanding of what you're trying to get this tool to do, so that you'll know if it's being successful or not.
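
The circuit breaker discussed here, with its three ingredients (a clear definition of failure, a human fallback, and an accountable person who can turn the agent back on), might look something like the following in code. This is a minimal sketch under stated assumptions: `AgentCircuitBreaker`, `handle`, the `is_success` callback, and the consecutive-failure threshold are hypothetical and exist only for illustration. A real operation would need its own definition of what counts as a failed reply.

```python
class AgentCircuitBreaker:
    """Switches an AI agent off after repeated failures (illustrative)."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold  # assumed value
        self.consecutive_failures = 0
        self.open = False  # open means the AI agent is switched off

    def record(self, succeeded):
        """Update the breaker with the outcome of one AI reply."""
        if succeeded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.open = True

    def reset(self):
        """Meant to be called by the accountable human after review."""
        self.consecutive_failures = 0
        self.open = False


def handle(message, breaker, ai_agent, human_queue, is_success):
    """Answer with the AI agent if allowed, else fall back to a human.

    is_success(message, reply) encodes the team's definition of failure.
    Returns the AI reply, or None when the message went to a human.
    """
    if breaker.open:
        human_queue.append(message)
        return None
    reply = ai_agent(message)
    ok = is_success(message, reply)
    breaker.record(ok)
    if not ok:
        human_queue.append(message)  # do not send a failed reply
        return None
    return reply
```

Note that the breaker never closes itself: once it opens, every message goes to the human queue until someone calls `reset()`, which reflects the point that a named person should own the decision to turn the automation off and on.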

SPEAKER_01 48:10

Yeah, I was going to touch on that point, I guess, Ty. Do you think enough organizations currently have people with the relevant expertise or knowledge who can effectively sit in that kind of controller role that you're talking about?

SPEAKER_03 48:24

No, and I think it's because there's this fallacy that AI is going to solve everything, that I no longer need a head of support, or I no longer need this executive, I no longer need this oversight, because I have AI. Well, you can roll it out, but who's going to validate that what it's saying to you is accurate? There are many times where I'm doing research and I'm looking at it and I'm like, this is not true. But I know that because I've been doing this for 25-plus years, right? Everyone on this call knows that because they've been doing this for many, many years. But a person who is new to this, maybe one or two years in, who thinks that they can run and scale an operation using AI, will find out the hard way that they cannot.

SPEAKER_01 49:11

Absolutely. Rhona, I want to come back to you for, I guess, our last AI-related question here. It's kind of looking ahead. As AI inevitably becomes more deeply embedded in CX infrastructure, does the boardroom conversation around resilience need to fundamentally change? And do you think senior leaders are currently equipped to have that conversation?

SPEAKER_00 49:39

I think it 100% needs to change. And I think it needs to change because everything that has come before us has to be rethought or questioned. I don't think precedent is an answer anymore when it comes to creating strategies and putting plans in place. I think everything has to be on the table. The reality is that as human beings, we're very used to history and what has been done before, and to using that to create assumptions around what we think the future will look like. And so as AI starts to really take hold of how we do business and how business is created, I think we have to be willing and ready and able to accept that just because we did it one way before does not mean that's how we're going to have to do it in the future. And I think the reality is that that is a scary position for senior leaders and for the executive leadership teams of organizations to have to consider, because it means that they no longer have all the answers, or can no longer leverage the data that they had before, or pre-plan what's going to come around the corner. I think we're all going to have to realize that we have to learn more, that we have to be open to being educated, and that we're all going to have to learn and fail at the same pace. And as long as leadership tables are surrounded by people that have the humility and the reality check that they don't know it all, and nor do the people around them have all the answers, and that we've got to be all in the trenches together on it, I think they're the companies that will actually win in this journey, versus the companies that just assume that because we did it one way, the outcomes will be the same as we go forward.

SPEAKER_01 51:44

Senior leadership can be a little bit old dog, new tricks sometimes, can't it? But I completely agree. It's scary for anyone who has been in an organization for so long to think they still need to learn new things, but I completely agree they're going to have to.

SPEAKER_02 52:00

I agree a hundred percent again. I mean, this panel has been wonderfully aligned in our views of where the problems are and what to do about them. So I think it's been great. I just have another view on it too. Senior leadership tends to focus in on what are the important things to measure, and that's how they talk about it to the shareholders and to their stakeholders and all their partners. So if they believe that coming out and talking about how they have an AI mission is the important thing, then that's what they're going to be looking at: how fast it is deployed, and other ways of talking about the fact that whatever it is they want to do has been a success. And I would put forth that deploying AI is not something that anybody should be aspiring to do. They should be trying to say, I have to get customer support to be world-class and have these kinds of metrics associated with its performance. If you want to do that with sales automation, the measurement of customer satisfaction is probably a more important metric than whether you got an AI system that talks to your sales system deployed. That deployment isn't the issue. Is it having an impact on your bottom line? That's the issue. And those are the things that should be measured. And I think that is at the heart of it. If you want to see rapid change, if you want to see technology integrated the right way, simply make it so that 20% of the executive team's bonus is going to be determined by customer satisfaction in sales and in customer support. The day you do that, this problem will get fixed. Until you do that, it will be a band-aid solution, because they're just going to put in more of the infrastructure that the shareholders and the board are rewarding them for. And that's not saying anything bad; that's the way the system works.
So if you want to change the incentive to get a different behavior, there is a vehicle to do it, and people just need to have the strength to do it. But I think it's a long time coming: stop worrying about the things where we're telling everybody, look here. The house is on fire relative to what's happening with the customer experience that we're delivering. That's what needs to get fixed. Go fix that. AI is just a tool on the way there, but it's not going to fix it on its own.

SPEAKER_01 54:22

I agree. Yeah, I think that's probably a good place to end things. Thanks, Mike, thanks, Ty, thanks, Rhona. It's been a really great chat. Like I said at the jump, I think resilience is something that gets a little bit overlooked in the space sometimes. So having the opportunity to speak to really experienced, knowledgeable people and get a bit of a deep dive into it has been really, really great. So thank you very much for your time today.

SPEAKER_02 54:45

Thank you. Thank you all, it's been great to meet you all.

SPEAKER_01 54:48

I did also just want to take a quick second to thank the audience as well for tuning in. If you enjoyed this chat, and I'm sure you did, please like and subscribe to the channel. And remember, you can head on over to cxtoday.com for more stories like this. Until next time, thanks for watching.