CX Today

Cutting Through the AI Hype: Here's How to Actually Measure What Matters

CXToday.com Season 1 Episode 1


Rhys Fisher, Associate Editor at CX Today, sits down with Rémi Guinier, Head of AI Product at Diabolocom, to tackle one of the most pressing challenges in customer experience right now: the gap between AI investment and measurable business value.

If your contact center has been pouring budget into AI tools but struggling to prove the impact, this conversation is essential viewing. Rémi pulls back the curtain on why generic models fall short, what “Shapeable AI” actually means in practice, and how Diabolocom's quality monitoring tools are helping supervisors work smarter, not just harder.

AI is everywhere in the contact center space, but can you actually measure what it's doing for your business? Rémi Guinier breaks down the hard truths and practical fixes CX leaders need to hear.

Generic AI models are failing contact centers. Trained on clean, idealized audio datasets, most off-the-shelf models buckle under real-world conditions – background noise and interruptions – and carry latency that makes real-time use impractical.

Shapeable AI puts control back in your hands. Rather than forcing teams to fit a generic tool, Diabolocom's approach lets supervisors configure evaluation criteria, summary formats, and quality grids to match their specific business standards.

Auto-calibration closes the accuracy gap fast. By grading a "golden dataset" of calls, Diabolocom's quality monitoring tool self-adjusts its prompts – moving similarity scores from ~70% up to the low-to-mid 90s against supervisor evaluations.

The three signs of a truly successful AI project. Rémi's framework is refreshingly clear: improvement on core business metrics (CSAT, FCR, AHT), strong adoption rates across the team, and measurable time saved. Hit all three and you've got a winner.

For more Customer Experience tech news, visit CX Today.

Rhys Fisher

Hello and welcome to CX Today. I'm Rhys Fisher, Associate Editor, and today I'll be speaking with Rémi Guinier, Head of AI Product at Diabolocom. Rémi, thanks for joining me today. How are you doing?

Rémi Guinier

I'm doing great. Thank you for having me.

Rhys Fisher

Absolutely. Today we're going to be talking about what I think is one of the most pertinent questions surrounding AI adoption in the CX space right now, and that's how organizations can actually deliver measurable value with their AI tools. We know contact centers are investing heavily in AI, but many are still struggling to measure the results. Why do you think that gap exists?

Rémi Guinier

It's a great question, and there are multiple reasons. The first is that the models being used are usually generic models operated in a kind of black-box mode, where you don't have access to much information about how the model works, how it reasons, or what its intermediate results are. Without that, you don't have the data to actually measure what the model does. Also, contact center teams are not usually skilled in AI, so there can be a lack of technical ability to measure anything meaningful. You can measure some things, but is it really what you're looking for? That's always a great question. And finally, among AI model providers in the market right now there's a willingness to sell by overpromising, meaning lots of people make bold claims but never follow through on how to measure those claims in a real business context. I think those are the most prominent reasons why that gap exists today.

Rhys Fisher

Yeah, it's interesting that you mention generic AI models. Something I hear a lot when speaking to people within the space is that the mistake organizations make is thinking AI is this magic-bullet solution. Part of that is treating AI, like you said, as a generic, catch-all tool. In your opinion, why do generic AI models struggle in contact center environments, perhaps especially voice-based ones?

Rémi Guinier

There are multiple reasons on this as well. You're right that the marketing claims are very aggressive, so you tend to think it is a magic bullet, but in some cases, especially voice-based use cases, the models have real shortcomings. The training data used to build those models doesn't usually represent the actual use cases. It's lots of text, obviously, for LLMs, and for voice data it's usually ideal-case scenarios: very clean audio, with the words spoken neatly one after another. The biggest benchmarks, I think, are FLEURS and Common Voice, which are idealistic benchmarks. The models usually perform very well on those tasks, but the reality of the contact center space is that we're usually dealing with interference, bad audio, and interruptions, and these are not represented in the benchmarks, so the advertised performance is often quite misleading. Also, generic models are usually very large, because they need to be usable in a broad variety of situations. So for them to be fast, and we're talking real-time fast, like real-time transcription or speech-to-text, it can get extremely expensive. People try to cut corners and either accept more transcription errors or accept a latency that is not usable in a real-life context. I've seen models that behave very well on post-call transcription, but whenever you try to use them in real time, you're hit with a second-plus of latency, which is not exactly ideal for, say, virtual agent use cases.

Rhys Fisher

Yeah, that makes a lot of sense. I guess the flip side to these generic models is smaller, more specialized models. I know you touched on it a little there, but what are some of the other advantages these types of models provide?

Rémi Guinier

Yeah, as I said, the main thing is that they're specialized: they're tuned to solve the actual business use cases we're dealing with. As I always say to people when we discuss this, we're not asking the model for a strawberry-tart recipe; we're just asking for a summarization. We're asking for very simple, very precise things that do not vary. For example, in quality monitoring we usually have a grid to follow for the evaluation. That is a very standard, static use case, and it means we can train models that are smaller and more focused. They're faster at the kinds of tasks that interest us, and they're also less prone to hallucinations, because they're lighter and we give them exactly the context they need, so the risk of the model hallucinating is much lower. In general: just better and faster, I would say.

Rhys Fisher

Yeah, I wasn't expecting us to cover strawberry tart today, but that's a really nice analogy. Another concept that you at Diabolocom have talked about in the past is this idea of Shapeable AI. For people who aren't aware, what does that mean in practice for contact center teams?

Rémi Guinier

One trend we've seen in the industry is that people were very prone to presenting new use cases, but in a very bland kind of way. They say they have certain features, but you cannot tune them to your sector or your business use case. Shapeable AI is, as we say, an AI that you can tune by providing different kinds of instructions. This is something we're really focused on: giving people the ability to interact with the models the way they want. If I want my summary to be of a certain type or a certain structure, I can say I want it this way, and it will be produced the way I asked. For quality monitoring, we ask you: what exactly are you after? What do you want to evaluate? We leave the system as open as can be, so that people can shape it into the shape they want and get the results they're after: not some generic result they try to apply to their use case, but specialized output for their sector, their business, their specifics. So, as I say: tune, configure, and fit the specific sector, with enough depth of configuration to tailor the product to all the use cases. That's something we have across the entire AI product offering at Diabolocom.
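
To make the instruction-shaping idea concrete, here is a minimal sketch of how a user-configured summary format could be injected into a prompt. The function, the instruction text, and the prompt layout are illustrative assumptions, not Diabolocom's actual API.

    # A user-configured instruction that shapes the summary's structure.
    SUMMARY_INSTRUCTION = (
        "Summarize the call in three bullet points: reason for contact, "
        "resolution given, and any follow-up action. Keep it under 60 words."
    )

    def build_summary_prompt(transcript: str,
                             instruction: str = SUMMARY_INSTRUCTION) -> str:
        # The business user's instruction, not a hard-coded template,
        # determines the output's format and scope.
        return f"{instruction}\n\nCall transcript:\n{transcript}"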

Rhys Fisher

Yeah, again, obviously a lot goes into it, but it feels like such a smart way to run things, making it, like you said, malleable and built around the individual user. That makes a great deal of sense. Sticking with some of Diabolocom's specific products: I know your quality monitoring product is another area you've spoken about a lot recently. How does the auto-calibration improve AI performance?

Rémi Guinier

Auto-calibration is a very interesting story. By leaving the system open, we hit a bit of a wall at some point. We thought: okay, we've left the system entirely open, that's great, but the people we're talking with are not exactly AI-native. Most of the people we've been talking to have been in business for twenty years or more, so they're not exactly experts in prompt engineering or in working around the specifics of AI. So we had that issue, and we asked ourselves: how do they do calibration in real life? They take a grid, they grade calls, and then they tell the supervisors: this is how to grade the different calls, here are a few examples, and on the next calls you need to follow these specific guidelines and talking points. So what do we do? We give our customers an interface where they can grade calls manually, and whenever they're done, they mark those calls as part of a golden dataset, meaning we use them to calibrate the model. The model then adjusts its own prompts so that we try to reach, we usually say, the upper 90s in terms of similarity between a supervisor's evaluation and the AI's evaluation on the golden dataset the customer built inside the platform. We say the upper 90s rather than 100% because some situations contain ambiguities, so it's not fair to say the AI should match 100% of the time; we've even had cases where the AI was actually making the correct decision. So it improves AI performance because it lets us tune the prompts we work with, through a very simple interface. We have dashboards where you can see the results and the outcome of the auto-calibration: at first we might only match your evaluations 70% of the time, and after calibration it's 93, 95, sometimes even 100 on criteria that are very black and white. AI doesn't really like ambiguity, so whenever something is black and white, it usually works tremendously well. That's pretty much how it improves things: tuning the prompts so the AI gets closer to your appreciation of the situation.
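
The calibration loop Rémi describes might look roughly like the sketch below. The helper callables (grade_fn, revise_fn), the agreement measure, and the thresholds are assumptions for illustration; Diabolocom's actual mechanism is not public.

    from dataclasses import dataclass

    @dataclass
    class GradedCall:
        transcript: str
        supervisor_grades: dict[str, bool]  # criterion name -> pass/fail

    def agreement(ai: dict[str, bool], human: dict[str, bool]) -> float:
        # Fraction of criteria on which the AI matches the supervisor.
        return sum(ai.get(k) == v for k, v in human.items()) / len(human)

    def auto_calibrate(golden_set: list[GradedCall], prompt: str,
                       grade_fn, revise_fn,
                       target: float = 0.95, max_rounds: int = 10) -> str:
        # Revise the evaluation prompt until the AI's grades agree with the
        # supervisor's golden dataset (e.g. the "upper 90s") or rounds run out.
        for _ in range(max_rounds):
            scores = [agreement(grade_fn(prompt, c.transcript),
                                c.supervisor_grades) for c in golden_set]
            if sum(scores) / len(scores) >= target:
                break
            # Feed the disagreements back to a model that rewrites the prompt.
            prompt = revise_fn(prompt, golden_set, scores)
        return prompt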

Rhys Fisher

Yeah, again, that sounds really useful: being able to see that improvement in real time, seeing the similarity score rising, like you said, from the 70s up to the 90s. It sounds like a really helpful tool. Within that tool, I know another thing you've mentioned is configurable criteria. What kind of role does that play in shaping the results organizations see?

Rémi Guinier

Yeah, as I was telling you, the goal is to have the system as open as can be. The competition usually only provides generic grids that evaluate very standard subsets of agent skills, like welcoming the customer or taking leave at the end of the call. That is very static, but also very generic. What if I want my agent to say something specific at the beginning of the call? What if I want to measure an agent's ability to upsell a customer? What if I want to ban certain words from the conversation? And not only words but subjects, because banned-word lists have existed for a while, but we also want to cover subjects, as in: do not mention these kinds of issues during the call. That's something only LLMs can do at this point. So we want to give you the ability to tailor the evaluation to your standards and to reach that level of granularity. In the end, it's about giving control back to you so you can also explore what's best for your business. Maybe it makes sense for you to evaluate new things; before, adding new criteria was very tedious, because every criterion you add makes the evaluation take a bit longer. Here the process is entirely automated, so you can add as many criteria as you'd like, as diverse as you'd like, and have an objective, unbiased way of evaluating everything very quickly. And if it's open, people are going to toy around with it, and that's good.
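
As an illustration, a configurable quality grid could be expressed as plain data along these lines; the schema and field names below are hypothetical, not Diabolocom's actual format.

    # One entry per criterion; supervisors add, remove, or reword freely.
    quality_grid = [
        {"name": "greeting", "type": "binary",
         "instruction": "Did the agent open with the approved greeting?"},
        {"name": "upsell_attempt", "type": "binary",
         "instruction": "Did the agent propose a relevant upgrade or add-on?"},
        {"name": "banned_words", "type": "absence",
         "items": ["guaranteed refund", "legal action"]},
        # Subject-level bans go beyond keyword lists: an LLM can judge
        # whether a topic was discussed even if the exact words never appear.
        {"name": "banned_subjects", "type": "absence",
         "instruction": "The agent must not discuss competitor pricing."},
    ]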

Rhys Fisher

Yeah, absolutely. That's a really great overview of those tools, and like you say, you keep coming back to this idea of openness, of allowing your users to trial-and-error a little themselves and find, like you said, the best version of it for them. Trying to link it back to our overall theme: how does this ability to calibrate and shape AI translate into measurable improvements for supervisors and their teams?

Rémi Guinier

Great question too. Like you said, we want people to toy around in the solution and find their own use cases. The industry is still young, so there are emergent use cases that we've seen over the years. The most obvious is that we offload almost the entire evaluation process. I've seen clients say that evaluation takes far less time because they concentrate on the few specific, optional criteria they want to evaluate by hand. Other clients have offloaded the entire evaluation process, period: their supervisors now only look at aggregated data, because that's the strength. Before, we were evaluating maybe one call per week; now we evaluate as many calls as you'd like, so they have a more statistical, general view of contact center performance. So we provide a lot of additional data to managers, and we also free up a lot of time, so supervisors can spend more of it analyzing data and designing improvement plans for their agents, a part of their job that is a bit more fulfilling and meaningful for the contact center they're working in. For contact center directors, having a way of evaluating the entire contact center at scale, at a single glance, is a really powerful thing at any given moment, because it's also a semi-real-time solution: an evaluation usually lands a few minutes after the call enters the system. So think of it as a bit of a real-time monitoring tool for contact center directors as well. So: lots of measurable improvements, like time gained, additional data to look at, fewer missed trends, fewer missed opportunities. I mentioned emergent use cases; we have a client who says that if a certain topic is mentioned, the call goes back into an outbound campaign so someone can call that person again and get the issue fixed. Because if you're not able to shape the tool, you're never offloading the entire process; you're just looking at generic data and trying to fit it into your use case. That's not how it should be. The tool should fit the bill, not the contrary.
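
The topic-triggered callback Rémi mentions amounts to a simple routing rule on top of the evaluation output. A minimal sketch, with a hypothetical topic label and queue mechanism:

    def route_after_evaluation(detected_topics: set[str], caller_id: str,
                               outbound_queue: list[str]) -> None:
        # If the flagged topic came up, feed the caller back into an
        # outbound campaign so an agent can follow up and fix the issue.
        if "unresolved_billing_issue" in detected_topics:  # assumed label
            outbound_queue.append(caller_id)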

Rhys Fisher

Yeah, I think that's the perfect way to sum it up, really. You mentioned AI still being relatively young in this space, and we know just how much it's evolving, almost day to day. The advancements we've seen over the last twelve to eighteen months are incredible. In your opinion, what will distinguish the organizations that truly get value from AI from those that struggle as it continues to evolve?

Rémi Guinier

Like I said, it's a very young industry, so it's difficult to tell at this point. But from what we've seen, because we've been at the forefront of this for a few years now, the good organizations will benefit from operational excellence. AI will enable their employees to work better and faster, so they get to the same results at lower cost and sooner. They will in turn have more productivity, and they'll be able to leverage that productivity to, over the long term, beat the competition on price, speed, and quality of the deliverable. We're acting on the big levers for productivity. So: added productivity, added business. Also, and this is a bit more controversial, when I look at organizations that leverage AI, I always look at the ratio of successful AI projects. It doesn't mean the same thing to me if you started six projects and succeeded in one as if you started two and succeeded in both, because if you take more shots at goal, you're obviously going to score more goals. The more mature organizations will be able to focus on the specific use cases that have a great return on investment. With fewer projects and more return on investment, they will reach the goal faster. That's also what we're doing with very specialized AI: we don't want to solve all the problems. There's only a specific subset of problems we're interested in, and we want to score every goal we can, but only on those specific projects where we know the chance of scoring is very high. If that makes sense.

Rhys Fisher

Yeah, absolutely. I was enjoying the football metaphor. It's like conversion rate versus the pure number of goals you score, that kind of idea.

Rémi Guinier

Yeah.

Rhys Fisher

I guess the final question is a slightly more practical one. If a CX leader wants to start measuring AI performance more effectively, where should they begin?

Rémi Guinier

It always starts with the obvious answer: improvements on the actual business metrics, like customer satisfaction, first-call resolution, and average handling time. There's a way, by the way, in the quality monitoring tool to check for that, so you can look at those metrics over time and see whether they really improved. Basically, a successful AI project moves the right metrics, and in CX the right business metrics are obviously the big ones: CSAT, FCR, AHT, and the like. Then there's a less obvious one: even if those indicators move in the right direction, adoption can still be an issue. These tools are very new, and they touch the core job of the people we're deploying them to; quality monitoring, for example, changes the supervisor's job a lot. So how much are they actually using it? That is a key thing to check, because if your KPIs are good but only 10% of people are using the tool, is it really moving the needle in the right direction? Improvements on business metrics and adoption together mean the project is actually succeeding. If you like metaphors: when the telephone was invented, people weren't questioning whether it was useful. Everybody used it whenever they had access to it. For AI projects it's the same thing: if the tool is really useful and gains you time, people will adopt it without questioning it. Resistance appears whenever the goal or the benefit isn't clear. And the last thing, as I said, is to check for time gained, because AI is just a tool, and we usually measure tools by their ability to bring us more productivity. If you can do the same thing in less time at the same level of quality, you improve productivity, and you also improve quality of life for your employees: you diversify their job, give them more things to look at, maybe more in-depth data, and give them the ability to do their job better. So there's a lot to be considered, but it comes down to business metrics, adoption, and time gained. If you succeed on all three, you have a very successful AI project on your hands.
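
Rémi's three-signal check can be written down almost literally. The sketch below uses hypothetical field names and an arbitrary adoption threshold; only the three-part structure comes from the conversation.

    from dataclasses import dataclass

    @dataclass
    class ProjectStats:
        csat_before: float             # business metric before rollout
        csat_after: float              # business metric after rollout
        active_users: int              # team members actually using the tool
        eligible_users: int            # team members who could be using it
        minutes_saved_per_week: float  # measured time gained

    def is_successful(p: ProjectStats, adoption_bar: float = 0.5) -> bool:
        # All three signals must be present, not just one or two.
        moved_metric = p.csat_after > p.csat_before
        adopted = p.active_users / p.eligible_users >= adoption_bar
        time_gained = p.minutes_saved_per_week > 0
        return moved_metric and adopted and time_gained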

Rhys Fisher

Absolutely, I think that's a really nice way to end the conversation. Like I said at the start, we all know how prevalent AI is right now, but we also know there's a lot of what I would call AI hype around the subject. So I think conversations like this, which really dig down into the nuts and bolts of how it's actually performing and how it's actually delivering value to organizations, are really important and really helpful, and I'm sure our audience will enjoy it. So yeah, thanks for your time, Rémi.

Rémi Guinier

Yeah, thank you so much. Have a nice day.

Rhys Fisher

I also just want to quickly thank our audience for tuning in. If you enjoyed this, please remember to like and subscribe to the channel, and head over to cxtoday.com for more stories like this. Until next time, thanks for watching.