
Advice from a Call Center Geek!
Advice from a Call Center Geek is a weekly podcast with a focus on all things call center and contact center. Tom Laird, CEO of the 600+ seat award-winning BPO Expivia Interaction Marketing and the AI auto-QA startup OttoQa, and an ICMI Top 25 Contact Center thought leader, discusses topics such as call center operations, hiring, culture, technology, and training while having fun doing it!
How You Can QA Contact Center Calls Using ChatGPT (Desktop)
In this ACG episode, we dive into the transformative impact of AI on customer service QA. Hosted by Tom Laird, the CEO of Expivia Interaction Marketing Group and OttoQa, this session explores how ChatGPT and Claude 3 enhance call analysis and agent performance monitoring.
Discover effective system prompting and its role in refining quality assurance for consistency and high accuracy, with accuracy standards reaching 96-97.5%. Learn the nuances of differentiating between sales and retention calls, assessing customer sentiment, and turning these insights into clear communication.
This guide offers practical steps for integrating AI Auto QA into your business, enabling detailed data analysis and report generation. Ready to elevate your quality assurance strategies with AI? Tune in to find out how.
Tom Laird’s 100% USA-based, AI-powered contact center. As the only outsourcing partner on the NICE CXone Customer Executive Council, Expivia is redefining what it means to be a CX tech partner. Learn more at expiviausa.com.
Follow Tom: @tlaird_expivia
Join our Facebook Call Center Community: www.facebook.com/callcentergeek
Connect on LinkedIn: https://www.linkedin.com/in/tlairdexpivia/
Follow on TikTok: https://www.tiktok.com/@callcenter_geek
LinkedIn Group: https://www.linkedin.com/groups/9041993/
Watch us: Advice from a Call Center Geek YouTube Channel
Speaker 1: This is Advice from a Call Center Geek, a weekly podcast with a focus on all things call center. We'll cover it all, from call center operations, hiring, culture, technology and education. We're here to give you actionable items to improve the quality of your and your customers' experience. This is an evolving industry with creative minds and ambitious people, like this guy. Not only is his passion call center operations, but he's our host: he's the CEO of Expivia Interaction Marketing Group and the call center geek himself, Tom Laird.
Speaker 2: On the desktop version, right. And I say the desktop version just because that's where most everybody's at right now. If you're a developer, we have API connections to ChatGPT and even to Claude and all that, but that's for another day; I'll do another podcast so we can get way more in depth. This is for, you know, maybe that person who has 10 contact center agents, or who wants to see if they can develop some type of QA platform internally using the resources of the $20 subscription, or even GPT-3.5. I am open for questions, right? I really think this will be much, much better and much more fun if any of you guys have questions on what I'm talking about. If there's something specific you want me to talk about, I might be able to talk more than I'll show, but I will definitely talk through it. I have all my prompts, I have everything right here, so if there's anything specific that you want, I am more than happy to help. The other thing is my microphone stinks, so I apologize for the sound. My Yeti mic won't plug into this new computer and I just didn't have time to play around with it, so apologies if there are sound quality issues and it sounds like I'm in a pool or underwater. All right, so let's start at the beginning.
Speaker 2: When we started AutoQA and we started experimenting in R&D, at the very beginning we were just using the desktop. We would take a call transcript, upload that transcript into it, and then just start asking questions. That's where we started. Then we developed, you know, a deeper understanding of prompting, right? I mean, prompting has only been around for about a year, year and a half. So we went to school on that: what are some of the things that help make the outputs more consistent and better?
Speaker 2: So the first thing is how we have this set up. We created a system prompt, which is basically the prompt that tells ChatGPT what it is and how we want the outputs. You know, it kind of gives the instructions, right. And then we got into asking questions and prompting for those questions. So there are really four main pieces to this: a system prompt, which we'll talk about; your form, or the questions that you're asking, with what we call the context, which is basically the prompt for that question; the full call transcript; and then, I guess it's kind of part of the system prompt, but the output format that we're looking for.
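To make those four pieces concrete, here is a minimal sketch of how they might be assembled into a single message you could paste into the ChatGPT desktop app. All of the wording, the file name, and the sample question are illustrative placeholders, not OttoQa's production prompts.

```python
# Hypothetical assembly of the four pieces described above; every string
# here is a placeholder, not a production prompt.

system_prompt = (
    "You are the head of quality assurance for a contact center. "
    "You answer quality-compliance questions about the attached call "
    "transcript, using the provided scoring system to assign scores."
)

question_with_context = (
    "Q1: Did the agent use the proper scripted greeting on the call opening? "
    "Context: the agent must say 'Thank you for calling XYZ Bank.'"
)

# The full call transcript; the file name is a placeholder.
transcript = open("call_transcript.txt").read()

output_instructions = (
    "Return your answers as JSON, with a short rationale for each question."
)

# One combined prompt: system prompt, question + context, transcript, output format.
full_prompt = "\n\n".join(
    [system_prompt, question_with_context, transcript, output_instructions]
)
print(full_prompt)
```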
Speaker 2: So the first thing is you pull up your ChatGPT. And I can give you this, I think this is pretty basic: you are the head of quality assurance for a contact center. You oversee answering questions for quality compliance, using the provided scoring system to assign scores. You will examine the attached transcript to answer. Right, everybody can figure that out; that's not proprietary or anything. But then you start to think about how you want this thing to look. So we're using different outputs. We are looking in our system prompt for things like the call type, which you can play with on the desktop version of ChatGPT: upload a couple of different calls, one a sales call, one a retention call, and if you name those call types or explain on the desktop what a retention call is, it will pick that up when you hit enter on the transcript. So that's one output that you could play around with. Sentiment scores are the next one.
Speaker 2: So you keep your "you're the head of quality assurance" opening, and then you could tell it: hey, I want to do customer and agent sentiment. Some of what we're doing for this in our system prompt: this involves a nuanced examination of language, tone, and the context presented in the dialogue; pay close attention to key indicators such as word choice, intensity of expressions, and any shift in mood over the course of the conversation. And there's much more to it than that: we define what is positive, we define what is negative, and just kind of define sentiment. That's another thing that you can have right on your desktop. So again, you're the head of quality assurance, and the next prompt is: we want to do a sentiment score for this one call. Then you can see how you build it up. There are a ton of prompts out there for sentiment and how to get positive, negative or neutral from a transcript. I think I even have one on our blog post.
Speaker 2: We have a JSON output that we tell it to produce as well, which you can do on the desktop, to kind of say: hey, this is what customer sentiment is, this is what agent sentiment is. We also tell it, and I think this is really cool, what we call the rationale. So we say things like: hey, you will explain the rationale for all the questions, regardless of the answer. We tell ChatGPT, we tell Otto, to explain why it scored how it scored. And you can make those things as robust or as small as you want. You can put "under 40 words." You can say: hey, pull the exact part of the transcript where this was said for your question. So you're layering this, right, for what is important in your contact center. And again, I'm doing this for somebody who has very little IT expertise. But let's keep going back to building this kind of hamburger: you have the "you're the head of quality assurance in a contact center" layer.
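For illustration, the JSON you ask for might look something like this sketch; the field names and values are my assumptions, not OttoQa's actual schema.

```python
# Hypothetical shape of the JSON output described above; the keys and
# values are illustrative, not OttoQa's actual schema.
example_output = {
    "call_type": "retention",
    "customer_sentiment": "negative, trending positive",
    "agent_sentiment": "positive",
    "questions": [
        {
            "id": 1,
            "question": "Did the agent use the proper scripted greeting?",
            "answer": "Yes",
            "score": 5,
            # The rationale: why it scored how it scored, kept under 40 words.
            "rationale": "Agent opened with 'Thank you for calling XYZ Bank' "
                         "in the first line of the transcript.",
        }
    ],
}
print(example_output)
```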
Speaker 2: Let's talk about sentiment, let's build a sentiment prompt. Let's talk about the rationale: we want to know why. When ChatGPT scores something, what is the reason? Why did it say yes, why did it say no? That's stuff we have found to be really important in the output. The other thing that I think is really important, again if you don't have any programming expertise, is the specific things you can ask ChatGPT to give you back. You want to know the four things the agent did well on the call and four things the agent could have done better. I think those are the core things that we're looking for. So it's right in our system prompt for AutoQA, which is this big, giant, long thing. It doesn't have to be for you; you're just telling it who it is. Do a sentiment prompt, ask for the four things the agent did well and the four things the agent struggled with. And then, everybody's paying more for auto-summarization, right? Ask it to give you the full call summary there as well.
Speaker 2: So that's your internal ChatGPT system prompt that you're utilizing, and this is something you can use a personal GPT for: use it as your system prompt if you want to look to scale this a little bit. It's going to be a crazy manual process and the outputs won't be perfect, but it's going to give you the general gist of what's happening on calls. We're looking for sentiment scores, the overall score of the call, four things the agent did well, four things they kind of struggled with, and the full call summary. And then we get to the actual questions.
Speaker 2: So how we found the best way to break this up is to tell ChatGPT, even on your desktop, in the system prompt, if you have one form: the questions will be broken up into three sections. The sections are greeting, etiquette, and closing information, or sales information, whatever it is for you guys. Then you tell it how you want to score.
Speaker 2: If a question is answered with a yes, score five points, or whatever it is. If a question is answered with a no, score zero points. And you can score N/A, which is no points at all: zero out of zero instead of zero out of five. So now you've just told it how to score, and you don't really have to go through each question and tell it how many points it's worth, unless some questions differ, you know, this one is worth 10 points, that one's worth five; then you can tell it. And then this is where the fun comes in.
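Written out as prompt text, those scoring rules might read something like the following sketch; the wording is mine, not the production system prompt.

```python
# A sketch of the scoring instructions described above; wording is
# illustrative, not the production system prompt.
scoring_rules = """
The questions are broken into three sections: Greeting, Etiquette,
and Closing/Sales Information.

Scoring:
- If a question is answered Yes, score 5 points.
- If a question is answered No, score 0 points.
- If a question is N/A, score 0 out of 0 points, so it does not
  count against the total (not 0 out of 5).
"""
print(scoring_rules)
```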
Speaker 2: This is where, literally, that system prompt took us five to six months: really figuring out what we wanted from an output standpoint, how consistent we wanted it to be, getting the real outputs, how we wanted it to even look on a form and an evaluation. And then we get to the questions, and that's where the rest of the seven, eight months of work have come in. That's why the lawyers don't want me to totally give away all that stuff. But some of it is very basic. Start with a question like: did the agent use the proper scripted greeting on the call opening? That's question number one. Then put a P, for prompt, for question number one, and it is: the agent must say "Thank you for calling XYZ Bank. How is your day? We appreciate your call."
Speaker 2: Whatever that is for you. And you can do things like: it has to be exactly this, or it has to be somewhat like this. It will definitely pick up the nuances of that. So you just do that and trial-and-error each of your questions on the desktop version, as in the sketch below: question one, question two, question three. All right, this is the next section, because you already told it this is the etiquette section. All right, this is what empathy is. If you go to my blog post on the OttoQa blog, there is an ultimate guide to auto QA, and I have all my prompts for the basics: call control, empathy, openings, greetings, all that stuff is there. So I don't really want to go over that again, but take those prompts, copy and paste them right into your ChatGPT instance and hit enter. See how your output comes out. If your output isn't in the format that you like, then go tweak that prompt.
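Here is a sketch of what those question/prompt pairs might look like once written down; both questions and both prompts are illustrative examples, not the prompts from the blog post.

```python
# Illustrative question/prompt pairs in the "question, then P for prompt"
# format described above; the wording is hypothetical.
questions = [
    ("Q1: Did the agent use the proper scripted greeting on the call opening?",
     "P: The agent must say 'Thank you for calling XYZ Bank. How is your day? "
     "We appreciate your call.' It does not have to be word for word, but it "
     "must be close to this."),
    ("Q2: Did the agent show empathy when the customer expressed a pain point?",
     "P: Look for any instance where the customer struggled or had a pain "
     "point, and verify the agent responded in a way that made the call "
     "better for the customer."),
]
for question, prompt in questions:
    print(question)
    print(prompt)
```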
Speaker 2: But once you get it to where you like it, again, I think that's where personal GPTs can really come into play for some basic stuff, where you can upload your system prompt especially, plus the questions you have already refined, and then you're just uploading calls one at a time. You have to wait for it, but you can still score the calls, and you're going to get some really good outputs with that as well. So that's how we started it. We said: can we build a good system prompt, can we build the questions? We built all this stuff out first, and the answer was yes. So that was our proof of concept, and that's what you're doing right now. Then we took that to the enterprise level, with all the APIs we were using to get a transcript and to connect to ChatGPT and Claude 3. So it's all the same stuff, it's just that one is quicker than the other. There's a little more time and effort that went into what we need to do from a prompting standpoint, but you can definitely figure it out; just keep trial-and-erroring a lot of the stuff.
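As a rough picture of what that enterprise-level step looks like, here is a minimal sketch using the OpenAI Python library. The model name, function, and variables are placeholders of mine; OttoQa's actual pipeline (audio transcription, forms, reporting) is not shown.

```python
# Minimal sketch of the same QA loop through the API instead of the desktop.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set in the
# environment; the model name and variables are placeholders.
from openai import OpenAI

client = OpenAI()

def score_call(system_prompt: str, questions: str, transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # placeholder; use whichever model you have access to
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"{questions}\n\nTranscript:\n{transcript}"},
        ],
    )
    return response.choices[0].message.content

# Usage: print(score_call(system_prompt, scoring_rules, open("call.txt").read()))
```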
Speaker 2: Some tips that I have for you on prompting. We have found that "imperative" is a very strong word. So if you say "it is imperative that you score this this way" or "it is imperative that you look for this in the transcript," AutoQA will do that. Another tip is to tell it to think. If you have something a little more complicated that it should take its time on, tell it to think, in your prompt, before it actually gets to answering that question. We've also found that structure is important. For smaller prompts, having a paragraph is fine. But for a longer prompt, having it in order, almost as a checklist, definitely gives us more consistent and much better results. The other thing is that the models do a great job with keywords and finding specific things, but don't think that way; think much bigger. If you look at the empathy prompt that I have on the blog post, it's basically saying: look for any instance where there was a struggle or a pain point for the customer, and then make sure that the agent responded to it in a way that made the call better. So those are some things that I think you can really play with; the sketch below pulls them together.
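Here is a short sketch applying those three tips, the word "imperative," a "think first" instruction, and a checklist structure, to an empathy-style question; the wording is my illustration, not the blog post's prompt.

```python
# Illustrative prompt applying the tips above: imperative language, a
# "think first" instruction, and a checklist structure. Wording is mine.
empathy_prompt = """
It is imperative that you score this question exactly as instructed.
Think carefully about the entire conversation before you answer.

Checklist:
1. Find any point where the customer expressed a struggle or pain point.
2. Check whether the agent acknowledged it.
3. Check whether the agent's response made the call better for the customer.
4. Answer Yes only if all of the above are true; otherwise answer No.
"""
print(empathy_prompt)
```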
Speaker 2: And this is a great exercise, not only to get better at the prompting side of AI, but to start to really look deeper into your calls. And it's not that hard, it's not that difficult. Doing it at scale, with reporting and a lot of different things, is a little bit more difficult. But at its core, anybody who's really using a large language model is pretty much building something on top of it and using it as the brain so that a human being doesn't have to do things. So when I hear "all auto QA vendors are doing is writing prompts," that's true, but good luck trying to figure out how to get really consistent, better-than-human data. I just talked to Mark Bernstein from Balto. They have, I believe he said, 96 to 97.5% accuracy as their standard compared to a human, which is two, two and a half percent better than we can score. And I think we are right there with Otto as well. To get to that level takes a lot of work to figure out the consistencies of this thing, but at its core it's just: how good is your prompting? How well do you really know the data that you have?
Speaker 2: Here's the other thing too. Think about this: you can start to use this thing not just to QA your agents, but to ask questions of the data. Upload five transcripts and start questioning them. You can upload more than five; let's say you upload 15 chat or call transcripts and start querying things off of them, like the samples sketched below. What percentage of the customers who called were irritated? How many specific customers thought the product was too expensive? You can start to build that out, see how much you can get, and learn more from your data as well. You can do a ton of that stuff with your personal GPT, uploading a ton of different data and then querying off that data. So there's so much you can do internally to build some things out yourself. Even a very basic team can do some really cool things with the technology of today.
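A few sample analysis questions in that spirit; the wording is illustrative, and you would paste them into the chat after uploading the transcripts.

```python
# Sample batch-analysis questions of the kind described above; the
# wording is illustrative.
analysis_questions = [
    "What percentage of these customers were irritated?",
    "How many customers said the product was too expensive?",
    "What are the three most common reasons customers contacted us?",
]
for q in analysis_questions:
    print(q)  # paste each into the chat after uploading the transcripts
```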
Speaker 2: But again, if you're that person who just wants to play around and figure out some really cool ways to start scoring calls, and I think this bridges to other opportunities within the contact center as well, because a lot of the things match up: for the core of this, start to build your system prompt. Find out what things you want to know. Do you want to do sentiment? Do you want things the agent could have done better and things the agent struggled with? Do you want to summarize the call? Make sure it's giving the answers back on why it scored the way it did. Then have your questions, with your own prompts for each question, and start playing with it. I think you'll get some really good results. And then take it to the next step.
Speaker 2: The next step is to get your personal GPTs involved and do it a little bit more at scale. And then, you know, if you want to see a demo of what we're doing fully at scale, with API connectivity built to get transcripts from audio and to connect to multiple large language models, I'd be more than happy to show you that and how it works, and then how we use specific outputs to create forms and reporting on top of that. It's a process, but I think at its core anybody can start to play with this and get some results, at least on a smaller scale, for calls. So if there are no questions on that: as you guys are listening to this, please, please DM me if you have anything you want specific answers to; I'd be more than happy to help. But I hope that was kind of cool and gives you a little bit of insight into how this works.