Advice from a Call Center Geek!

Unlocking the Power of Prompts: Enhancing CX QA with Proven Strategies

February 22, 2024 Thomas Laird Season 1 Episode 217

Dive into the transformative journey of OttoQa, where we've uncovered the profound impact of prompts in elevating customer experience (CX) quality assurance (QA) processes. This podcast (taken from a LinkedIn Live event) brings the insights and methodologies from our extensive experience directly to you, focusing on the art and science of crafting effective prompts for QA automation.

Explore a rich discussion on both the fundamental and advanced strategies that have revolutionized our approach to prompts. This episode is designed not just to share knowledge but to inspire action. We encourage listeners to experiment with these techniques in their operations, building confidence in the technology and its application before making any significant investments or deployments.

Whether you're aiming to refine your CX prompting strategies or are curious about their potential to transform your service center, this podcast serves as your guide to unlocking the power of effective prompting. Join us in discovering the methods that have not only propelled our journey forward but also have the potential to redefine the landscape of customer service excellence.

If you are looking for USA outsourced customer service or sales support, we here at Expivia would really like to help you support your customers.
Please check us out at expiviausa.com, or email us at info@expivia.net!



Follow Tom: @tlaird_expivia
Join our Facebook Call Center Community: www.facebook.com/callcentergeek
Connect on LinkedIn: https://www.linkedin.com/in/tlairdexpivia/
Follow on TikTok: https://www.tiktok.com/@callcenter_geek
Linkedin Group: https://www.linkedin.com/groups/9041993/
Watch us: Advice from a Call Center Geek Youtube Channel

Speaker 1:

This is Advice from a Call Center Geek, a weekly podcast with a focus on all things call center. We'll cover it all, from call center operations, hiring, culture, technology and education. We're here to give you actionable items to improve the quality of your and your customers' experience. This is an evolving industry with creative minds and ambitious people like this guy. Not only is his passion call center operations, but he's our host. He's the CEO of Expivia Interaction Marketing Group and the call center geek himself, Tom Laird.

Speaker 2:

So I'm going to just kick this bad boy off. We've done a lot of work here over the last eight months in trying to fully automate quality assurance, I think for the smaller contact center, right? The enterprise guys have so many different tools and they're doing so many different things, but we saw that there's a need with that smaller contact center, and we're saying under a hundred seats. You know, I was talking to Chris Mouts, who's on here too, from EvaluAgent. He talks a lot about how there's so many of these smaller contact centers that are still using Excel spreadsheets, right, they're still using Google Sheets, and there's kind of a need, I think, if you can give them a tool that looks to automate with ChatGPT and gives them some type of better reporting aspect. That's kind of what we did about seven, eight months ago, or at least we set out to do it.

Speaker 2:

And let me throw this out to you guys: this is a full AMA. So if anybody has any questions, anytime, raise your hand, I'll bring you up. We can have a conversation, we can talk this through. But I want to give you some of the cool stuff that we have found out, that we have figured out, especially when it comes to prompting, especially when it comes to how ChatGPT utilizes transcripts in the best way for listening for specific things. Like, how do you listen for empathy? How do you try to score things that are unseen, like did an agent go to the right screen on their computer to find this information, or did they click this box that we can't see in a transcript? How do we deal with some of that? And then also just dealing with some of the, I guess, the nuances of ChatGPT and how it thinks, right? So the amount of different testing that we've done over the last seven months has been insane. Like, we have taken notes, like I really almost want to write a book on all of this, but I have like 15 things that I want to talk to you guys about that I think are super cool and that we learned from the prompting aspect of ChatGPT. And again, I am a full open book.

Speaker 2:

We have our own product. If you want all these prompts, if you want our static prompt, I will give you everything. I think that's the other thing too: I'm not here to hide anything. So if anything is of interest to you, or even if you want to play with it on the desktop version that you have with some of your calls, you know, knock yourself out, because I know that these prompts I'm going to talk to you about work.

Speaker 2:

So just the quick overview of how we do this: we have a SaaS product where we basically take a call and, as soon as we analyze that call, it goes out to a company called Deepgram and it gets the full transcript of the call. The transcript then comes back, it looks at our static prompt, looks at all the context that we built for each of the questions and the specific outputs that we want, goes out to ChatGPT, it quote unquote thinks, it comes back, and then we get an output. And you guys, if you want to know what the outputs look like, just go look at my LinkedIn. You'll see a bunch of how the outputs look. We've decided that the best outputs, at least to start with, are the actual scoring of every question, four ways that the agent did well, four things that the agent could improve upon, the call summary, and then kind of that overall score with customer and agent sentiment as well.
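To make that flow concrete, here is a minimal sketch in Python of the loop described above, assuming the current openai client library; the transcribe_call helper, the model name, and the prompt wording are illustrative placeholders, not OttoQa's actual code.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def transcribe_call(audio_path: str) -> str:
    """Hypothetical stand-in for the speech-to-text step (the episode mentions
    Deepgram); returns the full transcript of the call as plain text."""
    raise NotImplementedError

STATIC_PROMPT = (
    "You are the head of quality assurance for a contact center. Score the call "
    "against each question provided and return JSON containing: the call type, "
    "agent and customer sentiment, a yes/no/NA score per question, four things "
    "the agent did well, four things the agent could improve, and a call summary."
)

def score_call(audio_path: str, question_contexts: list[str]) -> dict:
    transcript = transcribe_call(audio_path)
    messages = [
        {"role": "system", "content": STATIC_PROMPT},
        {"role": "user", "content": "Questions:\n" + "\n".join(question_contexts)
                                    + "\n\nTranscript:\n" + transcript},
    ]
    # Placeholder model name; assumes the response body comes back as plain JSON.
    response = client.chat.completions.create(model="gpt-4-turbo", messages=messages)
    return json.loads(response.choices[0].message.content)
```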

Speaker 2:

But let's talk about some of these prompts and some of the things that we have found, if you're planning on doing this. So number one is: less is more for easy questions. So if you have a greeting, or if you want to collect an email address, "did the agent collect an email address?" is really all you want to say. We tried to do these kind of elaborate things for everything, and it just confused it for the shorter, black-and-white, binary questions. So that's pretty easy.

Speaker 2:

But let me say this: the word "explicitly" is like ingrained in ChatGPT. So if you use the word "explicitly", and sometimes we would use ChatGPT to help us develop some of the prompting for each of the questions, it would be absolutely exact. So one of the things was "please make sure that the agent explicitly says thank you for calling customer service", and if there was anything off, if there was a pause, it would score it as a zero or a no. So we have found that if you want to be exact, you don't really have to tell it to be exact. Just give it kind of that general deal and it works much better. Unless you have something like a disclosure, right, where you can't go off script, where every T has to be crossed and every I has to be dotted, and it all has to be perfect. Sorry, I muted myself. So be careful about being too explicit when you want to have something exact. Most of the time, if you just tell it and give it kind of the rough example, it will work.
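As a rough illustration of both points, with hypothetical question wording rather than Expivia's actual prompts:

```python
# For simple, binary checks, a plain question works better than an elaborate one.
email_question = "Did the agent collect the customer's email address? Answer yes or no."

# Over-constrained phrasing with "explicitly" tends to fail on harmless variations:
# greeting_question = ("The agent must explicitly say 'Thank you for calling "
#                      "customer service.' Any deviation scores zero.")
# A looser version usually captures the intent correctly:
greeting_question = ("Did the agent open the call with a greeting along the lines "
                     "of 'Thank you for calling customer service'? Answer yes or no.")
```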

Speaker 2:

Now, this is the cool stuff, right? So how do you have ChatGPT, when it's looking at just a transcript, talk about empathy? That was something that was big for me. And you know, you could just say, well, we want the agent to say "I'm so sorry to hear that" or "oh my gosh, I can't believe that happened", right? You could do that, and it's pretty generic, because those kinds of conversations can come up in a lot of different instances. But what we have found better is to use a lot of if/then statements when it comes to the more thinking-type questions. So, you know, we'll say something, and let me actually pull the actual prompt up. Give me one second here, I'll pull it up in a second.

Speaker 2:

But basically what we say is: hey, can you look in the transcript to find any instances where the customer seems distressed, where they said something that had a negative sentiment, that was not positive? And then, after you have found that, we want to make sure that the agent isn't using kind of just a basic scripted response, but that they're actually using some words in there that correlate directly to what the customer said. So we're not looking for specific keywords like "the agent must say I'm so sorry to hear that". We got a little general with what could be said by the agent, as long as it kind of correlated back to the actual problem and the agent was actively listening. Again, if you go on my LinkedIn, I think yesterday I posted like these five kind of core prompts. I have the exact full prompt for empathy and what we did there, and it works every single time. So again I would ask you, or implore you, if you don't believe me, take that prompt, go play with it on the desktop, take your call recording. I think that was something that was really cool for us to kind of finally figure out, because we were always trying to do something different. Like, we know we can just say, hey, can you find this in a recording, but how do you take it to the next level to really use the use case that we want? So I think that was interesting.
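The actual empathy prompt is in the LinkedIn post he mentions; as a hedged paraphrase of the two-step, if/then structure described here, the question context might look something like this:

```python
# Hypothetical reconstruction of the empathy question context, not the posted prompt.
EMPATHY_CONTEXT = """
Empathy check:
1. Read the transcript and find any moments where the customer seems distressed,
   frustrated, or says something with negative sentiment.
2. For each of those moments, check whether the agent's response acknowledges the
   specific issue in the agent's own words, showing active listening, rather than
   only repeating a generic scripted line.
Score yes if the agent's responses correlate directly to what the customer said;
score no if the distress was ignored or only met with a canned reply.
"""
```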

Speaker 2:

The other thing, and I'm just kind of all over the board here, these are all random, is: don't tell ChatGPT to tell you when something is not there. Now it can do that, and let me give you an example. But it would get confused a lot when we would say certain things like "please score this with full points if this is a sales call or a retention call, but score it as an NA if it is a password reset call", all right? And ChatGPT would consistently get confused with what was what, even though we've done some things with even selecting what different call types can come in. So I think, for our platform, it doesn't matter.

Speaker 2:

You don't have to have skills set up for, you know, sales, retention, password reset. You could have one skill that comes in, and we have a way to know what type of call it is and then what questions correlate to it. But we were trying to tell it too much information and it would get crazy confused. So what we found is that you don't have to tell it NA. You just have to tell it what it's looking for, and if that stuff's not there, it will score it as an NA, if that makes sense. So, you know, "please only score this if it is a sales or retention call", and then you kind of leave it at that at the end of the prompt. Don't tell it, "and if it's not there, score it as an NA."
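In other words, frame the question positively and let the NA fall out on its own; a hypothetical example of the phrasing that worked versus the phrasing that confused it:

```python
# Works: positive framing only, with no NA branch spelled out.
retention_question = (
    "Please only score this question if the call is a sales or retention call: "
    "did the agent offer the customer a retention or upgrade option? Answer yes or no."
)
# Confused it (per the episode): adding "...but if this is a password reset call,
# score it as an NA" made the model mix up which rule applied to which call type.
```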

Speaker 2:

It got crazy confused, and that was super frustrating, because we're like, no, we're telling it exactly what we want, but it would get confused with that. So that's a tip for there, and I think that's like the analytics, right? When you're looking at advanced speech analytics, it's very easy to find things that are there, but it's more difficult to look for things that aren't, and I think that's maybe a little bit of a crossover for why it gets confused. The other thing that I think is pretty cool is: how do you prompt for the unseen? Meaning an agent has to move to a certain part of their computer screen, they have to get a certain piece of information, they have to click on a box. And I do actually want to pull the prompt up here, give me one second, because this one baffled us for a really long time. And I'm not saying it's perfect, because it's never going to be perfect if it's something that we can't find in the actual transcript, but I think it's pretty darn close, and it has given us kind of the outputs that I think we've wanted on a vast majority of the calls. All right, give me one second, let me pull this bad boy up. I thought I had it up, but I deleted it. All right, let me pull this post up. All right.

Speaker 2:

So for scoring the unseen, and again, this prompt is in that post that I did the other day, we basically ask: what do we know? If an agent has to get some specific information from a specific part of a screen, we know certain things, like there's a promptness in providing that information, right? So if the agent says, "all right, let me, I need to pull that up", or something along those lines, if there's a delay in the actual talking, we can kind of see that, yeah, they're probably not able to find that piece of information quickly. Can they transition between topics, like if there's a big change in topic? And again, if the question is, did the agent read the proper disclosure, or let's say, did the agent find the information for the dishwasher, right? So if the customer is saying "I have a problem with my dishwasher", and there's four seconds, five seconds, six seconds while the agent is trying to find that information, and the question is "did the agent quickly find the information?", we know that's going to be kind of a yes or no. Again, is that perfect? It's not perfect, but I think you kind of get the idea: the transition between topics, you know, confirmation of actions, minimal need for correction. So there's a couple things in that prompt that basically ask how quickly the agent really found this information. Now, things like did they click the box for opt out of email, we actually look for a little bit of a delay. So if that's a question and the agent says, "hey, would you like to opt out of our email?", and the customer says "yes, I do", if the agent says "okay" and they wait like a second, all right, like things like that.
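Pulling those proxy signals together, a hedged sketch of what such an "unseen" question context could look like; the dishwasher example and the wording are illustrative, not the posted prompt:

```python
UNSEEN_CONTEXT = """
Question: did the agent quickly find the information about the customer's dishwasher?
You cannot see the agent's screen, so judge from proxy signals in the transcript:
- Promptness: was the information provided without a long pause or filler such as
  "let me pull that up" followed by several seconds of silence?
- Transitions: did the agent move cleanly between topics without backtracking?
- Confirmation of actions: did the agent confirm what they found or did?
- Minimal need for correction: did the agent have to correct the information later?
Answer yes if these signals indicate the information was found quickly, otherwise no.
"""
```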

Speaker 2:

We've been able to find all of this kind of data that we think gives a pretty good representation of seeing the unseen, and doing the best that we could possibly do without, right now, having AI be able to go onto the actual computer and actually see what we're doing. We kind of talked about one of the things that we had a huge problem with: ChatGPT sometimes, let's say we have 35 questions on a QA form, a lot of times would not return all 35 questions, which is a really big issue, right? And it wasn't just NA questions, it wasn't just yes-or-no questions. There was no real rhyme or reason to why it was not returning all of the questions in our JSON output. We still don't know why that happened, but we use the word "imperative".

Speaker 2:

We've used a lot of different words, but we found that "imperative" worked the best. So we basically said, "it is imperative that you return all of the questions in a JSON format", and, you know, there's more to that, but basically telling it "imperative", we have found, and "explicitly", right, those two words, and I'm sure there's a ton of those words, I'm sure it's not just those two, but those two words definitely have an impact in your prompting when you want it to be exact and to kind of not go off. Now, there were a lot of different ways we could have done that. One of the things we were talking about is, hey, please review how many questions there are at the beginning and make sure that you answer the same amount at the end, those kinds of things. But we found that that made the prompt, or made the QA form, take a little bit too long. So that's kind of the route we went, just one little quick sentence, and it works. It's worked every single time, and we have not had a problem with that since.
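A hypothetical rendering of that one quick sentence in the static prompt, plus the slower alternative they decided against; wording paraphrased from the episode:

```python
JSON_INSTRUCTION = (
    "It is imperative that you return every question on the QA form in the JSON "
    "output, including questions scored as NA. Do not omit any question."
)
# Alternative they considered: have the model count the questions up front and verify
# the same number come back at the end. It worked, but made each form take longer.
```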

Speaker 2:

The other thing that we have found, for accuracy and for speed, is to tell ChatGPT where to look for certain things in a transcript. So for the greeting, we say things like: for this specific client, the agent must say "thank you for calling customer service", please look for that in the first five lines of the transcript. We have found that that has, I don't want to say significantly, reduced the amount of time that it takes for a QA form to come back, but I think it's been more accurate, because it's not looking at everything, and it has been a little bit quicker the more we've implemented those types of things. You can do the same thing for the closing, right, because you're not going to have a closing at the beginning. So why have it read through the entire transcript for all of those things, you know?
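For example, the per-question context can scope the search window; the exact line counts and wording here are illustrative assumptions:

```python
greeting_context = (
    "Look only at the first five lines of the transcript: did the agent say "
    "'Thank you for calling customer service', or a close variation? Answer yes or no."
)
closing_context = (
    "Look only at the last ten lines of the transcript: did the agent offer further "
    "assistance and thank the customer before ending the call? Answer yes or no."
)
```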

Speaker 2:

I really got excited with the thinking questions. The black-and-white, binary "did the agent do this or that" questions, everybody knows that ChatGPT can handle. But I think the nuance, for any of these companies that are going to try to do this, is how they handle the empathy questions: did the agent do something appropriately throughout the call, where it's not just black and white but it takes a thought process across maybe multiple sections of the call. And I think it can be done. I think we've done a really good job with it. The cool thing about this is being able to test it with our actual customers. So pretty much every single customer we have in our BPO is utilizing this now. So our QA department, and I haven't got rid of any QA people or anything like that yet, they're basically scoring calls as human beings and calibrating against it. We're just doing that all day long, all day long, making sure that all of these prompts work.

Speaker 2:

We're now, I don't want to say hands-free, but we're at a point now where, you know, I think the core basic prompts that everybody has, right, everybody's going to have an opening, a closing, they're going to have a greeting, did the agent have proper tone, did they use proper word choices, did they not use um and ah, did they not have diminishing language for the company, like these 10 or 15 things that we know work really well are going to be part of every single kind of onboarding. And then, obviously, you just utilize it and change it and put it into your company's context and add as many questions as you can. But I think writing the book on understanding how to prompt for specific questions, whether it is a thinking question or a binary black-and-white question or something that takes a little bit more thought process, those were the things that I think we feel comfortable about, and that's the magic sauce, right? So that's why I couldn't care less.

Speaker 2:

You could use, you know, all of our prompts. I don't care, because there's going to be certain things that come up that we're going to understand a little bit more. But I also want people to feel comfortable with this technology. I think this should be democratized. You could be a five-seater and just use the desktop version, have one prompt that has everything, and you could just be hammering out calls by yourself for free every single day, and I would love to see that, right? I think that could be one option. Obviously we have, I think, a slicker version of that, and there are a lot of companies that are coming out with slicker versions. This isn't just us. But that's the thought process that kind of goes into it, from understanding how ChatGPT thinks, to get the best result and the most consistent results. And I would say now, again, like I said, all of our QA department is utilizing this for all of our customers. That's kind of our alpha test before we beta. But yeah, I mean, I think that's kind of what I wanted to cover.

Speaker 2:

I'm trying to just look down my list here, to see if there's any other prompt or anything else that I thought was pretty cool. I don't know. Do you guys have any questions? I appreciate everybody kind of joining here. Hopefully this was a little bit insightful, a little bit of how we did it, and I think it's kind of cool. But is there anything? Do you guys have anything, any questions? Just trying to think of things. Like, you know, we didn't really struggle.

Speaker 2:

We found that, I know ChatGPT has kind of a, and again, I'm not a programmer, so I'm going to say this wrong, but they have a way, or a button that you basically click, to guarantee a JSON file output. We found that that was very restricting, so we just prompt for the JSON output in our actual static prompt, and that has worked out much better, and we have a lot of flexibility then to make sure that we're getting the right stuff that we want. I thought one of the things that was really helpful, and this is kind of crazy, but just a quick story: there's a, I forget what her name is, but she won the Singapore national prompting competition, and I was trying to read as much as I could on prompting and kind of how to figure this stuff out, and she had an article on Medium, and at the bottom it was like, hey, if you want to talk to me, it's like 50 bucks for a half hour. So we've utilized her a couple of times at the very beginning, a couple of months ago. That really helped us to understand some of the outputs.
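The "button" he is describing is most likely OpenAI's JSON mode. A minimal sketch of the two approaches, assuming the standard openai Python client and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()

# Option A: enforce JSON at the API level (the approach they found too restricting).
strict = client.chat.completions.create(
    model="gpt-4-turbo",  # placeholder
    response_format={"type": "json_object"},
    messages=[{"role": "user", "content": "Score this call and respond in JSON."}],
)

# Option B: ask for JSON inside the static prompt and keep full control of the shape.
flexible = client.chat.completions.create(
    model="gpt-4-turbo",  # placeholder
    messages=[
        {"role": "system", "content": "Return your answer strictly as JSON matching "
                                      "the scorecard structure described below."},
        {"role": "user", "content": "Score this call."},
    ],
)
```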

Speaker 2:

Understanding, you know, to just look for certain aspects of the transcript, don't read the whole transcript every single time if you know something's at the beginning or at the end, understanding the structure, right, of how ChatGPT's quote unquote mind works. I think all that was extremely helpful when we were going through our prompting. You know, the other aspect, though... oh, Jeremy, yes, let me bring you up, bud. All right, Jeremy, you're muted, but you're up.

Speaker 4:

Hey, thanks, buddy. I joined a little bit late, so I apologize if you spoke about this already and I missed it. I'm just curious if there's anything that you found, from any of your clients, where it's like, you know what, a human still needs to do this part. There's a certain type of process or policy or question where it just doesn't have the needed information, or maybe there's something in the record history that it doesn't have access to, or anything along those lines.

Speaker 2:

Yeah, and I think it just goes back: if we have a client that is very heavy into things that are happening on their computer screen, right, like they have to be in a certain field, they have to make sure that certain things are clicked, we're going to really struggle with that right now. The visual aspect, I mean, we don't have any of that. Not that we couldn't, but right now we're relying totally on the transcript. So I think there's a lot to this, right? Number one, there is a security aspect to this, right? To be perfectly fair, I think, using the API version of ChatGPT, I feel much better on the security aspect than if we were just, well, obviously we would never use just the desktop. But I still think that, from a masking standpoint, from a PCI standpoint, I don't know if I feel comfortable working right now with, you know, financial services clients, to have credit card numbers and all that stuff. Now, I think it probably is totally fine, but again, that's a thought that we would really have to think through. The other thing is, again, I just think:

Speaker 2:

As long as something is in the transcript, we've been able to figure out really unique ways to score that call. The other thing is ratings. It's not as accurate, but I'm starting to feel like it can be an offering, because it's accurate enough, where, you know, some QA forms have like a one-through-five, right, like score this on a scale. So we're looking at that. But I think there are going to be some customers that are nervous from the security aspect, that they're not going to want this, they're going to want a human being to do it. And the other thing is, if you have more than, you know, 20% of your questions that are not in the actual transcript, and it has to be a transactional thing on a computer screen, then we're going to stink at that too. If that kind of answers your question. Yeah, that's great, thank you. All right. All right, Javi, you're up, how you doing, buddy?

Speaker 3:

I'm doing well, Tom. How are you?

Speaker 2:

I'm good, I'm good. Good to talk to you.

Speaker 3:

Yep, thank you for this session, same as Jeremy. I apologize, I joined a little little late, but as a follow-up to Jeremy and also a question for you. So on our side we've been leveraging the world, the chat, gpt API to do some automated quality and I think it's very important, like you mentioned earlier, to add in a lot of context, before you even ask it, the questions that's all related to the QA form, provided the intro and what it is that you're given it, like this is a transcript, so this is a color, this is a chat or whatever. And then within that context also, what we've learned is We've been having to provide it a whole bunch of gap card whales If there is this, do not bring it into your analysis. If there is that, do not bring it into your analysis either, like ignore it or Whatever. And even more than guardrails, we've been having to tell it things like use constructive language, do not use negative terms like mediocre or poor or weak. So we've been having to be very specific with it in terms of Contextualizing as much as possible. So when we do finally ask it the question that is linked to the quality assurance form, it's got all that context before it answers it.

Speaker 3:

In addition to that, to Jeremy's point about what we still have to rely on a quality analyst to do: we've started to tell it, if this conversation is too complex for you to provide us constructive feedback, please flag it so we can have one of our quality analysts look at it. So we're basically telling ChatGPT to help us identify which calls should be reviewed by a human, in order to help provide more analysis and more constructive feedback to the rep, or to the manager of that rep, to improve. So I want to learn from you about all that context that you've been providing, yeah, the guardrails, how did you add them within the logic?
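A hedged sketch of the kind of guardrail block Speaker 3 is describing, with hypothetical wording rather than their actual prompt:

```python
GUARDRAILS = """
Before answering the QA questions:
- You are reviewing a transcript of a call between an agent and a customer.
- Ignore hold music, IVR menus, and any third-party voices in the transcript.
- Use constructive language in all feedback; do not use negative terms such as
  "mediocre", "poor", or "weak".
- If the conversation is too complex or ambiguous for you to give constructive
  feedback, do not guess: flag the call for review by a human quality analyst.
"""
```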

Speaker 2:

So I will tell you this: we did ask for a confidence score with ChatGPT. So we basically said, can you rate the output that you have on a scale of one to ten? And if it was, I forget what we said, this was at the beginning when we started testing the different questions, but if it was like below five, then kind of flag that, because you don't feel comfortable or confident that you could score this call, either from a complexity standpoint or the transcript was garbled, something like that. We also found that, as we were testing, we would ask for a confidence score for every single question, and that also showed us what prompts we were struggling with, and there was kind of a direct correlation.

Speaker 2:

To answer your first question, we really have not found too much of that. Now, our static prompt is basically, we go into it just like a regular deal: you're the head of quality assurance, you oversee scoring for quality, you will define the type of call. We just kind of define what we want our output to be. So one is going to be the call type, so if it's a sales call or a retention call, we want to know that. We basically tell it to give an agent and a customer sentiment score, and we ask it to add that to the JSON output. We talk about the scoring being a number, a yes, a no, or an NA, we kind of go through that. We talk about the outputs of four ways that they did well and four ways that they did poorly, and then we actually ask for the call summary in that as well. But we have not really done any type of guardrails, especially in the summary, and we found it really hasn't, you know, said anything derogatory or poor when it comes to the actual summaries. We do ask for, we're calling it the rationale right now. So when ChatGPT does the summary, if it scores a question as a yes, like it gave full points, we don't say anything. But if it scores it as a no or an NA, then we have like a little question mark next to the question where we can look at that, and it will tell us why it scored it as a no. And a lot of times that will be kind of part of the prompt as well, because we wanted to know what piece of that question it didn't like. But after that, each question has its own, we're calling it context, but the context is just the mini prompt that defines that question, and then we just define how we want the JSON file output to look, and that's how we get the output for each of the call scoring forms.
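Putting the pieces Tom lists together (call type, sentiment, per-question score with a rationale on a no or NA, and the confidence score mentioned earlier), the JSON output might be shaped roughly like this; the field names are assumptions, not OttoQa's actual schema:

```python
example_output = {
    "call_type": "retention",
    "agent_sentiment": "positive",
    "customer_sentiment": "neutral",
    "questions": [
        {"id": 1, "score": "yes"},
        {"id": 2, "score": "no", "rationale": "Agent did not confirm the email address."},
    ],
    "did_well": ["...", "...", "...", "..."],
    "improve_on": ["...", "...", "...", "..."],
    "call_summary": "...",
    "confidence": 7,  # 1 to 10; a low score flags the call for human review
}
```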

Speaker 2:

So, again, from a guardrail standpoint, and I don't know, I've done, I don't want to say a thousand of these, but hundreds upon hundreds of these myself, the call summaries have been pretty much on point with what the call is, black and white. We added in there to please, in the call summary, talk about if the agent did not do something where points were taken off, so that the QA person can read that and look at that. It probably does depend a lot on how complicated and how complex the calls are. I mean, we're talking about BPO, you know, financial services, retail, tech support, those kind of, I don't know, say four-to-ten-minute type calls, and a lot of them are extremely binary, yes-or-no type questions. We have one client that has seven different call types that come in, that all have different scoring and questions that correlate to the different types of calls, and we've been able to kind of figure that out. But yeah, I mean, I guess I really haven't seen too much derogatory language or those types of things with the outputs, but I'm probably going to look out for it maybe a little bit now.

Speaker 2:

You got me freaked out. But yeah, that's kind of how we've at least structured the regular prompt, which is pretty straightforward. I think the meat and potatoes of it, though, is figuring out the context, or the prompting, for each of the questions, to get a proper response, an accurate response, and a consistent response. So I hope that helps you a little bit. All right, guys, well, hey, I don't know, that's really all that I have. I appreciate it. I hope that was helpful. We'll continue to kind of do this. I think it's been interesting to go down this path, and I know there's a lot of you who are interested in this stuff too, and it's a lot of fun to talk with you guys. So again, thank you guys very, very much. If you have any questions, just hit me up. Thanks, guys. TikTok, what's up? Does anybody have any questions on prompting, on AI, on quality assurance? Let me know.

Call Center Geek Podcast
Optimizing ChatGPT for Call Analysis
Utilizing ChatGPT for Automated Quality Assurance
Quality Assurance and Prompting in Calls