Disruption Works Chit Chat

Speech recognition technology: where we have come from and where we are heading.

February 10, 2023 Disruption Works Season 3 Episode 2

In today's podcast we put together a potted history of what speech recognition technology in voice applications used to look like and how it has changed.

Tom and Steve discuss where the technology was and why the industry boom of the late 90s failed. They then move on to the incredible acceleration of the technology, which is now making voicebots and voice applications commonplace and proven.

Finally, they turn to the challenge businesses face: after being burnt by previous investment, why they now need to look at this again or risk losing significant market share to more dynamic companies joining the fray.

Our latest series of podcasts concentrates on voice and how it is going to impact the next few years, with tips along the way. Find out more about voicebots here, and if you have any subjects that you would like us to discuss, email info@disruptionworks.co.uk with the subject Podcast and we will see what we can do ;-)

0:0:13.270 --> 0:0:19.800
 Steve Tomkinson
 Welcome to a new podcast with Steve and Tom. Hello, Tom. How you doing today?

0:0:20.170 --> 0:0:21.750
 tom (gast)
 Thank you, Steve. I'm doing fine. You?

0:0:25.260 --> 0:0:26.390
 tom (gast)
 Yeah, it's cold here too.

0:0:22.50 --> 0:0:27.750
 Steve Tomkinson
 Yeah, not too bad, not too bad. It's cold out there. My God, has it gone cold. Geez.

0:0:27.90 --> 0:0:28.800
 tom (gast)
 And it's cold here too, yeah.

0:0:28.920 --> 0:0:35.670
 Steve Tomkinson
 It is immensely cold. I've got all sorts of jumpers and everything on at the moment, and the heating's on and we can't afford the heating, so.

0:0:40.140 --> 0:0:40.890
 tom (gast)
 Yeah.

0:0:43.870 --> 0:0:44.100
 tom (gast)
 Yeah.

0:0:38.30 --> 0:0:45.600
 Steve Tomkinson
 So what can you do? I'll just start lighting a small fire in my living room, won't I? Oh dear.

0:0:51.240 --> 0:0:51.780
 Steve Tomkinson
 Uh, yeah.

0:0:44.900 --> 0:0:56.370
 tom (gast)
 Well, that's the advantage we have, because we are living in a high-end apartment and everything is very, very well controlled. So we have floor heating and very well-insulated windows. So it's really very good.

0:0:57.570 --> 0:0:58.890
 tom (gast)
 I remember this four.

0:0:55.170 --> 0:1:2.300
 Steve Tomkinson
 Yeah. Are you alright? No, no, no. I'm gonna get me camping cooker out in a minute and stick that on.

0:1:3.920 --> 0:1:5.370
 Steve Tomkinson
 The other one, the other one.

0:1:5.90 --> 0:1:6.900
 tom (gast)
 Everything you do to get a little bit cozy.

0:1:15.230 --> 0:1:15.420
 tom (gast)
 Yep.

0:1:8.610 --> 0:1:38.430
 Steve Tomkinson
 Absolutely, absolutely. So, well, anyway, thanks for joining us today, Tom. And I suppose what we're gonna talk about today is voice, but also kind of reviewing where we've been over the last 25 years that this has been around. I mean, that's a long time in technology, and, you know, as all these things do, they change over time.

0:1:38.630 --> 0:1:47.390
 Steve Tomkinson
 And I suppose what we were thinking talking about today was kind of going from where it all started to, well, what's gonna happen next and.

0:1:47.470 --> 0:1:54.280
 Steve Tomkinson
 Um, and I suppose over to you: where did this all start, Tom? You know, where do we start with voice technology?

0:2:0.400 --> 0:2:0.810
 Steve Tomkinson
 Yeah.

0:2:7.610 --> 0:2:8.80
 Steve Tomkinson
 Right.

0:1:54.820 --> 0:2:13.130
 tom (gast)
 Yeah. The very first development of voice technology was already in the late 80s, of course, but the really substantial development of voice was at the end of the 90s, when we really were in a position that we could use it as a solution, because the idea was:

0:2:20.80 --> 0:2:20.450
 Steve Tomkinson
 Yeah.

0:2:14.310 --> 0:2:20.810
 tom (gast)
 why not try to integrate the most natural interface we have, and that is speech, instead of

0:2:29.150 --> 0:2:29.530
 Steve Tomkinson
 Yeah.

0:2:45.60 --> 0:2:45.490
 Steve Tomkinson
 Yeah.

0:2:22.280 --> 0:2:47.990
 tom (gast)
 having to learn how to control an apparatus or how to touch certain buttons in a certain environment. The thinking was that when we were able to use speech, that would be a great advantage. It would be very user-friendly. It would be very intuitive. It would be very easy. And so in the 90s, when we were able to really do some substantial speech recognition, the idea was: this is going to boom. This is going to be

0:2:48.390 --> 0:2:53.230
 tom (gast)
 a revolution in the industry, in the industrial world.

0:2:53.580 --> 0:3:1.610
 Steve Tomkinson
 Yeah. Well, so I suppose speech recognition wasn't nearly as good then, but it had just started, hadn't it? So, you know.

0:3:8.870 --> 0:3:9.280
 Steve Tomkinson
 Right.

0:3:18.930 --> 0:3:19.290
 Steve Tomkinson
 Yeah.

0:3:1.240 --> 0:3:24.170
 tom (gast)
 Yes. And then you have to think about the early days. Speech recognition was mainly one-word recognition. So you've got these boundaries: you could not design the way we do today, with a lot of open-ended prompting in the sense of "How may I help you?" and then maybe defined domains. We'll get back to that later on. But in the old days, like 25 years ago,

0:3:25.190 --> 0:3:29.400
 tom (gast)
 we were talking about designs that would sound like:

0:3:38.70 --> 0:3:38.370
 Steve Tomkinson
 Yeah.

0:3:29.940 --> 0:3:43.250
 tom (gast)
 "If you want to rent a car today, say 'today'. If you want to rent a car tomorrow, say 'tomorrow'." And if you would say something like, "Well, I would like to rent a car today," then the machine would get back to you: "Sorry, I didn't understand that."
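
A minimal sketch of the behaviour Tom describes, not taken from any real product: a strict one-word grammar rejects a full sentence even when it contains the expected word, while simple keyword spotting accepts it. The function names and the two-word grammar are assumptions made up for illustration.

```python
import re
from typing import Optional

# Hypothetical two-word grammar from the car rental example.
GRAMMAR = {"today", "tomorrow"}


def strict_match(utterance: str) -> Optional[str]:
    """Old-style matching: the whole utterance must be exactly one grammar word."""
    word = utterance.strip().lower()
    return word if word in GRAMMAR else None


def keyword_spot(utterance: str) -> Optional[str]:
    """More tolerant matching: accept if exactly one grammar word occurs anywhere."""
    words = re.findall(r"[a-z']+", utterance.lower())
    hits = {w for w in words if w in GRAMMAR}
    # Zero hits or conflicting hits both count as "not understood".
    return hits.pop() if len(hits) == 1 else None


if __name__ == "__main__":
    sentence = "Well, I would like to rent a car today"
    print(strict_match(sentence))   # rejected by the one-word grammar
    print(keyword_spot(sentence))   # the word "today" is still found
```

Keyword spotting is only one small step up from the strict grammar; it is shown here because it makes the caller's experience in the anecdote easy to reproduce.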

0:3:44.70 --> 0:3:45.690
 tom (gast)
 That it was just, yeah.

0:3:48.830 --> 0:3:49.50
 tom (gast)
 Yeah.

0:3:52.510 --> 0:3:52.770
 tom (gast)
 Yeah.

0:3:44.200 --> 0:3:55.260
 Steve Tomkinson
 But do you know what? I still get those. There are still those knocking about that go, "Say today, say tomorrow," and you just go: really? Is that what we're doing, then?

0:3:58.260 --> 0:3:58.570
 Steve Tomkinson
 Yeah.

0:4:4.440 --> 0:4:4.660
 Steve Tomkinson
 Yeah.

0:4:9.960 --> 0:4:10.260
 Steve Tomkinson
 Yeah.

0:3:54.340 --> 0:4:14.750
 tom (gast)
 Yesterday I called my health insurance with some questions about my daughter. And then it says, "Please give the date of birth." And I gave the date of birth, and it said, "Let me see if I got that right." Then the system repeats the date of birth and says, "Am I right? Please answer the question only with a yes or a no."

0:4:15.80 --> 0:4:20.100
 Steve Tomkinson
 Ohh no, really. Wow. Wow good grief.

0:4:19.70 --> 0:4:28.700
 tom (gast)
 Still what I was mentioning to you, right? It's still very, very simplistic. They do not do any grammar design to allow the caller to say a little bit more than only a yes or a no.

0:4:31.920 --> 0:4:35.660
 tom (gast)
 Yeah, correct. Yeah, that's that's exactly, yeah.

0:4:39.760 --> 0:4:41.80
 tom (gast)
 Yeah, no problem.

0:4:47.560 --> 0:4:47.860
 tom (gast)
 Yes.

0:4:50.480 --> 0:4:50.790
 tom (gast)
 Yes.

0:4:56.660 --> 0:4:57.40
 tom (gast)
 Well.

0:4:28.470 --> 0:4:57.760
 Steve Tomkinson
 Ah, right. Because you go, "Yeah, that's right," you know, something like that, because that's how you would answer somebody else putting it in that format. Yeah. No, it's funny, I'm still getting those. But anyway, sorry, I interrupted you. So I suppose the next thing is that we did have a boom in this. A lot of people invested in this, didn't they? I mean, we've had booms in technology, and that was one of them. So why did it not really take off during that boom period?

0:5:8.40 --> 0:5:8.330
 Steve Tomkinson
 Yeah.

0:5:10.740 --> 0:5:11.0
 Steve Tomkinson
 Yeah.

0:5:18.650 --> 0:5:18.950
 Steve Tomkinson
 Yeah.

0:5:22.920 --> 0:5:23.220
 Steve Tomkinson
 Yeah.

0:4:58.500 --> 0:5:28.660
 tom (gast)
 There are a few reasons, I guess. First of all, all those companies popped out of the ground claiming that they were able to do speech recognition, and formerly they were web design companies. The idea going around was that going from a website to a talking machine is just a few buttons, which was of course not exactly right. So that was one thing: a lot of people claimed to

0:5:32.160 --> 0:5:32.520
 Steve Tomkinson
 Yeah.

0:5:28.760 --> 0:5:37.180
 tom (gast)
 be able to work with speech recognition, but they couldn't, because they didn't have the skill sets. And secondly, there was a,

0:5:38.140 --> 0:5:40.560
 tom (gast)
 The the industrial world had a really.

0:5:47.680 --> 0:5:48.520
 Steve Tomkinson
 Yeah, yeah, yeah.

0:5:55.300 --> 0:5:56.400
 Steve Tomkinson
 Yeah, yeah.

0:6:2.880 --> 0:6:3.690
 Steve Tomkinson
 Yeah, yeah.

0:5:41.500 --> 0:6:12.170
 tom (gast)
 how do you say, the expectation was very, very immense. I mean, they all thought that this is going to help us in our call center, this is going to help us in any kind of simple automation; when people can deal with the system with speech, it would be great, great, great. One thing we forgot: we were still in the world of one-word recognition, and also companies like IBM and Nuance all had their own language models and their own grammar books.

0:6:12.520 --> 0:6:12.860
 Steve Tomkinson
 Yeah.

0:6:12.570 --> 0:6:19.80
 tom (gast)
 And it was not based on something we call the cloud today. It had nothing to do with that; it was limited.

0:6:20.80 --> 0:6:26.190
 tom (gast)
 And so we were very much forced into a framework where we were

0:6:27.190 --> 0:6:28.0
 tom (gast)
 Not able to.

0:6:28.740 --> 0:6:31.230
 tom (gast)
 compose a conversational style.

0:6:31.960 --> 0:6:32.450
 tom (gast)
 So.

0:6:40.270 --> 0:6:40.670
 tom (gast)
 Exactly.

0:6:45.420 --> 0:6:46.650
 tom (gast)
 Correct. So yeah.

0:6:31.840 --> 0:6:51.110
 Steve Tomkinson
 No. So it was still single words, and I suppose there wasn't nearly the same training data, or processing power around that training data, then to be able to make the technology work. You know, I remember a vendor over here doing something like this, and they were trying to do it, and this was,

0:6:52.250 --> 0:7:0.590
 Steve Tomkinson
 for anybody listening to this from the UK, the cinema chain Odeon. They were trying to do this, and you were booking, like, a telephone

0:7:10.310 --> 0:7:10.660
 tom (gast)
 Yeah.

0:7:20.410 --> 0:7:20.690
 tom (gast)
 Yeah.

0:7:1.10 --> 0:7:25.620
 Steve Tomkinson
 uh, cinema ticket, and they were asking you which cinema you wanted to go to, and it would never, ever get it right. It was so bad, and it was just like, why did you bother? How did you not test this, or know that this isn't working? You know, because it just didn't, and it was single-word stuff again. But that was quite a complicated process, actually, that they were trying to simplify.

0:7:26.140 --> 0:7:38.560
 Steve Tomkinson
 And and and we didn't have apps then and we didn't have anything else. It was gonna be a phone call if you wanted to do it remotely. The website wasn't very good and all that type of stuff. So it would have been a great solution for them.

0:7:39.220 --> 0:7:39.580
 tom (gast)
 Yeah.

0:7:39.250 --> 0:7:41.940
 Steve Tomkinson
 But the technology wasn't there to support it, you know.

0:7:50.600 --> 0:7:50.880
 Steve Tomkinson
 The.

0:7:41.390 --> 0:7:50.920
 tom (gast)
 No, that's right. Because that reminds me of an airplane, sorry, aircraft company that

0:7:51.680 --> 0:7:57.510
 tom (gast)
 wanted to automate with voice technology as well, and then you had this problem with Newark and New York.

0:7:58.10 --> 0:7:59.80
 Steve Tomkinson
 Uh, right? Chat.

0:8:2.700 --> 0:8:2.970
 Steve Tomkinson
 Yeah.

0:8:7.190 --> 0:8:7.510
 Steve Tomkinson
 Yeah.

0:8:10.370 --> 0:8:10.700
 Steve Tomkinson
 Yeah.

0:7:58.180 --> 0:8:19.730
 tom (gast)
 So don't forget, it's not only the limited vocabulary, but also the accuracy of speech recognition in itself. The accuracy was not that high yet. So we were happy when we had 0.7 accuracy, and now we are not happy when it's below 0.97.
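
To give a feel for why the jump from 0.7 to 0.97 matters so much, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the figures are per-turn recognition accuracy, that a call needs five turns, and that errors on each turn are independent; none of these assumptions come from the conversation.

```python
def dialogue_success(per_turn_accuracy: float, turns: int) -> float:
    """Probability that every turn is recognised, assuming independent turns."""
    return per_turn_accuracy ** turns


if __name__ == "__main__":
    for accuracy in (0.70, 0.97):
        chance = dialogue_success(accuracy, turns=5)
        print(f"{accuracy:.2f} per turn -> {chance:.1%} chance of an error-free 5-turn call")
```

Under these assumptions, a five-turn call goes through without a single misrecognition roughly 17% of the time at 0.7, against roughly 86% of the time at 0.97.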

0:8:20.260 --> 0:8:21.120
 Steve Tomkinson
 Yeah, yeah.

0:8:20.990 --> 0:8:22.670
 tom (gast)
 That's a big difference. So the.

0:8:23.970 --> 0:8:24.180
 tom (gast)
 Yeah.

0:8:33.210 --> 0:8:33.870
 tom (gast)
 That's perfect.

0:8:35.360 --> 0:8:35.730
 tom (gast)
 Yeah.

0:8:22.220 --> 0:8:37.170
 Steve Tomkinson
 Can I just stop you there, Tom? I was just looking at the transcript, because obviously we're recording this, and Microsoft managed to distinguish between Newark and New York, so the technology has moved on. So that's always encouraging, isn't it?

0:8:47.30 --> 0:8:47.300
 Steve Tomkinson
 Yeah.

0:8:37.20 --> 0:8:52.910
 tom (gast)
 Yeah, that's good. Yeah, but today the accuracy is so much better than 20 years ago. So that's also one of the reasons, because everybody had great expectations. And then we started to industrialize some applications out there, and of course they did not live up to those expectations.

0:8:52.500 --> 0:8:55.700
 Steve Tomkinson
 And I suppose that was the same as the web boom as well, because.

0:9:8.630 --> 0:9:8.950
 tom (gast)
 Yes.

0:9:16.140 --> 0:9:16.640
 tom (gast)
 That's good.

0:8:56.420 --> 0:9:26.170
 Steve Tomkinson
 Not that people weren't ready, because this time the consumer was probably ready to use voice. They were fine with it, but I would say that the technology wasn't ready then. Whereas web was different: the consumer was ready to engage, but their own tech at home wasn't ready to do web properly. The Internet was very slow, so you couldn't access websites without waiting for an age over old telephone lines and things, and

0:9:26.820 --> 0:9:27.170
 tom (gast)
 That's good.

0:9:26.370 --> 0:9:40.800
 Steve Tomkinson
 And this is a different type of technology not actually keeping up. But, you know, I suppose it kind of brings us up to where we are today. So, you know, what is the ultimate difference today? You know, where we are now.

0:9:39.610 --> 0:9:47.370
 tom (gast)
 The ultimate difference is very, very clear to me. Let's go back 15 years or 10 years, it doesn't matter.

0:9:47.130 --> 0:9:47.430
 Steve Tomkinson
 Yeah.

0:9:56.570 --> 0:9:56.860
 Steve Tomkinson
 Yeah.

0:10:4.910 --> 0:10:5.180
 Steve Tomkinson
 Yeah.

0:10:11.410 --> 0:10:11.780
 Steve Tomkinson
 Yeah.

0:10:17.750 --> 0:10:18.150
 Steve Tomkinson
 Yeah.

0:10:24.670 --> 0:10:25.360
 Steve Tomkinson
 Yes. Yeah.

0:10:5.240 --> 0:10:36.610
 tom (gast)
 You were using your speech or your voice in a totally unnatural way. So the idea was: let's integrate voice so we can use a natural interface, the way we speak. But when it was integrated in the early days, we were not able to speak naturally, because we couldn't. And so the expectations were high, but were not lived up to, because when you wanted to use a speech-recognition-based application, you had to talk differently.

0:10:37.50 --> 0:10:37.430
 Steve Tomkinson
 Yeah.

0:10:39.450 --> 0:10:42.380
 Steve Tomkinson
 Yeah, you're actually the robot in this relationship.

0:10:50.860 --> 0:10:51.170
 Steve Tomkinson
 Yeah.

0:10:36.970 --> 0:10:57.150
 tom (gast)
 You had to talk like a robot. Exactly. And the big difference today is that we have the glorious solution with the cloud. I mean, the Google or Microsoft clouds have such a well-tuned API

0:10:59.470 --> 0:10:59.810
 Steve Tomkinson
 Yeah.

0:11:3.730 --> 0:11:4.60
 Steve Tomkinson
 Yeah.

0:11:13.490 --> 0:11:13.850
 Steve Tomkinson
 Yeah.

0:10:58.270 --> 0:11:27.140
 tom (gast)
 formula, and so what you get back is almost 99% accurate, which also allows us to make a totally different design, truly based on a conversation and a conversational style. So: "How may I help you?" "No, that's not correct, let me try to do something, blah." Making an appointment: "Does tomorrow afternoon suit you?" "No, sorry, I can't make it tomorrow afternoon." "Let's see for another day." This is really a

0:11:27.240 --> 0:11:39.790
 tom (gast)
 black and white difference from 15 years ago. Speech recognition worked, but we could not work with it in the way we are used to using our speech when we are talking to each other on the phone, for instance.

0:11:46.770 --> 0:11:47.120
 tom (gast)
 Yes.

0:11:59.390 --> 0:11:59.560
 tom (gast)
 Yeah.

0:11:40.0 --> 0:12:1.800
 Steve Tomkinson
 Yeah. Yeah, that's right. And that's it, isn't it? Because the whole point is that it's natural, you know. There's gonna be a slightly funneled process in some regard, because a business is usually offering a service, and that service has got 10 arms to it, you know, so you're gonna be in some sort of conversation.

0:12:2.500 --> 0:12:9.70
 Steve Tomkinson
 But it doesn't mean that it has to be very robotic. It can be very natural in its process, you know.

0:12:18.300 --> 0:12:19.110
 Steve Tomkinson
 Yeah, yeah.

0:12:8.700 --> 0:12:26.280
 tom (gast)
 Yeah. That's correct. Yeah, that's correct. Because suppose you're making an application that allows the public to make an appointment with the hospital, for instance. Then you will not ask for a Coca-Cola, will you? So we're talking, exactly. So we're talking about.

0:12:22.80 --> 0:12:29.10
 Steve Tomkinson
 No, that's right. But it would be rare. It'd be rare. You might. You might do, you might do, but it'd be rare. Yeah, yeah.

0:12:33.10 --> 0:12:34.400
 Steve Tomkinson
 No. Again. Again, very right.

0:12:41.560 --> 0:12:41.820
 Steve Tomkinson
 Yeah.

0:12:45.550 --> 0:12:45.880
 Steve Tomkinson
 Yeah.

0:12:47.60 --> 0:12:47.320
 Steve Tomkinson
 Yeah.

0:12:28.450 --> 0:12:57.20
 tom (gast)
 But it would be very, yeah. Or you will not rent a car, for instance. So we're talking about domains. If we're talking about an application that allows people to make an appointment with a hospital, there are certain domains, certain words, certain phrases that people might use or will use when they make an appointment. So a thorough analysis will give us a very clear picture of what the words are that people use when they make an appointment. And so

0:12:56.720 --> 0:12:57.50
 Steve Tomkinson
 Yeah.

0:13:0.890 --> 0:13:1.220
 Steve Tomkinson
 Yeah.

0:13:6.130 --> 0:13:6.440
 Steve Tomkinson
 Yeah.

0:12:57.110 --> 0:13:16.780
 tom (gast)
 those words are then easily integrated in a domain, and then we can work with intelligent speech recognition and an intelligent conversational style. But of course you have to define the domain. Of course you have to do some monitoring in order to understand the dynamics in such a discussion, when people call the hospital to make an appointment.
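
The idea of deriving a domain from what callers actually say can be sketched very roughly as below. The sample calls, the stopword list and both function names are hypothetical, invented for this illustration; a real analysis would use far more recordings and a proper language-understanding component rather than plain word counts.

```python
import re
from collections import Counter

# Hypothetical transcripts of callers making hospital appointments.
SAMPLE_CALLS = [
    "I would like to make an appointment with the cardiologist",
    "Can I reschedule my appointment to tomorrow afternoon",
    "I need to cancel my appointment on Friday",
]

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"i", "to", "the", "a", "an", "my", "on", "with", "can", "would", "like", "need"}


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_domain(calls, min_count=1):
    """Collect the content words that occur at least min_count times in the calls."""
    counts = Counter(t for call in calls for t in tokens(call) if t not in STOPWORDS)
    return {word for word, count in counts.items() if count >= min_count}


def domain_coverage(utterance, domain):
    """Fraction of the utterance's content words that belong to the domain."""
    content = [t for t in tokens(utterance) if t not in STOPWORDS]
    if not content:
        return 0.0
    return sum(t in domain for t in content) / len(content)


if __name__ == "__main__":
    domain = build_domain(SAMPLE_CALLS)
    print(domain_coverage("I need to make an appointment tomorrow", domain))
    print(domain_coverage("I would like a Coca-Cola", domain))
```

A low coverage score is one simple signal that a caller has wandered outside the domain, like the Coca-Cola request in the hospital example.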

0:13:17.280 --> 0:13:18.0
 Steve Tomkinson
 Yeah, sure.

0:13:18.700 --> 0:13:23.150
 Steve Tomkinson
 I I suppose so. We're doing that now and that's the that's.

0:13:23.860 --> 0:13:33.150
 Steve Tomkinson
 That's what is available today. And I mean, you know, we're talking to people and they don't know that this is available. And I think that's the the challenge we've now got is that.

0:13:46.220 --> 0:13:46.480
 tom (gast)
 Yeah.

0:13:48.730 --> 0:13:49.140
 tom (gast)
 Yeah.

0:13:55.840 --> 0:13:56.60
 tom (gast)
 No.

0:13:56.710 --> 0:13:57.240
 tom (gast)
 There's one way.

0:13:58.330 --> 0:13:59.780
 tom (gast)
 Exactly. Exactly.

0:13:33.820 --> 0:14:5.270
 Steve Tomkinson
 People have not got an idea that they can do this really effectively. You know, like we were joking at the start, I still get those single-word menu items, or even, you know, press 1, press 2, press 3, and they're labeled as IVR, which is interactive voice response, and stuff like that. And you go, well, it's not interactive. It's not a voice response. There is no conversation going on, you know, and it's just extraordinary that that is an acceptable technology now.

0:14:5.430 --> 0:14:11.220
 Steve Tomkinson
 It just doesn't have to be. You know, today it's moved on so much.

0:14:11.930 --> 0:14:12.320
 Steve Tomkinson
 You know.

0:14:10.110 --> 0:14:13.390
 tom (gast)
 You're quite right that you're quite right, I mean.

0:14:14.230 --> 0:14:18.480
 tom (gast)
 Yeah, the technology has been has been so.

0:14:20.180 --> 0:14:31.840
 tom (gast)
 has been improved. I mean, the technology has been developing so, so immensely. And today we can have somebody stammering and still understand what he or she says.

0:14:32.320 --> 0:14:33.740
 tom (gast)
 And why?

0:14:41.140 --> 0:14:41.550
 Steve Tomkinson
 Yeah.

0:14:34.350 --> 0:14:51.100
 tom (gast)
 In the old days, when somebody did that, you got "Sorry, I don't understand you" directly. Today we have filters. We have strategies to work with accents, with crazy synonyms, with people who pause when they talk.

0:14:52.410 --> 0:14:57.990
 tom (gast)
 Would you like to make an appointment now, or shall I? Shall I? Shall I call you back? Wait. Well.

0:14:59.490 --> 0:15:3.290
 tom (gast)
 And the system still gets that, still works with it, you know.

0:15:1.800 --> 0:15:3.440
 Steve Tomkinson
 Yeah. No, no.

0:15:3.770 --> 0:15:17.570
 tom (gast)
 And also, we can even touch on this interesting detail: with speech recognition and side systems we can even determine if somebody's lying or not.

0:15:19.430 --> 0:15:19.750
 tom (gast)
 Umm.

0:15:21.820 --> 0:15:22.720
 tom (gast)
 Exactly. Exactly.

0:15:17.980 --> 0:15:24.970
 Steve Tomkinson
 Yeah. Yeah, that's right. Well, you've got the biometric stuff now so that you can get sentiment analysis.

0:15:26.450 --> 0:15:26.900
 Steve Tomkinson
 I mean.

0:15:25.520 --> 0:15:27.490
 tom (gast)
 Age, gender.

0:15:28.100 --> 0:15:43.850
 Steve Tomkinson
 Yeah, I mean, you know, there's even the stuff that's really interesting now, which I kind of like, which is that you can do a voice identity for an individual and then pull them back. You know, so if this is a regular scenario that somebody phones you on,

0:15:51.760 --> 0:15:52.140
 tom (gast)
 2nd.

0:16:4.930 --> 0:16:5.160
 tom (gast)
 Yeah.

0:15:44.400 --> 0:16:8.820
 Steve Tomkinson
 then you can actually answer, "Hello, Bob," because you now know who they are, because you've got their voice ID, you know. And that's really good, you know, both from a security point of view, much better security, and also from just a nice customer journey. It's like the bot recognizes Bob, but it also recognizes 10,000 other customers. Yeah.

0:16:8.520 --> 0:16:14.60
 tom (gast)
 Now that's true, and that for for the people who listen to this podcast, I mean, for your information.

0:16:14.510 --> 0:16:16.590
 tom (gast)
 a voice print

0:16:17.340 --> 0:16:20.160
 tom (gast)
 Is about 40,000 times more.

0:16:20.760 --> 0:16:23.800
 tom (gast)
 individual and,

0:16:27.820 --> 0:16:28.130
 Steve Tomkinson
 Yeah.

0:16:24.380 --> 0:16:31.750
 tom (gast)
 how do you say this, detailed, compared to iris scan and fingerprint together.

0:16:31.480 --> 0:16:34.40
 Steve Tomkinson
 Yeah, yeah, yeah. Together. Wow.

0:16:32.480 --> 0:16:38.630
 tom (gast)
 So I mean, a voice print is really a good way to do a safe system, so to speak.

0:16:38.760 --> 0:16:44.910
 Steve Tomkinson
 Yeah. Yeah. And that's really important. I mean, look, banks have gotta take that as a, as a thing, you know, can you?

0:16:53.730 --> 0:16:54.100
 tom (gast)
 Yes.

0:16:46.950 --> 0:16:55.360
 Steve Tomkinson
 It it you know if you want to access your bank and you've got a voice print, then wow, that's that's gonna be huge. You know, it's gonna be very difficult for you to.

0:17:6.840 --> 0:17:7.970
 tom (gast)
 That's that's yeah.

0:16:56.0 --> 0:17:14.0
 Steve Tomkinson
 To mimic that, especially if it's kind of random sentences or random stuff that you're gonna look at rather than it being, you know, a set recorded voice or something like that, that could possibly get around it, you know, and now it's a very, it's a very key thing that you know.

0:17:17.970 --> 0:17:18.300
 Steve Tomkinson
 Yeah.

0:17:13.740 --> 0:17:19.980
 tom (gast)
 Yeah. So the combined voice print with a personal PIN code or whatever, it's very solid.
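
A minimal sketch of combining the two factors mentioned here, under loud assumptions: the voice prints are taken to be plain numeric feature vectors produced by some speaker-recognition model that is not shown, the 0.85 threshold is arbitrary, and `verify_caller` is a hypothetical name. A production system would also store the PIN hashed rather than in plain text.

```python
import hmac
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_caller(enrolled_print, live_print, enrolled_pin, spoken_pin, threshold=0.85):
    """Accept the caller only if BOTH the voice print and the PIN match."""
    voice_ok = cosine_similarity(enrolled_print, live_print) >= threshold
    # compare_digest avoids leaking information through comparison timing.
    pin_ok = hmac.compare_digest(enrolled_pin.encode(), spoken_pin.encode())
    return voice_ok and pin_ok


if __name__ == "__main__":
    enrolled = [0.90, 0.10, 0.40]   # hypothetical stored voice print
    same_speaker = [0.88, 0.12, 0.41]
    impostor = [0.10, 0.90, 0.20]
    print(verify_caller(enrolled, same_speaker, "4821", "4821"))
    print(verify_caller(enrolled, impostor, "4821", "4821"))
```

Requiring both checks means a stolen PIN alone, or a voice that merely sounds similar, is not enough on its own.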

0:17:20.290 --> 0:17:22.100
 tom (gast)
 So yeah, yeah.

0:17:19.540 --> 0:17:28.570
 Steve Tomkinson
 Yeah, really solid, isn't it? Yeah. And I suppose that's starting to bring us on to what's happening tomorrow. Then I suppose that's the point, is that

0:17:29.360 --> 0:17:40.550
 Steve Tomkinson
 Tomorrow looks different again. You know, we've got developments now in neural TTS and stuff like that, which is you know very nice, you know sounds really quite good.

0:17:41.210 --> 0:17:41.540
 tom (gast)
 Yes.

0:17:41.170 --> 0:17:49.280
 Steve Tomkinson
 um, close to, you know, just a human voice, essentially. You know, the intonation's getting good and all that type of thing.

0:17:57.860 --> 0:17:58.240
 Steve Tomkinson
 Yeah.

0:18:5.120 --> 0:18:5.460
 Steve Tomkinson
 Yeah.

0:18:13.240 --> 0:18:13.590
 Steve Tomkinson
 Yeah.

0:17:49.700 --> 0:18:18.480
 tom (gast)
 That's right. I mean, you see that in particular in countries where they are already more or less accustomed to speech, like in the United States, also in the UK, a little more maybe in Germany. But those TTS machines are incredibly well tuned, and it's also developing. I mean, you have tools already to influence the intonation, the tempo, the speech rate. You can make a voice more happy, you can make a voice more serious.
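
One common way to steer tempo and pitch is SSML, the W3C Speech Synthesis Markup Language, whose `prosody` element carries `rate` and `pitch` attributes. The sketch below only builds the markup; `ssml_prompt` is a hypothetical helper, no particular TTS engine is assumed, and engines differ in which attributes and values they require and honour.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_prompt(text, rate="medium", pitch="medium"):
    """Wrap a prompt in SSML with the given speaking rate and pitch."""
    return (
        "<speak>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"   # escape &, < and > so the markup stays well-formed
        "</prosody>"
        "</speak>"
    )


if __name__ == "__main__":
    # A slower, lower delivery for a more serious-sounding prompt.
    print(ssml_prompt("Your appointment is confirmed.", rate="slow", pitch="low"))
```

The resulting string would then be passed to whichever TTS service is in use, in place of plain text.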

0:18:18.860 --> 0:18:19.280
 Steve Tomkinson
 Yeah.

0:18:24.470 --> 0:18:25.120
 Steve Tomkinson
 Yeah, yeah.

0:18:19.650 --> 0:18:29.140
 tom (gast)
 Incredible. You have the toolbox, you can just work with it and sort of customize your voice. The quality of the TTS and the recordings made to.

0:18:34.800 --> 0:18:35.180
 Steve Tomkinson
 Yeah.

0:18:41.30 --> 0:18:41.390
 Steve Tomkinson
 Yeah.

0:18:44.850 --> 0:18:45.120
 Steve Tomkinson
 Yeah.

0:18:29.220 --> 0:18:49.730
 tom (gast)
 create that quality is amazing today. It's really good. Of course, there will always be a difference between really prerecorded prompts with a voice artist in the studio compared to TTS, but still the development is incredible. And yeah, now you see already, interestingly, last week I saw on television

0:18:50.380 --> 0:18:55.810
 tom (gast)
 Yeah, this context sensitive way of writing a.

0:19:1.520 --> 0:19:1.870
 Steve Tomkinson
 Yeah.

0:18:56.620 --> 0:19:2.340
 tom (gast)
 Just note or a Twitter Twitter message that.

0:19:8.260 --> 0:19:8.630
 Steve Tomkinson
 Yeah.

0:19:3.330 --> 0:19:11.380
 tom (gast)
 the cloud is already able to see the context of what you're writing, and then can react in context.

0:19:12.20 --> 0:19:12.210
 tom (gast)
 No.

0:19:11.810 --> 0:19:12.790
 Steve Tomkinson
 Yes. Yeah.

0:19:12.880 --> 0:19:22.290
 tom (gast)
 You can even give a certain system the assignment: write me a little story about a little sailing ship.

0:19:22.870 --> 0:19:23.480
 Steve Tomkinson
 Yeah, yeah.

0:19:23.370 --> 0:19:27.80
 tom (gast)
 And the thing starts writing and there comes a solid story.

0:19:25.320 --> 0:19:30.990
 Steve Tomkinson
 Yeah, well, you're talking about ChatGPT, aren't you?

0:19:32.230 --> 0:19:32.560
 Steve Tomkinson
 Yeah.

0:19:30.360 --> 0:19:40.630
 tom (gast)
 Exactly. Exactly. It will not be that long before that goes over to voice, will affect voice as well.

0:19:55.470 --> 0:19:56.820
 tom (gast)
 It's not not only a none.

0:19:40.340 --> 0:19:59.70
 Steve Tomkinson
 Well, I was looking at it. So I've used that and I've been playing around with it for a while, and, you know, there's a lot of news about it not quite giving you the right answers back, but being very authoritative in it. You know, well, that's slightly different.

0:20:0.670 --> 0:20:13.940
 Steve Tomkinson
 But actually a lot of people have missed the point, in the fact that it gave the answer back in an authoritative, very nicely grammatically correct

0:20:16.490 --> 0:20:16.700
 tom (gast)
 Umm.

0:20:38.450 --> 0:20:38.880
 tom (gast)
 Yes.

0:20:14.680 --> 0:20:45.330
 Steve Tomkinson
 uh, format. Like you said, it wrote an article. Now there's loads of people going, "I wrote this article with ChatGPT," and you're going, "Yeah, great. OK. So did everybody else." You know, and that sort of thing. But it is impressive that that is the way it's forming its answers, and that then leads on to a voice thing. So like you said, it gets context, it gets what you're after. It might not necessarily answer the question correctly, but if it starts becoming,

0:20:45.390 --> 0:20:53.450
 Steve Tomkinson
 Or there is an ability to make it a subject matter expert in a specific vertical or personal to a business?

0:20:58.610 --> 0:20:59.40
 tom (gast)
 Yes.

0:21:8.370 --> 0:21:9.450
 tom (gast)
 It's it's. That's right.

0:20:54.100 --> 0:21:11.320
 Steve Tomkinson
 then that's a massively powerful thing, and I think that's the next step for that. And then of course you've got very natural responses. You stick a very nice TTS on the end and you've got a dynamic voice environment, which is huge, you know. That's very, very good, you know.

0:21:11.0 --> 0:21:18.310
 tom (gast)
 Well, you know, I was thinking, if you have the context and you have a sort of,

0:21:20.370 --> 0:21:27.720
 tom (gast)
 yeah, if you have a broader horizon for analyzing the context. I mean, the question is like, "Can you have a look at my balance?"

0:21:28.200 --> 0:21:28.670
 Steve Tomkinson
 Yeah.

0:21:28.420 --> 0:21:32.190
 tom (gast)
 And and then the voice bot or the speech application would say.

0:21:33.590 --> 0:21:34.400
 tom (gast)
 For what account?

0:21:34.940 --> 0:21:35.840
 Steve Tomkinson
 Yeah, that's right.

0:21:35.590 --> 0:21:39.450
 tom (gast)
 Or or. Please let's that's makes a lot of sense, you know.

0:21:45.910 --> 0:21:46.150
 tom (gast)
 It's.

0:21:47.630 --> 0:21:48.120
 tom (gast)
 Exactly.

0:21:56.80 --> 0:21:56.580
 tom (gast)
 Exactly.

0:22:0.430 --> 0:22:0.610
 tom (gast)
 Yep.

0:21:39.310 --> 0:22:11.200
 Steve Tomkinson
 Yeah, yeah, that's right. But the thing with that is you have to be the subject matter expert, for the fact that you're inside the banking environment. And what ChatGPT is not doing at the moment is that. You know, the actual source is the Internet. It's gathering content from the Internet, and that's where its data is. Yeah, you know, you look through a few layers and it basically ends up saying, yeah, we scraped it all from the Internet. Well, that's OK, but the accuracy is poor and everybody is moaning about that, because it's basically turning into a

0:22:11.400 --> 0:22:13.570
 Steve Tomkinson
 chatbot version of Wikipedia.

0:22:14.70 --> 0:22:14.410
 tom (gast)
 Yeah.

0:22:39.980 --> 0:22:40.300
 tom (gast)
 Yeah.

0:22:41.730 --> 0:22:42.180
 tom (gast)
 2nd.

0:22:14.540 --> 0:22:45.70
 Steve Tomkinson
 And, um, which as we all know, it varies wildly, what's true and not true in there. And, you know, much as it's moderated and they try and make it as good as possible, you know, the fact that it's an editable space means that somebody can put something in there that isn't. But when you're then in a vertical, that's like insurance, financial or even just a retail environment, it has to be specific, it has to know what to do next and has to be trained in that bit. So that's going to be the next challenge for that thing. If they start then going.

0:22:52.320 --> 0:22:52.510
 tom (gast)
 No.

0:22:45.170 --> 0:22:56.260
 Steve Tomkinson
 Right. OK. Gives you data for X, gives you data for X, gives you data for X, and then it will go and understand all that stuff. Then, you know, we're there, aren't we? You know, that's it.

0:22:54.900 --> 0:22:59.440
 tom (gast)
 Well, that's where we're heading, yes. But then again, let's be honest with each other, I mean.

0:23:5.30 --> 0:23:5.390
 Steve Tomkinson
 Yeah.

0:22:59.590 --> 0:23:6.640
 tom (gast)
 And we are living in a world where we're already thinking about the day after tomorrow, but still we see that.

0:23:8.420 --> 0:23:12.110
 tom (gast)
 Current business customers are not even.

0:23:13.440 --> 0:23:15.370
 Steve Tomkinson
 They're still in the 1980s.

0:23:13.20 --> 0:23:16.20
 tom (gast)
 Using it. Still the 90s.

0:23:16.780 --> 0:23:17.220
 tom (gast)
 I mean.

0:23:16.480 --> 0:23:20.670
 Steve Tomkinson
 90s, alright, I'll give them the 90s then. But it is dreadful, you know.

0:23:30.420 --> 0:23:30.610
 Steve Tomkinson
 Yeah.

0:23:19.280 --> 0:23:36.310
 tom (gast)
 Yeah, I mean to make that step to go into a real good voice bot environment where we really do some solid monitoring on context and domains and to make that particular interaction between the caller and a company work very well.

0:23:44.170 --> 0:23:44.550
 Steve Tomkinson
 Yeah.

0:23:36.730 --> 0:23:47.90
 tom (gast)
 And it's indeed some investment, but with the technology we have come so much further, so much further than what we are still implementing today. It's a little bit ridiculous actually.

0:24:3.920 --> 0:24:4.300
 tom (gast)
 You're.

0:23:47.450 --> 0:24:7.960
 Steve Tomkinson
 Yeah, I know. So I really don't get why there isn't a race to this, because I think the first movers will get a massive advantage. Because of course the big advantage with this, and, you know, the ones that have got to race to it now are the ones that have got queues, they've got waiting lists. You know, anybody now that's got a queue above.

0:24:9.80 --> 0:24:15.350
 Steve Tomkinson
 5 minutes. It's gotta be looking long and hard at themselves to go, I need a better solution than this.

0:24:20.180 --> 0:24:20.460
 Steve Tomkinson
 No.

0:24:15.20 --> 0:24:31.310
 tom (gast)
 Yeah, exactly. And you don't have to. You don't have to automate the whole journey. I mean, you can automate the beginning, the simple questions, the, the stuff that is easy to cover and then already you can do so much better than what we're still doing today.

0:24:44.80 --> 0:24:44.770
 tom (gast)
 Absolutely.

0:24:31.600 --> 0:25:2.790
 Steve Tomkinson
 Yeah, and if you think about it, you know, I think the bots now, with the language that is so good, you know, they're better perceived now than, you know, the Indian call centers or Malaysian or going somewhere else, you know, those that are out there and doing that stuff. I mean, look, you know, it is cheap and cheerful. It is a very cost-effective solution to go Far East, but it's not a great brand thing. And I think the nice thing about the voice bots is that.

0:25:10.320 --> 0:25:11.420
 tom (gast)
 Where? Yeah.

0:25:25.610 --> 0:25:25.890
 tom (gast)
 Yeah.

0:25:3.170 --> 0:25:29.470
 Steve Tomkinson
 To do that, you're actually looking like an innovation company, you're looking innovative. You're just trying to help, you're trying to be as smart as you can. It doesn't feel like, well, we've replaced a load of people with cheaper people and there's some slavery going on over here, you know, which is the feeling when you go out to those things. It doesn't have to be the case, but it's the feeling, and that's the impression, I think, that people get in the Western world.

0:25:43.730 --> 0:25:44.570
 Steve Tomkinson
 Yeah, that's right.

0:25:52.50 --> 0:25:52.360
 Steve Tomkinson
 Yeah.

0:25:53.350 --> 0:25:53.950
 Steve Tomkinson
 Yeah.

0:25:29.950 --> 0:25:57.600
 tom (gast)
 Yeah, I do agree with you, and also don't forget about the fact that the voice bot is very adaptive. If you have a change in your environment, a change in your product or whatever, it is very easy to make those changes and to directly put them in a live application, compared to a call center that works for you that needs to be instructed when something changes. So it's also a cost-effective solution when it comes to.

0:26:0.200 --> 0:26:0.420
 tom (gast)
 No.

0:26:10.560 --> 0:26:10.950
 tom (gast)
 Yes.

0:26:14.100 --> 0:26:15.350
 tom (gast)
 No, it was.

0:25:57.310 --> 0:26:16.920
 Steve Tomkinson
 Yeah, you can't dynamically change that as easily because there's a big training effort, you know, because of the volume of people that are coming through the door. You know, if you've got 500 people working in a contact center to deal with that level of calls, then you can't train those overnight. So it's a long, it's a big shift, you know.

0:26:18.970 --> 0:26:19.720
 Steve Tomkinson
 Yeah, yeah.

0:26:28.480 --> 0:26:28.770
 Steve Tomkinson
 Yeah.

0:26:29.560 --> 0:26:30.370
 Steve Tomkinson
 Yeah, absolutely.

0:26:31.400 --> 0:26:31.690
 Steve Tomkinson
 Yeah.

0:26:16.700 --> 0:26:33.70
 tom (gast)
 Yeah. And also the control of the whole thing. I mean, if company X does have a very solid and well-designed voicebot installed, they can also tweak it themselves in a very, very fast, efficient way. But still.

0:26:35.200 --> 0:26:37.630
 tom (gast)
 I get the idea that a lot of companies still.

0:26:38.460 --> 0:26:41.90
 tom (gast)
 Don't know the potential of voice.

0:26:58.230 --> 0:26:58.650
 tom (gast)
 Yeah.

0:26:40.940 --> 0:26:59.360
 Steve Tomkinson
 No, I think that's the problem. And I think we're still in an education phase, which is amazing, but I can see where it's coming from. But, you know, they've gotta be on it now, and it's easy to demonstrate, it's easy to show how good it is now. So you know it's it's sicker, sicker, it's it's to do.

0:27:7.780 --> 0:27:8.330
 Steve Tomkinson
 Yeah, yeah.

0:27:15.840 --> 0:27:16.160
 Steve Tomkinson
 Yeah.

0:27:20.910 --> 0:27:22.280
 Steve Tomkinson
 Yeah, the cinema, yeah.

0:27:28.670 --> 0:27:28.970
 Steve Tomkinson
 Yeah.

0:26:59.310 --> 0:27:30.810
 tom (gast)
 But still we cope with the bad impressions we made like 12, 13 years ago, when Deutsche Bahn, for instance, also installed this application where you could buy a ticket on the phone and where you had exactly the same problem as in the illustration you just made, and still that is having its effect on the acceptance of voice today. If you talk to an average German company.

0:27:30.930 --> 0:27:33.970
 tom (gast)
 They will say, well, with the Bahn, it failed. Why should we work on it?

0:27:35.70 --> 0:27:35.440
 tom (gast)
 Yeah.

0:27:33.910 --> 0:27:36.440
 Steve Tomkinson
 Yeah, that's right. Yeah, that's right, everybody does it.

0:27:36.590 --> 0:27:36.930
 tom (gast)
 Yeah.

0:27:37.350 --> 0:27:45.720
 Steve Tomkinson
 Alright. Well look, I think hopefully that's been interesting for everybody. And, you know, we've covered the whole.

0:28:4.620 --> 0:28:4.920
 tom (gast)
 Yeah.

0:27:46.780 --> 0:28:8.400
 Steve Tomkinson
 Decades of voice in about 25 minutes. So, you know, it was a bit of a race through, but I think it's interesting to see where it is and where it's come from, to get it into context. And I think that's the important part, just to then go, look, this is now proven technology. It's done. It's doing it. It's happening. So there are people, banks and all sorts, using this.

0:28:10.650 --> 0:28:11.140
 tom (gast)
 Yeah.

0:28:9.170 --> 0:28:20.680
 Steve Tomkinson
 Really well. There are plenty not doing it, and the ones that are not doing it are the ones that ask you to answer single questions, single words. If they're doing single words, you know they're not in it.

0:28:25.400 --> 0:28:25.710
 Steve Tomkinson
 Yeah.

0:28:21.20 --> 0:28:29.550
 tom (gast)
 Yeah, that's right. But yeah, maybe we should see this as a sort of introduction, and let's, in the following podcasts, go a little bit deeper in.

0:28:36.160 --> 0:28:37.220
 Steve Tomkinson
 Yeah, yeah, yeah.

0:28:38.490 --> 0:28:38.800
 Steve Tomkinson
 Yeah.

0:28:30.630 --> 0:28:46.90
 tom (gast)
 Let's go into detail: what has actually, in fact, changed? Why is it really better? And let's also give them some examples. Why not build something on the fly and hear how the technology has been?

0:28:47.800 --> 0:28:48.20
 tom (gast)
 Yeah.

0:28:52.100 --> 0:28:52.420
 tom (gast)
 Yes.

0:28:43.840 --> 0:28:58.70
 Steve Tomkinson
 Yeah, we could maybe do that. Well, we can sort of demo now on the podcast anyway. But look, thanks very much everybody. And I found that interesting, and look out for our next ones coming up. Thanks again, Tom. Cheers.

0:28:59.10 --> 0:29:0.60
 Steve Tomkinson
 OK, cheers.

0:28:57.710 --> 0:29:0.750
 tom (gast)
 OK. You're welcome. Thank you too. OK, bye.