
Disruption Works Chit Chat
Speech recognition technology: where we have come from and where we are heading.
In today's podcast we put together a potted history of what speech recognition technology for voice applications used to look like, and how that has changed.
Tom and Steve discuss where the technology started and why the industry boom of the late '90s failed. They then move on to the incredible acceleration of the technology that is now making voicebots and voice applications commonplace and proven.
Finally, they look at the challenge businesses face: having been burnt by previous investment, why they now need to look at this again rather than lose significant market share to more dynamic companies joining the fray.
Our latest series of podcasts concentrates on voice and how it is going to shape the next few years, with tips along the way. Find out more about voicebots here, and if there are any subjects you would like us to discuss, email info@disruptionworks.co.uk with the subject "Podcast" and we will see what we can do ;-)
0:0:13.270 --> 0:0:19.800
Steve Tomkinson
Welcome to a new podcast with Steve and Tom. Hello, Tom. How are you doing today?
0:0:20.170 --> 0:0:21.750
tom (gast)
Thank you, Steve. I'm doing fine. And you?
0:0:25.260 --> 0:0:26.390
tom (gast)
Yeah, it's cold here too.
0:0:22.50 --> 0:0:27.750
Steve Tomkinson
Yeah, not too bad, not too bad. It's cold out there. My God, it's gone cold. Geez.
0:0:27.90 --> 0:0:28.800
tom (gast)
And it's cold here too, yeah.
0:0:28.920 --> 0:0:35.670
Steve Tomkinson
It is immensely cold. I've got all sorts of jumpers and everything on at the moment, and the heating's on, and we can't afford the heating, so...
0:0:40.140 --> 0:0:40.890
tom (gast)
Yeah.
0:0:43.870 --> 0:0:44.100
tom (gast)
Yeah.
0:0:38.30 --> 0:0:45.600
Steve Tomkinson
So what can you do? I'll just start lighting a small fire in my living room, won't I? Oh dear.
0:0:51.240 --> 0:0:51.780
Steve Tomkinson
Uh, yeah.
0:0:44.900 --> 0:0:56.370
tom (gast)
Well, that's the advantage we have, because we are living in a high-end apartment and everything is very well controlled. We have underfloor heating and very well-insulated windows, so it's really very good.
0:0:57.570 --> 0:0:58.890
tom (gast)
I remember this four.
0:0:55.170 --> 0:1:2.300
Steve Tomkinson
Yeah. Are you alright? No, no, no. I'm gonna get my camping cooker out in a minute and stick that on.
0:1:3.920 --> 0:1:5.370
Steve Tomkinson
The other one, the other one.
0:1:5.90 --> 0:1:6.900
tom (gast)
Anything you can do to get a little bit cozy.
0:1:15.230 --> 0:1:15.420
tom (gast)
Yep.
0:1:8.610 --> 0:1:38.430
Steve Tomkinson
Absolutely, absolutely. Well, anyway, thanks for joining us today, Tom. I suppose what we're gonna talk about today is voice, but also kind of reviewing where we've been over the 25 years this has been around. That's a long time in technology, and, as all these things do, it changes over time.
0:1:38.630 --> 0:1:47.390
Steve Tomkinson
And I suppose what we were thinking of talking about today was going from where it all started to, well, what's gonna happen next, and...
0:1:47.470 --> 0:1:54.280
Steve Tomkinson
Um, I suppose over to you. Where did this all start, Tom? Where do we start with voice technology?
0:2:0.400 --> 0:2:0.810
Steve Tomkinson
Yeah.
0:2:7.610 --> 0:2:8.80
Steve Tomkinson
Right.
0:1:54.820 --> 0:2:13.130
tom (gast)
Yeah. The very first developments in voice technology came in the late 80s, of course, but the really substantial development of voice was at the end of the 90s, when we were really in a position to use it as a solution, because the idea was...
0:2:20.80 --> 0:2:20.450
Steve Tomkinson
Yeah.
0:2:14.310 --> 0:2:20.810
tom (gast)
Why not try to integrate the most natural interface we have, which is speech, instead of...
0:2:29.150 --> 0:2:29.530
Steve Tomkinson
Yeah.
0:2:45.60 --> 0:2:45.490
Steve Tomkinson
Yeah.
0:2:22.280 --> 0:2:47.990
tom (gast)
Having to learn how to control an apparatus or how to press certain buttons in a certain environment? The thinking was that being able to use speech would be a great advantage. It would be very user-friendly, very intuitive, very easy. And so in the 90s, when we were able to do some really substantial speech recognition, the idea was: this is going to boom. This is going to be...
0:2:48.390 --> 0:2:53.230
tom (gast)
A revolution in industry, in the industrial world.
0:2:53.580 --> 0:3:1.610
Steve Tomkinson
Yeah. Well, I suppose speech recognition wasn't nearly as good then, but it had only just started, hadn't it? So, you know.
0:3:8.870 --> 0:3:9.280
Steve Tomkinson
Right.
0:3:18.930 --> 0:3:19.290
Steve Tomkinson
Yeah.
0:3:1.240 --> 0:3:24.170
tom (gast)
Yes. And you have to think about the early days. Speech recognition was mainly single-word recognition, so you had these boundaries. You could not design the way we do today, with a lot of open-ended prompting along the lines of "How may I help you?" and then defined domains (we'll get back to that later on). But in the old days, say 25 years ago...
0:3:25.190 --> 0:3:29.400
tom (gast)
We were talking about designs that would sound like this:
0:3:38.70 --> 0:3:38.370
Steve Tomkinson
Yeah.
0:3:29.940 --> 0:3:43.250
tom (gast)
"If you want to rent a car today, say 'today'. If you want to rent a car tomorrow, say 'tomorrow'." And if you said something like "Well, I would like to rent a car today," the machine would come back with "Sorry, I didn't understand that."
0:3:44.70 --> 0:3:45.690
tom (gast)
That was just, yeah.
0:3:48.830 --> 0:3:49.50
tom (gast)
Yeah.
0:3:52.510 --> 0:3:52.770
tom (gast)
Yeah.
0:3:44.200 --> 0:3:55.260
Steve Tomkinson
But you know what, Tom? I still get those. There are still those knocking about that go "Say 'today' or 'tomorrow'", and you just go, "Really? Is that what we're doing, then?"
0:3:58.260 --> 0:3:58.570
Steve Tomkinson
Yeah.
0:4:4.440 --> 0:4:4.660
Steve Tomkinson
Yeah.
0:4:9.960 --> 0:4:10.260
Steve Tomkinson
Yeah.
0:3:54.340 --> 0:4:14.750
tom (gast)
Yesterday I called my health insurance with some questions about my daughter. The system says, "Please give the date of birth." I gave the date of birth, and it says, "Let me see if I got that right." Then the system repeats the date of birth and asks, "Am I right? Please answer the question only with yes or no."
0:4:15.80 --> 0:4:20.100
Steve Tomkinson
Oh no, really? Wow. Good grief.
0:4:19.70 --> 0:4:28.700
tom (gast)
Still, like I was mentioning to you, right? It's still very, very simplistic, because they do not do any grammar design that allows the caller to say a little bit more than just yes or no.
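The rigid "yes or no only" prompt Tom describes is a grammar-design choice rather than a hard limit of modern recognisers. As a rough illustration only, here is a minimal Python sketch of a broader confirmation grammar; the phrase lists and the `classify_confirmation` function are hypothetical, and a production system would normally rely on its speech platform's own grammar or intent features instead.

```python
import re

# Hypothetical phrase lists; a real deployment would build these from call data.
AFFIRM = ["yes", "yeah", "yep", "correct", "that's right", "that is right", "right"]
DENY = ["no", "nope", "wrong", "incorrect", "that's not right", "that is not right"]


def _mentions(text: str, phrase: str) -> bool:
    """True if the phrase occurs as whole words in the text."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def classify_confirmation(utterance: str) -> str:
    """Map a caller's reply to a confirmation prompt onto 'yes', 'no' or 'unknown'."""
    text = utterance.lower()
    # Check denials first so "that's not right" is not mistaken for "that's right".
    if any(_mentions(text, phrase) for phrase in DENY):
        return "no"
    if any(_mentions(text, phrase) for phrase in AFFIRM):
        return "yes"
    return "unknown"  # re-prompt rather than insisting on a bare yes/no


print(classify_confirmation("Yeah, that's right."))  # yes
```

Checking denials before affirmations matters here because several affirmative phrases ("right", "that's right") are contained in negative ones.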
0:4:31.920 --> 0:4:35.660
tom (gast)
Yeah, correct. That's exactly it, yeah.
0:4:39.760 --> 0:4:41.80
tom (gast)
Yeah, no problem.
0:4:47.560 --> 0:4:47.860
tom (gast)
Yes.
0:4:50.480 --> 0:4:50.790
tom (gast)
Yes.
0:4:56.660 --> 0:4:57.40
tom (gast)
Well.
0:4:28.470 --> 0:4:57.760
Steve Tomkinson
Uh, right. Because you'd go, "Yeah, that's right," or something like that, because that's how you would answer somebody else in that format. Yeah. No, it's funny, I'm still getting those. But anyway, sorry, I interrupted you. So, I suppose the next thing is that we did have a boom in this. A lot of people invested in it, didn't they? We've had booms in technology, and that was one of them. So why did it not really take off during that boom period?
0:5:8.40 --> 0:5:8.330
Steve Tomkinson
Yeah.
0:5:10.740 --> 0:5:11.0
Steve Tomkinson
Yeah.
0:5:18.650 --> 0:5:18.950
Steve Tomkinson
Yeah.
0:5:22.920 --> 0:5:23.220
Steve Tomkinson
Yeah.
0:4:58.500 --> 0:5:28.660
tom (gast)
There are a few reasons, I guess. First of all, all these companies popped up out of the ground claiming they were able to do speech recognition, when formerly they were web design companies, and the idea going around was that getting from a website to a talking machine is just a few buttons, which was of course not exactly right. So that was one thing. A lot of people claimed to...
0:5:32.160 --> 0:5:32.520
Steve Tomkinson
Yeah.
0:5:28.760 --> 0:5:37.180
tom (gast)
Be able to work with speech recognition, but they couldn't, because they didn't have the skill sets. And secondly, there was a...
0:5:38.140 --> 0:5:40.560
tom (gast)
The industrial world had a really...
0:5:47.680 --> 0:5:48.520
Steve Tomkinson
Yeah, yeah, yeah.
0:5:55.300 --> 0:5:56.400
Steve Tomkinson
Yeah, yeah.
0:6:2.880 --> 0:6:3.690
Steve Tomkinson
Yeah, yeah.
0:5:41.500 --> 0:6:12.170
tom (gast)
How do you say, the expectations were very, very high. They all thought: this is going to help us in our call center, this is going to help us with any kind of simple automation where people can deal with the system by speech. It would be great, great, great. The one thing we forgot was that we were still in the world of single-word recognition, and also companies like IBM and Nuance each had their own language models and their own grammar books.
0:6:12.520 --> 0:6:12.860
Steve Tomkinson
Yeah.
0:6:12.570 --> 0:6:19.80
tom (gast)
And it was not based on what we call the cloud today. It had nothing to do with that; it was limited.
0:6:20.80 --> 0:6:26.190
tom (gast)
And so we were very much forced into a framework where we were...
0:6:27.190 --> 0:6:28.0
tom (gast)
Not able to...
0:6:28.740 --> 0:6:31.230
tom (gast)
Compose a conversational style.
0:6:31.960 --> 0:6:32.450
tom (gast)
So.
0:6:40.270 --> 0:6:40.670
tom (gast)
Exactly.
0:6:45.420 --> 0:6:46.650
tom (gast)
Correct. So yeah.
0:6:31.840 --> 0:6:51.110
Steve Tomkinson
No. So it was still single words, and I suppose there wasn't nearly the same training data, or processing power around that training data, back then to make the technology work. I remember a vendor over here doing something like this, and they were trying to do it for, and this was...
0:6:52.250 --> 0:7:0.590
Steve Tomkinson
For anybody listening to this from the UK, the cinema chain Odeon. They were trying to do this, and you were doing, like, a telephone...
0:7:10.310 --> 0:7:10.660
tom (gast)
Yeah.
0:7:20.410 --> 0:7:20.690
tom (gast)
Yeah.
0:7:1.10 --> 0:7:25.620
Steve Tomkinson
Uh, cinema ticket booking, and they were asking you which cinema you wanted, and it would never, ever get it right. It was so bad, and it was just like, why did you bother? How did you not test this, or know that it wasn't working? Because it just didn't, and it was single-word stuff again. But it was quite a complicated process, actually, that they were trying to simplify.
0:7:26.140 --> 0:7:38.560
Steve Tomkinson
And we didn't have apps then, and we didn't have anything else. It was gonna be a phone call if you wanted to do it remotely; the website wasn't very good and all that type of stuff. So it would have been a great solution for them.
0:7:39.220 --> 0:7:39.580
tom (gast)
Yeah.
0:7:39.250 --> 0:7:41.940
Steve Tomkinson
But the technology wasn't there to support it, you know.
0:7:50.600 --> 0:7:50.880
Steve Tomkinson
The.
0:7:41.390 --> 0:7:50.920
tom (gast)
No, that's right. That reminds me of an airplane, sorry, an airline company that...
0:7:51.680 --> 0:7:57.510
tom (gast)
Wanted to automate with voice technology as well, and then you had this problem with Newark and New York.
0:7:58.10 --> 0:7:59.80
Steve Tomkinson
Uh, right? Chat.
0:8:2.700 --> 0:8:2.970
Steve Tomkinson
Yeah.
0:8:7.190 --> 0:8:7.510
Steve Tomkinson
Yeah.
0:8:10.370 --> 0:8:10.700
Steve Tomkinson
Yeah.
0:7:58.180 --> 0:8:19.730
tom (gast)
So don't forget, it's not only the limited vocabulary but also the accuracy of speech recognition itself. The accuracy was not that high yet. We were happy when we had 0.7 accuracy, and now we are not happy when it's below 0.97.
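A back-of-the-envelope calculation shows why the jump from 0.7 to 0.97 matters so much. The sketch below treats both figures as per-utterance recognition accuracy and assumes errors on each turn are independent; both are simplifying assumptions, since the podcast does not say how accuracy was measured. Under those assumptions, a dialogue only succeeds end to end if every turn is recognised, so the chance is the accuracy raised to the number of turns.

```python
def dialogue_success_rate(per_turn_accuracy: float, turns: int) -> float:
    """Chance that every turn in a dialogue is recognised correctly,
    assuming independent errors on each turn (a simplification)."""
    return per_turn_accuracy ** turns


for accuracy in (0.7, 0.97):
    rate = dialogue_success_rate(accuracy, turns=5)
    print(f"accuracy {accuracy}: five-turn success {rate:.1%}")
# accuracy 0.7: five-turn success 16.8%
# accuracy 0.97: five-turn success 85.9%
```

On these assumptions, a five-step booking that fails more than four times in five at 0.7 accuracy succeeds roughly six times out of seven at 0.97.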
0:8:20.260 --> 0:8:21.120
Steve Tomkinson
Yeah, yeah.
0:8:20.990 --> 0:8:22.670
tom (gast)
That's a big difference. So the.
0:8:23.970 --> 0:8:24.180
tom (gast)
Yeah.
0:8:33.210 --> 0:8:33.870
tom (gast)
That's perfect.
0:8:35.360 --> 0:8:35.730
tom (gast)
Yeah.
0:8:22.220 --> 0:8:37.170
Steve Tomkinson
Can I just stop you there, Tom? Obviously we're recording this, and I was just looking at the transcript, and Microsoft managed to distinguish between Newark and New York, so the technology has moved on. That's always encouraging, isn't it?
0:8:47.30 --> 0:8:47.300
Steve Tomkinson
Yeah.
0:8:37.20 --> 0:8:52.910
tom (gast)
Yeah, that's good. Yes, today the accuracy is so, so much better than 20 years ago. So that was also one of the reasons: everybody had great expectations, then we started to industrialize some applications out there, and of course they did not live up to those expectations.
0:8:52.500 --> 0:8:55.700
Steve Tomkinson
And I suppose that was the same with the web boom as well, because...
0:9:8.630 --> 0:9:8.950
tom (gast)
Yes.
0:9:16.140 --> 0:9:16.640
tom (gast)
That's good.
0:8:56.420 --> 0:9:26.170
Steve Tomkinson
It's not that people weren't ready. This time, the consumer was probably ready to use voice; they were fine with it. But I would say the technology wasn't ready then. Whereas with the web it was different: the consumer was ready to engage, but their own tech at home wasn't ready to do the web properly. The Internet was very slow, so you couldn't access websites without waiting an age over telephone lines and things, and...
0:9:26.820 --> 0:9:27.170
tom (gast)
That's good.
0:9:26.370 --> 0:9:40.800
Steve Tomkinson
So it was a different type of technology not keeping up. But, you know, I suppose that brings us up to where we are today. So what is the ultimate difference today, where we are now?
0:9:39.610 --> 0:9:47.370
tom (gast)
The ultimate difference is very, very clear to me. Let's go back 15 years, or 10 years, it doesn't matter.
0:9:47.130 --> 0:9:47.430
Steve Tomkinson
Yeah.
0:9:56.570 --> 0:9:56.860
Steve Tomkinson
Yeah.
0:10:4.910 --> 0:10:5.180
Steve Tomkinson
Yeah.
0:10:11.410 --> 0:10:11.780
Steve Tomkinson
Yeah.
0:10:17.750 --> 0:10:18.150
Steve Tomkinson
Yeah.
0:10:24.670 --> 0:10:25.360
Steve Tomkinson
Yes. Yeah.
0:10:5.240 --> 0:10:36.610
tom (gast)
You were using your speech or your voice in a totally unnatural way. The idea was: let's integrate voice so we can use a natural interface, the way we speak. But when it was integrated in the early days, we were not able to speak naturally, because the systems couldn't handle it. And so the expectations were high but were not lived up to, because when you wanted to use a speech-recognition-based application, you had to talk differently.
0:10:37.50 --> 0:10:37.430
Steve Tomkinson
Yeah.
0:10:39.450 --> 0:10:42.380
Steve Tomkinson
Yeah, you're actually the robot in this relationship.
0:10:50.860 --> 0:10:51.170
Steve Tomkinson
Yeah.
0:10:36.970 --> 0:10:57.150
tom (gast)
You had to talk like a robot. Exactly. No. And the big difference today is that we have the glorious solution of the cloud. I mean the Google or Microsoft clouds: they have such a well-tuned API...
0:10:59.470 --> 0:10:59.810
Steve Tomkinson
Yeah.
0:11:3.730 --> 0:11:4.60
Steve Tomkinson
Yeah.
0:11:13.490 --> 0:11:13.850
Steve Tomkinson
Yeah.
0:10:58.270 --> 0:11:27.140
tom (gast)
Formula, so what you get back is almost 99% accurate, which also allows us to make a totally different design, truly based on conversation and a conversational style. "How may I help?" "No, that's not correct." "Let me try something else," blah. Making an appointment: "Does tomorrow afternoon suit you?" "No, sorry, I can't make it tomorrow afternoon." "Let's look at another day." This is really a...
0:11:27.240 --> 0:11:39.790
tom (gast)
Black-and-white difference from 15 years ago. Speech recognition worked, but we could not work with it in the way we are used to using our speech, the way we talk to each other on the phone, for instance.
0:11:46.770 --> 0:11:47.120
tom (gast)
Yes.
0:11:59.390 --> 0:11:59.560
tom (gast)
Yeah.
0:11:40.0 --> 0:12:1.800
Steve Tomkinson
Yeah, that's right. And that's it, isn't it? The whole point is that it's natural. If you like, there's gonna be a slightly funneled process in some regard, because a business is usually offering a service, and that service has got ten arms to it, you know, so you're gonna be in some sort of conversation.
0:12:2.500 --> 0:12:9.70
Steve Tomkinson
But that doesn't mean it has to be very robotic. It can be very natural in its process, you know.
0:12:18.300 --> 0:12:19.110
Steve Tomkinson
Yeah, yeah.
0:12:8.700 --> 0:12:26.280
tom (gast)
Yeah, that's correct. Suppose you're making an application that allows the public to make an appointment with a hospital, for instance. Then you will not ask for a Coca-Cola, will you? So we're talking, exactly. So we're talking about...
0:12:22.80 --> 0:12:29.10
Steve Tomkinson
No, that's right. It'd be rare. You might do, but it'd be rare. Yeah, yeah.
0:12:33.10 --> 0:12:34.400
Steve Tomkinson
No. Again. Again, very right.
0:12:41.560 --> 0:12:41.820
Steve Tomkinson
Yeah.
0:12:45.550 --> 0:12:45.880
Steve Tomkinson
Yeah.
0:12:47.60 --> 0:12:47.320
Steve Tomkinson
Yeah.
0:12:28.450 --> 0:12:57.20
tom (gast)
But it would be very... yeah, or you will not rent a car, for instance. So we're talking about domains. If we're talking about an application that allows people to make an appointment with a hospital, there are certain domains, certain words, certain phrases that people might or will use when they make an appointment. So a thorough analysis will give us a very clear picture of the words people use when they make an appointment. And so...
0:12:56.720 --> 0:12:57.50
Steve Tomkinson
Yeah.
0:13:0.890 --> 0:13:1.220
Steve Tomkinson
Yeah.
0:13:6.130 --> 0:13:6.440
Steve Tomkinson
Yeah.
0:12:57.110 --> 0:13:16.780
tom (gast)
Those words are then easily integrated into a domain, and then we can work with intelligent speech recognition and an intelligent conversational style. But of course you have to define the domain, and of course you have to do some monitoring in order to understand the dynamics of such a conversation, when people call the hospital to make an appointment.
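Tom's point about defining a domain can be illustrated with a deliberately simple keyword sketch. Everything here is hypothetical: the intent names, the word lists (which in practice would come from the call analysis he describes), and the rule of returning `None` when nothing, or more than one intent, matches so the bot can ask a follow-up question. Real systems typically use trained intent classifiers rather than keyword counts.

```python
import re
from typing import Optional

# Hypothetical vocabulary for a hospital appointment line, ideally derived
# from analysing real calls. "appointment" itself is left out because it
# appears in every intent and so does not help tell them apart.
DOMAIN = {
    "book": {"book", "make", "new", "schedule"},
    "cancel": {"cancel", "cancellation"},
    "reschedule": {"reschedule", "move", "change", "another", "different"},
}


def match_intent(utterance: str) -> Optional[str]:
    """Return the single best-matching intent, or None if the utterance is
    out of domain or ambiguous (so the bot should ask a clarifying question)."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    scores = {intent: len(words & vocab) for intent, vocab in DOMAIN.items()}
    best = max(scores.values())
    winners = [intent for intent, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


print(match_intent("Can I move my appointment to another day?"))  # reschedule
print(match_intent("Can I have a Coca-Cola?"))  # None
```

Returning `None` for both the out-of-domain and the tied case keeps the sketch honest: rather than guessing, the bot falls back to a clarifying prompt.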
0:13:17.280 --> 0:13:18.0
Steve Tomkinson
Yeah, sure.
0:13:18.700 --> 0:13:23.150
Steve Tomkinson
I suppose so. We're doing that now, and that's...
0:13:23.860 --> 0:13:33.150
Steve Tomkinson
That's what is available today. And, you know, we're talking to people and they don't know that this is available. And I think the challenge we've now got is that...
0:13:46.220 --> 0:13:46.480
tom (gast)
Yeah.
0:13:48.730 --> 0:13:49.140
tom (gast)
Yeah.
0:13:55.840 --> 0:13:56.60
tom (gast)
No.
0:13:56.710 --> 0:13:57.240
tom (gast)
There's one way.
0:13:58.330 --> 0:13:59.780
tom (gast)
Exactly. Exactly.
0:13:33.820 --> 0:14:5.270
Steve Tomkinson
People have no idea that they can do this really effectively. Like we were joking at the start, I still get those single-word menu items, or even "press 1, press 2, press 3", and then they're labeled as IVR, which is interactive voice response, and stuff like that. And you go, well, it's not interactive. It's not a voice response. There is no conversation going on, and it's just extraordinary that that is an acceptable technology now.
0:14:5.430 --> 0:14:11.220
Steve Tomkinson
It just doesn't have to be. Today, it's moved on so much.
0:14:11.930 --> 0:14:12.320
Steve Tomkinson
You know.
0:14:10.110 --> 0:14:13.390
tom (gast)
You're quite right, I mean...
0:14:14.230 --> 0:14:18.480
tom (gast)
Yeah, the technology has been so...
0:14:20.180 --> 0:14:31.840
tom (gast)
Improved. I mean, the technology has developed so, so, so immensely. Today we can have somebody stammering and still understand what he or she says.
0:14:32.320 --> 0:14:33.740
tom (gast)
And why?
0:14:41.140 --> 0:14:41.550
Steve Tomkinson
Yeah.
0:14:34.350 --> 0:14:51.100
tom (gast)
In the old days, when somebody stammered, you got "Sorry, I don't understand you" straight away. Today we have filters. We have strategies to work with accents, with crazy synonyms, with people who pause when they talk.
0:14:52.410 --> 0:14:57.990
tom (gast)
"Would you like to make an appointment now, or shall I call you back?" "Wait... well..."
0:14:59.490 --> 0:15:3.290
tom (gast)
And the system still gets that, still works with it, you know.
0:15:1.800 --> 0:15:3.440
Steve Tomkinson
Yeah. No, no.
0:15:3.770 --> 0:15:17.570
tom (gast)
And we can even touch on this interesting detail: with speech recognition and such systems, we can even determine whether somebody is lying or not.
0:15:19.430 --> 0:15:19.750
tom (gast)
Umm.
0:15:21.820 --> 0:15:22.720
tom (gast)
Exactly. Exactly.
0:15:17.980 --> 0:15:24.970
Steve Tomkinson
Yeah, that's right. Well, you've got the biometric stuff now, so you can get sentiment analysis.
0:15:26.450 --> 0:15:26.900
Steve Tomkinson
I mean.
0:15:25.520 --> 0:15:27.490
tom (gast)
Age, gender.
0:15:28.100 --> 0:15:43.850
Steve Tomkinson
Yeah. I mean, the stuff that's really interesting now, though, that I kind of like, is that you can create a voice identity for an individual and then pull them back up. So if this is a regular scenario, where somebody phones you up...
0:15:51.760 --> 0:15:52.140
tom (gast)
2nd.
0:16:4.930 --> 0:16:5.160
tom (gast)
Yeah.
0:15:44.400 --> 0:16:8.820
Steve Tomkinson
You can actually answer "Hello, Bob," because you now know who they are, because you've got their voice. And that's really good, both from a security point of view, much better security, and from the point of view of just a nice customer journey. It's like the bot recognizes Bob, but it also recognizes 10,000 other customers. Yeah.
0:16:8.520 --> 0:16:14.60
tom (gast)
No, that's true. For the people listening to this podcast, for your information,
0:16:14.510 --> 0:16:16.590
tom (gast)
A voice print
0:16:17.340 --> 0:16:20.160
tom (gast)
Is about 40,000 times more...
0:16:20.760 --> 0:16:23.800
tom (gast)
Individual and...
0:16:27.820 --> 0:16:28.130
Steve Tomkinson
Yeah.
0:16:24.380 --> 0:16:31.750
tom (gast)
How do you say, detailed, compared to an iris scan and a fingerprint together.
0:16:31.480 --> 0:16:34.40
Steve Tomkinson
Yeah, yeah, yeah. Together. Wow.
0:16:32.480 --> 0:16:38.630
tom (gast)
So a voice print is a really good way to build a secure system, so to speak.
0:16:38.760 --> 0:16:44.910
Steve Tomkinson
Yeah. And that's really important. I mean, look, banks have gotta take that on as a thing, you know.
0:16:53.730 --> 0:16:54.100
tom (gast)
Yes.
0:16:46.950 --> 0:16:55.360
Steve Tomkinson
If you want to access your bank and you've got a voice print, then wow, that's gonna be huge. It's gonna be very difficult for someone to...
0:17:6.840 --> 0:17:7.970
tom (gast)
That's that's yeah.
0:16:56.0 --> 0:17:14.0
Steve Tomkinson
Mimic that, especially if it's random sentences or random prompts that you're gonna check, rather than a set phrase where a recorded voice or something like that could possibly get around it. So it's a very key thing, you know.
0:17:17.970 --> 0:17:18.300
Steve Tomkinson
Yeah.
0:17:13.740 --> 0:17:19.980
tom (gast)
Yeah. So a voice print combined with a personal PIN code or whatever, it's very solid.
0:17:20.290 --> 0:17:22.100
tom (gast)
So yeah, yeah.
0:17:19.540 --> 0:17:28.570
Steve Tomkinson
Yeah, really solid. And I suppose that starts to bring us on to what's happening tomorrow. I suppose the point is that...
0:17:29.360 --> 0:17:40.550
Steve Tomkinson
Tomorrow looks different again. We've got developments now in neural TTS and stuff like that, which is very nice; it sounds really quite good.
0:17:41.210 --> 0:17:41.540
tom (gast)
Yes.
0:17:41.170 --> 0:17:49.280
Steve Tomkinson
It's close to just a human voice, essentially: the intonation is getting good and all that type of thing.
0:17:57.860 --> 0:17:58.240
Steve Tomkinson
Yeah.
0:18:5.120 --> 0:18:5.460
Steve Tomkinson
Yeah.
0:18:13.240 --> 0:18:13.590
Steve Tomkinson
Yeah.
0:17:49.700 --> 0:18:18.480
tom (gast)
That's right. You see that in particular in countries that are already more or less acquainted with speech, like the United States, also the UK, and maybe a little more in Germany. Those TTS engines are incredibly well tuned, and they're still developing. You already have tools to influence the intonation, the tempo, the speech rate; you can make a voice more happy, you can make a voice more serious.
0:18:18.860 --> 0:18:19.280
Steve Tomkinson
Yeah.
0:18:24.470 --> 0:18:25.120
Steve Tomkinson
Yeah, yeah.
0:18:19.650 --> 0:18:29.140
tom (gast)
Incredible. You have the toolbox; you can just work with it and sort of customize your voice. The quality of the TTS, and of the recordings made to...
0:18:34.800 --> 0:18:35.180
Steve Tomkinson
Yeah.
0:18:41.30 --> 0:18:41.390
Steve Tomkinson
Yeah.
0:18:44.850 --> 0:18:45.120
Steve Tomkinson
Yeah.
0:18:29.220 --> 0:18:49.730
tom (gast)
Create that quality, is amazing today. It's really good. Of course, there will always be a difference between prompts prerecorded by a voice artist in the studio and TTS, but still, the development is incredible. And now you already see interesting things. Last week I saw on television...
0:18:50.380 --> 0:18:55.810
tom (gast)
Yeah, this context-sensitive way of writing a...
0:19:1.520 --> 0:19:1.870
Steve Tomkinson
Yeah.
0:18:56.620 --> 0:19:2.340
tom (gast)
Note or a Twitter message, where...
0:19:8.260 --> 0:19:8.630
Steve Tomkinson
Yeah.
0:19:3.330 --> 0:19:11.380
tom (gast)
The cloud is already able to see the context of what you're writing and can then react in context.
0:19:12.20 --> 0:19:12.210
tom (gast)
No.
0:19:11.810 --> 0:19:12.790
Steve Tomkinson
Yes. Yeah.
0:19:12.880 --> 0:19:22.290
tom (gast)
You can even give a certain system the assignment: "Write me a little story about a little sailing ship."
0:19:22.870 --> 0:19:23.480
Steve Tomkinson
Yeah, yeah.
0:19:23.370 --> 0:19:27.80
tom (gast)
And the thing starts writing, and out comes a solid story.
0:19:25.320 --> 0:19:30.990
Steve Tomkinson
Yeah, well, you're talking about ChatGPT, aren't you?
0:19:32.230 --> 0:19:32.560
Steve Tomkinson
Yeah.
0:19:30.360 --> 0:19:40.630
tom (gast)
Exactly. It will not be long before that carries over to voice as well, in fact.
0:19:55.470 --> 0:19:56.820
tom (gast)
It's not not only a none.
0:19:40.340 --> 0:19:59.70
Steve Tomkinson
Well, I was looking at it, so I've used that and I've been playing around with it for a while. There's a lot of news about it not quite giving you the right answers back, but being very authoritative about it. Well, that's slightly different, and...
0:20:0.670 --> 0:20:13.940
Steve Tomkinson
A lot of people have actually missed the point, which is that it gave the answer back in an authoritative, very nicely grammatically correct...
0:20:16.490 --> 0:20:16.700
tom (gast)
Umm.
0:20:38.450 --> 0:20:38.880
tom (gast)
Yes.
0:20:14.680 --> 0:20:45.330
Steve Tomkinson
Format. Like you said, it wrote an article; now there are loads of people going, "I wrote this article with ChatGPT," and you're going, "You're great. OK. Yeah. So did everybody else," and that sort of thing. But it is impressive, the way it's forming its answers, and that then leads on to a voice thing. Like you said, it gets context, it gets what you're after. It might not necessarily answer the question correctly, but if it starts becoming...
0:20:45.390 --> 0:20:53.450
Steve Tomkinson
Or there is an ability to make it a subject matter expert in a specific vertical, or personal to a business,
0:20:58.610 --> 0:20:59.40
tom (gast)
Yes.
0:21:8.370 --> 0:21:9.450
tom (gast)
That's right.
0:20:54.100 --> 0:21:11.320
Steve Tomkinson
Then that's a massively powerful thing, and I think that's the next step for it. And then of course you've got very natural responses. You stick a very nice TTS on the end and you've got a dynamic voice environment, which is huge. That's very, very good, you know.
0:21:11.0 --> 0:21:18.310
tom (gast)
Well, you know, I was thinking, if you have the context and you have a sort of...
0:21:20.370 --> 0:21:27.720
tom (gast)
Yeah, if you have a broader horizon for analyzing the context. I mean, take a question like "Can you have a look at my balance?"
0:21:28.200 --> 0:21:28.670
Steve Tomkinson
Yeah.
0:21:28.420 --> 0:21:32.190
tom (gast)
And then the voicebot or the speech application would say:
0:21:33.590 --> 0:21:34.400
tom (gast)
For what account?
0:21:34.940 --> 0:21:35.840
Steve Tomkinson
Yeah, that's right.
0:21:35.590 --> 0:21:39.450
tom (gast)
Or... yeah, that makes a lot of sense, you know.
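Tom's balance example is a classic slot-filling exchange: the intent is clear, but a required detail (the account) is missing, so the bot asks for it instead of failing. Below is a minimal sketch with a hypothetical `next_prompt` function, intent name and prompt wording; it also skips the question when the caller has only one account, which is an assumed design choice rather than something stated in the podcast.

```python
from typing import List, Optional


def next_prompt(intent: str, account: Optional[str], accounts: List[str]) -> str:
    """Decide the bot's next line for a hypothetical 'check_balance' intent."""
    if intent != "check_balance":
        return "How may I help you?"
    if not accounts:
        return "I can't find any accounts for you."
    # With only one account there is nothing to ask: fill the slot directly.
    if account is None and len(accounts) == 1:
        account = accounts[0]
    # Missing or unrecognised account: ask for the slot instead of failing.
    if account not in accounts:
        return f"For which account: {', '.join(accounts)}?"
    return f"Looking up the balance on your {account} account."


print(next_prompt("check_balance", None, ["current", "savings"]))
# For which account: current, savings?
```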
0:21:45.910 --> 0:21:46.150
tom (gast)
It's.
0:21:47.630 --> 0:21:48.120
tom (gast)
Exactly.
0:21:56.80 --> 0:21:56.580
tom (gast)
Exactly.
0:22:0.430 --> 0:22:0.610
tom (gast)
Yep.
0:21:39.310 --> 0:22:11.200
Steve Tomkinson
Yeah, yeah, that's right. But the thing with that is, you have to be the subject matter expert, because you're inside the banking environment, and that's what ChatGPT is not doing at the moment. Its actual source is the Internet. It's gathering content from the Internet, and that's where its data is. You look through a few layers and it basically ends up saying, yeah, we scraped it all from the Internet. Well, that's OK, but the accuracy is poor, and everybody's moaning about that, because it's basically turning into a...
0:22:11.400 --> 0:22:13.570
Steve Tomkinson
Chatbot version of Wikipedia.
0:22:14.70 --> 0:22:14.410
tom (gast)
Yeah.
0:22:39.980 --> 0:22:40.300
tom (gast)
Yeah.
0:22:41.730 --> 0:22:42.180
tom (gast)
2nd.
0:22:14.540 --> 0:22:45.70
Steve Tomkinson
And, umm, as we all know, that varies wildly in what's true and not true. Much as it's moderated and they try to make it as good as possible, the fact that it's an editable space means somebody can put something in there that isn't true. But when you're in a vertical, like insurance, financial, or even just a retail environment, it has to be specific, it has to know what to do next, and it has to be trained in that bit. So that's going to be the next challenge for that thing: if they then start going...
0:22:52.320 --> 0:22:52.510
tom (gast)
No.
0:22:45.170 --> 0:22:56.260
Steve Tomkinson
"Right. OK. Here's data for X, here's data for X, here's data for X," and then it will go and understand all that stuff, then, you know, we're there, aren't we? That's it.
0:22:54.900 --> 0:22:59.440
tom (gast)
Well, that's where we're heading, yes. But then again, let's be honest with each other, I mean...
0:23:5.30 --> 0:23:5.390
Steve Tomkinson
Yeah.
0:22:59.590 --> 0:23:6.640
tom (gast)
And we are living in a world where we're already thinking about the day after tomorrow, but still we see that...
0:23:8.420 --> 0:23:12.110
tom (gast)
Current business customers are not even.
0:23:13.440 --> 0:23:15.370
Steve Tomkinson
They're still in the 1980s.
0:23:13.20 --> 0:23:16.20
tom (gast)
Using it. Still the 90s.
0:23:16.780 --> 0:23:17.220
tom (gast)
I mean.
0:23:16.480 --> 0:23:20.670
Steve Tomkinson
The 90s, alright, I'll give them the 90s then. But it is dreadful, you know.
0:23:30.420 --> 0:23:30.610
Steve Tomkinson
Yeah.
0:23:19.280 --> 0:23:36.310
tom (gast)
Yeah, I mean, to make that step into a really good voice bot environment, where we do some solid monitoring of context and domains, and to make that particular interaction between the caller and a company work very well...
0:23:44.170 --> 0:23:44.550
Steve Tomkinson
Yeah.
0:23:36.730 --> 0:23:47.90
tom (gast)
It does take some investment, but with the technology we have come so much further than what we are still implementing today. It's a little bit ridiculous, actually.
0:24:3.920 --> 0:24:4.300
tom (gast)
You're.
0:23:47.450 --> 0:24:7.960
Steve Tomkinson
Yeah, I know. So I really don't get why there isn't a race to this, because I think the first movers will get a massive advantage. And, you know, the ones that have got to race to it now are the ones that have got queues, that have got waiting lists. Anybody now that's got a queue above...
0:24:9.80 --> 0:24:15.350
Steve Tomkinson
5 minutes has got to be looking long and hard at themselves and saying, "I need a better solution than this."
0:24:20.180 --> 0:24:20.460
Steve Tomkinson
No.
0:24:15.20 --> 0:24:31.310
tom (gast)
Yeah, exactly. And you don't have to automate the whole journey. I mean, you can automate the beginning, the simple questions, the stuff that is easy to cover, and already you can do so much better than what we're still doing today.
0:24:44.80 --> 0:24:44.770
tom (gast)
Absolutely.
0:24:31.600 --> 0:25:2.790
Steve Tomkinson
Yeah, and if you think about it, the bots now, with language that is so good, are better perceived than, you know, the Indian or Malaysian call centers, or going somewhere else, for those that are out there doing that stuff. I mean, look, it is cheap and cheerful. It is a very cost-effective solution to go to the Far East, but it's not a great brand thing. And I think the nice thing about the voice bots is that...
0:25:10.320 --> 0:25:11.420
tom (gast)
Where? Yeah.
0:25:25.610 --> 0:25:25.890
tom (gast)
Yeah.
0:25:3.170 --> 0:25:29.470
Steve Tomkinson
...by doing that, you're actually seen as an innovative company. You're just trying to help, trying to be as smart as you can, rather than it feeling like, well, we've replaced a load of people with cheaper people and there's some slavery going on over there, which is the feeling when you go out to those places. It doesn't have to be the case, but it's the feeling, and that's the impression, I think, that people get in the Western world.
0:25:43.730 --> 0:25:44.570
Steve Tomkinson
Yeah, that's right.
0:25:52.50 --> 0:25:52.360
Steve Tomkinson
Yeah.
0:25:53.350 --> 0:25:53.950
Steve Tomkinson
Yeah.
0:25:29.950 --> 0:25:57.600
tom (gast)
Yeah, I do agree with you. And also don't forget the fact that the voice bot is very adaptive. If you have a change in your environment, a change to your product or whatever, it is very easy to make those changes and put them directly into a live application, compared to a call center working for you that needs to be instructed whenever something changes. So it's also a cost-effective solution when it comes to...
0:26:0.200 --> 0:26:0.420
tom (gast)
No.
0:26:10.560 --> 0:26:10.950
tom (gast)
Yes.
0:26:14.100 --> 0:26:15.350
tom (gast)
No, it was.
0:25:57.310 --> 0:26:16.920
Steve Tomkinson
Yeah, you can't change that dynamically as easily, because there's a big training effort, you know, because of the volume of people coming through the door. If you've got 500 people working in a contact center to deal with that level, then of course that's that. You can't train those overnight. So it's a long... it's a big shift, you know.
0:26:18.970 --> 0:26:19.720
Steve Tomkinson
Yeah, yeah.
0:26:28.480 --> 0:26:28.770
Steve Tomkinson
Yeah.
0:26:29.560 --> 0:26:30.370
Steve Tomkinson
Yeah, absolutely.
0:26:31.400 --> 0:26:31.690
Steve Tomkinson
Yeah.
0:26:16.700 --> 0:26:33.70
tom (gast)
Yeah. And also the control of the whole thing. I mean, if company X has installed a very solid, well-designed voicebot, they can also tweak it themselves in a very fast, efficient way. But still...
0:26:35.200 --> 0:26:37.630
tom (gast)
I get the idea that a lot of companies still.
0:26:38.460 --> 0:26:41.90
tom (gast)
Don't know the potential of voice.
0:26:58.230 --> 0:26:58.650
tom (gast)
Yeah.
0:26:40.940 --> 0:26:59.360
Steve Tomkinson
No, I think that's the problem. And I think we're still in an education phase, which is amazing, but I can see where it's coming from. But, you know, they've got to be on it now, and it's easy to demonstrate, it's easy to show how good it is now. So, you know, it's quicker and easier to do.
0:27:7.780 --> 0:27:8.330
Steve Tomkinson
Yeah, yeah.
0:27:15.840 --> 0:27:16.160
Steve Tomkinson
Yeah.
0:27:20.910 --> 0:27:22.280
Steve Tomkinson
Yeah, the cinema, yeah.
0:27:28.670 --> 0:27:28.970
Steve Tomkinson
Yeah.
0:26:59.310 --> 0:27:30.810
tom (gast)
But still, we have to cope with the bad impressions made like 12 or 13 years ago, when Deutsche Bahn, for instance, installed this application where you could buy a ticket on the phone, and where you had exactly the same problem as in the illustration you just made. That is still having its effect on the acceptance of voice today. If you talk to an average German company...
0:27:30.930 --> 0:27:33.970
tom (gast)
They will say, well, with the Bahn it failed, so why should we work on it?
0:27:35.70 --> 0:27:35.440
tom (gast)
Yeah.
0:27:33.910 --> 0:27:36.440
Steve Tomkinson
Yeah, that's right. Yeah, that's right, everybody does it.
0:27:36.590 --> 0:27:36.930
tom (gast)
Yeah.
0:27:37.350 --> 0:27:45.720
Steve Tomkinson
Alright. Well, look, I think hopefully that's been interesting for everybody. And, you know, we've covered the whole...
0:28:4.620 --> 0:28:4.920
tom (gast)
Yeah.
0:27:46.780 --> 0:28:8.400
Steve Tomkinson
...decades of voice in about 25 minutes. So, you know, it was a bit of a race through, but I think it's interesting to see where it is and where it's come from, to put it into context. And I think that's the important part: to then say, look, this is now proven technology. It's done, it's happening. There are people, banks and all sorts, using this.
0:28:10.650 --> 0:28:11.140
tom (gast)
Yeah.
0:28:9.170 --> 0:28:20.680
Steve Tomkinson
Really. Well, there are plenty not doing it, and you know the ones that are not doing it: the ones that ask you to answer single questions with single words. If they're doing single words, you know they're not in it.
0:28:25.400 --> 0:28:25.710
Steve Tomkinson
Yeah.
0:28:21.20 --> 0:28:29.550
tom (gast)
Yeah, that's right. But yeah, maybe we should see this as a sort of introduction, and in the following podcasts let's go a little bit deeper into...
0:28:36.160 --> 0:28:37.220
Steve Tomkinson
Yeah, yeah, yeah.
0:28:38.490 --> 0:28:38.800
Steve Tomkinson
Yeah.
0:28:30.630 --> 0:28:46.90
tom (gast)
...let's go into detail on what has actually changed. Why is it really better? And let's also give some examples. Why not build something on the fly and hear how far the technology has come?
0:28:47.800 --> 0:28:48.20
tom (gast)
Yeah.
0:28:52.100 --> 0:28:52.420
tom (gast)
Yes.
0:28:43.840 --> 0:28:58.70
Steve Tomkinson
Yeah, we could maybe do that. Well, we can certainly demo it on the podcast anyway. But look, thanks very much, everybody. I found that interesting, and look out for our next ones coming up. Thanks again, Tom. Cheers.
0:28:59.10 --> 0:29:0.60
Steve Tomkinson
OK, cheers.
0:28:57.710 --> 0:29:0.750
tom (gast)
OK. You're welcome. Thank you too. OK, bye.