Technologies Impacting Society

The Future Of Voice With Dr. Catherine Breslin

December 18, 2020 INA | Catherine Breslin Season 1 Episode 8

Dr. Catherine Breslin, https://www.linkedin.com/in/catherine-breslin-0592423a, is a machine learning scientist specialising in speech and language technology. Catherine has a distinguished academic background, with a degree from Oxford and postgraduate qualifications from Cambridge. Her career has been about the application of speech and language research to real-world problems. She worked for Toshiba and then moved on to Amazon, where she played a major role in developing Alexa and bringing it onto other platforms, and she now works for Cobalt Speech: https://www.cobaltspeech.com/

I caught up with Catherine, where we got to discuss the future of voice technology and where it's all going.

--------------------------------------

FOLLOW CATHERINE BRESLIN, Scientist

🌐 Website: https://www.catherinebreslin.co.uk/about
➡️ Twitter: https://twitter.com/catherinebuk

--------------------------------------

Oriel - Magnesium For Sleep 😴
Affiliated with the Oriel Magnesium Store: Get deep sleep 💤, boost your energy 💪 and support your immune system.

Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you.

--------------------------------------

FOLLOW ME:

➡ Website:
http://www.inaom.io
➡ Link-tree:
https://linktr.ee/inaom
➡ Instagram: http://www.instagram.com/iomurchu
➡ Twitter: http://www.twitter.com/Ina

--------------------------------------

SUBSCRIBE:

➡ YouTube: https://www.youtube.com/@iomurchu

--------------------------------------

JOIN & FOLLOW TECHIS:

➡ Facebook: https://www.facebook.com/groups/TECHIS
➡ LinkedIn: https://www.linkedin.com/company/technologies-impacting-society

--------------------------------------

SUBSCRIBE TO GET THE LATEST EPISODES!

➡ Spotify: https://spoti.fi/3bJWfex
➡ iTunes: https://bit.ly/2LTxKRs

--------------------------------------

RATE MY PODCAST ⭐️⭐️⭐️⭐️⭐️

➡ Rate TECHIS: https://ratethispodcast.com/havealisten

--------------------------------------

Ina O'Murchu :

Hi there, and you're very welcome to my podcast show. I'm your host, Ina O'Murchu. In this podcast, I got to speak with Dr. Catherine Breslin, https://www.linkedin.com/in/catherine-breslin-0592423a, a machine learning scientist specialising in speech and language technology. Catherine has a distinguished academic background, with a degree from Oxford and postgraduate qualifications from Cambridge. Her career has been about the application of speech and language research to real-world problems. She worked for Toshiba and then moved on to Amazon, where she played a major role in developing Alexa and bringing it onto other platforms, and she now works for Cobalt Speech https://www.cobaltspeech.com/ I caught up with Catherine to discuss the future of voice technology and where it's all going.

Dr Catherine Breslin :

Hi! So thanks a lot for inviting me on to talk with you today. My name is Catherine Breslin, and I've been working in voice and language technologies since the early 2000s, building speech recognition systems, voice interfaces, and computers that have conversations with people.

Ina O'Murchu :

Catherine, can you tell us a little bit about your background - how you've ended up where you're at today with voice?

Dr Catherine Breslin :

Sure. So, I started off not entirely knowing what it was I wanted to do when I was younger, but I did love maths and science and computers, and so I went off to university to study engineering. It was only while I was there that I really thought about using computers in different ways. In the final year of my engineering degree, we started to study what was at the time called pattern recognition and machine learning - the idea of how you get computers to learn things, how you get them to show some sort of intelligence and do some of the things that people find really easy but computers find really hard, because we can't just program them to do things like vision, speech, and hearing. To us these come really naturally, and small children can do them pretty well, but we just couldn't get computers to do these things very easily. So I discovered this field of machine learning - how you get computers to start to understand some of these complex tasks and be able to do them - and it went from there. I went on to study a Master's and a PhD in speech recognition, trying to get computers to listen to people. Since then, I've worked in various different places. I've worked in academic research, both at the University of Cambridge and at Toshiba Research here in Cambridge, doing general research into improving the accuracy and performance of speech recognition systems. Then, around the early 2010s, there was a lot of excitement and a lot of breakthroughs and new progress in using deep neural networks for machine learning - technology which had been around a while but really came into its own around that time - and the performance of speech recognition systems started to get a lot better. That led me into industry, working on products, and I ended up at Amazon working on Amazon Alexa for a few years. Amazon was a great place; I joined just before Alexa was launched, so it was, you know, amazing to join and see this product become really successful, with everybody starting to talk about it. After a few years at Amazon, I moved on to my current role with a company called Cobalt Speech. We're a consultancy: we build custom technology for different clients of ours who want to use voice and language technology. And so now it's really interesting to see and hear all of the different things that different businesses are thinking about voice technology and how they want to use it in their businesses.

Ina O'Murchu :

Can you talk about some of those interesting things that companies are looking at? What cool things can be built with voice technology, Catherine? Do you have any specific use cases you can share with us?

Dr Catherine Breslin :

Yeah, there are some really interesting things you can build with voice technology. But first, maybe let's take a step back and ask: what is voice technology, and what does it cover? I think the way most people end up talking to a computer is via some sort of smart speaker or some sort of personal assistant on their phone, and there's a set of different technologies that go into these systems. There's, for example, speech recognition, which is the technology that takes audio and converts it into text - the words that you said in that audio. There's language understanding and dialogue technology, which takes that text and tries to understand what it is you've said and how a computer should respond to the thing you've asked about. There's text-to-speech technology, which is the way a computer can speak back to you - if you're interacting with an assistant, often you'll hear a text-to-speech voice, an automated voice, speaking the response back to you. But then there are other things we might want to understand from audio too. In a long stream of audio - say at a meeting, or a broadcast, or even a podcast, where lots of people are talking - you can use technology to understand who's speaking when. This is something we call diarisation. And there are other things you can understand from the audio signal as well: you can start to classify or guess some of the other aspects of speech. One thing that interests people a lot is the idea of whether you can understand some sort of sentiment from the way people are speaking - can you tell if somebody is excited, or if somebody is sad, from the way they're talking? There's other related technology you can use as well. Leading on to use cases now: one of the things we can do is use speech recognition technology to understand what someone is saying in a scenario where they're trying to learn a foreign language, and work out whether their pronunciation matches well with the pronunciation you would expect from a native speaker of that language. This lets you give somebody feedback when they're learning a foreign language about where they're making mistakes, where they should improve, and where they should focus their efforts. So I think that's a really nice use case for speech technology.
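
To make the pipeline Catherine describes a little more concrete, here is a minimal sketch of one turn of a voice assistant in Python. The three stages are placeholders standing in for real trained models; the function names and the toy weather intent are purely illustrative, not any particular product's API.

```python
# One turn of a voice assistant: audio -> text -> intent -> reply -> audio.
# Each stage is a placeholder for a real trained model.

def speech_to_text(audio: bytes) -> str:
    """Automatic speech recognition: audio in, transcribed words out."""
    return "what's the weather today"  # placeholder transcription

def understand(text: str) -> dict:
    """Language understanding: map the transcript to an intent."""
    return {"intent": "get_weather" if "weather" in text else "unknown"}

def respond(intent: dict) -> str:
    """Dialogue: choose a response for the recognised intent."""
    replies = {"get_weather": "It looks sunny today.",
               "unknown": "Sorry, I didn't catch that."}
    return replies[intent["intent"]]

def text_to_speech(text: str) -> bytes:
    """Speech synthesis: render the response text as audio."""
    return text.encode("utf-8")  # placeholder "audio"

audio_in = b"..."  # microphone capture would go here
reply_audio = text_to_speech(respond(understand(speech_to_text(audio_in))))
```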

Ina O'Murchu :

Have you got any examples of cool things that have been built, apart from Alexa, Catherine?

Dr Catherine Breslin :

So we do build lots of cool, different technologies, many of which we can't talk about, because our clients are the ones building or distributing them, but I can talk about some of the use cases we have worked on. Understanding how someone is talking in order to give them feedback for learning is one. Another area we focus on is searching audio: if you have lots of audio and you're trying to search it for particular keywords or particular phrases that people say at different times, we have technology that will do that. This can work really well in interactions where there are certain things the person you're talking with legally has to say. If you're getting financial advice, for example, there are certain disclaimers that somebody has to give you, and for compliance purposes these conversations often have to be recorded and then manually monitored to check that those phrases are being said - that the right caveats are being given to people when they're being sold financial advice. We can use automated technology to help search for these sorts of recurring phrases in conversations like this. And this can happen in conversations where you're talking one on one across a desk, but also in a call centre, when you're calling up. A lot of call centre providers now are really interested in whether you can automate any of the call centre experience - whether you can call up and have some of the simple questions that people might want to ask answered automatically.
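
A toy illustration of the compliance use case: once calls have been transcribed, required disclaimer phrases can be searched for automatically. This is a simple exact-match sketch on plain text; the phrases are invented examples, and a real system would also have to tolerate recognition errors.

```python
# Check transcribed calls for required disclaimer phrases.

REQUIRED_PHRASES = [
    "your capital is at risk",  # hypothetical example disclaimers
    "past performance is not a guide to future returns",
]

def normalise(text: str) -> str:
    return " ".join(text.lower().split())

def missing_disclaimers(transcript: str) -> list[str]:
    """Return required phrases that never appear in the transcript."""
    t = normalise(transcript)
    return [p for p in REQUIRED_PHRASES if normalise(p) not in t]

call = "...and remember that your capital is at risk when investing..."
print(missing_disclaimers(call))
# ['past performance is not a guide to future returns']
```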

Ina O'Murchu :

It's really heading towards automation...

Dr Catherine Breslin :

I think, yeah, there's a lot of heading towards automation, but not automation of everything, because human conversation is really complex. There are a lot of nuances and a lot of difficult problems that people have to resolve through conversations with other people. But there are lots of very simple things that people want answers to, and some of those companies are already looking at whether they can automate those simpler queries.

Ina O'Murchu :

More repetitive ones? I think the automation of simpler tasks is possible. I mean, with Alexa there's a whole new world developing - there's monetising Alexa skills, for example. For a small business, this never existed before Alexa came along, and now it's like the early days of app development.

Dr Catherine Breslin :

Yeah.

Ina O'Murchu :

So, I suppose, in terms of the direction of the opportunities - well, it depends on where the technologies have been developed, but in the area of voice there is this new opening. It's an area of innovation, isn't it really - Alexa skills?

Dr Catherine Breslin :

Yeah, I think Alexa skills are very interesting because they allow you to build your own voice interfaces and your own things that you want to do by voice. I've done this at home with my kids: we've built an Alexa skill that answers some of the questions about a topic the kids have learned at school. They have a topic at school, we've put it into an FAQ and made it a little Alexa skill that works at home. It's great fun - the kids love to type in what they learned in school and hear Alexa speak it back. But it also has great applications for other people, because you can now very easily make your content accessible on a large platform: lots of people talk to Alexa, and they can effectively talk with you and your company that way. So I think it's great to be able to do that on such a large platform, to have the chance to take your content there. Just like you said, it's like the early days of apps, when the App Store on the iPhone started and there was a boom in the applications developers were building - now we're very used to having apps on our phones, and there's an app for everything. I think Alexa skills are the same: they're letting people build content for the voice platform.
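
As a rough, platform-agnostic sketch of the FAQ idea behind such a skill, the snippet below matches a user's question to the closest stored question and returns its answer. The real Alexa Skills Kit has its own SDK and request format, which this deliberately does not reproduce.

```python
# Match a spoken question to the nearest FAQ entry.
import difflib

FAQ = {
    "what is the capital of france": "The capital of France is Paris.",
    "when did the romans invade britain": "The Romans invaded Britain in AD 43.",
}

def answer(question: str) -> str:
    """Return the answer for the closest matching FAQ question."""
    match = difflib.get_close_matches(question.lower(), list(FAQ),
                                      n=1, cutoff=0.5)
    return FAQ[match[0]] if match else "I don't know that one yet."

print(answer("When did the Romans invade Britain?"))
```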

Ina O'Murchu :

Yeah, and it fits in quite well with the area of chatbots as well - this content and voice and the automation of it. I don't know, it just seems like a whole new industry has opened up since the launch of Alexa, really. I mean, voice is still in the early stages of development - we're not quite there yet - but you spoke about computers being able to tell whether someone was excited or upset. Do you have any use cases you can talk about for an application of this technology? Sort of emotional scanning - is that what you would call it?

Dr Catherine Breslin :

So I think you have to be a little careful here, because you can't necessarily tell someone's internal emotional state from their voice. You can tell what they're trying to project with their voice, and there may be times when you want to understand what emotions people are trying to project. But I think there's a really interesting application the other way around, where you have a synthetic voice and you want that synthetic voice to speak in an expressive way. Here I think there are lots of applications. For example, you can use synthetic voices for reading audiobooks, and if you can read an audiobook in an expressive style, you can put a lot more nuance into the text you're reading than you would otherwise be able to. The same goes for synthetic voices in computer gaming: you can generate a lot of text automatically, and you want emotion in there, because these things are designed to be emotional, to get you excited and have you interact with the game. So I think a voice which is able to express emotion automatically is a really interesting application.

Ina O'Murchu :

For sure, within the computer games themselves. So how would you code that? Do you have to code those nuances in?

Dr Catherine Breslin :

You have to know what you want your voice to project at any one time. Maybe audiobooks are an easier one to think about: if you're reading a book and something exciting is happening, and your character says something in an excited voice, you want to be able to read that in an excited style. So you have to know that this particular bit of text ought to be read in an excited style. But maybe there's another bit of text later where the character is nervous, and you want to convey that in the voice you're reading with. So you have to know which bits of the text correspond to which emotion you're trying to present.
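
One way to picture this is as markup over the book text: each span is labelled with the style it should be read in, and the labels are rendered as tags for the synthesiser. A small sketch follows; the tag name and style labels are illustrative assumptions, not any particular vendor's markup language.

```python
# Label spans of text with a speaking style and render simple markup.

passages = [
    ("neutral", "She opened the door slowly."),
    ("excited", '"We won! We actually won!"'),
    ("nervous", '"I\'m not sure we should be here," he whispered.'),
]

def to_markup(spans: list[tuple[str, str]]) -> str:
    """Render (style, text) spans as SSML-like tags."""
    return "\n".join(f'<style name="{s}">{t}</style>' for s, t in spans)

print(to_markup(passages))
```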

Ina O'Murchu :

Okay, so that depends on the content, as it usually does. Yeah, it's interesting. So where do you see the world of voice leading us in the future, Catherine? What's the future for voice technology?

Dr Catherine Breslin :

So, I think we're still in the really early days of interacting with voice, and if you've used these systems for any extended period of time, you know that they are not 100% perfect. They still make a lot of mistakes, and they can't have very interesting conversations with people yet. I think there are a few directions this technology will go in. There's, for example, taking the technology out to different languages. A lot of work has been done in English and in other major languages - some of the European languages, Chinese, Japanese - but there are lots of minority languages for which we don't have good resources for building voice technology: we don't have good data collections, and we don't have many people speaking those languages to the systems. So I think one of the things that will happen is that voice technology will be developed in more languages and made much more accessible to many more people in the world. Aside from language, I think the technology will get better. Speech recognition is improving a lot over time, so we'll see the technology get more accurate at understanding people, but our conversation technology will also get more involved, and you'll be able to have much more interesting conversations with computers. There's a lot of research now in the academic community about how to have longer conversations between a computer and a person. A few years ago there was the Alexa Prize, a competition sponsored and run by Amazon, which asked: can you design a system that will hold an open-ended conversation with a person for some number of minutes - maybe 20 minutes or so, which is a long time to have a conversation with somebody? These sorts of competitions are driving innovation and research in the area of conversation, and we're seeing people try different things and find new ways to hold conversations. So this technology will filter through to applications in the next few years, I think.

Ina O'Murchu :

So where do you see businesses benefiting, Catherine, from the use of voice technologies within their business? Can you give some examples?

Dr Catherine Breslin :

So, I think there are many places where businesses deal with audio and can benefit from using speech technology. A really obvious example is call centres. Call centres have people phoning up all the time with problems that need to be resolved. We talked a little about automating call centres and how you can use conversational technology to interact with customers and automate some of the work. But another thing call centres might be interested in is analysing the conversations that are happening with their operators and figuring out what people are calling about. If people are calling about specific kinds of problems, you want to train your call centre operatives to better answer those questions. But you may also discover that people are calling about one particular thing at one particular time because something went wrong with the products you sent out, so you can catch issues and trace them back. If you can understand what people are calling you about on a large scale, you can help drive the direction of your business towards fixing the right problems. So there are definitely uses there in understanding what your customers are contacting you about, because that's very important. But there are other places inside businesses where you might be dealing with large amounts of audio. A simple example is subtitling videos: you might have internal training materials, internal video content, that you want to make more accessible to people, so subtitling - and, as I said, searching that audio - can help make things more efficient. You might have a lot of meetings that you want to minute better and better understand what's happening in them. So there's the chance to use voice technology to analyse what's going on in conversations and help you understand what's happening in order to make things better.
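
A toy sketch of the analytics idea: tag each transcribed call with a topic and count topics over time, so a sudden spike points at a product problem. The keyword tagger here is a stand-in for a real classifier, and the keywords are invented.

```python
# Count call topics from transcripts to spot emerging problems.
from collections import Counter

TOPIC_KEYWORDS = {"refund": "billing",
                  "broken": "faulty product",
                  "password": "account access"}

def topic(transcript: str) -> str:
    """Assign a topic from the first matching keyword."""
    for keyword, label in TOPIC_KEYWORDS.items():
        if keyword in transcript.lower():
            return label
    return "other"

calls = ["My charger arrived broken", "I forgot my password",
         "The screen is broken again"]
print(Counter(topic(c) for c in calls))
# Counter({'faulty product': 2, 'account access': 1})
```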

Ina O'Murchu :

Okay, so where are we at today with automatic speech recognition, Catherine?

Dr Catherine Breslin :

With automatic speech recognition, we have made huge strides in the past 10 years in making systems more accurate. I think we're now at the stage where some very good general-purpose speech recognition systems have been built that are very accurate and do really well on major languages. Especially in English, we have general conversational models that do very well at speech recognition. Speech recognition systems still fall apart a little bit in noisy environments, however. For example, if you're in a noisy car on a motorway and you're trying to use a speech recognition system, perhaps the noise is just too much for our systems to handle well. They also don't do very well in very specific domains. Say you're a doctor in a hospital and you're dictating medical notes to help speed up your data entry and administration: there's a lot of specialised vocabulary there, and a lot of words you wouldn't hear in other contexts. Our general speech recognition systems aren't so good at specific domains like this - medicine, or perhaps legal transcription, or some scientific lectures - because the vocabulary and the style are very different. And I think we will start to see speech recognition technology in more and more languages over time, because we've got some very good technology in major languages now, and there's an engineering effort to build speech recognition in other languages.
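
As a rough sketch of one common adaptation trick, the snippet below rescores a recogniser's n-best hypotheses, boosting ones that contain known in-domain vocabulary, so a misheard drug name loses to the correct one. Real systems typically adapt the language model itself, and the scores here are made-up numbers.

```python
# Rescore n-best ASR hypotheses with a domain-vocabulary boost.

DOMAIN_TERMS = {"hypertension", "metoprolol", "tachycardia"}  # medical words

def rescore(nbest: list[tuple[str, float]], boost: float = 2.0) -> str:
    """Return the hypothesis with the best vocabulary-boosted score."""
    def boosted(hyp: str, base: float) -> float:
        hits = sum(w in DOMAIN_TERMS for w in hyp.lower().split())
        return base + boost * hits
    return max(nbest, key=lambda h: boosted(h[0], h[1]))[0]

# The acoustically likelier hypothesis mishears the drug name.
nbest = [("patient started on metro pool all", -10.0),
         ("patient started on metoprolol", -11.5)]
print(rescore(nbest))  # patient started on metoprolol
```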

Ina O'Murchu :

Are there any really interesting projects you've worked on at your company that you can discuss with us, Catherine - anything that springs to mind?

Dr Catherine Breslin :

So I can talk about one that comes from before my time here, for a company who were looking at noise-robust speech recognition. They're a manufacturer of headsets, and they wanted to improve the speech recognition in their headset. But their headsets were used in very noisy environments, so the audio you get from them is very noisy. They have other sensors on their headset, though - in particular, a laser sensor that measures the vibrations of your cheek. That's obviously not affected by the noise around you, and it can help with speech recognition, because the vibrations your face makes are directly related to the sounds you're making. So we did a project using that signal to improve speech recognition performance in this headset, and I think that's a great example of how you can do something just a little bit different to solve a problem in a noisy environment that you might not otherwise be able to solve.
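
A rough sketch of the multi-sensor idea, under assumed shapes: per-frame features from the noisy microphone and from the noise-immune vibration sensor are fused, here by simple concatenation, before being fed to an acoustic model. The feature dimensions are made up for illustration.

```python
# Fuse microphone and vibration-sensor features frame by frame.
import numpy as np

def fuse(mic: np.ndarray, vib: np.ndarray) -> np.ndarray:
    """Concatenate per-frame features from the two sensors."""
    assert mic.shape[0] == vib.shape[0], "streams must be frame-aligned"
    return np.concatenate([mic, vib], axis=1)

mic = np.random.randn(100, 40)  # 100 frames of 40-dim audio features
vib = np.random.randn(100, 8)   # 100 frames of 8-dim vibration features
print(fuse(mic, vib).shape)     # (100, 48) -> input to an acoustic model
```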

Ina O'Murchu :

Is there an ethical side to voice technology?

Dr Catherine Breslin :

Absolutely. I think as we build technologies that are being used and deployed in the world, we have to think really carefully about how they can be used, how they fit with people's regular lives, what people expect from technology, and what they think they've signed up for. One of the main concerns people have is around privacy and data security. Your voiceprint is unique to you, so people are often a little concerned about their voice being recorded and used in ways they wouldn't want. Here in Europe, we have very strong data privacy laws around protecting and looking after people's data, giving them rights to ask that their data be deleted or to see the data that companies hold on them. Those sorts of laws have come in as a direct reaction to some of the ways this technology has been exploited - not just in voice, but obviously in other areas of AI and machine learning too. But then there are other concerns to think about. When you have a system which is speaking back to you, and that system may or may not be automated, and you can't tell whether you're talking to a computer or to a person, that raises questions about whether that is something that should be done. I firmly stand on the side that if you're talking to a computer, that computer should make you aware that you're talking to a computer and not to a person, because people behave very differently when they're talking to a computer versus when they're talking to a person.

Ina O'Murchu :

How can voice technology be manipulated?

Dr Catherine Breslin :

Imitation is a problem, I think, in voice technology. Until now, we've had the situation where, if you want to build a synthetic voice, you usually have a voice actor, and it's their voice that is being recorded and imitated. But as we get better at doing this with less and less data, and we get faster, we can start to build voices from people who aren't professional voice actors, and there are great uses of this technology. One example is voice banking: if you are about to face a procedure which may mean you could lose your voice, you can record some of your speech beforehand so that a synthetic voice can be built, allowing you to speak with your own voice after the surgery. A lot of people are interested in this. But obviously it raises questions about imitating people and pretending to be them - over the phone, for example - and manipulating others because they believe they're talking to someone they know. This is something I think we're starting to face now: the problem of what people call deepfakes - videos that have been faked - and we can do the same with audio. So there are a lot of questions there about how and when that should and shouldn't be used. And again, this is one of the reasons I think it's always good for a computer to make you aware that you're talking to a computer.

Ina O'Murchu :

Thanks a lot for listening to my podcast. If you'd like, you can subscribe to my podcast on iTunes, Spotify or SoundCloud. Or, if you want more information, you can head over to my website at www.inaom.io