The Power of Sociolinguistic Bias (w/ Dr. Nicole Holliday)

People Nerds by Dscout

The People Nerds blog you know and love, now in your headphones!

August 31, 2022 • dscout

Language undergirds so much of the experience research process: The designs we're evaluating (e.g., button text, UI, a chatbot's mannerisms), the studies we're creating (e.g., survey questions, interview guides), and certainly the data we collect (e.g., verbatims, video). We might "know" that our language use can bias results, but in what ways and to what effects?

In this episode, we chatted with Dr. Nicole Holliday, a sociolinguist from Pomona College, who investigates the ways social identities are formed, communicated, and interpreted via language. She's also interested in technology's role in this process, from automated speech recognition (ASR) software to live text in video software.

In addition to unpacking her work and methods as a sociolinguist, Dr. Holliday shares the ways biases can materially impact the experience of users. In addition, she outlines strategies we can use to think more critically about the role and impact of language in our work.

Show Notes:

Follow Dr. Holliday on Twitter
Check out Dr. Holliday on the Spectacular Vernacular podcast
Read some of Dr. Holliday's work:

Nicole Holliday:

The number one thing is to be really honest about what our biases are going in. We always do a project expecting that we're going to find something one way or another, or even if we're not explicitly doing hypothesis testing, we have a feeling. You should be honest about that with yourself and with your collaborators or whatever. And maybe even in your write up. We expected to find this, we actually found this other thing.

Ben:

Welcome to the People Nerds podcast, expanding your human-centered practice with unexpected sources of wisdom. I'm Ben, joined as always by my colleague and friend, Karen. Hey, Karen.

Karen:

Hi. How are you doing today, Ben?

Ben:

Not as well as you are because today we are talking language and linguistics. Karen, tell us why you are extra excited about this episode.

Karen:

Well, I am super excited because I myself in a past life was in the linguistics field. Before I was a people nerd, I was academically in the world of language and society and societal standards of language, language use and the like. And so I'm very excited today because we are going to be taking a deep dive into the subfield of sociolinguistics, which is a subfield of linguistics concentrated all on the ways in which language helps us perform our identities, interact with one another and make judgements on one another based on the way that we talk. This is something that I studied, and I'm so excited to be talking to our guest today all about it.

Ben:

And before I introduce that guest, I want to reiterate the importance of language to our audience, who are qualitatively leaning, mixed-methods, human-centered thinkers. They often use verbatims, direct quotes, maybe audio or video files as part of their deliverables, and they know the power of language. And so our guest today, I think, is a really important one for us to be thinking about biases and the role of technology. And again, how language is shaped, uttered, and certainly perceived.

Ben:

So our guest today is Dr. Nicole Holliday, assistant professor in the department of linguistics and cognitive science at Pomona College. Dr. Holliday is, as we said, a sociolinguist specifically interested in how people use linguistic variation to perform and construct their social identities and to understand the identities of others. She is especially interested in how individuals who cross traditional social boundaries reflect multiple social identities through linguistic practices. So again, if you thought going into this language, it is so much more than that. Karen, what were your highlights from the conversation outside of all of it?

Karen:

How dare you? Every second that someone's talking about linguistics is the best possible-

Ben:

Is the best part. Yes.

Karen:

Yeah, it is the best part. No, Dr. Holliday, we got to talk about a huge range of topics within this conversation. We dove really deep into the field of sociolinguistics, how it is methodologically carried out and the things it's concerned about. We got to talk about language perception and bias, particularly when it comes to black and biracial speakers, which is Dr. Holliday's particular area of expertise. We talk about the material and emotional impact that language bias can have on certain communities and speakers. We also get deep into technological biases in things like automatic speech recognition and the ways in which those biases are informed by research and data and the ways that we can hopefully fix or improve them in the future with better uses of language and data. And then lastly, we do get to talk about biases in the research process, both linguistic and otherwise, and how to potentially avoid them, or at least be transparent about them in our own work.

Ben:

Yeah, we are so excited to bring you this conversation. We think it will elevate your thinking about and hopefully use of language. So let's get into it. Here is our conversation with Dr. Nicole Holliday.

Ben:

Welcome to the People Nerds podcast, Dr. Nicole Holliday. We're so excited to be joined by you.

Nicole Holliday:

Hi, it's great to be with you all.

Ben:

We have so much that we'd like to cover with you. We think that your work, your research and your methods will have a lot of resonance with our audience, but before we get into those nitty-gritty details, I'm hoping you could start by just describing what your work as a sociolinguist means? What does it mean to do sociolinguistic research, and certainly in today's rather tech-saturated world?

Nicole Holliday:

Yeah. I mean, basically the central question of sociolinguistics is, and this is not me, this is Bill Labov from back in the day: Why did this person say this thing this way in this situation? So everybody knows language varies. Even when I said it was nice to be with you all, I could have said y'all, I could have said you guys, I could have said you, but I said it's nice to be with you all and why did I do that?

Nicole Holliday:

And I actually had a little moment of self reflection when I did that, which is why I'm saying this because I am a person who says you guys as part of my dialect and my language is changing away from that. And so I'm now making different choices and that's exactly the kind of thing that we study. What are people doing? What does it say about the way that they are moving through society? What does it say about them and what does it say about what the nature of language is? What our boundaries are around language?

Ben:

And just within that then, where do you find yourself? I mean, you do have a specific area of interest. Could you talk about what it is that within this amazingly diverse and nuanced ... I mean, human language construction and both paralinguistic and the linguistic elements of it, there's a lot to choose from. So what sorts of questions occupy your time and what sorts of answers are you seeking?

Nicole Holliday:

So I'm interested in how people make social judgements on very small elements of language. So I look at tone and intonation. So how the voice goes up and down and how people are able to make judgments about someone's race, for example, or where they're from, or their age or anything like that, based on very small speech samples. So just where the voice goes up and down, not even anything about the content or anything like that.

Karen:

That is so interesting. And I have a bunch of questions to follow up, but I think first can we start with how? How do you do that? What kind of methods do you use to get out those judgements and perceptions?

Nicole Holliday:

Yeah, this is a really good question. Well, sometimes we just ask people. The basic thing that we teach students to do is get people's impressions of speech. So play somebody a sample, what do they think? And we use that kind of impressionistic methodology sometimes too. Most of my work is quantitative. So I do a lot of measuring. Did the voice go up X hertz at the end of a phrase, as opposed to this other person who goes up a different number of hertz? And what does that mean?

Nicole Holliday:

When they asked a question that was a yes/no question, like "Is today Friday?" or whatever it is, did they raise their voice? Did they lower the pitch? So these kinds of things that we can actually measure and quantify and compare people to each other, or compare the same person in different situations where maybe they're doing one aspect of their identity or doing another aspect of their identity. So there's a lot of ways. I mean, basically we are interested in how people perceive speech, so there's perception studies, and then also how people are actually articulating. So those are production studies.
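As a rough illustration of the kind of measurement described here, the sketch below compares the average pitch at the end of a phrase with the average pitch over the rest of it. It is only a sketch under stated assumptions: the function name `final_pitch_change`, the 20% "tail" window, and both pitch contours are invented for this example and are not taken from Dr. Holliday's studies. It also assumes the pitch values in hertz have already been extracted and checked.

```python
from statistics import mean

def final_pitch_change(f0_hz, tail_fraction=0.2):
    """Return (mean pitch of the phrase ending) - (mean pitch of the rest), in Hz.

    f0_hz: pitch estimates in Hz, one per analysis frame. None marks frames
    where no pitch was found (silence or unvoiced sounds).
    tail_fraction: share of voiced frames treated as the "end of the phrase"
    (a hypothetical choice for this sketch).
    """
    voiced = [f for f in f0_hz if f is not None]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    # Keep at least one frame in the tail and at least one in the body.
    tail_len = min(max(1, round(len(voiced) * tail_fraction)), len(voiced) - 1)
    body, tail = voiced[:-tail_len], voiced[-tail_len:]
    return mean(tail) - mean(body)

# Invented contours for the same yes/no question from two hypothetical speakers.
speaker_a = [200, 200, None, 200, 200, 200, 200, 200, 200, 230, 250]  # rises
speaker_b = [180, 180, 180, 180, 180, 180, 180, 180, 170, 150]        # falls

print(final_pitch_change(speaker_a))  # positive: pitch goes up at the end
print(final_pitch_change(speaker_b))  # negative: pitch goes down at the end
```

A positive number means the voice went up at the end of the phrase and a negative number means it went down, which is the sort of value that can then be compared across speakers or across situations for the same speaker.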

Karen:

From my experience of being in the linguistics world, I know that there are quite a few people out there who haven't really ever thought too much about language as an issue of race or ethnicity and judgment of those things as an issue of racism or oppression. I was wondering if you could speak to that a little bit, what exactly is the connection here between sociolinguistic work and these issues of race and identity?

Nicole Holliday:

So I mean there's really basic stuff. If you talk to somebody on the phone, you don't see them, you're going to get a picture of who they are. And this is just really automatic for us because it's useful for us to have social information with our [inaudible 00:08:32], the people that we're talking to. So if you call your bank and they answer, you're going to make some presumptions, right or wrong, about the gender, the age, the region, the race, including of the person that you're talking to, and this is going to impact how you talk to them.

Nicole Holliday:

So I'll just give you an example from my own life, I'm terrified of the DMV and-

Ben:

Rightfully so.

Nicole Holliday:

Yeah. So actually at one point I was living in DC so I had to call the DC DMV to ask something about paperwork and the woman answered the phone. And I just knew that she was an older black woman and I was also afraid of the DMV. So I was like, okay, I'm talking to an older black woman, it's very important that she knows that I'm also black. And that I'm very, very polite.

Ben:

Yes.

Nicole Holliday:

Because this is a situation where I don't have the power, I need help. And culturally, I have a set of things that I am naturally going to do when I talk to somebody that is older than me, that either is the same race as me, or isn't the same race as me. We all do this when we talk to different people. This can be good. It probably helped in my DMV interaction that I was conscientious about this because maybe she expected to be treated politely. I mean, we should treat people politely anyway, but-

Ben:

Absolutely. Yes.

Nicole Holliday:

But if we have negative social stereotypes, if we are participating in racism because we live in a racist society, then sometimes it negatively impacts the way that we interact with people or that we treat people. I have an ongoing project with Dr. Sabriya Fisher from Wellesley that's funded by the Spencer Foundation where we're basically investigating the relationship between kids experiencing school discipline and their use of particular features of the variety that we call African American English. But for my piece, I'm interested in whether maybe black students are sometimes interpreted as hostile when in fact they're not, it's just a dialect difference. It's a difference in expectation between the way that the teachers talk and the way that the students talk, because they're coming from different experiences, different backgrounds, and you can get this cultural misunderstanding. So this is the kind of thing that I think that's where the stakes are, right?

Ben:

And I'm struck by that they are. I mean, there are material consequences as you've just been outlining there. I was reading one of your previously published pieces, and we'll share a link to it below, about black students in predominantly white universities and the material consequences of certainly out-grouping versus in-grouping, but being a student who feels free to choose majors that are interesting to one and socialize in ways that are ... again, all the sorts of things, all the positive outcomes that we should experience, or we expect rather, from prototypical university environments. Could you describe a bit about what you found in that interview study?

Nicole Holliday:

Yeah. So this is a project that I did with Lauren Squires from Ohio State University. And she is also a linguist and had noticed that she had a lot of black students, semester after semester, that would come up to her and be super interested in the course material at the end of the semester, after they had been quiet in class or would have private conversations with her, but wouldn't be speaking up and things like this. And she wondered, is there something about the social situation that is making these students feel uncomfortable or whatever.

Nicole Holliday:

And once she started asking students about this informally, they said "Yeah, well this is a linguistics class. I'm the only black student here. If I say something incorrect, or if I ask the question that other people think is stupid, it's just really threatening for me. I already kind of feel like I don't belong here because I'm the only one."

Nicole Holliday:

And so we set out to find out what are the factors that influence students to feel this way? Because feeling like you can't speak up in class is a problem and even if you're in a place where you feel like, okay, it's not exactly racially hostile, it's just not the most comfortable situation you're still not getting the most out of your educational experience. So we did a series of interviews with students at Pomona College, where I am in California, which is a liberal arts school and with some students at Ohio State where Lauren Squires is, and they said similar and different things about, "I feel a lot of pressure to sound extra smart" especially in Pomona, that's what the students said. Karen's laughing-

Karen:

Sorry. I'm laughing.

Nicole Holliday:

She went to the Claremont colleges-

Karen:

Alma Mater of Pomona, and that tracks.

Nicole Holliday:

Yeah. So there's a lot of pressure to impress each other and students of all races feel this way. I've heard that, but especially if you feel stereotype threat, you're the only one and everybody's looking at you as the black kid and you are expected, you feel like you're expected to speak for all black people all the time. Then the stakes of speaking up in class are higher.

Nicole Holliday:

And we have a lot of literature on this in education about women, for example, being less likely to speak up in math classes. Anybody that's kind of marginalized can feel this stereotype threat or whatever. And so it's impacting their educational experiences. Also students report issues with sounding different than people expect them to sound and being policed for that in a way. So not sounding black enough in black spaces or sounding too black in the classroom or not being up on the latest progressive language because they grew up in a different kind of community and then being targeted for that. So there's all kinds of things that can go awry because of this relationship that we have between the way that we talk and what it says about our identity. And then of course on top of that, the assumptions that we make about other people.

Ben:

One other question, Dr. Holliday, about the field more broadly, and I'm hoping in the second half of our conversation we can talk about some of the technological implications on your questions around equity, access, different interpretations of folks' tonality and the nuances of their voice. How more broadly have you observed the milieu of technology to change your work or to change linguistics as a field?

Nicole Holliday:

Yeah, no, this is a really good question. People talk about the metaverse thing. We're kind of-

Ben:

Oh my gosh. Yes. Yeah.

Nicole Holliday:

We are embodied in 3D space, but we are also all having these parallel and interlocking online lives too. So the way that people communicate online is a whole area of research for linguists. It's not mine really so much, but I am still interested in, as we're in our online space and in our 3D space, how do some of the same types of questions that we've been talking about play out? So I've just been working on a few projects this year with some collaborators and students at the University of Pennsylvania, and also another project with some collaborators at Consumer Reports and Northeastern University, where we're looking at bias in auto captioning systems.

Nicole Holliday:

So a lot of people use auto captions. If you're on Zoom you can turn on the auto captions. There's captions on YouTube. People are familiar with this. And of course they're there for lots of reasons, for accessibility, for people who are deaf or hard of hearing, for people that have auditory processing difficulties, those are really helpful. They're also really helpful if you're speaking English as a second language and 75% of the people in the world who are speaking English are speaking it as a second or third or fourth or fifth language. That's the best estimate we have. So people really utilize these things. A lot of people just even enjoy watching TV with the subtitles on all the time. Super useful.

Nicole Holliday:

Well, what we found in the series of studies that I've been involved with in the last year is there's a really persistent bias against people who speak English as a second language. So if you have a speaker who comes from wherever, and they're on Zoom and you're talking to them and English is their second language, even if they are master, master-level proficient in English, because of their pronunciation these systems are going to do much worse at understanding them than humans do.

Nicole Holliday:

So if you are a person that's hard of hearing, you are going to be less likely to be able to understand the messages coming from somebody who's speaking English as a second language than somebody who's speaking it as their first language, because the systems are not reliably as good for everybody. Other studies have found racial bias, gender bias. Obviously there's a lot involved with the kind of communication that's happening. If it's super technical, there's a lot of jargon. The systems can struggle. They can struggle on the other end if it's really, really informal, because the way that they function is they do better on things that are more similar to their training data and their training data tends to be semi formal corpus, stuff that was built by somebody a while ago. So if you're talking to somebody they're using a lot of really modern slang or something like that, it's not going to get it because it's not what it was trained on.
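To make the idea of a captioning system doing "worse" for some speakers concrete, here is a minimal sketch using word error rate (WER), a common way to score automatic transcripts against human ones. The conversation does not say which metric the studies used, so WER is an assumption here, and the group labels, sentences, and function names (`word_error_rate`, `mean_wer_by_group`) are all invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the human reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the system
                            curr[j - 1] + 1,      # extra word added by the system
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

def mean_wer_by_group(samples):
    """samples: iterable of (group, human_transcript, auto_caption)."""
    scores = {}
    for group, reference, hypothesis in samples:
        scores.setdefault(group, []).append(word_error_rate(reference, hypothesis))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}

# Invented examples; real studies would use many speakers per group.
samples = [
    ("first-language", "turn on the captions", "turn on the captions"),
    ("first-language", "is today friday", "is today friday"),
    ("second-language", "turn on the captions", "turn on captions"),
    ("second-language", "is today friday", "is to day friday"),
]
print(mean_wer_by_group(samples))
```

A gap in the per-group averages is one simple signal of the kind of bias described above, although a real comparison would also need matched recording conditions and content across groups.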

Ben:

I guess as a follow up to that, Nicole, are you shifting your methods at all? Again, as someone who leans more quantitative are you ... some of my social networking scholar colleagues do network scraping to look, they're not looking at tonality or intonation, or they might be using specific language use questions to drive their research. Have you flexed or modified any of the research methods that you've been using to capture TikTok videos as a way? The multimedia richness of technology might offer a boon for your work or could be wholly overwhelming as a way to systematize it?

Nicole Holliday:

Yeah. I mean, honestly, I'm pretty old school. And it seems really obvious from the outside, but I will tell you that it's more complicated than this. Okay well, we're just looking at people's pitch. Can't we automate that? Can't we measure that? Actually no, pitch tracking is really, really hard because we can just give a little bit of a behind the scenes, how the sausage is made.

Ben:

Please.

Nicole Holliday:

We rescheduled the current interview that you and I are doing because I was in a place where there was a guy talking outside loudly, outside of the booth that I was trying to record in. And it would've sounded bad for the podcast. Well guess what happens if you try to do pitch tracking on my voice in that same situation? It picks him up too. It picks up ambient noise. If there's an air conditioner, if a truck goes by, any of those things.

Nicole Holliday:

So automated pitch tracking for these narrow language purposes is not very functional. So we end up doing a lot of things by hand still, very time consuming. For that reason it's a little bit harder to do things with really big data. I have had some things that to me are pretty big, but for example, this automatic speech recognition auto caption stuff that I was talking about, I can't do necessarily the linguistic pitch measurements, the questions that I would ask on a data set with 900 speakers. I mean, it would just take a really long time. I guess I could do it.

Karen:

So a small army of undergraduates?

Nicole Holliday:

I need to hire about 20 people for a whole year. So that's one thing. There are though, because of the wealth of data that people are basically posting online, you can look at YouTube videos as data. You can look at all kinds of things as data. I have a project that I've been working on for too long, where people were posting Memoji versions of themselves on Twitter talking specifically about code switching. So somebody would post their Memoji talking, saying like, "Well, this is my home voice. And this is how I talk at home. And this is my work voice. And this is how I talk at work."

Nicole Holliday:

And I saw that and I was like, oh my God, this is perfect. They just created a code switching data set for me, and I can see their facial expressions because they're Memoji, but what's more interesting to me is I don't need to ask them questions about race because they chose a particular Memoji skin tone and style and hair to represent themselves. So I'm not saying anything about the race of these people. I'm saying things about how these people presented their race and the relationship between that self presentation and the way that they talk about code switching.

Karen:

Nicole, what you just brought up is so fascinating. And I would really love to dig in more a little bit about your methods and how you choose to go about things the way you do. Let's get back to that right after a short break.

Ben:

Welcome to our Scout Sound Off, where we check in with folks about a topic from the pod. We do this using Dscout Express, a quick-turn qualitative tool, purpose-built for surfacing stories and experience feedback. This episode's Sound Off focuses on folks' dialects and specifically the way they perceive their voice to sound as well as how they believe it's perceived by others. Karen, walk us through what you found this week.

Karen:

Yeah, thanks Ben. And this week was really interesting. We decided to recruit folks who perceive themselves or report themselves as having a dialect or accent other than what linguists might call MUSE, or Mainstream US English. This is what you might hear newscasters speak in or something that you might consider to be non-accented or non-dialect speech. We found 65 people in our panel who reported speaking in a way besides that. And it was our fastest turnaround yet. We launched this in the morning and we were analyzing by mid-afternoon.

Karen:

About half of the folks who we ended up recruiting were English as second language speakers who spoke English with some accent related to their original language. But the other half were native English speakers who were raised in a variety of American dialects, including but not limited to African American Vernacular English, Chicano English, New York English, and several varieties of Southern English. All of these folks speak totally competently and well, just in a dialect other than MUSE. We asked them how they think others perceive their speech. And about two thirds of them noted that they feel noticeably judged by others based solely on their accent or dialect. We asked them to speak a little bit more about this experience and how it affects them. And we want to share some of those stories and thoughts with you here. Take it away, Scouts.

Speaker 1:

I think a lot of times people are judged on the way that they speak compared to other people.

Speaker 2:

People often have stereotypes and so it's stressful, the idea that the moment I let go and speak with my natural at home accent or dialect, I'm getting judged. People are making value judgements right off the bat.

Speaker 3:

I guess I'm kind of self-conscious about it. Or even the way I pronunciate words sometimes because I feel like people look at me like I'm not as educated or smart, even though I know that I am.

Speaker 4:

There's always this stigma, they think you're a redneck or a hick, or there's all these labels that go with that dialect.

Speaker 5:

Growing up I definitely got picked on a lot for it and it really did affect my self-esteem when I was younger. But now that I'm older, I'm doing my best to be true to myself and my family.

Speaker 6:

I guess that it's something that I'm proud of, that I see it as pretty much where I came from so it's something that's part of me. It's not something that I'm able to get rid of.

Speaker 7:

People may think I'm not as smart as they think I am, but it doesn't stop me. That's just an accent. Doesn't mean I'm not intelligent. Doesn't mean I'm a fool.

Ben:

Thanks very much to our Scouts for sharing their experiences. If you'd like to start capturing empathy rich stories, or add an always on element to your UX research toolkit, check out dscout.com. And now let's get back to our conversation with Nicole.

Karen:

Welcome back, Dr. Holliday. Super excited to talk a little bit more in depth about methods and about things that might be applicable to our audience in the industry trying to do research where they're sitting. My first question for you is maybe a little bit noodly, but bear with me. One thing that's often on people's minds is this question of self-report versus actual behavior. So are people really doing what they say they're doing? How can we track people's behavior? And how does that line up or mismatch with what they say they think about something. I'm kind of curious about which methods you use to answer which questions and how you can really get right down to the bottom of people's behavior, perhaps even past their conscious opinions of their behavior.

Nicole Holliday:

Yeah. It's not easy. This is the constant issue with social sciences. I have a study that is really unusual and we basically called it ... it's called something fancy now, but when we were in draft, it was "How black does Obama sound now?" And that's literally what we asked the participants. So we had all of these manipulated clips of Obama. We messed with his pitch. This is work that I did with Dan [inaudible 00:26:29], the University of Pittsburgh.

Nicole Holliday:

So we very carefully manipulated Obama saying these short phrases to make the pitch higher and lower in some places in a way that we thought would be more characteristic of African American speech or of white speech. And we just asked people over and over again different phrases, how black does Obama sound now? When you do studies like this you always do a pilot and you send it to your friends and family. So I sent it to my mom and I was like, "Mom, can you do this study? Give me your feedback." And she got done with it. She's like, "This is the stupidest thing I've ever seen in my life. You're not going to learn anything."

Ben:

What was the response scale? Was it one to seven? Not at all black, very black-

Nicole Holliday:

Basically yes, that. And she's like, "You got a PhD for this?" Yeah. Thanks. So we ran the study and yeah, people made exactly the judgements that we thought they would. They really did judge him differently on the ones that we had manipulated to sound what we thought would be more characteristic of an African American intonation pattern. They said he did sound blacker. But I think I bring this up to a roundabout way to answer your question, Karen.

Nicole Holliday:

My mom thought that this was very stupid. She thought I'm not doing anything. What do you mean? I know how black he is. He can't sound more or less black in different clips, but what she was actually doing was judging him differently, even though she didn't have and [inaudible 00:27:55] linguistic awareness about that. And also not only her, other people when we asked for qualitative feedback said, "I didn't feel comfortable with this. This is a really weird question. I don't like to make racial judgments based on speech." And the truth is that people are making racial judgments based on speech. So regardless of whether you want to admit that you're doing that or not, you are doing that. So we have to be creative in our methods to get around the fact that people think this is not a polite thing to talk about.

Ben:

In the creation of research is there anything you might share to keep our implicit biases or keep the idea of bias top of mind?

Nicole Holliday:

I mean, I do not have a PhD in survey design. That is a whole thing. And the reason I know that is because when I was in grad school, I worked in the Census Bureau for a little bit and my job was largely testing translation efficacy. So if we translate this word this way in Spanish versus this other way in Spanish, then do we get more reliable responses or the responses we're looking for, this kind of thing.

Nicole Holliday:

And I was working with people who had degrees in survey design. So I saw the way that we think about this. I mean, I think that the number one thing is to be really honest about what our biases are going in. We always do a project expecting that we're going to find something one way or another, or even if we're not explicitly doing hypothesis testing, we have a feeling. You should be honest about that with yourself and with your collaborators or whatever. And maybe even in your write up, we expected to find this, we actually found this other thing.

Nicole Holliday:

This is one reason that I also really prefer to do quantitative as opposed to qualitative research for myself, because I feel like it keeps me more honest. I have trouble not being biased. So maybe I'm not a person that should do extremely qualitative research. In sociolinguistics you're always getting a little bit of both because people categories are messy. We were talking, Karen mentioned, I had worked with biracial people. Nobody had ever studied biracial people explicitly before in a linguistic study and okay, I'm biracial so I truly had my own personal interest. But part of the reason that folks were not interested in studying these folks before is because they're hard to ... what's going on with them? They don't fit into a binary the way that we have historically talked or thought about race. How do we even deal with the messiness of that category? Well, it's got to be a little qualitative. I can't just say, well, on a scale of white to black, here's where you are. It doesn't work that way. These are whole people.

Karen:

Right? Yeah.

Nicole Holliday:

So in terms of addressing our own biases, that work was hard for me because I have my own identity in the way that I've moved through the world and I had to hear about people that had done things very differently. Sometimes I agreed, sometimes I didn't agree. But when I wrote up that research, I had to say, look, this is who I am too, my positionality and what I thought I would find and things like that. So I think transparency is one piece of it.

Nicole Holliday:

Also, if you can design your study or your project to get farther away from your question or to make sure that even if there is bias, people can't really enact it. So this is one thing that I really like about studying tone and intonation. People are exposed to such short samples in the studies that I do typically that they can't be too biased about the content, for example. So the words, they're getting one word at a time so as long as they're not saying something that's particularly racially charged or associated with young people or whatever, they're just saying common words, then that is one way to get out of that. So I'm just asking them to judge on tone. Well, the question then becomes narrow enough that even if the listeners have some bias, it can't really be introduced in that type of scenario because they don't have enough information to start creating a story in their mind about what's going on with this person.

Karen:

I love the idea of introducing greater transparency into reporting, both design and reporting. But I'm thinking too because not only are we introducing our own biases, but often we have very explicit ... people in the industry might have very explicit biases coming externally from various stakeholders who not only have implicitly, but have explicitly said we are expecting to find this.

Ben:

We need to ship this in two weeks.

Karen:

Or we need you to find this, which is a very interesting extra piece of the puzzle. Not that I expect you to have an answer, but it's just making me reflect on how even just introducing that kind of transparency in the reporting might go a long way to be like, look, these are the stakeholders, this is what everybody expected to see. We saw this or we didn't, but expanding the story of the research a little bit to that beginning stage: what is everybody expecting, and why is this happening?

Nicole Holliday:

Yeah. I mean, this is one thing that I guess I'm really privileged to be an academic. Nobody's asking me for deliverables. It's all on my own account or whatever. So I think that is hard when you're experiencing external pressure from a funder or from your boss or whatever it is to be like, okay, I'm ethical, but I also have to have this deliverable. So it's something to manage, but if you're in a place where people are committed to doing the research well then even if you find something a little different than what you intended, I think it's still always useful. Really what research is is just gathering information.

Nicole Holliday:

So even if it goes off the rails a little bit, it's all right. And this is one thing that I really like about linguistics compared to other fields and specifically sociolinguistics. There's not really such a thing as a null result because we're always just describing what people are doing. So it's like, well, we thought they would be using this intonational pattern, but it turns out they're not. Well, that's still an interesting thing. Why aren't they then? Let's see what we can take from what we know, the social information, things like that to figure out the puzzle.

Ben:

Earlier you referenced automatic speech recognition and closed captioning as an example where bias of training data can hinder and limit folks' ability to connect. What would you like to see in a world wherein, Karen and I can speak firsthand, every second or third client or partner we're working with is trying to automate, inject machine learning, streamline language? They want to make chatbots easier to use for folks so that they don't have to talk to a real human. And again, I know you're not a data ethicist in the traditional sense, but you do think about these automatic systems vis-à-vis language. What would you like to see so that we could be including more folks and reducing harm and exclusion for others?

Nicole Holliday:

Yeah, I mean, do I think we should all hire linguists? Yes. Everybody should hire-

Ben:

There we go. There it is.

Nicole Holliday:

My self-interested, my not ethical response. I think that things have improved in the training data really rapidly for English speakers over time. So in one of the studies that I was working on, I'll just say this, YouTube has much better auto captioning than almost any other platform that we've seen consistently. And part of the reason is that YouTube has so much data from so long and also from all kinds of genres. Think about the diversity of what people put on YouTube. It's kids doing gaming, it's literally people putting videos of their small children. It's old people, it's people doing formal speeches. It's all kinds of stuff. That's how language actually works. That's how humans process language. We're used to this stylistic variation, we're used to, oh, this human is a different kind of human than this other one and so I'm going to interpret them differently.

Nicole Holliday:

We're trying to get the machines to do this. And it's really, really hard because we learn to do this over the course of our lives. And we have brains that work in ways that we don't even totally understand that allow us to make these adjustments. And part of the issue is we don't understand everything about cognition. So how are we going to program the computer to do this? But I think being conscientious about the historical biases in ASR, for example, is a lot of the battle. Improving your training data, bringing in people who can point out, people like linguists or other social scientists who can point out, "Hey, this might be an issue because of what this training data looks like or we need to update it this way, or we need it to be sensitive in this other way."

Nicole Holliday:

In a project that I was working on with the students at the University of Pennsylvania, we looked at Otter AI transcription, which runs on the back of Zoom. And the first thing we looked at was, is there bias? Yeah, there's bias against L2 or second language English speakers, which was not surprising. And then we said, well, is the bias consistent based on the person's first language? Can we actually look at the linguistic features where the ASR seems to be going wrong and then suggest how to fix that? So for example, if you have a speaker of a language that only has three vowels, which some languages only have three vowels, we know that their pronunciation of the 12, 13, 15, 18, however many vowels there are in English, depends on your variety. There are more, there are at least 13.

Karen:

There are not only five, folks, you heard it-

Nicole Holliday:

Yeah, there's a ton of sounds. That's why you have to learn the crazy short I. And we don't use that terminology, but when you do Hooked on Phonics or whatever. So, okay, you have a speaker that has a three vowel system now speaking a language with a 14 vowel system or whatever. Well, their E is not going to be in the same place as an English speaker's E, their A's not going to be in the same place. And so what happens with the ASR is it can't tell the difference between peach and pitch. But if you tell the system, "Hey, this person has a very dispersed pronunciation of I that could be anywhere, look more deeply, not only at the acoustic model which is telling you which vowel it's going to be, but also at the language model, which tells you which words should be there for this kind of speaker, because their pronunciation might be more variable." Well, then we can start to fix that. You know you're dealing with a speaker who's likely to have this kind of mismatch between the training data. Well, build it in.
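
To make the peach/pitch point concrete, here is a minimal sketch of leaning more on the language model when a speaker's vowels are known to be variable. It is not taken from the episode or from Dr. Holliday's study: the function name `pick_word`, the `lm_weight` parameter, and all of the probabilities are hypothetical, and real ASR systems combine acoustic and language-model scores in more elaborate ways.

```python
import math

def pick_word(acoustic_probs, lm_probs, lm_weight):
    """Pick the candidate word with the best weighted log score.

    lm_weight is between 0 and 1: higher values trust the language
    model (context) more and the acoustic model (vowel quality) less.
    """
    def combined_score(word):
        return ((1.0 - lm_weight) * math.log(acoustic_probs[word])
                + lm_weight * math.log(lm_probs[word]))
    return max(acoustic_probs, key=combined_score)

# Hypothetical scores: the acoustic model hears the vowel as closer to
# "pitch", while the surrounding words make "peach" far more likely.
acoustic = {"pitch": 0.7, "peach": 0.3}
language_model = {"pitch": 0.2, "peach": 0.8}

# Default weighting trusts the acoustics and gets the word wrong.
default_guess = pick_word(acoustic, language_model, lm_weight=0.1)

# For a speaker flagged as having dispersed vowels, lean on context.
adjusted_guess = pick_word(acoustic, language_model, lm_weight=0.6)

print(default_guess, adjusted_guess)
```

With these made-up numbers, the default weighting picks "pitch" and the adjusted weighting picks "peach", which is the kind of correction described above.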

Ben:

I'm struck by just how much context you're advocating for, rightfully so.

Karen:

What also really strikes me is the importance of research. There's a particular kind of evaluative research here, or perhaps even a generative style of research that is thinking about segments that we might not even know ... people who are not trained in this might not even think, oh, these are two different segments, people who only use a three vowel system versus people who use a different number of vowels. To your point, it's not even just English as second language speakers. It might be what language were you speaking first? Or what is your pronunciation system? And backing up, it sounds like some people might not even be considering that ESL speakers are a segment in and of themselves that need to be studied separately, evaluated separately from MUSE, mainstream US English speakers, or English as first language speakers. And it's just making me really reflect on the importance of doing research on the front end to really try to understand your user base or the audience that you're trying to speak to and look for these differences that might be really significantly impacting the use of a tool that you might not be cognizantly aware of at the outset.

Nicole Holliday:

And I know probably all of the CS folks who are out here: "Oh, well, she thinks it's so easy to just add all this context, but it's so hard because the data's so big and da, da, da, da." Yeah, it is. But you know what you could also do if you are doing an ASR? How about you ask the speaker what their first language is, and then you have a model that's for that? I'm asking for a one question survey and obviously it's not going to be for every language in the world, but there's a lot of people whose first language is Spanish. We could start there. Or Mandarin or these major ones, which there's just ... the ASR systems we've seen usually do so bad with Mandarin speakers speaking English. There's so many of them, that's ridiculous.
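
As a rough illustration of that one-question survey, the sketch below routes a speaker to a model based on their stated first language and falls back to a general model otherwise. Everything here is an assumption for illustration: the model names, the `MODELS_BY_FIRST_LANGUAGE` table, and the `choose_model` helper are hypothetical and do not describe any real ASR product.

```python
# Hypothetical model identifiers; start with a few major first languages.
MODELS_BY_FIRST_LANGUAGE = {
    "spanish": "asr-english-l1-spanish",
    "mandarin": "asr-english-l1-mandarin",
}
DEFAULT_MODEL = "asr-english-general"

def choose_model(first_language):
    """Return the model name for the speaker's answer to the survey.

    first_language is the free-text answer, or None if they skipped it.
    Unknown or missing answers fall back to the general model.
    """
    if not first_language:
        return DEFAULT_MODEL
    key = first_language.strip().lower()
    return MODELS_BY_FIRST_LANGUAGE.get(key, DEFAULT_MODEL)

print(choose_model("Mandarin"))
print(choose_model("Quechua"))
```

The fallback matters: the table will never cover every language, so a speaker whose first language is not listed still gets the general model rather than an error.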

Karen:

It's [inaudible 00:41:22] languages in the world.

Nicole Holliday:

Half a billion people that are speaking some English here or there so that's a big market.

Ben:

I'm hoping we can get you out on something a little lighter in fare. And that is, this is a podcast called People Nerds. What are you nerdy about aside from intonational nuances in the contours of language, though nerdy and super interesting in its own. What else are you a people nerd about?

Nicole Holliday:

I am a trivia person, a real, real trivia person, which is insane because it's just like I'm really good at retaining information I don't typically need. If I need it, I have to study. If I don't need it, it's just going to be floating around in my mind. But I think there's an advantage there in academic spaces. I had the fortune to be in Europe in the summer. And in Portugal they use cha for tea. But in Spain, for example, it's te, it's tea, and almost all the languages in the world either use te or cha. And it's because they're both Chinese words. So this is trivia and linguistics.

Ben:

Perfect.

Nicole Holliday:

And it depends on how they came to tea. So if it came via land route, or if it came via sea route. So I was on a tour in Portugal and we had tea and I was like, "Let me tell you a fun fact about why it's called cha here, because ...", and I'm that person for sure.

Ben:

Oh, that's great.

Nicole Holliday:

Not only about language stuff, but it's just really fun to learn things and know things and share things. And it's great. So that's the other thing I would say I'm kind of a nerd on.

Ben:

That's fantastic. That's fantastic. The ringer for your next trivia. We really, really appreciate some of your time. Thank you so much, Dr. Holliday.

Nicole Holliday:

Of course, it's been so fun.

Karen:

Wow. Thank you so much to Dr. Nicole Holliday for being on our podcast this week. We have dropped all of the papers she mentioned in the link below, in case you want to explore them. Thanks, Ben, for compiling that.

Ben:

You got it, you got it.

Karen:

And Nicole also has a podcast of her own that she ran called Spectacular Vernacular, along with Wall Street Journal language columnist Ben Zimmer. In it they've discussed the ways that language is changing. They talk to scholars and writers and they set and solve word puzzles. You can also find Dr. Holliday at her website linked below and at her Twitter with the handle @mixedlinguist.

Ben:

And if you like what you hear, please subscribe. We'd also love your feedback via a review. For more resources on human-centered research, including how-tos and breakdowns, check out peoplenerds.com. You can also find us on your favorite social platform with the handle @dscout. And if you're curious about a platform to start supporting your mixed methods research practice, be sure to check out dscout.com.

Karen:

And tune in down the road for more interesting conversations and food for thought from outside the borders of UX. Thanks again for listening. See you next time, nerds.

Ben:

Bye, nerds.