Across Acoustics

Iconicity and Sound Symbolism

ASA Publications' Office

For a long time, it was believed that words were mostly arbitrary symbols. However, with advances in our ability to study speech, it has become clear that we must reconsider the fundamental relationship between words' sounds and their meanings. In this episode, we talk to two of the editors of the Special Issue on Iconicity and Sound Symbolism, Aleksandra Ćwiek and Susanne Fuchs (Leibniz-Centre General Linguistics), about research in the issue that examined these connections. 


Read all the articles from the special issue here!


Read more from The Journal of the Acoustical Society of America (JASA).
Learn more about Acoustical Society of America Publications.

Music Credit: Min 2019 by minwbu from Pixabay. 

ASA Publications (00:24)

Today we're going to discuss a recent special issue from JASA, the Special Issue on Iconicity and Sound Symbolism, which aims to shed some light on the relationship between words' sounds and their meanings. With me are two of the editors of the special issue, Aleksandra Ćwiek and Susanne Fuchs. Thanks for taking the time to speak with me today. How are you?

 

Aleksandra (00:43)

Hi. Thank you. Yes, very well. Excited to do this together.

 

ASA Publications (00:48)

I'm very excited, too! I've been fascinated by this issue since we started publishing it. So tell us a bit about your research backgrounds.

 

Aleksandra (00:57)

I currently work in the Laboratory Phonology Group at the Leibniz-Centre General Linguistics in Berlin. I've worked on iconicity since 2018; I did my PhD under the supervision of Susanne on this exact topic and completed it in 2022. I'm also interested in studying multimodality, so how communication combines various senses, like speech, movement, touch, and possibly other senses. And I'm interested in the question of how human language emerged in the first place.

 

Susanne (01:36)

Yes, and I'm Susanne Fuchs. I'm a speech scientist also working at ZAS, which is a research institution in Berlin that tries to improve our understanding of the biological, cognitive, and social factors involved in language. Within the institute, I'm one of the senior scientists specializing in the biological mechanisms involved in language. This involves studying the motions of the articulators, gestures, breathing, and so on, and the coordination of all these actions.

 

ASA Publications (02:12)

Cool, so that's where you get all those like cool tongue MRI images or whatever that we see sometimes in articles?

 

Susanne (02:18)

We don't work with MRI, but we work a lot with different techniques: measuring breathing using respiratory plethysmography, or electromagnetic articulography, or electropalatography, which measures tongue-palate contact, and so on. So we have a lab for this here, and a motion capture lab with OptiTrack. Yeah, it's cool.

 

ASA Publications (02:39)

Okay, okay. So I guess the first question is, what are iconicity and sound symbolism?

 

Susanne (02:44)

So let me step back a little bit. In linguistics, signs are mostly, but not entirely, arbitrary. That's different from sound symbolism or iconicity. So, for example, the word whale is actually a rather small word for something really, really big, while microorganism is a big word for something very small. And there's nothing canine about the word dog; this is what Hockett, one of the researchers in this area, said. So people have assumed for a long, long time that in linguistics, words and meanings, concepts and so on, are arbitrary: the form of a word has nothing to do with its meaning.

 

Aleksandra (03:37)

Yeah, so iconicity, very briefly and very simply said, is the resemblance between form and meaning. And it can be perceived, but it can also be produced; it can be either one or the other, or both at the same time. So if I mean something iconically, but you don't necessarily perceive it as iconic, that doesn't diminish the iconicity; in that case, it's iconic only for me.

 

But whatever the expression is, whether it's an acoustic expression, a word, a sound, an image, it has to have this resemblance between how it is, its form, and what it refers to, its meaning. So this is iconicity in a very simple nutshell. People have written many articles about the definition, so we're just boiling it down to a simple one here. And sound symbolism is a special kind of iconicity that refers specifically to sounds. So when we focus on speech sounds, sound symbolism refers to how these speech sounds, by virtue of what they are, can refer to a certain meaning, or a certain part of a whole meaning. It doesn't have to be the whole meaning of one thing.

 

ASA Publications (04:56)

Okay. Okay, okay.

 

Aleksandra (05:00)

To give you an idea of how iconicity works in sound, because here in a podcast we use the sound modality. If we had a video, we could also use the visual modality. But in this acoustic environment: for example, when I say "mil" or "mal," which of these names would be more fitting for a big table? And I couldn't resist using another iconic term. Well, there were studies showing that "mil" is more fitting for a small table and "mal" for a big table, just because of how "eee" and "ah" oppose one another. "Eee" has, just by the inherent virtue of how it is produced, a slightly higher intrinsic fundamental frequency than "ah." And so it feels better to refer to smaller things, or things that are closer, with an "eee," while things that are further away feel more like "ah."

 

And then we can just use the tone of the voice. So when I was saying "a big table," I lowered my voice. This is also an example of iconicity. So this is a big, "I got a big raise." But this is just a small, a small problem. Just a small problem. And the same when you're talking to a baby: you want to sound friendly, so you make your voice higher. Like, "Oh, you're a little cutie." "How nice, little cute girl, boy?" "What a nice puppy." And this evokes, again, an impression of smallness, friendliness; you're not threatening. So there are different levels: the level of just the sound, then the properties of that sound, whether it's high or low in frequency.

And then we can have it as a metaphor of friendliness or threat. So if I want to be dominating, I will use a low voice: "I'm a very dominating person, actually." Then, "I'm bossy, a bossy girl, a bossy person."

 

Susanne (07:16)

And there are other stereotypes, of creaky voice, for instance. So if I use this kind of creak, you know, I also want to show that I'm particularly eloquent or particularly dominant, that I'm not a child in the situation who wants to play; rather, I show that I'm somebody.

 

Aleksandra (07:41)

Yeah, so with creaky voice, for example, we see this. Frank Zappa even wrote a song about it, "Valley Girl," because it already started decades ago, in the Valley, with women, maybe because of the societal role women have been put into. So in order to fight that, it's like, "I'm not a little girl. I can also be a boss. I can be someone." Then they try to use this metaphor of dominance in the voice: "I'm gonna speak like this. I'm gonna have a really low voice." But there is a level where it actually starts to sound ridiculous. Really, there is, because then I'm trying to do one thing, but the other thing is how you perceive it.

So one thing that I learned in my PhD from Susanne is: look at the production, but also look at the perception. And there are sociolinguistic studies showing that the perception of this creaky voice in particular may not be one of dominance. It may be… it's just fake. So we have to look at both sides. But essentially this also comes down to iconic phenomena, to what iconicity can manifest and can give us in a voice.

 

ASA Publications (09:04)

Yeah, interesting, interesting. And so in the production sense too, going back to the "eee" versus "ah": when we were prepping for this, and maybe you're going to bring this up later, but you were talking about how "eee" is a smaller shape in your mouth when you're making that sound and "ah" is a larger mouth shape. And those sort of tie together, with "eee" relating to smaller things and "ah" relating to larger things sometimes. Did I understand that correctly? Or...

 

Susanne (09:31)

Yeah, I mean, you find it at the acoustic level, as Aleksandra already said: you see an intrinsic difference in the fundamental frequency between "eee" and "ah." But on the other hand, you also have articulatory correlates for that, because when you produce an "ah" you lower your jaw and your whole vocal tract becomes more widely open, in comparison to an "eee," which is more constricted. So an "eee" has a rather closed jaw position and a high tongue, and that overall corresponds to a rather small vocal tract, or leads to a smaller vocal tract area function. So you have both the acoustic and the articulatory domain, and they are in relation to each other. We'll come back to this later on.

 

Aleksandra (10:23)

Yes, we actually have two papers in the special issue that talk about exactly this: whether it's acoustics or articulation that drives the iconic phenomena. But we can see both, as you mentioned: on the one hand, the fundamental frequency, which would be more on the acoustic side, and then on the articulatory side, the constriction, or also the sensorimotor feeling of more tongue occupying the space versus less tongue because the jaw is lower. So, yeah, we don't know for sure, even after the special issue.

 

ASA Publications (11:04)

So this is probably a good segue into our next question. What kind of connections can arise between a word’s sound and its meaning?

 

Susanne (11:10)

So the bouba-kiki effect is actually well known and goes back to some earlier studies from the early 20th century on "maluma" and "takete." Researchers investigated the cross-modal correspondence between the visual shape of an object, an object depicted almost like a flower with round shapes, in comparison to very spiky shapes, and the two words "bouba" and "kiki." People were asked to relate the words to certain shapes, so to match the auditory components of "bouba" or "kiki" to the visual shape of a round or a spiky object. In the early days the words were presented in writing, not produced in the acoustic modality as spoken words, but later on that has been done too. Most people, even across different languages and continents, would associate a round shape with "bouba" and a spiky shape with "kiki." So there's a kind of cross-modal correspondence between the auditory modality and the visual modality, and that has become the famous bouba-kiki effect, which has been investigated in many, many different cultures and cross-linguistically.

 

ASA Publications (12:51)

Can you give us some background on this field of research prior to this special issue?

 

Aleksandra (12:55)

Yeah, so this question has actually been interesting to us humans way, way back. Since Plato, actually, we have the first written evidence of this question: where do words, their sounds, and their meanings come from? Are they naturally motivated? One of Plato's dialogues speaks of exactly this problem: do names for things, for people, have a natural motivation? And, you know, as humans we developed over the ages, had different philosophical streams, and those had different influences on how we thought about nature, about what we can and cannot feel, about whether we are made of our thoughts or not.

 

But leaving that aside and focusing on linguistics: it was really focused on orthography. We already had phonetics, but we didn't have the amount of technology that we have right now, where I can record myself on my phone and analyze the audio, or have ChatGPT generate really perfect speech. That is fairly new. A hundred years ago, we didn't have many of the devices that we have now. We were focused on orthography, on grammar, maybe spoken language at most, which we were writing down with letters. And then the International Phonetic Alphabet was introduced, but it was still a new development, and it was often orthography-based.

 

And as Susanne mentioned, the bouba-kiki effect, and the mil-mal effect that I mentioned earlier, are early 20th-century developments, coming also from behavioral science and Gestalt theory, where people started showing that there is some inherent motivation to how we perceive sounds and the names containing those sounds that are given to certain objects or shapes. But this was the 1920s, and at the same time came Ferdinand de Saussure and, as Susanne mentioned, the arbitrariness of language. So there was a competing philosophy of how to perceive language. On the one hand, there were people saying, "Wow, bouba, kiki: people from very different places on earth feel the same way about shape and sound." And then there were other people saying, "No, the linguistic sign is arbitrary. There is no motivation." And then it really took off. Then came the structuralist and generativist paradigms, which were focused again on orthography and on how language is shaped as a tool. I'm not a generativist or structuralist myself, so I cannot say too much about the complexities of these philosophies. But they emphasized discrete symbolic rules; iconic links were accidental there.

So then, in the 1960s and 1970s, and especially in 1960, Stokoe published a paper on sign languages. That was really the first time someone said a sign language is a real language, and not just random people gesturing. And in the 1970s, gesture studies started appearing. And it's easier for us to buy that a gesture resembles the object, yeah?

We can speak with our bodies; we can show something with our bodies. And I think this was really helpful to start discovering the iconic mappings. With time, and it's now 50 years since gesture studies was born, it started to really push these iconic links into the mindset of linguists, of people doing gesture linguistics and sign language linguistics, who started saying, "Wow, it really is there." Rediscovering those studies about bouba-kiki, rediscovering the studies about mil and mal, rediscovering Plato. And I think since the 90s until today, new instruments and new methods have kept appearing. We have corpora. We have cameras with very high frame rates, we have our recording devices, and through the internet we can reach a lot of people and see: wow, there are these links that are not random. They are based on resemblance, really, across many minds on earth. And for this special issue we were motivated by this increasing interest, and by the fact that iconicity studies has established itself, even among people who perceive linguistics more traditionally, who come from the more traditional generativist, structuralist theories, and who are starting to say, "Wow, there is really something to it." And we're really happy to live in these times and work with people from various philosophies on the same topic, one that can really unite us as scientists, as linguists, from semantics, pragmatics, phonetics, gesture studies, and all the other linguistic levels.

 

ASA Publications (18:31)

Yeah, yeah, it's very exciting.

 

Aleksandra (18:33)

Sorry for the long rant.

 

ASA Publications (18:34)

No, that's totally fine. Totally fine. So sort of building off of that, in your introduction to the special issue, you assert that the special issue underscores a shift in how we understand these phenomena. So what's that shift?

 

Susanne (18:46)

Yes, actually, it's not just one; I think there are several shifts in this domain. One might be, as Aleksandra already mentioned, that iconicity research has increased tremendously. While this topic has traditionally been related to language acquisition or language evolution, iconicity is now considered a frequent property of language, even nowadays. Our world is full of iconicity, even if we speak different languages and so on.

Here are a few examples showing that we find this even nowadays, that it's not something which happened a million years ago when language developed, but something recent too. For instance, imagine social media exchanges. We looked into certain corpora and found that social media posts, which are written, sometimes have features of prosody which are used in the spoken modality. So if you want to emphasize something, or make something very extreme, for example a word like long, an adjective meaning something that takes time, it is then written with several letter replications, right? You make it looooooooong. So this is something which has been invented nowadays, due to social media, as a creative way to express or describe meaning. That's an example from recent years, if you like. We can also see iconicity nowadays in other ways: imagine you walk through a park, or you're running in the morning, and you hear a dog barking. Immediately, when you perceive that, you will know whether it's a big or a small dog, whether it's dangerous or not. There's also some iconicity involved. That's another example, yeah.

 

But I wanted to talk about the shifts. What I wanted to point out here is that traditionally, iconicity has often been associated with things which happened in the past. But I think nowadays it's more established that iconicity is something which happens in our daily life and in communication in general. It's not something ancient.

Another point I want to make is that there has been more and more awareness in the field that communication is multimodal. Right? It's not only sounds and speech, which are audible; we also speak with our whole body. We raise our eyebrows if we disagree or want to raise a question, we nod our heads when we backchannel, we say, "Hmm, yeah?" or "Continue to talk," or when we agree or disagree. We do all kinds of postural sways, we do gestures and so on, and they all can potentially contribute to the meaning of what is expressed. And with gesture research, and particularly with sign language, iconicity research has had a kind of renaissance, has been born again.

 

And something else was important, something by which we wanted our special issue to move the field forward: emphasizing that we should not look only into written language and, based on that, consider the phonology and the phonological features of sound symbolism. We should actually also investigate the underlying acoustic and articulatory properties and their perception, because these properties may be more gradual than phonological features. Phonological features in most traditions are binary: a feature is there or not. Whereas if you look into the actual acoustic production, you find something more gradual; you have a certain frequency range where a sound is produced, and so on.

 

So I think that's important, or we think that's important. We've gained a lot of knowledge thanks to fieldworkers going out and documenting endangered languages, which is a very important area. And they often wrote down the sounds of other languages in terms of the IPA, so they provided phonological descriptions of these languages. But there are different databases of the sounds of the world's languages, and if we look into them, there is sometimes disagreement. If you look into the UPSID database, which was created by Ian Maddieson and others, and compare it to the phonemic inventories of the International Phonetic Association, the IPA, sometimes there are mismatches: some phoneme inventories of the same languages are described with a different number of phonemes. Right? Because fieldworkers may go to a certain area and be, to some extent, biased by their own language, by their own writing system, and so on. And we think we would move the field forward if we actually made recordings and then did a more data-driven analysis, describing what we hear on the basis of the acoustics or on the basis of articulation. So that was a proposal we made in the special issue, and we think this is the future.

 

ASA Publications (24:36)

Right, right, right. So basically, it's a way not to describe other languages in terms of your own language, but to look at a language as it is in itself, and at its sounds as they inherently are.

 

Susanne (25:05)

Yes, I mean, field workers often try their best. They do an amazing job, right? I don't want to undermine their role. 

 

ASA Publications (25:10)

Yeah. Right, right.

 

Susanne (25:12)

And it's also not easy to get acoustic or articulatory data across many continents, right? That would really require a lot of funding and connections and so on. So getting all these data was an initial step and led to some very fascinating research, but it can be affected by the potential biases of fieldworkers, or of the people who describe languages. And, you know, getting additional acoustic and articulatory data would be great.

 

ASA Publications (25:51)

Right, okay. To enhance everything then, basically. 

 

Susanne (25:54)

Yeah. Yeah.

 

ASA Publications (25:55)

Yeah, got it. So a couple of the articles in this issue looked at how humans perceive iconic relationships between sound and meaning. What did these pieces end up showing?

 

 

Aleksandra (26:06)

Well, as we mentioned earlier, we are really blessed with two complementary pathways for those sound-meaning connections that we describe as iconic. We have, on the one hand, a paper by Bodo Winter, and then another one by Mutsumi Imai and colleagues. And Bodo describes an acoustics-based perception of iconicity. According to him, it's acoustics-based; articulation doesn't play a role. So for bouba-kiki, for example, you might think it matters how the sounds that make up "bouba" and "kiki" are articulated. "B" is a voiced obstruent, a voiced sound. "Ooo" is a round vowel, articulated with a certain roundness to it, so of course it should refer to a round object. And in "kiki," "k" is a voiceless obstruent, and "eee" is a close, unrounded front vowel, so of course it feels more spiky, because it's articulated more "spiky," one would say. But Bodo says: actually, articulation doesn't matter so much here, because acoustics plays the biggest role. If I can perceive it through the acoustic image as round or spiky, that is what creates the connection. Because consider people who do not produce, or cannot produce, the "rrrr," because it's a tough sound: under an articulatory account they couldn't have the connection, since they cannot articulate it, right? But they still have the connection that "rrrr" feels rough, which we will talk about later. And that was basically his argument: we have this feedback loop, we can hear everything we say anyway, so it's acoustics-based. I'm boiling it down very simply. Of course, Bodo has a paper of many, many pages, and I encourage everybody to read all the papers from the special issue, because here I am radically reducing the arguments to the way they struck me, and I hope I do justice to the authors. But this is how I understand it, also from my conversations with Bodo: it's really the environmental correlations, these perceptual correlations with the acoustics, that make it happen.

 

Susanne (28:44)

So the articulation-based account is not really the opposite, but the study by Imai and colleagues says that articulation matters, that articulatory movements matter. And they have a very interesting experimental paradigm, because they studied participants who were deaf or hard of hearing. So their auditory modality is, to some extent, limited, and they are still able to detect shape sound symbolism comparably to hearing participants. That's fascinating. Their title was a little bit provocative: "Sound symbolism without sound." Right? So how should that work? It may be that different modalities can be used: if one is not so available, maybe another one, somatosensory feedback and so on, can be used to detect sound symbolism, if you like.

 

ASA Publications (29:49)

Okay.

 

Susanne (29:50)

There was another experiment in which they actually perturbed the articulation of the participants: they put a spoon in the mouth. And that was the case where the hard-of-hearing and deaf participants performed less well than the hearing participants, and for them that was evidence that articulation matters, because articulation was blocked by the spoon. So that had a severe effect on the output. And that's their argument that articulation matters. But it doesn't speak against acoustics; that was also clear. So I wouldn't see these as completely contrary evidence. I think it's more complementary, because Bodo's paper focuses on acoustics and says that acoustic properties are sufficient, while in Imai and colleagues, some of the participants had a reduced auditory modality, and there articulation plays a role.

 

So the discussion about acoustic versus articulatory targets is a very old one. It goes back to discussions between people at MIT and at Haskins, where people at Haskins were in favor of articulatory targets, while people at MIT often looked for invariant acoustic features. They said, we hear the acoustics, and they tried to find acoustic invariance, while at Haskins they said the underlying representations are articulatory gestures, our movements. Maybe that goes too far, but that kind of debate has gone on since the 60s or 70s of the last century. So it's a very old discussion, with many different fights, and the motor theory, and then the question of speech acquisition: children have a very different vocal tract, they produce sounds in higher, different frequency ranges. How should they imitate adults, having such different vocal tracts? I mean, well, yeah, that's an old discussion.

 

Aleksandra (32:25)

Sometimes you imagine those scientists fighting at conferences and throwing things at each other like football fans. That's why I love that the two papers are featured within one special issue: they show two pathways of accessing the same information, the iconic information, and this information is essential, this information is important. As Susanne said, it was important maybe for our ancestors, but for us nowadays it's just as important, and we have different pathways of accessing it, depending on how our perception system works.

 

ASA Publications (33:08)

Right. It makes it even more impactful that way, because it's like, okay, it's not just one pathway, it's multiple pathways that underlie this effect. Right.

 

Aleksandra (33:17)

Yes, all roads lead to Rome. All pathways lead to iconicity. Maybe a very bold claim, but having those two pathways makes it so robust, in a sense. So I'm very, very grateful that we are opening the discussion of the special issue by discussing those two papers, and that both teams, both Bodo's and Mutsumi Imai and colleagues', teamed up in this special issue. And yeah, it's just amazing. I'm very grateful.

 

ASA Publications (33:55)

Yeah. Well, let's talk about some more articles from the special issue. So some of the other studies demonstrate how acoustic features serve as conduits for meaning. How so?

 

Aleksandra (34:05)

So, you know, we mentioned bouba-kiki earlier, a few times by now actually; it might come up a few more times, so please don't mind me. In the study I was very privileged to lead with a team of researchers, we found something even stronger than bouba-kiki. We played the sound "rrr," like in the Spanish "perro," right, but a little bit longer, to speakers of 28 languages, from Japanese to Polish, German, Zulu, and Palikur, so we had many different speakers and languages. And we found an astounding matching performance of about 90%, even over 90%, where people matched the sound "rrr" to a jagged line rather than to a smooth line. The smooth line they matched with an "lll." So "rrrr" is rough and "lll" is flat, in a sense. And this really beats the famous bouba-kiki effect by at least 15% in matching accuracy.

 

So... why is it interesting, and why is it iconic? Well, because we see both articulatory and acoustic iconic properties that allow us to trace the behavior of "rrrr" to a rough texture or a jagged line. When "rrrr" is produced, the tongue vibrates very rapidly: the tip of the tongue taps the alveolar ridge and then falls back, creating an oscillation, which is in turn reflected in the acoustic image; it creates a kind of little saw. And this is exactly the shape we were asking people to match it with. So it really is an interesting iconic relationship. And even people who spoke languages that don't make the distinction between "rrr" and "lll," our two sounds in question, matched them just as well as people whose languages do distinguish between those two sounds. The difference between "rrr" and "lll" in Japanese doesn't distinguish between different meanings: if I say "rime" and "lime," it's just the same word. In German, these would be two different words, two different meanings. I didn't prepare an English minimal pair with "rrr"... light and right. Light and right. In English, they are two words. So this "lll"-"rrr" distinction is meaningful in English, but not in Japanese, and not in Korean, for example. But still, speakers of those languages were just as sensitive to this iconic relationship as speakers of English, for whom this is a meaningful difference in their linguistic system. This was very interesting for us to find. We also found something about speakers of languages like my native language, Polish. For us, this specific sound, "rrr," is the only "r" sound. In English, you have the "r," right? But in Scotland maybe they have the "rrr," or another one; there are various different "r" sounds across the world. In German you have three. You have "rrr" in Bavaria. In standard High German you use "R," a fricative sound made in the back of your throat. And then you can have a uvular trill; this is the French "rrr." I'm very bad at producing it, so I had to prepare. It's also one variant in German; maybe at the border with France, in Saarland, you will hear it. Susanne, is that true? Maybe. Yes. So Germans can produce any of those, and it doesn't make any difference for them. But in Polish we just have "rrr"... And in this study I was describing, we found that my compatriots are a little bit worse at this matching. And we were—

ASA Publications (38:25)

Interesting.

 

Aleksandra (38:26)

Yes, we were asking ourselves why. And we came up with the explanation that language is arbitrary, too, right? We have arbitrary form-meaning relationships, too; not everything is iconic. So if every day I use words with “rrrr” that don't only mean rough, that don't only mean a jagged texture, then my perception is dampened a little bit, just a little bit. It's still super high. It's still at 90%. But it's just a little bit worse. It's not at 95%; it's at 90%. Yes. It feels that our brains really make this connection automatically, regardless of the linguistic system that we are born with, that we are raised with, and it feels right in a way.

There was another study that we wanted to mention. It was a study, with Japanese, Korean, English, and Mandarin speakers, showing that sounds can be pluripotential. So one sound can mean multiple things, but these connections aren't random, so to say. They're still iconic; they're still motivated. It's a team led by Kimi Akita, and they showed that across languages, people associated “eee” with both small and bright things.

 

So the “eee” and smallness I mentioned a little bit earlier with the mil-mal effect, where “mil” is a small table and “mal” is a big table. They replicate this, essentially, but add another sensory perception on top: the perception of light. And if you think about it, this feels right, at least to me. “Eee” is bright; maybe “ah” is a little bit darker. And I don't know, maybe it also has to do with the light spectrum, I don't know.

 

ASA Publications (40:22)

Yeah, I was wondering about that, how the light frequencies compare to... yeah.

 

Aleksandra (40:28)

But yes, maybe, right? Because essentially light is also a wave, and speech is also a wave. So it also somehow iconically maps, and it makes sense and it feels right. So the sound “eee” has higher frequencies, and maybe the specific form and structure also. So the formants that are overlaid on top of the fundamental frequency, the ones that are inherent to “eee,” they also mimic smallness but also correspond to brightness. So this is what is meant by the pluripotentiality of sound symbolism that was shown.

 

Susanne (41:09)

And I think they also created novel words for “diamond.” So maybe the diamond itself is already related to light and brightness.

 

ASA Publications (41:18)

Interesting, interesting, okay.

 

Aleksandra (41:20)

We had another study here from our colleague Jody Kreiman, who was actually part of the editorial team. Jody, hi! She contributed to this special issue. She's an expert in voice quality, so I hope I will do justice to her expertise. As I said, you should just read her paper, everybody. She wrote a piece in which she described how we refer to voice quality and where those names may come from. Her piece suggests that certain acoustic features consistently mean the same thing across cultures because they are tied to survival. So it may be very important for us to hear and recognize certain characteristics in the voice, because a breathy voice may signal a very different arousal state than a rough voice. It may signal sickness or aggression or something. So it's very important for us to recognize it and to give it a name that is recognizable and traceable, one that we can map onto the acoustic features that have been used for signaling dominance, reproductive fitness, or emotional states for many, many years. And I mean, if you are in a conversation or a meeting and you hear someone's voice getting rougher when they are angry, even on Zoom, even when you just hear someone, or breathier when they're tired, then you're actually just picking up on the acoustic cues that were used in just the same way by our ancestors back in the day. We are perceiving and trying to find a name for those signals that might have been signaling dominance, or that this person is a potential mate, or that maybe they need help, maybe I should help them, they are important to our society. So these names that we give different voice qualities, driven by the biological state, may also be motivated iconically.

 

Susanne (43:17)

May I say something about Jody's paper as I understand it?

 

Aleksandra (43:19)

Yes.

 

Susanne (43:20)

So there is a real puzzle in voice quality research. And the puzzle is: we have descriptions of what a breathy voice is, or what a tense voice is, and so on. So we have these categories. But when it comes to the actual underlying acoustics, if we record the voices which we perceive as maybe breathy or tense, there is often no consistency in which features always belong to a breathy voice or a tense voice and so on. There are some consistencies, but also not really. So there's somehow, if you like, a kind of mismatch between perception and production, or acoustics. And Jody tries to solve this puzzle and says we need to make a distinction. One is a more biological principle which has very old roots, how we perceive the voice, and it may also go back to, you know, high and low pitch and stuff like that. And there's something more cultural, which may be more about individual differences or cultural differences in how something is perceived. So you have two mechanisms or two principles. One roots voice quality in evolution, in something which is biological and which is perceived among humans no matter where they are. And there are other voice quality features which are more culturally or socially evoked and perceived, if you like.

 

ASA Publications (45:13)

So this idea of cross-cultural linguistic similarities versus things that are specific to cultures goes right into our next topic: you mentioned some patterns appear across different languages and cultures. Can you describe some of these findings and explain why these cross-linguistic patterns are significant?

 

Aleksandra (45:35)

Yeah, so, this is personally my favorite thing about this research, because it shows how cultures are connected, in a way. In the earlier question, we started by talking about the “rrr” and roughness connection, but that was just one of the studies, of course; the special issue features many, many different languages. And we had a colleague, Niklas Erben Johansson, who analyzed 245 language families, and he found that iconic sounds are really like the VIPs of sounds, because they consistently appear closer to the beginnings of words than non-iconic sounds. And if we look at other research, sounds that come earlier in the word, because speech is linear and unfolds over time, give us additional information to rule out competitors, other words that the person possibly wants to say. So if already at the beginning of the word I have sounds that are giving away more information, then this is really the VIP treatment, because they enable listeners to identify the word faster. So it can possibly enhance processing. There's plenty of research on this, on morphology also, on lexical processing and so on, but what Niklas’s study shows is that there really is this relation: iconic sounds appear closer to the beginnings of words than non-iconic sounds.

 

And they are also related to prominence. Prominence is the part of the word or the sentence that stands out. This is the prominent part. For example, an accent. It allows us to differentiate between pairs of words like OB-ject and ob-JECT. We can do that because of prominence. Niklas also showed that if iconic sounds come up in the prominent place, the meaning of the word seems to make more sense. And the prominent part is also more important within words, because it is the one standing out, so we pay more attention to it.

 

And what was amazing is that he found these two things... It's a study across almost 250 language families, so a lot of different languages. And he showed this correlation: iconic sounds come earlier in words, and they also come up more often in prominent syllables. So this is what I meant by them getting the VIP treatment, because they appear in the most important parts of words.

 

Aleksandra (48:29)

Maybe, Susanne, you want to talk about the Ponsonnet study? Yeah.

 

Susanne (48:33)

Yes, there's another study in our special issue, which is fascinating to me, that has been done by Ponsonnet and colleagues, and they looked into interjections. They followed an idea by Mark Dingemanse, who proposed that in interjections, something like “Oh!”, “Ah!”, and so on, there's some consistency: everything related to pain may have more of this low vowel “ah” included, and that is the case across many, many different languages. And Ponsonnet et al. actually looked across 131 languages from five continents. They looked into dictionaries and found pain words, but not only pain words, also words for disgust and happiness. They found these interjections, and they looked at which sounds occur frequently in these words. And they found a very consistent pattern: pain, as proposed by Mark Dingemanse, very frequently had an “ah” sound included. They also found something new, that these pain interjections had a lot of diphthongs, so two vowels which are connected, like ai, ou, oi, and so on. So they had these falling diphthongs included, like “Ayyyy.” A falling diphthong would be like in the “ouch” interjection, right? And they didn't stop there; here's the kicker: they also actually recorded people vocalizing, and they asked them to vocalize pain, not using words, just raw vocal expressions. And they also asked them to vocalize happiness and disgust. And they found for pain, at least, exactly the same patterns. So it's like our linguistic pain words are fossilized versions of our ancestral cries. So that was really something robust and novel, and they had empirical evidence for it.

 

And for the other ones, the other emotional expressions, disgust and happiness, that was much more dependent on culture and language and so on. So maybe there are many more differences in how much we are allowed to express disgust, or what we understand by happiness, and so on. But the pain was very consistent.

 

ASA Publications (51:27)

Interesting. Okay, so you just mentioned this study related to vowels and interjections of emotional vocalizations. And there are these similarities between sounds across many languages. How are iconic associations mediated through social and emotional contexts?

 

Aleksandra (51:45)

Yes, so, this is partly what Susanne said, actually: the expression of disgust and happiness, and how it differed from the expression of pain. As Susanne mentioned, we might have some cultural restrictions on how open we can be about expressing happiness, or how open we can be about expressing disgust. Maybe we're not allowed to say something… And pain, well, there might still be some cultural expectations, like you have to endure pain, right? But sometimes you just cannot resist it. You cannot hold it in, because pain is so, so deeply rooted in the experience of being human, whether it's physical or emotional pain, right? So society or individual differences may be a little bit like prisms that allow iconic relations to be refracted a little bit, so they look the same from a distance, but if we examine them closely they may be slightly different depending on who is looking, or the context we find ourselves in when we are asking the question. And that’s why I think it's very important to remain open in defining iconicity. In the beginning I was actually referring to a slightly newer definition that is very open, saying it can be perceived and/or produced, right? So this allows this openness about what iconicity is. And I think there are several papers here. Yes, on the one hand, the one about the emotional cries of pain, disgust, and joy is certainly one to remember.

 

But there is another study that dug into the issue of the frequency code. Actually, this is a name that we haven't put out there yet, although we've been speaking of the contents of the frequency code since the beginning of this conversation. So the frequency code is this dominance-friendliness axis. Friendliness and non-dominance would be the high-pitched sound, the small dog doing like “woof woof woof,” and then dominance, the big dog, what Susanne said also, the size-sound relationship, the size-sound expectation. A big dog would be more like, “woof, woof,” and it's more threatening to us immediately, just because we see it. And if we just hear it around the corner, we hear a big bark, this low bark, and we know, “Oh, God, there is a big dog there. I don't want to go around the corner,” right? Maybe it's not leashed. So this is the frequency code, introduced by John Ohala originally. And it's also a very deep one. So we're breaking it down here to the basics.

A paper in the special issue investigated how the frequency code differs across different populations. For example, they tested English native speakers and found that the strength of the pitch-meaning associations varies really dramatically based on who you are and which voice you're hearing. Male listeners showed a much stronger association, generally, than female listeners, especially when they listened to male voices. So, this association between dominance and friendliness and the height of the voice. And older men showed the strongest association of all, while younger men's responses were more similar to women's. So there is this gender difference and age difference. So while the frequency code is very deeply rooted in us… Well, it’s rooted in animals. There are frogs that lower their croaks when they feel that there is another male frog entering their territory. So they start croaking lower. Of course, it's because of testosterone. So it's more like a hormonal response. They don't have much control over it. But we don't know exactly when it started to be controllable over the course of evolution. When did we start to respond not only to testosterone or hormones, but to our wishes? “Oh, now I want to be dominant. Frog, go away.” So, right, while it might have these evolutionary roots, it can get amplified or dampened by social experience. As Sasha Calhoun and colleagues put it, these associations have a shared extra-linguistic basis, but their strength and availability varies according to the listeners' experiences and beliefs.

 

Susanne (56:31)

Which means, you know, we are also shaped by our attitudes, by our... I mean, perception is nothing neutral. Perception is something like a filter: what you believe, yeah, your attitudes, stereotypes, and so on. That also plays a role.

 

Aleksandra (56:49)

Yeah. And there was also another study that again shows the power of iconicity playing together with the rules of the linguistic system. We were talking a little bit about this, how it's shaped by the linguistic system, even in a language where pitch is supposedly just functional. So in Mandarin Chinese, tone distinguishes word meanings. I can't imitate it. I can't give you an example, I'm sorry, because I don't speak any tone language.

 

ASA Publications (57:15)

That's okay.

 

Aleksandra (57:17)

I don't hear the difference myself. Even though I'm a phonetician, I don't hear it. That wouldn't be fair to the billion or so speakers of Mandarin Chinese. But still, even where pitch is purely functional, we might say, emotional iconicity still sneaks in there. So it can shake up this system a little bit. The researchers who contributed,

 

Zheng and colleagues, investigated Mandarin Chinese, and they found that certain tonal patterns systematically bias emotional responses. So words with falling-falling tone sequences like the... Oh! I actually have an example I prepared. Dum-dum. Sorry, really, for the next interview I will learn a bit of Mandarin. That will be the biggest contribution. So imagine, for us non-tonal-language speakers, the dum-dum pattern. Those kinds of words received very high arousal ratings, higher even than the rising-rising sequences. And when they tested this with meaningless nonsense words, the effect got even stronger, because there was no restriction from the system, perhaps. And they discovered some valence effects, too: falling patterns felt more negative and rising patterns more positive.

 

Again, we have the spectrum here, a little bit like with the light; we were talking about brightness and the acoustics being just a wave. So then, beautiful iconicity. Even when pitch is doing a serious job, being part of the linguistic system, distinguishing words, our brains are still kind of running this emotional-acoustic algorithm in the background. And maybe you are having a conversation, understanding the words in one language here, but there is an emotional undertone in another language going on in the background. So this contribution was also very interesting to read.

 

ASA Publications (59:28)

So does iconicity play a role in language acquisition?

 

Aleksandra (59:31)

To answer the question briefly: yes, but. And now comes the yes part. There is this brilliant hypothesis called the sound symbolism bootstrapping hypothesis, by Mutsumi Imai and Sotaro Kita, that basically argues that iconicity is like the training wheels for language learning. Think about how, I don't know if you have a child, or if our audience has a child or some experience with children, but babies' first words, and even the words we use when we talk to them, are often: oh look, there is a woof woof. There is a car, it goes vroom vroom. The cow goes moo. So we always use these iconic words, and these iconic words are, of course, loaded with iconic information.

 

And through the imitation of the sound, they make it easier for a child to remember them and to recognize the referent. Because in a child's language learning, there's constantly the so-called Gavagai problem. When you say to a child, “Oh, look, it’s blue,” they don't know if it's a color, as it is, or if it's… that it's spiky, or that it's a cup, or whatever. It can be anything. And that's a big problem for a child.

 

ASA Publications (1:00:49)

Right. Right.

 

Aleksandra (1:00:50)

The child doesn't even know where one word begins and where it ends. Of course, we have prosody helping, right? But everything is coming at once. So iconicity should be, as these training wheels, helping the child to make the reference, make the reference persistent, and make it easy to repeat. And we see that there are studies showing that caretakers use more iconic words and iconic means. With prosody also: when we talk to children, we talk like, “Yes! That's very good!” A lot of prosodic modulation. We use it with children, and children also use iconic words. And at some point it stops. At some point children should start speaking normally, right? They should start writing, and these iconic words we don't really write. Yes, English orthography is a nightmare, but they should use it, they should learn it. So at some point you take off the training wheels, and there are a few bumps on the road, and the child might fall, and maybe it's not even wearing a helmet. But… that's what learning might feel like at some point. But iconicity in the first place, yes, it is helpful. But now maybe comes the but. I give the word to Susanne.

 

Susanne (1:02:14)

Yes, yes. So the story is even more complicated than this. We have a study by Suzanne Aussems and colleagues in the special issue, and they tested 14-to-17-month-old infants with what should have been perfect iconic cues. They used high-pitched voices for small objects and low-pitched voices for large objects, so there's a perfect match, plus hand gestures showing the size. And what happened? These infants completely failed to use these cues.

 

ASA Publications (1:02:55)

Oh no!

 

Susanne (1:02:56)

No preference whatsoever. So I'm also very happy that, you know, usually in many journals it's only possible to publish positive results, but here we have a strong experimental paradigm, a very coherent and strong methodology, and negative results. So there are no significant differences; there's no difference here. And now, how do we deal with this? This could have been devastating for the “iconicity helps acquisition” hypothesis, but instead the authors described it in a more nuanced and more interesting way. The authors suggest that different types of iconicity have different developmental timelines. So the segmental sound symbolism we talked about earlier, like “eee” sounds small, might be more accessible to young infants than prosodic features, like pitch height, or gestural iconicity depicting large or small objects. So this is the discussion they propose, and… yeah, time matters for development, clearly. But it also opens up new questions which should be answered in the future: What is the timeline of gesture plus pitch and their correspondence? When does iconicity play a role? When can it be perceived? When does it matter, and when not anymore? So I think that's something we need to investigate in the future.

 

Aleksandra (1:04:44)

Yes, and even if we think about what we said about the social factors, gender differences and all these things, if these also come into play, if the possible cultural differences come into play, well, babies just start to absorb these, so this might of course be another factor. So we need more studies of different developmental stages, with continuous measures, not only discrete measures of iconicity, across many cultures, across many ages, with diverse populations. And we think this is the way to go, because iconicity is of course helpful, but it may not be sufficient for language acquisition. It's not the only thing. It's like a map, but you still need to learn certain patterns to read the map: where are the shortcuts you can take, what are the local customs, where do you go for the best flat white? Right, you need to learn it. The map is not enough. So we still have enough evidence to say that iconicity boosts learning in children, and this is a super interesting study showing how diverse this world of iconicity may be.

 

ASA Publications (1:06:02)

Yeah, yeah. So you already talked about some future research when talking about language acquisition. What do you see as the future of research in this field in general?

 

Susanne (1:06:11)

I think we both agree that it would be super, super cool to join forces and have a kind of cross-linguistic, acoustic-articulatory lab where we could combine data from many, many different countries, cultures, language families, and so on, and do more studies, and extend what has been done already in the written domain or in the phonological domain to the acoustic-articulatory domain. I think that would be fascinating. So that's something we would really like to carry out in the future.

 

ASA Publications (1:06:52)

Yeah, yeah, it would be amazing.

 

Aleksandra (1:06:54)

Yes, I think, just echoing what I said a little bit before: having more recordings of diverse populations. Now, of course, we are still limited. We cannot take a very complex articulography setup into certain places, and we cannot bring speakers into our labs. But maybe there are other methods. Now we have ultrasound, for example, to record articulatory movements. We even have certain methods to do neurolinguistic studies, like fNIRS, et cetera; this is a portable method. And I'm not, myself, a researcher who would do it; I don't have expertise in this. But I would just like to be a voice saying: let's go and do this. Try to reach populations as diverse as we can, because we are building cognitive and linguistic theories on a subset of populations that are not even representative. WEIRD populations, Western, Educated, Industrialized, Rich, and Democratic populations, are an outlier. They are not the majority. And I say “we” because I also belong to these populations. I was privileged, born in Europe. I can travel, you know, I'm privileged. And to me, this is a learning experience. My time on Earth is limited. I will not be able to go to all the countries. But I can do my best as a researcher to try to reach as many populations as possible and learn from them.

 

And this is so beautiful to me, and I wish we could do it more, just through collaborations, enabling researchers from the global south so that their languages can be heard. Because those of us speaking English or German or other big languages are not the majority. We may be the loudest voices on the internet, but we are not the global majority. And I think, in order to have real cognitive theories of what constitutes us as humans, of what constitutes language, we should strive to really close this gap.

 

ASA Publications (1:09:10)

Do you have any other closing thoughts?

 

Aleksandra (1:09:13)

Those were partly already my closing thoughts. Maybe that sounds a little, like, big, idealistic, I don't know, like a cult.

 

ASA Publications (1:09:25)

I love it

 

Aleksandra (1:09:26)

Yes, I think language is not only about what we think, but what we feel, because language comes from the body. But we live with our bodies in a world that is made immaterial. We of course also have feelings that may not be material, but we feel them, and these sensory perceptions that constitute us are part of how we communicate and what we communicate and how we create language, and ultimately how we create ourselves, because language is a part of us.

 

And, to me, research on iconicity is very beautiful because it is a little bit about the feeling, this unanimous feeling that I often encounter when I give a presentation about bouba-kiki or some other phenomenon and I ask people, so what does this mean? And everybody says, well, this means this. And I say, well, this feeling that we are sharing is there because there is a cross-modal correspondence. And I'm sharing this feeling with Cantonese speakers, with Zulu speakers, with Korean speakers. I never dreamed of this when I was growing up in my small town in central Poland, that I would be able to do this. And I'm very grateful that this research enables me to do it. And I want to give back with this research, to show that maybe there are borders between our languages and between us, political borders especially, and, you know, it's a very tough time to be alive, politically, geopolitically. I won't say that iconicity will bring peace. But maybe it brings a feeling, just for one person or for two people, that we are not that different. Even though we are, and there is a border between us, there is something connecting us, too: this feeling that we share that this means this.

Yeah, this may sound very idealistic, but this is why I'm very grateful to be doing this research.

 

Susanne (1:11:40)

What I'd like to say is that not everybody may agree with what we have said here in this podcast. We opened up many different topics and tried to talk about the different papers within the special issue, but I think even if people disagree with what we said, this is a good opportunity to come together and talk about it, because it's a very interesting topic, and we should exchange ideas on it and maybe even think about future studies and joint work.

 

ASA Publications (1:12:21)

Yeah, this is such a fascinating area of research. You make me want to go back to grad school and study it myself. It's really cool to hear how there could be so much commonality between the sounds used in languages, and how much of that commonality can be based in sort of the inherent meaning the sounds have. I really appreciate you taking the time to speak with me today. Have a great day!

 

Aleksandra (1:12:40)

Thank you so much for having us. Yes, it was a pleasure.

 

Susanne (1:12:42)

Thank you so much.