Heliox: Where Evidence Meets Empathy 🇨🇦
We make rigorous science accessible, accurate, and unforgettable.
Produced by Michelle Bruecker and Scott Bleackley, it features reviews of emerging research and ideas from leading thinkers, curated under our creative direction with AI assistance for voice, imagery, and composition. Synthetic voices and illustrative images of people are representative tools, not depictions of specific individuals.
We dive deep into peer-reviewed research, pre-prints, and major scientific works—then bring them to life through the stories of the researchers themselves. Complex ideas become clear. Obscure discoveries become conversation starters. And you walk away understanding not just what scientists discovered, but why it matters and how they got there.
Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter. Breathe Easy, we go deep and lightly surface the big ideas.
Vocal Fry: The Truth About Whose Voice Creaks
📖 Read the companion essay on Substack
🎥 YouTube: https://www.youtube.com/channel/UCd5BbCEeC3Z6dp-nNjWRbBw
🎙️Available for Broadcast: https://exchange.prx.org/group_accounts/253118-heliox_where_evidence_meets_empathy
Researchers analyzed 92,000 individual vowels from 49 Canadian public figures — including Céline Dion and Justin Trudeau — using an automated acoustic pipeline with zero human bias. What they found overturns a decade of cultural certainty: men creak more than women. Older speakers creak more than younger ones. And the reason society got this so spectacularly backwards has everything to do with pitch contrast, cognitive bias, and the fact that our brains actively rewrite what our ears receive.
In this episode of Heliox: Where Evidence Meets Empathy, we take a deep dive into the sociophonetics of creaky voice — unpacking the biomechanics of the larynx, four objective acoustic metrics, the observer's paradox, presbyphonia, and the psychological phenomenon of perceptual false alarms. By the end, you will never hear a voice the same way again.
What you'll learn:
- Why vocal fry is a biological reality, not a trend
- How men's voices fly under the perceptual radar
- Why young women became the scapegoat of an acoustic illusion
- What 92,000 vowels tell us about how bias shapes perception
- How to apply this evidence to your own listening, and your own judgments
This is Heliox: Where Evidence Meets Empathy
Disclosure: This podcast uses AI-generated synthetic voices for a material portion of the audio content, in line with Apple Podcasts guidelines.
Spoken word, short and sweet, with rhythm and a catchy beat.
http://tinyurl.com/stonefolksongs
This is Heliox, where evidence meets empathy. Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter. Breathe easy. We go deep and lightly surface the big ideas. You know, usually when we talk about human senses, especially hearing, there is this ingrained expectation of like objective reality.
Speaker 2:Right. Yeah. We like to think of our ears as these perfect biological microphones.
Speaker 1:Exactly. You imagine that when you listen to someone speak, the sound waves hit your tympanic membrane and it translates that physical pressure into an electrical signal. And that is just it. The brain receives exactly what happened in the physical space.
Speaker 2:It is a really comforting idea, isn't it?
Speaker 1:Yeah, it really is.
Speaker 2:This whole notion of a pure, completely unfiltered feed of reality. But it's also one of the greatest illusions of human biology.
Speaker 1:Oh, absolutely.
Speaker 2:We want to believe in this direct physical transaction between the world and our brains. But the reality is that the human auditory system is not some passive microphone. It is an active, highly opinionated interpreter.
Speaker 1:Right.
Speaker 2:And by the time a sound actually registers in your conscious awareness, it has been run through just an incredibly complex psychological and sociological filter.
Speaker 1:And when you step into the world of sociolinguistics, you realize that that pristine microphone has been run through like a dozen hidden distortion pedals.
Speaker 2:Oh, at least a dozen. Yeah.
Speaker 1:We are looking at a landscape of human perception that is quite honestly just totally murky. I mean, the brain is constantly editing, highlighting and even entirely inventing audio data based on what it just expects to hear.
Speaker 2:Which brings us perfectly to the topic we are jumping into today.
Speaker 1:Yes, it does.
Speaker 2:Because nowhere in modern linguistics is that perceptual distortion more obvious or honestly more culturally explosive than what we are about to discuss right now.
Speaker 1:Absolutely. Because today we are taking a deep dive into something you have definitely heard. Something you almost certainly have a very strong opinion about. Oh, no doubt. And something that has caused an absolute cultural uproar over the last decade or so. We are talking about the phenomenon known as vocal fry.
Speaker 2:Right, which is also referred to in the scientific and sociophonetic literature by the slightly more clinical and perhaps less appetizing name of Creaky Voice.
Speaker 1:Creaky voice. The creak. I mean, if you are listening right now, you know the sound. It is that low, gravelly, popping, almost rattling sound that happens, usually right at the end of a sentence. Yeah. It sounds like someone's voice is literally running out of gas and just sputtering on the fumes. And culturally, it has been slapped with all sorts of highly subjective labels. Oh, definitely.
Speaker 2:It has been called a fashion trend, a linguistic epidemic, an affectation.
Speaker 1:Right, an affectation. And most notoriously, in the public discourse at least, it is almost exclusively blamed on young, upwardly mobile women.
Speaker 2:The narrative over the last, you know, 10 to 15 years has been incredibly consistent and frankly quite aggressive.
Speaker 1:Yeah, it really has.
Speaker 2:If you look at media coverage, opinion pieces, and even some early flawed academic papers, the sound is heavily penalized. It is described as grating or annoying.
Speaker 1:Right.
Speaker 2:And perhaps most damagingly, when we talk about real-world consequences, it is frequently labeled as unprofessional in the workplace.
Speaker 1:It is wild how much heat this specific phonetic sound takes. I mean, we don't see thousand-word op-eds complaining about how people pronounce their vowels, but vocal fry makes front-page news.
Speaker 2:Yeah, it is a total lightning rod.
Speaker 1:But here is the thing, and this is the core reason we are doing this deep dive today. We have an absolutely massive stack of research sitting right in front of us.
Speaker 2:The literal stack.
Speaker 1:Anchored by this fascinating, incredibly detailed sociophonetic study titled "Vocal Fry: A Sociophonetic Study of Creaky Voice Across Language, Gender, and Age in Canadian English-French Bilinguals."
Speaker 2:And what makes this specific piece of source material so compelling and so completely disruptive to the popular narrative is that it completely strips away the subjective human element.
Speaker 1:Right.
Speaker 2:The researchers designed a methodology that just removed the social media complaining, the cultural hot takes, and most importantly, the inherently biased human ear.
Speaker 1:Exactly. So our mission for this deep dive is to act as sonic detectives. We are putting down the opinion pieces, we are turning off the cultural commentary, and we are relying purely on objective acoustic data.
Speaker 2:Just the math of the sound.
Speaker 1:Yes, exactly. We want to know what is actually happening in the physical sound waves. Not what we think we hear, but what is physically measurable in the air.
Speaker 2:Which means we need to ask you, the listener, sitting right there, to do something a little difficult.
Speaker 1:Yeah, bit of a challenge.
Speaker 2:We need you to temporarily suspend all your preconceived notions about who uses vocal fry, what it means, and why people do it.
Speaker 1:You might think you know exactly what vocal fry sounds like, and you probably think you know exactly who does it the most. But today, we're going to show you how your own ears might be playing massive psychological tricks on you.
Speaker 2:And they really, really are.
Speaker 1:Okay, let's unpack this. Because before we can even begin to figure out who is creaking in society, we need to understand the anatomy of a creak in the physical world.
Speaker 2:Right, the mechanics of it.
Speaker 1:Yeah. What is actually happening inside the human body when this sound is produced?
Speaker 2:Well, to understand the mechanics of creak, we have to travel down into the larynx, specifically to the vocal folds. Now, when you are speaking in a standard, clear voice, what phoneticians actually call modal voice, your vocal folds are coming together and vibrating at a relatively regular, rapid, and periodic pace.
Speaker 1:So like a nice, even vibration.
Speaker 2:Exactly. It's a very balanced state of muscular tension and airflow. But when a speaker shifts into creaky voice, the biomechanics of the larynx shift dramatically.
Speaker 1:So let's break down that biomechanical shift. The vocal folds are these two muscular bands spanning across the airway, right?
Speaker 2:Correct. Yeah.
Speaker 1:So what is actually happening to the tension in those bands to produce that rattling sound?
Speaker 2:Okay. So two very specific simultaneous muscular actions happen to produce a true creak. First, you have an increase in what we call adductive tension.
Speaker 1:Adductive tension.
Speaker 2:Right. The lateral cricoarytenoid muscles engage, which basically means the vocal folds are pressed together more tightly along their medial edge. They're coming together with a lot more compressive force.
Speaker 1:Okay. I am with you.
Speaker 2:But second, and this is the really crucial part, you have a decrease in longitudinal tension.
Speaker 1:Wait, so hold on. They are squeezing together tightly side to side, but they are relaxed end to end.
Speaker 2:Yes, precisely. The thyroarytenoid muscle, which makes up the main body of the vocal fold, relaxes. So the folds become highly compressed together, but they're also thick, slack, and kind of loose along their entire length.
Speaker 1:Oh, wow.
Speaker 2:And because they are thick and tightly pressed together, the air pressure building up from your lungs, the subglottal pressure, has a really hard time pushing through.
Speaker 1:Because it's trying to push through a thick, heavy, closed doorway rather than a thin, taut one.
Speaker 2:Exactly. It takes work. So instead of a smooth, rapid, continuous vibration, the air pressure has to build up significantly just to blow the folds apart for a split second.
Speaker 1:Oh, I see.
Speaker 2:And because the folds are slack, they just snap back together very heavily and very irregularly. The air bubbles up in slow, distinct, uneven bursts. Bop, bop, bop, bop.
Speaker 1:Yeah, yeah.
Speaker 2:That slow, irregular vibration is what produces the physical acoustic sound of creaky voice.
Speaker 1:That makes total sense when you think about the sound itself. I mean, it literally sounds like air struggling to bubble through a thick, compressed space.
Speaker 2:It really does.
Speaker 1:It has that damp, sputtering quality. But here is the massive methodological challenge, right? For decades, when linguists and scientists wanted to study this phenomenon, they relied on something called impressionistic coding.
Speaker 2:Which is really just an academic way of saying they use their own ears.
Speaker 1:Right. A researcher would literally just sit in a room with a pair of headphones, listen to hundreds of hours of audio recordings, and just make a subjective judgment call. Like, yep, that vowel sounded creaky, or nope, that wasn't creaky.
Speaker 2:And while those early linguists were highly trained, I mean, human perception is fundamentally flawed.
Speaker 1:Yeah.
Speaker 2:We just established that the brain filters audio data.
Speaker 1:Yeah.
Speaker 2:So if you are relying purely on human perception, your scientific data is going to inherit all the biases of the human listening to the tape.
Speaker 1:Absolutely.
Speaker 2:If they expect a certain demographic to creak, they're going to unconsciously hear more creak in that demographic. And it's just unavoidable.
Speaker 1:But our source material today completely bypassed that flaw. The researchers didn't use impressionistic coding at all.
Speaker 2:Nope, not at all.
Speaker 1:They used an automated computer pipeline to extract precise acoustic measurements from over 92,000 individual vowels. 92,000.
Speaker 2:It is a staggering amount of data.
Speaker 1:There is no human bias in a computer script measuring sound wave frequencies.
Speaker 2:Right. And what's fascinating here is the sheer scale and the computational rigor of the data. The researchers didn't just look at one basic metric either.
Speaker 1:No, they went all in.
Speaker 2:They deployed an entire acoustic toolkit to measure this biological phenomenon from multiple mathematical angles. And we really need to break down these core metrics because understanding the underlying physics here is the foundation of everything that comes next.
Speaker 1:I am fully on board with that. Let's go through the tools in this acoustic toolkit. Metric number one, F0, what are we measuring here?
Speaker 2:So F0, or the fundamental frequency, is just the scientific measurement for pitch. It is measured in hertz, which is just the number of vibration cycles per second.
Speaker 1:Okay, simple enough.
Speaker 2:Now, tying this back to the biology we just discussed, because the vocal folds are thick and relaxed during creak, the vibration is very, very slow.
Speaker 1:Right.
Speaker 2:And slow vibrations produce low frequencies.
Speaker 1:Right.
Speaker 2:So the most basic acoustic indicator of creaky voice is a very low F0.
Speaker 1:Got it. The slower the folds vibrate, the lower the pitch, the more likely it is a creak. But then there's a second metric tied to that, which I just found completely incredible. Unreliable F0 tracks.
Speaker 2:Oh, yes. This is a fun one.
Speaker 1:It sounds like the computer software literally throwing its hands up and just giving up.
Speaker 2:That is essentially exactly what is happening. Acoustic software uses pitch tracking algorithms, often relying on a mathematical process called autocorrelation, to map the pitch of a voice.
Speaker 1:Okay.
Speaker 2:And the algorithm assumes a human voice will have a certain level of rhythmic regularity. It looks for a repeating pattern. But creaky voice, by its very biological nature, is highly irregular. The pops of air are totally unevenly spaced.
Speaker 1:So the signal gets so chaotic that the algorithm just breaks?
Speaker 2:Yes. The tracking algorithm literally fails and returns an undefined error because it just can't find a periodic pattern to measure.
Speaker 1:That is hilarious.
Speaker 2:But the researchers realized that this failure isn't a bug. It's a feature. They use the failure itself as a metric. A higher proportion of undefined or unreliable pitch tracks is a strong objective indicator of irregular creaky phonation.
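The tracking failure described here can be sketched in a few lines. This is a toy illustration, not the study's pipeline: the signals are bare pulse trains, and the sample rate, search range, threshold, and function names are all assumptions chosen for the example. An evenly spaced pulse train gives the autocorrelation a strong repeating pattern to lock onto, while an unevenly spaced one does not, so the tracker reports the pitch as undefined.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed value for this toy example


def pulse_train(positions, length):
    """Build a signal that is 1.0 at each pulse position and 0.0 elsewhere."""
    signal = [0.0] * length
    for p in positions:
        if 0 <= p < length:
            signal[p] = 1.0
    return signal


def estimate_f0(frame, sample_rate=SAMPLE_RATE, min_f0=50.0, max_f0=400.0,
                threshold=0.5):
    """Return F0 in Hz from the best autocorrelation lag, or None if the
    frame is too irregular (peak below the assumed threshold)."""
    min_lag = int(sample_rate / max_f0)
    max_lag = int(sample_rate / min_f0)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        head = frame[:len(frame) - lag]
        tail = frame[lag:]
        energy = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
        if energy == 0.0:
            continue
        # Normalized correlation between the frame and its shifted copy.
        score = sum(x * y for x, y in zip(head, tail)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None or best_score < threshold:
        return None  # the "undefined" pitch track
    return sample_rate / best_lag


def undefined_proportion(frames):
    """Share of frames for which no reliable F0 could be found."""
    return sum(estimate_f0(f) is None for f in frames) / len(frames)


# Regular pulses every 80 samples: 8000 / 80 = 100 Hz.
periodic = pulse_train(range(0, 800, 80), 800)
# Unevenly spaced pulses, standing in for irregular creaky phonation.
irregular = pulse_train([0, 70, 200, 295, 450, 510, 650, 755], 800)

print(estimate_f0(periodic))    # 100.0
print(estimate_f0(irregular))   # None
print(undefined_proportion([periodic, irregular, irregular, periodic]))  # 0.5
```

Production pitch trackers add windowing, octave-error handling, and voicing decisions on top of this, so the numbers here only illustrate the principle that a higher share of undefined frames points to more irregular voicing.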
Speaker 1:I love that so much. The vocal fry is so chaotic, it literally breaks the algorithm. Okay, next tool in the kit, H1-H2, which the study refers to as spectral tilt. Right. This is where things get a bit mathematically dense, but it is so crucial. How are we measuring spectral tilt, and what does it actually tell us about the throat?
Speaker 2:This is a really brilliant metric. When the vocal folds vibrate, they don't just produce one single frequency. They produce a complex wave made up of a fundamental frequency, that's our F0, and a series of overtones called harmonics.
Speaker 1:Okay, I follow.
Speaker 2:H1 is the amplitude or the loudness of the first harmonic. H2 is the amplitude of the second harmonic. Spectral tilt just measures the difference in loudness between the two.
Speaker 1:So we are comparing the volume of the lowest pitch to the volume of the next pitch up. How does that translate to physical tension in the throat, though?
Speaker 2:It really comes down to how sharply the vocal folds are closing. When you have high adductive tension, when those vocal folds are pressed tightly together to make a creak, they snap shut very abruptly.
Speaker 1:Okay.
Speaker 2:In acoustic physics, an abrupt, sharp closure of the glottis sends a lot of acoustic energy into higher frequencies. Because more energy is pushed into the higher harmonics, the amplitude of H2 rises. Which means the difference between H1 and H2 gets smaller.
Speaker 1:Okay, I think I follow that. A tighter, sharper closure in the throat creates higher frequency energy, which lowers the overall H1-H2 number. So a low spectral tilt equals a tight glottis, which equals creak.
Speaker 2:Exactly. It's a way to use the mathematical distribution of sound energy to literally see how tightly a person's vocal folds are compressed together without ever having to actually put a camera down their throat.
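The H1-H2 arithmetic can be shown on synthetic tones. The sketch below assumes the F0 is already known and measures the first two harmonics by projecting the signal onto sinusoids at F0 and 2 x F0. The tone amplitudes, sample rate, and function names are invented for illustration, and real measurements on speech typically also correct for the influence of vocal tract resonances, which this omits.

```python
import math


def harmonic_amplitude(signal, freq, sample_rate):
    """Amplitude of the sinusoidal component at `freq` (single-bin DFT)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def h1_minus_h2(signal, f0, sample_rate):
    """Difference in dB between the first and second harmonic."""
    h1 = harmonic_amplitude(signal, f0, sample_rate)
    h2 = harmonic_amplitude(signal, 2 * f0, sample_rate)
    return 20 * math.log10(h1 / h2)


def two_harmonic_tone(f0, a1, a2, sample_rate=8000, n_samples=800):
    """Synthetic tone with only a first and a second harmonic."""
    return [a1 * math.sin(2 * math.pi * f0 * i / sample_rate)
            + a2 * math.sin(2 * math.pi * 2 * f0 * i / sample_rate)
            for i in range(n_samples)]


# Hypothetical "modal-like" tone: second harmonic much weaker than the first.
modal_like = two_harmonic_tone(100.0, a1=1.0, a2=0.25)
# Hypothetical "creak-like" tone: second harmonic as strong as the first.
creak_like = two_harmonic_tone(100.0, a1=1.0, a2=1.0)

print(round(h1_minus_h2(modal_like, 100.0, 8000), 2))
print(round(h1_minus_h2(creak_like, 100.0, 8000), 2))
```

The first tone comes out near 12 dB and the second near 0 dB, which is the direction described above: as more energy moves into the second harmonic, H1-H2 shrinks.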
Speaker 1:That is incredible. And finally, the last major metric in the toolkit, CPP and HNR under 500 hertz. These stand for cepstral peak prominence and harmonics-to-noise ratio. Basically, these algorithms are looking for noise in the audio signal.
Speaker 2:That's right.
Speaker 1:Now, I want to push back on this a little bit, because if I am listening to a human voice, especially a clear, articulate voice, I consider that a clear signal. Isn't a voice supposed to be musical? Why are acoustic scientists specifically going on a hunt for noise?
Speaker 2:That is a really fundamental question about the nature of phonation. When we think of a perfectly clear modal voice, like an opera singer holding a perfect note, we are talking about a highly periodic signal.
Speaker 1:Like a clean wave.
Speaker 2:Exactly. It is a clean acoustic wave repeating perfectly over and over. But creaky voice is fundamentally aperiodic. It breaks that perfect repetition. And in acoustics, any deviation from perfect periodicity is mathematically defined as noise.
Speaker 1:Ah, okay. It makes me think of an engine. It's like comparing a smooth humming electric car engine to a classic heavy muscle car idling at a low RPM.
Speaker 2:I like where this is going.
Speaker 1:When an electric car runs, it's just a pure high-pitched hum, perfect periodicity. But that old muscle car engine, it sputters, it rumbles, the exhaust pops unevenly. That rumble, that irregular guttural sputter is the noise we are measuring, right? The computer is looking for the acoustic equivalent of a sputtering engine.
Speaker 2:That is an exceptionally accurate analogy. The vocal folds in creak are acting exactly like that idling muscle car. And the brilliance of these specific metrics, measuring the noise specifically under 500 Hz, is that they differentiate between different types of biological noise.
Speaker 1:Because there's more than one way a voice can be noisy.
Speaker 2:Yes. If you have a breathy voice, like you were whispering, the vocal folds aren't closing all the way. Air is rushing through, creating high-frequency aspiration noise.
Speaker 1:Like a hiss.
Speaker 2:Exactly. It sounds like the hiss of wind. That high-frequency hiss would completely ruin our data if we were looking for a creak, but the sputtering muscle car noise of a creak happens at the very bottom of the frequency range. Oh, I see. So by telling the computer to only look at the noise under 500 hertz, the researchers filter out the wind noise and perfectly isolate the true biological creak.
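The harmonics-to-noise idea can be made concrete with one widely used relationship, the one documented for autocorrelation-based harmonicity analysis in Praat: if r is the height of the normalized autocorrelation peak at the pitch period, r is read as the periodic share of the energy and 1 - r as the noise share. The sketch below is only that formula. It is not claimed to be the method the study used, and it leaves out the under-500 Hz band limiting, which would be applied to the signal before r is measured. The function name is illustrative.

```python
import math


def hnr_db(r):
    """Harmonics-to-noise ratio in dB from a normalized autocorrelation
    peak r, where 0 < r < 1."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must be strictly between 0 and 1")
    return 10 * math.log10(r / (1 - r))


# A very regular voice (r close to 1) versus a sputtering one (r near 0.5).
print(round(hnr_db(0.99), 1))
print(round(hnr_db(0.5), 1))
```

With r = 0.99 the result is close to 20 dB, and with r = 0.5 it is 0 dB, meaning equal parts periodic energy and noise. Lower values correspond to the "sputtering engine" end of the analogy.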
Speaker 1:Okay, so we have our acoustic toolkit fully unpacked. We have computers objectively measuring low pitch, logging broken tracking algorithms, calculating tight vocal folds through spectral tilt, and isolating low-frequency sputtering noise.
Speaker 2:A completely objective setup.
Speaker 1:We have an impenetrable objective way to measure vocal fry. The next logical question is, who exactly are we testing these tools on?
Speaker 2:And this is where the methodology of the study gets incredibly clever. If you are a sociolinguist and you want to know how social factors or demographics or even different languages affect human speaking behavior, you have a massive confounding variable you have to control for.
Speaker 1:Which is.
Speaker 2:You have to control for biology.
Speaker 1:Right, because interspeaker variability is huge.
Speaker 2:Oh, massive.
Speaker 1:If I just take an audio recording of an American English speaker and compare it to a recording of a Parisian French speaker and the American is measurably creakier, I don't actually know why. Is it because the English language is inherently a creakier language? Or is it just because that specific American guy happens to have physically thicker, heavier vocal folds than that specific French guy? Biology just gets in the way of the linguistic data.
Speaker 2:Precisely. The anatomical differences, vocal tract length, laryngeal mass, cartilage density, make comparing two different people incredibly difficult if you are trying to isolate language. So the researchers bypassed this problem entirely by using Canadian English-French bilinguals.
Speaker 1:And not just random people in a lab either. They gathered a massive data set of 49 public figures from Ontario and Quebec. 25 men, 24 women. Right. And these are people the listener might actually recognize. We are talking about Justin Trudeau, the prime minister. We are talking about Céline Dion.
Speaker 2:Yeah, very famous voices.
Speaker 1:Acclaimed filmmakers like Denis Villeneuve and Xavier Dolan. We have comedians, actors and journalists.
Speaker 2:And what is crucial about this data set is the nature of the audio. They did not bring these public figures into a sterile laboratory and ask them to read phonetically balanced sentences off a clipboard.
Speaker 1:No, they didn't.
Speaker 2:They pulled their data from spontaneous speech. So podcasts, radio interviews, unscripted television talk shows.
Speaker 1:Why does that matter so much? I mean, does reading from a script change the acoustics?
Speaker 2:Oh, drastically. In sociolinguistics, there is a concept known as the observer's paradox. When people know they're being recorded for a linguistic study or when they're reading text, they naturally hyper articulate.
Speaker 1:They get stiff.
Speaker 2:Yeah, exactly. They adopt a formal, unnatural register. But spontaneous speech in an interview setting gives researchers access to the vernacular. It shows how these people actually utilize their vocal tracts in the real world when they're focused on communication, not pronunciation.
Speaker 1:And because every single one of these 49 people is highly bilingual, the researchers have the ultimate biological control group.
Speaker 2:Yes, they do.
Speaker 1:Céline Dion's vocal tract is Céline Dion's vocal tract, whether she is speaking English to an American interviewer or French to a Quebecois interviewer. The biological hardware, the laryngeal mass, the lung capacity is exactly the same. The only variable changing is the software, the language she is speaking.
Speaker 2:This specific control allows the researchers to answer a major longstanding debate in the linguistic community. For years, there has been a prevailing theory that English is simply an inherently creakier language than other languages.
Speaker 1:I have actually heard this. The idea that something about the rhythm or the vowels of English just naturally lends itself to vocal fry.
Speaker 2:Yes. English is a stress-timed language, meaning we highly emphasize certain syllables and dramatically reduce others, whereas French is syllable-timed, meaning each syllable gets roughly equal rhythmic weight.
Speaker 1:Okay, that makes sense.
Speaker 2:Previous studies, often relying on that flawed human impressionistic coding, suggested that the prosody of English caused speakers to creak far more than French or Spanish speakers. So to test this, the researchers put the 92,000 vowels from these bilingual Canadians through their automated acoustic pipeline.
Speaker 1:So what does this all mean? Is the English language inherently creakier than French?
Speaker 2:The results were a massive plot twist. Surprisingly, the answer is no. When you look at the objective acoustic data across all these bilingual speakers, there is virtually no significant difference in creakiness between the languages within the same speaker.
Speaker 1:Wow. If we connect this to the bigger picture, this finding is profound. It means that your vocal quality, your tendency to use vocal fry, is not dictated by the grammar, the prosody, or the vocabulary of the language you happen to be speaking.
Speaker 2:Not at all.
Speaker 1:It is fundamentally speaker dependent. You carry your own unique vocal fingerprint with you, regardless of the linguistic software you are running.
Speaker 2:Exactly. Justin Trudeau's baseline level of biological creak is essentially the same in English as it is in French. The only microscopic exception the data showed was a slightly higher number of those unreliable pitch tracks when speaking English. But across the board, the pitch, the glottal constriction, the harmonic noise, it was remarkably stable across languages.
Speaker 1:And just to ensure their data was absolutely watertight, the researchers also checked the bilingual data for known linguistic variables. They confirmed that regardless of whether the speaker was speaking English or French, creak happens exactly where the physics of the vocal tract dictate it should happen.
Speaker 2:Right. The data showed creak predominantly at the ends of sentences, what linguists call utterance-final position, during moments of fast speech, and when speakers use low vowels like the "ah" sound.
Speaker 1:Which makes perfect physiological sense. At the end of a long sentence, you are naturally running out of breath. Your subglottal pressure drops, your vocal folds relax, and your pitch falls. It creates the perfect environment for that...
Speaker 2:So through this meticulous methodology, we have built a rock-solid foundation. We know the biomechanics of a creak. We have the acoustic tools to measure it without human bias.
Speaker 1:Yes, we do.
Speaker 2:And we have proven that the data holds up across languages, because the biology, not the language, dictates the sound.
Speaker 1:Now that the baseline is set and the acoustic truth is established, we have to talk about the demographics. Because this is where the data takes an absolute sledgehammer to everything society thought it knew about vocal fry.
Speaker 2:We really need to look at the data on gender.
Speaker 1:Exactly. The entire cultural narrative, the thousands of opinion pieces, the viral videos mocking vocal fry, they all point the finger at one specific demographic. Young, upwardly mobile women. Exclusively almost. The stereotype is so deeply ingrained in our culture that it is treated as an established fact of nature. I mean, if you ask 100 people on the street who uses vocal fry the most, 99 of them are going to say young women.
Speaker 2:But as we've established, the computer algorithms don't read opinion pieces. They don't have Twitter accounts.
Speaker 1:No, they don't.
Speaker 2:They only read the mathematics of sound waves. And when the researchers ran the objective acoustic data on those 25 men and 24 women, the reality was shocking.
Speaker 1:Here's where it gets really interesting. Who is actually creaking more?
Speaker 2:The acoustic data reveals, without a shadow of doubt, that men's voices are unequivocally creakier than women's.
Speaker 1:Wow.
Speaker 2:And we are not talking about a slight edge on one obscure metric. Across every single measurement we discussed, men exhibited significantly more creak.
Speaker 1:Walk me through the actual metrics. Like, how definitive is this?
Speaker 2:It is definitive across the board. Men had significantly lower F0, which we expect, but they also had vastly larger proportions of those unreliable broken pitch tracks.
Speaker 1:The broken algorithms.
Speaker 2:Right. Their algorithms failed much more frequently. They had shallower spectral tilt, meaning their vocal folds were closing more tightly and sharply than the women's. And they had lower HNR, meaning they produced significantly more of that low-frequency muscle car sputtering noise.
Speaker 1:Hold on. Let me just process this. The science proves that men do this more. And not just anecdotally, but definitively across every objective acoustic metric we possess?
Speaker 2:Yes. The data is completely unambiguous. Men exhibit far more glottal closure and irregular voicing than women do in spontaneous speech.
Speaker 1:If you are listening to this right now, your jaw should be on the floor. I know mine is. Because I have to push back on behalf of everyone listening, and honestly, on behalf of logic itself.
Speaker 2:Please do.
Speaker 1:If men are definitively, biologically doing this more, why on earth are young women taking all the heat in the media? Why are women constantly being penalized? Why are they being told they sound grating or unprofessional for producing a biological sound that men are walking around doing even more frequently? How did society get this so incredibly, disastrously wrong?
Speaker 2:This raises an incredibly important question, and it is arguably the most fascinating and perhaps frustrating part of this entire deep dive. How can a society collectively hallucinate a linguistic trend?
Speaker 1:Yeah, how?
Speaker 2:How can millions of people be so demonstrably wrong about what they are hearing? The answer lies in the deeply flawed nature of human psychoacoustics. We have to introduce two well-documented psychological concepts that actively trick the human ear. The pitch contrast scenario and the gender bias scenario.
Speaker 1:Okay, let's unpack these. I really want to understand how my brain is lying to me. Start with the pitch contrast scenario. How does pitch trick our ears?
Speaker 2:It has to do with how the human brain processes sensory contrast. Our brains are evolutionarily wired to detect sudden changes in our environment.
Speaker 1:Right, that's survival.
Speaker 2:Biologically, because of the size and mass of their vocal folds, women generally have a higher habitual speaking pitch than men. A woman's normal, clear, modal voice might sit somewhere around 200 hertz.
Speaker 1:Okay, so that's the baseline, 200 vibrations per second.
Speaker 2:Now, as we established in the biomechanics, producing a creaky voice requires you to drop into a very low frequency, often below 60 or 50 hertz. When a woman drops from her normally high habitual pitch of 200 hertz all the way down into a low rumbling vocal fry at 50 hertz, the acoustic distance she covers is massive.
Speaker 1:The mathematical gap between the modal voice and the creak is huge.
Speaker 2:Exactly. And because that gap is huge, the contrast is stark. It stands out to the human ear like a siren. In psychoacoustics, we say it is highly salient. The brain registers it as a massive sudden deviation from the norm.
Speaker 1:And what about men?
Speaker 2:Men, on the other hand, have a much heavier laryngeal mass, meaning their normal habitual speaking pitch is already much lower. A man's modal voice might sit around 100 or 120 hertz.
Speaker 1:So already much closer to the creak.
Speaker 2:It is already hovering near the basement of his vocal range. So when a man drops into a 50 hertz vocal fry, the acoustic distance he covers is tiny.
Speaker 1:The shift is so small it barely registers.
Speaker 2:Right. His normal voice and his creaky voice are practically overlapping in frequency. The brain doesn't register a massive shift, so it doesn't flag the sound as abnormal.
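The size of the two drops described above can be worked out directly. The following is a minimal sketch, not a calculation from the study: it assumes the round figures quoted in the conversation (200 hertz and 100 hertz modal voices, a 50 hertz creak) and measures each drop both in raw hertz and in semitones. The semitone scale and the function name `semitones_between` are assumptions added for illustration; the speakers do not say which scale they have in mind.

```python
import math


def semitones_between(f_start_hz: float, f_end_hz: float) -> float:
    """Size of a pitch change in semitones (12 per octave); negative means a drop."""
    return 12 * math.log2(f_end_hz / f_start_hz)


# Round figures quoted in the conversation, used here as assumptions.
CREAK_HZ = 50.0
female_modal_hz = 200.0
male_modal_hz = 100.0

# Drop measured in raw hertz.
female_drop_hz = female_modal_hz - CREAK_HZ
male_drop_hz = male_modal_hz - CREAK_HZ

# Drop measured in semitones, a logarithmic scale closer to perceived pitch.
female_drop_st = semitones_between(female_modal_hz, CREAK_HZ)
male_drop_st = semitones_between(male_modal_hz, CREAK_HZ)

print(f"female: {female_drop_hz:.0f} Hz drop, {female_drop_st:.1f} semitones")
print(f"male:   {male_drop_hz:.0f} Hz drop, {male_drop_st:.1f} semitones")
```

With these figures the woman's drop is 150 hertz, or two octaves, while the man's is 50 hertz, or one octave. On either scale the female drop is the larger one, which is the contrast the pitch contrast scenario relies on.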
Speaker 1:Okay. This needs an analogy to really cement it. It's like imagine you have two people drinking coffee.
Speaker 2:Okay.
Speaker 1:One person is wearing a crisp, bright white shirt. The other is wearing a dark, heavy brown shirt. If the person in the bright white shirt spills a single tiny drop of dark black coffee right in the middle of their chest, the contrast is so extreme that everyone in the room notices the stain immediately.
Speaker 2:Oh, absolutely.
Speaker 1:Your eye is drawn to it. You can't look away from it.
Speaker 2:I see where you're going with this.
Speaker 1:But if the guy wearing the dark brown shirt spills coffee all over himself, no one even sees it. He could be absolutely drenched in coffee, but because the color of the stain is so close to the color of his shirt, it just blends right into the fabric. The coffee stain is the vocal fry. Because women's voices are the bright white shirt, we hear every single drop of creak. It is incredibly salient. But men's voices are the dark brown shirt. Men are creaking all over the place. They are saturated in creak. But because their baseline pitch is already so dark and low, our ears just gloss right over it.
Speaker 2:That analogy is acoustically perfect. Men are drenched in the coffee stain of creak. But because of the lack of pitch contrast, our brains simply do not register the shift as a distinct event.
Speaker 1:Wow.
Speaker 2:But with women, the shift is highly salient, so our brains flag it immediately. That is the biological reality of the pitch contrast scenario.
Speaker 1:Wow. Okay, so pure acoustics and biological contrast explain why we notice it more in women. We notice the drop. But what about the gender bias scenario? That sounds a little more insidious than just simple acoustics.
Speaker 2:It is much more insidious because it involves top-down processing in the brain. The gender bias scenario explains how our societal expectations literally rewrite the audio data as it enters our consciousness.
Speaker 1:Our brains edit the tape.
Speaker 2:Over the last decade, we have been told repeatedly by articles, podcasts, comedy sketches, and cultural commentary that young women use vocal fry. It is a pervasive stereotype.
Speaker 1:So society has basically handed us a script.
Speaker 2:Yes. And because society tells us to expect this behavior from women, listeners actively, albeit subconsciously, listen for it when a woman speaks. We are primed to find it.
Speaker 1:We are actively staring at the white shirt waiting for the coffee to spill.
Speaker 2:Precisely. And here is where the psychological research gets truly wild. The perceptual bias against women is so strong that researchers conducting perception tests have found that listeners will actually experience false alarms.
Speaker 1:Wait, what do you mean by a false alarm in hearing?
Speaker 2:In these studies, researchers will play audio recordings of low-pitched male voices for listeners. And crucially, they manipulate the audio so that it contains absolutely zero acoustic creak.
Speaker 1:Zero creak.
Speaker 2:It is perfectly smooth, highly periodic, modal voice, but it is low in pitch.
Speaker 1:Okay, smooth but deep.
Speaker 2:When listeners evaluate these deep male voices, they will frequently and falsely identify the voice as creaky simply because it is low pitched.
Speaker 1:You are kidding.
Speaker 2:No, not at all. Their brains associate low pitch with men, and they don't penalize it. But when they play identical acoustic stimuli for female voices, when they hear a woman drop in pitch, they accurately identify the creak, but then apply the societal penalty, labeling it unprofessional or annoying.
Speaker 1:Wait, so our brains are so biased that we literally invent the sound of vocal fry in men's voices just because they have deep voices, but we think it sounds authoritative.
Speaker 2:Yes. We conflate low pitch with authority in men, even if it's creaky. But we conflate low pitch in women with a lack of professionalism. We have completely conflated the salience of the sound in women, the fact that it stands out, with the frequency of the sound. Right. We assume that because we notice it more, they must be doing it more.
Speaker 1:That is absolutely mind-blowing. The objective acoustic data proves beyond a doubt that men are the ones driving the vocal fry bus. But because of a biological acoustic illusion and deeply ingrained societal bias, young women are the ones getting thrown under the tires.
Speaker 2:It is a really sobering realization. It forces us to reckon with how unreliable and prejudiced our subjective judgments of other people's voices truly are.
Speaker 1:So the gender stereotype is a complete myth busted wide open by 92,000 objectively measured vowels. But we aren't done yet because if the gender stereotype is entirely backward, what about the generational stereotype? The other half of the vocal fry myth is that this is a new trend.
Speaker 2:Right. The temporal argument.
Speaker 1:Exactly. We constantly hear it described as a millennial thing or a Gen Z affectation. It is treated in the media like a linguistic virus that suddenly infected the youth sometime around the year 2010. People act like nobody before the invention of the iPhone ever used creaky voice.
Speaker 2:To test this temporal myth, we have to look at the year of birth data from our 49 Canadian public figures. And this highlights another brilliantly designed aspect of the methodology.
Speaker 1:How so?
Speaker 2:They didn't just pick 49 people of the same age. They ensured a massive chronological spread. The youngest speaker in the data set was born in the year 2000.
Speaker 1:So a true Gen Z representative, they grew up entirely in the smartphone era.
Speaker 2:Yes. And the oldest speaker in the data set was born in 1937.
Speaker 1:1937. That person was born before World War II. They were a teenager in the 1950s.
Speaker 2:Exactly.
Speaker 1:So we have a massive multi-generational spread of voices spanning over six decades. If vocal fry is truly a trendy new affectation invented by millennials and passed down to Gen Z, the acoustic data should show the younger speakers creaking constantly and the older speakers barely creaking at all. The graph should be heavily skewed toward the youth. What did the computers actually find?
Speaker 2:Another cultural myth completely busted by the math. The acoustic cues, specifically the fundamental frequency, the cepstral peak prominence, and the noise ratios under 500 hertz, showed what we call a monotonic rise in creak as people age.
Speaker 1:Let me translate monotonic rise for a second. In mathematics, that means a trajectory that goes in one steady continuous direction without reversing. The older the speaker, the creakier the voice.
Speaker 2:Exactly. The data revealed that older speakers are actually significantly creakier than younger speakers. The line goes straight up as age increases. Wow. In fact, the youngest speakers in the study, the millennials and Gen Z representatives who are constantly blamed for this trend, were actually the least creaky demographic of all.
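The idea of a monotonic rise can be made concrete with a short check. This is a minimal sketch only: the helper name `is_monotonic_rise` is hypothetical, and the creak scores below are invented placeholders, not values from the study.

```python
from typing import Sequence


def is_monotonic_rise(values: Sequence[float]) -> bool:
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Placeholder creak scores ordered from youngest to oldest age group.
# These numbers are invented for illustration; they are NOT from the study.
placeholder_creak_by_age = [0.10, 0.14, 0.19, 0.27]

rising = is_monotonic_rise(placeholder_creak_by_age)
reversing = is_monotonic_rise([0.10, 0.30, 0.20])

print(rising)     # True: the sequence never reverses direction
print(reversing)  # False: the last value drops below the one before it
```

A sequence passes only if it never reverses direction, which matches the "older means creakier" pattern described here.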
Speaker 1:Let that sink in for a second. You're telling me that an 80-year-old man born in 1937 is producing more measurable vocal fry than a 25-year-old influencer?
Speaker 2:According to the objective acoustic metrics, yes, by a significant margin. And what's fascinating here is how this points us entirely away from sociology and cultural trends and points us directly toward physiological reality.
Speaker 1:Right.
Speaker 2:Why would an 80-year-old creak more than a 20-year-old? Because of vocal aging.
Speaker 1:What actually happens to our laryngeal anatomy as we get older? We know our skin loses elasticity, our bones lose density, our vision degrades. What is the biological wear and tear on a vocal cord?
Speaker 2:The clinical term is presbyphonia, which simply means the aging of the voice. As human beings age, the tissues in our larynx undergo massive physiological changes. The laryngeal cartilages, the framework of the throat, actually begin to ossify. They turn from flexible cartilage into rigid bone. Furthermore, the muscles themselves, particularly the thyroarytenoid muscle, which makes up the vocal fold, undergo sarcopenia, which is muscle atrophy.
Speaker 1:So the muscles are physically shrinking and losing their tone.
Speaker 2:Yes. And as they lose muscle mass and tone, they often develop a condition known as vocal fold bowing. They literally curve inward, creating a gap. To compensate for this gap and this loss of muscle control, older speakers often have to compress their vocal folds more tightly just to get them to vibrate at all.
Speaker 1:And let's tie this right back to our mechanical baseline from the beginning of the deep dive. The recipe for a creak is a vocal fold that is compressed tightly but lacks longitudinal tension. It is thick, loose, and struggling against air pressure. And the natural biological process of aging literally forces your vocal cords to become stiff, loose, and atrophied. Aging naturally creates the perfect anatomical environment for irregular sputtering voicing.
Speaker 2:Exactly. This natural presbyphonia, this vocal aging, creates a massive measurable increase in creaky voice in older adults. It is not a fashion trend. It is a biological reality of the human lifespan.
Speaker 1:But wait, if this is just a physiological reality, if old people have always creaked more because their throats are aging, why on earth did society suddenly decide it was a brand new trend invented by young people 10 years ago?
Speaker 2:If we connect this to the bigger picture of sociolinguistics, it gives us a brilliant insight into how linguistic myths are born. In the study of language, there's a well-established phenomenon regarding how languages evolve. When a true sound change in progress is happening, meaning when society is actually adopting a brand new way of speaking, a new vowel shift or a new slang, it is almost exclusively pioneered and led by young women.
Speaker 1:Right. Young women are historically the great innovators of language. From the valley girl like to modern internet slang, they invent it, they use it, and eventually the rest of the culture copies them.
Speaker 2:Yes. Young women are the linguistic trendsetters. So when cultural commentators in the 2010s started noticing young women using vocal fry because of the pitch contrast scenario we discussed earlier, where it stands out as highly salient, society made a massive flawed assumption.
Speaker 1:Oh, I see it.
Speaker 2:They thought, ah, young women are doing this highly noticeable thing. Young women always start linguistic trends. Therefore, vocal fry must be a brand new trend.
Speaker 1:They took a biological acoustic illusion and tried to fit it into a sociolinguistic framework that didn't apply.
Speaker 2:Exactly. But the acoustic data proves them completely wrong. Because young women are actually the least creaky demographic in the objective data, it violates the core principle of a sound change in progress. It strongly suggests that vocal fry is not a new linguistic trend or affectation at all.
Speaker 1:So what does this all mean? Let's bring this all the way home. It means vocal fry isn't a trend. It isn't a TikTok fad. It isn't some Kardashian invention. It is a biological reality of human vocal mechanics and vocal aging that has literally always been there. Since the dawn of human speech, older people and men have been creaking.
Speaker 2:Our society just happened to hyperfixate on it when young women did it, entirely because of the acoustic reality of pitch contrast and our own deeply ingrained top-down social biases. We noticed the coffee stain on the white shirt, and we freaked out, entirely ignoring the fact that everyone else in the room was saturated in coffee.
Speaker 1:I am honestly blown away by how completely backwards the cultural narrative is compared to the scientific reality. This has been an absolute mind-bender of a deep dive.
Speaker 2:It really reframes everything.
Speaker 1:To summarize the journey we've just been on, vocal fry is not a trendy new affectation made up by young women in the 2010s. It is an objective, mathematically measurable biological reality that is predominantly found in men and in older people.
Speaker 2:And we only believed otherwise because we trusted our highly subjective, easily fooled human ears over objective acoustic data. We let our social biases rewrite the physics of sound.
Speaker 1:Which brings us directly to you listening to this right now. The goal of this deep dive isn't just to give you some fun phonetic trivia. We want to give you a tool to apply this to your own life.
Speaker 2:Exactly.
Speaker 1:Because let's be real, you are going to hear vocal fry again. You will probably hear it today. The next time you are sitting in a boardroom meeting or listening to a presentation or even just talking to a friend and you find yourself internally judging someone's voice for being grating or unprofessional, we want you to take a step back.
Speaker 2:Pause for a moment and ask yourself a critical, scientifically grounded question. Am I reacting to the actual physical acoustic sound waves in the room? Or am I reacting to a social bias that my own brain is actively projecting onto this speaker?
Speaker 1:Are you judging the speaker's intelligence? Or are you just judging the biological pitch contrast? Because if there is one thing this massive 92,000-vowel study proves, it's that your ears are not a pristine microphone. They are a filter heavily colored by the society you live in.
Speaker 2:And this raises an important lingering question, one that goes far beyond just the sound of vocal fry and touches on how we navigate the world.
Speaker 1:What's that?
Speaker 2:If our ears can be so easily and thoroughly fooled by a simple biological illusion like pitch contrast, to the point where it completely dictates who we think sounds authoritative and who we think sounds annoying, what other supposedly objective judgments are we making about people's intelligence or their competence or their leadership potential that are actually just biological acoustic illusions?
Heliox is produced by Michelle Bruecker and Scott Bleackley. It features reviews of emerging research and ideas from leading thinkers, curated under their creative direction with AI assistance for voice, imagery, and composition. Systemic voices and illustrative images of people are representative tools, not depictions of specific individuals. Thanks for listening today.
Four recurring narratives underlie every episode: boundary dissolution, adaptive complexity, embodied knowledge, and quantum-like uncertainty. These aren't just philosophical musings, but frameworks for understanding our modern world. We hope you continue exploring our other episodes, responding to the content, and checking out our related articles at helioxpodcast.substack.com.