
Across Acoustics
Reproducing Soundscapes with the AudioDome
Recreating the natural hearing experience has long challenged researchers who study auditory perception. Recently, ambisonic panning has been developed as a method to accurately reproduce soundscapes. In this episode, we talk with Nima Zargarnezhad and Ingrid Johnsrude (Western University) about their research testing the accuracy of the "AudioDome," a device that uses ambisonic panning to simulate soundscapes in the lab.
Associated paper:
- Nima Zargarnezhad, Bruno Mesquita, Ewan A. Macpherson, and Ingrid Johnsrude. "Focality of sound source placement by higher (ninth) order ambisonics and perceptual effects of spectral reproduction errors." J. Acoust. Soc. Am. 157, 2802–2818 (2025). https://doi.org/10.1121/10.0036226.
Read more from The Journal of the Acoustical Society of America (JASA).
Learn more about Acoustical Society of America Publications.
Music Credit: Min 2019 by minwbu from Pixabay.
ASA Publications (00:26)
Today we'll be discussing a JASA article that was recently featured in an AIP press release, “Focality of sound source placement by higher (ninth) order ambisonics and perceptual effects of spectral reproduction errors.” With me are two of the paper's authors, Nima Zargarnezhad and Ingrid Johnsrude. Thanks for taking the time to speak with me today. How are you doing?
Nima Zargarnezhad (00:47)
Thanks for inviting us over for this interview. I'm good, thanks.
Ingrid Johnsrude (00:51)
Yeah, thank you. I’m great.
ASA Publications (00:54)
Awesome, great. So first, tell us a bit about your research backgrounds.
Nima Zargarnezhad (00:58)
Okay, I came to Ingrid's lab in 2021 as a graduate student to study the interaction between a sound's pitch and location in organizing auditory perception. In particular, I'm interested in understanding how music ensemble conductors benefit from their experience in analyzing the surrounding acoustic environment.
Ingrid Johnsrude (01:17)
Yeah, and I'm a psychologist by training. My background is in cognitive neuroscience, and I've studied hearing, particularly speech listening, for about two decades now. I'm really interested, and my lab is really interested, in studying auditory perception in natural environments where there are multiple sounds present in different locations and we're trying to listen to just one of them.
The central problem of hearing, at least it looks like a problem, is that when multiple sounds are present, all of those sounds, if they're natural sounds, will have frequencies across the spectrum. And those frequencies mix together at the two ears. And the brain has to take that complex, interacting waveform and perceptually separate it, such that you hear these different identifiable sounds in different locations.
ASA Publications (02:24)
Yeah, right.
Ingrid Johnsrude (02:26)
As an example, imagine sitting in the park, right? You're sitting there and maybe you hear a conversation off to the side. You hear a squirrel scrabbling in the leaves below you. You hear birds chirping in the trees above you. Somehow, all of those sounds are mixing at your ears and you're able to take that complex input and separate it into the bird and the squirrels and the conversation.
ASA Publications (02:54)
Yeah, it's just a very complex process within the ear and the brain too, right? Because it's just hitting the eardrum and then, I mean, we'll get into this later, but.
Ingrid Johnsrude (03:05)
Yeah, all it is is a very complex pattern of vibration at the two eardrums. And somehow we take those two systematically different patterns of vibration and reconstruct a whole auditory soundscape.
ASA Publications (03:23)
Right, right. It's so fascinating, because when you think about it, like with the eye, when you're getting the visual representation, you get different parts of the retina to recreate the image, but you don't get that with hearing, right?
Ingrid Johnsrude (03:35)
Exactly. The sounds are really mixed together. Two different sounds are going to be impinging on and activating the same parts of the basilar membrane. And that constructive and destructive interference between sounds makes the problem, at least from a constructionist sort of analytic point of view, really complicated. And yet the perception of different objects is automatic. It's really interesting and seems kind of paradoxical.
ASA Publications (04:12)
Yeah, yeah. Well, so, and the question of this article had to do with reproducing these really complex soundscapes. So what is soundscape reproduction and how is it used?
Nima Zargarnezhad (04:24)
Well, as Ingrid mentioned, we are interested in the natural hearing experience, and part of that experience, as we have been discussing, is perceiving sounds from a variety of locations. Soundscape reproduction refers to the techniques that simulate this perceptual experience of immersive sound, for lots of different purposes. You have probably experienced spatial audio over a pair of headphones while playing games, or in home theater systems when watching a movie. These are some examples of soundscape reproduction for entertainment. There are other reasons one might recreate soundscapes, such as research or even therapeutic purposes. But you have probably also noticed that this spatialized audio is not as precise or as focal as it should be for truly immersive sound, and for purposes such as research it's very important to have that precise, focal sound source reproduction ability. And that's what this paper is about.
ASA Publications (05:25)
Okay, yeah, that makes sense. Like it's one thing if you're like watching a movie and maybe you don't hear the bird in the movie at exactly the right spot, but for research you want it to be very precise. Okay, yeah, that makes sense. So your work involves something called the “AudioDome.” That sounds fun, what is it?
Nima Zargarnezhad (05:43)
The AudioDome is the device that we use to simulate these soundscapes in the lab. As Bruno, one of the co-authors of this paper, always says, you can imagine the AudioDome is like virtual reality glasses, but for your ears instead of your eyes. I just want to clarify this: the AudioDome is just a sound presentation device. It's an array of 91 loudspeakers arranged in a geodesic sphere, plus four subwoofers on the ground. And we have a few different algorithms that we can use to reproduce soundscapes; we implemented them in the paper, and we can talk about them later on. But the idea is to be able to present sounds to an individual seated at the center of this huge dome.
We sometimes also integrate it with other devices, like VR glasses, to present multimedia stimuli as well. The device was built by Sonible, an Austrian company, and it was installed at the Western Interdisciplinary Research Building in 2019.
ASA Publications (06:44)
Awesome, that sounds so cool. So, to create the perception of being in space, you're obviously going to have to take into consideration how people localize sound, which we were kind of just talking about a little bit with Ingrid. Can you explain what aspects of human hearing are important to understand in soundscape reproduction?
Nima Zargarnezhad (06:59)
Absolutely. Let's start with human sound localization. Imagine the bird-in-the-park scenario that Ingrid described, and the bird is chirping from the top of a tree on the right side. The sound spreads across the space and hits our two ears, and the sounds received at the two ear canals are slightly different. First of all, the bird is closer to the right ear, so the sound arrives at the right ear earlier than at the left ear. Second of all, the head is masking the sound for the left ear, so the sound received in the left ear is a little quieter. These slight differences are used to localize the horizontal position of sound sources. The cues, which we also talk about in the paper, are known as interaural time differences, or ITDs, and interaural loudness differences, or ILDs. For lower frequencies, we rely on ITDs; for high frequencies, ILDs dominate localization. But there is also the vertical position of sounds, which we mostly perceive through the spectral filtering done by the shape of our ears. If you look at the outer ear, or pinna, you see there are lots of different peaks and spirals. They filter the sound based on its vertical position, and as we grow up we learn where a sound is coming from by associating these filterings with sound positions. So the most important aspect of human localization that should be considered for sound reproduction is individual differences. All these cues that we learn as we grow up are highly individualized, because we have different ear shapes, different head sizes, and different upper body shapes, and these all affect the signals that we get from a sound: the filtering, the delays, the loudness differences. For some reproduction techniques you need to take all these measurements and differences into consideration, but for some you can just skip them.
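To make the time-difference cue concrete, here is a minimal illustrative sketch (not from the paper) of the classic Woodworth rigid-sphere approximation for the ITD; the head radius and speed of sound are assumed, typical values.

    # Illustrative only: Woodworth's rigid-sphere approximation of the interaural
    # time difference (ITD) for a distant source; head radius and sound speed are
    # assumed typical values, not measurements from the paper.
    import numpy as np

    SPEED_OF_SOUND = 343.0   # m/s in air, approximate
    HEAD_RADIUS = 0.0875     # m, a commonly assumed average head radius

    def woodworth_itd(azimuth_deg, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
        """Approximate ITD in seconds for a far-field source at azimuths up to 90 degrees."""
        theta = np.radians(abs(azimuth_deg))
        return (a / c) * (theta + np.sin(theta))  # extra path length around the head

    for az in (0, 10, 45, 90):
        print(f"azimuth {az:2d} deg -> ITD ~ {woodworth_itd(az) * 1e6:5.0f} microseconds")

The ITD grows from zero straight ahead to roughly 650 microseconds at the side, which is the kind of small timing difference the brain exploits at low frequencies.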
ASA Publications (09:08)
Ok, okay, got it. That's so cool. I don't think I ever realized that about the pinnae. So if somebody were to have something happen to their outer ear, would that affect their sound localization then?
Nima Zargarnezhad (09:18)
That's an interesting question, yes. There are studies that actually mask parts of the pinnae, and they show that participants learn to hear with the new ear.
ASA Publications (09:28)
Cool, okay, that's very interesting. Okay, so to get back to your research. So what are some current soundscape reproduction techniques and what are their benefits and limitations?
Nima Zargarnezhad (09:39)
Yeah, that's an interesting question. Let's start with the headphone experience. Stereo headphone reproduction tries to deliver all the sounds you're perceiving exactly as they should be received in your ear canals. To do that, you manipulate the audio from a sound source such that the differences I mentioned before are adjusted to create the perception of that sound source at that particular location. However, there are two limitations with stereo headphones. First of all, you have to have all the parameters and measurements of somebody's head shape and ear shape. And second of all, if you have your headphones on and then you start rotating your head, the whole soundscape rotates with you, and that's not natural. Sometimes it's actually blocking some signals you use for auditory attention. So these are not ideal for soundscape reproduction.
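As an aside for readers, the headphone approach described here is typically implemented by convolving a mono source with a pair of head-related impulse responses (HRIRs) measured for the desired direction; the sketch below is a generic illustration under that assumption, not the lab's actual rendering code.

    # Generic binaural-rendering sketch: spatialize a mono signal by convolving it
    # with the left/right head-related impulse responses (HRIRs) for one direction.
    # The HRIR arrays are assumed to come from a measured or generic set.
    import numpy as np
    from scipy.signal import fftconvolve

    def render_binaural(mono, hrir_left, hrir_right):
        """Return a (2, n) array: left and right headphone channels."""
        left = fftconvolve(mono, hrir_left)
        right = fftconvolve(mono, hrir_right)
        return np.stack([left, right])

This is also where the two limitations come from: the HRIRs have to match the listener's own ears to be accurate, and they are fixed for one head orientation unless head tracking updates them.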
The second family of soundscape reproduction techniques are those known as free-field techniques, which use loudspeakers. In the simplest scenario, you place a loudspeaker at the location where you want to present the sound source and then just run a playback. And that's it. There are no assumptions about anything; it's perfect reproduction. But there are two limitations here again. First of all, you have to have the loudspeakers placed at the sound source locations before you simulate a soundscape, and if the soundscapes you want to reproduce differ in their sound source locations, then you have to change your loudspeaker locations, which is a lot of work. The second limitation is that you cannot recreate a sound source at any location other than the locations of those loudspeakers, which are limited. So the spatial resolution of your reproduction is limited.
ASA Publications (11:44)
Okay.
Nima Zargarnezhad (11:45)
So to get over the limitation of single-channel presentation, there is another technology called vector-based amplitude panning, or VBAP for short. VBAP does not assume anything about the listener, again, because it's a free-field soundscape reproduction method. The idea is this: imagine you want to recreate a sound source in between two loudspeakers that you already have, which is something you could not achieve with single-channel presentation. VBAP just assumes a vector from the head of the listener to the sound source location, and it decomposes that vector into the directions of those loudspeakers, again relative to the listener's head. Then it distributes the energy across those directions such that they sum up to the original vector of your sound source. So this technique is actually quite nice, but there are also some limitations. First of all, in the VBAP algorithm, if the sound source coincides with a loudspeaker location, then you're not using the surrounding loudspeakers, which is fine; but the problem is that if you have another sound source that is farther away from any loudspeaker, then you're recruiting more loudspeakers, so those sound sources are kind of blurry, and you cannot assign a particular location to the sound sources that don't coincide with a loudspeaker. The second problem is that the blurriness depends on the loudspeaker array configuration: the more loudspeakers you have, the less blurriness you get. And these are the problems that ambisonic panning, which is the last method I'm going to talk about, tries to overcome.
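For readers who want to see the idea in code, here is a minimal two-dimensional VBAP sketch in the spirit of Pulkki's formulation; it is an illustration, not the AudioDome's implementation.

    # Minimal 2-D (horizontal-plane) VBAP sketch: the desired source direction is
    # written as a gain-weighted sum of the two nearest loudspeaker directions,
    # and the gains are then normalized for constant power.
    import numpy as np

    def unit(az_deg):
        a = np.radians(az_deg)
        return np.array([np.cos(a), np.sin(a)])

    def vbap_pair_gains(source_az_deg, spk1_az_deg, spk2_az_deg):
        """Gains for two loudspeakers whose weighted sum points at the source."""
        L = np.column_stack([unit(spk1_az_deg), unit(spk2_az_deg)])  # speaker directions
        g = np.linalg.solve(L, unit(source_az_deg))                  # solve L @ g = p
        return g / np.linalg.norm(g)                                 # constant-power normalization

    print(vbap_pair_gains(15, 0, 30))  # halfway between the speakers: ~[0.71, 0.71]
    print(vbap_pair_gains(0, 0, 30))   # on a speaker: ~[1.0, 0.0], only one speaker active

The second print line shows exactly the asymmetry described above: a source that lands on a loudspeaker uses one speaker, while a source between speakers is spread across two, so focality varies with the array geometry.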
ASA Publications (13:47)
Okay.
Nima Zargarnezhad (13:48)
So far I've been talking about focal sound source reproduction, because that was the initial motivation for reproduction technologies. In the early days, the goal was to present the sound of a specific source from a desired location. However, as these technologies advanced, we learned that this is not the only way to think about sound sources. We also felt the need to reproduce immersive sound fields, such as a rainy forest, in which there isn't a focal rain sound source; the rain is coming from everywhere above your head. This is when the attitude shifted from focal sound source reproduction to reproduction of the surrounding sound field. And this is the attitude used in ambisonic reproduction technologies.
So in ambisonics we try to simulate the entire sound field. You can imagine there are multiple sound sources around us, and they add up around your head and then go into the ears. That's what ambisonics tries to do: it breaks down the entire sound field into something called spherical harmonics, which are kind of the basis functions of the algorithm, and then reproduces the sound field such that it generates the same immersive feeling at the center of the dome.
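To make the spherical-harmonic idea concrete, here is a simplified first-order ambisonic encoder (ambiX channel ordering, SN3D normalization); the ninth-order system in the paper follows the same principle but with (9+1)² = 100 harmonic channels and a decoder matched to the 91-loudspeaker array.

    # Simplified first-order ambisonic encoding of a mono source at one direction.
    # Each channel is the source signal weighted by a spherical-harmonic value;
    # higher orders just add more such channels (a ninth-order system has 100).
    import numpy as np

    def encode_first_order(mono, az_deg, el_deg):
        """Return the four B-format channels (ACN order W, Y, Z, X; SN3D gains)."""
        az, el = np.radians(az_deg), np.radians(el_deg)
        gains = np.array([
            1.0,                       # W: omnidirectional component
            np.sin(az) * np.cos(el),   # Y: left-right
            np.sin(el),                # Z: up-down
            np.cos(az) * np.cos(el),   # X: front-back
        ])
        return gains[:, None] * np.asarray(mono)   # shape (4, n_samples)

A decoder then mixes every one of these channels to every loudspeaker at once, which is why, as comes up in the next exchange, all the loudspeakers are active even for a single focal source.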
ASA Publications (14:47)
Okay, okay, so it's a bunch of loudspeakers at the same time, essentially, to create the sound field rather than just a few at a time.
Nima Zargarnezhad (14:55)
Yes, it essentially uses all of them. Even if you're presenting a sound source from exactly a loudspeaker's location, you're still using all the other loudspeakers. It's kind of counterintuitive, but it actually has some benefits.
Ingrid Johnsrude (15:11)
As a mathematically somewhat illiterate psychologist, I understand ambisonics to be an approximation, and the detail depends on the order of the ambisonics, and we'll get to that in a moment. But what you're doing is you're presenting the sounds from all the loudspeakers in the array such that the mixture that's hitting your ears is exactly the mixture that you would get if there were real sounds in the real world in those different positions.
ASA Publications (15:45)
Okay, so let's talk about order of ambisonics. You talked about using ninth order ambisonics. What does that mean?
Nima Zargarnezhad (15:53)
Well, as I mentioned, ambisonics uses some basis functions, and this is something theoretical. The higher the order, the more basis functions you get, and the more detailed the decomposition and simulation of your soundscape can be. Essentially, with ninth-order or higher-order ambisonics, you can recreate finer details and higher frequencies. And that's something we knew: with ninth order, there is a limit to what you can do, and that limit was the thing we wanted to determine in this paper. Something very important about ambisonics is that the simulation is valid only at the center of your array, in a spherical arrangement. And there are theoretical predictions that say, hey, you cannot go over this particular frequency if you want to use this order of ambisonics to simulate a soundscape accurately within a sphere of a specific radius at the center. And that is what we studied in this paper.
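The theoretical prediction referred to here is often stated as the rule of thumb that an Nth-order system is accurate roughly while kr ≤ N, i.e. up to about f = Nc/(2πr) for a listening region of radius r. The sketch below just evaluates that textbook bound with assumed radii; it is not the paper's exact derivation.

    # Rule-of-thumb frequency limit for Nth-order ambisonics: accurate roughly
    # while k*r <= N, i.e. f_max ~ N*c / (2*pi*r) for a listening region of radius r.
    # The radii below are assumed, head-sized values for illustration.
    import math

    def ambisonic_freq_limit(order, radius_m, c=343.0):
        return order * c / (2 * math.pi * radius_m)

    for r in (0.0875, 0.12):   # average head radius vs. a slightly larger listening region
        print(f"r = {r*100:4.1f} cm -> f_max ~ {ambisonic_freq_limit(9, r):.0f} Hz")

The exact cutoff depends on the accuracy criterion and on how large a region around the head must be reproduced correctly, which is why figures in the low kilohertz range, like the roughly 4 kHz limit discussed later in the episode, come out of this kind of calculation.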
ASA Publications (16:54)
So what was the goal of your study?
Nima Zargarnezhad (16:57)
Well, we are a cognitive neuroscience lab, and we are interested in hearing research, right? So we wanted to know what the actual limitations of using this technology are. As I mentioned before, there are these theoretical limitations, but we wanted to know what would happen if you just have your listeners or experiment participants listen to sound sources. What would they perceive, and how would that affect our future experiments? So we ran this entire series of experiments to quantify that.
ASA Publications (17:29)
Okay, got it. So this is a big question because you actually covered three experiments in your research. Tell us about those experiments and their results.
Nima Zargarnezhad (17:39)
Yeah, that's a huge question. But let's start with the first experiment. There's some history behind all these experiments, so let's talk about all of them.
ASA Publications (17:50)
Yeah, totally. We love a story.
Nima Zargarnezhad (17:55)
So I was planning to implement my first experiment with the AudioDome, which was simulating a music ensemble. And as I was programming, I realized there was no programming limitation on the locations of the sound sources I could put there, and I was just curious: what are the smallest increments at which we can place sound sources? Could I place two sound sources as close together as 0.5 degrees and have them perceived separately? Does the system implement that properly? What do humans perceive in that scenario? That was something the AudioDome let me explore.
So in the first experiment we just wanted to see what that limit is, and we decided to quantify the spatial acuity of humans on the horizontal plane to get a sense of it. We brought the participants into the dome and tested seventeen different locations, from exactly to the left, through the front, to exactly to the right. It was a five-hour experiment, very long. We had the participants do a focal sound discrimination task, and we estimated minimum audible angles, which are the smallest angular separations that hearing can discriminate, in other words the spatial acuity of humans.
And we already knew that it was going to be variable: we are very good at discriminating frontal sounds, but not as good on the sides. That's what we actually saw in the experiment; there's a table in the paper that summarizes all these values. The first thing we observed was, well, good news: we got minimum audible angles similar to what people had previously obtained with single-channel presentation, which in all the experiments we took as the ground truth, the most naturalistic form of soundscape reproduction. So that was the best news. But on the other hand, there was this weird pattern on the sides. Participants were reporting odd things, including that they perceived sounds from above or below the horizontal plane, and some of them had developed strategies to respond to the task based on that. We were curious about it, and that led to the second experiment. But before I go to the second experiment, I wanted to mention another part of the first experiment: we were curious about the claim ambisonics makes that the soundscape reproduction does not depend on the loudspeaker configuration. So we had our participants do the spatial discrimination task once at the front, where we knew they would be best and where there was a concentration of loudspeakers (there were about five loudspeakers exactly at the front). But there was a location on the side that was very far away from any loudspeaker, so we had them rotate towards that location and do the frontal spatial discrimination again. And we saw that there was actually no difference between these two conditions, which kind of proves the advantage of ambisonics over VBAP, at least in this respect.
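For readers curious what estimating a minimum audible angle can look like in practice, here is a generic sketch that fits a logistic psychometric function to discrimination accuracy and reads off a threshold; the numbers are hypothetical and this is not necessarily the procedure used in the paper.

    # Generic MAA-style threshold estimation (illustrative; hypothetical data, and
    # not necessarily the fitting procedure used in the paper).
    import numpy as np
    from scipy.optimize import curve_fit

    def psychometric(angle, threshold, slope):
        """Proportion correct rising from 0.5 (chance) toward 1.0 with separation angle."""
        return 0.5 + 0.5 / (1.0 + np.exp(-(angle - threshold) / slope))

    separations = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])    # degrees (hypothetical)
    p_correct = np.array([0.52, 0.60, 0.74, 0.88, 0.97, 1.0])  # hypothetical accuracies

    (threshold, slope), _ = curve_fit(psychometric, separations, p_correct, p0=[2.0, 1.0])
    print(f"Estimated MAA (75%-correct point): {threshold:.1f} degrees")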
ASA Publications (21:07)
Right. Okay, that's very cool. So basically the idea you're looking at in this one, just to make sure I'm understanding, is, if we're talking about our imaginary little birds: if we're hearing sound from the front, say two birds really close together, we'd still be able to differentiate which bird is saying what. But when the sound is out to the side and we had two birds, we might have a harder time differentiating between those two birds. And so you're trying to see whether the AudioDome could represent those two separate birds off to the side as precisely as humans can actually perceive them.
Nima Zargarnezhad (21:44)
Exactly.
ASA Publications (21:45)
Got it. Okay. Cool.
Nima Zargarnezhad (21:46)
And what we observed was that if the two birds were as close as one degree at the front, participants could still tell the difference.
ASA Publications (21:53)
Okay. Okay, got it.
Ingrid Johnsrude (21:54)
Yep, which is the previously established limit of human spatial acuity. So what Nima's experiment tells us is that the AudioDome is capable of placing focal sound sources as close together as, or closer together than, the human brain can perceive.
ASA Publications (22:16)
That's so cool. Okay, so into your second experiment. Sorry.
Nima Zargarnezhad (22:21)
Well, that's all good. That was the goal of the first experiment: by that logic, we validated the use of this equipment for future experiments, right? We have adequate resolution. Okay, so for the second experiment: yes, we had these weird reports from participants that the sound was coming from above or below, and they were confused. We actually went in there and listened to the sounds we were presenting on the side, and we perceived them from a higher vertical position. But when we rotated to face them, that perception was gone, and the sound kind of went back to the horizontal plane where it should have been perceived. So the second experiment tried to quantify all the localization cues that were available for this task.
So we took this manikin that we call the Head-And-Torso Simulator. It has the upper body of an average adult, and it has nice ears that are actually the average shape of adult ears, with two microphones inside them. What we essentially did was present sound sources from all the locations we had presented to the participants in the previous experiment, with single-channel presentation, VBAP, and ambisonics, and we recorded all the sounds. The audio that we presented was a full spectral sweep, from zero frequency to 22 kilohertz, and then we estimated the localization cues based on these recordings from inside the Head-And-Torso Simulator's ears. The purpose of the Head-And-Torso Simulator was to replicate all the factors that affect human localization, right?
So we quantified all those measures, and there were a few observations. First of all, the ITD cues were fine, but the ILDs, the loudness differences, were not as good as they should have been at higher frequencies. What we observed aligned with the theoretical limitation: from theory, we knew that there is a limit of about 4 kilohertz for reproducing sounds with this device at the center, for an average head size. So we split the spectrum into low- and high-frequency portions, below and above 4 kHz, and compared the cues in those spectral regions. We also observed some irregular spectral distortions in ambisonics compared to single-channel presentation on the sides, which explained the elevation percepts reported by the participants. And that's an important limitation to consider when designing future experiments.
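As a rough illustration of how such cues can be pulled out of left/right ear recordings, here is a generic sketch using cross-correlation for the ITD and an RMS level ratio for the ILD, with a 4 kHz band split; it is an assumption-laden illustration, not the paper's analysis code.

    # Generic binaural-cue estimation from left/right ear recordings (illustrative;
    # not the analysis code used in the paper).
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, correlate, correlation_lags

    def estimate_itd(left, right, fs):
        """ITD in seconds from the lag of the cross-correlation peak (sign shows which ear leads)."""
        lags = correlation_lags(len(left), len(right), mode="full")
        return lags[np.argmax(correlate(left, right, mode="full"))] / fs

    def estimate_ild(left, right):
        """ILD in dB as the left/right RMS level ratio."""
        rms = lambda x: np.sqrt(np.mean(np.square(x)))
        return 20 * np.log10(rms(left) / rms(right))

    def split_at_4khz(x, fs):
        """Split a signal into the bands below and above the ~4 kHz reproduction limit."""
        low = sosfiltfilt(butter(4, 4000, "lowpass", fs=fs, output="sos"), x)
        high = sosfiltfilt(butter(4, 4000, "highpass", fs=fs, output="sos"), x)
        return low, high

In a real analysis these cues would be computed band by band and compared across reproduction methods against the single-channel ground truth, which is essentially the comparison described above.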
Finally, the third experiment aimed to show that those notches and spectral distortions we observed in ambisonics are actually perceived as elevation cues. So in the third experiment we had our participants do the location discrimination task again, but this time, instead of asking them for horizontal discrimination, we asked for vertical discrimination, on the sides and at the front. And we showed that, yes, these ambisonic reproduction errors, the distortions at high frequencies, manifest as inaccurate vertical position perception.
ASA Publications (25:50)
Do you have any idea on how to fix that, or is that like a future research question?
Nima Zargarnezhad (25:55)
Well, I am not an audio engineer, so I don't design the algorithm or the device; this work just quantifies the limitations. And as I mentioned, it's known that there are limitations, and we just need to be aware of them for future experiments, right?
ASA Publications (25:58)
Yeah, right, totally. Right, that makes sense, that totally makes sense, yeah. Okay, so what is the significance of these findings for soundscape reproduction with ambisonics?
Nima Zargarnezhad (26:21)
Well, I guess it's mostly that we should be very careful. First of all, in the first experiment we showed that ambisonics has enough resolution for reproducing soundscapes for humans. We also confirmed the claim that, unlike VBAP, the reproduction does not depend on the loudspeaker configuration. However, there is this high-frequency limitation that people need to consider when they want to create soundscapes with ambisonics for any purpose.
The other thing is that if you want a larger region in which your soundscapes are reproduced accurately, or if you want higher frequencies, then you need to think about higher orders, which essentially require you to recruit more loudspeakers, and the encoding and decoding have to change. So with this research we just wanted to quantify all the limitations one would face when reproducing sounds with ambisonics.
ASA Publications (27:18)
That's really cool though. So what are the next steps in this research?
Ingrid Johnsrude (27:22)
Well, now that we have shown that we can use the AudioDome to play focal sounds wherever we want, at a resolution that meets or exceeds human spatial acuity, we know that as long as we keep our sounds below four kilohertz, 4,000 hertz, we'll get good reproduction through the ambisonic system.
Now we can study human spatial localization with very high fidelity in three dimensions. Typically, people study spatial localization using a ring of speakers on the horizontal plane, but now we can look more systematically at elevation and ask questions like: how does elevation affect or contribute to hearing sounds in different locations such that you can attend to one or the other? How does it contribute to sound segregation and streaming?
So one thing we can do is study the perception of moving sounds. The AudioDome is capable of playing as many sounds as you can imagine; I think there is a limit, but it's a ridiculously high limit on the number of sounds you can place at the same time. So we can have a background of sound, and we can study perception of a target sound that's moving against this background. And what's most exciting for our lab is the ability to study how people segregate a target speech sound in the presence of naturalistic competing sources that are in different locations, that are coming on and off, some of which may be moving. For a long time we have studied speech perception over headphones, and it's time to take it out into the real world, or at least a controlled simulation thereof, so that we can understand individual differences in perception, what happens as people get older, and what happens with hearing loss.
ASA Publications (29:25)
Yeah, those all sound like very exciting ways to use this research. It's interesting because I always think of like, soundscape reproduction, and I think of the entertainment purposes like video games or movies or whatever. And it's so interesting how you could use this in research. Do you have any closing thoughts?
Nima Zargarnezhad (29:43)
For me, mostly, I'm happy that we could show something behavioral about the theory of ambisonics, and I'm really happy that we've started doing research with this device. In the future, I think we need to consider how to improve soundscape reproduction with ambisonics, as you asked.
And I would be happy to hear what people think about different strategies or different ways of looking into soundscape reproduction quality with all these techniques. I've already made the data we collected available online, so if anyone has ideas about reanalyzing it, doing it again better, or adjusting the approach, I would appreciate it.
ASA Publications (30:27)
Awesome. Is that included in your article where it's published, in the data availability statement?
Nima Zargarnezhad (30:32)
Yes.
ASA Publications (30:33)
Okay, great. I was gonna say otherwise we can include it in our show notes. But folks, go to the article, you'll find the data.
Well, the whole idea of virtual acoustics and soundscape reproduction is so interesting, and it's so fun to learn about ambisonics. And it's fascinating that you can simulate the spatial resolution of a soundscape so that it sounds real to a human listener. That's so neat. So thank you again for taking the time to speak with me today, and I hope you have a great day.
Ingrid Johnsrude (31:02)
Thank you. Bye bye.
Nima Zargarnezhad (31:03)
Thanks for having us.