Sensible Stereo Recording
-- Auditory Scene creation -- Recording
what we hear -- Experimental results
-- Theory -- SRA
--
Under construction:
3/10/12 - On hold while I am
working on a paper for the San Francisco AES Convention with the title:
"Spatial rendering of the reverberated stereo signal in the ITD range of
hearing." or
"Recordings, Rooms and Rendering" for short.
Listening to WATSON-SEL gave me new insights.
Creating the Gestalt of the Auditory Scene
1.3.1 The sound of a single loudspeaker in a room
Let's look at the simple case of a single loudspeaker in a reverberant room.
Wall reflections, FIG. 16, can be modeled by image sources, which replace the
walls. FIG. 17 shows the intersection of three orthogonal mirror surfaces as in a
room corner. It can be seen that there are seven images of the single loudspeaker, all
radiating sound but to varying degrees. If the ceiling were added to the three
surfaces, then there would be images added in the ceiling corner, and their images
in the floor corner, and so on. If a second side wall and a front wall are added
so that the room is closed and has eight corners, then the total number of image
sources becomes very large. But as the image distances from the corner increase from
1st order, to 2nd order, to 3rd order reflections and so on, their strength
decreases due to absorption and diffusion by the wall surfaces. Still, as
long as there are reflections there is never just a single loudspeaker in the
room, but many copies of it, seen in different orientations. The situation is not
unlike having another person in the room who is talking to you. She too has many
acoustic images and if the room is very reverberant she may be hard to understand.
In an anechoic room there is no issue with intelligibility, but we feel uneasy.
Between those two extremes of reverberation lies a comfort zone for normal
living spaces. Cathedrals, concert halls, auditoria or office spaces have their
unique and different requirements for sound reverberation. What applies to
acoustically large spaces does not necessarily translate to living rooms, which
are considered as acoustically small below their Schroeder frequency, i.e. typically below 200 Hz. What
constitutes an acoustically comfortable living room? To me it is a room with
an RT60 around 500 ms, which is much more live than a German recording studio built
to the RT60 standard of 250 ms. Even Tonmeisters agree that such a studio is not an
environment in which to listen for pleasure, but a working space in which to analyze the
recording. They speculated that the PLUTO loudspeaker would not be suitable in
the studio, but that they would like it in a living room for enjoyment and for checking
how their recording translates.
FIG. 16 Direct sound and first
reflections from a loudspeaker near a room corner. No floor or ceiling
reflections are shown.
FIG. 17 Image sources corresponding
to floor, rear and side wall reflections for a dipole loudspeaker near a
room corner. Actual surfaces scatter, diffuse and absorb sound to
varying degrees. Thus the mirror images should look foggy, fainter and
differently colored in comparison to the direct source. Initial coloring
of the images is caused by changes in the polar frequency response of
the loudspeaker. Reflections are also weaker by the factor (Ddirect / Dreflected)
due to the greater distance and are delayed by the travel time over the extra
path (Dreflected - Ddirect) for an observer at Ddirect from the loudspeaker. [11]
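The image-source picture of FIG. 17 can be sketched numerically. The following is only a minimal illustration under stated assumptions: the three reflecting surfaces are ideal mirrors in the planes x=0, y=0 and z=0, the source is a point source, and wall absorption, diffusion and the loudspeaker's polar response are ignored. The speed of sound of 343 m/s, the example positions and the function names `corner_images` and `reflection_stats` are assumptions made for the example, not values taken from the measurements.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def corner_images(source):
    """Mirror a source in three orthogonal planes (x=0, y=0, z=0).

    Returns a list of (order, position) tuples for the seven images,
    where order is the number of reflections that produce the image.
    """
    x, y, z = source
    images = []
    for sx, sy, sz in itertools.product((1, -1), repeat=3):
        if (sx, sy, sz) == (1, 1, 1):
            continue  # unmirrored position is the real source, not an image
        order = [sx, sy, sz].count(-1)
        images.append((order, (sx * x, sy * y, sz * z)))
    return images


def reflection_stats(source, listener):
    """Level (dB re direct sound) and delay (ms) of each image.

    Level uses only the distance ratio Ddirect / Dreflected; delay is the
    extra path length divided by the speed of sound.
    """
    d_direct = math.dist(source, listener)
    rows = []
    for order, image in corner_images(source):
        d_reflected = math.dist(image, listener)
        level_db = 20 * math.log10(d_direct / d_reflected)
        delay_ms = (d_reflected - d_direct) / SPEED_OF_SOUND * 1000
        rows.append((order, level_db, delay_ms))
    return sorted(rows, key=lambda row: row[2])  # earliest arrival first


# Hypothetical positions in metres, measured from the room corner.
for order, level_db, delay_ms in reflection_stats((1.0, 1.2, 0.9), (2.0, 3.4, 1.0)):
    print(f"order {order}: {level_db:6.1f} dB, {delay_ms:5.1f} ms after direct sound")
```

In a real room each image would be further weakened and colored by the surface properties and by the off-axis response of the loudspeaker, so these numbers are an upper bound on the reflection levels.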
As an experiment to show the interaction of a single
loudspeaker with a room I will use a 1 kHz toneburst, FIG. 18, applied to the
left ORION, FIG. 19, and then applied to the left PLUTO after it has been moved
into the previous location of the ORION. The loudspeaker outputs are recorded
with a stereo microphone from the listening position A in FIG. 4.
FIG. 18 Narrowband test signal to show 1 kHz room reflections with approximately 10 ms time resolution
FIG. 19 Recording of the sound output from the left ORION, the left PLUTO or my voice with a SONY PCM-M10 at the listening position

FIG. 20 Direct and reflected signal response
during the first 200ms as recorded from listening position A for ORION and
PLUTO.
Though ORION and PLUTO reproduce the same 1 kHz burst, the
response of the room is stronger for PLUTO than for ORION and different in character.
PLUTO radiates omni-directionally while ORION radiates bi-directionally in dipole
fashion. Reflected signals that arrive within 10 ms of the direct sound from
nearby surfaces or objects add to the burst received by the microphone and
cannot be resolved visually. Resolving them would require increasing the burst
frequency, e.g. to 3 kHz for a 3.3 ms duration test signal. The room response at
the higher burst frequency would be indicative of the radiation pattern of the
loudspeakers at the higher frequencies and the reflectivity of the room around 3
kHz. In general it should be noted that the reflection pattern and the
reverberant field at the listening position are a direct function of the 3-D
polar response of the loudspeaker and its variation with frequency. A
corresponding situation occurs in the concert hall, where any musical instrument
that is being played determines the gestalt of its direct and reverberated sound
at the recording microphone location due to its directivity. Thus, when a
musician is recorded in an isolation booth and reverberation added later, it is
highly unlikely that the reverberation will have the natural characteristics of
the same musical instrument in a particular venue.
The reflections in FIG. 20 are above the threshold of
detection in FIG. 9. They are below the drift threshold in FIG. 10 for the first
50 ms or so but not for greater delays. Since we are dealing with a large number
of reflections and a situation that is quite different from the cases that were
investigated in those two figures above, one must be careful when drawing
conclusions about how we are likely to perceive the acoustic event. You can
apply the 1 kHz burst test signal to your own loudspeaker and hear for yourself.
I generated the burst using the f(x) Expression Evaluator in Goldwave under Tools. The
expression is: (0.5-0.5*cos(2*pi*t*f/x))*sin(2*pi*t*f) with x=10 and f=1000. It
creates a 1000 Hz sinewave, which is 100% amplitude modulated by a 100 Hz
raised-cosine wave. I copied one modulation cycle and pasted it three times at
500 ms intervals into a New wave of 2 s duration to generate Track 05-LL-1kHz_bursts_500ms.wav. You
can use this technique to generate tonebursts of different frequency, number of
cycles or repetition rate for testing purposes, particularly for high level
stress testing of devices that might otherwise be damaged by overheating from
continuous signals.
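For readers without Goldwave, the same burst can be approximated in a few lines of Python. This is a sketch rather than the procedure used for Track 05: the 44.1 kHz sample rate, the mono 16-bit format, the placement of the first burst at t = 0 and the function names are all assumptions. Only the expression itself, x=10, f=1000, the three repeats, the 500 ms spacing and the 2 s length come from the description above.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # Hz, assumed; the rate of the original track is not stated


def toneburst(f=1000.0, x=10, rate=SAMPLE_RATE):
    """One modulation cycle of (0.5-0.5*cos(2*pi*t*f/x))*sin(2*pi*t*f).

    The raised-cosine envelope has frequency f/x, so the burst contains
    x cycles of the sine wave and lasts x/f seconds.
    """
    n = round(rate * x / f)
    return [
        (0.5 - 0.5 * math.cos(2 * math.pi * (i / rate) * f / x))
        * math.sin(2 * math.pi * (i / rate) * f)
        for i in range(n)
    ]


def burst_track(f=1000.0, x=10, repeats=3, interval_s=0.5, total_s=2.0,
                rate=SAMPLE_RATE):
    """Silence of total_s seconds with the burst pasted in every interval_s."""
    track = [0.0] * round(total_s * rate)
    burst = toneburst(f, x, rate)
    for k in range(repeats):
        start = round(k * interval_s * rate)  # first burst at t = 0 (assumed)
        track[start:start + len(burst)] = burst
    return track


def write_wav(path, samples, rate=SAMPLE_RATE, peak=0.5):
    """Write samples in the range -1..1 as a mono 16-bit WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"".join(
            struct.pack("<h", round(s * peak * 32767)) for s in samples))
```

Calling `write_wav("bursts.wav", burst_track())` (the file name is arbitrary) would produce a 2 s file with three 10 ms bursts. Passing `f=3000.0` gives the shorter 3.3 ms burst mentioned earlier, since the duration is x/f.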
a - Track 05-LL-1kHz_bursts_500ms.wav
Here are the three tonebursts that are fed to the left loudspeaker. The
right preamp output is disconnected. The loudspeaker location and distance are
recognized. This is a real sound source, which creates an Auditory Scene
when I close my eyes, of a pinging sound coming from a specific place in my
room. The sound does not characterize the real nature of the source, what
generated the sound, only its location. The pinging tends to sound distorted at
high levels even though the drivers should have little problem around 1 kHz. I
assume this is due to reflections of the burst.
b - Track 06-LL-1kburst-orion-pluto.mp3
Here we have the three bursts of Track 05 followed by a recording from
listening position A of the left ORION and PLUTO outputs. All three sets of
bursts sound different. More room reflections are
added to PLUTO's bursts than to ORION's. But neither of the recorded bursts sounds
the way I hear it from location A when Track 05 is reproduced by ORION or
PLUTO.
c - Track 07-LL-anechoic-female-male-music.wav
Short clips of reflection-free, anechoic recordings of female and male speech
and music. None of the clips creates the perception
of a real person or a music band being in my room, though I can readily locate
them at the left loudspeaker. What is wrong? I suspect the acoustic size
and thus the 3-D radiation pattern of the loudspeaker is too different from that
of a real person. Therefore the reflections in the room and the reverberation
have a different character than those of a real person in the loudspeaker
location. The test should give more realistic results in an anechoic chamber,
but how often do we converse with someone in an anechoic chamber? The
music section is even less convincing because it should extend over a much wider
spatial arc to be believable.
d - Track 08-LL-left_orion.mp3
Here Track 07 was played through the left loudspeaker and recorded. Thus the
response of my room became part of the recording and so it is no longer
anechoic. It is always surprising to me how strongly I can hear the room in the
recording, when I did not notice it as much on playback of the anechoic track.
It seems that the brain can tune out the room excitation, but not if it is
already embedded in the recording.
e - Track 09-LL-left_pluto.mp3
Similar to Track 08, except that PLUTO was recorded. The room response is
stronger than for ORION.
f - Track 10-LL-left_SL.mp3
Instead of recording a loudspeaker reproduction I recorded my voice from the
location of the left loudspeaker. The recording
contains, of course, the response of the room as seen from the microphone
location A. Note how much more realistic my voice sounds than the previous
anechoic voice over loudspeaker.
g - Track 11-LL-left_orion-SL_replay.mp3
Here is my voice as played back by the left ORION loudspeaker.
h - Track 12-LL-left_SL_EL.mp3
The previous recordings were at a distance of 2.4 m from the loudspeaker or
from me. For this track I am standing at the left loudspeaker with the
microphone in my hand. My wife sits in a chair at A. Because of the close
proximity of the microphone there is little room reverberation heard. My wife's
voice is more distant and clearly in a reverberant
environment. The volume of her voice did not seem as low to me as it is on
loudspeaker playback. Her distance from the microphone was about eight times
that of mine.
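The factor of eight in distance for Track 12 can be turned into a rough level estimate. The sketch below assumes a point source in free field, so that only the direct sound is counted and its level falls by 6 dB per doubling of distance. The reverberant field of the room adds to the distant voice, so the recorded difference should be smaller than this figure. The function name `level_drop_db` is made up for the example.

```python
import math


def level_drop_db(d_near, d_far):
    """Direct-sound level difference between two distances (inverse-distance law)."""
    return 20 * math.log10(d_far / d_near)


# Distances from the text: 30 cm to my mouth, 2.4 m to the chair at A.
print(f"{level_drop_db(0.3, 2.4):.1f} dB")  # prints "18.1 dB"
```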
Note that my voice appears at the loudspeaker, not 30 cm
in front of you, as was the distance from my mouth to the microphone. If this
were a true 3-D reproduction, then my voice would have been this close to you.
Instead it is located at the loudspeaker. We process directional and distance
cues in the direct and reflected sounds of any source of sound. In this case we
localize the voice at the loudspeaker. My voice has little recorded
reverberation, but the voice of my wife does and is at lower volume. In the AS
she is placed at some distance from me. The real distance to the loudspeaker
from wherever I am in the room is also the minimum distance between me and the
AS when I close my eyes.
Last night I attended a concert, FIG.1, and made some
related observations about single loudspeaker sound reproduction. The lecturer
of the pre-concert music talk had a microphone. A loudspeaker high above the
stage amplified his voice but the volume was kept low and the precedence effect
[2] prevented me from localizing the loudspeaker as a source separate from the
speaking person. Nevertheless it was obvious to me that a loudspeaker was
involved. His voice had an unnatural overlay due to the strong reverberation of
the loudspeaker's output by the hall. Furthermore the reverberation was colored
by the polar response of the loudspeaker.
Just before the concert started a female voice announced over loudspeaker(s) to
turn off electronic devices. Her voice was quite loud and so there was no
question that it was reproduced. Her acoustic size was unrealistically large
compared to what a speaking voice sounds like in the large hall. This was
demonstrated a few minutes later when the conductor turned to the audience to
introduce the upcoming modern violin concerto. His voice did not suffer from
unnatural reverberation. His voice actually showed little reverberation though I
was seated at a great distance. It sounded like a distant voice in a large space
and though not loud, was easy to understand as the audience was quiet. The
concert illustrated again how different solo instruments reverberate with
different strength and character due to output volume and directional aim of
their sound. The bass drum sound rolls through the hall even when only slightly
tapped. The solo violin does not seem to excite any reverberation whether played
softly or stridently. The massed orchestra illuminates the hall with sound and
enveloping reverberation.
I have often wondered how we recognize whether a sound was
made by a real instrument or by a loudspeaker, especially when the sound comes
from some distance, like through the hallway in a building or from an open
window. It must be the reverberant sound, which is characteristic for every
instrument or source. The physical construction, size and geometry of a source
determines its radiation pattern. Since we live in reflective environments each
source of sounds assumes a gestalt. We recognize the source by its gestalt even
when the environment changes. We can differentiate a piano from the loudspeaker
reproduction of a piano even when the sound that we hear has lost high or low
frequencies or has suffered envelope distortion in its acoustic transmission from a distant
place.
I draw several conclusions from the listening tests for
sound reproduction from a single loudspeaker in a reverberant space:
- The most believable AS is created by Track 10 and Track
12, which are recordings of live voice in my living room. I recognize the
acoustic event and there is no illusion about the nature of the loudspeaker
reproduction in my room. They are there, not here. These are live, not
anechoic recordings and I hear the spatial context and relationships even
with single loudspeaker playback. The AS has a familiar connection to
reality. My physical distance to the loudspeaker is the minimum auditory
distance of the AS.
Each source of sound, a person, a musical instrument, a tool or loudspeaker
has its own acoustic gestalt, which is not hidden by reflections and
reverberation. Reflections and reverberation may actually help in
recognizing the gestalt and thus the identity of the source. Together with
finding the direction and distance of an unknown source of sound, namely its
spatial location, recognizing its identity, the nature of the beast, could
be important for survival. We are naturally motivated to learn to recognize
the acoustic signature of sound sources in spaces with different reflective
and reverberant properties. We stay especially alert when in an environment
where this source might reside. The brain uses cognitive processes all the
time even when we are not conscious of them. They allow us to relax when we
are in a familiar and friendly space. There we pull in our auditory horizon
and pay attention to the AS of the present acoustic event. This might be a
live concert or its phantom image produced by a pair of loudspeakers, where
the loudspeakers have disappeared. But spatial awareness remains and might
even be heightened for enjoyment. The auditory apparatus is always working.
Unlike the visual apparatus it has no lids. Auditory events do not occlude
each other. When multiple mixed streams of sound arrive at the ears, the
brain knows which one to attend to and which ones to ignore as background,
even if only a hint of the source's identity is provided [12, 13].
- Track 07, the set of anechoic recordings, whether
played back over ORION or PLUTO, creates an AS that is not mistaken for a
real female, male or music band in my room. I recognize the absence of
reverberation in the recording. The AS is that of the left ORION or PLUTO in
my room. If we listened only in mono, then the perfect loudspeaker would
have a flat on-axis response and the polar response of the source that it is
reproducing. It would be an interesting exercise to build such a loudspeaker
for human voice, but then it is not of use for anything else. The typical
single loudspeaker can at best create the illusion of an acoustic event. It
cannot recreate the acoustic event.
As an aside, when I design a loudspeaker I first build only one, measure and
listen to it in my room and then ask whether it is worth building an
identical second unit. A single loudspeaker is more revealing than a stereo
setup.
- Tracks 08, 09 and 11 illustrate the perceptual
distortion of loudspeaker and room, which was created by an imperfect
loudspeaker interacting with the room and then recorded from a single point
in space. These tracks highlight limitations in recording and reproduction
of acoustic events. They are specific magnifications of the acoustic
behavior of loudspeaker and room. I do not believe that they could be the
starting point to change either the loudspeaker polar response or the room
acoustics. The ratio of direct to reverberant sound, D/R, has been decreased
in the playback of the room recording when the playback level for Track 08
and Track 09 is the same as for Track 07. The direct to reverberant ratio is
a critical parameter for the quality of speech transmission and
intelligibility but also for concert hall acoustics [13]. D/R depends upon
the radiation pattern of the source and its location on the stage. Thus it
is nearly impossible to provide artificial reverberation that sounds natural
in all cases.
The polar response of the loudspeaker, the spatial context
in the recording and the listening room are important contributors to creating
the Gestalt of the AS.
With currently used methods of sound recording and
reproduction we are merely trying to fool the mind. Therefore it should be
important to understand the cues or tricks that we fall for and to avoid those
that destroy the illusion. Physical reconstruction of the wave field that
existed at some location in space during the original acoustic event is still at
the experimental stage. If realized it could give us a taste of time travel.
Stereo loudspeakers in a room are not up to that task, only binaural is, FIG. 8.
1.3.2 The sound of two loudspeakers in a room
FIG. 21 Direct signals and room
reflected signals from left and right loudspeakers superimpose at both
ears of the observer
Two loudspeakers in a
room, FIG. 21, are perceptually treated as two independent
sources of sound to the degree that the sounds are uncorrelated. The
left speaker may carry the interview of a male in a church, the right
speaker the interview of a female in the same church and location, but
recorded at a later time. When played back simultaneously we hear a male
voice at the left speaker, a female voice at the right speaker and
that both talk in a similarly reverberant space. They may be difficult
to understand when they talk at the same time. They would be even more
difficult to understand if both tracks were played back simultaneously
over only one of the two loudspeakers. The distance between left and
right loudspeakers helps localization of male or female voice by turning
the head towards the one of interest. Localization makes it easier to
follow the sound stream and to suppress sound from other locations to
some degree.
We have an amazing ability to hold a conversation
with the person in front of us when surrounded by a multitude of people,
conversations and background noise. The so-called "Cocktail Party
Effect" and related phenomena have been intensively studied [15,
16, 17, 18]. People with hearing aids often have difficulty coping in
such situations.
I am in awe of how the brain has evolved to perceive,
locate and recognize multiple, individual sources of sound in environments of
greatly different reflective properties. To quote from Bregman [20]:
"Sound is a pattern of pressure waves moving through
the air, each sound producing event creating its own wave pattern. The human brain recognizes these patterns as indicative of the
events that give rise to them: a car going by, a violin playing, a woman
speaking, and so on. Unfortunately by the time the sound has reached the ear,
the wave patterns arriving from the individual events have been added together
in the air so that the pressure wave that reaches the eardrum is the sum of the
pressure patterns coming from the individual events. The summed pressure wave
need not resemble the wave patterns of the individual sounds.
As listeners, we are not interested in this summed pattern, but in the
individual wave patterns arising from the separate events. Therefore our brains
have to solve the problem of creating separate descriptions of the individual
happenings, but it doesn't even know, at the outset, how many sounds there are,
never mind what their wave patterns are: so the discovery of the number and
nature of sound sources is analogous to the following mathematical problem:
'The number 837 is the sum of an unknown number of other numbers; what are they?'
There is no unique answer."
In 2011 Physics Today carried an article, "Listening in on the listening brain",
which indicates that hearing the eardrum signals involves a bi-directional
stream of information - from inner ear to brain and from brain to inner ear.
A short 0.1 ms click stimulates approximately 7 ms of brain activity.
How to make sense of two loudspeakers emitting identical
or nearly identical wave patterns in a reflective
environment, as with stereo, would seem to be a particularly difficult and
confusing problem in hearing. In evolutionary terms, there is no natural
precedent for such a situation. Maybe a wolf pack can create a similar acoustic
event. Hearing phantom sources is like an escape from reality, from hearing the
two loudspeakers, which are readily recognized, and the familiar room. Hiding
loudspeakers and room should therefore be the challenge for every loudspeaker
designer and recording engineer in order to create a believable auditory
illusion, an Auditory Scene of a different reality, that of listening to
musicians performing Copland's Organ Symphony in Davies Symphony Hall, for
example.