(c) Copyright, Robert B. Richards, 2016
How We Hear in Rooms
The human ear-brain mechanism perceives sound in many different
ways depending on frequency.
Sound travels at about 1128 feet per second
at about 20 degrees C (68F). The speed changes somewhat with temperature, but not much.
To figure out the size of a wavelength when you know the frequency, you just
divide speed by frequency, or 1128/f. So the wavelength of 100Hz would be 1128/100
= 11.28 feet. The wavelength of 7kHz would be 1128/7000 = 0.161 feet, or 1.93 inches.
At 20kHz the wavelength is 0.677 inches.
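As a quick check on the arithmetic above, here is a small Python sketch. The wavelength function is just speed divided by frequency. The speed-of-sound function is my addition, not from the text: it assumes the textbook ideal-gas approximation for dry air (about 331.3 m/s at 0 degrees C, scaling with the square root of absolute temperature), and is only there to show how little the speed changes over ordinary temperatures.

```python
import math


def speed_of_sound_ft_s(temp_c: float = 20.0) -> float:
    """Approximate speed of sound in dry air, in feet per second.

    Assumes the ideal-gas approximation c = 331.3 * sqrt(1 + T/273.15) m/s.
    """
    c_m_s = 331.3 * math.sqrt(1.0 + temp_c / 273.15)
    return c_m_s / 0.3048  # convert metres to feet


def wavelength_ft(freq_hz: float, speed_ft_s: float = 1128.0) -> float:
    """Wavelength in feet: speed divided by frequency."""
    return speed_ft_s / freq_hz


if __name__ == "__main__":
    for f in (100, 1000, 7000, 20000):
        lam = wavelength_ft(f)
        print(f"{f:>6} Hz: {lam:7.3f} ft = {lam * 12:7.2f} in")
    for t in (0, 20, 30):
        print(f"{t:>3} C: {speed_of_sound_ft_s(t):.0f} ft/s")
```

Under this approximation the speed goes from about 1087 ft/s at 0 degrees C to about 1145 ft/s at 30 degrees C, a change of only about 5%.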
In a typical living room:
High treble (6kHz - 20kHz) is
very directional energy, and is therefore usually thought of as a vector. The
higher in frequency you go, the more likely the energy is to be absorbed by
typical furniture. It will bounce off anything that has a hard surface. When
reflected (delayed) sound reaches the listener's ear and combines (adds) with
direct sound from the speaker, it creates what's called a "comb filter"
effect. At the frequency where the delay is a half wavelength, there will be
a cancellation of energy. There will also be cancellations at all the odd integer
multiples of that "fundamental" cancellation frequency, with reinforcement peaks
in between. If a half wavelength delay were to cause a fundamental cancellation
at 1kHz (a reflection path about 0.56 ft., or 6.8 inches, longer than the direct
path), there would also be cancellations at 3k, 5k, 7k, etc., with peaks at 2k,
4k, 6k, etc. The depth of the cancellations depends on the hardness of the
reflecting surfaces involved, but they are often 10dB or more deep, usually less
in the highest frequencies due to absorption. In a typical room there will be
many reflection paths, with randomly different delay times, and therefore randomly
different cancellation frequencies. When many different reflection path energies
combine (add) at the listener's ear, the cancellations of any given reflection
path will likely be largely filled in by the energy of other reflection paths,
which have different cancellation frequencies. With enough different reflections
reaching the ear, you end up with a relatively shallow ripple in the perceived
frequency response, rather than the large cancellations that you would get if
there were only a few reflection paths. When it comes to sensing the location of
a sound in the higher frequencies (above about 6kHz), the wavelengths are so
small that the size and shape of our outer ear come into play and help us
determine height as well. I believe this is a learned perception, somewhat
based on a comparison to the perception of the lower frequency energies.
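The single-reflection case is easy to sketch numerically. This minimal Python example assumes the ear receives the direct sound plus one copy delayed by the extra path length and scaled by a reflection gain `g`, a hypothetical parameter standing in for how hard the surface is (0 means fully absorbed, 1 a perfect mirror). The cancellations land where the extra path is an odd number of half wavelengths, and their depth is 20·log10(1 - g) dB. It ignores distance loss, frequency-dependent absorption and every other reflection path.

```python
import cmath
import math

SPEED_FT_S = 1128.0  # speed of sound at about 20 C, as used in the text


def comb_response_db(freq_hz: float, extra_path_ft: float, g: float) -> float:
    """Level of direct + one reflection, in dB relative to the direct sound alone.

    g is a hypothetical reflection gain (0 = fully absorbed, 1 = perfect mirror).
    """
    delay_s = extra_path_ft / SPEED_FT_S
    h = 1 + g * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20 * math.log10(abs(h))


def notch_frequencies(extra_path_ft: float, max_hz: float) -> list[float]:
    """Cancellation frequencies: odd multiples of 1 / (2 * delay)."""
    delay_s = extra_path_ft / SPEED_FT_S
    first = 1.0 / (2.0 * delay_s)
    notches = []
    k = 0
    while (2 * k + 1) * first <= max_hz:
        notches.append((2 * k + 1) * first)
        k += 1
    return notches


if __name__ == "__main__":
    path = SPEED_FT_S / 2000.0  # 0.564 ft: half a wavelength at 1 kHz
    print([round(f) for f in notch_frequencies(path, 8000)])
    for f in (1000, 2000, 3000):
        print(f, "Hz:", round(comb_response_db(f, path, 0.7), 1), "dB")
```

With g = 0.7 and a 0.564 ft extra path, the notches fall at 1kHz, 3kHz, 5kHz and 7kHz, each about 10.5dB deep, with peaks of about +4.6dB at 2kHz, 4kHz and 6kHz.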
Here's a theoretical example of what happens when you combine
only one reflection with a direct sound path. My first-hand experience showed
that it can be significantly worse than this theoretical example. Even in a
This comb-filter concept applies to all frequencies, not just
the high treble. Putting acoustically absorptive material on walls, away from
corners, can actually make things worse in many cases, because it reduces the
number of random reflections, each of which can help fill in the others' comb
filter cancellations. The best place to put absorptive materials appears to
be in both two-surface and three-surface corners.
Upper midrange (1kHz - 6kHz) is
less directional than the higher frequencies, but is still substantially directional.
It will have virtually the same multi-path comb-filter effects and potentially
significant acoustic issues as above. This is the frequency range where the
ear is most sensitive, especially at lower levels (see the Fletcher-Munson graph
below). This is also the frequency range where the "stereo effect"
is most perceivable by the human ear-brain mechanism, since "inter-aural
crosstalk" confuses imaging at frequencies below about 1kHz in a standard
2-speaker playback system. In this frequency range, we sense stereo image location
by amplitude comparisons, rather than by timing (or phase) comparisons as in the
lower frequencies. This is because above about 1kHz, the half-wavelength becomes
shorter than the distance between our ears, so the brain has no way of knowing
which period of the waveform it's comparing. To get the best stereo effect and
imaging, you want perfect amplitude balance between the left and right speakers
over this frequency range.
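The "about 1kHz" boundary follows from simple arithmetic: half a wavelength equals the spacing between the ears at c / (2 × spacing). The sketch below assumes ear spacings of 6 to 8 inches, hypothetical round numbers for an adult head; the real acoustic path around the head is somewhat longer and varies from person to person.

```python
SPEED_FT_S = 1128.0  # speed of sound at about 20 C


def half_wave_match_hz(ear_spacing_in: float) -> float:
    """Frequency at which half a wavelength equals the given ear spacing (inches)."""
    spacing_ft = ear_spacing_in / 12.0
    return SPEED_FT_S / (2.0 * spacing_ft)


if __name__ == "__main__":
    for spacing in (6.0, 7.0, 8.0):  # hypothetical ear spacings in inches
        print(f"{spacing} in -> {half_wave_match_hz(spacing):.0f} Hz")
```

Spacings of 6 to 8 inches put the crossover between roughly 850Hz and 1.1kHz, consistent with the "about 1kHz" figure; above it, half a wavelength no longer spans the distance between the ears.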
Fletcher-Munson graph (equal-loudness contours). These vary from person to person, but
are considered a good average of many people.
If a speaker had this frequency response, we would think it had
equal loudness at all audio frequencies.
A typical listening level of about 80dB (at 1kHz) means you need
to turn up the bass by almost 20dB to perceive it as having good balance.
You would also want to turn down the upper midrange frequencies
(centered at 4kHz) by about 10dB.
This is usually compensated for in the recording process to some
extent. As you can see, it varies significantly with level.
Do you still want "flat" (frequency response) speakers
and no tone controls in your preamp?
Lower Midrange (100Hz - 1kHz). Below
about 1kHz, we perceive sound location more by timing (or phase) comparisons
than by amplitude comparisons. A true "binaural" stereo recording
that gives almost perfect wideband imaging when listened to with headphones
won't give good imaging in this frequency range with the conventional 2-speaker
stereo setup, due to inter-aural crosstalk. Inter-aural crosstalk needs to
happen only once, either at the recording end or at the playback end; otherwise the
ear-brain mechanism gets confused. To some extent, listening room reflections
can create a false sense of space in this frequency range, but any embedded
imaging cues in the program material will be hard for the ear-brain mechanism
to interpret if the inter-aural crosstalk happens twice: once in the recording
process, and then again, with different timings, when listened to on the conventional
2-speaker setup.
When you get down below about 400Hz, the energy becomes substantially
more diffractive (bends around corners easily), and room reflections work in
a different way. Instead of analyzing the wave behavior as vectors, it's considered
more effective to analyze the energy as "pressure waves". The overall
size and shape of the room will color this frequency range more than specific
reflection paths do. Due to the size of the wavelengths and diffraction, cancellations
are less likely to be filled in by multiple reflection paths. This is usually
what causes the lower midrange and upper bass to sound "boomy". Positioning
speakers away from walls, and especially corners, will usually minimize this
boominess. Lower frequencies will then get less reinforcement from the walls
(so you'll want to turn up the bass using tone controls), but the room shape
and size will color the sound less. The speaker system may have a ruler-flat
frequency response when measured in an anechoic chamber (no room reflections
at any frequency), but at the listening position in a typical living room, the
response is often pretty bad.
Here's one example of how a room destroyed an otherwise very
flat speaker system:
A high-end speaker system.
Its frequency response measured up close.
Same speaker measured further back, in a relatively typical listening
room, with the calibrated mic about where a typical listener would be located
(about 8 feet out in this example).
The blue curve is the actual measurement; the orange curve is
believed to be how the typical ear-brain mechanism perceives it.
Bottom Line: How a speaker interacts acoustically
with the listening room is one of the most important things to worry about.
Low bass (20Hz - 100Hz). Because
the ear-brain mechanism is less sensitive to these frequencies (see the Fletcher-Munson
graph), most of the energy in a musical performance is typically
in the lower midrange and bass frequencies, so that we perceive it as being in balance
with the higher frequency energies. These "pressure wave" energies
are much less directional. The wavelengths are between about 11 feet and 56
feet long, and are usually substantially reinforced by typical room boundaries
(walls). Many speakers have a roll-off in frequency response below about
80Hz, and depend on the listening room walls, working with the diffraction effect,
to reinforce (acoustically amplify) this frequency region. The speaker designer
often assumes you will put the speakers against a wall. Even with this room
boundary reinforcement, most speakers produce little energy below about 40Hz.
When a speaker is acoustically relatively flat down to 25Hz, it can sound much
better, not just because of the extra low bass notes, but because all the higher
frequency energy in real-world music comes in "envelopes" that contain significant
energy down to near DC. The sense of "presence" is increased.
Most listening rooms will substantially damage the frequency
response of any speaker by the time the sound gets to the listener's ear. Especially
There are both the comb-filter effects described above and room
resonances. They are completely different mechanisms. Only the latter rings,
with a start-up time and a decay time.
A transient or very short-lived note may not last long enough
to get a room ringing, but musical notes held for any significant time will.
Because of this, fixing a room resonance problem with EQ is tricky.
A compromise is necessary.
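To see why short transients and held notes behave differently, a room resonance can be modeled, as a simplifying assumption, as a lightly damped second-order resonator with center frequency f and quality factor Q. Its envelope builds up and decays with a time constant of about Q / (π·f), so a 60dB ring-down takes about 2.2·Q / f seconds. The 50Hz, Q = 10 mode below is hypothetical; real room modes vary widely.

```python
import math


def time_constant_s(freq_hz: float, q: float) -> float:
    """Envelope time constant of a lightly damped resonance: Q / (pi * f)."""
    return q / (math.pi * freq_hz)


def ring_down_60db_s(freq_hz: float, q: float) -> float:
    """Time for the ringing to decay by 60 dB (amplitude falls by a factor of 1000)."""
    return math.log(1000.0) * time_constant_s(freq_hz, q)


def buildup_fraction(freq_hz: float, q: float, note_s: float) -> float:
    """Approximate fraction of steady-state amplitude reached when driven at resonance for note_s."""
    return 1.0 - math.exp(-note_s / time_constant_s(freq_hz, q))


if __name__ == "__main__":
    f, q = 50.0, 10.0  # hypothetical bass room mode
    print(f"60 dB ring-down: {ring_down_60db_s(f, q):.2f} s")
    for note in (0.02, 0.1, 0.5):  # note durations in seconds
        print(f"{note * 1000:.0f} ms note reaches {buildup_fraction(f, q, note):.0%} of full level")
```

With these assumed numbers, a 20ms transient only excites the mode to about 27% of its full amplitude, while a half-second note drives it essentially to full level, and the mode then keeps ringing for roughly 0.44 s after the note stops. An EQ cut sized for the fully built-up level would over-cut short notes, which is why a compromise is needed.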
Corners appear to "ring" the most, and are therefore
arguably the most important place to put sound absorptive materials (from my
I found that nailing 2-inch cotton rope (soft foam rubber would
work too) in the corners of the listening room reduced room ringing substantially,
as judged by the hand-clap test.
If it's practical, I recommend putting acoustically absorptive
materials in the corners (2-surface and 3-surface corners).
Ringing in the midrange frequencies (500Hz - 10kHz) can get very
tedious to listen to over time. You may not realize the degree to which it is
My living room with absorptive material nailed into the corners.
I used 2-inch cotton rope, and later found it to be very flammable.
Most foam rubber has fire retardant in it, so it is much safer.
Room Power Response (what the room
reflects to the listener). This is a question of what energy gets reflected
or focused toward the listening position in the room. It's highly dependent
on a speaker's off-axis frequency response. At higher frequencies, more of the
energy goes straight to the listener. The lower you go in frequency, the more
the energy diffracts and the more it is affected by the room boundaries (walls and
large furniture). In a multi-way speaker system, there can be an abrupt change
in the off-axis projection of the sound at a crossover frequency, due to transducer
size, and this will be reflected in the room power response. If, for example, a
10-inch woofer crosses over to a 1-inch tweeter at 2kHz, the speaker system will have
a significantly uneven off-axis frequency response. At 2kHz a 10-inch driver
will be very directional, and a 1-inch dome tweeter will have a much wider dispersion.
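A common rule of thumb, not from the original text, is that a piston-like driver starts to narrow its dispersion noticeably once ka (the driver's circumference divided by the wavelength, π·D/λ) rises above about 1. This sketch uses the nominal driver sizes as the diameter, which is an assumption; a real cone or dome's effective radiating diameter is smaller than its nominal size.

```python
import math

SPEED_FT_S = 1128.0  # speed of sound at about 20 C


def ka(diameter_in: float, freq_hz: float) -> float:
    """ka = pi * D / wavelength (circumference over wavelength) for a circular driver."""
    wavelength_in = SPEED_FT_S / freq_hz * 12.0
    return math.pi * diameter_in / wavelength_in


if __name__ == "__main__":
    for name, d in (("10-inch woofer", 10.0), ("1-inch dome", 1.0)):
        k = ka(d, 2000.0)
        trend = "beaming" if k > 1.0 else "wide dispersion"  # ka ~ 1 rule of thumb
        print(f"{name} at 2 kHz: ka = {k:.2f} ({trend})")
```

At a 2kHz crossover the woofer's ka is about 4.6 while the tweeter's is about 0.46, a tenfold difference right where the two drivers hand off.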
Because the ear-brain mechanism "compresses" the dynamic
range, and because acoustic energy drops off with distance
(in free space, twice as far away is about 1/4 the power, or about 6dB less;
in a room it depends on a bunch of variables),
the acoustic "signature" of a room will be significantly more of an
issue when the room is smaller. The reflection paths aren't nearly as long as
they would be in a larger room, so the reflected energy is not as attenuated
by travel distance. So on one hand it's better
to have room boundary reflections be distant from both the speakers and the
listening position, but at the same time you want many random reflections that
will fill in each other's comb filter cancellations...
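The distance argument can be put in numbers with the inverse-square law, under the simplifying assumption that a reflection loses energy only through spreading (no absorption at the reflecting surface). The path lengths below are hypothetical: a listener 8 feet from the speaker, with a reflected path of 12 feet in a small room versus 30 feet in a larger one.

```python
import math


def spreading_loss_db(direct_ft: float, reflected_ft: float) -> float:
    """How far (dB) a reflection falls below the direct sound from spreading alone.

    Inverse-square law: level drops 20*log10(distance ratio), about 6 dB per doubling.
    """
    return 20.0 * math.log10(reflected_ft / direct_ft)


if __name__ == "__main__":
    direct = 8.0  # hypothetical speaker-to-listener distance, feet
    for room, reflected in (("small room", 12.0), ("large room", 30.0)):
        print(f"{room}: reflection is {spreading_loss_db(direct, reflected):.1f} dB below direct")
```

Under these assumptions the small-room reflection arrives only about 3.5dB below the direct sound, while the large-room reflection is about 11.5dB down.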
If a room is so big that reflections fall in the range of 50ms - 150ms,
you get intelligibility problems (according to David Griesinger, formerly
of Lexicon). If the room is rectangular, the other dimensions should NOT be
exact multiples of the first (shortest) dimension, or acoustic problems double
up. With ratios of 1:1.4 or 1:1.62, the comb filter cancellations and possible
resonance effects will largely be spread out in frequency, with minimal double-ups
at both the fundamental frequencies and the harmonics of those frequencies.
There may be other good ratios too, but these are the ones I know of.
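One way to check a ratio is to list the axial room-mode frequencies of each dimension, f_n = n·c / (2·L), and count how many modes of the shortest dimension land within a small tolerance of a mode of another dimension. This sketch only looks at axial modes of two dimensions, and the 8-foot dimension, 300Hz limit and 2% tolerance are arbitrary, hypothetical choices; it's a simplified check, not a full room-mode analysis.

```python
SPEED_FT_S = 1128.0  # speed of sound at about 20 C


def axial_modes_hz(length_ft: float, max_hz: float) -> list[float]:
    """Axial mode frequencies n * c / (2 * L) up to max_hz."""
    first = SPEED_FT_S / (2.0 * length_ft)
    return [n * first for n in range(1, int(max_hz // first) + 1)]


def coincident_modes(short_ft: float, other_ft: float,
                     max_hz: float = 300.0, tol: float = 0.02) -> int:
    """Count modes of the short dimension within tol (fraction) of a mode of the other."""
    other = axial_modes_hz(other_ft, max_hz)
    return sum(
        any(abs(f - g) <= tol * f for g in other)
        for f in axial_modes_hz(short_ft, max_hz)
    )


if __name__ == "__main__":
    short = 8.0  # hypothetical shortest dimension, feet
    for ratio in (2.0, 1.4, 1.62):
        print(f"1:{ratio}: {coincident_modes(short, short * ratio)} coincident axial modes")
```

With an 8-foot dimension, a 1:2 ratio stacks all four of the short dimension's axial modes below 300Hz on top of the longer dimension's modes, while 1:1.4 and 1:1.62 produce no coincidences at this tolerance.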
Understanding these issues helps a lot when trying to improve
things, but real-world room acoustics are riddled with variables. Improving the
acoustics of a room is usually experimental and difficult.
Most "acoustically absorptive" material (foam, fiberglass
insulation, thick felt padding, etc.) works pretty well at high frequencies,
but usually does little at low frequencies (below about 400Hz).
To make a difference at low frequencies, the "acoustic material"
must have significant mass (physical weight). Multiplex movie theaters have
concrete cinder-block walls separating the theater rooms from each other in
an effort to block bass energy from leaking into the adjacent theater
rooms. Despite that, you can sometimes still hear some bass from adjacent theater
rooms. They use cinder blocks because of their weight.
Sidewall Reflections will create
a sense of spaciousness for most recordings, but can be tedious to listen to
over time, because the effect never changes with the program material. They also
blur or dominate any stereo imaging cues that might be embedded in the program
material. So there is less "fidelity", but some people like it. (I prefer to keep it minimal.)
Floor and/or ceiling bounce is often one of the more damaging
mechanisms, because there's no furniture in the way that might attenuate or randomize
the energy. This is especially true of ceiling bounce in the typical flat-ceiling room
with a carpeted floor. Therefore, making a speaker more directional on the vertical
axis is usually a good thing. Vertical line array speakers are substantially
more directional on the vertical axis, so they have less of a problem with this.
Reflections off the front wall of a listening room (behind the
speakers you are facing) are actually enjoyable if the delay of the reflection
is greater than 6ms, meaning the speaker must be out from the wall at least
3 feet (roughly 3.4 feet, since 6ms at 1128 ft/s is about 6.8 feet of extra
round-trip path) for this to be a plus. It's a "psycho-acoustic" effect. This
is one of the reasons why many people like the experience of open-baffle speakers
positioned 3 ft. or more out from any walls. Although experts (such as Linkwitz) agree
on the 6ms number, I suspect that it may vary some with frequency. Reflections
in the range of 6ms - 20ms are considered helpful because they help decorrelate
imaging cues, which effectively enlarges the "sweet spot" area.
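The 6ms figure converts to a wall distance with the same 1128 ft/s speed of sound. This sketch assumes the simplest geometry, with the listener on the line straight out from the speaker and perpendicular to the front wall, where the reflection's extra path is twice the speaker-to-wall distance; for listeners off that line the extra path is shorter, so the required distance grows. The same conversion shows what the 50ms - 150ms intelligibility range mentioned earlier means in path length.

```python
SPEED_FT_S = 1128.0  # speed of sound at about 20 C


def extra_path_ft(delay_ms: float) -> float:
    """Extra travel distance (feet) that produces the given delay."""
    return SPEED_FT_S * delay_ms / 1000.0


def min_wall_distance_ft(delay_ms: float) -> float:
    """Speaker-to-front-wall distance for a given reflection delay.

    Assumes the listener is on the axis through the speaker, perpendicular to
    the wall, so the reflection travels back to the wall and out again (2 x d).
    """
    return extra_path_ft(delay_ms) / 2.0


if __name__ == "__main__":
    for ms in (6, 20):
        print(f"{ms} ms: speaker at least {min_wall_distance_ft(ms):.1f} ft from the wall")
    for ms in (50, 150):
        print(f"{ms} ms: {extra_path_ft(ms):.0f} ft of extra path")
```

6ms works out to about 3.4 feet and 20ms to about 11.3 feet; the 50ms - 150ms range corresponds to roughly 56 to 169 feet of extra path, which is why it only matters in large rooms.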
If you decide to use a graphic or parametric EQ to even out the
response at the listening position (not necessarily recommended above about
300Hz), you can attenuate (reduce) peaks, but it's a mistake to try to pull
up cancellations. It doesn't really work, and it causes problems: it creates peaks
elsewhere in the room. Cancellations are totally dependent on mic, speaker and/or
listener location, on all 3 axes. Peaks are more likely to come from the speaker's
frequency response or from room ringing.
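For the "cut peaks, don't boost cancellations" approach, a parametric EQ band is typically a peaking biquad filter. The sketch below uses the widely published peaking-EQ formulas from Robert Bristow-Johnson's Audio EQ Cookbook (my choice of design, not something the text specifies) to cut a hypothetical 6dB room peak at 55Hz with Q = 4 at a 48kHz sample rate, then checks the response at and away from the center frequency.

```python
import cmath
import math


def peaking_eq(fs: float, f0: float, gain_db: float, q: float):
    """Peaking EQ biquad (Audio EQ Cookbook form); returns (b, a) normalized by a0."""
    amp = 10 ** (gain_db / 40.0)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, fs: float, f: float) -> float:
    """Magnitude response in dB of the biquad at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return 20 * math.log10(abs(num / den))


if __name__ == "__main__":
    fs = 48000.0
    b, a = peaking_eq(fs, 55.0, -6.0, 4.0)  # hypothetical room peak at 55 Hz
    for f in (30.0, 55.0, 100.0, 1000.0):
        print(f"{f:>6.0f} Hz: {response_db(b, a, fs, f):+.2f} dB")
```

The band reaches -6dB at 55Hz and is back to essentially 0dB well above it. Passing a positive gain_db would turn the same filter into a boost, which is the use the text advises against for cancellations.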
Moving the woofers to different locations in the room can be
the best way to even out the lower mid/upper bass frequency response issues,
which is why I often prefer the Satellite/Subwoofer arrangement. Adding more
woofers that are physically distant from one another is also one of the better
ways to even out the lower mid and bass frequency response anywhere in the room.
I don't recommend putting woofers in a 3-surface corner. You
get more effective output level (reinforcement), but it stimulates the room's
acoustic effects more, usually causing a more uneven perceived frequency response.
As you can see, acoustics are a very complicated variable. How
speakers interact with listening room acoustics is often the weakest link in
any playback system.