(c) Copyright, Robert B. Richards, 2016
How We Hear in Rooms
The human ear-brain mechanism perceives sound in many different
ways depending on frequency.
Sound travels at about 1128 feet per second
at about 20 degrees C (68F). The speed changes somewhat with temperature, but not much.
To figure out the size of a wavelength when you know the frequency, you just
divide speed by frequency, or 1128/f. So the wavelength of 100HZ would be 1128/100
= 11.28 feet. The wavelength of 7kHZ would be 1128/7K = 0.161 feet or 1.93 inches.
At 20kHZ the wavelength is 0.677 inches.
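As a quick check of the arithmetic above, here is a minimal Python sketch. It assumes the 1128 ft/s figure used in this article; the function names are made up for illustration.

```python
SPEED_OF_SOUND_FT_PER_S = 1128.0  # figure used in this article, at about 20 degrees C


def wavelength_ft(frequency_hz):
    """Wavelength in feet for a frequency in Hz (speed divided by frequency)."""
    return SPEED_OF_SOUND_FT_PER_S / frequency_hz


def wavelength_in(frequency_hz):
    """Same wavelength, converted to inches."""
    return wavelength_ft(frequency_hz) * 12.0


# The three frequencies worked out in the text above.
for f in (100, 7000, 20000):
    print(f"{f} Hz: {wavelength_ft(f):.3f} ft = {wavelength_in(f):.3f} in")
```

Any other frequency can be dropped into the loop to get a feel for how quickly the wavelength shrinks as frequency rises.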
In a typical living room:
High treble (6kHZ - 20kHZ) is a
very directional energy, and is therefore usually thought of as a vector. The
higher in frequency you go, the more likely the energy will get absorbed by
typical furniture. It will bounce off anything that has a hard surface. When
reflected (delayed) sound reaches the listener's ear and combines (adds) with
direct sound from the speaker, it will create what's called a "comb filter"
effect. At the frequency where the delay is a half wavelength, there will be
a cancellation of energy. There will also be cancellations at all the odd
multiples of that "fundamental" cancellation frequency, where the delay is
1.5, 2.5, 3.5, etc. wavelengths. If a half wavelength delay were to cause a
fundamental cancellation at 1KHZ (a reflection path about 0.56 ft. longer than
the direct path), there would also be cancellations at 3K, 5K, 7K, etc., with
reinforcement at 2K, 4K, 6K, etc. The depth of the cancellations depends on the
hardness of the reflection surfaces involved, but is often 10dB or more. It is
usually less in the highest frequencies due to absorption. In a typical room there will be many reflection
paths, with randomly different delay times, and therefore randomly different
cancellation frequencies. When many different reflection path energies combine
(add) at the listener's ear, the cancellations of any given reflection path will
likely get largely filled in by the energy of other reflection paths, which
will have different cancellation frequencies. With enough different reflections
reaching the ear, you end up with a relatively shallow ripple in the perceived
frequency response, rather than the large cancellations that you would
get if there were only a few reflection paths. When it comes to sensing the
location of a sound in the higher frequencies (above about 6kHZ), the wavelengths
are so small that the size and shape of our outer ear come into play and help
us determine height as well. I believe this is a learned perception, somewhat
based on a comparison to the perception of the lower frequency energies.
Here's a theoretical example of what happens when you combine
only one reflection with a direct sound path. My first hand experience showed
that it can be significantly worse than this theoretical example. Even in a
This comb-filter concept applies to all frequencies, not just
the high treble. Putting acoustically absorptive material on walls, away from
corners, can actually make things worse in many cases, because it reduces the
number of random reflections, each of which can help fill in each other's comb
filter cancellations. The best place to put absorptive materials appears to
be in both two surface and three surface corners.
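The single-reflection comb filter can be sketched in a few lines of Python. This is only an idealized model, assuming one reflection that arrives later and weaker than the direct sound. The function names and the reflection gain of 0.7 are assumptions for illustration, not measurements. For one delayed copy, the cancellations land where the delay is an odd number of half periods, and the frequencies in between are reinforced.

```python
import math

SPEED_OF_SOUND_FT_PER_S = 1128.0  # figure used in this article


def null_frequencies(extra_path_ft, max_hz=20000.0):
    """Cancellation frequencies for one reflection with the given extra path.

    Nulls fall at odd multiples of 1 / (2 * delay).
    """
    delay_s = extra_path_ft / SPEED_OF_SOUND_FT_PER_S
    fundamental = 1.0 / (2.0 * delay_s)
    nulls = []
    k = 0
    while (2 * k + 1) * fundamental <= max_hz:
        nulls.append((2 * k + 1) * fundamental)
        k += 1
    return nulls


def combined_level_db(frequency_hz, extra_path_ft, reflection_gain=0.7):
    """Level of direct plus reflected sound, in dB relative to direct alone.

    reflection_gain is an assumed amplitude ratio (hypothetical value).
    """
    delay_s = extra_path_ft / SPEED_OF_SOUND_FT_PER_S
    phase = 2.0 * math.pi * frequency_hz * delay_s
    real = 1.0 + reflection_gain * math.cos(phase)
    imag = -reflection_gain * math.sin(phase)
    magnitude = max(math.hypot(real, imag), 1e-12)  # avoid log of zero
    return 20.0 * math.log10(magnitude)


# An extra path of 0.564 ft is a half wavelength at 1 kHz.
print([round(f) for f in null_frequencies(0.564, max_hz=8000.0)])
print(round(combined_level_db(1000.0, 0.564), 1), "dB at the first null")
print(round(combined_level_db(2000.0, 0.564), 1), "dB at the first peak")
```

With the assumed gain of 0.7 the null is a little over 10dB deep, which is in line with the "often 10dB or more" figure mentioned earlier. A harder surface (gain closer to 1) deepens it.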
Upper-midrange (1KHZ - 6kHZ) is
less directional than the higher frequencies, but is still substantially directional.
It will have virtually the same multi-path comb-filter effects and potential
significant acoustic issues as above. This is the frequency range where the
ear is most sensitive, especially at lower levels (see Fletcher-Munson graph
below). This is also the frequency range where most of the "stereo effect"
is perceived by the human ear-brain mechanism, since "inter-aural
crosstalk" confuses imaging in the frequencies below about 1kHZ in a standard
2 speaker playback system. In this frequency range, we sense stereo image location
by amplitude comparisons, rather than timing (or phase) comparisons (as in the
lower frequencies). This is because above about 1kHZ, the half-wavelength becomes
shorter than the distance between our ears, so the brain has no way of knowing
which cycle of the waveform it's comparing. To get the best stereo effect and
imaging, you want the best possible amplitude balance over this frequency range
between the left and right speakers.
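The "about 1kHZ" figure can be sanity-checked with a small Python sketch. The 7 inch ear spacing is an assumed, illustrative number (it is not given in this article), and the function name is made up.

```python
SPEED_OF_SOUND_FT_PER_S = 1128.0  # figure used in this article


def half_wavelength_crossover_hz(ear_spacing_in):
    """Frequency at which half a wavelength equals the ear spacing."""
    ear_spacing_ft = ear_spacing_in / 12.0
    return SPEED_OF_SOUND_FT_PER_S / (2.0 * ear_spacing_ft)


# 7 inches is an assumed ear spacing, for illustration only.
print(f"{half_wavelength_crossover_hz(7.0):.0f} Hz")
```

A wider or narrower head moves the number somewhat, but it stays in the neighborhood of 1kHZ.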
This is the Fletcher-Munson graph, an average of a large group
of people who were tested way back in time (the curves were published in 1933).
It shows how the sensitivity of the human ear changes with both frequency and level.
At low levels, it's harder to hear the bass and high treble.
This varies from person to person, but is considered a good average
of many people.
If a speaker had the above frequency response, we would think
it had equal loudness at all audio frequencies.
A typical listening level of about 80dB (at 1kHZ) means you need
to turn up the bass by almost 20dB, in order to perceive it as having good balance.
You would also want to turn down the upper midrange frequencies
(centered around 3-4kHZ) by roughly 5dB.
This is usually compensated for in the recording process to some
extent. As you can see, it varies significantly with level.
Many Hi-Fi enthusiasts think a minimalist flat frequency response
preamp design with no tone controls is better. They are sadly mistaken.
I find that a 4 section Baxandall tone control circuit is an
excellent way to go. It allows me to dial in just the right "Loudness Compensation"
effect, and deal with room acoustics issues to some extent. It can also greatly
improve some recordings that were poorly mastered.
Lower Midrange (100HZ - 1KHZ). Below
about 1kHZ, we perceive sound image location more by timing (or phase) comparisons
than by amplitude comparisons. A true "binaural" stereo recording
that gives almost perfect wideband imaging when listened to with headphones,
won't give good imaging in this frequency range with the conventional 2 speaker
stereo setup, due to inter-aural crosstalk. Inter-aural crosstalk needs to
happen only once, either at the recording end or at the playback end. Otherwise the
ear-brain mechanism gets confused for frequencies below about 1kHZ. To some
extent, listening room reflections can sometimes create a false sense of space
in this frequency range, but any embedded imaging cues in the program material
will be hard for the ear-brain mechanism to interpret if the inter-aural crosstalk
happens twice: once in the recording process, and then again with different
delay timings when listened to in the conventional 2 speaker set-up.
When you get down below about 400HZ, the energy becomes substantially
more diffractive (bends around corners more easily), and room reflections work
in a different way. Instead of analyzing the wave behavior as vectors, it's
considered more effective to analyze the energy as "pressure waves".
The overall size and shape of the room will color this frequency range more
than specific reflection paths. Due to the size of the wavelengths and diffraction,
there are often only a few substantially effective reflection paths below about
400HZ in a typical living room in a modest house or apartment, so cancellations
are less likely to get filled in by alternate reflection
paths, as happens more in the higher frequencies. Big cancellation dips in the
lower frequencies are usually what cause lower midrange and upper bass to sound
"boomy". Positioning speakers away from walls, and especially corners,
will usually minimize this boominess effect. Lower frequencies will then get
less reinforcement from walls (so you'll then probably want to turn up the bass
using tone controls). The speaker system may have a ruler flat frequency response
when measured in an anechoic chamber (no room reflections at any frequency),
but at the listening position in a typical living room, it's often pretty bad.
Here's one example of how a typical listening room destroyed
the frequency response of a highly accurate speaker system, at the listening
position.
A high end speaker system
Its frequency response measured up close (similar to how it would
measure in an anechoic chamber).
Same speaker measured further back, in a relatively typical listening
room, with the calibrated mic about where a typical listener would be located
(about 8 feet out in this example).
The blue curve is the actual measurement, the orange curve is
believed to be how the typical ear-brain mechanism perceives it.
Bottom Line: How a speaker interacts acoustically
with the listening room is one of the most important things to worry about in
any playback system.
Low bass (20HZ - 100HZ) Because
the ear-brain mechanism is less sensitive to these frequencies (see Fletcher-Munson
graph), it is typical that most of the energy in a musical performance will
be in the lower mid and bass frequencies, so we perceive it as being in balance
with the higher frequency energies. These "pressure wave" energies
are much less directional. The wavelengths are between about 11 feet and 56
feet long, and are usually substantially reinforced by typical room boundaries
(walls, large heavy furniture, etc.). Many speakers have a roll-off in the frequency
response below about 80HZ. Better ones get closer to 40HZ (lowest note on a
4 string bass guitar). When a speaker is acoustically relatively flat down to
25-30HZ, it can sound much better, not just for the extra low bass notes, but
because all the higher frequency energy in real-world music comes in "envelopes"
that contain significant energy down to near DC. You can see this for yourself
on a realtime analyzer. A sense of "presence" is increased. Drums
sound much better IMO.
Resonance is usually where two parallel surfaces or walls will
cause "ringing", when acoustic energy is fed into the space between
them, at frequencies where a half wavelength, or any whole number of half
wavelengths, fits perfectly between the walls. Those certain frequencies will
effectively be amplified. Using active EQ to try and bring up cancellations
is pretty much always a wrong thing to do (it can create peaks in other locations
in the same room and usually only makes things worse), but hammering down resonant
frequency issues with active EQ is often a good idea. Because a resonance elongates
the time of the envelope in real world music, it's arguable that any resonance
should actually be EQ'd to a slightly negative level (maybe 3-6dB), relative
to the rest of the frequency response, to compensate for how the ear-brain mechanism
is going to perceive those elongated resonant frequencies. A really good listening
room would be designed to have few or no parallel walls.
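The resonance condition between one pair of parallel walls can be sketched in Python as follows. This covers only the simple case of sound bouncing back and forth between two walls, and the 12 ft spacing is a hypothetical value picked for illustration.

```python
SPEED_OF_SOUND_FT_PER_S = 1128.0  # figure used in this article


def axial_resonances_hz(wall_spacing_ft, count=5):
    """First `count` resonances between two parallel walls.

    The n-th resonance is where n half wavelengths fit between the walls:
    f_n = n * c / (2 * spacing).
    """
    return [n * SPEED_OF_SOUND_FT_PER_S / (2.0 * wall_spacing_ft)
            for n in range(1, count + 1)]


# 12 ft is a hypothetical wall spacing, chosen only for illustration.
for f in axial_resonances_hz(12.0):
    print(f"{f:.1f} Hz")
```

These are candidate frequencies to look for with a measurement mic before reaching for the EQ. A real room has three pairs of surfaces plus furniture, so the measured peaks will not line up exactly with this simple model.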
Most listening rooms will substantially damage the frequency
response of any speaker by the time the sound gets to the listener's ear. Especially
smaller rooms, because acoustic energy falls off with the square of the distance.
Twice the distance might give you approximately 1/4 the energy level (if the
energy source is a point source, which it usually effectively is).
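The "twice the distance, 1/4 the energy" figure is the inverse-square behavior of an ideal point source in free space. A minimal Python sketch under that assumption (real rooms add reflections, so actual levels fall off more slowly):

```python
import math


def relative_energy(distance_ratio):
    """Energy at the farther point relative to the nearer one (point source)."""
    return 1.0 / distance_ratio ** 2


def level_change_db(distance_ratio):
    """The same change expressed in dB."""
    return 10.0 * math.log10(relative_energy(distance_ratio))


for ratio in (2.0, 4.0):
    print(f"{ratio:g}x the distance: {relative_energy(ratio):.4f} of the energy, "
          f"{level_change_db(ratio):.1f} dB")
```

Each doubling of distance costs about 6dB, which is why a reflection that travels only a little farther than the direct sound arrives nearly as loud.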
There are both the comb-filter effects described above, and the room
resonances described here. They are completely different mechanisms. Only the
latter rings, and has a start-up time and a decay time.
A transient or very short-lived note may not last long enough
to get a room ringing audibly, but musical notes held for any significant time
Because of this, fixing a room resonance problem with EQ is tricky.
A compromise is necessary.
Corners appear to "ring" the most, and are therefore
arguably the most important place to put sound absorptive materials (from my
I found that nailing 2 inch cotton rope (soft foam rubber would
work too) in the corners of the listening room reduced room ringing substantially,
using the hand clap test.
If it's practical, I recommend putting acoustically absorptive
materials in the corners (2 surface and 3 surface corners).
Ringing in the midrange frequencies (500HZ - 10kHZ) can get very
tedious to listen to over time. You may not realize the degree to which it is
My living room with absorptive material nailed into the corners.
I used 2 inch cotton rope, and later found it to be very flammable.
Most foam rubber has fire retardant in it, so is much safer.
"Room Power Response" (what
the room reflects to the listener - everything except the direct path from speaker
to ear). It's highly dependent on a speaker's off-axis frequency response and
where in the room the speakers are located. At higher frequencies more of the
energy is directional, so might go straight to the listening position, without
much interaction with the room acoustics. The lower you go in frequency, the
more the energy diffracts and gets more affected by the room boundaries (walls
and large furniture).
In a multi-way speaker system, there can be an abrupt change
in off-axis projection of the sound due to transducer size. This will be reflected
in the room "power response". If for example, a 10 inch woofer crosses
over to a 1 inch tweeter at 2kHZ, the speaker system will have a significantly
uneven off-axis frequency response. At 2kHZ a 10 inch driver will be very directional,
and a 1 inch dome tweeter will have a much wider dispersion pattern. At 2kHZ
the off-axis frequency response will jump up by several dB. It's arguable that you always want
to use the driver with the smallest diameter, that can also handle the acoustic
power output that you will want.
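One common way to put numbers on driver directivity is the quantity ka: the circumference of the driver divided by the wavelength. The Python sketch below uses it as a rough guide only. It treats the nominal driver diameter as the radiating diameter, and the rule of thumb that a driver starts to narrow its dispersion once ka rises much above 1 is an outside assumption, not something stated in this article.

```python
import math

SPEED_OF_SOUND_IN_PER_S = 1128.0 * 12.0  # article's 1128 ft/s, in inches per second


def ka(driver_diameter_in, frequency_hz):
    """ka for a circular piston: circumference divided by wavelength."""
    wavelength_in = SPEED_OF_SOUND_IN_PER_S / frequency_hz
    return math.pi * driver_diameter_in / wavelength_in


# The 10 inch woofer and 1 inch tweeter from the 2 kHz crossover example above.
for diameter in (10.0, 1.0):
    print(f"{diameter:g} inch driver at 2 kHz: ka = {ka(diameter, 2000.0):.2f}")
```

The woofer comes out roughly ten times further into the directional regime than the tweeter at the same frequency, which is the mismatch described in the example above.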
If you are building a 3 way tri-amp'd 24dB/octave actively crossed
over speaker system, and you want your midrange driver to be "competent"
IMO down to 100HZ for example, I'd recommend at least a 5 inch driver for typical
small room use; two 5 inch drivers are significantly better. If you only need the
midrange driver to go down to 500HZ, a 3 inch driver (with actual cone diameter
of about 2 inch) would be my choice. This way you get a reasonable max SPL (loudness
level), while maintaining the best off axis response that's practical.
Because the ear-brain mechanism "compresses" the dynamic
range, and because acoustic energy drops off with the square of the distance
(twice as far away might be 1/4 the power, depending on a bunch of variables),
the acoustic "signature" of a room will be significantly more of an
issue when the room is smaller. The reflection paths aren't nearly as long as
they would be in a larger room, so the reflected energy is not as attenuated
by travel distance, as it would be in the larger room. So on one hand it's better
to have room boundary reflections be distant from both the speakers and the
listening position, so there is less interaction. But at the same time, especially at the
mid and higher frequencies, having many off axis room reflections can help fill
in whatever comb filter cancellations exist. It's a tradeoff. But less room
acoustic interaction usually gives a better result.
If a room is so big that reflections fall in the range of 50mS
- 150mS, you get intelligibility problems (according to David Griesinger, formerly
of Lexicon). If the room is rectangular, the other dimension should NOT be an
exact multiple of the first (shortest) dimension, or acoustic problems double
up. With ratios of 1:1.4 or 1:1.62, the comb filter cancellations and possible
resonance effects will largely be spread out in frequency, with minimal double-ups
at both the fundamental frequencies and the harmonics of those frequencies.
There may be other ratios that are good too, but these are the ones I know of.
If I remember correctly, the 1:1.62 is called the golden ratio, which shows
up in nature a lot.
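To see why an exact multiple is a problem, here is a rough Python sketch that counts how many wall-to-wall resonances of two room dimensions land on (nearly) the same frequency. It looks only at simple resonances along each dimension. The 10 ft base dimension, the 300 Hz limit and the 1 Hz tolerance are arbitrary assumptions for illustration.

```python
SPEED_OF_SOUND_FT_PER_S = 1128.0  # figure used in this article


def axial_modes_hz(length_ft, max_hz):
    """Resonances along one dimension, f_n = n * c / (2 * length), up to max_hz."""
    modes = []
    n = 1
    while n * SPEED_OF_SOUND_FT_PER_S / (2.0 * length_ft) <= max_hz:
        modes.append(n * SPEED_OF_SOUND_FT_PER_S / (2.0 * length_ft))
        n += 1
    return modes


def coincident_modes(length_a_ft, length_b_ft, max_hz=300.0, tolerance_hz=1.0):
    """Pairs of resonances from the two dimensions that double up."""
    modes_a = axial_modes_hz(length_a_ft, max_hz)
    modes_b = axial_modes_hz(length_b_ft, max_hz)
    return [(fa, fb) for fa in modes_a for fb in modes_b
            if abs(fa - fb) <= tolerance_hz]


# Hypothetical 10 ft short dimension with ratios of 1:2, 1:1.4 and 1:1.62.
for other in (20.0, 14.0, 16.2):
    print(f"10 ft x {other:g} ft: {len(coincident_modes(10.0, other))} double-ups")
```

Under these assumptions the 1:2 room doubles up on every resonance of the short dimension, while the 1:1.4 and 1:1.62 rooms spread them out almost completely.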
Understanding these issues helps a lot when trying to improve
things, but real world room acoustics are riddled with variables. Improving
acoustics of a room is usually experimental and difficult.
Most "acoustically absorptive" material (foam, fiberglass
insulation, thick felt padding, etc.) works pretty well at high frequencies,
but usually does little at low frequencies (below about 400HZ), unless it has
significant mass attached to it.
To make a difference at low frequencies the "acoustic material"
must have significant mass (physical weight). Multiplex movie theaters have
concrete cinder-block walls separating the theater rooms from each other as
an effort to block bass frequency energy from leaking into the adjacent theater
rooms. Despite that, you can still hear some bass from adjacent theater
rooms sometimes (the movie next door has bombs going off, for ex.). They use
cinder blocks because of their weight.
Sidewall Reflections will create
a sense of spaciousness for most recordings, but can be tedious to listen to
over time. It never changes with the program material. It also blurs or dominates
any stereo imaging cues that might be embedded in the program material. So less
"fidelity" but some people like it. (I prefer to keep it minimal).
Floor and/or ceiling bounce are often some of the more damaging
mechanisms because there's no furniture in the way that might attenuate or randomize
the energies. Especially ceiling bounce in the typical flat ceiling room with
a carpeted floor. Therefore, making a speaker more directional on the vertical
axis is usually a good thing. Vertical line array speakers are substantially
more directional on the vertical axis, so have less of a problem with this.
Reflections off the front wall of a listening room (behind the
speakers you are facing) are actually enjoyable if the delay of the reflection
is greater than 6mS, meaning the speaker must be out from the wall at least
3 feet, for this to be a plus. It's a "psycho-acoustic" effect. This
is one of the reasons why many people like the experience of open-baffle speakers
positioned 3ft.+ out from any walls. Although experts (such as Linkwitz) agree
on the 6mS number, I suspect that it may vary some with frequency. Reflections
in the range of 6mS - 20mS are considered helpful because they help decorrelate
imaging cues, which effectively enlarges the "sweet spot" area and
creates more of a sense of depth. Open baffle speakers can be difficult to use
in smaller rooms.
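The link between the delay of the front wall reflection and the speaker's distance from that wall can be roughed out in Python. This sketch assumes the simplest geometry: the reflection travels straight back to the wall and forward again, so the extra path is twice the distance to the wall. Real paths depend on where the listener sits and on the speaker's radiation pattern.

```python
SPEED_OF_SOUND_FT_PER_S = 1128.0  # figure used in this article


def front_wall_delay_ms(distance_from_wall_ft):
    """Delay of the front wall reflection, assuming extra path = 2 x distance."""
    extra_path_ft = 2.0 * distance_from_wall_ft
    return extra_path_ft / SPEED_OF_SOUND_FT_PER_S * 1000.0


def distance_for_delay_ft(delay_ms):
    """Distance from the wall needed for a given reflection delay."""
    return delay_ms / 1000.0 * SPEED_OF_SOUND_FT_PER_S / 2.0


print(f"3 ft from the wall: {front_wall_delay_ms(3.0):.2f} ms")
print(f"6 ms needs about {distance_for_delay_ft(6.0):.2f} ft")
```

Under this simplified geometry, 3 feet gives a delay a little under 6mS, so placing the speakers somewhat more than 3 feet out leaves a margin.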
If you decide to use a graphic or parametric EQ to even out the
response at the listening position (not necessarily recommended above about
500HZ), you can attenuate (reduce) peaks, but it's a mistake to try and pull
up cancellations. It doesn't really work and causes problems. It creates peaks
elsewhere in the room. Cancellations are totally dependent on mic, speaker and/or
listener location, on all 3 axes. Peaks are more likely from the speaker frequency
response or room ringing, and are OK to hammer down with an equalizer.
Moving the woofers to different locations in the room can be
the best way to even out the lower mid/upper bass frequency response issues,
which is why I often prefer the Satellite/Subwoofer arrangement. Adding more
woofers that are physically distant from one another, is also one of the better
ways to even out the lower mid and bass frequency response anywhere in the room.
I don't recommend putting woofers in a 3 surface corner. You
get more effective output level (reinforcement), but it stimulates the room
acoustic effects more, usually causing a more uneven perceived frequency response
at the listening chair.
As you can see, acoustics are a very complicated variable. How
speakers interact with listening room acoustics is often the weakest link of
any playback system. Even weaker than the speaker itself.