Acoustics 101

(c) Copyright, Robert B. Richards, 2016


How We Hear in Rooms

The human ear-brain mechanism perceives sound in many different ways depending on frequency.

Sound travels at about 1128 feet per second at about 20 degrees C (68 F). The speed changes somewhat with temperature, but not much. To find the wavelength when you know the frequency, divide the speed by the frequency: 1128/f. So the wavelength at 100 Hz is 1128/100 = 11.28 feet. The wavelength at 7 kHz is 1128/7000 = 0.161 feet, or 1.93 inches. At 20 kHz the wavelength is 0.677 inches.
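As a rough sketch, the arithmetic above can be written in a few lines of Python. The function name and constant are made up for illustration; the only input taken from the text is the 1128 ft/s figure.

```python
# Speed of sound used in the text (about 20 degrees C); an approximation.
SPEED_OF_SOUND_FT_PER_S = 1128.0

def wavelength_ft(frequency_hz):
    """Wavelength in feet = speed of sound / frequency."""
    return SPEED_OF_SOUND_FT_PER_S / frequency_hz

# Reproduce the three examples from the text.
for f in (100, 7000, 20000):
    feet = wavelength_ft(f)
    print(f"{f} Hz: {feet:.3f} ft ({feet * 12:.3f} in)")
```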

In a typical living room:

High treble (6 kHz - 20 kHz) is very directional energy, and is therefore usually thought of as a vector. The higher in frequency you go, the more likely the energy will get absorbed by typical furniture. It will bounce off anything that has a hard surface.

When reflected (delayed) sound reaches the listener's ear and combines (adds) with direct sound from the speaker, it creates what's called a "comb filter" effect. At the frequency where the delay is a half wavelength, there will be a cancellation of energy. There will also be cancellations at the odd multiples of that "fundamental" cancellation frequency, where the delay is again an odd number of half wavelengths. If a half-wavelength delay were to cause a fundamental cancellation at 1 kHz (a reflection path about 0.56 ft longer than the direct path), there would also be cancellations at 3 kHz, 5 kHz, 7 kHz, etc., with reinforcement in between at 2 kHz, 4 kHz, etc. The depth of the cancellations depends on the hardness of the reflecting surfaces involved, but is often 10 dB or more. It is usually less at the highest frequencies due to absorption.

In a typical room there will be many reflection paths, with randomly different delay times, and therefore randomly different cancellation frequencies. When many different reflection path energies combine (add) at the listener's ear, the cancellations of any given reflection path will likely get largely filled in by the energy of other reflection paths, which have different cancellation frequencies. With enough different reflections reaching the ear, you end up with a relatively shallow ripple in the perceived frequency response, rather than the large cancellations you would get if there were only a few reflection paths.

When it comes to sensing the location of a sound in the higher frequencies (above about 6 kHz), the wavelengths are so small that the size and shape of our outer ear comes into play and helps us determine height as well. I believe this is a learned perception, somewhat based on a comparison to the perception of the lower frequency energies.

Here's a theoretical example of what happens when you combine only one reflection with a direct sound path. My first-hand experience showed that it can be significantly worse than this theoretical example, even in a carpeted room.
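For readers who want to reproduce a single-reflection case numerically, here is a minimal sketch. It assumes an idealized model: the direct sound has unit amplitude and the reflection is a delayed copy scaled by a frequency-independent gain. The function names, the 0.564 ft extra path, and the 0.7 gain are illustrative assumptions, not measurements. In this model the cancellations fall where the delay is an odd number of half wavelengths.

```python
import cmath
import math

SPEED_OF_SOUND_FT_PER_S = 1128.0  # figure used in the text

def comb_response_db(frequency_hz, extra_path_ft, reflection_gain):
    """Level (dB relative to direct sound alone) of direct + one reflection.

    reflection_gain must be below 1.0, otherwise a perfect null gives log(0).
    """
    delay_s = extra_path_ft / SPEED_OF_SOUND_FT_PER_S
    total = 1 + reflection_gain * cmath.exp(-2j * math.pi * frequency_hz * delay_s)
    return 20 * math.log10(abs(total))

def null_frequencies_hz(extra_path_ft, count):
    """First `count` cancellation frequencies: odd multiples of 1 / (2 * delay)."""
    delay_s = extra_path_ft / SPEED_OF_SOUND_FT_PER_S
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

# Hypothetical case: reflection path 0.564 ft longer, reflection at 0.7x amplitude.
for f in null_frequencies_hz(0.564, 3):
    print(f"null near {f:.0f} Hz: {comb_response_db(f, 0.564, 0.7):.1f} dB")
print(f"peak near 2000 Hz: {comb_response_db(2000, 0.564, 0.7):.1f} dB")
```

With these assumed numbers the nulls land near 1 kHz, 3 kHz and 5 kHz, each a little over 10 dB deep, which is in line with the depth mentioned above for hard surfaces.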

This comb-filter concept applies to all frequencies, not just the high treble. Putting acoustically absorptive material on walls, away from corners, can actually make things worse in many cases, because it reduces the number of random reflections, each of which can help fill in the others' comb filter cancellations. The best place to put absorptive materials appears to be in both two-surface and three-surface corners.
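The "filling in" idea can be illustrated with a very small sketch. It only shows that a second reflection with a different delay can partly fill the first reflection's null at one chosen frequency; it does not prove that the overall ripple is always shallower. The path lengths and gains are invented for illustration, and the model (delayed, scaled copies summed with the direct sound) is an idealization.

```python
import cmath
import math

SPEED_OF_SOUND_FT_PER_S = 1128.0  # figure used in the text

def response_db(frequency_hz, reflections):
    """Level in dB (relative to direct sound alone) of direct sound plus reflections.

    reflections is a list of (extra_path_ft, gain) pairs; all values here are
    hypothetical.
    """
    total = 1 + 0j  # direct sound
    for extra_path_ft, gain in reflections:
        delay_s = extra_path_ft / SPEED_OF_SOUND_FT_PER_S
        total += gain * cmath.exp(-2j * math.pi * frequency_hz * delay_s)
    return 20 * math.log10(abs(total))

one_path = [(0.564, 0.7)]                  # null at 1 kHz
two_paths = [(0.564, 0.7), (1.128, 0.5)]   # second path is in phase at 1 kHz

for label, paths in (("one reflection", one_path), ("two reflections", two_paths)):
    print(f"{label}: {response_db(1000, paths):.1f} dB at 1 kHz")
```

With these made-up values the 1 kHz dip goes from roughly -10.5 dB with one reflection to roughly -1.9 dB with two.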

Upper-midrange (1 kHz - 6 kHz) is less directional than the higher frequencies, but is still substantially directional. It will have virtually the same multi-path comb-filter effects and potentially significant acoustic issues as above. This is the frequency range where the ear is most sensitive, especially at lower levels (see Fletcher-Munson graph below). This is also the frequency range where most of the "stereo effect" is perceivable by the human ear-brain mechanism, since "inter-aural crosstalk" confuses imaging at frequencies below about 1 kHz in a standard 2 speaker playback system. In this frequency range, we sense stereo image location by amplitude comparisons, rather than timing (or phase) comparisons (as in the lower frequencies). This is because above about 1 kHz, the half-wavelength becomes shorter than the distance between our ears, so the brain has no way of knowing which period of the waveform it's comparing. To get the best stereo effect and imaging, you want perfect amplitude balance over this frequency range between the left and right speakers.
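The "about 1 kHz" figure can be checked with a quick calculation. This sketch assumes straight-line ear spacings of 6 to 8 inches, which are illustrative guesses rather than values from the text, and uses the 1128 ft/s speed of sound given earlier.

```python
SPEED_OF_SOUND_IN_PER_S = 1128.0 * 12  # 1128 ft/s expressed in inches per second

def half_wavelength_crossover_hz(ear_spacing_in):
    """Frequency at which half a wavelength equals the given ear spacing."""
    return SPEED_OF_SOUND_IN_PER_S / (2 * ear_spacing_in)

# Assumed ear spacings in inches (hypothetical; real heads vary).
for spacing in (6.0, 7.0, 8.0):
    print(f"{spacing} in: {half_wavelength_crossover_hz(spacing):.0f} Hz")
```

For these spacings the crossover lands between roughly 850 Hz and 1130 Hz, consistent with the rounded 1 kHz figure.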

Fletcher-Munson graph. This varies from person to person, but is considered a good average of many people.

If a speaker had this frequency response, we would think it had equal loudness at all audio frequencies.

A typical listening level of about 80 dB (at 1 kHz) means you need to turn up the bass by almost 20 dB in order to perceive it as having good balance.

You would also want to turn down the upper midrange frequencies (centered at 4 kHz) by about 10 dB.

This is usually compensated for in the recording process to some extent. As you can see, it varies significantly with level.

Do you still want "flat" (frequency response) speakers and no tone controls in your preamp?


Lower midrange (100 Hz - 1 kHz). Below about 1 kHz, we perceive sound location more by timing (or phase) comparisons than by amplitude comparisons. A true "binaural" stereo recording that gives almost perfect wideband imaging when listened to with headphones won't give good imaging in this frequency range with the conventional 2 speaker stereo setup, due to inter-aural crosstalk. Inter-aural crosstalk should happen only once, either at the recording end or at the playback end. Otherwise the ear-brain mechanism gets confused. To some extent, listening room reflections can create a false sense of space in this frequency range, but any embedded imaging cues in the program material will be hard for the ear-brain mechanism to interpret if the inter-aural crosstalk happens twice: once in the recording process, and then again with different timings when listened to in the conventional 2 speaker setup.

When you get down below about 400 Hz, the energy becomes substantially more diffractive (it bends around corners easily), and room reflections work in a different way. Instead of analyzing the wave behavior as vectors, it's considered more effective to analyze the energy as "pressure waves". The overall size and shape of the room will color this frequency range more than specific reflection paths. Due to the size of the wavelengths and diffraction, cancellations are less likely to get filled in by multiple reflection paths. This is usually what causes lower midrange and upper bass to sound "boomy". Positioning speakers away from walls, and especially corners, will usually minimize this boominess. Lower frequencies will then get less reinforcement from walls (so you'll want to turn up the bass using tone controls), but the room shape and size will color the sound less. The speaker system may have a ruler-flat frequency response when measured in an anechoic chamber (no room reflections at any frequency), but at the listening position in a typical living room, the response is often pretty bad.

Here's one example of how a room destroyed an otherwise very flat speaker system

A high end speaker system


Its frequency response measured up close


Same speaker measured further back, in a relatively typical listening room, with the calibrated mic about where a typical listener would be located (about 8 feet out in this example).

The blue curve is the actual measurement, the orange curve is believed to be how the typical ear-brain mechanism perceives it.

Bottom Line: How a speaker interacts acoustically with the listening room is one of the most important things to worry about.

Low bass (20 Hz - 100 Hz). Because the ear-brain mechanism is less sensitive to these frequencies (see Fletcher-Munson graph), it is typical that most of the energy in a musical performance will be in the lower mid and bass frequencies, so we perceive it as being in balance with the higher frequency energies. These "pressure wave" energies are much less directional. The wavelengths are between about 11 feet and 56 feet long, and are usually substantially reinforced by typical room boundaries (walls). Many speakers have a roll-off in the frequency response below about 80 Hz, and depend on listening room walls to work with the diffraction effect to reinforce (acoustically amplify) this frequency region. The speaker designer often assumes you will put the speakers against a wall. Even with this room boundary reinforcement, most speakers produce little energy below about 40 Hz. When a speaker is acoustically relatively flat down to 25 Hz, it can sound much better, not just for the extra low bass notes, but because all the higher frequency energy in real-world music comes in "envelopes" that contain significant energy down to near DC. A sense of "presence" is increased.



Most listening rooms will substantially damage the frequency response of any speaker by the time the sound gets to the listener's ear, especially smaller rooms.

There's both the comb-filter effects described above, and room resonances. They are completely different mechanisms. Only the latter rings, and has a start-up time and a decay time.

A transient or very short-lived note may not last long enough to get a room ringing, but musical notes held for any significant time will.

Because of this, fixing a room resonance problem with EQ is tricky. A compromise is necessary.

Corners appear to "ring" the most, and are therefore arguably the most important place to put sound absorptive materials (from my experience).

I found that nailing 2 inch cotton rope (soft foam rubber would work too) in the corners of the listening room reduced room ringing substantially, using the hand clap test.

If it's practical, I recommend putting acoustically absorptive materials in the corners (2 surface and 3 surface corners).

Ringing in the midrange frequencies (500 Hz - 10 kHz) can get very tedious to listen to over time. You may not realize the degree to which it is happening.

My living room with absorptive material nailed into the corners.

I used 2 inch cotton rope, and later found it to be very flammable. Most foam rubber has fire retardant in it, so it is much safer.


Room Power Response (what the room reflects to the listener). This is a question of what energy gets reflected or focused toward the listening position of the room. It's highly dependent on a speaker's off-axis frequency response. At higher frequencies more of the energy goes straight to the listener. The lower you go in frequency, the more the energy diffracts and the more it is affected by the room boundaries (walls and large furniture). In a multi-way speaker system, there can be an abrupt change in off-axis projection of the sound at a crossover frequency, due to transducer size, which will be reflected in the room power response. If, for example, a 10 inch woofer crosses over to a 1 inch tweeter at 2 kHz, the speaker system will have a significantly uneven off-axis frequency response. At 2 kHz a 10 inch driver will be very directional, and a 1 inch dome tweeter will have a much wider dispersion pattern.
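One rough way to see the mismatch in the 10 inch / 1 inch example is to compare each driver's diameter with the wavelength at the crossover frequency. The rule of thumb assumed here (a driver starts to beam once its diameter is comparable to or larger than the wavelength) is a common simplification, not a precise directivity model, and the function name is made up for illustration.

```python
SPEED_OF_SOUND_IN_PER_S = 1128.0 * 12  # 1128 ft/s expressed in inches per second

def diameter_in_wavelengths(diameter_in, frequency_hz):
    """Driver diameter divided by the wavelength at the given frequency."""
    wavelength_in = SPEED_OF_SOUND_IN_PER_S / frequency_hz
    return diameter_in / wavelength_in

crossover_hz = 2000  # crossover frequency from the example in the text
for name, diameter in (("10 inch woofer", 10.0), ("1 inch tweeter", 1.0)):
    ratio = diameter_in_wavelengths(diameter, crossover_hz)
    print(f"{name}: {ratio:.2f} wavelengths across at {crossover_hz} Hz")
```

The woofer comes out at roughly one and a half wavelengths across, while the tweeter is only a small fraction of a wavelength, which is why their dispersion differs so much at the same frequency.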

Because the ear-brain mechanism "compresses" the dynamic range, and because acoustic energy drops off with distance, roughly following an inverse-square law (twice as far away might be 1/4 the power, depending on a bunch of variables), the acoustic "signature" of a room will be significantly more of an issue when the room is smaller. The reflection paths aren't nearly as long as they would be in a larger room, so the reflected energy is not as attenuated by travel distance as it would be in the larger room. So on one hand it's better to have room boundary reflections be distant from both the speakers and the listening position, but at the same time you want many random reflections that will fill in each other's comb filter cancellations...
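To put numbers on the distance effect, here is a minimal sketch assuming ideal free-field spreading (1/4 the power at twice the distance, about 6 dB) and a perfectly reflecting surface. Real rooms add absorption and other variables, and the path lengths below are invented for illustration.

```python
import math

def reflection_level_db(direct_path_ft, reflected_path_ft):
    """Level of a reflection relative to the direct sound, from path length alone.

    Assumes inverse-square spreading and no loss at the reflecting surface.
    """
    return -20 * math.log10(reflected_path_ft / direct_path_ft)

# Hypothetical rooms: listener 8 ft from the speaker in both cases.
small_room = reflection_level_db(8.0, 12.0)   # reflection travels 12 ft
large_room = reflection_level_db(8.0, 30.0)   # reflection travels 30 ft
print(f"small room reflection: {small_room:.1f} dB")
print(f"large room reflection: {large_room:.1f} dB")
```

Under these assumptions the small-room reflection arrives only a few dB below the direct sound, while the large-room reflection is more than 10 dB down.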

If a room is so big that reflections arrive with delays in the range of 50 ms - 150 ms, you get intelligibility problems (according to David Griesinger, formerly of Lexicon). If the room is rectangular, the other dimension should NOT be an exact multiple of the first (shortest) dimension, or acoustic problems double up. With ratios of 1:1.4 or 1:1.62, the comb filter cancellations and possible resonance effects will largely be spread out in frequency, with minimal double-ups at both the fundamental frequencies and the harmonics of those frequencies. There may be other ratios that are good too, but these are the ones I know of.
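The "double up" problem can be sketched with the standard textbook formula for axial room modes between two parallel walls, f = n * c / (2 * L). That formula is not given in the text and is used here as an assumption; it also ignores tangential and oblique modes. The 10 ft short dimension, 300 Hz limit and 1 Hz tolerance are arbitrary illustration values.

```python
SPEED_OF_SOUND_FT_PER_S = 1128.0  # figure used in the text

def axial_modes_hz(length_ft, max_hz):
    """Axial mode frequencies n * c / (2 * L) up to max_hz."""
    fundamental = SPEED_OF_SOUND_FT_PER_S / (2 * length_ft)
    modes = []
    n = 1
    while n * fundamental <= max_hz:
        modes.append(n * fundamental)
        n += 1
    return modes

def coincident_modes(short_ft, ratio, max_hz=300.0, tolerance_hz=1.0):
    """Modes of the short dimension that land on a mode of the long dimension."""
    short_modes = axial_modes_hz(short_ft, max_hz)
    long_modes = axial_modes_hz(short_ft * ratio, max_hz)
    return [f for f in short_modes
            if any(abs(f - g) <= tolerance_hz for g in long_modes)]

for ratio in (2.0, 1.4, 1.62):
    hits = coincident_modes(10.0, ratio)
    print(f"1:{ratio}: {len(hits)} coincident modes below 300 Hz")
```

In this simplified model the 1:2 room has every short-dimension mode doubled up, while the 1:1.4 and 1:1.62 rooms have at most one coincidence below 300 Hz.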

Understanding these issues helps a lot when trying to improve things, but real world room acoustics are riddled with variables. Improving acoustics of a room is usually experimental and difficult.

Most "acoustically absorptive" material (foam, fiberglass insulation, thick felt padding, etc.) works pretty well at high frequencies, but usually does little at low frequencies (below about 400 Hz).

To make a difference at low frequencies the "acoustic material" must have significant mass (physical weight). Multiplex movie theaters have concrete cinder-block walls separating the theater rooms from each other as an effort to block bass frequency energy from leaking to the adjacent theater rooms. Despite that, you can still hear some bass from adjacent theater rooms sometimes. They use cinder blocks because of their weight.


Sidewall Reflections will create a sense of spaciousness for most recordings, but can be tedious to listen to over time. It never changes with the program material. It also blurs or dominates any stereo imaging cues that might be embedded in the program material. So less "fidelity" but some people like it. (I prefer to keep it minimal).

Floor and/or ceiling bounce are often some of the more damaging mechanisms because there's no furniture in the way that might attenuate or randomize the energies. Especially ceiling bounce in the typical flat ceiling room with a carpeted floor. Therefore, making a speaker more directional on the vertical axis is usually a good thing. Vertical line array speakers are substantially more directional on the vertical axis, so have less of a problem with this.

Reflections off the front wall of a listening room (behind the speakers you are facing) are actually enjoyable if the delay of the reflection is greater than 6 ms, meaning the speaker must be out from the wall at least about 3 feet for this to be a plus. It's a "psycho-acoustic" effect. This is one of the reasons why many people like the experience of open-baffle speakers positioned 3 ft or more out from any walls. Although experts (such as Linkwitz) agree on the 6 ms number, I suspect that it may vary some with frequency. Reflections in the range of 6 ms - 20 ms are considered helpful because they help decorrelate imaging cues, which effectively enlarges the "sweet spot" area.
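The link between the delay and the distance from the wall can be sketched as follows. This assumes a simplified geometry where the extra path of the front-wall reflection is twice the speaker-to-wall distance (sound travels back to the wall and forward again); the function name is made up for illustration.

```python
SPEED_OF_SOUND_FT_PER_S = 1128.0  # figure used in the text

def wall_distance_ft(delay_ms):
    """Speaker-to-wall distance giving the stated reflection delay.

    Assumes the extra path is simply 2 x the distance to the wall.
    """
    extra_path_ft = SPEED_OF_SOUND_FT_PER_S * delay_ms / 1000.0
    return extra_path_ft / 2.0

for delay in (6.0, 20.0):
    print(f"{delay:.0f} ms delay: about {wall_distance_ft(delay):.2f} ft from the wall")
```

Under this assumption a 6 ms delay corresponds to a little over 3 feet of clearance, and 20 ms to a bit more than 11 feet.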

If you decide to use a graphic or parametric EQ to even out the response at the listening position (not necessarily recommended above about 300 Hz), you can attenuate (reduce) peaks, but it's a mistake to try to pull up cancellations. It doesn't really work and causes problems. It creates peaks elsewhere in the room. Cancellations are totally dependent on mic, speaker and/or listener location, on all 3 axes. Peaks are more likely from the speaker frequency response or room ringing.

Moving the woofers to different locations in the room can be the best way to even out the lower mid/upper bass frequency response issues, which is why I often prefer the Satellite/Subwoofer arrangement. Adding more woofers that are physically distant from one another is also one of the better ways to even out the lower mid and bass frequency response anywhere in the room.

I don't recommend putting woofers in a 3 surface corner. You get more effective output level (reinforcement), but it stimulates the room's acoustic effects more, usually causing a more uneven perceived frequency response.


As you can see, acoustics are a very complicated variable. How speakers interact with listening room acoustics is often the weakest link in any playback system.