Acoustics 101

(c) Copyright, Robert B. Richards, 2016


How We Hear in Rooms

The human ear-brain mechanism perceives sound in many different ways depending on frequency.

Sound travels at about 1128 feet per second at about 20 degrees C (68F). The speed changes with temperature, but not by much. To figure out the size of a wavelength when you know the frequency, you just divide speed by frequency, or 1128/f. So the wavelength of 100Hz would be 1128/100 = 11.28 feet. The wavelength of 7kHz would be 1128/7000 = 0.161 feet or 1.93 inches. At 20kHz the wavelength is 0.677 inches.
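As a rough sketch of the arithmetic above, the same division can be written in a few lines of Python. The function names are just illustrative, and the 1128 ft/s figure is the approximate room-temperature value used in this article.

```python
# Approximate speed of sound in air at about 20 degrees C, in feet per second.
SPEED_OF_SOUND_FT_S = 1128.0


def wavelength_ft(frequency_hz):
    """Return the wavelength in feet for a frequency in Hz."""
    return SPEED_OF_SOUND_FT_S / frequency_hz


def wavelength_in(frequency_hz):
    """Return the wavelength in inches for a frequency in Hz."""
    return wavelength_ft(frequency_hz) * 12.0


for f in (100, 7000, 20000):
    print(f"{f} Hz: {wavelength_ft(f):.3f} ft = {wavelength_in(f):.3f} in")
```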

In a typical living room:

High treble (6kHz - 20kHz) is very directional energy, and is therefore usually thought of as a vector. The higher in frequency you go, the more likely the energy will get absorbed by typical furniture. It will bounce off anything that has a hard surface. When reflected (delayed) sound reaches the listener's ear and combines (adds) with direct sound from the speaker, it creates what's called a "comb filter" effect. At the frequency where the delay is a half wavelength, there will be a cancellation of energy. There will also be cancellations at the odd multiples of that "fundamental" cancellation frequency (where the delay is 1.5, 2.5, 3.5, etc. wavelengths), and reinforcement at the even multiples (where the delay is a whole number of wavelengths). If a half wavelength delay were to cause a fundamental cancellation at 1kHz (a reflection path about 0.56 ft. longer than the direct path), there would also be cancellations at 3k, 5k, 7k, etc., with peaks at 2k, 4k, 6k, etc. The depth of the cancellations depends on the hardness of the reflecting surfaces involved, but is often 10dB or more. It is usually less at the highest frequencies due to absorption. In a typical room there will be many reflection paths, with randomly different delay times, and therefore randomly different cancellation frequencies. When many different reflection path energies combine (add) at the listener's ear, the cancellations of any given reflection path will likely get largely filled in by the energy of other reflection paths, which will have different cancellation frequencies. With enough different reflections reaching the ear, you end up with a relatively shallow ripple in the perceived frequency response, rather than the large cancellations that you would get if there were only a few reflection paths. When it comes to sensing the location of a sound in the higher frequencies (above about 6kHz), the wavelengths are so small that the size and shape of our outer ear come into play and help us determine height as well. I believe this is a learned perception, somewhat based on a comparison to the perception of the lower frequency energies.
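To make the comb-filter arithmetic concrete, here is a minimal sketch for a single reflection. It assumes the standard result that a single delayed copy cancels the direct sound wherever the delay equals an odd number of half periods. The function name and the 0.564 ft example (half a wavelength at 1kHz) are illustrative choices.

```python
# Approximate speed of sound in air at about 20 degrees C, in feet per second.
SPEED_OF_SOUND_FT_S = 1128.0


def comb_null_frequencies(extra_path_ft, max_hz=20000.0):
    """List the cancellation frequencies for ONE reflection.

    extra_path_ft is how much longer the reflected path is than the
    direct path. Nulls fall at odd multiples of 1 / (2 * delay).
    """
    delay_s = extra_path_ft / SPEED_OF_SOUND_FT_S
    first_null_hz = 1.0 / (2.0 * delay_s)
    nulls = []
    k = 0
    while (2 * k + 1) * first_null_hz <= max_hz:
        nulls.append((2 * k + 1) * first_null_hz)
        k += 1
    return nulls


# A path about 0.564 ft longer than the direct path puts the first null near 1 kHz.
print([round(f) for f in comb_null_frequencies(0.564, max_hz=8000.0)])
```

With many reflections of different lengths, each one has its own list of nulls, which is why they tend to fill each other in.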

Here's a theoretical example of what happens when you combine only one reflection with a direct sound path. My first-hand experience showed that it can be significantly worse than this theoretical example, even in a carpeted room.
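A theoretical single-reflection response like this can be sketched numerically. This is only an idealized model, assuming the reflection is a delayed copy of the direct sound scaled by a fixed gain. The 0.5ms delay and the 0.7 reflection gain below are assumed example values, not measurements.

```python
import cmath
import math


def comb_response_db(frequency_hz, delay_s, reflection_gain):
    """Level in dB (relative to direct sound alone) of direct + one reflection."""
    # The reflection is modeled as the direct signal delayed and scaled.
    phase = -2j * math.pi * frequency_hz * delay_s
    combined = 1.0 + reflection_gain * cmath.exp(phase)
    return 20.0 * math.log10(abs(combined))


delay_s = 0.0005  # assumed: half a period at 1 kHz
gain = 0.7        # assumed: reflection arrives at 70% of the direct amplitude

for f in (500, 1000, 2000, 3000):
    print(f"{f} Hz: {comb_response_db(f, delay_s, gain):+.2f} dB")
```

With these assumed values the dips come out a little over 10dB deep and the peaks a little under 5dB high. A harder surface (gain closer to 1) deepens the dips quickly.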

This comb-filter concept applies to all frequencies, not just the high treble. Putting acoustically absorptive material on walls, away from corners, can actually make things worse in many cases, because it reduces the number of random reflections, each of which can help fill in the others' comb filter cancellations. The best place to put absorptive materials appears to be in both two-surface and three-surface corners.

Upper-midrange (1kHz - 6kHz) is less directional than the higher frequencies, but is still substantially directional. It will have virtually the same multi-path comb-filter effects and potentially significant acoustic issues as above. This is the frequency range where the ear is most sensitive, especially at lower levels (see Fletcher-Munson graph below). This is also the frequency range where most of the "stereo effect" is perceivable by the human ear-brain mechanism, since inter-aural crosstalk confuses imaging in the frequencies below about 1kHz in a standard 2 speaker playback system. In this frequency range, we sense stereo image location by amplitude comparisons, rather than timing (or phase) comparisons (as in the lower frequencies). This is because above about 1kHz, the half-wavelength becomes shorter than the distance between our ears, so the brain has no way of knowing which period of the waveshape it's comparing. To get the best stereo effect and imaging, you want the best possible amplitude balance over this frequency range between the left and right speakers.
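The "about 1kHz" figure can be checked with the same wavelength arithmetic. In the sketch below, the ear spacing values are assumptions for illustration only (the article does not give a number), and the function name is made up for this example.

```python
# Approximate speed of sound in air at about 20 degrees C, in feet per second.
SPEED_OF_SOUND_FT_S = 1128.0


def half_wavelength_crossover_hz(ear_spacing_in):
    """Frequency at which half a wavelength equals the given ear spacing."""
    speed_in_per_s = SPEED_OF_SOUND_FT_S * 12.0
    return speed_in_per_s / (2.0 * ear_spacing_in)


# Assumed ear spacings in inches, roughly spanning typical adult heads.
for spacing in (6.0, 7.0, 8.0):
    print(f"{spacing} in: {half_wavelength_crossover_hz(spacing):.0f} Hz")
```

For spacings in this assumed range the crossover lands within a few hundred Hz of 1kHz, which fits the rough figure used above.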

This is the Fletcher-Munson graph. It is an average of a whole bunch of people who were tested way back in time (the curves were published in 1933).

It shows how the human ear sensitivity changes with both frequency and amplitude.

At low levels, it's harder to hear the bass and high treble.

This varies from person to person, but is considered a good average of many people.

If a speaker had the above frequency response, we would think it had equal loudness at all audio frequencies.

A typical listening level of about 80dB (at 1kHz) means you need to turn up the bass by almost 20dB, in order to perceive it as having good balance.

You would also want to turn down the upper midrange frequencies (centered around 3-4kHz) by roughly 5dB.

This is usually compensated for in the recording process to some extent. As you can see, it varies significantly with level.

Many Hi-Fi enthusiasts think a minimalist flat frequency response preamp design with no tone controls is better. They are sadly mistaken.

I find that a 4 section Baxandall tone control circuit is an excellent way to go. It allows me to dial in just the right "Loudness Compensation" effect, and deal with room acoustics issues to some extent. It can also greatly improve some recordings that were poorly mastered.

Lower midrange (100Hz - 1kHz). Below about 1kHz, we perceive sound image location more by timing (or phase) comparisons than by amplitude comparisons. A true "binaural" stereo recording that gives almost perfect wideband imaging when listened to with headphones won't give good imaging in this frequency range with the conventional 2 speaker stereo setup, due to inter-aural crosstalk. Inter-aural crosstalk needs to happen only once, either at the recording end or at the playback end. Otherwise the ear-brain mechanism gets confused for frequencies below about 1kHz. To some extent, listening room reflections can sometimes create a false sense of space in this frequency range, but any embedded imaging cues in the program material will be hard for the ear-brain mechanism to interpret if the inter-aural crosstalk happens twice: once in the recording process, and then again with different delay timings when listened to in the conventional 2 speaker setup.

When you get down below about 400Hz, the energy becomes substantially more diffractive (bends around corners more easily), and room reflections work in a different way. Instead of analyzing the wave behavior as vectors, it's considered more effective to analyze the energy as "pressure waves". The overall size and shape of the room will color this frequency range more than specific reflection paths. Because of the size of the wavelengths and diffraction, there are often only a few substantially effective reflection paths below about 400Hz in a typical living room in a modest house or apartment, so cancellations are less likely to get filled in by alternate reflection paths, as happens more in the higher frequencies. Big cancellation dips in the lower frequencies are usually what cause lower midrange and upper bass to sound "boomy". Positioning speakers away from walls, and especially corners, will usually minimize this boominess effect. Lower frequencies will then get less reinforcement from walls (so you'll then probably want to turn up the bass using tone controls). The speaker system may have a ruler flat frequency response when measured in an anechoic chamber (no room reflections at any frequency), but at the listening position in a typical living room, it's often pretty bad.

Here's one example of how a typical listening room destroyed the frequency response of a highly accurate speaker system, as measured at the listening chair.

A high end speaker system


Its frequency response measured up close (similar to what you would get if it were tested in an anechoic chamber).


Same speaker measured further back, in a relatively typical listening room, with the calibrated mic about where a typical listener would be located (about 8 feet out in this example).

The blue curve is the actual measurement, the orange curve is believed to be how the typical ear-brain mechanism perceives it.

Bottom Line: How a speaker interacts acoustically with the listening room is one of the most important things to worry about in any playback system.

Low bass (20Hz - 100Hz). Because the ear-brain mechanism is less sensitive to these frequencies (see Fletcher-Munson graph), it is typical that most of the energy in a musical performance will be in the lower mid and bass frequencies, so that we perceive it as being in balance with the higher frequency energies. These "pressure wave" energies are much less directional. The wavelengths are between about 11 feet and 56 feet long, and are usually substantially reinforced by typical room boundaries (walls, large heavy furniture, etc.). Many speakers have a roll-off in the frequency response below about 80Hz. Better ones get closer to 40Hz (about the lowest note on a 4 string bass guitar). When a speaker is acoustically relatively flat down to 25-30Hz, it can sound much better, not just for the extra low bass notes, but because all the higher frequency energy in real-world music comes in "envelopes" that contain significant energy down to near DC. You can see this for yourself on a real-time analyzer. A sense of "presence" is increased. Drums sound much better IMO.



Resonance usually occurs where two parallel surfaces or walls cause "ringing" when acoustic energy is fed into the space between them, at frequencies where a whole number of half wavelengths fits exactly between the walls. Those frequencies will effectively be amplified. Using active EQ to try and bring up cancellations is pretty much always the wrong thing to do (it can create peaks in other locations in the same room and usually only makes things worse), but hammering down resonant frequency issues with active EQ is often a good idea. Because a resonance elongates the time of the envelope in real world music, it's arguable that any resonance should actually be EQ'd to a slightly negative level (maybe 3-6dB), relative to the rest of the frequency response, to compensate for how the ear-brain mechanism is going to perceive those elongated resonant frequencies. A really good listening room would be designed to have few or no parallel walls.
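For two parallel walls, the frequencies where half wavelengths fit between them can be estimated with the usual standing-wave relation f = n x speed / (2 x spacing). The sketch below is a simplified one-dimensional estimate only. The 12 ft wall spacing is an assumed example, and the function name is made up for illustration.

```python
# Approximate speed of sound in air at about 20 degrees C, in feet per second.
SPEED_OF_SOUND_FT_S = 1128.0


def parallel_wall_resonances_hz(wall_spacing_ft, count=5):
    """First `count` resonant frequencies between two parallel walls.

    Resonance n occurs when n half wavelengths fit between the walls.
    """
    fundamental_hz = SPEED_OF_SOUND_FT_S / (2.0 * wall_spacing_ft)
    return [n * fundamental_hz for n in range(1, count + 1)]


# Assumed example: walls 12 ft apart.
print(parallel_wall_resonances_hz(12.0))
```

A real room also has resonances involving more than one pair of surfaces, so treat this as a starting point for where to look with a measurement mic, not a full prediction.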

Most listening rooms will substantially damage the frequency response of any speaker by the time the sound gets to the listener's ear. This is especially true of smaller rooms, because acoustic energy falls off with the square of the distance. Twice the distance might give you approximately 1/4 the energy level (if the energy source is a point source, which it usually effectively is).
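The "twice the distance, about 1/4 the energy" figure is the inverse-square behavior of an ideal point source in free space. A minimal sketch under that idealized assumption (function names are illustrative, and the 4 ft and 8 ft distances are just example numbers):

```python
import math


def energy_ratio(near_ft, far_ft):
    """Energy at far_ft relative to near_ft for an ideal point source."""
    return (near_ft / far_ft) ** 2


def level_change_db(near_ft, far_ft):
    """Level change in dB when moving from near_ft to far_ft."""
    return 10.0 * math.log10(energy_ratio(near_ft, far_ft))


print(energy_ratio(4.0, 8.0))               # doubling the distance
print(round(level_change_db(4.0, 8.0), 2))  # about a 6 dB drop
```

In a small room a reflection may travel only a little farther than the direct sound, so by this estimate it arrives only a few dB down, before counting any absorption at the surface.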

There's both the comb-filter effects described above, and room resonances described here. They are completely different mechanisms. Only the latter rings, and has a start-up time and a decay time.

A transient or very short-lived note may not last long enough to get a room ringing audibly, but musical notes held for any significant time will.

Because of this, fixing a room resonance problem with EQ is tricky. A compromise is necessary.

Corners appear to "ring" the most, and are therefore arguably the most important place to put sound absorptive materials (from my experience).

I found that nailing 2 inch cotton rope (soft foam rubber would work too) in the corners of the listening room reduced room ringing substantially, using the hand clap test.

If it's practical, I recommend putting acoustically absorptive materials in the corners (2 surface and 3 surface corners).

Ringing in the midrange frequencies (500Hz - 10kHz) can get very tedious to listen to over time. You may not realize the degree to which it is happening.

My living room with absorptive material nailed into the corners.

I used 2 inch cotton rope, and later found it to be very flammable. Most foam rubber has fire retardant in it, so it is much safer.


"Room Power Response" (what the room reflects to the listener - everything except the direct path from speaker to ear). It's highly dependent on a speaker's off-axis frequency response and where in the room the speakers are located. At higher frequencies more of the energy is directional, so it might go straight to the listening position without much interaction with the room acoustics. The lower you go in frequency, the more the energy diffracts and the more it is affected by the room boundaries (walls and large furniture).

In a multi-way speaker system, there can be an abrupt change in off-axis projection of the sound due to transducer size. This will be reflected in the room "power response". If, for example, a 10 inch woofer crosses over to a 1 inch tweeter at 2kHz, the speaker system will have a significantly uneven off-axis frequency response. At 2kHz a 10 inch driver will be very directional, and a 1 inch dome tweeter will have a much wider dispersion pattern. At 2kHz the off-axis frequency response will jump up by several dB. It's arguable that you always want to use the driver with the smallest diameter that can also handle the acoustic power output that you will want.
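One common way to put a rough number on driver directivity is the "ka" value: the driver's circumference divided by the wavelength. As a rule of thumb (an assumption brought in here, not something stated in this article), a piston-like driver starts to narrow its dispersion somewhere around ka = 1 to 2. The sketch below uses nominal driver diameters, which overstate the actual cone size, and the function name is illustrative.

```python
import math

# Approximate speed of sound at about 20 degrees C, in inches per second.
SPEED_OF_SOUND_IN_S = 1128.0 * 12.0


def ka(diameter_in, frequency_hz):
    """Circumference-to-wavelength ratio for a circular driver."""
    radius_in = diameter_in / 2.0
    return 2.0 * math.pi * frequency_hz * radius_in / SPEED_OF_SOUND_IN_S


# The 10 inch woofer and 1 inch tweeter at the 2 kHz crossover from the example above.
print(f"10 in woofer at 2 kHz: ka = {ka(10.0, 2000.0):.2f}")
print(f"1 in tweeter at 2 kHz: ka = {ka(1.0, 2000.0):.2f}")
```

By this estimate the woofer is well into its directional range at 2kHz while the tweeter is still radiating widely, which is the mismatch described above.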

If you are building a 3 way tri-amp'd 24dB/octave actively crossed over speaker system, and you want your midrange driver to be "competent" IMO down to 100Hz for example, I'd recommend at least a 5 inch driver for typical small room use. Two 5 inch drivers are significantly better. If you only need the midrange driver to go down to 500Hz, a 3 inch driver (with actual cone diameter of about 2 inches) would be my choice. This way you get a reasonable max SPL (loudness level), while maintaining the best off-axis response that's practical.

Because the ear-brain mechanism "compresses" the dynamic range, and because acoustic energy drops off with the square of the distance (twice as far away might be 1/4 the power, depending on a bunch of variables), the acoustic "signature" of a room will be significantly more of an issue when the room is smaller. The reflection paths aren't nearly as long as they would be in a larger room, so the reflected energy is not as attenuated by travel distance as it would be in the larger room. So on one hand it's better to have room boundary reflections be distant from both the speakers and the listening position, so there is less interaction. But at the same time, especially at the mid and higher frequencies, having many off-axis room reflections can help fill in whatever comb filter cancellations exist. It's a tradeoff. But less interaction with the room acoustics usually gives a better result.

If a room is so big that reflections fall in the range of 50ms - 150ms, you get intelligibility problems (according to David Griesinger, formerly of Lexicon). If the room is rectangular, the other dimension should NOT be an exact multiple of the first (shortest) dimension, or acoustic problems double up. With ratios of 1:1.4 or 1:1.62, the comb filter cancellations and possible resonance effects will largely be spread out in frequency, with minimal double-ups at both the fundamental frequencies and the harmonics of those frequencies. There may be other ratios that are good too, but these are the ones I know of. The 1:1.62 ratio is approximately the golden ratio, which shows up in nature a lot.
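The "double up" problem can be illustrated by listing the wall-to-wall resonances for two room dimensions and counting how many land on (nearly) the same frequency. This is a simplified sketch that only looks at resonances along each dimension separately. The 10 ft short dimension, the 300Hz limit, and the 1Hz tolerance are assumed example values, and the function names are made up for illustration.

```python
# Approximate speed of sound in air at about 20 degrees C, in feet per second.
SPEED_OF_SOUND_FT_S = 1128.0


def axial_modes(length_ft, max_hz):
    """Resonances along one room dimension, up to max_hz."""
    fundamental_hz = SPEED_OF_SOUND_FT_S / (2.0 * length_ft)
    modes = []
    n = 1
    while n * fundamental_hz <= max_hz:
        modes.append(n * fundamental_hz)
        n += 1
    return modes


def coincident_modes(short_ft, ratio, max_hz=300.0, tolerance_hz=1.0):
    """Resonances of the short dimension that coincide with the long one."""
    short_modes = axial_modes(short_ft, max_hz)
    long_modes = axial_modes(short_ft * ratio, max_hz)
    return [
        fs for fs in short_modes
        if any(abs(fs - fl) <= tolerance_hz for fl in long_modes)
    ]


for ratio in (2.0, 1.4, 1.62):
    hits = coincident_modes(10.0, ratio)
    print(f"1:{ratio} -> {len(hits)} double-ups below 300 Hz")
```

With these assumed numbers, the 1:2 room doubles up on every resonance of the short dimension, the 1:1.4 room only lines up once (near 282Hz, because 1.4 = 7/5), and the 1:1.62 room has none within the tolerance.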

Understanding these issues helps a lot when trying to improve things, but real world room acoustics are riddled with variables. Improving acoustics of a room is usually experimental and difficult.

Most "acoustically absorptive" material (foam, fiberglass insulation, thick felt padding, etc.) works pretty well at high frequencies, but usually does little at low frequencies (below about 400Hz), unless it has significant mass attached to it.

To make a difference at low frequencies the "acoustic material" must have significant mass (physical weight). Multiplex movie theaters have concrete cinder-block walls separating the theater rooms from each other as an effort to block bass frequency energy from leaking into the adjacent theater rooms. Despite that, you can still hear some bass from adjacent theater rooms sometimes (the movie next door has bombs going off, for ex.). They use cinder blocks because of their weight.


Sidewall Reflections will create a sense of spaciousness for most recordings, but can be tedious to listen to over time. It never changes with the program material. It also blurs or dominates any stereo imaging cues that might be embedded in the program material. So less "fidelity" but some people like it. (I prefer to keep it minimal).

Floor and/or ceiling bounce are often some of the more damaging mechanisms because there's no furniture in the way that might attenuate or randomize the energies. Especially ceiling bounce in the typical flat ceiling room with a carpeted floor. Therefore, making a speaker more directional on the vertical axis is usually a good thing. Vertical line array speakers are substantially more directional on the vertical axis, so have less of a problem with this.

Reflections off the front wall of a listening room (behind the speakers you are facing) are actually enjoyable if the delay of the reflection is greater than 6ms, meaning the speaker must be out from the wall at least 3 feet for this to be a plus. It's a "psycho-acoustic" effect. This is one of the reasons why many people like the experience of open-baffle speakers positioned 3ft.+ out from any walls. Although experts (such as Linkwitz) agree on the 6ms number, I suspect that it may vary some with frequency. Reflections in the range of 6ms - 20ms are considered helpful because they help decorrelate imaging cues, which effectively enlarges the "sweet spot" area and creates more of a sense of depth. Open baffle speakers can be difficult to use in smaller rooms.
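The link between the 6ms figure and the distance from the wall can be sketched as follows. This assumes the simplest geometry: the sound goes straight back to the wall and straight forward again, so the extra path is about twice the speaker-to-wall distance. Real paths are somewhat longer at an angle, and the function names are illustrative.

```python
# Approximate speed of sound in air at about 20 degrees C, in feet per second.
SPEED_OF_SOUND_FT_S = 1128.0


def front_wall_delay_ms(distance_from_wall_ft):
    """Approximate delay of the front-wall reflection, in milliseconds."""
    extra_path_ft = 2.0 * distance_from_wall_ft  # assumed: straight back and forth
    return 1000.0 * extra_path_ft / SPEED_OF_SOUND_FT_S


def distance_for_delay_ft(delay_ms):
    """Speaker-to-wall distance needed for a given reflection delay."""
    return (delay_ms / 1000.0) * SPEED_OF_SOUND_FT_S / 2.0


print(f"3 ft out: {front_wall_delay_ms(3.0):.2f} ms")
print(f"6 ms needs: {distance_for_delay_ft(6.0):.2f} ft")
```

Under this simple assumption, 3 feet gives a bit over 5ms, and a full 6ms needs closer to 3.4 feet, so "at least 3 feet" is best read as a round minimum.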

If you decide to use a graphic or parametric EQ to even out the response at the listening position (not necessarily recommended above about 500Hz), you can attenuate (reduce) peaks, but it's a mistake to try and pull up cancellations. It doesn't really work and causes problems. It creates peaks elsewhere in the room. Cancellations are totally dependent on mic, speaker and/or listener location, on all 3 axes. Peaks are more likely from the speaker frequency response or room ringing, and are OK to hammer down with an equalizer.

Moving the woofers to different locations in the room can be the best way to even out the lower mid/upper bass frequency response issues, which is why I often prefer the Satellite/Subwoofer arrangement. Adding more woofers that are physically distant from one another, is also one of the better ways to even out the lower mid and bass frequency response anywhere in the room.

I don't recommend putting woofers in a 3 surface corner. You get more effective output level (reinforcement), but it stimulates the room's acoustic effects more, usually causing a more uneven perceived frequency response at the listening chair.


As you can see, room acoustics involve a lot of complicated variables. How speakers interact with listening room acoustics is often the weakest link of any playback system. Even weaker than the speaker itself.