Originally Posted by Floyd Toole
This is always a challenge. All perceptual models, however elaborate, attempt to replicate what humans can hear - they do not define what humans can hear. I played with primitive neurological models in my PhD sound localization work in the early '60s.
The simple fact is that humans can hear, and identify the sound of very high Q resonances. In the 1988 paper I referred to earlier, and in my books, it is clear that Q=50 resonances are recognizable and audible, but the thresholds of audibility are higher than for lower Q resonances - as measured in the frequency response. Part of the reason why the thresholds are high is that they occupy a small spectral footprint, meaning that a sound of a quite specific frequency must be present long enough to energize the resonances. It is basic physics. Such sounds are relatively rare in music, compared to lower Q resonances that can be energized by a wider range of frequencies. So, logically, the detectability of resonances is very dependent on the program material. Close miked rock and roll is very forgiving. As the spectral density increases and reverberation is included, thresholds drop. Reflections make us more sensitive to resonances - they are repetitions, giving the listener repeated "looks" at the sound.
Thanks for making a point I've been meaning to make ... that the reason lower Q resonances are more audible is simply that they are more readily energized by real content. The worst case, of course, would be music consisting of long, sustained pure tones. In that situation, we really can "hear" the frequency response as it would be measured at each ear.
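To put rough numbers on "present long enough to energize": for a second-order resonance, the -3 dB bandwidth is f0/Q and the amplitude time constant is Q/(pi*f0), so ring-up time grows in direct proportion to Q. A minimal sketch (the 1 kHz centre frequency is just an illustrative choice of mine):

```python
import math

def resonance_stats(f0_hz: float, q: float):
    """-3 dB bandwidth and amplitude time constant of a 2nd-order resonance.

    tau = Q / (pi * f0) is the exponential amplitude time constant; the
    resonance needs sustained excitation on the order of ~3*tau to ring
    up to ~95% of its steady-state amplitude.
    """
    bandwidth_hz = f0_hz / q
    tau_s = q / (math.pi * f0_hz)
    return bandwidth_hz, tau_s

# Compare a low-Q and a high-Q resonance at the same 1 kHz centre.
for q in (5, 50):
    bw, tau = resonance_stats(1000.0, q)
    print(f"Q={q:>2}: bandwidth = {bw:5.1f} Hz, ~95% ring-up in {3 * tau * 1e3:.1f} ms")
```

So the Q=50 resonance both responds to a tenfold narrower slice of the spectrum and needs roughly ten times longer excitation to ring up, which is why sparse, close-miked material so rarely triggers it.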
Originally Posted by Floyd Toole
The concept that critical bands, ERBn and such are measures of the resolution of the hearing system is faulty. They have meaning, but this is not it. This is discussed in my original book, and more elaborately in the new one.
I look forward to reading your criticism of this concept, though I believe critical band theory actually is very useful. The point to realize is that just because the bands/ERBs may be only 1/3rd to 1/9th octave wide doesn't mean we can't hear frequencies at higher resolution than that. So long as the sampling time interval of the hypothetical cochlear filter system is short enough, the brain has sufficient information to assess frequency much more precisely than the bandwidth of the filters themselves.
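As a toy illustration of that last point (not a model of the cochlea): a single coarse "filter" centred at 1 kHz, observed over two 10 ms frames, pins down a tone's frequency far more finely than the ~100 Hz frame bandwidth, simply by tracking how the phase advances between frames. The sample rate, frame length, and frequencies below are arbitrary choices of mine:

```python
import numpy as np

fs = 48_000          # sample rate, Hz
f_true = 1003.7      # actual tone frequency, Hz
fc = 1000.0          # centre of the coarse "auditory filter"
n_win = 480          # 10 ms frames -> ~100 Hz frame bandwidth

t = np.arange(2 * n_win) / fs
x = np.sin(2 * np.pi * f_true * t)

# Complex demodulation at the filter centre: each frame collapses to one
# complex number whose phase advances at a rate set by the tone frequency.
ref = np.exp(-2j * np.pi * fc * np.arange(n_win) / fs)
z = [np.sum(x[k * n_win:(k + 1) * n_win] * ref) for k in range(2)]

# Phase advance between frames -> frequency offset from the centre,
# unambiguous here for |f_true - fc| < fs / (2 * n_win) = 50 Hz.
dphi = np.angle(z[1] * np.conj(z[0]))
f_est = fc + dphi * fs / (2 * np.pi * n_win)
print(f"estimated {f_est:.2f} Hz despite ~100 Hz analysis frames")
```

The estimate lands within a small fraction of a hertz of the true 1003.7 Hz, i.e., far inside the frame bandwidth, given only 20 ms of signal.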
I suspect we can take away at least one useful piece of information from the critical band experiments: a rough upper bound on the temporal resolution of the hearing system. I believe the actual resolution is frequency dependent and may be approximately represented as a Gaussian time window whose Fourier transform is also a Gaussian with a 1/3rd octave bandwidth. That information can be used, in turn, to estimate the ability of the hearing system to independently resolve direct sound arrivals and reflections at different frequencies.
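Under that assumption (and taking "1/3rd octave bandwidth" to mean the -3 dB full width of the Gaussian magnitude response, which is my own choice of definition), the implied time window follows directly, since the Fourier transform of a Gaussian with time-domain sigma s_t is a Gaussian with frequency-domain sigma 1/(2*pi*s_t):

```python
import math

THIRD_OCTAVE = 2 ** (1 / 6) - 2 ** (-1 / 6)   # ~0.2316 relative bandwidth

def gaussian_time_sigma(f0_hz: float) -> float:
    """Time-domain sigma of a Gaussian window whose magnitude spectrum
    has a -3 dB full width of 1/3rd octave at centre frequency f0.

    FT of exp(-t^2 / (2 s_t^2)) is Gaussian with s_f = 1 / (2 pi s_t),
    and that Gaussian's -3 dB full width is 2 * s_f * sqrt(ln 2).
    """
    width_hz = THIRD_OCTAVE * f0_hz
    s_f = width_hz / (2 * math.sqrt(math.log(2)))
    return 1 / (2 * math.pi * s_f)

for f0 in (100.0, 1000.0, 10_000.0):
    print(f"{f0:7.0f} Hz: sigma_t = {gaussian_time_sigma(f0) * 1e3:6.2f} ms")
```

On this reading, the window's sigma shrinks from roughly 11 ms at 100 Hz to about 0.1 ms at 10 kHz, which is what would let the hearing system resolve reflections independently at high frequencies while blending them with the direct sound at low ones.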
My own experiments, albeit based entirely on my own non-blind subjective judgment, suggest that 1/3rd octave is the optimal bandwidth for assessing the overall spectral balance of sound. Therefore, this is the resolution to use with frequency dependent windows (i.e., complex smoothing, not magnitude smoothing) when fitting the in-room response to a desired target. Much higher resolutions do, of course, remain useful for assessing resonances, but 1/3rd octave is king for overall tonal balance.
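For concreteness, here is a bare-bones sketch of what I mean by complex smoothing with frequency dependent windows: average the complex response (not its magnitude) over a window spanning a fixed fraction of an octave, so the window's width in Hz grows with frequency. The rectangular window and the function name are simplifications of mine:

```python
import numpy as np

def complex_smooth_third_octave(freqs, H):
    """Smooth a complex frequency response H(f) with a window whose
    width scales with frequency (1/3rd octave total span), i.e. a
    constant width on a log-frequency axis.

    Averaging the complex values, rather than |H|, preserves the
    relative weighting of the direct sound and later arrivals.
    """
    out = np.empty_like(H)
    lo_r, hi_r = 2 ** (-1 / 6), 2 ** (1 / 6)   # +/- 1/6 octave
    for i, f0 in enumerate(freqs):
        sel = (freqs >= f0 * lo_r) & (freqs <= f0 * hi_r)
        out[i] = H[sel].mean()
    return out
```

In practice `freqs` and `H` would come from the FFT of a measured impulse response; a tapered (e.g. Gaussian) window would behave more gracefully than the rectangle, but the constant-percentage-bandwidth idea is the same.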
My experiments suggest that the desired target for music is flat through most of the mids and highs, has a slight dip (maybe 1 dB or so) in the low mids/upper bass (very roughly 175-350 Hz), and slopes up several dB as frequency drops into the sub bass range. Presumably this target approximately reflects what happens to the first arrival sound from a "typical" speaker that measures flat under anechoic conditions and is placed near a floor, but far enough away from the other boundaries that the hearing system can easily distinguish their reflections down to much lower frequencies (likely well into the modal region, in many cases). Interference with the floor reflection creates the dip, and the subsequent rise in the sub bass range, in the first arrival sound.
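The frequency of that floor dip is easy to estimate with an image source below the floor: for an in-phase (rigid floor) reflection, the first cancellation falls where the extra path length equals half a wavelength. The geometry below is just one plausible example I made up, and it lands in the dip region I described:

```python
import math

C = 343.0  # speed of sound in air, m/s (~20 C)

def floor_bounce_null(dist_m, src_h_m, ear_h_m):
    """First cancellation frequency of the floor reflection, computed
    from an image source mirrored below the floor. For an in-phase
    reflection the first null is where the path difference is half a
    wavelength: f = c / (2 * (bounced - direct)).
    """
    direct = math.hypot(dist_m, ear_h_m - src_h_m)
    bounced = math.hypot(dist_m, ear_h_m + src_h_m)   # via image source
    return C / (2 * (bounced - direct))

# Assumed geometry: 3 m listening distance, woofer 1.0 m up, ears 1.1 m up.
print(f"first floor-bounce null ~ {floor_bounce_null(3.0, 1.0, 1.1):.0f} Hz")
```

With this geometry the first null comes out around 260 Hz; moving the woofer closer to the floor or the listener further away pushes it lower, which is why the dip region varies so much between setups.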
I can't recall where I found it, but I believe Harman has published a headphone target curve, which was determined using listener experiments. IIRC, its low frequency characteristics look very similar to the curve I describe above, including the slight upper bass / low mid dip.
Of course, in reality, the region of the dip and the rising bass response for an anechoically flat speaker placed in-room will vary substantially with listener distance, the distance between the woofer(s) and the floor, and crossover characteristics. However, mixing and mastering engineers strive for consistency by comparing their work against existing content. In this way, the Circle of Confusion works to our advantage: it reduces the variation in tonal balance that might otherwise arise, even with perfectly flat monitors, from differences in driver orientation, room placement, and listener distance that affect the sound below 500 Hz.