Originally Posted by Floyd Toole
Notes on loudspeaker evaluations.
It is important to realize that the subjective rating scale is elastic. In the beginning of my evaluations, several decades ago, nothing was really good and I adopted a "fidelity" scale, where 10 was the best sound imaginable and 0 was the worst. To maintain some consistency in the subjective ratings it was necessary to provide "anchors" - products that were low scoring as well as some that were high scoring. Winners in these tests became the new "king of the hill". The candidates under test would then be rated within a context of products in general. It worked very well until most of the loudspeakers being tested achieved high ratings. On the scale of 10, these good products would all crowd together at the top of the scale, and differences were not statistically significant.
However, when a group of good products were compared with each other, the ratings expanded to fill the scale - the scale is elastic. But now the rating number cannot be considered in the original context of truly bad to truly good. Instead it is a relative rating, which I chose to call "preference" instead of "fidelity".
Two products that are truly very good, which would sit at the very top of a "fidelity" or "accuracy" scale in the global scheme of all loudspeakers, might, when compared in isolation, generate numbers that suggest a large difference. Claims that product A "blows away" product B are often really just indications of a small, slightly audible difference. The winner is still the winner, but the relative ratings are unrealistic. This is why I suggest adding a third or fourth product into a randomized comparison (blind of course).
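The "elastic scale" effect can be illustrated numerically. Here is a minimal sketch with invented scores: three genuinely excellent speakers cluster near the top of an absolute fidelity scale, but when only these products are compared, listeners implicitly spread them across the whole scale, which is equivalent to min-max rescaling onto 0-10.

```python
# Hypothetical absolute "fidelity" scores for three excellent speakers
# (invented numbers, purely for illustration).
absolute = {"A": 8.9, "B": 9.0, "C": 9.2}

lo, hi = min(absolute.values()), max(absolute.values())

# Comparing only these products in isolation implicitly uses the whole
# scale, which amounts to min-max rescaling onto 0-10.
relative = {k: 10 * (v - lo) / (hi - lo) for k, v in absolute.items()}

print(relative)
# Speaker A collapses to 0.0 and speaker C stretches to 10.0:
# a 0.3-point absolute gap now looks like "blows away".
```

The rescaled numbers are internally consistent as a ranking, but they no longer carry any information about where the group sits on the original bad-to-good scale.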
This is a very good point, and this applies to a wide variety of other studies, scientific or not, that involve subjects assigning a numerical rating. Great care must be taken when analyzing and drawing conclusions from this kind of data because the meaning of the numbers is very sensitive to context.
Originally Posted by Floyd Toole
Having done this for hundreds of products with hundreds of listeners, it is common to find that the very best products end up in a statistical tie. There is a point of diminishing returns. The biggest variable is the program material - the circle of confusion - and its interactions with the different products. As I keep on reminding people, the world's "best" loudspeaker cannot sound good with all recordings.
The good news is that a neutral loudspeaker is substantially recognizable from a spinorama set of measurements. The problem is that such data are scarce.
I would argue that just because two speakers end up in a statistical tie doesn't mean they sound the same. Of course, that's not what you are actually saying here, but I think it may be helpful to clarify for other readers that a statistical tie does not imply equivalence at all. All it really says is that there is not enough information to reach a conclusion one way or the other.
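To make the point concrete, here is a small sketch with invented ratings: two speakers whose sample means differ, yet whose comparison is a "statistical tie" simply because eight listeners cannot resolve a difference that small.

```python
import statistics

# Invented ratings from eight listeners for two speakers, purely
# for illustration.
a = [7.8, 6.9, 7.5, 8.1, 6.6, 7.9, 7.2, 6.8]
b = [7.1, 7.6, 6.5, 7.9, 7.3, 6.2, 7.7, 6.9]

diff = statistics.mean(a) - statistics.mean(b)

# Standard error of the difference of two independent sample means
se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5

# Approximate 95% confidence interval for the true difference
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(f"difference {diff:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# The interval spans zero: a "statistical tie". That does not show the
# speakers sound the same -- only that eight listeners are too few to
# resolve a 0.2-point difference. Absence of evidence is not evidence
# of absence.
```

With more listeners the interval would shrink, and the same underlying difference could become statistically significant.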
Along those lines, I do wonder about something though. Among speakers that rate similarly to one another on average, do you notice any significant preference correlations between speakers and listeners? My thinking is that while a more neutral sound may be universally preferred, preference for particular *flaws* may be much more individual. For example, suppose listeners are asked to compare two obviously flawed speakers, one with bloated bass and one with harsh treble. On average, the two may rate equally, but perhaps some listeners always prefer the bass-heavy speaker and some listeners always prefer the treble-heavy speaker.
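The scenario I have in mind can be sketched with invented numbers: the group averages tie exactly, while every individual listener shows a strong, consistent preference one way or the other.

```python
import statistics

# Invented blind-test ratings for two deliberately flawed speakers:
# "bassy" (bloated bass) and "harsh" (hot treble). Six listeners,
# same listener order in both lists. Purely illustrative.
ratings = {
    "bassy": [8.0, 7.8, 8.2, 5.1, 4.9, 5.3],
    "harsh": [5.2, 5.0, 4.8, 8.1, 7.9, 8.3],
}

# On average the two speakers tie...
print(statistics.mean(ratings["bassy"]), statistics.mean(ratings["harsh"]))

# ...but each individual listener has a clear preference.
prefers_bassy = sum(
    1 for x, y in zip(ratings["bassy"], ratings["harsh"]) if x > y
)
print(f"{prefers_bassy} of 6 listeners prefer the bassy speaker")
```

Averaging across listeners hides exactly the structure the question is about, which is why a per-listener (or per-group) breakdown would be needed to detect it.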
I think this preference for flaws could explain the proliferation of a wide diversity of poor performing audio products. In fact, the culture of the industry seems to encourage it. Audio product consumers are actually encouraged to indulge their preferences for flaws.
Originally Posted by Floyd Toole
Tweeter types? The evidence is in the measurements. Over the years membrane tweeters have shown a tendency to power compress, but this may or may not be true for all of them now. The larger size causes more beaming. Compression drivers have traditionally had great difficulties at very high frequencies, ending up in chaotic breakup. The new designs are greatly superior, so now there is a choice. In the meantime domes have just moved from good to excellent. With the addition of properly designed waveguides to match directivity at crossover good tweeters have become better tweeters. They need to be designed for individual systems though, taking into account midrange size and diffraction effects.
Chaotic breakup is not the only problem with compression drivers, right? Another problem is the inconsistent off-axis behavior of those breakups. Yet another problem, in typical applications involving horns (including the M2), is that the horn does not control dispersion at very high frequencies the way it does lower down.
The M2 has very consistent dispersion from 500 Hz to 10 kHz, but above 10 kHz the pattern appears to narrow substantially. A similar narrowing occurs in the Revel Salon 2, and indeed this is a common characteristic of speakers in general, as transducers are rarely made smaller than about 1". When this occurs, the on-axis and listening window average measurements diverge substantially.
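A back-of-envelope calculation suggests why narrowing above roughly 10 kHz is so common: a rigid piston begins to beam noticeably around the frequency where the radiated wavelength shrinks to the diaphragm diameter. The values below are assumptions for illustration (a nominal 1" dome and c ≈ 343 m/s); the effective radiating diameter of a real "1-inch" dome is usually a bit larger once the surround is included, so narrowing begins somewhat lower.

```python
# Back-of-envelope: a rigid piston beams noticeably around the frequency
# where the wavelength approaches the diaphragm diameter.
# Assumed values: 25.4 mm (1") dome, speed of sound 343 m/s at ~20 C.
c = 343.0    # m/s, speed of sound in air
d = 0.0254   # m, nominal diaphragm diameter of a 1" dome

f_beam = c / d  # Hz, frequency where wavelength equals the diameter
print(f"wavelength = diameter at about {f_beam / 1000:.1f} kHz")
```

So even an ideally behaved 1" dome runs out of wide dispersion somewhere in the top octave, which matches the narrowing visible in both spinoramas.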
When deciding how to voice the speaker, which do you regard as more important? And what about early reflections and power response? I know I'm asking some very difficult questions, because if I look at spinoramas for the M2 and the Salon 2, I can see that different approaches were used for each speaker. The M2 is optimized for a flat listening window average response. The Salon 2, being a passive speaker, does not conform to any particular target exactly, but it appears to favor a flatter on-axis response while allowing more roll-off of the listening window average response. I'm inclined to conclude from those pictures that the M2 sounds brighter than the Salon 2, and that at least one of them does not have a subjectively neutral sound. Though I think this may oversimplify things a lot. I may elaborate on these thoughts in later posts.
For what it's worth, I recently listened to the M2s in John's room, sitting on-axis most of the time. My impression was that the top sounded very smooth (lacking any resonances) but also a little bit hot. It's tempting to say something like "some people will like that sound", but if the goal was to achieve a perfectly neutral sound, then I'm not sure the M2 accomplishes that as well in the top octave as in the 500-10k Hz range. Then again, I'm not sure any speaker does this well. When off-axis but still well within the 120 degree window, the M2s still sounded very good but lost just a bit of the magic. I could definitely perceive the beaming when moving my head a few feet side-to-side from the on-axis sweet spot. It looks like the beaming in the Salon 2 may be a bit less severe.