Quote:
Originally Posted by
K Shep /t/1315667/list-of-devices-that-bypass-ipods-internal-dac/60#post_23107112
Quote:
Originally Posted by arnyk /t/1315667/list-of-devices-that-bypass-ipods-internal-dac/60#post_23106636
Ever try to do something more exacting, better controlled, or better bias controlled?
A voltmeter, a digital sound level meter, and swapping equipment one piece at a time while the other sits in front of the audio equipment. In my opinion, the issue with listening tests by the average person in their home is the time interval between audio tracks (test tones).
I would advise anyone curious about audio evaluation to read chapters 17-18 of "Sound Reproduction" by Floyd E. Toole.
Since you mentioned it:
"
17.5 BIAS FROM NONAUDITORY FACTORS
A widespread belief among audio professionals is that they are immune to the influences of brand, price, appearance, and so on. They persist in conducting listening evaluations with the contending products in full view. This applies to persons in the recording industry, audio journalists/reviewers, and loudspeaker engineers.
As this is being written, the 45th anniversary issue of Stereophile magazine arrived (November 2007). In John Atkinson’s editorial, he interviewed J. Gordon Holt, the man who created the magazine. Holt commented as follows: As far as the real world is concerned, high-end audio lost its credibility during the 1980s, when it flatly refused to submit to the kind of basic honesty controls (double-blind testing, for example) that had legitimized every other serious scientific endeavor since Pascal. [This refusal] is a source of endless derisive amusement among rational people and of perpetual embarrassment for me, because I am associated by so many people with the mess my disciples made of spreading my gospel.
When I joined Harman International, listening tests were casual affairs, usually sighted. At a certain point it seemed appropriate to conduct a test, a demonstration that there was a problem. It would be based on two listening evaluations that were identical, except one was blind and one was sighted (Toole and Olive, 1994).
Forty listeners participated in a test of their abilities to maintain objectivity in the face of visible information about products. All were Harman employees, so brand loyalty would be a bias in the sighted tests. They were about equally divided between experienced listeners, those who had previously participated in controlled listening tests, and inexperienced, those who had not. Figure 17.12 shows that in the blind tests, there were two pairs of statistically indistinguishable loudspeakers: the two European “voicings” of the same model.
The anechoic data were unreliable below 200 Hz. Two of the loudspeakers were visually identical, large floorstanding units, representing alternative crossover network designs from different sales/marketing regions in Europe thought to cater to special regional tastes in sound. The third product was a recently introduced, inexpensive subwoofer/satellite system with sound-quality performance that belied its small size and low cost. This was to be the honesty check in the sighted tests. The fourth product was a respected high-end product, a large floor-standing unit, from a competitor. One review of it claimed sound quality “equal to products twice its price.” Another allowed that there were “a few $10,000 speakers that come close.” Because this test was an evaluation of sound quality, not dynamic capabilities, care was taken not to drive the small system into overload. Loudness levels were equalized as
well as possible, using a combination of measurements and listening. They remained unchanged throughout the test. The small bars on top of the large verticals are 95% confidence error bars, an indication of the difference between the ratings required for the difference not to be attributable to random factors.
In the sighted version of the test, loyal employees gave the big attractive Harman products even higher scores. However, the little inexpensive sub/sat system dropped in the ratings; apparently its unprepossessing demeanor overcame employee loyalty. Obviously, something small and made of plastic cannot compete with something large and stylishly crafted of highly polished wood. The large, attractive competitor improved its rating but not enough to win out over the local product. It all seemed very predictable.
From the Harman perspective, the good news was that two products were absolutely not necessary for the European marketing regions. (So much for intense arguments that such a sound could not possibly be sold in [pick a country].) In general, though, what listeners saw changed what (they thought) they heard. Dissecting the data and looking at results for listeners of different genders and levels of experience, Figure 17.13 shows that experienced males (there were no females who had participated in previous tests) distinguished themselves by delivering lower scores for all of the loudspeakers. This is a common trend among experienced listeners. Otherwise, the pattern of the ratings was very similar to those provided by inexperienced males and females. Over the years, female listeners have consistently done well in listening tests, one reason being that they tend to have closer to normal hearing than males. Lack of experience in both sexes shows up mainly in elevated levels of variability in responses (note the longer error bars), but the responses themselves, when averaged, reveal patterns similar to those of more experienced listeners. With experienced listeners,
statistically reliable data can be obtained in less time.
The effects of room position at low frequencies have been well documented in Chapters 12 and 13. It would be remarkable if these did not reveal themselves in subjective evaluations. This was tested in a second experiment where the loudspeakers were auditioned in two locations that would yield quite different sound signatures. Figure 17.14 shows that listeners responded to the differences.
Summarizing, it is clear that knowing the identities of the loudspeakers under test can change subjective ratings.
■ They can change the ratings to correspond to presumed capabilities of the product, based on price, size, or reputation.
■ So strong is that attachment of “perceived” sound quality to the identity of the product that in sighted tests, listeners substantially ignored easily audible problems associated with loudspeaker location in the room and interactions with different programs.
These findings mean that if one wishes to obtain candid opinions about how a loudspeaker sounds, the tests must be done blind. The good news is that if
the appropriate controls are in place, experienced and inexperienced listeners of both genders are able to deliver useful opinions. Inexperienced listeners simply take longer, more repetitions, to produce the same confidence levels in their ratings.
Other investigations agree. Bech (1992) observed that hearing levels of listeners should not exceed 15 dB at any audiometric frequency and that training is essential. He noted that most subjects reached a plateau of performance after only four training sessions. At that point, the test statistic FL should be used to identify the best listeners. Olive (2003), some of whose results are shown in Figure 17.6, compiled data on 268 listeners and found no important differences between the ratings of carefully selected and trained listeners and those from several other backgrounds, some in audio, some not, some with listening experience, some with none. There were, as shown in Figure 17.6, huge differences in the variability and scaling of the ratings, so selection and training have substantial benefits in time savings. Rumsey et al. (2005) also found strong similarities in ratings of audio quality between naïve and experienced listeners, anticipating only a 10% error in predicting ratings of naïve listeners from those of experienced listeners.
In the end, the best news for the audio industry is that if something is done well, ordinary customers may actually recognize it. The pity is that there is no source of such unbiased listening test data for customers to go to for help in making purchasing decisions.
It is paradoxical that opinions of reviewers are held in special esteem. Why are these people in positions of such trust? The listening tests they perform violate the most basic rules of good practice for eliminating bias. They offer us no credentials, no proofs of performance, not even an audiogram to tell us that their hearing is not impaired. Perhaps it is the gift of literacy that is the differentiator, the ability to convey in a colorful turn of phrase some aspects of what they believe they hear. Adding insult to injury, as will be discussed in the following chapter, most reviews offer no meaningful measurements so that readers might form their own impressions.
Fortunately, it turns out that in the right circumstances most of us, including reviewers, possess “the gift”—the ability to form useful opinions about sound and to express them in ways that have real meaning. All that is needed to liberate the skill is the opportunity to listen in an unbiased frame of mind.
"