Originally Posted by _tk
Level matching was ultimately done by ear, not instrumentation
[Not that they were using the right kind of instrumentation, or the right calibration test signal, in the first place]:
"Pink noise was used to set levels until the panel got tired of the blast of noise and found we could set levels even better by ear(s) than the RTA I was using."
This is unacceptable because the person doing it by ear is cognizant of which device is currently active. [It's also rather odd to perform the calibration with the panelists already assembled in the room; it should be done before they enter.] So their testing is invalid right from the start, but I'll continue.
"The DACs were hidden behind the system, out of view of the judges."
It's always a red flag to me when a tester thinks a blind test means using blindfolds or hiding the device under test itself. That's immaterial [assuming the device doesn't have some sort of flashing indicator exposing that it is the one currently in the signal path; this is called a "tell"]. What counts is the listeners not knowing which DAC is currently the active one in the signal path.
The person who did the actual switching remained in the room, so the test was not double-blind. Their body language, or an unintentional grimace after making a switch, for example, could theoretically have influenced the listeners.
Statistical significance was determined by opinion, not math:
"5 of the 6 judges indicated that the differences were 'Significant' with one saying they were 'Subtle'."
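For contrast, this is what "math" would look like here. Had each listener's per-trial answers been recorded, significance could be checked with a simple one-sided binomial (sign) test against chance. A minimal sketch; the 9-correct-of-12-trials figures are hypothetical, not from the article:

```python
from math import comb

def binomial_p_value(successes: int, trials: int, p: float = 0.5) -> float:
    """One-sided P(X >= successes) under the null hypothesis that
    every answer is a 50/50 guess (a simple sign/binomial test)."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))

# Hypothetical example: a listener identifies the active DAC
# correctly in 9 of 12 trials.
print(round(binomial_p_value(9, 12), 3))  # 0.073: not significant at 0.05
```

Note that even 9 correct out of 12 fails the conventional 0.05 threshold, which is exactly why "5 of 6 judges felt the differences were significant" tells us nothing.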
The test methodology was flawed because it assumed, from the get-go, that there are audible distinctions in the first place. This was never established at all. Instead, 6 DACs were compared [in this shootout; I haven't examined shootout 2 since this one is already so flawed], and by votes of preference the listeners picked their "favorite": say "C" or "D" on this score sheet:
Nowhere on the score sheet is the instruction "And if you don't feel you hear any distinction, write the words 'neither DAC preferred'", so this made the test essentially a forced-preference test: there is no option for "no difference found". In such a test you are assuredly going to find a "winner" most of the time with such a small number of trials, somewhat like how a fair coin flipped, say, 5 times to see if it favored heads or tails would always show a "preference".
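The coin-flip point can be made concrete: with an odd number of forced-choice trials a "winner" is guaranteed by construction, and even a lopsided-looking split is common by pure chance. A quick sketch (the trial count of 5 is illustrative):

```python
import random
from math import comb

def forced_choice_winner(trials=5, rng=None):
    """Flip a fair coin `trials` times; with an odd trial count a
    strict 'preference' always emerges, by construction."""
    rng = rng or random.Random()
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return "heads" if 2 * heads > trials else "tails"

# Exact chance a fair coin looks lopsided (4-1 or 5-0) in 5 flips:
lopsided = 2 * (comb(5, 4) + comb(5, 5)) / 2 ** 5
print(lopsided)  # 0.375: well over a third of the time, chance looks decisive
```

So a clear "favorite" emerging from a handful of forced-preference votes is exactly what you would expect even if all the DACs sounded identical.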
There are other questions left unanswered, like:
"Did each listener hear the sound in isolation or did they see the body language of the other participants as they filled out their score sheet?"
"Was the presentation order randomized per listener, or did everyone comparing (say) DAC C to DAC D always hear C first and then D second?" Scientific studies published in peer-reviewed journals have found that the second test signal presented is statistically preferred by listeners even when it is (unbeknownst to the listeners) literally the exact same signal as the first.