Doing a test blind, or double-blind, is essential. But the test method matters greatly to what the test can reveal.
Listening to a single loudspeaker - the single stimulus method - is not very useful as it relies heavily on our poor auditory memory. Only if the speaker has an obvious flaw, like a howling resonance, is it likely to stand out. Listening to different speakers in different rooms is almost useless. Speakers with minor flaws, especially those associated with spectral balance or uniformity (on or off axis) can be adapted to - the human "breaks in". Prolonged listening to a wide variety of material is useful.
An A vs. B test is excellent at revealing differences - remember to balance A to B and B to A sequences. Problems shared by both speakers may go unnoticed, though.
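As a rough illustration of balancing the A-to-B and B-to-A sequences, here is a minimal Python sketch. The function name `balanced_ab_trials` and the idea of shuffling a fixed, equal number of each order are assumptions made for this example, not a prescribed procedure.

```python
import random

def balanced_ab_trials(n_per_order, seed=None):
    """Return a shuffled list of presentation orders with equal
    numbers of A-then-B and B-then-A trials (hypothetical helper)."""
    rng = random.Random(seed)
    trials = [("A", "B")] * n_per_order + [("B", "A")] * n_per_order
    rng.shuffle(trials)  # randomize trial order, keep the counts balanced
    return trials

trials = balanced_ab_trials(4, seed=1)
for first, second in trials:
    print(f"play {first}, then {second}")
```

Each order appears the same number of times, so any bias toward the first or second speaker heard tends to cancel out across the session.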
Multiple comparisons of three or four loudspeakers are best, and of course, the most difficult to organize. It is good that mono listening is the most revealing listening format. Why? Because what listeners are required to do in a test is to:
1. separate the sound of the speaker from that of the room - which is why we do all tests in the same room, using positional substitution or at least close spacing to minimize room interaction variations, especially in the bass.
2. separate the sound of the program from that of the speaker and the room.
Having multiple loudspeakers makes all of this very much easier. It quickly becomes obvious how each speaker modifies timbre, and varied programs will reveal different kinds of defects. The most obvious problems are resonances and the spectral irregularities that come with them. Inconsistent off-axis performance is likely the second most common issue.
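One way to organize a multiple comparison of several speakers over varied programs is to draw up a randomized, blind schedule in advance. The sketch below is only an illustration: the speaker and program names are placeholders, and reassigning the blind labels for every program is an assumption of this example, not something the method requires.

```python
import random

def blind_schedule(speakers, programs, seed=None):
    """Build one round per program; in each round every speaker gets a
    random blind label (A, B, C, ...). Hypothetical helper."""
    rng = random.Random(seed)
    schedule = []
    for program in programs:
        order = list(speakers)
        rng.shuffle(order)  # new random label assignment per program
        labels = [chr(ord("A") + i) for i in range(len(order))]
        schedule.append({"program": program,
                         "labels": dict(zip(labels, order))})
    return schedule

# Placeholder names, for illustration only
speakers = ["speaker_1", "speaker_2", "speaker_3"]
programs = ["pop_mix", "orchestral", "pink_noise"]

schedule = blind_schedule(speakers, programs, seed=7)
for rnd in schedule:
    # The operator keeps this key; listeners only see the labels
    print(rnd["program"], rnd["labels"])
```

Listeners see only the letters, while the operator keeps the key, so ratings cannot follow a speaker's identity from one program to the next.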
Low-frequency extension is a significant factor, as bass accounts for about 30% of the weighting in an overall sound quality rating. Other things being equal, the speaker with the deepest bass extension will have an advantage.
Equal-loudness presentation is the goal, but because the speakers differ in frequency response it may not be possible to achieve it perfectly. The good news is that resonances, the most common flaws, are not very sensitive to playback level. Spectral balance is more sensitive because of the equal-loudness contours (a.k.a. Fletcher-Munson curves). But this is also an issue in the program material - the circle of confusion - which is why varied program sources are needed. Solo voices and musical instruments are good for demos, but not very revealing in listening tests. Don't get hung up on your favorite recording.
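As a first approximation to level matching, each speaker can be trimmed to a common measured level. The sketch below assumes a single broadband level reading per speaker, taken with the same test signal at the listening position. That is a simplification, and the numbers are made up for illustration. Because the speakers differ in frequency response, a match of this kind is only approximate.

```python
def level_trims(measured_db, target_db=None):
    """Return the gain trim in dB per speaker to reach a common level.
    By default the target is the quietest speaker, so trims only attenuate."""
    if target_db is None:
        target_db = min(measured_db.values())
    return {name: target_db - level for name, level in measured_db.items()}

def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

# Illustrative readings in dB SPL (not real measurements)
measured = {"A": 83.0, "B": 85.5, "C": 82.0}

trims = level_trims(measured)
for name, trim in trims.items():
    print(f"{name}: {trim:+.1f} dB (x{db_to_linear(trim):.3f})")
```

Attenuating down to the quietest speaker avoids adding gain that could push an amplifier or driver into distortion.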