Originally Posted by srk052004
Second, there have been no apparent attempts at objective "calibration" of the human listeners. This will necessarily be tricky, but since even doubters regarding CD players accept that speakers differ, some kind of acceptable calibration could be done. But much more could be done, given the digital signal processing built into many devices. Delays could be manipulated in particular ways, with the results recorded. Tone controls could be manipulated. The same pair of speakers could be used twice for each listener, once in phase, once out of phase.
These points actually are related: listeners who have difficulty with such tests might be able to detect a difference only if, say, 1000 of them were tested; listeners who easily passed might be able to detect a difference with an N of 20.
Depending on what you mean by "calibration," that's done all the time, though the work that falls under studying the human auditory system is usually not done in direct conjunction with the listening tests.
Most simple double-blind ABX tests include a non-blinded familiarization period in which listeners get to know the equipment under test using the same sources that will be used during testing. This period lets the subjects convince themselves that they know what they are listening to (i.e., calibrate themselves to the conditions).
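For anyone who wants to see how the blinded part of an ABX run is typically scored, here is a minimal sketch in Python. It assumes each trial is an independent guess with a 50% chance of being right when the listener hears no difference, and it uses a 0.05 significance cutoff purely as a conventional default. The function names (abx_p_value, min_correct_needed, detection_power) are made up for illustration and are not from any particular test package. The last function bears on the point in the quote about N: a listener who is only slightly better than chance needs far more trials than one who hears the difference easily.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value: the chance of getting at least
    `correct` answers right out of `trials` by pure guessing (p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


def min_correct_needed(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose guessing p-value is at or below alpha.
    Returns trials + 1 if no score can reach alpha with so few trials."""
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) <= alpha:
            return correct
    return trials + 1


def detection_power(trials: int, true_rate: float, alpha: float = 0.05) -> float:
    """Probability that a listener who is right with probability `true_rate`
    on each trial reaches the passing score for this number of trials."""
    need = min_correct_needed(trials, alpha)
    return sum(
        comb(trials, k) * true_rate ** k * (1 - true_rate) ** (trials - k)
        for k in range(need, trials + 1)
    )


if __name__ == "__main__":
    print(abx_p_value(12, 16))        # guessing p-value for 12 of 16 correct
    print(min_correct_needed(20))     # passing score for a 20-trial run
    print(detection_power(20, 0.9))   # listener who hears it easily
    print(detection_power(20, 0.55))  # listener barely above chance
```

Under these assumptions, 12 correct out of 16 gives a guessing probability of about 0.038, and a 20-trial run needs 15 correct to pass. A listener who is right 90% of the time will almost always pass a 20-trial run, while one who is right 55% of the time will rarely pass it.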
The hearing tests you suggest (e.g., speakers out of phase) are used in three ways:
1) to study the threshold of audibility. This kind of testing determines what kinds of changes in audio signals people will consistently report as being audible. It is now getting to the point where fMRI studies can show which parts of the brain are activated when people report something as audible, compared with the parts in use otherwise.
2) to test the ability of listeners to be "trained" to detect a difference. Relative phase, for example, seems to be one of those areas where some listeners can be trained to pick up differences more consistently once they know what to listen for. (Reports vary on whether absolute phase is audible, though I've seen pretty good evidence it is and I'm personally pretty convinced it is.) From what I've seen, a lot of this training is apparently lost after even short periods of disuse.
3) to group and classify subjects for other testing. E.g., researchers will look for statistical variations and correlations between groups known to have differences in their thresholds of audibility for certain signals. This can be used to show that a particular defect in hearing does or does not affect the ability to hear something else under test (e.g., frequency response vs. phase response).
In short, the kinds of studies you suggest have been done for many, many years now, in particular by two groups: our friends in the Audio Engineering Society (AES) and the medical community. It should probably come as no surprise that the results correlate well between these groups and that the research on thresholds of audibility correlates well with the results of double-blind ABX testing.