The application of blind and double-blind tests is thought by a small but vocal minority in the audio community to be the supreme evaluation standard for detecting audible differences in audio systems. It is true that some types of audio systems are well suited to blind and double-blind A/B or A/B/X tests. A/B and A/B/X tests are useful when the two audio signals being compared are simple in nature. For example, telephone company engineers have routinely used, and continue to use, A/B and A/B/X tests to evaluate improvements in voice circuit quality.
However, we must understand that a test that is suitable for one type of audio system might not be suitable for another. It is worth noting that the company responsible for the invention and implementation of telephone service (the Bell Telephone System) was also responsible for the invention and implementation of home stereophonic audio systems.
It is even more interesting to note that while A/B and A/B/X tests were found to be appropriate for evaluating voice quality improvements on bandwidth-limited telephone circuits, subjective, non-blind listening tests based on careful listening, evaluator training and realistic home listening conditions were the scientific standards for the evaluation of stereophonic audio systems.
It should not be too difficult to understand that a testing methodology that is appropriate for evaluating simple band-limited monophonic signals would most probably not be appropriate for evaluating complex stereophonic signals that cover the full range of human hearing and which are designed to convey aural, spatial and tactile information. Telephone systems are audio systems, but they are audio systems which are primarily designed to convey clear voice communication. Stereophonic systems are audio systems, but they are audio systems which are designed to convey a weighty, complex, realistic illusion of a three-dimensional music concert performance.
Origins Of Blind Stereophonic Audio Testing
A paper published by Jon Boley and Michael Lester in the proceedings of the 127th Convention of the Audio Engineering Society, October 2009, stated:
"ABX tests have been around for decades and provide a simple, intuitive means to determine if there is an audible difference between two signals."
"Within the audio engineering community, the ABX methodology has become the standard psychoacoustic test for determining if an audible difference exists between two signals."
The first statement is true if the signals are very simple in nature, especially if they are monophonic signals. The second statement is questionable, since both founders of the ABX audio testing religion wrote ten-year follow-up papers lamenting the lack of widespread acceptance of ABX testing by audio engineers and the audio press.
Ethan Winer, at his "Audio Myths, Artifact Audibility and Comb Filtering" workshop presented at the 127th Convention of the Audio Engineering Society in October 2009 stated:
"Double blind tests are the gold standard in every field of science."
"It amazes me when some people claim that double blind testing is not valid for assessing audio gear." 
What is truly amazing is that some people would stray so far from the scientifically valid subjective listening evaluation procedures developed at Bell Telephone Laboratories and other electronics firms that participated in the invention and early development of home stereophonic systems (e.g. General Electric, Radio Corporation of America, etc.).
Another amazing feature of the Winer presentation is that he included a staged purse-snatching demonstration (at time 9:56) to illustrate the unreliability of short-term visual memory. None of the audience members could accurately identify the "purse-snatcher", even though some were sure that they could. The perpetrator was only in the room for 10 seconds. Mr. Winer later contradicts himself (at time 27:50) by advocating the use of an audio evaluation test that relies on short-term aural memory.
As far as I have been able to determine, the seminal papers in the application of ABX methodology to stereophonic systems are a paper presented by Dr. Stanley Lipshitz and Dr. John Vanderkooy in 1980 to the 65th Convention of the Audio Engineering Society in London and a paper presented by David Clark in 1981 to the 69th Convention of the Audio Engineering Society in Los Angeles.
Drs. Lipshitz and Vanderkooy stated:
"In order for subjective tests to be meaningful to others, the following should be observed...The test must be blind or preferably double-blind. To implement such tests we advocate the use of A/B switchboxes." 
Mr. Clark stated:
"Listening tests used to evaluate audio equipment can seldom be considered scientific tests."
"A system for practical implementation of double-blind audibility tests is described. The controller is a self-contained unit, designed to provide setup and operational convenience while giving the user maximum sensitivity to detect differences."
The controller that Mr. Clark mentioned was an "ABX Comparator" system that he and some associates were marketing through the "ABX Company".
It is curious to note that neither of these seminal papers presents a discussion of how the proposed ABX methodology relates to the evaluation of the primary performance metrics of stereophonic sound systems, such as:
1. Optimization of sound stage width,
2. Optimization of sound stage depth,
3. Stable stereo image placement,
6. Dynamics (dynamic range),
7. Tactile impact,
8. Sonic realism.
Whereas the founders of stereophonic audio systems emphasized listener education and ear training with music, Mr. Clark proposed a different training paradigm for increasing the resolution sensitivity of listeners ([18, p. 332]):
"Great improvements in resolution can be achieved if the listener knows what to listen for. Sensitizing tests can use pink noise, sine waves, or pulses as appropriate to hear a difference. Sometimes an artificially enhanced distortion can be produced by reducing feedback or connecting multiple devices in series for distortion buildup. The listener is then more able to hear the difference in music."
Ten years after writing this, Mr. Clark, in a paper presented to the 91st Audio Engineering Society Convention in 1991 stated:
"Ten years ago [in 1982] the present author presented a paper to the AES on double-blind testing using the A/B/X technique. For the next five years, a device to conveniently implement this test was commercially available. It was thought by the author and his associates that general use of this system would resolve "The Great Debate" of whether or not small differences in audio components were audible." 
In the same paper, Mr. Clark stated:
"It becomes an ethical and perhaps legal question when it is claimed that improved sound quality is delivered despite failure of tests to prove it.
This would be less of an issue if the number of engineers who dismiss double-blind test results were small, but this is not the case. As Chairman of an AES Workshop on Esoteric Audio in 1988, I asked, by a show of hands, who in the audience believed that different gain-matched amplifiers of modern design sound different from each other. It was stated that all would measure good in conventional tests and all were operated below clipping or other gross distortion levels. Approximately 70% of the audience indicated they believed the amplifiers would likely sound different. This is an amazing response from members of an engineering society which failed to support the claim." 
Ten years after his seminal double-blind audio testing advocacy paper was published, Dr. Lipshitz presented a paper to the AES 8th International Conference in 1990 in which he stated:
"It is now ten years since my initial involvement in the controversy surrounding double-blind subjective testing in audio and twelve years since this subject first hit the headlines with the Quad power amplifier comparison challenge in England.
A lot of water has passed under the bridge in the intervening years, but our hopes of a decade ago, that the validity of the method would be generally accepted by the audio press and adopted wherever feasible, have not been realized." 
At the same 1990 AES conference where Dr. Lipshitz lamented the lack of widespread acceptance of blind testing in stereophonic system evaluation, Tom Nousaine, in a paper entitled "The Great Debate: Is Anyone Winning?", was decidedly more upbeat:
"This paper simply presents a compilation of the twenty two blind and double blind listening tests [from 1978 to 1990] of power amplifiers for which numerical results have been published. There is a rather large collection of data which contains some surprising information and ultimately confirms that one side of the debate seems to have a commanding lead."
Mr. Nousaine concluded:
"Many factors can contribute to the subjective enjoyment of a given amplifier but sound quality differences are not among them.
This does not suggest that amplifiers are perfect and they will never be found to sound different. It does suggest to purchasers of today's audio amplifiers that as long as the product in question meets basic traditional measured performance standards, has enough output capability, and adequate quality of construction, it will be sonically indistinguishable from all others meeting those criteria." 
It is a shame that Mr. Nousaine and some others view the pursuit of stereophonic audio (stereophony) as some sort of contest to be won rather than an attempt to recreate a realistic reproduction of the live concert experience in the listener's home.
At the same 1991 AES convention where Mr. Clark lamented the lack of widespread acceptance of blind testing in stereophonic system evaluation and the demise of his company, which was offering ABX Comparator devices, Tom Nousaine, in a paper entitled "Can You Trust Your Ears?", was decidedly more upbeat:
"In assembling a summary of 22 blind listening tests published between 1978 and 1990 we discovered that subjects consistently reported preferences or differences in sound quality when given two identical alternatives."
"The evidence clearly demonstrates a bias in listeners who have an interest in audio to report differences which do not exist." 
Whereas the inventor of stereophonic home audio systems, Dr. Harvey Fletcher, and the other early researchers in the field advocated the use of comfortable, stress-free listening environments similar to a typical consumer's home, Mr. Nousaine preferred a different evaluative environment. The "test design" section of the paper stated:
"Our basic test procedure requirements included:
1. Listeners with a strong interest in audio sound quality ("Audiophiles")
2. Listeners with no audio background ("Consumers")
3. Single or Double Blind presentations
4. Preference Scoring ("Prefer A, B or Neither?")
5. Reasonably large sample size (30 by category and 300 overall)
6. Controlled introduction of loudness differences (1 dB)
7. "Purchase-like" conditions/scientific controls
* Short musical selections
* Single listeners or small groups
* Written scores
* A/B Presentations with no repeat
* Maximum 10 trials/15 minute sessions (Preserve listener freshness)" 
In blind and double-blind telephone voice quality trials, the tests are administered under conditions representative of the way consumers actually use telephones. In blind and double-blind pharmaceutical trials, medicine is administered under conditions representative of the way patients would actually use it. However, when we see blind and double-blind trials applied to stereophonic audio systems, they are consistently used in a manner that is highly unrealistic and detrimental to an accurate stereophonic presentation. Why would a purportedly serious audio evaluation study seek to replicate "Purchase-like" listening conditions? This is not optimal, reasonable, realistic or scientific for a study which purports to be a critical analysis of a listener's ability to discern sound quality differences in stereophonic sound systems. "Purchase-like" listening conditions would be more appropriate for a marketing study. Why not replicate typical "home-like" listening conditions? That is what the scientists did in , , , , , , ,  and .
Why would a purportedly serious audio evaluation study only allow very short musical selections? This was actually answered in the paper:
"Subjects also consistently requested shorter evaluation intervals. Sixty seconds seemed too long and some became impatient with 30 seconds. Subjects were paid a small stipend for participation." 
Really? Thirty seconds was "too long" to listen to a musical selection during a critical listening session conducted as part of a "scientific study"? I wonder if any of the "audiophiles" preferred the 30 second music snippets. One might also wonder if the participants were just there to collect a check as quickly as possible.
Why would a purportedly serious audio evaluation study use a "small group" listening arrangement? The optimal sound quality of stereophonic audio systems is only found in the stereo sweet spot, which can only be occupied by one person at a time. This is a fundamental design aspect of stereophonic audio systems which must never be violated in critical listening situations unless the intent is to acquire performance data for off-axis (out of the sweet spot) listening.
The extremely short listening intervals (30-60 seconds) and group listening sessions provide some insight into the contributing factors toward different listening impressions from identical pieces of equipment.
Mr. Nousaine provides some statistical analysis of his test results. One of the references in Mr. Nousaine's "Can You Trust Your Ears?" study is "Introduction to Probability and Statistics" (1967) by William Mendenhall. Mr. Nousaine cites Mendenhall's chapter 11. However, chapter 9 provides some important insight:
"The reader will note that we have employed two different statistical tests to test the same hypothesis. Is it not peculiar that the t test, which utilizes more information (the actual sample measurements) than the binomial test, fails to supply sufficient evidence for rejection of the hypothesis μ1 = μ2?
The explanation of this seeming inconsistency is quite simple. The t test described in Section 9.3 is not the proper statistical test to be used for our example." [Emphasis mine.]
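For readers unfamiliar with the binomial test that Mendenhall contrasts with the t test, the following is a minimal sketch of how such a test treats A/B preference counts. It is an illustration only, not the calculation used in either the textbook or Mr. Nousaine's paper. The function name, the decision to drop "no preference" answers beforehand, and the counts in the usage line are all hypothetical assumptions.

```python
from math import comb


def sign_test_p_value(prefer_a, prefer_b):
    """Two-sided exact binomial (sign) test.

    Null hypothesis: a listener is equally likely to prefer A or B.
    Assumption: "no preference" answers were removed before counting.
    """
    n = prefer_a + prefer_b
    k = max(prefer_a, prefer_b)
    # Probability of a split at least this lopsided in one direction,
    # if every answer were a fair coin flip.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    # Double it for the two-sided test and cap at 1.
    return min(1.0, 2 * upper_tail)


# Hypothetical counts, for illustration only: 21 listeners prefer A, 9 prefer B.
p = sign_test_p_value(prefer_a=21, prefer_b=9)
print(f"two-sided p-value: {p:.4f}")
```

The test uses only the direction of each preference, not its size, which is the sense in which Mendenhall says the binomial test uses less information than the t test.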
Comparisons Of A/B/X and Blind Test Procedures To Basic Stereophonic Principles
Dr. Harvey Fletcher said that his motivation for inventing stereophonic sound systems (initially called auditory perspective systems) was to provide an exciting home concert experience:
"This symposium describes principles and apparatus involved in the reproduction of music in large halls, the reproduction being of a character that may give even greater emotional thrills to music lovers than those experienced from the original music." 
Therefore, high quality stereophonic sound systems are based on the following principles:
1. Realistic and accurate reproduction of the sonic and tactile sensations of the live concert experience,
2. Stable and realistic sonic imaging,
3. Sonic images distributed throughout a three-dimensional sound stage that is analogous to the way real instruments and singers are distributed throughout an actual concert stage,
4. Lifelike instrumental and vocal clarity,
5. Lifelike instrumental and vocal detail,
6. Listener education and training,
7. Emotional thrills.
In the 1981 seminal paper on blind testing for stereophonic systems, Drs. Lipshitz and Vanderkooy appear to mock the founding design principles of stereophonic sound systems:
"The last half-dozen years have seen a remarkable proliferation in the number of adjectives used to describe the alleged audible qualities of audio components by the audio press, both above and below ground - words such as depth, air, graininess and liquidity spring to mind. The differences supposedly characterized by these emotive epithets are generally discovered during the course of a listening test, in which the component under test is either heard in isolation, or else is compared with other components of the same type."
Seemingly unknown to Drs. Lipshitz and Vanderkooy, much of the basic descriptive vocabulary used by the audio press came directly from peer-reviewed scientific journal papers written by the inventor of stereophonic sound and subsequent researchers in the field. For example, Steinberg and Snow devote considerable discussion to stereophonic depth perception in [9]. T. Somerville in [12] devotes considerable discussion to concert hall and home listening room acoustics. This is the paper in which the term "sound stage" was coined. Somerville discusses reverberation effects (which later came to be called "air" and "ambiance"), sound stage width and depth and "aesthetic presentation". One of the most profound comments made by Somerville was:
"A listener in a concert hall, because of the binaural characteristics of hearing, is able to distinguish between the various sections of the orchestra, and, in particular, he can pick out the solo part. In this, hearing is also assisted by sight."  [Emphasis mine.]
After over fifty years of scientific brilliance in the field of stereophonic sound, some people came in with "better" ideas:
1. According to Lipshitz and Vanderkooy, it is better to blindfold or otherwise visually handicap music listeners. This directly conflicts with the fact that sound image localization is one of the basic principles of stereophonic reproduction. Somerville and other scientists found that hearing (sound localization and sonic depth perception) is assisted by sight. [12, p. 205]
2. According to Clark, the best ear training is gained from listening to pink noise, sine waves, pulses and artificially enhanced distortion. According to Nousaine and other blind-listening test proponents, critical listening sessions are best conducted when music is listened to in 30 to 60 second snippets. This directly conflicts with the fact that spatially correct reproduction of a live concert experience is one of the basic principles of stereophonic reproduction. Gaining evaluative expertise by listening to actual musical performances was the preferred training regimen of the founding scientists working on stereophonic systems. Stereophonic sound systems were invented for music lovers, not test signal lovers.
3. According to Clark, Nousaine, and other A/B/X and blind listening test proponents, sitting off axis and far outside the stereo sweet spot and/or listening in a group environment is a proper method of evaluating stereophonic sound systems. This is absurd and gives the impression that blind listening proponents are desperate to prove a point: that audiophiles are a delusional lot and that the perceived differences in audio components are largely imaginary.
The A/B/X test, highly regarded among blind audio testing proponents, is not scientifically applicable to stereophonic systems. According to the worldwide standard basic textbook on sensory evaluation techniques [26], the A/B/X test, which is usually referred to in the scientific literature by its proper name, the duo-trio balanced reference test, is appropriate under the following conditions:
"Duo-Trio Test - Scope and Application
The duo-trio test (ISO 2004a) is statistically less efficient than the triangle test because the chance of obtaining a correct result by guessing is 1 in 2. On the other hand, the test is simple and easily understood.
Use this method when the test objective is to determine whether a sensory difference exists between two samples. This method is particularly useful in situations:
1. To determine whether product differences result from a change in ingredients, processing, packaging, or storage.
2. To determine whether an overall difference exists, where no specific attributes can be identified as having been affected.
The duo-trio test has general application whenever more than 15, and preferably more than 30, test subjects are available. As a general rule, the minimum is 16 subjects, but for less than 28, the beta error is high. Discrimination is much improved if 32, 40, or a larger number [of subjects] can be employed."
[Beta error is a statistical error in testing when it is concluded that something is negative when it is actually positive. Beta error is often referred to as "false negative".]
"Two forms of the test exist: the constant reference mode, in which the same sample, usually drawn from regular production, is always the reference, and the balanced reference mode [ABX], in which both of the samples being compared are used at random as the reference.
Use the constant reference mode with trained subjects whenever a product well known to them can be used as the reference.
Use the balanced reference [ABX] mode if both samples are unknown or if untrained subjects are used." 
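The sample-size guidance quoted above can be checked with a little arithmetic. The following is a minimal sketch in Python, not a reproduction of the textbook's tables. It assumes that each subject gives one independent answer, that the test is one-sided at a significance level of 0.05, and that the true rate of correct answers is 75%. The function names and the 75% figure are illustrative assumptions, not taken from the source.

```python
from math import comb


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n, p_guess, alpha=0.05):
    """Smallest number of correct answers that guessing alone would reach
    with probability <= alpha."""
    for k in range(n + 1):
        if binom_tail(n, k, p_guess) <= alpha:
            return k
    return n + 1  # no possible count is significant for this n


def beta_error(n, p_guess, p_true, alpha=0.05):
    """Probability of a false negative: a real difference exists (subjects
    answer correctly at rate p_true) but the test fails to reach significance."""
    k = critical_count(n, p_guess, alpha)
    return 1.0 - binom_tail(n, k, p_true)


# Duo-trio / A/B/X: the chance of a correct answer by guessing is 1 in 2.
# p_true = 0.75 is a hypothetical value chosen only for illustration.
for subjects in (16, 28, 32, 40):
    k = critical_count(subjects, 0.5)
    b = beta_error(subjects, 0.5, 0.75)
    print(f"{subjects} subjects: need {k} correct, beta error = {b:.2f}")
```

Under these assumptions the beta error falls as the panel grows from 16 to 40 subjects, which is the direction the quoted guidance describes. Passing p_guess=1/3 gives the corresponding figures for a triangle test.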
We now have nearly thirty years of documented experience in the application of blind and A/B/X testing to stereophonic audio systems. The A/B/X test setup arrangements and test results have been consistently absurd and consistently statistically similar to guessing. One would think that, after all these years of "all amplifiers sound alike", A/B/X and blind listening test proponents would begin to question the validity of their testing methodology. I don't expect this to ever happen because ridiculing audiophiles is so much fun.
The literature promoting these tests displays a profound lack of understanding of how stereophonic systems work and a profound lack of understanding of how the human senses work. Some senses (sight/hearing and taste/smell) are closely interrelated. For a given sensory exercise, more than one sense may be employed. The primary sense for a given stimulus is typically enhanced by the sensory contributions of secondary senses. Compromising one secondary sense can adversely affect the perception of the primary sense.
During food consumption, flavor (taste) is the primary stimulus, but the perception of flavor is enhanced by the appearance (sight), aroma (smell), texture (touch), and sound (hearing) of the food. Soggy cereal is every bit as nutritious and tastes the same as crunchy cereal, but most people won't eat a bowl of soggy cereal if the texture is expected to be crunchy. They might perceive it as tasting bad, even though the chemical composition that affects taste receptors is the same. This is an example of an impairment in the sense of touch (secondary sense) affecting the sense of taste (primary sense) during food consumption.
The literal meaning of "stereophonic" is "solid sound" (from the Greek "stereos" for "solid" and "phone" for "sound") [32]. Solid things can be seen and felt. Therefore, "stereophonic" audio systems produce "solid sound" or sound which can be heard, seen and felt. The live concert experience is a sensory exercise which employs the senses of hearing, sight and touch. These same senses are employed in the home stereo experience. For stereophonic listening, the sense of hearing is primary, but the senses of sight and touch play important secondary roles in the creation of the stereophonic illusion of "solid sound". Compromising a person's sense of sight and/or touch, especially in the short term, is stressful and can have detrimental effects on sound localization ability. This stress can exert undue influence on listening evaluation results.
Stereophonic music reproduction is designed according to the principles of sound localization, long term sonic memory of actual musical events, and the reception of tactile sensations from the sound stage. Blind audio testing, which includes visually obscuring all or part of the sound stage, rapid switching of musical selections and off-axis and group seating, impairs the listener's ability to localize sounds (seeing), to internalize and evaluate aural cues (hearing) and to receive correct stereophonic tactile information (touching). Any stereophonic audio system testing methodology which compromises and hinders the processes of human sensory perception is useless.
ABX and blind testing proponents say that they want to apply a scientifically rigorous testing methodology to stereophonic audio in order to determine if the claimed differences in audio components actually exist. However, they ignore decades of scientifically and mathematically rigorous subjective listening techniques that were developed by the inventor and subsequent researchers in the field of stereophonic sound.
[1] Snow, W. B., "Audible Frequency Ranges of Music, Speech and Noise", Journal of the Acoustical Society of America, Vol. 3, Issue 1A, July 1931, pp. 155-166.
[2] Farhid, M. and Tinati, M.A., "Robust voice conversion systems using MFDWC", Proceedings of the 2008 IEEE International Symposium on Telecommunications, Tehran, Iran, August 2009, pp. 778-781.
[3] Liu, Kun, Zhang, Jianping and Yan, Yonghong, "High Quality Voice Conversion through Phoneme-Based Linear Mapping Functions with STRAIGHT for Mandarin", Proceedings of the IEEE Fourth International Conference on Fuzzy Systems and Knowledge Discovery, Haikou, China, August 2007, pp. 410-414.
[4] Cheng-Yuan Lin and Jang, J.-S.R., "New Refinement Schemes for Voice Conversion", Proceedings of the IEEE 2003 International Conference on Multimedia and Expo, Baltimore, MD, July 2003, pp. 25-728.
[5] Fletcher, Harvey, "Symposium on Wire Transmission of Symphonic Music and Its Reproduction in Auditory Perspective-Basic Requirements", Bell System Technical Journal, Vol. 13, 1934, pp. 239-244.
[6] Fletcher, Harvey, "Hearing, The Determining Factor for High-Fidelity Transmission", Proceedings of the I.R.E., Columbus, OH, June 1942, pp. 266-277.
[7] Hilliard, John K., "The History of Stereophonic Sound Reproduction", Proceedings of the Institute of Radio Engineers, Vol. 50, No. 5, May 1962, pp. 776-780.
[8] Fletcher, Harvey, "Hearing Aids and Deafness", Bell Laboratories Record, Vol. 5, No. 2, October 1927, p. 33.
[9] Steinberg, J. C., and Snow, W. B., "Symposium on Wire Transmission of Symphonic Music and Its Reproduction in Auditory Perspective-Physical Factors", Bell System Technical Journal, Vol. 13, 1934, pp. 245-258.
[10] Harvey, F. K. and Schroeder, M. R., "Subjective Evaluation of Factors Affecting Two-Channel Stereophony", Journal of The Audio Engineering Society, Vol. 9, No. 1, January 1961, pp. 19-28.
[11] Moir, J. and Leslie, J. A., "The Stereophonic Reproduction of Speech and Music", Journal of the British Institution of Radio Engineers, London, September 1951, pp. 360-366.
[12] Somerville, T., "Survey of Stereophony", Proceedings of the Institution of Electrical Engineers, Convention on Stereophonic Sound Recording, Reproduction and Broadcasting, London, March 1959, pp. 201-208.
[13] Moore, H. B., "Listener Ratings of Stereophonic Systems", Institute of Radio Engineers Transactions on Audio, September-October 1960.
[14] Beaubein, W. H. and Moore, H. B., "Perception of Stereophonic Effect as a Function of Frequency", Journal of the Audio Engineering Society, Vol. 8, No. 2, April 1960, pp. 76-86.
[15] Boley, Jon and Lester, Michael, "Statistical Analysis of ABX Results Using Signal Detection Theory", Proceedings of the 127th Convention of the Audio Engineering Society, October 2009, New York, NY.
[16] Winer, Ethan, "Audio Myths, Artifact Audibility and Comb Filtering" workshop presented at the 127th Convention of the Audio Engineering Society in October 2009, New York, NY. http://www.youtube.com/watch?v=BYTlN6wjcvQ, YouTube video accessed 8/3/2010.
[17] Lipshitz, S. P. and Vanderkooy, J., "The Great Debate: Subjective Evaluation", Journal of the Audio Engineering Society, Vol. 29, No. 7/8, July/August 1981, pp. 482-491.
[18] Clark, D., "High-Resolution Subjective Testing Using a Double-Blind Comparator", Journal of the Audio Engineering Society, Vol. 30, No. 5, May 1982, pp. 330-338.
[19] Clark, David, "Ten Years Of A/B/X Testing", 91st Audio Engineering Society Convention, New York, NY, October 1991.
[20] Lipshitz, S., "The Great Debate-Some Reflections Ten Years Later", Audio Engineering Society 8th International Conference, Washington, D.C., May 1990, pp. 121-123.
[21] Nousaine, Tom, "The Great Debate: Is Anyone Winning?", Audio Engineering Society 8th International Conference, Washington, D.C., May 1990, pp. 117-119.
[22] Nousaine, Tom, "Can You Trust Your Ears?", 91st Audio Engineering Society Convention, New York, NY, October 1991.
[23] Schjonneberg, K. and Olson, F., "Listening Test Methods and Evaluation", Journal of the Audio Engineering Society, Vol. 9, No. 1, January 1961, pp. 29-36.
[24] Olson, Harry F., "Stereophonic Sound Reproduction in the Home", Journal of the Audio Engineering Society, Vol. 6, No. 2, April 1958, pp. 80-90.
[25] Snow, W. B., "Auditory Perspective", Bell Laboratories Technical Journal, Vol. 12, No. 7, March 1934, pp. 194-198.
[26] Meilgaard, Morten, Civille, Gail Vance, Carr, B. Thomas, "Sensory Evaluation Techniques", 4th Ed., CRC Press, Boca Raton, FL, 2007, pp. 72-73.
[27] Hull, J., "Touching the Rock: An Experience of Blindness", Vintage Publishing, London, 1992.
[28] Montagu, A., "Touching: The Human Significance of the Skin", Columbia University Press, New York, London, 1971.
[29] Wright, D., "Deafness: A Personal Account", Faber, London, 1990.
[30] Yuan, Yi-Fu, "Sight and Picture", Geographical Review, Vol. 69, pp. 413-422, 1979.
[31] Rodaway, Paul, "Sensuous Geographies: Body, Sense and Place", Routledge, London, New York, 1994.
[32] World Book Dictionary, 1973 ed. Vol. 2, p. 2033.