Originally Posted by CharlesJ
There are no peer reviewed Journal publications.
Papers published in the JAES (as opposed to JAES convention preprints and presentations) are peer-reviewed.
So that gives us at least these JAES papers up through Nov 2007 that discuss or use DBT methods:
The Great Debate: Subjective Evaluation
Volume 29 Number 7/8 pp. 482-491; July/August 1981
Authors: Lipshitz, Stanley P.; Vanderkooy, John
A polarization of people has occurred regarding subjective evaluation, separating those who believe that audible differences are related to measurable differences in controlled tests, from those who believe that such differences have no direct relationship to measurements. Tests are necessary to resolve such differences of opinion, and to further the state of audio and open new areas of understanding. We argue that highly controlled tests are necessary to transform subjective evaluation to an objective plane so that preferences and bias can be eliminated, in the quest for determining the accuracy of an audio component. In order for subjective tests to be meaningful to others, the following should be observed. (1) There must be technical competence to prevent obvious and/or subtle effects from affecting the test. (2) Linear differences must be thoroughly excised before conclusions about nonlinear errors can be reached. (3) The subjective judgment required in the test must be simple, such as the ability to discriminate between two components, using an absolute reference wherever possible. (4) The test must be blind or preferably double-blind. To implement such tests we advocate the use of A/B switchboxes. The box itself can be tested for audibly intrusive effects, and several embellishments are described which allow double-blind procedures to be used in listening tests. We believe that the burden of proof must lie with those who make new hypotheses regarding subjective tests. This alone would wipe out most criticisms of the controlled tests reported in the literature. Speculation is changed to fact only by careful experimentation. Recent references are given which support our point of view. The significance of differences in audio components is discussed, and in conclusion we detail some of our tests, hypotheses and speculations.
High-Resolution Subjective Testing Using a Double-Blind Comparator
Volume 30 Number 5 pp. 330-338; May 1982
A system for the practical implementation of double-blind audibility tests is described. The controller is a self-contained unit, designed to provide setup and operational convenience while giving the user maximum sensitivity to detect differences. Standards for response matching and other controls are suggested as well as statistical methods of evaluating data. Test results to date are summarized.
On the Magnitude and Audibility of FM Distortion in Loudspeakers
Volume 30 Number 10 pp. 694-700; October 1982
Authors: Allison, Roy; Villchur, Edgar
Beers and Belar, in their 1943 paper on Doppler effect in loudspeakers, recognized and pointed out limitations in the scope of their analysis. They also suggested simple methods for keeping FM distortion products below the level of audibility, such as dividing the spectrum among at least two drivers. Recent work is described which extends Beers and Belar's analysis along lines they suggested, and which, by means of double-blind listening tests, provides confirming evidence that Doppler distortion in practical multidriver systems is indeed inaudible.
A New Method for the Design of Crossover Filters
Volume 37 Number 6 pp. 445-454; June 1989
Author: Aarts, R. M.
A new method is presented for the design and evaluation of loudspeaker crossover filters. The desired system characteristic can be prescribed by a (complex) acoustic transfer function rather than an electrical one only. It may be derived from conventional filters or based on a measured filter from a reference (favorable) system. Double blind listening tests are performed to verify subjectively the similarity between the reference system and its experimental counterpart. The drivers of the experimental loudspeaker are preceded by digital filters, enabling the imitation of several different favorable loudspeakers. Multidimensional scaling techniques are applied to represent the results of the listening tests. These results affirm the strength of the design method.
Observations on the Audibility of Acoustic Polarity
Volume 42 Number 4 pp. 245-253; April 1994
Authors: Greiner, R. A.; Melton, Douglas E.
A number of experiments are described which show that absolute acoustic polarity is clearly audible in certain select cases of reproduced sound from acoustical instruments. The nature of the audible differences and the characteristics of the temporal signals which lend themselves to audibility are described. A large double blind listening experiment using varied musical program material is described as well.
The Sound of Midrange Horns for Studio Monitors
Volume 44 Number 1/2 pp. 23-36; January/February 1996
Authors: Holland, Keith R.; Fahy, Frank J.; Newell, Philip R.
A blind listening test is described in which 16 loudspeakers are compared with four reference loudspeakers under anechoic conditions. The test is concerned with the perceived sonic similarity between midrange horn loudspeakers and direct radiators and is intended to pinpoint the physical cause of a "characteristic sound" attributed to many studio monitor systems equipped with midfrequency-range horns. Comparisons are made between the listening test results and measurements of on-axis frequency response. The results indicate that short horns sound more similar to direct-radiating loudspeakers than long horns. It is concluded that the reflections from the mouth termination of long horns is responsible for the characteristic sound and that for studio monitor applications, a midrange horn should have a length not exceeding 340 mm and should be free of flare discontinuities.
Analyzing Listening Tests with the Directional Two-Tailed Test
Volume 44 Number 10 pp. 850-863; October 1996
Authors: Leventhal, Les; Huynh, Cam-Loi
Researchers typically analyze double-blind listening tests with a one-tailed significance test, which provides for deciding whether performance is better than chance. But the little-known directional two-tailed test, which provides for deciding whether performance is better or worse than chance, may be more useful to some investigators. This paper compares the tests, discusses when to use them, and provides statistical tables for conducting the tests without calculation.
Subjective Evaluation of State-of-the-Art Two-Channel Audio Codecs
Volume 46 Number 3 pp. 164-177; March 1998
Authors: Soulodre, Gilbert A.; Grusec, Theodore; Lavoie, Michel; Thibault, Louis
The results of double-blind subjective tests are reported, which were conducted to examine the audio quality of several state-of-the-art two-channel audio codecs against a CD-quality reference. Implementations of the MPEG Layer 2, MPEG Layer 3, MPEG AAC, Dolby AC-3, and Lucent PAC codecs were evaluated at the Communications Research Centre in Ottawa, Canada, in accordance with the subjective testing procedures outlined in ITU-R Recommendation BS.1116. The bit rates varied between 64 and 192 kbit/s per stereo pair. The study is unique in that this is the first time that these codecs have been compared in a single test. Clear results were obtained for comparing the subjective performance of the codecs at each bit rate. All codecs were software based and constituted the most current implementations at the time of testing. An additional hardware-based MPEG Layer 2 codec was included in the tests as a benchmark.
Subjective Testing of Compression Drivers
Volume 53 Number 12 pp. 1152-1157; December 2005
Authors: Geddes, Earl R.; Lee, Lidia W.; Magalotti, Roberto
[Engineering Report] A subjective test was devised and performed in order to assess the factors that influence the perception of sound emitted by compression drivers. A musical passage was high-pass filtered and played through three compression drivers of similar characteristics, loaded by a plane-wave tube, and recorded. To obtain different levels of nonlinear distortion, the passage was played at three different voltage levels on each driver. The resulting sound files were recombined with the low-pass-filtered portion, yielding nine complete sound pieces whose only differences from the original passage were caused by the drivers’ behavior. The nine stimuli were then presented, in a double-blind test, to 27 subjects, who were asked to rate audible differences when compared to the original passage. Analysis of the results shows that the differences in frequency response between drivers are statistically significant, whereas differences in playing level, and therefore nonlinear distortion, were not significant. This unexpected result implies that nonlinear distortion is not audible under these test conditions, and it leads to important conclusions regarding the design objectives of compression drivers.
Add to these any of Toole and Olive's papers on loudspeaker preference in the JAES.
There's a reason many of these deal with loudspeakers. It's because loudspeakers really are likely to sound DIFFERENT -- you can predict that just from how they work, and how they measure. So it becomes a study of audible determinants of preference -- arguably a more fruitful and interesting topic than 'can people hear a difference between CD players?'.
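For anyone wondering what the one-tailed vs. directional two-tailed distinction in the Leventhal and Huynh abstract looks like in practice, here is a rough Python sketch using exact binomial tails for a forced-choice test where pure guessing gives 50% correct. The function names, the 16-trial scores, and especially the equal split of alpha between the two tails are my own assumptions for illustration; the paper's tables may divide alpha differently, so check it before relying on this.

```python
from math import comb


def upper_tail(correct, trials, p=0.5):
    """P(X >= correct) if the listener is purely guessing."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(correct, trials + 1))


def lower_tail(correct, trials, p=0.5):
    """P(X <= correct) if the listener is purely guessing."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(0, correct + 1))


def one_tailed(correct, trials, alpha=0.05):
    """Only asks: is performance better than chance?"""
    if upper_tail(correct, trials) <= alpha:
        return "better than chance"
    return "no decision"


def directional_two_tailed(correct, trials, alpha=0.05):
    """Asks: better OR worse than chance?

    Assumption: alpha is split equally between the two tails.
    """
    half = alpha / 2
    if upper_tail(correct, trials) <= half:
        return "better than chance"
    if lower_tail(correct, trials) <= half:
        return "worse than chance"
    return "no decision"


if __name__ == "__main__":
    # Hypothetical scores out of 16 trials.
    for score in (14, 12, 8, 2):
        print(score,
              round(upper_tail(score, 16), 4),
              one_tailed(score, 16),
              "|",
              directional_two_tailed(score, 16))
```

With these made-up numbers, 12 of 16 passes the one-tailed test but not the two-tailed one (the upper tail is about 0.038, above the 0.025 each tail gets), while 2 of 16 is flagged as worse than chance only by the two-tailed version. That is the trade-off: you gain the ability to detect below-chance performance at the cost of a somewhat stricter bar in the usual direction.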