High Resolution Audio Perception Meta-Analysis

The June 2016 issue of the Journal of the Audio Engineering Society (JAES) includes a fascinating paper entitled “A Meta-Analysis of High Resolution Audio Perceptual Evaluation” by Joshua D. Reiss, PhD, a Reader (professor) at Queen Mary University of London’s Centre for Digital Music. In his paper, Dr. Reiss summarizes the results of his meta-analysis of 18 published experiments, with a total of over 400 participants in over 12,500 trials, intended to determine whether high-res audio can be perceptually distinguished from “standard” audio that conforms to CD specs.

For those who might not be familiar with the term “meta-analysis,” it’s a process that compiles data from multiple studies, performs statistical analysis on the aggregate data, and draws new conclusions from this analysis. As Wikipedia puts it, “Conceptually, a meta-analysis uses a statistical approach to combine the results from multiple studies in an effort to increase power (over individual studies), improve estimates of the size of the effect and/or to resolve uncertainty when reports disagree. A meta-analysis is a statistical overview of the results from one or more systematic review. Basically, it produces a weighted average of the included study results.”
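To make that “weighted average” idea concrete, here’s a minimal sketch of fixed-effect inverse-variance pooling, one common way meta-analyses combine study results. The study numbers below are purely illustrative and are not taken from the paper:

```python
# Fixed-effect inverse-variance pooling: each study's effect estimate
# is weighted by the inverse of its variance, so larger and more
# precise studies contribute more to the pooled result.
studies = [
    # (effect estimate, variance) -- illustrative values only
    (0.55, 0.010),
    (0.62, 0.025),
    (0.51, 0.004),
]

weights = [1.0 / var for _, var in studies]
pooled = sum(w * eff for (eff, _), w in zip(studies, weights)) / sum(weights)
print(f"pooled effect: {pooled:.3f}")  # a precision-weighted average
```

Weighting each study by its precision is what gives the pooled estimate more statistical power than any single study on its own.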

The studies used in the meta-analysis were not identical. In fact, they can be divided into two main groups: auditory perception resolution and format discrimination. In the first group, most of the studies were concerned with frequency and temporal resolution—that is, the extent to which humans perceive frequencies above 20 kHz as well as time smearing caused by lowpass and anti-aliasing filters. The studies in the second group focused on how well humans can distinguish between different formats, including CD, high-res PCM, and DSD.

Also of interest is the methodology used in these studies. They included AB (play sample A, then sample B, and ask participants to identify each one), ABX (play sample A, then sample B, then one or the other, and ask participants to identify it as A or B), and others. In addition, an analysis of the different methodologies, and the possible biases that could arise from them, helps in understanding the results. In the end, the meta-analysis sought to transform and combine the results from the various studies to determine the statistics of correct responses across all of them.
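To illustrate how an ABX protocol works, here’s a small hypothetical simulation (not code from the paper): a listener who genuinely cannot tell the samples apart converges on 50% correct, which is exactly the chance baseline these studies test against.

```python
import random

def abx_trial(sample_a, sample_b, listener):
    """One ABX trial: present A and B, then X (a random re-presentation
    of A or B), and ask the listener whether X was A or B."""
    x_is_a = random.random() < 0.5
    x = sample_a if x_is_a else sample_b
    return listener(sample_a, sample_b, x) == x_is_a

# A listener with no ability to discriminate can only guess.
random.seed(1)
guesser = lambda a, b, x: random.random() < 0.5
results = [abx_trial("cd_quality", "high_res", guesser) for _ in range(10_000)]
print(sum(results) / len(results))  # hovers around 0.5
```

Any consistent performance above that 50% baseline is evidence of a real, audible difference—which is what the meta-analysis set out to measure.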

[Table: HRA-Study-List]
The first section of this table lists the studies included in the meta-analysis. The second section identifies the risk of various potential biases (“-” means low risk, “?” means unclear risk, and “?” in a box means high risk) along with the types of errors those risks lead to (Type I = false positive, Type II = false negative, Neutral = neither). The third section indicates the total number of trials and correct answers for each study, along with the associated binomial probability—assuming there is no discernible difference, this is the probability of obtaining at least that many correct answers by chance. The numbers in boldface are statistically significant results.
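That binomial probability can be computed directly: under the null hypothesis of no audible difference, each trial is a fair coin flip, so the p-value is the upper tail of a Binomial(n, 0.5) distribution. A minimal sketch with hypothetical numbers (60 correct out of 100 trials, not figures from any particular study):

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """Probability of at least k correct answers in n trials, when each
    answer is correct with probability p (p = 0.5 under the null)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical: 60 correct out of 100 trials under pure guessing.
print(round(p_at_least(60, 100), 4))
```

For 60 of 100, the tail probability comes out below 0.05, so a result that far above chance would be flagged as statistically significant at the conventional level—this is exactly the quantity reported in the table’s third section.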

The paper makes special note of the Meyer and Moran study from 2007, which has been widely discussed on AVS Forum and elsewhere—in fact, AVS Forum is cited in Dr. Reiss’ paper! It has the most participants of any study, but not all the required data was available, so it could only be included in parts of the meta-analysis. Among the problems cited in that study was that many of the test tracks might not have included high-resolution content, for three different reasons: the encoding scheme of SACD obscures frequencies above 20 kHz; the mastering of SACD and DVD-Audio content might have applied additional lowpass filters; and some of the source material might not have been recorded in high resolution in the first place. Also, according to Dr. Reiss’ paper, the experimental setup was not well-described, and the experiment was not well-controlled. Still, there was enough valid data to use in certain parts of the meta-analysis.

One aspect of the meta-analysis was to see if trained listeners—those who had been instructed in what to listen for, heard examples and learned the results before the test, and so on—performed significantly better than untrained listeners. (Not all studies identified trained and untrained listeners, so only those that did are included in this part of the meta-analysis.) As you might expect, trained listeners performed much better, as shown in the data plot below.

[Figure: HRA-Trained-Untrained]
In this forest plot, the untrained listeners are clustered around 50% correct, though most of the studies are slightly above that. The trained listeners performed much better, usually between 60% and 70% correct, though the confidence intervals are much wider than for most of the studies with untrained listeners.

Several other factors were examined to see how they might have affected the results. These included the duration of each stimulus and the interval between stimuli, the test methodology, and bit depth, among others (most of the studies were more concerned with high-frequency content than with bit depth).

Overall, the meta-analysis concluded that, within the included studies, there was a small but statistically significant ability to discriminate between CD-quality audio and high-resolution audio with specs beyond those of CD. Also, trained listeners were substantially more successful than untrained participants at correctly distinguishing between high-res and CD-quality audio. In addition, the selection of stimuli and their duration may play an important role in the ability to discriminate, and the potential biases of the studies tended toward Type II (false negative) errors.

Several aspects of high-res audio perception could not be confirmed or denied because they were not a significant part of the studies. For example, as I mentioned, most of the studies were more concerned with high sample rates, not bit depth. Also, none of the studies used headphones, so questions about how headphones affect high-res audio perception remain open. Perhaps most important, the specific parameters of the audio systems used in the studies—such as the choice of applied filters, audio formats, and hardware components of the recording and playback systems—were not considered.

Clearly, more research is needed to quantify how high-res audio is perceived and how strongly it affects the listening experience. But this paper adds to our growing knowledge base and supports the notion that high-res audio is worthwhile—at least for those of us who care about such things.

Dr. Reiss’ paper is available to download for free; to get it, click here. But be forewarned—you’ll need some familiarity with statistical analysis to fully understand it.