
AVS/AIX High-Resolution Audio Test: The Results So Far

29K views 456 replies 51 participants last post by  arnyk 
#1 ·


The first results of our high-resolution audio experiment are in, and they are very interesting.

A little over two weeks ago, I made available three pairs of level-matched 24/96 WAV files provided by Mark Waldrep (aka Dr. AIX) of AIX Records as part of an experiment. We want to see if true high-res audio (HRA), with ultrasonic frequencies and wide dynamic range, can be reliably distinguished from the same recording limited to Redbook CD specs. I invited AVS members to download the files, listen to both versions of each track, and PM me with their determinations of which version is true high-res in each case, along with the make and model of the components in the audio system and how they are connected. The files are available here.
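For readers curious what "limiting to Redbook CD specs" involves, the conversion can be sketched as follows. This is an illustrative sketch only, not AIX's actual processing chain; the function name and the dithering choices are my own assumptions.

```python
import numpy as np
from scipy.signal import resample_poly

def to_redbook(x_96k: np.ndarray) -> np.ndarray:
    """Illustrative sketch: reduce a 24/96 track to Redbook specs
    (44.1 kHz, 16-bit). 44100/96000 reduces to 147/320, so polyphase
    resampling handles the rate change (and low-pass filtering) in one step."""
    x_44k1 = resample_poly(x_96k, up=147, down=320)
    # TPDF dither at the 16-bit LSB, then quantize to 16-bit levels.
    lsb = 1.0 / 32768.0
    dither = (np.random.uniform(-0.5, 0.5, x_44k1.shape)
              + np.random.uniform(-0.5, 0.5, x_44k1.shape)) * lsb
    return np.clip(np.round((x_44k1 + dither) * 32767) / 32767, -1.0, 1.0)
```

The relevant point for the test is that a chain like this stays entirely in the digital domain; no DA/AD pass is required.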

So far, I've received responses from 13 members with non-HRA systems and six members with HRA-capable systems, and the results are quite interesting. Of the 13 responses from members with non-HRA systems, three were unable to distinguish any difference between the files at all. One identified none of the HRA versions correctly, five got one right, two identified two versions correctly, and two got all three right, as shown in the graph above. The average score among the 10 who submitted determinations is 50% correct.

Among those who have HRA-capable systems, all six identified the high-res versions perfectly—there wasn't a single incorrect determination. Obviously, both results support my contention that high-res audio requires a high-res audio system to reliably discern the difference that HRA can make to the sound of a recording.

Other interesting results include the fact that two of the six with HRA-capable systems used headphones rather than speakers, and I verified that their frequency response extends beyond 20 kHz (at least according to the manufacturers' specs). One of the headphone users also played the files on a system with HRA-capable speakers and came to the same determinations about which version was which. And one of the speaker users also tried a pair of HRA-capable headphones and came to the same conclusions, but when he tried a pair of non-HRA headphones, he reported not being able to hear a difference.

The track most commonly cited as being the easiest to hear the difference was "Mosaic"—which got the highest number of correct identifications among the non-HRA group—though one respondent said that Mosaic was the only one he couldn't hear a difference in. Two respondents said that "On The Street Where You Live" was easy to differentiate, but another one said it was the hardest.

Of course, these results are anecdotal and not statistically significant, and the entire experiment is not scientifically controlled—I can only hope that participants don't cheat and look at the spectra of the two versions before listening and deciding which is which. Plus, the only way I have to verify that an audio system is truly HRA-capable is to go by the manufacturer specs.

Nevertheless, I think this is an interesting and fun experiment that I'd like to continue for another two weeks or so, giving more AVS members the opportunity to download the files and see if they can hear the difference and identify which version is true high res. If you'd like to add your determinations to the results, please PM me with your selections and a description of your audio system, including the make and model of each piece of gear and how they are connected together, plus any comments you care to make about your experience. I am collecting all these data in a spreadsheet to see what we can learn from them.

Meanwhile, I'll be sending PMs to those who have already sent me their determinations with the results of how well they did. All I ask is that they not reveal which version is which in any thread on AVS or elsewhere so those who haven't yet participated can do so fairly.

Let the listening continue!

#5 ·
While I appreciate the effort and spirit of the thread, surely with a site this size we could get a LOT more input if just a few members were willing to let others listen in on their "qualified" rigs, rather than leaning on the small number with systems offering the needed resolution?

Ditto in the scientific validity dept...setting up a couple valid ABX's really wouldn't be that big of a deal.

Thanks again...just an idea.


James
 
#8 ·
I would LOVE to have many more members submit their determinations, but I can't force them to. I hope to get more responses over the next couple of weeks. I suspect that suitably high-res audio systems are relatively rare, but there have to be more than six AVS members who own such a system. Also, I think it's a great idea for those who don't to seek out those who do and see if they can try the test during a visit.

As far as I can see, setting up a consistent ABX test with different computers and software isn't as easy as you suggest. For example, foobar for Windows and ABXTester for the Mac work in different ways and generate somewhat different results, and I couldn't figure out how to combine them into a consistent framework. Also, identifying which one is "X" isn't the same thing as identifying which one is high-res, which is the point of this particular experiment.
 
#6 ·
Have some questions about your files. The downsampled versions are 0.2 dB quieter. This is about where level differences are known to become audible. Further, if you null them out, you get a better null if you put that 0.2 dB back. I think this calls the results into question.

My next question concerns what appears to be a sub-sample timing shift between the files. The downsampled files are also a tiny amount slower. Normally you would see such a thing if the downsampled files went through another DA/AD loop. Did this happen? Each file also has a time offset of a few samples. If you took the original and digitally downsampled it, I don't see why this would occur. Again, this suggests an extra trip through the analog realm for the downsampled files.

When I take your hires file and do a downsampling everything lines up to the nearest bit automatically and the files null out far better. There is no level difference and no timing drift. If you normalize such files that actually corrupts the results.

So my primary question involves the level difference and whether or not these downsampled files have been analog an additional time somewhere along the way.
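For anyone who wants to check the level difference and null depth themselves, the measurements can be sketched in a few lines. This assumes the two WAV files are already loaded as equal-length numpy arrays; the function names are illustrative.

```python
import numpy as np

def rms(x: np.ndarray) -> float:
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(x.astype(np.float64) ** 2)))

def level_difference_db(a: np.ndarray, b: np.ndarray) -> float:
    """RMS level of a relative to b, in dB (positive means a is louder)."""
    return 20.0 * np.log10(rms(a) / rms(b))

def null_residual_db(a: np.ndarray, b: np.ndarray, gain_db: float = 0.0) -> float:
    """Peak of (a - gain*b) relative to the peak of a, in dB.
    Restoring the measured level offset before subtracting deepens the null."""
    g = 10.0 ** (gain_db / 20.0)
    return 20.0 * np.log10(np.max(np.abs(a - g * b)) / np.max(np.abs(a)))

# A 0.2 dB offset is only about a 2.3% amplitude difference:
# 10 ** (0.2 / 20) ~= 1.0233
```

A sub-sample timing shift would additionally require a fractional-delay alignment step before the null test becomes meaningful.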
 
#7 ·
The level difference you cite is exactly what some members identified in the first set of files I posted. I subsequently posted a second set of files with matched levels; those are the files everyone should be listening to, and they are available at the link in the original post above. They can be distinguished from the original set of files by file names that include "A2" and "B2" rather than "A" and "B." We matched levels by lowering the level of the native 24/96 files by 0.2 dB; we did not normalize the files.

The files did not go through a DA/AD loop. We considered doing that to bring the downsampled version back up to 24/96, but in the end, we decided to keep everything in the digital domain.

The purpose of this test is to have people listen to the files without analyzing them in software—just listen. We believe that any differences in level and speed between the two versions of each track are now well under the perceptible threshold. If you can hear a difference between the versions, which one is native high res, A2 or B2? That's all we're looking for here.
 
#10 · (Edited)
Correct me if I'm wrong -- the task for each listener is to listen to three pairs of files, and pick the 'Hi rez' version in each pair?


if so, NB: a 3-trial test has a p = 0.125 for a perfect score (3 correct in 3 tries). That means we could 'expect' 12.5 perfect scores to occur in 100 tries -- simply by chance (e.g., by 'choosing' each answer via a coin flip)


Most scientific work requires p < 0.05 before a result is considered significant.
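The chance probabilities quoted above are easy to reproduce with a minimal sketch:

```python
# Chance of a perfect score when each of n picks is a 50/50 guess: 0.5**n.
def p_perfect(n: int) -> float:
    return 0.5 ** n

print(p_perfect(3))  # 0.125 -- a guesser aces all three tracks 12.5% of the time
print(p_perfect(5))  # 0.03125 -- five trials is the first point below p = 0.05
```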
 
#19 ·
I must have misread something somewhere in my haste; I'll blame the iPhone.

Take care
James
 
#20 ·
A lot of headphones are indeed fine for this experiment; we just have to pay attention to the associated components. Some built-in onboard sound cards have a sharp rolloff at the 20 kHz mark, which doesn't help in discerning the files. Let's hope that we get more participants. Scott has gone to great lengths to arrange this; for that, thank you very much.
 
#21 ·
From the beginning, I have fully acknowledged that this is not a scientific test; it's an informal and fun experiment meant to give AVS members an opportunity to explore for themselves how much benefit high-res audio offers, and to gather some data about how well people can discern any difference.

Even "informal and fun experiment" may be giving it too much credit, however. The basic problem is that you know how many people could distinguish the files correctly and reported their results, but you don't know how many couldn't do it and failed to report their results (or didn't even bother to try, because they knew they'd only be guessing). So the data you have gathered is totally meaningless.
 
#23 · (Edited)
Wow, so people couldn't tell the difference between High-res and CD.

Meanwhile some people claim they can hear a difference just from adding POWER CONDITIONING, or a luxury loudspeaker or amplifier:rolleyes: I think we can lay that to rest once and for all.

I'm sure someone employed by or having connections to the industry (many people here) will argue against this, however. I love how Scott covers himself with that "non-scientific" disclaimer, since he wants to keep his position and keep the advertisers happy. Listening is listening, and this was a listening test.
 
#24 ·
My "non-scientific" disclaimer has nothing to do with keeping my position or keeping the advertisers happy; it has everything to do with keeping this experiment as open and transparent as possible (except for which file is which, of course!). However, you are exactly correct that this is a listening test, and all I want people to do is listen.
 
#26 ·
The problem isn't with the test, which is merely meaningless. It is with the misuse of the "results" by slimebucket audio salesmen and their enablers, which is already occurring elsewhere on AVS.
 
#27 ·
Hi Scott

I just want to be sure that I understand your definition of an HRA system. If I download the music files to a thumb-drive and play them through the USB port in my Oppo 105D, will that constitute an HRA system? If the answer is yes, then I'm going to participate in this test and see if I can differentiate between HRA and non-HRA files! Thanks.
 
#29 ·
The system you specify certainly qualifies as HRA, but only as a front end. You also need to consider the preamp, power amp, and speakers or headphones; they all need to be able to pass frequencies well above 20 kHz and a dynamic range beyond 93 dB. If any component in the system can't do that, it's a bottleneck that renders the system as a whole non-HRA.

For example, the Oppo's analog audio outputs are spec'd out to 96 kHz at -1.5 dB, while the headphone output is spec'd to 20 kHz, ±0.3 dB into 300 ohms. I suspect the headphone output might be able to go much higher, but I don't know, so plugging a pair of HRA-capable headphones into the Oppo's headphone output might or might not reproduce the ultrasonics in the native high-res files. I'll contact Oppo and see what they have to say about this.
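For reference, the dynamic-range figures under discussion line up with the standard quantization-noise rule of thumb of roughly 6.02 dB per bit, plus 1.76 dB for a full-scale sine. A quick sketch:

```python
def quantization_snr_db(bits: int) -> float:
    """Textbook SNR of a full-scale sine against quantization noise:
    the common 6.02*N + 1.76 dB rule of thumb."""
    return 6.02 * bits + 1.76

print(quantization_snr_db(16))  # ~98.1 dB: the theoretical ceiling for Redbook CD
print(quantization_snr_db(24))  # ~146.2 dB: 24-bit, well beyond any analog chain
```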
 
#31 ·
I must respectfully disagree; I have already received a goodly number of responses with incorrect identifications. Of course, I can't know how many people got it wrong (or right, for that matter) if they don't report their findings, and of course, if someone doesn't even try, they're not going to submit their findings.

Which is why your data is meaningless. You haven't even been able to collect all of it. Without knowing how many people tried and failed, any conclusions drawn will be baseless.

I hope people had fun. There wasn't any other point.
 
#35 ·
So, your logic is that - because everyone on AVS didn't submit data (myself included) - the conclusions drawn from the data that was gathered are baseless?

I don't think you get it.
 
#32 ·
I want to know exactly what qualifies a system as "Hi-Res" capable?

And why. If we don't know what people are hearing (and we don't), how can we possibly determine that one system is reproducing what is necessary to determine a difference and another isn't? This is just an exercise in selection bias, on top of the original exercise in self-selection bias. You couldn't make this more bogus if you tried.
 
#36 ·
While it looks generically like the kind of thing you'd only try to publish if you were an "expert" in litigation, it wasn't proposed as a scientifically valid experiment. But I don't know how bias would cause accurate assessment in what seems like a blind test. I don't have transducers that go ultrasonic, but I will check out the files at some point for fun.

If there are folks who accurately identify the files, blind, the question is why. Is there something that makes the higher res files distinguishable? Is it the higher res, or something else? Fun at least to speculate.
 
#34 ·
HRA system examples?

Scott,
When looking at the specifications of a pre-amp/amp/avr for the appropriate frequency range, what numbers should we be looking at?
This Marantz (http://us.marantz.com/us/Products/P...Id=HiFiComponents&SubCatId=0&ProductId=PM8005) gives:
THD: 0.02% (20Hz – 20kHz, 2 channel driven, 8 ohms load)
and
Frequency response: 5Hz – 100kHz ±3dB (CD, 1W, 8 ohms load)
This Yamaha (http://usa.yamaha.com/products/audio-visual/hifi-components/amps/a-s2000_black__u/?mode=model) gives
Minimum RMS Output Power: 90 W + 90 W (8 ohms, 20 Hz-20 kHz, 0.02% THD); 150 W + 150 W (4 ohms, 20 Hz-20 kHz, 0.02% THD)
and
Frequency Response (CD, etc. to speaker out, Flat position): 5 Hz-100 kHz, +0/-3 dB; 20 Hz-20 kHz, +0/-0.3 dB


Also, could you post some of the HRA systems provided in the responses so far?
 
#38 ·
So, your logic is that - because everyone on AVS didn't submit data (myself included) - the conclusions drawn from the data that was gathered are baseless?

No, my point is that because everyone who downloaded the files didn't submit data, we have a skewed data set. In particular, there could be a lot of people who listened to the files, decided that they couldn't tell the difference, and just didn't go further. Had they all reported their guesses, we might have a far higher percentage of wrong answers than we do. And since we don't know how much higher, we can't draw any conclusions.

A good test would require you to recruit your test subjects in advance and require all of them to complete the test. This was not a good test.
 
#54 ·
So, your logic is that - because everyone on AVS didn't submit data (myself included) - the conclusions drawn from the data that was gathered are baseless?

No, my point is that because everyone who downloaded the files didn't submit data, we have a skewed data set. In particular, there could be a lot of people who listened to the files, decided that they couldn't tell the difference, and just didn't go further. Had they all reported their guesses, we might have a far higher percentage of wrong answers than we do. And since we don't know how much higher, we can't draw any conclusions.
i'm at that point right now too.
Should I really PM a pure guess? I mean, I don't have any clue at all.
 
#39 ·
While it looks generically like the kind of thing you'd only try to publish if you were an "expert" in litigation, it wasn't proposed as a scientifically valid experiment.

Then it should not be interpreted as a scientifically valid experiment. But it will be, because there are people here who will stretch any truth to sell product. I presume I don't need to mention any names.

If there are folks who accurately identify the files, blind, the question is why.

Based on the data we have, we cannot say that they weren't just lucky guessers.

Is there something that makes the higher res files distinguishable? Is it the higher res, or something else? Fun at least to speculate

Have all the fun you want. But you could have done the speculating just as well before the test.
 
#47 ·
there are people here who will stretch any truth... I presume I don't need to mention any names.

If you are looking at a mirror, it should be obvious.
 
#40 · (Edited)
Knowing that the results are not scientifically valid, why do it at all? Seems like a waste of time. Maybe as a prelude to iron out a methodology prior to undertaking an actual double-blind study, sure. I'm very interested in real results, not anecdotes from "golden ears" on the internet.

People lie about all kinds of things, especially how good their hearing is (on websites like these); that's beyond a doubt.
 
#45 ·
@mcnarus and RLBurnside:

Science (up and down every single field) is completely littered with tests/experiments that are there simply to see if there is a trend worth pursuing in a more rigorous setting. This doesn't give meaningless results at all. The proposal was to see if people could tell the difference, and given the results he received, there is enough evidence to warrant further investigation; most likely the answer is yes. It doesn't matter how many non-reporting people there are: the way this experiment was set up, it only requires positive answers, and it doesn't even need to get out of the margin of error (yes, that would be nice, but he's receiving explanations along with the answers, which also count for something).

It'd be nice to be able to get the breakdown of the general population on whether or not they can tell, and what the percentage is at various resolution levels, but that wasn't the point of this.
 
#46 ·
Science (up and down every single field) is completely littered with tests/experiments that are there simply to see if there is a trend worth pursuing in a more rigorous setting.

True, but this particular "experiment" is insufficient to demonstrate any "trend" worth pursuing. At best, you've demonstrated that something in the way these files were coded produced an audible difference in the passband. Big deal. And that's only if you misinterpret the results.

This doesn't give meaningless results at all, the proposal was to see if people could tell the difference and given the results that he received, the answer is that there is enough evidence to warrant further investigation and mostly likely the answer is yes.

No, there really isn't. You can't even say for sure that people weren't just lucky guessers. What scientist is going to stop for a second to think about this result? It is the product of wishful thinking, nothing more. A scientist needs something more than that to go on.
 
#58 ·
Yeah, these results could not be less scientific, but my curiosity is still fully piqued. It's possible to interpret the results such that 100% of the people who evaluated the files with an HRA-capable system could detect the difference. There are only six people in the "sample," but all we need is ONE person who can reliably demonstrate the ability to detect ultrasonic frequencies, and some of my understanding of the science here is thrown out the window.

Put another way, THIS test is just a fun one and is AT MOST just suggestive of a possible result... but it's suggestive enough that I think it justifies an actual scientific test to determine if there is new knowledge to be had here or not. I believe that Dr Waldrep was considering doing such a thing and I hope so! This is all utterly fascinating.

(For the record, I didn't submit my results because I've proven time and again with ABXTester that I can't even tell the difference between an MP3 and a CD :eek: So the fact that I can't tell the difference between CD-quality and high-res is hardly a surprise and would just skew the results.)
 
#59 ·
There are only six people in the "sample," but all we need is ONE person who can reliably demonstrate the ability to detect ultrasonic frequencies, and some of my understanding of the science here is thrown out the window.

First, there are not just 6 people in the sample. There are 6 people in Scott's cherrypicked subset of the sample. And cherrypicking is a big no-no in statistics.

Second, so far no one has demonstrated that they can reliably tell the difference between these two types of files, because 3 out of 3 isn't statistically significant.

Third, even if some people can hear a difference, it does not mean that they can detect ultrasonic frequencies. It means that, either in the way the files were made or the way they were played back, there are distortion artifacts in the audible range in one or both of the files.

(For the record, I didn't submit my results because I've proven time and again with ABXTester that I can't even tell the difference between an MP3 and a CD So the fact that I can't tell the difference between CD-quality and high-res is hardly a surprise and would just skew the results.)

No, it won't skew the results. In fact, we need to know how many people can't do this in order to know whether anybody can. That's why the whole exercise is bogus. If you take yourself out of the sample because you think you know what your result will be, then it is you who are skewing the results.
 
#60 ·
First, there are not just 6 people in the sample. There are 6 people in Scott's cherrypicked subset of the sample. And cherrypicking is a big no-no in statistics.
I suggest studying Simpson's Paradox. Not only is conditioning on subgroups allowed, but without doing so you can get results that are the opposite of the truth.
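For anyone unfamiliar with it, Simpson's paradox is easy to demonstrate with a tiny illustrative data set (the numbers below are made up for illustration): one group can win every subgroup comparison yet lose the overall one.

```python
# Made-up counts: (successes, attempts) per subgroup for two groups.
group_a = {"easy": (81, 87), "hard": (192, 263)}
group_b = {"easy": (234, 270), "hard": (55, 80)}

def rate(successes: int, attempts: int) -> float:
    return successes / attempts

# Group A wins within *every* subgroup...
for sub in ("easy", "hard"):
    assert rate(*group_a[sub]) > rate(*group_b[sub])

def overall(g: dict) -> float:
    return rate(sum(s for s, _ in g.values()), sum(n for _, n in g.values()))

# ...yet Group B wins overall: Simpson's paradox.
assert overall(group_a) < overall(group_b)
print(round(overall(group_a), 3), round(overall(group_b), 3))  # 0.78 0.826
```

The paradox arises because the two groups face very different mixes of easy and hard cases, which is exactly why subgroup structure cannot simply be ignored.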

Second, so far no one has demonstrated that they can reliably tell the difference between these two types of files, because 3 out of 3 isn't statistically significant.
But this certainly is:

Thank you Scott! I much appreciate the effort you have put into this project. For the first time I feel that the forum is moving forward toward a better understanding of this topic.

foo_abx 1.3.4 report
foobar2000 v1.3.2
2014/07/10 18:50:44

File A: C:\Users\Amir\Music\AIX AVS Test files\On_The_Street_Where_You_Live_A2.wav
File B: C:\Users\Amir\Music\AIX AVS Test files\On_The_Street_Where_You_Live_B2.wav

18:50:44 : Test started.
18:51:25 : 00/01 100.0%
18:51:38 : 01/02 75.0%
18:51:47 : 02/03 50.0%
18:51:55 : 03/04 31.3%
18:52:05 : 04/05 18.8%
18:52:21 : 05/06 10.9%
18:52:32 : 06/07 6.3%
18:52:43 : 07/08 3.5%
18:52:59 : 08/09 2.0%
18:53:10 : 09/10 1.1%
18:53:19 : 10/11 0.6%
18:53:23 : Test finished.

----------
Total: 10/11 (0.6%)
The third track was pretty easy. The first segment I picked was quite revealing:

foo_abx 1.3.4 report
foobar2000 v1.3.2
2014/07/10 21:01:16

File A: C:\Users\Amir\Music\AIX AVS Test files\Just_My_Imagination_A2.wav
File B: C:\Users\Amir\Music\AIX AVS Test files\Just_My_Imagination_B2.wav

21:01:16 : Test started.
21:02:11 : 01/01 50.0%
21:02:20 : 02/02 25.0%
21:02:28 : 03/03 12.5%
21:02:38 : 04/04 6.3%
21:02:47 : 05/05 3.1%
21:02:56 : 06/06 1.6%
21:03:06 : 07/07 0.8%
21:03:16 : 08/08 0.4%
21:03:26 : 09/09 0.2%
21:03:45 : 10/10 0.1%
21:03:54 : 11/11 0.0%
21:04:11 : 12/12 0.0%
21:04:24 : Test finished.

----------
Total: 12/12 (0.0%)

:)
foo_abx 1.3.4 report
foobar2000 v1.3.2
2014/07/11 06:18:47

File A: C:\Users\Amir\Music\AIX AVS Test files\Mosaic_A2.wav
File B: C:\Users\Amir\Music\AIX AVS Test files\Mosaic_B2.wav

06:18:47 : Test started.
06:19:38 : 00/01 100.0%
06:20:15 : 00/02 100.0%
06:20:47 : 01/03 87.5%
06:21:01 : 01/04 93.8%
06:21:20 : 02/05 81.3%
06:21:32 : 03/06 65.6%
06:21:48 : 04/07 50.0%
06:22:01 : 04/08 63.7%
06:22:15 : 05/09 50.0%
06:22:24 : 05/10 62.3%
06:23:15 : 06/11 50.0%
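For reference, the percentage column in these foo_abx logs is just the binomial probability of doing at least that well by guessing. It can be reproduced with the standard library (a sketch of the statistic, not foobar's actual code):

```python
from math import comb

def guess_probability(correct: int, trials: int) -> float:
    """P(at least `correct` right in `trials` tries) under 50/50 guessing --
    the percentage foo_abx prints after each trial."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(f"{guess_probability(10, 11):.1%}")  # 0.6%, matching the 10/11 run above
print(f"{guess_probability(12, 12):.1%}")  # 0.0% (actually ~0.02%)
print(f"{guess_probability(6, 11):.1%}")   # 50.0%, matching the Mosaic run
```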
 