Yesterday, a few other enthusiasts and I got together to conduct a processor blind listening test. Our goal was to determine whether the different DACs and analog preamp stages of different receivers and pre/pros can affect sound quality. Amplification was not being tested, just processor sound quality, so a separate amp handled all of the amplification. The units we used were, in my opinion, a good representation of the various levels of product that most of us in this hobby will consider.
Processors:
Pioneer VSX 1014 - Essentially the same unit as the newer 1015, this receiver has become the standard for entry-level receivers. Plenty of features, a decent amp section, and a reputation for being great for movies and not so great for music.
Harman Kardon AVR 635 - Not a high-end receiver by any means, but definitely regarded as a step up from entry level. Tons of features, a beefy amp section, and a reputation as one of the top receivers in terms of sound quality.
Audio Refinement Pre-2DSP - A dedicated AV preamp processor that is regarded as another step up from receivers. While this unit is not the most expensive and doesn't have the longest feature list, it is regarded as one of the most musical pre/pros, with great sound quality.
Amplifier:
PS Audio HCA 2 - Quality 2-channel amp, also well regarded for its sound quality.
Speakers:
Totem Acoustics Forrests
CD Player:
Panasonic DVD S77
Cables:
RS Gold analog stereo, Dayton Audio digital coax, DIY dual 14 gauge twisted pair speaker wire
The participants were Jon, his girlfriend Gudrun, my friend Tyler, and myself. ---k---, a member from htguide.com, and another professor from Purdue were all scheduled to come as well, but they wimped out...buncha wimps. Again, since we were only interested in testing processing, we ran the digital coax output from the cd player to each unit and then the analog preouts from each unit to the amplifier. All of the equipment (aside from the speakers, of course) was kept in a second room with the doors shut so neither it nor the moderator was visible to the listeners - the speaker cables ran under the door. Jon will post a few pictures of the setup when he replies to this thread. We made sure to eliminate all variables that might affect the sound but aren't related to the actual processing, so EQ, tone controls, distance settings, subwoofer functions, etc., in each processor were turned off. The units simply had to decode the incoming digital stream and send full-range signals out to the speakers with no post processing. We didn't use a subwoofer because each unit may have different crossover slopes or bass management methods, and that could possibly affect what we heard in a way that could not be attributed to the DACs or analog preamp stage. We didn't test surround sound performance because I have already had discussions with an algorithm engineer from Dolby about whether different DSPs can affect what info is steered to different channels or whether they can affect sound quality.
The units were calibrated to each other using a test CD with a wide-band pink noise tone and a digital RS meter mounted to a tripod placed at the main seating position. We plugged in the L channel output from one unit and adjusted the master volume until we registered 66 dB from the tone. Then we unplugged the L channel, plugged in the R channel, and adjusted the individual R channel settings until it also read 66 dB. Then we plugged in both channels and measured the output, which in Jon's room was 70 dB. We did this with each unit until we had identical output levels.
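As a sanity check on the level matching: the ~70 dB combined reading is close to what you'd predict for two uncorrelated pink-noise sources each at 66 dB, since acoustic power, not dB, is what adds. A minimal sketch of that arithmetic (my own illustration in Python, not part of our actual procedure):

```python
import math

def db_sum(levels_db):
    """Combined level of uncorrelated sources: convert dB to power,
    sum the powers, convert back to dB."""
    total_power = sum(10 ** (lvl / 10) for lvl in levels_db)
    return 10 * math.log10(total_power)

# Two channels each calibrated to 66 dB:
combined = db_sum([66, 66])
print(round(combined, 1))  # -> 69.0 (i.e., +3 dB over one channel)
```

The extra decibel we measured over the theoretical ~69 dB is plausibly down to room reflections and partial correlation between channels.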
Our test consisted of two parts. The first part was to test whether there were sound quality differences between units - not which unit we preferred, not what differences we noticed, JUST whether we heard differences. The second part, which was to be conducted only if our first test statistically proved to us that differences did exist (based on a 70% accuracy level), was to test which unit had the best sound quality based on our preferences. We wrote out the three combinations of units - HK vs PI, PI vs AR, and HK vs AR - on three strips of paper, folded them up, and placed them in a basket. Three of us reached into the basket, selected a piece of paper, and put it in our pockets. When it was someone's turn to be moderator, they opened up the paper and were allowed to use only those two units for playback during their test. This way, nobody knew which pieces of equipment were being tested when. The fourth person (who won this spot through rock, paper, scissors) had free rein to use all three units during their testing.
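The basket draw above amounts to a simple random assignment of pairings to moderators. A rough sketch of the same idea (hypothetical names, Python, just to make the protocol concrete - this is not how we actually ran it, we literally used paper and a basket):

```python
import random

# The three unit pairings that went into the basket:
pairings = [("HK", "PI"), ("PI", "AR"), ("HK", "AR")]
moderators = ["mod1", "mod2", "mod3", "mod4"]  # placeholder names

random.shuffle(pairings)    # fold the papers and mix the basket
random.shuffle(moderators)  # who draws is also effectively random

# Three moderators each draw one pairing...
assignment = dict(zip(moderators[:3], pairings))
# ...and the rock-paper-scissors winner may use all three units.
assignment[moderators[3]] = ("HK", "PI", "AR")
```

The point of the draw is that no listener (or moderator, until they open their paper) knows in advance which two units are in play for a given test.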
The procedure for each test is written down on our log sheets, so see attached. If anyone has any questions or if it is not clear enough, feel free to ask. I'd write it all out here again, but then this post would be nearly twice as long, and none of us want that. So just take a look at the attached log sheets, maybe zoom in a bit, and read the procedure. One difference is that we did not have to start on a silent track; as you'll read later, I inserted 3.5 seconds of silence at the beginning of each track, so the mod just had to cue up the track number and press play. The audible difference part consisted of four tests with three listeners and one moderator for each test, and the moderator changed for each test. There were three listening positions - left, middle, and right - and the listeners rotated their seats between tests so everyone got a chance at each seat. Listeners were not allowed to speak of the test or share any impressions at all until all four tests were done. The moderator could not be seen or heard in the other room behind the doors. We all did a few dry runs as both listeners and mods so that everyone was clear on how to conduct the test - it is difficult to explain, but very easy and intuitive in practice.
The songs we chose for the audible difference testing were selected back in January. Each participant chose a couple of songs that they both enjoyed and were confident they knew very well. They then sent me these songs, and I isolated a ~35-second clip from each song that we agreed captured its essence and a range that we felt would be easy to distinguish if audible differences were present. I compiled these clips, in addition to the full songs, on CDs and sent them out to each participant in early February. By doing this, each participant was able to listen to and become very familiar with the songs and exact clips we would be using for this audible difference testing for over three months. Basically, by the time the test finally took place, the participants knew the samples through and through. A note of interest is that I received the HK 635 earlier this week and found that it will mute the first second or so of playback from a digital stream, so at the last minute, I had to pull up the clips again and add 3.5 seconds of silence to the beginning of each clip. In doing this, I eliminated any chance of this oddity tipping us off as to whether the HK was being used. Taking this into consideration, I think we successfully covered all aspects of the test that could have possibly kept it from truly being blind.
Before we get to the results, I just want to make some points clear so we can avoid some of the nastiness that resulted from our last test. Whether you agree or disagree with our results is fine, just don't try to convince us otherwise, as we just spent a 10-hour day testing. We aren't trying to pass off our test as a given fact in every single circumstance for every single person, but our results are fact in this listening room with this equipment with these people. If you disagree with some part of the methodology, that is fine, just politely express it as a logical point and I will address it. If you don't agree with our results, DO NOT try to find imaginary faults within our test to justify yourself.
The raw results from the audible differences test showed that as a group, we were correct 61 times out of 120, or 51% accuracy. To break it down by comparison:
HK vs PI - correct 21 out of 39, or 54% accuracy
PI vs AR - correct 15 out of 36, or 42% accuracy
HK vs AR - correct 19 out of 30, or 63% accuracy
To break that down further, these are the results after removing the trials in which the moderator chose to use the same unit twice in a row. In other words, these results reflect purely the direct comparison of switching from one unit to the other, and because of that, we consider them the most significant.
HK vs PI - correct 18 out of 33, or 55% accuracy
PI vs AR - correct 9 out of 24, or 38% accuracy
HK vs AR - correct 10 out of 18, or 56% accuracy
To examine it a different way, here are the results by person:
Jon - correct 14 out of 30, or 47% accuracy
Gudrun - correct 16 out of 30, or 53% accuracy
Tyler - correct 12 out of 30, or 40% accuracy
Steve - correct 19 out of 30, or 63% accuracy
No combination resulted in 70% or greater accuracy, and no single person achieved greater than 70% accuracy. Because of this, and because we agreed afterwards that it was very difficult to pick out anything to base a decision on, we did not continue on with the sound quality preference testing.
The closest we came to statistically proving there were audible differences was with the HK vs the AR, using the song Arousing Thunder by Grant Lee Buffalo, which has some bass from a drum being struck throughout the clip. As a group, we were correct 12 out of 15 times, or 80% accurate. Tyler had actually taken down a few notes during this test, and on Trials 3 and 5 he jotted that the second playback had heavier or deeper bass - the HK was used for the second playback on both of those trials. Later in the evening, we did a quick test of the HK using its internal amplification vs the AR using the PS Audio amp, and I also noted that the first playback had more punch to the bass - it turned out to be the HK as well. Unfortunately, I don't know how much significance we can draw from only 15 samples on that combination with that song. Our collective score for the HK vs the AR never got higher than 63%. If we had more time, we could have examined this further, but it was already into the night and we needed to refit the baseplate on Jon's kickass subwoofer.
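For anyone who wants to put numbers on the chance-level question, an exact binomial calculation (my own back-of-the-envelope check in Python, not something we ran on the day) shows how close the group total is to pure guessing, and why the 12-of-15 result is hard to weigh on its own:

```python
from math import comb

def binom_p_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of getting k or
    more trials correct purely by guessing (probability p each)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Group total: 61 correct out of 120 trials.
print(binom_p_at_least(61, 120))  # ~0.46, indistinguishable from coin flips

# Best single result: 12 of 15 on one song (HK vs AR).
print(binom_p_at_least(12, 15))   # ~0.018 in isolation
```

The caveat, as noted above, is that the 12-of-15 score was one of many song/pairing combinations examined, so its nominal ~2% chance probability overstates its significance; it would need to be retested on its own to mean much.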
To be honest, the results were pretty surprising to me. Had you asked me prior to a few months ago whether DACs made a difference, I would have said no. But in doing my research for a new receiver purchase, I came upon several first-hand user reviews from this website and others, some from users whose opinions I really respect, saying that different DACs truly do make a difference. So in the last few months, I thought for sure we would be able to identify differences...I guess not. If we were able to measure level-matched outputs of the same clips from two units on a computer screen, we might see that small differences do exist, but in actual practice, they were not readily discernible. Will this test affect my purchasing decision as I claimed it would for months leading up to it? Yes. My HK 635 has a couple of glitches and needs to go back. Since I will be using a Carvin hd1800 to power my mains, this test proves to me I can buy a less expensive receiver and still get the same sound quality from the processing. A Pioneer 1015 might be the ticket.
UPDATE: A Pioneer 1015 was NOT the ticket lol. Yamaha HTR 5890 did the trick.
As a side blind test, one that I have always wanted to do but never got around to, mainly because I haven't drunk a soda in years, we tested Pepsi vs Coke. There was a pretty big difference between the two that we all picked up on: one had a lot more carbonation and a hint of citrus, the other was sweeter and smoother, almost more syrupy tasting. The only problem was that Gudrun and I assumed Coke was the more carbonated soda, so we were incorrect, but it still stands that the difference between the two is quite evident.
Big thanks to Jon for hosting and providing us with a nice spread of food. And Jon, though I said it like 20 times yesterday, that audio rack looks great! I want to get started on mine asap.