Ryan -
MCode has said Harman blind tests their amps.
See my thread on what I consider a well-conducted test. You posted there; did you bother to read my initial post or the Kevin Voecks interview?
The AES test is fascinating because, for what many consider an even more subtle sonic difference (CD sampling rate vs. high frequency), an ABX test shows positive results. Unlike your examples, it is published by a reputable scientific organization, and I believe it has been peer reviewed.
I should conduct an ABX test in my basement and post pictures to the web; you could use it as a reference too.

By the way, I agree with the sighted-tests reference and with the basic validity of blind testing, if it is conducted well.
Since you need an AES membership to read the study I mentioned, here's one summary I found:
- Trained listeners were 9 sound engineers and 4 musicians of age ~28 (SD 5.6 yr)
- LAME used, unknown version. Bitrates 96, 128, 192, 256 and 320 kb/s. Alas, no VBR it seems. Pity.
- Genres: Pop, Metal/Rock, "Contemporary", Classical, Opera. Excerpts under 10 s.
- HQ speaker setup (not headphones). Wonder if it would have made much difference?
- 150 randomized trials (per excerpt? overall? not clear). Pairwise A/B. Testers were asked which version they preferred, and the overall percentage was then tested for statistical significance.
- For 256 and 320 preference was 50/50 (so not significant). For 192, 128 and 96 it was 60/40, 75/25 and 80/20, respectively (significant).
- Sound engineers were more likely than musicians to prefer the higher-quality version. Electric genres (pop/metal) were more often preferred in their HQ version than acoustic ones.
- Problems cited, in decreasing order of frequency: high-frequency artefacts, general distortion, transient artefacts, stereo image, dynamic range, reverb, background noise.
- No correlation between listening habits and performance.
- The conclusion states that trained listeners cannot discriminate between CD quality and mp3 compression at 256-320 kb/s, while expert listeners could. Not sure who these "expert listeners" are supposed to be, but probably the test subjects from a referenced paper (Sutherland 2007?) who reportedly could do so even at 320.
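For anyone wondering what "significant" means for those preference splits: it boils down to asking whether the count is far enough from a coin flip (50/50). Here's a minimal Python sketch, assuming 150 independent A/B trials per bitrate (the summary isn't clear on this) and an exact two-sided binomial test at alpha = 0.05. The paper may well have used a different test; the function name is just mine for illustration.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k 'HQ preferred' out of n trials vs. chance (p = 0.5).

    Adds up the probability of every outcome at least as far from n/2 as k.
    """
    center = n / 2
    distance = abs(k - center)
    # Under the null hypothesis each trial is a fair coin flip, so every
    # outcome i has probability comb(n, i) / 2**n. Summing integers first
    # keeps the arithmetic exact until the final division.
    extreme = sum(comb(n, i) for i in range(n + 1) if abs(i - center) >= distance)
    return extreme / 2**n


# Assumed: 150 trials per bitrate, percentages taken from the summary above.
N_TRIALS = 150
for bitrate, pct in [(320, 50), (256, 50), (192, 60), (128, 75), (96, 80)]:
    k = N_TRIALS * pct // 100  # number of trials where the HQ version won
    p = binomial_two_sided_p(k, N_TRIALS)
    print(f"{bitrate} kb/s: {pct}% preferred HQ, p = {p:.3g}, significant: {p < 0.05}")
```

Note that the trial count matters a lot: a 60/40 split is significant over 150 trials, but the same split over only 50 trials would not be.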