I've only skimmed this so far but I can't pinpoint the authors of the article. Is it a culmination of efforts from the ITU, or is there a single author? I ask because you linked it and I suspect you would have a fair level of understanding of this article's source and credibility.
It's not an article; it's a standard, drafted under the aegis of the ITU, which is a pretty high-powered international professional association. (I believe the immediate purpose was to set up rules of the road for testing lossy codecs, so nobody could do a phony test and claim their version of MP3 sounded better.)
Rather than going through an exhaustive procedure with many steps, is there a way that we can agree on something more simple so EVERYONE will want to participate?
Who wants to participate? There are people who understand and accept what's been found in the past, and people who really do not want to know. I'm not sure there's a third category.
For what it's worth, it's not a question of steps, but of conditions. I'd say the basic conditions of any reliable listening test would be:
1) Double-blind, meaning that neither the subject(s) nor anyone coming in contact with the subject(s) during the test knows which unit is which.
2) Level-matching to within 0.1 dB, which requires a voltmeter at the speaker terminals, preferably checking multiple frequencies.
3) For tests of source units, time-synching of the units.
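To put a number on condition 2, here is a rough Python sketch of the level-matching check. The voltage readings, the test frequencies, and the function names are all made up for illustration; the only things taken from the list above are the 0.1 dB tolerance and the idea of checking several frequencies. It assumes RMS voltage readings taken with the same test tone on each unit.

```python
import math

TOLERANCE_DB = 0.1  # matching tolerance from condition 2


def level_difference_db(volts_a, volts_b):
    """Level difference in dB between two RMS voltage readings."""
    return 20.0 * math.log10(volts_a / volts_b)


def levels_matched(readings, tolerance_db=TOLERANCE_DB):
    """True only if every test frequency is within the tolerance."""
    return all(
        abs(level_difference_db(a, b)) <= tolerance_db
        for a, b in readings.values()
    )


# Hypothetical readings: frequency in Hz -> (volts from unit A, volts from unit B)
example_readings = {
    100: (2.000, 2.010),
    1000: (2.000, 2.015),
    10000: (2.000, 2.030),
}

for freq_hz, (volts_a, volts_b) in sorted(example_readings.items()):
    diff = level_difference_db(volts_b, volts_a)
    print(f"{freq_hz:>6} Hz: {diff:+.3f} dB")

print("matched:", levels_matched(example_readings))
```

For scale, 0.1 dB works out to a voltage difference of only about 1.2%, which is why it takes a meter and not your ears to set it. In the made-up readings above, the 10 kHz pair is off by about 1.5%, so the units would not count as matched.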
To maximize the chances of getting a positive result, I'd add three further requirements:
4) The switch is controlled by the test subject.
5) The switch between presentations is instantaneous.
6) There must be enough trials and enough correct answers to achieve statistical significance.
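As for what "enough trials and enough correct answers" looks like in practice, here is a minimal sketch using a one-sided binomial test against pure guessing. The 50% guessing rate (a two-choice trial such as ABX), the 0.05 threshold, and the function names are my assumptions for illustration, not something the list above specifies.

```python
from math import comb


def p_value(correct, trials, chance=0.5):
    """Probability of getting at least `correct` right by guessing alone."""
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def min_correct(trials, alpha=0.05, chance=0.5):
    """Smallest score whose p-value is at or below alpha, or None if unreachable."""
    for correct in range(trials + 1):
        if p_value(correct, trials, chance) <= alpha:
            return correct
    return None


for n in (10, 16, 20):
    needed = min_correct(n)
    print(f"{n} trials: need {needed} correct (p = {p_value(needed, n):.4f})")
```

Under those assumptions, 12 correct out of 16 gives p of about 0.038, which clears the 0.05 bar, while 11 out of 16 gives about 0.105, which does not. A short run leaves very little room for error: with 10 trials you would need 9 right.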