Mcnarus says,
“But that's the whole point, Amir. If they had preconceived notions, that would bias the results. If they didn't, there's no bias, and no reason to reject the results of the test.”
I said they did not have preconceived notions about the numerical value of jitter. Taking that away does nothing to remove the bias. What if there was no audible distortion there? What if there was distortion at every level? This is why, in blind tests, we don’t put the user in charge of selecting the parameter being tested.
Imagine this test for a new drug: they give patients 10 bottles of a drug with the active ingredient ranging from 1 to 10 milligrams, in 1-milligram steps. They then ask them to self-medicate, increasing the dosage until they feel better. Would you believe any outcome from this trial? I hope not. That is why we use a placebo in some trials and run them double blind. In the absence of that, subjects may very well assume “more is better” and give us false outcomes.
What if I repeated the Dolby test at home and came and told you that I could hear jitter at 0.01 nanoseconds? Would you believe me? What would stop me from imagining it? And what control do you have in place to catch that?
So no, we can’t give control of the test to the user and call the test blind. The tester is in charge of the parameter being tested, and hence, by definition, it is a sighted test.
I have done tests like this and know this firsthand. We were developing a new version of our audio encoder at Microsoft and I spent days at home optimizing a few of the parameters. I had no idea what the parameters did. I only thought that if I changed them, the sound would get better or worse. So I kept playing with them until I found what I thought were optimal values. I handed them back to my team and they produced a set of before/after blind tests. You know what? I could not hear any difference! Here I thought I was hearing every incremental change, but in the end it was all for nothing. My team explained that while I was optimizing the values with fractional numbers, the program never used the fractions! So objectively I was wrong too. Darn it.
“Blinding is simply a tool you use to control for certain forms of bias. What uncontrolled bias exists in the test you're talking about, Amir, and how would you go about controlling for it?”
I thought Bigus provided a method for this. But here is more. The first thing to do is take control of the knob away from the tester and give it to a proctor hidden from the tester. This is what audiologists do to test your hearing, and what optometrists do to find out what lenses you need for your eyeglasses. The subject is asked questions that he answers without knowing what is really going on. He doesn’t know if a number is going up or down. He simply answers questions.
To reduce the time it takes to run the test, a binary search could be used. The proctor would start with “zero” jitter and ask if it is audible compared to a reference. If the answer is no, then you turn the dial to max, say, 200 nanoseconds, and ask again. If the subject still doesn’t hear it, then you are done with that tester, or with the test in general if others can’t hear it either. Assuming they did hear the max, you then go down to the middle, which in this example would be 100. If they still hear it, you jump to 50 nanoseconds. Then 25, then 12.5, and so on. Once you get a negative response, you reverse direction and split the distance by two again. Eventually you wind up with a number.
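Here is a rough sketch of that proctor-driven search in Python, just to make the halving concrete. The function name, the 200 ns maximum, the 1 ns stopping resolution, and the `hears_difference` callback (standing in for the proctor setting the hidden knob and asking the subject whether it differs from the reference) are all my own illustrative assumptions, not a standard procedure. Raising an error when the subject “hears” zero jitter is also an assumption about how to treat a failed control.

```python
from typing import Callable, Optional


def find_jitter_threshold(
    hears_difference: Callable[[float], bool],
    max_ns: float = 200.0,
    resolution_ns: float = 1.0,
) -> Optional[float]:
    """Binary search (bisection) for the smallest jitter level, in ns,
    that the subject reports as different from the reference.

    hears_difference(level) is a hypothetical stand-in for the hidden
    proctor setting the knob to `level` and asking the subject.
    Returns None if even the maximum level is not heard.
    """
    # Control trial: zero jitter should sound identical to the reference.
    if hears_difference(0.0):
        raise ValueError("difference reported at zero jitter; responses unreliable")

    # If the maximum is inaudible, we are done with this subject.
    if not hears_difference(max_ns):
        return None

    lo, hi = 0.0, max_ns  # lo: last level NOT heard, hi: last level heard
    while hi - lo > resolution_ns:
        mid = (lo + hi) / 2
        if hears_difference(mid):
            hi = mid  # still audible: keep going down
        else:
            lo = mid  # not audible: reverse direction, halve the remaining gap
    return hi
```

With a simulated subject who hears anything at or above 30 ns, `find_jitter_threshold(lambda ns: ns >= 30.0)` visits 100, 50, 25, 37.5, 31.25, 28.125, 29.6875 and 30.46875, and returns 30.46875. A real listener is not that consistent, which is exactly why the search alone is not the end of the test.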
But you are not done yet. You should repeat the test to see if the data is reliable. Assuming it is, you then take that value and run randomized tests against the reference, just as my team did with me. If the subject can’t reliably hear the difference, then that invalidates the results for this tester. And if no one can hear it, then you give up and declare that such tests are harder than they look.
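One way to put a number on “reliably” in that randomized check is a simple binomial test against guessing. The sketch below assumes 16 trials, a 50/50 hidden coin flip by the proctor between the reference and the threshold-level jitter, and p &lt; 0.05 as the pass mark; none of those choices come from the test described above, and `identifies_jitter` is a hypothetical stand-in for the subject's answer.

```python
import math
import random
from typing import Callable, Optional, Tuple


def binomial_p_value(correct: int, trials: int, p_chance: float = 0.5) -> float:
    """One-sided probability of getting `correct` or more right by pure guessing."""
    return sum(
        math.comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def verify_threshold(
    identifies_jitter: Callable[[bool], bool],
    trials: int = 16,
    alpha: float = 0.05,
    rng: Optional[random.Random] = None,
) -> Tuple[int, float, bool]:
    """Randomized blind check of a threshold found by the search.

    identifies_jitter(is_jittered) is a hypothetical stand-in for playing
    either the reference (False) or the threshold-level jitter (True) and
    returning the subject's answer to "is this the jittered one?".
    Returns (number correct, p-value, passed).
    """
    if rng is None:
        rng = random.Random()
    correct = 0
    for _ in range(trials):
        is_jittered = rng.random() < 0.5  # proctor's hidden coin flip
        if identifies_jitter(is_jittered) == is_jittered:
            correct += 1
    p = binomial_p_value(correct, trials)
    return correct, p, p < alpha
```

Under these assumptions, 12 correct out of 16 gives p of about 0.038 and passes, while 11 out of 16 gives about 0.105 and does not. A subject who truly heard the difference at every step of the search should clear that bar easily; one who was imagining it will hover around 8 out of 16.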