I performed some SBTs a few years ago… I got around the level-matching challenge by approaching it from the opposite angle. Since I couldn't level match Component A and B, every iteration was scripted to have a completely different combination of volume, treble, and bass. There were 20 iterations, and for each one the listener had to write down A or B. That way, the two components were competing not just with each other but with themselves as well. For the test subject I employed the wife, whose bias is to save as much money as possible (to make available for her hobby, designer fashion!). For comparison's sake, we then reversed roles: I was the listener and she made up a different script.
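A script like that could be generated rather than written by hand. Below is a minimal Python sketch of the idea, not the script actually used in these tests. The function name `make_script` and the volume and tone step ranges are hypothetical placeholders, so swap in whatever your own amp's controls offer.

```python
import random

def make_script(n_iterations=20, seed=None):
    """Build a listening script where every iteration has a different
    volume/treble/bass combination and a randomly chosen component."""
    rng = random.Random(seed)

    # Hypothetical control ranges, purely for illustration.
    volume_steps = range(8, 13)   # e.g. positions on a volume dial
    tone_steps = range(-3, 4)     # e.g. treble/bass from -3 to +3

    # Every possible (volume, treble, bass) combination.
    combos = [(v, t, b)
              for v in volume_steps
              for t in tone_steps
              for b in tone_steps]

    # Sampling without replacement guarantees no combination repeats.
    settings = rng.sample(combos, n_iterations)

    script = []
    for i, (volume, treble, bass) in enumerate(settings, start=1):
        script.append({
            "iteration": i,
            "component": rng.choice("AB"),  # which component plays
            "volume": volume,
            "treble": treble,
            "bass": bass,
        })
    return script

if __name__ == "__main__":
    for step in make_script(seed=1):
        print(step)
```

The operator keeps the printed script; the listener only ever sees a blank sheet numbered 1 to 20.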
The idea was that there would have to be a really major difference between the components for the listener to escape a 50% guessing result.
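To put a number on "escaping a 50% guessing result", one option is to work out how likely a given score is from coin-flip guessing alone. This is a sketch under the assumption that each of the 20 answers is simply marked right or wrong; the post doesn't say how the sheets were scored, and `p_at_least` is a made-up helper name.

```python
from math import comb

def p_at_least(correct, trials=20):
    """Chance of getting `correct` or more right out of `trials`
    when every answer is a 50/50 guess (one-sided binomial tail)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

if __name__ == "__main__":
    for correct in (10, 12, 15, 18):
        print(f"{correct}/20 or better by guessing: {p_at_least(correct):.4f}")
```

For example, 15 or more correct out of 20 comes up by pure guessing only about 2% of the time (21700 of the 1048576 possible answer sheets), so a score like that suggests the listener is hearing something real.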
My setup at the time was ideal, because the audio stack was at the opposite end of the room and completely invisible to the listener. Whenever the script called for the same amp on consecutive iterations, I still pulled out and reinserted the banana plugs, going through the same motions so that the listener heard the same background sounds between iterations.
I suggest this approach to those who are curious. I'm sure it loses out in ultimate accuracy to scientific double-blind testing, but it gains in relevancy because it uses your equipment, in your acoustical space, with subjects who care about music as you do. It's most satisfying to come to your conclusions based on your own experience rather than swallowing Kool-Aid from either camp.