I've done this test over and over and over using far less headroom than you maniacs are discussing as being a requirement.
I've also done it using a limitless headroom system, but that's irrelevant to the test. You aren't testing loudness capability and you don't need to pump the SW level to stupid high levels to get an answer from listeners. In fact, starting with very high levels taints the test, and running the sub hot will also skew any listening test. Those are personal preferences that are satisfied by scaling the system up and are detrimental to the test results.
This one is such a given to me because it has never failed to impress listeners as being noticeable, subjective preferences being irrelevant.
But, if you don't normalize the 2 choices, you are wasting your time.
Not singling out Arch, because it's a recurring scene in other comparos, but since he posted the graph, I'll use it. There is no chance you can even evaluate the OS as a subwoofer on its own with a FR like the one posted here. This response could be from any number of subs and it pretty much wouldn't matter because the focus of any listener would be on the huge peak (or the huge hole, depending on how you see the response and calibrate that response into the system).
In my most recent comparo of 20 Hz ported vs full bandwidth, you can clearly see the ported sub's roll-off has none of the weird false bumps/dips that most other graphs show. The roll-off profiles of the 2 subs are obviously going to be different, but the room's influence should affect them both identically. If it doesn't, something has been done incorrectly with placement, calibration or whatever. With the mic in the same position and the subs in the same position, the room influence does not change. Matching the levels is then imperative. Some people can't tell a 3 dB difference, but most people can detect a difference even smaller than that, which can skew the comparison.
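As a rough illustration of the level-matching step (a sketch only, not the procedure used in these tests), the trim needed to match two subs can be estimated from two sweeps taken with the same mic position and the same sub position. The function names, the 30 to 80 Hz matching band, and the sample numbers below are all assumptions made for the example; the idea is simply to compare average level in a band where both subs are above their roll-off.

```python
import math

def band_average_db(freqs_hz, levels_db, lo_hz, hi_hz):
    """Power-average the SPL values (in dB) that fall inside [lo_hz, hi_hz]."""
    picked = [db for f, db in zip(freqs_hz, levels_db) if lo_hz <= f <= hi_hz]
    if not picked:
        raise ValueError("no measurement points inside the band")
    mean_power = sum(10 ** (db / 10) for db in picked) / len(picked)
    return 10 * math.log10(mean_power)

def trim_to_match(freqs_hz, sub_a_db, sub_b_db, lo_hz=30.0, hi_hz=80.0):
    """Gain in dB to add to sub B so its band average equals sub A's.

    The 30-80 Hz default band is an assumption: it should sit above the
    bandwidth-limited sub's roll-off and below the crossover.
    """
    avg_a = band_average_db(freqs_hz, sub_a_db, lo_hz, hi_hz)
    avg_b = band_average_db(freqs_hz, sub_b_db, lo_hz, hi_hz)
    return avg_a - avg_b

# Hypothetical sweep data, same mic and sub position for both subs.
freqs = [10, 20, 30, 40, 50, 63, 80]
full_bandwidth = [85.0, 85.5, 85.0, 84.5, 85.0, 85.5, 85.0]
ported = [62.0, 80.0, 83.0, 82.5, 83.0, 83.5, 83.0]

print(f"Trim for ported sub: {trim_to_match(freqs, full_bandwidth, ported):+.2f} dB")
```

Averaging only inside the shared passband keeps the ported sub's roll-off below tuning from dragging the match point down, so the two subs end up at the same level where they overlap and differ only in extension.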
There is a time lapse between the two halves of the comparison because the switch can't be accomplished baddah-bing, but it does not matter. You will get a 100% result, every time. It does not matter if you are on a slab or in the attic or if the room is 6k cubes or 2k cubes. There is a difference and it will be noticed and verified by every listener.
Having scaled and overlaid the trace Arch posted, I can say that if I had a response like that, I would not proceed with the test (or any sort of test) because it would yield results that are pretty much meaningless and boil down to how loud a sub can play from 40 Hz to cross.
With all else matched, there is zero question as to whether or not you will experience a difference. Zero. And, you do not need 120dB levels to get the result.
When you take a system like the one in the graph that shows flat-to-4 Hz and start injecting various HPF and/or boost signal shaping, you aren't comparing a resonant, bandwidth-limited sub to a full bandwidth sub. You're then comparing optional presentations from the same system. This is the whole premise of my system and always has been. Having a system that adapts to its owner's preferences for playback levels, source, extension or whatever, is the best approach. I have no qualms with anyone's preference. But, I do think that once you decide to get serious about low end you should have the option to select the native response that works best for you with no compromise.
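To make the "optional presentations from the same system" point concrete, here is a minimal sketch of what a high-pass filter does to a flat native response. It assumes an ideal Butterworth HPF, whose magnitude is 1/sqrt(1 + (fc/f)^(2n)); the actual filter types, corners and boost shaping used in any given system are not stated here, so the corner frequencies, order, and function names are illustrative assumptions only.

```python
import math

def butterworth_hpf_db(freq_hz, corner_hz, order):
    """Magnitude in dB of an ideal Butterworth high-pass at freq_hz."""
    ratio = (corner_hz / freq_hz) ** (2 * order)
    return -10 * math.log10(1 + ratio)

def shaped_response(freqs_hz, native_db, corner_hz, order=2):
    """Apply the HPF magnitude to a measured (or assumed) native response."""
    return [
        db + butterworth_hpf_db(f, corner_hz, order)
        for f, db in zip(freqs_hz, native_db)
    ]

# Assumed native response: flat (0 dB relative) from 4 Hz up.
freqs = [4, 5, 10, 20, 40, 80]
native = [0.0] * len(freqs)

# Two hypothetical "presentations" of the same system.
for corner in (5, 10):
    shaped = shaped_response(freqs, native, corner_hz=corner)
    row = ", ".join(f"{f} Hz: {db:+.1f} dB" for f, db in zip(freqs, shaped))
    print(f"HPF at {corner} Hz -> {row}")
```

Both outputs come from the same drivers, placement and room, which is why this kind of comparison isolates extension rather than pitting two different subs against each other.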
I've done tests using the same system/placement/calibration and different HPF/boost settings, and the difference between 10 Hz and 5 Hz is not nearly as easy to appreciate and leaves a narrower choice of source material to conduct the test with. I'll say here that Black Hawk Down's "Irene" scene is not a good test source either way. It is the best scene I know of for comparing a digital SpecLab graph to a mic'd SpecLab graph for sub playback accuracy and even-order distortion in-room, but steady state (especially playing sine waves) is not the way to go here.
Finally, on the idea of putting a percentage on what scenes have this content, I would like to state the obvious: The Master List of Movies With Bass thread lists over 1,000 titles. Virtually all of them have ULF content. Virtually none of that content is unintentional artifact. I personally have nowhere near 1,000 titles in my library, let alone 1,000 titles with great low end soundtracks. I may prefer to watch a movie from the 40s that will have nothing in the sub range at all. I may have friends over and let them watch full tilt low end blockbusters all night long. I may spend a whole night listening only to music. Point is, all of that has absolutely nothing to do with the discussion.
The Q is: is the content there, was it intentionally put there and does it make a difference?
After the research and personal interest tests I've done myself since the release on DVD of LOTR FOTR in 2002, my answer is yes, yes and yes.