When I bought a mid-range (analog) HiFi around 1982, I was told to pay attention to two things: frequency range (e.g., 20-20,000 Hz, +/- 3 dB, though I admit the speakers I bought only went up to about 16,000 Hz) and THD, total harmonic distortion (I was told to stay below 0.01%).
Virtually any modern AVR supposedly meets the 20-20,000 Hz criterion - if anything, I suppose the higher frequency capabilities and digitization might introduce digital aliasing effects.
I hadn't yet heard of inharmonic distortions, such as those frequently produced by intermodulation effects.
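Intermodulation is easy to demonstrate numerically. A minimal sketch (the tone frequencies and the weak cubic nonlinearity are illustrative assumptions, not a model of any real amplifier): passing two tones through a nonlinearity produces sum-and-difference products that are not harmonics of either tone - i.e., inharmonic components.

```python
import numpy as np

fs = 48_000                       # sample rate, Hz
t = np.arange(fs) / fs            # one second of samples
f1, f2 = 1_000.0, 1_300.0         # two test tones, Hz
x = np.sin(2*np.pi*f1*t) + np.sin(2*np.pi*f2*t)
y = x + 0.1 * x**3                # weakly nonlinear "amplifier"

# Spectrum of the output; with exactly 1 s of data, bin k = k Hz.
spectrum = np.abs(np.fft.rfft(y)) / len(y)
freqs = np.fft.rfftfreq(len(y), 1/fs)

# The eight strongest components: the two input tones, their third
# harmonics, and four intermodulation products (e.g., 2*f1-f2 = 700 Hz,
# 2*f2-f1 = 1600 Hz) that are harmonics of neither input.
peaks = sorted(float(f) for f in freqs[np.argsort(spectrum)[-8:]])
print(peaks)
```

The 700 Hz and 1600 Hz components illustrate the point: they are inharmonic relative to both inputs, so they would not exist in a perfectly linear system.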
I had heard of phase distortions - though of course phase distortion is only one kind of timing distortion.
Many modern consumer-market amplifiers and AVRs have THD in the 1% range, though when I bought my (used) AVR, I decided to keep THD fairly low - after all, fairly old used AV equipment is pretty cheap. I got a Yamaha RX-V863. Here are the specs from the owner's manual:
I'm not sure about its inharmonic and intermodulation distortion figures. For that matter, I understand that THD is often measured at only one frequency - sometimes the frequency at which the amplifier or AVR performs best, and at the volume level at which it performs best.
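For reference, a single-frequency THD number like the ones in a spec sheet is the RMS sum of the harmonic amplitudes divided by the fundamental's amplitude, at one test frequency (commonly 1 kHz). A sketch on a synthetic signal with made-up distortion levels, not a measurement of the RX-V863:

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs
f0 = 1_000                        # test frequency, Hz (spec sheets often use 1 kHz)
clean = np.sin(2*np.pi*f0*t)
# Made-up 2nd and 3rd harmonic content standing in for amplifier distortion:
distorted = clean + 0.01*np.sin(2*np.pi*2*f0*t) + 0.005*np.sin(2*np.pi*3*f0*t)

# With exactly 1 s of data, FFT bin index k corresponds to k Hz.
spectrum = np.abs(np.fft.rfft(distorted)) / len(distorted)
fundamental = spectrum[f0]
harmonics = spectrum[[2*f0, 3*f0, 4*f0, 5*f0]]
thd = np.sqrt(np.sum(harmonics**2)) / fundamental
print(f"THD = {100*thd:.2f} %")   # -> THD = 1.12 %
```

Note that this says nothing about behavior at other frequencies or power levels, which is exactly the limitation of a one-number spec.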
An interesting question is whether any of these distortions - including the roughly 1% THD of many consumer-market AVRs - significantly affects dialog comprehension in TV shows and movies, and whether an even better AVR would have improved my dialog comprehension.
Many sources say most of the sounds that affect phoneme (and microphoneme) discrimination in American English lie roughly in the 40-4,000 Hz range (with some variation in the stated range). E.g., see
Generating and understanding speech | Ecophon
Sounds of Speech | MESHGuides (www.meshguides.org)
Some sources say people create varying amounts of those frequencies by altering the shapes of their throats, mouths, and lips. (And in fact, different people do this in somewhat different ways - as they must, since the typical fundamental frequency ranges of their voices differ, and the shapes and acoustics of their vocal tracts differ.)
Many modern speakers don't reach 20-20,000 Hz - but many do cover the 40-4,000 (8,000?) Hz range, so maybe that doesn't matter to dialog comprehension. I confess I bought my speakers as an afterthought, and they aren't all that great - just the speakers that came with an even older Onkyo home theater package (Front: SKF-540F, Center: SKC-540C, Surround: SKM-540S, Surround back: SKB-540, Powered subwoofer: SKW-540 - which I don't use). They don't reproduce the lowest frequencies. Here are the specs for a somewhat similar system:
Note that the center channel speakers only go down to 55 Hz. I did not find distortion figures for them.
The (used) headphones I bought are somewhat better: Sennheiser HDR 175, rated at 17-22,000 Hz, but they still have fairly high distortion figures: THD < 0.5% at 1 kHz, 100 dB SPL. I am using them instead of the speakers, both because I share a home with other people and because they sound better than those speakers.
What I'm really wondering is whether any of these distortions could produce sufficiently large contributions in the 40-4,000 (or 8,000?) Hz range to affect comprehension.
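At least for harmonic products, simple arithmetic suggests the answer is plausibly yes: low-order harmonics of typical speech frequencies land squarely inside a nominal 40-4,000 Hz band, so distortion products are not conveniently shoved out of the speech range. A small sketch (the band edges and test frequencies are illustrative assumptions):

```python
# Which low-order harmonic-distortion products of a given frequency
# land inside a nominal 40-4,000 Hz "speech band"?
SPEECH_BAND = (40.0, 4000.0)   # assumed band edges, Hz

def in_band_harmonics(f0: float, max_order: int = 5) -> list[float]:
    """Harmonics 2*f0 .. max_order*f0 that fall inside SPEECH_BAND."""
    lo, hi = SPEECH_BAND
    return [n * f0 for n in range(2, max_order + 1) if lo <= n * f0 <= hi]

# Rough male fundamental, rough female fundamental, a formant-region tone:
for f0 in (120.0, 220.0, 1000.0):
    print(f0, "Hz ->", in_band_harmonics(f0))
```

Every distortion product printed lands inside the band, so whether they matter becomes a question of level and audibility, not of frequency placement.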
In addition, timing distortions in a typical home theater system might be significant, since the different speakers and speaker cones are in different locations. That means different sound components arrive at your ears at different times, which might make it hard to understand many consonants. Perhaps many phonemes and microphonemes arrive mixed together, as in a room with substantial echoes. Maybe that is part of why I understand dialog better with my headphones than with my speakers...
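The path-length part of this is easy to quantify. Assuming sound travels at roughly 343 m/s at room temperature, a sketch with made-up speaker distances (note that many AVRs can apply per-channel delay to compensate for exactly this during speaker setup):

```python
# Extra arrival time at the listener for a farther speaker,
# relative to a nearer one. Distances are made-up illustrative values.
SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

def arrival_delay_ms(near_m: float, far_m: float) -> float:
    """Extra travel time (ms) for sound from the farther speaker."""
    return (far_m - near_m) / SPEED_OF_SOUND * 1000.0

# Center speaker at 3.0 m, a surround at 4.5 m from the listener:
print(f"{arrival_delay_ms(3.0, 4.5):.2f} ms")  # -> 4.37 ms
```

A few milliseconds of skew is in the range where overlapping sounds start to smear rather than fuse cleanly, which is at least consistent with the echo-like muddling described above - though whether it measurably hurts consonant recognition is exactly the open question.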
So - are distortion figures like THD, inharmonic distortion, and IMD, along with timing distortions, important to dialog comprehension? Is it possible that people buying consumer-market home theater equipment - not just me - sometimes lose dialog comprehension partly because of that?
(Of course, the restriction of dialog to the center sound channel must create another comprehension problem. Many sources say that most people obtain a lot of information for dialog comprehension in normal human conversation by lip reading - i.e., looking at the mouths and lips of the speakers. We also obtain some information by watching their bodies for gestures. They talk about "gaze behavior" - that we look at the person speaking - which means that in the ideal case, the sound should appear to come from the location of the speaker on the screen (not from somewhere else in real 3D space). But I guess we can't do much about that - that's the way TV shows and movies are made, and even if some of the dialog occurs in other speaker channels, it is unlikely to be mapped to the screen positions of the speakers' mouths and bodies - so a lot of that information must sometimes be lost. Is there anything we could do about that?)