Lower-end speakers (like the kind found in HTiB systems) often lack midrange clarity, so you have to turn them up louder to hear what people are saying. A great deal of the sound related to speech intelligibility sits in the 1-5 kHz region. This also happens to be where the crossover is located in most 2-way speakers (or MTMs). Properly designed crossovers can increase cost substantially, so speaker manufacturers cut corners here a lot. You will rarely find more than a couple of capacitors, a resistor, and maybe an inductor. Sometimes they just run the mid without a low-pass, which can result in a good deal of distortion in the speech frequencies. When you open up your speaker and find just a capacitor on the tweeter, you know you don't have a well-designed crossover.
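To put a rough number on the single-capacitor case: one capacitor in series with the tweeter acts as a first-order high-pass, which only rolls off at about 6 dB per octave. Here is a minimal sketch of the textbook formula, assuming the tweeter behaves like a plain resistor (real drivers don't, so treat the result as a ballpark). The 4.7 uF and 8 ohm values and the function names are just illustrative assumptions, not measurements from any particular speaker.

```python
import math

def first_order_highpass_hz(capacitance_farads, load_ohms):
    """-3 dB corner of a single series capacitor feeding a resistive load."""
    return 1.0 / (2.0 * math.pi * load_ohms * capacitance_farads)

def rolloff_db(freq_hz, corner_hz):
    """Level of a first-order high-pass at freq_hz, in dB (negative = quieter)."""
    ratio = freq_hz / corner_hz
    return 20.0 * math.log10(ratio / math.sqrt(1.0 + ratio * ratio))

# Assumed example values: 4.7 uF capacitor, nominal 8 ohm tweeter.
corner = first_order_highpass_hz(4.7e-6, 8.0)
print(f"corner frequency: {corner:.0f} Hz")
print(f"level one octave below corner: {rolloff_db(corner / 2, corner):.1f} dB")
```

With those assumed values the corner lands a little above 4 kHz, and one octave below it the tweeter is only down about 7 dB, so it is still getting a fair amount of signal in the speech band.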
A properly designed 3-way speaker can greatly improve speech intelligibility, but the cost can rise substantially, especially in crossover parts.
...So, it boils down to this: You have to turn it up to make out what they are saying. Not necessarily because it is too quiet, but probably because it's just harder to understand. Then when an action scene comes along, the volume is too high and, well, you know what happens from there.
There is little you can do if it is an issue with the speaker design. Sure, you can calibrate everything so the levels are equal. Or you can crank the center channel up so voices are louder. But when the explosion is in the center channel, that's not going to help you out much.
Eventually you just have to upgrade... You'll find that with a higher end speaker you will be able to make out voices you didn't hear before and actually be able to listen at a lower volume.
Maybe I should be a salesman.