A few ideas you may want to try (sorry if you didn't want suggestions):
If you are having trouble with voice intelligibility, try removing the center channel from the mix on all other speakers. This will prevent comb filtering between the center speaker and the others, which is more of a problem for voice than for environmental sounds. It will also anchor the center channel to the screen rather than making it a seamless part of your 360-degree sound, but that's usually a good thing. Of course, you can do anything in between by simply reducing the level of the center channel on the other speakers.
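To put numbers on the comb filtering: when the same center signal reaches you directly from the center speaker and again from another speaker that is farther away, the two copies cancel at every frequency where they arrive half a cycle apart, i.e. at f = (2k + 1) / (2 × delay). The sketch below is a simplified single-point, free-field model (no room reflections, no speaker response) assuming a speed of sound of 343 m/s; the function names and the `bleed_gain` parameter are hypothetical, chosen for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 °C


def comb_response_db(freq_hz, path_diff_m, bleed_gain):
    """Level in dB (relative to the center speaker alone) at one listening
    point when the center signal also plays from another speaker whose path
    is path_diff_m longer, scaled by bleed_gain (0.0 = removed from it).
    Simplified model: free field, one extra arrival, ideal speakers."""
    delay_s = path_diff_m / SPEED_OF_SOUND_M_S
    phase = 2 * math.pi * freq_hz * delay_s
    # Direct arrival (gain 1) plus delayed copy: |1 + g * e^(-j * phase)|
    real = 1.0 + bleed_gain * math.cos(phase)
    imag = -bleed_gain * math.sin(phase)
    magnitude = math.hypot(real, imag)
    return 20 * math.log10(magnitude) if magnitude > 0 else float("-inf")


def notch_frequencies(path_diff_m, max_hz=4000.0):
    """Frequencies where the two arrivals are half a cycle apart and cancel:
    f = (2k + 1) / (2 * delay), for k = 0, 1, 2, ..."""
    if path_diff_m <= 0:
        raise ValueError("path_diff_m must be positive")
    delay_s = path_diff_m / SPEED_OF_SOUND_M_S
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * delay_s)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


# Example: the extra copy comes from a speaker 1 m farther away.
print([round(f, 1) for f in notch_frequencies(1.0, max_hz=1000.0)])
# [171.5, 514.5, 857.5]
```

With a 1 m path difference the notches fall every 343 Hz starting at 171.5 Hz, right through the range that matters for speech, and they shift from seat to seat as the path difference changes. Turning the bleed down instead of off makes the notches shallower rather than removing them: `comb_response_db(171.5, 1.0, 0.5)` is about -6 dB, versus near-total cancellation at `bleed_gain=1.0` and a flat 0 dB at `bleed_gain=0.0`.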
If you have the ability to time align each speaker (after the matrix mix, not before), here are a few strategies:
1) If you are primarily concerned with the main listening position, time align each speaker to that position. I suspect this is the case, based on your diagram.
2) If you have a large room and multiple rows of seating that you care about, try time aligning to the back of the room and putting lots of acoustical treatment on the back wall. This creates a time-aligned wavefront that moves from the front to the back of the room, and the treatment keeps the back wall from reflecting it back toward the seats. This is closer to the approach used in outdoor stadiums.
3) Large theaters use an approach similar to #2, but rather than relying on the angle of a single pair of side speakers to cover every row, they install multiple sets of side speakers with controlled dispersion, so each row of seats hears the side channels coming from beside that row.
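The delay arithmetic behind strategies 1 and 2 can be sketched as follows: measure each speaker's distance to the chosen target point, leave the farthest speaker undelayed, and delay every nearer speaker by the extra travel time, (farthest - distance) / speed of sound. Because the delay is per speaker, it is applied to each output after the matrix mix. This is a minimal sketch assuming 343 m/s and a hypothetical 2D layout in meters; it only accounts for direct-path distance, not speaker or DSP latency or acoustic-center offsets, so a measurement microphone is still the final check. `alignment_delays_ms` is a hypothetical helper name.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 °C


def alignment_delays_ms(speakers, target):
    """Per-speaker delay (ms) so every speaker's direct sound reaches
    `target` at the same moment. The farthest speaker gets 0 ms and
    nearer speakers wait for it.
    speakers: {name: (x, y)} in meters; target: (x, y) in meters."""
    distances = {name: math.dist(pos, target) for name, pos in speakers.items()}
    farthest = max(distances.values())
    return {
        name: (farthest - d) / SPEED_OF_SOUND_M_S * 1000.0
        for name, d in distances.items()
    }


# Hypothetical layout: screen wall at y = 0, seating toward +y.
speakers = {
    "left": (-1.5, 0.0),
    "center": (0.0, 0.0),
    "right": (1.5, 0.0),
    "side_left": (-2.5, 3.0),
    "side_right": (2.5, 3.0),
}

main_seat = (0.0, 3.0)  # strategy 1: align to the main listening position
back_row = (0.0, 6.0)   # strategy 2: align to a point at the back row

for label, point in (("main seat", main_seat), ("back row", back_row)):
    delays = alignment_delays_ms(speakers, point)
    print(label, {name: round(ms, 2) for name, ms in delays.items()})
```

For the main seat in this made-up layout, left and right get 0 ms, the center waits about 1.03 ms, and the side speakers about 2.49 ms. Moving the target to the back row changes every delay, which is why the choice between strategies 1 and 2 matters.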
FYI, there are a lot of commercially oriented DSP devices that provide fine-grained control over mix levels, delay, and EQ for a large number of input and output channels. Any large venue that cares about sound needs something like that. Just search for "matrix mixer."
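The core signal flow of a matrix mixer is a gain from every input to every output, followed by processing on each output, such as delay. Real devices do this in dedicated DSP hardware configured through their own software; the toy offline model below only illustrates the structure. `build_gain_matrix`, `matrix_mix`, and `center_bleed` are hypothetical names, not any product's API, and delays are whole samples for simplicity.

```python
def build_gain_matrix(center_bleed):
    """Hypothetical 3-in/3-out matrix; inputs and outputs ordered L, C, R.
    center_bleed: how much of the center input also feeds the left and
    right speakers (0.0 = center plays only from the center speaker)."""
    return [
        [1.0, center_bleed, 0.0],  # left speaker
        [0.0, 1.0, 0.0],           # center speaker
        [0.0, center_bleed, 1.0],  # right speaker
    ]


def matrix_mix(inputs, gains, delays_samples):
    """Mix input channels into output channels, then delay each output.
    inputs: list of channels, each a list of samples of equal length.
    gains[o][i]: linear gain from input i to output o.
    delays_samples[o]: whole-sample delay applied to output o after mixing."""
    n = len(inputs[0])
    if any(len(ch) != n for ch in inputs):
        raise ValueError("all input channels must have the same length")
    if len(gains) != len(delays_samples):
        raise ValueError("need one delay per output")
    outputs = []
    for row, delay in zip(gains, delays_samples):
        if len(row) != len(inputs):
            raise ValueError("each gain row needs one entry per input")
        mixed = [sum(g * ch[t] for g, ch in zip(row, inputs)) for t in range(n)]
        # Delay is applied per output, after the mix: zero-pad the front
        # and truncate so the output keeps the input length.
        outputs.append(([0.0] * delay + mixed)[:n])
    return outputs


left_in = [1.0, 0.0, 0.0, 0.0]
center_in = [0.0, 1.0, 0.0, 0.0]
right_in = [0.0, 0.0, 0.0, 0.0]
out = matrix_mix([left_in, center_in, right_in], build_gain_matrix(0.5), [1, 0, 0])
print(out[0])  # [0.0, 1.0, 0.5, 0.0]
```

Setting `center_bleed` to 0.0 removes the center from the left and right speakers entirely, and values in between correspond to the partial reduction mentioned earlier. To turn a delay in milliseconds into samples, multiply by the sample rate and divide by 1000; at 48 kHz, 1 ms is 48 samples.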