First, there are two types of VMR.
There's VMR7, which came with WinXP and is limited to WinXP, and then there's the updated VMR9, which came with DirectX 9 and works on any Windows version that can run DirectX 9.
VMR7 still uses the overlay, but VMR9 has dropped it.
VMR9 uses 3D polygons that are front-aligned to the monitor to display the video. Since VMR9 is rather new, a lot of its interfaces are not yet supported (or not properly supported) by display drivers.
VMR9 supports the ProcAmp interface, which gives access to certain color controls (Brightness, Contrast, Hue and Saturation ... no gamma). It also gives access to hardware-based deinterlacing.
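To give a rough idea of what those four controls do to a pixel, here's a minimal sketch in Python. It is not the VMR9 API, just the commonly documented ProcAmp math applied to a single 8-bit YUV pixel. The function names, parameter names and the studio-range assumption (black at Y=16, neutral chroma at 128) are mine, and real drivers may compute this differently.

```python
import math

def clamp8(x):
    """Round and clamp a value to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))

def procamp(y, u, v, brightness=0.0, contrast=1.0, hue_deg=0.0, saturation=1.0):
    """Apply ProcAmp-style adjustments to one 8-bit YUV pixel.

    Assumes studio-range video: black at Y=16, neutral chroma at U=V=128.
    """
    h = math.radians(hue_deg)
    # Luma: contrast scales around black, brightness is a plain offset.
    y2 = (y - 16) * contrast + brightness + 16
    # Chroma: hue rotates the (U, V) vector, contrast and saturation scale it.
    cu, cv = u - 128, v - 128
    u2 = (cu * math.cos(h) + cv * math.sin(h)) * contrast * saturation + 128
    v2 = (cv * math.cos(h) - cu * math.sin(h)) * contrast * saturation + 128
    return clamp8(y2), clamp8(u2), clamp8(v2)
```

With the defaults the pixel comes back unchanged, and `saturation=0.0` pulls both chroma values to 128, i.e. a grayscale picture. Note there is no gamma term anywhere, which matches the missing gamma control.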
In my opinion, the scaling on VMR9 is superior to the overlay hardware, at least on ATI and NVIDIA cards. The CPU usage is more or less the same (overlay is maybe 5% faster on my GF4 Ti4200, but the difference is very small).
Unlike the overlay hardware, which can only be used by one video at a time, VMR9 can be used every which way. That means you can have 2 videos playing at once using hardware scaling, or a video playing on a second monitor that doesn't support overlays at all.
Since VMR9 is based on Direct3D, there is no resolution limitation.
The only downside to VMR9 is that it doesn't support overlay color-keying, which is a very simple interface that allows you to overlay non-video data on top of videos without using any CPU power (useful for OSD and other menu systems).
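If color-keying is new to you, the rule the overlay hardware applies is simple: wherever the desktop pixel equals the key color, the video shows through, and everywhere else the desktop pixel wins. Here's a small software sketch of that rule in Python. The real thing happens in the display hardware at scan-out, which is why it costs no CPU. The function name, key color and pixel values are made up for illustration.

```python
def composite_colorkey(desktop, video, key):
    """Return the displayed frame: video shows through wherever desktop == key."""
    return [
        [v if d == key else d for d, v in zip(drow, vrow)]
        for drow, vrow in zip(desktop, video)
    ]

KEY = (16, 0, 16)      # hypothetical key color painted into the video window
OSD = (255, 255, 255)  # an OSD pixel drawn over the key color

# 2x3 "desktop" with some OSD pixels, and a 2x3 video frame behind it.
desktop = [[KEY, KEY, OSD],
           [KEY, OSD, OSD]]
video = [[(1, 1, 1), (2, 2, 2), (3, 3, 3)],
         [(4, 4, 4), (5, 5, 5), (6, 6, 6)]]

shown = composite_colorkey(desktop, video, KEY)
```

Drawing an OSD is then just a matter of painting non-key pixels into the window. With VMR9 you have to get the same result some other way, since there is no key color to paint over.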
As far as pixel shaders go... sure, those can be used, but to do really powerful video-DSP, I don't think shaders are up to the task. And programming them is probably not going to be very easy.
The biggest problem right now is driver support. You need to remember that overlay has been out since around 1995 (maybe even earlier) and VMR9 has only been out since DirectX 9 was released (less than a year ago). And sadly, the display card companies care more about games than video, so it may take another 1-2 years for VMR9 to reach its peak. But even now, I use VMR9 on my NVIDIA card because, frankly, NVIDIA's overlay is just plain flawed.
Nich:
On my system (P4 2.53GHz) using VMR9 and the latest ZP build (3.0 RC3), I can play the 6mbit+ Liquid demo at its native resolution (1280x960, so you get 1:1 pixel mapping) at 50-85% CPU usage, completely smooth.