Just imagine it as having 4x the pixels for full 4k resolution. They would be fat, overlapping pixels, creating a blurred image. The blurring can be counteracted with sharpening prior to display. This works better with real images than with test patterns. To see what this looks like, take an image in a photo editing program, do a sharpen (to simulate the in-projector processing), then do a blur (to simulate the overlapping pixels).
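If you'd rather try the sharpen-then-blur experiment in code than in a photo editor, here is a rough one-dimensional sketch. Everything in it is an assumption made for illustration: the 3-tap blur standing in for the pixel overlap, the unsharp mask standing in for the in-projector processing, and the helper names `blur`, `sharpen` and `mse`. It is not how any particular projector actually does it.

```python
def blur(row):
    """3-tap [1/4, 1/2, 1/4] blur; edge pixels are repeated.
    Stands in for the overlapping pixels (an assumed model)."""
    n = len(row)
    out = []
    for i in range(n):
        left = row[max(i - 1, 0)]
        right = row[min(i + 1, n - 1)]
        out.append(0.25 * left + 0.5 * row[i] + 0.25 * right)
    return out


def sharpen(row, amount=1.0):
    """Unsharp mask, clipped to the displayable range [0, 1].
    Stands in for the in-projector sharpening (an assumed model)."""
    soft = blur(row)
    return [min(1.0, max(0.0, x + amount * (x - s)))
            for x, s in zip(row, soft)]


def mse(a, b):
    """Mean squared error between two rows of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# A mid-contrast edge, like you would find in a real image.
edge = [0.25] * 4 + [0.75] * 4
blur_only = blur(edge)
sharpen_then_blur = blur(sharpen(edge))
print("blur only:        ", mse(edge, blur_only))
print("sharpen then blur:", mse(edge, sharpen_then_blur))

# A full-contrast edge, like a black/white test pattern.
test_pattern = [0.0] * 4 + [1.0] * 4
print("test pattern helped by sharpening:",
      blur(sharpen(test_pattern)) != blur(test_pattern))
```

In this toy model the pre-sharpened edge ends up closer to the original than the plainly blurred one. The black/white pattern gets no benefit, because the sharpening overshoot is clipped at 0 and 1. That may be one reason test patterns fare worse than real images, though it is only what this simplified model shows.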
The next thing to understand is that 2 diagonal pixels are almost (but not quite) as good as 4 pixels. You can see plenty of examples of this if you look up comparisons of 2x MSAA.
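To put rough numbers on the "almost but not quite" part, here is a small sketch that borrows the MSAA way of thinking: estimate how much of a pixel an edge covers by testing sample points inside it. The sample positions and the three test edges are assumptions picked for illustration, not taken from any particular GPU or projector.

```python
# Sample positions inside a unit pixel (assumed layouts for illustration).
DIAGONAL_2 = [(0.25, 0.25), (0.75, 0.75)]
GRID_4 = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def coverage(samples, inside):
    """Fraction of sample points that fall inside the shape."""
    return sum(1 for (x, y) in samples if inside(x, y)) / len(samples)


def left_half(x, y):
    """Vertical edge through the middle of the pixel."""
    return x < 0.5


def top_half(x, y):
    """Horizontal edge through the middle of the pixel."""
    return y < 0.5


def corner(x, y):
    """Slanted edge cutting off one corner (true area is 0.28125)."""
    return x + y < 0.75


for name, shape in [("vertical", left_half),
                    ("horizontal", top_half),
                    ("slanted", corner)]:
    print(name,
          "2 diagonal:", coverage(DIAGONAL_2, shape),
          "4 grid:", coverage(GRID_4, shape))
```

For the vertical and horizontal edges the two diagonal samples give the same answer as the four grid samples, because the diagonal pair still covers two distinct columns and two distinct rows. The slanted corner cut is where the four-sample grid gets closer to the true coverage.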
As has already been said, the result is closer to 4k than 2k, but not as clean as true 4k.
Time-offset images do blend fine, thanks to persistence of vision. The shorter the time interval, the better. For examples, look at active 3D without the glasses, or at interlaced TV.