The primary reason to go beyond 1080p would be to support things like anti-aliasing. The one thing that still betrays the pixel-based nature of even HD content is aliasing along hard edges, and our eyes pick these out very easily. So while we wouldn't necessarily need much more than 1080p for really good home video, we could use the extra resolution for tricks that make the image look smoother.
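To make that trick concrete, here's a minimal sketch in Python (purely illustrative, not any real video pipeline): render a hard diagonal edge at 2x resolution, then box-filter it down. The edge pixels come out with fractional coverage values instead of a hard 0/1 staircase, which is exactly the smoothing the extra resolution buys you.

```python
# Hypothetical illustration: extra resolution used to anti-alias a hard edge.

def render_edge(size):
    """Binary image: 1.0 above the diagonal, 0.0 below (a hard edge)."""
    return [[1.0 if x > y else 0.0 for x in range(size)] for y in range(size)]

def downsample_2x(img):
    """Average each 2x2 block into one pixel (simple box filter)."""
    out = []
    for y in range(0, len(img), 2):
        row = []
        for x in range(0, len(img[y]), 2):
            block = img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]
            row.append(block / 4.0)
        out.append(row)
    return out

hi_res = render_edge(8)          # 8x8 "2x" render
lo_res = downsample_2x(hi_res)   # 4x4 display image
for row in lo_res:
    print(row)                   # edge pixels land on 0.25/0.75 - smoothed
```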
In fact, depending on how it's done, you wouldn't have to deliver more than 1080p at all. The 1080p data stream could carry vector information hinting at where each frame would benefit from anti-aliasing, and a display with sufficient processing power could apply it on the fly. Given the great inertia involved in changing broadcast, delivery, and display formats, that might be an easier nut to crack: a display that didn't use the hints could simply ignore them, while a display with 2x or more the source resolution could put them to work.
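Here's a hedged sketch of that idea. Nothing here is a real broadcast spec; every name (EdgeHint, Frame1080p, the blend math) is hypothetical. The stream carries ordinary 1080p pixels plus an optional side channel of vector hints marking hard edges; a "smart" display smooths along the hinted edges while upscaling, and a legacy display ignores the side channel and still gets a valid picture.

```python
# Hypothetical sketch: 1080p frames with an optional edge-hint side channel.
from dataclasses import dataclass, field

@dataclass
class EdgeHint:
    """Hypothetical per-frame hint: a line segment along a hard edge,
    in source-frame coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

@dataclass
class Frame1080p:
    pixels: list                                # rows of luma values, 0.0-1.0
    hints: list = field(default_factory=list)   # optional EdgeHint list

def display_naive(frame):
    """Legacy path: ignore the hints entirely; the picture is still valid."""
    return frame.pixels

def display_with_hints(frame, scale=2):
    """'Smart' path: nearest-neighbour upscale to the panel's resolution,
    then blur along each hinted edge to soften the staircase."""
    up = [[frame.pixels[y // scale][x // scale]
           for x in range(len(frame.pixels[0]) * scale)]
          for y in range(len(frame.pixels) * scale)]
    for h in frame.hints:
        # Walk the hinted segment in output coordinates, blending each
        # touched pixel with its four neighbours (a crude box blur).
        steps = int(max(abs(h.x1 - h.x0), abs(h.y1 - h.y0)) * scale) + 1
        for i in range(steps):
            t = i / max(steps - 1, 1)
            x = int((h.x0 + t * (h.x1 - h.x0)) * scale)
            y = int((h.y0 + t * (h.y1 - h.y0)) * scale)
            if 0 < x < len(up[0]) - 1 and 0 < y < len(up) - 1:
                up[y][x] = (up[y][x - 1] + up[y][x] + up[y][x + 1]
                            + up[y - 1][x] + up[y + 1][x]) / 5.0
    return up
```

The appeal of this shape is exactly the backward compatibility noted above: the hints are additive metadata, so nothing in the existing 1080p path has to change for displays that don't understand them.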