You are repeating the same basic intuitions that began the whole conversation, and which Darin is showing to be misleading.
Darin is pointing out that the resolution of a lens is tied to SPATIAL resolution, not temporal resolution. Lens resolution is a problem of how much detail can be resolved within a given AREA of SPACE - it's not an issue of sequential information. If it were, then any old lens only capable of passing 720p images could be considered of virtually "unlimited resolution," because the amount of info you can pass through it sequentially (e.g. every movie or video ever made) is almost unlimited. After all, films and videos are made of sequential information that our brains put together, just like E-shift. But that tells us nothing about the resolving power of the lens. So the idea that adding up image info sequentially solves the resolution limitation - as your idea relies on - is a misunderstanding of the issues of lens resolution.
Hence it really matters that tiny pixels are being moved around within the space of a single lens, whether this happens sequentially or not.
Try for a moment to put aside whether you are right or wrong, and think about the following:
Say you have a lens that CAN resolve down to 4K pixel sizes. Now, you have a blank field and you turn one of those teeny 4K pixels on. You can see the pixel resolved on screen through the lens. Now turn the pixel next to it on. You have two pixels beside each other, clearly resolved. Now turn the first pixel off. You still see the new pixel resolved. Turn the original pixel back on. Both pixels are there.
Notice this is the difference between both 4K pixels being resolved simultaneously (both on) vs sequentially (turning one off, then on). You could turn on all the millions of pixels in the 4K pattern either simultaneously, one pixel after the other, or with only one pixel lit at a time.
Through this process, is the resolution of the lens changing? Does it NEED to change? Surely not. It's the same lens; it can resolve 4K pixels spatially within its dimensions. It doesn't matter whether you are showing those pixels all at once or one at a time. Each time you turn on another pixel, you are presenting NEW information through the lens...SPATIALLY...that it has to resolve.
If the lens couldn't resolve the first 4K pixel in the first place, you wouldn't see it, nor would you see the second one turned on sequentially either. Time has nothing to do with it: spatial resolving power does.
This is because the resolution of the lens isn't tied to time; it's about the spatial resolution possible.
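The thought experiment above can be sketched numerically. Below is a minimal 1D toy model (all the sizes and names here are my own assumptions, not anything measured from real optics): the lens is modeled as a fixed Gaussian blur standing in for its point-spread function, and "sequential" display is approximated by blurring two single-pixel frames separately and summing the results, the way the eye integrates frames over time.

```python
import math

def gaussian_kernel(sigma, radius):
    # Discrete, normalized stand-in for the lens point-spread function.
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def through_lens(scene, kernel):
    # Convolve a 1D "scene" with the lens blur kernel.
    radius = len(kernel) // 2
    out = []
    for i in range(len(scene)):
        acc = 0.0
        for j, w in enumerate(kernel):
            src = i + j - radius
            if 0 <= src < len(scene):
                acc += scene[src] * w
        out.append(acc)
    return out

WIDTH = 40
sharp_lens = gaussian_kernel(sigma=1.0, radius=4)   # blur much smaller than the pixel gap
blurry_lens = gaussian_kernel(sigma=3.0, radius=9)  # blur wider than the pixel gap

# Two fine pixels, three samples apart, lit simultaneously.
both_on = [0.0] * WIDTH
both_on[18] = both_on[21] = 1.0

# The same two pixels shown in two sequential frames, summed afterwards
# (a crude stand-in for the eye adding up frames over time).
frame_a = [0.0] * WIDTH; frame_a[18] = 1.0
frame_b = [0.0] * WIDTH; frame_b[21] = 1.0

simultaneous = through_lens(both_on, sharp_lens)
sequential = [a + b for a, b in zip(through_lens(frame_a, sharp_lens),
                                    through_lens(frame_b, sharp_lens))]

# Lens blur is linear, so the output is identical either way:
same_either_way = all(abs(s - q) < 1e-9
                      for s, q in zip(simultaneous, sequential))

def resolves_two_peaks(image):
    # The pixels count as "resolved" if brightness dips between the two peaks.
    return image[19] < image[18] and image[20] < image[21]

sharp_resolves = resolves_two_peaks(through_lens(both_on, sharp_lens))
blurry_resolves = resolves_two_peaks(through_lens(both_on, blurry_lens))
```

Because blurring is linear, the simultaneous and sequential results come out numerically identical; whether the two pixels can be distinguished depends only on the width of the blur, not on timing.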
So now imagine projecting a 1080p pixel grid. Now keep that grid there while SIMULTANEOUSLY projecting another 1080p pixel grid, E-shifted. What happens? If the resolution is there, you can see how the overlapping grid structure has created new, smaller pixel sizes than the 1080p pixels. In other words, image information finer than the original 1080p pixel grid is now being displayed. A reminder of how this looks:
So there is now SMALLER SPATIAL INFORMATION being conveyed through the lens, the smaller grid areas creating smaller pixels. Again, this presumes a SIMULTANEOUS off-set of the pixel grid, not sequential.
Now think back on the example of resolving the 4K pixels. It did not matter whether the 4K pixel pattern was created sequentially, or simultaneously. The lens either had the spatial resolution to resolve the 4K pixels, together or separately - or it didn't.
If it didn't, you couldn't distinguish the 4K pixels whether you presented them simultaneously or sequentially within the space of the image.
It's the same now for the E-shift pixels. Looking at the SIMULTANEOUS projection of the E-shifted pixels, you can either see (resolved by the lens) the SMALLER-THAN-1080P grid structure and pixels, or you can't. You can make out how the grid has cut the original larger pixels into smaller ones...or you can't. If your lens couldn't resolve information that fine, then when you simultaneously display the E-shifted 1080p grid, it would not resolve the new finer line structure. If the lens were ONLY capable of resolving something as large as a 1080p pixel, then overlaying a new shifted 1080p image wouldn't show the new, smaller pixel structure created by the intersection of the grid lines; it would simply create a blurrier image, with the big pixels WIPING OUT the finer pixel structure.
Now, just like the 4K pixel example, turn the second E-shift grid on and off, sequentially. Remember: it didn't matter whether the pixels were on simultaneously or sequentially; what mattered was whether the lens had the spatial resolution to show information that small. It's the same with the new, smaller grid structure created by the E-shifted 1080p grid. The lens either has the spatial resolution to resolve the finer grid/pixels appearing between the shifted bigger pixels, or it doesn't. Like the 4K pixels, it doesn't matter whether you are asking the lens to show those finer pixels at the same time or sequentially: either way, they put the same spatial resolution demands on the lens.
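The E-shift overlay can be sketched with the same kind of 1D toy model (again, the pixel width, shift amount, and blur sizes are my own illustrative assumptions, not a model of any real projector lens). One "1080p" pixel is overlaid with a half-pixel-shifted copy, creating a half-pixel-wide bright step: structure finer than a native pixel. A lens blur much smaller than a pixel preserves that step; a blur on the order of a full pixel largely wipes it out.

```python
import math

def gaussian_kernel(sigma, radius):
    # Discrete, normalized stand-in for the lens point-spread function.
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def through_lens(scene, kernel):
    # Convolve a 1D "scene" with the lens blur kernel.
    radius = len(kernel) // 2
    out = []
    for i in range(len(scene)):
        acc = 0.0
        for j, w in enumerate(kernel):
            src = i + j - radius
            if 0 <= src < len(scene):
                acc += scene[src] * w
        out.append(acc)
    return out

WIDTH = 52
PIXEL = 8           # one "1080p" pixel is 8 samples wide (arbitrary choice)
SHIFT = PIXEL // 2  # E-shift offsets the second grid by half a pixel

# Frame 1: one 1080p pixel lit. Frame 2: the same pixel, half-pixel shifted.
frame1 = [1.0 if 20 <= i < 20 + PIXEL else 0.0 for i in range(WIDTH)]
frame2 = [1.0 if 20 + SHIFT <= i < 20 + SHIFT + PIXEL else 0.0
          for i in range(WIDTH)]

# The overlay creates a half-pixel-wide bright plateau in the middle:
# spatial structure FINER than a native pixel that the lens must resolve.
overlay = [a + b for a, b in zip(frame1, frame2)]

def step_contrast(image):
    # Brightness difference between the overlap plateau (around sample 26)
    # and the single-frame flank (around sample 22).
    return image[26] - image[22]

sharp_lens = gaussian_kernel(sigma=1.0, radius=4)  # resolves sub-pixel detail
soft_lens = gaussian_kernel(sigma=5.0, radius=15)  # blur ~ a full pixel wide

sharp_contrast = step_contrast(through_lens(overlay, sharp_lens))
soft_contrast = step_contrast(through_lens(overlay, soft_lens))
```

Under these assumptions, the sharp lens keeps a strong brightness step at the half-pixel boundary, while the pixel-sized blur flattens it to a small fraction of that: the finer structure is there in the signal either way, but only the lens with sufficient spatial resolution conveys it.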
Yes, our brains put the E-shifted image together sequentially. But this can only happen in the first place if the spatial resolution of the lens allows us to actually see the finer grid lines/pixel structure that occur when subtly off-setting the grid. Subtly shifting the pixel grid via E-shift is a SPATIAL movement of the information as well as a sequential one, and a lens has limited spatial resolution within which you can shift around teeny image information such that the effects are visible to our eyes.