Originally Posted by madshi
I see. Stacey has been so kind to send me a comparison screenshot with error diffusion turned on. FWIW, I can't see a noticeable improvement over my random dithering results, so that seems to confirm your tests. I'm happy because not having to do error diffusion saves precious GPU shader power!
Yes, and the standard error diffusion algorithms don't really lend themselves to GPU acceleration, because they don't parallelize well.
You might consider seeing if you can come up with another error diffusion strategy that *would* parallelize well. It'd be an interesting challenge. The basic principle of gathering together the error in a local area and propagating it into just one or two of the nearby pixels is still a good strategy.
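For contrast, here's a minimal sketch (my own NumPy illustration, not madVR's actual shader code) of why random dithering parallelizes trivially: every pixel is quantized independently, with no error carried over to its neighbors:

```python
import numpy as np

def random_dither(img, bits=8):
    # Quantize a float image in [0, 1] to `bits` bits using random
    # dithering: add uniform noise before rounding. Each pixel is
    # independent, so this maps trivially onto GPU shaders, unlike
    # classic error diffusion where pixel N depends on pixel N-1.
    levels = (1 << bits) - 1
    noise = np.random.uniform(-0.5, 0.5, img.shape)
    return np.clip(np.round(img * levels + noise), 0, levels) / levels
```

With `bits=8` the result never deviates from the source by more than one quantization step; the rounding direction is simply randomized, which breaks up banding.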
Yes, I did. On the problematic test clips I'm using, a very soft Gaussian filter looks *a lot* better than Catmull-Rom (to my eyes at least). The difference is significant. Yes, chroma gets very soft. But Catmull-Rom leaves very noticeable jaggies in the chroma channels. Obviously this is most visible in scenes with e.g. red fonts on black or gray backgrounds.
Huh. That's one of the things we looked at as well - red letters on a black background. But we never looked at Gaussian. I'd like to see what our color converter looks like compared to yours.
I was thinking of a way to convert 1080p 4:2:0 YCbCr to linear light, while keeping it at 4:2:0. This way we could do chroma upsampling in linear light. I was thinking of this:
(1) Given a 1080p 4:2:0 source, produce two images: Image A becomes 1080p 4:4:4 by doing best quality chroma upsampling (e.g. Gaussian). Image B becomes 540p 4:4:4 by scaling Y down (e.g. using Bicubic) while keeping CbCr untouched.
(2) Convert both images to linear light YCbCr.
(3) Combine the Y channel of image A with the CbCr channels of image B.
This way we would get linear light 1080p YCbCr 4:2:0. Now we can again upsample chroma (e.g. by using Gaussian filtering) to get to linear light YCbCr 1080p 4:4:4.
Does that make any sense to you? Or am I on a wild goose chase here?
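To make the proposed steps concrete, here's a toy sketch of the pipeline. Everything in it is a simplifying assumption for illustration only: full-range data in [0, 1], a pure power-law gamma of 2.4 standing in for the real transfer curve, and crude box/nearest scalers standing in for Bicubic and Gaussian:

```python
import numpy as np

GAMMA = 2.4  # assumption: pure power law instead of the true transfer function

def to_linear(c):
    return np.clip(c, 0.0, 1.0) ** GAMMA

def box_down2(img):    # crude stand-in for the Bicubic luma downscale
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean((1, 3))

def nearest_up2(img):  # crude stand-in for the Gaussian chroma upsample
    return np.repeat(np.repeat(img, 2, 0), 2, 1)

def ycc_to_rgb(y, cb, cr):  # full-range BT.709, chroma centered on 0
    r = y + 1.5748 * cr
    b = y + 1.8556 * cb
    g = (y - 0.2126 * r - 0.0722 * b) / 0.7152
    return r, g, b

def rgb_to_ycc(r, g, b):
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    return y, (b - y) / 1.8556, (r - y) / 1.5748

def linear_light_420(y, cb, cr):
    # y is full-res, cb/cr are half-res (4:2:0), all floats.
    # (1) Image A: full-res 4:4:4 via chroma upsampling.
    # (1) Image B: half-res 4:4:4 via luma downscaling.
    # (2) Convert both: Y'CbCr -> R'G'B' -> linear RGB -> "linear YCbCr".
    la = rgb_to_ycc(*[to_linear(c) for c in ycc_to_rgb(y, nearest_up2(cb), nearest_up2(cr))])
    lb = rgb_to_ycc(*[to_linear(c) for c in ycc_to_rgb(box_down2(y), cb, cr)])
    # (3) Y from image A, CbCr from image B: linear-light 4:2:0.
    return la[0], lb[1], lb[2]
```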
Well, it seems like a fairly roundabout process. The thing is, to get from Y'CbCr to "linear YCbCr," you have to go through R'G'B' and RGB, so you'd be converting a buffer to R'G'B', then RGB, then "linear YCbCr" (which needs a better name), then doing the same for another buffer, then combining pieces of the two, then upscaling just the "linear Cb and Cr" channels, then converting to RGB.
Keep in mind that Y'CbCr is just a lossy encoding of R'G'B'. The test of any upconversion strategy is whether it in general gets you back to the same R'G'B' values you started with. It never will get you there, because three-quarters of the chroma information is being discarded. But that's always the goal.
So it's easy enough to run some tests. Take original R'G'B' images, convert them to Y'CbCr 4:2:0 and back and see which algorithms come closest to the original image using something like PSNR as the judging criterion.
Would you mind saying in a few words what exactly is covered by the patent? E.g. is linear light scaling generally covered? So that anybody who converts gamma corrected image/video content to linear space and then scales it would infringe the patent?
Your best bet is to read the text of the patent. I'm really uncomfortable expressing my opinion about what the patent covers. If Microsoft lawyers have impressed anything on me over the years, it's this: don't offer legal opinions or anything resembling a legal opinion.
YES, I think there is! If you upscale e.g. a circle and only look at 2 pixels at a time, you have less information to work with. You won't know whether it's a circle or just a straight line. If you take more adjacent pixels into account, you may be able to preserve the natural form of the circle better.
At least that's how I understand the logic of using more than just 2 pixels. And in my experiments, e.g. Lanczos3 does a noticeably better job of preserving smooth curves than Lanczos2. Also, if I take the Lanczos4 parameters but only make use of the two main contributors, upscaled curves are no longer smooth (I've already tried that).
Hmmm... I'm not understanding the part about 2 pixels. Catmull-Rom or Lanczos2 or any other two-lobed filter needs 4 pixels minimum as inputs (and more for the downscaling case). If your Lanczos2 implementation only looks at two pixels, it's wrong. Lanczos4 should be integrating 8 pixels (in each dimension).
If you just misspoke, then we're on the same page. I'm going to assume that's the case.
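For reference, here's a small sketch showing where those tap counts come from: a Lanczos-a kernel spans 2a source pixels per dimension, so a=2 means 4 taps and a=4 means 8. (The function names here are my own illustration.)

```python
import numpy as np

def lanczos_weights(x, a):
    # Normalized weights of the 2*a source pixels nearest to sample
    # position x: a=2 -> Lanczos2 (4 taps), a=4 -> Lanczos4 (8 taps).
    base = int(np.floor(x))
    taps = np.arange(base - a + 1, base + a + 1)  # 2*a integer positions
    d = x - taps
    w = np.sinc(d) * np.sinc(d / a)  # np.sinc(t) = sin(pi*t)/(pi*t)
    return taps, w / w.sum()
```

Sampling exactly on a grid point gives weight 1 to that pixel and ~0 to the rest, as expected of an interpolating kernel.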
There can be reasons to examine a larger area than the immediate neighborhood of the pixel, as in various edge-preserving scaling algorithms like NEDI or ICBI or DCDI or SmartScale. But while those algorithms may examine a large number of pixels (though most of them don't), they still actually do the interpolation with local pixels. Because I return to my previous assertion: there is no physical reason to interpolate based on anything but the nearby pixels. There are psychovisual reasons (sharpening, edge preservation) to interpolate from extra pixels, but that is in no way a physical model of imaging. And even if there are good reasons (edge-finding) to examine further afield than the local area, it doesn't immediately follow that you should incorporate those other pixels in the interpolation kernel. You could separate an analysis kernel and an interpolation kernel, for example, where the analysis finds edges and sets coefficients, but then the interpolation itself is limited to the local vicinity.
Two-lobe filters, with 4 pixel inputs, are already a violation of my principle 1, because they are integrating information from two pixels away from the destination. I think it's an acceptable compromise because using that extra pixel gets you extra sharpness, and that's valuable.
I can imagine that Catmull-Rom or Lanczos2 could look more jagged than, say, Gaussian, because Catmull-Rom and Lanczos2 have inherent sharpening, and sharpening in X and Y nearly always increases jaggies on diagonals. But I have no idea why Lanczos4 would look better. It looks worse to me, other than the extra sharpness. But again, you may have found something new, or I might have made a mistake; I'm certainly not claiming I couldn't have done something wrong.
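To illustrate the sharpening mechanism: the Catmull-Rom kernel has a negative lobe for 1 < |x| < 2, and that lobe is exactly what produces the inherent sharpening and the overshoot near edges. A direct transcription of the standard kernel:

```python
def catmull_rom(x):
    # Catmull-Rom cubic kernel (the bicubic with B=0, C=0.5).
    # The negative lobe for 1 < |x| < 2 is what produces the inherent
    # sharpening -- and the overshoot/jaggies on diagonals.
    x = abs(x)
    if x < 1.0:
        return 1.5 * x**3 - 2.5 * x**2 + 1.0
    if x < 2.0:
        return -0.5 * x**3 + 2.5 * x**2 - 4.0 * x + 2.0
    return 0.0
```

A genuinely soft kernel like a Gaussian is non-negative everywhere, so it can't overshoot; that's consistent with it looking smoother (and blurrier) on diagonals.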
Also, I've tried upscaling and then downscaling a photo by 4x. Using Lanczos4 the final result was *much* nearer to the original than when using e.g. Catmull-Rom. The Lanczos4 processed image looked almost exactly like the original, while the Catmull-Rom processed image was a lot softer compared to the original.
That doesn't match my experience; have you tried the same experiment with other apps? The bicubic in Photoshop, GIMP, Paul Heckbert's Zoom, etc. are all pretty close to Catmull-Rom, or in some cases a sharpened Catmull-Rom.
True. You mean ringing, right?
Please check out these scaling samples: http://madshi.net/resampling.rar
There are 2 folders. The folder "ringing" contains images upscaled 400% using standard resampling filters like Catmull-Rom and Lanczos4. The folder "non-ringing" uses the same filter kernels, but with a tweaked kernel application logic I invented. Let me know what you think!
I think I'm surprised by how good the diagonals are on larger-scale objects in the Lanczos4 case; it makes me want to dust off my testbench and try Lanczos4 again to see if I missed something along the way. There are also parts of the image where I honestly prefer the Catmull-Rom version, but it seems to me that overall a bunch of features have cleaner diagonals with Lanczos4. It certainly doesn't match my memory of Lanczos4, but maybe we were so turned off by the ringing that we didn't spend as much time looking at its positive aspects. The ringing on the regular Lanczos4 is just as bad as I remember.
I think you've done a nice job with the ringing suppression, except that I kinda like a tiny bit of ringing just on the adjacent pixels; one mild halo next to edges; about what you see on the normal Catmull-Rom. I'd go so far as to say I like the Lanczos4 better with the ringing suppression, but I like the Catmull-Rom better without ringing suppression. I'm going to guess that your algorithm could be gated to allow a small amount of ringing through, which would be interesting to see.
I'd also like to see the various algorithms at 150% or 225% scaling, which is going to involve more subtle differences. But the advantages of windowed sinc and bicubic really shine on non-integer scaling ratios, IMO, and sometimes you really see artifacts best at odd scaling ratios. I did a lot of evaluation of images scaled up by one pixel, and found some numerical instability in the code, and odd beat frequencies that shouldn't have been there.
Do you want to continue this via email or via the forums? Having this discussion on the forum might be interesting for other people to read. But I would be fine with email, too. Just let me know what you prefer.
Oh, the forum is fine. Just if you want to reach me privately, use email rather than PM.
Thanks for your comments - much appreciated!! It's really a joy talking to someone who is as obsessed with maximum image quality as I am.