
· Registered · 1,005 Posts

Quote:
Originally Posted by madshi /forum/post/16225250


I'm working on implementing a new DirectShow video renderer with highest possible image quality (making use of GPU shaders to do all the dirty work). I've just released a first beta version here, in case anybody is interested:

http://forum.doom9.org/showthread.php?t=146228

This is super cool!

Quote:
(1) I'm currently using simple TPDF dithering to bring the floating point RGB data (coming from YCbCr -> RGB conversion) down to e.g. 8bit integer. I'm aware that when going to very low bitdepths (e.g. 2bit like in your screenshots above) random dithering looks bad. But what is your opinion when going down to 8bit RGB or 10bit, 12bit? Is error diffusion really visibly better in this situation, too? Random dithering looks just fine to me with these bitdepths. Btw, is the algorithm you're using very different to Floyd-Steinberg?

The short answer is that I don't know if you can find an image that looks better with error diffusion vs. plain random dither at 8 bits.


When we started this project we had people who were skeptical that any kind of dither was really necessary if you could get a really clean 8 bit downconversion. We had a video "expert" tell us that banding was a result of multiple 8-bit rounding steps, leaving gaps in the histogram. We pretty quickly demonstrated that that wasn't true: even with a smooth, clean gradient you can get visible bands if you don't use dither.


We did try lots of bit depths and you get visibly improved results adding the error diffusion at 2, 3, 4, 5, and 6 bits on any material. With 7 bits, we can see the improvements but they're subtle on lots of material. With 8 bits, I can't tell you that I've seen anything that clearly looks better with error diffusion turned on. We turn it on because we're satisfied that it's better.


Our algorithm is basically Floyd-Steinberg, but with different coefficients. They came up with their coefficients by trial and error, looking for simple integer ratios that would produce good looking 1-bit-per-pixel results. With multiple bits per pixel, plus random noise injection, those specific coefficients in Floyd-Steinberg aren't important anymore. The key thing is the propagation of the error to adjacent pixels.
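
For the curious, here's a minimal single-channel sketch of the general idea - standard Floyd-Steinberg weights plus a little injected noise, not our actual coefficients:

Code:

import numpy as np

def error_diffuse(img, bits, noise_amp=0.5, seed=0):
    """img: 2-D float array in [0, 1]; returns ints in [0, 2**bits - 1]."""
    rng = np.random.default_rng(seed)
    levels = (1 << bits) - 1
    work = img.astype(np.float64) * levels   # accumulates diffused error
    out = np.zeros(img.shape, dtype=np.int32)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            noise = noise_amp * (rng.random() - rng.random())  # TPDF noise
            q = int(np.clip(round(work[y, x] + noise), 0, levels))
            out[y, x] = q
            err = work[y, x] - q
            # propagate the error to the neighbors (the key step);
            # the exact weights aren't critical, as noted above
            if x + 1 < w:
                work[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1, x - 1] += err * 3 / 16
                work[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1, x + 1] += err * 1 / 16
    return out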

Quote:
(2) I've implemented chroma upsampling by using Gaussian scaling which seems to produce the best (very smooth) results. But I've done that in gamma corrected light. I'm wondering whether upsampling chroma in linear light would be even better? But of course the problem is how to get 4:2:0 chroma to linear light without upsampling it first...

That's really interesting. We never tried Gaussian but it sounds like a reasonable approach for chroma. It seems like your color edges are doomed to be soft, but maybe that's better-looking than over-sharp and jagged. I assume you compared it to a bicubic like Catmull-Rom?


The usual guidance I give on scaling in general is that you want a scaling kernel to have three things:


1. Amplitude locality, meaning keeping the energy of a pixel as close as you can to the original pixel. This is why long filters are bad; they spread the influence of a single pixel far beyond its immediate neighbors.


2. Smoothness, specifically that you don't introduce discontinuities that weren't in the original that will be perceived as new edges.


3. Edge preservation, meaning keeping the edges that are in the original.


All scalers have tradeoffs among these three. Long filters like any windowed sinc with more than two lobes are bad on the first principle. Bilinear isn't so good on the second principle or the third. Gaussian (if it's small-radius) is great on both 1 & 2, but lousy at 3. Any bicubic or two-lobe windowed sinc are a good tradeoff between the three. But none is perfect.


Our approach was to treat the chroma upconversion as a scaling operation and use our favorite bicubic, which is Catmull-Rom. But given that the chroma channel isn't really intended to carry edge information, maybe Gaussian is a better choice.
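
For reference, the Catmull-Rom weight function is simple; here's a quick sketch of the 1-D kernel (b = 0, c = 1/2 in Mitchell-Netravali terms):

Code:

def catmull_rom(x):
    """Catmull-Rom weight for a sample at distance x from the output position."""
    x = abs(x)
    if x < 1.0:
        return 1.5 * x**3 - 2.5 * x**2 + 1.0
    if x < 2.0:
        return -0.5 * x**3 + 2.5 * x**2 - 4.0 * x + 2.0
    return 0.0   # support ends at 2, i.e. two lobes / 4 taps per axis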


It's not really possible to upconvert chroma in linear space, because the chroma channels aren't really any kind of linear representation of anything, or even a simple mathematical transformation of a linear representation. The chroma channels are a linear combination of three gamma-corrected values, so there's no simple transformation you can apply to turn them individually into a radiometrically linear representation.


One thing we meant to experiment with was whether there was a mathematical transform you could do to the chroma channels that, while still not mathematically perfect, would produce better results. What if you just treat them like they have a gamma curve applied, "undo" the gamma, and scale? Would it be better? We never really did much on it because it seemed like there were too many edge cases that were likely to look worse. But it might be worth exploring. It might also be a wild goose chase.

Quote:
(3) Can I directly convert Y'CbCr to YCbCr without going to RGB? I've been told that isn't possible. But if that isn't possible then why is Y'CbCr named Y'CbCr and not Y'Cb'Cr'? I mean the name Y'CbCr suggests that the only difference when going Y'CbCr -> YCbCr is the luma channel. Is CbCr not changed at all? Sorry, this might be a stupid question...

See above. Not really. It's called Y'CbCr just so people don't think Y' is equivalent to Y. It's not even a gamma-corrected version of Y. Cb and Cr don't have a prime mark because they aren't typically confused with some other well-known quantity. It's just a nomenclature issue. Y' means "value sort of related to Y" just like R' means "value sort of related to R".

Quote:
(4) I've stumbled over your 2007 patent. How are you handling it? I mean can I implement linear light scaling in my renderer without getting into trouble or do I need to get a license from you somehow? (Of course I'd use my own algorithms/code/logic, I'd just use the idea to do scaling in linear light).

Well, we aren't handling it at all - it's assigned to Microsoft. I hesitate to give you legal advice. I would point out that, say, Xvid is a minefield of patent infringement, yet no one has to my knowledge been sued over it. That's not the same as no one being able to sue over it. If there's no real money involved, it's still infringement, but without the ability to get a big fat settlement patent suits are not very attractive.

Quote:
FYI, I've developed a (mostly) ringing-free new scaling algorithm based on 4-lobe Lanczos. Is that of any interest to you? If so, PM me.

I'm certainly interested, though I'm dead-set against all 3+ lobe windowed sinc filters (principle 1, above). Just do a thought experiment - is there any physical reason to spread the color from a pixel any further away from its location than the adjacent pixels?


It's absolutely true that 4-lobe Lanczos is sharper than 2-lobe, and 64-lobe Lanczos is sharper still. We know - we tried it.
But the artifacts get really ugly, and the extra sharpness is entirely because of the first and second lobe. That led us to ask, "How can we get a 2-lobe filter where the two lobes look more like the first two lobes of a many-lobe filter?"


Keep in mind that these filters aren't magic. You could draw a filter freehand that looks basically like a Lanczos filter and get good results. We know - we tried it.



Don't PM me, though; I'm not trying to be hard to reach, I just don't like communicating via web forms. Email me at [email protected] or [email protected] .


This is really great stuff you're doing; I'm really glad to see someone applying these principles.


Don
 

· Registered · 8,136 Posts

Quote:
Originally Posted by dmunsil /forum/post/16231779


When we started this project we had people who were skeptical that any kind of dither was really necessary if you could get a really clean 8 bit downconversion. We had a video "expert" tell us that banding was a result of multiple 8-bit rounding steps, leaving gaps in the histogram. We pretty quickly demonstrated that that wasn't true: even with a smooth, clean gradient you can get visible bands if you don't use dither.

Hah - that sounds so much like my own experience! I've had several similar discussions on doom9...

Quote:
Originally Posted by dmunsil /forum/post/16231779


We did try lots of bit depths and you get visibly improved results adding the error diffusion at 2, 3, 4, 5, and 6 bits on any material. With 7 bits, we can see the improvements but they're subtle on lots of material. With 8 bits, I can't tell you that I've seen anything that clearly looks better with error diffusion turned on. We turn it on because we're satisfied that it's better.

I see. Stacey has been so kind as to send me a comparison screenshot with error diffusion turned on. FWIW, I can't see a noticeable improvement over my random dithering results, so that seems to confirm your tests. I'm happy because not having to do error diffusion saves precious GPU shader power!

Quote:
Originally Posted by dmunsil /forum/post/16231779


That's really interesting. We never tried Gaussian but it sounds like a reasonable approach for chroma. It seems like your color edges are doomed to be soft, but maybe that's better-looking than over-sharp and jagged. I assume you compared it to a bicubic like Catmull-Rom?

Yes, I did. On the problematic test clips I'm using, a very soft Gaussian filter looks *a lot* better than Catmull-Rom (to my eyes at least). The difference is significant. Yes, chroma gets very soft. But Catmull-Rom leaves very noticeable jaggies in the chroma channels. Obviously this is most visible in scenes with e.g. red fonts on black or gray backgrounds.

Quote:
Originally Posted by dmunsil /forum/post/16231779


It's not really possible to upconvert chroma in linear space, because the chroma channels aren't really any kind of linear representation of anything, or even a simple mathematical transformation of a linear representation. The chroma channels are a linear combination of three gamma-corrected values, so there's no simple transformation you can apply to turn them individually into a radiometrically linear representation.


One thing we meant to experiment with was whether there was a mathematical transform you could do to the chroma channels that, while still not mathematically perfect, would produce better results. What if you just treat them like they have a gamma curve applied, "undo" the gamma, and scale? Would it be better? We never really did much on it because it seemed like there were too many edge cases that were likely to look worse. But it might be worth exploring. It might also be a wild goose chase.

I was thinking of a way to convert 1080p 4:2:0 YCbCr to linear light while keeping it at 4:2:0, so that the chroma upsampling itself could be done in linear light. Something like this:


(1) Given a 1080p 4:2:0 source, produce two images: Image A becomes 1080p 4:4:4 by doing best quality chroma upsampling (e.g. Gaussian). Image B becomes 540p 4:4:4 by scaling Y down (e.g. using Bicubic) while keeping CbCr untouched.

(2) Convert both images to linear light YCbCr.

(3) Combine the Y channel of image A with the CbCr channels of image B.


This way we would get linear light 1080p YCbCr 4:2:0. Now we can again upsample chroma (e.g. by using Gaussian filtering) to get to linear light YCbCr 1080p 4:4:4.
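
In rough pseudocode (all function names made up, this isn't actual code from my renderer):

Code:

def to_linear_light_420(src):                  # src: 1080p Y'CbCr 4:2:0
    a = chroma_upsample(src, "gaussian")       # image A: 1080p 4:4:4
    b = luma_downsample(src, "bicubic")        # image B: 540p 4:4:4
    a_lin = ycbcr_to_linear_ycbcr(a)           # via R'G'B' -> linear RGB
    b_lin = ycbcr_to_linear_ycbcr(b)
    # 1080p linear Y + 540p linear CbCr = linear light 4:2:0
    return combine(y=a_lin.y, cbcr=b_lin.cbcr)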


Does that make any sense to you? Or am I really chasing wild geese here?


Quote:
Originally Posted by dmunsil /forum/post/16231779


Well, we aren't handling it at all - it's assigned to Microsoft. I hesitate to give you legal advice.

Would you mind saying in a few words what exactly is covered by the patent? E.g. is linear light scaling generally covered? So that anybody who converts gamma corrected image/video content to linear space and then scales it would infringe the patent?

Quote:
Originally Posted by dmunsil /forum/post/16231779


I'm certainly interested, though I'm dead-set against all 3+ lobe windowed sinc filters (principle 1, above). Just do a thought experiment - is there any physical reason to spread the color from a pixel any further away from its location than the adjacent pixels?

YES, I think there is! If you upscale e.g. a circle and only look at 2 pixels at a time, you have less information to work with: you can't know whether it's a circle or just a straight line. If you take more adjacent pixels into account, you may be able to preserve the natural form of the circle better.


At least that's how I understand the logic of using more than just 2 pixels. And in my experiments e.g. Lanczos3 does a noticeably better job of preserving smooth curves than Lanczos2. Also, if I e.g. take the Lanczos4 parameters but only make use of the two main contributors, upscaled curves are no longer smooth (I've already tried that).


Or am I wrong with my line of thinking?


Also, I've tried upscaling and then downscaling a photo by 4x. Using Lanczos4 the final result was *much* nearer to the original than when using e.g. Catmull-Rom. The Lanczos4 processed image looked almost exactly like the original, while the Catmull-Rom processed image was a lot softer compared to the original.

Quote:
Originally Posted by dmunsil /forum/post/16231779


It's absolutely true that 4-lobe Lanczos is sharper than 2-lobe, and 64-lobe Lanczos is sharper still. We know - we tried it.
But the artifacts get really ugly

True. You mean ringing, right?


Please check out these scaling samples:

http://madshi.net/resampling.rar


There are 2 folders. The folder "ringing" contains 400% upscales using standard resampling filters like Catmull-Rom and Lanczos4. The folder "non-ringing" uses the same filter kernels, but with a tweaked kernel application logic that I invented. Let me know what you think!

Quote:
Originally Posted by dmunsil /forum/post/16231779


Don't PM me, though; I'm not trying to be hard to reach, I just don't like communicating via web forms. Email me at [email protected] or [email protected] .

Do you want to continue this via email or via the forums? Having this discussion on the forum might be interesting for other people to read. But I would be fine with email, too. Just let me know what you prefer.

Quote:
Originally Posted by dmunsil /forum/post/16231779


This is really great stuff you're doing; I'm really glad to see someone applying these principles.



Thanks for your comments - much appreciated!! It's really a joy talking to someone who is as obsessed with maximum image quality as I am.
 

· Registered · 5,585 Posts
Discussion Starter #43

Quote:
The short answer is that I don't know if you can find an image that looks better with error diffusion vs. plain random dither at 8 bits.

I tried several combinations with Elephants Dream. I don't have them anymore, but could re-create them once I dig out the drive. The FS + random looked better than pure random.
 

· Banned · 20,735 Posts
Stacey, do the same concerns arise when upscaling as well? Don't you still run into similar problems when doing linear operations on gamma-corrected content when upscaling? I'm curious about this.
 

· Registered · 1,005 Posts

Quote:
Originally Posted by madshi /forum/post/16232801


I see. Stacey has been so kind as to send me a comparison screenshot with error diffusion turned on. FWIW, I can't see a noticeable improvement over my random dithering results, so that seems to confirm your tests. I'm happy because not having to do error diffusion saves precious GPU shader power!

Yes, and the standard error diffusion algorithms don't really lend themselves to GPU acceleration, because they don't parallelize well.


You might consider seeing if you can come up with another error diffusion strategy that *would* parallelize well. It'd be an interesting challenge. The basic principle of gathering together the error in a local area and propagating it into just one or two of the nearby pixels is still a good strategy.
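
For instance (a rough sketch only, nothing we ever shipped, reusing the error_diffuse sketch from my earlier post): split the image into tiles, diffuse error within each tile independently, and let the injected noise mask the tile seams. Each tile can then go to its own thread:

Code:

import numpy as np

def tiled_error_diffusion(img, bits, tile=64):
    out = np.zeros(img.shape, dtype=np.int32)
    h, w = img.shape
    for ty in range(0, h, tile):        # every tile is independent, so
        for tx in range(0, w, tile):    # this double loop parallelizes
            block = img[ty:ty + tile, tx:tx + tile]
            out[ty:ty + tile, tx:tx + tile] = error_diffuse(block, bits)
    return out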

Quote:
Yes, I did. On the problematic test clips I'm using, a very soft Gaussian filter looks *a lot* better than Catmull-Rom (to my eyes at least). The difference is significant. Yes, chroma gets very soft. But Catmull-Rom leaves very noticeable jaggies in the chroma channels. Obviously this is most visible in scenes with e.g. red fonts on black or gray backgrounds.

Huh. That's one of the things we looked at as well - red letters on a black background. But we never looked at Gaussian. I'd like to see what our color converter looks like compared to yours.

Quote:
I was thinking of a way to convert 1080p 4:2:0 YCbCr to linear light while keeping it at 4:2:0, so that the chroma upsampling itself could be done in linear light. Something like this:


(1) Given a 1080p 4:2:0 source, produce two images: Image A becomes 1080p 4:4:4 by doing best quality chroma upsampling (e.g. Gaussian). Image B becomes 540p 4:4:4 by scaling Y down (e.g. using Bicubic) while keeping CbCr untouched.

(2) Convert both images to linear light YCbCr.

(3) Combine the Y channel of image A with the CbCr channels of image B.


This way we would get linear light 1080p YCbCr 4:2:0. Now we can again upsample chroma (e.g. by using Gaussian filtering) to get to linear light YCbCr 1080p 4:4:4.


Does that make any sense to you? Or am I really chasing wild geese here?

Well, it seems like a fairly roundabout process. The thing is, to get from Y'CbCr to "linear YCbCr," you have to go through R'G'B' and RGB, so you'd be converting a buffer to R'G'B', then RGB, then "linear YCbCr" (which needs a better name), then doing the same for another buffer, then combining pieces of the two, then upscaling just the "linear Cb and Cr" channels, then converting to RGB.


Keep in mind that Y'CbCr is just a lossy encoding of R'G'B'. The test of any upconversion strategy is whether it in general gets you back to the same R'G'B' values you started with. It never will get you there, because three-quarters of the chroma information is being discarded. But that's always the goal.


So it's easy enough to run some tests. Take original R'G'B' images, convert them to Y'CbCr 4:2:0 and back and see which algorithms come closest to the original image using something like PSNR as the judging criterion.
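
A quick harness for that test might look like this (a sketch; the conversion and scaling functions are placeholders for whatever matrix and algorithms you're comparing):

Code:

import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak * peak / mse)

def roundtrip_score(rgb_prime, to_ycbcr, to_rgb, chroma_down, chroma_up):
    ycbcr = to_ycbcr(rgb_prime)       # R'G'B' -> Y'CbCr 4:4:4
    sub = chroma_down(ycbcr)          # 4:4:4 -> 4:2:0, discarding 3/4 of the chroma
    rec = to_rgb(chroma_up(sub))      # the upsampling algorithm under test
    return psnr(rgb_prime, rec)       # higher = closer to the original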

Quote:
Would you mind saying in a few words what exactly is covered by the patent? E.g. is linear light scaling generally covered? So that anybody who converts gamma corrected image/video content to linear space and then scales it would infringe the patent?

Your best bet is to read the text of the patent. I'm reaaaaaly uncomfortable expressing my opinion about what the patent covers. If Microsoft lawyers have impressed anything on me over the years it's this: don't offer legal opinions or anything resembling a legal opinion.


Quote:
YES, I think there is! If you upscale e.g. a circle and only look at 2 pixels at a time, you have less information to work with: you can't know whether it's a circle or just a straight line. If you take more adjacent pixels into account, you may be able to preserve the natural form of the circle better.


At least that's how I understand the logic of using more than just 2 pixels. And in my experiments e.g. Lanczos3 does a noticeably better job of preserving smooth curves than Lanczos2. Also, if I e.g. take the Lanczos4 parameters but only make use of the two main contributors, upscaled curves are no longer smooth (I've already tried that).

Hmmm... I'm not understanding the part about 2 pixels. Catmull-Rom or Lanczos2 or any other two-lobed filter needs 4 pixels minimum as inputs (and more for the downscaling case). If your Lanczos2 implementation only looks at two pixels, it's wrong. Lanczos4 should be integrating 8 pixels (in each dimension).
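
For concreteness, the Lanczos weight function (a quick sketch):

Code:

import math

def lanczos(x, a):
    """Lanczos weight at distance x; a = number of lobes."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

# support is (-a, a), so Lanczos2 integrates 4 source pixels per axis
# and Lanczos4 integrates 8, as described above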


If you just misspoke, then we're on the same page. I'm going to assume that's the case.


There can be reasons to examine a larger area than the immediate neighborhood of the pixel, as in various edge-preserving scaling algorithms like NEDI or ICBI or DCDI or SmartScale. But while those algorithms may examine a large number of pixels (though most of them don't), they still actually do the interpolation with local pixels. Because I return to my previous assertion: there is no physical reason to interpolate based on anything but the nearby pixels. There are psychovisual reasons (sharpening, edge preservation) to interpolate from extra pixels, but that is in no way a physical model of imaging. And even if there are good reasons (edge-finding) to examine further afield than the local area, it doesn't immediately follow that you should incorporate those other pixels in the interpolation kernel. You could separate an analysis kernel and an interpolation kernel, for example, where the analysis finds edges and sets coefficients, but then the interpolation itself is limited to the local vicinity.


Two-lobe filters, with 4 pixel inputs, are already a violation of my principle 1, because they are integrating information from two pixels away from the destination. I think it's an acceptable compromise because using that extra pixel gets you extra sharpness, and that's valuable.


I can imagine that Catmull-Rom or Lanczos2 could look more jagged than, say, Gaussian, because Catmull-Rom and Lanczos2 have inherent sharpening, and sharpening in X and Y nearly always increases jaggies on diagonals. But I have no idea why Lanczos4 would look better. It looks worse to me, other than the extra sharpness. But again, you may have found something new, or I might have made a mistake; I'm certainly not claiming I couldn't have done something wrong.

Quote:
Also, I've tried upscaling and then downscaling a photo by 4x. Using Lanczos4 the final result was *much* nearer to the original than when using e.g. Catmull-Rom. The Lanczos4 processed image looked almost exactly like the original, while the Catmull-Rom processed image was a lot softer compared to the original.

That doesn't match my experience; have you tried the same experiment with other apps? The bicubic in Photoshop, GIMP, Paul Heckbert's Zoom, etc. are all pretty close to Catmull-Rom, or in some cases a sharpened Catmull-Rom.

Quote:
True. You mean ringing, right?

Yes.

Quote:
Please check out these scaling samples:

http://madshi.net/resampling.rar


There are 2 folders. The folder "ringing" contains 400% upscales using standard resampling filters like Catmull-Rom and Lanczos4. The folder "non-ringing" uses the same filter kernels, but with a tweaked kernel application logic that I invented. Let me know what you think!

I think I'm surprised by how good the diagonals are on larger-scale objects in the Lanczos4 case; it makes me want to dust off my testbench and try Lanczos4 again and see if I missed something along the way. There are also parts of the image where I honestly prefer the Catmull-Rom version, but it seems to me that overall a bunch of features have cleaner diagonals on the Lanczos4. It certainly doesn't match my memory of Lanczos4, but maybe we were so turned off by the ringing that we didn't spend as much time looking at the positive aspects of it. The ringing on the regular Lanczos4 is just as bad as I remember.


I think you've done a nice job with the ringing suppression, except that I kinda like a tiny bit of ringing just on the adjacent pixels; one mild halo next to edges; about what you see on the normal Catmull-Rom. I'd go so far as to say I like the Lanczos4 better with the ringing suppression, but I like the Catmull-Rom better without ringing suppression. I'm going to guess that your algorithm could be gated to allow a small amount of ringing through, which would be interesting to see.


I'd also like to see the various algorithms at 150% or 225% scaling, which is going to involve more subtle differences. But the advantages of windowed sinc and bicubic really shine on non-integer scaling ratios, IMO, and sometimes you really see artifacts best at odd scaling ratios. I did a lot of evaluation of images scaled up by one pixel, and found some numerical instability in the code, and odd beat frequencies that shouldn't have been there.

Quote:
Do you want to continue this via email or via the forums? Having this discussion on the forum might be interesting for other people to read. But I would be fine with email, too. Just let me know what you prefer.

Oh, the forum is fine. Just if you want to reach me privately, use email rather than PM.

Quote:
Thanks for your comments - much appreciated!! It's really a joy talking to someone who is as obsessed with maximum image quality as I am.

Likewise, absolutely.



Don
 

· Registered · 1,005 Posts

Quote:
Originally Posted by ChrisWiggles /forum/post/16235345


Stacey, do the same concerns arise when upscaling as well? Don't you still run into similar problems when doing linear operations on gamma-corrected content when upscaling? I'm curious about this.

Absolutely; the same issues apply to upscaling as to downscaling. It's all about interpolation. When you interpolate between two gamma-corrected values, you get a result that's too dark. It doesn't matter whether it's for up- or downscaling. The edges always get pulled toward black.
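
A quick worked example, assuming a pure 2.2 power curve for simplicity:

Code:

naive = (0.0 + 1.0) / 2       # average black and white in gamma space: code 0.5
print(naive ** 2.2)           # ~0.22: only 22% of white's light - too dark
correct = 0.5 ** (1 / 2.2)    # encode the true linear midpoint instead
print(correct)                # ~0.73: the code value that emits half the light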


Don
 

· Registered · 8,136 Posts

Quote:
Originally Posted by dmunsil /forum/post/16236990


You might consider seeing if you can come up with another error diffusion strategy that *would* parallelize well.

That's really hard to do because the order in which the pixels are processed inside of a GPU pixel shader is not documented (or at least I've not seen that documentation). So I can really only handle each pixel on its own. I can read the surrounding pixels just fine, but I can't write to them.

Quote:
Originally Posted by dmunsil /forum/post/16236990


Huh. That's one of the things we looked at as well - red letters on a black background. But we never looked at Gaussian. I'd like to see what our color converter looks like compared to yours.

Try this one:

http://madshi.net/chromatest.mkv


It's just a 10 second clip containing a very good "red on black" situation. You can compare your own chroma upsampling to the results my renderer produces. A screenshot of my renderer's results is in the 2nd post here:

http://forum.doom9.org/showthread.php?t=146228

Quote:
Originally Posted by dmunsil /forum/post/16236990


Hmmm... I'm not understanding the part about 2 pixels. Catmull-Rom or Lanczos2 or any other two-lobed filter needs 4 pixels minimum as inputs (and more for the downscaling case). If your Lanczos2 implementation only looks at two pixels, it's wrong. Lanczos4 should be integrating 8 pixels (in each dimension).

Yes, you're right, and of course my implementation uses 8 pixels for Lanczos4 and 4 pixels for Catmull-Rom upscaling. I just misspoke.

Quote:
Originally Posted by dmunsil /forum/post/16236990


There can be reasons to examine a larger area than the immediate neighborhood of the pixel, as in various edge-preserving scaling algorithms like NEDI or ICBI or DCDI or SmartScale. But while those algorithms may examine a large number of pixels (though most of them don't), they still actually do the interpolation with local pixels. Because I return to my previous assertion: there is no physical reason to interpolate based on anything but the nearby pixels. There are psychovisual reasons (sharpening, edge preservation) to interpolate from extra pixels, but that is in no way a physical model of imaging.

If you were right, shouldn't bilinear scaling produce the best results (apart from not doing sharpening)?

Quote:
Originally Posted by dmunsil /forum/post/16236990


Two-lobe filters, with 4 pixel inputs, are already a violation of my principle 1, because they are integrating information from two pixels away from the destination. I think it's an acceptable compromise because using that extra pixel gets you extra sharpness, and that's valuable.

You seem to think that using more than 2 pixels (more than bilinear) only helps sharpness. I'm sorry to say, but I think you're wrong here. Using more than 2 pixels helps sharpness, but it also allows a smoother (less jagged) interpolation of curves.


See also the screenshot comparison here:

http://audio.rightmark.org/lukin/gra...house_more.htm


You will see that the Sinc filter is ugly as hell, but it produces the smoothest curves. Both Bilinear and Catmull-Rom have *loads* of jaggies in comparison.

Quote:
Originally Posted by dmunsil /forum/post/16236990


That doesn't match my experience; have you tried the same experiment with other apps? The bicubic in Photoshop, GIMP, Paul Heckbert's Zoom, etc. are all pretty close to Catmull-Rom, or in some cases a sharpened Catmull-Rom.

Haven't tried any of these. When I started working on resampling algorithms, I began by studying Alexey Lukin's screenshots. Then I implemented my own resampling algorithms, and the results exactly matched Alexey Lukin's screenshots. So I see no reason to doubt the validity of my results, unless you have screenshots which look different.

Quote:
Originally Posted by dmunsil /forum/post/16236990


I think I'm surprised by how good the diagonals are on larger-scale objects in the Lanczos4 case; it makes me want to dust off my testbench and try Lanczos4 again and see if I missed something along the way.

Please do!


Quote:
Originally Posted by dmunsil /forum/post/16236990


There are also parts of the image where I honestly prefer the Catmull-Rom version, but it seems to me that overall a bunch of features have cleaner diagonals on the Lanczos4. It certainly doesn't match my memory of Lanczos4, but maybe we were so turned off by the ringing that we didn't spend as much time looking at the positive aspects of it. The ringing on the regular Lanczos4 is just as bad as I remember.


I think you've done a nice job with the ringing suppression, except that I kinda like a tiny bit of ringing just on the adjacent pixels; one mild halo next to edges; about what you see on the normal Catmull-Rom. I'd go so far as to say I like the Lanczos4 better with the ringing suppression, but I like the Catmull-Rom better without ringing suppression. I'm going to guess that your algorithm could be gated to allow a small amount of ringing through, which would be interesting to see.

My "non-ringing" algorithm is based on allowing ringing in some areas of the image and not allowing it in others. So I think it could be tweaked a little to accommodate your likings, but I haven't tried that yet, because personally I can't stand any ringing, no matter how small it is. The only reason that my algorithm allows ringing in some parts of the image is that if I suppressed any and all ringing, the image would lose all its smooth curves for whatever reason.

Quote:
Originally Posted by dmunsil /forum/post/16236990


I'd also like to see the various algorithms at 150% or 225% scaling, which is going to involve more subtle differences. But the advantages of windowed sinc and bicubic really shine on non-integer scaling ratios, IMO, and sometimes you really see artifacts best at odd scaling ratios. I did a lot of evaluation of images scaled up by one pixel, and found some numerical instability in the code, and odd beat frequencies that shouldn't have been there.

I've uploaded some new samples here, this time only Lanczos4 and Catmull-Rom:

http://madshi.net/resampling2.rar


Included are 150% upscales, 225% upscales, +1 pixel upscales and a "225% -> 100%" up/downscale. Please check out especially the up/downscale. If you look at the ringing Lanczos4 up/downscale, you should notice that it looks almost identical to the original, while the Catmull-Rom result looks a lot softer.


Would be great if you could confirm or refute my results with your own testing methods!



Thanks!!
 

· Registered · 9,884 Posts
Don -


Some of the above has turned into an excellent summary of the current state of the art in video processing.


Definitely a keeper thread. Thank you.


BTW, I was halfway thinking about trying to do something special to celebrate my 10,000th post on this forum. But I decided my thanks for this thread was certainly worth it instead, at least to me.


- Tom


PS - Talking about looking at too many pixels: anybody doing scaling using GPU-based full-screen Fourier transforms yet?
 

· Registered · 3,314 Posts

Quote:
Originally Posted by dmunsil /forum/post/16231779




Well, we aren't handling it at all - it's assigned to Microsoft. I hesitate to give you legal advice. I would point out that, say, Xvid is a minefield of patent infringement, yet no one has to my knowledge been sued over it. That's not the same as no one being able to sue over it. If there's no real money involved, it's still infringement, but without the ability to get a big fat settlement patent suits are not very attractive.




Don

I don't think the actual concept of linearising imagery prior to filtering operations is particularly patent-worthy in itself. The concept has been well understood and applied by lots of digital imaging practitioners for as long as I can remember.


If I catch any of my guys resizing images in anything other than linear, I tend to get a little bit loud.


Not to take anything away from anyone's accomplishments.


Also does this mean I can sue Microsoft?


http://archive2.avsforum.com/avs-vb/...2&page=3&pp=30



Quote:
Just to keep some people happy.


Does ffdshow linearise the video material prior to scaling, i.e. does it attempt to flatten the video intensity curve into a totally linear response (a colour correction or transfer function, if you will)?


I'm not sure how much of an impact the nonlinearity of a video curve has on pixel interpolation, but it's certainly commonplace to linearise 10-bit log film scans prior to filtering operations to get more "accurate" pixel values in the interpolation.
 

· Registered · 5,585 Posts
Discussion Starter #50

Quote:
If I catch any of my guys resizing images in anything other than linear, I tend to get a little bit loud.

That is always good to hear.

Quote:
Some of the above has turned into an excellent summary of the current state of the art in video processing.

The current state of the art is always at least one step ahead.



It is nice to see stuff done back in 2003 finally being looked at again.
 

· Registered · 9,884 Posts

Quote:
Originally Posted by sspears /forum/post/16238283


That is always good to hear.




The current state of the art is always at least one step ahead.



It is nice to see stuff done back in 2003 finally being looked at again.

Kewl,


Can you tell us about any newer developments?


- Tom (who hasn't written a new video filter for 2-3 years now)
 

· Registered · 1,005 Posts
Sorry for the delay replying; it's been busy around here. I'm not ignoring the thread. Hopefully I can post a detailed response tomorrow after I get my taxes done.
 

· Banned · 20,735 Posts
Stacey & Don:


I have a question about this whole thing of processing in linear light, which I was thinking about the other day and don't quite grasp. I was discussing it with Darin, and I think both of us are missing why it's advantageous.


I have sort of taken for granted in the past that it's better to process on RGB values that are linear to light rather than gamma-corrected R'G'B', but now that I think of it in simple terms, I fail to see why this gives any advantages, and in my simplistic thinking it seems counterproductive. Here's why:


So, given your previous example of a black sample next to a white sample: if we interpolate a sample between those two samples, at what level should it be?


I understand that if we do this operation linearly on gamma-corrected values, we end up with a 50% lightness value, which equates to ~18% luminance (true Y). This is what you describe as the values being pulled towards black in non-linear space. But this is still at 50% lightness, and appears subjectively halfway between white and black visually. I fail to see why this would not be what is desired.


On the other hand, if we de-gamma and move to linear RGB, then interpolate a sample between black and white, our new sample is at 50% luminance. This translates (using a .4 power between luminance and lightness) to about 76% lightness. So in this example, instead of seeing a black element, then a half-gray element, then a white element, we see a black element, then an element a good deal brighter than half-white (in perceptual lightness), then a white element.


Wouldn't we prefer, for perceptual reasons, the first example, where we have interpolated a value that yields 50% lightness (half gray), over 76% lightness, which is much brighter than that?


What part of this am I missing here? It would seem to me that it would be advantageous to process on nonlinear values to maintain perceptual uniformity.
 

· Registered · 1,005 Posts
What you're getting stuck on is the issue of perceptual versus physical. It's true that the value that is numerically 50% between two values in gamma-corrected space is fairly close to perceptually 50%, because gamma-corrected space is closer to perceptually linear than a radiometrically linear (physical) space.


However, the concept of "perceptual linearity" isn't applicable here, because when we scale an image we are effectively modeling what happens when we take a bunch of photons in one set of buckets (pixels) and redistribute them into different buckets. Scaling an image by 50% in both dimensions should produce similar results to taking a picture of the same scene with a digital camera with 50% fewer pixels in both dimensions. It's pretty obvious that this is a physical process, not a perceptual one.


Another way of looking at it is this: suppose you have a checkerboard of alternating white and black pixels on a screen. If you walk away from it until you can't see individual pixels but rather a gray wash, what color should that wash be? Again, this is a physical process. The amount of light coming off the screen doesn't change, it's just being captured by different buckets (in this case, cones in your eye).


The answer, in both cases, is that the interpolation needs to be done on the physical quantities, not to the perceived lightness values. This matches real-world testing. An image that was scaled using linear values looks much closer to the original than one using gamma-corrected values.


In the end, it's worth considering what "perceptually linear" means. It means that presented with three separate gray levels that are equidistant in Lightness in a perceptually linear space, people will generally agree that the middle one is equally distant from the outer two. But it does not mean that a mix of the outer two will produce or should produce the middle value. And it does mean that if you see a black and white checkerboard pattern on a tiled surface, when you walk away from it until it turns gray, that gray level will not be perceptually midway between black and white! And this is perfectly fine, and doesn't cause cognitive dissonance. Or at least it shouldn't.



And I'm still trying to find time to respond to Madshi's last post; it appears that getting involved with a highly technical discussion online while you're refinancing your house, paying your taxes, and trying to get a big milestone shipped at work is a sub-optimal strategy. And I want to post function graphs and stuff, but that means dusting off all my graphing software. But I love this discussion and I look forward to explaining further why long scaling filters are the work of the Devil.



Don
 

· Banned · 20,735 Posts
Don, thanks, that does make more sense to me now. It was just one of those things I was thinking about as I was falling asleep, and had kind of a "wtf, wait that doesn't really make sense" moment.
 

· Registered · 329 Posts
Don & Stacey,

I've been searching a little to find a good interpolation algorithm to fit some custom gamma curve points to use in the next version of a little software tool I'm writing (cr3dlut). I've decided to go for the Akima cubic spline, which seems to be the best for my needs.

This made me wonder... would it also be a good option for scaling an image? The Akima spline is great because it gives a very smooth curve, without any overshoot. Have you read about it and/or tried it while performing your resizing tests?


Here are some links:
http://www.alglib.net/interpolation/spline3.php
http://www.codecogs.com/d-ox/maths/i...tion/akima.php
http://www.fugroairborne.com/resourc...ension_III.pdf


If you want Akima's papers go here:
http://portal.acm.org/results.cfm?co...TOKEN=84253391


A source code example part of linux:
http://debian.cs.binghamton.edu/debi..._1.1-11.tar.gz
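

If anybody wants to play with it quickly, SciPy also has an implementation. A minimal sketch, upsampling one scanline with a hard edge:

Code:

import numpy as np
from scipy.interpolate import Akima1DInterpolator

x = np.arange(8.0)                                    # pixel positions
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # a hard edge
f = Akima1DInterpolator(x, y)
xs = np.linspace(0.0, 7.0, 29)                        # ~4x upsample
print(f(xs).round(3))   # stays within [0, 1]: no ringing at this edge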


Sorry if it's a dumb post, but I'm a newbie in resampling algorithms...
 

· Registered · 2,631 Posts

Quote:
Originally Posted by yesgrey3 /forum/post/16341803


Don & Stacey,

I've been searching a little to find a good interpolation algorithm to fit some custom gamma curve points to use in the next version of a little software tool I'm writing (cr3dlut). I've decided to go for the Akima cubic spline, which seems to be the best for my needs.

This made me wonder... would it also be a good option for scaling an image? The Akima spline is great because it gives a very smooth curve, without any overshoot. Have you read about it and/or tried it while performing your resizing tests?


Sorry if it's a dumb post, but I'm a newbie in resampling algorithms...

I've not heard of it by that name, but the lack of overshoot is a welcome property. Since codecs are all about encoding frequencies, any algorithm that adds extra, erroneous frequencies will make compression harder on top of any visual impact.


That's one of the reasons why aliasing from scaling and deinterlacing is so bad: first it makes the image look worse, and then the ugly errors suck bits away from the real image, hurting that as well.


Downsampling and upsampling are different beasts, of course. I like to arrange my affairs so that I'm never upsampling on either axis. For downsampling, I've grown quite fond of the Super Sampling implementation in Expression Encoder 2 SP1.
 

· Registered · 9,884 Posts

Quote:
Originally Posted by benwaggoner /forum/post/16343208


...


That's one of the reasons why aliasing from scaling and deinterlacing is so bad: first it makes the image look worse, and then the ugly errors suck bits away from the real image, hurting that as well.

...

Yep. Viewers may be willing to overlook a lot of aliasing to get more detail since we have all been brought up on interlaced TV anyway. But that makes deinterlacing for progressive encoding a whole different (more filtered) game than for viewing.


- Tom
 

· Registered · 8,136 Posts
Dear experts,


I've some trouble with chroma upsampling and color conversion and hope you can help out.


Basically the problem is that in some cases chroma upsampling, followed by YCbCr -> RGB conversion, can result in RGB values that are out of range (e.g. negative when using PC levels, or blacker-than-black / whiter-than-white when using TV levels).

Now my problem is this: How do displays handle BTB information? E.g. if a display receives (via HDMI) an RGB pixel value of (24, 12, 12), what does it do? Since most displays are physically RGB these days, the display probably shows R with the proper value 24, but shows G and B as 15 or 16, because that's how it's calibrated? If that's the case, the displayed brightness of the RGB pixel is clearly incorrect (much too high)! Wouldn't it in that case be better to post-process the RGB data in such a way that the brightness of each RGB pixel is displayed correctly (by "removing" BTB and WTW information in a clever way)?


This clearly puts me in a conflict: All my life I've read and heard that BTB and WTW information should be left intact. But considering what I wrote above, I'm actually thinking that it might result in a more correct display if we "remove" BTB and WTW information (while keeping brightness intact).
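
One possible "clever removal" (just an untested idea of mine): instead of letting the display clip each channel separately, desaturate the pixel toward its own luminance until all three channels are in range, which preserves the pixel's brightness at the cost of a little saturation:

Code:

def fit_in_range(r, g, b, lo=16.0, hi=235.0):
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b   # approximate BT.709 luma
    t = 1.0                                    # 1.0 = keep full saturation
    for c in (r, g, b):
        if c < lo and y > c:
            t = min(t, (y - lo) / (y - c))
        if c > hi and c > y:
            t = min(t, (hi - y) / (c - y))
    t = max(t, 0.0)
    # a real version would also have to handle y itself being out of range
    return tuple(y + t * (c - y) for c in (r, g, b))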


Your thoughts?
 