You folks like difficult math problems, right? Hold onto your propellered beanies then, cuz here comes some...
Originally Posted by dovercat
Perceptual coding = gamma used to define the size of grey scale steps. So perceptually the steps are a similar size across the grey scale.
Basically, though I'd probably turn your statement around a bit and say that nonlinear encoding/decoding (aka gamma compression/expansion) is a form of perceptual coding.
If you want to split hairs, dithering could also be considered a form or aspect of "perceptual coding", in that it takes advantage of our eyes' tendency to fuse small patches of color together into new virtual shades*, allowing more colors to be represented with fewer bits. It's also part of the overall perceptual coding scheme or strategy in consumer video, which more broadly includes such things as nonlinear encoding (aka gamma correction/compression), nonlinear decoding (aka "display gamma"), dithering, MPEG compression, chroma subsampling, and "picture rendering" (aka compensation for the surround effect).
(*Seurat, Monet and other impressionists/pointillists also used this phenomenon to nice effect in their paintings.)
The last item, picture rendering, or "compensation for the surround effect", is actually part of the nonlinear encoding/decoding process. Poynton does a fairly good job of explaining the concept here. (There's a particularly nice graph at the top of page 12 that gives an indication how well different display gammas do in terms of perceptual uniformity in relation to L*.) There are a couple of points, though, that are somewhat glossed over or not fully explored there that I think are important, or at least pertinent to a better understanding of this subject. (This also has relevance to your questions re artistic intent and the robustness of consumer video in different viewing conditions, or I probably wouldn't bother going into it here.)
To put it in a nutshell, "picture rendering" is the slight bit of nonlinearity that's left in the final image on the screen after all the other nonlinearities in the video pipeline are multiplied out. A simple example would be...
.50 camera encoding gamma * 2.4 display gamma = 1.2 final screen gamma.
The 1.2 screen gamma here is used to compensate for "the surround effect", which is basically a change in the perceived lightness of an image when it's viewed in a dim or dark surround.
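A minimal Python sketch of that pipeline (the function names `encode` and `display` are mine, not from any standard, and real transfer functions like Rec. 709's also include a linear segment near black, which is omitted here):

```python
def encode(luminance, gamma=0.5):
    """Camera/encoding transfer: compress scene luminance (0-1) with a power law."""
    return luminance ** gamma

def display(signal, gamma=2.4):
    """Display transfer: expand the encoded signal with the display's power law."""
    return signal ** gamma

scene = 0.18  # scene-referred luminance of an 18% gray card
screen = display(encode(scene))

# The net exponent is the product of the stage exponents: 0.5 * 2.4 = 1.2,
# so the on-screen value equals scene ** 1.2 (slightly "darker" than linear).
assert abs(screen - scene ** 1.2) < 1e-12
print(round(screen, 4))  # ~0.1277
```

The point being that the stage exponents simply multiply, so any leftover nonlinearity on screen comes straight from the product of the encode and display gammas.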
Picture rendering is one of the other tools in the video calibration toolkit that can be used to help preserve the "fidelity" or "integrity" of video content in different viewing conditions. It allows an image to be adapted to different surround lighting conditions, while still maintaining reasonably good perceptual similarity to the original mastered image. The farther away you get from the conditions in the mastering environment however, the less well the original "artistic intent" will probably hold up. In a knowledgeable calibrator's hands though it can work pretty well.
All of that you may already know. But here's what you may not know (and what Poynton doesn't really delve into too deeply in his proposal). We'll title this section...
How Our Current Home Video Paradigm Is Somewhat Perceptually Flawed or...
How Perceptual Uniformity in Video Could Be Improved... But Probably Won't Be, Cuz It Just Ain't Practical at the Mo'
As alluded to above, the "surround effect" is essentially a change in the magnitude or nonlinearity of lightness perception.
In the PDF above, Poynton estimates the average (or "best fit") exponent for L* (lightness) as ~.42. L* represents perceived lightness in an average surround. It corresponds approximately with Munsell's value scale, where a perceptually middle gray has 18% reflectance. Note the similarity in exponent between middle gray on the Munsell scale and Poynton's .42 estimate.
.18 reflectance ^ .4042 = .50 (or 1/2) "lightness"
Most of the experts (including Stevens) agree that lightness perception in an average surround falls between a cubed root and a squared root (ie exponents of 1/3 and 1/2). So for simplicity's sake, what we'll do here is use the geometric mean of those two roots (which falls nicely in the middle of Munsell, Stevens and Poynton's estimates)...
(1/3 * 1/2) ^ 1/2 = .4082
This value is almost exactly 1/2.45. (Technically it's 1/2.4494897..., ie the reciprocal of the square root of six.)
IMO, we can't really come to a more accurate estimate of lightness in an average surround than this, without more extensive research and investigation. This value is also very close to the implicit exponent of L* 50, which is .4097 (or ~1/2.44).
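These figures are easy to check with a few lines of Python (nothing assumed here beyond the standard CIE inverse formula for L*):

```python
import math

# Geometric mean of the cubed-root and squared-root exponents
gm = math.sqrt((1 / 3) * (1 / 2))
print(round(gm, 4))         # 0.4082
print(round(1 / gm, 4))     # 2.4495, ie the square root of six
assert abs(1 / gm - math.sqrt(6)) < 1e-12

# Implicit exponent at L* 50: find the relative luminance Y where L* = 50
# (CIE inverse formula, valid for L* > 8), then solve Y ** x = 0.5 for x.
Y = ((50 + 16) / 116) ** 3
x = math.log(0.5) / math.log(Y)
print(round(Y, 4))          # 0.1842
print(round(x, 4))          # 0.4097
```

Note also that the Y value at L* 50 (~.184) lands right next to Munsell's 18% middle gray.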
Assuming these values are more or less in the ballpark, then in a system that uses simple power-law functions, perceptual uniformity should be best achieved in an average surround by encoding with an ~.4082 exponent and viewing on a display with roughly the inverse of that value, or ~2.45 gamma.
.4082 encoding gamma * 2.45 display gamma = 1.0 screen gamma
In the dim to dark surround typical of home viewing though the average exponent of perceived lightness drops somewhat closer to a cubed root. (See Stevens, Bartleson & Breneman, etc.) IOW, our perception of lightness gets less linear the darker the surround is, and the brighter the image is by comparison. You can see this influence of the surround in L* as well.
("% Lightness" in Average Surround)
||50 (Middle Gray in Average Surround)
The brighter the L* shade is in relation to an average surround, the lower (ie less linear) the equivalent exponent becomes.
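To see that trend numerically, here's a small Python sketch that computes the "equivalent exponent" at several points on the L* scale (the helper function is my own, using the standard CIE inverse formula for L* > 8):

```python
import math

def implicit_exponent(L):
    """Exponent x such that Y ** x = L/100, where Y is the relative
    luminance of CIELAB lightness L (CIE inverse formula, L > 8)."""
    Y = ((L + 16) / 116) ** 3
    return math.log(L / 100) / math.log(Y)

for L in (25, 50, 75, 90):
    print(L, round(implicit_exponent(L), 4))

# The exponent falls as the shade gets brighter relative to the surround:
# roughly 0.444 at L* 25, 0.410 at L* 50, 0.395 at L* 75, 0.390 at L* 90.
```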
To achieve better perceptual uniformity in a dim to dark surround, ideally you'd want your image/video to have an encoded gamma that's somewhat closer to a cubed root (ie 1/3 or .3333), and your display gamma closer to 3.0. There are a couple problems with this arrangement though.
No "picture rendering". The steps between codes may be more perceptually uniform in a darker surround with this setup, but the image still appears too bright because the final screen gamma is only 1.0.
1/3 encoding gamma * 3.0 display gamma = 1.0 screen gamma
That's easily rectified though by simply using the encoding exponent for an average surround of ~.4082 in place of a cubed root.
.4082 encoding gamma * 3.0 display gamma = 1.2247 screen gamma
(Bartleson & Breneman would probably like that ~1.225 screen figure because it's the geometric mean between their suggested values of 1.0 to 1.5 for image adjustment in "bright" and "dark" surrounds. )
In this setup with ~.41 encoding and ~3.0 display gamma, both perceptual uniformity and picture rendering might be closer to the optimum range for a darker surround, based on what's generally understood on these subjects.
The other problem is that most displays currently in use have a much lower gamma than 3.0, and are more in the ballpark of the 2.45 gamma suggested above for an average surround. With ~.41 encoding, that basically takes us back to square one, with an ~1.0 final screen gamma, which means no picture rendering.
Enter Rec. 709 with its ~.50 encoding to save the day!!
.50 encoding gamma * 2.45 display gamma = 1.225 screen gamma
In this setup, picture rendering is accomplished with only a slight hit to perceptual uniformity on most current displays.
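Putting the three setups side by side in a quick Python sketch (the labels are mine, just summarizing the scenarios above):

```python
# (encoding gamma, display gamma) for each scenario discussed above
setups = {
    "average surround (ideal)": (0.4082, 2.45),
    "dark surround (ideal)":    (0.4082, 3.0),
    "Rec. 709 in practice":     (0.50,   2.45),
}

for name, (enc, disp) in setups.items():
    print(f"{name}: screen gamma {enc * disp:.4f}")

# average surround (ideal): 1.0001  -> no picture rendering needed
# dark surround (ideal):    1.2246  -> rendering for a darker surround
# Rec. 709 in practice:     1.2250  -> nearly the same rendering, with only
#                                      a slight hit to perceptual uniformity
```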
The moral to this story is that our current system of ~.50 encoding and ~2.20 to 2.50 display gamma may not necessarily be ideal, but it's probably about the best that can be hoped for, under the current circumstances, in terms of both perceptual uniformity and picture rendering. And it probably does a pretty decent job in most situations.
If you haven't already guessed what my recommendation for reference display gamma would be in this current ~.50 Rec. 709 encoding paradigm, it's the square root of six or 2.45, for the reasons outlined above (which is fairly close to Poynton's suggested ~2.4).
The advantage to using something closer to 2.45 is that it makes the math a little easier (from an engineering standpoint anyway), and it offers some clearer points of reference (e.g. Stevens' cubed & square roots, Bartleson & Breneman's 1.0 to 1.50 ratio, and the exponent at L* 50) than the "best fit" approach suggested by some other experts. And it's more consistent with Munsell's classic 18% gray. On a CRT with 2.45 gamma, for example...
.50 source voltage ^ 2.45 ∝ .1831 intensity
Or more generally...
.50 stimulus ^ 2.45 = .1831 relative luminance
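Both directions of that relationship check out numerically (a quick sketch, using √6 as the exact form of the ~2.45 gamma discussed above):

```python
import math

g = math.sqrt(6)                  # ~2.4495 display gamma

# Half signal through a 2.45-ish display gamma lands near 18% luminance...
print(round(0.50 ** g, 4))        # 0.1831

# ...and, going the other way, an 18% gray scene encodes to roughly half signal.
print(round(0.18 ** (1 / g), 4))  # 0.4966
```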
Research is ongoing in all of these areas though, so it's quite possible that the models and numbers above might need some tweaking to better fit results "in the field" as they say.
Edited by ADU - 9/22/13 at 3:34pm
While working this up, I also ran across this white paper. It relates more to desktop imaging than video, but I thought some here might find it an interesting read. (The author btw repeatedly makes the "error" of referring to L* as "luminance". Since L* is a perceptual scale, it more correctly refers to "lightness". Given that he's talking about encoding with L* though, his use of the term "luminance" is most likely shorthand for the "nonlinearly encoded representation of luminance".)