*UPDATED* comparisons can be found here.
I recently had an opportunity to compare the performance of the currently available 3DLUT-generating software packages for the eeColor box. I believe this is the first systematic and rigorous look at their relative results in a typical home-theater set-up. I will describe the initial conditions first, then the stability checks and assessment of measurement precision, and then discuss the results.
Display initial conditions: Samsung D8000
- Native gamut selected, larger than target gamut.
- Internal CMS off
- 2pt. grayscale set for D65 calibration at 20% and 100% video levels
- 100% video level = 136 cd/m^2
- 109% video level = 165 cd/m^2
- Black level: 0.028 cd/m^2
- No visual clipping
- Brightness set via normal methods
ArgyllCMS [ver. 1.6.3]
Cube parameters
20 single axis, 40 grayscale, 4 white, 8 black, 2393 Optimized Farthest Point Sampling (OFPS)
Dark region emphasis (1.3), Neutral axis emphasis (0.75), preconditioned with previously measured native gamut.
Command lines:
Code:
targen -v -d3 -G -e4 -B8 -g40 -s20 -f2500 -V1.3 -N0.75 -c native.icm argyll_patch_set
dispread -v -d2 -X d3oem_jeti.ccmx -yr -P0.5,0.5,2.2,1.3 argyll_patch_set
colprof -v -qh -bl -ax argyll_patch_set
collink -v -qh -3e -et -Et -G -IB -ia -w rec709.icm argyll_patch_set.icm 3DLUT_1.icm
Probe parameters
0.4 second integration, auto measure delay, sync, adaptive integration time
JETI-1211 profiled Display Pro-OEM
LightSpace CMS [ver. 6.5.0.1820]
Cube parameters
17x17x17
Probe parameters
0.5 second integration, LLH enabled
JETI-1211 profiled Display Pro-OEM
CalMAN 5 [ver. 5.2.3.1416]
Cube parameters
17x9 (optimized), 109% included
Note: Limit to 100% is broken [subsequently fixed in later versions]
Probe parameters
0.5 second integration, LLM 2 seconds, 5 cd/m^2 trigger, sync
JETI-1211 profiled Display Pro-OEM
Pattern geometry used: 11% windows on a black background.
Pattern generator: PC (Intel HD3000/HDMI out - RGB Full, driver ver. 9.17.10.3040).
The most critical thing in a comparison like this is to know how reproducible the measurements are, so that the significance of the measured color difference between one set of results and another separated in time can be assessed; this defines our precision in discriminating between performance results. The test was designed so that, ideally, nothing changes except the software used to calculate the 3DLUT transforms from its respective test-patch characterization. In this scenario the stability of the display and probe are the limiting factors. To ensure the best precision, all the profiles were measured in one session over an eight-hour period without disturbing the hardware set-up. Additionally, a baseline monitor measurement was performed after each profile and at the end of the verification steps. The offset between the JETI-1211 and Display Pro was measured once and incorporated into each software package.
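The probe-offset step is essentially what ArgyllCMS captures in a .ccmx file: a 3x3 matrix mapping the colorimeter's XYZ readings onto the reference spectroradiometer's. Below is only a minimal sketch of the idea; real tools fit many patches by least squares, whereas this uses an exact three-patch fit, and all readings shown are made up for illustration.

```python
def mat_inv3(m):
    """Invert a 3x3 matrix (list of rows) via the adjugate."""
    a, b, c = m[0]; d, e, f = m[1]; g, h, i = m[2]
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return [[x / det for x in row] for row in adj]

def mat_mul3(p, q):
    """3x3 matrix product."""
    return [[sum(p[r][k] * q[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def correction_matrix(colorimeter_xyz, reference_xyz):
    """Solve M such that M @ colorimeter = reference.
    Columns of each argument are XYZ readings of the same 3 patches."""
    return mat_mul3(reference_xyz, mat_inv3(colorimeter_xyz))

# Made-up XYZ readings (columns = three patches: red-ish, green-ish, blue-ish)
colorimeter = [[40.0, 30.0, 15.0],
               [20.0, 60.0,  8.0],
               [ 2.0, 10.0, 70.0]]
jeti        = [[41.1, 30.4, 15.2],
               [20.1, 59.0,  8.3],
               [ 2.1, 10.1, 73.4]]
M = correction_matrix(colorimeter, jeti)  # apply as M @ raw_XYZ thereafter
```

With three linearly independent readings the fit is exact, which is why measuring the offset once and reusing it across packages works as long as the hardware is undisturbed.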
Baseline test set (BTS)
This is a 1000 patch quasi-random perceptual space-filling volume chosen so that it does not overlap with the device characterization test sets used by any of the software packages. I will measure the display native response several times bracketing the profile measurements to test system stability. It will also be used when LUTs are loaded and tested for performance against each other. Measurement time for the BTS is ~15 minutes.
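As an aside on construction: a quasi-random space-filling set of this kind can be sketched with a low-discrepancy sequence. The following Halton-sequence version is only an illustration of the "quasi-random space-filling" idea, not the author's actual BTS, which additionally accounts for perceptual spacing and avoidance of the characterization sets.

```python
def halton(index, base):
    """Radical inverse of `index` in `base`, in [0, 1)."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def quasi_random_patches(n, levels=255):
    """n RGB triplets; co-prime bases 2/3/5 give low-discrepancy 3D coverage."""
    return [tuple(round(halton(i, b) * levels) for b in (2, 3, 5))
            for i in range(1, n + 1)]

patches = quasi_random_patches(1000)  # a 1000-patch space-filling set
```

Because consecutive Halton points land far from one another, even a prefix of the sequence covers the RGB volume fairly evenly, which is the property a verification set needs.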
Test sequence
1. BTS - Unity LUT, Native
2. ArgyllCMS profile
3. BTS - Unity LUT, Native
4. LightSpace profile
5. BTS - Unity LUT, Native
6. CalMAN profile
7. BTS - Unity LUT, Native
8. BTS - ArgyllCMS LUT
9. BTS - LightSpace LUT
10. BTS - CalMAN LUT
11. CalMAN SG - All LUTs
12. HCFR Grayscale, Saturations, Color Checker - All LUTs
13. BTS - Unity LUT, Native
Comparing the native-response measurements from steps 1, 3, 5, 7, and 13, I find:
dE2000 color difference
3 to 1 - avg. = 0.15, max = 0.71
5 to 3 - avg. = 0.12, max = 0.67
7 to 5 - avg. = 0.14, max = 0.83
13 to 7 - avg. = 0.21, max = 0.96
So in this hardware configuration there is a dE floor of ~0.16, due to some combination of probe precision, display repeatability, and bit noise.
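For reference, the repeat-run comparison above reduces to averaging per-patch dE2000 over two passes of the BTS. The block below is a self-contained implementation of the standard CIEDE2000 formula (with kL = kC = kH = 1), plus a small stability helper; the `stability` name is mine, not from any of the packages discussed.

```python
import math

def de2000(lab1, lab2):
    """CIEDE2000 color difference between two (L*, a*, b*) triplets."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    C1, C2 = math.hypot(a1, b1), math.hypot(a2, b2)
    Cbar = (C1 + C2) / 2
    G = 0.5 * (1 - math.sqrt(Cbar**7 / (Cbar**7 + 25**7)))
    a1p, a2p = (1 + G) * a1, (1 + G) * a2
    C1p, C2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360 if (a1p or b1) else 0.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360 if (a2p or b2) else 0.0
    dLp, dCp = L2 - L1, C2p - C1p
    if C1p * C2p == 0:
        dhp = 0.0
    else:
        dhp = h2p - h1p
        if dhp > 180: dhp -= 360
        elif dhp < -180: dhp += 360
    dHp = 2 * math.sqrt(C1p * C2p) * math.sin(math.radians(dhp) / 2)
    Lbp, Cbp = (L1 + L2) / 2, (C1p + C2p) / 2
    if C1p * C2p == 0:
        hbp = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        hbp = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        hbp = (h1p + h2p + 360) / 2
    else:
        hbp = (h1p + h2p - 360) / 2
    T = (1 - 0.17 * math.cos(math.radians(hbp - 30))
           + 0.24 * math.cos(math.radians(2 * hbp))
           + 0.32 * math.cos(math.radians(3 * hbp + 6))
           - 0.20 * math.cos(math.radians(4 * hbp - 63)))
    dtheta = 30 * math.exp(-(((hbp - 275) / 25) ** 2))
    RC = 2 * math.sqrt(Cbp**7 / (Cbp**7 + 25**7))
    SL = 1 + 0.015 * (Lbp - 50) ** 2 / math.sqrt(20 + (Lbp - 50) ** 2)
    SC = 1 + 0.045 * Cbp
    SH = 1 + 0.015 * Cbp * T
    RT = -math.sin(math.radians(2 * dtheta)) * RC
    return math.sqrt((dLp / SL) ** 2 + (dCp / SC) ** 2 + (dHp / SH) ** 2
                     + RT * (dCp / SC) * (dHp / SH))

def stability(pass_a, pass_b):
    """Average and max dE2000 between two passes over the same patch set."""
    des = [de2000(x, y) for x, y in zip(pass_a, pass_b)]
    return sum(des) / len(des), max(des)
```

Running `stability` over successive native BTS passes is what produces avg/max pairs like those in the table above, and the average over those runs is the ~0.16 dE floor.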
Steps 8-12 were used to assess performance.
Tabular Results:
Histograms:
The ArgyllCMS LUT is significantly more peaked than either CalMAN or LightSpace and is able to drive 93% of the test patches below 1 dE. This is consistent with previous measurements of ArgyllCMS performance. Note that the CalMAN distribution is skewed so that its peak sits about halfway between the ArgyllCMS and LS peaks. This is due to the way it iteratively optimizes dE, compared to the approaches of the other two packages. This yields about the same average performance as LightSpace but does better at pushing more points below 1 dE (72% vs. 67%), at the expense of a slightly higher maximum error (4.3 vs. 3.6; ColorChecker SG 3.8 vs. 2.9).
In all the verification measurements I ran, the Argyll-generated LUT outperformed both LightSpace and CalMAN in a statistically significant manner. I do not consider the differences between CalMAN and LightSpace statistically significant, except for grayscale calibration, where CalMAN did a bit better than LS (0.8 vs. 1.3), and the skin-tone tests, where LS did better than CalMAN in both the SG (0.8 vs. 1.4) and Pantone (1.0 vs. 1.8) sets. But are these differences perceptually significant? In an average sense, no: all three solutions produce an average dE at or below the level of human perception when viewing side-by-side patches in environmentally controlled test conditions. These differences will be even less noticeable in isolated moving images. I've pored over a lot of source material and cycled through these three LUTs using the eeColor box, and there is simply no way to tell them apart on a still image, let alone a moving one. I would be happy to use any of them long term.
You'll also notice that the internal CMS for this device is not too bad (>95% of errors are
Attachments:
verify.zip (175k .zip file)
3dlut_hcfr.zip (16k .zip file) - .chc files for HCFR verification
*Updated* - Additional algorithm comparisons and analysis
[URL="http://www.avsforum.com/t/1517849/a-comparison-of-3dlut-solutions-for-the-eecolor-box#post_24361430"]This post demonstrates that, given the same optimized patch set, LS comes very close to matching Argyll's performance.
A factor of at least 2 in patch-count reduction compared to standard cube sampling can be achieved, while maintaining color-correction performance, with both programs by utilizing Farthest Point Sampling (FPS) in conjunction with perceptual patch placement and display-specific preconditioning profiles.
A sparse-sampling test was performed to probe algorithm robustness. LS fails to generate a usable LUT under these conditions, indicative of a more simplistic gamut-mapping and interpolation approach than what Argyll currently uses.
Further investigation of the topology of the 3D LUTs created by the three solutions reveals that, given the same patch set, the LS algorithm produces noisier corrections than Argyll.
[/URL]
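The farthest-point idea referenced in the linked post is easy to sketch: starting from a seed patch, each new patch is the candidate farthest from everything already chosen, which spreads a limited patch budget evenly through the volume. Below is a plain Euclidean-RGB version of the core greedy step only; Argyll's OFPS additionally weights perceptually and uses the preconditioning profile, so treat this as an illustration rather than its actual algorithm.

```python
def farthest_point_sample(candidates, n_patches, seed=(0, 0, 0)):
    """Greedy farthest-point sampling over RGB triplets (Euclidean metric)."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    chosen = [seed]
    # Track each candidate's min squared distance to the chosen set.
    mind = [d2(c, seed) for c in candidates]
    while len(chosen) < n_patches:
        i = max(range(len(candidates)), key=mind.__getitem__)  # farthest candidate
        chosen.append(candidates[i])
        for j, c in enumerate(candidates):
            mind[j] = min(mind[j], d2(c, candidates[i]))
    return chosen

# e.g. thin a dense 8x8x8 cube grid (512 points) down to 100 well-spread patches
grid = [(r, g, b) for r in range(0, 256, 32)
                  for g in range(0, 256, 32)
                  for b in range(0, 256, 32)]
well_spread = farthest_point_sample(grid, 100)
```

This is the mechanism behind the "factor of at least 2" patch reduction noted above: a greedily spread subset characterizes the device volume nearly as well as the full cube sampling it was drawn from.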