

Registered · 141 Posts
Instead of treating false positives and false negatives as binary, could we score them based on how bad they are? For instance, a false positive (mistakenly detecting a cut) is only bad if it leads to a large change in brightness or a lot of wobble (or both). Similarly, a false negative (failing to detect a cut) is only bad if there's a big difference in brightness between the pre-cut scene and the post-cut scene.
 

Registered · 1,209 Posts
Your Excel formulas and thresholds look alright to me. FWIW, I can modify the CLD New 2 "original" threshold so that it's very near to "prefer center". But why do you prefer false positive misses over real scene misses? Aren't both equally bad? Actually, I think image pumping (as bad as it is) is less evil than a sudden brightness jump in the middle of a scene, isn't it?
Of course, if there is a sudden jump in the middle of a calm scene, that would be the worst, but if I had to choose between a sudden brightness jump "blended" into a movement/flash scene and visible brightness adaptation at the start of a calm scene... I guess I would choose the unnoticeable jump. But hopefully metric1 is there to compensate for bad decisions made by metric2 (and vice versa), so both are avoided pretty well.

Anyway, to be honest I'm a bit disappointed in the results. I had hoped for a noticeable improvement, but I don't see it in the data, sadly.
Lots of testing for nothing this time, unfortunately :(

One thing to consider: The "max adjustment" algo fixes the problem which CLD New 2 "original" still has, namely coming out of black, e.g. in the La La Land scene you mentioned earlier. So I wonder if it's worth investigating in that direction more?

The "max adjustment" currently limits APL adjustments to a max factor of 4.0 (or 0.25). Without that, the APL adjustment factor is completely unlimited. It being unlimited causes the problems with coming out of black. We could try different max factors. E.g. if unlimited works better than 4.0/0.25, we could maybe try 10.0/0.1?
Fortunately, metric1 always has a very big value when coming out of a black frame, so even if metric2 is 0 the scene is still detected properly. So I don't know if limiting it is worth it in the end.

Is there any other advantage to it being unlimited? Any improvement for the Harry Potter scene I posted, for instance?
 

Registered · 1,479 Posts · Discussion Starter #5,523
Instead of treating false positives and false negatives as binary, could we score them based on how bad they are? For instance, a false positive (mistakenly detecting a cut) is only bad if it leads to a large change in brightness or a lot of wobble (or both). Similarly, a false negative (failing to detect a cut) is only bad if there's a big difference in brightness between the pre-cut scene and the post-cut scene.
Agreed.
Since the dynamic target nits rely on the FALL algo, we should look for FALL changes.

If there is a large one, it will most certainly be a scene change, and the associated target nits will change a lot.

If it's only a small change, we may miss a cut... but that's not very important, since the FALL did not change much and neither did the associated target nits.

Another option would be to look directly at the "ideal target nits" value. No big change, no need to cut.
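As a rough sketch of that last idea (the names and the 1.25 factor are just placeholders, not anything madVR actually uses):

Code:
// Sketch: only treat a (possibly missed) cut as important if the
// "ideal target nits" changes by more than some relative threshold, e.g. 25%.
bool cutMatters(double preCutTargetNits, double postCutTargetNits, double threshold = 1.25)
{
    if (preCutTargetNits <= 0.0)                   // coming out of black: always a big change
        return true;
    double ratio = postCutTargetNits / preCutTargetNits;
    return ratio > threshold || ratio < 1.0 / threshold;
}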
 

Registered · 1,209 Posts
Problem found:

Try to watch this scene with metric1 and metric2 enabled, and then only with metric1 enabled (metric2 threshold set to 0):

https://www.mediafire.com/file/qtb3v8qb0ghqqwk/brightness_jump.mkv/file

When the image goes white, you will notice a big brightness jump with metric1 and 2, but not when only metric1 is used (with "disable substraction of previous frame" ON), because metric1 adapts immediately before with several false detections, avoiding the big jump later.

With metrics 1&2: [screenshot]

With metric1 only: [screenshot]
I guess you are right, brightness jumps like this are the worst.
 

Registered · 7,954 Posts
Instead of treating false positives and false negatives as binary, could we score them based on how bad they are? For instance, a false positive (mistakenly detecting a cut) is only bad if it leads to a large change in brightness or a lot of wobble (or both). Similarly, a false negative (failing to detect a cut) is only bad if there's a big difference in brightness between the pre-cut scene and the post-cut scene.
Do you mean during testing now, to improve Metric2? Or do you mean in the "final" combined Metric1+2 algorithm?

Fortunately, metric1 always has a very big value when coming out of a black frame, so even if metric2 is 0 the scene is still detected properly. So I don't know if limiting it is worth it in the end.

Is there any other advantage to it being unlimited? Any improvement for the Harry Potter scene I posted, for instance?
I'm not sure, I haven't had a chance to check out that Harry Potter scene yet. Generally, it's nearly impossible to tweak scene detection for just one specific scene, though. If you report issues like Metric2 not working well in dark scenes, that's something I can work on, but if there's a problem that is specific to one particular movie scene, there might not be much I can do about it. Anyway, I'll have a look and see if I can spot anything in that HP scene.

Problem found:

Try to watch this scene with metric1 and metric2 enabled, and then only with metric1 enabled (metric2 threshold set to 0):

https://www.mediafire.com/file/qtb3v8qb0ghqqwk/brightness_jump.mkv/file

When the image goes white, you will notice a big brightness jump with metric1 and 2, but not when only metric1 is used (with "disable substraction of previous frame" ON), because metric1 adapts immediately before with several false detections, avoiding the big jump later.
I think this may eventually be solvable by combining Metric1 and 2 in a better way. For example, I'm not convinced a simple average makes sense. I'm considering creating a small neural network to analyze the Metric1 and Metric2 results and spit out a recommendation. But that's something to look at in more detail later.

-------

New thought: I'm wondering if it makes sense to optimize Metric2 purely on its own. Maybe that could in some situations be counterproductive? What we *really* want to achieve is the best Metric3 (combined Metric1+2) results! So probably we should modify the Excel sheet accordingly? E.g. we could change the formulas which calculate false positive and real scene "misses" so that they calculate Metric3 on the fly and compare against that. For that to work, we would need the Metric1 measurements, though.

Thoughts?
 

Registered · 141 Posts
Do you mean during testing now, to improve Metric2? Or do you mean in the "final" combined Metric1+2 algorithm?
I mean extending the spreadsheets with some additional information, so we can use it to score each miss (and weight them in the final tally). It might be a little tricky to set up though.

1) For scoring false negatives, you could store either the instantaneous brightness for the first post-cut frame, or the average brightness for the post-cut scene - then compare it to the brightness assigned to the first post-cut frame when the scene change is not detected.
2) For scoring false positives I think you'd want the same as above, but additionally some metric for the "wobble" as the algorithm stabilizes after the (erroneously detected) cut, to indicate how visually jarring the change is for the user.

Once you have that data, for the instantaneous change in brightness you could probably just add up the squared differences and get an RMSE measure for the quality of the algorithm. Actually gathering the data might be tricky... I guess ideally you'd automate it by creating a file containing the frame numbers of (real or fake) cuts, which overrides the cut detection, then have madVR write a file with the relevant information at those points. Then run it without that input file and output data about the detected cuts.
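Purely as an illustration of the scoring step (the struct fields and the wobble term are assumptions, not anything madVR outputs today), the RMSE could be computed like this:

Code:
#include <cmath>
#include <vector>

// One entry per (real or fake) cut from the ground-truth file.
struct CutSample
{
    double referenceBrightness; // brightness with the cut forced to the correct decision
    double measuredBrightness;  // brightness the algorithm actually produced at that frame
    double wobble;              // optional penalty for instability after a false positive, 0 otherwise
};

// RMSE over all misses; lower is better.
double scoreAlgorithm(const std::vector<CutSample>& samples)
{
    if (samples.empty())
        return 0.0;
    double sum = 0.0;
    for (const CutSample& s : samples)
    {
        double diff = s.measuredBrightness - s.referenceBrightness;
        sum += diff * diff + s.wobble * s.wobble;
    }
    return std::sqrt(sum / samples.size());
}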
 

Registered · 1,209 Posts
New thought: I'm wondering if it makes sense to optimize Metric2 only on its own. Maybe it could in some situation be counter productive? What we *really* want to achieve is best Metric3 (combined Metric1+2) results! So probably we should modify the Excel sheet accordingly? E.g. we could change the formulas which calculate false positive and real scene "misses" by calculating Metric3 on the fly and comparing to that. For that to work, we would need the Metric1 measurements, though.

Thoughts?
That makes sense, but when deciding on the trade-off between false positives and missed real cuts, I agree with this:

Instead of treating false positives and false negatives as binary, could we score them based on how bad they are? For instance, a false positive (mistakenly detecting a cut) is only bad if it leads to a large change in brightness or a lot of wobble (or both). Similarly, a false negative (failing to detect a cut) is only bad if there's a big difference in brightness between the pre-cut scene and the post-cut scene.
I think we should only list cases with visible artifacts, and not try to get more/fewer false positives/real cuts if they are not noticeable.

One thought:

If Metric2 is made more reliable (or only with "CLD New 2 - max adjustment 4" for now), maybe we could use a better metric than the average for scene detection.

This way, you won't necessarily need to "look ahead" since the value of Metric2 is quite low on muzzle flashes compared to real cuts.

One idea that comes to mind from this: why not use Metric2 only when there is a big FALL change, for instance (and use Metric1 only when the FALL change is small)?

Edit: Looking at scenes with flashes from BvS, Lucy and Passengers, it could work like this:

Code:
if (currentFALL > previousFALL * 1.5 || currentFALL < previousFALL / 1.5)
{
    //useMetric2
}
else
{
    //useMetric1
}


Metric2 would have a threshold of ~75 to be safe.
 

Registered · 7,954 Posts
Problem found:

Try to watch this scene with metric1 and metric2 enabled, and then only with metric1 enabled (metric2 threshold set to 0):

https://www.mediafire.com/file/qtb3v8qb0ghqqwk/brightness_jump.mkv/file

When the image goes white, you will notice a big brightness jump with metric1 and 2, but not when only metric1 is used (with "disable substraction of previous frame" ON), because metric1 adapts immediately before with several false detections, avoiding the big jump later.
I guess this happens due to "adjust to brightness changes" being turned off in Metric1 now?

I mean extending the spreadsheets with some additional information, so we can use it to score each miss (and weight them in the final tally). It might be a little tricky to set up though.

1) For scoring false negatives, you could store either the instantaneous brightness for the first post-cut frame, or the average brightness for the post-cut scene - then compare it to the brightness assigned to the first post-cut frame when the scene change is not detected.
2) For scoring false positives I think you'd want the same as above, but additionally some metric for the "wobble" as the algorithm stabilizes after the (erroneously detected) cut, to indicate how visually jarring the change is for the user.

Once you have that data, for the instantaneous change in brightness you could probably just add up the squared differences and get a RMSE measure for the quality of the algorithm. Actually gathering the data might be tricky.. I guess ideally you'd automate it by creating a file containing the frame counts of (real or fake) cuts which override the cut detection, then have madVR write a file with the relevant information at those points. Then run it without that input file and output data about the detected cuts.
This seems very time consuming. If I automate it, it would be very time consuming for me. If we let the testers sort this sort of stuff out, it would be very time consuming for them. Furthermore, for this to work at all, we would probably need a lot more scenes than we have now. I don't think this is feasible at this point in time. I might try an automated approach some time in the future, but probably not soon.

I think we should only list cases of visible artifacts and not to try to get more/less false positives/real cuts if they are not noticeable.
But how would you do that? The only thing that comes to mind would be to remove real scenes and false positives from the Excel sheet which don't produce visible artifacts when mis-detected. But that might reduce the number of scenes quite a lot, so you would have to search for many new scenes to replace them.

If Metric2 is made more reliable (or only with "CLD New 2 - max adjustment 4" for now), maybe we could use a better metric than the average for scenes detection.

This way, you won't necessarily need to "look ahead" since the value of Metric2 is quite low on muzzle flashes compared to real cuts.

One idea that comes to mind from this: why not only use Metric2 when there is a big FALL change for instance (and only use Metric1 when FALL change is small)?

Edit: Looking at scenes with flashes from BvS, Lucy and Passengers, it could work like this:

Code:
if (currentFALL > previousFALL * 1.5 || currentFALL < previousFALL / 1.5)
{
    //useMetric2
}
else
{
    //useMetric1
}
Metric2 would have a threshold of ~75 to be safe.
I'm not sure, something like this could work, I guess. Would it be possible to add the Metric1 numbers to the Excel sheet and then add some scenes where Metric1 would be better than Metric2 (with low FALL change)? Then we could actually play with the Metric1 vs Metric2 formula in Excel to see which gives the best results.

I'm also thinking we could do something like this:

We could define threshold ranges for Metric1 and Metric2 where each Metric is: a) Sure that there's no scene change. b) Unsure. c) Sure that there is a scene change. And then combine the information like this:

1) If either Metric1 or Metric2 is sure that there's no scene change, then there is no scene change, regardless of what the other metric says.
2) Else: If either Metric1 or Metric2 is sure that there's a scene change (and the other Metric is not sure) then there is a scene change.
3) Else: Both Metrics are unsure. In this case we could use an average of both Metrics, or maybe your idea to pick a Metric depending on FALL change.

This logic should help avoid false positives better, because a scene change would only be detected if both Metrics measure a noticeable amount of change.
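Just to make the three rules concrete, a minimal sketch (the threshold names are made up; the real ranges would have to come from the Excel data):

Code:
enum class Verdict { SureNoCut, Unsure, SureCut };

// Hypothetical thresholds per metric.
Verdict classify(double value, double sureNoCutBelow, double sureCutAbove)
{
    if (value < sureNoCutBelow) return Verdict::SureNoCut;
    if (value > sureCutAbove)   return Verdict::SureCut;
    return Verdict::Unsure;
}

bool sceneChange(Verdict m1, Verdict m2, bool fallbackDecision /* average or FALL-based pick */)
{
    // 1) if either metric is sure there's no cut, there is no cut
    if (m1 == Verdict::SureNoCut || m2 == Verdict::SureNoCut) return false;
    // 2) if either metric is sure there's a cut (the other is at most unsure), it's a cut
    if (m1 == Verdict::SureCut || m2 == Verdict::SureCut) return true;
    // 3) both unsure: fall back to the averaged / FALL-change based decision
    return fallbackDecision;
}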

Thoughts?
 

Registered · 1,209 Posts
I guess this happens due to "adjust to brightness changes" being turned off in Metric1 now?
I just checked and "adjust to brightness changes" doesn't help at all for this scene, but "smooth histogram" helps, a lot.

But how would you do that? The only thing coming to mind would be to remove real scenes and false positives from the Excel sheet which don't produce visible artifacts when mis-detected? But that might reduce the number of scenes quite a lot, so you would have to search for many new scenes to replace them.
Yes, the bad thing is that I would have to remove 99% of the scenes, I guess, but the good thing is that there are not many misses which produce visible artifacts. And IMO we should concentrate on finding them and finding a way to remove them.

2) Else: If either Metric1 or Metric2 is sure that there's a scene change (and the other Metric is not sure) then there is a scene change.
I was thinking of something easier:

1) If Metric1 and Metric2 are sure that there is a scene change, then there is a scene change.
2) If Metric1 and Metric2 are sure that there is not a scene change, then there is not a scene change.
3) If either Metric1 or Metric2 is sure that there's a scene change (and the other Metric is not sure), then we decide on a Metric depending on FALL change.

But your solution could be better.
 

Registered · 7,954 Posts
Yes, the bad thing is that I would have to remove 99% of scenes I guess, but the good thing is also that there are not much misses which produce visible artifacts. And IMO we should concentrate to find them and find a way to remove them.
Hmmmm... I wonder if it's worth the extra effort on your side.

FYI, I'm currently just trying to optimize Metric2. There's not much more I can do, maybe 1-2 more test builds, then we're probably done, I think. Afterwards, we'll have two separate metrics, which are reasonably optimized on their own, and then the question is how to combine them in the best possible way. This will be made more complicated by looking ahead.

My ultimate goal is to have madVR calculate the 2 metrics and then feed them (including the look-ahead stuff) into a neural network, which makes the decision whether there is a scene change or not. This should produce the best overall results. But it will require a lot of data to train the neural network properly. So it will take quite a bit of effort to pull this off (mostly on my side, though maybe I'll need your help collecting data).

So I think in the short run we'll have to think up a nice stop-gap solution to combine the 2 metrics somehow, including look-ahead information, to get a reasonably well-working scene detection logic. But it doesn't need to be perfect, because it will be replaced by a neural network at some point in the future.

It *does* probably help to optimize the 2 metrics as much as possible, though, because they will (probably) be what is later fed into the neural network.
 

Registered · 7,954 Posts
Here's the next test build:

http://madshi.net/madVRhdrMeasure70.zip

This comes with my last new "big" idea for improving Metric2 (looking at the (spatial) smoothness of the detected motion vectors). If this doesn't help much, we're probably at the end of improving Metric2. Otherwise, if it does help, maybe we can tweak the new idea a bit more. But we'll soon be done with optimizing Metric2.
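Purely as an illustration of what a spatial-smoothness score for a motion vector field could look like (I don't know how madVR actually computes it), one naive variant compares each block's motion vector against its right and bottom neighbours and averages the differences; small values mean coherent motion (likely the same scene), large values mean chaotic vectors (more likely a cut):

Code:
#include <cmath>
#include <vector>

struct MotionVector { float x, y; };

// Naive sketch, not madVR's algorithm: average difference between neighbouring
// motion vectors over a width x height grid of blocks.
double motionSmoothness(const std::vector<MotionVector>& mv, int width, int height)
{
    double sum = 0.0;
    int count = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            const MotionVector& c = mv[y * width + x];
            if (x + 1 < width)
            {
                const MotionVector& r = mv[y * width + x + 1];
                sum += std::hypot(c.x - r.x, c.y - r.y); count++;
            }
            if (y + 1 < height)
            {
                const MotionVector& b = mv[(y + 1) * width + x];
                sum += std::hypot(c.x - b.x, c.y - b.y); count++;
            }
        }
    return count ? sum / count : 0.0;
}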
 

Registered · 1,209 Posts
Here's the next test build:

http://madshi.net/madVRhdrMeasure70.zip

This comes with my last new "big" idea for improving Metric2 (looking at the (spatial) smoothness of the detected motion vectors). If this doesn't help much, we're probably at the end of improving Metric2. Otherwise, if it does help, maybe we can tweak the new idea a bit more. But we'll soon be done with optimizing Metric2.
Thanks!

The variants with motion smoothness are not working with cuts from black (value close to 0), is that expected?

Edit: CLD max 4 looks promising :)
 

Registered · 7,954 Posts
Thanks!

The variants with motion smoothness are not working with cuts from black (value close to 0), is that expected?

Edit: CLD max 4 looks promising :)
Haven't tested the motion smoothness variants with cuts from black. Not sure why it's not working for that, will have to double check. That might be fixable, though. Do the values look useful otherwise?

CLD max 4 is a combination of "CLD New 2 - max adjustment 4" and "CLD New 2 - prefer center".
 

Registered · 1,209 Posts
Haven't tested the motion smoothness variants with cuts from black. Not sure why it's not working for that, will have to double check. That might be fixable, though. Do the values look useful otherwise?
This is the only issue I found with it so far, but it fails in both directions: real scenes with big FALL changes are not properly detected:

From high to low FALL: It - Frames 149206-149207 [screenshot]

From low to high FALL: La La Land - Frames 38156-38157 [screenshot]

But if we use Metric1 only for this, then it is not a problem.
 

Registered · 7,954 Posts
I think I know why this happens. Do you think this shortcoming will affect the outcome of the measurements? I mean, are there any real scenes or false positives in your Excel sheet that will suffer from the same problem? I think I might be able to fix this problem, but it might make other things worse. So if we can work around this shortcoming by using Metric1, it might actually not be a bad idea to keep it as it is. But of course it's an important question whether your Excel sheet likes the new variants at all.
 

Registered · 1,209 Posts
I think I know why this happens. Do you think this shortcoming will affect the outcome of the measurements? I mean are there any real scenes or false positives in your Excel sheet that will suffer from the same problem? I think I might be able to fix this problem, but it might make other things worse. So if we can workaround this shortcoming by using Metric1, it might actually not be a bad idea to keep it as it is. But of course it's an important question whether your Excel sheet likes the new variants at all.
Without even finishing the sheet, "CLD max 4 + motion smoothness" is already winning by a very large margin over all the CLDs we tested before, with a value of ~8 for the threshold.

A few scenes are affected by this issue, but it would be bad not to try to fix it IMO, because "CLD max 4" reacts properly to real cuts with big FALL changes and avoids false positives from flashes at the same time.

"CLD max 4 + motion smoothness" is equally good to avoid false positives from flashes, but has an issue to detect real scenes with big FALL changes.

Therefore, using Metric2 for big FALL changes won't work with "CLD max 4 + motion smoothness" because of this (it can't differentiate between cuts from black and flashes) :(
 

Registered · 7,954 Posts
Without even finishing the sheet, "CLD max 4 + motion smoothness" is already winning by a very large margin over all the CLDs we tested before, with a value of ~8 for the threshold.
YES!!! :D

I was hoping for that, my first very limited tests looked promising, but those have been wrong before.

Anyway, yes, of course we can try some tweaks to fix the shortcomings of the new algo, and maybe even improve it a bit further. Not today, though. Maybe tomorrow. Would still be very useful to complete the sheet, so we have a baseline to compare any tweaks to. Thanks! :)
 

Registered · 8,454 Posts
YES!!! :D

I was hoping for that, my first very limited tests looked promising, but those have been wrong before.

Anyway, yes, of course we can try some tweaks to fix the shortcomings of the new algo, and maybe even improve it a bit further. Not today, though. Maybe tomorrow. Would still be very useful to complete the sheet, so we have a baseline to compare any tweaks to. Thanks! :)
I've sat this round out because it looks kinda complex. I can't wait to try the beta build which has whatever it is you're working on fully locked down :)

Good job everyone!
 