View Full Version : Ffdshow FAQ



cyberbri
06-29-04, 01:52 PM
I've noticed a lot of you are using different resize settings than me. I have a Samsung DLP, and run resize at 1280x720 (desktop set to 1280x720 as well). If I turn resize off, it displays the DVD image in the very center of the screen, 720x480. If I double the DVD resolution, the picture goes way off the sides.

Is there a step I'm missing somewhere to have it up-convert to any resolution, and then display that at just the desktop's 1280x720 signal?


Thanks

.....
06-29-04, 02:03 PM
Weird, sounds like you're displaying the video image at 100%, not fullscreen.

cyberbri
06-29-04, 02:38 PM
I got it. Zoom Player has to be set to readjust the fullscreen resolution, and I set that to 1280x720 and am getting it to work now (now testing resize at 1920x1080). Thanks for speaking up. Now I know what's going on.

sknyfs
06-29-04, 02:52 PM
Why can't I adjust ffdshow to resize at 1.5x DVD resolution (1080 x 720)?

cyberbri
06-29-04, 02:53 PM
What do you mean, "can't"?

You change the setting, but the display/image doesn't change? (verified with on-screen display)

You change the setting, but it can't run at that setting?

e268
06-29-04, 03:31 PM
Originally posted by madpoet
I think that's the point e... there is no middle of the road system that's going to last a couple of years if you want to keep pushing the envelope. For a stable set of parameters, sure. But If you want to be running the same settings 2 years from now, then I marvel at your self control ;)

You are right. 2 years is asking for too much. I'll settle for "a stable set of parameters".

sknyfs
06-29-04, 03:34 PM
Originally posted by cyberbri
What do you mean, "can't"?

You change the setting, but the display/image doesn't change? (verified with on-screen display)

You change the setting, but it can't run at that setting?

When I input 1080, the display is red, and when I tried to run ZP the OSD said 720x480. ...Let me try it again

It allows 1040x720, but 1.5x DVD res is 1080.

cyberbri
06-29-04, 03:48 PM
Be sure to set the fullscreen display to your desktop resolution in your Zoom Player settings. What kind of display do you have it hooked up to? Are you going through VGA, or DVI? Your desktop display setting should match the TV, although the Zoom Player fullscreen output probably overrides that in case they're different.

Then turn on the OSD and make sure it shows both input and output resolutions. Input should say 720x480. Output should have the resolution you are trying to resize it to. Turn off everything but resize and OSD and try the 2x Resolution setting first.

I also had some weird display problems that I think were due to trying to use VMR9 inside Zoom Player (may be already on inside the video codec) or something, where two were overlapping or something.

You could try no resize or anything, and make sure Zoom Player is set to the proper resolution for fullscreen, then go from there, trying different codecs/overlay/vmr at first to make sure it's right, then try resize at different resolutions.

.....
06-29-04, 03:49 PM
sknyfs,

the value needs to be a multiple of 16, e.g. 1072 or 1088
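
As a rough illustration of the multiple-of-16 rule, here is a minimal Python sketch that finds the nearest accepted widths around a target value. The helper name `nearest_multiples` is made up for this example and is not part of ffdshow; the only assumption taken from the thread is that the resize field wants a multiple of 16.

```python
def nearest_multiples(value, step=16):
    """Return the multiples of `step` at or just below, and at or just above, `value`."""
    lower = (value // step) * step
    upper = lower if lower == value else lower + step
    return lower, upper

# 1.5x DVD width is 720 * 1.5 = 1080, which is not a multiple of 16.
print(nearest_multiples(1080))  # (1072, 1088)

# 1040 is already a multiple of 16, which matches it being accepted.
print(nearest_multiples(1040))  # (1040, 1040)
```

So for a 1.5x resize the closest widths to try would be 1072 or 1088.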

MixTracks
06-29-04, 04:31 PM
So what do you all think is better for a Sammy DLP for resize: 1280x720 or double DVD resolution?

Mastiff
06-29-04, 05:14 PM
No reason not to go as high as you possibly can before your computer starts dropping frames. The graphics card scales it down again anyway.

cyberbri
06-29-04, 05:26 PM
I was running resize at 1280x720, with Soften and denoise3d before resize, sometimes Sharpen at 10 or 20 after. But today thanks to ".....", I figured out how to resize larger. It seemed to look better the 5 minutes or so I tested Matrix Reloaded right then (had spent 2+ hours just fiddling with the ffdshow filters, adjustments on it last night, instead of actually watching it...). It just keeps getting better and better. :)

ed.howell
06-29-04, 06:27 PM
Hi guys,

You are probably going to think this is a really stupid question but it's killing me and I have been searching for the answer but cannot find it. Is VMR9 an option you can turn on in ffdshow? If so how do you enable VMR9 in ffdshow? If it is something else to use how do I get it and install it? I am using TheaterTek 1.5

I have a P4 3.4C Oced to 3.85 so I don't think power is a problem. I am using a P4C800E-dlx mobo and I need to get it modified to run over 4Ghz. It will boot to windows now over 4Ghz but the voltage droops so bad I cannot run anything on it. I also have a Geforce Fx5950 256mb. This is hooked up to a Samsung HLN617W.

Right now I am using denoise 3d .5 1.0 5.0, unsharp mask strength 20, and I am trying out spline resize at 1920x1728 with luma sharpen at .5.
Maybe you guys could give me some advice on these settings as this is a work in progress.

Thanks in advance for your help.

Ed

Grooby
06-29-04, 06:46 PM
Hello

I'm using Reclock with Ffdshow on PAL. My problem is that on VMR9, every time I go into another film Reclock is off by a fraction. I have to go into Reclock and click on the VMR slider button (not moving it) and then all is smooth again.

Any ideas what might be causing this?

Thanks
Andy

vpopovic
06-29-04, 06:49 PM
You have to enable VMR9 support in your DVD player, not FFDShow.

As far as sharpening in the 3D pipe goes, there is actually a specific driver setting (at least on Nvidia, where you can unlock it only through the NV Hard Page utility) that enables "texture sharpening". I use this option for DVDs that are vertically filtered (generally all but Superbit DVDs). AF provides for better VMR7/9 scaling, which ultimately gives you a crisper (sharper?) picture.

Anyway, I have posted a summary of this approach in this thread:

http://www.avsforum.com/avs-vb/showthread.php?s=&threadid=416282&pagenumber=3

ed.howell
06-29-04, 07:03 PM
Originally posted by vpopovic
You have to enable VMR9 support in your DVD player, not FFDShow.

As far as sharpening in the 3D pipe goes, there is actually a specific driver setting (at least on Nvidia, where you can unlock it only through the NV Hard Page utility) that enables "texture sharpening". I use this option for DVDs that are vertically filtered (generally all but Superbit DVDs). AF provides for better VMR7/9 scaling, which ultimately gives you a crisper (sharper?) picture.

Anyway, I have posted a summary of this approach in this thread:

http://www.avsforum.com/avs-vb/showthread.php?s=&threadid=416282&pagenumber=3

Vlad,

Please excuse my ignorance, but what are AF and AA, and are you saying to set them both at 8X and not application controlled (if I am looking at the right things)? As far as I can tell, and I think I have looked everywhere, TheaterTek does not have VMR9 or I don't know where to find it, probably the latter.

vpopovic
06-29-04, 10:24 PM
AF stands for Anisotropic filtering, one of the 3D settings in your video card drivers. AA stands for Anti-aliasing, another one of the 3D settings in your video card drivers. Depending on your video card, it might be tricky (or impossible due to your hardware limitations) to activate them at 1920x1080 input resolution. Applying AA and AF at a lower resolution might improve your image quality, but perhaps not so much that it justifies setting your system to a lower resize in FFDShow. If you can't make it work with 8x, perhaps it would work with 2x or 4x. These settings, like FFDShow settings, will depend on your setup. Comparing 3D settings to FFDShow, you can think of AF as "resize", AA like the choice between bicubic or lanczos, and texture sharpening, well, like sharpening. Except for AF, which objectively produces a better image due to more advanced scaling, AA and sharpening settings will depend on your personal preference and your setup. With Nvidia drivers, you really need the NV Hard Page utility to force some of the settings. Nvidia drivers do not provide control over some of the key settings.

If you leave it at "application controlled" setting, AA and AF will not be used. All this works only in VMR7/9 mode, which TT currently does not support (only overlay). New version of TT supports at least VMR9.

Patriots
06-29-04, 11:23 PM
With all due respect to Owen. I have

AMD 64 3200
512MB
ATI 9600 Card


I get great results with the F SHOW. None of my equip is overclocked and right now with the latest SSE2 version I am running Denoise at your settings, then resize to 1920 X 1440, with 50 first dates playing as smooth as can be. I use Bicubic resize and VMR9. CPU Utilization On the F SHOW OSD is around 67 to 80% fluctuates very fast, with what appears to be an occasional spike to 90 or 100% during the opening. But as far as the picture goes, no skipped frames. I am running

Zoomplayer V4.0WMV Final
SONIC Audio and Video Decoders.


I have found that Sonic is less CPU demanding VS. Elecard.

Anyway I can honestly say I wouldn't consider getting the P4 whatever over the 64 chip I have. Cheap or not. So you might be able to squeak a little more out of your system right now, but when the 64 bit days are here, I will be ready!! I am really interested in overclocking my CPU and upgrading to a stronger video card with more memory and speed; then I believe I could obtain the same or better results than any of the Intel chips out there.

cyberbri
06-29-04, 11:36 PM
I've been tweaking around more than I probably should with ffdshow, and have found that I need to tweak for a while, then save those settings for that specific movie, so I don't have to re-tweak before each movie.

At least I'll probably find a good setup for regular DVDs, and tone it down on a separate name if it's too sharp and grainy or something (like Spider-man superbit with normal DVD settings).

e268
06-29-04, 11:45 PM
Patriots: that is great to hear. Do you mind posting the rest of your equipment, i.e., mobo, psu, type of ram...and which AMD64 3200? Clawhammer or Newcastle? Thanks.

Patriots
06-30-04, 12:10 AM
I just flirted with F Show and mimicking Owen's 1920X1728. My CPU went up to 85%, but there was tearing. I believe that was more the fault of the video card. I have Kingston 512 PC3200 DDR and the VIA KT800 chipset. I keep my settings at 1776X1000, because that is what the ATI outputs to the TV, but I stay at only 67% and no tearing on that setting. I normally run Overlay instead of VMR9, as I find the picture more to my liking in that setting. I only tried Owen's settings after reading his comments about the performance of the non-FX 64 chips versus Intel. I could go as high as 1920X1440 mimicking his settings with no tearing or stuttering. I read posts of others who have said they have similar setups and can't get the same performance. I can't figure it out. When Andy started working on the SSE2 versions my performance just got better and better. Another issue is the decoders. If I try to run the AC3 Filter it crashes my system. For some reason on my system I get a strong reduction in CPU usage using Sonic decoders and I can't notice a PQ difference versus Elecard. I recommend anyone with an AMD 64 try the Sonic Cinemaster video and audio decoders.

e268
06-30-04, 12:15 AM
Thanks. I just edited my question which you may not have seen. Anyway, which AMD64 3200 do you have, the Clawhammer or the Newcastle? Thanks.

Patriots
06-30-04, 12:31 AM
The ClawHammer!

bedo
06-30-04, 06:29 AM
I am running a 2800+ Barton and it seems that since I reenabled AGP 8X I am unable to play at the settings in ffdshow that I have done in the past with no problem. What are the best settings to disable/enable re: my ATI 9800 Pro to help ffdshow? I have yet to go back and disable it again to retry.
Also, does it make sense that I can play to my crt monitor through vga w/out problem whereas the same settings will freeze the video after a few seconds when playing through DVI to my 2HD?

Thanks for suggestions.

Jeff

Owen
06-30-04, 07:36 AM
Originally posted by pcgeek
Stupid question but I don't remember it being asked in this context recently, but you do have zoom player fixing the VMR scaling bug, right? It defaults to off and you really need to turn it on if using VMR. At least until 9.0c comes out which fixes the bug. Otherwise the images actually get scaled down first and then up and that will make the image look soft as well. AFAIK, Zoom is the only player with a work around in place.

-Pat


Having Zoom Player set to “Fix VMR9 scaling bug” in the Filters menu is a given for anyone using VMR9.
It should be on by default IMHO.

Regards,

Owen

___________________________
The FFDShow resize-sharpen dude.

Owen
06-30-04, 09:31 AM
Originally posted by Patriots
With all due respect to Owen. I have

AMD 64 3200
512MB
ATI 9600 Card


I get great results with the F SHOW. None of my equip is overclocked and right now with the latest SSE2 version I am running Denoise at your settings, then resize to 1920 X 1440, with 50 first dates playing as smooth as can be. I use Bicubic resize and VMR9. CPU Utilization On the F SHOW OSD is around 67 to 80% fluctuates very fast, with what appears to be an occasional spike to 90 or 100% during the opening. But as far as the picture goes, no skipped frames. I am running

Zoomplayer V4.0WMV Final
SONIC Audio and Video Decoders.


I have found that Sonic is less CPU demanding VS. Elecard.

Anyway I can honestly say I wouldn't consider getting the P4 whatever over the 64 chip I have. Cheap or not. So you might be able to squeak a little more out of your system right now, but when the 64 bit days are here, I will be ready!! I am really interested in overclocking my CPU and upgrading to a stronger video card with more memory and speed; then I believe I could obtain the same or better results than any of the Intel chips out there.


Seeing as the P4 systems can perform at least as well as, if not better than, any Athlon 64 including the FX for video processing, and do so at a lower cost, I just can't see how spending more money on a slower system is a smart move.
I therefore cannot recommend the Athlons at this time.

Realistically, by the time 64bit Windows and the 64bit software to make use of it are available, the existing Athlon 64 systems will be obsolete IMHO.
By then we will be using 4-5Gig systems for cutting edge video processing with different mother boards, RAM and even cases.
New software will continue to be released that pushes our PC’s to the limit.
You just can’t have enough power at the moment and I don’t expect that situation to change.
P4’s offer more bang for your buck at the moment. :D

I am NOT an AMD basher, just a realist. ;)


Regards,

Owen

___________________________
The FFDShow resize-sharpen dude.

P.S.
You may believe that your Athlon 64 system can match a good P4 setup, but we will believe it when we see it.
To the best of my knowledge, no Athlon user has managed it, so what makes you so sure? :D
Vpopovic has managed the best Athlon performance so far recorded with his overclocked FX system. But I think even he will admit that his system is not faster than a cheaper overclocked P4 system.

Owen
06-30-04, 10:03 AM
Originally posted by Patriots
I just flirted with F Show and mimicking Owen's 1920X1728. My CPU went up to 85%, but there was tearing. I believe that was more the fault of the video card. I have Kingston 512 PC3200 DDR and the VIA KT800 chipset. I keep my settings at 1776X1000, because that is what the ATI outputs to the TV, but I stay at only 67% and no tearing on that setting. I normally run Overlay instead of VMR9, as I find the picture more to my liking in that setting. I only tried Owen's settings after reading his comments about the performance of the non-FX 64 chips versus Intel. I could go as high as 1920X1440 mimicking his settings with no tearing or stuttering. I read posts of others who have said they have similar setups and can't get the same performance. I can't figure it out. When Andy started working on the SSE2 versions my performance just got better and better. Another issue is the decoders. If I try to run the AC3 Filter it crashes my system. For some reason on my system I get a strong reduction in CPU usage using Sonic decoders and I can't notice a PQ difference versus Elecard. I recommend anyone with an AMD 64 try the Sonic Cinemaster video and audio decoders.

I also used to run 1776x1000 resize (my display res.) with overlay output and I liked it until I got 1920x1728 (for PAL) running smoothly with VMR9 and Elecard decoder thanks to Andy's optimizations.
I can’t go back now. :D

Regards,

Owen

Patriots
06-30-04, 10:28 AM
I can understand your thinking. I certainly look forward to the day of 4 and 5 GHz. I love my Athlon and wish the FX weren't so expensive!!! Right now though I am happy with running 1776X1000. My main reason for replying to your post was because

1) The whole "cheap 64" thing riled me a bit :)

2) I had been reading posts of people with similar equipment to mine not being able to get decent performance. I don't understand that! I strictly use


Denoise=Optimized Settings
Resize 1776X1000
Luma Sharp to 1.3
Overlay.


Anyway Owen, I must thank you for all the info you post. I found your posts helpful while getting my feet wet with the F Show.

PS: If there is someone in SC who can give me an overclocking walk-through tutorial, please let me know.

Energeezer
06-30-04, 03:30 PM
Finally got a PC with enough power to use FFDshow so I started looking at this thread. The problem is that the thread is 89 pages long and I fear I'll expire before reading it all.
Can someone give me the BASICs to get started with the following equip and software.
Maybe it is time to start a new up to date thread since OWEN makes note on the first page of this one that the info (at the beginning) is outdated.

P4 2.6Ghz 512 L2
512 Meg Ram
Radeon 8500 64MB card
ECS 848P board.
160 Gig HD

Running
XP Pro
Win DVD video and Power DVD audio under ZP

I have been using the WIN DVD 6 trial (and like it except DMN) but the trial is running out, so I will be back to the Win DVD 4 video decoder, or I can get Win DVD 5 if necessary.

Thanks
Steve

cyberbri
06-30-04, 03:33 PM
If you don't know how to turn on ffdshow inside Zoom Player, look back through the last few pages (I think I posted on it to someone else).

This should also help get you started:
http://htpcnews.com/main.php?id=ffdshowdvd_1

Spoonfed
06-30-04, 05:32 PM
I updated to the latest version of the "normal" FFDShow by Andy.

It is only 7 days newer than the one I had, yet now I can run 2 x PAL resize AND denoise 3D on my 2400+, not overclocked. Impressive enhancements.

The only issue is that it's too close to "the edge" to run this, as I often capture 2 DTV channels at once while playing back, hmmm.

ed.howell
06-30-04, 05:43 PM
Originally posted by vpopovic
AF stands for Anisotropic filtering, one of the 3D settings in your video card drivers. AA stands for Anti-aliasing, another one of the 3D settings in your video card drivers. Depending on your video card, it might be tricky (or impossible due to your hardware limitations) to activate them at 1920x1080 input resolution. Applying AA and AF at a lower resolution might improve your image quality, but perhaps not so much that it justifies setting your system to a lower resize in FFDShow. If you can't make it work with 8x, perhaps it would work with 2x or 4x. These settings, like FFDShow settings, will depend on your setup. Comparing 3D settings to FFDShow, you can think of AF as "resize", AA like the choice between bicubic or lanczos, and texture sharpening, well, like sharpening. Except for AF, which objectively produces a better image due to more advanced scaling, AA and sharpening settings will depend on your personal preference and your setup. With Nvidia drivers, you really need the NV Hard Page utility to force some of the settings. Nvidia drivers do not provide control over some of the key settings.

If you leave it at "application controlled" setting, AA and AF will not be used. All this works only in VMR7/9 mode, which TT currently does not support (only overlay). New version of TT supports at least VMR9.

Thanks Vlad,

Thats what I thought AA and AF stood for but was not 100% sure. I changed them to both 8X and my system seemed to be ok at ffdshow resize 1920X1728 like I had it.

gazzagazza
06-30-04, 10:53 PM
Should I be able to see a histogram in the levels section when playing a DVD? If so, how do I configure it to display?

cyberbri
06-30-04, 11:05 PM
Isn't there a check-box that says "Show Histogram" or something? I think I saw one there yesterday in the version I'm running.

But the DVD has to be actually playing -- for me, I have to click the screen to pause, right-click and open ffdshow, go to Levels and turn it on, click again on the DVD playback, then do Alt-Tab to switch back to the open ffdshow window to watch -- and hope the DVD playback doesn't stop due to lack of memory...

Charles Black
06-30-04, 11:40 PM
Gazzagazza,

You need to have Picture properties selected and one of the settings in it set to anything but default; I just put gamma at 1.01. You can use Postprocessing instead, but it requires some excessive setting before the graph will work.

By the way, if you expand a 16 to 235 range to 0 to 255 using Picture Properties you can actually see the lost codes in the graph!
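
To see why gaps appear in the graph, here is a minimal Python sketch of a 16-235 to 0-255 expansion. It assumes a plain linear stretch with round-to-nearest; the exact arithmetic ffdshow uses is not given in this thread, and `expand_levels` is a made-up name for illustration.

```python
def expand_levels(v):
    """Map a video-range code (16..235) to full range (0..255), rounding to nearest.

    Assumption: plain linear stretch; ffdshow's real rounding may differ.
    """
    return ((v - 16) * 255 + 219 // 2) // 219

# Every code the source can contain, pushed through the stretch.
used = sorted({expand_levels(v) for v in range(16, 236)})

# Output codes that no input code can ever land on: the "lost codes".
missing = [code for code in range(256) if code not in used]

print(len(used), "output codes used,", len(missing), "never hit")
print("first few gaps:", missing[:5])
```

There are only 220 input codes between 16 and 235, so they cannot fill all 256 output codes; the leftover 36 show up as empty columns in the histogram.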

Charlie

AndyIEG
07-01-04, 12:20 AM
new SSE2 build

here are some additional hints for the new version

Hint:
The Parameter setting in the resizer tab directly influences the filter/tap depth and which internal routines are used.
For Lanczos the parameter chooses the mode/tap depth
(3 = lanczos3, 4 = lanczos4, 5 = lanczos5 ...)
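
For readers unfamiliar with what the Lanczos parameter means, here is a minimal Python sketch of the textbook Lanczos kernel. This is the standard definition, not code taken from ffdshow, and the assumption that "tap depth" corresponds to the kernel's support of 2 x parameter samples per axis is mine.

```python
import math

def lanczos(x, a=3):
    """Textbook Lanczos kernel with `a` lobes: sinc(x) * sinc(x / a) for |x| < a."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def taps(a):
    """Source samples one output sample draws on per axis (assumed: 2 * a)."""
    return 2 * a

for a in (3, 4, 5):
    print(f"lanczos{a}: {taps(a)} taps per axis, weight at x=0.5 is {lanczos(0.5, a):.4f}")
```

A larger parameter means a wider kernel and more multiply-adds per output pixel, which is one plausible reason higher settings cost more CPU.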

Speed tips:
1: u NEED to use a filter (level/denoise...) BEFORE u resize to force max. performance
2: The new default Bicubic setting is a specially tuned setting/mode; for best performance don't change the parameter.
If u don't like the new settings, the old parameter was "-0.6". But with the new default setting some new/faster routines are forced/used.

3: Don't go higher than 4 aka Lanczos4, or slower internal routines are used.

4: in Bicubic mode don't set Luma sharpen higher than 1.60 or slower routines are used.
in lanczos3 mode don't set Luma sharpen higher than 0.62/0.82 or slower routines are used.
in lanczos4 mode don't set Luma sharpen higher than 1.20 or slower routines are used.

5: avoid using Chroma sharpen; if u do, don't go over 1.20 or slower routines are used.
6: always try to output the YV12 colorspace in the output pane
7: Spline & Sinc use slower, less optimized routines, so avoid those modes.

So mainly use bicubic with the new "default" setting and only Luma sharpen (0-1.6).
Lanczos3 and Lanczos4 are also well optimized, but anything higher is not, like Lanczos with a parameter higher than 4 or Spline/Sinc; avoid those modes.
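
The speed tips above can be sketched as a small settings check in Python. Everything here is hypothetical illustration: `slow_path_warnings` and its arguments are not an ffdshow API, and the thresholds are simply copied from the tips (for lanczos3 the lower of the two quoted values, 0.62, is assumed).

```python
# Luma sharpen ceilings from the speed tips; keys are (method, lanczos parameter).
LUMA_LIMIT = {("bicubic", None): 1.60, ("lanczos", 3): 0.62, ("lanczos", 4): 1.20}
CHROMA_LIMIT = 1.20

def slow_path_warnings(method, param=None, luma=0.0, chroma=0.0, output="YV12"):
    """Return a list of reasons a (hypothetical) settings combo may hit slower routines."""
    method = method.lower()
    notes = []
    if method in ("spline", "sinc"):
        notes.append(f"{method} uses the slower, less optimized routines")
    if method == "bicubic" and param is not None:
        notes.append("non-default bicubic parameter skips the new faster routines")
    if method == "lanczos" and param is not None and param > 4:
        notes.append("Lanczos parameter above 4 falls back to slower routines")
    limit = LUMA_LIMIT.get((method, param if method == "lanczos" else None))
    if limit is not None and luma > limit:
        notes.append(f"luma sharpen {luma} exceeds {limit}")
    if chroma > CHROMA_LIMIT:
        notes.append(f"chroma sharpen {chroma} exceeds {CHROMA_LIMIT}")
    if output != "YV12":
        notes.append(f"{output} output adds a conversion from YV12")
    return notes

print(slow_path_warnings("bicubic", luma=1.3))          # no warnings expected
print(slow_path_warnings("lanczos", 4, luma=1.5))       # luma above the 1.20 ceiling
print(slow_path_warnings("spline", output="YUY2"))      # slow mode plus extra conversion
```

An empty list means the combination stays within the limits listed in the tips; it says nothing about whether a given machine can actually keep up.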

PS: bugs/crashes ... per private message pls. Also gimme some feedback on the resizer speed on P4, since I could not test the code on a P4.
I'm still working on one of the main routines so this is not the final code; I hope to push some more speed out of the routines.

Pls test whether the horizontal lining bug is now fixed and no color anomalies happen. If u notice anything strange or any anomalies, send me a private message.

vpopovic
07-01-04, 01:14 AM
I was just about to go to bed. I guess not. Thanks for all you hard work. I'll PM comments.

vpopovic
07-01-04, 03:15 AM
Preliminary comments (it is quite late, so my eyes might be playing tricks on me).

Excellent job. I mostly tested Lanczos4, and checked Bicubic. I used 1920x1080 with Levels clipped 0-254 on output and had quite a bit of headroom on CPU (VMR7). It looks like this version is faster than before. For me it is hard to say how much as I was using Avisynth + FFDshow combo before.

Although I did not test my favorite horizontal scaling bug scenes, this build did well on the opening scene of LOTR TT EE and in the second scene with lots of fog. Colors are a tiny bit different than before (again Avisynth could be the reason), but don't appear to be off.

The improvement in image quality with Lanczos4 is extraordinary. I did not expect this much gain. I am not sure if others will agree with my comment that the gain in PQ is "extraordinary", but it should probably be at least significant on everybody's scale (everybody meaning big screens, VMRs, setups without bugs, etc.). While more detail is noticeable everywhere, it is especially true for the background. I took some screenshots and compared them to my old reference screenshots of the same scenes (in their ugly compressed form available in the WM9 vs. DVD thread) and that confirmed my conclusion.

Then I dared to compare the Kill Bill 1 DVD vs. the WM9 720p trailer, and while WM9 had more clarity in the medium/long shots, the gap is clearly closing. The DVD was holding up quite well. Better than ever before.

I noticed some other things, but can not comment without some more testing. I just reported quite obvious stuff.

Andy, thank you very much for your effort. You have done an excellent job and you should be proud of your work, which will be very much appreciated and enjoyed by your fellow HTPC enthusiasts. We did not have a new resize in a year or so, and of course, we never had a better one.

vkon
07-01-04, 03:59 AM
$&!# i'm on vacation right now and i can't try it.

Anyway , Andy thank you very much for your efforts.

nm88
07-01-04, 04:29 AM
Thanks for the new version AndyIEG. A few comments about this version on a P4: Resize > 2000 still doesn't work like it should (I tried 2160x1440). If I do it while running a program, I get a crash; if I do it before and try to start one, I get a filter connection error with VMR9.

Lanczos is really strange now, it has an extreme sharpness to the image compared to Bicubic, almost like an added unsharp mask, that accentuates edge enhancement and banding. And only parameter = default or 4 seem to look OK; 2 or 3 causes odd edge artifacts and 5+ causes distortions in the hue. I'll have to go back to the older version to make sure this is not just something I overlooked before.

On a P4, CPU utilization is about the same as the last version, +/- 5%, with all my settings (Dscaler LumCromShift + Denoise3D before resize). Bicubic shaves about 10% utilization from CPU 1 compared to Lanczos 4. YUY2 output adds 10-15% utilization, but on CPU 0 only, hyperthreading at work I suppose. I still pick YUY2 because YV12 seems to compress the range of greys.

Goi
07-01-04, 05:59 AM
nm88, that's interesting. zplayer doesn't use more than 50%(on 1 logical CPU) for me. Apparently there's no way to distribute load on the 2 logical CPUs.

BangoO
07-01-04, 07:10 AM
It would be nice to have a really clear explanation of Overlay vs VMR9 and YUY2 vs YV12 when using a Radeon card, because this is still a bit unclear to me...
I read that using a huge resize makes the image sharper with VMR9, but what about resizing with Overlay ?

Spoonfed
07-01-04, 07:26 AM
I "understood" from reading this and other threads that FFDShow works internally in the YV12 colour space, so even when inputting YUY2 and outputting YUY2 there is a conversion each way; hence outputting YV12 leaves only the one input conversion (if the codec is feeding YUY2).
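
The counting argument above can be sketched in a few lines of Python. This assumes, as described in this thread, that ffdshow's filters work in YV12 internally; the `conversions` helper is purely illustrative and not part of any real API.

```python
INTERNAL = "YV12"  # assumption from the thread: filters operate in YV12

def conversions(input_cs, output_cs, internal=INTERNAL):
    """List the colorspace conversions one frame goes through on its way in and out."""
    steps = []
    if input_cs != internal:
        steps.append(f"{input_cs}->{internal}")
    if output_cs != internal:
        steps.append(f"{internal}->{output_cs}")
    return steps

for src, dst in [("YUY2", "YUY2"), ("YUY2", "YV12"), ("YV12", "YV12")]:
    print(src, "in,", dst, "out:", conversions(src, dst) or "no conversion")
```

Under that assumption, YUY2 in and YUY2 out costs two conversions per frame, YUY2 in and YV12 out costs one, and YV12 end to end costs none.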

AndyIEG
07-01-04, 08:16 AM
Originally posted by nm88
Thanks for the new version AndyIEG. A few comments about this version on a P4: Resize > 2000 still doesn't work like it should (I tried 2160x1440). If I do it while running a program, I get a crash; if I do it before and try to start one, I get a filter connection error with VMR9.

Lanczos is really strange now, it has an extreme sharpness to the image compared to Bicubic, almost like an added unsharp mask, that accentuates edge enhancement and banding. And only parameter = default or 4 seem to look OK; 2 or 3 causes odd edge artifacts and 5+ causes distortions in the hue. I'll have to go back to the older version to make sure this is not just something I overlooked before.

On a P4, CPU utilization is about the same as the last version, +/- 5%, with all my settings (Dscaler LumCromShift + Denoise3D before resize). Bicubic shaves about 10% utilization from CPU 1 compared to Lanczos 4. YUY2 output adds 10-15% utilization, but on CPU 0 only, hyperthreading at work I suppose. I still pick YUY2 because YV12 seems to compress the range of greys.

Mhh, this is really strange; for me everything works with resolutions higher than 2000. I tried Zoom Player (media and DVD playback) and also Crystal Player. Make sure the player tries to use VMR7/9 for playback, or filter connection errors happen. I need more input here from others too.

Sorry for the confusion about the new settings. The new code just covers the lanczos1-4 modes and bicubic mode; all others use the old code, that's why "5+ causes distortions in the hue". Btw lanczos3 is the same as lanczos default.
Nothing has changed in the way the resizer works, just some tweaks and bugfixes. "Lanczos is really strange now, it has an extreme sharpness to the image compared to Bicubic"
That's because the new default Bicubic mode is in fact not as sharp as the old one. U can compensate for this by setting it to -0.6 (old setting) or going higher on the Luma sharpen. Lanczos3-4 use nearly the same routines; I will check the Lanczos1-2 routines and see if there is something strange.

The internal YV12->YUY2 conversion is not the best and causes this 10-15% more CPU usage. I don't see a reason not to output YV12, since everything is converted to YV12 anyway and just converted again if u choose another format for output.

In general the speed gain isn't that huge, I know; that's because the old version was already well optimized and we have long dependency chains which are hard to optimize. Only a 64bit version can boost the speed further, since I could rewrite many routines I'm not happy with.

PS: using higher sharpen values than the ones I recommend as max. will also result in the old code being used... sorry, had no time to rewrite the standard main loop. In lanczos4, if u set Luma sharpen higher than 1.20, the old code is used and u can see the horizontal lining bug again. I will try to fix this and also rewrite the main loop...

vpopovic
07-01-04, 08:57 AM
On my FX-51 the speed gain is evident. Again, it is difficult for me to estimate how much faster this version is as I used Avisynth + FFDshow, but Lanczos4 and bicubic seem faster. I could not do 2560x1440 bicubic before and I can do it now with smooth playback. I don't remember what my CPU utilization was on 1080p bicubic, but it was probably around 60%. Going from 2 million pixels to 3.7 million pixels is a huge gain. I have yet to see if there are any tangible benefits from this monster resize though. Lanczos4 did not exist before, so it's difficult to say, but even if the speed gain is not huge, Lanczos4 seems to upsize without the previous bugs, and does reveal much more detail than before. Now that's huge.
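
The pixel counts quoted above are easy to check. This short Python sketch works out the per-frame pixel totals for the resize targets mentioned in this thread and compares them to an NTSC DVD frame (720x480); the dictionary labels are just for display.

```python
DVD_NTSC = (720, 480)

def pixels(width, height):
    """Pixels per frame at the given resolution."""
    return width * height

targets = {
    "2x DVD": (1440, 960),
    "1080p": (1920, 1080),
    "monster resize": (2560, 1440),
}

for name, (w, h) in targets.items():
    count = pixels(w, h)
    ratio = count / pixels(*DVD_NTSC)
    print(f"{name:>14}: {w}x{h} = {count / 1e6:.2f} million pixels, {ratio:.1f}x a DVD frame")
```

1920x1080 comes to 2,073,600 pixels and 2560x1440 to 3,686,400, which matches the "2 million to 3.7 million" figure and is roughly 78% more data per frame for the resizer to produce.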

midiboy
07-01-04, 09:13 AM
Hi !


Andy, Owen and whoever else keeps recommending YV12 as the output colorspace:

Is this only because using YUY2 would result in a speed drop, or do you have any other reasons for recommending YV12 over YUY2?

Blight, for instance, recommends using YUY2 because, according to him, using YV12 crushes contrast a bit, and I have recently found out that my major problem with VMR tearing is due to the use of YV12 as an output colorspace on my system. If you are interested in my findings you can go and read this thread (http://www.avsforum.com/avs-vb/showthread.php?s=&threadid=414389&highlight=tearing+VMR9)

So contrary to your recommendation I actually have to recommend YUY2 instead of YV12, at least if you use an NVIDIA card and are plagued by tearing problems with VMR9. Seemingly ATI users who suffer from tearing do not benefit at all from switching colorspaces, but any NVIDIA user should at least give it a try.

Naturally I would very much love to have you, Andy, go over the YV12 to YUY2 conversion routines in ffdshow and make them faster (for the SSE2 version), but of course that's only a very bold wish! :)

Bye,
Alex

AndyIEG
07-01-04, 09:23 AM
A little question, for further plans:

There is still some code to optimize, but I just noticed that the resizer, color conversion and denoise3d can't be optimized much more... it's try and try again atm: I change something and have to test if it's faster, then change again and test. I just noticed that going from MMX2 to SSE2 doesn't easily give huge speed gains, since the long dependency chains kill the speed. For the crazy main loop I already tried 15-30 different versions of my SSE2 code and all I can do is "match" the original MMX2 version. This is because I can't keep the CPU pipeline busy with new work, so I free up some cycles but can't fill them... and this is because I need more registers for software pipelining, which only exist in AMD64 mode.

So the question is: is there a "real" demand for an AMD64 64-bit version which only runs on Windows XP Pro 64-bit edition (which can be downloaded for free atm)?
The main problem is you need 64-bit drivers for all of your hardware; 32-bit programs still run fine as far as I tested.
The speed gain would be around 20-30%. I'm also not sure how compatible Intel's new 64-bit CPU is with AMD64.

I'm not sure which direction I want to go, since I can see a dead end for speed gains on the 32-bit version. I can probably squeeze some more out of the color conversions and denoise3d.

"Naturally I would very much love to have you Andy go over the YV12 to YUV2 conversion routines in ffdshow and make them faster ( for the SSE2 version ) but of course thats only a very bold wish !"

It's on the todo list :) And yes, we recommend YV12 only because of the extra conversion from YV12 to YUY2 for output.

BangoO
07-01-04, 10:03 AM
Ok so whatever we do, as long as we use ffdshow, everything is converted to YV12, therefore it is better to have a YV12 output.
But then... is there any loss in color conversion/contrast crushed when we use ffdshow (as opposed to when we don't use it at all) ?
If so, is there any way to avoid that ?

Thx ;)

TheLion
07-01-04, 10:09 AM
Well, I recommend YV12 output VERY STRONGLY because of the famous chroma upsampling bug. No matter which software decoder you use (I tried every version of Elecard, WinDVD, Nvidia FWMM) you will get this nasty bug. With forced YV12 output you use the colorspace conversion algorithm of your graphics card. All current-generation cards from either ATI or Nvidia do a "perfect" job compared with decoders in software mode (DXVA is another story). THAT makes forced YV12 output the only viable option.

@AndyIEG

In one word -> WOW. I haven't had time to try out your newest release, but a brief look at the feature list makes me cry with joy. THIS is what I have been waiting for from day one since I tried ffdshow! Lanczos resize without the green bug and the horizontal lines bug. I tried Avisynth to get the same result but it's too slow on my setup for using it together with denoise3D, etc.

And then you add Lanczos4+... I have tested it with Avisynth for quite some time now and it proved to be the "best" scaling algorithm (the difference from Lanczos3 is generally hard to spot, but there are a few scenes where it becomes VERY obvious -> e.g. Gladiator Superbit DVD, scene with golden ornaments on the throne). And now you integrate it into ffdshow.

THANK YOU SO MUCH, and I think I'm speaking here for every home cinema fan out there with an eye for quality.

AndyIEG
07-01-04, 10:29 AM
"And then you add Lanczos4+... I tested it with Avisynth for quite some time now and it proved to be the "best" scaling alg. (difference to Lanczos3 is generally hard to spot but there are a few scenes where they become VERY obvious -> e.g. Gladiator Superbit DVD, scene with golden ornaments on the throne) . And now you integrate it into ffdshow."

Just to get something straight :) The new modes you can now use (Lanczos1-10) were already in ffdshow, meaning ffdshow always had these modes in the mplayer lib. The problem was a very small bug in ffdshow which prevented you from using those modes... I was working on the resizer and noticed that this parameter should select those modes, but I could see that the default (Lanczos3) was always used... I simply mailed Milan with this hint and he easily found the little bug... he changed 2 words and the parameter worked :)

And again, bicubic and Lanczos resize the image the same way; only the internal filter size (tap depth) and the coefficients, i.e. the parameter settings, change. So there is only one main routine, which just works with different coefficients.
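
To illustrate the "one routine, different coefficients" point, here is a minimal sketch of how Lanczos weights are commonly computed. It is not ffdshow's or mplayer's actual code. It assumes that the number in "LanczosN" is the lobe count `a` of the standard Lanczos kernel, so that Lanczos3 uses 6 taps and Lanczos4 uses 8; the function names are hypothetical.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def lanczos(x, a):
    """Standard Lanczos kernel with 'a' lobes; zero outside (-a, a)."""
    if abs(x) >= a:
        return 0.0
    return sinc(x) * sinc(x / a)

def taps(phase, a):
    """Normalized weights for the 2*a source samples around an output
    position lying 'phase' (0 <= phase < 1) past a source sample."""
    offsets = range(-a + 1, a + 1)
    raw = [lanczos(k - phase, a) for k in offsets]
    total = sum(raw)
    return [w / total for w in raw]

# Same routine, different parameter: only the tap count and values change.
weights3 = taps(0.5, 3)   # 6 taps
weights4 = taps(0.5, 4)   # 8 taps
```

A bicubic resizer would plug a different kernel function into the same `taps` step, which matches the description of a single main routine driven by different coefficients.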

Charles Black
07-01-04, 10:44 AM
Andy,

Thanks again for your great effort! Tonight I'll give your new version a spin.

Are you sure that ffdshow does format conversions that are not asked for? I remember that the Avisynth docs say that the format is preserved and allows the user to explicitly choose. I thought that ffdshow might be similar. If ffdshow does a call to a YV12 conversion at the beginning it would cause unnecessary CPU overhead for YUY2 outputs. If ffdshow has a call to the YV12 conversion routine could it be made switchable?

Thanks again.

Charlie

mpgxsvcd
07-01-04, 11:11 AM
Will the SSE2 versions run on a regular Athlon 3200+? Are they designed to only run on an Athlon 64 or a P4? For example, should it throw an error if you run it on anything else?

Michele Spinolo
07-01-04, 11:30 AM
Originally posted by AndyIEG


So the question is: Is there a "real" demand for a AMD64 64bit version wich only runs on the Windows XP Pro 64bit edition (wich can be free downloaded atm)
The main problem is u need 64bit drivers for all of your hardware, 32bit programms still run fine as far as i tested.
The speed gain would be around 20-30%. Im also not sure how compatile intel's new 64bit cpu is compared to the AMD64.



Hi Andy,

IMHO an AMD64 version would be really appreciated: in 1 or at most 2 years I suppose 64-bit will be the standard CPU used, so why not be ready for this?:D

vpopovic
07-01-04, 11:35 AM
Originally posted by Charles Black
Andy,

Thanks again for your great effort! Tonight I'l give your new version a spin.

Are you sure that ffdshow does format conversions that are not asked for? I remember that he Avisynth docs say that the format is preserved and allows the user to explicitly choose. I thought that ffdshow might be similar. If ffdshow does a call to a YV12 conversion at the beginning it would cause unnecessary CPU overhead for YUY2 outputs. If ffdshow has a call to the YV12 conversion routine could it be made switchable?

Thanks again.

Charlie

YV12 is 4:2:0, a planar colorspace with chroma subsampled both horizontally and vertically. YUY2 is 4:2:2, a packed colorspace with chroma subsampled horizontally only. As I understand it, 4:2:0 is faster than 4:2:2 due to its lower bandwidth (it carries half as many chroma samples) and its alignment (i.e. the FFDShow routines were built around the YV12 layout). FFDShow (even the old version) was optimized for speed, as opposed to Avisynth. I am not sure whether any decoder can decode directly into YV12 or needs to unpack chroma to YUY2 (or a similar 4:2:2 format) and then internally convert to YV12.
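
For a rough sense of the bandwidth difference, here is a small sketch of the per-frame buffer sizes, assuming 8 bits per sample and even frame dimensions, and ignoring any row padding a real renderer might add. The function names are made up for this example.

```python
def yv12_bytes(width, height):
    """Planar 4:2:0: one full-size Y plane plus two chroma planes,
    each subsampled by 2 horizontally and vertically."""
    luma = width * height
    chroma_plane = (width // 2) * (height // 2)
    return luma + 2 * chroma_plane

def yuy2_bytes(width, height):
    """Packed 4:2:2: Y0 U Y1 V per pixel pair, i.e. 2 bytes per pixel."""
    return width * height * 2

# NTSC DVD frame size as an example.
dvd_yv12 = yv12_bytes(720, 480)
dvd_yuy2 = yuy2_bytes(720, 480)
print(dvd_yv12, dvd_yuy2, dvd_yv12 / dvd_yuy2)
```

YV12 works out to 12 bits per pixel against 16 for YUY2, so every filter touches about 25% less data per frame.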

YV12 gets the benefit of hardware chroma upsampling, which is good. With TT coming out with Forceware filters, and Nvidia probably launching Forceware in the foreseeable future, you will have two major applications using YV12 output. I'd say they will probably get 50% of the higher end market (i.e. people that upgrade their software as opposed to using OEM versions).

AndyIEG
07-01-04, 11:57 AM
all i know for sure is that ffdshow calls the yv12 conversions if u enable a filter and calls a conversion again dependent on what output colorspace is used.

If the input is yv12 a simple c++ copy routine is used and nothing changes, if yuy2 is the input the yuy2 to yv12 is called and all filters work on this format and at the end it is converted back to yuy2 if yv12 is disabled as output.

For the level thing i do a 16-235 (input) to 0-255 (output) conversion since this looks better for me...
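
As a sketch of what a 16-235 to 0-255 levels conversion does to luma values (this is the generic linear stretch, not necessarily ffdshow's exact arithmetic or rounding):

```python
def expand_levels(y, in_black=16, in_white=235):
    """Linearly stretch luma so in_black maps to 0 and in_white to 255.
    Values outside the input range are clipped."""
    scaled = (y - in_black) * 255.0 / (in_white - in_black)
    return max(0, min(255, int(round(scaled))))

# Video black and white land on the PC-range extremes.
black = expand_levels(16)
white = expand_levels(235)
# Anything below 16 or above 235 in the source is clipped away.
below_black = expand_levels(8)
above_white = expand_levels(245)
```

Note the clipping: any source detail encoded below level 16 or above level 235 is lost after the stretch.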

N3W813
07-01-04, 12:19 PM
Awesome!! Thx Andy, will try the new version when I get home.

Really appreciate all the hard work you put into this project for us video enthusiasts. :D

BangoO
07-01-04, 12:40 PM
WinDVD6 can output YV12, but not if you use ffdshow, as you need to use an intermediate filter (Abstract for example) that uses YUY2.

Then... what about VMR9 vs Overlay with Radeon video cards ?

BangoO
07-01-04, 12:45 PM
Originally posted by AndyIEG
For the level thing i do a 16-235 (input) to 0-255 (output) conversion since this looks better for me...
If I do this, I lose a lot of detail in the blacks...

AndyIEG
07-01-04, 01:38 PM
Originally posted by BangoO
If I do this, I loose a lot of details in the blacks...

I compensate for this with a gamma correction of 1.29 in the same pane.
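
A minimal sketch of a gamma correction on 8-bit luma, assuming the common convention where a gamma value above 1 brightens the mid-tones and shadows (out = in^(1/gamma) on a 0..1 scale). Whether ffdshow's slider uses exactly this formula is an assumption.

```python
def apply_gamma(y, gamma=1.29):
    """Gamma-correct an 8-bit value; black and white stay fixed,
    values in between are lifted when gamma > 1."""
    normalized = y / 255.0
    return int(round(255.0 * normalized ** (1.0 / gamma)))

shadow_in = 32
shadow_out = apply_gamma(shadow_in)
```

The end points do not move, so this lifts shadow detail without changing where black and white sit.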

Charles Black
07-01-04, 01:58 PM
I ran a few tests to get an idea of how ffdshow changes color coding and it is not completely simple. I used Avisynth Info output to check color spaces before and after Resize with no other filters and with Picture Properties, Denoise3D and Levels. All outputs were forced to YUY2. WinDVD decoder (YUY2 only) for input.

Both Lanczos and Spline didn't change the color space - if it was YUY2 in, that is what was output.

All of the filters I tried (above) changed the color space to YV12 even though the eventual output was YUY2. That is really too bad from a quality standpoint.

If ffdshow is used only to resize, with sharpening done in the resizer, then it is possible to not have any color coding conversions at all. This works with Lanczos and Spline. YUY2 will be used everywhere.

I run Spline with some luma sharpening in the resizer and no other filters as my normal setup. This seems to have fewer artifacts and less noise for me. I also use a gamma of 1.3 for really dark video to bring out the shadow details. I expect that brighter projectors might not need the extra gamma.

Charlie

AndyIEG
07-01-04, 02:12 PM
Originally posted by Charles Black
I ran a few tests to get an idea of how ffdshow changes color coding and it is not completely simple. I used Avisynth Info output to check color spaces before and after Resize with no other filters and with Picture Properties, Denoise3D and Levels. All outputs were forced to YUY2. WinDVD decoder (YUY2 only) for input.

Both Lanzcos and Spline didn't change the color space - if it was YUY2 in that is what was outputted.

All of the filters I tried (above) changed the color space to YV12 even though the eventual output was YUY2. That is really too bad from a quality standpoint.

If ffdshow is used to resize and sharpen in the resize only then it is possible to not have any color coding conversions at all. This works with Lanzcos and Spline. YUY2 will be used eveywhere.

I run Spline with some luma sharpening in the resizer and no other filters as my normal setup. This seems to have less artifacts and noise for me. I also use a gamma of 1.3 for really dark video to bring out the shadow detials. I expect that brighter pojectors might not need the extra gamma.

Charlie

Yes, this is true: if ONLY resize is used, the resizer works directly with the input color format. If any other filter is used before or after resize, it converts to YV12. The new SSE2 code is only for the YV12 resizer, not for the YUY2 one, since it seems most people use a denoise or levels or whatever filter, which forces the YV12 conversion.

The question now is whether ffdshow "could" also work with the YUY2 format without converting, or at least the most used filters could. I can try to contact Milan and ask him what would have to be changed to make this possible.
I'm also not sure this really improves picture quality, since I can't see any difference between YV12 and YUY2 even if I just use the resizer.
So all you video freaks, test this :) That means: enable only the resizer in Lanczos4 mode and force YUY2 output+input, then compare this with the image you get if you add a filter before resize (levels or blur at 0).
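
The behaviour described in the last few posts can be summarized as a small decision function. This is only a model of what has been reported in this thread (resize alone keeps the input format; any other filter forces YV12 internally), not code taken from ffdshow, and the names are hypothetical.

```python
def conversion_steps(input_fmt, output_fmt, extra_filters):
    """List the colorspace conversions a frame goes through.

    input_fmt / output_fmt: "YV12" or "YUY2".
    extra_filters: True if any filter other than resize is enabled.
    """
    steps = []
    working = input_fmt
    # Per the thread: filters other than resize only work on YV12.
    if extra_filters and working != "YV12":
        steps.append(working + "->YV12")
        working = "YV12"
    # Final conversion only if the forced output format differs.
    if working != output_fmt:
        steps.append(working + "->" + output_fmt)
    return steps

resize_only = conversion_steps("YUY2", "YUY2", extra_filters=False)
with_denoise = conversion_steps("YUY2", "YUY2", extra_filters=True)
```

With a YUY2 decoder and forced YUY2 output, resize-only gives no conversions at all, while adding any other filter costs a round trip through YV12.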

BangoO
07-01-04, 02:39 PM
This is all very interesting :)
Charles can you explain why it is bad that it is converted to YV12 instead of YUY2 ?

Charles Black
07-01-04, 02:42 PM
Andy,

Milan might have some good feedback. It may be too difficult to include YUY2 in the filters but it would be very nice. I am always happy when there is a choice that allows for comparison.

I am trying to figure out a good test to see what the changes are to the video by recoding the color and by resizing. I think I can subtract the original frame from the modified frame and display the difference. This is going to require some care, with resizing, since I need to be sure that the original is resized to the target and then possibly resized back to the original for comparison.
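
A bare-bones sketch of the difference test described above, with frames represented as plain lists of rows of 8-bit luma values. In practice this would be done in Avisynth or on real frame buffers; the helper names are invented for illustration, and both frames must already be at the same size.

```python
def frame_difference(original, modified):
    """Per-pixel absolute difference of two equally sized frames."""
    if len(original) != len(modified) or any(
        len(a) != len(b) for a, b in zip(original, modified)
    ):
        raise ValueError("frames must have identical dimensions")
    return [
        [abs(a - b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(original, modified)
    ]

def max_error(diff):
    """Largest single-pixel deviation in a difference frame."""
    return max(max(row) for row in diff)

# Tiny 2x3 example: one pixel changed by 3 levels, one by 1.
before = [[16, 128, 235], [16, 128, 235]]
after = [[16, 131, 235], [17, 128, 235]]
diff = frame_difference(before, after)
```

Displaying `diff` (usually amplified) shows where the processing chain altered the picture; an all-zero result means the path was lossless for that frame.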

Charlie

taci
07-01-04, 03:20 PM
I have just downloaded Andy's latest release ffdshow-20040701_SSE2.exe to make use of the optimized ffdshow code. Andy did a great job here and I thank him for his invaluable contribution. However, to my surprise, whenever I try to enter a resize value above 2000 in the resize panel, the input field turns red and rejects my input. I thought that restriction was removed from ffdshow a while ago. Is there something I am missing? I know people are using resize values above 2000.

Thanks

nm88
07-01-04, 03:26 PM
Originally posted by AndyIEG
Make sure the player try to use VMR7/9 for the playback or filter connection errors happend. I need more input here from others too.
I was using VMR9, still got the error when I tried 3x DVD resolution (2160x1440). :( Using Intervideo 6 filters.
Originally posted by AndyIEG
The internal YV12->YUY2 conversion is not the best and cause this 10-15% more CPU usage. I dont see the reason why not outputting YV12 since everything is converted to yv12 anyways and just converted again if u choose for output.
Because switching to YV12 raises the output black levels quite a bit (I haven't tested if white level changes too). I can compensate by lowering the black level, but then I lose more greys. So I stick with YUY2 output since on the P4 hyperthreading seems to take care of the extra CPU usage.

If there were a way for ffdshow to preserve the black/white levels with YV12, I'd gladly choose it to lower overall CPU usage.

AndyIEG
07-01-04, 03:29 PM
Originally posted by taci
I have just downloaded Andy's latest release ffdshow-20040701_SSE2.exe to get use of the optimized ffdshow code. Andy did a great job here and I appreciate him for his invaluable contribution. However, to my surprise when I try to enter resize value more than 2000 in resize pannel, the input field turned red whenever I enter a value more than 2000 and rejected my input. I knew that such restriction was removed from ffdshow a while ago. Is there something I am missing? I know people are using resize values more than 2000.

Thanks

X still has to be divisible by 16!
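
A small helper for picking a width that satisfies the divisible-by-16 rule. This is just arithmetic; the function names are made up and it says nothing about any other limits ffdshow or the renderer may impose.

```python
def is_valid_width(width, multiple=16):
    """True if the width is an exact multiple of 16."""
    return width % multiple == 0

def round_to_multiple(width, multiple=16):
    """Nearest multiple of 16 (ties round up)."""
    return (width + multiple // 2) // multiple * multiple

for requested in (1080, 1440, 2050, 2160):
    print(requested, is_valid_width(requested), round_to_multiple(requested))
```

For example 1080 is not a multiple of 16 (the nearest valid value is 1088), while 1440 and 2160 already are.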

"I was using VMR9, still got the error when I tried 3x DVD resolution (2160x1440). Using Intervideo 6 filters." for me it works... but i have to test the intervideo 6 filter than.

Energeezer
07-01-04, 06:28 PM
Well, I'm new to FFD after all this time.
Last night I got it working but it is very unstable. Seems to be hit and miss. It works, but then if I try to use any features (FF, RW, Pause etc) the audio will continue but the video will freeze. After the freeze, if I stop and then restart the player it will sometimes work, but usually I have to restart the machine and/or eject and reinsert the DVD.

I am running XP on a P4 2.6 512 ram with Radeon 8500 64Mb video
Win DVD 4 video and Power DVD 4 audio under ZP.
MY FFD settings are
Blur/NR gradual 40
resize to 1440/960
sharpen set to about 26

I'm going to try reducing resize to my display res of 1280/720.

Any ideas/suggestions??

gazzagazza
07-01-04, 07:57 PM
Tried the new version last night. Lanczos4 (no additional sharpen) does indeed seem to bring out more detail. I was running a little denoise3D, but found that I needed to add the levels filter, just with output set 0-234 (ie essentially no effect) to get completely smooth replay. Turn off levels and CPU usage was higher and things less smooth.

New Bicubic default is too soft in my opinion...

vpopovic
07-01-04, 08:06 PM
Levels could shave off CPU usage if you are reducing the bandwidth of the signal before resize. Old trick never fails.

Bicubic is great - you can always add more sharpen through the filters. Default bicubic is really at 0.5 sharpen and this one is at 0.6 sharpen, so it already "rings" a bit.

strangethingz
07-01-04, 08:18 PM
Hey guys,

I have a question....

If my CPU is only running at around 70-80%, but playback is slightly slow/stuttered is it because I've hit the ceiling on my video card's capabilities?

It's a ATI AiW 8500DV, Athlon XP 3000+, 1GB pc2700

I get smooth playback when resize is no more than 1024 x 576... when I bump up to 1280 x 720 or 1440 x 960, I'm only using 75% CPU but the video is slightly stuttered...

I'm thinking I need to upgrade the video card to resize higher?

hoops10
07-01-04, 08:24 PM
Well, I am using the Barton 2500+ o/ced to about 2.2 GHz, and I am trying to get 1440x960 fluidly. I am getting a little bit of stuttering and the CPU usage stays at 100%. What can I do to get it to play DVDs smoothly? I am using a Radeon 9600 Pro.

vpopovic
07-02-04, 12:14 AM
Andy did an extraordinary job (whether he is aware of it or not). I just tested my favorite horizontal scaling bug scenes (LOTR ROTK ch 1, The Last Samurai, chs 9 and 17). The result is even better than with Avisynth. There is no banding of any kind. Andy, whatever you did, it really works.

Charles Black
07-02-04, 12:49 AM
Originally posted by BangoO
Charles can you explain why it is bad that it is converted to YV12 instead of YUY2 ?

Just on general principle. Every time the video signal gets processed, data is distorted, so reducing the number of filters will improve quality.

DVDs have 4:2:0 encoding, which is similar to YV12, so it is practical to use YV12 through the entire video chain up to when it becomes (ignoring digital out) RGB. Most Avisynth filters were originally YUY2, and in version 2.5 or soon after they had YV12 as well. So YV12 is new-ish.

As far as ffdshow goes, there are a couple of points that are worth considering when deciding whether to use YV12 or YUY2.

If you are using a decoder that decodes to a quality YV12, it seems natural to use it. The only negatives are that it may not process (filters) with as little chroma distortion as YUY2. I don't know whether this is true but am going to do some tests some day soon to see. The other negative may be a positive for many: with YV12 output, level 16 is black and level 235 is white. This is the best input for a TV type monitor; however, it is not good for computer monitors, which expect black at level 0 and white at level 255.

If your decoder outputs YUY2 only then there are two routes to consider.

The first is the most popular one of having YV12 output. This tells ffdshow that it should convert the YUY2 input to YV12 right away and use this lower data rate coding for all the filters. The output will be black at level 16 and white at level 235.

The second and very unpopular (the way I do it) method is to force the ffdshow output to YUY2. This leaves the YUY2 input as YUY2 through the entire chain right up to the conversion to RGB. Black is level 0 and white is level 255. You can not have any other filters, other than resize, if you want to stay in YUY2 all the way. You can use the sharpen in the resize filter.

If you want to use filters and have a computer style (black = 0mv and white = 700mv) monitor you can try (at least) four different approaches.

First and easiest on everything but your cpu - just force the ffdshow output to YUY2 and you will have black at level 0 and white at level 255.

Second and easy on cpu cycles - use the ffdshow black end levels slider to set the input to level 16 and the output slider to 0 as well as the white end slider to 235 and the output slider to 255.

Third - a little more difficult, still easy on cpu cycles, and you can see your work in the levels histogram which is fun. Use Picture properties and set the luma offset to -16 and the luma gain to about 148.

Fourth and very challenging - set your CRT projector's G2 and Drives to have black at level 16 and white at level 235. Maybe I should have left the fourth out, since it's limited to the "lucky" ;) few that can adjust their CRTs individually. The appeal here is that the loss of 31 levels incurred during the shift in levels is avoided. I can't see the lost levels on greyscale tests on my machine but that may not be true for you. My old projector used to suffer from grey pedestal and grey instead of white until I did this calibration. Video looked great but on screen windows generated by the projector had quite a bit of bloom.

Charlie

gazzagazza
07-02-04, 01:31 AM
Charlie,

The recent scope "look" I had at the output of my HTPC was with, I'm pretty sure, YV12 out of ffdshow... and my black was at 0mV and my white was at 700mV. Brightness & Contrast overlay controls were pretty close to what I'm seeing the TT guys have found. I will verify this next time I get a chance....

This is with Radeon 9600 card Overlay mode.

I am detecting a little "thread bleed" beginning here...:D

vpopovic
07-02-04, 01:33 AM
While your analysis might be right for your setup, generally there should not be much difference between YV12 and YUY2 output (chroma bug?) as your video card should be able to distinguish between the two and map all the inputs-outputs correctly. It is not clear whether some decoders (FM 3.0 beta) can decode natively to YV12, or whether they have to convert to YV12, but in any case most people seem to prefer this color space conversion over software chroma upsampling. I see no reason why this would be different with DVI or RGB output, unless video card drivers are "broken".

If it does not work this way, something is not working right. ATI seems to mess up things along the way, but at the end of the day you should be able to get levels right without excessive tweaks.

Spoonfed
07-02-04, 03:37 AM
PowerDVD 5 and Elecard build 2510 both can output, or should I say input, YV12 to FFDShow. WinDVD5 does not, I guess due to the silly Abstract DMO it needs.

As for "black at 16" etc, what is best for a DLP projector fed by VGA?

BangoO
07-02-04, 03:50 AM
Thx a LOT Charles, now I get it (moreover, that's what I had understood by myself :D) !
But... I read before (from bblue I think) that he sets input white at 235 and output black at 16 when he uses YV12... do you have an idea why ?

BangoO
07-02-04, 05:11 AM
I tested all that... I compared a very dark scene that I know by heart.
I compared YUY2 (denoise + resize) vs YV12 (denoise + resize) vs YUY2 (with only resize) and I can't tell the difference. In any case, the blacks are not crushed at all !
I also tried the Levels, and the image gets really too dark, I need to use a big gamma to correct it...

Charles Black
07-02-04, 11:28 AM
Originally posted by vpopovic
While your analsys might be right for your setup, generally there should not be much difference between YV12 and YVY2 output (chroma bug?) as your video card should be able to distinguish between the two and map all the inputs-outputs correctly. It is not clear whether some decoders (FM 3.0 beta) can decode natively to YV12, or they have to covert to YV12, but in any case most people seem to prefer this color space conversion over software chroma upsampling. I see no reason why this would be different with DVI or RGB output, unless video card drivers are "broken".

Vlad,

Thanks for this clarification. I forgot to put in the usual disclaimer that your mileage may vary.

My system is using an old TI500 video card, and newer cards or other manufacturers' cards may handle video differently. That is why it would be nice if everyone had a scope to check their output. Also I use only VMR9 at this point.

The differences in output that I noted with different ffdshow paths are in the luma output - not the chroma. I haven't started looking at chroma accuracy yet. All the measurements were taken inside the ffdshow filter chain so they should be repeatable between systems. Video card RGB output voltages may vary slightly, from model to model, but if a card deviates far from a 0 to 0.7 volt range it is probably defective.

It is possible that some video cards could handle YV12 luma different than YUY2 luma but I hope this is not so as it would be chaotic. My card simply passes the video levels directly (via the LUT) to the DAC. It is easy to calculate how many volts out a certain video level in (to the card) should be and then measure them with a scope to check accuracy. The video card RGB output for a video level 0 input is 0 volts and the output for a level 255 is 0.714 volts. All the in between steps are evenly spaced unless you are loading the LUT with an ICM profile or Powerstrip gamma/color adjustment.
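
The expected voltage for each level can be worked out as follows. This sketch assumes a purely linear LUT (no ICM profile or Powerstrip adjustment loaded) and uses the 0 V and 0.714 V end points quoted above; the function name is only illustrative.

```python
def level_to_volts(level, full_scale=0.714):
    """Expected analog RGB output for an 8-bit level with a linear LUT:
    level 0 gives 0 V, level 255 gives full_scale volts, evenly spaced."""
    return level / 255.0 * full_scale

v_black_pc = level_to_volts(0)       # 0 V
v_white_pc = level_to_volts(255)     # 0.714 V
v_black_video = level_to_volts(16)   # where video black lands if not remapped
v_white_video = level_to_volts(235)  # where video white lands if not remapped

print(round(v_black_video, 3), round(v_white_video, 3))
```

If levels are passed through unchanged, video black (16) sits at roughly 0.045 V instead of 0 V and video white (235) at roughly 0.658 V instead of 0.714 V, which is the kind of offset a scope would show.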

Charlie

N3W813
07-02-04, 11:29 AM
Does the > 2000x2000 resolution only work for VMR9? I've tried using VMR7 in ZP4.0, and I get a pin connection error between ffdshow and VMR7. VMR9 works fine, but I get a lot of tearing, which is the reason why I went with VMR7.

I'm running at 1920x1488 with lanczos4 now, truly amazing. More detail that wasn't there before. Keep up the good work AndyIEG!!!

Charles Black
07-02-04, 11:31 AM
Vlad,

I forgot to mention that I am using Zoom Player and that it may be a factor too since ffdshow outputs to it.

Charlie

Charles Black
07-02-04, 11:46 AM
Originally posted by gazzagazza
The recent scope "look" I had at the output of my HTPC was with, I'm pretty sure, YV12 out of ffdshow... and my black was at 0mV and my white was at 700mV. Brightness & Contrast overlay controls were pretty close to what I'm seeing the TT guys have found. I will verify this next time I get a chance....

Gazzagazza,

I wonder if difference is due to overlay? I use VMR9. I never have run any tests on my system with overlay since I use icm profiled video which requires VMR9.

Charlie

Charles Black
07-02-04, 12:01 PM
BangoO,

Are you running VMR9? Maybe it would help if you reposted what your system is at the present.

Charlie

AndyIEG
07-02-04, 01:13 PM
Originally posted by N3W813
Does the > 2000x2000 resolution only work for VMR9? I've tried using VMR7 in ZP4.0, and I get a pin connection error between ffdshow and vmr7. VMR9 works fine, but I get alot of tearing, that's the reason why I went with VMR7.

Hmm, I'm using Zoom Player 4.0 beta3 and I can resize to 4x DVD resolution while connecting to VMR7...

Make sure YV12 is the output colorspace.
I'm using Lanczos4 (sharpen 1.0) and the levels filter, with all others disabled. I'm not sure if I manually changed some merits or other stuff... since my Windows is a fairly old install and mainly a development system with lots of SDKs and crap installed. But as far as I remember I'm using the normal retail DX9b.

Can you give me the decoder and full Zoom Player DVD config you use? Maybe I can reproduce this connection problem.


PS: small updated version to fix some crashes with libavcodec and P4

2004-07-02 Andy2222 (ffdshow-20040701a_SSE2.exe)

* more robust/compatible compiling options for libavcodec.dll & mplayer.dll

2004-07-01 milan_cutka

* logoaway processes chroma planes
* MSS2 support in VFW

BangoO
07-02-04, 02:27 PM
Originally posted by Charles Black
BangoO,
Are you running VMR9? Maybe it would help if you reposted what your system is at the present.
I have a Radeon 9700 Pro BBA MP-1 on a Sony 1292 (9" CRT projector), and I use Overlay.
I use ffdshow-20040616_SSE2.exe and ZP with WinDVD6 filters.

pbpatel98
07-02-04, 03:28 PM
AndyIEG,

What would be the best way to trouble shoot resize performance issues? Would you concentrate on memory timings, FSB, video card or software settings? I guess what order would you try to make improvements? What are the most expensive operations?

I'm still trying to figure out why my AMD 64 3200+ / Radeon 9800 Pro system running nothing but XP/Zoomplayer/WinDVD cannot achieve Bicubic Resize (1440x960) on my NTSC DVDs even w/ your latest SSE2 optimizations and performance settings tips.

As always any suggestions/advice is appreciated. Thanks...

AndyIEG
07-02-04, 04:47 PM
Originally posted by pbpatel98
AndyIEG,

What would be the best way to trouble shoot resize performance issues? Would you concentrate on memory timings, FSB, video card or software settings? I guess what order would you try to make improvements? What are the most expensive operations?

I'm still trying to figure out why my AMD 64 3200+ / Radeon 9800 Pro system running nothing but XP/Zoomplayer/WinDVD cannot achieve Bicubic Resize (1440x960) on my NTSC DVDs even w/ your latest SSE2 optimizations and performance settings tips.

As always any suggestions/advice is appreciated. Thanks...

Depends what you mean by "cannot achieve Bicubic Resize (1440x960) on my NTSC DVDs".

I assume you get dropped frames?
It's probably not a hardware or ffdshow problem, since on my AMD64 3000+ with bicubic and 2x DVD resolution I'm getting just 60% CPU usage.
It's probably something with the WinDVD decoder and NTSC, or the filter graph.
There are many drivers and filters involved in playing a DVD and it's really hard to say what causes the problem for you.
I would start with GraphEdit or Media Player Classic and build a manual graph. Video drivers/settings can also cause this.
I bet it's some crazy setting in the WinDVD decoder, since using that decoder with Zoom Player is not recommended and is kind of a hack. I assume you have the Abstract filter and everything else set up that is needed for WinDVD and Zoom Player.

A good idea is to test the Elecard or open source MPEG-2 decoder. Try ripping an NTSC DVD with DVD Decrypter and getting rid of all the CSS and copy protection stuff, then try playing this rip with Zoom Player using the Elecard or open source MPEG-2 decoder. If the problem still remains, try different audio renderers and filters.

pbpatel98
07-02-04, 04:55 PM
depends, what u mean with "cannot achieve Bicubic Resize (1440x960) on my NTSC DVDs"

u assume u get dropped frames?
Its prolly no hardware or ffdshow problem, since on my AMD64 3000+ with bicubic and 2x dvd resolution im getting just 60% cpu usage.
Its prolly something with windvd decoder and NTSC or filter graph.
There are many drivers/filters stuf included to play a dvd and its realy hard to say what cause the problem for u.
I would start with graphedit or media player classic and build a manual graph. Also video drivers/settings can cause this.
I bet its some crazy settings in the windvd decoder since using those decoder with zoomplayer is not recommend and a kinda hack. I assume u have all those abstract and stuff setup wich is needed for windvd and zoomplayer.

A good idea is to test the elecard or opensource mpeg2 decoder. Try rip a NTSC dvd with dvd decryptor and get rid of all ssc and copy protection stuff, than try play this rip with zoomplayer using elecard or the opensource mpeg decoder. If the problem still remains try diff. audio render and filters.

I'll try out your suggestions. I've tried using Sonic decoders with the same results. If it's not too much trouble could you send me one of your graphs that you know works? Thanks...

KingKong954
07-02-04, 05:07 PM
AndyIEG:

how is that open source mpeg2 decoder as far as performance and quality are concerned? this mentioning of issues w/ Windvd (i use v6.) decoders caught my eye..

AndyIEG
07-02-04, 05:17 PM
Originally posted by KingKong954
AndyEIG:

how is that open source mpeg2 decoder as far as performance and quality are concerned? this mentioning of issues w/ Windvd (i use v6.) decoders caught my eye..

The open source MPEG-2 decoder is the fastest decoder around; there is also no chroma bug and the quality is the same as WinDVD or Elecard. The problem is you can't play DVDs directly using this filter; I think the CSS and copy protection handling is missing.

taci
07-02-04, 05:20 PM
Andy,

I am having trouble when I try to resize over 2048 using your latest SSE2 release. Whenever I resize over 2048, I get pin connection error between VMR9 and ffdshow. The maximum horizontal that I can resize is 2048. I am using WinDVD 5.0 for the decoder. Is this a limitation of VMR9 or my hardware? My hardware includes 128 MB Radeon 9800 non-pro and 256 Mbyte of RAM.
Thanks

Taci

AndyIEG
07-02-04, 05:30 PM
Originally posted by taci
Andy,

I am having trouble when I try to resize over 2048 using your latest SSE2 release. Whenever I resize over 2048, I get pin connection error between VMR9 and ffdshow. The maximum horizontal that I can resize is 2048. I am using WinDVD 5.0 for the decoder. Is this a limitation of VMR9 or my hardware? My hardware includes 128 MB Radeon 9800 non-pro and 256 Mbyte of RAM.
Thanks

Taci

Hmm, good question. It seems X is not limited but the Y resolution is. I'm also getting connection errors if I go over 2000 Y with VMR7, but VMR9 works. I can also run 3200x2500 with VMR9 + the Elecard decoder.

Maybe it's a problem with the WinDVD decoder.

PS: with WinDVD5 and Zoom Player I can run 3200x2500 in VMR9 (windowed or windowless).
I have no clue why some systems have problems and some don't... maybe Owen or Blight know what causes this. Again, for me it all works: with VMR9 and WinDVD/Elecard I can do 3808x2500.

hoops10
07-02-04, 05:34 PM
Andy, if I am using an ffdshow that is not yours and I am using a 2500+ barton and I am trying to get 1440x960 to work (which I am getting skipped frames), would your version help me? I am using a Radeon 9600 Pro card also with ZP.

AndyIEG
07-02-04, 05:42 PM
Originally posted by hoops10
Andy, if I am using an ffdshow that is not yours and I am using a 2500+ barton and I am trying to get 1440x960 to work (which I am getting skipped frames), would your version help me? I am using a Radeon 9600 Pro card also with ZP.

Maybe, but only if you use denoise3d. The Barton has no SSE2 support, so the SSE2 code will crash your PC. I released an ffdshow version with an MMX2-optimized version of the denoise3d filter. I will redo some stuff in this MMX2 code and send the changes to Milan so he can include the code in the non-SSE2 version.
I don't want to spend more time on MMX2 code, since it's time consuming and I need a full SSE2 version for an AMD64 build, which is my next big goal. So you can try the non-SSE2 version on my download link. I will gather the stuff which also works for the MMX2 version and send it all to Milan, so the next Athos compile will have a little speed boost too.

PS: A 64-bit version seems to make sense now, since Intel is also releasing an AMD64-compatible CPU, and all I can say is that a 64-bit version is highly recommended for any codec or multimedia software.
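
A minimal sketch of the build choice described above, assuming the CPU's feature flags are already available as a set of lowercase names. The function name and the build labels are made up for illustration and are not part of ffdshow:

```python
def pick_ffdshow_build(cpu_flags):
    """Return which kind of build should be safe for a given set of CPU flags.

    The labels "sse2", "mmx2" and "generic" are illustrative only.
    """
    flags = {f.lower() for f in cpu_flags}
    if "sse2" in flags:
        return "sse2"
    # An Athlon XP (Barton) reports MMX extensions and SSE, but not SSE2.
    if "mmxext" in flags or "sse" in flags:
        return "mmx2"
    return "generic"


barton = {"mmx", "mmxext", "sse", "3dnow", "3dnowext"}
pentium4 = {"mmx", "sse", "sse2"}
print(pick_ffdshow_build(barton))    # mmx2
print(pick_ffdshow_build(pentium4))  # sse2
```

The point is only that the SSE2 build should never be chosen when the flag is absent, since the SSE2 instructions would fault on such a CPU.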

Owen
07-02-04, 06:00 PM
Andy,

Thanks for the new build, nice work.

I would also like to thank Milan Cutka for his ongoing development work and bug fixes.

I am sure that everyone in the HTPC community appreciates the time and effort that both of you have contributed.


Regards,

Owen

hoops10
07-02-04, 06:19 PM
Andy, once you send it to milan and he introduces it into the non sse2 version, where will I be able to download it at? Thanks.

Owen
07-02-04, 06:23 PM
Andy,

I am also unable to use resize above 2000 on my P4 system with Zoom Player and VMR9 (pin connection error), but it works with overlay on my Radeon 9600, DX9b combination.
On my Geforce 5200 notebook the opposite is happening.
I can resize above 2000 with VMR9 but NOT overlay. :confused:


It would appear that this may be a problem with the Radeon drivers.

Also, can you explain what the parameter setting is now doing with Bicubic, and which settings are equivalent to the old Bicubic defaults?



Regards,

Owen

AndyIEG
07-02-04, 06:52 PM
Originally posted by Owen
Andy,

I am also unable to use resize above 2000 on my P4 system with Zoom Player and VMR9 (pin connection error), but it works with overlay on my Radeon 9600, DX9b combination.
On my Geforce 5200 notebook the opposite is happening.
I can resize above 2000 with VMR9 but NOT overlay. :confused:


It would appear that this may be a problem with the Radeon drivers.

Also, can you explain what the parameter setting is now doing with Bicubic, and which settings are equivalent to the old Bicubic defaults?

Regards,

Owen

The Bicubic parameter changes the coefficients used to resize the image. Since the mplayer resizer uses some kind of reduction routine for those coefficients, it's kind of random what tap depth ends up being used. I changed the default parameter from -0.6 to -0.08 because with -0.6 the filter size is larger than with -0.08.
I'm still not sure what those coefficients really mean (I'm not a resize algorithm expert); all I know is that the image looks a bit smoother with the new parameter, but with higher sharpen values I can compensate for this and still use the faster internal routines.
If you want the old setting, set the parameter to around -0.6, but this results in less coefficient reduction and a larger filter size.

PS: I really have no clue what this pin connection error is about; maybe try building a manual graph in GraphEdit. But it seems to be some kind of limitation in the ATI drivers. On my GeForce 5700U I can use VMR7 and VMR9 with values higher than 2000x2000. Overlay seems limited to 2000x2000 max.
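
For anyone wondering what such a bicubic parameter does to the weights, here is a rough sketch. It assumes the parameter is the free constant `a` of the standard Keys cubic convolution kernel; whether the resizer inside ffdshow uses exactly this form is an assumption, and the helper names are invented for the example:

```python
def cubic_weight(x, a):
    """Keys cubic convolution kernel with free parameter a."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def tap_weights(phase, a):
    """Weights of the 4 source pixels around a sample `phase` (0..1) past pixel 0."""
    return [cubic_weight(phase - k, a) for k in (-1, 0, 1, 2)]


for a in (-0.6, -0.08):
    w = tap_weights(0.5, a)
    print(a, [round(v, 4) for v in w], "sum =", round(sum(w), 6))
```

With this kernel the two outer taps are negative and shrink toward zero as `a` gets closer to 0, so -0.08 gives much weaker overshoot (less sharpening) than -0.6, which would fit the "a bit smoother" impression.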

bedo
07-02-04, 09:01 PM
Originally posted by AndyIEG
The new modes you can now use, Lanczos1-10, were already in ffdshow. That means ffdshow always had these modes in the mplayer lib.

Is it possible to include this in the non SSE2 build? This is my chance to say thank you.... Thank you for ffdshow!!!!! :)

Jeff

AndyIEG
07-02-04, 09:39 PM
Originally posted by bedo
Is it possible to include this in the non SSE2 build? This is my chance to say thank you.... Thank you for ffdshow!!!!! :)

Jeff

just use Athos latest build it has the same fix http://athos.leffe.dnsalias.com/

KingKong954
07-02-04, 10:31 PM
Originally posted by AndyIEG
I think the CSS and copy protection handling is missing.

what can I do to get around this? I am all about performance + quality improvements.

Spoonfed
07-02-04, 10:32 PM
Originally posted by AndyIEG
The open source MPEG-2 decoder is the fastest decoder around, it has no chroma bug, and the quality is the same as WinDVD or Elecard. The problem is you can't play DVDs directly with this filter; I think the CSS and copy protection handling is missing.

Would it work with the DVD Region free or AnyDVD running? I guess though it also has no menu "highlight" support much like build 2510 of elecard i have, hmmm trying to move off WinDVD 5 codec for DVD playback :)

Mark_A_W
07-02-04, 11:00 PM
I get a pin connection error with the Gabest opensource decoder and ffdshow. I am playing PAL movies, but it doesn't get far enough that that would matter. I am running Anydvd.

Windvd5 still gives me the best all around results.

Spoonfed
07-03-04, 12:04 AM
Mark,

Yeah same here, for DVD playback its pretty good for me, but i use Elecard for file playback.

BTW, got ReClock working quite well. Managed to get PowerStrip to "engage" and actively fix the refresh at about 50 Hz; well, it gets to 49.996, which is much better than the 50.105 Hz that Windows/the driver manages.
The "non-PowerStrip" refresh gives about 74 or more "dropped" AC3 frames per hour, and I do notice it, especially with music DVDs, but with the 49.996 PowerStrip refresh I think at most 4 were dropped/repeated during the entire Finding Nemo, which is acceptable. However, it seems I sometimes have to open the "display config" section of PowerStrip for it to actively "engage" and monitor/fix the refresh, which is annoying.

Know of any other similar app that can tweak/hold/monitor the refresh?

Is it possible that ReClock could have such an ability added? It would make a big difference, hmmmm I wonder.

vpopovic
07-03-04, 02:00 AM
Originally posted by AndyIEG
Hmm, good question. It seems X is not limited but the Y resolution is. I'm also getting connection errors if I go over 2000 in Y with VMR7, but VMR9 works. I can also run 3200x2500 with VMR9 + the Elecard decoder.

Maybe it's a problem with the WinDVD decoder.

PS: with WinDVD 5 and Zoom Player I can run 3200x2500 in VMR9 (windowed or windowless).
I have no clue why some systems have problems and some don't... maybe Owen or Blight know what causes this. Again, for me it all works; with VMR9 and WinDVD/Elecard I can do 3808x2500.

Andy,

Are you saying you have a smooth playback at 3808x2500? That would be something...

On my Quadro FX 1100 I can do above 2000 resize with both VMR7 and VMR9. When I check "prefer overlay" in FM 3.0 Beta, it just kicks in VMR9 (it probably can't connect to overlay, so it defaults to VMR9).

vpopovic
07-03-04, 02:16 AM
Originally posted by AndyIEG
The Bicubic parameter changes the coefficients used to resize the image. Since the mplayer resizer uses some kind of reduction routine for those coefficients, it's kind of random what tap depth ends up being used. I changed the default parameter from -0.6 to -0.08 because with -0.6 the filter size is larger than with -0.08.
I'm still not sure what those coefficients really mean (I'm not a resize algorithm expert); all I know is that the image looks a bit smoother with the new parameter, but with higher sharpen values I can compensate for this and still use the faster internal routines.
If you want the old setting, set the parameter to around -0.6, but this results in less coefficient reduction and a larger filter size.



A coef above 0.5 sharpens the image in bicubic; less than 0.5 does not. With 0.08 you are not sharpening. As for how 0.08 differs from 0.5, there is a good write-up on resize coefs and how they work on the Avisynth page.

yesgrey3
07-03-04, 05:33 AM
One of the changes recently made by milan in ffdshow was what I was expecting for a while...
Now we can change the position of deinterlacing.
Thanks milan! for viewing video interlaced material this is good news.

I have played with 20040701_SSE2 version for a while and the results are very good.

After a few tests, the best image for now is with the following configuration:

-change the order of deinterlacing and put it after resizing
-select denoise3d with everything set to zero, HQ mode
-resize to 1280x1024 with Lanczos3 or Lanczos4 and check the interlaced check box (not the grayed check)
-deinterlace with 5-tap low pass
-output YV12

In this configuration the resizing is done for each interlaced field and the deinterlacing is done after the resizing, so the image is much smoother. The menu letters are slightly blurred, but the moving images are much better, with less stair-stepping.
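
To illustrate what "resizing each interlaced field" means, here is a toy sketch. It treats a frame as a list of rows and uses simple row repetition in place of a real Lanczos resize; all function names are invented for the example and this is not how ffdshow itself is written:

```python
def split_fields(frame):
    """Split a frame (list of rows) into top (even rows) and bottom (odd rows) fields."""
    return frame[0::2], frame[1::2]


def resize_field(field, factor):
    """Nearest-neighbour vertical upscale: repeat each row `factor` times."""
    return [row for row in field for _ in range(factor)]


def weave(top, bottom):
    """Interleave two equal-height fields back into one frame."""
    frame = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    return frame


def interlaced_resize(frame, factor):
    """Resize each field on its own, then weave the fields back together."""
    top, bottom = split_fields(frame)
    return weave(resize_field(top, factor), resize_field(bottom, factor))


frame = ["T0", "B0", "T1", "B1"]
print(interlaced_resize(frame, 2))
# ['T0', 'B0', 'T0', 'B0', 'T1', 'B1', 'T1', 'B1']
```

Even output rows only ever contain top-field data and odd rows only bottom-field data, so the two fields are never blended by the resize and the deinterlacer that runs afterwards still sees clean fields.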

The higher you resize, the better the results, I suppose.
With this configuration it stutters a bit, even with only 53% CPU utilization.
I had less stutter with the Elecard decoder; the Sonic one gave me more.
Also, with the overlay mixer I had less stutter than with VMR9.

I selected denoise3d because without it the image stutters more, and some strange horizontal lines also appear in the image. With it the image is OK.

With this configuration we must use Lanczos3 as the minimum.
All the lower methods give a pixellated image.

Please give it a try and let us know about your experiences...

Now for AndyIEG (and also to milan),
Thanks for all the great work you are doing with ffdshow.
I would like to ask if there are any SSE2 optimizations that you can implement in the deinterlacing code?
I feel the 5-tap low pass is the best one.

My HTPC:
P4 2.4C @ 3.0GHz
512MB
Radeon9500 softmod to 9500pro

Sam

TheLion
07-03-04, 07:17 AM
@AndyIEG

I just finished testing your present release. Conclusion: by FAR the best ffdshow release ever. Speed for Lanczos resizing increased by about 12% on average on my P4. Lanczos 3/4 works flawlessly -> no more green or horizontal lines bug - what a relief!

I have always preferred resizing to my "native" output resolution (720p) over multiple resizing - ANY resizing comes with artifacts! The reason why some (most) of you prefer resizing to very high resolutions is, first of all, that the picture "appears" to be sharper (upsizing an image and then downscaling it is a commonly used method of sharpening a photo in (professional) digital photography). At the same time, picture details appear smoother (less contrast in details) because the picture gets downscaled by the bilinear filter of VMR9 (overlay is a different story), which results in a considerably weaker frequency response. Take a look at resolution test charts with very fine detail (PAL 576TVL).

@AndyIEG
One (final) wish for future ffdshow releases remains: PLEASE make the other Lanczos tap settings (5-8) work as well as Lanczos3/4 -> no green bug (everything above Lanczos4 still has the green bug), no horizontal lines bug, SSE2 speed advantages. I consider the sweet spot of resizing to lie anywhere between Lanczos4 and Lanczos8. I can easily come up with DVD sequences showing clear advantages of Lanczos6/7/8 over 3/4. Everything higher introduces way too much ringing to be useful.

Everybody should try Lanczos1 or Lanczos2 and compare it to higher settings to see my point!
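
For reference, a small sketch of the textbook Lanczos kernel, to show how the number after "Lanczos" relates to the tap count and to the negative lobes that cause ringing. This is the standard definition, not code taken from ffdshow or mplayer, and the helper names are invented:

```python
import math


def lanczos(x, a):
    """Lanczos kernel with `a` lobes: sinc(x) * sinc(x / a) for |x| < a."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_taps(phase, a):
    """Normalised weights for the 2*a source pixels around a sample point."""
    offsets = range(-a + 1, a + 1)
    raw = [lanczos(phase - k, a) for k in offsets]
    total = sum(raw)
    return [w / total for w in raw]


for a in (2, 3, 4, 8):
    taps = lanczos_taps(0.5, a)
    print(f"Lanczos{a}: {len(taps)} taps, most negative weight {min(taps):.4f}")
```

Under this definition LanczosN reads 2*N source pixels per output pixel in each direction, so the higher settings cost more CPU, and the extra alternating lobes are where the additional ringing comes from.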

Thank you very much!

Spoonfed
07-03-04, 07:45 AM
yesgrey3

I read your post with interest on the ability to "move" the deinterlacing position. I did try this briefly, but it seems not to give any improvement, and perhaps introduced motion artifacts.

Interestingly though, on my 2400+ Andy's latest non-SSE2 version is a massive 20% more CPU efficient with the same settings in the same order. It seems to be his denoise that is much more efficient. Good work.

AndyIEG
07-03-04, 10:31 AM
@yesgrey3
"I selected denoise3d because without it the image stutters more, and some strange horizontal lines also appear in the image. With it the image is OK."

You can also use Levels or gradual denoise at 0. This is because with a filter enabled before resize, YV12 is used, and at the moment my new resize code only works on YV12 input. I just noticed that some of you also use only resize without anything before it... so YUY2 is the input; I will rewrite this routine as well.

Since the old code is used with YUY2 input, you see the old bugs.

@TheLion like I stated, everything above Lanczos4/spline/sinc uses the old MMX2 code. I will try to implement the changes for higher tap depths too. The main problem is that the 4-tap and 8-tap routines fit perfectly in the SSE2 registers; all 5/6/7-tap filters use the 8-tap routine since they don't fit. For everything over 8 taps (Lanczos5+) it's a head-scratching problem to write a routine which is as fast as the 8-tap version.
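
The routine selection described here could be sketched roughly as follows. The routine names are illustrative labels, and sending filters smaller than 4 taps to the 4-tap routine is an assumption, since the post only mentions the 4, 5/6/7, 8 and larger cases:

```python
def pick_resize_routine(filter_size):
    """Map a filter size (number of taps) to a routine, as described in the post.

    Returns (routine_name, taps_actually_processed). Names are illustrative only.
    """
    if filter_size < 1:
        raise ValueError("filter size must be at least 1")
    if filter_size <= 4:
        return "sse2_4tap", 4
    if filter_size <= 8:
        # 5/6/7-tap filters are padded up to the 8-tap routine
        return "sse2_8tap", 8
    # anything larger falls back to the older, slower code path
    return "mmx2_generic", filter_size


for size in (4, 6, 8, 10):
    print(size, pick_resize_routine(size))
```

This also shows why a 5-tap filter costs the same as an 8-tap one in such a scheme, so there is little speed to gain from settings that land between 4 and 8 taps.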

OK, I must admit that the release was not fully finished, but the stuff I had done worked, so I released the version. I will try to finish the code for YUY2 input and filter sizes higher than 8.

As for deinterlacing... um, I don't have interlaced material for testing. Deinterlacing had a low priority on my todo list, since I assumed no one really needs it because most DVDs and other material aren't interlaced anymore, and the WinDVD and other decoders also do a good job, I think?

@vpopovic Nah, I can't play back 3800x2500; that was just to test how high I can connect to the renderer. The best I had so far without any other filter, using the fast bicubic mode and XviD AVI playback, was 3000x2500 or so.
As for the -0.08 coef, I don't really know what happens, but as a result the reduction routine can reduce the filter size to 4 (chroma/luma/Y) and the fastest routines are used. For nearly any other coef the filter size is between 4 and 8, which means with some coefs you could also use Lanczos3/4 at the same speed. Since you can still change the parameter, I decided to default to a value which uses the fastest possible routines. There are also other parameters which result in filter size 4, but the image looked strange to me, so I chose -0.08.

.....
07-03-04, 11:53 AM
Originally posted by AndyIEG
just use Athos latest build it has the same fix http://athos.leffe.dnsalias.com/

Are you sure about this? I tried the 20040629 one yesterday, and on my system it uses 25% more CPU than yours (20040607a) with the exact same settings. Also, the changelog on that site doesn't mention any of your work.

Never mind, I noticed the question was regarding a different feature, not denoise3d. However, it would be REALLY nice if they included your optimized denoise3d code.

vpopovic
07-03-04, 06:54 PM
Originally posted by TheLion
I have always preferred resizing to my "native" output resolution (720p) over multiple resizing - ANY resizing comes with artifacts! The reason why some (most) of you prefer resizing to very high resolutions is, first of all, that the picture "appears" to be sharper (upsizing an image and then downscaling it is a commonly used method of sharpening a photo in (professional) digital photography). At the same time, picture details appear smoother (less contrast in details) because the picture gets downscaled by the bilinear filter of VMR9 (overlay is a different story), which results in a considerably weaker frequency response. Take a look at resolution test charts with very fine detail (PAL 576TVL).

The reason I upsize beyond my 720p display resolution is to smooth the image and get more detail. FFDshow 720p resize compared to 2560x1440 or 1920x1440 for display at 720p introduces a new level of detail in the image. It also reduces the noise, blocking, banding, etc. in the background.

VMR9 uses bilinear filtering, 2x2=4 tap, in its default settings, but if you use 8x antialiasing you are using 4x8=32 tap. There are many problems in making this work, and at the end of the day the benefit is not more than 10-15% (together with some other 3D tweaks). The result is increased contrast, color saturation and level of detail. As for the resolution test (both chroma and luma patterns on DVD), in my system the higher you go with resize, the better it resolves detail.

vpopovic
07-03-04, 07:08 PM
Andy,

You did a great job with bicubic default IMO. It looks good enough and is fast. Having it optimized for speed with default settings is a great idea.

Overall I am very, very pleased with this release, which increased overall performance and quality to a significant degree. For the first time I have hit the limit on the GPU - I actually can't go above 1920x1440 without some micro-tearing issues on the Quadro FX 1100. The CPU can easily do 2560x1440 at some 80-85% utilization, but the GPU starts causing problems. It looks like I am either hitting the limit of the 128-bit memory interface, or all the beta crap on my system just can't take it any more. Can your FX 5700 Ultra do VMR7/9 above 2000 without any issues?

Ash Sharma
07-03-04, 08:14 PM
Andy,
I have been playing with ffdshow and TheaterTek on my HTPC (3.4 GHz P4, 2 GB RAM).
CPU utilization is 0 to 1% in the OSD.
The problem is that I cannot resize to anything apart from 1280x720, which is my BenQ 8700 DLP projector's native resolution.
Any other resize (say 1440x960) gives the wrong aspect ratio (I have set "do not correct aspect ratio").
I would love to resize higher to make full use of ffdshow - any ideas?
Thanks

jvincent
07-03-04, 08:21 PM
Originally posted by Ash Sharma

Any other resize (say 1440x960) gives the wrong aspect ratio (I have set "do not correct aspect ratio").


Ash,

When you use ffdshow with TT you have to redefine the aspect ratios in the TT AR editor.

Uncheck the "Keep AR" box in TT and change the size to fit your display.

KingKong954
07-03-04, 08:42 PM
Originally posted by .....
Are you sure about this? I tried the 20040629 one yesterday, and on my system it uses 25% more CPU than yours (20040607a) with the exact same settings. Also, the changelog on that site doesn't mention any of your work.

Never mind, I noticed the question was regarding a different feature, not denoise3d. However, it would be REALLY nice if they included your optimized denoise3d code.

I agree -- the lack of the line issues is AMAZING to have (from Athos' build), but I really need that great denoise3d code back (from good ol' andy), because it's causing stuttering.

keep up the great work, guys!

Ash Sharma
07-03-04, 08:56 PM
Jvincent
Thanks
After weeks of frustration problem solved
I love this forum
Ash

llamameat
07-04-04, 02:57 AM
Vpopovic,
Correct me if I'm wrong, but antialiasing has nothing to do with video playback and will not affect the image in any way. Antialiasing has to do with decreasing jaggies or 'stair-stepping' in polygons at angles on low resolution displays. These are algorithms that search for jaggies and blur them away. VMR is basically just 2 polygons making a square with 2 moving textures laid over the top (which allows it to use the GPU on modern video cards); there are no polygons in the image that it would alter.
Bilinear filtering applies blending to 2D textures and is a dramatic effect that is easily noticeable (in gaming it's better to have slightly blurred textures than blocky low-res textures). I know what bilinear filtering looks like and it's definitely not being applied.
Anisotropic filtering has no effect either, nor does mip-mapping or any of the other features which are designed for 3D gaming. If you're seeing some sort of beneficial difference when playing with these features, it's more likely some sort of fortunate accident in the drivers than anything else.

There's just not enough gamers on this forum to squash these rumors :)

Spoonfed
07-04-04, 03:36 AM
On the topic of "jaggies", I seem to have more of these lately for all sources, using FFDShow or not.

I use Overlay on my 9600 Pro. Does the AA mentioned apply in Overlay? Is this a hardware "fixed" thing or is it driver related? In which case, my last Catalyst driver update may have been a backward move?

Michele Spinolo
07-04-04, 04:11 AM
Originally posted by AndyIEG

I don't want to spend more time on MMX2 code, since it's time consuming and I need a full SSE2 version for an AMD64 build, which is my next big goal.


PS: A 64-bit version seems to make sense now, since Intel is also releasing an AMD64-compatible CPU, and all I can say is that a 64-bit version is highly recommended for any codec or multimedia software.

Andy, just a quick question!

Using a 64bit ffdshow compiled with an AMD64 platform will be enough to take advantage of 64bit optimization, or WinXP64 will be needed too?:confused:

Grooby
07-04-04, 07:25 AM
You know i must be doing something wrong...
i use a P4 3.2 oc'd to 3.5 with 1gig of RAM, a 9600XT oc'd graphics card, zoomplayer pro with latest SSE2 build of ffdshow and i cant resize at all. if i try the picture jerks all over the place, even at x2 PAL 1440 x 1152.

The CPU usage is real low as well, about 30%

Any ideas where i am going wrong?

Thanks
Andy

AndyIEG
07-04-04, 10:41 AM
@Michele Spinolo you will need the WinXP 64-bit edition too.

@Grooby "if i try the picture jerks all over the place" - what does this mean exactly? Do you have a filter enabled before resize? What other settings do you use?
Does anyone else have similar problems?

vpopovic
07-04-04, 11:17 AM
Originally posted by llamameat
Vpopovic,
Correct me if I'm wrong, but antialiasing has nothing to do with video playback and will not affect the image in any way. Antialiasing has to do with decreasing jaggies or 'stair-stepping' in polygons at angles on low resolution displays. These are algorithms that search for jaggies and blur them away. VMR is basically just 2 polygons making a square with 2 moving textures laid over the top (which allows it to use the GPU on modern video cards); there are no polygons in the image that it would alter.
Bilinear filtering applies blending to 2D textures and is a dramatic effect that is easily noticeable (in gaming it's better to have slightly blurred textures than blocky low-res textures). I know what bilinear filtering looks like and it's definitely not being applied.
Anisotropic filtering has no effect either, nor does mip-mapping or any of the other features which are designed for 3D gaming. If you're seeing some sort of beneficial difference when playing with these features, it's more likely some sort of fortunate accident in the drivers than anything else.

There's just not enough gamers on this forum to squash these rumors :)

I apologise - I meant anisotropic filtering instead of antialiasing as far as scaling and taps go. Antialiasing has an effect, but obviously a different one. So do the LOD setting and most of the other 3D settings. There are a couple of my posts in different threads that discuss this, together with some low quality screenshots (but the difference is still visible).

I agree with you that it might be a fortunate coincidence in the drivers, and I addressed that as well (in terms how difficult it is to make it work).

Azzad
07-04-04, 11:18 AM
Grooby,

Just try putting a filter before your resize. A good one to try would be Unsharp set to minimum. You move the order of the filters by selecting the filter and then using the arrows which appear.

On my P4 at 3.35GHz I get stutters at 2 x PAL DVD (1440 x 1152) if I don't have a filter before resize.

Aaron

cetoole
07-04-04, 11:48 AM
Vlad,
Since you are reaching the limits of your GPU, what is next? Is some lucky person getting an upgrade?

vpopovic
07-04-04, 01:13 PM
I am looking for an upgrade, but I'm not sure if I will pull the trigger any time soon. I am actually really happy with the image quality at this point. The new FFDShow and all the 3D tweaks are putting DVD very close to 720p HD captured through component inputs.

I will probably try GeForce 6800 Ultra at one point.

cyberbri
07-04-04, 01:46 PM
Originally posted by llamameat
Vpopovic,
Correct me if I'm wrong, but antialiasing has nothing to do with video playback and will not affect the image in any way. Antialiasing has to do with decreasing jaggies or 'stair-stepping' in polygons at angles on low resolution displays.

Actually they can appear more the higher-res your display goes. Gamers who switch from CRTs to HD DLPs or RP-LCDs complain about the PQ, because all that kind of stuff is suddenly blown up and displayed as-is, instead of being blurred by a CRT. The more accurate and detailed a TV is, the more you'll see the bad stuff, like aliasing (the PS2 is notorious for it, as anti-aliasing is known to be very hard to implement in the programming - overall the PS2 is very hard to program for anyway compared to Xbox, etc.).

It's not just polygons this affects. Any time you take a vertical or horizontal line or edge and start rotating it slightly, you'll get jaggies. E.g., I was spending some time trying to tweak ffdshow settings to get Matrix Reloaded to show correctly (for some reason, it comes out super-dark on ZP-ffdshow and needs a ton of boosting, but it's hard to nail down the right mix), using the 2nd or 3rd ("Upgrades") chapter to check settings (where Neo comes up during the meeting after someone gave him a gift at the door - "tell him he set me free"). When the "upgrades" walk in, I got jaggies on the suit collars, between the collars and white shirts on these guys.


I've switched from unsharp mask to xsharp (or whatever the first one is) and have been playing around with other stuff as well - still can't get something that seems "right" on Matrix Reloaded - maybe overall "dark" movies are harder to do because you have to worry so much about darks/blacks and shadow detail...

blackmax2k1
07-04-04, 03:32 PM
I must be doing something wrong too, because I have a P4 2.8 with a 9600 Pro and I get 100% CPU usage with the PowerDVD decoders in ZP! I'm using Blur/NR denoise3d before the resize to 1440x960. I used to use the WinDVD decoders with the same settings, except for gradual denoise instead of 3d, with a 2500+ Barton and it worked fine. I would think a 2.8 could handle denoise3d where I wouldn't have to use the crappy PowerDVD decoders. One other thing: I'm using Overlay, not VMR9. Any suggestions would be appreciated.

I just tried going from denoise3d to gradual and it still flickers at 100%! I checked all the decoders to see if hardware acceleration was on and it's not.

Azzad
07-04-04, 09:36 PM
blackmax2k1

Your results seem about right to me. Gradual Denoise is far less CPU intensive than 3D denoise. A P4 2.8 is struggling to apply a noise filter and resize to 2xDVD. Lift your system up to about 3.2GHz or more and you should be able to run 3D Denoise with that resize IME.

Realtime Video processing is a serious amount of work for your CPU.
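
To put rough numbers on that, here is a quick calculation of how many output pixels per second the resizer has to produce for a few of the target sizes mentioned in this thread. It is only a pixel count under assumed frame rates (25 fps for PAL, 30000/1001 fps for NTSC) and says nothing about the cost per pixel of a given filter:

```python
def output_pixels_per_second(width, height, fps):
    """Number of output pixels the resizer must produce every second."""
    return width * height * fps


PAL_FPS = 25
NTSC_FPS = 30000 / 1001

targets = [
    ("PAL DVD 720x576", 720, 576, PAL_FPS),
    ("2x PAL 1440x1152", 1440, 1152, PAL_FPS),
    ("2x NTSC 1440x960", 1440, 960, NTSC_FPS),
    ("1920x1728", 1920, 1728, PAL_FPS),
]

for name, w, h, fps in targets:
    rate = output_pixels_per_second(w, h, fps)
    print(f"{name}: {rate / 1e6:.1f} million pixels per second")
```

Doubling both dimensions quadruples the pixel count, and every one of those pixels goes through a multi-tap filter, which is why a few hundred MHz of extra clock can decide between smooth and stuttering playback.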

Aaron

gazzagazza
07-04-04, 10:22 PM
Originally posted by blackmax2k1
I must be doing something wrong also cause I have a P4 2.8 with 9600pro and I get 100% CPU usage

I have a similar setup, except I have overclocked to 3.4G. I can run what you are trying without stutters, especially with the latest ffdshow.

blackmax2k1
07-04-04, 11:46 PM
So what you are saying is I shouldn't have upgraded from an AMD 2500 to the P4 2.8 to get any better performance? I would think it could do a little better. I did say that even with using just gradual denoise my usage was up to 100%. That doesn't make sense to me.

Azzad
07-05-04, 12:38 AM
blackmax2k1

One other thing to try is turning off the On Screen Display (OSD) it also takes a lot of CPU cycles. What really matters is if you are able to get smooth playback.

I have recently upgraded from a Celeron 2.2 clocked at 2.7GHz to a P4 2.8, and it was unable to run 3D denoise then resize to 1440x1152 (2xPAL). Overclocking to about 3.2 - 3.3 GHz gave the necessary CPU power to make playback smooth.

For realtime Video processing you need a little bit of CPU headroom to allow for peaks in CPU demand without dropped frames. Whilst the steady power of a 2.8 may be just enough you will drop frames when the CPU has to work a little harder to process a particular scene or service another application.
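
A back-of-the-envelope sketch of the headroom argument. It assumes a single CPU where average utilization is simply the average share of each frame period spent processing, with no buffering between frames; that is a simplification, and the function names are made up for the example:

```python
def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def max_peak_factor(avg_cpu_fraction):
    """How much longer than average a frame may take before missing its deadline.

    Assumes one frame must be finished per frame period, with no buffering.
    """
    if not 0.0 < avg_cpu_fraction <= 1.0:
        raise ValueError("average CPU fraction must be in (0, 1]")
    return 1.0 / avg_cpu_fraction


budget = frame_budget_ms(25)  # PAL: 25 frames per second
for load in (0.50, 0.85, 0.97):
    avg_ms = budget * load
    print(f"{load:.0%} average load: {avg_ms:.1f} ms of {budget:.1f} ms used, "
          f"peaks up to {max_peak_factor(load):.2f}x average still fit")
```

In this simple model a system averaging 97% can only absorb frames that are about 3% harder than usual, while one averaging 85% tolerates peaks of roughly 18%, which matches the experience that "just enough" CPU drops frames on demanding scenes.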

If you have a Northwood 2.8C then it will easily overclock to at least 3.2GHz and probably 3.5GHz. At 3.5GHz you will need to slow down your RAM, but this doesn't have a big effect on FFDshow speed IME. My 2.8C runs fine at 250 MHz FSB (3.50 GHz) with the RAM running at 333 MHz.

Aaron

Owen
07-05-04, 05:54 AM
Originally posted by blackmax2k1
So what you are saying is I shouldn't have upgraded from an AMD 2500 to the P4 2.8 to get any better performance? I would think it could do a little better. I did say that even with using just gradual denoise my usage was up to 100%. That doesn't make sense to me.

The P4 should definitely be faster.
Is your mother board Intel based?
Don’t worry about CPU usage. As long as playback is smooth, be happy.

I just tested my desktop Athlon XP overclocked to 2900+ with Geforce FX5200 video and compared it with my 3.06G P4 laptop with FX5200 Go video.

The P4 system can run Denoise3d followed by resize to 1440x1152 using overlay or VMR7 without problems, but VMR9 is out of the question with any useful resize, even using Andy’s ffdshow-20040701b_SSE2.exe SSE2 build.

The overclocked Athlon XP system is slower and has no chance of running these settings, even with overlay using Andy’s ffdshow-20040607a.exe MMX2 build. Although I think it should run better than that, it will never be a match for a good 2.8 P4 system.

The new SSE2 version of FFDShow has widened the performance gap between Athlon XP and P4 even more since the Athlon XP does not support SSE2.

My HTPC with P4 overclocked to 3.5 with overclocked Radeon 9600 Pro, blows the other systems away, and can run Denoise3d followed by resize to 1920x1728 with VMR9

Regards,

Owen

___________________________
The FFDShow resize-sharpen dude.

EC51
07-05-04, 06:02 AM
A question.

If I'm not mistaken, the Athlon64 line of CPUs supports SSE2. How do Athlon64 / FX CPUs perform in terms of SSE2 performance compared to the Intel Pentium 4?

Spoonfed
07-05-04, 06:04 AM
My 2400+ tbird on A7V333 non o'clocked runs 1440 x 1152 (2x PAL) bicubic with a 3D-Denoise First ..........JUST :) This is with Andy's version which gives a massive 20% cpu improvement.

I thought this wasn't too bad for a 1.5-year-old system that was not "cutting edge" even when I built it.

Owen
07-05-04, 06:38 AM
Originally posted by EC51
A question.

If I'm not mistaken, the Athlon64 line of CPUs supports SSE2. How do Athlon64 / FX CPUs perform in terms of SSE2 performance compared to the Intel Pentium 4?

From the feedback we have been getting here, it seems that the Athlon 64 with SSE2 is better than the Athlon XPs, but even an Athlon 64 FX cannot outperform a good P4 system at this time.

Regards,

Owen


Owen
07-05-04, 06:45 AM
Originally posted by Spoonfed
My 2400+ tbird on A7V333 non o'clocked runs 1440 x 1152 (2x PAL) bicubic with a 3D-Denoise First ..........JUST :) This is with Andy's version which gives a massive 20% cpu improvement.

I thought this wasn't too bad for a 1.5-year-old system that was not "cutting edge" even when I built it.

That’s exceptional performance for a 2400+, Spoonfed.
What do you mean by “JUST”? :D
If you can't run it with ZERO dropped frames, it does not count. ;)


Regards,

Owen


Spoonfed
07-05-04, 07:01 AM
Yes, I'm quite impressed also.

No dropped frames. The "frame" thingy in FFDShow flickers erratically; not sure what this means, but I can't "see" any dropped frames.

CPU is 82 - 91%

Lanczos is 91 to 97% CPU

Given that double DVD is not a bad "start" for FFDShow, and that an Athlon 2400+ is a cheap-as-chips CPU, perhaps it's not a bad "budget" option after all?

The only issue I have is that for "some" DTV sources I need to deinterlace, which adds 5% CPU.....grrrrrr stupid interlaced video :(

Also the Codec is Elecard build 2510

Grooby
07-05-04, 08:54 AM
Well, I can't resize at all on an oc'd P4 3.5, 1 gig RAM, oc'd 9600XT card, Zoom Player Pro 4, ReClock, Cinemaster filters...

I'd like to do 1440 x 1152 minimum for PAL with -
Sharpen - can manage this ok
Denoise 3D HQ - forget about it
Lanczos 4 or Bicubic - ok
Overlay - ok
YV12

I can forget about VMR9 - it stutters like hell, even using ReClock to try to help.

I've used the last 2 SSE2 builds and the normal build and can't resize at all!!! Been banging my head against the wall...

Please, somebody tell me what I have to do to get PAL configured for resize and smooth playback...hell...smooth playback of any kind would be nice!!

Thanks
Andy

Owen
07-05-04, 10:29 AM
Originally posted by Grooby
Well, I can't resize at all on an oc'd P4 3.5, 1 gig RAM, oc'd 9600XT card, Zoom Player Pro 4, ReClock, Cinemaster filters...

I'd like to do 1440 x 1152 minimum for PAL with -
Sharpen - can manage this ok
Denoise 3D HQ - forget about it
Lanczos 4 or Bicubic - ok
Overlay - ok
YV12

I can forget about VMR9 - it stutters like hell, even using ReClock to try to help.

I've used the last 2 SSE2 builds and the normal build and can't resize at all!!! Been banging my head against the wall...

Please, somebody tell me what I have to do to get PAL configured for resize and smooth playback...hell...smooth playback of any kind would be nice!!

Thanks
Andy

What motherboard are you using?
Is it an Intel-based 865 or 875 board with dual channel RAM?
What is your CPU usage with the system idle?
You need to use a filter like Denoise3d or Gradual Denoise BEFORE resize or it will stutter.
If all else fails, you may need to do a clean Win XP install and start from scratch.



Regards,

Owen

blackmax2k1
07-05-04, 12:15 PM
Originally posted by Azzad



If you have a Northwood 2.8C then it will easily overclock to at least 3.2GHz and probably 3.5GHz. At 3.5GHz you will need to slow down your RAM, but this doesn't have a big effect on FFDShow speed IME. My 2.8C runs fine at 250 MHz FSB (3.50 GHz) with the RAM running at 333 MHz.

Aaron

Too bad I have a Dell 2.8 with an Intel mobo, so I can't overclock. I have 512MB of RAM; would going to 1GB and using dual channel help?

cetoole
07-05-04, 12:23 PM
Owen,
Did overclocking your 9600 pro make any changes in your system?

Vern Dias
07-05-04, 01:00 PM
I would like to point you all to my post:

http://www.avsforum.com/avs-vb/showthread.php?postid=4018775#post4018775

in another thread.

Owen, could you please look at this? I don't know if it is related to my P4, or hyperthreading, or what, but it is a major issue when trying to calibrate levels with FFDShow.

Vern

gazzagazza
07-05-04, 06:44 PM
Originally posted by Spoonfed
Also the Codec is Elecard build 2510

I tried this codec... works well, but it has a watermark in upper right. Is this removable?

gazzagazza
07-05-04, 06:48 PM
Originally posted by Grooby
Please, somebody tell me what I have to do to get PAL configured for resize and smooth playback...hell...smooth playback of any kind would be nice!!

Thanks
Andy

Andy,

Are you using Win XP? Is the machine used for anything else, or HTPC dedicated? If dedicated have you been through and optimized the OS, got rid of unnecessary services etc?

bedo
07-05-04, 08:07 PM
Originally posted by AndyIEG
just use Athos latest build it has the same fix http://athos.leffe.dnsalias.com/

Well, I tried your suggestion and dl'ed ffdshow-20040629.exe from there...

I had never been able to do anything beyond 1152x768 with everything disabled and I am now running stable at 1440x960 with denoise3d before resize!!
Hope I didn't jinx myself! I am running a 2800+ with a abit nf7-s v2.

Thank you very much for this unexpected find.

Keep up the good work, I hope I can join you down the road with some more righteous hardware.

bedo

AndyIEG
07-05-04, 08:12 PM
hehe,

You want to hear the funny part? I had nothing to do with this build; my advice was just that Athos' new version also has the parameter fix for scaling. It still doesn't have the MMX2 denoise3d code or anything else from the SSE2 version. But I'm glad you are happy.

So don't thank me, send your thanks to Milan and Athos :)

@gazzagazza My PC is a totally mixed, or should I say messed up, WinXP install at the moment, meaning I have lots of tools, tweaks, SDK packs, 5 different compilers and other programming stuff installed. I'm still quite happy that this install works without problems for programming, gaming and my silly DVD/XviD playback needs. Yes, I disabled services; I'm using the latest Nvidia beta drivers and running a cheap ASRock AMD64 SiS mobo with the latest AGP drivers.
I'm really not sure why you don't have smooth playback. Is this only for DVD, or XviD also? Did you try Media Player Classic or Crystal Player?

Vern Dias
07-05-04, 09:32 PM
AndyIEG, can you look into the Picture properties - Luminance offset issue for me, please? The gory details are here:

http://www.avsforum.com/avs-vb/showthread.php?postid=4018775#post4018775

Blight thinks it might be a table lookup error in the optimization routines.

Thanks, Vern

gazzagazza
07-05-04, 09:35 PM
Originally posted by AndyIEG
@gazzagazza My PC is a totally mixed, or should I say messed up, WinXP install at the moment, meaning I have lots of tools, tweaks, SDK packs, 5 different compilers and other programming stuff installed. I'm still quite happy that this install works without problems for programming, gaming and my silly DVD/XviD playback needs. Yes, I disabled services; I'm using the latest Nvidia beta drivers and running a cheap ASRock AMD64 SiS mobo with the latest AGP drivers.
I'm really not sure why you don't have smooth playback. Is this only for DVD, or XviD also? Did you try Media Player Classic or Crystal Player?

Sorry, by Andy I was talking to "Grooby" who signed himself Andy... he was asking about why he can't get the performance he expected...

Mark_A_W
07-05-04, 10:15 PM
I noticed that if you use Gamma in Levels it will clip black and white on the DVE ramp test pattern (topic of some discussion in the pluge thread....)

hoops10
07-05-04, 11:08 PM
bedo: I have a similar setup to yours. What are all the settings that you are using? Thanks.

gamma_seraph
07-06-04, 01:11 AM
I would like to report my success:

Intel P4 2.8c
MSI 865PE chipset
Kingston value RAM 128MB x2 - running dual channel
Radeon 9800 Pro 128MB
Fresh install of WinXP pro with all newest updates and drivers
Latest Zoom Player and FFDshow with PowerDVD 5 filters.
Got FFdshow from here (http://athos.leffe.dnsalias.com/)
Used settings for FFDshow found here: HTPC News (http://htpcnews.com/main.php?id=ffdshowdvd_1)

No Overclocking in any fashion

PowerDVD plays just fine but Zoom Player locks up after messing with FFDshow. Will not play anything for more than 5 sec now.

Turned off Hyper-Threading

Zoomplayer works now but has horrible stuttering. Any tweaking to FFDshow yields poor results. PowerDVD plays the same.

Uninstalled FFDshow and reinstalled this version: ffdshow-20040701b_SSE2.exe found here (http://mitglied.lycos.de/ieggei2/ffdshow/)
I left the setting on bi-cubic defaults

Runs smooth as silk! Looks great! Now begins the endless tweaking! :)

Thanks AndyIEG!!!

Vern Dias
07-06-04, 07:42 AM
I have found this to be a common occurrence when I forget to uninstall a previous version before upgrading....

Vern

madpoet
07-06-04, 08:27 AM
Ditto here... I installed a new version over the top and suddenly had horrific stuttering at only 40% usage... I was pulling my hair out a bit ;).

Owen, if you don't mind me asking, what are you using for settings these days? We have very similar setups, so I'm curious if I can get the performance out of mine that you are.

bedo
07-06-04, 10:36 AM
Originally posted by hoops10
bedo: I have a similar setup to yours. What are all the settings that you are using? Thanks.

Sure, although the only thing I have enabled at this point is resize at 1440x960. Funny how some movies seem to have difficulty while others I can run denoise3d. I also have a PDI Deluxe and I have to say I prefer Zoomplayer + ffdshow.

Vern Dias
07-06-04, 04:09 PM
For those who want to see the banding I mentioned when using Luminance offset:

http://webpages.charter.net/tvdias/ffd-band.jpg

Zoom on the image to full size. The left half is unprocessed, the right half is processed at a Luminance offset of -10, luma gain of 132.

Look at the banding within each of the vertical bars. The width of the bands varies depending on the IRE level of the test pattern.

Vern

Spoonfed
07-06-04, 05:04 PM
I upgrade over the top every time (even downgrade sometimes) without problems.

hoops10
07-06-04, 08:50 PM
bedo, the only filter you have enabled is resize? Do you plan on enabling Blur and NR, or Sharpen?

gamma_seraph
07-06-04, 11:42 PM
Could someone help me wrap my brain around this?

I own the X1 with a native resolution of 800x600.
I currently have switched my desktop between 800x600 and 1024x768 to test the picture.
I upconvert now to 1440x960 with FFdshow and ZoomPlayer.

Do I need to set the resolution of my desktop to 1440x960 in order to gain the maximum benefit of up converting? What is going on here?

I would appreciate if someone could give me the lowdown on the basic resolution setup.

I have no problem with FFDshow settings right now as it seems to be running just fine.

gazzagazza
07-07-04, 12:36 AM
Originally posted by gamma_seraph
Could someone help me wrap my brain around this?

I own the X1 with a native resolution of 800x600.
I currently have switched my desktop between 800x600 and 1024x768 to test the picture.
I upconvert now to 1440x960 with FFdshow and ZoomPlayer.

Do I need to set the resolution of my desktop to 1440x960 in order to gain the maximum benefit of up converting? What is going on here?

I would appreciate if someone could give me the lowdown on the basic resolution setup.

I have no problem with FFDshow settings right now as it seems to be running just fine.

If you're in NTSC land, set your desktop to 800 x 600 at 60Hz. Then set ffdshow resize to 1440 x 960, using, say, Lanczos with parameter 3.

The graphics card will scale the image back down to your desktop res, which is the native res of the projector.

This scales up using ffdshow (superior results), lets the graphics card scale down (which it will do well), and leaves the internal scaling in your PJ out of the loop.

Turn on the OSD in ffdshow to test if you are getting the resize you think you are. (then turn it off again)
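
To put numbers on the chain described above, here is a minimal Python sketch of the two scaling steps. The frame sizes are taken from the posts; the helper name `scale_factors` is only an illustration, and the sketch assumes 4:3 NTSC content that fills the whole 800 x 600 desktop (widescreen content would be letterboxed instead).

```python
def scale_factors(src, dst):
    """Return (horizontal, vertical) scale factors going from src to dst."""
    return (dst[0] / src[0], dst[1] / src[1])

dvd_ntsc = (720, 480)      # NTSC DVD frame as stored on disc
ffdshow_out = (1440, 960)  # ffdshow resize target (2x DVD)
desktop = (800, 600)       # projector native resolution

# Step 1: ffdshow scales the decoded frame up in software.
up = scale_factors(dvd_ntsc, ffdshow_out)

# Step 2: the graphics card scales the result down to the desktop.
down = scale_factors(ffdshow_out, desktop)

print("ffdshow upscale (h, v):", up)
print("graphics card downscale (h, v):", down)
```

The horizontal and vertical downscale factors differ because DVD pixels are not square; the graphics card corrects the shape in the same step that brings the image down to the projector's native resolution.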

cyberbri
07-07-04, 12:42 AM
Be sure to set your Zoom Player settings to always display at your 800x600 during full-screen playback.

gamma_seraph
07-07-04, 12:50 AM
If you're in NTSC land, set your desktop to 800 x 600 at 60Hz. Then set ffdshow resize to 1440 x 960, using, say, Lanczos with parameter 3.

The graphics card will scale the image back down to your desktop res, which is the native res of the projector.

This scales up using ffdshow (superior results), lets the graphics card scale down (which it will do well), and leaves the internal scaling in your PJ out of the loop.

Turn on the OSD in ffdshow to test if you are getting the resize you think you are. (then turn it off again)

Be sure to set your Zoom Player settings to always display at your 800x600 during full-screen playback.

Thanks guys. I will feel better about further tweaking now that I know I have the basic setup correct.

Owen
07-07-04, 04:19 AM
Vern,

I have not come across the banding problem you have found as I don’t use “Picture Properties” or “Levels” in FFDShow.
I have always calibrated my display with all video settings on the PC at default. This has always worked best for me.
I will sometimes make minor adjustments to Brightness or Color using Zoom Player's Color controls to compensate for some DVDs, but that is all.
Is there some reason why you don’t do the same?

I have just done my own tests here and can clearly see the problem you describe with Luminance offset.

Maybe Andy or Milan can get to the bottom of this problem.

Andy will need a gray scale test screen in Mpeg format as he does not have DVE or similar so I have made my own 9 second Mpeg clip of three relevant test patterns.

Andy, send me a PM if you want this 6.5Meg file.


Regards,

Owen

Owen
07-07-04, 04:32 AM
Originally posted by cetoole
Owen,
Did overclocking your 9600 pro make any changes in your system?

Overclocking my 9600 Pro to XT speed only made a marginal difference for VMR9 and none at all for Overlay as expected.

I also have my PCI and AGP busses overclocked to 40/80 and my PCI Latency timer set to 128.
These changes do help somewhat.

Regards,

Owen

Owen
07-07-04, 07:56 AM
Originally posted by madpoet
Ditto here... I installed a new version over the top and suddenly had horrific stuttering at only 40% usage... I was pulling my hair out a bit ;).

Owen, if you don't mind me asking, what are you using for settings these days? We have very similar setups, so I'm curious if I can get the performance out of mine that you are.


Madpoet,
I am using Elecard decoder and AC3 filter in Zoom Player with VMR9 and FFDShow settings of:

Denoise3d L0.5, C1.0, T5.0 HQ
Resize 1920x1728 for PAL (Bicubic, Parameter -3, no C or L sharpen)
YV12 output.



Regards,

Owen

madpoet
07-07-04, 08:07 AM
Thanks Owen. I couldn't remember if you were using AVISynth or not.

-MP

Vern Dias
07-07-04, 09:09 AM
I will sometimes make minor adjustments to Brightness or Color using Zoom Player's Color controls to compensate for some DVDs, but that is all.

Unfortunately, since I have to use VMR7 to obtain a high quality image without tearing or stuttering, ZP doesn't provide color controls to adjust.

As far as adjusting the monitor, it's a DLP connected via DVI and it is already calibrated for optimum black levels and no crushing from the desktop.

Since DVD's black/white/gamma levels are all over the place, FFDShow picture properties are required to work properly. I have ZP set up to automatically apply the proper FFDShow settings on a per DVD basis, which means I set FFDShow parameters for each DVD once and then save the definition file with all the AR, blanking, and FFDShow settings.

Any time I play the DVD, all the settings are automatically applied.

Vern

vpopovic
07-07-04, 10:14 AM
Originally posted by madpoet
Thanks Owen. I couldn't remember if you were using AVISynth or not.

-MP

I don't think there is any need for Avisynth any more. Whatever Andy did in the latest SSE2 version, it fixed the scaling bugs to the point that they are not easily noticeable. The image with FFDShow Lanczos4 is actually crisper than Avisynth's Lanczos4, and FFDShow is faster by a large margin. FFDShow Lanczos4 "rings" more than Avisynth, but overall I like the image with only FFDShow scaling better than with only Avisynth, or with the Avisynth-FFDShow combo (as I used to run before). Avisynth 3.0 is in the works, but until then, so long Avisynth.

MixTracks
07-07-04, 10:14 AM
I have finally made some sense of FFDShow on my system:

Athlon XP 2800+ OC'd 200 Mhz
Radeon 9800 Pro
1 gig ram

Monitor - Sammy DLP HLN437W via DVI desktop and TT set to 1280x720
TT DVD software
Latest version of FFDShow (non sse)

Using resize to double DVD res, and denoise 3D I get good performance watching movies, with no skipping.

Two questions though.

1. I get heavy stuttering with 4x3 content, either 4x3 menus or DVDs with 4x3 content; widescreen content and menus are fine. Has anyone here experienced that? It seems that FFDShow would need a different setting for processing 4x3 content, like sensing what the content is and using a different set of rules for it.

2. There are some really great FFDShow experts here, I would like to know what settings they use with a Sammy DLP, and see if I like what you all have better. I know.... Play around and see what you like best, but I am an audio guy, I leave PQ up to other experts. I have played and played, but I feel that other people here have gotten better results.

Thanks! :)

pontiacgagt
07-07-04, 10:58 AM
I get heavy stuttering with 4x3 content, either 4x3 menus or DVD's with 4x3 content, widescreen content and menus are fine

I get the same problems, It seems that with widescreen movies and my current settings

dnoise
resize 1440x960

my cpu runs about 80% (intel 2.4a OC to 2.7) If I throw in a fullscreen movie my cpu pegs at 100% and get stuttering.

Since most of my movies are widescreen I just adjust the settings when watching a fullscreen movie.

JD

pcgeek
07-07-04, 11:30 AM
Originally posted by pontiacgagt
I get the same problems, It seems that with widescreen movies and my current settings

dnoise
resize 1440x960

my cpu runs about 80% (intel 2.4a OC to 2.7) If I throw in a fullscreen movie my cpu pegs at 100% and get stuttering.

Since most of my movies are widescreen I just adjust the settings when watching a fullscreen movie.

JD

Stupid question, but are you guys sure your 4:3 content is still 24fps? Most of the 4:3 DVDs I have are video material, which has a 30fps frame rate (and most of the menus are as well). The higher frame rate will definitely increase the CPU utilization.
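
As a rough back-of-the-envelope check of the frame rate argument, the Python sketch below compares pixel throughput at 24 and 30 fps for a 1440x960 resize. It assumes, as a simplification, that CPU load scales linearly with pixels processed per second; the 80% starting figure is the one reported above, and the function name is made up for illustration.

```python
def pixels_per_second(width, height, fps):
    """Pixels the resize stage has to produce each second."""
    return width * height * fps

film = pixels_per_second(1440, 960, 24)   # film-sourced DVD
video = pixels_per_second(1440, 960, 30)  # video-sourced DVD or menu

ratio = video / film                # 30 / 24 = 1.25
load_at_24fps = 80.0                # percent, as reported in the thread
estimated_load = load_at_24fps * ratio  # linear-scaling assumption

print(f"throughput ratio: {ratio:.2f}")
print(f"estimated CPU load at 30 fps: {estimated_load:.0f}%")
```

Under that assumption a system sitting at 80% on film material lands right at 100% on 30 fps material, which is consistent with the stuttering described.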

AndyIEG
07-07-04, 12:20 PM
"my cpu runs about 80% (intel 2.4a OC to 2.7) If I throw in a fullscreen movie my cpu pegs at 100% and get stuttering."

I think this is because most routines are well optimized along X, so they don't care much if the horizontal resolution is higher, but the Y loops aren't MMX (or otherwise) optimized, so the higher the Y value, the more unoptimized Y-loop work you trigger. A fullscreen video probably has a higher Y resolution than a widescreen movie, and even if the X resolution is lower, it still has to do more Y work, which might cause the problems.

Charles Black
07-07-04, 12:45 PM
MixTracks,

The 4:3 increase in CPU utilization is inevitable for the reason that Andy gave. If you look in some of the ffdshow filters you will notice a "Process whole Image" check box which should be unchecked. If checked the CPU utilization will go up significantly since ffdshow will think the black lines above and below the picture have video content.

Charlie

pcgeek
07-07-04, 12:54 PM
Actually, an anamorphic widescreen DVD has exactly the same number of video lines as a 4:3 DVD (480). Letterbox also has 480 lines (I'm talking NTSC here), but some of them just don't have video content. FFDShow doesn't care whether lines are black or not unless someone's running a cropping algorithm before feeding it in, and I've never seen that discussed.

Charles Black
07-07-04, 12:56 PM
Mark_A_W,

Beware of ffdshow gamma adjustment settings. I noticed that they didn't seem to be gamma functions at all. Any proper gamma control should leave black and white just where they were and only change the gain curve between black and White. When I checked ffdshow gamma adjustments some weeks ago they changed the blackpoint significantly. I decided to use Optical for gamma since it is very good. Also the gamma adjustment in Powerstrip seems OK.

Charlie

vpopovic
07-07-04, 03:44 PM
Originally posted by AndyIEG
"my cpu runs about 80% (intel 2.4a OC to 2.7) If I throw in a fullscreen movie my cpu pegs at 100% and get stuttering."

I think this is because most routines are well optimized along X, so they don't care much if the horizontal resolution is higher, but the Y loops aren't MMX (or otherwise) optimized, so the higher the Y value, the more unoptimized Y-loop work you trigger. A fullscreen video probably has a higher Y resolution than a widescreen movie, and even if the X resolution is lower, it still has to do more Y work, which might cause the problems.

Andy,

Any chance you might optimize Y axis routines? Would that produce significant speed boost? On my CPU going from 1920x1080 to 1920x1440 adds only about 10% load while resolution increase is about 33% (from 2.1 M pixels to 2.8 M pixels).
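
The pixel counts quoted above are easy to verify. This is just a small Python check of the arithmetic; the helper name is illustrative and nothing here measures actual CPU load.

```python
def pixel_increase(old, new):
    """Fractional increase in pixel count going from old to new (w, h)."""
    old_px = old[0] * old[1]
    new_px = new[0] * new[1]
    return (new_px - old_px) / old_px

base = (1920, 1080)    # 2,073,600 pixels, about 2.1 M
taller = (1920, 1440)  # 2,764,800 pixels, about 2.8 M

increase = pixel_increase(base, taller)
print(f"{base[0] * base[1]:,} -> {taller[0] * taller[1]:,} pixels")
print(f"increase: {increase:.1%}")
```

So the frame grows by a third while the reported load grows by only about 10%, which is the mismatch the question is pointing at.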

AndyIEG
07-07-04, 04:15 PM
Nah, it's too much work to optimize the Y routines; the compiler has to do this work. I could also be wrong, and the speed problem may be more related to the 30 fps change.
Milan is working on new audio code, and I split my free time between this offset/gamma problem and new resize test code.

N3W813
07-07-04, 04:25 PM
Originally posted by Charles Black
If you look in some of the ffdshow filters you will notice a "Process whole Image" check box which should be unchecked.

Hi guys,
I always thought that Resize filter SHOULD have the 'process whole image' checkbox checked, and other filters shouldn't.

Is this still the case??

n3wb

jvincent
07-07-04, 04:49 PM
I've always been using "process whole image" because when I didn't I had artifacts on the edge of the letterbox.

I'm pretty sure the problem is a 30FPS one. I can run Denoise3D + Lanczos4@1440x960 on my 2.5G P4 with crappy PC2100 memory for 24fps stuff just fine. CPU utilization is ~80%.

pbpatel98
07-07-04, 05:13 PM
I'm pretty sure the problem is a 30FPS one. I can run Denoise3D + Lanczos4@1440x960 on my 2.5G P4 with crappy PC2100 memory for 24fps stuff just fine. CPU utilization is ~80%.

I'm a little confused here. I thought PAL displays at ~25fps and NTSC ~30fps. When people are posting their high resize resolutions are they generally achieved using PAL format? Are the majority of people here viewing PAL format videos? Being in the US, I thought most DVDs are sold in the NTSC format and thought that was the more widely used format.

pcgeek
07-07-04, 05:20 PM
Originally posted by pbpatel98
I'm a little confused here. I thought PAL displays at ~25fps and NTSC ~30fps. When people are posting their high resize resolutions are they generally achieved using PAL format? Are the majority of people here viewing PAL format videos? Being in the US, I thought most DVDs are sold in the NTSC format and thought that was the more widely used format.

NTSC film, once the 3:2 pulldown is reversed, is back to the original 24fps of the source material. Video material is 30 fps. That's why you see a lot of posts about 24fps, which is what most movie DVDs are.
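
For anyone unfamiliar with the cadence, here is a simplified Python sketch of 3:2 pulldown and its reversal. It only tracks which film frame each field comes from and ignores top/bottom field parity, so it illustrates the idea rather than how any particular decoder does it; both function names are made up for the example.

```python
def pulldown_32(frames):
    """Expand film frames into fields using a 3, 2, 3, 2, ... cadence."""
    fields = []
    for i, frame in enumerate(frames):
        repeat = 3 if i % 2 == 0 else 2
        fields.extend([frame] * repeat)
    return fields

def inverse_pulldown(fields):
    """Recover the film frames by collapsing runs of repeated fields."""
    frames = []
    for field in fields:
        if not frames or frames[-1] != field:
            frames.append(field)
    return frames

one_second_of_film = list(range(24))      # 24 distinct film frames
fields = pulldown_32(one_second_of_film)  # 60 fields, i.e. 30 interlaced frames

print(len(fields), "fields per second")
print(len(inverse_pulldown(fields)), "frames per second after reversal")
```

Material shot on video has no repeated fields to collapse, so it stays at 30 frames per second and the resize has correspondingly more work to do.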

-Pat

blackmax2k1
07-07-04, 05:59 PM
Originally posted by jvincent
I've always been using "process whole image" because when I didn't I had artifacts on the edge of the letterbox.

I'm pretty sure the problem is a 30FPS one. I can run Denoise3D + Lanczos4@1440x960 on my 2.5G P4 with crappy PC2100 memory for 24fps stuff just fine. CPU utilization is ~80%.

What decoders are you using?

jvincent
07-07-04, 06:02 PM
I use ffdshow with TheaterTek, so Sonic decoders.

Other variables are: I'm running XP Pro, with an AIW 9700 Pro, ReClock, PowerStrip, and DVDIdle.

blackmax2k1
07-07-04, 06:33 PM
Anybody ever figure out what player uses more CPU, Zoomplayer or TheaterTek?

AndyIEG
07-07-04, 06:37 PM
Originally posted by blackmax2k1
Anybody ever figure out what player uses more CPU, Zoomplayer or TheaterTek?

Zoom Player takes around 2-4%, up to 8%, on my AMD64 3000+; it depends on whether you use the internal audio filter or not. In general a good player should not use more than 5%.

jvincent
07-07-04, 06:38 PM
Originally posted by blackmax2k1
Anybody ever figure out what player uses more CPU, Zoomplayer or TheaterTek?

I don't remember anyone ever comparing them, but I suspect that the TT/ZP overhead over and above the actual decoder pack is negligible.

One big difference is that TT is overlay only and many ZP folks are using VMR.

pbpatel98
07-07-04, 06:47 PM
Zoom Player takes around 2-4%, up to 8%, on my AMD64 3000+; it depends on whether you use the internal audio filter or not. In general a good player should not use more than 5%.

AndyIEG,

What setup are you using to achieve 2-4% utilization on your AMD64 3000+ with ZP? Which audio/video decoders? Also which audio renderer are you referring to? Default DirectSound Device? I'm also assuming with those settings you have no additional filters set and your video renderer is Overlay Mixer? Finally are using 'customized' settings or a 'manual dvdgraph profile' in ZP?

I know that my ZP setup is using way more utilization (like 25%) on my AMD 64 3200+ using Sonic audio/video decoders, default directsound device audio render, no additional filters, and overlay mixer video render.

BangoO
07-07-04, 07:03 PM
I switched to VMR9, it looks gorgeous.
I resize at 1440*1440 and it's perfect.
I tried 1920*1920 or 1440*1920 but the image is then completely deformed and I see only a part of it. I use No Aspect Ratio Correction.
I use ffdshow-20040616_SSE2.exe, WinDVD6 Video filter and a Radeon 9700 Pro.

Any ideas ?

AndyIEG
07-07-04, 07:03 PM
OK, I think you aren't actually comparing the players; you are comparing the DirectShow filters and decoders. Think of TT or Zoom Player as an interface for the installed renderers and codecs/decoders.

The 2-4% was taken from a profiler run, which can split every process running on your system into separate CPU usage. The CPU usage you are comparing is not Zoom Player vs TT; it's whatever decoder/filter you set up in Zoom Player vs TT (Sonic MPEG decoders).

I was referring to the original question, meaning how much CPU the "interface" takes to use those DirectShow filters.

25% sounds just fine with your setup and the Sonic decoder.

I bet your question should be "Which MPEG-2 decoder takes how much CPU?"

As far as I know, the open source MPEG-2 decoder is the fastest, but it has no deinterlacer, so it won't connect to ffdshow. The Elecard MPEG decoder is also heavily optimized, and Sonic seems pretty fast too. Nvidia 3+, WinDVD6 and Cyberlink use the most, I think.

AndyIEG
07-07-04, 07:06 PM
Originally posted by BangoO
I switched to VMR9, it looks gorgeous.
I resize at 1440*1440 and it's perfect.
I tried 1920*1920 or 1440*1920 but the image is then completely deformed and I see only a part of it. I use No Aspect Ratio Correction.
I use ffdshow-20040616_SSE2.exe, WinDVD6 Video filter and a Radeon 9700 Pro.

Any ideas ?

Try the latest SSE2 version, but how high you can go with the Y resolution depends on your video card drivers. For me, everything above 1500 results in a black screen on my FX5700U. I can go up to 3500 with X, but Y is limited.
That's because the driver needs to support such high resolutions with VMR7/9.

BangoO
07-07-04, 07:29 PM
I have the same problem with the latest Andy.

vpopovic
07-07-04, 07:59 PM
Originally posted by AndyIEG
Nah, it's too much work to optimize the Y routines; the compiler has to do this work. I could also be wrong, and the speed problem may be more related to the 30 fps change.
Milan is working on new audio code, and I split my free time between this offset/gamma problem and new resize test code.

Andy,

Too bad for Y optimization. I think you are on the right track. Going from 1920x1080 to 1920x1440 costs much less in relative CPU load increase (load per-pixel) than going to 2160x1080. Not to spoil anyone's party, but offset/gamma problem does not seem to be so widespread while need for speed is. As far as new resize, can you give us some info on what that might be.

Again, thanks for your hard work and congratulations on excellent results. There is nothing better (at least for me) when job is done well and results stand out.

llamameat
07-07-04, 09:14 PM
Andy,
Any word whether Milan will be adding an upsample routine to his audio version of ffdshow? There's a 'downmix', but that doesn't help when you want to go from 44.1khz to 48khz. It would be nice to be able to combine an upsample filter with ac3filter so I can encode mp3 files and hear them through my crappy ac'97 SPDIF.

Vern Dias
07-07-04, 09:53 PM
offset/gamma problem does not seem to be so widespread while need for speed is

Sorry Vlad, but I have to disagree. I have never run my system out of CPU, regardless of any reasonable FFDShow function usage (3.2Gig P4), but the offset problem is obvious on almost every DVD I play.

Vern

vpopovic
07-08-04, 01:51 AM
Vern,

Based on what I read on the forum, most people do run out of speed (the most common question seems to be "why can't I resize to 2x DVD resolution?"), don't have 3.2 GHz PCs, and don't ever use luma offset. If I were suggesting the FFDShow development path that I am personally interested in, it would be support for dual CPUs and 64-bit XP.

As I understand it, FFDShow can now run multiple times in the graph, so you could have two FFDShows back to back using different CPUs. Blight said it is an application-level issue that he can't fix in ZP. Is this reasonable or excessive use of resize? Probably excessive, but there is little pleasure in moderation.

BangoO
07-08-04, 07:32 AM
Andy, I can't go above something * 1536, otherwise I get a completely unwatchable image (it's either zoomed and deformed or I get only green bars).

I'm using VMR9 on a Radeon 9700 Pro video card, WinDVD 6 Video filter via ZP and the latest ffdshow...
Any ideas guys ?

Owen
07-08-04, 08:34 AM
Originally posted by BangoO
Andy, I can't go above something * 1536, otherwise I get a completely unwatchable image (it's either zoomed and deformed or I get only green bars).

I'm using VMR9 on a Radeon 9700 Pro video card, WinDVD 6 Video filter via ZP and the latest ffdshow...
Any ideas guys ?

Did you close and restart Zoom Player after adjusting resize settings?


Owen

BangoO
07-08-04, 08:55 AM
Of course ;)
By the way, I use Catalyst 4.3.

Vern Dias
07-08-04, 09:23 AM
As someone who has been in the computer business for a few dozen years, IMHO it is good programming practice to fix existing bugs BEFORE taking off on a feature/function binge.

When everything that comprises FFDShow today is working properly, then it is time to enhance the code.

Otherwise, eventually you'll wind up with an unstable, unsupportable, and ultimately unusable product.

Vern

vpopovic
07-08-04, 09:28 AM
Looks like it won't take textures larger than 2054x1536 for processing in the 3D engine. For example, that is the same mess I get when I try to connect an image in excess of 2000xXXXX to the Nvidia overlay (which does not support input in excess of 2000xXXXX). That might be a driver or hardware limitation. I am not sure how you can check that with ATI, or whether there is a way to "fix" the driver. For Nvidia there is a utility, NV Hardpage, that will, among other things, report the maximum texture size the card will take. It might be in one of the reviews, as they dissected the 9700 Pro very thoroughly when it came out.

AndyIEG
07-08-04, 09:30 AM
Originally posted by BangoO
Andy, I can't go above something * 1536, otherwise I get a completely unwatchable image (it's either zoomed and deformed or I get only green bars).

I'm using VMR9 on a Radeon 9700 Pro video card, WinDVD 6 Video filter via ZP and the latest ffdshow...
Any ideas guys ?

Like I said, high Y resolutions have to be supported by the video drivers. I used the 57.xx Nvidia drivers and could not go over 1500, but I just tried the new 61.80 and now I'm able to go up to 4000x4000. So it's not an ffdshow resize problem, it's a video driver problem.

PS: before I get questions again, NO, I can only run 4000x4000 as a test, and it's slow as hell (3fps)...

Hint: it seems it also has something to do with the decoder or DVD mode, since 4000x4000 only works with XviD. On DVD I still can't go higher than 4000x2500 with VMR9 (VMR7 also won't connect, but it worked for 4000x4000 and XviD?)...

vpopovic
07-08-04, 09:52 AM
Now if we could only do 4000x2500 with 24fps...

BangoO
07-08-04, 11:10 AM
Thx Andy... is someone able to run above 1536 on a Radeon card using VMR9 ?
With what catalyst ?

AndyIEG
07-08-04, 03:42 PM
Originally posted by Vern Dias
As someone who has been in the computer business for a few dozen years, IMHO it is good programming practice to fix existing bugs BEFORE taking off on a feature/function binge.

When everything that comprises FFDShow today is working properly, then it is time to enhance the code.

Otherwise, eventually you'll wind up with an unstable, unsupportable, and ultimately unusable product.

Vern

OK, found the offset bug (a little bug); it will be fixed in the next release. Thanks to Owen for the gray test movie. While I'm working on the picture properties, is there anything else bugged?

madpoet
07-08-04, 03:47 PM
My wife when I spend all night fiddling with this stuff... but I don't think anything short of tranquilizers is going to fix that ;)

Li On
07-08-04, 04:10 PM
Originally posted by Vern Dias
Look at the banding within each of the vertical bars. The width of the bands varies depending on the IRE level of the test pattern.


Under ffdshow Picture Properties:

Gamma only works with Overlay (crash with VMR7/9)

Gain/Offset (contrast/brightness) only work with VMR7/9 (crash with Overlay)

And any Offset value < 0 gives vertical banding!

I still use a super old ffdshow 20031128.

regards,

Li On

Vern Dias
07-08-04, 04:16 PM
Li On: The crash is because you are using an ancient version of FFDShow. :(

AndyIEG: Thank you very much!!!! :)

Vern

Vern Dias
07-08-04, 07:00 PM
AndyIEG, I don't know if this is possible, but since you are deep into the properties code is there any way to make the gamma have less effect on the extremes of the luminance range? The curve that is used to modify the luma gain is too flat as it has way too much effect at the 10 and 90 IRE ends of the scale and not enough effect in the low-middle range. I don't know how difficult it would be to implement the change to make the gamma have more effect in the low-middle of the luma range and less at the ends.

See Charles Black's post on the previous page.

Hopefully, it's just a math function that is applied to the lookup process.

I'm not sure how to explain it better, but this is why the gamma seems to affect the black level. It doesn't change 0 and it doesn't change 100, but it seems to affect everything in between in a relatively linear way.

As an example, a 20% increase in gamma should increase 30 IRE by 20% but 20 IRE by 10%, 15 IRE by 7.5%, 10 IRE by 2.5%, 40 IRE by 10%, 50 IRE by 5%, etc. These are not real numbers, but an example of the fact that gamma is supposed to make a curve of the linear ramp.

For a real example, go to your NVidia video card control panel and play with the gamma control to see its effect on the amplitude graph.

Vern
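
A minimal sketch of the kind of curve described above, assuming a plain power-law gamma. This is not ffdshow's code: the names `apply_gamma` and `gamma_lut` are hypothetical, and the lookup table assumes full-range 8-bit luma (0..255) for simplicity rather than video-range levels. It only illustrates that such a curve pins 0 and 100 IRE and moves the low-middle range the most.

```python
def apply_gamma(ire, gamma):
    """Map a luma level in IRE (0..100) through a power-law gamma curve.

    Assumption: out = 100 * (in / 100) ** (1 / gamma). Not taken from ffdshow.
    """
    if not 0.0 <= ire <= 100.0:
        raise ValueError("ire must be between 0 and 100")
    return 100.0 * (ire / 100.0) ** (1.0 / gamma)


def gamma_lut(gamma, size=256):
    """Build a lookup table so applying the curve costs one index per pixel."""
    top = size - 1
    return [round(top * (i / top) ** (1.0 / gamma)) for i in range(size)]
```

With `gamma=1.2`, 0 and 100 IRE stay where they are, and the absolute lift at 30 IRE is larger than at either 10 or 90 IRE, which is the shape being asked for.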

Mastiff
07-08-04, 07:40 PM
Originally posted by madpoet
My wife when I spend all night fiddling with this stuff... but I don't think anything short of tranquilizers is going to fix that ;)

I always find that a few glasses of good white wine (I'm a strict beer man myself), the kids to bed and a filled hot tub will give me enough plus points for at least two hours of tinkering time... :cool:

AndyIEG
07-08-04, 08:12 PM
@ Vern Dias: I understand. If I have time I will look into this.

Ah, and a new version is up (normal MMX and SSE2), just a few fixes:

2004-07-09 Andy2222 (ffdshow-20040709_SSE2)

* Luma Offset/Gain fixed

2004-07-08 milan_cutka

* clear input buffer when decoding audio using libavcodec
* better keyboard handling in codecs and keys pages

2004-07-07 milan_cutka

* fixed aac decoding
* imported faad2 library
* working on liba52 integration
* better ac3 support

2004-07-06 milan_cutka

* working on audio part 2
* multithreaded encoding using libavcodec
* split long subtitle lines


PS: as usual, send bugs/crashes by PM. Vern, please check whether the luma thing is now working correctly.

Vern Dias
07-08-04, 09:28 PM
Andy, confirming the offset/luma bug is fixed.

Thanks again,

Vern

mnn1265
07-08-04, 09:47 PM
After trudging through a good portion of this thread my head is just spinning! I'm unable to find a "help" or ffdshow instructions document that explains what the various settings do... anyone have one?

Is the ffdshow intended only for advanced users?

genietime
07-08-04, 10:24 PM
mnn1265: Once you get into it, it's not so bad. For me, trying out the ffdshow options was relatively easy; it was setting it up to work (and even knowing I needed Zoomplayer, etc.) that I needed to figure out. Are you unsure about the ffdshow settings, or about how to set it up?
Here's a link to a beginner's guide (http://htpcnews.com/main.php?id=ffdshowdvd_1)
You can get a better understanding of the settings from reading this thread, but this covers some of the basics.

Be warned: once you use ffdshow you'll never go back!

mnn1265
07-09-04, 12:21 AM
Many thanks genietime, just what I was looking for! I'm about half way through the guide and things are much more clear now... enough so that I can actually read this thread and not be totally lost.

From what I've seen so far I'm already hooked!

nm88
07-09-04, 01:55 AM
Great work on the new version (20040709_SSE2). Are you still doing optimizations on resize or denoise3d? My CPU utilization is down about 5% and tearing/artifacts with VMR9 are rarer.

madpoet
07-09-04, 08:17 AM
I agree, better CPU utilization on the new version. I'm capable of getting some insane resizes now if I want them ;)

hoops10
07-09-04, 08:25 AM
For those of you using the non-SSE2 version of ffdshow, what filters are you using? Just 'Blur and NR (using denoise3d with luma=0.5, chroma=0.5, time=5 and HQ checked)' and 'Resize and Aspect (to whatever resolution you use, Keep original aspect ratio checked, and under Settings, Bicubic method checked with Luma Sharpen and Chroma Sharpen both at 1.10)'? Are any of you using the 'Sharpen' filter?

dedwards
07-09-04, 08:31 AM
I've gotten FFDshow to work smoothly with TheaterTek, and the results are impressive so far. The extra detail in Monsters, Inc. is amazing.

What is the easiest way to do an A - B comparison of the image with and without FFDShow? Is it possible to turn it on and off on the fly? Or do you make screen captures with and without? In the HTPCNews article, that is what he seems to have done.

Thanks for the info - this is a killer program!

DE

BangoO
07-09-04, 08:37 AM
If you don't use a resize, then you can turn off all the filters and compare.
Otherwise, you need to make captures and resize the "ffdshow resized" one and compare them.

madpoet
07-09-04, 08:38 AM
There are a couple ways... yes, you can change some of the settings on the fly (not resize, but most of the others I believe). You can also set a bookmark in the DVD at the point you want to make a screenshot, then pause and go to the bookmark and take the shot with and without.

dedwards
07-09-04, 09:56 AM
Thanks for the replies - I am doing a resize, so I will try the bookmark method. Seems like the best way to ensure you're getting to exactly the same spot.

DE

thomaspf
07-09-04, 10:35 AM
AndyIEG

since you are fiddling with optimizations right now...

When I looked at the CPU load of ffdshow on a multiproc machine I realized that it only uses a single processor to run. What is the potential to enable multiproc and get many more cycles even on an HT CPU?

I would like to keep my ffdshow settings for hires material but I am running out of steam when using Zoomplayer with 1900x1080p material. The perf monitor shows 100% on one half of the HT CPU and constantly zero on the second.

Cheers

Thomas

The_smokester
07-09-04, 10:36 AM
dedwards,
You also have the option of applying the filters to half the screen. Look for the check box near the top right of the page.

pcgeek
07-09-04, 11:40 AM
Originally posted by thomaspf
AndyIEG

since you are fiddling with optimizations right now...

When I looked at the CPU load of ffdshow on a multiproc machine I realized that it only uses a single processor to run. What is the potential to enable multiproc and get many more cycles even on an HT CPU?

I would like to keep my ffdshow settings for hires material but I am running out of steam when using Zoomplayer with 1900x1080p material. The perf monitor shows 100% on one half of the HT CPU and constantly zero on the second.


A second vote for looking at this given that you are having problems keeping the pipeline full (too few registers) with the latest sse2 code. You should be able to split processing for the top and bottom half of an image without too much effort (from the REALLY quick glance I took at the code). I'd try but can't figure out how to get it to compile correctly and don't have the sse2 optimizations. The hardest part is going to be the synchronization of the threads and making sure that synchronizing them doesn't add too much latency to the process (time critical threads help here).

-Pat

BangoO
07-09-04, 06:14 PM
Hyperthreading optimisation would definitely be great Andy ;)

AJC13B
07-09-04, 06:42 PM
I grabbed the 2 latest versions of FFDshow from AndyIEG and it seems FFDshow keeps causing Zoomplayer to crash; well, that's what the error report says.

Any ideas? I'm a total newb to this stuff so please be gentle :)

blackmax2k1
07-09-04, 06:51 PM
Originally posted by BangoO
Hyperthreading optimisation would definitely be great Andy ;)

I second that!

Abstrakt
07-09-04, 08:59 PM
SMP support would be a godsend! When running FFDShow one of my HTPC's Xeon processors sits completely idle, while the second one nearly maxes out.

Without getting too fancy, it would already be a huge help if we could run each filter on a different processor. Resize on one, Denoise on the other for instance...

Cheers.

.....
07-09-04, 09:27 PM
Originally posted by AJC13B
I grabbed the 2 latest versions of FFDshow from AndyIEG and it seems FFDshow keeps causing Zoomplayer to crash; well, that's what the error report says.

Any ideas? I'm a total newb to this stuff so please be gentle :)

In the codec section of ffdshow, scroll down to the bottom and select "all supported" for raw video.

BangoO
07-10-04, 04:33 AM
Simple question :)
What's the difference between "VMR9 Windowless" and "VMR9 Windowed" and which one should be used ?

pcgeek
07-10-04, 08:20 AM
Originally posted by Abstrakt
SMP support would be a godsend! When running FFDShow one of my HTPC's Xeon processors sits completely idle, while the second one nearly maxes out.

Without getting too fancy, it would already be a huge help if we could run each filter on a different processor. Resize on one, Denoise on the other for instance...

Cheers.

This probably wouldn't work too well because it's a pipeline of operations. You need to complete the denoise before starting the resize. You could theoretically do it, but you'd end up with a pipeline that started to add more delay to getting the images out (at least 1 frame behind). Additionally, different filters use different amounts of CPU, so you'd also get less bang for your buck this way.

It would be a lot cleaner to treat the frame as 2 distinct images split vertically (one on top of the other) and send each one of these through the processing pipeline on its own thread (and put it together on the other side). The beauty is that, looking at the ffdshow code, this should also be fairly trivial to implement and try (assuming I could actually build it).

Don't necessarily expect to get a huge gain on HT machines though. I'm not sure how efficient the context swaps are given that it's really still a single processor, and that combined with the thread synchronization may actually end up performing worse. A real MP machine on the other hand (or dual core when they start coming out for x86) should get a pretty good benefit.

-Pat
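
A rough sketch of the split-frame idea above, assuming a frame is just a list of pixel rows and the filter only needs pixels from its own row. Nothing here is ffdshow code: `blur_rows` is a hypothetical stand-in filter and `process_split` a hypothetical helper. In standard CPython the interpreter lock means pure-Python work like this will not actually run faster on two CPUs, so treat it as showing the structure (split, filter each band on its own thread, stitch back together) and nothing more.

```python
from concurrent.futures import ThreadPoolExecutor


def blur_rows(rows):
    """Stand-in filter: 3-tap horizontal box blur, applied row by row."""
    out = []
    for row in rows:
        last = len(row) - 1
        out.append([
            (row[max(x - 1, 0)] + row[x] + row[min(x + 1, last)]) // 3
            for x in range(len(row))
        ])
    return out


def process_split(frame, workers=2):
    """Cut the frame into contiguous bands of rows and filter each on its own thread."""
    height = len(frame)
    bounds = [height * i // workers for i in range(workers + 1)]
    bands = [frame[bounds[i]:bounds[i + 1]] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in band order, so joining them restores the frame.
        results = list(pool.map(blur_rows, bands))
    return [row for band in results for row in band]
```

Because the stand-in filter never looks above or below its own row, the stitched result is identical to filtering the whole frame in one go. A filter that reads vertical neighbours (a resize, for instance) would need each band to carry a few extra rows from the other side of the seam.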

Spoonfed
07-10-04, 09:34 AM
whoa..........100 pages :)

AndyIEG
07-10-04, 09:56 AM
I'm not an MP/HT expert, but for HT I know that it's still only a single processor, and HT is only designed for when you have 2 programs open and want to ensure that both are working at good speed. The main loops in ffdshow are fully asm optimized, and at this layer of code HT virtually doesn't exist. I also think that this "second CPU shows no CPU usage" is an irrelevant value. The asm code will make use of the CPU and all its power; the system will not hold back some magical power for HT that gets wasted if we don't use HT. So for now simply ignore this fake CPU usage and HT.

For real multi-CPU platforms we need OpenMP code, but I think an AMD64 or better SSE2 version should have higher priority than full support for multiprocessor systems?

pcgeek
07-10-04, 10:14 AM
Actually HT also benefits any multithreaded program and wasn't just targeted at multiple processes. I do agree that the % CPU utilization is useless, but HT may have some benefit. If the CPU stalls for any reason with the current ASM code then it could benefit from HT, as the OS/CPU can just schedule the other thread to fill in the gaps. By running 2 different threads of execution through the processor pipeline at the same time they get a better fill rate and can actually keep the pipeline full. I only mention this because I remember a comment about you having trouble keeping the P4 pipeline full for the SSE2 version because of a lack of registers (I assume this is named registers). HT can help get around this and won't bump into the same register problem because internally the CPU actually has a lot more registers, they just can't be addressed directly.

pcgeek
07-10-04, 10:37 AM
FWIW, I did a little bit of searching for Photoshop benchmarks and HT tests, and most of them show a 10-25% improvement in filter performance for Photoshop when HT is enabled (Photoshop has multithreaded filter support for some of the filters). Photoshop filters are probably the closest comparison to what ffdshow does (particularly resize), so I do believe there is room for improvement (though not 100%). It's also possible that the engineers that did the filters didn't do as good a job as Andy has and that's why they benefited more, but given the focus on performance and its use by a lot of CPU manufacturers for bragging rights I assume it's had quite a few knowledgeable people looking at it.

-Pat

Valnar
07-10-04, 11:28 AM
whoa..........100 pages

Exactly! 100 pages hardly qualifies as a FAQ. Can we start a new thread with the suggested settings for FFDShow after this has been bantered back and forth forever?

-Robert

AndyIEG
07-10-04, 01:41 PM
Originally posted by pcgeek
FWIW, I did a little bit of searching for Photoshop benchmarks and HT tests, and most of them show a 10-25% improvement in filter performance for Photoshop when HT is enabled (Photoshop has multithreaded filter support for some of the filters). Photoshop filters are probably the closest comparison to what ffdshow does (particularly resize), so I do believe there is room for improvement (though not 100%). It's also possible that the engineers that did the filters didn't do as good a job as Andy has and that's why they benefited more, but given the focus on performance and its use by a lot of CPU manufacturers for bragging rights I assume it's had quite a few knowledgeable people looking at it.

-Pat

Interesting, seems I have to study the Intel HT whitepapers and see what HT is doing behind the scenes.
It's still not that important since HT is not supported on AMD CPUs. Even if I find something to improve, I can't test the code or do test runs, since I don't own a P4 :( so probably no HT stuff from me. I can probably check the OpenMP support in the new Visual Studio 2005 and see if this helps.

BangoO
07-10-04, 01:54 PM
For those who claim that a 0.5 1.0 5.0 HQ Denoise3D does not alter the sharpness, here is an example:

Original Image, no filter applied:
http://img44.exs.cx/img44/9361/originale-zoom.jpg

On the left, 1440*1440 lanczos resize (1.10 luma and no chroma sharpen) and nothing else
On the right, 1440*1440 lanczos resize (1.10 luma and no chroma sharpen) + Denoise3D 0.5 1.0 5.0 HQ
http://img38.exs.cx/img38/2186/no-denoise3d-zoom.jpg http://img38.exs.cx/img38/6953/denoise3d-zoom.jpg

Full captures are available here for the original image (http://img44.exs.cx/img44/9640/originale.jpg), here for the resize without Denoise3D (http://img24.exs.cx/img24/4638/no-denoise3d.jpg) and here for resize + Denoise3D (http://img24.exs.cx/img24/3258/denoise3d.jpg).


Now... I have 2 questions :)
1. Anyone knows the difference between VMR9 Windowless and VMR9 Windowed in ZP ?
2. I read that if only a resize is done, there is no color conversion as the signal stays in YUY2 all along... what about using a resize + a Picture Properties (in order to set the gamma) ?

JBlacklow
07-10-04, 02:32 PM
I only have an answer for your first question, here it is from Blight's ZP documentation:
Due to a major bug in DirectX-9, you should probably used the VMR9 Windowed mode when experimenting with VMR9 as the windowless support is broken in DirectX and you won't be able to navigate the DVD Interface using the mouse until microsoft fixes this issue.

Jah-Wren Ryel
07-10-04, 03:02 PM
Originally posted by AndyIEG
Interesting, seems I have to study the Intel HT whitepapers and see what HT is doing behind the scenes.
It's still not that important since HT is not supported on AMD CPUs. Even if I find something to improve, I can't test the code or do test runs, since I don't own a P4 :( so probably no HT stuff from me. I can probably check the OpenMP support in the new Visual Studio 2005 and see if this helps.

The naive approach of just splitting the image in half horizontally and running an entire filter pipeline on each CPU might be the quickest way to get some performance benefit out of going parallel. I hadn't heard that Visual Studio was supporting OpenMP, but if so, that's great news, and even an incomplete OpenMP should be useful to do such a simple parallelization since it ought to be what's often called "embarrassingly parallel."

One slight optimization to that approach would be to have the code that stitches the two halves back together for display keep a short history of which cpu finishes first (since it will *probably* want to wait for both halves to be completed before handing it off to the video card) and use that information to dynamically decide where to split the next frame image (e.g. 50/50 or maybe 40/60) because it may easily be that one half requires more work than the other, depending on the content and the previous few frames' workload is likely to be the cheapest predictor of the next frame's workload.

Not to get too detailed but, I say *probably* because pipelined writes to video memory might make it advantageous to start copying the half that completes first to video memory ASAP and worry about the synchronization on completion of the writing rather than at the start. -- Just a thought.
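
The "decide where to split the next frame" idea can be sketched in a few lines. This is only an illustration under assumptions: `next_split` is a hypothetical helper, the "short history" is reduced to a simple blend between the last split and the split that would have balanced the last frame, and the `smoothing`, `lo` and `hi` defaults are arbitrary.

```python
def next_split(split, top_time, bottom_time, smoothing=0.5, lo=0.2, hi=0.8):
    """Return the fraction of rows to give the top part on the next frame.

    split       -- fraction of rows the top part had on the last frame (0..1)
    top_time    -- time the top part took on the last frame
    bottom_time -- time the bottom part took on the last frame
    """
    # Estimated cost per unit of image for each part on the last frame.
    top_rate = top_time / split
    bottom_rate = bottom_time / (1.0 - split)
    total = top_rate + bottom_rate
    if total <= 0:
        return split  # no usable timing data, keep the current split
    # Split s that would have balanced the last frame: s*top_rate == (1-s)*bottom_rate
    ideal = bottom_rate / total
    # Move only part of the way there so noisy timings do not cause oscillation.
    blended = split + smoothing * (ideal - split)
    return min(hi, max(lo, blended))
```

For example, if a 50/50 split took 6 ms for the top and 4 ms for the bottom, the balanced split would have been 40/60, and with the default smoothing the next frame is cut at 45/55.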

Since AMD64's are "easy" to do in up to 8-way configurations (ok, but not yet affordable to most mortals for htpc use), and AMD has said that multi-core cpus are due late next year, getting a handle on parallelization now is probably worthwhile.

========
Another idea to ponder is optimized filter combinations. As a general rule it is better to load the data into registers once and do as many operations on it as possible before writing it back out to memory rather than do a little work, write it to memory and then come back later and read it back in to do a little more work.

So, it *might* make sense to take the most popular filter chains and make special optimized versions that do the whole chain in one pass rather than multiple passes. For example, Denoise3D + Lanczos resize. You trade off flexibility for performance. Not that I think the individual filters should go away either, just optimize for the common case from the "big picture" (no pun intended) perspective, and keep the individual filters around for people who need something other than the common case.

Note, I haven't looked at the algorithms, much less the actual code so it may not be feasible to implement such a combined filter chain, I'm just putting the idea up for consideration.
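
To make the combined-filter idea concrete, here is a toy sketch that uses two trivial per-pixel steps (gain, then offset) as stand-ins. It is an assumption-laden illustration, not ffdshow code: the function names are hypothetical, and real filters such as Denoise3D or a Lanczos resize read neighbouring pixels and frames, so fusing them is considerably harder than this.

```python
def gain_pass(pixels, gain):
    """Pass 1: scale every pixel, writing a full intermediate buffer."""
    return [min(255, int(p * gain)) for p in pixels]


def offset_pass(pixels, offset):
    """Pass 2: read the intermediate buffer back in and add an offset."""
    return [min(255, max(0, p + offset)) for p in pixels]


def fused_pass(pixels, gain, offset):
    """One pass: do both steps on each pixel before writing it out."""
    out = []
    for p in pixels:
        v = min(255, int(p * gain))               # same arithmetic as gain_pass
        out.append(min(255, max(0, v + offset)))  # same arithmetic as offset_pass
    return out
```

The fused version produces the same pixels as running the two passes back to back, but each value is loaded and stored once instead of twice.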

hoops10
07-10-04, 03:09 PM
I agree with Valnar's suggestion of a separate thread for ffdshow settings and results.

nm88
07-10-04, 03:41 PM
Originally posted by AndyIEG
I also think that this "second cpu shows no cpu usage" is an irrelevant value. The asm code will make use of the cpu and all its power and the system will not hold back some magical power for HT and if we don't use HT it will be wasted.

It won't, though: because of hyperthreading, you need multiprocessor code to actually max out the processor. A multithreaded application will run much faster with hyperthreading on, meaning the CPU will get a lot more work done in the same time. And on a stability test, the CPU gets hotter and shows more stability problems when you run two simultaneous Prime95 processes, because the CPU is working harder.

Ffdshow already does this to an extent. When I activate YUY2 output on ffdshow on a P4, an extra 20% or so of CPU usage appears in CPU 0 utilization whereas the rest of the code appears in CPU 1. This allows smooth playback without stuttering even though the combined usage is 90% or more. On an Athlon64 system, if I do the same thing, it will just reach its limit and start stuttering.

Goi
07-10-04, 03:57 PM
Originally posted by Jah-Wren Ryel
The naive approach of just splitting the image in half horizontally and running an entire filter pipeline on each CPU might be the quickest way to get some performance benefit out of going parallel. I hadn't heard that Visual Studio was supporting OpenMP, but if so, that's great news, and even an incomplete OpenMP should be useful to do such a simple parallelization since it ought to be what's often called "embarrassingly parallel."

One slight optimization to that approach would be to have the code that stitches the two halves back together for display keep a short history of which cpu finishes first (since it will *probably* want to wait for both halves to be completed before handing it off to the video card) and use that information to dynamically decide where to split the next frame image (e.g. 50/50 or maybe 40/60) because it may easily be that one half requires more work than the other, depending on the content and the previous few frames' workload is likely to be the cheapest predictor of the next frame's workload.

Not to get too detailed but, I say *probably* because pipelined writes to video memory might make it advantageous to start copying the half that completes first to video memory ASAP and worry about the synchronization on completion of the writing rather than at the start. -- Just a thought.

Since AMD64's are "easy" to do in up to 8-way configurations (ok, but not yet affordable to most mortals for htpc use), and AMD has said that multi-core cpus are due late next year, getting a handle on parallelization now is probably worthwhile.

========
Another idea to ponder is optimized filter combinations. As a general rule it is better to load the data into registers once and do as many operations on it as possible before writing it back out to memory rather than do a little work, write it to memory and then come back later and read it back in to do a little more work.

So, it *might* make sense to take the most popular filter chains and make special optimized versions that do the whole chain in one pass rather than multiple passes. For example, Denoise3D + Lanczos resize. You trade off flexibility for performance. Not that I think the individual filters should go away either, just optimize for the common case from the "big picture" (no pun intended) perspective, and keep the individual filters around for people who need something other than the common case.

Note, I haven't looked at the algorithms, much less the actual code so it may not be feasible to implement such a combined filter chain, I'm just putting the idea up for consideration.

Hmm...sounds like some of the dual card configurations of yesteryear and now. How about taking a page off 3dfx's history books and doing it on alternate scan lines? That should ensure a fairly equal workload on each CPU.

johnbrisbin
07-10-04, 04:25 PM
Originally posted by Goi
Hmm...sounds like some of the dual card configurations of yesteryear and now. How about taking a page off 3dfx's history books and doing it on alternate scan lines? That should ensure a fairly equal workload on each CPU.

In many respects, alternate scan lines is the worst way to divide work.

Virtually any filtering or scaling algorithm makes heavy use of adjacent pixels to determine its result. When you use scan line interleaving, all the vertically nearest pixels are outside the domain of the current task.

So when you put alternate scan lines in different tasks, each task must then use all the scan lines, increasing memory bandwidth requirements by a factor approaching 2.

Instead, you minimize the edge case (where a task must refer to pixels just outside its active range) when you simply split the screen vertically into upper and lower halves, as nVidia does in their new SLI implementation.

This leaves a single edge that must be accounted for and minimizes redundant loads.
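
The bandwidth argument can be checked with a small count. The sketch below assumes a filter that, for each output row, reads the source rows within `radius` lines above and below it (clamped at the frame edges); `rows_read` and `compare` are hypothetical helpers written for this illustration only.

```python
def rows_read(assigned, height, radius):
    """Number of distinct source rows a task loads to filter its assigned rows."""
    needed = set()
    for y in assigned:
        for dy in range(-radius, radius + 1):
            needed.add(min(height - 1, max(0, y + dy)))  # clamp at frame edges
    return len(needed)


def compare(height=480, radius=2):
    """Total rows loaded by two tasks: upper/lower halves vs alternate scan lines."""
    top = range(0, height // 2)
    bottom = range(height // 2, height)
    even = range(0, height, 2)
    odd = range(1, height, 2)
    halves = rows_read(top, height, radius) + rows_read(bottom, height, radius)
    interleaved = rows_read(even, height, radius) + rows_read(odd, height, radius)
    return halves, interleaved
```

For a 480-line frame and a radius of 2, `compare()` returns `(484, 960)`: the two halves together load only 4 rows more than the frame itself, while the interleaved tasks each load every row, so the total doubles.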

Ryokurin
07-10-04, 05:31 PM
Another thing you need to remember about HT processors is that they are still single core. If you're doing something that requires one part to be done before the other, it's not going to make a difference, and if one part of the core is tied up with a process that the other process needs, then the same thing applies. This is often why HT can be slower than normal processing, and even if the app was made to be multithreaded it won't help in this case. This is part of the reason why AMD has not done it (and they have had a patent for this since the early 90s).

gazzagazza
07-10-04, 09:42 PM
Originally posted by BangoO
For those who claim that a 0.5 1.0 5.0 HQ Denoise3D does not alter the sharpness, here is an example:


This is exactly what I found... denoise 3D removes skin detail, making it look less real. I now just use resize... If the DVD is noisy, I'll live with it.

Martin Blank
07-10-04, 10:06 PM
gazzagazza, I'm not an expert at this, but when I compare the pictures posted of the scaled & scaled+denoised to the original, the scaled+denoised seems closer to the original.

Although, in playing around in my own setup, I've been debating the same thing -- skin does seem more plastic with the denoise.

bcherb2
07-10-04, 11:59 PM
I've been trying multiple settings with ffdshow and Zoomplayer with a ripped DVD and I've gotten some pretty horrible results. My computer doesn't lag too badly when displaying a 1440x960 image, but any image over the standard DVD resolution looks blurry. It looks just like it's out of focus or something! If I try to go really high on the sharpen I get big, noticeable blocks.

What am I not doing to get a sharp image like you all are? :-(

-Ben

gazzagazza
07-11-04, 01:26 AM
Originally posted by Martin Blank
Although, in playing around in my own setup, I've been debating the same thing -- skin does seem more plastic with the denoise.

That's EXACTLY it, plastic perfectly describes the effect... this comparison is complicated by the added sharpening in the processed images.

Donjo
07-11-04, 01:50 AM
I am having some ffdshow problems...
http://www.avsforum.com/avs-vb/showthread.php?s=&threadid=421586
if anybody can help that would be great :)