DIYing a DSP Processor engine. (Solution) - Page 2 - AVS Forum | Home Theater Discussions And Reviews
post #31 of 192, 11-01-2016, 11:33 PM: awediophile (Member, joined Oct 2013)
Quote:
Originally posted by DreamWarrior:
Cool little project -- at one point I was thinking about leveraging Nvidia's CUDA platform to compute multi-channel FIR filters without burdening the CPU. I figured it'd allow more channels and taps at higher sample rates than the CPU could handle given the GPU's propensity for high FLOPs.

In fact, I thought it'd be really cool to be able to use an HDMI capture card (with an HDFury to strip the protection) and then turn the PC into an all-digital processor. Of course, that brings up the lack of any non-payware decoders for Dolby TrueHD / DTS MA (and none at all for Atmos / DTS-X) that I know of, so... bit-streaming and object audio are out of the question at the moment.

Alas, I have no time to experiment. But, maybe if you get this going in any sort of capacity I'll be inclined to muck around, lol.
IIRC, the main issue with this approach is that round trip latencies are likely unpredictable and quite high. These cards are made for gaming or relatively long computing tasks, so there is not likely to be much optimization for latency, particularly with respect to the retrieval of results at the end. Of course, things could have changed since I last looked.
post #32 of 192, 11-02-2016, 01:11 AM (thread starter): BassThatHz (AVS Forum Special Member, joined Apr 2008, Northern Okan range (NW Cascades region))
Version 0.21 is done
https://drive.google.com/open?id=0Bw...1lfN21kcjFOQXc

When total CPU usage is at or below 40%, it runs in a single thread, locked to a single core.
When it goes above 40%, the process affinity is widened to all cores (and in that mode it will eventually launch a thread for each output channel).

I added the CPU usage, plus the affinity (AF) and internal thread (T) counts, to the GUI so that you can see when it shifts into high gear.

You can see it working as per the screenshots.



Here is the code that makes that happen, in case you are wondering.

I chose a 1-second polling timer; that way it doesn't need an infinite loop, nor does it place a huge burden on the CPU.
I also reset the process priority every second, just in case the user tries to downgrade it manually.

The hardest part was figuring out the hexadecimal bit shifting. (I don't normally do Int-to-Hex-to-Binary multiplication. LOL!)
AffinityMask = (1L << ProcessorCount) - 1;
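The mask expression works on a 4C/8T box, but with an `int` literal such as `0x0001` C# only uses the low five bits of the shift count, so on a machine with 32 logical processors `(0x0001 << 32) - 1` evaluates to 0. A minimal sketch of a safer helper, assuming a single Windows processor group (at most 64 CPUs); `AllCoresMask` is a hypothetical name, not a .NET API:

```csharp
using System;

// Hypothetical helper: one bit per logical processor (bit 0 = CPU 0).
static long AllCoresMask(int processorCount)
{
    if (processorCount < 1 || processorCount > 64)
        throw new ArgumentOutOfRangeException(nameof(processorCount),
            "One affinity mask (one Windows processor group) covers 1 to 64 CPUs.");
    // 1L << 64 would also wrap (to 1), so the full 64-bit mask is special-cased.
    return processorCount == 64 ? -1L : (1L << processorCount) - 1;
}

int cpus = 32;
int intMask = (0x0001 << cpus) - 1;   // int shift count is taken mod 32, so this is 0
long longMask = AllCoresMask(cpus);   // 0xFFFFFFFF: all 32 bits set

Console.WriteLine($"int expression:  0x{intMask:X}");
Console.WriteLine($"long expression: 0x{longMask:X}");
```

The `long` result is what would then be cast to `IntPtr` for `Process.ProcessorAffinity`.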

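Code:
// Requires: using System; using System.Diagnostics; using System.Management;
// (System.Management is a separate assembly/NuGet package on .NET Core and .NET 5+.)

// Runs once per second: re-assert the priority, then pick an affinity from total CPU load.
private void timer_CPU_Usage_Tick(object sender, EventArgs e)
{
    SetProcessToRealTimePriority();

    int totalCpuUsage = DisplayCPU_Usage();

    if (totalCpuUsage > 40)
    {
        SetAffinityToAllCores();
    }
    else
    {
        SetAffinityTo1Core();
    }
}

// Reads the "_Total" instance of the formatted per-processor counters via WMI
// and shows it in the GUI. Returns 0 if the query fails.
private int DisplayCPU_Usage()
{
    int totalCpuUsage = 0;

    try
    {
        using (var searcher = new ManagementObjectSearcher("select * from Win32_PerfFormattedData_PerfOS_Processor"))
        using (ManagementObjectCollection results = searcher.Get())
        {
            foreach (ManagementObject obj in results)
            {
                if (obj["Name"].ToString() == "_Total")
                {
                    totalCpuUsage = Convert.ToInt32(obj["PercentProcessorTime"]);
                    break;
                }
            }
        }
        lblCPU_Usage.Text = totalCpuUsage.ToString();
    }
    catch (Exception)
    {
        // WMI can be unavailable or time out; report 0%, which falls back to single-core mode.
    }
    return totalCpuUsage;
}

private void SetProcessToRealTimePriority()
{
    // RealTime requires administrator rights; without them Windows typically grants High instead.
    // A RealTime process that never yields can starve input, disk and network handling.
    using (var p = Process.GetCurrentProcess())
        p.PriorityClass = ProcessPriorityClass.RealTime;
}

private void SetAffinityTo1Core()
{
    const long mask = 0x1; // bit 0 = logical processor 0 only

    using (Process proc = Process.GetCurrentProcess())
    {
        proc.ProcessorAffinity = (IntPtr)mask;

        // Note: Threads[0] is not guaranteed to be the audio (or even the main) thread.
        ProcessThread thread = proc.Threads[0];
        thread.ProcessorAffinity = (IntPtr)mask;
    }
    lblAssignedThreads.Text = "AF1/T1";
}

private void SetAffinityToAllCores()
{
    // Environment.ProcessorCount stands in for the ProcessorCount member used originally.
    int processorCount = Environment.ProcessorCount;

    // One bit per logical processor. Shift a long (1L), not an int: C# masks an int
    // shift count to 5 bits, so (0x0001 << 32) is 1 and the mask would become 0.
    // One mask covers at most 64 processors (one Windows processor group), and
    // masks wider than 32 bits assume a 64-bit process.
    long mask = processorCount >= 64 ? -1L : (1L << processorCount) - 1;

    using (Process proc = Process.GetCurrentProcess())
    {
        proc.ProcessorAffinity = (IntPtr)mask;

        ProcessThread thread = proc.Threads[0];
        thread.ProcessorAffinity = (IntPtr)mask;
    }
    lblAssignedThreads.Text = "AF" + processorCount + "/T1";
}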
[Attached screenshots: 8763.jpg, 536.jpg]
post #33 of 192, 11-02-2016, 01:49 AM (thread starter): BassThatHz
I think what I'm gonna do is make a small multi-threaded C# test app and see how many bytes per second I can generate running BiQuads on all cores loaded to 100% CPU on my 4C/8T i7. That's pretty much gonna be the best-case scenario for my box:
8 threads going as fast as they can in .NET.
Then we can see just how fast C# is (or isn't).
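Since the test app will spend its time in BiQuads, here is a minimal C# sketch of one section, assuming a Transposed Direct Form II structure, coefficients normalized so a0 = 1, and the RBJ Audio EQ Cookbook low-pass formulas as the example design. `LowPass` and `ProcessBiquad` are hypothetical names, not code from the app:

```csharp
using System;

// Hypothetical helper: RBJ "Audio EQ Cookbook" low-pass, normalized so a0 = 1.
// Returns {b0, b1, b2, a1, a2}.
static double[] LowPass(double sampleRate, double f0, double q)
{
    double w0 = 2 * Math.PI * f0 / sampleRate;
    double cosW0 = Math.Cos(w0);
    double alpha = Math.Sin(w0) / (2 * q);
    double a0 = 1 + alpha;
    return new[]
    {
        (1 - cosW0) / 2 / a0,
        (1 - cosW0) / a0,
        (1 - cosW0) / 2 / a0,
        -2 * cosW0 / a0,
        (1 - alpha) / a0,
    };
}

// Transposed Direct Form II biquad, processed in place. 'state' holds {z1, z2}
// and must persist between buffers: one pair per biquad per channel.
static void ProcessBiquad(float[] buffer, double[] c, double[] state)
{
    double z1 = state[0], z2 = state[1];
    for (int n = 0; n < buffer.Length; n++)
    {
        double x = buffer[n];
        double y = c[0] * x + z1;
        z1 = c[1] * x - c[3] * y + z2;
        z2 = c[2] * x - c[4] * y;
        buffer[n] = (float)y;
    }
    state[0] = z1;
    state[1] = z2;
}

// Usage: a cascade of 5 identical 1 kHz low-pass sections on a 48 kHz DC input.
var coeffs = LowPass(48000, 1000, 0.707);
var states = new double[5][];
for (int i = 0; i < states.Length; i++) states[i] = new double[2];

var block = new float[4096];
Array.Fill(block, 1.0f);
foreach (var s in states) ProcessBiquad(block, coeffs, s);
Console.WriteLine(block[block.Length - 1]);   // settles very close to 1.0 (unity DC gain)
```

Keeping the delay state in `double` while the buffer stays `float` is a common compromise for low-frequency sections; each channel needs its own state arrays.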

2 channels × 16 bits × 44,100 samples/s = 1,411,200 bits per second, or 176.4 kB/s.

So activating all 24 channels of a Motu 24AO at the same format is gonna require a RAM bandwidth of about 2.1 MB/s (24 × 88.2 kB/s).
My RAM does over 15 GB/s read/write, so it's plenty fast for any audio stream. (The RAM and CPU won't be the bottleneck; C# will... for sure!)
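The data-rate figures in this thread are just channels × bits per sample × sample rate. A quick C# check (the helper name is hypothetical) confirms 1,411,200 bit/s for stereo 16/44.1, puts 24 channels at 16/44.1 at about 2.12 MB/s, and reproduces the roughly 148 Mbps and 111 Mbps figures for 24-bit/192 kHz given later in the thread:

```csharp
using System;

// Raw PCM data rate in bits per second: channels x bits per sample x sample rate.
static long PcmBitsPerSecond(int channels, int bitsPerSample, int sampleRate) =>
    (long)channels * bitsPerSample * sampleRate;

long stereoCd = PcmBitsPerSecond(2, 16, 44100);       // 1,411,200 bit/s
long motu24Cd = PcmBitsPerSecond(24, 16, 44100);      // 16,934,400 bit/s
long hiRes32 = PcmBitsPerSecond(32, 24, 192000);      // 147,456,000 bit/s
long motu24HiRes = PcmBitsPerSecond(24, 24, 192000);  // 110,592,000 bit/s

Console.WriteLine($"2 ch 16/44.1:  {stereoCd / 8 / 1000.0} kB/s");   // 176.4
Console.WriteLine($"24 ch 16/44.1: {motu24Cd / 8 / 1e6} MB/s");      // 2.1168
Console.WriteLine($"32 ch 24/192:  {hiRes32 / 1e6} Mbit/s");         // 147.456
Console.WriteLine($"24 ch 24/192:  {motu24HiRes / 1e6} Mbit/s");     // 110.592
```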

I'll write that app tomorrow and see what I get. If it hits 1000 Mbps that would be perfect! That would be as fast as an AVB stream carrying the 256-512 channels that Motu talks about...

No sense going any further until I can prove that C# is fast enough. Otherwise I'll have to write the BiQuads in C++ (which isn't a big deal...)

All you have to do is go File -> Add -> New -> C++ Project
and then link them together; we're talking like 10 minutes of work...
[Attached screenshot: 888.jpg]

Last edited by BassThatHz; 11-02-2016 at 01:52 AM.
post #34 of 192, 11-02-2016, 04:13 AM: 3ll3d00d (AVS Forum Special Member, joined Sep 2007, London, UK)
Quote:
Originally posted by awediophile:
IIRC, the main issue with this approach is that round trip latencies are likely unpredictable and quite high. These cards are made for gaming or relatively long computing tasks, so there is not likely to be much optimization for latency, particularly with respect to the retrieval of results at the end. Of course, things could have changed since I last looked.
HQPlayer has a CUDA offload feature (http://www.signalyst.com/consumer.html), which I believe it uses for various upsampling algorithms; no idea what the RTT is like.
post #35 of 192, 11-02-2016, 12:41 PM: DreamWarrior (AVS Forum Special Member, joined Nov 2005)
Quote:
Originally posted by BassThatHz:
*snip*
I believe the ASIO output buffer is actually interleaved between the channels, so literally every X'th element in the array would be written to by the X'th thread. I don't think this would be a problem; if I recall correctly from the past, writing like that doesn't cause any problems, as the other threads aren't trying to write or read those memory locations at all.
(Just don't EVER change the array size without first stopping the audio stream and all the threads.)
*snip*
If the ASIO output buffer is interleaved, then you'd need a barrier on the threads to make sure they've all computed their part of the buffer before you send it to the driver. Lacking that, you're just hoping the processing gets done in time and, if it doesn't, you'll have to hope there isn't garbage on those channels when the buffer is consumed. No?

That said, I suppose if you use a lock-free barrier then you'll keep your threads CPU-bound. But, it does introduce some unpredictability that is probably undesirable in a real-time application. So, unless you really can't compute all your filters on a single core within your time-allotment, I wouldn't bother with multi-threading.
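The barrier idea can be sketched in C# with `System.Threading.Barrier` (a blocking barrier, not the lock-free kind mentioned above): one worker per channel fills its slots of a shared, frame-major interleaved buffer, and the barrier's post-phase action runs exactly once per block after every worker has arrived and before any of them may start the next block. The `deliver` callback is a hypothetical stand-in for handing the buffer to the driver; no ASIO API is used, and the "DSP" just writes a recognizable value:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// One worker per channel writes its slots of a shared interleaved buffer
// (frame-major: buffer[frame * channels + channel]). The Barrier's post-phase
// action runs once per block, after every worker has signalled and before any
// is released, so the buffer it sees is complete and nobody is writing to it.
static void RunBlocks(int channels, int frames, int blocks, Action<float[]> deliver)
{
    var interleaved = new float[channels * frames];
    using var barrier = new Barrier(channels, _ => deliver(interleaved));

    var workers = new Task[channels];
    for (int ch = 0; ch < channels; ch++)
    {
        int channel = ch; // per-iteration copy for the lambda
        workers[channel] = Task.Factory.StartNew(() =>
        {
            for (int block = 0; block < blocks; block++)
            {
                for (int f = 0; f < frames; f++)
                    interleaved[f * channels + channel] = block * 1000 + channel; // stand-in for DSP
                barrier.SignalAndWait(); // wait here until the whole block has been delivered
            }
        }, TaskCreationOptions.LongRunning);
    }
    Task.WaitAll(workers);
}

int delivered = 0;
bool consistent = true;
RunBlocks(channels: 8, frames: 256, blocks: 50, deliver: buf =>
{
    // Every sample of a delivered buffer must come from the same block.
    int blockIndex = (int)(buf[0] / 1000);
    for (int i = 0; i < buf.Length; i++)
        if (buf[i] != blockIndex * 1000 + i % 8) consistent = false;
    delivered++;
});
Console.WriteLine($"{delivered} blocks delivered, all consistent: {consistent}");
```

The cost is that every block waits for the slowest channel, which is the unpredictability concern raised above.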
Quote:
Originally posted by awediophile:
IIRC, the main issue with this approach is that round trip latencies are likely unpredictable and quite high. These cards are made for gaming or relatively long computing tasks, so there is not likely to be much optimization for latency, particularly with respect to the retrieval of results at the end. Of course, things could have changed since I last looked.
I'd agree they probably are unpredictable, but I'm not sure they are high. However, I'd guess that a FIR with enough taps to warrant GPU offload would be sufficiently latent that any additional latency required to buffer a few frames and overcome any CUDA unpredictability would go unnoticed.

That said, it may not be worth the effort, considering the CPU is probably capable of computing FIRs well enough on its own, especially if it uses the SIMD instruction sets. I suppose starting there probably makes more sense in most cases.
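To illustrate the SIMD route, here is a minimal sketch of a direct-form FIR block in C# using `System.Numerics.Vector<float>`, which the JIT maps onto SSE/AVX registers when `Vector.IsHardwareAccelerated` is true. It assumes the input array carries `taps - 1` samples of history in front of the current block and that the coefficients are stored reversed so the inner product runs over contiguous memory; `FirBlock` is a hypothetical name. Long filters are normally done with FFT (partitioned) convolution instead; this only shows the vectorized direct form:

```csharp
using System;
using System.Numerics;

// Direct-form FIR over one block, vectorized across taps with Vector<float>.
// history: (tapCount - 1) past samples followed by output.Length new samples.
// reversedTaps: coefficients in reverse order, so output[n] is the contiguous dot
// product reversedTaps . history[n..], i.e. sum_k h[k] * x[n + tapCount - 1 - k].
static void FirBlock(float[] history, float[] reversedTaps, float[] output)
{
    int tapCount = reversedTaps.Length;
    int width = Vector<float>.Count;              // e.g. 8 floats with AVX2
    for (int n = 0; n < output.Length; n++)
    {
        var acc = Vector<float>.Zero;
        int j = 0;
        for (; j <= tapCount - width; j += width)
            acc += new Vector<float>(history, n + j) * new Vector<float>(reversedTaps, j);
        float sum = Vector.Dot(acc, Vector<float>.One); // horizontal add of the lanes
        for (; j < tapCount; j++)                       // scalar tail
            sum += history[n + j] * reversedTaps[j];
        output[n] = sum;
    }
}

// Usage: 37 random taps, 64-sample block.
var rng = new Random(7);
float[] coeffs = new float[37];
for (int i = 0; i < coeffs.Length; i++) coeffs[i] = (float)(rng.NextDouble() - 0.5);
float[] coeffsReversed = (float[])coeffs.Clone();
Array.Reverse(coeffsReversed);

int frameCount = 64;
var signal = new float[frameCount + coeffs.Length - 1];
for (int i = 0; i < signal.Length; i++) signal[i] = (float)(rng.NextDouble() * 2 - 1);

var filtered = new float[frameCount];
FirBlock(signal, coeffsReversed, filtered);
Console.WriteLine($"SIMD width: {Vector<float>.Count}, accelerated: {Vector.IsHardwareAccelerated}");
```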
Quote:
Originally posted by 3ll3d00d:
HQPlayer has a CUDA offload feature (http://www.signalyst.com/consumer.html), which I believe it uses for various upsampling algorithms; no idea what the RTT is like.
Interesting. I saw a bunch of stuff using CUDA for FIRs, but there weren't many practical working examples. Of course, a player of this kind could buffer a bit to make up for any latency without it being very noticeable.

So, awediophile could still have a good point if one were attempting to use the device to real-time process an audio stream with low-latency (say sufficiently low to align with a video stream as would be required in an HT processor).
post #36 of 192, 11-02-2016, 01:50 PM: 3ll3d00d
From a purely functional point of view, i.e. the requirement is "n channels with an FIR filter of x taps and a target delay of y ms" and any personal satisfaction from rolling your own counts for nothing, I don't really understand why people don't just try BruteFIR on an appropriately configured Linux box first. I mean, years ago it was as fast as a soundcard for a stereo setup, and CPU power has come on by miles since then. Has anyone seen (or run themselves) any recent benchmarks for BruteFIR?
post #37 of 192, 11-02-2016, 01:55 PM: awediophile
Quote:
Originally posted by 3ll3d00d:
HQPlayer has a CUDA offload feature (http://www.signalyst.com/consumer.html), which I believe it uses for various upsampling algorithms; no idea what the RTT is like.
Who knows? A player application does not have the time constraints that a realtime processor has. What latency there is only really affects how long it takes for the player to respond when the user does a seek or a track change. Delays of several hundred milliseconds are likely to be entirely acceptable.
post #38 of 192, 11-02-2016, 02:08 PM: 3ll3d00d
Quote:
Originally posted by awediophile:
Who knows? A player application does not have the time constraints that a realtime processor has. What latency there is only really affects how long it takes for the player to respond when the user does a seek or a track change. Delays of several hundred milliseconds are likely to be entirely acceptable.
Of course a music player has different requirements; I'm not sure of the relevance of that to HQPlayer, though, given that it also has an embedded mode on Linux (possibly not obvious from the linked site?).
post #39 of 192, 11-02-2016, 02:11 PM: awediophile
Quote:
Originally posted by DreamWarrior:
I'd agree they probably are unpredictable, but I'm not sure they are high. However, I'd guess that a FIR with enough taps to warrant GPU offload would be sufficiently latent that any additional latency required to buffer a few frames and overcome any CUDA unpredictability would go unnoticed.
That depends entirely on the FIR filter. A causal FIR filter (one whose output does not depend on future input samples) of any length does not introduce any extra latency. Such filters are still very useful for crossovers, signal shaping, and room EQ, and may be preferred where low-latency operation is required (e.g., video playback where video buffering is not available, live sound, and gaming).
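The causal case can be made concrete with a streaming direct-form FIR: each output y[n] = Σ h[k]·x[n−k] uses only the current and past inputs, so a sample can be emitted as soon as it arrives, whatever the tap count. A minimal C# sketch with a circular delay line (`MakeCausalFir` and the example taps are hypothetical). The filter's group delay is a separate property of the coefficients, so a symmetric (linear-phase) set of taps still delays the signal:

```csharp
using System;

// Streaming direct-form FIR with a circular delay line. Each call takes one input
// sample x[n] and returns y[n] = sum_k taps[k] * x[n - k] straight away: only the
// current and earlier inputs are used, so no look-ahead buffering is required.
static Func<float, float> MakeCausalFir(float[] taps)
{
    var history = new float[taps.Length];   // circular: newest sample at writePos
    int writePos = 0;
    return sample =>
    {
        history[writePos] = sample;
        float acc = 0f;
        int read = writePos;
        for (int k = 0; k < taps.Length; k++)
        {
            acc += taps[k] * history[read];                     // taps[k] * x[n - k]
            read = read == 0 ? history.Length - 1 : read - 1;   // step back one sample
        }
        writePos = writePos + 1 == history.Length ? 0 : writePos + 1;
        return acc;
    };
}

// Usage: feed a unit impulse; the response starts at the very first output
// sample and simply replays the taps.
float[] decayingTaps = { 0.5f, 0.3f, 0.15f, 0.05f };   // arbitrary example coefficients
var fir = MakeCausalFir(decayingTaps);
var impulseResponse = new float[6];
for (int n = 0; n < impulseResponse.Length; n++)
    impulseResponse[n] = fir(n == 0 ? 1f : 0f);
Console.WriteLine(string.Join(" ", impulseResponse));
```

In a real engine the samples arrive in blocks, so the practical latency is set by the I/O buffer size rather than by the filter.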
post #40 of 192, 11-02-2016, 04:03 PM: DreamWarrior
Quote:
Originally posted by awediophile:
That depends entirely on the FIR filter. A causal FIR filter (one whose output does not depend on future input samples) of any length does not introduce any extra latency. Such filters are still very useful for crossovers, signal shaping, and room EQ, and may be preferred where low-latency operation is required (e.g., video playback where video buffering is not available, live sound, and gaming).
I thought a linear phase FIR necessarily adds delay. Granted, I suppose the FIR doesn't need to be linear phase, but isn't that the useful thing it can do that IIR can't?

I'm not schooled on signal processing, though...so...I'm just here to learn and observe. I can stand on the shoulders of giants (e.g. the NAudio peeps) and build software, though, lol.
post #41 of 192, 11-02-2016, 10:36 PM (thread starter): BassThatHz
With a single thread I got 9520Mbps, but that was with no DSP.
[Attached screenshot: 535.jpg]

Last edited by BassThatHz; 11-02-2016 at 11:02 PM.
post #42 of 192, 11-03-2016, 12:54 AM (thread starter): BassThatHz
It processed 32 channels of 4kb buffers with 5 BiQuads each channel, with 32 threads, in just 63ms or 66.6Gbps...
Running on an i7 4C/8T @ 3.7 GHz.
40 MB of RAM.


With a 2k buffer it's 36ms.
With a 1k buffer it's 47ms.
With a 0.5k buffer it's 28ms.
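Because each channel's biquad cascade has its own state, channels can be spread across cores without locks. A minimal sketch of that layout with `Parallel.For`, assuming 32 channels of 4096 samples and 5 biquads per channel as in the post above; the coefficients are arbitrary stable values and the names are hypothetical. It checks that the parallel output matches a single-threaded run. The single timing it prints is cold (JIT and thread-pool start-up included), so it should not be compared directly with the figures above:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Transposed Direct Form II biquad, in place. c = {b0, b1, b2, a1, a2}, st = {z1, z2}.
static void Biquad(float[] buf, double[] c, double[] st)
{
    double z1 = st[0], z2 = st[1];
    for (int n = 0; n < buf.Length; n++)
    {
        double x = buf[n];
        double y = c[0] * x + z1;
        z1 = c[1] * x - c[3] * y + z2;
        z2 = c[2] * x - c[4] * y;
        buf[n] = (float)y;
    }
    st[0] = z1; st[1] = z2;
}

// Channels share no state, so each cascade can run on its own core without locks;
// parallel: false gives a single-threaded reference.
static void ProcessChannels(float[][] bufs, double[][] chain, double[][][] states, bool parallel)
{
    void RunChannel(int ch)
    {
        for (int s = 0; s < chain.Length; s++)
            Biquad(bufs[ch], chain[s], states[ch][s]);
    }

    if (parallel) Parallel.For(0, bufs.Length, RunChannel);
    else for (int i = 0; i < bufs.Length; i++) RunChannel(i);
}

const int Channels = 32, Frames = 4096, Sections = 5;
var rng = new Random(42);
var source = new float[Channels][];
for (int ch = 0; ch < Channels; ch++)
{
    source[ch] = new float[Frames];
    for (int n = 0; n < Frames; n++) source[ch][n] = (float)(rng.NextDouble() * 2 - 1);
}

// Arbitrary but stable example section (poles at radius ~0.32), repeated 5 times.
double[] section = { 0.2, 0.4, 0.2, -0.3, 0.1 };
var cascade = new double[Sections][];
for (int s = 0; s < Sections; s++) cascade[s] = section;

float[][] CopyInput() => Array.ConvertAll(source, b => (float[])b.Clone());
double[][][] FreshStates()
{
    var all = new double[Channels][][];
    for (int ch = 0; ch < Channels; ch++)
    {
        all[ch] = new double[Sections][];
        for (int s = 0; s < Sections; s++) all[ch][s] = new double[2];
    }
    return all;
}

var sequential = CopyInput();
ProcessChannels(sequential, cascade, FreshStates(), parallel: false);

var parallelOut = CopyInput();
var sw = Stopwatch.StartNew();
ProcessChannels(parallelOut, cascade, FreshStates(), parallel: true);
sw.Stop();
Console.WriteLine($"{Channels} ch x {Frames} frames x {Sections} biquads: {sw.Elapsed.TotalMilliseconds:F2} ms (cold)");
```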

The more cores you have in your machine the faster my app will run...

I highly recommend a Xeon E7-8890 for this task!
http://www.pcworld.com/article/30795...computers.html

32 channels at 192 kHz / 24 bits require about 148 Mbps to avoid dropping bits.
But the Motu 24AO only has 24 channels, so it only needs about 111 Mbps.
[Attached screenshot: 4324.jpg]

Last edited by BassThatHz; 11-03-2016 at 01:43 AM.
post #43 of 192, 11-03-2016, 02:09 AM: awediophile
Quote:
Originally posted by DreamWarrior:
I thought a linear phase FIR necessarily adds delay. Granted, I suppose the FIR doesn't need to be linear phase, but isn't that the useful thing it can do that IIR can't?

I'm not schooled on signal processing, though...so...I'm just here to learn and observe. I can stand on the shoulders of giants (e.g. the NAudio peeps) and build software, though, lol.
Linear phase EQ is one thing FIR filters can do that IIRs can't, and yes it does add some delay.

Actually, some clarification of terminology is in order here. What is typically desired instead of linear phase is zero-phase EQ, that is, EQ that changes the frequency response while leaving the phase response alone. A zero-phase filter's impulse response is symmetric about t=0, so it depends on samples in both the past and the future. As such, a zero-phase filter cannot be realized without adding delay. If the IR is shifted right in time so that its first non-zero sample is at t=0, this introduces the required delay and makes it causal. Note that the phase response of a filter that does nothing but delay the signal is linear versus frequency, so a zero-phase filter convolved with a delay is a linear-phase filter.
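The zero-phase to linear-phase step can be written out for a real FIR of length 2M + 1 (M is a symbol introduced here for illustration):

```latex
% Real zero-phase FIR with 2M+1 taps, symmetric about n = 0: h[-n] = h[n]
H(e^{j\omega}) = \sum_{n=-M}^{M} h[n]\,e^{-j\omega n}
               = h[0] + 2\sum_{n=1}^{M} h[n]\cos(\omega n) \quad\in \mathbb{R}

% Shift right by M samples to make it causal: g[n] = h[n-M], \; n = 0,\dots,2M
G(e^{j\omega}) = \sum_{n=0}^{2M} h[n-M]\,e^{-j\omega n}
               = e^{-j\omega M}\,H(e^{j\omega})

% Hence \arg G(e^{j\omega}) = -\omega M (plus \pi where H < 0): linear in \omega,
% a constant group delay of M samples, with |G| = |H|.
```

So the shifted filter keeps the zero-phase magnitude response and adds a pure delay of M samples, which is the latency cost of linear-phase EQ.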

Other acausal filters may also be useful in room correction or to cancel out unwanted non-minimum-phase effects elsewhere in the chain. These can also be time-shifted to turn them into causal filters, at the cost of an additional linear phase shift. (The shifted filter only has linear phase if the original filter did.)

In any case, even causal FIR filters (including minimum phase) have substantial advantages over biquads, and in some instances, biquads have advantages over FIRs. It's a "right tool for the job" sort of thing.
post #44 of 192, 11-03-2016, 02:40 AM: awediophile
Quote:
Originally posted by 3ll3d00d:
From a purely functional point of view, i.e. the requirement is "n channels with an FIR filter of x taps and a target delay of y ms" and any personal satisfaction from rolling your own counts for nothing, I don't really understand why people don't just try BruteFIR on an appropriately configured Linux box first. I mean, years ago it was as fast as a soundcard for a stereo setup, and CPU power has come on by miles since then. Has anyone seen (or run themselves) any recent benchmarks for BruteFIR?
I've discussed some of my reasons in the past. Perhaps the most important reason is that I already had a lot of code written for audio measurement, analysis, and simulation. Adding realtime processing capability to this suite of tools turns it into a powerful integrated package.

The BruteFIR code was not written to be used as a library, and so it would have required more effort to incorporate its capabilities into my system than for me to simply roll my own convolver. Even ignoring all that, BruteFIR doesn't support biquads, which I find to be useful even with FIR capability. How about limiters? How about run-time manipulation of filter characteristics beyond mere gains? And what happens if I run out of CPU on a single core? (I expect I eventually will with my appetite.) The author himself concedes that substantial improvements would require a code overhaul.

Lastly, I had enough knowledge and experience in this area to know what I was doing and to know that I could get results in a short time. I can't say that's true for the vast majority of programmers, including the OP, who would be better off following your advice here, IMO (no offense meant).

But he'll have to answer for himself.
post #45 of 192, 11-03-2016, 02:58 AM: awediophile
Quote:
Originally posted by 3ll3d00d:
Of course a music player has different requirements; I'm not sure of the relevance of that to HQPlayer, though, given that it also has an embedded mode on Linux (possibly not obvious from the linked site?).
AFAICT, the embedded mode is just the music player without the GUI, so the user can basically roll their own UI for it. Under the hood, it's still a player. Please correct me if I'm missing something here. (The info on "embedded mode" is quite sparse.)
post #46 of 192, 11-03-2016, 02:58 AM: 3ll3d00d
@awediophile I meant people in general, as opposed to your specific case (which involves functionality not readily available off the shelf, i.e. a valid reason to roll your own).
post #47 of 192, 11-03-2016, 03:01 AM: awediophile
Quote:
Originally posted by 3ll3d00d:
@awediophile I meant people in general, as opposed to your specific case (which involves functionality not readily available off the shelf, i.e. a valid reason to roll your own).
Well now I'm curious. Who else is working on this sort of thing? I didn't think there were that many of us.
post #48 of 192, 11-03-2016, 03:04 AM: 3ll3d00d
Quote:
Originally posted by awediophile:
Well now I'm curious. Who else is working on this sort of thing? I didn't think there were that many of us.
I think we have some crossed wires here. There are posts in this thread expressing interest, and there have been a few posts about rolling your own processor, e.g. for Atmos use. My point was just that one can build such a thing today using existing packages.
post #49 of 192, 11-03-2016, 03:23 AM: 3ll3d00d
Quote:
Originally posted by awediophile:
AFAICT, the embedded mode is just the music player without the GUI, so the user can basically roll their own UI for it. Under the hood, it's still a player. Please correct me if I'm missing something here. (The info on "embedded mode" is quite sparse.)
Agree docs are sparse but it appears you can run it headless and control it programmatically, if so then it should be possible to wire it up as an audio processor that sits there churning away at this input stream. No idea whether this would work reliably mind you.

As an aside, given the cost of server-grade Intel hardware, I suspect putting a limit on your filters will be good for your wallet.
post #50 of 192, 11-03-2016, 04:30 AM: 3ll3d00d
Quote:
Originally posted by awediophile:
And what happens if I run out of CPU on a single core? (I expect I eventually will with my appetite.)
I'm curious what your use case is for that.

To recap, BruteFIR supports multiple threads (albeit as forked processes), but each filter is run by a single-threaded process. That means you can statically allocate filters to threads, which should be fairly easy because the filter and convolver configuration is static and so presents a known load that is amenable to that sort of static allocation. If you really want to run it to the max and have the cores, use isolcpus to confine the OS to the cores nearest the bus so that interrupt handling stays out of your hot threads, turn off hyperthreading and any and all P-state stuff in the BIOS (which probably implies server-grade hardware, hence many $$$), and then taskset your way through the filters to pin each one to a specific core (not sure if BruteFIR has a config option for this built in; probably not, given when it was written).

If this is still not enough CPU for a single filter then that implies you need more cores than (filter) channels, the hardware required to achieve this for HT use would be incredibly expensive, would run extremely hot and cane a load of power. Obviously some people are extremely wealthy and prone to excess but this seems a little OTT

Another alternative is you have individual filters that are extremely expensive (computationally) and others which are not. This might be more feasible though I'm not sure what sort of setup would require that.
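When the filter configuration is static, the "known load" can be split across the isolated cores ahead of time. One common heuristic is longest-processing-time-first: sort filters by estimated cost and give each to the currently least-loaded core. A minimal C# sketch; the tap counts (used as a rough stand-in for cost) and core numbers are hypothetical, and on Linux each resulting process could then be pinned with `taskset -c <core>` as described above:

```csharp
using System;
using System.Linq;

// Longest-processing-time-first: take filters heaviest first and give each one
// to the core with the least load assigned so far. Returns the core per filter.
static int[] AssignFilters(double[] cost, int[] cores)
{
    var load = new double[cores.Length];
    var assignment = new int[cost.Length];
    foreach (int f in Enumerable.Range(0, cost.Length).OrderByDescending(idx => cost[idx]))
    {
        int best = 0;
        for (int k = 1; k < load.Length; k++)
            if (load[k] < load[best]) best = k;
        load[best] += cost[f];
        assignment[f] = cores[best];
    }
    return assignment;
}

// Usage: tap counts as a rough cost estimate, four cores set aside (e.g. via isolcpus).
double[] tapCounts = { 65536, 16384, 16384, 8192, 8192, 4096, 4096, 2048 };
int[] isolatedCores = { 2, 3, 4, 5 };
int[] plan = AssignFilters(tapCounts, isolatedCores);
for (int i = 0; i < plan.Length; i++)
    Console.WriteLine($"filter {i} ({tapCounts[i]} taps) -> core {plan[i]}");
```

In this example the single 65536-tap filter ends up alone on core 2 and still dominates, which is the "individual filters that are extremely expensive" case.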
post #51 of 192, 11-03-2016, 05:29 AM: dscoker (Member, joined Oct 2016)
What is your plan for the end result? Are you trying to go directly from your PC to your amplifiers in order to eliminate the need for a receiver?
post #52 of 192, 11-03-2016, 05:35 AM: 3ll3d00d
Quote:
Originally posted by BassThatHz:
With a single thread I got 9520Mbps, but that was with no DSP.
FWIW, microbenchmarking is pretty hard to do in general; a benchmark like this is really not a reliable way to measure things.
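A minimal sketch of a slightly more careful timing loop in C#: run the workload a few times first so JIT compilation and cold caches are not measured, then time many repetitions and report the median, which is less sensitive to GC pauses and scheduler hiccups than a single run. `MedianMilliseconds` is a hypothetical helper; for real measurements a dedicated harness such as BenchmarkDotNet is the usual choice:

```csharp
using System;
using System.Diagnostics;

// Warm up, then time many runs and return the median in milliseconds.
static double MedianMilliseconds(Action work, int warmupRuns = 5, int measuredRuns = 31)
{
    for (int i = 0; i < warmupRuns; i++) work();   // JIT, caches, page faults

    var samples = new double[measuredRuns];
    var sw = new Stopwatch();
    for (int i = 0; i < measuredRuns; i++)
    {
        sw.Restart();
        work();
        sw.Stop();
        samples[i] = sw.Elapsed.TotalMilliseconds;
    }
    Array.Sort(samples);
    int mid = measuredRuns / 2;
    return measuredRuns % 2 == 1 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2;
}

// Usage: time a trivial gain-and-offset pass over a 4096-sample buffer.
int calls = 0;
var buffer = new float[4096];
double median = MedianMilliseconds(() =>
{
    calls++;
    for (int n = 0; n < buffer.Length; n++) buffer[n] = buffer[n] * 0.5f + 1f;
});
Console.WriteLine($"median of 31 runs: {median:F4} ms");
```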
post #53 of 192, 11-03-2016, 01:06 PM: awediophile
Quote:
Originally posted by 3ll3d00d:
Agree docs are sparse but it appears you can run it headless and control it programmatically, if so then it should be possible to wire it up as an audio processor that sits there churning away at this input stream. No idea whether this would work reliably mind you.

As an aside, given the cost of server-grade Intel hardware, I suspect putting a limit on your filters will be good for your wallet.
I'm sure it works perfectly reliably, but you must accept a long latency due to all the buffering. It's no different than buffering a network audio stream in order to tolerate intermittent hiccups. In the old days of streaming, the buffer size was configurable so you could decide on the balance between resilience and latency. Avoiding all pauses and skips often required buffering at least a few seconds worth of content. That led to a very high latency, which primarily manifested in a longer startup time because a certain amount had to be buffered from the sender before playback could start. Buffering on the sender side did help reduce these startup times by allowing data to be sent at faster-than-realtime speeds to receivers undergoing initialization, but this just effectively added latency between the content production (if live) and its transmission. Of course, if the broadcast was live and especially time sensitive, a few seconds of latency could be excessive, but most of the time long latencies were perfectly acceptable in this application.

For realtime through-processing of analog sources and digital sources without flow control capability, however, latencies of a few seconds are rarely acceptable. In some cases, a few hundred milliseconds may be OK, but not if you are trying to play games or are playing video and have no way to delay the video.
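The trade-off is easy to quantify: a buffer's latency is its length in frames divided by the sample rate. A minimal C# sketch (helper names hypothetical) converting between the two. In a block-based real-time loop, each buffer also has to be processed in less time than it lasts, or the output underruns:

```csharp
using System;

// Duration of a buffer of 'frames' samples (per channel) at 'sampleRate'.
static double BufferMilliseconds(int frames, int sampleRate) => 1000.0 * frames / sampleRate;

// Frames needed to hold a given duration, rounded up.
static int FramesFor(double milliseconds, int sampleRate) =>
    (int)Math.Ceiling(milliseconds * sampleRate / 1000.0);

double block48k = BufferMilliseconds(4096, 48000);    // ~85.3 ms
double block192k = BufferMilliseconds(4096, 192000);  // ~21.3 ms
int twoSeconds = FramesFor(2000, 44100);              // 88,200 frames
int threeHundredMs = FramesFor(300, 48000);           // 14,400 frames

Console.WriteLine($"4096 frames: {block48k:F1} ms @ 48 kHz, {block192k:F1} ms @ 192 kHz");
Console.WriteLine($"2 s @ 44.1 kHz = {twoSeconds} frames; 300 ms @ 48 kHz = {threeHundredMs} frames");
```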

I haven't looked recently, but is server grade hardware really that expensive? I didn't feel that way a couple years ago when I built my storage server. I am in the USA, if that matters. A plus for server hardware is ECC memory support, which reduces the chance of intermittently faulty RAM causing your audio interface to blast your ears with full scale white noise.
post #54 of 192, 11-03-2016, 01:30 PM: 3ll3d00d
High-core-count, high-clock-speed Xeons are way into four figures each over here (between £1k and £7k each), and a motherboard is £500 at least. I imagine it could be done on a more reasonable budget by going for kit a few years old. No idea what US prices are like; probably not as bad as here!
post #55 of 192, 11-03-2016, 02:19 PM: awediophile
Quote:
Originally posted by 3ll3d00d:
I'm curious what your use case is for that.

To recap, BruteFIR supports multiple threads (albeit as forked processes), but each filter is run by a single-threaded process. That means you can statically allocate filters to threads, which should be fairly easy because the filter and convolver configuration is static and so presents a known load that is amenable to that sort of static allocation. If you really want to run it to the max and have the cores, use isolcpus to confine the OS to the cores nearest the bus so that interrupt handling stays out of your hot threads, turn off hyperthreading and any and all P-state stuff in the BIOS (which probably implies server-grade hardware, hence many $$$), and then taskset your way through the filters to pin each one to a specific core (not sure if BruteFIR has a config option for this built in; probably not, given when it was written).

If this is still not enough CPU for a single filter then that implies you need more cores than (filter) channels, the hardware required to achieve this for HT use would be incredibly expensive, would run extremely hot and cane a load of power. Obviously some people are extremely wealthy and prone to excess but this seems a little OTT

Another alternative is you have individual filters that are extremely expensive (computationally) and others which are not. This might be more feasible though I'm not sure what sort of setup would require that.
The application is matrix processing. For example, 8 inputs to 16 outputs could utilize up to 128 independent processing pipelines.
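A minimal C# sketch of that matrix layout, assuming one FIR per (input, output) path and direct-form convolution within a single block (samples before the block are treated as zero, so it is not a streaming implementation). `MatrixFir` and the example paths are hypothetical. Each output row depends only on the inputs, so rows (or individual paths) could be handed to separate threads:

```csharp
using System;

// out[o][n] = sum over inputs i of (pathFilters[i][o] convolved with ins[i])[n].
// One FIR per (input, output) path; samples before the block start count as zero.
static float[][] MatrixFir(float[][] ins, float[][][] pathFilters, int outputCount)
{
    int frames = ins[0].Length;
    var result = new float[outputCount][];
    for (int o = 0; o < outputCount; o++)          // each output row is independent
    {
        result[o] = new float[frames];
        for (int i = 0; i < ins.Length; i++)
        {
            float[] path = pathFilters[i][o];
            for (int n = 0; n < frames; n++)
                for (int k = 0; k < path.Length && k <= n; k++)
                    result[o][n] += path[k] * ins[i][n - k];
        }
    }
    return result;
}

Console.WriteLine($"8 inputs x 16 outputs = {8 * 16} paths");

// Usage: 2 inputs -> 3 outputs with trivially short example paths.
float[][] inputs =
{
    new[] { 1f, 2f, 3f, 4f },
    new[] { 10f, 20f, 30f, 40f },
};
float[][][] filters =
{
    // from input 0 to outputs 0, 1, 2: pass-through, half gain, muted
    new[] { new[] { 1f }, new[] { 0.5f }, new[] { 0f } },
    // from input 1: one-sample delay, half gain, polarity inversion
    new[] { new[] { 0f, 1f }, new[] { 0.5f }, new[] { -1f } },
};
float[][] outputs = MatrixFir(inputs, filters, outputCount: 3);
foreach (var row in outputs) Console.WriteLine(string.Join(" ", row));
```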

And I guess you are right that BruteFIR does support multicore processing via multiple forked processes. This is a detail I had missed, especially given the author's comment about it needing a rewrite to support "a more modern design (using threads instead of forked processes)". I'm not sure what the appeal of using threads as opposed to forked processes is. In Linux at least, the distinction between threads and processes is very minor anyway. It looks like they use SHM (shared memory) for data sharing. I wonder how BruteFIR handles synchronization? I can peek at the code again to find out.

Either way, I made my decision long ago. For others, BruteFIR looks to be a very attractive option.
post #56 of 192, 11-04-2016, 01:22 AM (thread starter): BassThatHz
At work, our computers don't use Windows, iOS or any variant of Linux. (Not at ring-0 at least.) I'll leave you guessing what we use.

I wouldn't call Windows nor Linux secure, the only way to get REAL security is actually writing your own OS that processes only trinary on a quantum CPU, or simply disconnecting the computer from the internet and LAN altogether and AB-epoxying all the USB, sata and PCI ports shut, and a read-only BIOS. But now we are getting way off topic.

I'm really only doing this because I'm bored and have nothing better to do with my life.
I'm building a GUI mostly for AVS'ers, but I think I'll be registering and running mine as a background service.

I am trying to AVOID having a dedicated Linux audio-processing machine though.
While I'm not against Linux, I just don't have any need for it (other than perhaps for real-time audio processing, if I can't get it working under Windows).

The only thing I do with my computer is surf the web, read emails, download and listen to 2-ch music, run the odd random app (usually in a protected VM), and play the odd game.

Windows does all of this without me needing to manually download and install a thousand separate modules that were coded by a bunch of monkeys, mavericks, 12-year-olds, and man-children living in their moms' basements.

The Linux kernel is well written; it's all of the other stuff that needs to go with it that is just terrible, because of who coded those add-ons. That has been my experience with Linux.

Whereas most of the Windows Updates are "mostly" coded by people who wear a white collar every day. They actually get fired if they do a bad job (or at least they should...). My Linux updates weren't made by someone who has a thousand nose rings, black lipstick, and goes by the name of l33tzor.

That said, there is a very good reason why Google is written in Python and C++ (with custom compilers) running on a custom Linux kernel, and not Windows: they needed to be able to search the whole internet and return a result in <1 second. Windows isn't good at that sort of high-volume, high-speed task. There is a reason why none of the Top 500 supercomputers run Windows.

But for normal everyday people, Linux is overkill and/or a waste of time (unless you enjoy spending all day typing out SSH commands, installing a zillion .gz files, and being unable to open or save Office 2020 files).

Ignoring gaming, there is absolutely nothing that puts any strain or load on my PC. I basically have a mini supercomputer sitting here doing nothing but surfing the web.

As for business software, MOST can tolerate delays of up to 24 hours (payroll, for example). If they are running their own eBay or FedEx, a flight controller, or a dating-site matching CUDA engine, then speed is more important. But even then, a delay of 500 ms isn't really gonna "break" anything. They aren't running a stock exchange or a missile silo. There are definitely "varying levels" of what is traditionally considered "mission-critical", "required security", "speed-critical", "HA/FT", etc. It depends entirely on what the company/business is doing or wanting to achieve.

That's about all I have to say about that for now... Now back to DIY DSP'ing.
post #57 of 192 Old 11-04-2016, 01:38 AM
Member
Join Date: Jan 2016
Posts: 57
Mentioned: 0 Post(s)
Tagged: 0 Thread(s)
Quoted: 26 Post(s)
Liked: 14
Quote:
Originally Posted by BassThatHz View Post
At work, our computers don't use Windows, iOS or any variant of Linux. (Not at ring-0 at least.) I'll leave you guessing what we use.

Wow, that's a skewed view. Most Linux development, the kernel at least, is done by employees of large firms such as Red Hat, Oracle, IBM, etc. Even Microsoft directly contributes...

The assertion that it's only "freaks" is way off base.

That being said, those one-man-band developers make some cool crap, e.g. FileBot, which is a great GUI for renaming media files.

Moreover, these one-man bands often give far better support than large multinationals because they actually care. For them it's more than a job; it's often a hobby, which means they take pride in it.

</Rant over>


post #58 of 192 Old 11-04-2016, 07:49 AM
AVS Forum Special Member
Join Date: Feb 2007
Location: 08077
Posts: 7,848
Mentioned: 33 Post(s)
Tagged: 1 Thread(s)
Quoted: 1520 Post(s)
Liked: 1127
Quote:
Originally Posted by BassThatHz View Post
Where as most of the Windows Updates are "mostly" coded by people who wear a white collar every day. They actually get fired if they do a bad job. (or at least they should... ). My linux updates weren't made by someone who has a thousand nose rings, black lip stick, and goes by the name of l33tzor.
Wishful thinking, Bass.

Michael
BassThatHz likes this.

Did you really need to quote that entire post in your reply?
Downloadable FREE demo discs: Demonstration Blu-Ray Discs (Independently Authored)
Welcome to AVS - Get out while you still can!
For most of the time, the here and now is neither now nor here. Graham Swift
post #59 of 192 Old 11-05-2016, 01:17 AM
Member
Join Date: Oct 2013
Posts: 181
Mentioned: 4 Post(s)
Tagged: 0 Thread(s)
Quoted: 128 Post(s)
Liked: 51
Well, since someone brought up the comparison: if I had to give one overpowering reason I favor Linux over Windows, it's that Linux is far more DIY-friendly. This, rather than performance, is likely also the primary reason Google uses it to run their back-end tools and why it is such a strong player in the server world.

As a developer, I strongly prefer Linux, even on my desktop. That's not to say that it's a good desktop OS for most users, even developers. (There's OS X for that.) At the same time, it's far superior to Windows in a server/operational context. At one of my last jobs, at a big Internet website company, we had plenty of Windows expertise on staff, but we kept a minimum number of said boxes in our server room (mainly to support stuff that the admin/marketing side of the house relied on) because they were so much more painful to work on when they broke. I remember how much relief we felt when we migrated one of them off some clunky hardware onto a VM.

Also, many of you probably use Linux all the time without realizing it. It's a very common embedded systems OS, running on TVs, media players, cars, and so on. I wouldn't be surprised if the MOTU AVB models actually run Linux inside them. And as end users, there is no pain in using this stuff at all, because all the functionality is accessible through a domain-specific user interface.
3ll3d00d likes this.
post #60 of 192 Old 11-05-2016, 07:35 AM
Member
 
Join Date: Oct 2016
Posts: 17
Mentioned: 0 Post(s)
Tagged: 0 Thread(s)
Quoted: 11 Post(s)
Liked: 0
So can anyone tell me what this DIY custom DSP application is going to get you over and above what is already possible with JRiver? I have always been under the impression that JRiver has the most extensive DSP possibilities of any product on the market. I can't imagine that you would need anything more extensive. What can this DIY DSP application do that JRiver cannot?