I am doing realtime audio DSP with a Motu 16A running in Linux. It is connected via USB 2.0. I wouldn't say that the Linux support via USB 2.0 is flawless, but it works very well for my needs. Currently I matrix process 5.1 channels to 12 (soon to be 16) channels out. I have been running in realtime mostly 24/7 for about 6 months and haven't experienced an XRUN yet. However, see my notes below. The CPU is pretty lightly loaded at ~20% on one core, as I'm mostly using biquads and a handful of 6k FIR filters. I'll probably eventually get around to trying some larger FIR filters, which may bump up the demand a bit.
The biggest issue I'm aware of is that sample rate changes must be initiated on the Motu unit itself. I may be wrong, but I don't think you can configure the host to act as the master clock. I'm not sure this is a flaw of the Motu so much as a limitation of controlling it as though it were a soundcard via USB 2.0. The AVB-enabled devices are specifically designed to allow interconnection with low-latency realtime streaming, and connection via AVB (and possibly Thunderbolt) may allow the PC host to change the sample rate if the PC host is also the clock master. With that said, if you are using the units for analog-to-analog processing, this is of no concern because you will probably just pick one sampling rate and use it for everything.
If you are comfortable with writing code and are familiar with JSON APIs, the 16A and related units appear to support full access/control via a JSON API. I have used it via scripts to automate changes to the routing matrix and am happy to say it works great so far.
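As a rough illustration of what scripting a routing change over an HTTP/JSON API can look like, here is a minimal Python sketch. Every specific in it is an assumption, not something taken from the unit's documentation: the device address, the `/datastore` path, the form-encoded `json` field, the routing key, and the `"1:0"` value are all placeholders, so check the API documentation for your unit before relying on any of them.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: replace with whatever your unit and its API docs specify.
DEVICE_URL = "http://192.168.1.50/datastore"  # assumed address and path
ROUTE_KEY = "ext/obank/0/ch/0/src"            # assumed key for one output's source


def build_request(base_url, changes):
    """Build (but do not send) a request carrying `changes` as JSON.

    The JSON is sent as a form field named "json"; that field name is an
    assumption. Supplying `data` makes urllib issue a POST.
    """
    body = urllib.parse.urlencode({"json": json.dumps(changes)}).encode("ascii")
    return urllib.request.Request(base_url, data=body)


def apply_changes(base_url, changes, timeout=2.0):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_request(base_url, changes), timeout=timeout) as resp:
        return resp.status


# Build a request without touching the network, to inspect what would be sent.
req = build_request(DEVICE_URL, {ROUTE_KEY: "1:0"})
print(req.get_method(), req.full_url, req.data)
```

Keeping request construction separate from sending makes it easy to print and sanity-check a batch of routing changes before applying them to a live system.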
As a warning, achieving reliable low-latency realtime processing can be very challenging on a PC. If you can tolerate a lot of processing latency, you might be OK. A clear advantage to JRiver, currently, is that it handles video as well as audio and so can delay the video if the audio pipeline has a lot of lag. For gaming, however, high audio processing latencies may be unacceptable. I made quite a few changes to my Linux OS configuration on the PC I use for processing. Among other things, I had to reorder my hardware interrupt priorities (can one even do this in Windows?), and alter the CPU governor in order to pass my fitness tests.
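For the CPU governor part, a small Python sketch like the one below can report which governor each core is currently using by reading the standard Linux cpufreq sysfs files. It only reads state and does not change anything. Treating `performance` as the desired governor is an assumption for illustration; the post does not say which governor was chosen. Interrupt priorities are not covered here.

```python
from pathlib import Path


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def not_using(governors, wanted="performance"):
    """List CPUs whose governor differs from `wanted` (an assumed target)."""
    return [cpu for cpu, gov in governors.items() if gov != wanted]


if __name__ == "__main__":
    current = read_governors()
    for cpu, gov in current.items():
        print(cpu, gov)
    print("differs from target:", not_using(current))
```

Running a check like this as part of a startup script is one way to catch a governor that was reset by a reboot or a package update before it shows up as dropouts.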
I was surprised to learn that you were trying to test this without the actual hardware. You really need to test with the actual hardware. I also strongly recommend against trying to do realtime processing in a VM in general. There are obscure VM hosts that are specifically designed to simulate realtime systems (albeit at much slower than realtime speeds), but this is a totally different thing from running a live realtime process within a guest on a typical desktop/server hypervisor. Also, depending on your latency needs and the amount and type of processing you use, there's a decent chance that a large chunk, or even the majority, of your CPU time is spent on overhead rather than actual processing. Overhead rises rapidly as you decrease latency because you are processing everything in smaller chunks, so the fixed cost of each wakeup is paid more often.
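The relationship between chunk size and overhead can be put into numbers with a short Python sketch. The 48 kHz sample rate and the 50 microsecond fixed cost per callback are made-up figures chosen only to show the trend; real per-callback costs depend on the hardware, driver, and audio stack and have to be measured.

```python
def period_stats(frames_per_period, sample_rate_hz, fixed_overhead_us):
    """Return (period_ms, callbacks_per_second, overhead_fraction).

    overhead_fraction is the share of one core spent on the fixed
    per-callback cost alone, before any actual DSP work is done.
    """
    period_s = frames_per_period / sample_rate_hz
    callbacks_per_s = sample_rate_hz / frames_per_period
    overhead_fraction = (fixed_overhead_us * 1e-6) / period_s
    return period_s * 1e3, callbacks_per_s, overhead_fraction


SAMPLE_RATE_HZ = 48_000    # assumed sample rate
FIXED_OVERHEAD_US = 50.0   # hypothetical fixed cost per callback

for frames in (1024, 256, 64, 16):
    period_ms, rate, frac = period_stats(frames, SAMPLE_RATE_HZ, FIXED_OVERHEAD_US)
    print(f"{frames:5d} frames: {period_ms:6.2f} ms per period, "
          f"{rate:7.1f} callbacks/s, {frac:6.1%} of one core on overhead")
```

Halving the chunk size doubles the number of callbacks per second, so the fixed cost doubles as a share of CPU time while the useful work per callback is cut in half. Note that the period length is only one contribution to end-to-end latency; the total is normally several periods plus converter delay.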
Good luck with this. If you get it working well, you'll have an enormously flexible (and inexpensive!) many-channels DSP solution. I've barely tapped the capabilities that I have, but I will get there soon.