Some of the pricier options might have some form of intelligent recognition, but frame by frame is generally how it's done. Even with the pricey options, the processing involved means it wouldn't really be faster than a human doing it by hand, at least until computers get significantly faster; it just takes the human out of the picture. And even if such advanced software exists, there will be some content it can't cope with, which puts you back to the old manual frame-by-frame work. So you really don't gain much unless you do it often enough to warrant the expense.
Or just use a camera with a very shallow depth of field, so 80% of the frame is already blurred.