Yeah, part of the problem does lie in patching and programming a beast like that. It's pretty daunting. Even a basic poly setup on my Digisound (one VCO into one VCF into one VCA per voice, with a single EG per voice shared between the timbral and amplitude envelopes, and one LFO per voice for modulation) is fairly simple to patch, but it usually takes quite a bit of tweaking to get the balances and tunings right.
As for the difference, it sort of depends on how the VCF is being used and how the voicings work. If the polyphony is very tight and the VCF isn't doing anything really high-Q, it can sound much the same as normal polyphony. But when the VCF settings per voice start to get critical, as with resonance right on the edge of breaking into oscillation, the drawbacks start to become apparent. They're also noticeable if the polyphony is spread widely.
Really, this works better in actual polysynths, where everything can be under microprocessor control and everything is chip-based. That way, hardware costs get minimized, programming is less of a pain, and everything behaves more tractably. One polysynth where this doesn't exactly happen is the SEM-based Oberheims, where you technically have several individual synths under one master programmer's control, and once again you can quickly find yourself in knob-tweaking hell trying to get the several SEMs to match up exactly. There are ways around that, but the SEM Oberheim polys are so esoteric that the example is almost moot (even though Oberheim reissued them in recent years).