Yes, you'll need to add an envelope for each voice; otherwise all the voices are always summed, because velocity stays at the same value after the gate goes low. But since you seem to use an ADSR afterwards as well, you could use a simpler envelope inside the sub-patches, like an AHR: no decay or sustain-level stage, just an attack, then a hold at full level, and a release when the gate goes off. You could also use a glide/smooth module for this, connected to the gate signal, and then multiply its output by the velocity.
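To show why the smoothed gate times velocity closes the voice, here is a rough Python model of that idea. It is only a sketch, not patcher code: the function name and the `attack`/`release` coefficients are made up for illustration, and the smoothing is a plain one-pole filter that may behave differently from the actual glide/smooth module.

```python
def smoothed_gate_envelope(gate, velocity, attack=0.5, release=0.05):
    """Smooth a 0/1 gate with a one-pole filter and scale it by velocity.

    gate     -- sequence of 0/1 values, one per control step
    velocity -- held velocity value (it does not drop when the gate does)
    attack, release -- hypothetical smoothing coefficients in (0, 1]
    """
    level = 0.0
    out = []
    for g in gate:
        # Rise quickly while the gate is high, fall slowly after it drops.
        coeff = attack if g > level else release
        level += coeff * (g - level)
        out.append(level * velocity)
    return out


# Gate held for 20 steps, then released for 200 steps.
gate = [1] * 20 + [0] * 200
env = smoothed_gate_envelope(gate, velocity=48.0)
```

While the gate is high the output settles at the velocity value; after the gate drops it decays towards 0, so the voice no longer adds to the sum even though the velocity itself never changed.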
But also:
Have you edited the settings of the sub-patch?
Go to view->settings in the top-bar of the patcher.
Here you can set the settings of the sub-patch.
There are a couple of important things you can set up here:
-The first one I would try (in relation to your problem): check the box "has midi selector" and set the subpatch entry to "poly".
This allows for more settings of the sub-patch in the front-panel controls.
-Also important are the settings for preset and modulation. If you don't use them, ALWAYS set these to 0, as it saves you a ton of space and allows you to build bigger patches.
Apart from these settings, it's always a good idea to add a c* (multiply) module at the end so you can attenuate the output. The output is initially set to clip/saturate at "64" (you can also unset this in the sub-patch settings mentioned above, but then the output can go too high for the modules after your sub-patch), and if you play 4 voices, the chance is pretty high that you go over the clipping value. That's why the outgoing signal should be attenuated (to 1/4 in this case, if the source signal goes from -64 to 64 like the sine oscillator) to make sure it cannot go beyond the clipping level. If the source only goes from -32 to 32, 1/2 is enough for 4 voices.
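The attenuation values above come from a simple worst-case calculation, sketched here in Python. The function name is just for illustration, and it assumes the worst case is all voices peaking at the same moment.

```python
from fractions import Fraction

CLIP_LEVEL = 64  # clip/saturate level of the sub-patch output, as stated above


def output_gain(voices, peak_amplitude, clip_level=CLIP_LEVEL):
    """Largest gain that keeps the worst-case sum of all voices at or below the clip level."""
    worst_case = voices * peak_amplitude  # every voice peaking simultaneously
    return min(Fraction(1), Fraction(clip_level, worst_case))


print(output_gain(4, 64))  # 1/4
print(output_gain(4, 32))  # 1/2
```

This is the safe value for the c* module; in practice the voices rarely all peak at once, so a somewhat higher gain may still work if some occasional clipping is acceptable.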
I also see some patch cables going from modules that are positioned lower than the modules they go to. This should be avoided, as it puts the code of the source module after the code of the module it feeds, leaving you with a 1 sample-buffer delay (which can be made use of if a signal needs to be delayed by 1 buffer, though).
For control voltages that do not change, this isn't that big of a problem (although it does add an extra "memory", taking up memory resources), but when they change, they always lag 1 sample-buffer behind (which is 16 audio-rate samples). Triggers and envelope signals especially should not go upwards, as your triggers/envelopes will then always be 1 sample-buffer late.
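The effect of module order can be modelled with a toy Python simulation. This is a simplified picture of the execution order described here, not the code the patcher actually generates; the module names and the `run` function are invented for the example.

```python
def run(order, buffers=4):
    """Simulate a source module feeding a reader module for a few buffers.

    order -- module names in execution order (top to bottom in the patch)
    Returns the value the reader saw in each buffer.
    """
    wire = 0  # value held on the cable between buffers (the extra "memory")
    seen = []
    for n in range(1, buffers + 1):
        for module in order:
            if module == "source":
                wire = n  # source writes this buffer's value
            elif module == "reader":
                seen.append(wire)  # reader takes whatever is on the cable now
    return seen


print(run(["source", "reader"]))  # [1, 2, 3, 4]  cable goes down: no delay
print(run(["reader", "source"]))  # [0, 1, 2, 3]  cable goes up: 1 buffer late
```

When the reader runs before the source, it always picks up the value from the previous buffer, which is the 1 sample-buffer delay.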
E.g. this is happening with your envelope signal in the main patch, which goes down to the inverse module for the bottom voice and then up again to the filter. The filters themselves are also 1 line above the summing stage, so these are 1 buffer behind as well. Together, the envelope response on the other voices will be lagging 32 audio-rate samples by now.
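To put that lag in numbers, here is a small Python calculation. The buffer size of 16 samples is taken from the explanation above; the 48 kHz sample rate is an assumption, so substitute the rate your hardware actually runs at.

```python
BUFFER_SIZE = 16       # audio-rate samples per buffer, as stated above
SAMPLE_RATE = 48_000   # Hz; assumed value, adjust to your hardware


def upward_cable_lag(upward_hops, buffer_size=BUFFER_SIZE):
    """Total lag in samples when a signal path contains this many upward cables."""
    return upward_hops * buffer_size


lag_samples = upward_cable_lag(2)             # envelope cable + filter-to-sum cable
lag_ms = 1000 * lag_samples / SAMPLE_RATE     # same lag in milliseconds
print(lag_samples, round(lag_ms, 3))
```

Each extra upward cable in the path adds another 16 samples, so rearranging the modules so the signal flows top to bottom removes the lag entirely.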