Well, it seems stable here without the delay. Maybe things were cleaned up further down since you tried.
Sample and hold is basically what the name implies: the voltage coming out is sampled and held at a specific value for a while. This means you can free up the DAC to do something else in the meantime. Here is the SP1200 schematic, to show how simple it is to implement:
The AD7523 is not actually the main DAC, it's used as a digital gain control -- the actual audio signal comes as a voltage from another DAC further up and is used as the voltage reference. IC98 converts the output current to a voltage. That's not important, so we'll just refer to what comes out of IC98 as 'the output signal' and everything before it up until the data input from the sound memory as 'the DAC'.
First, the output DAC here, unlike the codec you'd use in a modern system, does not have a clock per se. It just changes its output value when you change its input, like the MCP4x22. So what we can do is send several signals interleaved: first a single sample from signal 1, then a single sample from signal 2, etc. The analogue signal that results from this is obviously messy, a bunch of signals chopped up and overlaid.
This output signal goes into the 4051. If you don't know this chip, it has one common terminal that can be connected to one of eight other terminals, based on a 3-bit digital input. The 4051 works both ways: one of eight inputs to a single output, or a single input to one of eight destinations.
Remember we have exact control and knowledge of when the DAC converts a sample, so we can generate a signal to control the 4051 that's exactly in time with when the input to the DAC changes. I.e., while the output signal is a sample from signal 1, the 4051 switches the voltage to output 1, and then just as the DAC changes its output to a sample from signal 2, the 4051 switches the voltage to output 2, etc.
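To make the interleaving concrete, here is a minimal sketch in C of how the DAC values and the 4051 address could be paired up in one buffer. The names (`slot_t`, `interleave`, `NUM_VOICES`) and the flat voice-major input layout are my own assumptions for illustration, not anything taken from the SP1200 or from Axoloti.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VOICES 8 /* assumption: one voice per 4051 channel */

/* One DAC update: the value to write, plus the 3-bit 4051 address
   that should be selected while that value is on the DAC output. */
typedef struct {
    uint16_t dac_value;
    uint8_t  mux_addr;
} slot_t;

/* Interleave `frames` samples of every voice into one stream.
   voices is laid out voice-major: voices[v * frames + n] is sample n
   of voice v. out must have room for frames * NUM_VOICES slots. */
static void interleave(const uint16_t *voices, size_t frames, slot_t *out)
{
    assert(voices != NULL && out != NULL);
    for (size_t n = 0; n < frames; n++) {
        for (size_t v = 0; v < NUM_VOICES; v++) {
            slot_t *s = &out[n * NUM_VOICES + v];
            s->dac_value = voices[v * frames + n];
            s->mux_addr  = (uint8_t)v; /* 0..7 maps to the 4051 A/B/C pins */
        }
    }
}
```

Whatever clocks the DAC would then step through `out` one slot at a time, writing `dac_value` to the DAC and `mux_addr` to the 4051 address pins on every tick.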
After each output on the 4051 is a capacitor to ground and a simple non-inverting op-amp buffer. This is what holds the value. The size of the capacitor determines how long the value is held, and it needs to be replenished, so you have a limited time to cycle through all the outputs and thus a limited number of outputs, depending on the sample rate etc., which is obviously again limited by how fast your DAC is.
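As a rough way to put numbers on the hold time, here is a small sketch assuming the droop is dominated by a constant leakage current (op-amp input bias plus switch leakage), so that dV = I * t / C. The function names and any values you feed in are hypothetical; take the real leakage figures from the datasheets of whatever parts you use.

```c
#include <assert.h>

/* Voltage droop on the hold capacitor after hold_seconds, assuming a
   constant leakage current: dV = I * t / C. */
static double droop_volts(double leak_amps, double hold_cap_farads,
                          double hold_seconds)
{
    assert(hold_cap_farads > 0.0);
    return leak_amps * hold_seconds / hold_cap_farads;
}

/* Longest time between refreshes that keeps the droop under
   max_droop_volts, under the same assumption: t = dV * C / I. */
static double max_hold_seconds(double leak_amps, double hold_cap_farads,
                               double max_droop_volts)
{
    assert(leak_amps > 0.0);
    return max_droop_volts * hold_cap_farads / leak_amps;
}
```

For example, with an assumed 1 nA of leakage and a 10 nF capacitor, `droop_volts(1e-9, 10e-9, 1e-3)` works out to about 0.1 mV over 1 ms. A bigger capacitor holds longer, but it also takes longer to charge through the 4051's on-resistance, so it's a trade-off.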
The SP1200 is by no means the only system to do this, it was quite common in 80's gear since DACs were stupid expensive back then.
For Axoloti, what we need is an object with 8 or however many inputs that has a buffer, continuously updates a DAC and keeps the multiplexer in sync. The code I wrote was bare-metal with the ST libraries and used DMA and a timer running at the sample rate multiplied by the number of voices. I'm new to ChibiOS so still haven't figured out exactly how to do it in this environment, but I'm sure it's feasible somehow.
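For the timer rate, the arithmetic is roughly as follows. This is only a sketch: the function names are made up, and the 84 MHz timer clock in the example is an assumption, so check what your timer is actually clocked at.

```c
#include <assert.h>
#include <stdint.h>

/* The DAC has to be updated once per voice per sample period. */
static uint32_t update_rate_hz(uint32_t sample_rate_hz, uint32_t voices)
{
    return sample_rate_hz * voices;
}

/* Integer number of timer clock ticks per DAC update. The division
   truncates, so the real rate is slightly off unless the timer clock
   is an exact multiple of the update rate. */
static uint32_t timer_divider(uint32_t timer_clk_hz,
                              uint32_t sample_rate_hz, uint32_t voices)
{
    uint32_t rate = update_rate_hz(sample_rate_hz, voices);
    assert(rate > 0 && rate <= timer_clk_hz);
    return timer_clk_hz / rate;
}
```

With 8 voices at 48 kHz the DAC needs 384000 updates per second. On an assumed 84 MHz timer clock that is 218.75 ticks per update, which truncates to 218, so in practice you'd want a timer clock or sample rate that divides evenly.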