One fundamental problem is dealing with FFT operations in a low-latency framework.
In Axoloti, the whole dsp-chain processes one block of 16 audio samples at a time, so it has to finish within 0.333 milliseconds (16 samples at 48 kHz). Common FFT-based manipulations work on audio blocks of 256, 512, or more samples. Expecting a whole FFT-based manipulation on a large buffer to finish within the CPU time available for 16 samples is not reasonable.
One approach is collecting a large buffer in the low-latency loop and signalling a separate thread that runs the FFT at a lower priority, outside the dsp-loop. That only works for FFT-based spectrum analysis: there is no guarantee that the result is ready in time for synthesis. "spectral/rfft 128" uses this approach.
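A minimal sketch of the collecting side of this idea, in plain C. It is not the code of "spectral/rfft 128"; the names `fft_collector_t`, `collector_push` and `collector_take` are hypothetical, the 128-sample length is just an example, and a real object would use the RTOS signalling primitives rather than a polled flag.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16   /* samples per dsp-loop call */
#define FFT_SIZE   128  /* analysis length (example value, an assumption) */

typedef struct {
    int32_t buf[2][FFT_SIZE]; /* double buffer: one fills while the other is read */
    int write_buf;            /* index of the buffer currently being filled */
    int write_pos;            /* next free sample slot in that buffer */
    int ready_buf;            /* buffer the low-priority thread should read */
    volatile bool ready;      /* set by the dsp-loop, cleared by the worker */
} fft_collector_t;

void collector_init(fft_collector_t *c) {
    memset(c->buf, 0, sizeof c->buf);
    c->write_buf = 0;
    c->write_pos = 0;
    c->ready_buf = 0;
    c->ready = false;
}

/* Called from the dsp-loop once per block: only a copy and a flag, no FFT. */
void collector_push(fft_collector_t *c, const int32_t in[BLOCK_SIZE]) {
    memcpy(&c->buf[c->write_buf][c->write_pos], in,
           BLOCK_SIZE * sizeof(int32_t));
    c->write_pos += BLOCK_SIZE;
    if (c->write_pos == FFT_SIZE) {
        c->ready_buf = c->write_buf; /* hand the full buffer over */
        c->ready = true;             /* "signal" the worker */
        c->write_buf ^= 1;           /* keep collecting in the other half */
        c->write_pos = 0;
    }
}

/* Called from the low-priority thread: returns a full buffer, or NULL if
 * none is pending. The FFT itself would run on the returned data. */
const int32_t *collector_take(fft_collector_t *c) {
    if (!c->ready) {
        return NULL;
    }
    c->ready = false;
    return c->buf[c->ready_buf];
}
```

If the worker has not finished with a buffer by the time the other one fills up, this sketch simply overwrites it. That is acceptable for a spectrum display, and it is exactly the missing guarantee that makes the approach unsuitable for synthesis.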
Another approach is slicing the FFT computation of a large buffer into pieces that each fit in the low-latency process. I believe the CMSIS FFT functions are not useful for this, since they compute a whole transform in a single call, so slicing means a deep adventure in code.
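To give an idea of what slicing could look like, here is a rough sketch of a resumable in-place radix-2 FFT in plain C. It keeps its progress in a struct and does at most `budget` work units (one bit-reversal step or one butterfly) per call, so each dsp-loop pass can advance it a little. Everything here is an assumption for illustration: `sliced_fft_t`, `sliced_fft_start` and `sliced_fft_step` are hypothetical names, the caller supplies the twiddle table, and this float code is not tuned the way CMSIS is.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    float *re, *im;             /* in-place data, length n */
    const float *tw_re, *tw_im; /* exp(-2*pi*i*k/n) for k = 0 .. n/2-1 */
    size_t n;                   /* transform length, a power of two */
    unsigned log2n;
    size_t rev_i;               /* progress of the bit-reversal pass */
    size_t half;                /* butterfly span of the current stage */
    size_t index;               /* butterfly counter within the stage */
} sliced_fft_t;

static size_t bit_reverse(size_t i, unsigned bits) {
    size_t r = 0;
    while (bits--) { r = (r << 1) | (i & 1); i >>= 1; }
    return r;
}

void sliced_fft_start(sliced_fft_t *f, float *re, float *im,
                      const float *tw_re, const float *tw_im, size_t n) {
    f->re = re; f->im = im; f->tw_re = tw_re; f->tw_im = tw_im;
    f->n = n; f->log2n = 0;
    while (((size_t)1 << f->log2n) < n) f->log2n++;
    f->rev_i = 0; f->half = 1; f->index = 0;
}

/* Does up to `budget` work units; returns true once the FFT is complete. */
bool sliced_fft_step(sliced_fft_t *f, size_t budget) {
    /* Phase 1: bit-reversal permutation, one index per work unit. */
    while (budget > 0 && f->rev_i < f->n) {
        size_t i = f->rev_i++, j = bit_reverse(i, f->log2n);
        if (i < j) {
            float t = f->re[i]; f->re[i] = f->re[j]; f->re[j] = t;
            t = f->im[i]; f->im[i] = f->im[j]; f->im[j] = t;
        }
        budget--;
    }
    /* Phase 2: butterflies, stage by stage, one per work unit. */
    while (budget > 0 && f->half < f->n) {
        size_t k = f->index % f->half;                      /* slot in group */
        size_t a = (f->index / f->half) * 2 * f->half + k;  /* upper element */
        size_t b = a + f->half;                             /* lower element */
        size_t t = k * (f->n / (2 * f->half));              /* twiddle index */
        float xr = f->re[b] * f->tw_re[t] - f->im[b] * f->tw_im[t];
        float xi = f->re[b] * f->tw_im[t] + f->im[b] * f->tw_re[t];
        f->re[b] = f->re[a] - xr; f->im[b] = f->im[a] - xi;
        f->re[a] += xr;           f->im[a] += xi;
        if (++f->index == f->n / 2) { f->index = 0; f->half *= 2; }
        budget--;
    }
    return f->rev_i == f->n && f->half >= f->n;
}
```

A 512-point transform needs 512 reorder steps plus 9 x 256 butterflies, so the per-call budget decides how many 16-sample blocks pass before a result is available. The input also has to stay untouched while the transform is in progress, which in practice means yet another buffer for the samples that keep arriving.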