Vocal emulation

lokki · 2018-01-18 08:36:33 UTC

check this out!

would be nice to port this to axolotiland! does somebody know neil?

toneburst · 2018-01-18 13:14:05 UTC

It's a lot of fun!

I'm not sure if the voice model would run on the Axoloti; there's a big difference between the processing power of the Axoloti and a desktop computer. Also, the GUI is great, and I'm not sure how well the whole thing would work without the nice interface.

Having said all that, I'd love to see if it could be done. I love those vocals-ey sounds.

a|x

lokki · 2018-01-18 14:00:42 UTC

yeah, i will try to contact him and see if he would share the code...
the ui is not so important i think, mapping those controls to 4 or 5 cc's should be fun as well..

toneburst · 2018-01-18 14:48:53 UTC

It's JavaScript, so the code will be visible. It will probably be minified though, so not very readable.

I think you'd need quite a few controls. If you could get hold of the basic synthesis code, you could probably work out a way to control the model's parameters that makes more sense for MIDI CCs. Simply having a knob for every parameter of the model probably won't make much sense.

In text-to-speech systems, these kinds of models are usually fed parameter streams made from analysed natural speech.

a|x

Gaznesh · 2018-01-24 05:43:44 UTC

I'm thinking an X/Y touchpad could be implemented to continuously control the tongue placement. Triggering the flips would be easy and a foot pedal or velocity keystroke could influence the palate/lip position. That way, you still have the different notes on the keyboard to play/sing a tune!
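As a rough illustration of that control scheme, here is a minimal sketch in Python of how pad, pedal and keyboard input might be scaled into tract parameters. Everything in it is an assumption for illustration: the `TractControls` fields, the 0.0 to 1.0 ranges and the function names are hypothetical and are not taken from the original synth's code.

```python
from dataclasses import dataclass


@dataclass
class TractControls:
    """Hypothetical, normalised vocal-tract controls (not the original model's names)."""
    tongue_position: float  # 0.0 = back of the mouth, 1.0 = front
    tongue_height: float    # 0.0 = low, 1.0 = high
    lip_opening: float      # 0.0 = closed, 1.0 = fully open
    pitch_hz: float         # glottal pitch taken from the keyboard note


def cc_to_unit(value: int) -> float:
    """Scale a 7-bit MIDI CC value (0..127) to 0.0..1.0, clamping out-of-range input."""
    return min(max(value, 0), 127) / 127.0


def note_to_hz(note: int) -> float:
    """Convert a MIDI note number to frequency in equal temperament (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def map_controls(pad_x: int, pad_y: int, pedal: int, note: int) -> TractControls:
    """Map X/Y pad CCs to the tongue, a pedal CC to the lips, and the note to pitch."""
    return TractControls(
        tongue_position=cc_to_unit(pad_x),
        tongue_height=cc_to_unit(pad_y),
        lip_opening=cc_to_unit(pedal),
        pitch_hz=note_to_hz(note),
    )
```

For example, `map_controls(127, 0, 64, 69)` would put the tongue fully forward and low, the lips roughly half open, and the pitch at 440 Hz. A real patch would still need to rescale these normalised values to whatever ranges the tract model expects.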

toneburst · 2018-01-24 12:32:58 UTC

Sounds great! Getting it to speak or sing comprehensibly would require a lot of work, I think. Reminds me a bit of The Voder, an early manually-operated speech synthesiser, which required a year of solid training to master the multiple hand and foot-operated controls.

On the other hand, if you just want to make cool vocal-esque sounds, it would work very well, I think!

a|x

lokki · 2018-01-25 07:26:46 UTC

yeah, just some cool vocal-esque sounds... i looked at the code, and it doesn't seem to use streamed data.

janvantomme · 2018-01-25 09:19:26 UTC

Browsers have text-to-speech and speech-to-text built in nowadays, accessible via JavaScript. It would be a huge effort to port a full speech engine to the Axoloti.

lokki · 2018-01-25 09:44:30 UTC

i'm pretty convinced it is not that, did you look at it? it is not text to speech... @toneburst did an lpc port a while back for the axoloti...

toneburst · 2018-01-25 09:56:29 UTC

I’m pretty sure it’s not leveraging the built in browser text-to-speech API. It’s a self-contained vocal-tract emulation.

a|x

janvantomme · 2018-01-25 10:00:42 UTC

Looks like it's indeed using something custom built on the web audio API.

toneburst · 2018-01-25 10:00:43 UTC

Some text-to-speech systems use similar models for speech-synthesis, but they’re usually trained on analysed natural speech, and driven by complex sets of rules to generate parameter values from text input.

High-level text-to-speech APIs, like those built in to browsers, are ‘black boxes’, and don’t give direct access to the parameters of the synthesis model.

a|x

axoman · 2018-01-28 16:45:55 UTC

Would be great to have an Axoloti speech synth object like the system BitSpeek uses as well!

toneburst · 2018-01-29 09:26:10 UTC

My LPC objects go a lot further than Bitspeek in terms of sound-mangling potential, but they only work with pre-recorded LPC data.

I’d really love to make an object to convert audio to an LPC stream in real-time, but I’m not sure how to approach that, or if the Axoloti has the processing power to do that.

a|x

lokki · 2018-01-29 11:54:42 UTC

maybe praat can help?

http://www.fon.hum.uva.nl/praat/manual/LPC.html

lokki · 2018-01-29 12:04:24 UTC

and:

toneburst · 2018-01-29 12:39:28 UTC

I think what's needed is a Levinson-Durbin implementation. There seems to be lots of source code available for that, since it's been around for a long time.

Unfortunately, the theory is all a bit over my head, and I don't really have the coding or DSP skills to tackle attempting an Axoloti object implementation on my own.

I also have no real idea if it's practical to attempt this on an MCU like the Axoloti's. All the implementations I've seen documented are non-realtime, even on desktop computers.

a|x
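
For reference, the core of a Levinson-Durbin LPC analysis is fairly compact. The following is a minimal sketch in Python, not a tested Axoloti object: it computes the autocorrelation of one audio frame and then runs the recursion to get predictor coefficients. The function names are made up for illustration, and the sign convention (error filter 1 + a1·z⁻¹ + ...) is one of several in common use.

```python
def autocorrelate(frame, order):
    """Return autocorrelation lags r[0..order] of one frame of samples."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Run the Levinson-Durbin recursion on autocorrelation lags r[0..order].

    Returns (a, k, err): a[0..order] are the error-filter coefficients with
    a[0] = 1, k holds the reflection coefficients, and err is the final
    prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    k = [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 1e-12:
            # Silent or perfectly predictable frame: stop, leave the rest at zero.
            break
        # Correlation of the current predictor's error with lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k[i - 1] = ki
        # Update the lower-order coefficients from the previous ones, then append ki.
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + ki * prev[i - j]
        a[i] = ki
        err *= 1.0 - ki * ki
    return a, k, err
```

The recursion itself costs on the order of `order` squared operations per frame, and the autocorrelation costs roughly frame length times `order`. Whether that fits the Axoloti's budget at a useful frame rate is not something this sketch answers; a real-time version would also need windowing and fixed buffers.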

toneburst · 2018-01-29 12:48:42 UTC

Is the source available for this?

a|x

lokki · 2018-01-29 12:56:11 UTC

toneburst · 2018-01-29 13:13:38 UTC

Ah, thanks @lokki.

I was thinking about doing an implementation of Klatt speech synthesis, too, a while back. One thing at a time, though...

a|x