Vocal emulation


#1

Check this out!

https://dood.al/pinktrombone/

Would be nice to port this to Axolotiland! Does somebody know Neil?


#2

It's a lot of fun!

I'm not sure if the voice model would run on the Axoloti: there's a big difference between the processing power of the Axoloti and a desktop computer. Also, the GUI is great, and I'm not sure how well the whole thing would work without the nice interface.

Having said all that, I'd love to see if it could be done. I love those vocal-ish sounds.

a|x


#3

Yeah, I will try to contact him and see if he would share the code...
The UI is not so important, I think; mapping those controls to 4 or 5 CCs should be fun as well.


#4

It's JavaScript, so the code will be visible. It will probably be minified though, so not very readable.

I think you'd need quite a few controls. If you could get hold of the basic synthesis code, you could probably work out a way to control the model's parameters that made more sense in the context of MIDI CCs. Simply having a knob for every parameter of the model probably won't make much sense.
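One way to get fewer, more meaningful controls is a single "macro" CC that morphs between a few stored vocal-tract shapes, rather than one knob per model parameter. A minimal sketch in C, assuming a hypothetical model with three shape parameters (`tongue_index`, `tongue_diameter`, `lip_diameter`); the preset values are made up for illustration and are not taken from Pink Trombone:

```c
#include <stddef.h>

/* Hypothetical parameter set for a tract model; field names and preset
   values are illustrative assumptions, not Pink Trombone's. */
typedef struct {
    float tongue_index;    /* position of the tongue constriction along the tract */
    float tongue_diameter; /* how open that constriction is */
    float lip_diameter;    /* opening at the lips */
} TractShape;

/* Illustrative "vowel" presets to sweep through with a single knob. */
static const TractShape kPresets[] = {
    {12.9f, 2.4f, 1.5f}, /* rough "ah"-like shape (made-up values) */
    {22.0f, 2.0f, 1.5f}, /* rough "ee"-like shape (made-up values) */
    {16.0f, 3.0f, 0.6f}, /* rough "oo"-like shape (made-up values) */
};
#define NUM_PRESETS (sizeof kPresets / sizeof kPresets[0])

static float lerpf(float a, float b, float t) { return a + (b - a) * t; }

/* Map one 7-bit MIDI CC (0..127) onto a path through all presets,
   linearly interpolating between neighbouring presets. */
TractShape morph_from_cc(int cc) {
    if (cc < 0) cc = 0;
    if (cc > 127) cc = 127;
    float pos = (float)cc / 127.0f * (float)(NUM_PRESETS - 1); /* 0 .. N-1 */
    size_t i = (size_t)pos;
    if (i >= NUM_PRESETS - 1) i = NUM_PRESETS - 2; /* keep i + 1 in range */
    float t = pos - (float)i;
    TractShape s;
    s.tongue_index    = lerpf(kPresets[i].tongue_index,    kPresets[i + 1].tongue_index,    t);
    s.tongue_diameter = lerpf(kPresets[i].tongue_diameter, kPresets[i + 1].tongue_diameter, t);
    s.lip_diameter    = lerpf(kPresets[i].lip_diameter,    kPresets[i + 1].lip_diameter,    t);
    return s;
}
```

The remaining CCs could then go to things like pitch or breathiness, which fits the "4 or 5 CCs" idea from earlier in the thread.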

In text-to-speech systems, these kinds of models are usually fed parameter streams made from analysed natural speech.
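For illustration, a parameter stream is usually a sequence of frames at a fixed rate (say every 10 ms, i.e. 480 samples at 48 kHz), with the synth interpolating between frames sample by sample. A minimal sketch in C; the `ParamFrame` fields are hypothetical and stand in for whatever parameters a given model needs:

```c
#include <stddef.h>

/* Hypothetical frame: a few synthesis parameters sampled at a fixed frame
   rate. The field names are illustrative assumptions. */
typedef struct {
    float f0;      /* pitch in Hz */
    float gain;    /* overall amplitude */
    float voicing; /* 0 = noise only, 1 = fully voiced */
} ParamFrame;

/* Parameters for sample `n`, where frames are `hop` samples apart.
   Values are linearly interpolated between neighbouring frames so the
   model doesn't jump at each frame boundary. Requires num_frames >= 1. */
ParamFrame frame_at(const ParamFrame *frames, size_t num_frames,
                    size_t hop, size_t n) {
    size_t k = n / hop;
    if (k >= num_frames - 1) return frames[num_frames - 1]; /* hold last frame */
    float t = (float)(n % hop) / (float)hop;
    ParamFrame a = frames[k], b = frames[k + 1], out;
    out.f0      = a.f0      + (b.f0      - a.f0)      * t;
    out.gain    = a.gain    + (b.gain    - a.gain)    * t;
    out.voicing = a.voicing + (b.voicing - a.voicing) * t;
    return out;
}
```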

a|x


#5

I'm thinking an X/Y touchpad could be implemented to continuously control the tongue placement. Triggering the flips would be easy and a foot pedal or velocity keystroke could influence the palate/lip position. That way, you still have the different notes on the keyboard to play/sing a tune!
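A rough sketch of the X/Y idea in C, assuming a hypothetical model where x moves the tongue constriction along the tract and y sets its diameter. The ranges are illustrative assumptions, and the one-pole smoothing is just one simple way to avoid zipper noise when the pad (or a CC) jumps:

```c
/* Illustrative ranges (assumptions, not taken from Pink Trombone). */
#define TONGUE_INDEX_MIN 12.0f
#define TONGUE_INDEX_MAX 29.0f
#define TONGUE_DIAM_MIN  2.0f
#define TONGUE_DIAM_MAX  3.5f

typedef struct {
    float index;    /* smoothed tongue position along the tract */
    float diameter; /* smoothed tongue (constriction) diameter */
} TongueState;

static float clamp01(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

/* Move the smoothed state toward the pad target; call once per control tick.
   x, y are pad coordinates in 0..1. coeff in (0, 1]: 1 = jump immediately,
   smaller values = slower glide. */
void tongue_update(TongueState *s, float x, float y, float coeff) {
    float target_index = TONGUE_INDEX_MIN + clamp01(x) * (TONGUE_INDEX_MAX - TONGUE_INDEX_MIN);
    float target_diam  = TONGUE_DIAM_MIN  + clamp01(y) * (TONGUE_DIAM_MAX  - TONGUE_DIAM_MIN);
    s->index    += coeff * (target_index - s->index);
    s->diameter += coeff * (target_diam  - s->diameter);
}
```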


#6

Sounds great! Getting it to speak or sing comprehensibly would require a lot of work, I think. It reminds me a bit of the Voder, an early manually operated speech synthesizer, which required a year of solid training to master its multiple hand- and foot-operated controls.

On the other hand, if you just want to make cool vocal-esque sounds, it would work very well, I think!

a|x


#7

Yeah, just some cool vocal-esque sounds... I looked at the code, and it seems not to use streamed data.


#8

Browsers have text-to-speech and speech-to-text built in nowadays, accessible via JavaScript. It would be a huge effort to port a full speech engine to the Axoloti.


#9

I'm pretty convinced it is not that; did you look at it? It is not text-to-speech... @toneburst did an LPC port for the Axoloti a while back...


#10

I’m pretty sure it’s not leveraging the built in browser text-to-speech API. It’s a self-contained vocal-tract emulation.
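For anyone curious how a self-contained vocal-tract emulation can work: a classic approach is a Kelly-Lochbaum waveguide, where the tract is a chain of short tube sections and waves partly reflect wherever the cross-sectional area changes. This is a generic sketch in C, not Neil's code; the section count, the glottis/lip reflection values and the damping factor are assumptions picked for illustration:

```c
#include <stddef.h>

#define N_SECTIONS 44 /* number of tube sections (arbitrary choice for this sketch) */

typedef struct {
    float R[N_SECTIONS]; /* right-going (glottis -> lips) pressure waves */
    float L[N_SECTIONS]; /* left-going (lips -> glottis) pressure waves */
    float k[N_SECTIONS]; /* k[i]: reflection coefficient between sections i-1 and i (k[0] unused) */
} Tract;

void tract_clear(Tract *t) {
    for (size_t i = 0; i < N_SECTIONS; i++) t->R[i] = t->L[i] = 0.0f;
}

/* Derive junction reflection coefficients from section diameters. Area goes
   as diameter squared; for pressure waves k = (A[i-1] - A[i]) / (A[i-1] + A[i]). */
void tract_set_shape(Tract *t, const float *diameter) {
    t->k[0] = 0.0f;
    for (size_t i = 1; i < N_SECTIONS; i++) {
        float a0 = diameter[i - 1] * diameter[i - 1];
        float a1 = diameter[i] * diameter[i];
        float sum = a0 + a1;
        t->k[i] = sum > 0.0f ? (a0 - a1) / sum : 0.0f;
    }
}

/* Advance one sample: inject `glottis` at the glottal end and return the
   pressure wave arriving at the lips. Constants are rough assumptions. */
float tract_step(Tract *t, float glottis) {
    const float glottal_reflection = 0.75f; /* mostly closed end at the glottis */
    const float lip_reflection = -0.85f;    /* open end: inverted reflection */
    const float damping = 0.999f;           /* small loss per section per sample */
    float newR[N_SECTIONS], newL[N_SECTIONS];

    newR[0] = glottis + glottal_reflection * t->L[0];
    newL[N_SECTIONS - 1] = lip_reflection * t->R[N_SECTIONS - 1];
    for (size_t i = 1; i < N_SECTIONS; i++) {
        /* One-multiply Kelly-Lochbaum scattering junction */
        float w = t->k[i] * (t->R[i - 1] - t->L[i]);
        newR[i] = t->R[i - 1] + w;
        newL[i - 1] = t->L[i] + w;
    }
    float out = t->R[N_SECTIONS - 1];
    for (size_t i = 0; i < N_SECTIONS; i++) {
        t->R[i] = newR[i] * damping;
        t->L[i] = newL[i] * damping;
    }
    return out;
}
```

Feeding `tract_step()` with a glottal pulse train (or noise, for whispering) and reshaping the diameters over time is what turns this into vowel-like sounds. The per-sample cost grows linearly with the number of sections, which is the main knob for fitting it onto a small CPU.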

a|x


#11

Looks like it's indeed using something custom built on the web audio API.


#12

Some text-to-speech systems use similar models for speech synthesis, but they’re usually trained on analysed natural speech, and driven by complex sets of rules to generate parameter values from text input.

High-level text-to-speech APIs, like those built into browsers, are ‘black boxes’, and don’t give direct access to the parameters of the synthesis model.

a|x


#13

It would be great to have an Axoloti speech synth object like the system Bitspeek uses, too!


#14

My LPC objects go a lot further than Bitspeek in terms of sound-mangling potential, but they only work with pre-recorded LPC data.
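For context, the playback side of LPC boils down to an all-pole filter excited by either a pulse train (voiced) or noise (unvoiced), with the coefficients updated from the stored frames. A generic sketch in C, not the code of the actual Axoloti objects; the struct layout and the LCG noise source are assumptions:

```c
#include <stddef.h>
#include <stdint.h>

#define LPC_MAX_ORDER 16

typedef struct {
    float a[LPC_MAX_ORDER + 1]; /* predictor coeffs a[1..order]; a[0] unused */
    size_t order;
    float hist[LPC_MAX_ORDER];  /* hist[0] = y[n-1], hist[1] = y[n-2], ... */
    uint32_t noise_state;       /* LCG state for the unvoiced excitation */
    float phase;                /* samples counted toward the next glottal pulse */
} LpcVoice;

/* Cheap white noise in [-1, 1) from a 32-bit LCG (Numerical Recipes constants). */
static float lpc_noise(LpcVoice *v) {
    v->noise_state = v->noise_state * 1664525u + 1013904223u;
    return (float)(v->noise_state >> 8) / 8388608.0f - 1.0f;
}

/* Render one sample. voiced != 0: impulse train every `period` samples;
   otherwise white noise. `gain` scales the excitation. */
float lpc_tick(LpcVoice *v, int voiced, float period, float gain) {
    float e;
    if (voiced) {
        v->phase += 1.0f;
        if (v->phase >= period) { v->phase -= period; e = 1.0f; }
        else e = 0.0f;
    } else {
        e = lpc_noise(v);
    }
    /* All-pole synthesis: y[n] = gain * e[n] + sum_k a[k] * y[n-k] */
    float y = gain * e;
    for (size_t k = 1; k <= v->order; k++) y += v->a[k] * v->hist[k - 1];
    /* Shift the output history by one sample */
    for (size_t k = v->order; k > 1; k--) v->hist[k - 1] = v->hist[k - 2];
    if (v->order > 0) v->hist[0] = y;
    return y;
}
```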

I’d really love to make an object to convert audio to an LPC stream in real-time, but I’m not sure how to approach that, or if the Axoloti has the processing power to do that.

a|x


#15

Maybe Praat can help?

http://www.fon.hum.uva.nl/praat/manual/LPC.html


#16

and:


#17

I think what's needed is a Levinson-Durbin implementation. There seems to be lots of source code available for that, since it's been around for a long time.
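For reference, the Levinson-Durbin recursion itself is short. A sketch in C, assuming the convention that each sample is predicted as the sum of `a[j] * x[n-j]`; it takes autocorrelation values `r[0..order]` and returns predictor and reflection (PARCOR) coefficients. The function name and the fixed maximum order are assumptions:

```c
#include <stddef.h>

#define LD_MAX_ORDER 32

/* Levinson-Durbin recursion.
   In:  r[0..order] autocorrelation (r[0] > 0), order <= LD_MAX_ORDER.
   Out: a[1..order] predictor coefficients (x[n] ~ sum_j a[j] * x[n-j]);
        refl[1..order] reflection coefficients (refl may be NULL).
   Returns the final prediction-error energy, or -1 on bad input. */
float levinson_durbin(const float *r, size_t order, float *a, float *refl) {
    float tmp[LD_MAX_ORDER + 1];
    if (order > LD_MAX_ORDER || r[0] <= 0.0f) return -1.0f;
    float err = r[0];
    for (size_t i = 1; i <= order; i++) {
        /* Part of r[i] the current order-(i-1) predictor fails to explain */
        float acc = r[i];
        for (size_t j = 1; j < i; j++) acc -= a[j] * r[i - j];
        float k = acc / err;
        if (refl) refl[i] = k;
        /* Update the lower-order coefficients, then append the new one */
        for (size_t j = 1; j < i; j++) tmp[j] = a[j] - k * a[i - j];
        for (size_t j = 1; j < i; j++) a[j] = tmp[j];
        a[i] = k;
        err *= (1.0f - k * k);
        if (err <= 0.0f) { /* perfectly predictable, or numerically unstable */
            for (size_t j = i + 1; j <= order; j++) {
                a[j] = 0.0f;
                if (refl) refl[j] = 0.0f;
            }
            return 0.0f;
        }
    }
    return err;
}
```

For an AR(1)-like input with `r = {1, 0.5, 0.25}` and order 2, this gives `a[1] = 0.5` and `a[2] = 0`, as expected. Chips like TI's TMS5220 use a lattice filter driven by reflection coefficients, so `refl` may be the more useful output in some cases.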

Unfortunately, the theory is all a bit over my head, and I don't really have the coding or DSP skills to tackle attempting an Axoloti object implementation on my own.

I also have no real idea if it's practical to attempt this on an MCU like the Axoloti's. All the implementations I've seen documented are non-realtime, even on desktop computers.
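As a rough back-of-envelope check on whether real-time analysis is practical, assuming order p = 10, 20 ms frames (N = 960 samples at 48 kHz), a 10 ms hop (100 frames/s) and the Axoloti Core's 168 MHz clock, and ignoring memory access and loop overhead:

```latex
\begin{aligned}
\text{autocorrelation, per frame} &\approx (p+1)\,N = 11 \times 960 = 10\,560\ \text{MACs} \\
\text{Levinson--Durbin, per frame} &\approx p^{2} = 100\ \text{MACs} \\
\text{per second (100 frames/s)} &\approx 100 \times (10\,560 + 100) \approx 1.07 \times 10^{6}\ \text{MACs/s} \\
\text{share of a 168 MHz clock} &\approx \frac{1.07 \times 10^{6}}{1.68 \times 10^{8}} \approx 0.6\,\%
\end{aligned}
```

Even if each multiply-accumulate costs several cycles in practice, this suggests the analysis arithmetic itself would be a small fraction of the CPU; pitch tracking and voiced/unvoiced detection would add to that.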

a|x


#18

Is the source available for this?

a|x


#19

#20

Ah, thanks @lokki.

I was thinking about doing an implementation of Klatt speech synthesis, too, a while back. One thing at a time, though... :wink:

a|x