I’m a big fan of speech synthesis, and I love to make things babble! I also love to babble about things babbling. There is something I find so charming about getting a computer to chatter. One of my pipe dreams is to build an asemic speech engine built around prosody and inflection.
I once wrote a study called “computer on the phone with his mother”:
I wrote this in my Sporth language. You can find the Sporth code here.
The recipe for making speech sounds like this is reasonably straightforward to replicate in a modular synthesis environment (Csound, SC, PD, FAUST, etc.). It takes a narrow pulse wave and puts it through a series of bandpass filters tuned to formant frequencies. Interpolating between formant values gives it the “talky” effect; randomly interpolating between formant values makes it “babble”. I also added some pauses, which make it feel more like words in a language. The breaks are what give it personality (similarly, in singing synthesis, choosing the right vibrato makes a world of difference).
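To make the recipe concrete, here’s a toy sketch in pure Python (not the original Sporth patch). The vowel formant table, pulse width, glide times, pause durations, and filter Q are all made-up illustrative values; the filters use the standard RBJ biquad bandpass formulas:

```python
import math
import random

SR = 44100  # sample rate

def bandpass_coeffs(freq, q, sr=SR):
    """RBJ cookbook bandpass biquad (constant 0 dB peak gain)."""
    w0 = 2 * math.pi * freq / sr
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return (alpha / a0, 0.0, -alpha / a0,
            -2 * math.cos(w0) / a0, (1 - alpha) / a0)

class Biquad:
    def __init__(self):
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0
    def process(self, x, c):
        b0, b1, b2, a1, a2 = c
        y = b0 * x + b1 * self.x1 + b2 * self.x2 - a1 * self.y1 - a2 * self.y2
        self.x1, self.x2 = x, self.x1
        self.y1, self.y2 = y, self.y1
        return y

# hypothetical vowel table: first three formant frequencies in Hz
VOWELS = [
    (730, 1090, 2440),  # roughly "ah"
    (270, 2290, 3010),  # roughly "ee"
    (300, 870, 2240),   # roughly "oo"
    (530, 1840, 2480),  # roughly "eh"
]

def babble(seconds=2.0, pitch=110.0, seed=1):
    rng = random.Random(seed)
    n = int(seconds * SR)
    filters = [Biquad() for _ in range(3)]
    cur = list(rng.choice(VOWELS))
    target = list(rng.choice(VOWELS))
    glide = 0              # samples left in the current formant glide
    gate = 1.0             # 1.0 = voiced, 0.0 = a pause between "words"
    gate_left = int(0.3 * SR)
    phase = 0.0
    out = []
    for _ in range(n):
        if glide <= 0:     # pick a new random formant target: the "babble"
            target = list(rng.choice(VOWELS))
            glide = int(rng.uniform(0.05, 0.25) * SR)
        # linear interpolation toward the target formants
        step = 1.0 / glide
        cur = [c + (t - c) * step for c, t in zip(cur, target)]
        glide -= 1
        if gate_left <= 0:  # occasionally pause, to suggest word breaks
            gate = 0.0 if gate > 0.5 else 1.0
            dur = rng.uniform(0.1, 0.4) if gate < 0.5 else rng.uniform(0.3, 1.0)
            gate_left = int(dur * SR)
        gate_left -= 1
        # narrow pulse wave: high for ~5% of each period
        phase += pitch / SR
        phase -= int(phase)
        src = (1.0 if phase < 0.05 else 0.0) * gate
        # run the pulse through the bandpass filters in parallel
        s = sum(f.process(src, bandpass_coeffs(freq, 12.0))
                for f, freq in zip(filters, cur))
        out.append(0.3 * s)
    return out

samples = babble(0.5)
```

`samples` is half a second of mono audio in [-1, 1]-ish range; writing it to a WAV file (or any audio sink) is left out to keep the sketch short.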
About a year or so later, I made another babble track. This one uses a Kelly-Lochbaum vocal tract model to synthesize the voice. From a control standpoint, the general concept is very similar: instead of moving a bunch of bandpass filters around, you’re pushing and pulling on a virtual vocal tract to shape the sounds. The implementation itself is based on Pink Trombone, but I hastily ported the JavaScript code to C to make it work with the rest of my music software ecosystem.
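A Kelly-Lochbaum tract models the vocal tract as a chain of cylindrical tube segments; at each junction, part of a traveling pressure wave reflects and part transmits, with the reflection coefficient determined by the ratio of adjacent cross-sectional areas. Here is a toy Python sketch of that idea (not the Pink Trombone port): the segment count, area profile, and end-reflection values are all made-up, and real implementations add glottal modeling, smoothing, and losses:

```python
N = 8  # number of cylindrical tract segments (toy resolution)

def reflections(areas):
    """Reflection coefficient at each junction, for pressure waves.

    Acoustic impedance is inversely proportional to area, so
    k_i = (A_i - A_{i+1}) / (A_i + A_{i+1}).
    """
    return [(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1])
            for i in range(len(areas) - 1)]

class KellyLochbaum:
    def __init__(self, areas, glottal_refl=0.75, lip_refl=-0.85):
        self.k = reflections(areas)
        self.fwd = [0.0] * len(areas)   # right-going wave per segment
        self.bwd = [0.0] * len(areas)   # left-going wave per segment
        self.glottal_refl = glottal_refl
        self.lip_refl = lip_refl

    def set_areas(self, areas):
        """"Pushing and pulling" on the tract = changing the area profile."""
        self.k = reflections(areas)

    def tick(self, excitation):
        n = len(self.fwd)
        nf, nb = [0.0] * n, [0.0] * n
        # glottal end: excitation plus partial reflection of returning wave
        nf[0] = excitation + self.glottal_refl * self.bwd[0]
        for i, k in enumerate(self.k):
            # scattering at the junction between segments i and i+1
            nf[i + 1] = (1 + k) * self.fwd[i] - k * self.bwd[i + 1]
            nb[i] = k * self.fwd[i] + (1 - k) * self.bwd[i + 1]
        # lip end: most of the wave radiates out, some reflects back
        out = (1 + self.lip_refl) * self.fwd[-1]
        nb[-1] = self.lip_refl * self.fwd[-1]
        self.fwd, self.bwd = nf, nb
        return out

# a hypothetical area profile with a mid-tract constriction; ping it
# with an impulse and collect the tract's ringing response
areas = [1.0, 1.2, 1.5, 0.6, 0.4, 0.9, 1.4, 1.6]
tract = KellyLochbaum(areas)
out = [tract.tick(1.0 if i == 0 else 0.0) for i in range(200)]
```

Driving `tick()` with a glottal pulse train instead of an impulse, and animating the area profile over time via `set_areas()`, is what produces the babbling effect described above.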
There’s also stuff like eSpeak and Festival, which I’ve used in the past to generate electronic speech.