I would like to introduce OddVoices, a project to create quirky lo-fi singing synthesizers for General American English (GA), inspired by retro TTS systems from the 1980s and 1990s. Here’s a sample:
Unlike many modern speech synthesizers created using machine learning, OddVoices is a 100% manual endeavor. I wrote a list of 628 English words that cover the most common diphones in GA, convinced a singer friend of mine to record all of them sung in monotone, and labeled all the diphones in Audacity. The analysis/resynthesis algorithm is MBR-PSOLA (Multi-Band Resynthesis Pitch-Synchronous Overlap-Add).
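For readers unfamiliar with PSOLA, here is a rough idea of the core step. In MBR-PSOLA, the recorded segments are first resynthesized with a multi-band excitation model at a constant pitch and with fixed harmonic phases, so that units join smoothly; time-domain PSOLA then changes the pitch by cutting the signal into two-period windowed grains and overlap-adding them at a new spacing. The sketch below covers only that overlap-add step. It assumes the input is already exactly monotone, so analysis pitch marks can be placed at regular intervals without pitch tracking. `psola_monotone` is a hypothetical name, not OddVoices' actual code.

```python
import numpy as np


def psola_monotone(x: np.ndarray, analysis_period: int, target_period: int) -> np.ndarray:
    """Change the pitch period of a monotone signal from `analysis_period`
    to `target_period` samples while keeping its duration (TD-PSOLA)."""
    p_a, p_s = analysis_period, target_period
    # Analysis pitch marks sit at p_a, 2*p_a, ...; because the input is
    # assumed to be exactly monotone, no pitch tracking is needed.
    n_marks = len(x) // p_a - 1
    if n_marks < 1:
        raise ValueError("signal must span at least two pitch periods")
    window = np.hanning(2 * p_a)  # each grain spans two analysis periods
    out = np.zeros(len(x))
    t = p_a  # first synthesis pitch mark
    while t + p_a <= len(x):
        # Take the analysis grain nearest in time to this synthesis mark,
        # so duration is preserved while grains are re-spaced at p_s.
        k = min(max(round(t / p_a), 1), n_marks)
        m = k * p_a
        out[t - p_a : t + p_a] += window * x[m - p_a : m + p_a]
        t += p_s
    return out


if __name__ == "__main__":
    sr = 44100
    # Stand-in for a monotone recording: a 441 Hz tone (period 100 samples).
    voice = np.sin(2 * np.pi * np.arange(sr) / 100)
    # Raise the pitch to about 551 Hz (period 80 samples).
    higher = psola_monotone(voice, analysis_period=100, target_period=80)
    print(len(higher))
```

Because every grain comes from a signal with a fixed period, the output becomes periodic at the target period, while the spectral envelope of each grain (and thus the vowel quality) stays roughly the same. That envelope preservation is why PSOLA suits a diphone singer better than simple resampling.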
Currently, the interface is a command-line tool that takes either a JSON file (see examples/music.json) or a MIDI file plus lyrics, and outputs a WAV file. Further down the line, I want to make this work in real time as a SuperCollider UGen, and I have already ported the engine to C++.
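Driving a diphone singer from a MIDI file plus lyrics involves two conversions: each note number becomes a target pitch (and hence a grain spacing for PSOLA), and each word's phonemes become a sequence of diphone units to look up in the recorded database. The sketch below shows both under stated assumptions: standard equal temperament with A4 (MIDI note 69) at 440 Hz, a 44.1 kHz sample rate, ARPABET-style phoneme symbols, and `_` as a silence marker. The function names and symbols are hypothetical illustrations, not OddVoices' actual code or phoneme inventory, and the `examples/music.json` format is not modeled here.

```python
def midi_to_hz(note: float) -> float:
    """12-tone equal temperament with A4 (MIDI note 69) tuned to 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def period_in_samples(note: float, sample_rate: int = 44100) -> int:
    """Target pitch period, in whole samples, at which a PSOLA engine
    would space its synthesis grains for this note."""
    return round(sample_rate / midi_to_hz(note))


def to_diphones(phonemes: list[str], silence: str = "_") -> list[tuple[str, str]]:
    """Split a phoneme sequence into diphones (transitions between
    adjacent phonemes), padding both ends with silence."""
    padded = [silence, *phonemes, silence]
    return list(zip(padded, padded[1:]))


if __name__ == "__main__":
    # Hypothetical: the word "sing" in ARPABET, sung on middle C (MIDI note 60).
    print(to_diphones(["S", "IH", "NG"]))
    print(midi_to_hz(60), period_in_samples(60))
```

Each diphone spans the second half of one phoneme and the first half of the next, so a database only needs one recording per transition rather than one per word; this is why a list of a few hundred carefully chosen words can cover most of what a singer needs.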
The project is still very early in development, but it’s capable of producing entertaining musical results. Thanks for checking it out, and let me know if you have questions or feedback.