OddVoices: an open source singing synthesizer

I would like to introduce OddVoices, a project to create quirky lo-fi singing synthesizers for General American English, inspired by retro TTS systems from the ’80s and ’90s. Here’s a sample:

Unlike many modern speech synthesizers created using machine learning, OddVoices is a 100% manual endeavor. I wrote a list of 628 English words that cover the most common diphones in GA, convinced a singer friend of mine to record all of them sung in monotone, and labeled all the diphones in Audacity. The analysis/resynthesis algorithm is MBR-PSOLA.
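
To make the diphone-coverage idea concrete, here is a minimal Python sketch. It is not OddVoices' actual tooling: it assumes ARPABET-style phonemes, a hypothetical `_` symbol for the silence at word boundaries, and a tiny made-up pronunciation table. The function names are invented for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones_in_word(phonemes: List[str], boundary: str = "_") -> List[Diphone]:
    """Return the adjacent phoneme pairs of one word, padded with silence."""
    padded = [boundary] + phonemes + [boundary]
    return list(zip(padded, padded[1:]))


def covered_diphones(
    words: Iterable[str], pronunciations: Dict[str, List[str]]
) -> Set[Diphone]:
    """Collect every diphone that appears in at least one word of the list."""
    covered: Set[Diphone] = set()
    for word in words:
        covered.update(diphones_in_word(pronunciations[word]))
    return covered


# Hypothetical two-word pronunciation table, for illustration only.
PRONUNCIATIONS = {
    "cat": ["K", "AE", "T"],
    "tack": ["T", "AE", "K"],
}

covered = covered_diphones(["cat", "tack"], PRONUNCIATIONS)
```

A word list can then be grown greedily by adding whichever candidate word contributes the most diphones not yet in `covered`.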

Currently, the interface is a command-line tool that takes either a JSON file (see examples/music.json) or a MIDI file plus lyrics and outputs a WAV file. Further down the line, I want to make this work in real-time as a SuperCollider UGen, and have already ported the engine to C++.

The project is still very early in development, but it’s capable of producing entertaining musical results. Thanks for checking it out, and let me know if you have questions or feedback.

sounds cool!! Always excited to learn about any new speech/singing synth (and maybe one day will actually finish some recording using them ha)

The dream of a norns port of this singing Hydrant just got a little closer!

We now have a real-time OddVoices SuperCollider UGen. It’s really buggy, and I’m committing a few sins in the audio thread, but it sings. Here’s a demo:

I’ll update the repo with better installation and usage documentation soon!

@nathan this is very impressive! The supercollider demo is awesome.

ccing @PaulBatchelor

It’s good to see progress on this! I love the aesthetic of it. I’d be really interested in hearing two or more lines going at once.

This looks really cool! Are the MBROLA voices able to be used in a commercial fashion? Will there be a Mac version?

Although I’m using the same algorithm as MBROLA, there’s no code in common with the MBROLA project and the voice bank is original, so you won’t run into any of MBROLA’s licensing issues. OddVoices is itself available under the Apache License.

Hmm… now that you mention it, I should probably give the voice banks an explicit CC0 license just to be safe.

I don’t have a Mac I can test on, but the non-realtime Python reference synthesizer, lightly documented in the README, should work cross-platform. The SuperCollider UGen might work if you follow the build instructions, but it’s still undocumented (I need to get on that).

Thanks a bunch for looking into this and clarifying the details! :slightly_smiling_face:

@PaulBatchelor

@nathan I just tested this out (on arch linux fwiw) and it’s really fun. :slight_smile: I’d like to try creating a new voice for your system at some point too!

bug report

I did have a number of issues with git-lfs though. Running git lfs pull looked like it downloaded the files, but the data was still just placeholder files in the filesystem when I checked, and running oddvoices died the first time it tried to open the stub wav file in the quake voice, which was still just a text file… Anyway, I manually downloaded that one file and was then able to compile the voice, but rendering failed in a similar way on an unknown file, resulting in a KeyError somewhere in the corpus.py routines that read the voice. (Sorry, I forgot to make a note of the exact line number… I think it was trying to index into an object called string iirc… The compiled voice looks good though: the magic string is there and I was able to use it once I disabled git lfs, so it must have been failing somewhere else…)

Anyway, when I fully disabled git-lfs by running:

git lfs migrate export --everything --include="*.wav,*.voice,*.pdf,cmudict-0.7b"

then everything worked great!
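
For anyone hitting the same thing, here is a small Python sketch for spotting files that are still Git LFS placeholders. It relies on the fact that an LFS pointer is a short text file starting with `version https://git-lfs.github.com/spec/v1`. The function names and the choice of `.wav` and `.voice` suffixes (taken from the command above) are assumptions for illustration, not part of OddVoices.

```python
from pathlib import Path
from typing import Iterable, List

# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file is an LFS pointer rather than real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(
    root: Path, suffixes: Iterable[str] = (".wav", ".voice")
) -> List[Path]:
    """Recursively list files under root that are still LFS pointers."""
    wanted = set(suffixes)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.suffix in wanted and is_lfs_pointer(p)
    )
```

Calling `find_lfs_pointers(Path("."))` from the repository root should return an empty list once the real files have been fetched.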

Thanks for giving it a whirl! I looked into the git lfs pull issue but was unable to reproduce, either locally (also on Arch) or in my CI. Super weird. Good that you found a workaround, though.

I’d also like to announce the second voice in this project, named Cicada Lumen! Cicada is a bright baritone with a top end buzz. Here it is singing a few lines from Jabberwocky:

After several months’ hiatus from this project, I have recently resumed working on OddVoices. I ported the Python prototype over to C++ and built a friendlier alternative to the command-line interface: a web page where you can upload a MIDI file and text and produce a WAV output right in your browser thanks to the power of WebAssembly.

Example MIDI file attached:
frere_jacques.mid (486 Bytes)

This is still very experimental software, so expect lots of bugs…

I have not forgotten about the SuperCollider UGen; however, work on it is suspended until the core DSP code is more stable.

So cool! It seems pretty usable for short bits of synthesized singing. It will be fun to see how this progresses

Excited about this, trying the web version. Do you know of any archive of mono (Type 0?) midi files? Seems like most random midi sites don’t specify.

EDIT: ok this one does for a start http://www.piano-midi.de/
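
When a site doesn't say which type a file is, the Standard MIDI File header can be read directly. The Python sketch below parses the 14-byte `MThd` chunk and returns the format number (0, 1, or 2), the track count, and the time division. The function name is made up for illustration. Note that format 0 only means "a single track": that track can still contain chords, so this check does not prove a melody is monophonic.

```python
import struct
from typing import Tuple


def read_midi_header(data: bytes) -> Tuple[int, int, int]:
    """Parse the MThd chunk and return (format, track_count, division)."""
    if len(data) < 14 or data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    # Big-endian: 32-bit chunk length, then three 16-bit fields.
    length, midi_format, track_count, division = struct.unpack(">IHHH", data[4:14])
    if length < 6:
        raise ValueError("MThd chunk is too short")
    return midi_format, track_count, division


def midi_format_of_file(path: str) -> int:
    """Return the format number stored in a MIDI file's header."""
    with open(path, "rb") as f:
        return read_midi_header(f.read(14))[0]
```

For example, `midi_format_of_file("frere_jacques.mid")` would report the format of the example file attached earlier in the thread.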

ha this is extremely promising for my absurd purposes :smiley: thanks again.

lmao at the recording. There are some weird noise bursts in there that are certainly a bug that I need to track down.

https://abcnotation.com/ might be a good resource. It’s got tons of monophonic folk and traditional songs with MIDI files available for download.

EDIT: I have fixed the noise bursts. It still clips a bit, especially with the Cicada voice, but it’s not nearly as noticeable anymore.

Yeah it seems to do some weird clipping/distortion at high frequencies

Cool thanks will have a look at that!
