OddVoices: an open source singing synthesizer

I would like to introduce OddVoices, a project to create quirky lo-fi singing synthesizers for General American English, inspired by retro TTS systems from the 80’s and 90’s. Here’s a sample:

Unlike many modern speech synthesizers created using machine learning, OddVoices is a 100% manual endeavor. I wrote a list of 628 English words that cover the most common diphones in GA, convinced a singer friend of mine to record all of them sung in monotone, and labeled all the diphones in Audacity. The analysis/resynthesis algorithm is MBR-PSOLA.
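To make "covering the diphones" concrete, here is a rough Python sketch. It is not how the actual word list was made (that was written by hand); the toy lexicon, the `_` silence marker, and the names `diphones` and `greedy_word_list` are all hypothetical and only illustrate the idea of checking which adjacent phoneme pairs a set of words provides.

```python
def diphones(phonemes):
    """Return the adjacent phoneme pairs in a word, padded with silence ("_")."""
    padded = ["_"] + list(phonemes) + ["_"]
    return list(zip(padded, padded[1:]))


def greedy_word_list(lexicon, wanted):
    """Pick words one at a time, always taking the word that covers the most
    still-missing diphones. Returns (chosen words, diphones left uncovered)."""
    remaining = set(wanted)
    chosen = []
    while remaining:
        best = max(lexicon, key=lambda w: len(remaining & set(diphones(lexicon[w]))))
        gained = remaining & set(diphones(lexicon[best]))
        if not gained:
            break  # no word in the lexicon supplies anything still missing
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


# Toy lexicon with made-up phoneme spellings (an assumption, not the project's data).
lexicon = {
    "cat": ["k", "ae", "t"],
    "at": ["ae", "t"],
    "tack": ["t", "ae", "k"],
}
wanted = {d for phones in lexicon.values() for d in diphones(phones)}
words, missing = greedy_word_list(lexicon, wanted)
print(words, missing)
```

A real word list also has to be comfortable to sing and record, which is a good reason to curate it manually rather than rely on a greedy pick like this.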

Currently, the interface is a command-line tool that takes either a JSON file (see examples/music.json) or a MIDI file plus lyrics and outputs a WAV file. Further down the line, I want to make this work in real-time as a SuperCollider UGen, and have already ported the engine to C++.

The project is still very early in development, but it’s capable of producing entertaining musical results. Thanks for checking it out, and let me know if you have questions or feedback.


sounds cool!! Always excited to learn about any new speech/singing synth (and maybe one day will actually finish some recording using them ha)

The dream of a norns port of this singing Hydrant just got a little closer!


We now have a real-time OddVoices SuperCollider UGen. It’s really buggy, and I’m committing a few sins in the audio thread, but it sings. Here’s a demo:

I’ll update the repo with better installation and usage documentation soon!


@nathan this is very impressive! The SuperCollider demo is awesome.

ccing @PaulBatchelor


It’s good to see progress on this! I love the aesthetic of it. I’d be really interested in hearing two or more lines going at once.

This looks really cool! Are the MBROLA voices able to be used in a commercial fashion? Will there be a Mac version?

Although I’m using the same algorithm as MBROLA, there’s no code in common with the MBROLA project and the voice bank is original, so you won’t run into any of MBROLA’s licensing issues. OddVoices is itself available under the Apache License.

Hmm… now that you mention it, I should probably give the voice banks an explicit CC0 license just to be safe.

I don’t have a Mac I can test on, but the non-realtime Python reference synthesizer, lightly documented in the README, should work cross-platform. The SuperCollider UGen might work if you follow the build instructions, but it’s still undocumented (I need to get on that).


Thanks a bunch for looking into this and clarifying the details! :slightly_smiling_face:


@nathan I just tested this out (on arch linux fwiw) and it’s really fun. :slight_smile: I’d like to try creating a new voice for your system at some point too!

bug report

I did have a number of issues with git-lfs though. Running git lfs pull looked like it downloaded the files, but when I checked, the files on disk were still just placeholders. Running oddvoices died the first time it tried to open the stub wav file in the quake voice, which was still just a text file. I manually downloaded that one file and was then able to compile the voice, but rendering failed in a similar way on an unknown file, with a KeyError somewhere in the corpus.py routines that read the voice. (Sorry, I forgot to make a note of the exact line number. I think it was trying to index into an object called string, iirc. The compiled voice looks good though: the magic string is there, and I was able to use it once I disabled git lfs, so it must have been failing somewhere else.)

Anyway, when I fully disabled git-lfs by running:

git lfs migrate export --everything --include="*.wav,*.voice,*.pdf,cmudict-0.7b"

then everything worked great!
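For anyone hitting the same thing, a quick way to see which files are still Git LFS placeholders is to check for the pointer header. A pointer file is a small text file whose first line is `version https://git-lfs.github.com/spec/v1`. The sketch below is only a diagnostic aid; the function names and the default glob patterns are assumptions, not part of OddVoices.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(root, patterns=("*.wav", "*.voice")):
    """Recursively list files under root that are still LFS pointer stubs."""
    root = Path(root)
    return sorted(
        p
        for pattern in patterns
        for p in root.rglob(pattern)
        if p.is_file() and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    for stub in find_lfs_pointers("."):
        print("still a placeholder:", stub)
```

Run from the repository root, this prints every matching file that was never replaced with real audio data.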


Thanks for giving it a whirl! I looked into the git lfs pull issue but was unable to reproduce, either locally (also on Arch) or in my CI. Super weird. Good that you found a workaround, though.

I’d also like to announce the second voice in this project, named Cicada Lumen! Cicada is a bright baritone with a top-end buzz. Here it is singing a few lines from Jabberwocky:


After several months’ hiatus from this project, I have recently resumed working on OddVoices. I ported the Python prototype over to C++ and built a friendlier alternative to the command-line interface: a web page where you can upload a MIDI file and text and produce a WAV output right in your browser thanks to the power of WebAssembly.

Example MIDI file attached:
frere_jacques.mid (486 Bytes)

This is still very experimental software, so expect lots of bugs…

I have not forgotten about the SuperCollider UGen; however, work on it is suspended until the core DSP code is more stable.


So cool! It seems pretty usable for short bits of synthesized singing. It will be fun to see how this progresses

Excited about this, trying the web version. Do you know of any archive of mono (Type 0?) midi files? Seems like most random midi sites don’t specify.

EDIT: ok, this one does for a start: http://www.piano-midi.de/
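When a site doesn't say, the file type can be read straight from the header: a Standard MIDI File starts with an `MThd` chunk whose first 16-bit field is the format (0, 1, or 2). Note that Type 0 only means "single track"; it does not guarantee a one-note-at-a-time melody. The helper name `midi_format` below is made up for this sketch, and whether the web version actually needs Type 0 is not something this checks.

```python
import struct


def midi_format(data):
    """Return (format, track count, division) from the bytes of a MIDI file."""
    if len(data) < 14 or data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    # Header chunk: 4-byte big-endian length, then three 16-bit big-endian fields.
    _length, fmt, ntracks, division = struct.unpack(">IHHH", data[4:14])
    return fmt, ntracks, division


# Example with a hand-built header: format 0, one track, 480 ticks per quarter note.
header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, 480)
print(midi_format(header))
```

For a file on disk, pass in the result of `open(path, "rb").read()`.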

ha this is extremely promising for my absurd purposes :smiley: thanks again.


lmao at the recording. There are some weird noise bursts in there that are certainly a bug that I need to track down.

https://abcnotation.com/ might be a good resource. It’s got tons of monophonic folk and traditional songs with MIDI files available for download.

EDIT: I have fixed the noise bursts. It still clips a bit, especially with the Cicada voice, but it’s not nearly as noticeable anymore.


Yeah it seems to do some weird clipping/distortion at high frequencies

Cool thanks will have a look at that!


I’ve been posting OddVoices dev logs on my blog. I’m here to share a few choice updates.

First, I have registered oddvoices.org, the new home of the web frontend:

We also have a new voice, Air Navier, and have incorporated real vocal pitch phenomena such as “preparation” (sliding away from the target pitch before sliding towards it) and “overshoot” (sliding past the target pitch, then changing direction). See the images, derived by running real vocal data through a pitch tracker.
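As a toy illustration of the two shapes, here is a piecewise-linear pitch curve between two notes. This is not the model used in OddVoices; the function name `transition_curve`, the default amounts (in semitones), and the straight-line segments are all assumptions made to keep the sketch short. Real sung transitions are smooth curves.

```python
def transition_curve(start, target, prep=0.3, overshoot=0.4, steps=8):
    """Return pitch values (in semitones) for a note transition that first
    moves away from the target (preparation), then slides past it
    (overshoot), then settles on the target."""
    direction = 1 if target >= start else -1
    breakpoints = [
        start,
        start - direction * prep,        # preparation: move away from the target
        target + direction * overshoot,  # overshoot: slide past the target
        target,                          # settle
    ]
    curve = []
    for a, b in zip(breakpoints, breakpoints[1:]):
        for i in range(steps):
            curve.append(a + (b - a) * i / steps)  # linear interpolation
    curve.append(target)
    return curve


# Rising transition from MIDI note 60 to 64.
for value in transition_curve(60, 64, steps=4):
    print(round(value, 2))
```

For a rising interval the curve dips below the starting note before climbing, peaks above the target, and then comes back down to it; a falling interval mirrors this.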




Thanks for sharing this, and your blog. It’s interesting to understand how it all works… Looking forward to a SuperCollider version one day!

[P.S.: I don’t know if you’ve come across FOF synthesis in your research, but it might be interesting to check out. FOF is a French acronym (fonction d’onde formantique) meaning “formant wave function” synthesis. It was used by the IRCAM CHANT project back in the day and produced some really nice sounding singing voices. I think it pre-dates PSOLA, and has some similarities as well. There’s an example clip from CHANT here]

Also interesting


Thanks for the links. In the textbook Text-to-Speech Synthesis, Paul Taylor describes how speech synthesizers evolved over time in “generations.” The first generation of speech synths was developed in the 70’s and 80’s and used parametric control of formants in schemes like FOF, windowed sync, and LPC.

OddVoices is a second-gen singing synthesizer, modeled after late 80’s and 90’s speech synths that use sample playback with time-frequency modification. MBROLA also falls in this category.
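To give a feel for what "sample playback with time-frequency modification" means, here is a bare-bones time-domain overlap-add sketch in the PSOLA family. It is a simplification and not the MBR-PSOLA code in OddVoices: it assumes the input has a known, constant pitch period with pitch marks at multiples of it, and the name `psola_repitch` is made up for the example.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def psola_repitch(signal, period, new_period):
    """Change the pitch period of a signal (in samples) while keeping its length.

    Two-period windowed grains are cut around each analysis pitch mark and
    overlap-added at the new spacing, reusing the nearest grain in time."""
    if len(signal) <= 2 * period:
        raise ValueError("signal is too short for this period")
    window = hann(2 * period)
    out = [0.0] * len(signal)
    marks = range(period, len(signal) - period, period)  # analysis pitch marks
    t = period  # current synthesis pitch mark
    while t < len(signal) - period:
        mark = min(marks, key=lambda m: abs(m - t))  # nearest analysis mark
        for i in range(2 * period):
            out[t - period + i] += signal[mark - period + i] * window[i]
        t += new_period
    return out


# A pulse train with a 50-sample period, re-spaced to a 40-sample period.
period = 50
pulses = [1.0 if i % period == 0 else 0.0 for i in range(400)]
raised = psola_repitch(pulses, period, 40)
print([i for i, v in enumerate(raised) if v > 0.5])
```

Because each grain carries one period's worth of spectral envelope, re-spacing the grains changes the pitch while leaving the vowel quality roughly intact, which is why this family of methods works with a small recorded voice bank.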

Unit selection, like in your second link, is a third-generation method along with Hidden Markov Models (the recent Casio synth is apparently based on Sinsy, which is HMM-based). They sound great due to their heavy use of contextual information to select samples. In the past decade, deep learning approaches have sprung up that could be called “fourth generation.”

The downside of the later generations is that they require enormous amounts of training data to work their magic. When deciding which approach I wanted to go with at the inception of OddVoices, I made a deliberate tradeoff because I don’t have the resources to record long hours of singing. The benefit is that new voice banks can be created with relatively little work, enabling a wide diversity of voices in the project.