Novel Vocal Processing

i’m curious about specific novel techniques for processing vocals. particularly i’m looking to split vowel and consonant sounds within one recording, similar to how a noise reduction process works (where you can preserve the noise or clean sound). my quick googling didn’t get me far (and i have to do real work) so i pose this question to the smart people here.

furthermore, i’d love to see examples of experimental vocal work.

robert ashley - automatic writing


I think these all miss what you’re after but are perhaps food for thought (or thought for food)

i always liked the way The Books manipulate vocals, sometimes tongue-in-cheek edits, mostly just vocal cutting, sometimes layering but w/ cool end results

these early reichs are cool too, again not really processed beyond cut/phased

the bjork where she used just vocals has some sweet stuff too


I’m familiar with using bandpass filters to create vocal-sounding formants from simple signals, but I’ve never heard of (and am having a hard time finding info about) filtering specific formants from existing vocal audio streams. It’s an interesting concept; I wish I knew the answer.


Some potential starting points:

So far this seems to be heading down a path that leads to speech analysis, which involves way more calculus than I’m capable of (which is pretty close to nil).

this is interesting.

a bit on specific antiformants:

I guess the whole of this is rooted in how we HEAR consonants and vowels vs the SCIENCE of consonants and vowels.

spectrograms of specific voices:

more on antiformants:

antiformants and antiresonances (essentially application of the inverse):

maybe we could do a proof of concept where we do multiple recordings of a single sentence – one full tonality and one aspirating only the consonants, then see if using the inverse of the consonant track against the fully toned one results in vowels only? if this works, then see if a full tonality played against a vowel inverse results in only consonants.

then…something. coding? number crunching? sandwiches? (for serious, maybe we can find a way to map the average of an individual’s aspiration of the sample sentence’s consonants to a patch that we can apply to them re-reading the sentence in real time to see if only vowel formants pass through? then try for a consonant-pass filter?)

i’m rich with ideas, but poor in practical coding knowledge. but i think there’s something here.
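
A rough sketch of the noise-reduction-style version of this idea, in Python. It assumes you already have magnitude spectra per frame (from an STFT, say) for both takes, and `average_profile` and `subtract_profile` are made-up names for illustration, not from any library. Two takes will never line up in time, so this averages the consonant-only take into one profile and subtracts it from every frame of the full take. That is plain spectral subtraction, which is a stand-in for the "inverse" described above, not the same thing.

```python
def average_profile(frames):
    """Average a list of magnitude frames (one list of bins per frame) into one profile."""
    n_frames = len(frames)
    n_bins = len(frames[0])
    return [sum(frame[k] for frame in frames) / n_frames for k in range(n_bins)]


def subtract_profile(frames, profile, amount=1.0):
    """Subtract `amount * profile` from every frame, clamping each bin at zero."""
    return [
        [max(0.0, mag - amount * prof) for mag, prof in zip(frame, profile)]
        for frame in frames
    ]


# Toy data: 4 bins per frame, with consonant energy sitting in the upper two bins.
consonant_take = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.5, 0.5]]
full_take = [[1.0, 0.8, 0.75, 0.2]]

profile = average_profile(consonant_take)
vowel_guess = subtract_profile(full_take, profile)
```

Whether what is left actually sounds like vowels is an open question. Real consonants overlap vowels in frequency, so expect artifacts much like over-aggressive noise reduction.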


I’m getting the feeling we need some advice from someone on Skype’s speech-to-speech translation team or something like that… This is hard!

(or maybe it isn’t and we’re just on the wrong track)

i feel like @zebra might have some ideas about this.

I hope this means that you are thinking of making a voice-centric module!.. Because that would rock my boat very hard.
Voice is present in about 99.9% of everything I do that involves sound, so imagining a thoughtful integration of vocal presence within a modular environment seems very, very interesting. Especially since there does not seem to be much of anything like that out there.

Granular effects & harmonizing via pitch shifting come to mind as fun/interesting vocal treatments. And formant synthesis can be a lot of fun too.

Harmonizing is helped by some stereo spatialization, and very small timing variations.

Not really a contribution but on the splitting of consonants/vowels Alex Harker showed me some of this new frame library he’s working on (framelib) and it can do mean filtering with a handful of objects, in real-time, and it sounds crazy good (ie splitting transients from sinusoids).
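
For anyone who wants to try this kind of split without framelib, here is a minimal Python sketch. I don't know what framelib does internally, so this swaps in the common median-filtering approach to separating sustained from transient energy: smooth the magnitude spectrogram along time to keep steady partials, smooth it along frequency to keep clicks, and assign each bin to whichever is stronger. The function names and the toy spectrogram are hypothetical, and it works on plain lists rather than real audio.

```python
from statistics import median


def median_filter_1d(values, width):
    """Sliding median with a window of `width` samples, shrunk at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(median(values[lo:hi]))
    return out


def split_sustained_transient(mag, width=5):
    """Split mag[frame][bin] into (sustained, transient) with a binary mask."""
    n_frames, n_bins = len(mag), len(mag[0])
    # Median along time, per bin: steady tones survive, short clicks are removed.
    along_time = [[0.0] * n_bins for _ in range(n_frames)]
    for f in range(n_bins):
        track = median_filter_1d([mag[t][f] for t in range(n_frames)], width)
        for t in range(n_frames):
            along_time[t][f] = track[t]
    # Median along frequency, per frame: broadband clicks survive, thin tones are removed.
    along_freq = [median_filter_1d(frame, width) for frame in mag]
    sustained = [[0.0] * n_bins for _ in range(n_frames)]
    transient = [[0.0] * n_bins for _ in range(n_frames)]
    for t in range(n_frames):
        for f in range(n_bins):
            # Ties go to the sustained layer.
            if along_time[t][f] >= along_freq[t][f]:
                sustained[t][f] = mag[t][f]
            else:
                transient[t][f] = mag[t][f]
    return sustained, transient


# Toy spectrogram: a steady tone in bin 2 plus a broadband click at frame 4.
toy = [[1.0 if (f == 2 or t == 4) else 0.0 for f in range(9)] for t in range(9)]
sustained, transient = split_sustained_transient(toy)
```

On real speech you would run this on STFT magnitudes and resynthesize each layer with the original phases. Consonants are not all transients (think "s" or "m"), so this gets at the vowel/consonant split only approximately.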


My first guess for a starting point would be the speech recognition engine by itself. There are several open-source options for speech recognition. It all depends on what you’re looking to “get out” of the service. By this I mean after the words have been spoken, the computer has processed said words, what do you want it to spit out next? Keeping it simple in theory, we’re simply looking at a classic stdin/stdout situation with variables spaced over time.

Say you choose speech to text. After the speech has occurred and the closest translation has been determined by the SRE, a sample output would be a string containing each word, with a character delimiter of ‘,’ terminated with a newline ‘\n’, the calculated number of syllables per word (this is no doubt part of speech recognition), and the total duration of the original recording (to try and preserve timing). This could be written to a text file, or better yet sent to a FIFO/named pipe for the “vocal processing” portion we are interested in. The result could be a rule, or a macro, or an event containing OSC messages, MIDI, or any number of generated data that originated from the speech, the timing of the speech, the recognition of what was spoken, and its makeup.
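
To make that concrete, here is one possible line format sketched in Python. The layout (`word:syllables` pairs separated by commas, the duration in seconds as the last field, then a newline) and the names `encode_utterance` and `decode_utterance` are assumptions for illustration only. No real speech recognition engine is being described here.

```python
def encode_utterance(words, syllables, duration_s):
    """Build one newline-terminated record, e.g. for writing to a file or named pipe."""
    if len(words) != len(syllables):
        raise ValueError("need exactly one syllable count per word")
    fields = [f"{word}:{count}" for word, count in zip(words, syllables)]
    fields.append(f"{duration_s:.3f}")  # total duration always goes last
    return ",".join(fields) + "\n"


def decode_utterance(line):
    """Parse a record back into ([(word, syllables), ...], duration_s)."""
    *pairs, duration = line.rstrip("\n").split(",")
    words = []
    for pair in pairs:
        word, count = pair.rsplit(":", 1)
        words.append((word, int(count)))
    return words, float(duration)


record = encode_utterance(["hello", "world"], [2, 1], 1.25)
words, duration = decode_utterance(record)
```

The processing side would read one line per utterance and turn the syllable counts and duration into whatever it needs (OSC, MIDI, envelopes). A word containing a comma would break this format, so a real version would want proper escaping.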

no doubt this is a MONSTER idea and open-source SR is very complicated. But alas, here are some speech recognition projects that will recognize speech and pump out A LOT of data about what they’re doing and how:



Voxforge (open-source speech corpus creation and integration)


Made me think of this. Well, i was more looking for a piece called ‘Kassandra’ but only found ‘Korwar’ (there is a typo in the video). Also, Elisabeth Chojnacka’s interpretation is far better.

And of course Meredith Monk :smile:

But Luciano Berio made the most incredible work with voice :


my two cents on experimental vocal work (not all processed) - you may know some of these already

Maja S. Ratkje - just amazing, both in terms of acoustic and electronic voice work - and as an all round improvisor -
live set -

slightly more acoustic but always stuns me is Meredith Monk’s work, ‘Braid 1 and Leaping Song’ from mercy is one of my favourite pieces of music ever… but this live clip actually reminds me of some of the modular videos you’ve been posting

looking at the old guard of voice processing you have to check Trevor Wishart, also because he built loads of software processing toolkits for voice/sound when everything had to be rendered offline :slight_smile: and he writes really well about it in ‘On Sonic Art’
not sure i ever actually enjoy listening to much of his stuff though! :smile:

and in the new school of voice people i’m really enjoying Katie Gately


Tangential here (too) but some mechanical vocal processing with crazy masks. Inspiring if nothing else:

i dunno about ‘novel’
but a lot of digital voice processing (all the way back to the delilah project) uses FFT for analysis and phase vocoder techniques for synthesis.

supercollider, max, pd etc all have dedicated classes for manipulating phase vocoder parameters. IIRC, max has a pvoc~ object which lets you patch directly in the sinusoid domain. supercollider’s PVoc classes are similar, allowing you to specify arbitrary input->output mapping functions.

i would recommend just playing around in those environments. spectral companding and scrambling are really simple to try. simple formant tracking is not too hard; IIRC it is basically a question of looking for local centroids. simple transient separation is not too hard; IIRC it is basically looking at spectral flatness. supercollider has classes to calculate these things for you. there is of course tons and tons of literature on those topics, easily searched.
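
If you want to see the numbers before opening one of those environments, here is a small Python sketch of two of the measures mentioned above, computed on a single frame. It uses a naive DFT from the standard library so it runs anywhere; that is far too slow for real audio, where you would use a proper FFT. The function names are my own, and the flatness here is the usual geometric mean over arithmetic mean of the power spectrum.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power for bins 0..N/2 of a real-valued frame (no windowing)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(power, eps=1e-12):
    """Geometric mean / arithmetic mean: near 0 for tonal frames, higher for noisy ones."""
    logs = [math.log(p + eps) for p in power]
    geometric = math.exp(sum(logs) / len(logs))
    arithmetic = sum(p + eps for p in power) / len(power)
    return geometric / arithmetic


def spectral_centroid(power):
    """Power-weighted mean bin index (multiply by sample_rate / N to get Hz)."""
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(k * p for k, p in enumerate(power)) / total


n = 64
tone = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]  # vowel-ish: one partial
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # "s"-ish: broadband

tone_flatness = spectral_flatness(power_spectrum(tone))
noise_flatness = spectral_flatness(power_spectrum(noise))
```

Thresholding the flatness frame by frame gives a crude tonal/noisy switch. Running the centroid over narrow bands instead of the whole spectrum is one rough way to look for local centroids.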

this album by paul demarinis is a classic:

outside the frequency domain, simple amplitude modulation is pretty powerful. as is enveloped filtering. those early bob ashley pieces are a good example. takehisa kosugi’s “catch-wave” is fantastic. jessica rylan does wonderful things with simple envelope following, AM, and incredibly dirty preamps.
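
a minimal python sketch of those two time-domain building blocks, assuming the signal is just a list of floats at a known sample rate. the names and default times are my own choices, not from any of the pieces mentioned: the follower is a rectifier into a one-pole smoother with separate attack and release, and the AM is a sine-shaped gain between `1 - depth` and 1.

```python
import math


def envelope_follower(signal, sample_rate, attack_s=0.005, release_s=0.05):
    """Rectify, then smooth with a one-pole filter (fast attack, slower release)."""
    attack = math.exp(-1.0 / (attack_s * sample_rate))
    release = math.exp(-1.0 / (release_s * sample_rate))
    env = 0.0
    out = []
    for x in signal:
        level = abs(x)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out


def amplitude_modulate(signal, sample_rate, mod_hz, depth=1.0):
    """Multiply the signal by a sine-shaped gain that swings between 1 - depth and 1."""
    out = []
    for n, x in enumerate(signal):
        mod = 0.5 * (1.0 + math.sin(2.0 * math.pi * mod_hz * n / sample_rate))
        gain = (1.0 - depth) + depth * mod
        out.append(x * gain)
    return out


sample_rate = 8000
voice_stand_in = [math.sin(2.0 * math.pi * 220.0 * n / sample_rate) for n in range(sample_rate)]
tremolo = amplitude_modulate(voice_stand_in, sample_rate, mod_hz=30.0, depth=0.8)
envelope = envelope_follower(voice_stand_in, sample_rate)
```

the follower output can drive a filter cutoff or the gain of another sound, which is roughly what enveloped filtering comes down to.
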
finally, here are a couple friends in oakland doing great vocal noise:


Cool to see Voicehandler here. Really good friends with Danishta from when we both lived in Miami.

LOVE the Katie Gately pieces!
So rich, and full of excitement/energy.

This newish show on WFMU is centered around interesting vocal techniques, perhaps would be of interest to those in this thread –

Music with vocals weird & wonderful – beatboxing, yodels, auction chants, Tuvan throatsinging, & eerie polyphonies.


Hello! would like to share this album by Ian William Craig.
Altho I’m not sure if this is what you were talking about, hope it inspires you guys anyway! Beautiful sounds.

I’ve really enjoyed looking through the links.

A few more!
Vocal synthesis on the Gameboy with LSDJ - The best example I have is the breakdown in this:

The great Imogen Heap (how has this not already been posted?!)

And I’m loving the vocal sampling of CHVRCHES at the moment:
