Disquiet Junto Project 0290: Text-to-Beat


Tapping my foot and appreciating your use of the stereo field :slight_smile:


I like the way you frame your contribution with the thought-provoking anecdote about Roger Ebert.

Also enjoying how the rhythm develops and I’m reminded I’ve been meaning to do something with a piano. Might do that today, cheers.


Instead of trying to find a rhythm that I liked, I decided to select the words and work with what came to me. My favorite opening sentence in a novel is from the Charles Dickens novel, A Tale of Two Cities. The sentence is much too long to establish a rhythm so I just used the part everyone knows, “It was the best of times, it was the worst of times”.

However, this resulted in a rhythm that was much too boring and predictable for this exercise, so I had the words converted to Japanese and pronounced through a text-to-speech program. The result was considerably more complicated and challenging, but also far more fun to work with.

Two Cities is written for saxophone quartet.

The score is available at http://bit.ly/2uU7dK7


Sorry, my pain has become yours…

Malfunction or: How I Learned to Stop Worrying and Become More Human (disquiet0290)

The strangest and most time-consuming Junto I have done yet, with influences from Trio, The Pixies, Beach Boys, Yosemite Sam, Bob the Builder, Buck Henry and the Borg. Sampling, recording and editing TTS is probably not my favorite thing, but it was interesting. I ended up using a browser-based TTS and recording into Ocenaudio on a PC, saving the recordings on Google Drive and then downloading into Cubasis on the pad. Ten different voices and tracks with one synth and a beat. Most of the differences in pronunciation were kept but some were stretched for rhythm or flow. The parts include words, sounds, letters and a doo-wop chorus influence. Most of the speech tracks have a few light effects, but there are heavier-handed bits in there. Everything except synth and drums is also run to a verb bus with an instance of Roomworks. Mostly nonsensical, but still slightly disturbing and mildly(?) dark. Enjoy, but not too much, I will probably never spend this much time on TTS again :wink:


Put these bits through the example text-to-speech URL, mostly keeping sample text from some of the voices:

<p><s><prosody rate="-5%">
BOOM, shock. Boom, shock.
Buh BOOM, shock. Boom, shock.
BOOM, shock. Boom, shock.
Buh BOOM, shock. Boom, shock.
Dot, dot, Dot dot dot. dt dt, dt dt
</prosody></s></p>

<p><s> <prosody rate="-95%">espacio, aahhhhhhhh, eeeehhhhhhh, aahhhhhhhh, aahhhhhhhh</prosody></s></p>

<p><s><prosody rate="-45%">spirituale e morale<break time="300ms"/>valori indivisibili e universali della dignità umana, della libertà</prosody></s></p>

<p><s><prosody rate="-5%">Tata Ta, tata! Tata! Tata!</prosody></s></p>

Prosody sets the speed of the speech; not sure what the ‘s’ tag does (?). I have basic rhythms and some whizzy things, and hope to be done tomorrow night.
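For anyone who wants to generate fragments like these rather than type them by hand, here is a minimal sketch. In standard SSML, `<prosody rate="...">` changes the speaking rate, `<s>` marks a sentence, `<p>` marks a paragraph, and `<break time="..."/>` inserts a pause. The helper name `ssml_phrase` and its parameters are made up for illustration, and whether a given TTS demo accepts every tag is an assumption to check against that service.

```python
import xml.etree.ElementTree as ET


def ssml_phrase(text, rate="-5%", pause_ms=None):
    """Wrap text in <p><s><prosody rate=...>, with an optional trailing <break>.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    p = ET.Element("p")
    s = ET.SubElement(p, "s")
    prosody = ET.SubElement(s, "prosody", {"rate": rate})
    prosody.text = text
    if pause_ms is not None:
        # A pause placed after the spoken text, inside the prosody element.
        ET.SubElement(prosody, "break", {"time": f"{pause_ms}ms"})
    # encoding="unicode" returns a str, so accented characters stay readable.
    return ET.tostring(p, encoding="unicode")


print(ssml_phrase("Tata Ta, tata! Tata! Tata!"))
print(ssml_phrase("spirituale e morale", rate="-45%", pause_ms=300))
```

Building the markup through an XML library means every tag that gets opened is also closed, which is easy to miss when editing by hand.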


It’s been a while since I’ve participated in the Disquiet Junto Project, so please forgive my absence.
Back this week with something a little different. I was curious to see how the text-to-speech software would cope with non-words. I played around with a few ideas and then settled for something Dada-ish.
I tried four different language options: English, French, German and Japanese. The first three gave excellent results, but the Japanese version just didn’t sound right so I rejected that one.
Moving the tracks into Ableton Live, I extracted melodies and harmonies from the recordings and created an ambient backdrop.
Each filtered recording is played twice during the piece, but you may struggle to hear them the first time round as the first play is very quiet.


Cheers :slight_smile: Mr. Ebert did a lot to raise public awareness of TTS as a mode of communication for people with disabilities.


Seems seldom I make it up here lately - couldn’t ignore this one as it works closely with a current set of experiments around TTS --> SpeechToText --> BOT chats & AI transcription. Interesting that this Junto was, conversely, a chance to use similar approaches, yet abandon any venture in interpretation & meaning.

My go-to TTS is generally --> http://www.fromtexttospeech.com/ as the voices are fluid. If they choke, zealous punctuation & manipulated misspellings can get them back on track.

Used the TTS site to spit some short vocal patterns, then sent them down a series of generative arrangement effect chains in Reaper. (also a current focus of experimentation)

A little tinkering allows for Reaper to freely apply cross-modulations between tracks, samples & instruments, tracking dynamics, pitch & spectra. I set up a net of tonal (fast) modulations over multi track sends, to develop ‘voices’; gradually teasing back the granularity of tracking down to the temporal (slow) scale of triggers/events. The resulting lively chaotic generators were then routed back to a clean set of ‘arrangement’ tracks, with fading applied to thin them down and give everything a turn.

Next challenge (harder) = try and read through + listen to what all the rest of you did!


this is so right in every way!


I am a dummy when it comes to computers, so when I tried to get my computer to read any text all I achieved is what you can hear. I do not even know where the voices came from…
Things being as they are, I could not use these voices as beat or rhythm; so I played some guitar, bass and drums to it.
I know the result does not comply with the given task, but I could not do any better.


I’ve had the sentence “Language navigates this.” from Citizen by Claudia Rankine stuck in my head for months. I’ve been working on music related to this idea but nothing as explicit as using the phrase as a source. This project seemed perfect for that. I used the text-to-speech voices from Mac OS X, recording both the whole phrase and the individual syllables of each word. I used Ableton to slice the audio to MIDI tracks and went from there. The phrase as a rhythm only appears in full for a few brief moments.


Agreed on texttospeech.com, having used it on a few Disquiet projects. It responds to punctuation and purposeful misspellings better than text-to-speech-demo.mybluemix.net – for instance, using “aaaaaaaaahhhh” in texttospeech.com made a drawn-out vowel sound, akin to background singing. In the latter it would stutter, like “aaah-ak-ak-ak-hah”.

However, text-to-speech-demo.mybluemix.net has some useful XML tags to manipulate speech, like <prosody rate="-45%"> and <break time="300ms"/>.
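A small sketch for checking markup before pasting it into a demo page: an unclosed tag or a typographic quote in an attribute is easy to miss by eye. The function name `is_well_formed` is hypothetical, and wrapping the fragment in a `<speak>` root (the standard SSML root element) is an assumption about what the service expects. This only tests XML well-formedness, not whether the service supports a particular tag.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a <speak> root.

    Hypothetical helper: checks tag balance and quoting only.
    """
    try:
        ET.fromstring(f"<speak>{fragment}</speak>")
        return True
    except ET.ParseError:
        return False


# Balanced tags with straight quotes.
print(is_well_formed('<prosody rate="-45%">hello<break time="300ms"/></prosody>'))
# A prosody tag that is never closed.
print(is_well_formed('<prosody rate="-45%">hello'))
```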


Online text to speech Chinese counting to 4, then playing with Organelle sampler…


Yeah, I noticed the thing it had going on, more ambitious than TTS.
In the other ‘direction’, you can have fun by layering/corrupting multiple voice recordings then feeding them to --> https://speech-to-text-demo.mybluemix.net/ … you can watch as it projects and redacts in real time, trying to figure out what you have ‘said’. (The downside being - when it really gets stuck - an odd preoccupation with the Middle Eastern ‘peace’ process.)


Expressing some of my feelings of the present moment.


After I heard an a cappella group perform Boots and Cats, it seemed a natural option for this week’s Disquiet Junto. Saying “boots and cats” over and over forms a natural beatbox rhythm. We’re supposed to develop rhythm, so let’s use this as a foundation… and mess with it. Layering 12 different versions of Boots and Cats with different machine interpretations destroys the inherent groove. The only thing that gives this a musical aspect is the high pass, low shelf, and one notch filter.


My track is here, “Espacio Libertade”, named using some of the text in the samples of the web app:

I used the text-to-speech web app linked in the project. “Boom-shock, boom shock” steps off from the Kraftwerk song that uses similar phrasing. I think these are all the words I used:

<p><s><prosody rate="-5%">
BOOM, shock. Boom, shock.
Buh BOOM, shock. Boom, shock.
BOOM, shock. Boom, shock.
Buh BOOM, shock. Boom, shock.
Dot, dot, Dot dot dot. dt dt, dt dt
</prosody></s></p>

<p><s> <prosody rate="-95%">espacio, aahhhhhhhh, eeeehhhhhhh, aahhhhhhhh, aahhhhhhhh</prosody></s></p>

<p><s><prosody rate="-45%">spirituale e morale<break time="300ms"/>valori indivisibili e universali della dignità umana, della libertà</prosody></s></p>

<p><s><prosody rate="-5%">Tata Ta, tata! Tata! Tata!</prosody></s></p>

I feel I am still locked into lining up beats to the timeline, and doing standard patterns like intro/4 measures/2 measures/4 measures/bridge/2 measures/outro. Fills are the hardest.


I was struggling to come up with an idea. So I took an old poem that I wrote about writer’s block - in which I used a collection of odd phrases and juxtapositions that I had been keeping for band-names / track titles. I had a female voice from www.fromtexttospeech.com/ read out the text and then looped a few phrases - placing them in line with the audio like a series of echoes. Each one has its own volume curve, panning, and different reverb. I had the original text play alone, then start echoing with a key phrase, before having only the echoed phrases play without the original text. I then used a male voice to read a re-worked version of the poem, which has a more structured form. So the whole piece represents the creative process - from an initial idea, through the mental processing and reworking of the ideas, and then the final piece.


hi folks,
loved this one as I had a writerbot poem waiting for this… enabled the construction of this week’s track:
…I had a funny feeling of déjà vu on Thursday, I was hoping for a computer text vox project! So I was happy that some text to speech was part of the project, as I had a piece that I wanted to take further; this was a perfect opportunity!
I got a poem or song lyric generated at writerbot.com and created a vocal via fromtexttospeech.com. I then chopped, sliced and diced this into a more intricate and intriguing pattern using Sector (kymatica.com/Software/Sector), and combined it with some pads and noise which were in turn further processed with Borderlands Granular (borderlands-granular.com), plus the inevitable Dubstation by AudioDamage and AUFX Space for added spatial dimensions… thanks for listening!


I had the computer say one of my favorite lines from poetry, “So I said I am Ezra” (from “So I said I am Ezra” by Archie Ammons) as I’ve always liked the alliteration.

I chopped it up, made some repetitive patterns, and ran them through reverb and echo before layering them back on themselves. Then…I hated it! So I took the whole piece and layered in a short passage from a Numbers station – still hated it. Then I took the whole thing, compressed it a little, and put it through the Robot voice filter in Garageband, which is when it started to get somewhere. A few more circlings and overlappings, and it was there. Or at least as much of a “there” as a line from a poem about the impossibility of identity would allow… ;