A Catalog of River-Dwellers

Hi all, I just released a free-to-download album and companion EP on bandcamp as He Can Jog.

This album started as an attempt to take a generative system I’m working on (which processes databases of field recordings) and repurpose it to process entire records that I liked instead. The only things that survived from that process were a ton of test renders, and my favorites were sourced from @madeofoak’s What Now album.

That material was the starting point for this – there are two fairly prominent sections, making up the first and second halves, which are sourced from it: mostly segmented convolutions of about 50 sections from the album.

The rest of the process was basically just writing things around that material. In the first half I finally used the East/West Composer Cloud library to add vibraphone, shakers, and the woodwinds. And I pulled from a bunch of foley-type recordings I’ve been making over the last year as well: scraping a piece of wood against more wood, manipulating the top of a mangled metal trash bin, a close-up recording of sand falling through an hourglass, and various recordings of spaces and my apartment. (Especially the nice sound of the windows being closed.)

I’ve also been kind of enamored of Marie Poland Fish’s catalog of fish vocalizations. (Yes, her name is really Fish!) I tracked down the book she published, which documents hundreds of North Atlantic fish vocalizations as spectrograms with annotations like the biological source of each vocalization – seems like a lot of swim bladder action, but there are so many different types of vocalization it’s really crazy.

She also released a set of recordings of these vocalizations which I have had no luck tracking down (U of Rhode Island has ignored my emails asking about it…) but there are some samples on the website and those are featured in this album too: http://www.gso.uri.edu/fishsounds/

That kind of inspired me to structure this thing as a story, or at least try to evoke some narrative element. So, make of it what you will but the subject is a monster deep in the water-ways of a river that empties into a lake. I read a traditional Scandinavian story at the end which has a kind of political message that resonated with me as well.

I also released this companion EP which is just two songs, consisting of outtakes of material from the record which I really liked but wasn’t able to work into the record itself. It’s a more straight-ahead slow-moving instrumental affair I think, and the only story is in the sound.

Thank you for listening!


this is beautiful, Erik.


<3 Thanks man

I forgot to mention some other thieving: the intro/outro bits work in a few convolved samples from the new mountain man record as well…


Hi, can you say a bit more about the generative system? I’m imagining a generative system that can choose and play from a pool of field recording segments. I like the @sbaio norns script zellen, @alphacactus’s meadowphysics, and @dan_derks’s less concepts, and I’m trying to think about how I could implement these or something similar with (long) sample triggering and selection. Thanks!


Sure, this is the version used to generate some of the material from the What Now sources: https://github.com/hecanjog/radiopost/tree/97157aec8cf2c6363391ff316490638a78ea8708

In this form the software makes a few passes through to build everything up, and leaves a trail of intermediate renders and stems which I ended up drawing mostly from.

  1. It uses aubio to segment all the source audio into chunks by splitting at onsets (using the specflux approach which seems to be good at dealing with spectrally dense source material) [source]

  2. On each segment it makes an analysis pass and stores the features extracted with librosa into an sqlite3 database for later (along with the start/end points of the segments in the original files etc) [source]

  3. The first main stem (I called it particles) takes a sorted set of all the segments from the most noisy to the least noisy (based on librosa’s spectral flatness feature) and scrubs through them on a variable-shaped curve in several passes, applying a number of transformations to the sounds as it goes – mostly shaped LP/HP filter sweeps and variable speed playback. The final output is saved to disk for later processing. [source]

  4. The ingredients for the next main stem (I called it waves) are sourced by taking 50 pairs of a smooth sound (low spectral flatness and a pitch stable enough that the detector recorded a frequency in the database) and a noisy sound, smearing them to the same length (generally longer, up to about 10x) by variable-length overlap-looping, and then convolving the two sounds together. (The material I used on the album all used a time domain convolution, which takes a long time but has a great high frequency sparkle to it that I can’t seem to duplicate in the frequency domain.) All these convolution outputs are saved for later processing. [source]

  5. The second main stem (mixed waves… naming is sometimes an afterthought :-P) is constructed from multiple layers of those convolved outputs, mixed and filtered. [source]

  6. The third main stem (basswaves) takes the output of the mixed waves and makes a bunch of passes through it to create plucked events. Each pluck is basically just a filtered and enveloped copy of a segment from the mixed waves stem, also variably pitch-shifted in octaves and fifths. The events are sequenced into clusters of rhythmic curves. [source]

  7. The final output is constructed in a few passes. The mixedwaves layer gets an enveloped copy following the amplitude of the particles layer, and it gets mixed into the particles layer. The basswaves and mixedwaves layers are shaped with fairly slow-moving variable envelopes and mixed into the output. Then this output is put through a series of “insert recipes” which do an additional set of processing (combinations of bit reduction, filtering, distortion, chorus-like things, and chopping/stuttering of segments or alternating interwoven segments) and overdub their outputs back into the main output – so each recipe is getting the post-processed version of the last output. A similar set of “replacement recipes” gets applied after this, except they completely replace the output of the segment they process instead of overdubbing with it. [source]
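To make the first couple of steps concrete, here’s a toy, self-contained sketch of the segment-and-annotate idea – a hand-rolled spectral-flux onset splitter standing in for aubio’s specflux, and an sqlite3 table like the one described above. All of the names and parameters here are illustrative, not from the actual repo:

```python
import sqlite3
import numpy as np

def spectral_flux_onsets(samples, frame=1024, hop=512, k=1.5):
    """Split audio into segments at spectral-flux peaks.

    A simplified stand-in for aubio's specflux onset detector: flux is
    the summed positive change in magnitude spectrum between adjacent
    frames; frames whose flux exceeds k times the mean flux count as onsets.
    """
    win = np.hanning(frame)
    frames = [samples[i:i + frame] for i in range(0, len(samples) - frame, hop)]
    mags = [np.abs(np.fft.rfft(f * win)) for f in frames]
    flux = np.array([np.sum(np.maximum(mags[i] - mags[i - 1], 0))
                     for i in range(1, len(mags))])
    onsets = [(i + 1) * hop for i in range(len(flux)) if flux[i] > k * flux.mean()]
    bounds = [0] + onsets + [len(samples)]
    return list(zip(bounds[:-1], bounds[1:]))  # (start, end) sample ranges

def store_segments(db, filename, segments):
    """Record each segment's source file and boundaries for later lookup."""
    con = sqlite3.connect(db)
    con.execute('CREATE TABLE IF NOT EXISTS segments (filename TEXT, start INT, end INT)')
    con.executemany('INSERT INTO segments VALUES (?, ?, ?)',
                    [(filename, s, e) for s, e in segments])
    con.commit()
    return con
```

The real system stores a full set of librosa features alongside the boundaries; the point is just that once segmentation and annotation live in a database, everything downstream can query for material by feature instead of reading the sources linearly.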

The code is really sloppy and undocumented!

The whole thing gets squashed a bit with soundpipe’s compressor (also all the filters are from soundpipe, thanks @PaulBatchelor!) and that’s basically the process. I mostly ended up using parts from the mixedwaves stems and the combined outputs in this record.

For the field recording project, I’m happy with where things are going but things are too samey on the macro-level for my taste. And I want to do more with going back and making analysis passes on the intermediate layers to inform processing and arrangement further down the line, instead of only doing analysis on the initial step. I’m in the middle of a rewrite that has a more plugin-based architecture (still python scripts, just loaded ad hoc as isolated modules in some given context) for each step in the process, and isn’t so fixed in its approach at the macro level. I’m just starting to explore that though so nothing really useful to report yet!


thank you, that’s impressive.

Thanks, maybe not really what you asked though… I think in general the takeaway I have so far with working on something meant to process some arbitrary database of field recordings, is that the initial segmentation and analysis stage is really useful.

Sounds like maybe you’re thinking of doing something norns-based? It could be useful to prepare that step first, and have the norns work with the annotated audio – sorting by different features is really fun, and you don’t have to read through it linearly – tracking a position in some feature (maybe spectral flatness / noisiness) back and forth along some curve is neat, and I’d bet the norns would have no trouble crunching through an sqlite3 database of feature annotations with plenty of headroom to spare for additional processing etc.
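As a toy illustration of what I mean by sorting and tracking a position along a curve – the flatness measure here is a single-number version of what librosa computes per frame, and none of this is the actual radiopost code:

```python
import numpy as np

def flatness(seg):
    """Geometric mean over arithmetic mean of the magnitude spectrum:
    close to 1 for noise, close to 0 for a pure tone. (librosa's
    spectral_flatness computes this per frame; this collapses a whole
    segment to one number.)"""
    mag = np.abs(np.fft.rfft(seg * np.hanning(len(seg)))) + 1e-12
    return np.exp(np.mean(np.log(mag))) / np.mean(mag)

def scrub(segments, curve):
    """Sort segments from noisiest to smoothest, then read them back
    following curve (a sequence of 0..1 positions): 0 picks the
    noisiest segment, 1 the smoothest."""
    ordered = sorted(segments, key=flatness, reverse=True)
    picks = [ordered[int(p * (len(ordered) - 1))] for p in curve]
    return np.concatenate(picks)
```

So e.g. `scrub(segments, np.sin(np.linspace(0, np.pi, 32)))` sweeps from noisy material to smooth and back again.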


Based on what I read in your post, it seems like you just started the documentation :wink:

But actually, though. Those are really great “liner notes” to put in a README somewhere in your repo.

Hmmm. What convolution implementation are you using? What frequency domain convolution techniques have you tried? Frequency and time domain convolution shouldn’t sound different.

You’re welcome!


Thank you! My interpretation is that by using these steps you apply some subjective aesthetic criteria?
My intended approach is to maximise the arbitrariness of the system…

Absolutely. The nature of designing a generative system is to make choices that necessarily have an aesthetic impact. But it’s up to you how much to consider the impact of the choices you make. Lack of intention is an interesting goal to chase, but in my opinion the thing you might be after – surprise – is more easily found in simple and deliberately designed systems with emergent properties.

Any decision you make will have an aesthetic impact whether you like it or not, so if considering the impact of those decisions feels like the wrong direction, maybe you could also try to eliminate making decisions as much as possible. A live microphone feed somewhere is the closest thing I could think of off the top of my head. You’re still left with choosing the microphone and its placement, which have an overwhelming impact on the aesthetic of the result, but maybe you could tap into the live streams on the locus sonus sound map. You’d still have to decide which stream to use and when, or maybe combine them (but again that’s getting further into the realm of intention) – in any case the feeds there are great! http://locusonus.org/soundmap/


I dropped this implementation in recently: https://www.nayuki.io/page/free-small-fft-in-multiple-languages

I’ll try to put together an A/B after work, I was surprised too!


Really really enjoying this release. All the more impressive considering the system described above!


Thank you @Oootini!

@PaulBatchelor I’m rendering the time domain part of the A/B now – I forgot to switch to my branch where it was parallelized with openmp so that part might not get posted until tomorrow. :upside_down_face:

I convolved these two samples:

https://www.dropbox.com/s/u0dme3hptt1s56x/mahler.wav?dl=0 with https://www.dropbox.com/s/s1i2p7uxszoutvw/esp.wav?dl=0

This is the freq domain output with the implementation linked above: https://www.dropbox.com/s/gehdo21nohttxz7/mahler-esp-fft.wav?dl=0

The time domain version is still rendering but I have an old render that uses one of the same sounds (the mahler with a different impulse) and I think you can hear the difference readily, especially in the middle, even though obviously the sounds are different too: https://www.dropbox.com/s/agrovz7jnx5a1kf/mahturler.wav?dl=0

So, maybe this is a bug in the time domain implementation? I mean it’s pretty simple to implement a FIR but also I’m dumb. On the other hand that sparkle was exactly what I remember loving about doing these long convolutions with soundhack back in the day, and that must have been FFT-based or it never would have been usable on those old PPCs!

I’d be curious if someone feels like replicating this experiment with the source sounds, to see if there is more high frequency content preserved with another FFT-based implementation.
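For anyone who wants to poke at this, here’s a minimal numpy sketch of the two approaches (not the pippi implementation – just the textbook versions): with exact arithmetic they compute the same thing, so any audible difference has to come from roundoff, a bug, or something else in the pipeline.

```python
import numpy as np

def fir_convolve(src, imp):
    """Naive time-domain FIR convolution: scale and add a copy of the
    impulse at every source sample. O(N * M)."""
    out = np.zeros(len(src) + len(imp) - 1)
    for i, s in enumerate(src):
        out[i:i + len(imp)] += s * imp
    return out

def fft_convolve(src, imp):
    """Frequency-domain convolution: multiply zero-padded spectra,
    then inverse transform. O(N log N)."""
    n = len(src) + len(imp) - 1
    return np.fft.irfft(np.fft.rfft(src, n) * np.fft.rfft(imp, n), n)
```

On random float64 test signals the two agree to within floating-point roundoff, which is well below anything audible.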

Edit: my foggy memory is recalling being able to select the window shape and other FFT params for the soundhack convolution – could the shape of the analysis window have this low-pass effect?

Edit edit: yes, here’s the old familiar soundhack dialog:


Reading the manual I’m actually not sure what the implementation was, though it’s somehow block-based, which at least sounds like the FFT overlap-add approach I’ve read about. “In this implementation of convolution the sound is processed block by block, with each block as large as the impulse response.”

You could also definitely select the window, but that was a window that would be applied to the entire impulse before the convolution, not the analysis window. And: “the Brighten box applies a simple +6dB per octave high-pass filter to the impulse.” So some of the sparkle I’m remembering from soundhack might have just come from selecting that option…

I’ll be curious to hear the naive render that’s still chugging along because I’m starting to feel crazy. It felt really noticeable to my ears after months of listening to naive renders and then doing my first tests with the FFT implementation though.


Do post the time domain version when you can, this is interesting. Did you post the algorithm for the time domain version?

I used an FFT convolution on the two samples, and got the same results you did.

I’m using laptop speakers at the moment, but in the time domain example you did post, it sounded like there was a very loud noise floor. Perhaps it’s from numerical truncation or roundoff? I wonder if some sort of surgical bitcrushing could be introduced somewhere in the FFT version to simulate it. It’s a very cozy sound, especially with the lo-fi source material.
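To put a rough number on that truncation idea, here’s a toy sketch (hypothetical, nothing from pippi or soundpipe) that quantizes a signal to a given bit depth and measures the resulting noise floor:

```python
import numpy as np

def bitcrush(x, bits):
    """Round samples in [-1, 1] to the nearest step of the given bit depth."""
    steps = 2 ** (bits - 1)
    return np.round(x * steps) / steps

def noise_floor_db(x, bits):
    """RMS level of the quantization error relative to the signal's RMS."""
    err = x - bitcrush(x, bits)
    return 20 * np.log10(np.sqrt(np.mean(err ** 2)) / np.sqrt(np.mean(x ** 2)))
```

For a near-full-scale sine this lands close to the textbook 6.02 × bits + 1.76 dB below the signal – around -50 dB at 8 bits, which is easily loud enough to hear as hiss.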


Thanks for your insights and the link – have you been to the space to observe the emergent properties? As I was reading your reply I thought to try to find patterns in long field recordings… My intention to build a generative system goes together with my research into the systems view of life (F. Capra, G. Bateson) from an ecological perspective.


I dusted off the openmp branch to do some more tests. (Keeping it out of the main branch for now since I couldn’t figure out where it can be assumed it will be available on macos, and if not the best way to recommend installing it…)

Here’s the python script running with the omp branch of pippi – the basic time domain implementation used is here.

from pippi import dsp, fx

sources = {
    'mahler': 'sounds/mahler.wav',
    'esp': 'sounds/esp.wav', 
    'wn11a': 'sounds/sources-20170-11.wav', 
    'wn11b': 'sounds/rocks-20170-11.wav',
    'wn15a': 'sounds/sources-20170-15.wav', 
    'wn15b': 'sounds/rocks-20170-15.wav',
}

pairs = [
    ('mahler', 'esp'), 
    ('wn11a', 'wn11b'),
    ('wn15a', 'wn15b'),
    ('mahler', 'wn11b'),
    ('esp', 'wn11b'),
]

for s, i in pairs:
    src = dsp.read(sources[s])
    imp = dsp.read(sources[i])

    print('FFT', s, i)
    out = src.convolve(imp)
    out.write('outputs/%s-%s-fft.wav' % (s, i))

    print('TD', s, i)
    out = fx.fir(src, imp)
    out.write('outputs/%s-%s-td.wav' % (s, i))

I enabled indexes on this directory and put all the source sounds and outputs here to make them easy to grab: https://hecanjog.com/conv/

So, incredibly to me, there is one outlier – the one single-threaded time domain render I did yesterday doesn’t have the noisy output! I haven’t tried the single threaded version with the other sources yet.

This is probably the most audible example of the distortion – I realized I had the original convolutions and all their inputs from the parts I used on this record so that made it easy to go back and find outputs I liked to test the inputs again with the FFT. All the outputs in the conv/outputs directory are fresh runs from today except the one single-threaded output from yesterday.

So this was one I liked a lot – the two input sounds are: https://hecanjog.com/conv/sounds/rocks-20170-15.wav and https://hecanjog.com/conv/sounds/sources-20170-15.wav

This is the parallelized time domain output with the nice distortion which kicks in about 6 seconds in: https://hecanjog.com/conv/outputs/wn15a-wn15b-td.wav

This is the (still nice but undistorted) FFT output: https://hecanjog.com/conv/outputs/wn15a-wn15b-fft.wav

I’m really scratching my head about the single-threaded time domain render without distortion. I didn’t parallelize my FIR for quite a while and I remember hearing that sparkly distortion in early renders too.

I haven’t really sat down to speculate where the noise is coming from or how the parallelized version could be exacerbating that but I just wanted to share these in the meantime!

I really want to investigate this further because I love the way the distortion/sparkle sounds. It would be really cool to be able to combine the speed of the FFT with an optionally distorted output similar to these, or find some way to recreate something like it!

Thanks for expanding on your idea a bit more, that sounds really cool! I’m not familiar with either of those people but I’d love to hear more about how you’re thinking about working with their ideas. Also very poetic to take an ecological perspective to working with field recordings. :slight_smile:

The type of emergent behavior I’m talking about is the complexity that happens when simple systems collide. The sort of complexity in the boids flocking algorithm is maybe the most famous example, or the complexity that emerges out of simple rules in cellular automata-based systems. But I’m really talking about very simple systems – like I think the most interesting part of that system I described above was in the final two “insert or replace” passes, which incorporated a simple type of feedback. Each recipe is maybe a dozen lines of code, just little transformations on the inputs, but when you keep feeding the output of the last recipe into the input of the next, and vary the insertion and overlap points, some really crazy structural stuff starts to happen.
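Here’s a deliberately tiny sketch of that feedback idea (toy recipes, not the actual ones from radiopost): two little transformations, with each pass overdubbing the processed running mix back onto itself at a shifting insertion point.

```python
import numpy as np

def chop(x, rng):
    """Stutter recipe: repeat a random slice three times in place."""
    n = len(x) // 8
    start = int(rng.integers(0, len(x) - n))
    reps = np.tile(x[start:start + n], 3)[:len(x) - start]
    out = x.copy()
    out[start:start + len(reps)] = reps
    return out

def soft_distort(x, rng):
    """Mild waveshaping recipe (rng unused, kept for a uniform signature)."""
    return np.tanh(x * 3.0) / np.tanh(3.0)

def overdub_recipes(x, recipes, rng, passes=4):
    """Run each recipe over the current mix and overdub the result back
    at a random offset -- every recipe sees the post-processed output
    of the last one, which is where the structure starts to emerge."""
    out = x.copy()
    for _ in range(passes):
        for recipe in recipes:
            processed = recipe(out, rng)
            start = int(rng.integers(0, len(out) // 2))
            seg = processed[:len(out) - start]
            out[start:start + len(seg)] += 0.5 * seg
    return out / np.max(np.abs(out))  # normalize
```

Each recipe is only a few lines, but after a few passes the stutters get re-stuttered and re-distorted, and larger-scale structure starts falling out of the feedback.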

The section starting around 22 minutes in on this album is an example of this kind of output. Basically all (I stepped in and did some further editing too) the structure and lightswitch cuts and that fun stuff is a result of feeding the layers back on themselves through those simple recipes.

Anyway, I would love to hear more about your project!

Edit: just looking at the wikipedia entry for Fritjof Capra and read:

Capra advocates that Western culture abandon conventional linear thought and the mechanistic views of Descartes. Critiquing the reductionistic Cartesian view that everything can be studied in parts to understand the whole, he encourages a holistic approach. In The Web of Life, Capra focuses on systemic information generated by the relationships among all parts as a significant additional factor in understanding the character of the whole, emphasizing the web-like structure of all systems and the interconnectedness of all parts.

It sounds like maybe we’re talking about / interested in the same or really similar things. I’m putting Belonging to the Universe on my reading list. Do you think that’s a good place to start?


wget -r --no-parent https://hecanjog.com/conv/

Will download all the sources & outputs from my testing locally into a hecanjog.com/conv/ directory FWIW!

(And like, these are all sampled from stuff, so just be cool and make some music and all that if you want to use any of it – I usually put my music in the public domain b/c it’s easy and woo, but yeah, YOU DECIDE. It’s plunderphonic!)

Thanks :slight_smile: I would start with The Web of Life or The Systems View of Life. It was The Web of Life that led me to the term “autopoiesis” (Maturana, Varela), which refers to a system capable of reproducing and maintaining itself. The Tree of Knowledge by Maturana and Varela is really amazing. A parallel exploration of our ecology as an autopoietic generative system and a similar system in sound and/or other art forms is what interests me. By studying the interconnections and interdependencies of these systems we could possibly reveal more questions about our ecology?