Kaizo Snare - Performance & Epic Blog Post

As part of the FluCoMa project (created by @tremblap (and helped along by @jamesbradbury93)) I created this performance for a festival towards the end of last year.

I think I posted the link to this a while back, so that’s not the release part of this. What is the “release” is the epic blog post that I’ve just finished, which details all the thinking and motivation behind it. Given the length (14.6k words(!!)), it’s more of a little book than a normal blog post, so I thought it might warrant a “release” category entry.

Here it is:

It covers a lot of ground, including the FluCoMa stuff (machine-learning and signal decomposition), as well as turntablism, my general approach to snare, onset detection, audio descriptors, 3d-printing, robotics, etc… I also get into some of the details of stuff that wasn’t included and where the overall idea will go next.


This is really great, and as an improvising percussionist, totally inspiring!



It was great getting into all the details of the playing techniques and approaches. And I’m super glad I filmed little tidbits and examples the whole time I was working on things so I could use them in the blog post.

I knew it was gonna be a chunky one, but I had no idea how absolutely massive it would get…

I’ve done fairly comprehensive blogs for each project I’ve done since finishing my PhD, but this is the single longest blog I’ve ever written.


This is incredible! Great work :clap::clap::clap:


This was great. Going to go back to it more thoroughly when not on a tiny screen, because it’s so rich, and I want to watch the videos.

Particularly enjoyed all the stuff on the materiality of faders (much of which is familiar…). Always like diving into things like that.


Hah, yeah.

It’s one of those things that seems reasonable as you’re doing things, but afterwards you realize how far down a rabbit hole you’ve gone. (not saying unreasonable-ness is bad, but just in terms of time and effort spent)

That saga continues on now, as I’m looking into contactless faders and want to figure out a way to have something with equally high resolution, but that feels nicer.


Wow, this is so great, I love the performance and the level of detail in the write up. I barely understand it but it’s so fun to get access to someone’s deep dive…



I totally love reading stuff like this, but sadly it isn’t something that many people do, or do in this way.


I should note that when I say “love diving into stuff like that”, I meant more your writeup! Watching somebody spend the time really getting to grips with that, and documenting it, and spending the time making it clear - it’s super-valuable, but it’s also a rare insight into a deep/long thought process, as opposed to design disappearing into “I decided to” or “X was the best”. Really good stuff.

Also: the gap between ‘feel’ and ‘what you get for free’ - like, lots of nice ‘feeling’ instruments are layers of code on top of the limitations of hardware, as you explored. I think optical faders are a good place to go for feel and precision - at that size of object, it’s not the ADC reading it that’s the limitation. Good luck!


I’m eyeballing one of the bigger Innofaders (the PNP2 perhaps) to see if the feel (capacitive/contactless) is worth the tradeoff in resolution (10- vs 14-bit). Having tried one of the smaller Innofaders on a friend’s proper mixing desk, it definitely feels good.

I wish I knew enough about the mechanical aspects of all of this to go full-tilt on a high-res and great-feeling DIY thing, but I’m kind of shooting in the dark testing out the options that are out there.


I guess this makes the most sense to post here as it’s another performance of the same “piece”.

But @Angela finally got around to bouncing out this performance from late last year. We had only just gotten our Blackmagic camera and fucked up the ISO settings, but we’ve since got a good plugin for denoising the footage, so it’s usable.

It’s interesting listening back to this as it was just a couple of weeks after the performance in the initial post here, so I was quite fresh with all of it.


I just saw this percussion detector/categorizer pop up and thought of you @Rodrigo


Oh interesting, thanks. And it was a NIME paper:


Ah very cool.

From the video it looks/sounds quite accurate. Giving the paper a quick read-through, it’s doing something similar to what I’m doing above (slightly different descriptors), in this case using a classifier instead of a generic Euclidean distance. I do like the inclusion of zero-crossing rate though, as that’s not something I’ve messed with very much.
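For context, the “generic Euclidean distance” approach mentioned here amounts to a nearest-neighbour lookup over descriptor vectors. A minimal sketch (the 2D vectors, labels, and values below are purely illustrative toy data, not the actual feature set):

```python
import math

def euclidean(a, b):
    # straight-line distance between two descriptor vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, labelled):
    # labelled: list of (descriptor_vector, label) pairs;
    # return the label of the closest stored example
    return min(labelled, key=lambda item: euclidean(query, item[0]))[1]

# toy (loudness, centroid) examples, purely illustrative
examples = [((0.9, 0.2), "rim"), ((0.3, 0.8), "brush"), ((0.7, 0.5), "stick")]
print(nearest((0.8, 0.3), examples))  # -> rim (closest toy example)
```

A trained classifier replaces this raw distance lookup with learned decision boundaries, which can weight the dimensions that actually discriminate between hits.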

The 50ms time frame, however, is crazy latency for realtime use. The paper does say this:

Although a 50 millisecond analysis latency is noticeably late in musical contexts, the authors are developing a predictive music generation algorithm which will correct for this and other delays introduced by the robotic system.

So I’m guessing their workaround for that will be to predict when the next attack might be based on pattern analysis, which is great for beat-based music, but not necessarily others.

Either way, 50ms is super luxurious! As a point of reference, I’m generally working on a 256 sample analysis window at the moment (half of what it was for Kaizo Snare) and that’s like 5.8ms(!!).
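The arithmetic behind those numbers is simple, assuming a 44.1 kHz sample rate (my assumption, implied by 256 samples ≈ 5.8ms; the window sizes below for the paper’s 50ms and the old Kaizo Snare setting are back-calculated, not quoted):

```python
# Analysis latency of a window, assuming a 44.1 kHz sample rate.
SR = 44100

def window_latency_ms(samples):
    return samples * 1000 / SR

# ~50 ms (the paper), 512 (Kaizo Snare era), 256 (current)
for n in (2205, 512, 256):
    print(f"{n:5d} samples -> {window_latency_ms(n):.1f} ms")
```

So halving the window from 512 to 256 samples drops the analysis latency from roughly 11.6ms to roughly 5.8ms.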

Obviously that’s fucking tiny, and there are loads of other tradeoffs (e.g. pitch and low frequencies in general are kind of shit), but getting quite promising results.

I’ve gone back to square one on my feature set, but I’m trying to come up with an aggregate descriptor space that will (hopefully) accurately capture the differences in sounds I’m interested in (mainly prepared snare drum):

On the right you see each individual subspace, and how it maps out on a 3d plot via UMAP reduction (just for visualization).

Each individual subspace is as follows:

  • Loudness (4D) - mean, std, min, max → robust scale
  • Timbre (4D) - loudness-weighted 20(19) MFCCs: mean, std, min, max → standardize → 4D UMAP → robust scale
  • Envelope (4D) - deriv of loudness mean, deriv of loudness std, deriv of loudness-weighted centroid mean, deriv of loudness-weighted rolloff mean → robust scale
  • Pitch (2D) - confidence-weighted median, raw confidence → robust scale
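As a rough sketch of the “mean, std, min, max → robust scale” pattern running through these subspaces (a Python stand-in for what is presumably a FluCoMa patch; `summarize` and `robust_scale` are my hypothetical names, and robust scaling here means centering on the median and dividing by the interquartile range):

```python
import statistics

def summarize(frames):
    # 4D summary of one onset's per-frame values: mean, std, min, max
    return [statistics.mean(frames), statistics.pstdev(frames),
            min(frames), max(frames)]

def robust_scale(column):
    # (x - median) / IQR for one dimension across all onsets,
    # so outlier hits don't dominate the scaling
    s = sorted(column)
    med = statistics.median(s)
    q1 = statistics.median(s[: len(s) // 2])   # lower half
    q3 = statistics.median(s[-(len(s) // 2):])  # upper half
    iqr = (q3 - q1) or 1.0  # guard against a zero IQR
    return [(x - med) / iqr for x in column]
```

Each onset would get summarized into a small vector per subspace, and each dimension would then be robust-scaled across the whole corpus before the subspaces are concatenated.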

This works alright, but the timbre one isn’t ideal, and I still need to refine which statistics I take. But it’s slowly getting there…


you always have such interesting things going on

and you document them in such a meaningful way

thank you for sharing so much throughout the years :slight_smile: