Source Separation

There’s a new source separation Python library available called Spleeter and it actually works surprisingly well.

Source separation means taking a mix and splitting it into individual stems (e.g. a vocal track and an instrumental track). Spleeter uses TensorFlow, an open-source machine learning platform with a Python API, to train and run neural networks that perform source separation.

I’ve been playing around with it for a day or so and I’m pretty impressed with the results, especially compared with more primitive methods. Some songs separate better than others, and I’m not sure what drives those differences yet. I made some pretty good acapellas from songs that previously had no isolated vocals available. The project ships a few different pre-trained models, but you can train your own models if you have enough data.

Anyone else trying it out?


I downloaded it and ran it once with the pre-trained model. Haven’t thought much about how I’d use it though… Here’s a blog post (not mine) with several examples if anyone wants to hear some outputs from Spleeter:

Part of why some songs separate better than others may have to do with the time-frequency overlap between sources in a given song. Spleeter estimates a time-frequency mask which determines what fraction of each “bin” can be attributed to each source. If most bins are dominated by a single source, the separation should go pretty well. If there is a lot of overlap, it will likely be worse. Spleeter also synthesizes the output using the mixture phase, which is another source of artifacts (even if an oracle told us the “perfect” mask, the borrowed phase would still introduce errors).
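
To make the masking idea a bit more concrete, here’s a toy sketch in plain Python. It is not Spleeter’s actual code: the soft ratio mask formula, the two-bin “spectrogram” and all the names (`ratio_masks`, `apply_mask`, the example values) are my own assumptions for illustration. It gives the mask oracle knowledge of the true source magnitudes, then rebuilds the vocal from the mixture, so the only things that can go wrong are overlap and the borrowed mixture phase.

```python
import cmath

EPS = 1e-10  # avoids division by zero in silent bins


def ratio_masks(source_mags):
    """Soft masks: each source's share of the total power in every bin.

    This is a generic ratio mask chosen for illustration, not necessarily
    the exact formula Spleeter uses.
    """
    n_bins = len(source_mags[0])
    totals = [sum(m[k] ** 2 for m in source_mags) + EPS for k in range(n_bins)]
    return [[m[k] ** 2 / totals[k] for k in range(n_bins)] for m in source_mags]


def apply_mask(mask, mixture_bins):
    """Scale each complex mixture bin by a real mask value.

    The magnitude is scaled, but the phase is still the mixture's phase.
    """
    return [w * x for w, x in zip(mask, mixture_bins)]


# Two hypothetical time-frequency bins (magnitude, phase in radians).
# Bin 0: the vocal dominates. Bin 1: both sources are equally loud.
vocals = [cmath.rect(1.0, 0.0), cmath.rect(1.0, 0.0)]
accomp = [cmath.rect(0.01, 2.0), cmath.rect(1.0, 2.0)]
mixture = [v + a for v, a in zip(vocals, accomp)]

# "Oracle" masks built from the true magnitudes of each source.
masks = ratio_masks([[abs(v) for v in vocals], [abs(a) for a in accomp]])
vocal_estimate = apply_mask(masks[0], mixture)

# Per-bin distance between the estimated and the true vocal.
errors = [abs(est - true) for est, true in zip(vocal_estimate, vocals)]
print(errors)
```

The error in the second bin comes out far larger than in the first, even though the mask was built from perfect magnitude information. That’s the overlap-plus-mixture-phase problem in miniature.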


I’m interested to hear what happens if you apply this to something that isn’t a musical mix, something like field recordings or complex synthesizer tones. Or… what if you take eight songs and mix them together and then try to separate individual elements from that? Or how about non-audio signals hacked to look like audio files…

Many of the worries people have about the social consequences of (overuse and poorly thought-out applications of) machine learning come down to the fact that these systems aren’t really capable of saying “I have no idea”. Maybe there’s something of artistic interest to be found in that.