dowser (low-level spectral analysis utility)

thought i would post this small command-line tool for spectral analysis. it’s quite basic and low level, but it is the kind of thing that i often find useful in computer music practice, and it’s nice to have a portable, self-contained, scriptable and efficient implementation.

the tool accepts a soundfile, performs a short-term fourier transform, and emits the peaks of the power spectrum for each analysis frame, alongside some statistical measures.


usage, examples

(from readme):

dowser <infile> <outfile>

output file is a supercollider script, defining a list of dictionaries.

each dictionary contains data for a single spectral analysis frame.

per-frame measurements include:

  • peaks: list of events with (hz, mag) keys, listing spectral peaks for the given frame
  • papr: peak-to-average-power ratio, a measure of “tonal-ness”
  • flatness: AKA wiener entropy, geometric mean / arithmetic mean of the spectrum. another measure of “tonal-ness”, but an inverse one (lower values are more tonal)
  • centroid: spectral centroid, a measure of “brightness”
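for anyone unfamiliar with these measures, here is a minimal python sketch of the textbook definitions, computed from one frame's power spectrum. it is not dowser's code: the function name `frame_measures` and the toy spectra are made up for illustration, and dowser's own scaling and units may well differ.

```python
import math

def frame_measures(power, sample_rate, fft_size):
    """Textbook papr, flatness and centroid for one frame's power spectrum.

    `power` is a list of non-negative bin powers (bin 0 first).
    """
    eps = 1e-12  # guards against log(0) and division by zero
    n = len(power)
    total = sum(power)
    mean_power = total / n
    # peak-to-average power ratio: large when a single bin dominates
    papr = max(power) / (mean_power + eps)
    # flatness: geometric mean / arithmetic mean, close to 1 for noise
    geo_mean = math.exp(sum(math.log(p + eps) for p in power) / n)
    flatness = geo_mean / (mean_power + eps)
    # centroid: power-weighted mean frequency in hz
    bin_hz = sample_rate / fft_size
    centroid = sum(i * bin_hz * p for i, p in enumerate(power)) / (total + eps)
    return {"papr": papr, "flatness": flatness, "centroid": centroid}

# toy spectra: one dominant bin ("tonal") versus a flat spectrum ("noisy")
tonal = [0.001] * 64
tonal[10] = 1.0
noisy = [0.5] * 64

t = frame_measures(tonal, 48000, 128)
z = frame_measures(noisy, 48000, 128)
print(t)
print(z)
```

the tonal frame comes out with a high papr and a low flatness, and the flat frame with a papr and flatness both close to 1.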

example

starting with this short loop of electric viola:
[ http://catfact.net/dowser/loop2.mp3 ]

dowser produces this fairly large (286KB) supercollider script describing the analysis data:
[ http://catfact.net/dowser/dowser-output-loop2.scd.txt ]

which can then be processed in whatever way you find useful. the following is not very well-considered as a musical algorithm - it is sort of a naive and basic resynthesis of the spectral peaks as sine waves. but it gives a good idea of the nature of this data. (and its limitations! e.g. the relative noisiness of low frequencies)

script:

supercollider script
// load the data
(
post("reading data... ");
~data = this.executeFile(PathName(Document.current.path).pathOnly++"dowser-output-loop2.scd");
postln("done.");

AppClock.sched(0, {
	~data.collect({arg frame; frame.papr}).histo.plot;
	~data.collect({arg frame; frame.flatness}).histo.plot;
	~data.collect({arg frame; frame.peaks.collect({ arg peak; peak[\hz].cpsmidi})}).flatten.histo.plot;
	nil
});
)

// play some tones
(
s = Server.default;
s.boot;
s.waitForBoot {
	r = Routine {

		~frame_stretch = 4;
		/// magic numbers here:
		// 2**13 = fft size
		// 2 = overlap factor, so the hop between frames is 2**13 / 2 = 2**12 samples
		// 48k = original samplerate of analyzed file
		~frame_period = (2**12) / 48000.0;
		~frame_period = ~frame_period * ~frame_stretch;
		~frame_period.postln;

		b = Bus.audio(s, 2);


		SynthDef.new(\sine_1shot, {
			arg out=0, amp=0, hz=110, pan=0, atk=1, sus=0, rel=2;
			var snd, env;
			env = EnvGen.ar(Env.linen(atk, sus, rel), doneAction:2);
			snd = SinOsc.ar(hz) * amp * env;
			Out.ar(out, Pan2.ar(snd, pan));
		}).send(s);

		~out_limit = {
			Out.ar(0, Limiter.ar(In.ar(b, 2), 0.9, 0.2).clip(-1, 1))
		}.play(s);


		s.sync;

		~papr_min = 20;
		~flat_max = 20;
		~mag_min = 1;
		~hz_max = 3000;
		~max_peaks_per_frame = 3;

		~data.do({
			arg frame;
			var flat, papr;
			flat = frame[\flatness];
			papr = frame[\papr];
			if ((flat < ~flat_max) && (papr > ~papr_min), {
				frame[\peaks].do({
					arg peak, i;
					var hz, mag, sineIdx, amp;

					//peak.postln;
					hz = peak[\hz];
					mag = peak[\mag];

					if (i < ~max_peaks_per_frame, {
						if ((mag > ~mag_min) && (hz < ~hz_max), {
							var db;
							amp = (mag / 64);
							db = amp.ampdb;
							postln([hz, db]);

							Synth.new(\sine_1shot, [
								\out, b,
								\hz, hz,
								\amp, amp,
								\pan, i.linlin(0, ~max_peaks_per_frame, 0, 1).rand2,
								\atk, ~frame_period * 4,
								// the synthdef has no \dur argument (it would be silently ignored); \sus sets the sustain time
								\sus, ~frame_period * db.linlin(-60, 0, 4, 16),
								\rel, ~frame_period * db.linlin(-60, 0, 4, 32)

							], s, \addToHead);
						});
					});
				});
			});

			~frame_period.wait;
		});

	}.play;
}
)

(note that this must be run in two sections.)
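to spell out the timing arithmetic behind the "magic numbers" in the script, here is a small python sketch. the values are the ones given in the script's comments; the variable names are just for illustration.

```python
fft_size = 2 ** 13      # analysis fft size, per the script's comments
overlap = 2             # overlap factor
sample_rate = 48000.0   # samplerate of the analyzed file
frame_stretch = 4       # time-stretch factor used by the script

hop_size = fft_size // overlap                   # samples between analysis frames
frame_period = hop_size / sample_rate            # seconds between frames in the original
stretched_period = frame_period * frame_stretch  # seconds between frames on playback
bin_width_hz = sample_rate / fft_size            # frequency spacing of the fft bins

print(hop_size)                    # 4096
print(round(frame_period, 4))      # 0.0853
print(round(stretched_period, 4))  # 0.3413
print(bin_width_hz)                # 5.859375
```

the bin spacing of roughly 5.9 hz is close to the size of a semitone around 110 hz (about 6.5 hz), which is presumably part of why the low frequencies come out noisy.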

resulting audio:
[ http://catfact.net/dowser/loop2-resynth.mp3 ]

[warning: woops, i “mastered” the original loop rather quietly so the resynthesis is quite loud by comparison]


where to get it

github : https://github.com/catfact/dowser

i have not created any binary release packages and am not particularly planning to, because frankly it has become painful to do this for macOS and windows. if there is interest i could make source packages for e.g. homebrew and chocolatey.


roadmap

the readme mentions a few more wish-haves. but the most pressing would be:

  • support for other output formats. (.csv and .npy come to mind.)
  • expose more tuning parameters (especially frequency range.)

happy to accept contributions. i made the app using JUCE and cmake, not because either is necessary (it is a simple project), but to facilitate cross-platform development (have tested on macOS, windows 10 and ubuntu) and the potential addition of higher-level features.

and, please use this topic for questions, ideas etc!


if you’re still willing to do this i’d like to try building the app

hey @glia i can certainly look into doing that. but the steps to build on mac are not too hard anyway… just checked that this is working for me on a fresh system running macos 10.15.5 catalina.

first install cmake

brew install cmake

then in the dowser repo:

git submodule update --init --recursive

(this pulls a local copy of JUCE which is stupidly oversized for the task sorry about that)

and build

mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build .

fresh executable then located at <dowser_repo>/build/dowser_artefacts/Release/dowser

(making me think i should ditch JUCE for this. the main reason to keep it would be to build out UI niceties like a drag-and-drop window.)


Sorry I meant the location where you cloned the repo

Sorry I’m on my phone and someone is climbing my hair, I cannot perform a full illustration


thanks for the additional tips to get this running on macos

my attempt failed while trying to get juce properly set up…i get this error in terminal:

git@github.com: Permission denied (publickey).

fatal: Could not read from remote repository.

hoping you may have a suggestion to help but no rush at all…
i’ll try to troubleshoot myself and can hopefully find a solution in the meantime

i’m pretty sure this is from using an ssh url for the subrepo? i think you can fix it on either end by switching out the url in the .gitmodules file with the regular https:// url of the subrepo

((also on a phone here))
((the culprit i think → dowser/.gitmodules at 146253046ede5964118c1c15e7efeda1b0ff054c · catfact/dowser · GitHub))

that should work and the https path would be https://github.com/juce-framework/JUCE.git (note the change from : to /)

but i would highly recommend getting set up to use ssh with github. (password auth over https is being phased out, for good reason; you will need ssh keys or an access token to push to projects.)


excellent excellent advice

i got my ssh key sorted out and just built dowser following your instructions!
now…perhaps my last questions since i have the app built:

do i need to change directory to where my infile is stored? or specify the path? and then name the .scd.txt outfile?

are there any commands for dowser or it should be one step as shown above?

Give it the path to input file, relative to current working directory. Give path/name for output file or accept the default. No other options


don’t know if i’m missing something, but whenever i run Dowser, i get a segfault. has anyone run into this? it runs up until the performing output... step before it segfaults.

ed: i figured out that it was this conditional causing it for some reason. i commented it out and it was fine (i have no idea how JUCE works).

additionally, i was getting consistent segfaults here too – but then i discovered that it only segfaults on files that were generated with my DAW (no clue why – all the files i tried were the same samplerate.) again, i have no idea how JUCE works

very cool tool! thank you!


woop - sorry, some WIP made it into main and was a little broken.

pushed some changes and bumped version to 0.2.0.

first crash should be fixed. (initial pointer value needs to be explicitly set to nullptr on some platforms.)

second crash yeah sounds like juce didn’t like your audio files. not sure why without more info.

additionally i cleaned up some issues related to new/improved peak picking algo (using homology persistence instead of amplitude) and more output formats. (which could need more tests, and a binary format would also be nice for large soundfiles.)

note that the new peak tracking algo means that the output format and units have changed. peaks are now ranked by their homology persistence, and the strongest peak’s persistence is always inf within each frame. (the second-most persistent peak has normalized persistence value =1.0.)

finally, i’ve added program arguments for frequency bin range, amplitude threshold, and max peaks per analysis frame (see the readme.)

if i have time i’ll rewrite this thing without juce (and with sndfile/fftw instead.) it’s a silly dependency here.

this is a pretty down and dirty little project but feel free to open issues on github with the usual caveats (repro steps etc.) that’s easier than tracking issues here. if you can run in a debugger that’s always helpful. please do include platform information as a minimum.


sorry for the silly question, but for the unfamiliar how does this correlate to the input sound? from a cursory wikipedia dive i am a little lost


thank you for the new changes! i’m excited about revisiting later tonight


sorry - no, it’s not a silly question.


first there is a pragmatic and banal reason for the change: the old peak picking module was not good and in some circumstances would report peaks that were just objectively wrong (plateaus and barely-peaks.) it was wrong enough that i kept noticing stuff i didn’t like in the output. peak picking is a surprisingly tricky little problem as our human sense of “significance” in structure is actually kind of sophisticated.

so when simply looking at magnitudes and positions, you end up with all kinds of ad-hoc rules involving thresholds and window sizes and hysteresis, which are still liable to fail on edge cases and to have bugs.

whereas using homology persistence (call it HP) involves a dead simple algorithm that maps to our intuitions about significance pretty naturally and easily. but primarily it is about correctness and efficiency in the search process.


but more subtly, why might the HP metric be “better” than a sort by magnitude? well…

first let’s recap how HP works:

the metric gets its formal definition from topology, but the concept has a simple intuitive meaning:

  • imagine the magnitude spectrum for a given analysis frame as a series of hills and valleys.

  • imagine this “landscape” entirely submerged under water.

  • now imagine the water level gradually being lowered. the peaks in the landscape emerge from the water as the summits of “islands.”

  • sometimes, one island merges into a bigger one. when that happens, the smaller island’s persistence is recorded as the vertical distance (change in water level) from where it emerged from the water, to where it was subsumed by its larger neighbor.

  • the “highest” or “most significant” peak is the last one left un-merged when all the water is drained.
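the flooding picture above can be sketched in a few lines of python. this is not dowser's actual code, just a minimal illustration assuming a frame is a plain list of bin magnitudes; the name `peak_persistence` and the toy data are made up here, and the values are raw rather than normalized the way dowser's output is.

```python
import math

def peak_persistence(mags):
    """Return {peak_index: persistence} for a 1-d list of magnitudes."""
    n = len(mags)
    parent = [None] * n  # None means "this bin is still under water"
    persistence = {}

    def find(i):
        # follow parent links to the summit of the island containing bin i
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # lower the water level: visit bins from highest to lowest
    for i in sorted(range(n), key=lambda k: mags[k], reverse=True):
        parent[i] = i  # bin i emerges, provisionally as its own island
        summits = {find(j) for j in (i - 1, i + 1)
                   if 0 <= j < n and parent[j] is not None}
        if len(summits) == 1:
            # bin i is on the slope of an existing island
            parent[i] = summits.pop()
        elif len(summits) == 2:
            # bin i is the valley where two islands merge:
            # the island with the lower summit is subsumed
            low, high = sorted(summits, key=lambda k: mags[k])
            persistence[low] = mags[low] - mags[i]
            parent[low] = high
            parent[i] = high
    if n:
        # the last island standing never merges into anything
        persistence[find(0)] = math.inf
    return persistence

# toy landscape with summits at index 1 (5), index 3 (4) and index 5 (2)
mags = [0, 5, 3, 4, 0.5, 2, 0]
p = peak_persistence(mags)
print(p)
```

in the toy landscape, the summit at index 3 is taller than the one at index 5, but it sits on the shoulder of the tallest peak and is subsumed after the water drops by only 1, so its persistence (1) is lower than that of the smaller but better isolated summit at index 5 (1.5).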

so how does this apply to audio/music analysis? the basic advantage of the persistence metric is that it can help filter out “false” peaks in favor of those that are more spaced out (and likely to be part of a harmonic series etc.)

here’s a contrived example, but relevant. we have a signal containing three things:

  • bandlimited sawtooth, with steadily decaying harmonic partials
  • a single sinewave close to the saw’s fundamental
  • noise floor

(the sine is standing in here for any kind of transitory and spurious isolated frequency component that we would regard as “noise” in our peaks.)

(supercollider)

SynthDef.new(\false_peak_demo, {
	arg out=0, sawHz=110, sineHz=116, noiseAmp=0.1, sawAmp=0.5, sineAmp=0.12;
	Out.ar(out, ((Saw.ar(sawHz)*sawAmp) + (SinOsc.ar(sineHz)*sineAmp) + (WhiteNoise.ar * noiseAmp)).dup);
}).send(s);

taking the magnitude spectrum with 8192 FFT points, and plotting the first 100 bins:

(matlab/octave)

x = audioread("/Users/emb/Desktop/peaks_test.wav");
x = x(:,1);

nfft = 8192;

X = fft(x, nfft);
magX = abs(X);

plot(magX(1:100))

you see the isolated sine as a peak (call it C) sticking out of the right side of the biggest/leftmost peak, which corresponds to the saw fundamental (call this A). to the right of that and well isolated is the next partial of the saw, its second harmonic (call this B)

in many contexts, C is less interesting to us than B even though it has a higher magnitude. however it has a lower persistence value since it is “absorbed” by the bigger peak to its left.

this is a contrived example but in practice i’ve found it to be typical of the kind of case where i prefer the HP measure to a magnitude measure (or even a local relative magnitude, which is affected by arbitrary things like size of the “locality window” or whatever.)

still, worth noting: the same peaks will be found either way, given that you’ve set the maxPeaks parameter big enough to catch everything you’re interested in. both magnitude and persistence are stored in each peak data structure in the output, so you can sort peaks in each frame any way you want.
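as a rough illustration of re-sorting, here is a python sketch with one hypothetical frame's peaks. the key names (`hz`, `mag`, `persistence`) and all the numbers are assumptions made up for the example, loosely modeled on the A/B/C case above; check the actual output of your dowser version for the real field names.

```python
import math

# hypothetical peaks for one frame (values are illustrative, not measured)
peaks = [
    {"hz": 110.0, "mag": 40.0, "persistence": math.inf},  # A: saw fundamental
    {"hz": 116.0, "mag": 14.0, "persistence": 0.2},       # C: stray sine
    {"hz": 220.0, "mag": 10.0, "persistence": 1.0},       # B: next saw partial
]

# same peaks, two different rankings
by_mag = sorted(peaks, key=lambda p: p["mag"], reverse=True)
by_persistence = sorted(peaks, key=lambda p: p["persistence"], reverse=True)

print([p["hz"] for p in by_mag])          # [110.0, 116.0, 220.0]
print([p["hz"] for p in by_persistence])  # [110.0, 220.0, 116.0]
```

with these made-up numbers, sorting by magnitude ranks C ahead of B, while sorting by persistence ranks B ahead of C.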

hope that helps.
