there is a PR open to have audio.file_info use absolute path instead. will merge and release update within a few days, and also address fixes in mlr etc
Is there an annotated version of the github softcut syntax somewhere? I’m trying to diagnose some distortion and sizzle, and I’m having a hard time understanding the gain staging in the signal path, which parameters to adjust, and by how much.
annotated? not really
i’m going to do a deep test/fix session on the gain staging and meters. probably tonight…
oh, one thing to note is that inputs to each softcut voice are softclipped before writing to the buffer. so the input gain matrix for each voice is probably the first thing to check.
there are some softcut changes ready for the next update (which will be released tomorrow or monday)
are you writing a new script or diagnosing an existing script?
to make sure you’ve seen the not-so-annotated: https://monome.github.io/norns/doc/modules/softcut.html
New script with multi voice playback; I’m getting occasional sizzles which i think are happening when voices stack, or stack with external input (@zebra pointing the way above).
I will experiment more after the update!
First: @zebra, thank you for creating this… system? library? powerful magic? I’ve had to readjust my worldview somewhat to accept that the sort of layered varispeed looper I’d always wanted, searched for for years, never found, and gave up on ages ago, is actually well within reach now.
I have one thing I want to confirm, though, and a couple questions that I hope I haven’t overlooked answers to elsewhere. Background: the script I’m working on lets me switch a voice’s rate from positive to negative at the press of a button.
Thing to confirm: if I set loop points for a voice, record to that looped buffer section, and then change the voice’s rate from 1 to -1, I seem to also need to move the loop start & end point forward by the voice’s fade time (i.e. if fade time is 0.04, then a start point of 1 becomes a start point of 1.04), otherwise playback drops out for a moment when it reaches the loop start point. That’s because when rate is negative, Softcut plays a bit of the buffer “before” the loop start while crossfading to the loop end point. Right? (I find it really hard to phrase this stuff in a way that makes sense – this is one of those situations where special verb tenses for time travel might be useful.)
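For anyone else hitting this, here’s a minimal sketch of the workaround described above. The voice number, fade time, and loop points are just illustrative placeholders:

```lua
local FADE = 0.04

-- shift the loop window forward by the fade time when reversing,
-- so the crossfade region "before" loop_start lands on recorded material
function reverse_voice(v, loop_start, loop_end)
  softcut.fade_time(v, FADE)
  softcut.rate(v, -1)
  softcut.loop_start(v, loop_start + FADE)
  softcut.loop_end(v, loop_end + FADE)
end

reverse_voice(1, 1, 3)
```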
rec_offset. I don’t think I’ve seen an explanation of or application for this parameter. This is the distance between the read and write “heads,” correct? And it’s non-zero by default, presumably because it has to be. So if voices continually read from one point in a buffer and then write that data back (when pre_level = 1.0) at a slightly different point, doesn’t that introduce some small degree of inaccuracy in loop lengths? (Again, my linear-time brain may be failing me here.)
Question 2: Recording while a voice’s rate crosses zero.
Is this a feature that needs to be enabled per voice, or is it always on? I’ve noticed that playing back audio recorded while a voice is switching direction seems to result in a click at the point where the rate hits 0, like you described here:
but the LPF-based solution you offered in that thread didn’t seem to help that much. The click became less nasty, because it was being filtered, but it didn’t go away. For now I’ve gotten rid of the click by “manually” (in Lua, using a metro) setting rec_level to 0, then changing the voice rate, then resuming recording after the Ensuing Slew. If there’s a more “built-in” way to do this, I’d love to know about it. If not, no big deal – I really enjoy how low-level Softcut is in general, that’s what makes it so useful and cool.
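In case it helps anyone, here’s roughly what my “manual” declick looks like (a sketch, not verbatim from my script — the `SLEW` value matching the voice’s `recpre_slew_time` is an assumption):

```lua
local SLEW = 0.05  -- assumed to match softcut.recpre_slew_time for this voice

-- mute recording, flip the rate once the rec level has slewed to zero,
-- then re-arm recording after another slew period
function flip_rate(v, new_rate)
  softcut.rec_level(v, 0)
  local m = metro.init()
  m.event = function(stage)
    if stage == 1 then
      softcut.rate(v, new_rate)  -- cross zero while not recording
    else
      softcut.rec_level(v, 1)    -- resume recording after the Ensuing Slew
      metro.free(m.id)
    end
  end
  m:start(SLEW, 2)
end
```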
hey, thanks for the close look and the clear/specific feedback! very helpful.
these things all need a little work and tightening up. i just need a push to get it done.
right. when rate > 0, you want some post-roll material in the buffer after loop end. when rate < 0, you want pre-roll before loop start. so it’s probably good practice to leave both when loading samples. you need enough pre/post to accommodate the fade times you want.
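As a concrete example of leaving pre/post-roll (the file path, sample length, and fade time here are placeholders, not anything canonical):

```lua
local FADE = 0.04
local SAMPLE_LEN = 2.0  -- placeholder: length of the loaded file in seconds

-- load the whole file at buffer position 0, then keep the loop window
-- inside the sample: material before loop_start serves as pre-roll
-- (for rate < 0) and material after loop_end as post-roll (for rate > 0)
softcut.buffer_read_mono("/home/we/dust/audio/loop.wav", 0, 0, -1, 1, 1)
softcut.fade_time(1, FADE)
softcut.loop_start(1, FADE)
softcut.loop_end(1, SAMPLE_LEN - FADE)
```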
yes. really, this probably doesn’t need to be a parameter at all. it just needs to be long enough that the read and write interpolation windows don’t overlap. 8 samples is sufficient. offset sign is flipped automatically when rate goes negative.
if voices continually read from one point in a buffer and then write that data back (when pre_level = 1.0) at a slightly different point, doesn’t that introduce some small degree of inaccuracy in loop lengths?
mm… i don’t think so.
pre_level isn’t implemented by feedback through the read head. it’s implemented in the write calculation itself. feedback through the read head can be done with the voice matrix mixer. if you use that to implement regenerating delay with one voice then… yea i guess the delay time might be off by 8 samples from what you’d expect.
hey actually, i forgot - this was removed at some point. the configuration wasn’t quite right, it wasn’t doing its job, and it was eating cycles. i should fix it and bring it back. this shouldn’t be hard.
two things: 1) over the last couple of updates, the default pre-filter settings were changed so that the path is completely dry. this was because we had complaints that the softcut output tone didn’t exactly match the input, and particularly that some phase distortion was detectable on very low frequencies. this is an expected outcome of the filtering. so now the LPF, dry, and modulation settings have to be explicitly set in the script.
and 2) you’re right that the filter modulation is only a partial mitigation of the clicks encountered when crossing rate == 0. firstly, it’s not a perfect brickwall. secondly, although rate is updated per sample (with slew), the filter is only updated per block. that is because the filter coefficient computations include `tan` and so are a little expensive. for the `softcut~` max external i changed the filter update to per-sample. for norns i could do this too, but i’m concerned about the CPU hit. to mitigate that i would try to find a decent approximation of `tan` for the coefficient calculation. i did this for the implementation of a similar filter structure on aleph; it works fine but has the drawback of limiting the useful range of cutoff frequency - and also (importantly for this application, maybe?) low cutoff frequencies are less accurate.
finally, there is another de-clicking feature that i implemented in a branch but didn’t merge to master because it carries a significant performance hit. in this feature, each voice (call it `V`) tracks the position of the record head of each other voice (call it `W`). proximity of the `W` record head to the `V` play head causes `V`’s playback volume to duck.
to me, this is a simple and acceptable solution to clicks introduced when voices read/write overlapping buffer regions at different speeds (better to my ears than the ‘switch-and-ramp’ integrator approach). but it requires each voice to be computed per sample, rather than per block (as is done now). this introduces more cache misses and doesn’t allow the compiler to vectorize inner loops.
so in short, i’d like to button up all these things better. but to make it really super clean, i think i’d want to drop the voice count to 4. (or 5, but that seems weird.) (alternatively, could keep voice count at 6 but limit the number of simultaneous voices that can be writing - write is substantially more expensive.)
maybe this thread is a good place to ask: would anyone not think that cutting back to 4 voices is an acceptable tradeoff for dang-near perfectly clickless audio in all conditions? (of course it’s still possible to deliberately create ‘clickful’ configurations if you want that.)
4 voices of near-clickless audio sounds amazing to me.
I kinda wanted 5 or 7 for a thing someday but they don’t all need to be recording at the same time.
would disabling voices you don’t need make more CPU room for the accurate stuff? maybe voices 4+ could just be disabled by default?
This is how I feel about it too. The closest thing has been the Octatrack, but softcut requires so much less ceremony and offers superior routing options and buffer volatility at just the right level of abstraction.
yeah, simply disabling recording on a voice is enough to cut its CPU footprint by >50%.
with the max external of course, the voice is the abstraction; simply instance as many as you want.
(at the same time it’s harder to see how one would implement the cross-voice ducking feature in the max environment, if you wanted multiple voice instances working with the same buffer region.)
but yeah, for norns it seems reasonable to have `N` voices that record and `M` voices that don’t. maybe `N` = `M` = 4 is possible.
anyways, thanks for the feedback and the push. very gratifying to hear that the work is useful. i’m going to block out some time next weekend for profiling and (re-)implementation of some of these things.
also, happy to help out if anyone else feels interested in working on the codebase.
Ah, that’s what I was missing. I think I got carried away by the tape metaphor. I see the write operation multiplies the existing sample value by the pre amount, then adds the input value. Makes much more sense.
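In other words (conceptually — this is a paraphrase of the behavior, not the actual softcut source):

```lua
-- per-sample record operation, conceptually: the old buffer content is
-- scaled by pre_level and the (soft-clipped) input is added, scaled by
-- rec_level. no feedback path through the read head is involved.
function write_sample(old, input, pre_level, rec_level)
  return old * pre_level + input * rec_level
end
```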
I’d have no right to complain if the number of voices was reduced to 4, but four read/write voices + two read-only would be even better. Two more read-only voices, if that’s what you’re suggesting, would be like whoa.
edit to add: my DSP knowledge is probably pretty firmly in the “just enough to be dangerous” range, or possibly just enough to be annoying. But I could probably handle removing a feature, like the rec offset parameter, if you want a PR for that!
yea I was speaking for norns (pi shield soon!)
16GB of RAM on my mac, and I’ve run 16 voices in max without anything exploding, but that’s hardly necessary.
are you still interested in working on a shared library for softcut? It’s not really necessary for me, but might be nice. If so I could probably put aside some time to update the max external and work on a pd version.
i’m in strong preference of a more-perfect 4 than a clicky more.
As we enter into the era of RPi4 with the DACs, could the extra computing power be applied to a greater number of perfect voices? If so, being able to choose between # of voices and level of quality would be nice. This could lead to a bit of potential nonsense of ‘why is my norns crackling? …Oh!’ moments, but it could be a useful option…
Know what? I don’t think I know enough about how this all works to make many suggestions about the way Softcut progresses. Instead, I shall put my trust in those far more in tune than I with this.
sure. refactoring and supporting multiple platforms is the kind of task that’s hard to prioritize compared to actually making stuff. a third port would probably tip the balance.
i assume you mean “shared” in a broad sense (probably static or source lib.)
(thinking through some of the attendant issues today i stepped in a small rabbit hole concerning “correct” compile-time conditionals in modern cpp. something with template overloading, jury still out.)
hm, moving on, but also relatedly…
sure. i’ll say upfront that i’m not actually interested in rpi4 or upgrading norns, for various reasons. fortunately, softcut voices don’t take up CPU when not enabled. so i don’t have a problem making voice count a runtime decision. however, it’s (slightly) more efficient to make voices that can never record as a compile-time decision.
anyways… i think i can get to this stuff this week, at least partially. i like softcut, it’s something i use on the desktop quite a bit.
i’m on a sudden business trip. these things can be good for putting in tech project time, because i’m out of my normal human flow - no wife, dogs, music, garden. and at the same time i’m not doing my usual work routine (code, test audio, research) and i’m not as burned out on that stuff in the evenings…
What would be the best way to go about creating a ping-pong delay with softcut? Should I pan 2 voices left and right, and somehow offset the playback of the second voice? I’ve tried doing this but couldn’t find a softcut param that does what I want exactly. Very likely I’m misunderstanding how the buffers work.
Would it make more sense to use one voice, and create a lua LFO to dynamically pan it left and right?
maybe something like
input > voice1/voice2 stereo pair
voice1 > voice2
voice2 > voice1
voices are synced, pre_level=0
not sure if I explained that super well, but essentially for ping pong the stereo image needs to be reversed in the feedback stage. you can do that with voice routings (as opposed to pre_level)
@crim you’ve got an autopanning delay there–for a pingpong my understanding was that the dry input is fed into only one delay, is that right?
input -> delayLeft ---------------> outputLeft
delayLeft -> delayRight -> outputRight
delayRight -> delayLeft
I think this would work if ya only wanted it in mono, but to my memory it’s all good to feed a stereo input into both delays as long as you criss-cross the feedback lines
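Putting the criss-crossed routing above into softcut calls, a minimal sketch might look like this (the feedback levels are placeholders, and it assumes voices 1/2 are already enabled, recording, and looping over equal-length buffer regions):

```lua
-- hard-pan the two delay voices
softcut.pan(1, -1.0)  -- voice 1 = left delay
softcut.pan(2, 1.0)   -- voice 2 = right delay

-- stereo input into both delays (or route input ch 1 to voice 1 only,
-- for the classic mono-in ping-pong)
softcut.level_input_cut(1, 1, 1.0)
softcut.level_input_cut(2, 2, 1.0)

-- no self-feedback via pre_level...
softcut.pre_level(1, 0.0)
softcut.pre_level(2, 0.0)

-- ...instead, criss-cross the feedback between the voices
softcut.level_cut_cut(1, 2, 0.7)  -- voice 1 output -> voice 2 record input
softcut.level_cut_cut(2, 1, 0.7)  -- voice 2 output -> voice 1 record input
```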