For me this comes down to building a mental model of the signals; this gets missed in a lot of modular synth “tutorials”, which are either too fundamental or too focused on a particular end result. That’s to say: it’s all voltage, but outputs create signals with an intended meaning, encoded in a particular way, and inputs correspondingly expect signals encoded in a particular way and interpret them accordingly. You need to understand the voltage range, the intent of a signal and its encoding to understand the topics you describe.
First, voltage: you have a +/-12V supply, meaning almost all voltages fall in that range. Since most op-amps can’t swing their output all the way to the supply rails, the practical range for encoded signals tends to be +/-10V. Most signals are AC (centred on 0V), but some carry a DC offset, so you may hear terms like 10Vpp, meaning 10V peak to peak, which could be a range of 0-10V or +/-5V. Modules which are “AC coupled” have inputs and/or outputs that filter out the DC component, meaning that if you send in a 0-10V signal it is treated the same as a +/-5V signal; a DC coupled module retains the asymmetry.
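If it helps to see AC coupling concretely, here’s a rough Python sketch. It treats AC coupling as simply subtracting the average (DC) voltage over whole cycles, which is an idealisation: a real AC-coupled input is a high-pass filter (typically a capacitor), so it also attenuates very slow signals. The function names are made up for illustration.

```python
import math

def sine(n, vpp, dc_offset):
    """One cycle of a sine wave as n samples, with a given peak-to-peak swing and DC offset."""
    return [dc_offset + (vpp / 2) * math.sin(2 * math.pi * i / n) for i in range(n)]

def peak_to_peak(samples):
    """Vpp: the distance between the highest and lowest voltage."""
    return max(samples) - min(samples)

def ac_couple(samples):
    """Idealised AC coupling: remove the DC component (the mean over whole cycles)."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

unipolar = sine(1000, vpp=10.0, dc_offset=5.0)  # swings 0..10 V
bipolar = sine(1000, vpp=10.0, dc_offset=0.0)   # swings -5..+5 V

print(peak_to_peak(unipolar), peak_to_peak(bipolar))  # both 10 Vpp
coupled = ac_couple(unipolar)
# Largest difference between the AC-coupled 0..10 V signal and the +/-5 V one: effectively zero
print(max(abs(c - b) for c, b in zip(coupled, bipolar)))
```

Both signals are 10Vpp; only the DC offset differs, and that is exactly what the coupling throws away.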
Broadly you have:
- Audio
- Pitch
- Gate
- Modulation
Audio is an audio waveform: one that, if used to move a speaker cone, would produce the corresponding sound. Typically the waves run in the +/-5V range; as they are AC, you often hear modular audio described as 10Vpp. In line level terms, a full-scale 10Vpp sine sits at about +13dBu. Offsetting audio adds a DC bias, which may be removed by the next AC coupled module. If you go into a DC coupled module with a DC bias, you get asymmetry, which may result in asymmetric clipping (distortion). Mixing works the way you probably intuitively expect for audio. Quantization (snapping a continuous voltage to the closest of a set of predefined values) results in a “bit crushed” version of the signal, with more crushing the coarser the quantization.
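A rough Python sketch of the two audio ideas above: quantizing a 10Vpp sine to coarser and coarser voltage steps (the “bit crush”), and working out its level in dBu (0dBu is about 0.775V RMS). The step sizes are arbitrary examples and the helpers are hypothetical.

```python
import math

def quantize(v, step):
    """Snap a voltage to the nearest multiple of `step` volts."""
    return step * round(v / step)

def crush(samples, step):
    """Quantize every sample: bigger steps mean more 'bit crushing'."""
    return [quantize(s, step) for s in samples]

def dbu(v_rms):
    """Level in dBu, where 0 dBu = sqrt(0.6) V RMS (about 0.775 V)."""
    return 20 * math.log10(v_rms / math.sqrt(0.6))

n = 1000
wave = [5.0 * math.sin(2 * math.pi * i / n) for i in range(n)]  # 10 Vpp sine

fine = crush(wave, 0.01)   # 10 mV steps: barely changed
coarse = crush(wave, 2.5)  # 2.5 V steps: heavily crushed
print(len(set(coarse)))    # 5 distinct voltage levels left
print(round(dbu(5.0 / math.sqrt(2)), 1))  # RMS of a 5 V peak sine: 13.2 dBu
```

Note the dBu figure depends on the waveform: a square wave with the same 10Vpp has a higher RMS, so it measures hotter.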
Pitch is encoded in either Hz/V or V/Oct, with an agreement that a certain voltage corresponds to a certain pitch (e.g. 0V=C0 for V/Oct, or 1V=C0 for Hz/V). Note that these reference points are not the only choice; different inputs and outputs may generate or interpret pitch differently. Focusing on V/Oct, consider the voltage range: if you chose 0V=C4, for example, you need negative voltages for lower notes; in contrast, 0V=C0 needs higher voltages to reach higher notes. You often see 0-5V or +/-5V for pitch. If you add an offset to a V/Oct pitch, a +1V offset raises it by 1 octave and a -1V offset lowers it by 1 octave. Mixing is a weighted sum of pitch signals. If the mixing is “unity gain” on quantized inputs, the result will still be chromatically tuned but not necessarily in scale; if the inputs aren’t quantized, the output is also unquantized. Quantizing snaps the pitch voltages to notes in a scale. Quantizing after the sum often makes sense and keeps things melodic, but such sums can be confusing.
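Here’s a sketch of the V/Oct arithmetic in Python, assuming 0V=C0 with C0 at about 16.35Hz (12-TET, A4=440Hz) and a C major scale for the quantizer; both are just example choices. The Hz/V helper uses the 1V=C0 example from above, so doubling the voltage doubles the frequency.

```python
import math

C0_HZ = 16.3516  # assumption: C0 in 12-TET with A4 = 440 Hz

def voct_to_hz(volts, ref_hz=C0_HZ):
    """V/Oct: 0 V = ref_hz, and every +1 V doubles the frequency (one octave up)."""
    return ref_hz * 2.0 ** volts

def hzv_to_hz(volts, hz_per_volt=C0_HZ):
    """Hz/V: frequency is proportional to voltage (example scaling 1 V = C0)."""
    return hz_per_volt * volts

C_MAJOR = (0, 2, 4, 5, 7, 9, 11)  # semitones above C

def quantize_pitch(volts, scale=C_MAJOR):
    """Snap a V/Oct voltage to the nearest semitone whose pitch class is in `scale`.

    Ties go to the lower note in this sketch.
    """
    semis = volts * 12
    octave = math.floor(semis / 12)
    best = None
    for o in (octave - 1, octave, octave + 1):
        for degree in scale:
            candidate = 12 * o + degree
            if best is None or abs(candidate - semis) < abs(best - semis):
                best = candidate
    return best / 12

e, d = 4 / 12, 2 / 12   # E and D above C0, already quantized
summed = e + d          # unity-gain mix: 0.5 V, i.e. F#
print(voct_to_hz(4.0))  # C4, about 261.6 Hz
print(voct_to_hz(summed + 1.0) / voct_to_hz(summed))  # +1 V offset: ratio of about 2
print(round(quantize_pitch(summed) * 12))  # 5: snapped into C major as F
print(hzv_to_hz(2.0))   # 2 V in Hz/V: double the 1 V frequency (C1)
```

Adding E and D at unity gain lands on F#: chromatically in tune, but outside C major, so the quantizer after the sum moves it to a neighbouring scale note.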
Gate, trig and clock are all variations on a theme. They typically output either 0V or 5V. A gate represents an extended event (held at 5V while the event lasts); a trig represents an instantaneous one (a short 5V pulse: we usually only watch for the rising edge). A clock is a series of trigs at a set frequency in musical time, e.g. 24 pulses per quarter note. Inputs expecting such signals usually have a threshold voltage above which the input triggers, so an offset will usually have no effect up to a point, then cause the input to always, or never, trigger. Mixing can end up working like a logical OR. Quantizing typically has no effect.
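A sketch of how a gate/trig input might see things, in Python. The 2.5V threshold is a hypothetical value (real modules vary), and signals are just lists of voltage samples. It shows an offset having no effect until it pushes the signal across the threshold, mixing acting like OR, and the clock period at 24 PPQN.

```python
THRESHOLD_V = 2.5  # hypothetical gate/trig input threshold

def is_high(v, threshold=THRESHOLD_V):
    """An input sees 'high' whenever the voltage is at or above its threshold."""
    return v >= threshold

def rising_edges(samples, threshold=THRESHOLD_V):
    """Indices where the signal crosses from low to high (what a trig input reacts to)."""
    edges, prev = [], False
    for i, v in enumerate(samples):
        high = is_high(v, threshold)
        if high and not prev:
            edges.append(i)
        prev = high
    return edges

def mix(*signals):
    """Unity-gain mixer: sample-by-sample sum."""
    return [sum(frame) for frame in zip(*signals)]

def clock_period_s(bpm, ppqn=24):
    """Seconds between clock pulses at a given tempo and pulses per quarter note."""
    return 60.0 / (bpm * ppqn)

a = [0, 5, 5, 0, 0, 0, 0]
b = [0, 0, 5, 5, 0, 5, 0]

print([is_high(v) for v in mix(a, b)])   # high wherever a OR b is high
print(rising_edges(a))                   # [1]
print(rising_edges([v + 1 for v in a]))  # small offset, no effect: [1]
print(all(is_high(v + 3) for v in a))    # big positive offset, always high: True
print(any(is_high(v - 3) for v in a))    # big negative offset, never high: False
print(clock_period_s(120))               # 24 PPQN at 120 BPM: about 0.0208 s
```

One side effect worth noticing: overlapping gates in the mix merge into a single longer gate, so a trig input only sees one rising edge for them.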
Modulations are the final type and are intended to adjust other parameters. They run in various ranges up to +/-10V, but 0-10V, 0-5V and +/-5V are also common. Typically inputs interpret positive voltages as positive adjustments to the parameter and negative voltages as negative adjustments. An offset voltage will bias the modulation, allowing a bipolar (+/-5V) signal to be offset to 0-10V and vice versa. A quantizer will “bit crush” the signal in the same way as it does other signals, making it “steppy”. Mixing modulations produces a weighted sum, allowing composite modulations to be created.
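The same kind of sketch for modulation: a +/-5V sine LFO offset to 0-10V and back, a weighted mix of two LFOs, and a 1V-step quantizer making it steppy. The LFO rates, weights and step size are arbitrary example values, and the helper names are hypothetical.

```python
import math

def lfo(t, freq_hz, amplitude=5.0):
    """Bipolar sine LFO: swings -amplitude..+amplitude volts."""
    return amplitude * math.sin(2 * math.pi * freq_hz * t)

def offset(samples, volts):
    """Add a constant bias to every sample."""
    return [s + volts for s in samples]

def weighted_sum(signals, weights):
    """Mixer with per-channel gain: sum of weight x signal at each sample."""
    return [sum(w * s for w, s in zip(weights, frame)) for frame in zip(*signals)]

def steppy(samples, step):
    """Quantize to `step`-volt increments."""
    return [step * round(s / step) for s in samples]

ts = [i / 1000 for i in range(1000)]  # one second at 1 kHz
slow = [lfo(t, 1.0) for t in ts]
fast = [lfo(t, 7.0) for t in ts]

unipolar = offset(slow, 5.0)   # -5..+5 V becomes 0..10 V
back = offset(unipolar, -5.0)  # ...and vice versa
composite = weighted_sum([slow, fast], [0.8, 0.2])  # example weights
stepped = steppy(slow, 1.0)

print(min(unipolar), max(unipolar))  # about 0 and 10
print(len(set(stepped)))             # 11 levels: -5 V to +5 V in 1 V steps
```

Because the weights sum to 1 here, the composite stays within the same +/-5V range as its inputs.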
The final thought I had was on VCAs. A (linear) VCA can be thought of as multiplying two signals, each normalised so that its full voltage range is ‘1’: if the signal and CV are both 1 the output is 1, and if either or both inputs are 0 you get 0. This is like a logical AND when used for gates. Thinking of them as multipliers made it very obvious to me what the output of various combinations of signals would be.
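A minimal Python sketch of the “VCA as multiplier” idea. It assumes a 5V full-scale CV and clamps the gain to 0..1, so negative CV just closes the VCA; that’s an assumption of this sketch, since real VCAs vary in how they handle out-of-range CV.

```python
def vca(signal, cv, cv_full_scale=5.0):
    """Idealised linear VCA: output = signal x normalised CV.

    Assumption: cv_full_scale volts = gain 1, and the gain is clamped
    to 0..1 (negative CV simply closes the VCA in this sketch).
    """
    gain = min(max(cv / cv_full_scale, 0.0), 1.0)
    return signal * gain

GATE_HIGH = 5.0
for a in (0.0, GATE_HIGH):
    for b in (0.0, GATE_HIGH):
        print(a, b, vca(a, b))  # only 5 V into both gives a high output: logical AND

print(vca(4.0, 2.5))   # half-scale CV halves the signal: 2.0
print(vca(-3.0, 5.0))  # full-scale CV passes audio through unchanged: -3.0
```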
Anyway, this is a fairly long and rambling post so I’ll stop. My main point was: try to understand what the voltages look like and what impact each signal would have on each input. This mental model can make various “special” methods quite obvious. The complexity really comes in composing signals and their second-order interactions, and similarly when adding elements which delay things (delay lines, shift registers etc.). Either way, I hope this is helpful in some way.