I came across this article, "The Ghost in the MP3" (on Semantic Scholar), while googling the subject a while ago.
There's something fascinating about that idea. Has anyone given this a go themselves?
Do you have any process recommendations?
Over the past day or two I’ve been tinkering with a fairly simple Python script to carry out the process.
Ryan’s original work is really interesting and well thought out, but I couldn’t find any source code or specific steps for the algorithms he used to perform the subtraction / resynthesize the sound.
A few relevant passages:
From a simple technique such as phase inversion to something as complex as developing a new codec– the inverse mp3,
Two possible ways forward emerge here: we can either
resynthesize the new matrix directly using an inverse
transform or, we can zero the corresponding bins in the
original uncompressed file where the difference is null or
near-null, i.e.- using the MP3 as a mask on the original
(The “new matrix” he refers to is the bin-by-bin difference between the original and compressed spectra.)
I’ve got a few different approaches in the script, just for experimentation and seeing what works out best.
- Performing a plain complex subtraction on the rectangular (real + imaginary) representation of the data. This means the magnitude and phase are not handled separately
- Converting to polar representation and subtracting the magnitude and phase separately, then converting back to rectangular representation and doing an inverse short-time Fourier transform
- Zeroing the bins in the original spectra where the difference in the magnitudes (mag_orig - mag_compressed) is < some threshold
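
For anyone who wants to poke at this without the full script, here is a minimal sketch of the three approaches above using `scipy.signal.stft` / `istft`. It is not Ryan's method and not my exact script: the function names, the window size (`NPERSEG`) and the default `threshold` are all placeholder assumptions, and it expects two mono float arrays (original and decoded MP3) that are already time-aligned at the same sample rate.

```python
import numpy as np
from scipy.signal import stft, istft

FS = 44100       # assumed sample rate
NPERSEG = 2048   # assumed STFT window length (Hann window, 50% overlap by default)

def spectra(orig, comp, fs=FS, nperseg=NPERSEG):
    """STFT of both signals, trimmed to a common length first."""
    n = min(len(orig), len(comp))
    _, _, S_orig = stft(orig[:n], fs=fs, nperseg=nperseg)
    _, _, S_comp = stft(comp[:n], fs=fs, nperseg=nperseg)
    return S_orig, S_comp

def complex_difference(S_orig, S_comp):
    """Approach 1: subtract the complex bins directly."""
    return S_orig - S_comp

def polar_difference(S_orig, S_comp):
    """Approach 2: subtract magnitude and phase separately, then recombine."""
    mag = np.abs(S_orig) - np.abs(S_comp)
    phase = np.angle(S_orig) - np.angle(S_comp)
    return mag * np.exp(1j * phase)

def masked_original(S_orig, S_comp, threshold=1e-3):
    """Approach 3: zero the original's bins where mag_orig - mag_compressed < threshold."""
    diff = np.abs(S_orig) - np.abs(S_comp)
    return np.where(diff < threshold, 0.0, S_orig)

def resynthesize(S, fs=FS, nperseg=NPERSEG):
    """Inverse STFT back to a time-domain signal (may be slightly longer than the input)."""
    _, x = istft(S, fs=fs, nperseg=nperseg)
    return x
```

Usage would be something like `S_o, S_c = spectra(orig, comp)` followed by `resynthesize(complex_difference(S_o, S_c))`. Because the STFT is linear, approach 1 should give the same result as subtracting the two waveforms in the time domain, which may be part of why it just sounds like a quieter, noisier version of the input.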
Each leads to slightly different and yet somehow unsatisfying results. I mostly just get out either a noisier or quieter version of what I put in. But I’ll keep on experimenting; there are a few other things Ryan mentions which I haven’t tried yet: dropping phase entirely, performing a constant-Q transform after the STFT, etc.
I’d also like to get the set of files he was using in his original analysis for comparison, but they’re not downloadable from his SC
Anyway, for anyone who would like to use the script, the dependencies (for Python 3) are: pydub (for MP3 conversion), matplotlib, scipy & numpy. I think the only change needed to get it to run is to edit the path to the audio input file towards the bottom.