Audio over Bluetooth changing pitch


When listening to music off my phone via Bluetooth in my car, the pitch will regularly fluctuate. The whole track will just go sharp or flat for maybe 20-30 seconds at a time before changing again.

I’ve found lots of commentary about causes and fixes online, but I’m really curious how the signal changes pitch. Does anyone here know?

I’m not looking for a fix, to be clear. I can’t find anything from curious audio nerds about what in the tech would result in a change in pitch as opposed to dropouts, distortion, skipping, etc.

Some kind of error between sample rate when sent vs when received?


It could be a sample rate issue, yeah. For a while I was using a cheap analog-to-S/PDIF converter with my audio interface, and it would occasionally decide that it was the master clock and everything was going to run at 48 kHz while the computer thought it was at 44.1 kHz, or something like that. All the voices in Overwatch sounded like they were on helium :laughing:


Sounds indeed like a sample rate error. Any and all audio problems are possible with Bluetooth and the car situation is indeed one of the worst cases. You don’t want to get me started on all of the challenges in doing robust speech recognition in this environment…


Ditto last night with a Logic X session where all the plugins were out of tune, not to mention a general crashiness I never had before. Time to go back to the Mountain Lion machine w/ Logic 9… I’m so close to doing this right now…


The ratios don’t sound as extreme as 48 vs 44.1 kHz, to be honest. The bending seems well within one semitone.

A new thought: almost all the music I listen to in this situation is VBR MP3s. A brief bit of reading suggests each “frame” of an MP3 file contains a header plus data, so perhaps decoding each frame at the right bitrate is somehow causing the fluctuations.

I have absolutely no idea if that’s possible. :confused:

I might check which files on my phone are 320 kbps and see if those ever bend in the car, just out of curiosity.


Bitrate and sampling rate aren’t the same thing. In theory, the sampling rate (together with bit depth and channel count) determines the maximum possible bitrate, i.e. that of the uncompressed audio, but MP3 encodings are well below this, and for good reason.
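To put rough numbers on that, here is a quick sketch assuming CD-quality audio (44.1 kHz, 16-bit, stereo); those parameters are an assumption for illustration, not something stated in the thread. The function name is hypothetical.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio: every sample of every channel is stored."""
    return sample_rate_hz * bit_depth * channels


cd_bps = pcm_bitrate_bps(44_100, 16, 2)  # assumed CD-quality stereo source
mp3_bps = 320_000                        # top of the MPEG-1 Layer III bitrate table

print(f"PCM: {cd_bps / 1000:.1f} kbps; 320 kbps MP3 is {mp3_bps / cd_bps:.1%} of that")
```

So even a 320 kbps MP3 carries less than a quarter of the raw PCM data rate, and none of that bitrate choice touches the sample rate itself.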

MP3 bitrate controls the level of artifacts in the audio, such as muffled transients and, in extreme cases, phasiness and “musical noise” (extra sine-tone glitches). 320 kbps should be fine for virtually all music, even percussive stuff. VBR just means that different chunks (frames) of the file are encoded at different bitrates: the encoder spends more bits on complex passages and fewer on simple ones, and that choice is baked into the file when it is encoded, not adjusted to the connection. At worst some passages sound a bit worse than others, with no interruption in audio. VBR should not result in a change in the underlying audio sample rate.
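This can be checked against the MP3 frame header itself. Below is a minimal sketch that only handles MPEG-1 Layer III (ignoring free-format, MPEG-2/2.5 and CRC details); the function name and the two example headers are made up for illustration. Each frame’s 4-byte header carries its own bitrate index and sample-rate index. The bitrate changes how many bytes the frame occupies, but every MPEG-1 Layer III frame decodes to 1152 samples per channel, so its playback time depends only on the sample rate.

```python
# Bitrates (kbps) for MPEG-1 Layer III, indexed by the 4-bit bitrate field.
# Index 0 ("free format") and 15 (invalid) are not handled in this sketch.
MPEG1_L3_BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                          160, 192, 224, 256, 320, None]
# Sample rates (Hz) for MPEG-1, indexed by the 2-bit sampling-rate field.
MPEG1_SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
SAMPLES_PER_FRAME = 1152  # fixed for MPEG-1 Layer III


def parse_mpeg1_l3_header(header: bytes) -> dict:
    """Decode the timing-related fields of a 4-byte MPEG-1 Layer III frame header."""
    if len(header) != 4:
        raise ValueError("an MP3 frame header is exactly 4 bytes")
    word = int.from_bytes(header, "big")
    if (word >> 21) & 0x7FF != 0x7FF:
        raise ValueError("missing frame sync")
    version_bits = (word >> 19) & 0b11
    layer_bits = (word >> 17) & 0b11
    if version_bits != 0b11 or layer_bits != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    bitrate_kbps = MPEG1_L3_BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = MPEG1_SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if bitrate_kbps is None or sample_rate is None:
        raise ValueError("unsupported bitrate or sample-rate index")
    padding = (word >> 9) & 0b1
    # Bytes this frame occupies in the file: depends on bitrate AND sample rate.
    frame_bytes = 144 * bitrate_kbps * 1000 // sample_rate + padding
    # Time this frame plays for: depends ONLY on the sample rate.
    duration_s = SAMPLES_PER_FRAME / sample_rate
    return {"bitrate_kbps": bitrate_kbps, "sample_rate": sample_rate,
            "frame_bytes": frame_bytes, "duration_s": duration_s}


if __name__ == "__main__":
    for hex_header in ("FFFB9064", "FFFBE064"):  # 128 kbps and 320 kbps, both 44.1 kHz
        print(parse_mpeg1_l3_header(bytes.fromhex(hex_header)))
```

The 320 kbps frame is about two and a half times as many bytes as the 128 kbps one, yet both last exactly 1152/44100 s, which is why switching bitrate from frame to frame in a VBR file doesn’t bend the pitch.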

It’s the same with image codecs, like JPEG. The analogy to bitrate is file size. You can encode a 1000x1000 image at various quality levels – the higher quality, the larger the file size – but it will always decode as a 1000x1000 image. For very small file sizes, the image will appear blocky, and dark areas will have strange colors. But you’ll still get the same size image in the end.

Also, the 48 kHz/44.1 kHz ratio works out to 1.467 semitones, so this could still be what’s going on. The shift is smaller than you might think.
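The semitone figure comes from 12·log2 of the rate ratio. A small sketch of that arithmetic (function names are hypothetical), which also runs it backwards to show how small a clock error is needed for a given amount of bending:

```python
import math


def semitone_shift(playback_rate_hz: float, content_rate_hz: float) -> float:
    """Pitch change, in semitones, when audio sampled at content_rate_hz is
    clocked out at playback_rate_hz (positive = sharp, negative = flat)."""
    return 12 * math.log2(playback_rate_hz / content_rate_hz)


def rate_error_for(semitones: float) -> float:
    """Fractional clock-rate error that produces a given pitch shift."""
    return 2 ** (semitones / 12) - 1


if __name__ == "__main__":
    print(f"{semitone_shift(48_000, 44_100):+.3f} semitones")  # 44.1 kHz audio played at 48 kHz
    print(f"{semitone_shift(44_100, 48_000):+.3f} semitones")  # 48 kHz audio played at 44.1 kHz
    print(f"{rate_error_for(1.0):.2%} clock error for one semitone")
```

A full semitone needs only about a 6% rate mismatch, so bending “well within one semitone” corresponds to a rate error of a few percent or less.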


I had a Bluetooth headphone issue with SuperCollider where the server wouldn’t start because the input sample rate was different from the output rate. I suspect there are profiles for Bluetooth audio that are constrained by some protocol details. In my case the headphones had a microphone at a 16 kHz sample rate and output at 44.1 kHz.


Only other thing: your car’s audio system may deal with missed buffers by intentionally buffering up past audio and slowing it down… but then you’ll have a discontinuity when it catches up. The fact that you get this more often with VBR is confusing, though. In theory you should have fewer missed buffers with VBR, even if the audio experience is inconsistent in other ways.
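One way that kind of buffer management could show up as sustained pitch drift rather than a click: a receiver that holds a jitter buffer and nudges its playback speed to keep the buffer near a target level. This is a minimal sketch, not a description of any real head unit; the proportional controller, its gain and limits, the 10 ms tick and the 44.1 kHz rate are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical receiver: keeps a buffer and nudges playback speed to hold it
# near a target fill level. Illustration only, not any real device's logic.
TICK_SAMPLES = 441   # samples consumed per 10 ms tick at nominal speed (44.1 kHz assumed)
TARGET_FILL = 4410   # aim to keep ~100 ms of audio buffered


def playback_ratio(fill: int, target: int = TARGET_FILL,
                   gain: float = 0.05, max_dev: float = 0.03) -> float:
    """Proportional controller: >1.0 speeds up (buffer too full),
    <1.0 slows down (buffer draining), clamped to +/- max_dev."""
    error = (fill - target) / target
    return max(1.0 - max_dev, min(1.0 + max_dev, 1.0 + gain * error))


def simulate(arrivals: list[int]) -> list[float]:
    """Return the pitch shift (semitones) for each tick, given samples arriving per tick."""
    fill = TARGET_FILL
    shifts = []
    for arrived in arrivals:
        ratio = playback_ratio(fill)
        # Resampling by `ratio` without time-stretching shifts pitch by 12*log2(ratio).
        shifts.append(12 * math.log2(ratio))
        fill = max(0, fill - round(TICK_SAMPLES * ratio)) + arrived
    return shifts


if __name__ == "__main__":
    # Steady delivery, then a 200 ms radio hiccup, then steady delivery again.
    pattern = [TICK_SAMPLES] * 5 + [0] * 20 + [TICK_SAMPLES] * 20
    for i, s in enumerate(simulate(pattern)):
        print(f"tick {i:2d}: {s:+.3f} semitones")
```

Because the controller only moves the speed by a few percent, a brief dropout turns into a stretch of slightly flat playback that recovers slowly, instead of an obvious glitch. A device that time-stretched instead of plain resampling would avoid the pitch change at the cost of more processing.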

I admit the pitch shift may sound better than silence, but there are so many better ways to do audio gap filling… Even repeating the last buffer (stuttering) would be better than a pitch shift, and it requires less effort to code. And there are well-known methods that are much better still.
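For comparison, the “repeat the last buffer” approach really is only a few lines. A minimal sketch, assuming audio arrives as fixed-length frames of float samples and that a lost frame shows up as `None` (both assumptions made for illustration; the function name is hypothetical):

```python
from typing import Optional, Sequence

Frame = list[float]


def conceal_by_repetition(frames: Sequence[Optional[Frame]], frame_len: int) -> list[Frame]:
    """Replace lost frames (None) with a copy of the last good frame.
    Before any good frame has arrived, output silence instead."""
    out: list[Frame] = []
    last_good: Frame = [0.0] * frame_len
    for frame in frames:
        if frame is None:
            out.append(list(last_good))  # stutter: replay the previous buffer
        else:
            last_good = frame
            out.append(list(frame))
    return out
```

Timing and pitch are untouched: every output frame is the same length at the same rate, so the worst case is an audible repeat rather than a drift in pitch. The better techniques alluded to above (fading, interpolation, and so on) are not shown here.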


Thanks all.

This sounds pretty likely to be the story!