Hey smart people! How do you think iPhone 11 Pro Audio Zoom works?

I’m excited to get the new iPhone 11 Pro for my vlogging adventures. I’m hoping that the new mic extends the capabilities of recording without an additional mic or lav.

The demos I’ve seen of Audio Zoom look pretty impressive, although it also sounded a little ‘phasey’. I wonder how it works? My guess is that they are taking in a stereo signal from the top and bottom of the phone, and then using some mid/side processing to reduce the level of the sides.
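No idea if this is what Apple actually does, but the mid/side guess is easy to sketch in NumPy: sum the two channels into a mid signal, difference them into a side signal, and turn the side down before decoding back to stereo (the function name and the `side_gain` knob are my own inventions for the toy example):

```python
import numpy as np

def audio_zoom_ms(left, right, side_gain=0.3):
    """Toy mid/side 'zoom': attenuate the side channel to narrow the image.

    left, right: 1-D float arrays (the two mic signals).
    side_gain: 1.0 leaves the stereo image unchanged, 0.0 collapses to mono.
    """
    mid = 0.5 * (left + right)    # what the channels share (center)
    side = 0.5 * (left - right)   # what differs between them (width)
    # Decode back to L/R with the side component scaled down
    return mid + side_gain * side, mid - side_gain * side
```

With `side_gain=1.0` you get the input back untouched; with `side_gain=0.0` both outputs are the mid signal, i.e. pure mono. Somewhere in between is the "zoomed" image. Notably, scaling the side channel per-band or with time-varying gain is exactly the kind of processing that can produce the phasey sound mentioned above.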

From the demo video I could find, it didn’t sound impressive to me - just garbage slightly more directional than standard garbage. In a way that is a relief: the laws of physics still hold, and good audio still requires optimal placement of the microphone, that is, not on the camera.
They’d be better off trying to read lips and synthesize speech; that would actually be impressive ^^.

Maybe a beamforming microphone?

Here’s a better example:

Like I said, it does sound phasey to me, but it’s hard to know if that’s the YouTube codec…

I wonder if it can work live, or if it’s heavily post-processed?
Could be some combination of a beamforming microphone array, independent component analysis, and image recognition to identify the subject and filter the audio to give the best results for that subject.
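The beamforming part is the easiest piece to demonstrate. A delay-and-sum beamformer just time-shifts each mic so that sound from the look direction lines up, then averages: the on-axis source adds coherently while off-axis sources smear out. Here’s a minimal frequency-domain sketch (my own toy function, not anything from Apple; the caller is assumed to have already computed the per-mic steering delays from the array geometry):

```python
import numpy as np

def delay_and_sum(signals, delays, fs):
    """Toy delay-and-sum beamformer.

    signals: (n_mics, n_samples) float array of mic recordings.
    delays:  per-mic steering delays in seconds (negative = advance),
             chosen so the look-direction wavefront aligns across mics.
    fs:      sample rate in Hz.
    """
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, d in zip(signals, delays):
        spec = np.fft.rfft(sig)
        # A linear phase ramp implements a (fractional) delay of d seconds
        spec *= np.exp(-2j * np.pi * freqs * d)
        out += np.fft.irfft(spec, n)
    return out / len(signals)
```

With only two or three mics a few centimetres apart the directivity is very mild at low frequencies, which fits the "slightly more directional than standard garbage" impression above.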

The diagram on the 11 specs page shows 3 microphones, 2 of them far enough apart to do DSP for this purpose - though at the bottom it just says “built-in microphones”. Phase differences between the mics, the orientation sensor data, a spectral gate with a balanced window, and a powerful chip such as the one in those phones: altogether it’s not impossible for it to work live - just record the audio with a slight delay (the DSP chain latency). No problem for the video container file format.
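For what it’s worth, the spectral gate step is also only a few lines: FFT a windowed frame, zero any bin whose magnitude sits below a threshold relative to an estimated noise floor, and inverse-FFT. This is my own simplified sketch (single frame, a flat per-bin noise estimate, a Hann window standing in for the "balanced window"), not a claim about Apple’s actual chain:

```python
import numpy as np

def spectral_gate(frame, noise_mag, threshold=2.0):
    """Toy single-frame spectral gate.

    frame:     1-D float array (one analysis frame).
    noise_mag: estimated noise-floor magnitude per FFT bin
               (scalar or array of len(frame)//2 + 1).
    threshold: keep bins at least this many times above the noise floor.
    """
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    mask = np.abs(spec) >= threshold * noise_mag  # keep only strong bins
    return np.fft.irfft(spec * mask, len(frame))
```

A real-time version would run this on overlapping frames with overlap-add, which adds a frame or two of latency - consistent with the "record the audio with a slight delay" idea above.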