This paper is interesting, but it’s utterly inconclusive about anything really to do with sampling theory. Notably, their most statistically significant result was that nearly everybody got it wrong… Also, the listeners were all highly experienced, trained sound professionals who collectively rated the difficulty at 9/10 and said that they often doubted their own perceptions. The results, which are solidly grouped around 50%, corroborate this.
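To put "grouped around 50%" in context: in a two-way forced-choice listening test, pure guessing gives a 50% hit rate, and an exact binomial test says how surprising a given score would be if every answer were a guess. Below is a minimal sketch of that test, assuming a simple two-alternative design. The function name and the trial counts are hypothetical illustrations, not figures from the paper. The test is two-sided, so scoring well below chance is flagged just as strongly as scoring well above it. That is how "nearly everybody got it wrong" can still come out as a statistically significant result.

```python
from math import comb


def binomial_p_two_sided(correct: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Returns the probability, assuming every answer is a guess with success
    probability p, of a score at least as unlikely as the one observed.
    Scores far below chance count just as much as scores far above it.
    """
    def pmf(k: int) -> float:
        return comb(trials, k) * p**k * (1 - p) ** (trials - k)

    observed = pmf(correct)
    # Sum every outcome no more likely than the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical trial counts for illustration only; not data from the paper.
    for hits in (52, 70, 30):
        print(hits, "/ 100 correct -> p =", round(binomial_p_two_sided(hits, 100), 6))
```

With these made-up numbers, 52/100 is entirely consistent with guessing, while 30/100 gets exactly the same small p-value as 70/100. Consistently wrong answers are as "significant" as consistently right ones.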
I don’t believe the data definitively supports the authors’ conclusions; it hints that there might be something there, but that’s all. It’s also very important to point out that the study does not cover audio that’s been UPsampled to 88.2kHz. I think that would have been an essential cross-test, since the output conversion of their hardware may simply perform better at 88.2kHz.
It’s also very important to note that frequency-domain operations get the same benefits in terms of frequency resolution from upsampling alone. If you’re only concerned with frequencies in the audio band, you do not need the source to be sampled at the higher rate at all.
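One way to see this: DFT bin spacing is fs/N. It depends only on the sample rate and the number of samples, not on whether the signal was recorded natively at that rate or upsampled to it. The sketch below (function names are illustrative, and a fixed 100 ms analysis window is assumed) shows that 44.1kHz and 88.2kHz give identical 10 Hz bin spacing and the same number of bins up to 20kHz. The extra bins at the higher rate all lie above the audio band.

```python
def bin_spacing_hz(sample_rate_hz: float, window_s: float) -> float:
    """DFT bin spacing for an analysis window of window_s seconds."""
    n = round(sample_rate_hz * window_s)  # samples in the analysis window
    return sample_rate_hz / n


def bins_up_to(limit_hz: float, sample_rate_hz: float, window_s: float) -> int:
    """Number of DFT bins from 0 Hz up to and including limit_hz."""
    spacing = bin_spacing_hz(sample_rate_hz, window_s)
    return int(limit_hz // spacing) + 1


if __name__ == "__main__":
    for fs in (44_100, 88_200):
        print(fs, "Hz:", bin_spacing_hz(fs, 0.1), "Hz spacing,",
              bins_up_to(20_000, fs, 0.1), "bins up to 20 kHz")
```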
In fact, in some other studies (I don’t have references handy; I studied this back at uni), the addition of ultrasonic frequencies from higher sample rates actually made the recorded audio perceptibly worse, because they affected the amplification and reproduction equipment unfavourably (in some cases stealing high-frequency energy from the audio content). Higher sample rates are also more vulnerable to clock jitter, which means that, again, they can end up sounding worse than music recorded at a lower sample rate (and clock jitter damage is irrecoverable).
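A common textbook way to quantify jitter is the SNR ceiling for a full-scale sine sampled with random clock jitter: SNR = -20·log10(2π·f·t_j), where f is the signal frequency and t_j the RMS jitter. Under this model the error grows by about 6 dB per doubling of signal frequency. So the ultrasonic content that a higher sample rate admits is the most jitter-sensitive part of the signal. This minimal sketch uses an illustrative 1 ns jitter figure, an assumption rather than a measurement of any real converter.

```python
import math


def jitter_limited_snr_db(signal_hz: float, rms_jitter_s: float) -> float:
    """Textbook upper bound on SNR for a full-scale sine sampled with
    random clock jitter: SNR = -20 * log10(2 * pi * f * t_j)."""
    return -20.0 * math.log10(2.0 * math.pi * signal_hz * rms_jitter_s)


if __name__ == "__main__":
    jitter = 1e-9  # 1 ns RMS: an illustrative assumption, not a measured value
    for f in (10_000, 20_000, 40_000):
        print(f"{f} Hz: {jitter_limited_snr_db(f, jitter):.1f} dB")
```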
Again, the evidence strongly suggests there are only two benefits to running at 88.2kHz and above. The first is if your particular equipment for some reason performs better at that setting (e.g. it has poor antialiasing filters at lower sample rates); Sound On Sound discusses this in some detail, mentioning both upsampling and poor antialiasing filtration. The second is if you need content in the ultrasonic region (e.g. for restoration work, where ultrasonic frequencies can help identify and filter out unwanted noise in the audio band). These can be good reasons to run at these rates, but it’s important to note that even in the best case only pros seem to be able to tell the difference, and even then it’s quite difficult and requires special listening conditions (and in this study they seem to get it wrong more often than right!). So if you want to spend the CPU power and storage on processing high-bandwidth audio, go for it; just realize what you’re (likely not) getting in return.
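The antialiasing point follows from the folding rule for sampled tones. Any component above fs/2 that the filter lets through reappears mirrored into the band below fs/2. In this minimal sketch, the function name and the 30kHz test tone are illustrative assumptions. A 30kHz component that a weak filter fails to remove lands at an audible 14.1kHz when sampled at 44.1kHz, but stays at 30kHz, outside the audio band, when sampled at 88.2kHz.

```python
def alias_frequency_hz(signal_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling with no antialiasing
    filter: the tone is folded back into the 0..fs/2 band."""
    f = signal_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)


if __name__ == "__main__":
    for fs in (44_100, 88_200):
        print(f"30 kHz tone sampled at {fs} Hz appears at "
              f"{alias_frequency_hz(30_000, fs)} Hz")
```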
This is definitely correct. 16 bits is fine for the final master (with proper dithering), but up to that point the extra bits make a far more significant difference to the final quality than the sample rate does.
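The rule of thumb behind this is that each bit adds about 6 dB of dynamic range: an ideal N-bit quantizer gives roughly 6.02·N + 1.76 dB SNR for a full-scale sine. Below is a minimal sketch of that formula plus a final requantization step. The function names are hypothetical. TPDF (triangular) dither is assumed here as one common choice, not necessarily what any particular tool uses, and samples are assumed to be floats in [-1.0, 1.0).

```python
import math
import random


def ideal_quantization_snr_db(bits: int) -> float:
    """SNR of an ideal N-bit quantizer for a full-scale sine (~6.02*N + 1.76 dB)."""
    return 20.0 * math.log10(math.sqrt(1.5) * 2**bits)


def requantize_tpdf(x: float, bits: int, rng: random.Random) -> float:
    """Reduce a sample in [-1.0, 1.0) to `bits` of resolution with TPDF dither.

    The difference of two uniform random numbers has a triangular PDF spanning
    +/- 1 LSB; adding it before rounding decorrelates the quantization error
    from the signal.
    """
    step = 2.0 / 2**bits                   # size of one LSB for a [-1, 1) range
    dither = (rng.random() - rng.random()) * step
    q = round((x + dither) / step) * step  # round to the nearest quantization level
    return max(-1.0, min(1.0 - step, q))   # clip to the representable range


if __name__ == "__main__":
    print(f"16-bit: {ideal_quantization_snr_db(16):.2f} dB")
    print(f"24-bit: {ideal_quantization_snr_db(24):.2f} dB")
    print(requantize_tpdf(0.3, 16, random.Random(0)))
```

The roughly 48 dB gap between 24 and 16 bits is headroom for mixing and processing. Dither is applied once, when the final 16-bit master is produced.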