
Human hearing beats sound’s uncertainty limit, makes MP3s sound worse

post #1 of 13
Thread Starter 
Thought this was interesting and wanted to share.

Modern audio compression algorithms rely on observations about auditory perceptions. For instance, we know that a low-frequency tone can render a higher tone inaudible. This perception is used to save space by removing the tones we expect will be inaudible. But our expectations are complicated by the physics of waves and our models of how human audio perception works.

This problem has been highlighted in a recent Physical Review Letter, in which researchers demonstrated the vast majority of humans can perceive certain aspects of sound far more accurately than allowed by a simple reading of the laws of physics. Given that many encoding algorithms start their compression with operations based on that simple physical understanding, the researchers believe it may be time to revisit audio compression.

http://arstechnica.com/science/2013/02/human-hearing-beats-sounds-uncertainty-limit-makes-mp3s-sound-worse/
post #2 of 13
I've asked one of the authors for a courtesy reprint.
post #3 of 13
Thread Starter 
Quote:
Originally Posted by Chu Gai View Post

I've asked one of the authors for a courtesy reprint.
It's an interesting read. I wonder if there could be a graph of some sort to show the effect.
post #4 of 13
Quote:
Originally Posted by pgwalsh View Post

Thought this was interesting and wanted to share.

Modern audio compression algorithms rely on observations about auditory perceptions. For instance, we know that a low-frequency tone can render a higher tone inaudible. This perception is used to save space by removing the tones we expect will be inaudible. But our expectations are complicated by the physics of waves and our models of how human audio perception works.

This problem has been highlighted in a recent Physical Review Letter, in which researchers demonstrated the vast majority of humans can perceive certain aspects of sound far more accurately than allowed by a simple reading of the laws of physics. Given that many encoding algorithms start their compression with operations based on that simple physical understanding, the researchers believe it may be time to revisit audio compression.

http://arstechnica.com/science/2013/02/human-hearing-beats-sounds-uncertainty-limit-makes-mp3s-sound-worse/

One of several flies in the author's ointment is that the standard the author used, based on a simple reading of the laws of physics, recovers both phase and amplitude information, while the ear is notoriously insensitive to phase. Another is that the test given to the humans allowed them to improve their scores by learning, while the method based on the simple reading of physics can't learn. Finally, it is the choice of the codec developer whether or not he actually takes the time to obtain the phase information.

In short, it wasn't an apples-to-apples comparison.
post #5 of 13
Since MP3 bit rates commonly range from 128 kbps to 320 kbps, it is wise to avoid too much generalization.

Lower-rate MP3 copies sound poor to me on most types of music, although a lot of pop recordings are crap anyway so it doesn't matter.

I just go with WAV files and get great results.
post #6 of 13
Quote:
Originally Posted by commsysman View Post

I just go with WAV files and get great results.

Yep.

But I always thought MP3 itself was patented by Fraunhofer based on psychoacoustic research, not just "simple physics"?
post #7 of 13
Quote:
Originally Posted by GIK Acoustics View Post

But I always thought MP3 itself was patented by Fraunhofer based on psychoacoustic research, not just "simple physics" ?
MP3, WMA, AAC, etc. are all what are called "transform based" codecs. The music signal is taken from the time domain into the frequency domain. Once there, we can apply (frequency-based) psychoacoustic rules about what is audible and what is not. We can then further truncate the resolution of each frequency band based on our hearing sensitivity. Once that is done, the rest of the data is losslessly compressed, and that is your final bit stream.
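
A toy sketch of that pipeline in Python (illustrative values only; the plain FFT, the crude "inaudible" threshold, and the uniform quantization step here are stand-ins, not what MP3, WMA, or AAC actually do, which involves an MDCT, a real psychoacoustic model, and per-band bit allocation):
Code:
import zlib
import numpy as np

def toy_transform_encode(frame, quant_step=0.01, floor_ratio=1e-4):
    # Time domain -> frequency domain (real codecs use an MDCT, not a plain FFT)
    spectrum = np.fft.rfft(frame)
    # Crude stand-in for a psychoacoustic model: zero out very weak components
    mags = np.abs(spectrum)
    spectrum[mags < floor_ratio * mags.max()] = 0
    # Truncate the resolution of what is left (uniform quantization indices;
    # a decoder would multiply back by quant_step)
    quantized = np.round(spectrum / quant_step)
    # Losslessly compress the surviving data -> the final bit stream
    return zlib.compress(quantized.astype(np.complex64).tobytes())

# Example: one 1024-sample frame of a 440 Hz tone at 44.1 kHz
fs = 44100
t = np.arange(1024) / fs
frame = 0.5 * np.sin(2 * np.pi * 440 * t)
print(len(toy_transform_encode(frame)), "bytes for a 1024-sample frame")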

When the transformation is made into the frequency domain, a time window is selected. The size of that window determines our "time and frequency" resolution. If we use short windows, we improve our time resolution but lose efficiency because we don't have sufficient resolution in the frequency domain. If we increase the frequency resolution, distortion spreads across our time window, which smears transients. There is no one optimal choice here, and hence the uncertainty principle discussed in the article.

Good encoders make the right trade-off between different window sizes by analyzing the content and matching the window to the frame of music being encoded (usually there is only a choice of two window sizes): short windows for transients, long windows for steady state. Alas, this analysis is far from trivial because of the efficiency losses of short windows as data rates get lower. Even detecting transients can be challenging.
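
To put rough numbers on the window trade-off (example window lengths only, not any particular encoder's choices):
Code:
fs = 44100  # sample rate in Hz

# Longer windows give finer frequency bins but smear events over more time;
# the product of the two is fixed (n/fs * fs/n = 1), hence the trade-off.
for n in (256, 2048):
    print(f"{n}-sample window: spans {1000 * n / fs:.1f} ms, "
          f"bin width {fs / n:.1f} Hz")
# 256-sample window: spans 5.8 ms, bin width 172.3 Hz
# 2048-sample window: spans 46.4 ms, bin width 21.5 Hz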
post #8 of 13
As far as "lossy" codecs go, I was involved in some blind testing many years ago between WMA and MP3, and WMA beat MP3 at the same bit rate and even when its bit rate was lower than the MP3's. Every person in my test group picked the WMA as better sounding.
post #9 of 13
Quote:
Originally Posted by Riffmeister View Post

As far as "lossy" codecs go, I was involved in some blind testing many years ago between WMA and MP3, and WMA beat MP3 at the same bit rate and even when its bit rate was lower than the MP3's. Every person in my test group picked the WMA as better sounding.
Thanks for saying that. My team at Microsoft developed WMA, and what you describe was exactly its target! I am sure if they were reading this post they would be very pleased. We created WMA when the common connection to the Internet was dial-up. At that time, all MP3 could do was produce AM radio quality, due to the very low sampling rate (and hence frequency response) it had to use at effective dial-up rates (just 20 kbps on a 28K modem). Our goal was to achieve FM quality at the same rate, which was a tall order, as high frequencies are the hardest to encode (transients are made up of high-frequency components). One of the main applications of online media at the time was streaming radio, hence this initiative.

Similarly, at the time, the typical MP3 player (prior to the introduction of the iPod) used flash memory, which was very expensive, so there was a need to reach "CD quality" at 64 kbps instead of the 128 kbps MP3 required at 44.1 kHz sampling.

I don't know that we achieved the above targets for all content, but we did push the goalposts forward relative to MP3.
post #10 of 13
Quote:
Originally Posted by amirm View Post

MP3, WMA, AAC, etc. are all what are called "transform based" codecs. The music signal is taken from the time domain into the frequency domain. Once there, we can apply (frequency-based) psychoacoustic rules about what is audible and what is not. We can then further truncate the resolution of each frequency band based on our hearing sensitivity. Once that is done, the rest of the data is losslessly compressed, and that is your final bit stream.

Amir,

I understand this process of compression; I just got confused by the wording of the article. I felt like it read: "We need to revisit compression because *insert the reason MP3 was invented*," which wouldn't really make a strong argument, but I think I just read through it too fast. I now understand that it was simply pointing out the shortcomings of the current MP3 compression regime. Of course, FLAC has already done the job of lossless file-size compression, and if slight lossy audio compression were applied on top, we could likely get very small file sizes with hardly any loss. Do we really need to re-evaluate compression, or do we just need to switch over to a different standard audio codec?
post #11 of 13
I think everyone also needs to consider that the "psychoacoustic assumptions" made when MP3 was developed were generalized (a lot like the compression algorithms used for JPG pictures). This means they were designed to work well on most music, for most listeners, most of the time. I've certainly heard low-rate MP3 files that sounded very good (and indistinguishable from the original). I've also heard specific pieces of music that always sounded bad when converted to MP3 (no matter what bit rate or encoder). It seems obvious to me that certain sounds, or combinations of sounds, fail to "fit into the model" that was used when MP3 was developed: they probably have too many disparate tones - disparate in time or frequency - and the encoder cannot find a way to reduce the size while ensuring that all the removed information is masked.

A similar thing happens with JPG pictures: certain originals contain elements that the process doesn't work well on, and so the result is poor. [With JPGs, the process works well on continuous tones, like portraits, and very poorly on artwork with lines, like cartoons or text. You can easily JPG a portrait to 10x smaller with no visible loss of quality, but text usually shows noticeable degradation - under magnification - even when JPGed at full size.]

Of course, lossless files have the huge advantage that, since nothing has to be omitted, you don't run the risk of omitting something that cannot be masked.....
post #12 of 13
Quote:
Originally Posted by commsysman View Post

.... although a lot of pop recordings are crap anyway so it doesn't matter..

+100.

If the engineer is running a million dB of compression on the 2-bus in an effort to stay competitive in the loudness wars, the damage is already done and no lossless standard will ever make it sound any better. I recently played the "old recording vs. new recording" listening game with my wife, and even she, who usually opines that things sound "the same," can tell a dynamic difference across most popular genres recorded within the past 15-20 years, regardless of playback format.
post #13 of 13
Hi,

I have some additional information about the scientific paper that is the topic of this thread:
Jacob N. Oppenheim and Marcelo O. Magnasco, "Human Time-Frequency Acuity Beats the Fourier Uncertainty Principle", Phys Rev Lett 110 044301 (2013)

which is found on-line here: http://prl.aps.org/abstract/PRL/v110/i4/e044301

[1] Full access to the Physical Review Letters (abbreviated "Phys Rev Lett" or PRL) article requires either of the following:
(a) Membership in the American Physical Society (http://www.aps.org/membership/join.cfm) and, in addition, an online subscription to PRL, which is an additional $50 / year for American Physical Society members.
(b) (Probably) purchase of the individual article. Sorry that I can't provide full information on the "individual purchase" option right now. However, the free option described in my next paragraph makes any of the cost options mostly unnecessary.

[2] This is a "mostly good news / some bad news" item.
You don't need a subscription to PRL to get access to the article. A "preprint" of the full article, which is almost identical to the final PRL version, is available at the free "arxiv.org" site. Here: http://arxiv.org/abs/1208.4611

So what's the "bad news" part? The preprint at "arxiv.org" includes the full article, but does not include the Supplemental Material (Footnote [21]: "See Supplemental Material at [URL will be inserted by publisher] for testing procedures and parameters, fitted data, controls, and discussion of performance at other parameter values."). The Supplemental Material is (as far as I know) available only from PRL: http://prl.aps.org/supplemental/PRL/v110/i4/e044301 The Supplement turns out to be of equal length to the "regular" article and includes considerable additional information on some topics, especially Experimental Design and Controls. (BTW, this is a trend I've noticed in the publication of scientific papers - putting a lot of information about the study in a separate Supplement document, which can be as long as the "regular" article - not a helpful trend, in my opinion.)

I have access to an online PRL subscription through my employer, so I can see the PRL version of the regular article, which as I said is almost identical to the free Arxiv.org preprint, and the Supplement. (Sorry I can't publicly share copyrighted material from the PRL website - please don't ask.) Maybe the corresponding author would agree to make the Supplement publicly available if he knew of the interest his work has generated in audio discussion groups. The author's contact info is available here: http://prl.aps.org/abstract/PRL/v110/i4/e044301

[3] Finally I have a comment about the content of the scientific paper, versus this statement by pgwalsh in his first post (my emphasis):
Quote:
a recent Physical Review Letter, in which researchers demonstrated *the vast majority of humans* can perceive certain aspects of sound far more accurately than allowed by a simple reading of the laws of physics.

From my reading, the "vast majority of humans" part is absolutely wrong as a summary of the scientific paper, and should be replaced by something like "researchers demonstrated that a tiny minority of humans, namely composers and conductors of 'classical' or 'serious' music (and, to a much lesser extent, musicians), can perceive certain aspects of sound far more accurately than allowed by a simple reading of the laws of physics". (I'm not blaming pgwalsh for this inaccuracy if pgwalsh only had access to the arstechnica article, not the actual scientific paper.)

Let me explain with some quotes from the (open, Arxiv.org version of the) paper. First, this study consisted of controlled experiments on the hearing abilities of a group of humans, the test subjects of the study. The subjects were given five hearing discrimination "tasks" of increasing difficulty. Only success in the final and most difficult task, task 5, showed that "human hearing beats sound's uncertainty limit". (In other words, if all the subjects had succeeded in tasks 1 to 4, and all had failed in task 5, the paper would not exist in its present form with its present title, and we probably wouldn't be talking about it here.)

So what was the critical 5th task, and the other four tasks? To quote from the caption of Figure 1:
Quote:
In our final task 5, subjects are asked to discriminate simultaneously whether the test note (red) is higher or lower in frequency than the leading note (green), and whether the test note appears before or after the flanking high note (blue). For each instance of the task, two numbers are generated (Δt and Δf) and two Boolean responses (left/right, up/down) are recorded. Tasks 1 through 4 lead to this final task: task 1 is frequency only (uses two flanking notes), task 2 timing only, task 3 is frequency only but with the flanking high note (blue) as a distractor, and task 4 is timing only, with the leading (green) note as a distractor.

and to quote from the caption of Figure 3 (my emphasis):
Quote:
Each round dot is a completion of Task 5 by a subject on an individual day, with at least 100 presentations. There were 12 subjects totaling 26 individual sessions for Gaussian and 12 sessions for notelike tests. Blue denotes Gaussian packet while red denotes notelike. The two solid lines are the locus of the relation [doesn't copy well from the PDF]; any dots below these curves violate the corresponding uncertainty relation.
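
(For readers who want to see the limit itself: with the usual RMS definitions of duration and bandwidth, the textbook Fourier/Gabor relation is Δt·Δf ≥ 1/(4π) ≈ 0.08, and a Gaussian packet sits exactly at that limit; subjects whose combined timing/pitch judgments land below the curve are the ones "beating" it. The little Python check below is my own sketch of that textbook value, not the paper's code or its exact parameterization.)
Code:
import numpy as np

# RMS duration x RMS bandwidth of a Gaussian packet equals 1/(4*pi) ~ 0.0796,
# the minimum the Fourier (Gabor) uncertainty relation allows.
fs, sigma = 48000, 0.01                    # sample rate (Hz), packet width (s)
t = np.arange(-0.2, 0.2, 1 / fs)
env = np.exp(-t**2 / (2 * sigma**2))       # Gaussian envelope (carrier omitted;
                                           # a carrier only shifts the spectrum)
p_t = env**2
dt = np.sqrt(np.sum(t**2 * p_t) / np.sum(p_t))         # RMS duration

f = np.fft.fftfreq(len(env), 1 / fs)                   # two-sided spectrum
p_f = np.abs(np.fft.fft(env))**2
df = np.sqrt(np.sum(f**2 * p_f) / np.sum(p_f))         # RMS bandwidth

print(round(dt * df, 4), "vs limit", round(1 / (4 * np.pi), 4))  # both ~0.0796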

So, who were the subjects, and who succeeded at which tasks, including the critical 5th task? To quote from the paper once more:
Quote:
It is important to stress where the difficulty of the task lies. Our preliminary testing included non-musicians, who were often close in performance to musicians on tasks 1 and 2 (separate time and frequency acuity), but then found tasks 3 and 4 hard, while musicians, trained to play in ensembles, found them easy. We further found that composers and conductors achieved the best results in task 5, consistently beating the uncertainty principle by factors of 2 or more, whereas performers were more likely to beat it only by a few percentage points. After debriefing subjects, it appears that the necessity of hearing multi-voiced music (both in frequency and in time) in one's head and coaching others to perform it led to the improved performance of conductors and composers.

So, according to this study, "composers and conductors" are the true high-end audiophiles, with specific hearing abilities that far exceed those of average humans. (Did anyone besides me ever anticipate or imagine that a scientific study, with this very specific result, might one day exist?)

[small clarification: I'm definitely not claiming that I "imagined" or "anticipated" the existence of a study on the topic "Human Time-Frequency Acuity Beats the Fourier Uncertainty Principle". I am claiming that I imagined (long before this study) that scientific research might show the "hearing acuity" of the most highly trained and skilled musicians is superior, in some unspecified way, to the "hearing acuity" of average humans. I have no proof that my imagination did this - I didn't write down a prediction in AVS Forum or elsewhere - but you can take my word.]
Edited by Sonic icons - 6/5/13 at 6:34pm