Human hearing beats sound’s uncertainty limit, makes MP3s sound worse - AVS Forum
post #1 of 13, 02-27-2013, 09:21 AM - Thread Starter
pgwalsh (AVS Special Member)
Thought this was interesting and wanted to share.

Modern audio compression algorithms rely on observations about auditory perceptions. For instance, we know that a low-frequency tone can render a higher tone inaudible. This perception is used to save space by removing the tones we expect will be inaudible. But our expectations are complicated by the physics of waves and our models of how human audio perception works.

This problem has been highlighted in a recent Physical Review Letter, in which researchers demonstrated the vast majority of humans can perceive certain aspects of sound far more accurately than allowed by a simple reading of the laws of physics. Given that many encoding algorithms start their compression with operations based on that simple physical understanding, the researchers believe it may be time to revisit audio compression.

http://arstechnica.com/science/2013/02/human-hearing-beats-sounds-uncertainty-limit-makes-mp3s-sound-worse/

Builds: Maelstrom 21 Ottoman Build, Dual Opposed MFW's x 2, Statements, SEOS-12/TD12M x 5. 

post #2 of 13, 02-27-2013, 10:55 AM
Chu Gai (AVS Addicted Member)
I've asked one of the authors for a courtesy reprint.

"I've found that when you want to know the truth about someone that someone is probably the last person you should ask." - Gregory House

post #3 of 13, 02-27-2013, 12:03 PM - Thread Starter
pgwalsh (AVS Special Member)
Quote:
Originally Posted by Chu Gai View Post

I've asked one of the authors for a courtesy reprint.
It's an interesting read. I wonder if there could be a graph of some sort to show the effect.

post #4 of 13, 02-27-2013, 12:08 PM
arnyk (AVS Addicted Member)
Quote:
Originally Posted by pgwalsh View Post

Thought this was interesting and wanted to share.

Modern audio compression algorithms rely on observations about auditory perceptions. For instance, we know that a low-frequency tone can render a higher tone inaudible. This perception is used to save space by removing the tones we expect will be inaudible. But our expectations are complicated by the physics of waves and our models of how human audio perception works.

This problem has been highlighted in a recent Physical Review Letter, in which researchers demonstrated the vast majority of humans can perceive certain aspects of sound far more accurately than allowed by a simple reading of the laws of physics. Given that many encoding algorithms start their compression with operations based on that simple physical understanding, the researchers believe it may be time to revisit audio compression.

http://arstechnica.com/science/2013/02/human-hearing-beats-sounds-uncertainty-limit-makes-mp3s-sound-worse/

One of several flies in the authors' ointment is that the benchmark they used, based on a simple reading of the laws of physics, recovers both phase and amplitude information, while the ear is notoriously insensitive to phase. Another is that the test given to the human subjects allowed them to improve their scores through learning, while the method based on the simple reading of physics can't learn. Finally, it is the coder developer's choice whether or not to take the time to obtain the phase information.

In short, it wasn't an apples-to-apples comparison.

post #5 of 13, 02-28-2013, 07:55 AM
commsysman (AVS Special Member)
Since MP3 files are commonly encoded at bit rates anywhere from 128 kbps to 320 kbps, it is wise to avoid too much generalization.

Lower-rate MP3 copies sound poor to me on most types of music, although a lot of pop recordings are crap anyway so it doesn't matter.

I just go with WAV files and get great results.

post #6 of 13, 03-01-2013, 05:41 PM
GIK Acoustics (Senior Member)
Quote:
Originally Posted by commsysman View Post

I just go with WAV files and get great results.

Yep.

But I always thought MP3 itself was patented by Fraunhofer based on psychoacoustic research, not just "simple physics"?

Alexander Reynolds
GIK Acoustics

post #7 of 13, 03-01-2013, 06:27 PM
amirm (AVS Addicted Member)
Quote:
Originally Posted by GIK Acoustics View Post

But I always thought MP3 itself was patented by Fraunhofer based on psychoacoustic research, not just "simple physics"?
MP3, WMA, AAC, etc. are all what are called "transform-based" codecs. The music signal is converted from the time domain into the frequency domain. Once there, we can apply (frequency-based) psychoacoustic rules to decide what is audible and what is not. We can then further reduce the resolution of each frequency band based on our hearing sensitivity. Once done, the remaining data is losslessly compressed, and that is your final bit stream.

When the transformation into the frequency domain is made, a time window is selected. The size of that window determines our "time and frequency" resolution. If we use short windows, we improve our time resolution but lose efficiency because we don't have sufficient resolution in the frequency domain. If we increase the frequency resolution by using longer windows, distortion spreads across the time window, which smears transients. There is no single optimal choice here, hence the uncertainty principle discussed in the article.
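
To put rough numbers on that trade-off, here is a minimal sketch. The 44.1 kHz sample rate and the two window lengths (256 and 2048 samples) are illustrative assumptions, not values taken from any particular encoder, and `window_resolution` is a hypothetical helper.

```python
def window_resolution(window_len, sample_rate=44100):
    """Return (time span in ms, frequency bin spacing in Hz) for one transform window."""
    time_ms = 1000.0 * window_len / sample_rate   # how long one window lasts
    freq_hz = sample_rate / window_len            # spacing between frequency bins
    return time_ms, freq_hz


# Illustrative (assumed) short and long window lengths.
for n in (256, 2048):
    t, f = window_resolution(n)
    print(f"{n:5d} samples: {t:6.2f} ms per window, {f:7.2f} Hz per bin")
```

Whatever the window length, the two figures multiply to the same constant, which is the trade-off described above: tightening one necessarily loosens the other.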

Good encoders make the right trade-off between different window sizes by analyzing the content and matching the window to the frame of music being encoded (usually there is only a choice of two window sizes): short windows for transients, long windows for steady-state content. Alas, this analysis is far from trivial because short windows become less efficient as data rates get lower. Even detecting transients can be challenging.
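
A toy version of that window decision might look like the sketch below. It is not how any real MP3, AAC, or WMA encoder detects transients; production encoders use considerably more elaborate analysis. The energy-ratio rule, the `choose_window` name, the eight sub-blocks, and the threshold of 8 are all assumptions chosen purely for illustration.

```python
import math


def choose_window(frame, sub_blocks=8, ratio_threshold=8.0):
    """Return 'short' if energy jumps sharply between consecutive sub-blocks, else 'long'."""
    size = len(frame) // sub_blocks
    # Energy of each sub-block; the small constant avoids dividing by zero on silence.
    energies = [
        sum(x * x for x in frame[i * size:(i + 1) * size]) + 1e-12
        for i in range(sub_blocks)
    ]
    for prev, cur in zip(energies, energies[1:]):
        if cur / prev > ratio_threshold:
            return "short"   # sudden onset: favor time resolution
    return "long"            # steady content: favor frequency resolution


# A steady 1 kHz tone versus silence followed by an abrupt onset of the same tone.
steady = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(1024)]
attack = [0.0] * 512 + steady[:512]
print(choose_window(steady), choose_window(attack))
```

Even this crude rule hints at why detection is hard: the threshold that catches a drum hit may also fire on loud but steady material, or miss a soft onset buried in a dense mix.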

Amir
Founder, Madrona Digital
"Insist on Quality Engineering"

post #8 of 13, 03-03-2013, 08:01 AM
Riffmeister (Member)
As far as "lossy" codecs go, I was involved in some blind testing many years ago between WMA and MP3, and WMA beat MP3 at the same bit rate and even when the WMA bit rate was lower than the MP3's. Every person in my test group picked the WMA as better sounding.

post #9 of 13, 03-03-2013, 08:30 AM
amirm (AVS Addicted Member)
Quote:
Originally Posted by Riffmeister View Post

As far as "lossy" codecs go, I was involved in some blind testing many years ago between WMA and MP3, and WMA beat MP3 at the same bit rate and even when the WMA bit rate was lower than the MP3's. Every person in my test group picked the WMA as better sounding.
Thanks for saying that. My team at Microsoft developed WMA, and what you state was its target! I am sure if they were reading this post they would be very pleased :). We created WMA when the common connection to the Internet was dial-up. At that time, all MP3 could do was produce AM radio quality due to its very low sampling rate (and hence frequency response) at the effective dial-up rate (just 20 kbps for a 28K modem). Our goal was to achieve FM quality at the same rate, which was a tall order as high frequencies are the hardest to encode (transients are made up of high-frequency components). One of the main applications of online media at the time was streaming radio, hence this initiative.

Similarly, at the time, the typical MP3 player (prior to the introduction of the iPod) had flash memory, which was very expensive, so there was a need to get to "CD quality" at 64 kbps instead of the 128 kbps required by MP3 to reach 44.1 kHz sampling.
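
The storage argument is simple arithmetic. As a rough sketch, assuming a hypothetical 32 MB flash player (an illustrative capacity, not a figure from this post) and constant-bit-rate files with no container overhead:

```python
def minutes_of_audio(capacity_mb, bitrate_kbps):
    """Minutes of audio that fit in capacity_mb megabytes at a constant bit rate."""
    capacity_bits = capacity_mb * 1_000_000 * 8      # decimal megabytes to bits
    seconds = capacity_bits / (bitrate_kbps * 1000)  # kbps to bits per second
    return seconds / 60


# Hypothetical 32 MB player at the two bit rates mentioned above.
for rate in (128, 64):
    print(f"{rate:3d} kbps: {minutes_of_audio(32, rate):.1f} minutes")
```

Halving the bit rate doubles the playing time on the same memory, which is why reaching comparable quality at 64 kbps mattered so much when flash was expensive.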

I don't know that we achieved the above targets for all content but we did push forward the goal post relative to MP3.

post #10 of 13, 03-03-2013, 02:27 PM
GIK Acoustics (Senior Member)
Quote:
Originally Posted by amirm View Post

MP3, WMA, AAC, etc. are all what are called "transform-based" codecs. The music signal is converted from the time domain into the frequency domain. Once there, we can apply (frequency-based) psychoacoustic rules to decide what is audible and what is not. We can then further reduce the resolution of each frequency band based on our hearing sensitivity. Once done, the remaining data is losslessly compressed, and that is your final bit stream.

Amir,

I understand this process of compression; I just got confused by the wording of the article. I felt like it read: "We need to revisit compression because *insert the reason MP3 was invented*", which wouldn't really make a strong argument, but I think I just read through it too fast. I now understand that it was simply pointing out the shortcomings of the current MP3 compression regime. Of course, FLAC has already done the job of lossless file-size compression, and if slight audio compression were applied as well, we could likely get very small file sizes with hardly any loss. Do we really need to re-evaluate compression, or do we just need to switch over to a different standard audio codec?

post #11 of 13, 03-03-2013, 10:39 PM
EmotivaKeith (Member)
I think everyone also needs to consider that the "psychoacoustic assumptions" made when MP3 was developed were generalized (a lot like the compression algorithms used for JPG pictures). This means that they were designed to work well on most music, for most listeners, most of the time. I've certainly heard low-rate MP3 files that sounded very good (and indistinguishable from the original). I've also heard specific pieces of music that always sounded bad when converted to an MP3 (no matter what bit rate or encoder). It seems obvious to me that certain sounds, or combinations of sounds, fail to "fit into the model" that was used when MP3 was developed (they probably have too many disparate tones, disparate in time or frequency, and the encoder is unable to find a way to reduce the size while ensuring that all the removed information is masked). A similar thing happens with JPG pictures: certain originals contain elements that the process doesn't work well on, and so the result is poor. [JPGs work well on continuous tones, like portraits, and very poorly on artwork with lines, like cartoons or text. You can easily JPG a portrait to 10x smaller with no visible loss of quality, but text usually shows noticeable degradation, under magnification, even when JPGed at full size.]

Of course, lossless files have the huge advantage that, since nothing has to be omitted, you don't run the risk of omitting something that cannot be masked.

post #12 of 13, 03-06-2013, 10:47 AM
adrummingdude (Member)
Quote:
Originally Posted by commsysman View Post

.... although a lot of pop recordings are crap anyway so it doesn't matter..

+100.

If the engineer is running a million dB of compression on the 2 bus in an effort to stay competitive in the loudness wars, the damage is already done and no lossless standard will ever make it sound any better. I recently played the "old recording vs new recording" listening game with my wife, and even she, who usually finds that things sound "the same", can hear a dynamic difference across most popular genres recorded within the past 15-20 years, regardless of playback format.

Samsung UN55F8000
Pioneer Elite SC-65
Bowers and Wilkins 683
Bowers and Wilkins HTM-61
Definitive Tech XTR-20BP
Definitive Tech Supercube 2
8TB Iomega Media Server

post #13 of 13, 06-05-2013, 05:47 PM
Sonic icons (Senior Member)
Hi,

I have some additional information about the scientific paper that is the topic of this thread:
Jacob N. Oppenheim and Marcelo O. Magnasco, "Human Time-Frequency Acuity Beats the Fourier Uncertainty Principle", Phys Rev Lett 110 044301 (2013)

which is found on-line here: http://prl.aps.org/abstract/PRL/v110/i4/e044301

[1] Full access to the Physical Review Letters (abbreviated "Phys Rev Lett" or PRL) article requires either of the following:
(a) Membership in the American Physical Society (http://www.aps.org/membership/join.cfm) and, in addition, an online subscription to PRL, which is an additional $50 / year for American Physical Society members.
(b) (Probably) purchase of the individual article. Sorry that I can't provide full information on the "individual purchase" option right now. However, the free option described in my next paragraph makes any of the cost options mostly unnecessary.

[2] This is a "mostly good news / some bad news" item ;)
You don't need a subscription to PRL to get access to the article. A "preprint" of the full article, which is almost identical to the final PRL version, is available at the free "arxiv.org" site. Here: http://arxiv.org/abs/1208.4611

So what's the "bad news" part? The preprint at "arxiv.org" includes the full article, but does not include the Supplemental Material (Footnote [21]: "See Supplemental Material at [URL will be inserted by publisher] for testing procedures and parameters, fitted data, controls, and discussion of performance at other parameter values."). The Supplemental Material is (as far as I know) available only from PRL: http://prl.aps.org/supplemental/PRL/v110/i4/e044301 The Supplement turns out to be of equal length to the "regular" article and includes considerable additional information on some topics, especially Experimental Design and Controls. (BTW, this is a trend I've noticed in the publication of scientific papers: putting a lot of information about the study in a separate Supplement document, which can be as long as the "regular" article. Not a helpful trend, in my opinion >:( .)

I have access to an online PRL subscription through my employer, so I can see the PRL version of the regular article, which as I said is almost identical to the free Arxiv.org preprint, and the Supplement. (Sorry I can't publicly share copyrighted material from the PRL website - please don't ask.) Maybe the corresponding author would agree to make the Supplement publicly available if the author knew the interest his work has generated in audio discussion groups. The author's contact info is available here: http://prl.aps.org/abstract/PRL/v110/i4/e044301

[3] Finally I have a comment about the content of the scientific paper, versus this statement by pgwalsh in his first post (my emphasis):
Quote:
a recent Physical Review Letter, in which researchers demonstrated the vast majority of humans can perceive certain aspects of sound far more accurately than allowed by a simple reading of the laws of physics.

From my reading, the "vast majority of humans" part is absolutely wrong as a summary of the scientific paper, and should be replaced by something like "researchers demonstrated that a tiny minority of humans, namely composers and conductors of 'classical' or 'serious' music (and, to a much lesser extent, musicians), can perceive certain aspects of sound far more accurately than allowed by a simple reading of the laws of physics". (I'm not blaming pgwalsh for this inaccuracy if pgwalsh only had access to the arstechnica article, not the actual scientific paper.)

Let me explain with some quotes from the (open, Arxiv.org version of the) paper. First, this study consisted of controlled experiments on the hearing abilities of a group of humans, the test subjects of the study. The subjects were given five hearing discrimination "tasks" of increasing difficulty. Only success in the final and most difficult task, task 5, showed that "human hearing beats sound's uncertainty limit". (In other words, if all the subjects had succeeded in tasks 1 to 4, and all had failed in task 5, the paper would not exist in its present form with its present title, and we probably wouldn't be talking about it here.)
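
For reference, the limit in question is usually written as the Gabor (Fourier) uncertainty relation. The form below is the standard textbook statement, offered here as an assumption about what the paper's title refers to; the paper's own notation and its exact definitions of timing and frequency acuity are not reproduced in this thread.

```latex
\Delta t \, \Delta f \;\ge\; \frac{1}{4\pi}
```

Here $\Delta t$ and $\Delta f$ are the standard deviations of a signal's energy distribution in time (seconds) and in frequency (Hz), and equality holds for a Gaussian pulse. "Beating" the limit would then mean that a listener's measured timing and frequency discrimination thresholds multiply to less than this bound.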

So what was the critical 5th task, and the other four tasks? To quote from the caption of Figure 1:
Quote:
In our final task 5, subjects are asked to discriminate simultaneously whether the test note (red) is higher or lower in frequency than the leading note (green), and whether the test note appears before or after the flanking high note (blue). For each instance of the task, two numbers are generated (Dt and Df) and two Boolean responses (left/right, up/down) are recorded. Tasks 1 through 4 lead to this final task: task 1 is frequency only (uses two flanking notes), task 2 timing only, task 3 is frequency only but with the flanking high note (blue) as a distractor, and task 4 is timing only, with the leading (green) note as a distractor.

and to quote from the caption of Figure 3 (my emphasis):
Quote:
Each round dot is a completion of Task 5 by a subject on an individual day, with at least 100 presentations. There were 12 subjects totaling 26 individual sessions for Gaussian and 12 sessions for notelike tests. Blue denotes Gaussian packet while red denotes notelike. The two solid lines are the locus of the relation [doesn't copy well from the PDF]; any dots below these curves violate the corresponding uncertainty relation.

So, who were the subjects, and who succeeded at which tasks, including the critical 5th task? To quote from the paper once more:
Quote:
It is important to stress where the difficulty of the task lies. Our preliminary testing included non-musicians, who were often close in performance to musicians on tasks 1 and 2 (separate time and frequency acuity), but then found tasks 3 and 4 hard, while musicians, trained to play in ensembles, found them easy. We further found that composers and conductors achieved the best results in task 5, consistently beating the uncertainty principle by factors of 2 or more, whereas performers were more likely to beat it only by a few percentage points. After debriefing subjects, it appears that the necessity of hearing multi-voiced music (both in frequency and in time) in one's head and coaching others to perform it led to the improved performance of conductors and composers.

So, according to this study, "composers and conductors" are the true high-end audiophiles, with specific hearing abilities that far exceed the abilities of average humans :) :) (Did anyone besides me ever anticipate or imagine that a scientific study, with this very specific result, might one day exist? :D)

[small clarification: I'm definitely not claiming that I "imagined" or "anticipated" the existence of a study on the topic "Human Time-Frequency Acuity Beats the Fourier Uncertainty Principle". I am claiming that I imagined (long before this study) that scientific research might show "hearing acuity" of the most highly trained and skilled musicians is superior, in some unspecified way, to "hearing acuity" of average humans. I have no proof that my imagination did this - I didn't write down a prediction in AVS Forum or elsewhere - but you can take my word ;)]