AVS Forum

Is High-Resolution Audio Irrelevant? (http://www.avsforum.com/forum/173-2-channel-audio/1529833-high-resolution-audio-irrelevant.html)

Scott Wilkinson 04-30-2014 08:16 PM

Mark Henninger's excellent piece about whether or not high-end audio is obsolete got me thinking about a related topic that's been in the AV news a lot lately: high-resolution audio. As most AVS readers probably know, LPCM (linear pulse-code modulation) digital audio, the most common type used in commercial music recording, has two basic parameters that define its resolution—sampling frequency (aka sample rate) and bit depth.

 

SACD uses a different type of digital audio called DSD (Direct Stream Digital), which I'll put aside for the moment. Also, sampling frequency and bit depth don't apply to lossy compressed formats like MP3, only to the original files that were used to create them, so I won't include them here.

 

Before I address the question in the title, I'd like to make sure everyone is up to speed on the basics of digital audio. If you're familiar with those basics, you can skip to the next subhead.

 

Digital Audio Basics

In all analog-audio electrical signals, the voltage smoothly rises and falls between a minimum and maximum value in a pattern called the waveform. In most cases, the waveform is quite complex, and it determines the tone or timbre of the sound it represents.

 

All complex waveforms can be separated into pure tones whose waveform is a sine wave with a single frequency and a certain amplitude (the difference in voltage between the peaks and troughs of the sine waveform); this process is called Fourier analysis. If you were to combine all the pure tones at the proper levels, you would end up with the original complex waveform. (Actually, it's not as simple as this—the waveform varies over time as well, but the basic concept still applies.)

 

When added together, the sine waves combine to form the complex waveform.
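As a rough illustration of this idea, the Python sketch below adds up odd-numbered sine-wave harmonics to approximate a square wave. The square wave, the function name, and the 100 Hz fundamental are chosen here purely for illustration; they are not taken from the diagram.

```python
import math

def square_wave_partial_sum(t, fundamental_hz, num_harmonics):
    """Sum the first num_harmonics odd sine harmonics of a square wave at time t (seconds)."""
    total = 0.0
    for k in range(num_harmonics):
        harmonic = 2 * k + 1  # 1, 3, 5, ...
        total += math.sin(2 * math.pi * harmonic * fundamental_hz * t) / harmonic
    return 4 / math.pi * total

f0 = 100.0             # hypothetical fundamental frequency in Hz
t_peak = 1 / (4 * f0)  # a quarter of the way through one cycle

one_tone = square_wave_partial_sum(t_peak, f0, 1)      # a single sine overshoots the square wave
fifty_tones = square_wave_partial_sum(t_peak, f0, 50)  # many sines settle close to 1.0
print(one_tone, fifty_tones)
```

With only the fundamental, the value at the quarter-cycle point is 4/pi (about 1.27); with 50 harmonics it lands close to the square wave's level of 1.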

 

In the process of sampling an analog-audio signal, the instantaneous voltage of the signal is measured multiple times as it rises and falls during each cycle of the waveform. The number of times this measurement is taken per second is the sampling frequency. Each measurement, or sample, is represented by a digital number that includes a certain number of bits; this is the bit depth. The higher the sampling frequency, the more measurements are taken per second, and the greater the bit depth, the more accurate each of those measurements is.

 

As the sampling frequency and bit depth increase, the measurements of the instantaneous level of the analog waveform become more accurate. Also, as bit depth increases, so does the dynamic range, which is represented here by depicting the 24-bit samples as taller than the 16-bit sample. The "size" of each bit in all three graphs remains the same. Note that there are far more than 16 steps in a 16-bit sample; in fact, there are 65,536 steps. These graphs are meant as conceptual illustration, not to be numerically accurate.

 

The sampling frequency establishes the highest audio frequency that can be accurately represented in the digital information. According to a well-established tenet called the Nyquist Theorem, any analog signal with a frequency no greater than half the sampling frequency can, in principle, be sampled and reconverted back to its analog form with perfect fidelity. (Remember that most analog-audio signals include many frequencies in combination, and the Nyquist Theorem applies to each of them individually.)

 

If frequencies above half the sampling frequency—which is called the Nyquist frequency—are digitized, they create lower-frequency artifacts that were not in the original signal. To avoid these so-called aliasing artifacts, the analog input signal is first sent through a lowpass filter that removes any frequencies above the Nyquist frequency.

 

In this diagram, a waveform (red) is sampled at less than twice its frequency, resulting in a low-frequency aliasing artifact (blue).
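To make the aliasing idea concrete, here is a minimal Python sketch that computes where a pure tone would appear if it were sampled with no anti-aliasing filter. The function name and the 30 kHz test tone are assumptions made for this example only.

```python
import math

def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling with no anti-aliasing filter."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

fs = 44_100
print(alias_frequency(1_000, fs))   # 1000: below the Nyquist frequency, so unchanged
print(alias_frequency(30_000, fs))  # 14100: folded down into the audible range

# The samples of a 30 kHz tone are identical to those of a 14.1 kHz tone
# with inverted polarity, so the two cannot be told apart after sampling.
worst_mismatch = max(
    abs(math.sin(2 * math.pi * 30_000 * n / fs) + math.sin(2 * math.pi * 14_100 * n / fs))
    for n in range(100)
)
print(worst_mismatch)
```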

 

When the digital signal is converted back to analog, it first looks like a series of stairsteps that roughly follows the shape of the original waveform. These stairsteps correspond to the values that were sampled when the signal was digitized. Such a stairstep waveform includes many high-frequency components, or harmonics, which are removed by sending the signal through another lowpass filter (also called a reconstruction filter). This removes all frequencies above the Nyquist frequency, returning the waveform to its original shape. Theoretically—and amazingly—no information whatsoever from the original waveform is lost in this process.

 

A waveform is sampled, converting it into a series of numbers. When the numbers are converted back to analog, they start as a stairstep approximation of the original waveform. A lowpass reconstruction filter restores the waveform's original shape.
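The claim that nothing is lost can be checked numerically with an idealized reconstruction filter. The sketch below uses sinc interpolation, the textbook mathematical model of a perfect lowpass reconstruction filter, to recover the value of a 1 kHz sine wave at an instant that falls between samples. The 8 kHz sample rate, the 1 kHz tone, and the finite window of 4,001 samples are all assumptions for this example; a real DAC filter is only an approximation of this.

```python
import math

def sinc(x):
    """Normalized sinc function, the impulse response of an ideal lowpass filter."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, first_index, t):
    """Estimate the waveform at time t (measured in sample periods) from its samples."""
    return sum(s * sinc(t - (first_index + k)) for k, s in enumerate(samples))

fs = 8_000.0  # hypothetical sample rate in Hz
f0 = 1_000.0  # tone frequency, well below the 4 kHz Nyquist frequency
K = 2_000     # samples kept on each side of time zero

samples = [math.sin(2 * math.pi * f0 * n / fs) for n in range(-K, K + 1)]

t = 0.37  # an instant between sample 0 and sample 1
estimate = reconstruct(samples, -K, t)
exact = math.sin(2 * math.pi * f0 * t / fs)
error = abs(estimate - exact)
print(estimate, exact, error)
```

The small remaining error comes from using a finite number of samples; the theorem itself assumes an infinitely long sum.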

 

As I mentioned earlier, the bit depth is the number of bits used to represent each sampled value, from the peaks to the troughs of the waveform. With a bit depth of 8 bits, 256 different values can be represented (2^8 = 256); with 16 bits, 65,536 different values can be represented (2^16 = 65,536), and with 24 bits, 16,777,216 different values can be represented (2^24 = 16,777,216). The bit depth determines the maximum dynamic range that can be represented: each bit adds roughly 6 dB to the dynamic range, so 16 bits corresponds to a dynamic range of about 96 dB and 24 bits corresponds to a dynamic range of about 144 dB.
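The "roughly 6 dB per bit" figure follows from expressing the ratio between the largest and smallest representable steps in decibels. A small Python sketch of that arithmetic (the function name is made up for this example):

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of an LPCM sample with the given bit depth."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(bits, 2 ** bits, round(dynamic_range_db(bits), 2))

db_per_bit = dynamic_range_db(1)  # about 6.02 dB
```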

 

Because there is a finite number of possible values for each sample, most samples will not correspond exactly with the instantaneous voltage of the analog waveform, so the value of the sample is the closest it can be without exceeding the voltage it represents. The difference between the actual instantaneous voltage and the sampled value is called the quantization error.

 

Occasionally, the voltage and the sampled value are precisely equal, in which case the quantization error is 0, but most of the time, the samples differ from the voltage by varying amounts. So quantization error is often expressed as an average over many samples. The greater the bit depth, the more accurate all sample values are and the lower the average quantization error, which is also called quantization noise or distortion. This defines the noise floor, below which no actual signal can be represented or reproduced.

 

In this image, the green curve is the original waveform, and the yellow curve is the waveform resulting from quantization. The red curve represents the quantization errors or quantization noise.
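The relationship between bit depth and quantization noise can be measured directly. The sketch below quantizes a near-full-scale sine wave and compares the signal power with the power of the quantization error. Note two assumptions: it rounds each sample to the nearest level (a common textbook model, slightly different from the "without exceeding" rule described above), and the test tone and function names are invented for the example.

```python
import math

def quantize(x, bits):
    """Round x to the nearest level of a quantizer with 2**bits levels spanning -1 to +1."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def measured_snr_db(bits, num_samples=100_000):
    """Signal-to-quantization-noise ratio of a sine wave, in dB."""
    step = 2.0 / (2 ** bits)
    amplitude = 1.0 - step  # largest sine that never exceeds the top level
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        x = amplitude * math.sin(2 * math.pi * 0.0123456789 * n)
        error = quantize(x, bits) - x  # the quantization error for this sample
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

snr_8 = measured_snr_db(8)
snr_16 = measured_snr_db(16)
print(snr_8, snr_16)
```

The results should land near the standard estimate of 6.02 dB per bit plus 1.76 dB for a full-scale sine wave, that is, roughly 50 dB for 8 bits and 98 dB for 16 bits.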

 

CDs store LPCM digital audio with a sampling frequency of 44.1 kHz and a bit depth of 16 bits, often specified with the shorthand designation 44.1/16 or 16/44.1. (Many professional digital-audio systems use a sampling frequency of 48 kHz with a bit depth of 16 bits.) Why were these values chosen? According to most research, humans can't hear frequencies above 20 kHz, so if the sampling frequency is more than twice that, all audible frequencies can be accurately represented as digital data.

 

The dynamic range encompassed by healthy human hearing, that is, the difference between the softest sound we can perceive (the threshold of hearing) and the loudest sound we can perceive without pain (the threshold of pain), is around 140 dB, which is more than the theoretical maximum of 96 dB represented by 16 bits. But using more bits would require more storage capacity, and one of the design goals of the CD format was the ability to store at least 60 minutes of audio, so 16 bits was deemed sufficient while allowing that goal to be achieved.

 

Higher-resolution LPCM audio recordings—typically 24-bit/96 kHz or even 24/192—can be distributed on Blu-ray or DVD-Audio discs or made available for downloading from websites. A bit depth of 24 bits represents a theoretical dynamic range of about 144 dB, and a sampling rate of 96 kHz can accurately represent frequencies up to 48 kHz; a sampling rate of 192 kHz can represent frequencies up to 96 kHz.
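The storage cost of these formats is simple arithmetic: sample rate times bit depth times number of channels. The following Python sketch compares one hour of uncompressed stereo audio in each format. The function name is invented for this example, and two-channel stereo is assumed throughout.

```python
def pcm_bytes(sample_rate_hz, bit_depth, seconds, channels=2):
    """Size in bytes of uncompressed LPCM audio (stereo assumed by default)."""
    return sample_rate_hz * bit_depth * channels * seconds // 8

one_hour = 60 * 60
cd_size = pcm_bytes(44_100, 16, one_hour)
hires_96 = pcm_bytes(96_000, 24, one_hour)
hires_192 = pcm_bytes(192_000, 24, one_hour)

for label, size in (("16/44.1", cd_size), ("24/96", hires_96), ("24/192", hires_192)):
    print(label, size, "bytes,", round(size / cd_size, 2), "x CD")
```

By this calculation, an hour of 24/96 audio takes a little over three times the space of the same hour at CD resolution, and 24/192 takes about six and a half times as much, before any lossless compression.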

 

However, it's important to verify that the original recordings were made at the higher resolution and not upconverted from 16/44.1, which would negate any potential improvement in the audio quality. Then there's the issue of an analog master tape being digitized at 24/96 or 24/192, the value of which is debatable, since professional analog-audio tape has a dynamic range of 60-70 dB without noise reduction.

 

Another form of high-resolution audio is DSD (Direct Stream Digital), the digital-audio format used on SACD discs. DSD uses a very high sampling rate of 2.8 MHz and a bit depth of only one bit, but it uses a different encoding scheme called pulse density modulation (PDM), so it's not directly comparable to LPCM. According to Wikipedia, it's approximately equivalent to 20-bit/96 kHz LPCM.

 

There is much more to digital audio than I have explained here, but this is enough to understand the issues of high-resolution audio and whether or not it is irrelevant.

 

High-Resolution Audio Goes Mainstream

Recently, digital audio with higher resolution than CD has gotten a lot of attention, especially with the news that Neil Young is moving ahead with his PonoMusic project now that its Kickstarter crowd-funding campaign has raised over $6 million. Young proposes to distribute commercial music recorded at a sampling frequency of 96 or even 192 kHz and a bit depth of 24 bits, which will be playable on a portable Pono Player built by high-end manufacturer Ayre Acoustics.

 

The Pono Player will include hardware from Ayre Acoustics.

 

Young is not the first to distribute high-res music files. AIX Records has been recording and distributing 24/96 music files on DVD-Audio and Blu-ray discs since 2000, and it launched itrax.com, the first high-resolution audio-download site, in the fall of 2007. Chesky Records sells SACDs, DVD-Audio discs, and DVD-ROM discs with 24/192 audio files that you can copy to a computer hard drive. Other sources of high-res downloads include Bowers & Wilkins, Linn Records, Naim Label, and 2L. Another well-known source is HDtracks.com, but some audiophiles suspect that some of its files are upconverted from 16/44.1; see Polk Audio's forum for a discussion of this.

 

The crux of the question I pose in the title of this thread is whether or not true high-resolution audio—recorded, edited, mastered, and distributed in 24/96, 24/192, or DSD—offers an audible improvement over the good ol' 16/44.1 audio found on CDs. And as you might imagine, there is much debate over this proposition.

 

For example, it seems clear that a bit depth of 24 bits could potentially sound better than 16 bits, since the dynamic range of human hearing is about 140 dB. But very few recordings are made without some form of dynamic compression. (I'm not talking about data compression like MP3.) In the case of most popular music, the dynamic range is severely compressed so that everything can be heard in the presence of road noise in a car or a city street while you're out for a stroll or bike ride.

 

In terms of frequency range, traditional research has established that humans can't hear above 20 kHz (and virtually all adults can't hear anywhere near that high), so a sampling rate of 44.1 kHz should be more than enough, especially since the Nyquist Theorem states that all frequencies less than half the sampling frequency can be reconstructed with perfect accuracy. The problem here is that the anti-aliasing input filter and reconstruction output filter must have very steep slopes to allow 20 kHz to pass unattenuated while completely blocking 22.05 kHz and above. This type of "brick-wall" filter is very difficult to design and implement without introducing some audible artifacts of its own, at least in the analog domain. By using a higher sampling frequency, the slope of these filters can be much more gradual, which results in far fewer artifacts.
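To get a feel for how much steeper the filter has to be at 44.1 kHz, the sketch below estimates the order of an analog Butterworth lowpass filter needed in each case. Everything here is an assumption for illustration: the Butterworth type, the 96 dB attenuation target at the Nyquist frequency, and a passband edge defined as the point that is 3 dB down (so 20 kHz would not actually pass completely unattenuated). Real converter filters are designed differently.

```python
import math

def butterworth_order(passband_edge_hz, stopband_edge_hz, stopband_atten_db):
    """Smallest Butterworth order that is 3 dB down at the passband edge and
    at least stopband_atten_db down at the stopband edge."""
    ratio = stopband_edge_hz / passband_edge_hz
    needed = math.log10(10 ** (stopband_atten_db / 10) - 1) / (2 * math.log10(ratio))
    return math.ceil(needed)

# 44.1 kHz sampling: pass 20 kHz, block 22.05 kHz (the Nyquist frequency)
order_44k = butterworth_order(20_000, 22_050, 96)

# 96 kHz sampling: pass 20 kHz, block 48 kHz (the Nyquist frequency)
order_96k = butterworth_order(20_000, 48_000, 96)

print(order_44k, order_96k)
```

Under these assumptions the 44.1 kHz case calls for a filter order above 100, which is impractical in analog hardware, while the 96 kHz case needs an order in the low teens.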

 

Then there's the issue of whether or not ultrasonic frequencies above 20 kHz somehow affect the audible range, even though we can't hear them directly. For example, some believe that ultrasonic harmonics interact with each other, producing what are called difference or interference tones down in the audible range. So capturing and reproducing those harmonics could affect the sound we can hear, as many listeners claim they do.

 

On the other hand, you need some unusually capable equipment to record and reproduce frequencies above 20 kHz. Some speakers can do it—for example, Sony's new Core Series of speakers are spec'd up to 50 kHz for the SS-CS3 floorstander and SS-CS5 bookshelf, and they're not even that expensive ($480/pair for the SS-CS3, $220/pair for the SS-CS5); for more info about these speakers, see our coverage here. In fact, Sony is placing a lot of emphasis on high-resolution audio in many of its new products.

 

Assuming the ADC (analog-to-digital converter), DAC (digital-to-analog converter), and all digital electronics in the recording and playback chain are capable of accurately representing 24/96 or higher, what about the other analog components in the signal chain, including microphones, preamps, and power amps, along with the analog portions of the converters? If any of them can't support at least 48 kHz and 140 dB of dynamic range, the effort to record and deliver 24/96 audio—not to mention 24/192—is moot.

 

Argument For the Proposition

Aside from Mark Henninger's piece about whether or not high-end audio is obsolete, the article that inspired me to write this post is "24/192 Music Downloads...and why they make no sense" by Monty Montgomery on xiph.org. Among the arguments in this article is the assertion that all transducers and amplifiers exhibit some amount of distortion, which increases at the lowest and highest frequencies. In particular, reproducing ultrasonic frequencies leads to intermodulation distortion that can extend into the audible range. Thus, it's better not to encode ultrasonics to avoid any possibility of intermodulation distortion.

 

Montgomery also points out that, while an analog anti-aliasing filter works better if its slope is gradual as explained earlier, a digital anti-aliasing filter has no such limitation. If you sample at a high sampling frequency—say, 96 or 192 kHz—you can apply a digital lowpass filter that simply discards the ultrasonic components, and you're left with a 44.1 kHz dataset that has no aliasing artifacts.

 

Regarding the use of 24 bits instead of 16, the article argues that the threshold of hearing increases with age and hearing damage, and the threshold of pain decreases, reducing the dynamic range of human hearing as we get older. Also, a technique called dithering, which adds a bit of noise to the signal to mask quantization noise, allows amplitudes of less than one bit to be encoded and reproduced.
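The claim that dither lets a signal smaller than one quantization step survive can be demonstrated in a few lines. In the sketch below, a sine wave with a peak of only 0.4 of a step is quantized with and without dither. The triangular (TPDF) dither, the test-tone frequency, the random seed, and the function name are assumptions made for this example; the xiph.org article does not prescribe them.

```python
import math
import random

def recovered_amplitude(quantized, cycles_per_sample):
    """Estimate the amplitude of the sine component at the given frequency."""
    total = sum(q * math.sin(2 * math.pi * cycles_per_sample * i)
                for i, q in enumerate(quantized))
    return 2 * total / len(quantized)

random.seed(1)      # fixed seed so the run is repeatable
freq = 0.01         # test-tone frequency in cycles per sample
num_samples = 20_000

# Signal expressed in units of one quantization step; its peak is only 0.4 of a step.
signal = [0.4 * math.sin(2 * math.pi * freq * i) for i in range(num_samples)]

# Without dither, every sample rounds to zero and the tone disappears entirely.
plain = [round(x) for x in signal]

# With triangular dither (the difference of two uniform random numbers),
# the tone survives as a pattern in the noise.
dithered = [round(x + random.random() - random.random()) for x in signal]

print(recovered_amplitude(plain, freq), recovered_amplitude(dithered, freq))
```

The undithered version recovers an amplitude of exactly 0, while the dithered version recovers a value close to the original 0.4, at the cost of a slightly higher noise floor.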

 

Finally, Montgomery points out that if the loudest possible undistorted sound is defined as 0 dB, the quantization-noise floor is -96 dB with a bit depth of 16 bits. But this is the RMS noise floor of the entire broadband signal, and each hair cell in the inner ear is sensitive to a narrow fraction of the total bandwidth, which means the noise floor of each hair cell is much lower than -96 dB. With the use of dither, the article claims that the practical dynamic range of a 16-bit digital audio signal is actually more like 120 dB.

 

The article does acknowledge that using more than 16 bits is important during recording, mixing, and mastering to avoid clipping and allow digital signal processing without raising the noise floor to objectionable levels. But once the music is ready to be distributed, there is no reason to use more than 16 bits.

 

After all this theory, Montgomery cites some empirical tests performed by the Boston Audio Society (BAS) in which listeners were played high-resolution DVD-Audio and SACD content and the same content downsampled to 16/44.1 on the spot (no dithering), and they were asked to identify which was which. The tests were said to be conducted using high-end equipment in noise-isolated environments with both amateur and trained professional listeners. In over 500 trials, listeners chose correctly 49.8% of the time, which is no better than random chance.
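For readers who want to check the "no better than random chance" statement, the sketch below computes how likely pure guessing would be to do at least as well as the listeners did. It uses the figures of 276 correct answers in 554 trials quoted from the paper later in this thread; the function name and the choice of a one-sided exact binomial calculation are assumptions for this example.

```python
from math import comb

def chance_probability(correct, trials):
    """Probability that coin-flip guessing yields at least `correct` right answers."""
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials

trials = 554
correct = 276
accuracy_percent = 100 * correct / trials
p_value = chance_probability(correct, trials)

print(round(accuracy_percent, 2), p_value)
```

A probability above 0.5 means guessing would match or beat this score more often than not, so the result gives no evidence that listeners could tell the two versions apart. It says nothing, of course, about whether the test material was truly high-resolution.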

 

Argument Against the Proposition

One of the staunchest and longest-active advocates for high-resolution audio is Dr. Mark Waldrep, founder and chief engineer for AIX Records. Waldrep responds to part of the xiph.org article on his website, realhd-audio.com, in a post entitled "24-Bits Makes Sense!" Waldrep acknowledges that most pop/rock recordings and some classical and jazz recordings are subjected to dynamic-range compression, and that most commercial music does not exceed a dynamic range of 96 dB even without compression. But he has, in fact, recorded pieces that do exceed this dynamic range and thus benefit from 24-bit resolution.

 

In this graph, you can see the dynamic range of human hearing, a typical room, music, and an analog-audio signal. (Courtesy RealHD-Audio.com)

 

When I asked Waldrep about the xiph.org article, he said, "I agree with Monty that we do not derive any sonic benefit from sample rates higher than 96 kHz. But he's incorrect about the 24-bits claim. His statement that 16-bit CDs can deliver more than 96 dB requires some fancy dithering, which no one is actually doing in practice. CDs have the potential to achieve greater than 90 dB of dynamic range, but why not just shift to 24 bits, since the hardware and software are already there?"

 

Waldrep maintains that the CD, which has been around since 1982, is really hard to beat when it comes to convenience and fidelity. The format has the potential to eclipse analog tape and vinyl LPs, but only if the entire production chain is up to the task and the engineering/production team are focused on audio fidelity.

 

He goes on to say that moving to high-resolution PCM audio offers additional fidelity thanks to its increased specifications. In fact, 24/96 PCM provides an additional octave of frequency response and brings the dynamic range to the capability of human hearing. The fact that the ultrasonics included in high-resolution audio might be impossible to hear doesn't deter him.

 

"It's all about fidelity," he says. "If Wallace Roney is playing his trumpet with a Harmon mute that's outputting partials well above 20 kHz, and I have microphones and a complete signal path that can capture and reproduce those frequencies, shouldn't I include them? I'm giving back everything that was being performed. I'm not willing to arbitrarily roll off the ultrasonics because we haven't proven that humans can't hear them." This might be an intellectual argument, but there is some evidence that recording at higher than 44.1 or 48 kHz is perceptible in some way. The jury is still out on that, he says.

 

I asked Waldrep about the BAS study, and he dismissed it as being completely botched. According to him, "the examples that were evaluated came primarily from the major labels with a few audiophile recordings as well. The recordings were either DVD-Audio or SACD discs from the private collections of BAS members. This is where the issue of provenance becomes important." The term was first applied to the production history of audio recordings by Waldrep in 2007. "If the original sessions were recorded on analog tape, mixed to a stereo analog tape, and then mastered to yet another copy, the dynamic range would span about 10-12 bits! How were the listeners in the BAS study supposed to hear the difference between high-resolution audio and a downconverted CD version when both had the same dynamic range?

 

"And the same can be said for the frequency response. Even the new recordings from the Chesky label that were released on SACD had no frequencies above 20 kHz. The DSD 2.8224 MHz 1-bit format forces all of the 'in-band' noise above the upper range of human hearing in a process known as 'noise shaping.' This is the reason that DSD at higher rates have appeared—to push the noise out even further."

 

According to Waldrep, the BAS study was so seriously flawed that its conclusions are completely invalid. "If the listeners were attempting to discern a difference between two things that were essentially identical, of course the results would be the same as random choice! There weren't any real high-resolution audio titles among those that were auditioned."

 

So there you have it—high-resolution audio recordings are moving into the mainstream, thanks in large part to advances in recording and playback technology that make it relatively inexpensive to create, distribute, and reproduce. But does it offer any real, tangible benefit over CD? It's time for you to weigh in with your thoughts, opinions, and experiences. I look forward to following the discussion.

 



N8DOGG 04-30-2014 08:24 PM

If you go to Philips' site, you can try out the online golden ear challenge and try out some of the different bit rates for yourself. I breezed through the first 2 full sets but it gets pretty tough at the end before you get your certificate! : https://www.goldenears.philips.com/en/login.html

Eric Tatara 04-30-2014 08:37 PM

E. MEYER and D. MORAN, Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback, J. Audio Eng. Soc., Vol. 55, No. 9, 2007.

"The test results for the detectability of the 16/44.1 loop on SACD/DVD-A playback were the same as chance: 49.82%. There were 554 trials and 276 correct answers."

"There is always the remote possibility that a different system or more finely attuned pair of ears would reveal a difference. But we have gathered enough data, using sufficiently varied and capable systems and listeners, to state that the burden of proof has now shifted. Further claims that careful 16/44.1 encoding audibly degrades high resolution signals must be supported by properly controlled double-blind tests."

http://www.drewdaniels.com/audible.pdf

RLBURNSIDE 04-30-2014 08:54 PM

I believe we should record + store all audio in lossless 24/96, for the time when gene therapy or cybernetic implants can improve our listening to actually be able to tell the difference. No, I'm not kidding. That time is not so incredibly far off, compared to the long road towards posterity that lies beyond...

Perhaps one day those with "golden ears" will actually walk around with golden ears, for real. In clubs. And meet in secret :)

The big thing about 24/96 that excites me is the ability for stuff like Wisa to transmit this wirelessly uncompressed in such a way that interference is imperceptible, in a way that for example, a (substantial) scratch on a CD is not. It's a trainwreck.

What really pisses me off though, is that modern receivers all seem to have removed pre-outs, including the latest high res receivers with HDMI 2, to force those of us who'd like these new capabilities to pay upwards of two grand so we can use separates and/or powered speakers with the latest codecs. Arrrrgh, the industry is totally going in the wrong direction. Putting high res audio on low-end equipment that doesn't support powered speakers via line-level pre-outs at affordable prices is a real sham. And it's a collusion with HDMI, to force us to pay through the nose for this stuff. Without HDMI, we could have separate wires for audio or splitters that actually work, rather than force you to have your TV on if you want to play audio through your amp via HDMI. It's such...a...freaking...scam. And stuff like HD codecs for movies and lossless and these new high resolution formats are all part of that big picture, to fleece customers into paying more and upgrading more often than they need to or want to. It's the hardware equivalent of cable TV channel bundling, to get HBO you have to buy the shopping network and 25+ other channels you don't want. What if I want a receiver with high res audio and preouts but no amp section? Whoops, can't have that, even though it'd be cheaper to manufacture. Balanced XLR outputs? Forget that, you gotta pay 2 grand for that, even though differential encoding is half a century old (or more).

In an ideal world, we'd have an upgrade to optical audio connectors that'd support 7.1 or 9.1 in 24/96 and then we could ignore receivers' HDMI capability entirely, and just use them to switch between audio sources.

fierce_gt 04-30-2014 09:05 PM

when I listen to music, I don't really listen to it. when I first got into the hobby, I tried a few times to just sit in my room and play a cd and enjoy it, but I couldn't. so for me, it makes no sense, since I will literally NEVER pay full attention to any music. it will always be in the background of what is going on in my life at the time, whether it's me driving, getting some work done, enjoying some good times with some good friends. the sad truth is I really don't have anything to complain about when listening to MP3's, and rarely would I be able to tell you if I was listening to the radio, an mp3, or a cd.

so from my point of view, I see a niche market for the real music lovers that will actually enjoy immersing themselves in nothing but the music, and a market for those techno hipster teenagers that will buy them because they are 'better' and it will become as much about the image as it is the music.

wgscott 04-30-2014 09:28 PM

The case for 24-bit audio (vs. 16 bit) I think is somewhat stronger than that for high sampling frequency.

Although 24 bits is inarguably overkill, 20 could provide a useful margin of error. 24 bit also gives some headroom for DSP and digital volume control.

The case for sampling at twice the Shannon cutoff for audible frequencies (88.2kHz or 96kHz, depending on the source) is a lot weaker, but the ability to move noise and aliasing artifacts well away from the audible range, and simultaneously avoiding Fourier truncation artifacts from applying a steep filter, do make some sense. The idea of not throwing away any of the data is key in image reconstruction; by analogy it seems at worst to be a harmless indulgence of audio paranoia. Anything beyond 88.2/96 kHz sampling frequency seems completely pointless. I doubt most microphones record above 25 to 30 kHz anyway (corresponding to sampling at 50 to 60 kHz). Most speakers used for playback hardly extend to 20 kHz. My hearing goes to 17 kHz if a mosquito buzzes my ear canal.

One of the problems with testing one's ability to hear any difference in "high res" music is that so many of the commercially available tracks are fake -- up sampled redbook. I've seen and unfortunately purchased a bunch of examples from HDtracks, and even some of Neil Young's supposedly high res stuff on DVD is bricked at 44.1 kHz.

Tank_PD 04-30-2014 09:58 PM

Can't really find very much of the music I like so, essentially yes. :D

Player3 04-30-2014 11:51 PM

Allen Sides would say high-resolution audio is relevant. I'm having what he's having. LOL, but seriously. 8)

Scott Wilkinson 05-01-2014 12:10 AM

Quote:
Originally Posted by Eric Tatara

E. MEYER and D. MORAN, Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback, J. Audio Eng. Soc., Vol. 55, No. 9, 2007.

"The test results for the detectability of the 16/44.1 loop on SACD/DVD-A playback were the same as chance: 49.82%. There were 554 trials and 276 correct answers."

"There is always the remote possibility that a different system or more finely attuned pair of ears would reveal a difference. But we have gathered enough data, using sufficiently varied and capable systems and listeners, to state that the burden of proof has now shifted. Further claims that careful 16/44.1 encoding audibly degrades high resolution signals must be supported by properly controlled double-blind tests."

http://www.drewdaniels.com/audible.pdf


This seems to be the very test that Monty Montgomery and Mark Waldrep refer to as recounted in the OP. I scanned through this paper, and nowhere did I find a list of the SACD/DVD-A titles they used, which means we have no way to verify their provenance (how they were recorded) and whether or not they actually contained frequencies beyond 20 kHz or a dynamic range beyond 96 dB. As Waldrep is quoted as saying in the OP, the high-resolution selections used in these tests did not go beyond these limits, so of course the results were no better than random chance, because there was effectively no difference between the high-res and CD versions. What is needed is another such test in which the high-res selections are verified to contain that extra octave of frequencies and a wider dynamic range than CDs are capable of.


Phrehdd 05-01-2014 02:32 AM

Quote:
Originally Posted by wgscott

The case for 24-bit audio (vs. 16 bit) I think is somewhat stronger than that for high sampling frequency.

Although 24 bits is inarguably overkill, 20 could provide a useful margin of error. 24 bit also gives some headroom for DSP and digital volume control.

The case for sampling at twice the Shannon cutoff for audible frequencies (88.2kHz or 96kHz, depending on the source) is a lot weaker, but the ability to move noise and aliasing artifacts well away from the audible range, and simultaneously avoiding Fourier truncation artifacts from applying a steep filter, do make some sense. The idea of not throwing away any of the data is key in image reconstruction; by analogy it seems at worst to be a harmless indulgence of audio paranoia. Anything beyond 88.2/96 kHz sampling frequency seems completely pointless. I doubt most microphones record above 25 to 30 kHz anyway (corresponding to sampling at 50 to 60 kHz). Most speakers used for playback hardly extend to 20 kHz. My hearing goes to 17 kHz if a mosquito buzzes my ear canal.

One of the problems with testing one's ability to hear any difference in "high res" music is that so many of the commercially available tracks are fake -- up sampled redbook. I've seen and unfortunately purchased a bunch of examples from HDtracks, and even some of Neil Young's supposedly high res stuff on DVD is bricked at 44.1 kHz.

Please note I am not challenging you but your post caught my eye with respect to your comment about HDtracks. You suggest that some of their offerings are up-sampled. Could you explain further how you came to this conclusion please.

Also - For any and all here, if 44.1/16 is considered all that is needed, why do Blu Ray and such offer 48/24 and 96/24 audio? - This is an honest question here not a challenge.

dabotsonline 05-01-2014 04:45 AM

Qobuz as an online store for 24/192 shouldn't be overlooked, either. I know it isn't available in the US per se but with a Smart DNS service such as Overplay's that becomes a moot point.

arnyk 05-01-2014 04:48 AM

Quote:
Originally Posted by wgscott

The case for 24-bit audio (vs. 16 bit) I think is somewhat stronger than that for high sampling frequency.

Although 24 bits is inarguably overkill, 20 could provide a useful margin of error. 24 bit also gives some headroom for DSP and digital volume control.

In general 16 bits has from 10 to 20 dB of built in headroom when one is recording live performances. Rooms, particularly ones with real live humans in them, are relatively noisy. Live recordings tend to top out with around 70 dB dynamic range. I've personally made 24/96 recordings with more dynamic range - almost 90 dB but the conditions were very artificial. It is also true that with good noise shaping 16 bits can have an effective dynamic range of over 110 dB.

Waldrep says this: "The focus of his discussion regarding 24-bits for downloaded files is based on the recognition of the status quo and the limits that have been accepted by (or forced on) recording engineers and producers." and that tells me more about his lack of real world experience and understanding of how to exploit the digital domain than anything else.

While lots of dynamic range gets done away with during production, the real problem is the real world as both a recording and playback venue. Recordings with reduced dynamic range meet a real world need for as pleasurable listening as possible in settings with dynamic range issues of their own. If everybody listened to music and nothing else and did it in 0 dB quiet rooms there would be a vastly reduced need for recordings with reduced dynamic range.
Quote:
The case for sampling at twice the Shannon cutoff for audible frequencies (88.2kHz or 96kHz, depending on the source) is a lot weaker, but the ability to move noise and aliasing artifacts well away from the audible range, and simultaneously avoiding Fourier truncation artifacts from applying a steep filter, do make some sense.

It might make sense as long as, again, one stays away from the real world. There is one situation where very high sample rates (e.g. 96 KHz) make sense, and that is for using audio production tools that apply heavy nonlinear distortion to the music, as is sometimes done with techno and other kinds of electronic music. Normal music production never goes there.

The fact is that everybody who listens to mainstream media is probably getting a fairly steady diet of music with a far lower brick wall filter than 22 KHz. That's because one easy way to make lossy compression as sonically transparent as possible is to brick wall filter the music at 16 KHz. So it gets done a lot. Notice it? In extensive DBT testing of audio processing it has been found that fairly egregious processing centered at 22 KHz is generally innocuous. There are many reasons for this, one being the simple fact that the energy in real world music decreases as the frequencies go up, and by the time you get to 13 KHz you can throw away what you want, but you aren't throwing away much.
Quote:
The idea of not throwing away any of the data is key in image reconstruction; by analogy it seems at worst to be a harmless indulgence of audio paranoia.

In fact ultra high frequencies are meaningless for reconstruction of sonic scenes. In fact bandwidth wasted on poorly chosen sample rates can be more profitably invested in high quality processing at lower frequencies that can actually be heard.
Quote:
Anything beyond 88.2/96 kHz sampling frequency seems completely pointless. I doubt most microphones record above 25 to 30 kHz anyway (corresponding to sampling at 50 to 60 kHz). Most speakers used for playback hardly extend to 20 kHz. My hearing goes to 17 kHz if a mosquito buzzes my ear canal.

Actually a large proportion of all professional microphones start rolling off an octave lower - at 10 or 12 KHz.

This is the manufacturer's FR spec for what may be the most widely used professional microphone in the world:

http://cdn.shure.com/specification_sheet/upload/82/us_pro_sm58_specsheet.pdf



It is pretty typical.
Quote:
One of the problems with testing one's ability to hear any difference in "high res" music is that so many of the commercially available tracks are fake -- up sampled redbook. I've seen and unfortunately purchased a bunch of examples from HDtracks, and even some of Neil Young's supposedly high res stuff on DVD is bricked at 44.1 kHz.

True story. This is a funny story for those of us who are realistic about sample rates. It turns out that about half of all DVD-A and SACD recordings were upsampled from sources with far lower sample rates and less resolution. Until some meter readers discovered it, nobody detected it. All of the high end music reviewers in the world ranted and raved about their favorite so-called hi-rez recordings, when in fact their resolution was lost decades ago when they were originally recorded. The immutable laws of physics say that once resolution is lost, it is gone forever.
Quote:
Please note I am not challenging you but your post caught my eye with respect to your comment about HDtracks. You suggest that some of their offerings are up-sampled. Could you explain further how you came to this conclusion please.

Somebody who was tasked with transcribing a lot of SACDs and DVD-As to digital files spilled the truth. This was after some traditional audio authorities took a peek at their own transcriptions of the same media and found clear evidence of brick wall filtering in the 20-30 KHz range.
Quote:
Also - For any and all here, if 44.1/16 is considered all that is needed, why do Blu Ray and such offer 48/24 and 96/24 audio? - This is an honest question here not a challenge.

It's simple. It is well known that you can fool an ignorant public including most high end audio reviewers with a meaningless display of larger numbers. Audio CDs and CD players had become mere commodity items and could no longer be sold for outlandish prices. A great amount of money had been made relicensing and reselling zillions of people music they had already bought once by selling them CDs that duplicated the LPs and tapes that they already owned. Been there, done that and with a big smile on my face.

People were betting that this profitable lightning based on re-issues would strike again in the same marketplace. The thing is, CDs generally sounded appreciably better than the LPs and tapes that they replaced. The SACDs and DVD-As not so much, even if they were re-mastered.

(I think I messed up some of the attributions of the above posts- my apologies in advance)

Scarpad 05-01-2014 04:55 AM

No my old ears don't hear those freq anymore, if they ever did, but there is a case for offering all downloads in lossless format not compressed and I support that. And in creating a player with enough physical storage to hold them.

WayneJoy 05-01-2014 04:58 AM

Sometimes high resolution audio is mastered with less dynamic range compression which makes most of the difference in the audio quality between that and the CD. I definitely am in favor of lossless downloads.

FMW 05-01-2014 05:38 AM

The bit depth controversy is interesting to me. While I've done the bias controlled tests to discover for myself that 16/44 is good enough, I doubt I've ever had a chance to work with something that has a dynamic range greater than 96db. The counter argument that greater bit depths are valuable may well be true but moot because we don't have any source material with enough dynamic range to take advantage of it. Perhaps I'll put on a CD of the 1812 Overture today and measure the start against the cannon shots to find out if it even gets to 96 db.
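
A measurement like the one proposed here can be sketched as follows: split the track into short windows, take the RMS level of each in dB relative to full scale, and compare the loudest with the quietest. This is only a rough sketch under my own assumptions: the function names and the 0.1 s window are made up for illustration, the samples are assumed to be floats with full scale at 1.0, and digitally silent windows are skipped. Reading the samples off an actual CD rip is left out.

```python
import math

def rms_dbfs(samples):
    """RMS level of float samples (full scale = 1.0) in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)

def level_range_db(samples, window):
    """Difference between the loudest and quietest non-silent window, in dB."""
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        level = rms_dbfs(samples[start:start + window])
        if level != float("-inf"):
            levels.append(level)
    return max(levels) - min(levels)

# Synthetic stand-in for a track: one quiet second, then one loud second,
# with amplitudes chosen to be a factor of 1000 (60 dB) apart.
rate = 44100
quiet = [0.001 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
loud = [1.0 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
print(round(level_range_db(quiet + loud, window=4410), 1))
```

Note that the quietest window of a real recording is usually room or tape noise rather than music, so a number obtained this way says more about the noise floor than about usable dynamic range.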

arnyk 05-01-2014 06:08 AM

Quote:
Originally Posted by WayneJoy View Post

Sometimes high resolution audio is mastered with less dynamic range compression which makes most of the difference in the audio quality between that and the CD. I definitely am in favor of lossless downloads.

Adjustments to dynamics are a normal part of remastering and unlike some of the other issues there is no doubt that it makes things sound different. IME most remastering has the effect of retailoring the sound of the recording for more modern tastes. A lot of SACDs and DVDs were remastered and at least sounded different and possibly better (whatever that means) from earlier versions for that reason.

arnyk 05-01-2014 06:10 AM

Quote:
Originally Posted by FMW View Post

The bit depth controversy is interesting to me. While I've done the bias controlled tests to discover for myself that 16/44 is good enough, I doubt I've ever had a chance to work with something that has a dynamic range greater than 96db. The counter argument that greater bit depths are valuable may well be true but moot because we don't have any source material with enough dynamic range to take advantage of it. Perhaps I'll put on a CD of the 1812 Overture today and measure the start against the cannon shots to find out if it even gets to 96 db.

If it's the Telarc version probably not so much. IME the classical recordings with the widest dynamic range around come from Bis - a Swedish classical house, if memory serves.

FMW 05-01-2014 06:24 AM

Quote:
Originally Posted by arnyk View Post

If it's the Telarc version probably not so much. IME the classical recordings with the widest dynamic range around come from Bis - a Swedish classical house, if memory serves.

I wonder if there is a 24 bit wav file I can buy or download with maximum range. It would be interesting to compare it to a downsampled red book copy to get a sense of the difference.

imagic 05-01-2014 06:52 AM

Now is the moment when I admit... I bought 95% of my music through iTunes and unless I'm allowed to upgrade it to high-resolution for free, it'll remain compressed 16/44. If I had to choose, I'd prefer lossless 16/44 to lossy 24/96.


arnyk 05-01-2014 07:13 AM

Quote:
Originally Posted by FMW View Post

Quote:
Originally Posted by arnyk View Post

If it's the Telarc version probably not so much. IME the classical recordings with the widest dynamic range around come from Bis - a Swedish classical house, if memory serves.

I wonder if there is a 24 bit wav file I can buy or download with maximum range. It would be interesting to compare it to a downsampled red book copy to get a sense of the difference.

This is the hybrid SACD set of the recordings that I am familiar with.

http://www.amazon.com/Beethoven-The-Symphonies-Ludwig-van/dp/B002QEXN6Q

I did some looking around for downloadable hi-rez files for these selections and well, give it a try; you might do better than I did.

The recommended procedure is to downsample the 24 bit .wav or flac file to 16/44 (Redbook) and then immediately upsample it back to the original format. ABX the two files with say Foobar2000 with the ABX plug in. That way you are sure that the mastering is the same.
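
To make the round trip concrete, here is a rough pure-Python sketch of what it does to the data. Big assumptions, all mine: the source is at 88.2 kHz so the trip to 44.1 kHz and back is an exact factor of 2, the filter is a simple Hann-windowed sinc with made-up length and cutoff, and the 16-bit step is plain rounding with no dither. A real conversion of a 96 kHz file needs a proper resampler, so this only illustrates the idea and is no substitute for one.

```python
import math

def lowpass_taps(num_taps=127, cutoff=0.22):
    """Hann-windowed sinc low-pass; cutoff in cycles per sample (0.25 = 22.05 kHz at 88.2 kHz)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC

def fir_filter(signal, taps):
    """FIR filter with the output time-aligned to the input; edges are zero-padded."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += t * signal[j]
        out.append(acc)
    return out

def round_trip(signal, bits=16):
    """88.2 kHz -> 44.1 kHz at `bits` resolution -> back to 88.2 kHz."""
    taps = lowpass_taps()
    low = fir_filter(signal, taps)[::2]            # low-pass, then keep every other sample
    scale = 2 ** (bits - 1) - 1
    low = [round(s * scale) / scale for s in low]  # requantize (no dither in this sketch)
    stuffed = []
    for s in low:
        stuffed.extend((s, 0.0))                   # zero-stuff back to the original rate
    return [2 * s for s in fir_filter(stuffed, taps)]  # filter out the image, restore gain

rate = 88200
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(1200)]
back = round_trip(tone)
worst = max(abs(a - b) for a, b in zip(tone[200:-200], back[200:-200]))
print(f"largest difference away from the edges: {worst:.5f}")
```

The file that comes back has the original sample rate and container but only Redbook-grade content, which is what makes the ABX comparison fair.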

wgscott 05-01-2014 07:35 AM

Quote:
Originally Posted by Phrehdd View Post

Please note I am not challenging you but your post caught my eye with respect to your comment about HDtracks. You suggest that some of their offerings are up-sampled. Could you explain further how you came to this conclusion please.

Also - For any and all here, if 44.1/16 is considered all that is needed, why do Blu Ray and such offer 48/24 and 96/24 audio? - This is an honest question here not a challenge.

In my case, I had a friend and senior colleague over for dinner. He was in his late 60s, and in addition to being a well-regarded scientist (National Academy, etc.), he is an amateur musician. He really likes Jazz, and mostly has vinyl and some CDs (his lifetime collection). I told him that computer audio offered the possibility of "better than CD" playback, and since I don't know anything about jazz, he suggested some titles and I found one, called Lush Life by John Coltrane on HDtracks. I purchased it and downloaded it and he listened politely.

Filled with expectation bias, I asked him what he thought. He said it sounded like a cheap CD or some such thing.

Since we both do macromolecular X-ray crystallography, he suggested I should do a FFT on the download and see what it looked like. After a bit of Googling, I found and downloaded Audacity, which is an open-source app that (among many other things) lets you do such analyses. Here is what I got:



It turns out there are many other examples. We started posting these sorts of analyses here.
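
For those who would rather script the check than eyeball a plot, the idea can be sketched in a few lines: take the spectrum of a block of samples and see whether anything above the CD Nyquist frequency rises out of the floor. The function names, the 22.05 kHz cutoff and the 60 dB margin are my own illustrative choices and not anything Audacity does internally. A genuine high-res recording of a quiet instrument could also trip this test, so it is a hint and not proof.

```python
import cmath
import math

def spectrum_db(samples, rate):
    """Naive DFT of one Hann-windowed block; returns (frequencies in Hz, levels in dB)."""
    n = len(samples)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(samples)]
    freqs, levels = [], []
    for k in range(n // 2):
        acc = sum(w * cmath.exp(-2j * math.pi * k * i / n) for i, w in enumerate(windowed))
        freqs.append(k * rate / n)
        levels.append(20 * math.log10(abs(acc) / n + 1e-12))
    return freqs, levels

def looks_upsampled(samples, rate, cutoff_hz=22050.0, margin_db=60.0):
    """True if everything above cutoff_hz sits at least margin_db below the spectral peak."""
    freqs, levels = spectrum_db(samples, rate)
    peak = max(levels)
    above = [lvl for f, lvl in zip(freqs, levels) if f > cutoff_hz]
    return max(above) < peak - margin_db

rate = 96000
# Synthetic stand-ins: a 1 kHz tone alone, and the same tone plus a 30 kHz
# component 40 dB lower, as a real wide-band recording might contain.
tone_only = [0.5 * math.sin(2 * math.pi * 1000 * i / rate) for i in range(512)]
with_ultrasonic = [s + 0.005 * math.sin(2 * math.pi * 30000 * i / rate)
                   for i, s in enumerate(tone_only)]
print(looks_upsampled(tone_only, rate), looks_upsampled(with_ultrasonic, rate))
```

The naive DFT is slow, so this only suits short blocks; for whole tracks an FFT tool is the practical choice.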

HDtracks withdrew Lush Life but after it became apparent that this was a widespread problem, I started getting a lot of criticism elsewhere, questioning my professional competence; apparently this was the guy who makes a lot of these files (I know absolutely nothing about the music industry or how any of this works.)

The amount of bullshi'ite in this hobby is quite staggering.

Eric Tatara 05-01-2014 07:37 AM

Quote:
Originally Posted by Scott Wilkinson View Post

we have no way to verify their provenance (how they were recorded) and whether or not they actually contained frequencies beyond 20 kHz or a dynamic range beyond 96 dB.
A valid criticism. It's unfortunate that the authors did not indicate the exact details of both the recordings and playback equipment as this limits reproducibility of the experiments.
Quote:
Originally Posted by Scott Wilkinson View Post

What is needed is another such test in which the high-res selections are verified to contain that extra octave of frequencies and a wider dynamic range than CDs are capable of.
Also true. However, the cited study still provides value in that it sets a benchmark by which counter-claims must be evaluated.

On this topic, Meyer and Moran cited previous studies that investigated the audibility of high frequency components, e.g.:

T. Nishiguchi, K. Hamasaki, M. Iwaki, and A. Ando, Perceptual Discrimination between Musical Sounds with and without Very High Frequency Components, presented at the 115th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 51, p. 1222 (2003 Dec.), convention paper 5876. (http://www.aes.org/e-lib/browse.cfm?elib=12955)

in which they found mixed results in subjects' ability to discern high frequency components. A newer version of their work is published in:

T. Nishiguchi, K. Hamasaki, K. Ono, M. Iwaki, and A. Ando, Perceptual discrimination of very high frequency components in wide frequency range musical sound, Applied Acoustics, 70:7, 921-934, 2009. (http://www.sciencedirect.com/science/article/pii/S0003682X09000036)

I have these full articles, but unfortunately I could not find any links to non-subscription versions online, so I quote the conclusions from their 2009 paper:

"This paper described three subjective evaluation experiments. In the first experiment, thirty-six subjects evaluated 20 kinds of stimulus, and each stimulus was evaluated 40 times in total. The results showed no significant difference among the sound stimuli, but one subject attained a significant difference in a binomial test at a significance level of 0.05. In order to confirm the reliability of this result, a supplementary test with this subject was conducted. This subject evaluated six different sound stimuli 20 times. As a result, no significant difference was found among the six sound stimuli. Therefore, we concluded that this subject could not discriminate between these sound stimuli with and without very high frequency components."

"After consideration of the above results and the previous studies [8,9], we conducted further subjective evaluation tests. The durations of sound stimuli were different in each test: the shorter duration was around 20 s, and the longer one was around 2 min. Sound stimuli were re-recorded with a newly developed very wide frequency range microphone in order to sufficiently capture the very high frequency components. The results of these tests showed that two subjects achieved a correct answer rate at a significance level of 0.05 for one sound stimulus, 'Chikuzen-Biwa', of the long duration. To examine the results, the non-linear distortion level within the auditory band caused by reproduction of the very high frequency band was measured. From this measurement, there was only 25 dB SPL of non-linear distortion within the audible frequency band in the sound stimulus for which the two subjects achieved a significant rate of correct answers on the binomial test. According to these results, the subjects could discriminate with and without very high frequency components in the musical sound stimuli. On the other hand, no significant difference was found for the other sound stimuli of both short and long duration. The hearing threshold above 20 kHz of the two subjects was also measured, and it was found that their hearing thresholds were below 22 kHz. According to the binomial test, two subjects, who could not hear the pure tone above 22 kHz, perceived differences between audio signals with and without a higher frequency band above 22 kHz only for a longer stimulus with the highest level of very high frequency components. We have no hypothesis or scientific reason that can explain this finding. Previous studies also reported that humans could perceive ultrasound by bone conduction; there is no established theory on the mechanism of auditory perception of ultrasound by bone conduction [28]. It has also been suggested that humans have an unknown information channel for very high frequency sound that is not air conduction hearing [29]. Additional studies are required in order to have further discussion about this potential phenomenon."

Bold emphasis mine. So it seems that, if the signal actually does contain ultrasonic components, some people can tell the difference between the original with HF and a signal with the high frequencies filtered out. I found the last few sentences in their conclusion fascinating. Nishiguchi et al. reference the work of Honda et al., a later version of which is published in Brain Research:

T.Oohashi, N. Kawai, E. Nishina, M. Honda, R. Yagi, S. Nakamura, M. Morimoto, T. Maekawa, Y. Yonekura, and H. Shibasaki, The role of biological system other than auditory air-conduction in the emergence of the hypersonic effect, Brain Research, 1073–1074, 339-347, 2006. (http://www.sciencedirect.com/science/article/pii/S0006899305019499)

This is a remarkable article to read. Oohashi et al. essentially re-create the experiment of Meyer and Moran, but evaluate the EEG of participants rather than only a self-reported response of perceived differences. Again, I didn't find a freely available version online to share, but I quote from their article:

Note: HFC/LFC = high/low frequency components.

"It is reasonable to consider, therefore, that the hypersonic effect detected by the increase in the power of the alpha-EEG in the present study reflects the activation of the deep-lying brain structure, including the brainstem and thalamus. These brain areas, containing distinct neuronal groups that are the major source of the monoaminergic projections to various parts of the brain (Role and Kelly, 1991), introduce approaching behaviors and are considered to be intimately connected with registering pleasurable sensations (Thompson, 1988)."

"The point of the present experimental design is to focus on the fact that the hypersonic effect does not emerge at the presentation of HFC alone but it emerges only when HFC and LFC were simultaneously presented. Therefore, we compared the listeners' responses under two experimental conditions: FRS and LFC alone in an experimental setting that clearly showed whether the air-conducting auditory system or another biological system was involved. The subjects did not perceive HFC alone as a sound when it was presented either through speakers or earphones. Nevertheless, when both the LFC and HFC were simultaneously presented through speakers not only to the conventional air-conducting auditory system but also to a large area of the body surface, which might have another sensing system, the power of the spontaneous EEG activity of the alpha 2 range was significantly enhanced and the subjects spontaneously adjusted the sound to a significantly greater magnitude for more comfortable listening than when LFC alone was presented through the speakers, indicating the emergence of the hypersonic effect in the given experimental setting. These results are in complete agreement with our previous reports (Oohashi et al., 1991, Oohashi et al., 2000, Oohashi et al., 2002, Yagi et al., 2002, Yagi et al., 2003a and Yagi et al., 2003b).

More importantly, the emergence of the hypersonic effect was not observed in any of the two indexes when both the LFC and HFC were presented selectively to the air-conducting auditory system through the earphones. Since a setting in which both LFC and HFC are presented selectively to the air-conducting auditory system without disturbance is ideal, we would expect to observe the hypersonic effect if it could be induced by the air-conducting auditory system alone. However, the fact that absolutely no hypersonic effect was observed under this condition demonstrates that the air-conducting auditory system does not respond to HFC."

So, we cannot "hear" high frequency components, yet the brain is somehow able to respond to HFC in the presence of LFC. The conclusion also suggests that using headphones diminishes any benefit of high-rez audio with respect to HFCs.

David James 05-01-2014 07:55 AM

The studies referenced are far from universally accepted. The results themselves are questionable and even then, the general results of the study have never been reproduced.

commsysman 05-01-2014 08:08 AM

I have seen many studies put forward that claim to prove this or that about audio performance or A/B testing. The claims never stand up to proper scrutiny.

When you look at the methodology and number of test subjects etc. etc., every one of them falls so far short of the requirements for a scientifically valid study that they are laughable.

Most of these studies seem to be trying to imply that they have proved a negative thesis anyway, which is a logical impossibility.

A study which says they "were unable to detect any difference between A and B" does NOT prove that there WAS no difference; only that the study proved nothing.


Studies so poorly conceived and executed would be ripped to pieces if they were subjected to the kind of scientific scrutiny that a study receives when it is submitted as part of a Doctoral Thesis.

Too bad people don't try to learn about the nature of scientific methodology before they waste a lot of time spinning their wheels in amateurish silly games.

arnyk 05-01-2014 08:08 AM

Quote:
Originally Posted by wgscott View Post


Since we both do macromolecular X-ray crystallography, he suggested I should do a FFT on the download and see what it looked like. After a bit of Googling, I found and downloaded Audacity, which is an open-source app that (among many other things) lets you do such analyses. Here is what I got:



It turns out there are many other examples. We started posting these sorts of analyses here.

HDtracks withdrew Lush Life but after it became apparent that this was a widespread problem, I started getting a lot of criticism elsewhere, questioning my professional competence; apparently this was the guy who makes a lot of these files (I know absolutely nothing about the music industry or how any of this works.)

The amount of bullshi'ite in this hobby is quite staggering.

The first place I saw this sort of thing revealed was here:

www.davidgriesinger.com/intermod.ppt (slides 24-28)

This paper was presented to the AES 24th International Conference, Banff, Alberta, Canada in 2003.

BTW:

http://www.nhk.or.jp/strl/publica/labnote/lab486.html

"Thirty-six subjects evaluated 20 kinds of stimulus, and each stimulus was evaluated 40 times in total. The results showed no significant difference among the sound stimuli, but that the correct response rate for three sound stimuli was close to the significance probability (5% level). Furthermore, it showed that one subject attained to a 75% correct response rate, which indicated a significant difference. In order to confirm the reliability of this result, a strict statistical supplementary test with this subject also was conducted. This subject evaluated 20 times over six kinds of sound stimulus. As a result, no significant difference was found among the six sound stimuli. Therefore, it is concluded that this subject could not discriminate between these sound stimuli with and without very high frequency components."

It is a truism that it is difficult or impossible to prove a negative hypothesis. That > 20 KHz audio information makes no audible difference is a negative hypothesis. However, when enough people try hard to prove the positive hypothesis that there was a difference and fail, it's got to at least be a good guide.
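
To put numbers on the statistics in the NHK summary, here is a small sketch of the binomial arithmetic behind a forced-choice listening test. The 15-of-20 case is just my illustration of a 75% score, since the quote does not say how many trials that subject's first run had. The second function assumes every listener is an independent guesser whose false-positive rate is exactly 5%, which is a simplification.

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided probability of getting at least `correct` right by coin-flip guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def chance_of_false_positive(subjects, alpha=0.05):
    """Probability that at least one pure guesser among `subjects` reaches p <= alpha."""
    return 1 - (1 - alpha) ** subjects

print(f"15 of 20 correct: p = {abx_p_value(15, 20):.4f}")
print(f"14 of 20 correct: p = {abx_p_value(14, 20):.4f}")
print(f"36 guessers, at least one 'significant': {chance_of_false_positive(36):.2f}")
```

Under those assumptions, a panel of 36 pure guessers is more likely than not to produce one apparently significant listener, which is a good reason to retest that listener, as the NHK group did.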

The fact that so many so-called hi-rez recordings were generally accepted as being sonically beneficial even though technical scrutiny shows that there was no difference has to mean something.

wgscott 05-01-2014 08:14 AM

This is the strangest one I have come across (a Kodama Beethoven Piano Sonata from HDtracks). Any idea what is going on with this? Bis/eClassical has the same thing and it looks perfectly fine.

WTF?

wgscott 05-01-2014 08:20 AM

Quote:
Originally Posted by arnyk View Post


The fact that so many so-called hi-rez recordings were generally accepted as being sonically beneficial even though technical scrutiny shows that there was no difference has to mean something.

It is a triple-blind test. :D

arnyk 05-01-2014 08:22 AM

Quote:
Originally Posted by wgscott View Post

Quote:
Originally Posted by arnyk View Post


The fact that so many so-called hi-rez recordings were generally accepted as being sonically beneficial even though technical scrutiny shows that there was no difference has to mean something.

It is a triple-blind test. :D

Right, and the number of test subjects has to satisfy any reasonable person.

Eric Tatara 05-01-2014 08:28 AM

Quote:
Originally Posted by David James View Post

The studies referenced are far from universally accepted. The results themselves are questionable and even then, the general results of the study have never been reproduced.

I'm not taking a position in favor of high-rez audio, but the Wikipedia article is both out of date and logically inconsistent. The NHK group never attempted to reproduce the results of Oohashi et al., so to say the results are not reproducible is incorrect (Wikipedia's argument, not yours). Second, the NHK group has a more recent 2009 article that I cited in which they used a new experiment and did report subjects' ability to distinguish signals with HF filtered from the original.

wgscott 05-01-2014 08:31 AM

Quote:
Originally Posted by arnyk View Post

Right, and the number of test subjects has to satisfy any reasonable person.

You would be amazed (or maybe not) at the number of people who say they hear differences (always an alleged improvement) in the demonstrably up-sampled music, and this only proves that you have to listen with your ears, rather than use your eyes to look at graphical output of spectral analyses.


All times are GMT -7.
