Originally Posted by hellokeith
You have talked before about jitter, in respect to real-time optical audio playback. I went back and read through some of those posts including all the posts by Shore.. that guy is WAY over my head.
Ok, let me explain it one more time to get others caught up on what is buried in other threads.
We all know that the goal of our systems is to convert the analog signal to digital, process it, transport it, and then convert it back to analog. The first stage is accomplished by an analog-to-digital converter (A/D) and the latter by a digital-to-analog converter (D/A).
The A/D samples the analog input at regular intervals as dictated by the sampling rate (which is usually 48 kHz in video). The math says that we can 100% replicate the analog signal as long as we only try to preserve half the bandwidth represented by the sampling rate (48/2 = 24 kHz in our example). The math, however, assumes perfect timing at both ends. If the intervals were exactly 0.00001112 seconds at the input (I am making up this number), then the D/A converter had better convert that digital sample at exactly 0.00001112, not 0.00001113 or 0.00001111.
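As a rough illustration, here is a minimal NumPy sketch of that idea (my own illustration, not code from any real player): sample a 1 kHz tone at 48 kHz, then rebuild the waveform between the samples with sinc interpolation, which is what the sampling theorem promises for signals below 24 kHz, provided the timing is exact.

```python
import numpy as np

fs = 48_000                     # sampling rate (Hz)
T  = 1.0 / fs                   # the ideal, perfectly regular sample interval
f  = 1_000                      # test tone (Hz), well below fs/2 = 24 kHz

n = np.arange(480)              # 10 ms worth of samples
sample_times = n * T            # every sample lands at exactly n*T
samples = np.sin(2 * np.pi * f * sample_times)   # what the A/D stores

# Reconstruct on a 10x finer time grid: each stored sample contributes a
# sinc pulse centered on its (exact) sample instant.
t_fine = np.arange(4800) * (T / 10)
recon  = np.sinc((t_fine[:, None] - sample_times[None, :]) / T) @ samples
ideal  = np.sin(2 * np.pi * f * t_fine)

mid = slice(1920, 2880)         # look away from the edges of the finite window
print("max error vs. the original tone:", np.max(np.abs(recon - ideal)[mid]))
# Small compared to the full-scale signal, and only because every sample
# instant was honored exactly. Jitter breaks exactly that assumption.
```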
What happens unfortunately in real life is that there is no way to achieve 100% accuracy. Let's take the S/PDIF or Toslink digital connection. Both of these are serial interfaces, meaning there is only a single wire carrying the pulses representing the source digital stream. Buried in the same signal is a clock which determines the timing mentioned above. A circuit called a PLL (phase-locked loop) is used to extract this clock, which tells us when we should look at each sample on the wire. Yes, the samples are digital on this wire. But their timing is not! The timing is captured by the receiver based on the PLL accuracy, which itself depends on the quality of the PLL and the signal quality on the wire. If you use a cheap and long cable, the digital pulses lose fidelity and the timing detected at the receiver changes.
Now if the timing just changes by a fixed delay, that is OK. The signal will only be delayed but will not become distorted. Unfortunately, what we get is not this, but a signal whose timing changes all the time and sometimes at very high frequency. That is, the arrival time of the input pulses jumps forward and back within a range and at a certain speed (frequency). We call this variation "jitter" and we characterize it by its frequency and amplitude.
Obviously if the timing variation (amplitude) is one billionth of a second and the frequency is 1 Hz (i.e. once a second), you are not going to hear any distortion. But at certain frequencies and certain magnitudes it becomes audible.
Where is the distortion coming from? The variation in timing means that the input source is getting warped back and forth at every one of those points in time. Imagine taking points on a graph and tugging them left and right with a certain pattern (frequency) and amount (amplitude). The curve starts to look different, right? The simplest case is perhaps the easiest to see. Imagine the source being a straight line at 45 degrees. If you take points along it and nudge them left and right a bit (which for a 45-degree line is the same as nudging them up and down), the line is no longer ruler straight.
In the example above, you can see how we go from a source that was linear (i.e. it simply increased in value over time) to a curve that is no longer straight, i.e. non-linear. Non-linearity means harmonic distortion, just like the quantization distortion we talked about. Putting it all together: jitter causes harmonic distortion, with the amount dictated by the characteristics of the jitter.
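To put a number on it, here is a small NumPy simulation (my own sketch with made-up values, not anything from a real product): a 10 kHz tone is sampled at instants that wobble back and forth by 1 ns (amplitude) at a 1 kHz rate (frequency), while everything downstream treats the values as if they sat on a perfect 48 kHz grid. The jitter shows up as sideband tones at 9 kHz and 11 kHz that were never in the source, and their level matches the commonly quoted small-jitter estimate of 20*log10(pi*f*J) relative to the tone.

```python
import numpy as np

fs  = 48_000          # sampling rate (Hz)
N   = 48_000          # one second of samples, so FFT bins land on exact 1 Hz steps
f   = 10_000          # audio tone (Hz)
f_j = 1_000           # jitter frequency (Hz)
J   = 1e-9            # jitter amplitude: 1 ns of peak timing error

t_ideal    = np.arange(N) / fs
t_jittered = t_ideal + J * np.sin(2 * np.pi * f_j * t_ideal)
samples    = np.sin(2 * np.pi * f * t_jittered)   # values taken at the *wrong* instants

spectrum = np.abs(np.fft.rfft(samples)) / (N / 2)  # amplitude spectrum, 1 Hz per bin
tone     = spectrum[f]
sideband = max(spectrum[f - f_j], spectrum[f + f_j])

print("new tones appear at %d and %d Hz" % (f - f_j, f + f_j))
print("measured level:  %.1f dB relative to the tone" % (20 * np.log10(sideband / tone)))
print("predicted level: %.1f dB  (20*log10(pi*f*J))" % (20 * np.log10(np.pi * f * J)))
# Both come out near -90 dB: distortion products that were never in the source.
```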
Now, let's bring the topic down to optical formats. Here, we are writing a sequence just like the digital interfaces above, except that we are putting it on an optical disc. The same kind of scheme must be used to recover a clock from it, etc. However, there is one big difference: we read these optical discs as data. The input has error correction and is buffered. Buffered means we read things into memory and then use the data out of it. By definition then, what comes out of the buffer is reliable, since it is not subject to the timing variations of a serial link. And the error correction makes sure that if some sample is read incorrectly due to jitter, that mistake gets corrected. So jitter does not play a role here with respect to fidelity (though it is an important characteristic with respect to how reliable your drive is in recovering from borderline signal conditions).
We have a secondary line of defense here. As soon as you compress audio or video, it also becomes immune to jitter. The reason is that once you compress the signal, it loses any sense of the timing it had. Sending a Dolby Digital stream, for example, over even the worst S/PDIF connection does not suffer one bit from jitter. If you hear it, it is perfect! Why? Because once the source is compressed, it becomes like computer data. The destination only needs to extract the digital values. It does not matter when they arrive, because we are not trying to play them right then and there. Instead, we just need the correct sequence of bytes to feed the Dolby Digital decoder so that it can recreate the PCM values its encoder meant to be reproduced.
Here is another way to look at it. Take TrueHD. We know that on average it compresses the source by a factor of about 3:1. So roughly speaking, every byte of TrueHD stands for about three bytes of PCM. Therefore, when I transmit such a stream from one side to the other, every byte I send expands to about three bytes of audio at the decoder. In other words, the link is effectively working faster than real time. Because of this, the receiver must have a buffer (memory) where it stores the decoded data, since it will often get ahead of the time when that data needs to be played. And as soon as we buffer things, jitter goes out the window as a consideration.
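As a back-of-the-envelope illustration (my own numbers, assuming 6 channels of 24-bit/96 kHz PCM and the roughly 3:1 average ratio mentioned above):

```python
# Rough sketch of why a compressed stream like TrueHD forces buffering.
channels, bits, rate = 6, 24, 96_000
pcm_rate = channels * bits * rate           # bits per second of decoded audio
thd_rate = pcm_rate / 3                     # rough average compressed bit rate

print("decoded PCM:  %.2f Mbit/s" % (pcm_rate / 1e6))   # ~13.8 Mbit/s
print("TrueHD feed:  %.2f Mbit/s" % (thd_rate / 1e6))   # ~4.6 Mbit/s

# If the disc or link momentarily delivers the compressed stream a bit faster
# than that average, the decoder produces audio faster than it is played back
# and the surplus has to sit in a buffer:
link_rate_bits = 5e6                        # suppose 5 Mbit/s for a moment
decoded_bits   = link_rate_bits * 3         # ~3 bytes of PCM out per compressed byte in
surplus_kB     = (decoded_bits - pcm_rate) / 8 / 1e3
print("buffer grows by about %.0f kB every second at that moment" % surplus_kB)
```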
The same is of course true of video compression. All codecs require some amount of buffering to operate.
So let's put what we just learned to use.
If we take the PCM samples stored on a CD, we are not dealing with a compressed signal, so there is no buffering and no advantage of data-style transmission. Therefore, everything in the chain becomes a source of (cumulative) jitter. The source can add it, the connection to the AVR may add it, and the paths through the AVR may add it. Issues here have made some people think the same applies to the HD optical formats, but that is not the case, as I explained above and expand on more below.
If you decode these codecs in your player, and convert to analog in your player, then the only place jitter can interfere is from the output of the audio decoder to the input of the D/A converter. This path being short and proprietary, it can be of extremely high quality and subject to little jitter. Note that it is not a given that this short path is high quality: with video running around plus a bunch of processors inside a single box, there is liable to be jitter, and a lot of it. But in theory, this is a rather optimal chain. Unfortunately, it means that you now have to find an analog-only path for amplification, which not many people have. As soon as you digitize this signal again, say for bass management in the AVR, all bets are off.
If you decode audio in the player and send it as PCM over HDMI to the receiver, those samples are real-time and therefore subject to jitter. This jitter is additive to whatever jitter also exists between the HDMI receiver and the DAC. In other words, you have all the problems of the above scenario plus the jitter induced by HDMI.
If you send out the bitstream of the codec (i.e. prior to decoding in the player) over HDMI, then you eliminate jitter on the link, since we are back to data transmission rather than time-sensitive PCM. If jitter is well under control in the AVR and much lower than what the HDMI cable contributes, then you get better sound. But the same caution applies as with decoding in the player: a lot of noisy parts inside the AVR can still cause jitter.
This got me thinking about something you said with regard to buffering. Someone said "Why not just buffer a lot of the optical disc data, like a full second or two, so that any errors or timing problems are handled by the buffering" and your response was that in theory it sounds like a good idea but in practice people don't like their remote control actions delayed. So my question is, why would the remote control commands be delayed? A large amount of buffering only really makes sense in the optical disc player, and it knows the millisecond you press stop to do whatever is necessary to stop playback, regardless of what is in the buffers, no? Now one could argue that a second or two of buffering might cause a delay with smooth ffwd/rwd or skip ffwd/rwd, but those actions wouldn't need buffering since you aren't concerned about fidelity at 4x playback speed.
Let me expand on what this kind of buffering is. If you have some memory in the receiver, you can use it to store the incoming samples. Yes, the samples will still arrive with timing errors, but remember, their digital values (i.e. what the PCM numbers are) are still correct. So the data that is in the memory is correct. When you decide to output, you use a new, high-quality clock that runs at the same sampling rate as the source (e.g. 48 kHz). Since you are no longer relying on the input pulse train for your clock, the variation/jitter is removed and you get better sound.
But with buffering, you run into a new problem. There is no way to have a clock that runs at exactly the same rate as the source. Even the world's best oscillators (clocks) have accuracy differences and drift with time and temperature, among other factors. This means that your receiver clock may run faster or slower than the original clock that was used to encode the analog sound. If yours runs slower, then your buffer starts to overflow. If the clock in the receiver runs faster than the source, then you soon run out of samples. Then what do you do? The solutions are not good. You wind up using a form of sample rate conversion to eliminate or manufacture samples! And you had better hope a good solution is used to do this, as otherwise you occasionally get anomalies as samples are created or destroyed.
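To get a feel for the numbers, here is a quick sketch (my own illustration, assuming a fairly ordinary 50 ppm mismatch between the receiver's crystal and the source clock):

```python
fs_source = 48_000.0                 # the rate the source was encoded at (Hz)
ppm_error = 50e-6                    # receiver clock is off by 50 parts per million
drift     = fs_source * ppm_error    # samples gained (or lost) every second

print("drift: %.1f samples per second" % drift)          # 2.4 samples/s

buffer_samples = 48_000 // 4         # a quarter-second buffer, started half full
seconds = (buffer_samples / 2) / drift
print("the buffer runs dry (or overflows) after ~%.0f s, about %.0f minutes"
      % (seconds, seconds / 60))     # ~2500 s, roughly 42 minutes
# After that you must drop/repeat samples or sample rate convert, as described above.
```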
The problem becomes more complicated for video, where you now have to store both audio and video. Video takes even more space than audio and, unfortunately, it cannot be sample rate converted easily. So you wind up making audio a slave to video, requiring even more resampling.
OK, assuming you are OK with all the complexity and cost in the player, you now have the new problem stated in your question. If you buffer too little, then you wind up running dry too many times. If you buffer more, then any time the user wants to play something, you have to wait to fill the buffer before doing so. Even just a quarter of a second of buffering could cause a noticeable delay in responding to remote control actions such as jumping forward. Could you tolerate it? Sure. But given the cost and complexity above, I doubt that it will be implemented anytime soon.
Believe it or not, the best solution here is a Home Theater in a Box! It is ironic that such components are budget systems. A high-end version of the concept could run circles around separate pieces that have to move signals through long cables. I just noticed a $3,500 box like this from Arcam with built-in amps and such. If it had HD DVD in it, it would have the potential to outperform any set of separates in audio!
Probably a lot more than you ever wanted to know about the topic, but here it is.