Hello everyone and thanks for joining today’s AMA (Ask Me Anything) session on immersive audio. For those who also joined our AMA on Hi-Res Audio last fall, welcome back.
I’m excited once again to host this Sony-sponsored AMA and look forward to what I’m sure will be a rollicking discussion on a very hot topic.
To get things going, I figured I'd start with a basic definition of immersive audio so that we have a shared point of reference. Essentially, immersive audio is any audio signal with more than two discrete channels that attempts to envelop the listener in a field of sound. The earliest immersive-audio standard for home listening came from Dolby Laboratories in 1987 in the form of Dolby Pro Logic, which offered a four-channel, analog, matrixed signal. (In this case, "matrixed" means a standard stereo signal that also incorporated, or "folded in," two extra channels. Upon playback, those extra channels were decoded, or "unfolded," to provide four discrete channels.) This first surround-sound format provided left, center, and right signals in front of the listener and one surround channel behind the listener. Dolby rival Digital Theater Systems (DTS) debuted its own competing encoding and decoding system in 1993.
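For the technically curious, here's a rough Python sketch of that fold/unfold idea. To be clear, this is a toy illustration, not Dolby's actual algorithm: the real encoder also phase-shifts and band-limits the surround channel, and a Pro Logic decoder adds active steering logic on top of the simple sum-and-difference math shown here.

```python
import numpy as np

def matrix_encode(left, center, right, surround):
    """Fold four channels into a two-channel (Lt/Rt) signal.

    Simplified passive matrix; the real Dolby encoder also
    band-limits and phase-shifts the surround channel.
    """
    lt = left + 0.707 * center + 0.707 * surround
    rt = right + 0.707 * center - 0.707 * surround
    return lt, rt

def matrix_decode(lt, rt):
    """Unfold Lt/Rt back into four channels (passive decode).

    Center is the sum of the two channels, surround the difference;
    an active (Pro Logic) decoder would add steering on top of this.
    """
    left, right = lt, rt
    center = 0.5 * (lt + rt)
    surround = 0.5 * (lt - rt)
    return left, center, right, surround

# Round-trip one second of noise (48 kHz) through the matrix.
rng = np.random.default_rng(0)
l, c, r, s = (rng.standard_normal(48000) for _ in range(4))
lt, rt = matrix_encode(l, c, r, s)
l2, c2, r2, s2 = matrix_decode(lt, rt)
```

The takeaway is simply that four channels can ride inside a normal two-channel signal, and basic arithmetic pulls them back apart at playback.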
Since those early days, there have been several major improvements to both the Dolby and DTS standards, including Dolby Digital, Pro Logic IIx and IIz, DTS Neo:6, and DTS-HD. In each case, improvements to the technology enabled the use of more speakers and increased the overall quality and "immersiveness" of the sound. In all cases, however, the fundamental concept behind immersive audio (placing sounds in specific channels of a predefined mix) remained the same.
Audio content from a movie, TV show, or multichannel music recording was encoded in a specialized format. Essentially, that means multiple channels of discrete audio, recorded and intentionally mixed to create a specific sonic soundstage, were combined into a single data stream and formatted to fit either onto a disc or into a digital file (the source). The data stream from that source was then sent to a playback device (typically an audio/video receiver, or the audio components inside another device). Inside that device, the signal was decoded back into multiple individual audio channels before being sent to the connected speakers. The method of encoding and decoding was (and is) where the "magic" of immersive audio happens, and these encoding processes continue to be the main way in which competing immersive-audio standards differ from one another.
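If it helps to see that pipeline in miniature, here's a conceptual Python sketch of the multiplexing step. A big caveat: real formats like Dolby Digital and DTS also compress the audio and wrap it in framing and metadata; this only shows the bare idea of combining discrete channels into one stream and splitting them back out.

```python
import numpy as np

def pack_stream(channels):
    """Interleave discrete channels, sample by sample, into one stream.

    Conceptual only: real codecs compress and add framing metadata;
    this shows just the multiplexing step.
    """
    return np.column_stack(channels).ravel()

def unpack_stream(stream, num_channels):
    """Split the interleaved stream back into discrete channels."""
    return [stream[i::num_channels] for i in range(num_channels)]

# A 5.1 mix: six discrete channels in, one data stream out, six back.
mix = [np.full(4, float(i)) for i in range(6)]  # L, R, C, LFE, Ls, Rs
stream = pack_stream(mix)
decoded = unpack_stream(stream, 6)
assert all(np.array_equal(a, b) for a, b in zip(mix, decoded))
```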
The latest standards for immersive audio, including Dolby Atmos (first introduced for the home environment in 2014) and DTS:X (introduced in 2015), add an important twist to that basic concept: object-based audio. Whereas earlier immersive-audio standards had predetermined locations within the multichannel mix for certain sounds (locations, by the way, chosen by the sound engineers who created the audio), the object-based techniques used in these new standards let sounds be positioned dynamically by the decoding device (typically your AVR). So, for example, with a traditional channel-based system, if there is a spaceship flying by in a movie, the location of the ship's audio is predefined in such a way that at a given moment there might be 60 dB of signal going to the left front speaker and another 5 dB to the left surround. While those levels obviously change as the movie plays, the key point is that they would be the same across different AVRs, different speakers, and even different configurations of speakers. (It actually does get a bit more complicated than that, but I'll get to the issue of what's called upmixing in a second.)
In an object-based environment like Dolby Atmos, the receiver knows what speaker configuration you have (thanks to an initial setup process) and can, therefore, adjust its output to suit different environments. Importantly, the objects in a scene (such as the aforementioned spaceship) also have additional data associated with them. So, in addition to the audio signal of the spaceship, there is also information, or "metadata," about how the object is moving through the scene. The decoding device can use this metadata to determine how the sound is output. In other words, you could end up with slightly different levels than in the previous "predetermined" example if you use a different AVR and different types (and numbers) of speakers. Essentially, what's going on is a lot of real-time computing and audio processing. In fact, it's only because of the computing power now available that these new object-based immersive-audio standards are possible.
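To make that concrete, here's a toy Python renderer for a single object. The speaker layout, angles, and panning law below are all my own simplified assumptions, not Dolby's or DTS's actual rendering math; the point is just that position metadata plus knowledge of the speaker layout yields per-speaker levels at playback time.

```python
import math

# Hypothetical 5-speaker ear-level layout: name -> azimuth in degrees
# (0 = straight ahead, positive = toward the listener's left).
SPEAKERS = {"L": 30, "C": 0, "R": -30, "Ls": 110, "Rs": -110}

def render_object(azimuth_deg, width_deg=60):
    """Turn one object's position metadata into per-speaker gains.

    Toy constant-power panner, not a real Atmos/DTS:X renderer:
    each speaker's gain falls off with angular distance from the
    object, and gains are normalized so total power stays at 1.
    """
    weights = {}
    for name, spk_az in SPEAKERS.items():
        # Shortest angular distance between the object and this speaker.
        delta = abs((azimuth_deg - spk_az + 180) % 360 - 180)
        weights[name] = max(0.0, 1.0 - delta / width_deg)
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {name: w / norm for name, w in weights.items()}

# A "spaceship" sweeping from front-left toward the left surround:
for az in (30, 60, 90, 110):
    print(az, {k: round(g, 2) for k, g in render_object(az).items()})
```

Notice that the same object metadata produces different gains for a different speaker layout; swap in another SPEAKERS table and the renderer adapts automatically, which is exactly the flexibility the channel-based approach lacked.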
Now, to get back to the "upmixing" comment I made earlier, there is a bit more to this. Everything I've described is true for audio content that was specifically created and encoded for a particular immersive-audio format, such as Dolby Atmos or Dolby Digital 5.1. However, not all of the movies and music we watch or listen to come in such a format. So, in order to take advantage of, say, a 7.1 speaker system when listening to two-channel music, or even a Dolby Digital 5.1-encoded soundtrack, the decoding engines in AVRs and other audio components can essentially "create" new content based on the signals in the audio tracks that are there, and then play that material over all the connected speakers. This process of "filling in" the otherwise unused channels is called upmixing.
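Here's the simplest possible flavor of that idea in Python: a passive matrix upmix that derives center and surround feeds from a plain stereo signal. Commercial upmixers (such as the Dolby Surround Upmixer and DTS Neural:X) do frequency-dependent analysis and steering far beyond this sum-and-difference sketch.

```python
import numpy as np

def passive_upmix(left, right):
    """Derive center and surround feeds from a plain stereo signal.

    Bare-bones matrix trick: material common to both channels feeds
    the center, material that differs feeds the surrounds. Real
    upmixers add frequency-dependent analysis and active steering.
    """
    center = 0.5 * (left + right)
    diff = 0.5 * (left - right)
    return {"L": left, "R": right, "C": center, "Ls": diff, "Rs": -diff}

# A hard-left sound lands mostly in L and the surrounds, not the center.
l = np.array([1.0, 1.0, 1.0])
r = np.zeros(3)
channels = passive_upmix(l, r)
```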
As you can imagine, some audio purists are not big fans of upmixing stereo music, because it certainly alters the listening experience, but many people like the added sense of immersion. With movies, the extra created channels are typically much more subtle and, thanks to the clever algorithms used to generate the synthesized content, can increase the immersive feeling of the audio. Nevertheless, there are limits to how much "filling in" you can do, which is one of the key reasons companies like Dolby and DTS moved toward the more flexible, object-based approach of technologies like Atmos and DTS:X.
In addition to the object-based nature of these new standards, the biggest and most obvious change that Dolby Atmos and DTS:X bring to immersive audio is the sense of height. While earlier encoding standards such as Dolby Pro Logic IIz tried to add a vertical aspect to what had essentially been a horizontal field of sound, the focus on sound coming from above the listener is what has made so many people excited about these new technologies. Basically, this height aspect helps create a much more enveloping and, therefore, more realistic immersive-audio experience than was available from previous standards, such as 5.1 or 7.1 surround. Both Atmos and DTS:X support separate height channels meant to drive either ceiling-mounted speakers or upward-firing speakers that bounce sound off the ceiling, helping create a hemispherical sound experience.
The new nomenclature for these height channels adds a third number to the end of the traditional 5.1-type system description. So, a system with two overhead speakers added to a common 5.1 setup is called 5.1.2. Because home Atmos supports up to 24 ear-level front, center, and surround speakers and 10 ceiling speakers (but only a single subwoofer channel), you could theoretically have configurations up to 24.1.10. (Note that some people choose to use more than one subwoofer, which is why you might also see descriptions such as 9.2.4.)
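If the notation is easier to grok as code, here's a tiny Python helper that splits those labels into their three counts (ear-level speakers, subwoofers, heights), following the convention just described:

```python
def parse_layout(label):
    """Split an 'X.Y.Z' layout label into its three speaker counts.

    Convention (as described above): ear-level speakers, then
    subwoofers, then height/overhead speakers; a plain 'X.Y'
    label simply has no height speakers.
    """
    parts = [int(p) for p in label.split(".")]
    ear, sub = parts[0], parts[1]
    height = parts[2] if len(parts) > 2 else 0
    return {"ear_level": ear, "subwoofers": sub, "height": height}

print(parse_layout("5.1.2"))  # {'ear_level': 5, 'subwoofers': 1, 'height': 2}
print(parse_layout("9.2.4"))  # {'ear_level': 9, 'subwoofers': 2, 'height': 4}
print(parse_layout("7.1"))    # {'ear_level': 7, 'subwoofers': 1, 'height': 0}
```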
By the way, if you’d like to read the latest version of Dolby’s official explanation of Atmos, here’s a link: https://www.dolby.com/us/en/technolo...me-theater.pdf
OK, so now that I’ve covered the basics of these new immersive audio standards, it’s time to dig into your questions.