A Guide to Immersive Audio Pt 1

The Funky Junk Guide to Immersive Audio

Immersive Audio is an evolving, collective term for any listening experience beyond stereo. It places the listener at the centre of a 3-dimensional audio environment. Immersive Audio systems can create more realistic, emotive and complex soundscapes than traditional stereo because they can position sound in front of, to the side of, behind or above the listener and, in some cases, respond dynamically as the listener changes position. This capability has made it an increasingly popular audio format in Film, Television, Gaming, Virtual Reality, Live Music & Installation Sound production.

Immersive Audio formats have become mainstream in Film, Television, VR, Gaming & Sound Installation Productions

The expectations of consumers are driving both the technology and the demand for these next-generation audio formats. Consumers are increasingly time-poor and, when it comes to leisure, they want their experiences to be as fulfilling as possible: games where the soundscape lends greater reality to play; films and programmes that place them in the centre of the action visually and sonically; installation art that enhances the visitor experience with dynamic changes in sound from exhibit to exhibit; and Virtual Reality that is augmented by a soundscape moving in sync with the listener’s head movements. Where this could all lead is very exciting, limited only by the technology of the day and the imagination of the immersive content creator.

Understanding these emerging formats can therefore provide a gateway into many forms of media beyond chart music production. So, in this guide we’re going to take a look at the formats and the technology behind Immersive Audio from a content creator’s point of view. We’re going to do this in 3 parts:

Part 1 – Understanding How Immersive Audio Works (Axes, Formats)
Part 2 – Breaking Enigma (Encoding immersive audio, up/down mixing and ways of working with them)
Part 3 – All The Gear & Some Idea (example setups for Atmos & Ambisonics content creators)

Part 1 – Understanding How Immersive Audio Works

Bold As Love – Axis X, Y & Z

In a traditional stereo monitoring system, the listener is presented with audio on one plane – in front of them with a degree of width and at a fixed height relative to the speakers. For the purposes of this article we’ll call that the X (side to side) Axis.

Human perception of depth and surroundings is highly evolved, both visually and aurally. With the speakers on this single flat plane, a lack of depth (the front to back placing of different instruments or noises) is discernible. To our ears and brain, quieter sounds will seem to be further away than louder sounds, and higher pitched sounds will appear to be naturally wider during stereo playback, but the sound on these systems will always remain less than convincing to us: more 2 than 3 dimensional.

To improve depth perception, an immersive playback system uses additional speakers to place sound on both an X and a Y (front to back) axis. One of the earliest examples of this approach is Quadraphonic sound, which had its heyday in the 1970s. By positioning a second pair of speakers (Lr & Rr) behind the listener, you place the listener at the centre of a 360 degree or “Surround” soundscape.

With all speakers playing back at the same level, the sound will be centred on the listener. If the front speakers play back louder than the rear, the sound will be centred in front of the listener, and vice versa. In this way a greater illusion of depth can be achieved than in a traditional stereo audio system.

So how does that work and how do we control the position of a sound in this basic surround system?

Let’s look at this in simple terms. Imagine you’re working with a single mono signal that is split across 4 faders, with each fader’s output going to one of the four speakers as in the diagram below. It is the balance of the 4 faders that positions the signal in the sound field. Rather than trying to explain this in words, it’s often better to visualise what’s going on under the hood, so study the positions of the faders in the diagrams below and follow the green ball, which represents where the listener perceives the image to be.

What the faders are doing in our analogy above is acting as a “surround panner”. Of course, there has to be a simpler way of achieving this in real world practice and that’s where an XY Axis joystick comes to the rescue. The tip of the joystick effectively becomes the green ball in the examples above and adjusts levels to each speaker accordingly.
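The fader analogy can be sketched in a few lines of Python. This is a minimal illustration, not how any particular console or DAW implements its panner: the sine/cosine (constant-power) pan law, the function name quad_gains and the front speaker labels L and R are assumptions made for the example, while Lr and Rr follow the labels used above.

```python
import math

def quad_gains(x, y):
    """Map a joystick position to "fader" gains for four speakers.

    x: -1.0 (full left) to +1.0 (full right)
    y: -1.0 (full rear) to +1.0 (full front)
    Returns gains for L and R (front) and Lr and Rr (rear).
    Assumption: a sine/cosine constant-power pan law on each axis.
    """
    # Clamp so an over-travelled joystick cannot produce odd gains.
    x = max(-1.0, min(1.0, x))
    y = max(-1.0, min(1.0, y))

    # Convert each axis position to an angle between 0 and 90 degrees.
    angle_x = (x + 1.0) * math.pi / 4.0
    angle_y = (y + 1.0) * math.pi / 4.0

    left, right = math.cos(angle_x), math.sin(angle_x)
    rear, front = math.cos(angle_y), math.sin(angle_y)

    # Each speaker gain is the product of its side and depth weights.
    return {
        "L": left * front,
        "R": right * front,
        "Lr": left * rear,
        "Rr": right * rear,
    }

if __name__ == "__main__":
    # Joystick centred: all four speakers receive the same level.
    print(quad_gains(0.0, 0.0))
    # Joystick pushed fully forward and left: only the front left speaker plays.
    print(quad_gains(-1.0, 1.0))
```

With this pan law the squares of the four gains always add up to 1, so the overall power stays roughly constant as the joystick moves; real panners may use different laws.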

Each audio channel in a surround mix therefore has a surround panner that is more demanding in terms of resources than a stereo panner (the number of “faders” in our analogy increases with the number of monitors in the surround system).

Of course, a modern surround system can do much more than the basic examples shown above. It can fly a sound in a circle around the listener if that’s your thing, position the audience behind the listener and the band in front in a live recording context to create the feel of being in the mosh pit, or reverse things around to put the listener at the front of the band facing the crowd. As I mentioned earlier, the main limit is your imagination. In technology terms, Quadraphonic is a basic immersive audio format. By adding more speakers, presenting sound from more points on the circle around the listener, and using more sophisticated encoding (more on this later), a greater degree of realism can be achieved.

Fast forward from the Quadraphonic 70s to the 21st Century, and XY axis based surround sound formats have been widely adopted in film and television; popular examples include 5.1 & 7.1 surround.

In a 5.1 surround system, a Central Speaker (C) and a Sub-Woofer (S) are added to the Quadraphonic setup. The .1 refers to the number of sub-woofers, the 5 to the number of other speakers in the system. Expanding that to a 7.1 system is a matter of adding 2 dedicated side speakers (Ls & Rs). The impact of these additional monitors on the level of realism and dimension achievable is significant. With the centre panned load taken up by speaker C and the more omnidirectional sub bass load taken up by the sub-woofer, the remaining speakers can focus more on wider mid to high frequency sounds, increasing the perception of 3-dimensional ambience. The additional side speakers Ls & Rs in the 7.1 array fill out the 360-degree image around the listener, allowing sounds to be positioned more precisely in the surround soundscape and, when panning dynamically, giving a smoother transition of a sound from one position to another.

Whilst XY axis immersive systems have come a long way towards creating a more realistic and exciting soundscape for listeners, they are still missing a few vital ingredients.

Remember my earlier point about stereo audio sitting in only one plane at a fixed height. Whilst 5.1 & 7.1 systems can surround a listener with audio on the X & Y axes, adding depth and increasing directional possibilities, they are still doing it at a constant height. The result is that wherever the signal is panned around the listener, it appears to be coming from the same height as the speakers. To create a sense of relative height in immersive audio we need to add a Z (up and down) axis to the soundfield.

As shown above, one way to achieve this sense of height (Z axis) is to add overhead speakers to the surround rig. A system capable of positioning audio at any point on the X, Y & Z axes is delivering 3-dimensional sound. The example shown is a 7.1.4 immersive system, where 7 is the number of horizontal (XY) plane speakers, 1 is the number of sub-woofers and 4 is the number of overhead speakers.
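The naming convention described so far (5.1, 7.1, 7.1.4) can be read mechanically. The short Python sketch below splits a layout name into its parts; the function name parse_layout and the dictionary keys are made up for illustration and are not part of any standard.

```python
def parse_layout(name):
    """Split a layout name such as "5.1" or "7.1.4" into speaker counts.

    First number: speakers on the horizontal (XY) plane.
    Second number: sub-woofers.
    Optional third number: overhead speakers.
    """
    parts = [int(p) for p in name.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 'A.B' or 'A.B.C', got {name!r}")

    horizontal, subwoofers = parts[0], parts[1]
    # Layouts without a third number have no overhead speakers.
    overhead = parts[2] if len(parts) == 3 else 0

    return {
        "horizontal": horizontal,
        "subwoofers": subwoofers,
        "overhead": overhead,
        "total": horizontal + subwoofers + overhead,
    }

if __name__ == "__main__":
    for layout in ("5.1", "7.1", "7.1.4"):
        print(layout, parse_layout(layout))
```

For example, parse_layout("7.1.4") reports 7 horizontal speakers, 1 sub-woofer and 4 overhead speakers, 12 in total.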

The complexity of positioning each individual track in a 7.1.4 sound-field is considerable. Once again, rather than trying to explain this in words, let’s refer to Image 9 below and imagine we want to place a sound (A) at a specific point (represented by the green ball) on the X, Y & Z axes relative to the listener. If we were to achieve this using conventional faders linked to each speaker, they might be positioned as shown.

Again, huge numbers of faders are not a practical real-world solution to panning in such a complex system, so a 3 axis (XYZ) panner is required. There are two distinct choices in this area: a physical controller similar to a joystick but with an added Z axis control, or a screen or touch screen based graphical user interface. An example of each is shown below.
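To give a feel for what a 3 axis panner has to calculate, here is a deliberately simplified Python sketch. It assumes only eight speakers (four at ear level and four overhead, one in each corner of the room) rather than a full 7.1.4 rig, and it assumes a sine/cosine constant-power pan law on each axis. The function names and the "_top" speaker labels are hypothetical; commercial panners, including those used for Dolby Atmos, work differently in detail.

```python
import math

def _pair(v):
    """Constant-power split of a position in [-1, 1] into (low, high) gains."""
    v = max(-1.0, min(1.0, v))
    angle = (v + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

def xyz_gains(x, y, z):
    """Map an XYZ panner position to gains for eight corner speakers.

    x: -1.0 (left) to +1.0 (right)
    y: -1.0 (rear) to +1.0 (front)
    z:  0.0 (ear level) to 1.0 (overhead)
    """
    left, right = _pair(x)
    rear, front = _pair(y)
    # Rescale z from [0, 1] to [-1, 1] so the same helper can be reused.
    ear, top = _pair(2.0 * z - 1.0)

    sides = {"L": left * front, "R": right * front,
             "Lr": left * rear, "Rr": right * rear}

    gains = {}
    for name, g in sides.items():
        gains[name] = g * ear            # ear-level speaker
        gains[name + "_top"] = g * top   # overhead speaker above it
    return gains

if __name__ == "__main__":
    # Centred at ear level: only the four ear-level speakers play.
    print(xyz_gains(0.0, 0.0, 0.0))
    # Same XY position, halfway up: the overhead speakers share the load.
    print(xyz_gains(0.0, 0.0, 0.5))
```

Even in this cut-down form, one panner position drives eight gains at once, which is why a dedicated controller or on-screen panner is so much more practical than a bank of faders.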

Major Immersive Formats

The 2 major immersive audio formats we’ll focus on in this series are:

Dolby Atmos – widely adopted in television and film production
Ambisonics – widely adopted in VR, gaming and Installation Sound productions

From the listener’s perspective, both formats are capable of delivering 3 dimensional sound, but one important difference is that Dolby Atmos does this from a static position, i.e. the speakers and soundfield are in a fixed position relative to the listener, whereas Ambisonics over specialist sensor bar headphones allows the soundfield to change in relation to the direction the listener turns their head in. This critical difference is one of the reasons the two formats occupy different segments of the market.
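As a rough illustration of the head-tracking idea, the Python sketch below rotates a single sample of a first-order Ambisonics signal (the four channels usually labelled W, X, Y and Z) to compensate for the listener turning their head. The axis convention (X forward, Y left, Z up, positive yaw meaning a turn to the left) and the function name are assumptions for this example; real systems also handle tilt and roll and apply the rotation continuously to the audio stream.

```python
import math

def rotate_for_head_yaw(w, x, y, z, head_yaw_deg):
    """Counter-rotate one first-order sample for a left/right head turn.

    Assumed convention: X points forward, Y points left, Z points up,
    and a positive head_yaw_deg means the head has turned to the left.
    """
    a = math.radians(head_yaw_deg)
    # The soundfield is turned the opposite way to the head, so a sound
    # that was straight ahead stays fixed in the room as the head moves.
    x_new = x * math.cos(a) + y * math.sin(a)
    y_new = y * math.cos(a) - x * math.sin(a)
    # W (overall level) and Z (height) are unaffected by a turn about
    # the vertical axis.
    return w, x_new, y_new, z

if __name__ == "__main__":
    # A sound straight ahead, then the listener turns 90 degrees left:
    # the sound should now arrive from the listener's right (negative Y).
    print(rotate_for_head_yaw(1.0, 1.0, 0.0, 0.0, 90.0))
```

A speaker based system has no equivalent step because the speakers, and therefore the soundfield, stay fixed in the room.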

The encoding approaches and science behind how both formats deliver immersive soundfields are also quite different, but that subject is more the territory of a white paper than this guide.

Stay tuned for Pt 2, coming soon…

