Audio Authoring for MPEG-H

MPEG-H Audio is an interactive, immersive audio format that allows creators to deliver highly customizable sound experiences. It was developed by the Fraunhofer-Institut für Integrierte Schaltungen (Fraunhofer Institute for Integrated Circuits, or Fraunhofer IIS). During the production of MPEG-H Audio content, adjustable personalization options can be defined, including object positioning, adjustable dialogue levels, customizable audio description, and even multi-language programming. Viewers can then tailor these to their own preferences.

Preparing the Project

Creating an MPEG-H Audio production is straightforward once the initial parameters are set. The sample rate must be 48 kHz or 96 kHz, the ASIO buffer size 512 or 1,024 samples, and the 3D pan mode set to 3-Layer. Any Nuendo track can be an MPEG-H component (object or bed). It is a good idea to organize the audio tracks or stems you want to assign as components into subgroups. The Renderer for MPEG-H allows you to configure metadata parameters, monitor changes in real time, and then export these settings as an MPEG-H ADM or MPEG-H Master file. The Renderer must be inserted on the main mix output bus, which should correspond to the largest speaker setup your studio can support. There is also a Setup Assistant that prepares a session for an MPEG-H Audio project with a few mouse clicks, including essential parameters and basic routing, a channel-based bed component, and an MPEG-H preset.
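The prerequisites above can be written down as a small checklist. The following is only a sketch: the class and function names are hypothetical and do not correspond to any Nuendo or Renderer API; only the allowed values come from the text.

```python
from dataclasses import dataclass

# Allowed values as stated in the text; all names here are hypothetical.
ALLOWED_SAMPLE_RATES_HZ = (48_000, 96_000)
ALLOWED_BUFFER_SIZES = (512, 1024)
REQUIRED_PAN_MODE = "3-Layer"


@dataclass
class ProjectSettings:
    sample_rate_hz: int
    asio_buffer_samples: int
    pan_mode_3d: str


def check_project(settings: ProjectSettings) -> list[str]:
    """Return a list of problems; an empty list means the settings pass."""
    problems = []
    if settings.sample_rate_hz not in ALLOWED_SAMPLE_RATES_HZ:
        problems.append(f"sample rate {settings.sample_rate_hz} Hz is not 48 kHz or 96 kHz")
    if settings.asio_buffer_samples not in ALLOWED_BUFFER_SIZES:
        problems.append(f"ASIO buffer {settings.asio_buffer_samples} is not 512 or 1024 samples")
    if settings.pan_mode_3d != REQUIRED_PAN_MODE:
        problems.append(f"3D pan mode is {settings.pan_mode_3d!r}, expected {REQUIRED_PAN_MODE!r}")
    return problems
```

For instance, `check_project(ProjectSettings(44_100, 256, "2D"))` would report three problems, one per parameter.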

Scene authoring concepts

An interactive, personalized experience is a key feature of MPEG-H Audio, and this requires a range of metadata. Music and effects are usually in a bed mix, while extra audio objects can be configured to move freely in space and can be customized by the viewer. For example, a sports broadcast can feature a main channel component, with extra audio objects giving the listener a choice of main commentary in different languages, content from the different teams, and more. Options for the end user can range from simple interactivity, via a single button push on the remote control, to more complex control of the audio elements, which can be enabled in the advanced MPEG-H Audio interactivity menu. An MPEG-H Audio scene is made up of Presets, Components, and Switch Groups.


The most prominent personalization feature for the end user is a Preset: a combination of Components, Switch Groups, and their respective levels. A scene created within the Renderer can contain between one and eight Presets, with the first being the default audio mix with neutral gain settings. As an example, a sports event could be presented with three Presets: “Default” for the standard configuration, “Dialog+” for enhanced intelligibility, and “Venue” for ambience only. Preset labels can be chosen freely by the content creator and will appear on the end users’ on-screen displays.


Components are the smallest addressable units in an MPEG-H Audio scene, with the number of audio channels per Component determined by the type of audio track assigned to it. A mono track assigned to an Object in the ADM editor forms a Component of a single audio signal, whereas a 5.1+4 multi-object Component will be formed by a 10-track audio signal. Channel-based Components, also called beds, are typically the representation of a group channel, with all automation already applied before arriving at the Renderer. The other Component type is an audio object, whose position is determined by position metadata transmitted alongside the audio content and interpreted by the playback device.
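The figure of 10 for a 5.1+4 layout is simply the sum of the numbers in the label: five main channels, one LFE channel, and four height channels. A minimal sketch of that arithmetic, using a hypothetical helper that assumes the label contains only digits separated by `.` and `+`:

```python
def channel_count(layout: str) -> int:
    """Sum the channel groups in a label such as "5.1+4" (hypothetical helper)."""
    total = 0
    for part in layout.split("+"):       # "5.1+4" -> ["5.1", "4"]
        for number in part.split("."):   # "5.1"   -> ["5", "1"]
            total += int(number)
    return total
```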

Switch Groups

Switch Groups hold Components that are alternatives to one another, so that the viewer has to choose between them. An example is a commentary track in several languages, where the consumer or playback device chooses which language to listen to. A drop-down menu allows selected Components to be assigned to the Switch Group.


Advanced interactivity options can be included, for example, setting the playback gain of an object within an adjustable range, or multi-language support. With MPEG-H, Component and Preset labels can be set up in several languages. Up to four sets of labels can be added to an MPEG-H authoring session.
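The relationship between Presets, Components, Switch Groups, labels, and gain ranges can be pictured as a small data model. This is an illustrative sketch only: the class names, fields, and gain values are assumptions made for this example and are not the Renderer's internal representation or the MPEG-H metadata syntax. Only the limits (one to eight Presets, up to four label sets) come from the text.

```python
from dataclasses import dataclass, field

MAX_PRESETS = 8      # the text allows one to eight Presets per scene
MAX_LABEL_SETS = 4   # the text allows up to four sets of labels


@dataclass
class Component:
    name: str
    kind: str                                         # "bed" or "object"
    gain_range_db: tuple[float, float] = (0.0, 0.0)   # range the listener may adjust


@dataclass
class SwitchGroup:
    name: str
    members: list[str]                                # Component names; one plays at a time
    default: str


@dataclass
class Preset:
    labels: dict[str, str]                            # language code -> on-screen label
    gains_db: dict[str, float] = field(default_factory=dict)  # member name -> level


@dataclass
class Scene:
    components: list[Component]
    switch_groups: list[SwitchGroup]
    presets: list[Preset]

    def problems(self) -> list[str]:
        """Return rule violations; an empty list means the scene passes."""
        found = []
        if not 1 <= len(self.presets) <= MAX_PRESETS:
            found.append("a scene needs between one and eight Presets")
        names = {c.name for c in self.components}
        for group in self.switch_groups:
            if not set(group.members) <= names:
                found.append(f"{group.name}: unknown Component in Switch Group")
            if group.default not in group.members:
                found.append(f"{group.name}: default is not a member")
        for preset in self.presets:
            if len(preset.labels) > MAX_LABEL_SETS:
                found.append("a Preset has more than four label sets")
        return found


def clamp_gain(component: Component, requested_db: float) -> float:
    """Limit a listener's gain request to the range the author allowed."""
    low, high = component.gain_range_db
    return max(low, min(high, requested_db))


def sports_scene() -> Scene:
    """Build the sports example from the text; all gain values are made up."""
    components = [
        Component("Stadium", "bed"),
        Component("Commentary EN", "object", (-6.0, 9.0)),
        Component("Commentary DE", "object", (-6.0, 9.0)),
    ]
    commentary = SwitchGroup("Commentary", ["Commentary EN", "Commentary DE"], "Commentary EN")
    presets = [
        Preset({"en": "Default"}, {"Stadium": 0.0, "Commentary": 0.0}),
        Preset({"en": "Dialog+"}, {"Stadium": -3.0, "Commentary": 3.0}),
        Preset({"en": "Venue"}, {"Stadium": 0.0}),
    ]
    return Scene(components, [commentary], presets)
```

Calling `sports_scene().problems()` returns an empty list, while a scene with nine Presets or a Switch Group that references a missing Component would be flagged.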



Objects and beds are monitored through the Renderer for MPEG-H, which offers a preview of Presets and interactivity settings and allows you to listen to your audio scene in different output speaker layouts, including binaural rendering. The Monitoring tab emulates consumer playback options, so none of its settings have any effect on the exported file; they only affect the monitor rendering. For correct playback and loudness-compensated Preset switching, a loudness measurement should be run before playback.


MPEG-H opens up new possibilities for immersive panning, including vertical positions below the listener. Objects can always be panned in three dimensions, even with a stereo Master bus. When 3D pan mode is set to 3-Layer in the project setup, the VST MultiPanner will visualize the three layers. While Nuendo panning takes place in a virtual room, panning values in the Renderer for MPEG-H (and resulting ADM export) are calculated using azimuth and elevation. Conversion between the two concepts is done in real time.
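To give a feel for what a conversion between room coordinates and azimuth/elevation involves, here is a plain spherical conversion. It is a sketch under stated assumptions, not the mapping Nuendo or the Renderer actually uses: it assumes x points right, y points to the front, z points up, and that azimuth is 0° straight ahead and positive to the left, with elevation positive upward.

```python
import math


def cartesian_to_polar(x: float, y: float, z: float) -> tuple[float, float, float]:
    """Return (azimuth_deg, elevation_deg, distance) for a room position.

    Assumed axes: x right, y front, z up. Azimuth is positive to the left.
    """
    distance = math.sqrt(x * x + y * y + z * z)
    if distance == 0.0:
        return 0.0, 0.0, 0.0  # direction is undefined at the origin
    azimuth = -math.degrees(math.atan2(x, y))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation, distance


def polar_to_cartesian(azimuth_deg: float, elevation_deg: float,
                       distance: float = 1.0) -> tuple[float, float, float]:
    """Inverse of cartesian_to_polar under the same assumed conventions."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = -distance * math.sin(az) * math.cos(el)
    y = distance * math.cos(az) * math.cos(el)
    z = distance * math.sin(el)
    return x, y, z
```

With these conventions, a source hard left at ear height maps to an azimuth of 90° and an elevation of 0°, and a source directly below the listener has an elevation of -90°.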


MPEG-H Audio content can be played back on many different devices in various formats, from a fully immersive speaker setup to a binaural downmix over headphones. The playback device performs the downmix of immersive content itself, with the built-in MPEG-H Audio system automatically rendering channels and objects as appropriate. The current downmix settings can be previewed on the Monitoring tab of the Renderer for MPEG-H. This setting only simulates the processing that will be done by the end user's playback device and has no effect on the exported file.


Loudness metadata is an important part of any MPEG-H Audio scene. The Renderer for MPEG-H calculates the loudness for each Preset and embeds it in the metadata. In the playback device, the decoder then adjusts the playback volume accordingly to guarantee Preset switching without jumps in volume. You can use the Loudness tab to get information on the loudness values of the Components and Presets in the scene.
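The idea behind loudness-compensated Preset switching can be shown with simple arithmetic: if each Preset's loudness is known, the gain needed to match a reference is the difference between the two values in dB. The sketch below is a simplification with made-up loudness figures and a hypothetical function name; a real decoder's loudness handling is more involved and is not described in this text.

```python
def preset_gain_offsets(preset_loudness_lufs: dict[str, float],
                        reference: str) -> dict[str, float]:
    """Return the gain in dB that brings each Preset to the reference loudness."""
    target = preset_loudness_lufs[reference]
    return {name: target - loudness
            for name, loudness in preset_loudness_lufs.items()}


# Hypothetical measured values for the three example Presets.
measured = {"Default": -23.0, "Dialog+": -21.5, "Venue": -26.0}
offsets = preset_gain_offsets(measured, reference="Default")
# offsets == {"Default": 0.0, "Dialog+": -1.5, "Venue": 3.0}
```

Here the louder “Dialog+” Preset would be turned down by 1.5 dB and the quieter “Venue” Preset turned up by 3 dB, so switching between them does not change the perceived level.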

MPEG-H Export

Once the authoring and monitoring of an MPEG-H Audio scene are completed, the mix can be exported with metadata in either the MPEG-H BWF/ADM format (Broadcast Wave Format with embedded Audio Definition Model metadata) or the MPF format (short for MPEG-H Production Format). You can also export a Channel Mix, which creates a channel-based rendering of the first Preset in the MPEG-H scene. This is a stereo PCM file and does not contain any MPEG-H metadata.