What is MPEG-H?
MPEG-2 we all know. That’s the video encoding used in DVDs and most free-to-air digital TV broadcasts. Well, no, that’s only part of MPEG-2. The newest MPEG – MPEG-H – is similarly complicated. Here Stephen Dawson tries to sort it all out.
MPEG stands for Moving Picture Experts Group, a joint endeavour of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). It was established way back in 1988, even before the effective dawn of the internet (the web got going in the early 1990s). Its role from the outset was to develop suitable standards for the compression of audio and video, principally to allow them to be efficiently transmitted.
MPEG-1 video and audio compression, released in 1993, became the standard for the Video CD. MPEG-2 came in 1995, in time for the DVD. And MPEG-4 landed in 1998, some eight years before the widespread release of Blu-ray, which mostly uses the MPEG-4 AVC (Advanced Video Coding) compression system for video.
But each MPEG version has included multiple parts: some for video compression, some for audio compression, some for the transport protocol, some for testing and some for all manner of related things. They are, in short, a bunch of documents describing these things.
MPEG-H was released in 2013. Some elements of it were new, while others folded in some existing, but then fairly recent developments.
Most of the parts concern things like error correction standards, conformance testing, adaptation to the new standards and so on. Of broader interest to us here are Parts 1 (transport), 2 (video compression), 3 (3D audio) and to a lesser extent 12 (image compression).
MPEG-H Part 1
Part 1 defines the ‘MPEG Media Transport’. The important thing to understand is that the data holding the video or audio information can be carried in different ways. Often those different ways are called ‘Containers’. Three examples:
- The MPEG-2 video carried on a DVD is generally the same as the MPEG-2 video carried by SDTV, but the first is carried in a ‘Program Stream’ while the second is carried in a ‘Transport Stream’. Once the data gets to the decoding device, it’s decoded the same way. But the ‘Stream’ or ‘Container’ varies according to the special needs of the medium of transmission. A DVD can, for example, be re-read if there’s a problem. A digital TV transmission can’t be.
- Direct Stream Digital – the audio format used by the Super Audio CD – has become popular in the audiophile community as a high resolution file format for music. This is fed from a computer to a Digital to Analogue Converter via USB. But the USB Audio protocols didn't support DSD, so until quite recently the DSD was packaged up to look like standard high-resolution PCM (a convention known as 'DSD over PCM', or DoP). Codes within the data triggered suitable DACs to switch to DSD mode.
- MKV is the file extension for AV programs contained in the Matroska Multimedia Container. That's a free, open standard. It is designed to hold pretty much any audio and video, however compressed, and was designed with robust streaming in mind. Not surprisingly, pirated movies tend to end up in MKV files. Within the container, the video is typically encoded with a codec such as H.264, HEVC or VP9. For a player to play an MKV file, it must support reading the MKV container as well as being able to decode the actual codec employed within it.
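For the technically curious, the container-versus-codec distinction is easy to see in the file itself. Matroska is built on a simple binary markup called EBML, and every EBML document opens with the same four-byte identifier. This little Python sketch (the function name is my own, not part of any standard) checks nothing more than that signature:

```python
def looks_like_matroska(data: bytes) -> bool:
    """Return True if the byte stream starts with the EBML header ID.

    Matroska (.mkv) files are EBML documents; every EBML document
    begins with the four-byte element ID 0x1A 0x45 0xDF 0xA3.
    """
    return data[:4] == b"\x1a\x45\xdf\xa3"

# A real player would go on to parse the EBML 'DocType' element
# ("matroska" or "webm") and then the codec ID of each track.
print(looks_like_matroska(b"\x1a\x45\xdf\xa3" + b"\x00" * 16))  # True
print(looks_like_matroska(b"RIFF....AVI LIST"))                 # False
```

Checking the signature tells you only that you have the container; decoding what's inside is a separate job entirely.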
The MMT defined by MPEG-H Part 1 is designed to have 'low computational demands' and to work well with new internet network standards. Remember, digital streams have traditionally been sequential, whether delivered over broadcast media or from a disc. Internet-based streams arrive as packets, which may be lost or re-ordered along the way. MMT is designed to work better with this.
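To see why packet delivery needs different handling from a sequential stream, consider this toy Python sketch – not MMT itself, just an illustration of the principle. The stream is chopped into numbered packets, which may arrive in any order, and the receiver restores them by sequence number:

```python
import random

def packetize(payload: bytes, size: int):
    """Split a byte stream into (sequence_number, chunk) packets."""
    return [(i, payload[pos:pos + size])
            for i, pos in enumerate(range(0, len(payload), size))]

def reassemble(packets):
    """Restore the original stream by sorting on sequence number."""
    return b"".join(chunk for _, chunk in sorted(packets))

stream = b"a contiguous media stream, as on a disc or broadcast"
packets = packetize(stream, 8)
random.shuffle(packets)          # network packets can arrive out of order
assert reassemble(packets) == stream
```

A real transport like MMT must also deal with lost packets, timing and mixing multiple media streams, but the sequence-numbered packet is the basic unit it works with.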
MPEG-H Part 2
Part 2 defines the fairly new High Efficiency Video Coding (HEVC) standard. From 2004 MPEG worked with the Video Coding Experts Group (VCEG) on a compression standard to improve on Blu-ray's MPEG-4 AVC (or H.264). The aim was to halve the required bitrate for similar quality output. The final standard, completed in 2013, is called HEVC and is otherwise known as H.265.
And, of course, that’s what’s used on UltraHD Blu-ray. Because of the higher efficiency, even though UHD BD discs have a maximum size of a little more than double a regular Blu-ray disc, they can hold video at four times the resolution.
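The back-of-envelope arithmetic works like this – using round, assumed figures (50GB and 100GB discs, four times the pixels, HEVC needing roughly half the bitrate of AVC, and treating bitrate as scaling simply with pixel count, which is only an approximation):

```python
# Rough, assumed figures for illustration only:
bd_capacity_gb = 50       # dual-layer Blu-ray
uhd_capacity_gb = 100     # triple-layer UltraHD Blu-ray
pixel_ratio = 4           # 3840x2160 vs 1920x1080
hevc_efficiency = 0.5     # HEVC bitrate ~ half of AVC at similar quality

# Relative bitrate needed for similar perceived quality:
relative_bitrate = pixel_ratio * hevc_efficiency   # 2.0x

# Running time relative to a Blu-ray holding the same title in AVC:
relative_runtime = (uhd_capacity_gb / bd_capacity_gb) / relative_bitrate
print(relative_runtime)  # 1.0 -- roughly the same running time fits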
MPEG-H Part 12
Skipping ahead briefly, I thought I should mention Part 12. That defines the High Efficiency Image File Format, or HEIF. This standard was finalised in 2015. It’s a container for still images. Well, extended still images, if you like. A HEIF file can hold one image or a sequence of them, kind of like a short video clip. Because it’s a container, it can hold images encoded with different codecs. Primarily, it’s HEVC that’s intended, but there’s also support for H.264.
One implementation of this has been in the Apple iPhone for the last couple of years. Although it supports only the still version, not sequences, it can realise quite marked space savings using HEVC inside HEIF compared to JPEG.
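HEIF borrows its container from the same ISO Base Media File Format family as MP4, so a file identifies itself in its first few bytes via a 'ftyp' box naming a 'brand' – 'heic' for HEVC-coded stills, for instance. Here's a minimal Python sketch of reading that brand (the function is illustrative only, and real files can be more complicated):

```python
def major_brand(data: bytes) -> str:
    """Read the major brand from an ISO Base Media 'ftyp' box.

    HEIF files are ISOBMFF containers; the first box is normally
    'ftyp', whose next four bytes name the major brand
    (e.g. 'heic' for HEVC-coded stills, 'mif1' generically).
    """
    if data[4:8] != b"ftyp":
        raise ValueError("no ftyp box at start of file")
    return data[8:12].decode("ascii")

# A hand-built 24-byte header in the style of an iPhone HEIC file:
header = (24).to_bytes(4, "big") + b"ftyp" + b"heic" + b"\x00" * 12
print(major_brand(header))  # heic
```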
MPEG-H Part 3
What may yet have considerable impact in the consumer world, but remains little known, is MPEG-H Part 3. That's so-called '3D Audio'.
We’ve all seen how movie (and to a lesser extent, music) audio has grown over the years. Initially it was all mono, including in cinemas. Then an array of front channel speakers provided for on-screen localisation of sound in those super-wide screens of the late 1950s. In the 1960s and 1970s, cinema ‘stereo’ appeared. That added a rear channel to the three at the front. In the 1990s Dolby Digital and DTS brought in 5.1, with two full-bandwidth discrete surround channels. Then 6.1 and 7.1 came in dribs and drabs. Most recently we’ve seen Dolby Atmos and DTS:X. These add a real up/down dimension.
They also deal with audio in a fundamentally different way. Until those two formats, everything was channel-based. The sound engineer put a particular signal in the front left channel, and it was expected that the user would have the front left speaker in the right place in their home to put that sound to best use. The only major concessions to the real world were the ability to adjust the level and the timing. Nothing could be done if the speaker were at the wrong angle.
Atmos and DTS:X introduce a new model: audio can be an object. A sound object – say, a voice – has its location defined in space. It's then up to the decoder to render that object, with the correct timing and levels, through whichever speakers are actually installed. (We'll leave aside for now that few current home decoders actually know precisely where the speakers are.)
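What does rendering an object to the installed speakers actually involve? At its simplest, amplitude panning: splitting the object's signal between the two speakers nearest its intended position. This toy Python sketch uses a constant-power pan – a standard technique, though the real renderers in Atmos, DTS:X or MPEG-H decoders are far more sophisticated:

```python
import math

def pan_between(speaker_left_deg, speaker_right_deg, object_deg):
    """Constant-power amplitude pan of an object between two speakers.

    A toy stand-in for what an object-audio renderer does: given
    where the mixer placed the sound and where the listener's
    speakers actually are, compute per-speaker gains.
    """
    span = speaker_right_deg - speaker_left_deg
    t = (object_deg - speaker_left_deg) / span   # 0..1 across the pair
    t = min(max(t, 0.0), 1.0)
    angle = t * math.pi / 2
    return math.cos(angle), math.sin(angle)      # (left gain, right gain)

gl, gr = pan_between(-30, 30, 0)   # object dead centre between speakers
print(round(gl, 3), round(gr, 3))  # 0.707 0.707 -- equal power
```

The point is that the speaker angles are inputs: move the speakers and the same object data yields different gains, which fixed channel-based audio could never do.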
The 3D Audio defined by MPEG-H Part 3 includes support for up to 128 channels of sound, plus objects. There's also something called Higher Order Ambisonics: a 'scene-based' approach in which the sound field itself is described, independently of any particular speaker layout, and delivered separately from the main channels.
In a sense, MPEG-H 3D Audio is pretty much designed to cover every kind of entertainment now, and into the future. The Ambisonics stuff, for example, is expected to be more heavily used with Virtual Reality. Also supported is binaural sound for headphones (that’s material specifically recorded for headphones, with microphones typically placed as though they were ears on a head).
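For the curious, first-order Ambisonics encodes a sound not as speaker feeds but as four signals describing the sound field: an omnidirectional component plus three directional ones. This sketch uses the traditional B-format convention (with the usual 1/√2 scaling on the W channel) – an illustration of the principle, not of MPEG-H's actual Higher Order Ambisonics machinery:

```python
import math

def encode_fo_ambisonics(sample, azimuth_deg, elevation_deg):
    """Encode one sample into first-order B-format (W, X, Y, Z).

    Ambisonics describes the sound field itself rather than speaker
    feeds; a decoder later maps W/X/Y/Z onto whatever speakers (or
    a VR headset's binaural renderer) are available.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                  # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front-back
    y = sample * math.sin(az) * math.cos(el)   # left-right
    z = sample * math.sin(el)                  # up-down
    return w, x, y, z

w, x, y, z = encode_fo_ambisonics(1.0, 90, 0)   # source hard left
print(round(x, 3), round(y, 3), round(z, 3))    # 0.0 1.0 0.0
```

'Higher order' versions add more directional components for sharper spatial resolution, but the layout-independence is the same – which is why the technique suits Virtual Reality, where the listener's head is constantly turning.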
MPEG-H 3D Audio is really for broadcast and Internet-based entertainment delivery. Given that the UltraHD Blu-ray standards have long since been settled, it’s unlikely to make an appearance there.
What it will potentially provide is high quality surround sound for new TV systems, including Internet ones. But of course there are competitors. In particular, Dolby AC-4 is being closely looked at by various standards committees.
MPEG-H 3D Audio does have a real-world implementation: it has been in operation for nearly two years in an Ultra High Definition terrestrial broadcast system in South Korea. But according to the MPEG-H 3D Audio website, only one consumer device has yet been approved for its reproduction. That's a new Sennheiser soundbar. Apparently there's another device from LG for its reproduction, although that's not listed as approved.
And now Sony has announced ‘360 Reality Audio’. That’s based on MPEG-H 3D Audio, but is more a whole ecosystem for music rather than movies. The idea is that music will be recorded (or at least processed) to provide fully immersive surround sound. Given that the several attempts over the years to introduce high quality surround music – SACD, DVD Audio, multichannel music on Blu-ray discs – have resulted only in niche acceptance, I wouldn’t be confident that this initiative will be successful.
Nonetheless, there’s a moderate chance that MPEG-H 3D Audio might become the choice for the new European UltraHD broadcast standard (the US ATSC 3.0 standard will probably tilt towards Dolby).
Of course, as broadcast TV continues to suffer at the hands of video on demand via streaming services, and a younger audience who seem uninterested in traditional TVs, who knows whether there'll be that much of an industry for either MPEG-H 3D Audio or Dolby AC-4 to dominate.