Mysteries of metadata
Ever wonder how your AV receiver knows what to do with a file? David Meyer explores metadata.
For years I’ve been writing about all things video and HDMI, and countless times I’ve mentioned metadata. But I’ve never really gone into any detail about this all-important yet enigmatic part of an AV signal. So just what is metadata, and how widespread is its use? Let’s lift the lid and have a snoop around…
“Meta” denotes something that refers to itself. “Metadata” can therefore be described as data about data. And the data in question can be just about anything. Think of it as the information or instruction sheets that accompany and give meaning to the exabytes of digital information constantly whizzing around the world over assorted media.
There are many applications, standards and policies surrounding metadata — some controversial, others totally benign. Government access to and use of metadata that tracks people’s activities online has created a stir amongst privacy advocacy groups for a few years now. On the standards side, the Library of Congress in the US maintains the Metadata Encoding and Transmission Standard (METS) as used on the World Wide Web, while the International Press Telecommunications Council (IPTC) defines the structure and use of metadata in the press and imagery; its members include:
- the European Broadcasting Union (EBU),
- news agencies such as The New York Times, Reuters, Bloomberg, Associated Press, and BBC News,
- image libraries including Adobe Systems, Getty Images and Shutterstock.
In these applications, metadata may be used to label a file with things like the date and time of creation, author or creator name, creation information (e.g., camera model and photo settings), copyright, file size and data quality. Typically it’s benign stuff that we take for granted.
But I’m here to talk about metadata in AV signals — that’s way more fun! In this context, metadata is the labelling and instruction sheet that accompanies a media file so that the relevant devices in the system — e.g., AV receiver or display — know what the AV data is and what to do with it. It doesn’t take up much of a media file or stream: mere hundreds of bits out of the millions of bits per second in a compressed AV stream, or billions of bits per second in an uncompressed one.
There are a number of different video and audio metadata applications that are crucial to an AV system working and performing as expected.
When you stream a show on Netflix, metadata is packaged with the incoming video that describes the composition of said video — the codec and container used, resolution, frame rate, colour space and profile, whether it features HDR, and if so which type, and so on.
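To picture what such a package contains, here’s a toy Python sketch. The field names are made up for clarity — real streaming services use their own schemas — but the kinds of information carried are those listed above.

```python
# Illustrative sketch only: a hypothetical stream-metadata record of the kind
# a streaming app might receive. Field names are invented for this example.

stream_metadata = {
    "codec": "hevc",                # compression codec in use
    "resolution": (3840, 2160),     # width x height in pixels
    "frame_rate": 23.976,           # frames per second
    "color_space": "bt2020",        # colour space / gamut
    "hdr_format": "dolby_vision",   # None if the content is SDR
}

def describe_stream(md):
    """Build a human-readable summary from the metadata fields."""
    w, h = md["resolution"]
    hdr = md["hdr_format"] or "SDR"
    return f"{w}x{h} @ {md['frame_rate']} fps, {md['codec']}, {md['color_space']}, {hdr}"

print(describe_stream(stream_metadata))
# → 3840x2160 @ 23.976 fps, hevc, bt2020, dolby_vision
```

A few hundred bits of labelling like this is all a receiving device needs to configure itself for millions of bits per second of video.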
If you’re using the Netflix app on a smart TV, it works out the optimum settings directly and displays the video accordingly. For example, the “Netflix Optimised” mode on a Sony Master series TV uses metadata delivered by Netflix to drive a corresponding pre-calibrated mode in the display. If you’re using an external media streaming box such as an Apple TV 4K or Nvidia Shield, it will compare the EDID coming through the HDMI port from downstream devices (AVR, display, etc.) with the metadata-defined capabilities of the source media, and output the optimal combination over HDMI.
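That negotiation can be sketched as a simple intersection: take the features the content’s metadata declares, take the features the display’s EDID advertises, and pick the best common option. This is a toy model, not the HDMI handshake itself, and all the names are illustrative.

```python
# Toy sketch of source-side capability negotiation: intersect the content's
# metadata-declared features with the sink capabilities read from EDID, and
# pick the most preferred common option. Names are illustrative, not an API.

def pick_output(content_caps, edid_caps, preference):
    """Return the most preferred feature supported by both content and display."""
    common = set(content_caps) & set(edid_caps)
    for option in preference:          # preference is ordered best-first
        if option in common:
            return option
    return None                        # no common ground at all

content_hdr = ["dolby_vision", "hdr10", "sdr"]   # what the stream can deliver
display_hdr = ["hdr10", "hlg", "sdr"]            # what the display's EDID reports
print(pick_output(content_hdr, display_hdr, ["dolby_vision", "hdr10", "hlg", "sdr"]))
# → hdr10  (the content offers Dolby Vision, but the display doesn't support it)
```

The real mechanism is richer, of course, but the principle is the same: metadata on one side, EDID on the other, and the source brokers the best match.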
Same goes for other sources including broadcast and satellite services, gaming consoles, and packaged media such as UHD Blu-ray. Regardless of the delivery method, metadata is crucial in defining the composition and available features of the video signal. If something happens to the metadata, the AV stream will not present as expected, and that can lead to unscheduled troubleshooting time and project overruns. Boo!
Three really good examples of emerging vulnerabilities in HDMI video metadata are:
- HDR,
- Frame/refresh rate, and
- Applied compression.
Joel Silver of ISF famously said “the biggest problem with HDR is getting it to turn on”. He was talking about metadata. This was actually the catalyst for the development of the upcoming CTA/CEDIA recommended practice CEB28, HDMI System Design and Verification, in which HDR features heavily. Joel is Chair of the working group.
HDR metadata is quite literally the key to making HDR work in all formats except HLG (which doesn’t need metadata). The metadata includes HDR type, mastering brightness level, and the colour and tone mapping profile that a display can then apply for optimal presentation. That way, the same HDR content can render optimally whether the supporting display is, for example, 800 or 2,000 cd/m² (nits) in brightness, or OLED, LCD, or quantum dot.
“Static metadata” for HDR carries one settings profile for an entire programme, based primarily on the SMPTE ST 2086 standard, with transport from source introduced in HDMI 2.0a. “Dynamic metadata” is far superior in that it’s included in the header of every frame, enabling frame-by-frame optimisation of the HDR settings. This is defined by the SMPTE ST 2094 suite of standards, with HDMI 2.1 providing transport from source to display.
Historically, HDMI has always operated with constant video frame rates and an accompanying clock to keep timing in the video signal, like a metronome. A very cool new feature of HDMI 2.1 is variable refresh rate (VRR), which will be particularly popular with gamers. In fact, some new TVs and gaming consoles already support it. But because VRR decouples the frame rate and frame size (including blanking interval) from the clock speed, it has to instead use metadata to let the display know what to do. This can be bundled alongside the dynamic HDR metadata in every frame’s header.
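A toy model shows why the metronome no longer works: with VRR the interval between frames fluctuates with the game’s rendering load, so the display has to derive timing from the frames (and their metadata) as they arrive rather than from a fixed clock. The numbers below are invented for illustration.

```python
# Toy model of VRR timing: frame intervals are no longer fixed by a clock;
# the display tracks them frame by frame. Arrival times are illustrative.

def refresh_intervals(frame_times_ms):
    """Per-frame intervals the display derives as frames arrive."""
    return [round(b - a, 2) for a, b in zip(frame_times_ms, frame_times_ms[1:])]

# A game rendering at a fluctuating rate (arrival times in milliseconds):
arrivals = [0.0, 16.7, 41.7, 55.0]
print(refresh_intervals(arrivals))
# → [16.7, 25.0, 13.3]  (roughly 60, 40 and 75 fps moment to moment)
```

A fixed-clock display would have to repeat or drop frames to absorb that variation, which is exactly the judder and tearing VRR eliminates.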
Another consideration is the introduction of compression in HDMI. It will be a while before we see this, but keep in mind it won’t just be for the super high-end formats above 48Gbps — it will also make it possible to deliver 8K through a smaller pipe such as 18Gbps. But this too will rely on metadata to communicate what’s going on.
Metadata for audio is every bit as prevalent in AV media streams as it is for video. After all, that’s how a device receiving the media stream knows what format and quality of audio is included. Is it L-PCM 2.0, compressed DTS 7.1, or lossless Dolby Atmos, and is it channel- or object-based? And then at what sampling rate, etc.? It’s all in there. As with video, the metadata structure may differ between media coming into the home versus what goes out over HDMI, but the purpose is the same.
HDMI refers to multi-channel surround sound as “3D Audio”, for which there are several standards defining speaker layouts and channel designations — ITU for 10.2, SMPTE for 22.2, CTA 26.2, and IEC maxes out current limits with 30.2 channels. Metadata informs the system which standard, channel count and layout need to be applied.
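In practice that signalling amounts to little more than a lookup: a code in the metadata maps to a known layout. The sketch below invents layout codes for the four standards named above; the real signalling uses binary descriptors, not strings like these.

```python
# Hypothetical lookup from a metadata-signalled layout code to a speaker
# layout, expressed as (main channels, LFE channels). Codes are invented.

LAYOUTS = {
    "ITU_10_2":   (10, 2),
    "SMPTE_22_2": (22, 2),
    "CTA_26_2":   (26, 2),
    "IEC_30_2":   (30, 2),   # current upper limit
}

def total_speakers(code):
    """Total number of speaker feeds implied by a layout code."""
    main, lfe = LAYOUTS[code]
    return main + lfe

print(total_speakers("SMPTE_22_2"))
# → 24  (24 discrete feeds for a 22.2 system)
```

The point is that a handful of metadata bits tells the whole downstream chain how many amplifier channels and which physical positions to light up.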
Object-based audio can’t even exist without metadata. Every object has to be described in terms of its X-Y-Z position, gain, correlation, and snap tolerance, amongst other things. The cinematic standard for this is SMPTE ST 2098-1.
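To give a feel for what a renderer does with that positional metadata, here is a deliberately tiny example: an object with an invented position record, panned between two speakers with a standard equal-power law. This is basic stereo panning maths, not the ST 2098-1 rendering algorithm.

```python
# Toy illustration of object-audio metadata in use. The object record echoes
# the kinds of parameters described above (position, gain), but the renderer
# here is a simple equal-power stereo pan, not a cinema renderer.

import math

def pan_gains(x):
    """Equal-power pan for x in [-1 (hard left) .. +1 (hard right)]."""
    theta = (x + 1) * math.pi / 4            # map position to 0..pi/2
    return math.cos(theta), math.sin(theta)  # (left gain, right gain)

obj = {"x": 0.0, "y": 0.5, "z": 0.0, "gain": 1.0}  # object dead centre
left, right = pan_gains(obj["x"])
print(round(left, 3), round(right, 3))
# → 0.707 0.707  (equal power to both speakers at centre)
```

A real renderer does this in three dimensions across the whole speaker layout, frame by frame, driven entirely by the metadata — which is why losing that metadata doesn’t just degrade object audio, it makes it impossible.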
There are also some fascinating developments on the audio side of things. One is the proposed use of metadata to optimise audio depending on the listener’s environment. For example, listening to hi-res music through a 2-channel speaker system in a quiet room, versus listening to the same music through headphones on a noisy train (noise cancelling notwithstanding). For the latter, the audio may be processed by the portable device to increase loudness and reduce the dynamic range, thereby lifting any softer passages above the ambient noise floor, but that could introduce unwanted artefacts in the sound.
The Audio Engineering Society (AES) community is working on emerging use cases for metadata that presets loudness and dynamic range profiles, letting a user select one to suit their listening circumstances while preserving the fidelity and artistic intent of the soundtrack.
The video and audio metadata applications mentioned here are just some examples, but hopefully enough to demonstrate how important metadata is, and how much care needs to be taken to ensure it is delivered intact. Irrespective of the use, if metadata breaks, things don’t work as they should. Metadata can be likened to a good control system — even the best AV system is useless if the user doesn’t know how to turn it on.
As metadata is carried at the top of each video frame in HDMI, any unpacking or manipulation of the video can jeopardise it. For example, with an HDMI extender or AV-over-IP system that needs to compress the video, the HDR metadata must be taken out and sent separately, then re-inserted at the far side for delivery over the last hop of HDMI to the display. If this metadata is lost, then the HDR instructions are lost, and the picture may be too dark or way off colour.
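The extract-and-reinsert dance can be sketched in a few lines. Everything here is illustrative: real systems work on InfoFrames in the blanking interval and proper codecs, not the stand-ins below, but the failure mode at the end is exactly the one described above.

```python
# Sketch of why AV-over-IP systems must handle HDR metadata explicitly:
# strip it before compressing, carry it out-of-band, re-attach it at the
# far end. All structures and the "compression" are illustrative stand-ins.

def encode_for_ip(frame):
    """Split a frame into a compressed payload and side-channel metadata."""
    metadata = frame.pop("hdr_metadata")        # carried separately
    payload = {"video": frame["pixels"][::2]}   # stand-in for compression
    return payload, metadata

def decode_from_ip(payload, metadata):
    """Rebuild a frame and re-attach the metadata for the last HDMI hop."""
    return {"pixels": payload["video"], "hdr_metadata": metadata}

frame = {"pixels": [1, 2, 3, 4], "hdr_metadata": {"max_nits": 1000}}
payload, md = encode_for_ip(frame)
out = decode_from_ip(payload, md)
print(out["hdr_metadata"])
# → {'max_nits': 1000}

# If the side channel drops the metadata (md = None), the display gets video
# with no HDR instructions, and the picture presents too dark or off colour.
```

The chain is only as good as its weakest link: every hop that touches the video has to carry the metadata through faithfully.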
There are plenty of examples that could fill pages in this magazine. Instead I’d highly recommend looking out for the release of CEB28 in the second half of 2020. This will serve as an industry guide to HDMI system design, and also how to verify each and every required feature to ensure optimal system performance.