Connected Magazine


The importance of lip sync

By Stephen Dawson
13/08/2014

A lot of people are finding their viewing somewhat disquieting for reasons that aren’t all that obvious. Stephen Dawson explains the importance of lip sync adjustment and why you might need to do it manually.

Once upon a time we received our home entertainment in simple, straightforward ways. The video and the audio were multiplexed together into a single RF stream. Your TV would demux them, send the sound off to the speaker and the picture off to the CRT. Both would treat their respective signal as a simple stream and act on it instantly.

If the sound and picture drifted apart, then there was something wrong at the TV station.

But now it is the norm that they don’t match. The reason is simple: while the processing of sound has been slowed up a little due to modern technology, the processing of the picture has been slowed a great deal more.

Mind Games
Fortunately as human beings we have a fair bit of tolerance for mismatches between picture and sound, and our brains adjust for it. But in real life, the mismatch is the other way around.

Sound travels at around 340 metres per second, light at about 300 million metres per second. 340 metres isn’t very far: the length of three or four football fields, depending on the code you follow. If something makes a noise that far away you will see it, for all practical purposes, instantaneously. But it will take a full second for the sound to get to your ears.

If you are talking to someone on the other side of the room – say, 5m away – then again you will see their lips moving at the same instant that they are actually moving, but their voice will be 15 milliseconds behind it. Now 15 milliseconds – 15 one-thousandths of a second – may seem like hardly any time to make a fuss about. But our brains use timing cues much smaller than that. We tell the direction of a sound largely by the differences between when the sound reaches our left and right ears, and those differences are usually just a fraction of a millisecond.
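The arithmetic behind those figures can be checked with a few lines of Python. This is only a rough sketch: it uses the article's round figure of 340 metres per second for sound and treats light as arriving instantly, and the function name is made up for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 340.0  # round figure used in the article


def sound_lag_ms(distance_m: float) -> float:
    """Milliseconds by which sound trails the image over a given distance.

    Light is treated as instantaneous, which is an approximation that
    holds for any distance you could plausibly see across.
    """
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


for distance in (5, 340):
    print(f"{distance} m: sound arrives {sound_lag_ms(distance):.1f} ms late")
```

Across a 5m room the lag works out to a little under 15 milliseconds, and at 340 metres it is the full second.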

Fortunately in day to day life the various processing circuits of our brains do a little magic and make the voice of the person talking to us sound like it is precisely matched to the movement of that person’s lips.

Up to a point. If they are too far away then the adjustment is no longer made, and the lips and voice don’t match.

Lots of data
But things are backwards when it comes to the picture and sound in a home entertainment system. The picture gets to you slower than the sound does because of the nature of the playback system.

The sound – even with modern digital systems – is delayed hardly at all. Digital decoding and DSP manipulation of the sound all happens in just a millisecond or two.

But the video is a different matter.

Let’s use the standard 1080p24 signal from Blu-ray as the basis of our consideration.

There’s a lot of data in each video frame: over two million pixels. It takes time to apply the various noise reduction and colour correction algorithms employed by the modern TV or display.

In addition, there are 24 frames per second, so after each frame the TV has to wait nearly 42 milliseconds for the next frame. If there is any kind of motion smoothing employed by the display, then it needs the data from that next frame as well to calculate intermediate frames. All these add up to delaying the display of the video.
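To put numbers on that, here is a small Python sketch of the 1080p24 arithmetic. The pixel count and frame interval follow directly from the format. The `frames_held` parameter is a hypothetical stand-in for however many frames a particular display buffers before showing a picture, which varies from model to model and is not something the format specifies.

```python
WIDTH, HEIGHT, FRAMES_PER_SECOND = 1920, 1080, 24  # 1080p24

pixels_per_frame = WIDTH * HEIGHT
frame_interval_ms = 1000.0 / FRAMES_PER_SECOND


def buffering_delay_ms(frames_held: int) -> float:
    """Delay added by holding back a number of whole frames.

    frames_held is an assumed figure for illustration only; real
    displays add further processing time on top of any buffering.
    """
    return frames_held * frame_interval_ms


print(f"Pixels per frame: {pixels_per_frame:,}")
print(f"Time between frames: {frame_interval_ms:.1f} ms")
print(f"Delay if two frames are held: {buffering_delay_ms(2):.1f} ms")
```

Holding back even one or two frames accounts for a large part of the delays typically seen in practice.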

By how much?

The delay varies depending on the display itself, and the amount of processing it is applying to the video. In practice a typical large screen TV will delay the image somewhere between about 40ms through to, in extreme cases, 150ms.

These delays are particularly important to gamers because they impact adversely on their reaction times.

But they can also lead to the phenomenon of lips appearing to be out of synchronisation with voices, and other audio-video mismatches.

Even though the order is reversed by TVs – the picture being delayed rather than the sound – the same brain mechanisms that fix lip sync issues in the real world also tend to fix them with our home entertainment systems.

So long as the delay isn’t too long.

How long is too long varies from individual to individual. A person sensitive to the issue may notice a problem with a video delay of just 20ms. Others can comfortably cope with 50ms. Get up above 100ms, however, and the delay is discomforting to just about all viewers.

For each individual there seems to be a threshold amount of tolerable delay. If for a particular person it is 40ms, then the processing circuits in his or her brain will make the picture and sound match up to that point, but with a slightly higher delay the two will snap out of sync.

We also seem to be more tolerant of timing errors between voice and lips than we are between the sound and picture of objects striking each other.

HDMI 1.3 to the rescue!
The solution to this is to delay the audio so that it matches the video. With modern digital processors this is a relatively straightforward technical task. Most home theatre receivers have for some years implemented an ‘Audio Delay’ feature, in which you can typically delay the sound by up to 200ms. Most offer millisecond by millisecond adjustments, but some more sensibly have interval jumps of ten milliseconds. Likewise, a lot of Blu-ray players also have this feature.
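For the curious, an audio delay of this kind amounts to holding samples in a buffer for a while before passing them on. The Python sketch below shows the idea only. It is not how any particular receiver does it, the 48kHz sample rate is an assumption, and the function names are invented for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def delay_in_samples(delay_ms: float, sample_rate_hz: int = 48_000) -> int:
    """Convert a delay in milliseconds to a whole number of samples.

    The 48 kHz default is an assumed sample rate, not a figure from
    any specific receiver.
    """
    return round(delay_ms * sample_rate_hz / 1000)


def delayed(samples: Iterable[float], delay_ms: float,
            sample_rate_hz: int = 48_000) -> Iterator[float]:
    """Yield the input samples delayed by delay_ms, padded with silence."""
    # Pre-fill the buffer with silence equal to the requested delay.
    buffer = deque([0.0] * delay_in_samples(delay_ms, sample_rate_hz))
    for sample in samples:
        buffer.append(sample)
        yield buffer.popleft()


print(delay_in_samples(50))    # samples held back for a 50 ms delay
print(delay_in_samples(200))   # samples held back for a 200 ms delay
```

At the assumed rate, even the 200ms maximum means buffering fewer than ten thousand samples per channel, which is why the feature costs so little to provide.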

But the question is: how much delay should be dialled in?

The problem here is that our tolerance for these lip sync errors perversely makes it hard to estimate the right correction to make. I for one find it extremely hard to tell with normal program material whether the audio is leading the video or vice versa.

I find that if there is a problem and there’s no other way to solve it, it’s best to just set a 50ms audio delay and then watch for a while to see whether that fixes it. If not, go up or down by 25ms.

But a solution should already have been in place. HDMI 1.3 implements an automatic lip sync feature. The TV uses this to tell the device supplying its image how much delay it imposes. That device – usually the home theatre receiver – can then delay the audio by the right amount.

Just about all home theatre receivers implement this feature. But the problem is that many displays don’t. Often home theatre projectors support it, but most big screen TVs don’t.

So we’re back to the same problem: how to work out how much delay should be imposed?

There are specialised instruments that can help, but almost no-one has access to those.

The best practical solution is a test pattern.

While it’s hard to assess synchronisation errors with regular video material, a smoothly moving pattern and a regular click noise make this far easier.

The only one I’ve found in all the test discs I’ve acquired over the years is on the Digital Video Essentials HD Basics Blu-ray, available from Beyond Home Entertainment (www.beyondhe.com.au). This has a rotating arm (sometimes in colour) that sweeps through a full circle every second, and a beep that sounds when it’s supposed to be at top dead centre. By following the arm with your eyes you can get a good sense of how far short of the top it is when you hear the sound. Then you can work out the required delay. If you hear the sound when the arm is, say, still fifteen degrees short of the top, then the video delay is 15/360*1000 milliseconds, which is about 42 milliseconds. Dial that into your home theatre receiver and the problem is solved.
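The same sum can be wrapped up in a couple of lines of Python for any angle you happen to observe. This is a sketch under stated assumptions: the one-second sweep comes from the test pattern described above, while the function names and the ten millisecond step are illustrative, standing in for a receiver that only adjusts in coarse jumps.

```python
def delay_from_angle_ms(degrees_short_of_top: float,
                        sweep_period_ms: float = 1000.0) -> float:
    """Video delay implied by how far short of the top the arm is at the beep.

    The default sweep period assumes a pattern that completes one full
    circle every second.
    """
    return degrees_short_of_top / 360.0 * sweep_period_ms


def snap_to_step(delay_ms: float, step_ms: int = 10) -> int:
    """Round a delay to the nearest setting a receiver with coarse steps offers.

    The step size is an assumed example; check what your own receiver allows.
    """
    return int(round(delay_ms / step_ms)) * step_ms


measured = delay_from_angle_ms(15)
print(f"Measured delay: {measured:.1f} ms")
print(f"Nearest 10 ms setting: {snap_to_step(measured)} ms")
print(f"Nearest 1 ms setting: {snap_to_step(measured, 1)} ms")
```

Given how rough an eyeball estimate of the angle is likely to be, the nearest ten millisecond setting is usually as precise as the measurement deserves.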
