Audio quality concerns in video-conferencing
Video conferences should be a two-way conversation yet are often hindered by poor audio quality. Andy Ciddor looks at how to rectify the common issues.
The video conference is a brutally hostile audio environment.
Although the requirement is simple, namely to provide free and natural conversation between groups of people in different spaces, every aspect of the audio configuration contradicts what we know about good audio practice. There are open microphones and live loudspeakers in close acoustic proximity, yet tight microphone placement is all but unachievable, low loudspeaker volumes are of little use and operator intervention is neither desirable nor really possible.
Early solutions to the two-way conversation problem involved a separate, microphone-only clean feed from each space, sent over separate signal paths and mixed down at each endpoint. Each endpoint required critical placement of narrow pattern microphones for local pickup, low volume loudspeakers delivering the sound from the other endpoint, and much concentration and quick-witted intervention from the audio operators at each end, to avoid feedback.
So complex and expensive was this to set up and operate that it was only ever used for occasions such as multi-studio television interviews and very high-end corporate meetings and presentations. Even today, most multi-point video setups entail the use of earpieces for IFB (interruptible foldback) to reduce the risk of feedback. This hardly makes for a spontaneous, natural conversation between the participants.
Any attempt to use a single signal path, such as a telephone line, VoIP session or comms link, to carry the combined signals from both ends rapidly degenerates into an interesting collection of pulsing and squealing feedback noises. With open microphones and live loudspeakers at both ends of the link, there is a very high probability that the signals from the microphones at one end will emerge from the loudspeakers at the other end, be picked up by the microphones there and be sent straight back, producing first an echo and eventually a feedback loop.
One simple, but not particularly helpful, way to manage the problem is to use what is known as half-duplex mode, where only one end is sending a signal at any one time. Although this can be achieved with push-to-talk switches, as on a two-way radio, it is usually implemented for ‘hands-free’ operation via a simple audio gating device that mutes the local loudspeakers as soon as any sound is detected by the local microphone. You may recognise this as the immensely frustrating, and almost completely useless, ‘speaker’ mode found on many desktop phones.
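The gating behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame-based processing, the RMS level detector and the `threshold` value are all hypothetical choices, and real devices add hold times and hysteresis that are omitted here.

```python
def half_duplex_gate(mic_frames, far_frames, threshold=0.02):
    """Mute the loudspeaker feed whenever the local microphone is active.

    mic_frames, far_frames: lists of frames, each a list of float samples.
    threshold: hypothetical RMS level above which the local mic counts
               as "someone is talking here".
    Returns the frames that would be sent to the local loudspeaker.
    """
    speaker_frames = []
    for mic, far in zip(mic_frames, far_frames):
        rms = (sum(s * s for s in mic) / len(mic)) ** 0.5
        if rms > threshold:
            # Local sound detected: cut the far end completely.
            speaker_frames.append([0.0] * len(far))
        else:
            # Local mic is quiet: pass the far end through unchanged.
            speaker_frames.append(list(far))
    return speaker_frames


# Frame 0: nobody is talking locally. Frame 1: a local talker is active.
mic = [[0.0, 0.0, 0.0, 0.0], [0.5, -0.5, 0.5, -0.5]]
far = [[0.1, 0.2, 0.1, 0.2], [0.3, 0.3, 0.3, 0.3]]
speaker = half_duplex_gate(mic, far)
```

In the second frame the remote talker is silenced entirely, which is exactly why the mode feels so unnatural: nobody can interrupt or talk over anybody else.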
The key to solving the single audio line feedback problem is to remove all local microphone content from the signal feeding the local loudspeakers. This is quite difficult when there are open microphones, active loudspeakers and acoustic reflections at both ends of the conference. The means to achieve this is an acoustic echo cancellation (AEC) processor that recognises the originally transmitted signal when it reappears, with some delay, in the transmitted or received signal. When such an echo is detected, the processor subtracts it from the signal.
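One common way to build such a canceller is an adaptive filter that learns how the room turns the loudspeaker feed into microphone echo, then subtracts its own prediction of that echo from the microphone signal. The sketch below uses the normalised least-mean-squares (NLMS) update, which is a textbook choice and not necessarily what any particular product uses. The simulated room (two reflections at made-up delays and levels), the filter length `taps` and the step size `mu` are all assumptions for illustration, and there is deliberately no local talker, so double-talk handling is not shown.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic."""
    w = [0.0] * taps   # current estimate of the room's echo path
    x = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for far_sample, mic_sample in zip(far_end, mic):
        x = [far_sample] + x[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = mic_sample - echo_estimate            # what is sent to the far end
        norm = sum(xi * xi for xi in x) + eps     # normalise by input energy
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        residual.append(e)
    return residual


def delayed(sig, n, d):
    """Sample n of sig delayed by d samples (silence before the start)."""
    return sig[n - d] if n >= d else 0.0


def power(sig):
    return sum(s * s for s in sig) / len(sig)


random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]
# Hypothetical room: one reflection 3 samples late at 60% level,
# and a weaker one 5 samples late at 20% level.
mic = [0.6 * delayed(far_end, n, 3) + 0.2 * delayed(far_end, n, 5)
       for n in range(len(far_end))]

residual = nlms_echo_cancel(far_end, mic)
echo_power = power(mic[-500:])          # echo level without cancellation
residual_power = power(residual[-500:])  # echo level after adaptation
```

Once the filter has adapted, `residual_power` is a tiny fraction of `echo_power`. A real room response runs to thousands of taps and changes whenever someone moves, which goes some way to explaining why the job had to wait for DSP hardware.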
Unlike the voice-operated switch (VOX) circuit, which lies at the heart of half-duplex systems, there were no simple analogue audio circuits available for AEC. It took the development of an echo-cancelling algorithm for the low-cost DSP (digital signal processor) family of microprocessors before relatively simple and affordable devices became available. Virtually all of today’s video and audio conferencing systems include DSP-based AEC in their audio signal path, although the processing location varies from being embedded in the microphone systems to residing in dedicated processing devices in system control racks. Some of today’s smartphones allocate a portion of their many processor cores (the current iPhone and Galaxy each have about nine of them) to perform AEC for a mode that resembles two-way, hands-free conferencing.
The sound from the conference participants needs to be captured as clearly and as cleanly as possible. Lapel/lavaliere microphones may seem like an obvious, visually unobtrusive choice for collecting the voices of conference participants. However, they are usually omnidirectional in pickup pattern and therefore likely to pick up the outputs of loudspeakers. They are also far from ideal for non-professional users as they are both inconvenient to fit and tend to pick up movement and clothing noises from the wearer.
Desk-mounted narrow-pattern (cardioid and hyper-cardioid) microphones work quite well, provided the conference participants remain seated during the conference, work close to the microphone and desist from banging things around on the table. The narrow pick-up pattern means that loudspeakers can be placed off-axis from the microphone to reduce the possibility of feedback. Some desktop microphones designed for conferencing applications include a small (off-axis) loudspeaker in their base. This allows participants to hear the signal from the other end without requiring high volume levels in the conference space, which further reduces the likelihood of feedback.
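To put rough numbers on the benefit of off-axis loudspeaker placement, the idealised first-order pattern formula, gain = a + (1 - a) cos(angle), can be evaluated directly. The values a = 0.5 for a cardioid and a = 0.25 for a hyper-cardioid are the standard textbook figures. Real microphones only approximate these curves, and their patterns vary with frequency, so treat this as a sketch and not as a predictor of any particular product.

```python
import math

CARDIOID = 0.5
HYPERCARDIOID = 0.25


def pattern_gain_db(angle_deg, a):
    """Idealised first-order pickup: gain = a + (1 - a) * cos(angle).

    angle_deg is measured from the front (on-axis) direction of the mic.
    Returns the level in dB relative to on-axis pickup.
    """
    g = abs(a + (1 - a) * math.cos(math.radians(angle_deg)))
    return -math.inf if g < 1e-12 else 20 * math.log10(g)


# Level of a loudspeaker placed at various angles, relative to a talker
# directly in front of the microphone.
for angle in (0, 90, 110, 180):
    print(angle,
          round(pattern_gain_db(angle, CARDIOID), 1),
          round(pattern_gain_db(angle, HYPERCARDIOID), 1))
```

The ideal cardioid rejects sound from directly behind (180 degrees) completely, while the ideal hyper-cardioid has its deepest rejection at roughly 110 degrees and a small rear lobe, so the best loudspeaker position depends on which pattern is in use.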
Perhaps the most significant recent advance in audio conferencing technology is the development of the steerable microphone array. This technology uses an array of microphone elements feeding into a DSP that uses cancellation techniques to configure the beam pattern in real time, creating a virtual microphone that can follow a moving source or even switch instantaneously between multiple narrow beam patterns as sources are detected in different locations. The main configurations seen so far for these arrays are tabletop modules, overhead panels and pendants. Beam-steering microphones using these principles are now available from a wide range of audio companies, with a deluge of new products released at the recent ISE show in Amsterdam.
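The simplest way to see how an array can be "aimed" without moving anything is the delay-and-sum beamformer, sketched below. It is the textbook starting point and not the more sophisticated cancellation processing that commercial arrays use. The geometry here (four elements in a line, 0.1 m apart, a 1 kHz source at 30 degrees) is entirely hypothetical, and delays are rounded to whole samples to keep the code short.

```python
import math


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate, c=343.0):
    """Whole-sample arrival delays for a plane wave reaching a line array.

    angle_deg is measured from straight ahead (broadside); c is the
    speed of sound in metres per second.
    """
    per_mic = spacing_m * math.sin(math.radians(angle_deg)) / c * sample_rate
    return [round(m * per_mic) for m in range(num_mics)]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay, then average the channels."""
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d
            acc += ch[idx] if 0 <= idx < n else 0.0
        out.append(acc / len(channels))
    return out


def power(sig):
    return sum(s * s for s in sig) / len(sig)


FS, TONE_HZ, N = 48000, 1000.0, 960
source_delays = steering_delays(4, 0.1, 30.0, FS)   # source at 30 degrees

# Each microphone hears the same tone, arriving slightly later at each element.
channels = [[math.sin(2 * math.pi * TONE_HZ * (t - d) / FS) for t in range(N)]
            for d in source_delays]

aimed = delay_and_sum(channels, source_delays)                   # beam on the source
ahead = delay_and_sum(channels, steering_delays(4, 0.1, 0.0, FS))  # beam straight ahead

on_power = power(aimed[:480])
off_power = power(ahead[:480])
```

With the beam aimed at the source the four signals line up and add coherently, so `on_power` is the full tone power. With the beam pointed straight ahead they partially cancel and `off_power` is noticeably lower. Changing the beam direction is just a matter of changing the list of delays, which is why a DSP can switch between talkers instantaneously.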
In video conferences, most of the local participants are looking at, and speaking towards, a video screen showing the remote participants, which is the direction from which they will logically expect the remote sound to come. Loudspeakers delivering the remote audio are generally placed in close proximity to, or even behind, the screen. Distributed in-ceiling speakers may be fine for general announcements, evacuation information, sound masking and background music, but they are not particularly suitable for the task of conferencing audio. Aside from the lack of directionality of the sound image, ceiling speakers are likely to disperse sound throughout the conference room and increase the amount of reverberation and echo.
As the requirement for clean conference sound is to limit the audio to the areas where participants are located, aiming loudspeakers directly at the conference participants is a common approach. In larger conference spaces this is frequently implemented through the selection of narrow-beam array elements, although the advent of affordable and effective steerable-beam loudspeakers has allowed for even finer tuning and the development of conference spaces that can be remotely reconfigured for a range of different applications and layouts.
Conferencing audio, from the stand-alone tabletop unit to the dedicated video conference room and the full-service, multi-function auditorium, has only really become practicable since we began to exploit signal processing to undertake what we can’t achieve acoustically.