Transforming video conferencing with edge-based generative AI
Not too long ago, meetings and conferences were synonymous with gathering in a physical space. Nowadays, virtual meetings have become just as common as, if not more common than, in-person interactions. However, the personal connection inherent in face-to-face meetings is something that virtual meetings still struggle to replicate.
Generative artificial intelligence (GenAI) has started to make its mark on virtual meetings, introducing new features designed to enhance productivity and engagement, making these interactions feel more like real-life experiences. For these enhancements to be effective on a large scale, they must operate in real time with minimal delay and at a reasonable cost. This necessitates that some of the new functionalities be integrated directly into the connected devices. Solution providers are already incorporating AI into video conferencing platforms and personal computers to optimise real-time performance, add virtual enhancements and automate meeting management.
Redefining video conferencing with GenAI
GenAI has the potential to transform the video, audio and text experience of virtual meetings. Picture a hybrid meeting with both boardroom and remote participants. Instead of sending a static wide shot of the boardroom participants to the remote team, intelligent video processing can dynamically zoom in on speakers, replicating the nuanced experience of in-person interactions. Technologies like neural radiance fields (NeRF) can generate engaging views of remote participants, providing an immersive experience by dynamically changing the angle of view at each endpoint. AI can create a harmonious and consistent gallery view, displaying all participants in a uniform size, posture and style. If there's a whiteboard in the boardroom, AI can auto-detect it, recognise the written notes and convert them into an editable format. A personal version can then be created for note-taking and on-the-fly comments.
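To make the dynamic-zoom idea concrete, here is a minimal sketch of edge-side auto-framing, assuming OpenCV's stock Haar-cascade face detector as a stand-in for a purpose-trained speaker-detection model; the function names and parameters are illustrative, not taken from any product mentioned in this article.

```python
import cv2

# Minimal auto-framing sketch: find the most prominent face in each frame
# and crop/zoom to it before the frame is encoded and sent. A production
# system would use a dedicated on-device detection model plus temporal
# smoothing; the Haar cascade below is only an illustrative stand-in.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def auto_frame(frame, margin=0.4):
    """Return a crop of `frame` centred on the largest detected face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return frame  # no face found: fall back to the wide shot
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face wins
    dx, dy = int(w * margin), int(h * margin)           # widen the box
    y0, y1 = max(y - dy, 0), min(y + h + dy, frame.shape[0])
    x0, x1 = max(x - dx, 0), min(x + w + dx, frame.shape[1])
    return frame[y0:y1, x0:x1]

cap = cv2.VideoCapture(0)  # the boardroom camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("auto-framed", auto_frame(frame))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```

Because the crop happens before encoding, the same logic also reduces the bandwidth needed to transmit the boardroom feed.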
On the audio and text front, GenAI can serve as a personal assistant for each participant, enhancing productivity. This assistant can convert audio to text, summarise meetings, assign action items to their respective owners and even suggest relevant responses in real time. For multilingual teams, language barriers can be overcome with instantaneous audio translation provided by the AI assistant.
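As a rough illustration of the assistant's audio-to-text and summarisation steps, the sketch below runs both models locally on the endpoint, using the open-source openai-whisper package for transcription and a Hugging Face summarisation pipeline; the model choices and chunk size are assumptions made for illustration, and a real endpoint would select models sized to its AI accelerator.

```python
import whisper                     # openai-whisper: local speech-to-text
from transformers import pipeline  # Hugging Face transformers

asr = whisper.load_model("base")        # small local transcription model
summariser = pipeline("summarization")  # default local summarisation model

def summarise_meeting(audio_path: str) -> str:
    """Transcribe a meeting recording and return a text summary."""
    transcript = asr.transcribe(audio_path)["text"]
    # Summarise in fixed-size chunks so long meetings fit the model's
    # context window; 2,000 characters is an arbitrary illustrative size.
    chunks = [transcript[i:i + 2000] for i in range(0, len(transcript), 2000)]
    parts = [summariser(c, max_length=120, min_length=30)[0]["summary_text"]
             for c in chunks]
    return " ".join(parts)

print(summarise_meeting("meeting.wav"))  # hypothetical local recording
```

Translation could be slotted into the same pipeline by swapping the summarisation model for a local translation model.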
However, despite its vast potential, GenAI as it exists today is limited by the technology that enables it. To make AI-based video conferencing effective and widely accessible, relying solely on existing cloud-based services is insufficient.
Unlocking GenAI’s power on the edge
To enable the above applications, video conferencing systems must be able to perform GenAI processing at the endpoints themselves, whether on the personal computer or on the conferencing gateway device, without reaching back to the cloud for processing.
One of the key aspects of conferencing systems is their ability to scale. To scale well, it is vital to identify the cases in which centralised processing is appropriate and those that require edge processing.
There are three main cases in which processing at a central point is advantageous, as illustrated in the code sketch after this list:
- Time sharing – when a function requires only light processing that a central machine can handle at a fraction of its capacity, such as an alert when a participant enters the room or unmutes their microphone, that machine can serve all endpoints, each in a different time slot, without noticeable impact.
- Information sharing – when the same information needs to be shared by all participants, for example a shared whiteboard with no personal comments per participant.
- Resource sharing – when a function involves processing that is common to all endpoints, such as searching a shared database. In such cases, the shared processing can be performed once and reused for many or all endpoints.
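As a rough illustration of this split, the hypothetical routing heuristic below sends a task to the central machine when any of the three cases applies and to the edge otherwise; the Task type and its fields are invented for this sketch, since a real system would derive such properties from profiling and deployment constraints.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Hypothetical description of a conferencing function."""
    name: str
    light_enough_to_timeshare: bool  # case 1: time sharing
    output_shared_by_all: bool       # case 2: information sharing
    processing_common_to_all: bool   # case 3: resource sharing

def placement(task: Task) -> str:
    """Central processing when any of the three cases applies, else edge."""
    central = (task.light_enough_to_timeshare
               or task.output_shared_by_all
               or task.processing_common_to_all)
    return "central" if central else "edge"

tasks = [
    Task("unmute alert", True, False, False),                # light work
    Task("shared whiteboard", False, True, False),           # one view for all
    Task("per-user live translation", False, False, False),  # edge-bound
]
for t in tasks:
    print(f"{t.name}: {placement(t)}")
```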
Most of the capabilities described above fit none of these three cases. Therefore, to build scalable video conferencing systems that make these functions available to all participants, the AI capabilities must be distributed downstream, equipping the individual nodes with adequate AI compute capacity.
This will result in multiple benefits, including:
- Sustainability – The environmental impact of cloud-based AI processing should not be underestimated, given the significant energy consumption and emissions generated in the process. Researchers at Carnegie Mellon University and Hugging Face measured the carbon footprint of different machine learning tasks and found that AI tasks involving the generation of new content, such as text generation, summarisation, image captioning and image generation, stand out as the most energy intensive. The most energy-intensive models, such as Stability AI's Stable Diffusion XL, produce nearly 1,600g of CO2 per session: at roughly 400g of CO2 per mile for a typical gas-powered car, that is about the same environmental impact as driving four miles. Edge devices offer a more sustainable option for generative AI, consuming less power, minimising cooling requirements and reducing the carbon footprint, thereby contributing to a greener and more eco-friendly approach to AI conferencing.
- Latency – In virtual conferences, instantaneous results are imperative for smooth interactions, whether for real-time translation, content creation or video adjustment. Running generative AI on edge devices reduces latency, ensuring fluid discussion and a seamless user experience without delays.
- Connectivity – Virtual conferences are often affected by bandwidth shortages, especially when participants have limited internet connectivity, such as while travelling or in remote locations. Edge-based generative AI can crop out irrelevant information locally, ensuring that only relevant and important data is transmitted and enabling uninterrupted and productive meetings.
- Cost – The expense of monthly subscriptions to cloud-based generative AI tools can be daunting for many organisations. With a multitude of tools catering to various user needs, such as chat, search and image/video creation, costs can quickly add up to hundreds of dollars per user per month, straining budgets further. By migrating generative AI to users' personal computers or to the conferencing device, users become owners of the tools without monthly subscriptions or long-term commitments, presenting a more financially viable solution.
Enhancing devices with built-in AI processing
Designing a video conferencing system that processes AI directly on edge devices requires closed-loop systems that can handle parts of what is currently done in the cloud. Processing AI on endpoint devices, such as laptops, conference room devices and cameras, ensures meetings run smoothly and at an affordable cost, while keeping AI-generated content like auto-summaries or dynamic presentations more secure.
Hailo offers AI processors purpose-designed to handle AI models like the ones described above, in an energy-efficient and cost-effective manner suitable for a variety of edge devices. Today, the company is working with conferencing manufacturers to integrate AI processors into their hardware.
Soon, AV integrators and designers will be able to access video conferencing systems ready for the GenAI era, providing the benefits of GenAI alongside the performance, reliability and security advantages of edge processing: the best of both worlds, in a design meant to take collaboration to the next level.
This article was written by Hailo chief technology officer Avi Baum. Hailo is an AI-focused, Israel-based chipmaker that has developed a specialised AI processor for enabling data center-class performance on edge devices.