At Google I/O 2024, Google introduced a groundbreaking feature for its Gemini AI chatbot: Gemini Live. This feature lets users hold “in-depth” voice conversations with Gemini directly on their smartphones. Notably, users can interrupt Gemini mid-answer to ask clarifying questions, and the chatbot adapts to their speech patterns in real time. Gemini Live can also interpret and react to the user’s surroundings via photos or video captured by the smartphone camera.
In essence, Gemini Live integrates elements of the Google Lens computer vision platform and the Google Assistant virtual assistant, taking them to new heights. At first glance, Gemini Live may seem like a marginal improvement over existing technologies. However, Google asserts that the system utilizes new generative AI techniques to offer superior, less error-prone image analysis, combined with an enhanced speech engine for more consistent, emotionally expressive, and realistic turn-by-turn dialogue.
The technical innovation driving Gemini Live is partly attributed to Project Astra, Google DeepMind’s initiative to build AI-powered applications and “agents” capable of real-time “understanding” of multiple data sources, including text, audio, and images. Demis Hassabis, CEO of Google DeepMind, explained during a briefing, “We have always wanted to create a universal agent that is useful in everyday life. Imagine agents who can see and hear what we’re doing, better understand the context we’re in, and respond quickly in conversation, making the pace and quality of interaction much more natural.”
Real-Time Understanding and Interaction
Set to launch later this year, Gemini Live will be able to answer questions about objects in the user’s immediate environment or recently captured by the smartphone camera. For instance, users can ask about the neighborhood they are in, identify a broken bicycle part, or request an explanation of a piece of computer code. And if a user misplaces their glasses, Gemini Live can recall where the camera last saw them, a trick that works just as well for the elusive TV remote control.
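Developers can already approximate this kind of image-grounded Q&A through Google’s public Gemini API. The following is a minimal sketch using the google-generativeai Python SDK; the file name, prompt, and API key placeholder are illustrative, and the consumer Gemini Live feature itself is not exposed through this interface.

```python
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder, not a real key
model = genai.GenerativeModel("gemini-1.5-pro")

# Ask a question grounded in a photo, Lens-style.
photo = PIL.Image.open("bike_gears.jpg")  # illustrative file name
response = model.generate_content(
    [photo, "Which part of this bicycle looks broken, and how would I fix it?"]
)
print(response.text)
```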
Beyond its practical applications, Gemini Live is designed to function as a virtual mentor. It can assist users in rehearsing speeches for events, brainstorming ideas, and more. The system can suggest which skills to highlight in an upcoming interview or internship and offer tips on public speaking.
The impressive ability of Gemini Live to “remember” recent interactions comes from the architecture of its underlying model, Gemini 1.5 Pro, working alongside other task-specific generative models. Gemini 1.5 Pro features a large context window, allowing it to ingest and reason over substantial amounts of data, approximately an hour of video, before generating a response. Google has said that Gemini Live will retain memories of interactions from the past several hours.
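The same long-context capability can be exercised through the public API, where an entire video fits into a single prompt. Below is a hedged sketch using the google-generativeai SDK’s File API; the video file and question are hypothetical, and upload processing times vary with footage length.

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder, not a real key

# Upload a long video via the File API; Gemini 1.5 Pro's context window
# is large enough to hold roughly an hour of footage in one prompt.
video = genai.upload_file("room_walkthrough.mp4")  # hypothetical recording
while video.state.name == "PROCESSING":  # poll until frames are extracted
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([video, "Where did I last put my glasses?"])
print(response.text)
```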
Gemini Live’s capabilities are reminiscent of the generative AI features in Meta’s Ray-Ban smart glasses, which can also interpret images captured by an onboard camera in near real time. And judging by the demo videos shown during the presentation, Gemini Live bears a strong resemblance to OpenAI’s recently updated, GPT-4o-powered ChatGPT.
Exclusive Access and Future Prospects
A key distinction between the new ChatGPT and Gemini Live lies in accessibility. Unlike the free version of ChatGPT, Gemini Live will be exclusive to Gemini Advanced, a premium version available to subscribers of the $20 per month Google One AI Premium Plan.
In a potential nod to the Meta glasses, one of Google’s demo videos featured an individual wearing AR glasses equipped with an app resembling Gemini Live. However, aiming to avoid previous missteps in the smart glasses market, Google refrained from confirming whether such a generative AI product would be available in the near future, notes NIX Solutions.
We’ll keep you updated with the latest developments as Google continues to innovate and refine its AI technologies. The introduction of Gemini Live marks a significant step forward in the evolution of AI-powered interactions, promising to make everyday life more intuitive and connected.