Google DeepMind has introduced two AI models designed to enhance robotic capabilities in real-world scenarios. Gemini Robotics is a vision-language-action model that can generalize to new situations without task-specific training, while Gemini Robotics-ER (short for "embodied reasoning") is an advanced model capable of interpreting complex environments and controlling robotic movements.
Built on Google’s flagship AI model, Gemini 2.0, Gemini Robotics integrates physical actions with multimodal understanding. According to Carolina Parada, head of robotics at Google DeepMind, this development allows robots to engage with the world more effectively by incorporating physical action as a core function.
Key Features and Safety Measures
Google DeepMind highlights three essential attributes of Gemini Robotics: versatility, interactivity, and dexterity. The model can generalize to new environments, interact more effectively with people and objects, and perform precise physical tasks like folding paper or opening bottles.
Parada emphasized that this model represents a major leap in robotic development. “While we’ve made progress in each of these areas individually in the past, we’re now delivering dramatically increased performance in all three areas with a single model,” she stated. This results in robots that are more capable, responsive, and adaptable.
Gemini Robotics-ER is specifically designed for roboticists, enabling them to integrate it with existing low-level controllers. Parada illustrated its functionality using a lunchbox-packing scenario, where the model determines item locations, opens containers, and organizes objects accordingly.
Safety remains a priority in AI-driven robotics. Google DeepMind researcher Vikas Sindhwani explained that the company employs a “layered approach” in which Gemini Robotics-ER assesses whether an action is safe before execution. To further AI safety research, Google DeepMind has developed benchmarks and frameworks, including the “Robot Constitution,” a set of rules inspired by Isaac Asimov’s 1942 short story “Runaround.”
Future Developments and Collaborations
Google DeepMind is actively collaborating with Apptronik to create next-generation humanoid robots. Additionally, the Gemini Robotics-ER model is being tested by trusted partners such as Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools.
Parada reiterated the company’s commitment to advancing AI-driven robotics: “We’re completely focused on building intelligence that can understand the physical world and act in that physical world. We’re really excited about using that in multiple incarnations and applications.”
In September 2024, Google DeepMind researchers demonstrated a learning method that enabled robots to perform intricate tasks such as tying shoelaces, hanging shirts, and even repairing other robots. These advancements signal continued progress in AI-powered robotics, and we’ll keep you updated on further developments.