Gemini Robotics: The generation of physical agents

Published on 27 September 2025

Artificial intelligence is taking a decisive new step with the arrival of Gemini Robotics, a family of models designed not only to understand the world, but also to act in it autonomously. While classic language models remain confined to text-based exchanges, Gemini Robotics paves the way for a new era, that of physical AI agents capable of perceiving, reasoning, planning, and executing tasks in the real world. This breakthrough, driven by Google DeepMind and Google AI, marks a major turning point toward robots that are smarter, more adaptable… and above all, more useful in everyday life.

From language to action: starting from a voice request and the location (San Francisco), Gemini Robotics-ER 1.5 plans and controls the robot to correctly sort compost, recycling, and trash. Source [1]

Two brains for a smart robot

At the core of this revolution are two complementary models: Gemini Robotics-ER 1.5 and Gemini Robotics 1.5. The first, Gemini Robotics-ER 1.5, serves as the “strategic brain.” It is an embodied reasoning model specialized in spatial understanding, long-horizon planning, and calling external tools (such as a Google search to look up local waste-sorting rules). It understands natural-language instructions, breaks a complex task into simple steps, and orchestrates the entire process.

The second, Gemini Robotics 1.5, is a vision–language–action (VLA) model. It receives detailed instructions from the strategic brain and translates them into precise motor commands for the robot. This model doesn’t just execute; it also “thinks” before acting, generating an internal chain of reasoning that lets it adapt its movements to the situation. For example, when given the instruction “sort the laundry by color,” it understands not only what a color is, but also how to handle a red sweater gently and place it in the correct basket.
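To make the division of labor concrete, here is a minimal Python sketch of the two-brain loop. It is only an illustration under stated assumptions: the class names (`StrategicBrain`, `ActionModel`), the `fake_search` tool, and the hard-coded plan are all hypothetical stand-ins, not the actual Gemini Robotics interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MotorCommand:
    """Hypothetical record of what the action model decided for one step."""
    step: str
    reasoning: str
    action: str


class StrategicBrain:
    """Stand-in for the embodied reasoning model: plans and calls tools."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]) -> None:
        self.tools = tools

    def plan(self, instruction: str, location: str) -> List[str]:
        # Consult an external tool before planning (here: a fake search).
        # The instruction itself is not parsed in this sketch.
        rules = self.tools["search"](f"waste sorting rules in {location}")
        bins = [name.strip() for name in rules.split(",")]
        # Break the request into one simple step per bin.
        return [f"move {name} items to the {name} bin" for name in bins]


class ActionModel:
    """Stand-in for the VLA model: turns one step into a motor command."""

    def execute(self, step: str) -> MotorCommand:
        # "Think" before acting: record a short reasoning trace first.
        reasoning = f"To '{step}', locate the items, then grasp gently."
        return MotorCommand(step=step, reasoning=reasoning, action="pick_and_place")


def fake_search(query: str) -> str:
    """Hypothetical tool; a real system would query a search service."""
    return "compost, recycling, trash"


def run_task(instruction: str, location: str) -> List[MotorCommand]:
    """Plan with the strategic brain, then execute each step in order."""
    brain = StrategicBrain(tools={"search": fake_search})
    body = ActionModel()
    return [body.execute(step) for step in brain.plan(instruction, location)]


if __name__ == "__main__":
    for command in run_task("sort the waste on the table", "San Francisco"):
        print(command.step, "->", command.action)
```

The point of the split is that the planner never touches motor commands and the action model never sees the whole task, only one step at a time.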

Spatial understanding and learning through embodiment

What sets Gemini Robotics apart from earlier approaches is its ability to combine fine-grained perception, temporal reasoning, and robust execution. Thanks to state-of-the-art spatial understanding, Gemini Robotics-ER 1.5 can precisely locate objects in an image (as normalized coordinates), identify their state (open/closed, full/empty), or even describe a sequence of actions in a video with exact temporal segmentation. These capabilities are essential for enabling the robot to understand not only what is there, but also what has happened and what it should do next.
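As a rough illustration of what “normalized coordinates” means in practice, the sketch below converts such points back into pixel positions. It assumes the model answers with a JSON list of `{"point": [y, x], "label": ...}` entries scaled to a 0–1000 range; that response shape, the `SCALE` constant, and the `to_pixels` helper are assumptions made for this example and should be checked against the API documentation.

```python
import json
from typing import List, Tuple

SCALE = 1000  # assumed normalization range (0..1000)


def to_pixels(response_text: str, width: int, height: int) -> List[Tuple[str, int, int]]:
    """Convert normalized [y, x] points into (label, x_px, y_px) tuples."""
    results = []
    for item in json.loads(response_text):
        y_norm, x_norm = item["point"]  # assumed order: y first, then x
        x_px = round(x_norm * width / SCALE)
        y_px = round(y_norm * height / SCALE)
        results.append((item["label"], x_px, y_px))
    return results


# Hand-written sample response, not real model output.
sample = '[{"point": [500, 250], "label": "mug"}, {"point": [100, 900], "label": "banana"}]'
print(to_pixels(sample, width=640, height=480))
# → [('mug', 160, 240), ('banana', 576, 48)]
```

Normalized coordinates keep the answer independent of image resolution, so the same response can be mapped onto a thumbnail or a full camera frame.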

Another major innovation: Gemini Robotics 1.5 learns across embodiments. It can transfer skills acquired on one type of robot (for example, the bimanual ALOHA 2 platform) to a completely different robot (such as Apptronik’s Apollo humanoid) without any task-specific retraining [1]. This is often described as cross-embodiment learning. Such generalization speeds up the development of new robotic behaviors and paves the way for more versatile systems.

Toward robots that are useful, responsible, and accessible

Google is already making Gemini Robotics-ER 1.5 available to developers in preview through Google AI Studio and the Gemini API. This allows the community to start experimenting with this high-level “brain” to create physical agents capable of handling everyday tasks: tidying a table, sorting waste, making coffee, and so on. These scenarios look simple, but they require a subtle blend of perception, contextual reasoning, and motor coordination, which is exactly what Gemini Robotics is designed to provide.

Of course, bringing AI agents into the physical world raises safety questions. Google emphasizes a “layered” approach, combining software filters, semantic risk reasoning, and hardware safety systems (such as emergency stops). The model has also been evaluated on an improved version of the ASIMOV benchmark, which is dedicated to semantic safety in robotics.
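The layered idea can be sketched as a chain of independent checks, where any single layer can veto an action. This is a toy example, not Google’s implementation: the keyword list, the force limit, and the rule inside `semantic_risk` are hypothetical placeholders for what would really be a content filter, model-based risk reasoning, and hardware interlocks.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    description: str
    force_newtons: float


# Each layer returns a reason string when it blocks the action, else None.
Layer = Callable[[Action], Optional[str]]

BLOCKED_WORDS = {"knife", "flame"}  # hypothetical filter list
MAX_FORCE_NEWTONS = 20.0  # hypothetical hardware limit


def software_filter(action: Action) -> Optional[str]:
    """Layer 1: cheap keyword filter on the action description."""
    hits = set(action.description.lower().split()) & BLOCKED_WORDS
    return f"filtered keyword: {sorted(hits)[0]}" if hits else None


def semantic_risk(action: Action) -> Optional[str]:
    """Layer 2: placeholder for model-based reasoning about context."""
    if "over a person" in action.description.lower():
        return "semantic risk: object carried over a person"
    return None


def hardware_limit(action: Action) -> Optional[str]:
    """Layer 3: stand-in for a hardware safety system such as an e-stop."""
    if action.force_newtons > MAX_FORCE_NEWTONS:
        return "hardware limit: force too high, emergency stop"
    return None


LAYERS: List[Layer] = [software_filter, semantic_risk, hardware_limit]


def check(action: Action) -> Optional[str]:
    """Return the first blocking reason, or None if every layer passes."""
    for layer in LAYERS:
        reason = layer(action)
        if reason is not None:
            return reason
    return None


if __name__ == "__main__":
    print(check(Action("place cup on shelf", 5.0)))
    print(check(Action("hand knife to user", 5.0)))
    print(check(Action("push heavy box", 50.0)))
```

Because the layers are independent, a failure in one (say, a filter that misses a phrasing) can still be caught by another further down the chain.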

Comparison of models by generality (x-axis) and embodied reasoning (y-axis). Blue: Gemini models; white: GPT models. Gemini Robotics-ER 1.5 (Thinking On) achieves the best score in embodied reasoning, while GPT-5 has the greatest generality. Source [2]

The era of physical AI agents

Gemini Robotics doesn’t just improve robot performance; it redefines the role of robots. Instead of simple, preprogrammed executors, we are entering the era of autonomous physical AI agents capable of understanding complex human intentions and responding adaptively. In the longer term, this technology could transform entire sectors (logistics, in-home assistance, industrial maintenance) while bringing AI closer to our tangible, everyday lives.

[1] Gemini Robotics 1.5 brings AI agents into the physical world

[2] Building the Next Generation of Physical Agents with Gemini Robotics-ER 1.5

Gemini Robotics Team, Abeyruwan, S., Ainslie, J., Alayrac, J.-B., Gonzalez Arenas, M., Armstrong, T., Balakrishna, A., Baruch, R., Bauza, M., Blokzijl, M., Bohez, S., Bousmalis, K., Brohan, A., Buschmann, T., Byravan, A., Cabi, S., Caluwaerts, K., Casarini, F., Chang, O., … Zhou, Y. (2025). Gemini Robotics: Bringing AI into the Physical World. arXiv.

Franck da COSTA

I am a software engineer who enjoys turning the complexity of AI and algorithms into accessible knowledge. Curious about every new research advance, I share my analyses, projects, and ideas here. I would also be delighted to collaborate on innovative projects with others who share the same passion.
