Google Gemini Embedding 2 for Multimodal Semantic Search

Google launches Gemini Embedding 2: record-breaking performance on benchmarks, support for 5 modalities, and native integration into a single embedding space.

Franck da COSTA

Published on 12 March 2026

Google has launched Gemini Embedding 2, its first natively multimodal embedding model. The model maps text, images, video, audio, and documents into a single vector space, which changes how applications can index and retrieve information across media types. For developers and companies building on AI, this opens new options for semantic search and data analysis.

Announcing Gemini Embedding 2 ✨ the first fully multimodal embedding model built on the Gemini architecture. Now available in preview via the Gemini API and Vertex AI.

The new model provides semantic understanding across 100+ languages — and support for modalities across text,… pic.twitter.com/gAlVQ5fPaV
— Google for Developers (@googledevs) March 10, 2026

Understanding embeddings and their role in modern AI

Before exploring the specificities of Gemini Embedding 2, it is worth clarifying the concept of embedding. In the field of machine learning, an embedding is a numerical representation that transforms complex data such as text, images, or sound into mathematical vectors. These vectors capture the semantic meaning of the content and allow algorithms to compare, classify, and search for information efficiently.
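To make the idea of comparing vectors concrete, here is a minimal sketch of cosine similarity, a common way to score how close two embeddings are. The three-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions, and the article does not say which similarity measure Google recommends.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors, made up for this example (not produced by any model).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Semantically close items should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```

A score close to 1 means the two vectors point in nearly the same direction, which is how "semantic closeness" is usually read in an embedding space.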

Traditionally, embedding models focused on a single modality: one model processed only text, another only images. This fragmented approach required complex pipelines to handle several data types together. Gemini Embedding 2 removes this limitation with a unified architecture that maps text, images, video, audio, and PDF documents into a common embedding space. This convergence significantly simplifies technical architectures while improving the precision of contextual understanding.

Major new features of Gemini Embedding 2

Gemini Embedding 2 introduces several new capabilities. Built on Google’s Gemini architecture, it inherits the multimodal understanding capabilities developed by Google DeepMind. The model supports five modalities, each with its own input limit: up to 8192 tokens of text, up to 6 images per request in PNG or JPEG format, up to 120 seconds of video in MP4 or MOV format, audio processed natively without intermediate transcription, and PDF documents of up to 6 pages.
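The limits listed above can be checked before a request is sent. The sketch below is a hypothetical client-side helper, not part of any Google SDK: the function name and parameters are assumptions, the numbers are the ones quoted in this article for the preview release, and the token count is assumed to be computed elsewhere.

```python
# Input limits as quoted in the article for the preview release; they may change.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_PDF_PAGES = 6

def limit_violations(text_tokens=0, image_count=0, video_seconds=0.0, pdf_pages=0):
    """Return a list of problems; an empty list means the request fits the limits.

    Hypothetical helper for illustration only, not an SDK function.
    """
    problems = []
    if text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text: {text_tokens} tokens exceeds {MAX_TEXT_TOKENS}")
    if image_count > MAX_IMAGES:
        problems.append(f"images: {image_count} exceeds {MAX_IMAGES}")
    if video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video: {video_seconds} s exceeds {MAX_VIDEO_SECONDS} s")
    if pdf_pages > MAX_PDF_PAGES:
        problems.append(f"pdf: {pdf_pages} pages exceeds {MAX_PDF_PAGES}")
    return problems

print(limit_violations(text_tokens=1200, image_count=2))  # []
print(limit_violations(image_count=7, video_seconds=150))
```

Content that exceeds a limit, such as a long video or a large PDF, would need to be split into chunks and embedded piece by piece.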

The most significant innovation lies in the model’s ability to process interleaved inputs. For example, Gemini Embedding 2 can embed an image together with its textual description in a single request, capturing the relationships between the two media types. This approach better reflects how humans perceive information, where text and visuals enrich each other.

The model also uses Matryoshka Representation Learning (MRL), a technique that concentrates the most important information in the first dimensions of the vector so that the output size can be reduced without retraining. The default output has 3072 dimensions; developers can reduce it to 1536 or 768 dimensions to balance retrieval quality against storage costs.
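The trade-off between dimensions and storage can be illustrated with a short sketch. It assumes that a smaller embedding is obtained by keeping the first dimensions of the full vector and rescaling to unit length, which is the usual practice with Matryoshka-style embeddings; whether the API already returns normalized reduced vectors is something to check in Google's documentation. The storage figures assume 4-byte floats, and both function names are hypothetical.

```python
import math

def truncate_embedding(vector, dims):
    """Keep the first `dims` values and rescale the result to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

def storage_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for the vectors alone, ignoring index overhead."""
    return num_vectors * dims * bytes_per_value

# Stand-in for a 3072-dimensional embedding (values are made up).
full = [1.0 / (i + 1) for i in range(3072)]
small = truncate_embedding(full, 768)
print(len(small))  # 768

# One million vectors stored as 32-bit floats:
print(storage_bytes(1_000_000, 3072))  # 12288000000
print(storage_bytes(1_000_000, 768))   # 3072000000
```

At one million items, moving from 3072 to 768 dimensions cuts raw vector storage from roughly 12.3 GB to roughly 3.1 GB, at the cost of some retrieval quality.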

Record performance on benchmarks

Google states that Gemini Embedding 2 sets a new performance standard in the multimodal domain. Its published benchmarks [1] show the model outperforming competing solutions on text search, image analysis, and video understanding tasks. On these vendor-published results, the improvement in semantic search accuracy is substantial rather than marginal.

Gemini Embedding 2 performance on benchmarks. Source [1]

The model’s audio capabilities deserve special attention. Unlike traditional approaches that require prior textual transcription, Gemini Embedding 2 processes audio data directly, thus preserving tonal and contextual nuances often lost during text conversion. This feature opens up promising applications in multimedia content analysis, recommendation systems, and automatic classification of audio-visual libraries.

Practical implementation of Gemini Embedding 2

For developers wishing to integrate this technology, Google offers two main access points: the Gemini API and Vertex AI. The model is in public preview, which already allows experimentation and prototyping. Interactive Colab notebooks [2] [3] make it easier to get started, while integrations with frameworks and vector databases such as LangChain, LlamaIndex, Haystack, and Weaviate simplify adoption in existing projects.

Practical use cases cover a wide spectrum. Retrieval-augmented generation particularly benefits from multimodality, allowing language model responses to be enriched with relevant visual contexts. Semantic search engines can now process complex queries mixing several media types, while sentiment analysis systems gain precision by combining text with visual or audio elements.
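The retrieval step behind these use cases can be sketched as a nearest-neighbour search over one index that mixes media types. The example below does not call the Gemini API: the vectors are hand-written stand-ins for embeddings that would be computed beforehand, and the item names, the `top_k` helper, and the index layout are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vector, index, k=2):
    """Return (id, modality) for the k items most similar to the query."""
    scored = [(cosine(query_vector, item["vector"]), item) for item in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(item["id"], item["modality"]) for _, item in scored[:k]]

# Because all modalities share one embedding space, a single index can hold
# them side by side. Vectors are invented for illustration.
index = [
    {"id": "user-manual", "modality": "pdf", "vector": [0.1, 0.9, 0.1]},
    {"id": "product-demo", "modality": "video", "vector": [0.8, 0.2, 0.1]},
    {"id": "product-photo", "modality": "image", "vector": [0.9, 0.1, 0.2]},
    {"id": "support-call", "modality": "audio", "vector": [0.0, 0.2, 0.9]},
]

# Stand-in for the embedding of a text query.
query_vector = [1.0, 0.1, 0.1]
print(top_k(query_vector, index))
```

In a retrieval-augmented generation pipeline, the items returned by such a search would then be passed to the language model as context. At larger scale, a vector database would replace the linear scan used here.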

Towards truly multimodal AI

Gemini Embedding 2 marks a significant step towards AI systems that understand the world across all its modalities. By unifying text, image, video, audio, and documents in a single semantic space, Google offers developers a powerful tool for building smarter, more context-aware applications. This technical convergence simplifies architectures while opening new creative ways to use the diversity of data that surrounds us.

[1] Gemini Embedding 2: Our first natively multimodal embedding model

[2] Gemini Embedding 2 notebook (Gemini API)

[3] Gemini Embedding 2 notebook (Vertex AI)
