What is a diffusion model?
Diffusion models are today one of the main engines of the most popular generative AI, in particular for creating images from text. This approach has pushed tools such as Stable Diffusion, DALL·E and Midjourney to the front of the stage, because it produces detailed, realistic visuals while remaining relatively stable during training compared to other families of models. Understanding what a diffusion model is helps to understand how these systems ‘think’ about content and why they dominate modern image generation.
Diffusion model: the simple definition
A diffusion model is a generative model that learns to manipulate noise. First, it gradually corrupts a clean piece of data, such as an image, by covering it with noise until it becomes unrecognizable. Then it learns the inverse operation: removing the noise step by step to reconstruct the original information. You can picture it as a drop of ink spreading in water: the model observes the diffusion, then learns to play the film backwards.
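To make the forward, noising direction concrete, here is a minimal NumPy sketch. The linear blend schedule and the step count are arbitrary illustrative choices, not those of any particular model:

```python
import numpy as np

def forward_noising(image, num_steps=10, rng=None):
    """Gradually mix a clean image with Gaussian noise.

    `image` is a float array scaled to [0, 1]; the linear blend below
    is a toy schedule chosen only for illustration.
    """
    rng = rng or np.random.default_rng(0)
    noisy_versions = []
    for t in range(1, num_steps + 1):
        alpha = 1.0 - t / num_steps          # share of the original image that remains
        noise = rng.standard_normal(image.shape)
        noisy = alpha * image + (1.0 - alpha) * noise
        noisy_versions.append(noisy)
    return noisy_versions                    # the last element is almost pure noise

# Example: a dummy 8x8 grayscale "image"
steps = forward_noising(np.ones((8, 8)) * 0.5)
```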
Once this mechanism has been learned, the model becomes able to create new data from scratch. We start from a simple cloud of random noise, and the model refines it gradually, step by step, until a coherent image emerges. This progressive process makes generation more stable and more realistic than if the model tried to produce the final result in a single shot.
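Generation walks the same ladder in the opposite direction. The sketch below is purely schematic: it assumes a hypothetical `denoiser(x, t)` network that estimates the noise in its input, and uses a simplistic update rule rather than a real sampler such as DDPM or DDIM:

```python
import numpy as np

def generate(denoiser, shape, num_steps=10, rng=None):
    """Start from pure Gaussian noise and refine it step by step.

    `denoiser(x, t)` is an assumed placeholder that returns an estimate
    of the noise present in `x` at step `t`; real samplers use more
    careful update rules than this simple subtraction.
    """
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)             # the initial cloud of noise
    for t in reversed(range(1, num_steps + 1)):
        predicted_noise = denoiser(x, t)
        x = x - predicted_noise / num_steps    # remove a small slice of noise
    return x                                   # ideally, a coherent sample
```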
How does a diffusion model work?
During training, the model learns two things: the forward path (turning a clean image into noise) and, above all, the way back (removing the noise in small steps). Technically, these steps often follow a Markov chain and are optimized with probabilistic tools such as variational inference: the model is adjusted so that it can accurately reconstruct the original data from increasingly noisy versions of it. Once trained, the diffusion model starts from Gaussian noise and applies the reverse sequence to generate a credible result.
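In many implementations this objective boils down to predicting the added noise and minimizing a mean-squared error, in the spirit of DDPM. The sketch below assumes a hypothetical PyTorch network `model(noisy_x, t)` that outputs a noise estimate, and uses a simple linear beta schedule:

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(model, x0, num_steps=1000):
    """One DDPM-style training step: predict the noise added to x0.

    `model(noisy_x, t)` is a placeholder network returning a tensor
    shaped like `noisy_x`; the linear beta schedule is a simplification.
    """
    batch = x0.shape[0]
    betas = torch.linspace(1e-4, 0.02, num_steps)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    t = torch.randint(0, num_steps, (batch,))             # random timestep per sample
    a_bar = alphas_cumprod[t].view(batch, *([1] * (x0.dim() - 1)))

    noise = torch.randn_like(x0)                          # the noise we will try to recover
    noisy_x = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

    predicted_noise = model(noisy_x, t)
    return F.mse_loss(predicted_noise, noise)             # the loss to backpropagate
```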
In practice, a text encoder (which interprets your prompt) is frequently combined with a diffusion model operating in a latent space, which is smaller and easier to handle than raw pixels. This ‘latent’ formulation makes generation faster and cheaper to compute, while retaining fine detail.
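As a concrete illustration, this is roughly what text-to-image generation looks like with the Hugging Face diffusers library, assuming it is installed, a GPU is available, and the referenced Stable Diffusion checkpoint can be downloaded (the model name and parameters are illustrative choices):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a latent diffusion pipeline: text encoder + denoising U-Net + VAE decoder.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA GPU

# The prompt is encoded, denoising runs in latent space, then the result is decoded to pixels.
image = pipe("a watercolor fox in a snowy forest", num_inference_steps=30).images[0]
image.save("fox.png")
```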

Diffusion model vs. multimodal model
A multimodal model is a system capable of understanding and/or producing several types of data, such as text, images, audio or video. It is not a particular method but rather a capability: linking different forms of information within a single model.
A diffusion model, by contrast, is defined by a specific process for generating content. It works by starting from a cloud of random noise and refining it gradually until a coherent output (often an image) is obtained. In other words, it is a model that uses step-by-step denoising as its way of creating.
The two ideas can be combined: for example, when a system takes text as input and produces an image, it is multimodal (because it connects two different modalities), and it can use a diffusion model as the engine that builds the final image. In summary, multimodality describes the types of data handled, while diffusion describes the process used to generate them.
