Qwen3 ASR: Open Source Multilingual Speech Recognition

Qwen3 ASR, a new family of automatic speech recognition models

Published on 9 February 2026

At the end of January 2026, the Qwen team at Alibaba Cloud unveiled Qwen3 ASR, a new family of automatic speech recognition models. Released as open source under the Apache 2.0 license, the family aims to rival the most powerful commercial APIs while supporting 52 languages and dialects. The announcement consolidates Qwen’s position as the most downloaded open source AI model family in the world, with over 700 million cumulative downloads on Hugging Face.


Available Qwen3 ASR Model Types

The Qwen3 ASR family comprises three models, each addressing specific use cases: two recognition models and a forced aligner. The flagship, Qwen3 ASR 1.7B, has 1.7 billion parameters and aims for maximum accuracy. For applications that need a balance between accuracy and efficiency, Qwen3 ASR 0.6B offers a lighter alternative with 600 million parameters. This compact version reaches an average latency of only 92 milliseconds and, at a concurrency of 128 simultaneous requests, can transcribe 2000 seconds of audio per second of processing time.
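To put the throughput figure in perspective, here is a minimal arithmetic sketch. It uses only the two numbers quoted above and assumes the load is shared evenly across the 128 requests, which is a simplification; the function names are illustrative.

```python
# Figures quoted in the article for Qwen3 ASR 0.6B.
AUDIO_SECONDS_PER_WALL_SECOND = 2000  # aggregate throughput across all requests
CONCURRENCY = 128                     # simultaneous requests


def per_request_speedup(aggregate: float, concurrency: int) -> float:
    """Real-time speedup seen by one request, assuming load is shared evenly."""
    return aggregate / concurrency


def wall_seconds_for(audio_hours: float, aggregate: float) -> float:
    """Wall-clock seconds needed to transcribe a backlog at the aggregate rate."""
    return audio_hours * 3600 / aggregate


print(per_request_speedup(AUDIO_SECONDS_PER_WALL_SECOND, CONCURRENCY))  # 15.625
print(wall_seconds_for(10, AUDIO_SECONDS_PER_WALL_SECOND))              # 18.0
```

Under that assumption, each request is processed at roughly 15.6 times real time, and a 10-hour backlog would take about 18 seconds when the server is kept fully loaded.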

The Qwen3 ASR models are built on Qwen3-Omni, a foundation model with strong audio comprehension capabilities. The speech signal is first processed by an AuT encoder. AuT is a speech recognition model based on an attention-encoder-decoder architecture; it compresses audio features by a factor of 8, producing representations at a rate of 12.5 Hz.
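The relationship between the compression factor and the output rate can be checked with a short calculation. The sketch below assumes nothing beyond the two figures in the text; the 100 Hz input feature rate is simply what those figures imply (12.5 Hz × 8), and the frame count ignores any padding the real encoder may apply.

```python
import math

ENCODER_RATE_HZ = 12.5  # output rate stated in the article
DOWNSAMPLING = 8        # compression factor stated in the article

# Implied input feature rate: 12.5 Hz * 8 = 100 Hz, i.e. one frame every 10 ms.
INPUT_RATE_HZ = ENCODER_RATE_HZ * DOWNSAMPLING


def encoder_frames(duration_s: float) -> int:
    """Approximate number of encoder output frames for a clip of this length."""
    return math.ceil(duration_s * ENCODER_RATE_HZ)


print(INPUT_RATE_HZ)       # 100.0
print(encoder_frames(60))  # 750
```

One minute of speech therefore reaches the language model as about 750 audio representations, which helps keep long recordings within a manageable sequence length.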

This design incorporates a dynamic attention window adjustable between 1 and 8 seconds, which lets Qwen3 ASR operate both in streaming mode on short segments and in offline mode on long audio inputs. The flagship Qwen3 ASR 1.7B model combines the Qwen3-1.7B language model with a 300-million-parameter AuT encoder that has a hidden dimension of 1024; the two are connected by a projection module. [1]
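One way to picture a windowed attention pattern is a block-wise mask in which each encoder frame only attends to frames in the same window. The sketch below is purely illustrative: the block-wise scheme and the helper names `window_frames` and `block_mask` are assumptions made for this example, not the AuT implementation.

```python
ENCODER_RATE_HZ = 12.5  # encoder output rate stated in the article


def window_frames(window_s: float) -> int:
    """Encoder frames covered by one attention window of the given length."""
    if not 1 <= window_s <= 8:
        raise ValueError("the article describes windows between 1 and 8 seconds")
    return int(window_s * ENCODER_RATE_HZ)


def block_mask(n_frames: int, block: int) -> list[list[bool]]:
    """mask[i][j] is True when frame i may attend to frame j (same block only)."""
    return [[i // block == j // block for j in range(n_frames)]
            for i in range(n_frames)]


print(window_frames(2), window_frames(8))  # 25 100

# Tiny example: 6 frames split into blocks of 3.
for row in block_mask(6, 3):
    print("".join("x" if allowed else "." for allowed in row))
```

A short window bounds how much audio must be available before a frame can be encoded, which suits streaming, while a long window gives each frame more context for offline transcription.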

The Qwen3 ASR family also includes Qwen3 ForcedAligner 0.6B, a model dedicated to forced alignment. This non-autoregressive model predicts word-level or character-level timestamps in 11 languages, which supports applications such as subtitling and audio-text synchronization.
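As an illustration of the subtitling use case, the sketch below turns word-level timestamps into SRT subtitle cues. The `AlignedWord` structure and the grouping rule are hypothetical, chosen for this example; the aligner's actual output format is not described here.

```python
from dataclasses import dataclass


@dataclass
class AlignedWord:
    """Hypothetical container for one aligned word (not the aligner's real output type)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(words: list[AlignedWord], max_words: int = 4) -> str:
    """Group consecutive words into numbered cues of at most max_words each."""
    cues = []
    for n, i in enumerate(range(0, len(words), max_words), start=1):
        group = words[i:i + max_words]
        line = " ".join(w.text for w in group)
        cues.append(f"{n}\n{srt_time(group[0].start)} --> {srt_time(group[-1].end)}\n{line}\n")
    return "\n".join(cues)


words = [AlignedWord("hello", 0.0, 0.4), AlignedWord("world", 0.5, 1.0)]
print(to_srt(words))
```

Each cue starts at the first word's start time and ends at the last word's end time, so the precision of the subtitles depends directly on the precision of the predicted timestamps.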

Qwen3 ASR Performance Challenges OpenAI’s Whisper

According to the published evaluations, Qwen3 ASR 1.7B achieves state-of-the-art performance among open source models and is competitive with the most powerful commercial APIs. In particular, it surpasses OpenAI’s Whisper large v3, widely considered a reference in the field, on several benchmarks. The team attributes these results to the architecture, which relies on the Qwen3-Omni foundation model and an AuT audio encoder trained on massive volumes of speech data.

Evaluation of Qwen3 ASR 1.7B [2]
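Speech recognition benchmarks of this kind are typically scored with word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that metric using a standard edit-distance computation; it is not the evaluation code used by the Qwen team, and it skips the text normalization that real benchmarks apply.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors over 6 words.
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Lower is better, and a model that "surpasses" another on a benchmark is one that achieves a lower error rate on the same test set.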

One of the major strengths of Qwen3 ASR is its robustness in complex acoustic environments. The model maintains high recognition quality even in the presence of significant ambient noise, background music, or regional dialects. This ability to handle difficult real-world conditions distinguishes it from systems that perform well only on clean laboratory recordings.

Impressive Language Coverage

With support for 30 languages and 22 Chinese dialects, Qwen3 ASR targets a global audience. The model also handles English accents from multiple geographic regions, so recognition does not depend on the speaker’s origin. This linguistic versatility comes with automatic language identification, which lets the system detect the language and transcribe the audio without prior configuration.

30 languages and 22 Chinese dialects supported by Qwen3 ASR [2]

The all-in-one approach adopted by Qwen considerably simplifies deployment for developers. No need to juggle multiple specialized models or build complex pipelines: a single Qwen3 ASR model is sufficient to handle language identification, multilingual transcription, and even non-speech segment detection.

An Open Source Model That Opens New Perspectives

The release of Qwen3 ASR is part of Alibaba Cloud’s push into open source artificial intelligence. With nearly 400 Qwen models released and more than 180,000 derivative versions created by the community, the company competes directly with Meta and its Llama models.

Releasing the models under the Apache 2.0 license permits commercial use, which makes adoption easier for companies that want to integrate advanced speech recognition without depending on expensive proprietary services. Applications cover a broad spectrum: automatic subtitling, voice assistants, meeting transcription, call centers, and accessibility for people who are deaf or hard of hearing.

[1] Shi, X., Wang, X., Guo, Z., Wang, Y., Zhang, P., Zhang, X., Guo, Z., Hao, H., Xi, Y., Yang, B., Xu, J., Zhou, J., & Lin, J. (2026). Qwen3-ASR Technical Report. arXiv preprint arXiv:2601.21337v2. https://doi.org/10.48550/arXiv.2601.21337

[2] Qwen3-ASR & Qwen3-ForcedAligner is Now Open Sourced: Robust, Streaming and Multilingual!

Download the Qwen3 ASR family of models on Hugging Face

Demo of the Qwen3 ASR 1.7B model

Franck da COSTA

As a software engineer, I enjoy turning the complexity of AI and algorithms into accessible knowledge. Curious about every new research advance, I share my analyses, projects, and ideas here. I would also be delighted to collaborate on innovative projects with others who share the same passion.
