Choose a model for image analysis, video understanding, OCR, and more.
Image and video understanding
Start with qwen3.6-plus — strongest accuracy, 1M context, 2-hour video support, and the full feature set including function calling and built-in tools. Once your use case works well, try qwen3.5-flash to reduce cost — near-flagship quality with the same context and features.
Image resolution
Most models support up to 16M pixels per image. Higher resolution costs more tokens: each image uses h × w / (32 × 32) + 2 tokens.
Video support
- Up to 2 hours / 2GB →
qwen3.6-plus,qwen3.5-plus,qwen3.5-flash - Up to 1 hour / 2GB →
qwen3-vl-plus,qwen3-vl-flash - Up to 40 seconds / 150MB →
qwen3-omni-flash(also accepts audio — see Speech models)
Function calling + built-in tools
Let the model take actions based on what it sees in images or video.
- Function calling: Qwen3.6, Qwen3.5, and Qwen3-VL models
- Built-in tools (web search, code execution — no setup):
qwen3.6-plus,qwen3.5-plus,qwen3.5-flashonly
Structured output
Get valid JSON from visual input — e.g., extract product info from a photo.
Available on Qwen3.6, Qwen3.5, and Qwen3-VL in non-thinking mode.
OCR and document extraction
qwen-vl-ocr — specialized in documents, tables, exam questions, and handwriting. Or use qwen3.6-plus / qwen3.5-flash for general text extraction from images.
Recommended models
| Model | Context | Max pixels/image | Max video duration | Max video size | Max images | Max videos | Function calling | Built-in tools | Structured output | Batch |
|---|---|---|---|---|---|---|---|---|---|---|
qwen3.6-plus | 1M | 16M | 2h | 2GB | 256 / 250 | 64 | ✓ | ✓ | ✓ | — |
qwen3.5-flash | 1M | 16M | 2h | 2GB | 256 / 250 | 64 | ✓ | ✓ | ✓ | — |
qwen3-vl-plus | 256k | 16M | 1h | 2GB | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen3-vl-flash | 256k | 16M | 1h | 2GB | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen3-omni-flash | 64k | — | 40s | 150MB | 2,048 | 1 | ✓ | — | — | — |
All models
Qwen3.6
Qwen3.6
| Model ID | Input | Output | Context | Max Output | Max images | Max videos | Function calling | Built-in tools | Structured output | Batch | Coding Plan |
|---|---|---|---|---|---|---|---|---|---|---|---|
qwen3.6-plus | Text, image, video | Text | 1M | 64k | 256 / 250 | 64 | ✓ | ✓ | ✓ | — | — |
qwen3.6-plus-2026-04-02 | Text, image, video | Text | 1M | 64k | 256 / 250 | 64 | ✓ | ✓ | ✓ | — | — |
Qwen3.5
Qwen3.5
| Model ID | Input | Output | Context | Max Output | Max images | Max videos | Function calling | Built-in tools | Structured output | Batch | Coding Plan |
|---|---|---|---|---|---|---|---|---|---|---|---|
qwen3.5-plus | Text, image, video | Text | 1M | 64k | 256 / 250 | 64 | ✓ | ✓ | ✓ | — | ✓ |
qwen3.5-plus-2026-02-15 | Text, image, video | Text | 1M | 64k | 256 / 250 | 64 | ✓ | ✓ | ✓ | — | — |
qwen3.5-flash | Text, image, video | Text | 1M | 64k | 256 / 250 | 64 | ✓ | ✓ | ✓ | — | — |
qwen3.5-flash-2026-02-23 | Text, image, video | Text | 1M | 64k | 256 / 250 | 64 | ✓ | ✓ | ✓ | — | — |
qwen3.5-397b-a17b | Text, image, video | Text | 32k | 8k | 256 / 250 | 64 | ✓ | ✓ | ✓ | — | — |
qwen3.5-122b-a10b | Text, image, video | Text | 32k | 8k | 256 / 250 | 64 | ✓ | ✓ | ✓ | — | — |
qwen3.5-27b | Text, image, video | Text | 32k | 8k | 256 / 250 | 64 | ✓ | ✓ | ✓ | — | — |
qwen3.5-35b-a3b | Text, image, video | Text | 32k | 8k | 256 / 250 | 64 | ✓ | ✓ | ✓ | — | — |
Qwen3-VL
Qwen3-VL
| Model ID | Input | Output | Context | Max Output | Max images | Max videos | Function calling | Built-in tools | Structured output | Batch |
|---|---|---|---|---|---|---|---|---|---|---|
qwen3-vl-plus | Text, image, video | Text | 256k | 32k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen3-vl-plus-2025-12-19 | Text, image, video | Text | 256k | 32k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen3-vl-plus-2025-09-23 | Text, image, video | Text | 256k | 32k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen3-vl-flash | Text, image, video | Text | 256k | 32k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen3-vl-flash-2026-01-22 | Text, image, video | Text | 256k | 32k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen3-vl-flash-2025-10-15 | Text, image, video | Text | 256k | 32k | 256 / 250 | 64 | ✓ | — | ✓ | — |
Qwen-Omni
Qwen-Omni
Unlike other models on this page, Qwen-Omni accepts audio input and can output both text and speech.Standard
Realtime — streaming audio input with built-in Voice Activity Detection (VAD).
Captioner (open source) — audio captioning model.
| Model ID | Input | Output | Context | Max Output | Max images | Max videos | Function calling | Built-in tools | Structured output | Batch |
|---|---|---|---|---|---|---|---|---|---|---|
qwen3-omni-flash | Text, image, audio, video | Text, audio | 64k | 16k | 2,048 | 1 | ✓ | — | — | — |
qwen3-omni-flash-2025-12-01 | Text, image, audio, video | Text, audio | 64k | 16k | 2,048 | 1 | ✓ | — | — | — |
qwen3-omni-flash-2025-09-15 | Text, image, audio, video | Text, audio | 64k | 16k | 2,048 | 1 | ✓ | — | — | — |
qwen-omni-turbo | Text, image, audio, video | Text, audio | 32k | 2k | 2,048 | 1 | — | — | — | — |
qwen-omni-turbo-latest | Text, image, audio, video | Text, audio | 32k | 2k | 2,048 | 1 | — | — | — | — |
qwen-omni-turbo-2025-03-26 | Text, image, audio, video | Text, audio | 32k | 2k | 2,048 | 1 | — | — | — | — |
| Model ID | Input | Output |
|---|---|---|
qwen3-omni-flash-realtime | Text, image, audio (streaming) | Text, audio |
qwen3-omni-flash-realtime-2025-12-01 | Text, image, audio (streaming) | Text, audio |
qwen3-omni-flash-realtime-2025-09-15 | Text, image, audio (streaming) | Text, audio |
qwen-omni-turbo-realtime | Text, audio (streaming) | Text, audio |
qwen-omni-turbo-realtime-latest | Text, audio (streaming) | Text, audio |
qwen-omni-turbo-realtime-2025-05-08 | Text, audio (streaming) | Text, audio |
| Model ID | Input | Output | Context | Max Output | Max images | Max videos | Function calling | Built-in tools | Structured output | Batch |
|---|---|---|---|---|---|---|---|---|---|---|
qwen3-omni-30b-a3b-captioner | Audio | Text | 64k | 32k | — | — | — | — | — | — |
Qwen-OCR
Qwen-OCR
Specializes in extracting text from documents, tables, exam questions, and handwriting.
| Model ID | Input | Output | Context | Max Output | Max images | Max videos | Function calling | Built-in tools | Structured output | Batch |
|---|---|---|---|---|---|---|---|---|---|---|
qwen-vl-ocr | Text, image | Text | 38k | 8k | 256 / 250 | — | — | — | — | — |
qwen-vl-ocr-2025-11-20 | Text, image | Text | 38k | 8k | 256 / 250 | — | — | — | — | — |
Legacy
Legacy
Older model versions retained for backward compatibility. We recommend Qwen3.5 or Qwen3-VL for new projects.
| Model ID | Input | Output | Context | Max Output | Max images | Max videos | Function calling | Built-in tools | Structured output | Batch |
|---|---|---|---|---|---|---|---|---|---|---|
qwen3-vl-235b-a22b-thinking | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | — | — |
qwen3-vl-235b-a22b-instruct | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen3-vl-32b-thinking | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | — | — |
qwen3-vl-32b-instruct | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen3-vl-30b-a3b-thinking | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | — | — |
qwen3-vl-30b-a3b-instruct | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen3-vl-8b-thinking | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | — | — |
qwen3-vl-8b-instruct | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen2.5-vl-72b-instruct | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen2.5-vl-32b-instruct | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen2.5-vl-7b-instruct | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen2.5-vl-3b-instruct | Text, image, video | Text | 128k | 8k | 256 / 250 | 64 | ✓ | — | ✓ | — |
qwen2.5-omni-7b | Text, image, audio, video | Text, audio | 32k | 8k | 2,048 | 1 | — | — | — | — |
qwen-vl-max | Text, image | Text | 32k | 8k | 256 / 250 | — | — | — | — | — |
qwen-vl-max-latest | Text, image | Text | 128k | 8k | 256 / 250 | — | — | — | — | — |
qwen-vl-max-2025-08-13 | Text, image | Text | 128k | 8k | 256 / 250 | — | — | — | — | — |
qwen-vl-max-2025-04-08 | Text, image | Text | 128k | 8k | 256 / 250 | — | — | — | — | — |
qwen-vl-plus | Text, image | Text | 128k | 8k | 256 / 250 | — | — | — | — | — |
qwen-vl-plus-latest | Text, image | Text | 128k | 8k | 256 / 250 | — | — | — | — | — |
qwen-vl-plus-2025-08-15 | Text, image | Text | 128k | 8k | 256 / 250 | — | — | — | — | — |
qwen-vl-plus-2025-05-07 | Text, image | Text | 128k | 8k | 256 / 250 | — | — | — | — | — |
qwen-vl-plus-2025-01-25 | Text, image | Text | 128k | 8k | 256 / 250 | — | — | — | — | — |
qvq-max | Text, image | Text | 128k | 8k | 256 / 250 | — | — | — | — | — |
qvq-max-latest | Text, image | Text | 128k | 8k | 256 / 250 | — | — | — | — | — |
qvq-max-2025-03-25 | Text, image | Text | 128k | 8k | 256 / 250 | — | — | — | — | — |