Pricing - Qwen Cloud

Billing varies by model type: text models charge per token, image generation per image, video generation per second, and speech models per character or per second of audio.

Text generation

Billed per million tokens. Models with long-context support use tiered pricing — longer prompts cost more per token.

Model	Context tier	Input	Output
qwen3.6-max-preview	≤ 128K	$1.30	$7.80
	128K – 256K	$2.00	$12.00
qwen3.6-plus	≤ 256K	$0.50	$3.00
	256K – 1M	$2.00	$6.00
qwen3.6-flash	≤ 256K	$0.25	$1.50
	256K – 1M	$1.00	$4.00

For complete text model pricing, see Model Marketplace.
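
As an illustration of how tiered billing plays out, the sketch below estimates the cost of one request from the prices in the table above. It rests on assumptions this page does not state: the tier is selected by the request's input token count, the whole request is billed at that tier's rates, and "K" means 1,000 tokens. The function name `estimate_text_cost` is hypothetical and not part of any SDK.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# USD per 1M tokens, copied from the table above.
# Each tier is (max input tokens, input price, output price).
# Assumption: "K" = 1,000 tokens and "1M" = 1,000,000 tokens.
TIERS = {
    "qwen3.6-max-preview": [
        (128_000, Decimal("1.30"), Decimal("7.80")),
        (256_000, Decimal("2.00"), Decimal("12.00")),
    ],
    "qwen3.6-plus": [
        (256_000, Decimal("0.50"), Decimal("3.00")),
        (1_000_000, Decimal("2.00"), Decimal("6.00")),
    ],
    "qwen3.6-flash": [
        (256_000, Decimal("0.25"), Decimal("1.50")),
        (1_000_000, Decimal("1.00"), Decimal("4.00")),
    ],
}


def estimate_text_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: the tier is chosen by input token count, and both input
    and output tokens are billed at that tier's rates.
    """
    for max_input, input_price, output_price in TIERS[model]:
        if input_tokens <= max_input:
            total = input_tokens * input_price + output_tokens * output_price
            return total / MILLION
    raise ValueError(f"{input_tokens} input tokens exceeds the largest tier for {model}")
```

Under these assumptions, a qwen3.6-plus request with 100,000 input tokens and 2,000 output tokens comes to about $0.056, while the same output on a 300,000-token prompt comes to about $0.612.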

Images & videos

Understanding

Vision understanding is billed per token. Qwen text generation models (qwen3.6-plus, etc.) support vision input at the same token price listed above. Dedicated vision models have separate pricing:

Model	Context tier	Input	Output
qwen3-vl-plus	≤ 32K	$0.20	$1.60
	32K – 128K	$0.30	$2.40
	128K – 256K	$0.60	$4.80
qwen3-vl-flash	≤ 32K	$0.05	$0.40
	32K – 128K	$0.075	$0.60
	128K – 256K	$0.12	$0.96

Image and video inputs are automatically converted to tokens. The conversion varies by model:

Model family	Image conversion	Example (1024×1024)
Qwen (qwen3.6-plus, etc.)	1 token per 32×32 pixels	≈ 256 tokens
Qwen-VL (qwen3-vl, etc.)	1 token per 32×32 pixels	≈ 256 tokens
Qwen3.5-Omni	1 token per 32×32 pixels	≈ 256 tokens
Qwen3-Omni-Flash	1 token per 32×32 pixels	≈ 256 tokens

Video tokens = sampled frames × tokens per frame. See Token counting → for details.
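
The video formula above can be sketched as a small calculation. The frame count and the tokens-per-frame value are supplied by the caller here, because this page does not say how frames are sampled. The 256 tokens per frame used in the usage note is borrowed from the table's 1024×1024 example, and treating a video frame like a still image is an assumption. The function names are hypothetical.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def video_tokens(sampled_frames: int, tokens_per_frame: int) -> int:
    """Video tokens = sampled frames x tokens per frame."""
    return sampled_frames * tokens_per_frame


def input_cost(tokens: int, price_per_million: Decimal) -> Decimal:
    """USD cost of `tokens` input tokens at a per-1M-token price."""
    return tokens * price_per_million / MILLION


# Hypothetical clip: 60 sampled frames at 256 tokens per frame.
tokens = video_tokens(60, 256)
# qwen3-vl-flash input price for the <= 32K tier, from the table above.
cost = input_cost(tokens, Decimal("0.05"))
```

With these illustrative numbers the clip contributes 15,360 input tokens, or about $0.000768 at the qwen3-vl-flash ≤ 32K input rate.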

Generation

Image generation is billed per image (resolution-independent). Video generation is billed per second of output video.

Image generation

Model	Price per image
qwen-image-2.0-pro	$0.075
qwen-image-2.0	$0.035
qwen-image-edit	$0.045
wan2.6-t2i	$0.03
wan2.6-image	$0.03
z-image-turbo	$0.015 (prompt rewrite off) / $0.03 (on)

Video generation

Model	Price per second
wan2.6-t2v	$0.10
wan2.6-i2v	$0.10
wan2.6-i2v-flash	$0.05

For all image and video model pricing, see Model Marketplace.
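
A minimal sketch of the per-image and per-second billing described above, using prices from the two tables. It assumes no rounding or minimum charge beyond what this page states, and the function names are hypothetical. z-image-turbo is left out because its price depends on the prompt rewrite setting.

```python
from decimal import Decimal

# USD per generated image, from the image generation table.
IMAGE_PRICE = {
    "qwen-image-2.0-pro": Decimal("0.075"),
    "qwen-image-2.0": Decimal("0.035"),
    "qwen-image-edit": Decimal("0.045"),
    "wan2.6-t2i": Decimal("0.03"),
    "wan2.6-image": Decimal("0.03"),
}

# USD per second of output video, from the video generation table.
VIDEO_PRICE_PER_SECOND = {
    "wan2.6-t2v": Decimal("0.10"),
    "wan2.6-i2v": Decimal("0.10"),
    "wan2.6-i2v-flash": Decimal("0.05"),
}


def image_cost(model: str, images: int) -> Decimal:
    """Cost of generating `images` images with `model`."""
    return IMAGE_PRICE[model] * images


def video_cost(model: str, seconds: int) -> Decimal:
    """Cost of generating `seconds` seconds of video with `model`."""
    return VIDEO_PRICE_PER_SECOND[model] * seconds
```

For example, 20 images from qwen-image-2.0 would come to $0.70, and a 5-second wan2.6-t2v clip to $0.50.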

Audio & speech

Text to speech

Billed per 10,000 characters of input text.

Model	Price per 10K chars
cosyvoice-v3-plus	$0.26
cosyvoice-v3-flash	$0.13
qwen3-tts-flash	$0.10

Speech to text

Billed per second of audio input.

Model	Price per second
fun-asr	$0.000035
fun-asr-realtime	$0.00009
qwen3-asr-flash	$0.000035
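
The per-character and per-second rates in the text-to-speech and speech-to-text tables can be turned into job-level estimates as sketched below. The sketch assumes usage is pro-rated (no rounding up to whole 10,000-character blocks or whole minutes), which this page does not confirm. The function names are hypothetical.

```python
from decimal import Decimal

# USD per 10,000 input characters, from the text-to-speech table.
TTS_PRICE_PER_10K_CHARS = {
    "cosyvoice-v3-plus": Decimal("0.26"),
    "cosyvoice-v3-flash": Decimal("0.13"),
    "qwen3-tts-flash": Decimal("0.10"),
}

# USD per second of input audio, from the speech-to-text table.
STT_PRICE_PER_SECOND = {
    "fun-asr": Decimal("0.000035"),
    "fun-asr-realtime": Decimal("0.00009"),
    "qwen3-asr-flash": Decimal("0.000035"),
}


def tts_cost(model: str, characters: int) -> Decimal:
    """Cost of synthesizing `characters` characters (assumed pro-rated)."""
    return Decimal(characters) / Decimal(10_000) * TTS_PRICE_PER_10K_CHARS[model]


def stt_cost(model: str, audio_seconds: int) -> Decimal:
    """Cost of transcribing `audio_seconds` seconds of audio."""
    return STT_PRICE_PER_SECOND[model] * audio_seconds
```

Under these assumptions, 25,000 characters through qwen3-tts-flash would cost $0.25, and one hour (3,600 seconds) of audio through fun-asr would cost $0.126.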

Speech to speech

Qwen-Omni is a multimodal model that handles text, audio, and image/video in a single call. All modality prices are listed in the table below.

Billed per million tokens, with different rates per modality.

Token conversion

Input type	Conversion rate
Text	Standard tokenizer
Audio input	≈ 7 tokens/sec (Qwen3.5-Omni), 12.5 tokens/sec (Qwen3-Omni-Flash), or 25 tokens/sec (Qwen-Omni-Turbo)
Audio output	≈ 12.5 tokens/sec (Qwen3.5-Omni and Qwen3-Omni-Flash)
Image/Video	See Understanding section above

qwen3.5-omni pricing

Price per 1M tokens:

Model	Text/Image/Video input	Audio input	Text output	Text + Audio output
qwen3.5-omni-plus	$1.4	$11	$8.3	$44
qwen3.5-omni-flash	$0.4	$3	$2.2	$11.9
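
To connect the conversion rates to the price table, the sketch below converts audio duration to tokens and then to cost for the qwen3.5-omni models. The conversion rates are the approximate Qwen3.5-Omni figures listed above. Billing audio output tokens at the "Text + Audio output" rate is an assumption; this page does not spell out how that column applies to text and audio tokens within one response. The function names are hypothetical.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Approximate Qwen3.5-Omni conversion rates from the token conversion table.
AUDIO_IN_TOKENS_PER_SEC = Decimal("7")
AUDIO_OUT_TOKENS_PER_SEC = Decimal("12.5")

# USD per 1M tokens, from the qwen3.5-omni pricing table.
PRICES = {
    "qwen3.5-omni-plus": {"audio_in": Decimal("11"), "text_audio_out": Decimal("44")},
    "qwen3.5-omni-flash": {"audio_in": Decimal("3"), "text_audio_out": Decimal("11.9")},
}


def audio_input_cost(model: str, seconds: int) -> Decimal:
    """Estimated cost of `seconds` seconds of audio input."""
    tokens = AUDIO_IN_TOKENS_PER_SEC * seconds
    return tokens * PRICES[model]["audio_in"] / MILLION


def audio_output_cost(model: str, seconds: int) -> Decimal:
    """Estimated cost of `seconds` seconds of audio output.

    Assumption: audio output tokens are billed at the
    "Text + Audio output" rate.
    """
    tokens = AUDIO_OUT_TOKENS_PER_SEC * seconds
    return tokens * PRICES[model]["text_audio_out"] / MILLION
```

As a rough guide, 10 minutes of audio input to qwen3.5-omni-plus is about 4,200 tokens, or roughly $0.0462. Because the conversion rates are approximate, actual token counts reported by the API may differ.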

For all speech model pricing, see Model Marketplace.

Embedding & reranking

Billed per million input tokens (output is not charged). Multimodal embedding models may charge different rates for image vs text input. Image/video token conversion for embedding models is handled internally — check the usage field in the API response for actual token counts.

Model	Modality	Price per 1M tokens
text-embedding-v4	Text	$0.07
tongyi-embedding-vision-plus	All	$0.09
tongyi-embedding-vision-flash	Image/Video	$0.03
	Text	$0.09
qwen3-rerank	Text	$0.10

For all embedding and reranking model pricing, see Model Marketplace.

Built-in tools

Some built-in tools incur per-call fees in addition to model token costs.

Tool	Fee	Notes
Web Search	$10 / 1K calls
Web Extractor	FREE	Limited time
Code Interpreter	FREE	Limited time
Image Search	$8 / 1K calls	Text-to-image and image-to-image

Function calling and MCP have no tool fees — tool descriptions count as input tokens.
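
Because tool fees are quoted per 1,000 calls, a per-request estimate needs a small conversion. The sketch below uses the fees from the table above and assumes they are pro-rated per call, which this page implies but does not state outright. The free tools are marked as limited-time offers, so their entries may change. The function name is hypothetical.

```python
from decimal import Decimal

# USD per 1,000 calls, from the built-in tools table.
TOOL_FEE_PER_1K_CALLS = {
    "Web Search": Decimal("10"),
    "Web Extractor": Decimal("0"),     # free for a limited time
    "Code Interpreter": Decimal("0"),  # free for a limited time
    "Image Search": Decimal("8"),
}


def tool_fee(tool: str, calls: int) -> Decimal:
    """Tool fee for `calls` calls, charged on top of model token costs."""
    return TOOL_FEE_PER_1K_CALLS[tool] * calls / Decimal(1000)
```

For example, 250 Web Search calls would add $2.50 to the token cost of the requests that made them.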

Free quota

New users get 1 million free tokens per model for 90 days. Applies to real-time API calls only. Learn more →

Save on costs

Batch API — 50% off for async workloads. Learn more →
Context caching — Reuse long prompts at reduced cost. Learn more →
Model selection — Match model tier to task complexity. Compare models →

Batch and cache discounts cannot be combined on the same request.
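
A minimal sketch of the Batch API discount, assuming the 50% reduction applies to the full token cost of a request. The cache discount is omitted because this page does not give its rate. The function name is hypothetical.

```python
from decimal import Decimal

BATCH_DISCOUNT = Decimal("0.5")  # "50% off" from the list above


def batch_cost(list_cost: Decimal) -> Decimal:
    """Cost of a request sent through the Batch API.

    Assumption: the discount applies to the whole token cost.
    It cannot be combined with the cache discount on the same request.
    """
    return list_cost * (Decimal(1) - BATCH_DISCOUNT)
```

A request with a list price of $0.056 would come to $0.028 through the Batch API under this assumption.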

For worked examples and advanced strategies, see Cost optimization →.

Learn more

Model Marketplace — Complete pricing for all models
Free quota — Eligibility and activation
Cost optimization — Advanced strategies
Token Plan — Credits-based pricing for AI coding tools
Coding Plan — Fixed monthly pricing for AI coding tools
Billing FAQ — Common questions
Bill management — View usage and invoices
