- Create a task (this endpoint) and receive a
task_id. - Query the result by polling with the
task_id.
messages format and parameters as the synchronous endpoint, but requires the X-DashScope-Async: enable header.Authorizations
DashScope API Key. Create one in the Qwen Cloud console.
Header Parameters
Asynchronous processing configuration. Must be set to enable.
Body
application/jsonModel name. Set to wan2.6-image.
Input data containing the messages array.
Show child attributes
Show child attributes
Array of request content. Only single-turn conversations are supported. Provide one message with role: user.
Show child attributes
Show child attributes
Message role. Must be user.
Message content array. Must contain exactly one text object. Image objects depend on the mode:
- Image editing (
enable_interleave=false): 1 to 4 image objects required. - Interleaved text-image (
enable_interleave=true): 0 to 1 image objects.
When using multiple images, include multiple image objects in the array. Image order is determined by array position.
Show child attributes
Show child attributes
Positive prompt describing the desired image content, style, and composition. Supports Chinese and English. Maximum 2,000 characters (each Chinese character, letter, digit, or symbol counts as one character). Excess is auto-truncated. The content array must contain exactly one text object.
Input image as a public URL (HTTP/HTTPS) or Base64-encoded string (data:{mime_type};base64,{data}).
Image constraints:
- Formats: JPEG, JPG, PNG (alpha channel not supported), BMP, WEBP.
- Resolution: Width and height each between 240 and 8,000 pixels.
- File size: Max 10 MB.
Image quantity limits:
- When
enable_interleave=false(image editing): must input 1 to 4 images. - When
enable_interleave=true(interleaved text-image): can input 0 to 1 images.
Image processing parameters.
Show child attributes
Show child attributes
Negative prompt describing content you do NOT want in the image. Supports Chinese and English. Maximum 500 characters. Excess is auto-truncated.
Example: Low resolution, low quality, deformed limbs, deformed fingers, oversaturated colors, wax-like appearance, no facial details, overly smooth skin, AI-looking artifacts, chaotic composition, blurry or distorted text.
Output image resolution. Supports two methods: referencing input image proportions or directly specifying dimensions.
Image editing mode (enable_interleave=false):
- Method 1 (recommended):
1K(default) or2K. Output total pixels close to 1280*1280 or 2048*2048, maintaining the aspect ratio of the last input image. - Method 2: Specify
width*heightin pixels. Total pixels must be between [768*768, 2048*2048], aspect ratio [1:4, 4:1]. Actual values are multiples of 16.
Interleaved text-image mode (enable_interleave=true):
- Method 1 (default): References input image proportions. If total pixels <= 1280*1280, output matches input. If > 1280*1280, output scales to ~1280*1280.
- Method 2: Specify
width*height. Total pixels must be between [768*768, 1280*1280], aspect ratio [1:4, 4:1].
Recommended resolutions: 1280*1280 (1:1), 800*1200 (2:3), 1200*800 (3:2), 960*1280 (3:4), 1280*960 (4:3), 720*1280 (9:16), 1280*720 (16:9), 1344*576 (21:9).
Controls image generation mode:
false(default): Image editing mode. Supports multi-image input (1-4 images) and subject consistency generation. Can generate 1 to 4 result images.true: Interleaved text-image output mode. Supports 0-1 input images. Generates mixed content containing both text and images. Requiresstream=trueandX-DashScope-Sse: enableheader.
Number of images to generate. Behavior depends on mode:
- Image editing (
enable_interleave=false): Range 1-4. Default: 4. - Interleaved text-image (
enable_interleave=true): Must be 1. Usemax_imagesto control image count instead.
Note: n directly affects cost. Cost = unit price x number of successfully generated images.
Only effective in interleaved text-image mode (enable_interleave=true). Specifies the maximum number of images the model can generate in a single response. Range: 1-5. Default: 5. The actual number of generated images is determined by model inference and may be less than this value.
Note: max_images affects cost. Cost = unit price x number of successfully generated images.
Only effective in image editing mode (enable_interleave=false). Enables intelligent prompt rewriting that optimizes and refines the positive prompt. The negative prompt is not affected.
Controls whether the response uses streaming output. In interleaved text-image mode (enable_interleave=true), you must set this to true.
Adds a watermark label in the bottom-right corner of the image with fixed text "AI Generated".
Random number seed. Range: [0, 2147483647]. Same seed produces more consistent (but not identical) results. If omitted, a random seed is used.
Response
Unique request identifier for troubleshooting.