Wan 2.6 — Generate from reference

POST /services/aigc/video-generation/video-synthesis

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
  -H 'X-DashScope-Async: enable' \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "wan2.6-r2v-flash",
  "input": {
    "prompt": "Character2 sits on a chair by the window, holding character3, and plays a soothing American country folk song next to character4. Character1 says to Character2: \"that sounds great\"",
    "reference_urls": [
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/aacgyk/wan-r2v-role1.mp4",
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/mmizqq/wan-r2v-role2.mp4",
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/qpzxps/wan-r2v-object4.png",
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/wfjikw/wan-r2v-backgroud5.png"
    ]
  },
  "parameters": {
    "size": "1280*720",
    "duration": 10,
    "audio": true,
    "shot_type": "multi",
    "watermark": true
  }
}'

{
  "request_id": "<string>",
  "output": {
    "task_id": "<string>",
    "task_status": "PENDING"
  }
}
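The same submission can be sketched in Python using only the standard library. This is a minimal sketch, not an official SDK. build_submit_request and submit_task are hypothetical helper names. The API key is read from the DASHSCOPE_API_KEY environment variable, as in the curl example. Error handling (HTTP errors, malformed responses) is left out.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
SUBMIT_URL = (
    "https://dashscope-intl.aliyuncs.com/api/v1"
    "/services/aigc/video-generation/video-synthesis"
)


def build_submit_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request. Hypothetical helper."""
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-DashScope-Async": "enable",  # required: the job runs as an async task
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_task(body: dict) -> str:
    """Send the request and return output.task_id from the JSON response."""
    req = build_submit_request(body, os.environ["DASHSCOPE_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["output"]["task_id"]
```

Calling submit_task(body) with a body shaped like the curl payload should return the task_id to poll. Building the request separately from sending it lets you inspect the headers and body without a network call.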

Generates natural, lifelike performance videos from multimodal input: a text prompt plus reference images or videos. The main character can be a person or an object.

Basic capabilities: Set the duration (2–10 s), resolution (720P or 1080P), and watermark.
Character portrayal: Replicates a character's appearance from a reference image or video; video references also replicate the character's voice timbre. Supports single- and multi-character performances.
Multi-shot narrative: Intelligent multi-shot scheduling keeps characters consistent across dialogue and interactions.

Authorizations

Authorization (string, header, required)
DashScope API key, sent as "Bearer <API key>". Create one in the Qwen Cloud console.

Header Parameters

X-DashScope-Async (enum<string>, required)
Must be set to enable, which creates the job as an asynchronous task.
Available options: enable

Body (application/json)

model (enum<string>, required)
Model name.
Available options: wan2.6-r2v-flash, wan2.6-r2v
Example: wan2.6-r2v-flash

input (object, required)
Input data for reference-to-video generation. Contains prompt and reference_urls.

input.prompt (string, required)
Text prompt describing the desired video content. Refer to the entries in reference_urls as character1, character2, and so on, in array order.
Example: Character1 says to Character2: "that sounds great"
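Character identifiers follow the order of reference_urls. A quick consistency check can therefore catch a prompt that mentions a character with no matching reference. This is a minimal sketch. character_map and undefined_characters are hypothetical helpers. Matching is case-insensitive because the examples on this page spell the identifiers both as character1 and as Character1.

```python
import re


def character_map(reference_urls: list[str]) -> dict[str, str]:
    """Map character1, character2, ... to reference URLs in array order."""
    return {f"character{i}": url for i, url in enumerate(reference_urls, start=1)}


def undefined_characters(prompt: str, reference_urls: list[str]) -> list[str]:
    """Return characterN tokens in the prompt that have no matching reference URL."""
    used = {int(n) for n in re.findall(r"character(\d+)", prompt, flags=re.IGNORECASE)}
    return [f"character{n}" for n in sorted(used) if not 1 <= n <= len(reference_urls)]
```

With three references, a prompt that mentions character4 is flagged, because only character1 through character3 are defined.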

input.reference_urls (string[], required, 1–5 items)
Array of reference image or video URLs: up to 5 URLs in total, of which at most 3 may be videos (all 5 may be images). Each reference must contain a single character. The array order determines the character identifiers (character1, character2, and so on).
Example:
[
  "https://example.com/person1.mp4",
  "https://example.com/person2.mp4",
  "https://example.com/object.png"
]
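The count limits above can be checked on the client before submitting. This sketch rests on an explicit assumption: this page does not say how the service distinguishes images from videos. check_reference_urls (a hypothetical helper) therefore guesses from the file extension. It treats .mp4, the only video extension used in the examples here, as video.

```python
from urllib.parse import urlparse

# Assumption: media type is inferred from the URL path's extension.
VIDEO_EXTENSIONS = (".mp4",)


def check_reference_urls(urls: list[str]) -> list[str]:
    """Return limit violations; an empty list means the array looks valid."""
    problems = []
    if not 1 <= len(urls) <= 5:
        problems.append(f"expected 1-5 URLs, got {len(urls)}")
    # The 5-image cap is implied by the 5-URL total, so only videos need counting.
    videos = sum(urlparse(u).path.lower().endswith(VIDEO_EXTENSIONS) for u in urls)
    if videos > 3:
        problems.append(f"at most 3 videos allowed, got {videos}")
    return problems
```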

parameters (object)
Generation parameters for reference-to-video. Contains size, duration, audio, shot_type, and watermark.

parameters.size (enum<string>)
Output resolution as width*height, which also sets the aspect ratio (e.g., 1280*720 for 16:9, 720*1280 for 9:16).
Available options: 1280*720, 720*1280, 960*960, 1920*1080, 1080*1920
Example: 1280*720
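Because size encodes both dimensions, you can derive the aspect ratio by dividing the width and height by their greatest common divisor. This is a small sketch; aspect_ratio is a hypothetical helper, not part of the API.

```python
from math import gcd


def aspect_ratio(size: str) -> str:
    """Reduce a width*height size string such as "1280*720" to "16:9"."""
    width, height = (int(part) for part in size.split("*"))
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"
```

For the five listed options, in order, this gives 16:9, 9:16, 1:1, 16:9, and 9:16.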

parameters.duration (integer)
Video duration in seconds: an integer from 2 to 10 (2 <= x <= 10) for both models.
Example: 10

parameters.audio (boolean, default: true)
Whether to generate audio. true (default): the video includes audio. false: the video is silent. Silent video is supported only by wan2.6-r2v-flash.
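The duration range and the audio restriction depend only on values the client already has. They can therefore be checked before a task is created. This is a minimal sketch with a hypothetical check_parameters helper. It encodes only the two rules stated above and treats a missing audio field as true, per the documented default.

```python
def check_parameters(model: str, parameters: dict) -> list[str]:
    """Return violations of the duration and audio rules; empty if none."""
    problems = []
    duration = parameters.get("duration")
    if duration is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(duration, int) and not isinstance(duration, bool)
        if not (is_int and 2 <= duration <= 10):
            problems.append("duration must be an integer from 2 to 10")
    # audio defaults to true; only wan2.6-r2v-flash accepts false (silent video).
    if parameters.get("audio", True) is False and model != "wan2.6-r2v-flash":
        problems.append("silent video (audio=false) requires wan2.6-r2v-flash")
    return problems
```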

parameters.shot_type (enum<string>)
Shot mode. multi: switches between multiple shots for greater expressiveness, with natural dialogue and scene transitions. single: keeps a fixed, single-shot perspective.
Available options: multi, single
Example: multi

parameters.watermark (boolean, default: false)
Whether to add a watermark to the output video.

Response

200 (application/json)

request_id (string)
Unique request identifier.

output (object)
Contains task_id and task_status.

output.task_id (string)
Task identifier. Pass it to GET /tasks/{task_id} to poll for the result.

output.task_status (enum<string>)
Initial task status.
Available options: PENDING
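Once output.task_id is known, you can poll the task until it finishes. This sketch rests on several assumptions that this page does not confirm:

- The query endpoint is GET /tasks/{task_id} under the same base URL as the submit endpoint.
- The query uses the same Bearer authorization header.
- SUCCEEDED, FAILED, CANCELED, and UNKNOWN are the terminal statuses. Only PENDING is documented here.

The 15-second interval and the 120-poll cap are arbitrary defaults. fetch_task and wait_for_task are hypothetical helpers.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: the tasks endpoint shares the submit endpoint's base URL.
TASKS_URL = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}"

# Assumption: these statuses end polling; only PENDING is documented on this page.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def fetch_task(task_id: str, api_key: str) -> dict:
    """GET the task record and decode its JSON body."""
    req = urllib.request.Request(
        TASKS_URL.format(task_id=task_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_task(
    task_id: str,
    fetch: Callable[[str], dict],
    interval: float = 15.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until output.task_status is terminal, then return the last record."""
    for _ in range(max_polls):
        record = fetch(task_id)
        if record["output"]["task_status"] in TERMINAL_STATUSES:
            return record
        sleep(interval)
    raise TimeoutError(f"task {task_id} not finished after {max_polls} polls")
```

The fetch and sleep functions are passed in rather than hard-coded, so wait_for_task can be tested without network access. In real use, pass lambda t: fetch_task(t, api_key).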