import base64
import os
from http import HTTPStatus
from dashscope import VideoSynthesis
import mimetypes
import dashscope
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# If you haven't configured the environment variable, replace the next line with your API key: api_key="sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
# --- Helper function: For Base64 encoding ---
# Format: data:{MIME_type};base64,{base64_data}
def encode_file(file_path):
    mime_type, _ = mimetypes.guess_type(file_path)
    if not mime_type or not mime_type.startswith("image/"):
        raise ValueError("Unsupported or unrecognized image format")
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    return f"data:{mime_type};base64,{encoded_string}"
"""
Image input methods:
Choose one of the following three methods:
1. Use a public URL - Suitable for publicly accessible images
2. Use a local file - Suitable for local development and testing
3. Use Base64 encoding - Suitable for private images or scenarios requiring encrypted transmission
"""
# [Method 1] Use a publicly accessible image URL
# Example: Use a public image URL
img_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"
# [Method 2] Use a local file (supports absolute and relative paths)
# Format requirement: file:// + file path
# Example (absolute path):
# img_url = "file://" + "/path/to/your/img.png" # Linux/macOS
# img_url = "file://" + "/C:/path/to/your/img.png" # Windows
# Example (relative path):
# img_url = "file://" + "./img.png" # Relative to the current executable file's path
# [Method 3] Use a Base64-encoded image
# img_url = encode_file("./img.png")
# Set audio URL
audio_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"
def sample_call_i2v():
    # Synchronous call, returns result directly
    print('please wait...')
    rsp = VideoSynthesis.call(api_key=api_key,
                              model='wan2.6-i2v-flash',
                              prompt='A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life from a concrete wall. He raps an English song at high speed while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.',
                              img_url=img_url,
                              audio_url=audio_url,
                              resolution="720P",
                              duration=10,
                              prompt_extend=True,
                              watermark=False,
                              negative_prompt="",
                              seed=12345)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print("video_url:", rsp.output.video_url)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' %
              (rsp.status_code, rsp.code, rsp.message))


if __name__ == '__main__':
    sample_call_i2v()

{
  "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx",
  "output": {
    "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx",
    "task_status": "PENDING"
  }
}
Authorizations
DashScope API Key. Create one in the Qwen Cloud console.
Header Parameters
X-DashScope-Async: Must be set to enable. HTTP requests support only asynchronous processing. Omitting this header returns a "current user api does not support synchronous calls" error.
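As a minimal sketch of the header requirement above, the helper below assembles the headers for a task-submission request. It assumes the asynchronous header is named X-DashScope-Async and that the API key is sent as a Bearer token in the Authorization header; build_headers is a hypothetical helper, not part of the SDK.

```python
def build_headers(api_key: str) -> dict:
    """Return HTTP headers for an asynchronous task-submission request.

    Assumptions: the header names below follow the usual DashScope HTTP
    conventions; this function itself is illustrative only.
    """
    if not api_key:
        raise ValueError("An API key is required")
    return {
        "Authorization": f"Bearer {api_key}",  # DashScope API key
        "Content-Type": "application/json",    # body is JSON
        "X-DashScope-Async": "enable",         # HTTP calls are asynchronous only
    }
```

In practice the key would typically come from the DASHSCOPE_API_KEY environment variable, as in the SDK sample.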
Body
application/json
model: Model name.
input: Input data including the first-frame image, prompt, and optional audio.
img_url: URL or Base64 string of the first-frame image.
Image constraints:
- Format: JPEG, JPG, PNG (no alpha channel), BMP, WEBP.
- Resolution: Width and height must be between 360 and 2,000 pixels.
- File size: Up to 10 MB.
Supported input formats:
- Public URL (HTTP/HTTPS supported).
- Base64-encoded image: data:{MIME_type};base64,{base64_data}.
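The image constraints above can be checked locally before a request is sent. The following is a rough sketch: check_image is a hypothetical helper, the caller is assumed to already know the pixel dimensions and file size, and "10 MB" is interpreted as 10 × 1024 × 1024 bytes, which may differ from how the service counts. It does not inspect the file for an alpha channel.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".bmp", ".webp"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MIN_SIDE, MAX_SIDE = 360, 2000      # allowed width/height in pixels


def check_image(filename: str, width: int, height: int, size_bytes: int) -> list:
    """Return a list of constraint violations (empty if none were found)."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format")
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name} {side} outside {MIN_SIDE}-{MAX_SIDE} px")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file larger than 10 MB")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.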
prompt: Text prompt describing the desired content and visual characteristics for the generated video. Both Chinese and English are supported. Length limits vary by model:
- wan2.6 and wan2.5 models: up to 1,500 characters.
- wan2.2 and wan2.1 models: up to 800 characters.
Text exceeding the limit is truncated automatically. For prompt tips, see Text-to-video/image-to-video prompt guide.
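Because over-length prompts are truncated silently, it can help to check the length on the client first. This sketch maps model families to the limits listed above; prompt_limit and fits_prompt_limit are hypothetical helpers, and counting with Python's len() (Unicode code points) is an assumption that may not match how the service counts characters.

```python
# Model-family prefix -> maximum prompt length, as listed above.
PROMPT_LIMITS = (
    ("wan2.6", 1500),
    ("wan2.5", 1500),
    ("wan2.2", 800),
    ("wan2.1", 800),
)


def prompt_limit(model: str) -> int:
    """Return the documented prompt length limit for a model name."""
    for prefix, limit in PROMPT_LIMITS:
        if model.startswith(prefix):
            return limit
    raise ValueError(f"Unknown model family: {model}")


def fits_prompt_limit(model: str, prompt: str) -> bool:
    """True if the prompt would not be truncated (assuming len() counting)."""
    return len(prompt) <= prompt_limit(model)
```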
negative_prompt: Negative prompt describing content to exclude from the video. Both Chinese and English are supported. Maximum 500 characters; excess is truncated automatically.
audio_url: URL of an audio file. The model synchronizes video generation with this audio. Supported by wan2.6 and wan2.5 models only.
Audio constraints:
- Format: wav, mp3.
- Duration: 3–30 seconds.
- File size: Up to 15 MB.
- If the audio exceeds the video duration, it is truncated. If shorter, the remaining video is silent.
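The truncation and silence rule in the last bullet is simple arithmetic, sketched below. audio_alignment is a hypothetical helper that only models the documented behavior; it does not read an audio file, so the caller is assumed to know both durations.

```python
def audio_alignment(audio_seconds: float, video_seconds: float) -> dict:
    """Describe how an audio clip lines up with the requested video length."""
    if not 3 <= audio_seconds <= 30:
        raise ValueError("audio duration must be 3-30 seconds")
    used = min(audio_seconds, video_seconds)
    return {
        "audio_used_seconds": used,
        # Audio beyond the end of the video is cut off.
        "audio_truncated_seconds": audio_seconds - used,
        # Video beyond the end of the audio plays without sound.
        "silent_tail_seconds": video_seconds - used,
    }
```

For example, a 12-second clip paired with a 10-second video loses its last 2 seconds, while a 4-second clip leaves the final 6 seconds of the video silent.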
parameters: Video generation parameters.
resolution: Resolution tier of the generated video. The model scales output to a similar total pixel count. The aspect ratio closely matches the input image. Resolution directly affects cost (1080P > 720P > 480P).
Default values and options by model:
- wan2.6-i2v-flash: 720P, 1080P (default: 1080P)
- wan2.6-i2v: 720P, 1080P (default: 1080P)
- wan2.5-i2v-preview: 480P, 720P, 1080P (default: 1080P)
- wan2.2-i2v-flash: 480P, 720P, 1080P (default: 720P)
- wan2.2-i2v-plus: 480P, 1080P (default: 1080P)
- wan2.1-i2v-turbo: 480P, 720P (default: 720P)
- wan2.1-i2v-plus: 720P (fixed)
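The per-model options above can be captured in a lookup table so that an unsupported tier is rejected before the request is sent. This is an illustrative sketch: RESOLUTIONS and resolve_resolution are hypothetical names, and the table simply transcribes the list above.

```python
# model -> (allowed resolution tiers, default tier), transcribed from the list above
RESOLUTIONS = {
    "wan2.6-i2v-flash": (("720P", "1080P"), "1080P"),
    "wan2.6-i2v": (("720P", "1080P"), "1080P"),
    "wan2.5-i2v-preview": (("480P", "720P", "1080P"), "1080P"),
    "wan2.2-i2v-flash": (("480P", "720P", "1080P"), "720P"),
    "wan2.2-i2v-plus": (("480P", "1080P"), "1080P"),
    "wan2.1-i2v-turbo": (("480P", "720P"), "720P"),
    "wan2.1-i2v-plus": (("720P",), "720P"),  # fixed
}


def resolve_resolution(model: str, requested=None) -> str:
    """Return the tier to use, falling back to the model's default."""
    options, default = RESOLUTIONS[model]
    if requested is None:
        return default
    if requested not in options:
        raise ValueError(f"{model} supports {options}, got {requested!r}")
    return requested
```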
duration: Duration of the generated video in seconds. Longer durations cost more (billed per second).
Valid values by model:
- wan2.6-i2v-flash: integer 2–15 (default: 5)
- wan2.6-i2v: integer 2–15 (default: 5)
- wan2.5-i2v-preview: 5, 10 (default: 5)
- wan2.2-i2v-plus: fixed 5 (not configurable)
- wan2.2-i2v-flash: fixed 5 (not configurable)
- wan2.1-i2v-plus: fixed 5 (not configurable)
- wan2.1-i2v-turbo: 3, 4, 5 (default: 5)
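The valid durations above mix ranges, discrete sets, and fixed values, which a small table can express uniformly. The names DURATIONS and resolve_duration are hypothetical, and the sketch only encodes the values listed above.

```python
# model -> allowed durations in seconds, transcribed from the list above
DURATIONS = {
    "wan2.6-i2v-flash": range(2, 16),  # any integer from 2 to 15
    "wan2.6-i2v": range(2, 16),
    "wan2.5-i2v-preview": (5, 10),
    "wan2.2-i2v-plus": (5,),           # fixed
    "wan2.2-i2v-flash": (5,),          # fixed
    "wan2.1-i2v-plus": (5,),           # fixed
    "wan2.1-i2v-turbo": (3, 4, 5),
}
DEFAULT_DURATION = 5  # every model listed above defaults to (or is fixed at) 5


def resolve_duration(model: str, requested=None) -> int:
    """Return the duration to use, rejecting values the model does not allow."""
    allowed = DURATIONS[model]
    if requested is None:
        return DEFAULT_DURATION
    # Require a true integer so that 5.0 or True is not silently accepted.
    if type(requested) is not int or requested not in allowed:
        raise ValueError(f"{model} allows {list(allowed)}, got {requested!r}")
    return requested
```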
prompt_extend: Whether to enable prompt rewriting. When enabled, an LLM rewrites the input prompt, which can improve generation quality for shorter prompts but increases processing time.
shot_type: Whether the video uses a single continuous shot or multiple switching shots. Supported by wan2.6 models only. Takes effect only when prompt_extend is true. When specified, overrides shot-related descriptions in the prompt.
audio: Whether to generate a video with sound. Supported by wan2.6-i2v-flash only. Priority: audio > audio_url. If audio=false, the output is silent even if audio_url is provided. Audio setting affects pricing.
watermark: Whether to add an "AI Generated" watermark in the lower-right corner.
seed: Random number seed for reproducibility. Range: [0, 2147483647]. If omitted, a random seed is used. Identical seeds do not guarantee identical results.
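To show how the fields described above might fit together in a JSON request body, here is a hedged sketch. It assumes the top-level keys are model, input, and parameters, with field names matching the keyword arguments in the SDK sample; build_body is a hypothetical helper, and only the seed range is validated.

```python
import json

SEED_MAX = 2147483647  # upper bound of the documented seed range


def build_body(model: str, img_url: str, prompt: str, seed=None, **parameters) -> dict:
    """Assemble a request body; extra keyword arguments go under "parameters"."""
    if seed is not None:
        if not 0 <= seed <= SEED_MAX:
            raise ValueError(f"seed must be in [0, {SEED_MAX}]")
        parameters["seed"] = seed
    return {
        "model": model,
        "input": {"img_url": img_url, "prompt": prompt},
        "parameters": parameters,
    }


body = build_body(
    "wan2.6-i2v-flash",
    "https://example.com/first-frame.png",  # placeholder URL
    "A boy made of spray paint comes to life",
    seed=12345,
    resolution="720P",
    duration=10,
)
print(json.dumps(body, indent=2))
```

Omitting seed leaves the key out of the body entirely, so the service picks a random seed as described above.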
Response
request_id: Unique request identifier for tracing and troubleshooting.
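As an illustration of reading a task-submission response, the sketch below parses the sample JSON shown earlier and extracts the request ID, task ID, and task status. parse_submission is a hypothetical helper, and the sample text is copied from the example response with its placeholder IDs left as-is.

```python
import json

# Sample response copied from the example above (IDs are placeholders).
SAMPLE_RESPONSE = """
{
  "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx",
  "output": {
    "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx",
    "task_status": "PENDING"
  }
}
"""


def parse_submission(text: str) -> tuple:
    """Return (request_id, task_id, task_status); missing fields become None."""
    data = json.loads(text)
    output = data.get("output", {})
    return data.get("request_id"), output.get("task_id"), output.get("task_status")


request_id, task_id, status = parse_submission(SAMPLE_RESPONSE)
print(request_id, task_id, status)
```

Keeping the request_id alongside the task_id makes it easier to trace a failed submission later.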