Animate from a single image
- Basic settings: Choose a duration and resolution that the model supports (see the Availability section). For example, wan2.6 accepts any integer duration from 2 to 15 seconds, while wan2.5 accepts only 5 or 10 seconds. Prompt rewriting and watermarks are also supported.
- Audio capabilities: Supports automatic dubbing or uploading audio to achieve audio-video sync. (Supported by wan2.6 and wan2.5)
- Multi-shot narrative: Generate videos with multiple shots while keeping the main subject consistent across shots. (Supported only by wan2.6)
Getting started
| Input prompt | Input first frame | Output video (multi-shot video with audio) |
|---|---|---|
| The camera slowly moves up from below the sea turtle. The turtle swims leisurely, and the details of its belly are clearly visible. | ![]() |
Make sure that your DashScope SDK version is up to date:
- DashScope Python SDK: 1.25.8 or later
- DashScope Java SDK: 2.22.6 or later
Availability
Supported models:
| Model | Features | Input modalities | Output video specifications |
|---|---|---|---|
| wan2.6-i2v-flash (Recommended) | Video with audio, video without audio. Multi-shot narrative, audio-video sync | Text, image, audio | Resolution options: 720P, 1080P. Video duration: [2s, 15s] (integer). Defined specifications: 30 fps, MP4 (H.264 encoding) |
| wan2.6-i2v (Recommended) | Video with audio. Multi-shot narrative, audio-video sync | Text, image, audio | Resolution options: 720P, 1080P. Video duration: [2s, 15s] (integer). Defined specifications: 30 fps, MP4 (H.264 encoding) |
| wan2.5-i2v-preview | Video with audio. Audio-video sync | Text, image, audio | Resolution options: 480P, 720P, 1080P. Video duration: 5s, 10s. Defined specifications: 30 fps, MP4 (H.264 encoding) |
| wan2.2-i2v-flash | Video without audio. 50% faster than model 2.1 | Text, image | Resolution options: 480P, 720P, 1080P. Video duration: 5s. Defined specifications: 30 fps, MP4 (H.264 encoding) |
| wan2.2-i2v-plus | Video without audio. This model offers comprehensive improvements in stability and success rate over model 2.1. | Text, image | Resolution options: 480P, 1080P. Video duration: 5s. Defined specifications: 30 fps, MP4 (H.264 encoding) |
| wan2.1-i2v-plus | Video without audio | Text, image | Resolution options: 720P. Video duration: 5s. Defined specifications: 30 fps, MP4 (H.264 encoding) |
| wan2.1-i2v-turbo | Video without audio | Text, image | Resolution options: 480P, 720P. Video duration: 3s, 4s, 5s. Defined specifications: 30 fps, MP4 (H.264 encoding) |
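The table above can be turned into a small pre-flight check. The following is a minimal sketch, not part of the DashScope SDK: MODEL_SPECS and check_settings are hypothetical names, and the values are copied from the table.

```python
# Durations (seconds) and resolutions copied from the Availability table.
# MODEL_SPECS and check_settings are illustrative helpers, not SDK objects.
MODEL_SPECS = {
    "wan2.6-i2v-flash": {"durations": range(2, 16), "resolutions": {"720P", "1080P"}},
    "wan2.6-i2v": {"durations": range(2, 16), "resolutions": {"720P", "1080P"}},
    "wan2.5-i2v-preview": {"durations": (5, 10), "resolutions": {"480P", "720P", "1080P"}},
    "wan2.2-i2v-flash": {"durations": (5,), "resolutions": {"480P", "720P", "1080P"}},
    "wan2.2-i2v-plus": {"durations": (5,), "resolutions": {"480P", "1080P"}},
    "wan2.1-i2v-plus": {"durations": (5,), "resolutions": {"720P"}},
    "wan2.1-i2v-turbo": {"durations": (3, 4, 5), "resolutions": {"480P", "720P"}},
}


def check_settings(model: str, duration: int, resolution: str) -> list:
    """Return a list of problems; an empty list means the combination is listed."""
    spec = MODEL_SPECS.get(model)
    if spec is None:
        return [f"unknown model: {model}"]
    problems = []
    if duration not in spec["durations"]:
        problems.append(f"{model} does not list a duration of {duration}s")
    if resolution not in spec["resolutions"]:
        problems.append(f"{model} does not list resolution {resolution}")
    return problems
```

Running such a check before submitting a task catches unsupported combinations locally, for example a 7-second request to wan2.5-i2v-preview.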
Core features
Create multi-shot videos
Models: wan2.6-i2v-flash, wan2.6-i2v.
Introduction: Automatically switches between shots, such as from a wide shot to a close-up. This feature is suitable for music videos (MVs) and similar scenarios.
Parameter settings:
- shot_type: Must be "multi".
- prompt_extend: Must be true to enable intelligent rewriting for optimized shot descriptions.
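As an illustration of the two settings above, the sketch below assembles a request body as a plain dictionary. The shot_type and prompt_extend values come from this section. The model/input/parameters nesting, the img_url field name, and the build_multi_shot_request helper are assumptions made for the example, so check them against the API reference before use.

```python
import json


def build_multi_shot_request(prompt: str, img_url: str, model: str = "wan2.6-i2v-flash") -> dict:
    """Assemble a multi-shot request body (the nesting is an assumption)."""
    # Multi-shot narrative is listed only for the wan2.6 models.
    if model not in ("wan2.6-i2v-flash", "wan2.6-i2v"):
        raise ValueError(f"{model} is not listed as supporting multi-shot videos")
    return {
        "model": model,
        # Field names under "input" are assumed for illustration.
        "input": {"prompt": prompt, "img_url": img_url},
        "parameters": {
            "shot_type": "multi",   # required for multi-shot output
            "prompt_extend": True,  # required so shot descriptions are rewritten
        },
    }


request_body = build_multi_shot_request(
    "A boy made of spray paint comes to life on a concrete wall.",
    "https://example.com/img.png",
)
print(json.dumps(request_body, indent=2))
```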
| Input prompt | Input first frame | Output video (wan2.6, multi-shot video) |
|---|---|---|
| A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life on a concrete wall. He performs an English rap at high speed while striking a classic, energetic rapper pose. The scene is set at night under an urban railway bridge. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of the rap, with no other dialogue or noise. | ![]() |
Make sure that your DashScope SDK version is up to date:
- Python SDK: 1.25.8 or later
- Java SDK: 2.22.6 or later
Audio-video synchronization
Models: wan2.6-i2v-flash, wan2.6-i2v, wan2.5-i2v-preview.
Introduction: Animates characters in photos to speak or sing, with lip movements that match the audio. For more examples, see Sound generation.
Parameter settings:
- Provide an audio file: Pass the audio_url. The model aligns the lip movements with the audio file.
- Automatic dubbing: Do not pass an audio_url. The model outputs a video with audio by default. It automatically generates background sound effects, music, or vocals based on the visuals.
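The two modes differ only in whether audio_url is present. A minimal sketch of that choice follows. The build_input helper and the prompt and img_url field names are assumptions for illustration, while audio_url is the parameter named in this section.

```python
from typing import Optional


def build_input(prompt: str, img_url: str, audio_url: Optional[str] = None) -> dict:
    """Build the input part of a request for either audio mode."""
    payload = {"prompt": prompt, "img_url": img_url}  # assumed field names
    if audio_url is not None:
        # Provide an audio file: lip movements are aligned with this audio.
        if not audio_url.startswith(("http://", "https://")):
            raise ValueError("audio_url must be a public HTTP or HTTPS URL")
        payload["audio_url"] = audio_url
    # Automatic dubbing: audio_url is simply left out.
    return payload
```

Calling build_input without audio_url yields the automatic dubbing form, and passing a public URL yields the lip-sync form.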
| Input prompt | Input first frame | Output video (with audio) |
|---|---|---|
| A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life on a concrete wall. He performs an English rap at high speed while striking a classic, energetic rapper pose. The scene is set at night under an urban railway bridge. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of the rap, with no other dialogue or noise. | ![]() |
Make sure that your DashScope SDK version is up to date:
- Python SDK: 1.25.8 or later
- Java SDK: 2.22.6 or later
When you query the task result, replace {task_id} with the task_id value returned by the previous API call.
Generate videos without audio
Models: wan2.6-i2v-flash, wan2.2 and earlier models.
Introduction: Suitable for visual-only scenarios that do not require audio, such as dynamic posters and silent short videos.
Parameter settings:
- wan2.6-i2v-flash: By default, this model generates a video with audio. To generate a video without audio, you must explicitly set audio=false. Even if you pass an audio_url, the output is a silent video as long as audio=false. For pricing details, see wan2.6-i2v-flash pricing.
- wan2.2 and earlier models: These models generate silent videos by default, with no extra configuration needed.
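The per-model rule above can be sketched as a small helper that returns the extra parameters needed for a silent video. silent_video_parameters is a hypothetical name. The audio key follows the audio=false setting described above, and matching on the wan2.2- and wan2.1- prefixes is an assumption about how "wan2.2 and earlier models" are named.

```python
def silent_video_parameters(model: str) -> dict:
    """Return the extra parameters a model needs to produce a silent video."""
    if model == "wan2.6-i2v-flash":
        # Audio is on by default, so it must be switched off explicitly.
        return {"audio": False}
    if model.startswith(("wan2.2-", "wan2.1-")):
        # These models are silent by default; nothing to add.
        return {}
    raise ValueError(f"{model} is not listed as supporting video without audio")
```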
| Prompt | Input first frame | Output video (without audio) |
|---|---|---|
| A cat running on the grass | ![]() |
Make sure that your DashScope SDK version is up to date:
- Python SDK: 1.25.8 or later
- Java SDK: 2.22.6 or later
When you query the task result, replace {task_id} with the task_id value returned by the previous API call.
How to provide images and audio
Input image
- Number: 1.
- Input methods: Public image URL, local file path, or Base64-encoded string.
Method 1: Public URL (HTTP interface, SDK) - Recommended
- Requirements: Supports the HTTP or HTTPS protocol. Ensure that the image URL is directly accessible from the internet.
- Example: "https://example.com/img.png"
Method 2: Local file path (SDK only)
The file path requirements differ slightly between the Python and Java SDKs. On Windows, Python uses two slashes (file://), while Java uses three (file:///). Follow the rules below carefully.
Python SDK: Supports absolute and relative paths. The file path rules are as follows:
| Operating system | Input file path | Example (absolute path) | Example (relative path) |
|---|---|---|---|
| Linux / macOS | file:// + absolute or relative path | file:///home/images/test.png | file://./images/test.png |
| Windows | file:// + absolute or relative path | file://D:/images/test.png | file://./images/test.png |
Java SDK: Supports absolute paths only. The file path rules are as follows:
| Operating system | Input file path | Example (absolute path) |
|---|---|---|
| Linux / macOS | file:// + absolute path | file:///home/images/test.png |
| Windows | file:/// + absolute path | file:///D:/images/test.png |
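The prefix rules in the two tables can be sketched as one helper. to_file_url is a hypothetical function written for this illustration and is not part of either SDK. It only builds the string and does not check that the file exists.

```python
def to_file_url(path: str, sdk: str = "python") -> str:
    """Prefix a local path following the Python or Java SDK rules above."""
    normalized = path.replace("\\", "/")
    # A Windows absolute path looks like "D:/..." after normalization.
    is_windows_abs = (
        len(normalized) >= 3 and normalized[0].isalpha() and normalized[1:3] == ":/"
    )
    is_posix_abs = normalized.startswith("/")
    if sdk == "python":
        # Python: file:// + absolute or relative path on every platform.
        return "file://" + normalized
    if sdk == "java":
        if is_windows_abs:
            return "file:///" + normalized  # three slashes on Windows
        if is_posix_abs:
            return "file://" + normalized   # the path itself starts with "/"
        raise ValueError("the Java SDK supports absolute paths only")
    raise ValueError(f"unknown sdk: {sdk}")
```

For example, to_file_url("D:/images/test.png", sdk="java") produces the three-slash form from the Java table, while the same path with sdk="python" produces the two-slash form.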
Method 3: Base64-encoded string (HTTP interface, SDK)
- Example: data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABDg...... (The example is truncated for demonstration purposes only.)
- Format requirement: Follow the data:<MIME_type>;base64,<base64_data> format, where:
  - <base64_data>: The Base64-encoded string of the image file.
  - <MIME_type>: The media type of the image, which must correspond to the file format.
| Image format | MIME type |
|---|---|
| JPEG | image/jpeg |
| JPG | image/jpeg |
| PNG | image/png |
| BMP | image/bmp |
| WEBP | image/webp |
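A minimal sketch of producing a string in this format with the Python standard library follows. image_to_data_uri is a hypothetical helper. The MIME mapping is taken from the table above, and choosing the MIME type from the file extension is an assumption of this example.

```python
import base64
from pathlib import Path

# File extension to MIME type, following the table above.
MIME_TYPES = {
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".bmp": "image/bmp",
    ".webp": "image/webp",
}


def image_to_data_uri(path: str) -> str:
    """Encode a local image as data:<MIME_type>;base64,<base64_data>."""
    file_path = Path(path)
    mime_type = MIME_TYPES.get(file_path.suffix.lower())
    if mime_type is None:
        raise ValueError(f"unsupported image format: {file_path.suffix}")
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used wherever an image URL is accepted as a Base64-encoded string.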
Example code: Three input methods
Input audio
- Number: 1.
- Input method: Only publicly accessible URLs (HTTP or HTTPS) are supported. Local file paths and Base64 encoding are not supported.
- Audio file constraints:
- Format: wav, mp3
- Duration: 3–30 seconds
- File size: Up to 15 MB
- If the audio exceeds the video duration, it is truncated. If the audio is shorter, the remaining video is silent.
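These constraints can be checked before a task is submitted. The sketch below is illustrative only: check_audio and audio_coverage are hypothetical helpers, the caller supplies the file size and duration, reading "15 MB" as 15 × 1024 × 1024 bytes is an assumption, and so is detecting the format from the URL's file extension.

```python
from urllib.parse import urlparse

MAX_AUDIO_BYTES = 15 * 1024 * 1024  # assumption: "15 MB" read as 15 MiB


def check_audio(url: str, size_bytes: int, duration_s: float) -> list:
    """Return a list of constraint violations; empty means none were found."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("audio must be a public HTTP or HTTPS URL")
    if not parsed.path.lower().endswith((".wav", ".mp3")):
        problems.append("audio format must be wav or mp3")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append("audio file must be 15 MB or smaller")
    if not 3 <= duration_s <= 30:
        problems.append("audio duration must be 3 to 30 seconds")
    return problems


def audio_coverage(audio_s: float, video_s: float) -> tuple:
    """Return (seconds of audio used, seconds of silent video at the end)."""
    used = min(audio_s, video_s)         # longer audio is truncated
    silent = max(0.0, video_s - audio_s)  # shorter audio leaves a silent tail
    return used, silent
```

For instance, audio_coverage(4, 10) reports 4 seconds of audio followed by 6 silent seconds.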
Output video
- Number: 1.
- Output video specifications: Output specifications vary by model. See Availability.
- Output video URL validity: 24 hours.
- Output video dimensions: The dimensions are determined by the input image and the resolution setting.
  - The model tries to maintain the aspect ratio of the input image while scaling it to a total pixel count close to the target value. Because of encoding standards, the width and height must be divisible by 16, so the model automatically adjusts the dimensions slightly.
  - For example, if the input image is 750 x 1000 (aspect ratio 3:4 = 0.75) and you set resolution = "720P" (target pixel count is about 920,000), the final output might be 816 x 1104 (aspect ratio approximately 0.739, total pixels approximately 900,000), where both width and height are multiples of 16.
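The arithmetic in the example above can be reproduced with a short estimate. This is a sketch under stated assumptions, not the model's actual algorithm: the 720P target is taken as 1280 × 720 = 921,600 pixels, and each side is rounded down to a multiple of 16, a rule chosen because it reproduces the 816 x 1104 figure. estimate_output_size is a hypothetical helper.

```python
import math

# Assumption: the "about 920,000" pixel target for 720P is 1280 x 720.
TARGET_PIXELS_720P = 1280 * 720


def estimate_output_size(width: int, height: int, target_pixels: int, multiple: int = 16) -> tuple:
    """Scale to roughly target_pixels, keep the aspect ratio, snap to multiples."""
    # One scale factor for both sides preserves the aspect ratio.
    scale = math.sqrt(target_pixels / (width * height))
    # Rounding down to a multiple of 16 is an assumed rule (see lead-in).
    out_width = int(width * scale) // multiple * multiple
    out_height = int(height * scale) // multiple * multiple
    return out_width, out_height


print(estimate_output_size(750, 1000, TARGET_PIXELS_720P))  # (816, 1104)
```

The actual dimensions returned by the service may differ slightly, so treat the result as an estimate for layout planning only.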
Billing and rate limits
- For the free quota and unit price, see Model pricing.
- For rate limits, see Wan series.
- Billing details:
- Billing is based on the duration in seconds of the successfully generated video.
- Failed model calls or processing errors do not incur fees or consume the free quota for new users.
API reference
Wan - image-to-video - first frame API reference
FAQ
Why can't I directly set the video aspect ratio (such as 16:9)?
The current API does not support directly specifying the video aspect ratio. You can only set the video's resolution using the resolution parameter.
The resolution parameter controls the total number of pixels in the video, not a fixed ratio. The model prioritizes preserving the original aspect ratio of the initial input image and makes minor adjustments to meet video encoding requirements. Both width and height must be multiples of 16.
SDK error: "url error, please check url!"
Make sure that:
- Your DashScope Python SDK version is 1.25.8 or later.
- Your DashScope Java SDK version is 2.22.6 or later.


