Generate video from text
The Wan text-to-video models accept multimodal input (text and audio) and generate videos up to 15 seconds long at resolutions up to 1080P.
Before calling the API, get an API key and set it as an environment variable. To call the API through the SDK, install the DashScope SDK.
- Core capabilities: Supports integer video durations (2-15 seconds), custom video resolutions (480P, 720P, or 1080P), prompt rewriting, and watermarking.
- Audio capabilities: Supports automatic dubbing or custom audio files for synchronized audio and video. (Supported by wan2.5 and wan2.6)
- Multi-shot narrative: Generates videos with multiple shots while keeping the main subject consistent across shot transitions. (Supported only by wan2.6)
Getting started
| Input prompt | Output video (multi-shot, audio-enabled) |
|---|---|
| A thrilling detective chase story with cinematic storytelling. Shot 1 [0-3 s]: Wide shot of a rainy New York street at night, neon lights flickering, a detective in a black trench coat walking briskly. Shot 2 [3-6 s]: Medium shot of the detective entering an old building, rain soaking his coat, the door closing slowly behind him. Shot 3 [6-9 s]: Close-up of the detective's focused, determined eyes as distant sirens wail and he frowns slightly in thought. Shot 4 [9-12 s]: Medium shot of the detective moving carefully down a dim hallway, his flashlight illuminating the path ahead. Shot 5 [12-15 s]: Close-up of the detective discovering a key clue, his face lighting up with sudden realization. |
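As a minimal sketch, the request for the prompt above can be assembled as plain data for the asynchronous HTTP API. The endpoint URL, the `X-DashScope-Async` header, the `DASHSCOPE_API_KEY` variable name, the `1280*720` size, and the `model`/`input`/`parameters` layout are assumptions based on the general DashScope HTTP convention; confirm them in the Text-to-video API reference. `build_t2v_request` is a hypothetical helper, not part of the DashScope SDK.

```python
import json
import os

# Assumed international endpoint for asynchronous video synthesis tasks.
ENDPOINT = (
    "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/"
    "video-generation/video-synthesis"
)


def build_t2v_request(prompt, model="wan2.6-t2v", size="1280*720",
                      duration=15, **extra_parameters):
    """Hypothetical helper: return (headers, body) for one text-to-video task."""
    headers = {
        # Reads the API key from the (assumed) environment variable name.
        "Authorization": "Bearer " + os.environ.get("DASHSCOPE_API_KEY", ""),
        "Content-Type": "application/json",
        # Assumed header that submits the call as an asynchronous task.
        "X-DashScope-Async": "enable",
    }
    body = {
        "model": model,
        "input": {"prompt": prompt},
        # Generation settings; extra keyword arguments are passed through as-is.
        "parameters": {"size": size, "duration": duration, **extra_parameters},
    }
    return headers, body


headers, body = build_t2v_request(
    # Prompt shortened here; use the full text from the table above.
    "A thrilling detective chase story with cinematic storytelling. "
    "Shot 1 [0-3 s]: Wide shot of a rainy New York street at night ...",
    shot_type="multi",
    prompt_extend=True,
)
print(json.dumps(body, indent=2))
```

To submit the task, POST `body` as JSON with these headers to `ENDPOINT` using any HTTP client. The asynchronous call is expected to return a task ID to poll rather than the video itself.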
Make sure your DashScope Python SDK version is at least 1.25.8 before calling the API. If your version is older, you may see errors such as "url error, please check url!". Install the SDK.

The video_url expires after 24 hours. Download the video promptly.

Availability
Supported models:
| Model | Features | Input modalities | Output video specifications |
|---|---|---|---|
| wan2.6-t2v (recommended) | Video with audio: multi-shot narrative, audio-video sync | Text, audio | Resolution: 720P, 1080P. Duration: any integer from 2 to 15 s. Fixed: 30 fps, MP4 (H.264 encoding) |
| wan2.5-t2v-preview (recommended) | Video with audio: audio-video sync | Text, audio | Resolution: 480P, 720P, 1080P. Duration: 5 s or 10 s. Fixed: 30 fps, MP4 (H.264 encoding) |
| wan2.2-t2v-plus | Video without audio | Text | Resolution: 480P, 1080P. Duration: 5 s. Fixed: 30 fps, MP4 (H.264 encoding) |
| wan2.1-t2v-turbo | Video without audio | Text | Resolution: 480P, 720P. Duration: 5 s. Fixed: 30 fps, MP4 (H.264 encoding) |
| wan2.1-t2v-plus | Video without audio | Text | Resolution: 720P. Duration: 5 s. Fixed: 30 fps, MP4 (H.264 encoding) |
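A client-side check can catch unsupported combinations before a task is submitted. The sketch below transcribes the table above into a dictionary; `MODEL_SPECS` and `check_request` are hypothetical names, and the service itself remains the authority on what it accepts.

```python
# Output specifications transcribed from the availability table.
MODEL_SPECS = {
    "wan2.6-t2v": {
        "resolutions": {"720P", "1080P"},
        "durations": set(range(2, 16)),  # any integer from 2 to 15 s
        "audio": True,
    },
    "wan2.5-t2v-preview": {
        "resolutions": {"480P", "720P", "1080P"},
        "durations": {5, 10},
        "audio": True,
    },
    "wan2.2-t2v-plus": {"resolutions": {"480P", "1080P"}, "durations": {5}, "audio": False},
    "wan2.1-t2v-turbo": {"resolutions": {"480P", "720P"}, "durations": {5}, "audio": False},
    "wan2.1-t2v-plus": {"resolutions": {"720P"}, "durations": {5}, "audio": False},
}


def check_request(model, resolution, duration, with_audio=False):
    """Return a list of problems; an empty list means the table allows it."""
    spec = MODEL_SPECS.get(model)
    if spec is None:
        return [f"unknown model: {model}"]
    problems = []
    if resolution not in spec["resolutions"]:
        problems.append(f"{model} does not support {resolution}")
    if duration not in spec["durations"]:
        problems.append(f"{model} does not support a {duration}s duration")
    if with_audio and not spec["audio"]:
        problems.append(f"{model} generates silent video only")
    return problems


print(check_request("wan2.6-t2v", "1080P", 12))  # []
print(check_request("wan2.1-t2v-plus", "1080P", 5))  # ['wan2.1-t2v-plus does not support 1080P']
```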
Core capabilities
Create multi-shot videos
Supported models: wan2.6 series.
Description: The model automatically switches between shots (for example, from a wide shot to a close-up), which makes it well suited to music videos and similar use cases.
Parameters:
- shot_type: Set to "multi".
- prompt_extend: Set to true. This enables prompt rewriting, which optimizes the shot descriptions.
| Input prompt | Output video (multi-shot video) |
|---|---|
| A vision of harmony between future technology and nature. Shot 1 [0-2 s]: Wide shot of an aerial garden in a futuristic city, floating plants swaying gently in the breeze. Shot 2 [2-4 s]: A robot gardener carefully trims plants with precise, graceful movements. Shot 3 [4-7 s]: Sunlight streams through a transparent dome, illuminating the entire garden and showcasing perfect fusion of technology and nature. Shot 4 [7-10 s]: The camera pulls back to reveal the grand scale of the entire futuristic city, with the aerial garden just one part of it. |
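The example prompt follows a regular `Shot N [start-end s]: description` pattern, so it can be generated from a list of shots. `format_multi_shot_prompt` is a hypothetical helper; this page does not state that the model requires this exact layout. The `parameters` dictionary holds the two values listed under Parameters.

```python
def format_multi_shot_prompt(summary, shots):
    """Join a story summary and (start_s, end_s, description) tuples into one prompt.

    The "Shot N [start-end s]: ..." layout mirrors the example prompt above.
    """
    parts = [summary]
    for index, (start, end, description) in enumerate(shots, start=1):
        parts.append(f"Shot {index} [{start}-{end} s]: {description}")
    return " ".join(parts)


prompt = format_multi_shot_prompt(
    "A vision of harmony between future technology and nature.",
    [
        (0, 2, "Wide shot of an aerial garden in a futuristic city."),
        (2, 4, "A robot gardener carefully trims plants."),
    ],
)
# Multi-shot generation with prompt rewriting enabled.
parameters = {"shot_type": "multi", "prompt_extend": True}
print(prompt)
```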
Make sure your DashScope Python SDK version is at least 1.25.8. Install the SDK.

Synchronize audio and video
Supported models: wan2.5 and wan2.6 series.
Description: Make characters speak or sing, with mouth movements that match the audio. For more examples, see Video audio generation.
Parameters:
- Provide an audio file: Pass an audio_url. The model aligns mouth movements with the audio.
- Automatic dubbing: Do not pass audio_url. Video with audio is generated by default, and the model automatically generates background sound effects, music, or voice based on the scene.
| Input example | Output video (audio-enabled video) |
|---|---|
| Input prompt: Shot from a low angle, in a medium close-up, with warm tones, mixed lighting (the practical light from the desk lamp blends with the overcast light from the window), side lighting, and a central composition. In a classic detective office, wooden bookshelves are filled with old case files and ashtrays. A green desk lamp illuminates a case file spread out in the center of the desk. A fox, wearing a dark brown trench coat and a light gray fedora, sits in a leather chair, its fur crimson, its tail resting lightly on the edge, its fingers slowly turning yellowed pages. Outside, a steady drizzle falls beneath a blue sky, streaking the glass with meandering streaks. It slowly raises its head, its ears twitching slightly, its amber eyes gazing directly at the camera, its mouth clearly moving as it speaks in a smooth, cynical voice: 'The case was cold, colder than a fish in winter. But every chicken has its secrets, and I, for one, intended to find them '. Input audio: |
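The two options differ only in whether audio_url is sent. A minimal sketch, assuming audio_url sits in the request's `input` object next to `prompt` (confirm the exact placement in the API reference); `build_t2v_input` and the example URL are hypothetical.

```python
def build_t2v_input(prompt, audio_url=None):
    """Build the request's "input" object (assumed layout).

    Pass audio_url to align mouth movements with your own audio file; leave it
    out to let the model dub the video automatically.
    """
    payload = {"prompt": prompt}
    if audio_url is not None:
        payload["audio_url"] = audio_url
    return payload


dubbed = build_t2v_input("A fox detective speaks in a smooth, cynical voice.")
custom = build_t2v_input(
    "A fox detective speaks in a smooth, cynical voice.",
    audio_url="https://example.com/fox-monologue.mp3",  # hypothetical URL
)
print(dubbed)
print(custom)
```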
Make sure your DashScope Python SDK version is at least 1.25.8. Install the SDK.

Generate silent videos
Supported models: wan2.2 series, wan2.1 series.
Description: Ideal for visual-only use cases like animated posters or silent short videos.
Parameters: Silent video is the default output for wan2.2 and earlier versions. No extra configuration is needed.
| Input prompt | Output video (silent video) |
|---|---|
| Low contrast. A street musician performs in a retro 1970s-style subway station, bathed in dim colors and rough textures. He wears a vintage jacket and plays guitar with intense focus. Commuters rush past. A small crowd gradually gathers to listen. The camera pans slowly right, capturing the interplay of instrument sounds and city noise, with vintage subway signs and peeling walls in the background. |
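Because silent output is the default for these models, the request carries no audio fields. A minimal sketch, assuming the same `model`/`input`/`parameters` layout as the other examples; the `832*480` size string is an assumed 480P value, and duration is omitted on the assumption that 5 s, the model's only option, is the default.

```python
import json

# Minimal body for a silent model (assumed layout): no audio fields at all.
silent_request = {
    "model": "wan2.2-t2v-plus",
    "input": {
        "prompt": (
            "Low contrast. A street musician performs in a retro 1970s-style "
            "subway station, bathed in dim colors and rough textures."
        ),
    },
    # Assumed 480P size string; check the API reference for accepted values.
    "parameters": {"size": "832*480"},
}
print(json.dumps(silent_request, indent=2))
```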
Make sure your DashScope Python SDK version is at least 1.25.8. For update instructions, see Installing the SDK.

Input audio
- Number of files: One.
- Input method: Public URL (HTTP or HTTPS).
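A quick local check can reject audio references that are not HTTP or HTTPS URLs before a task is submitted. This sketch uses only the standard library; `is_supported_audio_url` is a hypothetical name, and it cannot confirm that the URL is actually reachable from the public internet.

```python
from urllib.parse import urlparse


def is_supported_audio_url(url):
    """Accept only http/https URLs with a host; reachability is not checked."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_supported_audio_url("https://example.com/voice.mp3"))
print(is_supported_audio_url("file:///tmp/voice.mp3"))
```

Local paths such as `file:///tmp/voice.mp3` fail the check, so upload the file somewhere publicly accessible first.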
Output video
- Number of files: One.
- Format: MP4. See Video specifications for details.
- URL expiration: 24 hours.
- Dimensions: Determined by the size parameter. For example, when size is set to 1280*720, the output video has a 16:9 aspect ratio.
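Generation runs as a task, and the resulting URL is valid for only 24 hours, so a client typically polls until the task finishes and downloads the file right away. The sketch below assumes a task-query response shaped like `{"output": {"task_status": ..., "video_url": ...}}` with statuses such as `PENDING`, `RUNNING`, and `SUCCEEDED`; confirm the shape in the API reference. `fetch_task` is a caller-supplied function (for example, an HTTP GET on the task-query endpoint), which keeps the loop testable without network access.

```python
import time

# Assumed terminal task states; SUCCEEDED is handled separately below.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def wait_for_video(fetch_task, task_id, interval_s=15.0, timeout_s=900.0,
                   sleep=time.sleep):
    """Poll until the task ends and return its video URL.

    fetch_task(task_id) must return the parsed JSON of a task query.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        output = fetch_task(task_id)["output"]
        status = output["task_status"]
        if status == "SUCCEEDED":
            # The URL expires after 24 hours; download the file promptly.
            return output["video_url"]
        if status in TERMINAL_STATES:
            raise RuntimeError(f"task {task_id} ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` lets tests skip the real wait; in production the default `time.sleep` applies.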
Billing and rate limits
- For free quota and pricing details, see Model invocation pricing.
- For model rate limits, see Rate limits.
- Billing details:
- Input is free. Output is billed per successfully generated second of video.
- Failed model calls or processing errors incur no charge and do not consume your free quota.
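The billing rule reduces to a single multiplication. A minimal sketch that ignores the free quota; the unit price is a placeholder argument, not a real rate, so take actual prices from Model invocation pricing. `estimate_charge` is a hypothetical name.

```python
def estimate_charge(generated_seconds, price_per_second, succeeded=True):
    """Estimate one call's charge: output seconds x unit price; input is free."""
    if not succeeded:
        # Failed calls and processing errors are not charged.
        return 0.0
    return generated_seconds * price_per_second


# 0.10 is a placeholder unit price, not a real rate.
print(estimate_charge(10, 0.10))
print(estimate_charge(10, 0.10, succeeded=False))
```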
API reference
Text-to-video API reference
FAQ
How do I set the video aspect ratio (for example, 16:9)?
Use the size parameter to specify the video resolution. The system calculates the aspect ratio automatically from that resolution.
For example, setting size=1280*720 outputs a 16:9 video. Each size maps to a fixed aspect ratio. Choose the resolution that matches your target ratio.
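The ratio follows from reducing width and height by their greatest common divisor. This standard-library sketch (`aspect_ratio` is a hypothetical name) assumes the `width*height` string format shown above.

```python
from math import gcd


def aspect_ratio(size):
    """Reduce a "width*height" size string, such as "1280*720", to a ratio."""
    width, height = (int(part) for part in size.split("*"))
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


print(aspect_ratio("1280*720"))  # 16:9
print(aspect_ratio("720*1280"))  # 9:16
```

Some sizes only approximate a common ratio: 832*480, for example, reduces to 26:15, which is close to but not exactly 16:9.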
SDK error: "url error, please check url!"
Make sure:
- Your DashScope Python SDK version is at least 1.25.8.
- Your DashScope Java SDK version is at least 2.22.6.
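Comparing version strings as plain text gets cases like 1.25.10 versus 1.25.8 wrong, so compare numeric fields instead. This sketch reads the installed version of the `dashscope` package with the standard `importlib.metadata.version`; `meets_minimum` and `dashscope_sdk_ok` are hypothetical helpers, and versions with non-numeric suffixes (such as pre-release tags) would raise `ValueError`.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_PYTHON_SDK = "1.25.8"


def meets_minimum(installed, minimum):
    """Compare plain numeric versions field by field, so 1.25.10 > 1.25.8."""
    def as_tuple(text):
        return tuple(int(part) for part in text.split("."))
    return as_tuple(installed) >= as_tuple(minimum)


def dashscope_sdk_ok():
    """Check the installed dashscope package; False if it is not installed."""
    try:
        return meets_minimum(version("dashscope"), MIN_PYTHON_SDK)
    except PackageNotFoundError:
        return False
```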
Why does the call fail with "Model not exist"?
Check these items:
- Is the model name spelled correctly?