General video editing

Repaint, extend, and edit

Availability

Supported models:
Model: wan2.1-vace-plus
  • Features: Video without audio. Multi-image reference, video repainting, local editing, video extension, frame expansion.
  • Input modalities: Text, image, video.
  • Output video specifications: Resolution options: 720P. Video duration: Up to 5s. Defined specifications: 30 fps, MP4 (H.264 encoding).

Core capabilities

Multi-image reference

Description: Supports up to 3 reference images, including subjects and backgrounds (people, animals, clothing, scenes). The model merges the images to generate coherent video content.
Parameter settings:
  • function: Must be image_reference.
  • ref_images_url: An array of URLs. Supports 1 to 3 reference images.
  • obj_or_bg: Identifies each image as a subject (obj) or background (bg). The length of this array must be the same as the length of the ref_images_url array.
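The constraints above (1 to 3 images, one obj or bg label per image) can be checked before a request is sent. The following is a minimal Python sketch, not an official SDK call: build_image_reference_request is a hypothetical helper, and the placement of obj_or_bg and size under "parameters" follows the sample curl request for this feature.

```python
import json


def build_image_reference_request(prompt, ref_images, size="1280*720", prompt_extend=True):
    """Build the JSON body for an image_reference task.

    ref_images is a list of (url, role) pairs, where role is "obj" or "bg".
    This helper is illustrative; it is not part of any official SDK.
    """
    if not 1 <= len(ref_images) <= 3:
        raise ValueError("ref_images_url supports 1 to 3 reference images")
    for url, role in ref_images:
        if role not in ("obj", "bg"):
            raise ValueError(f"role must be 'obj' or 'bg', got {role!r} for {url}")
    return {
        "model": "wan2.1-vace-plus",
        "input": {
            "function": "image_reference",
            "prompt": prompt,
            "ref_images_url": [url for url, _ in ref_images],
        },
        "parameters": {
            "prompt_extend": prompt_extend,
            # Built from the same pairs, so its length always matches ref_images_url.
            "obj_or_bg": [role for _, role in ref_images],
            "size": size,
        },
    }


if __name__ == "__main__":
    body = build_image_reference_request(
        "A girl walks out of a misty, ancient forest.",
        [
            ("http://wanx.alicdn.com/material/20250318/image_reference_2_5_16.png", "obj"),
            ("http://wanx.alicdn.com/material/20250318/image_reference_1_5_16.png", "bg"),
        ],
    )
    print(json.dumps(body, indent=2))
```

Pairing each URL with its role in one list makes a length mismatch between ref_images_url and obj_or_bg impossible by construction.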
Example:
  • Input prompt: In the video, a girl walks out from the depths of an ancient, misty forest. Her steps are light, and the camera captures her every graceful moment. When she stops and looks around at the lush trees, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of intertwined light and shadow, records her wonderful encounter with nature.
  • Input reference image 1 (reference subject): [image]
  • Input reference image 2 (reference background): [image]
  • Output video: [video]
Before calling the API, get an API key. Then set your API key as an environment variable.
Step 1: Create a task to get the task ID
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "wan2.1-vace-plus",
  "input": {
    "function": "image_reference",
    "prompt": "In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When she stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records her wonderful encounter with nature.",
    "ref_images_url": [
      "http://wanx.alicdn.com/material/20250318/image_reference_2_5_16.png",
      "http://wanx.alicdn.com/material/20250318/image_reference_1_5_16.png"
    ]
  },
  "parameters": {
    "prompt_extend": true,
    "obj_or_bg": ["obj","bg"],
    "size": "1280*720"
  }
}'
Step 2: Get the result using the task ID
Replace {task_id} with the task_id value returned by the previous API call.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
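Because the task is created asynchronously, Step 2 usually has to be repeated until the task finishes. The sketch below shows one way to do that in Python using only the standard library. Several details are assumptions that this page does not state: that the response JSON carries the status at output.task_status, that in-progress tasks report PENDING or RUNNING, and the polling interval and limit. The function names fetch_task and wait_for_task are hypothetical.

```python
import json
import time
import urllib.request

TASK_URL = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}"


def fetch_task(task_id, api_key):
    """GET the task record, mirroring the Step 2 curl command."""
    request = urllib.request.Request(
        TASK_URL.format(task_id=task_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(task_id, fetch, interval=15.0, max_polls=40, sleep=time.sleep):
    """Call fetch(task_id) until the task is no longer in progress.

    Assumption: the status is at output.task_status, and PENDING or RUNNING
    mean the task is still in progress. Any other status ends the loop.
    """
    for _ in range(max_polls):
        result = fetch(task_id)
        status = result.get("output", {}).get("task_status")
        if status not in ("PENDING", "RUNNING"):
            return result
        sleep(interval)
    raise TimeoutError(f"task {task_id} still in progress after {max_polls} polls")
```

A call might look like wait_for_task(task_id, lambda tid: fetch_task(tid, os.environ["DASHSCOPE_API_KEY"])). Passing the fetch function in as an argument keeps the loop testable without network access.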

Video repainting

Description: Extracts the subject's pose and motion, composition and motion contours, or sketch structure from an input video, then combines these features with a text prompt to generate a new video with the same dynamic features. You can also replace the subject with a reference image.
Parameter settings:
  • function: Must be video_repainting.
  • video_url: Required. The URL of the input video. Must be MP4 format, no larger than 50 MB, and no longer than 5 seconds.
  • control_condition: Optional. Video feature extraction method. This determines which features from the original video are retained:
    • posebodyface: Extracts facial expressions and body movements. Retains facial expression details.
    • posebody: Extracts only body movements, without the face. Controls only body motion.
    • depth: Extracts composition and motion contours. Retains the scene structure.
    • scribble: Extracts the sketch structure. Retains sketch edge details.
  • strength: Optional. Controls feature extraction strength. Range: 0.0 to 1.0. Default: 1.0. Higher values make the output more similar to the original; lower values allow more creative freedom.
  • ref_images_url: Optional. URL of a reference image to replace the subject in the input video.
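The optional parameters above have a small set of valid values, which can be checked on the client side. This is a minimal Python sketch with a hypothetical helper, build_video_repainting_request. Two placements are assumptions: strength is put under "parameters" next to control_condition, and ref_images_url is sent as a one-element array under "input", as in the multi-image reference request.

```python
# The four extraction methods listed above and what each one retains.
CONTROL_CONDITIONS = {
    "posebodyface": "facial expressions and body movements",
    "posebody": "body movements only",
    "depth": "composition and motion contours",
    "scribble": "sketch structure",
}


def build_video_repainting_request(prompt, video_url, control_condition=None,
                                   strength=None, ref_image_url=None,
                                   prompt_extend=False):
    """Build the JSON body for a video_repainting task (illustrative helper)."""
    input_block = {
        "function": "video_repainting",
        "prompt": prompt,
        "video_url": video_url,
    }
    parameters = {"prompt_extend": prompt_extend}
    if control_condition is not None:
        if control_condition not in CONTROL_CONDITIONS:
            raise ValueError(f"control_condition must be one of {sorted(CONTROL_CONDITIONS)}")
        parameters["control_condition"] = control_condition
    if strength is not None:
        if not 0.0 <= strength <= 1.0:
            raise ValueError("strength must be between 0.0 and 1.0")
        # Assumption: strength belongs under "parameters".
        parameters["strength"] = strength
    if ref_image_url is not None:
        # Assumption: the array form used by multi-image reference applies here too.
        input_block["ref_images_url"] = [ref_image_url]
    return {"model": "wan2.1-vace-plus", "input": input_block, "parameters": parameters}
```

Optional fields that are left as None are omitted from the body, so the service defaults described above still apply.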
Example:
  • Input prompt: The video shows a black steampunk-style car driven by a gentleman, adorned with gears and copper pipes. The background is a steam-powered candy factory with retro elements, creating a vintage and playful scene.
  • Input video: [video]
  • Output video: [video]
Step 1: Create a task to get the task ID
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "wan2.1-vace-plus",
  "input": {
    "function": "video_repainting",
    "prompt": "The video shows a black steampunk-style car driven by a gentleman. The car is decorated with gears and copper pipes. The background features a steam-powered candy factory and retro elements, creating a vintage and playful scene.",
    "video_url": "http://wanx.alicdn.com/material/20250318/video_repainting_1.mp4"
  },
  "parameters": {
    "prompt_extend": false,
    "control_condition": "depth"
  }
}'
Step 2: Get the result using the task ID
Replace {task_id} with the task_id value returned by the previous API call.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Local editing

Description: Performs fine-grained editing on specified video areas. Supports adding, deleting, and modifying elements, or replacing subjects and backgrounds. Upload a mask image to specify the editing area; the model automatically tracks the target and blends the generated content.
Parameter settings:
  • function: Must be video_edit.
  • video_url: Required. The URL of the original input video.
  • mask_image_url: Optional. The URL of a mask image. White areas of the mask are edited; black areas remain unchanged. Specify either this parameter or mask_video_url; this parameter is recommended.
  • mask_frame_id: Optional. Use with mask_image_url to specify which video frame the mask corresponds to. Default: first frame.
  • mask_type: Optional. Specifies the behavior of the editing area:
    • tracking (default): The editing area automatically follows the target's motion trajectory.
    • fixed: The editing area stays in a fixed position.
  • expand_ratio: Optional. The ratio by which the mask area expands outward. Only effective when mask_type is tracking. Range: 0.0 to 1.0. Default: 0.05. Lower values fit the target more closely; higher values expand the mask area.
  • size: Optional. Output resolution as width*height (e.g., 1280*720).
  • ref_images_url: Optional. URL of a reference image. Content in the editing area is replaced with the reference image content.
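The mask parameters interact: a mask source must be chosen, mask_frame_id only applies to a mask image, and expand_ratio only takes effect with tracking. The Python sketch below encodes those rules in a hypothetical helper, build_video_edit_request. It reads "specify either" as exactly one mask source, and it places mask_video_url under "input" beside mask_image_url; both are assumptions rather than documented behavior.

```python
def build_video_edit_request(prompt, video_url, mask_image_url=None,
                             mask_video_url=None, mask_frame_id=None,
                             mask_type="tracking", expand_ratio=None,
                             size=None, prompt_extend=False):
    """Build the JSON body for a video_edit task (illustrative helper)."""
    # Assumption: exactly one of the two mask sources should be given.
    if (mask_image_url is None) == (mask_video_url is None):
        raise ValueError("specify exactly one of mask_image_url or mask_video_url")
    if mask_type not in ("tracking", "fixed"):
        raise ValueError("mask_type must be 'tracking' or 'fixed'")

    input_block = {"function": "video_edit", "prompt": prompt, "video_url": video_url}
    if mask_image_url is not None:
        input_block["mask_image_url"] = mask_image_url
        # mask_frame_id is only sent together with a mask image.
        if mask_frame_id is not None:
            input_block["mask_frame_id"] = mask_frame_id
    else:
        # Assumption: mask_video_url sits under "input" like mask_image_url.
        input_block["mask_video_url"] = mask_video_url

    parameters = {"prompt_extend": prompt_extend, "mask_type": mask_type}
    # expand_ratio is only effective for tracking masks, so it is dropped otherwise.
    if expand_ratio is not None and mask_type == "tracking":
        if not 0.0 <= expand_ratio <= 1.0:
            raise ValueError("expand_ratio must be between 0.0 and 1.0")
        parameters["expand_ratio"] = expand_ratio
    if size is not None:
        parameters["size"] = size
    return {"model": "wan2.1-vace-plus", "input": input_block, "parameters": parameters}
```

Dropping expand_ratio for fixed masks keeps the request free of a value that would have no effect.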
Example:
  • Input prompt: The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft tones and warm lighting illuminating the area where the lion is.
  • Input video: [video]
  • Input mask image: [image] The white area indicates the editing area.
  • Output video: [video]
Step 1: Create a task to get the task ID
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "wan2.1-vace-plus",
  "input": {
    "function": "video_edit",
    "prompt": "The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the area where the lion is.",
    "mask_image_url": "http://wanx.alicdn.com/material/20250318/video_edit_1_mask.png",
    "video_url": "http://wanx.alicdn.com/material/20250318/video_edit_2.mp4",
    "mask_frame_id": 1
  },
  "parameters": {
    "prompt_extend": false,
    "mask_type": "tracking",
    "expand_ratio": 0.05,
    "size": "1280*720"
  }
}'
Step 2: Get the result using the task ID
Replace {task_id} with the task_id value returned by the previous API call.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Video extension

Description: Predicts and generates continuous content based on an input image or video clip. Supports extending a video forward from the first frame or clip, or backward from the last frame or clip. The generated video is 5 seconds long.
Parameter settings:
  • function: Must be video_extension.
  • prompt: Required. A description of the desired extended content.
  • first_clip_url: Optional. The URL of the first video clip (3 seconds or shorter). The model generates the rest of the video based on this clip.
  • last_clip_url: Optional. The URL of the last video clip (3 seconds or shorter). The model generates the preceding content based on this clip.
  • first_frame_url: Optional. The URL of the first frame image. The video extends forward from this frame.
  • last_frame_url: Optional. The URL of the last frame image. Generation proceeds backward from this frame.
Specify at least one of the following: first_clip_url, last_clip_url, first_frame_url, or last_frame_url.
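The "at least one of" rule above is easy to miss when requests are assembled by hand. A minimal Python sketch of a client-side check follows; build_video_extension_request is a hypothetical helper, and the placement of the four URL fields under "input" follows the sample curl request for first_clip_url and is assumed for the other three.

```python
ANCHOR_FIELDS = ("first_clip_url", "last_clip_url", "first_frame_url", "last_frame_url")


def build_video_extension_request(prompt, prompt_extend=False, **anchors):
    """Build the JSON body for a video_extension task (illustrative helper).

    anchors may contain any of the four ANCHOR_FIELDS as keyword arguments.
    """
    if not prompt:
        raise ValueError("prompt is required for video_extension")
    unknown = set(anchors) - set(ANCHOR_FIELDS)
    if unknown:
        raise TypeError(f"unexpected fields: {sorted(unknown)}")
    # Keep only the anchors that were actually given a value.
    provided = {name: url for name, url in anchors.items() if url}
    if not provided:
        raise ValueError("specify at least one of: " + ", ".join(ANCHOR_FIELDS))
    return {
        "model": "wan2.1-vace-plus",
        "input": {"function": "video_extension", "prompt": prompt, **provided},
        "parameters": {"prompt_extend": prompt_extend},
    }


if __name__ == "__main__":
    body = build_video_extension_request(
        "A dog wearing sunglasses is skateboarding on the street, 3D cartoon.",
        first_clip_url="http://wanx.alicdn.com/material/20250318/video_extension_1.mp4",
    )
    print(sorted(body["input"]))
```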
Example:
  • Input prompt: A dog wearing sunglasses is skateboarding on the street, 3D cartoon.
  • Input first clip video (1 second): [video]
  • Output video (extended video is 5 seconds): [video]
Step 1: Create a task to get the task ID
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "wan2.1-vace-plus",
  "input": {
    "function": "video_extension",
    "prompt": "A dog wearing sunglasses is skateboarding on the street, 3D cartoon.",
    "first_clip_url": "http://wanx.alicdn.com/material/20250318/video_extension_1.mp4"
  },
  "parameters": {
    "prompt_extend": false
  }
}'
Step 2: Get the result using the task ID
Replace {task_id} with the task_id value returned by the previous API call.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Frame expansion

Description: Expands video frame content proportionally in all directions (top, bottom, left, right) based on a prompt. Maintains video subject continuity and ensures a natural blend with the background.
Parameter settings:
  • function: Must be video_outpainting.
  • video_url: Required. The URL of the original input video.
  • top_scale: Optional. Upward expansion ratio. Range: 1.0 to 2.0. Default: 1.0 (no expansion).
  • bottom_scale: Optional. Downward expansion ratio. Range: 1.0 to 2.0. Default: 1.0.
  • left_scale: Optional. Leftward expansion ratio. Range: 1.0 to 2.0. Default: 1.0.
  • right_scale: Optional. Rightward expansion ratio. Range: 1.0 to 2.0. Default: 1.0.
Example: Setting left_scale to 1.5 expands the left side of the frame to 1.5 times its original width.
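The four scale parameters share one range, so a single check covers them. The Python sketch below uses a hypothetical helper, build_video_outpainting_request, and places the scales under "parameters" as in the sample curl request for this feature. It only validates and assembles the request; it does not try to predict the output dimensions.

```python
def build_video_outpainting_request(prompt, video_url, top_scale=1.0,
                                    bottom_scale=1.0, left_scale=1.0,
                                    right_scale=1.0, prompt_extend=False):
    """Build the JSON body for a video_outpainting task (illustrative helper)."""
    scales = {
        "top_scale": top_scale,
        "bottom_scale": bottom_scale,
        "left_scale": left_scale,
        "right_scale": right_scale,
    }
    for name, value in scales.items():
        if not 1.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between 1.0 and 2.0, got {value}")
    return {
        "model": "wan2.1-vace-plus",
        "input": {
            "function": "video_outpainting",
            "prompt": prompt,
            "video_url": video_url,
        },
        # A scale of 1.0 means no expansion in that direction.
        "parameters": {"prompt_extend": prompt_extend, **scales},
    }


if __name__ == "__main__":
    body = build_video_outpainting_request(
        "An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.",
        "http://wanx.alicdn.com/material/20250318/video_outpainting_1.mp4",
        left_scale=1.5,
        right_scale=1.5,
    )
    print(body["parameters"])
```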
Example:
  • Input prompt: An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.
  • Input video: [video]
  • Output video: [video]
Step 1: Create a task to get the task ID
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "wan2.1-vace-plus",
  "input": {
    "function": "video_outpainting",
    "prompt": "An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.",
    "video_url": "http://wanx.alicdn.com/material/20250318/video_outpainting_1.mp4"
  },
  "parameters": {
    "prompt_extend": false,
    "top_scale": 1.5,
    "bottom_scale": 1.5,
    "left_scale": 1.5,
    "right_scale": 1.5
  }
}'
Step 2: Get the result using the task ID
Replace {task_id} with the task_id value returned by the previous API call.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Input images and videos

Input images

  • Number of images: See the number required for your selected feature above.
  • Input method:
    • Public URL: Supports HTTP and HTTPS protocols. Example: https://xxxx/xxx.png.

Input videos

  • Number of videos: See the number required for your selected feature above.
  • Input method:
    • Public URL: Supports HTTP and HTTPS protocols. Example: https://xxxx/xxx.mp4.

Output video

  • Number of videos: One.
  • Format: MP4. See video specifications below for resolution and dimensions.
  • URL expiration: 24 hours.
  • Dimensions: Varies based on the selected feature.
    • Multi-image reference / Local editing:
      • Output resolution is fixed at 720P.
      • Specific width and height are determined by the size parameter.
    • Video repainting / Video extension / Frame expansion:
      • If the input video resolution is 720P or lower, the output resolution matches the input.
      • If the input video resolution is higher than 720P, the output is scaled down to 720P while maintaining aspect ratio.

Billing and rate limits

  • For free quota and pricing, see Model invocation pricing.
  • For rate limits, see Rate limits.
  • Billing details:
    • Input is free. Output is billed per successfully generated second of video.
    • Failed model calls or processing errors incur no charge and do not consume your free quota.

API reference

General video editing API reference

FAQ

What is the maximum number of images for multi-image reference?

A maximum of 3 reference images is supported. If you provide more than 3, only the first 3 are used. For best results, use a subject image with a solid-color background so that the subject stands out, and make sure the background image does not contain any subjects.

When should I disable prompt rewriting for video repainting?

If the text description is inconsistent with the input video content, the model may misinterpret your request. In this case, we recommend manually disabling prompt rewriting by setting prompt_extend=false and providing a clear, specific scene description in the prompt. This improves consistency and accuracy.

Mask image vs mask video in local editing

Specify either a mask image using mask_image_url or a mask video using mask_video_url. We recommend using a mask image because you only need to specify the editing area in a single frame, and the system automatically tracks the target.