Model releases - Qwen Cloud

July 1, 2026

wan2.7-t2v-2026-06-12

A snapshot version of the Wan 2.7 text-to-video model. The model capabilities are the same as wan2.7-t2v. API reference

wan2.7-r2v-2026-06-12

A snapshot version of the Wan 2.7 reference-to-video model. It supports subject referencing and voice customization, and can generate a scripted video directly from a single, multi-panel storyboard image. API reference
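
Many identifiers on this page end in a date, for example wan2.7-t2v-2026-06-12. The sketch below separates that trailing date from the rest of the name so a client can log or pin the snapshot it is using. The helper name `split_model_id` and the reading of the suffix as `-YYYY-MM-DD` are assumptions drawn from the names listed here, not from an official naming specification. The prefix is not always a published alias, so treat it as a hint only.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class ModelId(NamedTuple):
    prefix: str                # name with the date suffix removed
    snapshot: Optional[date]   # None when the ID carries no valid date suffix


# Assumed pattern: anything, then a trailing -YYYY-MM-DD.
_SNAPSHOT_RE = re.compile(r"^(?P<prefix>.+)-(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})$")


def split_model_id(model_id: str) -> ModelId:
    """Split a model ID into its name prefix and optional snapshot date."""
    match = _SNAPSHOT_RE.match(model_id)
    if match is None:
        return ModelId(model_id, None)
    try:
        snapshot = date(int(match["y"]), int(match["m"]), int(match["d"]))
    except ValueError:
        # Digits that do not form a real calendar date: treat as a plain name.
        return ModelId(model_id, None)
    return ModelId(match["prefix"], snapshot)


if __name__ == "__main__":
    for name in ("wan2.7-t2v-2026-06-12", "qwen-image-2.0-pro-2026-06-22", "glm-5.1"):
        print(split_model_id(name))
```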

June 26, 2026

kimi-k2.7-code

The Kimi K2.7 Code model is now available in the Singapore region. It is an agent-centric coding model optimized for long-horizon software engineering tasks. Supports thinking mode only.

June 25, 2026

qwen-image-2.0-pro-2026-06-22

The latest snapshot in the Qwen-Image-2.0 series, unifying image generation and editing. Compared with the April 22 snapshot, it offers enhanced text rendering with support for instructions of up to 1K tokens, more refined photorealistic quality and scene detail, and stronger semantic adherence.

June 22, 2026

HappyHorse 1.1

happyhorse-1.1-t2v

HappyHorse 1.1 text-to-video model. Generates videos with audio, 3 to 15 seconds long, at 720P or 1080P resolution. API reference

happyhorse-1.1-i2v

HappyHorse 1.1 image-to-video model. Generates videos with audio, 3 to 15 seconds long, at 720P or 1080P resolution. API reference

happyhorse-1.1-r2v

HappyHorse 1.1 reference-to-video model. Accepts multiple reference images as input and generates videos with audio, 3 to 15 seconds long, at 720P or 1080P resolution. API reference
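
The duration and resolution limits stated for the HappyHorse 1.1 models can be checked on the client before a job is submitted. The following is a minimal sketch: the function name `check_video_request` and its parameters are hypothetical and do not correspond to fields in the actual API, and the 3 to 15 second range is assumed to be inclusive.

```python
from typing import List

# Limits quoted in the HappyHorse 1.1 entries above.
MIN_DURATION_S = 3
MAX_DURATION_S = 15
ALLOWED_RESOLUTIONS = {"720P", "1080P"}


def check_video_request(duration_s: float, resolution: str) -> List[str]:
    """Return human-readable problems; an empty list means the request looks valid."""
    problems: List[str] = []
    # Assumption: both ends of the duration range are allowed.
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration {duration_s}s is outside {MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    # Compare case-insensitively so "720p" and "720P" are treated alike.
    if resolution.upper() not in ALLOWED_RESOLUTIONS:
        problems.append(
            f"resolution {resolution!r} is not one of {sorted(ALLOWED_RESOLUTIONS)}"
        )
    return problems


if __name__ == "__main__":
    print(check_video_request(5, "720P"))
    print(check_video_request(20, "480P"))
```

Returning a list of problems instead of raising on the first one lets a caller report every invalid field at once.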

June 10, 2026

qwen3.7-max-2026-06-08

The largest and most capable model in the Qwen3.7 series. Compared with the May 20 snapshot, this snapshot adds visual understanding, so the model can perceive real-world scenes and supports multimodal interactive hybrid agent capabilities.

June 1, 2026

qwen3.7-plus, qwen3.7-plus-2026-05-26

The Qwen3.7 Plus series builds on strong text capabilities with comprehensively upgraded vision-language abilities while retaining full agentic capabilities in coding, tool use, and productivity workflows. Its core differentiator is multimodal interactive hybrid agent capabilities: perceiving real-world scenarios, reading screens and operating GUIs, generating code from visual references, and navigating mobile applications end to end.

May 27, 2026

glm-5.1

The Zhipu GLM-5.1 model is designed for long-horizon tasks and supports a 200K context window with a maximum output of 128K tokens. With strong logical reasoning, long-text comprehension, and code generation capabilities, it delivers excellent results across multiple benchmarks and is well suited for intelligent interaction, enterprise applications, and development assistance.
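
The two limits quoted for glm-5.1 interact when prompts are long. The sketch below estimates how many output tokens a request can still ask for. It rests on two assumptions that this page does not confirm: "K" is read as 1,000 tokens, and prompt and output tokens share the same 200K window. The function name `max_output_for` is illustrative only.

```python
# Limits quoted in the glm-5.1 entry; "K" is read as 1,000 tokens (assumption).
CONTEXT_WINDOW = 200_000
MAX_OUTPUT = 128_000


def max_output_for(prompt_tokens: int) -> int:
    """Estimate the largest output length that still fits in the context window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Assumption: prompt and output tokens count against the same window.
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT, remaining))


if __name__ == "__main__":
    for prompt in (50_000, 150_000, 250_000):
        print(prompt, max_output_for(prompt))
```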

May 25, 2026

qwen3.7-max-preview, qwen3.7-max-2026-05-17

Qwen Max series model snapshots. Text-only input, thinking mode enabled by default.

May 22, 2026

qwen3.5-livetranslate-flash-realtime, qwen3.5-livetranslate-flash-realtime-2026-05-19

High-precision, real-time multilingual audio and video translation model built on Qwen3.5-Omni. Supports 60 languages (29 with audio + text output, 31 with text-only output), voice cloning for speaker-preserving translation, and visual context from video streams to improve accuracy. Upgrades from qwen3-livetranslate-flash-realtime with broader language coverage and lower latency. User guide | API reference

May 21, 2026

qwen3.7-max, qwen3.7-max-2026-05-20

Next-generation flagship in the Qwen Max series. Text-only input, thinking mode enabled by default, supports explicit cache. Excels at coding, office & productivity, and long-horizon autonomous execution.

May 11, 2026

deepseek-v4-pro, deepseek-v4-flash

DeepSeek V4 series. deepseek-v4-pro is a large-scale MoE model with strong general reasoning. deepseek-v4-flash is a lightweight, cost-effective model optimized for speed. Both support function calling, context cache, and thinking mode (enabled by default), and both offer a 1M context window with a shared 384K output budget.

April 27, 2026

HappyHorse video series

happyhorse-1.0-t2v

HappyHorse text-to-video model. Generates videos with audio, 3 to 15 seconds long, at 720P or 1080P resolution. API reference

happyhorse-1.0-i2v

HappyHorse image-to-video model. Generates videos with audio, 3 to 15 seconds long, at 720P or 1080P resolution. API reference

happyhorse-1.0-r2v

HappyHorse reference-to-video model. Accepts multiple reference images as input and generates videos with audio, 3 to 15 seconds long, at 720P or 1080P resolution. API reference

happyhorse-1.0-video-edit

HappyHorse video editing model. Supports video editing and processing. API reference

April 26, 2026

wan2.7-t2v-2026-04-25

A snapshot version of the Wan 2.7 text-to-video model. The model capabilities are the same as wan2.7-t2v. API reference

wan2.7-i2v-2026-04-25

A snapshot version of the Wan 2.7 image-to-video model. The model capabilities are the same as wan2.7-i2v. API reference

April 23, 2026

qwen3.5-plus-2026-04-20

New Qwen3.5-Plus snapshot with significantly improved agentic coding and faster inference compared to the February 15 snapshot. Knowledge, reasoning, and long-context capabilities remain strong — ideal for coding agents, production workflows, and high-throughput scenarios.

qwen3.6-27b

Qwen3.6 27B dense open-source vision-language model. Enhanced agentic coding and STEM reasoning over Qwen3.5-27B. Significant improvements in spatial intelligence, object localization and detection. Steady gains in video understanding, document OCR, and vision agent capabilities. Supports function calling and structured output but not built-in tools.

qwen-image-2.0-pro-2026-04-22

New Qwen-Image-2.0-Pro snapshot with unified image generation and editing. Compared to the March 3 snapshot, this model delivers noticeable improvements in visual quality — especially texture detail, lighting, and materials. Supports multilingual in-image text rendering and more balanced artistic style expression.

April 20, 2026

qwen3.6-max-preview

The largest closed-source model in the Qwen3.6 series, with improved coding and more efficient agent execution. Supports text-only input, thinking mode (enabled by default), explicit cache, and function calling.

April 16, 2026

qwen3.6-flash, qwen3.6-flash-2026-04-16, qwen3.6-35b-a3b

Qwen3.6 native vision-language Flash series with significant overall improvements over Qwen3.5-Flash. Enhanced agent programming capabilities with major benchmark gains over the previous generation, stronger math and code reasoning, and improved spatial intelligence — particularly object localization and detection.

fun-asr, fun-asr-2025-11-07

Fun-ASR real-time speech recognition upgrades: dialect support covering seven major Chinese dialect groups and 20+ regional accents, improved classical poetry recognition, enhanced punctuation prediction and text normalization (numbers, dates, monetary values), and multilingual support for 30 languages.

April 3, 2026

Wan 2.7 video series

wan2.7-videoedit

Instruction-based video editing and migration. Supports content replacement with reference images and replicating actions, effects, and camera movements.

wan2.7-i2v

Multimodal input (text, image, audio, video) for first-frame, start-and-end-frame, and video continuation tasks.

wan2.7-t2v

New resolution options and custom aspect ratios for different scenarios and platforms.

wan2.7-r2v

Subject referencing, voice customization, and scripted video generation from a single storyboard image.

April 2, 2026

qwen3.6-plus, qwen3.6-plus-2026-04-02

Upgraded coding (agentic coding, frontend, vibe coding), multimodal (object recognition, OCR, localization), and general reasoning. Fixes known issues from Qwen3.5-Plus.

April 1, 2026

wan2.7-image-pro, wan2.7-image

Text-to-image, image-to-image, image editing, multi-image reference, and interactive editing. Pro series supports 4K output.

March 31, 2026

General availability

Qwen Cloud is now generally available.