Generate vectors from text, images, and video for cross-modal search and retrieval
Multimodal embedding models convert text, images, and video into numerical vectors. These vectors enable cross-modal search (text-to-image, image-to-image, text-to-video), image classification, video classification, and content retrieval.
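As a rough illustration of how cross-modal search uses these vectors, the sketch below ranks image vectors against a text query vector by cosine similarity. The three-dimensional vectors and file names are made-up placeholders, not model output, and cosine similarity is assumed here as the comparison metric.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates):
    """Score every candidate vector against the query, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors for illustration only; real embeddings have 64 or more dimensions.
text_query = [0.9, 0.1, 0.0]
image_vectors = {
    "cat.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.0, 0.1, 0.95],
}

ranking = rank_by_similarity(text_query, image_vectors)
print(ranking[0][0])  # name of the image closest to the text query
```

The same ranking step applies to image-to-image or text-to-video search once each item has a vector in the same embedding space.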
For detailed parameter descriptions and response schemas, see the Multimodal Embedding API reference.
If a call fails, see Error messages.
Prerequisites
Get an API key and set it as an environment variable.
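A minimal sketch of reading the key at startup so that a missing variable fails early with a clear message. The variable name `DASHSCOPE_API_KEY` is an assumption (this page does not name the variable), and `load_api_key` is a hypothetical helper.

```python
import os

# Assumed variable name; use whatever name your key setup guide specifies.
API_KEY_ENV_VAR = "DASHSCOPE_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is not set."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {API_KEY_ENV_VAR} environment variable before calling the API."
        )
    return key
```

Calling `load_api_key()` once when the program starts avoids discovering a missing key only after the first request fails.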
Independent vectors
Generate separate vectors for each input modality (text, image, or video). Use this when you need to process each content type independently.
Multimodal independent embedding requires the DashScope SDK or API. OpenAI-compatible endpoints are not supported.
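The following sketch shows one way to request independent vectors through the DashScope Python SDK. It assumes the `dashscope` package is installed, that the SDK picks up the API key from the environment, and that the response lists one embedding per input item. The helper names `build_inputs` and `embed` are hypothetical, and the Multimodal Embedding API reference remains the authority on parameters and response fields.

```python
def build_inputs(text=None, image=None, video=None):
    """Build the input list: one dict per modality, each embedded separately."""
    items = []
    if text is not None:
        items.append({"text": text})
    if image is not None:
        items.append({"image": image})  # URL or Base64 string
    if video is not None:
        items.append({"video": video})  # URL
    if not items:
        raise ValueError("Provide at least one of text, image, or video.")
    return items


def embed(inputs, model="tongyi-embedding-vision-plus"):
    """Call the SDK and return one vector per input item."""
    # Imported lazily so build_inputs can be used without the SDK installed.
    from http import HTTPStatus
    import dashscope

    resp = dashscope.MultiModalEmbedding.call(model=model, input=inputs)
    if resp.status_code != HTTPStatus.OK:
        raise RuntimeError(f"Embedding call failed: {resp.code}: {resp.message}")
    # Assumed response shape: output["embeddings"] is a list of dicts,
    # each holding an "embedding" vector.
    return [item["embedding"] for item in resp.output["embeddings"]]


# Placeholder URL for illustration; replace with a reachable image.
inputs = build_inputs(
    text="A cat sitting on a sofa",
    image="https://example.com/cat.png",
)
```

Passing `inputs` to `embed` would then return two vectors, one for the caption and one for the image.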
Model selection
To generate a separate vector for each input (such as an image and its text caption), use tongyi-embedding-vision-plus or tongyi-embedding-vision-flash.
Available models
| Model | Dimensions | Text limit | Image limit | Video limit |
|---|---|---|---|---|
| tongyi-embedding-vision-plus | 64, 128, 256, 512, 1024, 1152 (default) | 1,024 tokens | Max 3 MB per image | Max 10 MB per video |
| tongyi-embedding-vision-flash | 64, 128, 256, 512, 768 (default) | 1,024 tokens | Max 3 MB per image | Max 10 MB per video |
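Because the two models accept different dimension sets and have different defaults, a small client-side check can catch an unsupported value before a request is sent. The sketch below encodes the table above; `resolve_dimension` is a hypothetical helper, not part of the SDK.

```python
# Supported output dimensions per model, copied from the table above.
SUPPORTED_DIMENSIONS = {
    "tongyi-embedding-vision-plus": (64, 128, 256, 512, 1024, 1152),
    "tongyi-embedding-vision-flash": (64, 128, 256, 512, 768),
}

# Default dimension per model, also from the table above.
DEFAULT_DIMENSION = {
    "tongyi-embedding-vision-plus": 1152,
    "tongyi-embedding-vision-flash": 768,
}


def resolve_dimension(model, dimension=None):
    """Return the dimension to request, falling back to the model default."""
    if model not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"Unknown model: {model}")
    if dimension is None:
        return DEFAULT_DIMENSION[model]
    if dimension not in SUPPORTED_DIMENSIONS[model]:
        allowed = ", ".join(str(d) for d in SUPPORTED_DIMENSIONS[model])
        raise ValueError(f"{model} supports dimensions {allowed}; got {dimension}")
    return dimension
```

For example, `resolve_dimension("tongyi-embedding-vision-flash", 1024)` raises a `ValueError`, since 1024 is available only on the plus model.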
Input and language support
| Model | Text languages | Image formats | Video formats | Multi-image input | Max items per request |
|---|---|---|---|---|---|
| tongyi-embedding-vision-plus | Chinese and English | JPEG, PNG, BMP (URL or Base64) | MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV (URL only) | Max 8 images | No element count limit. Total tokens must stay within the batch token limit. |
| tongyi-embedding-vision-flash | Chinese and English | JPEG, PNG, BMP (URL or Base64) | MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV (URL only) | Max 8 images | No element count limit. Total tokens must stay within the batch token limit. |
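The format rules in this table can be checked locally before a request is sent. The sketch below is one possible pre-flight check under a few assumptions: file types are inferred from the URL path extension (with `.jpg` treated as JPEG), "Max 8 images" is read as a cap on the images in one multi-image input, and Base64 image strings are passed through unchecked. All helper names are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}
VIDEO_EXTENSIONS = {".mp4", ".mpeg", ".mov", ".mpg", ".webm", ".avi", ".flv", ".mkv"}
MAX_IMAGES = 8  # assumed to apply to a single multi-image input


def is_http_url(value):
    """True when the value looks like an http(s) URL rather than Base64 data."""
    return urlparse(value).scheme in ("http", "https")


def _extension(url):
    """Lower-cased file extension of the URL path, ignoring any query string."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def check_video(value):
    """Videos are accepted as URLs only, in one of the listed containers."""
    if not is_http_url(value):
        raise ValueError("Videos must be passed as a URL.")
    if _extension(value) not in VIDEO_EXTENSIONS:
        raise ValueError(f"Unsupported video format: {value}")


def check_images(values):
    """Images may be URLs or Base64; only URL extensions are checked here."""
    if len(values) > MAX_IMAGES:
        raise ValueError(f"At most {MAX_IMAGES} images are allowed; got {len(values)}")
    for value in values:
        if is_http_url(value) and _extension(value) not in IMAGE_EXTENSIONS:
            raise ValueError(f"Unsupported image format: {value}")
```

A check like this does not replace server-side validation. It cannot see file sizes behind a URL, so the 3 MB image and 10 MB video limits still need to be enforced where the files are produced.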