Multimodal embedding API

POST /services/embeddings/multimodal-embedding/multimodal-embedding

Example request

curl --location --request POST \
  'https://dashscope-intl.aliyuncs.com/api/v1/services/embeddings/multimodal-embedding/multimodal-embedding' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "tongyi-embedding-vision-plus",
    "input": {
        "contents": [
            {"text": "Multimodal embedding model"},
            {"image": "https://example.com/image.jpg"},
            {"video": "https://example.com/video.mp4"}
        ]
    }
}'

Example response

{
  "output": {
    "embeddings": [
      {
        "index": 0,
        "embedding": [
          0
        ],
        "type": "text"
      }
    ]
  },
  "usage": {
    "input_tokens": 0,
    "image_tokens": 0
  },
  "request_id": "1fff9502-a6c5-9472-9ee1-73930fdd04c5"
}
Convert text, images, and video into numerical vectors in a unified semantic space for cross-modal retrieval, similarity search, and content classification.
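
Because every modality is embedded into the same space, a text vector can be compared directly with image or video vectors. The sketch below ranks candidates by cosine similarity. It uses made-up 3-dimensional vectors rather than real API output, and the function names are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for one text query and two image embeddings.
text_query = [1.0, 0.0, 0.0]
images = {"cat.jpg": [0.9, 0.1, 0.0], "car.jpg": [0.0, 1.0, 0.0]}
ranking = rank_by_similarity(text_query, images)
```

In practice the query and candidate vectors should come from the same model and the same dimension setting, otherwise the scores are not comparable.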

Endpoint

  • HTTP: POST https://dashscope-intl.aliyuncs.com/api/v1/services/embeddings/multimodal-embedding/multimodal-embedding
  • SDK base_http_api_url: https://dashscope-intl.aliyuncs.com/api/v1
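
As a rough Python equivalent of the curl request, the sketch below calls the HTTP endpoint using only the standard library. The helper names build_request and embed are hypothetical and not part of any SDK. Error handling and retries are left out.

```python
import json
import urllib.request

ENDPOINT = (
    "https://dashscope-intl.aliyuncs.com/api/v1"
    "/services/embeddings/multimodal-embedding/multimodal-embedding"
)

def build_request(api_key, contents, model="tongyi-embedding-vision-plus"):
    """Build (but do not send) the POST request for the embedding endpoint."""
    body = {"model": model, "input": {"contents": contents}}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def embed(api_key, contents, model="tongyi-embedding-vision-plus", timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_request(api_key, contents, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Usage (needs a valid key and network access):
#   import os
#   result = embed(os.environ["DASHSCOPE_API_KEY"],
#                  [{"text": "Multimodal embedding model"}])
#   print(result["request_id"])
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.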

Model overview

Model                          Modalities                        Dimensions
tongyi-embedding-vision-plus   Text, Image, Video, Multi-images  64, 128, 256, 512, 1024, 1152 (default)
tongyi-embedding-vision-flash  Text, Image, Video, Multi-images  64, 128, 256, 512, 768 (default)
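
The table can be encoded as a small client-side lookup so an unsupported dimension is caught before a request is sent. This is only a sketch: resolve_dimension is a hypothetical helper, and this page does not show which request field selects the dimension, so the helper validates the value without building any request.

```python
# Supported output dimensions per model, copied from the table above.
SUPPORTED_DIMENSIONS = {
    "tongyi-embedding-vision-plus": (64, 128, 256, 512, 1024, 1152),
    "tongyi-embedding-vision-flash": (64, 128, 256, 512, 768),
}
# Default dimension per model, also from the table.
DEFAULT_DIMENSION = {
    "tongyi-embedding-vision-plus": 1152,
    "tongyi-embedding-vision-flash": 768,
}

def resolve_dimension(model, dimension=None):
    """Return a valid dimension for the model, or raise ValueError."""
    if model not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"unknown model: {model!r}")
    if dimension is None:
        return DEFAULT_DIMENSION[model]
    if dimension not in SUPPORTED_DIMENSIONS[model]:
        raise ValueError(
            f"{model} supports {SUPPORTED_DIMENSIONS[model]}, got {dimension}"
        )
    return dimension
```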

Notes

  • Image input: Public URL or Base64 data URI (data:image/{format};base64,{data}).
  • Multi-images: Use the multi_images key with a list of image URLs (maximum 8 images).
  • Video input: Must be a public URL. Set fps in the parameters object to control the frame sampling rate (range [0, 1], default 1.0).
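
The input rules in these notes can be checked on the client before a request is sent. The helpers below are a minimal sketch with hypothetical names. They only enforce the limits stated here (the data URI shape, at most 8 images, fps within [0, 1]) and do not check that URLs are reachable.

```python
import base64

MAX_MULTI_IMAGES = 8

def image_data_uri(raw_bytes, image_format):
    """Encode raw image bytes as data:image/{format};base64,{data}."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:image/{image_format};base64,{encoded}"

def multi_images_item(urls):
    """Build a multi_images content item, enforcing the 8-image limit."""
    urls = list(urls)
    if len(urls) > MAX_MULTI_IMAGES:
        raise ValueError(f"at most {MAX_MULTI_IMAGES} images, got {len(urls)}")
    return {"multi_images": urls}

def video_parameters(fps=1.0):
    """Build the parameters object for video input; fps must lie in [0, 1]."""
    if not 0 <= fps <= 1:
        raise ValueError(f"fps must be in [0, 1], got {fps}")
    return {"fps": fps}

# Example request body combining a video item with a reduced sampling rate.
body = {
    "model": "tongyi-embedding-vision-plus",
    "input": {"contents": [{"video": "https://example.com/video.mp4"}]},
    "parameters": video_parameters(fps=0.5),
}
```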

Authorizations

Authorization (string, header, required)

DashScope API Key, sent as a Bearer token. Create one in the Qwen Cloud console. Alternatively, you can pass the API Key via the X-DashScope-ApiKey request header.
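
The two ways of passing the key can be sketched as a small header builder. auth_headers is a hypothetical helper. The value format of X-DashScope-ApiKey is not specified on this page, so sending the bare key without a Bearer prefix is an assumption.

```python
import os

def auth_headers(api_key=None, use_alternate_header=False):
    """Return the authentication header for a DashScope request."""
    api_key = api_key or os.environ.get("DASHSCOPE_API_KEY")
    if not api_key:
        raise ValueError("no API key given and DASHSCOPE_API_KEY is not set")
    if use_alternate_header:
        # Assumption: the alternate header carries the bare key,
        # with no "Bearer " prefix.
        return {"X-DashScope-ApiKey": api_key}
    return {"Authorization": f"Bearer {api_key}"}
```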

Body

application/json

model (enum<string>, required)

Model name for multimodal embedding.

Available options: tongyi-embedding-vision-plus, tongyi-embedding-vision-flash
Example: tongyi-embedding-vision-plus

input (object, required)

Input data containing the content items.

parameters (object)

Parameters for multimodal embedding.

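Response

200 - application/json

output (object)

Contains the embeddings list. Each entry has an index, an embedding vector, and a type.

usage (object)

Token usage statistics.

request_id (string)

Unique request identifier.

Example: 1fff9502-a6c5-9472-9ee1-73930fdd04c5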
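
A decoded response can be unpacked along the lines of the sketch below, which uses the example response from this page as sample data. embeddings_by_index is a hypothetical helper. That index matches the position of the item in input.contents is an assumption, since this page does not state it.

```python
def embeddings_by_index(response):
    """Return (index, type, vector) tuples sorted by index."""
    items = response["output"]["embeddings"]
    return [
        (item["index"], item["type"], item["embedding"])
        for item in sorted(items, key=lambda item: item["index"])
    ]

# Sample data copied from the example response on this page.
sample = {
    "output": {
        "embeddings": [{"index": 0, "embedding": [0], "type": "text"}]
    },
    "usage": {"input_tokens": 0, "image_tokens": 0},
    "request_id": "1fff9502-a6c5-9472-9ee1-73930fdd04c5",
}

for index, kind, vector in embeddings_by_index(sample):
    print(index, kind, len(vector))
```

Logging request_id alongside any failure is helpful, as it identifies the individual request.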