Embedding & reranking FAQ

Common questions about text embeddings, multimodal embeddings, and reranking — model selection, dimensions, batch limits, and use cases.

Text embedding

APIs: OpenAI compatible, DashScope

What is the maximum input length per text?

Each text can contain at most 8,192 tokens. Content exceeding this limit is truncated before embedding. Monitor input length when embedding long documents.

What batch size is supported?

Each API call accepts up to 10 texts. For larger collections, split the input into batches of 10 and make multiple calls. If you pass a file instead of an array, the file may contain at most 10 lines.
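The batching described above can be sketched with a small helper (a generic chunking function, not part of any SDK):

```python
def batched(texts, batch_size=10):
    """Yield successive batches no larger than batch_size
    (the per-call limit is 10 texts)."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

# 25 texts split into batches of 10, 10, and 5
sizes = [len(b) for b in batched([f"doc {i}" for i in range(25)])]
```

Each yielded batch can then be sent as one embedding request.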

What embedding dimensions are available?

text-embedding-v4 supports 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, and 64 dimensions. text-embedding-v3 supports 1,024 (default), 768, and 512 dimensions. Use the dimension parameter to select a specific size. Higher dimensions retain more semantic information but increase storage and computation cost. 1,024 dimensions work well for most use cases. Choose 1,536 or 2,048 for high-precision domains; choose 512 or lower when storage is constrained.
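A back-of-envelope storage estimate makes the dimension trade-off concrete. This sketch assumes raw float32 vectors (4 bytes per component) with no index overhead:

```python
def storage_bytes(num_vectors, dim, bytes_per_float=4):
    """Raw storage for num_vectors float32 embeddings of size dim."""
    return num_vectors * dim * bytes_per_float

# One million vectors: ~4.1 GB at 1,024 dims vs. ~1.0 GB at 256 dims
full = storage_bytes(1_000_000, 1024)
small = storage_bytes(1_000_000, 256)
```

Real vector databases add index structures on top of this, so treat it as a lower bound.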

When should I use the text_type parameter?

Use text_type for search tasks to get better results:
  • text_type: 'query' — apply to user queries. Produces a directional vector optimized for finding information.
  • text_type: 'document' (default) — apply to stored documents. Produces a comprehensive vector optimized for being retrieved.
For tasks where all texts have the same role (clustering, classification), omit text_type. This parameter is available via the DashScope endpoint only.
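As a sketch, the asymmetric query/document usage might look like the following. The payload-building function and its field layout are illustrative only; consult the DashScope API reference for the exact request schema:

```python
def embedding_payload(texts, text_type="document", model="text-embedding-v4"):
    """Build an embedding request body (field names are illustrative;
    check the DashScope reference for the exact schema)."""
    return {
        "model": model,
        "input": {"texts": texts},
        "parameters": {"text_type": text_type},
    }

# Queries and documents are embedded with different text_type values
query_req = embedding_payload(["how do I reset my password?"], text_type="query")
doc_req = embedding_payload(["To reset your password, open Settings and..."])
```

The key point is the asymmetry: user queries get `text_type: 'query'`, stored documents keep the default `'document'`.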

What are dense vs. sparse vectors?

text-embedding-v4 and text-embedding-v3 support three output types controlled by the output_type parameter:
| Type | Strengths | Weaknesses | Best for |
| --- | --- | --- | --- |
| dense | Deep semantic understanding, handles synonyms and context | Higher compute and storage cost, no exact keyword match guaranteed | Semantic search, RAG, content recommendation |
| sparse | Fast exact keyword matching, low overhead | No semantic understanding, misses synonyms | Log retrieval, SKU lookup, precise filtering |
| dense&sparse | Combines semantics and keywords | Higher storage, more complex retrieval logic | Production hybrid search engines |
Generation cost for dense&sparse is the same as for single-vector mode. This parameter is available via the DashScope endpoint only. The OpenAI-compatible endpoint does not support output_type.
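A minimal sketch of how a hybrid retriever might combine the two signals: a dense cosine score weighted against a toy sparse keyword-overlap score. The `alpha` weight and the overlap formula are illustrative, not prescribed by the API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def sparse_overlap(q_terms, d_terms):
    """Fraction of query terms present in the document (toy sparse score)."""
    return len(set(q_terms) & set(d_terms)) / len(set(q_terms))

def hybrid_score(dense_q, dense_d, q_terms, d_terms, alpha=0.7):
    """alpha weights the dense score; (1 - alpha) weights the sparse score."""
    return alpha * cosine(dense_q, dense_d) + (1 - alpha) * sparse_overlap(q_terms, d_terms)
```

In production the sparse side would use the model's sparse vectors rather than raw term overlap, but the weighted combination follows the same shape.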

What use cases are text embeddings suited for?

Common applications: semantic search (vector similarity matching), RAG (retrieval-augmented generation), recommendation systems (similarity between items), clustering, and text classification.
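The semantic-search case reduces to nearest-neighbor lookup over stored vectors. A minimal self-contained sketch (brute-force scan; real systems use a vector index):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_search(query_vec, doc_vecs, k=3):
    """Return indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The same ranking primitive underlies RAG retrieval and item-to-item recommendation; only the source of the query vector changes.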

Multimodal embedding

API: DashScope multimodal embedding

What modalities are supported?

Both tongyi-embedding-vision-plus and tongyi-embedding-vision-flash support text, image, and video. Text is limited to Chinese and English. Use cases include cross-modal search (text-to-image, image-to-image, text-to-video), image classification, and video classification.

What image and video formats are accepted?

  • Images: JPEG, PNG, BMP — passed as a public URL or Base64-encoded string. Up to 8 images per request. Maximum 3 MB per image.
  • Videos: MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV — URL only (Base64 not supported). Maximum 10 MB per video.
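For the Base64 path, it is worth validating the 3 MB image limit client-side before encoding, since Base64 inflates payload size. A small sketch using the standard library:

```python
import base64

MAX_IMAGE_BYTES = 3 * 1024 * 1024  # 3 MB limit per image

def encode_image(data: bytes) -> str:
    """Base64-encode image bytes, rejecting payloads over the 3 MB limit."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {len(data)} bytes; limit is {MAX_IMAGE_BYTES}")
    return base64.b64encode(data).decode("ascii")
```

Videos must be passed by URL, so no equivalent encoding step applies there.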

Is the OpenAI-compatible endpoint supported for multimodal embedding?

No. Multimodal embedding requires the DashScope SDK or REST API. The OpenAI-compatible endpoint (/compatible-mode/v1/embeddings) supports text embedding only.

How many items can I send per request?

There is no fixed element count limit. The constraint is the total token count across all inputs — the batch must stay within the model's per-request token limit. For text, each input is limited to 1,024 tokens.
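Packing inputs under a shared token budget can be sketched as a greedy grouping pass. Token counts are assumed to be precomputed by a tokenizer; the budget value is whatever the model's per-request limit is:

```python
def pack_by_token_budget(items, token_counts, budget):
    """Greedily group items into batches whose total token count
    stays within budget. Assumes token_counts[i] is the precomputed
    token count of items[i]."""
    batches, current, used = [], [], 0
    for item, tokens in zip(items, token_counts):
        if tokens > budget:
            raise ValueError(f"single item of {tokens} tokens exceeds budget")
        if used + tokens > budget and current:
            batches.append(current)
            current, used = [], 0
        current.append(item)
        used += tokens
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can then be sent as one multimodal embedding request.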

Reranking

APIs: OpenAI compatible, DashScope

When does reranking add the most value?

Reranking adds the most value when initial retrieval returns 20–100+ candidates with mixed relevance. A typical RAG pipeline: retrieve 50–100 candidates with embeddings, rerank to the top 5–10, then pass those to the LLM. If initial retrieval already returns highly relevant results (such as exact keyword match), reranking adds less value.
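The two-stage pipeline above can be sketched as follows. `retrieve_fn` and `rerank_fn` are hypothetical stand-ins for the embedding-retrieval and rerank API calls, injected as callables so the shape of the flow is clear:

```python
def rerank_pipeline(query, corpus, retrieve_fn, rerank_fn,
                    n_candidates=50, top_n=5):
    """Retrieve a broad candidate set, then rerank and keep the best few.

    retrieve_fn(query, corpus, k) -> list of candidate documents
    rerank_fn(query, docs) -> list of relevance scores
    (both are hypothetical stubs for the real API calls).
    """
    candidates = retrieve_fn(query, corpus, n_candidates)
    scores = rerank_fn(query, candidates)
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]
```

The returned top documents are what would be passed to the LLM as context.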

What is the instruct parameter and how should I write instructions?

instruct guides the model's ranking strategy. Always write instructions in English. Two common examples:
  • QA retrieval (default): "Given a web search query, retrieve relevant passages that answer the query." — prioritizes documents that directly answer the question.
  • Semantic similarity: "Retrieve semantically similar text." — prioritizes documents that express the same meaning in different words, useful for FAQ matching.
If omitted, the model defaults to QA retrieval.

What is top_n?

top_n limits the number of documents returned. If set to 5, only the top-5 ranked documents are returned. If omitted, all documents are returned in ranked order. If top_n exceeds the total document count, all documents are returned.
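The documented semantics can be mirrored in a few lines, which is also a useful mental model when reading responses (`ranked_docs` stands for the already-ranked result list):

```python
def apply_top_n(ranked_docs, top_n=None):
    """Mirror top_n semantics: None returns everything in ranked order;
    a value larger than the document count returns all documents."""
    if top_n is None:
        return list(ranked_docs)
    return list(ranked_docs[:top_n])
```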

Model selection

Which text embedding model should I use?

Use text-embedding-v4 for most cases. It supports instructions, sparse vectors, and more embedding dimensions than text-embedding-v3. Both models share the same pricing ($0.07 per 1M input tokens) and batch limit (10 texts, 8,192 tokens per text).
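At $0.07 per 1M input tokens, a quick cost estimate for an embedding job:

```python
def embedding_cost_usd(total_tokens, price_per_million=0.07):
    """Input-token cost at $0.07 per 1M tokens (text-embedding-v3/v4)."""
    return total_tokens / 1_000_000 * price_per_million

# Embedding 10,000 documents averaging 500 tokens each costs about $0.35
cost = embedding_cost_usd(10_000 * 500)
```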

Which reranking model is available?

qwen3-rerank is the available reranking model. It supports up to 500 documents per request, 4,000 tokens per document, and 100+ languages. Pricing is $0.10 per 1M tokens, with a free quota of 1M tokens valid for 90 days after activating Qwen Cloud.