Common questions about text embeddings, multimodal embeddings, and reranking — model selection, dimensions, batch limits, and use cases.
Text embedding
APIs: OpenAI compatible, DashScope
What is the maximum input length per text?
Each text can contain at most 8,192 tokens. Content exceeding this limit is truncated before embedding. Monitor input length when embedding long documents.
What batch size is supported?
Each API call accepts up to 10 texts. For larger collections, split the input into batches of 10 and make multiple calls. If you pass a file instead of an array, the file may contain at most 10 lines.
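A client-side batching loop can be sketched in a few lines (plain Python, no SDK dependency; the chunk size of 10 comes from the limit above):

```python
def batched(texts, batch_size=10):
    """Yield successive chunks that fit the 10-text-per-call limit."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

corpus = [f"doc {n}" for n in range(23)]
batches = list(batched(corpus))
# 23 texts -> 3 API calls: batches of 10, 10, and 3
```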
What embedding dimensions are available?
text-embedding-v4 supports 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, and 64 dimensions. text-embedding-v3 supports 1,024 (default), 768, and 512 dimensions. Use the dimension parameter to select a specific size.
Higher dimensions retain more semantic information but increase storage and computation cost. 1,024 dimensions work well for most use cases. Choose 1,536 or 2,048 for high-precision domains; choose 512 or lower when storage is constrained.
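The storage side of this tradeoff is easy to quantify. A rough sizing helper (assumes float32 vectors and ignores vector-index overhead):

```python
def storage_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for float32 dense vectors, excluding index overhead."""
    return num_vectors * dims * bytes_per_value

full = storage_bytes(1_000_000, 1024)   # 1M vectors at 1,024 dims ~= 4.1 GB
small = storage_bytes(1_000_000, 256)   # dropping to 256 dims ~= 1.0 GB
```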
When should I use the text_type parameter?
Use text_type for search tasks to get better results:
- text_type: 'query' — apply to user queries. Produces a directional vector optimized for finding information.
- text_type: 'document' (default) — apply to stored documents. Produces a comprehensive vector optimized for being retrieved.

The text_type parameter is available via the DashScope endpoint only.
What are dense vs. sparse vectors?
text-embedding-v4 and text-embedding-v3 support three output types controlled by the output_type parameter:
| Type | Strengths | Weaknesses | Best for |
|---|---|---|---|
| dense | Deep semantic understanding, handles synonyms and context | Higher compute and storage cost, no exact keyword match guarantee | Semantic search, RAG, content recommendation |
| sparse | Fast exact keyword matching, low overhead | No semantic understanding, misses synonyms | Log retrieval, SKU lookup, precise filtering |
| dense&sparse | Combines semantics and keywords | Higher storage, more complex retrieval logic | Production hybrid search engines |
Token counting for dense&sparse output is the same as for single-vector mode.
This parameter is available via the DashScope endpoint only. The OpenAI-compatible endpoint does not support output_type.
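Downstream, a common way to use both outputs is a weighted blend of the two scores. A minimal sketch (the {token_id: weight} sparse format and the alpha weighting here are illustrative assumptions, not part of the API):

```python
def sparse_dot(q, d):
    """Keyword score: dot product over tokens shared by query and document."""
    return sum(w * d.get(t, 0.0) for t, w in q.items())

def hybrid_score(dense_score, sparse_q, sparse_d, alpha=0.7):
    """Blend a precomputed dense similarity with sparse keyword overlap."""
    return alpha * dense_score + (1 - alpha) * sparse_dot(sparse_q, sparse_d)

# Document matches the query both semantically (0.9) and on one shared token
score = hybrid_score(0.9, {101: 0.5}, {101: 2.0, 205: 1.0})
```

Tuning alpha toward 1.0 favors semantic matches; toward 0.0 favors exact keyword hits.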
What use cases are text embeddings suited for?
Common applications: semantic search (vector similarity matching), RAG (retrieval-augmented generation), recommendation systems (similarity between items), clustering, and text classification.
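At its core, semantic search is nearest-neighbor lookup by cosine similarity over the embedding vectors. A minimal sketch with toy 3-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

docs = {
    "pet care": [0.9, 0.1, 0.0],
    "finance":  [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # semantically closer to "pet care"
best = max(docs, key=lambda name: cosine(query, docs[name]))
```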
Multimodal embedding
API: DashScope multimodal embedding
What modalities are supported?
Both tongyi-embedding-vision-plus and tongyi-embedding-vision-flash support text, image, and video. Text is limited to Chinese and English. Use cases include cross-modal search (text-to-image, image-to-image, text-to-video), image classification, and video classification.
What image and video formats are accepted?
- Images: JPEG, PNG, BMP — passed as a public URL or Base64-encoded string. Up to 8 images per request. Maximum 3 MB per image.
- Videos: MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV — URL only (Base64 not supported). Maximum 10 MB per video.
Is the OpenAI-compatible endpoint supported for multimodal embedding?
No. Multimodal embedding requires the DashScope SDK or REST API. The OpenAI-compatible endpoint (/compatible-mode/v1/embeddings) supports text embedding only.
How many items can I send per request?
There is no fixed element count limit. The constraint is the total token count across all inputs — the batch must stay within the model's per-request token limit. For text, each input is limited to 1,024 tokens.
Reranking
APIs: OpenAI compatible, DashScope
When does reranking add the most value?
Reranking adds the most value when initial retrieval returns 20–100+ candidates with mixed relevance. A typical RAG pipeline: retrieve 50–100 candidates with embeddings, rerank to the top 5–10, then pass those to the LLM.
If initial retrieval already returns highly relevant results (such as exact keyword match), reranking adds less value.
What is the instruct parameter and how should I write instructions?
instruct guides the model's ranking strategy. Always write instructions in English.
Two common examples:
- QA retrieval (default): "Given a web search query, retrieve relevant passages that answer the query." — prioritizes documents that directly answer the question.
- Semantic similarity: "Retrieve semantically similar text." — prioritizes documents that express the same meaning in different words, useful for FAQ matching.
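As a request-body sketch, instruct sits alongside the other ranking parameters. The exact field layout below is an assumption based on the general DashScope request shape; check the API reference before relying on it:

```json
{
  "model": "qwen3-rerank",
  "input": {
    "query": "how do I rotate an API key?",
    "documents": ["Passage about key rotation...", "Passage about billing..."]
  },
  "parameters": {
    "instruct": "Given a web search query, retrieve relevant passages that answer the query.",
    "top_n": 5
  }
}
```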
What is top_n?
top_n limits the number of documents returned. If set to 5, only the top-5 ranked documents are returned. If omitted, all documents are returned in ranked order. If top_n exceeds the total document count, all documents are returned.
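The return semantics can be modeled in a few lines (a plain-Python sketch of the behavior described above, not SDK code):

```python
def apply_top_n(ranked_docs, top_n=None):
    """top_n=None returns everything; slicing caps the list and
    naturally handles top_n larger than the document count."""
    return ranked_docs if top_n is None else ranked_docs[:top_n]

ranked = ["doc_3", "doc_1", "doc_2"]  # already in relevance order
```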
Model selection
Which text embedding model should I use?
Use text-embedding-v4 for most cases. It supports instructions, sparse vectors, and more embedding dimensions than text-embedding-v3. Both models share the same pricing ($0.07 per 1M input tokens) and batch limit (10 texts, 8,192 tokens per text).
Which reranking model is available?
qwen3-rerank is the available reranking model. It supports up to 500 documents per request, 4,000 tokens per document, and 100+ languages. Pricing is $0.1 per 1M tokens with a free quota of 1M tokens valid for 90 days after activating Qwen Cloud.