Embedding & reranking FAQ

Common questions about text embeddings, multimodal embeddings, and reranking — model selection, dimensions, batch limits, and use cases.

Text embedding

APIs: OpenAI compatible, DashScope

What is the maximum input length per text?

Each text can contain at most 8,192 tokens. Content exceeding this limit is truncated before embedding. Monitor input length when embedding long documents.

What batch size is supported?

Each API call accepts up to 10 texts. For larger collections, split the input into batches of 10 and make multiple calls. If you pass a file instead of an array, the file may contain at most 10 lines.
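The batching described above can be sketched with a small helper (a generic chunking function, not part of any SDK):

```python
def batched(texts, batch_size=10):
    """Yield successive batches no larger than batch_size
    (the per-call limit is 10 texts)."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

# 25 texts split into batches of 10, 10, and 5
sizes = [len(b) for b in batched([f"doc {i}" for i in range(25)])]
```

Each yielded batch can then be sent as one embedding request.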

What embedding dimensions are available?

text-embedding-v4 supports 2,048, 1,536, 1,024 (default), 768, 512, 256, 128, and 64 dimensions. text-embedding-v3 supports 1,024 (default), 768, and 512 dimensions. Use the dimension parameter to select a specific size. Higher dimensions retain more semantic information but increase storage and computation cost. 1,024 dimensions work well for most use cases. Choose 1,536 or 2,048 for high-precision domains; choose 512 or lower when storage is constrained.
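A back-of-envelope storage estimate makes the dimension trade-off concrete. This sketch assumes raw float32 vectors (4 bytes per component) with no index overhead:

```python
def storage_bytes(num_vectors, dim, bytes_per_float=4):
    """Raw storage for num_vectors float32 embeddings of size dim."""
    return num_vectors * dim * bytes_per_float

# One million vectors: ~4.1 GB at 1,024 dims vs. ~1.0 GB at 256 dims
full = storage_bytes(1_000_000, 1024)
small = storage_bytes(1_000_000, 256)
```

Real vector databases add index structures on top of this, so treat it as a lower bound.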

When should I use the text_type parameter?

Use text_type for search tasks to get better results:
  • text_type: 'query' — apply to user queries. Produces a directional vector optimized for finding information.
  • text_type: 'document' (default) — apply to stored documents. Produces a comprehensive vector optimized for being retrieved.
For tasks where all texts have the same role (clustering, classification), omit text_type. This parameter is available via the DashScope endpoint only.
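As a sketch, the asymmetric query/document usage might look like the following. The payload-building function and its field layout are illustrative only; consult the DashScope API reference for the exact request schema:

```python
def embedding_payload(texts, text_type="document", model="text-embedding-v4"):
    """Build an embedding request body (field names are illustrative;
    check the DashScope reference for the exact schema)."""
    return {
        "model": model,
        "input": {"texts": texts},
        "parameters": {"text_type": text_type},
    }

# Queries and documents are embedded with different text_type values
query_req = embedding_payload(["how do I reset my password?"], text_type="query")
doc_req = embedding_payload(["To reset your password, open Settings and..."])
```

The key point is the asymmetry: user queries get `text_type: 'query'`, stored documents keep the default `'document'`.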

What are dense vs. sparse vectors?

text-embedding-v4 and text-embedding-v3 support three output types controlled by the output_type parameter:
| Type | Strengths | Weaknesses | Best for |
| --- | --- | --- | --- |
| dense | Deep semantic understanding, handles synonyms and context | Higher compute and storage cost, no exact keyword match guaranteed | Semantic search, RAG, content recommendation |
| sparse | Fast exact keyword matching, low overhead | No semantic understanding, misses synonyms | Log retrieval, SKU lookup, precise filtering |
| dense&sparse | Combines semantics and keywords | Higher storage, more complex retrieval logic | Production hybrid search engines |
Generation cost for dense&sparse is the same as for single-vector mode. This parameter is available via the DashScope endpoint only. The OpenAI-compatible endpoint does not support output_type.
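A minimal sketch of how a hybrid retriever might combine the two signals: a dense cosine score weighted against a toy sparse keyword-overlap score. The `alpha` weight and the overlap formula are illustrative, not prescribed by the API:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def sparse_overlap(q_terms, d_terms):
    """Fraction of query terms present in the document (toy sparse score)."""
    return len(set(q_terms) & set(d_terms)) / len(set(q_terms))

def hybrid_score(dense_q, dense_d, q_terms, d_terms, alpha=0.7):
    """alpha weights the dense score; (1 - alpha) weights the sparse score."""
    return alpha * cosine(dense_q, dense_d) + (1 - alpha) * sparse_overlap(q_terms, d_terms)
```

In production the sparse side would use the model's sparse vectors rather than raw term overlap, but the weighted combination follows the same shape.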

What use cases are text embeddings suited for?

Common applications: semantic search (vector similarity matching), RAG (retrieval-augmented generation), recommendation systems (similarity between items), clustering, and text classification.
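The semantic-search case reduces to nearest-neighbor lookup over stored vectors. A minimal self-contained sketch (brute-force scan; real systems use a vector index):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_search(query_vec, doc_vecs, k=3):
    """Return indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The same ranking primitive underlies RAG retrieval and item-to-item recommendation; only the source of the query vector changes.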

Multimodal embedding

API: DashScope multimodal embedding

What modalities are supported?

Both tongyi-embedding-vision-plus and tongyi-embedding-vision-flash support text, image, and video. Text is limited to Chinese and English. Use cases include cross-modal search (text-to-image, image-to-image, text-to-video), image classification, and video classification.

What image and video formats are accepted?

  • Images: JPEG, PNG, BMP — passed as a public URL or Base64-encoded string. Up to 8 images per request. Maximum 3 MB per image.
  • Videos: MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV — URL only (Base64 not supported). Maximum 10 MB per video.
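For the Base64 path, it is worth validating the 3 MB image limit client-side before encoding, since Base64 inflates payload size. A small sketch using the standard library:

```python
import base64

MAX_IMAGE_BYTES = 3 * 1024 * 1024  # 3 MB limit per image

def encode_image(data: bytes) -> str:
    """Base64-encode image bytes, rejecting payloads over the 3 MB limit."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {len(data)} bytes; limit is {MAX_IMAGE_BYTES}")
    return base64.b64encode(data).decode("ascii")
```

Videos must be passed by URL, so no equivalent encoding step applies there.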

Is the OpenAI-compatible endpoint supported for multimodal embedding?

No. Multimodal embedding requires the DashScope SDK or REST API. The OpenAI-compatible endpoint (/compatible-mode/v1/embeddings) supports text embedding only.

How many items can I send per request?

There is no fixed element count limit. The constraint is the total token count across all inputs — the batch must stay within the model's per-request token limit. For text, each input is limited to 1,024 tokens.
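Packing inputs under a shared token budget can be sketched as a greedy grouping pass. Token counts are assumed to be precomputed by a tokenizer; the budget value is whatever the model's per-request limit is:

```python
def pack_by_token_budget(items, token_counts, budget):
    """Greedily group items into batches whose total token count
    stays within budget. Assumes token_counts[i] is the precomputed
    token count of items[i]."""
    batches, current, used = [], [], 0
    for item, tokens in zip(items, token_counts):
        if tokens > budget:
            raise ValueError(f"single item of {tokens} tokens exceeds budget")
        if used + tokens > budget and current:
            batches.append(current)
            current, used = [], 0
        current.append(item)
        used += tokens
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can then be sent as one multimodal embedding request.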

Reranking

APIs: OpenAI compatible, DashScope

When does reranking add the most value?

Reranking adds the most value when initial retrieval returns 20–100+ candidates with mixed relevance. A typical RAG pipeline: retrieve 50–100 candidates with embeddings, rerank to the top 5–10, then pass those to the LLM. If initial retrieval already returns highly relevant results (such as exact keyword match), reranking adds less value.
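The two-stage pipeline above can be sketched as follows. `retrieve_fn` and `rerank_fn` are hypothetical stand-ins for the embedding-retrieval and rerank API calls, injected as callables so the shape of the flow is clear:

```python
def rerank_pipeline(query, corpus, retrieve_fn, rerank_fn,
                    n_candidates=50, top_n=5):
    """Retrieve a broad candidate set, then rerank and keep the best few.

    retrieve_fn(query, corpus, k) -> list of candidate documents
    rerank_fn(query, docs) -> list of relevance scores
    (both are hypothetical stubs for the real API calls).
    """
    candidates = retrieve_fn(query, corpus, n_candidates)
    scores = rerank_fn(query, candidates)
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]
```

The returned top documents are what would be passed to the LLM as context.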

What is the instruct parameter and how should I write instructions?

instruct guides the model's ranking strategy. Always write instructions in English. Two common examples:
  • QA retrieval (default): "Given a web search query, retrieve relevant passages that answer the query." — prioritizes documents that directly answer the question.
  • Semantic similarity: "Retrieve semantically similar text." — prioritizes documents that express the same meaning in different words, useful for FAQ matching.
If omitted, the model defaults to QA retrieval.

What is top_n?

top_n limits the number of documents returned. If set to 5, only the top-5 ranked documents are returned. If omitted, all documents are returned in ranked order. If top_n exceeds the total document count, all documents are returned.
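The documented semantics can be mirrored in a few lines, which is also a useful mental model when reading responses (`ranked_docs` stands for the already-ranked result list):

```python
def apply_top_n(ranked_docs, top_n=None):
    """Mirror top_n semantics: None returns everything in ranked order;
    a value larger than the document count returns all documents."""
    if top_n is None:
        return list(ranked_docs)
    return list(ranked_docs[:top_n])
```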

Model selection

Which text embedding model should I use?

Use text-embedding-v4 for most cases. It supports instructions, sparse vectors, and more embedding dimensions than text-embedding-v3. Both models share the same pricing ($0.07 per 1M input tokens) and batch limit (10 texts, 8,192 tokens per text).
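At $0.07 per 1M input tokens, a quick cost estimate for an embedding job:

```python
def embedding_cost_usd(total_tokens, price_per_million=0.07):
    """Input-token cost at $0.07 per 1M tokens (text-embedding-v3/v4)."""
    return total_tokens / 1_000_000 * price_per_million

# Embedding 10,000 documents averaging 500 tokens each costs about $0.35
cost = embedding_cost_usd(10_000 * 500)
```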

Which reranking model is available?

qwen3-rerank is the available reranking model. It supports up to 500 documents per request, 4,000 tokens per document, and 100+ languages. Pricing is $0.10 per 1M tokens, with a free quota of 1M tokens valid for 90 days after activating Qwen Cloud.