Choose a model for semantic search, RAG retrieval, cross-modal matching, and reranking.
Text embedding
Text-only search, RAG, or clustering → text-embedding-v4.
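Existing index built with text-embedding-v3 → keep text-embedding-v3 so new query vectors stay comparable with the stored ones (vectors from different models cannot be mixed, even at the same dimension).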
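An index is only searchable with vectors from the model that built it, so it helps to record the model name and dimension alongside the vectors. A minimal in-memory sketch of semantic search, assuming you already have embedding vectors and using cosine similarity as the metric (a common choice; this guide does not specify one). `VectorIndex` and `cosine` are hypothetical helpers, not part of any SDK, and the 3-dimensional vectors are made-up stand-ins for real embeddings:

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class VectorIndex:
    """Toy in-memory index that records which model and dimension built it (hypothetical helper)."""
    model: str
    dimension: int
    ids: list = field(default_factory=list)
    vectors: list = field(default_factory=list)

    def _check(self, model, vector):
        # Vectors from different embedding models are not comparable, so reject mixing them.
        if model != self.model:
            raise ValueError(f"index was built with {self.model}, got vectors from {model}")
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(vector)}")

    def add(self, doc_id, vector, model):
        self._check(model, vector)
        self.ids.append(doc_id)
        self.vectors.append(vector)

    def search(self, query_vector, model, k=3):
        """Return the k most similar (doc_id, score) pairs by cosine similarity."""
        self._check(model, query_vector)
        scored = [(doc_id, cosine(query_vector, v)) for doc_id, v in zip(self.ids, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Toy 3-dimensional vectors standing in for real embeddings.
index = VectorIndex(model="text-embedding-v3", dimension=3)
index.add("refunds", [0.9, 0.1, 0.0], model="text-embedding-v3")
index.add("shipping", [0.1, 0.9, 0.0], model="text-embedding-v3")
index.add("returns", [0.8, 0.3, 0.1], model="text-embedding-v3")
print(index.search([1.0, 0.2, 0.0], model="text-embedding-v3", k=2))
```

Querying this index with `model="text-embedding-v4"` raises a `ValueError` instead of silently returning meaningless scores.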
How many dimensions?
Large-scale search where storage matters → 256 or 512.
General use → 1024 (default, good balance).
Maximum benchmark accuracy → 1536 or 2048 (text-embedding-v4 only).
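Raw vector storage grows linearly with dimension: roughly vectors × dimensions × bytes per value. A back-of-the-envelope sketch, assuming float32 vectors (4 bytes per value) and ignoring the extra memory a vector database's index structures add; the 10-million-chunk corpus is a hypothetical example:

```python
def raw_vector_storage_gib(num_vectors: int, dimension: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage in GiB; assumes float32 (4 bytes) and ignores index overhead."""
    return num_vectors * dimension * bytes_per_value / 1024**3


# Hypothetical corpus of 10 million chunks.
for dim in (256, 512, 1024, 2048):
    print(f"{dim:>5} dims: {raw_vector_storage_gib(10_000_000, dim):6.1f} GiB")
```

Because the relationship is linear, dropping from 1024 to 256 dimensions cuts raw vector storage by a factor of 4.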
Multimodal embedding
Search images or videos by text query → tongyi-embedding-vision-plus.
Text-only data?
Use text-embedding-v4 instead: it is faster, cheaper, and offers more dimension options.
Multimodal embedding is for cross-modal retrieval (text↔image, text↔video).
Accuracy vs speed
Best accuracy → tongyi-embedding-vision-plus (dimensions up to 1152).
Budget or latency-sensitive → tongyi-embedding-vision-flash (up to 768).
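The selection rules so far can be written as a small routing function, which is handy when the choice lives in configuration code. A minimal sketch; `choose_embedding_model` and its flags are hypothetical names, and the model strings are the ones listed in this guide:

```python
def choose_embedding_model(
    cross_modal: bool,
    latency_sensitive: bool = False,
    existing_v3_index: bool = False,
) -> str:
    """Return a model name following the selection rules in this guide (hypothetical helper)."""
    if not cross_modal:
        # Text-only data: stay on v3 only when an index already holds v3 vectors.
        return "text-embedding-v3" if existing_v3_index else "text-embedding-v4"
    if latency_sensitive:
        # Budget or latency-sensitive cross-modal search (up to 768 dimensions).
        return "tongyi-embedding-vision-flash"
    # Best cross-modal accuracy (up to 1152 dimensions).
    return "tongyi-embedding-vision-plus"


print(choose_embedding_model(cross_modal=True, latency_sensitive=True))
```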
Reranking
Improve RAG precision → add qwen3-rerank after your embedding search. It re-scores the top-N results using cross-attention over each query-document pair, which improves ranking quality.
Limits: 500 documents per request, 4,000 tokens per item, 30,000 tokens per request.
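When the candidate list from embedding search is large, it has to be split into requests that respect these limits. A sketch of the batching step only, assuming you supply a `count_tokens` function (the whitespace split below is a rough stand-in for the model's real tokenizer); `batch_for_rerank` is a hypothetical helper, and `reserved_tokens` is an assumption that lets you hold back budget for the query, since this guide does not say whether the query counts toward the per-request limit:

```python
from typing import Callable, List, Sequence

# Limits for qwen3-rerank as listed in this guide.
MAX_DOCS_PER_REQUEST = 500
MAX_TOKENS_PER_ITEM = 4_000
MAX_TOKENS_PER_REQUEST = 30_000


def batch_for_rerank(
    docs: Sequence[str],
    count_tokens: Callable[[str], int],
    reserved_tokens: int = 0,
) -> List[List[int]]:
    """Split document indices into batches that stay within the rerank limits."""
    budget = MAX_TOKENS_PER_REQUEST - reserved_tokens
    batches: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for i, doc in enumerate(docs):
        n = count_tokens(doc)
        if n > MAX_TOKENS_PER_ITEM or n > budget:
            raise ValueError(f"document {i} has {n} tokens; chunk or truncate it first")
        # Start a new batch when adding this document would break either limit.
        if current and (len(current) == MAX_DOCS_PER_REQUEST or current_tokens + n > budget):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += n
    if current:
        batches.append(current)
    return batches


# Whitespace splitting is only a rough stand-in for the model's real tokenizer.
docs = ["refund policy details", "shipping times", "how to return an item"]
print(batch_for_rerank(docs, count_tokens=lambda d: len(d.split())))
```

Returning indices rather than copies of the documents keeps it easy to map each batch's results back to the original candidate list.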
All models
| Model | Use this when... | Dimensions | Max tokens |
|---|---|---|---|
| text-embedding-v4 | Text search, RAG, clustering | 64, 128, 256, 512, 768, 1024 (default), 1536, 2048 | 8,192 |
| text-embedding-v3 | Existing v3 index migration | 512, 768, 1024 (default) | 8,192 |
| tongyi-embedding-vision-plus | Cross-modal search, best accuracy | 64, 128, 256, 512, 1024, 1152 | 1,024 |
| tongyi-embedding-vision-flash | Cross-modal search, budget | 64, 128, 256, 512, 768 | 1,024 |
| qwen3-rerank | Re-rank search results | — | 4,000 per item |
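The dimension options in the table can be captured as a lookup so that a misconfigured model/dimension pair fails early, before any vectors are written. A sketch; `SUPPORTED_DIMENSIONS` and `check_dimension` are hypothetical names, and the values are copied from the table:

```python
# Supported output dimensions per embedding model, copied from the table.
SUPPORTED_DIMENSIONS = {
    "text-embedding-v4": (64, 128, 256, 512, 768, 1024, 1536, 2048),
    "text-embedding-v3": (512, 768, 1024),
    "tongyi-embedding-vision-plus": (64, 128, 256, 512, 1024, 1152),
    "tongyi-embedding-vision-flash": (64, 128, 256, 512, 768),
}


def check_dimension(model: str, dimension: int) -> int:
    """Return the dimension if the model supports it, else raise ValueError."""
    try:
        allowed = SUPPORTED_DIMENSIONS[model]
    except KeyError:
        # qwen3-rerank, for example, produces scores rather than embeddings.
        raise ValueError(f"no embedding dimensions listed for {model!r}") from None
    if dimension not in allowed:
        raise ValueError(f"{model} supports {allowed}, not {dimension}")
    return dimension


print(check_dimension("text-embedding-v4", 1536))
```

Asking for 1536 dimensions from text-embedding-v3, for instance, raises a `ValueError` because that model tops out at 1024.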