
Embedding & reranking models

Choose a model for semantic search, RAG retrieval, cross-modal matching, and reranking.

Text embedding

Text-only search, RAG, or clustering → text-embedding-v4. An existing index already built with text-embedding-v3 → text-embedding-v3, so new vectors stay compatible with the stored ones (compatible dimensions).
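Whichever model you pick, retrieval compares a query vector against stored vectors, and both must come from the same model at the same dimension. The sketch below assumes cosine similarity as the metric and uses toy 4-dimensional vectors in place of real text-embedding-v4 output. The helper names are hypothetical, and the embedding API call itself is omitted.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of the same dimension."""
    if len(a) != len(b):
        # Vectors from different models or dimension settings cannot be compared.
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k corpus ids most similar to the query, best match first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for real embeddings.
corpus = {
    "doc-a": [0.9, 0.1, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0, 0.0],
    "doc-c": [0.7, 0.7, 0.0, 0.0],
}
query = [1.0, 0.0, 0.0, 0.0]
print(top_k(query, corpus, k=2))
```

Here doc-a ranks first and doc-c second. A production system would use a vector database or approximate nearest-neighbor index rather than a linear scan.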

How many dimensions?

Large-scale search where storage matters → 256 or 512. General use → 1024 (default, good balance). Maximum accuracy on benchmarks → 1536 or 2048.
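Raw vector storage grows linearly with the number of dimensions. The following back-of-the-envelope sketch assumes vectors are stored as uncompressed float32 (4 bytes per value) and ignores index structures and metadata. The 10 million vector count is an arbitrary example, and your vector store may use a different data type or quantization.

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 storage


def raw_index_size_gib(num_vectors: int, dimensions: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Raw vector storage in GiB, excluding index overhead and metadata."""
    return num_vectors * dimensions * bytes_per_value / 1024**3


for dims in (256, 512, 1024, 1536, 2048):
    print(f"{dims:>5} dims: {raw_index_size_gib(10_000_000, dims):6.1f} GiB")
```

At 10 million vectors this comes to roughly 9.5 GiB at 256 dimensions, about 38.1 GiB at the 1024 default, and about 76.3 GiB at 2048.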

Multimodal embedding

Search images or videos by text query → tongyi-embedding-vision-plus.

Text-only data?

Use text-embedding-v4 instead: it is faster, cheaper, and offers more dimension options. Multimodal embedding is meant for cross-modal retrieval (text↔image, text↔video).

Accuracy vs speed

Best accuracy → tongyi-embedding-vision-plus (dimensions up to 1152). Budget or latency-sensitive → tongyi-embedding-vision-flash (up to 768).

Reranking

Improve RAG precision → add qwen3-rerank after your embedding search. It re-scores the top-N results with cross-attention for better ranking quality. Limits: 500 documents per request, 4,000 tokens per item, and 30,000 tokens per request.
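When the candidate list is large, it has to be split across several rerank requests. The sketch below groups candidates so that every request stays within the stated limits. Three assumptions apply. First, the whitespace word count is a crude stand-in for the model's real tokenizer. Second, the query's tokens are assumed to count toward the per-request budget, so they are reserved in each batch. Third, oversized documents are rejected rather than truncated. The function names are hypothetical and are not part of any Model Studio SDK.

```python
from typing import Callable

# Limits stated for qwen3-rerank.
MAX_DOCS_PER_REQUEST = 500
MAX_TOKENS_PER_ITEM = 4_000
MAX_TOKENS_PER_REQUEST = 30_000


def plan_rerank_batches(
    docs: list[str],
    count_tokens: Callable[[str], int],
    query_tokens: int = 0,
) -> list[list[int]]:
    """Group document indices into batches that stay within the rerank limits.

    Assumption: the query's tokens count toward each request's budget, so they
    are reserved in every batch. Pass query_tokens=0 if that is not the case.
    """
    budget = MAX_TOKENS_PER_REQUEST - query_tokens
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for i, doc in enumerate(docs):
        n = count_tokens(doc)
        if n > MAX_TOKENS_PER_ITEM:
            raise ValueError(f"document {i} has {n} tokens; chunk it below {MAX_TOKENS_PER_ITEM}")
        if n > budget:
            raise ValueError(f"document {i} cannot fit next to a {query_tokens}-token query")
        # Start a new request when the document count or token budget would overflow.
        if current and (len(current) == MAX_DOCS_PER_REQUEST or used + n > budget):
            batches.append(current)
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        batches.append(current)
    return batches


def approx_tokens(text: str) -> int:
    """Crude stand-in tokenizer (whitespace word count); real token counts differ."""
    return len(text.split())


# 25 hypothetical candidates of roughly 3,000 "tokens" each.
docs = ["word " * 3_000] * 25
print([len(b) for b in plan_rerank_batches(docs, approx_tokens, query_tokens=50)])  # → [9, 9, 7]
```

With a 50-token query reserved, only nine 3,000-token documents fit per request. Without the reservation, ten would fit exactly. Merging scores from separate requests assumes that each document's score depends only on the query and that document. This is typical of cross-encoder rerankers, but it is worth verifying for your setup.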

All models

| Model | Use this when... | Dimensions | Max tokens |
|---|---|---|---|
| text-embedding-v4 | Text search, RAG, clustering | 64, 128, 256, 512, 768, 1024 (default), 1536, 2048 | 8,192 |
| text-embedding-v3 | Existing v3 index migration | 512, 768, 1024 (default) | 8,192 |
| tongyi-embedding-vision-plus | Cross-modal search, best accuracy | 64, 128, 256, 512, 1024, 1152 | 1,024 |
| tongyi-embedding-vision-flash | Cross-modal search, budget | 64, 128, 256, 512, 768 | 1,024 |
| qwen3-rerank | Re-rank search results | n/a | 4,000 per item |
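Each model accepts only specific output sizes, so validating a configured dimension before building an index can catch mismatches early. The sketch below uses a dictionary transcribed from the table above. The function name and error handling are hypothetical choices, and defaults are filled in only where the table lists one.

```python
from typing import Optional

# Dimension options per embedding model, transcribed from the table above.
SUPPORTED_DIMENSIONS: dict[str, tuple[int, ...]] = {
    "text-embedding-v4": (64, 128, 256, 512, 768, 1024, 1536, 2048),
    "text-embedding-v3": (512, 768, 1024),
    "tongyi-embedding-vision-plus": (64, 128, 256, 512, 1024, 1152),
    "tongyi-embedding-vision-flash": (64, 128, 256, 512, 768),
}
# Only the text models have a default marked in the table.
DEFAULT_DIMENSIONS: dict[str, int] = {"text-embedding-v4": 1024, "text-embedding-v3": 1024}


def resolve_dimensions(model: str, requested: Optional[int] = None) -> int:
    """Return a valid output dimension for `model`, or raise ValueError."""
    options = SUPPORTED_DIMENSIONS.get(model)
    if options is None:
        # qwen3-rerank returns relevance scores, not vectors, so it is not listed.
        raise ValueError(f"{model!r} is not an embedding model listed in the table")
    if requested is None:
        if model not in DEFAULT_DIMENSIONS:
            raise ValueError(f"no default listed for {model!r}; pass one of {options}")
        return DEFAULT_DIMENSIONS[model]
    if requested not in options:
        raise ValueError(f"{model!r} supports {options}, not {requested}")
    return requested


print(resolve_dimensions("text-embedding-v4"))
print(resolve_dimensions("tongyi-embedding-vision-plus", 1152))
```

Requesting 1024 dimensions from tongyi-embedding-vision-flash raises an error, because that model tops out at 768.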
