Improve search accuracy
Retrieval systems prioritize speed, so results may lack precision. Reranking models re-score retrieved documents to place the most relevant results at the top, significantly improving search accuracy.
If a call fails, see Error messages. For quota information, see Rate limits.
When reranking helps most: Reranking gives the biggest accuracy boost when your initial retrieval returns 20-100+ candidates with mixed relevance. If your retrieval already returns highly relevant results (such as exact keyword match), reranking adds less value. A typical RAG pipeline: retrieve 50-100 candidates with embeddings, rerank to top 5-10, then pass those to the LLM.
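The retrieve → rerank → generate flow described above can be sketched as follows. This is a minimal, self-contained illustration: the retriever and reranker are local stand-ins (simple list slicing and word overlap), not real embedding search or the rerank API, and all function names are illustrative.

```python
def retrieve(query, corpus, k=100):
    # Stand-in for fast embedding retrieval: return the first k candidates.
    return corpus[:k]

def rerank(query, candidates, top_n=5):
    # Stand-in for the rerank API: score by naive word overlap with the query.
    query_words = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )[:top_n]

corpus = [f"document {i} about colds" for i in range(200)]
candidates = retrieve("how to prevent colds", corpus, k=100)   # broad and fast
top_docs = rerank("how to prevent colds", candidates, top_n=5)  # precise
# top_docs would then be passed to the LLM as context.
```

In a real pipeline, `retrieve` would query a vector store and `rerank` would call the rerank model; the shape of the flow stays the same.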
Prerequisites
Obtain an API key and set it as an environment variable. If you call the API through the SDK, install the SDK first.
Rerank documents
Pass a query and a list of candidate documents to the API. The model returns the documents ranked by relevance.
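The request can be sent through either the OpenAI-compatible or the DashScope interface. As a sketch, the snippet below builds a DashScope-style request body; the exact endpoint and field names may differ by API version, so treat them as assumptions and check the API reference for your account.

```python
import json

# Illustrative DashScope-style rerank payload. Field names ("input",
# "parameters", "return_documents") are assumptions, not a verified schema.
payload = {
    "model": "qwen3-rerank",
    "input": {
        "query": "How to prevent colds?",
        "documents": [
            "Washing hands frequently prevents colds",
            "The common cold is a widespread illness",
        ],
    },
    "parameters": {"top_n": 2, "return_documents": True},
}
body = json.dumps(payload)  # send as the POST body with your API key header
```

The response lists the documents ranked by relevance score, highest first.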
Core features
Use instructions to improve ranking (instruct)
The instruct parameter guides the model to use different ranking strategies. Write instructions in English.
- QA retrieval (default): "Given a web search query, retrieve relevant passages that answer the query." Focus: finding answers. For the query "How to prevent colds?", "Washing hands frequently prevents colds" scores higher than "The common cold is a widespread illness" (topically related but doesn't answer the question).
- Semantic similarity: "Retrieve semantically similar text." Focus: semantic equivalence regardless of wording. Example: "How to change my password?" matches "What if I forgot my password?" in an FAQ scenario.
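A request using the semantic-similarity instruction might look like the sketch below. Where the `instruct` field lives (top level vs. inside `parameters`) is an assumption; verify against the API reference for your interface.

```python
# Hypothetical payload showing the instruct parameter for the FAQ
# scenario above; the field placement is assumed, not verified.
payload = {
    "model": "qwen3-rerank",
    "input": {
        "query": "How to change my password?",
        "documents": [
            "What if I forgot my password?",
            "How to update my email address",
        ],
    },
    "parameters": {
        "instruct": "Retrieve semantically similar text.",
        "top_n": 1,
    },
}
```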
Return top results (top_n)
Use top_n to return only the highest-ranked documents. If omitted, all documents are returned sorted by relevance. If top_n exceeds the total document count, all documents are returned.
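The `top_n` behavior described above can be mirrored locally (a sketch, not the API itself):

```python
def apply_top_n(ranked_docs, top_n=None):
    # Mirrors the documented behavior: omitted top_n returns all documents;
    # top_n larger than the document count also returns all documents.
    if top_n is None:
        return ranked_docs
    return ranked_docs[:top_n]

ranked = ["doc-a", "doc-b", "doc-c"]
assert apply_top_n(ranked) == ranked            # omitted: all returned
assert apply_top_n(ranked, 2) == ["doc-a", "doc-b"]
assert apply_top_n(ranked, 10) == ranked        # exceeds count: all returned
```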
Model overview
| Model | Max documents | Max tokens per document | Max tokens per request | Languages | Price (per 1M tokens) | Free quota | Use cases |
|---|---|---|---|---|---|---|---|
| qwen3-rerank | 500 | 4,000 | 120,000 | 100+ languages: Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, etc. | $0.10 | 1M tokens (valid for 90 days) | Semantic text search, RAG applications |
- Max tokens per document: Maximum token count per query or document. Content exceeding this limit is truncated. Results are computed on truncated content only, which may reduce ranking accuracy.
- Max documents: Maximum number of documents per request.
- Max tokens per request: Calculated as `query tokens × document count + total document tokens`. This value must not exceed the per-request limit.
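The formula can be checked before sending a request. For example, a 20-token query against 100 documents of about 1,000 tokens each costs 20 × 100 + 100,000 = 102,000 tokens, which fits within the 120,000-token limit:

```python
def request_tokens(query_tokens, doc_token_counts):
    # Per-request cost: the query is counted once per document,
    # plus the tokens of all documents themselves.
    return query_tokens * len(doc_token_counts) + sum(doc_token_counts)

MAX_TOKENS_PER_REQUEST = 120_000

total = request_tokens(20, [1000] * 100)
assert total == 102_000
assert total <= MAX_TOKENS_PER_REQUEST
```

If the total exceeds the limit, split the documents across multiple requests or truncate long documents before sending.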