
Reranking

Improve search accuracy

Retrieval systems prioritize speed, so results may lack precision. Reranking models re-score retrieved documents to place the most relevant results at the top, significantly improving search accuracy.
When reranking helps most: Reranking gives the biggest accuracy boost when your initial retrieval returns 20-100+ candidates with mixed relevance. If your retrieval already returns highly relevant results (such as exact keyword match), reranking adds less value. A typical RAG pipeline: retrieve 50-100 candidates with embeddings, rerank to top 5-10, then pass those to the LLM.
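
The retrieve-then-rerank pipeline above can be sketched end to end. Both scoring functions below are deliberately crude stand-ins (term overlap for retrieval, position-weighted overlap for reranking), not real embedding or reranking models:

```python
def retrieve(query_terms, docs, k):
    # First pass: cheap term-overlap score -- fast but imprecise.
    scored = [(sum(t in d for t in query_terms), d) for d in docs]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for _, d in scored[:k]]

def rerank(query_terms, candidates, top_n):
    # Second pass: a "more expensive" score, here weighting matches
    # by how early the term appears in the document.
    def score(d):
        return sum(1 / (d.index(t) + 1) for t in query_terms if t in d)
    return sorted(candidates, key=score, reverse=True)[:top_n]

docs = [
    "reranking models rank candidate texts by relevance",
    "quantum computing is a frontier field",
    "pre-trained language models advanced reranking",
]
candidates = retrieve(["reranking", "models"], docs, k=3)
top = rerank(["reranking", "models"], candidates, top_n=2)
print(top[0])  # -> "reranking models rank candidate texts by relevance"
```

In a real pipeline, `retrieve` would be a vector search over embeddings and `rerank` would call a reranking model such as qwen3-rerank.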

Prerequisites

Get an API key and set it as the DASHSCOPE_API_KEY environment variable. If you plan to call the API through an SDK, install the SDK first.

Rerank documents

Pass a query and a list of candidate documents to the API. The model returns the documents ranked by relevance.
OpenAI compatible (Python SDK):
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-api/v1",
)

# Reranking is not part of the standard OpenAI API surface, so the
# request goes through the client's generic POST helper.
results = client.post(
    "/reranks",
    body={
        "model": "qwen3-rerank",
        "query": "What is a reranking model",
        "documents": [
            "Reranking models are widely used in search engines and recommendation systems to rank candidate texts by relevance",
            "Quantum computing is a frontier field of computational science",
            "The development of pre-trained language models has brought new advances to reranking models"
        ],
        "top_n": 2
    },
    cast_to=object
)

print(results)
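
If you need the ranked documents rather than the raw response, map the returned indices back to your input list. The response shape below is a hypothetical illustration (rerank APIs typically return each document's original index and a relevance score); check the API reference for the exact field names:

```python
# Hypothetical response shape for illustration only -- the real field
# names may differ; consult the API reference.
sample_response = {
    "results": [
        {"index": 0, "relevance_score": 0.95},
        {"index": 2, "relevance_score": 0.62},
    ]
}

documents = [
    "Reranking models are widely used in search engines and recommendation systems to rank candidate texts by relevance",
    "Quantum computing is a frontier field of computational science",
    "The development of pre-trained language models has brought new advances to reranking models",
]

# Map each ranked result back to the original document by index.
ranked = [(r["relevance_score"], documents[r["index"]])
          for r in sample_response["results"]]
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```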

Core features

Use instructions to improve ranking (instruct)

The instruct parameter guides the model to use different ranking strategies. Write instructions in English.
  • QA retrieval (default): "Given a web search query, retrieve relevant passages that answer the query."
    • Focus: finding answers. For the query "How to prevent colds?", "Washing hands frequently prevents colds" scores higher than "The common cold is a widespread illness" (topically related but doesn't answer the question).
  • Semantic similarity: "Retrieve semantically similar text."
    • Focus: semantic equivalence regardless of wording. Example: "How to change my password?" matches "What if I forgot my password?" in an FAQ scenario.
If unset, the model defaults to QA retrieval. For more task instruction examples, see the model repository.
OpenAI compatible (curl):
curl --request POST \
  --url https://dashscope-intl.aliyuncs.com/compatible-api/v1/reranks \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "qwen3-rerank",
    "query": "How to change my password?",
    "documents": [
      "Click Settings > Security > Change Password to update your credentials",
      "What if I forgot my password?",
      "Our platform supports two-factor authentication"
    ],
    "instruct": "Retrieve semantically similar text."
}'

Return top results (top_n)

Use top_n to return only the highest-ranked documents. If omitted, all documents are returned sorted by relevance. If top_n exceeds the total document count, all documents are returned.
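
A minimal sketch of the behavior described above, with top_n clamped to the document count:

```python
def effective_top_n(top_n, num_documents):
    # If top_n is omitted, all documents are returned sorted by
    # relevance; if it exceeds the document count, it is clamped.
    if top_n is None:
        return num_documents
    return min(top_n, num_documents)

print(effective_top_n(2, 5))     # 2: only the two highest-ranked docs
print(effective_top_n(None, 5))  # 5: all docs, sorted by relevance
print(effective_top_n(10, 5))    # 5: top_n exceeds the document count
```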

Model overview

qwen3-rerank
  • Max documents: 500
  • Max tokens per document: 4,000
  • Max tokens per request: 120,000
  • Languages: 100+ languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian
  • Price: $0.11 per 1M tokens
  • Free quota: 1M tokens (valid for 90 days)
  • Use cases: semantic text search, RAG applications
Key terms:
  • Max tokens per document: Maximum token count per query or document. Content exceeding this limit is truncated. Results are computed on truncated content only, which may reduce ranking accuracy.
  • Max documents: Maximum number of documents per request.
  • Max tokens per request: Calculated as query tokens × document count + total document tokens. This value must not exceed the per-request limit.
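
The per-request formula can be checked with simple arithmetic. For example, a 20-token query reranking 100 documents of 500 tokens each:

```python
def request_tokens(query_tokens, doc_token_counts):
    # The query is scored against every document, so its tokens
    # count once per document, plus all document tokens.
    return query_tokens * len(doc_token_counts) + sum(doc_token_counts)

total = request_tokens(20, [500] * 100)
print(total)             # 2_000 + 50_000 = 52_000
print(total <= 120_000)  # True: within the per-request limit
```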

API reference

Error codes

If a call fails, see Error messages.

Rate limits

See Rate limits.