
Reranking

Improve search accuracy

Retrieval systems prioritize speed, so results may lack precision. Reranking models re-score retrieved documents to place the most relevant results at the top, significantly improving search accuracy.
When reranking helps most: Reranking gives the biggest accuracy boost when your initial retrieval returns 20-100+ candidates with mixed relevance. If your retrieval already returns highly relevant results (such as exact keyword match), reranking adds less value. A typical RAG pipeline: retrieve 50-100 candidates with embeddings, rerank to top 5-10, then pass those to the LLM.
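
The retrieve-then-rerank pipeline above can be sketched end to end. Both scoring functions below are deliberately crude stand-ins (term overlap for retrieval, position-weighted overlap for reranking), not real embedding or reranking models:

```python
def retrieve(query_terms, docs, k):
    # First pass: cheap term-overlap score -- fast but imprecise.
    scored = [(sum(t in d for t in query_terms), d) for d in docs]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for _, d in scored[:k]]

def rerank(query_terms, candidates, top_n):
    # Second pass: a "more expensive" score, here weighting matches
    # by how early the term appears in the document.
    def score(d):
        return sum(1 / (d.index(t) + 1) for t in query_terms if t in d)
    return sorted(candidates, key=score, reverse=True)[:top_n]

docs = [
    "reranking models rank candidate texts by relevance",
    "quantum computing is a frontier field",
    "pre-trained language models advanced reranking",
]
candidates = retrieve(["reranking", "models"], docs, k=3)
top = rerank(["reranking", "models"], candidates, top_n=2)
print(top[0])  # -> "reranking models rank candidate texts by relevance"
```

In a real pipeline, `retrieve` would be a vector search over embeddings and `rerank` would call a reranking model such as qwen3-rerank.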

Prerequisites

Get an API key and set it as the DASHSCOPE_API_KEY environment variable. If you plan to call the API through an SDK, install the SDK first.

Rerank documents

Pass a query and a list of candidate documents to the API. The model returns the documents ranked by relevance.
OpenAI compatible (Python SDK):
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-api/v1",
)

# Reranking is not part of the standard OpenAI API surface, so the
# request goes through the client's generic POST helper.
results = client.post(
    "/reranks",
    body={
        "model": "qwen3-rerank",
        "query": "What is a reranking model",
        "documents": [
            "Reranking models are widely used in search engines and recommendation systems to rank candidate texts by relevance",
            "Quantum computing is a frontier field of computational science",
            "The development of pre-trained language models has brought new advances to reranking models"
        ],
        "top_n": 2
    },
    cast_to=object
)

print(results)
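
If you need the ranked documents rather than the raw response, map the returned indices back to your input list. The response shape below is a hypothetical illustration (rerank APIs typically return each document's original index and a relevance score); check the API reference for the exact field names:

```python
# Hypothetical response shape for illustration only -- the real field
# names may differ; consult the API reference.
sample_response = {
    "results": [
        {"index": 0, "relevance_score": 0.95},
        {"index": 2, "relevance_score": 0.62},
    ]
}

documents = [
    "Reranking models are widely used in search engines and recommendation systems to rank candidate texts by relevance",
    "Quantum computing is a frontier field of computational science",
    "The development of pre-trained language models has brought new advances to reranking models",
]

# Map each ranked result back to the original document by index.
ranked = [(r["relevance_score"], documents[r["index"]])
          for r in sample_response["results"]]
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```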

Core features

Use instructions to improve ranking (instruct)

The instruct parameter guides the model to use different ranking strategies. Write instructions in English.
  • QA retrieval (default): "Given a web search query, retrieve relevant passages that answer the query."
    • Focus: finding answers. For the query "How to prevent colds?", "Washing hands frequently prevents colds" scores higher than "The common cold is a widespread illness" (topically related but doesn't answer the question).
  • Semantic similarity: "Retrieve semantically similar text."
    • Focus: semantic equivalence regardless of wording. Example: "How to change my password?" matches "What if I forgot my password?" in an FAQ scenario.
If unset, the model defaults to QA retrieval. For more task instruction examples, see the model repository.
OpenAI compatible (curl):
curl --request POST \
  --url https://dashscope-intl.aliyuncs.com/compatible-api/v1/reranks \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "qwen3-rerank",
    "query": "How to change my password?",
    "documents": [
      "Click Settings > Security > Change Password to update your credentials",
      "What if I forgot my password?",
      "Our platform supports two-factor authentication"
    ],
    "instruct": "Retrieve semantically similar text."
}'

Return top results (top_n)

Use top_n to return only the highest-ranked documents. If omitted, all documents are returned sorted by relevance. If top_n exceeds the total document count, all documents are returned.
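
A minimal sketch of the behavior described above, with top_n clamped to the document count:

```python
def effective_top_n(top_n, num_documents):
    # If top_n is omitted, all documents are returned sorted by
    # relevance; if it exceeds the document count, it is clamped.
    if top_n is None:
        return num_documents
    return min(top_n, num_documents)

print(effective_top_n(2, 5))     # 2: only the two highest-ranked docs
print(effective_top_n(None, 5))  # 5: all docs, sorted by relevance
print(effective_top_n(10, 5))    # 5: top_n exceeds the document count
```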

Model overview

qwen3-rerank
  • Max documents: 500
  • Max tokens per document: 4,000
  • Max tokens per request: 120,000
  • Languages: 100+ languages, including Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian
  • Price: $0.11 per 1M tokens
  • Free quota: 1M tokens (valid for 90 days)
  • Use cases: semantic text search, RAG applications
Key terms:
  • Max tokens per document: Maximum token count per query or document. Content exceeding this limit is truncated. Results are computed on truncated content only, which may reduce ranking accuracy.
  • Max documents: Maximum number of documents per request.
  • Max tokens per request: Calculated as query tokens × document count + total document tokens. This value must not exceed the per-request limit.
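
The per-request formula can be checked with simple arithmetic. For example, a 20-token query reranking 100 documents of 500 tokens each:

```python
def request_tokens(query_tokens, doc_token_counts):
    # The query is scored against every document, so its tokens
    # count once per document, plus all document tokens.
    return query_tokens * len(doc_token_counts) + sum(doc_token_counts)

total = request_tokens(20, [500] * 100)
print(total)             # 2_000 + 50_000 = 52_000
print(total <= 120_000)  # True: within the per-request limit
```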

API reference

Error codes

If a call fails, see Error messages.

Rate limits

See Rate limits.