OpenAI-compatible reranking

POST /reranks

Example request (qwen3-rerank):

curl --request POST \
  --url https://dashscope-intl.aliyuncs.com/compatible-api/v1/reranks \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "qwen3-rerank",
    "documents": [
      "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance.",
      "Quantum computing is a cutting-edge field of computer science.",
      "The development of pre-trained language models has brought new advancements to rerank models."
    ],
    "query": "What is a rerank model?",
    "top_n": 2,
    "instruct": "Given a web search query, retrieve relevant passages that answer the query."
  }'

Example response:

{
  "id": "<string>",
  "object": "list",
  "model": "qwen3-rerank",
  "results": [
    {
      "document": {
        "text": "<string>"
      },
      "index": 0,
      "relevance_score": 0.9334521178273196
    }
  ],
  "usage": {
    "total_tokens": 0
  }
}

Rerank documents by semantic relevance to a query using qwen3-rerank.

The gte-rerank model will be discontinued on May 30, 2026. Switch to qwen3-rerank for continued service.

Before you call the API, get an API key and set it as the DASHSCOPE_API_KEY environment variable, which the examples on this page read. If you use the OpenAI SDK, install it first.
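A minimal Bash sketch of that setup step, assuming a Bash shell: export the key once per session, then fail fast if it is missing before calling the endpoint. The helper name require_api_key is hypothetical, and <your-api-key> is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return a non-zero status if DASHSCOPE_API_KEY (the
# variable the curl example on this page reads) is unset or empty.
require_api_key() {
  if [ -z "${DASHSCOPE_API_KEY:-}" ]; then
    echo "DASHSCOPE_API_KEY is not set; export it before calling /reranks" >&2
    return 1
  fi
  return 0
}

# Typical usage (placeholder key, replace with your own):
#   export DASHSCOPE_API_KEY="<your-api-key>"
#   require_api_key && curl --request POST ...
```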

Supported model: qwen3-rerank only.

Endpoint

HTTP: POST https://dashscope-intl.aliyuncs.com/compatible-api/v1/reranks
SDK base_url: https://dashscope-intl.aliyuncs.com/compatible-api/v1
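The two values above differ only by the /reranks path. A small Bash sketch of that relationship, assuming you keep the base URL in a variable; rerank_endpoint is a hypothetical helper name, not part of any SDK.

```shell
#!/usr/bin/env bash
# The HTTP endpoint is the SDK base_url with "/reranks" appended.
BASE_URL="https://dashscope-intl.aliyuncs.com/compatible-api/v1"

rerank_endpoint() {
  # Drop one trailing slash so ".../v1/" and ".../v1" give the same result.
  printf '%s/reranks\n' "${1%/}"
}

rerank_endpoint "$BASE_URL"
# prints https://dashscope-intl.aliyuncs.com/compatible-api/v1/reranks
```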

Model overview

Model	Max Documents	Max Tokens/Doc	Max Request Tokens	Languages	Price (per 1M tokens)	Free Quota	Use Cases
qwen3-rerank	500	4,000	120,000	100+ languages	$0.1	1M tokens (valid for 90 days)	Text semantic search, RAG

Column definitions:

Max Documents: Maximum number of documents per request.
Max Tokens/Doc: Maximum token count for the query and for each individual document. Content beyond this limit is truncated, which may reduce ranking accuracy.
Max Request Tokens: The request total, computed as (query tokens × document count) + total document tokens. This total must not exceed 120,000.
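The request-token formula can be checked before sending a request. A minimal Bash sketch, assuming you already have token counts for the query and each document (nothing here tokenizes text); request_tokens and fits_request_limit are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Documented formula:
#   request tokens = query tokens × document count + total document tokens
MAX_REQUEST_TOKENS=120000

# Usage: request_tokens QUERY_TOKENS DOC_TOKENS...
request_tokens() {
  local query_tokens=$1
  shift
  local doc_count=$# doc_total=0 t
  for t in "$@"; do
    doc_total=$((doc_total + t))
  done
  echo $((query_tokens * doc_count + doc_total))
}

# Succeeds (status 0) if the request stays within MAX_REQUEST_TOKENS.
fits_request_limit() {
  [ "$(request_tokens "$@")" -le "$MAX_REQUEST_TOKENS" ]
}

# A 10-token query against documents of 25, 12 and 18 tokens:
request_tokens 10 25 12 18   # prints 85 (10 × 3 + 55)
```

Because the query is counted once per document, a 40-token query with 500 documents of 200 tokens each already reaches 40 × 500 + 100,000 = 120,000 tokens, exactly the limit.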

Authorizations

Authorization (header, string, required)

Qwen Cloud API key, sent as Bearer <API key>. Create one in the console.

Body

application/json

model (enum<string>, required)

Model name. Must be qwen3-rerank for the text reranking endpoint.

Available options: qwen3-rerank

Example: qwen3-rerank

query (string, required)

Query text. Max 4,000 tokens.

Example: What is a reranking model?

documents (string[], required)

Documents to rank, as an array of strings. Max 500 documents.

Example:

[
  "Reranking models are widely used in search engines and recommendation systems to sort candidates by relevance",
  "Quantum computing is a frontier field of computer science",
  "The development of pre-trained language models has brought new advances to reranking"
]
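When documents come from shell variables, they need JSON escaping before they go into the request body. A minimal Bash sketch, assuming the only special characters present are backslash, double quote, newline and tab (other control characters are not handled); json_string and documents_json are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Build the JSON string array for the "documents" field from Bash arguments.

json_string() {
  local s=$1 bs='\' dq='"'
  s=${s//"$bs"/"$bs$bs"}   # \       -> \\  (must run first)
  s=${s//"$dq"/"$bs$dq"}   # "       -> \"
  s=${s//$'\n'/"${bs}n"}   # newline -> \n
  s=${s//$'\t'/"${bs}t"}   # tab     -> \t
  printf '"%s"' "$s"
}

documents_json() {
  local out="[" sep="" d
  for d in "$@"; do
    out+="$sep$(json_string "$d")"
    sep=","
  done
  printf '%s]\n' "$out"
}

DOCS=(
  "Quantum computing is a cutting-edge field of computer science."
  'A "quoted" phrase'
)
documents_json "${DOCS[@]}"
```

The printed array can serve as the documents value when assembling the --data body. For arbitrary user text, a dedicated JSON tool is the safer choice.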

top_n (integer)

Number of top-ranked results to return. If omitted, all documents are returned.

Example: 2

Required range: top_n >= 1
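The documented limits on documents and top_n can be checked on the client before a request is sent. A minimal Bash sketch; check_rerank_args is a hypothetical helper, and the service still performs its own validation.

```shell
#!/usr/bin/env bash
# Client-side pre-checks: at most 500 documents, and top_n (when given)
# must be an integer >= 1.
MAX_DOCUMENTS=500

# Usage: check_rerank_args TOP_N DOC_COUNT   (pass "" as TOP_N to omit it)
check_rerank_args() {
  local top_n=$1 doc_count=$2
  if [ "$doc_count" -gt "$MAX_DOCUMENTS" ]; then
    echo "documents: at most $MAX_DOCUMENTS allowed, got $doc_count" >&2
    return 1
  fi
  if [ -n "$top_n" ]; then
    if ! [[ $top_n =~ ^[0-9]+$ ]] || [ "$top_n" -lt 1 ]; then
      echo "top_n: must be an integer >= 1, got $top_n" >&2
      return 1
    fi
  fi
  return 0
}
```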

instruct (string)

Custom instruction describing the ranking task. English is recommended. If omitted, the default QA retrieval instruction is used: "Given a web search query, retrieve relevant passages that answer the query."

Example: Given a web search query, retrieve relevant passages that answer the query.

Response

200 (application/json)

id (string)

Unique request identifier.

object (string)

Object type. Always list.

Example: list

model (string)

Model used for reranking.

Example: qwen3-rerank

results (object[])

Ranked results, sorted by relevance_score in descending order.

results[].document (object)

The original document. Only returned when return_documents is true.

results[].document.text (string)

Text content of the document.

results[].index (integer)

Zero-based position of the document in the input documents array.

Example: 0
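Because index is the position of the document in the documents array you sent, the original text can be recovered even when the response omits document. A minimal Bash sketch, assuming zero-based positions (consistent with the example value 0) and assuming the ranked indices have already been extracted from the response, for example with a JSON tool; print_ranked is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# The same documents, in the same order, as sent in the request.
DOCS=(
  "Rerank models are widely used in search engines and recommendation systems. They sort candidate documents based on text relevance."
  "Quantum computing is a cutting-edge field of computer science."
  "The development of pre-trained language models has brought new advancements to rerank models."
)

# Usage: print_ranked INDEX...   (indices in the order the API returned them)
print_ranked() {
  local rank=1 i
  for i in "$@"; do
    printf '%d. %s\n' "$rank" "${DOCS[$i]}"
    rank=$((rank + 1))
  done
}

# If the response listed indices 0 and then 2:
print_ranked 0 2
```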

results[].relevance_score (number)

Relevance score between 0.0 and 1.0; higher means more relevant. Scores are relative to the current request and should not be compared across requests.

Example: 0.9334521178273196

usage (object)

Token usage statistics.

usage.total_tokens (integer)

Total tokens consumed by this request.
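usage.total_tokens can be combined with the list price in the model overview ($0.1 per 1M tokens) for a rough cost estimate. A minimal Bash sketch using awk for the decimal arithmetic; it ignores the free quota and any billing rounding, so actual charges may differ. estimate_cost_usd is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List price for qwen3-rerank from the model overview table.
PRICE_PER_MILLION_USD=0.1

# Usage: estimate_cost_usd TOTAL_TOKENS
estimate_cost_usd() {
  awk -v t="$1" -v p="$PRICE_PER_MILLION_USD" \
    'BEGIN { printf "%.6f\n", t / 1000000 * p }'
}

# A request at the 120,000-token limit:
estimate_cost_usd 120000   # prints 0.012000
```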
