Batch API

Process bulk requests asynchronously at 50% off

Process bulk qwen-max, qwen-plus, qwen-flash, or qwen-turbo requests asynchronously at 50% of the real-time price. Results are delivered within 24 hours.
New to Qwen Cloud? Set up your API client first.

Input file format

Each line in the JSONL input file is one request:
{"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-plus","messages":[{"role":"user","content":"Summarize quantum computing in two sentences."}]}}
{"custom_id":"req-2","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-plus","messages":[{"role":"user","content":"What is 2+2?"}]}}
Set url to /v1/chat/completions for every request. Limits: up to 50,000 requests per file, 500 MB per file, and 6 MB per line. All requests in a file must use the same model, and each custom_id must be unique.
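Checking a file locally before uploading can catch these limit violations early. The sketch below is one possible pre-flight check, not part of the Batch API: the function name validate_batch_lines is hypothetical, and it assumes 1 MB means 1024 * 1024 bytes, which the limits above do not specify.

```python
import json

MAX_REQUESTS = 50_000
MAX_FILE_BYTES = 500 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_LINE_BYTES = 6 * 1024 * 1024
EXPECTED_URL = "/v1/chat/completions"


def validate_batch_lines(lines):
    """Return a list of human-readable problems found in JSONL request lines."""
    problems = []
    seen_ids = set()
    models = set()
    total_bytes = 0
    request_count = 0

    for number, line in enumerate(lines, start=1):
        size = len(line.encode("utf-8"))
        total_bytes += size
        if not line.strip():
            continue  # ignore blank lines
        request_count += 1
        if size > MAX_LINE_BYTES:
            problems.append(f"line {number}: larger than 6 MB")
        try:
            request = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(request, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue

        custom_id = request.get("custom_id")
        if not isinstance(custom_id, str) or not custom_id:
            problems.append(f"line {number}: missing custom_id")
        elif custom_id in seen_ids:
            problems.append(f"line {number}: duplicate custom_id {custom_id!r}")
        else:
            seen_ids.add(custom_id)

        if request.get("url") != EXPECTED_URL:
            problems.append(f"line {number}: url must be {EXPECTED_URL}")

        body = request.get("body")
        model = body.get("model") if isinstance(body, dict) else None
        if model is None:
            problems.append(f"line {number}: missing body.model")
        else:
            models.add(str(model))

    if request_count > MAX_REQUESTS:
        problems.append(f"{request_count} requests exceeds the 50,000 limit")
    if total_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 500 MB")
    if len(models) > 1:
        problems.append(f"file mixes models: {sorted(models)}")
    return problems
```

To check a file on disk, pass the open file object, for example validate_batch_lines(open("input.jsonl", encoding="utf-8")). An empty list means no problems were found by these checks; the service may still apply validation of its own.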

Run a batch

Upload file

from pathlib import Path

# `client` is the API client created during setup
file_object = client.files.create(
  file=Path("input.jsonl"),
  purpose="batch"
)
print(file_object.id)  # <-- use this in the next step
Response (key fields):
{"id": "file-batch-xxx", "status": "uploaded", "purpose": "batch"}

Create batch

batch = client.batches.create(
  input_file_id="file-batch-xxx",       # <-- from upload step
  endpoint="/v1/chat/completions",       # <-- must match url in input file
  completion_window="24h",               # <-- 24h to 336h (14 days)
  metadata={
    "ds_name": "My batch job",         # <-- optional: task name (max 100 chars)
    "ds_description": "Weekly report", # <-- optional: task description (max 200 chars)
  }
)
print(batch.id)
Dry-run with the test model: Use model batch-test-model with endpoint /v1/chat/ds-test to validate your file format without inference costs. Limits: 1 MB file, 100 lines, 2 concurrent tasks.
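A dry-run file can be derived from a real input file rather than written by hand. The sketch below is a hypothetical helper (to_dry_run is not part of the API) that keeps the first 100 requests and swaps in the test model. It also rewrites each line's url to /v1/chat/ds-test, on the assumption that the url in the file must match the endpoint, as it does for regular batches. It does not enforce the 1 MB file limit.

```python
import json

TEST_MODEL = "batch-test-model"
TEST_ENDPOINT = "/v1/chat/ds-test"
MAX_TEST_LINES = 100


def to_dry_run(lines, max_lines=MAX_TEST_LINES):
    """Return up to max_lines JSONL strings rewritten for the test model."""
    out = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if len(out) >= max_lines:
            break
        request = json.loads(line)
        request["url"] = TEST_ENDPOINT        # assumption: must match the endpoint
        request["body"]["model"] = TEST_MODEL
        out.append(json.dumps(request, ensure_ascii=False))
    return out
```

Writing "\n".join(to_dry_run(open("input.jsonl", encoding="utf-8"))) to a new file gives a candidate dry-run input to upload with endpoint="/v1/chat/ds-test".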

Check status

batch = client.batches.retrieve("batch_xxx")
print(batch.status)  # <-- see status lifecycle below
Status lifecycle: validating → in_progress → finalizing → completed. Terminal states: completed, failed, expired, cancelled. Poll every 1–2 minutes.
Response (key fields):
{
  "id": "batch_xxx",
  "status": "completed",
  "output_file_id": "file-batch_output-xxx",  // <-- download this
  "error_file_id": "file-batch_error-xxx",     // <-- failed requests (if any)
  "request_counts": {"total": 100, "completed": 98, "failed": 2}
}
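The polling advice above can be wrapped in a small loop that stops at any terminal state. This is a minimal sketch: wait_for_batch and its parameters are hypothetical names, the 90-second default is just one value inside the suggested 1–2 minute range, and the only client call used is client.batches.retrieve as shown earlier.

```python
import time

TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client, batch_id, poll_seconds=90, timeout_seconds=None, sleep=time.sleep):
    """Poll a batch until it reaches a terminal state, then return it."""
    waited = 0
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATES:
            return batch
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError(f"batch {batch_id} still {batch.status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The sleep parameter exists only so the loop can be exercised without real delays. A caller would typically check the returned batch.status and read output_file_id or error_file_id from it.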

Download results

content = client.files.content("file-batch_output-xxx")  # <-- output_file_id from above
content.write_to_file("result.jsonl")
Each line in the output JSONL maps to a request by custom_id:
{"id": "batch_req_xxx", "custom_id": "req-1", "response": {"status_code": 200, "body": {"choices": [{"message": {"content": "..."}}], "usage": {...}}}}
Download the error file (error_file_id) the same way to inspect failed requests. See error codes for details.
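Because each output line carries its custom_id, results can be keyed by that ID instead of by line position. The sketch below assumes the output shape shown above; split_results is a hypothetical helper, and treating any line without a 200 status_code or without choices as a failure is this sketch's own convention.

```python
import json


def split_results(lines):
    """Return (answers, failures), both keyed by custom_id."""
    answers, failures = {}, {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        custom_id = record.get("custom_id")
        response = record.get("response") or {}
        body = response.get("body") or {}
        choices = body.get("choices") or []
        if response.get("status_code") == 200 and choices:
            answers[custom_id] = choices[0]["message"]["content"]
        else:
            failures[custom_id] = record  # keep the whole record for inspection
    return answers, failures
```

For example, answers, failures = split_results(open("result.jsonl", encoding="utf-8")) gives a dictionary that can be joined back to the original requests by custom_id.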

Manage batches

List batches

batches = client.batches.list(limit=10)
Filter parameters (query string): ds_name (fuzzy match), input_file_ids (comma-separated, max 20), status (comma-separated), create_after / create_before (format: yyyyMMddHHmmss), after (cursor), limit (page size).
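The filter values have specific string formats, which a small helper can produce. The sketch below only builds a dictionary of query parameters from the names listed above; build_list_filters is a hypothetical helper, and the time zone the service applies to create_after / create_before is not stated here, so datetimes are formatted as given.

```python
from datetime import datetime

TIME_FORMAT = "%Y%m%d%H%M%S"  # yyyyMMddHHmmss


def build_list_filters(ds_name=None, input_file_ids=None, statuses=None,
                       create_after=None, create_before=None):
    """Build query-string filters for listing batches; omit unset values."""
    filters = {}
    if ds_name:
        filters["ds_name"] = ds_name
    if input_file_ids:
        if len(input_file_ids) > 20:
            raise ValueError("input_file_ids accepts at most 20 IDs")
        filters["input_file_ids"] = ",".join(input_file_ids)
    if statuses:
        filters["status"] = ",".join(statuses)
    if create_after:
        filters["create_after"] = create_after.strftime(TIME_FORMAT)
    if create_before:
        filters["create_before"] = create_before.strftime(TIME_FORMAT)
    return filters
```

If client is an OpenAI-compatible SDK client whose methods accept an extra_query argument (an assumption about your setup), the result might be passed as client.batches.list(limit=10, extra_query=filters).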

Cancel a batch

client.batches.cancel("batch_xxx")
The status moves to cancelling, then to cancelled once in-flight requests finish. Requests that completed before cancellation are still billed.

Utility scripts

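Convert a CSV file to a JSONL input file (expects two columns per row: custom_id, content):
import csv, json

def build_messages(content):
  return [{"role": "user", "content": content}]

# newline="" lets the csv module handle line endings inside quoted fields
with open("input.csv", newline="", encoding="utf-8") as fin, \
     open("input.jsonl", "w", encoding="utf-8") as fout:
  for row in csv.reader(fin):
    if not row:
      continue  # skip blank lines
    request = {
      "custom_id": row[0],
      "method": "POST",
      "url": "/v1/chat/completions",
      "body": {"model": "qwen-plus", "messages": build_messages(row[1])}
    }
    fout.write(json.dumps(request, ensure_ascii=False) + "\n")
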
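Convert the result JSONL to a CSV file:
import json, csv

columns = ["custom_id", "status_code", "content", "usage"]

def get(obj, path):
  # Follow dict keys and list indexes; return None if any step is missing
  for key in path:
    if isinstance(obj, dict):
      obj = obj.get(key)
    elif isinstance(obj, list) and isinstance(key, int) and 0 <= key < len(obj):
      obj = obj[key]
    else:
      return None
  return obj

with open("result.jsonl", encoding="utf-8") as fin, \
     open("result.csv", "w", newline="", encoding="utf-8") as fout:
  writer = csv.writer(fout)
  writer.writerow(columns)
  for line in fin:
    r = json.loads(line)
    writer.writerow([
      r.get("custom_id"),
      get(r, ["response", "status_code"]),
      get(r, ["response", "body", "choices", 0, "message", "content"]),
      get(r, ["response", "body", "usage"]),
    ])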

Notes

  • 50% discount: Input and output tokens are billed at half the real-time price. Only successful requests are billed. See pricing.
  • Thinking tokens: Models like qwen3.6-plus, qwen3.5-plus, and qwen3.5-flash enable thinking by default, generating extra tokens at output price. Set enable_thinking based on task needs. See thinking.
  • Not stackable: Batch discount does not stack with context cache or other discounts.
  • File storage: 10,000 files / 100 GB per account. Delete old files to free space.
  • Rate limits: Create 1,000/min (1,000 concurrent), query 1,000/min, list 100/min, cancel 1,000/min.
  • Task retention: Only tasks from the last 30 days are queryable via list.
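The billing notes above reduce to simple arithmetic. The sketch below is illustrative only: estimate_batch_cost is a hypothetical helper, prices are assumed to be quoted per million tokens, and the example prices are placeholders rather than real rates (see pricing for actual values). Thinking tokens, where enabled, count toward output_tokens.

```python
def estimate_batch_cost(input_tokens, output_tokens,
                        input_price_per_million, output_price_per_million,
                        discount=0.5):
    """Estimate batch cost as a fraction of the real-time price."""
    realtime_cost = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000
    return realtime_cost * discount


# Placeholder prices, not real rates: 1.0 per million input tokens, 4.0 per million output tokens
cost = estimate_batch_cost(2_000_000, 1_000_000, 1.0, 4.0)
print(cost)  # 3.0 (half of the 6.0 real-time cost)
```

Only successful requests are billed, so token counts should come from the usage fields of successful lines in the output file.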