
Batch API

Process bulk requests asynchronously at 50% off

Process bulk qwen-max, qwen-plus, qwen-flash, or qwen-turbo requests asynchronously at 50% of the real-time price. Results are delivered within 24 hours.

Input file format

Each line in the JSONL input file is one request:
{"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-plus","messages":[{"role":"user","content":"Summarize quantum computing in two sentences."}]}}
{"custom_id":"req-2","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-plus","messages":[{"role":"user","content":"What is 2+2?"}]}}
Set url to /v1/chat/completions for all requests. Limits: up to 50,000 requests per file, 500 MB total file size, 6 MB per line. All requests in a file must use the same model, and each custom_id must be unique.

Run a batch

Upload file

import os
from pathlib import Path
from openai import OpenAI
client = OpenAI(api_key=os.getenv("DASHSCOPE_API_KEY"), base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1")

file_object = client.files.create(
  file=Path("input.jsonl"),
  purpose="batch"
)
print(file_object.id)  # <-- use this in the next step
Response (key fields):
{"id": "file-batch-xxx", "status": "uploaded", "purpose": "batch"}

Create batch

batch = client.batches.create(
  input_file_id="file-batch-xxx",       # <-- from upload step
  endpoint="/v1/chat/completions",       # <-- must match url in input file
  completion_window="24h",               # <-- 24h to 336h (14 days)
  metadata={
    "ds_name": "My batch job",         # <-- optional: task name (max 100 chars)
    "ds_description": "Weekly report", # <-- optional: task description (max 200 chars)
  }
)
print(batch.id)
Dry-run with the test model: Use model batch-test-model with endpoint /v1/chat/ds-test to validate your file format without inference costs. Limits: 1 MB file, 100 lines, 2 concurrent tasks.
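A dry-run can be sketched as follows. The one-line input file and the dry-run-1 id are made up for illustration, and the commented calls assume the client from the upload step:

```python
import json

# One-line input file targeting the test model (no inference cost).
test_request = {
    "custom_id": "dry-run-1",
    "method": "POST",
    "url": "/v1/chat/ds-test",  # test endpoint; must match the batch endpoint
    "body": {"model": "batch-test-model",
             "messages": [{"role": "user", "content": "ping"}]},
}
with open("dry_run.jsonl", "w") as f:
    f.write(json.dumps(test_request) + "\n")

# Then upload and create the batch exactly as in the steps above,
# pointing the endpoint at the test URL:
# file_object = client.files.create(file=Path("dry_run.jsonl"), purpose="batch")
# client.batches.create(input_file_id=file_object.id,
#                       endpoint="/v1/chat/ds-test",
#                       completion_window="24h")
```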

Check status

batch = client.batches.retrieve("batch_xxx")
print(batch.status)  # <-- see status lifecycle below
Status lifecycle: validating → in_progress → finalizing → completed. Terminal states: completed, failed, expired, cancelled. Poll every 1–2 minutes. Response (key fields):
{
  "id": "batch_xxx",
  "status": "completed",
  "output_file_id": "file-batch_output-xxx",  // <-- download this
  "error_file_id": "file-batch_error-xxx",     // <-- failed requests (if any)
  "request_counts": {"total": 100, "completed": 98, "failed": 2}
}
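The polling advice above can be wrapped in a small helper. This is a sketch; wait_for_batch and the 90-second default are our own names, not part of the SDK:

```python
import time

TERMINAL = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(client, batch_id, interval=90):
    """Poll every `interval` seconds until the batch reaches a terminal state."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL:
            return batch
        time.sleep(interval)  # poll every 1-2 minutes
```

Usage: batch = wait_for_batch(client, "batch_xxx"), then branch on batch.status before downloading output_file_id.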

Download results

content = client.files.content("file-batch_output-xxx")  # <-- output_file_id from above
content.write_to_file("result.jsonl")
Each line in the output JSONL maps to a request by custom_id:
{"id": "batch_req_xxx", "custom_id": "req-1", "response": {"status_code": 200, "body": {"choices": [{"message": {"content": "..."}}], "usage": {...}}}}
Download the error file (error_file_id) the same way to inspect failed requests. See error codes for details.
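A small helper for inspecting the error file might look like this. It assumes error lines follow the same shape as output lines, with the failure detail under an error field or response.body:

```python
import json

def summarize_errors(error_jsonl_text):
    """Map custom_id -> (status_code, error detail) from an error file's text."""
    failures = {}
    for line in error_jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        resp = rec.get("response") or {}
        failures[rec.get("custom_id")] = (resp.get("status_code"),
                                          rec.get("error") or resp.get("body"))
    return failures

# Usage (error_file_id from the status response):
# err = client.files.content("file-batch_error-xxx")
# print(summarize_errors(err.text))
```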

Manage batches

List batches

batches = client.batches.list(limit=10)
Filter parameters (query string): ds_name (fuzzy match), input_file_ids (comma-separated, max 20), status (comma-separated), create_after / create_before (format: yyyyMMddHHmmss), after (cursor), limit (page size).
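The OpenAI SDK's batches.list() only exposes after and limit directly; the DashScope-specific filters can be passed through the SDK's generic extra_query parameter, which appends arbitrary keys to the query string. A sketch, with example filter values:

```python
filters = {
    "ds_name": "report",               # fuzzy match on task name
    "status": "completed,failed",      # comma-separated statuses
    "create_after": "20240101000000",  # yyyyMMddHHmmss
}
# batches = client.batches.list(limit=20, extra_query=filters)
# for b in batches:
#     print(b.id, b.status)
```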

Cancel a batch

client.batches.cancel("batch_xxx")
Status moves to cancelling, then cancelled after in-flight requests finish. Completed requests before cancellation are still billed.

Utility scripts

Convert a CSV of custom_id,prompt rows into the input JSONL:

import csv, json

def build_messages(content):
  return [{"role": "user", "content": content}]

with open("input.csv") as fin, open("input.jsonl", "w") as fout:
  for row in csv.reader(fin):
    request = {
      "custom_id": row[0],
      "method": "POST",
      "url": "/v1/chat/completions",
      "body": {"model": "qwen-plus", "messages": build_messages(row[1])}
    }
    fout.write(json.dumps(request, ensure_ascii=False) + "\n")
Convert the result JSONL back to a CSV:

import json, csv

columns = ["custom_id", "status_code", "content", "usage"]

def get(obj, path):
  # Walk nested dicts and lists; path entries may be dict keys or list indexes.
  for key in path:
    try:
      obj = obj[key]
    except (KeyError, IndexError, TypeError):
      return None
  return obj

with open("result.jsonl") as fin, open("result.csv", "w", newline="") as fout:
  writer = csv.writer(fout)
  writer.writerow(columns)
  for line in fin:
    r = json.loads(line)
    writer.writerow([
      r.get("custom_id"),
      get(r, ["response", "status_code"]),
      get(r, ["response", "body", "choices", 0, "message", "content"]),
      get(r, ["response", "body", "usage"]),
    ])

Notes

  • 50% discount: Input and output tokens are billed at half the real-time price. Only successful requests are billed. See pricing.
  • Thinking tokens: Models like qwen3.6-plus, qwen3.5-plus, and qwen3.5-flash enable thinking by default, generating extra tokens at output price. Set enable_thinking based on task needs. See thinking.
  • Not stackable: Batch discount does not stack with context cache or other discounts.
  • File storage: 10,000 files / 100 GB per account. Delete old files to free space.
  • Rate limits: Create 1,000/min (1,000 concurrent), query 1,000/min, list 100/min, cancel 1,000/min.
  • Task retention: Only tasks from the last 30 days are queryable via list.
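For the thinking-token note above, a request line that opts out of thinking might look like this (an assumption: that enable_thinking is accepted inside each request's body, as in real-time calls):

```python
import json

request = {
    "custom_id": "req-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "qwen3.5-plus",
        "messages": [{"role": "user", "content": "Summarize this report."}],
        "enable_thinking": False,  # assumption: disables thinking tokens per request
    },
}
print(json.dumps(request, ensure_ascii=False))
```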