Process bulk requests asynchronously at 50% off
Process bulk
Each line in the JSONL input file is one request:
Set
Open Batch API in the Qwen Cloud console.
On the task list, view each task's progress (processed / total requests) and status. Filter by status to locate a task.
Click Cancel to stop a task that is
After the task reaches
On the Pay-As-You-Go page, view spending by model. Batch usage appears as a line item in the Spending Trends table. Data may lag by up to 1–2 hours.
Response (key fields):
Status lifecycle:
Each line in the output JSONL maps to a request by
Download the error file (
Filter parameters (query string):
Status moves to
qwen-max, qwen-plus, qwen-flash, or qwen-turbo requests asynchronously at 50% of the real-time price. Results are delivered within 24 hours. You can create and manage batch jobs using the Qwen Cloud console or the API.
Input file format
Each line in the JSONL input file is one request:
url to /v1/chat/completions for all requests. Up to 50,000 requests per file, 500 MB total, 6 MB per line. All requests must use the same model. Each custom_id must be unique.
Use the Qwen Cloud console
Open Batch API in the Qwen Cloud console.
Create a batch job
- Click Create batch job.
- Fill in the Task name and Description, set the Max wait time (1–14 days), and upload your JSONL input file.
- Click Create batch job to submit.
Click Sample File on the right to download a template JSONL file.
Monitor and manage tasks
On the task list, view each task's progress (processed / total requests) and status. Filter by status to locate a task.
Click Cancel to stop a task that is validating or in_progress. Click Detail to view job configuration, statistics, and files.
Download results
After the task reaches completed status, open the job detail page to download from Input & Output Files:
- Output file: Successful requests with their responses.
- Error file (if any): Failed requests with error details.
custom_id for matching against the original input.
View usage
On the Pay-As-You-Go page, view spending by model. Batch usage appears as a line item in the Spending Trends table. Data may lag by up to 1–2 hours.
Use the API
Upload file
Create batch
Dry-run with the test model: Use model
batch-test-model with endpoint /v1/chat/ds-test to validate your file format without inference costs. Limits: 1 MB file, 100 lines, 2 concurrent tasks.Check status
validating → in_progress → finalizing → completed. Terminal states: completed, failed, expired, cancelled. Poll every 1–2 minutes.
Response (key fields):
Download results
custom_id:
error_file_id) the same way to inspect failed requests. See error codes for details.
Manage batches
List batches
ds_name (fuzzy match), input_file_ids (comma-separated, max 20), status (comma-separated), create_after / create_before (format: yyyyMMddHHmmss), after (cursor), limit (page size).
Cancel a batch
cancelling, then cancelled after in-flight requests finish. Completed requests before cancellation are still billed.
Utility scripts
CSV to JSONL converter
CSV to JSONL converter
JSONL results to CSV converter
JSONL results to CSV converter
Notes
- 50% discount: Input and output tokens are billed at half the real-time price. Only successful requests are billed. See pricing.
- Thinking tokens: Models like
qwen3.6-plus,qwen3.5-plus, andqwen3.5-flashenable thinking by default, generating extra tokens at output price. Setenable_thinkingbased on task needs. See thinking. - Not stackable: Batch discount does not stack with context cache or other discounts.
- File storage: 10,000 files / 100 GB per account. Delete old files to free space.
- Rate limits: Create 1,000/min (1,000 concurrent), query 1,000/min, list 100/min, cancel 1,000/min.
- Task retention: Only tasks from the last 30 days are queryable via list.