Process bulk requests asynchronously at 50% off
Process bulk
Each line in the JSONL input file is one request:
Set
Response (key fields):
Status lifecycle:
Each line in the output JSONL maps to a request by
Download the error file (
Filter parameters (query string):
Status moves to
qwen-max, qwen-plus, qwen-flash, or qwen-turbo requests asynchronously at 50% of the real-time price. Results are delivered within 24 hours.
Input file format
Each line in the JSONL input file is one request:
url to /v1/chat/completions for all requests. Up to 50,000 requests per file, 500 MB total, 6 MB per line. All requests must use the same model. Each custom_id must be unique.
Run a batch
Upload file
Create batch
Dry-run with the test model: Use model
batch-test-model with endpoint /v1/chat/ds-test to validate your file format without inference costs. Limits: 1 MB file, 100 lines, 2 concurrent tasks.Check status
validating → in_progress → finalizing → completed. Terminal states: completed, failed, expired, cancelled. Poll every 1–2 minutes.
Response (key fields):
Download results
custom_id:
error_file_id) the same way to inspect failed requests. See error codes for details.
Manage batches
List batches
ds_name (fuzzy match), input_file_ids (comma-separated, max 20), status (comma-separated), create_after / create_before (format: yyyyMMddHHmmss), after (cursor), limit (page size).
Cancel a batch
cancelling, then cancelled after in-flight requests finish. Completed requests before cancellation are still billed.
Utility scripts
CSV to JSONL converter
CSV to JSONL converter
JSONL results to CSV converter
JSONL results to CSV converter
Notes
- 50% discount: Input and output tokens are billed at half the real-time price. Only successful requests are billed. See pricing.
- Thinking tokens: Models like
qwen3.6-plus,qwen3.5-plus, andqwen3.5-flashenable thinking by default, generating extra tokens at output price. Setenable_thinkingbased on task needs. See thinking. - Not stackable: Batch discount does not stack with context cache or other discounts.
- File storage: 10,000 files / 100 GB per account. Delete old files to free space.
- Rate limits: Create 1,000/min (1,000 concurrent), query 1,000/min, list 100/min, cancel 1,000/min.
- Task retention: Only tasks from the last 30 days are queryable via list.