Manage training datasets for fine-tuning models on Qwen Cloud.
Datasets are structured data files used to fine-tune models on Qwen Cloud. You can create, upload, and manage datasets from the Datasets console page.
Datasets use JSONL format containing instruction-response pairs structured as a
You can upload up to 10 files per dataset, each no larger than 200 MB.
Each line in the JSONL file must contain a
When you create a dataset, it is saved as a Draft. You must publish the dataset before it can be used in a fine-tuning job. See Manage datasets for details.
Data format
Datasets use JSONL format containing instruction-response pairs structured as a messages array. This format is used for SFT (Supervised Fine-Tuning) training.
| Format | Max size | Requirements |
|---|---|---|
| JSONL | 200 MB | Must contain a messages array |
Data format example
Each line in the JSONL file must contain a messages array with role and content fields:
The
<think> tag in the assistant response is optional. Include it if you want the fine-tuned model to produce chain-of-thought reasoning before its final answer.Dataset workflow
When you create a dataset, it is saved as a Draft. You must publish the dataset before it can be used in a fine-tuning job. See Manage datasets for details.
Next steps
- Create a dataset -- Upload data and create a new dataset.
- Manage datasets -- Publish, edit, or delete datasets.
- Create a fine-tuning job -- Use your published dataset to train a custom model.