Datasets overview

Manage training datasets for fine-tuning models on Qwen Cloud.

Datasets are structured data files used to fine-tune models on Qwen Cloud. You can create, upload, and manage datasets from the Datasets console page.

Data format

Datasets use JSONL format containing instruction-response pairs structured as a messages array. This format is used for SFT (Supervised Fine-Tuning) training.
Format | Max size | Requirements
JSONL  | 200 MB   | Must contain a messages array
You can upload up to 10 files per dataset, each no larger than 200 MB.
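
The limits above can be checked locally before uploading. The following is a minimal sketch, not part of any Qwen Cloud SDK: the names check_upload_limits and sizes_for_paths are hypothetical, and it assumes that 200 MB means 200 × 1024 × 1024 bytes, which the documentation does not specify.

```python
import os

# Limits taken from the text above.
MAX_FILES_PER_DATASET = 10
# Assumption: "200 MB" is interpreted as 200 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 200 * 1024 * 1024


def check_upload_limits(sizes_by_name):
    """Return a list of problems for a mapping of file name -> size in bytes.

    An empty list means the files fit within the documented limits.
    """
    problems = []
    if len(sizes_by_name) > MAX_FILES_PER_DATASET:
        problems.append(
            f"{len(sizes_by_name)} files exceeds the limit of "
            f"{MAX_FILES_PER_DATASET} files per dataset"
        )
    for name, size in sizes_by_name.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds {MAX_FILE_BYTES} bytes")
    return problems


def sizes_for_paths(paths):
    """Build the name -> size mapping from files on disk."""
    return {path: os.path.getsize(path) for path in paths}
```

For example, check_upload_limits(sizes_for_paths(["train.jsonl"])) returns an empty list when the (hypothetical) file train.jsonl is within the size limit.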

Data format example

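Each line in the JSONL file must be a single JSON object containing a messages array, and each message must have role and content fields. The example below is pretty-printed for readability; in the actual file, the whole object must sit on one line: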
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Who painted the human body during the Renaissance?"},
    {"role": "assistant", "content": "<think>Optional model thinking</think>The Renaissance was a period of rebirth in art, culture, and scholarship. Many artists depicted the human form during this era."}
  ]
}
The <think> tag in the assistant response is optional. Include it if you want the fine-tuned model to produce chain-of-thought reasoning before its final answer.
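
As an illustration of the format described above, here is a minimal sketch that serializes a conversation to one JSONL line and checks a line for the required structure. The function names to_jsonl_line and validate_line are hypothetical. The checks that role and content are strings and that messages is non-empty are assumptions; the platform's actual validation rules may differ.

```python
import json


def to_jsonl_line(messages):
    """Serialize one training record as a single line of JSON.

    json.dumps emits no raw newlines by default, so the result is a
    valid JSONL line even if message content contains line breaks.
    """
    return json.dumps({"messages": messages}, ensure_ascii=False)


def validate_line(line):
    """Return None if the line looks valid, otherwise an error string."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(record, dict):
        return "record is not a JSON object"
    messages = record.get("messages")
    # Assumption: an empty messages array is treated as an error.
    if not isinstance(messages, list) or not messages:
        return "missing or empty 'messages' array"
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            return f"message {index}: not a JSON object"
        for key in ("role", "content"):
            # Assumption: both fields are expected to be strings.
            if not isinstance(message.get(key), str):
                return f"message {index}: '{key}' must be a string"
    return None


def validate_file(path):
    """Yield (line_number, error) for each invalid non-blank line."""
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if line.strip():
                error = validate_line(line)
                if error is not None:
                    yield number, error
```

Running list(validate_file("train.jsonl")) on a local file (the file name is only an example) gives the line numbers that need fixing before upload.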

Dataset workflow

When you create a dataset, it is saved as a Draft. You must publish the dataset before it can be used in a fine-tuning job. See Manage datasets for details.

Next steps