Datasets overview - Qwen Cloud

Datasets are structured data files used to fine-tune models on Qwen Cloud. You can create, upload, and manage datasets from the Datasets console page.

Data format

Datasets use the JSONL format: each line of a file is one JSON object that holds an instruction-response conversation as a messages array. This format is used for supervised fine-tuning (SFT).

Format	Max size per file	Requirements
JSONL	200 MB	Must contain a `messages` array

You can upload up to 10 files per dataset, each no larger than 200 MB.
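A local pre-upload check for these limits might look like the sketch below. The function and constant names are hypothetical and not part of any Qwen Cloud SDK. The byte value of "200 MB" is also an assumption: the sketch uses the decimal reading (200,000,000 bytes) because it is the stricter one.

```python
import os

# Limits taken from the text above. The exact byte definition of "MB" is an
# assumption; the decimal value is used because it is the stricter reading.
MAX_FILES_PER_DATASET = 10
MAX_FILE_BYTES = 200 * 1000 * 1000


def check_upload_limits(sizes_by_path):
    """Return a list of problems for a mapping of file path -> size in bytes."""
    problems = []
    if len(sizes_by_path) > MAX_FILES_PER_DATASET:
        problems.append(
            f"{len(sizes_by_path)} files exceeds the limit of {MAX_FILES_PER_DATASET}"
        )
    for path, size in sizes_by_path.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{path}: {size} bytes exceeds {MAX_FILE_BYTES} bytes")
    return problems


def sizes_on_disk(paths):
    """Look up the size of each existing local file (hypothetical helper)."""
    return {path: os.path.getsize(path) for path in paths}
```

For example, `check_upload_limits(sizes_on_disk(["train.jsonl"]))` returns an empty list when the file is within both limits.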

Data format example

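Each line in the JSONL file must be a JSON object containing a messages array whose entries have role and content fields. The example below is pretty-printed for readability; in the file, each record occupies a single line: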

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Who painted the human body during the Renaissance?"},
    {"role": "assistant", "content": "<think>Optional model thinking</think>The Renaissance was a period of rebirth in art, culture, and scholarship. Many artists depicted the human form during this era."}
  ]
}

The <think> tag in the assistant response is optional. Include it if you want the fine-tuned model to produce chain-of-thought reasoning before its final answer.
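The format described above can be produced and checked locally with a short script. The following is a minimal sketch, not an official validator: the helper names are hypothetical, and the checks that messages is non-empty and that role and content are strings are assumptions that go beyond what this page states. It does not restrict role values, because the page does not list the allowed set.

```python
import json


def assistant_content(answer, thinking=None):
    """Build assistant content, optionally prefixed with a <think> block."""
    if thinking is None:
        return answer
    return f"<think>{thinking}</think>{answer}"


def to_jsonl_line(messages):
    """Serialize one record as a single line (json.dumps escapes newlines)."""
    return json.dumps({"messages": messages}, ensure_ascii=False)


def validate_line(line):
    """Return None if the line looks like a valid record, else an error string."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(record, dict):
        return "record is not a JSON object"
    messages = record.get("messages")
    # Assumption: an empty array is treated as an error.
    if not isinstance(messages, list) or not messages:
        return "missing or empty 'messages' array"
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            return f"messages[{i}] is not an object"
        for field in ("role", "content"):
            # Assumption: both fields must be strings.
            if not isinstance(msg.get(field), str):
                return f"messages[{i}] lacks a string '{field}'"
    return None


# Usage: build one record and confirm it passes the checks.
line = to_jsonl_line([
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Who painted the human body during the Renaissance?"},
    {"role": "assistant", "content": assistant_content(
        "Many artists depicted the human form during this era.",
        thinking="Optional model thinking",
    )},
])
error = validate_line(line)
```

Writing each `line` to the file followed by a newline yields one record per line, which is what JSONL requires.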

Dataset workflow

When you create a dataset, it is saved as a Draft. You must publish the dataset before it can be used in a fine-tuning job. See Manage datasets for details.

Next steps

Create a dataset -- Upload data and create a new dataset.
Manage datasets -- Publish, edit, or delete datasets.
Create a fine-tuning job -- Use your published dataset to train a custom model.