Manage fine-tuning jobs

Monitor, manage, and deploy fine-tuning jobs from the Qwen Cloud console.

After creating a fine-tuning job, use the Qwen Cloud console to monitor progress, review results, and deploy trained models.

Job list

The fine-tuning list page shows all your jobs. Use the filter at the top to narrow the list by status. From this page you can:
  • Click a job name to view its detail page.
  • Click To Deploy on a completed job to create a deployment.
  • Click Delete to remove a job and its associated data.

Job detail page

Click a job name to open the detail page, which has four tabs:

Details tab

Review the full training configuration (base model, hyperparameters, and dataset references) along with the job's token consumption. Use this tab to verify what was submitted.

Metrics tab

Monitor training quality through charts:
  • Training loss: Should decrease over time as the model improves.
  • Validation loss: Should track training loss. If it levels off or rises while training loss keeps falling, the model is likely overfitting.
  • Accuracy: Token-level accuracy for both training and validation sets.
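
The overfitting signal described above can also be checked programmatically if you copy the loss values out of the charts. The following is a minimal sketch, not a console feature: the function name `first_overfit_step`, the `patience` parameter, and the idea of passing the losses as plain lists are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def first_overfit_step(
    train_loss: Sequence[float],
    val_loss: Sequence[float],
    patience: int = 3,
) -> Optional[int]:
    """Return the index where validation loss starts rising while training
    loss keeps falling for `patience` consecutive steps, or None.

    Hypothetical helper: the inputs are loss values read off the Metrics
    tab charts, one value per evaluation step.
    """
    if len(train_loss) != len(val_loss):
        raise ValueError("train_loss and val_loss must have the same length")
    if patience < 1:
        raise ValueError("patience must be at least 1")

    streak = 0  # consecutive steps showing the divergence pattern
    for i in range(1, len(val_loss)):
        val_rising = val_loss[i] > val_loss[i - 1]
        train_falling = train_loss[i] < train_loss[i - 1]
        if val_rising and train_falling:
            streak += 1
            if streak >= patience:
                # Report the first step of the diverging run.
                return i - patience + 1
        else:
            streak = 0
    return None


train = [2.0, 1.5, 1.2, 1.0, 0.8, 0.6, 0.5]
val = [2.1, 1.7, 1.5, 1.6, 1.7, 1.8, 1.9]
print(first_overfit_step(train, val))  # 3
```

A larger `patience` makes the check less sensitive to noise in the validation curve. A checkpoint saved just before the reported step is usually a reasonable candidate to publish.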

Logs tab

View raw training logs to debug failures or track detailed progress.

Output tab

Manage saved checkpoints here. Each checkpoint shows the epoch at which it was saved, its publish status, and its remaining time to live (TTL). Click Publish to make a checkpoint available as a custom model for deployment.

Publish checkpoints

You must publish a checkpoint before you can deploy it:
  1. Go to the Output tab of the job detail page.
  2. Click Publish next to the desired checkpoint.
  3. Assign a model name. The published model appears in your Custom Models list.

Deploy trained models

A custom model cannot be called via API until you create a deployment for it. After publishing a checkpoint:
  1. Go to the Deployments page.
  2. Click Create Deployment and select your custom model.
  3. Once the deployment reaches Active, use its model code to call the API.
See Deployment overview for the full guide.

Stop and delete jobs

  • Stop: Cancels a running job. Any completed checkpoints are preserved.
  • Delete: Removes the job and its associated data. This action cannot be undone.

Job statuses

Status      Description
Pending     Job is submitted and waiting for resources.
Queued      Job is in the scheduling queue.
Running     Training is actively in progress.
Completed   Training finished successfully.
Failed      Training encountered an error. Check logs for details.
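
If you track jobs from your own scripts, the statuses listed above can be modeled as a small enum. This is a sketch under stated assumptions: the string values mirror the console labels and may differ from what an API returns, and `get_status` is a hypothetical callable that you would supply yourself. No real Qwen Cloud API call is shown.

```python
import time
from enum import Enum
from typing import Callable


class JobStatus(Enum):
    # Values copy the console labels; API values are an assumption.
    PENDING = "Pending"
    QUEUED = "Queued"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Statuses after which the job will not change state again.
TERMINAL = frozenset({JobStatus.COMPLETED, JobStatus.FAILED})


def is_terminal(status: JobStatus) -> bool:
    return status in TERMINAL


def can_deploy(status: JobStatus) -> bool:
    # The job list offers To Deploy only on completed jobs.
    return status is JobStatus.COMPLETED


def wait_for_terminal(
    get_status: Callable[[], JobStatus],  # hypothetical: supplied by the caller
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> JobStatus:
    """Poll get_status until the job is Completed or Failed."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if is_terminal(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status.value} after {timeout_s}s")
        sleep(interval_s)


# Usage with a canned sequence standing in for real status lookups.
fake = iter([JobStatus.PENDING, JobStatus.QUEUED, JobStatus.RUNNING, JobStatus.COMPLETED])
final = wait_for_terminal(lambda: next(fake), interval_s=0, sleep=lambda s: None)
print(final.value, can_deploy(final))  # Completed True
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A job that ends as Failed is returned rather than raised, so the caller can decide whether to check the Logs tab.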