
Production AI monitoring

Overview

Qwen Cloud provides two complementary observability features for your model deployments:
  • Analytics: View token consumption, request counts, latency, and success rates
  • Monitoring: Track per-model performance metrics, view call logs, and manage rate limits

Analytics

Go to the Analytics page to view usage and analytics for your workspace.

Filters

  • Time range: Select the time window (such as 24 Hours).
  • Models: Filter by specific model or view all models.
  • Granularity: Choose the aggregation interval (such as 1 Hour).
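The granularity filter controls how raw requests are grouped into fixed aggregation intervals. A minimal sketch of that bucketing, assuming a hypothetical record schema with an ISO-8601 `timestamp` field (not the platform's actual log format):

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def bucket_requests(requests, granularity_hours=1):
    """Group request records into fixed intervals, mirroring the
    Analytics page's granularity filter. `requests` is a list of
    dicts with an ISO-8601 'timestamp' field (hypothetical schema)."""
    buckets = defaultdict(int)
    step_s = int(timedelta(hours=granularity_hours).total_seconds())
    for req in requests:
        ts = datetime.fromisoformat(req["timestamp"])
        # Truncate the timestamp down to the start of its interval.
        bucket_start = datetime.fromtimestamp(
            int(ts.timestamp()) // step_s * step_s, tz=timezone.utc)
        buckets[bucket_start] += 1
    return dict(buckets)
```

With 1-hour granularity, two requests at 00:10 and 00:50 land in the same bucket, while one at 01:05 starts a new bucket.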

Metrics

The page shows four key metrics:
  • Tokens: Total token consumption
  • Requests: Total number of API requests
  • Avg Latency: Average response latency
  • Success rate: Percentage of successful requests
The Tokens Analysis chart below provides a visual breakdown of token consumption over time.
Cost figures include all consumption across the entire platform; refer to your billing data for details.
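The four metrics above are straightforward aggregates over call records. A minimal sketch of how they relate, assuming a hypothetical record schema with `tokens`, `latency_ms`, and `ok` fields:

```python
def summarize(records):
    """Compute the four Analytics metrics from raw call records.
    Each record is a dict with 'tokens', 'latency_ms', and 'ok'
    fields -- a hypothetical schema, not the platform's actual format."""
    total = len(records)
    if total == 0:
        return {"tokens": 0, "requests": 0,
                "avg_latency_ms": 0.0, "success_rate": 0.0}
    return {
        "tokens": sum(r["tokens"] for r in records),
        "requests": total,
        "avg_latency_ms": sum(r["latency_ms"] for r in records) / total,
        # Success rate is reported as a percentage of all requests.
        "success_rate": sum(1 for r in records if r["ok"]) / total * 100,
    }
```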

Usage units by model type

  • Large language model (Text generation, Deep thinking, Vision understanding): Unit is the token; billed by input and output token count.
  • Vision model (Image generation): Unit is the image; billed by the number of successfully generated images.
  • Vision model (Video generation): Unit is the second; billed by successfully generated video duration.
  • Speech model (TTS, Realtime TTS, File ASR, Realtime ASR, Audio/video translation): Unit is seconds, characters, or tokens; varies by model, which may bill by audio duration, text characters, or token count.
  • Omni-modal model (Omni-modal, Realtime multimodal): Unit is the token; text is billed by tokens, and other modalities (audio, image, video) by their corresponding token count.
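When reconciling usage programmatically, it helps to resolve each model category to its billing unit before aggregating. A sketch of such a lookup, using illustrative category and unit names (the actual identifiers in API responses may differ):

```python
# Hypothetical mapping of model categories to billing units, based on
# the table above; the key and unit names are illustrative only.
BILLING_UNITS = {
    ("llm", "text_generation"): "token",
    ("vision", "image_generation"): "image",
    ("vision", "video_generation"): "second",
    ("omni", "realtime_multimodal"): "token",
}

def billing_unit(model_type, subcategory):
    """Return the billing unit for a model category, or raise if the
    category is not in the (illustrative) mapping."""
    try:
        return BILLING_UNITS[(model_type, subcategory)]
    except KeyError:
        raise ValueError(f"Unknown model category: {model_type}/{subcategory}")
```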

Monitoring

Go to the Monitoring page to monitor your API usage, configure alert rules, and manage rate limits.

Monitoring tab

The Monitoring tab shows a dashboard of your model performance for the selected workspace. At the top, summary cards display aggregate metrics including total models called, total calls, failures, average time to first token, and average latency. You can adjust the time range to focus on a specific period. Below the summary, a per-model table breaks down throughput (TPM/RPM), call volume, failure rate, and latency for each model. Use this to identify underperforming models or unexpected error spikes.
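The per-model table can be reproduced from raw call logs by grouping on the model name and averaging over the selected window. A minimal sketch, again assuming a hypothetical record schema with `model`, `tokens`, `latency_ms`, and `ok` fields:

```python
from collections import defaultdict

def per_model_stats(calls, window_minutes):
    """Build a per-model breakdown like the Monitoring tab's table.
    TPM/RPM are averaged over the selected window; failure rate is a
    percentage of that model's calls. The record schema is hypothetical."""
    grouped = defaultdict(list)
    for c in calls:
        grouped[c["model"]].append(c)
    stats = {}
    for model, rows in grouped.items():
        n = len(rows)
        failures = sum(1 for r in rows if not r["ok"])
        stats[model] = {
            "rpm": n / window_minutes,
            "tpm": sum(r["tokens"] for r in rows) / window_minutes,
            "failure_rate": failures / n * 100,
            "avg_latency_ms": sum(r["latency_ms"] for r in rows) / n,
        }
    return stats
```

Sorting this result by `failure_rate` or `avg_latency_ms` is a quick way to surface the underperforming models the dashboard is meant to highlight.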

Rate limit tab

The Rate Limit tab lets you request temporary rate limit increases for specific models. Click Increase Rate Limit Temporarily to submit a request, and track the status of previous requests in the table below.
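While a temporary increase request is pending, clients can stay within the current limit by retrying rate-limited calls with exponential backoff and jitter. A generic sketch; `RateLimitError` is a placeholder for whatever your client library raises on a 429 response:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for the client library's rate-limit (429) error."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry `request_fn` when the API signals rate limiting.
    Waits base_delay * 2**attempt plus random jitter between tries,
    and re-raises after the final attempt fails."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

Jitter spreads retries out so that many clients hitting the same limit do not retry in lockstep.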