# Production AI monitoring
## Overview
Qwen Cloud provides two complementary observability features for your model deployments:
- Analytics: View token consumption, request counts, latency, and success rates
- Monitoring: Track per-model performance metrics, view call logs, and manage rate limits
## Analytics
Go to the Analytics page to view usage statistics for your workspace.
### Filters
- Time range: Select the time window (such as 24 Hours).
- Models: Filter by specific model or view all models.
- Granularity: Choose the aggregation interval (such as 1 Hour).
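The Granularity filter controls how data points are aggregated over time: each point on a chart covers one fixed interval. As a minimal sketch of that bucketing (the record shape and function name are illustrative, not the platform's API):

```python
from datetime import datetime, timedelta

def bucket_requests(timestamps, granularity_hours=1):
    """Group request timestamps into fixed-size intervals.

    Mirrors the dashboard's Granularity filter: each key in the result
    is the start of its interval, each value the request count.
    """
    counts = {}
    step = timedelta(hours=granularity_hours)
    epoch = datetime(1970, 1, 1)
    for ts in timestamps:
        # Truncate the timestamp to the start of its containing interval.
        bucket = epoch + ((ts - epoch) // step) * step
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts
```

With `granularity_hours=1`, requests at 10:15 and 10:45 land in the 10:00 bucket, and a request at 11:05 lands in the 11:00 bucket.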
### Metrics
The page shows four key metrics:
| Metric | Description |
|---|---|
| Tokens | Total token consumption |
| Requests | Total number of API requests |
| Avg Latency | Average response latency |
| Success Rate | Percentage of successful requests |
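To make these definitions concrete, here is a minimal sketch of how the four figures could be computed from raw request logs. The record fields (`tokens`, `latency_ms`, `ok`) are assumptions for illustration, not the platform's actual log schema:

```python
def summarize(requests):
    """Aggregate request records into the four dashboard metrics.

    Each record is a dict with 'tokens' (int), 'latency_ms' (float),
    and 'ok' (bool); these field names are illustrative only.
    """
    total = len(requests)
    tokens = sum(r["tokens"] for r in requests)
    # Guard against division by zero on an empty window.
    avg_latency = sum(r["latency_ms"] for r in requests) / total if total else 0.0
    success_rate = 100.0 * sum(r["ok"] for r in requests) / total if total else 0.0
    return {
        "tokens": tokens,
        "requests": total,
        "avg_latency_ms": avg_latency,
        "success_rate_pct": success_rate,
    }
```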
Cost includes all consumption across the entire platform; for a detailed breakdown, refer to your billing data.
## Usage units by model type
| Type | Subcategory | Unit | Billing basis |
|---|---|---|---|
| Large language model | Text generation, Deep thinking, Vision understanding | Token | Billed by input and output token count |
| Vision model | Image generation | Image (count) | Billed by successfully generated images |
| Vision model | Video generation | Seconds | Billed by successfully generated video duration |
| Speech model | TTS, Realtime TTS, File ASR, Realtime ASR, Audio/video translation | Seconds, characters, or tokens | Varies by model; may bill by audio duration, text characters, or token count |
| Omni-modal model | Omni-modal, Realtime multimodal | Token | Text billed by token count; other modalities (audio, image, video) billed by their corresponding token counts |
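As a sketch of how two rows of this table translate to code, the helpers below compute billable amounts for text generation and video generation. The function names and the job-record shape are hypothetical, not the platform's billing API:

```python
def llm_billable_tokens(input_tokens, output_tokens):
    """LLM usage: billed by input and output token count (per the table)."""
    return input_tokens + output_tokens

def video_billable_seconds(jobs):
    """Video generation: only successfully generated duration is billed.

    'jobs' is a list of (duration_seconds, succeeded) pairs; this shape
    is an assumption for illustration.
    """
    return sum(seconds for seconds, ok in jobs if ok)
```

For example, a request with 120 input tokens and 380 output tokens is billed for 500 tokens, and a batch where one 10-second job failed is billed only for the successful jobs' durations.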