Production AI monitoring

Overview

The Analytics page provides observability for your model deployments, including token consumption, request counts, latency, success rates, and per-model performance metrics.

Analytics

Go to the Analytics page to view usage and analytics for your workspace.

Filters

  • Time range: Select the time window (such as 24 Hours).
  • Models: Filter by a specific model, or view all models.
  • Granularity: Choose the aggregation interval (such as 1 Hour).
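
To make the relationship between the time range and granularity filters concrete, here is a minimal sketch of how token counts could be grouped into fixed intervals. It is an illustration only: the `bucket_tokens` function and the `(timestamp, tokens)` record shape are assumptions for this example, not part of the platform.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bucket_tokens(records, start, end, granularity):
    """Sum tokens per interval of width `granularity` within [start, end).

    `records` is an iterable of (timestamp, tokens) pairs (hypothetical shape).
    """
    buckets = defaultdict(int)
    for timestamp, tokens in records:
        if not (start <= timestamp < end):
            continue  # outside the selected time range
        # Dividing two timedeltas with // yields the integer bucket index.
        index = (timestamp - start) // granularity
        buckets[start + index * granularity] += tokens
    return dict(sorted(buckets.items()))


# Example: a 24-hour time range aggregated at 1-hour granularity.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(hours=24)
records = [
    (start + timedelta(minutes=5), 120),
    (start + timedelta(minutes=50), 80),
    (start + timedelta(hours=1, minutes=10), 300),
    (start + timedelta(hours=30), 999),  # falls outside the time range
]
hourly = bucket_tokens(records, start, end, timedelta(hours=1))
for bucket_start, tokens in hourly.items():
    print(bucket_start.isoformat(), tokens)
```

A coarser granularity produces fewer, larger buckets over the same time range; the totals across all buckets stay the same.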

Metrics

The page shows four key metrics:
Metric | Description
Tokens | Total token consumption
Requests | Total number of API requests
Avg Latency | Average response latency
Success rate | Percentage of successful requests
The Tokens Analysis chart below provides a visual breakdown of token consumption over time.
Cost includes all consumption across the entire platform. Refer to billing data for details.
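
As a rough illustration of what the four metrics represent, the sketch below derives them from a list of request records. The `RequestRecord` fields and the `summarize` function are hypothetical, and treating any 2xx status as a success is an assumption; the page does not state how it defines a successful request.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical record shape for illustration only.
    input_tokens: int
    output_tokens: int
    latency_ms: float
    status: int  # HTTP status code


def summarize(records):
    """Compute tokens, requests, average latency, and success rate."""
    total = len(records)
    if total == 0:
        return {"tokens": 0, "requests": 0,
                "avg_latency_ms": None, "success_rate": None}
    tokens = sum(r.input_tokens + r.output_tokens for r in records)
    avg_latency = sum(r.latency_ms for r in records) / total
    # Assumption: a request counts as successful when its status is 2xx.
    succeeded = sum(1 for r in records if 200 <= r.status < 300)
    return {
        "tokens": tokens,
        "requests": total,
        "avg_latency_ms": avg_latency,
        "success_rate": 100.0 * succeeded / total,
    }


records = [
    RequestRecord(100, 50, 400.0, 200),
    RequestRecord(200, 100, 600.0, 200),
    RequestRecord(10, 0, 200.0, 429),
    RequestRecord(20, 0, 400.0, 400),
]
summary = summarize(records)
print(summary)
```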

Usage units by model type

Type | Subcategory | Unit | Billing basis
Large language model | Text generation, Deep thinking, Vision understanding | Token | Billed by input and output token count
Vision model | Image generation | Image (count) | Billed by successfully generated images
Vision model | Video generation | Seconds | Billed by successfully generated video duration
Speech model | TTS, Realtime TTS, File ASR, Realtime ASR, Audio/video translation | Seconds, characters, or tokens | Varies by model: may bill by audio duration, text characters, or token count
Omni-modal model | Omni-modal, Realtime multimodal | Token | Text billed by tokens; other modalities (audio, image, video) billed by corresponding token count
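
The sketch below shows one way to read the first three rows of the table: text usage is counted in input plus output tokens, while image and video usage count only successful generations. The `billable_units` function, the `kind` labels, and the event fields are all hypothetical. Speech and omni-modal models are left out because their unit varies by model.

```python
def billable_units(kind, events):
    """Return the usage units for one model type (illustrative only).

    `kind` is one of "text", "image", or "video" (hypothetical labels).
    """
    if kind == "text":
        # Large language models: input and output token count.
        return sum(e["input_tokens"] + e["output_tokens"] for e in events)
    if kind == "image":
        # Image generation: count of successfully generated images.
        return sum(e["images"] for e in events if e["succeeded"])
    if kind == "video":
        # Video generation: seconds of successfully generated video.
        return sum(e["seconds"] for e in events if e["succeeded"])
    raise ValueError(f"unsupported model type: {kind}")


text_units = billable_units("text", [
    {"input_tokens": 100, "output_tokens": 40},
    {"input_tokens": 10, "output_tokens": 5},
])
image_units = billable_units("image", [
    {"images": 2, "succeeded": True},
    {"images": 1, "succeeded": False},  # failed generation is not counted
])
video_units = billable_units("video", [
    {"seconds": 5.0, "succeeded": True},
    {"seconds": 8.0, "succeeded": False},
    {"seconds": 3.0, "succeeded": True},
])
print(text_units, image_units, video_units)
```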

Per-model metrics

Below the usage charts, a per-model table breaks down throughput (TPM/RPM), call volume, success rate, time to first token, and latency for each model. Use this table to identify underperforming models or unexpected error spikes.
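
For readers who want to reproduce a similar breakdown from their own request data, here is a minimal sketch. It assumes TPM and RPM are simple averages (tokens or requests divided by the window length in minutes) and that a 2xx status means success; the page does not define either, and the row fields and `per_model_stats` function are hypothetical.

```python
from collections import defaultdict


def per_model_stats(rows, window_minutes):
    """Aggregate hypothetical request rows into per-model metrics."""
    totals = defaultdict(lambda: {"calls": 0, "ok": 0, "tokens": 0,
                                  "ttft_ms": 0.0, "latency_ms": 0.0})
    for row in rows:
        t = totals[row["model"]]
        t["calls"] += 1
        t["ok"] += 1 if 200 <= row["status"] < 300 else 0
        t["tokens"] += row["tokens"]
        t["ttft_ms"] += row["ttft_ms"]
        t["latency_ms"] += row["latency_ms"]

    stats = {}
    for model, t in totals.items():
        calls = t["calls"]
        stats[model] = {
            "calls": calls,
            "rpm": calls / window_minutes,        # average requests per minute
            "tpm": t["tokens"] / window_minutes,  # average tokens per minute
            "success_rate": 100.0 * t["ok"] / calls,
            "avg_ttft_ms": t["ttft_ms"] / calls,
            "avg_latency_ms": t["latency_ms"] / calls,
        }
    return stats


rows = [
    {"model": "model-a", "status": 200, "tokens": 300, "ttft_ms": 100.0, "latency_ms": 1000.0},
    {"model": "model-a", "status": 200, "tokens": 700, "ttft_ms": 300.0, "latency_ms": 2000.0},
    {"model": "model-b", "status": 200, "tokens": 50, "ttft_ms": 200.0, "latency_ms": 500.0},
    {"model": "model-b", "status": 500, "tokens": 50, "ttft_ms": 200.0, "latency_ms": 1500.0},
]
stats = per_model_stats(rows, window_minutes=10)
for model, values in stats.items():
    print(model, values)
```

In this sample, `model-b` stands out with a lower success rate than `model-a`, which is the kind of signal the per-model table is meant to surface.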

Request logs

Go to the Logs tab to inspect individual API requests. Logs are retained for the last 14 days.

Filters

  • Time range: Narrow the log window to a specific period.
  • Model: Filter by a specific model.
  • Status: Filter by HTTP status code (such as 200, 400, 429).
  • Request ID: Search for a specific request by its ID.
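
The filters combine as a logical AND: a log entry is shown only if it matches every filter that is set. The sketch below mimics that behavior on in-memory data. The `filter_logs` function and the entry fields are hypothetical and only mirror the filter names above.

```python
from datetime import datetime, timezone


def filter_logs(logs, start=None, end=None, model=None, status=None,
                request_id=None):
    """Return entries matching every filter that is not None."""
    matches = []
    for entry in logs:
        if start is not None and entry["timestamp"] < start:
            continue
        if end is not None and entry["timestamp"] >= end:
            continue
        if model is not None and entry["model"] != model:
            continue
        if status is not None and entry["status"] != status:
            continue
        if request_id is not None and entry["request_id"] != request_id:
            continue
        matches.append(entry)
    return matches


def utc(hour, minute=0):
    """Build a timestamp on an arbitrary sample day."""
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


logs = [
    {"request_id": "req-001", "timestamp": utc(9), "model": "model-a", "status": 200},
    {"request_id": "req-002", "timestamp": utc(9, 30), "model": "model-a", "status": 429},
    {"request_id": "req-003", "timestamp": utc(11), "model": "model-b", "status": 429},
]

# Rate-limited requests for one model.
throttled = filter_logs(logs, model="model-a", status=429)
# Everything before 10:00.
morning = filter_logs(logs, end=utc(10))
print([e["request_id"] for e in throttled])
print([e["request_id"] for e in morning])
```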

Log table

Each row shows:
Column | Description
Request ID | Unique identifier for the request
Timestamp | When the request was made
Model | Model used for the request
Usage | Token breakdown (total, input, output)
TTFT | Time to first token
Latency | Total response time
Status | HTTP status code

Request details

Click Details to open a side panel with the full request breakdown. You can view the data in List or JSON format. Click Export to download logs as a file for offline analysis.
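
Once logs are exported, they can be analyzed with a short script. The sketch below assumes the export is a JSON array whose fields mirror the log table columns. This is an assumption: the actual export format and field names are not specified here, so adjust the keys to match your file.

```python
import json
from collections import Counter

# Hypothetical export content; a real script would read it from the
# downloaded file instead of an inline string.
SAMPLE_EXPORT = """
[
  {"request_id": "req-001", "model": "model-a", "status": 200,
   "usage": {"total": 150, "input": 100, "output": 50}, "latency_ms": 420},
  {"request_id": "req-002", "model": "model-a", "status": 429,
   "usage": {"total": 0, "input": 0, "output": 0}, "latency_ms": 15},
  {"request_id": "req-003", "model": "model-b", "status": 429,
   "usage": {"total": 0, "input": 0, "output": 0}, "latency_ms": 12},
  {"request_id": "req-004", "model": "model-b", "status": 400,
   "usage": {"total": 0, "input": 0, "output": 0}, "latency_ms": 9}
]
"""


def error_counts(export_text):
    """Count exported requests per HTTP error status (400 and above)."""
    entries = json.loads(export_text)
    return Counter(e["status"] for e in entries if e["status"] >= 400)


def total_tokens(export_text):
    """Sum the total token usage across all exported requests."""
    return sum(e["usage"]["total"] for e in json.loads(export_text))


counts = error_counts(SAMPLE_EXPORT)
tokens = total_tokens(SAMPLE_EXPORT)
for status, count in sorted(counts.items()):
    print(status, count)
print("total tokens:", tokens)
```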

Alerts

Set up custom alert rules to monitor API call metrics and receive real-time notifications when anomalies occur. For details, see Monitoring alerts.