Connection pooling, object pooling, high-concurrency patterns, and performance monitoring for speech services.
TTS services use WebSocket connections for real-time streaming. In production, creating a new connection per request wastes resources and adds latency. This guide covers connection pooling, object pooling, and concurrent request management for high-throughput TTS workloads.
Prerequisites
- Obtain your API Key and set it as the DASHSCOPE_API_KEY environment variable.
- Install the latest DashScope SDK:
  - Python SDK: >= 1.25.2
  - Java SDK: >= 2.16.6
High-concurrency TTS patterns
Python: Object pool
The Python SDK provides SpeechSynthesizerObjectPool to manage and reuse SpeechSynthesizer instances. The pool pre-creates objects and establishes WebSocket connections at initialization, eliminating per-request connection overhead.
Pool sizing: Set max_size to 1.5x-2x your peak concurrency. Do not exceed your account's QPS limit.
Never return a synthesizer to the pool if the task failed or is still running. Close it manually instead.
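The borrow, return, and discard rules above can be sketched with a small generic pool. This is an illustration of the pattern only, not the SDK's SpeechSynthesizerObjectPool API: FakeSynthesizer, SimplePool, and the method names borrow, give_back, and discard are hypothetical stand-ins chosen for this example.

```python
import queue


class FakeSynthesizer:
    """Hypothetical stand-in for a synthesizer that holds an open connection."""

    def __init__(self):
        self.closed = False

    def call(self, text):
        if not text:
            raise ValueError("empty text")
        return text.encode("utf-8")  # pretend these bytes are audio

    def close(self):
        self.closed = True


class SimplePool:
    """Pre-creates max_size objects; borrow() blocks while all are in use."""

    def __init__(self, factory, max_size):
        self._factory = factory
        self._idle = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._idle.put(factory())

    def borrow(self, timeout=None):
        return self._idle.get(timeout=timeout)

    def give_back(self, obj):
        self._idle.put(obj)

    def discard(self, obj):
        # Close the broken object and create a fresh one so capacity stays constant.
        obj.close()
        self._idle.put(self._factory())

    def idle_count(self):
        return self._idle.qsize()

    def shutdown(self):
        # Close every idle object; call this once all work has finished.
        while not self._idle.empty():
            self._idle.get_nowait().close()


def synthesize(pool, text):
    synth = pool.borrow()
    try:
        audio = synth.call(text)
    except Exception:
        pool.discard(synth)  # failed task: never hand it back for reuse
        raise
    pool.give_back(synth)  # task completed: safe to reuse
    return audio
```

Because borrow() blocks when the pool is empty, the pool size also acts as a cap on in-flight requests, which is one reason to size it from peak concurrency.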
Java: Connection pool + object pool
The Java SDK uses OkHttp3 connection pooling (enabled by default) plus an optional Apache Commons Pool2 object pool for SpeechSynthesizer instances.
Step 1: Configure connection pool via environment variables
| Variable | Default | Recommendation |
|---|---|---|
| DASHSCOPE_CONNECTION_POOL_SIZE | 32 | 2x peak concurrency |
| DASHSCOPE_MAXIMUM_ASYNC_REQUESTS | 32 | Match connection pool size |
| DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST | 32 | Match connection pool size |
Step 2: Add the commons-pool2 dependency
Add Apache Commons Pool2 (commons-pool2) to your build:
- Maven
- Gradle
Step 3: Create and use the object pool
| Variable | Default | Recommendation |
|---|---|---|
| SAMBERT_OBJECTPOOL_SIZE (Sambert) | 500 | 1.5x-2x peak concurrency, must not exceed connection pool size |
Reference server sizing: a 4-core 8 GiB machine can handle ~600 concurrent Sambert TTS tasks with an object pool of 1200 and a connection pool of 2000.
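The sizing recommendations in the two tables can be turned into a small calculation. The helper below is a sketch: recommend_pool_sizes and its factor parameter are names invented for this example, and it applies only the multipliers stated above. It does not know your account's QPS limit, so check the result against that limit separately.

```python
import math


def recommend_pool_sizes(peak_concurrency, factor=2.0):
    """Return suggested environment variable values for a given peak concurrency.

    factor is the object pool multiplier and must lie in the 1.5-2.0 range.
    """
    if peak_concurrency <= 0:
        raise ValueError("peak_concurrency must be positive")
    if not 1.5 <= factor <= 2.0:
        raise ValueError("factor should be between 1.5 and 2.0")

    # Connection pool: 2x peak concurrency.
    connection_pool = 2 * peak_concurrency
    # Object pool: 1.5x-2x peak concurrency, capped at the connection pool size.
    object_pool = min(math.ceil(factor * peak_concurrency), connection_pool)

    return {
        "DASHSCOPE_CONNECTION_POOL_SIZE": connection_pool,
        "DASHSCOPE_MAXIMUM_ASYNC_REQUESTS": connection_pool,
        "DASHSCOPE_MAXIMUM_ASYNC_REQUESTS_PER_HOST": connection_pool,
        "SAMBERT_OBJECTPOOL_SIZE": object_pool,
    }


if __name__ == "__main__":
    for name, value in recommend_pool_sizes(600).items():
        print(f"export {name}={value}")
```

For a peak of 600 concurrent tasks this yields an object pool of 1200, matching the reference configuration. That configuration uses a larger connection pool (2000) than the 2x rule produces, which leaves extra headroom.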
Performance monitoring
Track these metrics to maintain healthy production TTS services:
| Metric | Description | Target |
|---|---|---|
| First packet delay | Time from request to first audio chunk | < 500 ms |
| End-to-end latency | Total time for complete synthesis | Depends on text length |
| Error rate | Percentage of failed requests | < 0.1% |
| Pool utilization | Borrowed objects / pool size | 60%-80% at peak |
| Connection reuse ratio | Reused connections / total requests | > 95% |
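As a rough illustration of how these targets could be checked, the sketch below aggregates per-request measurements and reports which targets are missed. It makes no SDK calls: TtsMetrics and its fields are hypothetical, and the caller is assumed to supply the first-packet delay, success flag, connection reuse flag, and current borrowed count from its own instrumentation.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class TtsMetrics:
    pool_size: int
    first_packet_delays_ms: list = field(default_factory=list)
    total_requests: int = 0
    failed_requests: int = 0
    reused_connections: int = 0
    peak_borrowed: int = 0

    def record(self, first_packet_ms, ok, reused_connection, borrowed_now):
        """Record one finished request; borrowed_now is the pool's in-use count."""
        self.total_requests += 1
        if ok:
            self.first_packet_delays_ms.append(first_packet_ms)
        else:
            self.failed_requests += 1
        if reused_connection:
            self.reused_connections += 1
        self.peak_borrowed = max(self.peak_borrowed, borrowed_now)

    def summary(self):
        if self.total_requests == 0:
            raise ValueError("no requests recorded")
        delays = self.first_packet_delays_ms
        return {
            "avg_first_packet_ms": mean(delays) if delays else None,
            "error_rate": self.failed_requests / self.total_requests,
            "pool_utilization": self.peak_borrowed / self.pool_size,
            "connection_reuse_ratio": self.reused_connections / self.total_requests,
        }

    def violations(self):
        """Return the names of metrics that miss the targets in the table."""
        s = self.summary()
        missed = []
        if s["avg_first_packet_ms"] is not None and s["avg_first_packet_ms"] >= 500:
            missed.append("first_packet_delay")
        if s["error_rate"] >= 0.001:
            missed.append("error_rate")
        if not 0.6 <= s["pool_utilization"] <= 0.8:
            missed.append("pool_utilization")
        if s["connection_reuse_ratio"] <= 0.95:
            missed.append("connection_reuse_ratio")
        return missed
```

This version averages the first-packet delay for simplicity. A production dashboard would more likely track a high percentile such as p95 or p99, since averages hide tail latency.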
Production checklist
Before going live, verify the following:
- API Key stored in environment variable, not hardcoded.
- Connection pool and object pool sizes configured for expected peak load.
- Pool sizes do not exceed your account's QPS limit.
- Error handling closes and discards failed synthesizers instead of returning them to the pool.
- Graceful shutdown calls pool.shutdown() (Python) or closes the pool (Java).
- WebSocket connections use the correct region endpoint.
- Monitoring dashboards track first-packet delay, error rate, and pool utilization.
- Load tested with 2x expected peak concurrency.
- Retry logic with exponential backoff for transient failures.
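For the retry item in the checklist, one possible shape is shown below. It is a minimal sketch: TransientError is a hypothetical marker exception, so map the timeouts or dropped connections that your client actually raises onto it. The attempt count and delay defaults are illustrative, not recommended values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as timeouts."""


def retry_with_backoff(func, max_attempts=4, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (0.5 s, 1 s, 2 s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Only failures that are safe to repeat should be retried. A synthesizer that failed mid-task should still be closed and replaced with a fresh one before the next attempt.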
Related
- Text to Speech -- TTS models, parameters, and streaming modes.
- Realtime streaming -- realtime TTS streaming guide.
- Improve recognition accuracy -- ASR optimization including high-concurrency ASR patterns.