Receive text output token by token as it is generated.
Enable streaming
- OpenAI Chat Completions
- OpenAI Responses API
- DashScope
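The consumer loop is the same across these interfaces: request the stream, then join the content deltas as they arrive. A minimal sketch, using fake chunk objects in place of a live response (a real call would be client.chat.completions.create(..., stream=True, stream_options={"include_usage": True})):

```python
# Sketch of consuming a streamed Chat Completions response.
# Fake chunks stand in for a live stream here; the chunk shape
# (choices[0].delta.content, usage on the last chunk) mirrors the API.
from types import SimpleNamespace

def collect_stream(chunks):
    """Join delta.content across chunks; capture usage from the final chunk."""
    parts, usage = [], None
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
        if chunk.usage is not None:  # present only on the usage-only last chunk
            usage = chunk.usage
    return "".join(parts), usage

def fake_chunk(content=None, usage=None):
    # The usage-only final chunk has an empty choices list.
    choices = [SimpleNamespace(delta=SimpleNamespace(content=content))] if content is not None else []
    return SimpleNamespace(choices=choices, usage=usage)

text, usage = collect_stream([
    fake_chunk("Hel"), fake_chunk("lo"),
    fake_chunk(usage=SimpleNamespace(total_tokens=12)),
])
```

Note the guard on an empty choices list: with include_usage enabled, the last chunk carries only usage, and indexing choices[0] unconditionally would raise.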
Usage statistics are not included by default. Set stream_options={"include_usage": true} to receive token counts in the final chunk only.
Event format
Each SSE event is a data: line containing a JSON chunk. The final data: [DONE] signals the end of the stream.
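If you consume the raw HTTP response instead of an SDK, the decoding step looks roughly like this; it assumes each event fits on a single data: line, as described above:

```python
# Sketch: decode raw SSE lines into JSON chunks, stopping at [DONE].
import json

def parse_sse(lines):
    """Yield parsed JSON chunks from `data:` lines; stop at the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip the blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

raw = [
    'data: {"choices":[{"delta":{"content":"Hi"}}]}',
    '',
    'data: [DONE]',
]
chunks = list(parse_sse(raw))
```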
Streaming with thinking mode
Two-phase streaming: thinking first, then the answer.
- OpenAI Chat Completions
- OpenAI Responses API
- DashScope
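One way to handle the two phases is to route each delta by field. This sketch assumes thinking tokens arrive in a reasoning_content delta field (the Qwen-style convention) and answer tokens in content; plain dicts stand in for live chunks:

```python
# Sketch: separate thinking tokens from answer tokens in a two-phase stream.
# Assumes the `reasoning_content` field name used by Qwen-style APIs.
def split_phases(deltas):
    """Collect thinking text and answer text from a stream of delta dicts."""
    thinking, answer = [], []
    for delta in deltas:
        if delta.get("reasoning_content"):
            thinking.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(thinking), "".join(answer)

thinking, answer = split_phases([
    {"reasoning_content": "Let me think"},
    {"reasoning_content": "..."},
    {"content": "The answer is 4."},
])
```

In a UI you would typically render the two buffers separately, e.g. the thinking text in a collapsible panel.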
Streaming with tool calls
When streaming function calling responses, tool call arguments arrive as incremental deltas that must be concatenated before JSON parsing.
- Chat Completions
- Responses API
Each chunk's delta may contain tool_calls[i].function.arguments, a partial JSON string. Accumulate all fragments per tool call index, then JSON.parse() the complete string.
With thinking mode enabled, the stream delivers three phases: thinking tokens, then tool call deltas, then (after you send tool results) the final answer.
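The accumulation step above can be sketched as follows; plain dicts mimic the delta.tool_calls entries of live chunks, and the helper name is illustrative:

```python
# Sketch: merge incremental tool_calls deltas by index, then parse the
# completed argument strings. Fragments are only valid JSON once joined.
import json
from collections import defaultdict

def accumulate_tool_calls(deltas):
    """Return {index: (function_name, parsed_arguments)} from streamed deltas."""
    names, args = {}, defaultdict(str)
    for delta in deltas:
        for tc in delta.get("tool_calls", []):
            i = tc["index"]
            fn = tc.get("function", {})
            if fn.get("name"):          # name usually arrives in the first delta
                names[i] = fn["name"]
            args[i] += fn.get("arguments", "")  # concatenate partial JSON
    # Parse only after the stream ends, when each string is complete JSON.
    return {i: (names[i], json.loads(args[i])) for i in names}

calls = accumulate_tool_calls([
    {"tool_calls": [{"index": 0, "function": {"name": "get_weather", "arguments": '{"cit'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": 'y": "Paris"}'}}]},
])
```

Keying by index matters because a single response can stream several tool calls in parallel, interleaving their fragments.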
Notes
- Nginx proxy: set proxy_buffering off, or SSE events will be buffered
- High concurrency: size the connection pool and monitor file descriptors
- Web frontend: use ReadableStream with TextDecoderStream
- Quality: streaming does not affect response quality
- Streaming-only models: QwQ and QVQ only support streaming output. Non-streaming calls to these models will fail or return empty content.