NPCs and virtual personas
Qwen's role-playing model is designed for virtual social interactions, game NPCs, IP personification, and hardware integration.
This model supports session caching to improve response speed. Tokens that hit the cache are billed as implicit caching.
If a call fails, see Error messages.
Supported models
| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost | Output cost |
|---|---|---|---|---|---|
| qwen-plus-character | 32,768 | 30,000 | 4,000 | $0.5 | $1.4 |
| qwen-flash-character | 8,192 | 8,000 | 4,096 | $0.05 | $0.4 |
| qwen-plus-character-ja | 8,192 | 7,680 | 512 | $0.5 | $1.4 |
API reference
For the input and output parameters, see Chat API reference.
Prerequisites
Get an API key and set it as an environment variable. To use the SDK, install it.
Usage
Define a character persona and initiate a conversation by sending user requests.
Making a conversation call
Character settings
When you use the Character model for role playing, configure the following aspects of the system message:
- Character details: Specify details including name, age, personality, occupation, profile, and relationships.
- Additional character descriptions: Include a comprehensive description of the character's experiences and interests. Use tags to separate different categories of content and describe them in text.
- Conversation context: Specify the background of the scenario and the relationships between characters. Provide clear instructions and requirements for the character to follow during the conversation.
- Additional style guidelines: Specify the character's style and the length of their responses. If the character needs to exhibit special behaviors, such as actions or expressions, include these as well.
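The four aspects above can be combined into a single system message. A minimal sketch follows; the persona text and the tag names are illustrative, not a schema required by the model (only the character name "Ling Lu" appears elsewhere in this guide):

```python
# Illustrative persona covering the four recommended aspects; the
# field layout and tags are examples, not a required schema.
persona = """Character details:
Name: Ling Lu; Age: 22; Occupation: tavern keeper in a small frontier town.
Personality: warm, quick-witted, a little teasing.

<experiences>Grew up on the frontier; took over the family tavern at 19.</experiences>
<interests>Collecting travelers' stories, brewing herbal tea.</interests>

Conversation context:
The user is a traveler who has just arrived at the tavern on a rainy night.

Style guidelines:
Reply in 1-3 sentences. Use parentheses () for actions and expressions."""

system_message = {"role": "system", "content": persona}
```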
Setting the opening line
Use the assistant message to set a conversation starter. Recommendations:
- Reflect the character's speaking style. For example, use parentheses () to indicate actions and use a tone that is either assertive or gentle.
- Reflect the scenario and character settings, such as relationships between partners, parents and children, or colleagues.
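For instance, an opening line is simply an assistant message written in the character's voice (the wording here is illustrative):

```python
# A sketch of an opening line: an assistant message that reflects the
# persona and scenario, with parentheses marking actions.
opening_line = {
    "role": "assistant",
    "content": "(wipes down the counter and looks up with a grin) "
               "Rough night out there, traveler. Come sit by the fire!",
}
```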
Appending conversation history
To maintain a continuous conversation, append the new content to the end of the messages array after each turn. If the conversation becomes too long, pass only the last n turns of the conversation history to manage the context window. The first element of the messages array must always be the system message.
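The trimming logic above can be sketched as a small helper; the turn limit of 20 is an arbitrary example:

```python
def build_messages(system_message, history, max_turns=20):
    """Keep the system message first and pass only the last
    max_turns turns (one turn = one user + one assistant message)."""
    trimmed = history[-(max_turns * 2):]
    return [system_message] + trimmed

system_message = {"role": "system", "content": "You are Ling Lu, a tavern keeper."}
history = []
for i in range(30):  # 30 turns of synthetic conversation
    history.append({"role": "user", "content": f"user turn {i}"})
    history.append({"role": "assistant", "content": f"reply {i}"})

messages = build_messages(system_message, history, max_turns=20)
```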
Making a request
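A minimal request sketch against the OpenAI-compatible endpoint. The base URL, the DASHSCOPE_API_KEY environment variable, and the persona text are assumptions; adjust them for your account and region:

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint (international region).
URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"

payload = {
    "model": "qwen-plus-character",
    "messages": [
        {"role": "system", "content": "You are Ling Lu, a warm-hearted tavern keeper."},
        {"role": "assistant", "content": "(smiles) Welcome in, traveler!"},
        {"role": "user", "content": "What's good to eat tonight?"},
    ],
}

def send(payload: dict) -> dict:
    """POST the request body and return the parsed JSON response."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# To call the model:
# reply = send(payload)
# print(reply["choices"][0]["message"]["content"])
```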
Diverse responses
Set the n parameter (1–4, default 1) to get multiple responses in a single request.
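As a sketch, the request body only needs the n field added; the persona text is illustrative:

```python
# Request two candidate replies in one call; n ranges from 1 to 4.
payload = {
    "model": "qwen-plus-character",
    "messages": [
        {"role": "system", "content": "You are Ling Lu, a tavern keeper."},
        {"role": "user", "content": "Tell me about this town."},
    ],
    "n": 2,  # the response's choices array will contain 2 entries
}
```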
Regenerating responses
If you are not satisfied with the model's output, adjust the seed parameter, which controls randomness, to generate a new response.
Result diversity is also affected by top_p and temperature. Low values may produce similar results even with different seed values; high values may produce varied results regardless of seed. We recommend keeping the defaults and adjusting only one parameter at a time.
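A sketch of a regeneration request: keep the messages identical and change only the seed between calls (the persona and seed values are illustrative):

```python
# Change seed between calls to regenerate a different reply while
# keeping top_p and temperature at their defaults.
payload = {
    "model": "qwen-plus-character",
    "messages": [
        {"role": "system", "content": "You are Ling Lu, a tavern keeper."},
        {"role": "user", "content": "Tell me a story."},
    ],
    "seed": 42,  # pick a new value, such as 43, to regenerate
}
```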
Simulating a group chat
The group chat feature lets the model play a specified role and interact with other characters.
Instructions:
- The role played by the model is assistant, and the role of other chat participants is user.
- Each character's name must be specified at the beginning of the content.
- When making a call, add an assistant message at the end. The message must start with the current character's name as a prefix, such as "Ling Lu:". Also set the parameter "partial": true.
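A group chat request can be sketched as follows; the participant names other than "Ling Lu" and the persona text are illustrative:

```python
# Group chat sketch: the model plays "Ling Lu"; other participants are
# user messages with the speaker's name prefixed to the content.
payload = {
    "model": "qwen-plus-character",
    "messages": [
        {"role": "system", "content": "You are Ling Lu, a tavern keeper. "
                                      "Several patrons share the chat."},
        {"role": "user", "content": "Aron: Ling Lu, another round for the table!"},
        {"role": "user", "content": "Mira: And some of that herbal tea, please."},
        # Prefix the current character's name and set partial so the
        # model continues from "Ling Lu:".
        {"role": "assistant", "content": "Ling Lu:", "partial": True},
    ],
}
```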
Continuous responses
If a user does not reply after receiving the model's output, prompt the model to continue the conversation. To do this, add an assistant message to the messages array with the content set to "Character Name:". In this message, you must also set the parameter "partial": true. The model then generates a follow-up in character that encourages the user to respond.
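As a sketch, the messages array for a continued response looks like this (the conversation content is illustrative):

```python
# The last reply went unanswered; append a prefix-only assistant
# message with "partial": true so the model continues as "Ling Lu".
messages = [
    {"role": "system", "content": "You are Ling Lu, a tavern keeper."},
    {"role": "user", "content": "Hi Ling Lu!"},
    {"role": "assistant", "content": "Ling Lu: (waves) Good to see you! "
                                     "What brings you in tonight?"},
    # No user reply arrived; ask the model to keep the conversation going.
    {"role": "assistant", "content": "Ling Lu:", "partial": True},
]
```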
Restricting output content
The model sometimes uses parentheses to indicate actions, such as (waves at you). If you want to prevent the model from outputting certain content, set the logit_bias parameter to adjust the probability of specific tokens appearing. The logit_bias field is a map where the key is a token ID and the value is a bias applied to that token. To view token IDs, download the logit_bias_id_mapping_table.json. The value ranges from -100 to 100. A value of -1 reduces the likelihood of selection, while 1 increases it. A value of -100 completely bans the token, and 100 makes it the only selectable token. We do not recommend setting the value to 100 because it can cause output loops.
For example, to prohibit the output of parentheses ():
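A sketch of such a request body. The token IDs below are placeholders, not the real IDs for "(" and ")"; look up the real IDs in logit_bias_id_mapping_table.json before using this:

```python
# Hypothetical token IDs for "(" and ")" -- replace with the real IDs
# from logit_bias_id_mapping_table.json.
PAREN_TOKEN_IDS = [7, 8]

payload = {
    "model": "qwen-plus-character",
    "messages": [
        {"role": "system", "content": "You are Ling Lu, a tavern keeper."},
        {"role": "user", "content": "Hello!"},
    ],
    # -100 completely bans a token; values range from -100 to 100.
    "logit_bias": {str(tid): -100 for tid in PAREN_TOKEN_IDS},
}
```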
Inserting supplementary information
In a multi-turn conversation, you may need to insert one-time supplementary information or instructions such as game status, operational prompts, or retrieval results. This information is not initiated by the user or character. This type of information can influence the character's response while keeping the conversation prefix (session) consistent to improve the cache hit ratio. Insert this content as a system message before the last unanswered user message. For example, insert retrieved user information such as "User's favorite food:\nFruit: Blueberry\nSnack: Fried chicken\nStaple food: Dumplings".
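The insertion position can be computed with a small helper; the conversation content is illustrative:

```python
def insert_supplement(messages, info):
    """Insert a one-time system message immediately before the last
    user message, leaving the earlier prefix untouched so the
    session cache can still hit."""
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            return messages[:i] + [{"role": "system", "content": info}] + messages[i:]
    return messages

history = [
    {"role": "system", "content": "You are Ling Lu, a tavern keeper."},
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "(waves) Welcome back!"},
    {"role": "user", "content": "What should I eat?"},
]
messages = insert_supplement(
    history,
    "User's favorite food:\nFruit: Blueberry\nSnack: Fried chicken",
)
```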
Long-term memory
The context window of role-playing models is limited to 32,000 tokens, which makes it difficult to support very long multi-turn conversations. After you enable long-term memory, the model regularly summarizes historical conversations and compresses them to within 1,000 tokens. This process retains key contextual information to support extended multi-turn conversations.
Long-term memory only supports Chinese scenarios.
Enable the feature
Set character_options.memory.enable_long_term_memory to true to enable long-term memory. Set the summary frequency using character_options.memory.memory_entries. After you enable this feature, use it as follows:
- Session binding: Each request must provide a unique session ID, such as a universally unique identifier (UUID), in the header. Pass the session ID in the x-dashscope-aca-session field to associate sessions. The system automatically purges sessions that are unused for 365 days.
- Persona setting: Pass the user persona in the character_options.profile field.
- Incremental input: The messages field only needs to include new messages. The system automatically loads and manages historical memory and summaries, which eliminates the need to manually concatenate the full context.
- Skipping summarization: Some messages, such as system messages, convey one-time supplementary information or instructions that are not part of the conversation history. These messages are not suitable for summarization in subsequent conversations. Examples include "Player enters Level 3" or "Today is Valentine's Day". Specify the message types to skip using the character_options.memory.skip_save_types parameter, which is an array:
  - system: Skips system messages that are added in the current turn.
  - user: Skips user messages that are added in the current turn.
  - assistant: Skips assistant messages that are added in the current turn.
  - output: Skips assistant messages that are generated in the current turn.
Memory summary mechanism
Set memory_entries to N. When the number of unsummarized messages reaches this value, a memory summary is triggered. The summary mechanism works as follows:
- The content input to the model in each turn includes the Profile, the latest summary if available, and the N most recent original messages.
- Summary generation and the model response execute asynchronously and incur model invocation billing charges. The summary is generated by the qwen-plus-character model.
- Summaries consolidate key persona and temporal information but do not retain all text details.
- Summaries are treated as model input and cannot be queried.

For example, if memory_entries is set to 3 (User_Message_X and Assistant_Message_X represent the user input and assistant response for conversation turn X, respectively):

| Conversation turn | User input | Input to model | Involved in summary generation |
|---|---|---|---|
| Turn 1 | Profile (persona information), User_Message_1 | Profile (persona information) + User_Message_1 | None |
| Turn 2 | Profile (persona information), User_Message_2 | Profile (persona information) + User_Message_1 + Assistant_Message_1 + User_Message_2 | User_Message_1 + Assistant_Message_1 + User_Message_2 generates Summary_1 |
| Turn 3 | Profile (persona information), User_Message_3 | Profile (persona information) + Summary_1 + User_Message_2 + Assistant_Message_2 + User_Message_3 | None |
| Turn 4 | Profile (persona information), User_Message_4 | Profile (persona information) + Summary_1 + User_Message_3 + Assistant_Message_3 + User_Message_4 | Assistant_Message_2 + User_Message_3 + Assistant_Message_3 + Summary_1 generates Summary_2 |
| Turn 5 | Profile (persona information), User_Message_5 | Profile (persona information) + Summary_2 + User_Message_4 + Assistant_Message_4 + User_Message_5 | User_Message_4 + Assistant_Message_4 + User_Message_5 + Summary_2 generates Summary_3 |
| Turn 6 | Profile (persona information), User_Message_6 | Profile (persona information) + Summary_3 + User_Message_5 + Assistant_Message_5 + User_Message_6 | None |
Example code
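A sketch of a long-term memory request. Whether character_options is accepted at the top level or must go through an SDK's extra_body depends on the interface you use, so treat the layout below as an assumption; also note that the guide states long-term memory only supports Chinese scenarios, and the English text here is purely illustrative:

```python
import uuid

# Reuse one session ID per user session; long-term memory is bound to it
# via the x-dashscope-aca-session request header.
session_id = str(uuid.uuid4())
headers = {"x-dashscope-aca-session": session_id}

payload = {
    "model": "qwen-plus-character",
    # Incremental input: only the new message for this turn; the service
    # loads historical memory and summaries for this session itself.
    "messages": [
        {"role": "user", "content": "Remember, my birthday is next Friday."},
    ],
    "character_options": {
        "profile": "The user is a night-shift nurse who likes blueberries.",
        "memory": {
            "enable_long_term_memory": True,
            "memory_entries": 3,            # summarize every 3 unsummarized messages
            "skip_save_types": ["system"],  # keep one-time system notes out of summaries
        },
    },
}
```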
Long-term memory related API parameters
Session cache
Session caching automatically manages context to avoid recalculating tokens, reducing costs and latency without affecting response quality.
How to enable session caching: To enable the cache service, add the x-dashscope-aca-session parameter to the request header and pass a Session ID.
Request header parameter:
x-dashscope-aca-session (required, string): The unique session identifier from your business system. It distinguishes different sessions. The value is user-defined.
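For example, a session ID can be generated once per conversation and reused in the request headers (using a UUID here is a convention, not a requirement):

```python
import uuid

# Create one session ID per conversation and reuse it on every request
# in that conversation so the cache can associate the turns.
session_id = str(uuid.uuid4())

headers = {
    "Content-Type": "application/json",
    # plus your Authorization header, e.g. "Bearer <DASHSCOPE_API_KEY>"
    "x-dashscope-aca-session": session_id,
}
```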
Advanced optimization for session-cached model requests
As the number of conversation turns increases, the messages array grows. This growth can cause the following problems:
- Too many tokens in a single request can affect performance and increase costs.
- A long context can dilute key information.
To mitigate these problems, pass only the system message and the 100 most recent conversation records.
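This trimming can be sketched as a helper that keeps the system message plus a bounded tail of the history:

```python
def trim_context(messages, max_records=100):
    """Keep the system message first, plus only the max_records most
    recent conversation records, to bound token usage per request."""
    system, history = messages[0], messages[1:]
    return [system] + history[-max_records:]

# Example: 150 records shrink to the system message + the last 100.
msgs = [{"role": "system", "content": "persona"}]
for i in range(75):
    msgs.append({"role": "user", "content": f"u{i}"})
    msgs.append({"role": "assistant", "content": f"a{i}"})

trimmed = trim_context(msgs)
```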