Kimi - Qwen Cloud

This guide shows how to call the Kimi K2.7 Code model via the OpenAI-compatible API or DashScope SDK.

Quick start

kimi-k2.7-code is the most capable Kimi model for coding. It follows long-context instructions more reliably and achieves higher success rates on programming tasks. It supports text, image, and video input, thinking mode, conversation, and agent tasks.

kimi-k2.7-code is a thinking-only model: thinking mode is always enabled (enable_thinking defaults to true and cannot be disabled), and preserve_thinking defaults to true.

Before you begin, get an API key and set it as an environment variable. The examples below read it from DASHSCOPE_API_KEY. If you call the model through an SDK, install the OpenAI or DashScope SDK.

The examples cover two interfaces: the OpenAI-compatible API first, then the DashScope SDK.

The enable_thinking parameter is not part of the standard OpenAI API. In the OpenAI Python SDK, pass it through extra_body. In the Node.js SDK, pass it as a top-level parameter.
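
As a minimal sketch of the point above, the request arguments for the OpenAI Python SDK can be assembled so that non-standard fields travel in extra_body. The helper name build_chat_kwargs is hypothetical, and the sketch only builds the argument dictionary without making a network call. Because kimi-k2.7-code is thinking-only, sending enable_thinking is optional and only true is meaningful.

```python
from typing import Any


def build_chat_kwargs(prompt: str, enable_thinking: bool = True) -> dict[str, Any]:
    """Assemble keyword arguments for client.chat.completions.create().

    build_chat_kwargs is a hypothetical helper, not part of any SDK.
    """
    return {
        "model": "kimi-k2.7-code",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        # Non-standard fields go in extra_body; the OpenAI Python SDK
        # merges them into the JSON request body.
        "extra_body": {"enable_thinking": enable_thinking},
    }


kwargs = build_chat_kwargs("Who are you?")
print(kwargs["extra_body"])
```

With a configured client, the call would then be client.chat.completions.create(**kwargs).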

The OpenAI-compatible examples follow in this order: Python, Node.js, curl.

import os
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
  model="kimi-k2.7-code",
  messages=[{"role": "user", "content": "Who are you?"}],
  stream=True,
)

reasoning_content = ""
answer_content = ""
is_answering = False
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")

for chunk in completion:
  if chunk.choices:
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
      if not is_answering:
        print(delta.reasoning_content, end="", flush=True)
      reasoning_content += delta.reasoning_content
    if hasattr(delta, "content") and delta.content:
      if not is_answering:
        print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
        is_answering = True
      print(delta.content, end="", flush=True)
      answer_content += delta.content

import OpenAI from "openai";
import process from "process";

const openai = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});

const stream = await openai.chat.completions.create({
  model: "kimi-k2.7-code",
  messages: [{role: "user", content: "Who are you?"}],
  stream: true,
});

let reasoningContent = "";
let answerContent = "";
let isAnswering = false;

for await (const chunk of stream) {
  if (chunk.choices && chunk.choices.length > 0) {
    const delta = chunk.choices[0].delta;
    if (delta.reasoning_content) {
      if (!isAnswering) {
        process.stdout.write(delta.reasoning_content);
      }
      reasoningContent += delta.reasoning_content;
    }
    if (delta.content) {
      if (!isAnswering) {
        console.log("\n" + "=".repeat(20) + "Complete Response" + "=".repeat(20) + "\n");
        isAnswering = true;
      }
      process.stdout.write(delta.content);
      answerContent += delta.content;
    }
  }
}

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "kimi-k2.7-code",
  "messages": [
    {"role": "user", "content": "Who are you?"}
  ],
  "stream": true
}'
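
The curl request returns a server-sent event stream rather than a single JSON object. As a rough sketch, assuming the stream uses the usual OpenAI-compatible framing (each event is a line starting with "data:" that carries one JSON chunk, and the stream ends with "data: [DONE]"), the lines can be split into reasoning text and answer text as follows. The function name split_stream and the sample lines are illustrative and were not captured from a real response.

```python
import json
from typing import Iterable


def split_stream(lines: Iterable[str]) -> tuple[str, str]:
    """Return (reasoning, answer) accumulated from SSE lines."""
    reasoning, answer = [], []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            reasoning.append(delta.get("reasoning_content") or "")
            answer.append(delta.get("content") or "")
    return "".join(reasoning), "".join(answer)


# Hand-written sample lines, for illustration only.
sample = [
    'data: {"choices": [{"delta": {"reasoning_content": "The user asks "}}]}',
    'data: {"choices": [{"delta": {"reasoning_content": "who I am."}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "I am "}}]}',
    'data: {"choices": [{"delta": {"content": "Kimi."}}]}',
    'data: {"choices": [], "usage": null}',
    "data: [DONE]",
]
reasoning, answer = split_stream(sample)
print(reasoning)
print(answer)
```

Keeping the two accumulators separate mirrors the SDK examples, which track reasoning_content and content independently.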

The DashScope examples follow in this order: Python, Java, curl.

import os
import dashscope

dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

response = dashscope.Generation.call(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  model="kimi-k2.7-code",
  messages=[{"role": "user", "content": "Who are you?"}],
  result_format="message",
  stream=True,
  incremental_output=True,
)

reasoning_content = ""
answer_content = ""
is_answering = False
print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")

for chunk in response:
  if chunk.status_code == 200:
    message = chunk.output.choices[0].message
    if hasattr(message, "reasoning_content") and message.reasoning_content:
      if not is_answering:
        print(message.reasoning_content, end="", flush=True)
      reasoning_content += message.reasoning_content
    if message.content:
      if not is_answering:
        print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
        is_answering = True
      print(message.content, end="", flush=True)
      answer_content += message.content

import java.util.Arrays;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.protocol.Protocol;
import io.reactivex.Flowable;

public class Main {
  // streamCall declares checked exceptions, so main propagates them.
  public static void main(String[] args) throws Exception {
    Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");
    Message userMsg = Message.builder()
        .role(Role.USER.getValue())
        .content("Who are you?")
        .build();
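    GenerationParam param = GenerationParam.builder()
        // Read the API key from the environment, as the Python example does.
        .apiKey(System.getenv("DASHSCOPE_API_KEY"))
        .model("kimi-k2.7-code")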
        .messages(Arrays.asList(userMsg))
        .resultFormat(GenerationParam.ResultFormat.MESSAGE)
        .incrementalOutput(true)
        .build();
    Flowable<GenerationResult> result = gen.streamCall(param);
    StringBuilder reasoningContent = new StringBuilder();
    StringBuilder answerContent = new StringBuilder();
    result.blockingForEach(chunk -> {
      String reasoning = chunk.getOutput().getChoices().get(0).getMessage().getReasoningContent();
      String content = chunk.getOutput().getChoices().get(0).getMessage().getContent();
      if (reasoning != null && !reasoning.isEmpty()) {
        reasoningContent.append(reasoning);
        System.out.print(reasoning);
      }
      if (content != null && !content.isEmpty()) {
        answerContent.append(content);
        System.out.print(content);
      }
    });
  }
}

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-SSE: enable" \
  -d '{
  "model": "kimi-k2.7-code",
  "input": {
    "messages": [
      {"role": "user", "content": "Who are you?"}
    ]
  },
  "parameters": {
    "result_format": "message",
    "incremental_output": true
  }
}'

Supported features

Feature	kimi-k2.7-code
Multi-turn conversation	✓
Deep thinking	✓ (always on)
Function calling	✓
Structured output	—
Web search	—
Context cache	✓
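
Function calling is listed as supported. The following is a minimal sketch of the client side, assuming the OpenAI-compatible endpoint accepts the standard Chat Completions tools format and returns tool calls whose arguments are a JSON string. The count_lines tool, the REGISTRY mapping, and the run_tool_call helper are hypothetical, and the example tool call is hand-written.

```python
import json
from typing import Any, Callable


def count_lines(text: str) -> int:
    """Hypothetical local function the model may ask to run."""
    return len(text.splitlines())


# Tool description in the standard Chat Completions "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "count_lines",
            "description": "Count the lines in a piece of text.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    }
]

# Maps tool names to the local callables that implement them.
REGISTRY: dict[str, Callable[..., Any]] = {"count_lines": count_lines}


def run_tool_call(tool_call: dict[str, Any]) -> dict[str, str]:
    """Execute one tool call and build the matching "tool" role message."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    result = REGISTRY[fn["name"]](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


# Hand-written tool call, shaped like one entry of message.tool_calls.
example_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "count_lines", "arguments": '{"text": "a\\nb\\nc"}'},
}
print(run_tool_call(example_call))
```

In a real exchange, TOOLS would be passed as the tools argument of the request, and the returned message would be appended to messages before the next call.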

Parameter defaults

Parameter	kimi-k2.7-code
enable_thinking	true (thinking mode only)
temperature	1.0
top_p	0.95
presence_penalty	0.0

Models and billing

The Kimi series is a family of large language models from Moonshot AI.

kimi-k2.7-code: The most capable Kimi model for coding. It follows long-context instructions more reliably and achieves higher success rates on programming tasks. It supports text, image, and video input, thinking mode, conversation, and agent tasks.

For pricing and context window details, see the Qwen Cloud console. Billing is based on input and output token counts.

In thinking mode, the chain of thought counts as output tokens.
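
Because the chain of thought is billed as output, a reply with long reasoning can cost noticeably more than its visible answer suggests. Below is a small worked sketch of the arithmetic. The per-million-token prices and the token counts are placeholders chosen for illustration, not actual Qwen Cloud prices, and estimate_cost is a hypothetical helper.

```python
def estimate_cost(
    input_tokens: int,
    reasoning_tokens: int,
    answer_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimate the cost of one call, with prices given per million tokens."""
    # Reasoning tokens are billed at the output rate, like the answer itself.
    output_tokens = reasoning_tokens + answer_tokens
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Placeholder numbers: 2,000 input, 6,000 reasoning, and 1,000 answer tokens.
cost = estimate_cost(2000, 6000, 1000, input_price_per_m=1.0, output_price_per_m=4.0)
print(f"{cost:.4f}")
```

In this made-up case the reasoning accounts for six of the seven thousand billed output tokens, which is why long thinking traces dominate the total.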

Error codes

If a model call fails and returns an error message, see Error codes.