
Add visual understanding capabilities

Vision for coding models

Models such as qwen3.5-plus and kimi-k2.5 support image understanding natively. Text-only models (glm-5, MiniMax-M2.5) need a local skill for visual capabilities.
Image understanding skills consume Coding Plan quota; no additional charges apply.

Prerequisites

  1. Subscribe to Coding Plan. See Getting started.
  2. Set up Coding Plan. See your tool's setup guide.

Visual support status

Native support: qwen3.5-plus and kimi-k2.5 accept images directly; no configuration is needed.
Via skill or agent: qwen3-max-2026-01-23, qwen3-coder-next, qwen3-coder-plus, glm-5, glm-4.7, and MiniMax-M2.5 require a skill or agent for visual capabilities.

Method 1: Switch to a model with native visual support

If you frequently work with images, switch to qwen3.5-plus or kimi-k2.5:
  • Claude Code: /model qwen3.5-plus or /model kimi-k2.5
  • OpenCode: /models, then search for and select qwen3.5-plus or kimi-k2.5
  • Qwen Code: /model, then select qwen3.5-plus or kimi-k2.5
Qwen Code uses an OpenAI-compatible API that doesn't support image input. For image understanding tasks, use Claude Code or OpenCode instead.
Reference image paths directly or drag images into conversations.
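Under the hood, tools built on the Anthropic-style Messages API pass images as base64 content blocks. The sketch below builds such a payload by hand; the exact schema accepted by the Coding Plan endpoint is an assumption based on the standard Anthropic message format, and `image/png` is assumed as the media type:

```python
import base64


def build_image_message(image_path: str, prompt: str) -> dict:
    """Build an Anthropic-style messages payload pairing an image
    with a text prompt (media type image/png assumed)."""
    with open(image_path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": "qwen3.5-plus",
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": data}},
                {"type": "text", "text": prompt},
            ],
        }],
    }
```

The returned dict can be POSTed as JSON to an Anthropic-compatible /v1/messages endpoint with your Coding Plan API key.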

Method 2: Add visual capabilities using a skill or agent

For text-only models (glm-5, MiniMax-M2.5), configure a skill or agent.

Add a skill

  1. Create a skills/image-analyzer folder in the .claude directory:
mkdir -p .claude/skills/image-analyzer
  2. Create SKILL.md:
---
name: image-analyzer
description: Analyzes images for text-only models. Use when you need to extract information from screenshots, charts, diagrams, or any visual content. Pass the image path.
model: qwen3.5-plus
---
qwen3.5-plus has visual understanding capabilities. Use the qwen3.5-plus model directly for image understanding.
  3. Folder structure:
.claude/
└── skills/
  └── image-analyzer/
    └── SKILL.md
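The steps above can be run as one shell snippet; the heredoc body mirrors the SKILL.md shown above:

```shell
# Create the skill folder and write SKILL.md in one go
mkdir -p .claude/skills/image-analyzer
cat > .claude/skills/image-analyzer/SKILL.md <<'EOF'
---
name: image-analyzer
description: Analyzes images for text-only models. Use when you need to extract information from screenshots, charts, diagrams, or any visual content. Pass the image path.
model: qwen3.5-plus
---
qwen3.5-plus has visual understanding capabilities. Use the qwen3.5-plus model directly for image understanding.
EOF
```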

Getting started

  1. Start Claude Code in your project directory. Switch to glm-5 with /model glm-5.
  2. Place an image in your project directory, then ask: Load the image-analyzer skill and describe the information in <your-image>.

FAQ

OpenCode can't read images

Cause: OpenCode doesn't enable visual capabilities by default; modalities must be declared in the configuration.
Solution: In the OpenCode configuration, add a modalities field and set input to ["text", "image"]. Replace sk-sp-xxx with your API key:
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "qwen-cloud-coding-plan": {
      "npm": "@ai-sdk/anthropic",
      "name": "Qwen Cloud Coding Plan",
      "options": {
        "baseURL": "https://coding-intl.dashscope.aliyuncs.com/apps/anthropic/v1",
        "apiKey": "sk-sp-xxx"
      },
      "models": {
        "qwen3.5-plus": {
          "name": "Qwen3.5 Plus",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        },
        "kimi-k2.5": {
          "name": "Kimi K2.5",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        }
      }
    }
  }
}
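If you manage the configuration programmatically, the required modalities block can be patched in before restarting OpenCode. This is a small sketch; the `enable_image_input` helper is my own illustration, not part of OpenCode, and it assumes the provider/models layout shown in the config above:

```python
def enable_image_input(config: dict, model_ids) -> dict:
    """Add image input modalities to the named models in an
    OpenCode-style provider config (modified in place and returned)."""
    for provider in config.get("provider", {}).values():
        for model_id, model in provider.get("models", {}).items():
            if model_id in model_ids:
                modalities = model.setdefault("modalities", {})
                modalities["input"] = ["text", "image"]
                modalities.setdefault("output", ["text"])
    return config
```

Load the JSON config, call the helper for qwen3.5-plus and kimi-k2.5, and write the file back before restarting OpenCode.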
OpenClaw can't read images

Cause: OpenClaw determines visual support from the input field in the model configuration.
Solution:
  1. In ~/.openclaw/openclaw.json, ensure the model definition includes "input": ["text", "image"]:
{
  "models": {
    "mode": "merge",
    "providers": {
      "bailian": {
        "baseUrl": "https://coding-intl.dashscope.aliyuncs.com/v1",
        "apiKey": "sk-sp-xxx",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3.5-plus",
            "name": "qwen3.5-plus",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 1000000,
            "maxTokens": 65536
          },
          {
            "id": "kimi-k2.5",
            "name": "kimi-k2.5",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 262144,
            "maxTokens": 32768
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "bailian/qwen3.5-plus"
      },
      "models": {
        "bailian/qwen3.5-plus": {},
        "bailian/kimi-k2.5": {}
      }
    }
  },
  "gateway": {
    "mode": "local"
  }
}
  2. Clear the model cache and restart OpenClaw:
rm ~/.openclaw/agents/main/agent/models.json
openclaw gateway restart
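If the error persists after the restart, it's worth confirming that every model in the edited file actually declares image input. This sketch walks the OpenClaw-style config structure shown above and lists any models missing "image"; the helper name is my own:

```python
import json
from pathlib import Path


def openclaw_models_missing_image(config: dict) -> list:
    """Return IDs of models whose input list doesn't include 'image'."""
    missing = []
    for provider in config.get("models", {}).get("providers", {}).values():
        for model in provider.get("models", []):
            if "image" not in model.get("input", []):
                missing.append(model.get("id", "<unknown>"))
    return missing


# Example usage:
# config = json.loads(Path("~/.openclaw/openclaw.json").expanduser().read_text())
# print(openclaw_models_missing_image(config))  # empty list means all models are set
```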