Integrate multimodal generation models

Image generation models must be integrated through the extension mechanism (Skill, Slash Command, or Agent) of each tool.

Overview

AI coding tools cannot call image generation models directly through their model configuration. Instead, you define an extension (a Skill, Slash Command, or Agent, depending on the tool) that instructs the tool to call the image generation API on your behalf.

Example: Claude Code

The following example demonstrates how to integrate an image generation model in Claude Code using a Slash Command. The process is similar for other tools; what differs is the extension mechanism, the configuration file path, and in some cases the front matter.

Step 1: Create a Slash Command

Create the file `.claude/commands/text-to-image.md` in your project root directory with the following content:

Call the Token Plan text-to-image API to generate an image based on a description.

User request: $ARGUMENTS

## Steps

1. Extract prompt (image description), model (default: qwen-image-2.0), and size (default: 1024*1024) from the user request.

2. Call the API to generate an image (use the Bash tool to run curl):

```
curl -s -X POST "https://token-plan.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
  -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model>",
    "input": {
      "messages": [{"role":"user","content":[{"text":"<prompt>"}]}]
    },
    "parameters": {"size":"<size>"}
  }'
```

3. Extract the image URL from `output.choices[*].message.content[*].image` in the response JSON.

4. Download the image to the current directory with `curl -s -o "generated_$(date +%Y%m%d_%H%M%S).png" "<URL>"`.

5. Display the generated image file path to the user.
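
The steps above leave request construction and response parsing to the model. If you would rather have the command call deterministic helpers, the same logic could be sketched in shell as follows. This is a minimal sketch and not part of the command file: the function names (`json_escape`, `build_payload`, `extract_image_urls`, `output_name`) are hypothetical, the response shape is assumed from the `output.choices[*].message.content[*].image` path in step 3, and the `grep`/`sed` extraction is a simplification that matches any `"image"` key and does not decode JSON escapes.

```shell
#!/bin/sh
# Hypothetical helpers mirroring steps 1-4; the names are illustrative only.

# Escape backslashes and double quotes so a prompt can sit inside a JSON string.
# Simplification: newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_payload PROMPT [MODEL] [SIZE]
# Defaults follow step 1: qwen-image-2.0 and 1024*1024.
build_payload() {
  printf '{"model":"%s","input":{"messages":[{"role":"user","content":[{"text":"%s"}]}]},"parameters":{"size":"%s"}}' \
    "${2:-qwen-image-2.0}" "$(json_escape "$1")" "${3:-1024*1024}"
}

# Read a response body on stdin and print one image URL per line.
# Assumes each URL is stored under an "image" key, as step 3 describes.
extract_image_urls() {
  grep -o '"image"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed -e 's/^"image"[[:space:]]*:[[:space:]]*"//' -e 's/"$//'
}

# Timestamped file name, matching the pattern in step 4.
output_name() {
  printf 'generated_%s.png' "$(date +%Y%m%d_%H%M%S)"
}

# Example wiring (commented out: it needs network access and a valid token):
# build_payload "a cat" |
#   curl -s -X POST "https://token-plan.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
#     -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN" \
#     -H "Content-Type: application/json" -d @- |
#   extract_image_urls |
#   while read -r url; do curl -s -o "$(output_name)" "$url"; done
```

Building the body with `printf` and an escaping step avoids broken JSON when the prompt contains double quotes. For anything beyond a quick experiment, a real JSON parser is a safer choice than pattern matching for the extraction step.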

Step 2: Generate an image

In Claude Code, type `/text-to-image draw a cat`.

Other tools

The following table shows the extension mechanism and configuration file path for each tool. Save the content from the Claude Code example above to the corresponding path, adding the front matter described after the table where the tool requires it.

| Tool | Extension mechanism | Configuration file path |
| --- | --- | --- |
| Claude Code | Slash Command | `.claude/commands/text-to-image.md` |
| Codex | Skill | `~/.codex/skills/token-plan-image/SKILL.md` |
| Qwen Code | Skill | `~/.qwen/skills/text-to-image/SKILL.md` |
| OpenCode | Agent | `.opencode/agents/text-to-image.md` |
| OpenClaw | Skill | `~/.openclaw/workspace/skills/token-plan-image/SKILL.md` |
| Hermes Agent | Skill | `~/.hermes/skills/media/text-to-image/SKILL.md` |

Skill-based tools (Codex, Qwen Code, OpenClaw, Hermes Agent) require YAML front matter at the beginning of the configuration file:

---
name: "token-plan-image"
description: "Call the Token Plan text-to-image model to generate images from text descriptions. Activates when the user asks to draw or generate images."
---

(... same content as the Claude Code example above ...)
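
One way to produce a Skill file is to prepend the front matter to the body you already saved for Claude Code. The following is a minimal sketch under that assumption: `make_skill` is a hypothetical helper, not a command shipped by any of these tools, and it writes the name and description verbatim, so values containing double quotes would need escaping first.

```shell
#!/bin/sh
# make_skill SRC DEST NAME DESCRIPTION
# Hypothetical helper: writes YAML front matter followed by the contents of SRC
# to DEST, creating the parent directory if it does not exist.
make_skill() {
  ms_src=$1 ms_dest=$2 ms_name=$3 ms_desc=$4
  mkdir -p "$(dirname "$ms_dest")"
  {
    printf '%s\n' '---' "name: \"$ms_name\"" "description: \"$ms_desc\"" '---' ''
    cat "$ms_src"
  } > "$ms_dest"
}

# Example (paths taken from the table above):
# make_skill .claude/commands/text-to-image.md \
#   "$HOME/.qwen/skills/text-to-image/SKILL.md" \
#   "token-plan-image" \
#   "Call the Token Plan text-to-image model to generate images from text descriptions."
```

Keeping a single source file and generating the per-tool copies from it means an edit to the steps only has to be made once.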

OpenCode Agent requires a different front matter format:

---
description: "Call the Token Plan text-to-image model to generate images from text descriptions."
mode: subagent
tools:
  bash: true
  write: false
  edit: false
---

(... same content as the Claude Code example above ...)
