
Image search

Find images via Responses

The Responses API provides two built-in image search tools: text-to-image search finds images that match a text description, and image-to-image search finds images visually similar to an input image. Both tools return a JSON array of results and a model-generated analysis, and both are available only through the Responses API.

Text-to-image search

Search the internet for images that match a text description, then let the model describe and reason about them. Pass {"type": "web_search_image"} in the tools parameter; the model decides when to search based on the input.

Example

import os
import json
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("YOUR_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
)

response = client.responses.create(
  model="qwen3.6-plus",
  input="Find a tech-themed background image suitable for a PowerPoint cover",
  tools=[
    {
      "type": "web_search_image"
    }
  ]
)

for item in response.output:
  if item.type == "web_search_image_call":
    print(f"[Tool call] Text-to-image search (status: {item.status})")
    if item.output:
      images = json.loads(item.output)
      print(f"  Found {len(images)} images:")
      for img in images[:5]:
        print(f"  [{img['index']}] {img['title']}")
        print(f"      {img['url']}")
      if len(images) > 5:
        print(f"  ... Total {len(images)} images")
  elif item.type == "message":
    print("\n[Model response]")
    print(response.output_text)

print(f"\n[Token usage] Input: {response.usage.input_tokens}, Output: {response.usage.output_tokens}, Total: {response.usage.total_tokens}")
Sample output:
[Tool call] Text-to-image search (status: completed)
  Found 30 images:
  [1] Best Free Information Technology Background S Google Slides Themes ...
      https://image.slidesdocs.com/responsive-images/slides/0-technology-line-network-information-training-courseware-powerpoint-background_17825ea41f__960_540.jpg
  [2] Data Technology Blue Abstract Business Glow Powerpoint Background ...
      https://image.slidesdocs.com/responsive-images/background/data-technology-blue-abstract-business-glow-powerpoint-background_e667bfafcb__960_540.jpg
  ...

[Model response]
Here are several tech-themed background images perfect for your PowerPoint cover...

[Token usage] Input: 4326, Output: 645, Total: 4971

Response format

The response output array contains two item types:
web_search_image_call: The raw search results as a JSON array. Each object includes index, title, and url.
message: The model's analysis and recommendations based on the search results.
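
Decoding a web_search_image_call item comes down to parsing its output field as a JSON array. A minimal sketch on a hand-made sample payload (the payload below is illustrative, not real search output):

```python
import json

# Illustrative sample of what a web_search_image_call output string contains
sample_output = json.dumps([
    {"index": 1, "title": "Tech background A", "url": "https://example.com/a.jpg"},
    {"index": 2, "title": "Tech background B", "url": "https://example.com/b.jpg"},
])

def parse_image_results(output_str):
    """Decode the tool output into a list of {index, title, url} dicts."""
    return json.loads(output_str)

images = parse_image_results(sample_output)
print(len(images))          # 2
print(images[0]["title"])   # Tech background A
```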

Billing

Text-to-image search incurs two types of charges:
Model call fees: Image search results are added to the prompt, which increases the input token count. Standard model rates apply. See Pricing for details.
Tool calling fees: $8 per 1,000 calls.
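
As a rough illustration of how the two charge types combine, a back-of-the-envelope estimate can be sketched like this. The per-token rate below is a placeholder, not a published price; only the $8 per 1,000 calls tool fee comes from this page:

```python
def estimate_cost(num_calls, extra_input_tokens, input_rate_per_1k):
    """Estimate text-to-image search cost: tool fee plus extra input tokens.

    input_rate_per_1k is a placeholder rate; check Pricing for real rates.
    """
    tool_fee = num_calls * 8 / 1000                      # $8 per 1,000 calls
    token_fee = extra_input_tokens / 1000 * input_rate_per_1k
    return tool_fee + token_fee

# e.g. 10 calls adding 4,000 input tokens each, at a hypothetical $0.002/1K tokens
print(round(estimate_cost(10, 10 * 4000, 0.002), 4))  # 0.16
```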
Image-to-image search

Find visually similar images on the internet from an input image, then let the model analyze the results. Pass {"type": "image_search"} in the tools parameter and provide the image using the input_image content type. Optionally, include an input_text message to provide additional search context.

Example

Replace image_url in the example code with a publicly accessible image URL (the OpenAI SDK does not support local file paths).
import os
import json
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("YOUR_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1"
)

input_content = [
  {"type": "input_text", "text": "Find landscape images with a similar style to this one"},
  {"type": "input_image", "image_url": "https://img.alicdn.com/imgextra/i4/O1CN01YbrnSS1qtmsAkw0Ud_!!6000000005554-2-tps-788-450.png"}
]

response = client.responses.create(
  model="qwen3.6-plus",
  input=[{"role": "user", "content": input_content}],
  tools=[{"type": "image_search"}]
)

for item in response.output:
  if item.type == "image_search_call":
    print(f"[Tool call] Image-to-image search (status: {item.status})")
    if item.output:
      images = json.loads(item.output)
      print(f"  Found {len(images)} images:")
      for img in images[:5]:
        print(f"  [{img['index']}] {img['title']}")
        print(f"      {img['url']}")
      if len(images) > 5:
        print(f"  ... Total {len(images)} images")
  elif item.type == "message":
    print("\n[Model response]")
    print(response.output_text)

print(f"\n[Token usage] Input: {response.usage.input_tokens}, Output: {response.usage.output_tokens}, Total: {response.usage.total_tokens}")
Sample output:
[Tool call] Image-to-image search (status: completed)
  Found 2 images:
  [1] QingMing Festival Holiday Notice 2024
      https://www.healthcabin.net/blog/wp-content/uploads/2024/04/QingMing-Festival-Holiday-Notice-2024.jpg
  [2] Serene Asian Landscape Stone Bridge Reflecting in Misty Water
      https://thumbs.dreamstime.com/b/serene-asian-landscape-stone-bridge-reflecting-misty-water-tranquil-illustration-traditional-arch-spanning-lake-style-376972039.jpg

[Model response]
Okay, I have found several landscape images with a similar style for you...

[Token usage] Input: 2753, Output: 181, Total: 2934

Response format

The response output array contains two item types:
image_search_call: The tool call result containing a JSON array of matched images. Each object includes index, title, and url.
message: The model's analysis of the search results, accessible through response.output_text.

Billing

Image-to-image search incurs two types of charges:
Model input tokens: Search results are appended to the prompt, which increases the input token count. Billed at the model's standard rate. See Pricing for details.
Tool call fee: $8 per 1,000 calls.

Supported models

Both image search tools support the same set of models.
Qwen-Plus: qwen3.6-plus, qwen3.5-plus, qwen3.5-plus-2026-02-15
Qwen-Flash: qwen3.5-flash, qwen3.5-flash-2026-02-23
Open source Qwen: qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b

Streaming

For general streaming concepts (SSE protocol, how to enable streaming, billing, and token usage), see Streaming output. This section covers only the streaming behavior specific to image search.
Image search can take several seconds. Enable streaming to receive results incrementally by setting stream=True (Python) or stream: true (Node.js/curl). The response emits events in the following order:
response.output_item.added: Tool call starts. Display a loading indicator.
response.output_item.done: Tool call completes. Parse event.item.output as JSON to get the image list.
response.content_part.added: Model starts responding. Prepare to render streamed text.
response.output_text.delta: Model sends a text chunk. Append event.delta to the output.
response.completed: Full response ready. Read final usage statistics.
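
The event sequence above can be handled with a simple dispatch loop. The mock events below stand in for the SDK's stream objects; their attribute names follow the table, but the SimpleNamespace stand-ins are invented for illustration:

```python
import json
from types import SimpleNamespace

def handle_stream(events):
    """Dispatch streamed image-search events; returns (images, text)."""
    images, chunks = [], []
    for event in events:
        if event.type == "response.output_item.added":
            pass  # tool call started: show a loading indicator here
        elif event.type == "response.output_item.done":
            if getattr(event.item, "output", None):
                images = json.loads(event.item.output)  # the image list
        elif event.type == "response.output_text.delta":
            chunks.append(event.delta)  # accumulate streamed text
    return images, "".join(chunks)

# Mock events mimicking the documented order
mock = [
    SimpleNamespace(type="response.output_item.added", item=None),
    SimpleNamespace(type="response.output_item.done",
                    item=SimpleNamespace(output='[{"index": 1, "title": "t", "url": "u"}]')),
    SimpleNamespace(type="response.output_text.delta", delta="Here are "),
    SimpleNamespace(type="response.output_text.delta", delta="the images."),
]
imgs, text = handle_stream(mock)
print(len(imgs), text)  # 1 Here are the images.
```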

FAQs

What image formats and input methods are supported?

See Image limits for supported formats and size constraints, and File input methods for how to pass images.
The OpenAI SDK does not support local file path input.
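
Since local file paths are rejected, one common workaround is encoding the file as a base64 data URL. Whether the image_search endpoint accepts data URLs for input_image is an assumption here; verify it against File input methods before relying on this:

```python
import base64

def to_data_url(image_bytes, mime="image/png"):
    """Encode raw image bytes as a base64 data URL.

    Assumption: the endpoint accepts data URLs for input_image;
    confirm against the File input methods documentation.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

url = to_data_url(b"\x89PNG...")  # truncated bytes, for illustration only
print(url[:22])  # data:image/png;base64,
```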

How many images can I pass as input?

The total token count for all images and text must stay within the model's maximum input length. The model searches one image at a time but can invoke the tool multiple times in a single response to cover several images.
The model determines the number of images to search.
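
Passing several images is just a matter of adding more input_image entries to the content array; the model may then invoke the tool once per image. A small helper sketch (the URLs are placeholders):

```python
def build_multi_image_input(text, image_urls):
    """Build a Responses API input with one text part and N image parts."""
    content = [{"type": "input_text", "text": text}]
    content += [{"type": "input_image", "image_url": u} for u in image_urls]
    return [{"role": "user", "content": content}]

msgs = build_multi_image_input(
    "Find similar images for each of these",
    ["https://example.com/a.png", "https://example.com/b.png"],  # placeholders
)
print(len(msgs[0]["content"]))  # 3
```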

How many results does a search return?

The model determines the number of results per search. The count is not fixed, but the maximum is 100 images.