Vision (Multimodal)
Send images alongside text for visual understanding and analysis.
Overview
Vision-capable models can analyze images and answer questions about them. Yunxin supports multimodal inputs through the same Chat Completions API.
Sending Images
Include images in the `content` array using the `image_url` content type:
```json
{
  "model": "model-id",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/photo.jpg"
          }
        }
      ]
    }
  ]
}
```

Image Formats
| Format | Supported |
|---|---|
| URL (HTTPS) | Yes |
| Base64 data URI | Yes |
| Local file path | No (use base64) |
Base64 Encoding
```python
import base64

# Read the local image and encode it as base64.
with open("image.png", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="model-id",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)
```

Multiple Images
Send multiple images in a single request:
```python
response = client.chat.completions.create(
    model="model-id",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images."},
                {"type": "image_url", "image_url": {"url": "https://example.com/image1.jpg"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/image2.jpg"}}
            ]
        }
    ]
)
```

Image Detail Level
Control the resolution for analysis:
```json
{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/photo.jpg",
    "detail": "high"
  }
}
```

| Detail | Description | Token Usage |
|---|---|---|
| low | 512×512 fixed | Lower |
| high | Full resolution (up to 2048px) | Higher |
| auto | Model decides | Varies |
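When the same image structure is built repeatedly, a small helper keeps the detail level explicit and validated. The sketch below mirrors the JSON above; the helper name `image_part` is illustrative, not part of the API:

```python
def image_part(url: str, detail: str = "auto") -> dict:
    """Build an image_url content part with an explicit detail level."""
    if detail not in ("low", "high", "auto"):
        raise ValueError(f"unsupported detail level: {detail}")
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}

# Usage: mix text and image parts in a single user message.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        image_part("https://example.com/photo.jpg", detail="low"),
    ],
}
```

Using `detail="low"` trades resolution for lower token usage, which is often enough for coarse tasks like classification or captioning.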
Vision-Capable Models
Not all models support vision. Use the `GET /v1/models` endpoint and check for the `vision` capability to find models that accept image inputs.
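The filtering step can be sketched as below. The exact response shape of `GET /v1/models` is an assumption here (in particular the `capabilities` field name); check the models endpoint reference for the actual schema:

```python
def vision_models(models: list[dict]) -> list[str]:
    """Return the IDs of models whose capability list includes 'vision'.

    Assumes each model entry carries a 'capabilities' list; verify this
    against the real /v1/models response for your deployment.
    """
    return [m["id"] for m in models if "vision" in m.get("capabilities", [])]

# Example against a hypothetical /v1/models payload:
payload = {
    "data": [
        {"id": "model-a", "capabilities": ["chat", "vision"]},
        {"id": "model-b", "capabilities": ["chat"]},
    ]
}
print(vision_models(payload["data"]))  # ['model-a']
```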