# Reasoning Models

Native support for thinking and reasoning models with extended reasoning capabilities.
## Overview

Yunxin provides first-class support for reasoning/thinking models. Unlike bolt-on solutions, reasoning is a core primitive in Yunxin's architecture: it is not a plugin, but built into the core API adapter layer for each provider.
## How Reasoning Works

Reasoning models produce an internal "thinking" process before generating the final answer. This thinking process may be:

- **Visible**: returned as `reasoning_content` in the response
- **Hidden**: used internally but not exposed
- **Configurable**: extended thinking can be toggled via parameters
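To make the first two cases concrete, a response message can be checked for an exposed reasoning trace with a small helper. This is an illustrative sketch, not part of the Yunxin SDK; the helper name and the stand-in message objects are ours.

```python
from types import SimpleNamespace

def reasoning_visibility(message):
    """Return 'visible' if the message exposes a non-empty reasoning trace,
    'hidden' otherwise. (Illustrative helper, not part of the Yunxin SDK.)"""
    if getattr(message, "reasoning_content", None):
        return "visible"
    return "hidden"

# Stand-ins for response messages from the two kinds of models.
visible_msg = SimpleNamespace(reasoning_content="First, assume sqrt(2) = p/q...", content="Done.")
hidden_msg = SimpleNamespace(content="Done.")
```

Using `getattr` with a default (rather than attribute access) keeps the same code working against models that never emit the field.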
## Using Reasoning Models

`POST /v1/chat/completions`

### Basic Usage

```python
response = client.chat.completions.create(
    model="model-id",
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ]
)

# Access reasoning content (if available)
if hasattr(response.choices[0].message, 'reasoning_content'):
    thinking = response.choices[0].message.reasoning_content
    print(f"Thinking process:\n{thinking}")

answer = response.choices[0].message.content
print(f"Answer:\n{answer}")
```

### Extended Thinking
Some models support extended thinking with configurable token budgets:
```python
response = client.chat.completions.create(
    model="model-id",
    messages=[
        {"role": "user", "content": "Analyze the complexity of the traveling salesman problem."}
    ],
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10000
        }
    }
)
```

Note: some reasoning models have specific constraints and may not support certain parameters such as `temperature`, `top_p`, or system messages. Yunxin automatically adapts these parameters per model.
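This adaptation step can be pictured as a per-model filter applied before the request is forwarded. The sketch below is our illustration of the idea, not Yunxin's actual implementation; the model names and the constraint table are made up.

```python
# Hypothetical per-model constraint table (illustrative model names).
UNSUPPORTED_PARAMS = {
    "reasoning-model-a": {"temperature", "top_p"},
}

def adapt_request(model, params):
    """Return a copy of params with fields the target model rejects removed."""
    blocked = UNSUPPORTED_PARAMS.get(model, set())
    return {k: v for k, v in params.items() if k not in blocked}

# A temperature setting is silently dropped for a model that rejects it,
# while other parameters pass through unchanged.
adapted = adapt_request("reasoning-model-a", {"temperature": 0.7, "max_tokens": 256})
```

Dropping unsupported fields (rather than erroring) is what lets the same calling code target both reasoning and non-reasoning models.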
## Streaming with Reasoning
When streaming reasoning models, thinking tokens are delivered before the final answer:
```python
stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "What is 99 * 97?"}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta

    # Reasoning tokens
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        print(f"💭 {delta.reasoning_content}", end="")

    # Final answer tokens
    if delta.content:
        print(f"💬 {delta.content}", end="")
```

## Available Reasoning Models
Use the `GET /v1/models` endpoint with `capability=thinking` to list all available reasoning models.
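Client-side, a fetched model list can be filtered the same way. The payload shape below (a `data` array with a per-model `capabilities` field) is an assumption based on the `capability=thinking` filter above, not a documented schema.

```python
def reasoning_models(models_payload):
    """Pick out model IDs that advertise the 'thinking' capability.
    The 'capabilities' field name is an assumption, not a documented schema."""
    return [
        m["id"]
        for m in models_payload.get("data", [])
        if "thinking" in m.get("capabilities", [])
    ]

# Example payload shaped like an OpenAI-style model list (illustrative IDs).
payload = {"data": [
    {"id": "model-a", "capabilities": ["chat", "thinking"]},
    {"id": "model-b", "capabilities": ["chat"]},
]}
```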