Realtime API
Low-latency bidirectional communication via WebSocket.
Overview
The Realtime API provides WebSocket-based bidirectional communication for applications requiring ultra-low latency, such as voice assistants and interactive agents.
Connection
wss://api.yuhuanstudio.com/v1/realtime
Authentication
Pass the API key as a query parameter or in the initial handshake:
const ws = new WebSocket(
  "wss://api.yuhuanstudio.com/v1/realtime?api_key=YOUR_API_KEY"
);
Message Protocol
All messages are JSON-encoded:
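Since every frame is a JSON document carrying a `type` field, a pair of small helpers can keep serialization and parsing in one place. This is a minimal sketch: `encodeEvent` and `decodeEvent` are hypothetical names introduced here, not part of the API, and returning `null` for malformed frames is one possible policy among several.

```javascript
// Hypothetical helpers (not part of the Realtime API itself).

// Serialize a client event into the JSON text frame sent over the socket.
function encodeEvent(type, fields = {}) {
  return JSON.stringify({ ...fields, type });
}

// Parse an incoming frame. Returns null instead of throwing when the
// payload is not valid JSON or has no string "type" field.
function decodeEvent(data) {
  try {
    const message = JSON.parse(data);
    return message && typeof message.type === "string" ? message : null;
  } catch {
    return null;
  }
}
```

With these in place, a sender could call `ws.send(encodeEvent("input_audio_buffer.append", { audio }))`, and a receiver could skip any frame for which `decodeEvent(event.data)` returns `null`.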
Send Input
{
  "type": "input_audio_buffer.append",
  "audio": "<base64-encoded-audio>"
}
Receive Output
{
  "type": "response.audio.delta",
  "delta": "<base64-encoded-audio-chunk>"
}
Example: Voice Chat
const ws = new WebSocket(
  "wss://api.yuhuanstudio.com/v1/realtime?api_key=YOUR_API_KEY"
);

ws.onopen = () => {
  // Configure session
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      model: "model-id",
      modalities: ["text", "audio"],
      voice: "alloy"
    }
  }));
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  switch (message.type) {
    case "response.text.delta":
      // Print partial text as it arrives (process.stdout is Node.js only)
      process.stdout.write(message.delta);
      break;
    case "response.audio.delta":
      // Play audio chunk (playAudio is an application-defined helper)
      playAudio(message.delta);
      break;
    case "response.done":
      console.log("\n--- Response complete ---");
      break;
  }
};
Event Types
Client Events
| Event | Description |
|---|---|
| session.update | Configure session parameters |
| input_audio_buffer.append | Send audio data |
| input_audio_buffer.commit | Finalize audio input |
| conversation.item.create | Send text message |
| response.create | Trigger response generation |
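The audio-related client events can be combined into one spoken turn: append the audio in pieces, commit the buffer, then request a response. The sketch below only builds the event objects so it can run without a connection. It rests on several assumptions not stated above: `buildAudioTurn` and the 4096-byte chunk size are hypothetical, Node.js `Buffer` is used for base64 encoding, the required audio format is not covered here, and whether an explicit `response.create` is needed after `input_audio_buffer.commit` may depend on the server.

```javascript
// Hypothetical helper: turn raw audio bytes into the sequence of client
// events for one turn. Assumes Node.js (Buffer) and an arbitrary chunk size.
function buildAudioTurn(audioBytes, chunkSize = 4096) {
  const events = [];
  for (let offset = 0; offset < audioBytes.length; offset += chunkSize) {
    // subarray clamps the end index, so the last chunk may be shorter.
    const chunk = audioBytes.subarray(offset, offset + chunkSize);
    events.push({
      type: "input_audio_buffer.append",
      audio: Buffer.from(chunk).toString("base64"),
    });
  }
  // Finalize the buffered audio, then ask for a response (assumed order).
  events.push({ type: "input_audio_buffer.commit" });
  events.push({ type: "response.create" });
  return events;
}
```

Each returned object could then be sent with `ws.send(JSON.stringify(event))` once the socket is open.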
Server Events
| Event | Description |
|---|---|
| session.created | Session initialized |
| response.text.delta | Partial text response |
| response.audio.delta | Partial audio response |
| response.done | Response complete |
| error | Error occurred |
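One way to consume the server events listed above is a small collector that accumulates deltas until `response.done` arrives. This is a sketch under stated assumptions: `createResponseCollector` is a hypothetical name, the fields of the `error` event are not documented here (so the whole message is stored as-is), and unknown event types are simply ignored.

```javascript
// Hypothetical collector for server events; not part of the API.
function createResponseCollector() {
  const state = {
    sessionReady: false,
    text: "",
    audioChunks: [], // base64 strings, in arrival order
    done: false,
    errors: [],
  };

  function handle(message) {
    switch (message.type) {
      case "session.created":
        state.sessionReady = true;
        break;
      case "response.text.delta":
        state.text += message.delta;
        break;
      case "response.audio.delta":
        state.audioChunks.push(message.delta);
        break;
      case "response.done":
        state.done = true;
        break;
      case "error":
        // Error payload shape is an unknown here, so keep the raw message.
        state.errors.push(message);
        break;
      default:
        break; // Ignore event types this sketch does not know about.
    }
    return state;
  }

  return { handle, state };
}
```

In the voice chat example, `ws.onmessage` could pass each parsed message to `handle` and read `state.text` once `state.done` is true.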
The Realtime API requires models that support real-time interaction. Check model capabilities for realtime support.