Audio (TTS & STT)
Text-to-speech synthesis and speech-to-text transcription.
Text-to-Speech (TTS)
POST /v1/audio/speech
Request
{
  "model": "model-id",
  "input": "Hello, welcome to the Yunxin API platform.",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1.0
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | TTS model ID |
| input | string | Yes | Text to synthesize (max 4096 chars) |
| voice | string | Yes | Voice to use |
| response_format | string | No | mp3, opus, aac, flac, wav, pcm |
| speed | number | No | Speed factor (0.25–4.0) |
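The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_speech_payload` and `split_text` are hypothetical helper names, and the only rules they enforce are the ones listed in the table (4096-character input, the six output formats, speed between 0.25 and 4.0).

```python
# Limits copied from the TTS parameter table.
MAX_INPUT_CHARS = 4096
RESPONSE_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(model, text, voice, response_format=None, speed=None):
    """Return a JSON-serializable body for POST /v1/audio/speech (hypothetical helper)."""
    if not text:
        raise ValueError("input must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} chars; the limit is {MAX_INPUT_CHARS}")
    payload = {"model": model, "input": text, "voice": voice}
    # Optional parameters are only included when the caller sets them.
    if response_format is not None:
        if response_format not in RESPONSE_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        payload["response_format"] = response_format
    if speed is not None:
        if not MIN_SPEED <= speed <= MAX_SPEED:
            raise ValueError(f"speed must be within {MIN_SPEED}-{MAX_SPEED}")
        payload["speed"] = speed
    return payload


def split_text(text, limit=MAX_INPUT_CHARS):
    """Split text into pieces of at most `limit` chars, preferring breaks at spaces."""
    text = text.strip()
    chunks = []
    while len(text) > limit:
        cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space found in range, fall back to a hard cut
        chunks.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

Text longer than the limit could be passed through `split_text` first, with one request sent per chunk; how the resulting audio files are joined is left to the caller.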
Available Voices
| Voice | Description |
|---|---|
| alloy | Neutral |
| echo | Male |
| fable | British |
| onyx | Deep male |
| nova | Female |
| shimmer | Soft female |
Example
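# Assumes `client` is an already configured SDK client for this API.
response = client.audio.speech.create(
    model="model-id",
    voice="nova",
    input="Welcome to Yunxin, your unified AI API gateway."
)
# Write the returned audio bytes to disk.
with open("output.mp3", "wb") as f:
    f.write(response.content)
Speech-to-Text (STT)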
POST /v1/audio/transcriptions
Request
# Open the recording in binary mode and pass the file object as `file`.
with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="model-id",
        file=audio_file,
        language="en"
    )
print(transcript.text)
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | STT model ID |
| file | file | Yes | Audio file (mp3, wav, m4a, etc.) |
| language | string | No | ISO-639-1 language code |
| response_format | string | No | json, text, srt, verbose_json, vtt |
| temperature | number | No | Sampling temperature |
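The optional transcription parameters can be assembled and checked before the call is made. This is a rough sketch under stated assumptions: `build_transcription_kwargs` is a hypothetical helper, the language check only verifies the two-letter shape of an ISO-639-1 code (not that the code is assigned), and no range is enforced for `temperature` because the table does not give one.

```python
import re

# Output formats listed in the STT parameter table.
STT_RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_transcription_kwargs(model, language=None, response_format=None, temperature=None):
    """Return keyword arguments for a transcription call, excluding `file` (hypothetical helper)."""
    kwargs = {"model": model}
    if language is not None:
        # Shape check only: ISO-639-1 codes are two lowercase letters, e.g. "en".
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError(f"language should be a two-letter ISO-639-1 code, got {language!r}")
        kwargs["language"] = language
    if response_format is not None:
        if response_format not in STT_RESPONSE_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        kwargs["response_format"] = response_format
    if temperature is not None:
        kwargs["temperature"] = temperature
    return kwargs
```

With a client like the one in the request example, the result could be passed as `client.audio.transcriptions.create(file=audio_file, **kwargs)`.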
Translation
POST /v1/audio/translations
Translate audio to English:
with open("chinese_audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="model-id",
        file=audio_file
    )
print(translation.text)  # English translation
Audio Models
To list the available audio models and their capabilities, call the Models API: GET /v1/models?type=audio.
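As an illustration of that call, the sketch below builds the request URL and fetches the list using only the Python standard library. Several details are assumptions rather than documented behavior: the base URL is supplied by the caller, the `Authorization: Bearer` header is an assumed authentication scheme, and the response is returned as parsed JSON without assuming any particular shape. `build_models_url` and `list_audio_models` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request


def build_models_url(base_url, model_type="audio"):
    """Build the URL for GET /v1/models filtered by type."""
    query = urllib.parse.urlencode({"type": model_type})
    return f"{base_url.rstrip('/')}/v1/models?{query}"


def list_audio_models(base_url, api_key):
    """Fetch the audio model list and return the decoded JSON body."""
    request = urllib.request.Request(
        build_models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A model ID taken from the returned list would then replace the `"model-id"` placeholder used in the examples on this page.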