Audio (TTS & STT)

Text-to-Speech (TTS)

POST /v1/audio/speech

Request

{
  "model": "model-id",
  "input": "Hello, welcome to the Yunxin API platform.",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1.0
}
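As a rough illustration of how the JSON body above maps onto a raw HTTP call, the sketch below builds the request with Python's standard library without sending it. The base URL, the API key placeholder, and the Bearer authentication scheme are assumptions made for this example and are not confirmed by this page.

```python
import json
import urllib.request

# Hypothetical values: replace with your real gateway URL and API key.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_speech_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer authentication is an assumption, not confirmed here.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


request = build_speech_request({
    "model": "model-id",
    "input": "Hello, welcome to the Yunxin API platform.",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.0,
})
# Sending it with urllib.request.urlopen(request) should yield the audio
# bytes in the response body, assuming the endpoint returns raw audio.
```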

Parameters

Parameter	Type	Required	Description
`model`	string	Yes	TTS model ID
`input`	string	Yes	Text to synthesize (max 4096 chars)
`voice`	string	Yes	Voice to use
`response_format`	string	No	`mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`
`speed`	number	No	Speed factor (0.25–4.0)
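The constraints in the table above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the API: the function name is hypothetical, and treating the speed bounds as inclusive is an assumption.

```python
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # assumed to be inclusive bounds


def validate_speech_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []

    # Required string parameters.
    for name in ("model", "input", "voice"):
        if not isinstance(payload.get(name), str) or not payload[name]:
            errors.append(f"'{name}' is required and must be a non-empty string")

    # Documented length limit for the text to synthesize.
    text = payload.get("input")
    if isinstance(text, str) and len(text) > MAX_INPUT_CHARS:
        errors.append(f"'input' exceeds {MAX_INPUT_CHARS} characters")

    # Optional parameters are only checked when present.
    fmt = payload.get("response_format")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        errors.append(f"'response_format' must be one of {sorted(ALLOWED_FORMATS)}")

    speed = payload.get("speed")
    if speed is not None:
        if isinstance(speed, bool) or not isinstance(speed, (int, float)):
            errors.append("'speed' must be a number")
        elif not MIN_SPEED <= speed <= MAX_SPEED:
            errors.append(f"'speed' must be between {MIN_SPEED} and {MAX_SPEED}")

    return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue at once.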

Available Voices

Voice	Description
`alloy`	Neutral
`echo`	Male
`fable`	British
`onyx`	Deep male
`nova`	Female
`shimmer`	Soft female

Example

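# Assumes `client` is an already-configured SDK client for this API.
response = client.audio.speech.create(
    model="model-id",
    voice="nova",
    input="Welcome to Yunxin, your unified AI API gateway.",
    response_format="mp3",  # match the file extension used below
)

# Save the returned audio bytes to disk.
with open("output.mp3", "wb") as f:
    f.write(response.content)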
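Because `input` is limited to 4096 characters, longer texts have to be split across several requests. One possible approach, sketched here with a hypothetical helper, is to pack whole sentences into chunks and hard-split only a sentence that is itself too long. Whitespace between sentences is normalized to a single space.

```python
import re

MAX_INPUT_CHARS = 4096  # limit documented for `input`


def split_for_tts(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    chunks, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Hard-split any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Otherwise keep packing sentences until the next one would overflow.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own speech request.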

Speech-to-Text (STT)

POST /v1/audio/transcriptions

Request

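# Assumes `client` is an already-configured SDK client for this API.
with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="model-id",
        file=audio_file,
        language="en",  # optional ISO-639-1 language code
    )

print(transcript.text)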

Parameters

Parameter	Type	Required	Description
`model`	string	Yes	STT model ID
`file`	file	Yes	Audio file (mp3, wav, m4a, etc.)
`language`	string	No	ISO-639-1 language code
`response_format`	string	No	`json`, `text`, `srt`, `verbose_json`, `vtt`
`temperature`	number	No	Sampling temperature
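When `response_format` is `srt`, the result is subtitle text rather than JSON. As an illustration of how such output might be consumed, the sketch below parses standard SubRip text into timed cues. The helper names and the sample string are made up for this example and are not real API output.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:02,500 --> 00:00:05,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> list[Cue]:
    """Parse SubRip text into a list of cues, skipping malformed blocks."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


# Illustrative sample only, not actual API output.
SAMPLE = """1
00:00:00,000 --> 00:00:02,500
Welcome to Yunxin.

2
00:00:02,500 --> 00:00:05,000
Your unified AI API gateway.
"""

cues = parse_srt(SAMPLE)
```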

Translation

POST /v1/audio/translations

Translate audio spoken in another language into English text:

# Assumes `client` is an already-configured SDK client for this API.
with open("chinese_audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="model-id",
        file=audio_file,
    )

print(translation.text)  # English translation

Audio Models

To list the available audio models and their capabilities, call the Models API: GET /v1/models?type=audio.
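The query string for this call can be assembled with the standard library so the `type` filter is always URL-encoded correctly. This is only a sketch: the base URL is a placeholder and the helper name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # hypothetical; use your real gateway URL


def models_url(model_type: str = "audio") -> str:
    """Return the Models API URL filtered by the given model type."""
    return f"{BASE_URL}/v1/models?{urlencode({'type': model_type})}"


url = models_url()
```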
