Audio Speech (TTS)
POST /audio/speech follows the OpenAI Text-to-Speech spec, so the official openai SDKs work unchanged once you swap the base URL. It accepts model, input, and an optional voice, and returns raw audio bytes (MP3, 32 kHz, 128 kbps, mono).
Quickstart
Python

    from openai import OpenAI

    client = OpenAI(
        api_key="tl-xxxxxxxxxxxxxxxxxxxxxxxx",
        base_url="https://api.thalam.ai/v1",
    )

    response = client.audio.speech.create(
        model="minimax/minimax-speech-2.8-hd",
        input="Welcome to Thalam, your unified gateway to AI models.",
        voice="English_Graceful_Lady",
    )

    # response.content is the raw audio bytes (mp3 by default).
    with open("out.mp3", "wb") as f:
        f.write(response.content)

Node.js

    import OpenAI from "openai";
    import fs from "node:fs";

    const client = new OpenAI({
      apiKey: process.env.THALAM_KEY,
      baseURL: "https://api.thalam.ai/v1",
    });

    const response = await client.audio.speech.create({
      model: "minimax/minimax-speech-2.8-hd",
      input: "Welcome to Thalam, your unified gateway to AI models.",
      voice: "English_Graceful_Lady",
    });

    const buffer = Buffer.from(await response.arrayBuffer());
    fs.writeFileSync("out.mp3", buffer);

curl

    curl https://api.thalam.ai/v1/audio/speech \
      -H "Authorization: Bearer tl-xxxxxxxxxxxxxxxxxxxxxxxx" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "minimax/minimax-speech-2.8-hd",
        "input": "Welcome to Thalam.",
        "voice": "English_Graceful_Lady"
      }' \
      --output out.mp3

Request body
| Field | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| model | string | yes | — | see model table | TTS model id from the catalog below. |
| input | string | yes | — | 1 char – per-model max | Text to synthesize. Max length is per-model — see the model table for limits. |
| voice | string | no | model's default | see voice catalog | Voice id from the model's catalog. Omit to use the model's default voice. |
Not yet supported. The OpenAI spec also defines response_format (mp3 / opus / aac / flac / wav / pcm) and speed (0.25 – 4.0). Our gateway accepts these fields but ignores them today — audio is always returned as MP3 at 1.0× speed. MiniMax-specific upstream params (pitch, volume, emotion, language_boost, sample_rate, bitrate) are also silently dropped. Wiring these through is on the roadmap — let us know if you need any of them sooner.
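The request shape above can be sketched as a small payload builder. This is an illustration only: `build_speech_payload` and `IGNORED_FIELDS` are hypothetical names, not part of the gateway or the openai SDK. The helper leaves voice out when it is not set, and reports which extra fields the gateway would currently ignore instead of sending them.

```python
import json

# Fields the gateway accepts but currently ignores or drops (per the note above).
IGNORED_FIELDS = {
    "response_format", "speed", "pitch", "volume",
    "emotion", "language_boost", "sample_rate", "bitrate",
}


def build_speech_payload(model, text, voice=None, **extra):
    """Build the JSON body for POST /audio/speech (hypothetical helper).

    Returns (payload, dropped), where dropped lists the extra fields that
    were left out because the gateway does not act on them today.
    """
    if not model:
        raise ValueError("model is required")
    if not text:
        raise ValueError("input must contain at least 1 character")

    unknown = sorted(k for k in extra if k not in IGNORED_FIELDS)
    if unknown:
        raise TypeError(f"unknown fields: {unknown}")

    payload = {"model": model, "input": text}
    if voice is not None:
        # Omitting voice lets the gateway use the model's default voice.
        payload["voice"] = voice
    return payload, sorted(extra)


payload, dropped = build_speech_payload(
    "minimax/minimax-speech-2.8-hd",
    "Welcome to Thalam.",
    speed=1.5,
)
print(json.dumps(payload))
print("ignored by gateway:", dropped)
```

Surfacing the dropped fields makes it harder to assume a setting such as speed took effect when the audio still comes back at 1.0×.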
Response
The response body is the raw audio file. No JSON envelope.
| Header | Example | Notes |
|---|---|---|
| Content-Type | audio/mpeg | Always MP3 today. See the format-control note above. |
| Content-Length | 47291 | Total byte length of the audio body. |
| x-upstream-request-id | 019e16f4581a7b45b266421dd63e0997 | Upstream trace id. Paste in a support ticket if generation fails. |
| X-RateLimit-Remaining | 58 | Requests left in the current 60-second window. |
Streaming: stream: true is not currently supported on /audio/speech; the gateway waits for the upstream to finish and returns the full file in one response. For typical sentence-length inputs this is ~1–4 seconds end-to-end.
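Because the body is raw audio with no JSON envelope, request metadata has to come from the headers. The sketch below shows one way to read the headers from the table. `summarize_speech_response` is a hypothetical helper that takes a plain header mapping and the body bytes, so it works with whichever HTTP client you use.

```python
def summarize_speech_response(headers, body):
    """Pull the documented headers out of a response (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    remaining = lowered.get("x-ratelimit-remaining")
    return {
        "is_mp3": lowered.get("content-type", "").startswith("audio/mpeg"),
        "size_matches": lowered.get("content-length") == str(len(body)),
        "upstream_request_id": lowered.get("x-upstream-request-id"),
        "requests_remaining": int(remaining) if remaining is not None else None,
    }


# Example values taken from the header table; the body is placeholder bytes.
example_headers = {
    "Content-Type": "audio/mpeg",
    "Content-Length": "47291",
    "x-upstream-request-id": "019e16f4581a7b45b266421dd63e0997",
    "X-RateLimit-Remaining": "58",
}
summary = summarize_speech_response(example_headers, b"\x00" * 47291)
print(summary)
```

Logging upstream_request_id alongside each call keeps the trace id available if a support ticket is needed later.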
Available models
| Model ID | Max input | Price | Default voice | Notes |
|---|---|---|---|---|
| minimax/minimax-speech-2.8-hd | 10,000 chars | $100 / 1M chars | Wise_Woman | Highest quality, sync. Best for short-to-medium audio. |
| minimax/minimax-speech-2.8-turbo | 10,000 chars | $60 / 1M chars | Wise_Woman | Lower latency, same voice catalog as HD. |
| minimax/minimax-speech-2.8-hd-async | 50,000 chars | $100 / 1M chars | Wise_Woman | Long-form. Async upstream; gateway waits and returns inline. |
| elevenlabs/eleven-v3 | per ElevenLabs | $0.12 / minute | 21m00Tcm4TlvDq8ikWAM (Rachel) | Pass any ElevenLabs voice_id in the voice field. |
| fish-audio/fish-tts | per Fish Audio | $15 / 1M chars | (Fish default) | Multilingual. Pass a Fish reference_id via the voice field for cloned voices. |
Live rates are also on the .
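As a rough worked example of the per-character pricing, the sketch below estimates cost and checks the input limit before sending. The numbers are copied from the table and may drift from live rates. The function names are hypothetical, and the sketch assumes characters are counted the way Python's `len` counts them, which the table does not specify. `elevenlabs/eleven-v3` is left out because it is billed per minute of audio.

```python
# Per-character prices and input limits copied from the model table.
# Treat them as a snapshot, not as live rates.
PRICE_USD_PER_1M_CHARS = {
    "minimax/minimax-speech-2.8-hd": 100,
    "minimax/minimax-speech-2.8-turbo": 60,
    "minimax/minimax-speech-2.8-hd-async": 100,
    "fish-audio/fish-tts": 15,
}
MAX_INPUT_CHARS = {
    "minimax/minimax-speech-2.8-hd": 10_000,
    "minimax/minimax-speech-2.8-turbo": 10_000,
    "minimax/minimax-speech-2.8-hd-async": 50_000,
}


def estimate_cost_usd(model, char_count):
    """Estimate the charge for char_count input characters."""
    if model not in PRICE_USD_PER_1M_CHARS:
        raise ValueError(f"{model} is not priced per character")
    return char_count * PRICE_USD_PER_1M_CHARS[model] / 1_000_000


def fits_limit(model, text):
    """True if text is within the model's documented max input length."""
    limit = MAX_INPUT_CHARS.get(model)
    # Models without a documented numeric limit are not checked here.
    return limit is None or len(text) <= limit


# 2,500 characters on the HD model: 2,500 * $100 / 1,000,000 = $0.25
print(estimate_cost_usd("minimax/minimax-speech-2.8-hd", 2_500))
print(fits_limit("minimax/minimax-speech-2.8-turbo", "x" * 12_000))
```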
MiniMax voice catalog
All three MiniMax models share the same voice IDs. Pass any of these as the voice field, grouped here by language:
English
English_Graceful_Lady
English_Insightful_Speaker
English_radiant_girl
Chinese / Japanese
Chinese (Mandarin)_Lyrical_Voice
Japanese_Whisper_Belle
Default (verified end-to-end)
Wise_Woman — used when voice is omitted
The English / Chinese / Japanese voices are listed in MiniMax's official voice catalog. Custom cloned voice IDs from your MiniMax account also work — the gateway forwards voice through unchanged.
Common errors
| Status | What it means | Fix |
|---|---|---|
| 400 | Missing model / input, or voice not recognized by upstream | Confirm the model id is in the catalog above and the voice id matches that model. |
| 402 | Insufficient balance | Top up in the dashboard. |
| 413 | Input text exceeds the per-model max | Split into multiple calls or switch to the async HD variant (50,000 chars). |
| 429 | Account-level rate limit (60 req/min) | Slow down or contact us for a higher limit. |
| 502 | Upstream returned an unexpected response | Retry. If persistent, paste x-upstream-request-id in a support ticket. |
| 504 | Upstream took too long | Retry, or use the async HD variant for long inputs. |
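The fixes in the table can be sketched as two small helpers: one splits long text to avoid a 413, the other retries the statuses the table marks as worth retrying. Both are illustrative. `split_text` and `call_with_retry` are hypothetical names, the exponential backoff schedule is an assumption rather than documented gateway guidance, and `send` stands in for whatever function performs your actual request.

```python
import re
import time

# Statuses the table says to retry. 400, 402, and 413 need a changed
# request or account state, so retrying them as-is will not help.
RETRYABLE = {429, 502, 504}


def split_text(text, max_chars):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself too long.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until the status is not retryable or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts:
            return status, body
        sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...


# Demo with a fake sender: one 502, then success. No network, no real sleep.
responses = iter([(502, b""), (200, b"audio-bytes")])
status, body = call_with_retry(lambda: next(responses), sleep=lambda _: None)
print(status, len(body))
print(split_text("One. Two. Three.", 9))
```

Each chunk comes back as its own MP3, so splitting means stitching the audio together afterwards. For long inputs, the async HD variant avoids that step.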
Want to try it in your browser? Open the , pick a TTS model from the dropdown, type a sentence, and hit Send. No code required.