Media Models

Audio Speech (TTS)

POST /audio/speech matches the OpenAI Text-to-Speech spec — the official openai SDKs work unchanged with just a base URL swap. Accepts model, input, and an optional voice; returns raw audio bytes (MP3, 32 kHz, 128 kbps, mono).

Quickstart

tts.py
from openai import OpenAI

client = OpenAI(
    api_key="tl-xxxxxxxxxxxxxxxxxxxxxxxx",
    base_url="https://api.thalam.ai/v1",
)

response = client.audio.speech.create(
    model="minimax/minimax-speech-2.8-hd",
    input="Welcome to Thalam, your unified gateway to AI models.",
    voice="English_Graceful_Lady",
)

# response.content is the raw audio bytes (mp3 by default).
with open("out.mp3", "wb") as f:
    f.write(response.content)
tts.ts
import OpenAI from "openai";
import fs from "node:fs";

const client = new OpenAI({
  apiKey: process.env.THALAM_KEY,
  baseURL: "https://api.thalam.ai/v1",
});

const response = await client.audio.speech.create({
  model: "minimax/minimax-speech-2.8-hd",
  input: "Welcome to Thalam, your unified gateway to AI models.",
  voice: "English_Graceful_Lady",
});

const buffer = Buffer.from(await response.arrayBuffer());
fs.writeFileSync("out.mp3", buffer);
curl
curl https://api.thalam.ai/v1/audio/speech \
  -H "Authorization: Bearer tl-xxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax/minimax-speech-2.8-hd",
    "input": "Welcome to Thalam.",
    "voice": "English_Graceful_Lady"
  }' \
  --output out.mp3

Request body

| Field | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| model | string | yes | | see model table | TTS model id from the catalog below. |
| input | string | yes | | 1 char – per-model max | Text to synthesize. Max length is per-model; see the model table for limits. |
| voice | string | optional | model's default | see voice catalog | Voice id from the model's catalog. Omit to use the model's default voice. |

Not yet supported. The OpenAI spec also defines response_format (mp3 / opus / aac / flac / wav / pcm) and speed (0.25 – 4.0). Our gateway accepts these fields but ignores them today — audio is always returned as MP3 at 1.0× speed. MiniMax-specific upstream params (pitch, volume, emotion, language_boost, sample_rate, bitrate) are also silently dropped. Wiring these through is on the roadmap — let us know if you need any of them sooner.
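Because the ignored fields are dropped silently, a client-side check can make the behavior visible. The sketch below is one possible approach, not part of the gateway or the SDK: `build_speech_payload` and `IGNORED_FIELDS` are hypothetical names, and the field list is copied from the paragraph above.

```python
import warnings

# Fields the paragraph above says the gateway accepts but currently ignores.
IGNORED_FIELDS = {
    "response_format", "speed",          # OpenAI spec fields
    "pitch", "volume", "emotion",        # MiniMax-specific upstream params
    "language_boost", "sample_rate", "bitrate",
}


def build_speech_payload(model, text, voice=None, **extra):
    """Return a request body holding only the fields the gateway acts on.

    Hypothetical helper: warns about known-ignored fields and rejects
    anything else, so typos do not pass unnoticed.
    """
    if not model:
        raise ValueError("model is required")
    if not text:
        raise ValueError("input must contain at least 1 character")

    unknown = sorted(set(extra) - IGNORED_FIELDS)
    if unknown:
        raise TypeError(f"unexpected fields: {', '.join(unknown)}")

    ignored = sorted(IGNORED_FIELDS.intersection(extra))
    if ignored:
        warnings.warn(f"ignored by the gateway today: {', '.join(ignored)}")

    payload = {"model": model, "input": text}
    if voice is not None:
        payload["voice"] = voice  # omitted voice falls back to the model default
    return payload
```

The returned dict can be passed as keyword arguments to `client.audio.speech.create(**payload)` or sent as the JSON body of a raw HTTP request.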

Response

The response body is the raw audio file. No JSON envelope.

| Header | Example | Notes |
|---|---|---|
| Content-Type | audio/mpeg | Always MP3 today. See the format-control roadmap above. |
| Content-Length | 47291 | Total byte length of the audio body. |
| x-upstream-request-id | 019e16f4581a7b45b266421dd63e0997 | Upstream trace id. Paste in a support ticket if generation fails. |
| X-RateLimit-Remaining | 58 | Requests left in the current 60-second window. |
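The headers in the table above can be collected into a small summary for logging or support tickets. This is a minimal sketch that takes any plain mapping of header names to values; `summarize_tts_headers` is a hypothetical helper, and it assumes the numeric headers carry plain integers as in the examples.

```python
def summarize_tts_headers(headers):
    """Extract the documented response headers from a name -> value mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lower = {name.lower(): value for name, value in headers.items()}

    length = lower.get("content-length")
    remaining = lower.get("x-ratelimit-remaining")
    media_type = lower.get("content-type", "").split(";")[0].strip()

    return {
        "is_mp3": media_type == "audio/mpeg",
        "byte_length": int(length) if length is not None else None,
        "upstream_request_id": lower.get("x-upstream-request-id"),
        "requests_remaining": int(remaining) if remaining is not None else None,
    }
```

With the openai Python SDK, raw headers are usually reachable through the `with_raw_response` accessor on a resource; check this against the SDK version you have installed before relying on it.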

Streaming: stream: true is not currently supported on /audio/speech; the gateway waits for the upstream to finish and returns the full file in one response. For typical sentence-length inputs this is ~1–4 seconds end-to-end.

Available models

| Model ID | Max input | Price | Default voice | Notes |
|---|---|---|---|---|
| minimax/minimax-speech-2.8-hd | 10,000 chars | $100 / 1M chars | Wise_Woman | Highest quality, sync. Best for short-to-medium audio. |
| minimax/minimax-speech-2.8-turbo | 10,000 chars | $60 / 1M chars | Wise_Woman | Lower latency, same voice catalog as HD. |
| minimax/minimax-speech-2.8-hd-async | 50,000 chars | $100 / 1M chars | Wise_Woman | Long-form. Async upstream; gateway waits and returns inline. |
| elevenlabs/eleven-v3 | per ElevenLabs | $0.12 / minute | 21m00Tcm4TlvDq8ikWAM (Rachel) | Pass any ElevenLabs voice_id in the voice field. |
| fish-audio/fish-tts | per Fish Audio | $15 / 1M chars | (Fish default) | Multilingual. Pass a Fish reference_id via the voice field for cloned voices. |
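The limits and per-character prices in the table above lend themselves to a quick pre-flight check. The sketch below estimates cost and splits over-long text at sentence boundaries. The function names are hypothetical, and it assumes billing and limits count characters the way Python's `len()` does, which this page does not specify.

```python
# Values copied from the model table above. ElevenLabs and Fish Audio limits
# are set upstream, and eleven-v3 is priced per minute, so they are left out
# where they do not apply.
MAX_INPUT_CHARS = {
    "minimax/minimax-speech-2.8-hd": 10_000,
    "minimax/minimax-speech-2.8-turbo": 10_000,
    "minimax/minimax-speech-2.8-hd-async": 50_000,
}
USD_PER_MILLION_CHARS = {
    "minimax/minimax-speech-2.8-hd": 100.0,
    "minimax/minimax-speech-2.8-turbo": 60.0,
    "minimax/minimax-speech-2.8-hd-async": 100.0,
    "fish-audio/fish-tts": 15.0,
}


def estimate_cost_usd(model, text):
    """Estimated price of one request, assuming billing by len(text)."""
    return len(text) * USD_PER_MILLION_CHARS[model] / 1_000_000


def split_for_model(model, text):
    """Split text into chunks within the model's max, preferring sentence ends."""
    limit = MAX_INPUT_CHARS[model]
    chunks = []
    while len(text) > limit:
        # Last ". " that fits entirely inside the limit; hard cut as a fallback.
        cut = text.rfind(". ", 0, limit)
        cut = cut + 1 if cut != -1 else limit
        piece = text[:cut].strip()
        if piece:
            chunks.append(piece)
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

Each chunk then goes out as its own /audio/speech call. Joining the resulting MP3 files is a separate step that this sketch does not cover.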

Live rates are also on the .

MiniMax voice catalog

All three MiniMax models share the same voice IDs. Pass any of these as the voice field, grouped here by language:

English

English_Graceful_Lady
English_Insightful_Speaker
English_radiant_girl

Chinese / Japanese

Chinese (Mandarin)_Lyrical_Voice
Japanese_Whisper_Belle

Default (verified end-to-end)

Wise_Woman — used when voice is omitted

The English / Chinese / Japanese voices are listed in MiniMax's official voice catalog. Custom cloned voice IDs from your MiniMax account also work — the gateway forwards voice through unchanged.

Common errors

| Status | What it means | Fix |
|---|---|---|
| 400 | Missing model / input, or voice not recognized by upstream | Confirm the model id is in the catalog above and the voice id matches that model. |
| 402 | Insufficient balance | Top up in the dashboard. |
| 413 | Input text exceeds the per-model max | Split into multiple calls or switch to the async HD variant (50,000 chars). |
| 429 | Account-level rate limit (60 req/min) | Slow down or contact us for a higher limit. |
| 502 | Upstream returned an unexpected response | Retry. If persistent, paste x-upstream-request-id in a support ticket. |
| 504 | Upstream took too long | Retry, or use the async HD variant for long inputs. |
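The table above suggests retrying on 429, 502, and 504, while 400, 402, and 413 need a change on the caller's side. One way to encode that is a small retry wrapper with exponential backoff. Everything here is an illustrative assumption: `GatewayError`, `call_with_retry`, and the delay values are hypothetical and do not come from the gateway or the SDK.

```python
import time

# Statuses for which the table recommends slowing down or retrying.
RETRYABLE = {429, 502, 504}


class GatewayError(Exception):
    """Hypothetical error type carrying the HTTP status and upstream trace id."""

    def __init__(self, status, request_id=None):
        super().__init__(f"gateway returned {status}")
        self.status = status
        self.request_id = request_id


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, backing off on retryable statuses.

    send is any zero-argument callable that returns the audio bytes or
    raises GatewayError. Non-retryable statuses are re-raised immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except GatewayError as err:
            if err.status not in RETRYABLE or attempt == max_attempts:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

When the final attempt still fails, the raised error keeps `request_id`, which is the value to paste into a support ticket.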

Want to try it in your browser? Open the , pick a TTS model from the dropdown, type a sentence, and hit Send. No code required.