Media Models

Audio Speech (TTS)

POST /audio/speech matches the OpenAI Text-to-Speech spec — the official openai SDKs work unchanged with just a base URL swap. Accepts model, input, and an optional voice; returns raw audio bytes (MP3, 32 kHz, 128 kbps, mono).

Quickstart

tts.py

from openai import OpenAI

client = OpenAI(
    api_key="tl-xxxxxxxxxxxxxxxxxxxxxxxx",
    base_url="https://api.thalam.ai/v1",
)

response = client.audio.speech.create(
    model="minimax/minimax-speech-2.8-hd",
    input="Welcome to Thalam, your unified gateway to AI models.",
    voice="English_Graceful_Lady",
)

# response.content is the raw audio bytes (mp3 by default).
with open("out.mp3", "wb") as f:
    f.write(response.content)

tts.ts

import OpenAI from "openai";
import fs from "node:fs";

const client = new OpenAI({
  apiKey: process.env.THALAM_KEY,
  baseURL: "https://api.thalam.ai/v1",
});

const response = await client.audio.speech.create({
  model: "minimax/minimax-speech-2.8-hd",
  input: "Welcome to Thalam, your unified gateway to AI models.",
  voice: "English_Graceful_Lady",
});

const buffer = Buffer.from(await response.arrayBuffer());
fs.writeFileSync("out.mp3", buffer);

curl

curl https://api.thalam.ai/v1/audio/speech \
  -H "Authorization: Bearer tl-xxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax/minimax-speech-2.8-hd",
    "input": "Welcome to Thalam.",
    "voice": "English_Graceful_Lady"
  }' \
  --output out.mp3

Request body

Field	Type	Required	Default	Range	Description
model	string	yes	—	see model table	TTS model id from the catalog below.
input	string	yes	—	1 char – per-model max	Text to synthesize. Max length is per-model — see the model table for limits.
voice	string	optional	model's default	see voice catalog	Voice id from the model's catalog. Omit to use the model's default voice.

Not yet supported. The OpenAI spec also defines response_format (mp3 / opus / aac / flac / wav / pcm) and speed (0.25 – 4.0). Our gateway accepts these fields but ignores them today — audio is always returned as MP3 at 1.0× speed. MiniMax-specific upstream params (pitch, volume, emotion, language_boost, sample_rate, bitrate) are also silently dropped. Wiring these through is on the roadmap — let us know if you need any of them sooner.

Response

The response body is the raw audio file. No JSON envelope.

Header	Example	Notes
Content-Type	audio/mpeg	Always MP3 today. See the note on response_format under Request body.
Content-Length	47291	Total byte length of the audio body.
x-upstream-request-id	019e16f4581a7b45b266421dd63e0997	Upstream trace id. Include it in a support ticket if generation fails.
X-RateLimit-Remaining	58	Requests left in the current 60-second window.

Streaming: stream: true is not currently supported on /audio/speech; the gateway waits for the upstream to finish and returns the full file in one response. For typical sentence-length inputs this is ~1–4 seconds end-to-end.
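
A minimal sketch of how the response headers listed above might be collected for logging. The helper name summarize_tts_response is hypothetical, and how you obtain the headers depends on your HTTP client; the example feeds in the sample values from the header table together with a stand-in body rather than making a real request.

```python
from typing import Mapping


def summarize_tts_response(headers: Mapping[str, str], body: bytes) -> dict:
    """Collect the documented response headers into one dict for logging.

    Hypothetical helper, not part of any SDK.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    remaining = lowered.get("x-ratelimit-remaining")
    return {
        "content_type": lowered.get("content-type"),
        "audio_bytes": len(body),
        "upstream_request_id": lowered.get("x-upstream-request-id"),
        "rate_limit_remaining": int(remaining) if remaining is not None else None,
    }


# Sample header values from the table above; the body is a placeholder,
# not real audio.
info = summarize_tts_response(
    {
        "Content-Type": "audio/mpeg",
        "x-upstream-request-id": "019e16f4581a7b45b266421dd63e0997",
        "X-RateLimit-Remaining": "58",
    },
    b"\x00" * 47291,
)
print(info)
```

Keeping upstream_request_id next to each generated file makes it easier to quote in a support ticket later.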

Available models

Model ID	Max input	Price	Default voice	Notes
minimax/minimax-speech-2.8-hd	10,000 chars	$100 / 1M chars	Wise_Woman	Highest quality, sync. Best for short-to-medium audio.
minimax/minimax-speech-2.8-turbo	10,000 chars	$60 / 1M chars	Wise_Woman	Lower latency, same voice catalog as HD.
minimax/minimax-speech-2.8-hd-async	50,000 chars	$100 / 1M chars	Wise_Woman	Long-form. Async upstream; gateway waits and returns inline.
elevenlabs/eleven-v3	per ElevenLabs	$0.12 / minute	21m00Tcm4TlvDq8ikWAM (Rachel)	Pass any ElevenLabs voice_id in the voice field.
fish-audio/fish-tts	per Fish Audio	$15 / 1M chars	(Fish default)	Multilingual. Pass a Fish reference_id via the voice field for cloned voices.

Live rates are also on the .
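
As a rough guide, the per-character prices in the table can be turned into a cost estimate. This is a sketch only: the function name estimate_cost_usd is hypothetical, and it assumes billing is linear in the number of input characters, which this page does not spell out. elevenlabs/eleven-v3 is left out because it is priced per minute of audio, not per character.

```python
from decimal import Decimal

# USD per 1,000,000 input characters, copied from the model table above.
PRICE_PER_MILLION_CHARS = {
    "minimax/minimax-speech-2.8-hd": Decimal("100"),
    "minimax/minimax-speech-2.8-turbo": Decimal("60"),
    "minimax/minimax-speech-2.8-hd-async": Decimal("100"),
    "fish-audio/fish-tts": Decimal("15"),
}


def estimate_cost_usd(model: str, text: str) -> Decimal:
    """Estimate the cost of one request.

    Assumption: cost = price * len(text) / 1,000,000, with no rounding or
    minimum charge. Check the live rates before relying on this.
    """
    if model not in PRICE_PER_MILLION_CHARS:
        raise ValueError(f"no per-character price known for {model!r}")
    return PRICE_PER_MILLION_CHARS[model] * len(text) / Decimal(1_000_000)


sample = "Welcome to Thalam, your unified gateway to AI models."
for model_id in PRICE_PER_MILLION_CHARS:
    print(model_id, estimate_cost_usd(model_id, sample))
```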

MiniMax voice catalog

All three MiniMax models share the same voice IDs. Pass any of the IDs below, grouped by language, in the voice field:

English

English_Graceful_Lady
English_Insightful_Speaker
English_radiant_girl

Chinese / Japanese

Chinese (Mandarin)_Lyrical_Voice
Japanese_Whisper_Belle

Default (verified end-to-end)

Wise_Woman — used when voice is omitted

The English / Chinese / Japanese voices are listed in MiniMax's official voice catalog. Custom cloned voice IDs from your MiniMax account also work — the gateway forwards voice through unchanged.

Common errors

Status	What it means	Fix
400	Missing model / input, or voice not recognized by upstream	Confirm the model id is in the catalog above and the voice id matches that model.
402	Insufficient balance	Top up in the dashboard.
413	Input text exceeds the per-model max	Split into multiple calls or switch to the async HD variant (50,000 chars).
429	Account-level rate limit (60 req/min)	Slow down or contact us for a higher limit.
502	Upstream returned an unexpected response	Retry. If persistent, paste x-upstream-request-id in a support ticket.
504	Upstream took too long	Retry, or use the async HD variant for long inputs.
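
The fixes for 413 and for the retryable statuses can be sketched as two small helpers. Both names (split_text, backoff_delays) are hypothetical, and the backoff schedule is an assumption, not a documented gateway policy. split_text breaks long input at sentence boundaries so each piece fits under a model's character limit; each piece is then sent as its own request.

```python
import re

# Statuses the table above suggests retrying or slowing down for.
# 400, 402 and 413 need a changed request, so they are not retried.
RETRYABLE_STATUSES = {429, 502, 504}


def split_text(text: str, max_chars: int = 10_000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def backoff_delays(attempts: int = 4, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff delays in seconds (assumed values, tune as needed)."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


print(split_text("One. Two. Three.", max_chars=9))
print(backoff_delays())
```

A caller would sleep for the next delay whenever a response status is in RETRYABLE_STATUSES, and give up once the delays run out.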

Want to try it in your browser? Open the , pick a TTS model from the dropdown, type a sentence, and hit Send. No code required.