Honest comparison

Which AI gateway should you actually use?

An honest map of where each provider wins — including the ones we don’t. We curate 60+ models. OpenRouter has three hundred. Groq has unbeatable speed. Replicate dominates image generation. None of that is up for debate. The question is which trade-off fits your workload.

See the matrix

thalam.

Us

For teams shipping to production on the Chinese open-weight frontier, who want predictable caching, per-key governance, and AED invoicing in one place.

OpenRouter

Excellent when you want every model under one key for rapid experimentation across hundreds of options.

Together AI

Strong choice if you’re fine-tuning custom open-weight models — the fine-tuning UX is a category leader.

Groq

The right answer when latency is the dominant constraint — voice, real-time UX, agent loops. Their LPU hardware is a real moat.

Fireworks AI

A good fit if you need fine-tuning or want Day-0 access to brand-new open-weight models the moment they ship.

Replicate

The natural home for image and video generation workloads — a model marketplace strongest in visual generation.

The matrix

Every feature, every provider, one page

Coverage & compatibility

OpenAI-compatible API
thalam.
OpenRouter
Together AI
Groq
Fireworks
Replicate

Replicate added OpenAI-style endpoints for selected models in 2024; the primary surface is still the predictions REST API.

Catalog size
60+
60+
thalam.
300+
OpenRouter
200+
Together AI
25
Groq
100+
Fireworks
1000s
Replicate
Image / video models
thalam.
OpenRouter

OpenRouter routes to image models via partner endpoints; coverage depth varies.

Together AI

Together hosts FLUX (image) and a small video catalog — narrower than Replicate but not zero.

Groq
Fireworks

Fireworks has FLUX + a few video options; not a primary focus.

Replicate
Fine-tuning / custom deployment
thalam.
OpenRouter
Together AI
Groq
Fireworks
Replicate

Replicate Cog supports custom model uploads but not weights-level fine-tuning UX.

Routing & trust

Fixed upstream per model
thalam.
OpenRouter

OpenRouter's default routing is dynamic; the `provider` parameter and provider-preference UI let users pin a single upstream when needed.

Together AI
Groq
Fireworks
Replicate
Per-model quant transparency
thalam.

We surface the upstream model id and provider per call but don't yet label FP16/FP8 in the catalog UI — partial.

OpenRouter
Together AI

Quant level visible on some model pages, not consistently across catalog.

Groq
Fireworks

FP16 vs FP8 labelled on most flagship models; not all.

Replicate
Sub-100ms inference (LPU-class)
thalam.
OpenRouter
Together AI
Groq
Fireworks
Replicate

Governance

Per-key spend caps
thalam.
OpenRouter

OpenRouter Provisioning Keys can carry a credit ceiling — coarser than per-call caps but real.

Together AI
Groq
Fireworks
Replicate
Audit logs
thalam.
OpenRouter

Per-request log in the OpenRouter dashboard; not enterprise-grade but usable.

Together AI

Activity log in dashboard; limited filtering and retention.

Groq
Fireworks

Request log surfaces in dashboard, no SIEM export.

Replicate
Arabic-first model support
thalam.
OpenRouter

Hosts Jais and Falcon-Arabic via partner; not a curated focus.

Together AI

Some Arabic models in catalog, no dedicated Arabic surfacing.

Groq

Llama 3.x family on Groq has decent multilingual / Arabic generation; not Arabic-first by design.

Fireworks

Carries a few Arabic-capable models, no dedicated track.

Replicate

Pricing & region

Credits never expire
thalam.
OpenRouter
Together AI

Free credits expire after 12 months; paid balance behaviour varies by tier.

Groq
Fireworks

Free credits expire; paid balance long-lived.

Replicate
Pure pay-per-token, no minimum
thalam.
OpenRouter
Together AI
Groq

Free tier is generous; commercial tier carries a minimum spend.

Fireworks

Recently introduced subscription tier alongside pay-per-token.

Replicate

Per-prediction billing with platform overhead on top.

Free tier
thalam.
OpenRouter
Together AI
Groq
Fireworks
Replicate
AED invoicing on Enterprise
thalam.
OpenRouter
Together AI
Groq
Fireworks
Replicate
FullPartialNot availableTap any feature to see all providers.

Last verified 8 May 2026 against each provider's published documentation. Prices and capabilities change frequently — verify current rates before budgeting.

Where thalam fits

We made a deliberate choice: a curated catalog of frontier open-weight models — DeepSeek, Qwen, Kimi, GLM, Kling — alongside the leading Western models, each running on a fixed upstream. Per-key spend caps and audit logs in every account. Pure pay-per-token with credits that never expire. Built so a team in Dubai can pay in AED, and so a team in San Francisco doesn’t have to know about that. That’s what we built for. Every workload has a right home — and this is the home for production workloads on frontier open-weight models with governance built in and regional billing.

FAQ

Questions you probably have

Ready to ship

Try thalam on the workload it’s built for

No demo call, no waitlist, no credit card required. Sign up, top up when you’re ready to ship.