Honest comparison

Which AI gateway should you actually use?

An honest map of where each provider wins — including the ones we don’t. We curate 60+ models. OpenRouter has three hundred. Groq has unbeatable speed. Replicate dominates image generation. None of that is up for debate. The question is which trade-off fits your workload.

See the matrix

thalam.

For teams shipping to production on the Chinese open-weight frontier, who want predictable caching, per-key governance, and AED invoicing in one place.

OpenRouter

Excellent when you want every model under one key for rapid experimentation across hundreds of options.

Together AI

Strong choice if you’re fine-tuning custom open-weight models — the fine-tuning UX is a category leader.

Groq

The right answer when latency is the dominant constraint — voice, real-time UX, agent loops. Their LPU hardware is a real moat.

Fireworks AI

A good fit if you need fine-tuning or want Day-0 access to brand-new open-weight models the moment they ship.

Replicate

The natural home for image and video generation workloads — a model marketplace strongest in visual generation.

The matrix

Every feature, every provider, one page

Feature	thalam.	OpenRouter	Together AI	Groq	Fireworks	Replicate
●Coverage & compatibility
OpenAI-compatible API
Catalog size	60+	300+	200+	25	100+	1000s
Image / video models
Fine-tuning / custom deployment
●Routing & trust
Fixed upstream per model
Per-model quant transparency
Sub-100ms inference (LPU-class)
●Governance
Per-key spend caps
Audit logs
Arabic-first model support
●Pricing & region
Credits never expire
Pure pay-per-token, no minimum
Free tier
AED invoicing on Enterprise

●Coverage & compatibility

OpenAI-compatible API

thalam.

OpenRouter

Together AI

Groq

Fireworks

Replicate

Replicate added OpenAI-style endpoints for selected models in 2024; the primary surface is still the predictions REST API.

Catalog size

60+

thalam.

300+

OpenRouter

200+

Together AI

Groq

100+

Fireworks

1000s

Replicate

Image / video models

thalam.

OpenRouter

OpenRouter routes to image models via partner endpoints; coverage depth varies.

Together AI

Together hosts FLUX (image) and a small video catalog — narrower than Replicate but not zero.

Groq

Fireworks

Fireworks has FLUX + a few video options; not a primary focus.

Replicate

Fine-tuning / custom deployment

thalam.

OpenRouter

Together AI

Groq

Fireworks

Replicate

Replicate Cog supports custom model uploads but not weights-level fine-tuning UX.

●Routing & trust

Fixed upstream per model

thalam.

OpenRouter

OpenRouter's default routing is dynamic; the `provider` parameter and provider-preference UI let users pin a single upstream when needed.

Together AI

Groq

Fireworks

Replicate

Per-model quant transparency

thalam.

We surface the upstream model id and provider per call but don't yet label FP16/FP8 in the catalog UI — partial.

OpenRouter

Together AI

Quant level visible on some model pages, not consistently across catalog.

Groq

Fireworks

FP16 vs FP8 labelled on most flagship models; not all.

Replicate

Sub-100ms inference (LPU-class)

thalam.

OpenRouter

Together AI

Groq

Fireworks

Replicate

●Governance

Per-key spend caps

thalam.

OpenRouter

OpenRouter Provisioning Keys can carry a credit ceiling — coarser than per-call caps but real.

Together AI

Groq

Fireworks

Replicate

Audit logs

thalam.

OpenRouter

Per-request log in the OpenRouter dashboard; not enterprise-grade but usable.

Together AI

Activity log in dashboard; limited filtering and retention.

Groq

Fireworks

Request log surfaces in dashboard, no SIEM export.

Replicate

Arabic-first model support

thalam.

OpenRouter

Hosts Jais and Falcon-Arabic via partner; not a curated focus.

Together AI

Some Arabic models in catalog, no dedicated Arabic surfacing.

Groq

Llama 3.x family on Groq has decent multilingual / Arabic generation; not Arabic-first by design.

Fireworks

Carries a few Arabic-capable models, no dedicated track.

Replicate

●Pricing & region

Credits never expire

thalam.

OpenRouter

Together AI

Free credits expire after 12 months; paid balance behaviour varies by tier.

Groq

Fireworks

Free credits expire; paid balance long-lived.

Replicate

Pure pay-per-token, no minimum

thalam.

OpenRouter

Together AI

Groq

Free tier is generous; commercial tier carries a minimum spend.

Fireworks

Recently introduced subscription tier alongside pay-per-token.

Replicate

Per-prediction billing with platform overhead on top.

Free tier

thalam.

OpenRouter

Together AI

Groq

Fireworks

Replicate

AED invoicing on Enterprise

thalam.

OpenRouter

Together AI

Groq

Fireworks

Replicate

FullPartialNot availableHover any "partial" dot for context.Tap any feature to see all providers.

Last verified 8 May 2026 against each provider's published documentation. Prices and capabilities change frequently — verify current rates before budgeting.

Where thalam fits

We made a deliberate choice: a curated catalog of frontier open-weight models — DeepSeek, Qwen, Kimi, GLM, Kling — alongside the leading Western models, each running on a fixed upstream. Per-key spend caps and audit logs in every account. Pure pay-per-token with credits that never expire. Built so a team in Dubai can pay in AED, and so a team in San Francisco doesn’t have to know about that. That’s what we built for. Every workload has a right home — and this is the home for production workloads on frontier open-weight models with governance built in and regional billing.

FAQ

Questions you probably have

Ready to ship

Try thalam on the workload it’s built for

No demo call, no waitlist. Sign up, unlock $1 free with a card, top up when you’re ready to ship.