Honest comparison
Which AI gateway should you actually use?
An honest map of where each provider wins — including the ones we don’t. We curate 60+ models. OpenRouter has three hundred. Groq has unbeatable speed. Replicate dominates image generation. None of that is up for debate. The question is which trade-off fits your workload.
thalam.
UsFor teams shipping to production on the Chinese open-weight frontier, who want predictable caching, per-key governance, and AED invoicing in one place.
OpenRouter
Excellent when you want every model under one key for rapid experimentation across hundreds of options.
Together AI
Strong choice if you’re fine-tuning custom open-weight models — the fine-tuning UX is a category leader.
Groq
The right answer when latency is the dominant constraint — voice, real-time UX, agent loops. Their LPU hardware is a real moat.
Fireworks AI
A good fit if you need fine-tuning or want Day-0 access to brand-new open-weight models the moment they ship.
Replicate
The natural home for image and video generation workloads — a model marketplace strongest in visual generation.
The matrix
Every feature, every provider, one page
| Feature | thalam. | OpenRouter | Together AI | Groq | Fireworks | Replicate |
|---|---|---|---|---|---|---|
| ●Coverage & compatibility | ||||||
| OpenAI-compatible API | ||||||
| Catalog size | 60+ | 300+ | 200+ | 25 | 100+ | 1000s |
| Image / video models | ||||||
| Fine-tuning / custom deployment | ||||||
| ●Routing & trust | ||||||
| Fixed upstream per model | ||||||
| Per-model quant transparency | ||||||
| Sub-100ms inference (LPU-class) | ||||||
| ●Governance | ||||||
| Per-key spend caps | ||||||
| Audit logs | ||||||
| Arabic-first model support | ||||||
| ●Pricing & region | ||||||
| Credits never expire | ||||||
| Pure pay-per-token, no minimum | ||||||
| Free tier | ||||||
| AED invoicing on Enterprise | ||||||
●Coverage & compatibility
OpenAI-compatible API
Replicate added OpenAI-style endpoints for selected models in 2024; the primary surface is still the predictions REST API.
Catalog size60+
Image / video models
OpenRouter routes to image models via partner endpoints; coverage depth varies.
Together hosts FLUX (image) and a small video catalog — narrower than Replicate but not zero.
Fireworks has FLUX + a few video options; not a primary focus.
Fine-tuning / custom deployment
Replicate Cog supports custom model uploads but not weights-level fine-tuning UX.
●Routing & trust
Fixed upstream per model
OpenRouter's default routing is dynamic; the `provider` parameter and provider-preference UI let users pin a single upstream when needed.
Per-model quant transparency
We surface the upstream model id and provider per call but don't yet label FP16/FP8 in the catalog UI — partial.
Quant level visible on some model pages, not consistently across catalog.
FP16 vs FP8 labelled on most flagship models; not all.
Sub-100ms inference (LPU-class)
●Governance
Per-key spend caps
OpenRouter Provisioning Keys can carry a credit ceiling — coarser than per-call caps but real.
Audit logs
Per-request log in the OpenRouter dashboard; not enterprise-grade but usable.
Activity log in dashboard; limited filtering and retention.
Request log surfaces in dashboard, no SIEM export.
Arabic-first model support
Hosts Jais and Falcon-Arabic via partner; not a curated focus.
Some Arabic models in catalog, no dedicated Arabic surfacing.
Llama 3.x family on Groq has decent multilingual / Arabic generation; not Arabic-first by design.
Carries a few Arabic-capable models, no dedicated track.
●Pricing & region
Credits never expire
Free credits expire after 12 months; paid balance behaviour varies by tier.
Free credits expire; paid balance long-lived.
Pure pay-per-token, no minimum
Free tier is generous; commercial tier carries a minimum spend.
Recently introduced subscription tier alongside pay-per-token.
Per-prediction billing with platform overhead on top.
Free tier
AED invoicing on Enterprise
Last verified 8 May 2026 against each provider's published documentation. Prices and capabilities change frequently — verify current rates before budgeting.
Where thalam fits
We made a deliberate choice: a curated catalog of frontier open-weight models — DeepSeek, Qwen, Kimi, GLM, Kling — alongside the leading Western models, each running on a fixed upstream. Per-key spend caps and audit logs in every account. Pure pay-per-token with credits that never expire. Built so a team in Dubai can pay in AED, and so a team in San Francisco doesn’t have to know about that. That’s what we built for. Every workload has a right home — and this is the home for production workloads on frontier open-weight models with governance built in and regional billing.
FAQ
Questions you probably have
Ready to ship
Try thalam on the workload it’s built for
No demo call, no waitlist, no credit card required. Sign up, top up when you’re ready to ship.