Pay for what
you use.
Transparent, GPU-hour based metering with clear quotas. From solo developers to enterprise deployment fleets, only pay for the compute you consume.
Developer
Start building with OpenAI-compatible inference. Perfect for prototyping and side projects.
5 GPU Hours / mo
OPENAI-COMPATIBLE API
3 Model Deployments
DEPLOYMENT-AWARE ROUTING
All SDK Languages
TYPESCRIPT, PYTHON, GO, RUST
Community Support
DISCORD + GITHUB
Team
Production inference with usage metering and priority support. For teams shipping AI features.
100 GPU Hours / mo
MULTI-BACKEND ROUTING
Unlimited Deployments
PER-API-KEY QUOTAS
All SDK Languages
TYPESCRIPT, PYTHON, GO, RUST
Priority Support
4H RESPONSE GUARANTEE
Enterprise
Custom SLA, dedicated inference fleet, and white-glove onboarding for production at scale.
Unlimited GPU Hours
H100 CLUSTER READY
All SDK Languages
TYPESCRIPT, PYTHON, GO, RUST
Custom Governance
SAML SSO & RBAC
Dedicated Support
NAMED SOLUTIONS ENGINEER
Full Capability Matrix
Deploy your first model today.
OpenAI-compatible inference, GPU-hour based pricing, and per-API-key metering. No GPU cluster required.