deployment options
AI on your terms
Run our AI anywhere with deployment options tailored to your unique infrastructure, security, and performance needs.
Private
Run our models on-premises or within an isolated virtual private cloud (VPC) environment for complete data sovereignty and governance.
Highly regulated industries
Meeting strict data residency needs
Protecting highly sensitive data
Model Vault
Dedicated. Logically isolated. Fully managed. Deploy through Model Vault for scalable, high-performance inference without sacrificing enterprise control. Available for both model deployments and North.
Enterprises constrained by infrastructure provisioning
Companies balancing multi-tenant convenience with performance guarantees
Teams managing products with highly variable inference demand
Public/Hybrid cloud
Integrate private infrastructure with public cloud resources to balance compliance requirements with flexibility and scalability.
Enterprises needing local control and cloud flexibility
Companies with variable workloads
Optimizing cost and performance across environments
SaaS
Deploy through our fully managed SaaS platform to scale securely without the cost and complexity of managing your own infrastructure. Available for model deployments only.
Small to midsize businesses
Handling non-sensitive data
Running AI without infrastructure overhead
Compare deployment options
Private
Available for: Command, Rerank, Embed, North, Compass
Time to get started: < 1 day
Key benefits: Run securely behind your firewall; achieve complete data sovereignty; scale and customize to your exact needs
Pricing structure: Per model instance

Model Vault
Available for: Command, Rerank, Embed, North, Compass
Time to get started: Instantly
Key benefits: Ensure guaranteed performance and availability; scale model inference with fully managed compute; eliminate rate limits and other shared-resource constraints
Pricing structure: Per model performance tier and per instance range

Public/Hybrid cloud
Available for: Command, Rerank, Embed, North, Compass
Time to get started: < 1 day
Key benefits: Stay compatible with any cloud AI/ML platform; run each workload in its optimal environment for maximum efficiency; run regulated workloads privately
Pricing structure: Per token and per instance

SaaS
Available for: Command, Rerank, Embed
Time to get started: Instantly
Key benefits: Get started instantly with no setup; simplify operations with fully managed compute; protect data with dedicated instances
Pricing structure: Per token