Deployment options

AI on your terms

Run our AI anywhere with deployment options tailored to your unique infrastructure, security, and performance needs.

Private

Run our models on-premises or within an isolated virtual private cloud (VPC) environment for complete data sovereignty and governance.


  • Highly regulated industries

  • Meeting strict data residency needs

  • Protecting highly sensitive data

Model Vault

Dedicated. Logically isolated. Fully managed. Deploy through Model Vault for scalable, high-performance inference without sacrificing enterprise control. Available for both model deployments and North.



  • Enterprises constrained by infrastructure provisioning

  • Companies balancing multi-tenant convenience with performance guarantees

  • Teams managing products with highly variable inference demand

Public/Hybrid cloud

Integrate private infrastructure with public cloud resources to balance compliance requirements with flexibility and scalability.



  • Enterprises needing local control and cloud flexibility

  • Companies with variable workloads

  • Optimizing cost and performance across environments

SaaS

Deploy through our fully managed SaaS platform to scale securely without the cost and complexity of managing your own infrastructure. Available for model deployments only.


  • Small to midsize businesses

  • Handling non-sensitive data

  • Running AI without infrastructure overhead

Compare deployment options

Private

Available for: Command, Rerank, Embed, North, Compass

Time to get started: < 1 day

Key benefits:

  • Run securely behind your firewall

  • Achieve complete data sovereignty

  • Scale and customize to your exact needs

Pricing structure: Per model instance

Model Vault

Available for: Command, Rerank, Embed, North, Compass

Time to get started: Instantly

Key benefits:

  • Ensure guaranteed performance and availability

  • Scale model inference needs with fully managed compute

  • Eliminate rate limits and other shared resource constraints

Pricing structure: Per model performance tier and per instance range

Public/Hybrid cloud

Available for: Command, Rerank, Embed, North, Compass

Time to get started: < 1 day

Key benefits:

  • Compatibility with any cloud AI/ML platform

  • Run each workload in its optimal environment for max. efficiency

  • Run regulated workloads privately

Pricing structure: Per token and per instance

SaaS

Available for: Command, Rerank, Embed

Time to get started: Instantly

Key benefits:

  • Get started instantly with no setup

  • Simplify operations with fully managed compute

  • Protect data with dedicated instances

Pricing structure: Per token