Reserved instances are designed for cutting-edge customers running larger workloads, allowing inference at scale with full control over the model configuration and performance profile.
Reserved instance rentals are based on reserved compute units with 3-month or 1-year (~15% savings) commitments. Running an individual model instance (see below for current SKUs) requires a specific number of compute units:
*Example: With a 1-year commitment, the 2-instance minimum for 300CU GPT-4 Turbo would cost $1,584,000. Each additional 300CU instance costs $792,000/year.
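The arithmetic in the example can be checked with a short sketch. The per-instance price, the 300CU size, and the 2-instance minimum come from the example above; the function and constant names are hypothetical, and the per-CU figure is only implied by dividing the instance price by its unit count, not a published rate.

```python
# Figures taken from the 1-year GPT-4 Turbo example; names are illustrative only.
PRICE_PER_INSTANCE_PER_YEAR = 792_000  # USD per 300CU instance, 1-year commitment
CU_PER_INSTANCE = 300                  # compute units per GPT-4 Turbo instance
MIN_INSTANCES = 2                      # minimum purchase stated in the example


def annual_cost(instances: int) -> int:
    """Return the 1-year cost in USD for the given number of instances."""
    if instances < MIN_INSTANCES:
        raise ValueError(f"at least {MIN_INSTANCES} instances are required")
    return instances * PRICE_PER_INSTANCE_PER_YEAR


def implied_price_per_cu() -> float:
    """Return the per-CU annual price implied by the example (an inference)."""
    return PRICE_PER_INSTANCE_PER_YEAR / CU_PER_INSTANCE


if __name__ == "__main__":
    print(annual_cost(2))          # the 2-instance minimum
    print(annual_cost(3))          # minimum plus one additional instance
    print(implied_price_per_cu())  # implied USD per CU per year
```

Under these figures the 2-instance minimum comes to $1,584,000 and a third instance brings the total to $2,376,000 per year.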
Can the input and output data be used by OpenAI for training?
No: we'll never use or train on your data. Reserved Capacity adheres to our Enterprise Privacy commitments.

What happens when the pay-as-you-go price of a model changes?
Reserved Capacity pricing is based on raw compute costs. Sometimes price changes are enabled by improved price efficiency, in which case updating to that model will yield more throughput on your reserved instances. Sometimes pay-as-you-go price changes are made for reasons separate from model efficiency, in which case your reserved capacity throughput will be unaffected.
What throughput should I expect? How do I know how many units I should buy?
Throughput on a reserved instance, and therefore the total units and instances you decide to use, will depend on your request pattern. Shared capacity API calls use a “blended” per-request “cost” across all traffic. With reserved instances, your performance is a function of the underlying hardware performance and your workload. We will provide a benchmarking instance and tooling so you can determine how many reserved units to purchase for your specific workloads and requirements.
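Once you have a benchmark result, the sizing step might look like the sketch below. This is an assumption-laden illustration, not the provided tooling: the tokens-per-minute metric, the 25% headroom default, and all function and parameter names are hypothetical, and the 2-instance minimum and 300CU instance size are borrowed from the GPT-4 Turbo example above.

```python
import math


def instances_needed(
    peak_tokens_per_minute: float,
    measured_tokens_per_minute_per_instance: float,
    headroom: float = 0.25,   # hypothetical safety margin above observed peak
    min_instances: int = 2,   # assumed from the GPT-4 Turbo example
) -> int:
    """Estimate the instance count from a benchmark measured on your own workload."""
    if measured_tokens_per_minute_per_instance <= 0:
        raise ValueError("measured throughput must be positive")
    required = peak_tokens_per_minute * (1 + headroom)
    count = math.ceil(required / measured_tokens_per_minute_per_instance)
    # Never go below the minimum purchase size.
    return max(min_instances, count)


def units_needed(instances: int, cu_per_instance: int = 300) -> int:
    """Convert an instance count into compute units (300CU assumed per instance)."""
    return instances * cu_per_instance


if __name__ == "__main__":
    # Made-up numbers: a 900k tokens/min peak and 150k tokens/min per instance.
    n = instances_needed(900_000, 150_000)
    print(n, units_needed(n))
```

The measured per-instance throughput should come from running your real request mix on the benchmarking instance, since a reserved instance's performance depends on the workload rather than a blended average.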