The Cloud AI Commitment Landscape

Cloud AI infrastructure costs are unlike traditional enterprise software licence costs in three important ways. They are consumption-based — costs scale with usage, not with a fixed seat count. They are model-dependent — different AI models within the same platform carry dramatically different per-token prices. And they are rapidly changing — pricing, model availability, and commitment structures have all shifted materially year-over-year since 2023.

These characteristics create a procurement environment that is structurally different from conventional enterprise software negotiations. Consumption billing creates budget unpredictability that is difficult to manage without real-time monitoring and spending controls. Model dependency means that a commitment to a specific model may become commercially unfavourable if a newer, cheaper model becomes available mid-commitment. And rapid pricing change means that any long-term commitment carries the risk of locking in pricing that becomes uncompetitive within the commitment window.

Despite these challenges, meaningful AI cost reduction requires commitment. All three major hyperscalers — AWS, Azure, and Google Cloud — offer substantially lower effective rates for enterprises that commit to defined spend levels or provisioned capacity. The strategic question is not whether to commit, but how to structure the commitment to minimise risk while capturing the available discount.

"We consistently see enterprises signing three-year cloud AI commitments before they have validated their AI workload requirements. The discount is real, but so is the risk of paying for capacity that is three model generations behind by the time the commitment ends." — Morten Andersen, Co-Founder

AWS: Enterprise Discount Programme and Bedrock Pricing

AWS offers enterprise AI cost reduction through two complementary mechanisms: the Enterprise Discount Programme (EDP) at the account level, and provisioned throughput (Model Units) within Amazon Bedrock at the model level.

AWS Enterprise Discount Programme

The AWS Enterprise Discount Programme — previously known as the Private Pricing Agreement — provides percentage discounts on all AWS consumption, including Bedrock, in exchange for a committed annual spend. Meaningful EDP discounts begin at approximately $2 million in annual committed AWS spend. Below this threshold, AWS's standard published pricing with Reserved Instance or Savings Plan optimisation typically delivers better economics than an EDP commitment.

EDP discounts scale with commitment size and term length, typically ranging from 5 to 25 percent off standard pay-as-you-go pricing. Three-year EDP terms produce the largest discounts but carry the highest commitment risk for AI workloads where model usage patterns may shift materially over the term. For AI-heavy organisations in 2025, we recommend maximum one-year EDP terms until AI workload requirements are stable and predictable. The discount forgone by the shorter term is outweighed by the strategic value of renegotiating annually as AI infrastructure costs continue to fall.

Data egress is the most common source of unexpected AWS cost overruns, and it is a cost that EDP discounts do not reduce. AWS charges for data transferred out of AWS regions — to the internet, to other cloud providers, or to on-premises infrastructure. Enterprise AI workloads that generate large outputs (image generation, document processing, large language model responses) can generate material egress costs that are absent from initial budget models. Always model egress costs explicitly before committing to AWS AI spend.
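As a rough illustration of why egress belongs in the budget model, a back-of-envelope sketch like the following can surface the order of magnitude early. The $0.09/GB rate, request volume, and response size below are illustrative assumptions, not quoted AWS prices; check the AWS data transfer pricing pages for your regions.

```python
# Rough monthly egress cost model for an AI workload's outbound data.
# The per-GB rate is an assumed placeholder, not a current AWS list price.

def monthly_egress_cost(requests_per_day: int,
                        avg_response_kb: float,
                        rate_per_gb: float = 0.09) -> float:
    """Estimate monthly data-transfer-out cost in USD."""
    gb_per_month = requests_per_day * 30 * avg_response_kb / (1024 * 1024)
    return gb_per_month * rate_per_gb

# Hypothetical: 2M LLM requests/day, 40 KB average response
cost = monthly_egress_cost(2_000_000, 40.0)  # roughly $206/month
```

At these assumed volumes the cost is modest, but image-generation or document-processing workloads with multi-megabyte outputs can push the same calculation into five figures per month.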

AWS Reserved Instances vs Savings Plans

Within AWS compute, the distinction between Reserved Instances and Savings Plans is critical for AI infrastructure planning. Reserved Instances provide a discount of up to 75 percent off on-demand pricing in exchange for a commitment to a specific instance type, region, and term (one or three years). Savings Plans provide a discount of up to 66 percent off on-demand pricing in exchange for a commitment to a consistent compute spend per hour, with flexibility to apply that commitment across any EC2 instance type, region, or Fargate compute.

For AI inference workloads running on defined GPU instance types with predictable throughput requirements, Reserved Instances deliver the maximum discount. For AI workloads where instance type or capacity requirements may shift — particularly as new GPU generations become available from AWS — Savings Plans provide the commitment flexibility that prevents stranding investment in reserved capacity that becomes suboptimal.
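The stranding risk described above can be quantified with a simple scenario comparison. The discount levels and $10/hour on-demand rate below are illustrative assumptions chosen within the ranges stated earlier, not quoted AWS prices; the point is the shape of the trade-off, not the exact figures.

```python
# Sketch: Reserved Instance vs Compute Savings Plan over a one-year
# term, where the workload migrates to a new GPU generation mid-term.
# All rates and discounts are illustrative assumptions.

HOURS_PER_YEAR = 8760

def ri_cost(on_demand_rate: float, ri_discount: float,
            months_on_reserved_type: int) -> float:
    """An RI bills for the full term; usage after migrating off the
    reserved instance type is paid again at on-demand rates."""
    reserved = on_demand_rate * (1 - ri_discount) * HOURS_PER_YEAR
    stranded_months = 12 - months_on_reserved_type
    migrated = on_demand_rate * HOURS_PER_YEAR * stranded_months / 12
    return reserved + migrated

def sp_cost(on_demand_rate: float, sp_discount: float) -> float:
    """A Compute Savings Plan follows the workload across instance
    types, so the discounted rate applies for the whole year."""
    return on_demand_rate * (1 - sp_discount) * HOURS_PER_YEAR

# $10/hr GPU instance, 55% RI vs 45% Savings Plan discount,
# workload migrates after 8 months
ri = ri_cost(10.0, 0.55, 8)   # ~$68,620: 8 months reserved + 4 back on-demand
sp = sp_cost(10.0, 0.45)      # ~$48,180 for the full year
```

Under these assumptions the deeper RI discount is wiped out by four months of stranded capacity, which is the scenario Savings Plans are structured to avoid.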

For Amazon Bedrock specifically, provisioned throughput (Model Units) is the primary commitment mechanism. Model Units reserve a defined inference throughput for a specific model, providing price certainty and guaranteed availability for high-volume AI applications. Longer-term Model Unit commitments reduce the hourly rate but commit the organisation to a specific model — a model that may be superseded by a materially more capable or cheaper model before the commitment expires.
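Before committing to Model Units, it is worth computing the throughput at which provisioned capacity breaks even against on-demand token pricing. The hourly rate and per-token price below are hypothetical placeholders, as Bedrock pricing varies by model and region.

```python
# Break-even sketch for Bedrock provisioned throughput (Model Units)
# versus on-demand token pricing. Rates are illustrative assumptions.

def breakeven_tokens_per_hour(mu_hourly_rate: float,
                              on_demand_per_1k_tokens: float) -> float:
    """Tokens per hour at which one Model Unit costs the same as
    paying on-demand for the identical volume."""
    return mu_hourly_rate / on_demand_per_1k_tokens * 1000

# Hypothetical: $40/hr per Model Unit vs $0.008 per 1K tokens on demand
threshold = breakeven_tokens_per_hour(40.0, 0.008)  # ~5 million tokens/hour
```

If sustained throughput forecasts sit well below the break-even threshold, on-demand pricing remains cheaper despite its variability.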

Azure OpenAI: Provisioned Throughput and the EDP Comparison

Azure AI Foundry (formerly Azure Cognitive Services and Azure OpenAI) offers two pricing modes: pay-as-you-go (consumption-based per token) and Provisioned Throughput Units (PTUs). PTUs represent Azure's commitment mechanism for AI inference, offering 30 to 50 percent savings compared to pay-as-you-go for predictable high-volume workloads.
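The headline PTU saving only materialises if the provisioned capacity is actually consumed, so utilisation sensitivity is worth modelling before signing. The monthly costs below are invented illustrative figures, not published Azure prices.

```python
# Sketch of PTU economics: saving versus pay-as-you-go as a function
# of how much of the reserved capacity is actually used.
# All dollar figures are illustrative assumptions.

def effective_ptu_saving(ptu_monthly_cost: float,
                         payg_cost_at_full_volume: float,
                         utilisation: float) -> float:
    """Saving (as a fraction) versus pay-as-you-go for the volume
    actually consumed. Negative means the PTU block costs more."""
    payg_equivalent = payg_cost_at_full_volume * utilisation
    return (payg_equivalent - ptu_monthly_cost) / payg_equivalent

# A PTU block costing $60K/month vs $100K/month pay-as-you-go at the
# forecast volume: a 40% saving at full utilisation...
full = effective_ptu_saving(60_000, 100_000, 1.0)   # 0.4
# ...but at 50% utilisation the same block costs 20% MORE than PAYG.
half = effective_ptu_saving(60_000, 100_000, 0.5)   # -0.2
```

This is the arithmetic behind the over-commitment case study at the end of this article: a PTU block sized to a forecast that never materialises converts a discount into a premium.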

Azure OpenAI vs Direct OpenAI

Enterprises evaluating GPT-4 and GPT-4o workloads face a choice between accessing these models through Azure OpenAI Service or directly through OpenAI's enterprise API. The pricing comparison between Azure OpenAI and direct OpenAI is not straightforward and varies by model, volume, and negotiated terms.

Azure OpenAI pricing is typically equivalent to or slightly below direct OpenAI API pricing for equivalent models. The substantive difference is not in base pricing but in the commercial structure: Azure OpenAI integrates with Azure's enterprise agreements, MACC commitments, and enterprise billing infrastructure. For organisations with existing Azure commitments, Azure OpenAI spend contributes toward MACC drawdown — effectively getting the AI spend at enterprise agreement pricing rather than pay-as-you-go rates.

Direct OpenAI enterprise agreements carry lock-in provisions that warrant careful review before signature. OpenAI's enterprise agreements typically include minimum purchase commitments, auto-renewal clauses with limited cancellation windows, and data usage terms that differ from Azure OpenAI's enterprise data handling commitments. OpenAI's enterprise agreements have historically included provisions that limited the customer's ability to use competing models for the duration of the agreement — a material restriction in a market where Anthropic Claude, Google Gemini, and Meta LLaMA are all credible alternatives. Always have these provisions reviewed by legal counsel with specific AI contract expertise before signature.

Consumption billing is the shared risk for both Azure OpenAI and direct OpenAI at scale. Token consumption creates budget unpredictability that is difficult to manage without explicit spending controls, real-time usage monitoring, and application-level rate limiting. Enterprises that deploy GPT-4 or GPT-4o without token budgeting controls in production applications consistently report month-over-month budget overruns until monitoring and control infrastructure is in place.
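A minimal sketch of the kind of token budgeting control described above is shown below. This is a simplified client-side illustration, not a production pattern; real deployments should reconcile against the provider's billing APIs rather than local counters, and the cap and rate figures are assumptions.

```python
# Minimal token-budget guard: refuse new requests once the month's
# estimated spend would exceed a cap. A sketch only; production
# systems should track spend from billing APIs, not local counters.

class TokenBudget:
    def __init__(self, monthly_cap_usd: float, usd_per_1k_tokens: float):
        self.cap = monthly_cap_usd
        self.rate = usd_per_1k_tokens
        self.tokens_used = 0

    def spend_usd(self) -> float:
        return self.tokens_used / 1000 * self.rate

    def allow(self, estimated_tokens: int) -> bool:
        """Record usage and return True only if the request fits the cap."""
        projected = (self.tokens_used + estimated_tokens) / 1000 * self.rate
        if projected > self.cap:
            return False
        self.tokens_used += estimated_tokens
        return True

# Hypothetical: $5,000/month cap at an assumed $0.01 per 1K tokens
budget = TokenBudget(monthly_cap_usd=5000.0, usd_per_1k_tokens=0.01)
ok = budget.allow(200_000)  # a $2.00 request fits comfortably
```

Even a guard this crude turns a silent overrun into an explicit, logged rejection, which is the behaviour change that stops the month-over-month pattern described above.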

Google Cloud: Vertex AI and the Committed Use Discount Structure

Google Vertex AI offers on-demand token pricing with committed use discounts (CUDs) for one-year or three-year terms. Google caps committed use discounts at 57 percent for AI inference workloads — higher than comparable AWS Reserved Instance discounts for equivalent GPU resources — but Google also offers automatic Sustained Use Discounts (SUDs) of up to 30 percent for compute resources used continuously within a month without any commitment required.

For AI workloads with predictable, continuous inference requirements, Google's combination of automatic SUDs and CUDs can deliver effective pricing that is 15 to 25 percent below comparable AWS Bedrock pricing for equivalent model types. However, Google Vertex AI's model selection is more limited than AWS Bedrock's multi-model marketplace — Vertex AI is primarily optimised for Google's Gemini model family, while Bedrock provides access to Anthropic Claude, Meta LLaMA, Cohere, Mistral, and others in addition to Amazon's Titan models.
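A simplified effective-rate sketch for the CUD structure is below. The discount levels reuse the figures above as assumptions, the list rate is a placeholder, and the model deliberately reflects that a CUD is billed for the full committed volume whether or not it is consumed, with the automatic sustained use discount applying only to usage beyond the commitment.

```python
# Simplified Vertex AI compute cost model: committed hours billed at
# the CUD rate (whether used or not), overage at the SUD rate.
# Discounts and the list rate are illustrative assumptions.

def effective_monthly_cost(list_rate: float,
                           total_hours: float,
                           committed_hours: float,
                           cud_discount: float = 0.57,
                           sud_discount: float = 0.30) -> float:
    # The CUD is billed in full even when under-consumed
    committed_cost = committed_hours * list_rate * (1 - cud_discount)
    overage = max(0.0, total_hours - committed_hours)
    overage_cost = overage * list_rate * (1 - sud_discount)
    return committed_cost + overage_cost

# 1,000 hours of actual usage against an 800-hour commitment
cost = effective_monthly_cost(1.0, 1000, 800)  # ~$484 at a $1/hr list rate
```

Sizing the commitment below the forecast floor, and letting SUDs absorb the variable overage, keeps the downside bounded if usage falls short.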

Model diversity is a negotiating advantage with Google Vertex AI. Google's incentive to capture AI inference spend from AWS and Azure means that large commitments to Vertex AI are negotiated commercially with more flexibility than the published committed use discount tiers suggest. Enterprise organisations committing $500,000 or more annually to Vertex AI should engage Google's commercial team directly rather than accepting the standard CUD structure.

Negotiation Tactics for Cloud AI Commitment Deals

Cloud AI commitment negotiations differ from traditional enterprise software negotiations in pace and precedent. These markets are moving quickly, vendor negotiating frameworks are still maturing, and there is less standardised transaction data available for benchmarking. Despite this, several effective tactics apply across all three hyperscalers.

Use Competitive Evaluation as Real Leverage

AWS, Azure, and Google Cloud are in active competition for enterprise AI spend. Signalling to one provider that you are evaluating the other two — and providing evidence of that evaluation in the form of technical assessments and indicative pricing — generates materially more aggressive commercial responses than single-vendor negotiations. This leverage is most effective when the competitive evaluation is genuine: a credible proof-of-concept on AWS Bedrock strengthens your Azure OpenAI negotiation, and vice versa.

Negotiate Term Length Separately from Discount Level

Vendors will default to three-year commitments for maximum discount. AI workload requirements in 2025 justify one-year maximum commitments in most cases, with renegotiation rights at each annual renewal. The incremental discount for extending from one year to three years — typically 5 to 10 percentage points — is rarely worth the commitment risk in a market where AI model pricing is falling 20 to 40 percent annually. Negotiate the maximum discount available for a one-year term and revisit commitment length in the second year when workload patterns are better established.

Require Spending Controls as Contract Terms

Consumption billing creates budget unpredictability that spending controls can mitigate but not eliminate. Require the vendor to provide native budget alert thresholds, automated spending caps, and API-level rate limiting as part of any enterprise commitment. Document the control mechanisms available in the contract, and include a right to pause consumption at defined spending thresholds without penalty. This provision is negotiable and protects against runaway AI spend scenarios.
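Of the controls listed above, API-level rate limiting is the one most often left to the application team. A minimal token-bucket sketch is below; real deployments would enforce this at an API gateway rather than in application code, and the rate and burst values are illustrative.

```python
# Minimal token-bucket rate limiter: allow at most `rate_per_sec`
# requests per second on average, with a bounded burst.
# A sketch only; production enforcement belongs at the gateway.

import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

limiter = TokenBucket(rate_per_sec=50.0, burst=100)
permitted = limiter.allow()  # True while the bucket holds tokens
```

Combined with billing-level budget alerts and a contractual pause right, this bounds the worst-case spend of a runaway client.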

Model Transition Rights for Longer Commitments

If you accept a commitment of more than one year, negotiate explicit model transition rights — the right to apply committed spend toward new models as they become available at no additional cost. AWS Bedrock's multi-model architecture makes this easier to negotiate than Azure OpenAI's model-specific PTU structure. Google Vertex AI's commitment structure is somewhat more flexible by design, as Gemini model families share committed throughput infrastructure.

In one engagement, a European financial services group committed to a $3.2M Azure OpenAI Provisioned Throughput Unit block based on projected AI usage that never materialised at forecasted volumes. Redress identified the over-commitment during a mid-term review, renegotiated the PTU allocation, and recovered $840,000 in prepaid capacity that was redeployed to actual production workloads. The engagement fee was under 6% of the recovered value.