The Commercial Landscape: Why AI Licensing Is Different

Traditional enterprise software licensing is based on deterministic metrics: seats, named users, CPU cores, or revenue bands. The cost of running 10,000 seats of a business application is predictable before deployment. AI model API consumption is the opposite: it is entirely variable, driven by query volume, prompt length, response length, model tier selection, and feature usage patterns that are difficult to forecast with accuracy.

This consumption-based billing model creates genuine budget unpredictability that enterprises have not had to manage in traditional software procurement. A developer productivity tool that uses an LLM API might consume 100 million tokens per month at a modest cost during development. The same tool deployed to 5,000 developers with active daily usage can consume 10 billion tokens per month — a 100x increase that translates directly to a 100x cost increase, materialising in a single billing cycle.
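
To make the scaling arithmetic concrete, the following sketch projects monthly cost at an assumed blended per-token rate; the rate is illustrative, so substitute your provider's actual input and output prices:

```python
# Sketch: projecting monthly LLM API cost as a deployment scales.
# The blended rate is an illustrative assumption, not a quoted price.

BLENDED_RATE_PER_M_TOKENS = 5.00  # assumed blended $/1M tokens (input + output)

def monthly_cost(tokens_per_month: float) -> float:
    """Estimate monthly API cost for a given token volume."""
    return tokens_per_month / 1_000_000 * BLENDED_RATE_PER_M_TOKENS

pilot = monthly_cost(100e6)      # 100 million tokens during development
production = monthly_cost(10e9)  # 10 billion tokens at full deployment

print(f"Pilot:      ${pilot:,.0f}/month")       # $500/month
print(f"Production: ${production:,.0f}/month")  # $50,000/month, a 100x jump
```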

Every major AI provider — OpenAI, Anthropic, Google Gemini, and AWS Bedrock — operates on consumption billing. The differences between them are in pricing levels, commitment structures, enterprise agreement terms, access pathways, and lock-in depth. Understanding these differences is the foundation of effective enterprise AI cost governance.

OpenAI: Lock-in Provisions and Enterprise Agreement Risks

OpenAI is the dominant enterprise AI provider by market share and brand recognition. For enterprises adopting GPT-4o, GPT-4, or the o3/o4 reasoning models, OpenAI's enterprise agreement is the primary commercial instrument — and it contains provisions that require careful scrutiny before signing.

Lock-in Provisions That Require Negotiation

OpenAI enterprise agreements carry lock-in provisions that are frequently not identified during standard procurement review. The most significant include:

  • Volume commitment minimums that require paying for unused token capacity if actual consumption falls below the committed level.
  • Price change provisions in standard terms that historically allowed rate increases with as little as 14 days' notice.
  • Data retention and usage provisions that may constrain your ability to secure comparable terms with, or move workloads to, competing providers.
  • Model version lock provisions in some agreements that restrict the right to substitute later model versions without amending the agreement.

Each of these provisions is negotiable for enterprise customers. Price lock provisions — guaranteeing no rate increases during the initial contract term — should be a non-negotiable requirement for any enterprise OpenAI commitment. Volume commitment structures should include ramp provisions that allow consumption to build to the committed level over 3 to 6 months, rather than requiring the full committed volume from month one. And data handling provisions should be reviewed by your legal team for compliance with applicable data residency, processing, and portability requirements.

Enterprise organisations that sign OpenAI agreements without independent commercial review consistently accept terms that are more favourable to OpenAI than necessary. The OpenAI sales team is accustomed to negotiating with enterprise procurement teams — there is no published standard that defines what terms are and are not available. Benchmarking against what comparable organisations have negotiated is the only reliable anchor.

OpenAI Pricing: Standard vs Enterprise

OpenAI's published API pricing for GPT-4o is $2.50 per million input tokens and $10.00 per million output tokens at standard rates. At enterprise scale, volume discounts reduce these rates, but the published pricing creates a baseline that can be directly compared against Azure OpenAI pricing for identical models accessed through the Azure channel.
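
As a worked baseline, the sketch below computes a monthly workload cost at these published GPT-4o rates. The request volume and per-request token counts are illustrative assumptions:

```python
# Sketch: workload cost at OpenAI's published GPT-4o list rates
# ($2.50 per 1M input tokens, $10.00 per 1M output tokens).

GPT4O_INPUT_PER_M = 2.50
GPT4O_OUTPUT_PER_M = 10.00

def workload_cost(input_tokens: float, output_tokens: float) -> float:
    """Monthly cost at list rates for the given token volumes."""
    return (input_tokens / 1e6 * GPT4O_INPUT_PER_M
            + output_tokens / 1e6 * GPT4O_OUTPUT_PER_M)

# 1M requests/month at ~1,500 input and ~500 output tokens each
# (request sizes are illustrative assumptions).
print(f"${workload_cost(1.5e9, 0.5e9):,.0f}/month at list rates")  # $8,750
```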

OpenAI's enterprise API access includes additional features beyond the standard API: dedicated capacity, enhanced uptime SLAs, SSO and enterprise identity integration, and enhanced data privacy controls. These enterprise features have genuine value for regulated enterprises, but their availability should not be taken as given — they should be explicitly enumerated and guaranteed in the enterprise agreement.

Azure OpenAI vs Direct OpenAI: The Critical Pricing Comparison

Azure OpenAI Service provides access to OpenAI's models — including GPT-4o, GPT-4, DALL-E 3, and Whisper — through Microsoft's Azure commercial and compliance infrastructure. This creates a choice that enterprise buyers frequently fail to analyse rigorously: should OpenAI models be accessed through Azure OpenAI or directly through OpenAI's API?

Pricing Parity and Where It Diverges

At the token pricing level, Azure OpenAI and direct OpenAI API pricing are broadly similar for standard consumption-based access. Both services price GPT-4o at comparable rates per million tokens. The meaningful pricing difference emerges at committed throughput: Azure OpenAI offers Provisioned Throughput Units (PTUs), which provide reserved model serving capacity at a fixed hourly rate. PTU pricing, when amortised across a high-volume production workload, reduces the effective per-token cost substantially compared to on-demand token pricing — often by 30 to 50 percent for workloads above one million tokens daily.
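
The break-even logic can be modelled directly. In the sketch below, the PTU hourly rate, capacity requirement, and on-demand rate are placeholders rather than quoted prices; actual figures come from your Azure agreement and workload profile:

```python
# Sketch: break-even between committed capacity (PTU-style) and on-demand
# token pricing. All three inputs are placeholders, not quoted prices.

HOURS_PER_MONTH = 730

ptu_hourly_rate = 2.00   # hypothetical $/PTU-hour
ptus_required = 50       # hypothetical capacity for the workload
on_demand_per_m = 5.00   # hypothetical blended on-demand $/1M tokens

committed_monthly = ptu_hourly_rate * ptus_required * HOURS_PER_MONTH

def effective_rate(tokens_per_month: float) -> float:
    """Effective $/1M tokens once the fixed monthly PTU cost is amortised."""
    return committed_monthly / (tokens_per_month / 1e6)

for tokens in (5e9, 15e9, 30e9):
    rate = effective_rate(tokens)
    cheaper = "committed" if rate < on_demand_per_m else "on-demand"
    print(f"{tokens/1e9:>4.0f}B tokens/month -> ${rate:.2f}/1M tokens ({cheaper} cheaper)")
```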

Direct OpenAI does not offer a PTU-equivalent committed capacity model. This means that enterprises deploying OpenAI models at high, predictable volume should evaluate Azure OpenAI PTUs as the cost-optimal access pathway, even if the enterprise has a direct OpenAI relationship for other use cases. The two channels are not mutually exclusive — an enterprise can maintain a direct OpenAI API agreement for development and lower-volume use cases while using Azure OpenAI PTUs for production workloads requiring cost efficiency and latency predictability.

Enterprise Compliance and Data Residency

Azure OpenAI's most material advantage for regulated enterprises is its integration with Azure's compliance framework. Azure OpenAI supports data residency configurations that keep data within specific geographic regions, integrates with Azure Active Directory for enterprise identity management, and is covered by Microsoft's enterprise Data Processing Addendum (DPA) for GDPR, HIPAA, and other compliance frameworks. Direct OpenAI API access is covered by OpenAI's own privacy and security terms, which may not satisfy regulated industry requirements without additional negotiated provisions.

For enterprises in regulated industries — financial services, healthcare, government, pharmaceuticals — Azure OpenAI's compliance coverage often makes it the only viable access channel for production workloads, regardless of comparative pricing. The compliance decision should drive the access channel decision, with pricing optimisation applied within the chosen channel.

Azure Lock-in Through Azure OpenAI

Enterprises that build production AI applications on Azure OpenAI are simultaneously creating dependency on Microsoft as the infrastructure provider and on OpenAI as the model provider. This dual dependency is a form of vendor lock-in that amplifies the risk of either party making commercial changes. If Azure OpenAI pricing increases, or if Microsoft changes the terms of Azure OpenAI access, the enterprise cannot simply switch to direct OpenAI access without re-engineering infrastructure that depends on Azure-specific services (Azure API Management, Azure Monitor, Azure AI Studio integration, and Azure Active Directory authentication).

Mitigating this risk requires explicit model abstraction in application architecture — ensuring that model access calls are mediated through an abstraction layer that could support other providers' APIs without application re-engineering — and maintaining direct OpenAI API credentials as a fallback channel, even if Azure OpenAI is the primary production pathway.
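
A minimal sketch of such an abstraction layer follows. The class and method names are illustrative and the provider calls are stubbed; the structural point is that channel selection lives in configuration rather than in application code:

```python
# Sketch of a provider abstraction layer. Names are illustrative and the
# provider calls are stubbed; real implementations would wrap each vendor SDK.

from abc import ABC, abstractmethod

class ChatProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, max_tokens: int = 1024) -> str: ...

class AzureOpenAIProvider(ChatProvider):
    def complete(self, prompt: str, max_tokens: int = 1024) -> str:
        raise NotImplementedError("Call the Azure OpenAI deployment here")

class DirectOpenAIProvider(ChatProvider):
    def complete(self, prompt: str, max_tokens: int = 1024) -> str:
        raise NotImplementedError("Fallback channel: call the OpenAI API here")

PROVIDERS = {"azure": AzureOpenAIProvider, "openai": DirectOpenAIProvider}

def get_provider(channel: str) -> ChatProvider:
    """Channel selection lives in configuration, not in application code."""
    return PROVIDERS[channel]()
```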

Anthropic Claude: Pricing, Enterprise Terms, and the Safety Positioning

Anthropic's Claude models — Claude Opus 4.6, Claude Sonnet 4.6, and Claude Haiku 4.5 — represent a serious commercial alternative to OpenAI GPT models for a growing number of enterprise use cases. Anthropic's positioning emphasises AI safety and responsible deployment, which resonates with risk-sensitive enterprises in regulated sectors. The commercial terms, however, require careful analysis.

Claude API Pricing Structure

Anthropic publishes consumption-based API pricing for the Claude model family. As of 2026, Claude Haiku 4.5 is priced at $1 per million input tokens and $5 per million output tokens — the most cost-efficient tier for high-volume applications where Claude's core text capabilities suffice. Claude Sonnet 4.6, the mid-tier general-purpose model, is priced at $3 per million input tokens and $15 per million output tokens. Claude Opus 4.6, the most capable model tier, is priced at $5 per million input tokens and $25 per million output tokens.

Claude Haiku's rates in particular sit well below GPT-4o list prices for high-volume workloads, making Anthropic models an attractive cost option for enterprises that have evaluated Claude's quality for their specific use cases. The pricing advantage is real — but it must be evaluated against the fact that model version transitions (from Claude 3.5 to Claude 4, for example) can result in pricing changes that affect the total cost calculation for long-running production deployments.

Anthropic Enterprise Agreements and Consumption Billing Risk

Anthropic's enterprise programme requires a minimum of 50 seats, priced per seat per month and billed annually, with API consumption charged separately on a pay-as-you-go basis. This hybrid model — fixed seat fees plus variable consumption charges — creates a consumption billing exposure that is additive to the seat cost. Enterprises deploying Claude extensively should model both components and establish token consumption monitoring from day one.

Consumption billing creates budget unpredictability that is particularly acute when AI capabilities are adopted broadly across an organisation. A pilot deployment involving 50 users may generate modest token consumption. Scaling to 2,000 users — each making multiple AI-assisted requests per day — creates token consumption volumes that can generate monthly API bills 40 to 100 times larger than the pilot phase. Without budget alerts, token consumption dashboards, and per-team cost allocation, consumption billing overruns are identified only at month-end when remediation options are limited.
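
Both billing components can be modelled together. In the sketch below, the per-seat fee is a hypothetical placeholder (Anthropic's enterprise seat pricing is negotiated rather than published), the token rates are the Claude Sonnet 4.6 list prices quoted above, and the usage volumes for each phase are illustrative assumptions:

```python
# Sketch: hybrid seat-plus-consumption billing. The per-seat fee is a
# hypothetical placeholder; token rates are the Claude Sonnet 4.6 list
# prices quoted above; usage volumes per phase are illustrative.

SEAT_FEE = 60.00                         # hypothetical $/seat/month
INPUT_PER_M, OUTPUT_PER_M = 3.00, 15.00  # Claude Sonnet 4.6 list rates

def monthly_bill(seats: int, input_tokens: float, output_tokens: float) -> float:
    fixed = seats * SEAT_FEE
    variable = (input_tokens / 1e6 * INPUT_PER_M
                + output_tokens / 1e6 * OUTPUT_PER_M)
    return fixed + variable

print(f"Pilot (50 users):        ${monthly_bill(50, 50e6, 15e6):,.0f}")
print(f"Scale-out (2,000 users): ${monthly_bill(2000, 4e9, 1.2e9):,.0f}")
```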

AWS Bedrock: The Multi-Model Marketplace Approach

AWS Bedrock is Amazon's managed AI model access service, providing access to foundation models from multiple providers — including Anthropic Claude, Amazon Titan, Meta Llama, Mistral AI, Cohere, and others — through a single AWS API and commercial relationship. For enterprises with existing AWS relationships, Bedrock provides a path to access leading AI models without establishing separate commercial relationships with each model provider.

Bedrock Pricing Model

AWS Bedrock charges per API call based on the number of input and output tokens processed, with rates varying by model. For Anthropic Claude models accessed through Bedrock, the pricing is generally comparable to direct Anthropic API pricing, with some variation by model version and usage tier. The commercial advantage of Bedrock for AWS-primary organisations is that Bedrock consumption is eligible for AWS Enterprise Discount Program (EDP) credits if structured appropriately in the EDP agreement — meaning organisations with substantial EDP commitments may be able to apply EDP discounts to Bedrock spend.

Bedrock also offers Provisioned Throughput for high-volume models, providing committed capacity at reduced per-token rates similar to Azure OpenAI's PTU model. For enterprises deploying AI at scale within the AWS ecosystem, Bedrock Provisioned Throughput is the cost-efficient access model, and should be evaluated alongside direct model provider agreements.

Bedrock Lock-in vs Provider Diversity

Bedrock's multi-model structure creates a nuanced lock-in profile. By consolidating model access through Bedrock, the organisation creates infrastructure dependency on AWS (API architecture, authentication, logging, monitoring, and billing integration), while nominally maintaining access to multiple model providers. In practice, migrating from Bedrock to direct model provider APIs, or to Azure AI Foundry, requires re-engineering application architecture even if the underlying models are available from both providers. The abstraction layer that Bedrock provides creates its own form of lock-in — AWS Bedrock lock-in rather than model-specific lock-in.

For organisations whose primary cloud commitment is AWS and whose AI workloads are entirely AWS-based, Bedrock lock-in is an acceptable trade-off for the commercial convenience of unified billing and EDP credit eligibility. For organisations with multi-cloud strategies or with genuine requirements for cross-cloud AI deployment, direct model provider relationships may provide better long-term flexibility.

Google Gemini and Vertex AI: The Native Google AI Stack

Google's enterprise AI offering spans two complementary access channels: direct Gemini API access (through Google AI Studio and the Gemini API for developers) and enterprise model access through Vertex AI (Google's managed ML platform). Gemini models — Gemini 1.5 Pro, Gemini 2.0, and Gemini Ultra — deliver Google's most capable multimodal AI, including context windows of up to one million tokens that exceed those of competing models on document and code processing tasks.

Google Gemini Pricing

Gemini's consumption-based pricing includes per-token charges for input and output, with significant variation by model tier and context length. Gemini 1.5 Pro with standard context is priced at $1.25 per million input tokens and $5.00 per million output tokens; prompts exceeding 128,000 tokens are charged at $2.50 per million input tokens, reflecting the computational cost of long-context processing. Gemini 2.0 Flash, Google's cost-efficient tier, is priced at $0.10 per million input tokens and $0.40 per million output tokens — making it one of the lowest per-token prices among leading multimodal models.

Google offers Committed Use Discounts (CUDs) on Vertex AI spend, and Gemini API consumption billed through Vertex AI is eligible for these discounts. For enterprises with substantial Google Cloud commitments, CUD eligibility for Gemini spend provides a meaningful pricing advantage over pure on-demand consumption billing.

Consumption Billing Unpredictability Across All Google AI Services

Google's AI services share the consumption billing characteristics of all major AI providers. Budget unpredictability is particularly acute for long-context applications: a document processing pipeline that routinely processes 500-page documents through Gemini 1.5 Pro at long-context rates can generate per-document costs of $1.25 to $2.50 — trivial at small scale but significant at production volume. Enterprises processing millions of documents annually should model the per-document AI cost explicitly and compare it against the cost of running open-source models on Google Kubernetes Engine or Vertex AI custom training infrastructure.
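
A rough per-document model, using the Gemini 1.5 Pro long-context rates quoted earlier and an assumed token density per page, shows how the per-document figure compounds at volume:

```python
# Sketch: per-document cost at the Gemini 1.5 Pro long-context rates quoted
# above. Token density per page and output size are assumptions; measure
# your own corpus before relying on the figure.

LONG_CONTEXT_INPUT_PER_M = 2.50  # $/1M input tokens beyond 128k context
OUTPUT_PER_M = 5.00              # $/1M output tokens

TOKENS_PER_PAGE = 1_000          # assumed density for dense, text-heavy pages

def per_document_cost(pages: int, output_tokens: int = 2_000) -> float:
    input_tokens = pages * TOKENS_PER_PAGE
    return (input_tokens / 1e6 * LONG_CONTEXT_INPUT_PER_M
            + output_tokens / 1e6 * OUTPUT_PER_M)

cost = per_document_cost(500)
print(f"~${cost:.2f} per 500-page document")             # ~$1.26
print(f"~${cost * 2e6:,.0f} across 2M documents/year")   # ~$2.5M
```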

Every major AI provider uses consumption billing. The enterprise that deploys AI without token consumption monitoring, per-team cost allocation, and budget alerts will encounter the same outcome: a month-end bill that is materially higher than anticipated, with limited ability to remediate retrospectively.

The Multi-Model Strategy: Managing Vendor Dependency

The most commercially resilient enterprise AI strategy maintains relationships with multiple model providers — using different providers for different use cases based on capability-cost fit, with no single provider dependency for any critical workflow. In practice, multi-model strategies are more complex to implement than single-provider strategies and require explicit architectural support, but the commercial benefits in terms of pricing leverage and switching optionality are material.

Model Selection by Use Case

Effective multi-model strategies assign models based on use case requirements. High-complexity reasoning tasks — legal analysis, financial modelling, scientific research — may justify GPT-4o or Claude Opus 4.6 at higher per-token costs. Customer service applications processing high transaction volumes benefit from lower-cost, fast-response models like Gemini 2.0 Flash or Claude Haiku 4.5 that provide sufficient quality at dramatically lower per-token rates. Code generation and technical documentation benefit from models with strong coding benchmarks — GitHub Copilot (based on OpenAI models), Amazon CodeWhisperer, and Gemini Code Assist (Google) each offer different capability-cost profiles.

The commercial benefit of this differentiation is significant. Replacing GPT-4o with Claude Haiku or Gemini Flash for appropriate high-volume use cases can reduce AI inference costs by 70 to 90 percent on those workloads, while maintaining premium model access where it is genuinely needed.
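
One simple way to enforce this assignment is a routing table that maps each use case to the cheapest tier meeting its quality bar. The model names and rates below follow the list prices discussed in this article; the assignments themselves are illustrative:

```python
# Sketch: a routing table that assigns each use case the cheapest model tier
# meeting its quality bar. Rates follow the list prices discussed in this
# article; the routing entries themselves are illustrative.

ROUTES = {
    # use case          -> (model,             $/1M in, $/1M out)
    "legal_analysis":      ("claude-opus-4.6",   5.00,   25.00),
    "financial_modelling": ("gpt-4o",            2.50,   10.00),
    "customer_service":    ("gemini-2.0-flash",  0.10,    0.40),
    "high_volume_triage":  ("claude-haiku-4.5",  1.00,    5.00),
}

def route(use_case: str) -> str:
    """Return the model assigned to a use case; premium tiers by exception."""
    model, _, _ = ROUTES[use_case]
    return model

print(route("customer_service"))  # gemini-2.0-flash, not a premium tier
```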

Open-Source Models as a Cost Baseline

Llama (Meta), Mistral, and other high-quality open-source models provide a cost baseline that enterprise AI licensing negotiations should reflect. Running a 70-billion parameter Llama model on dedicated GPU infrastructure (available through AWS, Azure, or Google Cloud) provides substantial AI capability at per-token costs that are orders of magnitude lower than proprietary model API pricing at scale. The operational overhead of managing open-source model infrastructure is real, but for predictable, high-volume workloads where the open-source model provides sufficient quality, the cost saving can justify that overhead.
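
The baseline can be estimated with simple arithmetic. In the sketch below, the GPU hourly cost, throughput, and utilisation figures are illustrative assumptions; benchmark your own deployment before using the result in a negotiation:

```python
# Sketch: effective per-token cost of self-hosting an open-source model on
# rented GPU capacity. All inputs are illustrative assumptions; benchmark
# your own serving stack.

gpu_hourly_cost = 25.00      # assumed $/hour for a multi-GPU inference node
tokens_per_second = 10_000   # assumed aggregate throughput with batching
utilisation = 0.70           # fraction of capacity serving real load

tokens_per_hour = tokens_per_second * 3_600 * utilisation
cost_per_m_tokens = gpu_hourly_cost / (tokens_per_hour / 1e6)

print(f"Self-hosted effective rate: ${cost_per_m_tokens:.2f} per 1M tokens")
# ~$0.99/1M under these assumptions, well below typical proprietary list rates.
```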

Enterprise AI licensing negotiations should always be conducted with awareness of what open-source deployment would cost for the same workload. This creates credible negotiating pressure — API providers know that sufficiently large enterprises have the infrastructure capability to run open-source alternatives, and pricing that materially exceeds the open-source TCO will face pushback in enterprise negotiations.

Budget Governance for Enterprise AI Spending

Consumption billing requires real-time budget governance to prevent cost overruns. The following controls are standard practice for enterprise AI deployments at scale:

  • Token consumption monitoring: Implement token usage tracking per application, per team, and per model, with daily reporting dashboards. Consumption billing overruns compound daily — monthly reviews are insufficient.
  • Budget alerts: Set automated alerts at 50 percent, 75 percent, and 90 percent of monthly budget thresholds for each AI service. Configure alerts to trigger both operational (engineering) and financial (finance) notifications; a minimal sketch follows this list.
  • Per-team cost allocation: Allocate AI API costs to business teams via chargeback or showback mechanisms. Teams that see the cost of their AI usage make more cost-conscious application design decisions than teams insulated from consumption costs.
  • Request-level cost estimation: For high-volume applications, implement client-side token estimation before each API call to identify and prevent runaway prompts or context windows that generate disproportionate costs.
  • Model tiering controls: Enforce model tier selection policies that require justification for use of premium-tier models in production applications. Default production model selection should be the most cost-efficient model tier that meets quality requirements.
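
A minimal sketch of the first three controls, assuming an in-process tracker; production deployments would persist spend in a metrics store and route alerts to paging and finance systems:

```python
# Minimal sketch of per-team tracking with tiered budget alerts. Production
# systems would persist spend in a metrics store and page on-call and
# finance contacts; names here are illustrative.

from collections import defaultdict

ALERT_THRESHOLDS = (0.50, 0.75, 0.90)

class BudgetTracker:
    def __init__(self, monthly_budgets: dict[str, float]):
        self.budgets = monthly_budgets    # team -> monthly $ budget
        self.spend = defaultdict(float)   # team -> $ spent so far
        self.alerted = defaultdict(set)   # team -> thresholds already fired

    def record(self, team: str, cost_usd: float) -> None:
        """Record a charge and fire any newly crossed alert thresholds."""
        self.spend[team] += cost_usd
        fraction = self.spend[team] / self.budgets[team]
        for threshold in ALERT_THRESHOLDS:
            if fraction >= threshold and threshold not in self.alerted[team]:
                self.alerted[team].add(threshold)
                print(f"ALERT: {team} at {threshold:.0%} of monthly AI budget")

tracker = BudgetTracker({"search-team": 10_000.00})
tracker.record("search-team", 5_200.00)  # fires the 50% alert
```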

Data Egress and Infrastructure Costs in AI Deployments

Enterprises running AI workloads on cloud infrastructure face a cost dimension that is frequently absent from AI budget models: data egress. AWS charges $0.09 per GB for data transferred out of its network — a cost that appears modest until AI pipelines begin processing and returning large volumes of inference results, embeddings, or fine-tuned model outputs. An enterprise running a retrieval-augmented generation (RAG) pipeline that transfers several terabytes of context data per month between storage and inference infrastructure can generate egress charges that rival the token consumption costs of the AI calls themselves.
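
The arithmetic is straightforward to model. The sketch below applies the $0.09 per GB rate quoted above to a range of illustrative monthly transfer volumes:

```python
# Sketch: monthly egress cost at the $0.09/GB rate quoted above. Transfer
# volumes are illustrative; actual tiered rates vary with volume and path.

EGRESS_RATE_PER_GB = 0.09

def monthly_egress_cost(terabytes: float) -> float:
    return terabytes * 1_024 * EGRESS_RATE_PER_GB

for tb in (1, 5, 20):
    print(f"{tb:>3} TB/month -> ${monthly_egress_cost(tb):,.0f}")
# 5 TB/month is roughly $461, comparable to a mid-sized token bill.
```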

Azure and Google Cloud both impose comparable data egress fees, though each provider structures their inter-region and internet-egress rates differently. Azure OpenAI customers processing large documents through the API and returning results to on-premises systems face inter-region and internet egress costs that are separate from the token billing shown on the AI service invoice. Google Vertex AI customers running batch prediction jobs that output results to external storage encounter similar charges. These data movement costs should be modelled alongside token consumption costs in any enterprise AI total cost of ownership analysis.

The practical implication for enterprise AI procurement is straightforward: do not evaluate AI provider costs on token pricing alone. Factor in the data infrastructure costs of the deployment architecture — egress from the cloud to on-premises systems, cross-region data movement for compliance or latency requirements, and storage costs for model artefacts, training datasets, and inference logs. For AWS Bedrock deployments in particular, data egress costs are billed under the same AWS account and may not appear in AI-specific cost reports without explicit tagging and cost allocation configuration.

Enterprise AI Licensing: Key Negotiation Principles

Enterprise procurement of AI services should be approached with the same commercial rigour as any significant software investment. The following principles apply across all four major provider ecosystems:

  • Always flag OpenAI lock-in provisions during contract review. Price change notice periods, volume commitment shortfall penalties, model version restrictions, and data portability terms require negotiation before signing.
  • Always compare Azure OpenAI vs direct OpenAI pricing models. For high-volume production workloads, Azure OpenAI Provisioned Throughput Units typically provide lower effective per-token costs than direct OpenAI consumption pricing. The compliance and governance advantages of Azure OpenAI are real and should factor into the decision for regulated enterprises.
  • Consumption billing creates budget unpredictability. Every AI agreement signed without consumption monitoring, budget alerts, and token cost allocation in place will generate unexpected overspend. Governance infrastructure must be implemented before production deployment, not after the first overrun.
  • Negotiate model substitution rights in all enterprise AI agreements. The ability to substitute Anthropic Claude for OpenAI GPT, or Google Gemini for both, without breaching the agreement or triggering penalties is essential for maintaining commercial leverage and technology flexibility over a multi-year relationship.
  • Benchmark all enterprise AI agreements. Pricing for enterprise AI is not published and varies significantly based on committed volume, relationship history, competitive pressure, and negotiation approach. Independent benchmarking of what comparable enterprises pay provides the negotiating anchor that standard procurement processes cannot generate.