Why Enterprise AI Licensing Demands Independent Analysis

The three largest enterprise AI platforms — Google Gemini via Vertex AI, OpenAI via direct API or Azure OpenAI, and Anthropic Claude — each present compelling capability narratives. But the licensing economics behind each platform differ in ways that have a material impact on total cost of ownership over a two- to three-year enterprise contract.

Consumption-based billing, the pricing model used by all three vendors, creates budget unpredictability that procurement teams accustomed to flat-rate software are not equipped to manage. A single heavy-load application can consume between $5,000 and $20,000 per month in API tokens alone — a figure that bears no relationship to the initial purchase order signed with the vendor. Understanding how consumption translates to cost, and negotiating protections against cost spikes before signing, is the foundation of responsible AI procurement.

The Three-Vendor Market Structure

Google Gemini is accessed either through the Gemini Developer API (consumer-facing) or through Vertex AI on Google Cloud Platform — the enterprise deployment path. OpenAI is accessed either directly via the OpenAI API (api.openai.com) or through Azure OpenAI Service, Microsoft's enterprise-grade hosting of OpenAI models on Azure infrastructure. Anthropic Claude is accessed via the Anthropic API directly, or through Amazon Bedrock or Google Vertex AI as managed deployments.

This structural complexity matters because the pricing, data governance, compliance posture, and contractual terms differ significantly depending on the access path chosen — even for the same underlying model.

Token Pricing: What the Published Rates Mean in Practice

All three vendors price API access on a per-token basis, where a token is roughly equivalent to 0.75 words. Input tokens (the prompt and context you send) are typically priced lower than output tokens (the response generated). The gap between input and output pricing ranges from roughly 4x to 8x across the models compared below, making output-heavy use cases — summarisation, generation, agentic tasks — significantly more expensive than input-heavy use cases like classification or extraction.
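To make the arithmetic concrete, here is a minimal cost sketch in Python. The rates and token counts are illustrative placeholders, not quotes from any vendor's price list:

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float) -> float:
    """Cost in USD of one API call; rates are USD per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Input-heavy classification: large prompt, tiny response.
print(call_cost(4_000, 50, input_rate=2.50, output_rate=10.00))     # ~$0.0105
# Output-heavy summarisation: same prompt size, long response.
print(call_cost(4_000, 2_000, input_rate=2.50, output_rate=10.00))  # ~$0.0300
```

At identical rates, the output-heavy call costs nearly three times the input-heavy one, which is why task mix matters as much as model choice.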

Gemini Pricing on Vertex AI

Google's flagship Gemini 2.5 Pro on Vertex AI is priced at $1.25 per million input tokens and $10.00 per million output tokens for prompts up to 200,000 tokens. Beyond 200,000 tokens, input rates double to $2.50 and output rises to $15.00 per million — an important threshold for applications using long-form documents or extended multi-turn conversations. Gemini 2.5 Flash, the mid-tier model, offers $0.30 per million input and $2.50 per million output, making it one of the most cost-competitive capable models on the market. Google also offers context caching on Vertex AI, reducing costs by up to 90 percent for applications with large, repeated system prompts.
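A short sketch of how the 200,000-token threshold changes per-request cost, assuming (as the published tiers suggest) that the long-context rates apply to the whole request once the prompt crosses the threshold; verify this behaviour against the current Vertex AI price list before budgeting:

```python
LONG_CONTEXT_THRESHOLD = 200_000

def gemini_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Per-request cost at the Gemini 2.5 Pro list rates quoted above."""
    if input_tokens <= LONG_CONTEXT_THRESHOLD:
        input_rate, output_rate = 1.25, 10.00   # USD per million tokens
    else:
        input_rate, output_rate = 2.50, 15.00   # long-context tier
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

print(gemini_pro_cost(150_000, 2_000))  # below threshold: ~$0.2075
print(gemini_pro_cost(250_000, 2_000))  # above threshold: ~$0.6550
```

Crossing the threshold roughly triples this example's per-request cost, so applications that accumulate conversation history should truncate or summarise context before it drifts past 200,000 tokens.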

OpenAI Pricing: Direct vs Azure

OpenAI's GPT-4o family is priced at approximately $2.50 per million input tokens and $10.00 per million output tokens at standard rates, with cached input pricing reducing costs by up to 50 percent for repeated context. The critical distinction for enterprise buyers is whether to access OpenAI models directly via api.openai.com or through Azure OpenAI Service. Per-token pricing is identical for the same models across both platforms. However, Azure OpenAI adds infrastructure costs — Private Endpoints, VNet integration, enhanced monitoring, and Azure support plans — that can increase total deployment costs by 10 to 50 percent above pure token costs. Organisations already running Microsoft Azure agreements will find Azure OpenAI consumption counts against their Microsoft Azure Consumption Commitment (MACC), which can be advantageous for discount qualification.
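A rough way to express that overhead band in a budget model, treating the 10 to 50 percent range above as an assumption rather than a measured figure:

```python
def azure_tco_range(monthly_token_cost: float,
                    low: float = 0.10, high: float = 0.50) -> tuple[float, float]:
    """Token cost plus the assumed Azure infrastructure overhead band."""
    return monthly_token_cost * (1 + low), monthly_token_cost * (1 + high)

low_est, high_est = azure_tco_range(10_000.00)
print(f"${low_est:,.0f} to ${high_est:,.0f} per month")  # $11,000 to $15,000
```

The spread between the two bounds is wide enough that the networking, monitoring, and support-plan choices should be priced explicitly, not absorbed into a contingency line.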

Anthropic Claude Pricing

Anthropic's Claude Opus 4 is the most expensive flagship model in this comparison at approximately $15.00 per million input tokens and $75.00 per million output tokens. Claude Sonnet 4 provides a more accessible mid-tier at $3.00 input and $15.00 output per million tokens. Anthropic also supports prompt caching, with cache read pricing at 10 percent of the base input rate — significant savings for applications that repeatedly reference large system prompts or document libraries. Claude is available on Amazon Bedrock and Google Vertex AI as alternative deployment paths, where pricing is broadly comparable to Anthropic direct but subject to the hosting platform's data governance and compliance posture.
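The caching economics are easy to model. The sketch below uses the Claude Sonnet 4 rates above and the 10 percent cache-read rate; cache write premiums and cache lifetime details are deliberately omitted and should be checked against Anthropic's current pricing:

```python
INPUT_RATE = 3.00        # USD per million input tokens (Claude Sonnet 4)
CACHE_READ_RATE = 0.30   # 10% of the base input rate

def monthly_prompt_cost(system_prompt_tokens: int, calls_per_month: int,
                        cached: bool) -> float:
    """Monthly cost of re-sending a large system prompt on every call."""
    rate = CACHE_READ_RATE if cached else INPUT_RATE
    return system_prompt_tokens * calls_per_month * rate / 1_000_000

print(monthly_prompt_cost(20_000, 100_000, cached=False))  # $6,000
print(monthly_prompt_cost(20_000, 100_000, cached=True))   # $600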

Need an independent GenAI licensing assessment?

We analyse consumption projections, contract lock-in, and negotiation leverage across all three platforms.
Request a Review →

OpenAI Enterprise Agreements: The Lock-In Provisions to Know

OpenAI enterprise agreements contain provisions that demand careful legal and procurement review before signature. Enterprise agreements typically involve multi-year commitments with minimum consumption floors that cannot be reduced without penalty — regardless of whether the organisation's usage scales to the projected level. The lock-in risk is compounded by the speed of model evolution: organisations that commit to GPT-4o pricing and capability today may find themselves contractually obligated to a model generation that has been superseded within 12 months.

OpenAI's relationship with Microsoft adds a further dimension of complexity. Microsoft holds a roughly 27 percent equity stake in OpenAI, and OpenAI has committed to purchasing $250 billion in Azure services. This structural relationship creates alignment between OpenAI enterprise sales and Microsoft Azure growth objectives — a dynamic that procurement teams negotiating OpenAI enterprise agreements should factor into their vendor management strategy. While OpenAI's commercial exclusivity with Microsoft has technically been restructured, the practical reality is that Microsoft's Azure infrastructure remains the primary compute backbone for OpenAI models, and this relationship shapes both pricing and commercial terms.

Azure OpenAI vs Direct OpenAI: The Right Choice Depends on Context

Azure OpenAI Service offers enterprise compliance advantages that direct OpenAI cannot match: data residency controls, private network access, integration with Microsoft Entra ID for identity management, and consumption that counts against a Microsoft Azure Consumption Commitment (MACC). For organisations with existing Microsoft EA or MCA agreements, Azure OpenAI may be the more commercially rational choice even if per-token costs appear identical.

Direct OpenAI access delivers new model releases sooner; Azure OpenAI typically receives them 30 to 90 days after the direct API. For organisations where model recency is a competitive differentiator, direct access may justify the loss of Azure compliance integration. The decision should be made explicitly, not by default — and the contractual implications of each path should be reviewed before commitment.

Consumption Billing: The Budget Predictability Problem

All three vendors operate consumption-based billing models. This creates a structural budget unpredictability problem that flat-rate SaaS procurement processes are not designed to handle. Unlike a Microsoft 365 seat licence where the monthly cost is fixed and predictable, an enterprise AI API deployment's monthly cost varies with usage — which in turn varies with application design, user behaviour, model selection, and prompt engineering quality.

In practice, organisations that deploy enterprise AI applications without pre-production consumption modelling routinely experience cost overruns of 30 to 60 percent in the first 12 months. The primary causes are underestimated context window usage, unexpectedly high output token counts in generative tasks, failure to implement prompt caching for repeated context, and model selection driven by capability rather than cost efficiency.

How to Model Consumption Before Signing

A rigorous pre-contract consumption model should estimate: average input tokens per API call (including system prompt, conversation history, and user input); average output tokens per API call; estimated API calls per user per day; and total user count. Multiply these figures across each use case and apply the relevant per-token pricing to produce a monthly cost estimate. Then apply a 40 to 60 percent buffer for model drift, use case expansion, and prompt engineering iteration. This modelled estimate, not vendor TCO projections, should be the basis for any enterprise commitment.
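A minimal version of that model in Python, with every workload number an illustrative assumption to be replaced by your own pre-production measurements:

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    input_tokens_per_call: int    # system prompt + history + user input
    output_tokens_per_call: int
    calls_per_user_per_day: float
    users: int

def monthly_cost(uc: UseCase, input_rate: float, output_rate: float,
                 working_days: int = 22) -> float:
    """Monthly cost estimate; rates are USD per million tokens."""
    calls = uc.calls_per_user_per_day * uc.users * working_days
    cost_per_call = (uc.input_tokens_per_call * input_rate
                     + uc.output_tokens_per_call * output_rate) / 1_000_000
    return calls * cost_per_call

# Hypothetical use case: 400 agents, 25 summarisation calls per day each.
support_bot = UseCase("support summarisation", 6_000, 800, 25, 400)
base = monthly_cost(support_bot, input_rate=2.50, output_rate=10.00)
buffered = base * 1.5   # 50% buffer for drift, expansion, and iteration
print(f"base ${base:,.0f}/month, buffered ${buffered:,.0f}/month")
# base $5,060/month, buffered $7,590/month
```

Run this per use case and sum the buffered figures; that total, not the vendor's TCO deck, is the number to take into commitment negotiations.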

Vendor-by-Vendor Contract Risk Analysis

Google Gemini on Vertex AI

Google's enterprise contracts for Vertex AI are governed by the Google Cloud Platform terms, which provide strong data governance guarantees, including commitments that customer data is not used to train Google models and data residency controls for regulated industries. Google's primary lock-in mechanism is infrastructure coupling: organisations that build production applications on Vertex AI tend to expand their Google Cloud footprint for storage, databases, and compute, creating a broader ecosystem dependency that increases switching costs over time. Negotiation leverage with Google increases significantly above $500,000 in annual GCP commitment, where custom pricing, SLA improvements, and Committed Use Discount structures become available.

OpenAI Direct

OpenAI enterprise agreements are still maturing relative to traditional enterprise software vendors. Contract terms including usage minimums, data processing addenda, and model version guarantees have historically been less standardised than Google or Microsoft equivalents. Enterprise buyers should insist on explicit provisions covering: model version continuity or migration support, data processing and training opt-out guarantees, consumption caps or budget alerts, and termination provisions in the event of significant price changes. OpenAI enterprise agreements have lock-in provisions that can be difficult to exit without financial penalty — always engage legal review before signing.

Anthropic Claude

Anthropic's enterprise agreements are contractually more straightforward than OpenAI's, but the company's smaller commercial scale means enterprise support, SLA guarantees, and procurement process maturity lag behind Google and Microsoft. Organisations accessing Claude via Amazon Bedrock or Google Vertex AI benefit from the hosting platform's enterprise contract framework, which typically provides stronger SLA protections and more familiar procurement processes than Anthropic direct. The trade-off is potential pricing differences and slightly delayed access to new model versions.

Side-by-Side Pricing Comparison

The following comparison uses published list rates for the mid-tier and flagship models — the workhorses of most enterprise AI deployments — to illustrate relative cost positioning. Note that enterprise-negotiated rates can differ significantly from published list prices for commitments above $500,000 annually.

  • Google Gemini 2.5 Flash: $0.30 per million input tokens, $2.50 per million output tokens. Best cost-performance ratio for high-volume, routine workloads.
  • OpenAI GPT-4o: $2.50 per million input tokens, $10.00 per million output tokens. Premium model with strong reasoning capabilities but significantly higher cost than Gemini Flash for equivalent workloads.
  • Anthropic Claude Sonnet 4: $3.00 per million input tokens, $15.00 per million output tokens. Strong for nuanced language tasks; higher output pricing relative to Google equivalents.
  • Google Gemini 2.5 Pro: $1.25 per million input tokens, $10.00 per million output tokens. Comparable capability to GPT-4o at lower input cost but similar output pricing.
  • Anthropic Claude Opus 4: $15.00 per million input tokens, $75.00 per million output tokens. Highest capability model in this comparison; justified only for tasks where accuracy materially impacts business outcomes.
"The cheapest model per token is rarely the cheapest model per outcome. Enterprise AI cost optimisation requires matching model capability to task requirements — not deploying the most powerful model across all workloads by default."

Five Negotiation Levers Across All Three Vendors

1. Minimum Commitment Negotiation: All three vendors offer pre-committed pricing tiers that reduce per-token rates in exchange for minimum monthly or annual consumption commitments. Push back on commitment levels that exceed your pre-production modelled usage — over-commitment generates shelfware and reduces your negotiation leverage at renewal.

2. Model Version Guarantees: Request explicit contractual protections against forced model migrations. If your application is tuned for GPT-4o or Gemini 2.5 Pro, a vendor-mandated migration to a successor model may require significant re-engineering and testing effort at your cost.

3. Price Change Protections: AI pricing has changed rapidly across all three platforms. Negotiate rate lock provisions — or at minimum, price change notification periods of 90 days or more — into enterprise agreements to protect budget predictability.

4. Data Governance Addenda: For regulated industries, ensure your contract explicitly excludes customer data from model training, specifies data residency requirements, and includes provisions for data deletion on contract termination. Do not rely on vendor privacy policies, which can change unilaterally.

5. Multi-Vendor Leverage: All three vendors are in active competition for enterprise AI spend. Running parallel evaluations and communicating this to each vendor's enterprise sales team generates pricing concessions that single-vendor negotiations do not. A credible alternative is the strongest negotiation lever available.

Stay Current on GenAI Licensing Developments

AI vendor pricing and contract terms evolve rapidly. Subscribe to the Redress Compliance newsletter for quarterly GenAI licensing updates and negotiation intelligence.