Both platforms quote a low per-token rate. The real enterprise bill is set by throughput commitments, data egress, and the cloud agreement the inference sits inside.
A token rate alone never predicts the enterprise bill. Throughput commitments, data egress, and the host cloud contract decide what generative AI actually costs.
Both platforms bill consumption by the token. The difference is the wrapper around that token rate.
Amazon Bedrock pricing is on-demand by default, charging separately for input and output tokens with no monthly platform fee. Azure OpenAI pricing mirrors that structure. Both also sell reserved capacity: Bedrock as Provisioned Throughput model units, Azure OpenAI as Provisioned Throughput Units (PTUs) that reserve capacity at a fixed hourly rate.
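As a rough illustration of the two billing shapes, the sketch below computes an on-demand monthly bill from separate input and output rates and sets it beside a fixed hourly reservation. The `TokenRates` type, the function names, and every rate are hypothetical placeholders, not published prices for either platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical on-demand rates in USD per 1,000 tokens (placeholders)."""
    input_per_1k: float
    output_per_1k: float


def on_demand_monthly_cost(rates: TokenRates, input_tokens: int, output_tokens: int) -> float:
    """Input and output tokens are billed separately; there is no platform fee."""
    return (input_tokens / 1000) * rates.input_per_1k + (output_tokens / 1000) * rates.output_per_1k


def reserved_monthly_cost(hourly_rate: float, units: int, hours_per_month: int = 730) -> float:
    """A reservation bills every hour for every unit, whatever the traffic."""
    return hourly_rate * units * hours_per_month


# Placeholder figures for illustration only.
example_rates = TokenRates(input_per_1k=0.003, output_per_1k=0.015)
on_demand = on_demand_monthly_cost(example_rates, 2_000_000_000, 400_000_000)
reserved = reserved_monthly_cost(hourly_rate=20.0, units=1)
print(f"on-demand: {on_demand:,.0f} USD, reserved: {reserved:,.0f} USD")
```

With these placeholder numbers the reservation costs more than the on-demand bill, which is the situation the rest of this comparison warns about.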
Indicative enterprise cost structure, 2026
| Dimension | Amazon Bedrock | Azure OpenAI |
|---|---|---|
| Base billing | Per input and output token | Per input and output token |
| Capacity reservation | Provisioned Throughput model units | Provisioned Throughput Units |
| Platform fee | None | None |
| Counts toward | AWS EDP commitment | Azure Consumption Commitment |
| Primary lock-in | Region and model availability | Reserved PTU term |
Model choice is the clearest split. Each platform leads with a different frontier family.
Bedrock is a model marketplace. It serves Anthropic Claude, Meta Llama, Mistral, Cohere, and Amazon Nova through one API, so you can route by task without leaving the platform. The Bedrock documentation sets out per-model token rates that vary widely.
Azure OpenAI is a single-family service. It serves the OpenAI GPT and o series models with enterprise controls, regional deployment, and content filtering described in the Azure OpenAI documentation. Depth on one family, not breadth across many.
The token rate is rarely the line item that breaks a budget. Two others do.
Retrieval-augmented generation pulls context from storage and vector databases on every call. That traffic, plus cross-region calls, lands as egress and request volume that no proof of concept measured.
Reserved capacity discounts the unit rate but bills the full reservation. Buy it before traffic is steady and you pay for idle units. We see these reservations sit underused for two quarters after purchase.
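The idle-unit risk can be put in numbers. The sketch below, under assumed figures, estimates the utilization a reservation needs before it beats on-demand pricing and the spend wasted below that point. The function names, the blended on-demand rate, and the capacity figure are all hypothetical.

```python
def break_even_utilization(reserved_monthly_cost: float,
                           capacity_tokens: int,
                           on_demand_rate_per_1k: float) -> float:
    """Fraction of reserved capacity that must be used before the reservation
    is cheaper than paying the on-demand rate for the same tokens."""
    if capacity_tokens <= 0 or on_demand_rate_per_1k <= 0:
        raise ValueError("capacity and on-demand rate must be positive")
    full_capacity_on_demand = (capacity_tokens / 1000) * on_demand_rate_per_1k
    return reserved_monthly_cost / full_capacity_on_demand


def idle_spend(reserved_monthly_cost: float, utilization: float) -> float:
    """Portion of the reservation bill that pays for unused capacity."""
    clamped = min(max(utilization, 0.0), 1.0)
    return reserved_monthly_cost * (1.0 - clamped)


# Placeholder figures: a 14,600 USD/month reservation able to serve
# 5 billion tokens, against a blended on-demand rate of 0.005 USD per 1k.
threshold = break_even_utilization(14_600.0, 5_000_000_000, 0.005)
wasted = idle_spend(14_600.0, utilization=0.40)
print(f"break-even utilization: {threshold:.1%}, idle spend at 40%: {wasted:,.0f} USD")
```

A value above 1.0 from `break_even_utilization` would mean the reservation cannot pay for itself at any traffic level under the assumed rates.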
Generative AI spend is large enough to move your whole cloud agreement. Negotiate it there.
Bedrock consumption counts toward an AWS Enterprise Discount Program commitment. Folding projected AI spend into the EDP can lift the discount tier across the entire account, including compute and storage. The AWS Savings Plans model shows how committed spend changes unit economics.
Azure OpenAI consumption draws down the Microsoft Azure Consumption Commitment. A growing AI workload can justify a larger MACC at better terms, but it also raises the floor you must consume. Size the commitment to the workload you can prove, not the roadmap you hope for.
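To make the "floor you must consume" concrete, here is a minimal sketch of how a commitment interacts with projected spend. It assumes, purely for illustration, that the discount applies to all consumption and that any unmet commitment is still billed. Real EDP and MACC terms vary by contract, and the function name and figures are hypothetical.

```python
def commitment_outcome(annual_commitment: float,
                       projected_consumption: float,
                       discount_rate: float) -> dict:
    """Return the amount billed and any shortfall under a spend commitment.

    Assumptions (not contract terms): the discount applies to all list-price
    consumption, and the buyer owes the full commitment even if it is not consumed.
    """
    if not 0.0 <= discount_rate < 1.0:
        raise ValueError("discount_rate must be in [0, 1)")
    discounted_spend = projected_consumption * (1.0 - discount_rate)
    shortfall = max(0.0, annual_commitment - discounted_spend)
    return {
        "discounted_spend": discounted_spend,
        "shortfall": shortfall,
        "billed": discounted_spend + shortfall,
    }


# Placeholder figures: the roadmap case clears the floor, the proven case does not.
roadmap = commitment_outcome(1_000_000.0, 1_200_000.0, 0.08)
proven = commitment_outcome(1_000_000.0, 900_000.0, 0.08)
print(roadmap["billed"], proven["billed"], proven["shortfall"])
```

In the second case the buyer pays the full commitment while consuming less, which is why the commitment should track the workload that can be demonstrated.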
The standard advice is to pick the platform with the lower published token rate and move on. We disagree. In just over half of the cloud AI cost reviews we ran, the platform with the lower advertised rate produced the higher annual bill once provisioned throughput, data egress, and the host cloud commitment were counted. The token rate is a rounding error against an underused capacity reservation or an egress pattern nobody modeled. The buyer-side move is to model six months of real traffic, including retrieval and cross-region calls, then negotiate the AI spend inside your EDP or MACC. Pick the contract you can shape, not the rate card you can read.
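A six-month traffic model of that kind can start as small as the sketch below. It totals token charges, egress, and a fixed reservation for each platform, applies a negotiated commitment discount, and annualizes the result. The types, field names, and all rates are hypothetical. The token counts are treated as traffic billed on demand alongside any reservation, which is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthTraffic:
    input_tokens: int
    output_tokens: int
    egress_gb: float  # retrieval and cross-region data transfer


@dataclass(frozen=True)
class PlatformTerms:
    name: str
    input_per_1k: float         # USD per 1,000 input tokens (placeholder)
    output_per_1k: float        # USD per 1,000 output tokens (placeholder)
    egress_per_gb: float        # USD per GB transferred (placeholder)
    reserved_monthly: float     # fixed reservation cost, billed regardless of use
    commitment_discount: float  # fraction negotiated inside the cloud agreement


def monthly_bill(terms: PlatformTerms, month: MonthTraffic) -> float:
    tokens = ((month.input_tokens / 1000) * terms.input_per_1k
              + (month.output_tokens / 1000) * terms.output_per_1k)
    egress = month.egress_gb * terms.egress_per_gb
    return (tokens + egress + terms.reserved_monthly) * (1.0 - terms.commitment_discount)


def annualized_bill(terms: PlatformTerms, months: list) -> float:
    """Average the observed months and scale to twelve."""
    if not months:
        raise ValueError("need at least one month of traffic")
    return sum(monthly_bill(terms, m) for m in months) / len(months) * 12


# Placeholder scenario: six identical months of measured traffic.
traffic = [MonthTraffic(1_000_000_000, 200_000_000, 5_000.0)] * 6
low_rate = PlatformTerms("lower rate card", 0.0025, 0.010, 0.09, 10_000.0, 0.0)
shaped = PlatformTerms("negotiated contract", 0.0030, 0.012, 0.02, 0.0, 0.10)
for terms in (low_rate, shaped):
    print(f"{terms.name}: {annualized_bill(terms, traffic):,.0f} USD per year")
```

In this invented scenario the platform with the lower token rate produces the larger annual bill because of its idle reservation and egress charges. The numbers are illustrative, not a finding about either vendor.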
Source: Redress Compliance advisory engagement file, 2024 to 2025.
The cheaper platform is the contract you negotiate hardest, not the rate card with the lower number.
Not by default. Bedrock and Azure OpenAI publish similar per-token rates, so the cheaper platform depends on your traffic pattern, reserved capacity use, egress, and the cloud commitment the spend counts toward.
No. Amazon Bedrock bills only for the tokens you process plus any provisioned throughput or fine-tuning you choose, with no separate monthly platform fee.
Provisioned throughput units are reserved inference capacity sold for a fixed term. The unit rate falls, but you pay for the full reservation whether you use it or not, so size them to proven steady state.
Yes. Bedrock consumption counts toward an AWS EDP, and Azure OpenAI consumption draws down a Microsoft Azure Consumption Commitment, so AI workloads can move your overall discount tier.
Bedrock offers more model families, including Anthropic Claude, Meta Llama, Mistral, Cohere, and Amazon Nova. Azure OpenAI offers depth on the OpenAI GPT and o series with strong enterprise controls.
Data egress, retrieval traffic, and idle reserved capacity. These line items, not the headline token rate, are the most common reason a generative AI bill exceeds the proof of concept estimate.
Yes, but prompts, fine-tuned models, and integration code are platform-specific. Plan for rework and negotiate an exit ramp so a migration is a lever rather than a threat you cannot execute.
No. Hold off on buying reserved capacity until traffic has been stable for at least two consecutive months. Early commitments are the leading cause of paying for inference you never use.
We have never seen a generative AI bill decided by the token rate. It is decided by the commitment around it.