GenAI Cost Comparison

Amazon Bedrock vs Azure OpenAI: the enterprise cost truth

Both platforms quote a low per token rate. The real enterprise bill is set by throughput commitments, data egress, and the cloud agreement the inference sits inside.

500+ Enterprise Clients
$2B+ Under Advisory
11 Vendor Practices
100% Buyer Side Independent

A token rate alone never predicts the enterprise bill. Throughput commitments, data egress, and the host cloud contract decide what generative AI actually costs.

Key takeaways

  • Amazon Bedrock charges per input and output token with no separate platform fee, billed inside your existing AWS account.
  • Azure OpenAI publishes comparable per token rates but pushes large users toward Provisioned Throughput Units that commit capacity in advance.
  • Data egress and cross region traffic, not the model rate, drive the surprise line items on both platforms.
  • Bedrock spend counts toward an AWS EDP commitment, so generative AI can pull forward a better discount tier.
  • Azure OpenAI consumption counts toward a Microsoft Azure Consumption Commitment, which changes the renewal math.
  • The cheaper platform is the one whose enterprise agreement you negotiate harder, not the one with the lower list token rate.

How do Amazon Bedrock and Azure OpenAI price enterprise inference?

Both platforms bill consumption by the token. The difference is the wrapper around that token rate.

Amazon Bedrock pricing is on demand by default, charging separately for input and output tokens with no monthly platform fee. Azure OpenAI pricing mirrors that structure but offers Provisioned Throughput Units that reserve capacity for a fixed hourly rate.

  • On demand: pay per token, ideal for spiky or early stage workloads on either platform.
  • Committed throughput: reserved capacity that lowers the unit rate but bills whether you use it or not.
  • Batch: both platforms discount asynchronous batch inference for jobs that tolerate latency.
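The trade between the first two modes can be sketched as a break even check. This is a minimal illustration, not either vendor's billing logic: the `Rates` class, every rate in it, and the assumption that the reservation is large enough to serve all traffic are hypothetical placeholders to be replaced with your own quoted figures.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average month length in hours


@dataclass
class Rates:
    """Hypothetical rates; none of these figures come from a real rate card."""
    input_per_1k: float    # on demand price per 1,000 input tokens
    output_per_1k: float   # on demand price per 1,000 output tokens
    unit_hourly: float     # price of one reserved throughput unit per hour
    units_reserved: int    # reserved units, assumed able to serve all traffic


def on_demand_monthly(rates: Rates, input_tokens: float, output_tokens: float) -> float:
    """On demand cost scales with the tokens actually processed."""
    return ((input_tokens / 1000) * rates.input_per_1k
            + (output_tokens / 1000) * rates.output_per_1k)


def reserved_monthly(rates: Rates) -> float:
    """Reserved cost is fixed: every reserved hour bills, used or not."""
    return rates.unit_hourly * rates.units_reserved * HOURS_PER_MONTH


def cheaper_mode(rates: Rates, input_tokens: float, output_tokens: float) -> str:
    """Return which mode costs less for one month at the given volume."""
    if on_demand_monthly(rates, input_tokens, output_tokens) <= reserved_monthly(rates):
        return "on_demand"
    return "reserved"


if __name__ == "__main__":
    rates = Rates(input_per_1k=0.003, output_per_1k=0.015,
                  unit_hourly=2.0, units_reserved=1)
    for input_tokens, output_tokens in [(100e6, 20e6), (500e6, 100e6)]:
        print(input_tokens, output_tokens,
              cheaper_mode(rates, input_tokens, output_tokens))
```

The reserved figure never moves with volume, which is why the comparison flips only once monthly traffic is both high and dependable.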

Indicative enterprise cost structure, 2026

Dimension | Amazon Bedrock | Azure OpenAI
Base billing | Per input and output token | Per input and output token
Capacity reservation | Provisioned Throughput model units | Provisioned Throughput Units
Platform fee | None | None
Counts toward | AWS EDP commitment | Azure Consumption Commitment
Primary lock in | Region and model availability | Reserved PTU term

Which models and capabilities differ between the two platforms?

Model choice is the clearest split. Each platform leads with a different frontier family.

How does Bedrock structure model access?

Bedrock is a model marketplace. It serves Anthropic Claude, Meta Llama, Mistral, Cohere, and Amazon Nova through one API, so you can route by task without leaving the platform. The Bedrock documentation sets out per model token rates that vary widely.

How does Azure OpenAI structure deployments?

Azure OpenAI is single family. It serves the OpenAI GPT and o series models with enterprise controls, regional deployment, and content filtering described in the Azure OpenAI documentation. Depth on one family, not breadth across many.

What hidden costs change the real bill?

The token rate is rarely the line item that breaks a budget. Two others do.

What does data egress add?

Retrieval augmented generation pulls context from storage and vector databases on every call. That traffic, plus cross region calls, lands as egress and request volume that no proof of concept measured.
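A rough way to put a number on that traffic is to multiply call volume by the context pulled per call and by the share of calls that cross a billable boundary. The sketch below assumes a single flat transfer rate per gigabyte; the function name, its parameters, and the sample figures are all hypothetical, and real transfer pricing is usually tiered and differs by route.

```python
def monthly_transfer_cost(calls_per_month: int,
                          context_kb_per_call: float,
                          billable_share: float,
                          rate_per_gb: float) -> float:
    """Estimate monthly transfer charges for retrieval traffic.

    billable_share is the fraction of calls whose context crosses a
    region or network boundary that is charged (an assumption you must
    measure; it is not published by either platform).
    """
    if not 0.0 <= billable_share <= 1.0:
        raise ValueError("billable_share must be between 0 and 1")
    total_gb = calls_per_month * context_kb_per_call / (1024 * 1024)
    return total_gb * billable_share * rate_per_gb


if __name__ == "__main__":
    # Placeholder workload: 30 million calls, 200 KB of context each,
    # 40 percent crossing a billable boundary, 0.02 per GB.
    print(monthly_transfer_cost(30_000_000, 200, 0.40, 0.02))
```

Running this against measured call logs rather than proof of concept estimates is the point: the inputs that matter are the ones a pilot rarely records.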

What do throughput commitments lock in?

Reserved capacity discounts the unit rate but bills the full reservation. Buy it before traffic is steady and you pay for idle units. We regularly see reservations bought this early sit underused for two quarters after purchase.

  1. Egress: model the storage and retrieval traffic, not just the inference call.
  2. Idle reservation: size committed throughput to proven steady state, never to peak.
  3. Fine tuning: custom model hosting carries its own hourly charge on both platforms.
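The second point can be tested against observed traffic. The sketch below tries each whole unit reservation from zero up to peak and prices the overflow at an on demand equivalent. It assumes demand can be expressed in throughput units per hour and that overflow can spill to on demand; both are simplifying assumptions, and all names and rates are hypothetical.

```python
import math


def blended_cost(hourly_demand, reserved_units, unit_hourly_rate, overflow_rate):
    """Reservation cost for every observed hour plus on demand overflow."""
    reserved = reserved_units * unit_hourly_rate * len(hourly_demand)
    overflow = sum(max(0.0, d - reserved_units) for d in hourly_demand)
    return reserved + overflow * overflow_rate


def idle_share(hourly_demand, reserved_units):
    """Fraction of reserved unit hours that went unused."""
    if reserved_units == 0 or not hourly_demand:
        return 0.0
    used = sum(min(d, reserved_units) for d in hourly_demand)
    return 1.0 - used / (reserved_units * len(hourly_demand))


def cheapest_reservation(hourly_demand, unit_hourly_rate, overflow_rate):
    """Whole unit reservation, between zero and peak, with the lowest blended cost."""
    candidates = range(0, math.ceil(max(hourly_demand)) + 1)
    return min(
        candidates,
        key=lambda units: blended_cost(hourly_demand, units,
                                       unit_hourly_rate, overflow_rate),
    )


if __name__ == "__main__":
    # Toy series: steady at 2 units with a single spike to 10.
    demand = [2, 2, 2, 2, 2, 2, 2, 10]
    units = cheapest_reservation(demand, unit_hourly_rate=1.0, overflow_rate=1.5)
    print("reserve", units, "units; idle share at peak sizing:",
          idle_share(demand, 10))
```

In this toy series the cheapest reservation sits at the steady level, while sizing to the spike leaves most reserved hours unused. Real traffic needs far more than eight hours of data before the result means anything.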

How should an enterprise negotiate either commitment?

Generative AI spend is large enough to move your whole cloud agreement. Negotiate it there.

What levers move an AWS EDP?

Bedrock consumption counts toward an AWS Enterprise Discount Program commitment. Folding projected AI spend into the EDP can lift the discount tier across the entire account, including compute and storage. The AWS Savings Plans model shows how committed spend changes unit economics.
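The tier effect is simple arithmetic once you have a discount schedule in hand. The sketch below uses an invented schedule purely for illustration: EDP terms are negotiated privately, so `HYPOTHETICAL_TIERS` and every figure in it are assumptions, not AWS pricing.

```python
# Invented schedule: (minimum annual commitment, discount rate).
HYPOTHETICAL_TIERS = [
    (0, 0.00),
    (1_000_000, 0.05),
    (5_000_000, 0.08),
    (10_000_000, 0.11),
]


def discount_for(annual_commitment, tiers=HYPOTHETICAL_TIERS):
    """Highest discount whose commitment floor has been reached."""
    rate = 0.0
    for floor, discount in sorted(tiers):
        if annual_commitment >= floor:
            rate = discount
    return rate


def extra_saving_on_existing_spend(base_spend, ai_spend, tiers=HYPOTHETICAL_TIERS):
    """Additional discount earned on non AI spend when AI spend lifts the tier."""
    before = discount_for(base_spend, tiers)
    after = discount_for(base_spend + ai_spend, tiers)
    return base_spend * (after - before)


if __name__ == "__main__":
    # 4.5M of existing spend plus 0.8M of projected AI spend crosses the 5M floor.
    print(extra_saving_on_existing_spend(4_500_000, 800_000))
```

The saving lands on compute and storage that would have been bought anyway, which is why AI spend projected just below a tier boundary deserves attention in the negotiation.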

What levers move an Azure MACC?

Azure OpenAI consumption draws down the Microsoft Azure Consumption Commitment. A growing AI workload can justify a larger MACC at better terms, but it also raises the floor you must consume. Size the commitment to the workload you can prove, not the roadmap you hope for.
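The floor risk can be sketched by pricing one commitment against two consumption scenarios: the workload you can prove and the roadmap you hope for. The model below assumes consumption draws down the commitment at the discounted rate and that any unconsumed balance is still owed at term end. Actual drawdown and shortfall terms are contract specific, and all names and figures are hypothetical.

```python
def amount_paid(commitment, discount, list_consumption):
    """Total paid over the term: at least the commitment, more if usage exceeds it."""
    discounted = list_consumption * (1.0 - discount)
    return max(commitment, discounted)


def shortfall(commitment, discount, list_consumption):
    """Commitment left unconsumed at term end (zero if fully drawn down)."""
    discounted = list_consumption * (1.0 - discount)
    return max(0.0, commitment - discounted)


if __name__ == "__main__":
    scenarios = {"proven workload": 2_000_000, "hoped for roadmap": 4_000_000}
    for name, consumption in scenarios.items():
        print(name,
              "paid:", amount_paid(3_000_000, 0.10, consumption),
              "shortfall:", shortfall(3_000_000, 0.10, consumption))
```

If only the proven workload arrives, the better discount is paid for with commitment that is never consumed.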

Where the common advice on choosing a generative AI platform by token price is wrong

The standard advice is to pick the platform with the lower published token rate and move on. We disagree. In just over half of the cloud AI cost reviews we ran, the platform with the lower advertised rate produced the higher annual bill once provisioned throughput, data egress, and the host cloud commitment were counted. The token rate is a rounding error against an underused capacity reservation or an egress pattern nobody modeled. The buyer side move is to model six months of real traffic, including retrieval and cross region calls, then negotiate the AI spend inside your EDP or MACC. Pick the contract you can shape, not the rate card you can read.

Provisioned throughput reserves capacity, so the full charge applies even in the hours the reservation sits idle.
  • 52%: Reviews where cheaper rate cost more
  • 18%: Median egress share of AI bill
  • 11%: Average AI workloads with idle reserve

Source: Redress Compliance advisory engagement file, 2024 to 2025.

The cheaper platform is the contract you negotiate hardest, not the rate card with the lower number.

What should a buyer do next?

  1. Pull six months of token, egress, and retrieval volume from your current proof of concept before committing.
  2. Map each workload to the model family that fits, not to the platform your cloud rep prefers.
  3. Model on demand against provisioned throughput at your proven steady state, never at peak.
  4. Quantify how Bedrock spend lifts your AWS EDP tier or how Azure OpenAI draws down your MACC.
  5. Hold reserved capacity purchases until traffic is stable for two consecutive months.
  6. Put a benchmarking clause and an exit ramp in the commitment before you sign.
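Step 5 needs a working definition of "stable". One possible reading, offered only as an assumption, is that each of the last two month over month changes in volume stays within a tolerance band. The function name, the 10 percent default, and the sample series below are all hypothetical.

```python
def is_stable(monthly_volumes, tolerance=0.10, months_required=2):
    """True if each of the last `months_required` month over month
    changes is within `tolerance` of the prior month's volume."""
    if len(monthly_volumes) < months_required + 1:
        return False  # not enough history to judge
    recent = monthly_volumes[-(months_required + 1):]
    for previous, current in zip(recent, recent[1:]):
        if previous <= 0:
            return False
        if abs(current - previous) / previous > tolerance:
            return False
    return True


if __name__ == "__main__":
    # Token volume in millions per month (made up figures).
    ramping = [100, 180, 200, 205, 260]
    settled = [100, 180, 200, 205, 210]
    print("ramping:", is_stable(ramping), "settled:", is_stable(settled))
```

A stricter team might demand a tighter tolerance or a longer window. What matters is that the rule is written down before the vendor's reservation offer arrives.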

Frequently asked questions

Is Amazon Bedrock cheaper than Azure OpenAI?

Not by default. Bedrock and Azure OpenAI publish similar per token rates, so the cheaper platform depends on your traffic pattern, reserved capacity use, egress, and the cloud commitment the spend counts toward.

Does Bedrock charge a platform fee?

No. Amazon Bedrock bills only for the tokens you process plus any provisioned throughput or fine tuning you choose, with no separate monthly platform fee.

What are Provisioned Throughput Units?

They are reserved inference capacity sold for a fixed term. The unit rate falls, but you pay for the full reservation whether you use it or not, so size them to proven steady state.

Does generative AI spend count toward my cloud commitment?

Yes. Bedrock consumption counts toward an AWS EDP, and Azure OpenAI consumption draws down a Microsoft Azure Consumption Commitment, so AI workloads can move your overall discount tier.

Which platform offers more model choice?

Bedrock offers more model families, including Anthropic Claude, Meta Llama, Mistral, Cohere, and Amazon Nova. Azure OpenAI offers depth on the OpenAI GPT and o series with strong enterprise controls.

What drives surprise costs on both platforms?

Data egress, retrieval traffic, and idle reserved capacity. These line items, not the headline token rate, are the most common reason a generative AI bill exceeds the proof of concept estimate.

Can I move workloads between the two platforms?

Yes, but prompts, fine tuned models, and integration code are platform specific. Plan for rework and negotiate an exit ramp so a migration is a lever rather than a threat you cannot execute.

Should I sign reserved capacity at launch?

No. Hold reserved capacity until traffic is stable for at least two consecutive months. Early commitments are the leading cause of paying for inference you never use.


We have never seen a generative AI bill decided by the token rate. It is decided by the commitment around it.

Fredrik Filipsson
Co Founder and Group CEO, Redress Compliance