Why AWS Bedrock Pricing Surprises Enterprise Buyers
AWS Bedrock launched as a managed platform for accessing foundation models — Anthropic's Claude, Meta's Llama, Amazon's own Titan series, Mistral, Cohere, and others — through a unified serverless API. The headline value proposition is straightforward: pay per token, no upfront commitment, swap models as your needs evolve. In practice, enterprise deployments discover a significantly more complicated cost picture within the first quarter of production usage.
The core pricing mechanism is simple enough. Bedrock charges for input tokens (the text or data you send to the model) and output tokens (the text the model returns). Output tokens typically cost several times more than input tokens (a 5x multiplier for the Claude family), though the ratio varies by model; Llama, for instance, prices input and output tokens identically. What erodes budgets is everything around that core rate: the wide variance between model families, the additional per-character charges for document processing, egress costs for moving results back to applications, and the sharp increase in cost when workloads shift from experimental to production-scale.
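The input/output asymmetry is easy to underestimate in a spreadsheet. A minimal estimator makes it concrete; the rates and volumes below are illustrative placeholders, not published prices, so substitute your own region's figures.

```python
def monthly_token_cost(input_tokens_m: float, output_tokens_m: float,
                       input_rate: float, output_rate: float) -> float:
    """Estimate monthly spend from token volumes (in millions)
    and per-million-token rates (USD)."""
    return input_tokens_m * input_rate + output_tokens_m * output_rate

# Illustrative example: 500M input / 100M output tokens per month
# at hypothetical rates of $3 in / $15 out per million tokens.
cost = monthly_token_cost(500, 100, 3.00, 15.00)
print(f"${cost:,.2f}")  # output tokens match input spend despite 5x less volume
```

Note how a 5x output rate means the smaller output volume can contribute as much spend as the much larger input volume, which is why verbose prompts that trigger verbose completions hit budgets twice.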
Enterprises that enter 2026 without a deliberate Bedrock procurement strategy routinely overpay by 30 to 50 percent compared to peers who have structured their consumption around the right tier, the right model for each use case, and a negotiated Enterprise Discount Program (EDP) that covers GenAI workloads explicitly.
The Three Pricing Tiers: On-Demand, Batch, and Provisioned Throughput
AWS Bedrock organises all billing into three tiers, each designed for a different consumption pattern. Understanding the commercial logic of each tier is the first step toward a defensible procurement position.
On-Demand (Standard Tier)
On-demand is the default. You call the model API and pay the published per-token rate. There is no commitment, no minimum spend, and no negotiation required. For proof-of-concept work, internal tooling with variable usage, and applications where inference volume is unpredictable, on-demand is commercially sensible.
The limitation emerges at scale. On-demand pricing carries AWS's full commercial margin and is subject to service throttling during peak periods. At production volumes — hundreds of millions of tokens per day — on-demand rates become difficult to forecast and difficult to defend in budget reviews. AWS also recently introduced a Priority Tier that carries a 75 percent premium over Standard tier rates but guarantees queue priority and reduced latency during high-demand periods. For real-time customer-facing applications, that premium may be justifiable; for batch analytics, it almost never is.
Batch Processing (50% Discount)
Batch mode processes inference requests asynchronously, typically completing within 24 hours. In exchange for that flexibility, AWS prices batch at 50 percent below on-demand standard rates. For document processing pipelines, data enrichment workflows, offline summarisation, and model evaluation tasks, batch mode is almost always the right choice. The economics are compelling: a workload that costs $10,000 per month on-demand costs $5,000 in batch, with no quality difference in the output and no change to the underlying model.
The adoption barrier is architectural. Batch mode requires your application to be designed for asynchronous result handling, whereas many teams initially build their pipelines around synchronous API calls. Refactoring existing pipelines to use batch mode is typically a one-to-three-week engineering project that pays back within the first month of operation.
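The payback claim can be sanity-checked with simple arithmetic. The engineering cost figure below is a placeholder assumption; only the 50 percent batch discount comes from the published tier structure.

```python
def batch_payback_weeks(monthly_on_demand: float,
                        refactor_weeks: float,
                        weekly_eng_cost: float,
                        batch_discount: float = 0.5) -> float:
    """Weeks until a batch-mode refactor pays for itself.

    monthly_on_demand: current on-demand spend (USD/month)
    refactor_weeks:    engineering effort to move to async handling
    weekly_eng_cost:   assumed loaded cost of that effort (USD/week)
    batch_discount:    Bedrock batch discount versus on-demand
    """
    weekly_savings = monthly_on_demand * batch_discount / 4.33  # avg weeks/month
    refactor_cost = refactor_weeks * weekly_eng_cost
    return refactor_cost / weekly_savings

# A $10,000/month workload, two weeks of engineering at an assumed $4,000/week:
weeks = batch_payback_weeks(10_000, 2, 4_000)
print(f"{weeks:.1f} weeks")  # under two months to break even
```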
Provisioned Throughput (Reserved Capacity)
Provisioned Throughput is Bedrock's committed capacity model. You purchase Model Units — blocks of reserved inference capacity — measured in tokens per minute, for either a one-month or six-month term. Provisioned Throughput eliminates throttling, guarantees consistent latency, and is the appropriate model for production applications where query-per-second requirements are defined and consistent.
The cost of Provisioned Throughput varies significantly by model family. Meta Llama provisioned throughput is priced at approximately $21 per hour per model unit on a one-month commitment. Anthropic Claude models are significantly more expensive, in the range of $50 per hour per model unit. The hourly charge applies whether or not you are actively sending requests, making right-sizing the critical commercial variable. An over-provisioned Bedrock deployment with idle model units burning $50 per hour at the weekend is a significant source of unbudgeted spend that AWS has no incentive to flag proactively.
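Idle capacity cost is worth quantifying before signing a commitment. The sketch below assumes a usage pattern confined to business hours; the $50/hour Claude figure is the approximate rate cited above.

```python
def idle_provisioned_cost(hourly_rate: float, model_units: int,
                          active_hours_per_week: float) -> float:
    """Monthly cost of idle Provisioned Throughput capacity.

    The hourly charge accrues around the clock regardless of traffic,
    so every hour without requests is pure idle spend."""
    hours_per_month = 730  # average hours in a month
    active_hours = active_hours_per_week * 52 / 12
    idle_hours = max(hours_per_month - active_hours, 0)
    return idle_hours * hourly_rate * model_units

# Two hypothetical $50/hour Claude model units that only see traffic
# during a 50-hour business week:
print(f"${idle_provisioned_cost(50, 2, 50):,.0f}/month idle")
```

The result (over $50,000 a month of idle charge in this scenario) is why right-sizing and weekend traffic patterns belong in the commitment decision, not just peak throughput.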
Model-by-Model Pricing: The Cost Variance That Matters
AWS Bedrock hosts a catalogue of foundation models, and the price variation between them is substantial. Choosing the right model for each workload — rather than defaulting to the most capable model for every task — is consistently the highest-return cost optimisation available to enterprise buyers.
Anthropic Claude (Premium Tier)
Claude models occupy the premium end of the Bedrock catalogue. Claude Sonnet 4.5 on-demand is priced at $3.00 per million input tokens and $15.00 per million output tokens in the us-east-1 region — a 5x output multiplier that is consistent across all Claude variants. Claude Opus, the highest-capability model, runs approximately $15 per million input tokens and $75 per million output tokens. For complex reasoning, legal document analysis, code generation, and tasks requiring nuanced understanding of unstructured text, the Claude premium is frequently justified. For classification tasks, extraction from structured documents, or routing queries in an agentic pipeline, it almost never is.
Meta Llama (Volume-Efficient Tier)
Llama 3.1 70B is one of the most cost-efficient models for production volume workloads on Bedrock. At approximately $0.99 per million input tokens and $0.99 per million output tokens, Llama is roughly three to fifteen times cheaper than equivalent Claude tiers depending on the task. Enterprises running internal knowledge base queries, customer intent classification, translation pipelines, or moderate-complexity summarisation have found Llama models match Claude performance at a fraction of the cost once their prompts are well-engineered.
Amazon Titan (Native AWS Models)
Amazon's own Titan models are the most economical option in the Bedrock catalogue. Titan Text Express runs at $0.0002 per thousand input tokens (equivalent to $0.20 per million), among the lowest rates on the platform. Titan is appropriate for simpler generation tasks, embedding generation for retrieval-augmented generation (RAG) pipelines, and text-to-embedding workflows where model sophistication is less important than throughput economics. Because Titan is AWS-native, it also integrates cleanly with other AWS services without cross-platform data transfer charges.
Mistral and Cohere (Mid-Tier Alternatives)
Mistral and Cohere models occupy the middle ground of the Bedrock catalogue — more capable than Titan for complex tasks, meaningfully cheaper than Claude. They are worth evaluating for European language workloads (Mistral has particularly strong French and German performance) and for enterprise search and retrieval use cases where Cohere's specialisation in embedding quality is commercially significant.
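The catalogue-wide variance is easiest to see as cost per task. The Claude and Llama rates below are the figures cited above; the Titan output rate is a placeholder assumption (AWS publishes exact per-region figures), and the token counts are illustrative.

```python
# On-demand rates in USD per million tokens (input, output).
RATES = {
    "claude-sonnet": (3.00, 15.00),
    "llama-3.1-70b": (0.99, 0.99),
    "titan-express": (0.20, 0.60),  # output rate assumed for illustration
}

def cost_per_1k_tasks(model: str, avg_in_tokens: int, avg_out_tokens: int) -> float:
    """Cost of 1,000 inference calls at the given average token counts."""
    in_rate, out_rate = RATES[model]
    per_call = (avg_in_tokens * in_rate + avg_out_tokens * out_rate) / 1_000_000
    return per_call * 1_000

# A classification task: ~800 input tokens, ~50 output tokens per call.
for model in RATES:
    print(f"{model}: ${cost_per_1k_tasks(model, 800, 50):.2f} per 1k calls")
```

For a short-output classification workload like this, the gap between the premium and economy ends of the catalogue is roughly an order of magnitude, which is the arithmetic behind model governance.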
The Hidden Cost: Data Egress
Data egress is the most common surprise cost in AWS Bedrock deployments — and the one least likely to appear in early budget models. Every token of output that your Bedrock model generates is delivered back to your application over AWS infrastructure. When that application runs in the same AWS region and the same VPC, the egress cost is minimal. As deployments grow more complex — multi-region architectures, hybrid on-premises environments, cross-cloud integrations, or large-scale data export to external systems — egress charges accumulate rapidly and outside the per-token pricing framework that procurement teams typically monitor.
AWS charges between $0.05 and $0.09 per GB for data transferred out to the public internet, depending on the region and volume. For a production LLM application generating large output documents — legal briefs, medical summaries, detailed analysis reports — this adds a meaningful and often unbudgeted cost layer on top of token pricing. The first 100 GB per month is free, which masks the problem during initial pilots but provides no material offset once workloads scale.
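The free-tier masking effect described above can be modelled directly. The sketch uses the top of the $0.05 to $0.09 per-GB range as a default; substitute your region's actual rate.

```python
def monthly_egress_cost(gb_out: float, rate_per_gb: float = 0.09,
                        free_tier_gb: float = 100.0) -> float:
    """Estimate internet egress charges. The first 100 GB per month
    is free; rate_per_gb varies by region and volume ($0.05-$0.09)."""
    billable_gb = max(gb_out - free_tier_gb, 0.0)
    return billable_gb * rate_per_gb

# A pilot moving 80 GB/month pays nothing, so the line item never
# appears in early bills. The same architecture at 5 TB/month bills
# for 4,900 GB.
print(monthly_egress_cost(80))
print(f"${monthly_egress_cost(5_000):,.2f}")
```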
The architectural remediation is straightforward: keep inference results inside the same AWS VPC and region as the consuming application wherever possible, use Amazon CloudFront for distributing model outputs to end users rather than direct S3 egress, and implement VPC Gateway Endpoints for S3 and DynamoDB to eliminate NAT Gateway processing charges on internal traffic. These are not complex configurations, but they require deliberate attention during architecture review — and they will rarely be raised by your AWS account team.
Reserved Instances vs Savings Plans: The Bedrock Interaction
AWS Bedrock Provisioned Throughput is a separate commitment mechanism from EC2 Reserved Instances and Compute Savings Plans, but understanding how these interact within an overall AWS cost strategy is essential for enterprise buyers optimising GenAI spend alongside existing cloud infrastructure.
Reserved Instances (RIs) lock you into a specific EC2 instance type in a specific region for one or three years. They offer the highest possible discount — up to 75 percent against on-demand rates for standard RIs — but zero flexibility if your workload changes. If the AI inference layer you build on Bedrock also drives changes in your EC2 footprint (which is typical for organisations building agentic AI pipelines), over-committed RIs can become expensive anchors within 12 months.
Compute Savings Plans offer a better risk profile for most 2026 enterprise deployments. Rather than committing to a specific instance type, you commit to a dollar amount of compute spend per hour — and that commitment applies across EC2, Fargate, and Lambda automatically. For organisations whose GenAI workloads are shifting between services as architectures mature, Savings Plans provide the discount benefit without the lock-in penalty. Compute Savings Plans deliver up to 66 percent savings against on-demand, with the additional flexibility that justifies the slightly lower headline discount versus standard RIs.
The 2026 strategic recommendation is to use Compute Savings Plans as the primary commitment vehicle for EC2 and serverless infrastructure, supplement with Reserved Instances only for the database tier and for any scenario where capacity reservation is a technical requirement, and handle Bedrock inference costs through provisioned throughput commitments sized to actual production traffic rather than forecasted peak.
The Enterprise Discount Program and AWS Bedrock
The AWS Enterprise Discount Program (EDP) is the primary mechanism for large enterprises to secure contractual discounts on overall AWS spend. Understanding how EDP interacts with Bedrock pricing — and where its limitations lie — is critical for organisations looking to negotiate GenAI costs alongside their broader cloud agreement.
EDP is structured as a committed spend agreement. You commit to a minimum annual AWS spend — typically $1 million or more, with meaningful discount tiers beginning at approximately $2 million annually — and in exchange receive a percentage discount applied across eligible services. Baseline discounts for a $1 million annual commitment typically range from 6 to 9 percent. Three-year agreements at $2 million annual spend routinely achieve 15 percent discounts. At $5 million and above, negotiated discounts of 20 to 30 percent are achievable with appropriate leverage — though AWS does not publish these tiers publicly.
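The tiers above can be captured as a lookup for budget modelling. These ranges are indicative only, reflecting the figures described in this article; AWS does not publish EDP rates, and actual discounts are negotiated.

```python
def indicative_edp_discount(annual_commit_musd: float) -> tuple:
    """Return an indicative discount range (low, high) as fractions of
    spend, based on the approximate tiers described above. Actual EDP
    rates are negotiated case by case and not published by AWS."""
    if annual_commit_musd >= 5:
        return (0.20, 0.30)   # achievable with leverage at $5M+
    if annual_commit_musd >= 2:
        return (0.15, 0.15)   # routine for three-year, $2M/year deals
    if annual_commit_musd >= 1:
        return (0.06, 0.09)   # baseline tier
    return (0.0, 0.0)         # below typical EDP entry threshold

low, high = indicative_edp_discount(1.5)
print(f"{low:.0%}-{high:.0%}")  # 6%-9%
```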
The critical question for Bedrock buyers is whether GenAI inference spend counts toward EDP commitment thresholds. As of 2026, AWS Bedrock on-demand and batch inference charges are generally eligible for EDP discount application, but the rules are not uniform across all model providers, and some Provisioned Throughput contracts are structured as separate private pricing agreements rather than folding into the standard EDP mechanism. Before assuming that your Bedrock spend will attract EDP discounts, verify the specific eligibility terms with your AWS account team in writing and ensure that the commitment schedule reflects actual planned Bedrock consumption.
One additional EDP constraint that catches enterprise buyers: all accounts consolidated under an EDP are required to enrol in AWS Enterprise Support, which is priced at 3 percent of monthly spend with a $15,000 monthly minimum. For organisations with lower historical AWS spend, this mandatory support charge can offset a material portion of the discount benefit. Model this explicitly before committing.
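Modelling that offset takes one function. Using the support pricing as described above (3 percent of monthly spend, $15,000 monthly minimum), the net benefit can be negative at lower spend levels.

```python
def edp_net_benefit(monthly_spend: float, discount: float) -> float:
    """Net monthly benefit of an EDP once mandatory Enterprise Support
    (3% of monthly spend, $15,000/month minimum, as described above)
    is deducted from the discount value."""
    discount_value = monthly_spend * discount
    support_fee = max(monthly_spend * 0.03, 15_000)
    return discount_value - support_fee

# At $100k/month even a 9% discount fails to cover the support minimum;
# at $500k/month the same discount is meaningfully net-positive.
print(f"${edp_net_benefit(100_000, 0.09):,.0f}")
print(f"${edp_net_benefit(500_000, 0.09):,.0f}")
```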
Consumption Billing and Budget Unpredictability
Consumption billing is the defining commercial characteristic of AWS Bedrock — and the characteristic that creates the most difficulty for enterprise finance teams accustomed to the predictability of per-user SaaS contracts or annual ELA structures. When you pay per token, every change to user behaviour, application traffic, prompt engineering, or model selection translates directly into cost variance. A cohort of new employees onboarding onto an AI assistant, a marketing campaign that drives document processing volume, or a developer testing a new prompt template with unexpectedly verbose outputs can all create cost anomalies that appear in the bill before they are flagged in any monitoring dashboard.
The remediation framework for consumption billing unpredictability has three components. First, implement AWS Cost Anomaly Detection with thresholds set at the service and model level — not just the account level — so that Bedrock-specific cost spikes are identified within hours rather than at month end. Second, establish application-level token budgets with circuit breakers that throttle requests when per-user or per-application token consumption exceeds defined limits. Third, build monthly Bedrock cost reviews into your FinOps cadence, with attention to the on-demand versus batch ratio (any workload consistently running on-demand that could be batch-processed is leaving 50 percent savings on the table).
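The second component, application-level token budgets with circuit breakers, can be sketched in a few lines. The class and limit below are illustrative; a production version would persist counters and reset them on a billing-period boundary.

```python
from collections import defaultdict

class TokenBudget:
    """Minimal sketch of a per-application token circuit breaker."""

    def __init__(self, monthly_limit_tokens: int):
        self.limit = monthly_limit_tokens
        self.used = defaultdict(int)  # tokens consumed per application

    def record(self, app: str, input_tokens: int, output_tokens: int) -> None:
        """Tally both sides of a completed inference call."""
        self.used[app] += input_tokens + output_tokens

    def allow_request(self, app: str) -> bool:
        """Throttle once an application exhausts its monthly budget."""
        return self.used[app] < self.limit

budget = TokenBudget(monthly_limit_tokens=50_000_000)
budget.record("doc-pipeline", 40_000_000, 11_000_000)
print(budget.allow_request("doc-pipeline"))  # False: queue, degrade, or alert
```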
Organisations that have moved from reactive billing reviews to proactive consumption governance consistently report 25 to 40 percent reductions in Bedrock spend within the first six months — not because they are using the models less, but because they are using the right models, at the right tier, with architectural configurations that eliminate unnecessary egress and idle provisioned capacity.
Comparing AWS Bedrock to Direct OpenAI and Azure OpenAI
Enterprise buyers evaluating GenAI platforms in 2026 typically consider three options alongside AWS Bedrock: direct access to OpenAI's API, Azure OpenAI Service, and Google Vertex AI. Each carries distinct commercial characteristics that affect total cost of ownership beyond headline token pricing.
Direct OpenAI pricing for GPT-4o runs at approximately $5 per million input tokens and $15 per million output tokens — comparable to mid-tier Claude pricing on Bedrock, with the additional consideration that OpenAI enterprise agreements contain lock-in provisions that are more restrictive than typical AWS commercial terms. The lack of a multi-year committed discount structure similar to EDP means that high-volume OpenAI direct customers have limited leverage in pricing negotiations. OpenAI's enterprise agreements also typically include volume commitments with take-or-pay obligations that create financial exposure if adoption trajectories change.
Azure OpenAI Service offers the same base token prices as direct OpenAI but introduces Provisioned Throughput Units (PTUs) for reserved capacity — a model structurally similar to AWS Bedrock's provisioned throughput. Azure's commercial advantage for Microsoft-heavy organisations is the ability to apply Azure Monetary Commitment credits (from an existing Microsoft Azure agreement) to Azure OpenAI spend, effectively achieving the equivalent of EDP discounts within the Microsoft commercial framework. Organisations that already hold significant Azure Monetary Commitment balances should model this credit application carefully before assuming AWS Bedrock is the more cost-effective platform.
AWS Bedrock's distinctive commercial advantage is model agnosticism. Because the platform provides access to Claude, Llama, Titan, Mistral, and Cohere through a single API with consistent commercial terms, enterprise buyers can route workloads to the lowest-cost capable model dynamically, maintain optionality as the model landscape evolves, and avoid the lock-in risk of building production workflows around a single model provider's proprietary API.
Practical Negotiation Strategies for AWS Bedrock
Enterprise procurement teams approaching AWS Bedrock as part of a broader commercial negotiation have more leverage than they typically exercise. The following strategies consistently deliver better commercial outcomes for Redress Compliance clients.
Structure your EDP to include explicit Bedrock spend projections. AWS account teams will sometimes offer additional discount incentives when a client can demonstrate credible GenAI adoption plans — not just infrastructure growth. Document your Bedrock use cases, model preferences, and token volume forecasts and include them in the EDP commercial conversation rather than treating Bedrock as a separate procurement.
Negotiate egress cost treatment as part of the EDP. While AWS generally does not discount data transfer through the standard EDP mechanism, large enterprise agreements have historically included credits or reduced rates for egress in the context of specific workloads, particularly during migration phases or for organisations moving significant data volumes from on-premises to AWS. This is rarely offered proactively; it must be requested.
Use end-of-quarter timing. AWS sales teams operate on quarterly targets and are more willing to include additional concessions — extended provisioned throughput commitment terms, egress credits, or incremental EDP discount rate improvements — in the final weeks of a financial quarter. If your procurement timeline is flexible, aligning AWS negotiations with the final two weeks of a calendar quarter consistently improves outcomes.
Benchmark against published pricing from alternative platforms. The availability of comparable models on Azure AI Foundry, Google Vertex AI, and direct from Anthropic gives enterprise buyers genuine competitive alternatives that create pricing pressure in AWS negotiations. AWS is not accustomed to losing large enterprise AI accounts to competitors, and demonstrating a credible evaluation process strengthens your position.
AWS Bedrock Cost Governance: The Operational Framework
Procurement negotiation secures the right starting rate. Operational governance is what keeps Bedrock costs under control as production deployments scale. The governance framework that Redress Compliance recommends for enterprise Bedrock deployments covers four operational dimensions.
Model governance establishes an approved model catalogue for each use case category — complex reasoning, document extraction, classification, embedding generation — with cost-per-accuracy benchmarks that inform model selection decisions. Without this, individual development teams will default to the most capable model available and accumulate token costs that cannot be attributed to specific business outcomes.
Tier governance defines the conditions under which each pricing tier is appropriate. Batch mode should be the default for any workload that can tolerate asynchronous processing. On-demand standard should be used for interactive applications with variable load. Provisioned throughput should be reserved for production services with defined latency requirements and consistent token volume. The Priority Tier should be used only with explicit approval from FinOps given its 75 percent premium over standard rates.
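The tier rules above can be encoded as a simple routing helper, which is a useful way to make the policy testable. The predicate names are illustrative; real deployments would derive them from workload telemetry and an approval workflow.

```python
def recommend_tier(async_ok: bool, latency_critical: bool,
                   steady_volume: bool,
                   priority_approved: bool = False) -> str:
    """Encode the tier-governance rules as a decision function.
    Inputs describe the workload; output is the recommended tier."""
    if async_ok:
        return "batch"                    # 50% discount, default when tolerable
    if latency_critical and steady_volume:
        return "provisioned-throughput"   # defined latency, consistent volume
    if latency_critical and priority_approved:
        return "priority"                 # 75% premium, needs FinOps sign-off
    return "on-demand"                    # interactive, variable load

print(recommend_tier(async_ok=True, latency_critical=False, steady_volume=False))
# -> batch
```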
Architecture governance establishes egress minimisation as a design principle for all Bedrock integrations — ensuring that inference results stay within the AWS network perimeter wherever possible, and that any external delivery is routed through CloudFront rather than direct internet egress from S3 or EC2.
Commitment governance maintains a rolling 90-day review of provisioned throughput utilisation, identifying model units that are consistently underutilised and should be reduced at the next commitment renewal. Given the one-month minimum commitment period for provisioned throughput, this review cycle can recoup material idle capacity cost within a single quarter.
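A rolling utilisation review reduces to flagging units below a threshold over the window. The data shapes below are illustrative; the 60 percent threshold is an example cut-off, and real inputs would come from CloudWatch utilisation metrics.

```python
def mean_utilisation(samples: list) -> float:
    """Average utilisation over the review window (0.0 to 1.0)."""
    return sum(samples) / len(samples)

def units_to_release(unit_samples: dict, threshold: float = 0.6) -> list:
    """Identify model units whose average utilisation falls below the
    threshold: candidates for reduction at the next commitment renewal."""
    return sorted(unit for unit, samples in unit_samples.items()
                  if mean_utilisation(samples) < threshold)

samples = {
    "unit-a": [0.90, 0.85, 0.80],  # healthy production unit
    "unit-b": [0.30, 0.20, 0.25],  # mostly idle off-peak capacity
}
print(units_to_release(samples))  # ['unit-b']
```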
What to Do Before Your Next AWS Bedrock Bill Arrives
If your organisation is currently running AWS Bedrock workloads without a formal cost governance framework, the practical starting point is a usage audit that maps current token consumption by model family, tier, and application. In our experience, most enterprises running Bedrock for six months or more will find at least one of the following in their first audit: significant on-demand usage that could be shifted to batch; provisioned throughput commitments with utilisation below 60 percent during off-peak hours; egress charges attributable to architectural choices rather than business requirements; and model selection that has defaulted to Claude for tasks where Llama or Titan would perform equivalently at one-third the cost.
Addressing these findings typically requires changes across engineering, FinOps, and procurement — which is why the most effective Bedrock cost governance programmes are cross-functional, not just engineering-led cost reduction exercises. The procurement lever (EDP renegotiation, provisioned throughput right-sizing, egress credit negotiation) and the engineering lever (batch migration, model substitution, architecture review) need to be pulled simultaneously to capture the full savings available.
What AWS Won't Tell You at Renewal
AWS Bedrock is a genuinely powerful platform for enterprise GenAI deployment. Its multi-model architecture, serverless pricing, and deep AWS ecosystem integration make it the natural starting point for organisations whose applications already run on AWS infrastructure. The commercial complexity is real but manageable with the right governance framework.
The five principles that characterise enterprise buyers who manage Bedrock costs well are: matching each workload to the right pricing tier rather than defaulting to on-demand; choosing the appropriate model for each task rather than the most capable model for every task; treating data egress as a first-class cost variable in architecture decisions; including Bedrock explicitly in EDP commercial negotiations rather than treating it as a line item to be managed after commitment; and establishing consumption governance before production scale rather than attempting to retrofit controls after budget overruns have occurred.
Organisations that apply these principles consistently are capturing 25 to 40 percent cost reductions against their initial Bedrock deployments — savings that compound as production workloads scale through 2026 and beyond.