How Amazon Bedrock Is Priced: The Three Models
Amazon Bedrock does not have a licence fee in the traditional sense. There is no annual subscription, no per-user charge, and no perpetual licence to purchase. Instead, you pay for what you use, measured in tokens — the basic unit of text that foundation models process. Understanding the three pricing modes available is the foundation for managing Bedrock costs effectively.
On-Demand Pricing
On-demand is the default pricing mode. You pay per 1,000 input tokens processed and per 1,000 output tokens generated. Input and output prices differ — output tokens are consistently more expensive than input tokens, typically by a factor of three to five. Prices vary significantly by model: a standard Claude Haiku inference is dramatically cheaper per token than a Claude Opus or Claude Sonnet inference, and Meta Llama models are generally priced lower than Anthropic models for comparable context windows.
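The per-token arithmetic above can be sketched as a small cost function. The prices used here are illustrative placeholders, not current AWS list prices; check the Bedrock pricing page for your region and model before relying on any figure.

```python
def on_demand_cost(input_tokens: int, output_tokens: int,
                   price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Return the USD cost of a single inference under on-demand pricing."""
    return ((input_tokens / 1000) * price_in_per_1k
            + (output_tokens / 1000) * price_out_per_1k)

# Illustrative prices with output tokens at five times the input rate,
# matching the three-to-five-times pattern described above.
cost = on_demand_cost(input_tokens=2_000, output_tokens=500,
                      price_in_per_1k=0.003, price_out_per_1k=0.015)
print(f"${cost:.4f}")  # $0.0135
```

The same function makes model comparison concrete: swapping in a cheaper model's per-1,000-token rates shows the per-request saving directly.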
On-demand pricing requires no commitment and is billed monthly. It is the right choice for development, testing, and variable or unpredictable production workloads. The risk is cost unpredictability: consumption billing is difficult to manage with traditional IT budgeting approaches. A change in application usage patterns, a new integration that generates unexpectedly high token volumes, or a misconfiguration that results in excessive inference calls can produce a monthly bill that significantly exceeds forecast.
Batch Pricing
Batch mode is available for workloads where near-real-time response is not required. AWS processes batch inference requests asynchronously, delivering results within 24 hours, in exchange for a 50% discount on on-demand rates. This makes batch processing significantly more economical for document processing, content generation at scale, data enrichment, and any AI workload that does not require immediate response. Organisations running Bedrock for offline analytics, document summarisation pipelines, or bulk content generation tasks should model whether batch processing is appropriate for those workloads, because the cost saving over on-demand is material at scale.
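The saving from moving an offline workload to batch mode can be modelled directly from the 50% discount described above. Token volumes and per-1,000-token prices below are illustrative assumptions.

```python
BATCH_DISCOUNT = 0.50  # batch inference priced at half the on-demand rate

def monthly_cost(input_tokens: int, output_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float,
                 batch: bool = False) -> float:
    """USD cost of a month of inference, optionally at batch rates."""
    cost = ((input_tokens / 1000) * price_in_per_1k
            + (output_tokens / 1000) * price_out_per_1k)
    return cost * (1 - BATCH_DISCOUNT) if batch else cost

# A hypothetical document-summarisation pipeline:
# 500M input tokens and 50M output tokens per month.
on_demand = monthly_cost(500_000_000, 50_000_000, 0.003, 0.015)
batch = monthly_cost(500_000_000, 50_000_000, 0.003, 0.015, batch=True)
print(f"on-demand ${on_demand:,.0f} vs batch ${batch:,.0f}")  # $2,250 vs $1,125
```

At this volume the discount is worth over a thousand dollars a month for a single pipeline, which is why the section above recommends modelling batch eligibility workload by workload.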
Provisioned Throughput
Provisioned throughput allows you to purchase a guaranteed level of inference capacity — measured in model units — for a fixed period, either one month or six months. In exchange for committing upfront, you receive guaranteed throughput regardless of demand spikes, and pricing is expressed as a fixed hourly rate per model unit. As of 2025, provisioned throughput pricing typically starts around $15,000 per month as a minimum commitment, making it viable primarily for enterprise-grade use cases where throughput consistency is operationally critical — customer-facing AI applications, high-volume document processing pipelines, or production workloads where rate limiting would cause business impact.
Provisioned throughput is charged whether or not you use the capacity during the commitment period. This means it carries commitment risk similar to AWS Reserved Instances — if your workload changes or the model you have provisioned is no longer your preferred choice, you continue paying for the reserved capacity. The decision to move to provisioned throughput should be based on demonstrated, stable production usage patterns, not anticipated future usage.
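A simple break-even check helps ground the decision described above: at what monthly token volume does a fixed commitment beat on-demand rates? The blended per-1,000-token price below is an illustrative assumption that averages input and output rates; real analysis should use your actual traffic mix.

```python
def breakeven_tokens_per_month(monthly_commitment: float,
                               blended_price_per_1k: float) -> float:
    """Token volume at which provisioned throughput matches on-demand spend."""
    return monthly_commitment / blended_price_per_1k * 1000

# e.g. the roughly $15,000/month minimum commitment quoted above, against an
# assumed blended on-demand rate of $0.006 per 1,000 tokens
tokens = breakeven_tokens_per_month(15_000, 0.006)
print(f"{tokens:,.0f} tokens/month")  # 2,500,000,000 tokens/month
```

If demonstrated production usage is well below the break-even volume, the commitment risk outweighs the throughput guarantee.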
The Cost Drivers Enterprises Most Often Miss
Data Egress: The Single Biggest Surprise Cost
Data egress is the most common surprise cost in AWS deployments generally, and Bedrock is no exception. When your Bedrock application transfers data out of AWS — to on-premises systems, to another cloud provider, to end-user applications outside the AWS network, or across AWS regions — data egress charges apply. The standard AWS data transfer rate is $0.09 per GB for the first 10TB per month outbound from US East regions, with similar rates applying in other regions.
For Bedrock workloads, egress costs accumulate through several mechanisms. If your application retrieves model outputs and sends them to an on-premises system or a non-AWS SaaS application, every response triggers egress charges. If you are using Bedrock Knowledge Bases with a vector store that retrieves and returns large document chunks, the data transfer for those retrieval operations adds up. If your AI application serves users across multiple geographies and you are running inference in a single region, the cross-region or internet-facing data transfer generates egress costs that may not have been visible during architectural planning. The first step in managing egress costs is mapping every data flow in your Bedrock application architecture and identifying which flows carry AWS-to-internet or AWS-to-non-AWS transfer costs.
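The data-flow mapping step above can be sketched as a simple tally: estimate monthly outbound volume per flow and price it at the internet-out rate quoted earlier. The flow names and volumes are hypothetical.

```python
EGRESS_RATE_PER_GB = 0.09  # first 10 TB/month outbound, US East, as quoted above

# Hypothetical AWS-to-outside flows identified in an architecture review,
# with estimated GB transferred out per month.
flows_gb_per_month = {
    "bedrock-responses-to-on-prem-crm": 400,
    "rag-chunks-to-external-saas": 150,
    "responses-to-internet-users": 900,
}

def monthly_egress_cost(flows: dict) -> float:
    """USD cost of all internet-facing egress flows for the month."""
    return sum(flows.values()) * EGRESS_RATE_PER_GB

print(f"${monthly_egress_cost(flows_gb_per_month):.2f}")  # $130.50
```

The value of the exercise is less the total than the per-flow breakdown: it shows which integrations are worth re-architecting to stay inside the AWS network boundary.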
Knowledge Bases and Embedding Costs
Amazon Bedrock Knowledge Bases, which enable retrieval-augmented generation (RAG) applications, carry additional costs beyond the inference charges for the foundation model itself. Creating and maintaining a Knowledge Base requires embedding model calls (to convert documents into vector representations), vector storage in a compatible vector database (either Amazon OpenSearch Serverless or third-party options), and retrieval queries that generate both embedding and inference charges each time a user interacts with the RAG application. Organisations that build Bedrock Knowledge Bases for internal document search, customer support, or knowledge management should model the full stack cost — embedding, storage, retrieval, and inference — not just the inference cost.
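The full-stack cost per RAG query can be sketched as the sum of the stages described above: embedding the user query, then inference over the query plus retrieved chunks. Every unit price here is an illustrative assumption, and vector storage is omitted because it is a fixed monthly cost rather than a per-query one.

```python
def rag_query_cost(query_tokens: int, retrieved_tokens: int, answer_tokens: int,
                   embed_per_1k: float, infer_in_per_1k: float,
                   infer_out_per_1k: float) -> float:
    """USD cost of one Knowledge Base query, excluding vector storage."""
    embedding = (query_tokens / 1000) * embed_per_1k
    inference_in = ((query_tokens + retrieved_tokens) / 1000) * infer_in_per_1k
    inference_out = (answer_tokens / 1000) * infer_out_per_1k
    return embedding + inference_in + inference_out

# A 200-token question retrieving 4,000 tokens of document chunks and
# generating a 600-token answer, at assumed unit prices.
cost = rag_query_cost(200, 4_000, 600, embed_per_1k=0.0001,
                      infer_in_per_1k=0.003, infer_out_per_1k=0.015)
print(f"${cost:.5f} per query")  # $0.02162 per query
```

Note that the retrieved chunks dominate the input token count, which is why chunk size and retrieval depth are the main RAG cost levers.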
Guardrails and Bedrock Flows
Amazon Bedrock Guardrails, which provide content filtering, topic blocking, PII detection, and grounding validation, are priced per text unit processed. For high-volume applications, guardrail costs can become a significant line item. Bedrock Flows — the visual composition tool for building multi-step AI workflows — is priced at $0.035 per 1,000 node transitions. For complex multi-step workflows with high transaction volumes, Flows pricing adds up in ways that may not have been anticipated during initial cost modelling.
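At the quoted $0.035 per 1,000 node transitions, Flows costs scale with both request volume and workflow depth. The workflow shape and volume below are hypothetical.

```python
FLOWS_RATE_PER_1K_TRANSITIONS = 0.035  # rate quoted in the section above

def flows_monthly_cost(requests_per_month: int,
                       transitions_per_request: int) -> float:
    """USD cost of Bedrock Flows node transitions for the month."""
    total_transitions = requests_per_month * transitions_per_request
    return total_transitions / 1000 * FLOWS_RATE_PER_1K_TRANSITIONS

# 2M requests/month through a hypothetical 8-node workflow
print(f"${flows_monthly_cost(2_000_000, 8):,.2f}")  # $560.00
```

The per-request figure is small, which is precisely why it tends to be missed in initial cost modelling and only surfaces at production volume.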
Bedrock vs Azure OpenAI: A Pricing Comparison
For enterprises evaluating Bedrock against Azure OpenAI Service, the pricing comparison is more complex than it appears. Both platforms use token-based consumption pricing, but several structural differences affect the total cost of ownership for enterprise workloads.
Model access and pricing. Azure OpenAI provides access to OpenAI models (GPT-4o, GPT-4 Turbo, and others) at rates that mirror OpenAI's direct API pricing. Amazon Bedrock provides access to a broader range of models from multiple providers (Anthropic, Meta, Mistral, Cohere, Amazon Titan, and others) at rates that vary by model and in some cases represent meaningful discounts versus direct provider APIs. For organisations that specifically require OpenAI models, Azure OpenAI is typically the appropriate choice. For organisations open to model flexibility, Bedrock's multi-provider access provides more options to optimise cost against capability.
Platform integration. Azure OpenAI integrates natively with the Microsoft ecosystem — Azure Active Directory, Azure Key Vault, Azure Monitor, and the Microsoft Copilot platform. Bedrock integrates natively with the AWS ecosystem — IAM, CloudWatch, SageMaker, Lambda, and S3. For organisations with significant existing investment in one cloud ecosystem, the native integration advantage of the aligned platform is a genuine cost consideration beyond raw inference pricing, because it reduces the cost and complexity of building production-grade AI applications.
Provisioned throughput availability. Azure OpenAI has historically experienced GPU scarcity for Provisioned Throughput Units (PTUs), particularly for GPT-4 class models, making it difficult for organisations to secure the guaranteed throughput they need for high-volume production workloads. Bedrock's provisioned throughput has generally been more accessible for Anthropic and other models, which is relevant for organisations with throughput consistency requirements.
Reserved Instances, Savings Plans, and the EDP: What Applies to Bedrock
One of the most common questions we hear from enterprises managing Bedrock spend is whether AWS Reserved Instances or Savings Plans apply to Bedrock inference costs. The answer requires understanding what each commitment mechanism covers.
Reserved Instances provide discounted pricing for specific EC2 instance types in exchange for a one-year or three-year commitment. They apply to EC2 compute costs and certain other compute-bound services. They do not apply to Bedrock API inference costs, which are billed separately as managed AI service calls.
Savings Plans are more flexible than Reserved Instances. Compute Savings Plans provide discounts on EC2, Lambda, and Fargate compute regardless of instance type, family, or region. Machine Learning Savings Plans cover SageMaker usage. Neither Compute Savings Plans nor ML Savings Plans apply directly to Bedrock API inference calls. Bedrock is priced as a managed service API, not as underlying compute consumption.
The Enterprise Discount Program (EDP) is where meaningful discounts on Bedrock spend become available. The AWS EDP is a negotiated contract that provides a blanket discount percentage across eligible AWS services in exchange for a committed annual spend. Bedrock API calls are eligible for EDP discounts. However, meaningful EDP discounts — those that represent a material cost reduction on your Bedrock spend — typically require a total AWS annual commitment of approximately $2 million or more. At $1 million annual spend, the EDP discount is typically in the 6–9% range. At $2 million or above, discounts in the 12–18% range become achievable on multi-year agreements. For organisations with Bedrock spend that represents a significant portion of a larger AWS footprint, ensuring that Bedrock usage is included in EDP eligible spend calculation is an important step in maximising the value of the EDP negotiation.
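The tier arithmetic above can be made concrete with a back-of-envelope calculation: discount bands by annual commitment, applied to the Bedrock share of total spend. The bands reflect the ranges quoted in this section; actual rates are negotiated case by case.

```python
def edp_discount_range(annual_commitment: float) -> tuple:
    """Typical EDP discount band for a given annual AWS commitment (USD)."""
    if annual_commitment >= 2_000_000:
        return (0.12, 0.18)   # achievable on multi-year agreements
    if annual_commitment >= 1_000_000:
        return (0.06, 0.09)
    return (0.0, 0.0)         # below the typical EDP entry point

# Hypothetical: $600k of Bedrock spend inside a $2.5M total AWS commitment
low, high = edp_discount_range(2_500_000)
bedrock_spend = 600_000
print(f"Bedrock saving: ${bedrock_spend * low:,.0f} to ${bedrock_spend * high:,.0f}")
```

Run against your own figures, this shows why confirming Bedrock's inclusion in the eligible spend calculation matters: excluding it forfeits the discount on that entire line.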
Cost Governance for Bedrock: What Good Looks Like
Preventing runaway Bedrock costs requires governance mechanisms that most enterprises do not have in place at the point they begin production deployment. The following are the minimum governance requirements for organisations running Bedrock in production at scale.
Token budget limits at the application level. Every Bedrock application should have a maximum token budget enforced at the application layer — both for individual requests (limiting context window size and response length) and for aggregate daily or monthly consumption. This prevents individual requests from generating unexpectedly large token counts due to misuse, prompt injection, or implementation errors.
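A minimal sketch of this enforcement pattern: a per-request cap plus a daily aggregate, checked before each inference call. The limits and the in-memory counter are illustrative; a production system would persist counters in a shared store and reset them on a schedule.

```python
class TokenBudget:
    """Application-layer token budget: per-request cap plus daily aggregate."""

    def __init__(self, per_request_limit: int, daily_limit: int):
        self.per_request_limit = per_request_limit
        self.daily_limit = daily_limit
        self.used_today = 0

    def authorise(self, estimated_tokens: int) -> bool:
        """Reserve budget and return True, or return False to reject."""
        if estimated_tokens > self.per_request_limit:
            return False  # single request too large (misuse or misconfiguration)
        if self.used_today + estimated_tokens > self.daily_limit:
            return False  # aggregate daily budget exhausted
        self.used_today += estimated_tokens
        return True

budget = TokenBudget(per_request_limit=8_000, daily_limit=1_000_000)
print(budget.authorise(6_000))   # True: within both limits
print(budget.authorise(20_000))  # False: exceeds the per-request cap
```

The check runs before the Bedrock call, so a runaway client fails fast at the application layer rather than generating billable inference.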
AWS Cost Anomaly Detection for Bedrock. AWS Cost Anomaly Detection uses machine learning to identify unusual spending patterns and can be configured to alert on Bedrock-specific spend anomalies. This provides an operational safety net that catches cost spikes before they become end-of-month surprises.
Model selection governance. Not all Bedrock workloads require the highest-capability and highest-cost model. A document classification task that costs $0.25 per 1,000 tokens using a frontier model may cost $0.025 per 1,000 tokens using a smaller, faster model with acceptable accuracy for the specific use case. Organisations should define governance policies that require application teams to demonstrate that the model tier selected is appropriate for the task, not simply the most capable available.
Egress architecture review. Before deploying Bedrock in production, conduct an explicit review of data flows that cross the AWS network boundary. Identify every integration point where Bedrock outputs are transmitted to non-AWS endpoints and calculate the expected monthly egress volume. Include this in your cost model. Data egress costs are avoidable with good architecture — keeping Bedrock outputs within the AWS network boundary wherever possible is a straightforward optimisation that can reduce costs materially.
For more on AWS AI governance, see our GenAI Knowledge Hub and our AWS contract negotiation specialists.
Bedrock and Your AWS EDP: Negotiation Considerations
If you are negotiating or renegotiating an AWS Enterprise Discount Program agreement, Bedrock spend should be explicitly addressed. Several specific considerations apply. First, confirm with your AWS account team that Bedrock API inference costs are included in your EDP eligible spend calculation. This is not always automatic and varies by agreement structure. Second, if your Bedrock consumption is growing rapidly, model the expected Bedrock spend over the EDP term and include it in your commitment sizing — a higher total commitment secures a higher discount rate, and Bedrock's growth trajectory may make a higher commitment appropriate even if current spend is modest. Third, AWS Marketplace purchases of Bedrock-integrated ISV products can offset up to 25% of your EDP commitment, which may be relevant if your Bedrock implementation relies on third-party tools sourced through the Marketplace.