The AI Spend Problem in Enterprise AWS Environments
Most enterprises negotiated their AWS commercial structures — Reserved Instances, Savings Plans, Enterprise Discount Program agreements — before AI inference and training workloads became material line items. The result is a commercial framework built for stable compute that struggles to contain the variable, token-based, and throughput-dependent cost structures of AWS AI services.
Amazon Bedrock operates on consumption-based pricing: you pay per token processed, or per model unit reserved for throughput. Amazon SageMaker charges for instance-hours consumed by training jobs, endpoints, and notebooks, with a separate Savings Plans mechanism for the predictable portion of that spend. Neither service fits neatly into the RI model that controls most enterprise compute costs, and neither is automatically covered by legacy EDP agreements that predate the AI product portfolio.
Consumption billing creates budget unpredictability that is structurally different from the compute cost problem. A single engineering team adding a new generative AI feature can double AI inference costs within a sprint cycle. The token economics of large language models are opaque until you have production traffic data, and models vary by an order of magnitude in cost per equivalent task. Without active cost governance at the AI layer, enterprise AI spend routinely runs 30 to 60 percent over forecast in the first year of production deployment.
Understanding Amazon Bedrock Pricing
Amazon Bedrock provides access to foundation models from Anthropic, Meta, Mistral, Cohere, Amazon, and others through a unified API. Pricing varies significantly by model, modality, and usage pattern, with three main commercial mechanisms available to enterprise buyers.
On-Demand Token Pricing: Flexibility at a Premium
On-demand Bedrock pricing charges per 1,000 input tokens and per 1,000 output tokens, with rates varying by model. Claude 3 Haiku is among the lowest-cost options for high-volume tasks; Claude 3 Sonnet and Opus carry higher per-token rates for complex reasoning workloads. The on-demand model is appropriate for development, testing, and unpredictable production workloads but becomes expensive at scale because it carries no commitment discount.
The cost variance between models is significant enough to drive architecture decisions. Using Claude 3 Haiku for metadata classification tasks instead of Sonnet can reduce per-task cost by more than a factor of ten for equivalent quality outputs. Model selection is the highest-leverage cost optimisation available in Bedrock before any commercial negotiation takes place.
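The arithmetic behind that model-selection lever is simple to sketch. The per-1,000-token rates below are illustrative placeholders, not current AWS list prices; substitute the rates from the Bedrock pricing page for your region before using this in a real cost model.

```python
# Illustrative per-task cost comparison across Bedrock models.
# Rates below are placeholders for illustration, NOT current AWS list prices.

ILLUSTRATIVE_RATES = {  # USD per 1,000 tokens: (input, output)
    "claude-3-haiku":  (0.00025, 0.00125),
    "claude-3-sonnet": (0.003,   0.015),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task at the illustrative rates above."""
    rate_in, rate_out = ILLUSTRATIVE_RATES[model]
    return (input_tokens / 1000) * rate_in + (output_tokens / 1000) * rate_out

# A metadata-classification task: ~2,000 input tokens, ~100 output tokens.
haiku = task_cost("claude-3-haiku", 2000, 100)
sonnet = task_cost("claude-3-sonnet", 2000, 100)
print(f"Haiku:  ${haiku:.6f}/task")
print(f"Sonnet: ${sonnet:.6f}/task")
print(f"Sonnet costs {sonnet / haiku:.0f}x more per task at these rates")
```

The exact ratio depends on the input/output token mix of the workload, which is why measuring token volumes in production matters before fixing a model choice.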
Provisioned Throughput: Committing for Predictability and Discount
Provisioned Throughput reserves dedicated inference capacity in the form of model units (MUs). Each model unit guarantees a specific tokens-per-minute throughput level and is charged at a fixed hourly rate regardless of actual utilisation — a critical distinction from on-demand pricing. The fixed charge applies whether you send zero requests or run at full capacity throughout the commitment period.
Provisioned Throughput commitments are available in one-month and six-month terms. As of 2023, the minimum commitment is approximately $15,000 per month, making this mechanism viable only for workloads that have validated production traffic volumes. Six-month commitments provide higher discount rates than one-month terms, but the fixed-charge nature means a six-month commitment on an overestimated workload creates significant stranded cost.
The right time to purchase Provisioned Throughput is after three to six months of on-demand production data confirms stable, predictable traffic patterns. Purchasing based on projected usage rather than measured usage is the most common Bedrock cost management mistake and one of the key contributors to AI budget overruns.
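A break-even check makes the measured-usage discipline concrete: a model unit's fixed hourly charge only beats on-demand pricing above a sustained token volume. The hourly rate and blended token rate below are illustrative assumptions, not AWS list prices.

```python
# Break-even sketch: at what sustained monthly token volume does a
# Provisioned Throughput model unit beat on-demand pricing?
# MU_HOURLY_RATE and BLENDED_OD_RATE are illustrative assumptions.

MU_HOURLY_RATE = 20.0    # assumed USD/hour per model unit
HOURS_PER_MONTH = 730
BLENDED_OD_RATE = 0.004  # assumed USD per 1,000 tokens (input+output blend)

monthly_mu_cost = MU_HOURLY_RATE * HOURS_PER_MONTH           # fixed, used or not
breakeven_tokens = monthly_mu_cost / BLENDED_OD_RATE * 1000  # tokens/month

print(f"Fixed monthly MU cost: ${monthly_mu_cost:,.0f}")
print(f"Break-even volume: {breakeven_tokens / 1e9:.2f}B tokens/month")
print("Below this sustained volume, on-demand is cheaper.")
```

Running this against three to six months of measured traffic, rather than a forecast, is what separates a justified commitment from stranded cost.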
Batch Inference: The Underused Cost Reduction Option
Bedrock Batch Inference processes large jobs asynchronously at a discount versus on-demand real-time pricing — typically 50 percent off the on-demand rate for supported models. For workloads that do not require synchronous responses — document classification, bulk analysis, overnight summarisation jobs — Batch Inference is consistently the lowest-cost Bedrock option. Its underutilisation across enterprise environments reflects engineering team defaults toward real-time inference rather than cost-aware architecture design.
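The batch discount compounds quickly at bulk-job scale. A minimal sketch, assuming the ~50 percent discount described above and illustrative token rates (not AWS list prices):

```python
# Sketch: cost of a bulk, non-urgent job under real-time on-demand pricing
# versus Batch Inference at an assumed 50% discount. Rates are illustrative.

def job_cost(total_input_tokens: float, total_output_tokens: float,
             rate_in_per_1k: float, rate_out_per_1k: float,
             batch_discount: float = 0.0) -> float:
    cost = (total_input_tokens / 1000 * rate_in_per_1k
            + total_output_tokens / 1000 * rate_out_per_1k)
    return cost * (1 - batch_discount)

# Overnight summarisation of ~100k documents: ~500M input, ~50M output tokens.
realtime = job_cost(500e6, 50e6, 0.003, 0.015)
batch    = job_cost(500e6, 50e6, 0.003, 0.015, batch_discount=0.5)
print(f"Real-time on-demand: ${realtime:,.0f}")
print(f"Batch inference:     ${batch:,.0f}")
```

For a recurring nightly job, the difference between these two lines is the cost of defaulting to real-time inference.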
Amazon SageMaker: Pricing Structure and Savings Plans
Amazon SageMaker is a comprehensive ML platform covering data labelling, feature engineering, model training, hyperparameter tuning, model evaluation, deployment, and monitoring. Each capability has its own pricing dimension, which makes SageMaker one of the most complex cost management challenges in the AWS portfolio.
The SageMaker Cost Components
SageMaker instance pricing covers training, processing, hosting (real-time inference endpoints), and notebooks. Instance types span from ml.t3.medium for development notebooks to ml.p4d.24xlarge for distributed model training, a price range from cents per hour to over $32 per hour. SageMaker Inference also includes Serverless Inference (billed on compute duration and configured memory rather than provisioned instances) and Asynchronous Inference (billed for instance time while requests are processed, scaling to zero when idle) for workloads with variable traffic patterns.
The complexity is compounded by SageMaker's dual role as both a training platform and an inference platform. Training job costs are typically high-intensity and short-duration (hours to days); inference endpoint costs are lower-intensity and indefinite. Managing both categories requires separate optimisation strategies and separate commitment mechanisms.
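The choice between an always-on endpoint and Serverless Inference is itself a break-even calculation on traffic volume. The instance rate, serverless GB-second rate, memory size, and per-request duration below are all illustrative assumptions:

```python
# Break-even sketch: always-on real-time endpoint vs Serverless Inference.
# All rates and workload figures are illustrative assumptions.

ENDPOINT_HOURLY = 0.23          # assumed USD/hour for an always-on instance
SERVERLESS_GB_SECOND = 0.00002  # assumed USD per GB-second of serverless compute
MEMORY_GB = 4
SECONDS_PER_REQUEST = 0.5

endpoint_monthly = ENDPOINT_HOURLY * 730

def serverless_monthly(requests_per_day: int) -> float:
    """Monthly serverless compute cost at the assumed rates."""
    gb_seconds = requests_per_day * 30 * SECONDS_PER_REQUEST * MEMORY_GB
    return gb_seconds * SERVERLESS_GB_SECOND

for rpd in (1_000, 50_000, 200_000):
    print(f"{rpd:>7} req/day: serverless ${serverless_monthly(rpd):8.2f} "
          f"vs always-on ${endpoint_monthly:.2f}")
```

At low and spiky traffic, serverless is far cheaper; past a sustained daily volume, the fixed endpoint wins. Finding that crossover for each workload is part of the architecture-review discipline discussed later in this piece.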
SageMaker Savings Plans: The Commitment Mechanism
Amazon SageMaker Savings Plans commit to a consistent dollar-per-hour spend on SageMaker services in exchange for discounts of up to 64 percent versus on-demand pricing. The commitment covers SageMaker instances across training, processing, hosting, and notebooks — a broader scope than most enterprise buyers realise. Savings Plans are available in one-year and three-year terms with all-upfront, partial-upfront, and no-upfront payment options.
Unlike Reserved Instances — which lock to a specific instance type — SageMaker Savings Plans apply to any SageMaker instance across any use case, as long as the committed hourly spend level is met. This flexibility is valuable for ML teams whose workloads shift between training and inference intensity across project cycles. A team that trains heavily in Q1 and runs production inference heavily in Q2 benefits from the same Savings Plan commitment covering both periods.
The practical discipline is sizing the Savings Plan to the consistent floor of SageMaker usage, not the peak. In our experience working with data science and ML engineering teams, SageMaker usage is highly variable — project sprints create training cost spikes that should remain on-demand, while production inference endpoints create the steady baseline that the Savings Plan should cover. Correctly calibrating this ratio typically delivers 40 to 55 percent overall SageMaker cost reduction versus pure on-demand.
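The sizing discipline above can be sketched from hourly spend data. The spend samples and the commit-below-the-floor ratio here are illustrative assumptions; in practice the inputs come from Cost Explorer exports.

```python
# Sketch: size a SageMaker Savings Plan to the consistent floor of hourly
# spend, not the mean or the peak. Sample data is illustrative.
import statistics

hourly_spend = [12, 14, 11, 13, 45, 60, 12, 13, 14, 55, 12, 11]  # USD/hour samples

floor = min(hourly_spend)         # steady baseline from production endpoints
commit = round(floor * 0.75, 2)   # commit below the floor; spikes stay on-demand
mean_spend = statistics.mean(hourly_spend)

print(f"Observed floor ${floor}/hr, mean ${mean_spend:.2f}/hr")
print(f"Suggested commitment: ${commit}/hr; training spikes remain on-demand")
```

Committing to the mean rather than the floor is the classic error: the training spikes that inflate the mean are exactly the spend that should stay on-demand or move to Spot.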
The Data Egress Hidden Cost in ML Pipelines
Data egress is the most common surprise cost in AWS environments, and ML pipelines are particularly exposed. Training data typically resides in S3, and large-scale training jobs that transfer data to compute instances in different Availability Zones or regions incur data transfer charges that are separate from both the instance cost and any SageMaker Savings Plan coverage. Similarly, batch inference jobs that process data from external sources or write results to systems outside AWS generate egress charges that compound with scale.
A 10TB training dataset transferred from S3 us-east-1 to a training cluster in a different region costs $200 in data transfer charges at AWS standard rates — in addition to the compute cost. For iterative training workflows where the same dataset is loaded repeatedly, data transfer costs can equal or exceed the instance cost for shorter training runs. Keeping training data and training compute in the same region and Availability Zone is the most direct mitigation, but it requires architectural decisions that many teams make without explicit cost awareness.
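The worked figure above follows directly from the per-gigabyte transfer rate. The $0.02/GB inter-region rate used here is an assumption for illustration; actual rates vary by region pair and should be checked against current AWS data transfer pricing.

```python
# Cross-region transfer cost for a training dataset, at an assumed
# $0.02/GB inter-region rate (check current AWS rates per region pair).

RATE_PER_GB = 0.02   # assumed USD/GB for inter-region transfer
DATASET_GB = 10_000  # 10 TB

per_load = DATASET_GB * RATE_PER_GB
print(f"One cross-region load: ${per_load:,.0f}")

# Iterative training that reloads the dataset on each of 20 runs without caching:
print(f"20 reloads: ${per_load * 20:,.0f}, often exceeding compute for short runs")
```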
The EDP Layer: Does It Cover Bedrock and SageMaker?
The AWS Enterprise Discount Program (EDP) provides a percentage discount on total AWS billing in exchange for a multi-year spend commitment. Whether Bedrock and SageMaker are in-scope for EDP coverage is one of the most important commercial questions for enterprises scaling AI workloads, and the answer requires explicit confirmation in the EDP contract language.
By default, EDP discounts apply broadly across AWS services, but specific AI and ML services may have carve-outs, reduced discount rates, or exclusions depending on how the EDP was structured. SageMaker has historically been included in EDP scope; Bedrock is newer and its EDP inclusion depends on the negotiation period and account team commitment. The critical action is to explicitly confirm in writing which AI services are in-scope for the EDP discount percentage before signing.
Meaningful EDP discounts start at approximately $2 million in annual AWS committed spend. At this threshold, EDP discounts of 5 to 10 percent on in-scope billing are achievable. At $10 million or more in annual AWS spend, EDP discounts of 15 to 25 percent are accessible with competitive negotiation. For enterprises at the threshold level where AI spend is contributing materially to total AWS billing, the incremental AI consumption may push the total AWS spend into a higher EDP discount tier — a leverage point worth calculating before committing to the next EDP term.
AWS ML Savings Plans Versus EDP: What to Use When
Both ML Savings Plans and EDP discounts can apply to SageMaker spend, but they operate differently and are not mutually exclusive. ML Savings Plans deliver up to 64 percent discount on the committed hourly SageMaker spend — a deep, service-specific discount tied directly to SageMaker usage. EDP provides a broader billing percentage discount across all in-scope AWS services including SageMaker, but typically at a lower percentage rate.
The interaction between the two mechanisms is important: ML Savings Plans apply first at the service level, reducing the effective SageMaker spend before EDP is applied. Organisations that purchase Savings Plans and then stack an EDP discount on top get compounding benefit — the ML Savings Plan reduces the SageMaker cost, and the EDP further reduces the post-Savings-Plan billing. Modelling this stacking effect is part of the commercial optimisation analysis that most enterprises do not complete before signing EDP agreements.
The decision framework: purchase ML Savings Plans for the predictable floor of SageMaker usage (typically 50 to 70 percent of baseline consumption). Keep training spikes on-demand or use Spot Instances for training jobs that tolerate interruption. Negotiate EDP to explicitly include both SageMaker and Bedrock in-scope, and use the total AI spend growth trajectory as a lever for higher EDP discount rates at the next renewal.
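The stacking effect described above is worth modelling explicitly before signing either commitment. A minimal sketch, with the monthly spend, Savings Plan coverage and discount, and EDP percentage all chosen as illustrative assumptions:

```python
# Sketch of Savings Plan + EDP stacking: the ML Savings Plan discounts the
# covered portion of SageMaker spend first; the EDP percentage then applies
# to the post-Savings-Plan bill. All inputs are illustrative assumptions.

def stacked_cost(on_demand_monthly: float, sp_coverage: float,
                 sp_discount: float, edp_discount: float) -> float:
    covered = on_demand_monthly * sp_coverage * (1 - sp_discount)
    uncovered = on_demand_monthly * (1 - sp_coverage)
    return (covered + uncovered) * (1 - edp_discount)

# $100k/month on-demand equivalent, 60% covered at a 40% SP discount, 10% EDP.
net = stacked_cost(100_000, sp_coverage=0.6, sp_discount=0.40, edp_discount=0.10)
print(f"Net monthly cost: ${net:,.0f} (vs $100,000 pure on-demand)")
```

Because the EDP applies to the already-discounted bill, raising Savings Plan coverage reduces the absolute dollar value of the EDP discount, which is one reason the two mechanisms need to be modelled together rather than negotiated in isolation.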
Governance Structures That Prevent AI Cost Overruns
Commercial negotiation alone cannot contain AWS AI spend. The consumption billing model means costs respond to engineering decisions in near-real time, and procurement teams that are only involved at contract renewal have no meaningful control over the variable component. Effective AI cost governance requires three organisational interventions that sit above the commercial framework.
Model cost awareness at the engineering level: Engineers building AI features need visibility into the per-request, per-token costs of the models they are using. Teams that select models based on capability without cost awareness consistently over-spend. Embedding cost metrics into the model selection criteria — alongside accuracy, latency, and safety — is the prerequisite for engineering-level cost discipline. Tagging all Bedrock and SageMaker usage by team, project, and model enables this visibility in AWS Cost Explorer.
Budget alerts calibrated to consumption patterns: AWS Budget alerts should be configured at the service level (Bedrock, SageMaker), the team level (via cost allocation tags), and the model level (via Bedrock model-specific tagging). A $50,000 monthly Bedrock budget alert does not provide early warning if a single new feature consumes $40,000 in its first week of deployment. Alert thresholds calibrated to daily and weekly burn rates are more operationally useful than monthly totals for consumption-billed services.
Architecture review checkpoints for AI features: Requiring an architecture review that includes a cost model before any new AI feature goes to production is the most effective upstream control. The review should cover model selection rationale, expected token volume at scale, batch versus real-time inference justification, data egress paths, and Provisioned Throughput requirements. Teams that complete this review consistently deliver AI features within cost projections; teams that deploy first and optimise later consistently overshoot.
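The burn-rate calibration point above can be sketched as a simple projection check. In practice the daily figures would come from Cost Explorer; the budget and spend series here are illustrative.

```python
# Sketch of a burn-rate alert: flag when month-to-date spend plus the
# trailing daily rate projects past the monthly budget. Data is illustrative.

MONTHLY_BUDGET = 50_000.0

def projected_month_spend(daily_spend: list[float], days_in_month: int = 30) -> float:
    """Project month-end spend from actuals plus the trailing-3-day rate."""
    spent = sum(daily_spend)
    trailing = sum(daily_spend[-3:]) / min(len(daily_spend), 3)
    return spent + trailing * (days_in_month - len(daily_spend))

# Week one of the month: a new feature lifts daily burn from ~$1k to ~$6k.
week_one = [1000, 1100, 1050, 5800, 6100, 6000, 5900]
projection = projected_month_spend(week_one)
if projection > MONTHLY_BUDGET:
    print(f"ALERT: projected ${projection:,.0f} exceeds ${MONTHLY_BUDGET:,.0f} budget")
```

A monthly-total alert would stay silent through all of week one in this scenario; the daily-rate projection fires within days of the new feature shipping.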
Six Priority Actions for Enterprise AWS AI Buyers
1. Audit current EDP to confirm AI service scope: Before your next EDP renewal, request written confirmation from your AWS account team of which AI services (Bedrock, SageMaker, Comprehend, Rekognition) are in-scope for the EDP discount percentage. Negotiate explicit inclusion of Bedrock if it is currently excluded.
2. Purchase SageMaker Savings Plans based on measured baseline: Use three to six months of production SageMaker Cost Explorer data to identify your consistent hourly spend floor. Purchase Savings Plans to cover 70 to 80 percent of that baseline. Leave training spikes and new project capacity on-demand or Spot.
3. Defer Bedrock Provisioned Throughput until traffic is validated: Only purchase Provisioned Throughput commitments after confirmed production traffic data demonstrates stable model unit requirements. A 90-day observation window for new AI features before committing to throughput capacity is a reasonable minimum.
4. Implement per-model cost tagging in Bedrock: Configure Cost Allocation Tags at the Bedrock model and application level from the first day of deployment. Retroactive cost attribution across models and teams is difficult and the absence of tagging makes commercial optimisation discussions with AWS account teams harder to substantiate.
5. Model egress costs for all AI data pipelines: Before scaling any AI workload, map the data flow: where does training data originate, where does it flow during processing, and where do inference outputs go? Identify cross-region and cross-service data transfers that will be metered at AWS data transfer rates. Model the egress cost at production scale as part of the AI feature cost projection.
6. Use competitive leverage at EDP renewal: Azure OpenAI Service and Google Cloud Vertex AI provide credible enterprise-grade alternatives for most Bedrock use cases. Demonstrating active evaluation of these alternatives in the 90-day window before EDP renewal provides the commercial leverage required to negotiate explicit AI service inclusion, higher EDP discount tiers, and egress cost concessions. Contact Redress Compliance for EDP benchmark data against peer organisations at similar AWS spend levels.