Why the ML Platform Decision Is a Licensing Decision

Enterprise ML platform selection is increasingly treated as a purely technical decision — which platform has the best MLOps capabilities, the most model serving options, or the most convenient integration with existing data infrastructure. The commercial dimension is frequently an afterthought, addressed only when the first month's bill arrives.

This is a mistake. All three major platforms — AWS SageMaker, Azure Machine Learning, and Google Vertex AI — use consumption-based pricing models that create significant budget unpredictability at scale. All three carry vendor lock-in implications that affect architectural decisions for years after the initial selection. And all three have varying degrees of integration with foundational GenAI model providers — OpenAI, Anthropic, and Google's own models — that affect both the total cost and the lock-in profile of the GenAI layer above the ML platform.

Organisations that evaluate ML platforms without explicit commercial analysis consistently overspend, underestimate lock-in, and arrive at renewal with weaker negotiating positions than necessary.

AWS SageMaker: Pricing Model and Lock-in Profile

SageMaker operates on a fully consumption-based pricing model with no mandatory subscription or licensing fee. The principal cost dimensions are notebook instance hours, training instance hours, inference endpoint instance hours, storage, and data processing for features such as SageMaker Pipelines, Model Monitor, and Feature Store.

Where SageMaker Costs Accumulate

The most significant and frequently underestimated SageMaker cost is persistent inference endpoints. A model deployed on a SageMaker real-time endpoint runs continuously unless explicitly deleted, regardless of whether it is receiving traffic. Organisations that deploy models for development or testing and fail to tear down endpoints accumulate ongoing instance costs 24 hours a day. Enterprise SageMaker deployments with multiple endpoints — a common pattern in teams running separate development, staging, and production environments — routinely incur idle endpoint costs of $3,000 to $8,000 per month.
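To make the idle-endpoint exposure concrete, the arithmetic can be sketched in a few lines of Python. The hourly rate and endpoint mix below are illustrative assumptions, not current AWS list prices:

```python
def monthly_endpoint_cost(hourly_rate: float, instance_count: int = 1,
                          hours_per_month: float = 730.0) -> float:
    """Cost of a real-time endpoint that runs continuously,
    whether or not it receives any traffic."""
    return hourly_rate * instance_count * hours_per_month

# Illustrative rates only -- verify against the current SageMaker pricing page.
envs = {
    "dev":     monthly_endpoint_cost(1.21, 1),  # one GPU-class endpoint
    "staging": monthly_endpoint_cost(1.21, 1),
    "prod":    monthly_endpoint_cost(1.21, 2),  # two instances behind the endpoint
}
total = sum(envs.values())
print(f"always-on monthly endpoint cost: ${total:,.0f}")
```

Even this modest three-environment footprint lands in the lower end of the $3,000 to $8,000 range; a handful of forgotten development endpoints pushes it higher.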

Training costs are more predictable because training jobs are bounded in duration, but large training runs on GPU instances (p3.8xlarge at approximately $12.24 per hour, p4d.24xlarge at approximately $32.77 per hour) accumulate rapidly. Multi-experiment training regimes without disciplined cost controls frequently generate $10,000 to $30,000 in monthly training charges that were not anticipated at project initiation.
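The same arithmetic applies to training. A short sketch, using the approximate p4d.24xlarge rate quoted above and an assumed sweep size, shows how quickly multi-experiment regimes reach the five-figure range:

```python
def training_run_cost(hourly_rate: float, hours: float,
                      instance_count: int = 1) -> float:
    """On-demand cost of a single bounded training job."""
    return hourly_rate * hours * instance_count

# Assumed workload: a 40-run hyperparameter sweep, 8 hours per run,
# on a p4d.24xlarge-class instance at ~$32.77/hour (illustrative rate).
sweep = 40 * training_run_cost(32.77, 8)
print(f"monthly sweep cost: ${sweep:,.0f}")
```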

SageMaker Savings Plans and EDP

SageMaker Machine Learning Savings Plans provide discounts of up to 64 percent on SageMaker compute in exchange for a one-year or three-year hourly spend commitment. The structure mirrors EC2 Savings Plans: the commitment covers any SageMaker compute regardless of instance type or region, providing flexibility while delivering material savings for consistent usage.

For organisations spending $100,000 or more per month on SageMaker, negotiating SageMaker commitments as part of an AWS Enterprise Discount Programme agreement is advisable. EDP discounts stack with Savings Plans, compounding savings across the ML spend. Meaningful EDP discounts begin at approximately $2 million in annual committed AWS spend.

Data egress charges are the most common surprise cost in SageMaker environments. Downloading training data from S3 to SageMaker training instances, transferring model artefacts across regions, and exporting inference results to non-AWS destinations all generate egress charges at standard AWS data transfer rates ($0.09 per GB for internet-destined traffic). ML workloads with large dataset transfers between S3 and training instances in the same region benefit from the fact that S3-to-EC2 and S3-to-SageMaker transfers within the same region are free — but cross-region transfers are not.

SageMaker Vendor Lock-in Assessment

SageMaker's lock-in profile is significant for organisations that adopt its proprietary abstractions. SageMaker Pipelines, SageMaker Feature Store, SageMaker Model Registry, and SageMaker Experiments are AWS-proprietary services with no direct equivalents on other platforms. Workloads built natively on these services require substantial migration effort to move to Azure or Google Cloud. Organisations using SageMaker primarily as managed compute for containerised training and inference — with MLflow or Kubeflow for orchestration and model management — have substantially lower lock-in exposure.

In one engagement, a global financial services firm selected Azure Machine Learning primarily on the basis of existing Microsoft EA coverage, without modelling SageMaker or Vertex AI costs at their anticipated scale. Twelve months later, AML compute costs were running 2.4x the original budget. Redress conducted a cross-platform cost model and negotiated a Savings Plan uplift that brought costs within 10% of the original projection — the advisory fee was less than 6% of the annual overspend identified.

Organisations that want independent analysis and negotiation support for AWS Marketplace strategy, EDP structuring, and procurement optimisation work with our AWS contract negotiation specialists. Redress Compliance is 100% buyer-side — no vendor commissions, no referral fees.

Need independent advice on your ML platform commercial strategy?

We provide platform-neutral enterprise cloud AI advisory.
Talk to an Advisor →

Azure Machine Learning: Pricing Model and GenAI Integration

Azure Machine Learning pricing follows the same consumption-based structure as SageMaker. There is no platform fee for Azure ML itself — costs are driven by the Azure virtual machines used for training and inference, storage, and managed services such as Azure ML Pipelines, Azure ML Model Registry, and Azure ML Compute Clusters.

Azure ML Cost Structure

Azure ML's primary cost advantage over SageMaker is its tighter integration with Azure Reserved Instance pricing. Azure 1-year VM reservations on compute-optimised or GPU instances save approximately 42 percent versus on-demand rates. Unlike SageMaker's proprietary Savings Plans, Azure ML reservations apply at the underlying Azure VM level, meaning they can be shared across Azure ML workloads and any other Azure VM-based services. This makes Azure ML's commitment discount structure more transferable and less vendor-specific.

Azure ML also supports serverless compute clusters that scale to zero when idle, eliminating persistent idle compute costs that are SageMaker's most common waste source. Organisations with highly variable training workloads — bursty experimentation followed by periods of inactivity — typically find Azure ML's auto-scaling compute clusters more cost-efficient than equivalent SageMaker configurations.
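Scale-to-zero is declared directly in the cluster definition. A minimal Azure ML CLI v2 YAML sketch (the cluster name and VM size are illustrative) looks like this:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: gpu-train-cluster
type: amlcompute
size: Standard_NC6s_v3          # illustrative GPU SKU
min_instances: 0                # scale to zero when idle -- no standing cost
max_instances: 4
idle_time_before_scale_down: 1800   # seconds of idleness before scale-down
```

Created with `az ml compute create -f cluster.yml`, a cluster defined this way bills only while jobs are running, which is precisely the behaviour a persistent SageMaker endpoint lacks.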

The Azure OpenAI Integration Lock-in Risk

Azure ML's most significant lock-in vector in the current market is its privileged access to OpenAI's models through Azure OpenAI Service; among the hyperscalers, only Microsoft offers OpenAI models as a first-party service. Azure OpenAI provides access to GPT-4, GPT-4o, DALL-E, and other OpenAI models through Azure's commercial terms and compliance framework, making it the preferred access route for regulated enterprises.

However, organisations that build Azure ML workflows tightly coupled to Azure OpenAI Service endpoints are simultaneously locked into Azure as an ML platform and into Microsoft as the gatekeeper for OpenAI model access. Direct OpenAI API access offers the same models but outside Azure's compliance and governance layer — a meaningful difference for regulated industries. The pricing comparison matters: Azure OpenAI and direct OpenAI pricing are broadly similar for standard model tiers, but Azure OpenAI offers provisioned throughput (PTU) contracts that provide predictable cost and capacity guarantees that the direct OpenAI API does not. Enterprises deploying OpenAI models at scale should always compare Azure OpenAI PTU pricing against direct OpenAI enterprise agreements before committing.

OpenAI enterprise agreements carry lock-in provisions that require careful review. Long-term OpenAI enterprise contracts include volume commitments, data retention provisions, and model version guarantees that create exit friction. Enterprises should ensure contract flexibility for model substitution — the ability to swap OpenAI models for Anthropic Claude, Google Gemini, or open-source alternatives — before signing multi-year OpenAI enterprise commitments through any access channel.

Azure ML Lock-in Assessment

Azure ML's lock-in profile is lower than SageMaker's for organisations using it as managed compute. Azure ML supports MLflow natively and provides an open model registry format. The primary lock-in risk is integration depth with the Microsoft stack — Azure DevOps, Azure Active Directory, Microsoft Purview for data governance, and Azure Monitor for observability. Organisations deeply integrated with these services find migration to AWS or Google Cloud ML infrastructure costly at the organisational level even if the ML workloads themselves are portable.

Google Vertex AI: Pricing Model and Unique Characteristics

Vertex AI pricing combines per-instance-hour compute charges for training and serving with managed service charges for Vertex AI Pipelines, Vertex AI Feature Store, and Vertex AI Model Registry. Google's unique offering is hardware flexibility through TPU access — Tensor Processing Units provide significantly better price-performance for large model training than GPU alternatives, with TPU v5 instances offering up to 10x improvement in training throughput for compatible large language model workloads.

Vertex AI Cost Structure

Vertex AI's sustained use discounts (SUDs) automatically apply a discount of up to 30 percent for instances that run for more than 25 percent of a calendar month. Unlike AWS and Azure, which require explicit commitment purchases for similar discounts, Google applies SUDs automatically with no action required. This makes Vertex AI's cost model more predictable for steady-state inference workloads that run continuously.
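The tiered mechanics can be sketched as follows. The 100/80/60/40 percent tier multipliers reflect the classic Compute Engine sustained-use schedule and should be verified against current Google Cloud documentation before use in budgeting:

```python
def sud_effective_cost(base_hourly: float, hours_used: float,
                       hours_in_month: float = 730.0) -> float:
    """Simplified sustained-use discount model: each successive quarter
    of the month's usage is billed at 100%, 80%, 60%, then 40% of the
    base rate. Assumed tier schedule -- verify against GCP docs."""
    quarter = hours_in_month / 4.0
    cost, remaining = 0.0, hours_used
    for multiplier in (1.0, 0.8, 0.6, 0.4):
        in_tier = min(remaining, quarter)
        cost += in_tier * base_hourly * multiplier
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost

full_month = sud_effective_cost(1.0, 730.0)
print(f"effective rate, full month: {full_month / 730.0:.2f}")  # 0.70 -> 30% off
```

Note how the discount phases in: a workload running half the month averages only a 10 percent discount, because only its second tranche of hours reaches the 80 percent tier.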

However, Vertex AI's consumption billing creates the same budget unpredictability as its competitors for variable workloads. Organisations that run heavy batch prediction jobs, use Vertex AI AutoML for automated model training, or deploy multiple online prediction endpoints without governance controls routinely encounter bills that significantly exceed initial estimates. The consumption billing model means that cost overruns can accumulate before monthly billing reviews identify them.

Vertex AI Lock-in: The Google Ecosystem Trap

Vertex AI's lock-in profile is strongest for organisations that adopt Google's full data and AI stack: BigQuery for data warehousing, Dataflow for data pipelines, Vertex AI Feature Store for feature management, and Google's own foundational models (Gemini, PaLM 2) through Vertex AI's model garden. Each layer adds integration depth that increases the cost of migrating to alternative platforms.

Google's foundational model access through Vertex AI provides access to Gemini models at enterprise-grade terms — a significant advantage for organisations requiring Google's multimodal capabilities or preferring Google's data handling commitments over OpenAI's. However, Vertex AI's endpoint pricing lacks scale-to-zero for standard deployments, meaning always-on endpoints generate mandatory idle costs — a structural disadvantage versus Azure ML's serverless compute for cost-conscious deployments.

Consumption billing creates budget unpredictability across all three ML platforms. Organisations that actively govern ML compute — setting budgets, enforcing auto-shutdown policies, and reviewing commitment coverage — consistently achieve 30 to 40 percent lower ML infrastructure costs than equivalent organisations without these disciplines.

Side-by-Side: Enterprise Decision Framework

For enterprise ML platform selection, the following factors drive the commercial recommendation:

  • Primary cloud provider alignment: Organisations with significant AWS EDP commitments should strongly favour SageMaker to benefit from stacking SageMaker Savings Plans with EDP discounts. The same logic applies to Azure and Google Cloud commitment relationships.
  • OpenAI model access requirements: Enterprises requiring GPT-4 or equivalent OpenAI models at scale, with enterprise compliance terms, should evaluate Azure OpenAI via Azure ML versus direct OpenAI enterprise agreements. Azure OpenAI's PTU contracts provide cost predictability that the direct API's consumption billing does not.
  • Training workload pattern: Bursty experimental training with frequent idle periods favours Azure ML's auto-scaling clusters. Large sustained training runs at GPU or TPU scale favour Vertex AI TPU access or SageMaker spot instances for training cost reduction.
  • Inference workload pattern: Always-on inference endpoints favour Vertex AI's sustained use discounts. Variable traffic inference favours SageMaker Serverless Inference or Azure ML's serverless compute.
  • Lock-in risk appetite: Organisations with multi-cloud strategies and high lock-in sensitivity should use open standards (MLflow, Kubernetes, containerised training) regardless of which managed ML platform is selected as the primary environment.

Managing Consumption Billing Unpredictability

Consumption billing is the structural characteristic that makes ML platform costs difficult to predict and govern. Unlike traditional software licensing with fixed annual fees, consumption billing means that a single poorly governed training run, an endpoint left running overnight, or an AutoML job triggered without cost constraints can generate unexpected costs that dwarf the original budget.

Enterprise ML cost governance requires budget alerts at both account and project level on all three platforms; auto-shutdown policies for development and testing endpoints; mandatory cost review as part of ML project gating; and specific approval workflows for GPU or TPU training jobs above defined cost thresholds. Organisations that implement these controls consistently report 25 to 40 percent lower ML infrastructure costs than those that rely on monthly bill reviews alone.
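An auto-shutdown policy of this kind reduces to a small, testable rule. The environment tags and idle threshold below are assumptions to be adapted to local tagging standards; the same decision function can then drive whichever platform API actually tears the endpoint down:

```python
from datetime import datetime, timedelta, timezone

def should_shut_down(env_tag: str, last_invocation: datetime,
                     now: datetime,
                     idle_limit: timedelta = timedelta(hours=2)) -> bool:
    """Illustrative governance rule: tear down non-production endpoints
    that have been idle longer than the limit. Production endpoints are
    never auto-deleted."""
    if env_tag == "prod":
        return False
    return (now - last_invocation) > idle_limit

now = datetime(2025, 1, 15, 18, 0, tzinfo=timezone.utc)
print(should_shut_down("dev", now - timedelta(hours=5), now))   # True
print(should_shut_down("prod", now - timedelta(hours=5), now))  # False
```

Encoding the policy as code, rather than as a runbook step, is what makes the "consistently report 25 to 40 percent lower costs" outcome repeatable: the rule runs on a schedule instead of relying on engineers remembering to clean up.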

The GenAI layer above the ML platform adds a second consumption billing exposure. Model inference calls to OpenAI, Anthropic, Google, or AWS Bedrock models are all consumption-billed per token. At small scale, per-token billing is economical. At enterprise scale — high-volume document processing, customer-facing applications with millions of daily interactions, or developer productivity tools deployed to thousands of users — consumption billing creates genuine budget unpredictability that requires token-level monitoring, cost allocation, and committed spend negotiations with model providers.
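Token-level exposure is worth modelling before any committed-spend negotiation. The volumes and per-million-token prices below are purely illustrative; substitute each provider's current price list:

```python
def monthly_token_cost(requests_per_day: int, tokens_in: int, tokens_out: int,
                       price_in_per_m: float, price_out_per_m: float,
                       days: int = 30) -> float:
    """Monthly spend for a per-token-billed model. Prices are expressed
    per million tokens; all figures here are assumptions."""
    daily = (requests_per_day * tokens_in / 1e6) * price_in_per_m \
          + (requests_per_day * tokens_out / 1e6) * price_out_per_m
    return daily * days

# Assumed workload: 1M daily interactions, 500 input / 200 output tokens each,
# at an illustrative $3 per million input and $15 per million output tokens.
cost = monthly_token_cost(1_000_000, 500, 200, 3.0, 15.0)
print(f"monthly token spend: ${cost:,.0f}")
```

At this assumed volume the monthly bill reaches six figures, which is exactly the scale at which committed-spend agreements and token-level cost allocation stop being optional.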

Key Recommendations for Enterprise Buyers

  • Never evaluate ML platforms on technical capability alone. Commercial lock-in, consumption billing governance, and integration with existing cloud commitments have material long-term cost implications that technical evaluations miss.
  • Map SageMaker spend to AWS EDP and Savings Plans. For AWS-primary organisations, SageMaker Savings Plans stacking with EDP discounts provides the most efficient overall ML spend profile.
  • Scrutinise Azure OpenAI lock-in before committing. Azure ML's integration with Azure OpenAI is commercially compelling but creates dual lock-in to Microsoft and OpenAI. Always maintain the contractual ability to substitute models from Anthropic, Google, or open-source alternatives.
  • Budget governance is not optional with consumption billing. All three platforms will generate materially higher costs than planned without active cost governance. Implement budget alerts, auto-shutdown policies, and cost allocation before deployment at scale.
  • Negotiate foundational model access separately from platform access. OpenAI, Anthropic, and Google enterprise agreements for model API access should be negotiated independently of the cloud ML platform relationship to preserve price transparency and switching leverage.