The Two Pricing Models: Pay-As-You-Go and PTU
Azure OpenAI PTU (Provisioned Throughput Units) costs 3 to 7 times more than pay-as-you-go for low-utilisation workloads — but delivers 60 to 80% savings at scale. The pricing structure has two fundamentally different models, and choosing the wrong one is the most expensive mistake enterprises make when deploying AI at scale. The first is Pay-As-You-Go (PAYG), where you are charged per token processed — both input tokens and output tokens, at rates that vary by model. The second is Provisioned Throughput Units (PTU), where you commit to a reserved capacity allocation and pay an hourly rate regardless of actual utilisation.
PAYG is the right choice for exploratory workloads, development environments, pilot projects, and any use case where demand is unpredictable or low-volume. It offers complete flexibility — there are no minimum commitments and billing scales linearly with use. The limitation is that PAYG deployments are subject to rate limits and do not guarantee response latency, which makes them unsuitable for customer-facing production workloads where predictable throughput matters.
PTU resolves the latency and throughput predictability problem by pre-provisioning dedicated model capacity. You specify the number of PTUs you need — each PTU represents a certain amount of model processing throughput per minute — and Microsoft reserves that capacity exclusively for your deployment. The base rate is $2 per PTU per hour, but this headline number is rarely what large deployments pay in practice. Monthly reservations provide significant discounts, and annual reservations can reduce the effective rate by 82 to 85% relative to the hourly on-demand rate. The tradeoff is commitment: you are paying for that capacity whether or not your models are actively processing requests.
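The arithmetic above can be sketched as a small cost function. The $2-per-PTU-per-hour base rate comes from the figures in this article; the 83% annual discount is an illustrative assumption within the 82 to 85% range quoted, not a published rate, and the 720-hour month (24 × 30) matches the monthly figures used elsewhere in this piece.

```python
# Sketch: monthly cost of a PTU deployment under different commitment terms.
# The $2/PTU/hour base rate is from the article; the 83% discount is an
# illustrative assumption within the quoted 82-85% annual-reservation range.

HOURS_PER_MONTH = 720  # 24 hours x 30 days, matching the article's monthly figures

def ptu_monthly_cost(ptus: int, hourly_rate: float = 2.0,
                     discount: float = 0.0) -> float:
    """Monthly cost of a PTU reservation after a commitment discount."""
    return ptus * hourly_rate * HOURS_PER_MONTH * (1 - discount)

# 100 PTUs at the on-demand hourly rate vs. an assumed 83% annual discount
on_demand = ptu_monthly_cost(100)                  # 100 * 2 * 720 = 144,000
annual = ptu_monthly_cost(100, discount=0.83)      # ~24,480
```

The gap between the two numbers is why the "headline" hourly rate is rarely what committed enterprise deployments actually pay.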
The PTU Commitment Decision: When It Makes Sense
PTU reservations make financial sense at a specific utilisation threshold that Microsoft does not state prominently in its documentation. The general rule is that PTU commitments become cost-effective only when you consistently maintain 60 to 70% or higher utilisation of the provisioned capacity. Below that threshold, PAYG at per-token rates will cost less than the PTU reservation charges for the same volume of work.
The utilisation calculation requires understanding your workload's peak-to-average ratio. A workload with highly variable demand — bursting to high throughput during business hours but near-idle overnight and on weekends — may achieve average utilisation well below the PTU break-even threshold even if peak demand requires PTU-level capacity. For these workloads, a hybrid approach is often more cost-effective: a baseline PTU reservation sized for average sustained load, supplemented by PAYG overflow capacity for peak periods. Microsoft supports this architecture through deployment overflow settings, but it requires active capacity planning rather than a single PTU commitment sized for peak demand.
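The three strategies described above — pure PAYG, a PTU reservation sized for peak, and a hybrid baseline-plus-overflow — can be compared with a rough cost model. Everything here is an assumption for illustration: the PTU-to-token conversion, the blended per-1k-token PAYG rate, and the workload profile; only the $2/PTU/hour rate comes from the article. Substitute your model's published numbers before using this for a real sizing decision.

```python
# Sketch: comparing three capacity strategies for a bursty workload.
# The PTU<->token conversion and PAYG rate are illustrative assumptions;
# only the $2/PTU/hour rate comes from the article's figures.
import math

HOURS_PER_MONTH = 720  # 24 x 30

def monthly_cost(avg_tokens_per_hour: float,
                 peak_tokens_per_hour: float,
                 tokens_per_ptu_hour: float = 50_000,  # assumed conversion
                 ptu_hourly_rate: float = 2.0,         # from the article
                 paygo_rate_per_1k: float = 0.01,      # assumed blended rate
                 baseline_ptus: int = 0) -> dict:
    total_tokens = avg_tokens_per_hour * HOURS_PER_MONTH
    pure_paygo = total_tokens / 1000 * paygo_rate_per_1k

    # PTU reservation sized for peak demand (Microsoft's default guidance)
    peak_ptus = math.ceil(peak_tokens_per_hour / tokens_per_ptu_hour)
    ptu_at_peak = peak_ptus * ptu_hourly_rate * HOURS_PER_MONTH

    # Hybrid: baseline PTUs absorb sustained load; the rest spills to PAYG
    ptu_capacity = baseline_ptus * tokens_per_ptu_hour * HOURS_PER_MONTH
    overflow_tokens = max(0.0, total_tokens - ptu_capacity)
    hybrid = (baseline_ptus * ptu_hourly_rate * HOURS_PER_MONTH
              + overflow_tokens / 1000 * paygo_rate_per_1k)
    return {"paygo": pure_paygo, "ptu_at_peak": ptu_at_peak, "hybrid": hybrid}

# Bursty profile: 100k tokens/hour average, peaking at 500k tokens/hour
costs = monthly_cost(100_000, 500_000, baseline_ptus=2)
```

Under these toy numbers, provisioning PTUs for peak costs several times more per month than a small baseline reservation with PAYG overflow — which is exactly the peak-versus-average trap the paragraph above describes.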
What makes the PTU decision harder is that Microsoft's own PTU calculator and sizing guidance tends to recommend provisioning for peak demand rather than average demand. This is financially rational from Microsoft's perspective — PTU reservations generate committed revenue regardless of utilisation — but it is not the approach that minimises cost for buyers with variable workloads. The correct framing for enterprise buyers is to model actual workload demand profiles over at least 30 days before making a PTU commitment, and to start with monthly commitments (which provide meaningful but not maximum discounts) before committing to annual terms.
Fine-Tuning: The $5,000-Per-Month Trap Most Teams Don't See Coming
Fine-tuning is where Azure OpenAI's pricing model catches the largest number of enterprise teams off-guard. The training run itself — the compute-intensive process of adapting a base model to a specific dataset — is priced at a per-token rate that is straightforward and bounded. A fine-tuning run on a modest dataset completes in hours and costs tens to hundreds of dollars. This is the number that teams focus on when evaluating fine-tuning feasibility.
What is not prominently featured in the Azure OpenAI pricing documentation is the deployment hosting fee. Once a fine-tuned model is deployed — meaning it has been published to an inference endpoint so it can receive requests — Azure charges $7 per hour for hosting that model. This fee accrues continuously, 24 hours a day, 7 days a week, regardless of whether the model is receiving any inference requests. At $7 per hour, a single deployed fine-tuned model costs $5,040 per month in hosting fees alone. Two deployed models cost $10,080 per month. These costs are entirely independent of inference activity.
The practical consequence is what practitioners have started calling zombie fine-tuned models — deployed models that were created for a proof-of-concept or pilot that never went to production, or models that were superseded by a newer version but were never decommissioned. Most organisations discover zombie fine-tuned models three to six months after deployment, by which point each model has accumulated roughly $15,000 to $30,000 in hosting fees at $5,040 per month — without a single production inference request being processed. One healthcare CTO found $7,344 per month in zombie hosting costs across two models in less than three minutes of reviewing her Azure billing dashboard — costs that had been accumulating for months beneath the noise level of the broader Azure invoice.
What Microsoft doesn't tell you: Azure OpenAI does not provide hard spend limits that prevent you from exceeding your budget. Unlike some other cloud providers, there is no native cap on AI service costs. Budget alerts can be configured in Azure Cost Management, but these are informational only — spending continues past the threshold. Enterprises deploying Azure OpenAI in production must implement proactive governance to avoid uncontrolled spend accumulation.
No Hard Spend Limits: The Budget Governance Gap
One of the most material differences between Azure OpenAI and competing AI service offerings is the absence of hard spend limits. OpenAI's own platform — accessed directly rather than through Azure — allows users to set hard monthly spend caps that prevent API calls once the cap is reached. Azure OpenAI does not offer this natively. Azure Cost Management allows budget alerts to be configured with email notifications, but these are advisory only; the service will continue to process requests and accumulate costs after the alert threshold is passed.
For enterprises with well-governed Azure environments, this gap is manageable through existing FinOps practices: budget alerts, anomaly detection, and regular cost reviews. For development teams that have been given direct Azure OpenAI access without a mature FinOps overlay, the gap creates genuine exposure. A runaway batch processing job that consumes millions of tokens over a weekend, a fine-tuned model hosting fee that no one is watching, or a PTU reservation that was sized too aggressively — these are real failure modes that Microsoft's pricing documentation does not highlight and that standard developer onboarding flows do not address.
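Because Azure provides no native hard cap, teams that need one have to build it at the application layer. Here is a minimal sketch of a client-side budget guard that estimates spend per call and refuses requests past a cap. The per-token rates are illustrative assumptions, and a production version would persist the running total and count tokens with a real tokenizer rather than trusting caller-supplied counts.

```python
# Sketch: a client-side hard spend cap, since Azure OpenAI has no native one.
# Per-token rates are illustrative assumptions; a production version would
# persist the running total and use a real tokenizer for counts.
class BudgetExceeded(Exception):
    pass

class SpendGuard:
    def __init__(self, monthly_cap_usd: float,
                 input_rate_per_1k: float = 0.01,   # assumed rate
                 output_rate_per_1k: float = 0.03): # assumed rate
        self.cap = monthly_cap_usd
        self.spent = 0.0
        self.in_rate = input_rate_per_1k
        self.out_rate = output_rate_per_1k

    def charge(self, input_tokens: int, output_tokens: int) -> float:
        """Record a call's estimated cost, or refuse it past the cap."""
        cost = (input_tokens / 1000 * self.in_rate
                + output_tokens / 1000 * self.out_rate)
        if self.spent + cost > self.cap:
            raise BudgetExceeded(f"would exceed ${self.cap:.2f} cap")
        self.spent += cost
        return cost

guard = SpendGuard(monthly_cap_usd=100.0)
guard.charge(50_000, 10_000)  # $0.50 input + $0.30 output recorded
```

Wiring `charge()` in front of every completion request turns Azure's advisory-only budget alerts into an actual stop: when `BudgetExceeded` is raised, the application fails the request instead of forwarding it to the API.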
The governance recommendation for enterprise Azure OpenAI deployments is to treat AI costs as a distinct spend category in your Azure cost management framework. Set separate budget alerts for Azure OpenAI specifically (not just the parent subscription), implement tagging requirements that attribute OpenAI costs to specific teams and use cases, and conduct a monthly review of fine-tuned model inventory to identify and decommission any models not in active production use. These steps are not onerous, but they require deliberate implementation rather than relying on defaults.
For enterprises negotiating Azure OpenAI commitments above $500,000 annually, our enterprise AI negotiation specialists provide PTU sizing analysis, commit-level discount benchmarking, and independent contract review — ensuring your Azure OpenAI investment is structured for flexibility as model pricing continues to evolve through 2026.
Need an independent Azure OpenAI cost review from Microsoft licensing advisory specialists?
We identify governance gaps, commitment mismatches, and zombie costs that standard Azure dashboards don't surface.
Azure OpenAI vs Direct OpenAI: The Cost Structure Difference
A question that comes up regularly in enterprise AI planning conversations is whether to use Azure OpenAI or access the same models directly through OpenAI's API. The models available through both channels are substantially the same, though Azure OpenAI often lags slightly behind OpenAI's own platform in model availability. The pricing differences are more nuanced than most teams realise.
For base inference (PAYG per-token pricing), Azure OpenAI and OpenAI API rates are broadly comparable, with occasional promotional periods where one is slightly cheaper than the other. The material difference emerges in fine-tuning. On OpenAI's platform, there is no hosting fee for deployed fine-tuned models — you pay only for the inference tokens you consume. This is a fundamentally different cost structure from Azure's $7-per-hour continuous hosting fee, and for organisations with multiple fine-tuned models in production, it can represent a significant ongoing cost differential in favour of the direct OpenAI platform.
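The structural difference is easy to quantify: on the direct OpenAI channel, a fine-tuned model's cost scales with usage, while on Azure a fixed hosting floor applies regardless of usage. The $7-per-hour hosting fee comes from the article; the per-token inference rate below is an illustrative assumption, taken as equal on both channels so only the hosting fee differs.

```python
# Sketch: monthly cost of one deployed fine-tuned model on each channel.
# The $7/hour Azure hosting fee is from the article; the per-token rate is
# an illustrative assumption, taken as equal on both channels.
HOURS_PER_MONTH = 720  # 24 x 30
AZURE_HOSTING_PER_HOUR = 7.0

def monthly_cost(tokens_per_month: float, rate_per_1k: float = 0.02,
                 azure: bool = False) -> float:
    inference = tokens_per_month / 1000 * rate_per_1k
    hosting = AZURE_HOSTING_PER_HOUR * HOURS_PER_MONTH if azure else 0.0
    return inference + hosting

# A lightly used fine-tuned model: 1M tokens/month
openai_direct = monthly_cost(1_000_000)              # $20 inference only
azure_channel = monthly_cost(1_000_000, azure=True)  # $20 + $5,040 hosting
```

For a lightly used model, the Azure hosting floor dominates the bill by more than two orders of magnitude — which is why the channel comparison matters most for organisations running several low-traffic fine-tuned models.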
The case for Azure OpenAI over the direct platform rests on enterprise governance and compliance features: private endpoints that keep inference traffic within the Azure network boundary, Azure Active Directory authentication integration, regional data residency guarantees, and alignment with existing Azure spend for EA commit consumption purposes. For organisations with strong data residency requirements or whose Azure OpenAI consumption can be applied against an existing Azure commitment, these benefits may outweigh the fine-tuning hosting premium. For organisations without those constraints, the fine-tuning cost differential warrants a direct comparison before defaulting to the Azure channel.
E7 and Azure OpenAI: What the Bundle Means
Microsoft's M365 E7 SKU, which became generally available on May 1, 2026 at $99 per user per month, includes Microsoft 365 Copilot as part of the bundle. It is worth being precise about what this does and does not provide in terms of Azure OpenAI access. Copilot, as included in E7, provides AI-assisted productivity capabilities within M365 applications — Teams, Word, Excel, Outlook — powered by Microsoft's backend infrastructure. End users accessing Copilot through M365 do not consume Azure OpenAI PTU or PAYG tokens from your organisation's Azure subscription in the same way that direct API usage does.
Organisations that want to build their own AI applications or automate their own workflows using Azure OpenAI models — beyond what Copilot's pre-built M365 integrations provide — are working with a separate infrastructure, a separate cost centre, and a separate pricing model. E7 does not provide free Azure OpenAI API access. E7 also includes Agent 365, but as noted earlier, Agent 365 is a governance control plane for managing agents; the actual agent compute runs through Copilot Studio or Microsoft Foundry, both of which have their own consumption-based pricing. Organisations evaluating E7 as an all-inclusive AI platform should be explicit in their analysis about which cost categories E7 actually covers and which remain as separate consumption-based charges.
Negotiating Azure OpenAI Costs in an EA Context
For organisations on a Microsoft Enterprise Agreement or Azure Commitment arrangement, Azure OpenAI consumption can typically be applied against the Azure prepayment — the annual Azure commit that qualifies for EA discounts. This is worth confirming with your Microsoft account team, as the qualifying services under Azure prepayment can change, and some newer services have had delayed eligibility. If your Azure OpenAI consumption qualifies, it means your effective Azure OpenAI spend is at the discounted EA rate, not at list price.
PTU reservations for Azure OpenAI are typically negotiable through Microsoft's standard EA amendment process, particularly for large-scale deployments. Microsoft has provided custom PTU pricing to large enterprise accounts — not as a published discount category but as a deal-specific concession to secure large AI workload commitments. Organisations in the planning phase for large Azure OpenAI PTU deployments should include their Microsoft licensing advisory specialists in the conversation before signing a reservation, as the standard Azure pricing page does not represent the floor for what is commercially achievable. Microsoft's fiscal Q4 (April through June) is the highest-leverage window for these conversations, with field representatives having maximum authority to apply consumption credits, waive reservation minimums, or provide custom PTU pricing to close competitive deals.