The Two Azure OpenAI Pricing Models: Architecture and Trade-offs

Azure OpenAI Service offers enterprise buyers two pricing architectures with meaningfully different risk and cost profiles. Understanding the structural difference between them is the starting point for every procurement decision involving Azure AI workloads.

Pay-as-you-go (PAYG) charges per token consumed — input tokens processed and output tokens generated — with pricing set per thousand tokens and varying by model. There is no upfront commitment, no guaranteed throughput, and no capital at risk if usage patterns change. PAYG workloads run on Microsoft's shared infrastructure, where throughput can be subject to rate limits during periods of high demand. This is the default consumption model and the correct choice for AI workloads in early stages of development, low-volume production use cases, and any scenario where usage patterns are genuinely unpredictable.

Provisioned Throughput Units (PTUs) reserve dedicated model processing capacity for your exclusive use within an Azure region. Each PTU provides a defined level of throughput — measured in tokens per minute — for a specific model deployment. You pay for the reserved capacity regardless of whether you use it, but in exchange you receive guaranteed throughput levels with no rate throttling and predictable performance for latency-sensitive workloads. PTUs are available on monthly or annual reservation terms with significant discounts for longer commitments.

Consumption billing creates budget unpredictability in ways that are particularly acute for GenAI workloads. Unlike traditional SaaS where seat counts are known, AI token consumption is driven by usage intensity — the length of prompts, the complexity of outputs, the number of users interacting with AI-enabled features, and the frequency of API calls from automated workflows. Enterprise GenAI deployments that begin with manageable PAYG costs frequently scale into cost ranges that demand a formal pricing model decision within six to twelve months of production deployment.

Client Outcome: In one engagement, a European financial services firm with $2.8M annual Azure OpenAI spend had committed to pay-as-you-go without modelling break-even thresholds. Redress restructured the commitment: a targeted PTU reservation for high-volume inference workloads, PAYG for variable use cases. First-year saving: $340,000. The engagement fee was under 3% of the identified saving.

PTU Pricing: What You Actually Pay

PTU pricing is structured around reserved capacity commitments. In 2024 pricing, PTU monthly reservations start at approximately $2,448 per unit per month. Annual reservations reduce this to approximately $1,500 to $1,800 per unit per month depending on model and region, a discount of roughly 25 to 40 percent against the monthly reservation rate. Commitment-free PTU capacity is also available on hourly billing at approximately $2 per PTU per hour.

The break-even analysis between PAYG and PTUs requires modelling your specific workload. However, as a general benchmark: if your monthly Azure OpenAI pay-as-you-go spend exceeds $1,800, you are at the economic threshold where PTU reservations become financially competitive. At $3,000 or more in monthly PAYG spend, PTUs with annual reservations almost certainly deliver superior economics — particularly when performance guarantees for production workloads are also factored into the comparison.
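The break-even check described above can be sketched in a few lines. The per-1K-token rates below are hypothetical GPT-4-class figures chosen for illustration, not current Azure list prices, and the $1,800 PTU figure is the annual-reservation level quoted earlier; substitute your own negotiated rates.

```python
# Sketch of a PAYG-vs-PTU break-even check. All rates are illustrative
# assumptions, not current Azure list prices.

def monthly_paygo_cost(input_tokens_m, output_tokens_m,
                       input_rate_per_1k=0.01, output_rate_per_1k=0.03):
    """Monthly PAYG cost for volumes given in millions of tokens.
    The per-1K rates are hypothetical GPT-4-class prices."""
    return (input_tokens_m * 1_000 * input_rate_per_1k
            + output_tokens_m * 1_000 * output_rate_per_1k)

def ptu_is_cheaper(paygo_monthly, ptu_units=1, ptu_monthly_rate=1_800):
    """Compare projected PAYG spend against an annual-term PTU
    reservation (the ~$1,500-$1,800/unit/month range above)."""
    return paygo_monthly > ptu_units * ptu_monthly_rate

paygo = monthly_paygo_cost(input_tokens_m=120, output_tokens_m=30)
print(f"Projected PAYG: ${paygo:,.0f}/month; PTU cheaper: {ptu_is_cheaper(paygo)}")
```

At 120M input and 30M output tokens per month, the illustrative PAYG cost lands above the $1,800 threshold, which is exactly the regime where a single-unit annual reservation starts to win.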

For high-volume production workloads where GPT-4-class models are processing sustained query volumes, a single PTU reservation at annual commitment pricing will typically deliver 50 to 70 percent lower token costs compared to PAYG. The gap is most significant for models with higher per-token PAYG rates, and narrows for lighter models where PAYG rates are already low.

"If your monthly Azure OpenAI pay-as-you-go spend consistently exceeds $1,800, PTU reservations at annual commitment pricing will almost certainly deliver superior economics — both on cost and on performance."

When Pay-As-You-Go Is the Right Choice

Pay-as-you-go is not the wrong choice in every scenario. There are genuine use cases where PAYG is the financially rational and operationally appropriate model.

PAYG is correct for AI workloads in pre-production or pilot phases where usage patterns are not yet established. Committing to PTU capacity before your actual throughput requirements are understood is a guaranteed path to wasted spend — PTUs charge for reserved capacity whether or not you use it, and undersized PTU allocations create throughput constraints that require additional reserved units to resolve. The correct sequence is to run PAYG in pilot, measure actual throughput and token consumption across representative time periods, and model forward demand before committing to PTU reservations.

PAYG is also appropriate for workloads with genuinely variable or bursty traffic patterns where average utilisation would be too low to justify the PTU reservation cost. A workload that processes high volumes for two weeks per month and low volumes for the remaining two weeks may find that PAYG is more cost-effective than paying for peak-capacity PTUs on a monthly basis.
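The arithmetic behind that bursty-workload example is worth making explicit. The daily spend figures below are invented for illustration; only the $2,448 monthly reservation figure comes from the PTU pricing discussed earlier.

```python
# Rough sketch of the bursty-workload case: a month split into a busy
# fortnight and a quiet fortnight. Daily spend figures are hypothetical.

PTU_MONTHLY = 2_448          # one unit, monthly reservation (figure cited above)
peak_daily_paygo = 110.0     # assumed PAYG spend on a busy day
quiet_daily_paygo = 15.0     # assumed PAYG spend on a quiet day

paygo_month = 14 * peak_daily_paygo + 14 * quiet_daily_paygo
print(f"PAYG for the month: ${paygo_month:,.0f} vs PTU: ${PTU_MONTHLY:,}")
# With half the month near-idle, PAYG undercuts a peak-sized reservation.
```

The point of the sketch: a reservation must be sized for peak throughput, so low average utilisation is what flips the economics back toward PAYG.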

Finally, PAYG retains maximum flexibility for procurement teams evaluating multiple AI vendors. Direct OpenAI API access, Azure OpenAI, and competing platforms like Anthropic Claude Enterprise and Google Gemini all offer consumption-based pricing. Committing to PTU capacity on Azure creates a form of lock-in — not contractual, but practical — that reduces your freedom to shift workloads to alternative providers in response to model capability improvements or pricing changes.

When Reserved Capacity (PTUs) Is the Right Choice

PTUs are the right choice when three conditions are simultaneously met: high and predictable throughput requirements, production SLA sensitivity, and cost optimisation maturity.

High throughput is the primary economic driver. PTUs provide a fixed cost per capacity unit regardless of the volume of tokens processed within that capacity envelope. For workloads processing millions of tokens per day with consistent demand patterns, the per-token effective cost at PTU pricing is dramatically lower than PAYG. This is the core financial case for reserved capacity.

Production SLA sensitivity is the second driver. PTUs provide dedicated, guaranteed throughput with no rate throttling from shared infrastructure demand. Enterprises running customer-facing AI applications — chatbots, AI-assisted customer service, real-time document analysis — cannot accept the throughput variability inherent in PAYG shared infrastructure. PTU reservations are the only way to guarantee consistent response latency at scale on Azure OpenAI.

Cost optimisation maturity means your organisation has established usage monitoring, has modelled usage growth scenarios, and is in a position to commit to a volume that will be genuinely utilised. Underutilised PTU reservations are expensive. Before committing to PTUs, validate with at least 60 to 90 days of production PAYG data that your usage patterns are both high enough and stable enough to justify the reservation.
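One way to make "high enough and stable enough" concrete is to test mean daily PAYG spend against the break-even level and use the coefficient of variation as a crude stability gate. The 60-day minimum follows the validation window above; the 0.35 volatility threshold is an assumption of this sketch, not Microsoft guidance.

```python
# Crude readiness gate for a PTU commitment, based on observed daily
# PAYG spend. Thresholds are illustrative assumptions.
from statistics import mean, stdev

def ready_for_ptu(daily_spend, breakeven_monthly=1_800, max_cv=0.35):
    """daily_spend: at least 60 days of observed PAYG spend in dollars."""
    if len(daily_spend) < 60:
        return False                        # not enough production history
    avg = mean(daily_spend)
    cv = stdev(daily_spend) / avg           # relative volatility
    high_enough = avg * 30 > breakeven_monthly
    stable_enough = cv <= max_cv
    return high_enough and stable_enough
```

A workload that clears the spend threshold but fails the volatility gate is a signal to stay on PAYG, or to reserve PTUs only for the stable baseline and let PAYG absorb the bursts.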

Get independent analysis of your Azure OpenAI cost model

We assess PAYG vs PTU break-even for your specific workloads and negotiate Azure AI pricing within your EA framework.
Download the Guide →

Azure OpenAI vs Direct OpenAI: The Pricing Model Interaction

The PTU vs PAYG decision does not exist in isolation from the broader question of Azure OpenAI vs direct OpenAI. These two procurement channels have different pricing structures, contractual architectures, and discount frameworks — and the interaction between them matters for enterprise procurement strategy.

Direct OpenAI API access is priced on a per-token basis through OpenAI's own commercial terms. Enterprise agreements with direct OpenAI can include negotiated discounts below list rates, but the pricing is independent of any other vendor relationship. Azure OpenAI, by contrast, is accessed through Microsoft Azure — meaning your Azure OpenAI consumption can be incorporated into your existing Azure Enterprise Agreement or Microsoft Customer Agreement discount framework. For enterprises with substantial committed Azure spend, this can result in effective per-token pricing on Azure OpenAI that is materially lower than equivalent direct OpenAI pricing, particularly when PTU reservation discounts are layered on top of EA-level pricing adjustments.

The procurement implication is that enterprises with large Azure footprints should model Azure OpenAI pricing inclusive of EA discount tiers, PTU reservation savings, and any Azure commitment credits before accepting direct OpenAI enterprise pricing as the baseline for comparison. In many cases, the fully loaded Azure OpenAI cost for high-volume workloads — including PTU reservations negotiated within an EA framework — is substantially lower than equivalent direct OpenAI enterprise pricing for the same model and volume.

Negotiation Strategy for Azure OpenAI Capacity

Enterprise buyers have meaningful negotiating room on Azure OpenAI pricing, particularly when integrating AI capacity commitments into broader Azure Enterprise Agreement or Microsoft Customer Agreement renewals.

The primary lever is commitment aggregation. Microsoft's Azure discount framework rewards organisations that commit to aggregate Azure consumption across the estate. AI workloads, when modelled and included in an Azure consumption commitment, contribute to the total Azure committed spend tier — which determines the baseline discount level applied across all Azure services including Azure OpenAI. Enterprises that negotiate AI capacity as part of a broader Azure renewal, rather than as a standalone procurement, access more favourable economics.

For PTU reservations specifically, negotiation should focus on three areas. First, commitment term flexibility: negotiate the right to amend PTU capacity commitments mid-term in response to workload changes, either through capacity exchanges or credit mechanisms for unused PTUs. Second, price protection: for multi-year commitments, negotiate a cap on PTU pricing at renewal. Third, capacity pooling: for organisations with multiple Azure subscriptions or business units, negotiate the ability to pool PTU capacity across subscriptions to maximise utilisation efficiency.

Enterprises with direct OpenAI agreements and Azure OpenAI deployments should coordinate both procurement processes. The existence of direct OpenAI commitment spend provides leverage in Azure negotiations, and vice versa. Demonstrating that AI workloads are actively being evaluated across channels — direct OpenAI, Azure OpenAI, other cloud AI providers — creates competitive pressure that benefits the buyer in both relationships.

Five-Step Decision Framework for Azure OpenAI Procurement

To summarise the analytical framework for choosing between PAYG and PTU capacity for Azure OpenAI enterprise workloads:

Step 1 — Establish baseline usage: Run all new AI workloads on PAYG for a minimum of 60 days. Capture actual token consumption, throughput peaks and troughs, and hourly rate patterns. This data is the foundation for all subsequent decisions.

Step 2 — Model workload patterns: Categorise workloads by usage pattern (consistent, bursty, occasional) and by SLA sensitivity (latency-critical production vs background processing). High-volume, consistent, latency-critical workloads are PTU candidates. Low-volume, bursty, or latency-tolerant workloads are PAYG candidates.

Step 3 — Calculate break-even: For each PTU candidate workload, calculate the monthly PAYG equivalent cost at projected token volumes. Compare against PTU monthly and annual reservation pricing. Workloads where annual PTU pricing is lower than projected PAYG within the reservation term should be migrated to PTUs.

Step 4 — Incorporate into Azure EA strategy: Bundle PTU reservation commitments into your Azure EA or MCA renewal negotiation. This maximises the discount framework benefits and positions AI capacity as part of the total Azure commitment, not as a standalone purchase.

Step 5 — Build in governance: Establish quarterly consumption reviews comparing actual PTU utilisation against reserved capacity. Underutilised PTUs should trigger either workload migration to those deployments or capacity reduction at the next available amendment window.
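The governance loop in Step 5 reduces to a simple quarterly check: compare observed utilisation against reserved capacity and flag a decision. The 60 percent and 95 percent thresholds below are illustrative assumptions for the sketch, not Azure-defined limits.

```python
# Quarterly PTU utilisation review for one deployment (Step 5 above).
# Thresholds are illustrative assumptions.

def quarterly_ptu_review(avg_tokens_per_min, reserved_capacity_tpm,
                         min_utilisation=0.60):
    """Return a recommended action for one PTU deployment."""
    utilisation = avg_tokens_per_min / reserved_capacity_tpm
    if utilisation < min_utilisation:
        return "migrate workloads onto this deployment or reduce capacity"
    if utilisation > 0.95:
        return "approaching the capacity envelope; model additional PTUs"
    return "utilisation healthy; no change"

print(quarterly_ptu_review(avg_tokens_per_min=30_000,
                           reserved_capacity_tpm=60_000))
```

Running a check like this each quarter gives the consumption review a concrete output: every deployment leaves the meeting tagged with an action rather than a utilisation percentage nobody acts on.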

GenAI Pricing Intelligence

Azure OpenAI PTU pricing and availability evolve continuously. Subscribe to quarterly GenAI licensing updates from Redress Compliance for the latest pricing intelligence and negotiation guidance.