How to use this assessment: Work through each item and mark it complete once confirmed. Items flagged High Risk represent the most common sources of material overspend. A score of 11 or more indicates a well-governed position.

Scoring Guide
Tally your confirmed items against these benchmarks to determine your current maturity level.
0 – 5 High Exposure
6 – 10 Partial Governance
11 – 20 Well Governed

Section 1

1. You have confirmed you are using the most cost-efficient OpenAI model that meets your accuracy requirements for each use case — not defaulting to the flagship model for all workloads.
Expert Commentary: GPT-5 is OpenAI's most capable model at $1.25 per million input tokens (standard) or as low as $0.625 with batch API. GPT-5 Mini at $0.25 per million input tokens is sufficient for the majority of classification, summarisation, and structured extraction tasks. Deploying GPT-5 across all workloads when GPT-5 Mini or GPT-4o would suffice is the single most common source of preventable OpenAI API overspend. Run accuracy benchmarks on your specific tasks across model tiers before finalising model selection.
● High Risk
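Using the input rates quoted in the commentary above (illustrative only — confirm current figures on OpenAI's pricing page), a quick sketch of the monthly input-token spend across tiers makes the gap concrete:

```python
# Rough monthly input-cost comparison across model tiers.
# Rates are USD per million input tokens, taken from the commentary above;
# verify against OpenAI's current pricing page before relying on them.
INPUT_RATE_PER_M = {
    "gpt-5": 1.25,
    "gpt-5-batch": 0.625,
    "gpt-5-mini": 0.25,
}

def monthly_input_cost(model: str, tokens_per_month: int) -> float:
    """Input-token cost only; output tokens are priced separately."""
    return INPUT_RATE_PER_M[model] * tokens_per_month / 1_000_000

# Example: a workload consuming 2 billion input tokens per month
for model, _ in INPUT_RATE_PER_M.items():
    print(model, monthly_input_cost(model, 2_000_000_000))
```

At 2 billion input tokens per month, the flagship-everywhere default costs five times the Mini tier on input alone — before accuracy benchmarking has shown the premium is needed.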
2. You have confirmed the current pricing for every model you are using in production — including the distinction between standard, cached, and batch rates — from OpenAI's official pricing page, not from documentation that may be months old.
Expert Commentary: OpenAI has reduced API pricing multiple times in 2025 and 2026. Organisations that last reviewed pricing at initial deployment and have not re-checked since may be paying list rates that have since been reduced, or may be missing new optimisation tiers that have been introduced. Assign ownership of the quarterly pricing review and confirm current rates from the OpenAI pricing page directly.
● High Risk
3. You have reviewed your input-to-output token ratio for each production use case and confirmed that your cost model reflects the actual ratio rather than a 1:1 assumption.
Expert Commentary: Output tokens are priced at a significant premium over input tokens in OpenAI's pricing. GPT-5 Mini output is $1.00 per million versus $0.25 per million input — a 4x premium. A workflow generating 800 output tokens per 200 input tokens has an effective blended rate 3.4x higher than the input rate alone would suggest. Build your cost model from actual token logs, not from the input rate applied to total token count.
● High Risk
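The blended-rate calculation described above can be sketched in a few lines, using the GPT-5 Mini figures from the commentary ($0.25 input, $1.00 output per million tokens):

```python
# Effective blended rate (USD per million tokens) from the actual
# input/output token mix, rather than a 1:1 assumption.
def blended_rate(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    total_m = (input_tokens + output_tokens) / 1_000_000
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return cost / total_m

# Commentary example: 200 input / 800 output tokens per request
rate = blended_rate(200, 800, 0.25, 1.00)
print(round(rate, 2))          # 0.85
print(round(rate / 0.25, 1))   # 3.4x the input rate alone
```

Feed this the input/output split from your actual token logs per use case, not a guessed ratio.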
4. You have confirmed whether your use cases benefit from OpenAI's o-series reasoning models and, if so, whether the reasoning token costs have been included in your cost model.
Expert Commentary: OpenAI's o-series models (o4, o4-mini) generate internal reasoning tokens that are billed as output tokens before the visible response is produced. On complex analytical tasks, reasoning tokens can represent 60 to 80 percent of total billable output tokens. Teams that deploy o-series models without reasoning token instrumentation consistently underestimate per-request costs by 3 to 5 times versus their initial projections.
● High Risk
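A minimal instrumentation sketch: compute the reasoning-token share from a response's usage payload. The field names here follow the chat completions `usage` object (`completion_tokens_details.reasoning_tokens`); verify them against the current API reference before depending on them.

```python
# Share of billable output tokens consumed by internal reasoning.
def reasoning_share(usage: dict) -> float:
    details = usage.get("completion_tokens_details", {})
    reasoning = details.get("reasoning_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    return reasoning / completion if completion else 0.0

usage = {
    "completion_tokens": 1000,
    "completion_tokens_details": {"reasoning_tokens": 700},
}
print(reasoning_share(usage))  # 0.7 -> 70% of billable output is reasoning
```

Logging this ratio per request surfaces the hidden multiplier before it surfaces on the invoice.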
5. You have enabled prompt caching for all use cases with static or repeated system prompts and confirmed the caching hit rate in your production logs.
Expert Commentary: OpenAI's prompt caching reduces input token costs by 50 to 90 percent on the cached portion depending on the model. GPT-5 cached reads are charged at approximately 10 percent of the standard input rate — a 90 percent reduction. For enterprise applications where a large system prompt is sent with every request, caching is a configuration change that typically takes less than one engineering day to implement and can reduce monthly input token costs by 40 to 70 percent immediately.
● High Risk
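A back-of-envelope model of the caching saving, assuming cached reads bill at 10 percent of the standard input rate as stated above (confirm the discount for your specific model):

```python
# Estimated monthly input cost at a given measured cache hit rate.
def cached_input_cost(total_input_tokens: int, cache_hit_fraction: float,
                      input_rate: float, cached_discount: float = 0.10) -> float:
    cached = total_input_tokens * cache_hit_fraction
    uncached = total_input_tokens - cached
    return (uncached * input_rate + cached * input_rate * cached_discount) / 1e6

# 1B input tokens/month at $1.25 per million
baseline   = cached_input_cost(1_000_000_000, 0.0, 1.25)  # no caching
with_cache = cached_input_cost(1_000_000_000, 0.6, 1.25)  # 60% hit rate
print(baseline, with_cache)  # 1250.0 vs 575.0 -> a 54% saving
```

Note the saving scales with the hit rate you actually measure in production, which is why the checklist asks you to confirm it in your logs rather than assume it.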

Section 2

6. You have identified all workloads in production that do not require real-time response and confirmed that they are routed through OpenAI's Batch API.
Expert Commentary: OpenAI's Batch API processes requests asynchronously with up to 24-hour turnaround and charges 50 percent less than the standard API for both input and output tokens. Document processing, data enrichment, report generation, nightly classification tasks, and any workflow that does not require a response in under 60 seconds qualifies for batch processing. Enterprises that route all workloads through the real-time API are paying double the necessary rate for batch-eligible tasks.
● High Risk
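The batch saving is simple arithmetic, but worth making explicit per workload. A sketch using the 50 percent batch discount described above and illustrative GPT-5 Mini rates:

```python
# Cost of a workload at standard versus Batch API rates.
# Batch bills 50% of standard rates for both input and output tokens.
def workload_cost(in_tok: float, out_tok: float,
                  in_rate: float, out_rate: float, batch: bool = False) -> float:
    cost = (in_tok * in_rate + out_tok * out_rate) / 1_000_000
    return cost * 0.5 if batch else cost

# Nightly classification job: 500M input / 100M output at $0.25 / $1.00
realtime = workload_cost(500e6, 100e6, 0.25, 1.00)
batched  = workload_cost(500e6, 100e6, 0.25, 1.00, batch=True)
print(realtime, batched)  # 225.0 vs 112.5
```

Running this for each production workload makes the batch-eligible overspend a line item rather than a suspicion.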
7. You have confirmed the caching eligibility of your prompts — specifically that your system prompts are long enough and repeated frequently enough to benefit materially from caching — before relying on cached rates in your cost model.
Expert Commentary: OpenAI's prompt caching applies to prompts above a minimum length threshold. Very short system prompts may not cache efficiently. Prompts that vary significantly between requests — because they include dynamic user context injected into the system prompt — have lower cache hit rates than fully static system prompts. Measure your actual cache hit rate in production rather than assuming maximum cache benefit in your cost model.
● Medium Risk
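Measuring the real hit rate from usage logs can be as simple as the sketch below. It assumes each record carries `prompt_tokens` and `prompt_tokens_details.cached_tokens`, as in the chat completions usage object; verify the field names against the current API reference.

```python
# Token-weighted cache hit rate across a set of usage log records.
def cache_hit_rate(log_records: list) -> float:
    total = sum(r.get("prompt_tokens", 0) for r in log_records)
    cached = sum(r.get("prompt_tokens_details", {}).get("cached_tokens", 0)
                 for r in log_records)
    return cached / total if total else 0.0

logs = [
    {"prompt_tokens": 4000, "prompt_tokens_details": {"cached_tokens": 3000}},
    {"prompt_tokens": 4000, "prompt_tokens_details": {"cached_tokens": 0}},
]
print(cache_hit_rate(logs))  # 0.375
```

Use the measured figure, not the theoretical maximum, as the cache_hit input to your cost model.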
8. You have confirmed whether your retrieval-augmented generation architecture can benefit from prompt caching on the retrieved context — and implemented context-order stability to maximise cache hits.
Expert Commentary: Prompt caching works most effectively when the beginning of the prompt is stable across requests. In RAG architectures that inject retrieved documents into the beginning of the context window, cache hits are maximised when retrieved documents are ordered consistently — same documents in the same position. If your RAG retrieval returns documents in random or query-dependent order, restructure the prompt to place the stable system prompt and retrieved context in a consistent order before the dynamic user query.
● Medium Risk
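A minimal sketch of context-order stability — the document shape and sort key are hypothetical placeholders; the point is that retrieved documents are ordered by a stable attribute rather than by retrieval score, so repeated queries over the same corpus share a cacheable prefix:

```python
# Assemble a prompt with a stable, cache-friendly prefix:
# static system prompt first, deterministically ordered context next,
# dynamic user query last.
def build_prompt(system_prompt: str, docs: list, user_query: str) -> str:
    # Sort by a stable key (e.g. document id), not by retrieval score.
    stable_docs = sorted(docs, key=lambda d: d["id"])
    context = "\n\n".join(d["text"] for d in stable_docs)
    return f"{system_prompt}\n\n{context}\n\n{user_query}"

docs = [{"id": "b", "text": "Beta"}, {"id": "a", "text": "Alpha"}]
print(build_prompt("SYSTEM PROMPT", docs, "What changed this quarter?"))
```

The trade-off is that the most relevant document no longer leads the context; measure whether answer quality holds before adopting this ordering.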
9. You have requested enterprise pricing from OpenAI's sales team and documented your monthly token consumption and growth trajectory as the basis for negotiation.
Expert Commentary: OpenAI enterprise accounts at the 300 to 499 seat or equivalent API consumption tier can achieve 10 to 15 percent below list price; at 500+ equivalent consumption, 15 to 25 percent discounts are achievable. Two-year or three-year commitments add a further 5 to 15 percent. Without a documented consumption forecast and a formal enterprise pricing request, you are paying list price for a consumption volume that qualifies for a discount. Every month on list price at enterprise consumption volumes is money left on the table.
● High Risk
10. You have confirmed that your OpenAI enterprise agreement includes price stability terms — specifically that your negotiated rate is locked for the contract period and cannot increase above a defined cap.
Expert Commentary: OpenAI has reduced list prices multiple times since 2023, but enterprise agreements without most-favoured-nation (MFN) clauses or rate re-openers may not automatically pass these reductions to your negotiated rate. Conversely, OpenAI could increase specific model rates in the future. Negotiate price stability for your committed model tiers with a cap on increases and automatic rate reduction if OpenAI's list price falls below your negotiated rate.
● High Risk

Section 3

11. You have reviewed whether any existing Microsoft Azure Enterprise Agreement credits can be applied to Azure OpenAI Service consumption as an alternative to direct OpenAI API pricing.
Expert Commentary: Azure OpenAI Service provides access to the same OpenAI models (GPT-4o, o-series) via Azure's infrastructure and billing. Enterprises with committed Azure EA credits can apply those credits to Azure OpenAI consumption, effectively reducing the net cost by the credit discount on their overall Azure commitment. Compare your negotiated Azure OpenAI rate (net of EA credits) against your direct OpenAI API negotiated rate to determine which commercial pathway delivers lower effective cost.
● Medium Risk
12. You have confirmed that your OpenAI enterprise agreement includes data privacy terms — specifically the data training exclusion and regional processing options — that your legal team has reviewed and approved.
Expert Commentary: OpenAI's enterprise terms offer stronger data protection than the standard API terms, but require explicit negotiation. Confirm that your enterprise agreement includes: written exclusion of your data from model training, confirmation of data deletion timelines, regional processing endpoint access for EU data residency (with the 10 percent surcharge factored into your cost model), and audit rights for data processing.
● High Risk
13. You have implemented per-application API key segmentation and confirmed that you have token consumption visibility by application, team, and use case — not just total organisational spend.
Expert Commentary: Without API key segmentation, OpenAI spend is visible only as a total organisational figure. Individual use cases cannot be benchmarked against their projected costs, rogue consumption cannot be identified, and budget allocation for new use cases is based on assumption rather than data. Implement separate API keys per application and team, configure consumption logging in your observability stack, and review per-use-case token costs weekly.
● High Risk
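Once per-application keys are in place, the weekly roll-up is a trivial aggregation. The record shape below is a hypothetical export format — adapt it to whatever your observability stack emits:

```python
# Roll up spend per API key from exported usage records.
from collections import defaultdict

def spend_by_key(records: list) -> dict:
    totals = defaultdict(float)
    for r in records:
        totals[r["api_key_name"]] += r["cost_usd"]
    return dict(totals)

records = [
    {"api_key_name": "support-bot", "cost_usd": 120.0},
    {"api_key_name": "doc-pipeline", "cost_usd": 310.0},
    {"api_key_name": "support-bot", "cost_usd": 80.0},
]
print(spend_by_key(records))  # {'support-bot': 200.0, 'doc-pipeline': 310.0}
```

The same grouping by team or use-case tag turns a single organisational figure into a benchmarkable cost per application.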
14. You have set rate limits and budget alerts at the application level that prevent a single workload from consuming a disproportionate share of your OpenAI budget without governance review.
Expert Commentary: A single misconfigured application — an agent in an infinite loop, a prompt that generates unexpectedly large outputs, or a batch job that scales faster than expected — can exhaust a month's API budget in hours. Configure rate limits at the API key level (requests per minute, tokens per minute), set budget alerts that trigger at 50 percent, 75 percent, and 90 percent of monthly allocation, and confirm that your governance process requires approval for any use case expected to exceed a defined daily token threshold.
● High Risk
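The 50/75/90 percent alert logic described above is a one-line check — the function name and wiring are illustrative; plug it into whatever alerting pipeline you already run:

```python
# Return the budget thresholds a workload's spend has crossed.
def triggered_alerts(spend: float, monthly_budget: float,
                     thresholds=(0.50, 0.75, 0.90)) -> list:
    return [t for t in thresholds if spend >= monthly_budget * t]

print(triggered_alerts(8_000, 10_000))  # [0.5, 0.75] -> 80% of budget spent
```

Evaluating this per API key, rather than only on total spend, is what catches the single runaway workload before it exhausts the month's allocation.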
15. You have reviewed your error handling and retry logic for rate limit errors and confirmed that exponential backoff is implemented rather than tight retry loops that waste tokens on failed requests.
Expert Commentary: OpenAI API rate limit errors (HTTP 429) are common at enterprise consumption volumes during peak hours. Applications that implement tight retry loops — retrying immediately after a 429 without backoff — generate significant wasted token consumption on failed requests and amplify rate limit pressure, creating a feedback loop. Implement exponential backoff with jitter, and log 429 rate and retry counts as an operational metric.
● Medium Risk
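A minimal sketch of exponential backoff with full jitter — `send_request` is a placeholder for your actual API call, and the response shape shown is illustrative:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full jitter: sleep a random duration up to the capped exponential."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retries(send_request, max_attempts: int = 6):
    for attempt in range(max_attempts):
        response = send_request()
        if response.get("status") != 429:
            return response
        time.sleep(backoff_delay(attempt))  # back off instead of hammering
    raise RuntimeError("rate limited after retries")
```

Full jitter spreads retries from many concurrent clients across the window, which is what breaks the feedback loop that tight retry loops create.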

Section 4

16. You have confirmed whether any of your OpenAI workloads are candidates for open-source model replacement — specifically workloads where a smaller, domain-specific model could match GPT quality at a fraction of the API cost.
Expert Commentary: Open-source LLMs hosted on your own infrastructure — including Llama 3.x, Mistral, and domain-fine-tuned variants — can match GPT-4o Mini performance on narrow, high-volume use cases at an effective per-token cost 80 to 95 percent below OpenAI API rates at sufficient scale. Evaluate open-source alternatives for your highest-volume, lowest-complexity workloads — classification, extraction, routing, and simple summarisation — as part of your annual API cost review.
● Medium Risk
17. You have reviewed the full pricing structure for any fine-tuning projects — including training token cost, hosting fees for the fine-tuned endpoint, and the inference price premium — and confirmed the total cost of ownership is justified by the performance improvement.
Expert Commentary: Fine-tuning a GPT model involves three cost components: training token cost (charged per million tokens in the training dataset), a hosting fee for the dedicated fine-tuned model endpoint (typically a fixed monthly fee), and an inference price premium (fine-tuned endpoints are priced higher than base model endpoints). Confirm that the performance improvement from fine-tuning justifies the total cost versus prompt engineering optimisation on the base model, which incurs no training or hosting fees.
● Medium Risk
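A back-of-envelope comparison of the three fine-tuning cost components against base-model inference — every figure below is a hypothetical placeholder; substitute the current rates for your model:

```python
# Monthly fine-tuning TCO: amortised training cost + hosting fee
# + inference at the fine-tuned premium rate.
def finetune_monthly_tco(training_tokens_m: float, training_rate: float,
                         hosting_fee: float, monthly_tokens_m: float,
                         ft_inference_rate: float,
                         amortise_months: int = 12) -> float:
    training = training_tokens_m * training_rate / amortise_months
    return training + hosting_fee + monthly_tokens_m * ft_inference_rate

def base_monthly_cost(monthly_tokens_m: float, base_rate: float) -> float:
    return monthly_tokens_m * base_rate

# Hypothetical: 50M training tokens at $3.00/M, $100/month hosting,
# 200M inference tokens/month at $0.50/M fine-tuned vs $0.30/M base.
ft = finetune_monthly_tco(50, 3.00, 100.0, 200, 0.50)
base = base_monthly_cost(200, 0.30)
print(ft, base)  # compare before committing to fine-tuning
```

If the fine-tuned column does not buy a measurable accuracy gain over a well-engineered base-model prompt, the premium is pure overspend.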
18. You have confirmed the cost of any OpenAI platform features you are using — Assistants API thread storage, file retrieval, code interpreter sandbox compute — as separate line items in your cost model.
Expert Commentary: OpenAI's Assistants API charges for thread storage ($0.10 per GB per day), file retrieval, and code interpreter compute time as separate fees that do not appear in token consumption logs. Enterprise applications built on the Assistants API that accumulate large thread histories or use file retrieval at scale can incur significant ancillary charges that are not visible in standard token usage reports.
● Medium Risk
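Using the $0.10 per GB per day rate quoted above, the thread-storage line item is easy to project — a sketch, assuming storage volume is roughly constant across the month:

```python
# Monthly Assistants API thread-storage cost at a given stored volume.
def monthly_thread_storage_cost(stored_gb: float,
                                rate_per_gb_day: float = 0.10,
                                days: int = 30) -> float:
    return stored_gb * rate_per_gb_day * days

print(monthly_thread_storage_cost(50))  # 50 GB of thread history -> $150/month
```

Because this charge accrues daily on accumulated history, a thread-retention policy (deleting or archiving stale threads) directly reduces it.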
19. You have reviewed OpenAI's image generation (DALL-E), speech-to-text (Whisper), and text-to-speech (TTS) pricing and confirmed these are included in your cost model if you use these capabilities.
Expert Commentary: OpenAI's multimodal capabilities — image generation, speech processing, and text-to-speech — are billed separately from text model token consumption and do not appear in the same usage category. Organisations that use these capabilities as part of integrated AI applications frequently track them separately and miss the opportunity to consolidate spend reporting or negotiate bundled pricing for multi-capability commitments.
● Lower Risk
20. You conduct a quarterly OpenAI pricing review that compares your current effective per-token rates against new model releases, competitor pricing, and available optimisation levers — and updates your cost model accordingly.
Expert Commentary: OpenAI's pricing landscape changes quarterly. New model versions offer better performance at lower prices. New optimisation features — expanded caching, new batch tiers, updated fine-tuning economics — are released on an ongoing basis. Without a structured quarterly pricing review, your cost model grows stale and your optimisation strategy lags the market. Assign an owner for the quarterly review, define a standardised comparison methodology, and schedule it as a recurring governance event.
● Medium Risk

Ready to optimise your AI contract and cost position?

Download our AI Platform Contract Negotiation Guide — covering all major vendors, pricing structures, and negotiation tactics.
Download Free Guide →

Next Steps

Score your confirmed items against the benchmarks above. If you are in the High Exposure or Partial Governance bands, prioritise the items flagged High Risk — these represent the most common sources of material overspend and are addressable within a single procurement or FinOps cycle.

Redress Compliance works exclusively on the buyer side, with no vendor affiliations. Our GenAI advisory practice has benchmarked AI costs, negotiated enterprise AI contracts, and built governance frameworks across 500+ enterprise engagements. Contact us for a confidential review of your AI cost and contract position.