When a European insurance group with 28,000 employees across Germany, France, the Netherlands, and the United Kingdom began its GenAI pilot programme, OpenAI seemed like an obvious choice. The vendor proposed an enterprise agreement covering four high-impact use cases: claims summarisation, underwriting note generation, customer email triage, and policy document Q&A. The initial proposal came with a price tag of $6.4M over 24 months.
The group's digital transformation team had done their due diligence—they understood the business case, had tested the models, and knew GenAI was strategic. But they also knew that enterprise software agreements hide complexity in their terms. Before signing, they brought in Redress Compliance to audit the proposed engagement.
What we discovered—and what the group's internal team had missed—would reduce that commitment by nearly a third.
The Challenge: Peak Estimates, Not Reality
OpenAI's proposal had been built on sound logic, but with a critical flaw: it modelled AI consumption at peak throughput rather than average operational load. The group's claims processing centre might handle 5,000 claims on a heavy day, but the daily average was closer to 2,500. The underwriting team could theoretically process 300 policies per day, but actual throughput was 150–170. Email triage volumes spiked during seasonal campaigns but ran at 30% of peak during baseline periods.
By anchoring the 24-month commitment to peak-day volumes, OpenAI's model inflated the baseline token allocation by approximately 40%. This is a common pattern in vendor proposals: they build safety margins into consumption forecasts, which is prudent, but those margins can easily become excess capacity—and excess capacity is expensive.
A second issue emerged during our technical review: the proposal treated all four use cases as requiring GPT-4-class models. That made sense for two of them. Policy document Q&A and underwriting note generation involve multi-step reasoning where GPT-4's capability genuinely matters. But claims summarisation and email triage are lower-complexity tasks: GPT-3.5-class models can perform them at 85–90% of GPT-4's quality for approximately 75% lower per-token cost. The group was planning to pay GPT-4 prices for work that GPT-3.5 could handle effectively.
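The economics of that split can be sketched in a few lines. The per-token prices and monthly volumes below are hypothetical illustrations, not the group's actual figures; only the roughly 75% tier discount comes from the audit.

```python
# Hypothetical prices and monthly volumes, for illustration only.
GPT4_PRICE = 0.03                 # $ per 1K tokens (assumed)
GPT35_PRICE = GPT4_PRICE * 0.25   # ~75% cheaper, per the audit

monthly_tokens_k = {              # thousands of tokens per month (assumed)
    "policy_qa":            40_000,   # stays on GPT-4
    "underwriting_notes":   30_000,   # stays on GPT-4
    "claims_summarisation": 60_000,   # moves to GPT-3.5
    "email_triage":         50_000,   # moves to GPT-3.5
}
gpt4_workloads = {"policy_qa", "underwriting_notes"}

# Cost if everything runs on GPT-4 vs. cost with the two-tier split.
all_gpt4 = sum(monthly_tokens_k.values()) * GPT4_PRICE
tiered = sum(
    vol * (GPT4_PRICE if name in gpt4_workloads else GPT35_PRICE)
    for name, vol in monthly_tokens_k.items()
)
saving = all_gpt4 - tiered
print(f"monthly saving from tiering: ${saving:,.0f} ({saving / all_gpt4:.0%})")
```

At these assumed volumes, moving the two lower-complexity workloads to the cheaper tier cuts the monthly bill by nearly half, which is the structural effect that drove the group's savings.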
The group's procurement team sensed the overscoping but lacked the technical context to challenge OpenAI confidently. Our audit gave them both the evidence and the language to renegotiate.
The Approach: Use-Case-Level Workload Analysis
Our engagement followed a structured methodology designed to isolate overscoping at every layer: commercial, technical, and operational.
First, we conducted a historical throughput audit. We reviewed three months of actual transaction volumes across all four use cases, extracted daily averages and peak-day volumes, and modelled a realistic 24-month projection using conservative growth assumptions (5% annual uplift). This replaced OpenAI's peak-based model with a data-driven, operations-anchored forecast. The result: a reduction of approximately 40% in baseline token allocation.
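The projection step reduces to simple arithmetic. A minimal sketch with hypothetical volumes: the 100M-token monthly average and the 175M peak-anchored figure are illustrative, chosen so the gap mirrors the ~40% reduction the audit found, and the 5% annual uplift is compounded monthly.

```python
# Operations-anchored forecast: project total tokens over the term from a
# trailing monthly average, compounding the annual growth assumption monthly.
def project_term_tokens(monthly_avg, months=24, annual_growth=0.05):
    r = (1 + annual_growth) ** (1 / 12) - 1   # equivalent monthly rate
    return sum(monthly_avg * (1 + r) ** m for m in range(months))

ops_anchored = project_term_tokens(100e6)     # 100M tokens/month average (assumed)
peak_anchored = 175e6 * 24                    # flat peak-day extrapolation (assumed)
print(f"reduction vs peak-anchored model: {1 - ops_anchored / peak_anchored:.0%}")
```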
Second, we mapped each use case to the minimum viable model tier. We ran comparative testing on claims summarisation and email triage tasks using both GPT-4 and GPT-3.5 models, documenting accuracy, latency, and output quality. Both use cases performed within the group's tolerance on GPT-3.5 without degradation in business outcomes. We recommended GPT-3.5 for those two workloads and GPT-4 for policy Q&A and underwriting notes.
Third, we restructured the commercial agreement. Instead of a single 24-month commitment with fixed token allocations per model tier, we proposed: a 12-month initial term with transparent renewal triggers; separate token pools for GPT-3.5 and GPT-4, sized to actual monthly averages plus a 20% operational buffer; a model-substitution clause allowing the group to reallocate unused tokens across model tiers without penalty; and explicit extension rights for year 2 that didn't require renegotiation if the group remained within the token thresholds we had agreed.
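The mechanics of those terms can be sketched as a small model. Pool sizes and per-token prices below are hypothetical assumptions; only the 20% buffer, the 12-month term, and the cost-neutral substitution clause come from the agreement.

```python
# Restructured commercial terms: separate pools per tier, sized to monthly
# averages plus a 20% operational buffer over a 12-month term. All volumes
# and prices are hypothetical illustrations.
BUFFER = 0.20
TERM_MONTHS = 12
monthly_avg = {"gpt4": 70e6, "gpt35": 110e6}           # tokens/month (assumed)
price = {"gpt4": 0.03 / 1000, "gpt35": 0.0075 / 1000}  # $/token (assumed)

pools = {tier: avg * (1 + BUFFER) * TERM_MONTHS for tier, avg in monthly_avg.items()}

def reallocate(pools, src, dst, tokens):
    """Cost-neutral reallocation: move `tokens` of unused `src` commitment
    into the `dst` pool at equal dollar value, per the substitution clause."""
    dollars = tokens * price[src]
    pools[src] -= tokens
    pools[dst] += dollars / price[dst]
    return pools

# At a 4:1 price ratio, 10M unused GPT-4 tokens buy 40M GPT-3.5 tokens.
reallocate(pools, "gpt4", "gpt35", 10e6)
```

The price ratio is what makes the substitution clause valuable rather than cosmetic: every unused premium-tier token converts into several lower-tier tokens without touching the dollar commitment.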
The Outcome: $1.9M Saved, Flexibility Locked In
The revised agreement reduced the two-year commitment from $6.4M to $4.5M, a 30% reduction that translated into $1.9M in immediate savings. The group made no concessions to achieve this. The leverage came from transparency: OpenAI was willing to re-scope once we presented the data-driven analysis.
But the financial saving tells only part of the story. The restructured agreement delivered three other gains the group valued equally:
Shortened initial commitment. The group moved from a two-year lock-in to 12 months. This mattered because both GenAI capabilities and internal adoption rates were moving fast. A 12-month initial term let them pilot aggressively, measure ROI, and adjust tactics without being contractually bound to a two-year consumption model that might not fit reality by month 18.
Right-sized model allocations. By codifying GPT-3.5 for claims summarisation and email triage, the agreement locked in the cost advantage. The group wasn't paying premium prices for work that didn't need premium models. As OpenAI's GPT-3.5 pricing evolved (typically decreasing over time), the group would automatically benefit. Conversely, if they later discovered that GPT-4 would improve claims summarisation, the model-substitution clause allowed a swap without renegotiating the entire commitment.
Optionality for year 2. Many enterprise agreements lock you into renewal on vendor terms if you've consumed a certain percentage of tokens. Our version reversed that: if the group stayed within the agreed thresholds, they had the right to renew on the same terms, or to expand use cases to other workflows—policy issuance, customer support chatbots, fraud detection—without triggering a full commercial renegotiation. This was critical for a firm still building its GenAI roadmap.
Key Takeaways
Vendor proposals are anchored to peak consumption, not average. When OpenAI (or any SaaS vendor) models a new engagement, they build in safety margins. Those margins are reasonable from a risk-management perspective, but they can inflate costs by 30–50%. Always audit consumption forecasts against three to six months of historical operational data before committing to a multi-year agreement.
Model-tier decisions are often binary when they could be granular. Vendors propose a single model for an engagement because it's simpler to sell and manage. But workloads vary. Claims summarisation is not the same cognitive task as policy Q&A. The group saved $1.9M partly because they were willing to run two use cases on a lower-cost model tier. The trade-off in accuracy was negligible; the trade-off in cost was not.
Commitment structure matters as much as price. A 24-month lock at $6.4M sounds fixed and safe. A 12-month term with extension options and model-substitution rights sounds riskier but is actually more flexible. The group preferred the optionality, especially because they had negotiated the total financial envelope downward. If your GenAI strategy is still evolving—and most groups' strategies still are—structure for flexibility first, then price for scale.
Model-tier switching is a negotiating point. Most enterprise agreements define model allocation upfront and penalise over-usage. Few explicitly allow cost-neutral reallocation across model tiers. This was a gap in OpenAI's standard template. By making it a negotiating point, the group gained a tool to adapt their AI portfolio as new capabilities rolled out and use cases matured.
The engagement concluded in March 2026. The group renewed for 12 months at the negotiated terms and began planning year 2 use cases (fraud detection, customer engagement analytics, and regulatory reporting) without renegotiating the commercial framework. They had paid 30% less than OpenAI's original proposal, shortened their initial commitment by 12 months, and created a foundation for GenAI expansion that didn't require starting commercial negotiations from scratch each time a new use case emerged.
This is how enterprise GenAI agreements should be structured: anchored in operational reality, modelled to actual workload complexity, and designed for the speed at which AI capabilities—and business priorities—evolve.