The Challenge: Token Costs Scaling Faster Than Revenue
A global streaming media company — serving 48 million subscribers across 60 countries — had embedded OpenAI API capabilities into four production systems: automated subtitle generation and quality review, content metadata enrichment and tagging, personalised content recommendation explanation (generating natural-language summaries of why content was recommended), and a generative AI content marketing function producing localised promotional copy for new releases.
The deployments had been built over 18 months by separate product and engineering teams, each integrating directly with the OpenAI API on pay-as-you-go pricing. By Q4 2024, aggregate monthly token consumption had reached a scale that placed the company in the top tier of OpenAI enterprise customers by volume — yet it had no enterprise agreement, no volume discount, and was paying list pricing on every token.
The financial impact was material. AI API costs were running at $148K per month — $1.78M annualised — and growing at approximately 12% per month as subscriber numbers and content catalogue size expanded. The company's CFO had flagged AI infrastructure as the fastest-growing line item in the technology cost structure, with projections showing costs reaching $3.2M in the following 12 months without intervention. The CTO engaged Redress Compliance to conduct a full commercial review of the AI API cost structure and negotiate a framework appropriate to the company's scale.
The Approach: Workload Analysis, Batch Architecture, and Enterprise Restructuring
Workload Latency Classification
The first step was a latency requirements audit across the company's four AI workloads. This revealed a fundamental commercial opportunity: two of the four workloads — subtitle quality review and metadata enrichment — had no user-facing latency requirement. They were running as real-time API calls because they had been built that way, not because the use case required it. Combined, these two workloads represented 61% of total monthly token spend.
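A minimal sketch of how that classification can be tallied is below. The per-workload spend figures are hypothetical placeholders chosen only to be consistent with the totals reported in this case study (roughly $148K/month, 61% batch-eligible), not the company's actual breakdown.

```python
# Hypothetical workload inventory: spend figures are illustrative placeholders,
# not the company's actual per-workload split.
workloads = {
    "subtitle_quality_review":     {"monthly_spend_usd": 55_000, "user_facing_latency": False},
    "metadata_enrichment":         {"monthly_spend_usd": 35_000, "user_facing_latency": False},
    "recommendation_explanations": {"monthly_spend_usd": 38_000, "user_facing_latency": True},
    "promotional_copy":            {"monthly_spend_usd": 20_000, "user_facing_latency": True},
}

total = sum(w["monthly_spend_usd"] for w in workloads.values())
batch_eligible = sum(
    w["monthly_spend_usd"] for w in workloads.values() if not w["user_facing_latency"]
)

# Share of spend with no user-facing latency requirement, i.e. candidates for batch mode.
print(f"Batch-eligible spend: ${batch_eligible:,} of ${total:,} ({batch_eligible / total:.0%})")
```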
Batch Processing Migration
Redress Compliance's engineering advisory team worked with the company's infrastructure engineers to migrate subtitle quality review and metadata enrichment to OpenAI's asynchronous batch processing mode. Batch mode delivers approximately a 50% reduction in token pricing in exchange for up to 24 hours of processing latency, which was entirely compatible with the overnight processing cadence that both workloads already used in practice. The migration was completed in three weeks with no change to output quality or downstream system interfaces.
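The migration pattern itself is straightforward. The sketch below assumes the OpenAI Python SDK and the Batch API's JSONL request format; the file name, model name, prompt text, and input data are placeholders, not the company's actual pipeline.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Write the day's review requests as a JSONL file, one request per line.
#    The segments, model, and prompt below are placeholders for illustration.
segments = ["Example subtitle segment 1", "Example subtitle segment 2"]
requests = [
    {
        "custom_id": f"subtitle-seg-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "Review this subtitle segment for accuracy."},
                {"role": "user", "content": segment},
            ],
        },
    }
    for i, segment in enumerate(segments)
]
with open("subtitle_review_batch.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests))

# 2. Upload the file and submit it as a batch with a 24-hour completion window,
#    which is what unlocks the discounted batch pricing.
batch_file = client.files.create(
    file=open("subtitle_review_batch.jsonl", "rb"), purpose="batch"
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Later (e.g. the next morning's scheduled run), poll and download the results.
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    results_jsonl = client.files.content(batch.output_file_id).text
```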
Model Right-Sizing for Real-Time Workloads
For the two real-time workloads, personalised recommendation explanations and promotional copy generation, Redress Compliance conducted a capability benchmarking exercise comparing the flagship model against mid-tier alternatives for both use cases. Recommendation explanations, which required factual accuracy but limited creative complexity, were migrated to a mid-tier model achieving equivalent quality scores at 71% lower per-token cost. Promotional copy generation retained the flagship model for creative quality reasons, but average prompt length (input context per request) was reduced by 24% through prompt engineering and few-shot example caching.
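A minimal sketch of the kind of benchmarking harness this involves is shown below, assuming the OpenAI Python SDK. The model names, per-million-token price constants, and scoring function are placeholders: substitute current pricing, the candidate models under consideration, and whatever quality rubric the team trusts (human review, an LLM-as-judge scorer, factuality checks).

```python
from openai import OpenAI

client = OpenAI()

# Placeholder candidate models with blended per-million-token prices;
# verify current pricing and model availability before relying on these numbers.
CANDIDATES = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}


def generate_explanation(model: str, prompt: str) -> tuple[str, int]:
    """Return the model's output and the total tokens consumed for one request."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content, resp.usage.total_tokens


def run_benchmark(prompts: list[str], score_fn) -> None:
    """Compare quality and estimated cost per candidate model over a fixed prompt sample."""
    for model, price_per_million in CANDIDATES.items():
        scores, tokens = [], 0
        for prompt in prompts:
            output, used = generate_explanation(model, prompt)
            scores.append(score_fn(prompt, output))  # score_fn is the team's quality rubric
            tokens += used
        avg_score = sum(scores) / len(scores)
        est_cost = tokens / 1_000_000 * price_per_million
        print(f"{model}: avg quality {avg_score:.2f}, est. sample cost ${est_cost:.2f}")
```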
Enterprise Agreement Negotiation
With the optimised usage profile modelled, Redress Compliance negotiated a 12-month enterprise committed spend agreement with OpenAI, securing a 26% volume discount on real-time flagship model traffic and a separate preferred rate structure for the company's batch workloads. The agreement included a 20-month rate lock and a data governance addendum confirming that subscriber behaviour data processed through recommendation models was excluded from OpenAI's training pipeline — a requirement driven by the company's data privacy obligations in the EU and California.
High-volume AI workloads? Download our token cost optimisation guide.
Batch processing, model tiering, committed spend, and rate lock strategies.
The Outcome: $3.2M in AI Infrastructure Savings
| Intervention | Detail | Annual Impact |
|---|---|---|
| Batch migration (subtitle + metadata, 61% of traffic) | 50% token cost reduction on batch workloads | $653K/year saved |
| Model right-sizing (recommendation explanations) | Migrated to mid-tier model, 71% lower cost | $290K/year saved |
| Prompt engineering and context reduction | 24% context reduction on flagship workloads | $118K/year saved |
| Enterprise committed spend (26% discount) | 12-month agreement, 20-month rate lock | $539K/year saved |
| Total over 24 months | | $3.2M saved |
The restructured commercial model reduced monthly API spend from $148K to $68K, a 54% reduction, while accommodating the company's projected 12% monthly growth. Savings were $960K in the first year and a further $2.24M in the second year, as the growth trajectory was absorbed by the optimised cost structure rather than passed through at list prices. Total savings over 24 months: $3.2M.
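A back-of-the-envelope reconciliation of those figures (a sketch using only the numbers quoted above, not the engagement's actual cost model):

```python
old_monthly, new_monthly = 148_000, 68_000

# Year one: flat monthly saving at the renegotiated rates and optimised usage.
year_one = (old_monthly - new_monthly) * 12            # $960,000

# Year two: savings grow with volume as 12% monthly growth is absorbed at the
# optimised rates instead of list pricing.
total_24_months = 3_200_000
year_two = total_24_months - year_one                   # $2,240,000

print(f"Year 1: ${year_one:,}  Year 2: ${year_two:,}  Total: ${total_24_months:,}")
```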
The subscriber data governance addendum resolved a latent regulatory risk the company had not previously scoped: GDPR Article 22 requirements applicable to automated decision-making in personalised recommendation systems, and CCPA obligations governing the use of subscriber behaviour data in AI model development. Both were addressed in the renegotiated agreement without requiring product or infrastructure changes.
Key Lessons for High-Volume AI Deployments
Three lessons from this engagement are directly applicable to any company running AI workloads at scale.

1. Real-time API calls are the default integration pattern, not always the right one. A significant proportion of production AI workloads have no genuine real-time requirement; migrating them to batch processing is one of the highest-value commercial changes available, and it can be delivered in weeks with no functional impact.
2. Volume without an enterprise agreement means paying list pricing at scale. OpenAI does not proactively offer enterprise agreements to high-volume API customers; buyers must identify and negotiate their way into the commercial tier appropriate to their usage.
3. Data governance addenda are commercially negotiable and legally necessary. For companies processing subscriber, customer, or employee data through AI systems, the standard API terms do not provide the contractual protections required by GDPR, CCPA, or sector-specific regulations, and this exposure grows directly with deployment scale.
High-volume streaming or media AI workloads? Let's build your commercial framework.
Batch architecture, enterprise agreements, and data governance — buyer side only.