Section 1: Why AI Spend is the New FinOps Frontier

The explosion in enterprise AI deployment has outpaced the governance frameworks designed to control it. Where traditional cloud spend (AWS, Azure, Google) is measured in virtual machine hours, AI spend operates on a fundamentally different consumption metric: tokens. One prompt, one completion, one API call can cost fractions of a penny or hundreds of dollars depending on model selection, context window, and batch size. This granularity makes cost visibility far harder to achieve than for traditional infrastructure spend.

In 2024, only 31 percent of FinOps teams managed AI spending explicitly. By 2025, that figure jumped to 63 percent. Today, 98 percent of FinOps organisations manage AI spend as a formal budget line. The FinOps Foundation's AI governance working group expanded the framework from public cloud alone to a "Cloud+" model that encompasses public cloud, SaaS, AI, licensing, and on-premises infrastructure in a unified cost governance approach.

Gartner's 2025 estimate places GenAI enterprise spending at $644 billion globally. IDC warns that by 2027, 75 percent of G1000 organisations will face a 30 percent underestimation of AI infrastructure costs, translating to billions in unanticipated spend across the enterprise. The visibility problem is acute: AI costs scatter across OpenAI API, Azure OpenAI, Google Vertex AI, AWS Bedrock, internal ML platforms, vector databases, and embedded LLM capabilities within SaaS applications. No unified billing layer captures spending across all these touchpoints.

This guide covers the complete FinOps framework for AI governance, from consumption measurement to contract negotiation to cost allocation and optimisation strategies. The goal is straightforward: achieve cost transparency, enforce accountability, and deliver 60–80 percent savings through intelligent model selection and governance gates.

Section 2: The AI Cost Taxonomy

AI spend in the enterprise divides into three distinct cost categories, each requiring different measurement, allocation, and optimisation approaches.

Category 1: API and Token Costs

Token-based costs are the most visible and fastest-growing AI expense. OpenAI charges separately for input (prompt) tokens and output (completion) tokens, ranging from $0.00015 per 1,000 input tokens for GPT-4o mini to $0.015 per 1,000 input tokens for GPT-4 Turbo. Google Gemini Enterprise charges $0.01 per 1,000 input tokens and $0.03 per 1,000 output tokens. Anthropic Claude pricing scales from $0.003 per 1,000 input tokens for Claude Haiku to $0.015 for Claude Sonnet 4.6.

Context window usage matters enormously. A request that fills a 128,000-token context window costs more to process than one that uses a 4,000-token window, because longer contexts mean more input tokens charged. A single RAG (retrieval-augmented generation) call with 10,000 tokens of retrieved context injected costs significantly more than a simple completion request. Chat applications with long conversation histories accumulate context costs rapidly and become expensive at scale.
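To make the arithmetic concrete, a minimal sketch in Python. The per-1,000-token rates are illustrative assumptions, not any vendor's actual price list:

```python
# Illustrative rates only -- real per-1K-token prices vary by model and vendor.
INPUT_RATE = 0.003   # $ per 1,000 input tokens
OUTPUT_RATE = 0.015  # $ per 1,000 output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one completion call at the rates above."""
    return (input_tokens / 1000) * INPUT_RATE + (output_tokens / 1000) * OUTPUT_RATE

# Plain completion: short prompt, short answer.
plain = call_cost(input_tokens=500, output_tokens=300)
# Same question with 10,000 tokens of retrieved context injected.
rag = call_cost(input_tokens=10_500, output_tokens=300)

print(f"plain ${plain:.4f} vs RAG ${rag:.4f}")  # the RAG call costs 6x more
```

At these rates the injected context multiplies the per-call cost sixfold before the answer quality has changed at all, which is why context discipline appears repeatedly in the optimisation sections below.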

Category 2: Infrastructure Costs (GPU Compute)

Training and inference of proprietary or fine-tuned models runs on GPU infrastructure. AWS charges $3.90 per hour for on-demand H100 instances, whilst spot instances cost $1.00 to $2.00 per hour. Azure and Google Cloud offer similar spot pricing at $1.38 to $1.50 per hour for specialist GPU clouds. Commitment discounts (AWS Reserved Instances, Azure Reservations, GCP Commitments) reduce on-demand rates by 60–72 percent but require annual or three-year prepayment.

The choice between on-demand and spot is architectural. Real-time inference workloads require on-demand GPU capacity with SLAs. Training and batch processing can tolerate spot instance interruptions with graceful retry logic, delivering 85–90 percent cost reduction. For organisations running large language models (LLMs) internally, the GPU infrastructure bill often exceeds API token costs.
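The on-demand versus spot trade-off is straightforward to quantify. A hedged sketch using the hourly rates quoted above; the cluster size and utilisation figure are hypothetical:

```python
HOURS_PER_MONTH = 730

def monthly_gpu_cost(rate_per_hour: float, gpus: int, utilisation: float) -> float:
    """Monthly cluster cost at a given hourly rate and utilisation."""
    return rate_per_hour * gpus * HOURS_PER_MONTH * utilisation

# Hypothetical 8-GPU training cluster at 60% utilisation.
on_demand = monthly_gpu_cost(3.90, gpus=8, utilisation=0.6)  # $3.90/hr H100 on-demand
spot = monthly_gpu_cost(1.50, gpus=8, utilisation=0.6)       # midpoint of the spot range

savings = 1 - spot / on_demand
print(f"on-demand ${on_demand:,.0f}/mo, spot ${spot:,.0f}/mo, saving {savings:.0%}")
```

Because both costs scale linearly with hours and cluster size, the percentage saving depends only on the rate ratio; the checkpointing and retry engineering is what buys access to that lower rate.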

Category 3: SaaS AI Subscriptions

Enterprise-grade AI subscriptions charge per seat per month. Microsoft Copilot Pro costs $20 per user per month. Copilot for Microsoft 365 (enterprise) costs $30 per user per month. Google Gemini Enterprise launched October 2025 at $30 per user per month. OpenAI Enterprise is priced at $45 to $75 per user per month with a 150-seat minimum and annual commitment required. Anthropic Claude Enterprise is approximately $30–35 per seat for 500+ seat licenses.

These subscriptions often bundle features (higher rate limits, priority inference, custom model tuning, audit logging) and shift spend from variable PAYG token costs to fixed seat-based pricing. The crossover point at which seat subscriptions beat token-based consumption varies by organisation size and usage pattern.
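That crossover point can be sketched directly. The $30 seat price and $0.01 average PAYG cost per query below are hypothetical inputs, not vendor figures:

```python
def breakeven_queries_per_seat(seat_price: float, payg_cost_per_query: float) -> float:
    """Monthly queries per user above which a fixed seat beats PAYG tokens."""
    return seat_price / payg_cost_per_query

# Hypothetical: $30/user/month seat vs $0.01/query average PAYG cost.
threshold = breakeven_queries_per_seat(30.0, 0.01)
print(f"{threshold:.0f} queries/user/month")  # → 3000 queries/user/month
```

Users below the threshold are cheaper on PAYG tokens; heavy users above it justify a seat. Running this per user cohort, rather than for the organisation average, is what reveals mixed-licensing opportunities.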

Hidden Costs

Data egress charges when moving data out of training environments cost $0.09 to $0.20 per GB. Vector databases (Pinecone, Weaviate, Milvus) charge for storage and query operations. Fine-tuning runs incur per-token charges plus infrastructure. Embedding generation for RAG systems adds vector creation costs. Prompt caching, where supported, reduces costs by 90 percent for repeated queries but requires platform investment.

Need a comprehensive AI spend assessment?

Map all token costs, infrastructure, and SaaS subscriptions with precise visibility.
Request AI Audit →

Section 3: The FinOps for AI Framework

The FinOps Foundation's governance approach for AI rests on five integrated disciplines that move organisations from reactive cost management to proactive governance.

Discipline 1: Understand AI Unit Economics

Define the fundamental cost drivers specific to each AI workload. For an AI chatbot, measure cost-per-query and cost-per-active-user. For batch processing, measure cost-per-batch and cost-per-output-token. For SaaS AI subscriptions, calculate cost-per-feature and cost-per-transaction. Unit economics are the foundation for all downstream allocation and optimisation.
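A minimal unit-economics helper, assuming monthly spend and usage counts are already available from billing exports; the figures in the example are hypothetical:

```python
def unit_economics(monthly_cost: float, queries: int, active_users: int) -> dict:
    """Break a workload's monthly AI spend into its per-unit cost drivers."""
    return {
        "cost_per_query": monthly_cost / queries,
        "cost_per_active_user": monthly_cost / active_users,
    }

# Hypothetical chatbot: $12,000/month, 400,000 queries, 2,500 active users.
ue = unit_economics(12_000, queries=400_000, active_users=2_500)
print(ue)  # cost_per_query 0.03, cost_per_active_user 4.80
```

The same shape works for batch workloads (cost-per-batch, cost-per-output-token) by swapping the denominators.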

Discipline 2: Allocate AI Costs to Teams and Use Cases

Token-level cost allocation is the critical capability gap in most enterprises. Tagging prompts with team identifiers, product codes, and use case labels enables cost showback at granular levels. Azure OpenAI's FinOps toolkit supports FOCUS 1.2 compliance for AI billing data, normalising tokens as consumption units. Internal API gateways (Kong, API7, Apigee) can enforce cost allocation tags before forwarding requests to GenAI APIs.
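As a sketch of what gateway-side tag enforcement looks like — the tag names are illustrative, and this is plain Python rather than an actual Kong or Apigee plugin API:

```python
REQUIRED_TAGS = {"team", "product", "use_case", "cost_centre"}

def enforce_allocation_tags(headers: dict) -> dict:
    """Reject a GenAI API request unless every allocation tag is present.

    In a real deployment this check would live in a gateway plugin that runs
    before the request is forwarded to the upstream model API, so untagged
    spend never reaches the provider.
    """
    missing = REQUIRED_TAGS - headers.keys()
    if missing:
        raise ValueError(f"rejected: missing allocation tags {sorted(missing)}")
    return headers

ok = enforce_allocation_tags(
    {"team": "support", "product": "helpdesk", "use_case": "triage", "cost_centre": "CC-104"}
)
```

Rejecting untagged requests at the gateway is the only reliable way to guarantee 100 percent allocation coverage; retrofitting tags onto billing data after the fact is far less accurate.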

The FinOps Foundation's FOCUS 1.2 specification, updated in 2025, extends unified billing to include AI services. This normalisation allows enterprises to view cloud spend, SaaS spend, and AI spend in a single cost data model, applying consistent allocation logic across all technology categories.

Discipline 3: Establish Governance Gates for New AI Deployments

No new AI capability should enter production without cost pre-approval. Governance gates should capture: model selection justification (why Claude Opus versus Claude Sonnet?), cost-per-transaction or cost-per-user projection, break-even analysis, and business value quantification. Monthly cost reviews with department leaders keep spending visible and accountable.

Discipline 4: Optimise Model Selection and Usage Patterns

This is the highest-leverage cost control mechanism. An organisation deploying exclusively premium models pays 8–10 times more than one using intelligent tiering. Prompt caching, context window optimisation, and batch processing can reduce token costs by 50–90 percent for specific workload patterns.

Discipline 5: Report Business Value Against AI Spend

Cost governance without business context is a false economy. Measure cost-per-outcome: cost per customer inquiry resolved, cost per candidate evaluated, cost per document classified. When AI spending delivers measurable business value at known unit cost, budget allocation becomes a strategic decision, not a cost-cutting exercise.

Section 4: OpenAI, Gemini, and Claude — Commercial Models Compared

The market for enterprise generative AI has consolidated around four primary vendors, each with distinct pricing, capabilities, and contract terms.

OpenAI Enterprise

OpenAI's enterprise offering, launched in January 2024, prices at $45 to $75 per user per month with a 150-seat minimum and annual commitment requirement. The current model generation is GPT-5, with advanced reasoning and extended context. Enterprise customers receive: dedicated infrastructure, advanced security options, audit logging, custom instructions, and commercial use rights for generated outputs. OpenAI does not offer training data residency commitments outside the US or EU.

Google Gemini Enterprise

Google launched Gemini Enterprise as a standalone product in October 2025 at $30 per user per month, significantly undercutting OpenAI on pricing. The offering includes Gemini 2.0 models, 1 million token context window, integrated workspace tooling, and audit logging. Gemini Enterprise requires commitment but allows month-to-month after the initial term. Google's data residency options are more flexible than OpenAI's, with multi-region deployment available.

Anthropic Claude Enterprise

Anthropic's Claude Enterprise is priced at approximately $30–35 per seat for 500+ seat licenses. The current model generation includes Claude Sonnet 4.6 and Claude Opus 4.6, with a 200,000-token context window standard. Claude Opus is positioned as the most capable model for complex reasoning tasks. Anthropic offers a DPA with clear commitments that prompts are not used for training. For enterprises sensitive to data residency, Claude provides transparent handling of prompt data.

Azure OpenAI Service

Microsoft's Azure OpenAI offering operates on two pricing models: Provisioned Throughput Units (PTU) for predictable, consistent workloads, and pay-as-you-go (PAYG) tokens for variable demand. PTU pricing starts at $40 per hour for baseline capacity, offering 40–50 percent cost reduction versus PAYG for consistent workloads. PAYG tokens mirror OpenAI's public pricing. Azure OpenAI is integrated with Microsoft 365 security and compliance, making it the default choice for Microsoft-centric organisations.

AWS Bedrock

AWS Bedrock provides multi-model API access to Anthropic Claude, Meta Llama, Mistral, Cohere, and Amazon's own Titan models through a unified interface. Pricing is consumption-based with no seat minimums. Bedrock's advantage is API abstraction: switch models without code changes and manage capacity across multiple vendors in one console. Its disadvantage is cost opacity: enterprise customers cannot easily benchmark PAYG token rates against direct vendor pricing.

Key Contract Terms to Negotiate

Any AI vendor commitment should address four non-negotiable contract elements: (1) AI Data Processing Agreement (DPA) clarifying prompt ownership and training use; (2) IP indemnification for AI-generated outputs if copyright claims arise; (3) Data residency commitment specifying where prompts are processed and stored; (4) Exit rights covering model portability, fine-tuning export, and termination without penalty.

Section 5: GPU Cloud Infrastructure — The Hidden Cost Layer

Enterprise AI training and inference at scale runs on GPU infrastructure. A single H100 GPU costs $30,000 to $40,000. Enterprise clusters require 8, 16, or more H100s networked for distributed training. Understanding GPU cloud economics is essential for any organisation deploying proprietary models or large-scale fine-tuning.

Hyperscaler Pricing and Commitment Discounts

AWS on-demand H100 instances cost $3.90 per hour. Azure H100 clusters cost $3.40 per hour on-demand. Google Cloud H100 pricing is $3.25 per hour on-demand. Spot instance pricing (interruptible compute) drops to $1.00 to $2.00 per hour, representing 70–85 percent savings. Specialist GPU cloud providers (Lambda Labs, CoreWeave, Paperspace) often undercut hyperscalers by 20–40 percent for sustained training workloads.

AWS Savings Plans covering three-year commitments reduce on-demand rates by 72 percent. Azure Reservations offer similar discounts. GCP Commitments provide 60–70 percent discounts. For organisations projecting sustained GPU infrastructure spend, commitment discounts are often mandatory to achieve acceptable economics.
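The effect of a commitment discount on the hourly rates above is a one-line check; the 72 percent figure follows the text, though actual contract terms vary:

```python
def committed_rate(on_demand_rate: float, discount: float) -> float:
    """Effective hourly rate after applying a commitment discount."""
    return on_demand_rate * (1 - discount)

# 72% three-year Savings Plan discount on the $3.90/hr on-demand H100 rate.
rate = committed_rate(3.90, 0.72)
print(f"${rate:.2f}/hr")  # → $1.09/hr
```

Note that the committed rate lands in the same range as spot pricing, but without interruption risk, which is why committed capacity usually wins for sustained, SLA-bound inference while spot wins for interruptible training.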

Spot Instance Strategy for Training

Spot instances are suitable for training and batch inference but not for real-time inference workloads with SLA requirements. Organisations implementing graceful retry logic, checkpoint saving, and distributed training across multiple spot instances can reduce GPU costs by 85–90 percent. The engineering effort required to build this fault tolerance must be weighed against the savings, particularly for smaller teams.

Section 6: Showback and Chargeback for AI Spend

Organisational units deploying AI without cost transparency experience unconstrained demand growth. The data science team orders Claude Opus for every use case. The product team fine-tunes GPT-4 Turbo without cost justification. The operations team spins up GPU clusters for experimental models that never ship. Chargeback systems create accountability.

Showback First, Chargeback Second

Begin with showback: make AI costs visible to each department without directly charging them. Dashboard reporting of AI spend by team, product, and use case creates awareness before cost allocation becomes painful. After six months of showback, introduce gentle chargeback: teams begin to bear 50 percent of their AI costs, with IT absorbing the remainder. Gradually shift to full chargeback as teams optimise their usage patterns.
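The phased rollout above reduces to a simple allocation rule. The phase names are labels for this sketch, not standard terminology:

```python
# Share of a team's AI bill the team itself bears in each rollout phase.
PHASE_SHARE = {"showback": 0.0, "partial_chargeback": 0.5, "full_chargeback": 1.0}

def allocate(team_cost: float, phase: str) -> tuple:
    """Split a team's monthly AI bill between the team and central IT."""
    share = PHASE_SHARE[phase]
    team_pays = team_cost * share
    return team_pays, team_cost - team_pays

team_pays, it_absorbs = allocate(8_000.0, "partial_chargeback")
print(team_pays, it_absorbs)  # 4000.0 4000.0
```

Encoding the split this way makes the transition dates auditable: a single configuration change moves every team to the next phase on the agreed schedule.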

Token-Level Cost Allocation

Tagging every token-based request with team and product identifiers enables precise cost allocation. Azure OpenAI's cost tagging system supports this natively. For organisations using OpenAI's public API or alternative models, API gateway middleware (Kong, Apigee, custom wrappers) can inject tags before forwarding requests. Tags should include: team identifier, product code, use case classification, and cost centre.

Per-Team Unit Economics Benchmarking

Establish benchmarks: cost-per-customer-interaction for chatbots, cost-per-document-processed for data classification, cost-per-employee for copilot tools. Compare per-team unit economics monthly and identify outliers for review. Teams with 50 percent higher unit costs than peers are candidates for model optimisation, prompt engineering improvements, or caching strategy implementation.
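Outlier detection against the peer group can be this simple; the 1.5× threshold matches the 50-percent-above-peers rule above, and the team names and costs are hypothetical:

```python
from statistics import median

def flag_outliers(unit_costs: dict, threshold: float = 1.5) -> list:
    """Return teams whose cost-per-unit exceeds 1.5x the peer median."""
    peer_median = median(unit_costs.values())
    return sorted(t for t, c in unit_costs.items() if c > peer_median * threshold)

# Hypothetical cost-per-interaction by team.
costs = {"support": 0.030, "sales": 0.028, "hr": 0.090, "ops": 0.031}
print(flag_outliers(costs))  # → ['hr']
```

Using the median rather than the mean keeps a single runaway team from masking itself by dragging the benchmark upward.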

Governance Policy: Monthly Reviews and Budget Guardrails

Establish a monthly AI spend governance board with representatives from finance, technology, and business units. Review spend trends by team and use case. Set per-team budget guardrails with approval gates for overages. Define escalation paths: spending exceeding budget by 20 percent requires vice president sign-off; 40 percent requires C-suite approval.
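The escalation thresholds translate directly into a policy check. This is a sketch of the rule as stated; actual approval levels will differ by organisation:

```python
def escalation_level(actual_spend: float, budget: float) -> str:
    """Sign-off level required for a budget overrun, per the guardrails above."""
    overrun = actual_spend / budget - 1
    if overrun > 0.40:
        return "c-suite"
    if overrun > 0.20:
        return "vp"
    return "none"

print(escalation_level(150_000, 100_000))  # → c-suite (50% over budget)
```

Wiring this check into the weekly spend-alert pipeline means overages trigger the right approver automatically instead of surfacing at month-end.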

Section 7: Model Selection and Cost Optimisation Strategies

The single highest-leverage cost optimisation mechanism is intelligent model tiering. Deploy 70 percent of requests to budget models (Claude Haiku, GPT-4o mini, Gemini 1.5 Flash), 20 percent to mid-tier models (Claude Sonnet, GPT-4o, Gemini Pro), and 10 percent to premium models (Claude Opus, GPT-4 Turbo, Gemini Ultra). This distribution delivers 60–80 percent cost reduction versus all-premium routing whilst maintaining quality for complex reasoning tasks.
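The blended-rate arithmetic behind that claim, with illustrative per-1,000-token rates (real prices vary by vendor and change frequently):

```python
# Illustrative blended per-1K-token rates and the 70/20/10 routing mix above.
TIER_RATE = {"budget": 0.002, "mid": 0.010, "premium": 0.030}
TIER_MIX  = {"budget": 0.70,  "mid": 0.20,  "premium": 0.10}

def blended_rate(rates: dict, mix: dict) -> float:
    """Average per-1K-token rate under a routing distribution."""
    return sum(rates[t] * mix[t] for t in rates)

tiered = blended_rate(TIER_RATE, TIER_MIX)
savings = 1 - tiered / TIER_RATE["premium"]
print(f"blended ${tiered:.4f}/1K tokens, {savings:.0%} cheaper than all-premium")
```

At these assumed rates the 70/20/10 mix lands near the top of the 60–80 percent range; the exact saving depends entirely on the price gap between tiers, so recompute it whenever vendor pricing changes.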

Prompt Caching and Context Optimisation

OpenAI's Prompt Caching feature reduces costs by 90 percent on cached reads, charging 10 percent of the full input token rate for repeated context windows. This is transformational for RAG systems with stable knowledge bases, document analysis with large reference materials, and multi-turn conversations with long system prompts. Anthropic and Google offer similar caching mechanisms. Organisations deploying RAG-heavy architectures should architect for caching from the start.
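The realised saving depends on the cache hit ratio. A sketch assuming cached input tokens are billed at 10 percent of the full rate, as described above; the token volume and rate are hypothetical:

```python
def input_cost_with_caching(tokens: int, rate_per_1k: float, hit_ratio: float) -> float:
    """Input-token cost when a fraction of tokens is served from cache at 10% rate."""
    full_cost = tokens / 1000 * rate_per_1k
    effective_fraction = hit_ratio * 0.10 + (1 - hit_ratio)  # cached at 10%, rest at 100%
    return full_cost * effective_fraction

# Hypothetical: 100,000 input tokens at $0.003/1K, 80% of tokens cache-hit.
cost = input_cost_with_caching(100_000, 0.003, hit_ratio=0.8)
uncached = 100_000 / 1000 * 0.003
print(f"${cost:.3f} with caching vs ${uncached:.3f} without")
```

An 80 percent hit ratio cuts input cost by 72 percent at these assumptions; the headline 90 percent figure is only reached when nearly the entire context is cacheable, which is why stable system prompts and reference documents should be placed in the cacheable prefix.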

Context window management is equally important. Injecting 50,000 tokens of irrelevant context into every prompt multiplies costs 10-fold versus surgically selecting 5,000 tokens of relevant context. Advanced RAG systems use embedding similarity search and reranking models to minimise injected context whilst maintaining quality.

Fine-Tuning versus RAG Cost Trade-Offs

Fine-tuning a model on 10,000 training examples costs $5,000 to $15,000 depending on model size. Vendors typically charge more per token for the resulting fine-tuned model than for the base model, but fine-tuning removes the need to inject large context into every prompt, which can lower the all-in cost per query. RAG systems with vector retrieval and dynamic context injection cost more per inference but require no training investment. The decision tree is straightforward: if the custom knowledge corpus is stable and reused across thousands of queries, fine-tuning wins. If the knowledge corpus is frequently updated or use cases are unpredictable, RAG wins.
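The break-even is a one-line calculation. The per-query costs below are hypothetical, with the fine-tuned route assumed cheaper per query because it needs far less injected context:

```python
def breakeven_queries(finetune_cost: float, rag_per_query: float, ft_per_query: float) -> float:
    """Query volume at which fine-tuning's upfront cost is amortised versus RAG."""
    return finetune_cost / (rag_per_query - ft_per_query)

# Hypothetical: $10,000 fine-tune; RAG at $0.05/query vs fine-tuned at $0.01/query.
print(f"{breakeven_queries(10_000, 0.05, 0.01):,.0f} queries")  # → 250,000 queries
```

Below the break-even volume RAG is cheaper despite its higher per-inference cost; above it, the fine-tune pays for itself — provided the corpus stays stable enough that the model does not need retraining before the volume is reached.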

Batch Processing for Non-Real-Time Use Cases

OpenAI's Batch API and similar bulk processing endpoints charge 50 percent less than real-time API rates for asynchronous requests processed during off-peak hours. Email summarisation, document classification, and analytics processing are candidates. The trade-off is latency: batch requests are processed within 24 hours rather than in seconds. For non-real-time workloads, batch processing delivers substantial savings.

Model Version Management and Deprecation

OpenAI and other vendors continuously release new model versions at lower cost. GPT-4 Turbo is 50 percent cheaper than GPT-4 for similar capabilities. Gemini 1.5 Flash is 80 percent cheaper than Gemini Pro for many tasks. Organisations must establish a quarterly model review cycle: evaluate new models against incumbent models, measure quality impact, and migrate to cheaper alternatives where quality is maintained.

Section 8: Governing AI Contracts — The Four Non-Negotiables

AI vendor contracts differ fundamentally from traditional software licensing because the vendor's systems process sensitive customer data (prompts). Standard SaaS contracts are insufficient. Every AI vendor agreement must include four specific contractual provisions.

AI Data Processing Agreement (DPA)

Clarify in writing: (1) Does the vendor use prompts for model training? (2) Can the vendor retain prompts after processing? (3) Is there an opt-out mechanism for sensitive data? (4) Who owns the prompts and the outputs? OpenAI's enterprise product explicitly states that prompts are not used for training. Anthropic provides written commitments on data handling. Many SaaS vendors embed LLM capabilities without clear data commitments. Before deploying any AI vendor product with sensitive data, obtain written clarification on data handling.

IP Indemnification

AI models are trained on internet-scale data, including copyrighted material. Copyright litigation around AI is nascent but accelerating. Any vendor contract should include: "Vendor indemnifies Customer against claims that AI-generated outputs infringe third-party IP rights." The major enterprise vendors now offer forms of output indemnification, but the scope and exclusions vary widely and many smaller vendors offer none, so verify the current terms in writing. Negotiating indemnification is essential for organisations in regulated industries.

Data Residency

GDPR and other privacy regulations may require that customer data remain within specific geographic regions. Organisations with GDPR obligations must contractually ensure that prompts are not transferred outside approved regions without appropriate safeguards. Azure OpenAI and Google Gemini Enterprise offer multi-region deployment. OpenAI Enterprise offers limited residency options. Negotiate residency requirements into the contract before signing.

Exit Rights and Model Portability

What happens to fine-tuned models if you switch vendors? Can you export training data and prompts? Are there termination fees if you exit before contract end? Responsible vendors provide export mechanisms and reasonable exit provisions. Negotiate the right to download fine-tuned model weights, training data, and prompt logs at contract termination without penalty or delay.

Section 9: Building Your AI FinOps Operating Model

Implementing AI FinOps requires organisational structure, tooling, and governance cadence. Unlike traditional cloud FinOps (which IT owns), AI FinOps requires collaboration between IT, data science, business units, and finance.

Roles and Responsibilities

The AI FinOps practitioner role combines technical knowledge of GenAI APIs, cost visibility tooling, and procurement. This person (or small team in large enterprises) owns cost tracking, model benchmarking, and contract negotiations. The AI governance owner sits in the office of the CTO or Chief Data Officer and owns approval gates, policy enforcement, and unit economics tracking. The AI platform team owns infrastructure (GPU clusters, API gateway middleware, cost tagging systems). Business unit budget owners own departmental AI spend forecasting and optimisation within their budgets.

Tooling Stack

Native cloud cost explorers (AWS Cost Explorer, Azure Cost Management) provide PAYG token cost visibility but often lack model-level detail. Specialised AI observability platforms (LangSmith, Portkey, Helicone) capture token counts, model selection, latency, and cost per request, feeding into FOCUS-compliant cost data. API gateway middleware (Kong, API7) enforces cost allocation tagging and rate limiting. Internal dashboards (Looker, Tableau, Grafana) consume FOCUS-compliant cost data and present unit economics by team and use case.

Governance Cadence

Weekly spend alerts flag unexpected spikes. Monthly unit economics reviews examine cost-per-transaction trends by use case and team. Quarterly governance board meetings assess new use cases, model selections, and contract negotiations. Annual AI spend forecast and budget setting incorporates historical burn rates, anticipated new projects, and cost optimisation initiatives.

Key Performance Indicators

Track: cost-per-business-outcome (cost per customer interaction, cost per document processed); AI ROI by use case (business value divided by AI spend); budget variance (actual spend versus forecast); model efficiency ratio (average token cost per request, trending down over time). These metrics close the loop: cost governance becomes not merely cost control but cost accountability tied to business impact.
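These KPIs can be computed from the same cost data model used for allocation; the field names and quarterly figures below are hypothetical:

```python
def finops_kpis(ai_spend: float, business_value: float,
                forecast: float, total_tokens: int, requests: int) -> dict:
    """Headline KPIs: ROI by use case, budget variance, and model efficiency."""
    return {
        "roi": business_value / ai_spend,
        "budget_variance": ai_spend / forecast - 1,
        "avg_tokens_per_request": total_tokens / requests,
    }

# Hypothetical quarter: $200K spend vs $180K forecast, $700K measured value.
k = finops_kpis(200_000, 700_000, 180_000, total_tokens=90_000_000, requests=60_000)
print(k)  # roi 3.5, budget_variance ~0.11, avg_tokens_per_request 1500.0
```

Cost-per-outcome follows the same pattern once outcome counts (inquiries resolved, documents classified) are joined in; the efficiency ratio should trend down over time as tiering, caching, and context optimisation take effect.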

Stay Informed on AI Spend Governance

AI pricing, model releases, and FinOps frameworks evolve monthly. Subscribe to the Redress GenAI Knowledge Hub for quarterly updates on AI vendor pricing, model benchmarking, and cost optimisation strategies.

Section 10: Getting Started — The Redress AI FinOps Advisory Approach

Moving from reactive AI spend to governed AI economics requires a structured engagement. The Redress AI FinOps program is built on four foundational phases.

Phase 1: 4-Week AI Spend Audit (Weeks 1–4)

Map all current AI spending across OpenAI, Azure OpenAI, Gemini, Bedrock, internal ML platforms, and embedded LLM capabilities in SaaS applications. Establish baseline cost visibility. Segment spending by team, product, use case, and model. Calculate unit economics (cost-per-query, cost-per-active-user, cost-per-transaction). This phase delivers your first complete picture of where AI dollars actually flow.

Phase 2: Governance Design (Weeks 5–8)

Design the chargeback model: how are costs allocated to teams and business units? Define approval gates: what cost or new-model criteria require governance review? Design the model tiering policy: budget, mid-tier, and premium model distribution targets. Create the reporting dashboard: weekly/monthly metrics, unit economics by team, and budget variance tracking. This phase defines the operating model that will sustain AI cost governance long-term.

Phase 3: Contract Optimisation (Weeks 9–12)

Renegotiate existing AI vendor subscriptions at next renewal with utilisation data, volume commitments, and competitive benchmarking as leverage. Implement OpenAI Batch API where asynchronous processing is feasible, capturing 50 percent cost reductions. Evaluate alternative models (Anthropic Claude, Google Gemini) against incumbent solutions for identical use cases and renegotiate based on equivalent capability at lower cost.

Phase 4: Continuous Optimisation (Ongoing)

Monthly unit economics reviews identify high-cost use cases for optimisation (model tiering, prompt caching, context optimisation). Quarterly model benchmarking evaluates new model releases for cost-per-capability improvements. Annual AI spend forecast incorporates historical trends, anticipated new projects, and cost optimisation initiatives.

For comprehensive guidance on the intersection of FinOps and enterprise technology governance, review our resources on FinOps for enterprise software licensing and enterprise technology cost governance. For AWS-specific governance frameworks, consult our guide to AWS spend governance and negotiation. For organisations operating on Oracle infrastructure, review the Oracle OCI and infrastructure cost framework.

AI spend governance is ultimately an exercise in organisational behaviour. When teams bear the cost of their AI decisions, they optimise. When costs are hidden or centralised, unconstrained growth continues. The framework, tooling, and governance cadence in this guide exist solely to create transparency and accountability.

The organisations that will dominate AI adoption in 2027 and beyond will not be those with the largest AI budgets but those with the most disciplined AI cost governance. A 20 percent reduction in per-unit AI costs across 1,000 use cases compounds to hundreds of millions in savings. More importantly, cost discipline forces hard questions about business value: which AI capabilities genuinely drive outcomes, and which are digital theatre? That clarity is worth far more than the cost savings alone.