Why Traditional SaaS Procurement Fails for AI
Standard enterprise SaaS procurement frameworks were designed for per-seat, fixed-term subscription agreements with predictable renewal cycles and established vendor risk profiles. AI platforms introduce five dimensions that fall outside this framework entirely.
First, consumption billing replaces per-seat pricing. AI platform costs scale with token consumption, GPU hours, or API call volume — not with headcount. Budget unpredictability is structural, not incidental. Second, model risk introduces a category of technical and regulatory risk that SaaS procurement checklists do not assess: hallucination rates, output consistency, model deprecation timelines, and AI Act compliance classification. Third, data governance requirements for AI platforms are more stringent than for SaaS applications because AI systems process input data in ways that affect model outputs, creating data retention, sovereignty, and non-training obligations that require contractual protections. Fourth, lock-in in AI platforms is architectural as well as contractual — fine-tuned models, embedded prompt flows, and integrated data pipelines create switching costs that accumulate rapidly. Fifth, the regulatory landscape for AI is evolving faster than any other technology category, with EU AI Act obligations, financial services model risk guidance, and healthcare AI regulations creating compliance requirements that must be embedded in vendor contracts from day one.
Phase One: Define Requirements Before Evaluating Vendors
The most common procurement failure is launching vendor evaluation before the organisation has achieved internal alignment on what it is actually procuring. An AI platform is not a monolithic product — it is a combination of foundation model access, infrastructure, MLOps tooling, security controls, and support services that can be assembled in multiple configurations from multiple vendors.
Use Case Classification
Begin by classifying AI use cases by risk tier and capability requirement. Low-risk productivity use cases (document summarisation, meeting transcription, code assistance) have fundamentally different requirements from high-risk decision-support applications (credit risk scoring, medical imaging analysis, HR screening). The EU AI Act creates binding obligations for high-risk AI systems that must flow into vendor contracts; failing to classify use cases before procurement means these obligations will be identified after the contract is signed, creating renegotiation costs or compliance gaps.
Document each AI use case against five variables: the business function it serves, the data it processes (including sensitivity classification), the regulatory framework applicable to it, the minimum acceptable performance threshold, and the consequence of failure. This classification becomes the foundation for vendor requirement specifications that are measurable rather than narrative.
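The five classification variables above can be captured as a structured record so that every use case enters procurement in the same measurable form. The sketch below is illustrative: the field names, the two-tier risk enum, and the example values are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    LOW = "low"    # e.g. document summarisation, meeting transcription
    HIGH = "high"  # e.g. credit scoring, medical imaging, HR screening


@dataclass(frozen=True)
class UseCaseRecord:
    """One AI use case, documented against the five classification variables."""
    name: str
    business_function: str     # the business function the use case serves
    data_sensitivity: str      # sensitivity classification of processed data
    regulatory_framework: str  # e.g. "EU AI Act high-risk", "none"
    min_performance: str       # minimum acceptable performance threshold
    failure_consequence: str   # consequence of a wrong or failed output
    risk_tier: RiskTier


summarisation = UseCaseRecord(
    name="meeting-summarisation",
    business_function="internal productivity",
    data_sensitivity="internal",
    regulatory_framework="none",
    min_performance="factual consistency >= 95% on sampled audits",
    failure_consequence="minor rework",
    risk_tier=RiskTier.LOW,
)
```

A register of these records becomes the direct input to vendor requirement specifications: each field maps to a contract clause or evaluation criterion rather than to a narrative description.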
Build vs Buy Decision
The build versus buy decision deserves explicit analysis before entering vendor evaluation, because the answer determines the scope of procurement. The market has shifted decisively: 76 percent of enterprises now purchase AI platforms rather than building foundation models from scratch. However, the build versus buy question applies at multiple levels simultaneously — the foundation model layer (almost always buy), the application platform layer (buy or blend), and the last-mile integration layer (almost always build).
Most Fortune 500 organisations settle on a blended approach: purchasing vendor platforms for governance infrastructure, audit trails, multi-model routing, RBAC, DLP, and compliance attestations, while building custom retrieval, tool adapters, evaluation datasets, and sector-specific guardrails. Procurement frameworks must accommodate this blended architecture, with contracts covering both the platform layer and the professional services layer that builds the custom components.
Phase Two: The Six-Dimension Vendor Evaluation Framework
A rigorous AI vendor evaluation assesses six dimensions with equal weight, using measurable evidence rather than vendor narratives. This framework prevents the most common failure mode: selecting vendors based on demo quality and brand strength rather than the attributes that determine production success.
Dimension 1: Technical Fit
Technical fit assessment evaluates whether the vendor's AI capabilities meet the organisation's use case requirements across accuracy, latency, context window, multi-modal capability, and fine-tuning support. Require vendors to complete standardised technical evaluations on representative samples of the organisation's actual data and use cases — not generic benchmarks. Specify minimum accuracy thresholds, maximum acceptable latency, and required output format consistency as pass/fail criteria before any qualitative assessment.
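A pass/fail gate of this kind can be expressed in a few lines. The thresholds below and the shape of the measured results are illustrative assumptions; the point is that the criteria are agreed before evaluation begins and applied mechanically to every vendor.

```python
# Pass/fail gate for a vendor technical evaluation, run on the
# organisation's own representative data. Threshold values are examples.
THRESHOLDS = {
    "accuracy": 0.92,           # minimum accuracy on representative samples
    "p95_latency_ms": 1500,     # maximum acceptable 95th-percentile latency
    "format_valid_rate": 0.99,  # required output-format consistency
}


def gate(measured: dict) -> tuple[bool, list[str]]:
    """Return (passed, failures) against the pre-agreed thresholds."""
    failures = []
    if measured["accuracy"] < THRESHOLDS["accuracy"]:
        failures.append("accuracy below threshold")
    if measured["p95_latency_ms"] > THRESHOLDS["p95_latency_ms"]:
        failures.append("latency above threshold")
    if measured["format_valid_rate"] < THRESHOLDS["format_valid_rate"]:
        failures.append("output format consistency below threshold")
    return (not failures, failures)


# A vendor failing any single criterion is excluded before any
# qualitative or demo-based assessment takes place.
passed, why = gate(
    {"accuracy": 0.94, "p95_latency_ms": 1800, "format_valid_rate": 0.995}
)
```

Here a vendor with strong accuracy still fails on latency, which is exactly the outcome a demo-led evaluation tends to miss.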
Model freshness and deprecation policy deserve explicit assessment. OpenAI, Google, Anthropic, and others regularly release new model versions and deprecate older ones, sometimes with as little as three months' notice. Organisations that build production applications on specific model versions face forced migrations on the vendor's timeline. Require vendors to commit to minimum deprecation notice periods — at least twelve months for production models — and evaluate their historical track record on this commitment.
Dimension 2: Data Governance and Security
AI systems are inherently data-intensive, and data governance is the highest-stakes evaluation dimension for regulated industries. Assess vendors against five mandatory data governance requirements: data residency (the ability to specify geographic boundaries for data processing and storage), data isolation (confirmation that the organisation's data cannot be accessed by other customers or used to train the vendor's shared models), data retention controls (configurable retention periods with auditable deletion), audit logging (complete records of all data inputs and model outputs for regulatory review), and encryption at rest and in transit using the organisation's preferred key management infrastructure.
For OpenAI enterprise agreements specifically: the standard API does not train on customer data, but this commitment must be obtained in writing within the contract. Do not rely on the vendor's public policy documentation, which can be changed unilaterally. The contractual commitment to data non-training and the specific definition of customer data covered by that commitment are essential negotiating points in every OpenAI enterprise agreement.
Dimension 3: Governance and AI Compliance
The EU AI Act creates a tiered compliance framework based on AI system risk classification, with high-risk applications requiring conformity assessments, bias testing, transparency disclosures, and human oversight mechanisms. Vendors supporting high-risk AI use cases must provide compliance documentation that supports the customer's conformity assessment obligations. Require vendors to demonstrate: model explainability capabilities appropriate to the use case risk tier, bias testing results and remediation commitments, human override mechanisms for automated decision systems, audit trail functionality sufficient to support regulatory review, and a formal AI governance programme with named ownership.
Consumption billing creates a specific governance challenge: budget accountability. Azure OpenAI vs direct OpenAI API pricing should be compared not only on token cost but on governance tooling. Azure OpenAI includes native Microsoft Cost Management integration, Azure Policy controls, and Azure Monitor telemetry that provide superior consumption governance compared to direct OpenAI API, which requires third-party tooling for equivalent budget controls. The 10 to 15 percent Azure price premium is partially offset by governance tooling that would need to be procured separately for direct API deployments.
Dimension 4: Operating Model Compatibility
AI platforms must integrate with the organisation's existing technology stack, team capabilities, and operating processes. Evaluate vendor integration with the organisation's identity management (SSO, RBAC), monitoring and observability infrastructure, CI/CD pipelines for model deployment, and data platform architecture. Platform integration friction is consistently underestimated during procurement and is a common source of budget overruns during implementation.
Assess the vendor's support model explicitly: what are the SLA commitments, what support tier is included in the base contract, and what are the escalation paths for production incidents? OpenAI's standard API tier does not include guaranteed SLA commitments — enterprise agreements can negotiate 99.9 percent uptime SLAs with credit remedies, but this requires explicit negotiation. Do not assume standard enterprise SLA protections apply without verifying the contract language.
Dimension 5: Economics and Contracting
AI platform economics require a fundamentally different financial model than SaaS procurement. Consumption billing creates budget unpredictability that must be managed through contract structure, not just operational controls. The economics dimension of vendor evaluation should assess: list price versus negotiated price for expected consumption volumes, the availability and pricing of reserved capacity or committed throughput options (Azure PTUs, AWS Bedrock provisioned throughput), minimum contract commitments and the consequences of under-utilisation, price escalation protections over multi-year agreements, and exit provisions including data portability costs.
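The reserved-capacity decision reduces to a break-even calculation between committed and pay-as-you-go spend. All prices and capacity figures in this sketch are hypothetical placeholders, not vendor list prices; substitute the negotiated rates before using it in a business case.

```python
# Break-even sketch: committed throughput vs pay-as-you-go tokens.
# All figures are hypothetical placeholders, not vendor list prices.
PAYG_PER_1K_TOKENS = 0.010      # assumed on-demand price per 1k tokens (USD)
RESERVED_MONTHLY = 6_000.0      # assumed monthly cost of one reserved unit (USD)
RESERVED_CAPACITY_1K = 900_000  # assumed 1k-token blocks the unit serves monthly


def monthly_cost(expected_1k_tokens: float) -> dict:
    """Compare the two pricing models for an expected monthly volume."""
    payg = expected_1k_tokens * PAYG_PER_1K_TOKENS
    # Reserved capacity is a fixed cost whether it is used or not;
    # under-utilisation is the risk a minimum commitment creates.
    utilisation = expected_1k_tokens / RESERVED_CAPACITY_1K
    return {
        "payg": payg,
        "reserved": RESERVED_MONTHLY,
        "utilisation": utilisation,
        "reserved_cheaper": RESERVED_MONTHLY < payg,
    }


# Volume at which the fixed commitment starts to pay off.
break_even_1k = RESERVED_MONTHLY / PAYG_PER_1K_TOKENS
```

The same arithmetic, run across optimistic and pessimistic consumption scenarios, is what turns "minimum contract commitments and the consequences of under-utilisation" into a concrete negotiating position.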
Lock-in provisions in enterprise AI agreements deserve explicit legal review. OpenAI enterprise agreements contain multi-year commit structures, model version dependencies, and limited portability provisions that create financial and architectural lock-in. Azure OpenAI compounds this with infrastructure dependency on the Azure platform. Negotiate explicit exit provisions including: the right to export all custom model artifacts, prompt libraries, and evaluation datasets; API compatibility guarantees for a minimum of twelve months following any model deprecation; and termination-for-convenience rights with reasonable notice periods. These protections are negotiable — the default contract language favours the vendor.
Dimension 6: Business Value and Accountability
The final evaluation dimension is the most frequently skipped: what business outcomes will the vendor be contractually accountable for, and how will performance be measured? AI platform vendors are adept at framing evaluation in terms of capability features rather than outcome delivery. Require vendors to commit to measurable business value metrics — not just technical performance benchmarks — and define remediation commitments if those metrics are not achieved within defined timeframes.
Define pilot success conditions in contractual terms before any proof of concept begins. The most common procurement failure pattern is running an unstructured pilot, receiving positive qualitative feedback, and proceeding to a full enterprise commitment without evidence-based decision criteria. A rigorous procurement framework requires standardised scorecards with weighted criteria, required evidence thresholds, and predefined success conditions for pilots that create an objective basis for the production decision.
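A weighted scorecard of the kind described can be a few lines of arithmetic. The dimension weights, the 0-to-5 scoring scale, and the success threshold below are illustrative assumptions; what matters is that they are agreed and frozen before the pilot starts.

```python
# Weighted pilot scorecard sketch mapped to the six evaluation
# dimensions. Weights, scores (0-5), and threshold are illustrative.
WEIGHTS = {
    "technical_fit": 0.20,
    "data_governance": 0.20,
    "compliance": 0.15,
    "operating_model": 0.15,
    "economics": 0.15,
    "business_value": 0.15,
}
SUCCESS_THRESHOLD = 3.5  # weighted score required to proceed to production


def weighted_score(scores: dict) -> float:
    """Combine per-dimension scores into one weighted pilot score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 1
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)


pilot = {"technical_fit": 4, "data_governance": 4, "compliance": 3,
         "operating_model": 3, "economics": 4, "business_value": 3}
proceed = weighted_score(pilot) >= SUCCESS_THRESHOLD
```

Because the threshold is fixed in advance, the production decision becomes a comparison against a number rather than an argument about qualitative impressions.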
Phase Three: Contract Negotiation for AI Platforms
AI platform contract negotiation requires specialised expertise because the commercial structures are unlike traditional software licensing. Consumption billing, model versioning, compliance obligations, and lock-in provisions are all negotiating dimensions that do not have analogues in legacy enterprise software agreements.
Consumption Billing Protections
Negotiate budget guardrails directly into the contract: hard spending caps that trigger vendor notification before invoicing, the right to implement application-level token limits without vendor approval, and a 30-day billing dispute window for anomalous consumption charges. Consumption billing creates budget unpredictability that is an inherent feature of the pricing model — contract protections reduce but do not eliminate this risk.
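The application-level token limits mentioned above can be enforced with a simple budget guard in front of every model call. This is a minimal sketch assuming each call can estimate its token usage before dispatch; the cap, the alert threshold, and the class shape are illustrative.

```python
# Application-level token budget guard. Limits are illustrative; the
# assumption is that each model call can estimate its tokens up front.
class TokenBudget:
    """Hard monthly cap with a soft-alert threshold, per application."""

    def __init__(self, monthly_cap: int, alert_at: float = 0.8):
        self.monthly_cap = monthly_cap
        self.alert_at = alert_at
        self.used = 0

    def authorise(self, estimated_tokens: int) -> bool:
        """Refuse calls that would breach the hard cap; True means proceed."""
        if self.used + estimated_tokens > self.monthly_cap:
            return False  # hard stop: caller must queue, degrade, or escalate
        self.used += estimated_tokens
        if self.used >= self.alert_at * self.monthly_cap:
            print(f"ALERT: {self.used / self.monthly_cap:.0%} of monthly cap used")
        return True


budget = TokenBudget(monthly_cap=1_000_000)
ok = budget.authorise(250_000)  # well under the cap, so the call proceeds
```

An operational control like this complements, rather than replaces, the contractual spending caps: the guard stops runaway consumption in real time, while the contract governs what happens when an anomalous invoice arrives anyway.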
Model Version Commitments
Require a minimum twelve-month deprecation notice period for any production model version the organisation's applications depend on. Require API compatibility guarantees that prevent breaking changes without advance notice and migration support. These commitments protect against forced migrations that can cost significantly more than the annual contract value to execute.
Data Provisions
In addition to the data non-training commitment discussed above, negotiate explicit data portability rights: the right to export all fine-tuned model weights, prompt libraries, evaluation datasets, and usage analytics in portable formats within thirty days of contract termination. Without these provisions, the organisation may discover that years of proprietary AI investment is effectively owned by the platform vendor.
SLA and Remediation
Negotiate an explicit uptime SLA with financial remedies that go beyond standard service credits. Require escalation paths to named technical contacts for production-critical incidents. Require a defined root-cause analysis process for incidents above a threshold severity. Standard API tier agreements do not include these protections — enterprise agreements can and should.
Phase Four: Governance and Ongoing Management
AI procurement does not end at contract signature. The ongoing governance of AI platform costs, compliance, and performance requires a dedicated management process that differs materially from traditional software asset management.
Establish a monthly AI platform cost review at the application owner level, not just the finance reporting level. Tag all API calls by application, use case, team, and environment from deployment day one — retroactive tagging is practically impossible once an application reaches production scale. Implement consumption dashboards with weekly trend analysis and anomaly alerting. Require quarterly compliance reviews against applicable AI regulation frameworks, including EU AI Act requirements for high-risk systems.
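The tagging and anomaly-alerting controls above can be sketched as follows. The tag schema mirrors the four dimensions named in the text (application, use case, team, environment); the anomaly rule, spend above twice the trailing weekly average, is an illustrative starting point rather than a recommended threshold.

```python
# Cost-tagging and anomaly-alert sketch. Tag every call at the moment
# it is made; the 2x-trailing-average anomaly rule is illustrative.
from collections import defaultdict
from statistics import mean

usage_log: list[dict] = []


def record_call(app: str, use_case: str, team: str, env: str, cost_usd: float):
    """Attach the four cost tags to a call as it happens, never retroactively."""
    usage_log.append({"app": app, "use_case": use_case,
                      "team": team, "env": env, "cost_usd": cost_usd})


def weekly_spend_by_app() -> dict:
    """Roll the tagged log up by application for the weekly review."""
    totals = defaultdict(float)
    for call in usage_log:
        totals[call["app"]] += call["cost_usd"]
    return dict(totals)


def is_anomalous(this_week: float, trailing_weeks: list[float]) -> bool:
    """Flag a week whose spend exceeds twice the trailing average."""
    return bool(trailing_weeks) and this_week > 2 * mean(trailing_weeks)
```

In production this logic would live in observability tooling rather than application code, but the principle is the same: every invoice line must be attributable to an application, a team, and an environment before the first anomalous bill arrives.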
Create a model governance committee with representation from legal, compliance, security, and the business units deploying AI. This committee owns the organisation's AI use case classification, approves production deployments of high-risk AI systems, monitors vendor model updates for compliance impact, and manages the relationship with AI platform vendors at the commercial and governance level.
Priority Recommendations for AI Procurement Leaders
1. Never Evaluate AI Vendors on Demo Quality Alone: Require standardised technical evaluations on the organisation's own representative data. Define pass/fail criteria before evaluation begins. The most capable vendor for a benchmark use case may be entirely unsuitable for the organisation's specific requirements.
2. Classify AI Use Cases by Risk Tier Before Procurement: EU AI Act obligations, data governance requirements, and SLA expectations differ fundamentally between low-risk productivity tools and high-risk decision-support systems. Procurement terms must reflect these differences.
3. Flag Lock-In Provisions in Every AI Enterprise Agreement: OpenAI enterprise agreements contain commitment structures, model versioning dependencies, and limited portability provisions that should receive explicit legal review. Negotiate exit provisions, data portability rights, and model deprecation protections before signing.
4. Compare Azure OpenAI vs Direct OpenAI Explicitly: The Azure premium (10 to 15 percent over direct API pricing) is justified for regulated industries requiring data residency, compliance certification, and enterprise SLA. For organisations without these requirements, direct API access delivers equivalent capabilities at lower cost. This decision should be deliberate, not assumed.
5. Implement Consumption Governance Before Production Launch: Consumption billing predictability requires application-level token budgets, granular cost tagging, weekly consumption reviews, and scenario-based board approvals. Establish these controls before any application goes into production, not after the first unexpected invoice.
6. Negotiate Governance Commitments, Not Just Pricing: The most durable AI platform agreements include model deprecation commitments, data portability rights, compliance support obligations, and performance accountability metrics — not just discounted token rates. The governance terms will matter more than the price over the life of a three-to-five year AI platform commitment.