The State of the LLM Market in Enterprise AI
The enterprise AI market has bifurcated into two distinct procurement models. The commercial AI model — dominated by OpenAI's GPT-4o and o1 series, Anthropic's Claude, and Google's Gemini — offers API-based access to frontier models with consumption-based pricing, enterprise support, and data processing agreements. The open source model — led by Meta's Llama 3, Mistral AI's models, Alibaba's Qwen, and Google's Gemma — provides model weights that organisations can download, fine-tune, and deploy on their own infrastructure, at the cost of managing that infrastructure themselves.
A 2025 analysis across enterprise AI deployments found that open source LLMs now achieve approximately 80 percent of proprietary model coverage for real-world enterprise use cases, at 86 percent lower cost on a per-token basis. Gartner forecasts that more than 60 percent of enterprises will adopt open source LLMs for at least one production AI application by the end of 2025, up from 25 percent in 2023. The inflection point in open source capability — driven largely by Meta's Llama 3.1 and 3.3 families and Mistral's Large 3 model — has made the commercial-only AI strategy increasingly difficult to justify on cost grounds alone.
Open Source LLM Landscape and Licensing
Meta Llama: The Market-Dominant Open Source Model
Meta's Llama family — particularly Llama 3.1 70B and 3.3 70B — represents the most widely deployed open source LLM in enterprise environments. The Llama Community License permits royalty-free commercial use, modification, and distribution for most organisations, subject to attribution requirements ("Built with Llama") and one critical threshold: organisations with more than 700 million monthly active users across their products must obtain a separate commercial license from Meta. This threshold is irrelevant for all but a handful of the largest technology platforms, but it is a contractual obligation that enterprise legal teams must document.
A notable restriction applies to Llama 3.2, 3.3, and the Llama 4 multimodal models: Meta's license contains an EU geographical restriction that prevents EU-based organisations from being the licensee. This restriction applies to the deploying organisation's location, not to end-user geography — EU companies can still provide Llama-powered products to users globally, but EU-domiciled enterprises cannot be the named licensee. For European enterprises, this creates material legal complexity that requires careful structuring, often through a non-EU entity or by choosing alternative models such as Mistral.
The Llama 3.0 license also prohibited using model outputs to train competing models, though this restriction was relaxed from Llama 3.1 onwards, which permits training on outputs with proper attribution. Legal review of the specific Llama version in use is essential before incorporating output data into training pipelines.
Mistral: The European Alternative with Apache 2.0
Mistral AI's flagship models — Mistral Large 3, Mistral Medium, and the Mixtral mixture-of-experts family — are released under the Apache 2.0 license, which provides the most permissive commercial terms in the open source AI landscape. Apache 2.0 permits unrestricted commercial use, modification, and distribution, with only attribution requirements. Enterprise legal departments that require simple, unambiguous licensing terms — particularly European organisations affected by Meta's EU restriction — consistently prefer Mistral's licensing over Meta's bespoke Community License.
Mistral Large 3, released in December 2025, is a 675 billion parameter mixture-of-experts model that performs competitively with GPT-4o class models on most benchmarks while running under an Apache 2.0 license. Mistral's European provenance also provides genuine data sovereignty advantages for EU enterprises: models can be self-hosted within EU data centres with no data leaving EU jurisdiction, supporting GDPR Article 44 compliance without the data residency engineering complexity that commercial API providers require.
Qwen and Other Alternatives
Alibaba's Qwen 3 family (also Apache 2.0 licensed) and Google's Gemma (Gemma Terms of Use license, broadly permissive but with some restrictions) round out the primary open source options. Qwen 3 235B performs at frontier model levels on many benchmarks at a fraction of commercial API cost. Gemma is notable for its small model sizes (2B and 7B parameters) optimised for edge and on-device deployment, with licensing terms that are more restrictive than Apache 2.0 but broadly permissive for commercial enterprise use.
Commercial AI: The Real Cost of Consumption Billing
Commercial AI providers — primarily OpenAI, Anthropic, and Google — price their API access on consumption-based models: you pay per token processed (input tokens and output tokens billed separately). This creates fundamental budget unpredictability that enterprise finance teams consistently underestimate when approving initial AI projects.
The unpredictability problem is structural. AI workloads are not like traditional SaaS subscriptions where usage is relatively stable and predictable. Token consumption varies with prompt engineering choices, document sizes, context window usage, and application traffic patterns. A chatbot that handles five-paragraph queries in development may encounter users submitting 50-page documents in production. A summarisation pipeline that processes 10,000 documents per month in initial testing may scale to 500,000 documents per month within six months of production. Each of these growth scenarios compounds token costs in ways that are difficult to model before production data is available.
Across commercial AI deployments, actual production costs routinely exceed initial budget estimates by 40 to 300 percent in the first year. The most common cause is not price changes — OpenAI's token prices have generally decreased over time — but rather consumption growth driven by successful adoption that exceeds the initial volume model.
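The consumption-growth dynamic described above can be sketched numerically. The model below is a minimal illustration with hypothetical token prices and volumes (the $2.50/$10.00 per-million-token rates, document counts, and growth rate are assumptions for the sketch, not any provider's list prices):

```python
# Hypothetical consumption model: all prices and volumes are illustrative
# assumptions, not actual vendor pricing.

def annual_token_cost(docs_per_month, input_tokens_per_doc, output_tokens_per_doc,
                      input_price_per_1m, output_price_per_1m, monthly_growth=0.0):
    """Project 12-month API spend, compounding monthly volume growth."""
    total = 0.0
    volume = docs_per_month
    for _ in range(12):
        total += volume * (input_tokens_per_doc * input_price_per_1m
                           + output_tokens_per_doc * output_price_per_1m) / 1_000_000
        volume *= 1 + monthly_growth
    return total

# Flat usage: 10,000 docs/month at ~8k input / 1k output tokens each.
flat = annual_token_cost(10_000, 8_000, 1_000, 2.50, 10.00)
# Same workload growing 25% month over month -- i.e. successful adoption.
growing = annual_token_cost(10_000, 8_000, 1_000, 2.50, 10.00, monthly_growth=0.25)
print(f"flat:    ${flat:,.0f}/year")
print(f"growing: ${growing:,.0f}/year ({growing / flat:.1f}x the flat estimate)")
```

Even with identical per-token prices, sustained adoption growth multiplies the annual bill several times over the flat projection — which is why budgeting from the headline token price alone fails.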
OpenAI Enterprise Agreements: Lock-In Provisions You Must Review
OpenAI enterprise agreements contain several provisions that create material vendor lock-in and that organisations frequently overlook during initial contract review. First, enterprise agreements require minimum usage commitments — typically expressed as minimum annual token volumes or minimum annual spend thresholds — that create financial obligations even if the organisation's AI usage declines or the product is deprecated in favour of a better model. Second, enterprise agreements typically include model version continuity provisions that specify which model versions are covered under the agreement and at what price, creating complexity when OpenAI releases successor models (for example, the transition from GPT-4 to GPT-4o to GPT-4.5 required contract renegotiation rather than automatic benefit). Third, data processing agreements in OpenAI enterprise contracts contain provisions around model training opt-outs, data retention, and jurisdiction that require careful legal review for regulated industries.
Organisations that sign OpenAI enterprise agreements without independent legal review of these lock-in provisions frequently discover that migrating to an alternative model — whether Anthropic Claude, Google Gemini, or an open source alternative — requires breaking contract commitments or paying minimum spend penalties. The lock-in is not primarily technical (OpenAI's APIs are relatively standard) but contractual.
Azure OpenAI vs Direct OpenAI: A Critical Pricing Comparison
Many enterprise AI buyers do not realise that they have a choice between accessing GPT-4o and other OpenAI models directly through OpenAI's API or through Microsoft's Azure OpenAI Service. The pricing models, data handling terms, and commercial implications differ significantly between the two access paths.
On pricing: Azure OpenAI and direct OpenAI API use the same published token prices for most models. The price parity is deliberate — Microsoft and OpenAI maintain list price alignment to avoid channel conflict. However, enterprise customers with existing Azure commitments or Azure Enterprise Agreements (EAs) can consume Azure OpenAI token usage against their Azure committed spend, effectively unlocking discounts and reserved capacity terms that the direct OpenAI API does not offer. Organisations with large Azure EAs who purchase AI via the Azure marketplace receive better effective per-token economics than equivalent direct OpenAI contracts, because Azure consumption qualifies for EA discounts and committed spend credits.
On data residency and compliance: Azure OpenAI provides data residency in specific Azure regions (including EU regions) with Microsoft's enterprise compliance framework, SOC 2, ISO 27001, and GDPR data processing agreements built into the Azure service terms. Direct OpenAI API provides OpenAI's own compliance framework, which is strong but less mature than Microsoft's enterprise compliance infrastructure. For regulated industries — financial services, healthcare, government — Azure OpenAI's compliance posture is typically stronger and easier to document for regulatory purposes.
On model availability: Azure OpenAI does not always have the latest OpenAI models available immediately. New OpenAI model releases are typically available on the direct API first, with Azure availability following days to weeks later. Organisations that need access to the absolute latest OpenAI models must access them directly, at least temporarily.
The practical implication: enterprises with significant Azure EA commitments should default to Azure OpenAI for consumption of OpenAI models, while organisations without Azure commitments or with multi-cloud strategies should evaluate direct OpenAI API access with careful contract review of minimum spend and lock-in terms.
The Hidden Costs of Open Source LLMs
The 86 percent cost reduction claim for open source LLMs is accurate in a narrow context: API token costs. But open source LLMs are not free — they shift costs from recurring API fees to one-time and ongoing infrastructure and engineering costs that are less visible on a vendor invoice but are no less real.
GPU Infrastructure Costs
Running a 70 billion parameter model like Llama 3.3 70B requires approximately two A100-80GB GPUs at minimum for inference at acceptable latency. At cloud market rates, two A100 instances on AWS or Azure cost approximately $5 to $8 per hour depending on reservation level and cloud provider. At 8,760 hours per year, this represents $43,800 to $70,080 per year in raw infrastructure cost before networking, storage, and management overhead. For high-throughput workloads, multiple GPU instances running in parallel multiply this base cost. On-premises GPU hardware amortises over three to five years but requires significant upfront capital: an A100 server with two GPUs costs approximately $30,000 to $50,000 in 2025 hardware pricing.
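The arithmetic behind these figures is straightforward to reproduce. This sketch uses the rate ranges cited above; the $40,000 server price and four-year amortisation period are assumptions chosen from within the ranges given:

```python
# Back-of-envelope GPU inference cost, using the rate ranges cited above.
# Excludes networking, storage, power, and management overhead.

HOURS_PER_YEAR = 8_760

def annual_cloud_cost(hourly_rate):
    """Raw instance cost for 24/7 inference at a given hourly rate."""
    return hourly_rate * HOURS_PER_YEAR

def annual_amortised_cost(hardware_cost, years=4):
    """On-prem capital cost spread over the amortisation period."""
    return hardware_cost / years

low, high = annual_cloud_cost(5.0), annual_cloud_cost(8.0)
print(f"cloud (2x A100-80GB):  ${low:,.0f} to ${high:,.0f} per year")
print(f"on-prem (~$40k server, 4-year amortisation): "
      f"${annual_amortised_cost(40_000):,.0f} per year")
```

The on-prem number looks far cheaper per year, but it carries the upfront capital outlay, utilisation risk, and the operational costs discussed next.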
Engineering and Operational Costs
Deploying and operating an open source LLM in production requires ML engineering capabilities that most enterprise IT organisations do not have in-house. Fine-tuning the base model for domain-specific tasks, integrating the model into application APIs, building monitoring for model drift and output quality, managing GPU driver and framework updates, and implementing model versioning for reproducibility all require specialist skills. The typical fully-loaded cost of one ML engineer capable of managing an on-premises LLM deployment is $180,000 to $280,000 per year in most major markets.
Model Evaluation and Quality Management
Commercial AI providers bear the cost of model evaluation, safety testing, and quality management. Open source deployers assume this responsibility. Building evaluation frameworks, red-teaming for adversarial inputs, monitoring output quality in production, and responding to model failures are ongoing operational costs that are invisible in the initial build-versus-buy calculation but significant in total cost of ownership.
Decision Framework: When Open Source Wins, When Commercial Wins
The build-versus-buy decision in enterprise LLMs is not universal — it depends on use case, volume, data sensitivity, and organisational capability. Open source LLMs win decisively when data sovereignty or data privacy requirements make external API access legally or commercially prohibited, when token volumes are high enough that commercial API costs exceed open source infrastructure and engineering costs (typically at $500,000 or more in annual API spend), when fine-tuning on proprietary data is required for domain-specific quality, and when the organisation has existing GPU infrastructure or ML engineering capability that can absorb open source deployment without incremental headcount.
Commercial AI wins when speed to production is the primary constraint, when token volumes are moderate (under $200,000 annual API spend), when the organisation lacks ML engineering capability, when the use case requires frontier model performance on complex reasoning tasks, and when the compliance infrastructure offered by a major cloud provider (Azure OpenAI) simplifies regulatory documentation.
The hybrid approach — using open source LLMs for high-volume, data-sensitive workloads while retaining commercial API access for complex reasoning tasks that require frontier capability — is the most common architecture in mature enterprise AI deployments and typically delivers the best total cost of ownership while avoiding single-vendor lock-in.
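The break-even logic in this framework can be sketched as a simple comparison. All figures below are assumptions drawn from the cost ranges in this article (mid-range GPU cost, half of one ML engineer's fully-loaded cost); a real decision would also weigh data sovereignty, capability requirements, and organisational readiness:

```python
# Illustrative build-vs-buy break-even using cost figures from this article.
# All numbers are sketch assumptions, not a pricing model.

def open_source_annual_tco(gpu_cost=60_000, engineer_cost=220_000,
                           engineer_fraction=0.5):
    """Fixed yearly cost: GPU infrastructure plus a share of one ML engineer."""
    return gpu_cost + engineer_cost * engineer_fraction

def recommend(annual_api_spend, tco=open_source_annual_tco()):
    """Pick the cheaper path on raw cost alone; real decisions weigh
    data sovereignty, model capability, and team readiness as well."""
    return "open source" if annual_api_spend > tco else "commercial API"

for spend in (150_000, 500_000):
    print(f"${spend:,}/year API spend -> {recommend(spend)}")
```

Under these assumptions the crossover sits around $170,000 of annual API spend — broadly consistent with the $200,000/$500,000 thresholds above once evaluation and quality-management overheads are added on the open source side.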
Six Recommendations for Enterprise AI Buyers
1. Model production consumption before signing any commercial AI contract. Use actual application data — document sizes, query frequencies, expected growth — to model token consumption at scale. Apply a 2x to 3x buffer to account for growth and unanticipated usage patterns. The resulting annual cost projection is the number that should drive the build-versus-buy analysis, not the headline per-token price.
2. Review OpenAI enterprise agreement lock-in provisions with independent counsel. Minimum spend commitments, model version terms, and data processing provisions in OpenAI enterprise agreements create obligations that survive initial use cases. Understand what you are committing to before signing, not after the use case has been deployed and switching costs have materialised.
3. Evaluate Azure OpenAI against direct OpenAI access. If your organisation has an Azure EA or significant Azure committed spend, Azure OpenAI consumption qualifies for EA discounts and credits that direct OpenAI API access does not. For regulated industries, Azure's compliance posture is typically stronger and easier to document.
4. Treat Llama's EU geographic restriction as a real legal issue, not a theoretical one. EU-domiciled organisations cannot be the named licensee on Llama 3.2 and later multimodal models. Structure your open source AI deployment through a non-EU entity where Llama is selected, or choose Mistral (Apache 2.0) as the primary open source alternative for EU deployments.
5. Include infrastructure and engineering costs in the open source TCO model. A fair comparison of open source versus commercial AI must account for GPU infrastructure, ML engineering time, and operational monitoring costs. Open source is not free — it is infrastructure-intensive. The cost advantage is real but smaller than the raw token price comparison suggests.
6. Build a multi-vendor AI strategy that avoids single-provider dependency. Enterprise AI strategy should not replicate the single-vendor dependency that organisations have spent decades trying to escape in traditional software. Architect AI capabilities so that model providers can be swapped without re-engineering the application layer, using abstraction layers (LangChain, LlamaIndex, or proprietary middleware) and standard API contracts.
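The abstraction-layer idea in recommendation 6, combined with the hybrid routing pattern, can be sketched framework-independently. The provider classes, routing rule, and method names below are hypothetical illustrations, not any framework's actual API:

```python
# Minimal provider-abstraction sketch: route all requests through one
# interface so the backing model can be swapped without re-engineering
# the application layer. Class and method names are hypothetical.

from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class CommercialModel:
    """Stand-in for a frontier API model (e.g. behind an OpenAI client)."""
    def complete(self, prompt: str) -> str:
        return f"[frontier] {prompt[:40]}"

class SelfHostedModel:
    """Stand-in for a self-hosted open source model (e.g. Llama on own GPUs)."""
    def complete(self, prompt: str) -> str:
        return f"[self-hosted] {prompt[:40]}"

def route(prompt: str, sensitive: bool) -> ChatModel:
    """Hybrid policy: data-sensitive traffic stays on self-hosted infrastructure,
    complex general reasoning goes to the frontier API."""
    return SelfHostedModel() if sensitive else CommercialModel()

prompt = "Summarise this internal contract."
answer = route(prompt, sensitive=True).complete(prompt)
print(answer)
```

Because application code depends only on the `ChatModel` interface, replacing a provider — or renegotiating away from one — is a configuration change rather than a rewrite, which is precisely the switching leverage the multi-vendor strategy is meant to preserve.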