Cohere Private Deployments Shift Capex

Diving deeper into

Cohere

Company Report
Because customers can run its models through managed cloud services or on their own hardware, Cohere avoids large upfront infrastructure costs
Analyzed 5 sources

Cohere is set up more like an enterprise software vendor than a hyperscale AI utility. When a bank, government agency, or large manufacturer runs Cohere models inside its own cloud account or data center, Cohere sells the model and deployment layer without having to prebuild and carry the full inference fleet on its own balance sheet. That shifts compute spend closer to the customer, keeps gross margins in a SaaS like range, and lets Cohere put more of its cash into product and sales instead of idle GPUs.

  • Private deployments are the core of the model. Around 85% of Cohere revenue comes from private deployments on multi year enterprise contracts, with customers like Oracle, RBC, Fujitsu, and LG running workloads on their own infrastructure. That is why Cohere can avoid the capex burden and weak consumer style unit economics that hit API heavy labs.
  • The deployment choice is also a sales wedge. SAP said in May 2025 it would bring Cohere models into the SAP ecosystem through SAP AI Core, while Dell announced a first on premises deployment of Cohere North. Both partnerships make it easier for large enterprises to buy Cohere in the environment they already trust.
  • This is a real strategic split versus peers. OpenAI has been pushed toward giant centrally hosted infrastructure because of consumer scale, while Mistral is moving down the stack into its own sovereign compute offering. Cohere has taken the opposite path, using customer owned or customer chosen environments as the delivery layer.

Going forward, this structure gives Cohere a clear lane in regulated and sovereign AI. The more enterprises want private AI inside existing clouds and on premises systems, the more Cohere can grow like a software company attached to customer infrastructure, rather than like a lab forced to finance massive compute capacity ahead of demand.