Together AI pressures Baseten Model APIs
Baseten
Together AI narrows one of Baseten's cleanest wedges by pairing enterprise trust signals with commodity-style economics. Baseten can still charge more when a customer needs dedicated deployments, hybrid control, or workflow tooling, but its simple OpenAI-compatible Model APIs become much harder to price at a premium when a buyer can get SOC 2 Type II compliance, broad open-model access, and low token prices from another managed provider.
-
Baseten sells two very different things. Dedicated deployments are closer to managed infrastructure: customers pay for reserved GPU capacity and control their own scaling. Model APIs are closer to a catalog business: the buyer mainly compares model availability, latency, compliance, and price per token. That second category is where Together applies the most pressure.
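The catalog dynamic is easy to see in code. Because Model APIs follow the OpenAI request shape, switching managed providers is, in practice, a base-URL and model-name change. The sketch below illustrates this with placeholder endpoints and model names (none of the URLs or identifiers are real provider values):

```python
def chat_request(base_url: str, model: str, prompt: str) -> dict:
    """Build the same OpenAI-style /chat/completions request for any
    OpenAI-compatible provider; only the endpoint and model id vary."""
    return {
        "url": f"{base_url}/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# Same code path, two hypothetical providers:
req_a = chat_request("https://api.provider-a.example/v1", "llama-3.1-70b", "hi")
req_b = chat_request("https://api.provider-b.example/v1", "llama-3.1-70b", "hi")

# The request bodies are identical; only the endpoint differs, which is
# why buyers can compare catalog providers almost purely on price,
# latency, and compliance.
assert req_a["body"] == req_b["body"]
assert req_a["url"] != req_b["url"]
```

When switching costs collapse to a config change like this, the catalog layer competes on per-token price rather than on integration lock-in.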
-
Together is built to win that catalog layer at scale. It earns 30% to 40% of revenue from per-token APIs, runs a low-margin model at roughly 45% gross margin, and publicly lists low prices on Llama-family models. That lets it treat open-model inference more like a volume business than a premium software product.
-
Replicate shows why compliance matters. It has a huge, developer-friendly model directory and a simple API workflow, but it still lacks the governance features that larger fintech, healthcare, and enterprise buyers often require. Because Together offers both enterprise compliance and aggressive pricing, Baseten cannot rely on compliance alone to defend Model API pricing.
-
The next step is a split market. Open-model APIs will keep drifting toward transparent, low-priced utility infrastructure, while value pools move toward dedicated clusters, private deployments, orchestration, and enterprise-specific controls. That shift plays to Baseten's stronger position higher up the stack, but only if it keeps treating Model APIs as an entry point rather than the core margin engine.