Together AI
Revenue
Sacra estimates that Together AI hit $300M in annualized revenue in September 2025, up from $130M at the end of 2024, driven by growing demand for generative AI applications and, particularly among startups, for developer tooling used to train, fine-tune, and deploy AI models.
Together AI generates revenue through two primary lines: per-token API usage and GPU server rentals. The API business—where customers run open-source models via Together's endpoints—accounts for roughly 30–40% of revenue and scales with inference volume.
The larger share comes from renting Nvidia GPU capacity for training, fine-tuning, and serving, historically sourced from providers like CoreWeave and Lambda but increasingly from Together's own data centers (e.g., Maryland live since July 2025, Memphis forthcoming). This mix yields ~45% gross margins, with ownership of GPUs expected to improve unit economics over time.
Valuation & Funding
Together AI is in talks to raise approximately $1B at a $7.5B pre-money valuation as of March 2026, with Prosperity7 reportedly involved. If completed, this would represent more than a 2x step-up from its prior $3.3B valuation set less than 14 months earlier.
Together AI raised a $305M Series B in February 2025 led by General Catalyst and co-led by Prosperity7, valuing the company at $3.3B—more than 2x its prior $1.25B valuation from the previous year.
Based on 2024 revenue of $130M and a $1.25B valuation, the company traded at a 9.6x revenue multiple at its prior round.
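The multiple above follows directly from the two figures (a back-of-the-envelope check, not a valuation model):

```python
# Revenue multiple implied by the prior round's figures.
valuation = 1.25e9  # prior-round valuation, USD
revenue = 130e6     # 2024 annualized revenue, USD

multiple = valuation / revenue
print(f"{multiple:.1f}x")  # -> 9.6x
```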
Together AI has raised $533.5M in total funding to date. Key strategic investors include Salesforce Ventures, Nvidia Corp., General Catalyst, Prosperity7, Kleiner Perkins, Coatue Management, and Lux Capital.
Product
Together AI was founded in 2021 by Percy Liang, Chris Ré, and Vipul Ved Prakash with the mission of making AI development more accessible and affordable by leveraging open-source models.
Companies use cloud GPU hosts like Together AI, CoreWeave, and Lambda Labs to access Nvidia (NASDAQ: NVDA) graphics processing units (GPUs) for training AI models on their datasets, fine-tuning them, and deploying them into production.
Together AI differentiated itself early as a GPU cloud platform by indexing on open source, giving customers access to 100+ open models, from Mistral to Llama-2, to rapidly experiment with training different LLMs on their data.
Together AI's platform is designed to be an all-in-one solution for AI development, offering a suite of tools similar to Heroku but specifically tailored for AI workloads. This includes:
Access to GPU compute resources: Together AI provides access to high-performance GPU servers, sourced from a variety of providers including CoreWeave, Lambda Labs, and academic institutions, as well as its own data centers in Maryland (live since July 2025), Memphis, and Sweden (operational since September 2025). The company is deploying NVIDIA Blackwell GPU clusters (including GB200 NVL72 and HGX B200 systems) and has secured 200 MW of power capacity across North America. This allows developers to run compute-intensive AI workloads without having to invest in their own hardware.
Model hosting and serving: Together AI's platform makes it easy for developers to host their trained models and serve them via API endpoints, enabling seamless integration of AI capabilities into applications. Together AI offers both serverless inference APIs and dedicated Together Reasoning Clusters for token-heavy, low-latency workloads, with up to 110 tokens/sec decoding on reasoning clusters.
Fine-tuning and training tools: Together AI offers a range of tools and workflows for fine-tuning pre-trained models on custom datasets and training new models from scratch, allowing developers to adapt open-source models to their specific use cases.
Data management: The platform includes tools for managing datasets used to train and fine-tune models, including data versioning, labeling, and preprocessing. Together AI acquired Refuel.ai in May 2025 to strengthen its data transformation and structuring capabilities for production AI applications. Refuel.ai processes tens of millions of records and billions of tokens per week, with 50% fewer errors than state-of-the-art models for certain data tasks. The Refuel LLM-2 model is now available on Together for serverless inference and LoRA fine-tuning.
Experiment tracking and reproducibility: Together AI provides features for tracking and managing the various experiments and iterations involved in developing AI models, helping ensure reproducibility and facilitating collaboration.
Together AI's platform supports a wide range of popular open-source models, including Mistral, Llama-2, DeepSeek, and its own RedPajama models.
Business Model

Together AI found product-market fit charging per token, based on API usage volume, as a developer-experience-centric layer on top of CoreWeave's and Lambda Labs's per-hour pricing.
While CoreWeave and Lambda Labs focus on locking in multi-year reservations to recoup the fixed capex of their data centers and GPUs, Together AI operates a layer above, aligning its pricing with the spiky API volumes of startups training new models and launching new products.
Token-based pricing is particularly attractive to early-stage startups and individual developers, whose workloads are variable and unpredictable. It lets them align their costs directly with the value they derive from their AI models, and it mitigates the expensive risk of paying for idle GPU time that accompanies per-hour pricing.
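The idle-time risk can be made concrete with a toy comparison of the two billing models. All prices and throughput figures below are illustrative assumptions, not Together AI's actual rates:

```python
# Hypothetical comparison of per-hour vs per-token billing for a spiky
# workload; every number here is illustrative, not a real quoted rate.

gpu_hourly_rate = 2.50           # $/GPU-hour (illustrative)
tokens_per_gpu_hour = 1_000_000  # throughput at full utilization (illustrative)
per_token_rate = 0.20 / 1e6      # $0.20 per million tokens (illustrative)

hours_reserved = 720             # one month of reserved GPU time
tokens_used = 50_000_000         # actual monthly inference volume (spiky)

per_hour_cost = hours_reserved * gpu_hourly_rate
per_token_cost = tokens_used * per_token_rate
utilization = tokens_used / (hours_reserved * tokens_per_gpu_hour)

print(f"utilization:    {utilization:.0%}")      # ~7% of reserved capacity
print(f"per-hour bill:  ${per_hour_cost:,.2f}")  # pays for idle time too
print(f"per-token bill: ${per_token_cost:,.2f}") # pays only for actual usage
```

At low utilization the per-hour reservation costs far more than metered tokens; the gap closes as utilization rises, which is why steady, high-volume workloads eventually graduate to dedicated capacity.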
While Together AI does incur costs in sourcing GPU compute from various providers, its value-add comes from bundling that compute with a comprehensive set of AI development tools and the convenience of token-based pricing. This allows Together AI to charge a (small) premium over the base cost of the compute itself while still undercutting hyperscaler pricing by ~80%. Sacra estimates Together AI's gross margin at ~45%.
Competition
The cloud GPU market is becoming increasingly crowded, with a range of players vying to provide the tools and resources developers need to build and deploy AI models.
Big Cloud
The three major cloud providers (Amazon Web Services, with ~$91B in 2023 revenue; Google Cloud, with ~$33B; and Microsoft Azure, which does not break out standalone revenue) are all investing in their own GPU clouds, as well as developer tooling for training and fine-tuning models.
While their focus has traditionally been on proprietary models and tools, they are beginning to embrace open-source AI as well. For example, AWS now offers Hugging Face's open-source models on its SageMaker platform.
GPU clouds
Companies like CoreWeave and Lambda Labs offer GPU compute resources specifically designed for AI workloads. While they don't provide the same level of software tooling as Together AI, they do offer raw compute power at competitive prices.
Lambda Labs is more directly competitive as it positions itself as a better option for smaller companies and developers working on less intensive computational tasks, offering Nvidia H100 PCIe GPUs at a price of roughly $2.49 per hour, compared to CoreWeave at $4.25 per hour.
On the other hand, Lambda Labs does not offer access to the more powerful HGX H100—$27.92 per hour for a group of 8 at CoreWeave—which is designed for maximum efficiency in large-scale AI workloads.
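The quoted rates can be put on a per-GPU basis to make the comparison direct (a sketch using only the prices cited above):

```python
# Per-GPU hourly economics from the rates quoted above.
lambda_h100_pcie = 2.49     # $/hr, single H100 PCIe at Lambda Labs
coreweave_h100_pcie = 4.25  # $/hr, single H100 PCIe at CoreWeave
coreweave_hgx_8x = 27.92    # $/hr for an 8-GPU HGX H100 node at CoreWeave

# HGX pricing per GPU, and Lambda's discount on the single-GPU PCIe tier.
hgx_per_gpu = coreweave_hgx_8x / 8
pcie_discount = 1 - lambda_h100_pcie / coreweave_h100_pcie

print(f"HGX H100 per GPU: ${hgx_per_gpu:.2f}/hr")  # -> $3.49/hr
print(f"Lambda PCIe discount vs CoreWeave: {pcie_discount:.0%}")  # -> 41%
```

Notably, CoreWeave's 8-GPU HGX node works out cheaper per GPU than its single-GPU PCIe tier, reflecting the bundling of interconnected capacity for large-scale workloads.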
Inference services
Since 2023, a number of startups have been founded, or have pivoted, to serve AI models as the core of or a part of their business, particularly open-source LLMs. Besides Together AI, these include Anyscale, Deepinfra, Hugging Face, Perplexity, OpenRouter, Fireworks.ai, and others.
Benchmarks suggest that Together AI, despite handling more traffic, delivers higher throughput than other standalone inference services, with higher rate limits and better reliability. It prices its inference at roughly breakeven: lower than some other providers like Fireworks.ai, but higher than providers that are operating at a loss.
TAM Expansion
Together AI's initial focus has been on providing infrastructure and tools for training and deploying open-source AI models. However, the company has several potential mechanisms to grow its total addressable market over time.
Higher-level services
As the AI market matures, there will likely be growing demand for higher-level, application-specific AI services. Together AI could leverage its platform to offer APIs for common AI tasks like text generation, image creation, and data analysis.
The company's acquisition of Refuel.ai (May 2025) moves Together up the stack into data transformation and structuring—Refuel processes tens of millions of records and billions of tokens per week with 50% fewer errors than competing approaches for certain tasks. This allows Together to capture more of the value chain and serve customers who don't want to build and maintain their own models.
Geographic expansion
Together AI has expanded significantly into Europe, partnering with Hypertec and 5C Group (announced June 2025) to deploy up to 100,000 GPUs across the region through 2028 in what the company describes as its largest European infrastructure deployment.
Sweden infrastructure went live in September 2025, reducing round-trip latency by 50–70 ms and improving response times by up to 25–30% for some real-time applications in Northern/Central Europe. This expansion enables Together to serve EU data residency requirements and capture demand from European enterprises and developers.
Enterprise adoption
To date, Together AI has primarily served individual developers and small startups, though it does offer services like dedicated GPU clusters for organizations with higher-volume workloads. As larger enterprises increasingly adopt AI, there will be a significant opportunity to provide enterprise-grade tools and services, including advanced security, compliance, and governance capabilities.
Commoditization of compute
Unlike pure play GPU clouds like CoreWeave, Together AI is a beneficiary of the commoditization of compute—GPU prices going down lowers Together's cost basis, while its per-token prices can expand as it focuses on winning on developer experience by delivering faster and more reliable inference across the widest variety of open source models.
During the rise of mobile in the 2010s, the first beneficiaries were semiconductor companies like Qualcomm (NASDAQ: QCOM) and ARM (NASDAQ: ARM), but app-layer companies like Apple (NASDAQ: AAPL) and Google (NASDAQ: GOOG) went on to capture 10x as much value. With Nvidia signaling that its margins have peaked and LLM prices falling quickly, value in AI will similarly begin shifting toward the app layer: wearables, search engines, and verticalized SaaS for the enterprise.