Scale AI
Tooling suite for data labeling, model training, and deployment for enterprise ML teams

Revenue: $1.50B (2024)
Valuation: $13.80B (2024)
Funding: $1.60B (2024)
Growth Rate (y/y): 25% (2024)

Details
Headquarters: San Francisco, CA
CEO: Alexandr Wang
Founding year: 2016

Revenue

Sacra estimates that Scale generated $870M in revenue in 2024, hitting a $1.5B annualized run rate by end of year, up 97% from $760M ARR in 2023. Scale generated approximately $2B in revenue in 2025, representing continued doubling year-over-year, though the Meta deal in June 2025 introduced significant customer disruption mid-year as Google and OpenAI cut ties.
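The 97% figure follows directly from the two run-rate estimates above. A minimal sketch of the arithmetic, using only the $760M and $1.5B ARR figures from the text (the function name is illustrative):

```python
def yoy_growth(previous: float, current: float) -> float:
    """Return year-over-year growth as a fraction (0.97 means +97%)."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous


arr_2023 = 760e6   # $760M ARR in 2023, per the estimate above
arr_2024 = 1.5e9   # $1.5B annualized run rate at the end of 2024

growth = yoy_growth(arr_2023, arr_2024)
print(f"ARR growth 2023 -> 2024: {growth:.0%}")
```

Note that this compares year-end run rates, not the $870M of revenue recognized during 2024.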

The 2023 revenue explosion was driven by $18B in capital flowing into foundation model companies like OpenAI ($1.3B ARR in 2023), Anthropic ($200M ARR in 2023), and Cohere, which used Scale to train their large language models (LLMs) using reinforcement learning from human feedback (RLHF).

However, Meta's $14.3B investment for 49% ownership in June 2025 has triggered the departure of Scale's largest clients. OpenAI and Google, which together represented a significant portion of Scale's revenue, have cut ties following the deal, with Microsoft and xAI reportedly exploring alternatives. Google alone had been planning to spend $200M on Scale's services in 2025 before pulling back.

Scale's initial launch in 2016 coincided with the rapid rise of self-driving vehicle startups that needed enormous volumes of high-quality training data that general-purpose vendors couldn't provide.

General Motors' Cruise (NYSE: GM), Lyft (NASDAQ: LYFT), and Voyage (acquired by Cruise in 2021) were crucial customers that got Scale across the $100M revenue line by 2020.

In recent years, Scale has become part of the core training infrastructure for leading LLM companies, which use Scale's platform for reinforcement learning from human feedback (RLHF).

Valuation & Funding

Meta's June 2025 investment values Scale at $29B, with Meta paying $14.3B for a 49% non-voting stake. The deal is one of Meta's largest since its acquisition of WhatsApp and was structured to avoid regulatory scrutiny. It brought CEO Alexandr Wang to Meta to lead its superintelligence division, with Jason Droege installed as Scale's interim CEO.
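The roughly $29B valuation can be recovered from the deal terms. The sketch below assumes the $14.3B bought 49% of the company's total equity value after the deal; that reading of the terms is an assumption, not something the report spells out.

```python
def implied_valuation(investment: float, stake: float) -> float:
    """Equity value implied by paying `investment` for a `stake` fraction of the company."""
    if not 0 < stake <= 1:
        raise ValueError("stake must be a fraction in (0, 1]")
    return investment / stake


meta_investment = 14.3e9  # $14.3B paid by Meta
meta_stake = 0.49         # 49% non-voting stake

valuation = implied_valuation(meta_investment, meta_stake)
print(f"Implied valuation: ${valuation / 1e9:.1f}B")
```

The result comes out slightly above $29B, consistent with the rounded headline figure.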

Prior to the Meta deal, Scale was valued at $25B in a tender offer discussed in early 2025, and at $13.8B following its 2024 funding round, with significant participation from Cisco Investments, ServiceNow Ventures, and Amazon.

The company has raised $1.6B in total funding across multiple rounds, backed by prominent investors including Tiger Global Management, Accel, and Index Ventures.

Business Model

Scale is a vertically integrated API and business process outsourcing (BPO) company that enables programmatic access to a pool of human labor in low-cost-of-living countries like the Philippines, Nigeria, and Kenya to label snippets of data that feed into machine learning algorithms for LLMs and self-driving cars.

Scale monetizes per task, marking up the cost of labor to earn a gross margin of more than 50%.

Prior to Scale, this kind of data labeling was farmed out to crowdsourcing platforms like Amazon's Mechanical Turk, which was clunky and lacked quality control, or it was conducted in-house by large teams, possible only for companies such as Meta and Google.

Like AWS, Scale charges its customers based on usage, with two types of plans. One is pay-as-you-go with no minimum commitment and a self-serve platform, priced per data unit labeled. For instance, images are 2¢ per image and 6¢ per annotation. The other is an enterprise plan with annual volume commitments and volume discounts.
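As an illustration of how a customer might estimate a pay-as-you-go bill, here is a minimal sketch using the quoted rates of 2 cents per image and 6 cents per annotation. How the two rates combine (a per-image charge plus a charge for each annotation on that image) is an assumption, as are the function name and the example volumes.

```python
IMAGE_CENTS = 2        # quoted rate per image
ANNOTATION_CENTS = 6   # quoted rate per annotation


def estimate_cost_usd(images: int, annotations_per_image: int) -> float:
    """Estimate a labeling bill in dollars, assuming the two rates simply add up."""
    if images < 0 or annotations_per_image < 0:
        raise ValueError("counts must be non-negative")
    # Work in whole cents to avoid floating-point drift, then convert once.
    total_cents = images * (IMAGE_CENTS + annotations_per_image * ANNOTATION_CENTS)
    return total_cents / 100


# Hypothetical job: 10,000 images with 5 annotations each.
cost = estimate_cost_usd(10_000, 5)
print(f"Estimated cost: ${cost:,.2f}")
```

Enterprise plans with volume discounts would need a tiered rate table, which the source does not publish.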

As Scale's contractors label more images/video per hour using improved pre-labeling AI models, a usage-based pricing model lets Scale expand its revenue better than hourly/seat-based models typically used by outsourcing firms. It also helps Scale close deals faster as customers can easily estimate how much it will cost them even before talking to the sales team.
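The throughput argument above can be made concrete with a small comparison. All numbers below (price per task, tasks per hour, hourly rate) are hypothetical and chosen only to show the mechanism; they are not Scale's actual figures.

```python
def per_task_revenue_cents(tasks_per_hour: int, price_per_task_cents: int) -> int:
    """Revenue per contractor-hour when the customer pays per labeled task."""
    return tasks_per_hour * price_per_task_cents


def hourly_revenue_cents(hourly_rate_cents: int) -> int:
    """Revenue per contractor-hour when the customer pays a flat hourly rate."""
    return hourly_rate_cents


PRICE_PER_TASK_CENTS = 8   # hypothetical price per task
HOURLY_RATE_CENTS = 800    # hypothetical outsourcing-style hourly rate

# Hypothetical throughput before and after better pre-labeling models.
before = per_task_revenue_cents(100, PRICE_PER_TASK_CENTS)
after = per_task_revenue_cents(150, PRICE_PER_TASK_CENTS)
flat = hourly_revenue_cents(HOURLY_RATE_CENTS)

uplift = (after - before) / before
print(f"Per-task billing: {before} -> {after} cents per hour ({uplift:.0%} uplift)")
print(f"Hourly billing: {flat} cents per hour regardless of throughput")
```

Under per-task pricing the vendor captures the productivity gain; under hourly billing the same gain would reduce billable hours instead.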

Unlike AI software companies such as C3 and Palantir, Scale is not a build-once-and-sell-everywhere company, as it pays its contractors every time they label images. It typically employs independent contractors in the Philippines, Kenya, and Venezuela, whom it recruits through a separate portal, Remotasks.

The labor-intensive nature of the model makes headcount management a key operational lever. Scale cut 200 full-time employees—about 14% of staff—and ended work with roughly 500 contractors (July 2025), with interim CEO Jason Droege citing over-rapid gen-AI capacity expansion and excess bureaucracy as the drivers. The broader market is experiencing the same structural shift: xAI separately laid off roughly 500 generalist human annotators—about a third of its data labeling team—underscoring how quickly companies can pivot from broad human labor pools to smaller groups of specialist AI tutors as model training shifts toward domain expertise and automation.

Product

Scale offers four types of products to its customers that run on top of its ML models, driven by the taskers platform used by independent contractors.

Data labeling: This is Scale's core product and comes in two variants. One is Rapid, where Scale provides the software and outsourced contractors as a bundle, now offering production-quality labels and instruction feedback in as little as one hour via a self-serve, pay-as-you-go early-access release. The other is Studio, a DIY SaaS for customers who just want the pre-labeling software and hire their own human workforce.

Data management: Nucleus, introduced by Scale in 2020, is like Google Photos for training data. It comes with an ML-powered search engine that customers use to visually create data slices like daytime photos of trucks, identify incorrectly labeled data, and filter data through graphs of the model's performance. Scale has since extended the data management suite with two additional products: Validate, for running tests on ML models and comparing performance, and Launch, for shipping ML models to production.

Data generation: Synthetic applies ML to the real-world raw data collected by Scale over the years to generate artificial datasets that its customers use to augment their training data.

Workflow automation: Customers can use Scale's automation SaaS to extract data from documents or build metadata from images and texts.

Scale feeds the data from human labelers into its ML models, which are used across its product suite to make them faster and more accurate. The ML models also feed into the allocations algorithm in the taskers platform to pick the best set of contractors for a project.

Screenshots from Scale Rapid

Comparison of raw data with labeled data.

Beyond software, Scale has expanded into adjacent data-collection and research verticals. On the collection side, Scale operates a robotics data program, hiring hundreds of contractors globally to record point-of-view demonstrations for startups training AI-powered robots (launched September 2025). Scale has deepened its physical AI push further through an integration of its Physical AI Data Engine into Universal Robots' UR AI Trainer, targeting UR's installed base of more than 100,000 industrial deployments, with a large-scale industrial robotics dataset planned for later in 2026. On the agent-training side, Scale now offers RL Environments to train and evaluate agents in simulated tool-use and computer-use workflows (launched February 2026); the company reports that nearly half of all new data-training projects now involve reinforcement learning environments. On the research side, Scale Labs is a dedicated hub studying advanced AI systems in real-world environments, with work spanning evaluation, agentic and multimodal systems, post-training, enterprise deployment, and AI risk and oversight infrastructure (launched March 2026).

Competition

When Scale started, it got an edge over competitors by bundling pre-labeling software and outsourced independent contractors in an easy-to-use self-serve solution.

At that time, Mechanical Turk and Appen (Valuation: $286M) had a larger army of independent contractors but didn’t have Scale’s software, and CloudFactory ($343M) and Sama used employees for data labeling, limiting their ability to scale up or down quickly.

At the same time, Sama and CloudFactory rolled up third-party contractors into their projects. Many startups, such as Labelbox, Snorkel, and Heartex, have crowded into the data-labeling market in the last few years.

On the enterprise side, companies use data labeling from Amazon SageMaker or Google’s AI Platform, where they may already be running their ML models, or consider outsourcing firms such as Accenture or Cognizant that already manage their IT projects.

TAM Expansion

New products

With the launch of its new products, Launch and Validate, Scale wants to move up the AI value chain from a Mechanical Turk replacement to the Atlassian of AI software development, spanning data management, model training and deployment, and performance monitoring.

Scale wants to use the easy integration of its new products with Rapid and Nucleus as a wedge to get into the market. The global AI software market is expected to double to ~$120B by 2025, large enough for multiple picks-and-shovels companies like Scale.

While the market is large, it is dominated by platforms such as DataRobot, Dataiku, and Alteryx and by cloud providers such as Amazon, Google, and Microsoft.

Growth of the AI software market.

Customer base expansion

A key strength of Scale is its ability to expand its customer base by leveraging the generalizability of its data labeling platform.

In mid-2022, Scale's bread-and-butter autonomous vehicle (AV) data labeling workloads went into decline alongside falling R&D investment and VC funding. Then, in 2023, Scale's business exploded with the rise in demand for data labeling from LLM companies.

The data labeling market is expected to continue to grow to $22B in 2027, much of it coming from a digital shift in sectors such as public sector enterprises, government departments, and legacy companies that have collected data for years—from purchase orders to website traffic, inventory levels and SKUs—without knowing what to do with it.

Scale is present in almost all segments of the data labeling market, which gives it an advantage in expanding to more sectors beyond autonomous vehicle companies. Scale has made government a dedicated growth vector, anchored by a $99M Army R&D contract (August 2025) and a separate five-year, $100M ceiling agreement with the DoD's CDAO (September 2025)—framed as a shift from prototype work to production deployment for mission-ready AI—with the first project under that deal carrying a $40.7M commitment. To support that push, Scale opened an office in St. Louis, Missouri, near the DoD's innovation hub. Beyond defense, Scale has also moved into adjacent enterprise verticals, signing a deal with States Title, a real estate startup that applies AI to the title and escrow process.

Scale is also expanding into sovereign AI deployments internationally. Scale has launched a five-year program with Qatar spanning education, civil service, contact centers, tourism, health care, and transportation (announced April 2025), with government work in Asia and Europe expected to represent a significant share of near-term sales.

Growth of the data labeling market.

Geographic expansion

Scale makes most of its revenue from the US and can expand to other geographies. Scale has made its first major dedicated international spending commitment, pledging to invest $52 million (£39 million) in Britain's AI talent and grow its local team to 200 employees over two years (September 2025).

The European AI software market is expected to grow to $26.5B by 2025, 13x its size in 2018.

In China, while much of the growth in the AI market has come from consumer internet companies such as Alibaba and ByteDance, traditional sectors are expected to lead going forward. By 2030, AI is expected to add $600B annually to the Chinese economy, of which automotive, transportation, and logistics, Scale's core vertical, is expected to contribute 64%. Scale's competitors, such as Appen, are seeing strong revenue growth in China, mostly from autonomous vehicle companies.

Risks

Margin compression: Competition is growing in Scale's core data labeling market, and as companies face financial headwinds, the basis of competition may shift from features and efficiency to price, diminishing any margin expansion benefits Scale gets from using more pre-labeling software. Its new products are entering a market with strong incumbents and may not provide immediate margin benefits as Scale would want to price them competitively to sign up new users.

Customer concentration: Meta's acquisition of a 49% stake caused Scale's two largest customers—Google and OpenAI—to cut ties, exposing how a single structural transaction can rapidly hollow out the customer base. Scale must replace a material portion of its revenue base while simultaneously integrating a dominant investor whose competitive interests conflict with those of potential enterprise clients.

Synthetic data: The emergence of synthetic data and the improving capabilities of LLMs in data labeling introduce structural risk to Scale's business model. If the industry shifts toward automated and synthetic data solutions, Scale's model—which heavily relies on human labor—could face obsolescence.

DISCLAIMERS

This report is for information purposes only and is not to be used or considered as an offer or the solicitation of an offer to sell or to buy or subscribe for securities or other financial instruments. Nothing in this report constitutes investment, legal, accounting or tax advice or a representation that any investment or strategy is suitable or appropriate to your individual circumstances or otherwise constitutes a personal trade recommendation to you.

This research report has been prepared solely by Sacra and should not be considered a product of any person or entity that makes such report available, if any.

Information and opinions presented in the sections of the report were obtained or derived from sources Sacra believes are reliable, but Sacra makes no representation as to their accuracy or completeness. Past performance should not be taken as an indication or guarantee of future performance, and no representation or warranty, express or implied, is made regarding future performance. Information, opinions and estimates contained in this report reflect a determination at its original date of publication by Sacra and are subject to change without notice.

Sacra accepts no liability for loss arising from the use of the material presented in this report, except that this exclusion of liability does not apply to the extent that liability arises under specific statutes or regulations applicable to Sacra. Sacra may have issued, and may in the future issue, other reports that are inconsistent with, and reach different conclusions from, the information presented in this report. Those reports reflect different assumptions, views and analytical methods of the analysts who prepared them and Sacra is under no obligation to ensure that such other reports are brought to the attention of any recipient of this report.

All rights reserved. All material presented in this report, unless specifically indicated otherwise is under copyright to Sacra. Sacra reserves any and all intellectual property rights in the report. All trademarks, service marks and logos used in this report are trademarks or service marks or registered trademarks or service marks of Sacra. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any report is strictly prohibited. None of the material, nor its content, nor any copy of it, may be altered in any way, transmitted to, copied or distributed to any other party, without the prior express written permission of Sacra. Any unauthorized duplication, redistribution or disclosure of this report will result in prosecution.