Revenue: $950.00M (2024)
Valuation: $13.80B (2024)
Growth Rate (y/y): 25% (2024)
Funding: $1.60B (2024)
Revenue
As of the end of August 2024, Scale reported that it was at "nearly $1B" in ARR.
In 2023, Scale’s revenue exploded to $760M ARR, up 162% YoY, as $18B in capital flowed into foundation model companies like OpenAI ($1.3B ARR in 2023), Anthropic ($200M ARR in 2023) and Cohere, which use Scale to train their large language models (LLMs) with reinforcement learning from human feedback (RLHF).
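The headline figures are consistent with each other. A quick check, assuming the 2024 revenue figure ($950M) and the 2023 ARR figure ($760M) are comparable run-rate measures, and that the 162% growth is measured against 2022 ARR (the report does not state either explicitly):

```python
def yoy_growth(current: float, prior: float) -> float:
    """Year-over-year growth as a fraction (0.25 == 25%)."""
    return current / prior - 1


def implied_prior(current: float, growth: float) -> float:
    """Back out the prior-year value from the current value and its growth rate."""
    return current / (1 + growth)


arr_2023 = 760e6  # $760M ARR in 2023 (from the report)
rev_2024 = 950e6  # $950M revenue in 2024 (from the report's summary)

# 950 / 760 - 1 = 0.25, matching the 25% y/y growth in the summary
print(f"2024 vs 2023 growth: {yoy_growth(rev_2024, arr_2023):.0%}")
# 760 / 2.62 is roughly 290, so 2022 ARR was about $290M under these assumptions
print(f"Implied 2022 ARR: ${implied_prior(arr_2023, 1.62) / 1e6:.0f}M")
```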
Scale’s initial launch in 2016 coincided with the rapid rise of self-driving vehicle startups, which needed enormous volumes of high-quality training data that general-purpose vendors couldn’t provide.
General Motors’ Cruise (NYSE: GM), Lyft (NASDAQ: LYFT), and Voyage (acquired by Cruise in 2021) were crucial customers that took Scale past the $100M revenue mark by 2020.
Business Model
Scale is a vertically integrated API and business process outsourcing (BPO) company. It provides programmatic access to a pool of human labor in low-cost-of-living countries such as the Philippines, Nigeria, and Kenya, where workers label snippets of data that feed the machine learning algorithms behind LLMs and self-driving cars.
Scale charges per task and marks up the cost of labor, which gives the company a gross margin of 50%+.
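A cost-plus model like this maps directly onto gross margin: if price = labor cost × (1 + markup), then gross margin = markup / (1 + markup). A minimal sketch, assuming labor is the only cost of revenue (in practice, software, QA, and infrastructure costs would also count):

```python
def gross_margin_from_markup(markup: float) -> float:
    """Gross margin implied by a cost-plus markup, assuming labor is the only COGS.

    price = cost * (1 + markup), so margin = (price - cost) / price = markup / (1 + markup).
    """
    return markup / (1 + markup)


def markup_for_margin(margin: float) -> float:
    """Inverse: the markup on labor needed to reach a target gross margin."""
    return margin / (1 - margin)


print(f"{markup_for_margin(0.50):.0%} markup -> 50% gross margin")
print(f"{gross_margin_from_markup(1.5):.0%} gross margin at a 150% markup")
```

In other words, a 50%+ gross margin means billing customers at least double the underlying labor cost.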
Before Scale, this kind of data labeling was either farmed out to crowdsourcing platforms like Amazon’s Mechanical Turk, which was clunky and lacked quality control, or done in-house by large teams, which only companies such as Meta and Google could afford.
Like AWS, Scale charges customers based on usage, with two types of plans. The first is pay-as-you-go: a self-serve platform with no minimum commitment, priced per unit of data labeled. For instance, images cost 2¢ per image and 6¢ per annotation. The second, the enterprise plan, has annual volume commitments and volume discounts.
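Because pricing is per unit, a self-serve customer can estimate a job's cost up front. A minimal sketch using the two rates quoted above; the billing rule (a flat per-image fee plus a fee for each annotation) and the job size are assumptions for illustration, not Scale's published formula:

```python
from dataclasses import dataclass

# Pay-as-you-go rates quoted in the report, in US cents.
# Assumption: each image is billed the per-image fee plus the per-annotation fee.
IMAGE_RATE_CENTS = 2
ANNOTATION_RATE_CENTS = 6


@dataclass
class ImageJob:
    images: int
    annotations_per_image: int


def estimate_cost_cents(job: ImageJob) -> int:
    """Estimate the self-serve price of an image-labeling job in cents."""
    per_image = IMAGE_RATE_CENTS + ANNOTATION_RATE_CENTS * job.annotations_per_image
    return job.images * per_image


# Hypothetical job: 10,000 images with 5 annotations each
job = ImageJob(images=10_000, annotations_per_image=5)
print(f"${estimate_cost_cents(job) / 100:,.2f}")  # prints $3,200.00
```

Working in integer cents avoids floating-point rounding when summing many small charges.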
As Scale’s contractors label more images and video per hour with the help of improved pre-labeling AI models, usage-based pricing lets Scale grow its revenue faster than the hourly or seat-based models typically used by outsourcing firms. It also helps Scale close deals faster, since customers can easily estimate what a job will cost before talking to the sales team.
Unlike AI software companies such as C3 and Palantir, Scale is not a build-once, sell-everywhere company, because it pays its contractors every time they label an image. It typically employs independent contractors in the Philippines, Kenya, and Venezuela, whom it recruits through a separate portal, Remotasks.
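To see why usage pricing captures productivity gains, compare revenue per contractor-hour under the two schemes. In this hypothetical sketch, the $10 hourly bill rate, the $0.32 per-unit price, and the throughput levels are illustrative assumptions, not Scale's figures:

```python
def revenue_per_hour_hourly(bill_rate: float, units_per_hour: float) -> float:
    """Hourly/seat pricing: revenue per labor hour ignores throughput entirely."""
    return bill_rate


def revenue_per_hour_usage(price_per_unit: float, units_per_hour: float) -> float:
    """Usage pricing: revenue per labor hour grows with units labeled per hour."""
    return price_per_unit * units_per_hour


BILL_RATE = 10.0       # hypothetical $ per contractor-hour
PRICE_PER_UNIT = 0.32  # hypothetical $ per labeled unit

for units_per_hour in (30, 60, 120):  # throughput rising as pre-labeling improves
    hourly = revenue_per_hour_hourly(BILL_RATE, units_per_hour)
    usage = revenue_per_hour_usage(PRICE_PER_UNIT, units_per_hour)
    print(f"{units_per_hour:>3} units/h  hourly=${hourly:.2f}  usage=${usage:.2f}")
```

Doubling throughput doubles usage revenue per labor hour while the contractor's hourly cost stays flat; under hourly billing, the same productivity gain would instead show up as a lower per-unit cost for the customer.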
Product
Scale offers four types of products, all running on top of its ML models and powered by the tasker platform used by its independent contractors.
Data labeling: This is Scale’s core product and comes in two variants. Rapid bundles Scale’s software with its outsourced contractors. Studio is a DIY SaaS for customers who want only the pre-labeling software and supply their own human workforce.
Data management: Nucleus, introduced by Scale in 2020, is like Google Photos for training data. Its ML-powered search engine lets customers visually create data slices (such as daytime photos of trucks), identify incorrectly labeled data, and filter data using graphs of the model’s performance. Scale recently launched two more products: Validate, for testing ML models and comparing their performance, and Launch, for shipping ML models to production.
Data generation: Synthetic applies ML to the real-world raw data Scale has collected over the years to generate artificial datasets that customers use to augment their training data.
Workflow automation: Customers can use Scale’s automation SaaS to extract data from documents or build metadata from images and texts.
Scale feeds the data from human labelers back into its ML models, which are used across its product suite to make labeling faster and more accurate. The same models also feed the allocation algorithm in the tasker platform, which picks the best set of contractors for each project.
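Scale's allocation algorithm is not public. As a hypothetical illustration of the idea only, the sketch below filters contractors by available hours and ranks them by a model-estimated accuracy on the project's task type. The `Contractor` fields, the scores, and the greedy top-N rule are all assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class Contractor:
    name: str
    # Hypothetical per-task-type accuracy estimates produced by an ML model.
    accuracy: dict[str, float] = field(default_factory=dict)
    available_hours: float = 0.0


def allocate(contractors: list[Contractor], task_type: str,
             team_size: int, min_hours: float) -> list[str]:
    """Pick the `team_size` contractors with enough availability and the highest
    estimated accuracy on `task_type` (a simple greedy rule, assumed here)."""
    eligible = [c for c in contractors if c.available_hours >= min_hours]
    ranked = sorted(eligible,
                    key=lambda c: c.accuracy.get(task_type, 0.0),
                    reverse=True)
    return [c.name for c in ranked[:team_size]]


pool = [
    Contractor("ana", {"lidar": 0.97, "text": 0.80}, available_hours=20),
    Contractor("ben", {"lidar": 0.91, "text": 0.95}, available_hours=40),
    Contractor("chi", {"lidar": 0.99}, available_hours=5),  # too few hours
    Contractor("dev", {"text": 0.88, "lidar": 0.85}, available_hours=30),
]
print(allocate(pool, "lidar", team_size=2, min_hours=10))  # ['ana', 'ben']
```

A production system would likely also weigh cost, throughput, and task-specific qualification, but the feedback loop is the same: better accuracy estimates from labeled data lead to better team selection.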
Screenshots from Scale Rapid
Comparison of raw data with labeled data.
Competition
When Scale started, it got an edge over competitors by bundling pre-labeling software and outsourced independent contractors in an easy-to-use self-serve solution.
At the time, Mechanical Turk and Appen (valuation: $286M) had larger pools of independent contractors but lacked Scale’s software, while CloudFactory ($343M) and Sama used employees for data labeling, which limited how quickly they could scale up or down.
However, over the years, competition caught up. Appen acquired Figure Eight, and TELUS International acquired Lionbridge AI and Playment to build ML-assisted pre-labeling features.
At the same time, Sama and CloudFactory began folding third-party contractors into their projects, and many startups, such as Labelbox, Snorkel, and Heartex, have crowded into the data-labeling market in the last few years.
On the enterprise side, companies use the data labeling offered by Amazon SageMaker or Google’s AI Platform, where they may already be running their ML models, or turn to outsourcing firms such as Accenture or Cognizant that already manage their IT projects.
TAM Expansion
New products
With the launch of Launch and Validate, Scale wants to move up the AI value chain, from a Mechanical Turk replacement to an Atlassian for AI software development that covers data management, model training and deployment, and performance monitoring.
Scale wants to use the easy integration of its new products with Rapid and Nucleus as a wedge into this market. The global AI software market is expected to double to ~$120B by 2025, which is large enough for multiple picks-and-shovels companies like Scale.
While the market is large, it is dominated by platforms such as DataRobot, Dataiku, and Alteryx, and by cloud providers such as Amazon, Google, and Microsoft.
Growth of the AI software market.
Customer base expansion
A key strength of Scale is its ability to expand its customer base by leveraging the generalizability of its data labeling platform.
In mid-2022, Scale’s bread-and-butter autonomous vehicle (AV) data labeling workloads went into decline alongside falling R&D investment and VC funding. Then, in 2023, Scale's business exploded as LLM companies drove up demand for data labeling.
The data labeling market is expected to keep growing, reaching $22B by 2027. Much of that growth is expected to come from a digital shift among public sector enterprises, government departments, and legacy companies that have collected data for years (from purchase orders to website traffic, inventory levels, and SKUs) without knowing what to do with it.
Scale is present in almost all segments of the data labeling market, which gives it an advantage in expanding to more sectors beyond autonomous vehicle companies.
For instance, Scale recently signed a $249M contract with the US Department of Defense and is opening a new office outside Silicon Valley, in St. Louis, Missouri, the innovation hub of the US Department of Defense. Another example is Scale’s deal with States Title, a real estate startup that applies AI to the title and escrow process to speed up transactions.
Growth of the data labeling market.
Geographic expansion
Scale makes most of its revenue in the US and has room to expand into other geographies. The European AI software market is expected to grow to $26.5B by 2025, 13x its size in 2018.
In China, much of the growth in the AI market has so far come from consumer internet companies such as Alibaba and ByteDance, but traditional sectors are expected to lead going forward.
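The 13x figure implies both a 2018 base and an annual growth rate. Working it through with the report's numbers, assuming seven years of compounding from 2018 to 2025:

```python
def cagr(multiple: float, years: int) -> float:
    """Compound annual growth rate for a total growth `multiple` over `years`."""
    return multiple ** (1 / years) - 1


size_2025 = 26.5      # $B, European AI software market in 2025 (from the report)
multiple = 13         # 2025 size relative to 2018 (from the report)
years = 2025 - 2018   # 7 years of compounding

print(f"Implied 2018 size: ${size_2025 / multiple:.2f}B")  # about $2.04B
print(f"Implied CAGR: {cagr(multiple, years):.1%}")         # about 44.3%
```

A roughly 44% annual growth rate would make Europe one of the faster-growing regions for AI software over that period.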
Risks
1. Margin expansion: Competition is growing in Scale’s core data labeling market, and as companies face financial headwinds, the basis of competition may shift from features and efficiency to price, eroding the margin gains Scale gets from using more pre-labeling software. Its new products are entering a market with strong incumbents and may not improve margins right away, since Scale will likely need to price them competitively to sign up new users.
2. Rich valuation: Scale is valued like a product company with 70% gross margins, even though it is closer to a professional services company. The high valuation makes it harder for new investors to earn a sizable return, and it may hurt Scale’s ability to recruit high-quality talent, who may see little equity upside at such a valuation.
3. Synthetic data: The emergence of synthetic data and the improving ability of LLMs to label data also pose risks to Scale's valuation. If the industry shifts toward automated and synthetic data solutions, Scale's business model, which relies heavily on human labor, could become obsolete.
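The valuation risk can be made concrete with the summary figures: the same price looks very different as a multiple of revenue versus gross profit. A rough sketch, assuming 2024 revenue of $950M, the $13.80B valuation, and gross margins of 50% (the low end of the report's 50%+ figure for Scale) versus 70% (the product-company benchmark):

```python
VALUATION = 13.8e9  # $13.80B (2024, from the report)
REVENUE = 950e6     # $950.00M (2024, from the report)


def revenue_multiple(valuation: float, revenue: float) -> float:
    """Valuation divided by revenue."""
    return valuation / revenue


def gross_profit_multiple(valuation: float, revenue: float, gross_margin: float) -> float:
    """Valuation divided by gross profit (revenue x gross margin)."""
    return valuation / (revenue * gross_margin)


print(f"Valuation/Revenue: {revenue_multiple(VALUATION, REVENUE):.1f}x")
for margin in (0.50, 0.70):
    gp = gross_profit_multiple(VALUATION, REVENUE, margin)
    print(f"Valuation/Gross profit at {margin:.0%} margin: {gp:.1f}x")
```

At a 50% margin, the valuation is roughly 29x gross profit, 40% higher than the ~21x it would represent for a 70%-margin business at the same revenue.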
DISCLAIMERS
This report is for information purposes only and is not to be used or considered as an offer or the solicitation of an offer to sell or to buy or subscribe for securities or other financial instruments. Nothing in this report constitutes investment, legal, accounting or tax advice or a representation that any investment or strategy is suitable or appropriate to your individual circumstances or otherwise constitutes a personal trade recommendation to you.
This research report has been prepared solely by Sacra and should not be considered a product of any person or entity that makes such report available, if any.
Information and opinions presented in the sections of the report were obtained or derived from sources Sacra believes are reliable, but Sacra makes no representation as to their accuracy or completeness. Past performance should not be taken as an indication or guarantee of future performance, and no representation or warranty, express or implied, is made regarding future performance. Information, opinions and estimates contained in this report reflect a determination at its original date of publication by Sacra and are subject to change without notice.
Sacra accepts no liability for loss arising from the use of the material presented in this report, except that this exclusion of liability does not apply to the extent that liability arises under specific statutes or regulations applicable to Sacra. Sacra may have issued, and may in the future issue, other reports that are inconsistent with, and reach different conclusions from, the information presented in this report. Those reports reflect different assumptions, views and analytical methods of the analysts who prepared them and Sacra is under no obligation to ensure that such other reports are brought to the attention of any recipient of this report.
All rights reserved. All material presented in this report, unless specifically indicated otherwise is under copyright to Sacra. Sacra reserves any and all intellectual property rights in the report. All trademarks, service marks and logos used in this report are trademarks or service marks or registered trademarks or service marks of Sacra. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating derivative works from, or selling any report is strictly prohibited. None of the material, nor its content, nor any copy of it, may be altered in any way, transmitted to, copied or distributed to any other party, without the prior express written permission of Sacra. Any unauthorized duplication, redistribution or disclosure of this report will result in prosecution.