Scale started in 2016 as a smarter, concierge-operated version of Amazon’s Mechanical Turk to take on the data labeling work for companies building AI/ML models. Because an AI model is only as good as the data used to train it, AI teams spend ~80% of their time on data sourcing, labeling, and preparation. Before going live, AI models are trained on data labeled by humans to tell the difference between, say, a cat, a car, and a truck. Prior to Scale, data labeling was either farmed out to crowdsourcing platforms like Amazon’s Mechanical Turk, which were clunky and lacked quality control, or conducted in-house by large teams, an option open only to companies such as Meta and Google.
Scale got its start by simplifying the data labeling workflow with an API to ingest high-volume raw data, software to pre-label the raw data, and human contractors to make final edits and quality checks. Scale’s launch coincided with the rapid rise of self-driving vehicle startups that needed enormous volumes of high-quality training data that general-purpose vendors couldn’t provide. Scale focused on this narrow use case and became better than its competitors at labeling complex data like LiDAR point clouds, solving a hair-on-fire problem for self-driving car companies.
Like AWS, Scale charges its customers based on usage, with two types of plans. One is pay-as-you-go with no minimum commitment and a self-serve platform, priced per data unit labeled. For instance, images are priced at 2¢ per image and 6¢ per annotation. The other is an enterprise plan with annual volume commitments and volume discounts. As Scale’s contractors label more images/video per hour using improved pre-labeling AI models, a usage-based pricing model lets Scale expand its revenue better than the hourly or seat-based models typically used by outsourcing firms. It also helps Scale close deals faster, as customers can easily estimate their costs even before talking to the sales team.
Unlike AI software companies such as C3 and Palantir, Scale is not a build-once-and-sell-everywhere company, as it pays its contractors every time they label images. It typically employs independent contractors in the Philippines, Kenya, and Venezuela, whom it recruits through a separate portal, Remotasks.
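To illustrate how a customer might estimate a pay-as-you-go bill before talking to sales, here is a minimal sketch in Python. It uses only the two self-serve image prices quoted above and assumes the per-image and per-annotation fees are simply added together. The function name and the example volumes are hypothetical, and the sketch is not Scale’s actual billing logic.

```python
# Self-serve image prices quoted in the text, kept in whole cents
# so the arithmetic avoids floating-point rounding.
CENTS_PER_IMAGE = 2
CENTS_PER_ANNOTATION = 6


def estimate_cost_cents(num_images: int, annotations_per_image: int) -> int:
    """Estimate a pay-as-you-go bill in cents.

    Assumption: the per-image and per-annotation fees are additive.
    Enterprise volume discounts are not modeled.
    """
    if num_images < 0 or annotations_per_image < 0:
        raise ValueError("counts must be non-negative")
    total_annotations = num_images * annotations_per_image
    return (num_images * CENTS_PER_IMAGE
            + total_annotations * CENTS_PER_ANNOTATION)


if __name__ == "__main__":
    # Hypothetical job: 10,000 images with 3 annotations each.
    cents = estimate_cost_cents(num_images=10_000, annotations_per_image=3)
    print(f"${cents / 100:,.2f}")  # prints $2,000.00
```

Under these assumptions the cost scales linearly with volume, which is what makes the bill easy to predict up front.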
Remotasks portal for recruiting independent contractors.
Scale offers four types of products to its customers that run on top of its ML models, driven by the taskers platform used by independent contractors.
Data labeling is Scale’s core product and comes in two variants. One is Rapid, where Scale provides the software and outsourced contractors as a bundle. The other is Studio, a DIY SaaS for customers who just want the pre-labeling software and hire their own human workforce.
Nucleus, introduced by Scale in 2020, is like Google Photos for training data. It comes with an ML-powered search engine that customers use to visually create data slices (for example, daytime photos of trucks), identify incorrectly labeled data, and filter data through graphs of the model’s performance. Scale recently launched two new products: Validate, to run tests on ML models and compare performance, and Launch, to ship ML models to production.
Synthetic applies ML on the real-world raw data collected by Scale over the years to generate artificial datasets that its customers use to augment their training data.
Customers can use Scale’s automation SaaS to extract data from documents or build metadata from images and texts.
Scale feeds the data from human labelers into its ML models, which are used across its product suite to make them faster and more accurate. The ML models also feed into the allocations algorithm in the taskers platform to pick the best set of contractors for a project.
Screenshots from Scale Rapid
Comparison of raw data with labeled data.
When Scale started, it got an edge over competitors by bundling pre-labeling software and outsourced independent contractors in an easy-to-use self-serve solution. At that time, Mechanical Turk and Appen (Valuation: $286M) had a larger army of independent contractors but didn’t have Scale’s software, while CloudFactory ($343M) and Sama used employees for data labeling, which limited their flexibility to scale up or down quickly.
However, over the years, the competition caught up. Appen acquired Figure Eight, and TELUS International acquired Lionbridge AI and Playment to build ML-assisted pre-labeling features. At the same time, Sama and CloudFactory rolled up third-party contractors into their projects. Many startups, such as Labelbox, Snorkel, and Heartex, have crowded into the data-labeling market in the last few years. On the enterprise side, companies use data labeling from Amazon SageMaker or Google’s AI Platform, where they may already be running their ML models, or consider outsourcing firms such as Accenture or Cognizant that already manage their IT projects.
With the launch of its new products Launch and Validate, Scale wants to move up the AI value chain from a Mechanical Turk replacement to the Atlassian of AI software development, spanning data management, model training and deployment, and performance monitoring. Scale wants to use the easy integration of its new products with Rapid and Nucleus as a wedge to get into the market. The global AI software market is expected to double to ~$120B by 2025, large enough for multiple picks-and-shovels companies like Scale. While the market is large, it is dominated by platforms such as DataRobot, Dataiku, and Alteryx and cloud providers such as Amazon, Google, and Microsoft.
Growth of the AI software market.
Customer base expansion
The data labeling market is expected to grow from $5B in 2022 to $22B in 2027, much of it coming from a digital shift among public sector enterprises, government departments, and legacy companies that have collected data for years (from purchase orders to website traffic, inventory levels, and SKUs) without knowing what to do with it. Scale is present in almost all segments of the data labeling market, which gives it an advantage in expanding to sectors beyond autonomous vehicle companies. For instance, Scale recently signed a $249M contract with the US Department of Defense and is opening a new office outside Silicon Valley in St. Louis, Missouri, the innovation hub of the US Department of Defense. Another example is Scale’s deal with States Title, a real estate startup that applies AI to the title and escrow process for faster transactions.
Growth of the data labeling market.
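For a sense of what the data labeling forecast implies on an annual basis, the short Python sketch below computes the compound annual growth rate between the two figures cited above ($5B in 2022 and $22B in 2027). The function name is hypothetical, and the calculation assumes smooth compounding between the two endpoints, which the forecast itself does not state.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Market size in $B, taken from the forecast cited in the text.
    rate = implied_cagr(start_value=5.0, end_value=22.0, years=2027 - 2022)
    print(f"{rate:.1%}")
```

On these inputs the implied growth works out to roughly 34% to 35% a year.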