Scale: the $290M/year Mechanical Turk of machine learning

TL;DR: Scale (dataset), formerly Scale AI, built a $290M/year revenue business indexed on labeling data for AI in the world of atoms, from autonomous vehicles to defense. Foundation models from OpenAI, by unlocking the ability to build complex apps with zero/few-shot learning, now threaten the future of that business. For more, check out our Scale report (dataset) and interviews with Oscar Beijbom, co-founder of Nyckel, and Cristobal Valenzuela, CEO of Runway.

Amazon Mechanical Turk (2005) and reCAPTCHA (2007) created APIs into human labor, making lower-cost overseas labor programmable. Companies like Google used that labor to scale optical character recognition (OCR), e.g. for digitizing 18,000 books for Google Books. Businesses hired remote crowdworkers from Mechanical Turk to handle tasks that were difficult for computers, like distinguishing a pole from a firehose. (link)
Scale (2021 valuation: $7.3B) launched in 2016 as a Mechanical Turk for machine learning, providing an API and workflow into Filipino, Nigerian and Kenyan workers for machine learning data labeling tasks. In 2022, Scale grew 61% to ~$290M in revenue. Scale found its initial traction with AI applications in the world of atoms that generate combinatorial numbers of edge cases that humans need to check to ensure safety, like autonomous driving, where companies like General Motors’s Cruise (NYSE: GM), Lyft (NASDAQ: LYFT), and Voyage (acquired by Cruise for $19B) were crucial customers. (link)
Via usage-based pricing, Scale indexed on the ~50TB daily workloads of LIDAR datasets used by autonomous vehicle (AV) companies. Through a combination of software-led outsourcing and a narrow focus on the AV use case, Scale hit 50%+ gross margin, higher than generalized data labeling companies like Appen (ASX: APX; gross margin: 24%) and TELUS (NYSE: TIXT; gross margin: 21%). Like competitors Labelbox ($1B valuation, 200 customers), Snorkel ($1B valuation) and BasicAI, Scale charges its customers on a usage basis for labeling the images, video, and 3D maps generated by the sensors and cameras of self-driving cars. It then sends the labeled data back to the ML teams, which use it to teach their models the differences between a cat, a person, and a car. (link)
In the post-zero interest rate environment of 2022-23, AV data labeling workloads have declined alongside decreasing investment in R&D. See the closure of AV startups like Argo AI (2021 valuation: $7.5B), as well as Lyft (NASDAQ: LYFT) and Uber (NYSE: UBER) discontinuing their self-driving car units. Ford, the majority owner of Argo, said after spending about $2.7B on R&D that it would focus on building Level 2 and Level 3 driver-assist technologies rather than Level 4 “full self driving” technology. (link)
The launch of foundation models like OpenAI’s GPT-3 (2020) and DALL-E 2 (2022) created a seismic shift in ML: when models are better than humans at labeling, the number of edge cases that humans need to check drops by ~100X. Foundation models like GPT-3 and DALL-E are pre-trained on internet-scale data using self-supervision instead of on a limited proprietary dataset. GPT-3, with 175 billion parameters, was trained on ~1T words scraped from the web between 2016 and 2019, as well as Wikipedia articles and books. (link)
Zero- and few-shot learning and fine-tuning on foundation models enable teams to build context-specific AI apps and features with only a handful of labeled examples, get to market quickly, and start the flywheel of using user engagement to label data and continually improve fine-tuning. Companies like Jasper used that flywheel to reach $42M in annual recurring revenue within 12 months. As with captchas, implicit, large-scale data labeling can be embedded into AI apps via different kinds of engagement mechanics. (link)
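The few-shot pattern and engagement flywheel described above can be sketched roughly as follows. This is a minimal illustration, not how Jasper or any specific product works: the `FewShotLabeler` class, its method names, and the support-ticket task are all hypothetical, and the resulting prompt would be sent to whichever foundation model a team uses (no model call is made here).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FewShotLabeler:
    """Hypothetical helper: builds a few-shot prompt from a small labeled pool."""
    task: str
    examples: List[Tuple[str, str]] = field(default_factory=list)

    def add_example(self, text: str, label: str) -> None:
        # Each (text, label) pair is one hand-labeled or user-confirmed example.
        self.examples.append((text, label))

    def build_prompt(self, query: str) -> str:
        # Task description, then the labeled examples, then the unlabeled query.
        lines = [self.task, ""]
        for text, label in self.examples:
            lines += [f"Text: {text}", f"Label: {label}", ""]
        lines += [f"Text: {query}", "Label:"]
        return "\n".join(lines)

    def record_feedback(self, text: str, predicted: str, accepted: bool,
                        correction: Optional[str] = None) -> None:
        # Flywheel step (assumption): a user accepting or correcting an output
        # implicitly labels that input, growing the example pool over time.
        if accepted:
            self.add_example(text, predicted)
        elif correction is not None:
            self.add_example(text, correction)


# Usage: two seed examples, one query, one piece of implicit user feedback.
labeler = FewShotLabeler(task="Classify each support message as 'billing' or 'bug'.")
labeler.add_example("I was charged twice this month.", "billing")
labeler.add_example("The export button crashes the app.", "bug")

query = "My invoice shows the wrong amount."
prompt = labeler.build_prompt(query)
labeler.record_feedback(query, predicted="billing", accepted=True)
```

The point of the sketch is the shape of the loop: a handful of seed labels is enough to form a usable prompt, and each accepted or corrected output adds a label without a dedicated labeling workforce.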

To get ahead of the shift away from data as the center of gravity in ML, Scale has begun the transition towards becoming an end-to-end ML stack company with Nucleus (MLOps), Spellbook (fine-tuning), InstantML (AutoML engine), and Launch (model deployment and monitoring). That transition requires Scale to cannibalize its main data labeling business and overcome internal misalignment with the tens of thousands of data labeling contractors it has around the world. As with Uber and DoorDash (NYSE: DASH), an organizational split between white collar software workers and blue collar gig workers creates operational overhead and makes it hard to innovate. (link)
Scale getting into the ML model business has the effect of commoditizing the proprietary models that its own customers built using Scale’s data labeling. Scale customers don’t want to inadvertently train the models that another company might one day use to compete against them, and some of them already negotiate into their contracts a rule that Scale can’t train models on their data. (link)
While the ML apps and tooling space has gotten incredibly noisy, from MLOps platforms like Weights & Biases ($200M raised) and Outerbounds ($24M) to ML-in-a-box startups like Nyckel (Y Combinator winter 2022) and creative tools like Runway ($95M raised), Scale is one of the few players with the capability to make a consolidation play and build a Twilio-like ML platform for the enterprise. The customer profile is shifting from the ML engineer stitching together an MLOps pipeline using point solutions to the product manager who wants a Webflow-like self-serve experience without worrying about the ML models or tooling under the hood. (link)
Scale’s vertically integrated platform helps it sell into enterprise and government, which have a greater need for consolidation and a harder time hiring to build and secure their own ML infrastructure. Scale has already won $300M+ in contracts with the Department of Defense to work on image and text analysis for national security. In 2020, Scale won a $91M contract with the Army, following that with a $250M contract with the Department of Defense in 2022 to provide its tooling to all agencies involved in national security. Overall, AI R&D is one of the fastest growing investment areas in the defense budget, with total spend hitting $2.58B in FY2022, up from $1B in FY2020. (link)
Scale’s advantages on mission-critical use cases position it well to win the world of atoms, while OpenAI’s foundation models, which enable faster deployment of and iteration on apps, position OpenAI to win the world of bits. Scale’s business model is built on human labeling of huge visual datasets with large numbers of edge cases, like aerial photographs for the military or crosswalk photos for an autonomous vehicle company. That gives Scale an advantage on use cases in heavily regulated industries where precision can mean the difference between life and death and decisions must have an audit trail. (link)
