Scale: the $290M/year Mechanical Turk of machine learning

Jan-Erik Asplund

TL;DR: Scale, formerly Scale AI, built a $290M/year revenue business on labeling data for AI in the world of atoms, from autonomous vehicles to defense. Foundation models from OpenAI, by unlocking the ability to build complex apps with zero/few-shot learning, now threaten the future of that business. For more, check out our Scale report and dataset, and our interviews with Oscar Beijbom, co-founder of Nyckel, and Cristobal Valenzuela, CEO of Runway.

  • To get ahead of the shift away from data as the center of gravity in ML, Scale has begun the transition towards becoming an end-to-end ML stack company with Nucleus (MLOps), Spellbook (fine-tuning), InstantML (AutoML engine), and Launch (model deployment and monitoring). But that requires Scale to cannibalize its main data labeling business and overcome internal misalignment with the tens of thousands of data labeling contractors it has around the world. As with Uber and DoorDash (NYSE: DASH), an organizational split between white collar software and blue collar gig workers creates operational overhead and makes it hard to innovate.
  • Scale getting into the ML model business has the effect of commoditizing the proprietary models that its own customers built using Scale’s data labeling. Scale customers don’t want to be inadvertently training the models that another company might one day use to compete against them, and some of Scale’s customers already negotiate into their contracts a rule that Scale can’t train models on their data.
  • While the ML apps and tooling space has gotten incredibly noisy, from MLOps platforms like Weights & Biases ($200M raised) and Outerbounds ($24M) to ML-in-a-box startups like Nyckel (Y Combinator winter 2022) and creative tools like Runway ($95M raised), Scale is one of the few players with the capability to make a consolidation play and build a Twilio-like ML platform for the enterprise. The customer profile is shifting from the ML engineer stitching together an MLOps pipeline using point solutions to the product manager who wants a Webflow-like self-serve experience without worrying about the ML models or tooling under the hood.
  • Scale’s vertically integrated platform helps them sell into enterprise and government, who have a greater need for consolidation and a harder time hiring to build and secure their own ML infrastructure. Scale has already won $300M+ in contracts with the Department of Defense to work on image and text analysis for national security: a $91M contract with the Army in 2020, followed by a $250M contract with the Department of Defense in 2022 to provide their tooling to all agencies involved in national security. Overall, AI R&D is one of the fastest growing investment areas in the defense budget, with total spend hitting $2.58B in FY2022, up from $1B in FY2020.
  • Scale’s advantages on mission-critical use cases position them well to win the world of atoms, while OpenAI’s foundation models, and the faster deployment of and iteration on apps that they enable, position OpenAI to win the world of bits. Scale’s business model is built on human labeling of huge visual datasets with large numbers of edge cases, like aerial photographs for the military or crosswalk photos for an autonomous vehicle company. That gives them an advantage on use cases in heavily regulated industries where precision can mean the difference between life and death and decisions must have an audit trail.

For more, check out our Scale one-pager, Scale dataset, and this other research from our platform:

  • Scale at $760M ARR
  • Scale AI revenue, growth, and valuation
  • Plaud revenue, growth, and valuation