Scale Must Cannibalize Its Labeling Business
Scale: the $290M/year Mechanical Turk of machine learning
The real challenge is not shipping more ML tools, it is rewiring Scale from a labor marketplace into software that helps customers need fewer labeled tasks in the first place. Scale originally won by bundling annotation software with a large overseas contractor base and charging per task, so every product that automates labeling or shifts work into self serve fine tuning cuts against the engine that built the company. The harder part is organizational, because product teams want less human effort while the operating system of the business was built to keep a large tasker network utilized.
-
Scale’s core model was usage based data labeling with 50%+ gross margins, powered by contractors recruited through Remotasks. That model worked especially well for autonomous driving, where open ended edge cases kept huge volumes of image, video, and LIDAR work flowing through the platform.
-
Foundation models changed the workflow. Instead of sending 10,000 examples to an offshore labeling queue, product teams can often label 10 to 100 examples themselves, fine tune quickly, and use live user behavior to generate more training data. That reduces the need for the large managed workforce that Scale was built around.
-
The market has also moved up the skill stack. More recent human in the loop demand is shifting from broad crowdwork toward credentialed experts for evals, reasoning, safety, and cultural nuance. That favors networks built around verified specialists, while making a general purpose contractor army less central than it was in Scale’s first phase.
Going forward, the winners in human data will look less like Mechanical Turk and more like infrastructure for specialized evaluation, expert sourcing, and model oversight. Scale has the customer base and workflow foothold to make that transition, but success depends on making software and expert orchestration the center of the business, with bulk labeling becoming a lower value feeder layer rather than the main event.