Scale Must Move Toward Expert Judgment

Scale: the $290M/year Mechanical Turk of machine learning

Foundation models from OpenAI, by unlocking the ability to build complex apps with zero/few-shot learning, now threaten the future of that business.

The real threat is not that human data disappears. It is that the old bulk labeling workflow stops being the center of AI development. Scale originally won when teams needed huge outsourced datasets for open-ended vision problems like autonomous driving, where every new weather condition, lighting change, and road scenario created more annotation work. Foundation models changed that by letting developers get useful performance from pretrained systems with a small number of examples, which cuts demand for the highest-volume, lowest-context labeling work.

This pressure showed up first in Scale’s core autonomous vehicle business. As AV spending slowed in 2022 and 2023, the labeling engine that had powered Scale’s early growth weakened, which exposed how dependent classic annotation vendors were on one unusually data-hungry market.

Scale did not stall; it pivoted. By 2023, growth was being driven by RLHF and model post-training for LLM companies, where humans rank outputs, write ideal answers, and test model behavior. That is different work from drawing boxes around cars, and it moved Scale from commodity labeling toward model evaluation and alignment.

The market is shifting again, from large anonymous labor pools to smaller pools of verified experts. New reasoning models need people in law, medicine, finance, and science who can judge whether an answer is actually correct, not just whether a label matches instructions. That favors networks built around credentialed expertise over Mechanical Turk-style scale.

Going forward, the winners in human data will look less like annotation factories and more like infrastructure for expert judgment. Scale’s path is to keep moving up the stack into evaluation, red teaming, and workflow software, because foundation models are steadily automating the repetitive labeling work that built the first version of the business.
