Owning the AI labeling feedback loop

Diving deeper into "$1.1B/year Indeed for data labelers": whether labeling companies can graduate from marketplaces with minimally differentiated supply and no switching costs for either customers or contractors into a stickier layer of data infrastructure.

The real moat in AI labeling is shifting from access to workers to ownership of the feedback loop that decides what data to collect, how to score it, and where the model is still weak. A marketplace can supply experts fast, but supply alone is easy to swap out. The sticky layer is software and research that sits inside the customer workflow, audits labels, runs evals, and generates the next batch of training tasks. Handshake buying Cleanlab points directly at that transition.

  • Pure labor marketplaces are vulnerable because both sides can multi-home. In labor markets, stickiness comes from proprietary performance data, workflow software, payments, compliance, and reputation systems that make repeat work easier than starting over elsewhere. That is the playbook Handshake needs to copy for AI data work.
  • The category leaders are already moving up the stack. Mercor now sells benchmarks, eval environments, and research tooling alongside expert supply, while Surge publishes RL environments and benchmark research. That changes the product from finding people for a task to helping labs measure model quality and train the next version.
  • Handshake starts with a strong wedge: a deep pool of credentialed students, graduates, and experts. That pool drove a jump from about $190M annualized revenue in 2024 to about $1.1B by April 2026. But marketplace margins are thin, with contractor payouts taking much of gross revenue, so owning higher-margin infrastructure matters for durability.

The next phase of this market belongs to companies that turn human data work into an always-on system for post-training, evals, safety checks, and agent environments. If Handshake can make Cleanlab-style quality software part of every customer loop, it moves from being a vendor labs hire project by project to infrastructure they build their model pipeline around.
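
To make the feedback loop concrete, here is a minimal Python sketch of its three steps: audit labels, run an eval, and pick the next batch of tasks. It is an illustration under simplifying assumptions, not Cleanlab's API or any vendor's actual method. All names (LabeledItem, audit_labels, evaluate, next_batch) and the 0.5 threshold are hypothetical, and the audit is a plain probability-threshold heuristic rather than a full confident-learning procedure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LabeledItem:
    """One labeled example plus the model's predicted class probabilities (hypothetical schema)."""
    item_id: str
    given_label: str
    model_probs: Dict[str, float]


def audit_labels(items: List[LabeledItem], threshold: float = 0.5) -> List[str]:
    """Flag items whose human-given label the model considers unlikely.

    Assumption: a simple threshold on the probability of the given label.
    """
    return [
        it.item_id
        for it in items
        if it.model_probs.get(it.given_label, 0.0) < threshold
    ]


def evaluate(items: List[LabeledItem]) -> float:
    """Fraction of items where the model's top prediction matches the given label."""
    if not items:
        return 0.0
    correct = sum(
        1
        for it in items
        if max(it.model_probs, key=it.model_probs.get) == it.given_label
    )
    return correct / len(items)


def next_batch(items: List[LabeledItem], batch_size: int) -> List[str]:
    """Choose the items where the model is least confident, to route to experts next."""
    by_confidence = sorted(items, key=lambda it: max(it.model_probs.values()))
    return [it.item_id for it in by_confidence[:batch_size]]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    items = [
        LabeledItem("a", "cat", {"cat": 0.9, "dog": 0.1}),
        LabeledItem("b", "cat", {"cat": 0.2, "dog": 0.8}),
        LabeledItem("c", "dog", {"cat": 0.45, "dog": 0.55}),
    ]
    print("suspect labels:", audit_labels(items))
    print("eval accuracy:", evaluate(items))
    print("next tasks:", next_batch(items, batch_size=2))
```

The point of the sketch is that each step consumes the output of the previous round: flagged labels go back for review, the eval score shows where the model is weak, and the least-confident items become the next expert tasks. A vendor that runs this loop inside the customer's pipeline accumulates the history that makes it hard to swap out.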