AI Data Labeling Shifts to Experts

Joe Kim, CEO of Office Hours, on the end of crowdwork

Interview
"The existing data labeling players in the market today are all based on crowd work."

This marks a shift in AI data work from cheap labor supply to trusted expert supply. Older labeling companies were built to process huge volumes of repetitive tasks through large contractor pools, which works for image tags, transcription, and simple moderation. The new bottleneck is different: model labs need credentialed people in law, finance, healthcare, and science to judge nuanced outputs, write better examples, and catch subtle mistakes that a generic crowd cannot reliably spot.

  • The crowdwork model looks like a marketplace: thousands of low-cost workers doing tightly specified microtasks. Scale grew on that playbook, and Appen, LXT, and similar vendors still compete on workforce size, language breadth, and price, which matters when the job is volume and speed.
  • Reasoning models changed the spec. Handshake grew quickly by tapping PhD students, graduates, and postdocs for expert evaluation work, while Surge built around elite annotators and RLHF. In both cases, the winning asset is not just labor capacity but verified access to scarce people with the right background.
  • That is why Office Hours is adjacent to data labeling even though it started as an expert network. Its core product is finding, vetting, matching, scheduling, and paying hard-to-reach specialists. For frontier labs, that workflow can matter as much as annotation software, because the real constraint is getting the right human in the loop fast.

The market is heading toward a split. High-volume generic tasks will keep getting automated or pushed to large labor pools, while the highest-value work in training and evaluation will flow to networks that can prove expertise, identity, and reliability. That favors platforms built around expert discovery and trust infrastructure, not just contractor throughput.