Provenance-First Platforms Win Human Data

Diving deeper into: $350M/year Mercor for human personality
As Amazon’s Mechanical Turk (launched 2005) became increasingly gamed by bots and “professional Turkers” using keyboard macros and autoclickers, Prolific (2014) launched as a provenance-first alternative for academic researchers gathering data.

Prolific won by treating worker identity and response quality as the product, not as a side effect of volume. Mechanical Turk was optimized for cheap, anonymous task completion. That worked for simple labeling but broke down for academic research, which needed known demographics, repeat participants, and less fraud. Prolific built verified participant records, up-front visibility into what participants are paid, and fast matching around those needs, which later made it a natural fit for AI labs that needed trusted humans rather than the biggest possible crowd.

  • Mechanical Turk and the first wave of AI labeling were built around scale and low cost. That model fit image bounding boxes and basic ranking tasks. As model work moved toward safety review, cultural nuance, and preference data, buyers needed people whose background, language, and behavior were known in advance.
  • Prolific’s early academic workflow was concrete. A researcher chose exactly whom to recruit, saw what participants would be paid, and got responses quickly from a standing pool that had already been screened and profiled. That is very different from a generic job board or open crowd market, where quality has to be reconstructed after the fact.
  • The competitive split is now clearer. Mercor and Handshake focus on credentialed experts, such as doctors and lawyers, for reasoning-era post-training. Prolific focuses on the breadth of humanity, with participants segmented by demographics, languages, behaviors, and personality-related traits. Scale sits further toward large managed labeling programs and tooling.

This market is moving from anonymous labor pools toward owned supply with deeper provenance. As AI products shift into companion, education, healthcare, and global consumer use cases, the winning platforms will be those that can reliably produce the exact kind of human judgment a model needs, whether that means expert knowledge, cultural fluency, or personality-level fit.