Neutral Human Evaluation for AI Safety
Jemma White, COO of Prolific, on why human evaluation remains essential to AI safety
The Meta deal turned neutrality into a product feature. In this market, labs are not just buying labels; they are handing over model prompts, failure cases, eval sets, and roadmap clues. Once a vendor is tied to a direct rival, customers start treating that vendor as part of the rival's stack. That is why work moved back into the market and toward providers like Prolific, where the pitch is less a giant labor pool and more trusted, auditable access to independent humans.
-
Scale built its business by bundling software with a large contractor base, then grew from $760M ARR in 2023 to $1.5B by the end of 2024 as frontier labs bought RLHF and eval capacity fast. That scale made it a core supplier, but it also made independence more important once Meta bought in.
-
Prolific is positioned differently. Customers set filters, choose sample size and pay rates, and tap a pool of 200,000 ID-verified participants across 40-plus countries. That makes it useful for second-opinion testing, specialized raters, and sensitive safety work where customers want distance from any one frontier lab.
-
Prolific is not the only beneficiary. Invisible, Mercor, Handshake, and Surge are all part of the same reshuffle toward expert pools and managed workflows. The center of gravity is moving from generic crowd labor to providers that can supply domain experts, traceable outputs, and cleaner separation from model competitors.
Going forward, the winning vendors in human data will look less like outsourced annotation factories and more like neutral infrastructure for evaluation and reasoning work. Independence, specialist supply, and auditability are becoming the three things labs will pay up for, which should keep pushing spend away from conflicted platforms and toward trusted multi-customer providers.