User Engagement as Training Labels
Scale AI: the $290M/year Mechanical Turk of machine learning
The key shift is that many AI products now turn usage into training data, which makes data labeling part of the product loop instead of a separate procurement line item. When a user edits an output, picks the better of two answers, flags a bad response, or keeps using one workflow while abandoning another, the app is collecting preference data and edge cases that can be used to tune prompts, improve ranking, and fine-tune models with far fewer dedicated annotators.
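As a concrete illustration, here is a minimal sketch of what that in-product capture layer might look like. Everything in it is an assumption for illustration: the event names, the FeedbackEvent fields, and the record_feedback helper are invented, not any particular product's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical event names for the signals described above.
FEEDBACK_EVENTS = {"edited_output", "picked_better_of_two", "flagged_bad", "kept_workflow"}

@dataclass
class FeedbackEvent:
    user_id: str
    event: str                         # one of FEEDBACK_EVENTS
    prompt: str                        # the input that produced the output
    output: str                        # the output the user reacted to
    alternative: Optional[str] = None  # the other candidate, for pairwise picks
    edited_to: Optional[str] = None    # the user's corrected text, for edits
    ts: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def record_feedback(event: FeedbackEvent, log: list[FeedbackEvent]) -> None:
    """Append to an in-memory log; a real app would write to a queue or warehouse."""
    if event.event not in FEEDBACK_EVENTS:
        raise ValueError(f"unknown event type: {event.event}")
    log.append(event)
```

The point of the schema is that every record carries the prompt and output alongside the reaction, so each event is a ready-made training example rather than a bare analytics ping.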
-
Foundation models cut the amount of manual labeling needed to get useful performance. In many app workflows, a product manager or end user can label the last 100 hard examples in the UI while the base model handles the rest, which is why the bottleneck shifts from bulk annotation to product instrumentation and feedback capture.
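One way to implement that hand-off is confidence-gated routing, sketched below under stated assumptions: the classify callable stands in for the base-model call, REVIEW_THRESHOLD is an invented name, and the 0.8 cutoff is arbitrary. Confident predictions pass straight through; uncertain ones land in a review queue rendered in the UI, which is where the "last 100 hard examples" get labeled.

```python
from typing import Callable, Optional

REVIEW_THRESHOLD = 0.8  # assumed cutoff; tune against observed error rates

def route(
    text: str,
    classify: Callable[[str], tuple[str, float]],  # base-model call -> (label, confidence)
    review_queue: list[str],
) -> Optional[str]:
    """Return the model's label when it is confident; otherwise queue for a human."""
    label, confidence = classify(text)
    if confidence >= REVIEW_THRESHOLD:
        return label               # the base model handles the easy majority
    review_queue.append(text)      # hard case: surface in the UI for human labeling
    return None

# Example: a low-confidence stand-in model pushes the item into the queue.
queue: list[str] = []
label = route("ambiguous support ticket", lambda t: ("refund", 0.55), queue)
# label is None and the ticket now sits in `queue` for in-UI review
```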
-
The engagement mechanic is the label. reCAPTCHA turned image recognition into a user task. Modern AI apps do the same with thumbs up, regenerate, choose-the-best-draft, accept-or-reject moderation, or voice-interview follow-ups. Jasper showed how shipping quickly with few-shot prompting and fine-tuning can start this usage-data flywheel fast.
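To make those mechanics concrete, here is a sketch of turning a choose-the-best-draft event into a preference pair, using the prompt/chosen/rejected shape that DPO-style preference tuning commonly consumes. The event dict layout and field names are assumptions, not a standard.

```python
import json
from typing import Optional

def to_preference_pair(event: dict) -> Optional[dict]:
    """Map a choose-best-draft UI event to a prompt/chosen/rejected record.
    The input shape is illustrative; adapt to however your app logs events."""
    if event.get("event") != "picked_better_of_two":
        return None
    return {
        "prompt": event["prompt"],
        "chosen": event["picked"],   # the draft the user selected
        "rejected": event["other"],  # the draft the user passed over
    }

def export_jsonl(events: list[dict], path: str) -> int:
    """Write preference pairs as JSONL, a shape DPO-style trainers commonly accept."""
    pairs = [p for e in events if (p := to_preference_pair(e)) is not None]
    with open(path, "w") as f:
        for pair in pairs:
            f.write(json.dumps(pair) + "\n")
    return len(pairs)
```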
-
This changes who wins. Large crowdwork vendors were built to sell outsourced labor by the hour or by the task. Newer systems win by sitting inside the workflow and collecting higher-signal feedback from users, or by sourcing narrower expert pools for the remaining hard cases: legal, medical, safety, and cultural nuance.
The next phase is a split market. Commodity labeling keeps shrinking as apps learn from their own users, while human data spending concentrates in two places: expert evaluation for hard domains, and ongoing trust, safety, and cultural review. The most durable AI products will be the ones that bake feedback collection directly into everyday use and reserve paid human labor for the narrow slices where judgment still matters most.