Scale Commoditizing Customers' Proprietary Models
Scale: the $290M/year Mechanical Turk of machine learning
Moving from data vendor to model vendor turns Scale from neutral infrastructure into a potential competitor, and that changes how customers think about every dataset they share. Once Scale offers training, evaluation, and deployment on top of customer data, a buyer has to ask whether today's labeling work is also making tomorrow's off-the-shelf model better. That is why data-use restrictions become strategic, not just legal boilerplate.
Scale already sits close to the customer's crown jewels. Its core business is handling proprietary images, documents, prompts, and human feedback that customers use to improve models, and its newer products extend further into training, testing, and production. That makes Scale's platform more valuable, but it also raises the trust burden.
The comparable pattern in ML tooling is vertical integration. Nyckel argues the winning product bundles labeling, training, deployment, and monitoring into one workflow, while Dataiku packages technical and non-technical work into one system of record. The more Scale follows that path, the more it standardizes capabilities that used to live inside customer-specific model stacks.
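To make the bundling concrete, here is a minimal sketch of what "one workflow" means in code. Every name here (MLWorkflow, label, train, predict) is hypothetical and the trainer is a stand-in that just memorizes labels; the point is the interface shape, where labeling, training, serving, and monitoring live behind a single object, not any vendor's actual API.

```python
# Hypothetical sketch of a vertically integrated ML workflow:
# labeling, training, deployment, and monitoring behind one interface.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class MLWorkflow:
    name: str
    labeled: list[tuple[str, str]] = field(default_factory=list)  # (example, label)
    served: list[tuple[str, str]] = field(default_factory=list)   # (input, prediction)
    model: Optional[Callable[[str], str]] = None

    def label(self, example: str, target: str) -> None:
        """The classic data-vendor step: collect a labeled example."""
        self.labeled.append((example, target))

    def train(self) -> None:
        """Stand-in trainer: memorize the label table (a real stack fits a model)."""
        table = dict(self.labeled)
        self.model = lambda x: table.get(x, "unknown")

    def predict(self, example: str) -> str:
        """Deployment and monitoring in one place: serve and log each call."""
        assert self.model is not None, "call train() first"
        prediction = self.model(example)
        self.served.append((example, prediction))
        return prediction

wf = MLWorkflow("invoice-classifier")
wf.label("ACME invoice #42", "invoice")
wf.train()
print(wf.predict("ACME invoice #42"))      # -> invoice
print(len(wf.served), "prediction(s) logged for monitoring")
```

The design choice worth noticing is that once one vendor owns every method on that object, the customer's data flows through every stage of the stack by default, which is exactly what raises the neutrality question.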
This is no longer a theoretical concern. After Meta bought a 49% stake in Scale in June 2025, major model builders, including Google and OpenAI, pulled back because the same vendor that touched sensitive training pipelines was now tied to a direct platform rival. Neutrality became a product feature that competitors could sell against.
The next phase of the market favors vendors that can offer full-stack ML products without asking customers to weaken control over proprietary data. Scale can still win by becoming the easiest system for turning raw company data into deployed AI, but the durable leaders will pair vertical integration with strict data isolation, model separation, and contract terms that make neutrality visible and enforceable.
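What "strict data isolation" could look like mechanically: below is a hedged sketch of a policy gate in front of training jobs. Everything in it (Dataset, TrainingJob, authorize, the shareable flag) is hypothetical, standing in for whatever terms a real contract would encode; the idea is that the neutrality promise becomes a check the system enforces rather than language buried in a contract.

```python
# Minimal sketch of enforceable data isolation, assuming a hypothetical
# policy layer in front of a training-job scheduler. Names are illustrative,
# not Scale's actual API.
from dataclasses import dataclass

class PolicyError(Exception):
    """Raised when a job would cross a customer data boundary."""

@dataclass(frozen=True)
class Dataset:
    owner: str               # customer the data belongs to
    shareable: bool = False  # contractually opted in to cross-tenant use

@dataclass(frozen=True)
class TrainingJob:
    tenant: str              # who the resulting model is trained for
    datasets: tuple[Dataset, ...]

def authorize(job: TrainingJob) -> None:
    """Reject any job that trains tenant A's model on tenant B's data
    unless the dataset owner explicitly opted in by contract."""
    for ds in job.datasets:
        if ds.owner != job.tenant and not ds.shareable:
            raise PolicyError(
                f"dataset owned by {ds.owner!r} may not train a model for {job.tenant!r}"
            )

authorize(TrainingJob("acme", (Dataset("acme"),)))        # ok: own data
try:
    authorize(TrainingJob("vendor", (Dataset("acme"),)))  # blocked: off-the-shelf
except PolicyError as e:                                  # model on customer data
    print(e)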