Owning the LLM Feedback Loop
Kyle Corbitt, CEO of OpenPipe, on the future of fine-tuning LLMs
OpenPipe’s real moat is not just training models; it is owning the feedback loop that tells it which training changes actually work in production. Because teams log live prompts and define their own evals inside the platform, OpenPipe can quietly test new base models, hyperparameters, and training recipes on representative historical workloads, then score them against the exact pass/fail logic customers already use. That turns customer traffic into a continuous-improvement engine for the product itself.
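The shadow-testing idea above can be sketched in a few lines. This is a hypothetical illustration, not OpenPipe's actual implementation: `LoggedRequest`, `shadow_eval`, and the toy pass/fail check are all invented names, standing in for replaying logged production prompts through a candidate model and scoring it with the customer's own eval logic.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class LoggedRequest:
    prompt: str
    production_output: str

def shadow_eval(
    logs: list[LoggedRequest],
    candidate: Callable[[str], str],     # candidate model under test
    passes: Callable[[str, str], bool],  # customer-defined pass/fail logic
) -> float:
    """Fraction of logged prompts where the candidate's output passes."""
    hits = sum(passes(r.prompt, candidate(r.prompt)) for r in logs)
    return hits / len(logs)

# Toy usage: the eval requires the answer to mention the prompt's topic.
logs = [LoggedRequest("refund policy", "Our refund policy is 30 days."),
        LoggedRequest("shipping time", "Orders ship in 2 days.")]
candidate = lambda p: f"Here is info on {p}."
score = shadow_eval(logs, candidate, lambda p, out: p in out)
print(score)  # 1.0
```

Because the scoring function is the customer's, not the vendor's, any improvement the shadow run finds is by definition an improvement on the metric that customer already trusts.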
-
The workflow is unusually concrete. A team swaps in OpenPipe’s SDK as a drop-in OpenAI replacement, logs production requests and responses, filters those logs into datasets, fine-tunes a model, and then runs code evals, rubric-based LLM-judge evals, or side-by-side comparisons before deployment. The shadow jobs sit on top of this existing pipeline.
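The drop-in logging pattern at the start of that pipeline looks roughly like this. Everything here is an illustrative sketch, not OpenPipe's SDK: `ChatClient` stands in for the original completion client, and `LoggingClient` wraps it with the same call shape while recording each request/response pair as JSONL for later dataset building.

```python
import json
import time

class ChatClient:
    """Stand-in for the original completion client (hypothetical)."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class LoggingClient:
    """Same interface as ChatClient, so it can be swapped in without
    touching call sites; every exchange is appended to a log file."""
    def __init__(self, inner: ChatClient, log_path: str):
        self.inner = inner
        self.log_path = log_path

    def complete(self, prompt: str) -> str:
        response = self.inner.complete(prompt)
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"ts": time.time(),
                                "prompt": prompt,
                                "response": response}) + "\n")
        return response

client = LoggingClient(ChatClient(), "requests.jsonl")
print(client.complete("hello"))  # echo: hello
```

The key design choice is that logging is invisible to the calling code, which is why a one-line client swap is enough to start accumulating training data from live traffic.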
-
What matters is not just access to datasets, but access to user-defined evaluations. Most teams doing fine-tuning still rely on rough output checks, while OpenPipe’s customers specify what good looks like for their own task. That lets it compare different training choices on the same tasks and improve faster than a vendor that only offers training infrastructure.
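A user-defined eval makes that comparison mechanical: run two training variants over the same task set and count rubric wins. This is a toy sketch under invented names (`judge`, `compare`); in practice the judge would be an LLM prompted with the customer's rubric rather than a keyword check.

```python
def judge(rubric: str, output: str) -> bool:
    # Toy rubric check: require every rubric keyword in the output.
    # A real rubric-based judge would be an LLM call.
    return all(word in output for word in rubric.split())

def compare(tasks, variant_a, variant_b, rubric):
    """Return (wins_a, wins_b) across the shared task set."""
    wins = [0, 0]
    for prompt in tasks:
        for i, model in enumerate((variant_a, variant_b)):
            wins[i] += judge(rubric, model(prompt))
    return tuple(wins)

tasks = ["summarize the refund policy", "summarize shipping times"]
a = lambda p: f"Summary: {p} (cites policy source)"
b = lambda p: f"{p}"
print(compare(tasks, a, b, "Summary"))  # (2, 0)
```

Because both variants are scored on identical tasks with identical pass criteria, the platform can attribute any gap directly to the training choice rather than to noise in the eval.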
-
This also explains the buyer and competitive position. OpenPipe is built for product teams that already ship an LLM feature and want a one-stop loop for logging, dataset prep, evals, training, and deployment. Predibase is closer to an ML platform for managing fleets of custom models, while self-managed tools like Unsloth and Axolotl leave the eval and workflow glue to the customer.
-
The next step is for fine-tuning platforms to become reward-and-improvement systems, not just training consoles. As more agent teams define rubrics, collect preference data, and route live traffic through the platform, the winning product will be the one that most quickly turns production traces into better models, better agents, and eventually a full closed-loop post-training stack.