Frontier Models as Expensive Teachers
Kyle Corbitt, CEO of OpenPipe, on the future of fine-tuning LLMs
This shift turns frontier models into expensive teachers and open models into cheap production workers. The pattern matters because most high-volume AI features do not need broad world knowledge on every call; they need the same judgment repeated fast and consistently: classifying a receipt, extracting fields from a contract, or rewriting text into a house style. Once a company has enough real examples, fine-tuning a smaller model can keep quality high while cutting latency and inference cost sharply.
-
The workflow is concrete. Teams first ship a prompt-based feature on OpenAI or Anthropic, log real requests and outputs, clean or relabel a few hundred to a few thousand examples, then train a task-specific model that can replace the original API call with minimal code changes.
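The "log, clean, retrain" step can be sketched as a small data-preparation pass. This is a minimal illustration, not OpenPipe's actual pipeline: the log fields (`prompt`, `completion`, `quality`) and the quality threshold are hypothetical stand-ins for whatever a team's logging and review tooling produces, and the output follows the common chat-style JSONL format used for fine-tuning.

```python
import json

def to_finetune_jsonl(logs, min_quality=0.8):
    """Convert logged frontier-model calls into chat-format fine-tuning
    examples, keeping only rows reviewers scored above the threshold.
    Field names here are hypothetical; adapt to your logging schema."""
    lines = []
    for row in logs:
        if row.get("quality", 0) < min_quality:
            continue  # drop examples that were rejected or unscored
        example = {
            "messages": [
                {"role": "user", "content": row["prompt"]},
                {"role": "assistant", "content": row["completion"]},
            ]
        }
        lines.append(json.dumps(example))
    return "\n".join(lines)

# Two logged calls: one approved by review, one scored too low to keep.
logs = [
    {"prompt": "Extract the total from: 'Total: $12.50'",
     "completion": '{"total": 12.50}', "quality": 0.95},
    {"prompt": "Extract the total from: 'no amount here'",
     "completion": '{"total": null}', "quality": 0.4},
]
print(to_finetune_jsonl(logs))
```

The point of the filter is that production logs are noisy; only the examples a team has verified should become training data for the replacement model.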
-
The gain is not just lower cost. Fine-tuning is most useful when the task has multi-step instructions or edge cases that prompting handles inconsistently. In that setting, the smaller model is being trained to repeat one narrow behavior over and over, not to match GPT-4 everywhere.
-
Notion has described this architecture directly: use the best model for each job, and deploy specialized fine-tuned models that cut latency in half while improving quality. Databricks has built its stack around serving and fine-tuning open models like Llama and Mistral, which suggests this is becoming a standard production pattern rather than an edge case.
As more companies accumulate production logs and evaluation data, more AI workloads will move from frontier APIs to custom small models. That pushes the market toward tooling for data collection, evals, retraining, and deployment, because the real moat stops being access to a base model and becomes the quality of the feedback loop around a specific task.
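The feedback loop the author describes ultimately reduces to a scoring function over held-out examples: before a fine-tuned model replaces a frontier API call, it has to clear an eval. This is a minimal sketch under assumptions; `small_model`, the keyword heuristic standing in for it, and the eval-set shape are all hypothetical.

```python
def evaluate(model_fn, eval_set):
    """Fraction of held-out examples the candidate model gets exactly
    right. model_fn is any callable from input text to output; the
    eval-set shape is a hypothetical stand-in for a team's eval store."""
    correct = sum(1 for ex in eval_set if model_fn(ex["input"]) == ex["expected"])
    return correct / len(eval_set)

# Toy stand-in for a fine-tuned classifier on the receipt/contract task
# mentioned earlier; a real candidate would be a model call.
def small_model(text):
    return "receipt" if "total" in text.lower() else "contract"

eval_set = [
    {"input": "Total: $12.50", "expected": "receipt"},
    {"input": "This agreement is entered into by both parties.", "expected": "contract"},
    {"input": "Subtotal and total due on delivery", "expected": "receipt"},
]
print(f"accuracy = {evaluate(small_model, eval_set):.2f}")
```

A retraining pipeline would gate deployment on this number: the candidate ships only if it matches or beats the frontier model's score on the same held-out set, which is exactly why the eval data becomes the moat.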