Fine-tuning optional for document workflows
Fireworks AI customer Hebbia on serving state-of-the-art models with unified APIs
The case shows that Hebbia treated model progress as an external supply curve, not an internal training problem. For Hebbia’s document analysis workflows, the bottleneck was getting the newest strong open models into production fast, with stable latency and routing, not squeezing marginal gains out of retraining weights. In practice, prompting, retrieval, and model choice were enough to adapt models to customer work, while each new model release delivered the bigger jump in reasoning quality.
Hebbia’s actual workloads were mixed: high-concurrency chat for analysts, token-heavy batch reviews of large data rooms, and bursty model experimentation. That made unified APIs, token controls, observability, and same-day access to new checkpoints more valuable than a custom fine-tuning pipeline.
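The abstraction this implies can be sketched as a thin routing layer over OpenAI-compatible endpoints: each workload maps to a provider URL, a model ID, and a token cap, so adopting a new checkpoint is a config edit rather than a retraining run. This is a minimal sketch under stated assumptions; the workload names, model IDs, and token limits below are illustrative, not Hebbia’s actual configuration.

```python
# Sketch of a provider-agnostic serving layer: workloads resolve to an
# OpenAI-compatible endpoint, a model ID, and a per-request token cap.
# All concrete values here are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    base_url: str    # OpenAI-compatible endpoint
    model: str       # model identifier at that provider
    max_tokens: int  # per-request token control


ROUTES = {
    # High-concurrency analyst chat: favor latency, cap tokens tightly.
    "chat": Route(
        base_url="https://api.fireworks.ai/inference/v1",
        model="accounts/fireworks/models/llama-v3p1-70b-instruct",
        max_tokens=1024,
    ),
    # Token-heavy batch review of data rooms: allow long outputs.
    "batch_review": Route(
        base_url="https://api.fireworks.ai/inference/v1",
        model="accounts/fireworks/models/llama-v3p1-405b-instruct",
        max_tokens=8192,
    ),
}


def route_for(workload: str) -> Route:
    """Resolve a workload to its serving route. Swapping in a newly
    released model changes one config entry, not the client code."""
    return ROUTES[workload]
```

Because every route speaks the same API shape, the calling code stays identical across providers and model generations, which is the property that makes fine-tuning optional rather than structural.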
This is a specific product philosophy, not a universal law. OpenPipe argues the opposite for repetitive tasks: fine-tuning smaller models can cut inference cost, improve reliability on nuanced instructions, and beat stuffing many examples into every prompt. The split depends on whether the job is general reasoning or narrow, repeatable classification.
The provider choice followed that philosophy. Hebbia compared Fireworks mainly against Bedrock and picked Fireworks because its catalog of fresh open models moved faster and fit Hebbia’s existing OpenAI-style abstraction layer. Raw GPU clouds were never serious options, because Hebbia did not want to own training, scheduling, or serving complexity.
As frontier models keep improving in large discrete jumps, more application companies will treat fine-tuning as optional and compete on routing, retrieval, and workflow design instead. Fine-tuning will remain strongest in narrow, high-volume tasks where consistency and token efficiency matter more than instant access to the newest reasoning model.