Tinker becomes metered model infrastructure
Thinking Machines
This turns Tinker from a research demo into metered infrastructure for custom models. Per-million-token pricing means a lab or enterprise can treat fine-tuning like cloud compute, while the open-weight lineup lets them start with a cheap 1B to 8B Llama or Qwen model, then move up to larger MoE models like DeepSeek, Qwen3, or Kimi without changing tools. Checkpoint downloads also matter, because customers keep the tuned weights instead of being locked into a hosted black box.
-
The product exposes the actual cost drivers of training. Tinker lists separate prefill, sample, and train rates in USD per million tokens, plus storage, so buyers can estimate spend from dataset size and usage patterns instead of negotiating custom contracts first.
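That estimate is simple arithmetic over token counts. A minimal sketch, using hypothetical per-million-token rates (the real numbers vary by model; Tinker publishes them on its pricing page):

```python
# Hypothetical rates in USD per million tokens -- placeholders, not Tinker's actual prices.
RATES = {"prefill": 0.20, "sample": 0.60, "train": 1.00}

def estimate_cost(prefill_tokens, sample_tokens, train_tokens, rates=RATES):
    """Estimate spend (USD) from raw token counts and per-million-token rates."""
    tokens = {"prefill": prefill_tokens, "sample": sample_tokens, "train": train_tokens}
    return sum(tokens[kind] / 1_000_000 * rates[kind] for kind in rates)

# Example run: 2M prefill, 1M sampled, 10M trained tokens.
run_cost = estimate_cost(
    prefill_tokens=2_000_000,
    sample_tokens=1_000_000,
    train_tokens=10_000_000,
)  # roughly $11.00 at the placeholder rates above, before storage
```

The point is that once the three rates are public, a buyer can budget a tuning run from dataset size alone, the same way they budget inference from expected traffic.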
-
The model roster is broad enough to cover several workflows. Dense Llama models fit smaller, cheaper task tuning, while MoE models like Qwen3, DeepSeek-V3.1, GPT-OSS, and Kimi-K2 are aimed at heavier reasoning and multimodal work. Tinker also added vision-language tuning and an OpenAI-compatible inference interface as it opened general availability.
-
The closest product pattern is fine-tuning platforms like OpenPipe, where the value is turning prompt-heavy workflows into cheaper, more reliable tuned models and letting teams export what they trained. Thinking Machines pushes further down the stack with primitives like forward_backward and optim_step, which makes it look more like managed training infrastructure for researchers than a no-code tuning tool.
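To see why forward_backward and optim_step read as infrastructure rather than a no-code product, here is a toy stand-in for that style of loop. The split between "compute loss and accumulate gradients" and "apply an optimizer update" mirrors the primitives named above, but everything else here, the model, the signatures, the SGD optimizer, is invented for illustration and is not Tinker's actual API:

```python
class ToyTrainingClient:
    """Toy stand-in for a low-level training client. Only the two-phase
    shape (forward_backward, then optim_step) reflects the post; the
    model and signatures are hypothetical."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0      # parameters of a 1-D linear model
        self.gw, self.gb = 0.0, 0.0    # accumulated gradients

    def forward_backward(self, batch):
        """Compute mean squared error over (x, target) pairs and
        accumulate gradients; does NOT update parameters."""
        loss = 0.0
        n = len(batch)
        for x, target in batch:
            err = self.w * x + self.b - target
            loss += err * err / n
            self.gw += 2 * err * x / n
            self.gb += 2 * err / n
        return loss

    def optim_step(self, lr=0.05):
        """Apply a plain SGD update, then clear accumulated gradients."""
        self.w -= lr * self.gw
        self.b -= lr * self.gb
        self.gw, self.gb = 0.0, 0.0

client = ToyTrainingClient()
data = [(x, 3.0 * x + 1.0) for x in range(-2, 3)]  # fit y = 3x + 1
for _ in range(200):
    loss = client.forward_backward(data)
    client.optim_step()
```

Keeping the two phases separate is what lets a researcher script custom loops, e.g. accumulating gradients across several batches before one update, or interleaving sampling and training in an RL-style recipe, while the provider handles the hardware behind each call.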
This is heading toward a market where open-weight post-training is bought the same way teams buy GPUs and APIs today. If Thinking Machines keeps adding larger open models, multimodal support, and exportable checkpoints, it can become the default place to customize frontier-class open models while clouds and closed-model labs stay focused on generic hosted inference.