EvoLLM-JP 7B Outperforms 70B
Sakana AI
This result shows that Sakana AI is attacking the cost curve of model building, not just chasing benchmark wins. Instead of spending months pretraining a giant Japanese model from scratch, it recombined existing open models, tested large numbers of variants, and found a much smaller 7B model that performed like a top-tier national language model. That matters because a 7B model is far easier to serve, fine-tune, and run inside enterprise environments than a 70B model.
The workflow is concrete. Sakana starts with a pool of existing models, mixes layers and weights to create child models, scores them on a task-specific benchmark, keeps the best ones, and repeats, as in the sketch below. EvoLLM-JP came out of that loop without gradient-based retraining, which means far lower compute use than standard foundation model development.
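To make that loop concrete, here is a minimal toy sketch of evolutionary search over merge recipes: each candidate is a set of mixing coefficients over three source models, and selection plus mutation replaces gradient descent. All names and the fitness function are illustrative assumptions, not Sakana's actual method (which also searches over layer arrangements); a real run would score each merged checkpoint on a Japanese benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for source model weights: in practice these would be full
# transformer checkpoints; here each "model" is a small weight vector.
DIM = 16
source_models = [rng.normal(size=DIM) for _ in range(3)]

# Hypothetical fitness target: a real system would run the merged model on a
# Japanese eval set; here we score closeness to a hidden target vector.
target = rng.normal(size=DIM)

def merge(coeffs):
    """Weighted average of the source weights -- one simple merge recipe."""
    w = np.asarray(coeffs) / np.sum(coeffs)
    return sum(c * m for c, m in zip(w, source_models))

def fitness(coeffs):
    # Higher is better: negative distance from the target behavior.
    return -np.linalg.norm(merge(coeffs) - target)

# Evolutionary loop: score child merge recipes, keep the best, mutate, repeat.
population = [rng.uniform(0.01, 1.0, size=3) for _ in range(20)]
for generation in range(30):
    parents = sorted(population, key=fitness, reverse=True)[:5]  # selection
    children = [
        np.clip(p + rng.normal(scale=0.1, size=3), 0.01, None)   # mutation
        for p in parents
        for _ in range(4)
    ]
    population = parents + children

best = max(population, key=fitness)
print("best merge recipe:", best / best.sum(), "fitness:", round(fitness(best), 4))
```

No gradients flow anywhere in this loop; the only cost per candidate is one merge and one evaluation, which is why the approach is so much cheaper than pretraining.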
The benchmark win matters most in Japan, where local incumbents like Preferred Networks have trained domestic models such as PLaMo and market them on strong Japanese-language performance at compact sizes. Sakana is taking a different route, using search and recombination instead of massive pretraining runs on large GPU clusters.
A smaller model changes the product surface. A bank, manufacturer, or software vendor can deploy a 7B-class model on cheaper hardware, keep more workloads on premises, and tune it for narrow document or workflow tasks. That fits Sakana's licensing model better than selling one giant general-purpose model.
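As a rough illustration of that deployment story, here is a minimal local-inference sketch using the Hugging Face transformers library and Sakana's published EvoLLM-JP checkpoint (SakanaAI/EvoLLM-JP-v1-7B); the dtype and device settings are illustrative of how a 7B model fits a single commodity GPU, not a production configuration.

```python
# Minimal local-inference sketch; requires: pip install torch transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SakanaAI/EvoLLM-JP-v1-7B"  # published checkpoint on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~14 GB of weights at fp16 for a 7B model
    device_map="auto",          # place layers on available GPU(s)/CPU
)

prompt = "日本の首都はどこですか？"  # "What is the capital of Japan?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

At fp16 a 70B model needs roughly 140 GB of weights and a multi-GPU server, while the 7B model above fits on one card, which is the cost gap this paragraph is describing.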
The next step is turning evolutionary search into a repeatable model factory for each domain and language. If Sakana can keep producing compact models for finance, government, and other Asian-language markets, it can compete less as a frontier lab and more as the fastest way for enterprises to get a strong local model without paying frontier-scale training costs.