Sakana's Model Merge for Regional Markets

This approach avoids the trillion-token pretraining costs typically required for new language coverage.

The real advantage is not just lower training spend; it is faster entry into regional markets with models built from parts that already work. Sakana can take an English model that is strong at coding or reasoning, combine it with a local-language model, and search through many merged variants until one performs well. That turns language expansion from a giant data-collection and pretraining project into a much shorter model assembly and testing loop.
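A minimal sketch of that assembly-and-search loop, under toy assumptions: the two checkpoints are stand-in weight dictionaries, `evaluate` is a placeholder for a local-language benchmark score, and a simple mutate-and-keep-best loop stands in for the evolutionary search. Sakana's published Evolutionary Model Merge work uses CMA-ES and also explores data-flow (layer-routing) merges, neither of which is reproduced here.

```python
# Toy sketch of parameter-space merging plus evolutionary search over merge weights.
# Model names, the fitness function, and the search loop are illustrative assumptions,
# not Sakana's implementation.

import numpy as np

rng = np.random.default_rng(0)

def toy_model(seed: int) -> dict[str, np.ndarray]:
    """Stand-in for a loaded checkpoint: a dict of weight tensors."""
    r = np.random.default_rng(seed)
    return {f"layer_{i}.weight": r.normal(size=(64, 64)) for i in range(4)}

english_model = toy_model(1)   # e.g. a strong English reasoning/coding model
local_model = toy_model(2)     # e.g. a Japanese (or Korean/Thai) language model
layer_names = list(english_model)

def merge(alphas: np.ndarray) -> dict[str, np.ndarray]:
    """Per-layer linear interpolation between the two parent checkpoints."""
    return {
        name: (1 - a) * english_model[name] + a * local_model[name]
        for name, a in zip(layer_names, alphas)
    }

def evaluate(model: dict[str, np.ndarray]) -> float:
    """Placeholder fitness; in practice this would be a local-language benchmark."""
    target = toy_model(3)  # pretend these are the benchmark-optimal weights
    return -sum(np.mean((model[n] - target[n]) ** 2) for n in layer_names)

# Simple (1+8) evolutionary loop over the per-layer merge coefficients.
best_alphas = np.full(len(layer_names), 0.5)
best_score = evaluate(merge(best_alphas))
for generation in range(30):
    candidates = np.clip(
        best_alphas + rng.normal(scale=0.1, size=(8, len(layer_names))), 0.0, 1.0
    )
    scores = [evaluate(merge(c)) for c in candidates]
    if max(scores) > best_score:
        best_score = max(scores)
        best_alphas = candidates[int(np.argmax(scores))]

print("best per-layer share of the local model:", np.round(best_alphas, 2))
```

The point of the sketch is the shape of the workflow: no gradient updates and no new pretraining data, only repeated merging and evaluation of candidate combinations.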

  • Sakana already used this playbook in Japan. Its Evolutionary Model Merge system produced EvoLLM-JP and EvoVLM-JP by combining existing models instead of training a new foundation model from scratch, and the resulting Japanese model outperformed much larger prior Japanese models on benchmarks.
  • That matters most in languages where the commercial opportunity is real but the open web corpus is far smaller than English's. Pretraining a frontier model now often consumes trillions of tokens, so reusing existing English and local-language building blocks can cut both data requirements and GPU time by a large margin (a rough compute comparison follows this list).
  • The competitive edge is practical. Large incumbents like SoftBank and the frontier labs can afford giant pretraining runs, but Sakana can pitch banks, telecoms, and government contractors on a cheaper path to local models that fit regional compliance and on-premises deployment needs.
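For a sense of scale, here is a back-of-the-envelope comparison under assumed numbers (a 7B-parameter model, a 2T-token pretraining run, 200 merge candidates each scored on a few million benchmark tokens). None of these figures come from Sakana, and the ~6·params·tokens training and ~2·params·tokens inference FLOPs rules of thumb are standard approximations, not measurements.

```python
# Back-of-envelope compute comparison (illustrative assumptions, not Sakana's numbers).
# Pretraining cost uses the common ~6 * params * tokens FLOPs rule of thumb;
# merge search is costed as forward-pass evaluation (~2 * params * tokens) per candidate.

params = 7e9                 # assumed 7B-parameter model
pretrain_tokens = 2e12       # assumed 2T-token pretraining run
pretrain_flops = 6 * params * pretrain_tokens

candidates = 200             # assumed number of merged variants evaluated
eval_tokens = 5e6            # assumed benchmark tokens per candidate evaluation
search_flops = candidates * 2 * params * eval_tokens

print(f"pretraining:  {pretrain_flops:.2e} FLOPs")
print(f"merge search: {search_flops:.2e} FLOPs")
print(f"ratio: ~{pretrain_flops / search_flops:,.0f}x")
```

Even with much more generous evaluation budgets, the candidate search stays orders of magnitude cheaper than a fresh pretraining run, which is the core of the cost argument.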

If this works across Korean, Thai, and Bahasa Indonesia workflows, Asian language coverage becomes less of a scale game and more of a search-and-distribution game. That would let Sakana expand country by country through enterprise partnerships while larger rivals are still justifying the cost of training another full model stack.