
Are pricing and latency a bottleneck for Nyckel, in light of critics saying LLM/GPT latency and costs must come down 10x?

Oscar Beijbom

Co-founder & CTO at Nyckel

Most of the models we're using right now are not as big as GPT-3, so we don't have the same issues. None of our customers have issues with pricing or latency. 

The way we've set up our architecture is that we have a suite of deep nets that are shared among all the customers. They do some of the processing, and we can amortize the cost of keeping those nets warm. Then there are secondary nets that customize the output to do exactly what each customer wants, and those are very light, so the cost of deploying them is almost negligible. That lets us spin up very elastic infrastructure on those shallow, smaller nets while we amortize the cost of the bigger nets across the whole customer base. I think that's what GPT-3 does as well: it has one deployed net that powers all queries.
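
As a rough illustration of the split Beijbom describes, here is a minimal PyTorch-style sketch: one large net kept warm and shared across every customer, with a tiny per-customer head on top. All names and sizes here (SharedBackbone, CustomerHead, EMBED_DIM) are hypothetical stand-ins, not Nyckel's actual stack.

```python
import torch
import torch.nn as nn

EMBED_DIM = 512  # assumed output size of the shared deep net

class SharedBackbone(nn.Module):
    """One large net kept warm and shared by every customer."""
    def __init__(self):
        super().__init__()
        # Stand-in for a large pretrained feature extractor.
        self.encoder = nn.Sequential(
            nn.Linear(2048, 1024), nn.ReLU(),
            nn.Linear(1024, EMBED_DIM),
        )

    @torch.no_grad()  # frozen: its cost is amortized, not retrained per customer
    def forward(self, x):
        return self.encoder(x)

class CustomerHead(nn.Module):
    """Tiny per-customer net: cheap to train, store, and spin up."""
    def __init__(self, num_labels):
        super().__init__()
        self.classifier = nn.Linear(EMBED_DIM, num_labels)

    def forward(self, embedding):
        return self.classifier(embedding)

backbone = SharedBackbone()              # deployed once, kept warm
heads = {                                # one lightweight head per customer
    "customer_a": CustomerHead(num_labels=3),
    "customer_b": CustomerHead(num_labels=7),
}

x = torch.randn(1, 2048)                 # a single incoming request
embedding = backbone(x)                  # shared, amortized compute
logits = heads["customer_a"](embedding)  # near-negligible per-customer compute
```

The design point is that the expensive object (the backbone) is a fixed cost spread over all traffic, while the per-customer objects are small enough to load and serve elastically.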

The problem is that when you fine-tune that GPT-3 model, it becomes very expensive, because you have to keep a whole GPT-3 warm for every customer, which is probably not very feasible.
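
A back-of-envelope calculation makes the gap concrete. These numbers are my own illustration, not from the interview: GPT-3 has roughly 175B parameters, so at 2 bytes per parameter (fp16) a warm fine-tuned copy needs on the order of 350 GB of accelerator memory per customer, versus megabytes or less for a small per-customer head.

```python
# Rough memory footprint of a warm fine-tuned GPT-3 per customer
# versus a hypothetical lightweight per-customer head.
GPT3_PARAMS = 175e9            # ~175B parameters
BYTES_PER_PARAM_FP16 = 2
HEAD_PARAMS = 512 * 10         # hypothetical head: 512-dim embedding, 10 labels

gpt3_gb = GPT3_PARAMS * BYTES_PER_PARAM_FP16 / 1e9
head_mb = HEAD_PARAMS * BYTES_PER_PARAM_FP16 / 1e6

print(f"Warm fine-tuned GPT-3 per customer: ~{gpt3_gb:.0f} GB")   # ~350 GB
print(f"Lightweight per-customer head:      ~{head_mb:.2f} MB")   # ~0.01 MB
```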

I think the trend line there is very hard to predict. Now we're getting into research territory: can you get a model that generalizes as well as GPT-3 but is a hundredth of the size? I don't know. It's a big leap, but that would probably change the game for them, at least.

Find this answer in Oscar Beijbom, co-founder and CTO of Nyckel, on the opportunities in the AI/ML tooling market