kforge standalone kernel generation toolkit
Gimlet Labs
kforge matters because it turns Gimlet from a cloud operator into a picks-and-shovels supplier for any team trying to make AI models run well on mismatched chips. The key product step is that it already lives outside the managed cloud as its own toolkit and site: it generates kernels from PyTorch across CUDA, ROCm, and Metal, which makes it useful to chip vendors, model labs, and enterprise platform teams even if they never buy inference capacity.
-
This is a separate buying motion, not just a cloud feature. Gimlet presents kforge as a standalone toolkit with its own product surface, while the hosted cloud is listed separately, which supports a software-licensing path alongside usage-based infrastructure revenue.
-
The product solves a concrete pain point. New chips usually require engineers to rewrite and retune low-level kernels by hand. kforge instead searches for working, fast implementations automatically, while Gimlet ties that into a broader compiler stack meant to port workloads to new hardware without code changes.
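kforge's actual interface is not shown here, so as a loose illustration only: the search idea behind this class of tools can be sketched as benchmarking candidate implementations of the same operation and keeping the fastest one that passes a correctness check. All function names below are hypothetical stand-ins, not kforge APIs.

```python
import timeit

# Two candidate "kernels" for the same op (a dot product), standing in
# for the implementation variants a search-based generator enumerates.
def dot_naive(a, b):
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

def dot_zip(a, b):
    return sum(x * y for x, y in zip(a, b))

def autotune(candidates, args, reps=100):
    """Benchmark each candidate on sample inputs; return the fastest
    one whose result matches the first candidate (the reference)."""
    reference = candidates[0](*args)
    best, best_time = None, float("inf")
    for fn in candidates:
        if abs(fn(*args) - reference) > 1e-9:  # correctness gate first
            continue
        elapsed = timeit.timeit(lambda: fn(*args), number=reps)
        if elapsed < best_time:
            best, best_time = fn, elapsed
    return best

a = [float(i) for i in range(1000)]
b = [float(i % 7) for i in range(1000)]
winner = autotune([dot_naive, dot_zip], (a, b))
```

Real systems search a far larger space (tilings, memory layouts, vendor intrinsics) per target backend, but the shape is the same: generate variants, verify, time, select.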
-
The market need is real and getting broader. MLIR was built to reduce fragmentation across heterogeneous hardware, and Meta has now described the same kernel-optimization problem across NVIDIA, AMD, and its own chips. That makes portable kernel generation look like an emerging software category, not a one-off feature.
The next step is for kforge to become the control point between model code and an increasingly fragmented accelerator market. If Gimlet keeps expanding backend support and packaging the compiler and scheduler around it, the company can sell into far more accounts than a single inference cloud could reach.