Action Models Drive Humanoid Advantage
Sankaet Pathak, CEO of Foundation, on why humanoids win in robotics
The hard part in humanoid robotics is no longer getting a model to describe a task; it is getting a robot to move its body reliably in the real world. Foundation argues that the language layer is already cheap and widely available, but the action layer still has to be built from scratch around scarce robot data, tight latency budgets, and the cost of physical failures. That is why its core technical bet is on a more data-efficient, state-based action model rather than a generic off-the-shelf VLA stack.
In practice, high-level reasoning can already turn a prompt like "bring me something healthy" into a task list. The missing piece is low-level control: the stream of joint moves, grip force, and body positioning that lets a robot actually pick up the apple without dropping it or colliding with its environment.
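To make the split concrete, here is a minimal sketch of the two layers described above. Everything in it is illustrative, not Foundation's actual API: the planner stands in for the commoditized reasoning layer, and the controller stands in for the scarce, data-hungry action layer that turns one task step into a stream of joint commands.

```python
# Hypothetical sketch of the high-level/low-level split. All names and
# numbers are illustrative assumptions, not any company's real interface.
from dataclasses import dataclass

@dataclass
class JointCommand:
    """One tick of low-level control: target joint angles plus grip force."""
    joint_angles: list[float]  # radians, one per actuated joint
    grip_force: float          # newtons at the gripper

def plan(prompt: str) -> list[str]:
    """High-level reasoning layer: prompt -> task list (the 'cheap' part)."""
    if "healthy" in prompt:
        return ["locate apple", "grasp apple", "deliver apple"]
    return []

def control(step: str) -> list[JointCommand]:
    """Low-level action model: one task step -> many control ticks.
    This is the hard part the article says must be built from scratch."""
    if step == "grasp apple":
        # Ramp grip force gradually so the apple is neither dropped nor crushed.
        return [JointCommand([0.1 * t] * 7, grip_force=2.0 + t) for t in range(5)]
    return [JointCommand([0.0] * 7, grip_force=0.0)]

steps = plan("bring me something healthy")
commands = [cmd for step in steps for cmd in control(step)]
print(len(steps), len(commands))  # a short task list expands into many ticks
```

The asymmetry the article points at shows up even in this toy: the planner is a few lines of symbol manipulation, while the controller is where all the real-world difficulty (force ramps, collision avoidance, latency) would live.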
The closest off-the-shelf systems today are general robotics foundation models like Physical Intelligence's openpi and Covariant's RFM-1. But those are built for broad experimentation or warehouse manipulation, not for a new humanoid fleet operating with limited proprietary data and an immediate need for production reliability.
That creates a real moat for vertically integrated humanoid companies. The winner is unlikely to be whoever has the flashiest language model. It is more likely to be whoever ships robots, captures intervention data from live jobs, and turns those edge cases into better action policies faster than rivals.
Over the next few years, action models should become the main competitive bottleneck in humanoids. As reasoning layers commoditize further, value will shift toward companies that pair deployed robots with teleoperation, data collection, and retraining loops, because that loop is how a fragile demo becomes a worker that can stay on the line all day.
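The flywheel described above can be sketched as a simple data-selection step: a sketch under assumed names, not any company's pipeline. Fleet episodes are logged, teleoperator interventions mark the edge cases the policy could not handle, and those episodes are the ones prioritized for the next retraining run.

```python
# Hypothetical sketch of the intervention-to-retraining loop. The Episode
# fields and selection rule are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Episode:
    task: str
    succeeded: bool
    intervened: bool  # a human teleoperator had to take over

def retraining_batch(fleet_log: list[Episode]) -> list[Episode]:
    """Select episodes worth retraining on: interventions and failures first,
    since each is a real-world edge case the current policy got wrong."""
    return [e for e in fleet_log if e.intervened or not e.succeeded]

fleet_log = [
    Episode("pick apple", succeeded=True, intervened=False),
    Episode("pick apple", succeeded=False, intervened=True),   # dropped it
    Episode("open fridge", succeeded=True, intervened=True),   # human assist
]
batch = retraining_batch(fleet_log)
print(len(batch))  # only the edge cases feed the next policy update
```

The design point is that the uneventful successes are cheap and plentiful, while the intervened episodes are the scarce signal; whoever accumulates and retrains on them fastest compounds the advantage the article describes.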