We’ve seen humanoid robots perform impressive acrobatics, like cartwheels, backflips, and even complex dance routines, for years. Yet hardly any can reliably climb arbitrary staircases or traverse difficult obstacles (like stepping stones) in the wild. This is a classic example of Moravec’s Paradox: things that come easily to humans are hard for robots, and vice versa.
Stair climbing, for instance, demands intricate coordination between visual perception and motor control. The robot must interact precisely with the physical structure of the stairs, adapting dynamically to variations in step height and geometry. In contrast, acrobatics and dancing are typically performed in free space and can often be executed blind, relying solely on proprioception and internal motor sensing.
If you ask someone, “How many steps are there in the stairs leading from the street to your apartment?”, they probably won’t know. Unlike today’s humanoid solutions, humans don’t build detailed terrain maps. We navigate complex terrain effortlessly, not by mapping and pre-planning every step, but by seeing and reacting in the moment.