Foundation models, internet-scale data, and the path to generalized robots

Submitted by
Style Pass
2024-02-13 05:00:03

A recent conversation with a friend in the robotics industry naturally led to a discussion of how the industry as a whole can build more generalizable robotic systems: systems that perform well on novel tasks with little new data, that are robust to task perturbations, and that can adapt to different robot morphologies. We discussed the prospects for leveraging large datasets of heterogeneous robotics data and the lessons we can take from the astonishing success of LLMs and other foundation models. This post explores that topic further.

Below I summarize some of the most active areas of general robotics research over the last few years. The summary focuses on attempts to leverage LLMs, VLMs, and related techniques for robotics, and on which techniques might prove fruitful for extracting the most value from large robotics datasets. These techniques are being applied across the robotics field with varying success. Because I work actively on driverless vehicles, I avoid analyzing related advances in autonomous driving. Instead, I focus this review mostly on grasping and manipulation robots, an area of particular research emphasis. I think the lessons learned in this sub-domain extend to most other areas of robotics.

As a preliminary historical note, the idea of using data from other robots, or models trained on non-robotics data, to build better robots is still a novel one. The hand-engineered components in conventional (pre-deep-learning) robotics systems struggled to leverage robot data at all, except as something for engineers to inspect directly when debugging. To be clear, model-based approaches can still produce powerful results, including the notable robot parkour exhibited by Boston Dynamics robots. However, this post assumes a more modern architecture with data-driven components designed to learn from data, whether that data is generated directly by the target platform or comes from elsewhere.
