
From Imitation to Refinement -- Residual RL for Precise Assembly

Recent advances in behavior cloning (BC), such as action chunking and diffusion, have led to impressive progress. Still, imitation alone remains insufficient for tasks requiring reliable and precise movements, such as aligning and inserting objects. Our key insight is that chunked BC policies function as trajectory planners, enabling long-horizon tasks. However, because they execute action chunks open-loop, they lack the fine-grained reactivity necessary for reliable execution. Further, we find that the performance of BC policies saturates even as training data increases. Reinforcement learning (RL) is a natural way to overcome this, but it is not straightforward to apply directly to action-chunked models like diffusion policies. We present a simple yet effective method, ResiP (Residual for Precise Manipulation), that sidesteps these challenges by augmenting a frozen, chunked BC model with a fully closed-loop residual policy trained with RL. The residual policy is trained via on-policy RL, addressing distribution shift and introducing reactivity without altering the BC trajectory planner. Evaluation on high-precision manipulation tasks shows that ResiP outperforms both BC methods and direct RL fine-tuning. Videos, code, and data are available at https://residual-assembly.github.io.
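To make the architecture concrete, here is a minimal sketch of the residual scheme described above: a frozen chunked BC policy proposes an open-loop action chunk, and a small closed-loop residual policy corrects each action at execution time. The names here (ChunkedBCPolicy, ResidualPolicy, execute_chunk, env) are illustrative placeholders, not the authors' implementation; the actual code is on the project page.

```python
import numpy as np

# Hypothetical sketch of residual correction on top of a frozen,
# chunked BC planner. Not the paper's code.

class ChunkedBCPolicy:
    """Frozen BC planner: maps an observation to a chunk of H actions."""
    def __init__(self, horizon=8, action_dim=7):
        self.horizon, self.action_dim = horizon, action_dim

    def predict_chunk(self, obs):
        # Stand-in for a diffusion/transformer BC model's open-loop plan.
        return np.zeros((self.horizon, self.action_dim))

class ResidualPolicy:
    """Reactive corrector trained with on-policy RL; the BC model stays frozen."""
    def __init__(self, action_dim=7, scale=0.05):
        self.action_dim, self.scale = action_dim, scale

    def act(self, obs, base_action):
        # Stand-in for a small learned policy; corrections are bounded
        # so the executed motion stays close to the BC plan.
        raw = np.zeros(self.action_dim)
        return self.scale * np.tanh(raw)

def execute_chunk(env, bc_policy, residual_policy, obs):
    """Roll out one BC action chunk, adding a per-step closed-loop correction."""
    chunk = bc_policy.predict_chunk(obs)  # open-loop trajectory plan
    for base_action in chunk:
        delta = residual_policy.act(obs, base_action)  # reactive residual term
        obs, reward, done, info = env.step(base_action + delta)
        if done:
            break
    return obs
```

The key design point the sketch illustrates is that only the residual policy is updated by RL, so the BC planner's long-horizon behavior is preserved while per-step reactivity is added.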