Generative Omnimatte: Learning to Decompose Video into Layers

submitted by
Style Pass
2024-11-27 02:00:02

We compare our method with existing omnimatte methods (Omnimatte, Omnimatte3D, OmnimatteRF, and FactorMatte). Existing methods rely on restrictive motion assumptions, such as a stationary background, which cause dynamic background elements to become entangled with foreground object layers. Omnimatte3D and OmnimatteRF may also produce blurry background layers (e.g., the horses) because their 3D-aware background representations are sensitive to the quality of camera pose estimation. Furthermore, these methods lack a generative, semantic prior for completing occluded pixels and for accurately associating effects with their corresponding objects.

We compare our object-effect-removal model, Casper, with existing methods for object removal. Video inpainting models (ProPainter and Lumiere-Inpainting) fail to remove soft shadows and reflections outside the input masks. ObjectDrop is an image-based model and therefore processes each video frame independently, inpainting regions without global context or temporal consistency. We use the same mask dilation ratio for all methods.
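To keep the comparison fair, each method receives object masks dilated by the same amount. A minimal sketch of one way to do this, assuming the dilation radius is a fixed fraction of the frame size (the `ratio` parameter and `dilate_mask` helper are hypothetical, not from the paper):

```python
import numpy as np
from scipy.ndimage import binary_dilation

def dilate_mask(mask: np.ndarray, ratio: float = 0.05) -> np.ndarray:
    """Dilate a binary object mask by a radius proportional to frame size.

    `ratio` is an assumed knob: the structuring-element radius is
    ratio * min(H, W), applied identically for every method compared.
    """
    h, w = mask.shape
    radius = max(1, int(ratio * min(h, w)))
    # Disk-shaped structuring element of the chosen radius.
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = (yy ** 2 + xx ** 2) <= radius ** 2
    return binary_dilation(mask.astype(bool), structure=disk)
```

Tying the radius to frame size rather than a fixed pixel count keeps the dilation comparable across videos of different resolutions.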

Given an input video and binary object masks, we first apply our object-effect-removal model, Casper, to generate a clean-plate background and a set of single-object (solo) videos by applying different trimask conditions. The trimasks specify regions to preserve (white), regions to remove (black), and regions that may contain uncertain object effects (gray). In Stage 2, a test-time optimization reconstructs the omnimatte layers Oi from pairs of solo and background videos.
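The trimask condition described above can be sketched per frame as follows. This is a minimal illustration, not the paper's implementation: the `make_trimask` helper and the exact gray/white/black encoding values are assumptions, chosen so that the object being removed is black, other objects are white, and everything else, which may carry the removed object's shadows or reflections, is gray:

```python
import numpy as np

# Assumed encoding: white = preserve, black = remove,
# gray = uncertain regions that may contain object effects.
PRESERVE, UNCERTAIN, REMOVE = 1.0, 0.5, 0.0

def make_trimask(object_masks: np.ndarray, remove_idx: int) -> np.ndarray:
    """Build a trimask for removing one object from a frame.

    object_masks: (N, H, W) boolean masks, one per object.
    remove_idx:   index of the object to remove.
    """
    n, h, w = object_masks.shape
    # Background is uncertain: it may hold the removed object's
    # shadows or reflections, which the model should also remove.
    trimask = np.full((h, w), UNCERTAIN, dtype=np.float32)
    # Other objects (and their pixels) must be preserved.
    for i in range(n):
        if i != remove_idx:
            trimask[object_masks[i]] = PRESERVE
    # The target object itself is marked for removal.
    trimask[object_masks[remove_idx]] = REMOVE
    return trimask
```

Under this sketch, varying `remove_idx` over all objects yields the solo-video conditions, and marking every object black yields the clean-plate condition.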
