
Scene Coordinate Reconstruction

Submitted by Style Pass on 2024-04-23 11:00:04

We address the task of estimating camera parameters from a set of images depicting a scene. Popular feature-based structure-from-motion (SfM) tools solve this task by incremental reconstruction: they alternate between triangulating sparse 3D points and registering more camera views to the sparse point cloud. We re-interpret incremental structure-from-motion as the iterated application and refinement of a visual relocalizer, that is, of a method that registers new views to the current state of the reconstruction. This perspective allows us to investigate alternative visual relocalizers that are not rooted in local feature matching. We show that scene coordinate regression, a learning-based relocalization approach, allows us to build implicit, neural scene representations from unposed images. Unlike other learning-based reconstruction methods, we require neither pose priors nor sequential inputs, and we optimize efficiently over thousands of images. Our method, ACE0 (ACE Zero), estimates camera poses to an accuracy comparable to feature-based SfM, as demonstrated by novel view synthesis.
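The view of incremental reconstruction as a relocalizer that is repeatedly refined and re-applied can be illustrated with a toy loop. The following is a minimal sketch under stated assumptions, not the ACE0 implementation: the names `Relocalizer`, `incremental_reconstruction`, `ToyNeighbourRelocalizer`, and the `min_confidence` threshold are hypothetical, and a single float stands in for a full camera pose. The toy relocalizer can only register an image that lies close to an image that is already posed, which mimics the need for visual overlap with the current reconstruction.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Iterable, Protocol, Tuple

Pose = float  # assumption: a 1D position stands in for a 6-DoF camera pose


class Relocalizer(Protocol):
    """Hypothetical interface: fit to the posed views, then register new ones."""

    def fit(self, posed: Dict[Hashable, Pose]) -> None: ...

    def register(self, image: Hashable) -> Tuple[Pose, float]: ...


def incremental_reconstruction(
    images: Iterable[Hashable],
    seed: Hashable,
    relocalizer: Relocalizer,
    min_confidence: float = 0.5,  # assumed acceptance threshold
    max_rounds: int = 100,
) -> Dict[Hashable, Pose]:
    """Alternate between refining the relocalizer and registering more views."""
    images = list(images)
    posed: Dict[Hashable, Pose] = {seed: 0.0}  # the seed view defines the frame
    for _ in range(max_rounds):
        relocalizer.fit(posed)  # refine on the current reconstruction
        newly_registered: Dict[Hashable, Pose] = {}
        for image in images:
            if image in posed:
                continue
            pose, confidence = relocalizer.register(image)
            if confidence >= min_confidence:
                newly_registered[image] = pose
        if not newly_registered:  # nothing new could be registered: stop
            break
        posed.update(newly_registered)
    return posed


@dataclass
class ToyNeighbourRelocalizer:
    """Toy stand-in: registers an image only if it lies within `reach`
    of an image that is already part of the reconstruction."""

    true_positions: Dict[Hashable, float]
    reach: float = 1.0
    _known: Dict[Hashable, Pose] = field(default_factory=dict)

    def fit(self, posed: Dict[Hashable, Pose]) -> None:
        self._known = dict(posed)

    def register(self, image: Hashable) -> Tuple[Pose, float]:
        x = self.true_positions[image]
        overlaps = any(
            abs(self.true_positions[k] - x) <= self.reach for k in self._known
        )
        return (x, 1.0) if overlaps else (0.0, 0.0)


positions = {"a": 0.0, "b": 0.8, "c": 1.6, "d": 5.0}
poses = incremental_reconstruction(
    positions, seed="a", relocalizer=ToyNeighbourRelocalizer(positions)
)
print(sorted(poses))
```

In this toy run, image `c` cannot be registered in the first round but becomes reachable once `b` has been added, while `d` is too far from every posed view and stays unregistered. The loop is agnostic to how the relocalizer works internally, which is the property that lets a learned relocalizer take the place of feature matching.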

We visualize the reconstruction process of ACE Zero for some of the scenes from our experiments. During each reconstruction, we show the point cloud extracted from the current implicit scene model. At the end of each reconstruction, we switch to a point cloud extracted from a Nerfacto model trained on top of the ACE Zero camera poses.