Dynamic 3D Gaussian Fields for Urban Areas | Tobias Fischer

submitted by
Style Pass
2024-06-07 18:30:08

We render a novel free-form trajectory across five highly diverse sequences using the same model. We obtain the trajectory by interpolating between keyframes selected throughout a common geographic area at a constant speed of 10 m/s. We render each sequence with its unique appearance and set of dynamic objects, simulating various distinct traffic scenarios.
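As a rough illustration of constant-speed keyframe interpolation, the sketch below resamples a polyline of keyframe positions so that consecutive camera positions are a fixed distance apart. It is a minimal sketch under stated assumptions, not the method used for the rendering above: the function name `resample_constant_speed`, the frame rate `fps`, the example keyframes, and the use of piecewise-linear interpolation of positions only (no camera orientation, no smoothing spline) are all assumptions made for illustration.

```python
import numpy as np


def resample_constant_speed(keyframes, speed=10.0, fps=10.0):
    """Resample keyframe positions so the camera moves at `speed` m/s.

    Hypothetical helper: positions are interpolated piecewise-linearly
    along the keyframe polyline, parameterized by arc length.
    """
    keyframes = np.asarray(keyframes, dtype=float)
    # Length of each segment between consecutive keyframes.
    seg = np.linalg.norm(np.diff(keyframes, axis=0), axis=1)
    # Cumulative arc length at each keyframe, starting at 0.
    arclen = np.concatenate([[0.0], np.cumsum(seg)])
    # Distance travelled per rendered frame.
    step = speed / fps
    # Arc-length positions of the rendered frames.
    s = np.arange(0.0, arclen[-1] + 1e-9, step)
    # Interpolate each coordinate as a function of arc length.
    return np.stack(
        [np.interp(s, arclen, keyframes[:, d]) for d in range(keyframes.shape[1])],
        axis=1,
    )


# Illustrative keyframes in metres (assumed values, not from the dataset).
keyframes = [[0, 0, 0], [30, 0, 0], [30, 40, 0]]
path = resample_constant_speed(keyframes, speed=10.0, fps=10.0)
```

Parameterizing by arc length rather than by keyframe index is what keeps the speed constant when keyframes are unevenly spaced. Spacing is exact along straight segments; where a sample interval spans a corner, the straight-line distance between the two frames is slightly shorter than the distance travelled along the path.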

We present an efficient neural 3D scene representation for novel-view synthesis (NVS) in large-scale, dynamic urban areas. Existing works are not well suited for applications like mixed reality or closed-loop simulation due to their limited visual quality and non-interactive rendering speeds. Recently, rasterization-based approaches have achieved high-quality NVS at impressive speeds. However, these methods are limited to small-scale, homogeneous data, i.e., they cannot handle severe appearance and geometry variations due to weather, season, and lighting, and they do not scale to larger, dynamic areas with thousands of images. We propose 4DGF, a neural scene representation that scales to large-scale dynamic urban areas, handles heterogeneous input data, and substantially improves rendering speeds. We use 3D Gaussians as an efficient geometry scaffold while relying on neural fields as a compact and flexible appearance model. We integrate scene dynamics via a scene graph at global scale while modeling articulated motions on a local level via deformations. This decomposed approach enables flexible scene composition suitable for real-world applications. In experiments, we surpass the state of the art by over 3 dB in PSNR and more than 100x in rendering speed.

We use sets of 3D Gaussians $G$ as a geometry scaffold, neural fields $\phi$ and $\psi$ to represent sequence- and object-specific appearance and geometry variations, and a scene graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ to express the scene configuration at each sequence-time pair $(s, t)$. We condition the neural fields with latent codes $\omega$ of the nodes in $\mathcal{V}$. To render a view at $(s, t)$, we compose the sets of 3D Gaussians using the coordinate system transformations $[\mathbf{R} | \mathbf{t}]$ along the edges $\mathcal{E}$.
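The composition step can be illustrated with a small sketch: each node holds a set of 3D Gaussians in its own coordinate frame, and the edges active at a given $(s, t)$ supply a rigid transform $[\mathbf{R} | \mathbf{t}]$ that maps them into a common frame. This is a simplified sketch under stated assumptions rather than the 4DGF implementation: the names `compose_gaussians` and `rot_z`, the dictionary layout, and the flat one-level graph (every node attached directly to the root) are hypothetical, and the neural fields $\phi$, $\psi$ and latent codes $\omega$ are omitted entirely.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix for an angle `theta` (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def compose_gaussians(nodes, edges):
    """Map per-node Gaussians into the world frame and concatenate them.

    nodes: name -> (means of shape (N, 3), covariances of shape (N, 3, 3)),
           expressed in the node's local frame.
    edges: name -> (R, t), the node-to-world transform at one (s, t).
           Only nodes that have an edge are included in the output.
    """
    means, covs = [], []
    for name, (R, t) in edges.items():
        mu, sigma = nodes[name]
        # A rigid transform moves the mean: mu' = R mu + t.
        means.append(mu @ R.T + t)
        # Covariance is rotated but unaffected by translation: R Sigma R^T.
        covs.append(R @ sigma @ R.T)
    return np.concatenate(means), np.concatenate(covs)


# Illustrative scene: one static Gaussian and one Gaussian on a moving car.
nodes = {
    "static": (np.array([[0.0, 0.0, 0.0]]), np.eye(3)[None]),
    "car": (np.array([[1.0, 0.0, 0.0]]), np.diag([4.0, 1.0, 1.0])[None]),
}
# Edges for one assumed (s, t): the car is rotated 90 degrees and shifted 5 m in x.
edges = {
    "static": (np.eye(3), np.zeros(3)),
    "car": (rot_z(np.pi / 2), np.array([5.0, 0.0, 0.0])),
}
world_means, world_covs = compose_gaussians(nodes, edges)
```

Because the Gaussians stay in their local frames and only the edge transforms change with $(s, t)$, the same object can be placed, moved, or left out of a scene without touching its parameters.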
