DreamDrive: Generative 4D Scene Modeling from Street View Images

Synthesizing photo-realistic visual observations from an ego vehicle's driving trajectory is a critical step towards scalable training of self-driving models. Reconstruction-based methods create 3D scenes from driving logs and synthesize geometry-consistent driving videos through neural rendering, but their dependence on costly object annotations limits their ability to generalize to in-the-wild driving scenarios. On the other hand, generative models can synthesize action-conditioned driving videos in a more generalizable way but often struggle with maintaining 3D visual consistency. In this paper, we present DreamDrive, a 4D spatial-temporal scene generation approach that combines the merits of generation and reconstruction, to synthesize generalizable 4D driving scenes and dynamic driving videos with 3D consistency. Specifically, we leverage the generative power of video diffusion models to synthesize a sequence of visual references and further elevate them to 4D with a novel hybrid Gaussian representation. Given a driving trajectory, we then render 3D-consistent driving videos via Gaussian splatting. The use of generative priors allows our method to produce high-quality 4D scenes from in-the-wild driving data, while neural rendering ensures 3D-consistent video generation from the 4D scenes. Extensive experiments on nuScenes and street view images demonstrate that DreamDrive can generate controllable and generalizable 4D driving scenes, synthesize novel views of driving videos with high fidelity and 3D consistency, decompose static and dynamic elements in a self-supervised manner, and enhance perception and planning tasks for autonomous driving.
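The abstract describes a three-stage pipeline: a video diffusion model produces reference frames, these are lifted into a hybrid Gaussian scene that separates static structure from dynamic agents, and driving videos are then rendered by splatting along an ego trajectory. The toy NumPy sketch below only illustrates that structure under stated assumptions; the class and function names, the isotropic Gaussians, and the constant-velocity motion model are all illustrative inventions, not the paper's actual representation or renderer.

```python
# Hypothetical sketch of the pipeline structure described in the abstract:
# hybrid static/dynamic Gaussians rendered along a driving trajectory.
# All names, shapes, and the constant-velocity motion are assumptions.
import numpy as np

class HybridGaussianScene:
    """Static Gaussians keep fixed world-space centers; dynamic Gaussians
    additionally carry a per-time-step displacement (toy motion model)."""
    def __init__(self, static_mu, dynamic_mu, dynamic_motion, colors_s, colors_d, scale=0.05):
        self.static_mu = static_mu            # (Ns, 3) static centers
        self.dynamic_mu = dynamic_mu          # (Nd, 3) dynamic centers at t = 0
        self.dynamic_motion = dynamic_motion  # (Nd, 3) constant velocity per Gaussian
        self.colors_s = colors_s              # (Ns, 3) RGB
        self.colors_d = colors_d              # (Nd, 3) RGB
        self.scale = scale                    # isotropic world-space Gaussian scale

    def centers_at(self, t):
        """All Gaussian centers and colors at time step t."""
        dyn = self.dynamic_mu + t * self.dynamic_motion
        mu = np.concatenate([self.static_mu, dyn], axis=0)
        col = np.concatenate([self.colors_s, self.colors_d], axis=0)
        return mu, col

def render(scene, cam_pos, t, H=64, W=96, f=80.0):
    """Toy splatting: project isotropic Gaussians through a pinhole camera
    looking down +z from cam_pos and alpha-composite front to back."""
    mu, col = scene.centers_at(t)
    p = mu - cam_pos                          # camera-frame points (axis-aligned camera)
    order = np.argsort(p[:, 2])               # sort front to back by depth
    img = np.zeros((H, W, 3))
    acc = np.zeros((H, W, 1))                 # accumulated opacity
    ys, xs = np.mgrid[0:H, 0:W]
    for i in order:
        x, y, z = p[i]
        if z <= 0.1:                          # skip points behind / at the camera
            continue
        u, v = f * x / z + W / 2, f * y / z + H / 2   # pinhole projection
        sigma = f * scene.scale / z                    # screen-space footprint
        g = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * sigma ** 2))[..., None]
        w = g * (1.0 - acc)                            # front-to-back compositing weight
        img += w * col[i]
        acc += w
    return img

# In the real method these references would come from a video diffusion model;
# here random points stand in for a fitted scene.
rng = np.random.default_rng(0)
scene = HybridGaussianScene(
    static_mu=rng.uniform([-5, -1, 5], [5, 1, 30], (200, 3)),      # road / buildings
    dynamic_mu=rng.uniform([-2, -0.5, 8], [2, 0.5, 15], (20, 3)),  # moving agents
    dynamic_motion=np.tile([0.0, 0.0, 0.5], (20, 1)),              # toy forward motion
    colors_s=rng.uniform(0.3, 0.9, (200, 3)),
    colors_d=rng.uniform(0.0, 1.0, (20, 3)),
)
# Render a short clip along a straight ego trajectory in +z.
frames = [render(scene, cam_pos=np.array([0.0, 0.0, 0.5 * k]), t=k) for k in range(8)]
print(len(frames), frames[0].shape)  # 8 frames of shape (64, 96, 3)
```

Because static and dynamic Gaussians are kept as separate groups, the same trajectory can be re-rendered with the dynamic set frozen or removed, which mirrors the static/dynamic decomposition the abstract mentions; the paper's actual decomposition is learned self-supervised rather than given up front as here.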
