Efficient neural representations of dynamic video scenes are critical for applications ranging from video compression to interactive simulation. Existing methods, however, often suffer from high memory usage, long training times, and poor temporal consistency. To address these issues, we introduce a neural video representation that combines 3D Gaussian splatting with continuous camera motion modeling. Using Neural ODEs, our approach learns smooth camera trajectories while keeping an explicit 3D scene representation built from Gaussians. We also introduce a spatiotemporal hierarchical learning strategy that progressively refines spatial and temporal features, improving reconstruction quality and accelerating convergence. The resulting representation is memory-efficient and renders high-quality frames at high speed. Experimental results show that hierarchical learning, combined with robust camera motion modeling, captures complex dynamic scenes with strong temporal consistency and achieves state-of-the-art performance across diverse video datasets in both high- and low-motion scenarios.
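To make the idea of continuous camera motion modeling concrete, the following is a minimal sketch of how a camera trajectory can be obtained by integrating a learned velocity field over time, which is the core mechanism of a Neural ODE. It is not the implementation described above: the 6-dimensional pose (3 translation plus 3 axis-angle rotation components), the tiny randomly initialized MLP, the fixed-step RK4 integrator, and all function names (`init_field`, `pose_velocity`, `integrate_trajectory`) are assumptions chosen for illustration. In a real system the weights would be trained through a differentiable renderer.

```python
import numpy as np


def init_field(rng, state_dim=6, hidden=16):
    """Random weights for a tiny MLP f(pose, t) -> d(pose)/dt.

    Hypothetical stand-in for a learned camera-velocity field.
    """
    return {
        "W1": rng.normal(0.0, 0.3, (state_dim + 1, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.3, (hidden, state_dim)),
        "b2": np.zeros(state_dim),
    }


def pose_velocity(params, pose, t):
    """Evaluate the velocity field at a given pose and time."""
    x = np.concatenate([pose, [t]])  # condition the field on time
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]


def integrate_trajectory(params, pose0, timestamps, substeps=4):
    """Integrate the pose ODE with fixed-step RK4.

    Returns one pose per timestamp; the first row is pose0.
    """
    poses = [np.asarray(pose0, dtype=float)]
    for t0, t1 in zip(timestamps[:-1], timestamps[1:]):
        pose = poses[-1]
        dt = (t1 - t0) / substeps
        t = t0
        for _ in range(substeps):
            k1 = pose_velocity(params, pose, t)
            k2 = pose_velocity(params, pose + 0.5 * dt * k1, t + 0.5 * dt)
            k3 = pose_velocity(params, pose + 0.5 * dt * k2, t + 0.5 * dt)
            k4 = pose_velocity(params, pose + dt * k3, t + dt)
            pose = pose + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
            t += dt
        poses.append(pose)
    return np.stack(poses)


rng = np.random.default_rng(0)
params = init_field(rng)
timestamps = np.linspace(0.0, 1.0, 30)  # 30 frames over normalized time
trajectory = integrate_trajectory(params, np.zeros(6), timestamps)
```

Because every pose is produced by integrating one shared velocity field, neighboring frames cannot jump arbitrarily, which is one plausible reason such a parameterization favors temporally consistent trajectories. The pose at any intermediate time can also be queried by adding it to `timestamps`.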